library(bench)
Optimizations
A lot of hardware is not the answer to poorly written code. Before considering parallelization, you should think about ways to optimize your code sequentially.
Why?
- Not all code can be parallelized.
- Parallelization is costly (waiting time to access a cluster or money).
- The optimization of the sequential code will also benefit the parallel code.
In many cases, writing better code will save you more computing time than parallelization.
In this section, we will cover several principles by playing with the programmatic implementation of the fizz buzz game.
Toy example
Fizz buzz is a children game to practice divisions. Players take turn counting out loud while replacing:
- any number divisible by 3 with the word “Fizz”,
- any number divisible by 5 with the word “Buzz”,
- any number divisible by both 3 and 5 with the word “FizzBuzz”.
Let’s write functions to solve the game and time them to draw some general principles about more efficient code.
We will use bench::mark()
to benchmark our solutions. Let’s load it:
Pre-allocate memory
In this first function, we create an empty object z
of class integer and of length 0
that will hold the result of a loop, then we run the loop and at each iteration, we add a new value to z
:
<- function(n) {
f1 <- integer()
z for(i in 1:n) {
if(i %% 3 == 0 && i %% 5 == 0) {
<- "FizzBuzz"
z[i] else if(i %% 3 == 0) {
} <- "Fizz"
z[i] else if(i %% 5 == 0) {
} <- "Buzz"
z[i] else {
} <- i
z[i]
}
}
z }
The second function is very similar, but this time, we create an empty object z
of class integer and of length matching the final length z
will have after running the loop. This means that we are pre-allocating memory for the full vector before we run the loop instead of growing the vector at each iteration:
<- function(n) {
f2 <- integer(n)
z for(i in 1:n) {
if(i %% 3 == 0 && i %% 5 == 0) {
<- "FizzBuzz"
z[i] else if(i %% 3 == 0) {
} <- "Fizz"
z[i] else if(i %% 5 == 0) {
} <- "Buzz"
z[i] else {
} <- i
z[i]
}
}
z }
Let’s make sure that our functions work by testing it on a small number:
f1(20)
[1] "1" "2" "Fizz" "4" "Buzz" "Fizz"
[7] "7" "8" "Fizz" "Buzz" "11" "Fizz"
[13] "13" "14" "FizzBuzz" "16" "17" "Fizz"
[19] "19" "Buzz"
f2(20)
[1] "1" "2" "Fizz" "4" "Buzz" "Fizz"
[7] "7" "8" "Fizz" "Buzz" "11" "Fizz"
[13] "13" "14" "FizzBuzz" "16" "17" "Fizz"
[19] "19" "Buzz"
Now, let’s benchmark them for a large number:
<- 1e5
n mark(f1(n), f2(n))
Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 f1(n) 135ms 139ms 7.17 16.55MB 23.3
2 f2(n) 121ms 127ms 7.77 1.15MB 27.2
f2()
is consistently faster. While in this example the difference is very slight, pre-allocating the object that will hold the result of a loop before running the loop can make a huge difference.
Aren’t loops a big ‘no no’ in R?
By now, you might be thinking: “Wait… aren’t loops a big ‘no no’ in R? I’ve always been told that they are slow and that one should always use functional programming! We are talking about optimization in this course and we are using loops?!?”
There are a lot of misconceptions around R loops. They can be very slow if you don’t pre-allocate memory. Otherwise they are almost always faster than functions (the apply()
family or the tidyverse equivalent of the purrr::map()
family). You can choose to use a functional programming approach for style and readability, but not for speed.
Let’s test it.
First we create a function:
<- function(n) {
fb if(n %% 3 == 0 && n %% 5 == 0) {
"FizzBuzz"
else if(n %% 3 == 0) {
} "Fizz"
else if(n %% 5 == 0) {
} "Buzz"
else {
}
n
} }
Then we pass it through sapply()
. We can test that it works on a small number:
sapply(1:20, fb)
[1] "1" "2" "Fizz" "4" "Buzz" "Fizz"
[7] "7" "8" "Fizz" "Buzz" "11" "Fizz"
[13] "13" "14" "FizzBuzz" "16" "17" "Fizz"
[19] "19" "Buzz"
Finally, we compare the timing with that of f2()
:
mark(f2(n), sapply(1:n, fb))
Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 f2(n) 132ms 146ms 6.94 1.15MB 27.8
2 sapply(1:n, fb) 177ms 177ms 5.40 3.29MB 25.2
As you can see, the loop is faster.
Avoid unnecessary operations
Example 1
Calling z
as the last command in our function is the same as calling return(z)
.
From the R documentation:
If the end of a function is reached without calling return, the value of the last evaluated expression is returned.
Now, what about using print()
instead?
<- function(n) {
f3 <- integer(n)
z for(i in 1:n) {
if(i %% 3 == 0 && i %% 5 == 0) {
<- "FizzBuzz"
z[i] else if(i %% 3 == 0) {
} <- "Fizz"
z[i] else if(i %% 5 == 0) {
} <- "Buzz"
z[i] else {
} <- i
z[i]
}
}print(z)
}
Let’s benchmark it against f2()
:
mark(f2(n), f3(n))
[1] "1" "2" "Fizz" "4" "Buzz" "Fizz"
[7] "7" "8" "Fizz" "Buzz" "11" "Fizz"
[13] "13" "14" "FizzBuzz" "16" "17" "Fizz"
[19] "19" "Buzz" "Fizz" "22" "23" "Fizz"
[25] "Buzz" "26" "Fizz" "28" "29" "FizzBuzz"
[31] "31" "32" "Fizz" "34" "Buzz" "Fizz"
[37] "37" "38" "Fizz" "Buzz" "41" "Fizz"
...
Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 f2(1e+05) 151.88ms 157.65ms 6.30 1.25MB 29.9
2 f3(1e+05) 3.25s 3.25s 0.308 1.04GB 26.8
What happened?
print()
returns its argument, but it additionally prints it to the standard output. This is why the mark()
function printed the output of f3()
before printing the timings.
As you can see, printing takes a long time.
The code in this website is run by Quarto. Since, by default, RStudio will only print the first 1,000 results, the timing you will get for f3()
in RStudio will be much less bad as it won’t include the time it takes to print the remaining 99,000 results.
If you are evaluating f2()
on its own (e.g. f2(20)
), the returned result will also be printed to standard output and both functions will be equivalent. However, if you are using the function in another context, printing becomes an unnecessary and timely operation and f3()
would be a very bad option. f3()
is thus not a good function.
Here is an example in which f3()
would perform a totally unnecessary operation that f2()
avoids:
<- f2(20) a
No unnecessary printing.
<- f3(20) a
[1] "1" "2" "Fizz" "4" "Buzz" "Fizz"
[7] "7" "8" "Fizz" "Buzz" "11" "Fizz"
[13] "13" "14" "FizzBuzz" "16" "17" "Fizz"
[19] "19" "Buzz"
Unnecessary printing.
For 1e5, the difference in timing between running an unnecessary printing vs not is a factor of 21!
Even worse would be to use:
<- function(n) {
f4 for(i in 1:n) {
if(i %% 3 == 0 && i %% 5 == 0) {
print("FizzBuzz")
else if(i %% 3 == 0) {
} print("Fizz")
else if(i %% 5 == 0) {
} print("Buzz")
else {
} print(i)
}
} }
Here the difference in timing is a factor of 50…
Example 2
One modulo operation and equality test can be removed by replacing i %% 3 == 0 && i %% 5 == 0
by i %% 15 == 0
. The difference isn’t huge, but there is a slight speedup:
<- function(n) {
f2bis <- integer(n)
z for(i in 1:n) {
if(i %% 15 == 0) {
<- "FizzBuzz"
z[i] else if(i %% 3 == 0) {
} <- "Fizz"
z[i] else if(i %% 5 == 0) {
} <- "Buzz"
z[i] else {
} <- i
z[i]
}
}
z
}
mark(f2(n), f2bis(n))
Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 f2(n) 112ms 142ms 6.75 1.15MB 28.7
2 f2bis(n) 105ms 106ms 9.44 1.22MB 34.0
Example 3
But is f2()
really the fastest function?
Louis Arsenault-Mahjoubi—who attended this workshop—found ways to get rid of several operations and get a speedup of 1.7 over f2()
.
First, we can assign 1:n
to z
instead of pre-allocating memory with an empty vector, thus rendering the assignment of i
to z[i]
unnecessary in the last else statement:
<- function(n) {
f_louis1 <- 1:n
z for(i in z) {
if(i %% 3 == 0 && i %% 5 == 0) {
<- "FizzBuzz"
z[i] else if(i %% 3 == 0) {
} <- "Fizz"
z[i] else if(i %% 5 == 0) {
} <- "Buzz"
z[i]
}
}
z }
This function works:
f_louis1(20)
[1] "1" "2" "Fizz" "4" "Buzz" "Fizz"
[7] "7" "8" "Fizz" "Buzz" "11" "Fizz"
[13] "13" "14" "FizzBuzz" "16" "17" "Fizz"
[19] "19" "Buzz"
… and is faster (speedup of 1.3):
mark(f2bis(n), f_louis1(n))
Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 f2bis(n) 108.8ms 109.7ms 9.04 1.15MB 34.4
2 f_louis1(n) 80.7ms 82.8ms 12.1 1.15MB 39.7
Then, we can prevent the repetitions of the modulo operations and equality tests by saving them to variables:
<- function(n) {
f_louis2 <- 1:n
z for(i in z) {
<- (i %% 3 == 0)
div3 <- (i %% 5 == 0)
div5 if(div3 && div5) {
<- "FizzBuzz"
z[i] else if(div3) {
} <- "Fizz"
z[i] else if(div5) {
} <- "Buzz"
z[i]
}
}
z }
This gets us an even greater speedup of 1.7:
mark(f2bis(n), f_louis2(n))
Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 f2bis(n) 109.3ms 112.4ms 8.69 1.15MB 34.8
2 f_louis2(n) 66.6ms 68.9ms 14.3 1.21MB 33.9
But it gets even better: we can actually get rid of the for loop!
<- function(n) {
f_louis3 <- 1:n
z <- (z %% 3 == 0)
div3 <- (z %% 5 == 0)
div5 which(div3)] <- "Fizz"
z[which(div5)] <- "Buzz"
z[which(div3 & div5)] <- "FizzBuzz"
z[
z }
Now we get a speedup of 5.5 compared to our best f2
function:
mark(f2bis(n), f_louis3(n))
Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 f2bis(n) 117ms 119ms 8.24 1.17MB 29.7
2 f_louis3(n) 24ms 25ms 38.6 5.19MB 3.86
You can ensure that we still get the same result:
f_louis3(20)
[1] "1" "2" "Fizz" "4" "Buzz" "Fizz"
[7] "7" "8" "Fizz" "Buzz" "11" "Fizz"
[13] "13" "14" "FizzBuzz" "16" "17" "Fizz"
[19] "19" "Buzz"
Replace costly operations where possible
Now imagine that we have a dataframe called dat
with a first column called datvar
filled with integers.
We want to write a function that will accept our dataframe as argument and play the fizz buzz game on the column datvar
.
One could imagine the following function:
<- function(dat) {
f5 <- integer(length(dat[[1]]))
z for(i in seq_along(dat[[1]])) {
if(dat[[1]][i] %% 3 == 0 && dat[[1]][i] %% 5 == 0) {
<- "FizzBuzz"
z[i] else if(dat[[1]][i] %% 3 == 0) {
} <- "Fizz"
z[i] else if(dat[[1]][i] %% 5 == 0) {
} <- "Buzz"
z[i] else {
} <- dat[[1]][i]
z[i]
}
}
z }
Indexing a column from a dataframe in this fashion is a very costly operation. It is much more efficient to index with the name of the column:
<- function(dat) {
f6 <- integer(length(dat$datvar))
z for(i in seq_along(dat$datvar)) {
if(dat$datvar[i] %% 3 == 0 && dat$datvar[i] %% 5 == 0) {
<- "FizzBuzz"
z[i] else if(dat$datvar[i] %% 3 == 0) {
} <- "Fizz"
z[i] else if(dat$datvar[i] %% 5 == 0) {
} <- "Buzz"
z[i] else {
} <- dat$datvar[i]
z[i]
}
}
z }
Now, let’s create a random dataframe to benchmark f5()
and f6()
:
set.seed(123)
<- data.frame(datvar = round(runif(n, 1, n)))
dat mark(f5(dat), f6(dat))
Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 2 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 f5(dat) 1.81s 1.81s 0.553 1.26MB 30.4
2 f6(dat) 362.63ms 369.3ms 2.71 1.26MB 27.1
Avoid repetitions of costly operations
This made a big difference (speedup of 5), but notice that we are indexing the column 6 times in our function. Let’s remove the repetition of this operation:
<- function(dat) {
f7 <- dat$datvar
var <- integer(length(var))
z for(i in seq_along(var)) {
if(var[i] %% 3 == 0 && var[i] %% 5 == 0) {
<- "FizzBuzz"
z[i] else if(var[i] %% 3 == 0) {
} <- "Fizz"
z[i] else if(var[i] %% 5 == 0) {
} <- "Buzz"
z[i] else {
} <- var[i]
z[i]
}
}
z }
Let’s benchmark all 3 versions:
mark(f5(dat), f6(dat), f7(dat))
Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 3 × 6
expression min median `itr/sec` mem_alloc `gc/sec`
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
1 f5(dat) 1.78s 1.78s 0.562 1.15MB 30.9
2 f6(dat) 364.44ms 371.28ms 2.69 1.15MB 26.9
3 f7(dat) 151.86ms 156.22ms 5.95 1.25MB 17.9
f7()
gave us another speedup of almost 3 over f6()
. f7()
runs 14 times faster than our initial function!
Indexing from a vector isn’t costly. There is thus no advantage at removing the repetition of that operation.
Your turn:
Show that this last statement is true.