Optimizations

Author

Marie-Hélène Burle

A lot of hardware is not the answer to poorly written code. Before considering parallelization, you should think about ways to optimize your code sequentially.

Why?

Not all code can be parallelized.
Parallelization is costly (waiting time to access a cluster or money).
The optimization of the sequential code will also benefit the parallel code.

In many cases, writing better code will save you more computing time than parallelization.

In this section, we will cover several principles by playing with the programmatic implementation of the fizz buzz game.

Toy example

Fizz buzz is a children game to practice divisions. Players take turn counting out loud while replacing:

any number divisible by 3 with the word “Fizz”,
any number divisible by 5 with the word “Buzz”,
any number divisible by both 3 and 5 with the word “FizzBuzz”.

Let’s write functions to solve the game and time them to draw some general principles about more efficient code.

We will use bench::mark() to benchmark our solutions. Let’s load it:

library(bench)

Pre-allocate memory

In this first function, we create an empty object z of class integer and of length 0 that will hold the result of a loop, then we run the loop and at each iteration, we add a new value to z:

f1 <- function(n) {
  z <- integer()
  for(i in 1:n) {
    if(i %% 3 == 0 && i %% 5 == 0) {
      z[i] <- "FizzBuzz"
    } else if(i %% 3 == 0) {
      z[i] <- "Fizz"
    } else if(i %% 5 == 0) {
      z[i] <- "Buzz"
    } else {
      z[i] <- i
    }
  }
  z
}

The second function is very similar, but this time, we create an empty object z of class integer and of length matching the final length z will have after running the loop. This means that we are pre-allocating memory for the full vector before we run the loop instead of growing the vector at each iteration:

f2 <- function(n) {
  z <- integer(n)
  for(i in 1:n) {
    if(i %% 3 == 0 && i %% 5 == 0) {
      z[i] <- "FizzBuzz"
    } else if(i %% 3 == 0) {
      z[i] <- "Fizz"
    } else if(i %% 5 == 0) {
      z[i] <- "Buzz"
    } else {
      z[i] <- i
    }
  }
  z
}

Let’s make sure that our functions work by testing it on a small number:

f1(20)

 [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"    
 [7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"    
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"    
[19] "19"       "Buzz"

f2(20)

 [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"    
 [7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"    
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"    
[19] "19"       "Buzz"

Now, let’s benchmark them for a large number:

n <- 1e5
mark(f1(n), f2(n))

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.

# A tibble: 2 × 6
  expression      min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f1(n)         135ms    139ms      7.17   16.55MB     23.3
2 f2(n)         121ms    127ms      7.77    1.15MB     27.2

f2() is consistently faster. While in this example the difference is very slight, pre-allocating the object that will hold the result of a loop before running the loop can make a huge difference.

Aren’t loops a big ‘no no’ in R?

By now, you might be thinking: “Wait… aren’t loops a big ‘no no’ in R? I’ve always been told that they are slow and that one should always use functional programming! We are talking about optimization in this course and we are using loops?!?”

There are a lot of misconceptions around R loops. They can be very slow if you don’t pre-allocate memory. Otherwise they are almost always faster than functions (the apply() family or the tidyverse equivalent of the purrr::map() family). You can choose to use a functional programming approach for style and readability, but not for speed.

Let’s test it.

First we create a function:

fb <- function(n) {
  if(n %% 3 == 0 && n %% 5 == 0) {
    "FizzBuzz"
  } else if(n %% 3 == 0) {
    "Fizz"
  } else if(n %% 5 == 0) {
    "Buzz"
  } else {
    n
  }
}

Then we pass it through sapply(). We can test that it works on a small number:

sapply(1:20, fb)

 [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"    
 [7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"    
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"    
[19] "19"       "Buzz"

Finally, we compare the timing with that of f2():

mark(f2(n), sapply(1:n, fb))

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.

# A tibble: 2 × 6
  expression           min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>      <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f2(n)              132ms    146ms      6.94    1.15MB     27.8
2 sapply(1:n, fb)    177ms    177ms      5.40    3.29MB     25.2

As you can see, the loop is faster.

Avoid unnecessary operations

Example 1

Calling z as the last command in our function is the same as calling return(z).

From the R documentation:

If the end of a function is reached without calling return, the value of the last evaluated expression is returned.

Now, what about using print() instead?

f3 <- function(n) {
  z <- integer(n)
  for(i in 1:n) {
    if(i %% 3 == 0 && i %% 5 == 0) {
      z[i] <- "FizzBuzz"
    } else if(i %% 3 == 0) {
      z[i] <- "Fizz"
    } else if(i %% 5 == 0) {
      z[i] <- "Buzz"
    } else {
      z[i] <- i
    }
  }
  print(z)
}

Let’s benchmark it against f2():

mark(f2(n), f3(n))

 [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"    
 [7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"    
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"    
[19] "19"       "Buzz"     "Fizz"     "22"       "23"       "Fizz"    
[25] "Buzz"     "26"       "Fizz"     "28"       "29"       "FizzBuzz"
[31] "31"       "32"       "Fizz"     "34"       "Buzz"     "Fizz"    
[37] "37"       "38"       "Fizz"     "Buzz"     "41"       "Fizz"
...

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
# A tibble: 2 × 6
  expression      min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f2(1e+05)  151.88ms 157.65ms     6.30     1.25MB     29.9
2 f3(1e+05)     3.25s    3.25s     0.308    1.04GB     26.8

What happened?

print() returns its argument, but it additionally prints it to the standard output. This is why the mark() function printed the output of f3() before printing the timings.

As you can see, printing takes a long time.

The code in this website is run by Quarto. Since, by default, RStudio will only print the first 1,000 results, the timing you will get for f3() in RStudio will be much less bad as it won’t include the time it takes to print the remaining 99,000 results.

If you are evaluating f2() on its own (e.g. f2(20)), the returned result will also be printed to standard output and both functions will be equivalent. However, if you are using the function in another context, printing becomes an unnecessary and timely operation and f3() would be a very bad option. f3() is thus not a good function.

Here is an example in which f3() would perform a totally unnecessary operation that f2() avoids:

a <- f2(20)

No unnecessary printing.

a <- f3(20)

 [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"    
 [7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"    
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"    
[19] "19"       "Buzz"

Unnecessary printing.

For 1e5, the difference in timing between running an unnecessary printing vs not is a factor of 21!

Even worse would be to use:

f4 <- function(n) {
  for(i in 1:n) {
    if(i %% 3 == 0 && i %% 5 == 0) {
      print("FizzBuzz")
    } else if(i %% 3 == 0) {
      print("Fizz")
    } else if(i %% 5 == 0) {
      print("Buzz")
    } else {
      print(i)
    }
  }
}

Here the difference in timing is a factor of 50…

Example 2

One modulo operation and equality test can be removed by replacing i %% 3 == 0 && i %% 5 == 0 by i %% 15 == 0. The difference isn’t huge, but there is a slight speedup:

f2bis <- function(n) {
  z <- integer(n)
  for(i in 1:n) {
    if(i %% 15 == 0) {
      z[i] <- "FizzBuzz"
    } else if(i %% 3 == 0) {
      z[i] <- "Fizz"
    } else if(i %% 5 == 0) {
      z[i] <- "Buzz"
    } else {
      z[i] <- i
    }
  }
  z
}

mark(f2(n), f2bis(n))

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.

# A tibble: 2 × 6
  expression      min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f2(n)         112ms    142ms      6.75    1.15MB     28.7
2 f2bis(n)      105ms    106ms      9.44    1.22MB     34.0

Example 3

But is f2() really the fastest function?

Louis Arsenault-Mahjoubi—who attended this workshop—found ways to get rid of several operations and get a speedup of 1.7 over f2().

First, we can assign 1:n to z instead of pre-allocating memory with an empty vector, thus rendering the assignment of i to z[i] unnecessary in the last else statement:

f_louis1 <- function(n) {
  z <- 1:n
  for(i in z) {
    if(i %% 3 == 0 && i %% 5 == 0) {
      z[i] <- "FizzBuzz"
    } else if(i %% 3 == 0) {
      z[i] <- "Fizz"
    } else if(i %% 5 == 0) {
      z[i] <- "Buzz"
    } 
  }
  z
}

This function works:

f_louis1(20)

 [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"    
 [7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"    
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"    
[19] "19"       "Buzz"

… and is faster (speedup of 1.3):

mark(f2bis(n), f_louis1(n))

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.

# A tibble: 2 × 6
  expression       min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f2bis(n)     108.8ms  109.7ms      9.04    1.15MB     34.4
2 f_louis1(n)   80.7ms   82.8ms     12.1     1.15MB     39.7

Then, we can prevent the repetitions of the modulo operations and equality tests by saving them to variables:

f_louis2 <- function(n) {
  z <- 1:n
  for(i in z) {
    div3 <- (i %% 3 == 0)
    div5 <- (i %% 5 == 0)
    if(div3 && div5) {
      z[i] <- "FizzBuzz"
    } else if(div3) {
      z[i] <- "Fizz"
    } else if(div5) {
      z[i] <- "Buzz"
    } 
  }
  z
}

This gets us an even greater speedup of 1.7:

mark(f2bis(n), f_louis2(n))

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.

# A tibble: 2 × 6
  expression       min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f2bis(n)     109.3ms  112.4ms      8.69    1.15MB     34.8
2 f_louis2(n)   66.6ms   68.9ms     14.3     1.21MB     33.9

But it gets even better: we can actually get rid of the for loop!

f_louis3 <- function(n) {
  z <- 1:n
  div3 <- (z %% 3 == 0)
  div5 <- (z %% 5 == 0)
  z[which(div3)] <- "Fizz"
  z[which(div5)] <- "Buzz"
  z[which(div3 & div5)] <- "FizzBuzz"
  z
}

Now we get a speedup of 5.5 compared to our best f2 function:

mark(f2bis(n), f_louis3(n))

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.

# A tibble: 2 × 6
  expression       min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f2bis(n)       117ms    119ms      8.24    1.17MB    29.7 
2 f_louis3(n)     24ms     25ms     38.6     5.19MB     3.86

You can ensure that we still get the same result:

f_louis3(20)

 [1] "1"        "2"        "Fizz"     "4"        "Buzz"     "Fizz"    
 [7] "7"        "8"        "Fizz"     "Buzz"     "11"       "Fizz"    
[13] "13"       "14"       "FizzBuzz" "16"       "17"       "Fizz"    
[19] "19"       "Buzz"

Replace costly operations where possible

Now imagine that we have a dataframe called dat with a first column called datvar filled with integers.

We want to write a function that will accept our dataframe as argument and play the fizz buzz game on the column datvar.

One could imagine the following function:

f5 <- function(dat) {
  z <- integer(length(dat[[1]]))
  for(i in seq_along(dat[[1]])) {
    if(dat[[1]][i] %% 3 == 0 && dat[[1]][i] %% 5 == 0) {
      z[i] <- "FizzBuzz"
    } else if(dat[[1]][i] %% 3 == 0) {
      z[i] <- "Fizz"
    } else if(dat[[1]][i] %% 5 == 0) {
      z[i] <- "Buzz"
    } else {
      z[i] <- dat[[1]][i]
    }
  }
  z
}

Indexing a column from a dataframe in this fashion is a very costly operation. It is much more efficient to index with the name of the column:

f6 <- function(dat) {
  z <- integer(length(dat$datvar))
  for(i in seq_along(dat$datvar)) {
    if(dat$datvar[i] %% 3 == 0 && dat$datvar[i] %% 5 == 0) {
      z[i] <- "FizzBuzz"
    } else if(dat$datvar[i] %% 3 == 0) {
      z[i] <- "Fizz"
    } else if(dat$datvar[i] %% 5 == 0) {
      z[i] <- "Buzz"
    } else {
      z[i] <- dat$datvar[i]
    }
  }
  z
}

Now, let’s create a random dataframe to benchmark f5() and f6():

set.seed(123)
dat <- data.frame(datvar = round(runif(n, 1, n)))
mark(f5(dat), f6(dat))

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.

# A tibble: 2 × 6
  expression      min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f5(dat)       1.81s    1.81s     0.553    1.26MB     30.4
2 f6(dat)    362.63ms  369.3ms     2.71     1.26MB     27.1

Avoid repetitions of costly operations

This made a big difference (speedup of 5), but notice that we are indexing the column 6 times in our function. Let’s remove the repetition of this operation:

f7 <- function(dat) {
  var <- dat$datvar
  z <- integer(length(var))
  for(i in seq_along(var)) {
    if(var[i] %% 3 == 0 && var[i] %% 5 == 0) {
      z[i] <- "FizzBuzz"
    } else if(var[i] %% 3 == 0) {
      z[i] <- "Fizz"
    } else if(var[i] %% 5 == 0) {
      z[i] <- "Buzz"
    } else {
      z[i] <- var[i]
    }
  }
  z
}

Let’s benchmark all 3 versions:

mark(f5(dat), f6(dat), f7(dat))

Warning: Some expressions had a GC in every iteration; so filtering is
disabled.

# A tibble: 3 × 6
  expression      min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f5(dat)       1.78s    1.78s     0.562    1.15MB     30.9
2 f6(dat)    364.44ms 371.28ms     2.69     1.15MB     26.9
3 f7(dat)    151.86ms 156.22ms     5.95     1.25MB     17.9

f7() gave us another speedup of almost 3 over f6(). f7() runs 14 times faster than our initial function!

Indexing from a vector isn’t costly. There is thus no advantage at removing the repetition of that operation.

Your turn:

Show that this last statement is true.