Memory management

Author

Marie-Hélène Burle

Memory can be a limiting factor and releasing it when not needed can be critical to avoid out of memory states. On the other hand, memoisation is an optimization technique which stores the results of heavy computations for re-use at the cost of increasing memory usage.

Memory and speed are thus linked in a trade-off.

Releasing memory

It is best to avoid creating very large intermediate objects (e.g. with nested functions or functions chained with the magrittr pipe), but if you must, remove them from the global environment with rm() when you don’t need them anymore. Once all the pointers to an object in memory are deleted, the garbage collector will clear its value and release the memory it used.

Another way to release the memory used by heavy intermediate objects is with functions: if you create those objects in the local environment of a function (instead of directly in the global environment), they will be cleared from memory as soon as the function has finished running.

Note that in the case of a very large function, it might still be beneficial to run rm() inside the function to clear the memory for other processes coming next within that function. But this is a pretty rare case.

Caching in memory

Memoisation is a technique by which the results of heavy computations are stored in memory to avoid have to re-calculate them. This can be convenient in a variety of settings (e.g. to reduce calls to an API), but mostly, it can greatly improve the efficiency of some code such as recursive function calls.

Let’s consider the calculation of the Fibonacci numbers as an example. Those numbers form a sequence starting with 0 and 11, after which each number is the sum of the previous two.

  • 1 Alternative versions have the sequence start with 1, 1 or with 1, 2.

  • Here is a function that would return the nth Fibonacci number2:

  • 2 There are more efficient ways to calculate the Fibonacci numbers, but this inefficient function is a great example to show the advantage of memoisation.

  • fib <- function(n) {
      if(n == 0) {
        return(0)
      } else if(n == 1) {
        return(1)
      } else {
        Recall(n - 1) + Recall(n - 2)
      }
    }

    It can be written more tersely as:

    fib <- function(n) {
      if(n == 0) return(0)
      if(n == 1) return(1)
      Recall(n - 1) + Recall(n - 2)
    }

    Recall() is a placeholder for the name of the recursive function. We could have used fib() instead, but Recall() is more robust as it allows for function renaming.

    Memoisation is very useful here because, for each Fibonacci number, we need to calculate the two preceding Fibonacci numbers and to calculate each of those we need to calculate the two Fibonacci numbers preceding that one and to calculate… etc. That is a large number of calculations, but, thanks to caching, we don’t have to calculate any one of them more than once.

    The packages R.cache and memoise both allow for memoisation with an incredibly simple syntax.

    Applying the latter to our function gives us:

    library(memoise)
    
    fibmem <- memoise(
      function(n) {
        if(n == 0) return(0)
        if(n == 1) return(1)
        Recall(n - 1) + Recall(n - 2)
      }
    )

    We can do some benchmarking to see the speedup for the 30th Fibonacci number:

    library(bench)
    
    n <- 30
    mark(fib(n), fibmem(n))
    Warning: Some expressions had a GC in every iteration; so filtering is
    disabled.
    # A tibble: 2 × 6
      expression      min   median `itr/sec` mem_alloc `gc/sec`
      <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
    1 fib(n)         1.6s     1.6s     0.623    34.1KB     21.8
    2 fibmem(n)    36.7µs   38.2µs 24465.        429KB     17.1

    The speedup is over 35,000!