Reference classes in R are very useful for some situations, but using them has a cost. In this document, I’ll explore the costs in memory and speed of standard R reference classes vs. other reference objects which are created in different ways.

library(microbenchmark)
options(microbenchmark.unit = "us")
library(pryr)  # For object_size function
library(R6)

Class definitions

Here are a number of ways of creating reference objects in R, starting with the most complicated (standard R reference class) and ending with the simplest (an environment created by a closure).

R reference class

A_rc <- setRefClass("A_rc", 
  fields = list(x = "numeric"),
  methods = list(
    initialize = function(x = 1) .self$x <<- x,
    getx = function() x,
    inc = function(n = 1) x <<- x + n
  )
)

R6 class

R6 classes are similar to R’s standard reference objects, but they are simpler.

B_r6 <- R6Class("B_r6",
  public = list(
    x = NULL,
    initialize = function(x = 1) self$x <<- x,
    getx = function() x,
    inc = function(n = 1) x <<- x + n
  )
)

Objects of this type also have an automatically-created self member:

print(B_r6$new())
#> <B_r6>
#>   Public:
#>     getx: function
#>     inc: function
#>     initialize: function
#>     self: environment
#>     x: 1

R6 class, without class attribute

By default, a class attribute is added to the objects generated by R6 classes. This attribute adds a slight performance penalty because R will use S3 dispatch when using $ on the object.

It’s possible to generate objects without the class attribute, by using class=FALSE:

C_r6_noclass <- R6Class("C_r6_noclass",
  public = list(
    x = NULL,
    initialize = function(x = 1) self$x <<- x,
    getx = function() x,
    inc = function(n = 1) x <<- x + n
  ),
  class = FALSE
)

R6 class, with public and private members

This is a variant of the previous type of reference class, but this version has public and private members.

D_r6_priv <- R6Class("D_r6_priv",
  private = list(x = NULL),
  public = list(
    initialize = function(x = 1) private$x <<- x,
    getx = function() x,
    inc = function(n = 1) x <<- x + n
  )
)

Instead of a single self object which refers to all items in an object, these objects have self (which refers to the public items) and private.

print(D_r6_priv$new())
#> <D_r6_priv>
#>   Public:
#>     getx: function
#>     inc: function
#>     initialize: function
#>     private: environment
#>     self: environment
#>   Private:
#>     x: 1
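The class definitions in this document were run with R6 1.0 (see the Appendix), where a method can refer to a field by its bare name. As a hedged sketch for readers on a later R6 release, where classes are portable by default and methods reach members through self$ and private$, a private-field counter might look like the following. The class name Counter_r6 is hypothetical and is not one of the benchmarked classes.

```r
library(R6)

# Hypothetical class, written for a later R6 release (portable classes).
Counter_r6 <- R6Class("Counter_r6",
  private = list(x = NULL),
  public = list(
    initialize = function(x = 1) private$x <- x,
    getx = function() private$x,
    inc = function(n = 1) {
      private$x <- private$x + n   # private members are reached via private$
      invisible(self)              # public members would be reached via self$
    }
  )
)

counter <- Counter_r6$new()
counter$inc(2)
counter$getx()      # 3
is.null(counter$x)  # TRUE: x is not visible through the public interface
```

Timings for a class written this way may differ from the numbers reported here, since the benchmarks were taken with the older non-portable layout.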

Environment created by a closure, with class attribute

This is simply an environment with a class attached to it.

E_closure_class <- function(x = 1) {
  inc <- function(n = 1) x <<- x + n
  getx <- function() x
  self <- environment()
  class(self) <- "D_closure"
  self
}

Even though x isn’t declared in the function body, it gets captured because it’s an argument to the function.
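To make the capture concrete, here is a minimal sketch using the same closure pattern. The constructor name make_counter is hypothetical. It shows that x lives in the returned environment and that two names bound to that environment see the same state.

```r
# Hypothetical constructor using the closure pattern described above.
make_counter <- function(x = 1) {
  inc <- function(n = 1) x <<- x + n   # modifies x in the enclosing environment
  getx <- function() x
  environment()                        # return the function's own environment
}

a <- make_counter(10)
b <- a            # b is another name for the same environment, not a copy
b$inc(5)
a$getx()          # 15: the change made through b is visible through a
ls(a)             # "getx" "inc" "x": the argument x was captured
```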

# Roundabout way to print the contents of an E object
str(as.list.environment(E_closure_class()))
#> List of 4
#>  $ self:Class 'D_closure' <environment: 0x7fdf1f10cc68> 
#>  $ getx:function ()  
#>   ..- attr(*, "srcref")=Class 'srcref'  atomic [1:8] 3 11 3 22 11 22 3 3
#>   .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fdf1f03f3f0> 
#>  $ inc :function (n = 1)  
#>   ..- attr(*, "srcref")=Class 'srcref'  atomic [1:8] 2 10 2 36 10 36 2 2
#>   .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fdf1f03f3f0> 
#>  $ x   : num 1

Objects created this way are very similar to those created by B_r6. The main difference is that those created by B_r6 contain an initialize function:

str(as.list.environment(B_r6$new()))
#> List of 5
#>  $ self      :Classes 'B_r6', 'R6' <environment: 0x7fdf1e602380> 
#>  $ inc       :function (n = 1)  
#>   ..- attr(*, "srcref")=Class 'srcref'  atomic [1:8] 6 11 6 37 11 37 6 6
#>   .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fdf1f05e740> 
#>  $ getx      :function ()  
#>   ..- attr(*, "srcref")=Class 'srcref'  atomic [1:8] 5 12 5 23 12 23 5 5
#>   .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fdf1f05e740> 
#>  $ initialize:function (x = 1)  
#>   ..- attr(*, "srcref")=Class 'srcref'  atomic [1:8] 4 18 4 45 18 45 4 4
#>   .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fdf1f05e740> 
#>  $ x         : num 1

Environment created by a closure, without class attribute

This is the simplest type of reference object:

F_closure_noclass <- function(x = 1) {
  inc <- function(n = 1) x <<- x + n
  getx <- function() x
  environment()
}

There are two differences between E and F: objects of type F don’t have a class attribute, and they don’t have a self object.
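Both differences can be checked directly. The sketch below uses two hypothetical constructors (with_self and without_self, with a made-up class name demo_closure) that mirror the E and F patterns rather than reusing the benchmarked definitions.

```r
# Hypothetical constructors mirroring the two closure patterns.
with_self <- function(x = 1) {
  getx <- function() x
  self <- environment()
  class(self) <- "demo_closure"   # hypothetical class name
  self
}

without_self <- function(x = 1) {
  getx <- function() x
  environment()
}

e <- with_self()
f <- without_self()

inherits(e, "demo_closure")                   # TRUE: class attribute present
is.null(attr(f, "class"))                     # TRUE: no class attribute
exists("self", envir = e, inherits = FALSE)   # TRUE: self is stored in the object
exists("self", envir = f, inherits = FALSE)   # FALSE: no self binding
```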


Tests

For all the timings using microbenchmark(), the results are reported in microseconds, and the most useful value is probably the median column.

Memory footprint

How much memory does a single instance of each object take, and how much memory does each additional object take?

# Utility functions for calculating sizes
obj_size <- function(expr, .env = parent.frame()) {
  # Total size of n freshly created objects; object_size counts shared parts once
  size_n <- function(n = 1) {
    objs <- lapply(seq_len(n), function(x) eval(expr, .env))
    as.numeric(do.call(object_size, objs))
  }

  # one: size of a single object; incremental: extra bytes for a second object
  data.frame(one = size_n(1), incremental = size_n(2) - size_n(1))
}

obj_sizes <- function(..., .env = parent.frame()) {
  # Capture the arguments as unevaluated expressions
  exprs <- as.list(match.call(expand.dots = FALSE)$...)
  # Label each row with the argument name, or with the deparsed expression
  names(exprs) <- lapply(seq_along(exprs),
    FUN = function(n) {
      name <- names(exprs)[n]
      if (is.null(name) || name == "") paste(deparse(exprs[[n]]), collapse = " ")
      else name
    })

  sizes <- mapply(obj_size, exprs, MoreArgs = list(.env = .env), SIMPLIFY = FALSE)
  do.call(rbind, sizes)
}

Sizes of each type of object, in bytes:

obj_sizes(
  A_rc$new(),
  B_r6$new(),
  C_r6_noclass$new(),
  D_r6_priv$new(),
  E_closure_class(),
  F_closure_noclass()
)
#>                        one incremental
#> A_rc$new()          472072        1368
#> B_r6$new()           12040         728
#> C_r6_noclass$new()   12368         672
#> D_r6_priv$new()      12608         840
#> E_closure_class()    10720         624
#> F_closure_noclass()   9288         512

It looks like using a reference class takes up a huge amount of memory, but much of that is shared between reference classes. Adding another object from a different reference class doesn’t require much more memory – around 38KB:

A_rc2 <- setRefClass("A_rc2",
  fields = list(x = "numeric"),
  methods = list(
    initialize = function(x = 2) .self$x <<- x,
    inc = function(n = 2) x <<- x * n
  )
)

# Size of a new A_rc2 object, over and above an A_rc object
as.numeric(object_size(A_rc$new(), A_rc2$new()) - object_size(A_rc$new()))
#> [1] 37688

Object instantiation speed

How much time does it take to create one of these objects? (The median time is probably the most informative.)

speed <- microbenchmark(
  A_rc$new(),
  B_r6$new(),
  C_r6_noclass$new(),
  D_r6_priv$new(),
  E_closure_class(),
  F_closure_noclass()
)
speed
#> Unit: microseconds
#>                 expr    min     lq median     uq      max neval
#>           A_rc$new() 296.00 310.00 321.00 331.00 1,200.00   100
#>           B_r6$new()  29.60  32.30  34.80  41.00    56.00   100
#>   C_r6_noclass$new()  21.30  23.80  26.10  31.20    62.50   100
#>      D_r6_priv$new()  35.20  38.30  41.90  50.60   877.00   100
#>    E_closure_class()   1.86   2.94   3.40   3.73    62.90   100
#>  F_closure_noclass()   0.79   1.34   1.55   1.85     4.55   100

R reference classes are much slower to instantiate than the other types of classes, with a median of about 0.32 milliseconds (321 microseconds) per instantiation.
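For a rough sense of scale, the medians from the table can be converted to milliseconds and expressed relative to the plain R6 class. The vector below is copied by hand from the output above. The variable names are only illustrative.

```r
# Median instantiation times in microseconds, copied from the table above.
median_us <- c(
  A_rc = 321.00, B_r6 = 34.80, C_r6_noclass = 26.10,
  D_r6_priv = 41.90, E_closure_class = 3.40, F_closure_noclass = 1.55
)

median_ms <- median_us / 1000                              # microseconds to milliseconds
ratio_to_r6 <- round(median_us / median_us[["B_r6"]], 1)   # relative to B_r6

median_ms[["A_rc"]]     # 0.321
ratio_to_r6[["A_rc"]]   # 9.2
```

These ratios describe one run on one machine, so they are best read as orders of magnitude rather than exact factors.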

Field access speed

How much time does it take to access a field in an object? First we’ll make some objects:

A <- A_rc$new()
B <- B_r6$new()
C <- C_r6_noclass$new()
D <- D_r6_priv$new()
E <- E_closure_class()
F <- F_closure_noclass()

Getting a value:

microbenchmark(
  A_rc = A$x,
  B_r6 = B$x,
  C_r6_noclass = C$x,
  D_r6_priv = D$private$x,
  E_closure_class = E$x,
  F_closure_noclass = F$x
)
#> Unit: microseconds
#>               expr   min    lq median     uq   max neval
#>               A_rc 9.260 9.890 10.200 10.500 48.60   100
#>               B_r6 1.900 2.140  2.370  2.560 35.80   100
#>       C_r6_noclass 0.172 0.281  0.339  0.411  9.24   100
#>          D_r6_priv 2.040 2.370  2.610  2.860 24.90   100
#>    E_closure_class 1.500 1.690  1.960  2.220  7.69   100
#>  F_closure_noclass 0.188 0.272  0.339  0.394  1.30   100

Setting a value:

microbenchmark(
  A_rc = A$x <- 4,
  B_r6 = B$x <- 4,
  C_r6_noclass = C$x <- 4,
  D_r6_priv = D$private$x <- 4,
  E_closure_class = E$x <- 4,
  F_closure_noclass = F$x <- 4
)
#> Unit: microseconds
#>               expr    min     lq median    uq    max neval
#>               A_rc 50.600 58.700  60.70 67.30 215.00   100
#>               B_r6  2.810  3.510   3.90  4.45   7.66   100
#>       C_r6_noclass  0.762  1.060   1.22  1.40  15.30   100
#>          D_r6_priv  5.470  6.780   7.44  8.52  41.80   100
#>    E_closure_class  2.580  3.000   3.36  3.78   9.70   100
#>  F_closure_noclass  0.715  0.933   1.05  1.28   7.08   100

The differences within the pairs B, C and E, F are due to overhead from the class attribute. Because B and E have a class attribute, R checks whether there is a $ method for their class before falling back to the default lookup. All of the objects A, B, D, and E have a class, while C and F do not.

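The standard reference class is the slowest in both tests, with a median of about 10 microseconds to get a value and about 61 microseconds to set one.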

Method call speed

How much overhead is there when calling a method from one of these objects?

microbenchmark(
  A_rc = A$getx(),
  B_r6 = B$getx(),
  C_r6_noclass = C$getx(),
  D_r6_priv = D$getx(),
  E_closure_class = E$getx(),
  F_closure_noclass = F$getx()
)
#> Unit: microseconds
#>               expr   min     lq median     uq    max neval
#>               A_rc 9.950 10.700 11.000 11.400 184.00   100
#>               B_r6 2.210  2.500  2.680  3.000  36.00   100
#>       C_r6_noclass 0.341  0.480  0.553  0.640   6.16   100
#>          D_r6_priv 2.350  2.640  2.880  3.120   8.76   100
#>    E_closure_class 1.790  2.200  2.430  2.600   8.43   100
#>  F_closure_noclass 0.356  0.484  0.554  0.608   1.64   100

As expected, method call speed is very close to the field access speed – in this case there’s just the small additional overhead of calling a function.

Standard reference classes are the slowest by a large margin.

The difference between the pairs B, C, and E, F is probably due to S3 method lookup for the $ function – there could be a $.myclass method which would be called for myclass objects.
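The lookup R performs can be seen by actually defining such a method. This is a small base R sketch with a hypothetical class myclass and a deliberately odd $ method, so that the dispatch is visible in the result.

```r
obj <- new.env()
obj$x <- 10
class(obj) <- "myclass"      # hypothetical class name

before <- obj$x              # no $.myclass method yet: default lookup, gives 10

# Hypothetical S3 method for $; the member name arrives as a character string.
`$.myclass` <- function(x, name) paste("looked up", name)

after <- obj$x               # now dispatches to $.myclass
before                       # 10
after                        # "looked up x"
get("x", envir = obj)        # 10: get() bypasses the $ method
```

Even when no method exists, R still has to search for one each time $ is used on a classed object, which is the overhead the benchmarks pick up.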

Overhead from using self

With standard reference class objects, you can modify fields using the <<- operator, or by using the .self object. For example, compare the inc() methods of these two classes:

rc_self <- setRefClass("rc_self",
  fields = list(x = "numeric"),
  methods = list(
    initialize = function(x = 1) .self$x <- x,
    inc = function(n = 1) .self$x <- x + n
  )
)

rc_no_self <- setRefClass("rc_no_self",
  fields = list(x = "numeric"),
  methods = list(
    initialize = function(x = 1) .self$x <- x,
    inc = function(n = 1) x <<- x + n
  )
)

R6 classes are similar, except they use self instead of .self:

r6_self <- R6Class("r6_self",
  public = list(
    x = 1,
    inc = function(n = 1) self$x <- x + n
  )
)

r6_no_self <- R6Class("r6_no_self",
  public = list(
    x = 1,
    inc = function(n = 1) x <<- x + n
  )
)

rc_self_obj <- rc_self$new()
rc_no_self_obj <- rc_no_self$new()
r6_self_obj <- r6_self$new()
r6_no_self_obj <- r6_no_self$new()

microbenchmark(
  rc_self = rc_self_obj$inc(),
  rc_no_self = rc_no_self_obj$inc(),
  r6_self = r6_self_obj$inc(),
  r6_no_self = r6_no_self_obj$inc()
)
#> Unit: microseconds
#>        expr   min    lq median    uq   max neval
#>     rc_self 57.50 60.10  61.70 67.20 148.0   100
#>  rc_no_self 36.00 37.80  39.00 41.20 245.0   100
#>     r6_self  5.42  6.73   7.69  8.28  18.1   100
#>  r6_no_self  2.79  3.26   3.76  4.19  11.1   100

Using .self or self adds some overhead, which makes sense when you consider how R searches for objects.

When the method accesses x without using .self, R first searches in the execution environment but doesn’t find x there, so it then searches in the parent environment, finds x there, and assigns the value.

When using .self, R searches for .self in the function’s execution environment but doesn’t find it there, so it looks in the parent environment (which also happens to be the object environment, as well as the environment that .self points to) and does find it there. Then it looks in the .self environment for x, and assigns the value.
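The two lookup paths can be imitated with a plain closure. This is only a sketch of the scoping rules described above, not the internals of reference classes or R6. The names make_obj, inc_direct and inc_self are hypothetical.

```r
# Hypothetical object built from a closure, with a self binding.
make_obj <- function(x = 1) {
  self <- environment()
  inc_direct <- function(n = 1) x <<- x + n      # one step: find x in the enclosing env
  inc_self <- function(n = 1) self$x <- x + n    # two steps: find self, then x inside it
  self
}

obj <- make_obj()

# The methods' enclosing environment is the object, and self points to it too.
identical(environment(obj$inc_direct), obj)   # TRUE
identical(obj$self, obj)                      # TRUE

obj$inc_direct()
obj$inc_self()
obj$x                                         # 3: both paths update the same x
```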

Additionally, there is some overhead when the environment has a class attribute.

r6_self_obj <- r6_self$new()
r6_no_self_obj <- r6_no_self$new()

r6_self_noclass_obj <- r6_self$new()
class(r6_self_noclass_obj) <- NULL
r6_no_self_noclass_obj <- r6_no_self$new()
class(r6_no_self_noclass_obj) <- NULL

microbenchmark(
  r6_self = r6_self_obj$inc(),
  r6_no_self = r6_no_self_obj$inc(),
  r6_self_noclass = r6_self_noclass_obj$inc(),
  r6_no_self_noclass = r6_no_self_noclass_obj$inc()
)
#> Unit: microseconds
#>                expr   min    lq median   uq   max neval
#>             r6_self 5.340 5.620   5.79 6.04 41.00   100
#>          r6_no_self 2.560 2.740   2.82 2.96 37.80   100
#>     r6_self_noclass 1.070 1.250   1.38 1.51  6.04   100
#>  r6_no_self_noclass 0.562 0.682   0.78 0.87  2.98   100

Lists vs. environments, and S3 object access overhead

This compares member access time for lists vs. environments, with and without a class attribute. If the object has a class attribute, R will use S3 method lookup for $, which adds overhead.

list_noclass <- list(x = 10)
list_class <- structure(list(x = 10), class = "foo")
env_noclass <- new.env()
env_noclass$x <- 10
env_class <- structure(new.env(), class = "foo")
env_class$x <- 10

microbenchmark(
  list_noclass = list_noclass$x,
  list_class = list_class$x,
  env_noclass = env_noclass$x,
  env_class = env_class$x
)
#> Unit: microseconds
#>          expr   min    lq median    uq   max neval
#>  list_noclass 0.177 0.208  0.245 0.284 33.60   100
#>    list_class 1.390 1.460  1.520 1.640 22.80   100
#>   env_noclass 0.176 0.226  0.286 0.308  2.67   100
#>     env_class 1.450 1.530  1.610 1.740  4.40   100

Wrap-up

R6 class objects take less memory and are faster than standard reference class objects. Reference classes do provide additional features, such as type checking of fields, but these aren’t, in my opinion, enough to offset the performance penalty and especially the issues with S4 (which reference classes are built on). Another advantage to R6 objects is that they are simpler and easier to understand than R’s reference class objects.
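As an illustration of the field type checking mentioned above, here is a minimal sketch with a hypothetical reference class Typed_rc. Assigning a value of the wrong type to a declared field raises an error, and the field keeps its previous value.

```r
library(methods)

# Hypothetical reference class with a single numeric field.
Typed_rc <- setRefClass("Typed_rc", fields = list(x = "numeric"))
obj <- Typed_rc$new(x = 1)

# Try to store a character value in the numeric field.
outcome <- tryCatch(
  {
    obj$x <- "a"
    "assigned"
  },
  error = function(e) "rejected"
)

outcome   # "rejected"
obj$x     # 1: the field is unchanged
```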

Appendix

This document was generated with:

sessionInfo()
#> R version 3.1.1 (2014-07-10)
#> Platform: x86_64-apple-darwin13.1.0 (64-bit)
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] R6_1.0               pryr_0.1             microbenchmark_1.3-0
#> 
#> loaded via a namespace (and not attached):
#>  [1] codetools_0.2-8  digest_0.6.4     evaluate_0.5.5   formatR_0.10    
#>  [5] htmltools_0.2.4  knitr_1.6        Rcpp_0.11.2      rmarkdown_0.2.49
#>  [9] stringr_0.6.2    tools_3.1.1      yaml_2.1.13