```
> class(v)
[1] "numeric"
> length(v)
[1] 80373285   # 80 million
```
The entries of `v` are integers uniformly distributed between 0 and 100.
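For anyone who wants to reproduce the setup, a stand-in vector with the same class, length, and value range can be simulated like this (a sketch; the real data come from elsewhere, so the seed and `sample` call are placeholders, not my actual source):

```
set.seed(1)  # arbitrary seed for a reproducible stand-in vector
v <- as.numeric(sample(0:100, 80373285, replace = TRUE))  # doubles, like the real v
```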
```
> ptm <- proc.time()
> tv <- table(v)
> show(proc.time() - ptm)
   user  system elapsed
 96.902   0.807  97.761
```
Why is the `table` function so slow on this vector? Is there a faster function for this simple operation?
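My working theory, from reading the source of `factor()` (which `table()` applies to any non-factor argument), is that most of the time goes into `as.character()` being run over all 80 million doubles before any counting starts. That one step can be timed on its own (a sketch; I have not profiled `table()` line by line):

```
## Time just the numeric-to-character conversion that factor() performs
## internally; if this alone accounts for most of the ~97 s, the slowness
## is the coercion, not the counting.
ptm <- proc.time()
ch <- as.character(v)
show(proc.time() - ptm)
```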
By comparison, the `bigtable` function from `bigtabulate` is fast:
```
> library(bigtabulate)
> ptm <- proc.time() ; bt <- bigtable(x = matrix(v, ncol = 1), ccols = 1) ; show(proc.time() - ptm)
   user  system elapsed
  4.163   0.120   4.286
```
While `bigtabulate` is a good solution, it seems unwieldy to resort to a special package just for this simple operation. There is also some overhead, because I have to contort the vector into a one-column matrix to make it work with `bigtable`. Shouldn't there be a simpler, faster solution in base R?
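One base-R candidate I am looking at is `tabulate()`, which counts occurrences of small positive integers directly and skips the factor conversion entirely. A sketch, leaning on the assumption stated above that the entries are exact integers between 0 and 100:

```
## Shift the value range 0..100 onto tabulate()'s bins 1..101 and count.
vi <- as.integer(v)                       # safe here: values are exact integers
counts <- tabulate(vi + 1L, nbins = 101L)
names(counts) <- 0:100                    # label the bins the way table() would
```

This is obviously less general than `table()`, since it leans on the known value range, but for data of this shape it avoids all character handling.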
For what it's worth, the base `cumsum` function is extremely fast even on this long vector:
```
> ptm <- proc.time() ; cs <- cumsum(v) ; show(proc.time() - ptm)
   user  system elapsed
  0.097   0.117   0.214
```
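That contrast suggests a single vectorised pass over `v` is cheap, and that `table`'s cost lies in the coercion rather than in the counting itself. A quick way to check on your own machine (a sketch; timings will vary):

```
## Three single passes over v, none of which build character data; if the
## coercion theory is right, all three should be in cumsum's ballpark.
system.time(cs <- cumsum(v))
system.time(vi <- as.integer(v))
system.time(ct <- tabulate(vi + 1L, nbins = 101L))
```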