1.6.2. - microbenchmark's multcomp reference missing #264

engineerchange · 2020-05-03T07:18:19Z

The code snippet in 1.6.2 for microbenchmark shows cld as a column in the output. cld only shows up as a column if multcomp is already installed (see microbenchmark doc).

Earlier in the chapter, it is suggested to install microbenchmark, but not this companion package.

Snippet from 1.6.2:

library("microbenchmark")
df = data.frame(v = 1:4, name = letters[1:4])
microbenchmark(df[3, 2], df[3, "name"], df$name[3])
# Unit: microseconds
#          expr     min    lq  mean median    uq   max neval cld
#      df[3, 2]   17.99 18.96 20.16  19.38 19.77 35.14   100   b
# df[3, "name"]   17.97 19.13 21.45  19.64 20.15 74.00   100   b
#    df$name[3]   12.48 13.81 15.81  14.48 15.14 67.24   100   a

The text was updated successfully, but these errors were encountered:

engineerchange · 2020-05-03T07:43:01Z

Note the code snippet in 1.6.3.1 doesn't show the microbenchmark output with the cld column.

Robinlovelace · 2020-05-03T11:57:30Z

I suggest we switch to using the bench package. Sound reasonable @engineerchange and @csgillespie ?

csgillespie · 2020-05-10T17:40:40Z

@Robinlovelace Switching to bench does involve a fair bit of work, as it's used throughout the book. Also, not entirely convinced that it's "better". Thoughts?

Robinlovelace · 2020-05-10T20:04:29Z

Thoughts?

Bench tells you about memory use which is important if not vital when fixing critical performance issues. I'm a fan. No need to rush it, interested to hear what others have to say. I know @SymbolixAU, for example, has not switched to bench.

I think the +s outweigh the -s, that this is an important but not priority issue. So suggest keeping it open for now.

dcooley · 2020-05-10T21:40:41Z

I (speaking as @SymbolixAU and myself) dislike bench because the default implementation mixes units in columns

library(bench)


foo <- function() Sys.sleep(5)
bar <- function() Sys.sleep(0.1)

bench::mark(foo(), bar())

# # A tibble: 2 x 13
# expression          min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory           time     gc              
# <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>           <list>   <list>          
#   1 foo()            5s       5s     0.200        0B        0     1     0         5s <NULL> <df[,3] [0 × 3]> <bch:tm> <tibble [1 × 3]>
#   2 bar()         100ms    100ms     9.92         0B        0     5     0      504ms <NULL> <df[,3] [0 × 3]> <bch:tm> <tibble [5 × 3]>
#

Another example is here

engineerchange · 2020-05-11T06:29:41Z

I'm but a practitioner, but I agree with @dcooley on the above. Mixing units by default is an odd behaviour, and given that my issue got closed immediately, it doesn't appear like the maintainer(s) will change this behavior in the near future.

engineerchange · 2020-05-11T06:33:44Z

Perhaps the compromise is to suggest bench as a benchmarking alternative in Chapter 1 when microbenchmark is introduced; I do think memory use is an important feature that shouldn't be overlooked.

Robinlovelace · 2020-05-11T08:14:09Z

That would be quicker also than changing every instance of benchmarking. It is just a question of setting an argument so if we did switch to the bench in the longer term we could set the time_unit every time, e.g. with:

library(bench)


foo <- function() Sys.sleep(5)
bar <- function() Sys.sleep(0.1)

bench::mark(foo(), bar(), time_unit = "s")
#> # A tibble: 2 x 6
#>   expression   min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <dbl>  <dbl>     <dbl> <bch:byt>    <dbl>
#> 1 foo()      5.01   5.01      0.199     2.1KB        0
#> 2 bar()      0.100  0.100     9.97         0B        0

^{Created on 2020-05-11 by the reprex package (v0.3.0)}

It's great to be discussing this, has already made me learn something new, I will use this argument in my use of bench going forward.

jangorecki · 2020-06-13T21:28:59Z

This doesn't seems to be mentioned so far, but memory tracking in bench is not reliable. It only tracks memory allocated with R's functions (including R's C functions of course). Code is R packages don't have to use those, in fact if you need to allocate within parallel region in your C code you must not use those. That means it is not possible to measure memory with Rprof reliably, which bench use. AFAIR it uses now something more(/else?), not sure how reliable it is. When I had to measure memory usage I used cgmemtime which seems pretty accurate, but is not user friendly.
As for mixing time units by default, it can lead to really not easy to spot cases:

  expression                  min
  <bch:expr>             <bch:tm>       
1 CJ.dt(DT1, DT2, DT3)      1.52m
2 CJ.dt_1(DT1, DT2, DT3)    1.53s
3 CJ.dt_2(DT1, DT2, DT3)    1.27s
4 CJDT(DT1, DT2, DT3)       1.07s

another pluses for microbenchmark are that it is very lightweight and its API and codebase is mature and stable.

Robinlovelace · 2020-06-14T08:41:19Z

Thanks @jangorecki those are good reasons for sticking with the status quo!

Robinlovelace · 2020-06-14T08:43:21Z

On a related note, I think with dplyr 1.0.0 and recent improvements in data.table it's time to refactor https://csgillespie.github.io/efficientR/data-carpentry.html. Any suggestions on that are very welcome @jangorecki !

jangorecki · 2020-08-26T06:47:34Z

@Robinlovelace all looks to be up to date. I would only mention how base R changing with square bracket can be used with data.table due to its local scope.

Robinlovelace · 2020-08-26T08:07:52Z

Thanks @jangorecki, going back to the original issue, the solutions would seem to be keep the cld column or remove it in every benchmark. Which do you recommend?

The code snippet in 1.6.2 for microbenchmark shows cld as a column in the output.

jangorecki · 2020-08-26T08:21:54Z

I would remove it, to keep things more simple.

engineerchange changed the title ~~microbenchmark's multcomp reference missing~~ 1.6.2. - microbenchmark's multcomp reference missing May 3, 2020

engineerchange mentioned this issue May 10, 2020

Consistent time units (by default) r-lib/bench#80

Closed

Robinlovelace self-assigned this Aug 26, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

1.6.2. - microbenchmark's multcomp reference missing #264

1.6.2. - microbenchmark's multcomp reference missing #264

engineerchange commented May 3, 2020

engineerchange commented May 3, 2020

Robinlovelace commented May 3, 2020

csgillespie commented May 10, 2020

Robinlovelace commented May 10, 2020

dcooley commented May 10, 2020 •

edited

Loading

engineerchange commented May 11, 2020

engineerchange commented May 11, 2020

Robinlovelace commented May 11, 2020 •

edited

Loading

jangorecki commented Jun 13, 2020 •

edited

Loading

Robinlovelace commented Jun 14, 2020

Robinlovelace commented Jun 14, 2020

jangorecki commented Aug 26, 2020

Robinlovelace commented Aug 26, 2020

jangorecki commented Aug 26, 2020

1.6.2. - microbenchmark's multcomp reference missing #264

1.6.2. - microbenchmark's multcomp reference missing #264

Comments

engineerchange commented May 3, 2020

engineerchange commented May 3, 2020

Robinlovelace commented May 3, 2020

csgillespie commented May 10, 2020

Robinlovelace commented May 10, 2020

dcooley commented May 10, 2020 • edited Loading

engineerchange commented May 11, 2020

engineerchange commented May 11, 2020

Robinlovelace commented May 11, 2020 • edited Loading

jangorecki commented Jun 13, 2020 • edited Loading

Robinlovelace commented Jun 14, 2020

Robinlovelace commented Jun 14, 2020

jangorecki commented Aug 26, 2020

Robinlovelace commented Aug 26, 2020

jangorecki commented Aug 26, 2020

dcooley commented May 10, 2020 •

edited

Loading

Robinlovelace commented May 11, 2020 •

edited

Loading

jangorecki commented Jun 13, 2020 •

edited

Loading