Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.6.2. - microbenchmark's multcomp reference missing #264

Open
engineerchange opened this issue May 3, 2020 · 14 comments
Open

1.6.2. - microbenchmark's multcomp reference missing #264

engineerchange opened this issue May 3, 2020 · 14 comments
Assignees

Comments

@engineerchange
Copy link
Contributor

The code snippet in 1.6.2 for microbenchmark shows cld as a column in the output. cld only shows up as a column if multcomp is already installed (see microbenchmark doc).

Earlier in the chapter, it is suggested to install microbenchmark, but not this companion package.

Snippet from 1.6.2:

library("microbenchmark")
df = data.frame(v = 1:4, name = letters[1:4])
microbenchmark(df[3, 2], df[3, "name"], df$name[3])
# Unit: microseconds
#          expr     min    lq  mean median    uq   max neval cld
#      df[3, 2]   17.99 18.96 20.16  19.38 19.77 35.14   100   b
# df[3, "name"]   17.97 19.13 21.45  19.64 20.15 74.00   100   b
#    df$name[3]   12.48 13.81 15.81  14.48 15.14 67.24   100   a 
@engineerchange engineerchange changed the title microbenchmark's multcomp reference missing 1.6.2. - microbenchmark's multcomp reference missing May 3, 2020
@engineerchange
Copy link
Contributor Author

Note the code snippet in 1.6.3.1 doesn't show the microbenchmark output with the cld column.

@Robinlovelace
Copy link
Collaborator

I suggest we switch to using the bench package. Sound reasonable @engineerchange and @csgillespie ?

@csgillespie
Copy link
Owner

@Robinlovelace Switching to bench does involve a fair bit of work, as it's used throughout the book. Also, not entirely convinced that it's "better". Thoughts?

@Robinlovelace
Copy link
Collaborator

Thoughts?

Bench tells you about memory use which is important if not vital when fixing critical performance issues. I'm a fan. No need to rush it, interested to hear what others have to say. I know @SymbolixAU, for example, has not switched to bench.

I think the +s outweigh the -s, that this is an important but not priority issue. So suggest keeping it open for now.

@dcooley
Copy link

dcooley commented May 10, 2020

I (speaking as @SymbolixAU and myself) dislike bench because the default implementation mixes units in columns

library(bench)


foo <- function() Sys.sleep(5)
bar <- function() Sys.sleep(0.1)

bench::mark(foo(), bar())

# # A tibble: 2 x 13
# expression          min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory           time     gc              
# <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>           <list>   <list>          
#   1 foo()            5s       5s     0.200        0B        0     1     0         5s <NULL> <df[,3] [0 × 3]> <bch:tm> <tibble [1 × 3]>
#   2 bar()         100ms    100ms     9.92         0B        0     5     0      504ms <NULL> <df[,3] [0 × 3]> <bch:tm> <tibble [5 × 3]>
#   

Another example is here

@engineerchange
Copy link
Contributor Author

I'm but a practitioner, but I agree with @dcooley on the above. Mixing units by default is an odd behaviour, and given that my issue got closed immediately, it doesn't appear like the maintainer(s) will change this behavior in the near future.

@engineerchange
Copy link
Contributor Author

Perhaps the compromise is to suggest bench as a benchmarking alternative in Chapter 1 when microbenchmark is introduced; I do think memory use is an important feature that shouldn't be overlooked.

@Robinlovelace
Copy link
Collaborator

Robinlovelace commented May 11, 2020

That would be quicker also than changing every instance of benchmarking. It is just a question of setting an argument so if we did switch to the bench in the longer term we could set the time_unit every time, e.g. with:

library(bench)


foo <- function() Sys.sleep(5)
bar <- function() Sys.sleep(0.1)

bench::mark(foo(), bar(), time_unit = "s")
#> # A tibble: 2 x 6
#>   expression   min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <dbl>  <dbl>     <dbl> <bch:byt>    <dbl>
#> 1 foo()      5.01   5.01      0.199     2.1KB        0
#> 2 bar()      0.100  0.100     9.97         0B        0

Created on 2020-05-11 by the reprex package (v0.3.0)

It's great to be discussing this, has already made me learn something new, I will use this argument in my use of bench going forward.

@jangorecki
Copy link

jangorecki commented Jun 13, 2020

This doesn't seems to be mentioned so far, but memory tracking in bench is not reliable. It only tracks memory allocated with R's functions (including R's C functions of course). Code is R packages don't have to use those, in fact if you need to allocate within parallel region in your C code you must not use those. That means it is not possible to measure memory with Rprof reliably, which bench use. AFAIR it uses now something more(/else?), not sure how reliable it is. When I had to measure memory usage I used cgmemtime which seems pretty accurate, but is not user friendly.
As for mixing time units by default, it can lead to really not easy to spot cases:

  expression                  min
  <bch:expr>             <bch:tm>       
1 CJ.dt(DT1, DT2, DT3)      1.52m
2 CJ.dt_1(DT1, DT2, DT3)    1.53s
3 CJ.dt_2(DT1, DT2, DT3)    1.27s
4 CJDT(DT1, DT2, DT3)       1.07s

another pluses for microbenchmark are that it is very lightweight and its API and codebase is mature and stable.

@Robinlovelace
Copy link
Collaborator

Thanks @jangorecki those are good reasons for sticking with the status quo!

@Robinlovelace
Copy link
Collaborator

On a related note, I think with dplyr 1.0.0 and recent improvements in data.table it's time to refactor https://csgillespie.github.io/efficientR/data-carpentry.html. Any suggestions on that are very welcome @jangorecki !

@jangorecki
Copy link

@Robinlovelace all looks to be up to date. I would only mention how base R changing with square bracket can be used with data.table due to its local scope.

@Robinlovelace Robinlovelace self-assigned this Aug 26, 2020
@Robinlovelace
Copy link
Collaborator

Thanks @jangorecki, going back to the original issue, the solutions would seem to be keep the cld column or remove it in every benchmark. Which do you recommend?

The code snippet in 1.6.2 for microbenchmark shows cld as a column in the output.

@jangorecki
Copy link

I would remove it, to keep things more simple.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants