
Some slight speedups of convert units #7

Open · wants to merge 9 commits into base: main
Conversation

Carter12s

While running the "help_functions_chemdose_ph" vignette, I noticed it was taking a while to run on my machine, so I started profiling some of the code. Specifically, I profiled this snippet with profvis:

profvis({
  solve_column <- raw_water %>%
    chemdose_ph_chain(input_water = "balanced_water") %>%
    cross_join(raise_ph) %>%
    solvedose_ph_once(input_water = "dosed_chem_water") %>%
    select(-c(defined_water:dosed_chem_water))
})

The baseline case on main was taking ~7.5 seconds to run on my machine:
[screenshot: profvis output for the baseline run on main]
convert_units accounted for ~2.1 seconds of that runtime, although the profiling results are fairly noisy.

After these optimizations the example is down to ~7.0 seconds, and convert_units is down to ~1.9 seconds. Not huge gains, but it's a start.
[screenshot: profvis output after the initial convert_units optimizations]

Going to continue to optimize from here, but wanted to at least show these gains were possible.

@Carter12s (Author)

Okay, I've applied a memoization approach: pre-computing the standard unit conversions and then looking them up instead of recalculating them each time.

[screenshot: profvis output with the memoized unit-conversion lookup]

With this change the same example takes 5.8s on my machine (~22% faster), and convert_units is down to 800ms of aggregate time instead of 2.1s (~61% faster).

I'm less familiar with how to distribute R data files, but it does look like storing a table of pre-computed unit conversions will give the overall package a significant speedup.
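For context, a minimal sketch of the memoization idea, assuming a cache keyed on formula plus the from/to units; the object and function names here are illustrative, not the package's actual internals, and real code would fall back to the full calculation on a cache miss.

# Precompute conversion factors once, then look them up instead of
# recalculating inside the conversion function.
unit_conversion_cache <- new.env(hash = TRUE, parent = emptyenv())

precompute_conversions <- function(molar_weights) {
  for (formula in names(molar_weights)) {
    mw <- molar_weights[[formula]]
    # mg/L -> mol/L divides by (mw * 1000); the reverse multiplies
    assign(paste(formula, "mg/L", "M", sep = "|"), 1 / (mw * 1000), envir = unit_conversion_cache)
    assign(paste(formula, "M", "mg/L", sep = "|"), mw * 1000, envir = unit_conversion_cache)
  }
}

convert_units_cached <- function(value, formula, from, to) {
  key <- paste(formula, from, to, sep = "|")
  hit <- mget(key, envir = unit_conversion_cache, ifnotfound = list(NULL))[[1]]
  if (is.null(hit)) stop("no precomputed factor for ", key)  # placeholder for the slow path
  value * hit
}

precompute_conversions(c(na = 22.99, cl = 35.45))
convert_units_cached(100, "na", "mg/L", "M")  # ~0.00435 mol/L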

@sierrajohnson (Collaborator)

This is a good idea. The first step will be to move your hash map creation into data-raw and save the hash map as an .rda file. Also consider using a named list or data frame if possible, to match the rest of the code.
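A hedged sketch of that workflow, assuming a data frame of conversion factors; the file name, object name, and example values below are illustrative only.

# data-raw/unit_conversions.R -- build the lookup table once, then save it
# as package data so it ships as an .rda file instead of being recomputed.
unit_conversions <- data.frame(
  formula = c("na", "cl"),
  from    = c("mg/L", "mg/L"),
  to      = c("M", "M"),
  factor  = c(1 / (22.99 * 1000), 1 / (35.45 * 1000))
)
usethis::use_data(unit_conversions, overwrite = TRUE)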

@sierrajohnson sierrajohnson added the enhancement New feature or request label Nov 5, 2024
@Carter12s (Author)

Carter12s commented Nov 7, 2024

I've made a few changes:

  • Updated the code to use .rda files to store the lookup tables I was using
  • I converted the first two tables to regular data frames, but after testing found that an environment was much faster for this use case (a benchmark sketch follows at the end of this comment)
  • I expanded the cached unit coverage to include all formulas in mweights instead of just the charge table, which improved the cache hit rate to ~100%

The end result:

[screenshot: profvis output after the .rda lookup-table changes]

The same example improved from 7.4s (original) to 5.4s (-27%).

The time spent in convert_units dropped from 2.1s to 0.09s (-97.5%) 🥳 🏃
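For anyone curious about the environment-vs-data-frame result, here is an illustrative micro-benchmark (the object names and sizes are hypothetical, not the package's actual tables). Environments are hashed in base R, so a keyed lookup avoids scanning every row the way a data frame filter does.

library(microbenchmark)

keys   <- paste0("formula_", 1:500)
values <- runif(500)

# environment used as a hash map
lookup_env <- new.env(hash = TRUE, parent = emptyenv())
for (i in seq_along(keys)) assign(keys[i], values[i], envir = lookup_env)

# equivalent data frame
lookup_df <- data.frame(key = keys, value = values)

microbenchmark(
  environment = get("formula_250", envir = lookup_env),
  dataframe   = lookup_df$value[lookup_df$key == "formula_250"]
)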

@Carter12s Carter12s mentioned this pull request Nov 10, 2024
@sierrajohnson sierrajohnson self-requested a review November 11, 2024 19:02
@sierrajohnson (Collaborator)

sierrajohnson commented Nov 11, 2024

Just pushed an update that should fix the checks. It moves the new .rda files so they are not exported and are only used internally by the package. If you want to run checks locally, devtools::check() runs most of the same things as our R-CMD-check action; it's a good way to see whether anything is causing warnings.
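For reference, a minimal sketch of both steps, reusing the hypothetical lookup-table object from the earlier sketch (the object name is an assumption): saving the data with internal = TRUE puts it in R/sysdata.rda, where package functions can use it without it being exported, and devtools::check() mirrors the CI workflow.

# in data-raw/: save as internal data (R/sysdata.rda) instead of exported data
usethis::use_data(unit_conversions, internal = TRUE, overwrite = TRUE)

# then run the same battery of checks as the R-CMD-check workflow
devtools::check()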

Three review comment threads on R/general.R (resolved; one marked outdated).
sierrajohnson and others added 4 commits November 11, 2024 12:58 (three co-authored by Sierra Johnson <[email protected]>)