Skip to content

Commit

Permalink
checkpoint lookup in input function
Browse files Browse the repository at this point in the history
to dynamically set the number of threads (and amount of memory) for
certain rules (create_power_network and estimate_wind_fields) we use a
dynamically generated CSV file with the number of targets in each
country

this file must be present for rules to allocate resources

use the checkpoint lookup in an input function to guarantee its presence

previously, we had removed the checkpoint lookup from the
`threads_for_country` function as per the docs, but this does not seem
to work (snakemake error on comparison of thread count integer and
'TBDString')

maybe this whole scaling resources is too much complexity, but trying to
run a global analysis without it is also inefficient at the
aforementioned rules...
  • Loading branch information
thomas-fred committed Nov 1, 2023
1 parent b96b2eb commit 4e8d321
Show file tree
Hide file tree
Showing 2 changed files with 19 additions and 13 deletions.
2 changes: 2 additions & 0 deletions workflow/rules/exposure/wind_fields.smk
Original file line number Diff line number Diff line change
Expand Up @@ -201,6 +201,8 @@ rule estimate_wind_fields:
Optionally plot wind fields and save to disk
"""
input:
# `threads_for_country` will fail unless this CSV is present when resources are set
country_target_count=country_target_count_path,
storm_file=storm_tracks_by_country,
wind_grid="{OUTPUT_DIR}/power/by_country/{COUNTRY_ISO_A3}/storms/wind_grid.tiff",
downscaling_factors=rules.create_downscaling_factors.output.downscale_factors,
Expand Down
30 changes: 17 additions & 13 deletions workflow/rules/preprocess/create_network.smk
Original file line number Diff line number Diff line change
Expand Up @@ -234,19 +234,8 @@ def threads_for_country(wildcards) -> int:
Thread allocation
"""

# we don't use a checkpoint here, despite depending on this CSV existing

# from the docs...
# You don’t need to use the checkpoint mechanism to determine parameter or
# resource values of downstream rules that would be based on the output of
# previous rules. In fact, it won’t even work because the checkpoint
# mechanism is only considered for input functions. Instead, you can simply
# use normal parameter or resource functions that just assume that those
# output files are there. Snakemake will evaluate them immediately before
# the job is scheduled, when the required files from upstream rules are
# already present.
# https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#data-dependent-conditional-execution
ranked = pd.read_csv(f"{wildcards.OUTPUT_DIR}/power/target_count_by_country.csv")
ranking_file = checkpoints.rank_countries_by_target_count.get(**wildcards).output.lookup_table
ranked = pd.read_csv(ranking_file)

ranked["threads"] = logistic_min(
ranked.index, # input to transform
Expand All @@ -266,11 +255,26 @@ def threads_for_country(wildcards) -> int:
return max([1, n_threads])


def country_target_count_path(wildcards) -> str:
"""
We depend on the file (path) returned by this function to allocate
resources (number of CPUs and therefore memory) to certain rules. Those
rules use `threads_for_country` as a function returning a value for the
`threads` parameter. However those rules must also include this function as
an `input` function, to ensure the CSV data file is available for
`threads_for_country` to execute. Having the checkpoint lookup within
`threads_for_country` is not, on its own, sufficient.
"""
return checkpoints.rank_countries_by_target_count.get(**wildcards).output.lookup_table


rule create_power_network:
"""
Combine power plant, consumer and transmission data for given area
"""
input:
# `threads_for_country` will fail unless this CSV is present when resources are set
country_target_count=country_target_count_path,
plants="{OUTPUT_DIR}/power/by_country/{COUNTRY_ISO_A3}/network/powerplants.geoparquet",
targets="{OUTPUT_DIR}/power/by_country/{COUNTRY_ISO_A3}/network/targets.geoparquet",
gridfinder="{OUTPUT_DIR}/power/by_country/{COUNTRY_ISO_A3}/network/gridfinder.geoparquet",
Expand Down

0 comments on commit 4e8d321

Please sign in to comment.