Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modifying the workflow to enable the use of different methods (LHS, Sobol) allowing for scenario discovery applications #13

Draft
wants to merge 35 commits into
base: envs
Choose a base branch
from

Conversation

AgnesBelt
Copy link
Collaborator

Hi @willu47, I have now started trying to modify the workflow to enable the option to choose between different methods for generating the model runs, so that I can opt for LHS or Sobol in case I am interested in doing Scenario Discovery.
I am sure there is plenty of mistakes, but I have tried to use the following approach:

  1. Modify the config file to allow for selecting a sampling method different from Morris
  2. Modify the snakemake file to consider the method options and adjust the folder creation structure based on the number of model runs expected with each method.
  3. Create new create_sample.py scripts, one for LHS and one for Sobol, to generate the input data files needed for the runs.
  4. modify the sample.smk file to add if statement for redirecting the create_sample rule to the related script pertaining to the method chosen in the config file.

Would be great to look into it together at our next meeting.

…ods for scenario discovery applications - depending on the method selected, the folder structure will be generated according to the number of scenarios generated
…iles depending on the sampling method of interest, to allow for scenario discovery applications
…thods - i.e. LHS, Sobol respectively. Need for testing if they are defined correctly
@AgnesBelt AgnesBelt requested a review from willu47 April 15, 2024 15:49
@AgnesBelt
Copy link
Collaborator Author

Thanks to support from @HauHe, @willu47, and @camiloramirezgo, we seem to have now managed to fix the bugs I was encountering and the workflow should now be able to use different sampling methods (Morris, LHS, Sobol) to execute the workflow and produce results.

This should allow for the use of the esom_gsa workflow also in scenario discovery analysis.

Still pending: after having computed all scenarios and generated the results, the workflow proceed to compute some sensitivity analysis and fails to do so when LHS or Sobol are selected as methods.
If I am not mistaken, this could be skipped for SD applications with LHS and Sobol methods. So, I might still look into it to clean up the process.

@AgnesBelt
Copy link
Collaborator Author

Current issue still pending:
Screenshot 2024-04-17 at 17 39 39

…llow for using the user-specific choice of starting year of interpolation, as indicated in the config file under start_year
…ar of interpolation and pass it on to the create_modelrun.py script
… files to allow for using the user-specific choice of starting year of interpolation, as indicated in the config file under start_year

this is now working also for parameters with no interpolation_index
@AgnesBelt
Copy link
Collaborator Author

I have now added also the option to specific a start_year for the interpolation, to allow for changing the start interpolation year to a year in the middle of the modelling period. This is of use for GLUCOSE SD application, but might be also useful for other cases.

Copy link
Contributor

@willu47 willu47 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest combining the create_sample_*.py scripts into one, with one function per method, and then pass in the method, otherwise there is a fair amount of duplicate code and three files instead of one.

I found a few issues

  • one with len(PARAMETERS) not working (wrong number of modelruns created when using Sobol)
  • suggest using 2**replicates rather than replicates**2, as only the first is guaranteed to be a power of 2
  • non integer type of start_year was causing problems in the create_modelrun.py script

@@ -2,25 +2,32 @@

# Populate the scenarios.csv file with a list of scenario names
# and path (description optional) to the model csv data
scenarios: config/scenarios.csv
scenarios: resources/glucose/scenarios_glucose.csv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't modify this e.g. revert to config/scenarios.csv


# Tell the workflow which model results to plot
results: config/results.csv
results: resources/glucose/results.csv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert to config/results.csv


# Filetype options: 'csv' or 'parquet' or 'feather'
filetype: csv

# Define the uncertain parameters used to define the Monte Carlo sample
parameters: config/parameters.csv
parameters: resources/glucose/parameters_glucose.csv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert to config/parameters.csv


# Path to the OSeMOSYS model file
model_file: resources/osemosys_fast.txt
model_file: resources/glucose/osemosys_fast.txt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert to resources/osemosys_fast.txt

config/config.yaml Outdated Show resolved Hide resolved
elif METHOD == 'LHS':
MODELRUNS = range(config['replicates'])
elif METHOD == 'Sobol':
MODELRUNS = range((config['replicates']**2) * (len(PARAMETERS) + 2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

len(PARAMETERS) is always 1, you need:

PARAMETERS = pd.read_csv(config['parameters'])['name'].to_numpy()

and

MODELRUNS = range((2 ** REPLICATES) * (PARAMETERS.shape[0] + 2))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed partially, I did not see the reason for using PARAMETERS.shape[0] as the PARAMETERS have just one dimension. I also kept the PARAMETERS definition linked to the 'indexes', is there any specific reason why I should switch to the 'name'?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the fastest way would be PARAMETERS = len(pd.read_csv(config['parameters']))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other issue is that this ignores grouping of parameters...

workflow/rules/osemosys.smk Outdated Show resolved Hide resolved
workflow/scripts/create_sample_Sobol.py Outdated Show resolved Hide resolved
workflow/scripts/create_modelrun.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants