Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Air pollution #54

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Air pollution #54

wants to merge 6 commits into from

Conversation

kmulvaney
Copy link

@kmulvaney kmulvaney commented Jul 23, 2018

I am working with Steve Smith on using GCAM to evaluate the air pollution and health co-benefits of the Paris Agreement. I have modified the gcamrpt package to pull GCAM variables that I need to create precursor emissions scenarios. The most significant changes I've made are included in the emissions module and also in the filtering function. I've also written some tests for the emissions module. Please let me know if you have any feedback for me to improve the modifications I have made. I did my best to make everything useful to someone who needs air pollution emissions from GCAM, but I am new to this sort of collaborative coding and could probably use some pointers.

Additionally, I noticed after I merged the air pollution branch with the master, that there seems to be a failed test in the transportation module involving units. I did not modify the transportation module, aside from adding a variable to the modules to support the changes I made to the filter function. I did not change the code to fix this failed test, but I can if you would like me to do so.

Added filter_operator functionality to: (1)
 run multiple filters on query separately ("OR" option); (2)  apply all filters to the same query ("AND" option)

Air pollution query modules in progress
Added modules to query air pollution precursor emissions, final energy, and primary energy.  Added more filter_operator options to filtering function.
Final modification to filtering functionality; branch now summarizes all GCAM results needed to harmonize with CEDS emissions
Corrected OR filter operator function, modified existing test scripts to run with new filter functionality, added tests for filter functions and the emissions module
@rplzzz rplzzz self-requested a review August 6, 2018 14:36
@rplzzz
Copy link
Contributor

rplzzz commented Aug 6, 2018

@kmulvaney I reran the build on the latest master, and it built cleanly, so it seems likely that your changes are causing the checks to fail. In addition to the one test that failed, there are a whole slew of warnings from the package checker due to mismatches between the code and the documentation. Here's one example:

Codoc mismatches from documentation object 'module.electricity':
module.electricity
  Code: function(mode, allqueries, aggkeys, aggfn, years, filters,
                 filter_operator, ounit)
  Docs: function(mode, allqueries, aggkeys, aggfn, years, filters,
                 ounit)
  Argument names in code not in docs:
    filter_operator
  Mismatches in argument names:
    Position: 7 Code: filter_operator Docs: ounit

You will definitely need to fix these warnings. It could be that you just forgot to commit the man pages (the *.Rd files), so check that first.

If I had to guess I'd say that the test failure is also related to adding the new argument filter_operator to all of the modules, so that would be a good place to start looking. Note that the problem could just as easily be in the test setup code as in the module per se.

Finally, adding that additional argument to the module API seems like a fairly substantial change. Did you discuss that with Steve or with someone else here at PNNL? I'm not necessarily opposed to it, but I would like to have a little discussion of what the new argument is for and why it was needed. It seems like a lot of the things we are trying to accomplish by changing the filter operator could be accomplished just as readily with regular expressions.

@kmulvaney
Copy link
Author

@rplzzz thanks for your feedback! I know how I can resolve the transportation module issue-- the master branch had the units changed slightly (upper case "T" to lower case "t") for a few transportation tests, and when I was adding in major changes from the master branch to the air pollution branch, I missed that. All of the tests now run for me successfully in R studio, so I think that's resolved.

On the documentation-- I see where I need to make those changes, I think I can do that pretty quickly. However, before I add in the "filter_operator" argument to the documentation, let me give you some more background on why I added that in.

I asked @calebbraun (who Steve Smith connected me with to help with details on the gcamrpt package, so I did not discuss this directly with Steve) about the filtering functionality that I needed to pull precursor emissions from GCAM. Here's the issue: as far as I can tell based on the master branch, the filter function included within the gcamrpt package only returns results from a GCAM query if the results meet all filter criteria provided in the variable control file. For example, the Electricity|Rooftop PV|Ridiculous variable specified in example-iiasa-variable.ctl includes two filter criteria, "(notmatches; sector; electricity), (notmatches; sector; industrial energy use)". The filter function will take those filter criteria and return results of the GCAM Electricity query that do not have a sector value called "electricity" AND do not have a sector value called "industrial energy use".

This is a problem for me because in order to create the aggregated IIASA emission variables, I need to apply filters to the GCAM query that operate on the query separately. For example, I need to create a variable called Emissions|BC|Energy|Demand|Transportation|Aviation from the GCAM results. Both domestic and international aviation emissions belong in this variable, and they are both included in the "BC emissions by subsector" GCAM query, but they have different sector and subsector labels (trn_aviation_intl and International Aviation; trn_pass and Domestic Aviation). Therefore, in order to combine emissions from these two sources in the IIASA format, I need to apply 2 filters separately to the GCAM query, so that the filter function returns results that have a subsector value called "Domestic Aviation" OR subsector value called "International Aviation".

I sent this explanation to Caleb back in June, and he told me that he didn't see a way to apply this OR functionality to the filter function with the way the code is currently written in the master branch, and that if that's the case, it could be an oversight with the function. He said that he would check with the person who wrote the filtering module and asked me to share my solution to this issue. I committed the changes to filtering.R and the rest of the package to the air pollution branch a while ago; however, I did not circle back with Caleb to get additional feedback, so that's my fault, sorry about that!

Do you have any suggestions for how we could add the OR filtering functionality that I describe above without making this major change to the function input? Caleb suggested using a filter like "(matches; subsector; (Domestic Aviation|International Aviation))" in the variable control file, but he also said that this wouldn't work because "we can’t use parenthesis in the regular expression match". Perhaps there is a work around if we modify the Regexp "filterpattern" variable (see Line 73, filtering.R, master branch) used to parse the filters? I wasn't familiar with Regexp at the time I was brainstorming solutions to this issue, so I didn't consider that a possibility. But I have since learned a bit about it because I did end up having to change "filterpattern" so that the filter function could read multiple filtering criteria.

Please let me know what you think!

@codecov
Copy link

codecov bot commented Aug 17, 2018

Codecov Report

Merging #54 into master will decrease coverage by 0.7%.
The diff coverage is 54.2%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master     #54     +/-   ##
========================================
- Coverage    55.8%   55.1%   -0.8%     
========================================
  Files          15      16      +1     
  Lines        1097    1308    +211     
========================================
+ Hits          613     721    +108     
- Misses        484     587    +103
Impacted Files Coverage Δ
R/primary_energy.R 0% <0%> (ø) ⬆️
R/final_energy.R 0% <0%> (ø)
R/electricity.R 75% <100%> (ø) ⬆️
R/runmodule.R 95% <100%> (ø) ⬆️
R/filtering.R 89.5% <100%> (+3.8%) ⬆️
R/socioeconomics.R 94.7% <100%> (ø) ⬆️
R/emissions.R 38.9% <49.4%> (+38.9%) ⬆️
R/transportation.R 77.5% <85.7%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7c1c2fd...572c809. Read the comment docs.

@kmulvaney
Copy link
Author

@rplzzz -- I pushed the transportation module/test fixes so that the air_pollution branch passes all tests. However, before I update the documentation, I wanted to get your feedback on my reasons for making the major changes to the filtering function. Let me know your thoughts.

@rplzzz
Copy link
Contributor

rplzzz commented Sep 14, 2018

@kmulvaney Sorry I haven't been responsive on this; I've been chasing a lot of other deadlines lately. I'm fairly certain I won't have time to look at this next week, but hopefully I can get to it the following week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants