Investigating forecast year bug not caught by jenkins #424
Replies: 10 comments
-
This is a good topic. I do not think that blanketing the problem with more testing
examples will solve it. I think the solution should rely more upon unit testing of
features. I do a lot of this while developing new features: I create a specific
example that allows me to iteratively change inputs relevant to that feature and
examine the results. It's all ad hoc and rather undocumented, except for the
occasional note that I put into the Issue commentary.
A story to illustrate. Right now I am working on development of Mark Maunder's new
natural mortality option, which relies upon having access to the age-maturity
vector. This got me into the need to change the order of operations for maturity
relative to natural mortality. That is a very far-reaching change, so if I broke
something, our current Jenkins test should catch it. So fine. But I also have had to
tweak how and when age-maturity is calculated (previously only in report code), and
I realized that the best place to calculate it is in the age-fecundity routine.
While in the age-fecundity routine, I saw that the code for age-fecundity was
complex because, in integrating length fecundity with the age-length key, it was
invoking the arrays created to dynamically hold the range of lengths for each age.
But we never use that feature (ALK tolerance from the starter file), so I removed
that possibility (for now). I'll put a note in the Issue log, but a change like this
is what creates testing trouble, because one of the examples might be using ALK
tolerance.
So, I will restore the complex code for fecundity; create a new issue to look
specifically at that topic in the future; and continue with a specific testbed to
examine the performance of the new feature.
This leads to an obvious question: which such testbeds need archiving, and how and
where?
-
Thanks, Rick! A while back I was trying to figure out how to harness the power of the one-off tests you create while developing; I put some of them in this Google Drive folder: https://drive.google.com/drive/u/0/folders/1chN-Q1AsJcn99vaag98dkE3TuGvc6k6u. I'm not sure if this is the best or easiest way to archive them, but that is one idea? As far as using those tests, I'm honestly just not sure how to transform them into an automated testing routine. While there are unit tests for features, they aren't unit tests for functions, which makes it a little tricky; I think we would still have to stick with running the full model and a standardized comparison? And your story illustrates how everything can be intertwined in SS... But again, maybe I'm just not thinking creatively enough about how to use these in automated testing...
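For concreteness, here is a minimal sketch of the "run the full model and compare" step described above. It assumes the reference and candidate runs sit in sibling folders, and that r4ss::SS_output() returns a derived_quants table with Label/Value columns (column names have changed between r4ss versions); the folder names and the tolerance are illustrative only.

```r
library(r4ss)

ref <- SS_output("models/simple/reference", verbose = FALSE, printstats = FALSE)
new <- SS_output("models/simple/candidate", verbose = FALSE, printstats = FALSE)

# join the derived quantities from the two runs and compute relative differences
comp <- merge(
  ref$derived_quants[, c("Label", "Value")],
  new$derived_quants[, c("Label", "Value")],
  by = "Label", suffixes = c("_ref", "_new")
)
comp$rel_diff <- abs(comp$Value_new - comp$Value_ref) /
  pmax(abs(comp$Value_ref), 1e-10)

# flag anything that moved by more than 0.01 percent
flagged <- comp[comp$rel_diff > 1e-4, ]
print(flagged[order(-flagged$rel_diff), ])
```

The catch discussed in this thread is exactly the choice of that tolerance: set it tight enough to catch a ~0.1% forecast change and it will also flag harmless numerical noise for manual review.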
-
I should also say, rather than adding more models to the test set, I was thinking of modifying the forecast files of existing test set models so that they better cover the forecasting options. This is sort of in the "unit testing features" mindset, I think...
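A hedged sketch of what that could look like, using r4ss's SS_readforecast()/SS_writeforecast(). The list element names used here ($Forecast, $Nforecastyrs) and the folder paths are assumptions to check against the r4ss documentation, and the modified forecast file would still need the rest of the model files copied alongside it.

```r
library(r4ss)

# read the forecast file from an existing test model
fore <- SS_readforecast("test-models/vermillion_snapper/forecast.ss",
                        verbose = FALSE)

# exercise a different forecast configuration than the original model used
fore$Forecast <- 2        # e.g., switch the forecast option
fore$Nforecastyrs <- 10   # and lengthen the forecast period

# write it into a copy of the model folder so both variants stay in the test set
out_dir <- "test-models/vermillion_snapper_forecast2"
dir.create(out_dir, showWarnings = FALSE)
SS_writeforecast(fore, dir = out_dir, overwrite = TRUE, verbose = FALSE)
```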
-
That folder is good and I should use it; we can improve it once we start using it.
The shortcoming of the current test routine is that it doesn't test the newly
developed feature in the context of all the other features found in the examples,
because those are old examples that have not implemented the new feature. Then once
you implement the new feature, of course the results will change.
-
Ah, good point, I see what you mean now, Rick, about new features not being tested in the automated routine. We could instead put the tests in a repository rather than Google Drive, so that they could be pulled in by an automated testing routine? But the number of models, and probably the associated run times, would increase pretty quickly... This seems like a good path to explore, though. We basically would get these tests "for free" since you are already developing them.
-
Free, but huge over time, so slow to run, and hard to maintain if some future change
requires a mandatory input change for all models.
How about a collection of test routines: one to run through all the growth options,
one to run through all the selectivity options, etc.? Then testing could be directed
towards the feature that was most impacted. This would complement, but not replace,
a standard test set.
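As a rough illustration of how such feature-directed collections could be wired up (the folder layout, the "ss3" executable name, the Unix-style shell call, and the success check are all assumptions, not an existing convention):

```r
# map each feature area to its own set of small testbed models
feature_testbeds <- list(
  growth      = list.dirs("testbeds/growth", recursive = FALSE),
  selectivity = list.dirs("testbeds/selectivity", recursive = FALSE),
  forecast    = list.dirs("testbeds/forecast", recursive = FALSE)
)

run_testbed_set <- function(dirs, exe = "ss3") {
  for (d in dirs) {
    # run without the Hessian to keep the per-feature checks fast
    system(paste("cd", shQuote(d), "&&", exe, "-nohess"))
    # minimal success check: the model produced a report file
    stopifnot(file.exists(file.path(d, "Report.sso")))
  }
}

# a selectivity change only needs to exercise the selectivity collection
run_testbed_set(feature_testbeds$selectivity)
```

The standard test set would still run on every change; these collections would only be pulled in when the corresponding feature code is touched.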
-
Rick, sorry to hear about the complexities you're discovering with the new M option, but it's good that they are helping us think about ways of making this process better. Now that Kathryn has improved the r4ss functions for reading and writing model input files, it would be relatively straightforward for any new feature to go through our set of test models and turn on that feature in all of them. However, this would just be a minimal test of whether the model could run, because any new feature (like the new M option) which had an impact on the model results would cause lots of differences from our reference output values, and there's no easy way to confirm that those differences were intended.

The mention of the "ALK tolerance" option brings up another worthwhile topic (which could be discussed separately if that would be better). As we've discussed before, the development of SS hasn't had a research track separate from the production track. If it had, the ALK tolerance could have been explored in a research context to confirm that the extra complexity had enough payoff to be worth including. Rick, I'm sure you did enough testing at the time you added that option to decide that it was worth including, but of course you couldn't foresee the situation you're now facing. I couldn't track down any email discussion that we surely had about the feature when it was first added, but I do remember creating the attached r4ss figure, which has been useful for other purposes. However, that figure shows long tails in spite of a 0.001 tolerance added to the "Simple" example model, and the AGE_LENGTH_KEY is indeed showing values well below 0.001, so maybe the option isn't working right anyway. Therefore, turning off the ALK tolerance option seems like a totally reasonable step at this point to make implementing the new M option easier.
-
Thanks for the reminder, Ian. I do recall that, which is why, when I encountered the code in one place still using arrays relevant to the ALK_tolerance option, I thought I might simplify. The big issue with this new M option is the order of operations for maturity, growth, and M. I think I will separate the maturity code out of the WtLen function to give me more flexibility. Messy stuff could still happen in a multi-season model in which SS needs M in season 1, but that M depends on maturity that is calculated only in spawn_season, which is season 2 of the previous year. Then with settlements, morphs, and platoons with different length-at-age by season, maturity-at-age due to length maturity changes for each of them by season; etc.
-
ss3sim might be helpful here in the research mode: we could have a single set of files for a generic OM, run the model with the feature turned off 100 times, then turn the feature on to an extreme and run it again 100 times. This would basically be the same as using the SS bootstrap capability to generate data sets, except that ss3sim has an easy structure for changing the input files and summarizing the results across replicates between the OM and the EM.
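For reference, a rough sketch of the bootstrap-based equivalent mentioned above (ss3sim packages this kind of workflow up much more cleanly). It assumes a single OM folder whose starter file already requests bootstrap data files, so data.ss_new holds bootstrap sections starting at section 3; an EM folder with its own control/starter/forecast files that expect a data file named data.ss; and an "ss3" executable on the PATH. All folder names and the replicate count are illustrative only.

```r
library(r4ss)

n_reps <- 100
for (i in seq_len(n_reps)) {
  # pull the i-th bootstrap data set out of the OM's data.ss_new
  boot <- SS_readdat("om/data.ss_new", section = i + 2, verbose = FALSE)

  # set up an EM folder for this replicate and drop the bootstrap data into it
  rep_dir <- file.path("em_runs", sprintf("rep%03d", i))
  dir.create(rep_dir, recursive = TRUE, showWarnings = FALSE)
  file.copy(c("em/control.ss", "em/starter.ss", "em/forecast.ss"), rep_dir)
  SS_writedat(boot, file.path(rep_dir, "data.ss"), overwrite = TRUE)

  system(paste("cd", shQuote(rep_dir), "&& ss3 -nohess"))
}
# repeat with the feature switched on in the OM, then compare the two sets of
# estimates across replicates
```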
-
That's a good point, Kelli!
-
Opening a team discussion rather than posting on #130 as this is somewhat of a tangent.
Only the vermillion_snapper model in our testing set used a negative value in the benchmark years; there was indeed a small change in forecasting after fixing the bug (as Ian predicted there would be: #130 (comment)), but it was too small to be flagged by the testing routine (~0.1%).
Not sure what to do about this finding, as I think flagging small changes like this will cause more manual inspection of the results and catch few problems. Perhaps the right approach is to try to expand coverage of SS model features? For example, this would be pretty easy to do for the forecasting period by modifying forecasting files of models already in the test set. Trying to expand coverage in the main years of the model would be trickier.
If anyone has ideas, I would be glad to hear them!