The current regression testing framework is useful for development that doesn't alter simulation results. From time to time, however, it is necessary to make a change to the core SWMM engine to add a new feature, improve computational stability, or make the model more physically realistic. Currently, these scenarios cause our regression testing framework to fail. Simply making a change to the output report format to make it more useful and informative also causes test failures.
We need to provide testing tools that allow an engine developer to evaluate fundamental changes to the engine that result in a "benchmark shift." This could involve using different criteria for comparing models. Our current testing criterion basically checks whether the results are the same within a specified tolerance. @LRossman has suggested some additional criteria we could consider (a rough comparison sketch follows the list):
Compare overall flow and mass continuity errors.
Compare selected variables in the Summary Results tables (after SWMM sorts them from high to low).
Compare time series plots of total system inflow, outflow, flooding and storage.
Compare time series plots of inflow to the system’s outfalls.
Compare the links with the Highest Flow Instability Indexes.
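As a rough illustration of what the time-series comparisons in this list could look like, here is a minimal Python sketch. It assumes the system-wide series (total inflow, outflow, flooding, storage) have already been extracted from the benchmark and candidate runs into NumPy arrays at the same reporting time steps; the function names, dictionary keys, and tolerance value are hypothetical and not part of the existing test framework.

```python
import numpy as np

def nrmse(benchmark, candidate):
    """Normalized root-mean-square error between two aligned time series."""
    benchmark = np.asarray(benchmark, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    rmse = np.sqrt(np.mean((candidate - benchmark) ** 2))
    scale = np.ptp(benchmark) or 1.0  # guard against a flat benchmark series
    return rmse / scale

def compare_system_series(benchmark_series, candidate_series, tol=0.02):
    """Score each system-wide series and flag shifts larger than tol.

    benchmark_series / candidate_series: dicts mapping a variable name
    (e.g. 'total_inflow', 'total_outflow', 'flooding', 'storage') to a
    1-D array sampled at the same reporting time steps.
    """
    report = {}
    for name, bench in benchmark_series.items():
        err = nrmse(bench, candidate_series[name])
        report[name] = {"nrmse": err, "acceptable": err <= tol}
    return report
```

A comparison like this tolerates small numerical drift while still surfacing a genuine shift in system behavior, which is closer to the intent of a benchmark-shift review than an element-by-element equality check.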
Further thoughts contributed by Lew, paraphrased here: when a change is made to a core SWMM engine procedure, the expectation should not be that the new results exactly equal the old ones. Comparison against a benchmark is at best a useful surrogate for the quality of model results, since for these kinds of problems there is no “theoretically correct” solution to compare against. Instead, we should make sure that any differences in results introduced in the course of development are “reasonably small.” A new method should be evaluated to ensure that it produces a solution that is clearly more physically meaningful than the old one. The method's implementation should also be evaluated to ensure that continuity errors and model stability are improved.
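Turning those criteria into something executable, a benchmark-shift check might look at run summary statistics rather than raw result values. The sketch below is purely illustrative: it assumes the continuity errors and a count of highly unstable links have already been parsed from the report files of both runs, and none of the field names or margins come from the existing framework.

```python
def evaluate_engine_change(bench, cand,
                           continuity_margin=0.5,     # percentage points
                           max_new_unstable_links=0):
    """Hypothetical acceptance test for a benchmark shift.

    bench / cand: dicts with 'flow_continuity_pct' and 'mass_continuity_pct'
    (absolute % continuity error from the run summary) and 'unstable_links'
    (count of links flagged with a high flow instability index).
    """
    checks = {
        "flow_continuity": abs(cand["flow_continuity_pct"])
            <= abs(bench["flow_continuity_pct"]) + continuity_margin,
        "mass_continuity": abs(cand["mass_continuity_pct"])
            <= abs(bench["mass_continuity_pct"]) + continuity_margin,
        "stability": cand["unstable_links"]
            <= bench["unstable_links"] + max_new_unstable_links,
    }
    return all(checks.values()), checks
```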
The objectives when evaluating a benchmark shift are related to, but different from, those of day-to-day regression testing. These new objectives need to be reflected in testing tools that better support core engine development.
Great post, @michaeltryby. The system graphs in particular show so much about the functioning of the network. A graphical and/or statistical comparison of the inflow, outflow, flooding, runoff, DWF, GW, RDII, and other major components would easily reveal important changes. I am not sure the instability indices would reveal much.
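For illustration only, a matplotlib overlay along these lines could serve as the graphical comparison; the component keys and function name are placeholders, not part of any existing tooling.

```python
import matplotlib.pyplot as plt

def plot_component_comparison(times, benchmark, candidate, components):
    """Overlay benchmark and candidate time series for major flow components.

    components: list of keys such as 'inflow', 'outflow', 'flooding',
    'runoff', 'dwf', 'gw', 'rdii' present in both result dicts.
    """
    fig, axes = plt.subplots(len(components), 1, sharex=True,
                             figsize=(8, 2.5 * len(components)), squeeze=False)
    for ax, name in zip(axes[:, 0], components):
        ax.plot(times, benchmark[name], linewidth=1.5, label="benchmark")
        ax.plot(times, candidate[name], linestyle="--", label="candidate")
        ax.set_ylabel(name)
        ax.legend(loc="best")
    axes[-1, 0].set_xlabel("time")
    fig.tight_layout()
    return fig
```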
I think the discussion over on OpenWaterAnalytics/EPANET#169 is completely relevant here. We're in the realm of theoretical correctness and statistical validity being a condition of passing tests.