As a user, I would like to declare each pool within an irregular sequence of pools for evaluation #245
Original Redmine Comment Currently, we allow for the generation of a regular sequence of pools using `leadTimesPoolingWindow` and `issuedDatesPoolingWindow`, which are used to generate the sequence of `TimeWindow` (or outer pools) to evaluate. A `TimeWindow` has six dimensions to be configured. When undefined, a dimension defaults to unlimited. In keeping with the current declaration, the new declaration might look something like this (for one pool, repeated as many times as there are pools), but I think we can improve on this nomenclature in the same way that we can improve on the current nomenclature, i.e., this is purely to clarify the ticket, not an actual suggestion of declaration.
For example, it should be clarified when a bookend is inclusive or exclusive. Edit: this would be the declaration at most, since undeclared boundaries default to unlimited. Again, repeated for up to N pools. This declaration could not coincide with any other pool boundary definitions (edit: such as `leadTimesPoolingWindow` - although, to be discussed, since this is more about simplicity than impossibility, i.e., you could have a regular sequence followed by an irregular one, I suppose).
Original Redmine Comment This one came up again today in the context of a discussion about event detection. It would not involve a tremendous amount of effort but, equally, it may not expand our userbase by much, as the intersection of those who want to perform evaluations of hydrologic events and those who want to use WRES seems quite small. Regardless, this is more generally useful, so bumping the priority a little.
Original Redmine Comment Thanks for finding this. Maybe it can allow us to partially check a box, by allowing users to specify their own event periods for which to compute metrics. Then again, maybe not. More information about use cases would help. Still, as you said, this should be generally useful in other cases, so why not? Hank
Original Redmine Comment Indeed.
Original Redmine Comment Bumping this to 6.8 since it keeps getting requested (or, rather, users keep asking for behavior that would be fixed by this). That said, I think "8" may be a bit optimistic for handling the graphics etc., so the estimate may need to be reconsidered - this is not simply about declaration.
Original Redmine Comment Slipping to 6.9. I know you said you plan to work on the rescale lenience ticket, but this one might be a good candidate following that ticket. Hank
Original Redmine Comment Agree.
Original Redmine Comment I think this is a good candidate for the next effort, probably not an especially significant one (although there's a risk I'm forgetting something, such as whether the existing graphics templates/styles can handle fully arbitrary pool sequences). My main reservation is that it adds yet more to the declaration schema and will further propagate nomenclature that we probably want to dispense with, if only because consistency of the nomenclature will override accuracy/precision - when we change the nomenclature, we'll need to do it everywhere at once, I think. I am tempted to make progress on a better evaluation-specific declaration language before implementing tickets that further complicate the existing language, but that is obviously a much bigger task and not something we've discussed or agreed as near-term work.
Original Redmine Comment Putting this on the backlog until progress is made with the declaration language.
Original Redmine Comment In practice, this will be a breaking change insofar as file names, which are part of the API, will change: the time windows used in file names will need to be fully qualified, including for lead durations.
Original Redmine Comment Makes sense to target v6.15 for this one, as the new declaration language will be live.
Original Redmine Comment I'm delaying this to 6.17, as I doubt it will be done by the end of the week. Hank
Original Redmine Comment I missed this one. Is this in 6.17? Is there UAT?
Original Redmine Comment James: Do you want to tackle this as part of 6.18 or slip it to the backlog? Hank
Original Redmine Comment In general, it's probably better to put everything on the backlog and then select from the backlog for the next release at the moment it can be nominated confidently, but it's OK to collapse that process in the really obvious cases. Otherwise, we're left performing musical chairs each time. In short, backlog.
Original Redmine Comment Done, Hank
Looking at this one.
First task is to determine the appropriate declaration. I am leaning towards the following in its most complete form (example with two pools):
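As a sketch only (the bounds are illustrative, using the `time_pools` nomenclature that appears later in this thread), the most complete block-style form might look like:

```yaml
# Sketch only: two explicit pools, each qualified in all three dimensions
# (lead durations, reference dates and valid dates), with two bookends per
# dimension. The dates and durations are illustrative.
time_pools:
  - lead_times:
      minimum: 12
      maximum: 18
      unit: hours
    reference_dates:
      minimum: '2024-10-15T00:00:00Z'
      maximum: '2024-10-20T00:00:00Z'
    valid_dates:
      minimum: '2024-10-15T00:00:00Z'
      maximum: '2024-10-21T00:00:00Z'
  - lead_times:
      minimum: 24
      maximum: 48
      unit: hours
    reference_dates:
      minimum: '2024-10-20T00:00:00Z'
      maximum: '2024-10-25T00:00:00Z'
    valid_dates:
      minimum: '2024-10-20T00:00:00Z'
      maximum: '2024-10-26T00:00:00Z'
```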
Or, in the more succinct flow style:
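Again as a sketch only, the same idea in YAML flow style (shown with just the lead duration dimension for brevity):

```yaml
# Sketch only: two explicit pools declared in flow style.
time_pools: [ { lead_times: { minimum: 12, maximum: 18, unit: hours } },
              { lead_times: { minimum: 24, maximum: 48, unit: hours } } ]
```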
The repetition of the
Each time pool is (up to) three-dimensional, with two bookends for each pool. When a dimension is undeclared, its defaults are taken from the relevant overall time bounds that are declared or the infinite interval, in that order. Given the schema more generally, they do need to be declared in terms of a `minimum` and `maximum`. This will, however, introduce a slight complication insofar as the automatically generated pools are non-overlapping by design, so the lower bound of each pool is always exclusive. To ensure a consistent interpretation, regardless of origin, we'll need to adjust the declared `minimum` accordingly.
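For illustration, a minimal sketch of a pool that declares only the lead duration dimension; under the behavior described above, the undeclared dimensions would fall back to the overall `reference_dates`/`valid_dates` bounds, if declared, or else to the infinite interval:

```yaml
# Sketch only: a single explicit pool with one declared dimension.
# The lower bookend would be interpreted consistently with the automatically
# generated pools (i.e., exclusive), per the adjustment described above.
time_pools:
  - lead_times:
      minimum: 24
      maximum: 48
      unit: hours
```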
I will await any input and begin implementation tomorrow.
James: What you propose seems reasonable to me. Can you explain the interaction between the pools defined in `time_pools` and the pools generated automatically from `lead_time_pools`? Thanks, Hank
I expect they will be additive to the automatically generated pools from `lead_time_pools`.
Sounds reasonable to me. Thanks! Hank
Schema updated and some unit tests written for deserialization.
Have the failing tests, making them pass.
Moving this to 6.28 unless it's done and we end up needing an RC2.
I'm going to use the rest of my Friday to start testing this ticket. I'm going to begin with basic, in-memory evaluations of a simple RFC forecast for FROV2. Hank
A first evaluation attempt using just 'time_pools' was successful with two explicit pools. The output was as expected. However, when I combined it with the original declaration using 'lead_time_pools', I encountered a netCDF error. Here is the YAML snippet:

```yaml
lead_time_pools:
  period: 6
  frequency: 6
  unit: hours
lead_times:
  minimum: 0
  maximum: 120
  unit: hours
time_pools:
  - lead_times:
      minimum: 12
      maximum: 18
      unit: hours
  - lead_times:
      minimum: 24
      maximum: 48
      unit: hours
```

Here is the exception:
It appears to be due to one of the 'time_pools' ending at 48 hours while one of the 'lead_time_pools' also ends at 48 hours. Just reporting the exception; I'm not positive what the best fix would be. If you need a fully reproducible example, let me know. I'm hoping that you already have one because it's late on a Friday. :) Thanks, Hank
This might have been commented on before (would need to look above), but it's interesting how the declaration with two pools ending at 48 hours yields both points on the plot. I've attached the image below. I think it's correct, though I don't necessarily know how the WRES decides which 48 connects to 42 and which connects to the other 48. Maybe plotting the individual pools would make it clearer. Regardless, the numbers and output look reasonable, Hank
I'm done with my testing today. I need to pick up my kid and plan to do some email review and purging after I return. Next week, I'll add some different tests using different data, including multiple features, and test reference and valid date pools, as well. Have a great weekend! Hank
I think the dots are connected when all the other attributes of the pool (and threshold) are identical, but the dots won't be joined if there are other things that vary. In any case, this is existing behavior, nothing special to `time_pools`.
Unable to reproduce this one locally using a similar snippet in another context. Can you provide the full declaration and data for this? Thanks!
On it (but you may have to wait a bit), Hank
The predicted data is here: github245_frov2_rfc_fcst_test_data_1.json

I omitted the WRDS URL from the WRDS JSON response by replacing it with an invalid URL, I think. If that causes any problems, let me know. The declaration is below. Let me know if this is enough for you to reproduce, Hank

```yaml
label: 'Example Meal: 6-hour AHPS vs USGS'
observed:
  label: USGS NWIS Instantaneous Streamflow Observations
  sources:
    - interface: usgs nwis
      uri: https://nwis.waterservices.usgs.gov/nwis/iv
  variable:
    name: '00060'
  type: observations
predicted:
  label: WRDS AHPS Streamflow Forecasts
  sources:
    - interface: wrds ahps
      uri: https://[WRDS]/api/rfc_forecast/v2.0/forecast/streamflow
  variable:
    name: QR
unit: ft3/s
# The starting point...
lead_time_pools:
  period: 6
  frequency: 6
  unit: hours
lead_times:
  minimum: 0
  maximum: 120
  unit: hours
# Testing #245
time_pools:
  - lead_times:
      minimum: 12
      maximum: 18
      unit: hours
  - lead_times:
      minimum: 24
      maximum: 48
      unit: hours
reference_dates:
  minimum: '2024-10-15T00:00:00Z'
  maximum: '2024-10-25T17:40:44Z'
valid_dates:
  minimum: '2024-10-15T00:00:00Z'
  maximum: '2024-10-25T17:40:44Z'
unit_aliases:
  - alias: herr
    unit: hank
features:
  - observed: '01631000'
    predicted: FROV2
metrics:
  - name: sample size
  - name: volumetric efficiency
  - name: bias fraction
  - name: coefficient of determination
  - name: mean square error skill score normalized
minimum_sample_size: 10
duration_format: hours
output_formats:
  - format: netcdf2
  - format: csv
  - format: pairs
  - format: png
```
Reproduced, thanks.
I suppose this is a more general problem insofar as the time window would need to be completely qualified in the netcdf file names (all six dimensions) to avoid the possibility of overlap on the limited set of qualified dimensions. However, I don't think these edge cases are very likely to be encountered in practice and I'm going to start by fixing the reported problem, which involves a lead duration interval that is qualified by only one of its bookends, i.e., adding the other lead duration. It's also worth noting that one of the requirements for a netcdf3 format is a single file/blob of statistics, which would solve the problem more elegantly, #280.
Naturally, there are also severe limitations on plotting with this new flexibility when odd/edge cases are formulated. For example, statistics plotted against the latest lead duration will show multiple instances per lead duration when there are several windows that span different lead durations but end on the same one. I think this should be transparent to users who request such odd edge cases but, if it isn't, we are unlikely to accommodate improved plots for those cases at the expense of simplicity for the 99% use case.
Will need to fix some system tests as netcdf file names are used for some benchmarks, like 6xx.
Benchmarks fixed and pushed.
Add the earliest lead duration to the name of each netcdf blob, #245.
Netcdf fix is ready for testing.
Note to self... I need to test this one again when I return to work tomorrow. I also need to figure out if it will impact the WRES GUI's mapping code once I see the new filename format in action. I figure either the WRES GUI will expect to see the old naming scheme and fail, or it will not be dependent on the naming scheme and get confused when it sees two sets of results ending at lead time 48 hours. Or maybe it works and just shows two sets of results at 48 hours to the user. I guess we'll see, Hank
The netCDF error is resolved. I'm now going to do some additional testing using evaluation shapes. Hank
(James: There is a question at the bottom for you.) For an HEFS streamflow evaluation, I tried this nonsensical pooling declaration just to see if I could find what I expect in the pairs:

```yaml
lead_times:
  minimum: 18
  maximum: 720
  unit: hours
lead_time_pools:
  period: 24
  frequency: 24
  unit: hours
# TESTING GitHub 245
# Testing #245
time_pools:
  - lead_times:
      minimum: 1
      maximum: 5
      unit: days
  - reference_dates:
      minimum: 1995-03-17T00:00:00Z
      maximum: 1995-03-20T00:00:00Z
  - valid_dates:
      minimum: 1995-03-18T00:00:00Z
      maximum: 1995-03-21T00:00:00Z
```

First, I found the pairs for the lead time pool ending at 120H, which had to be from the `time_pools` declaration. For the `valid_dates` pool, I did not find everything I expected; I'm wondering if it's because the data required to obtain the 3/18 6Z, 24-hour rescaled value is from *outside* of the declared `valid_dates`. If so, that's fine, I just want to understand how it works. Thanks, Hank
If there's a bug in that regard, it won't be specific to this ticket. At least for the forecast lead duration dimension, the pool is expanded by one `period` of the desired timescale so that values at the boundary can be rescaled. It should be a similar thing for `valid_dates`.
Anyway, I think that is another ticket whose fix is independent of this ticket, i.e., this feature can be deployed when it passes UAT more generally.
Got it. Thanks. If you happen to have the ABRN1 HEFS streamflow data and example declaration, you should be able to reproduce what I'm seeing. Note that I have only reproduced this issue in-memory. Currently, my user database is being used by Evan to test automated system testing. I'll try to reproduce it using a database later; I just need to find an existing one in the -dev Postgres server to use. I'll try to create the ticket before I leave today, but it may not be until tomorrow. Other work has grabbed my attention. Thanks, Hank
I'll note that, beyond the issue documented in #357, the pools constructed for the HEFS test looked good. My next test of this ticket will use multiple features as well as multiple `time_pools`. Hank
Results look reasonable for a multi-feature evaluation using these pools:

```yaml
lead_time_pools:
  period: 6
  frequency: 6
  unit: hours
lead_times:
  minimum: 0
  maximum: 120
  unit: hours
time_pools:
  - lead_times:
      minimum: 0
      maximum: 5
      unit: days
  - reference_dates:
      minimum: '2024-10-29T00:00:00Z'
      maximum: '2024-11-03T00:00:00Z'
  - reference_dates:
      minimum: '2024-11-03T00:00:00Z'
      maximum: '2024-11-08T00:00:00Z'
```

The output images are a bit weird, but the pools are weird as well, and I can make sense of it. For example, the three points connected with lines are the three explicit `time_pools`. One question I have is this: the evaluation includes the top-level lead time pools, the overall lead times pool, and the two reference dates pools. Does it default to an issued-time-based domain axis whenever reference dates pools are included? Note that I just included the ... Anyway, the images and CSV make sense, so this looks good. I'm now going to exercise some oddball settings to see what happens. Hank
I ran a battery of small tests late last week and just now, and I'm getting reasonable results. I looked at non-overlapping `time_pools`. I think this feature is good to go, but note that #357 is still open. It was a pre-existing issue, so this feature doesn't necessarily have to wait for that one. Thanks, Hank
This passes my UAT, which admittedly is not as extensive as what I did in early Nov. I just ran a couple of tests to confirm the behavior is unchanged. Hank
Author Name: James (James)
Original Redmine Issue: 86646, https://vlab.noaa.gov/redmine/issues/86646
Original Date: 2021-01-08
Given the need to evaluate an irregular sequence of pools
When I declare that in wres
Then I would like to declare it in a single evaluation
Related issue(s): #78
Redmine related issue(s): 86840, 95229, 103868, 108096