
As a user, I would like to declare each pool within an irregular sequence of pools for evaluation #245

Closed
epag opened this issue Aug 21, 2024 · 69 comments · Fixed by #347
Labels: enhancement (New feature or improvement)

@epag
Collaborator

epag commented Aug 21, 2024


Author Name: James (James)
Original Redmine Issue: 86646, https://vlab.noaa.gov/redmine/issues/86646
Original Date: 2021-01-08


Given the need to evaluate an irregular sequence of pools
When I declare that in wres
Then I would like to declare it in a single evaluation


Related issue(s): #78
Redmine related issue(s): 86840, 95229, 103868, 108096


@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2021-01-08T21:58:07Z


Currently we allow for the generation of a regular sequence of pools using `leadTimesPoolingWindow` and `issuedDatesPoolingWindow`, which are used to generate the sequence of `TimeWindow` (or outer pools) to evaluate. A `TimeWindow` has six dimensions to be configured. When undefined, a dimension defaults to unlimited.
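For orientation, the six dimensions are the two bookends of each of the three pooling dimensions (lead durations, issued/reference dates and valid dates). A minimal sketch, with illustrative field names rather than the actual TimeWindow API:

import java.time.Duration;
import java.time.Instant;

// Illustrative only: field names are assumptions, not the wres API.
record TimeWindowSketch( Instant earliestReferenceTime,
                         Instant latestReferenceTime,
                         Instant earliestValidTime,
                         Instant latestValidTime,
                         Duration earliestLeadDuration,
                         Duration latestLeadDuration ) {}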

In keeping with the current declaration, the new declaration might look something like this (for one pool, repeated as many times as there are pools). I think we can improve on this nomenclature in the same way that we can improve on the current nomenclature; i.e., this is purely to clarify the ticket, not an actual suggestion for the declaration.

<pool>
        <leadHours minimum="0" maximum="18" />
        <dates earliest="2017-08-07T00:00:00Z" latest="2017-08-10T00:00:00Z" />
        <issuedDates earliest="2017-08-07T23:59:59Z" latest="2017-08-08T23:00:00Z" />
</pool>

It should also be clarified, for example, whether each bookend is inclusive or exclusive.

edit: This would be the declaration at most, since undeclared boundaries default to unlimited. Again, repeated for up to N pools.

This declaration could not coincide with any other pool boundary definitions (edit: such as `leadTimesPoolingWindow` - although, to be discussed, since this is more about simplicity than impossibility, i.e., you could have a regular sequence, followed by an irregular one, I suppose).

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-06-13T16:18:38Z


This one came up again today in the context of a discussion about event detection. It would not involve a tremendous amount of effort but, equally, it may not expand our userbase much: those who want to perform evaluations of hydrologic events and also want to use WRES seem to form quite a small intersection. Regardless, this is more generally useful, so bumping the priority a little.

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-06-13T16:34:12Z


Thanks for finding this. Maybe it can allow us to partially check a box, by allowing users to specify their own event periods for which to compute metrics. Then again, maybe not. More information about use cases would help. Still, as you said, this should be generally useful in other cases, so why not?

Hank

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-06-13T17:07:04Z


Indeed.

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-09-13T15:36:38Z


Bumping this to 6.8 since it keeps getting requested (or, rather, users keep asking for behavior that would be fixed by this).

That said, I think "8" may be a bit optimistic for handling the graphics etc., so the estimate may need to be reconsidered - this is not simply about declaration.

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2022-10-13T13:43:10Z


Slipping to 6.9.

I know you said you plan to work on the rescale lenience ticket, but this one might be a good candidate following that ticket.

Hank

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-10-13T13:49:59Z


Agree.

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-11-08T16:57:34Z


I think this is a good candidate for the next effort, probably not an especially significant one (although there's a risk I'm forgetting something, such as whether the existing graphics templates/styles can handle fully arbitrary pool sequences).

My main reservation is that it adds yet more to the declaration schema and will further propagate nomenclature that we probably want to dispense with, if only because consistency of the nomenclature will override accuracy/precision - when we change the nomenclature, we'll need to do it everywhere at once, I think.

I am tempted to make progress on a better evaluation-specific declaration language before implementing tickets that further complicate the existing language, but that is obviously a much bigger task and not something we've discussed or agreed as near-term work.

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2022-12-09T13:46:07Z


Putting this on the backlog until progress is made with the declaration language.

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2023-01-04T13:29:59Z


In practice, this will be a breaking change insofar as file names, which are part of the API, will change: the time windows used in file names will need to be fully qualified, including for lead durations.

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2023-06-05T16:54:29Z


Makes sense to target v6.15 for this one, as the new declaration language will be live.

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2023-10-24T16:24:07Z


I'm delaying this to 6.17, as I doubt it will be done by the end of the week.

Hank

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Evan (Evan)
Original Date: 2023-11-15T16:19:03Z


I missed this one. Is this in 6.17? Is there UAT?

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2023-11-15T18:04:36Z


James:

Do you want to tackle this as part of 6.18 or slip it to the backlog?

Hank

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: James (James)
Original Date: 2023-11-15T18:17:51Z


In general, it's probably better to put everything on the backlog and then select from the backlog for the next release at the moment it can be nominated confidently, but it's OK to collapse that process in the really obvious cases. Otherwise, we're left performing musical chairs each time. In short, backlog.

@epag
Collaborator Author

epag commented Aug 21, 2024


Original Redmine Comment
Author Name: Hank (Hank)
Original Date: 2023-11-15T18:26:55Z


Done,

Hank

@james-d-brown james-d-brown self-assigned this Oct 16, 2024
@james-d-brown james-d-brown added the enhancement New feature or improvement label Oct 16, 2024
@james-d-brown james-d-brown added this to the v6.27 milestone Oct 16, 2024
@james-d-brown
Collaborator

Looking at this one.

@james-d-brown
Collaborator

james-d-brown commented Oct 16, 2024

First task is to determine the appropriate declaration. I am leaning towards the following in its most complete form (example with two pools):

time_pools:
  - lead_times:
      minimum: 1
      maximum: 6
      unit: hours
    reference_dates:
      minimum: 2551-03-17T00:00:00Z
      maximum: 2551-03-20T00:00:00Z
    valid_dates:
      minimum: 2551-03-18T00:00:00Z
      maximum: 2551-03-21T00:00:00Z  
  - lead_times:
      minimum: 7
      maximum: 12
      unit: hours
    reference_dates:
      minimum: 2551-03-21T00:00:00Z
      maximum: 2551-03-23T00:00:00Z
    valid_dates:
      minimum: 2551-03-22T00:00:00Z
      maximum: 2551-03-24T00:00:00Z

Or, in the more succinct flow style:

time_pools:
  - lead_times: { minimum: 1, maximum: 6, unit: hours }
    reference_dates: { minimum: 2551-03-17T00:00:00Z, maximum: 2551-03-20T00:00:00Z }
    valid_dates: { minimum: 2551-03-18T00:00:00Z, maximum: 2551-03-21T00:00:00Z }
  - lead_times: { minimum: 7, maximum: 12, unit: hours }
    reference_dates: { minimum: 2551-03-21T00:00:00Z, maximum: 2551-03-23T00:00:00Z }
    valid_dates: { minimum: 2551-03-22T00:00:00Z, maximum: 2551-03-24T00:00:00Z }

The repetition of the lead duration unit is a bit yucky. This is arguably a consequence of not using the ISO 8601 standard for durations, which embed the unit but which users find rather confusing (e.g., PT1H). Also, abstracting the unit out to a higher level or implementing a default may not be less confusing, since a time window is three-dimensional, and this behavior is consistent with the overall declaration of the lead_times bounds, where the unit is required. Overall, I think we accept clarity at the expense of repetition for the lead duration unit.
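For illustration only, the ISO 8601 alternative alluded to above embeds the unit in each bound (shown here with java.time semantics), removing the repetition at the cost of the PT notation:

import java.time.Duration;

public class Iso8601DurationExample
{
    public static void main( String[] args )
    {
        // PT1H and PT6H embed the unit, so no separate `unit: hours` is
        // needed, but the notation itself is what users find confusing.
        Duration minimum = Duration.parse( "PT1H" );
        Duration maximum = Duration.parse( "PT6H" );
        System.out.println( minimum.toHours() + " to " + maximum.toHours() + " hours" );
    }
}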

@james-d-brown
Collaborator

james-d-brown commented Oct 16, 2024

Each time pool is (up to) three-dimensional, with two bookends for each pool. When a dimension is undeclared, its defaults are taken from the relevant overall time bounds that are declared or the infinite interval, in that order.

Given the schema more generally, they do need to be declared in terms of a minimum and maximum, I think.

This will, however, introduce a slight complication insofar as the automatically generated pools are non-overlapping by design, so the lower bound of each pool is always exclusive. To ensure a consistent interpretation, regardless of origin, we'll need to adjust the declared minimum slightly, in code, shifting it earlier by the smallest possible temporal unit (a nanosecond). That way, the minimum is always inclusive in practice, in keeping with the declaration, but the time windows have a consistent interpretation in code, regardless of whether they were explicitly declared or automatically generated.
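A minimal sketch of that adjustment, assuming java.time (the class and variable names are illustrative, not the actual WRES code):

import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class InclusiveMinimumSketch
{
    public static void main( String[] args )
    {
        // Shift the declared minimum back by one nanosecond so that an
        // exclusive lower bound in code behaves like the inclusive bound
        // that the user declared.
        Instant declaredMinimum = Instant.parse( "2551-03-17T00:00:00Z" );
        Instant exclusiveLowerBound = declaredMinimum.minus( 1, ChronoUnit.NANOS );
        System.out.println( exclusiveLowerBound ); // 2551-03-16T23:59:59.999999999Z
    }
}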

@james-d-brown
Collaborator

I will await any input and begin implementation tomorrow.

@HankHerr-NOAA
Contributor

James:

What you propose seems reasonable to me. Can you explain the interaction between the pools defined in time_pools and the top-level pools? Would these pools just augment the list of pools created by the top-level declaration?

Thanks,
Hank

@james-d-brown
Collaborator

I expect they will be additive to the automatically generated pools from lead_time_pools, reference_date_pools and valid_date_pools, but there will probably be a declaration warning. There may be situations where both make sense in the same evaluation, i.e., a regular sequence together with an explicit pool or two, defined separately, so we probably shouldn't disallow it.

@HankHerr-NOAA
Contributor

Sounds reasonable to me. Thanks!

Hank

@james-d-brown
Collaborator

> Each time pool is (up to) three-dimensional, with two bookends for each pool. When a dimension is undeclared, its defaults are taken from the relevant overall time bounds that are declared or the infinite interval, in that order.

Schema updated and some unit tests written for deserializing via the DeclarationFactory. Currently working on unit tests for the above part in the DeclarationInterpolator and will then make them pass.

@james-d-brown
Collaborator

Have the failing tests, making them pass.

@epag
Collaborator Author

epag commented Oct 18, 2024

Moving this to 6.28 unless it's done and we end up needing an RC2.

@epag epag removed this from the v6.27 milestone Oct 18, 2024
@HankHerr-NOAA
Contributor

I'm going to use the rest of my Friday to start testing this ticket. I'm going to begin with basic, in-memory evaluations of a simple RFC forecast for FROV2.

Hank

@HankHerr-NOAA
Contributor

A first evaluation attempt using just 'time_pools' was successful with two explicit pools. The output was as expected. However, when I combined it with the original declaration using 'lead_time_pools', I encountered a netCDF error. Here is the YAML snippet:

lead_time_pools:
  period: 6
  frequency: 6
  unit: hours
lead_times:
  minimum: 0
  maximum: 120
  unit: hours

time_pools:
  - lead_times:
      minimum: 12
      maximum: 18
      unit: hours
  - lead_times:
      minimum: 24
      maximum: 48
      unit: hours

Here is the exception:

2024-10-25T18:32:22.216+0000 3577936 [main] ERROR Main - Operation 'execute' completed unsuccessfully
wres.pipeline.InternalWresException: Could not complete project execution
        at wres.pipeline.Evaluator.evaluate(Evaluator.java:406)
        at wres.pipeline.Evaluator.evaluate(Evaluator.java:178)
        at wres.Functions.evaluate(Functions.java:135)
        at wres.Functions.evaluate(Functions.java:97)
        at wres.helpers.MainUtilities.call(MainUtilities.java:107)
        at wres.Main.completeExecution(Main.java:206)
        at wres.Main.main(Main.java:126)
Caused by: wres.pipeline.WresProcessingException: Encountered an error while processing evaluation '-X-aKK3-D2bEaf_Ds5StS4SLh1w': 
        at wres.pipeline.Evaluator.evaluate(Evaluator.java:835)
        at wres.pipeline.Evaluator.evaluate(Evaluator.java:377)
        ... 6 common frames omitted
Caused by: java.lang.IllegalStateException: Cannot write to /tmp/wres_evaluation_-X-aKK3-D2bEaf_Ds5StS4SLh1w/WRDS_AHPS_Streamflow_Forecasts_20241025T174044Z_48_HOURS.nc because it already exists.
        at wres.writing.netcdf.NetcdfOutputFileCreator2.create(NetcdfOutputFileCreator2.java:98)
        at wres.writing.netcdf.NetcdfOutputWriter.createBlobsAndBlobWriters(NetcdfOutputWriter.java:591)
        at wres.writing.netcdf.NetcdfOutputWriter.createBlobsForWriting(NetcdfOutputWriter.java:490)
        at wres.pipeline.EvaluationUtilities.createNetcdfBlobs(EvaluationUtilities.java:261)
        at wres.pipeline.Evaluator.evaluate(Evaluator.java:684)
        ... 7 common frames omitted

It appears to be due to one of the `time_pools` ending at 48 hours while one of the `lead_time_pools` also ends at 48 hours. I confirmed it by changing the `maximum` to 44 hours for the second `time_pools` entry.

Just reporting the exception; I'm not positive what the best fix would be. If you need a fully reproducible example, let me know. I'm hoping that you already have one because it's late on a Friday. :) Thanks,

Hank

@HankHerr-NOAA
Contributor

This might have been commented on before (I would need to look above), but it's interesting how the declaration with two pools ending at 48 hours yields both points on the plot. I've attached the image below. I think it's correct, though I don't necessarily know how the WRES decides which 48 connects to 42 and which connects to the other 48. Maybe plotting the individual time_pools as disconnected points would be better? Or, given the oddity of the evaluation, maybe we shouldn't worry about it.

Regardless, the numbers and output look reasonable,

Hank

[Attached image: 01631000_FROV2_WRDS_AHPS_Streamflow_Forecasts_SAMPLE_SIZE]

@HankHerr-NOAA
Contributor

I'm done with my testing today. I need to pick up my kid and plan to do some email review and purging after I return. Next week, I'll add some different tests using different data, including multiple features, and test reference and valid date pools as well.

Have a great weekend!

Hank

@james-d-brown
Collaborator

I think the dots are connected when all the other attributes of the pool (and threshold) are identical, but the dots won't be joined if there are other things that vary. In any case, this is existing behavior, nothing special to time_pools. The netcdf error sounds like a bug, though, or rather a weakness in how we qualify that output type (insufficiently). Will require a more comprehensive file name, at least in these circumstances, I suppose.

@james-d-brown
Collaborator

> A first evaluation attempt using just 'time_pools' was successful with two explicit pools. The output was as expected. However, when I combined it with the original declaration using 'lead_time_pools', I encountered a netCDF error. [...]

Unable to reproduce this one locally using a similar snippet in another context. Can you provide the full declaration and data for this? Thanks!

@james-d-brown james-d-brown reopened this Oct 28, 2024
@HankHerr-NOAA
Contributor

On it (but you may have to wait a bit),

Hank

@HankHerr-NOAA
Contributor

The predicted data is here:

github245_frov2_rfc_fcst_test_data_1.json

I omitted the WRDS URL from the WRDS JSON response by replacing it with an invalid URL, I think. If that causes any problems, let me know. The declaration is below. Let me know if this is enough for you to reproduce,

Hank

==========

label: 'Example Meal: 6-hour AHPS vs USGS'
observed:
  label: USGS NWIS Instantaneous Streamflow Observations
  sources:
  - interface: usgs nwis
    uri: https://nwis.waterservices.usgs.gov/nwis/iv
  variable:
    name: '00060'
  type: observations
predicted:
  label: WRDS AHPS Streamflow Forecasts
  sources:
  - interface: wrds ahps
    uri: https://[WRDS]/api/rfc_forecast/v2.0/forecast/streamflow
  variable:
    name: QR
unit: ft3/s

# The starting point...
lead_time_pools:
  period: 6
  frequency: 6
  unit: hours
lead_times:
  minimum: 0
  maximum: 120
  unit: hours

# Testing #245
time_pools:
  - lead_times:
      minimum: 12
      maximum: 18
      unit: hours
  - lead_times:
      minimum: 24
      maximum: 48
      unit: hours  
  
reference_dates:
  minimum: '2024-10-15T00:00:00Z'
  maximum: '2024-10-25T17:40:44Z'
valid_dates:
  minimum: '2024-10-15T00:00:00Z'
  maximum: '2024-10-25T17:40:44Z'
unit_aliases:
- alias: herr
  unit: hank
features:
- observed: '01631000'
  predicted: FROV2
metrics:
- name: sample size
- name: volumetric efficiency
- name: bias fraction
- name: coefficient of determination
- name: mean square error skill score normalized
minimum_sample_size: 10
duration_format: hours
output_formats:
- format: netcdf2
- format: csv
- format: pairs
- format: png

@james-d-brown
Collaborator

Reproduced, thanks.

Caused by: wres.pipeline.WresProcessingException: Encountered an error while processing evaluation 'jV41clMvvXoXwQoGUEmWePRKU3g': 
	at wres.pipeline.Evaluator.evaluate(Evaluator.java:835)
	at wres.pipeline.Evaluator.evaluate(Evaluator.java:377)
	... 7 common frames omitted
Caused by: java.lang.IllegalStateException: Cannot write to <SNIP>\wres_evaluation_jV41clMvvXoXwQoGUEmWePRKU3g\WRDS_AHPS_Streamflow_Forecasts_20241025T174044Z_48_HOURS.nc because it already exists.
	at wres.writing.netcdf.NetcdfOutputFileCreator2.create(NetcdfOutputFileCreator2.java:98)
	at wres.writing.netcdf.NetcdfOutputWriter.createBlobsAndBlobWriters(NetcdfOutputWriter.java:591)
	at wres.writing.netcdf.NetcdfOutputWriter.createBlobsForWriting(NetcdfOutputWriter.java:490)
	at wres.pipeline.EvaluationUtilities.createNetcdfBlobs(EvaluationUtilities.java:261)
	at wres.pipeline.Evaluator.evaluate(Evaluator.java:684)
	... 8 common frames omitted

@james-d-brown
Collaborator

james-d-brown commented Oct 28, 2024

I suppose this is a more general problem insofar as the time window would need to be completely qualified in the netcdf file names (all six dimensions) to avoid the possibility of overlap on a limited set of qualified dimensions. However, I don't think these edge cases are very likely to be encountered in practice, so I'm going to start by fixing the reported problem, which involves a lead duration interval that is qualified by only one of its bookends, i.e., by adding the other lead duration.

It's also worth noting that one of the requirements for a netcdf3 format is a single file/blob of statistics, which would solve the problem more elegantly, #280.
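For concreteness, a minimal sketch of the kind of file-name qualification involved (the helper and names here are hypothetical, not the actual WRES writing code; per the commit referenced below, the actual fix adds the earliest lead duration to each netcdf blob name):

import java.time.Duration;

public class BlobNameSketch
{
    // Hypothetical helper: qualify the blob name with both lead duration
    // bookends so that pools sharing a latest lead duration (e.g., 24-48
    // hours and 42-48 hours) no longer collide on the same file name.
    static String qualify( String prefix, String referenceTime, Duration earliest, Duration latest )
    {
        return prefix + "_" + referenceTime + "_" + earliest.toHours()
               + "-" + latest.toHours() + "_HOURS.nc";
    }

    public static void main( String[] args )
    {
        // Prints: WRDS_AHPS_Streamflow_Forecasts_20241025T174044Z_24-48_HOURS.nc
        System.out.println( qualify( "WRDS_AHPS_Streamflow_Forecasts", "20241025T174044Z",
                                     Duration.ofHours( 24 ), Duration.ofHours( 48 ) ) );
    }
}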

@james-d-brown
Collaborator

Naturally, there are also severe limitations on plotting with this new flexibility when odd/edge cases are formulated. For example, statistics plotted against the latest lead duration will show multiple instances per lead duration when there are several windows that span different lead durations but end on the same one. I think this should be transparent to users who request such odd edge cases but, if it isn't, we are unlikely to accommodate improved plots for those cases at the expense of simplicity for the 99% use case.

@james-d-brown
Collaborator

Will need to fix some system tests as netcdf file names are used for some benchmarks, like 6xx.

@james-d-brown
Collaborator

Benchmarks fixed and pushed.

james-d-brown added a commit that referenced this issue Oct 29, 2024
Add the earliest lead duration to the name of each netcdf blob, #245.
@james-d-brown
Collaborator

Netcdf fix is ready for testing.

@HankHerr-NOAA
Contributor

Note to self... I need to test this one again when I return to work tomorrow. I also need to figure out if it will impact the WRES GUI's mapping code once I see the new filename format in action. I figure either the WRES GUI will expect to see the old naming scheme and fail, or it will not be dependent on the naming scheme and get confused when it sees two sets of results ending at lead time 48 hours. Or maybe it works and just shows two sets of results at 48 hours to the user.

I guess we'll see,

Hank

@HankHerr-NOAA
Contributor

The netCDF error is resolved. I'm now going to do some additional testing using evaluation shapes.

Hank

@HankHerr-NOAA
Contributor

(James: There is a question at the bottom for you.)

For an HEFS streamflow evaluation, I tried this nonsensical pooling declaration just to see if I could find what I expect in the pairs:

lead_times:
  minimum: 18
  maximum: 720
  unit: hours
lead_time_pools:
  period: 24
  frequency: 24
  unit: hours
# Testing GitHub #245
time_pools:
  - lead_times:
      minimum: 1
      maximum: 5
      unit: days
  - reference_dates:
      minimum: 1995-03-17T00:00:00Z
      maximum: 1995-03-20T00:00:00Z
  - valid_dates:
      minimum: 1995-03-18T00:00:00Z
      maximum: 1995-03-21T00:00:00Z

First, I found the pairs for the lead time pool ending at 120 hours, which had to come from the time_pools lead_times entry, because 120 is not part of the top-level pooling (starting at 18 and counting by 24). I then found the pairs for the reference_dates pool, confirming that all lead times were included as expected.

For the valid_dates, however, I'm struggling a bit to understand what I'm seeing, but I have a theory. With the data being rescaled to 24 hours, the valid date of every pair is at 06:00:00Z. Between 3/18/1995 0Z and 3/21/1995 0Z, that means I might expect values for 3/18 6Z, 3/19 6Z, and 3/20 6Z, and I would expect that for any forecast value with a lead time <= 720 hours. However, I only saw values for 3/19 6Z and 3/20 6Z. Why?

I'm wondering if it's because the data required to obtain the 3/18 6Z, 24-hour rescaled value comes from *outside* of the valid_dates pooling window (i.e., some of the data comes from 3/17). If so, that would seemingly imply the window is applied to the 6-hour values before the rescaling to 24 hours is done. Is that correct?

If so, that's fine, I just want to understand how it works.

Thanks,

Hank

@james-d-brown
Collaborator

james-d-brown commented Nov 6, 2024

If there's a bug in that regard, it won't be for the time_pools specifically as that is merely another interface to defining the time windows that drive retrieval.

At least for the forecast lead duration dimension, the pool is expanded by one time_scale period earlier than the minimum value of the lead_times to ensure that there is sufficient data to form every upscaled pair whose lead time falls within the pool, meaning the lead time at the end of the upscaled period.

It should be a similar thing for valid_dates, but now I suspect that is not the case: we don't encounter pools that constrain valid_dates very often, so this interface is probably exposing an existing bug whereby we don't subtract the period associated with the time_scale from the lower bound of the valid_dates at retrieval time. I would be curious to see whether you get the same results in in-memory mode, but the same logic should be applied for in-memory retrieval.
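As a worked example of the suspected missing step, using the numbers from the test above (a sketch of the expected retrieval behavior, not the current code):

import java.time.Duration;
import java.time.Instant;

public class RetrievalExpansionSketch
{
    public static void main( String[] args )
    {
        // With a 24-hour time_scale, retrieval should begin one period before
        // the declared valid_dates minimum, so that the upscaled value ending
        // at 1995-03-18T06:00:00Z (which needs data from 1995-03-17T06:00:00Z
        // onward) can be formed.
        Instant validMinimum = Instant.parse( "1995-03-18T00:00:00Z" );
        Duration timeScale = Duration.ofHours( 24 );
        Instant retrievalMinimum = validMinimum.minus( timeScale );
        System.out.println( retrievalMinimum ); // 1995-03-17T00:00:00Z
    }
}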

@james-d-brown
Collaborator

Anyway, I think that is another ticket whose fix is independent of this ticket, i.e., this feature can be deployed when it passes UAT more generally.

@HankHerr-NOAA
Contributor

Got it. Thanks. If you happen to have the ABRN1 HEFS streamflow data and example declaration, you should be able to reproduce what I'm seeing. Note that I have only reproduced this issue in-memory. Currently, my user database is being used by Evan for automated system testing. I'll try to reproduce it using a database later; I just need to find an existing one in the -dev Postgres server to use.

I'll try to create the ticket before I leave today, but it may not be until tomorrow. Other work has grabbed my attention.

Thanks,

Hank

@HankHerr-NOAA
Contributor

I'll note that, beyond the issue documented in #357, the pools constructed for the HEFS test looked good.

My next test of this ticket will use multiple features as well as multiple time_pools. I'll probably use an RFC forecast evaluation as a starting point.

Hank

@HankHerr-NOAA
Contributor

Results look reasonable for a multi feature evaluation using these pools:

lead_time_pools:
  period: 6
  frequency: 6
  unit: hours
lead_times:
  minimum: 0
  maximum: 120
  unit: hours
time_pools:
  - lead_times:
      minimum: 0
      maximum: 5
      unit: days
  - reference_dates:
      minimum: '2024-10-29T00:00:00Z'
      maximum: '2024-11-03T00:00:00Z'
  - reference_dates:
      minimum: '2024-11-03T00:00:00Z'
      maximum: '2024-11-08T00:00:00Z'

The output images are a bit weird, but the pools are weird, as well, and I can make sense of it. For example:

[Attached image: 02040000_MTXV2_WRDS_AHPS_Streamflow_Forecasts_SAMPLE_SIZE]

The three points connected with lines are the three time_pools. The far-left one is the first reference_dates pool, the far-right one is the second reference_dates pool, and the one in the middle is the lead_times pool. I assume they are connected because each includes all of the lead times, (0, 120].

One question I have is this: the evaluation includes the top-level lead time pools, the overall lead times pool, and the two reference dates pools. Does it default to an issued-time-based domain axis whenever reference dates pools are included? Note that I just included the png output format with no options specified, so it's going with the defaults.

Anyway, the images and CSV make sense, so this looks good. I'm now going to exercise some oddball settings to see what happens.

Hank

@HankHerr-NOAA
Contributor

I ran a battery of small tests late last week and just now, and I'm getting reasonable results. I looked at non-overlapping reference_dates ranges defined at the top level and in time_pools; lead time pooling windows that are non-overlapping, overlapping, and one completely within another; and some bad-declaration checks. Everything looks reasonable, though obviously my tests were not exhaustive.

I think this feature is good to go, but note that #357 is still open. It was a pre-existing issue, so this feature doesn't necessarily have to wait for that one.

Thanks,

Hank

@HankHerr-NOAA
Contributor

This passes my UAT, which admittedly is not as extensive as what I did in early November. I just ran a couple of tests to confirm the behavior is unchanged.

Hank
