Declaration Language
- 1. What is the simplest possible evaluation I can declare?
- 2. What is the format of the evaluation language?
- 3. How do I declare the datasets to evaluate?
- 4. How do I declare geographic features to evaluate?
- 4.1. When do I need to declare geographic features?
- 4.2. How do I declare a list of geographic features to evaluate?
- 4.3. Can I declare a geographic feature authority explicitly?
- 4.4. What if I don’t know the relationship between different geographic features?
- 4.5. How do I declare a region to evaluate without listing all of the features within it?
- 4.6. How do I declare a spatial mask to evaluate only a subset of features?
- 4.7. How do I declare a datum offset for each feature?
- 5. How do I filter the time-series data to consider only specific dates or times?
- 6. How do I declare the measurement units for the evaluation?
- 7. How do I filter the time-series data to consider only values within a range?
- 8. How do I declare thresholds to use in an evaluation?
- 8.1. What types of thresholds are supported and how do I declare them?
- 8.2. What if I want to declare different thresholds for different metrics?
- 8.3. Can I obtain thresholds from external data sources?
- 8.4. How do I declare thresholds from the Water Resources Data Service (WRDS)?
- 8.5. How do I declare thresholds to read from CSV files?
- 9. How do I declare the pools of data that should be evaluated separately?
- 10. How do I declare the desired timescale (e.g., accumulation period)?
- 11. How do I declare the metrics to evaluate?
- 12. How do I declare summary statistics?
- 13. How do I ask for sampling uncertainties to be estimated?
- 14. How do I declare output formats to write?
- 15. Are there any other options?
- 16. Do you have some examples of complete declarations?
- 17. Does the declaration language use a schema?
- 18. What does this error really mean?
The declaration language refers to the language employed by the WRES to declare the contents of an evaluation.
The simplest possible evaluation contains the paths to each of two datasets whose values will be compared or evaluated:

```yaml
observed: observations.csv
predicted: predictions.csv
```

In this example, the two datasets contain time-series values in CSV format and are located in the user's home directory; datasets in other locations must be declared with absolute paths. The WRES will automatically detect the format of the supplied datasets.
In this example, the WRES will make some reasonable choices about other aspects of the evaluation, such as the metrics to compute (depending on the data it sees) and the statistics formats to write.
The language of "observed" and "predicted" is simply intended to clarify the majority use case of the WRES, which is to compare predictions and observations. When computing error values, the order of calculation is to subtract the `observed` values from the `predicted` values. Thus, a negative error means that the predictions are too low and a positive error means they are too high. Beyond this, the WRES is agnostic about the content or origin of these datasets and simply views them as two time-series datasets. For example, observed or measured values could be used in both the `observed` and `predicted` slots, if desired.
An evaluation is declared to WRES in a prescribed format and with a prescribed grammar. The format or “serialization format” is YAML, which is a recursive acronym, “YAML Ain’t Markup Language”. The evaluation language itself builds on this serialization format and contains the grammar understood by the WRES software. For example, datasets can be declared, together with any optional filters, metrics or statistics formats to create.
It may be interesting to know that YAML is a superset of JSON, which means that any evaluation declared to WRES using YAML has an equivalent declaration in JSON, which the WRES will accept. For example, the equivalent minimal evaluation in JSON is:
```json
{
  "observed": "observations.csv",
  "predicted": "predictions.csv"
}
```
As you can see, YAML tends to be cleaner and more human readable than JSON, but JSON is perfectly acceptable if you are familiar with it and prefer to use it.
If you are curious, the following resources provide some more information about YAML:
- https://en.wikipedia.org/wiki/YAML [comprehensive description and examples]
- https://www.yamllint.com/ [this will tell you whether your declaration is valid YAML]
As indicated above, the basic datasets to compare are `observed` and `predicted`:

```yaml
observed: observations.csv
predicted: predictions.csv
```

Additionally, a `baseline` dataset may be declared as a benchmark for the `predicted` dataset:

```yaml
observed: observations.csv
predicted: predictions.csv
baseline: baseline_predictions.csv
```
For example, when computing a `mean square error skill score`, the `mean square error` is first computed by comparing the `predicted` and `observed` datasets and then, separately, by comparing the `baseline` and `observed` datasets; finally, the two `mean square error` scores are combined in a skill score.
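Concretely, a skill score of this kind is typically computed as `1 - MSE(predicted, observed) / MSE(baseline, observed)`. For example, if the `predicted` dataset has a mean square error of 2.0 and the `baseline` dataset has a mean square error of 4.0, then the skill score is 1 - 2.0/4.0 = 0.5, i.e., the predictions halve the error of the baseline.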
As of v6.22, covariate datasets can be used to filter evaluation pairs. For example, precipitation forecasts may be evaluated conditionally upon observed temperatures (a covariate) being at or below freezing. Further information about covariates is available here: Using covariates as filters.
In the simplest case, involving a single covariate dataset without additional parameters, the covariate may be declared in the same way as other datasets (note the plural form, `covariates`, because there may be one or more):

```yaml
observed: observations.csv
predicted: predictions.csv
baseline: baseline_predictions.csv
covariates: covariate_observations.csv
```
In this example, the evaluation pairs will include only those valid times when the covariate is also defined.
Unlike the `observed`, `predicted` and `baseline` datasets, more than one covariate may be declared using a list. For example:

```yaml
observed: observations.csv
predicted: predictions.csv
baseline: baseline_predictions.csv
covariates:
  - sources: precipitation.csv
    variable: precipitation
  - sources: temperature.csv
    variable: temperature
```
In this case, the list includes two covariates, one that contains precipitation observations and one that contains temperature observations.
Covariates may be declared with a `minimum` and/or `maximum` value. This will additionally filter evaluation pairs to only those valid times when the covariate meets the filter condition(s). For example:

```yaml
observed: observations.csv
predicted: predictions.csv
baseline: baseline_predictions.csv
covariates:
  - sources: precipitation.csv
    variable: precipitation
    minimum: 0.25
  - sources: temperature.csv
    variable: temperature
    maximum: 0
```
In this case, the evaluation pairs will include only those valid times when the temperature is at or below freezing, 0°C, and the precipitation equals or exceeds 0.25mm. The measurement units correspond to the unit in which the covariate data is defined. Currently, it is not possible to transform the measurement unit of a covariate prior to filtering. In addition, the values must be declared at the evaluation `time_scale`, whether or not this is declared explicitly. For example, if the evaluation is concerned with daily average streamflow, then each covariate filter should be declared as a daily value. However, the time scale function can be declared separately for each covariate using the `rescale_function`. For example:

```yaml
observed:
  sources: observations.csv
  variable: streamflow
predicted:
  sources: predictions.csv
  variable: streamflow
covariates:
  - sources: precipitation.csv
    variable: precipitation
    minimum: 0.25
    rescale_function: total
  - sources: temperature.csv
    variable: temperature
    maximum: 0
    rescale_function: minimum
time_scale:
  period: 24
  unit: hours
  function: mean
```
In this case, the subject of the evaluation is daily mean streamflow and the streamflow pairs will include only those valid times when the daily total precipitation equals or exceeds 0.25mm and the minimum daily temperature is at or below freezing.
Otherwise, all of the parameters that can be used to clarify an `observed` or `predicted` dataset can be used to clarify a covariate dataset (see How do I clarify the datasets to evaluate, such as the variable to use?).
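For instance, a covariate might declare a `time_zone_offset` in the same way as an `observed` dataset. A minimal sketch (the offset value here is purely illustrative):

```yaml
covariates:
  - sources: temperature.csv
    variable: temperature
    time_zone_offset: '-0600'
    maximum: 0
```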
You can declare multiple datasets by listing them. In this regard, YAML has two styles for collections, such as arrays, lists and maps. The ordinary or "block" style includes one item on each line. For example, if the `predicted` dataset contains several URIs, they may be declared as follows:

```yaml
observed: observed.csv
predicted:
  - predictions.csv
  - more_predictions.csv
  - yet_more_predictions.csv
```
In this context, the dashes and indentations are important to preserve. You should use two spaces for each new level of indentation, as in the example above.
Alternatively, you may use the “flow” style, which places all items in a continuous list or array and uses square brackets to begin and end the list:
```yaml
observed: observed.csv
predicted: [predictions.csv, more_predictions.csv, yet_more_predictions.csv]
```
In some cases, it may be necessary to clarify the datasets to evaluate. For example, if a URI references a dataset that contains multiple variables, it may be necessary to clarify the variable to evaluate. In other cases, it may be necessary to clarify the time zone offset associated with the time-series or to apply additional parameters that filter data from a web service request.
When clarifying a property of a dataset, it is necessary to distinguish it from the other properties. For example, if a URI refers to a dataset that contains some missing values and the missing value identifier is not clarified by the source format itself, then it may be necessary to clarify this within the declaration:
```yaml
observed:
  - uri: some_observations.csv
    missing_value: -999.0
  - more_predictions.csv
predicted: some_predictions.csv
```
Here, the `some_observations.csv` has now acquired a `uri` property, in order to distinguish it from the `missing_value`.
Likewise, it may be necessary to clarify some attribute of a dataset as a whole, such as the variable to evaluate (which applies to all sources of data within the dataset). In that case, it would be further necessary to distinguish the data `sources` from the `variable`:

```yaml
observed:
  sources:
    - uri: some_observations.csv
      missing_value: -999.0
    - more_predictions.csv
  variable: streamflow
predicted: some_predictions.csv
```
The following table contains the options that may be used to clarify either an `observed` or `predicted` dataset as of v6.14, with examples in context. You can also examine the schema, Does the declaration language use a schema?, which defines the superset of all possible evaluations supported by WRES.
Option | Purpose | Examples in context |
---|---|---|
`sources` | To clarify the list of sources to evaluate when other options are present for the dataset as a whole. | `observed:` … |
`uri` | To clarify the URI associated with a dataset when other options are present for the dataset associated with that URI. | `observed:` … |
`variable` | To clarify the variable to evaluate when a data source contains multiple variables. Optionally, one or more variable aliases may be included, which will be treated as equivalent to the named variable. | `observed:` … |
`feature_authority` | To clarify the feature authority used to name features. This may be required when correlating feature names across datasets. For example, to correlate a USGS Site Code of `06893000` with a National Weather Service "Handbook 5" feature name of `KCDM7`, it is either necessary to explicitly correlate these two names in the declaration or it is necessary to use one of the names and to resolve the correlated feature with a feature service request. For this request to succeed, the feature service will need to know that `06893000` is a `usgs site code` or, equivalently, that `KCDM7` is an `nws lid`. The supported values for the `feature_authority` are: `nws lid`, `usgs site code`, `nwm feature id` and `custom`, which is the default. | `observed:` … |
`type` | In rare cases, it may be necessary to clarify the type of dataset. For example, when requesting time-series datasets from web services that support multiple types of data, it may be necessary to clarify the type of data required. The supported values for the `type` are: `ensemble forecasts`, `single valued forecasts`, `observations`, `simulations` and `analyses`. | `observed: some_observations.csv` … |
`label` | A user-friendly label for the dataset, which will appear in the statistics formats, where appropriate. | `observed: some_observations.csv` … |
`ensemble_filter` | A filter that selects a subset of the ensemble forecasts to include in the evaluation or exclude from the evaluation. Only applies to datasets that contain ensemble forecasts. By default, the named members are included. | `observed: some_observations.csv` … |
`time_shift` | A time shift that is applied to the valid times associated with all time-series values. This may be used to help pair values whose times are not exactly coincident. | `observed:` … |
`time_scale` | The timescale associated with the time-series values. This may be necessary when the timescale is not explicitly included in the source format. In general, a timescale is only required when the time-series values must be rescaled in order to form pairs. For example, if the `observed` dataset contains instantaneous values and the `predicted` dataset contains values that represent a 6-hour average, then the `observed` time-series values must be "upscaled" to 6-hourly averages before they can be paired with their corresponding `predicted` values. Upscaling to a desired timescale is only possible if the existing timescale is known/declared. | `observed:` … |
`time_zone_offset` | The time zone offset associated with the dataset. This is only necessary when the source format does not explicitly identify the time zone in which the timestamps are recorded. Accepts either a quantitative time zone offset or, less precisely, a time zone shorthand, such as `CST` (Central Standard Time). When using a numeric offset, the value must be enclosed within single or double quotes to clarify that it should be treated as a time zone offset and not a number. | `observed:` … |
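As a sketch of several of these options in combination (the values are illustrative rather than taken from the original examples):

```yaml
observed:
  sources:
    - uri: some_observations.csv
      missing_value: -999.0
  variable: streamflow
  label: gauge observations
  type: observations
  time_zone_offset: '-0600'
  time_scale:
    period: 1
    unit: hours
    function: mean
predicted: some_predictions.csv
```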
The following table contains the additional options that may be used to clarify a `baseline` dataset as of v6.14, with examples in context. For the avoidance of doubt, these options extend the options available for an `observed` or `predicted` dataset.
Option | Purpose | Examples in context |
---|---|---|
`persistence` | Allows for the declaration of a persistence baseline from a prescribed data source. The persistence time-series will be generated using the specified order or "lag", which corresponds to the value before the current time that will be persisted forwards into the future. For example, "1" means that the value from the persistence source that occurs one timestep prior to the current time will be persisted forwards. In this context, "current time" means the valid time of a non-forecast source or the reference time of a forecast source. The default value for the order is 1. | `observed: some_observations.csv` … |
`climatology` | Allows for the declaration of a climatology baseline from a prescribed data source. For a given valid time, the climatology will contain the value from the prescribed data source at the corresponding valid time in each historical year of record, other than the year associated with the valid time (which is typically the "verifying observation"). The period associated with the climatology may be further constrained by a `minimum_date` and/or a `maximum_date`. Optionally, the climatology may be converted to a single-valued dataset by prescribing an `average`. The supported values for the `average` are `mean` and `median`. The default value for the `average` is `mean`. | `observed: some_observations.csv` … |
`separate_metrics` | A flag (`true` or `false`) that indicates whether the same metrics computed for the `predicted` dataset should also be computed for the `baseline` dataset. When `true`, all metrics will be computed for the `baseline` dataset, otherwise the `baseline` will only appear in skill calculations for the `predicted` dataset. | `observed: some_observations.csv` … |
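A minimal sketch of a persistence baseline, assuming the `persistence` option accepts the order directly (the key placements here are assumptions rather than confirmed examples):

```yaml
observed: some_observations.csv
predicted: some_forecasts.csv
baseline:
  sources: some_observations.csv
  persistence: 1  # assumption: the order or "lag"; the default is 1
  separate_metrics: true
```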
The following table contains the additional options that may be used to clarify `covariates` as of v6.23, with examples in context. For the avoidance of doubt, these options extend the options available for an `observed` or `predicted` dataset.
Option | Purpose | Examples in context |
---|---|---|
`minimum` | Allows for the declaration of a minimum value the covariate should take. Only those evaluation pairs will be considered when the covariate value is at or above the minimum value at the same valid time. The measurement unit is the unit in which the covariate dataset is supplied. The timescale is the evaluation timescale. | `observed: some_observations.csv` … |
`maximum` | Allows for the declaration of a maximum value the covariate should take. Only those evaluation pairs will be considered when the covariate value is at or below the maximum value at the same valid time. The measurement unit is the unit in which the covariate dataset is supplied. The timescale is the evaluation timescale. | `observed: some_observations.csv` … |
`rescale_function` | A function to use when rescaling the covariate dataset to the evaluation timescale. | `observed: some_observations.csv` … |
Geographic features may be declared explicitly by listing each feature to evaluate (How do I declare a list of geographic features to evaluate?). Alternatively, they may be declared implicitly, either by declaring a named region to evaluate (How do I declare a region to evaluate without listing all of the features within it?) or by declaring a geospatial mask (How do I declare a spatial mask to evaluate only a subset of features?).
There are three scenarios in which you should declare the geographic features to evaluate, namely:
- When the declared datasets contain more features than you would like to evaluate, i.e., when you would like to evaluate a subset of the features for which data is available;
- When you are reading data from a web service; otherwise, the evaluation would request a potentially unlimited amount of data; or
- When there are multiple geographic features present within the declared datasets and two or more of the datasets use a different feature naming authority. In these circumstances, it is necessary to declare how the features are correlated with each other.
If you fail to declare the geographic features in any of these scenarios, you can expect an error message.
Conversely, it is unnecessary to declare the geographic features to evaluate when:
- There is a single geographic feature in each dataset; or
- There are multiple geographic features and:
  - All of the datasets use a consistent feature naming authority; and
  - The evaluation should include all of the features discovered.
Different datasets may name geographic features differently. Formally speaking, they may use different “feature authorities”. For example, time-series data from the USGS National Water Information System (NWIS) uses a USGS Site Code, whereas time-series data from the National Water Model uses a National Water Model feature ID.
As such, the software allows for as many feature names as sides of data, i.e., three (`observed`, `predicted` and `baseline`). This is referred to as a "feature tuple".
When all sides of data have the same feature authority, and this can be established from the other declaration present, it is sufficient to declare the name for only one side of data. Otherwise, the fully qualified feature tuple must be declared or a feature service used to establish the missing names.
In the simplest case, where each side of data has the same feature authority and the aim is to pair corresponding feature names, the features may be declared as follows:
```yaml
features:
  - DRRC2
  - DOLC2
```

Where `DRRC2` and `DOLC2` are the names of two geographic features in the National Weather Service "Handbook 5" feature authority. In this example, the evaluation will produce statistics separately for each of `DRRC2` and `DOLC2`.
In the more complex case, where each side of data has a separate feature authority or the feature authority cannot be determined from the other information present, then the features must be declared separately for each side of data, as follows:
```yaml
features:
  - {observed: '07140900', predicted: '21215289'}
  - {observed: '07141900', predicted: '941030274'}
```

In this example, the feature authority for the `observed` data is a USGS Site Code and the feature authority for the `predicted` data is a National Water Model feature ID. The quotes around the names indicate that the values should be treated as characters, rather than numbers.
Yes. Often, this is unnecessary because the software can determine the feature authority from the other information present. For example, consider the following declaration:
```yaml
observed:
  sources:
    - uri: https://nwis.waterservices.usgs.gov/nwis/iv
      interface: usgs nwis
  variable:
    name: '00060'
predicted:
  sources:
    - uri: data/nwmVector/
      interface: nwm short range channel rt conus
  variable: streamflow
```
In this case, it is unambiguous that the `observed` data uses a USGS Site Code because the source `interface` is `usgs nwis`. Likewise, the `predicted` data uses a National Water Model feature ID because the source `interface` is a National Water Model type, `nwm short range channel rt conus`. In short, if the source `interface` is declared, it should be unnecessary to define the geographic feature authority.
In other cases, time-series data may be obtained from a file source whose metadata is unclear about the feature authority. In fact, none of the time-series data formats currently supported by WRES include information about the feature authority. In this case, the feature authority may be declared explicitly:
```yaml
observed:
  sources: data/DRRC2QINE.xml
  feature_authority: nws lid
predicted:
  sources: data/drrc2ForecastsOneMonth/
  feature_authority: nws lid
```
The above unlocks the following as a valid declaration:
```yaml
observed:
  sources: data/DRRC2QINE.xml
  feature_authority: nws lid
predicted:
  sources: data/drrc2ForecastsOneMonth/
  feature_authority: nws lid
features:
  - DRRC2
```
Conversely, in the absence of the declared `feature_authority` for each side of data, this would be required:

```yaml
observed:
  sources: data/DRRC2QINE.xml
predicted:
  sources: data/drrc2ForecastsOneMonth/
features:
  - {observed: DRRC2, predicted: DRRC2}
```
If you are using datasets with different feature authorities and are either unaware of how features relate to each other across the different feature authorities or prefer not to declare them manually, then you can use the Water Resources Data Service (WRDS) feature service to establish feature correlations. The WRDS is available to those with access to web services hosted at the National Water Center (NWC) in Alabama. The WRDS hostname is omitted below; if you need the hostname, refer to the COWRES user support wiki or contact the WRES team.
A feature service may be declared as follows:
```yaml
feature_service: https://[WRDS]/api/location/v3.0/metadata
```

Where `[WRDS]` is the host name of the WRDS feature service.
The WRES can ask the WRDS feature service to resolve feature correlations, providing it knows how to pose the question correctly. To pose the question correctly, it must know the feature authority associated with each of the feature names that need to be correlated.
For example, consider the following declaration:
```yaml
observed:
  sources:
    - uri: https://nwis.waterservices.usgs.gov/nwis/iv
      interface: usgs nwis
  variable:
    name: '00060'
predicted:
  sources:
    - uri: data/nwmVector/
      interface: nwm short range channel rt conus
  variable: streamflow
feature_service: https://[WRDS]/api/location/v3.0/metadata
features:
  - observed: '07140900'
  - observed: '07141900'
```
In this case, the feature authority of the `observed` data is a USGS Site Code (the `interface` is `usgs nwis`) and the feature authority of the `predicted` data is a National Water Model feature ID. This allows the WRES to pose a valid question to the WRDS feature service, namely "what are the National Water Model feature IDs that correspond to USGS Site Codes '07140900' and '07141900'?". It is important to note that each feature must be qualified as `observed` because the feature names are expressed as USGS Site Codes and the `observed` data uses this feature authority.
You may use the WRDS Feature Service to acquire a list of features for a named geographic region, such as a River Forecast Center (RFC). The WRDS is available to those with access to web services hosted at the National Water Center (NWC) in Alabama. The WRDS hostname is omitted below; if you need the hostname, refer to the COWRES user support wiki or contact the WRES team.
For example, consider the following declaration, which requests all named features within the Arkansas-Red Basin RFC:
```yaml
feature_service:
  uri: https://[WRDS]/api/location/v3.0/metadata
  group: RFC
  value: ABRFC
```

Where `[WRDS]` is the host name of the WRDS feature service. Here, the name of the geographic `group` understood by WRDS is `RFC` and the chosen `value` is `ABRFC`.
In this example, each of the geographic features contained within ABRFC, as understood by WRDS, would be included in the evaluation. To include features from multiple regions, simply list the individual regions. For example, to additionally include features from the California Nevada RFC:
```yaml
feature_service:
  uri: https://[WRDS]/api/location/v3.0/metadata
  groups:
    - group: RFC
      value: ABRFC
    - group: RFC
      value: CNRFC
```
By default, each geographic feature is evaluated separately. However, to pool all of the geographic features together and produce a single set of statistics for the overall group, the `pool` attribute may be declared:

```yaml
feature_service:
  uri: https://[WRDS]/api/location/v3.0/metadata
  groups:
    - group: RFC
      value: ABRFC
      pool: true
```
Yes, you can declare a spatial mask that defines the geospatial boundaries for an evaluation. This requires a Well Known Text (WKT) string. For example:
```yaml
spatial_mask: 'POLYGON ((-76.825 39.225, -76.825 39.275, -76.775 39.275, -76.775 39.225, -76.825 39.225))'
```
In this case, the evaluation will only include (e.g., gridded) locations that fall within the boundaries of the supplied polygon.
Optionally, you may name the region and include a Spatial Reference System Identifier (SRID), which unambiguously describes the coordinate reference system for the supplied WKT: https://en.wikipedia.org/wiki/Spatial_reference_system:
```yaml
spatial_mask:
  name: Region south of Ellicott City, MD
  wkt: 'POLYGON ((-76.825 39.225, -76.825 39.275, -76.775 39.275, -76.775 39.225, -76.825 39.225))'
  srid: 4326
```
When evaluating an elevation variable, such as river stage, one or more of the declared datasets may be referenced to a different elevation datum than the remaining datasets. For example, the `observed` river stage may be referenced to a local gauge datum and the `predicted` river stage may be referenced to mean sea level. To reconcile these measurements to a common datum for comparison and evaluation, a datum `offset` may be declared for the relevant dataset associated with each geographic `feature`. The datum `offset` is then added to the existing elevation before pairs and statistics are computed. The datum `offset` is declared in evaluation units. In the following example, a datum offset of -975 feet is added to the elevation data associated with feature `15478000`, while no offset is applied to the corresponding feature, `BGDA2`:

```yaml
unit: '[ft_i]'
features:
  - observed:
      name: '15478000'
      offset: -975
    predicted: BGDA2
```
You should filter the time-series data in either of these scenarios:
- When the goal is to evaluate only a subset of the available time-series data; or
- When reading data from a web service; otherwise, the evaluation would request a potentially unlimited amount of data.
An evaluation may be composed of up to three timelines, depending on the type of data to evaluate:

- Valid times. These are the ordinary datetimes at which values are recorded. For example, if streamflow is observed at `2023-03-25T12:00:00Z`, then its "valid time" is `2023-03-25T12:00:00Z`.
- Reference times. These are the times to which forecasts are referenced. In practice, there are different flavors of forecast reference times, such as forecast "issued times", which may correspond to the times at which forecast products are released to the public, or "T0s", which may correspond to the times at which a forecast model begins forward integration. However, as of v6.14, all reference times are treated as equivalent.
- Lead times. These are durations rather than datetimes and refer to the period elapsed between a forecast reference time and a forecast valid time. For example, if a forecast is issued at `2023-03-25T12:00:00Z` and valid at `2023-03-25T13:00:00Z`, then its lead time is "1 hour".

The last two timelines only apply to forecast datasets.
Datetimes are always declared using an ISO8601 datetime string in Coordinated Universal Time (UTC), aka Zulu (Z) time. Further information about ISO8601 can be found here: https://en.wikipedia.org/wiki/ISO_8601
Each of these timelines can be constrained or filtered so that the evaluation only considers data between the prescribed datetimes. These bounds always form a closed interval, meaning that times that fall exactly on either boundary are included.
Consider the following declaration of a valid time interval:
```yaml
valid_dates:
  minimum: 2017-08-07T23:00:00Z
  maximum: 2017-08-09T17:00:00Z
```

In this case, the evaluation will consider all time-series values whose valid times are between `2017-08-07T23:00:00Z` and `2017-08-09T17:00:00Z`, inclusive.
The following is also accepted:

```yaml
valid_dates:
  minimum: 2017-08-07T23:00:00Z
```

In this case, there is a lower bound or `minimum` date, but no upper bound, so the evaluation will consider time-series values whose valid times occur on or after `2017-08-07T23:00:00Z`.
A reference time interval may be declared in a similar way:

```yaml
reference_dates:
  minimum: 2017-08-07T23:00:00Z
  maximum: 2017-08-08T23:00:00Z
```
Finally, lead times may be constrained like this:

```yaml
lead_times:
  minimum: 0
  maximum: 18
  unit: hours
```
In this example, the evaluation will only consider forecast values whose lead times are between 0 hours and 18 hours, inclusive.
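These filters may be combined in a single declaration. For example, the following sketch constrains all three timelines at once, reusing the values from the examples above:

```yaml
reference_dates:
  minimum: 2017-08-07T23:00:00Z
  maximum: 2017-08-08T23:00:00Z
valid_dates:
  minimum: 2017-08-07T23:00:00Z
  maximum: 2017-08-09T17:00:00Z
lead_times:
  minimum: 0
  maximum: 18
  unit: hours
```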
When using model analyses in an evaluation, these analyses are sometimes referenced to the model initialization time, which is a particular flavor of reference time. For example, the National Water Model can cycle for hourly periods prior to the forecast initialization time and produce an “analysis” for each period. These analysis durations may be constrained in WRES.
For example, consider the following declaration:
```yaml
analysis_times:
  minimum: -2
  maximum: 0
  unit: hours
```
In this case, the evaluation will consider analysis cycles that are less than 2 hours before the model initialization time, up to the initialization time of 0 hours.
The WRES allows for a seasonal evaluation to be declared through a `season` filter. The `season` filter will apply to the valid times associated with the pairs when both sides of the pairing contain non-forecast sources (i.e., there are no reference times present); otherwise it will apply to the reference times (i.e., when one or both sides of the pairing contain forecasts).
A seasonal evaluation is declared with a minimum day and month and a maximum day and month. For example:
```yaml
season:
  minimum_day: 1
  minimum_month: 4
  maximum_day: 31
  maximum_month: 7
```
In this example, the evaluation will consider only those pairs whose valid times (non-forecast sources) or reference times (forecast sources) fall between 0Z on 1 April and an instant before 0Z on 1 August (i.e., the very last time on 31 July).
The desired measurement units are declared as follows:
```yaml
unit: m3/s
```
The unit may be any valid Unified Code for Units of Measure (UCUM). In addition, the WRES will accept several informal measurement units that are widely used in hydrology, such as CFS (cubic feet per second, formal UCUM unit `[ft_i]3/s`), CMS (cubic meters per second, formal UCUM unit `m3/s`) and IN (inches, formal UCUM unit `[in_i]`).
Further details on units of measurement can be found in a separate wiki, Units of measurement.
If a data source contains a measurement unit that is unrecognized by WRES, you may receive an `UnrecognizedUnitException` indicating that a measurement unit alias should be defined. A measurement unit alias is a mapping between an unrecognized or informal measurement unit, known as an `alias`, and a formal UCUM unit, known as a `unit`. For example, consider the following declaration:
```yaml
unit: K
unit_aliases:
  - alias: °F
    unit: '[degF]'
  - alias: °C
    unit: '[cel]'
```
In this example, `°F` and `°C` are informal measurement units whose corresponding UCUM units are `[degF]` and `[cel]`, respectively. The desired measurement unit is `K`, or kelvin. By declaring `unit_aliases`, the WRES will understand that any references to `°F` should be interpreted as the formal unit, `[degF]`, and any references to `°C` should be interpreted as the formal unit, `[cel]`. This will allow the software to convert the informal units of `°F` and `°C`, on the one hand, to the formal unit of `K`, on the other.
Further information about units of measurement and aliases can be found in a separate wiki, Units of measurement.
In some cases, it is necessary to omit values that fall outside a particular range. For example, it may be desirable to evaluate only precipitation forecasts whose values are greater than an instrument detection limit. Restricting values to a particular range is achieved by declaring the `minimum` and/or `maximum` values that the evaluation should consider, as follows:

```yaml
unit: mm
values:
  minimum: 0.0
  maximum: 100.0
```
In this example, only those values (`observed`, `predicted` and `baseline`) that fall within the range 0mm to 100mm will be considered. The values are always declared in evaluation units. Mechanically speaking, any values that fall outside this range will be assigned the default missing value identifier.
Optionally, however, values that fall outside of the nominated range may be assigned another value. For example:
```yaml
unit: mm
values:
  minimum: 0.25
  maximum: 100.0
  below_minimum: 0.0
  above_maximum: 100.0
```
In this example, values that are less than 0.25mm will be assigned a value of 0mm (the `below_minimum` value) and values above 100mm will be assigned a value of 100mm (the `above_maximum` value).
There are three flavors of thresholds that may be declared:

- Ordinary thresholds (`thresholds`), which are real-valued. If not otherwise declared, the threshold values are assumed to be in the same measurement units as the evaluation;
- Probability thresholds (`probability_thresholds`), whose values must fall within the interval [0,1]. These are converted into real-valued thresholds by finding the corresponding quantile of the `observed` dataset; and
- Classifier thresholds (`classifier_thresholds`), whose values must fall within the interval [0,1]. These are used to convert probability forecasts into dichotomous (yes/no) forecasts.
The simplest use of thresholds may look like this, in context:

```yaml
observed: some_observations.csv
predicted: some_forecasts.csv
unit: ft
thresholds: 12.3
```

In this case, the evaluation will consider only those pairs of `observed` and `predicted` values where the `observed` value exceeds 12.3 FT.
There are several other attributes that may be declared alongside the threshold value(s). For example, consider this declaration:
```yaml
observed: some_observations.csv
predicted: some_forecasts.csv
unit: m
thresholds:
  name: MAJOR FLOOD
  values:
    - { value: 23.0, feature: DRRC2 }
    - { value: 27.0, feature: DOLC2 }
  operator: greater equal
  apply_to: predicted
  unit: ft
```
In this example, the evaluation will consider only those pairs of `observed` and `predicted` values at `DRRC2` where the `predicted` value is greater than or equal to 23.0 FT and only those paired values at `DOLC2` where the `predicted` value is greater than or equal to 27.0 FT. Further, for both locations, this threshold will be labelled `MAJOR FLOOD`. The evaluation itself will be conducted in units of `m` (meters), so these thresholds will be converted from `ft` to `m` prior to evaluation.
The acceptable values for the `operator` include:

- `greater`;
- `greater equal`;
- `less`;
- `less equal`; and
- `equal`.
The acceptable values for the `apply_to` include:

- `observed`: include the pair when the condition is met for the `observed` value;
- `predicted`: include the pair when the condition is met for the `predicted` value (or `baseline` predicted value for baseline pairs);
- `observed and predicted`: include the pair when the condition is met for both the `observed` and `predicted` values (or `baseline` predicted value for baseline pairs);
- `any predicted`: include the pair when the condition is met for any of the `predicted` values within an ensemble (or `baseline` predicted value for baseline pairs);
- `observed and any predicted`: include the pair when the condition is met for both the `observed` value and for any of the `predicted` values within an ensemble (or `baseline` predicted value for baseline pairs);
- `predicted mean`: include the pair when the condition is met for the ensemble mean of the `predicted` values (or `baseline` predicted value for baseline pairs); and
- `observed and predicted mean`: include the pair when the condition is met for both the `observed` value and the ensemble mean of the `predicted` values (or `baseline` predicted value for baseline pairs).
The `apply_to` is only relevant when filtering pairs for metrics that apply to continuous variables, such as the mean error (e.g., of streamflow predictions), and not when transforming pairs, such as converting continuous pairs to probabilistic or dichotomous pairs. For the latter, both sides of the pairing are always transformed, by definition.
The probability thresholds and classifier thresholds may be declared in a similar way. For example:
```yaml
observed: some_observations.csv
predicted: some_forecasts.csv
unit: ft
probability_thresholds: [0.1,0.5,0.9]
```
In this example, the evaluation will consider only those pairs of `observed` and `predicted` values where the `observed` value is greater than each of the 10th, 50th and 90th percentiles of the `observed` values.
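Classifier thresholds follow the same list style. As a sketch, assuming an ensemble dataset and a dichotomous metric such as the `probability of detection`, the classifier thresholds convert the predicted probabilities into yes/no occurrences:

```yaml
observed: some_observations.csv
predicted: some_ensemble_forecasts.csv
unit: ft
probability_thresholds: [0.9]
classifier_thresholds: [0.25, 0.5, 0.75]
metrics:
  - probability of detection
```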
All of the declaration options for thresholds that are applied to the evaluation as a whole can be applied equally to individual metrics within the evaluation, if desired. For example, consider the following declaration:
```yaml
observed: some_file.csv
predicted: another_file.csv
unit: ft
metrics:
  - name: mean square error skill score
    thresholds: 23
  - name: pearson correlation coefficient
    probability_thresholds:
      values: [0.1,0.2]
      operator: greater equal
```
In this example, the `mean square error skill score` will be computed for those pairs of `observed` and `predicted` values where the `observed` value exceeds 23.0 FT. Meanwhile, the `pearson correlation coefficient` will be computed for those pairs of `observed` and `predicted` values where the `observed` value is greater than or equal to the 10th percentile of `observed` values and, separately, the 20th percentile of `observed` values.
Yes. An evaluation may declare thresholds from one or both of these external sources:

- The Water Resources Data Service (WRDS) threshold service; and
- Comma separated values from a file on the default filesystem.
For those users with access to the WRDS threshold service, the WRES will request thresholds from the WRDS when declared. The WRDS is available to those with access to web services hosted at the National Water Center (NWC) in Alabama. The WRDS hostname is omitted below; if you need the hostname, refer to the COWRES user support wiki or contact the WRES team. Consider the following declaration:
```yaml
observed:
  sources: data/CKLN6_STG.xml
  feature_authority: nws lid
predicted: data/CKLN6_HEFS_STG_forecasts.tgz
features:
  - observed: CKLN6
threshold_sources: https://[WRDS]/api/location/v3.0/nws_threshold/
```
Where `[WRDS]` is the host name for the WRDS production service (to be inserted). Note the use of `feature_authority`, which is important in this context. In particular, it allows WRES to pose a complete/accurate request to WRDS, namely "please provide the streamflow thresholds associated with an NWS LID of CKLN6". By default, the WRES will request streamflow thresholds unless otherwise declared.
Consider a more complicated declaration:
```yaml
observed:
  sources:
    - uri: https://nwis.waterservices.usgs.gov/nwis/iv
      interface: usgs nwis
  variable:
    name: '00060'
predicted:
  sources:
    - uri: data/nwmVector/
      interface: nwm short range channel rt conus
  variable: streamflow
features:
  - {observed: '07140900', predicted: '21215289'}
  - {observed: '07141900', predicted: '941030274'}
threshold_sources:
  uri: https://[WRDS]/api/location/v3.0/nws_threshold/
  parameter: stage
  provider: NWS-NRLDB
  rating_provider: NRLDB
  missing_value: -999.0
  feature_name_from: predicted
```
In this example, the WRES will ask WRDS to provide all thresholds for the `parameter` of `stage`, the `provider` of `NWS-NRLDB` and the `rating_provider` of `NRLDB`, and for those geographic features with NWM feature IDs of `21215289` and `941030274`. Furthermore, the evaluation will consider any threshold values of -999.0 to be missing values.
Thresholds may be read from CSV files in a similar way to thresholds from the Water Resources Data Service (WRDS). For example, consider the following declaration:
```yaml
threshold_sources: data/thresholds.csv
```
In this example, thresholds will be read from the path `data/thresholds.csv` on the default filesystem. By default, they will be treated as ordinary, real-valued thresholds in the same units as the evaluation and for the same variable.
The options available to qualify thresholds from WRDS are also available to qualify thresholds from CSV files. For example, consider the following declaration:
```yaml
threshold_sources:
  - uri: data/thresholds.csv
    missing_value: -999.0
    feature_name_from: observed
  - uri: data/more_thresholds.csv
    missing_value: -999.0
    feature_name_from: predicted
    type: probability
```
In this example, thresholds will be read from two separate paths on the default filesystem, namely `data/thresholds.csv` and `data/more_thresholds.csv`. The thresholds from `data/thresholds.csv` will be treated as ordinary, real-valued thresholds whose feature names correspond to the `observed` dataset. Conversely, the thresholds from `data/more_thresholds.csv` will be treated as `probability` thresholds whose feature names correspond to the `predicted` dataset. In both cases, values of -999.0 are considered to be missing values.
By way of example, the CSV format should contain a location or geographic feature identifier in the first column, labelled `locationId`, and one conceptual threshold per column in the remaining columns, with each column header containing the name of that threshold, if appropriate (otherwise blank), and each row containing a separate location:

```
locationId, ACTION, MINOR FLOOD
CKLN6, 10, 12
WALN6, 7.5, 9.5
```
A "pool" is the atomic unit of paired data from which a statistic is computed. Typically, there are many pools of pairs in each evaluation. For example, considering pooling over time (temporal pooling): if the goal is to evaluate a collection of forecasts at each forecast lead time separately, and all of the forecasts contain 3-hourly lead times for 2 days, then there are (24/3) × 2 = 16 lead times and hence 16 pools of data to evaluate.
Pooling can be done temporally (over time) or spatially (over features), both of which are described here.
In general, an evaluation will require a regular sequence of pools along one or more of the timelines described in What timelines are understood by WRES and how do I constrain them?, namely:
- Valid times;
- Reference times (of forecasts); and
- Lead times (of forecasts).
There is a consistent grammar for declaring a regular sequence of pools along each of these timelines. In each case, the sequence begins at the `minimum` value and ends at the `maximum` value associated with the corresponding timeline described in What timelines are understood by WRES and how do I constrain them?. For the same reason, a sequence of pools requires both a constraint on the timeline and the pool sequence itself. For example:
```yaml
reference_dates:
  minimum: 2023-03-17T00:00:00Z
  maximum: 2023-03-19T19:00:00Z
reference_date_pools:
  period: 13
  unit: hours
```
In this example, there is a regular sequence of reference time pools. The sequence begins at `2023-03-17T00:00:00Z` and ends at `2023-03-19T19:00:00Z`, inclusive. Each pool is 13 hours wide and a new pool begins every 13 hours. In other words, the pools are not overlapping, by default. Using interval notation, the above declaration would produce the following sequence of pools, where `(` means that the lower boundary is excluded and `]` means that the upper boundary is included:
- Pool `rp1`: (2023-03-17T00:00:00Z, 2023-03-17T13:00:00Z]
- Pool `rp2`: (2023-03-17T13:00:00Z, 2023-03-18T02:00:00Z]
- Pool `rp3`: (2023-03-18T02:00:00Z, 2023-03-18T15:00:00Z]
- Pool `rp4`: (2023-03-18T15:00:00Z, 2023-03-19T04:00:00Z]
- Pool `rp5`: (2023-03-19T04:00:00Z, 2023-03-19T17:00:00Z]
Note that there is no pool `rp6` because a pool cannot partially overlap the `minimum` or `maximum` dates on the timeline.
If we assume that four separate forecasts were issued, beginning at `2023-03-17T00:00:00Z` and repeating every 12 hours, then the timeline may be visualized as follows, where `fc` is a forecast whose reference time is denoted `0` and `rp` is a reference date pool:
```
fc1: 0 v v v v v v v v v v v v v v v v
fc2: 0 v v v v v v v v v v v v v v v v
fc3: 0 v v v v v v v v v v v v v v v v
fc4: 0 v v v v v v v v v v v v v v v v
time: ─┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼───
      16th  17th  17th  17th  17th  18th  18th  18th  18th  19th  19th  19th  19th  20th  20th  20th  20th  21st
      18Z   00Z   06Z   12Z   18Z   00Z   06Z   12Z   18Z   00Z   06Z   12Z   18Z   00Z   06Z   12Z   18Z   00Z
boundaries: ├ ┤
rp1: └────────────┘ rp3: └────────────┘ rp5: └────────────┘
rp2: └────────────┘ rp4: └────────────┘
```
In this example, `fc1` would fall in pool `rp1`, `fc2` would fall in pool `rp2`, and so on. Pool `rp5` would contain no data because there are no reference times that fall within it.
A regular sequence of valid time pools or lead time pools may be declared in a similar way. For example, the equivalent pools by valid time are:

```yaml
valid_dates:
  minimum: 2023-03-17T00:00:00Z
  maximum: 2023-03-19T19:00:00Z
valid_date_pools:
  period: 13
  unit: hours
```
A similar sequence of lead time pools may be declared as follows:

```yaml
lead_times:
  minimum: 0
  maximum: 44
  unit: hours
lead_time_pools:
  period: 13
  unit: hours
```
Yes, pools may overlap or underlap each other; in other words, the pool boundaries may not abut perfectly. This is achieved by declaring a `frequency`, which operates alongside the `period`. For example:
```yaml
reference_dates:
  minimum: 2023-03-17T00:00:00Z
  maximum: 2023-03-19T19:00:00Z
reference_date_pools:
  period: 13
  frequency: 7
  unit: hours
```
In this case, a new reference time pool will begin every 7 hours and each pool will be 13 hours wide. To continue the above example and visualization:
```
fc1: 0 v v v v v v v v v v v v v v v v
fc2: 0 v v v v v v v v v v v v v v v v
fc3: 0 v v v v v v v v v v v v v v v v
fc4: 0 v v v v v v v v v v v v v v v v
time: ─┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┼───
      16th  17th  17th  17th  17th  18th  18th  18th  18th  19th  19th  19th  19th  20th  20th  20th  20th  21st
      18Z   00Z   06Z   12Z   18Z   00Z   06Z   12Z   18Z   00Z   06Z   12Z   18Z   00Z   06Z   12Z   18Z   00Z
boundaries: ├ ┤
rp1: └────────────┘ rp4: └────────────┘ rp7: └────────────┘
rp2: └────────────┘ rp5: └────────────┘ rp8: └────────────┘
rp3: └────────────┘ rp6: └────────────┘
```
Here, pools `rp1` through `rp7` each contain one forecast and pool `rp8` contains no forecasts.
Yes, an explicit or irregular sequence of pools may be declared using `time_pools`. These pools can be declared instead of, or in addition to, a regular sequence. For example, the following declaration contains a regular sequence of `lead_time_pools` that span 0 to 120 hours, every 6 hours, as well as an explicit pool that spans 0 to 5 days:
```yaml
lead_time_pools:
  period: 6
  unit: hours
lead_times:
  minimum: 0
  maximum: 120
  unit: hours
time_pools:
  - lead_times:
      minimum: 0
      maximum: 5
      unit: days
```
When declaring explicit `time_pools`, any of the `lead_times`, `valid_dates` or `reference_dates` may be declared. In the following example, there are two explicit time pools. The first pool considers `valid_dates` between `1995-03-18T00:00:00Z` and `1995-03-21T00:00:00Z` and the second pool considers `valid_dates` between `1995-03-21T00:00:00Z` and `1995-03-27T00:00:00Z`, as well as `reference_dates` between `1995-03-21T06:00:00Z` and `1995-03-29T06:00:00Z`. Each list item, denoted by a `-`, begins a new pool and each pool may contain up to three time dimensions, as noted above.
```yaml
time_pools:
  - valid_dates:
      minimum: 1995-03-18T00:00:00Z
      maximum: 1995-03-21T00:00:00Z
  - valid_dates:
      minimum: 1995-03-21T00:00:00Z
      maximum: 1995-03-27T00:00:00Z
    reference_dates:
      minimum: 1995-03-21T06:00:00Z
      maximum: 1995-03-29T06:00:00Z
```
For consistency with pools that are declared in a regular sequence, the `minimum` is always exclusive, whereas the `maximum` is inclusive. This allows for pools to overlap on a common boundary, which is more intuitive to declare, without the same time-series events falling into two separate pools, which is generally unintended/undesirable. However, in all contexts other than `time_pools`, the `minimum` and `maximum` values are inclusive.
An evaluation answers a question (e.g., about forecast quality). When that question is concerned with a geographic area or region, it may be appropriate to gather and pool together data from several geographic features. However, there will be cases where pooling over geographic features is inappropriate, such as when evaluating land surface variables, where results may vary significantly between features.
The why and how of pooling over geographic features is described in Pooling geographic features.
The desired timescale associated with the evaluation is declarative, which means that it may be different than the timescale of the existing datasets. However, the WRES currently only supports limited forms of “upscaling” (increasing the timescale of existing datasets) and does not support “downscaling” (reducing the timescale of existing datasets). More information about the timescale and rescaling can be found here: Time Scale and Rescaling Time Series.
A fixed timescale contains three elements, namely:

- The `period`, which is the number of time units to which the value applies;
- The time `unit` associated with the `period`. Supported values include:
  - `seconds`;
  - `minutes`;
  - `hours`; and
  - `days`; and
- The `function`, which describes how the value is distributed over the `period`. Supported values include:
  - `mean`;
  - `minimum`;
  - `maximum`; and
  - `total`.
For example, to declare a desired timescale that represents a mean value over a 6-hour period, use the following:
```yaml
time_scale:
  function: mean
  period: 6
  unit: hours
```
Yes. The desired timescale can span an explicit `period` that begins or ends on a particular date, or an implicit (and potentially varying) period that begins and ends on nominated dates. For example, to declare a timescale that represents a maximum value that occurs between 0Z on 1 April and the instant before 0Z on 1 August (i.e., the end of 31 July), declare the following:
```yaml
time_scale:
  function: maximum
  minimum_day: 1
  minimum_month: 4
  maximum_day: 31
  maximum_month: 7
```
More information and examples can be found here: Time Scale and Rescaling Time Series.
In principle, you don’t. Recall the simplest possible evaluation described in What is the simplest possible evaluation I can declare?:
```yaml
observed: observations.csv
predicted: predictions.csv
```
When no metrics are declared explicitly, the software will read the time-series data and evaluate all metrics that are appropriate for the types of data discovered. For example, if one of the data sources contains ensemble forecasts, then the software will include all metrics that are appropriate for ensemble forecasts.
While the metrics can be chosen by the software, it is often desirable to calculate only a subset of the metrics that are technically valid for a given type of data. A list of metrics may be declared as follows:
```yaml
metrics:
  - sample size
  - mean error
  - mean square error
```

The list of supported metrics is provided here: List of metrics available.
In rare cases, it may be necessary to declare parameter values for some metrics. For example, if graphics formats are required for some metrics and not others, you can indicate that specific graphics formats should be omitted for some metrics:
```yaml
metrics:
  - sample size
  - mean error
  - name: ensemble quantile quantile diagram
    png: false
    svg: false
  - mean square error
```
In this example, the `png` and `svg` graphics formats would be omitted for the `ensemble quantile quantile diagram`. Note that, in order to distinguish the metric `name` from the parameter values, the `name` key is now declared explicitly for the `ensemble quantile quantile diagram`, but is not required for the other metrics, as they do not have parameters.
The currently supported parameter values are tabulated below.
Parameter | Applicable metrics | Purpose | Example in context |
---|---|---|---|
`png` | All. | A flag that allows for Portable Network Graphics (PNG) to be turned on (`true`) or off (`false`). | `metrics:` … |
`svg` | All. | A flag that allows for Scalable Vector Graphics (SVG) to be turned on (`true`) or off (`false`). | `metrics:` … |
`thresholds` | All. | Allows `thresholds` to be declared for a specific metric (rather than all metrics). To ensure that the metric is computed for the superset of pairs or "all data" only, and not for any other declared thresholds, you may use `thresholds: all data`. | `metrics:` … |
`probability_thresholds` | All. | Allows `probability_thresholds` to be declared for a specific metric (rather than all metrics). | `metrics:` … |
`classifier_thresholds` | All dichotomous metrics (e.g., `probability of detection`). | Allows `classifier_thresholds` to be declared for a specific, dichotomous metric (rather than all dichotomous metrics). | `metrics:` … |
`ensemble_average` | All single-valued metrics as they relate to ensemble forecasts (e.g., `mean error`). | A function to use when deriving a single value from an ensemble of values. For example, to calculate the ensemble mean, the `ensemble_average` should be `mean`. The supported values are `mean` and `median`. | `metrics:` … |
`summary_statistics` | All time-series metrics (e.g., `time to peak error`). | A collection of summary statistics to calculate from the distribution of time-series errors. For example, when calculating the `time to peak error`, there is one error value for each forecast and hence a distribution of errors across all forecasts. When declaring the `median` in this context, the median `time to peak error` will be reported alongside the distribution of errors. The supported values are `mean`, `median`, `minimum`, `maximum`, `mean absolute` and `standard deviation`. | `metrics:` … |
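As a sketch of several of these parameters in combination (the values are illustrative; `mean error` and `probability of detection` are documented metric names):

```yaml
metrics:
  - name: mean error
    thresholds: all data
    ensemble_average: median
  - name: probability of detection
    classifier_thresholds: [0.5]
  - sample size
```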
Summary statistics can be used to describe or summarize a broader collection of evaluation statistics, such as the statistics associated with all geographic features in an evaluation. Further information about summary statistics is available here: Evaluation summary statistics.
Summary statistics are declared as a list of `summary_statistics`. For example:
```yaml
summary_statistics:
  - mean
  - standard deviation
```
By default, summary statistics are calculated across all geographic features. Optionally, the `dimensions` to summarize may be declared explicitly. For example:
```yaml
summary_statistics:
  statistics:
    - mean
    - standard deviation
  dimensions:
    - features
    - feature groups
```
In this example, the `features` option indicates that summary statistics should be calculated for all geographic features within the evaluation. These features may be declared explicitly as `features` or using a `feature_service` with one or more `group` whose `pool` option is set to `false`, or they may be declared implicitly with `sources` that contain time-series data for named features. In addition, the `feature groups` option indicates that summary statistics should be calculated for each geographic feature group separately. These feature groups may be declared as `feature_groups` or using a `feature_service` with one or more `group` whose `pool` option is set to `true`. When declaring summary statistics for `feature groups`, one or more feature groups must also be declared.
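As a sketch of an explicit feature group (the `feature_groups` option is referenced above, but the exact key names used here, such as `name` and `features`, are assumptions; see Pooling geographic features for the confirmed grammar):

```yaml
feature_groups:
  - name: upper basin  # assumption: a user-chosen group name
    features:          # assumption: the features comprising the group
      - DRRC2
      - DOLC2
summary_statistics:
  statistics:
    - mean
  dimensions:
    - feature groups
```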
A few of the summary statistics support additional parameters, notably the `quantiles` and the `histogram`. In that case, the statistic `name` must be qualified separately from the parameters. For example:
```yaml
summary_statistics:
  statistics:
    - mean
    - median
    - minimum
    - maximum
    - standard deviation
    - mean absolute
    - name: quantiles
      probabilities: [0.05,0.5,0.95]
    - name: histogram
      bins: 5
    - box plot
```
The default probabilities associated with the quantiles are 0.1, 0.5, and 0.9. The default number of bins in the histogram is 10.
The sampling uncertainties may be estimated using a resampling technique known as the "stationary bootstrap". The declaration requires a sample_size and a list of quantiles to estimate. For example:
sampling_uncertainty:
  sample_size: 1000
  quantiles: [0.05,0.95]
Care should be taken in choosing the sample_size because each additional sample requires that the pairs be resampled for every pool and the statistics recalculated, which is computationally expensive.
See Sampling uncertainty assessment for more details.
The statistics output formats are declared by listing them. For example:
output_formats:
  - csv2
  - pairs
  - png
When no output_formats are declared, the software will write the csv2 format by default. For example, when considering the simplest possible evaluation described in What is the simplest possible evaluation I can declare?, no output_formats are declared and csv2 will be written.
The supported statistics formats include:
- png: Portable Network Graphics (PNG);
- svg: Scalable Vector Graphics (SVG);
- csv2: Comma separated values with a single file per evaluation (see Output Format Description for CSV2 for more information);
- netcdf2: Network Common Data Form (NetCDF); and
- protobuf: Protocol buffers; an efficient binary format that produces one file per evaluation.
The following statistics formats are supported (for now), but are deprecated for removal and should be avoided:
- csv: comma separated values; and
- netcdf: Network Common Data Form (NetCDF).
In addition, to help with tracing statistics to the paired values that produced them, the following is supported:
- pairs: Comma separated values of the paired time-series data from which the statistics were produced (gzipped, by default).
Some of these formats support additional parameters, as follows:
Parameter | Applicable formats | Purpose | Example in context |
---|---|---|---|
width | All graphics formats (e.g., png). | An integer value (greater than 0) that prescribes the width of the graphics to produce. | output_formats: |
height | All graphics formats (e.g., png). | An integer value (greater than 0) that prescribes the height of the graphics to produce. | output_formats: |
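For example, here is a minimal sketch that qualifies the png format with a width and height, alongside an unqualified csv2. This assumes the parameterized form uses a format key to name the format, by analogy with the name key used for parameterized summary statistics; the pixel values are illustrative:
output_formats:
  - csv2
  - format: png
    width: 800
    height: 600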
Yes, there are several other options for filtering or transforming data or otherwise refining the evaluation. These are listed below:
Option | Purpose | Example in context |
---|---|---|
pair_frequency | By default, all paired values are included. However, this option allows for paired values to be included only at a prescribed frequency, such as every 12 hours. | observed: some_observations.csv |
cross_pair | When calculating skill scores, all paired values are used by default. This can be misleading when the (observed, predicted) pairs contain many more or fewer pairs than the (observed, baseline) pairs. To mitigate this, cross pairing is supported. When using cross-pairing, only those pairs whose valid times appear in both sets of pairs will be included. In addition, the treatment of forecast reference times is prescribed by an option. The available options are: exact, which only admits those pairs whose forecast reference times appear in both sets of pairs; and fuzzy, which chooses the nearest forecast reference times in both sets of pairs and discards any others. In all cases, the resulting skill score statistics will always use the same number of (observed, predicted) pairs and (observed, baseline) pairs. In addition, when using exact cross-pairing, the valid times and reference times are both guaranteed to match exactly. | observed: some_observations.csv |
minimum_sample_size | An integer greater than zero that identifies the minimum sample size for which a statistic will be included. For continuous measures, this is the number of pairs. For dichotomous measures, it is the smaller of the number of occurrences and non-occurrences of the dichotomous event. If a statistic was computed from a smaller sample size than the minimum_sample_size, it will be discarded. | observed: some_observations.csv |
decimal_format | The decimal format to use when writing statistics to numeric formats. It also controls the format of tick labels for time-based domain axes in generated graphics. | observed: some_observations.csv |
duration_format | The duration format to use when writing statistics to numeric formats. It also controls the units of time-based domain axes in generated graphics. The supported values include: seconds, minutes, hours and days. | observed: some_observations.csv |
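To illustrate a few of these options together, here is a minimal sketch. The file names are placeholders; the shape of pair_frequency as a duration with a period and unit is an assumption based on other duration declarations in the language, and cross_pair is assumed to accept the option name directly:
observed: some_observations.csv
predicted: some_forecasts.csv
minimum_sample_size: 20
cross_pair: exact
pair_frequency:
  period: 12
  unit: hours
duration_format: hours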
Yes, examples of complete declarations can be found in a separate wiki, Complete Examples of Evaluation Declarations.
Yes, the declaration language uses a schema, which defines the superset of declarations that the WRES could accept. The schema is written in the JSON Schema language. The latest version of the schema is available in the code repository:
https://github.com/NOAA-OWP/wres/blob/master/wres-config/nonsrc/schema.yml
However, the schema is relatively permissive. In other words, there are some evaluations that are permitted by the schema that are not permitted by the WRES software itself. Indeed, a schema is best suited for simple validation. More comprehensive validation is performed by the software itself, once the declaration has been validated against the schema.
In practice, you may notice this when reading feedback from the software about validation failures. The earliest failures occur when the declaration is inconsistent with the schema. The feedback from these failures tends to be more abstract or less human-readable because it lists a cascade of failures; in other cases, the failure will be straightforward. You should generally look for the simplest and most understandable message among them. For example, a declaration like this:
observed: some_observations.csv
predicted: some_forecasts.csv
foo: bar.csv
This will produce an error like the following, because the foo key is not part of the schema and the schema does not allow additional properties:
wres.config.yaml.DeclarationException: When comparing the declared evaluation to the schema, encountered 1 errors, which must be fixed. Hint: some of these errors may have the same origin, so look for the most precise/informative error(s) among them. The errors are:
- $.foo: is not defined in the schema and the schema does not allow additional properties
You will sometimes encounter warnings or errors that relate to your declaration. For example, if an error is wrapped in a DeclarationException, the problem originates from your declaration. These errors arise because the declaration is invalid for some reason. There are three main reasons why a declaration could be invalid:
- The declaration is not a valid YAML document. You can test whether your declaration is a valid YAML document using an online tool, such as: https://www.yamllint.com/
- The declaration contains options that are not understood or allowed by WRES (specifically, they are not consistent with the declaration schema, as described in Does the declaration language use a schema?). For example, if you include options that are misspelled or options that fall outside valid bounds, such as probabilities that fall outside [0,1], you can expect an error; or
- The declaration contains options that are disallowed by WRES in combination with other options. For example, if you add an ensemble-like metric and declare that none of the data types are ensemble-like, then you can expect an error.
In general, any warning or error messages should be straightforward and intuitive, indicating what you should do to fix them (or, in the case of warnings, what you should consider about the options you chose). Furthermore, if there are multiple warnings or errors, they should all be listed at once. For example, consider the following invalid declaration:
observed: some_observations.csv
predicted: some_predictions.csv
lead_time_pools:
  period: 13
  unit: hours
metrics:
  - probability of detection
This declaration produces the following errors:
wres.config.yaml.DeclarationException: Encountered 2 error(s) in the declared evaluation, which must be fixed:
- The declaration included 'lead_time_pools', which requires the 'lead_times' to be fully declared. Please remove the 'lead_time_pools' or fully declare the 'lead_times' and try again.
- The declaration includes metrics that require either 'thresholds' or 'probability_thresholds' but none were found. Please remove the following metrics or add the required thresholds and try again: [PROBABILITY OF DETECTION].
If the errors are not intuitive, you should create a ticket asking for more clarity and we will explain the failure and improve the error message. However, errors that fall within the first two categories are delegated to other tools and are not, therefore, fully within our control. For example, when your declaration fails validation against the schema, you may be presented with a cascade of errors that are not immediately intuitive. For example, consider the following, invalid declaration:
observed: some_observations.csv
predicted: some_predictions.csv
metrics:
  - some metric
Since some metric is not an expected metric, this declaration will produce an error. However, the evaluation actually produces a cascade of errors, which occur because the metrics declaration fails to validate against any known (sub)schema within the overall schema:
wres.config.yaml.DeclarationException: When comparing the declared evaluation to the schema, encountered 5 errors, which must be fixed. Hint: some of these errors may have the same origin, so look for the most precise/informative error(s) among them. The errors are:
- $.metrics[0]: does not have a value in the enumeration [box plot of errors by observed value, box plot of errors by forecast value, brier score, brier skill score, contingency table, continuous ranked probability score, continuous ranked probability skill score, ensemble quantile quantile diagram, maximum, mean, minimum, rank histogram, relative operating characteristic diagram, relative operating characteristic score, reliability diagram, sample size, standard deviation]
- $.metrics[0]: does not have a value in the enumeration [bias fraction, box plot of errors, box plot of percentage errors, coefficient of determination, pearson correlation coefficient, index of agreement, kling gupta efficiency, mean absolute error, mean error, mean square error, mean square error skill score, mean square error skill score normalized, median error, quantile quantile diagram, root mean square error, root mean square error normalized, sample size, sum of square error, volumetric efficiency, mean absolute error skill score]
- $.metrics[0]: does not have a value in the enumeration [contingency table, threat score, equitable threat score, frequency bias, probability of detection, probability of false detection, false alarm ratio, peirce skill score]
- $.metrics[0]: string found, object expected
- $.metrics[0]: does not have a value in the enumeration [time to peak relative error, time to peak error]
This cascade of errors is somewhat unintuitive but, at the time of writing, it cannot easily be improved. As suggested in the Hint, you should look for the most precise and informative error among the cascade. In this case, it should be reasonably clear that the metric in position "[0]" (meaning the first metric) is not a name that occurs within any known enumeration. As the schema includes several metric groups, each with a separate enumeration, this error is reported with respect to each group.