
Posting timeseries data directly to a WRES web‐service as inputs for a WRES job

HankHerr-NOAA edited this page Oct 11, 2024 · 6 revisions


A WRES web-service instance supports posting timeseries (evaluation inputs) directly to the service. This capability is only available when executing an evaluation from the command-line or programmatically (e.g., via a script or other software), as described in Instructions for Programmatic Interaction with a WRES web-service.

NOTE: In all of the examples below, the WRES web-service URL is omitted and replaced with "...". To apply the example to your use case, replace the "..." in the examples with the appropriate web-service URL.

Steps 1 and 3 of those instructions are modified as follows, with details provided below:

1. Prepare the evaluation project declaration.

The source tags for data to be posted directly should be omitted from the declaration, as explained below. Otherwise, this step is unchanged.

3. “POST” the evaluation project declaration to the web-service using the web-service CA .pem file and record the server’s response.

When POSTing the declaration, include postInput=true as another form parameter; see the example below. Then post the evaluation data to the observed, predicted, or baseline inputs. Finally, when done posting data, signal to the web-service that all data has been posted and the evaluation can begin by posting another form parameter, postInputDone=true; again, see the example below. All steps are described in detail below.

After posting the signal that all timeseries have been posted, the job will be processed and should be monitored, output obtained, and cleaned up per Steps 4 - 8 in Instructions for Programmatic Interaction with a WRES web-service.

All posted data will be removed upon successful completion of the evaluation unless the keepInput flag was set, as described below.

Using the fully functional bash and Python scripts provided in the WRES Scripts Usage Guide is recommended

The scripts are described, with examples, in the WRES Scripts Usage Guide wiki, and using them is recommended when interacting with a WRES web-service programmatically. The options for posting data are -l, -p, and -b. However, if you need to make the HTTP requests directly, perhaps because you are using another programming language, the sections below describe how to do so.

Prepare the declaration by omitting the sources for data to be sent directly

For example, your declaration could look like this if you wish to post the observed dataset and the predicted dataset directly to the web-service for this job:

observed:
  variable: 00060
predicted:
  variable: streamflow

Can I include non-posted data when using this capability?

Yes. Data posted directly to a WRES web-service will be added to the declaration, one sources entry per post. Sources already present will remain and be processed as usual.

Post the declaration to https://.../job as usual, but add the form parameter postInput=true

Use the projectConfig form parameter for the project declaration, as usual, but add the form parameter postInput with the value true to tell the web-service to wait for timeseries data before sending the job to a worker for execution. For example, note the use of &postInput=true in the following curl command, where the declaration file is named test2_config.yml:

curl -v --cacert [web-service CA .pem] -d "projectConfig=$(cat test2_config.yml)&postInput=true" https://.../job/

Check the response code for 200 or 201, as usual. If successful (200 or 201), the job will have the status AWAITING_POSTS_OF_DATA. To verify, navigate to .../job/{jobId}/status under the job location URL (see Step 5 of Instructions for Programmatic Interaction with a WRES web-service).

While the job status is AWAITING_POSTS_OF_DATA, the web-service will successfully accept posts of data for that particular job.
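If you cannot use the provided scripts, the declaration POST can be sketched in Python using only the standard library. This is a minimal, unofficial sketch: `build_job_form` and `post_declaration` are hypothetical names, and it assumes the web-service reports the job location URL in the `Location` header of its 200/201 response. Encoding the form with `urllib.parse.urlencode` also escapes characters such as `&`, `+` or `%` that may appear in the YAML; the `curl -d` form above sends the file contents unescaped (curl's `--data-urlencode` option is the escaping alternative).

```python
import ssl
import urllib.parse
import urllib.request


def build_job_form(declaration: str, keep_input: bool = False) -> bytes:
    """Encode the form parameters for POST .../job (hypothetical helper)."""
    fields = {"projectConfig": declaration, "postInput": "true"}
    if keep_input:
        # Optional: keep the posted data after the evaluation (keepInput, described later).
        fields["keepInput"] = "true"
    return urllib.parse.urlencode(fields).encode("utf-8")


def post_declaration(service_url: str, declaration: str, ca_file: str,
                     keep_input: bool = False) -> str:
    """POST the declaration with postInput=true and return the job location URL."""
    context = ssl.create_default_context(cafile=ca_file)
    request = urllib.request.Request(
        service_url.rstrip("/") + "/job",
        data=build_job_form(declaration, keep_input),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request, context=context) as response:
        if response.status not in (200, 201):
            raise RuntimeError(f"Unexpected response code {response.status}")
        location = response.headers.get("Location")  # assumption: job URL is here
    if location is None:
        raise RuntimeError("Response has no Location header")
    # Resolve a relative Location against the URL that was requested.
    return urllib.parse.urljoin(request.full_url, location)
```

For example, `post_declaration("https://...", open("test2_config.yml").read(), "wres_ca.pem")`, where `wres_ca.pem` stands in for your web-service CA .pem file.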

Post data to .../job/{jobId}/input/left, .../job/{jobId}/input/right, and/or .../job/{jobId}/input/baseline

It is highly recommended that files of data be gzipped prior to posting to save bandwidth and time during the posting process.

For each timeseries document (or blob), post to the corresponding dataset under the job’s input URL as multipart/form-data, with the data in the form field named data.

  • For observed data, use .../job/{jobId}/input/left
  • For predicted data, use .../job/{jobId}/input/right
  • For (optional) baseline data, use .../job/{jobId}/input/baseline
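The mapping between declared datasets and input URLs above is small enough to encode directly. A minimal sketch: `INPUT_SIDES` and `input_url` are hypothetical names, and `job_location` is assumed to be the .../job/{jobId} URL returned when the declaration was posted.

```python
# Hypothetical mapping from declaration dataset names to input "sides".
INPUT_SIDES = {"observed": "left", "predicted": "right", "baseline": "baseline"}


def input_url(job_location: str, dataset: str) -> str:
    """Return .../job/{jobId}/input/{side} for a dataset name."""
    try:
        side = INPUT_SIDES[dataset]
    except KeyError:
        raise ValueError(f"Unknown dataset {dataset!r}") from None
    return f"{job_location.rstrip('/')}/input/{side}"
```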

For example, the following curl commands will post the file test2_data/DRRC2QINE.xml.gz (gzipped XML) for the observed (or left) source and test2_data/right_data.tgz (a gzipped tarball) for the predicted (or right) source (note that the -F option posts the data as multipart/form-data):

curl -v --cacert [web-service CA .pem] -F data=@test2_data/DRRC2QINE.xml.gz https://.../job/{jobId}/input/left  
curl -v --cacert [web-service CA .pem] -F data=@test2_data/right_data.tgz https://.../job/{jobId}/input/right

To post data using Python and the requests library, be sure to post the data using the files parameter to ensure it is posted as multipart/form-data. For example:

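    import requests

    input_file = "input_data.csv.gz"
    # job_location (the job URL) and wres_ca_file (the CA .pem path) come from earlier steps.
    with open(input_file, "rb") as right:
        # files= makes requests post multipart/form-data, with the file in the "data" field.
        data_post_response = requests.post(url=job_location + "/input/right",
                                           verify=wres_ca_file,
                                           files={"data": right})
    data_post_response.raise_for_status()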

What formats can I post?

Any simple format supported by core WRES, as described in Instructions for Using WRES#Available-Evaluation-Data. For the purpose of this guide, a simple format is one with one or more timeseries per document (or blob).

As stated above, any posted data should be gzipped prior to sending. The WRES can read individually gzipped files and gzipped tarballs when ingesting data. Using gzip reduces the amount of data sent "over the wire", sometimes dramatically for plain-text formats such as XML, which in turn can reduce the time spent waiting for data to be posted.
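Gzipping can be done with the `gzip` command-line tool or, as in this minimal sketch, with Python's standard `gzip` module; `gzip_file` is a hypothetical helper that writes a `.gz` copy next to the original. A gzipped tarball of several files could be built similarly with `tarfile.open(name, "w:gz")`.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(source: str) -> Path:
    """Write source + '.gz' next to the original and return the new path."""
    src = Path(source)
    target = src.with_name(src.name + ".gz")
    with src.open("rb") as raw, gzip.open(target, "wb") as packed:
        # Stream in chunks so large timeseries files are not read into memory at once.
        shutil.copyfileobj(raw, packed)
    return target
```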

Does order of each post matter?

No. You can post in any order, and multiple files can be posted to each input “side”. For example, you can post document A to the right, followed by document B to the left, followed by document C to the right, followed by document D to the left, and so on.

Can I post concurrently?

Yes, but limit concurrent posts to no more than 3 at a time.
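Because order does not matter, the posts can be spread over a small thread pool whose size enforces the limit of 3. A minimal sketch, assuming `post_one` is a function you supply (for example, a wrapper around the `requests.post` call shown earlier) that posts one file to one side; `post_all` and `MAX_CONCURRENT_POSTS` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")

MAX_CONCURRENT_POSTS = 3  # the recommended upper limit


def post_all(uploads: Iterable[Tuple[str, str]],
             post_one: Callable[[str, str], T],
             max_workers: int = MAX_CONCURRENT_POSTS) -> List[T]:
    """Call post_one(side, path) for every upload, with at most max_workers in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(post_one, side, path) for side, path in uploads]
        # result() re-raises any exception from a failed post.
        return [future.result() for future in futures]
```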

Post to .../job/{jobId}/input with form parameter postInputDone=true

Post the parameter postInputDone with the value true, using MIME/Content-Type application/x-www-form-urlencoded, to the job's input URL, .../job/{jobId}/input. This tells the web-service that you have no more data to post to this job. For example:

curl -v --cacert [web-service CA .pem] -d postInputDone=true https://.../job/{jobId}/input

The response from the service will include the complete YAML declaration prepared for the evaluation project, with the posted data included. An XML declaration will be migrated to YAML as part of this process.

NOTE: Before your posted data sources can be added to the declaration, the declaration must be validated and parsed. If the declaration fails validation, the evaluation will fail, the status at .../job/{jobId}/status will be FAILED_BEFORE_IN_QUEUE, and the explanation of the validation failure will be returned as the HTTP response to your request (e.g., curl will print that response to the terminal).
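The same signal can be sketched in Python with the standard library. This is an illustration under assumptions: `signal_post_input_done` is a hypothetical name, and it assumes a validation failure is reported with an HTTP error status. If your service instead answers with a success code, check .../job/{jobId}/status for FAILED_BEFORE_IN_QUEUE as well.

```python
import ssl
import urllib.error
import urllib.parse
import urllib.request

POST_INPUT_DONE_BODY = urllib.parse.urlencode({"postInputDone": "true"}).encode("ascii")


def signal_post_input_done(job_location: str, ca_file: str) -> str:
    """POST postInputDone=true to .../job/{jobId}/input and return the response body."""
    context = ssl.create_default_context(cafile=ca_file)
    request = urllib.request.Request(
        job_location.rstrip("/") + "/input",
        data=POST_INPUT_DONE_BODY,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, context=context) as response:
            # On success the body is the complete YAML declaration with the posted sources.
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # Assumed failure path: the body carries the validation explanation.
        detail = error.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"postInputDone rejected ({error.code}): {detail}") from error
```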

Monitor the job as usual, get output as usual, clean up as usual

Proceed with the usual workflow documented at Instructions for Programmatic Interaction with a WRES web-service beginning with Step 6, “Monitor…”

If something went wrong in the previous step and the job is never picked up by a WRES worker instance, it may have the state FAILED_BEFORE_IN_QUEUE. Monitor for this state in addition to the COMPLETED... states reported at .../job/{jobId}/status.
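A polling loop that treats FAILED_BEFORE_IN_QUEUE as a final state alongside the COMPLETED... states might look like the following minimal sketch. It assumes .../job/{jobId}/status returns the bare state name as plain text; the function names, the 10-second interval, and the one-hour timeout are hypothetical choices.

```python
import ssl
import time
import urllib.request

# States with no outgoing arrows in the job state diagram on this page.
TERMINAL_STATES = frozenset({
    "COMPLETED_REPORTED_SUCCESS",
    "COMPLETED_REPORTED_FAILURE",
    "FAILED_BEFORE_IN_QUEUE",
})


def is_terminal(status: str) -> bool:
    return status.strip() in TERMINAL_STATES


def wait_for_job(job_location: str, ca_file: str,
                 poll_seconds: float = 10.0, timeout_seconds: float = 3600.0) -> str:
    """Poll .../job/{jobId}/status until a terminal state is reached; return it."""
    context = ssl.create_default_context(cafile=ca_file)
    deadline = time.monotonic() + timeout_seconds
    while True:
        url = job_location.rstrip("/") + "/status"
        with urllib.request.urlopen(url, context=context) as response:
            status = response.read().decode("utf-8").strip()
        if is_terminal(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still {status} after {timeout_seconds} s")
        time.sleep(poll_seconds)
```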

Is it possible to post the data so that it is kept after the evaluation completes successfully?

Yes. To do so, add the form parameter keepInput with the value true to the request when posting the declaration. The web-service will then NOT remove its local copy of the posted data after the evaluation completes. You may then copy the sources added for the posted data from the returned declaration in order to reuse them later. For example:

curl -v --cacert [web-service CA .pem] -d "projectConfig=$(cat test2_config.yml)&postInput=true&keepInput=true" https://.../job

The added sources have randomly generated file names and are stored in a local directory. For example (where the path is partially omitted and replaced with "..."):

sources: file:///.../input_data/815463329142625139_1556163733477951741

Simply copying that from the declaration returned in the response to a new evaluation declaration will allow that data to be reused in that evaluation.
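Picking the added sources out of the returned declaration can be automated. This minimal sketch scans the text for `file://` URIs rather than parsing the YAML, and assumes, based on the example above, that posted data lives under a directory containing `/input_data/`; adjust `marker` (a hypothetical parameter) for your service, or use a YAML parser if you need to know which dataset each source belongs to.

```python
import re
from typing import List

# Matches a file:// URI up to the next whitespace or quote character.
FILE_URI = re.compile(r"file://[^\s'\"]+")


def posted_sources(declaration_yaml: str, marker: str = "/input_data/") -> List[str]:
    """Return the file:// URIs in the returned declaration that contain marker."""
    return [uri for uri in FILE_URI.findall(declaration_yaml) if marker in uri]
```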

Kept data remains available until either the web-service administrator removes it or the web-service removes it because disk space is filling up. When disk space runs low, the web-service removes posted data to free space, starting with the oldest.

Job state diagram for status shown at .../job/{jobId}/status

          ┌───────────┐
          │           │
          │  CREATED  │
          │           │
          └───┬───┬───┘
              │   └──────────────────────┐
              │                          │
              │                          ▼
              │             ┌──────────────────────────┐
              │             │                          │
              │             │  AWAITING_POSTS_OF_DATA  │
              │             │                          │
              │             └────────┬───────┬─────────┘
              │                      │       └──────────────────┐
              │                      │                          │
              │                      ▼                          │
              │             ┌─────────────────────────┐         │
              │             │                         │         │
              │             │  NO_MORE_POSTS_OF_DATA  │         │
              │             │                         │         │
              │             └────────┬──────┬─────────┘         │
              │   ┌──────────────────┘      └───────────────┐   │
              │   │                                         │   │
              ▼   ▼                                         ▼   ▼
          ┌────────────┐                         ┌──────────────────────────┐
          │            │                         │                          │
          │  IN_QUEUE  │                         │  FAILED_BEFORE_IN_QUEUE  │
          │            │                         │                          │
          └──────┬─────┘                         └──────────────────────────┘
                 │
                 │
                 ▼
         ┌───────────────┐
         │               │
         │  IN_PROGRESS  │
         │               │
         └─────┬───┬─────┘
           ┌───┘   └────────────────────────────┐
           │                                    │
           ▼                                    ▼
┌──────────────────────────────┐    ┌──────────────────────────────┐
│                              │    │                              │
│  COMPLETED_REPORTED_SUCCESS  │    │  COMPLETED_REPORTED_FAILURE  │
│                              │    │                              │
└──────────────────────────────┘    └──────────────────────────────┘
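The diagram can also be written down as a transition table, for instance to sanity-check the statuses a client observes while polling. Because polling can miss short-lived states, this minimal sketch checks whether a state is reachable rather than whether a single arrow connects two states. `TRANSITIONS`, `reachable` and `check_observed` are hypothetical names transcribed from the diagram, not structures exported by WRES.

```python
from typing import Dict, FrozenSet, List

# Arrows of the job state diagram, transcribed by hand.
TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "CREATED": frozenset({"IN_QUEUE", "AWAITING_POSTS_OF_DATA"}),
    "AWAITING_POSTS_OF_DATA": frozenset({"NO_MORE_POSTS_OF_DATA", "FAILED_BEFORE_IN_QUEUE"}),
    "NO_MORE_POSTS_OF_DATA": frozenset({"IN_QUEUE", "FAILED_BEFORE_IN_QUEUE"}),
    "IN_QUEUE": frozenset({"IN_PROGRESS"}),
    "IN_PROGRESS": frozenset({"COMPLETED_REPORTED_SUCCESS", "COMPLETED_REPORTED_FAILURE"}),
    "FAILED_BEFORE_IN_QUEUE": frozenset(),
    "COMPLETED_REPORTED_SUCCESS": frozenset(),
    "COMPLETED_REPORTED_FAILURE": frozenset(),
}


def reachable(start: str, goal: str) -> bool:
    """True if goal can be reached from start by following zero or more arrows."""
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        if state == goal:
            return True
        for nxt in TRANSITIONS[state] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return False


def check_observed(statuses: List[str]) -> None:
    """Raise ValueError if a later polled status cannot follow an earlier one."""
    for before, after in zip(statuses, statuses[1:]):
        if not reachable(before, after):
            raise ValueError(f"Unexpected status change {before} -> {after}")
```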