Workflows: docs for catalog upload, consistent longform for catalog options dict, update destination & format docs wording (#7368)

GitOrigin-RevId: 95d7f319b78c395da42a086a4a42658ba5da2f03
gjoseph92 authored and Descartes Labs Build committed Aug 20, 2020
1 parent e99bb6f commit 7eb6b57
Showing 3 changed files with 127 additions and 23 deletions.
20 changes: 16 additions & 4 deletions descarteslabs/common/workflows/outputs/user_destination_options.py
@@ -18,18 +18,30 @@
 def user_destination_to_proto(
     params: Union[dict, str, Image]
 ) -> destinations_pb2.Destination:
-    if isinstance(params, (str, Image)):
+    if isinstance(params, str):
         params = {"type": params}
+    elif isinstance(params, Image):
+        params = {"type": "catalog", "image": params}
     else:
         if "type" not in params:
             raise ValueError(
                 "The destination dictionary must include a destination type "
                 "(like `'type': 'download'`), but key 'type' does not exist."
             )

-    # TODO less weird way to conveniently set overwrite?
-    if isinstance(params["type"], Image):
-        img = params.pop("type")
+    if params["type"].lower() == "catalog":
+        try:
+            img = params.pop("image")
+        except KeyError:
+            if (
+                "name" not in params
+                and "product_id" not in params
+                and "attributes_json" not in params
+            ):
+                raise ValueError(
+                    "For the Catalog destination, the options dict must contain an `image` field "
+                    "with the `dl.catalog.Image` to upload to."
+                ) from None
         params = _image_to_catalog_params(img, **params)

     return user_dict_to_has_proto(params, destinations_pb2.Destination, DEFAULTS)
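
For readers skimming the change above, here is a minimal standalone sketch of the shorthand expansion it introduces: a string becomes ``{"type": <string>}``, and a Catalog Image becomes ``{"type": "catalog", "image": <image>}``. The ``Image`` class and the ``normalize_destination`` name are hypothetical stand-ins for illustration only. The sketch leaves out the proto conversion and the Catalog-specific validation that the real function performs.

```python
class Image:
    """Hypothetical stand-in for dl.catalog.Image, used only in this sketch."""

    def __init__(self, name):
        self.name = name


def normalize_destination(params):
    """Expand a destination shorthand (str or Image) into an options dict.

    Illustrative only: mirrors the branching in the diff above, without
    the proto conversion or the Catalog-specific checks.
    """
    if isinstance(params, str):
        # Shorthand such as "download" or "email"
        return {"type": params}
    if isinstance(params, Image):
        # Shorthand for the Catalog destination: the Image itself
        return {"type": "catalog", "image": params}
    if "type" not in params:
        raise ValueError(
            "The destination dictionary must include a destination type "
            "(like `'type': 'download'`), but key 'type' does not exist."
        )
    # Already in longform: return a copy so the caller's dict is untouched
    return dict(params)
```

With this sketch, ``normalize_destination("download")`` and ``normalize_destination({"type": "download"})`` yield the same dictionary, which is the consistency between shorthand and longform that the commit message describes.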
119 changes: 106 additions & 13 deletions descarteslabs/workflows/docs/destinations.rst
@@ -5,12 +5,11 @@ Output Destinations
-------------------

.. note::
-Output destinations control *where* results are stored -- like an email versus a download link. You can use output formats, which control *how* the results are stored, in conjunction with output destinations.
+Output destinations control *where* results are stored -- like a download link versus the DL Catalog. You can use output formats, which control *how* the results are stored, in conjunction with output destinations.

-For example, with ``destination='email@example.com'`` you might use ``format='geotiff'`` or ``format='json'``. Both would send emails; the emails would have links to download data in GeoTIFF versus JSON format.
+For example, with ``destination='email'`` you might use ``format='geotiff'`` or ``format='json'``. Both would send emails; the emails would have links to download data in GeoTIFF versus JSON format.

-..
-TODO: Add "Some output destinations can only be used with certain output formats. For example, with the Catalog destination you can only use the GeoTIFF format." when we have the Catalog destination
+Some destinations can only be used with certain formats. For example, with the Catalog destination, you can only use the GeoTIFF format.

.. contents::
:local:
@@ -19,7 +18,7 @@ Output Destinations

When calling `~.models.compute`, you can pick the destination for the results using the ``destination`` argument.

-If you don't need to supply any options for the destination, you can pass the destination name as a string::
+If you don't need to supply any options for the destination, you can use the destination's *shorthand*::

>>> two = wf.Int(1) + 1
>>> two.compute(destination="download")
@@ -29,12 +28,12 @@ If you would like to provide more destination options, you pass the destination
>>> two = wf.Int(1) + 1
>>> two.compute(destination={"type": "email", "subject": "My Computation is Done"})

-Note that when passing the destination as a dictionary, it must include a ``type`` key corresponding to the desired destination.
+Note that when passing the destination as a dictionary, it must include a ``type`` key with the destination's name.

-Destination Options
-^^^^^^^^^^^^^^^^^^^
+Available Destinations
+^^^^^^^^^^^^^^^^^^^^^^

-The following is a list of the available options for each destination. The keys in the destination dictionary must match the keys listed here.
+The following is a list of the available destinations and their options. The keys in the destination dictionary must match the keys listed here.

Download
~~~~~~~~
@@ -66,18 +65,18 @@ Email

Shorthand: ``"email"``

-Email is equivalent to `Download`_, but also sends an email when the job is done. The email contains a link to download the data. Anyone with that link can download the data. As with `Download`_, the link will expire after 10 days.
+Email is equivalent to `Download`_, but also sends you an email when the job is done. The email contains a link to download the data. Anyone with that link can download the data, so be careful forwarding it. As with `Download`_, the link will expire after 10 days.

Options
*******

-- ``subject``: the subject of the email (string, default "Your job has completed"). Always prefixed with ``Workflows:``.
-- ``body``: the body of the email (string, default "Your Workflows job is done.")
+- ``subject``, str, default "Your job has completed": The subject of the email. Always prefixed with ``Workflows:``.
+- ``body``, str, default "Your Workflows job is done.": The body of the email.

Compatible Formats
******************

-- All :ref:`formats <output-formats>`. However, widely-used formats like JSON or GeoTIFF usually make the most sense for email. With formats like MsgPack and especially PyArrow, recipients would have to write code to parse the data, instead of clicking the download link and getting a file they can easily work with.
+- All :ref:`formats <output-formats>`. However, widely-used formats like JSON or GeoTIFF usually make the most sense for email. With formats like MsgPack and especially PyArrow, you'd have to write code to parse the data, instead of clicking the download link and getting a file you can easily work with.

Examples
********
@@ -86,3 +85,97 @@ Examples

>>> two = wf.Int(1) + 1
>>> two.compute(destination={"type": "email", "subject": "My Computation is Done"}, format="json")

Catalog
~~~~~~~

Shorthand: a `.catalog.Image`

Uploads a Workflows `~.geospatial.Image` to the :ref:`Catalog <catalog_v2_guide>`. Can only be used when computing a Workflows `~.geospatial.Image`.

`.Image.compute` or `.Job.result` with this destination will just return the `.catalog.Image` object, not the data uploaded.

Options
*******

Usually, you should set ``rescale=True`` and ``change_dtype=True``. However, since they can change your data in unexpected ways, they are off by default.

- ``image``, `.catalog.Image`: The Catalog `~.catalog.Image` object to upload to.
- ``overwrite``, bool, default False: Overwrites the image if it already exists.
- ``rescale``, bool, default False: Rescales pixel values in each band from ``physical_range`` to ``data_range``, only if ``physical_range`` is set for the band. (When loading imagery, Workflows automatically rescales values into ``physical_range``, so this reverses that.)
- ``change_dtype``, bool, default False: changes the data type of the uploaded array to match the data type of the `~.catalog.Product`. Usually combined with ``rescale``. (When loading imagery, Workflows converts to float64, so this undoes that.)

Beware of data loss when setting ``change_dtype``: whether you use ``rescale``, `.Image.scale_values`, or plain arithmetic, first make sure the values can be represented in the Product's data type. For example, if the Product in the Catalog is ``uint16`` and your Workflows `~.geospatial.Image` holds ``float64`` values from 0.0 to 1.0, converting them to ``uint16`` would leave only 0s and 1s. Rescale to a range like 0-10000 first to avoid that loss.
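
The data-loss warning can be illustrated with a small pure-Python sketch. It assumes that ``rescale`` is a plain linear mapping from ``physical_range`` to ``data_range``, and it uses ``int()`` truncation as a stand-in for the conversion to ``uint16``. Neither detail is confirmed by these docs, and the ``rescale`` helper here is hypothetical, not the Workflows implementation.

```python
def rescale(value, physical_range, data_range):
    """Linearly map value from physical_range to data_range (assumed behavior)."""
    p_min, p_max = physical_range
    d_min, d_max = data_range
    return (value - p_min) / (p_max - p_min) * (d_max - d_min) + d_min


# float64 pixel values as Workflows would hold them after loading
values = [0.0, 0.25, 0.5, 0.75, 1.0]

# Converting straight to an integer type drops the fractional part,
# so almost all of the information is lost.
truncated = [int(v) for v in values]

# Rescaling into the integer range first keeps the values distinguishable.
rescaled = [int(round(rescale(v, (0.0, 1.0), (0, 10000)))) for v in values]

print(truncated)  # [0, 0, 0, 0, 1]
print(rescaled)   # [0, 2500, 5000, 7500, 10000]
```

This is why ``rescale=True`` and ``change_dtype=True`` are usually set together: the first moves the values into a range the integer type can represent, and the second then performs the conversion.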


Transformations
***************

Workflows applies a number of transformations to make your Workflows `~.geospatial.Image` compatible with the Catalog: reordering bands to match by name, rescaling and changing dtype (if requested), and using the Image's mask to fill in `~.catalog.GenericBand.nodata` values and generate an alpha band. Here are the full details:

- If `~.Image.acquired` is not already set on the `.catalog.Image` you pass, it's taken from ``properties['date']`` on the Workflows `~.geospatial.Image`, or if that's also not set, then from the current timestamp.
- Reorders the Workflows `~.geospatial.Image` bands to the product's band order, matching by band name. If the names don't match, assumes the bands are already in order.
- If ``rescale`` is True: Rescales pixel values from ``physical_range`` to ``data_range``, for each band where ``physical_range`` is set.
- If ``change_dtype`` is True: Converts to the Product's dtype.
- Fills in nodata values from the Image's mask, for bands with a ``nodata`` value.
- Creates an alpha band from the Image's mask if:

- The Product has exactly 1 alpha band, which must be a `~.catalog.MaskBand` with `~.catalog.MaskBand.is_alpha` set to True.
- The Workflows `~.geospatial.Image` doesn't have a band for alpha (it has one less band than the product, and doesn't have a band with the alpha band's name).
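
As a rough illustration of the band-reordering step in the list above, here is a minimal sketch. It treats an image as a list of ``(band_name, data)`` pairs and assumes "the names don't match" means the two sets of names differ. That interpretation, the ``reorder_bands`` name, and the data layout are assumptions made for this example. The sketch also ignores the alpha-band case.

```python
def reorder_bands(image_bands, product_band_names):
    """Return image_bands in the product's band order, matching by name.

    image_bands: list of (band_name, data) pairs (hypothetical layout).
    product_band_names: band names in the product's order.
    """
    by_name = dict(image_bands)
    if set(by_name) == set(product_band_names):
        # Names match: put the bands into the product's order
        return [(name, by_name[name]) for name in product_band_names]
    # Names don't match: assume the bands are already in the right order
    return list(image_bands)
```

For example, an image with bands ``vh`` then ``vv`` uploaded to a product ordered ``vv`` then ``vh`` would come out as ``vv`` then ``vh``, while an image whose band names are unrelated to the product's would be left in its existing order.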

Requirements
************

- The Workflows `~.geospatial.Image` must have the same number of bands (and ideally the same band names) as the Catalog Product you're uploading to, except for an alpha band, which is generated automatically.
- All the bands in the Catalog Product must have the same data type.
- The Catalog Product must either have an alpha band (a `~.catalog.MaskBand` with `~.catalog.MaskBand.is_alpha` set to True), or every band must have a `~.catalog.GenericBand.nodata` value set (nodata is preferable).
- You must have write access to the `~.catalog.Product`.

Compatible Formats
******************

- Only GeoTIFF. However, if you don't set ``format=`` (so it defaults to ``pyarrow``), it's automatically switched to GeoTIFF for you. To control the details of the GeoTIFF that's uploaded to Catalog (overviews, overview resampler, etc.), specify ``format={"type": "geotiff", ...}`` with the parameters you want.

Examples
********

>>> import descarteslabs as dl
>>> import descarteslabs.workflows as wf
>>> composite = (
...     wf.ImageCollection.from_id("sentinel-1:GRD", "2020-01-01", "2020-05-01")
...     .mean(axis="images")
... )

Assume the product ``org:my_product_id`` already has the same bands as ``composite`` (in this case, ``vv`` and ``vh``), and the bands have `~.catalog.GenericBand.nodata` values set.

We can upload a single Catalog `~.catalog.Image`:

>>> image = dl.catalog.Image(name="my_image", product_id="org:my_product_id")
>>> tile = dl.scenes.DLTile.from_latlon(35.6870, -105.9378, 10, 1024, 0)
>>>
>>> composite.compute(tile, destination=image)
Job ID: 8b21474899b177431d404e42e25a958cc32302af37646f7e
[######] | Steps: 21/21 | Stage: SUCCEEDED
Image: my_image
id: org:my_product_id:my_image
product: org:my_product_id
created: Wed Jan 1 12:00:00 2020

Or, if you need to set options:

>>> composite.compute(
...     tile,
...     destination={
...         "type": "catalog",
...         "image": image,
...         "overwrite": True,
...         "rescale": True,
...         "change_dtype": True,
...     },
... )

More commonly, you'd upload many Images by splitting the area into tiles and launching concurrent upload Jobs:

>>> tiles = dl.scenes.DLTile.from_shape(aoi_geometry, 10, 1024, 0)
>>> images = [dl.catalog.Image(name=tile.key.replace(":", "_"), product_id="org:my_product_id") for tile in tiles]
>>> jobs = [composite.compute(tile, destination=image, block=False) for tile, image in zip(tiles, images)]
11 changes: 5 additions & 6 deletions descarteslabs/workflows/docs/formats.rst
@@ -9,8 +9,7 @@ Output Formats

For example, with ``format='geotiff'`` you might use ``destination='email'`` or ``destination='download'``. Both would produce GeoTIFFs; one would send an email with a link to the file, and the other would download the GeoTIFF within your script.

-..
-TODO: Add "Some output formats must be used with certain destinations. For example, with the Catalog destination you can only use the GeoTIFF format." when we have the Catalog destination
+Some output formats are required by certain destinations. For example, with the Catalog destination, you can only use the GeoTIFF format.

.. contents::
:local:
@@ -29,14 +28,14 @@ If you would like to provide more format options, you pass the format as a dicti
>>> two = wf.Int(1) + 1
>>> two.compute(format={"type": "pyarrow", "compression": "brotli"})

-Note that when passing the format as a dictionary, it must include a ``type`` key corresponding to the desired format.
+Note that when passing the format as a dictionary, it must include a ``type`` key with the format's name.

The results will be returned differently depending on the ``format`` specified. When using the "pyarrow" format, results will be deserialized and unpacked into :ref:`result-types`. For all other formats, the results will not be deserialized and will be returned as raw bytes.

-Format Options
-^^^^^^^^^^^^^^
+Available Formats
+^^^^^^^^^^^^^^^^^

-The following is a list of the available options for each format. The keys in the format dictionary must match the keys listed here.
+The following is a list of the available formats and their options. The keys in the format dictionary must match the keys listed here.

PyArrow
~~~~~~~