GH-46193: [Flight][Format] Extend Flight Location URI Semantics #46194

Open: wants to merge 6 commits into base: main
55 changes: 55 additions & 0 deletions docs/source/format/Flight.rst
@@ -360,6 +360,61 @@ string, so the obvious candidates are not compatible. The chosen
representation can be parsed by both implementations, as well as Go's
``net/url`` and Python's ``urllib.parse``.

Extended Location URIs
----------------------

In addition to alternative transports, a server may also return
URIs that reference an external service or object storage location.
This can be useful when intermediate data is cached as
Apache Parquet files on S3 or is accessible via an HTTP service. In
these scenarios, it is more efficient to provide a URI from which
the client can download the data directly, rather than
requiring a Flight service to read it back into memory and serve it
from a ``DoGet`` request. Servers should use the following URI
schemes for this situation:

+--------------------+---------------------------------------+
| Location           | URI Scheme                            |
+====================+=======================================+
| Object storage (1) | gs:, gcs:, abfs:, abfss:, wasbs:, s3: |
+--------------------+---------------------------------------+
| HTTP service (2)   | http:, https:                         |
+--------------------+---------------------------------------+

Notes:

* \(1) Any required authentication should either be negotiated
  externally to Flight or be provided through a presigned URI.
* \(2) The client should make a GET request to the provided URI
  to retrieve the data.

When using an extended location URI, the client should ignore any
value in the ``Ticket`` field of the ``FlightEndpoint``. The
``Ticket`` is only used for identifying data in the context of a
Flight service, and is not needed when the client is directly
downloading data from an external service.
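
As an illustration of the scheme table and the ``Ticket`` rule above, the
following minimal sketch classifies a location URI by its scheme using
Python's ``urllib.parse``. The names ``classify_location`` and
``needs_ticket`` and the returned labels are hypothetical and are not part
of any Flight API. The empty-string, ``arrow-flight-reuse-connection://?``
and ``grpc`` cases follow the ``Location`` comment in ``Flight.proto``.

```python
from urllib.parse import urlsplit

# Schemes copied from the table above.
OBJECT_STORAGE_SCHEMES = {"gs", "gcs", "abfs", "abfss", "wasbs", "s3"}
HTTP_SCHEMES = {"http", "https"}
REUSE_CONNECTION_URI = "arrow-flight-reuse-connection://?"


def classify_location(uri: str) -> str:
    """Return a hypothetical label describing how data at ``uri`` is fetched."""
    if uri == "" or uri == REUSE_CONNECTION_URI:
        # Redeem the ticket on the service that generated it.
        return "flight-reuse-connection"
    scheme = urlsplit(uri).scheme.lower()
    if scheme == "grpc" or scheme.startswith("grpc+"):
        return "flight"
    if scheme in HTTP_SCHEMES:
        return "http"
    if scheme in OBJECT_STORAGE_SCHEMES:
        return "object-storage"
    return "unknown"


def needs_ticket(uri: str) -> bool:
    """The Ticket only matters when the data is redeemed via DoGet."""
    return classify_location(uri) in {"flight", "flight-reuse-connection"}
```

With this sketch, ``needs_ticket("s3://bucket/result.parquet")`` is false, so
a client following the rule above would ignore the ``Ticket`` and download the
object directly.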

Clients should assume that, unless otherwise specified, the data is
being returned using the :ref:`ipc-streaming-format` just as it would
via a ``DoGet`` call. If the returned ``Content-Type`` header is a generic
media type such as ``application/octet-stream``, the client should still assume
it is an Arrow IPC stream. For other formats, such as Apache Parquet,
the server should set ``Content-Type`` to the appropriate IANA media type
so that a client can recognize it.
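
The ``Content-Type`` rule above could be sketched on the client side as
follows. This is only an illustration: ``resolve_format`` and its return
labels are hypothetical, and the two media type strings are the ones used as
examples in the ``Flight.proto`` comment.

```python
ARROW_STREAM = "application/vnd.apache.arrow.stream"
PARQUET = "application/vnd.apache.parquet"
GENERIC_TYPES = {"application/octet-stream"}


def resolve_format(content_type):
    """Map a Content-Type header value (or None) to a hypothetical format label."""
    if not content_type:
        # No header: fall back to the default, an Arrow IPC stream.
        return "arrow-ipc-stream"
    # Drop parameters such as "; charset=..." and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in GENERIC_TYPES or media_type == ARROW_STREAM:
        return "arrow-ipc-stream"
    if media_type == PARQUET:
        return "parquet"
    # Any other media type is outside what this sketch knows how to read.
    return "unsupported"
```

A real client would likely dispatch on the returned label to the matching
reader instead of returning a string.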

Finally, the server may also allow the client to choose what format the
data is returned in by respecting the ``Accept`` header in the request.
If multiple formats are requested and supported, the choice of which to
use is server-specific. If none of the requested content types are
supported, the server may respond with 406 (Not Acceptable) or
415 (Unsupported Media Type), or it may respond successfully with a
different format that it does support, along with the correct
``Content-Type`` header.
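
A client-side view of this negotiation might look like the sketch below. It
only builds the request with the standard library's ``urllib.request`` and
interprets a status code and header; no network call is made. The helper
names ``build_request`` and ``interpret_response``, the labels they return,
and the example URI are all assumptions made for illustration.

```python
import urllib.request

ARROW_STREAM = "application/vnd.apache.arrow.stream"
PARQUET = "application/vnd.apache.parquet"


def build_request(uri, accepted):
    """Build a GET request that lists the acceptable media types."""
    headers = {"Accept": ", ".join(accepted)}
    return urllib.request.Request(uri, headers=headers, method="GET")


def interpret_response(status, content_type, accepted):
    """Classify a response according to the negotiation rules above."""
    if status in (406, 415):
        return "rejected"
    if not 200 <= status < 300:
        return "error"
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type in ("", "application/octet-stream"):
        # Missing or generic type: assume an Arrow IPC stream.
        media_type = ARROW_STREAM
    if media_type in accepted:
        return "requested-format"
    # The server answered successfully but chose a different format.
    return "fallback-format"
```

Because the server may answer successfully with a format that was not
requested, the sketch always inspects ``Content-Type`` rather than trusting
the ``Accept`` list it sent.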

*Note: new schemes may be proposed in the future to allow for more
flexibility based on community requests.*


Error Handling
==============

39 changes: 37 additions & 2 deletions format/Flight.proto
@@ -426,8 +426,43 @@ message Ticket {
}

/*
* A location where a Flight service will accept retrieval of a particular
* stream given a ticket.
* A location to retrieve a particular stream from. This URI should be one of
* the following:
* - An empty string or the string 'arrow-flight-reuse-connection://?':
*   indicating that the ticket can be redeemed on the service where the
*   ticket was generated via a DoGet request.
* - A valid gRPC URI (grpc://, grpc+tls://, grpc+unix://, etc.):
*   indicating that the ticket can be redeemed on the service at the given
*   URI via a DoGet request.
* - A valid HTTP URI (http://, https://, etc.):
*   indicating that the client should perform a GET request against the
*   given URI to retrieve the stream. The ticket should be empty
*   in this case and should be ignored by the client.
* - An object storage URI (s3://, gs://, abfs://, etc.):
*   indicating that the client should retrieve the data from the provided
*   object storage location. The ticket should be empty in this case and
*   should be ignored by the client.
*
* Non-Flight URIs are allowed so that Flight services can indicate that
* results can be downloaded in formats other than Arrow (such as Parquet), or so that
* results can be fetched directly from a URI to reduce excess copying and data movement.
* In these cases, the following conventions should be followed by servers and clients:
*
* - Unless otherwise specified by the 'Content-Type' header of the response,
*   a client should assume the response is using the Arrow IPC Streaming format.
*   A generic media type such as 'application/octet-stream' should also be
*   treated as the Arrow IPC Streaming format.
* - The server may allow the client to choose a specific response format by
*   specifying an 'Accept' header in the request, such as 'application/vnd.apache.parquet'
*   or 'application/vnd.apache.arrow.stream'. If multiple types are requested and
*   supported by the server, the choice of which to use is server-specific. If
*   none of the requested content types are supported, the server may respond with
*   406 (Not Acceptable) or 415 (Unsupported Media Type), or it may respond
*   successfully with a different format that it does support, along with the
*   correct 'Content-Type' header.
*
* Note: new schemes may be proposed in the future to allow for more flexibility based
* on community requests.
*/
message Location {
  string uri = 1;