Skip to content

Add warehouse parameter to the REST Catalog doc #2066

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

elca-anh
Copy link

@elca-anh elca-anh commented Jun 5, 2025

Rationale for this change

Missing parameter in REST Catalog documentation

Are these changes tested?

Doc only

Are there any user-facing changes?

Doc only

@@ -346,6 +346,7 @@ catalog:
| rest.signing-name | execute-api | The service signing name to use when SigV4 signing a request |
| oauth2-server-uri | <https://auth-service/cc> | Authentication URL to use for client credentials authentication (default: uri + 'v1/oauth/tokens') |
| snapshot-loading-mode | refs | The snapshots to return in the body of the metadata. Setting the value to `all` would return the full set of snapshots currently valid for the table. Setting the value to `refs` would load all snapshots referenced by branches or tags. |
| warehouse | myCatalog | For some catalog implementations like Databricks Unity, the warehouse is mandatory to identify which top level container (e.g. catalog, metastore...) is accessed. Namespaces are within this container. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This configuration is used across all catalog's, so wdyt about saying something closer to the root directory in storage where new Iceberg tables are created, also it may be worth mentioning that some REST clients may override this path via the getConfig api.

https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L97

We should also have a storage path be the example.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, I am not sure to understand the comment and if some rework is requested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geruh While that is true, the meaning of warehouse is different for the REST catalog and the others. For example, for Hive a path s3://bucket/warehouse is expected.

Copy link
Contributor

@geruh geruh Jun 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh so it doesn't have to be a path for the warehouse in rest?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warehouse is kinda confusing for rest catalog. its used as an identifier
polaris uses it to specify the "polaris catalog name" since it supports multiple catalogs https://polaris.apache.org/in-dev/unreleased/polaris-spark-client/
s3 tables uses it to specify the specific s3 tables arn https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables-integrating-open-source.html

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the docs for this! I added a comment to align the documentation with the REST openapi spec

@@ -346,6 +346,7 @@ catalog:
| rest.signing-name | execute-api | The service signing name to use when SigV4 signing a request |
| oauth2-server-uri | <https://auth-service/cc> | Authentication URL to use for client credentials authentication (default: uri + 'v1/oauth/tokens') |
| snapshot-loading-mode | refs | The snapshots to return in the body of the metadata. Setting the value to `all` would return the full set of snapshots currently valid for the table. Setting the value to `refs` would load all snapshots referenced by branches or tags. |
| warehouse | myCatalog | For some catalog implementations like Databricks Unity, the warehouse is mandatory to identify which top level container (e.g. catalog, metastore...) is accessed. Namespaces are within this container. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| warehouse | myCatalog | For some catalog implementations like Databricks Unity, the warehouse is mandatory to identify which top level container (e.g. catalog, metastore...) is accessed. Namespaces are within this container. |
| warehouse | myWarehouse | Warehouse location or identifier to request from the catalog service. May be used to determine server-side overrides, such as the warehouse location. |

how about something like this? this aligns with what the rest client use warehouse for. and is taken from the openapi spec
https://github.com/apache/iceberg/blob/17f9a9fd28bbcb37745f13ebfcd57cd5a96e0a5d/open-api/rest-catalog-open-api.yaml#L78

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warehouse is only one of the terms used by providers to identify the top level containers. Some are using catalog, other metastore. As it is quite important I suggest to be more explicit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some are using catalog, other metastore. As it is quite important I suggest to be more explicit.

It seems like catalog/metastore are terms specific to UC. I would like to keep the section generic for all REST implementations. Implementation can map the warehouse identifier to its internal construct. WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants