Skip to content

Add warehouse parameter to the REST Catalog doc #2066

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions mkdocs/docs/configuration.md
Original file line number Diff line number Diff line change
Expand Up @@ -346,6 +346,7 @@ catalog:
| rest.signing-name | execute-api | The service signing name to use when SigV4 signing a request |
| oauth2-server-uri | <https://auth-service/cc> | Authentication URL to use for client credentials authentication (default: uri + 'v1/oauth/tokens') |
| snapshot-loading-mode | refs | The snapshots to return in the body of the metadata. Setting the value to `all` would return the full set of snapshots currently valid for the table. Setting the value to `refs` would load all snapshots referenced by branches or tags. |
| warehouse | myCatalog | For some catalog implementations like Databricks Unity, the warehouse is mandatory to identify which top level container (e.g. catalog, metastore...) is accessed. Namespaces are within this container. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This configuration is used across all catalog's, so wdyt about saying something closer to the root directory in storage where new Iceberg tables are created, also it may be worth mentioning that some REST clients may override this path via the getConfig api.

https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L97

We should also have a storage path be the example.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, I am not sure to understand the comment and if some rework is requested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geruh While that is true, the meaning of warehouse is different for the REST catalog and the others. For example, for Hive a path s3://bucket/warehouse is expected.

Copy link
Contributor

@geruh geruh Jun 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh so it doesn't have to be a path for the warehouse in rest?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warehouse is kinda confusing for rest catalog. its used as an identifier
polaris uses it to specify the "polaris catalog name" since it supports multiple catalogs https://polaris.apache.org/in-dev/unreleased/polaris-spark-client/
s3 tables uses it to specify the specific s3 tables arn https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables-integrating-open-source.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| warehouse | myCatalog | For some catalog implementations like Databricks Unity, the warehouse is mandatory to identify which top level container (e.g. catalog, metastore...) is accessed. Namespaces are within this container. |
| warehouse | myWarehouse | Warehouse location or identifier to request from the catalog service. May be used to determine server-side overrides, such as the warehouse location. |

how about something like this? this aligns with what the rest client use warehouse for. and is taken from the openapi spec
https://github.com/apache/iceberg/blob/17f9a9fd28bbcb37745f13ebfcd57cd5a96e0a5d/open-api/rest-catalog-open-api.yaml#L78

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warehouse is only one of the terms used by providers to identify the top level containers. Some are using catalog, other metastore. As it is quite important I suggest to be more explicit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some are using catalog, other metastore. As it is quite important I suggest to be more explicit.

It seems like catalog/metastore are terms specific to UC. I would like to keep the section generic for all REST implementations. Implementation can map the warehouse identifier to its internal construct. WDYT?


<!-- markdown-link-check-enable-->

Expand Down