Add more examples for pyiceberg view #3414
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1529,15 +1529,72 @@ cleanup_old_snapshots("analytics.user_events", [12345, 67890, 11111]) | |
|
|
||
| ## Views | ||
|
|
||
| PyIceberg supports view operations. | ||
| If the REST server does not advertise support for view endpoints, you can enable them on the client by setting `"view-endpoints-supported": "true"`: | ||
|
|
||
| ### Check if a view exists | ||
| ```python | ||
| from pyiceberg.catalog import load_catalog | ||
|
|
||
| catalog = load_catalog( | ||
| "docs", | ||
| **{ | ||
| "uri": "http://127.0.0.1:8181", | ||
| "s3.endpoint": "http://127.0.0.1:9000", | ||
| "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO", | ||
| "s3.access-key-id": "admin", | ||
| "s3.secret-access-key": "password", | ||
| "view-endpoints-supported": "true", | ||
|
Comment on lines +1540 to +1545
Member
Documenting How about keeping
Author
So the create/load/check tables are all H2. Thus, I proceeded with the same for views. I like the idea of keeping |
||
| } | ||
| ) | ||
| ``` | ||
|
|
||
| ## Create a view | ||
|
|
||
| To create a view from the catalog: | ||
|
|
||
| ```python | ||
| import time | ||
| from pyiceberg.catalog import load_catalog | ||
| from pyiceberg.schema import Schema | ||
| from pyiceberg.types import IntegerType, NestedField | ||
| from pyiceberg.view import SQLViewRepresentation, ViewVersion | ||
|
|
||
| catalog = load_catalog("default") | ||
|
MonkeyCanCode marked this conversation as resolved.
|
||
| catalog.view_exists("default.bar") | ||
|
|
||
| schema = Schema(NestedField(field_id=1, name="some_col", field_type=IntegerType(), required=False)) | ||
| view_version = ViewVersion( | ||
| version_id=1, | ||
| schema_id=1, | ||
| timestamp_ms=int(time.time() * 1000), | ||
| summary={"spark-version": "4.1"}, | ||
| representations=[ | ||
| SQLViewRepresentation( | ||
| type="sql", | ||
| sql="SELECT 1 as some_col", | ||
| dialect="spark", | ||
| ) | ||
| ], | ||
| default_namespace=["default"], | ||
| ) | ||
|
|
||
| catalog.create_view( | ||
| identifier="default.some_view", | ||
| schema=schema, | ||
| view_version=view_version, | ||
| ) | ||
| ``` | ||
|
|
||
| `catalog.create_view` also accepts a PyArrow schema, so the following is equivalent: | ||
|
|
||
| ```python | ||
| import pyarrow as pa | ||
|
|
||
| schema = pa.schema([pa.field("some_col", pa.int32())]) | ||
|
|
||
| catalog.create_view( | ||
| identifier="default.some_view", | ||
| schema=schema, | ||
| view_version=view_version, | ||
| ) | ||
| ``` | ||
|
|
||
| ## Register a view | ||
|
|
@@ -1551,6 +1608,48 @@ catalog.register_view( | |
| ) | ||
| ``` | ||
|
|
||
| ## Load a view | ||
|
|
||
| Loading the `some_view` view: | ||
|
|
||
| ```python | ||
| view = catalog.load_view("default.some_view") | ||
| # Equivalent to: | ||
| view = catalog.load_view(("default", "some_view")) | ||
| # The tuple syntax can be used if the namespace or view contains a dot. | ||
| ``` | ||
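The difference between the two identifier forms can be sketched in plain Python. This assumes a dotted string is split on every dot; `split_identifier` is a hypothetical helper written for illustration, not a PyIceberg function.

```python
def split_identifier(identifier):
    """Normalize a dotted string or a tuple into a tuple of name parts.

    Hypothetical helper: assumes strings are split on every dot and
    tuples are passed through unchanged.
    """
    if isinstance(identifier, tuple):
        return identifier
    return tuple(identifier.split("."))


# A simple name gives the same parts in both forms.
print(split_identifier("default.some_view"))
print(split_identifier(("default", "some_view")))

# A namespace containing a dot only survives intact in the tuple form.
print(split_identifier("my.namespace.some_view"))
print(split_identifier(("my.namespace", "some_view")))
```

Under this assumption, the string form turns `my.namespace` into two separate parts, which is why the tuple form is the safer choice for names that contain dots.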
|
|
||
| This returns a `View` that represents an Iceberg view. You can access the SQL representation for a specific dialect: | ||
|
|
||
| ```python | ||
| sql_representation = view.sql_for("spark") | ||
| print(sql_representation.sql) | ||
| ``` | ||
|
|
||
| ## Check if a view exists | ||
|
|
||
| To check whether the `some_view` view exists: | ||
|
|
||
| ```python | ||
| catalog.view_exists("default.some_view") | ||
| ``` | ||
|
|
||
| ## List views | ||
|
|
||
| To list views in the `default` namespace: | ||
|
|
||
| ```python | ||
| catalog.list_views("default") | ||
| ``` | ||
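A small sketch of turning the listing into printable names. It assumes `list_views` returns identifier tuples such as `("default", "some_view")`; `StubCatalog` and `view_names` are hypothetical stand-ins so the snippet runs without a REST server.

```python
class StubCatalog:
    """Hypothetical stand-in for a catalog; only list_views is modelled."""

    def __init__(self, views):
        self._views = list(views)

    def list_views(self, namespace):
        # Assumption: identifiers come back as tuples of name parts.
        return [view for view in self._views if view[0] == namespace]


def view_names(catalog, namespace):
    """Return dotted view names for a namespace, sorted for stable output."""
    return sorted(".".join(identifier) for identifier in catalog.list_views(namespace))


catalog = StubCatalog([("default", "some_view"), ("default", "bar"), ("other", "baz")])
for name in view_names(catalog, "default"):
    print(name)
```

With a real catalog, only `view_names(catalog, "default")` would be needed; the stub exists purely to keep the example self-contained.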
|
|
||
| ## Drop a view | ||
|
|
||
| To drop a view: | ||
|
|
||
| ```python | ||
| catalog.drop_view("default.some_view") | ||
| ``` | ||
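`view_exists` and `drop_view` can be combined into a guarded drop. The sketch below uses only those two calls from the examples above; `InMemoryCatalog` and `drop_view_if_exists` are hypothetical names introduced for illustration.

```python
class InMemoryCatalog:
    """Hypothetical stand-in exposing only view_exists and drop_view."""

    def __init__(self, views):
        self._views = set(views)

    def view_exists(self, identifier):
        return identifier in self._views

    def drop_view(self, identifier):
        self._views.remove(identifier)


def drop_view_if_exists(catalog, identifier):
    """Drop the view when present; report whether anything was dropped."""
    if catalog.view_exists(identifier):
        catalog.drop_view(identifier)
        return True
    return False


catalog = InMemoryCatalog({"default.some_view"})
print(drop_view_if_exists(catalog, "default.some_view"))  # first call drops the view
print(drop_view_if_exists(catalog, "default.some_view"))  # second call finds nothing
```

Against a shared catalog, another client could drop the view between the check and the drop, so real code may still want to handle the failure from `drop_view`.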
|
|
||
| ## Table Statistics Management | ||
|
|
||
| Manage table statistics with operations through the `Table` API: | ||
|
|
||
I don't think we should be encouraging this pattern. If a Catalog says they don't support view endpoints (through /v1/config), we shouldn't tell people to ignore that.
@rambleraptor, what is your suggestion? This config property was introduced for backward compatibility. Older REST servers may not send the `endpoints` field in the ConfigResponse even if view operations are supported.
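One policy that would satisfy both comments can be sketched as follows: trust the server whenever it sends an `endpoints` list, and consult the client flag only when the field is absent. This is an illustration of the discussion, not PyIceberg's actual implementation; `views_supported`, the dict-shaped config response, and the endpoint strings are assumptions.

```python
def views_supported(config_response, client_properties):
    """Decide whether to call view endpoints (hypothetical policy)."""
    endpoints = config_response.get("endpoints")
    if endpoints is not None:
        # The server stated what it supports, so the client flag is ignored.
        return any("/views" in endpoint for endpoint in endpoints)
    # Older servers omit the field; fall back to the explicit client opt-in.
    flag = client_properties.get("view-endpoints-supported", "false")
    return flag.lower() == "true"


legacy_response = {}  # no "endpoints" field at all
modern_response = {"endpoints": ["GET /v1/{prefix}/namespaces"]}  # no view routes
opt_in = {"view-endpoints-supported": "true"}

print(views_supported(legacy_response, opt_in))
print(views_supported(modern_response, opt_in))
```

Under this policy the flag stays a backward-compatibility escape hatch and cannot override a server that explicitly omits view endpoints.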