
Docs: Fix the wrong catalog name in using polaris page #1471


Open · owenowenisme wants to merge 2 commits into main

Conversation

@owenowenisme (Contributor) commented Apr 27, 2025

The SQL command on the "using polaris" page uses the wrong catalog name: according to the quickstart instructions it should be quickstart_catalog rather than polaris.

Signed-off-by: owenowenisme <[email protected]>
github-project-automation bot moved this to PRs In Progress in Basic Kanban Board on Apr 27, 2025
@owenowenisme changed the title from "Docs: Fix the wrong Catalog name in using polaris page" to "Docs: Fix the wrong catalog name in using polaris page" on Apr 27, 2025
@adnanhemani (Collaborator) commented:

Don’t think this is correct; check this:

--conf, "spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog",

I can look further into this tomorrow, but in the meantime, please re-verify and paste output showing this running correctly without any changes.

@owenowenisme (Contributor, Author) commented Apr 27, 2025

@adnanhemani Thanks for commenting!
When I follow this:

```shell
./polaris \
--client-id ${CLIENT_ID} \
--client-secret ${CLIENT_SECRET} \
catalogs \
create \
--storage-type s3 \
--default-base-location ${DEFAULT_BASE_LOCATION} \
--role-arn ${ROLE_ARN} \
quickstart_catalog
```

the instructions tell us to define a catalog named quickstart_catalog,
so I think that the "Sample Commands" section
#### Sample Commands
Once the Spark session starts, we can create a namespace and table within the catalog:
```sql
USE polaris;
CREATE NAMESPACE IF NOT EXISTS quickstart_namespace;
CREATE NAMESPACE IF NOT EXISTS quickstart_namespace.schema;
USE NAMESPACE quickstart_namespace.schema;
CREATE TABLE IF NOT EXISTS quickstart_table (id BIGINT, data STRING) USING ICEBERG;
```

should also use quickstart_catalog, or it will be misleading.
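
For reference, the corrected section would then presumably read as follows (only the catalog name in the USE statement changes):

```sql
USE quickstart_catalog;
CREATE NAMESPACE IF NOT EXISTS quickstart_namespace;
CREATE NAMESPACE IF NOT EXISTS quickstart_namespace.schema;
USE NAMESPACE quickstart_namespace.schema;
CREATE TABLE IF NOT EXISTS quickstart_table (id BIGINT, data STRING) USING ICEBERG;
```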

@adnanhemani (Collaborator) commented:

There’s a difference between a Spark Catalog and a Polaris Catalog. It makes sense that it can be confusing, so maybe we add a line in the documentation that states that we have added a Spark catalog named “polaris” that points to the “quickstart_catalog” - or change all references in the Spark configs to “quickstart_catalog”. What are your thoughts?
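
To make the distinction concrete: the name in the spark.sql.catalog.&lt;name&gt;.* keys is the Spark catalog name, while the Polaris catalog it resolves to comes from the warehouse property. A rough sketch of the first option (illustrative only; the exact property set depends on the quickstart configuration):

```
# Spark catalog named "polaris", backed by the Polaris catalog "quickstart_catalog".
# Illustrative sketch, not the literal docker-compose contents.
spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.polaris.type=rest
spark.sql.catalog.polaris.warehouse=quickstart_catalog
```

With that config the SQL side would keep USE polaris; renaming the Spark catalog instead would make both names quickstart_catalog.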

cc @adutra, who I believe was the original author of the Dockerfiles where this was named.

@owenowenisme (Contributor, Author) commented Apr 27, 2025

My thought is to change all references in the Spark configs to “quickstart_catalog”, since the 0.9.0 docs also do it this way:

Once the Spark session starts, we can create a namespace and table within the catalog:
```
spark.sql("USE quickstart_catalog")
spark.sql("CREATE NAMESPACE IF NOT EXISTS quickstart_namespace")
spark.sql("CREATE NAMESPACE IF NOT EXISTS quickstart_namespace.schema")
spark.sql("USE NAMESPACE quickstart_namespace.schema")
spark.sql("""
CREATE TABLE IF NOT EXISTS quickstart_table (
id BIGINT, data STRING
)
USING ICEBERG
""")
```

@eric-maynard previously approved these changes on Apr 28, 2025
github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board on Apr 28, 2025
@eric-maynard (Contributor) commented:

Looking at the invocation, quickstart_catalog looks correct. cc @adnanhemani

```shell
bin/spark-sql \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1,org.apache.hadoop:hadoop-aws:3.4.0 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.quickstart_catalog.warehouse=quickstart_catalog \
. . .
```
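
The remaining --conf lines in that style of invocation point the quickstart_catalog Spark catalog at the Polaris REST endpoint. A sketch of what they typically look like (not the literal quickstart text; the URI, credential, and scope values below are placeholders):

```shell
# Placeholder values; adjust to your Polaris deployment.
--conf spark.sql.catalog.quickstart_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.quickstart_catalog.type=rest \
--conf spark.sql.catalog.quickstart_catalog.uri=http://localhost:8181/api/catalog \
--conf spark.sql.catalog.quickstart_catalog.credential=${CLIENT_ID}:${CLIENT_SECRET} \
--conf spark.sql.catalog.quickstart_catalog.scope=PRINCIPAL_ROLE:ALL
```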

@adnanhemani (Collaborator) commented:

@eric-maynard - yes, but that invocation is only for the local Spark setup (which, from personal experience, is very hit or miss as to whether folks can get it working). For the Docker Spark instructions, the catalog is called polaris, as per:

--conf, "spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog",

This is an issue - both should have been mirrors of each other.

@owenowenisme - I agree with your recommendation. Can you please change the Docker Compose files as well to mirror this quickstart_catalog Spark catalog name? If possible, let's hold on merging this until we get those changes as well. If not, I can branch a new PR to make those changes too.
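
For the Compose files, the change would be the same rename applied to every spark.sql.catalog.polaris.* key in the spark-sql command arguments, along these lines (illustrative; only one key shown):

```
- "--conf", "spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog",
+ "--conf", "spark.sql.catalog.quickstart_catalog=org.apache.iceberg.spark.SparkCatalog",
```

Any other spark.sql.catalog.polaris.* keys in the same files (for example the .credential key touched later in this PR) would get the same rename.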

Signed-off-by: owenowenisme <[email protected]>
@owenowenisme
Copy link
Contributor Author

@adnanhemani I changed them all in this PR, PTAL.
And thanks for reviewing!

@adnanhemani (Collaborator) left a comment:

LGTM, thank you for this @owenowenisme!!

````diff
@@ -174,7 +174,7 @@ Replace the credentials used in the Docker container using the following code:
 ```shell
 USER_CLIENT_ID="XXXX"
 USER_CLIENT_SECRET="YYYY"
-sed -i "s/^\(.*spark\.sql\.catalog\.polaris\.credential=\).*/\1${USER_CLIENT_ID}:${USER_CLIENT_SECRET}\",/" getting-started/eclipselink/docker-compose.yml
+sed -i "s/^\(.*spark\.sql\.catalog\.quickstart_catalog\.credential=\).*/\1${USER_CLIENT_ID}:${USER_CLIENT_SECRET}\",/" getting-started/eclipselink/docker-compose.yml
````
Contributor:

Not relevant to this PR, but I still find this line really scary. This seems like a recipe for someone unknowingly pushing their credentials to GH.

@owenowenisme (Contributor, Author) replied Apr 29, 2025:

I think you're right!
I can open another PR to use an env variable instead.
WDYT?
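
Something along these lines, letting Docker Compose substitute the values from the shell environment or a .env file instead of rewriting the file with sed (a sketch; the service name and surrounding structure are illustrative, not the actual compose file):

```yaml
# Sketch only: Compose interpolates ${USER_CLIENT_ID} / ${USER_CLIENT_SECRET}
# at startup, so no credentials are ever written into the file.
services:
  spark-sql:   # illustrative service name
    command:
      # ...other arguments unchanged...
      - "--conf"
      - "spark.sql.catalog.quickstart_catalog.credential=${USER_CLIENT_ID}:${USER_CLIENT_SECRET}"
```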

Collaborator:

If you can make it work, then yes please!

@owenowenisme (Contributor, Author):

I'll start as soon as this PR is merged.
