Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api): extending uuid field in dataset, chart, dasboard APIs #29573

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

dankor
Copy link

@dankor dankor commented Jul 12, 2024

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Extend API to handle uuid

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@github-actions github-actions bot added the api Related to the REST API label Jul 12, 2024
@dosubot dosubot bot added api:charts Related to the REST endpoints of charts api:dashboard Related to the REST endpoints of the Dashboard labels Jul 12, 2024
@dankor dankor mentioned this pull request Jul 12, 2024
3 tasks
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️

We hope to see you in our Slack community too! Not signed up? Use our Slack App to self-register.

Copy link

codecov bot commented Jul 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 64.99%. Comparing base (76d897e) to head (9c71763).
Report is 500 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #29573      +/-   ##
==========================================
+ Coverage   60.48%   64.99%   +4.50%     
==========================================
  Files        1931      526    -1405     
  Lines       76236    37928   -38308     
  Branches     8568        0    -8568     
==========================================
- Hits        46114    24651   -21463     
+ Misses      28017    13277   -14740     
+ Partials     2105        0    -2105     
Flag Coverage Δ
hive 49.12% <100.00%> (-0.05%) ⬇️
javascript ?
presto 53.68% <100.00%> (-0.13%) ⬇️
python 64.99% <100.00%> (+1.50%) ⬆️
unit 59.90% <100.00%> (+2.28%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -123,6 +125,36 @@ class DatasetRestApi(BaseSupersetModelRestApi):
"schema",
"sql",
"table_name",
"always_filter_main_dttm",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dpgaspar won't the payload critically increase with this change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have datasets with thousands of metrics and columns.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's only blocker, I don't mind to revert that. Perhaps it makes more sense to introduce a parallel API (v2) and catch up the front-end later. Please let me know, no rush here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could potentially generate huge payloads like @michael-s-molina is saying

@betodealmeida
Copy link
Member

Hi, @dankor! Welcome to the Superset GitHub!

Can you add some context on why these fields are needed? Thanks!

@michael-s-molina michael-s-molina added the review:checkpoint Last PR reviewed during the daily review standup label Jul 12, 2024
@dankor
Copy link
Author

dankor commented Jul 13, 2024

Hi, @dankor! Welcome to the Superset GitHub!

Can you add some context on why these fields are needed? Thanks!

Hi @betodealmeida! Sure! I'm working on establishing a robust and granular deployment process for Superset. The core issue here is similar to why we expose these UUIDs in the import/export API: we need persistent identifiers across multiple Superset instances.

You might suggest extending the import/export API instead, but we’re encountering several challenges:

  • No support for tags, roles, etc.
  • Deprecated resources aren’t cleaned up.
  • Archives are much harder to debug compared to classic CRUD operations.

What I’m really aiming for is a declarative way to manage Superset resources (let’s call it a "report-as-code" approach). Imagine having a development environment where everyone can experiment freely. Once a report is ready, it can be committed to git, reviewed, and then deployed to higher environments.

Ideally, I envision mature version of this, but supporting all types of resources. You can check out a proof-of-concept in this video, though no code is available.

My plan is to implement this feature incrementally, starting with minor changes and aligning as closely as possible with upstream.

Hope this makes sense! Let me know if you're open to these kinds of contributions and what concerns the community might have.

@michael-s-molina michael-s-molina removed the review:checkpoint Last PR reviewed during the daily review standup label Jul 15, 2024
@dankor
Copy link
Author

dankor commented Jul 29, 2024

Hey all! I'm just wondering if there are any upcoming plans or next steps here. Could you please let me what to expect?

@rusackas
Copy link
Member

It looks like CI is stuck because of some tests failing (which I'm re-running), and pre-commit checks (i.e. linting) so you will need to run the Pre-commit hooks

Superset uses Git pre-commit hooks courtesy of pre-commit. To install run the following:

pip3 install -r requirements/integration.txt
pre-commit install
A series of checks will now run when you make a git commit.

You can then run pre-commit manually:

pre-commit run --all-files

Copy link
Member

@michael-s-molina michael-s-molina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list_columns is used by the frontend to load the datasets/charts/dashboards pages. Projecting all column/metric information will significantly impact performance for environments that have thousands of columns/metrics. If you want to query for that information you can request them using the q parameter.

@dankor
Copy link
Author

dankor commented Jul 29, 2024

Hi @rusackas ! Thanks! I fixed the linting.

@michael-s-molina Fair point. Actually, I have added this fields in order to q them:

{
  "columns": [
    "always_filter_main_dttm",
    "cache_timeout",
    "catalog",
    "columns.advanced_data_type",
    "columns.column_name",
    "columns.description",
    "columns.expression",
    "columns.extra",
    "columns.filterable",
    "columns.groupby",
    "columns.is_active",
    "columns.is_dttm",
    "columns.python_date_format",
    "columns.type",
    "columns.verbose_name",
    "default_endpoint",
    "description",
    "extra",
    "fetch_values_predicate",
    "filter_select_enabled",
    "is_managed_externally",
    "is_sqllab_view",
    "main_dttm_col",
    "metrics.d3format",
    "metrics.description",
    "metrics.expression",
    "metrics.extra",
    "metrics.metric_name",
    "metrics.metric_type",
    "metrics.verbose_name",
    "metrics.warning_text",
    "normalize_columns",
    "offset",
    "schema",
    "sql",
    "table_name",
    "template_params",
    "uuid",
    "database.uuid"
  ],
  "filters": [
    {
      "col": "uuid",
      "opr": "eq",
      "value": "7d2a123a-1952-46e1-b9d0-4d41e1e042c1"
    }
  ]
}

Otherwise these fields won't be listed with q.

Do you suppose to keep list_columns unchanged and perform two requests instead?

  • Run list_columns with uuid filter and get id.
  • Run show_select_columns against obtained id with all required fields.

I'd tend to have one request if possible, but I assume the front-end changes may be too big, right?

@michael-s-molina
Copy link
Member

michael-s-molina commented Jul 29, 2024

I'd tend to have one request if possible, but I assume the front-end changes may be too big, right?

You would need to change the requests for all pages but I'm also worried that the default implementation of the endpoint will perform these joins (columns and metrics belong to separate tables).

@dpgaspar Is there a way we can achieve this without multiple requests?

@dankor
Copy link
Author

dankor commented Aug 8, 2024

I'm wondering if we can come up with some action plan here. It currently seems quite frontend-oriented. I'd like to find a common ground in order to both fulfill my use-case and contribute it upstream, so community can benefit too. What's your vision here?

@rusackas
Copy link
Member

I think the main thing is that we can't impact performance. @dpgaspar is currently out of office, but may be able to provide suggestions upon his return of how to handle this at the API layer.

I'm also wondering if this might be a precursor to a (fairly common) request to change URLs from being incremental integer IDs to UUIDs (e.g. here)

It might also tie into this issue.

Maybe we can just solve the performance concern and let this through, but at some point we need to have a SIP proposed about how/where UUIDs should be used more systemically by Superset. I think that's the real path forward - broadening the discussion to formulate a plan to use UUIDs everywhere relevant.

@dpgaspar
Copy link
Member

I'd tend to have one request if possible, but I assume the front-end changes may be too big, right?

You would need to change the requests for all pages but I'm also worried that the default implementation of the endpoint will perform these joins (columns and metrics belong to separate tables).

@dpgaspar Is there a way we can achieve this without multiple requests?

If we include columns and metrics on a dataset payload, these results won't be paginated. So I advise making multiple requests

@dankor dankor reopened this Aug 26, 2024
@dankor
Copy link
Author

dankor commented Aug 26, 2024

Hey @betodealmeida @rusackas @michael-s-molina @dpgaspar !

Could you please take another look? I've reworked this PR to eliminate the need for listing calls when looking up a single item by UUID. I was inspired by the dashboard GET approach, where the dashboard API supports three different PKs: ID, slug, and UUID. I've only implemented this for the chart so far, to get your feedback on the new approach. I hope it makes more sense now. Looking forward to hearing from you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api:charts Related to the REST endpoints of charts api:dashboard Related to the REST endpoints of the Dashboard api Related to the REST API size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants