feat(api): extending uuid field in dataset, chart, dashboard APIs #29573
base: master
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master   #29573      +/-   ##
==========================================
+ Coverage   60.48%   64.99%   +4.50%
==========================================
  Files        1931      526     -1405
  Lines       76236    37928    -38308
  Branches     8568        0     -8568
==========================================
- Hits        46114    24651    -21463
+ Misses      28017    13277    -14740
+ Partials     2105        0     -2105
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
superset/datasets/api.py (Outdated)

```diff
@@ -123,6 +125,36 @@ class DatasetRestApi(BaseSupersetModelRestApi):
     "schema",
     "sql",
     "table_name",
+    "always_filter_main_dttm",
```
@dpgaspar won't the payload increase significantly with this change?
We have datasets with thousands of metrics and columns.
If that's the only blocker, I don't mind reverting it. Perhaps it makes more sense to introduce a parallel API (v2) and catch the front end up later. Please let me know, no rush here.
This could potentially generate huge payloads, as @michael-s-molina is saying.
Hi, @dankor! Welcome to the Superset GitHub! Can you add some context on why these fields are needed? Thanks!
Hi @betodealmeida! Sure! I'm working on establishing a robust and granular deployment process for Superset. The core issue here is similar to why we expose these UUIDs in the import/export API: we need persistent identifiers across multiple Superset instances. You might suggest extending the import/export API instead, but we’re encountering several challenges:
What I’m really aiming for is a declarative way to manage Superset resources (let’s call it a "report-as-code" approach). Imagine having a development environment where everyone can experiment freely. Once a report is ready, it can be committed to git, reviewed, and then deployed to higher environments. Ideally, I envision a mature version of this supporting all types of resources. You can check out a proof-of-concept in this video, though no code is available. My plan is to implement this feature incrementally, starting with minor changes and aligning as closely as possible with upstream. Hope this makes sense! Let me know if you're open to these kinds of contributions and what concerns the community might have.
Hey all! I'm just wondering if there are any upcoming plans or next steps here. Could you please let me know what to expect?
It looks like CI is stuck because of some tests failing (which I'm re-running) and pre-commit checks (i.e. linting), so you will need to run the pre-commit hooks. Superset uses Git pre-commit hooks courtesy of pre-commit. After installing them, you can also run pre-commit manually.
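As a sketch, the standard pre-commit workflow looks like this (generic pre-commit usage, not Superset-specific; Superset's contributing guide is the authoritative source for the project's exact steps):

```shell
# Generic pre-commit workflow (assumed, not Superset-specific).
# Guarded so it is a no-op outside a git checkout with pre-commit installed.
if command -v pre-commit >/dev/null 2>&1 && [ -d .git ]; then
  pre-commit install           # register the hooks in .git/hooks
  pre-commit run --all-files   # run every configured hook manually
else
  echo "install pre-commit first (e.g. pip install pre-commit), then re-run inside the repo"
fi
```

`pre-commit run --all-files` is also what CI effectively does, so running it locally before pushing avoids the stuck-lint situation above.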
`list_columns` is used by the frontend to load the datasets/charts/dashboards pages. Projecting all column/metric information will significantly impact performance for environments that have thousands of columns/metrics. If you want to query for that information you can request it using the `q` parameter.
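As a sketch of that `q`-based projection (assuming Flask-AppBuilder's Rison query convention for list endpoints; the base URL and UUID below are illustrative):

```python
# Build a Superset list-endpoint URL that projects only the needed columns
# and filters by uuid, instead of relying on the default list_columns.
from urllib.parse import quote

SUPERSET_URL = "http://localhost:8088"  # assumption: local dev instance

# Rison-encoded query: select two columns, filter a single dataset by uuid.
rison_q = (
    "(columns:!(uuid,table_name),"
    "filters:!((col:uuid,opr:eq,"
    "value:'7d2a123a-1952-46e1-b9d0-4d41e1e042c1')))"
)
url = f"{SUPERSET_URL}/api/v1/dataset/?q={quote(rison_q)}"
print(url)
```

The point is that the caller decides which fields to pay for, so heavyweight fields never land in the default page-load payload.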
Hi @rusackas! Thanks! I fixed the linting.

@michael-s-molina Fair point. Actually, I added these fields so that a single item can be fetched by UUID with a request like this:

```json
{
  "columns": [
    "always_filter_main_dttm",
    "cache_timeout",
    "catalog",
    "columns.advanced_data_type",
    "columns.column_name",
    "columns.description",
    "columns.expression",
    "columns.extra",
    "columns.filterable",
    "columns.groupby",
    "columns.is_active",
    "columns.is_dttm",
    "columns.python_date_format",
    "columns.type",
    "columns.verbose_name",
    "default_endpoint",
    "description",
    "extra",
    "fetch_values_predicate",
    "filter_select_enabled",
    "is_managed_externally",
    "is_sqllab_view",
    "main_dttm_col",
    "metrics.d3format",
    "metrics.description",
    "metrics.expression",
    "metrics.extra",
    "metrics.metric_name",
    "metrics.metric_type",
    "metrics.verbose_name",
    "metrics.warning_text",
    "normalize_columns",
    "offset",
    "schema",
    "sql",
    "table_name",
    "template_params",
    "uuid",
    "database.uuid"
  ],
  "filters": [
    {
      "col": "uuid",
      "opr": "eq",
      "value": "7d2a123a-1952-46e1-b9d0-4d41e1e042c1"
    }
  ]
}
```

Otherwise these fields won't be listed. Do you suppose we should keep it this way?
I'd tend to have one request if possible, but I assume the front-end changes may be too big, right?
You would need to change the requests for all pages, but I'm also worried that the default implementation of the endpoint will perform these joins (columns and metrics belong to separate tables). @dpgaspar Is there a way we can achieve this without multiple requests?
I'm wondering if we can come up with an action plan here. The discussion currently seems quite frontend-oriented. I'd like to find common ground in order to both fulfill my use case and contribute it upstream, so the community can benefit too. What's your vision here?
I think the main thing is that we can't impact performance. @dpgaspar is currently out of office, but may be able to provide suggestions upon his return on how to handle this at the API layer.

I'm also wondering if this might be a precursor to a (fairly common) request to change URLs from incremental integer IDs to UUIDs (e.g. here). It might also tie into this issue. Maybe we can just solve the performance concern and let this through, but at some point we need a SIP proposing how/where UUIDs should be used more systemically by Superset. I think that's the real path forward: broadening the discussion to formulate a plan to use UUIDs everywhere relevant.
If we include columns and metrics in a dataset payload, those results won't be paginated, so I advise making multiple requests.
Hey @betodealmeida @rusackas @michael-s-molina @dpgaspar! Could you please take another look? I've reworked this PR to eliminate the need for listing calls when looking up a single item by UUID. I was inspired by the dashboard GET approach, where the dashboard API supports three different PKs: ID, slug, and UUID. I've only implemented this for the chart so far, to get your feedback on the new approach. I hope it makes more sense now. Looking forward to hearing from you.
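A minimal sketch of that multi-PK lookup for charts (the helper name and logic are illustrative, not Superset's actual implementation; slugs apply only to dashboards, so only id and uuid are handled here):

```python
# Classify a GET path argument as an integer primary key or a UUID,
# so one endpoint can serve both /api/v1/chart/42 and /api/v1/chart/<uuid>.
import uuid as uuid_lib


def resolve_pk(id_or_uuid: str) -> tuple[str, object]:
    """Return ("id", int) for numeric input, ("uuid", canonical str) otherwise."""
    if id_or_uuid.isdigit():
        return ("id", int(id_or_uuid))
    try:
        # uuid.UUID both validates and normalizes to lowercase-hyphenated form.
        return ("uuid", str(uuid_lib.UUID(id_or_uuid)))
    except ValueError as exc:
        raise ValueError(f"not a valid id or uuid: {id_or_uuid!r}") from exc
```

The GET handler can then branch on the returned kind to build the appropriate filter (`Chart.id == pk` vs `Chart.uuid == pk`), which avoids a separate listing call just to translate a UUID into an id.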
SUMMARY
Extend API to handle uuid.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS
Plus, PUT and POST.

ADDITIONAL INFORMATION