[DOCS] volume change detection #10927
✅ Deploy Preview for niobium-lead-7998 ready!
Codecov Report
All modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found.
@@ Coverage Diff @@
## develop #10927 +/- ##
===========================================
- Coverage 80.84% 80.84% -0.01%
===========================================
Files 471 471
Lines 40790 40790
===========================================
- Hits 32976 32975 -1
- Misses 7814 7815 +1
…pectations into kml/1042/vcd
8. Click **Start monitoring** or **Finish**.
Note for reviewers: some UI copy was still in flux while I was working on this PR. I plan to double-check all references to UI copy in this PR closer to the release date.
## Next steps

- [Add an Expectation](/cloud/expectations/manage_expectations.md#add-an-expectation).
Note for reviewers: though not in the plan, I moved the CTA to add an Expectation to a new "Next steps" section because it seemed weird to have these as sequential steps:
- Click Start monitoring ...
- Add an Expectation ...
- You have connected GX Cloud to the relevant Data Source.

### Procedure
To add a Data Asset from an existing Data Source, complete the following steps:
Note for reviewers: though not in the plan, I changed the section intro and headers here because the old "Define the data you want GX Cloud to access." intro seemed too narrow in scope now, and the pre-reqs section seemed a bit redundant here.
@@ -34,6 +38,7 @@ import OverviewCard from '@site/src/components/OverviewCard';
<LinkCard topIcon label="Manage Data Assets" description="Create, edit, or delete a Data Asset." to="/cloud/data_assets/manage_data_assets" icon="/img/small_gx_logo.png" />
<LinkCard topIcon label="Manage Expectations" description="Create, edit, or delete an Expectation." to="/cloud/expectations/manage_expectations" icon="/img/small_gx_logo.png" />
<LinkCard topIcon label="Manage Validations" description="Run a Validation, or view the Validation run history." to="/cloud/validations/manage_validations" icon="/img/small_gx_logo.png" />
+<LinkCard topIcon label="Manage schedules" description="Use a schedule to automate data quality checks." to="/cloud/schedules/manage_schedules" icon="/img/small_gx_logo.png" />
Note for reviewers: in repurposing this page to be the landing page for the new "Introduction" section, I noticed that there were cards for all the top-level "manage" sections except for "manage schedules", so I went ahead and added it for completeness.
@@ -139,7 +139,7 @@ flexibility where column presence is more critical than their sequence.
``` | |||

:::tip Automate this rule
-When you [create a new Data Asset](/cloud/data_assets/manage_data_assets.md#add-a-data-asset-from-an-existing-data-source), you can choose to automatically generate this Expectation to test that columns don't diverge from the initial set over time.
+When you [create a new Data Asset with GX Cloud](/cloud/data_assets/manage_data_assets.md#add-a-data-asset-from-an-existing-data-source), you can choose to automatically generate this Expectation to test that columns don't diverge from the initial set over time.
Note for reviewers: though not in the plan, I updated this tip wording to make it clearer that this feature is a cloud-only value-add. This matches the wording of the new tip on the volume page.
7. Click **Add x Asset(s)**.
7. Decide which common data quality issues you want to start monitoring. By default, GX Cloud adds Expectations to detect **Schema** and **Volume** issues. You can de-select recommendations you’d like to opt out of.
Do we have any docs explaining more about each of these expectations? If so, I think linking directly to those would be good in this spot and for the other connect docs with this step.
Or maybe how Schema and Volume are defined, among the set of data quality issues we support? Could link out to the pages here: https://docs.greatexpectations.io/docs/reference/learn/data_quality_use_cases/dq_use_cases_lp
4. Decide if you want to **Generate Expectations that detect column changes in selected Data Assets**.
4. Click **Add x Asset(s)**.

5. Decide which common data quality issues you want to start monitoring. By default, GX Cloud adds Expectations to detect **Schema** and **Volume** issues. You can de-select recommendations you’d like to opt out of.
Same comment as above about linking to what these expectations mean. As a new reader/user, I would be a little confused and my inclination would be to look for more detailed definitions.
- You can generate basic rules as part of adding a new Data Asset.
- You can generate personalized AI-recommended rules for an existing Data Asset.
These two bullets feel disjointed to me compared to the rest of the language in the docs... might be better if they were between the first and second sentences in line 6? Not sure..
Can we be more specific about what "basic rules" mean? Assuming they refer to schema/volume expectations, can't those also be added to existing assets?
Can we use the term "standard" instead of "basic"?
> These two bullets feel disjointed to me compared to the rest of the language in the docs... might be better if they were between the first and second sentences in line 6? Not sure..

I'll rephrase to improve the flow.

> Can we be more specific about what "basic rules" mean?

I'll add a link to clarify.

> Assuming they refer to schema/volume expectations, can't those also be added to existing assets?

Yes, but that would be manual, not automatic. I'll adjust the wording to clarify that this is about the automatic helper that is available only at the time of asset creation, not manual Expectation creation, which is available at any time.

> Can we use the term "standard" instead of "basic"?

Yes.

## Monitoring common issues

When you [add a new Data Asset](/cloud/data_assets/manage_data_assets.md), GX Cloud by default automatically generates Expectations to test the following common data quality issues.
"by default automatically" seems redundant. I think you'd be fine with just "automatically".
Nit: I'd use a colon to introduce the list
> "by default automatically" seems redundant. I think you'd be fine with just "automatically".

I'll keep "by default" and drop "automatically" since folks can choose to opt out.
### Schema

To detect schema changes, we automatically generate a rule to **expect table columns to match set** using the Data Asset’s initial columns as the set to match. If the number or names of columns in the Data Asset change, this Expectation will fail.
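
For reviewers less familiar with this rule, here is a minimal sketch of the comparison it describes: the current columns are checked against the initial set, ignoring order. This is plain Python for illustration only, not the GX implementation; `columns_match_set` and the sample column names are hypothetical.

```python
def columns_match_set(current_columns, expected_columns):
    """Exact, order-insensitive comparison of column names (illustrative only)."""
    current = set(current_columns)
    expected = set(expected_columns)
    missing = sorted(expected - current)     # columns that disappeared or were renamed
    unexpected = sorted(current - expected)  # columns that were added or renamed
    success = not missing and not unexpected
    return success, {"missing": missing, "unexpected": unexpected}


# Hypothetical initial columns captured when the Data Asset was created.
initial = ["id", "amount", "created_at"]

# Same columns in a different order: the check passes.
ok, details = columns_match_set(["created_at", "id", "amount"], initial)

# One column renamed: the check fails and reports both sides of the rename.
renamed_ok, renamed_details = columns_match_set(["id", "total", "created_at"], initial)
```

A rename shows up as one missing and one unexpected column, which matches the doc's statement that a change in either the number or the names of columns causes a failure.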

### Volume
It would be great to link to these descriptions in the docs where I commented above.
@@ -46,7 +46,9 @@ There are a variety of GX Cloud features that support additional enhancements to

* **Data Asset profiling.** GX Cloud introspects your data schema by default on Data Asset creation, and also offers one-click fetching of additional descriptive metrics including column type and statistical summaries. Data profiling results are used to suggest parameters for Expectations that you create.

-* **Automate schema change detection.** GX Cloud can automatically generate Expectations that detect column changes. This option is available when [you create new Data Assets](/cloud/data_assets/manage_data_assets.md#add-a-data-asset-from-an-existing-data-source).
+* **Automate rules for common issues.** GX Cloud can automatically generate Expectations that detect column changes and non-increasing volume. This option is available when [you create new Data Assets](/cloud/data_assets/manage_data_assets.md#add-a-data-asset-from-an-existing-data-source).
Maybe say "table volume" here.
We currently don't use the phrase "table volume" anywhere else in the docs, so I'm reluctant to introduce it here. I'll change this to "data volume" (a phrase we already use elsewhere) to disambiguate from other potential interpretations of "volume".
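
To make "non-increasing volume" concrete for this discussion, here is a rough sketch that assumes it means the row count did not grow between two consecutive checks. The thread does not state how GX Cloud actually defines or computes this, so `volume_increased` and the sample counts are hypothetical.

```python
def volume_increased(previous_row_count, current_row_count):
    """Return True when the row count grew since the previous check (assumed definition)."""
    return current_row_count > previous_row_count


# Hypothetical row counts observed on successive validation runs.
history = [1000, 1250, 1250, 900]

# Compare each run with the one before it.
results = [volume_increased(prev, curr) for prev, curr in zip(history, history[1:])]
```

Under this assumption, both a flat count (1250 to 1250) and a drop (1250 to 900) count as "non-increasing", which is one reason the wording needs to be unambiguous.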
description: Generate data quality rules to more quickly achieve test coverage for your data.
---

With GX Cloud, you can automatically generate data quality rules to more quickly achieve test coverage for your data. This page provides an overview of options for automating data quality rules.
"This page provides..." feels a little formal. What about "GX provides options for generating standard rules and generating personalized AI-recommended rules across various parts of the application."
This phrasing is intentionally formal, as I want to start having pages explicitly state their learning objectives to help folks navigate the docs. This is a common pattern in technical documentation. Examples from other docs sites:
- Stripe - "This guide offers a few ways to understand your options: Use cases ... Types of recurring payments ... Stripe products ..."
- Soda - "As a step in the Get started roadmap, this guide offers instructions to schedule a Soda scan, run a scan, or invoke a scan programmatically."
- Monte Carlo - "This page outlines some networking basics for successfully connecting with Monte Carlo."
Resolves https://greatexpectations.atlassian.net/browse/DOC-1042 and https://greatexpectations.atlassian.net/browse/DOC-1043 according to the plan linked in those issues.
`invoke lint` (uses `ruff format` + `ruff check`)

For more information about contributing, visit our community resources.

After you submit your PR, keep the page open and monitor the statuses of the various checks made by our continuous integration process at the bottom of the page. Please fix any issues that come up and reach out on Slack if you need help. Thanks for contributing!