
[DOCS] volume change detection #10927

Open: wants to merge 22 commits into develop


klavavej (Contributor)

Resolves https://greatexpectations.atlassian.net/browse/DOC-1042 and https://greatexpectations.atlassian.net/browse/DOC-1043, according to the plan linked in those issues.

  • Description of PR changes above includes a link to an existing GitHub issue
  • PR title is prefixed with one of: [BUGFIX], [FEATURE], [DOCS], [MAINTENANCE], [CONTRIB]
  • Code is linted - run invoke lint (uses ruff format + ruff check)
  • Appropriate tests and docs have been updated

For more information about contributing, visit our community resources.

After you submit your PR, keep the page open and monitor the statuses of the various checks made by our continuous integration process at the bottom of the page. Please fix any issues that come up and reach out on Slack if you need help. Thanks for contributing!


netlify bot commented Feb 11, 2025

Deploy Preview for niobium-lead-7998 ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | e9e7835 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/niobium-lead-7998/deploys/67bcf932da68f10008b5908c |
| 😎 Deploy Preview | https://deploy-preview-10927.docs.greatexpectations.io |

codecov bot commented Feb 11, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.84%. Comparing base (d4dc22d) to head (e9e7835).

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop   #10927      +/-   ##
===========================================
- Coverage    80.84%   80.84%   -0.01%     
===========================================
  Files          471      471              
  Lines        40790    40790              
===========================================
- Hits         32976    32975       -1     
- Misses        7814     7815       +1     

| Flag | Coverage Δ |
|---|---|
| 3.10 | 70.22% <ø> (-0.01%) ⬇️ |
| 3.10 athena or openpyxl or pyarrow or project or sqlite or aws_creds | 56.55% <ø> (?) |
| 3.10 aws_deps | 46.51% <ø> (?) |
| 3.10 big | 54.96% <ø> (?) |
| 3.10 bigquery | 48.74% <ø> (?) |
| 3.10 clickhouse | 43.41% <ø> (?) |
| 3.10 databricks | 50.51% <ø> (?) |
| 3.10 filesystem | 63.01% <ø> (?) |
| 3.10 mssql | 51.51% <ø> (?) |
| 3.10 mysql | 51.89% <ø> (?) |
| 3.10 postgresql | 54.61% <ø> (?) |
| 3.10 snowflake | 51.26% <ø> (?) |
| 3.10 spark | 57.93% <ø> (?) |
| 3.10 spark_connect | 46.83% <ø> (?) |
| 3.10 trino | 52.43% <ø> (?) |
| 3.11 | 70.24% <ø> (+0.01%) ⬆️ |
| 3.11 athena or openpyxl or pyarrow or project or sqlite or aws_creds | 56.55% <ø> (?) |
| 3.11 aws_deps | 46.51% <ø> (?) |
| 3.11 big | 54.96% <ø> (?) |
| 3.11 bigquery | 48.74% <ø> (?) |
| 3.11 clickhouse | 43.41% <ø> (?) |
| 3.11 databricks | 50.51% <ø> (?) |
| 3.11 filesystem | 63.01% <ø> (?) |
| 3.11 mssql | 51.51% <ø> (?) |
| 3.11 mysql | 51.89% <ø> (?) |
| 3.11 postgresql | 54.61% <ø> (?) |
| 3.11 snowflake | 51.26% <ø> (?) |
| 3.11 spark | 57.93% <ø> (?) |
| 3.11 spark_connect | 46.83% <ø> (?) |
| 3.11 trino | 52.43% <ø> (?) |
| 3.12 | 70.24% <ø> (ø) |
| 3.12 athena or openpyxl or pyarrow or project or sqlite or aws_creds | 56.56% <ø> (ø) |
| 3.12 aws_deps | 46.51% <ø> (ø) |
| 3.12 big | 54.96% <ø> (ø) |
| 3.12 bigquery | 48.74% <ø> (ø) |
| 3.12 databricks | 50.51% <ø> (ø) |
| 3.12 filesystem | 63.01% <ø> (ø) |
| 3.12 mssql | 51.52% <ø> (ø) |
| 3.12 mysql | 51.89% <ø> (ø) |
| 3.12 postgresql | 54.61% <ø> (ø) |
| 3.12 snowflake | 51.26% <ø> (ø) |
| 3.12 spark | 57.93% <ø> (ø) |
| 3.12 spark_connect | 46.83% <ø> (ø) |
| 3.12 trino | 52.44% <ø> (ø) |
| 3.9 | 70.26% <ø> (+0.01%) ⬆️ |
| 3.9 athena or openpyxl or pyarrow or project or sqlite or aws_creds | 56.56% <ø> (ø) |
| 3.9 aws_deps | 46.53% <ø> (ø) |
| 3.9 big | 54.97% <ø> (ø) |
| 3.9 bigquery | 48.74% <ø> (ø) |
| 3.9 clickhouse | 43.43% <ø> (ø) |
| 3.9 databricks | 50.51% <ø> (ø) |
| 3.9 filesystem | 63.01% <ø> (ø) |
| 3.9 mssql | 51.50% <ø> (ø) |
| 3.9 mysql | 51.87% <ø> (ø) |
| 3.9 postgresql | 54.60% <ø> (ø) |
| 3.9 snowflake | 51.26% <ø> (ø) |
| 3.9 spark | 57.90% <ø> (ø) |
| 3.9 spark_connect | 46.84% <ø> (ø) |
| 3.9 trino | 52.42% <ø> (ø) |
| cloud | 0.00% <ø> (ø) |
| docs-basic | 54.03% <ø> (ø) |
| docs-creds-needed | 52.91% <ø> (ø) |
| docs-spark | 52.46% <ø> (ø) |

Flags with carried forward coverage won't be shown.


8. Click **Start monitoring** or **Finish**.

Contributor Author

Note for reviewers: some UI copy was still in flux while I was working on this PR. I plan to double-check all references to UI copy in this PR closer to the release date.

## Next steps

- [Add an Expectation](/cloud/expectations/manage_expectations.md#add-an-expectation).

Contributor Author

Note for reviewers: though not in the plan, I moved the CTA to add an Expectation to a new "Next steps" section because it seemed odd to have these sequential steps:

  1. Click Start monitoring ...
  2. Add an Expectation ...

- You have connected GX Cloud to the relevant Data Source.

### Procedure
To add a Data Asset from an existing Data Source, complete the following steps:

Contributor Author

Note for reviewers: though not in the plan, I changed the section intro and headers here because the old "Define the data you want GX Cloud to access." intro now seemed too narrow in scope, and the prerequisites section seemed a bit redundant here.

@@ -34,6 +38,7 @@ import OverviewCard from '@site/src/components/OverviewCard';
<LinkCard topIcon label="Manage Data Assets" description="Create, edit, or delete a Data Asset." to="/cloud/data_assets/manage_data_assets" icon="/img/small_gx_logo.png" />
<LinkCard topIcon label="Manage Expectations" description="Create, edit, or delete an Expectation." to="/cloud/expectations/manage_expectations" icon="/img/small_gx_logo.png" />
<LinkCard topIcon label="Manage Validations" description="Run a Validation, or view the Validation run history." to="/cloud/validations/manage_validations" icon="/img/small_gx_logo.png" />
<LinkCard topIcon label="Manage schedules" description="Use a schedule to automate data quality checks." to="/cloud/schedules/manage_schedules" icon="/img/small_gx_logo.png" />

Contributor Author

Note for reviewers: in repurposing this page as the landing page for the new "Introduction" section, I noticed that there were cards for all the top-level "manage" sections except "manage schedules", so I added it for completeness.

@@ -139,7 +139,7 @@ flexibility where column presence is more critical than their sequence.
```

:::tip Automate this rule
When you [create a new Data Asset](/cloud/data_assets/manage_data_assets.md#add-a-data-asset-from-an-existing-data-source), you can choose to automatically generate this Expectation to test that columns don't diverge from the initial set over time.
When you [create a new Data Asset with GX Cloud](/cloud/data_assets/manage_data_assets.md#add-a-data-asset-from-an-existing-data-source), you can choose to automatically generate this Expectation to test that columns don't diverge from the initial set over time.

Contributor Author

Note for reviewers: though not in the plan, I updated this tip's wording to make it clearer that this feature is a cloud-only value-add. This matches the wording of the new tip on the volume page.

klavavej marked this pull request as ready for review on February 12, 2025, 17:30.

7. Click **Add x Asset(s)**.
7. Decide which common data quality issues you want to start monitoring. By default, GX Cloud adds Expectations to detect **Schema** and **Volume** issues. You can de-select recommendations you’d like to opt out of.

Contributor

Do we have any docs explaining more about each of these expectations? If so, I think linking directly to those would be good in this spot and for the other connect docs with this step.

Contributor

Or maybe how Schema and Volume are defined, among the set of data quality issues we support? Could link out to the pages here: https://docs.greatexpectations.io/docs/reference/learn/data_quality_use_cases/dq_use_cases_lp

4. Decide if you want to **Generate Expectations that detect column changes in selected Data Assets**.
4. Click **Add x Asset(s)**.

5. Decide which common data quality issues you want to start monitoring. By default, GX Cloud adds Expectations to detect **Schema** and **Volume** issues. You can de-select recommendations you’d like to opt out of.

Contributor

Same comment as above about linking to what these expectations mean. As a new reader/user, I would be a little confused and my inclination would be to look for more detailed definitions.

Comment on lines 8 to 9
- You can generate basic rules as part of adding a new Data Asset.
- You can generate personalized AI-recommended rules for an existing Data Asset.

Contributor

These two bullets feel disjointed to me compared to the rest of the language in the docs... might be better if they were between the first and second sentences in line 6? Not sure..

Can we be more specific about what "basic rules" mean? Assuming they refer to schema/volume expectations, can't those also be added to existing assets?

Contributor

Can we use the term "standard" instead of "basic"?

Contributor Author

> These two bullets feel disjointed to me compared to the rest of the language in the docs... might be better if they were between the first and second sentences in line 6? Not sure..

I'll rephrase to improve the flow.

> Can we be more specific about what "basic rules" mean?

I'll add a link to clarify.

> Assuming they refer to schema/volume expectations, can't those also be added to existing assets?

Yes, but that would be manual, not automatic. I'll adjust the wording to clarify that this is about the automatic helper that is available only at the time of asset creation, not manual expectation creation, which is available at any time.

> Can we use the term "standard" instead of "basic"?

Yes.


## Monitoring common issues

When you [add a new Data Asset](/cloud/data_assets/manage_data_assets.md), GX Cloud by default automatically generates Expectations to test the following common data quality issues.

Contributor

"by default automatically" seems redundant. I think you'd be fine with just "automatically".

Nit: I'd use a colon to introduce the list

Contributor Author

> "by default automatically" seems redundant. I think you'd be fine with just "automatically".

I'll keep "by default" and drop "automatically", since folks can choose to opt out.

Comment on lines +20 to +24
### Schema

To detect schema changes, we automatically generate a rule to **expect table columns to match set** using the Data Asset’s initial columns as the set to match. If the number or names of columns in the Data Asset change, this Expectation will fail.
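The schema rule described above compares a table's current columns against the columns captured when the Data Asset was created. The Python sketch below illustrates that comparison with plain sets; it does not call the GX library, and the helper name `columns_match_set` is hypothetical. In GX Core the corresponding Expectation class is `ExpectTableColumnsToMatchSet`, whose actual parameters and result format may differ from this sketch.

```python
def columns_match_set(initial_columns: list[str], current_columns: list[str]) -> dict:
    """Compare a table's current columns to the set captured at Data Asset creation.

    Hypothetical helper that mirrors an exact-match column-set check;
    it is an illustration, not the GX implementation.
    """
    expected = set(initial_columns)
    observed = set(current_columns)
    missing = sorted(expected - observed)  # columns that were dropped or renamed
    unexpected = sorted(observed - expected)  # columns that were added or renamed
    return {
        "success": not missing and not unexpected,
        "missing_columns": missing,
        "unexpected_columns": unexpected,
    }


initial = ["id", "name", "created_at"]
print(columns_match_set(initial, ["created_at", "id", "name"])["success"])  # True
print(columns_match_set(initial, ["id", "name"])["missing_columns"])  # ['created_at']
```

Because the comparison uses sets, reordering columns still passes, while adding, removing, or renaming a column fails, which matches the "number or names of columns" condition above.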

### Volume

Contributor

It would be great to link to these descriptions in the docs where I commented above.

@@ -46,7 +46,9 @@ There are a variety of GX Cloud features that support additional enhancements to

* **Data Asset profiling.** GX Cloud introspects your data schema by default on Data Asset creation, and also offers one-click fetching of additional descriptive metrics including column type and statistical summaries. Data profiling results are used to suggest parameters for Expectations that you create.

* **Automate schema change detection.** GX Cloud can automatically generate Expectations that detect column changes. This option is available when [you create new Data Assets](/cloud/data_assets/manage_data_assets.md#add-a-data-asset-from-an-existing-data-source).
* **Automate rules for common issues.** GX Cloud can automatically generate Expectations that detect column changes and non-increasing volume. This option is available when [you create new Data Assets](/cloud/data_assets/manage_data_assets.md#add-a-data-asset-from-an-existing-data-source).
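To illustrate what "non-increasing volume" means, the hypothetical sketch below takes a history of row counts, one per validation run, and flags each run whose count did not exceed the previous run's. It is not GX Cloud's implementation; the helper name `find_non_increasing_runs` and the strict "must increase" threshold are assumptions made for illustration.

```python
def find_non_increasing_runs(row_counts: list[int]) -> list[int]:
    """Return the indices of runs whose row count did not increase.

    Hypothetical helper: run 0 has no predecessor, so it is never flagged.
    """
    return [
        i
        for i in range(1, len(row_counts))
        if row_counts[i] <= row_counts[i - 1]  # equal or lower counts are flagged
    ]


history = [100, 120, 120, 150, 140]
print(find_non_increasing_runs(history))  # [2, 4]
```

Run 2 is flagged because its count stayed flat, and run 4 because its count dropped; a real rule might instead compare against a window of past runs or allow a tolerance.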

Contributor

maybe say "table volume" here.

Contributor Author

We currently don't use the phrase "table volume" anywhere else in the docs, so I'm reluctant to introduce it here. I'll change this to "data volume" (a phrase we already use elsewhere) to disambiguate it from other possible interpretations of "volume".

description: Generate data quality rules to more quickly achieve test coverage for your data.
---

With GX Cloud, you can automatically generate data quality rules to more quickly achieve test coverage for your data. This page provides an overview of options for automating data quality rules.

Contributor

"This page provides..." feels a little formal. What about "GX provides options for generating standard rules and generating personalized AI-recommended rules across various parts of the application."

Contributor Author

This phrasing is intentionally formal: I want to start having pages explicitly state their learning objectives to help folks navigate the docs. This is a common pattern in technical documentation. Examples from other docs sites:

  • Stripe - "This guide offers a few ways to understand your options: Use cases ... Types of recurring payments ... Stripe products ..."
  • Soda - "As a step in the Get started roadmap, this guide offers instructions to schedule a Soda scan, run a scan, or invoke a scan programmatically."
  • Monte Carlo - "This page outlines some networking basics for successfully connecting with Monte Carlo."

3 participants