Add offlineGroupBy option for external source #1044
base: main
spark/src/test/scala/ai/chronon/spark/test/ExternalSourceBackfillTest.scala
We may also need to update:
This logic is used in join backfill DAGs, where each join part runs in parallel. Today, production runs use https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/JoinBase.scala#L477, which has been handled in this PR. For dev runs, we use join_backfill.py to orchestrate a fine-grained DAG, where each task runs,
Thanks @hzding621, I've made a change and added unit tests. Let me know if you think we should do a test dev run (if so, please share instructions for a dev run).
Summary
Add and validate an offline GroupBy option for external sources. This is the first part of the work to support backfill for external sources. This PR does:
Why / Goal
We want to introduce a first-level abstraction for ExternalSource to support offline backfill. The idea is that an ExternalSource can now take an optional GroupBy definition; during offline backfill, Chronon dispatches to that GroupBy to compute the backfill.
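To make the dispatch idea concrete, here is a minimal Python sketch. It does not use Chronon's real classes: `GroupBy`, `ExternalSource`, `JoinPart`, and `offline_join_parts` are hypothetical stand-ins, and the sketch assumes that an external source without an offline GroupBy is simply skipped during backfill, as before this change.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GroupBy:
    # Hypothetical stand-in for a GroupBy definition; only the name matters here.
    name: str


@dataclass
class ExternalSource:
    # Hypothetical stand-in for an external (online-only) source.
    name: str
    # Optional GroupBy used only for offline backfill.
    offline_group_by: Optional[GroupBy] = None


@dataclass
class JoinPart:
    group_by: GroupBy
    # Marks parts that were derived from an external source.
    from_external: bool = False


def offline_join_parts(join_parts: List[JoinPart],
                       external_sources: List[ExternalSource]) -> List[JoinPart]:
    """Return the join parts a backfill should compute.

    Regular join parts pass through unchanged. External sources that declare an
    offline GroupBy are dispatched to it; the others are skipped, since an
    external source has no offline data of its own.
    """
    parts = list(join_parts)
    for source in external_sources:
        if source.offline_group_by is not None:
            parts.append(JoinPart(group_by=source.offline_group_by, from_external=True))
    return parts
```

For example, a join with one regular part and two external sources, only one of which declares an offline GroupBy, yields two parts to backfill. Treating the offline GroupBy as just another join part means the existing per-join-part backfill path can run it in parallel with the others.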
Test Plan
========= Test configs =======
Test configs are at https://git.musta.ch/airbnb/ml_models/pull/27812
========= Export schema with External offline groupby =========
========= Backfill join with External offline groupby ===========
https://superset.a.musta.ch/sqllab/p/Oa8yamlbN8K/
======== Backfill join WITHOUT External offline groupby ========
https://superset.a.musta.ch/sqllab/p/gKx6rAQ2kxz/
Compare with the table backfilled by the official Chronon build
https://superset.a.musta.ch/sqllab/p/4RJ9e9zBwJA/
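The comparison above (presumably, a join without an offline GroupBy should backfill exactly as the official build does) amounts to a keyed row diff. A minimal Python sketch, assuming rows are loaded as dicts and that the hypothetical `key_cols` uniquely identify a row in each table; this is not how the linked queries were written.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, object]


def diff_tables(expected: Iterable[Row], actual: Iterable[Row],
                key_cols: Sequence[str]) -> Dict[str, List[Tuple]]:
    """Compare two tables keyed by key_cols.

    "missing": keys only in expected; "extra": keys only in actual;
    "mismatched": keys present in both whose rows differ.
    Assumes key_cols are unique within each table.
    """
    def index(rows: Iterable[Row]) -> Dict[Tuple, Row]:
        # Map each row's key tuple to the row itself.
        return {tuple(r[c] for c in key_cols): r for r in rows}

    exp, act = index(expected), index(actual)
    return {
        "missing": sorted(exp.keys() - act.keys()),
        "extra": sorted(act.keys() - exp.keys()),
        "mismatched": sorted(k for k in exp.keys() & act.keys() if exp[k] != act[k]),
    }
```

An empty result in all three buckets means the two backfills agree on every keyed row.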
========= Export schema WITHOUT External offline groupby =====
Check that the schema matches the schema exported from the official package
https://superset.a.musta.ch/sqllab/p/6B3NRO5A13R/
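The schema check reduces to comparing field names and types. A minimal Python sketch, assuming each exported schema is available as a list of (name, type) pairs; this representation and the `schema_diff` helper are hypothetical, not Chronon's export format.

```python
from typing import Dict, List, Sequence, Tuple

Field = Tuple[str, str]  # (column name, type name)


def schema_diff(expected: Sequence[Field], actual: Sequence[Field]) -> Dict[str, List]:
    """Report columns missing from actual, unexpected in actual, or with a changed type.

    Column order is ignored; only names and types are compared.
    """
    exp, act = dict(expected), dict(actual)
    return {
        "missing": sorted(exp.keys() - act.keys()),
        "extra": sorted(act.keys() - exp.keys()),
        "type_changed": sorted(
            (name, exp[name], act[name])
            for name in exp.keys() & act.keys()
            if exp[name] != act[name]
        ),
    }
```

Ignoring column order is a design choice here; if the exported schema's column order matters downstream, compare the lists directly instead.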