Feature request: Add global variables and change assertions behavior #1382

Open
gshilo opened this issue Oct 24, 2022 · 2 comments

Comments

@gshilo

gshilo commented Oct 24, 2022

Hello

I hope this is the right place to ask for features.

I would like to suggest two changes in Dataform:

  1. Add global variables. I know you can create a variable in dataform.json, but I mean a variable that can be created and assigned anywhere in the workflow and can be accessed, or even changed, in any SQLX file downstream.
    There is often a need for a centralized place, a single source of truth. We could use a database table for that, but in many cases it's overkill.

  2. Assertions behavior. I would expect assertions to filter out the bad records/events/items and send them to a special table/view, but let the good data continue, so that one bad record does not stop a million-record process.
    Instead, if I have even one bad record out of millions, the whole data pipeline stops and the good records do not get processed.
    Maybe it is a good idea to change this and let the process finish with only the good records.

Thank you

@BenBirt
Collaborator

BenBirt commented Oct 25, 2022

Thanks!

  1. Global variables: I think you can do this today, in a JavaScript includes file. See https://docs.dataform.co/guides/javascript/includes for examples (a minimal sketch follows below).
  2. We will consider adding options to assertions to allow for this. In the meantime, however, you can do this yourself: write a reusable JavaScript function which creates such a view and only "fails" if X% of the input data is "bad" (or whatever other logic you care about); see the second sketch below.
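
For point 1, a minimal sketch of the includes approach (the file name, constant names, and values below are illustrative, not anything already defined in your project):

```js
// includes/constants.js
// Anything exported here is available in SQLX files as ${constants.<name>}
// and in JS definition files as constants.<name> (the namespace is the file name).
const START_DATE = "2022-01-01";
const MAX_BAD_FRACTION = 0.01; // reused by the quality-check sketch below
module.exports = { START_DATE, MAX_BAD_FRACTION };
```

A downstream SQLX file can then filter on `${constants.START_DATE}`, and changing the include updates every file that references it. Note that includes are resolved when the project compiles, so this gives you a single source of truth rather than run-time mutable state.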
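
And for point 2, one possible shape for such a reusable function, assuming a BigQuery target; the function name, table names, and bad-row condition below are made up for illustration (only `publish()` and `assert()` are existing Dataform APIs here):

```js
// definitions/quality_checks.js
// Splits a source into "good" and "bad" views, and only fails the assertion
// when the share of bad rows crosses a threshold, so downstream models can
// keep reading the good rows.
function splitAndAssert(name, source, badRowCondition, maxBadFraction) {
  // View exposing only the rows that fail the check, for inspection.
  publish(`${name}_bad_records`, { type: "view" }).query(
    ctx => `select * from ${ctx.ref(source)} where ${badRowCondition}`
  );

  // View exposing only the rows that pass; downstream tables ref() this
  // instead of the raw source, so one bad row no longer blocks them.
  publish(`${name}_good_records`, { type: "view" }).query(
    ctx => `select * from ${ctx.ref(source)} where not (${badRowCondition})`
  );

  // Assertions fail when their query returns rows; this one returns a row
  // only if the bad fraction exceeds the threshold.
  assert(`${name}_bad_fraction_too_high`).query(
    ctx => `
      select bad_fraction
      from (
        select safe_divide(countif(${badRowCondition}), count(*)) as bad_fraction
        from ${ctx.ref(source)}
      )
      where bad_fraction > ${maxBadFraction}`
  );
}

// Example: treat rows with a missing ID as bad, tolerate up to 1% of them
// (the threshold comes from the includes sketch above).
splitAndAssert("events", "raw_events", "event_id is null", constants.MAX_BAD_FRACTION);
```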

@Ekrekr
Contributor

Ekrekr commented Apr 2, 2024

> if I have even one bad record from millions of records, the whole data pipeline stops and good records do not get processed

We're currently working on a feature for execution in GCP that should alleviate this. I'll try to keep this issue updated!
