Feature request: Add global variables and change assertions behavior #1382

Open
gshilo opened this issue Oct 24, 2022 · 2 comments

Comments

@gshilo

gshilo commented Oct 24, 2022

Hello

I hope this is the right place to ask for features.

I would like to suggest two changes in Dataform:

  1. Add global variables. I know you can create a variable in dataform.json, but I mean a variable that can be created and assigned anywhere in the workflow and can be accessed, or even changed, in any SQLX file downstream.
    There is often a need for a centralized place, a single source of truth. We could use a database table for that, but in many cases it's overkill.

  2. Assertions behavior. I would expect assertions to filter out the bad records/events/items and send them to a special table/view, but let the good data continue, so that one bad record does not stop a million-record process.
    Instead, if I have even one bad record out of millions, the whole data pipeline stops and the good records do not get processed.
    Maybe it is a good idea to change this and let the process finish with only the good records.

Thank you

@BenBirt
Collaborator

BenBirt commented Oct 25, 2022

Thanks!

  1. Global variables: I think you can do this today, in a JavaScript includes file. See https://docs.dataform.co/guides/javascript/includes for examples (a minimal sketch follows below).
  2. We will consider adding options to assertions to allow for this. In the meantime, however, you can do this yourself: write a reusable JavaScript function which creates such a view and only "fails" if X% of the input data is "bad" (or whatever other logic you care about); see the second sketch below.
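
For point 1, a minimal sketch of the includes approach (the file name, constant names, and values below are illustrative, not anything already defined in your project):

```js
// includes/constants.js
// Anything exported here is available in SQLX files as ${constants.<name>}
// and in JS definition files as constants.<name> (the namespace is the file name).
const START_DATE = "2022-01-01";
const MAX_BAD_FRACTION = 0.01; // reused by the quality-check sketch below
module.exports = { START_DATE, MAX_BAD_FRACTION };
```

A downstream SQLX file can then filter on `${constants.START_DATE}`, and changing the include updates every file that references it. Note that includes are resolved when the project compiles, so this gives you a single source of truth rather than run-time mutable state.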
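
And for point 2, one possible shape for such a reusable function, assuming a BigQuery target; the function name, table names, and bad-row condition below are made up for illustration (only `publish()` and `assert()` are existing Dataform APIs here):

```js
// definitions/quality_checks.js
// Splits a source into "good" and "bad" views, and only fails the assertion
// when the share of bad rows crosses a threshold, so downstream models can
// keep reading the good rows.
function splitAndAssert(name, source, badRowCondition, maxBadFraction) {
  // View exposing only the rows that fail the check, for inspection.
  publish(`${name}_bad_records`, { type: "view" }).query(
    ctx => `select * from ${ctx.ref(source)} where ${badRowCondition}`
  );

  // View exposing only the rows that pass; downstream tables ref() this
  // instead of the raw source, so one bad row no longer blocks them.
  publish(`${name}_good_records`, { type: "view" }).query(
    ctx => `select * from ${ctx.ref(source)} where not (${badRowCondition})`
  );

  // Assertions fail when their query returns rows; this one returns a row
  // only if the bad fraction exceeds the threshold.
  assert(`${name}_bad_fraction_too_high`).query(
    ctx => `
      select bad_fraction
      from (
        select safe_divide(countif(${badRowCondition}), count(*)) as bad_fraction
        from ${ctx.ref(source)}
      )
      where bad_fraction > ${maxBadFraction}`
  );
}

// Example: treat rows with a missing ID as bad, tolerate up to 1% of them
// (the threshold comes from the includes sketch above).
splitAndAssert("events", "raw_events", "event_id is null", constants.MAX_BAD_FRACTION);
```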

@Ekrekr
Contributor

Ekrekr commented Apr 2, 2024

> if I have even one bad record from millions of records, the whole data pipeline stops and good records do not get processed

We're currently working on a feature for execution in GCP that should alleviate this. I'll try to keep this issue updated!
