@kalekem kalekem commented Dec 28, 2022

Currently, when creating BigQuery tables from GCS files, if any of the GCS files contains a newline character or an empty row, the affected records are treated as bad and the BigQuery table cannot be created.

To have BigQuery accept these records, treating missing values as null and allowing values that contain newline characters when they are enclosed in quotation marks, we need to add the following two parameters, which are currently not included:

allow_jagged_rows = True

  • True: Rows with missing values are accepted, and the missing values are treated as null.
  • False: Rows with missing data are treated as bad records.
    

allow_quoted_newlines = True

  • True: A CSV value may contain a newline character when the value is enclosed in quotation marks.
  • False: A newline character, regardless of quotation marks, is always treated as the start of a new row.
    
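To make the two cases concrete, here is a minimal sketch using only Python's standard `csv` module, not BigQuery itself. The sample data and the `parse_rows` helper are hypothetical and only illustrate what a quoted newline and a jagged row look like in a file; padding with `None` is an assumption meant to mirror the "treated as null" behavior described above.

```python
import csv
import io

# Hypothetical sample: a header, one data row whose quoted value spans two
# lines, and one data row that is missing its trailing column.
SAMPLE = 'id,comment,owner\n1,"line one\nline two",ann\n2,short\n'


def parse_rows(text, width=3):
    """Parse CSV text, padding short (jagged) rows with None."""
    reader = csv.reader(io.StringIO(text, newline=""))
    rows = []
    for row in reader:
        # Pad missing trailing columns with None, similar in spirit to
        # treating missing values as null.
        rows.append(row + [None] * (width - len(row)))
    return rows


rows = parse_rows(SAMPLE)
for row in rows:
    print(row)
```

A parser that split on every newline regardless of quoting would instead break the first data row into two malformed records, which is the failure described above.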

Let me know if you need more information on this. I can send an email with screenshots from one of the pipelines we're currently running that uses this library.

Collaborator

@danielkulikov danielkulikov left a comment

Would there be use cases where we need these to be false? We probably want these to be variables coming from the config. We can add them to the Salesforce base config class; you can take a look at the examples in the base classes already present in the framework.
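
As a rough sketch of what config-driven flags could look like (the class name `CsvLoadConfig` and the helper `load_options` are hypothetical and are not the framework's actual base class or API), something along these lines would keep `True` as the default while still letting a pipeline opt out:

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CsvLoadConfig:
    """Hypothetical config holder, not the framework's real base class."""

    allow_jagged_rows: bool = True
    allow_quoted_newlines: bool = True


def load_options(config: CsvLoadConfig) -> dict:
    """Return the flags as keyword arguments for the table-creation call."""
    return asdict(config)


# Defaults accept jagged rows and quoted newlines.
default_options = load_options(CsvLoadConfig())

# A pipeline that wants strict row validation can override one flag.
strict_options = load_options(CsvLoadConfig(allow_jagged_rows=False))
```

The dictionary could then be unpacked into whichever call creates the table, assuming that call accepts parameters with these names.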
