Configurable max rows per streaming request #237
base: master
@@ -93,6 +93,18 @@ public class BigQuerySinkConfig extends AbstractConfig {
       "The interval, in seconds, in which to attempt to run GCS to BQ load jobs. Only relevant "
       + "if enableBatchLoad is configured.";

+  public static final String BQ_STREAMING_MAX_ROWS_PER_REQUEST_CONFIG = "bqStreamingMaxRowsPerRequest";
+  private static final ConfigDef.Type BQ_STREAMING_MAX_ROWS_PER_REQUEST_TYPE = ConfigDef.Type.INT;
+  private static final Integer BQ_STREAMING_MAX_ROWS_PER_REQUEST_DEFAULT = 50000;

Review comment: Let's keep the default behaviour the same. We can use -1 to mean this is disabled and make that the default.

+  private static final ConfigDef.Importance BQ_STREAMING_MAX_ROWS_PER_REQUEST_IMPORTANCE = ConfigDef.Importance.LOW;
+  private static final String BQ_STREAMING_MAX_ROWS_PER_REQUEST_DOC =

Review comment: The maximum number of rows to be sent in one batch in the request payload to BigQuery.

+      "Due to BQ streaming insert limitations, the max request size is 10MB. " +
+      "Hence, considering that on average one record takes at least 20 bytes, " +
+      "if we have big batches (e.g. 500000) we might need to send BigQuery multiple requests " +
+      "that return a `Request Too Large` error before finding the right size. " +
+      "This config allows starting from a lower value altogether, reducing the number of failed requests. " +
+      "Only works with the simple TableWriter (no GCS).";

Review comment: Let's add a validator as well, with minimum and maximum allowed values.

   public static final String GCS_BUCKET_NAME_CONFIG = "gcsBucketName";
   private static final ConfigDef.Type GCS_BUCKET_NAME_TYPE = ConfigDef.Type.STRING;
   private static final Object GCS_BUCKET_NAME_DEFAULT = "";
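The validator the reviewer asks for could be wired up with Kafka's built-in `ConfigDef.Range`. A minimal sketch, assuming hypothetical bounds of -1 (the "disabled" default suggested in review) and 500000 (an illustrative upper bound, not a value from the PR):

```java
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigException;

public class MaxRowsValidatorSketch {
  // Hypothetical bounds: -1 disables the cap (the default suggested in review);
  // 500000 is an assumed ceiling to keep batches well under BQ request limits.
  static final ConfigDef.Validator MAX_ROWS_VALIDATOR = ConfigDef.Range.between(-1, 500_000);

  public static void main(String[] args) {
    // In range: ensureValid returns normally.
    MAX_ROWS_VALIDATOR.ensureValid("bqStreamingMaxRowsPerRequest", 50_000);
    // Out of range: ensureValid throws a ConfigException.
    try {
      MAX_ROWS_VALIDATOR.ensureValid("bqStreamingMaxRowsPerRequest", -2);
    } catch (ConfigException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The validator would then be passed as the fourth argument to the `define(...)` overload that accepts a `ConfigDef.Validator`, alongside the type, default, importance, and doc string already in the PR.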
@@ -518,6 +530,12 @@ public static ConfigDef getConfig() {
         GCS_FOLDER_NAME_DEFAULT,
         GCS_FOLDER_NAME_IMPORTANCE,
         GCS_FOLDER_NAME_DOC
+    ).define(
+        BQ_STREAMING_MAX_ROWS_PER_REQUEST_CONFIG,
+        BQ_STREAMING_MAX_ROWS_PER_REQUEST_TYPE,
+        BQ_STREAMING_MAX_ROWS_PER_REQUEST_DEFAULT,
+        BQ_STREAMING_MAX_ROWS_PER_REQUEST_IMPORTANCE,
+        BQ_STREAMING_MAX_ROWS_PER_REQUEST_DOC
     ).define(
         PROJECT_CONFIG,
         PROJECT_TYPE,
Review comment: Can we rename this to maxRowsPerRequest?
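The doc string's rationale is that splitting batches up front avoids repeated `Request Too Large` retries. A minimal, self-contained sketch of how a table writer might apply the config to chunk rows (the helper name `split` and the -1 "disabled" convention follow the review's suggestion and are illustrative, not code from the PR):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitter {
  // Hypothetical helper: partition a batch of rows into chunks of at most
  // maxRowsPerRequest rows each. A value of -1 (the suggested default)
  // disables splitting and leaves the batch as a single request.
  static <T> List<List<T>> split(List<T> rows, int maxRowsPerRequest) {
    if (maxRowsPerRequest <= 0) {
      return List.of(rows); // splitting disabled: one request
    }
    List<List<T>> chunks = new ArrayList<>();
    for (int i = 0; i < rows.size(); i += maxRowsPerRequest) {
      chunks.add(rows.subList(i, Math.min(i + maxRowsPerRequest, rows.size())));
    }
    return chunks;
  }

  public static void main(String[] args) {
    List<Integer> rows = new ArrayList<>();
    for (int i = 0; i < 7; i++) {
      rows.add(i);
    }
    System.out.println(split(rows, 3).size());  // 7 rows in chunks of 3 -> 3 requests
    System.out.println(split(rows, -1).size()); // splitting disabled -> 1 request
  }
}
```

Each chunk would then be sent as its own streaming insert, so an oversized source batch never has to be discovered via a failed request.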