Load data into BQ #1045
base: main
Conversation
faucomte97 left a comment
@faucomte97 reviewed 18 of 18 files at r1, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @SKairinos)
.gcloud/functions/load_data_into_bigquery/utils/logging.py line 39 at r1 (raw file):
log_obj.update(context) # If the log call passed extra={"foo": "bar"}, add that too
Is this comment up to date?
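For context on what that comment describes, here is a minimal sketch of how keys passed via extra={"foo": "bar"} end up as LogRecord attributes and can be merged into a structured log object. The JsonFormatter class name and field choices are assumptions for illustration, not the code under review:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Sketch: merge any extra={...} kwargs into the emitted JSON log object."""

    # Attributes present on every LogRecord; anything else came from extra=.
    _STANDARD = set(vars(logging.makeLogRecord({})))

    def format(self, record: logging.LogRecord) -> str:
        log_obj = {"severity": record.levelname, "message": record.getMessage()}
        # Keys passed as extra={"foo": "bar"} become attributes on the record;
        # copy them into the log object, mirroring log_obj.update(context).
        context = {
            k: v for k, v in vars(record).items() if k not in self._STANDARD
        }
        log_obj.update(context)
        return json.dumps(log_obj)
```

With this, logger.info("hello", extra={"foo": "bar"}) emits a JSON object containing both the message and the "foo" key.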
.gcloud/functions/load_data_into_bigquery/utils/storage.py line 70 at r1 (raw file):
@processed_status.setter
def processed_status(self, value: _ProcessedStatus):
    """Moves the blob to the failed subdirectory for manual inspection."""
Update this docstring to reflect new approach
.gcloud/functions/load_data_into_bigquery/utils/chunk.py line 40 at r1 (raw file):
timestamp: datetime  # when the data export began
obj_i_start: int  # object index span start
obj_i_end: int  # object index span end
I would say these comments aren't needed here, as they've all been explained in the docstring already.
If you'd rather keep them, that's fine too; I'm not strongly for or against either.
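To illustrate the docstring-only style being suggested, here is a sketch of the fields documented in an Attributes section instead of inline comments. The Chunk class name, the frozen dataclass style, and the str stand-in for the write-mode enum are assumptions; only the field names come from the quoted snippet:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Chunk:
    """One chunk of an exported table (illustrative sketch).

    Attributes:
        bq_table_name: Name of the BigQuery table.
        bq_table_write_mode: Write mode for the BigQuery table.
        timestamp: When the data export began.
        obj_i_start: Object index span start.
        obj_i_end: Object index span end.
    """

    bq_table_name: str
    bq_table_write_mode: str  # the reviewed code uses a _BqTableWriteMode enum
    timestamp: datetime
    obj_i_start: int
    obj_i_end: int
```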
Code quote:
bq_table_name: str # name of BigQuery table
bq_table_write_mode: _BqTableWriteMode # write mode for BigQuery table
timestamp: datetime # when the data export began
obj_i_start: int # object index span start
obj_i_end: int # object index span end

.gcloud/functions/load_data_into_bigquery/utils/chunk.py line 111 at r1 (raw file):
)
# "2025-01-01_00:00:00__1_1000"
file_name = file_name.removesuffix(file_name_suffix)
Do this before doing any of the splitting.
It's marginal but will save on some processing if the file is the wrong format (which we can check without needing to do any splitting).
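Concretely, the suggested reordering might look like the sketch below: validate the ".csv" suffix up front, so a malformed name is rejected before any splitting happens. The parse_chunk_file_name name, the timestamp format, and raising ValueError in place of the handle_error helper are assumptions, since the full function isn't quoted here:

```python
from datetime import datetime


def parse_chunk_file_name(file_name: str) -> tuple[datetime, int, int]:
    """Parse "<timestamp>__<start>_<end>.csv" into its components (sketch)."""
    file_name_suffix = ".csv"
    # Cheap format check first: no point splitting a name we will reject anyway.
    if not file_name.endswith(file_name_suffix):
        raise ValueError(f'File name should end with "{file_name_suffix}".')

    # "2025-01-01_00:00:00__1_1000"
    file_name = file_name.removesuffix(file_name_suffix)
    timestamp_str, span = file_name.split("__")
    obj_i_start, obj_i_end = (int(i) for i in span.split("_"))
    timestamp = datetime.strptime(timestamp_str, "%Y-%m-%d_%H:%M:%S")
    return timestamp, obj_i_start, obj_i_end
```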
Code quote:
file_name_suffix = ".csv"
if not file_name.endswith(file_name_suffix):
return handle_error(
f'File name should end with "{file_name_suffix}".'
)
# "2025-01-01_00:00:00__1_1000"
file_name = file_name.removesuffix(file_name_suffix)

.gcloud/functions/load_data_into_bigquery/utils/bigquery.py line 37 at r1 (raw file):
Returns: A flag designating whether the flag was successfully processed. False
whether the blob* was successfully processed
.gcloud/functions/load_data_into_bigquery/utils/bigquery.py line 39 at r1 (raw file):
A flag designating whether the flag was successfully processed. False will be returned if a known error occurred which makes it impossible to load the data (e.g. the BQ table does not exist) to avoid pointlessly
pointless*