⚡️ Speed up method BigQueryUtils.dataset_exists by 14% in src/Connectors/gcp_bq_queries.py
#9
📄 BigQueryUtils.dataset_exists() in src/Connectors/gcp_bq_queries.py
📈 Performance improved by 14% (0.14x faster)
⏱️ Runtime went down from 335 microseconds to 293 microseconds
Explanation and details
To optimize the given Python program for better runtime performance, we can consider the following.
Minimize API calls: The primary operation in the dataset_exists method is a single get_dataset call that checks whether the dataset exists, which is already efficient. The remaining cost is object initialization, so the client should be created once and reused rather than re-initialized on every check when the method is called multiple times.
Thread safety: Ensure that the BigQuery client initialization does not interfere with other threads if used in a multithreaded environment (although this is not directly shown in your code).
Remove unnecessary print statements: While they're useful for debugging, removing them can speed up execution slightly and is generally good practice for production code.
Here's the optimized code based on the aforementioned suggestions.
Key changes.
_client and _bqstorage_client are held as class-level attributes to avoid re-initialization.
This revised code keeps the function signatures and return values intact while focusing on minimal optimizations.
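The optimized code itself is not reproduced here, so the following is only a minimal sketch of the class-level caching idea under stated assumptions, not the actual change in this PR. FakeClient and the local NotFound class are hypothetical stand-ins for google.cloud.bigquery.Client and google.api_core.exceptions.NotFound so that the snippet runs without GCP credentials. The _get_client helper, the _lock attribute, and the dataset_exists(dataset_id) signature are also assumptions made for illustration.

```python
import threading


class NotFound(Exception):
    """Stand-in for google.api_core.exceptions.NotFound (assumption)."""


class FakeClient:
    """Hypothetical stand-in for google.cloud.bigquery.Client."""

    instances = 0  # counts how many clients were constructed

    def __init__(self):
        FakeClient.instances += 1
        self._datasets = {"analytics"}

    def get_dataset(self, dataset_id):
        # Mirrors the real client's behaviour of raising when a dataset is missing.
        if dataset_id not in self._datasets:
            raise NotFound(dataset_id)
        return dataset_id


class BigQueryUtils:
    _client = None             # class-level: shared by every instance
    _lock = threading.Lock()   # guards first-time initialization (assumption)

    @classmethod
    def _get_client(cls):
        # Double-checked locking: take the lock only while no client exists yet.
        if cls._client is None:
            with cls._lock:
                if cls._client is None:
                    cls._client = FakeClient()
        return cls._client

    def dataset_exists(self, dataset_id):
        # One lookup per check; no print statements on the hot path.
        try:
            self._get_client().get_dataset(dataset_id)
            return True
        except NotFound:
            return False


utils_a, utils_b = BigQueryUtils(), BigQueryUtils()
found = utils_a.dataset_exists("analytics")
missing = utils_b.dataset_exists("no_such_dataset")
```

Both BigQueryUtils instances share a single client, so the construction cost is paid once however many existence checks follow. The lock only matters on first use, which keeps later calls cheap.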
Correctness verification
The new optimized code was tested for correctness. The results are listed below.
🔘 (none found) − ⚙️ Existing Unit Tests
✅ 16 Passed − 🌀 Generated Regression Tests
🔘 (none found) − ⏪ Replay Tests