Commit 2b611eb

IR-258-readme-update (#186)
* Update README.md to describe local usage of the IR-258-cloudshell feature branch
* Update dependencies
* Miscellaneous fixes due to changing ruff rules
1 parent 452144a commit 2b611eb

7 files changed: +378 -387 lines changed
Pipfile.lock

Lines changed: 367 additions & 380 deletions (generated file; diff not rendered)

README.md

Lines changed: 6 additions & 2 deletions
@@ -242,9 +242,13 @@ LAMBDA_MAX_CONCURRENCY=### Maximum number of parallel workers for CLI bulk valid
 ```
 
 ## Technical Limitations of the application
-Given that this application leverages AWS Lambda services, there are limitations to which bags it can successfully process. Lambdas have a 15 minute execution time limit which could cause issues with larger bags or bags with many files. Files larger than 5 GB must have checksums calculated in a more time-consuming operation, meaning that a bag with many 5 GB+ files may be more likely experience timeout issues than a similarly sized bag with smaller files.
+Given that this application leverages AWS Lambda services, there are limitations to which bags it can successfully process. Lambdas have a 15-minute execution time limit, which could cause issues with larger bags or bags with many files. Files larger than 5 GB must have checksums calculated in a more time-consuming operation, meaning that a bag with many 5+ GB files may be more likely to experience timeout issues than a similarly sized bag with smaller files.
 
-In practice, the application has successfully processed a 97.7 GB bag and a 8,758 file bag in S3. It failed to process a bag with over 50,000 files though and we are investigating how we could handle bags with this many files or more.
+In practice, the application has successfully processed a 97.7 GB bag and an 8,758-file bag in S3. However, it failed to process a bag with over 50,000 files, and we are investigating how we could handle bags with this many files or more.
+
+For bags that fail Lambda validation, users can run the `IR-258-cloudshell` feature branch locally. This branch calls the validation functionality directly from the CLI rather than through a Lambda and allows for much longer-running validations. This feature branch successfully validated the bag with over 50,000 files as well as 200+ GB bags. It has run upwards of 6 hours for bags with many 5+ GB files that required the more time-consuming checksum calculation.
+
+**NOTE** `IR-258-cloudshell` was created under a tight deadline to validate a few edge cases, so it only supports the `validate` command for individual bags. Further testing and optimization are needed for this to be a sustainable and maintainable solution.
 
 ## Related Assets
 
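The README text in this hunk says that files over 5 GB need a more time-consuming checksum calculation. The repository's actual approach is not visible in this diff. As a generic illustration only, the sketch below computes a SHA-256 digest by reading a file-like object in fixed-size chunks, which shows why the cost grows with object size: every byte has to be read and hashed. The function name `sha256_streaming` and the chunk size are assumptions, not names from the codebase.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_streaming(fileobj: BinaryIO, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file-like object, read in chunks.

    Hypothetical helper for illustration; not the application's implementation.
    """
    digest = hashlib.sha256()
    # iter() with a sentinel keeps calling read() until it returns b"" (EOF),
    # so memory use stays bounded by chunk_size regardless of total size.
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    payload = b"example bag payload" * 1000
    print(sha256_streaming(io.BytesIO(payload), chunk_size=4096))
```

The chunked result matches a one-shot `hashlib.sha256(data).hexdigest()` over the same bytes; only the memory footprint differs.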

lambdas/utils/aws/s3.py

Lines changed: 1 addition & 1 deletion
@@ -125,7 +125,7 @@ def get_object_checksum(
         cls,
         s3_uri: str,
         size: int | None,
-        has_sha256_checksum: bool | None = None,
+        has_sha256_checksum: bool | None = None,  # noqa: FBT001
     ) -> str:
         """Get SHA256 checksum for an S3 object.
 
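For context on the `# noqa: FBT001` added in this hunk: ruff's FBT001 rule flags boolean-typed parameters that can be passed positionally, because a bare `True` or `False` at the call site is hard to read. The commit suppresses the warning rather than changing the signature. The sketch below contrasts the flagged shape with the keyword-only alternative that would satisfy the rule; both functions and the `_source` helper are hypothetical and are not taken from `lambdas/utils/aws/s3.py`.

```python
from typing import Optional


def _source(has_sha256_checksum: Optional[bool]) -> str:
    """Hypothetical helper: describe where a checksum would come from."""
    if has_sha256_checksum is None:
        return "unknown"
    return "stored" if has_sha256_checksum else "computed"


# Shape that FBT001 reports: the boolean can be passed positionally,
# so a call like checksum_source_positional(10, True) is allowed.
def checksum_source_positional(
    size: Optional[int],
    has_sha256_checksum: Optional[bool] = None,  # noqa: FBT001
) -> str:
    return _source(has_sha256_checksum)


# Alternative: the bare `*` makes the flag keyword-only, so callers must
# write has_sha256_checksum=True and no suppression comment is needed.
def checksum_source_keyword_only(
    size: Optional[int],
    *,
    has_sha256_checksum: Optional[bool] = None,
) -> str:
    return _source(has_sha256_checksum)
```

Switching to the keyword-only form changes the call signature, so suppressing the rule, as this commit does, is the lower-risk choice when existing callers pass the flag positionally.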

lambdas/utils/aws/s3_inventory.py

Lines changed: 1 addition & 1 deletion
@@ -194,7 +194,7 @@ def get_aips_df(self) -> pd.DataFrame:
         aip_regex = (
             """(.+?([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}))/(.*)"""
         )
-        # ruff: noqa: E501, UP032
+        # ruff: noqa: UP032
         query = """
             -- CTE of all inventory data rows
             with cdps_aip_inventory as (
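As a reading aid for the `aip_regex` pattern shown in this hunk, here is a minimal sketch of what its three capture groups pick out. The hunk does not show how the query consumes the pattern, so this uses Python's `re` module purely for illustration, and the sample key and the `split_aip_key` helper are hypothetical.

```python
import re
from typing import Optional, Tuple

# Same pattern as aip_regex in the hunk above.
AIP_REGEX = re.compile(
    r"(.+?([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}))/(.*)"
)


def split_aip_key(key: str) -> Optional[Tuple[str, str, str]]:
    """Split a key into (AIP prefix ending in the UUID, UUID, remaining path).

    Returns None when the key contains no lowercase UUID followed by '/'.
    """
    match = AIP_REGEX.match(key)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


if __name__ == "__main__":
    # Hypothetical object key, made up for this example.
    sample = "9f/3c/MyAIP-0f8fad5b-d9cb-469f-a165-70867728950e/data/objects/file.txt"
    print(split_aip_key(sample))
```

The lazy `.+?` requires at least one character before the UUID, so group 1 is the shortest prefix that ends in a UUID immediately followed by a slash.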

tests/conftest.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# ruff: noqa: PD901, SIM117
+# ruff: noqa: SIM117
 
 from unittest.mock import MagicMock, patch
 

tests/test_aip.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# ruff: noqa: PLR2004, SLF001, PD901, SIM117, ARG002, E501
+# ruff: noqa: PLR2004, SLF001, SIM117, ARG002, E501
 
 import json
 import os

tests/test_s3_inventory.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# ruff: noqa: PD901, DTZ001, PLR2004, SLF001, ARG002, BLE001
+# ruff: noqa: DTZ001, PLR2004, SLF001, ARG002, BLE001
 
 import concurrent.futures
 import datetime
