Releases: awslabs/analytics-accelerator-s3
v1.3.0
What's Changed
- Cleanup README by @sullis in #310
- Append AAL user agent to customer user agent by @rajdchak in #311
- [minor] Adding more info on optimizations for our readme by @stubz151 in #295
- Upgrade to freefair lombok version 8.14 by @sullis in #313
- Add Retries and Retry Policy by @fuatbasik in #307
- Update README to reflect current integration status by @SanjayMarreddi in #312
- Add javadocs and host them in Github.io by @fuatbasik in #276
- Upgrade SDK version and fix user-agent changes to avoid prepending empty string by @fuatbasik in #319
- Improving Retries based on Iceberg Integration Feedback by @fuatbasik in #321
- Upgrade to junit 5.13.4 by @sullis in #314
- Sequential optimization README by @vaibhav5140 in #254
- Microbenchmarks to better reflect a Spark workload. by @ahmarsuhail in #334
- Fix for ITests failures due to new env variables. by @ahmarsuhail in #336
- Calls release() method on any failure. by @ahmarsuhail in #337
- New PhysicalIO implementation by @ozkoca in #325
- Micro benchmark changes by @ahmarsuhail in #339
- Adapt retries to new PhysicalIO by @fuatbasik in #340
- Added name for thread pool by @ozkoca in #342
- Add support for Java sync client. by @ahmarsuhail in #341
- [feat]:add ttl to metadata by @vaibhav5140 in #338
- Close inputStream. by @ahmarsuhail in #345
- Introduce max read bytes config for sequential read by @ozkoca in #348
- fix: update to push snapshot to maven local for iceberg build by @stubz151 in #350
- fix: using snapshot version for iceberg build by @stubz151 in #351
- Release Version 1.3.0 by @ozkoca in #352
New Contributors
Full Changelog: v1.2.1...v1.3.0
v1.2.1
What's Changed
- Build Iceberg with Spark 3.5 by @SanjayMarreddi in #303
- Generate encrypted objects for benchmark by @rajdchak in #296
- Refactor ITests. by @ahmarsuhail in #297
- IoStats-CallBack-AAL by @vaibhav5140 in #298
- fix: shading dependency on google error prone by @stubz151 in #305
- Release Version 1.2.1 by @vaibhav5140 in #308
Full Changelog: v1.2.0...v1.2.1
v1.2.0
What's Changed
- Updated release.yml to fix signing issue by @rajdchak in #269
- Move small object prefetching to physicalIO by @vaibhav5140 in #258
- Fix flaky unit tests on BlobStore by @fuatbasik in #273
- Introduces common executor pool. by @ahmarsuhail in #275
- Run integration tests on self hosted runner by @rajdchak in #277
- Modify gradle setup step to reduce time by @rajdchak in #278
- Fix bug in small object prefetching by @rajdchak in #279
- Implements readVectored() in AAL by @ahmarsuhail in #270
- Migration to Central Portal for Maven by @SanjayMarreddi in #282
- Pass down openstreaminfo instead of streamcontext by @rajdchak in #283
- auditing - initial changes by @ahmarsuhail in #280
- SSE_C changes by @rajdchak in #281
- Update-integration-tests-workflow by @vaibhav5140 in #291
- Adds test cases for readVectored() by @ahmarsuhail in #284
- Add support for readFully at the S3SeekableInputStream level by @SanjayMarreddi in #293
- Replace DAT with AAL by @ozkoca in #300
- chore: increasing version number to 1.2.0 by @stubz151 in #299
Full Changelog: v1.1.0...v1.2.0
v1.1.0
Version 1.1.0 adds a memory manager, new metrics, and read optimisations for sequential file formats, along with the other improvements and fixes listed below.
v1.1.0 (May 09, 2025)
- feat: Memory Manager #251
- feat: Added new metrics like memory usage and cache hit/miss #257
- feat: Read optimisations for sequential file formats #238
- Improved integration test documentation #260
- Added config to use format-specific LogicalIO implementations #259
- Reduced waiting time and retry on GrayTest #256
- fix: Failing ref tests #255
- fix: Setting log path for telemetry #252
- Added some debug logs #250
- Reduced default block read timeout to 30 seconds #249
- Enabled Iceberg unit-tests #245
v1.0.0
Version 1.0.0 adds retry logic for block reads in the prefetch path and reduces the logging level to debug. It also removes the previous recommendation to restrict usage to non-production traffic. Although this is a major version update, the release contains no breaking changes.
v1.0.0 (March 04, 2025)
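To illustrate the idea behind retrying block reads, here is a minimal Python sketch of a bounded retry loop. It is not the library's implementation: `read_block_with_retry`, `fetch_block`, and `max_attempts` are hypothetical names chosen for this example, and the real retry policy, timeouts, and exception types may differ.

```python
def read_block_with_retry(fetch_block, max_attempts=3):
    """Call fetch_block() until it succeeds or max_attempts is exhausted.

    fetch_block is a hypothetical zero-argument callable that returns the
    block's bytes, or raises OSError (TimeoutError is a subclass) on failure.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return fetch_block()
        except OSError as err:
            # Remember the failure and try again on the next iteration.
            last_error = err
    # Every attempt failed: surface the last error as the cause.
    raise OSError(f"block read failed after {max_attempts} attempts") from last_error
```

A caller would wrap the actual range request in `fetch_block`, so the retry loop stays independent of the client being used.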
v0.0.4
Version 0.0.4 adds retry logic for SDK requests and a new constructor for passing in known stream information.
What's Changed
- Close input stream explicitly #222
- Timeout retry stuck sdk client #219
- Adds in constructor for open stream information #223
Full Changelog: v0.0.3...v0.0.4
v.0.0.3
Version 0.0.3 adds audit headers, improved exception handling, and etag support.
What's Changed
- Fix Len = 0 and Insufficient Buffer behaviours for positioned reads by @fuatbasik in #203
- fix: fixing jmh local build by @stubz151 in #205
- Support audit headers in request by @rajdchak in #204
- Add ability to dump configs by @CsengerG in #206
- Add JMH JAR generation to CICD by @CsengerG in #207
- Migrate to new Iceberg staging branch by @CsengerG in #208
- Improve the exception handling of the S3SdkObjectClient by @SanjayMarreddi in #210
- Improve the unit and integration tests by @SanjayMarreddi in #211
- feat: adding etag checking for stream reads by @stubz151 in #209
- fix: updating integ test to check at correct point by @stubz151 in #213
- add gray failure tests and FaultyS3Client by @fuatbasik in #214
- Update Version Number to 0.0.3 by @SanjayMarreddi in #215
New Contributors
- @stubz151 made their first contribution in #205
- @rajdchak made their first contribution in #204
- @SanjayMarreddi made their first contribution in #210
Full Changelog: v0.0.2...v.0.0.3
v0.0.2
Version 0.0.2 improves performance by prefetching dictionaries and column data separately for Parquet objects and by splitting the footer fetch into two requests. S3SeekableStream now lets consumers seek beyond the end of the stream.
What's Changed
- Fixed a typo in the README by @oleg-lvovitch-aws in #180
- Remove unnecessary Maven publish step by @CsengerG in #182
- Move both Iceberg and S3A CICD to snapshot builds by @CsengerG in #186
- Split footer requests into two by @ahmarsuhail in #188
- Prefetch dictionaries and column data separately by @ahmarsuhail in #189
- Addresses review comments by @ahmarsuhail in #190
- Add support to seek beyond end of stream by @fuatbasik in #192
Full Changelog: v0.0.1...v0.0.2
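As a rough illustration of what seeking beyond the end of a stream can mean, the Python sketch below models a seekable stream that accepts any non-negative position and reports end-of-stream on the next read. The class `SeekableStream` and its methods are hypothetical and written only for this example; the behaviour of the library's own stream on a read past the end is an assumption here.

```python
class SeekableStream:
    """Toy in-memory stream that permits seeking past the end of its data."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def seek(self, pos: int) -> None:
        if pos < 0:
            raise ValueError("position must be non-negative")
        # Assumption: positions beyond len(data) are accepted without error.
        self._pos = pos

    def read(self, n: int) -> bytes:
        # At or past the end, return empty bytes to signal end-of-stream.
        if self._pos >= len(self._data):
            return b""
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk
```

With this model, `seek(100)` on a five-byte stream succeeds, and the following `read` returns an empty result instead of raising.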
v0.0.1
Alpha release of Analytics Accelerator Library for Amazon S3, an open source library that accelerates data access to S3 for client applications, lowering processing times and compute costs for your data analytics workloads. See README for further details.
Note: This is an Alpha release and should not be used in production.
Please see GitHub for known issues.
What's Changed
- Initial commit by @CsengerG in #1
- Add simple CI building with ./gradlew build by @CsengerG in #2
- Add code coverage verification to builds by @CsengerG in #3
- Build: Streamlined dependency and plugin management by @oleg-lvovitch-aws in #4
- Add spotless checks by @CsengerG in #13
- Initial implementation of object client. by @ahmarsuhail in #14
- Implement very first version of seekable stream by @CsengerG in #15
- Eliminate open-range GET requests by @CsengerG in #16
- Adds in jar task by @ahmarsuhail in #17
- Implement first version of JMH microbenchmarks by @CsengerG in #18
- Implements read(buf[], offset, len) by @ahmarsuhail in #19
- Implement readTail() by @CsengerG in #20
- Make client singleton, add a sensible readAhead. by @ahmarsuhail in #21
- Fixes off by one error. by @ahmarsuhail in #23
- Start of the new configuration approach, with opportunistic changes required to support it by @oleg-lvovitch-aws in #25
- New `common` module by @oleg-lvovitch-aws in #26
- Factor implementation into Logical and Physical IO layers by @CsengerG in #27
- Github action to build and upload s3 seekable stream jars to s3 bucket by @radhisat in #24
- Fixing the command to copy jars to s3 in GitHub actions by @radhisat in #28
- Extend GitHub workflow to upload JARs for S3FileIO by @CsengerG in #31
- Fix copy-paste typo in build-upload script by @CsengerG in #33
- Implements initial parquet parsing. by @ahmarsuhail in #32
- Fix reference tests by @CsengerG in #30
- Simplify local development of integrations by @CsengerG in #29
- Adds in initial logic for build column maps. by @ahmarsuhail in #37
- [Preview] Implementing footer caching with introducing single cache option. by @IsaevIlya in #35
- Prefetching fixes and improvements by @radhisat in #41
- Add property based testing by @CsengerG in #34
- Moves to byte array instead of ByteBuffer. by @ahmarsuhail in #44
- This change fixes a bug in creating a prefetch block by @radhisat in #43
- Adding new logic to send user-agent in S3 Requests by @fuatbasik in #42
- Extending unit test coverage and fixing race condition by @IsaevIlya in #45
- Parquet aware prefetching. by @ahmarsuhail in #39
- Prefetch recent columns. by @ahmarsuhail in #46
- Add extra logs and throw Exception when seeing wrong range. by @IsaevIlya in #47
- Replace sleep call with waiting on task for caching data by @IsaevIlya in #49
- Handles multi row group parquet files by @ahmarsuhail in #48
- Adds in prefetch details to referrer header. by @ahmarsuhail in #50
- Update instructions on how to run micro-benchmarks by @CsengerG in #51
- Supporting nested parquet schema by @radhisat in #52
- Use S3CrtAsyncClient to prevent writing time out by @IsaevIlya in #54
- Shades parquet-format dependency. by @ahmarsuhail in #55
- Optimize block reading by @IsaevIlya in #53
- Updates logs to the right levels. by @ahmarsuhail in #57
- Adds some debug logs. by @ahmarsuhail in #59
- Fixing reference tests by @radhisat in #60
- [Refactor] Move Parquet awareness out of the PhysicalIO layer by @CsengerG in #56
- Adds in minimum confidence ratio to prevent over reading. by @ahmarsuhail in #61
- Parquet metadata parsing improvements by @radhisat in #62
- Prevent overreading per schema. by @ahmarsuhail in #63
- Reverting parquet metadata improvements by @radhisat in #64
- Implement sequential prefetching in PhysicalIO by @CsengerG in #65
- [Fix] Resolve dependency conflict in uber JAR by @CsengerG in #67
- Update part size to 8MB by @ahmarsuhail in #68
- Prefetch lengths were asking for one extra byte by @ahmarsuhail in #69
- Add Configuration Modification by @fuatbasik in #70
- First cut at Telemetry and initial instrumentation by @oleg-lvovitch-aws in #74
- Fixed the `IOplan.toString` NSE and changed the telemetry to turn off the console by @oleg-lvovitch-aws in #76
- Telemetry: added a concept of `level` and reduced noisiness by @oleg-lvovitch-aws in #77
- Further telemetry updates to manage verbosity and control perf by @oleg-lvovitch-aws in #78
- Added SpotBugs coverage and fixed the codebase by @oleg-lvovitch-aws in #82
- Refactoring: Move all `Object-Client` modeling to `common` and get rid of duplicate `Range`. by @oleg-lvovitch-aws in #87
- Remove IOUtils dependency by @CsengerG in #75
- Set appropriate groupId by @CsengerG in #91
- Change version from 1.0.0 to 0.0.1 by @CsengerG in #92
- Telemetry refactoring for `input-stream` by @oleg-lvovitch-aws in #90
- Adds in a Default logical IO to be used for all non parquet objects. by @ahmarsuhail in #85
- Added lifetime management/`flush` to `Telemetry` to set up support for reporter that allocate resources by @oleg-lvovitch-aws in #94
- Produce downstream S3A and S3FileIO artifacts to be used by benchmarks. by @shintaroonuma in #95
- Fix iceberg spark runtime jar path by @shintaroonuma in #97
- Telemetry for ObjectClient and related minor refactoring by @oleg-lvovitch-aws in #96
- Lookup iceberg spark runtime jar path by @shintaroonuma in #99
- Build all artifacts before uploading to S3 by @shintaroonuma in #100
- Rename cicd workflow iceberg artifact by @shintaroonuma in #102
- Telemetry: added support for metrics and simple aggregations by @oleg-lvovitch-aws in https://github.com/awslabs/analytics-accelerator-s3/...