vadivelselvaraj/ParquetFileCompaction
ParquetFileCompaction

Compacts Parquet files present in an S3 location using an AWS Glue job. This repo accompanies the Medium post that I wrote here.

Setup

After cloning the repo, run the steps below.

  • Review the CloudFormation stack parameters under Job Parameters in the manager.sh file.
  • Create the CloudFormation stack.
./manager.sh create-stack
  • Run the compaction job.
./manager.sh run-compaction s3://PATH_WITH_MULTIPLE_FILES s3://READ_OPTIMIZED_STORAGE_PATH

Note: The S3 path locations should not end with a slash.
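
Since a trailing slash is easy to introduce by accident (for example, through shell tab completion), the paths can be cleaned up before they are passed to the run step. The following is a minimal sketch, not part of this repo: `normalize_s3_path` is a hypothetical helper name, and it assumes that stripping the slash is all that is needed to satisfy the note above.

```shell
# Hypothetical helper (not provided by manager.sh): print the given S3 path
# with every trailing slash removed, leaving a bare "s3://" untouched.
normalize_s3_path() {
  path=$1
  while :; do
    case $path in
      s3://) break ;;            # never strip the scheme's own slashes
      */)    path=${path%/} ;;   # drop one trailing slash and re-check
      *)     break ;;            # no trailing slash left
    esac
  done
  printf '%s\n' "$path"
}
```

With the function defined in the current shell, the run step could then be invoked as ./manager.sh run-compaction "$(normalize_s3_path "$SRC")" "$(normalize_s3_path "$DST")", where SRC and DST are placeholder variables holding the two S3 locations.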

Maintenance

  • After changing any CloudFormation stack parameters, update the stack with the command below.
./manager.sh update-stack
  • Delete the CloudFormation stack.
./manager.sh delete-stack
