Auto refresh of RCU usage #135

Open
sivankumar86 opened this issue Aug 29, 2020 · 0 comments

Comments

@sivankumar86
Contributor

Issue:
A DynamoDB export job has been running for more than 5 days because of data skew, which causes the Data Pipeline to time out.

Configuration:
r5.24xlarge = 20
RCU = 400k
size = ~80 TB
maps = 2000

70 TB was exported in around 9 hours, and the rest of the data is being scanned at <10k RCU, hence the job runs longer.
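To make the numbers concrete, here is a rough back-of-the-envelope sketch. It assumes the 400k RCU is split evenly across the 2000 maps when the job starts, and that the "<10k" figure is the aggregate consumed RCU during the tail of the job. Both are my assumptions, not something I have confirmed in the connector code, and the function names are only for illustration.

```python
def per_map_rcu(total_rcu: float, num_maps: int) -> float:
    """Static share each map gets if the table RCU is split evenly at job start."""
    return total_rcu / num_maps


def aggregate_rcu(total_rcu: float, num_maps: int, running_maps: int) -> float:
    """Upper bound on consumed RCU when only `running_maps` tasks are still alive
    and each one keeps the share it was given at job start."""
    return per_map_rcu(total_rcu, num_maps) * running_maps


share = per_map_rcu(400_000, 2000)        # 200.0 RCU per map
tail = aggregate_rcu(400_000, 2000, 50)   # 10000.0 RCU with 50 maps left
print(share, tail)
```

Under these assumptions, fewer than 50 remaining maps would be enough to explain usage staying below 10k RCU while the other 390k+ sits idle.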

I have also tried increasing the YARN map memory and reducing the number of nodes to increase the RCU per map. However, this is a trial-and-error method that takes time and increases EMR cost.

Solution:
This could be mitigated if the RCU allocation were refreshed at a certain interval based on the number of running containers. Currently RCU is assigned at the start of the job, and only a few containers keep running for a long time at the end of the job.
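A minimal sketch of the idea, in Python only for illustration. `RefreshingRcuShare`, `count_running`, and `interval_s` are hypothetical names, not part of the connector, and how the number of running containers would actually be obtained (for example from YARN) is left out on purpose.

```python
import time
from typing import Callable, Optional


def refreshed_share(total_rcu: float, running_tasks: int) -> float:
    """Split the table RCU across the tasks that are running right now."""
    if running_tasks <= 0:
        return 0.0
    return total_rcu / running_tasks


class RefreshingRcuShare:
    """Recomputes the per-task RCU share at most once every `interval_s` seconds."""

    def __init__(
        self,
        total_rcu: float,
        count_running: Callable[[], int],  # hypothetical hook returning live task count
        interval_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._total_rcu = total_rcu
        self._count_running = count_running
        self._interval_s = interval_s
        self._clock = clock
        self._share: Optional[float] = None
        self._last_refresh = 0.0

    def current_share(self) -> float:
        now = self._clock()
        stale = self._share is None or now - self._last_refresh >= self._interval_s
        if stale:
            # Re-divide the RCU among whatever is still running.
            self._share = refreshed_share(self._total_rcu, self._count_running())
            self._last_refresh = now
        return self._share
```

Each map would call `current_share()` before issuing scan requests and throttle itself to that value, so the last few containers could pick up the RCU released by the finished ones instead of staying at their initial share.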

Any other suggestions?
