Maximum Throughput Limited to 1 Message per Second #9

Open
arashta opened this issue Mar 18, 2021 · 1 comment

Comments

arashta commented Mar 18, 2021

We recently ran into a limit on the maximum throughput of the s3-sqs connector. Apparently, even in the fastest possible configuration, a message can be fetched from the target SQS queue only once per second, because:

  1. the scheduling unit of the SQS fetch job is SECOND according to the code, and
  2. the lowest possible value for the fetch interval is 1.

In other words, the connector can fetch at most one message per second. Therefore, if the message generation rate is greater than 1 message per second, the queue size grows without bound. In addition, the processing resources on the Spark cluster side are left considerably underutilized.
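
To make the backlog argument concrete, here is a minimal sketch of the arithmetic. It is not connector code: the function name and its parameters (`arrival_rate`, `fetch_interval_s`, `messages_per_fetch`) are hypothetical, and it assumes a steady arrival rate and one message per fetch, as described above.

```python
def backlog_after(seconds, arrival_rate, fetch_interval_s=1.0, messages_per_fetch=1):
    """Estimate the queue size after `seconds`, starting from an empty queue.

    arrival_rate       -- messages pushed into the queue per second (assumed steady)
    fetch_interval_s   -- time between two fetches, in seconds (assumed minimum: 1)
    messages_per_fetch -- messages removed per fetch (assumed: 1, as reported here)
    """
    drain_rate = messages_per_fetch / fetch_interval_s  # messages per second
    # The backlog only grows when messages arrive faster than they are drained.
    return max(0.0, (arrival_rate - drain_rate) * seconds)


if __name__ == "__main__":
    for rate in (0.5, 1, 2, 10):
        print(f"{rate} msg/s -> backlog after 1 hour: {backlog_after(3600, rate)}")
```

Under these assumptions, any arrival rate above 1 message per second produces a backlog that grows linearly with time, which matches the behaviour described above.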

I suggest changing the scheduling unit to MILLISECOND in order to resolve this issue.


pnain commented Sep 20, 2022

I am facing the same issue: the source publishes 500 files per second, and those need to be processed every minute. The Spark logic can process these files in under a minute when run in batch mode, but I would like to use this approach to further reduce the listing time.
Is there any way to increase the throughput of the SQS reader?
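
As a rough sizing sketch for this scenario (not connector code; the function and parameter names are hypothetical), the following computes how short the fetch interval would have to be, or how many messages each fetch would have to return, to keep up with a steady arrival rate. It assumes one SQS message per published file.

```python
def max_fetch_interval_ms(arrival_rate, messages_per_fetch=1):
    """Longest fetch interval (in ms) that still keeps up with `arrival_rate` msg/s."""
    return 1000.0 * messages_per_fetch / arrival_rate


def min_messages_per_fetch(arrival_rate, fetch_interval_s=1.0):
    """Messages each fetch must return to keep up at a given fetch interval."""
    return arrival_rate * fetch_interval_s


if __name__ == "__main__":
    # 500 files per second, assuming one SQS message per file.
    print(max_fetch_interval_ms(500))                         # one message per fetch
    print(max_fetch_interval_ms(500, messages_per_fetch=10))  # ten messages per fetch
    print(min_messages_per_fetch(500))                        # at a 1-second interval
```

With one message per fetch, the interval would need to be about 2 ms. A single SQS `ReceiveMessage` call returns at most 10 messages, so even with full batches a 1-second interval could not keep up with 500 messages per second from a single poller. Whether the connector can be configured for larger batches or parallel fetches is not established in this thread.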
