
Use go's Context all the way when producing #16

Open
jorgebay opened this issue Sep 12, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@jorgebay
Contributor

We should pass deadlines on all the replication communications to avoid unbounded request times.

@jorgebay jorgebay added the enhancement New feature or request label Sep 12, 2022
@Mihai22125
Contributor

Hi @jorgebay, is this issue still available? I would like to give it a shot if you could provide me with some pointers on where to start.

@jorgebay
Contributor Author

jorgebay commented Nov 29, 2022

Hi @Mihai22125 ! I'm not sure it's still relevant.

Some background:
Currently, when a produce request is received, it's passed to the Coalescer to be grouped and compressed with other messages for that partition. Then it's passed to the SegmentWriter, which is responsible for writing it to the file and replicating it.
The Gossiper, responsible for all peer-to-peer communication between brokers including data replication, sends the data to the brokers that are considered followers for that partition.

The Gossiper uses a series of timers to make sure it doesn't keep waiting forever for the replication message to be sent to the follower and ack'ed. Using a compare-and-swap operation, we make sure we don't send it past the deadline.
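The timer-plus-CAS guard described above can be sketched roughly as follows. The state names and `chunk` type are hypothetical, not the actual SegmentWriter/Gossiper code:

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Chunk states for the CAS guard (hypothetical names).
const (
	statePending int32 = iota
	stateSending
	stateExpired
)

type chunk struct{ state int32 }

// trySend flips pending -> sending atomically; if the deadline timer
// already flipped pending -> expired, the send is skipped.
func (c *chunk) trySend() bool {
	return atomic.CompareAndSwapInt32(&c.state, statePending, stateSending)
}

// expire is run by the deadline timer: it flips pending -> expired so
// a late sender can't replicate the chunk past its deadline.
func (c *chunk) expire() bool {
	return atomic.CompareAndSwapInt32(&c.state, statePending, stateExpired)
}

func main() {
	c := &chunk{}
	timer := time.AfterFunc(time.Millisecond, func() { c.expire() })
	defer timer.Stop()

	time.Sleep(5 * time.Millisecond) // sender arrives after the deadline
	fmt.Println(c.trySend())         // false: the chunk already expired
}
```

Because both the sender and the timer race through the same compare-and-swap, exactly one of them wins, so a chunk is never sent after its deadline fires.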

I initially thought it would be important to use a Context from the moment we handle the request, all the way through writing and replicating. I now think that having those timers on the grouped messages (chunks) is good enough to keep the data flowing, especially considering that client-side deadlines should still always be used. On the server side we have to make sure a slow component doesn't bring down the server, and I think that's already the case:

  • When the disk is slow, the flow controller / allocation pool will cause TCP backpressure to kick in.
  • When a broker is slow (for a moment), data will still be replicated in time on the other brokers and not even written on the slow one.

Projects
None yet
Development

No branches or pull requests

2 participants