Blog post: Scaling Valkey Cluster to 1 Billion RPS #392
Signed-off-by: Harkrishn Patro <[email protected]>
Signed-off-by: Kyle J. Davis <[email protected]>
Overall, a good blog post. I made a number of small edits in a pull request on your fork @hpatro; see: hpatro#1
One thing that needs cleaning up is a collective pronoun issue: some uses of 'we'/'our' seem to refer to the project, while others refer to the group of authors. This needs to be very crisp. Please look through the post, determine who did what, and revise. Where it's the project, try to make the attribution cleaner.
## Closing thoughts
With all these improvements in Valkey, a cluster can now scale to 1B RPS across 2000 nodes, which is a remarkable feat. However, there is still plenty of room to improve. The steady-state CPU overhead of transferring and processing cluster bus messages could be reduced further by incorporating the [SWIM protocol](https://en.wikipedia.org/wiki/SWIM_Protocol) or by moving this work off the main thread into a separate thread. The failover logic can also be made smarter by taking the AZ placement of nodes into account. We would also like to introduce more observability metrics and logs to make the system easier to manage. All of these items are tracked under the [Support Large Cluster](https://github.com/valkey-io/valkey/issues/2281) issue. Feel free to check it out and add your suggestions.
I would like to have a little better call to action. What can a user do next aside from look at the issue? Is there further reading? Another blog post of interest?
Engaging on the issue should be the next step.
"Feel free to check it out and add in your suggestions." - This statement was my call to action.
There is a detailed spec available at https://valkey.io/topics/cluster-spec/. Do we want to link it here?
github: cherukum-Amazon
---
Mahesh Cherukumilli is an engineering leader passionate about large-scale distributed systems and high-performance databases. He sets strategy and builds high-performing teams that align architecture with business outcomes and deliver reliable, at-scale infrastructure.
The phrase 'large-scale distributed systems' appears in all three bios. Can we mix it up a bit? Or maybe you all just share a very similar passion? 🤣
This is something quite difficult to update 😂
I just looked at the technical details and they looked good, will leave it to Kyle to ship.
review pass on 1 billion RPS blog
Thanks @stockholmux for the help, pulled that in.
Cleaned up the Please take a look.
@stockholmux / @makubo-aws please have a look; I've addressed your feedback.
Two small changes and I'm clear to publish on Monday ahead of the 9.0 launch.
I've addressed them. The folder name and the release date on the post are also set to 10/20, so we are good to release on Monday. @stockholmux Thanks for the help!
LGTM
Description
Adds content for "1 Billion RPS blog post"
Issues Resolved
#337
Check List
--signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.