Jolokia timeouts while running seastat #11
Tweaking Jolokia with executor=cached helped to improve scraping reliability: Jolokia stays responsive while seastat is running.
However, we are still seeing timeouts when reading some table stats, and the related metrics end up missing. I found that there is a hard-coded 3s timeout here: https://github.com/suhailpatel/seastat/blob/master/jolokia/client.go#L24
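For illustration, here is a minimal Go sketch of how that hard-coded timeout could be made configurable. The ClientConfig struct and NewClient function are hypothetical and not seastat's actual API; only the 3s default mirrors the value referenced in jolokia/client.go.

```go
package jolokia

import (
	"net/http"
	"time"
)

// ClientConfig is a hypothetical configuration struct; the real client
// currently hard-codes the timeout (see jolokia/client.go#L24).
type ClientConfig struct {
	BaseURL string
	Timeout time.Duration // could be exposed as a CLI flag instead of a constant
}

// NewClient builds an HTTP client whose timeout comes from configuration,
// so slow table-stat reads are not cut off at a fixed 3s.
func NewClient(cfg ClientConfig) *http.Client {
	if cfg.Timeout == 0 {
		cfg.Timeout = 3 * time.Second // fall back to the current default
	}
	return &http.Client{Timeout: cfg.Timeout}
}
```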
Hi @mkey-cdx, thanks for your issue report. I'm quite surprised that it's timing out with that number of tables. Typically, these stats are most expensive when you have lots of keyspaces and tables, but 7 keyspaces and 67 tables is not a large number, nor is 5 nodes a very large cluster. Would you be able to tell me how many cores and how much RAM are available to your C* instance per node? That may be a factor. I have created a
I would recommend first trying to change the parameters. I'd love to hear what combination of parameters works for you. Additionally, I hope that once you find the right combination, it's zippy enough to bring the interval period right down. For context, Seastat is used on a cluster scraping over 1,500 tables every 30s, and it does this with ease.
Hi @suhailpatel
Timeouts start to occur as the system suffers from iowait (at about 13:57 and again at 13:59 in this example). Because the cluster was idle at that time, I suspect the I/O was caused by Jolokia itself. The I/O activity comes from the Cassandra process (Jolokia is attached as a Java agent) and drops completely as soon as I stop the seastat process. The JMX exporter we have in production is configured to whitelist/blacklist MBeans from being exposed during scrapes. Do you know if we could implement a similar mechanism in seastat, or perhaps at the Jolokia level? FYI, the cluster I use for testing has 5 nodes with 24 cores and 128 GB of RAM each. Data is located on HDD JBODs.
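As a purely illustrative sketch (not an existing seastat or Jolokia feature), an allow/deny filter on the scraper side could look roughly like this; the MBeanFilter type, its fields, and the example patterns are all assumptions made for the example:

```go
package main

import (
	"fmt"
	"regexp"
)

// MBeanFilter mimics the allow/deny lists the JMX exporter supports.
// Both the field names and the matching rules are hypothetical.
type MBeanFilter struct {
	Allow []*regexp.Regexp // if non-empty, an MBean must match one of these
	Deny  []*regexp.Regexp // matched MBeans are always skipped
}

// ShouldScrape reports whether a given MBean name would be requested.
func (f MBeanFilter) ShouldScrape(mbean string) bool {
	for _, d := range f.Deny {
		if d.MatchString(mbean) {
			return false
		}
	}
	if len(f.Allow) == 0 {
		return true
	}
	for _, a := range f.Allow {
		if a.MatchString(mbean) {
			return true
		}
	}
	return false
}

func main() {
	// Example: skip system keyspace tables, scrape everything else.
	filter := MBeanFilter{
		Deny: []*regexp.Regexp{regexp.MustCompile(`keyspace=system`)},
	}
	fmt.Println(filter.ShouldScrape("org.apache.cassandra.metrics:type=Table,keyspace=system,scope=local,name=ReadLatency")) // false
	fmt.Println(filter.ShouldScrape("org.apache.cassandra.metrics:type=Table,keyspace=app,scope=users,name=ReadLatency"))    // true
}
```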
Hello @suhailpatel, we have started to consider using your tool as an alternative to the Cassandra JMX exporter, to help with the performance and memory usage issues we are facing.
However, even though the collector itself has good performance and memory usage, the debug logs show that the scraper often times out during Jolokia requests. This test was done with seastat v0.5.1 on a running 5-node Cassandra v3.11 cluster holding 150 TB of data across 7 keyspaces and 67 tables (including the system and Reaper ones). Jolokia v1.6.2 is attached as a Java agent.
Checking Jolokia directly in parallel shows that it becomes unresponsive a few seconds after we start seastat.
In practice, this results in metrics not being exported at all by the collector, and subsequent scraper runs will sometimes abort the Jolokia scraping attempt entirely:
🦂 Could not fetch version, bailing out
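As an aside, the agent's responsiveness can be checked independently of seastat by polling Jolokia's version endpoint. The sketch below assumes the agent's default port 8778 on localhost and a 3s timeout; both are assumptions to adjust for your setup:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// probeJolokia hits the agent's version endpoint with a short timeout,
// roughly what a "could not fetch version" failure corresponds to on the
// scraper's side. The URL and timeout are assumptions for this example.
func probeJolokia(url string, timeout time.Duration) error {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	return nil
}

func main() {
	// Poll once a second while seastat runs to see when the agent stalls.
	for {
		start := time.Now()
		err := probeJolokia("http://localhost:8778/jolokia/version", 3*time.Second)
		fmt.Printf("%s err=%v latency=%s\n", time.Now().Format(time.RFC3339), err, time.Since(start))
		time.Sleep(time.Second)
	}
}
```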
Two concerns here: