
High CPU usage in newBrokerProducer.func2 #2907

Open
dzy176 opened this issue May 22, 2024 · 4 comments
Labels
stale Issues and pull requests without any recent activity

Comments

@dzy176

dzy176 commented May 22, 2024

Description

We found our program running with high CPU usage.
I found the same problem in a closed issue -> #2251, but there was no reply explaining how to resolve it.
In that case the Sarama version was 1.33.0, but we are already using a very recent version, 1.43.0 :( , so I am opening a new issue here for help.

Versions
Sarama: 1.43.0, Kafka: 3.6.1, Go: 1.22
Pprof
# go tool pprof http://127.0.0.1:9102/debug/pprof/profile
Fetching profile over HTTP from http://127.0.0.1:9102/debug/pprof/profile
Saved profile in /root/pprof/pprof.exporter.samples.cpu.001.pb.gz
File: exporter
Type: cpu
Time: May 22, 2024 at 1:47pm (CST)
Duration: 30.17s, Total samples = 30.05s (99.59%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top -cum
Showing nodes accounting for 27.04s, 89.98% of 30.05s total
Dropped 56 nodes (cum <= 0.15s)
Showing top 10 nodes out of 17
      flat  flat%   sum%        cum   cum%
     0.42s  1.40%  1.40%     29.94s 99.63%  github.com/IBM/sarama.(*asyncProducer).newBrokerProducer.func2
         0     0%  1.40%     29.94s 99.63%  github.com/IBM/sarama.withRecover
     8.28s 27.55% 28.95%     29.31s 97.54%  runtime.selectgo
     0.87s  2.90% 31.85%      9.43s 31.38%  runtime.selunlock
     1.33s  4.43% 36.27%      9.39s 31.25%  runtime.sellock
         0     0% 36.27%      8.56s 28.49%  runtime.unlock (inline)
     0.01s 0.033% 36.31%      8.56s 28.49%  runtime.unlockWithRank (inline)
     8.07s 26.86% 63.16%      8.55s 28.45%  runtime.unlock2
         0     0% 63.16%      8.06s 26.82%  runtime.lock (inline)
     8.06s 26.82% 89.98%      8.06s 26.82%  runtime.lock2
(pprof) list github.com/IBM/sarama
Total: 30.05s
ROUTINE ======================== github.com/IBM/sarama.(*asyncProducer).newBrokerProducer.func2 in /workspace/vendor/github.com/IBM/sarama/async_producer.go
     420ms     29.94s (flat, cum) 99.63% of Total
         .          .    850:   go withRecover(func() {
         .          .    851:           buf := queue.New()
      30ms       30ms    852:           for {
         .       10ms    853:                   if buf.Length() == 0 {
         .          .    854:                           res, ok := <-pending
         .          .    855:                           if !ok {
         .          .    856:                                   // We are done forwarding the last pending response
         .          .    857:                                   close(responses)
         .          .    858:                                   return
         .          .    859:                           }
         .          .    860:                           buf.Add(res)
         .          .    861:                   }
         .          .    862:                   // Send the head pending response or buffer another one
         .          .    863:                   // so that we never block the callback
      60ms      260ms    864:                   headRes := buf.Peek().(*brokerProducerResponse)
      90ms     29.40s    865:                   select {
      50ms       50ms    866:                   case res, ok := <-pending:
      30ms       30ms    867:                           if !ok {
         .          .    868:                                   continue
         .          .    869:                           }
         .          .    870:                           buf.Add(res)
         .          .    871:                           continue
     160ms      160ms    872:                   case responses <- headRes:
         .          .    873:                           buf.Remove()
         .          .    874:                           continue
         .          .    875:                   }
         .          .    876:           }
         .          .    877:   })
ROUTINE ======================== github.com/IBM/sarama.withRecover in /workspace/vendor/github.com/IBM/sarama/utils.go
         0     29.94s (flat, cum) 99.63% of Total
         .          .     33:func withRecover(fn func()) {
         .          .     34:   defer func() {
         .          .     35:           handler := PanicHandler
         .          .     36:           if handler != nil {
         .          .     37:                   if err := recover(); err != nil {
         .          .     38:                           handler(err)
         .          .     39:                   }
         .          .     40:           }
         .          .     41:   }()
         .          .     42:
         .     29.94s     43:   fn()
         .          .     44:}
         .          .     45:
         .          .     46:func safeAsyncClose(b *Broker) {
         .          .     47:   tmp := b // local var prevents clobbering in goroutine
         .          .     48:   go withRecover(func() {
@AleksandrShibanov

I have the same problem: when the broker was down, CPU usage went to 100%.

First there was this error:
Failed to produce message to topic xxx: dial tcp xx.xxx.xx.xxx:9092: i/o timeout

Then it changed to this error:
Failed to produce message to topic xxx: circuit breaker is open

And after that new error, CPU usage recovered and went back to normal.

@puellanivis
Contributor

Hm… thinking about it, it looks like if buf.Length() ends up non-zero, and pending is closed, then we enter a tight CPU spin-loop from the continue on line 868…

@AleksandrShibanov

> Hm… thinking about it, it looks like if buf.Length() ends up non-zero, and pending is closed, then we enter a tight CPU spin-loop from the continue on line 868…

#2922


github-actions bot commented Sep 3, 2024

Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur.
Please check if the main branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.

@github-actions github-actions bot added the stale Issues and pull requests without any recent activity label Sep 3, 2024