After update to bullmq: Error: could not renew lock for job #3056
As far as I know, BullMQ is at least as stable as Bull and has a much larger test suite, so in general you should see fewer issues, not more. However, it is possible that during migration you made some assumptions about how BullMQ works that do not hold true coming from Bull. It would be best if you could post a case that reproduces those issues, so we can give you hints or look deeper into it if it happens to be a bug. Furthermore, you mentioned that you run into a given error. That error is only produced via an event, and is triggered only if a lock cannot be renewed for a given job; this is quite unusual, so it is probably related to the migration work. I also wonder, are you using TypeScript?
I am having the same issue. It happens randomly, and I cannot even destroy the queue. I have to restart each time.
Hi folks, just out of curiosity, how did you migrate from Bull to BullMQ? Did you create new queues for BullMQ, or did you use a different prefix?
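For context on why the prefix matters here: both libraries namespace their Redis keys as `prefix:queueName:…`, with `bull` as the default prefix, so giving BullMQ a different prefix keeps its data separate from the old Bull queues. A sketch with an illustrative helper (names made up):

```javascript
// Both Bull and BullMQ store queue state under Redis keys shaped like
// `${prefix}:${queueName}:${suffix}`, with 'bull' as the default prefix.
function queueKey(queueName, suffix, prefix = 'bull') {
  return `${prefix}:${queueName}:${suffix}`;
}

console.log(queueKey('imports', '3056'));           // bull:imports:3056
console.log(queueKey('imports', '3056', 'bullmq')); // bullmq:imports:3056

// In real code the prefix is passed via options, e.g.:
//   new Queue('imports', { connection, prefix: 'bullmq' });
// which avoids the new library reading the old library's keys.
```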
Hello everyone. No, we do not use TypeScript, just vanilla JS.
... and we continue to have this issue. We need to restart the whole Node instance (Docker container) to make it start up again. We keep running into the error mentioned above. We are already trying not to overload the CPU as best we can. Of course we "fluctuate around 100%", but this, to me, does not mean that we really leave ZERO headroom for the CPU to even renew the lock. :/ Migration: we are using a different prefix.
9 times out of 10, errors of this nature stem from passing wrong options or arguments when not using TypeScript, especially coming from Bull, which does not have the same signatures. It is difficult to assess whether your issue is related to high CPU usage, as you mentioned that you are sometimes at 100%. Without more information about the specifics of your use case, and some test case that shows the problem, we really do not have much chance of helping you.
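To illustrate the kind of signature mismatch meant here (a hedged sketch; the option shapes follow the two libraries' documented APIs): Bull takes its Redis config under a `redis` key, while BullMQ expects it under `connection`. Plain JS silently ignores the unknown key, so a Bull-style options object passed to BullMQ falls back to defaults with no error at all.

```javascript
// Bull (v3) style options - Redis config under `redis`:
const bullOpts = { redis: { host: '127.0.0.1', port: 6379 } };

// BullMQ style options - Redis config under `connection`, e.g.:
//   new Queue('imports', bullmqOpts);
//   new Worker('imports', processor, bullmqOpts);
const bullmqOpts = { connection: { host: '127.0.0.1', port: 6379 } };

// Passing bullOpts to BullMQ raises no runtime error in plain JS:
// the unknown `redis` key is simply ignored and defaults are used.
// TypeScript would flag this at compile time.
console.log('connection' in bullOpts);   // false - silently misconfigured
console.log('connection' in bullmqOpts); // true
```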
It is highly unlikely that you are having "the same issue", especially when we do not even know yet what the issue is. So please, if you have an issue, post a reproducible case in a new issue and we will look into it.
Thanks for the reply @manast. Could you please add more details on the "wrong passing of options"? I don't understand how some "code bug" on our end could trigger this very error. Sidenote: we do use TS checks in our VS Code setup and do not get any errors for things like wrongly passed options. On that note: we do have a queueEvents handler in place. It is almost impossible for me to give you a reproduction example, partly because of the randomness of how and when the issue occurs for us. Our use case: we use the queue to connect our main (MeteorJS) app to our workers (plain Node.js apps). These workers handle huge amounts of data imports. Basically, all our jobs in the queue consist of reading data from files or APIs, creating MongoDB bulk update operations, and running those bulk operations against our MongoDB.
But when this happens, what is the status of the job that could not renew the lock? |
We will implement some more logging today and get back to you. Thanks a lot for your responsiveness, highly appreciated! We do use bullboard, and for some reason I cannot find these jobIds in our "failed" list. So I am also confused about where these jobs disappear to.
How many jobs do you usually run concurrently? |
On these workers, only 1! |
Are these jobs blocking NodeJS event loop? Did you try using sandboxes instead? |
@Twisterking, to find a pattern, I want to ask if you have the same setup as me.
Quick update from my end: it looks like we did, indeed, identify some nested calls as a likely culprit. Will report back when I know (even) more!
According to some logging, in my case the job gets stuck, "sometimes", while connecting to Discord via a socket. But I don't understand why I cannot force the worker process to be killed. I will dig more too. Thanks @Twisterking.
@melihplt could it be a bug in NodeJS where the connection enters an infinite loop? Have you tried with a different runtime such as Bun to see if you get the same result? |
Why don't you use sandboxed processors which are precisely designed for handling cases where you keep the nodejs event loop busy? |
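For completeness, the shape of a sandboxed processor (a sketch; the file name and job payload are made up): instead of a function, the Worker is given a path to a module, and BullMQ runs it in a separate child process so it cannot block the main event loop. The processor module just exports an async function.

```javascript
// processor.js - runs in its own child process when sandboxed.
// The main app would reference it by path, e.g.:
//   const worker = new Worker('imports',
//     path.join(__dirname, 'processor.js'), { connection });
async function processor(job) {
  // job.data is whatever was passed to queue.add(); a made-up payload here.
  const { rows } = job.data;
  return { imported: rows.length };
}

// In the real processor.js file this would be: module.exports = processor;

// Local demonstration with a fake job object:
processor({ data: { rows: [1, 2, 3] } })
  .then((res) => console.log(res.imported)); // 3
```

The trade-off is that job data crosses a process boundary, so it must be serializable, and per-process memory overhead grows with concurrency.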
I did try this back then. For this, and other reasons, I would like to avoid doing it.
Version
v5.34.2
Platform
NodeJS
What happened?
We used the predecessor `bull` successfully and very heavily over many months at our company. Now we have updated to `bullmq` and, to be honest, we have quite some issues. Our queues get stuck quite frequently (we never had this issue before!), and we sometimes run into this error:
This just continues like that and never resolves until we do a restart, which is quite bad for us. I also could not really find anything about this in other issues. What can we do here?
How to reproduce.
No response
Relevant log output