Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent permanent waits from destroyed workers #187

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bobbysmith007
Copy link

  • There are many errors due to pools being configured with min: 0
    but then running out of workers while there is still work to do
  • This results in missed work, infinite loops and timeouts
  • A solution is to ensure that if we still have work todo
    that we make sure we still have some workers to do them

See:

Prevent permanent waits, by ensuring that we continue creating
resources if we need them

 * There are many errors due to pools being configured with min: 0
   but then running out of workers while there is still work to do
 * This results in missed work, infinite loops and timeouts
 * A solution is to ensure that if we still have work todo
   that we make sure we still have some workers to do them

See:

 * brianc/node-pg-pool#48
 * loopbackio/loopback-connector-postgresql#231
 * coopernurse#175 (Seems related)
@bobbysmith007
Copy link
Author

I believe this PR to be invalid.
I was working on from 2.4.2 and porting code over from there and while I thought I had done a good enough job reading it to see that the bug still existed, I feel I was probably wrong.

It looks like destroy (rather than _destroy) calls _dispense, which is probably good enough that the code I added in _ensureMinimum will never activate. Its still probably valid defensive code, but unneeded.

I have not been successful in writing a test that covers this case because it looks like the code that is run inside of aquire, doesnt have default error handling (should it?). That is: if the fn passed to aquire (the "then" of the aquire promise) thows an error without releasing or destroying, it seems that promise remains unresolved. It seems like a default error handler to do either release or destroy might be in order, but that was not what this PR addressed..

@sandfox
Copy link
Collaborator

sandfox commented Apr 5, 2017

Hi @bobbysmith007
sorry it's taken a while to reply to this. I really need to draw up (manually or programatically) a flow/state diagram. There are some edge cases that I'm aware of (but may not be written down anywhere) that can lock/infinite-loop-o-doom the pool but I think they are preventable in userland (even though thats not a friendly attitude to take) until I get round to dealing with them in the lib.

There is a substantial difference between the internals of v2 and v3 as I think we hit a dead end up with the v2 codebase with the features we wanted to add.

I'm also aware that the test suite is a complete mess at the moment, there is a lot of work required to bootstrap any scenario and there is plenty of room for adding some helpers/stuff to make this easier.

if the fn passed to aquire (the "then" of the acquire promise) thows an error without releasing or destroying, it seems that promise remains unresolved

Can you point me to the line/block in question? For some reason failing to wrap my head around this, but I suspect you are probably right,,,

@bobbysmith007
Copy link
Author

I have still been trying to come up with a good way to express this error / test case. Unfortunately Promises are still a bit new to me and I lack some of the intuition I have built up in other areas.

The best I have been able to come up with is that we should be doing promise based resource management à la: http://bluebirdjs.com/docs/api/promise.using.html and http://bluebirdjs.com/docs/api/disposer.html anything less than this can result in errors that consume pool objects permanently. I was hoping to come up with a "only promises" implementation since using and disposer seem to be linked to bluebird promises

@sandfox
Copy link
Collaborator

sandfox commented Apr 7, 2017

from a first glance that bluebird stuff seems like a generalised + promisey version of https://github.com/sandfox/generic-pool-decorator from the v2 of the pool.
Will take a closer look as I'm very sure there will be a way to apply this pattern (albeit with a slightly different API) on top of generic promises. (in actual fact maybe someone has already done this).

There is also a related-ish feature request #167 which is looking for similar sort of solution.

When it comes to expressing problems I'm more than happy with anything, pen + paper doodles etc. Sometimes putting a thought into code is the hard bit.

@bobbysmith007
Copy link
Author

Thanks for taking time to discuss this.

I made a gist of a very basic implementation of this, so that I could wrap my head around how I would want it to work.

https://gist.github.com/bobbysmith007/2e2d44d4e90609621482982bb4f68a66

At a basic level, I would like some assurance, preferably at the api level that the pool will always attempt to complete all scheduled work. Inherent in this goal is correctly releasing or destroying/recreating its resources. This means providing some context manager, where work is executed in a context and any error thrown there has the ability to be handled both locally and at the context manager level.

My goal would be that any unhandled error that comes from the work to be done will default in the worker being destroyed/recreated and any work completed successfully, returns the worker to the pool.

Two additional niceties to help with these assurances:

  • the promise from acquire should be rejected upon failing to acquire, or failing to run work.
  • a loan timeout to ensure that anyone "hogging" the resource will eventually have the resource taken back (not meaningful in all cases, but otherwise very useful eg: query timeouts on a database pool)
    • Similar to the acquire timeout but for the work period

Unfortunately this requires a new api endpoint (that would theoretically become the "standard" way of interacting with the pool. Probably something like withPooledObject(workToDo) => returns a Promise. You would no longer want to run workToDo on the then of withPooledObject, but rather you would do your work and return the results in the workToDo and the then of withPooledObject would not have access to the pooledObject (as it would be destroyed/returned at the end of work).

Rough corners:

  • handling already released resources in release / preventing double releases
  • If you return / save the pooled object from withing "workToDo" you would be passing around an already returned pool item (I dont think there is anything to do about this other than say "dont do that"). This bug already exists, but the new API would perhaps make it more obvious.

Sorry I couldnt provide a more meaningful patch here, and thanks again

@sandfox sandfox self-assigned this Apr 10, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants