SELECT pg_advisory_lock #134


Open
denysvitali opened this issue May 5, 2025 · 6 comments · May be fixed by #158
Comments

@denysvitali
Contributor

It seems that `SELECT pg_advisory_lock(...)` is treated as a plain read-only SELECT statement, so the lock never gets acquired where it should be.

Would it make sense to maintain a list of SQL statements that have side effects and should therefore be routed to the writer (primary)?

@levkk
Collaborator

levkk commented May 5, 2025

I'm surprised that doesn't work on a replica. I thought advisory locks were in-memory. But yes, there are SELECT statements that write, so I agree, we should have a whitelist. I'm thinking of using pg_query::fingerprint for this, plus a touch of manual routing (e.g., if a query matches a fingerprint, route it to the primary or a replica).
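For illustration, here is a minimal sketch of what such routing could look like. This is not PgDog's implementation: a real version would use pg_query's query fingerprints as described above, while this sketch just matches on function names, and the function list and names here are my own assumptions.

```python
# Illustrative sketch: route SELECT statements that have write side effects
# to the primary. A real implementation (e.g. in PgDog) would likely match
# pg_query::fingerprint values instead of scanning for function names.
import re

# SELECT-able functions that mutate server state and must run on the primary.
# This list is an assumption for the example, not an exhaustive whitelist.
WRITE_FUNCTIONS = {
    "pg_advisory_lock",
    "pg_advisory_unlock",
    "pg_advisory_unlock_all",
    "pg_try_advisory_lock",
    "nextval",
    "setval",
}

FUNC_CALL = re.compile(r"\b([a-z_][a-z0-9_]*)\s*\(", re.IGNORECASE)

def route(query: str) -> str:
    """Return 'primary' or 'replica' for a given SQL statement."""
    if not query.lstrip().lower().startswith("select"):
        return "primary"  # non-SELECT statements always go to the primary
    called = {m.group(1).lower() for m in FUNC_CALL.finditer(query)}
    if called & WRITE_FUNCTIONS:
        return "primary"  # SELECT with side effects
    return "replica"
```

With this sketch, `SELECT pg_advisory_lock($1)` would be routed to the primary rather than a read replica.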

@denysvitali denysvitali linked a pull request May 12, 2025 that will close this issue
@denysvitali
Contributor Author

Despite #181, it still seems to be broken:

[Nest] 20  - 05/23/2025, 9:23:15 PM     LOG [Api:EventRepository] Initialized websocket server
Query failed : {
  durationMs: 33.040015999999014,
  error: PostgresError: protocol is out of sync
      at ErrorResponse (/usr/src/app/node_modules/postgres/cjs/src/connection.js:790:26)
      at handle (/usr/src/app/node_modules/postgres/cjs/src/connection.js:476:6)
      at Socket.data (/usr/src/app/node_modules/postgres/cjs/src/connection.js:315:9)
      at Socket.emit (node:events:518:28)
      at addChunk (node:internal/streams/readable:561:12)
      at readableAddChunkPushByteMode (node:internal/streams/readable:512:3)
      at Readable.push (node:internal/streams/readable:392:5)
      at TCP.onStreamRead (node:internal/stream_base_commons:189:23)
      at TCP.callbackTrampoline (node:internal/async_hooks:130:17) {
    severity_local: 'ERROR',
    severity: 'ERROR',
    code: '58000'
  },
  sql: 'SELECT pg_advisory_lock($1)',
  params: [ 200 ]
}
Query failed : {
  durationMs: 28.966197000001557,
  error: Error: write CONNECTION_CLOSED immich-pgdog:5432
      at Socket.closed (/usr/src/app/node_modules/postgres/cjs/src/connection.js:440:57)
      at Socket.emit (node:events:518:28)
      at TCP.<anonymous> (node:net:351:12)
      at TCP.callbackTrampoline (node:internal/async_hooks:130:17) {
    code: 'CONNECTION_CLOSED',
    errno: 'CONNECTION_CLOSED',
    address: [ 'immich-pgdog' ],
    port: [ 5432 ]
  },
  sql: 'SELECT pg_advisory_unlock($1)',
  params: [ 200 ]
}
Error: write CONNECTION_CLOSED immich-pgdog:5432
    at Socket.closed (/usr/src/app/node_modules/postgres/cjs/src/connection.js:440:57)
    at Socket.emit (node:events:518:28)
    at TCP.<anonymous> (node:net:351:12)
    at TCP.callbackTrampoline (node:internal/async_hooks:130:17) {
  code: 'CONNECTION_CLOSED',
  errno: 'CONNECTION_CLOSED',
  address: [ 'immich-pgdog' ],
  port: [ 5432 ]
}
api worker exited with code 1

pgdog reports:

2025-05-23T21:24:18.808778Z TRACE 😳
2025-05-23T21:24:18.808856Z ERROR ProtocolOutOfSync got: T, extended buffer: ProtocolState { queue: [], names: [], simulated: [], extended: true, out_of_sync: false }
2025-05-23T21:24:18.809060Z TRACE 😳
2025-05-23T21:24:18.809076Z ERROR client disconnected with error [10.42.4.98:34114]: protocol is out of sync
2025-05-23T21:24:18.809329Z DEBUG [cleanup] draining data from "receiving data" server [immich-rw:5432, immich]
2025-05-23T21:24:18.809356Z TRACE >>> Sync(Sync) [immich-rw:5432, immich]
2025-05-23T21:24:18.810084Z TRACE <<< ReadyForQuery { status: 'I' } [immich-rw:5432, immich]
2025-05-23T21:24:18.810123Z DEBUG [cleanup] RESET ALL,SELECT pg_advisory_unlock_all(), server in "idle" state [immich-rw:5432, immich]
2025-05-23T21:24:18.810135Z TRACE >>> Query(Query { query: "RESET ALL" }) [immich-rw:5432, immich]
2025-05-23T21:24:18.810145Z TRACE >>> Query(Query { query: "SELECT pg_advisory_unlock_all()" }) [immich-rw:5432, immich]
2025-05-23T21:24:18.811476Z TRACE <<< CommandComplete { command: "RESET" } [immich-rw:5432, immich]
2025-05-23T21:24:18.811527Z TRACE <<< ReadyForQuery { status: 'I' } [immich-rw:5432, immich]
2025-05-23T21:24:18.811537Z TRACE <<< RowDescription { fields: [Field { name: "pg_advisory_unlock_all", table_oid: 0, column: 0, type_oid: 2278, type_size: 4, type_modifier: -1, format: 0 }] } [immich-rw:5432,
2025-05-23T21:24:18.811552Z TRACE <<< DataRow { columns: [Data { data: b"", is_null: false }] } [immich-rw:5432, immich]
2025-05-23T21:24:18.811563Z TRACE <<< CommandComplete { command: "SELECT 1" } [immich-rw:5432, immich]
2025-05-23T21:24:18.811574Z TRACE <<< ReadyForQuery { status: 'I' } [immich-rw:5432, immich]
2025-05-23T21:24:23.031413Z DEBUG calculated averages [immich-rw:5432, immich]
2025-05-23T21:24:23.031440Z DEBUG calculated averages [immich-ro:5432, immich]
2025-05-23T21:24:28.020243Z DEBUG running healthcheck ";" [immich-rw:5432, immich]
2025-05-23T21:24:28.020306Z TRACE >>> Query(Query { query: ";" }) [immich-rw:5432, immich]
2025-05-23T21:24:28.022079Z TRACE <<< EmptyQueryResponse [immich-rw:5432, immich]
2025-05-23T21:24:28.022138Z TRACE <<< ReadyForQuery { status: 'I' } [immich-rw:5432, immich]
2025-05-23T21:24:28.022160Z DEBUG [cleanup] no cleanup needed, server in "idle" state [immich-rw:5432, immich]
2025-05-23T21:24:28.022227Z DEBUG running healthcheck ";" [immich-ro:5432, immich]
2025-05-23T21:24:28.022249Z TRACE >>> Query(Query { query: ";" }) [immich-ro:5432, immich]
2025-05-23T21:24:28.023844Z TRACE <<< EmptyQueryResponse [immich-ro:5432, immich]
2025-05-23T21:24:28.023894Z TRACE <<< ReadyForQuery { status: 'I' } [immich-ro:5432, immich]
2025-05-23T21:24:28.023913Z DEBUG [cleanup] no cleanup needed, server in "idle" state [immich-ro:5432, immich]
2025-05-23T21:24:38.032897Z DEBUG calculated averages [immich-ro:5432, immich]
2025-05-23T21:24:38.032920Z DEBUG calculated averages [immich-rw:5432, immich]
2025-05-23T21:24:53.033481Z DEBUG calculated averages [immich-rw:5432, immich]
2025-05-23T21:24:53.033508Z DEBUG calculated averages [immich-ro:5432, immich]
2025-05-23T21:24:58.019808Z DEBUG running healthcheck ";" [immich-rw:5432, immich]
2025-05-23T21:24:58.019853Z TRACE >>> Query(Query { query: ";" }) [immich-rw:5432, immich]
2025-05-23T21:24:58.020465Z TRACE <<< EmptyQueryResponse [immich-rw:5432, immich]
2025-05-23T21:24:58.020493Z TRACE <<< ReadyForQuery { status: 'I' } [immich-rw:5432, immich]
2025-05-23T21:24:58.020507Z DEBUG [cleanup] no cleanup needed, server in "idle" state [immich-rw:5432, immich]
2025-05-23T21:24:58.021617Z DEBUG running healthcheck ";" [immich-ro:5432, immich]
2025-05-23T21:24:58.021643Z TRACE >>> Query(Query { query: ";" }) [immich-ro:5432, immich]
2025-05-23T21:24:58.023539Z TRACE <<< EmptyQueryResponse [immich-ro:5432, immich]
2025-05-23T21:24:58.023561Z TRACE <<< ReadyForQuery { status: 'I' } [immich-ro:5432, immich]
2025-05-23T21:24:58.023573Z DEBUG [cleanup] no cleanup needed, server in "idle" state [immich-ro:5432, immich]
2025-05-23T21:25:08.034778Z DEBUG calculated averages [immich-ro:5432, immich]
2025-05-23T21:25:08.034810Z DEBUG calculated averages [immich-rw:5432, immich]

@levkk
Collaborator

levkk commented May 23, 2025

Could you show the lines from the log right before the error as well? That's most likely where the issue is. Thanks!

@levkk levkk mentioned this issue May 24, 2025
@denysvitali
Contributor Author

Full log attached. Note that I'm running 3 replicas of pgdog.

immich-immich-pgdog-776f45559f-2vts4-pgdog.log

@levkk levkk mentioned this issue Jun 1, 2025
@levkk
Collaborator

levkk commented Jun 1, 2025

Awesome, thank you. The fix should be in #199. A couple of follow-up questions, just to confirm that I caught the right bug:

  • Are you using session mode? I only see you connecting to a server once, but you ran several queries. Just a heads up: in session mode, load balancing between replicas is not very effective, because your client stays pinned to the same server until it disconnects.
  • Did you disable prepared statements (prepared_statements = "disabled")? The Parse/Describe messages in the buffer that caused the out-of-sync error were anonymous, but they should have been rewritten to named statements.

I think the combination of these two settings surfaced a bug in PgDog's server-side implementation of the extended protocol. I'd still like to confirm that the diagnosis is correct.

Cheers!

@denysvitali
Contributor Author

Yes, the following are my settings (which then get converted to their TOML equivalents):

pgdog:
  replicaCount: 3
  workers: 4
  defaultPoolSize: 5
  minPoolSize: 3
  passthroughAuth: disabled
  queryTimeout: 30000
  connectTimeout: 30000
  checkoutTimeout: 5000
  logLevel: trace
  poolerMode: session
  banTimeout: 50 # 50ms
  preparedStatements: disabled

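For reference, the settings above would roughly correspond to a pgdog.toml like the following. The key names here are my assumption based on a camelCase-to-snake_case mapping; only `prepared_statements = "disabled"` is confirmed in the thread above, so check the actual PgDog configuration reference before relying on this.

```toml
# Assumed pgdog.toml equivalent of the YAML above (key names are a guess).
[general]
workers = 4
default_pool_size = 5
min_pool_size = 3
passthrough_auth = "disabled"
query_timeout = 30000
connect_timeout = 30000
checkout_timeout = 5000
pooler_mode = "session"
ban_timeout = 50 # 50ms
prepared_statements = "disabled"
```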