SELECT pg_advisory_lock #134


Open
denysvitali opened this issue May 5, 2025 · 6 comments · May be fixed by #158
Comments

@denysvitali
Contributor

It seems that `SELECT pg_advisory_lock(...)` is treated as a plain read-only SELECT statement, so the lock never gets acquired where it should be.

Would it make sense to maintain a list of SQL statements that have side effects and should therefore be routed to the writer (primary)?

@levkk
Collaborator

levkk commented May 5, 2025

I'm surprised that doesn't work on a replica. I thought advisory locks were in-memory. But yes, there are SELECT statements that write, so I agree, we should have a whitelist. I'm thinking of using pg_query::fingerprint for this, plus a touch of manual routing (e.g., if a query matches a fingerprint, route it to the primary or a replica).
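For illustration, here is a minimal sketch of what such routing could look like. This is not PgDog's implementation: a real version would use pg_query's query fingerprints as described above, while this sketch just matches on function names, and the function list and names here are my own assumptions.

```python
# Illustrative sketch: route SELECT statements that have write side effects
# to the primary. A real implementation (e.g. in PgDog) would likely match
# pg_query::fingerprint values instead of scanning for function names.
import re

# SELECT-able functions that mutate server state and must run on the primary.
# This list is an assumption for the example, not an exhaustive whitelist.
WRITE_FUNCTIONS = {
    "pg_advisory_lock",
    "pg_advisory_unlock",
    "pg_advisory_unlock_all",
    "pg_try_advisory_lock",
    "nextval",
    "setval",
}

FUNC_CALL = re.compile(r"\b([a-z_][a-z0-9_]*)\s*\(", re.IGNORECASE)

def route(query: str) -> str:
    """Return 'primary' or 'replica' for a given SQL statement."""
    if not query.lstrip().lower().startswith("select"):
        return "primary"  # non-SELECT statements always go to the primary
    called = {m.group(1).lower() for m in FUNC_CALL.finditer(query)}
    if called & WRITE_FUNCTIONS:
        return "primary"  # SELECT with side effects
    return "replica"
```

With this sketch, `SELECT pg_advisory_lock($1)` would be routed to the primary rather than a read replica.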

@denysvitali denysvitali linked a pull request May 12, 2025 that will close this issue
@denysvitali
Contributor Author

Despite #181, it still seems to be broken:

[Nest] 20  - 05/23/2025, 9:23:15 PM     LOG [Api:EventRepository] Initialized websocket server
Query failed : {
  durationMs: 33.040015999999014,
  error: PostgresError: protocol is out of sync
      at ErrorResponse (/usr/src/app/node_modules/postgres/cjs/src/connection.js:790:26)
      at handle (/usr/src/app/node_modules/postgres/cjs/src/connection.js:476:6)
      at Socket.data (/usr/src/app/node_modules/postgres/cjs/src/connection.js:315:9)
      at Socket.emit (node:events:518:28)
      at addChunk (node:internal/streams/readable:561:12)
      at readableAddChunkPushByteMode (node:internal/streams/readable:512:3)
      at Readable.push (node:internal/streams/readable:392:5)
      at TCP.onStreamRead (node:internal/stream_base_commons:189:23)
      at TCP.callbackTrampoline (node:internal/async_hooks:130:17) {
    severity_local: 'ERROR',
    severity: 'ERROR',
    code: '58000'
  },
  sql: 'SELECT pg_advisory_lock($1)',
  params: [ 200 ]
}
Query failed : {
  durationMs: 28.966197000001557,
  error: Error: write CONNECTION_CLOSED immich-pgdog:5432
      at Socket.closed (/usr/src/app/node_modules/postgres/cjs/src/connection.js:440:57)
      at Socket.emit (node:events:518:28)
      at TCP.<anonymous> (node:net:351:12)
      at TCP.callbackTrampoline (node:internal/async_hooks:130:17) {
    code: 'CONNECTION_CLOSED',
    errno: 'CONNECTION_CLOSED',
    address: [ 'immich-pgdog' ],
    port: [ 5432 ]
  },
  sql: 'SELECT pg_advisory_unlock($1)',
  params: [ 200 ]
}
Error: write CONNECTION_CLOSED immich-pgdog:5432
    at Socket.closed (/usr/src/app/node_modules/postgres/cjs/src/connection.js:440:57)
    at Socket.emit (node:events:518:28)
    at TCP.<anonymous> (node:net:351:12)
    at TCP.callbackTrampoline (node:internal/async_hooks:130:17) {
  code: 'CONNECTION_CLOSED',
  errno: 'CONNECTION_CLOSED',
  address: [ 'immich-pgdog' ],
  port: [ 5432 ]
}
api worker exited with code 1

pgdog reports:

2025-05-23T21:24:18.808778Z TRACE 😳
2025-05-23T21:24:18.808856Z ERROR ProtocolOutOfSync got: T, extended buffer: ProtocolState { queue: [], names: [], simulated: [], extended: true, out_of_sync: false }
2025-05-23T21:24:18.809060Z TRACE 😳
2025-05-23T21:24:18.809076Z ERROR client disconnected with error [10.42.4.98:34114]: protocol is out of sync
2025-05-23T21:24:18.809329Z DEBUG [cleanup] draining data from "receiving data" server [immich-rw:5432, immich]
2025-05-23T21:24:18.809356Z TRACE >>> Sync(Sync) [immich-rw:5432, immich]
2025-05-23T21:24:18.810084Z TRACE <<< ReadyForQuery { status: 'I' } [immich-rw:5432, immich]
2025-05-23T21:24:18.810123Z DEBUG [cleanup] RESET ALL,SELECT pg_advisory_unlock_all(), server in "idle" state [immich-rw:5432, immich]
2025-05-23T21:24:18.810135Z TRACE >>> Query(Query { query: "RESET ALL" }) [immich-rw:5432, immich]
2025-05-23T21:24:18.810145Z TRACE >>> Query(Query { query: "SELECT pg_advisory_unlock_all()" }) [immich-rw:5432, immich]
2025-05-23T21:24:18.811476Z TRACE <<< CommandComplete { command: "RESET" } [immich-rw:5432, immich]
2025-05-23T21:24:18.811527Z TRACE <<< ReadyForQuery { status: 'I' } [immich-rw:5432, immich]
2025-05-23T21:24:18.811537Z TRACE <<< RowDescription { fields: [Field { name: "pg_advisory_unlock_all", table_oid: 0, column: 0, type_oid: 2278, type_size: 4, type_modifier: -1, format: 0 }] } [immich-rw:5432,
2025-05-23T21:24:18.811552Z TRACE <<< DataRow { columns: [Data { data: b"", is_null: false }] } [immich-rw:5432, immich]
2025-05-23T21:24:18.811563Z TRACE <<< CommandComplete { command: "SELECT 1" } [immich-rw:5432, immich]
2025-05-23T21:24:18.811574Z TRACE <<< ReadyForQuery { status: 'I' } [immich-rw:5432, immich]
2025-05-23T21:24:23.031413Z DEBUG calculated averages [immich-rw:5432, immich]
2025-05-23T21:24:23.031440Z DEBUG calculated averages [immich-ro:5432, immich]
2025-05-23T21:24:28.020243Z DEBUG running healthcheck ";" [immich-rw:5432, immich]
2025-05-23T21:24:28.020306Z TRACE >>> Query(Query { query: ";" }) [immich-rw:5432, immich]
2025-05-23T21:24:28.022079Z TRACE <<< EmptyQueryResponse [immich-rw:5432, immich]
2025-05-23T21:24:28.022138Z TRACE <<< ReadyForQuery { status: 'I' } [immich-rw:5432, immich]
2025-05-23T21:24:28.022160Z DEBUG [cleanup] no cleanup needed, server in "idle" state [immich-rw:5432, immich]
2025-05-23T21:24:28.022227Z DEBUG running healthcheck ";" [immich-ro:5432, immich]
2025-05-23T21:24:28.022249Z TRACE >>> Query(Query { query: ";" }) [immich-ro:5432, immich]
2025-05-23T21:24:28.023844Z TRACE <<< EmptyQueryResponse [immich-ro:5432, immich]
2025-05-23T21:24:28.023894Z TRACE <<< ReadyForQuery { status: 'I' } [immich-ro:5432, immich]
2025-05-23T21:24:28.023913Z DEBUG [cleanup] no cleanup needed, server in "idle" state [immich-ro:5432, immich]
2025-05-23T21:24:38.032897Z DEBUG calculated averages [immich-ro:5432, immich]
2025-05-23T21:24:38.032920Z DEBUG calculated averages [immich-rw:5432, immich]
2025-05-23T21:24:53.033481Z DEBUG calculated averages [immich-rw:5432, immich]
2025-05-23T21:24:53.033508Z DEBUG calculated averages [immich-ro:5432, immich]
2025-05-23T21:24:58.019808Z DEBUG running healthcheck ";" [immich-rw:5432, immich]
2025-05-23T21:24:58.019853Z TRACE >>> Query(Query { query: ";" }) [immich-rw:5432, immich]
2025-05-23T21:24:58.020465Z TRACE <<< EmptyQueryResponse [immich-rw:5432, immich]
2025-05-23T21:24:58.020493Z TRACE <<< ReadyForQuery { status: 'I' } [immich-rw:5432, immich]
2025-05-23T21:24:58.020507Z DEBUG [cleanup] no cleanup needed, server in "idle" state [immich-rw:5432, immich]
2025-05-23T21:24:58.021617Z DEBUG running healthcheck ";" [immich-ro:5432, immich]
2025-05-23T21:24:58.021643Z TRACE >>> Query(Query { query: ";" }) [immich-ro:5432, immich]
2025-05-23T21:24:58.023539Z TRACE <<< EmptyQueryResponse [immich-ro:5432, immich]
2025-05-23T21:24:58.023561Z TRACE <<< ReadyForQuery { status: 'I' } [immich-ro:5432, immich]
2025-05-23T21:24:58.023573Z DEBUG [cleanup] no cleanup needed, server in "idle" state [immich-ro:5432, immich]
2025-05-23T21:25:08.034778Z DEBUG calculated averages [immich-ro:5432, immich]
2025-05-23T21:25:08.034810Z DEBUG calculated averages [immich-rw:5432, immich]

@levkk
Collaborator

levkk commented May 23, 2025

Could you show the lines from the log right before the error as well? That's most likely where the issue is. Thanks!

@levkk levkk mentioned this issue May 24, 2025
@denysvitali
Contributor Author

Full log attached. Note that I'm running 3 replicas of pgdog.

immich-immich-pgdog-776f45559f-2vts4-pgdog.log

@levkk levkk mentioned this issue Jun 1, 2025
@levkk
Collaborator

levkk commented Jun 1, 2025

Awesome, thank you. The fix should be in #199. A couple of follow-up questions, just to confirm that I caught the right bug:

  • Are you using session mode? I only see you connecting to a server once, but you ran several queries. Just a heads up: in session mode, load balancing between replicas is not very effective, because your client stays pinned to the same server until it disconnects.
  • Did you disable prepared statements (prepared_statements = "disabled")? The Parse/Describe messages in the buffer that caused the out-of-sync error were anonymous, but they should have been rewritten to named statements.

I think the combination of these two settings surfaced a bug in PgDog's server-side implementation of the extended protocol. I'd still like to confirm that the diagnosis is correct.

Cheers!

@denysvitali
Contributor Author

Yes, the following are my settings (which then get converted to their TOML equivalents):

pgdog:
  replicaCount: 3
  workers: 4
  defaultPoolSize: 5
  minPoolSize: 3
  passthroughAuth: disabled
  queryTimeout: 30000
  connectTimeout: 30000
  checkoutTimeout: 5000
  logLevel: trace
  poolerMode: session
  banTimeout: 50 # 50ms
  preparedStatements: disabled

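For reference, the settings above would roughly correspond to a pgdog.toml like the following. The key names here are my assumption based on a camelCase-to-snake_case mapping; only `prepared_statements = "disabled"` is confirmed in the thread above, so check the actual PgDog configuration reference before relying on this.

```toml
# Assumed pgdog.toml equivalent of the YAML above (key names are a guess).
[general]
workers = 4
default_pool_size = 5
min_pool_size = 3
passthrough_auth = "disabled"
query_timeout = 30000
connect_timeout = 30000
checkout_timeout = 5000
pooler_mode = "session"
ban_timeout = 50 # 50ms
prepared_statements = "disabled"
```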