This is especially likely on upgrade paths that take strong locks, as in citus--11.2-2--11.3-1.sql.
An example of how this upgrade path can cause a deadlock with transaction recovery:
1. RecoverTwoPhaseCommits() acquires RowExclusiveLock on pg_dist_transaction via RecoverWorkerTransactions() and doesn't release the lock until the end of the transaction, since it closes the table with NoLock.
2. The upgrade script acquires AccessExclusiveLock on pg_dist_authinfo because of the following DDL:
"ALTER TABLE pg_catalog.pg_dist_authinfo REPLICA IDENTITY USING INDEX pg_dist_authinfo_identification_index;"
3. RecoverTwoPhaseCommits() continues with the next worker via RecoverWorkerTransactions(), which implicitly tries to acquire AccessShareLock on pg_dist_authinfo via GetNodeConnection().
4. The upgrade script tries to acquire AccessExclusiveLock on pg_dist_transaction because of the following DDL:
"ALTER TABLE pg_catalog.pg_dist_transaction REPLICA IDENTITY USING INDEX pg_dist_transaction_unique_constraint;"
Now the process executing "ALTER EXTENSION citus UPDATE" is waiting to acquire AccessExclusiveLock on pg_dist_transaction while holding AccessExclusiveLock on pg_dist_authinfo, and the maintenance daemon is waiting to acquire AccessShareLock on pg_dist_authinfo while holding RowExclusiveLock on pg_dist_transaction.
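While the two backends are stuck (i.e. before the deadlock detector fires), the wait cycle can be observed with a query like the one below. This is just a diagnostic sketch; it uses only standard Postgres catalogs and functions (pg_locks, pg_stat_activity, pg_blocking_pids()), nothing from the Citus codebase:

-- Show who holds and who waits on locks for the two catalog tables
-- involved in the cycle, plus which pid each backend is blocked by.
SELECT a.pid,
       a.query,
       l.mode,
       l.relation::regclass AS relation,
       l.granted,
       pg_blocking_pids(a.pid) AS blocked_by
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation IN ('pg_catalog.pg_dist_transaction'::regclass,
                     'pg_catalog.pg_dist_authinfo'::regclass)
ORDER BY a.pid;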
As a result, Postgres cancels one of the processes involved to resolve the deadlock.
Luckily, this doesn't leave the database in a bad state, because Postgres runs the upgrade script within an implicit transaction block, so the failed upgrade rolls back cleanly and a retry usually succeeds.
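For example, after a cancelled upgrade the extension is still at its old version and the upgrade can simply be re-run; a quick sanity check via the standard pg_extension catalog (again, nothing Citus-specific):

-- The rollback leaves the extension at the old version; verify, then retry.
SELECT extversion FROM pg_extension WHERE extname = 'citus';
ALTER EXTENSION citus UPDATE;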
But rather than retrying, until we properly fix this issue, disabling 2PC recovery during the Citus upgrade seems like a more reliable workaround, as in:
ALTER SYSTEM SET citus.recover_2pc_interval TO -1;
SELECT pg_reload_conf();
ALTER EXTENSION citus UPDATE;
ALTER SYSTEM RESET citus.recover_2pc_interval;
SELECT pg_reload_conf();
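One caveat worth noting: while recovery is disabled, any dangling prepared transactions simply sit until citus.recover_2pc_interval is reset. If you want to confirm nothing accumulated during the window, the standard pg_prepared_xacts view can be checked (plain Postgres, not a Citus API):

-- Prepared transactions left pending while 2PC recovery was disabled;
-- these get picked up again once recovery is re-enabled.
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;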