Update to fix pg_rewind can't find postgresql.conf file in data directory via the librepmgr.sh script #53226

Open: 1 commit proposed for merge into base branch main


xtianus79

@xtianus79 xtianus79 commented Nov 27, 2023

This fixes the error where pg_rewind cannot find the postgresql.conf file in the PostgreSQL data directory. The file actually present there is postgresql.auto.conf.

Description of the change

The change passes the file name that actually exists in the PostgreSQL data directory. By default the expected file is postgresql.conf, but the file actually present is postgresql.auto.conf.

Benefits

Prevents the error seen when pg_rewind cannot find the PostgreSQL configuration file:

postgresql-repmgr 04:00:49.64 DEBUG ==> Schema repmgr.repmgr exists!
postgresql-repmgr 04:00:49.65 INFO  ==> Rejoining node...
postgresql-repmgr 04:00:49.65 INFO  ==> Using pg_rewind to primary node...
postgresql-repmgr 04:00:49.65 INFO  ==> Running pg_rewind data to primary node...
pg_rewind: executing "/opt/bitnami/postgresql/bin/postgres" for target server to complete crash recovery
postgres: could not access the server configuration file "/bitnami/postgresql/data/postgresql.conf": No such file or directory
pg_rewind: error: postgres single-user mode in target cluster failed
pg_rewind: detail: Command was: /opt/bitnami/postgresql/bin/postgres --single -F -D /bitnami/postgresql/data template1 < /dev/null
postgresql-repmgr 04:00:49.72 WARN  ==> pg_rewind failed, resorting to data cloning
postgresql-repmgr 04:00:49.72 INFO  ==> Cloning data from primary node...
WARNING: following problems with command line parameters detected:
  -D/--pgdata will be ignored if a repmgr configuration file is provided
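
The log above shows postgres looking for postgresql.conf directly inside the data directory. As a rough way to confirm which files are actually present, something like the following could be run inside the container. This is only a sketch: `list_conf_files` is a hypothetical helper and is not part of librepmgr.sh.

```shell
# Hypothetical helper (not part of librepmgr.sh): report which of the two
# configuration files exist directly inside a given data directory.
list_conf_files() {
    local data_dir="$1" name
    for name in postgresql.conf postgresql.auto.conf; do
        # ${data_dir%/} drops a single trailing slash, if there is one
        if [ -f "${data_dir%/}/${name}" ]; then
            echo "${name}: present"
        else
            echo "${name}: missing"
        fi
    done
}

# Example call, using the data directory from the log above:
#   list_conf_files "/bitnami/postgresql/data"
```

If postgresql.conf is reported as missing, that matches the "could not access the server configuration file" line in the log.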

The issue seems to come from this code in the repmgr script librepmgr.sh:

repmgr_pgrewind() {
    info "Running pg_rewind data to primary node..."
    local -r flags=("-D" "$POSTGRESQL_DATA_DIR" "--source-server" "host=${REPMGR_CURRENT_PRIMARY_HOST} port=${REPMGR_CURRENT_PRIMARY_PORT} user=${REPMGR_USERNAME} dbname=${REPMGR_DATABASE}")

    if [[ "$REPMGR_USE_PASSFILE" = "true" ]]; then
        PGPASSFILE="$REPMGR_PASSFILE_PATH" debug_execute "${POSTGRESQL_BIN_DIR}/pg_rewind" "${flags[@]}"
    else
        PGPASSWORD="$REPMGR_PASSWORD" debug_execute "${POSTGRESQL_BIN_DIR}/pg_rewind" "${flags[@]}"
    fi
}

Related issues:
bitnami/charts#20998
#52213
bitnami/charts#8933
And probably many more issues where users rely on pg_rewind.

Possible drawbacks

None from my perspective; without the change, pg_rewind will simply continue not to work.

Applicable issues

When a node fails and then tries to come back as primary, it does not know the repmgr_slot_nodeName and errors continuously. The returning node is never put on the right timeline and does not re-enter the HA setup correctly.

Additional information

Docs for pg_rewind:

--config-file=filename
Use the specified main server configuration file for the target cluster. This affects pg_rewind when it uses internally the postgres command for the rewind operation on this cluster (when retrieving restore_command with the option -c/--restore-target-wal and when forcing a completion of crash recovery).
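
To illustrate how the option quoted above could fit next to the existing flags, here is a minimal sketch. `pg_rewind_flags` is a hypothetical helper, and which file should be passed as the main server configuration file is an assumption that this thread does not settle. The function only prints the arguments; it does not run pg_rewind.

```shell
# Hypothetical helper: print pg_rewind arguments, one per line, with
# --config-file added to the flags used in repmgr_pgrewind.
# conf_file is an assumption: pass whichever file is the real main
# server configuration file for the target cluster.
pg_rewind_flags() {
    local data_dir="$1" conf_file="$2" conninfo="$3"
    printf '%s\n' \
        "-D" "$data_dir" \
        "--config-file=${conf_file}" \
        "--source-server" "$conninfo"
}

# In bash, the output could be read back into an array, for example:
#   mapfile -t flags < <(pg_rewind_flags "$POSTGRESQL_DATA_DIR" "$conf_file" "$conninfo")
```

Keeping the connection string as a single argument matters because it contains spaces, just as in the original flags array.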

This fixes the issues of the not found postgresql.conf file in the pg data directory. The actual file name there is postgresql.auto.conf. 

Signed-off-by: xtianus79 <[email protected]>
github-actions bot added the triage (Triage is needed) label Nov 27, 2023
carrodher added the verify (Execute verification workflow for these changes) label Nov 28, 2023
github-actions bot added the in-progress label and removed the triage (Triage is needed) label Nov 28, 2023
bitnami-bot removed the request for review from javsalgar November 28, 2023 15:32
@xtianus79
Author

Hi @dgomezleon, how are you? Did you get a chance to look at the change request?

@dgomezleon
Member

Hi @xtianus79,

Have you tested this change? I gave it a try and obtained this:

postgresql-repmgr 11:18:11.53 INFO  ==> Running pg_rewind data to primary node...
pg_rewind: executing "/opt/bitnami/postgresql/bin/postgres" for target server to complete crash recovery
postgres: could not access the server configuration file "/postgresql.auto.conf": No such file or directory
pg_rewind: error: postgres single-user mode in target cluster failed
pg_rewind: detail: Command was: /opt/bitnami/postgresql/bin/postgres --single -F -D /bitnami/postgresql/data -c config_file=postgresql.auto.conf template1 < /dev/null

So it seems the full path is necessary. Also, the change should be applied to all the supported branches.
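
The path in the error, "/postgresql.auto.conf", suggests that the relative name was resolved against the working directory of the process rather than against the data directory. That is an assumption, since nothing in this thread confirms it. One way to always hand over a full path could look like the sketch below, where `resolve_conf_path` is a hypothetical helper:

```shell
# Hypothetical helper: return an absolute path for a configuration file.
# Absolute paths are returned unchanged; relative names are resolved
# against the data directory instead of the current working directory.
resolve_conf_path() {
    local data_dir="$1" conf="$2"
    case "$conf" in
        /*) printf '%s\n' "$conf" ;;
        *)  printf '%s\n' "${data_dir%/}/${conf}" ;;
    esac
}

# Example: resolve_conf_path "/bitnami/postgresql/data" "postgresql.auto.conf"
```

The resulting value could then be passed to the config file option, so that the location no longer depends on where the command is started from.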

@xtianus79
Author

@dgomezleon, can you suggest the proper change? I was not able to test the change, but I assume you are correct. The auto file is the one that is actually there.

@dgomezleon
Member

After applying some changes to this PR, we got past that error. However, we found other errors, so we have created a task to properly review the issue.

@xtianus79
Author

xtianus79 commented Dec 9, 2023

Thanks @dgomezleon, that's great. Do you know how long it might take to patch all the errors? For now I am comfortable with where I am, and I can plan a future update down the road. A rough time range is sufficient.

@dgomezleon
Member

Hi @xtianus79 ,

After looking over our priorities and due to other issues/initiatives we are working on, the task is still in our backlog list, so I can't give you an ETA.

Please feel free to update/test this PR if you are interested.

We will update the case as soon as we have any news.

dgomezleon added the on-hold (Issues or Pull Requests with this label will never be considered stale) label and removed the in-progress label Dec 11, 2023
@kwenzh
Contributor

kwenzh commented Aug 6, 2024

Excuse me, is there any progress?
I encountered the same issue and am unable to use REPMGR_USE_PGREWIND normally.

Labels
on-hold (Issues or Pull Requests with this label will never be considered stale), postgresql-repmgr, verify (Execute verification workflow for these changes)
5 participants