Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CherryPick Fix the version argument to ArgumentParser is deprecated etc #875

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

yjhjstz
Copy link
Member

@yjhjstz yjhjstz commented Jan 16, 2025

Fixes #873

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


huansong and others added 2 commits January 17, 2025 07:45
There is nothing wrong with doing that for the tests themselves, but it has an
implication to a bug #16120 where restarting w/o checkpoint can create orphaned
files and results in the failure of test gp_check_fiels. While that issue is
dealt with separately, for now turn the restart mode to fast to avoid the issue.

We ignored isolation2 tests for now because gp_check_files are not covering them
(and it would be really hard to cover, given so many intentional crash/panic or
other ad-hoc scenarios that we are doing in isolation2 could create orphaned
files... I don't see us do that anytime soon).
The two methods don't cooperate, so set_config_option("search_path",
...) has been ineffective under non-empty overrideStack.  This defect
enabled an attacker having database-level CREATE privilege to execute
arbitrary code as the bootstrap superuser.  While that particular attack
requires v13+ for the trusted extension attribute, other attacks are
feasible in all supported versions.

Standardize on the combination of NewGUCNestLevel() and
set_config_option("search_path", ...).  It is newer than
PushOverrideSearchPath(), more-prevalent, and has no known
disadvantages.  The "override" mechanism remains for now, for
compatibility with out-of-tree code.  Users should update such code,
which likely suffers from the same sort of vulnerability closed here.
Back-patch to v11 (all supported versions).

Alexander Lakhin.  Reported by Alexander Lakhin.

Security: CVE-2023-2454

(cherry-picked from 78119a0bf98edb13e7f79c6402378376cc6b6ca4 with conflicts pg:REL_12_STABLE)
@my-ship-it my-ship-it added the cherry-pick cherry-pick upstream commts label Jan 17, 2025
dh-cloud and others added 12 commits January 17, 2025 08:47
When optimizer is off.

For qual (NOT(EXISTS_SUBLINK) and EXPR_SUBLINK)
  or qual (NOT(ANY_SUBLINK) AND EXPR_SUBLINK)

In pull_up_sublinks_qual(), after NOT(EXISTS_SUBLINK) is converted to
Anti-JOIN (or NOT(ANY_SUBLINK) => LASJ_NOTIN-JOIN), if EXPR_SUBLINK is
also successfully converted to INNER-JOIN (see convert_EXPR_to_join)
, we can encounter the scenario where qual is postponed past the anti
or lasj_notin join. It will trigger assertion failure:

  Assert(j->jointype == JOIN_INNER || j->jointype == JOIN_SEMI);

> FATAL:  Unexpected internal error (assert.c:48)
> DETAIL:  FailedAssertion("!(j->jointype == JOIN_INNER || j->jointype == JOIN_SEMI)", File: "initsplan.c", Line: 1018)
> server closed the connection unexpectedly
> ...

We can easily reproduce this issue in master and 6X branch:

```
CREATE TABLE A(id int, tt timestamp)
  DISTRIBUTED BY (id);

SET optimizer TO off;

-- j->jointype == JOIN_LASJ_NOTIN
SELECT
  t1.*
FROM
  A t1
WHERE
  t1.id NOT IN (SELECT id FROM A WHERE id != 4)
  AND
  t1.tt = (SELECT MIN(tt) FROM A WHERE id = t1.id);

-- j->jointype == JOIN_ANTI
SELECT
  t1.*
FROM
  A t1
WHERE
  NOT EXISTS (SELECT id FROM A WHERE id != 4 and id = t1.id)
  AND t1.tt = (SELECT MIN(tt) FROM A WHERE id = t1.id);
```

Add these two jointype into the Assert(). I made several simple
tests, results are ok. However, same as the code comment said, i
am also unsure if postponing quals past anti-join/lasj_notin-join
is always semantically correct.
This commit fixes test, which was added at commit 50e66c76acf19d09ac7f0fc05cca01e014bb0ab8
```
begin;
declare c1 cursor for select count(*) from t_15278 group by sqrt(a);
abort;
select pg_cancel_backend(pg_backend_pid());
```
Above old version test checks that after some errors generated on segment,
InterruptHoldoffCount should be reset to 0 on QD.
However, this is flaky:
QD does not wait result of declare cursor, If error on segments occurs after declare finished,
This error is processed when abort is called.
At some cases abort is called before error is generated on segment.
It led to situation, when there is not any errors on segments and abort
is executed with success result. This test requires error to check
InterruptHoldoffCount after query which ended with error.

New version of the test uses existing inject fault, which generate error on segment at create table.
create table ends with error and InterruptHoldoffCount can be checked correctly.

Co-authored-by: Vasiliy Ivanov <[email protected]>
1. Updated code to use get_stderr() instead of get_result().stderr
2. Added test case for retry count.
The error(NOTICE) meessage is too long. Move the details into errdetail().

fix partition
We need to call gp_malloc|free() for the newly added MemoryContext types
instead of malloc|free(), so that we can get the VMEM reservation etc.

This was missed during the 12 merge.

Reviewed-by: Brent Doil <[email protected]>
Issue:
Some utilities (gprecoverseg, gpstop, etc...) use a lock directory to prevent customers from running
multiple instances of the program simultaneously. The lock directory is created inside the
COORDINATOR_DATA_DIRECTORY at the start of program execution and should be removed when the execution completes.
However, the lock directory is not always removed if the user abruptly kills the program
during execution. Consequently, the next run of the same utility encounters an error, stating
that the lock directory already exists and asks the user to manually delete it, even if
the process holding the lock is already terminated.

Solution:
To address this issue, this commit introduces a modification in mainUtils.py.
When attempting to acquire the lock, the code now checks if the process ID in the PID file,
present in the lock directory, still exists. If the process has been killed, the script
automatically removes the lock directory and proceeds to acquire a new lock for the current run.
This ensures that the utility can continue its execution without being hindered by
orphaned lock files from previous runs.
map_old_guc_names facility exist to have alias for old GUC names to
new ones. Hence best to use that facility instead of hand roll
gp_session_role as alias to gp_role.

Doing this also eliminates having the problem fixed via commit
8d52dad.
If gp_role value is not specified then internally "utility" is set as
default. This is mainly for single node coordinator or
segments. Changing that setting to use PGC_S_DYNAMIC_DEFAULT instead
of PGC_S_OVERRIDE.

The problem with using PGC_S_OVERRIDE is GUC value can't be set
later. Since commit 260710, we rely on assign_gp_role() hook to set
should_reject_connection. If PGC_S_OVERRIDE is used to
SetConfigOption() while setting the default in PostmasterMain(), then
later setting gp_role via PGOPTIONS or any other mechanism will not
call the assign hook and hence misses to set
should_reject_connection. The reason for change being safe is
check_gp_role() only allows "utility" mode to set as value for this
GUC and any other values like "execute" or "dispatch" are already
banned from PGOPTIONS or postgresql.conf followed by reload.

This problem was exposed by src/test/gpdb_pitr/test_gpdb_pitr.sh tests
which gets fixed now.

Reviewed-by: Marbin Tan <[email protected]>
The benefit from this optimization idea is plausible (basically, avoiding one
memcpy of the memtuple), but the changes incurred by this are just too
backwards and invasive for the effort to be worthy:

1. There is no memtuple to directly serialize from. The `PRIVATE_tts_memtuple`
field in `TupleTableSlot` is gone after the FIXME was added (d20e071e879).
Basically we do not store the memtuple outside of the AM layer anymore (like in
the executor memory). I think bringing `PRIVATE_tts_memtuple` back to
`TupleTableSlot` and utilize it everywhere is a clear backward change.

2. Assuming we somehow pass the memtuple into `SerializeTuple` as a function
argument. But on the deserialization side, we would still need to deserialize
it back to minimal tuple because all the deserilization logic and else utilizes
minimal tuple now. The only way to avoid that is to carry memtuple around and
use it, which is basically same as bringing `PRIVATE_tts_memtuple` back to
`TupleTableSlot`. We also might have to bring back the concept of
`GenericTuple`, and revert a lot of the changes that we have made to bring GPDB
aligned with the AM framework which is a big milestone in 7X. Not very
worthwhile IMO.
add some resource group test results into .gitignore
@yjhjstz yjhjstz marked this pull request as ready for review January 17, 2025 02:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick cherry-pick upstream commts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] PG optimizer will core in LASJ_NOTIN case