ADBDEV-6156 Count startup memory of each process when using resource groups #1023
base: adb-6.x-dev
Conversation
Failed job Deploy multiarch Dockerimages: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1807317
Allure report https://allure.adsw.io/launch/78383
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1807327
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1807328
Allure report https://allure.adsw.io/launch/78465
Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1817041
Failed job Regression tests with Postgres on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1817039
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1817048
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1818658
Failed job Resource group isolation tests on ppc64le: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1818659
Can you write some tests to check?
Failed job Build for x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1850837
DROP

-- start_ignore
! gpstop -rai;
Maybe add
! gpconfig -r gp_resource_manager;
before this line?
This is done in the disable_resgroup test
Failed job Build for x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1880757
Failed job Build for x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1890216
Allure report https://allure.adsw.io/launch/79897
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1891649
Allure report https://allure.adsw.io/launch/80245
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1911167
Failed job Build ubuntu22 for x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1924163
Failed job Build for x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1924162
Allure report https://allure.adsw.io/launch/80289
Failed job Resource group isolation tests on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1926814
Allure report https://allure.adsw.io/launch/80301
Maybe add
Yes, this is pretty strange, so I would agree with you. However, I'm not sure about ignoring .sql files since they are actually tracked, but some are not, as far as I remember
Allure report https://allure.adsw.io/launch/83966
Tests which have input files in
@bandetto I know that, but there are some tests which are generated from .source file but the respective .sql is still tracked. Though I agree with you that if a test is generated then .sql should be ignored |
In order to run tests, we need the old version of resGroupPalloc, however, to keep old tests intact, we need to add it under a new name, because the old logic of resGroupPalloc isn't compatible with the new behaviour
Allure report https://allure.adsw.io/launch/84568
Allure report https://allure.adsw.io/launch/84658
However, one should note that once a process is detached from a resource group (it finished query execution), resource groups memory manager doesn't see this memory again.
Can you please elaborate what this means?
We could've moved inactive processes' memory to freeChunks, but that would drain freeChunks pretty fast
Why?
Please explain these points, and why the added tests are needed, in the PR description (and maybe you can skip explaining some parts in the description, if they are not really related to the patch, and shorten it to describe only the key changes).
src/test/isolation2/input/resgroup/resgroup_startup_memory.source (outdated thread, resolved)
src/test/isolation2/output/resgroup/resgroup_startup_memory.source (outdated thread, resolved)
-- Start a session which will be detached from a group when the query is done
-- resource groups can't see startup chunks occupied by a detached session but
-- the vmem tracker can
0: SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE pid != (SELECT pg_backend_pid());
here and below:

-0: SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE pid != (SELECT pg_backend_pid());
+0: SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE pid != pg_backend_pid();
Do we really want to kill ALL backends except the current one?!
Yes, that's the point of the test, I don't want other processes to interfere. I can change it if you explain what problems it can cause
Allure report https://allure.adsw.io/launch/85888
Allure report https://allure.adsw.io/launch/85900
! gpconfig -c runaway_detector_activation_percent -v 20;
! gpstop -rai;

CREATE OR REPLACE FUNCTION resGroupPalloc(float) RETURNS int AS '/home/kds/gpdb/src/test/isolation2/../regress/regress.so', 'resGroupPalloc' LANGUAGE C READS SQL DATA;
Path remained the same (without generic variable), but the tests have passed?
Yes, I did it intentionally. The compilation path is defined during the configuration stage, so it differs between machines. Having such a statement outside the ignore block would cause this test to fail if your compilation path doesn't match the default path on CI. If this statement fails to complete, it's not a problem either, because the statements which use this function will indicate the problem.
What about using @abs_builddir@ and @DLSUFFIX@, which are used in input files, in the output files as well? They're replaced at runtime by pg_regress. The paths wouldn't have to be inside the ignore block that way.
Done
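For context, this is the pattern the reviewer is suggesting (a hedged sketch; the exact CREATE FUNCTION line in the PR may differ, but the placeholders themselves are standard pg_regress substitution variables):

```sql
-- input/resgroup/resgroup_startup_memory.source (illustrative)
CREATE OR REPLACE FUNCTION resGroupPalloc(float) RETURNS int
AS '@abs_builddir@/../regress/regress@DLSUFFIX@', 'resGroupPalloc'
LANGUAGE C READS SQL DATA;
```

When generating the .sql and expected .out files from .source files, pg_regress expands @abs_builddir@ to the test's build directory and @DLSUFFIX@ to the platform's shared-library suffix (e.g. .so), so the path matches on every machine and the statement doesn't need to sit inside an ignore block.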
Co-authored-by: Vladimir Sarmin <[email protected]>
Allure report https://allure.adsw.io/launch/85984
Allure report https://allure.adsw.io/launch/86597
Allure report https://allure.adsw.io/launch/86650
Count startup memory of each process when using resource groups
The goal of this patch is to make the resource groups manager track the startup
memory of each process, so that the redzone handler can estimate memory more
accurately. Some context is needed first.
Startup memory is accounted for correctly by the VMEM tracker, but not by the
resource groups manager, because on a process' startup this memory is not added
to self->memUsage. This happens because startup memory is counted at VMEM
tracker initialization, and the ways the VMEM tracker and resource groups count
memory are not consistent with each other.
This patch adds the startup memory consumption to self->memUsage so that
resource groups take this memory into account.
However, note that once a process is detached from a resource group (i.e. it
has finished query execution), the resource groups memory manager doesn't see
this memory again. This is a consequence of resource groups' design. We could
have moved inactive processes' memory to freeChunks, but that would drain
freeChunks pretty fast and cause new processes to fail: any process starts
without a resource group and acquires one later, so it must charge its startup
memory against freeChunks first. If freeChunks is already exhausted, a new
process fails to start. There are also other complexities in making the
resource groups manager fully account for startup memory. Thankfully, the VMEM
tracker calculates it correctly, so where the resource groups manager fails to
track this memory, we can still rely on the VMEM tracker. Solving this fully
is out of scope for this patch, because it's a design flaw rather than a bug.