test_timeline_size_quota_on_startup is flaky #6562

Open
jcsp opened this issue Feb 1, 2024 · 4 comments · May be fixed by #8255
Labels
c/compute Component: compute, excluding postgres itself

Comments

@jcsp
Contributor

jcsp commented Feb 1, 2024

Example failure: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-6471/7733161123/index.html#/testresult/abd05685d6e4707a

This test exercises both size enforcement and backpressure, with a size limit of 30MB and the backpressure settings at the test defaults (max_replication_write_lag of 15MB).

From the compute side, postgres is indeed writing >30MB:

PG:2024-01-31 22:44:12.975 GMT [244895] LOG:  checkpoint complete: wrote 95 buffers (74.2%); 0 WAL file(s) added, 0 removed, 3 recycled; write=0.001 s, sync=0.001 s, total=0.001 s; sync files=0, longest=0.000 s, average=0.000 s; distance=47663 kB, estimate=47663 kB; lsn=0/437A9D0, redo lsn=0/437A9D0
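For scale, here is a rough conversion of that log line. It assumes Postgres units (1 kB = 1024 bytes, 1 MB = 1024 kB) for both the log and the test's 30MB limit, and it treats the checkpoint `distance` (WAL generated since the previous checkpoint) as only a rough proxy for how much data the compute wrote. WAL volume and logical size are not the same thing.

```python
# Rough unit conversion for the checkpoint log line above.
# Assumptions: Postgres units (1 kB = 1024 bytes, 1 MB = 1024 kB), and the
# checkpoint "distance" used as an approximate proxy for data written.

SIZE_LIMIT_MB = 30              # size limit used by the test
CHECKPOINT_DISTANCE_KB = 47663  # "distance=47663 kB" from the log


def kb_to_mb(kb: float) -> float:
    """Convert Postgres kB to MB (1 MB = 1024 kB)."""
    return kb / 1024


written_mb = kb_to_mb(CHECKPOINT_DISTANCE_KB)
overshoot_mb = written_mb - SIZE_LIMIT_MB
print(f"WAL since previous checkpoint: {written_mb:.1f} MB")  # 46.5 MB
print(f"over the {SIZE_LIMIT_MB} MB limit by: {overshoot_mb:.1f} MB")  # 16.5 MB
```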

The test runs in about 1 second, so it relies on backpressure: left to chance, the pageserver is unlikely to ingest the data and update the logical size in time for the limit to take effect. My hunch is that something is going wrong with backpressure in the compute/safekeeper interaction.
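To illustrate why a fast test depends on backpressure, here is a deliberately simplified toy model. It is not Neon's implementation, and every name and parameter in it is hypothetical. The compute writes WAL in fixed chunks, a pageserver ingests at a fixed rate and reports the logical size it has seen, writes stop once the reported size reaches the limit, and backpressure stalls the compute whenever it is `max_lag_mb` or more ahead of ingest. It also assumes WAL bytes and logical size are the same quantity, which they are not in practice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    written_mb: float   # total WAL the compute generated before being stopped
    reported_mb: float  # logical size the pageserver had ingested and reported


def simulate(limit_mb: float, max_lag_mb: Optional[float],
             chunk_mb: float = 1.0, ingest_mb_per_tick: float = 0.25,
             max_ticks: int = 1000) -> Result:
    """Toy model (hypothetical) of size enforcement with optional backpressure.

    Each tick, the pageserver ingests a fixed amount of WAL, then the compute
    tries to write one chunk. Passing max_lag_mb=None disables backpressure.
    """
    written = 0.0   # WAL generated by the compute
    ingested = 0.0  # WAL ingested by the pageserver; doubles as reported size
    for _ in range(max_ticks):
        # The pageserver catches up, but can never get ahead of the compute.
        ingested = min(written, ingested + ingest_mb_per_tick)
        # Size enforcement only sees what the pageserver has reported so far.
        if ingested >= limit_mb:
            break
        # Backpressure: stall this tick if the compute is too far ahead.
        if max_lag_mb is not None and written - ingested >= max_lag_mb:
            continue
        written += chunk_mb
    return Result(written_mb=written, reported_mb=ingested)


print(simulate(30, None))  # Result(written_mb=120.0, reported_mb=30.0)
print(simulate(30, 15))    # Result(written_mb=45.0, reported_mb=30.0)
```

With these made-up parameters, the overshoot without backpressure depends entirely on how slowly the toy pageserver ingests, while with backpressure the compute stops at roughly the limit plus `max_lag_mb`.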

vadim2404 added the c/compute Component: compute, excluding postgres itself label Feb 1, 2024
@jcsp
Contributor Author

jcsp commented Feb 1, 2024

Thread discussing this test: https://neondb.slack.com/archives/C04KGFVUWUQ/p1706784289405799

@jcsp
Contributor Author

jcsp commented Feb 29, 2024

@lubennikovaav have you made any progress investigating this?

@koivunej
Contributor

My earlier thoughts: #6542 (comment)

jcsp linked a pull request Jul 3, 2024 that will close this issue
@jcsp
Contributor Author

jcsp commented Jul 3, 2024

I got tired of looking at this in the flaky test dashboard.

#8255

jcsp assigned jcsp and unassigned lubennikovaav Jul 3, 2024

4 participants