Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(postgres): Add seed feature to postgres #576

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

OverkillGuy
Copy link
Contributor

Ref #541, implement the feature for postgres

Fix mysql "seed" feature's docstring (whoops)

@OverkillGuy
Copy link
Contributor Author

This duplicated transfer_seed function from mysql makes me think it could be done via a more reusable "Mixin" class, or something, but that's up for discussion if the complexity overhead is worth the removal of duplication.

@OverkillGuy OverkillGuy marked this pull request as ready for review May 21, 2024 00:35
@alexanderankin alexanderankin changed the title Add seed feature to postgres fix: Add seed feature to postgres May 21, 2024
@alexanderankin alexanderankin added the community-feat feature but its a community module so we wont bump tc core for it label May 21, 2024
@alexanderankin alexanderankin changed the title fix: Add seed feature to postgres fix(postgres): Add seed feature to postgres May 21, 2024
@alexanderankin
Copy link
Collaborator

alexanderankin commented May 21, 2024 via email

@alexanderankin
Copy link
Collaborator

in fact if you can remove transfer_seed from the generic class that would be great, we are planning to remove that eventually. i dont know how the mysql tests ever passed, without even looking at any test executions - you have to do the following dance if you want to place files into the container before the entrypoint executes (and this would go into core if we ever make it generic enough, like the DockerContainer):

  1. loop
  2. sleep for half a second
  3. if some file exists, break
  4. go to loop
  5. execute the container image's original entrypoint (specific to the type of image the container is intended to be used with, for example: in mysql: docker image inspect mysql:8-oracle | jq '.[].Config.Entrypoint | .[]' -r prints docker-entrypoint.sh)

and then after you start the container, you can copy the files into it asynchronously from python, do whatever you have to do, and then create the file you are using an IPC signal in step 3 above.

you can see this implementation in the kafka container

@OverkillGuy
Copy link
Contributor Author

I'll do anything that makes this feature more maintainable. Just want to make sure I understand your message.

The implementation provided here + mysql is indeed "racing" between our container.put_archive() command vs the container.start()'s entrypoint hoping that we can transfer the files before the mysql/postgres entrypoint hasn't finished yet.

That's bad because unreliable, on principle, as depending on winning a race is a recipe for hard to pin down bugs. But I expect worked fine here (in practice) because the start-up time of these dbs is in the multiple seconds, plenty of time to copy across a folder, so we always "won that race", enough that the feature worked.

I do see that the proposed solution you posted is nicer, in that it avoids the race. Paraphrasing, it seems to override the entrypoint to replace it with our own waiter script, giving time for put_archive to complete, blocking the script until some "sentinel" files are created, marking time to continue by executing the original entrypoint.

I like that solution, and I think I should be able to port this over to here, maybe even in a way that can be moved to the core class? Will investigate.

@OverkillGuy
Copy link
Contributor Author

And yes remove the transfer_seed and DbContainer "bits" too

@alexanderankin
Copy link
Collaborator

I think we are on the same page.

here is the specific code i was referring to:

def start(self, timeout=30) -> "KafkaContainer":
script = KafkaContainer.TC_START_SCRIPT
command = f'sh -c "while [ ! -f {script} ]; do sleep 0.1; done; sh {script}"'
self.with_command(command)
super().start()

@OverkillGuy
Copy link
Contributor Author

Okay I think I have an experimental version of the wait-for-sentinel system for mysql, to polish as PR soon, then use in postgres too.

Logic is mostly now in DbContainer class, nothing prevents moving it up to DockerContainer in general, but conceptual: what would the folder-transfer feature even mean for DockerContainer, when for DbContainer it's of interest to "mount" a folder.

One thing to note on the reference implementation: the Kafka solution linked is reusing the TC_START_SCRIPT path as both sentinel file to wait for presence of (regardless of content, if it exists we're done), and the content of that file is also original command, to re-launch after waiting via exec.
But there's no particular reason to use that file path both as sentinel and as command to execute afterwards.

Instead I've locally got the inspect_image() part of the docker-py low-level API, to get to the original image command, and set that command from before.
The most challenging part was to find a way to not bork the init, after change of start command = redirection of PID 1. I have to source the original entrypoint by hand, and relaunch original command. This is a little image-specific and may need a second pair of eyes to get generic enough.

So, yeah, I'll send it soon, but I thought some of these points were worth discussing now, not having to wait for the PR itself.

Jb DOYON added 3 commits June 2, 2024 01:31
Issue currently is the tests for postgres fail:
The container exits on postgres (not mysql, for identical both
entrypoint and command...) with error:

    sh: 8: source: not found
    sh: 8: _main: not found

I'm still tracking it down, because it's weird that /bin/sh says
"source" does not exist (but not with mysql) but the _main should be
available from the sourced entrypoint script.
@OverkillGuy
Copy link
Contributor Author

Okay, just sent back the rewritten solution per above thread, using (for now) a DbContainer-level command override, like Kafka does, and with per-image seed mount location, and startup command prefix.

It works fine for mysql, but the postgres image one exits (see the last commit's message). Seems that for mysql/postgres, the actual command (mysqld and postgres) needs the entrypoint script to be basically PID 1 or so, which is what the self.startup_command trick is for: source the original script, and run that main to set envvars properly.

I'm fully open to moving this dbcontainer-level thing up to DockerContainer, just not sure how that would conceptually fit.

Sending this for a look over, while I try to understand the exact issue around postgres entrypoint causing exits, to be fixed.

Copy link

codecov bot commented Jul 7, 2024

Codecov Report

Attention: Patch coverage is 0% with 35 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@0ce4fec). Learn more about missing BASE report.

Files Patch % Lines
core/testcontainers/core/generic.py 0.00% 35 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #576   +/-   ##
=======================================
  Coverage        ?   72.75%           
=======================================
  Files           ?       11           
  Lines           ?      613           
  Branches        ?       87           
=======================================
  Hits            ?      446           
  Misses          ?      141           
  Partials        ?       26           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community-feat feature but its a community module so we wont bump tc core for it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants