fix(postgres): Add seed feature to postgres #576

OverkillGuy · 2024-05-21T00:16:13Z

Ref #541, implement the feature for postgres

Fix mysql "seed" feature's docstring (whoops)

OverkillGuy · 2024-05-21T00:17:23Z

This transfer_seed function, duplicated from mysql, makes me think it could be done via a more reusable "Mixin" class or something similar, but whether removing the duplication is worth the added complexity is up for discussion.

alexanderankin · 2024-05-21T00:39:19Z

I believe the overhead is not worth it. Duplication is fine, as we probably don't want to be in the business of maintaining that code in core. Core should stay clean and lean.


alexanderankin · 2024-05-24T22:46:47Z

In fact, if you can remove transfer_seed from the generic class, that would be great; we are planning to remove it eventually. I don't know how the mysql tests ever passed (I'm saying this without having looked at any test executions): you have to do the following dance if you want to place files into the container before the entrypoint executes (and this would go into core if we ever make it generic enough, like DockerContainer):

1. loop
2. sleep for half a second
3. if some file exists, break
4. go to loop (step 1)
5. execute the container image's original entrypoint (specific to the type of image the container is intended to be used with; for example, for mysql, `docker image inspect mysql:8-oracle | jq '.[].Config.Entrypoint | .[]' -r` prints `docker-entrypoint.sh`)

Then, after you start the container, you can copy the files into it asynchronously from Python, do whatever else you have to do, and finally create the file you are using as an IPC signal in step 3 above.

You can see this implementation in the Kafka container.
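A minimal Python sketch of that dance, for illustration only: `build_wait_command`, the sentinel path and the example argv are hypothetical, not testcontainers API. It builds a string command in the same style as the Kafka container, and it uses `exec` for the final step so that the original entrypoint replaces the waiting shell instead of running as its child.

```python
import shlex


def build_wait_command(sentinel: str, original_argv: list[str], poll_seconds: float = 0.5) -> str:
    """Build a container command that waits for `sentinel`, then runs the original entrypoint.

    Hypothetical helper: `original_argv` is the image's entrypoint followed by its command,
    e.g. ["docker-entrypoint.sh", "mysqld"].
    """
    original = " ".join(shlex.quote(arg) for arg in original_argv)
    script = (
        # Steps 1-4: poll until the host-side code creates the sentinel file.
        f"while [ ! -f {shlex.quote(sentinel)} ]; do sleep {poll_seconds}; done; "
        # Step 5: exec replaces this shell, so the original entrypoint takes over its PID.
        f"exec {original}"
    )
    # Quote the whole script so it reaches `sh -c` as a single argument.
    return f"sh -c {shlex.quote(script)}"


print(build_wait_command("/tmp/.tc_seed_done", ["docker-entrypoint.sh", "mysqld"]))
# sh -c 'while [ ! -f /tmp/.tc_seed_done ]; do sleep 0.5; done; exec docker-entrypoint.sh mysqld'
```

docker-py splits a string command with `shlex`, so quoting the whole script as one argument keeps the `while` loop intact. This assumes the shell actually receives the command, i.e. the image entrypoint is overridden or hands unrecognised commands straight to `exec`.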

OverkillGuy · 2024-05-25T14:40:14Z

I'll do anything that makes this feature more maintainable. Just want to make sure I understand your message.

The implementation provided here and in mysql is indeed "racing" our container.put_archive() call against the entrypoint launched by container.start(), hoping that we can transfer the files before the mysql/postgres entrypoint gets to the point of reading them.

That's bad because it's unreliable on principle: depending on winning a race is a recipe for hard-to-pin-down bugs. But I expect it worked fine here in practice because these databases take multiple seconds to start up, plenty of time to copy a folder across, so we always "won the race", often enough that the feature worked.

I do see that the solution you proposed is nicer, in that it avoids the race. Paraphrasing: it overrides the entrypoint with our own waiter script, which blocks until some "sentinel" file is created, giving put_archive time to complete; the sentinel then signals that it's time to continue by executing the original entrypoint.

I like that solution, and I think I should be able to port this over to here, maybe even in a way that can be moved to the core class? Will investigate.
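The host side of the handshake paraphrased above could look roughly like this sketch. Assumptions: `container` is a docker-py `Container` (for instance obtained through testcontainers' `get_wrapped_container()`), `put_archive` returns only after the daemon has extracted the archive, and the helper names and paths are hypothetical. Because the sentinel is written in a second, later call, the waiting script cannot start the original entrypoint before the seed files are in place.

```python
import io
import tarfile
from pathlib import Path, PurePosixPath


def folder_to_tar(folder: str) -> bytes:
    """Pack the contents of `folder` (not the folder itself) into an in-memory tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for child in sorted(Path(folder).iterdir()):
            tar.add(str(child), arcname=child.name)
    return buf.getvalue()


def single_file_tar(name: str, content: bytes = b"") -> bytes:
    """Build an in-memory tar holding a single file, used here for the sentinel."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(content)
        tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def transfer_seed_then_signal(container, seed_folder: str, mountpoint: str, sentinel: str) -> None:
    """Copy the seed files first, then create the sentinel the container is polling for."""
    # put_archive extracts into a directory that must already exist in the container.
    container.put_archive(mountpoint, folder_to_tar(seed_folder))
    sentinel_path = PurePosixPath(sentinel)  # container paths are POSIX even on Windows hosts
    # Written last: as soon as the sentinel exists, the waiter runs the original entrypoint.
    container.put_archive(str(sentinel_path.parent), single_file_tar(sentinel_path.name))
```

Something like `transfer_seed_then_signal(wrapped, "./seeds", "/docker-entrypoint-initdb.d", "/tmp/.tc_seed_done")` would have to run after the container has started but before any wait for the database to become ready, since the database cannot come up until the sentinel exists. Both official images run the files they find in `/docker-entrypoint-initdb.d` on first initialisation.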

OverkillGuy · 2024-05-25T14:40:42Z

And yes, I'll remove the transfer_seed and DbContainer "bits" too.

alexanderankin · 2024-05-25T16:50:45Z

I think we are on the same page.

Here is the specific code I was referring to:

testcontainers-python/modules/kafka/testcontainers/kafka/__init__.py

Lines 78 to 82 in 9d2ceb6

    def start(self, timeout=30) -> "KafkaContainer":
        script = KafkaContainer.TC_START_SCRIPT
        command = f'sh -c "while [ ! -f {script} ]; do sleep 0.1; done; sh {script}"'
        self.with_command(command)
        super().start()

OverkillGuy · 2024-06-02T00:19:16Z

Okay, I think I have an experimental version of the wait-for-sentinel system for mysql; I'll polish it into a PR soon, then use it for postgres too.

The logic now mostly lives in the DbContainer class. Nothing prevents moving it up to DockerContainer in general, but there's a conceptual question: what would the folder-transfer feature even mean for DockerContainer, when for DbContainer the point is to "mount" a seed folder?

One thing to note on the reference implementation: the linked Kafka solution reuses the TC_START_SCRIPT path both as the sentinel file to wait for (regardless of its content: once it exists, the wait is over) and as the script holding the actual startup commands, which is run with sh after waiting.
But there's no particular reason to use the same file path both as the sentinel and as the command to execute afterwards.

Instead, I've locally used inspect_image() from the docker-py low-level API to retrieve the image's original command, and I run that command after the wait.
The most challenging part was finding a way not to break the init once the start command is changed, since that changes what runs as PID 1. I have to source the original entrypoint by hand and relaunch the original command. This is a little image-specific and may need a second pair of eyes to make it generic enough.
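To make the inspect_image() step concrete, a small sketch. `APIClient.inspect_image()` is part of docker-py's low-level API and returns the same JSON that `docker image inspect` prints; the `original_argv` helper is hypothetical. Docker runs an exec-form entrypoint with the command appended as its arguments, so concatenating the two recovers what the image would have run.

```python
from typing import Any


def original_argv(image_info: dict[str, Any]) -> list[str]:
    """Return Entrypoint followed by Cmd from an inspect_image() result (hypothetical helper)."""
    config = image_info.get("Config") or {}
    # Either field can be null in the inspect JSON, e.g. for images without an ENTRYPOINT.
    entrypoint = config.get("Entrypoint") or []
    cmd = config.get("Cmd") or []
    return [*entrypoint, *cmd]


# Against a live daemon (needs Docker; the image tag is only an example):
# import docker
# info = docker.from_env().api.inspect_image("postgres:16")
# original_argv(info)  # expected to be something like ["docker-entrypoint.sh", "postgres"]
```

One alternative to sourcing the entrypoint by hand, not what this PR does, is to `exec` this argv at the end of the wait loop: assuming the waiting shell is itself PID 1, `exec` replaces it, so the original entrypoint ends up as PID 1 much as if the command had never been overridden.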

So, yeah, I'll send it soon, but I thought some of these points were worth discussing now rather than waiting for the PR itself.

The current issue is that the postgres tests fail: the container exits on postgres (but not on mysql, even though both the entrypoint and command logic are identical) with these errors:

    sh: 8: source: not found
    sh: 8: _main: not found

I'm still tracking it down, because it's weird that /bin/sh says "source" does not exist (but not with mysql), and _main should be available from the sourced entrypoint script.
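A likely explanation, worth confirming against the images: the official postgres image is Debian-based, and Debian's `/bin/sh` is dash, which has no `source` builtin (only the POSIX `.`), whereas the default mysql images are based on Oracle Linux, where `/bin/sh` is bash. The `_main` error would then just be a consequence of the failed `source`. Since the postgres `docker-entrypoint.sh` is itself a bash script, switching to `.` under dash is probably not enough; running the wrapper under bash avoids both errors, which matches the direction of the later "Wrap command in bash -c" commit. A sketch of such a command, with the helper name and paths hypothetical:

```python
import shlex


def bash_startup_command(sentinel: str, entrypoint: str, args: list[str]) -> str:
    """Wait for `sentinel`, then source the entrypoint and call its `_main` (hypothetical helper)."""
    quoted_args = " ".join(shlex.quote(arg) for arg in args)
    script = (
        f"while [ ! -f {shlex.quote(sentinel)} ]; do sleep 0.1; done; "
        # `source` is a bash builtin; dash (Debian's /bin/sh) only knows `.`.
        f"source {shlex.quote(entrypoint)}; "
        # Assumes the sourced script defines _main without running it, which is
        # how the official mysql/postgres entrypoints appear to behave when sourced.
        f"_main {quoted_args}"
    )
    # Name bash explicitly instead of relying on whatever /bin/sh points to.
    return f"bash -c {shlex.quote(script)}"


print(bash_startup_command("/tmp/.tc_seed_done", "/usr/local/bin/docker-entrypoint.sh", ["postgres"]))
# bash -c 'while [ ! -f /tmp/.tc_seed_done ]; do sleep 0.1; done; source /usr/local/bin/docker-entrypoint.sh; _main postgres'
```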

OverkillGuy · 2024-06-02T23:35:38Z

Okay, I just pushed the rewritten solution per the thread above. For now it uses a DbContainer-level command override, like Kafka does, with a per-image seed mount location and startup command prefix.

It works fine for mysql, but the postgres container exits (see the last commit's message). It seems that for mysql/postgres, the actual command (mysqld and postgres respectively) needs the entrypoint script to be PID 1, or close to it, which is what the self.startup_command trick is for: source the original script, then run its _main to set the environment variables properly.

I'm fully open to moving this dbcontainer-level thing up to DockerContainer, just not sure how that would conceptually fit.

Sending this for a look-over while I try to understand the exact issue with the postgres entrypoint causing exits, so it can be fixed.

codecov · 2024-07-07T15:16:03Z

Codecov Report

Attention: Patch coverage is 0% with 35 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@0ce4fec). Learn more about missing BASE report.

| Files | Patch % | Lines |
|---|---|---|
| core/testcontainers/core/generic.py | 0.00% | 35 Missing ⚠️ |


@@           Coverage Diff           @@
##             main     #576   +/-   ##
=======================================
  Coverage        ?   72.75%           
=======================================
  Files           ?       11           
  Lines           ?      613           
  Branches        ?       87           
=======================================
  Hits            ?      446           
  Misses          ?      141           
  Partials        ?       26


Jb DOYON added 2 commits May 21, 2024 01:14

Fix mysql "seed" feature's docstring

176679b

Add seed feature to postgres

7764782

OverkillGuy mentioned this pull request May 21, 2024

Feature: Seed data for DbContainer #541

Open

OverkillGuy marked this pull request as ready for review May 21, 2024 00:35

alexanderankin changed the title ~~Add seed feature to postgres~~ fix: Add seed feature to postgres May 21, 2024

alexanderankin added the community-feat feature but its a community module so we wont bump tc core for it label May 21, 2024

alexanderankin changed the title ~~fix: Add seed feature to postgres~~ fix(postgres): Add seed feature to postgres May 21, 2024

Jb DOYON added 3 commits June 2, 2024 01:31

Replace mysql command init to wait for files

6a681f0

Refactor mysql mountpoint to generic

9dfc358

Wrap command in bash -c

0eec7ae

Merge branch 'main' into dbseed-pg

519c7d3
