Skip to content

Conversation

@Flamefire
Copy link
Contributor

@Flamefire Flamefire commented Jun 13, 2025

The current implementation of extract_file detected folders/files from the first tarball when extracting the second.
Due to the definition of find_base_dir it will then return the parent path (usually builddir)
which is used as the finalpath for sources.
This leads issues requiring workarounds in e.g. the Bundle easyblock
where sources from all components are extracted into the same folder.

Fix by storing the old state of the target folder and detect of
extraction resulted in a single (top-level) folder.

As part of this work I enhanced the documentation and test of find_base_dir and added tests for the expected and (previous) error cases of extract_file

guess_start_dir detects the fixed behavior already: If there is a prefix/<startdir>/<startdir> which doesn't exist prefix/<startdir> is used instead

E.g. GTK3 has start_dir set manually for components due to this bug which would now lead to builddir/foo/foo
IMO trivial to detect/fix

Remark: It is possible to have find_base_dir NOT change into the resulting directory which means we can do this explicitely when requested instead of undoing it. Not sure if anyone outside the main repo relies on that.

Followup: easybuilders/easybuild-easyblocks#3961

Copy link
Member

@boegel boegel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Flamefire Rather than adding (more) binary blobs to the repo, can we create multi-0.0.tar.gz and multi-1.0.tar.gz dynamically in the test itself?

That probably even helps with exposing what's being tested, since the structure within these tarballs is now not visible.

This request is fueled by a recent example of binary blobs being abused.

I'm aware we have other binary blobs in here already, we should look into getting rid of those too.

@Flamefire
Copy link
Contributor Author

@boegel I replaced the added binaries by creating them in the test (replacing the commit to remove them from the history)

I also replaced all uses of other binaries in that test. The test-method I added should be easily usable for other tests.

What I noticed: Our filetools.make_archive is lacking a way to create an archive from a list of files or contents of a folder, i.e. exclude the parent folder from the hierarchy in the archive. See the new method I added.

@boegel
Copy link
Member

boegel commented Sep 24, 2025

@Flamefire merge conflict to resolve here...

Create an empty target folder so the extraction can change into the
extracted folder.
The current implementation of `extract_file` detected folders/files from the first tarball when extracting the second.
Due to the definition of `find_base_dir` it will then return the parent path (usually `builddir`)
which is used as the `finalpath` for sources.
This leads issues requiring workarounds in e.g. the Bundle easyblock
where sources from all components are extracted into the same folder.

Fix by storing the old state of the target folder and detect of
extraction resulted in a single (top-level) folder.
Adapt for `extract_dir` now returning the subfolder which it previously
did not because the downloaded tar file is in the same folder.
When a source is added multiple times the content after the 2nd
extraction will be the same and no new files/folders are detected.
So `extract_file` would return the parent folder.
However if there was a clear result after the first extraction the same
result should be returned.

This works trivially until something else is extracted to the same
folder between the 2 extraction operations.
@Flamefire
Copy link
Contributor Author

An added test, rebased to fix.

For compatibility with the previous version only.
`find_base_dir` in `extract_file` couldn't find the base dir for
subsequent sources so fell back to the parent directory which is the
builddir in this case
@Flamefire
Copy link
Contributor Author

Flamefire commented Oct 15, 2025

I found an issue with e.g. Clang:
Prior to this fix when extracting multiple sources the directory of the extracted source could not be uniquely determined which results in changing to the build dir instead inside the extract_step

We could

  • a) Change into the build dir whenever extracting multiple sources, which I find strange
  • b) Change into the directory of the first extracted source assuming this is the main source
  • c) Change into the directory of the last extracted source which seems to be the intention of the original code although that never(?) worked.

This doesn't matter much because in prepare_step we set and change to start_dir which is the final_path of the first source which will still be the same.
Clang fails because it does globing in the extract_step

FWIW: I'm using this for a while now installing hundreds of easyconfigs and Clang was the only issue I found so far. Followup: easybuilders/easybuild-easyblocks#3961

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants