Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use conda-package-handling.api.extract instead of tarfile.TarFile.extractall #5390

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

kenodegard
Copy link
Contributor

@kenodegard kenodegard commented Jun 25, 2024

Description

Python 3.12 introduced a deprecation warning in tarfile to warn of a future change in Python 3.14 where the default filter behavior in tarfile.TarFile.extract/tarfile.TarFile.extractall will change. Instead of allowing all file locations (and types) to be unpacked, the new default behavior is to only unpack the safe files (relative to the current location, no special file types, etc.). This is a reasonable change so here we attempt to adopt this change preemptively (and end up simplifying some of our code too!).

The decision to be made is whether we wish to use the strictest filtering (https://docs.python.org/3/library/tarfile.html#tarfile.data_filter) or the more permissive but still safer than before filtering (https://docs.python.org/3/library/tarfile.html#tarfile.tar_filter), I'm assuming we do not want the wild west no filtering of today (https://docs.python.org/3/library/tarfile.html#tarfile.fully_trusted_filter).

Instead of adjusting how we use tarfile we instead opt to use conda-package-handling for our extraction needs (it is then up to conda-package-streaming & conda-package-handling to filter correctly).

Side note, it is odd that this deprecation warning didn't surface earlier.

Xref #5387 #5379 #5281 conda/conda-package-streaming#87

Checklist - did you ...

  • Add a file to the news directory (using the template) for the next release's release notes?
  • Add / update necessary tests?
  • Add / update outdated documentation?

@conda-bot conda-bot added the cla-signed [bot] added once the contributor has signed the CLA label Jun 25, 2024
Copy link

codspeed-hq bot commented Jun 25, 2024

CodSpeed Performance Report

Merging #5390 will not alter performance

Comparing kenodegard:tarfile-filter-data (af58569) with main (c7a1e9b)

Summary

✅ 3 untouched benchmarks

@kenodegard kenodegard force-pushed the tarfile-filter-data branch 2 times, most recently from 4d24e41 to a3ba98f Compare June 25, 2024 15:51
conda_build/convert.py Outdated Show resolved Hide resolved
conda_build/render.py Outdated Show resolved Hide resolved
conda_build/utils.py Outdated Show resolved Hide resolved
@kenodegard kenodegard marked this pull request as ready for review June 26, 2024 08:01
@kenodegard kenodegard requested a review from a team as a code owner June 26, 2024 08:01
@zklaus
Copy link
Contributor

zklaus commented Jun 26, 2024

Note that the filter keyword was introduced in Python 3.12. It was apparently backported all the way to 3.8, but is available only in later patch level releases, namely:

Python version first patch level with filter
3.8 3.8.17
3.9 3.9.17
3.10 3.10.12
3.11 3.11.4
3.12 3.12.0

How do we handle that? Detect the used version and only add filter if supported? Or somehow assume/ensure that a recent enough Python version is available?

@kenodegard
Copy link
Contributor Author

@zklaus ah I had not caught that it was backported, I guess the easiest in this case would be duck typing

@jaimergp
Copy link
Contributor

The deprecation warning is only added in 3.12, so I'd say we only specify filter in those versions. The alternative is to reuse conda-package-streaming class for the extraction and handle the filters there?

@kenodegard kenodegard changed the title Silence deprecation warnings from tarfile by specifying filter='data' Use conda-package-handling.api.extract instead of tarfile.TarFile.extractall Jun 26, 2024
@zklaus
Copy link
Contributor

zklaus commented Jun 26, 2024

I suppose the continuation of the discussion about the appropriate filter belongs in cph now, but anyway, I think we might even need a custom filter, at least if we want to maintain the logic including umasks.

@dholth
Copy link
Contributor

dholth commented Jun 30, 2024

We implemented our own filter in conda-package-streaming. Could we use the Python one instead or pass our filter function through that parameter?

@@ -31,6 +31,7 @@
from conda.gateways.disk.create import TemporaryDirectory
from conda.models.records import PackageRecord
from conda.models.version import VersionOrder
from conda_package_handling.api import extract
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API can only extract .conda or .tar.bz2, not any other .tgz etc. variant.

@kenodegard kenodegard marked this pull request as draft August 5, 2024 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed [bot] added once the contributor has signed the CLA
Projects
Status: 🛑 Blocked
Development

Successfully merging this pull request may close these issues.

6 participants