Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pylint] re-enable consider-using-{with, f-string} #3729

Merged

Conversation

pponnuvel
Copy link
Contributor

@pponnuvel pponnuvel commented Jul 23, 2024


Please place an 'X' inside each '[]' to confirm you adhere to our Contributor Guidelines

  • Is the commit message split over multiple lines and hard-wrapped at 72 characters?
  • Is the subject and message clear and concise?
  • Does the subject start with [plugin_name] if submitting a plugin patch or a [section_name] if part of the core sosreport code?
  • Does the commit contain a Signed-off-by: First Lastname [email protected]?
  • Are any related Issues or existing PRs properly referenced via a Closes (Issue) or Resolved (PR) line?
  • Are all passwords or private data gathered by this PR obfuscated?

sos/cleaner/archives/__init__.py Fixed Show fixed Hide fixed
sos/utilities.py Fixed Show fixed Hide fixed
sos/cleaner/archives/__init__.py Fixed Show fixed Hide fixed
@pponnuvel pponnuvel force-pushed the re-enable-consider-using-with branch 3 times, most recently from 9a20d20 to a087b41 Compare July 23, 2024 12:57
@pponnuvel pponnuvel force-pushed the re-enable-consider-using-with branch from a087b41 to 4ef1d23 Compare July 23, 2024 13:01
Copy link

Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo yum install -y dnf-plugins-core on RHEL 8
  • sudo dnf install -y dnf-plugins-core on Fedora
  • dnf copr enable packit/sosreport-sos-3729
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

@pponnuvel pponnuvel force-pushed the re-enable-consider-using-with branch 2 times, most recently from 6b4b0df to 64c90b8 Compare July 23, 2024 15:26
@pponnuvel pponnuvel marked this pull request as draft July 23, 2024 15:26
@pponnuvel pponnuvel force-pushed the re-enable-consider-using-with branch 2 times, most recently from 63d24bf to 3b2a274 Compare July 23, 2024 15:43
archive.extraction_filter = getattr(tarfile, 'fully_trusted_filter',
(lambda member, path: member))

# Guard against "Arbitrary file write during tarfile extraction"
Copy link
Contributor Author

@pponnuvel pponnuvel Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracting files one-by-one here. Had to check if the extracted path of files don't have relative paths and stay within the directory to handle "Arbitrary file write during tarfile extraction".

https://codeql.github.com/codeql-query-help/python/py-tarslip/

#3729 (review)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting: I was afraid of negative runtime impact of this change so I run cleaner on some sosreport tarball twice: once without the patch and once with the patch. The difference was statistically insignificant (a fraction of a second from 3m cleaner run).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think that's a reasonable.

If we want to fix "Arbitrary file write during tarfile extraction", I think this is the only way (check each file and confirm). Alternative is to ignore it altogether (i.e. keep this code as before without this change) if we think the source tar file is always trustable.

Copy link
Contributor Author

@pponnuvel pponnuvel Aug 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at https://github.com/python/cpython/blob/main/Lib/tarfile.py#L2265, extract() and extractall() pretty much work in the same way.

The major difference between previous and the change in this PR is one additional syscall (getcwd) due to os.path.abspath call. Cost of a single getcwd call varies depending on few factors.

A quick test on my system:

ponnuvel@magicbox:~$ cat t.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>

int main() {
    struct timespec start, end;
    char cwd[1024];

    if (clock_gettime(CLOCK_MONOTONIC, &start) == -1) {
        perror("clock_gettime");
        exit(0);
    }

    if (getcwd(cwd, sizeof(cwd)) == NULL) {
        perror("getcwd");
        exit(0);
    }

    if (clock_gettime(CLOCK_MONOTONIC, &end) == -1) {
        perror("clock_gettime");
        exit(0);
    }

    // Calculate the elapsed time in nanoseconds
    long seconds = end.tv_sec - start.tv_sec;
    long nanoseconds = end.tv_nsec - start.tv_nsec;
    long elapsed = seconds * 1000000000L + nanoseconds;

    printf("Current working directory: %s\n", cwd);
    printf("Execution time of getcwd: %ld nanoseconds\n", elapsed);
}
ponnuvel@magicbox:~$ gcc -O3 t.c
ponnuvel@magicbox:~$ ./a.out
ponnuvel@magicbox:~$ pwd 
/home/ponnuvel
ponnuvel@magicbox:~$ ./a.out 
Current working directory: /home/ponnuvel
Execution time of getcwd: 4512 nanoseconds
ponnuvel@magicbox:~$ mkdir -p a/b/c/d/e/f/g/h/i/j/k
ponnuvel@magicbox:~$ cp a.out a/b/c/d/e/f/g/h/i/j/k/
ponnuvel@magicbox:~$ cd -
/home/ponnuvel/a/b/c/d/e/f/g/h/i/j/k
ponnuvel@magicbox:~/a/b/c/d/e/f/g/h/i/j/k$ pwd
/home/ponnuvel/a/b/c/d/e/f/g/h/i/j/k
ponnuvel@magicbox:~/a/b/c/d/e/f/g/h/i/j/k$ ./a.out 
Current working directory: /home/ponnuvel/a/b/c/d/e/f/g/h/i/j/k
Execution time of getcwd: 5415 nanoseconds

(a relatively low-spec system: 4CPU, i7-6500U CPU @ 2.50GHz, 12G memory)

So it takes about 4.5ms to 5.5ms on my system.

Let's round it up to 6ms per call. On a large sosreport I've (1.5GB), it contains ~180K file:

$ tar -tf sosreport-mon2-00384261-2024-07-12-czwswhw.tar.xz | wc -l
179703

6ms (per getcwd call) * 180k files ~= 1.08 seconds extra time on my system. I'd think most servers where sosreports are manipulated/extracted are more powerful than my 2016 laptop (where I ran this test). So the extra cost would liekly be less than 1 second for a reasonably sized sosreport on a production server.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should add: please don't think I am trying to justify the change. I am simply too curious and woken up early on a Sunday morning ;-)

I am totally fine to revert this change if the conclusion is "we can always trust the source tar files here".

@pponnuvel pponnuvel marked this pull request as ready for review July 23, 2024 16:26
@pponnuvel pponnuvel force-pushed the re-enable-consider-using-with branch 6 times, most recently from 5e831f9 to e1d619f Compare July 29, 2024 18:15
@pponnuvel pponnuvel force-pushed the re-enable-consider-using-with branch 2 times, most recently from 2f6f42d to 9aa8395 Compare August 2, 2024 16:00
@pponnuvel pponnuvel force-pushed the re-enable-consider-using-with branch from 9aa8395 to c2ff489 Compare August 4, 2024 04:20
@TurboTurtle TurboTurtle merged commit be1ad32 into sosreport:main Aug 8, 2024
32 checks passed
@pponnuvel pponnuvel deleted the re-enable-consider-using-with branch August 22, 2024 09:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants