SyntaxWarning for invalid escape sequences in Python 3.12 #1457

Open
1 task done
Simon-Brandt opened this issue Mar 20, 2025 · 3 comments · May be fixed by #1459
@Simon-Brandt

Description of bug

In Python >= 3.6, invalid escape sequences in string literals emit a DeprecationWarning, which became a SyntaxWarning in Python 3.12 and will eventually become a SyntaxError in a future Python version. SPAdes uses several of these invalid escape sequences across its Python scripts, most notably (or perhaps only) in regular expressions. I obtained two of these warnings for a metaSPAdes run, in:

source_file, line_no = re.match(
'\<doctest (.*\.rst)\[(.*)\]\>',
source_file).groups()

and in:

cookie_re = re.compile("coding[:=]\s*([-\w.]+)")

To my knowledge, < and > have never carried a special meaning on their own in Python's regex flavor and thus never needed escaping, whereas \s and \w are genuine character classes whose backslashes must reach the regex engine intact. As the SyntaxWarning will eventually become a SyntaxError, SPAdes will break in the future. Since the same error has already been reported in #1320 and in #1326, but was only selectively fixed in 60ad35e, it may be preferable to go through the code base, find all strings with escape sequences, and fix them, either by doubling the backslashes for Python's parser or, preferably, by marking them as raw strings. If you want, I could try fixing this myself via pull request.
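The two fixes mentioned above can be compared in a minimal sketch. The patterns are the ones quoted from joblib; the sample inputs (`# -*- coding: utf-8 -*-` and `<doctest README.rst[3]>`) are made up for illustration and do not come from SPAdes.

```python
import re

# The warning-triggering spelling is kept as a comment so that this
# sketch itself compiles cleanly:
#   "coding[:=]\s*([-\w.]+)"
doubled = "coding[:=]\\s*([-\\w.]+)"  # backslashes doubled for Python's parser
raw = r"coding[:=]\s*([-\w.]+)"       # raw string, backslashes left alone

# Both spellings produce the same characters for the regex engine.
assert doubled == raw

cookie_re = re.compile(raw)
match = cookie_re.search("# -*- coding: utf-8 -*-")  # hypothetical input
print(match.group(1))

# < and > are matched literally here, so the escapes can simply be dropped.
doctest_re = re.compile(r"<doctest (.*\.rst)\[(.*)\]>")
source_file, line_no = doctest_re.match("<doctest README.rst[3]>").groups()
print(source_file, line_no)
```

The raw-string form keeps the pattern readable and identical to what the regex engine sees, which is presumably why it is the preferred fix.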

spades.log

Since my dataset contains sensitive information (including file paths), I cannot upload the full spades.log and can only provide the following snippets. The error should be clear from them, so I hope no more data is needed.

Command line: /opt/spades/bin/spades.py --meta --threads=32 --memory=200 -1 /path/to/r1.fastq.gz -2 /path/to/r2.fastq.gz -o /path/to/metaspades_out_sample_1
                                                                                
System information:                                                             
  SPAdes version: 4.1.0                                                         
  Python version: 3.12.3                                                        
  OS: Linux-5.4.0-208-generic-x86_64-with-glibc2.39 
== Running: /usr/bin/python3 /opt/spades/share/spades/spades_pipeline/scripts/compress_all.py --input_file /path/to/metaspades_out_sample_1/corrected/corrected.yaml --ext_python_modules_home /opt/spades/share/spades --max_threads 32 --output_dir /path/to/metaspades_out_sample_1/corrected --gzip_output
                                                                                
/opt/spades/share/spades/joblib3/func_inspect.py:51: SyntaxWarning: invalid escape sequence '\<'
  '\<doctest (.*\.rst)\[(.*)\]\>',                                              
/opt/spades/share/spades/joblib3/_memory_helpers.py:10: SyntaxWarning: invalid escape sequence '\s'
  cookie_re = re.compile("coding[:=]\s*([-\w.]+)")   

params.txt

For the params.txt, I replaced the sensitive paths with /path/to.

SPAdes version

SPAdes 4.1.0

Operating System

Linux-5.4.0-208-generic-x86_64-with-glibc2.39

Python Version

Python 3.12.3

Method of SPAdes installation

Manual compilation, virtualized as Docker container, run as Singularity image

No errors reported in spades.log

  • Yes
@andrewprzh
Collaborator

Hi @Simon-Brandt

Thanks a lot for the report!

This comes from a copy of joblib that we imported a long time ago. Surprisingly, it has not been fixed in their repository yet. Otherwise, we could've just updated it.

A pull request is most welcome and appreciated! A quick search in the main folders with Python code (ext/src/python_libs, src/projects/spades/pipeline) didn't turn up any suspicious places, but a second look would be great.

Also, joblib is only used for running gzip in parallel, which, I believe, can be done using built-in Python methods, so maybe we don't even need joblib.

Best
Andrey
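As a rough illustration of the standard-library route suggested above, the following sketch compresses several files concurrently with `gzip` and `concurrent.futures`. It is not the code used in compress_all.py; the function names `gzip_file` and `gzip_all` and the `max_threads` parameter are hypothetical.

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def gzip_file(path):
    """Compress one file to <path>.gz and return the new path."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream in chunks, not all at once
    return dst


def gzip_all(paths, max_threads=4):
    """Compress several files concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(gzip_file, paths))
```

Unlike the gzip command-line tool, this sketch leaves the original files in place. Threads are assumed to be sufficient because the compression work happens in the underlying zlib library; if that assumption does not hold for a given workload, `ProcessPoolExecutor` offers the same `map` interface.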

Simon-Brandt linked a pull request Mar 25, 2025 that will close this issue
@Simon-Brandt
Author

Using the following two regexes

Regex 1: (?<!r)"[^"]*?\\[^0nrtux'"].*?"
Regex 2: (?<!r)'[^']*?\\[^0nrtux'"].*?'

on all Python scripts, I found several more places with problematic escape sequences and have tried to fix them. The regexes also match doubled backslashes (which are valid), but these are rare enough not to flood the output. In most cases, the issues were indeed related to the re module, i.e., to regexes; only one case involved os.system.
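A different technique from the regex search above is to let Python's own compiler report the offending literals, which avoids false positives from doubled backslashes. The sketch below is only an illustration; the helper name `find_invalid_escapes` is hypothetical. It records the warnings raised by `compile()` and keeps those that mention an invalid escape sequence.

```python
import warnings


def find_invalid_escapes(source, filename="<string>"):
    """Return sorted (line number, message) pairs for invalid escapes."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record even default-ignored warnings
        compile(source, filename, "exec")
    hits = {
        (w.lineno, str(w.message))
        for w in caught
        # DeprecationWarning before Python 3.12, SyntaxWarning from 3.12 on
        if issubclass(w.category, (SyntaxWarning, DeprecationWarning))
        and "invalid escape sequence" in str(w.message)
    }
    return sorted(hits)
```

To scan a file, read it and pass its text and path, e.g. `find_invalid_escapes(open(path).read(), path)`. Files that still contain Python 2 syntax raise a SyntaxError in `compile()` and would need to be handled separately.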

Besides, I noticed several other issues that may be worth some attention, namely Python 2.x code (which is no longer supported and is a security risk), such as print statements or outdated raise and except syntax in e.g. pydot.py. Since the manual states Python 3.8 as the minimum requirement, the Python 2 code won't run, but I guess (without verification) that the respective libraries aren't used by the core of SPAdes.

Pylance reported several other errors, such as intermixed spaces and tabs in e.g. texify_results.py. Besides, there were instances with harmless but unnecessary trailing semicolons or unused module imports.

The last thing I noticed was that some shebang lines are placed after the copyright header, as in test.py. Since the kernel only interprets a shebang line when #! makes up the first two bytes of an executable script, running ./test.py would start an sh or bash instance (depending on the currently active shell), not /usr/bin/python. Of course, when the scripts are only used as libraries via import, the shebang line doesn't matter.
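A check for misplaced shebang lines could look like the following sketch. The function name `shebang_status` and its three return labels are hypothetical; the only rule it encodes is the one stated above, that `#!` counts only as the first two bytes of the file.

```python
def shebang_status(path):
    """Classify a script by where its first '#!' line appears."""
    with open(path, "rb") as handle:
        lines = handle.read().splitlines()
    if lines and lines[0].startswith(b"#!"):
        return "leading"    # first two bytes are '#!'
    if any(line.startswith(b"#!") for line in lines[1:]):
        return "misplaced"  # later '#!' lines are treated as plain comments
    return "none"
```

Running this over the script directories and listing the files reported as "misplaced" would show where the copyright header and the shebang line need to be swapped.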

@asl
Member

asl commented Mar 25, 2025

@Simon-Brandt You can limit yourself to the code in https://github.com/ablab/spades/tree/main/src/projects/spades/pipeline and in https://github.com/ablab/spades/tree/main/ext/src/python_libs

Everything else is auxiliary scripts that are not used or run in the pipeline.
