Generic StreamReadError #1833

Open
standage opened this issue Jan 23, 2018 · 0 comments

My colleague @Parsoa has been using khmer to implement some genotyping software, and he has run into a snag. The issue can be isolated with the following minimal example.

import khmer
counttable = khmer.Counttable(31, 16e9, 4)
nseqs, nkmers = counttable.consume_seqfile('/share/hormozdiarilab/Data/Genomes/Illumina/1KG_Trio/HG00512.fq')

This code runs for approximately a day before failing with the following error.

Traceback (most recent call last):
  File "/usr/lib/python3.4/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.4/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/share/hormozdiarilab/Codes/NebulousSerendipity/counttable.py", line 5, in <module>
    n, kmers = counttable.consume_seqfile('/share/hormozdiarilab/Data/Genomes/Illumina/1KG_Trio/HG00513.fq')
  File "khmer/_oxli/graphs.pyx", line 235, in khmer._oxli.graphs.Hashtable.consume_seqfile (khmer/_oxli/graphs.cpp:5839)
OSError: Generic StreamReadError error

For such a simple example, exercising code that is executed so frequently elsewhere, we're having a hard time working out what the cause of the error might be. We've discussed several possibilities, each seemingly as unlikely as the next.

  • Problem with the Fastq file: probably not the case. Other programs have run just fine on it, and a malformed Fastq file usually (in my experience) elicits a more specific error message from khmer.
  • Problem with khmer: possible, but also unlikely. We run this exact code very frequently.
  • Problem with file size: it's a 450 GB uncompressed Fastq file. Is there something about the file size that might be problematic?
  • Problem with the machine: the Cabernet cluster isn't noted for its stability or robustness. Could transient filesystem issues, for example, result in this error?
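
One way to separate the first and last possibilities from khmer itself might be to stream the file with plain Python and see whether the read fails, and at what offset. The following is only a minimal sketch and does not use khmer at all: `check_fastq` is a hypothetical helper, and it assumes unwrapped Fastq records of exactly four lines each. A structural problem would point to the file, while an `OSError` at an offset that changes between runs would point to the filesystem.

```python
def check_fastq(path):
    """Stream a Fastq file and report the first problem encountered.

    Returns (records_ok, bytes_ok, problem). `problem` is None when the
    whole file was read cleanly, otherwise a short description.
    Assumes 4-line records (no wrapped sequence or quality lines).
    """
    records_ok = 0
    bytes_ok = 0
    try:
        with open(path, "rb") as handle:
            while True:
                raw = [handle.readline() for _ in range(4)]
                if not raw[0]:
                    # Clean end of file on a record boundary.
                    return records_ok, bytes_ok, None
                label = "record %d" % (records_ok + 1)
                if not raw[3]:
                    return records_ok, bytes_ok, label + ": truncated record"
                header, seq, plus, qual = (line.rstrip(b"\r\n") for line in raw)
                if not header.startswith(b"@"):
                    return records_ok, bytes_ok, label + ": header lacks '@'"
                if not plus.startswith(b"+"):
                    return records_ok, bytes_ok, label + ": separator lacks '+'"
                if len(seq) != len(qual):
                    return records_ok, bytes_ok, label + ": sequence/quality length mismatch"
                records_ok += 1
                bytes_ok += sum(len(line) for line in raw)
    except OSError as exc:
        # A read failure here comes from the OS or filesystem, not from a parser.
        return records_ok, bytes_ok, "OSError after last good record: %s" % exc
```

Running this a couple of times on the same file and comparing the reported record count and byte offset would show whether any failure is reproducible (suggesting the file) or moves around (suggesting the machine).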

Has anybody else seen this error message? Under what circumstances?
