TypeError in wgbs.map Step #66

Open
ebprideaux opened this issue Oct 15, 2021 · 4 comments

@ebprideaux

Describe the bug
At the wgbs.map step, I get a TypeError:

==== NAME=wgbs.map, STATUS=Failed, PARENT=
SHARD_IDX=1, RC=1, JOB_ID=9608
START=2021-10-15T03:04:21.289Z, END=2021-10-15T03:09:23.663Z
STDOUT=/resource3/data/WGBS/Processed_Caper/wgbs/9f04e4dd-2a84-4af6-8731-d3c49f6e2782/call-map/shard-1/attempt-2/execution/stdout
STDERR=/resource3/data/WGBS/Processed_Caper/wgbs/9f04e4dd-2a84-4af6-8731-d3c49f6e2782/call-map/shard-1/attempt-2/execution/stderr
STDERR_CONTENTS=
:
: Command map started at 2021-10-14 20:08:16.349606
:
: ------------ Mapping Parameters ------------
: Sample barcode : sample_1
: Data set : 1
: No. threads : 8
: Index : indexes/hg38.BS.gem
: Paired : False
: Read non stranded: False
: Type : SINGLE
: Input Files : ./fastq/1/Control_S1_L004_R2_001.fastq.gz
: Output dir : ./mapping/sample_1
:
: Bisulfite Mapping...
TypeError: sequence item 14: expected str instance, NoneType found
ln: failed to access '/resource3/data/WGBS/Processed_Caper/wgbs/9f04e4dd-2a84-4af6-8731-d3c49f6e2782/call-map/shard-1/attempt-2/execution/mapping/**/*.bam': No such file or directory
ln: failed to access '/resource3/data/WGBS/Processed_Caper/wgbs/9f04e4dd-2a84-4af6-8731-d3c49f6e2782/call-map/shard-1/attempt-2/execution/mapping/**/*.csi': No such file or directory
ln: failed to access '/resource3/data/WGBS/Processed_Caper/wgbs/9f04e4dd-2a84-4af6-8731-d3c49f6e2782/call-map/shard-1/attempt-2/execution/mapping/**/*.bam.md5': No such file or directory
ln: failed to access '/resource3/data/WGBS/Processed_Caper/wgbs/9f04e4dd-2a84-4af6-8731-d3c49f6e2782/call-map/shard-1/attempt-2/execution/mapping/**/*.json': No such file or directory

How can I resolve this error?

OS/Platform

  • OS/Platform: CentOS Linux 7 on a Slurm-managed HPC
  • Conda version: conda 4.10.3
  • Pipeline version: 1.1.7
  • Caper version: 1.6.3

Caper configuration file
default.conf.txt

Error log
Caper automatically runs a troubleshooter for failed workflows. If it doesn't, get the WORKFLOW_ID of your failed workflow with caper list, or directly use the metadata.json file in Caper's output directory.

$ caper debug [WORKFLOW_ID_OR_METADATA_JSON_FILE]

cromwell.out.txt

Input JSON File
json_input.txt

@paul-sud
Contributor

This gemBS issue is almost exactly the same as yours: heathsc/gemBS#37, although I'm not sure whether it is relevant to the version of gemBS used in the pipeline. You may be able to work around it by passing "wgbs.underconversion_sequence_name": "chrL" in your input. Since it looks like you used a lambda control, you probably want this value there anyway so that you get the QC value for the bisulfite conversion rate.
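
For context, "sequence item N: expected str instance, NoneType found" is the message Python's str.join raises when element N of the sequence is None. The following is a minimal sketch of that failure mode, assuming the mapping command is assembled as a list of arguments and one option was left unset. The names build_command and first_missing, the "mapper" program name, and the --underconversion flag are hypothetical illustrations, not gemBS code.

```python
def build_command(index, threads, underconversion=None):
    """Hypothetical command builder; NOT taken from gemBS."""
    return [
        "mapper",
        "-I", index,
        "-t", str(threads),
        "--underconversion", underconversion,  # stays None when the option is unset
    ]


def first_missing(args):
    """Return the position of the first None argument, or -1 if all are set."""
    for position, arg in enumerate(args):
        if arg is None:
            return position
    return -1


cmd = build_command("indexes/hg38.BS.gem", 8)

try:
    " ".join(cmd)  # str.join requires every element to be a str
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

missing_position = first_missing(cmd)
```

Checking for None before joining, as first_missing does here, names the unset option directly instead of failing inside join.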

FYI, Conda isn't supported by this pipeline. I'd recommend using Docker or, if your HPC doesn't allow it, Singularity. In theory Singularity should "just work", but in practice there can be quirks due to differences between the runtimes. I haven't tested this pipeline with Singularity myself, so I can't say for sure.

By the way, looking at your input files, it looks like you might have paired-end data? You have it specified as

  "wgbs.fastqs": [
    [
      [
        "/resource3/data/WGBS/RawData/222_S2_L004_R1_001.fastq.gz"
      ],
      [
        "/resource3/data/WGBS/RawData/222_S2_L004_R2_001.fastq.gz"
      ]
    ],
    [
      [
        "/resource3/data/WGBS/RawData/Control_S1_L004_R1_001.fastq.gz"
      ],
      [
        "/resource3/data/WGBS/RawData/Control_S1_L004_R2_001.fastq.gz"
      ]
    ]
  ],

but if they are in fact paired, the two files should be placed in the same array like this:

  "wgbs.fastqs": [
    [
      [
        "/resource3/data/WGBS/RawData/222_S2_L004_R1_001.fastq.gz",
        "/resource3/data/WGBS/RawData/222_S2_L004_R2_001.fastq.gz"
      ]
    ],
    [
      [
        "/resource3/data/WGBS/RawData/Control_S1_L004_R1_001.fastq.gz",
        "/resource3/data/WGBS/RawData/Control_S1_L004_R2_001.fastq.gz"
      ]
    ]
  ],
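
The restructuring above can be sketched in Python. This is a minimal sketch, assuming every replicate lists its files as single-file arrays in R1, R2 order and that mates follow the _R1_/_R2_ naming seen in these paths. pair_fastqs is a hypothetical helper, not part of the pipeline or Caper.

```python
def pair_fastqs(fastqs):
    """Merge single-file arrays within each replicate into [R1, R2] pairs."""
    paired = []
    for replicate in fastqs:
        # Flatten this replicate's arrays into one ordered list of paths.
        files = [path for read_group in replicate for path in read_group]
        if len(files) % 2 != 0:
            raise ValueError("expected an even number of FASTQ files per replicate")
        pairs = []
        for i in range(0, len(files), 2):
            r1, r2 = files[i], files[i + 1]
            # Assumption: mates differ only by _R1_ versus _R2_ in the file name.
            if r1.replace("_R1_", "_R2_") != r2:
                raise ValueError(f"{r1} and {r2} do not look like mates")
            pairs.append([r1, r2])
        paired.append(pairs)
    return paired


raw = "/resource3/data/WGBS/RawData/"
single_end_style = [
    [[raw + "222_S2_L004_R1_001.fastq.gz"], [raw + "222_S2_L004_R2_001.fastq.gz"]],
    [[raw + "Control_S1_L004_R1_001.fastq.gz"], [raw + "Control_S1_L004_R2_001.fastq.gz"]],
]
paired_style = pair_fastqs(single_end_style)
```

Running the helper on input that is already paired returns it unchanged, so it is safe to apply more than once.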

@ebprideaux
Author

I went ahead and changed the JSON input to reflect paired-end data. Thank you!

I believe my original json input already had:

"wgbs.underconversion_sequence_name": "chrL"

Is this what you were referring to?

Our HPC doesn't allow Docker. I don't have experience with Singularity, but I may need to talk with our sysadmin about adding it (I don't have root privileges).

I am re-running with the updated JSON input to see whether the error recurs. I will follow up with the results.
RHwgbsinput copy.json.update.txt

@paul-sud
Contributor

Yeah, sorry, I missed that in the input. That's what I was referring to, so that part looks OK. If it still fails, I would double-check your gemBS version. You can see how it is installed in the pipeline here:

RUN git clone --recursive https://github.com/heathsc/gemBS.git && \

@ebprideaux
Author

ebprideaux commented Oct 15, 2021

Confirming it failed on the same task:

==== NAME=wgbs.map, STATUS=Failed, PARENT=
SHARD_IDX=1, RC=1, JOB_ID=17853
START=2021-10-15T19:22:17.505Z, END=2021-10-15T19:27:35.017Z
STDOUT=/resource3/data/WGBS/Processed/wgbs/2ef10817-0cfc-4575-b976-16e85d0a46a3/call-map/shard-1/attempt-2/execution/stdout
STDERR=/resource3/data/WGBS/Processed/wgbs/2ef10817-0cfc-4575-b976-16e85d0a46a3/call-map/shard-1/attempt-2/execution/stderr
STDERR_CONTENTS=
:
: Command map started at 2021-10-15 12:26:08.265708
:
: ------------ Mapping Parameters ------------
: Sample barcode : sample_1
: Data set : 1
: No. threads : 8
: Index : indexes/hg38.BS.gem
: Paired : True
: Read non stranded: False
: Type : PAIRED
: Input Files : ./fastq/1/Control_S1_L004_R1_001.fastq.gz,./fastq/1/Control_S1_L004_R2_001.fastq.gz
: Output dir : ./mapping/sample_1
:
: Bisulfite Mapping...
TypeError: sequence item 17: expected str instance, NoneType found
ln: failed to access '/resource3/data/WGBS/Processed/wgbs/2ef10817-0cfc-4575-b976-16e85d0a46a3/call-map/shard-1/attempt-2/execution/mapping/**/*.bam': No such file or directory
ln: failed to access '/resource3/data/WGBS/Processed/wgbs/2ef10817-0cfc-4575-b976-16e85d0a46a3/call-map/shard-1/attempt-2/execution/mapping/**/*.csi': No such file or directory
ln: failed to access '/resource3/data/WGBS/Processed/wgbs/2ef10817-0cfc-4575-b976-16e85d0a46a3/call-map/shard-1/attempt-2/execution/mapping/**/*.bam.md5': No such file or directory
ln: failed to access '/resource3/data/WGBS/Processed/wgbs/2ef10817-0cfc-4575-b976-16e85d0a46a3/call-map/shard-1/attempt-2/execution/mapping/**/*.json': No such file or directory

It looks like the gemBS version that Conda installed is gembs-3.2.0 (released June 2018), which I think is likely the problem. According to the issue you linked earlier, a newer version was pushed with changes that fix this TypeError. I will try updating and get back to you.
