NlaIII digestion problem #39

Open
StevenWingett opened this issue Sep 29, 2020 · 3 comments
Labels
bug Something isn't working

Comments

@StevenWingett
Owner

We used the NlaIII enzyme for digestion in our Hi-C experiment.
I specified --re1 CATG^,NlaIII when running hicup_digester, and the result file looks fine; here is the head of the file.

But when I run HiCUP with it, I get no results. The log shows that the sequence list between the brackets [] is empty:

Truncating with HiCUP Truncater v0.7.4
Truncating sequences at occurrence of sequences '[]'
Truncating sequences

StevenWingett added the bug (Something isn't working) label Sep 29, 2020
@mtekman

mtekman commented Dec 8, 2023

I had the same issue, and I think it's because you also need to provide the dangling sequence on the other side of the caret:

https://en.wikipedia.org/wiki/NlaIII

So I got some output when I used --re1 CATG^CATG,NlaIII:

Truncating with HiCUP Truncater v0.8.3
Truncating sequences at occurrence of sequences '[CATGCATG]'
Truncating sequences
Truncating R1_fq.gz
Truncating R2_fq.gz

Edit: ignore this comment; see the update below.

@mtekman

mtekman commented Jan 30, 2024

After more experimenting, I realise that "CATG^" should actually work, and that the sequence being searched for should just be "CATG", not "CATGCATG", "CATGGTAC", or anything else.

Currently the output file is truncated completely incorrectly.

Suppose I have the following reads, in a file I'll call test1.fq:

@A00627:719:H7LLYDSX7:3:1101:20003:4914 1:N:0:ATCACG
ACCTAAAGCTTTACTACAGAGCAATTGTGATAAAAACTGCATGGTACTGGTATAGAGACAGACAAGTAGACCAATGGACT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFF,F
@A00627:719:H7LLYDSX7:3:1101:19208:8202 1:N:0:ATCACG
AGAAAGAAAGAAAGAAAGAAACTCGTTTCTCTGAGATGTAGGCCATGGTACCTGACAGTTTAAAATTGAAACAAACAAAGACACAAGGAAGTGTGGGTGGGGT
+
FFFFFFFFFFF:FFFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF
@A00627:719:H7LLYDSX7:3:1101:4616:9267 1:N:0:ATCACG
AGCTACAAGGTCAGAGAGAGAGAGAGAGAGAGAGAGAGAGAATGAATATGAATCATGGTACCTGAAGCATATCTTGCAATTTACAATCATATACAGAAATTAAT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF:FFFFFF:FFFFFFFFFFFF:FFF
@A00627:719:H7LLYDSX7:3:1101:9263:9580 1:N:0:ATCACG
ATACGTAGCCCAAGCTAGCTACAATCTCAAGATCCTCCTGCTTCAGCCTCCTGGGTGCTAGGATTACAGGCATGGTACCTTATCC
+
FFF,,FF,F,:FF,FFFFFF,,FFFFFFFFFFF,:F,FFF,:F:F:,FF,:F:FFFFFF:F,F:,:F:,F,,F,,FF,FF::FF:

How CATG^GTAC is truncated

rm -rf test_dir; mkdir test_dir;
hicup_truncater --re1 "CATG^GTAC"  test1.fq test1.fq  ## same file passed twice, just for testing

yields:

"Truncating sequences at occurrence of sequences '[CATGGTAC]'"

@A00627:719:H7LLYDSX7:3:1101:20003:4914 1:N:0:ATCACG
ACCTAAAGCTTTACTACAGAGCAATTGTGATAAAAACTGCATGGTAC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F
@A00627:719:H7LLYDSX7:3:1101:19208:8202 1:N:0:ATCACG
AGAAAGAAAGAAAGAAAGAAACTCGTTTCTCTGAGATGTAGGCCATGGTAC
+
FFFFFFFFFFF:FFFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF
@A00627:719:H7LLYDSX7:3:1101:4616:9267 1:N:0:ATCACG
AGCTACAAGGTCAGAGAGAGAGAGAGAGAGAGAGAGAGAGAATGAATATGAATCATGGTAC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFF
@A00627:719:H7LLYDSX7:3:1101:9263:9580 1:N:0:ATCACG
ATACGTAGCCCAAGCTAGCTACAATCTCAAGATCCTCCTGCTTCAGCCTCCTGGGTGCTAGGATTACAGGCATGGTAC
+
FFF,,FF,F,:FF,FFFFFF,,FFFFFFFFFFF,:F,FFF,:F:F:,FF,:F:FFFFFF:F,F:,:F:,F,,F,,FF,

Note that each read containing "CATGGTAC" is cut so that the sequence ends with it.

How CATG^ is truncated:

rm -rf test_dir; mkdir test_dir;
hicup_truncater --re1 "CATG^"  test1.fq test1.fq

yields:

"Truncating sequences at occurrence of sequences '[]'"

@A00627:719:H7LLYDSX7:3:1101:20003:4914 1:N:0:ATCACG
ACATG
+
FFFFF
@A00627:719:H7LLYDSX7:3:1101:19208:8202 1:N:0:ATCACG
ACATG
+
FFFFF
@A00627:719:H7LLYDSX7:3:1101:4616:9267 1:N:0:ATCACG
ACATG
+
FFFFF
@A00627:719:H7LLYDSX7:3:1101:9263:9580 1:N:0:ATCACG
ACATG
+
FFF,,

Note how each read has basically vanished: only five bases are left. This is wrong.

What CATG^ should be producing

rm -rf test_dir; mkdir test_dir;
hicup_truncater --re1 "CATG^"  test1.fq test1.fq

"Truncating sequences at occurrence of sequences '[CATG]'"

@A00627:719:H7LLYDSX7:3:1101:20003:4914 1:N:0:ATCACG
ACCTAAAGCTTTACTACAGAGCAATTGTGATAAAAACTGCATG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00627:719:H7LLYDSX7:3:1101:19208:8202 1:N:0:ATCACG
AGAAAGAAAGAAAGAAAGAAACTCGTTTCTCTGAGATGTAGGCCATG
+
FFFFFFFFFFF:FFFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFF
@A00627:719:H7LLYDSX7:3:1101:4616:9267 1:N:0:ATCACG
AGCTACAAGGTCAGAGAGAGAGAGAGAGAGAGAGAGAGAGAATGAATATGAATCATG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF
@A00627:719:H7LLYDSX7:3:1101:9263:9580 1:N:0:ATCACG
ATACGTAGCCCAAGCTAGCTACAATCTCAAGATCCTCCTGCTTCAGCCTCCTGGGTGCTAGGATTACAGGCATG
+
FFF,,FF,F,:FF,FFFFFF,,FFFFFFFFFFF,:F,FFF,:F:F:,FF,:F:FFFFFF:F,F:,:F:,F,,F,

Note how each read is truncated and ends with the desired sequence.
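To make the expected behaviour concrete, here is a minimal Python sketch of the truncation rule described above. It is not taken from the HiCUP source; the function name `truncate_at_junction` is hypothetical, and leaving reads without a match untouched is an assumption. It cuts the sequence and quality strings just after the first occurrence of the junction sequence, and refuses an empty junction instead of silently truncating.

```python
def truncate_at_junction(seq, qual, junction):
    """Cut a read (and its quality string) just after the first junction match.

    Hypothetical helper for illustration only; not HiCUP's implementation.
    """
    if not junction:
        # An empty search string matches at position 0 in most languages,
        # so reject it explicitly rather than truncating every read.
        raise ValueError("junction sequence must not be empty")
    pos = seq.find(junction)
    if pos == -1:
        # Assumption: reads without the junction are left unchanged.
        return seq, qual
    end = pos + len(junction)
    return seq[:end], qual[:end]


# First read from test1.fq above; qualities replaced by a placeholder string.
read = ("ACCTAAAGCTTTACTACAGAGCAATTGTGATAAAAACTGCATGGTAC"
        "TGGTATAGAGACAGACAAGTAGACCAATGGACT")
qual = "F" * len(read)

seq_t, qual_t = truncate_at_junction(read, qual, "CATG")
print(seq_t)
print(qual_t)
```

Calling the same function with "CATGGTAC" as the junction gives reads ending in "CATGGTAC", as in the first example, while "CATG" gives reads ending in "CATG", as in the expected output.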

@mtekman

mtekman commented Jan 30, 2024

This fix is implemented in the PR above.
