Commit

Merge pull request #31 from ressy/release-0.0.9
Release 0.0.9
ressy authored Jul 20, 2021
2 parents df7ca99 + 81dbc40 commit 9fab52a
Showing 18 changed files with 357 additions and 211 deletions.
22 changes: 21 additions & 1 deletion CHANGELOG.md
@@ -1,5 +1,25 @@
# Changelog

## 0.0.9 - 2021-07-20

### Added

* `--outdir` argument to set output directory other than current working
directory ([#24])
* `--no-collapse` argument (and updates to `request` function) to disable
automatic combining of results across batched submissions ([#25])

### Fixed

* Empty config files now result in the usual error message about required
options ([#30])
* All command-line options now match V-QUEST option names ([#28])

[#30]: https://github.com/ressy/vquest/pull/30
[#28]: https://github.com/ressy/vquest/pull/28
[#25]: https://github.com/ressy/vquest/pull/25
[#24]: https://github.com/ressy/vquest/pull/24

## 0.0.8 - 2021-07-13

### Fixed
@@ -33,7 +53,7 @@

### Added

* `--align` argument (via `airr_to_fasta` function) for exraction of sequence
* `--align` argument (via `airr_to_fasta` function) for extraction of sequence
alignment FASTA from AIRR results ([#1])
* Error messages sent by the server are now raised as an exception containing
the server-provided message(s) ([#7])
12 changes: 6 additions & 6 deletions README.md
@@ -4,7 +4,7 @@

[IMGT](http://imgt.org)'s [V-QUEST](http://www.imgt.org/IMGT_vquest/analysis)
is only available via a web interface. This Python package automates V-QUEST
usage by submitting request data like the web form does. Curently only the
usage by submitting request data like the web form does. Currently only the
"Download AIRR formatted results" option is supported.

Example command-line usage, with rhesus sequences in seqs.fasta:
@@ -13,7 +13,7 @@ Example command-line usage, with rhesus sequences in seqs.fasta:
vquest --species rhesus-monkey --receptorOrLocusType IG --fileSequences seqs.fasta

The output is saved to `Parameters.txt` and `vquest_airr.tsv` (the files
V-QUEST provides in a zip archive) in the working directory.
V-QUEST provides in a zip archive) in the working directory by default.

Or with `--align` to automatically extract the alignment as FASTA:

@@ -33,14 +33,14 @@ Here the output is a dictionary of filenames to contents.

The only required options are species, receptorOrLocusType, and either
fileSequences or sequences (to provide sequences directly as text). Options
can be given via command-line arguemnts or one or more YAML configuration
can be given via command-line arguments or one or more YAML configuration
files. See [data/defaults.yml](data/defaults.yml) and `./vquest.py --help` for
details.

The web form will only accept 50 sequences at a time, so the sequences given
here are grouped into chunks of 50, submitted, and the results combined. A
delay (default 1 second) is used between submissions to avoid being impolite to
the server.
here are grouped into chunks of 50, submitted, and (by default) the results
automatically combined. A delay (default 1 second) is used between submissions
to avoid being impolite to the server.
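
The batching behaviour described in this README paragraph can be sketched roughly as follows. This is an illustration only, not the package's actual implementation: `chunker_sketch`, `submit_in_batches`, and the `submit` callable are hypothetical names, and the chunk size of 50 and 1-second delay are taken from the README text.

```python
import time
from itertools import islice


def chunker_sketch(iterable, chunksize):
    """Yield successive lists of at most chunksize items (hypothetical helper)."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


def submit_in_batches(records, submit, chunksize=50, delay=1.0):
    """Call submit() once per chunk, sleeping between calls.

    `submit` is an assumed stand-in for whatever sends one batch to the
    server; here it is any callable that takes a list of records.
    """
    results = []
    for idx, chunk in enumerate(chunker_sketch(records, chunksize)):
        if idx > 0:
            # Pause before every submission after the first one.
            time.sleep(delay)
        results.append(submit(chunk))
    return results
```

With a stand-in such as `submit=len`, 120 records would be sent as three batches of 50, 50, and 20. Whether the per-batch results are then merged or kept separate is a distinct step, which is what the new `--no-collapse` option controls.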

* V-QUEST: <http://www.imgt.org/IMGT_vquest/analysis>
* V-QUEST docs: <http://www.imgt.org/IMGT_vquest/user_guide#intro>
2 changes: 1 addition & 1 deletion test_vquest/data/test_vquest/TestVquestCustom/config.yml
@@ -1,6 +1,6 @@
species: rhesus-monkey
receptorOrLocusType: IG
v_regionsearchindel: true
V_REGIONsearchIndel: true
sequences: |
  >IGKV2-ACR*02
  GACATTGTGATGACCCAGACTCCACTCTCCCTGCCCGTCACCCCTGGAGAGCCAGCCTCCATCTCCTGCAGGTCTAGTCA
11 changes: 11 additions & 0 deletions test_vquest/data/test_vquest/TestVquestCustom/config_inline.yml
@@ -0,0 +1,11 @@
species: rhesus-monkey
receptorOrLocusType: IG
V_REGIONsearchIndel: true
resultType: excel
xv_outputtype: 3
sequences: |
  >IGKV2-ACR*02
  GACATTGTGATGACCCAGACTCCACTCTCCCTGCCCGTCACCCCTGGAGAGCCAGCCTCCATCTCCTGCAGGTCTAGTCA
  GAGCCTCTTGGATAGTGACGGGTACACCTGTTTGGACTGGTACCTGCAGAAGCCAGGCCAGTCTCCACAGCTCCTGATCT
  ATGAGGTTTCCAACCGGGTCTCTGGAGTCCCTGACAGGTTCAGTGGCAGTGGGTCAGNCACTGATTTCACACTGAAAATC
  AGCCGGGTGGAAGCTGAGGATGTTGGGGTGTATTACTGTATGCAAAGTATAGAGTTTCCTCC
@@ -0,0 +1,12 @@
Date Wed Dec 02 19:18:14 CET 2020
IMGT/V-QUEST program version 3.5.21
IMGT/V-QUEST reference directory release 202049-2
Species Macaca mulatta
Receptor type or locus IG
IMGT/V-QUEST reference directory set F+ORF+ in-frame P
Search for insertions and deletions yes
Nb of nucleotides to add (or exclude) in 3' of the V-REGION for the evaluation of the alignment score 0
Nb of nucleotides to exclude in 5' of the V-REGION for the evaluation of the nb of mutations 0
Analysis of scFv no
Number of submitted sequences 1

@@ -0,0 +1,2 @@
sequence_id sequence sequence_aa rev_comp productive complete_vdj vj_in_frame stop_codon locus v_call d_call j_call c_call sequence_alignment sequence_alignment_aa germline_alignment germline_alignment_aa junction junction_aa np1 np1_aa np2 np2_aa cdr1 cdr1_aa cdr2 cdr2_aa cdr3 cdr3_aa fwr1 fwr1_aa fwr2 fwr2_aa fwr3 fwr3_aa fwr4 fwr4_aa v_score v_identity v_support v_cigar d_score d_identity d_support d_cigar j_score j_identity j_support j_cigar c_score c_identity c_support c_cigar v_sequence_start v_sequence_end v_germline_start v_germline_end v_alignment_start v_alignment_end d_sequence_start d_sequence_end d_germline_start d_germline_end d_alignment_start d_alignment_end j_sequence_start j_sequence_end j_germline_start j_germline_end j_alignment_start j_alignment_end cdr1_start cdr1_end cdr2_start cdr2_end cdr3_start cdr3_end fwr1_start fwr1_end fwr2_start fwr2_end fwr3_start fwr3_end fwr4_start fwr4_end v_sequence_alignment v_sequence_alignment_aa d_sequence_alignment d_sequence_alignment_aa j_sequence_alignment j_sequence_alignment_aa c_sequence_alignment c_sequence_alignment_aa v_germline_alignment v_germline_alignment_aa d_germline_alignment d_germline_alignment_aa j_germline_alignment j_germline_alignment_aa c_germline_alignment c_germline_alignment_aa junction_length junction_aa_length np1_length np2_length n1_length n2_length p3v_length p5d_length p3d_length p5j_length consensus_count duplicate_count cell_id clone_id rearrangement_id repertoire_id rearrangement_set_id sequence_analysis_category d_number 5prime_trimmed_n_nb 3prime_trimmed_n_nb insertions deletions junction_decryption
IGKV2-ACR*02 gacattgtgatgacccagactccactctccctgcccgtcacccctggagagccagcctccatctcctgcaggtctagtcagagcctcttggatagtgacgggtacacctgtttggactggtacctgcagaagccaggccagtctccacagctcctgatctatgaggtttccaaccgggtctctggagtccctgacaggttcagtggcagtgggtcagncactgatttcacactgaaaatcagccgggtggaagctgaggatgttggggtgtattactgtatgcaaagtatagagtttcctcc F F IGK Macmul IGKV2S20*01 F gacattgtgatgacccagactccactctccctgcccgtcacccctggagagccagcctccatctcctgcaggtctagtcagagcctcttggatagt...gacgggtacacctgtttggactggtacctgcagaagccaggccagtctccacagctcctgatctatgaggtt.....................tccaaccgggtctctggagtccct...gacaggttcagtggcagtggg......tcagncactgatttcacactgaaaatcagccgggtggaagctgaggatgttggggtgtattactgtatgcaaagtatagagtttcctcc DIVMTQTPLSLPVTPGEPASISCRSSQSLLDS.DGYTCLDWYLQKPGQSPQLLIYEV.......SNRVSGVP.DRFSGSG..SXTDFTLKISRVEAEDVGVYYCMQSIEFP gatattgtgatgacccagactccactctccctgccagtcacccctggagagccggcctccatctcctgcaggtctagtcagagcctcttggatagtgaggatggaaacacctatttggaatggtacctgcagaagccaggccagtctccacagcccttgatttatgaggtt.....................tccaaccgggcctctggagtccca...gacaggttcagtggcagtggg......tcagacactgatttcacactgaaaatcagcagagtggaggctgaggatgttggggtttattactgcatgcaaggtatagagtatcctcc DIVMTQTPLSLPVTPGEPASISCRSSQSLLDSEDGNTYLEWYLQKPGQSPQPLIYEV.......SNRASGVP.DRFSGSG..SDTDFTLKISRVEAEDVGVYYCMQGIEYP cagagcctcttggatagtgacgggtacacctgt QSLLDSDGYTC gaggtttcc EVS atgcaaagtatagagtttcctcc MQSIEFP gacattgtgatgacccagactccactctccctgcccgtcacccctggagagccagcctccatctcctgcaggtctagt DIVMTQTPLSLPVTPGEPASISCRSS ttggactggtacctgcagaagccaggccagtctccacagctcctgatctat LDWYLQKPGQSPQLLIY aaccgggtctctggagtccctgacaggttcagtggcagtgggtcagncactgatttcacactgaaaatcagccgggtggaagctgaggatgttggggtgtattactgt NRVSGVPDRFSGSGSXTDFTLKISRVEAEDVGVYYC 1294 93.20 2=1X32=1X17=1X42=3D2=1X2=2X6=1X6=1X34=1X1=1X4=1X19=1X12=1X25=1M25=1X1=1X5=1X17=1X8=1X6=1X9=1X6= 1 302 1 335 1 335 79 111 163 171 280 302 1 78 112 162 172 279 
gacattgtgatgacccagactccactctccctgcccgtcacccctggagagccagcctccatctcctgcaggtctagtcagagcctcttggatagt...gacgggtacacctgtttggactggtacctgcagaagccaggccagtctccacagctcctgatctatgaggtt.....................tccaaccgggtctctggagtccct...gacaggttcagtggcagtggg......tcagncactgatttcacactgaaaatcagccgggtggaagctgaggatgttggggtgtattactgtatgcaaagtatagagtttcctcc DIVMTQTPLSLPVTPGEPASISCRSSQSLLDS.DGYTCLDWYLQKPGQSPQLLIYEV.......SNRVSGVP.DRFSGSG..SXTDFTLKISRVEAEDVGVYYCMQSIEFP gatattgtgatgacccagactccactctccctgccagtcacccctggagagccggcctccatctcctgcaggtctagtcagagcctcttggatagtgaggatggaaacacctatttggaatggtacctgcagaagccaggccagtctccacagcccttgatttatgaggtt.....................tccaaccgggcctctggagtccca...gacaggttcagtggcagtggg......tcagacactgatttcacactgaaaatcagcagagtggaggctgaggatgttggggtttattactgcatgcaaggtatagagtatcctcc DIVMTQTPLSLPVTPGEPASISCRSSQSLLDSEDGNTYLEWYLQKPGQSPQPLIYEV.......SNRASGVP.DRFSGSG..SDTDFTLKISRVEAEDVGVYYCMQGIEYP 0 0 0 0 0 0 2 (indelcorr) 0 0 0 in CDR1-IMGT, from codon 33 of V-REGION: 3 nucleotides (from position 97 in the user submitted sequence), (do not cause frameshift)
Empty file.
8 changes: 8 additions & 0 deletions test_vquest/data/test_vquest/TestVquestInvalid/config.yml
@@ -0,0 +1,8 @@
species: rhesus-monkey
receptorOrLocusType: antibody # not valid!
sequences: |
  >IGKV2-ACR*02
  GACATTGTGATGACCCAGACTCCACTCTCCCTGCCCGTCACCCCTGGAGAGCCAGCCTCCATCTCCTGCAGGTCTAGTCA
  GAGCCTCTTGGATAGTGACGGGTACACCTGTTTGGACTGGTACCTGCAGAAGCCAGGCCAGTCTCCACAGCTCCTGATCT
  ATGAGGTTTCCAACCGGGTCTCTGGAGTCCCTGACAGGTTCAGTGGCAGTGGGTCAGNCACTGATTTCACACTGAAAATC
  AGCCGGGTGGAAGCTGAGGATGTTGGGGTGTATTACTGTATGCAAAGTATAGAGTTTCCTCC
10 changes: 10 additions & 0 deletions test_vquest/data/test_vquest/TestVquestInvalid/config_inline.yml
@@ -0,0 +1,10 @@
species: rhesus-monkey
receptorOrLocusType: antibody # not valid!
resultType: excel
xv_outputtype: 3
sequences: |
  >IGKV2-ACR*02
  GACATTGTGATGACCCAGACTCCACTCTCCCTGCCCGTCACCCCTGGAGAGCCAGCCTCCATCTCCTGCAGGTCTAGTCA
  GAGCCTCTTGGATAGTGACGGGTACACCTGTTTGGACTGGTACCTGCAGAAGCCAGGCCAGTCTCCACAGCTCCTGATCT
  ATGAGGTTTCCAACCGGGTCTCTGGAGTCCCTGACAGGTTCAGTGGCAGTGGGTCAGNCACTGATTTCACACTGAAAATC
  AGCCGGGTGGAAGCTGAGGATGTTGGGGTGTATTACTGTATGCAAAGTATAGAGTTTCCTCC
10 changes: 10 additions & 0 deletions test_vquest/data/test_vquest/TestVquestSimple/config_inline.yml
@@ -0,0 +1,10 @@
species: rhesus-monkey
receptorOrLocusType: IG
resultType: excel
xv_outputtype: 3
sequences: |
  >IGKV2-ACR*02
  GACATTGTGATGACCCAGACTCCACTCTCCCTGCCCGTCACCCCTGGAGAGCCAGCCTCCATCTCCTGCAGGTCTAGTCA
  GAGCCTCTTGGATAGTGACGGGTACACCTGTTTGGACTGGTACCTGCAGAAGCCAGGCCAGTCTCCACAGCTCCTGATCT
  ATGAGGTTTCCAACCGGGTCTCTGGAGTCCCTGACAGGTTCAGTGGCAGTGGGTCAGNCACTGATTTCACACTGAAAATC
  AGCCGGGTGGAAGCTGAGGATGTTGGGGTGTATTACTGTATGCAAAGTATAGAGTTTCCTCC
@@ -0,0 +1,12 @@
Date Tue Dec 01 22:08:11 CET 2020
IMGT/V-QUEST program version 3.5.21
IMGT/V-QUEST reference directory release 202049-2
Species Macaca mulatta
Receptor type or locus IG
IMGT/V-QUEST reference directory set F+ORF+ in-frame P
Search for insertions and deletions no
Nb of nucleotides to add (or exclude) in 3' of the V-REGION for the evaluation of the alignment score 0
Nb of nucleotides to exclude in 5' of the V-REGION for the evaluation of the nb of mutations 0
Analysis of scFv no
Number of submitted sequences 1

@@ -0,0 +1,2 @@
sequence_id sequence sequence_aa rev_comp productive complete_vdj vj_in_frame stop_codon locus v_call d_call j_call c_call sequence_alignment sequence_alignment_aa germline_alignment germline_alignment_aa junction junction_aa np1 np1_aa np2 np2_aa cdr1 cdr1_aa cdr2 cdr2_aa cdr3 cdr3_aa fwr1 fwr1_aa fwr2 fwr2_aa fwr3 fwr3_aa fwr4 fwr4_aa v_score v_identity v_support v_cigar d_score d_identity d_support d_cigar j_score j_identity j_support j_cigar c_score c_identity c_support c_cigar v_sequence_start v_sequence_end v_germline_start v_germline_end v_alignment_start v_alignment_end d_sequence_start d_sequence_end d_germline_start d_germline_end d_alignment_start d_alignment_end j_sequence_start j_sequence_end j_germline_start j_germline_end j_alignment_start j_alignment_end cdr1_start cdr1_end cdr2_start cdr2_end cdr3_start cdr3_end fwr1_start fwr1_end fwr2_start fwr2_end fwr3_start fwr3_end fwr4_start fwr4_end v_sequence_alignment v_sequence_alignment_aa d_sequence_alignment d_sequence_alignment_aa j_sequence_alignment j_sequence_alignment_aa c_sequence_alignment c_sequence_alignment_aa v_germline_alignment v_germline_alignment_aa d_germline_alignment d_germline_alignment_aa j_germline_alignment j_germline_alignment_aa c_germline_alignment c_germline_alignment_aa junction_length junction_aa_length np1_length np2_length n1_length n2_length p3v_length p5d_length p3d_length p5j_length consensus_count duplicate_count cell_id clone_id rearrangement_id repertoire_id rearrangement_set_id sequence_analysis_category d_number 5prime_trimmed_n_nb 3prime_trimmed_n_nb insertions deletions junction_decryption
IGKV2-ACR*02 gacattgtgatgacccagactccactctccctgcccgtcacccctggagagccagcctccatctcctgcaggtctagtcagagcctcttggatagtgacgggtacacctgtttggactggtacctgcagaagccaggccagtctccacagctcctgatctatgaggtttccaaccgggtctctggagtccctgacaggttcagtggcagtgggtcagncactgatttcacactgaaaatcagccgggtggaagctgaggatgttggggtgtattactgtatgcaaagtatagagtttcctcc F F IGK Macmul IGKV2S20*01 F gacattgtgatgacccagactccactctccctgcccgtcacccctggagagccagcctccatctcctgcaggtctagtcagagcctcttggatagt...gacgggtacacctgtttggactggtacctgcagaagccaggccagtctccacagctcctgatctatgaggtt.....................tccaaccgggtctctggagtccct...gacaggttcagtggcagtggg......tcagncactgatttcacactgaaaatcagccgggtggaagctgaggatgttggggtgtattactgtatgcaaagtatagagtttcctcc DIVMTQTPLSLPVTPGEPASISCRSSQSLLDS.DGYTCLDWYLQKPGQSPQLLIYEV.......SNRVSGVP.DRFSGSG..SXTDFTLKISRVEAEDVGVYYCMQSIEFP gatattgtgatgacccagactccactctccctgccagtcacccctggagagccggcctccatctcctgcaggtctagtcagagcctcttggatagtgaggatggaaacacctatttggaatggtacctgcagaagccaggccagtctccacagcccttgatttatgaggtt.....................tccaaccgggcctctggagtccca...gacaggttcagtggcagtggg......tcagacactgatttcacactgaaaatcagcagagtggaggctgaggatgttggggtttattactgcatgcaaggtatagagtatcctcc DIVMTQTPLSLPVTPGEPASISCRSSQSLLDSEDGNTYLEWYLQKPGQSPQPLIYEV.......SNRASGVP.DRFSGSG..SDTDFTLKISRVEAEDVGVYYCMQGIEYP cagagcctcttggatagtgacgggtacacctgt QSLLDSDGYTC gaggtttcc EVS atgcaaagtatagagtttcctcc MQSIEFP gacattgtgatgacccagactccactctccctgcccgtcacccctggagagccagcctccatctcctgcaggtctagt DIVMTQTPLSLPVTPGEPASISCRSS ttggactggtacctgcagaagccaggccagtctccacagctcctgatctat LDWYLQKPGQSPQLLIY aaccgggtctctggagtccctgacaggttcagtggcagtgggtcagncactgatttcacactgaaaatcagccgggtggaagctgaggatgttggggtgtattactgt NRVSGVPDRFSGSGSXTDFTLKISRVEAEDVGVYYC 1294 93.20 2=1X32=1X17=1X42=3D2=1X2=2X6=1X6=1X34=1X1=1X4=1X19=1X12=1X25=1M25=1X1=1X5=1X17=1X8=1X6=1X9=1X6= 1 302 1 335 1 335 79 111 163 171 280 302 1 78 112 162 172 279 
gacattgtgatgacccagactccactctccctgcccgtcacccctggagagccagcctccatctcctgcaggtctagtcagagcctcttggatagt...gacgggtacacctgtttggactggtacctgcagaagccaggccagtctccacagctcctgatctatgaggtt.....................tccaaccgggtctctggagtccct...gacaggttcagtggcagtggg......tcagncactgatttcacactgaaaatcagccgggtggaagctgaggatgttggggtgtattactgtatgcaaagtatagagtttcctcc DIVMTQTPLSLPVTPGEPASISCRSSQSLLDS.DGYTCLDWYLQKPGQSPQLLIYEV.......SNRVSGVP.DRFSGSG..SXTDFTLKISRVEAEDVGVYYCMQSIEFP gatattgtgatgacccagactccactctccctgccagtcacccctggagagccggcctccatctcctgcaggtctagtcagagcctcttggatagtgaggatggaaacacctatttggaatggtacctgcagaagccaggccagtctccacagcccttgatttatgaggtt.....................tccaaccgggcctctggagtccca...gacaggttcagtggcagtggg......tcagacactgatttcacactgaaaatcagcagagtggaggctgaggatgttggggtttattactgcatgcaaggtatagagtatcctcc DIVMTQTPLSLPVTPGEPASISCRSSQSLLDSEDGNTYLEWYLQKPGQSPQPLIYEV.......SNRASGVP.DRFSGSG..SDTDFTLKISRVEAEDVGVYYCMQGIEYP 0 0 0 0 0 0 1 (noindelsearch) 0 0 0
35 changes: 35 additions & 0 deletions test_vquest/test_util.py
@@ -24,3 +24,38 @@ def test_chunker(self):
        for chunk in util.chunker(range(5), 5):
            chunks.append(chunk)
        self.assertEqual([[0, 1, 2, 3, 4]], chunks)

class TestUnzip(unittest.TestCase):
    """Basic test of the unzip helper."""

    def test_unzip(self):
        """Test that binary ZIP data with an empty file can be extracted."""
        self.assertEqual(
            util.unzip(bytes.fromhex(
                "504b03040a0000000000ab6c"
                "ef5200000000000000000000"
                "000008001c00746573742e64"
                "617455540900035272f06052"
                "72f06075780b000104e90300"
                "0004e9030000504b01021e03"
                "0a0000000000ab6cef520000"
                "000000000000000000000800"
                "18000000000000000000b481"
                "00000000746573742e646174"
                "55540500035272f06075780b"
                "000104e903000004e9030000"
                "504b05060000000001000100"
                "4e000000420000000000")),
            {"test.dat": b""})

class TestAirrToFasta(unittest.TestCase):
    """Basic test of the airr_to_fastas helper."""

    def test_airr_to_fasta(self):
        """Test that FASTA is generated from AIRR TSV."""
        expected = ">1\nACTG\n>2\nCGTA\n"
        observed = util.airr_to_fasta(
            "sequence_id\tsequence\tsequence_alignment\n"
            "1\tACTG\tACTG\n"
            "2\t\tCGTA\n")
        self.assertEqual(observed, expected)
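
The two tests above pin down the expected behaviour of the helpers without showing them. A minimal sketch of functions that would satisfy those tests, using only the standard library, might look like this. The names `unzip_sketch` and `airr_to_fasta_sketch` are hypothetical, and the real `util.unzip` and `util.airr_to_fasta` may differ in signature and details. The column names come from the test input; using `sequence_alignment` as the FASTA sequence is inferred from the expected output.

```python
import csv
import io
import zipfile


def unzip_sketch(data):
    """Map each member name in a ZIP archive (given as bytes) to its contents."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


def airr_to_fasta_sketch(airr_txt, seqid_col="sequence_id",
                         aln_col="sequence_alignment"):
    """Build FASTA text from AIRR TSV text, one record per row.

    The alignment column (not the raw sequence column) supplies the
    sequence, matching the expected output in the test above.
    """
    reader = csv.DictReader(io.StringIO(airr_txt), delimiter="\t")
    fasta = ""
    for row in reader:
        fasta += f">{row[seqid_col]}\n{row[aln_col]}\n"
    return fasta
```

Reading the archive through `io.BytesIO` avoids writing the server response to disk before extracting it, which fits a helper that returns a dictionary of filenames to contents.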