Variants shouldn't be mapped to reference sequences #1

ypriverol · 2021-07-27T17:27:08Z

As I mentioned in the discussion the other day, it would be great if the tool had logic for the following case: the user allows mismatches with the -mm NUM option, and the query peptide matches some proteins with 0 mismatches and others with 1, 2, etc. mismatches. In that case, the tool should have a way to discard the 1-, 2-, etc. mismatch matches. The use case is:

Peptide query: AAAA -> Protein A contains ..AAAA.. and Protein B contains ..AVAA.. We discard the second match because it is far more likely (I would say 100%) that this is only a reference peptide. James mentioned that he does a two-step search: he discards reference peptides first and then searches with multiple gaps.

Would be great if we can implement that feature.
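A minimal sketch of the requested filter. It assumes each candidate match has already been reduced to a protein ID plus the aligned protein window (same length as the peptide), and that mismatches are substitutions only (Hamming distance), as `-mm` suggests. `Hit`, `mismatches` and `keepBestHits` are hypothetical names, not pepgenome's API.

```java
import java.util.ArrayList;
import java.util.List;

public class MinMismatchFilter {

    /** Hypothetical candidate match: protein ID plus the protein window the peptide aligned to. */
    static final class Hit {
        final String proteinId;
        final String window;

        Hit(String proteinId, String window) {
            this.proteinId = proteinId;
            this.window = window;
        }
    }

    /** Number of substituted residues between two equal-length sequences (Hamming distance). */
    static int mismatches(String peptide, String window) {
        if (peptide.length() != window.length()) {
            throw new IllegalArgumentException("peptide and window lengths differ");
        }
        int count = 0;
        for (int i = 0; i < peptide.length(); i++) {
            if (peptide.charAt(i) != window.charAt(i)) {
                count++;
            }
        }
        return count;
    }

    /**
     * Keeps only the hits with the lowest mismatch count. If any hit is an exact
     * (0-mismatch, reference) match, every 1- or 2-mismatch hit is discarded.
     */
    static List<Hit> keepBestHits(String peptide, List<Hit> hits) {
        int best = Integer.MAX_VALUE;
        for (Hit h : hits) {
            best = Math.min(best, mismatches(peptide, h.window));
        }
        List<Hit> kept = new ArrayList<>();
        for (Hit h : hits) {
            if (mismatches(peptide, h.window) == best) {
                kept.add(h);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The AAAA example from the issue: ProteinA matches exactly, ProteinB with one mismatch.
        List<Hit> hits = List.of(new Hit("ProteinA", "AAAA"), new Hit("ProteinB", "AVAA"));
        for (Hit h : keepBestHits("AAAA", hits)) {
            System.out.println(h.proteinId);
        }
    }
}
```

Run as-is, `main` prints only `ProteinA`. Keeping the minimum mismatch count, rather than only dropping mismatched hits when an exact hit exists, is a design choice: it also drops 2-mismatch hits whenever a 1-mismatch hit exists. For the 0-mismatch case it should agree with the two-step approach (map reference matches first, then search again only for the remaining peptides).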


EamonnOCearnaigh · 2021-08-18T19:17:24Z

Sounds great - I'll see if I can plan out another edit to implement that this week. Hey, also, James passed a Maven command on to me, but I am still getting errors when running pepgenome from the terminal. Would you have any ideas?

ypriverol · 2021-08-18T20:05:16Z

Which error do you get?

ypriverol · 2021-08-19T06:27:08Z

This command should work:

$ java -jar pepgenome-1.1.1-bin.jar
usage: Arguments: -fasta TRANSL -gtf ANNO -in *.tsv[,*.tsv] [-format OUTF]
                  [-merge TRUE/FALSE] [-source SRC] [-mm NUM] [-mmmode
                  TRUE/FALSE] [-species SPECIES] [-chr 0/1]
 -ann <arg>              Filepath for file containing genome annotation in
                         GTF or GFF3 format
 -chr <arg>              Export chr prefix Allowed 0, 1  (default: 0)
 -exco <arg>             Use exon coordinates rather than CDS (Unannotated
                         peptides)
 -fasta <arg>            Filepath for file containing protein sequences in
                         FASTA format
 -format <arg>           Select the output formats from gtf, gct, bed,
                         ptmbed, all or combinations thereof separated by
                         ',' (default all)
 -genome <arg>           Filepath for file containing genome sequence in
                         FASTA format used to extract chromosome names and
                         order and differenciate between assembly and
                         scaffolds. If not set chromosome and scaffold
                         names and order is extracted from GTF input.
 -gff <arg>              Filepath for file containing genome annotation in
                         GFF3 format
 -gtf <arg>              Filepath for file containing genome annotation in
                         GTF format
 -h                      Print this help & exit
 -in <arg>               Comma(,) separated file paths for files
                         containing peptide identifications (Contents of
                         the file can tab separated format. i.e., File
                         format: four columns: SampleName
                         PeptideSequence
                         PSMs
                         Quant; or mzTab, and mzIdentML)
 -inf <arg>              Format of the input file (mztab, mzid, pavro, or
                         tsv). (default tsv)
 -inm <arg>              Compute the kmer algorithm in memory or using
                         database algorithm (default 0, database 1)
 -merge <arg>            Set 'true' to merge mappings from all files from
                         input (default 'false')
 -mm <arg>               Allowed mismatches (0, 1 or 2; default: 0)
 -mmmode <arg>           Mismatch mode (true or false): if true
                         mismatching with two mismatches will only allow 1
                         mismatch every kmersize (default: 5) positions.
                         (default: false)
 -source <arg>           Please give a source name which will be used in
                         the second column in the output gtf file
                         (default: PoGo)
 -variant_filter <arg>   Peptide filter mode.
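The `-mmmode` description above is terse. One possible reading, assumed here rather than taken from the pepgenome source, is that the peptide is cut into consecutive blocks of `kmersize` residues and each block may hold at most one mismatch. A sketch of that check, with hypothetical names:

```java
import java.util.HashSet;
import java.util.Set;

public class MismatchModeCheck {

    /**
     * Assumed -mmmode rule: at most maxMismatches substitutions in total, and at most
     * one substitution inside each consecutive block of kmerSize positions.
     */
    static boolean allowed(String peptide, String window, int maxMismatches, int kmerSize) {
        if (peptide.length() != window.length()) {
            return false;
        }
        Set<Integer> blocksWithMismatch = new HashSet<>();
        int total = 0;
        for (int i = 0; i < peptide.length(); i++) {
            if (peptide.charAt(i) != window.charAt(i)) {
                total++;
                // Set.add returns false if this block already holds a mismatch
                if (!blocksWithMismatch.add(i / kmerSize)) {
                    return false;
                }
            }
        }
        return total <= maxMismatches;
    }

    public static void main(String[] args) {
        // Mismatches at positions 1 and 7 fall in different 5-residue blocks.
        System.out.println(allowed("PEPTIDEKR", "PAPTIDEAR", 2, 5));
        // Mismatches at positions 1 and 3 share the first block.
        System.out.println(allowed("PEPTIDEKR", "PAPAIDEKR", 2, 5));
    }
}
```

With the default k-mer size of 5, `main` prints `true` and then `false`. Without `-mmmode`, only the `total <= maxMismatches` part would apply.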

EamonnOCearnaigh · 2021-08-26T02:24:54Z

Hey, sorry for the delay - job hunting at the minute since my placement contract is ending.

I'm having trouble actually generating the JAR. The JAR wasn't working when generated from the IntelliJ Java application configurations I was using for debugging, so I switched to a Maven configuration and I'm trying to get it to build. I tried running "mvn install" on the command line, which downloaded some files but resulted in:

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 06:38 min
[INFO] Finished at: 2021-08-26T03:10:49+01:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project pepgenome: Could not resolve dependencies for project org.bigbio.pgatk:pepgenome:jar:1.1.beta: Could not find artifact com.sun.java:tools:jar:13.0.2 at specified path C:\Program Files\Java\jdk-13.0.2/../lib/tools.jar

Everything is set to use the same version of Java; any idea what's causing this? I didn't get this error during the testing stages at all.
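Some background that may explain this error: `tools.jar` was removed from the JDK in Java 9 (JEP 220), and the path in the message looks like the common `${java.home}/../lib/tools.jar` system path, which cannot exist on JDK 13. A hypothetical check (not part of the pepgenome build) that reports the running Java version and whether that file is present:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ToolsJarCheck {

    /** Where a ${java.home}/../lib/tools.jar system path resolves to. */
    static Path expectedToolsJar(String javaHome) {
        return Paths.get(javaHome, "..", "lib", "tools.jar").normalize();
    }

    /** tools.jar ships only with JDK 8 and earlier; it was removed in JDK 9 (JEP 220). */
    static boolean jdkShipsToolsJar(int featureVersion) {
        return featureVersion <= 8;
    }

    public static void main(String[] args) {
        String javaHome = System.getProperty("java.home");
        int feature = Runtime.version().feature(); // feature() requires Java 10+
        Path toolsJar = expectedToolsJar(javaHome);
        System.out.println("java.home       = " + javaHome);
        System.out.println("feature version = " + feature);
        System.out.println("tools.jar       = " + toolsJar + " (exists: " + Files.exists(toolsJar) + ")");
        System.out.println("JDK ships tools.jar: " + jdkShipsToolsJar(feature));
    }
}
```

On a JDK 13 install this should report that the file does not exist, which suggests removing or profile-guarding the `tools` dependency rather than changing the Java setup.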

ypriverol · 2021-08-26T21:43:24Z

I removed that dependency. Can you try now, @EamonnOCearnaigh?

EamonnOCearnaigh · 2021-08-26T22:55:19Z

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 8.108 s
[INFO] Finished at: 2021-08-26T23:54:46+01:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project pepgenome: Could not resolve dependencies for project io.github.bigbio:pepgenome:jar:1.1.1: Failed to collect dependencies at uk.ac.ebi.jmzidml:jmzidentml:jar:1.2.11 -> uk.ac.ebi.pride.architectural:pride-xml-handling:pom:1.0.3 -> psidev.psi.tools:xxindex:jar:0.23 -> net.sourceforge.cpdetector:cpdetector:jar:1.0.7 -> net.sourceforge.jargs:jargs:jar:1.0: Failed to read artifact descriptor for net.sourceforge.jargs:jargs:jar:1.0: Could not transfer artifact net.sourceforge.jargs:jargs:pom:1.0 from/to sonatype-release (https://oss.sonatype.org/service/local/staging/deploy/maven2): authentication failed for https://oss.sonatype.org/service/local/staging/deploy/maven2/net/sourceforge/jargs/jargs/1.0/jargs-1.0.pom, status: 401 Unauthorized -> [Help 1]

EamonnOCearnaigh · 2021-08-26T22:56:38Z

I'm running "mvn install" on the command line. Is that the right command?

ypriverol · 2021-08-27T07:53:14Z

Yes, mvn install is fine. Which Java version do you have? Most of these issues are related to the Java version you are using.

ypriverol · 2021-08-27T17:02:32Z

I have added the dependency to our internal maven repo. Can you try again @EamonnOCearnaigh ?

EamonnOCearnaigh · 2021-08-27T17:41:03Z

It got further this time.

Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/struts/struts-core/1.3.8/struts-core-1.3.8.jar (329 kB at 270 kB/s)
Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/struts/struts-taglib/1.3.8/struts-taglib-1.3.8.jar (252 kB at 205 kB/s)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:00 min
[INFO] Finished at: 2021-08-27T18:29:02+01:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:3.1.1:jar (attach-javadocs) on project pepgenome: MavenReportException: Error while generating Javadoc: Unable to find javadoc command: The environment variable JAVA_HOME is not correctly set. -> [Help 1]

Also my Java version is:

java version "13.0.2" 2020-01-14
Java(TM) SE Runtime Environment (build 13.0.2+8)
Java HotSpot(TM) 64-Bit Server VM (build 13.0.2+8, mixed mode, sharing)

ypriverol · 2021-08-27T18:26:49Z

Java can't find the javadoc command. I have no idea why. Can you check on Stack Overflow?
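One way to start narrowing this down is a small hypothetical diagnostic (not part of pepgenome or Maven) that checks whether a `javadoc` launcher exists under the `JAVA_HOME` directory and under the running JVM's `java.home`. It assumes a standard JDK layout with launchers in `bin/` and does not reproduce how maven-javadoc-plugin actually resolves the command.

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

public class JavadocCheck {

    /** Name of the javadoc launcher on the given OS ("javadoc.exe" on Windows). */
    static String javadocName(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows") ? "javadoc.exe" : "javadoc";
    }

    /** Where a JDK rooted at home keeps its javadoc launcher. */
    static Path javadocUnder(String home, String osName) {
        return Paths.get(home, "bin", javadocName(osName));
    }

    static void report(String label, String home, String osName) {
        if (home == null) {
            System.out.println(label + " is not set");
            return;
        }
        try {
            Path javadoc = javadocUnder(home, osName);
            System.out.println(label + ": " + javadoc + " exists = " + Files.isRegularFile(javadoc));
        } catch (InvalidPathException e) {
            // e.g. stray double quotes copied into the variable on Windows
            System.out.println(label + " is not a valid path: " + home);
        }
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        report("JAVA_HOME (environment)", System.getenv("JAVA_HOME"), os);
        report("java.home (running JVM)", System.getProperty("java.home"), os);
    }
}
```

If `JAVA_HOME` is unset, points at the `bin` folder instead of the JDK root, or points at a JRE, the first line should make that visible.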

EamonnOCearnaigh · 2021-09-01T20:56:53Z

Will do. For now, do you have a working JAR you could post to GitHub or send by email? We just need to place it into a Nextflow pipeline.

ypriverol transferred this issue from bigbio/pgatk Aug 18, 2021
