This repository has been archived by the owner on Jul 10, 2018. It is now read-only.

Protein Secondary Structure Support Vector Machine Predictor

This predictor takes protein sequences in FASTA files and predicts the secondary structure of each amino acid in 3-state format.

It is trained on the cas2.3line dataset.

data/testset.dat contains example sequences it can predict.
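
For readers unfamiliar with the FASTA input mentioned above, here is a minimal sketch of reading such a file. It is not taken from predictor.py: the function name `read_fasta` and the example records are hypothetical, and the real script may parse its input differently.

```python
def read_fasta(lines):
    """Return a dict mapping each header (without '>') to its full sequence."""
    records = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # sequence lines may be wrapped, so collect all of them
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical example input (not from data/testset.dat)
example = """>protein_1
MKTAYIAKQR
QISFVKSHFS
>protein_2
GSHMLEDPVD
"""

sequences = read_fasta(example.splitlines())
for name, seq in sequences.items():
    print(name, len(seq))
```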

To use this predictor, set the working directory to /StrucPred and run predictor.py.

Features:

  • Uses evolutionary information (PSI-BLAST and PSSM).

  • Uses information from neighboring amino acids in the prediction (by building amino acid windows).

  • A variety of SVM methods to choose from, including linear SVM and RBF SVM.

  • Cross-validation is used to split the dataset.

  • Other machine learning methods, including random forest and a simple decision tree, can be used to compare prediction results.

  • Predictions and their evaluations are stored in the result folder. You can find evaluations of some predictions I have tried there.
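
The windowing idea from the feature list can be sketched as follows. This is an illustration only, not the code in predictor.py: the function name `build_windows`, the window size of 3, and the zero-padding at the sequence ends are all assumptions.

```python
def build_windows(profile, window_size=3):
    """Return one flattened feature vector per residue.

    profile: list of per-residue feature rows (for example, 20 PSSM scores each).
    window_size: odd number of residues per window (assumed value).
    Positions beyond the sequence ends are padded with zeros (an assumption).
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd")
    half = window_size // 2
    width = len(profile[0]) if profile else 0
    pad = [0] * width
    padded = [pad] * half + list(profile) + [pad] * half
    windows = []
    for i in range(len(profile)):
        flat = []
        # concatenate the rows of the residue and its neighbors
        for row in padded[i:i + window_size]:
            flat.extend(row)
        windows.append(flat)
    return windows


# Toy profile: 3 residues with 2 features each
toy = [[1, 2], [3, 4], [5, 6]]
for vector in build_windows(toy):
    print(vector)
```

Each residue ends up with `window_size * width` features, so the classifier sees the local context rather than a single position.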

This predictor is written in Python 3.

Download:

Packages required:

To use this predictor, you need pickle (part of the Python standard library), pandas 0.22.0, and scikit-learn 0.19.1.

Models to choose:

There are several models to choose from in the model folder. You can change the model on line 293 of predictor.py.
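
Since pickle is listed among the requirements, the models are presumably stored as pickled objects, although this README does not say so explicitly. Under that assumption, saving and loading a model could look like the sketch below. The file name `example_model.pkl`, the helper names, and the placeholder dictionary are hypothetical.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize a model object to disk with pickle."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Load a pickled model object from disk."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in object; a real run would use a trained scikit-learn estimator.
placeholder_model = {"kernel": "linear", "window_size": 3}

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "example_model.pkl")  # hypothetical file name
    save_model(placeholder_model, model_path)
    restored = load_model(model_path)

print(restored == placeholder_model)
```

Only unpickle files from a source you trust, because loading a pickle can execute arbitrary code.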

Evaluation:

The evaluations are stored in the /result folder and include the Q3 score and the correlation coefficient.
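
As a rough illustration of these two measures, the sketch below computes Q3 (the fraction of residues whose state is predicted correctly) and a per-state Matthews correlation coefficient. It assumes the three states are labeled H, E and C and that the coefficient in question is the Matthews one. Neither is confirmed by this README, and the function names are hypothetical.

```python
import math


def q3(true_states, pred_states):
    """Fraction of residues whose predicted state matches the observed one."""
    if len(true_states) != len(pred_states):
        raise ValueError("sequences must have the same length")
    if not true_states:
        return 0.0
    correct = sum(t == p for t, p in zip(true_states, pred_states))
    return correct / len(true_states)


def matthews_cc(true_states, pred_states, state):
    """Matthews correlation coefficient for one state versus the rest."""
    tp = fp = tn = fn = 0
    for t, p in zip(true_states, pred_states):
        if p == state and t == state:
            tp += 1
        elif p == state:
            fp += 1
        elif t == state:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when the coefficient is undefined (a common convention)
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up observed and predicted states, assuming H/E/C labels
observed = "HHHHEEEECCCC"
predicted = "HHHCEEECCCCC"

print("Q3:", q3(observed, predicted))
for s in "HEC":
    print(s, matthews_cc(observed, predicted, s))
```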

You can also get the cross-validation score and prediction accuracy by removing the triple quotes around the corresponding code. This may take a long time.
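
For orientation, a cross-validation run comparing a linear and an RBF SVM in scikit-learn might look like the sketch below. It uses synthetic stand-in data rather than the real windowed PSSM features, and the choice of 5 folds and `C=1.0` is an assumption, not the setting used in predictor.py.

```python
import random

from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = random.Random(0)

# Synthetic stand-in data: 90 samples, 6 features, 3 classes (not protein data)
X, y = [], []
for label in range(3):
    for _ in range(30):
        X.append([label + rng.gauss(0, 0.3) for _ in range(6)])
        y.append(label)

results = {}
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0)
    # 5-fold cross-validation; returns one accuracy score per fold
    results[kernel] = cross_val_score(clf, X, y, cv=5)
    print(kernel, round(results[kernel].mean(), 3))
```

On the real dataset each fold retrains the SVM on many thousands of residue windows, which is why this step can take a long time.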

Coding files:

  1. model_*.py files are the scripts I used to create the different models.

  2. additional_dataset_parser.py is used to parse an additional 50 protein sequences.

PSSM:

In the pssm folder:

Folder 'Sequences': sequences to be run through PSI-BLAST

Folder 'psiblast_pssm': raw PSSM results produced by PSI-BLAST

Folder 'pssmMatrix': PSSMs in CSV format

formatdb.sh: formats the PSI-BLAST database

psiblast.sh: runs PSI-BLAST

extractPSSM.py: converts raw PSSM results to pssm.csv

parser_PSSMtoSVM_MultipleFiles.py: parses the PSSMs for later use in the SVM
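
The conversion from a raw PSSM to CSV could be sketched roughly as below. This is not the code in extractPSSM.py. It assumes the raw files follow the usual ASCII PSSM layout, in which each data row starts with a position number and a residue letter followed by 20 substitution scores. The function names and the sample text are hypothetical.

```python
import csv
import io

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # column order assumed for the PSSM


def parse_ascii_pssm(lines):
    """Return rows of [position, residue, 20 scores] from ASCII PSSM text."""
    rows = []
    for line in lines:
        parts = line.split()
        # Data rows: integer position, one-letter residue, then at least 20 fields
        if len(parts) >= 22 and parts[0].isdigit() and len(parts[1]) == 1:
            scores = [int(value) for value in parts[2:22]]
            rows.append([int(parts[0]), parts[1]] + scores)
    return rows


def rows_to_csv(rows):
    """Render parsed rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["position", "residue"] + list(AMINO_ACIDS))
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample in the assumed layout (scores are not real PSI-BLAST output)
SAMPLE = """
Last position-specific scoring matrix computed
            A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
    1 M    -1  -1  -2  -3  -1   0  -2  -3  -2   1   2  -1   5   0  -2  -1  -1  -1  -1   1  0.35 0.10
    2 K    -1   2   0  -1  -3   1   1  -2  -1  -3  -2   5  -1  -3  -1   0  -1  -3  -2  -2  0.40 0.12
"""

pssm_rows = parse_ascii_pssm(SAMPLE.splitlines())
print(rows_to_csv(pssm_rows))
```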