Scripts to preprocess deep mutational scanning (DMS) data. The data are analysed for combinability of mutations and the effect of epistasis.
If a residue distance matrix is available, structural epistasis graphs can be generated that visualise combinability hotspots and their centrality in the protein.
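Before a distance matrix is used for graph generation, it can help to confirm that it has the expected form. The following is a minimal sketch, not code from this repository: the helper name `check_distance_matrix` is hypothetical, and it assumes the matrix is square, symmetric and non-negative, which is typical for residue-residue distances but is not stated here.

```python
import numpy as np


def check_distance_matrix(matrix):
    """Return the number of residues after basic sanity checks.

    Hypothetical helper; assumes a square, symmetric, non-negative matrix.
    """
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        raise ValueError(f"expected a square matrix, got shape {matrix.shape}")
    if not np.allclose(matrix, matrix.T):
        raise ValueError("distance matrix is not symmetric")
    if (matrix < 0).any():
        raise ValueError("distances must be non-negative")
    return matrix.shape[0]


# Toy 3-residue example; a real matrix would be loaded with np.load(path).
toy = np.array([[0.0, 3.8, 6.1],
                [3.8, 0.0, 3.8],
                [6.1, 3.8, 0.0]])
print(check_distance_matrix(toy))  # 3
```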
The reference wild-type sequence of the IRED, which is used in all scripts, is:
reference = "MRDTDVTVLGLGLMGQALAGAFLKDGHATTVWNRSEGKAGQLAEQGAVLASSARDAAEASPLVVVCVSDHAAVRAVLDPLGDVLAGRVLVNLTSGTSEQARATAEWAAERGITYLDGAIMAIPQVVGTADAFLLYSGPEAAYEAHEPTLRSLGAGTTYLGADHGLSSLYDVALLGIMWGTLNSFLHGAALLGTAKVEATTFAPFANRWIEAVTGFVSAYAGQVDQGAYPALDATIDTHVATVDHLIHESEAAGVNTELPRLVRTLADRALAGGQGGLGYAAMIEQFRSPS*"
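To illustrate how a variant can be compared against the reference, here is a small sketch that lists the substitutions between two sequences of equal length. The function `list_substitutions` and the "wild type, 1-based position, mutant" notation (e.g. `D3A`) are assumptions made for illustration; the scripts in this repository may represent mutations differently.

```python
def list_substitutions(reference, variant):
    """Return substitutions such as 'D3A' (wild type, 1-based position, mutant).

    Hypothetical helper for illustration only.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        f"{wt}{pos}{mut}"
        for pos, (wt, mut) in enumerate(zip(reference, variant), start=1)
        if wt != mut
    ]


toy_reference = "MRDTDV"  # first six residues of the reference above
print(list_substitutions(toy_reference, "MRATDV"))  # ['D3A']
print(list_substitutions(toy_reference, "MRATDA"))  # ['D3A', 'V6A']
```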
The DMS data are stored in csv files. The IRED DMS data csv files used in this repository are:
srired_active_data.csv
pcired_active_data.csv
The easiest way to run the scripts is via Google Colab. The Jupyter notebook Epistasis_analysis.ipynb
can directly be opened on Colab by pressing the following Open in Colab button:
After opening the notebook in Colab, the Python scripts analysis_utils.py
and plotting_utils.py
need to be uploaded. These scripts contain the core functions for the analyses and plots to be carried out. Also, the csv files srired_active_data.csv
and pcired_active_data.csv
containing the DMS data, as well as the distance matrix npy files srIRED_min_ca_dimer_distances.npy
and pcIRED_min_ca_dimer_distances.npy
must be uploaded as shown by the following screenshot:
The analyses can also be run from a terminal.
The following packages are required:
- Python >= 3.9
- pandas >= 1.5.1
pip install pandas
- numpy >= 1.23.4
pip install numpy
- scipy >= 1.9.3
pip install scipy
- scikit-learn >= 1.1.3
pip install -U scikit-learn
- networkx >= 2.8.7
pip install networkx
- matplotlib >= 3.6.0
pip install -U matplotlib
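To confirm that an environment satisfies the minimum versions listed above, a short check along the following lines may be useful. This is a sketch using only the standard library, not part of the repository; `parse_version` and `check_requirements` are hypothetical helpers, and the version parsing is deliberately simple (it keeps only the leading digits of each dot-separated part).

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirements list above.
MINIMUM_VERSIONS = {
    "pandas": "1.5.1",
    "numpy": "1.23.4",
    "scipy": "1.9.3",
    "scikit-learn": "1.1.3",
    "networkx": "2.8.7",
    "matplotlib": "3.6.0",
}


def parse_version(text):
    """Turn '1.23.4' into (1, 23, 4), keeping the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first part with no leading digits
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return {package: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements().items():
        status = "ok" if ok else "missing or too old"
        print(f"{name}: {installed} ({status})")
```

Packages reported as missing or too old can then be installed or upgraded with the pip commands listed above.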
The core analysis is implemented in epistasis_analysis.py
and can be run with the following command:
python3 epistasis_analysis.py