Bibliography
Use this space to write about the papers you think are interesting.
Predicting microbial community compositions in wastewater treatment plants using artificial neural networks
Citation: Liu, X., Nie, Y. & Wu, XL. Predicting microbial community compositions in wastewater treatment plants using artificial neural networks. Microbiome 11, 93 (2023).
DOI: https://doi.org/10.1186/s40168-023-01519-9
Description: The authors used 16S rRNA sequences from 1186 activated sludge samples to train a three-layer, fully connected artificial neural network in PyTorch. They attempted to predict alpha diversity (Shannon–Wiener index, Pielou's evenness index, species richness, and Faith's phylogenetic diversity) and the relative abundances of the 1493 ASVs found in more than 10% of samples (these account for only 3% of all ASVs but 64% of total ASV abundance). They also predicted community structure by trying to recover the ASV10%/core ASVs of each sample. The predictors were 50 environmental variables covering the design and operation of the wastewater treatment plant, climatic conditions, inflow conditions, and more.
"We found that taxa with high relative abundance, high occurrence frequency, and low estimated migration rate were more accurately predicted by the ANN model. Furthermore, the presence of industrial wastewater in the inflow significantly impacted the prediction of microbial communities, as demonstrated by the weight analysis of environmental factors in the ANN models."
Thoughts: I think this paper is documented well enough for us to try to recreate it. They're using PyTorch, but we could do a TensorFlow implementation or just re-run their code first to see what we find. They had trouble predicting anything for the less common ASVs because those ASVs appear too rarely in the dataset. I wonder if we can find another way of building a neural network that works well for low-abundance ASVs. I also like the idea of feeding the model different values for the environmental parameters and seeing whether it produces a reasonable-looking community, judged by its functional profile.
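If we do try to recreate this, the prediction targets are easy to compute ourselves. Below is a minimal sketch of three of the alpha diversity indices named in the description (Faith's phylogenetic diversity is left out because it needs a tree) and of the "present in more than 10% of samples" filter. The function names and the toy count table are made up for illustration and are not taken from the paper's code.

```python
import math

def shannon_index(counts):
    """Shannon-Wiener index H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def richness(counts):
    """Number of taxa observed (count > 0) in the sample."""
    return sum(1 for c in counts if c > 0)

def pielou_evenness(counts):
    """Pielou's J = H' / ln(S); returned as 0 when fewer than two taxa are present."""
    s = richness(counts)
    if s < 2:
        return 0.0
    return shannon_index(counts) / math.log(s)

def prevalent_taxa(table, min_fraction=0.10):
    """Column indices of taxa present in more than min_fraction of samples."""
    n_samples = len(table)
    keep = []
    for j in range(len(table[0])):
        present = sum(1 for row in table if row[j] > 0)
        if present / n_samples > min_fraction:
            keep.append(j)
    return keep

# Hypothetical toy table: rows are samples, columns are ASVs.
toy_table = [
    [10, 10, 0, 0],
    [5, 0, 5, 0],
    [20, 0, 0, 0],
]

for sample in toy_table:
    print(richness(sample), shannon_index(sample), pielou_evenness(sample))
print(prevalent_taxa(toy_table))
```

On real data the table would be the ASV count matrix from the paper's additional files, and the per-sample indices would become regression targets for the network.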
Data availability: "The raw data in this study is from reference [3]. All analyzed data in this study is available in Additional file 3. The source code is available at https://github.com/Neina-0830/WWTP_community_prediction."
Community composition of microbial microcosms follows simple assembly rules at evolutionary timescales
Citation: Meroz, N., Tovi, N., Sorokin, Y. et al. Community composition of microbial microcosms follows simple assembly rules at evolutionary timescales. Nat Commun 12, 2891 (2021).
DOI: https://doi.org/10.1038/s41467-021-23247-0
Description: Great paper from Jonathan Friedman's lab on predicting microbial community assembly/evolution. The authors tracked the dynamics of 87 two- and three-species bacterial communities, with 3–18 replicates each, for ~400 generations and used a previously derived bottom-up model to predict the resulting communities. The bottom-up model works as follows: "the fraction of a species when grown in a multispecies community is predicted as the weighted geometric mean of the fraction of the species in all pairwise cultures". So they first had to supplement the evolution data with all pairwise combinations.
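To make the quoted rule concrete, here is a minimal sketch of a weighted geometric mean over pairwise fractions. The quote does not say how the weights are chosen, so this sketch takes them as an input and falls back to equal weights; that fallback, the final renormalisation, and the toy numbers are all assumptions for illustration, not the authors' implementation.

```python
import math

def predict_fractions(pair_fraction, species, weights=None):
    """Predict each species' fraction in a multispecies community.

    pair_fraction[(a, b)] is the fraction of species a when grown with b alone.
    weights[(a, b)] is the positive weight given to competitor b when
    predicting a; equal weights are used if none are supplied (an assumption).
    """
    raw = {}
    for a in species:
        others = [b for b in species if b != a]
        w = [1.0 if weights is None else weights[(a, b)] for b in others]
        # A zero pairwise fraction (exclusion) forces the geometric mean to zero.
        if any(pair_fraction[(a, b)] == 0 for b in others):
            raw[a] = 0.0
            continue
        # Weighted geometric mean, computed in log space.
        log_mean = sum(
            wi * math.log(pair_fraction[(a, b)]) for wi, b in zip(w, others)
        ) / sum(w)
        raw[a] = math.exp(log_mean)
    total = sum(raw.values())
    # Renormalise so the predicted fractions sum to 1 (assumption of this sketch).
    return {a: (v / total if total > 0 else 0.0) for a, v in raw.items()}

# Hypothetical pairwise outcomes for three species A, B and C.
toy_pairs = {
    ("A", "B"): 0.5, ("B", "A"): 0.5,
    ("A", "C"): 0.8, ("C", "A"): 0.2,
    ("B", "C"): 0.5, ("C", "B"): 0.5,
}

print(predict_fractions(toy_pairs, ["A", "B", "C"]))
```

With the real dataset, `pair_fraction` would come from the pairwise co-cultures, and the predictions could be compared against the measured trio compositions at each time point.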
Thoughts: This may be an opportunity to test time series prediction using neural networks. I'm also curious whether we can convert the pairwise interactions into an interaction network and apply network methods, for example feeding an interaction network to a trained model to see if it can predict the resulting community. This paper is obviously not a NN paper, but it has some very interesting data to mine (and possibly we can collaborate with the Friedman lab, which would be a dream for me honestly).
Data Availability: "The full dataset used in this paper is available at github.com/nittaym/Evo_assembly_rules, https://doi.org/10.5281/zenodo.4704257. Source data are provided with this paper. The Python 3 code used for the analysis is available at github.com/nittaym/Evo_assembly_rules, https://doi.org/10.5281/zenodo.4704257."
Prediction of Bacterial Immunogenicity by Machine Learning Methods
Citation: Dimitrov, I., Doytchinova, I. (2023). Prediction of Bacterial Immunogenicity by Machine Learning Methods. In: Reche, P.A. (eds) Computational Vaccine Design. Methods in Molecular Biology, vol 2673. Humana, New York, NY.
DOI: https://doi.org/10.1007/978-1-0716-3239-0_20
Description: A methods paper on using shallow machine learning to predict the immunogenicity (in humans) of bacterial protein sequences. The authors collected a dataset of 317 human immunogenic protein sequences from 47 bacterial species and the same number of non-immunogenic sequences from the same species. They transformed the protein sequences into a numerical representation (based on previous research). Then they used the software WEKA to apply and compare the following machine learning models: Partial Least Squares-Based Discriminant Analysis (PLS-DA), k Nearest Neighbor (kNN), Support Vector Machine (SVM), Random Forest (RF), and Random Subspace Method (RSM) with a kNN estimator.
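The paper does this in WEKA's GUI, but the overall pipeline (encode each sequence as numbers, then classify) is small enough to sketch by hand. The example below uses plain amino-acid composition as a stand-in encoding, which is an assumption and not the transformation used in the paper, together with a bare-bones kNN and leave-one-out evaluation. The sequences and labels are invented toy data with no biological meaning.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(sequence):
    """Encode a protein as its 20 amino-acid frequencies (stand-in encoding)."""
    counts = Counter(sequence.upper())
    n = sum(counts[aa] for aa in AMINO_ACIDS)
    return [counts[aa] / n if n else 0.0 for aa in AMINO_ACIDS]

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to query (Euclidean)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(x, y, k=3):
    """Hold out each example in turn and predict it from the remaining ones."""
    correct = 0
    for i in range(len(x)):
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_x, rest_y, x[i], k) == y[i]:
            correct += 1
    return correct / len(x)

# Invented toy sequences; label 1 and label 0 are arbitrary classes.
toy_sequences = [
    ("KKKEKKDKKE", 1), ("KEKKKDKKKK", 1), ("KKDKEKKKEK", 1),
    ("LLLVLLALLV", 0), ("LVLLLALLLL", 0), ("LLALVLLLVL", 0),
]
train_x = [composition(seq) for seq, _ in toy_sequences]
train_y = [label for _, label in toy_sequences]

print(knn_predict(train_x, train_y, composition("KKKKEKDKKK"), k=3))
print(leave_one_out_accuracy(train_x, train_y, k=3))
```

Swapping `composition` for a different encoding is the only change needed to try other representations of the same sequences.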
Thoughts: Reproducing this paper would be a great introduction to machine learning, since the foundations of deep learning come from these models and share some of the same methods and terminology. Additionally, WEKA has a GUI, so the work would be low-code and might come together quickly. We could then theorize about other ways to represent the data, or about how we would translate this to a neural network model.
Data Availability: FASTA files of the proteins and the scripts to prepare the data are on their GitHub. The rest of the procedure is clearly outlined in the paper, as it's a methods journal.
Predicting microbiomes through a deep latent space
Citation: Beatriz García-Jiménez and others, Predicting microbiomes through a deep latent space, Bioinformatics, Volume 37, Issue 10, May 2021, Pages 1444–1451
DOI: https://doi.org/10.1093/bioinformatics/btaa971
Description: They designed a deep learning model to predict the rhizosphere microbiome composition of maize from environmental data. The selected environmental parameters were temperature, precipitation, plant age, maize line, and maize variety. They used an autoencoder to transform both the predictors (environment) and outputs (taxonomic composition) into latent space, then trained a decoder on that latent space to predict taxonomic composition from unencoded environmental variables. They also extended the predictions to new environmental conditions (such as climate change scenarios), though without validation. Finally, they applied transfer learning to the trained model to make predictions on a new, smaller dataset.
Thoughts: This is a nice paper that demonstrates key concepts in deep learning, including autoencoders, a simple but non-standard model architecture, custom evaluation metrics, and transfer learning. If we can reproduce this research or use this method to train a new model for a different crop system, it would be a good learning experience and also relevant to USDA. The short discussion section is also worth reading for its account of why ML in microbiome research is difficult. Because this paper documents its methods so well (complete with Jupyter notebook examples and installation instructions), it may be easy to extend and apply to a new dataset.
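On the evaluation side, a predicted composition has to be scored against the observed one. A common choice for comparing two compositions is Bray-Curtis dissimilarity; whether the paper's custom metrics include it is an assumption here, so check their repository before relying on it. A minimal sketch, with made-up function names and toy counts:

```python
def to_relative_abundance(counts):
    """Scale raw taxon counts so they sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i).

    0 means identical compositions, 1 means no shared taxa.
    """
    if len(u) != len(v):
        raise ValueError("compositions must cover the same taxa")
    denom = sum(u) + sum(v)
    if denom == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / denom

# Hypothetical observed and predicted counts for three taxa.
observed = to_relative_abundance([5, 5, 0])
predicted = to_relative_abundance([5, 0, 5])
print(bray_curtis(observed, predicted))
```

Averaging this score over held-out samples gives a single number for comparing models, for example before and after transfer learning.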
Data availability: "Software, results and data are available at https://github.com/jorgemf/DeepLatentMicrobiome." NB THIS IS VERY WELL DOCUMENTED!!