lhallee/README.md

👋 Hi, I’m @lhallee

I'm Logan Hallee, a PhD candidate in Bioinformatics Data Science at the University of Delaware (Gleghorn Lab), specializing in curating high-dimensional feature spaces for biological data. My primary focus is protein design and annotation using transformer neural networks. Techniques I developed led to SYNTERACT, the first large language model approach to protein-protein interaction prediction, which ranks in the top 3% of research outputs by Altmetric.

At the Wolfram Winter School, I collaborated with Stephen Wolfram and other mentors to create "Tetris For Proteins," a shape-based metric for protein-protein interactions that emulates lock-and-key enzyme-substrate dynamics, generating hypotheses about protein aggregation likelihood.

I created the Annotation Vocabulary, a unique set of integers mapped to popular protein and gene ontologies, enabling state-of-the-art protein annotation and generation models when used with its own token embedding.

My work also supports the paradigm of codon usage bias as a key biological phenomenon for phylogenetic analysis. Our models, published in Nature Scientific Reports, highlight codon usage as a unique phylogenetic predictor. Our lab recently produced cdsBERT, showcasing cost-effective techniques to enhance the biological relevance of protein language models using a codon vocabulary.

In natural language processing, I invented a Mixture of Experts extension for scalable transformer networks adept at sentence similarity tasks. We believe future networks with N experts will perform like N independently trained networks, offering significant time and computational savings for vector retrieval systems and for search that relies on semantic vector representations.

I also manage lab projects in computer vision, utilizing deep learning to reconstruct anatomically accurate 3D organs from 2D Z-stacks, informing morphometric and pharmacokinetic studies.

Some other stuff I've worked on over the years:

Norway, ME ➔ Newark, DE

Pinned

  1. Gleghorn-Lab/AnnotationVocabulary

    Jupyter Notebook, 7 stars

  2. Gleghorn-Lab/Mixture-of-Experts-Sentence-Similarity

    Python, 12 stars

  3. Multi_Head_Mixture_of_Experts__MH-MOE

    Python, 21 stars, 4 forks

  4. ProteinVecHuggingface

    Python, 2 stars

  5. featureranker

    Python package for feature ranking

    Python, 6 stars

  6. CUF-ORF

    CUF Classification and ORF Identification

    Jupyter Notebook