ProtFlash: A lightweight protein language model
Updated Mar 1, 2026 - Python
Transmembrane proteins predicted through Language Model embeddings
Repository containing bio_embeddings resources
Protein homology search using transformer-based embeddings and Approximate Nearest Neighbor methods for efficient biological similarity detection
LLM-powered classification of phage protein functions to identify strong lytic candidates against Klebsiella, using transfer learning and biological embeddings.
Scalable, efficient methods for identifying the most distant proteins and the most diverse protein subsets in large protein databases, using a dataset of SwissProt protein embeddings, data mining techniques, and metaheuristics.
Similarity search for protein sequences using ESM-2 embeddings and Approximate Nearest Neighbor (ANN) methods.
Unsupervised clustering of human kinases using ESM-2 protein language model embeddings and sequence features
A hybrid C++/Python pipeline for remote protein homology detection, coupling ESM-2 language model embeddings with custom Neural-LSH for scalable approximate nearest neighbor search.
Embedding-space analysis of VH antibody diversity across human, mouse, and rat — ESM2 vs AntiBERTy, pooling strategy vs CDR masking, with paired bootstrap statistics.
Quantum-informed IBD modeling using ESM protein embeddings and Qiskit QSVC/quantum kernels; CLI for ingest → train → report.