Table 3. A summary of key characteristics of foundation models (FMs) in bioinformatics.

Model name | Model size (parameters) | Model task
--- | --- | ---
BioBERT | 110M/340M | Biomedical text mining (NER, RE, QA)
BioELECTRA | 109M | Biomedical text mining (NER, RE, QA)
BLURB | Unknown | Biomedical NLP benchmark (QA, NER, parsing, etc.)
BioBART | 139M/400M | Biomedical text generation (dialogue, summarization, NER)
Med-PaLM | 12B/84B/562B | Medical question answering
MSA | 30M/100M | Arabic NLP tasks (NER, POS tagging, sentiment analysis, etc.)
GMAI | Unknown | Generalist medical AI (multimodal tasks)
DNABERT | 110M | DNA sequence prediction (promoters, TFBSs, splice sites)
Enformer | Unknown | Gene expression prediction
HyenaDNA | 7M | Genomic sequence modeling (regulatory elements, chromatin profiles)
Nucleotide Transformer | 500M–2.5B | DNA sequence analysis
ProteinBERT | 16M | Bidirectional language modeling of protein sequences, Gene Ontology (GO) annotation prediction
ProtGPT | 1.6M/25.2M | Protein sequence generation
ProtGPT2 | 738M | Protein sequence generation, structural similarity detection, stability prediction
xTrimoPGLM | 100B | Protein understanding and generation
DNABERT-2 | 117M | DNA sequence prediction
scBERT | Unknown | Single-cell RNA sequencing analysis
ProGen | 1.2B | Stability prediction, remote homology detection, secondary structure prediction
ProGen2 | 6.4B | Functional sequence generation, protein fitness prediction
CLAPE-DB | Unknown | Protein–ligand binding site prediction
Geneformer | 30M | Sequence-based prediction
scGPT | Unknown | Multibatch integration, multi-omic integration, cell-type annotation, genetic perturbation prediction, gene network inference
ESM-1b | 650M | Supervised prediction of mutational effect and secondary structure
AlphaFold2 | 21M | Protein structure prediction
AlphaFold3 | 93M | Protein structure prediction, protein–protein interaction structure prediction
RGN2 | 110M | Protein design and analysis of allelic variation or disease mutations
Uni-Mol | 1.1B | 3D position recovery, masked atom prediction, molecular property prediction
RNA-FM | 99.52M | RNA secondary structure prediction, distance regression task
UNI-RNA | 25M/85M/169M/400M | RNA structure and function prediction
RNA-MSM | Unknown | RNA structure and function prediction
Bingo | 8–15M | Filling in randomly masked amino acids, generating residue-level feature matrix and protein contact map
scFoundation | 100M | Gene expression enhancement, tissue drug response prediction, single-cell drug response classification, single-cell perturbation prediction
scHyena | Unknown | Cell type classification, scRNA-seq imputation
ProtST | 650M | Unimodal mask prediction, multimodal representation alignment, multimodal mask prediction

Abbreviations: NLP, natural language processing; NER, named entity recognition; RE, relation extraction; QA, question answering; POS, part-of-speech; TFBS, transcription factor binding site; scRNA-seq, single-cell RNA sequencing.
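Several entries in Table 3 rely on a masked-token pretraining objective. Examples include Bingo ("filling in randomly masked amino acids"), Uni-Mol (masked atom prediction) and ProtST (mask prediction). The following is a minimal sketch of the input-corruption step for a protein sequence. It assumes the BERT defaults of a 15% masking rate and an 80/10/10 mask/random/keep split; these are not settings reported for any particular model in the table. The `<mask>` string and the function name `mask_sequence` are hypothetical.

```python
import random

# The 20 standard amino acids, used as the token vocabulary in this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"  # hypothetical mask-token string; real tokenizers define their own


def mask_sequence(seq, mask_rate=0.15, seed=None):
    """Corrupt a protein sequence for a BERT-style masked-residue objective.

    Returns (tokens, targets): tokens is the corrupted input, and targets holds
    the original residue at each selected position and None elsewhere, so the
    loss is computed only on the selected positions.
    """
    rng = random.Random(seed)
    tokens = list(seq)
    targets = [None] * len(tokens)
    if not tokens:
        return tokens, targets
    # Select roughly mask_rate of the positions (at least one, at most all).
    n_select = min(len(tokens), max(1, round(mask_rate * len(tokens))))
    for i in rng.sample(range(len(tokens)), n_select):
        targets[i] = tokens[i]
        r = rng.random()
        if r < 0.8:
            tokens[i] = MASK  # 80%: replace with the mask token
        elif r < 0.9:
            tokens[i] = rng.choice(AMINO_ACIDS)  # 10%: replace with a random residue
        # Remaining 10%: leave the original residue in place.
    return tokens, targets


tokens, targets = mask_sequence("MKTAYIAKQRQISFVKSHFSRQ", seed=0)
print(tokens)
print(targets)
```

The random-replacement and keep branches exist in BERT to reduce the mismatch between pretraining, where mask tokens appear, and downstream use, where they do not. Each model in the table may use its own rate, vocabulary and corruption scheme.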