Table 1.

Comparison of the architecture, training data, and training approach for the protein language model (LM) ESM-2 (Lin et al. 2023), the antibody-specific LMs AntiBERTy (Ruffolo et al. 2021) and AbLang-1 (Olsen et al. 2022b), and our new selection of antibody-specific LMs.a

| Model | Architecture | Training data | Paired | Loss function | Training objective | Training steps | Batch size |
|---|---|---|---|---|---|---|---|
| ESM-2 | ESM-2, 33L + 1280ES | UR50/D, 60M sequences | N | CE | MLM | 500K | 2M tokens |
| AntiBERTy | BERT, 8L + 512ES | 558M VH/VL | N | CE | MLM | 8 epochs | N/A |
| AbLang-1 | RoBERTa, 12L + 768ES | 187K VL; 14.2M VH | N | CE | MLM | 2K; 34K | 500K tokens; 1M tokens |
| Ab-Unpaired | ESM-2, 6L + 320ES | 1.26M VL; 1.26M VH | N | CE | MLM | 10K | 1M tokens |
| Ab-Paired | ESM-2, 6L + 320ES | 1.26M paired | Y | CE | MLM | 10K | 1–2M tokens |
| Ab-FL | ESM-2, 6L + 320ES | 1.26M paired | Y | FL | MLM | 10K | 1–2M tokens |
| Ab-ModMask | ESM-2, 6L + 320ES | 1.26M paired | Y | FL | Modified MLM | 10K | 1–2M tokens |
| Ab-FT | ESM-2, 6L + 320ES | 35.6M VH/VL; 1.26M paired | Y | FL | Modified MLM | 10K + 1K | 1–2M tokens |
| AbLang-2 | ESM-2, 12L + 480ES | 35.6M VH/VL; 1.26M paired | Y | FL | Modified MLM | 200K + 10K | 1–2M tokens |
a The architecture column shows the most similar architecture and the model's size, given as the number of layers (L) and embedding size (ES). While the exact number of training steps for AntiBERTy is unknown, it was trained for eight epochs (Ruffolo et al. 2021). AbLang-1 and the new antibody-specific LMs were trained on 8192 sequences per batch (4096 for AbLang-1 Light), with each sequence comprising approximately 120 amino acids. Each batch thus contained about 1M tokens for unpaired sequences and 2M tokens for paired antibody VH-VL sequences. CE, cross-entropy loss; FL, focal loss; MLM, masked language modeling.
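The batch sizes quoted in the table follow directly from the footnote's arithmetic. The sketch below reproduces that arithmetic; it is illustrative only, and the sequence length of roughly 120 residues per variable domain is the approximation stated in the footnote rather than an exact value.

```python
# Illustrative arithmetic for the "Batch size" column (approximation only).
# Assumes ~120 amino acids per VH or VL domain, as stated in the table footnote.

RESIDUES_PER_DOMAIN = 120  # approximate length of a single variable domain


def tokens_per_batch(sequences_per_batch: int, paired: bool) -> int:
    """Approximate number of amino-acid tokens in one training batch."""
    domains = 2 if paired else 1  # a paired VH-VL sequence contributes two domains
    return sequences_per_batch * domains * RESIDUES_PER_DOMAIN


print(tokens_per_batch(8192, paired=False))  # ~0.98M -> "1M tokens" (unpaired)
print(tokens_per_batch(8192, paired=True))   # ~1.97M -> "2M tokens" (paired)
print(tokens_per_batch(4096, paired=False))  # ~0.49M -> "500K tokens" (AbLang-1 Light)
```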
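The loss function column distinguishes standard cross-entropy (CE) from focal loss (FL). As a point of reference, the following is a minimal PyTorch-style sketch of focal loss applied only to masked positions, using the standard formulation FL(p_t) = -(1 - p_t)^γ log(p_t). The focusing parameter γ = 2 and the function itself are assumptions for illustration, not the implementation used for Ab-FL, Ab-ModMask, Ab-FT, or AbLang-2.

```python
import torch
import torch.nn.functional as F


def masked_focal_loss(logits, targets, mask, gamma=2.0):
    """Focal loss over masked positions only (hypothetical sketch).

    logits:  (batch, seq_len, vocab) raw model outputs
    targets: (batch, seq_len) true residue indices (int64)
    mask:    (batch, seq_len) bool, True where a token was masked
    gamma:   focusing parameter; gamma = 0 recovers plain cross-entropy
    """
    logits = logits[mask]                     # (n_masked, vocab)
    targets = targets[mask]                   # (n_masked,)
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    pt = log_pt.exp()
    # Down-weight tokens the model already predicts confidently.
    return (-(1.0 - pt) ** gamma * log_pt).mean()
```

With gamma set to 0 this reduces to ordinary masked-LM cross-entropy, corresponding to the CE rows of the table.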

