Figure 1.
Sketch of ProstT5. Model architecture: ProstT5 is a T5-based encoder-decoder model initialized with the weights of ProtT5 (6). Pre-training: Foldseek (1) converts protein 3D coordinates into 3Di tokens, i.e. 1D descriptions of 3D structure that assign each residue of a protein to one of twenty states, so that the structure is described by a 1D string of letters. We used 17 million (17M) high-quality, non-redundant and diverse 3D predictions from the AFDB (33). ProtT5 served as an already pre-trained starting point for translating between 1D sequence (amino acids, AA) and 3D structure (3Di). First, we applied the original pre-training objective of ProtT5 (span-based denoising) to both AAs and 3Di, to teach the model the new 3Di tokens while avoiding catastrophic forgetting of AAs. Second, we continued training the resulting model to translate from AAs to 3Di and vice versa. The final model, ProstT5 (Protein structure-sequence T5), captures this information in its internal embeddings, which can serve as input to downstream applications. These include established feature extraction using only the encoder (6), and bi-directional translation, either from AAs to 3Di (‘folding’) or from 3Di to AAs (‘inverse folding’). Inference: bi-directional translation (AA→3Di or 3Di→AA) can be conducted either in encoder-decoder mode, which requires token-wise decoder inference, or in an optimized inference mode in which 3Di tokens are predicted directly from the encoder embedding by a convolutional neural network. The optimized 3Di inference mode yields a speedup of three orders of magnitude over extracting 3Di from predicted protein structures (Figure 2).
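The two inference routes named in the caption, encoder-only feature extraction and encoder-decoder translation, can be sketched with the Hugging Face transformers API. The snippet below is a minimal sketch, not the authors' exact pipeline: it assumes the public checkpoint name Rostlab/ProstT5 and the direction prefixes <AA2fold>/<fold2AA> from the model's Hugging Face release, and it follows the ProtT5 pre-processing convention (rare residues mapped to X, residues separated by spaces).

```python
import re
import torch
from transformers import T5Tokenizer, T5EncoderModel, AutoModelForSeq2SeqLM

device = "cuda" if torch.cuda.is_available() else "cpu"

# Checkpoint name and special prefixes are assumed from the authors'
# Hugging Face release; consult the Rostlab/ProstT5 model card.
tokenizer = T5Tokenizer.from_pretrained("Rostlab/ProstT5", do_lower_case=False)
encoder = T5EncoderModel.from_pretrained("Rostlab/ProstT5").to(device).eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy amino-acid sequence

# ProtT5-style pre-processing: map rare/ambiguous residues to X and
# insert spaces so every residue becomes its own token.
spaced = " ".join(re.sub(r"[UZOB]", "X", sequence))

# Direction prefix: "<AA2fold>" marks amino-acid input (AA->3Di);
# "<fold2AA>" would mark 3Di input for inverse folding.
inputs = tokenizer("<AA2fold> " + spaced, add_special_tokens=True,
                   return_tensors="pt").to(device)

# (1) Feature extraction: per-residue embeddings from the encoder only.
with torch.no_grad():
    emb = encoder(inputs.input_ids,
                  attention_mask=inputs.attention_mask).last_hidden_state
print(emb.shape)  # (1, number_of_tokens, hidden_dim)

# (2) 'Folding': full encoder-decoder generation of a 3Di string,
# i.e. the slower token-wise decoder inference from the caption.
translator = AutoModelForSeq2SeqLM.from_pretrained("Rostlab/ProstT5").to(device).eval()
with torch.no_grad():
    out = translator.generate(inputs.input_ids,
                              attention_mask=inputs.attention_mask,
                              max_length=len(sequence) + 2)
print(tokenizer.decode(out[0], skip_special_tokens=True))  # predicted 3Di
```

The optimized inference mode from the caption would replace step (2) with a small convolutional head that classifies each encoder embedding from step (1) directly into one of the twenty 3Di states, avoiding autoregressive decoding entirely; this is what buys the three-orders-of-magnitude speedup reported in Figure 2.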
