Macro F1 (M.F1, %; higher is better) and Weighted F1 (W.F1, %; higher is better) on eight downstream tasks: gene operon prediction on E-K12, antibiotic resistance gene (ARG) prediction on three CARD categories (CARD-A, CARD-D, CARD-R), virulence factor classification on VFDB, enzyme function annotation on ENZYME, microbial pathogen detection on PATRIC, and nitrogen cycle process prediction on NCycDB. RF denotes Random Forest and VT denotes Vanilla Transformer. The highest result in each column is shown in bold, and the second highest is underlined.
| Method | Operons | | ARG prediction | | | | | | Virulence | | Enzyme | | Pathogen | | N-Cycle | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | E-K12 | | CARD-A | | CARD-D | | CARD-R | | VFDB | | ENZYME | | PATRIC | | NCycDB | |
| | M.F1 | W.F1 | M.F1 | W.F1 | M.F1 | W.F1 | M.F1 | W.F1 | M.F1 | W.F1 | M.F1 | W.F1 | M.F1 | W.F1 | M.F1 | W.F1 |
| RF | 20.2 | 34.8 | 22.4 | 35.3 | 36.1 | 49.0 | 47.8 | 57.6 | 22.4 | 38.5 | 33.6 | 41.2 | 25.3 | 29.8 | 67.0 | 71.7 |
| SVM | 38.6 | 45.2 | 27.6 | 40.5 | 33.6 | 47.2 | 43.3 | 66.2 | 28.0 | 41.4 | 31.3 | 43.6 | 26.6 | 31.2 | 66.9 | 70.3 |
| KNN | 39.9 | 41.0 | 36.9 | 54.4 | 36.4 | 51.3 | 36.2 | 63.5 | 27.3 | 47.1 | 31.4 | 42.9 | 11.0 | 27.4 | 68.8 | 73.2 |
| LSTM | 40.4 | 42.5 | 47.1 | 60.3 | 39.1 | 62.3 | 47.5 | 84.2 | 36.7 | 66.3 | 42.8 | 51.0 | 41.3 | 49.7 | 71.9 | 81.2 |
| BiLSTM | 38.2 | 43.8 | 47.4 | 61.9 | 43.5 | 58.1 | 58.9 | 80.3 | 46.1 | 72.1 | 38.7 | 50.2 | 43.3 | 48.5 | 82.0 | 88.4 |
| VT | 43.3 | 47.8 | 57.1 | 70.0 | 49.8 | 68.1 | 55.7 | 86.4 | 58.0 | 81.0 | 68.2 | 75.8 | 49.8 | 57.3 | 84.5 | 90.7 |
| HyenaDNA | 42.4 | 47.1 | 50.9 | 68.2 | 53.6 | 78.1 | 66.2 | 88.1 | <u>61.0</u> | 70.4 | 79.6 | 83.6 | 51.1 | 57.6 | 92.4 | 96.0 |
| ESM-2 | 38.2 | 42.5 | 57.2 | 71.4 | 56.0 | <u>82.1</u> | <u>68.2</u> | 90.0 | 60.7 | <u>84.4</u> | <u>92.5</u> | <u>96.7</u> | <u>56.0</u> | <u>67.5</u> | <u>95.8</u> | <u>96.1</u> |
| NT | 45.1 | 44.8 | 58.5 | 72.0 | <u>56.2</u> | 80.2 | 68.0 | <u>90.3</u> | 58.3 | 71.6 | 74.1 | 76.7 | 46.1 | 61.9 | 75.1 | 86.5 |
| DNABERT2 | <u>51.7</u> | <u>52.4</u> | <u>65.2</u> | <u>79.8</u> | 51.5 | 78.7 | 61.2 | 88.6 | 58.2 | 82.3 | 85.4 | 85.2 | 52.9 | 60.6 | 88.6 | 95.7 |
| Ours | **61.8** | **65.4** | **78.6** | **90.1** | **57.4** | **85.2** | **69.4** | **91.4** | **75.7** | **90.2** | **99.1** | **98.8** | **99.3** | **99.0** | **99.5** | **99.2** |
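The table reports two different averages of per-class F1, and the distinction matters for imbalanced label sets. Macro F1 is the unweighted mean of the per-class F1 scores, so every class counts equally. Weighted F1 weights each class's F1 by its support (its number of true instances), so frequent classes dominate. The following pure-Python sketch computes both from lists of labels. It is not the authors' evaluation code, which the table does not show. The function names `per_class_f1`, `macro_f1`, and `weighted_f1` are hypothetical, and the sketch assumes that the set of classes is the union of labels appearing in the true and predicted lists.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label seen in y_true or y_pred.

    Assumption: the class set is the union of true and predicted labels.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # predicted p, but the sample is not p
            fn[t] += 1   # missed a true instance of t
    support = Counter(y_true)

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1: frequent classes count more."""
    scores = per_class_f1(y_true, y_pred)
    total = sum(s for _, s in scores.values())
    return sum(f1 * s for f1, s in scores.values()) / total


if __name__ == "__main__":
    # Toy imbalanced example: class "a" is the majority class.
    y_true = ["a", "a", "a", "a", "b", "c"]
    y_pred = ["a", "a", "a", "b", "b", "a"]
    print(f"macro F1:    {macro_f1(y_true, y_pred):.4f}")     # 0.4722
    print(f"weighted F1: {weighted_f1(y_true, y_pred):.4f}")  # 0.6111
```

In the toy example, the per-class F1 scores are 0.75 for `a`, 2/3 for `b`, and 0 for `c`. The two rare classes pull the macro score down to about 0.472, while the weighted score (about 0.611) stays closer to the majority class. Multiplying by 100 gives the percentage scale used in the table. If scikit-learn is available, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` and `average="weighted"` should give the same values for this setup.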