Table 4

Macro F1 (M.F1, %, ↑) and Weighted F1 (W.F1, %, ↑) on eight downstream tasks: gene operon prediction on E-K12, antibiotic resistance gene (ARG) prediction on three CARD categories (CARD-A, CARD-D, CARD-R), virulence factor classification on VFDB, enzyme function annotation on ENZYME, microbial pathogen detection on PATRIC, and nitrogen cycle process prediction on NCycDB. RF denotes Random Forest and VT denotes Vanilla Transformer. The highest result in each column is shown in boldface and the second highest is underlined.

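| Method | Operons | | ARG prediction | | | | | | Virus | | Enzyme | | Pathogen | | N-Cycle | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | E-K12 | | CARD-A | | CARD-D | | CARD-R | | VFDB | | ENZYME | | PATRIC | | NCycDB | |
| | M.F1 | W.F1 | M.F1 | W.F1 | M.F1 | W.F1 | M.F1 | W.F1 | M.F1 | W.F1 | M.F1 | W.F1 | M.F1 | W.F1 | M.F1 | W.F1 |
| RF | 20.2 | 34.8 | 22.4 | 35.3 | 36.1 | 49.0 | 47.8 | 57.6 | 22.4 | 38.5 | 33.6 | 41.2 | 25.3 | 29.8 | 67.0 | 71.7 |
| SVM | 38.6 | 45.2 | 27.6 | 40.5 | 33.6 | 47.2 | 43.3 | 66.2 | 28.0 | 41.4 | 31.3 | 43.6 | 26.6 | 31.2 | 66.9 | 70.3 |
| KNN | 39.9 | 41.0 | 36.9 | 54.4 | 36.4 | 51.3 | 36.2 | 63.5 | 27.3 | 47.1 | 31.4 | 42.9 | 11.0 | 27.4 | 68.8 | 73.2 |
| LSTM | 40.4 | 42.5 | 47.1 | 60.3 | 39.1 | 62.3 | 47.5 | 84.2 | 36.7 | 66.3 | 42.8 | 51.0 | 41.3 | 49.7 | 71.9 | 81.2 |
| BiLSTM | 38.2 | 43.8 | 47.4 | 61.9 | 43.5 | 58.1 | 58.9 | 80.3 | 46.1 | 72.1 | 38.7 | 50.2 | 43.3 | 48.5 | 82.0 | 88.4 |
| VT | 43.3 | 47.8 | 57.1 | 70.0 | 49.8 | 68.1 | 55.7 | 86.4 | 58.0 | 81.0 | 68.2 | 75.8 | 49.8 | 57.3 | 84.5 | 90.7 |
| HyenaDNA | 42.4 | 47.1 | 50.9 | 68.2 | 53.6 | 78.1 | 66.2 | 88.1 | <u>61.0</u> | 70.4 | 79.6 | 83.6 | 51.1 | 57.6 | 92.4 | 96.0 |
| ESM-2 | 38.2 | 42.5 | 57.2 | 71.4 | 56.0 | <u>82.1</u> | <u>68.2</u> | 90.0 | 60.7 | <u>84.4</u> | <u>92.5</u> | <u>96.7</u> | <u>56.0</u> | <u>67.5</u> | <u>95.8</u> | <u>96.1</u> |
| NT | 45.1 | 44.8 | 58.5 | 72.0 | <u>56.2</u> | 80.2 | 68.0 | <u>90.3</u> | 58.3 | 71.6 | 74.1 | 76.7 | 46.1 | 61.9 | 75.1 | 86.5 |
| DNABERT2 | <u>51.7</u> | <u>52.4</u> | <u>65.2</u> | <u>79.8</u> | 51.5 | 78.7 | 61.2 | 88.6 | 58.2 | 82.3 | 85.4 | 85.2 | 52.9 | 60.6 | 88.6 | 95.7 |
| Ours | **61.8** | **65.4** | **78.6** | **90.1** | **57.4** | **85.2** | **69.4** | **91.4** | **75.7** | **90.2** | **99.1** | **98.8** | **99.3** | **99.0** | **99.5** | **99.2** |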
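The two metrics in the caption differ only in how per-class F1 scores are averaged. Macro F1 gives every class equal weight, whereas Weighted F1 weights each class by its support (its number of true instances), so the two can diverge when classes are imbalanced. Below is a minimal sketch, assuming the standard definitions (the ones behind scikit-learn's `f1_score` with `average="macro"` and `average="weighted"`). The paper's own evaluation code is not shown here, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """F1 for each label seen in y_true or y_pred, using F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # The denominator is > 0 because c occurs in y_true or y_pred.
        scores[c] = 2 * tp / (2 * tp + fp + fn)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Per-class F1 weighted by support (number of true instances of each class)."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)  # labels that only appear in y_pred get weight 0
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Toy, imbalanced example (hypothetical labels): six samples of class 0, two of class 1.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1]
print(f"Macro F1:    {macro_f1(y_true, y_pred):.4f}")     # 31/39 ≈ 0.7949
print(f"Weighted F1: {weighted_f1(y_true, y_pred):.4f}")  # 67/78 ≈ 0.8590
```

Here the minority class (F1 = 2/3) drags Macro F1 below Weighted F1, which is dominated by the well-predicted majority class (F1 = 12/13). With scikit-learn installed, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` and `average="weighted"` should return the same values; the table reports such scores as percentages.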