Distribution of train and independent test sets for the three datasets and target residues (prior to balancing).
| Set | Target residue | Train: No. of P-sites | Train: No. of NP-sites | Train: Ratio (NP:P) | Independent test: No. of P-sites | Independent test: No. of NP-sites | Independent test: Ratio (NP:P) | CD-HIT threshold |
|---|---|---|---|---|---|---|---|---|
| Primary (a) | S + T | 154 220 | 800 329 | 5.19:1 | 16 964 | 85 057 | 5.01:1 | 0.5 |
| Primary (a) | Y | 27 077 | 123 918 | 4.57:1 | 3054 | 13 347 | 4.37:1 | 0.5 |
| Chlamydomonas reinhardtii | S + T | 17 345 | 460 015 | 26.52:1 | 4338 | 115 005 | 26.51:1 | N/A |
| A549 | S + T | N/A | N/A | N/A | 1144 | 1049 | 0.92:1 | 0.3 |
(a) The adopted DeepPSP dataset, after reverse translation, is referred to as the primary dataset. The numbers of sites in both the S + T and Y sets of this dataset differ from those reported in the DeepPSP paper because some sequences were lost during the translation process.
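The Ratio (NP:P) column is the number of non-phosphorylated sites divided by the number of phosphorylated sites. A minimal sketch of that arithmetic follows, using two rows of counts from the table; the function name `np_to_p_ratio` is a hypothetical helper introduced here for illustration and is not taken from the paper.

```python
def np_to_p_ratio(n_np: int, n_p: int) -> str:
    """Format the NP:P class ratio as 'x.xx:1' (rounded to two decimals)."""
    if n_p <= 0:
        raise ValueError("n_p must be a positive count")
    return f"{n_np / n_p:.2f}:1"


# Counts copied from the table: primary S + T train set and A549 test set.
primary_st_train = np_to_p_ratio(800_329, 154_220)
a549_st_test = np_to_p_ratio(1_049, 1_144)

print(primary_st_train)  # 5.19:1
print(a549_st_test)      # 0.92:1
```

This sketch rounds to two decimals; a published value could differ in the last digit if it was truncated rather than rounded.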