The results of univariate-feature selection (left value in each table cell) and double-feature selection (right value in each table cell). Based on the GS, the performance metrics of our models were calculated with four types of training-set sampling: the ordinary training set (‘NonSampling’), the SMOTE training set (‘SMOTE’), the undersampled training set (‘UnderSample’), and the oversampled training set (‘OverSample’). The selected features depend on both the classifier and the sampling type. More analysis is presented in Section 4.1.2.
| Method | Sampling | Feature ID | Recall (per cent) | Precision (per cent) | F1 Score (per cent) | FPR (per cent) |
|---|---|---|---|---|---|---|
| DT | NonSampling | 10/3,10 | 80/91 | 94/93 | 86/92 | 0.062/0.093 |
| | SMOTE | 5/1,10 | 98/98 | 32/68 | 47/80 | 2.718/0.062 |
| | UnderSample | 2/1,9 | 93/96 | 31/46 | 45/70 | 2.796/1.040 |
| | OverSample | 2/2,10 | 92/95 | 41/80 | 56/87 | 2.784/0.307 |
| AdaBoost | NonSampling | 10/5,10 | 78/90 | 97/93 | 86/91 | 0.003/0.096 |
| | SMOTE | 10/1,10 | 93/99 | 34/67 | 49/80 | 2.433/0.651 |
| | UnderSample | 2/1,10 | 93/99 | 31/58 | 47/73 | 2.738/0.962 |
| | OverSample | 2/2,10 | 93/97 | 34/64 | 50/77 | 2.405/0.733 |
| XGBoost | NonSampling | 10/2,10 | 78/91 | 94/93 | 86/92 | 0.064/0.056 |
| | SMOTE | 10/1,10 | 93/98 | 34/68 | 49/80 | 2.433/0.622 |
| | UnderSample | 5/1,10 | 98/99 | 30/59 | 46/74 | 3.007/0.925 |
| | OverSample | 10/6,10 | 91/95 | 45/82 | 60/88 | 1.469/0.278 |
| GBoost | NonSampling | 10/2,10 | 79/91 | 13/93 | 22/92 | 6.958/0.096 |
| | SMOTE | 5/1,10 | 98/98 | 32/69 | 48/81 | 2.762/0.593 |
| | UnderSample | 5/1,10 | 95/99 | 27/58 | 42/73 | 3.429/0.964 |
| | OverSample | 10/2,10 | 90/94 | 25/89 | 48/91 | 3.762/0.156 |
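The evaluation loop behind the table can be sketched as follows. This is a minimal illustration, not the authors' exact pipeline: it uses a synthetic imbalanced data set in place of the real feature table, default hyperparameters rather than the paper's grid search, and standard random over/under-samplers alongside SMOTE from imbalanced-learn; XGBoost would slot in the same way as the scikit-learn classifiers shown. Univariate selection with k=1 corresponds to the single-feature results and k=2 to the double-feature results.

```python
# Minimal sketch (assumed setup, not the authors' code): score classifiers with
# one- and two-feature univariate selection under four training-set samplings.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import recall_score, precision_score, f1_score, confusion_matrix
from imblearn.over_sampling import SMOTE, RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler

# Synthetic imbalanced data standing in for the real 10-feature table.
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.97, 0.03],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

samplers = {
    "NonSampling": None,
    "SMOTE": SMOTE(random_state=0),
    "UnderSample": RandomUnderSampler(random_state=0),
    "OverSample": RandomOverSampler(random_state=0),
}
classifiers = {
    "DT": DecisionTreeClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "GBoost": GradientBoostingClassifier(random_state=0),
}

for clf_name, clf in classifiers.items():
    for samp_name, sampler in samplers.items():
        for k in (1, 2):  # univariate (one feature) vs. double-feature selection
            selector = SelectKBest(f_classif, k=k).fit(X_tr, y_tr)
            Xk_tr, Xk_te = selector.transform(X_tr), selector.transform(X_te)
            # Resample the training set only; the test set stays untouched.
            if sampler is not None:
                Xk_tr, yk_tr = sampler.fit_resample(Xk_tr, y_tr)
            else:
                yk_tr = y_tr
            y_pred = clf.fit(Xk_tr, yk_tr).predict(Xk_te)
            tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
            print(f"{clf_name:8s} {samp_name:12s} k={k} "
                  f"features={selector.get_support(indices=True)} "
                  f"recall={100 * recall_score(y_te, y_pred):.0f} "
                  f"precision={100 * precision_score(y_te, y_pred):.0f} "
                  f"F1={100 * f1_score(y_te, y_pred):.0f} "
                  f"FPR={100 * fp / (fp + tn):.3f}")
```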