Results of multiple-feature selection based on RFE. In each table cell, the value to the left of the slash is the three-feature result and the value to the right is the four-feature result. The performance metrics of our models were calculated with four types of training set: the ordinary training set (denoted ‘NonSampling’), the SMOTE training set (‘SMOTE’), the under-sampled training set (‘UnderSample’), and the oversampled training set (‘OverSample’). The selected features depend on both the classifier and the sampling type. A detailed analysis is given in Section 4.2.2.
Method | Sampling | Feature ID | Recall (%) | Precision (%) | F1 score (%) | FPR (%) |
---|---|---|---|---|---|---|
DT | NonSampling | 3,5,10/2,3,5,10 | 95/95 | 97/98 | 96/96 | 0.033/0.031 |
DT | SMOTE | 2,5,10/2,5,10,11 | 98/96 | 81/85 | 89/90 | 0.307/0.222 |
DT | UnderSample | 2,5,10/2,5,10,11 | 98/98 | 51/51 | 67/67 | 1.260/1.260 |
DT | OverSample | 1,10,14/1,4,10,14 | 97/96 | 75/84 | 85/89 | 0.420/0.289 |
AdaBoost | NonSampling | 3,5,9/3,5,9,16 | 95/95 | 99/99 | 97/97 | 0.015/0.011 |
AdaBoost | SMOTE | 3,5,9/3,5,9,10 | 99/99 | 78/84 | 87/91 | 0.360/0.258 |
AdaBoost | UnderSample | 3,5,6/3,5,6,10 | 99/99 | 47/60 | 64/75 | 1.491/0.876 |
AdaBoost | OverSample | 3,5,9/3,5,9,16 | 99/99 | 73/74 | 84/85 | 0.493/0.460 |
XGBoost | NonSampling | 3,5,10/3,5,8,10 | 97/97 | 97/97 | 97/97 | 0.033/0.033 |
XGBoost | SMOTE | 3,5,10/2,3,5,10 | 96/99 | 47/88 | 63/93 | 1.449/0.169 |
XGBoost | UnderSample | 3,5,10/2,3,5,10 | 99/99 | 60/61 | 75/75 | 0.878/0.849 |
XGBoost | OverSample | 3,5,10/3,5,7,10 | 98/98 | 95/94 | 96/96 | 0.073/0.078 |
GBoost | NonSampling | 3,8,10/3,8,10,11 | 96/96 | 98/97 | 97/97 | 0.031/0.040 |
GBoost | SMOTE | 3,5,10/3,5,10,11 | 99/99 | 89/93 | 94/95 | 0.162/0.102 |
GBoost | UnderSample | 5,8,10/1,5,8,10 | 98/98 | 53/53 | 69/69 | 1.158/1.158 |
GBoost | OverSample | 3,5,10/3,5,10,14 | 98/97 | 95/96 | 96/97 | 0.071/0.031 |
RF | NonSampling | 1,9,10,15 | 94 | 98 | 96 | 0.031 |
RF | SMOTE | 1,5,8,10 | 98 | 76 | 85 | 0.420 |
RF | UnderSample | 1,5,8,10 | 98 | 59 | 73 | 0.913 |
RF | OverSample | 1,5,8,10 | 96 | 80 | 87 | 0.318 |
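The four metric columns can be related to one another with the standard confusion-matrix definitions. The sketch below is a minimal illustration, assuming the table uses the usual definitions of recall, precision, F1 score, and false positive rate (FPR); the function names and the example counts are hypothetical and do not come from the source. The second helper recomputes F1 from a reported precision/recall pair, which gives a quick consistency check on individual rows. Because the reported percentages are already rounded, a recomputed F1 may differ from the tabulated one by about one point.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return (recall, precision, f1, fpr) as fractions in [0, 1].

    Assumes the standard definitions; the counts are supplied by the caller.
    """
    recall = tp / (tp + fn)        # share of true positives that were found
    precision = tp / (tp + fp)     # share of predicted positives that are correct
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    fpr = fp / (fp + tn)           # share of true negatives that were misflagged
    return recall, precision, f1, fpr


def f1_from_percent(precision_pct, recall_pct):
    """Recompute F1 (in per cent) from precision and recall given in per cent."""
    p = precision_pct / 100.0
    r = recall_pct / 100.0
    return 100.0 * 2 * p * r / (p + r)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not taken from the paper).
    print(confusion_metrics(tp=90, fp=10, fn=10, tn=890))

    # Consistency check on tabulated rows (precision, recall -> F1).
    print(round(f1_from_percent(51, 98)))  # DT, UnderSample, three features
    print(round(f1_from_percent(60, 99)))  # AdaBoost, UnderSample, four features
```

The check also makes the pattern in the UnderSample rows easier to read: recall stays near 98 to 99 per cent while precision drops to roughly 50 to 60 per cent, so the harmonic mean pulls F1 down towards the weaker of the two values.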