Abstract
A variable selection framework has previously been proposed to identify plant pentatricopeptide repeat (PPR) proteins. It is an effective feature selection strategy that focuses on classification performance. Random forest was used as the classifier, with variables automatically selected for discriminating between PPR functional and non-functional proteins. However, samples regarded as PPR functional proteins were misclassified at a high rate. In this paper, we modify the framework to better identify PPR functional proteins. Instead of random forest, a hybrid ensemble classifier is built with its base classifiers derived from six different classification methods. In addition, an incremental strategy and a clustering by search in descending order are alternatively used for feature selection, which can effectively select the most representative variables for identifying PPR proteins. Moreover, different base classifiers are found to alternately play an important role in the ensemble classifier as the feature dimension increases. The experimental results demonstrate the effectiveness of our improvements.
Introduction
Pentatricopeptide repeat (PPR) proteins, which occur in more than 400 forms in most species, are regarded as one of the largest protein families in land plants [1]. Typical PPR proteins are located in mitochondria [2, 3] or chloroplasts [4, 5] and bind to one or more organelle transcripts, the expression of which is affected through changes in the transcription, processing and translation of the RNA sequence [6, 7]. Their combined action has profound effects on the biogenesis and function of organelles, and consequently influences photosynthesis, respiration, plant development and environmental responses.
Many functional proteins can be predicted using sequence-based tools [8–10]. First, algorithms are designed to extract various features from amino acid sequences for protein prediction. One example is the feature named 188D, which is composed of 188 feature components (namely variables) related to the content of 20 amino acids and eight types of physicochemical properties of amino acids [11–14]. Another is the feature including 65 pseudo-amino acids, abbreviated as PAAC [15, 16]. Thereafter, classifiers such as random forest (RF) [17, 18], support vector machine (SVM) [19, 20] and the hybrid ensemble classifier composed of multiple base classifiers [21–23] can be applied to evaluate the distinguishing ability of the extracted features. In addition, a workbench named WEKA [24], which uses different classifiers to obtain quantitative classification results, has been proposed. However, different classifiers inevitably perform differently depending on the sample distribution, which calls for automatic switching among classifiers. Moreover, few studies discuss which feature components are more effective in identifying functional proteins, especially PPR proteins. In order to solve these problems, a framework of variable selection for identifying plant PPR proteins has been proposed [17]. The methionine content was first found to be effective for the recognition of PPR proteins. However, with random forest as the classifier, many PPR proteins were misclassified as false negatives.

Figure 1
The proposed ensemble classification based feature selection method. ① Ten-fold nested cross validation with its inner loop to be the resampling, training and scoring step. ② Hybrid ensemble classification with six different types of base classifiers assigned for score accumulation, incremental variable selection and classifier establishment (see the control line ). ③ Variable score accumulation. Its resampling and training step correspond to the inner loop of 10-fold nested cross validation and variable scoring, respectively. ④ Variable selection by clustering. ⑤ Variable selection using an increasing strategy. ⑥ Quantitative measurements.
In this paper, a hybrid-ensemble-classifier based framework of variable selection is presented instead of random forest for better identifying plant PPR proteins, as shown in Figure 1. First, PPR positive and negative proteins are equally divided into 10 parts for 10-fold nested cross validation. Six different classification methods are automatically selected for variable scoring, variable selection and the establishment of the ensemble classifier, in order to cope with different sample distributions. Then, multiple rounds of resampling, training and scoring are implemented on the training set to accumulate scores for each variable. In each round, the score of a variable is accumulated by making a comparison between the classification error rates before and after one-time random permutation of the remaining sample on the variable in the training set. The number of rounds for resampling, training and scoring is determined by clustering. Variables are automatically selected through clustering and a presented variable increasing strategy. Qualitative and quantitative measurements are made on the selected variables in the testing set. The experimental results indicate the effectiveness of the automatically selected variables for identification of PPR proteins, which demonstrates that the ensemble classifier composed of various base classifiers is more effective than random forest.
Method
First of all, the dataset representing plant PPR is used [10], which contains 487 PPR positive and 9590 negative protein primary sequences. Subsequently, the feature 188D is extracted. In order to discuss which components in 188D play a role in identifying plant PPR proteins, we follow the feature selection framework in Figure 1 to select important variables. More details can be seen in the following subsections.
Data division
As in the previous work [17], 243 PPR positive proteins and 4795 PPR negative ones are randomly selected as the training set, and the remaining samples are used as the test set. This sample division is used to make a comparison with the previous method. In addition, n-fold nested cross validation is thought to be more effective, especially when the sample size is limited. Therefore, CD-HIT [25] is first applied to the data in order to remove sequence redundancy, after which 170 PPR positive proteins and 9293 negative ones remain. Then, PPR positive and negative proteins are separately divided into 10 groups, with 17 PPR positive proteins and 929 PPR negative ones in each group, for 10-fold nested cross validation.
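The class-wise division into 10 groups can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_into_folds`, the random seed and the handling of the three negative samples left over after the equal split (they are simply dropped here) are assumptions.

```python
import random

def split_into_folds(n_pos, n_neg, n_folds=10, seed=0):
    """Split positive and negative sample indices separately into equal folds."""
    rng = random.Random(seed)
    pos = list(range(n_pos))
    neg = list(range(n_neg))
    rng.shuffle(pos)
    rng.shuffle(neg)
    # ASSUMPTION: samples that do not fit into equal-sized folds are dropped.
    pos_size = n_pos // n_folds
    neg_size = n_neg // n_folds
    folds = []
    for k in range(n_folds):
        folds.append({
            "pos": pos[k * pos_size:(k + 1) * pos_size],
            "neg": neg[k * neg_size:(k + 1) * neg_size],
        })
    return folds

# Sample counts reported after CD-HIT: 170 positive, 9293 negative.
folds = split_into_folds(170, 9293)
print(len(folds), len(folds[0]["pos"]), len(folds[0]["neg"]))
```

In each outer iteration of the nested cross validation, one fold would serve as the test fold and the other nine as the training set.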
Ensemble classification
Instead of random forest, an ensemble classifier is presented, which is based on multiple types of base classifiers including k-nearest neighbor (KNN), multilayer perceptron (MLP), linear discriminant analysis (LDA), Gaussian naive Bayes (GNB), support vector machine (SVM) and decision tree classifier (DTC). The ensemble classifier provides the following supports: assistance in calculating the score of each variable from the 188D feature; participation in variable increasing strategy to automatically select variables; and a distinction between PPR positive and negative proteins on the test set (see Figure 1).
Score accumulation
As illustrated in Figure 1, iterations are implemented on the training set to obtain important variables which contribute to discrimination between PPR positive and negative proteins. Each iteration consists of three steps, i.e. resampling, training and scoring.
Firstly, 70% of the training samples, i.e. 107 PPR positive proteins and 5855 PPR negative ones, are randomly selected in balance. The remaining 30% of the training samples, i.e. 46 PPR positive proteins and 2509 PPR negative ones, are known as the out-of-bag (OOB) samples.
Secondly, the selected samples are used to train each kind of base classifier, taking all components of the 188D feature into account. Six trained base classifiers are thus obtained, and the one with the best classification performance on the OOB samples is selected for variable scoring. Considering the unbalanced distribution between positive and negative samples, the classification error rate on OOB samples is modified as in Equation (1), where FN, TP, FP and TN represent the number of false negative, true positive, false positive and true negative samples, respectively. Thus, the base classifier with the lowest classification error rate is selected as the specific classifier.
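The selection of the specific base classifier can be sketched with scikit-learn as below. This is an illustration under stated assumptions rather than the authors' implementation: the modified error rate is taken here to be the mean of the two per-class error rates, which is one common way to handle class imbalance and may differ from Equation (1); the helper names `balanced_error`, `make_base_classifiers` and `select_best_classifier` are hypothetical. The six classifiers are used with default parameters, as stated in the Results section.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def balanced_error(y_true, y_pred):
    """ASSUMED error form: mean of the per-class error rates (label 1 = PPR positive).

    Both classes must be present in y_true, otherwise a division by zero occurs.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    return 0.5 * (fn / (fn + tp) + fp / (fp + tn))

def make_base_classifiers():
    """One fresh instance of each of the six base classifier types."""
    return {
        "KNN": KNeighborsClassifier(),
        "MLP": MLPClassifier(),
        "LDA": LinearDiscriminantAnalysis(),
        "GNB": GaussianNB(),
        "SVM": SVC(),
        "DTC": DecisionTreeClassifier(),
    }

def select_best_classifier(X_train, y_train, X_oob, y_oob):
    """Train all six types and keep the one with the lowest error on the OOB samples."""
    best_name, best_model, best_err = None, None, np.inf
    for name, model in make_base_classifiers().items():
        model.fit(X_train, y_train)
        err = balanced_error(y_oob, model.predict(X_oob))
        if err < best_err:
            best_name, best_model, best_err = name, model, err
    return best_name, best_model, best_err
```

Because a new selection is made in every resampling round, the chosen classifier type can change from round to round as the sample distribution changes.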
Thirdly, the sample values of each variable are permuted, and the classification error rate of the specific base classifier is calculated again using Equation (1). The difference between the error rates before and after permutation is assigned to the variable as its importance score for that iteration round. The iteration of resampling, training and scoring is executed for multiple rounds, and the accumulated score of a variable is the sum of its importance scores over all rounds. If the accumulated score of a variable is small, the variable contributes little to sample classification. Otherwise, it is regarded as an important component for sample identification.
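The resampling, training and scoring loop can be sketched as follows. The sketch makes several simplifying assumptions that are not taken from the paper: the score is computed as the error after permutation minus the error before permutation, a plain misclassification rate stands in for the modified error rate of Equation (1), and the 70%/30% split is drawn without class balancing. The function names and the `fit` callback (which returns a prediction function) are hypothetical.

```python
import numpy as np

def error_rate(y_true, y_pred):
    """Plain misclassification rate (stand-in for the modified rate of Equation (1))."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

def permutation_scores(predict, X_oob, y_oob, rng):
    """Score each variable by the error increase after permuting its values once."""
    base = error_rate(y_oob, predict(X_oob))
    scores = np.zeros(X_oob.shape[1])
    for j in range(X_oob.shape[1]):
        X_perm = X_oob.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        # ASSUMED sign: error after permutation minus error before.
        scores[j] = error_rate(y_oob, predict(X_perm)) - base
    return scores

def accumulate_scores(fit, X, y, n_rounds, rng, train_frac=0.7):
    """Repeat resampling, training and scoring, summing the per-variable scores."""
    total = np.zeros(X.shape[1])
    n = X.shape[0]
    cut = int(train_frac * n)
    for _ in range(n_rounds):
        idx = rng.permutation(n)
        train, oob = idx[:cut], idx[cut:]   # the remaining 30% act as OOB samples
        predict = fit(X[train], y[train])   # fit returns a prediction function
        total += permutation_scores(predict, X[oob], y[oob], rng)
    return total
```

A variable that the classifier does not rely on receives a score of zero in every round, so its accumulated score stays at zero, whereas a variable the classifier depends on accumulates a positive score.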
Variable selection by clustering
Once the accumulated score of each component or variable is obtained, the variables with higher scores are selected for sample classification. An automatic model selection method is necessary to select the variables that contribute to the identification of PPR positive proteins. A clustering method based on search in descending order and automatic finding of density peaks, previously proposed and abbreviated as A-DPC [26], is used to choose variables automatically and to decide when to stop the rounds of resampling, training and scoring.
The implementation details are as follows. Firstly, 1000 rounds of resampling, training and scoring are implemented, which yields the accumulated scores of all the variables in 188D. Secondly, A-DPC is applied to the accumulated scores to select variable candidates, namely the variables whose accumulated scores do not fall in the cluster with the lowest accumulated scores. Thirdly, another 1000 rounds of resampling, training and scoring are run, and A-DPC is applied to the scores accumulated over all rounds so far. This procedure is repeated until the selected variable candidates no longer change.
Variable selection using an increasing strategy
Considering that the accumulated scores of all the variables may remain at a relatively low level, i.e. no outliers appear on the scatter plot as illustrated in Figure 1, an automatic variable selection is designed instead using a variable increasing strategy.
Firstly, the variables are rearranged with their accumulated scores in descending order. Secondly, each variable is added in an incremental way according to the new order to form a feature subspace, in which the ensemble classifier is trained. Then, 1000 rounds of resampling and training are made. In each round, one of the base classifiers with the lowest classification error rate is kept as a component of the ensemble classifier. Therefore, 1000 base classifiers with different types are obtained. This incremental strategy is made until all the variables in 188D are considered. Thirdly, the established ensemble classifier is used on each fold of samples for testing to get the classification accuracy, which is expressed as,
Thus, 188 ACCs are obtained after traversing all the variables of feature 188D. Correspondingly, a line chart can be drawn, as shown in Figure 1. Fourthly, a polynomial fit is made on the ACC line chart to find the inflection point, which gives the number of variables to be selected.
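The last step can be illustrated with NumPy. The paper does not state the polynomial degree or how the inflection point is located, so the sketch below assumes a cubic fit and takes the inflection point as the zero of the second derivative of the fitted polynomial; the function name `inflection_dimension` is hypothetical.

```python
import numpy as np

def inflection_dimension(acc, degree=3):
    """Fit a polynomial to the ACC curve and return the dimension at its inflection point.

    acc[i] is the accuracy obtained with the first i + 1 ranked variables.
    Returns None if the fitted polynomial has no inflection point in range.
    """
    dims = np.arange(1, len(acc) + 1)
    coeffs = np.polyfit(dims, acc, degree)   # ASSUMPTION: cubic fit by default
    second = np.polyder(coeffs, 2)           # second derivative of the fit
    roots = np.roots(second)
    real = roots[np.isreal(roots)].real
    inside = real[(real >= 1) & (real <= len(acc))]
    if inside.size == 0:
        return None
    return int(round(inside.min()))

# Synthetic ACC curve with a known inflection point at dimension 10.
dims = np.arange(1, 21)
acc = 0.5 + 1e-4 * (dims - 10.0) ** 3
print(inflection_dimension(acc))
```

For the 188D feature, `acc` would hold the 188 ACC values obtained by the incremental strategy.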
Table 1 Quantitative results using the hybrid ensemble classifier
Feature | Confusion matrix | | | Class | TP rate | FP rate | Precision | Recall | F1-measure |
---|---|---|---|---|---|---|---|---|---|
| classified as − > | a | b | a: positive | 0.295 | 0.036 | 0.293 | 0.295 | 0.294 |
| a | 72 | 172 | b: positive | 0.964 | 0.705 | 0.964 | 0.964 | 0.964 |
| b | 174 | 4621 | weighted average | 0.629 | 0.371 | 0.628 | 0.629 | 0.629 |
| classified as − > | a | b | a: positive | 0.373 | 0.014 | 0.569 | 0.373 | 0.450 |
| a | 91 | 153 | b: positive | 0.986 | 0.627 | 0.969 | 0.986 | 0.977 |
| b | 69 | 4726 | weighted average | 0.679 | 0.321 | 0.769 | 0.679 | 0.714 |
| classified as − > | a | b | a: positive | 0.619 | 0.021 | 0.604 | 0.619 | 0.611 |
| a | 151 | 93 | b: positive | 0.979 | 0.381 | 0.981 | 0.979 | 0.980 |
| b | 99 | 4696 | weighted average | 0.799 | 0.201 | 0.792 | 0.799 | 0.796 |
| classified as − > | a | b | a: positive | 0.721 | 0.017 | 0.682 | 0.721 | 0.701 |
| a | 176 | 68 | b: positive | 0.983 | 0.279 | 0.986 | 0.983 | 0.984 |
| b | 82 | 4713 | weighted average | 0.852 | 0.148 | 0.834 | 0.852 | 0.843 |
| classified as − > | a | b | a: positive | 0.910 | 0.053 | 0.464 | 0.910 | 0.615 |
| a | 222 | 22 | b: positive | 0.947 | 0.090 | 0.995 | 0.947 | 0.970 |
| b | 256 | 4539 | weighted average | 0.928 | 0.072 | 0.730 | 0.928 | 0.793 |
188D | classified as − > | a | b | a: positive | 0.873 | 0.002 | 0.955 | 0.873 | 0.912 |
| a | 213 | 31 | b: positive | 0.998 | 0.127 | 0.994 | 0.998 | 0.996 |
| b | 10 | 4785 | weighted average | 0.935 | 0.065 | 0.974 | 0.935 | 0.954 |
Measurement
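In order to show the effectiveness of the selected variables, we select seven quantitative measures: confusion matrix, TP rate, FP rate, Precision, Recall, ACC and F1-measure. The confusion matrix describes the number of FN, TP, FP and TN samples. Accordingly, TP rate, FP rate, Precision and Recall are calculated as follows,
$$\text{TP rate} = \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{FP rate} = \frac{FP}{FP + TN}, \qquad \text{Precision} = \frac{TP}{TP + FP},$$
where TP rate and Recall are expressed in the same form. F1-measure is the harmonic average of precision and recall, which is expressed as,
$$\text{F1} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$$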
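As a check on these definitions, the measures can be computed directly from a confusion matrix. The short sketch below uses the standard definitions of the four rates; the function name `binary_metrics` is hypothetical, and the counts are those of the first confusion matrix in Table 1 with class a taken as positive.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard measures derived from the four confusion-matrix counts."""
    tp_rate = tp / (tp + fn)          # identical to recall
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    recall = tp_rate
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# First block of Table 1, class a as positive: TP = 72, FN = 172, FP = 174, TN = 4621.
m = binary_metrics(tp=72, fn=172, fp=174, tn=4621)
print({k: round(v, 3) for k, v in m.items()})
```

Rounded to three decimals, these values agree with the corresponding row of Table 1 (0.295, 0.036, 0.293, 0.295, 0.294).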
Results
Since the complexity of the base classifier MLP is the highest, the time complexity of the proposed framework is , where refers to the resampling rounds. and represent the number of samples and variables. , , and correspond to the number of neurons, hidden layers, outputs and iterations of each layer in the neural network, respectively. However, it only corresponds to the training process. As for the test step, its time complexity is only .
The implementation environment is a notebook computer with an AMD R7-5800H CPU, the Windows 10 operating system and 16 GB of memory. Python 3.9 and the scikit-learn package (version 1.0.2) are used. Six classifiers with default parameters are considered, i.e. LinearDiscriminantAnalysis, KNeighborsClassifier, DecisionTreeClassifier, SVC, GaussianNB and MLPClassifier.
Classification results among different classifiers
We used the same feature set derived from 188D as in [17]. The variables in the feature set were arranged in descending order according to their scores. For each feature dimension, a hybrid ensemble classifier was established on the training set with the same 243 PPR-positive proteins and 4795 PPR-negative ones selected in [17]. Quantitative classification results were obtained on the corresponding test set, as shown in Table 1. For comparison, random forest and other ensemble classifiers with KNN, MLP, LDA, GNB and SVM as their base classifiers were built using the same training set, and their quantitative results on the same test set are listed in Table 2 to Table 7. In addition, the corresponding classification results of the single classifiers KNN, MLP, LDA, GNB and SVM, obtained using WEKA [24], are listed in Table 8 to Table 12. The confusion matrix, TP rate, FP rate, precision, recall and F1-measure were calculated and listed. The two categories representing PPR-positive proteins (labeled a) and PPR-negative ones (labeled b) were alternately considered as positive. The average over the two classes was also calculated for TP rate, FP rate, precision, recall and F1-measure.
Table 2 Quantitative results using random forest
Feature | Confusion matrix | | | Class | TP rate | FP rate | Precision | Recall | F1-measure |
---|---|---|---|---|---|---|---|---|---|
| classified as − > | a | b | a: positive | 0.279 | 0.031 | 0.316 | 0.279 | 0.296 |
| a | 68 | 176 | b: positive | 0.969 | 0.721 | 0.964 | 0.969 | 0.966 |
| b | 147 | 4648 | weighted average | 0.624 | 0.376 | 0.640 | 0.624 | 0.631 |
| classified as − > | a | b | a: positive | 0.365 | 0.016 | 0.533 | 0.365 | 0.433 |
| a | 89 | 155 | b: positive | 0.984 | 0.635 | 0.968 | 0.984 | 0.976 |
| b | 78 | 4717 | weighted average | 0.674 | 0.326 | 0.751 | 0.674 | 0.704 |
| classified as − > | a | b | a: positive | 0.484 | 0.012 | 0.678 | 0.484 | 0.565 |
| a | 118 | 126 | b: positive | 0.988 | 0.516 | 0.974 | 0.988 | 0.981 |
| b | 56 | 4739 | weighted average | 0.736 | 0.264 | 0.826 | 0.736 | 0.773 |
| classified as − > | a | b | a: positive | 0.586 | 0.008 | 0.799 | 0.586 | 0.676 |
| a | 143 | 101 | b: positive | 0.992 | 0.414 | 0.979 | 0.992 | 0.986 |
| b | 36 | 4759 | weighted average | 0.789 | 0.211 | 0.889 | 0.789 | 0.831 |
| classified as − > | a | b | a: positive | 0.680 | 0.001 | 0.976 | 0.680 | 0.802 |
| a | 166 | 78 | b: positive | 0.999 | 0.320 | 0.984 | 0.999 | 0.992 |
| b | 4 | 4791 | weighted average | 0.840 | 0.160 | 0.980 | 0.840 | 0.897 |
188D | classified as − > | a | b | a: positive | 0.656 | 0.001 | 0.976 | 0.656 | 0.784 |
| a | 160 | 84 | b: positive | 0.999 | 0.344 | 0.983 | 0.999 | 0.991 |
| b | 1 | 4794 | weighted average | 0.827 | 0.173 | 0.979 | 0.827 | 0.888 |
Table 3 Quantitative results of the ensemble classifier with KNN as its base classifier
Feature | Confusion matrix | | | Class | TP rate | FP rate | Precision | Recall | F1-measure |
---|---|---|---|---|---|---|---|---|---|
| classified as − > | a | b | a: positive | 0.139 | 0.009 | 0.447 | 0.139 | 0.212 |
| a | 34 | 210 | b: positive | 0.991 | 0.861 | 0.958 | 0.991 | 0.974 |
| b | 42 | 4753 | weighted average | 0.565 | 0.435 | 0.703 | 0.565 | 0.593 |
| classified as − > | a | b | a: positive | 0.373 | 0.015 | 0.558 | 0.373 | 0.447 |
| a | 91 | 153 | b: positive | 0.985 | 0.627 | 0.969 | 0.985 | 0.977 |
| b | 72 | 4723 | weighted average | 0.679 | 0.321 | 0.763 | 0.679 | 0.712 |
| classified as − > | a | b | a: positive | 0.537 | 0.017 | 0.618 | 0.537 | 0.575 |
| a | 131 | 113 | b: positive | 0.983 | 0.463 | 0.977 | 0.983 | 0.980 |
| b | 81 | 4714 | weighted average | 0.760 | 0.240 | 0.797 | 0.760 | 0.777 |
| classified as − > | a | b | a: positive | 0.660 | 0.014 | 0.706 | 0.660 | 0.682 |
| a | 161 | 83 | b: positive | 0.986 | 0.340 | 0.983 | 0.986 | 0.984 |
| b | 67 | 4728 | weighted average | 0.823 | 0.177 | 0.844 | 0.823 | 0.833 |
| classified as − > | a | b | a: positive | 0.791 | 0.010 | 0.794 | 0.791 | 0.793 |
| a | 193 | 51 | b: positive | 0.990 | 0.209 | 0.989 | 0.990 | 0.989 |
| b | 50 | 4745 | weighted average | 0.890 | 0.110 | 0.892 | 0.890 | 0.891 |
188D | classified as − > | a | b | a: positive | 0.799 | 0.011 | 0.793 | 0.799 | 0.796 |
| a | 195 | 49 | b: positive | 0.989 | 0.201 | 0.990 | 0.989 | 0.990 |
| b | 51 | 4744 | weighted average | 0.894 | 0.106 | 0.891 | 0.894 | 0.893 |
Table 4. Quantitative results of the ensemble classifier with MLP as its base classifier

| Feature | Confusion matrix | | | Class | TP rate | FP rate | Precision | Recall | F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| | classified as → | a | b | a: positive | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| | a | 0 | 244 | b: negative | 1.000 | 1.000 | 0.952 | 1.000 | 0.975 |
| | b | 0 | 4795 | weighted average | 0.500 | 0.500 | 0.476 | 0.500 | 0.488 |
| | classified as → | a | b | a: positive | 0.082 | 0.004 | 0.526 | 0.082 | 0.142 |
| | a | 20 | 224 | b: negative | 0.996 | 0.918 | 0.955 | 0.996 | 0.975 |
| | b | 18 | 4777 | weighted average | 0.539 | 0.461 | 0.741 | 0.539 | 0.559 |
| | classified as → | a | b | a: positive | 0.447 | 0.010 | 0.686 | 0.447 | 0.541 |
| | a | 109 | 135 | b: negative | 0.990 | 0.553 | 0.972 | 0.990 | 0.981 |
| | b | 50 | 4745 | weighted average | 0.718 | 0.282 | 0.829 | 0.718 | 0.761 |
| | classified as → | a | b | a: positive | 0.557 | 0.009 | 0.764 | 0.557 | 0.645 |
| | a | 136 | 108 | b: negative | 0.991 | 0.443 | 0.978 | 0.991 | 0.984 |
| | b | 42 | 4753 | weighted average | 0.774 | 0.226 | 0.871 | 0.774 | 0.815 |
| | classified as → | a | b | a: positive | 0.803 | 0.006 | 0.871 | 0.803 | 0.836 |
| | a | 196 | 48 | b: negative | 0.994 | 0.197 | 0.990 | 0.994 | 0.992 |
| | b | 29 | 4766 | weighted average | 0.899 | 0.101 | 0.931 | 0.899 | 0.914 |
| 188D | classified as → | a | b | a: positive | 0.877 | 0.002 | 0.951 | 0.877 | 0.913 |
| | a | 214 | 30 | b: negative | 0.998 | 0.123 | 0.994 | 0.998 | 0.996 |
| | b | 11 | 4784 | weighted average | 0.937 | 0.063 | 0.972 | 0.937 | 0.954 |
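
To make the relation between the confusion-matrix counts and the rate columns explicit, the following minimal sketch recomputes the 188D row of Table 4 from its four counts. The helper name `class_metrics` and the variable names are illustrative assumptions, not part of the original framework or of WEKA; the formulas are the standard definitions of TP rate (recall), FP rate, precision and F1-measure.

```python
def class_metrics(tp, fn, fp, tn):
    """Return (TP rate, FP rate, precision, F1) for one class.

    tp/fn: samples of this class classified correctly/incorrectly.
    fp/tn: samples of the other class classified as this class/correctly.
    """
    tp_rate = tp / (tp + fn)                      # identical to recall
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + tp_rate
    f1 = 2 * precision * tp_rate / denom if denom else 0.0
    return tp_rate, fp_rate, precision, f1


# Counts from the 188D row of Table 4:
# a classified as a = 214, a as b = 30, b as a = 11, b as b = 4784.
class_a = class_metrics(tp=214, fn=30, fp=11, tn=4784)
class_b = class_metrics(tp=4784, fn=11, fp=30, tn=214)

# Plain (unweighted) mean of the two class rows.
mean_row = tuple((x + y) / 2 for x, y in zip(class_a, class_b))

for name, row in (("a", class_a), ("b", class_b), ("mean", mean_row)):
    print(name, [round(v, 3) for v in row])
```

Rounded to three decimals, the two class rows reproduce the tabulated values, and for this row the values reported as "weighted average" coincide with the plain mean of the two class rows.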
Table 5. Quantitative results of the ensemble classifier with LDA as its base classifier

| Feature | Confusion matrix | | | Class | TP rate | FP rate | Precision | Recall | F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| | classified as → | a | b | a: positive | 0.045 | 0.021 | 0.099 | 0.045 | 0.062 |
| | a | 11 | 233 | b: negative | 0.979 | 0.955 | 0.953 | 0.979 | 0.966 |
| | b | 100 | 4695 | weighted average | 0.512 | 0.488 | 0.526 | 0.512 | 0.514 |
| | classified as → | a | b | a: positive | 0.057 | 0.019 | 0.136 | 0.057 | 0.081 |
| | a | 14 | 230 | b: negative | 0.981 | 0.943 | 0.953 | 0.981 | 0.967 |
| | b | 89 | 4706 | weighted average | 0.519 | 0.481 | 0.545 | 0.519 | 0.524 |
| | classified as → | a | b | a: positive | 0.107 | 0.016 | 0.250 | 0.107 | 0.149 |
| | a | 26 | 218 | b: negative | 0.984 | 0.893 | 0.956 | 0.984 | 0.970 |
| | b | 78 | 4717 | weighted average | 0.545 | 0.455 | 0.603 | 0.545 | 0.560 |
| | classified as → | a | b | a: positive | 0.135 | 0.017 | 0.292 | 0.135 | 0.185 |
| | a | 33 | 211 | b: negative | 0.983 | 0.865 | 0.957 | 0.983 | 0.970 |
| | b | 80 | 4715 | weighted average | 0.559 | 0.441 | 0.625 | 0.559 | 0.577 |
| | classified as → | a | b | a: positive | 0.434 | 0.015 | 0.596 | 0.434 | 0.502 |
| | a | 106 | 138 | b: negative | 0.985 | 0.566 | 0.972 | 0.985 | 0.978 |
| | b | 72 | 4723 | weighted average | 0.710 | 0.290 | 0.784 | 0.710 | 0.740 |
| 188D | classified as → | a | b | a: positive | 0.820 | 0.013 | 0.766 | 0.820 | 0.792 |
| | a | 200 | 44 | b: negative | 0.987 | 0.180 | 0.991 | 0.987 | 0.989 |
| | b | 61 | 4734 | weighted average | 0.903 | 0.097 | 0.879 | 0.903 | 0.891 |
Table 6. Quantitative results of the ensemble classifier with GNB as its base classifier

| Feature | Confusion matrix | | | Class | TP rate | FP rate | Precision | Recall | F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| | classified as → | a | b | a: positive | 0.004 | 0.005 | 0.040 | 0.004 | 0.007 |
| | a | 1 | 243 | b: negative | 0.995 | 0.996 | 0.952 | 0.995 | 0.973 |
| | b | 24 | 4771 | weighted average | 0.500 | 0.500 | 0.496 | 0.500 | 0.490 |
| | classified as → | a | b | a: positive | 0.357 | 0.021 | 0.463 | 0.357 | 0.403 |
| | a | 87 | 157 | b: negative | 0.979 | 0.643 | 0.968 | 0.979 | 0.973 |
| | b | 101 | 4694 | weighted average | 0.668 | 0.332 | 0.715 | 0.668 | 0.688 |
| | classified as → | a | b | a: positive | 0.623 | 0.021 | 0.603 | 0.623 | 0.613 |
| | a | 152 | 92 | b: negative | 0.979 | 0.377 | 0.981 | 0.979 | 0.980 |
| | b | 100 | 4695 | weighted average | 0.801 | 0.199 | 0.792 | 0.801 | 0.796 |
| | classified as → | a | b | a: positive | 0.721 | 0.017 | 0.682 | 0.721 | 0.701 |
| | a | 176 | 68 | b: negative | 0.983 | 0.279 | 0.986 | 0.983 | 0.984 |
| | b | 82 | 4713 | weighted average | 0.852 | 0.148 | 0.834 | 0.852 | 0.843 |
| | classified as → | a | b | a: positive | 0.910 | 0.054 | 0.461 | 0.910 | 0.612 |
| | a | 222 | 22 | b: negative | 0.946 | 0.090 | 0.995 | 0.946 | 0.970 |
| | b | 260 | 4535 | weighted average | 0.928 | 0.072 | 0.728 | 0.928 | 0.791 |
| 188D | classified as → | a | b | a: positive | 0.955 | 0.293 | 0.142 | 0.955 | 0.248 |
| | a | 233 | 11 | b: negative | 0.707 | 0.045 | 0.997 | 0.707 | 0.827 |
| | b | 1404 | 3391 | weighted average | 0.831 | 0.169 | 0.570 | 0.831 | 0.538 |
Table 7. Quantitative results of the ensemble classifier with SVM as its base classifier

| Feature | Confusion matrix | | | Class | TP rate | FP rate | Precision | Recall | F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| | classified as → | a | b | a: positive | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| | a | 0 | 244 | b: negative | 1.000 | 1.000 | 0.952 | 1.000 | 0.975 |
| | b | 0 | 4795 | weighted average | 0.500 | 0.500 | 0.476 | 0.500 | 0.488 |
| | classified as → | a | b | a: positive | 0.074 | 0.004 | 0.486 | 0.074 | 0.128 |
| | a | 18 | 226 | b: negative | 0.996 | 0.926 | 0.955 | 0.996 | 0.975 |
| | b | 19 | 4776 | weighted average | 0.535 | 0.465 | 0.721 | 0.535 | 0.552 |
| | classified as → | a | b | a: positive | 0.496 | 0.010 | 0.708 | 0.496 | 0.583 |
| | a | 121 | 123 | b: negative | 0.990 | 0.504 | 0.975 | 0.990 | 0.982 |
| | b | 50 | 4745 | weighted average | 0.743 | 0.257 | 0.841 | 0.743 | 0.783 |
| | classified as → | a | b | a: positive | 0.582 | 0.009 | 0.763 | 0.582 | 0.660 |
| | a | 142 | 102 | b: negative | 0.991 | 0.418 | 0.979 | 0.991 | 0.985 |
| | b | 44 | 4751 | weighted average | 0.786 | 0.214 | 0.871 | 0.786 | 0.823 |
| | classified as → | a | b | a: positive | 0.754 | 0.004 | 0.911 | 0.754 | 0.825 |
| | a | 184 | 60 | b: negative | 0.996 | 0.246 | 0.988 | 0.996 | 0.992 |
| | b | 18 | 4777 | weighted average | 0.875 | 0.125 | 0.949 | 0.875 | 0.909 |
| 188D | classified as → | a | b | a: positive | 0.828 | 0.001 | 0.981 | 0.828 | 0.898 |
| | a | 202 | 42 | b: negative | 0.999 | 0.172 | 0.991 | 0.999 | 0.995 |
| | b | 4 | 4791 | weighted average | 0.914 | 0.086 | 0.986 | 0.914 | 0.947 |
Table 8. Quantitative results of KNN using WEKA

| Feature | Confusion matrix | | | Class | TP rate | FP rate | Precision | Recall | F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| | classified as → | a | b | a: positive | 0.262 | 0.032 | 0.294 | 0.262 | 0.277 |
| | a | 64 | 180 | b: negative | 0.968 | 0.738 | 0.963 | 0.968 | 0.965 |
| | b | 154 | 4641 | weighted average | 0.615 | 0.385 | 0.628 | 0.615 | 0.621 |
| | classified as → | a | b | a: positive | 0.352 | 0.030 | 0.372 | 0.352 | 0.362 |
| | a | 86 | 158 | b: negative | 0.970 | 0.648 | 0.967 | 0.970 | 0.968 |
| | b | 145 | 4650 | weighted average | 0.661 | 0.339 | 0.670 | 0.661 | 0.665 |
| | classified as → | a | b | a: positive | 0.455 | 0.028 | 0.455 | 0.455 | 0.455 |
| | a | 111 | 133 | b: negative | 0.972 | 0.545 | 0.972 | 0.972 | 0.972 |
| | b | 133 | 4662 | weighted average | 0.714 | 0.286 | 0.714 | 0.714 | 0.714 |
| | classified as → | a | b | a: positive | 0.578 | 0.023 | 0.562 | 0.578 | 0.570 |
| | a | 141 | 103 | b: negative | 0.977 | 0.422 | 0.978 | 0.977 | 0.978 |
| | b | 110 | 4685 | weighted average | 0.777 | 0.223 | 0.770 | 0.777 | 0.774 |
| | classified as → | a | b | a: positive | 0.799 | 0.020 | 0.666 | 0.799 | 0.726 |
| | a | 195 | 49 | b: negative | 0.980 | 0.201 | 0.990 | 0.980 | 0.985 |
| | b | 98 | 4697 | weighted average | 0.889 | 0.111 | 0.828 | 0.889 | 0.855 |
| 188D | classified as → | a | b | a: positive | 0.840 | 0.024 | 0.639 | 0.840 | 0.726 |
| | a | 205 | 39 | b: negative | 0.976 | 0.160 | 0.992 | 0.976 | 0.984 |
| | b | 116 | 4679 | weighted average | 0.908 | 0.092 | 0.815 | 0.908 | 0.855 |
Table 9 Quantitative results of MLP using WEKA

| Feature | Confusion matrix | | | Class | TP rate | FP rate | Precision | Recall | F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| | classified as -> | a | b | a: positive | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| | a | 0 | 244 | b: positive | 1.000 | 1.000 | 0.952 | 1.000 | 0.975 |
| | b | 0 | 4795 | weighted average | 0.500 | 0.500 | 0.476 | 0.500 | 0.488 |
| | classified as -> | a | b | a: positive | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| | a | 0 | 244 | b: positive | 1.000 | 1.000 | 0.952 | 1.000 | 0.975 |
| | b | 0 | 4795 | weighted average | 0.500 | 0.500 | 0.476 | 0.500 | 0.488 |
| | classified as -> | a | b | a: positive | 0.238 | 0.012 | 0.500 | 0.238 | 0.322 |
| | a | 58 | 186 | b: positive | 0.988 | 0.762 | 0.962 | 0.988 | 0.975 |
| | b | 58 | 4737 | weighted average | 0.613 | 0.387 | 0.731 | 0.613 | 0.649 |
| | classified as -> | a | b | a: positive | 0.504 | 0.009 | 0.737 | 0.504 | 0.599 |
| | a | 123 | 121 | b: positive | 0.991 | 0.496 | 0.975 | 0.991 | 0.983 |
| | b | 44 | 4751 | weighted average | 0.747 | 0.253 | 0.856 | 0.747 | 0.791 |
| | classified as -> | a | b | a: positive | 0.750 | 0.008 | 0.832 | 0.750 | 0.789 |
| | a | 183 | 61 | b: positive | 0.992 | 0.250 | 0.987 | 0.992 | 0.990 |
| | b | 37 | 4758 | weighted average | 0.871 | 0.129 | 0.910 | 0.871 | 0.889 |
| 188D | classified as -> | a | b | a: positive | 0.004 | 0.001 | 0.167 | 0.004 | 0.008 |
| | a | 1 | 243 | b: positive | 0.999 | 0.996 | 0.952 | 0.999 | 0.975 |
| | b | 5 | 4790 | weighted average | 0.502 | 0.498 | 0.559 | 0.502 | 0.491 |

Table 10 Quantitative results of LDA using WEKA

| Feature | Confusion matrix | | | Class | TP rate | FP rate | Precision | Recall | F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| | classified as -> | a | b | a: positive | 0.033 | 0.021 | 0.075 | 0.033 | 0.046 |
| | a | 8 | 236 | b: positive | 0.979 | 0.967 | 0.952 | 0.979 | 0.966 |
| | b | 99 | 4696 | weighted average | 0.506 | 0.494 | 0.513 | 0.506 | 0.506 |
| | classified as -> | a | b | a: positive | 0.045 | 0.018 | 0.112 | 0.045 | 0.064 |
| | a | 11 | 233 | b: positive | 0.982 | 0.955 | 0.953 | 0.982 | 0.967 |
| | b | 87 | 4708 | weighted average | 0.513 | 0.487 | 0.533 | 0.513 | 0.516 |
| | classified as -> | a | b | a: positive | 0.082 | 0.016 | 0.211 | 0.082 | 0.118 |
| | a | 20 | 224 | b: positive | 0.984 | 0.918 | 0.955 | 0.984 | 0.969 |
| | b | 75 | 4720 | weighted average | 0.533 | 0.467 | 0.583 | 0.533 | 0.544 |
| | classified as -> | a | b | a: positive | 0.102 | 0.017 | 0.236 | 0.102 | 0.143 |
| | a | 25 | 219 | b: positive | 0.983 | 0.898 | 0.956 | 0.983 | 0.969 |
| | b | 81 | 4714 | weighted average | 0.543 | 0.457 | 0.596 | 0.543 | 0.556 |
| | classified as -> | a | b | a: positive | 0.381 | 0.015 | 0.564 | 0.381 | 0.455 |
| | a | 93 | 151 | b: positive | 0.985 | 0.619 | 0.969 | 0.985 | 0.977 |
| | b | 72 | 4723 | weighted average | 0.683 | 0.317 | 0.766 | 0.683 | 0.716 |
| 188D | classified as -> | a | b | a: positive | 0.783 | 0.012 | 0.767 | 0.783 | 0.775 |
| | a | 191 | 53 | b: positive | 0.988 | 0.217 | 0.989 | 0.988 | 0.988 |
| | b | 58 | 4737 | weighted average | 0.885 | 0.115 | 0.878 | 0.885 | 0.882 |

Table 11 Quantitative results of GNB using WEKA

| Feature | Confusion matrix | | | Class | TP rate | FP rate | Precision | Recall | F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| | classified as -> | a | b | a: positive | 0.004 | 0.006 | 0.031 | 0.004 | 0.007 |
| | a | 1 | 243 | b: positive | 0.994 | 0.996 | 0.951 | 0.994 | 0.972 |
| | b | 31 | 4764 | weighted average | 0.499 | 0.501 | 0.491 | 0.499 | 0.490 |
| | classified as -> | a | b | a: positive | 0.270 | 0.020 | 0.412 | 0.270 | 0.327 |
| | a | 66 | 178 | b: positive | 0.980 | 0.730 | 0.964 | 0.980 | 0.972 |
| | b | 94 | 4701 | weighted average | 0.625 | 0.375 | 0.688 | 0.625 | 0.649 |
| | classified as -> | a | b | a: positive | 0.537 | 0.020 | 0.582 | 0.537 | 0.559 |
| | a | 131 | 113 | b: positive | 0.980 | 0.463 | 0.977 | 0.980 | 0.978 |
| | b | 102 | 4693 | weighted average | 0.759 | 0.241 | 0.779 | 0.759 | 0.769 |
| | classified as -> | a | b | a: positive | 0.656 | 0.018 | 0.650 | 0.656 | 0.653 |
| | a | 160 | 84 | b: positive | 0.982 | 0.344 | 0.982 | 0.982 | 0.982 |
| | b | 86 | 4709 | weighted average | 0.819 | 0.181 | 0.816 | 0.819 | 0.818 |
| | classified as -> | a | b | a: positive | 0.914 | 0.064 | 0.419 | 0.914 | 0.575 |
| | a | 223 | 21 | b: positive | 0.936 | 0.086 | 0.995 | 0.936 | 0.965 |
| | b | 309 | 4486 | weighted average | 0.925 | 0.075 | 0.707 | 0.925 | 0.770 |
| 188D | classified as -> | a | b | a: positive | 0.955 | 0.309 | 0.136 | 0.955 | 0.238 |
| | a | 233 | 11 | b: positive | 0.691 | 0.045 | 0.997 | 0.691 | 0.816 |
| | b | 1482 | 3313 | weighted average | 0.823 | 0.177 | 0.566 | 0.823 | 0.527 |

Table 12 Quantitative results of SVM using WEKA

| Feature | Confusion matrix | | | Class | TP rate | FP rate | Precision | Recall | F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| | classified as -> | a | b | a: positive | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| | a | 0 | 244 | b: positive | 1.000 | 1.000 | 0.952 | 1.000 | 0.975 |
| | b | 0 | 4795 | weighted average | 0.500 | 0.500 | 0.476 | 0.500 | 0.488 |
| | classified as -> | a | b | a: positive | 0.307 | 0.009 | 0.630 | 0.307 | 0.413 |
| | a | 75 | 169 | b: positive | 0.991 | 0.693 | 0.966 | 0.991 | 0.978 |
| | b | 44 | 4751 | weighted average | 0.649 | 0.351 | 0.798 | 0.649 | 0.696 |
| | classified as -> | a | b | a: positive | 0.447 | 0.008 | 0.732 | 0.447 | 0.555 |
| | a | 109 | 135 | b: positive | 0.992 | 0.553 | 0.972 | 0.992 | 0.982 |
| | b | 40 | 4755 | weighted average | 0.719 | 0.281 | 0.852 | 0.719 | 0.768 |
| | classified as -> | a | b | a: positive | 0.590 | 0.008 | 0.796 | 0.590 | 0.678 |
| | a | 144 | 100 | b: positive | 0.992 | 0.410 | 0.979 | 0.992 | 0.986 |
| | b | 37 | 4758 | weighted average | 0.791 | 0.209 | 0.887 | 0.791 | 0.832 |
| | classified as -> | a | b | a: positive | 0.693 | 0.003 | 0.923 | 0.693 | 0.792 |
| | a | 169 | 75 | b: positive | 0.997 | 0.307 | 0.985 | 0.997 | 0.991 |
| | b | 14 | 4781 | weighted average | 0.845 | 0.155 | 0.954 | 0.845 | 0.891 |
| 188D | classified as -> | a | b | a: positive | 0.172 | 0.000 | 0.977 | 0.172 | 0.293 |
| | a | 42 | 202 | b: positive | 1.000 | 0.828 | 0.960 | 1.000 | 0.979 |
| | b | 1 | 4794 | weighted average | 0.586 | 0.414 | 0.968 | 0.586 | 0.636 |


Figure 2
Scatter plots derived from feature 188D. Each plot corresponds to one fold, with the x-axis representing the variable and the y-axis its score.
A comparison with the results in Table 1 and Table 2 yields the following observations. First, the average TP rate of the hybrid ensemble classifier is 0.005 higher than that of random forest when the one-dimensional and two-dimensional features are used to classify the test samples. Second, it is 0.063 higher when the three-dimensional and four-dimensional features are taken into account. Third, it is 0.088 and 0.108 higher when the 13-dimensional feature and 188D are used, respectively. Fourth, for PPR-positive proteins (labeled a), the TP rate of the hybrid ensemble classifier using 188D is 0.217 higher than that of random forest, whereas for PPR-negative proteins (labeled b) it is 0.001 lower. Meanwhile, with 188D the confusion matrix of the hybrid ensemble classifier contains more TP samples than that of random forest, which indicates the better classification performance of the hybrid ensemble classifier. In contrast, the hybrid ensemble classifier also produces more FP samples; this is acceptable given the larger sample size of PPR-negative proteins. Finally, a comparison across these tables shows that the hybrid ensemble classifier performs better than both the ensemble classifiers built on a single type of base classifier and the single classifiers.
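The per-class quantities compared above (TP rate, FP rate, precision, recall and F1-measure) all follow from the four cells of a two-class confusion matrix. The sketch below is only an illustration of that arithmetic, not the WEKA implementation; the function name `class_metrics` is hypothetical, and the input is the 188D confusion matrix reported for SVM in Table 12.

```python
def class_metrics(tp, fn, fp, tn):
    """Metrics for the class treated as positive (hypothetical helper).

    tp/fn: samples of this class classified correctly / incorrectly.
    fp/tn: samples of the other class classified as this class / correctly.
    """
    tp_rate = tp / (tp + fn) if tp + fn else 0.0      # identical to recall
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + tp_rate
    f1 = 2 * precision * tp_rate / denom if denom else 0.0
    return {"tp_rate": tp_rate, "fp_rate": fp_rate,
            "precision": precision, "recall": tp_rate, "f1": f1}


# 188D confusion matrix for SVM (Table 12): rows are the actual classes.
#            classified as a   classified as b
# actual a          42               202
# actual b           1              4794
class_a = class_metrics(tp=42, fn=202, fp=1, tn=4794)
class_b = class_metrics(tp=4794, fn=1, fp=202, tn=42)

for name, metrics in (("a", class_a), ("b", class_b)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})

# Unweighted mean of the two per-class TP rates.
mean_tp_rate = (class_a["tp_rate"] + class_b["tp_rate"]) / 2
print("mean TP rate:", round(mean_tp_rate, 3))
```

Rounded to three decimals, these values agree with the corresponding row of Table 12, and the unweighted mean of the two TP rates (0.586) agrees with the entry listed there as the weighted average.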

Figure 3
Gaussian fitting on clusters of the accumulated variable-importance scores for 10-fold cross validation. Each sub-figure shows the Gaussian fit of the accumulated scores of the variables, with the x-axis representing the variable scores and the y-axis their probability densities.

Figure 4
The ACCs of 10-fold cross validation on each feature dimension. Each sub-figure corresponds to the result of a fold.
Classification results after using 10-fold nested cross validation
To remove data redundancy, CD-HIT [25] was used with a 25% cutoff, so that no two retained protein sequences share more than 25% similarity. Redundancy removal was applied separately to the 487 PPR-positive protein sequences and the 9590 PPR-negative ones. This left 170 PPR-positive proteins and 9293 PPR-negative proteins, which constitute the non-redundant data. We randomly divided the non-redundant PPR-positive and PPR-negative proteins into 10 groups, with 17 PPR-positive proteins and 929 or 930 PPR-negative ones in each group (9293 is not divisible by 10). In each fold, nine groups were used as the training set and the remaining one as the test set. In this way, 10-fold cross validation was performed.
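The grouping described above can be sketched as follows. This is a minimal illustration under stated assumptions: the sequence identifiers are placeholders, the helper name `split_into_groups` and the random seed are hypothetical, and the source does not state how the random division was actually implemented.

```python
import random


def split_into_groups(items, k=10, seed=0):
    """Shuffle the items and deal them into k groups of near-equal size."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


# Placeholder identifiers standing in for the non-redundant sequences.
positives = [f"pos_{i}" for i in range(170)]
negatives = [f"neg_{i}" for i in range(9293)]

pos_groups = split_into_groups(positives)
neg_groups = split_into_groups(negatives)

folds = []
for i in range(10):
    # Group i is the test set; the other nine groups form the training set.
    test = pos_groups[i] + neg_groups[i]
    train = [s for j in range(10) if j != i
             for s in pos_groups[j] + neg_groups[j]]
    folds.append((train, test))

print([len(g) for g in pos_groups])
print(sorted(len(g) for g in neg_groups))
```

Splitting each class separately keeps the ratio of PPR-positive to PPR-negative proteins nearly the same in every group, which matters here because the positive class is so small.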
After n rounds of score accumulation on each training set, 10 scatter plots were obtained, as illustrated in Figure 2. The accumulated scores of the variables are very close to one another, except for several outliers. As stated previously, the variables are to be selected automatically. First, the clustering strategy derived from A-DPC [26] was applied, and Gaussian fitting [27] was performed on the resulting clusters of accumulated scores. The corresponding results are shown in Figure 3. In each fold, the Gaussian mixture component with the lowest accumulated scores was eliminated, and the remaining variables were regarded as a feature candidate. The intersection of the feature candidates derived from the 10 folds was then taken as the important feature. Note that the important feature is simply a more compact version of a feature candidate, so 10-fold cross validation can be carried out on it. Second, the incremental strategy was implemented on each fold: ensemble classification was performed, the ACC of each feature dimension was calculated, and polynomial fitting was applied to the ACCs across dimensions. The results for the 10 folds are illustrated in Figure 4.
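The first selection route (drop the lowest-scoring component in each fold, then intersect the per-fold candidates) can be sketched as below. The sketch makes several assumptions that are not in the source: the component assignment of each variable is taken as given rather than computed by A-DPC and Gaussian fitting, "lowest accumulated scores" is interpreted as the lowest mean score, and the function names, variable names and toy numbers are all hypothetical.

```python
from statistics import mean


def fold_candidate(scores, component_of):
    """Return the variables left after dropping the lowest-scoring component.

    scores: variable -> accumulated importance score in this fold.
    component_of: variable -> component label (assumed to be precomputed).
    """
    by_component = {}
    for var, comp in component_of.items():
        by_component.setdefault(comp, []).append(scores[var])
    # Assumption: the component to eliminate is the one with the lowest mean.
    lowest = min(by_component, key=lambda comp: mean(by_component[comp]))
    return {var for var, comp in component_of.items() if comp != lowest}


def important_feature(candidates):
    """Intersection of the feature candidates from all folds."""
    return set.intersection(*candidates)


# Toy data for two folds (illustrative values only).
fold_scores = [
    {"v1": 9.0, "v2": 8.5, "v3": 1.0, "v4": 0.8},
    {"v1": 9.2, "v2": 1.1, "v3": 0.9, "v4": 8.8},
]
fold_components = [
    {"v1": 0, "v2": 0, "v3": 1, "v4": 1},
    {"v1": 0, "v2": 1, "v3": 1, "v4": 0},
]

candidates = [fold_candidate(s, c)
              for s, c in zip(fold_scores, fold_components)]
selected = important_feature(candidates)
print(sorted(selected))
```

Because only variables that survive in every fold are kept, the intersection can be no larger than the smallest feature candidate, which is why the important feature is the more compact of the two.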
Table 13 Quantitative results of the one-dimensional feature using the hybrid ensemble classifier

| Feature | Confusion matrix | | | Class | TP rate | FP rate | Precision | Recall | F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| | classified as -> | a | b | a: positive | 0.176 | 0.014 | 0.188 | 0.176 | 0.182 |
| | a | 3 | 14 | b: positive | 0.986 | 0.824 | 0.985 | 0.986 | 0.985 |
| | b | 13 | 917 | weighted average | 0.581 | 0.419 | 0.586 | 0.581 | 0.584 |
| | classified as -> | a | b | a: positive | 0.118 | 0.019 | 0.100 | 0.118 | 0.108 |
| | a | 2 | 15 | b: positive | 0.981 | 0.882 | 0.984 | 0.981 | 0.982 |
| | b | 18 | 912 | weighted average | 0.549 | 0.451 | 0.542 | 0.549 | 0.545 |
| | classified as -> | a | b | a: positive | 0.118 | 0.019 | 0.100 | 0.118 | 0.108 |
| | a | 2 | 15 | b: positive | 0.981 | 0.882 | 0.984 | 0.981 | 0.982 |
| | b | 18 | 912 | weighted average | 0.549 | 0.451 | 0.542 | 0.549 | 0.545 |
| | classified as -> | a | b | a: positive | 0.118 | 0.014 | 0.133 | 0.118 | 0.125 |
| | a | 2 | 15 | b: positive | 0.986 | 0.882 | 0.984 | 0.986 | 0.985 |
| | b | 13 | 916 | weighted average | 0.552 | 0.448 | 0.559 | 0.552 | 0.555 |
| | classified as -> | a | b | a: positive | 0.235 | 0.010 | 0.308 | 0.235 | 0.267 |
| | a | 4 | 13 | b: positive | 0.990 | 0.765 | 0.986 | 0.990 | 0.988 |
| | b | 9 | 920 | weighted average | 0.613 | 0.387 | 0.647 | 0.613 | 0.627 |
| | classified as -> | a | b | a: positive | 0.000 | 0.016 | 0.000 | 0.000 | 0.000 |
| | a | 0 | 17 | b: positive | 0.984 | 1.000 | 0.982 | 0.984 | 0.983 |
| | b | 15 | 914 | weighted average | 0.492 | 0.508 | 0.491 | 0.492 | 0.491 |
| | classified as -> | a | b | a: positive | 0.000 | 0.022 | 0.000 | 0.000 | 0.000 |
| | a | 0 | 17 | b: positive | 0.978 | 1.000 | 0.982 | 0.978 | 0.980 |
| | b | 20 | 909 | weighted average | 0.489 | 0.511 | 0.491 | 0.489 | 0.490 |
| | classified as -> | a | b | a: positive | 0.059 | 0.015 | 0.067 | 0.059 | 0.062 |
| | a | 1 | 16 | b: positive | 0.985 | 0.941 | 0.983 | 0.985 | 0.984 |
| | b | 14 | 915 | weighted average | 0.522 | 0.478 | 0.525 | 0.522 | 0.523 |
| | classified as -> | a | b | a: positive | 0.059 | 0.009 | 0.111 | 0.059 | 0.077 |
| | a | 1 | 16 | b: positive | 0.991 | 0.941 | 0.983 | 0.991 | 0.987 |
| | b | 8 | 921 | weighted average | 0.525 | 0.475 | 0.547 | 0.525 | 0.532 |
| | classified as -> | a | b | a: positive | 0.059 | 0.014 | 0.071 | 0.059 | 0.065 |
| | a | 1 | 16 | b: positive | 0.986 | 0.941 | 0.983 | 0.986 | 0.984 |
| | b | 13 | 916 | weighted average | 0.522 | 0.478 | 0.527 | 0.522 | 0.524 |

Table 13 Quantitative results of the one-dimensional feature using the hybrid ensemble classifier
| Feature | Confusion matrix | | | Class | TP rate | FP rate | Precision | Recall | F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| | classified as → | a | b | a: positive | 0.176 | 0.014 | 0.188 | 0.176 | 0.182 |
| | a | 3 | 14 | b: positive | 0.986 | 0.824 | 0.985 | 0.986 | 0.985 |
| | b | 13 | 917 | weighted average | 0.581 | 0.419 | 0.586 | 0.581 | 0.584 |
| | classified as → | a | b | a: positive | 0.118 | 0.019 | 0.100 | 0.118 | 0.108 |
| | a | 2 | 15 | b: positive | 0.981 | 0.882 | 0.984 | 0.981 | 0.982 |
| | b | 18 | 912 | weighted average | 0.549 | 0.451 | 0.542 | 0.549 | 0.545 |
| | classified as → | a | b | a: positive | 0.118 | 0.019 | 0.100 | 0.118 | 0.108 |
| | a | 2 | 15 | b: positive | 0.981 | 0.882 | 0.984 | 0.981 | 0.982 |
| | b | 18 | 912 | weighted average | 0.549 | 0.451 | 0.542 | 0.549 | 0.545 |
| | classified as → | a | b | a: positive | 0.118 | 0.014 | 0.133 | 0.118 | 0.125 |
| | a | 2 | 15 | b: positive | 0.986 | 0.882 | 0.984 | 0.986 | 0.985 |
| | b | 13 | 916 | weighted average | 0.552 | 0.448 | 0.559 | 0.552 | 0.555 |
| | classified as → | a | b | a: positive | 0.235 | 0.010 | 0.308 | 0.235 | 0.267 |
| | a | 4 | 13 | b: positive | 0.990 | 0.765 | 0.986 | 0.990 | 0.988 |
| | b | 9 | 920 | weighted average | 0.613 | 0.387 | 0.647 | 0.613 | 0.627 |
| | classified as → | a | b | a: positive | 0.000 | 0.016 | 0.000 | 0.000 | 0.000 |
| | a | 0 | 17 | b: positive | 0.984 | 1.000 | 0.982 | 0.984 | 0.983 |
| | b | 15 | 914 | weighted average | 0.492 | 0.508 | 0.491 | 0.492 | 0.491 |
| | classified as → | a | b | a: positive | 0.000 | 0.022 | 0.000 | 0.000 | 0.000 |
| | a | 0 | 17 | b: positive | 0.978 | 1.000 | 0.982 | 0.978 | 0.980 |
| | b | 20 | 909 | weighted average | 0.489 | 0.511 | 0.491 | 0.489 | 0.490 |
| | classified as → | a | b | a: positive | 0.059 | 0.015 | 0.067 | 0.059 | 0.062 |
| | a | 1 | 16 | b: positive | 0.985 | 0.941 | 0.983 | 0.985 | 0.984 |
| | b | 14 | 915 | weighted average | 0.522 | 0.478 | 0.525 | 0.522 | 0.523 |
| | classified as → | a | b | a: positive | 0.059 | 0.009 | 0.111 | 0.059 | 0.077 |
| | a | 1 | 16 | b: positive | 0.991 | 0.941 | 0.983 | 0.991 | 0.987 |
| | b | 8 | 921 | weighted average | 0.525 | 0.475 | 0.547 | 0.525 | 0.532 |
| | classified as → | a | b | a: positive | 0.059 | 0.014 | 0.071 | 0.059 | 0.065 |
| | a | 1 | 16 | b: positive | 0.986 | 0.941 | 0.983 | 0.986 | 0.984 |
| | b | 13 | 916 | weighted average | 0.522 | 0.478 | 0.527 | 0.522 | 0.524 |
The quantitative results for the important feature derived from the clustering strategy, together with those for the variable with the highest accumulated score and for 188D, are listed in Table 13, Table 14 and Table 15. The average TP rates of the important feature exceed those of 188D. For the qualitative results, 10 ACCs were calculated; a line chart of these ACCs and their average value is shown in Figure 5(A). For the incremental strategy, the 10 ACCs from the 10 folds were recorded as one line for each feature dimension, giving 188 lines, as shown in Figure 5(B). The blue line refers to the one-dimensional feature and the red line corresponds to 188D. The gray lines represent the ACCs of the features with 2 to 187 dimensions. In addition, the feature derived from the polynomial fitting is marked with a green star in each fold. Figure 5(B) indicates that classification performance is better with a feature of dimension higher than 10 selected by the incremental strategy than with 188D. The quantitative results of the 10 folds derived from the incremental strategy are listed in Table 16, each row of which corresponds to the feature derived from the polynomial fitting in one fold. Comparing the qualitative results (see Figure 5) with the quantitative results (see Table 13, Table 14, Table 15 and Table 16) shows no apparent difference in classification performance between the incremental strategy and the clustering strategy.
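To make the table columns easier to read, the following minimal sketch recomputes the per-class TP rate, FP rate, precision, recall and F1-measure from one confusion matrix, using the first matrix of Table 13 (a: 3, 14; b: 13, 917) as input. The function name `class_metrics` and the convention of returning 0 when a denominator is zero are assumptions made for illustration; they are not taken from the original framework.

```python
def class_metrics(tp, fn, fp, tn):
    """Per-class rates from confusion-matrix counts.

    Assumption for illustration: a ratio with a zero denominator is reported as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)            # TP rate and recall are the same quantity
    return {
        "tp_rate": recall,
        "fp_rate": ratio(fp, fp + tn),
        "precision": ratio(tp, tp + fp),
        "recall": recall,
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# First confusion matrix of Table 13: rows are true classes, columns are predictions.
#              classified as a   classified as b
# true a              3                14
# true b             13               917
class_a = class_metrics(tp=3, fn=14, fp=13, tn=917)
class_b = class_metrics(tp=917, fn=13, fp=14, tn=3)

# Plain mean of the two class rows, for comparison with the summary row of the table.
mean_row = {key: (class_a[key] + class_b[key]) / 2 for key in class_a}

for label, row in (("a", class_a), ("b", class_b), ("mean", mean_row)):
    print(label, {key: round(value, 3) for key, value in row.items()})
```

For this matrix, the values for class a round to a TP rate of 0.176, an FP rate of 0.014, a precision of 0.188 and an F1-measure of 0.182, in agreement with the first row of Table 13. The plain mean of the two class rows also reproduces the values listed in the "weighted average" row for this matrix (for example 0.581 for the TP rate).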
Table 14 Quantitative results of the feature selected by the clustering strategy using the hybrid ensemble classifier
| Feature | Confusion matrix | | | Class | TP rate | FP rate | Precision | Recall | F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| | classified as → | a | b | a: positive | 0.941 | 0.076 | 0.184 | 0.941 | 0.308 |
| | a | 16 | 1 | b: positive | 0.924 | 0.059 | 0.999 | 0.924 | 0.960 |
| | b | 71 | 859 | weighted average | 0.932 | 0.068 | 0.591 | 0.932 | 0.634 |
| | classified as → | a | b | a: positive | 0.941 | 0.065 | 0.211 | 0.941 | 0.344 |
| | a | 16 | 1 | b: positive | 0.935 | 0.059 | 0.999 | 0.935 | 0.966 |
| | b | 60 | 870 | weighted average | 0.938 | 0.062 | 0.605 | 0.938 | 0.655 |
| | classified as → | a | b | a: positive | 0.706 | 0.083 | 0.135 | 0.706 | 0.226 |
| | a | 12 | 5 | b: positive | 0.917 | 0.294 | 0.994 | 0.917 | 0.954 |
| | b | 77 | 853 | weighted average | 0.812 | 0.188 | 0.565 | 0.812 | 0.590 |
| | classified as → | a | b | a: positive | 0.824 | 0.073 | 0.171 | 0.824 | 0.283 |
| | a | 14 | 3 | b: positive | 0.927 | 0.176 | 0.997 | 0.927 | 0.960 |
| | b | 68 | 861 | weighted average | 0.875 | 0.125 | 0.584 | 0.875 | 0.622 |
| | classified as → | a | b | a: positive | 0.941 | 0.064 | 0.213 | 0.941 | 0.348 |
| | a | 16 | 1 | b: positive | 0.936 | 0.059 | 0.999 | 0.936 | 0.967 |
| | b | 59 | 870 | weighted average | 0.939 | 0.061 | 0.606 | 0.939 | 0.657 |
| | classified as → | a | b | a: positive | 0.941 | 0.075 | 0.186 | 0.941 | 0.311 |
| | a | 16 | 1 | b: positive | 0.925 | 0.059 | 0.999 | 0.925 | 0.960 |
| | b | 70 | 859 | weighted average | 0.933 | 0.067 | 0.592 | 0.933 | 0.635 |
| | classified as → | a | b | a: positive | 0.882 | 0.078 | 0.172 | 0.882 | 0.288 |
| | a | 15 | 2 | b: positive | 0.922 | 0.118 | 0.998 | 0.922 | 0.959 |
| | b | 72 | 857 | weighted average | 0.902 | 0.098 | 0.585 | 0.902 | 0.624 |
| | classified as → | a | b | a: positive | 0.882 | 0.075 | 0.176 | 0.882 | 0.294 |
| | a | 15 | 2 | b: positive | 0.925 | 0.118 | 0.998 | 0.925 | 0.960 |
| | b | 70 | 859 | weighted average | 0.904 | 0.096 | 0.587 | 0.904 | 0.627 |
| | classified as → | a | b | a: positive | 0.765 | 0.071 | 0.165 | 0.765 | 0.271 |
| | a | 13 | 4 | b: positive | 0.929 | 0.235 | 0.995 | 0.929 | 0.961 |
| | b | 66 | 863 | weighted average | 0.847 | 0.153 | 0.580 | 0.847 | 0.616 |
| | classified as → | a | b | a: positive | 0.941 | 0.059 | 0.225 | 0.941 | 0.364 |
| | a | 16 | 1 | b: positive | 0.941 | 0.059 | 0.999 | 0.941 | 0.969 |
| | b | 55 | 874 | weighted average | 0.941 | 0.059 | 0.612 | 0.941 | 0.666 |
Table 15 Quantitative results of 188D using the hybrid ensemble classifier
| Feature | Confusion matrix | | | Class | TP rate | FP rate | Precision | Recall | F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| 188D | classified as → | a | b | a: positive | 0.824 | 0.000 | 1.000 | 0.824 | 0.903 |
| | a | 14 | 3 | b: positive | 1.000 | 0.176 | 0.997 | 1.000 | 0.998 |
| | b | 0 | 930 | weighted average | 0.912 | 0.088 | 0.998 | 0.912 | 0.951 |
| 188D | classified as → | a | b | a: positive | 0.765 | 0.001 | 0.929 | 0.765 | 0.839 |
| | a | 13 | 4 | b: positive | 0.999 | 0.235 | 0.996 | 0.999 | 0.997 |
| | b | 1 | 929 | weighted average | 0.882 | 0.118 | 0.962 | 0.882 | 0.918 |
| 188D | classified as → | a | b | a: positive | 0.706 | 0.001 | 0.923 | 0.706 | 0.800 |
| | a | 12 | 5 | b: positive | 0.999 | 0.294 | 0.995 | 0.999 | 0.997 |
| | b | 1 | 929 | weighted average | 0.852 | 0.148 | 0.959 | 0.852 | 0.898 |
| 188D | classified as → | a | b | a: positive | 0.588 | 0.001 | 0.909 | 0.588 | 0.714 |
| | a | 10 | 7 | b: positive | 0.999 | 0.412 | 0.993 | 0.999 | 0.996 |
| | b | 1 | 928 | weighted average | 0.794 | 0.206 | 0.951 | 0.794 | 0.855 |
| 188D | classified as → | a | b | a: positive | 0.824 | 0.000 | 1.000 | 0.824 | 0.903 |
| | a | 14 | 3 | b: positive | 1.000 | 0.176 | 0.997 | 1.000 | 0.998 |
| | b | 0 | 929 | weighted average | 0.912 | 0.088 | 0.998 | 0.912 | 0.951 |
| 188D | classified as → | a | b | a: positive | 0.706 | 0.002 | 0.857 | 0.706 | 0.774 |
| | a | 12 | 5 | b: positive | 0.998 | 0.294 | 0.995 | 0.998 | 0.996 |
| | b | 2 | 927 | weighted average | 0.852 | 0.148 | 0.926 | 0.852 | 0.885 |
| 188D | classified as → | a | b | a: positive | 0.588 | 0.000 | 1.000 | 0.588 | 0.741 |
| | a | 10 | 7 | b: positive | 1.000 | 0.412 | 0.993 | 1.000 | 0.996 |
| | b | 0 | 929 | weighted average | 0.794 | 0.206 | 0.996 | 0.794 | 0.868 |
| 188D | classified as → | a | b | a: positive | 0.765 | 0.001 | 0.929 | 0.765 | 0.839 |
| | a | 13 | 4 | b: positive | 0.999 | 0.235 | 0.996 | 0.999 | 0.997 |
| | b | 1 | 928 | weighted average | 0.882 | 0.118 | 0.962 | 0.882 | 0.918 |
| 188D | classified as → | a | b | a: positive | 0.706 | 0.001 | 0.923 | 0.706 | 0.800 |
| | a | 12 | 5 | b: positive | 0.999 | 0.294 | 0.995 | 0.999 | 0.997 |
| | b | 1 | 928 | weighted average | 0.852 | 0.148 | 0.959 | 0.852 | 0.898 |
| 188D | classified as → | a | b | a: positive | 0.765 | 0.000 | 1.000 | 0.765 | 0.867 |
| | a | 13 | 4 | b: positive | 1.000 | 0.235 | 0.996 | 1.000 | 0.998 |
| | b | 0 | 929 | weighted average | 0.882 | 0.118 | 0.998 | 0.882 | 0.832 |
Table 16 Quantitative results of the feature selected by the incremental strategy using the hybrid ensemble classifier
| Feature | Confusion matrix | | | Class | TP rate | FP rate | Precision | Recall | F1-measure |
|---|---|---|---|---|---|---|---|---|---|
| | classified as → | a | b | a: positive | 1.000 | 0.035 | 0.340 | 1.000 | 0.507 |
| | a | 17 | 0 | b: positive | 0.965 | 0.000 | 1.000 | 0.965 | 0.982 |
| | b | 33 | 897 | weighted average | 0.982 | 0.018 | 0.670 | 0.982 | 0.745 |
| | classified as → | a | b | a: positive | 0.882 | 0.040 | 0.288 | 0.882 | 0.435 |
| | a | 15 | 2 | b: positive | 0.960 | 0.118 | 0.998 | 0.960 | 0.979 |
| | b | 37 | 893 | weighted average | 0.921 | 0.079 | 0.643 | 0.921 | 0.707 |
| | classified as → | a | b | a: positive | 0.647 | 0.006 | 0.647 | 0.647 | 0.647 |
| | a | 11 | 6 | b: positive | 0.994 | 0.353 | 0.994 | 0.994 | 0.994 |
| | b | 6 | 924 | weighted average | 0.820 | 0.180 | 0.820 | 0.820 | 0.820 |
| | classified as → | a | b | a: positive | 0.824 | 0.053 | 0.222 | 0.824 | 0.350 |
| | a | 14 | 3 | b: positive | 0.947 | 0.176 | 0.997 | 0.947 | 0.971 |
| | b | 49 | 880 | weighted average | 0.885 | 0.115 | 0.609 | 0.885 | 0.661 |
| | classified as → | a | b | a: positive | 0.882 | 0.017 | 0.484 | 0.882 | 0.625 |
| | a | 15 | 2 | b: positive | 0.983 | 0.118 | 0.998 | 0.983 | 0.990 |
| | b | 16 | 913 | weighted average | 0.933 | 0.067 | 0.741 | 0.933 | 0.808 |
| | classified as → | a | b | a: positive | 0.882 | 0.041 | 0.283 | 0.882 | 0.429 |
| | a | 15 | 2 | b: positive | 0.959 | 0.118 | 0.998 | 0.959 | 0.978 |
| | b | 38 | 891 | weighted average | 0.921 | 0.079 | 0.640 | 0.921 | 0.703 |
| | classified as → | a | b | a: positive | 0.706 | 0.055 | 0.190 | 0.706 | 0.300 |
| | a | 12 | 5 | b: positive | 0.945 | 0.294 | 0.994 | 0.945 | 0.969 |
| | b | 51 | 878 | weighted average | 0.825 | 0.175 | 0.592 | 0.825 | 0.635 |
| | classified as → | a | b | a: positive | 0.882 | 0.037 | 0.306 | 0.882 | 0.455 |
| | a | 15 | 2 | b: positive | 0.963 | 0.118 | 0.998 | 0.963 | 0.980 |
| | b | 34 | 895 | weighted average | 0.923 | 0.077 | 0.652 | 0.923 | 0.717 |
| | classified as → | a | b | a: positive | 0.824 | 0.022 | 0.412 | 0.824 | 0.549 |
| | a | 14 | 3 | b: positive | 0.978 | 0.176 | 0.997 | 0.978 | 0.988 |
| | b | 20 | 909 | weighted average | 0.901 | 0.099 | 0.704 | 0.901 | 0.768 |
| | classified as → | a | b | a: positive | 0.941 | 0.047 | 0.267 | 0.941 | 0.416 |
| | a | 16 | 1 | b: positive | 0.953 | 0.059 | 0.999 | 0.953 | 0.975 |
| | b | 44 | 885 | weighted average | 0.947 | 0.053 | 0.633 | 0.947 | 0.695 |
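The comparison of average TP rates across folds can be checked with a short sketch. It averages the class a TP rates of the 10 confusion matrices in Table 14, Table 15 and Table 16; the values are copied from the "a: positive" rows of those tables, while the dictionary name `tp_rate_class_a` and its labels are hypothetical names chosen for this illustration.

```python
from statistics import fmean

# Class a TP rates, one value per confusion matrix, copied from the "a: positive" rows.
tp_rate_class_a = {
    "clustering strategy (Table 14)": [
        0.941, 0.941, 0.706, 0.824, 0.941, 0.941, 0.882, 0.882, 0.765, 0.941,
    ],
    "188D (Table 15)": [
        0.824, 0.765, 0.706, 0.588, 0.824, 0.706, 0.588, 0.765, 0.706, 0.765,
    ],
    "incremental strategy (Table 16)": [
        1.000, 0.882, 0.647, 0.824, 0.882, 0.882, 0.706, 0.882, 0.824, 0.941,
    ],
}

# Arithmetic mean over the 10 matrices for each feature.
mean_tp_rate = {name: fmean(values) for name, values in tp_rate_class_a.items()}

for name, value in mean_tp_rate.items():
    print(f"{name}: mean TP rate of class a = {value:.3f}")
```

Under these assumptions the means are about 0.876 for the clustering strategy, 0.724 for 188D and 0.847 for the incremental strategy. Both selected features therefore have a higher average TP rate for class a than 188D, and the two strategies differ from each other by roughly 0.03.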
Table 16 Quantitative results of the feature selected by the incremental strategy using the hybrid ensemble classifier
| Confusion matrix | | | Class | TP rate | FP rate | Precision | Recall | F1-measure |
|---|---|---|---|---|---|---|---|---|
| classified as − > | a | b | a: positive | 1.000 | 0.035 | 0.340 | 1.000 | 0.507 |
| a | 17 | 0 | b: positive | 0.965 | 0.000 | 1.000 | 0.965 | 0.982 |
| b | 33 | 897 | weighted average | 0.982 | 0.018 | 0.670 | 0.982 | 0.745 |
| classified as − > | a | b | a: positive | 0.882 | 0.040 | 0.288 | 0.882 | 0.435 |
| a | 15 | 2 | b: positive | 0.960 | 0.118 | 0.998 | 0.960 | 0.979 |
| b | 37 | 893 | weighted average | 0.921 | 0.079 | 0.643 | 0.921 | 0.707 |
| classified as − > | a | b | a: positive | 0.647 | 0.006 | 0.647 | 0.647 | 0.647 |
| a | 11 | 6 | b: positive | 0.994 | 0.353 | 0.994 | 0.994 | 0.994 |
| b | 6 | 924 | weighted average | 0.820 | 0.180 | 0.820 | 0.820 | 0.820 |
| classified as − > | a | b | a: positive | 0.824 | 0.053 | 0.222 | 0.824 | 0.350 |
| a | 14 | 3 | b: positive | 0.947 | 0.176 | 0.997 | 0.947 | 0.971 |
| b | 49 | 880 | weighted average | 0.885 | 0.115 | 0.609 | 0.885 | 0.661 |
| classified as − > | a | b | a: positive | 0.882 | 0.017 | 0.484 | 0.882 | 0.625 |
| a | 15 | 2 | b: positive | 0.983 | 0.118 | 0.998 | 0.983 | 0.990 |
| b | 16 | 913 | weighted average | 0.933 | 0.067 | 0.741 | 0.933 | 0.808 |
| classified as − > | a | b | a: positive | 0.882 | 0.041 | 0.283 | 0.882 | 0.429 |
| a | 15 | 2 | b: positive | 0.959 | 0.118 | 0.998 | 0.959 | 0.978 |
| b | 38 | 891 | weighted average | 0.921 | 0.079 | 0.640 | 0.921 | 0.703 |
| classified as − > | a | b | a: positive | 0.706 | 0.055 | 0.190 | 0.706 | 0.300 |
| a | 12 | 5 | b: positive | 0.945 | 0.294 | 0.994 | 0.945 | 0.969 |
| b | 51 | 878 | weighted average | 0.825 | 0.175 | 0.592 | 0.825 | 0.635 |
| classified as − > | a | b | a: positive | 0.882 | 0.037 | 0.306 | 0.882 | 0.455 |
| a | 15 | 2 | b: positive | 0.963 | 0.118 | 0.998 | 0.963 | 0.980 |
| b | 34 | 895 | weighted average | 0.923 | 0.077 | 0.652 | 0.923 | 0.717 |
| classified as − > | a | b | a: positive | 0.824 | 0.022 | 0.412 | 0.824 | 0.549 |
| a | 14 | 3 | b: positive | 0.978 | 0.176 | 0.997 | 0.978 | 0.988 |
| b | 20 | 909 | weighted average | 0.901 | 0.099 | 0.704 | 0.901 | 0.768 |
| classified as − > | a | b | a: positive | 0.941 | 0.047 | 0.267 | 0.941 | 0.416 |
| a | 16 | 1 | b: positive | 0.953 | 0.059 | 0.999 | 0.953 | 0.975 |
| b | 44 | 885 | weighted average | 0.947 | 0.053 | 0.633 | 0.947 | 0.695 |
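The per-class figures in each block of the table can be reproduced from its confusion matrix. The following is a minimal sketch, assuming rows are true classes and columns are predicted classes; the function name `per_class_metrics` and the variable names are illustrative and not taken from the paper. For the first confusion matrix, the averaged row appears to match the unweighted mean of the two class rows, which is what `macro` computes here.

```python
def per_class_metrics(matrix, labels):
    """Compute TP rate, FP rate, precision, recall and F1 for each class.

    matrix[i][j] is the number of samples of true class i predicted as class j
    (an assumption about the table layout, not stated in the paper).
    """
    total = sum(sum(row) for row in matrix)
    results = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp
        fp = sum(row[k] for row in matrix) - tp
        tn = total - tp - fn - fp
        recall = tp / (tp + fn) if tp + fn else 0.0
        fp_rate = fp / (fp + tn) if fp + tn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[label] = {
            "tp_rate": recall,  # TP rate and recall are the same quantity
            "fp_rate": fp_rate,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return results


# First confusion matrix of the table: true class a -> (17, 0), true class b -> (33, 897).
matrix = [[17, 0], [33, 897]]
metrics = per_class_metrics(matrix, ["a", "b"])

# Unweighted mean over the two classes.
macro = {
    name: sum(m[name] for m in metrics.values()) / len(metrics)
    for name in ("tp_rate", "fp_rate", "precision", "recall", "f1")
}

for label, values in metrics.items():
    print(label, {name: round(v, 3) for name, v in values.items()})
print("mean", {name: round(v, 3) for name, v in macro.items()})
```

With only 17 samples in class a against roughly 930 in class b, a handful of misclassified class-b samples is enough to pull the precision of class a down sharply, even when its TP rate is 1.000.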

Figure 5
The line charts of the ACCs. (A) The line chart of the ACCs using the clustering strategy. (B) The line charts of the ACCs on each feature dimension using the incremental strategy.
To judge which base classifier contributes most to the hybrid ensemble classifier, the number of times each base classifier was assigned to the ensemble was recorded at each feature dimension. The 10 line charts in Figure 6 correspond to the 10 folds. They show that different base classifiers are selected at different feature dimensions when the ensemble classifier is built.

Figure 6
Line charts representing the count of each base classifier in accordance with the feature dimension in each fold.
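The bookkeeping behind such charts can be sketched as a simple tally. This is an illustrative sketch only: the record layout `(fold, dimension, classifier_name)`, the function name `count_selections` and the sample log are assumptions made for the example, not the authors' implementation or data.

```python
from collections import Counter, defaultdict


def count_selections(records):
    """Tally how often each base classifier was selected at each feature dimension.

    records: iterable of (fold, dimension, classifier_name) tuples.
    Returns a nested mapping {fold: {dimension: Counter({classifier_name: count})}}.
    """
    counts = defaultdict(lambda: defaultdict(Counter))
    for fold, dimension, name in records:
        counts[fold][dimension][name] += 1
    return counts


# Hypothetical selection log for one fold; these are NOT values from the paper.
records = [
    (1, 2, "DTC"), (1, 2, "KNN"), (1, 2, "DTC"),
    (1, 10, "GNB"), (1, 10, "GNB"), (1, 10, "KNN"),
]
counts = count_selections(records)

# One line per feature dimension, most frequently selected classifier first.
for dimension in sorted(counts[1]):
    print(dimension, counts[1][dimension].most_common())
```

Plotting one line per classifier name against the feature dimension, separately for each fold, would give charts of the kind described in Figure 6.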
Discussions
The experimental results above support the following observations. First, we consider whether the hybrid ensemble classifier outperforms other classifiers in identifying PPR proteins. According to Table 2, the TP rate of PPR negative proteins reaches 0.999 when random forest is used for classification. However, the maximum TP rate of PPR positive proteins is only 0.680, which means that nearly one-third of PPR positive proteins are wrongly classified as PPR negative. When the hybrid ensemble classifier is applied to the same samples, the TP rate of PPR negative proteins reaches 0.998 and the maximum TP rate of PPR positive proteins is 0.910, as listed in Table 1. These comparative results indicate that the hybrid ensemble classifier achieves better classification performance than random forest. In addition, Figure 5(A) shows that the mean ACC and the corresponding standard deviation are 0.902 and 0.042 with the 16-dimensional feature, which further demonstrates the effectiveness of the hybrid ensemble classifier.
Second, we discuss which variable selection strategy is more reliable: the clustering strategy or the incremental strategy. Qualitative and quantitative analyses were made on the features selected by the two strategies. As shown in Figure 5(A), the mean ACC is 0.902 with the feature selected by the clustering strategy, whereas the mean ACC for the feature selected by the incremental strategy is 0.896 with a standard deviation of 0.053 (see Figure 5(B)). The small difference between the two mean ACCs indicates that both strategies work on the dataset representing plant PPR proteins. In practice, the two strategies can be run separately, and a choice can be made after a careful comparison of their classification results.
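As a quick check on how small this gap is, the difference can be set against the two reported standard deviations. The calculation uses only the values quoted in the Discussions (0.902 with standard deviation 0.042, and 0.896 with standard deviation 0.053); the symbol \Delta_{\mathrm{ACC}} is introduced here for illustration.

```latex
\Delta_{\mathrm{ACC}} = 0.902 - 0.896 = 0.006, \qquad
\frac{\Delta_{\mathrm{ACC}}}{0.042} \approx 0.14, \qquad
\frac{\Delta_{\mathrm{ACC}}}{0.053} \approx 0.11
```

The gap between the two mean ACCs is therefore well below either standard deviation.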
Third, we discuss which base classifier is more appropriate at different feature dimensions. As shown in Figure 6, DTC and KNN are selected most often when the feature dimension is low, whereas GNB is selected as the feature dimension increases. Thus, different base classifiers are assigned at different feature dimensions. We will investigate this phenomenon further in future work.
Conclusion
In this study, we improved our feature selection framework for PPR protein identification. Ten-fold nested cross validation was used to make the results more reliable, and a rule for automatically stopping the resampling was introduced. A hybrid ensemble classifier was applied in place of the previously used random forest. For automatic variable selection, both a clustering strategy and an incremental strategy were considered. The improved classification results demonstrate the effectiveness of these changes. Finally, we found that the automatic assignment of a base classifier is closely associated with the feature dimension.
Key Points
The hybrid ensemble classifier has better classification performance than other classifiers.
An automatic feature selection is proposed, which is based on either an incremental strategy or a clustering by search in descending order.
Different base classifiers alternately play an important role in the hybrid ensemble classifier as the feature dimension increases.
Data availability
The real dataset analyzed for this study can be found in UniProt at https://www.uniprot.org/ and can be downloaded at https://www.frontiersin.org/articles/10.3389/fpls.2018.01961.
Authors' Contributions Statement
W.G.H. conceived the general project and supervised it. Z.X.D. initiated the idea, conceived the whole process, and finalized the paper. Z.J.W. was the principal developer and carried out the experiments. L.T. helped to provide the clustering strategy and the incremental strategy.
Funding
This work has been supported by the Natural Science Foundation of China (Nos. 62072095, 62225109), the Fundamental Research Funds for the Central Universities (No. 2572021CG03), the Natural Science Foundation of Heilongjiang Province (No. LH2020F002) and the Specialized Personnel Start-up Grant (No. 520-60201521039).
Xudong Zhao received the B.S. degree in intelligent instrument, the M.S. degree in computer science and technology and the Ph.D. degree in artificial intelligence and information processing from Harbin Institute of Technology, Harbin, China, in 2003, 2007 and 2013, respectively. He was a post-doctoral fellow in computer science and engineering at the Chinese University of Hong Kong in 2014. Currently, he is an associate professor in the College of Information and Computer Engineering, Northeast Forestry University, Harbin, China. His research interests include feature selection, clustering, discovery of signatures for prognosis in different cancers, differential expression analysis on expression profiles and medical image processing.
Jingwen Zhai received the B.S. degree in computer science and technology from Harbin University of Science and Technology, Harbin, China, in 2020. He is currently pursuing the master’s degree with the College of Information and Computer Engineering, Northeast Forestry University, under the supervision of X. D. Zhao. His research interests include pattern recognition, bioinformatics and machine learning.
Tong Liu received the B.S. degree in computer science and technology from Northeast Forestry University, Harbin, China, in 2020. He is currently pursuing the master’s degree with the College of Information and Computer Engineering, Northeast Forestry University, under the supervision of X. D. Zhao. His research interests include pattern recognition, bioinformatics, medical image processing and machine learning.
Guohua Wang is a professor in the College of Information and Computer Engineering, Northeast Forestry University. He received the master’s and Ph.D. degrees in computer science and technology from Harbin Institute of Technology in 2003 and 2009, respectively. He worked at Johns Hopkins University as a postdoctoral fellow from 2014 to 2016. His research interests are bioinformatics, machine learning and algorithm design.
References
1. Barkan A, Small I. Pentatricopeptide repeat proteins in plants. Annu Rev Plant Biol 2014;65(1):415–42.
2. Zhang Q, Yanghong X, Huang J, et al. The rice pentatricopeptide repeat protein ppr756 is involved in pollen development by affecting multiple RNA editing in mitochondria. Front Plant Sci 2020;11:749.
3. Li XJ, Zhang YF, Hou M, et al. Small kernel 1 encodes a pentatricopeptide repeat protein required for mitochondrial nad7 transcript editing and seed development in maize (Zea mays) and rice (Oryza sativa). Plant J 2014;79(5):797–809.
4. Wang X, Zhao L, Man Y, et al. Pdm4, a pentatricopeptide repeat protein, affects chloroplast gene expression and chloroplast development in Arabidopsis thaliana. Front Plant Sci 2020;11:1198.
5. Zhang J, Xiao J, Li Y, et al. Pdm3, a pentatricopeptide repeat-containing protein, affects chloroplast development. J Exp Bot 2017;68(20):5615–27.
6. Toda T, Fujii S, Noguchi K, et al. Rice mpr25 encodes a pentatricopeptide repeat protein and is essential for RNA editing of nad5 transcripts in mitochondria. Plant J 2012;72(3):450–60.
7. Liu Y-J, Xiu Z-H, Meeley R, et al. Empty pericarp5 encodes a pentatricopeptide repeat protein that is required for mitochondrial RNA editing and seed development in maize. Plant Cell 2013;25(3):868–83.
8. Wei L, Tang J, Zou Q. Local-dpp: an improved DNA-binding protein prediction method by exploring local evolutionary information. Inform Sci 2017;384:135–44.
9. Tang H, Zhao Y-W, Zou P, et al. Hbpred: a tool to identify growth hormone-binding proteins. Int J Biol Sci 2018;14(8):957–64.
10. Kaiyang Q, Wei L, Yu J, et al. Identifying plant pentatricopeptide repeat coding gene/protein using mixed feature extraction methods. Front Plant Sci 2019;1961.
11. Congzhong Cai LY, Han ZL, Ji XC, et al. Svm-prot: web-based support vector machine software for functional classification of a protein from its primary sequence. Nucleic Acids Res 2003;31(13):3692–7.
12. Hou R, Wang L, Yi-Jun W. Predicting ATP-binding cassette transporters using the random forest method. Front Genet 2020;11:156.
13. Kaiyang Q, Zou Q, Shi H. Prediction of diabetic protein markers based on an ensemble method. Front Biosci 2021;26(7):207–21.
14. Ao C, Zhou W, Gao L, et al. Prediction of antioxidant proteins using hybrid feature representation method and random forest. Genomics 2020;112(6):4666–74.
15. Amin A, Awais M, Sahai S, et al. idrp-pseaac: identification of DNA replication proteins using general PSEAAC and position dependent features. Int J Peptide Res Ther 2021;27(2):1315–29.
16. Pufeng D, Wang X, Chao X, et al. Pseaac-builder: a cross-platform stand-alone program for generating various special chou’s pseudo-amino acid compositions. Anal Biochem 2012;425(2):117–9.
17. Zhao X, Wang H, Li H, et al. Identifying plant pentatricopeptide repeat proteins using a variable selection method. Front Plant Sci 2021;12:298.
18. Hakala K, Kaewphan S, Björne J, et al. Neural network and random forest models in protein function prediction. BioRxiv 2019;690271.
19. Gong Y, Liao B, Wang P, et al. Drughybrid_bs: using hybrid feature combined with bagging-svm to predict potentially druggable proteins. Front Pharmacol 2021;3467.
20. Zhang Y, Ni J, Gao Y. Rf-svm: identification of DNA-binding proteins based on comprehensive feature representation methods and support vector machine. Prot Struct Funct Bioinformatics 2022;90(2):395–404.
21. Zhang J, Lv L, Donglei L, et al. Variable selection from a feature representing protein sequences: a case of classification on bacterial type iv secreted effectors. BMC Bioinformatics 2020;21(1):1–15.
22. Dai W, Chen B, Peng W, et al. A novel multi-ensemble method for identifying essential proteins. J Comput Biol 2021;28(7):637–49.
23. Wang N, Zhang J, Liu B. idrbp-el: identifying DNA- and RNA-binding proteins based on hierarchical ensemble learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021.
24. Frank E, Hall M, Trigg L, et al. Data mining in bioinformatics using Weka. Bioinformatics 2004;20(15):2479–81.
25. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006;22(13):1658–9.
26. Liu T, Li H, Zhao X. Clustering by search in descending order and automatic find of density peaks. IEEE Access 2019;7:133772–80.
27. Li R, Perneczky R, Yakushev I, et al. Gaussian mixture models and model selection for [18f] fluorodeoxyglucose positron emission tomography classification in alzheimer’s disease. PloS One 2015;10(4):e0122731.
© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com