Ensemble classification based feature selection: a case of identification on plant pentatricopeptide repeat proteins

Zhao, Xudong; Zhai, Jingwen; Liu, Tong; Wang, Guohua

doi:10.1093/bib/bbac369

Abstract

A variable selection framework has previously been proposed for identifying plant pentatricopeptide repeat (PPR) proteins. It is an effective feature selection strategy that focuses on classification performance. Random forest was used as the classifier, with certain variables automatically selected to discriminate between PPR functional and non-functional proteins. However, samples regarded as PPR functional proteins were misclassified at a high rate. In this paper, we modify the framework to better identify PPR functional proteins. Instead of random forest, a hybrid ensemble classifier is built whose base classifiers are derived from six different classification methods. In addition, an incremental strategy and a clustering by search in descending order are used alternatively for feature selection, which can effectively select the most representative variables for identifying PPR proteins. We also find that different base classifiers alternately play an important role in the ensemble classifier as the feature dimension increases. The experimental results demonstrate the effectiveness of our improvements.

pentatricopeptide repeat, feature selection, variable importance, ensemble classification, an incremental way, clustering

Issue Section:

Problem solving protocol

Introduction

Pentatricopeptide repeat (PPR) proteins, which occur in more than 400 forms in most species, are regarded as one of the largest protein families in land plants [1]. Typical PPR proteins are located in mitochondria [2, 3] or chloroplasts [4, 5] and bind to one or more organelle transcripts, whose expression they affect by altering the transcription, processing and translation of the RNA sequence [6, 7]. Their combined action has profound effects on the biogenesis and function of organelles, and consequently influences photosynthesis, respiration, plant development and environmental responses.

Many functional proteins can be predicted using sequence-based tools [8–10]. First, algorithms are designed to extract various features from amino acid sequences for protein prediction. One example is the feature named 188D, which is composed of 188 feature components (namely variables) related to the content of the 20 amino acids and eight types of physicochemical properties of amino acids [11–14]. Another is the feature including 65 pseudo-amino acids, abbreviated as PAAC [15, 16]. Thereafter, classifiers such as random forest (RF) [17, 18], support vector machine (SVM) [19, 20] and the hybrid ensemble classifier composed of multiple base classifiers [21–23] can be applied to evaluate the discriminative ability of the extracted features. In addition, a workbench named WEKA [24], which uses different classifiers to obtain quantitative classification results, has been proposed. However, different classifiers inevitably perform differently depending on the sample distribution, which creates a need for automatic switching among classifiers. Moreover, few studies have discussed which feature components are most effective in identifying functional proteins, especially PPR proteins. To address these problems, a framework of variable selection for identifying plant PPR proteins has been proposed [17], which first found the methionine content to be effective for recognizing PPR proteins. However, owing to the use of random forest, many PPR proteins were wrongly classified and counted as false negatives.

Figure 1

The proposed ensemble classification based feature selection method. ① Ten-fold nested cross validation, whose inner loop is the resampling, training and scoring step. ② Hybrid ensemble classification with six different types of base classifiers assigned for score accumulation, incremental variable selection and classifier establishment (see the control line $\to$). ③ Variable score accumulation. Its resampling and training step and its variable scoring step correspond to the inner loop of 10-fold nested cross validation. ④ Variable selection by clustering. ⑤ Variable selection using an increasing strategy. ⑥ Quantitative measurements.

In this paper, a variable selection framework based on a hybrid ensemble classifier, rather than random forest, is presented for better identification of plant PPR proteins, as shown in Figure 1. First, PPR positive and negative proteins are each divided equally into 10 parts for 10-fold nested cross validation. To cope with different sample distributions, base classifiers from six different classification methods are automatically selected for variable scoring, variable selection and the establishment of the ensemble classifier. Then, multiple rounds of resampling, training and scoring are run on the training set to accumulate a score for each variable. In each round, a variable's score is obtained by comparing the classification error rates on the remaining (out-of-bag) samples of the training set before and after a single random permutation of that variable's values. The number of rounds of resampling, training and scoring is determined by clustering. Variables are automatically selected through clustering and a proposed variable increasing strategy. Qualitative and quantitative measurements are made on the selected variables using the test set. The experimental results indicate that the automatically selected variables are effective for identifying PPR proteins, and that the ensemble classifier composed of various base classifiers is more effective than random forest.

Method

First, the plant PPR dataset from [10] is used, which contains 487 PPR positive and 9590 PPR negative protein primary sequences. Subsequently, the 188D feature is extracted. To determine which components of 188D play a role in identifying plant PPR proteins, we follow the feature selection framework in Figure 1 to select important variables. More details are given in the following subsections.

Data division

As in the previous work [17], 243 PPR positive proteins and 4795 PPR negative ones are randomly selected as the training set, and the remaining samples are used as the test set. This division allows a comparison with the previous method. In addition, n-fold nested cross validation is considered more effective, especially when the sample size is limited. Therefore, CD-HIT [25] is first applied to the data to remove sequence redundancy, after which 170 PPR positive proteins and 9293 PPR negative ones remain. The PPR positive and negative proteins are then separately divided into 10 groups, with 17 PPR positive proteins and 929 PPR negative ones in each group, for 10-fold nested cross validation.
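The group construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the identifiers `pos_*` and `neg_*` are placeholders for the CD-HIT filtered sequences, the helper names are hypothetical, and the three negatives left over after cutting ten groups of 929 are assumed to stay in every training set (the text does not say how they are handled; this choice is consistent with the 5855 + 2509 negatives quoted in the score accumulation step).

```python
import random


def split_into_groups(ids, n_groups, rng):
    """Shuffle ids, cut n_groups equal groups, and return (groups, leftover)."""
    shuffled = list(ids)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_groups
    groups = [shuffled[k * size:(k + 1) * size] for k in range(n_groups)]
    return groups, shuffled[n_groups * size:]


def outer_folds(pos_ids, neg_ids, n_groups=10, seed=0):
    """Yield (train, test) id lists for each outer fold."""
    rng = random.Random(seed)
    pos_groups, pos_left = split_into_groups(pos_ids, n_groups, rng)
    neg_groups, neg_left = split_into_groups(neg_ids, n_groups, rng)
    for k in range(n_groups):
        test = pos_groups[k] + neg_groups[k]
        # Assumption: samples that do not fit into a group always stay in training.
        train = pos_left + neg_left
        for j in range(n_groups):
            if j != k:
                train = train + pos_groups[j] + neg_groups[j]
        yield train, test


# Placeholder identifiers standing in for the CD-HIT filtered sequences.
positives = [f"pos_{i}" for i in range(170)]
negatives = [f"neg_{i}" for i in range(9293)]
folds = list(outer_folds(positives, negatives))
train, test = folds[0]
print(len(folds), len(train), len(test))
```

Each test fold then holds 17 positives and 929 negatives, and the inner loop (resampling, training and scoring) would operate on the corresponding `train` list only.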

Ensemble classification

Instead of random forest, an ensemble classifier is presented, which is based on multiple types of base classifiers including k-nearest neighbor (KNN), multilayer perceptron (MLP), linear discriminant analysis (LDA), Gaussian naive Bayes (GNB), support vector machine (SVM) and decision tree classifier (DTC). The ensemble classifier serves three purposes: it assists in calculating the score of each variable from the 188D feature, participates in the variable increasing strategy to automatically select variables, and distinguishes between PPR positive and negative proteins on the test set (see Figure 1).

Score accumulation

As illustrated in Figure 1, iterations are implemented on the training set to obtain important variables which contribute to discrimination between PPR positive and negative proteins. Each iteration consists of three steps, i.e. resampling, training and scoring.

Firstly, 70% of the training samples, i.e. 107 PPR positive proteins and 5855 PPR negative ones, are randomly selected in balance. The remaining 30% of the training samples, i.e. 46 PPR positive proteins and 2509 PPR negative ones, are known as the out-of-bag (OOB) samples.

Secondly, the selected samples are used to train each type of base classifier, taking all the components of the 188D feature into account. Six trained base classifiers are thus obtained, and the one with the best classification performance on the OOB samples is selected for variable scoring. To account for the unbalanced distribution between positive and negative samples, the classification error rate on the OOB samples is modified. That is,

\begin{aligned} Err_{OOB} = \frac{1}{2} \left( \frac{FN}{TP + FN} + \frac{FP}{TN + FP} \right), \end{aligned}

(1)

where $FN$, $TP$, $FP$ and $TN$ represent the numbers of false negative, true positive, false positive and true negative samples, respectively. Thus, the base classifier with the lowest classification error rate is selected as the specific classifier.
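As a rough illustration of this selection step, the sketch below trains the six scikit-learn classifiers named in this paper with default parameters and keeps the one with the lowest $Err_{OOB}$ from Equation (1). The synthetic data from `make_classification`, the stratified 70/30 split, the fixed seeds and the helper names are assumptions made for the example and are not taken from the original implementation.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def balanced_error(y_true, y_pred):
    """Equation (1): mean of the false negative rate and the false positive rate."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return 0.5 * (fn / (tp + fn) + fp / (tn + fp))


def make_base_classifiers(seed=0):
    """One instance of each of the six base classifier types, default parameters."""
    return {
        "KNN": KNeighborsClassifier(),
        "MLP": MLPClassifier(random_state=seed),
        "LDA": LinearDiscriminantAnalysis(),
        "GNB": GaussianNB(),
        "SVM": SVC(),
        "DTC": DecisionTreeClassifier(random_state=seed),
    }


def select_best_classifier(X_in, y_in, X_oob, y_oob, seed=0):
    """Train every base classifier and return the one with the lowest OOB error."""
    fitted, errors = {}, {}
    for name, clf in make_base_classifiers(seed).items():
        fitted[name] = clf.fit(X_in, y_in)
        errors[name] = balanced_error(y_oob, fitted[name].predict(X_oob))
    best = min(errors, key=errors.get)
    return best, fitted[best], errors


# Synthetic, imbalanced stand-in data (roughly 10% positives); not PPR data.
X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)
# Assumption: "in balance" is read here as a class-stratified 70/30 split.
X_in, X_oob, y_in, y_oob = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=0)
best_name, best_clf, errors = select_best_classifier(X_in, y_in, X_oob, y_oob)
print(best_name, round(errors[best_name], 3))
```

Averaging the two per-class error rates keeps a classifier that labels everything as negative from looking good: such a classifier has $Err_{OOB} = 0.5$ regardless of how rare the positives are.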

Thirdly, sample values of each variable are permuted, and the classification error rate of the specific base classifier is calculated again using Equation (1). The difference of the error rates before and after permutation is assigned to the variable as its importance score, which is expressed as,

\begin{aligned} score_{j}(i) = {\tilde{Err}}_{OOB} - Err_{OOB}, \end{aligned}

(2)

where $Err_{OOB}$ and ${\tilde{Err}}_{OOB}$ represent the error rates before and after the permutation of sample values on variable $i$ in iteration round $j$, respectively. The iteration of resampling, training and scoring is executed for $N$ rounds, and the accumulated score of variable $i$ is expressed as,

\begin{aligned} Acc\_score(i) = \frac{\sum_{j = 1}^{N} score_{j}(i)}{N} . \end{aligned}

(3)

If the accumulated score of variable $i$ is small, it means that variable $i$ has little contribution to sample classification. Otherwise, it is to be regarded as an important component for sample identification.
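The resampling, training and scoring loop behind Equations (2) and (3) can be sketched as follows. To keep the example short, a small nearest-centroid classifier is used as a hypothetical stand-in for the base classifier selected in each round, the data are a two-variable toy set (one informative variable, one noise variable), and the 70/30 split is drawn without stratification; all of these are assumptions of the sketch.

```python
import numpy as np


def balanced_error(y_true, y_pred):
    """Equation (1) for labels coded as 1 (positive) and 0 (negative)."""
    fn_rate = np.mean(y_pred[y_true == 1] != 1)
    fp_rate = np.mean(y_pred[y_true == 0] != 0)
    return 0.5 * (fn_rate + fp_rate)


class NearestCentroid:
    """Hypothetical stand-in for the base classifier selected in each round."""

    def fit(self, X, y):
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])
        return self

    def predict(self, X):
        dist = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return dist.argmin(axis=1)


def accumulated_scores(X, y, n_rounds, rng, in_bag=0.7):
    """Average permutation score of every variable over n_rounds (Eq. 3)."""
    n_samples, n_vars = X.shape
    total = np.zeros(n_vars)
    cut = int(in_bag * n_samples)
    for _ in range(n_rounds):
        order = rng.permutation(n_samples)           # resampling
        bag, oob = order[:cut], order[cut:]
        clf = NearestCentroid().fit(X[bag], y[bag])  # training
        err = balanced_error(y[oob], clf.predict(X[oob]))
        for i in range(n_vars):                      # scoring
            X_perm = X[oob].copy()
            X_perm[:, i] = rng.permutation(X_perm[:, i])
            err_perm = balanced_error(y[oob], clf.predict(X_perm))
            total[i] += err_perm - err               # Eq. (2)
    return total / n_rounds


rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
# Variable 0 separates the classes; variable 1 is pure noise.
X = np.column_stack([4.0 * y - 2.0 + rng.normal(size=200),
                     rng.normal(size=200)])
acc_score = accumulated_scores(X, y, n_rounds=20, rng=rng)
print(acc_score)
```

In this toy setting the informative variable receives a score close to 0.5, because permuting it reduces the classifier to chance level under the balanced error, while the noise variable stays near zero.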

Variable selection by clustering

Once the accumulated score of each component or variable is obtained, variables with higher scores are selected for sample classification. An automatic model selection method is needed to select the variables that contribute to the identification of PPR positive proteins. A previously proposed method of clustering by search in descending order with automatic finding of density peaks, abbreviated as A-DPC [26], is used both to choose variables automatically and to decide when to stop the rounds of resampling, training and scoring.

The implementation details are as follows. Firstly, 1000 rounds of resampling, training and scoring are run, which yields the accumulated scores of all the variables in 188D. Secondly, A-DPC is applied to the accumulated scores to select variable candidates, namely the variables whose accumulated scores do not fall in the cluster with the lowest accumulated scores. Thirdly, a further 1000 rounds of resampling, training and scoring are run, and A-DPC is applied to the accumulated scores computed over all rounds so far. This procedure is repeated until the selected variable candidates no longer change.
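The stopping rule above can be illustrated with a short loop. The sketch deliberately does not implement A-DPC: as a clearly labeled stand-in, it splits the sorted scores at their largest gap and treats everything above the gap as a candidate. The function names and the `run_batch` callable, which is assumed to return the per-variable sum of scores over one batch of rounds, are hypothetical.

```python
def largest_gap_candidates(scores):
    """Stand-in for A-DPC (not the real algorithm): keep the variables that
    lie above the largest gap in the sorted scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    gaps = [scores[order[k]] - scores[order[k + 1]]
            for k in range(len(order) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    return frozenset(order[:cut + 1])


def select_until_stable(run_batch, n_vars, batch=1000, max_batches=50):
    """Add batches of rounds until the candidate set stops changing."""
    totals = [0.0] * n_vars
    rounds = 0
    previous = None
    for _ in range(max_batches):
        batch_sums = run_batch(batch)  # summed scores of each variable
        totals = [t + s for t, s in zip(totals, batch_sums)]
        rounds += batch
        accumulated = [t / rounds for t in totals]  # Eq. (3) over all rounds
        current = largest_gap_candidates(accumulated)
        if current == previous:
            return current, rounds
        previous = current
    return previous, rounds


# Toy score generator: variables 0 and 1 are clearly more important.
def toy_batch(n_rounds):
    return [0.40 * n_rounds, 0.35 * n_rounds, 0.01 * n_rounds, 0.0]


candidates, total_rounds = select_until_stable(toy_batch, n_vars=4)
print(sorted(candidates), total_rounds)
```

With this deterministic toy generator the candidate set is already stable after the second batch, so the loop stops at 2000 rounds; with real, noisy scores more batches may be needed.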

Variable selection using an increasing strategy

Considering that the accumulated scores of all the variables may remain at a relatively low level, i.e. no outliers appear on the scatter plot as illustrated in Figure 1, an automatic variable selection is designed instead using a variable increasing strategy.

Firstly, the variables are sorted by accumulated score in descending order. Secondly, the variables are added one at a time in this order to form a feature subspace, in which the ensemble classifier is trained. For each subspace, 1000 rounds of resampling and training are run. In each round, the base classifier with the lowest classification error rate is kept as a component of the ensemble classifier, so that 1000 base classifiers of different types are obtained. This incremental procedure continues until all the variables in 188D have been considered. Thirdly, the established ensemble classifier is applied to each fold of test samples to obtain the classification accuracy, which is expressed as,

\begin{aligned} ACC = \frac{1}{2} \left( \frac{TP}{TP + FN} + \frac{TN}{TN + FP} \right) . \end{aligned}

(4)

Thus, 188 ACC values are obtained after traversing all the variables of the 188D feature. Correspondingly, a line chart can be drawn, as shown in Figure 1. Fourthly, a polynomial is fitted to the ACC line chart to find the inflection point, which gives the number of variables to be selected.
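The final step, locating the inflection point of the fitted ACC curve, might look like the sketch below. Only this last step is covered; building the ensemble for each feature subspace is omitted. The polynomial degree (cubic), the use of the zero of the second derivative as the inflection point and the toy ACC values are assumptions of the example, since the text does not specify them.

```python
import numpy as np


def inflection_dimension(acc, degree=3):
    """Fit a polynomial to ACC versus feature dimension and return the
    dimension nearest to the first real zero of its second derivative."""
    dims = np.arange(1, len(acc) + 1)
    poly = np.poly1d(np.polyfit(dims, acc, degree))
    roots = poly.deriv(2).roots
    real = roots[np.isreal(roots)].real
    inside = np.sort(real[(real >= dims[0]) & (real <= dims[-1])])
    if inside.size == 0:
        return None  # no inflection point within the observed dimensions
    return int(round(float(inside[0])))


# Toy ACC curve with a known inflection at dimension 5 (not measured data).
dims = np.arange(1, 11)
acc = 0.8 + 0.001 * (dims - 5.0) ** 3
print(inflection_dimension(acc))
```

For higher polynomial degrees the second derivative can have several zeros inside the range; the sketch simply takes the smallest one, which is one possible convention among several.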

Table 1

Quantitative results using the hybrid ensemble classifier

Feature	Confusion matrix			Class	TP rate	FP rate	Precision	Recall	F1-measure
$(10)^{T}$	classified as − >	a	b	a: positive	0.295	0.036	0.293	0.295	0.294
	a	72	172	b: positive	0.964	0.705	0.964	0.964	0.964
	b	174	4621	weighted average	0.629	0.371	0.628	0.629	0.629
$(10, 12)^{T}$	classified as − >	a	b	a: positive	0.373	0.014	0.569	0.373	0.450
	a	91	153	b: positive	0.986	0.627	0.969	0.986	0.977
	b	69	4726	weighted average	0.679	0.321	0.769	0.679	0.714
$(10, 12, 130)^{T}$	classified as − >	a	b	a: positive	0.619	0.021	0.604	0.619	0.611
	a	151	93	b: positive	0.979	0.381	0.981	0.979	0.980
	b	99	4696	weighted average	0.799	0.201	0.792	0.799	0.796
$(10, 12, 130, 1)^{T}$	classified as − >	a	b	a: positive	0.721	0.017	0.682	0.721	0.701
	a	176	68	b: positive	0.983	0.279	0.986	0.983	0.984
	b	82	4713	weighted average	0.852	0.148	0.834	0.852	0.843
$(10, 12, 152, 130,$	classified as − >	a	b	a: positive	0.910	0.053	0.464	0.910	0.615
$1, 63, 24, 13, 22,$	a	222	22	b: positive	0.947	0.090	0.995	0.947	0.970
$87, 62, 45, 9)^{T}$	b	256	4539	weighted average	0.928	0.072	0.730	0.928	0.793
188D	classified as − >	a	b	a: positive	0.873	0.002	0.955	0.873	0.912
	a	213	31	b: positive	0.998	0.127	0.994	0.998	0.996
	b	10	4785	weighted average	0.935	0.065	0.974	0.935	0.954

Measurement

In order to show the effectiveness of the selected variables, we select seven quantitative measures, including confusion matrix, TP rate, FP rate, Precision, Recall, ACC and F1-measure. The confusion matrix describes the number of FN, TP, FP and TN samples. Accordingly, TP rate, FP rate, Precision and Recall are calculated as follows,

\begin{aligned} \begin{array}{l} TP\ rate = \frac{TP}{TP + FN}, \\ FP\ rate = \frac{FP}{FP + TN}, \\ Precision = \frac{TP}{TP + FP}, \\ Recall = \frac{TP}{TP + FN}, \end{array} \end{aligned}

(5)

where TP rate and Recall are expressed in the same form. F1-measure is the harmonic mean of precision and recall, which is expressed as,

\begin{aligned} F1\text{-}measure = \frac{2 * Precision * Recall}{Precision + Recall} . \end{aligned}

(6)
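As a worked check of Equations (4) to (6), the short function below recomputes the measures from a confusion matrix. The input values are those of the 188D row of Table 1 with class a taken as positive (TP = 213, FN = 31, FP = 10, TN = 4785); the function name is an illustrative choice.

```python
def class_metrics(tp, fn, fp, tn):
    """Measures of Equations (4) to (6) for one class taken as positive."""
    tp_rate = tp / (tp + fn)
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    recall = tp_rate  # identical to the TP rate by definition
    f1 = 2 * precision * recall / (precision + recall)
    acc = 0.5 * (tp / (tp + fn) + tn / (tn + fp))
    return {
        "TP rate": tp_rate,
        "FP rate": fp_rate,
        "Precision": precision,
        "Recall": recall,
        "F1-measure": f1,
        "ACC": acc,
    }


# 188D row of Table 1, class a (PPR positive) taken as positive.
metrics = class_metrics(tp=213, fn=31, fp=10, tn=4785)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Rounded to three decimals, these values agree with the corresponding entries of Table 1, and ACC coincides with the average of the two per-class TP rates.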

Results

Since the complexity of the base classifier MLP is the highest, the time complexity of the proposed framework is $O(r n v h^{k} o i)$, where $r$ refers to the number of resampling rounds, $n$ and $v$ represent the numbers of samples and variables, and $h$, $k$, $o$ and $i$ correspond to the numbers of neurons, hidden layers, outputs and iterations of each layer in the neural network, respectively. This, however, covers only the training process. For the test step, the time complexity is only $O(1)$.

The experiments were run on a notebook computer with an AMD R7-5800H CPU, 16 GB of memory and the Windows 10 operating system, using Python 3.9 and the scikit-learn package (version 1.0.2). Six classifiers with default parameters are considered, i.e. LinearDiscriminantAnalysis, KNeighborsClassifier, DecisionTreeClassifier, SVC, GaussianNB and MLPClassifier.

Classification results among different classifiers

We used the same feature set derived from 188D as in [17]. The variables in the feature set were arranged in descending order according to their scores. Using these features, a hybrid ensemble classifier was built on the training set with the same 243 PPR-positive proteins and 4795 PPR-negative ones selected in [17]. Quantitative classification results were obtained on the corresponding test set, as shown in Table 1. For comparison, random forest and other ensemble classifiers whose base classifiers were KNN, MLP, LDA, GNB and SVM were built using the same training set, and their quantitative results on the same test set are listed in Tables 2 to 7. In addition, the corresponding classification results of single classifiers including KNN, MLP, LDA, GNB and SVM, obtained using WEKA [24], are listed in Tables 8 to 12. The confusion matrix, TP rate, FP rate, precision, recall and F1-measure were calculated and listed. The two categories, PPR-positive proteins (labeled a) and PPR-negative ones (labeled b), were each considered as positive in turn. The average of each measure (TP rate, FP rate, precision, recall and F1-measure) over the two classes was also calculated.

Table 2

Quantitative results using random forest

Feature	Confusion matrix			Class	TP rate	FP rate	Precision	Recall	F1-measure
$(10)^{T}$	classified as − >	a	b	a: positive	0.279	0.031	0.316	0.279	0.296
	a	68	176	b: positive	0.969	0.721	0.964	0.969	0.966
	b	147	4648	weighted average	0.624	0.376	0.640	0.624	0.631
$(10, 12)^{T}$	classified as − >	a	b	a: positive	0.365	0.016	0.533	0.365	0.433
	a	89	155	b: positive	0.984	0.635	0.968	0.984	0.976
	b	78	4717	weighted average	0.674	0.326	0.751	0.674	0.704
$(10, 12, 130)^{T}$	classified as − >	a	b	a: positive	0.484	0.012	0.678	0.484	0.565
	a	118	126	b: positive	0.988	0.516	0.974	0.988	0.981
	b	56	4739	weighted average	0.736	0.264	0.826	0.736	0.773
$(10, 12, 130, 1)^{T}$	classified as − >	a	b	a: positive	0.586	0.008	0.799	0.586	0.676
	a	143	101	b: positive	0.992	0.414	0.979	0.992	0.986
	b	36	4759	weighted average	0.789	0.211	0.889	0.789	0.831
$(10, 12, 152, 130,$	classified as − >	a	b	a: positive	0.680	0.001	0.976	0.680	0.802
$1, 63, 24, 13, 22,$	a	166	78	b: positive	0.999	0.320	0.984	0.999	0.992
$87, 62, 45, 9)^{T}$	b	4	4791	weighted average	0.840	0.160	0.980	0.840	0.897
188D	classified as − >	a	b	a: positive	0.656	0.001	0.976	0.656	0.784
	a	160	84	b: positive	0.999	0.344	0.983	0.999	0.991
	b	1	4794	weighted average	0.827	0.173	0.979	0.827	0.888

Table 3

Quantitative results of the ensemble classifier with its base classifier to be KNN

Feature	Confusion matrix			Class	TP rate	FP rate	Precision	Recall	F1-measure
$(10)^{T}$	classified as − >	a	b	a: positive	0.139	0.009	0.447	0.139	0.212
	a	34	210	b: positive	0.991	0.861	0.958	0.991	0.974
	b	42	4753	weighted average	0.565	0.435	0.703	0.565	0.593
$(10, 12)^{T}$	classified as − >	a	b	a: positive	0.373	0.015	0.558	0.373	0.447
	a	91	153	b: positive	0.985	0.627	0.969	0.985	0.977
	b	72	4723	weighted average	0.679	0.321	0.763	0.679	0.712
$(10, 12, 130)^{T}$	classified as − >	a	b	a: positive	0.537	0.017	0.618	0.537	0.575
	a	131	113	b: positive	0.983	0.463	0.977	0.983	0.980
	b	81	4714	weighted average	0.760	0.240	0.797	0.760	0.777
$(10, 12, 130, 1)^{T}$	classified as − >	a	b	a: positive	0.660	0.014	0.706	0.660	0.682
	a	161	83	b: positive	0.986	0.340	0.983	0.986	0.984
	b	67	4728	weighted average	0.823	0.177	0.844	0.823	0.833
$(10, 12, 152, 130,$	classified as − >	a	b	a: positive	0.791	0.010	0.794	0.791	0.793
$1, 63, 24, 13, 22,$	a	193	51	b: positive	0.990	0.209	0.989	0.990	0.989
$87, 62, 45, 9)^{T}$	b	50	4745	weighted average	0.890	0.110	0.892	0.890	0.891
188D	classified as − >	a	b	a: positive	0.799	0.011	0.793	0.799	0.796
	a	195	49	b: positive	0.989	0.201	0.990	0.989	0.990
	b	51	4744	weighted average	0.894	0.106	0.891	0.894	0.893

Table 4

Quantitative results of the ensemble classifier with MLP as its base classifier

Feature	Confusion matrix			Class	TP rate	FP rate	Precision	Recall	F1-measure
$(10)^{T}$	classified as ->	a	b	a: positive	0.000	0.000	0.000	0.000	0.000
	a	0	244	b: positive	1.000	1.000	0.952	1.000	0.975
	b	0	4795	weighted average	0.500	0.500	0.476	0.500	0.488
$(10, 12)^{T}$	classified as ->	a	b	a: positive	0.082	0.004	0.526	0.082	0.142
	a	20	224	b: positive	0.996	0.918	0.955	0.996	0.975
	b	18	4777	weighted average	0.539	0.461	0.741	0.539	0.559
$(10, 12, 130)^{T}$	classified as ->	a	b	a: positive	0.447	0.010	0.686	0.447	0.541
	a	109	135	b: positive	0.990	0.553	0.972	0.990	0.981
	b	50	4745	weighted average	0.718	0.282	0.829	0.718	0.761
$(10, 12, 130, 1)^{T}$	classified as ->	a	b	a: positive	0.557	0.009	0.764	0.557	0.645
	a	136	108	b: positive	0.991	0.443	0.978	0.991	0.984
	b	42	4753	weighted average	0.774	0.226	0.871	0.774	0.815
$(10, 12, 152, 130, 1, 63, 24, 13, 22, 87, 62, 45, 9)^{T}$	classified as ->	a	b	a: positive	0.803	0.006	0.871	0.803	0.836
	a	196	48	b: positive	0.994	0.197	0.990	0.994	0.992
	b	29	4766	weighted average	0.899	0.101	0.931	0.899	0.914
188D	classified as ->	a	b	a: positive	0.877	0.002	0.951	0.877	0.913
	a	214	30	b: positive	0.998	0.123	0.994	0.998	0.996
	b	11	4784	weighted average	0.937	0.063	0.972	0.937	0.954
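
The per-class columns in these tables can be recomputed from the confusion matrix entries. The following is a minimal sketch, assuming that the rows of each confusion matrix are the actual classes and the columns are the predicted classes, and that each "x: positive" row treats class x as the positive class. The names `ClassMetrics` and `class_metrics` are hypothetical and are not taken from the original implementation. The counts used are those of the 13-feature entry in Table 4.

```python
from dataclasses import dataclass


@dataclass
class ClassMetrics:
    tp_rate: float
    fp_rate: float
    precision: float
    recall: float
    f1: float


def class_metrics(tp: int, fn: int, fp: int, tn: int) -> ClassMetrics:
    """Per-class metrics from the four confusion matrix counts.

    Undefined ratios (zero denominator) are returned as 0.0, which is an
    assumption made here to match the all-zero rows in the tables.
    """
    tp_rate = tp / (tp + fn) if (tp + fn) else 0.0
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp_rate  # recall is the same quantity as the TP rate
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return ClassMetrics(tp_rate, fp_rate, precision, recall, f1)


# 13-feature entry of Table 4 (assumed layout: rows actual, columns predicted)
a_as_a, a_as_b = 196, 48
b_as_a, b_as_b = 29, 4766

# Class a as positive: true positives are a classified as a.
metrics_a = class_metrics(tp=a_as_a, fn=a_as_b, fp=b_as_a, tn=b_as_b)
# Class b as positive: the roles of the two classes are swapped.
metrics_b = class_metrics(tp=b_as_b, fn=b_as_a, fp=a_as_b, tn=a_as_a)

# Unweighted mean of the two per-class F1 values.
mean_f1 = (metrics_a.f1 + metrics_b.f1) / 2

print(metrics_a)
print(metrics_b)
print(round(mean_f1, 3))
```

For this entry, the unweighted mean of the two per-class F1 values rounds to the figure reported in the "weighted average" row.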

Table 5

Quantitative results of the ensemble classifier with LDA as its base classifier

Feature	Confusion matrix			Class	TP rate	FP rate	Precision	Recall	F1-measure
$(10)^{T}$	classified as ->	a	b	a: positive	0.045	0.021	0.099	0.045	0.062
	a	11	233	b: positive	0.979	0.955	0.953	0.979	0.966
	b	100	4695	weighted average	0.512	0.488	0.526	0.512	0.514
$(10, 12)^{T}$	classified as ->	a	b	a: positive	0.057	0.019	0.136	0.057	0.081
	a	14	230	b: positive	0.981	0.943	0.953	0.981	0.967
	b	89	4706	weighted average	0.519	0.481	0.545	0.519	0.524
$(10, 12, 130)^{T}$	classified as ->	a	b	a: positive	0.107	0.016	0.250	0.107	0.149
	a	26	218	b: positive	0.984	0.893	0.956	0.984	0.970
	b	78	4717	weighted average	0.545	0.455	0.603	0.545	0.560
$(10, 12, 130, 1)^{T}$	classified as ->	a	b	a: positive	0.135	0.017	0.292	0.135	0.185
	a	33	211	b: positive	0.983	0.865	0.957	0.983	0.970
	b	80	4715	weighted average	0.559	0.441	0.625	0.559	0.577
$(10, 12, 152, 130, 1, 63, 24, 13, 22, 87, 62, 45, 9)^{T}$	classified as ->	a	b	a: positive	0.434	0.015	0.596	0.434	0.502
	a	106	138	b: positive	0.985	0.566	0.972	0.985	0.978
	b	72	4723	weighted average	0.710	0.290	0.784	0.710	0.740
188D	classified as ->	a	b	a: positive	0.820	0.013	0.766	0.820	0.792
	a	200	44	b: positive	0.987	0.180	0.991	0.987	0.989
	b	61	4734	weighted average	0.903	0.097	0.879	0.903	0.891

Table 6

Quantitative results of the ensemble classifier with GNB as its base classifier

Feature	Confusion matrix			Class	TP rate	FP rate	Precision	Recall	F1-measure
$(10)^{T}$	classified as ->	a	b	a: positive	0.004	0.005	0.040	0.004	0.007
	a	1	243	b: positive	0.995	0.996	0.952	0.995	0.973
	b	24	4771	weighted average	0.500	0.500	0.496	0.500	0.490
$(10, 12)^{T}$	classified as ->	a	b	a: positive	0.357	0.021	0.463	0.357	0.403
	a	87	157	b: positive	0.979	0.643	0.968	0.979	0.973
	b	101	4694	weighted average	0.668	0.332	0.715	0.668	0.688
$(10, 12, 130)^{T}$	classified as ->	a	b	a: positive	0.623	0.021	0.603	0.623	0.613
	a	152	92	b: positive	0.979	0.377	0.981	0.979	0.980
	b	100	4695	weighted average	0.801	0.199	0.792	0.801	0.796
$(10, 12, 130, 1)^{T}$	classified as ->	a	b	a: positive	0.721	0.017	0.682	0.721	0.701
	a	176	68	b: positive	0.983	0.279	0.986	0.983	0.984
	b	82	4713	weighted average	0.852	0.148	0.834	0.852	0.843
$(10, 12, 152, 130, 1, 63, 24, 13, 22, 87, 62, 45, 9)^{T}$	classified as ->	a	b	a: positive	0.910	0.054	0.461	0.910	0.612
	a	222	22	b: positive	0.946	0.090	0.995	0.946	0.970
	b	260	4535	weighted average	0.928	0.072	0.728	0.928	0.791
188D	classified as ->	a	b	a: positive	0.955	0.293	0.142	0.955	0.248
	a	233	11	b: positive	0.707	0.045	0.997	0.707	0.827
	b	1404	3391	weighted average	0.831	0.169	0.570	0.831	0.538

Table 7

Quantitative results of the ensemble classifier with SVM as its base classifier

Feature	Confusion matrix			Class	TP rate	FP rate	Precision	Recall	F1-measure
$(10)^{T}$	classified as ->	a	b	a: positive	0.000	0.000	0.000	0.000	0.000
	a	0	244	b: positive	1.000	1.000	0.952	1.000	0.975
	b	0	4795	weighted average	0.500	0.500	0.476	0.500	0.488
$(10, 12)^{T}$	classified as ->	a	b	a: positive	0.074	0.004	0.486	0.074	0.128
	a	18	226	b: positive	0.996	0.926	0.955	0.996	0.975
	b	19	4776	weighted average	0.535	0.465	0.721	0.535	0.552
$(10, 12, 130)^{T}$	classified as ->	a	b	a: positive	0.496	0.010	0.708	0.496	0.583
	a	121	123	b: positive	0.990	0.504	0.975	0.990	0.982
	b	50	4745	weighted average	0.743	0.257	0.841	0.743	0.783
$(10, 12, 130, 1)^{T}$	classified as ->	a	b	a: positive	0.582	0.009	0.763	0.582	0.660
	a	142	102	b: positive	0.991	0.418	0.979	0.991	0.985
	b	44	4751	weighted average	0.786	0.214	0.871	0.786	0.823
$(10, 12, 152, 130, 1, 63, 24, 13, 22, 87, 62, 45, 9)^{T}$	classified as ->	a	b	a: positive	0.754	0.004	0.911	0.754	0.825
	a	184	60	b: positive	0.996	0.246	0.988	0.996	0.992
	b	18	4777	weighted average	0.875	0.125	0.949	0.875	0.909
188D	classified as ->	a	b	a: positive	0.828	0.001	0.981	0.828	0.898
	a	202	42	b: positive	0.999	0.172	0.991	0.999	0.995
	b	4	4791	weighted average	0.914	0.086	0.986	0.914	0.947

Table 8

Quantitative results of KNN using WEKA

Feature	Confusion matrix			Class	TP rate	FP rate	Precision	Recall	F1-measure
$(10)^{T}$	classified as ->	a	b	a: positive	0.262	0.032	0.294	0.262	0.277
	a	64	180	b: positive	0.968	0.738	0.963	0.968	0.965
	b	154	4641	weighted average	0.615	0.385	0.628	0.615	0.621
$(10, 12)^{T}$	classified as ->	a	b	a: positive	0.352	0.030	0.372	0.352	0.362
	a	86	158	b: positive	0.970	0.648	0.967	0.970	0.968
	b	145	4650	weighted average	0.661	0.339	0.670	0.661	0.665
$(10, 12, 130)^{T}$	classified as ->	a	b	a: positive	0.455	0.028	0.455	0.455	0.455
	a	111	133	b: positive	0.972	0.545	0.972	0.972	0.972
	b	133	4662	weighted average	0.714	0.286	0.714	0.714	0.714
$(10, 12, 130, 1)^{T}$	classified as ->	a	b	a: positive	0.578	0.023	0.562	0.578	0.570
	a	141	103	b: positive	0.977	0.422	0.978	0.977	0.978
	b	110	4685	weighted average	0.777	0.223	0.770	0.777	0.774
$(10, 12, 152, 130, 1, 63, 24, 13, 22, 87, 62, 45, 9)^{T}$	classified as ->	a	b	a: positive	0.799	0.020	0.666	0.799	0.726
	a	195	49	b: positive	0.980	0.201	0.990	0.980	0.985
	b	98	4697	weighted average	0.889	0.111	0.828	0.889	0.855
188D	classified as ->	a	b	a: positive	0.840	0.024	0.639	0.840	0.726
	a	205	39	b: positive	0.976	0.160	0.992	0.976	0.984
	b	116	4679	weighted average	0.908	0.092	0.815	0.908	0.855

Table 9

Quantitative results of MLP using WEKA

Feature	Confusion matrix			Class	TP rate	FP rate	Precision	Recall	F1-measure
$(10)^{T}$	classified as ->	a	b	a: positive	0.000	0.000	0.000	0.000	0.000
	a	0	244	b: positive	1.000	1.000	0.952	1.000	0.975
	b	0	4795	weighted average	0.500	0.500	0.476	0.500	0.488
$(10, 12)^{T}$	classified as ->	a	b	a: positive	0.000	0.000	0.000	0.000	0.000
	a	0	244	b: positive	1.000	1.000	0.952	1.000	0.975
	b	0	4795	weighted average	0.500	0.500	0.476	0.500	0.488
$(10, 12, 130)^{T}$	classified as ->	a	b	a: positive	0.238	0.012	0.500	0.238	0.322
	a	58	186	b: positive	0.988	0.762	0.962	0.988	0.975
	b	58	4737	weighted average	0.613	0.387	0.731	0.613	0.649
$(10, 12, 130, 1)^{T}$	classified as ->	a	b	a: positive	0.504	0.009	0.737	0.504	0.599
	a	123	121	b: positive	0.991	0.496	0.975	0.991	0.983
	b	44	4751	weighted average	0.747	0.253	0.856	0.747	0.791
$(10, 12, 152, 130, 1, 63, 24, 13, 22, 87, 62, 45, 9)^{T}$	classified as ->	a	b	a: positive	0.750	0.008	0.832	0.750	0.789
	a	183	61	b: positive	0.992	0.250	0.987	0.992	0.990
	b	37	4758	weighted average	0.871	0.129	0.910	0.871	0.889
188D	classified as ->	a	b	a: positive	0.004	0.001	0.167	0.004	0.008
	a	1	243	b: positive	0.999	0.996	0.952	0.999	0.975
	b	5	4790	weighted average	0.502	0.498	0.559	0.502	0.491

Table 10

Quantitative results of LDA using WEKA

Feature	Confusion matrix			Class	TP rate	FP rate	Precision	Recall	F1-measure
$(10)^{T}$	classified as ->	a	b	a: positive	0.033	0.021	0.075	0.033	0.046
	a	8	236	b: positive	0.979	0.967	0.952	0.979	0.966
	b	99	4696	weighted average	0.506	0.494	0.513	0.506	0.506
$(10, 12)^{T}$	classified as ->	a	b	a: positive	0.045	0.018	0.112	0.045	0.064
	a	11	233	b: positive	0.982	0.955	0.953	0.982	0.967
	b	87	4708	weighted average	0.513	0.487	0.533	0.513	0.516
$(10, 12, 130)^{T}$	classified as ->	a	b	a: positive	0.082	0.016	0.211	0.082	0.118
	a	20	224	b: positive	0.984	0.918	0.955	0.984	0.969
	b	75	4720	weighted average	0.533	0.467	0.583	0.533	0.544
$(10, 12, 130, 1)^{T}$	classified as ->	a	b	a: positive	0.102	0.017	0.236	0.102	0.143
	a	25	219	b: positive	0.983	0.898	0.956	0.983	0.969
	b	81	4714	weighted average	0.543	0.457	0.596	0.543	0.556
$(10, 12, 152, 130, 1, 63, 24, 13, 22, 87, 62, 45, 9)^{T}$	classified as ->	a	b	a: positive	0.381	0.015	0.564	0.381	0.455
	a	93	151	b: positive	0.985	0.619	0.969	0.985	0.977
	b	72	4723	weighted average	0.683	0.317	0.766	0.683	0.716
188D	classified as ->	a	b	a: positive	0.783	0.012	0.767	0.783	0.775
	a	191	53	b: positive	0.988	0.217	0.989	0.988	0.988
	b	58	4737	weighted average	0.885	0.115	0.878	0.885	0.882

Table 11

Quantitative results of GNB using WEKA

Feature	Confusion matrix			Class	TP rate	FP rate	Precision	Recall	F1-measure
$(10)^{T}$	classified as − >	a	b	a: positive	0.004	0.006	0.031	0.004	0.007
	a	1	243	b: positive	0.994	0.996	0.951	0.994	0.972
	b	31	4764	weighted average	0.499	0.501	0.491	0.499	0.490
$(10, 12)^{T}$	classified as − >	a	b	a: positive	0.270	0.020	0.412	0.270	0.327
	a	66	178	b: positive	0.980	0.730	0.964	0.980	0.972
	b	94	4701	weighted average	0.625	0.375	0.688	0.625	0.649
$(10, 12, 130)^{T}$	classified as − >	a	b	a: positive	0.537	0.020	0.582	0.537	0.559
	a	131	113	b: positive	0.980	0.463	0.977	0.980	0.978
	b	102	4693	weighted average	0.759	0.241	0.779	0.759	0.769
$(10, 12, 130, 1)^{T}$	classified as − >	a	b	a: positive	0.656	0.018	0.650	0.656	0.653
	a	160	84	b: positive	0.982	0.344	0.982	0.982	0.982
	b	86	4709	weighted average	0.819	0.181	0.816	0.819	0.818
$(10, 12, 152, 130,$	classified as − >	a	b	a: positive	0.914	0.064	0.419	0.914	0.575
$1, 63, 24, 13, 22,$	a	223	21	b: positive	0.936	0.086	0.995	0.936	0.965
$87, 62, 45, 9)^{T}$	b	309	4486	weighted average	0.925	0.075	0.707	0.925	0.770
188D	classified as − >	a	b	a: positive	0.955	0.309	0.136	0.955	0.238
	a	233	11	b: positive	0.691	0.045	0.997	0.691	0.816
	b	1482	3313	weighted average	0.823	0.177	0.566	0.823	0.527

Table 12

Quantitative results of SVM using WEKA

Feature	Confusion matrix			Class	TP rate	FP rate	Precision	Recall	F1-measure
$(10)^{T}$	classified as − >	a	b	a: positive	0.000	0.000	0.000	0.000	0.000
	a	0	244	b: positive	1.000	1.000	0.952	1.000	0.975
	b	0	4795	weighted average	0.500	0.500	0.476	0.500	0.488
$(10, 12)^{T}$	classified as − >	a	b	a: positive	0.307	0.009	0.630	0.307	0.413
	a	75	169	b: positive	0.991	0.693	0.966	0.991	0.978
	b	44	4751	weighted average	0.649	0.351	0.798	0.649	0.696
$(10, 12, 130)^{T}$	classified as − >	a	b	a: positive	0.447	0.008	0.732	0.447	0.555
	a	109	135	b: positive	0.992	0.553	0.972	0.992	0.982
	b	40	4755	weighted average	0.719	0.281	0.852	0.719	0.768
$(10, 12, 130, 1)^{T}$	classified as − >	a	b	a: positive	0.590	0.008	0.796	0.590	0.678
	a	144	100	b: positive	0.992	0.410	0.979	0.992	0.986
	b	37	4758	weighted average	0.791	0.209	0.887	0.791	0.832
$(10, 12, 152, 130,$	classified as − >	a	b	a: positive	0.693	0.003	0.923	0.693	0.792
$1, 63, 24, 13, 22,$	a	169	75	b: positive	0.997	0.307	0.985	0.997	0.991
$87, 62, 45, 9)^{T}$	b	14	4781	weighted average	0.845	0.155	0.954	0.845	0.891
188D	classified as − >	a	b	a: positive	0.172	0.000	0.977	0.172	0.293
	a	42	202	b: positive	1.000	0.828	0.960	1.000	0.979
	b	1	4794	weighted average	0.586	0.414	0.968	0.586	0.636

Figure 2

Scatter plots derived from feature 188D. Each plot corresponds to a fold, with the x-axis and y-axis representing a variable and its score, respectively.

Comparing these results with those in Table 1 and Table 2 leads to the following observations. First, the average TP rate of the hybrid ensemble classifier is 0.005 higher than that of random forest when the one-dimensional and two-dimensional features are used to classify the test samples. Second, it is 0.063 higher than that of random forest when the three-dimensional and four-dimensional features are taken into account. Third, it is 0.088 and 0.108 higher than that of random forest when the 13-dimensional feature and 188D are used, respectively. Fourth, for PPR-positive proteins (labeled a), the TP rate of the hybrid ensemble classifier using 188D is 0.217 higher than that of random forest, whereas for PPR-negative proteins (labeled b) it is 0.001 lower. Meanwhile, the number of TP samples in the confusion matrix of the hybrid ensemble classifier using 188D is larger than that of random forest, which indicates the better classification performance of the hybrid ensemble classifier. In contrast, the hybrid ensemble classifier produces more FP samples. However, this is acceptable given the much larger sample size of PPR-negative proteins. In addition, a comparison across these tables shows that the hybrid ensemble classifier outperforms both the ensemble classifiers built from a single type of base classifier and the single classifiers.
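The per-class figures compared above can be recomputed from a confusion matrix. The following is a minimal sketch, assuming the WEKA convention that rows are actual classes and columns are predicted classes; the function name `class_metrics` is illustrative and not part of the original work. It uses the 188D confusion matrix of LDA from Table 10 as input.

```python
from typing import Dict, List


def class_metrics(cm: List[List[int]], k: int) -> Dict[str, float]:
    """Metrics with class k treated as positive.

    cm[i][j] is the number of samples of actual class i classified as j
    (assumed layout: rows = actual, columns = predicted).
    """
    n = len(cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                          # class k samples sent elsewhere
    fp = sum(cm[i][k] for i in range(n)) - tp     # other samples classified as k
    tn = sum(map(sum, cm)) - tp - fn - fp
    # Ratios with an empty denominator are reported as 0, as in the tables.
    tp_rate = tp / (tp + fn) if tp + fn else 0.0  # identical to recall
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"tp_rate": tp_rate, "fp_rate": fp_rate,
            "precision": precision, "recall": tp_rate, "f1": f1}


# 188D, LDA (Table 10): a -> (191, 53), b -> (58, 4737)
cm = [[191, 53], [58, 4737]]
a = class_metrics(cm, 0)
b = class_metrics(cm, 1)
# Unweighted mean of the two class rows
avg = {name: (a[name] + b[name]) / 2 for name in a}

for label, row in (("a", a), ("b", b), ("avg", avg)):
    print(label, {name: round(value, 3) for name, value in row.items()})
```

For this matrix, the class a row rounds to a TP rate of 0.783, an FP rate of 0.012, a precision of 0.767 and an F1-measure of 0.775, in agreement with Table 10. Note that the unweighted mean of the two class rows (TP rate about 0.885) matches the row labeled "weighted average" for this example, which suggests that the averages in these tables give both classes equal weight.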

Figure 3

Gaussian fitting on clusters of accumulated scores of variable importance for 10-fold cross validation. Each sub-figure shows the Gaussian fitting of the accumulated scores of the variables, with the x-axis and y-axis representing the variable scores and their probability densities, respectively.

Figure 4

The ACCs of 10-fold cross validation on each feature dimension. Each sub-figure corresponds to the result of a fold.

Classification results after using 10-fold nested cross validation

In order to remove data redundancy, CD-HIT [25] was used with a 25% cutoff, so that no two retained protein sequences share more than 25% similarity. Redundancy removal was applied separately to the 487 PPR-positive protein sequences and the 9590 PPR-negative ones. Accordingly, 170 PPR-positive proteins and 9293 PPR-negative proteins remained, which constitute the non-redundant data. We randomly divided the non-redundant PPR-positive and PPR-negative proteins into 10 groups, each containing 17 PPR-positive proteins and 929 or 930 PPR-negative ones. In each fold, nine groups were used as the training set and the remaining one was used as the test set. In this way, 10-fold cross validation was performed.
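The grouping described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `split_into_groups` and `fold`, the fixed seed and the use of integer indices in place of protein sequences are all hypothetical, and the actual random assignment used in the experiments is not known.

```python
import random
from typing import List, Tuple


def split_into_groups(n_items: int, n_groups: int, seed: int = 0) -> List[List[int]]:
    """Shuffle item indices and cut them into n_groups nearly equal groups."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_items, n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)  # the first `extra` groups get one more
        groups.append(indices[start:start + size])
        start += size
    return groups


def fold(groups: List[List[int]], k: int) -> Tuple[List[int], List[int]]:
    """Return (training indices, test indices) when group k is held out."""
    train = [i for g, grp in enumerate(groups) if g != k for i in grp]
    return train, groups[k]


pos_groups = split_into_groups(170, 10)    # non-redundant PPR-positive proteins
neg_groups = split_into_groups(9293, 10)   # non-redundant PPR-negative proteins

train_pos, test_pos = fold(pos_groups, 0)
train_neg, test_neg = fold(neg_groups, 0)
print(len(test_pos), len(test_neg), len(train_pos), len(train_neg))
```

Since 9293 is not divisible by 10, three groups hold 930 PPR-negative proteins and seven hold 929, which is consistent with the row totals of the confusion matrices in Table 13.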

After n rounds of score accumulation on each training set, 10 scatter plots were obtained, as illustrated in Figure 2. It can be seen that the accumulated scores of the variables are very close to one another except for several outliers. As stated previously, automatic variable selection is to be considered. First, the clustering strategy derived from A-DPC [26] was applied, and Gaussian fitting [27] was performed on the obtained clusters of the accumulated scores. The corresponding results are shown in Figure 3. In each fold, the Gaussian mixture component with the lowest accumulated scores was eliminated, and the remaining variables were regarded as a feature candidate. Then, the intersection of the feature candidates derived from the 10 folds was taken as the important feature. Note that the important feature is simply a more compact version of a feature candidate. Hence, 10-fold cross validation can be performed on the important feature. Second, the incremental strategy was implemented on each fold. Ensemble classification was used and the ACC of each feature dimension was calculated. In addition, polynomial fitting was performed on these ACCs across the different dimensions. The results for the 10 folds are illustrated in Figure 4.
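The elimination and intersection steps of the clustering strategy can be sketched as below. The sketch makes several assumptions that are not taken from the original work: cluster labels are supplied as input (A-DPC and the Gaussian fitting are not reimplemented), the "lowest" cluster is taken to be the one with the smallest mean accumulated score, and the function names and toy scores are purely illustrative.

```python
from statistics import mean
from typing import Dict, List, Set


def feature_candidate(scores: Dict[int, float], labels: Dict[int, int]) -> Set[int]:
    """Drop the cluster with the lowest mean accumulated score in one fold.

    scores maps a variable index to its accumulated score;
    labels maps a variable index to a precomputed cluster label.
    """
    clusters: Dict[int, List[int]] = {}
    for var, lab in labels.items():
        clusters.setdefault(lab, []).append(var)
    lowest = min(clusters, key=lambda c: mean(scores[v] for v in clusters[c]))
    return {v for c, members in clusters.items() if c != lowest for v in members}


def important_feature(candidates: List[Set[int]]) -> Set[int]:
    """Intersection of the feature candidates of all folds (needs >= 1 fold)."""
    return set.intersection(*candidates)


# Toy data for two folds (hypothetical scores and cluster labels)
fold_scores = [
    {1: 9.0, 10: 8.5, 12: 8.0, 50: 1.0, 51: 1.2},
    {1: 9.1, 10: 8.0, 12: 2.0, 50: 1.1, 51: 0.9},
]
fold_labels = [
    {1: 0, 10: 0, 12: 0, 50: 1, 51: 1},
    {1: 0, 10: 0, 12: 1, 50: 1, 51: 1},
]

candidates = [feature_candidate(s, l) for s, l in zip(fold_scores, fold_labels)]
selected = important_feature(candidates)
print(sorted(selected))
```

In this toy case variable 12 survives in the first fold but not in the second, so it is absent from the intersection. This illustrates why the important feature is never larger than any single feature candidate.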

Table 13

Quantitative results of the one-dimensional feature using the hybrid ensemble classifier

Feature	Confusion matrix			Class	TP rate	FP rate	Precision	Recall	F1-measure
$(10)^{T}$	classified as − >	a	b	a: positive	0.176	0.014	0.188	0.176	0.182
	a	3	14	b: positive	0.986	0.824	0.985	0.986	0.985
	b	13	917	weighted average	0.581	0.419	0.586	0.581	0.584
$(10)^{T}$	classified as − >	a	b	a: positive	0.118	0.019	0.100	0.118	0.108
	a	2	15	b: positive	0.981	0.882	0.984	0.981	0.982
	b	18	912	weighted average	0.549	0.451	0.542	0.549	0.545
$(10)^{T}$	classified as − >	a	b	a: positive	0.118	0.019	0.100	0.118	0.108
	a	2	15	b: positive	0.981	0.882	0.984	0.981	0.982
	b	18	912	weighted average	0.549	0.451	0.542	0.549	0.545
$(10)^{T}$	classified as − >	a	b	a: positive	0.118	0.014	0.133	0.118	0.125
	a	2	15	b: positive	0.986	0.882	0.984	0.986	0.985
	b	13	916	weighted average	0.552	0.448	0.559	0.552	0.555
$(10)^{T}$	classified as − >	a	b	a: positive	0.235	0.010	0.308	0.235	0.267
	a	4	13	b: positive	0.990	0.765	0.986	0.990	0.988
	b	9	920	weighted average	0.613	0.387	0.647	0.613	0.627
$(10)^{T}$	classified as − >	a	b	a: positive	0.000	0.016	0.000	0.000	0.000
	a	0	17	b: positive	0.984	1.000	0.982	0.984	0.983
	b	15	914	weighted average	0.492	0.508	0.491	0.492	0.491
$(10)^{T}$	classified as − >	a	b	a: positive	0.000	0.022	0.000	0.000	0.000
	a	0	17	b: positive	0.978	1.000	0.982	0.978	0.980
	b	20	909	weighted average	0.489	0.511	0.491	0.489	0.490
$(10)^{T}$	classified as − >	a	b	a: positive	0.059	0.015	0.067	0.059	0.062
	a	1	16	b: positive	0.985	0.941	0.983	0.985	0.984
	b	14	915	weighted average	0.522	0.478	0.525	0.522	0.523
$(10)^{T}$	classified as − >	a	b	a: positive	0.059	0.009	0.111	0.059	0.077
	a	1	16	b: positive	0.991	0.941	0.983	0.991	0.987
	b	8	921	weighted average	0.525	0.475	0.547	0.525	0.532
$(10)^{T}$	classified as − >	a	b	a: positive	0.059	0.014	0.071	0.059	0.065
	a	1	16	b: positive	0.986	0.941	0.983	0.986	0.984
	b	13	916	weighted average	0.522	0.478	0.527	0.522	0.524

The quantitative results of the variable with the highest accumulated score, the important feature derived from the clustering strategy and 188D are listed in Table 13, Table 14 and Table 15, respectively. It can be seen that the average TP rates of the important feature exceed those of 188D. As to the qualitative results, 10 ACCs were calculated. A line chart of these ACCs and their average value is shown in Figure 5(A). For the incremental strategy, the 10 ACCs from the 10 folds were recorded as a line for each feature dimension. Thus, 188 lines were obtained, as shown in Figure 5(B). The blue line refers to the one-dimensional feature and the red line corresponds to 188D. The gray lines represent the ACCs associated with the features of two to 187 dimensions. In addition, the feature derived from the polynomial fitting is marked with a green star in each fold. Figure 5(B) indicates that, under the incremental strategy, a feature with a dimension higher than 10 yields better classification performance than 188D. The quantitative results of the 10 folds derived from the incremental strategy are listed in Table 16, each row of which corresponds to the feature derived from the polynomial fitting in one fold. By comparing the qualitative results (see Figure 5) and the quantitative results (see Table 13, Table 14, Table 15 and Table 16), it can be concluded that there is no apparent difference in classification performance between the incremental strategy and the clustering strategy.
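The bookkeeping behind the per-dimension lines can be sketched as follows. It is assumed here that the incremental strategy adds variables one at a time in descending order of accumulated score and evaluates each resulting feature; the function names and the `evaluate` callback are hypothetical, and a toy scoring rule stands in for the hybrid ensemble classifier.

```python
from typing import Callable, Dict, List, Sequence


def incremental_accs(scores: Dict[int, float],
                     evaluate: Callable[[Sequence[int]], float]) -> List[float]:
    """ACC of the top-k variables for k = 1..d within one fold."""
    ranked = sorted(scores, key=lambda v: scores[v], reverse=True)
    return [evaluate(ranked[:k]) for k in range(1, len(ranked) + 1)]


def lines_per_dimension(curves: List[List[float]]) -> List[List[float]]:
    """Transpose fold-wise curves into one line of fold ACCs per dimension."""
    return [list(column) for column in zip(*curves)]


def toy_acc(features: Sequence[int]) -> float:
    """Stand-in for training and testing a classifier on the given variables."""
    informative = {10, 12}  # hypothetical informative variables
    return 0.5 + 0.2 * len(informative & set(features)) - 0.01 * len(features)


# Hypothetical accumulated scores for two folds and three variables
fold_scores = [
    {10: 3.0, 12: 2.0, 130: 1.0},
    {10: 1.0, 12: 3.0, 130: 2.0},
]
curves = [incremental_accs(s, toy_acc) for s in fold_scores]
lines = lines_per_dimension(curves)
for dim, line in enumerate(lines, start=1):
    print(dim, [round(acc, 2) for acc in line])
```

With 188 variables and 10 folds, the same transposition gives 188 lines of 10 ACCs each, which is the layout plotted in Figure 5(B). The polynomial fitting used to pick one feature per fold is not reproduced here.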

Table 14

Quantitative results of the feature selected by the clustering strategy using the hybrid ensemble classifier

Feature	Confusion matrix			Class	TP rate	FP rate	Precision	Recall	F1-measure
$(1, 130, 163, 69, 9, 10,$	classified as − >	a	b	a: positive	0.941	0.076	0.184	0.941	0.308
$171, 12, 13, 173, 15,$	a	16	1	b: positive	0.924	0.059	0.999	0.924	0.960
$16, 22, 87, 24, 62)^{T}$	b	71	859	weighted average	0.932	0.068	0.591	0.932	0.634
$(1, 130, 163, 69, 9, 10,$	classified as − >	a	b	a: positive	0.941	0.065	0.211	0.941	0.344
$171, 12, 13, 173, 15,$	a	16	1	b: positive	0.935	0.059	0.999	0.935	0.966
$16, 22, 87, 24, 62)^{T}$	b	60	870	weighted average	0.938	0.062	0.605	0.938	0.655
$(1, 130, 163, 69, 9, 10,$	classified as − >	a	b	a: positive	0.706	0.083	0.135	0.706	0.226
$171, 12, 13, 173, 15,$	a	12	5	b: positive	0.917	0.294	0.994	0.917	0.954
$16, 22, 87, 24, 62)^{T}$	b	77	853	weighted average	0.812	0.188	0.565	0.812	0.590
$(1, 130, 163, 69, 9, 10,$	classified as − >	a	b	a: positive	0.824	0.073	0.171	0.824	0.283
$171, 12, 13, 173, 15,$	a	14	3	b: positive	0.927	0.176	0.997	0.927	0.960
$16, 22, 87, 24, 62)^{T}$	b	68	861	weighted average	0.875	0.125	0.584	0.875	0.622
$(1, 130, 163, 69, 9, 10,$	classified as − >	a	b	a: positive	0.941	0.064	0.213	0.941	0.348
$171, 12, 13, 173, 15,$	a	16	1	b: positive	0.936	0.059	0.999	0.936	0.967
$16, 22, 87, 24, 62)^{T}$	b	59	870	weighted average	0.939	0.061	0.606	0.939	0.657
$(1, 130, 163, 69, 9, 10,$	classified as − >	a	b	a: positive	0.941	0.075	0.186	0.941	0.311
$171, 12, 13, 173, 15,$	a	16	1	b: positive	0.925	0.059	0.999	0.925	0.960
$16, 22, 87, 24, 62)^{T}$	b	70	859	weighted average	0.933	0.067	0.592	0.933	0.635
$(1, 130, 163, 69, 9, 10,$	classified as − >	a	b	a: positive	0.882	0.078	0.172	0.882	0.288
$171, 12, 13, 173, 15,$	a	15	2	b: positive	0.922	0.118	0.998	0.922	0.959
$16, 22, 87, 24, 62)^{T}$	b	72	857	weighted average	0.902	0.098	0.585	0.902	0.624
$(1, 130, 163, 69, 9, 10,$	classified as − >	a	b	a: positive	0.882	0.075	0.176	0.882	0.294
$171, 12, 13, 173, 15,$	a	15	2	b: positive	0.925	0.118	0.998	0.925	0.960
$16, 22, 87, 24, 62)^{T}$	b	70	859	weighted average	0.904	0.096	0.587	0.904	0.627
$(1, 130, 163, 69, 9, 10,$	classified as − >	a	b	a: positive	0.765	0.071	0.165	0.765	0.271
$171, 12, 13, 173, 15,$	a	13	4	b: positive	0.929	0.235	0.995	0.929	0.961
$16, 22, 87, 24, 62)^{T}$	b	66	863	weighted average	0.847	0.153	0.580	0.847	0.616
$(1, 130, 163, 69, 9, 10,$	classified as − >	a	b	a: positive	0.941	0.059	0.225	0.941	0.364
$171, 12, 13, 173, 15,$	a	16	1	b: positive	0.941	0.059	0.999	0.941	0.969
$16, 22, 87, 24, 62)^{T}$	b	55	874	weighted average	0.941	0.059	0.612	0.941	0.666

Table 15

Quantitative results of 188D using the hybrid ensemble classifier

Feature	Confusion matrix			Class	TP rate	FP rate	Precision	Recall	F1-measure
188D	classified as ->	a	b	a: positive	0.824	0.000	1.000	0.824	0.903
	a	14	3	b: negative	1.000	0.176	0.997	1.000	0.998
	b	0	930	weighted average	0.912	0.088	0.998	0.912	0.951
188D	classified as ->	a	b	a: positive	0.765	0.001	0.929	0.765	0.839
	a	13	4	b: negative	0.999	0.235	0.996	0.999	0.997
	b	1	929	weighted average	0.882	0.118	0.962	0.882	0.918
188D	classified as ->	a	b	a: positive	0.706	0.001	0.923	0.706	0.800
	a	12	5	b: negative	0.999	0.294	0.995	0.999	0.997
	b	1	929	weighted average	0.852	0.148	0.959	0.852	0.898
188D	classified as ->	a	b	a: positive	0.588	0.001	0.909	0.588	0.714
	a	10	7	b: negative	0.999	0.412	0.993	0.999	0.996
	b	1	928	weighted average	0.794	0.206	0.951	0.794	0.855
188D	classified as ->	a	b	a: positive	0.824	0.000	1.000	0.824	0.903
	a	14	3	b: negative	1.000	0.176	0.997	1.000	0.998
	b	0	929	weighted average	0.912	0.088	0.998	0.912	0.951
188D	classified as ->	a	b	a: positive	0.706	0.002	0.857	0.706	0.774
	a	12	5	b: negative	0.998	0.294	0.995	0.998	0.996
	b	2	927	weighted average	0.852	0.148	0.926	0.852	0.885
188D	classified as ->	a	b	a: positive	0.588	0.000	1.000	0.588	0.741
	a	10	7	b: negative	1.000	0.412	0.993	1.000	0.996
	b	0	929	weighted average	0.794	0.206	0.996	0.794	0.868
188D	classified as ->	a	b	a: positive	0.765	0.001	0.929	0.765	0.839
	a	13	4	b: negative	0.999	0.235	0.996	0.999	0.997
	b	1	928	weighted average	0.882	0.118	0.962	0.882	0.918
188D	classified as ->	a	b	a: positive	0.706	0.001	0.923	0.706	0.800
	a	12	5	b: negative	0.999	0.294	0.995	0.999	0.997
	b	1	928	weighted average	0.852	0.148	0.959	0.852	0.898
188D	classified as ->	a	b	a: positive	0.765	0.000	1.000	0.765	0.867
	a	13	4	b: negative	1.000	0.235	0.996	1.000	0.998
	b	0	929	weighted average	0.882	0.118	0.998	0.882	0.832

Table 16

Quantitative results of the feature selected by the incremental strategy using the hybrid ensemble classifier

Feature	Confusion matrix			Class	TP rate	FP rate	Precision	Recall	F1-measure
$(10, 9, 87, 12,$	classified as ->	a	b	a: positive	1.000	0.035	0.340	1.000	0.507
$16, 69, 163, 13,$	a	17	0	b: negative	0.965	0.000	1.000	0.965	0.982
$1, 130, 24, 62)^{T}$	b	33	897	weighted average	0.982	0.018	0.670	0.982	0.745
$(10, 9, 12, 87,$	classified as ->	a	b	a: positive	0.882	0.040	0.288	0.882	0.435
$16, 163, 13, 69,$	a	15	2	b: negative	0.960	0.118	0.998	0.960	0.979
$130, 24, 173)^{T}$	b	37	893	weighted average	0.921	0.079	0.643	0.921	0.707
$(10, 12, 87, 130,$	classified as ->	a	b	a: positive	0.647	0.006	0.647	0.647	0.647
$9, 16, 163, 13)^{T}$	a	11	6	b: negative	0.994	0.353	0.994	0.994	0.994
	b	6	924	weighted average	0.820	0.180	0.820	0.820	0.820
$(10, 9, 87, 12,$	classified as ->	a	b	a: positive	0.824	0.053	0.222	0.824	0.350
$16, 163, 13, 130,$	a	14	3	b: negative	0.947	0.176	0.997	0.947	0.971
$69, 1, 150, 173, 62)^{T}$	b	49	880	weighted average	0.885	0.115	0.609	0.885	0.661
$(10, 9, 12, 87,$	classified as ->	a	b	a: positive	0.882	0.017	0.484	0.882	0.625
$163, 16, 69, 13,$	a	15	2	b: negative	0.983	0.118	0.998	0.983	0.990
$1, 24)^{T}$	b	16	913	weighted average	0.933	0.067	0.741	0.933	0.808
$(10, 12, 9, 163,$	classified as ->	a	b	a: positive	0.882	0.041	0.283	0.882	0.429
$16, 69, 87, 13,$	a	15	2	b: negative	0.959	0.118	0.998	0.959	0.978
$130, 1, 24, 173)^{T}$	b	38	891	weighted average	0.921	0.079	0.640	0.921	0.703
$(10, 9, 12, 163,$	classified as ->	a	b	a: positive	0.706	0.055	0.190	0.706	0.300
$87, 130, 16, 24, 69,$	a	12	5	b: negative	0.945	0.294	0.994	0.945	0.969
$13, 62, 15, 1, 22)^{T}$	b	51	878	weighted average	0.825	0.175	0.592	0.825	0.635
$(10, 87, 9, 12,$	classified as ->	a	b	a: positive	0.882	0.037	0.306	0.882	0.455
$16, 163, 13, 69,$	a	15	2	b: negative	0.963	0.118	0.998	0.963	0.980
$173)^{T}$	b	34	895	weighted average	0.923	0.077	0.652	0.923	0.717
$(10, 9, 12, 16,$	classified as ->	a	b	a: positive	0.824	0.022	0.412	0.824	0.549
$163, 87, 13, 69,$	a	14	3	b: negative	0.978	0.176	0.997	0.978	0.988
$171, 24, 130)^{T}$	b	20	909	weighted average	0.901	0.099	0.704	0.901	0.768
$(10, 12, 9, 163,$	classified as ->	a	b	a: positive	0.941	0.047	0.267	0.941	0.416
$69, 16, 13, 87,$	a	16	1	b: negative	0.953	0.059	0.999	0.953	0.975
$1, 24, 173, 120)^{T}$	b	44	885	weighted average	0.947	0.053	0.633	0.947	0.695

Figure 5

The line charts of the ACCs. (A) The line chart of the ACCs using the clustering strategy. (B) The line charts of the ACCs on each feature dimension using the incremental strategy.

To judge which base classifier contributes most to the hybrid ensemble classifier, the number of times each base classifier was assigned to the ensemble was recorded at each feature dimension. The resulting 10 line charts, one per fold, are shown in Figure 6. They indicate that different base classifiers should be assigned at different feature dimensions when building an ensemble classifier.

Figure 6

Line charts representing the count of each base classifier in accordance with the feature dimension in each fold.

Discussions

Based on the preceding experimental results, three points merit discussion. First, we consider whether the hybrid ensemble classifier outperforms other classifiers in identifying PPR proteins. According to Table 2, the TP rate of PPR negative proteins reaches 0.999 when random forest is used for classification. However, the maximum TP rate of PPR positive proteins is only 0.680, which means that even in the best case nearly one-third of PPR positive proteins are wrongly categorized as PPR negative. When the hybrid ensemble classifier is applied to the same samples, the TP rate of PPR negative proteins reaches 0.998 and the maximum TP rate of PPR positive proteins is 0.910, as listed in Table 1. This comparison indicates that the hybrid ensemble classifier achieves better classification performance than random forest. In addition, Figure 5(A) shows that the mean ACC and its standard deviation are 0.902 and 0.042 with the 16-dimensional feature, which further demonstrates the effectiveness of the hybrid ensemble classification.
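
As a worked check of how the TP rate, FP rate, precision and F1-measure in the tables relate to a confusion matrix, the following minimal Python sketch recomputes them for the first fold of Table 15. The function name `class_metrics` and the convention that rows are actual classes and columns are predicted classes are assumptions made for illustration; they are not taken from the authors' code.

```python
def class_metrics(cm, k):
    """Return (TP rate, FP rate, precision, F1) for class index k.

    cm[i][j] is the number of samples of actual class i predicted as class j
    (an assumed convention). Recall is identical to the TP rate.
    """
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                     # actual k, predicted as the other class
    fp = sum(row[k] for row in cm) - tp      # predicted k, actually the other class
    tn = sum(map(sum, cm)) - tp - fn - fp
    tp_rate = tp / (tp + fn)
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * tp_rate / (precision + tp_rate)
    return tp_rate, fp_rate, precision, f1


# Confusion matrix of the first fold in Table 15 (classes a and b).
cm = [[14, 3],
      [0, 930]]

pos = class_metrics(cm, 0)   # class a
neg = class_metrics(cm, 1)   # class b
avg = tuple((p + n) / 2 for p, n in zip(pos, neg))   # unweighted mean of both classes

for name, vals in (("a", pos), ("b", neg), ("mean", avg)):
    print(name, " ".join(f"{v:.3f}" for v in vals))
```

Rounded to three decimals, the printed values agree with the first fold of Table 15. For that fold, the row labelled "weighted average" coincides with the unweighted mean of the two class rows, which is what the `avg` line computes. The sketch does not guard against empty classes, where the divisions would fail.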

Second, we discuss which variable selection strategy is more reliable, the clustering strategy or the incremental strategy. Qualitative and quantitative analyses were made on the features selected by the two strategies. As shown in Figure 5(A), the mean ACC is 0.902 with the feature selected by the clustering strategy, whereas the mean ACC with the feature selected by the incremental strategy is 0.896, with a standard deviation of 0.053 (see Figure 5(B)). The small difference between the two mean ACCs indicates that both strategies work on the dataset representing plant PPR proteins. In practice, the two strategies can be run separately, and a decision can be made after carefully comparing their classification results.
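
The mean and standard deviation quoted for the clustering strategy can be checked against the per-fold results. The sketch below assumes that the ACC of a fold corresponds to the "weighted average" TP rate of that fold in Table 14 (the definition of ACC is given outside this section) and that the standard deviation is the population form. Both are assumptions made for illustration.

```python
from statistics import mean, pstdev

# "weighted average" TP rates of the ten folds in Table 14 (clustering strategy).
fold_acc = [0.932, 0.938, 0.812, 0.875, 0.939,
            0.933, 0.902, 0.904, 0.847, 0.941]

acc_mean = mean(fold_acc)
acc_std = pstdev(fold_acc)   # population form: divides by n rather than n - 1

print(f"mean ACC = {acc_mean:.3f}, std = {acc_std:.3f}")
```

Under these assumptions the script prints a mean of 0.902 and a standard deviation of 0.042, in line with the values quoted for Figure 5(A).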

Third, we discuss which base classifier is more appropriate at different feature dimensions. As shown in Figure 6, DTC and KNN are selected most often when the feature dimension is low, whereas GNB is selected as the feature dimension increases. Thus, different base classifiers should be assigned at different feature dimensions. This phenomenon will be studied further in our future work.
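
The bookkeeping behind such a comparison can be sketched as a simple tally of which base classifier is assigned at each feature dimension. The function name `tally_by_dimension`, the record layout and the toy records below are all hypothetical; the records are invented for illustration and are not read from Figure 6.

```python
from collections import Counter, defaultdict


def tally_by_dimension(records):
    """Count how often each base classifier is assigned at each feature dimension.

    records: iterable of (feature_dimension, classifier_name) pairs,
    one pair per base-classifier slot filled in the ensemble.
    """
    counts = defaultdict(Counter)
    for dim, name in records:
        counts[dim][name] += 1
    return counts


# Toy records for a single fold (invented, not taken from the paper).
records = [(2, "DTC"), (2, "KNN"), (2, "DTC"),
           (12, "GNB"), (12, "GNB"), (12, "KNN")]

counts = tally_by_dimension(records)
for dim in sorted(counts):
    top = counts[dim].most_common(1)[0][0]
    print(dim, dict(counts[dim]), "most selected:", top)
```

Plotting one such tally per fold against the feature dimension would give line charts of the kind described for Figure 6.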

Conclusion

In this study, we improved our feature selection framework for PPR protein identification. Ten-fold nested cross validation was used to make the results more reliable. A criterion for automatically stopping the resampling was introduced. Instead of the previously used random forest, a hybrid ensemble classifier was applied. For automatic variable selection, both a clustering strategy and an incremental strategy were considered. The improved classification results demonstrate the effectiveness of these improvements. Finally, we observed that the automatic assignment of a base classifier is closely associated with the feature dimension.

Key Points

Hybrid ensemble classification has better classification performance than other classifiers.
An automatic feature selection method is proposed, which is based on either an incremental strategy or clustering by search in descending order.
Different base classifiers alternately play an important role in the hybrid ensemble classifier as the feature dimension increases.

Data availability

The real dataset analyzed in this study can be found in UniProt at https://www.uniprot.org/ and can be downloaded at https://www.frontiersin.org/articles/10.3389/fpls.2018.01961.

Authors' Contributions Statement

W.G.H. conceived the general project and supervised it. Z.X.D. initiated the idea, conceived the whole process and finalized the paper. Z.J.W. was the principal developer and conducted the experiments. L.T. helped to provide the clustering strategy and the incremental strategy.

Funding

This work has been supported by the Natural Science Foundation of China (No. 62072095, 62225109), the Fundamental Research Funds for the Central Universities (No. 2572021CG03), the Natural Science Foundation of Heilongjiang Province (No. LH2020F002) and the financial support of Specialized Personnel Start-up Grant (No. 520-60201521039).

Xudong Zhao received the B.S. degree in intelligent instrument, the M.S. degree in computer science and technology and the Ph.D. degree in artificial intelligence and information processing from Harbin Institute of Technology, Harbin, China, in 2003, 2007 and 2013, respectively. He was a post-doctoral fellow in computer science and engineering at The Chinese University of Hong Kong in 2014. Currently, he is an associate professor in the College of Information and Computer Engineering, Northeast Forestry University, Harbin, China. His research interests include feature selection, clustering, discovery of signatures for prognosis in different cancers, differential expression analysis on expression profiles and medical image processing.

Jingwen Zhai received the B.S. degree in computer science and technology from Harbin University of Science and Technology, Harbin, China, in 2020. He is currently pursuing the master’s degree with the College of Information and Computer Engineering, Northeast Forestry University, under the supervision of X. D. Zhao. His research interests include pattern recognition, bioinformatics and machine learning.

Tong Liu received the B.S. degree in computer science and technology from Northeast Forestry University, Harbin, China, in 2020. He is currently pursuing the master’s degree with the College of Information and Computer Engineering, Northeast Forestry University, under the supervision of X. D. Zhao. His research interests include pattern recognition, bioinformatics, medical image processing and machine learning.

Guohua Wang is a professor in the College of Information and Computer Engineering, Northeast Forestry University. He received the master’s and Ph.D. degrees in computer science and technology from Harbin Institute of Technology, in 2003 and 2009, respectively. He worked at Johns Hopkins University as a postdoctoral fellow from 2014 to 2016. His research interests are bioinformatics, machine learning and algorithm design.

References

1.

Barkan

A

,

Small

I

.

Pentatricopeptide repeat proteins in plants

.

Annu Rev Plant Biol

2014

;

65

(

1

):

415

–

42

.

2.

Zhang

Q

,

Yanghong

X

,

Huang

J

, et al.

The rice pentatricopeptide repeat protein ppr756 is involved in pollen development by affecting multiple RNA editing in mitochondria

.

Front Plant Sci

2020

;

11

:

749

.

3.

Li

XJ

,

Zhang

YF

,

Hou

M

, et al.

Small kernel 1 encodes a pentatricopeptide repeat protein required for mitochondrial nad7 transcript editing and seed development in maize (Zea mays) and rice (Oryza sativa)

.

Plant J

2014

;

79

(

5

):

797

–

809

.

4. Wang X, Zhao L, Man Y, et al. Pdm4, a pentatricopeptide repeat protein, affects chloroplast gene expression and chloroplast development in Arabidopsis thaliana. Front Plant Sci 2020;11:1198.

5. Zhang J, Xiao J, Li Y, et al. Pdm3, a pentatricopeptide repeat-containing protein, affects chloroplast development. J Exp Bot 2017;68(20):5615–27.

6. Toda T, Fujii S, Noguchi K, et al. Rice mpr25 encodes a pentatricopeptide repeat protein and is essential for RNA editing of nad5 transcripts in mitochondria. Plant J 2012;72(3):450–60.

7. Liu Y-J, Xiu Z-H, Meeley R, et al. Empty pericarp5 encodes a pentatricopeptide repeat protein that is required for mitochondrial RNA editing and seed development in maize. Plant Cell 2013;25(3):868–83.

8. Wei L, Tang J, Zou Q. Local-dpp: an improved DNA-binding protein prediction method by exploring local evolutionary information. Inform Sci 2017;384:135–44.

9. Tang H, Zhao Y-W, Zou P, et al. Hbpred: a tool to identify growth hormone-binding proteins. Int J Biol Sci 2018;14(8):957–64.

10. Kaiyang Q, Wei L, Yu J, et al. Identifying plant pentatricopeptide repeat coding gene/protein using mixed feature extraction methods. Front Plant Sci 2019;1961.

11. Congzhong Cai LY, Han ZL, Ji XC, et al. Svm-prot: web-based support vector machine software for functional classification of a protein from its primary sequence. Nucleic Acids Res 2003;31(13):3692–7.

12. Hou R, Wang L, Yi-Jun W. Predicting ATP-binding cassette transporters using the random forest method. Front Genet 2020;11:156.

13. Kaiyang Q, Zou Q, Shi H. Prediction of diabetic protein markers based on an ensemble method. Front Biosci 2021;26(7):207–21.

14. Ao C, Zhou W, Gao L, et al. Prediction of antioxidant proteins using hybrid feature representation method and random forest. Genomics 2020;112(6):4666–74.

15. Amin A, Awais M, Sahai S, et al. idrp-pseaac: identification of DNA replication proteins using general PSEAAC and position dependent features. Int J Peptide Res Ther 2021;27(2):1315–29.

16. Pufeng D, Wang X, Chao X, et al. Pseaac-builder: a cross-platform stand-alone program for generating various special Chou’s pseudo-amino acid compositions. Anal Biochem 2012;425(2):117–9.

17. Zhao X, Wang H, Li H, et al. Identifying plant pentatricopeptide repeat proteins using a variable selection method. Front Plant Sci 2021;12:298.

18. Hakala K, Kaewphan S, Björne J, et al. Neural network and random forest models in protein function prediction. BioRxiv 2019;690271.

19. Gong Y, Liao B, Wang P, et al. Drughybrid_bs: using hybrid feature combined with bagging-svm to predict potentially druggable proteins. Front Pharmacol 2021;3467.

20. Zhang Y, Ni J, Gao Y. Rf-svm: identification of DNA-binding proteins based on comprehensive feature representation methods and support vector machine. Prot Struct Funct Bioinformatics 2022;90(2):395–404.

21. Zhang J, Lv L, Donglei L, et al. Variable selection from a feature representing protein sequences: a case of classification on bacterial type IV secreted effectors. BMC Bioinformatics 2020;21(1):1–15.

22. Dai W, Chen B, Peng W, et al. A novel multi-ensemble method for identifying essential proteins. J Comput Biol 2021;28(7):637–49.

23. Wang N, Zhang J, Liu B. idrbp-el: identifying DNA- and RNA-binding proteins based on hierarchical ensemble learning. IEEE/ACM Trans Comput Biol Bioinform 2021.

24. Frank E, Hall M, Trigg L, et al. Data mining in bioinformatics using Weka. Bioinformatics 2004;20(15):2479–81.

25. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006;22(13):1658–9.

26. Liu T, Li H, Zhao X. Clustering by search in descending order and automatic find of density peaks. IEEE Access 2019;7:133772–80.

27. Li R, Perneczky R, Yakushev I, et al. Gaussian mixture models and model selection for [18F] fluorodeoxyglucose positron emission tomography classification in Alzheimer’s disease. PloS One 2015;10(4):e0122731.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
