Jiangping Xiao, Xiangru Li, Haitao Lin, Kaibin Qiu, Pulsar candidate selection using pseudo-nearest centroid neighbour classifier, Monthly Notices of the Royal Astronomical Society, Volume 492, Issue 2, February 2020, Pages 2119–2127, https://doi.org/10.1093/mnras/stz3539
ABSTRACT
A typical characteristic of the pulsar candidate classification task is the class imbalance between true pulsars and false candidates. This imbalance has negative effects on traditional classification methods. In this study, we introduce a strategy using a scatter matrix-based class separability measure to estimate the harmfulness of class imbalance for pulsar candidate classification. The measure quantitatively describes the damage done by imbalanced situations to the pulsar candidate classification problem and provides a priori information that guides the selection of an appropriate data processing method and the construction of an effective classifier. We then present a non-parametric data exploration technique, the pseudo-nearest centroid neighbour classifier (PNCN), to identify credible pulsar candidates from pulsar survey data sets. The PNCN algorithm can effectively resolve the class imbalance problem and is applicable to data streams. The proposed algorithm is tested on High Time Resolution Universe Pulsar Survey (HTRU) 2 (obtained by an analysis of HTRU Medium Latitude data) and LOTAAS 1 (obtained from the LOFAR Tied-Array All-Sky Survey). The experimental results show that the proposed classifier identifies pulsars with high performance: the precision and recall on HTRU 2 are 92.3 per cent and 83.1 per cent, and those on LOTAAS 1 are 97.4 per cent and 95.6 per cent, respectively; the false positive rate (FPR) is 0.7 per cent on HTRU 2 and 0.03 per cent on LOTAAS 1, an order of magnitude lower than the corresponding FPRs obtained by Lyon et al. (2016) and Tan et al. (2018).
1 INTRODUCTION
Pulsars are rapidly rotating neutron stars with strong magnetic fields. They are ideal astrophysical laboratories with extreme physical properties that cannot be reproduced on Earth, for instance extremely stable periodic radiation pulses. The observation and study of pulsars are of great significance in astronomy, astrophysics, particle physics, plasma physics, general relativity, gravitational waves, navigation, and so on (Lorimer et al. 1998; Lyne et al. 2004; Cordes et al. 2005; Hobbs et al. 2009). Since the discovery of the first pulsar in 1967 (Hewish et al. 1968), a large number of radio telescopes have been applied to pulsar searching. Currently, about 2700 pulsars have been found (Wang et al. 2018).
The discovery of pulsars typically involves identifying periodic signals in the observed data and then reducing each periodic signal to a group of diagnostic values and graphical representations, referred to as a ‘candidate’ (Morello et al. 2014). Since most candidates are caused by radio frequency interference (RFI) and noise that look like pulsars, searching for pulsars is not a straightforward task. Until recently, candidate selection depended mainly on manual inspection. The continuous improvement of survey specifications has expanded the number and volume of pulsar candidates exponentially, so manual methods have become impractical, and many machine learning methods have attracted attention for automatically handling the ‘candidate selection problem’ (Lyon et al. 2016).
In pulsar candidate classification tasks, class imbalance occurs frequently: the number of non-pulsars is much larger than that of pulsars. This imbalance can severely degrade the performance of pulsar candidate selection. During the past decade, many effective methods for this problem have been developed, including sampling, feature selection, artificial neural networks (ANNs), ensemble learning, and so on.
Sampling is a data pre-processing technique that changes the sample distribution of the training set to reduce or eliminate imbalance. Morello et al. (2014) designed the Straightforward Pulsar Identification using Neural Networks (SPINN) classifier to improve the performance of ANNs; they oversampled the pulsars to obtain a 4:1 ratio of non-pulsars to pulsars, decreasing the class imbalance in the training set. More recently, Bethapudi & Desai (2018) used supervised machine learning algorithms to separate pulsar signals from noise, employing the synthetic minority oversampling technique to deal with class imbalance. Zhang et al. (2019) presented a VGG-net-based learning algorithm for pulsar recognition; they resampled the original data set at different sampling rates to deal with the imbalanced data set.
Feature selection is another data pre-processing method widely used in classification. It chooses features with distinguishing characteristics, so as to improve the classification accuracy of minority classes. Eatough et al. (2010) designed 12 features and used an ANN on the Parkes Multibeam Pulsar Survey (Manchester et al. 2001). Soon afterwards, Bates et al. (2012) increased the number of features to 22, extracted from candidates obtained in the High Time Resolution Universe Pulsar Survey (HTRU; Keith et al. 2010), to train a new ANN classifier. Morello et al. (2014) extracted six features from candidates obtained in the HTRU Medlat survey as input to the SPINN classifier. These earlier features were based on particular experiences and assumptions, and therefore depended strongly on the data sets, making them impossible to use directly on other surveys without modification. To solve this problem, Lyon et al. (2016) designed eight new features based on simple statistics. A tree-based machine learning classifier, the GH-VFDT algorithm, was used to test these new features on three different survey data sets. Experiments showed that these statistical features were superior to the previous empirical features, and they have been used as a reference by later researchers (Mohamed 2018).
Apart from the data pre-processing stage, many researchers focus on the algorithms of the classification stage. By analysing the shortcomings of an algorithm in handling the imbalance problem and the characteristics of imbalanced data, the recognition rate of minority classes can be increased through appropriate algorithmic improvements. More and more ensemble methods and ANN-based methods have been applied. For example, Guo et al. (2017) proposed a DCGAN+L2-SVM algorithm to address the class imbalance problem; Zhang et al. (2019) presented a VGG-net-based learning algorithm for pulsar recognition; and Wang et al. (2019) presented a new ensemble model, PICS-ResNet, for the pulsar candidate selection problem on FAST, the Five-hundred-meter Aperture Spherical radio Telescope.
The class imbalance learning methods mentioned above can effectively reduce the negative effects of imbalance on classification in some cases. However, some studies found that not all applications are affected by this imbalance (Ho & Basu 2002; Prati et al. 2004; Khoshgoftaar et al. 2011; Jain & Bhatnagar 2015). Applying class imbalance learning to tasks unaffected by the imbalance not only yields little improvement, but can even degrade classification performance (Liu et al. 2009), a problem first identified by Liu et al. (2009). Yu et al. (2014) systematically analysed the causes of the harmfulness produced by class imbalance and presented a novel and simple strategy using a scatter matrix-based class separability measure to pre-estimate that harmfulness. The estimation is quantitative and automatic, using only the training data. It is therefore meaningful to pre-estimate the harmfulness of the imbalance in pulsar survey data sets before tackling the pulsar candidate classification task.
With the increasing number of pulsar candidates and growing data volumes, not all parametric machine learning methods remain applicable, and online classification of data streams is a challenge. For example, ANNs are parametric machine learning methods widely used in pulsar candidate classification tasks. Their main advantage is that the input provided to the network can be raw data, so features need not be designed manually. However, their lack of interpretability and large number of trainable parameters make such models hard to transfer. Hence, it is worthwhile to build a non-parametric machine learning method.
The K-nearest neighbour classifier (KNN) is one of the most widely used non-parametric machine learning methods, and has been theoretically proved to be asymptotically optimal in the Bayes sense (Cover & Hart 1967; Wagner 1971). Moreover, the appeal of KNN is that only a single integer parameter k needs to be tuned and no particular statistical distribution of the training data needs to be assumed, which is what makes it suitable for data streams. However, the classification performance of KNN and its variants is still degraded by three main issues: sparseness, imbalance, and noise. To handle outliers well, especially in sparse, imbalanced, and noisy situations, a reliable KNN-based approach, the pseudo-nearest centroid neighbour rule (PNCN), was designed by Ma et al. (2016).
In this paper, an evaluation criterion using the scatter matrix-based class separability measure is applied to estimate the harmfulness of the imbalance on two pulsar survey data sets, and the non-parametric machine learning algorithm PNCN is applied to the pulsar candidate classification task. The proposed algorithm is tested on HTRU 2 and LOTAAS 1, and its superiority over other classifiers used in the literature is shown.
The rest of this paper is organized as follows: Section 2 presents the necessary background about the two survey data sets. Section 3 introduces the theory of scatter matrix-based class separability measure. Section 4 gives the proposed algorithm. The methodology is placed in Section 5. Some experimental investigations and discussions are conducted in Section 6. Finally, the paper is concluded in Section 7.
2 PULSAR DATA SET AND FEATURE SELECTION
2.1 Survey data sets
Two survey data sets from different telescopes are used in our study; they are summarized in Table 1. The first, HTRU 2, was obtained during an analysis of HTRU Medium Latitude data by Thornton (2013). The pipeline searched for pulsar signals with DMs from 0 to 2000 cm−3 pc. This data set consists of 1639 pulsars and 16 259 spurious candidates caused by RFI and noise; all samples in this data set have been checked by human annotators. Its class imbalance ratio is 9.92 : 1. It is a standard, open telescope data set widely used to evaluate the performance of machine learning algorithms.
Basic information of survey data sets used in this study. R denotes the class imbalance ratio of non-pulsars to pulsars.
| Data set | Samples | Pulsars | Non-pulsars | R |
|---|---|---|---|---|
| HTRU 2 | 17 898 | 1639 | 16 259 | 9.92 : 1 |
| LOTAAS 1 | 5053 | 66 | 4987 | 75.56 : 1 |
The second data set is LOTAAS 1, which was obtained from LOTAAS, the LOFAR Tied-Array All-Sky Survey (Lofar Working Group 2013; Cooper 2014). This data set consists of 66 pulsar and 4987 non-pulsar candidates, with a class imbalance ratio of 75.56 : 1. These pulsars represent the first set found during the LOTAAS survey; they are among the easier ones to find, not necessarily the difficult ones, such as pulsars with very low S/N or unusual pulse shapes. The LOTAAS survey is an ongoing all-Northern-sky pulsar survey whose data are currently private.
2.2 Feature selection
Lyon et al. (2016) designed eight new features based on simple statistics. Four statistics (mean, standard deviation, excess kurtosis, and skewness) are extracted from both the integrated pulse profile P and the dispersion measure–signal-to-noise ratio (DM–SNR) curve D, giving the eight unbiased statistical features shown in Table 2. These statistical features maximize the separation between pulsars and spurious candidates. The feature data were extracted from these data sets using the Pulsar Feature Lab (https://doi.org/10.6084/m9.figshare.1536472.v1).
| Feature | Description |
|---|---|
| Prof.μ | Mean of the integrated profile P. |
| Prof.σ | Standard deviation of the integrated profile P. |
| Prof.k | Excess kurtosis of the integrated profile P. |
| Prof.s | Skewness of the integrated profile P. |
| DM.μ | Mean of the DM–SNR curve D. |
| DM.σ | Standard deviation of the DM–SNR curve D. |
| DM.k | Excess kurtosis of the DM–SNR curve D. |
| DM.s | Skewness of the DM–SNR curve D. |
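The four statistics in Table 2, applied to both the profile and the DM–SNR curve, can be computed directly from the two arrays. The sketch below is an illustrative implementation; the function name `lyon_features` and the synthetic input arrays are our own, not part of the Pulsar Feature Lab:

```python
# Sketch of the eight statistical features of Lyon et al. (2016).
# `profile` and `dm_curve` stand in for a candidate's integrated
# pulse profile P and DM-SNR curve D.
import numpy as np
from scipy.stats import kurtosis, skew


def lyon_features(profile, dm_curve):
    """Return [Prof.mu, Prof.sigma, Prof.k, Prof.s, DM.mu, DM.sigma, DM.k, DM.s]."""
    feats = []
    for series in (np.asarray(profile, float), np.asarray(dm_curve, float)):
        feats += [series.mean(),
                  series.std(),
                  kurtosis(series),  # excess kurtosis (Fisher definition)
                  skew(series)]
    return np.array(feats)


# Example with synthetic data: a Gaussian-like pulse on a noisy baseline.
rng = np.random.default_rng(0)
bins = np.arange(64)
profile = np.exp(-0.5 * ((bins - 32) / 3.0) ** 2) + 0.05 * rng.standard_normal(64)
dm_curve = np.exp(-0.5 * ((bins - 20) / 8.0) ** 2) + 0.05 * rng.standard_normal(64)
print(lyon_features(profile, dm_curve).round(3))
```

A narrow pulse on a flat baseline yields a strongly leptokurtic, right-skewed profile, which is exactly the regime these features are designed to separate from noise.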
These statistical features are effective and superior to the empirical features used previously. This conclusion was confirmed again by the work of Tan et al. (2018). Since the eight statistical features designed by Lyon et al. (2016) are prone to misclassifying wide-profile pulsars in practical classification, Tan et al. (2018) improved on them by designing eight new features using time-phase and frequency-phase information. At the same time, pulsar-like RFI is classified separately, so the pulsar candidate classification task can be regarded either as a binary classification problem (pulsars, non-pulsars) or as a three-class problem (pulsars, noise, RFI). Multiple decision trees are constructed to improve the performance. On the new LOTAAS data set (called LOTAAS 2 to distinguish it from the earlier one), the recall is 98.7 per cent, an increase of 2.5 per cent, and the FPR is reduced from 2.8 per cent to 0.5 per cent. However, increasing the number of statistical features invites the curse of dimensionality and makes overfitting more likely. In this study, we use the eight statistical features designed by Lyon et al. (2016) on HTRU 2 and LOTAAS 1.
As mentioned above, although these statistical features have been proved effective, the range of the same feature varies between telescopes, because different telescopes have different geographical locations and observation angles. Even for the signal from the same pulsar, different telescopes receive different data, which makes the resulting survey data sets differ. One obvious consequence is that different data sets show different degrees of class overlap under the same feature, even a statistical one. This matters because the degree of overlap affects the performance of machine learning classification algorithms.
3 SCATTER MATRIX-BASED CLASS SEPARABILITY MEASURE
Yu et al. (2014) systematically analysed the reasons for the harmfulness produced by class imbalance. The fundamental factors are the degree of class overlap and the class imbalance ratio. Fig. 1 shows the relationship between sample distribution and the harmfulness of class imbalance in a more general situation.
Figure 1. Relationship between sample distribution and the harmfulness of class imbalance: (a) a non-overlapping balanced classification task; (b) an overlapping balanced classification task; (c) and (d) non-overlapping and overlapping imbalanced classification tasks, respectively. • and ★ represent samples belonging to different classes, the circled regions denote misclassified examples, and the black curve is the boundary between the two classes.
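Yu et al.'s exact formula is not reproduced in this section, but a typical scatter matrix-based separability measure compares between-class to within-class scatter. The sketch below uses the trace-ratio form as an assumed stand-in for their measure; larger values indicate better-separated (less harmful) class structure:

```python
# Hedged sketch of a scatter matrix-based class separability measure.
# The trace-ratio form tr(S_b)/tr(S_w) is one common instance, not
# necessarily the exact formula of Yu et al. (2014).
import numpy as np


def separability(X, y):
    """Trace ratio tr(S_b)/tr(S_w); larger means better-separated classes."""
    X = np.asarray(X, float)
    mu = X.mean(axis=0)                       # global mean
    S_w = np.zeros((X.shape[1], X.shape[1]))  # within-class scatter
    S_b = np.zeros_like(S_w)                  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        D = Xc - mu_c
        S_w += D.T @ D
        d = (mu_c - mu)[:, None]
        S_b += len(Xc) * (d @ d.T)
    return np.trace(S_b) / np.trace(S_w)


# Well-separated classes score higher than overlapping ones (cf. Fig. 1).
rng = np.random.default_rng(1)
y = np.array([0] * 50 + [1] * 50)
X_far = np.r_[rng.normal(0, 1, (50, 2)), rng.normal(8, 1, (50, 2))]
X_near = np.r_[rng.normal(0, 1, (50, 2)), rng.normal(1, 1, (50, 2))]
print(separability(X_far, y) > separability(X_near, y))  # True
```

Because the measure depends only on the training samples and their labels, it can be evaluated before any classifier is trained, which is exactly how it serves as a pre-estimate of the harmfulness of imbalance.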
4 PSEUDO-NEAREST CENTROID NEIGHBOUR CLASSIFICATION
4.1 The proposed algorithm
The non-parametric machine learning algorithm PNCN (Ma et al. 2016) is applied to the pulsar candidate classification task. In what follows, for ease of presentation of the general recognition problem, suppose there is a training set |$T=\lbrace x_{i}\in R^{d}\rbrace ^{N}_{i=1}$| with N training samples and M classes in a d-dimensional feature space, with corresponding class labels c1, c2, ..., cM. The subset of T from class cl is |$T_{l}=\lbrace x_{lj}\in R^{d}\rbrace ^{N_{l}}_{j=1}$|, containing Nl training samples. In this study, the algorithm operates on two data sets with eight input features and one output variable, i.e. M = 2 and d = 8.
The nearest centroid neighbours of a query x are found iteratively:
- Find the first nearest centroid neighbour |$x_{1}^{\mathrm{ NCN}}$| of x, which is simply its nearest neighbour.
- Find the ith nearest centroid neighbour |$x_{i}^{\mathrm{ NCN}} (i \ge 2)$|, chosen such that the centroid of |$x_{i}^{\mathrm{ NCN}}$| and all previously selected centroid neighbours, i.e. |$x_{1}^{\mathrm{ NCN}}, x_{2}^{\mathrm{ NCN}}, ... , x_{i\text{-}1}^{\mathrm{ NCN}}$|, is the closest to x.
The PNCN decides the class label of x as follows:
Search k-nearest centroid neighbours from Tl of each class cl for the query pattern x in the training set T, say |$T^{\mathrm{ NCN}}_{lk}(x) = \lbrace x^{\mathrm{ NCN}}_{lj}\in R^{d} \rbrace ^{k}_{j=1}$| .
- Compute the local mean vector |$\bar{u}_{lj}^{\mathrm{ NCN}}(x)$| of the first j nearest centroid neighbours of a query x from class cl. Let |$\bar{U}_{lk}^{\mathrm{ NCN}}(x)=\lbrace \bar{u}_{lj}^{\mathrm{ NCN}}(x)\in R^{d}\rbrace _{j=1}^{k}$| represent the set of the k local mean vectors corresponding to k-nearest centroid neighbours in the class cl, and the corresponding Euclidean distances to x are |$d(x, \bar{u}_{l1}^{\mathrm{ NCN}}(x)), d(x, \bar{u}_{l2}^{\mathrm{ NCN}}(x)), . . . , d(x, \bar{u}_{lk}^{\mathrm{ NCN}}(x))$|, where(7)$$\begin{eqnarray} \bar{u}_{lj}^{\mathrm{ NCN}}(x)=\frac{1}{j}\sum _{m=1}^j x_{lm}^{\mathrm{ NCN}}. \end{eqnarray}$$
- Assign different weights to k categorical local mean vectors. The weight |$\bar{W}_{lj}^{\mathrm{ NCN}}$| of the jth local mean vector |$\bar{u}_{lj}^{\mathrm{ NCN}}(x)$| for the class cl is given as(8)$$\begin{eqnarray} \bar{W}_{lj}^{\mathrm{ NCN}}=\frac{1}{j}, \ \ j = 1, ... , k. \end{eqnarray}$$
- Design the pseudo-nearest centroid neighbour |$\bar{x}_{l}^{\mathrm{ PNCN}}(x)$| of the query pattern x from class cl, and cl can be viewed as the class label of |$\bar{x}_{l}^{\mathrm{ PNCN}}(x)$|. The distance |$\bar{d}(x,\bar{x}_{l}^{\mathrm{ PNCN}}(x))$| between x and |$\bar{x}_{l}^{\mathrm{ PNCN}}(x)$| can be defined by the weighted sum of distances of k categorical local mean vectors to x as follows:(9)$$\begin{eqnarray} \bar{d}\left(x, \bar{x}_{l}^{\mathrm{ PNCN}}(x)\right) &=& \bar{W}_{ l1}^{\mathrm{ NCN}}\times d\left(x,\bar{u}_{l1}^{\mathrm{ NCN}}(x)\right) \nonumber\\ &&+\,... + \bar{W}_{lk}^{\mathrm{ NCN}}\times d(x,\bar{u}_{ lk}^{\mathrm{ NCN}}(x)). \end{eqnarray}$$
- Classify the query pattern x into the class c, as follows:(10)$$\begin{eqnarray} c = \arg \min _{c_{l}} \bar{d}\left(x, \bar{x}_{l}^{\mathrm{ PNCN}}(x)\right). \end{eqnarray}$$
According to the procedure of the PNCN above, we summarize it in Algorithm 1 by means of the pseudo-codes. The development code of the PNCN algorithm can be found at https://github.com/xiaojiangping/PNCN.
Algorithm 1. The PNCN classification rule.
|$\mathbf {Input: }$| A query pattern x, the number of nearest neighbours k, a training set T.
|$\mathbf {Output:}$| The class label assigned to x.
|$\mathbf {Steps:}$|
1. Calculate the Euclidean distances from the training samples in each class cl to x.
2. Search the k-nearest centroid neighbours of x in each class cl, say |$T_{lk}^{\mathrm{ NCN}}(x)=\lbrace x_{lj}^{\mathrm{ NCN}}\in R^{d}\rbrace ^{k}_{j=1}$|.
3. Compute the local mean vector |$\bar{u}_{lj}^{\mathrm{ NCN}}(x)$| of the first j nearest centroid neighbours of x using |$T^{\mathrm{ NCN}}_{lk}(x)$|, and the corresponding distance |$d(x, \bar{u}_{lj}^{\mathrm{ NCN}}(x))$| between |$\bar{u}_{lj}^{\mathrm{ NCN}}(x)$| and x.
4. Allocate the weight |$\bar{W}_{lj}^{\mathrm{ NCN}}$| to the jth local mean vector |$\bar{u}_{lj}^{\mathrm{ NCN}}(x)$| in the set |$\bar{U}_{lk}^{\mathrm{ NCN}}(x)$|.
5. Construct the pseudo-nearest centroid neighbour |$\bar{x}_{l}^{\mathrm{ PNCN}}(x)$| using |$\bar{W}_{lk}^{\mathrm{ NCN}}$| and |$d(x,\bar{u}_{lk}^{\mathrm{ NCN}}(x))$|.
6. Assign to x the class c of the closest pseudo-nearest centroid neighbour.
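The steps above can be sketched compactly in Python. This is an illustration of the procedure as described in equations (7)–(10), not the reference implementation linked in the text; the function names and toy data are our own:

```python
# Illustrative sketch of the PNCN rule (Ma et al. 2016).
import numpy as np


def nearest_centroid_neighbours(x, X, k):
    """Greedy NCN search: each new neighbour minimizes the distance from
    the centroid of the neighbours chosen so far (plus itself) to x."""
    remaining = list(range(len(X)))
    chosen = []
    for _ in range(k):
        best = min(remaining,
                   key=lambda i: np.linalg.norm(X[chosen + [i]].mean(axis=0) - x))
        chosen.append(best)
        remaining.remove(best)
    return X[chosen]


def pncn_predict(x, X_train, y_train, k=5):
    x = np.asarray(x, float)
    best_class, best_dist = None, np.inf
    for c in np.unique(y_train):
        ncn = nearest_centroid_neighbours(x, X_train[y_train == c], k)
        # Weighted sum over local means of the first j NCNs, weight 1/j
        # (equations 7-9); classify by the minimum pseudo-distance (eq. 10).
        pseudo_dist = sum((1.0 / j) * np.linalg.norm(ncn[:j].mean(axis=0) - x)
                          for j in range(1, k + 1))
        if pseudo_dist < best_dist:
            best_class, best_dist = c, pseudo_dist
    return best_class


# Toy usage: two well-separated 2-D classes.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [5, 5], [5, 6], [6, 5], [6, 6]], float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(pncn_predict([0.6, 0.4], X, y, k=3))  # 0
```

The 1/j weighting means the first, most local mean dominates the pseudo-distance, while later means contribute progressively less, which is what makes the rule robust to outliers among the farther neighbours.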
The PNCN classification algorithm has a relatively high computational cost; details are given in Table 3. Keeping only the leading-order terms, the computational cost of KNN is O(Nd + Nk) and that of PNCN is O(2Ndk + Mdk2/2). When the nearest neighbour number k and the number of classes M are small, there is not much difference between the KNN and PNCN classifiers in computational cost.
Details of the computational cost of the KNN and PNCN classification algorithms. k denotes the nearest neighbour number, M the number of classes, and N and d the number of training samples and the dimension of the feature space, respectively.
| Classifier | Computational cost | Description |
|---|---|---|
| KNN | O(Nd + Nk + k) | For each class, KNN is used to determine the class of the sample. |
| PNCN | O(2Ndk + Nk) | Search the k-nearest centroid neighbours in each class. |
| | O(Mdk(k + 1)/2) | Calculate the k local mean vectors corresponding to the k-nearest centroid neighbours for each class. |
| | O(Mk) | Assign weights to the local mean vectors of each class. |
| | O(Md + M) | For each class, PNCN is used to determine the class of the sample. |
The classification performance of the standard KNN algorithm is degraded by three main issues: sparseness, imbalance, and noise. The presence of noise and RFI makes the class overlap severe, which is why standard KNN has performed poorly on pulsar data in the past. The PNCN classification algorithm is less susceptible to this problem because it takes into account not only the spatial distribution of the samples (the k-nearest centroid neighbours of a query x in each class), but also their local means (the local mean points corresponding to the nearest centroid neighbours), and it weights the contribution of these local means according to their distance from the query. Therefore, theoretically, the PNCN classification algorithm is effective and robust in more practical situations.
4.2 Data streams
Data streams refer to large, continuous, fast-arriving, potentially infinite, and time-varying data sequences. More information about data streams can be found in Gaber et al. (2005) and Lyon et al. (2013, 2014).
Continuous improvement of survey specifications is leading to rapid data generation, especially the exponential rise in pulsar candidate numbers and data volumes. For example, the Square Kilometre Array (SKA) is currently under development by an international consortium (Carilli & Rawlings 2004). In the near future, the SKA data rate for pulsar detection is expected to be between 0.43 and 1.45 TB s−1 (Smits et al. 2009). This is many orders of magnitude greater than previous surveys, making online classification of data streams a challenge.
Since labelled reference data and the real-world data to be processed are often not available at the same time, a common practice is to train a supervised candidate classifier on some off-line data and then apply it to a data stream. This is reasonable, but it requires that the training data be i.i.d. with the data in the stream. However, some data streams are known to exhibit distributional shifts over varying time periods. For example, a changing RFI environment can exhibit shifts over short (minutes/hours) and/or long (days/weeks/years) time-scales. In either case, the shifts violate the i.i.d. assumption, a phenomenon known as concept drift (Widmer & Kubat 1996; Gaber et al. 2005). In recent years, increasing attention has been paid to machine learning methods suited to pulsar classification from data streams, for example the GH-VFDT algorithm (Lyon et al. 2016).
Theoretically, the PNCN classification algorithm can be applied to data streams. On the one hand, like the KNN classifier, the PNCN classifier does not build a model in advance, which makes it suitable for data streams. When a new sample arrives, the PNCN classifier finds the k-nearest centroid neighbours of the new sample in the training space, calculates the pseudo-nearest centroid neighbours, and finally assigns the class label of the closest pseudo-nearest centroid neighbour to the new sample. On the other hand, as described in the discussion of computational cost in Section 4.1, the computational cost of the PNCN algorithm is high, so special hardware such as a graphics processing unit (GPU) may be required when applying the method to data streams.
5 STRATIFICATION CROSS-VALIDATION AND EVALUATION MEASURES
5.1 Stratification cross-validation
In our study, we use stratified cross-validation to evaluate the performance of the PNCN algorithm. Stratified cross-validation is a variant of cross-validation, a resampling method widely used in machine learning for model evaluation, especially on imbalanced data sets.
The basic idea of cross-validation is to divide the reference data into two subsets: a training set used to learn the parameters of a model (for example, the connection weights of an ANN) and a test set. There are two main reasons for using cross-validation. First, it can be used to evaluate the prediction performance of a model on fresh data, which reduces overfitting to a certain extent. Secondly, it extracts as much useful information as possible from limited data. In ordinary cross-validation, the training and test sets are divided randomly, which does not necessarily preserve the imbalanced class proportions of the original data; failing to preserve these proportions can lead to overly optimistic evaluation results. Stratified cross-validation therefore makes the method more suitable for imbalanced data by maintaining the proportion of each category of the original data in every fold.
The process of K-fold stratified cross-validation is as follows. The data set is divided into K subsets, each of which maintains the class proportions of the original data. Cross-validation then repeats K training-test procedures: in each procedure, a single subset is used as the test set and the remaining K-1 subsets constitute the training set, so that each subset serves as a test set exactly once. K models are therefore learned, K test results are computed, and their average is taken as the final performance measure. In practice, five-fold and ten-fold cross-validation are the most common configurations. In our study, we used five-fold stratified cross-validation; the flow chart is shown in Fig. 3.
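The stratified splitting described above can be sketched as a minimal NumPy implementation (in practice a library routine such as scikit-learn's `StratifiedKFold` would normally be used; the function names here are ours):

```python
import numpy as np

def stratified_kfold(y, k=5, seed=0):
    """Partition sample indices into k folds, each preserving the
    class proportions of the label array y."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        rng.shuffle(idx)
        # deal this class's shuffled indices round-robin into folds,
        # so every fold receives ~1/k of each class
        for i, j in enumerate(idx):
            folds[i % k].append(j)
    return [np.array(f) for f in folds]

def cv_splits(y, k=5):
    """Yield the K training-test index pairs; each fold is the
    test set exactly once."""
    folds = stratified_kfold(y, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```

For a data set with a 9:1 class ratio, every fold produced this way keeps that same 9:1 ratio, which is the property that prevents the overly optimistic evaluations mentioned above.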

5.2 Evaluation measures
Since the pulsar data sets are extremely imbalanced, researchers are more concerned with the classification accuracy on pulsars. In this kind of scenario, recall, precision, and FPR are three commonly used performance metrics. Recall is the proportion of positive samples (pulsars) correctly classified; the higher the recall, the more pulsars are correctly identified. Precision is the proportion of actual positives among the samples the classifier labels as positive. FPR is the proportion of negative samples (non-pulsars) mistakenly recognized as positives by the classifier. The higher the precision or the lower the FPR, the fewer non-pulsar signals are misclassified.
| | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive | TP | FN |
| Actual negative | FP | TN |
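Given the confusion-matrix entries, the three metrics can be computed as in this short sketch (positive class = pulsar, encoded as label 1; the function name is ours):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Recall, precision, and FPR from the binary confusion matrix."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    recall = tp / (tp + fn)     # fraction of real pulsars recovered
    precision = tp / (tp + fp)  # fraction of flagged candidates that are pulsars
    fpr = fp / (fp + tn)        # fraction of non-pulsars misclassified
    return recall, precision, fpr
```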
6 RESULTS AND DISCUSSIONS
6.1 The harmful test for the pulsar candidate classification tasks
In our study, there are two pulsar candidate classification tasks, using the survey data sets HTRU 2 and LOTAAS 1, respectively. The class imbalance ratio of HTRU 2 is 9.92 and that of LOTAAS 1 is 75.56, so both are highly imbalanced. Before addressing the classification tasks, a harm test is conducted: the evaluation criterion Q, a scatter matrix-based class separability measure, is applied to estimate the harmfulness of the class imbalance.
In our experiment, we set the threshold TQ = 0.1 (Yu 2017) and adopt five-fold stratified cross-validation, i.e. 80 per cent of each data set is used as training samples. The value of Q is averaged over 10 independent runs. The results are QHTRU2 = 0.040413 and QLOTAAS1 = 0.002981, both less than TQ. As described in Section 3, this means the class imbalance is harmful for both the HTRU 2 and LOTAAS 1 classification tasks, so directly applying traditional machine learning methods will not necessarily achieve good classification results. Indeed, the accuracy and precision of gh-vfdt are higher, and its FPR lower, than those of four traditional classifiers (C4.5, MLP, NB, and SVM) on HTRU 2 (Table 5; Lyon et al. 2016), which is consistent with the harm test. It is therefore worthwhile to pre-estimate the harmfulness of imbalance in pulsar survey data sets before processing the classification tasks.
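The exact form of Q follows Yu (2017) and is not reproduced here. As a hedged illustration, the sketch below computes a related and commonly used scatter-matrix separability score, the trace ratio tr(S_b)/tr(S_w) of between-class to within-class scatter, which likewise grows as the classes become more separable; the function name is ours.

```python
import numpy as np

def trace_ratio_separability(X, y):
    """tr(S_b) / tr(S_w): between-class over within-class scatter.
    Larger values indicate better-separated classes. This is one
    common scatter-matrix measure, not necessarily the exact Q of
    Yu (2017)."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    sw = np.zeros((d, d))  # within-class scatter matrix
    sb = np.zeros((d, d))  # between-class scatter matrix
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        dev = Xc - mc
        sw += dev.T @ dev
        diff = (mc - mean_all).reshape(-1, 1)
        sb += len(Xc) * (diff @ diff.T)
    return np.trace(sb) / np.trace(sw)
```

Two well-separated Gaussian clusters score much higher than two heavily overlapping ones, which is the behaviour a harm-test criterion of this family relies on.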
The performance of different ML classifiers. LC1 and LC2 are two ensemble classifiers used on LOTAAS 1 and LOTAAS 2, respectively (Tan et al. 2018). Bold type indicates the best performance on HTRU 2 and LOTAAS 1. aIndicates the best performance compared between PNCN and LC2.

| Data set | Classifier | Accuracy | Recall | Precision | G-mean | F-score | FPR |
|---|---|---|---|---|---|---|---|
| HTRU 2 | C4.5 | 0.946 | 0.904 | 0.635 | 0.926 | 0.740 | 0.051 |
| HTRU 2 | MLP | 0.947 | 0.913 | 0.650 | 0.931 | 0.752 | 0.050 |
| HTRU 2 | NB | 0.937 | 0.863 | 0.579 | 0.902 | 0.692 | 0.057 |
| HTRU 2 | SVM | 0.871 | 0.901 | 0.723 | 0.919 | 0.789 | 0.031 |
| HTRU 2 | gh-vfdt | 0.978 | 0.829 | 0.899 | 0.907 | 0.862 | 0.008 |
| HTRU 2 | KNN | 0.978 | 0.825 | 0.930 | 0.906 | 0.875 | 0.006 |
| HTRU 2 | PNCN | 0.978 | 0.831 | 0.923 | 0.908 | 0.874 | 0.007 |
| LOTAAS 1 | C4.5 | 0.990 | 0.948 | 0.494 | 0.969 | 0.623 | 0.009 |
| LOTAAS 1 | MLP | 0.997 | 0.979 | 0.753 | 0.988 | 0.846 | 0.002 |
| LOTAAS 1 | NB | 0.996 | 0.959 | 0.673 | 0.977 | 0.782 | 0.004 |
| LOTAAS 1 | SVM | 0.999 | 0.901 | 0.966 | 0.949 | 0.932 | 0.001 |
| LOTAAS 1 | gh-vfdt | 0.998 | 0.789 | 0.875 | 0.888 | 0.830 | 0.001 |
| LOTAAS 1 | KNN | 0.999 | 0.947 | 0.978 | 0.973 | 0.961 | 0.0003 |
| LOTAAS 1 | PNCN | 0.999a | 0.956 | 0.974 | 0.977 | 0.964 | 0.0003a |
| LOTAAS 1 | LC1 | 0.968 | 0.962 | 0.961 | 0.967 | 0.961 | 0.028 |
| LOTAAS 2 | LC2 | 0.992 | 0.987a | 0.993a | 0.991a | 0.990a | 0.005 |
6.2 Some evaluations of the PNCN algorithm
In this study, a non-parametric machine learning algorithm, PNCN, is proposed to solve the pulsar candidate classification tasks. The PNCN algorithm is an improvement of KNN that can effectively handle class-imbalanced classification problems.
In this experiment, we adopt five-fold stratified cross-validation. Six performance metrics, accuracy, recall, precision, G-Mean, F-Score, and FPR, are calculated by averaging the results of 10 independent runs. The experimental results are compared with the KNN classifier and six other classifiers. These experiments are conducted on HTRU 2, LOTAAS 1, and LOTAAS 2 (Lyon et al. 2016; Tan et al. 2018), and the evaluation results are presented in Table 5.
The HTRU 2 data set is a standard telescope data set widely used to evaluate the performance of machine learning algorithms. Compared with the classification algorithms applied to HTRU 2 in Lyon et al. (2016), the performance recorded in Table 5 shows that the PNCN classifier outperforms all five classifiers (C4.5, MLP, NB, SVM, and gh-vfdt) in four evaluation metrics: accuracy, precision, F-Score, and FPR.
Next, we compare the performance of the classification algorithms on HTRU 2 and LOTAAS 1. On the one hand, the recall of the PNCN is higher than that of the KNN on both data sets, which means the PNCN classifier is superior to the traditional KNN classifier at identifying pulsars. On the other hand, for this binary classification problem our proposed classifier outperforms all seven compared classifiers on most of the remaining evaluation metrics. Specifically, the proposed classifier identifies pulsars with high accuracy: 97.8 per cent on HTRU 2 and 99.9 per cent on LOTAAS 1. The precision and the recall on HTRU 2 are 92.3 per cent and 83.1 per cent, and those on LOTAAS 1 are 97.4 per cent and 95.6 per cent, respectively. The FPR on HTRU 2 is 0.7 per cent; the FPR on LOTAAS 1 is 0.03 per cent, an order of magnitude lower than previously reported values. Although the proposed classifier exhibits slightly lower recall and precision than LC2 (Tan et al. 2018) on LOTAAS 1, its FPR is an order of magnitude lower than that of LC2. Finally, comparing the PNCN results on HTRU 2 and LOTAAS 1, the performance on HTRU 2 is noticeably weaker. The main reason is that the distribution of data collected by different telescopes is very different, owing, for example, to differences in observing frequency and angle. Therefore, although the same feature formulas are used, the range of each feature varies with the data set, which leads to different degrees of overlap of the same features in different data sets. As mentioned in Section 3, the degree of class overlap affects the classification performance of machine learning methods.
Furthermore, one significant advantage of the PNCN algorithm is its robustness to k. On the one hand, the results in Figs 4 and 5 show the effect of k on classification performance: as k varies, there is no significant change in any of the performance metrics, and Figs 6 and 7 show that the FPR flattens for larger k. All of this suggests that the PNCN algorithm is robust to k. Because the data sets are imbalanced, researchers pay more attention to the classification accuracy on pulsars, so recall, precision, and FPR are generally used to assess the algorithm; considering the changes in these three metrics, the best choice of k in PNCN is 7 for HTRU 2 and 4 for LOTAAS 1. On the other hand, comparing PNCN and KNN, both classifiers achieve high accuracy and precision and low FPR. Although the precision of the PNCN is slightly lower than that of the KNN on both data sets, the PNCN retains its robustness to k. This is important because robustness to k means the performance of the PNCN algorithm is more stable as the data change. This stability suggests that the PNCN classification algorithm is less susceptible to noise and RFI than the empirical KNN classification algorithm, and it makes the method promising for application to data streams. Also, as discussed in Section 4.1, the computational costs of the two algorithms are similar, since the number of categories M is 2, which is not large.

Performance evaluations on HTRU 2 using accuracy, recall, precision, G-Mean, and F-Score.

Performance evaluations on LOTAAS 1 using accuracy, recall, precision, G-Mean, and F-Score.


Overall, the PNCN algorithm can effectively solve the pulsar candidate classification task.
7 CONCLUSIONS
In this paper, an evaluation criterion using a scatter matrix-based class separability measure is applied to estimate the harmfulness of class imbalance in pulsar survey data sets. The results of this harm test give a quantitative measure of the damage caused by imbalance in pulsar survey data sets, and they provide prior information that guides the selection of appropriate data processing approaches and the construction of effective classifiers. This is the first time such a pre-evaluation strategy has been used for pulsar candidate classification tasks.
We proposed a PNCN classifier for pulsar candidate selection. The PNCN classifier is a non-parametric machine learning method used here for the first time for the pulsar candidate selection problem. For data features, we used the statistical features designed by Lyon et al. (2016). The results show that the PNCN algorithm can effectively solve the class imbalance problem in pulsar candidate classification tasks, and our work demonstrates once again that pulsar candidate selection based on statistical features is effective.
Although the FPR of the PNCN classifier on LOTAAS 1 is an order of magnitude lower than the corresponding FPRs obtained in Lyon et al. (2016) and Tan et al. (2018), its recall and precision are not superior to those of the LC2 classifier. Whether combining the classifier with appropriate data resampling methods can improve recall and precision is a question we plan to explore in future work. We also plan to apply this non-parametric machine learning method to other pulsar survey data sets.
ACKNOWLEDGEMENTS
The authors are grateful for the support from the National Natural Science Foundation of China (grant no. 11973022), the Joint Research Fund in Astronomy (U1531242) under a cooperative agreement between the National Natural Science Foundation of China (NSFC) and the Chinese Academy of Sciences (CAS), the Major Projects of the Joint Fund of Guangdong and the National Natural Science Foundation (U1811464), the Natural Science Foundation of Guangdong Province (2014A030313425, S2011010003348), and the China Scholarship Council (201706755006).