Abstract

Because of its good performance, the convolutional neural network (CNN) has been extensively used in many fields, such as image, speech, and text. However, it is easily affected by hyperparameters, and how to effectively configure hyperparameters within a reasonable time to improve the performance of CNNs has always been a complex problem. To solve this problem, this paper proposes a method to automatically optimize CNN hyperparameters based on the local autonomous competitive harmony search (LACHS) algorithm. To avoid the influence of complicated parameter tuning on the performance of the LACHS algorithm, a dynamic parameter adjustment strategy is adopted, which makes the pitch adjustment probability PAR and step factor BW adjust dynamically according to the actual situation. To strengthen the fine search of the neighborhood space and reduce the possibility of falling into local optima for a long time, an autonomous decision-making search strategy based on the optimal state is designed. To help the algorithm jump out of local fitting, this paper proposes a local competition mechanism that makes the newly generated harmony compete with the worst harmony in a locally selected sample. In addition, an evaluation function that integrates the number of training epochs and the recognition accuracy is proposed; it makes the number of training epochs for each model depend on the learning rate and batch size, thereby saving computational cost without affecting the search results. To prove the feasibility of the LACHS algorithm in configuring CNN hyperparameters, it is tested on the classification of the Fashion-MNIST and CIFAR-10 datasets and compared with CNNs configured by experience and CNNs whose hyperparameters are automatically optimized by classical algorithms. The results show that the performance of the CNN based on the LACHS algorithm is effectively improved, so the algorithm has certain advantages in hyperparameter optimization. In addition, this paper applies the LACHS algorithm to expression recognition. Experiments show that the CNN optimized by the LACHS algorithm performs better than artificially designed CNNs of the same type. Therefore, the method proposed in this paper is feasible in practical applications.

Highlights
  • A parameter dynamic adjustment strategy is studied to improve the algorithm's search speed.

  • An autonomous decision-making search strategy based on the optimal state is designed.

  • A local competition mechanism is proposed to help the algorithm jump out of local fitting.

  • An evaluation function is proposed to reduce the computational cost without affecting the search results.

1. Introduction

Convolutional neural networks (CNNs), as representatives of machine learning, are widely used in various fields because of the advantage of their convolution kernels in extracting local features from input data, especially input images (Khan et al., 2020). Looking back at the development of CNNs, LeCun et al. first proposed the concept of the CNN and built the LeNet-5 model to apply it to image processing (Khan et al., 2020). However, due to the limitations of historical conditions at that time, it did not attract much attention. With the development of science and technology, Krizhevsky et al. (2012) proposed the AlexNet model, which made a significant breakthrough in image processing and triggered an upsurge in studying the structure of CNNs. The subsequent network models, such as VGGNet (Simonyan & Zisserman, 2014), GoogLeNet (Szegedy et al., 2014), ResNet (He et al., 2016), and DenseNets (Huang et al., 2016), all improve on the network structure. With the maturity of the CNN structure and its good performance, CNNs not only perform well in image recognition (Yan et al., 2015) but are also widely used in speech recognition (Yu et al., 2017), text recognition (Wang et al., 2016), self-driving (Chen et al., 2021), target recognition (Tan & Le, 2019), and other fields. Therefore, the optimization of CNNs is of great research value.

The traditional direction for improving CNN performance is to improve the CNN in terms of the network structure (He et al., 2016; Huang et al., 2016; Khan et al., 2020; Krizhevsky et al., 2012; Simonyan & Zisserman, 2014; Szegedy et al., 2014), parameter initialization (Zhang et al., 2018), loss function (Zhang et al., 2018), and optimization algorithm (Zhang et al., 2018). For example, VGGNet (Simonyan & Zisserman, 2014), GoogLeNet (Szegedy et al., 2014), ResNet (He et al., 2016), and DenseNets (Huang et al., 2016) proposed a series of different CNN network structures, and a series of loss functions (Zhang et al., 2018) have been designed for neural networks, such as the zero-one loss function, the logarithmic loss function, and the squared loss (mean-square error, MSE). However, the development of CNN network structures is already very mature, so it is not easy to improve CNN performance by optimizing the network structure. Furthermore, the many existing loss functions already meet the needs of neural networks in different situations. With the development of the CNN network structure and loss function, the problem of parameter initialization of CNNs deserves more and more attention. This is not only because of the sensitivity of CNNs to hyperparameters (e.g., the size of the convolution kernel affects how well a CNN extracts image features) but also because CNN structures tend to become wider and deeper and the variety of loss functions makes parameter initialization more complicated. The parameters to be initialized are called hyperparameters (Larochelle et al., 2007), and the parameter initialization problem is thus a hyperparameter optimization problem. Most efficient CNN models are tuned manually, which wastes a lot of time and computational cost and makes it difficult to meet the needs of increasingly complex CNNs. Therefore, how to quickly design a combination of CNN hyperparameters suitable for a specific problem is still a challenging problem.

With the progress of science and technology, it has become feasible to optimize hyperparameters automatically (Feurer & Hutter, 2019). The so-called automatic optimization of hyperparameters treats finding the best combination of CNN hyperparameters as an optimization problem and then uses intelligent algorithms to solve it. Good results have been achieved in this respect (Bergstra & Bengio, 2012; Kandasamy et al., 2018; Zoph & Le, 2016), for example by the random search algorithm (Bergstra & Bengio, 2012), the grid search algorithm, reinforcement learning (Zoph & Le, 2016), Bayesian optimization (Kandasamy et al., 2018), and evolutionary computation (EC)-based methods. At present, the commonly used methods for automatic hyperparameter optimization have their own defects. For example, the grid search algorithm makes full use of parallel computing by searching the values of each hyperparameter over a specific range, which makes the optimization very fast. However, the parallel setting means that if one task fails, other tasks fail accordingly, and the computational complexity increases as the number of hyperparameters to be optimized increases. Therefore, grid search is not suitable when a large number of hyperparameters need to be optimized. The random search algorithm (Bergstra et al., 2011) is faster than grid search because it randomly samples the search range, but due to its randomness, the accuracy of the results cannot be guaranteed, so random search is unsuitable for hyperparameter optimization with high precision requirements. EC-based approaches (Zhan et al., 2022a) imitate the process by which a population learns to adapt to the environment and improve the species, so they have natural advantages in solving large-scale optimization problems. EC-based methods have made good progress in solving the CNN hyperparameter optimization problem (Aszemi & Dominic, 2019; Li et al., 2023a; Wang et al., 2022a). For example, Real et al. (2017) proposed a large-scale neuroevolution algorithm that finds the best CNN model by optimizing the network structure. Fernandes and Yen (2021) proposed a multi-objective evolutionary strategy algorithm to optimize the structure of deep CNNs. However, these EC-based methods still converge slowly and are prone to falling into local optima when facing an enormous search space (Jian et al., 2021; Li et al., 2023b; Wang et al., 2020, 2022b). As a representative EC algorithm for hyperparameter optimization, the genetic algorithm (GA) has been applied to CNN hyperparameter optimization many times. For example, Aszemi and Dominic (2019) proposed using the GA to optimize CNN hyperparameters. Taking advantage of the unique strengths of the network blocks in ResNet and DenseNet for feature extraction, Raymond and Beng proposed a block-enhanced GA (Raymond & Beng, 2007) to build CNN architectures and improve network performance automatically. Furthermore, Yang et al. proposed using a multi-objective GA to obtain more precise and smaller CNNs (Karpathy, 2016). Although the GA has achieved good results in hyperparameter optimization, it cannot avoid slow search speed and high time cost because it cannot use feedback information from the network in a timely manner.
In addition, the optimization of GAs depends to a certain extent on the initialization of the population, which cannot guarantee the effectiveness of each run. Besides the GA, other powerful EC algorithms can also be used, such as particle swarm optimization (PSO), which has been applied to CNN hyperparameter optimization many times. For example, Guo et al. (2020) proposed a distributed PSO method to improve the efficiency of CNN hyperparameter optimization. A two-stage variable-length PSO method (Huang et al., 2022) has been used to search the microstructure and macrostructure of neural networks. To address hyperparameter optimization with high computing cost, Wang et al. (2022a) proposed a PSO method based on a lightweight scale-adaptive fitness evaluation (SAFE). In addition, the differential evolution algorithm (Awad et al., 2020), the estimation of distribution algorithm (Li et al., 2023a), and their combinations have also been applied to CNN hyperparameter optimization many times. However, just like GAs, they all face the dilemma of local fitting and slow convergence (Jian et al., 2020). Moreover, expensive optimization problems (Zhan et al., 2022b) are inevitable when combining deep learning with EC algorithms. Previous researchers have rich experience in solving expensive optimization problems (Li et al., 2022; Lu et al., 2020; Suganuma et al., 2020; Sun et al., 2019; Wang et al., 2021). For example, Wu et al. (2021) proposed a novel SAFE method to address expensive optimization problems, and Li et al. (2020) proposed solving expensive optimization problems by building a surrogate model.

The difficulties faced by predecessors in optimizing hyperparameters can be summarized in two points. First, the algorithm easily falls into local fitting while optimizing CNN hyperparameters. Because the search space corresponding to the CNN hyperparameter configuration can be vast, it is difficult for the algorithm to search the whole space thoroughly; in this case, the algorithm is prone to slow convergence and falling into local optima. Second, it is difficult to determine the evaluation indicator used to appraise CNN performance. Previous work usually evaluates a CNN by its accuracy on the test set after a certain number of training epochs, so the number of training epochs has a great influence on the measured performance. If the CNN model is trained too few times, the performance index used to evaluate the network model will not be representative, and the accuracy of the optimization decreases. If the CNN model is trained too many times, the computational cost of evaluating CNN performance increases, leading to an expensive optimization problem. Therefore, this paper proposes a local autonomous competitive harmony search (LACHS) algorithm to solve these two problems in the process of CNN hyperparameter optimization.

The main contributions of the proposed LACHS algorithm are as follows:

  • From the perspective of algorithm parameter tuning, a dynamic adjustment strategy is adopted to dynamically adjust the key parameters of the HS algorithm, the pitch adjustment probability PAR and the step factor BW, with the number of iterations. This strategy improves the adaptability of the algorithm to different optimization problems and avoids complex parameter adjustment.

  • From the perspective of CNN hyperparameter optimization, an autonomous decision-making search strategy based on the optimal state is designed. This strategy selects the search strategy autonomously according to the update of the optimal harmony, which enhances the search precision of the algorithm in different regions and improves its ability to jump out of local fitting. In addition, a local competition mechanism is designed to make the newly generated harmony compete with the worst harmony in a locally selected sample. This mechanism further helps the algorithm escape local fitting while avoiding slow convergence.

  • From the perspective of solving expensive optimization problems, this paper designs an evaluation function that fuses the number of training epochs and the recognition accuracy. This strategy makes the number of training epochs of each model change with the learning rate and batch size, which avoids the expensive optimization problem without affecting the search results.

  • In the experiments, two classic image classification datasets are used: the Fashion-MNIST dataset (Xiao et al., 2017) and the CIFAR-10 dataset (Doon et al., 2018). The method proposed in this paper is compared with CNNs configured by experience and CNNs automatically configured by classical intelligent algorithms. The results show that the proposed method achieves highly competitive performance at low computational cost. In addition, the proposed method is applied to expression recognition, and the experiments prove that it is feasible in practical applications.

The rest of the paper is organized as follows: the basic principles of the CNN and the HS algorithm are briefly introduced in Section 2; the LACHS algorithm is introduced in Section 3; Section 4 presents the experimental studies that verify the effectiveness of the LACHS algorithm; and Section 5 summarizes the work of this paper and points out future research directions.

2. Foundation Knowledge

This section introduces the related basic knowledge, including the CNN and the harmony search (HS) algorithm, in detail.

2.1. Convolutional neural network

CNNs are a kind of feed-forward neural network with convolution computation and a deep structure, mainly composed of an input layer, convolution layers (CONV), pooling layers (POOL), fully connected layers (FC), and an output layer. When input data enter a simple CNN, the main flow is as follows: first, the input layer reads the input data and keeps their original structure; then the convolution layer extracts local features; next, negative values are set to 0 by the rectified linear unit (ReLU) layer; the pooling layer then reduces the feature maps produced by the convolution layer to prevent over-fitting; finally, the fully connected layer maps the learned "distributed feature representation" to the sample label space to realize classification. The main principle and structure of a CNN are shown in Fig. 1.

Figure 1: The main principle and structure diagram of CNN.

However, not all CNNs have the same structure as that in Fig. 1; for example, the number of convolution layers, the size of the convolution kernel in each layer, and the pooling method of each pooling layer can all be changed. Therefore, many kinds of CNNs have been derived, such as LeNet (Khan et al., 2020), AlexNet (Krizhevsky et al., 2012), VGGNet (Simonyan & Zisserman, 2014), GoogLeNet (Szegedy et al., 2014), ResNet (He et al., 2016), and DenseNets (Huang et al., 2016). VGGNet has excellent classification performance as a base network, and predecessors have rich experience in studying it, so VGGNet is chosen as the basic network in this paper. Because there are many kinds of hyperparameters, choosing a suitable set of hyperparameters takes a lot of manpower and time; the LACHS algorithm proposed in this paper is used to solve this problem.
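To make the layer flow described above concrete, the following is a minimal Keras sketch of a small VGG-style network; the layer sizes and the use of the Keras API are illustrative assumptions and do not correspond to the configuration searched by LACHS.

from tensorflow.keras import layers, models

def build_small_vgg(input_shape=(32, 32, 3), num_classes=10):
    # Minimal VGG-style CNN: two convolution blocks followed by fully connected layers.
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Block 1: two convolutions, then pooling
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        # Block 2
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        # Classifier head
        layers.Flatten(),
        layers.Dense(100, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model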

2.2. HS algorithm

Different from other algorithms, the HS algorithm is a meta-heuristic search algorithm that simulates the principle by which a band achieves harmony in music performance, which gives it strong parallel and global search capabilities (Geem et al., 2001). It therefore has natural advantages for hyperparameter optimization, which involves a large number of parameters and requires both fast optimization and high precision; this is one of the reasons why the HS algorithm is chosen.

The basic working idea of the HS algorithm is as follows: first, HMS initial solutions are generated and stored in the harmony memory. Then, each component of a new solution is taken from the harmony memory with probability HMCR and generated outside the memory with probability 1 − HMCR. When a component is taken from the memory, it is fine-tuned with probability PAR according to the step factor BW; otherwise, no fine-tuning is performed. If the new solution is better than the worst solution in the memory, the worst solution is replaced by the new one. This process repeats until the termination condition is met.

According to the literature (Geem et al., 2001), it is concluded that the flow of the HS algorithm can be divided into the following six steps:

Step 1: Initialize the related variables of the algorithm. The parameters include harmony memory size HMS, memory value probability HMCR, pitch adjustment probability PAR, step factor BW, and maximum creation times Tmax.

Step 2: In the solution space determined by the algorithm, there are n musical instruments when there are n variables. Let the upper limit of musical instrument x(j) be U(j), let the lower limit of the instrument x(j) be L(j), and [L(j), U(j)] is the playable area of musical instrument x(j). The combination of the playable areas of all instruments is the solution space of the algorithm.

Step 3: Initialize the harmony memory. The harmony memory HM consists of HMS harmonies, and Xi = {xi(1), xi(2), …, xi(D)} represents the ith harmony, which is obtained by the following formula:

xi(j) = L(j) + rand(0,1) × (U(j) − L(j)), j = 1, 2, …, D    (1)

Rand (0,1) is a random number from 0 to 1. Therefore, HMS initial solutions are obtained and stored in the matrix (Fig. 2):

HM = [X1; X2; …; XHMS], where row i stores the harmony Xi = (xi(1), xi(2), …, xi(D))    (2)

Step 4: Generate a new harmony.

Figure 2: Initialization of the harmony memory library.

According to the three rules of memory consideration with probability HMCR, pitch adjustment, and random selection, a new harmony vector xnew is generated: first, a random number r between 0 and 1 is generated; if r is less than HMCR, the decision variable xnew(j) is selected from the harmony memory and then fine-tuned with probability PAR; otherwise, xnew(j) is generated by random selection from the solution space (according to formula (1)). The fine-tuning method is as follows:

(3)

Step 5: Update the harmony memory. If the new harmony is better than the worst solution in the harmony memory, the worst solution is replaced; otherwise, the memory is not updated.

Step 6: Judge whether to terminate: check whether the current number of creations has reached the maximum; if not, repeat steps 4–5 until the maximum number of creations is reached.

Finally, the pseudo-code of standard HS algorithm is summarized as shown in Table 1.

Table 1:

Pseudo-code of standard HS algorithm.

Algorithm 1: Harmony search algorithm pseudo-code
1: Define the fitness value function fitness(t) = f(x), x = (x1, x2, x3, …, xn).
2: Define the generation range of harmony and the generating function of harmony.
3: Set the algorithm parameters: harmony library size (HMS), harmony library value probability (HMCR), pitch adjustment probability (PAR), step size factor (BW), and maximum creation times (MAXGEN).
4: Initialize the harmony library.
5: Evaluate the fitness value of the harmony library.
6: Take the harmony best with the largest fitness value in the harmony library and set t = 0.
7: while t < MAXGEN − 1 do
8:   if random.random() < HMCR then
9:     if random.random() < PAR then
10:      Take a random harmony inv from the harmony library.
11:      aa = np.random.randint(0, self.len, size=BW)
12:      for i = 0 → BW − 1 do
13:        Fine-tune the variables corresponding to the harmony to obtain a new harmony invx.
14:      end for
15:    end if
16:  else
17:    Generate a new harmony invx.
18:  end if
19:  Compare the new harmony invx with the worst harmony invy in the harmony library, and replace invy if invx is better.
20:  Update the harmony best with the maximum fitness value in the harmony library.
21:  t = t + 1
22: end while
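For readers who prefer executable code, the following is a minimal Python sketch of the standard HS loop on a toy continuous objective; the toy objective, bounds, and parameter values are illustrative and are not taken from this paper.

import random

def harmony_search(f, lower, upper, hms=10, hmcr=0.8, par=0.3, bw=0.1, max_gen=200):
    # Minimal standard HS that maximizes f over the box bounds [lower, upper].
    dim = len(lower)
    # Step 3: initialize the harmony memory with random solutions.
    hm = [[random.uniform(lower[j], upper[j]) for j in range(dim)] for _ in range(hms)]
    fits = [f(x) for x in hm]
    for _ in range(max_gen):
        # Step 4: improvise a new harmony component by component.
        new = []
        for j in range(dim):
            if random.random() < hmcr:
                val = random.choice(hm)[j]                     # memory consideration
                if random.random() < par:                      # pitch adjustment
                    val += random.uniform(-1, 1) * bw
                    val = min(max(val, lower[j]), upper[j])
            else:
                val = random.uniform(lower[j], upper[j])       # random selection
            new.append(val)
        # Step 5: replace the worst harmony if the new one is better.
        worst = min(range(hms), key=lambda i: fits[i])
        new_fit = f(new)
        if new_fit > fits[worst]:
            hm[worst], fits[worst] = new, new_fit
    best = max(range(hms), key=lambda i: fits[i])
    return hm[best], fits[best]

# Usage: maximize a simple concave toy function on [-5, 5] x [-5, 5].
best_x, best_f = harmony_search(lambda x: -(x[0] - 1) ** 2 - (x[1] + 2) ** 2,
                                lower=[-5, -5], upper=[5, 5])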

In addition, the development of the HS algorithm is quite mature, with variants such as IHS (Mahdavi et al., 2007), GHS (Omran & Mahdavi, 2008), and the self-adaptive SGHS algorithm (Pan et al., 2010). GSHS (Castelli et al., 2014) improved the selection strategy by introducing a geometric selection scheme, and MHSA-EXTR Archive (Turky et al., 2014) adjusted the algorithm structure. It can be seen that predecessors have provided rich reference experience in improving the HS algorithm, which makes the HS algorithm a good choice for solving the hyperparameter optimization problem.

3. LACHS Algorithm

This section introduces the local autonomous competitive HS (LACHS) algorithm in detail, mainly covering the strategies involved in improving the algorithm. In addition, it introduces the improved evaluation method for hyperparameter optimization and the process, pseudo-code, and steps of the LACHS algorithm in detail.

3.1. Dynamic adjustment strategy for parameters

The parameter setting of an algorithm is an important factor affecting its performance, and the appropriate setting depends on the actual optimization problem. Although the parameters can be tuned for each specific problem, doing so costs a great deal of time and computation, and once the parameters are initialized they do not change during the iterative process, which makes it difficult to meet the needs of the search process and easily makes the search blind and inefficient. Inspired by the IHS algorithm (Mahdavi et al., 2007), a dynamic adjustment strategy is adopted for the key parameters that affect algorithm performance, namely the pitch adjustment probability PAR and the step factor BW. Their values are adjusted dynamically with the number of iterations so that the algorithm can quickly adapt to the current optimization problem.

In this strategy, the pitch adjustment probability PAR is shown in formula (4):

PAR(t) = PARmin + (PARmax − PARmin) × t / NI    (4)

PAR(t) is the pitch adjustment probability of the tth generation, PARmin is the minimum adjustment probability, PARmax is the maximum adjustment probability, t is the current iteration number, and NI is the total number of iterations.

In addition, the step factor BW is updated as shown in formula (5):

(5)

BW(t) is the amplitude modulation (step factor) of the tth generation, BWmin is the minimum amplitude modulation, BWmax is the maximum amplitude modulation, t is the current iteration number, and NI is the total number of iterations.
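A minimal sketch of the dynamic adjustment, assuming the linear IHS-style update of formula (4) for PAR and, purely for illustration, a linear decrease for the integer-valued BW (the exact form of formula (5) in the paper may differ):

def dynamic_par(t, ni, par_min=0.1, par_max=1.0):
    # Pitch adjustment probability grows linearly with the iteration count t (formula (4)).
    return par_min + (par_max - par_min) * t / ni

def dynamic_bw(t, ni, bw_min=1, bw_max=19):
    # Assumed linear decrease of the integer step factor BW; the paper's formula (5) may differ.
    return max(bw_min, round(bw_max - (bw_max - bw_min) * t / ni))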

3.2. Self-decision search strategy based on the optimal state

Because the search space corresponding to the hyperparameter optimization problem is huge, and considering the time and computational cost, the random initialization strategy cannot cover the entire search space evenly. To make the algorithm fully explore the entire search space, an autonomous decision-making search strategy based on the optimal state is proposed.

The self-decision search strategy based on the optimal state is inspired by the GHS algorithm (Omran & Mahdavi, 2008). Its authors found that, in the ideal case, fine-tuning the globally optimal harmony often yields a better harmony. However, using only the globally optimal harmony for fine-tuning easily leads to local optima, and fine-tuning a non-optimal harmony sometimes helps the algorithm escape a local optimum. Moreover, the update state of the optimal harmony is closely related to the optimization efficiency of the algorithm. Therefore, this strategy selects among several sampling strategies according to the update state of the optimal harmony. The detailed process of the self-decision search strategy based on the update state is as follows:

If the optimal harmony has not been updated for a short time (t1 iterations), a sample is randomly drawn from the current population and the harmony with the best solution in the sample is fine-tuned. If the optimal harmony has not been updated for a long time (t2 iterations), a sample is randomly drawn from the current population and the harmony with the worst solution in the sample is fine-tuned. In other cases, the harmony with the global optimal solution is fine-tuned. The detailed pseudo-code of the self-decision search strategy based on the optimal state is shown in Table 2.

Table 2:

Pseudo-code of self-determination search strategy based on optimal state.

Algorithm 2: Pseudo-code of the self-decision search strategy based on the optimal state
1: Input the iteration thresholds t1 and t2, and the maximum number of iterations tmax.
2: Define the variables x = (x1, x2, x3, …, xn) and their ranges.
3: Define the fitness value function fitness(t) = f(x).
4: Input the initialized variables, calculate the corresponding fitness values, and set t = 0, q1 = 0, and q2 = 0.
5: while t < tmax − 1 do
6:   if f(x)max is not updated then
7:     q1 = q1 + 1, q2 = q2 + 1, t = t + 1
8:     if q1 == t1 then
9:       Take the variable xi with the local optimal solution for fine-tuning, and set q1 = 0.
10:    end if
11:    if q2 == t2 then
12:      Take the variable xi with the local worst solution for fine-tuning, and set q2 = 0.
13:    end if
14:    if q1 != t1 and q2 != t2 then
15:      Take the globally optimal variable xi for fine-tuning.
16:    end if
17:  else
18:    Take the globally optimal variable xi for fine-tuning, and set q1 = 0, q2 = 0, t = t + 1.
19:  end if
20: end while

This strategy effectively prevents the algorithm from staying in a local optimum for a long time and makes it explore the potential areas of the whole search space as much as possible within a limited time, which improves the optimization speed and ability of the final algorithm.
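A minimal sketch of the decision logic, using the counters q1 and q2 from Table 2 that track how long the best harmony has gone without improvement; the sample size is an illustrative assumption and the fine-tuning itself is omitted.

import random

def choose_base_harmony(hm, fits, q1, q2, t1, t2, sample_size=3):
    # Pick the index of the harmony to fine-tune according to the optimal-state counters.
    if q1 == t1:
        # Best harmony stagnant for a short while: fine-tune the best harmony of a random sample.
        sample = random.sample(range(len(hm)), sample_size)
        idx, q1 = max(sample, key=lambda i: fits[i]), 0
    elif q2 == t2:
        # Stagnant for a long while: fine-tune the worst harmony of a random sample.
        sample = random.sample(range(len(hm)), sample_size)
        idx, q2 = min(sample, key=lambda i: fits[i]), 0
    else:
        # Otherwise fine-tune the globally best harmony.
        idx = max(range(len(hm)), key=lambda i: fits[i])
    return idx, q1, q2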

3.3. Local competition update strategy

For the competitive update strategy, the standard HS algorithm uses global competition: the new harmony generated in each iteration is compared with the worst harmony in the whole harmony library, and if the new harmony is better, it replaces the global worst harmony. Eliminating the worst harmony of the whole population every time helps optimization efficiency to a certain extent, but it also means that once the algorithm falls into local fitting, it is difficult to escape. Moreover, fine-tuning the globally worst harmony may also produce the globally best harmony, which helps the algorithm jump out of local fitting. Therefore, a local competitive selection mechanism is established in this paper: in each iteration, a sample is randomly drawn from the harmony memory, the new harmony is compared with the worst harmony in the sample, and that harmony is replaced if the new one is better. This strategy keeps the possibility of generating the global optimal harmony by fine-tuning the global worst harmony and helps the algorithm jump out of local fitting to some extent.

In this strategy, the evaluation standard of competition is shown in formula (6):

Xlw = Xnew and f(Xlw) = f(Xnew), if f(Xnew) > f(Xlw); otherwise Xlw remains unchanged    (6)

Here, f(Xnew) is the fitness value of the new harmony, Xlw is the worst harmony in the locally selected sample, and f(Xlw) is its fitness value; if the new harmony wins the competition, Xlw and f(Xlw) are replaced by Xnew and f(Xnew). The pseudo-code of the local competitive update strategy is shown in Table 3.

Table 3:

Pseudo-code of local competition update strategy.

Algorithm 3: Pseudo-code of the local competitive update strategy
1: if f(Xnew) > f(Xlw) then
2:   f(Xlw) = f(Xnew)
3:   Xlw = Xnew
4: end if
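A minimal sketch of the local competition step under the same maximization setting as above; the sample size is an illustrative choice.

import random

def local_competition_update(hm, fits, new, new_fit, sample_size=3):
    # Compare the new harmony with the worst harmony of a random sample and replace it if better.
    sample = random.sample(range(len(hm)), sample_size)
    local_worst = min(sample, key=lambda i: fits[i])     # worst harmony within the sample only
    if new_fit > fits[local_worst]:                      # the criterion of formula (6)
        hm[local_worst] = new
        fits[local_worst] = new_fit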

3.4. Evaluation method of fusing training times and recognition accuracy

Predecessors usually take the accuracy on the test set of a network model trained for a certain number of epochs T as the performance index for evaluating the model. However, determining T is the key factor affecting the subsequent workload and accuracy. If T is too large, the subsequent computational cost increases sharply; if T is too small, the performance of the network model cannot be evaluated objectively. Moreover, once T is initialized, it does not change during the iterative process, which makes it difficult to meet the needs of the search process and makes the algorithm search blind and inefficient. Predecessors usually determine T from the number of training epochs at which the network model enters fitting. Through experiments, it is found that the number of training epochs at which the network model enters fitting is related to the learning rate and batch size, as shown in Tables 4 and 5.

Table 4:

Training times T for different learning rates of the same network model to enter fitting on the CIFAR10 dataset.

Learning rate                                   0.001   0.0015   0.002   0.0025
Training times T of network entering fitting    7       6        5       4

Table 5:

Training times T of different batch sizes in the same network model when entering the fitting on the CIFAR10 dataset.

Batch size                                      32      64       96      128
Training times T of network entering fitting    8       9        10      12

The experiments show that the higher the learning rate, the fewer training epochs the network model needs to enter fitting, and the lower the learning rate, the more training epochs it needs. Similarly, the larger the batch size, the more training epochs the network model needs to enter fitting, and the smaller the batch size, the fewer it needs. The relationship among the number of training epochs, the learning rate, and the batch size summarized from these experimental rules is shown in formula (7):

(7)

Here, T is the number of training epochs of the current network model; the remaining quantities in formula (7) are the maximum and minimum learning rates within the parameter range, the learning rate of the current network model, the maximum and minimum batch sizes within the parameter range, and the batch size of the current network model. The coefficients a, b, and c are adjusted according to the complexity of the dataset.
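Formula (7) itself is not reproduced here; the following sketch only illustrates the qualitative rule above with an assumed linear-interpolation form, in which the functional form and default coefficients are hypothetical.

def training_epochs(lr, batch, lr_min=0.001, lr_max=0.03,
                    batch_min=32, batch_max=256, a=4, b=4, c=2):
    # Hypothetical stand-in for formula (7): fewer epochs for higher learning rates,
    # more epochs for larger batch sizes. The coefficients and the exact form are illustrative.
    lr_term = a * (lr_max - lr) / (lr_max - lr_min)                  # shrinks as lr grows
    batch_term = b * (batch - batch_min) / (batch_max - batch_min)   # grows with the batch size
    return max(1, round(c + lr_term + batch_term))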

3.5. LACHS framework and pseudo-code

The overall flow chart of LACHS algorithm is shown in Fig. 3:

Figure 3: General flow chart of the LACHS algorithm.

The steps of LACHS algorithm are basically the same as those of HS algorithm, with the main difference being improvisation. The specific process of LACHS algorithm is as follows:

Step 1: Initialize the relevant variables of the algorithm and optimization problem.

Step 2: Initialize the harmony memory, and take as the fitness value the accuracy on the test set of the network model trained for the number of epochs T generated according to formula (7).

Step 3: Update the pitch adjustment probability PAR and step factor BW through the dynamic adjustment strategy for parameters, and judge whether fine adjustment is needed.

Step 4: If fine-tuning is needed, the harmony to be fine-tuned is selected by the self-decision search strategy based on the optimal state to generate a new harmony; otherwise, the new harmony is randomly generated from the solution space.

Step 5: According to the local competition update strategy, judge whether to keep the new harmony.

Step 6: Judge whether to terminate: check whether the current number of creations has reached the maximum; if not, repeat steps 3–5 until the maximum number of creations is reached.

Finally, the detailed pseudo-code of the local autonomous competitive HS algorithm is shown in Table 6.

Table 6:

Pseudo-code of local autonomous competition HS algorithm.

Algorithm 4: Pseudo-code of the local autonomous competitive HS algorithm
1: Define the fitness value function fitness(t) = f(x), x = (x1, x2, x3, …, xn).
2: Define the generation range of harmony and the generation function of harmony.
3: Set the algorithm parameters: harmony library size (HMS), harmony library value probability (HMCR), pitch adjustment probability (PAR), step factor (BW), and maximum creation times (MAXGEN).
4: Initialize the harmony library.
5: Evaluate the fitness value of the harmony library.
6: Take the harmony best with the largest fitness value in the harmony library, and set t = 0, q1 = 0, q2 = 0.
7: while t < MAXGEN − 1 do
8:   Update PAR and BW according to formulas (4) and (5).
9:   if random.random() < HMCR then
10:    if random.random() < PAR then
11:      if f(x)max is not updated then
12:        q1 = q1 + 1, q2 = q2 + 1
13:        if q1 == t1 then
14:          Take the variable xi with the local optimal solution for fine-tuning, and set q1 = 0.
15:        end if
16:        if q2 == t2 then
17:          Take the variable xi with the local worst solution for fine-tuning, and set q2 = 0.
18:        end if
19:        if q1 != t1 and q2 != t2 then
20:          Take the variable xi with the global optimal solution for fine-tuning.
21:        end if
22:      else
23:        Take the variable xi with the global optimal solution for fine-tuning, and set q1 = 0, q2 = 0.
24:      end if
25:    else
26:      Generate a new harmony invx.
27:    end if
28:  else
29:    Generate a new harmony invx.
30:  end if
31:  Compare the new harmony invx with the worst harmony invy of a locally selected sample (local competition update strategy), and replace invy if invx is better.
32:  Update the harmony best with the maximum fitness value in the harmony library.
33:  t = t + 1
34: end while

4. Experimental Results and Analysis

In this section, the effectiveness of the algorithm is studied through experiments. The experimental datasets, comparison algorithms, and parameter settings are introduced in Sections 4.1–4.3. Sections 4.4–4.7 compare and analyze the experimental results of different algorithms on different datasets to verify the effectiveness of the LACHS algorithm.

4.1. Baseline datasets and evaluation indicators

To evaluate the performance of the LACHS algorithm, the popular and widely used benchmark datasets Fashion-MNIST (Xiao et al., 2017) and CIFAR-10 (Doon et al., 2018) are used as experimental datasets.

The Fashion-MNIST dataset was created to replace the MNIST dataset. It consists of 70 000 pictures with the size of 48 × 48 × 3, divided into ten categories. In this paper, both during algorithm optimization and when training the network with the final parameter combination, the Fashion-MNIST dataset is divided into a training set, a validation set, and a test set at a ratio of 5:1:1. The Fashion-MNIST dataset is selected because it contains considerable noise, which increases the difficulty of recognition, and because it is representative of modern machine learning tasks.

The CIFAR-10 dataset is composed of ordinary daily items, and the task is to classify a set of pictures with the size of 32 × 32 × 3. It consists of 60 000 color pictures divided into 10 categories (aircraft, cars, birds, cats, deer, dogs, frogs, horses, boats, and trucks), as shown in Fig. 4, each accounting for one tenth. In this paper, both during algorithm optimization and when training the network with the final parameter combination, the CIFAR-10 dataset is divided into a training set and a test set at a ratio of 5:1. The CIFAR-10 dataset is chosen because it consists of color images, which are more difficult to recognize; because its classes differ greatly from one another, which makes recognition harder for CNNs; and because it depicts relatively irregular everyday objects from the real world, which further increases the recognition difficulty while making the dataset more representative in machine learning.
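A minimal sketch of the 5:1:1 split described above, assuming the Keras built-in loader; the concrete splitting procedure is an assumption, as the paper only specifies the ratio.

import numpy as np
from tensorflow.keras.datasets import fashion_mnist

# Load all 70 000 Fashion-MNIST images and split them 5:1:1 into train / validation / test.
(x_a, y_a), (x_b, y_b) = fashion_mnist.load_data()
x, y = np.concatenate([x_a, x_b]), np.concatenate([y_a, y_b])
idx = np.random.default_rng(0).permutation(len(x))
n = len(x) // 7                                   # one "part" of the 5:1:1 split
x_train, y_train = x[idx[:5 * n]], y[idx[:5 * n]]
x_val, y_val = x[idx[5 * n:6 * n]], y[idx[5 * n:6 * n]]
x_test, y_test = x[idx[6 * n:]], y[idx[6 * n:]]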

Figure 4: Ten different classifications of the CIFAR-10 and Fashion-MNIST datasets.

In the experiments, the accuracy of the CNN on the test set after T training epochs is taken as the fitness value of the LACHS algorithm. For the Fashion-MNIST dataset, the coefficients a, b, and c in formula (7) are set to 1, 0.5, and 1, respectively. For the CIFAR-10 dataset, the coefficients a, b, and c in formula (7) are set to 2, 1, and 2, respectively.

4.2. Compare the developed methods to the most advanced ones

To show the advantages of the improved HS algorithm, this paper compares, on the test sets, the classification accuracy of network models configured by experience with that of network models whose hyperparameters are automatically optimized by algorithms.

The network models based on experience include: ALL-CNN (Springenberg et al., 2014), Deeply-supervised (Lee et al., 2015), Network in Network (Lin et al., 2013), Maxout (Goodfellow et al., 2013), and VGGNet16. Because these hand-designed CNNs are representative in machine learning, they are suitable for studying whether the improved HS algorithm proposed in this paper can find a better network model than these classical CNNs. Considering the computational cost, except for VGGNet16, this paper directly quotes the best results reported in the original papers for comparison.

The network models built by automatically optimizing hyperparameters with intelligent algorithms fall into two types in the comparison: those optimized by other types of intelligent algorithms and those optimized by classical HS-type algorithms of the same family. The steps of building a CNN whose hyperparameters are optimized by an intelligent optimization algorithm are summarized as follows.

Step 1: Define the search space: determine the ranges of the hyperparameters to be optimized according to Tables 7 and 8.

Table 7:

Parameters of network model structure (the range of i is 1–4).

Hyperparameters                            Range
Convolution layer number                   [3, 4]
Number of layers of convolution block i    [2, 3, 4]
Convolution kernel i size                  [3, 4, 5]
Number of filters 1                        [16, 32, 64, 96]
Number of filters 2                        [48, 64, 96, 128]
Number of filters 3                        [64, 96, 128]
Number of filters 4                        [96, 128]
Activate function 1                        ["relu", "elu"]
Activate function 2                        ["relu", "elu"]
Hidden layer 1                             [60, 100, 125]
Hidden layer 2                             [60, 100, 125]

Table 8:

Optimization parameter range of network model.

Hyperparameters    Range
Learning rate      [0.001, 0.003, 0.01, 0.03]
Batch size         [32, 64, 128, 256]
Momentum           [0.9, 0.95, 0.99]

Step 2: Initialize the population: a group of candidate solutions is created, each representing a point in the search space.

Step 3: Evaluate the fitness value: the CNN determined by each candidate solution of the intelligent optimization algorithm is trained for the number of epochs given by formula (7), and the classification accuracy of the trained CNN on the test set is taken as the fitness value (see the sketch after this list).

Step 4: Update the candidate solutions: the intelligent optimization algorithm updates the solutions in the search space.

Step 5: Repeat steps 3–4 until the maximum number of iterations is reached.

Step 6: Test the model: evaluate on the test set the performance of the CNN built with the optimal configuration.
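As an illustration of Steps 1 and 3, the following sketch encodes part of the search space of Tables 7 and 8 as a Python dictionary and evaluates one candidate configuration; the model-building details and the Keras usage are illustrative assumptions rather than the authors' exact implementation.

import random
from tensorflow.keras import layers, models, optimizers

# Step 1: part of the search space from Tables 7 and 8 (block-level entries shown for i = 1 only).
SEARCH_SPACE = {
    "layers_per_block_1": [2, 3, 4],
    "kernel_size_1": [3, 4, 5],
    "filters_1": [16, 32, 64, 96],
    "activation_1": ["relu", "elu"],
    "hidden_1": [60, 100, 125],
    "learning_rate": [0.001, 0.003, 0.01, 0.03],
    "batch_size": [32, 64, 128, 256],
    "momentum": [0.9, 0.95, 0.99],
}

def sample_config(space):
    # Step 2: draw one random candidate configuration from the search space.
    return {name: random.choice(values) for name, values in space.items()}

def evaluate(config, x_train, y_train, x_test, y_test, epochs):
    # Step 3: build, train, and score one simplified VGG-style candidate (a single block).
    model = models.Sequential([layers.Input(shape=x_train.shape[1:])])
    for _ in range(config["layers_per_block_1"]):
        model.add(layers.Conv2D(config["filters_1"], config["kernel_size_1"],
                                padding="same", activation=config["activation_1"]))
    model.add(layers.MaxPooling2D(2))
    model.add(layers.Flatten())
    model.add(layers.Dense(config["hidden_1"], activation=config["activation_1"]))
    model.add(layers.Dense(10, activation="softmax"))
    model.compile(optimizer=optimizers.SGD(learning_rate=config["learning_rate"],
                                           momentum=config["momentum"]),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, batch_size=config["batch_size"], epochs=epochs, verbose=0)
    return model.evaluate(x_test, y_test, verbose=0)[1]   # test accuracy used as fitness value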

Finally, the flowchart of optimizing CNN hyperparameters based on an intelligent optimization algorithm is shown in Fig. 5.

Figure 5: Flow of network models based on intelligent algorithms for automatic optimization of hyperparameters.

The network models whose hyperparameters are optimized by other types of intelligent algorithms include: CNN based on the random search algorithm (RSCNN), CNN based on Bayesian optimization (BASCNN), CNN based on DE (DECN), and CNN based on PSO (PSOCNN). The network models whose hyperparameters are optimized by classical HS-type algorithms include: CNN based on the standard HS algorithm (Geem et al., 2001) (HSCNN), CNN based on the IHS algorithm (Mahdavi et al., 2007) (IHSCNN), and CNN based on the GHS algorithm (Omran & Mahdavi, 2008) (GHSCNN).

Because these intelligent-algorithm-based optimization methods have different characteristics, they are well suited for evaluating the optimization advantages of the improved HS algorithm proposed in this paper. To understand the benefits of the LACHS algorithm in hyperparameter optimization more intuitively, the CNNs built by the other algorithms are optimized from the same initial population and over the same parameter range, except for RSCNN and BASCNN. Because of the characteristics of random search and Bayesian optimization, no initial population is required, so these two CNNs are optimized over the same parameter range only. A seeded sampling sketch is given below.
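As an illustration of this setup, the sketch below draws one shared initial population from the candidate values of Tables 7 and 8 using a fixed random seed, so that every population-based optimizer starts from identical candidates. The dictionary layout, parameter names, and seed are assumptions made for the example; the paper does not specify its encoding.

```python
import random

# Candidate values taken from Tables 7 and 8 (convolution blocks indexed i = 1..4).
SEARCH_SPACE = {
    "conv_layer_number": [3, 4],
    **{f"block_{i}_layers": [2, 3, 4] for i in range(1, 5)},
    **{f"kernel_{i}_size": [3, 4, 5] for i in range(1, 5)},
    "filters_1": [16, 32, 64, 96],
    "filters_2": [48, 64, 96, 128],
    "filters_3": [64, 96, 128],
    "filters_4": [96, 128],
    "activation_1": ["relu", "elu"],
    "activation_2": ["relu", "elu"],
    "hidden_1": [60, 100, 125],
    "hidden_2": [60, 100, 125],
    "learning_rate": [0.001, 0.003, 0.01, 0.03],
    "batch_size": [32, 64, 128, 256],
    "momentum": [0.9, 0.95, 0.99],
}

def shared_initial_population(size=10, seed=42):
    """Sample one population once; every population-based optimizer starts from it."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in SEARCH_SPACE.items()} for _ in range(size)]

initial_population = shared_initial_population()  # HMS = 10 candidate configurations
```

RSCNN and BASCNN would simply sample or model this same search space directly instead of consuming the shared population.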

4.3. Algorithm settings

In the experiment, the proposed algorithm uses VGGNet as the basic network for the experimental study, and the ranges of the network parameters to be optimized are shown in Tables 6 and 7, with a total number of optimized parameters Leng of 20.

For the parameters of the improved HS algorithm, the harmony memory size HMS is set to 10, the maximum number of creations Tmax to 30, the harmony memory considering rate HMCR to 0.8, the minimum pitch adjustment probability PARmin to 0.1, the maximum pitch adjustment probability PARmax to 1, the minimum step factor BWmin to 1, and the maximum step factor BWmax to Leng − 1.
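The exact LACHS adjustment rule for PAR and BW is driven by the search state and is not reproduced here; purely as an illustration of how these bounds can be turned into a dynamic schedule, the sketch below uses the IHS-style linear and exponential schedules (Mahdavi et al., 2007) with the settings listed above.

```python
import math

PAR_MIN, PAR_MAX = 0.1, 1.0   # pitch adjustment probability bounds
LENG = 20                     # number of optimized hyperparameters
BW_MIN, BW_MAX = 1, LENG - 1  # step factor bounds
T_MAX = 30                    # maximum number of creations

def par(t):
    """PAR grows linearly with the creation index t (1..T_MAX)."""
    return PAR_MIN + (PAR_MAX - PAR_MIN) * t / T_MAX

def bw(t):
    """BW decays exponentially from BW_MAX toward BW_MIN."""
    return BW_MAX * math.exp(math.log(BW_MIN / BW_MAX) * t / T_MAX)

for t in (1, 15, 30):
    print(f"t={t:2d}  PAR={par(t):.2f}  BW={bw(t):.1f}")
```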

The traditional data expansion method (Qi et al., 2022) is adopted to process the dataset: the rotation angle range is 10, the width offset is 0.1, the height offset is 0.1, the perspective transformation range is 0.1, the zoom range is 0.1, and horizontal flipping is applied. The filling mode is "nearest", and the remaining options are left at their defaults.
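The paper does not name the augmentation library; one way to realize these settings is with Keras' ImageDataGenerator, as sketched below, where the "perspective transformation range" is interpreted as the shear transform.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative mapping of the Section 4.3 augmentation settings onto Keras.
datagen = ImageDataGenerator(
    rotation_range=10,        # rotation angle range
    width_shift_range=0.1,    # width offset
    height_shift_range=0.1,   # height offset
    shear_range=0.1,          # assumed mapping of the "perspective transformation" range
    zoom_range=0.1,           # zoom range
    horizontal_flip=True,     # horizontal inversion
    fill_mode="nearest",      # filling mode
)

# Typical use: feed augmented batches to model.fit, e.g.
# model.fit(datagen.flow(x_train, y_train, batch_size=64), epochs=50)
```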

For the parameter combination optimized by the algorithm, the number of training times is set to 50 for the Fashion-MNIST dataset and 100 for the CIFAR-10 dataset.
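A minimal retraining sketch under stated assumptions follows: it assumes a Keras model and an SGD optimizer with momentum (momentum is one of the tuned hyperparameters, but the optimizer itself is not named in the paper), and its defaults use the values reported for Fashion-MNIST in Section 4.4.

```python
from tensorflow.keras.optimizers import SGD

def retrain_best(model, x_train, y_train, x_val, y_val,
                 learning_rate=0.001, momentum=0.99, batch_size=64, epochs=50):
    """Retrain the CNN found by LACHS with its optimized learning rate, momentum, and
    batch size; the number of training times is 50 for Fashion-MNIST and 100 for CIFAR-10."""
    model.compile(optimizer=SGD(learning_rate=learning_rate, momentum=momentum),
                  loss="sparse_categorical_crossentropy",  # assumed loss for 10-class labels
                  metrics=["accuracy"])
    return model.fit(x_train, y_train,
                     validation_data=(x_val, y_val),
                     batch_size=batch_size,
                     epochs=epochs)
```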

4.4. Experimental results

In the experiment, the optimization process and the final optimized population of the LACHS algorithm on the Fashion-MNIST dataset are shown in Fig. 6.

Figure 6: Optimization process and final optimization population of LACHS algorithm on Fashion-MNIST dataset.

On the Fashion-MNIST dataset, the hyperparameter combination of the CNN optimized by the LACHS algorithm (LACHSCNN) is [64, 3, 128, 5, 96, 4, 128, 4, 60, 100, elu, elu, 2, 3, 3, 4, 0.001, 0.99, 64]. The accuracy of LACHSCNN after data enhancement and training is 93.34%. The training process and confusion matrix of the CNN model are shown in Fig. 7.

Figure 7: Training process and confusion matrix of LACHSCNN in Fashion-MNIST.

It can be concluded from Fig. 7 that the accuracy of LACHSCNN after training is basically stable at about 93%. In the classification of the Fashion-MNIST dataset, the recognition of label 6 is relatively poor, while the other label types are recognized well.

In the experiment, the optimization process and the final optimized population of the LACHS algorithm on the CIFAR10 dataset are shown in Fig. 8.

Figure 8: Optimization process and final optimization population of LACHS algorithm on CIFAR10 dataset.

On the CIFAR10 dataset, the hyperparameter combination of LACHSCNN is [96, 4, 128, 3, 128, 4, 128, 5, 60, 100, elu, elu, 3, 3, 4, 2, 3, 0.001, 0.95, 32]. The accuracy of LACHSCNN after data enhancement and training is 90.25%. The training process and confusion matrix of the CNN model are shown in Fig. 9.

Figure 9: Training process and confusion matrix of LACHSCNN in CIFAR10 dataset.

It can be concluded from Fig. 9 that the accuracy of LACHSCNN after training is basically stable at about 90%. In the classification of the CIFAR10 dataset, the recognition of labels 3 and 5 is relatively poor, while the other label types are recognized well.

4.5. Comparison with state-of-the-art methods

First of all, as shown in Table 9, VGGNet16, a network architecture of the same type, reaches an accuracy of 92.86% on the Fashion-MNIST dataset and 88.74% on the CIFAR10 dataset. In contrast, LACHSCNN, which shares the same basic architecture as VGG, reaches 93.34% on the Fashion-MNIST dataset and 90.25% on the CIFAR10 dataset. In terms of classification accuracy, LACHSCNN improves by 0.48% and 1.51% on the Fashion-MNIST and CIFAR10 datasets, respectively. Therefore, compared with a manually designed CNN of the same type, the CNN optimized by LACHS performs better. Compared with other kinds of manually designed CNNs, LACHSCNN outperforms Maxout (Goodfellow et al., 2013), Deeply-supervised (Lee et al., 2015), and Network in Network (Lin et al., 2013) on the CIFAR10 dataset by 1.93%, 0.03%, and 0.65%, respectively. Although ALL-CNN (Springenberg et al., 2014) performs better on the CIFAR10 dataset, exceeding LACHSCNN by 1.75% in classification accuracy, it uses a deeper and more complex network structure, so its computational cost is higher than that of LACHSCNN. Therefore, compared with other types of manually designed CNNs, the CNN optimized by LACHS has advantages in terms of overall performance and computational cost.

Table 9:

Comparison with CNN based on experience.

Method | Network model | Fashion-MNIST | CIFAR10
Manually designed CNN | VGGNet16 | 92.86 | 88.74
Manually designed CNN | Maxout (Goodfellow et al., 2013) | – | 88.32
Manually designed CNN | ALL-CNN (Springenberg et al., 2014) | – | 92.00
Manually designed CNN | Deeply supervised (Lee et al., 2015) | – | 90.22
Manually designed CNN | Network in network (Lin et al., 2013) | – | 89.60
The network model constructed by this algorithm | LACHSCNN | 93.34 | 90.25

Secondly, the results of RSCNN, BASCNN, GACNN, PSOCNN, and DECNN are all obtained from the same initial population and the same hyperparameter range. As shown in Table 10, the accuracy of RSCNN, BASCNN, GACNN, PSOCNN, and DECNN on the Fashion-MNIST dataset reaches 92.94%, 93.09%, 93.09%, 93.05%, and 93.26%, respectively, and on the CIFAR10 dataset 83%, 89.36%, 88.81%, 88.81%, and 88.81%, respectively. The network models whose hyperparameters are optimized by other types of evolutionary algorithms can thus achieve good results on the Fashion-MNIST dataset, but their performance on the more complex CIFAR10 dataset is not ideal. In contrast, on the Fashion-MNIST dataset, LACHSCNN is 0.4%, 0.25%, 0.25%, 0.29%, and 0.08% higher than RSCNN, BASCNN, GACNN, PSOCNN, and DECNN, respectively. Because the data structure of the Fashion-MNIST dataset is relatively simple, the gaps between the experimental results are small. On the more complicated CIFAR10 dataset, the accuracy of LACHSCNN is 6.75%, 0.91%, 1.44%, 1.44%, and 1.44% higher than that of RSCNN, BASCNN, GACNN, PSOCNN, and DECNN, respectively. The experimental results show that the LACHS algorithm has more advantages in hyperparameter optimization than these evolutionary algorithms.

Table 10:

Comparison with CNN constructed by intelligent algorithm.

Method | Network model | Fashion-MNIST | CIFAR10
CNN constructed by intelligent algorithm | RSCNN | 92.94 | 83.00
CNN constructed by intelligent algorithm | BASCNN | 93.09 | 89.36
CNN constructed by intelligent algorithm | GACNN | 93.09 | 88.81
CNN constructed by intelligent algorithm | PSOCNN | 93.05 | 88.81
CNN constructed by intelligent algorithm | DECNN | 93.26 | 88.81
CNN constructed by intelligent algorithm | EvoCNN (Real et al., 2017) | 92.72 | –
CNN constructed by intelligent algorithm | CNN-GA (Aszemi & Dominic, 2019) | – | 80.62
CNN constructed by intelligent algorithm | CNN-DPSO (Guo et al., 2020) | 92.91 | –
The network model constructed by this algorithm | LACHSCNN | 93.34 | 90.25

The results of EvoCNN (Real et al., 2017), CNN-GA (Aszemi & Dominic, 2019), and CNN-DPSO (Guo et al., 2020) are all taken from previously published experiments. On the Fashion-MNIST dataset, LACHSCNN is 0.62% and 0.43% higher than EvoCNN (Real et al., 2017) and CNN-DPSO (Guo et al., 2020), respectively. On the CIFAR10 dataset, LACHSCNN is 9.63% more accurate than CNN-GA (Aszemi & Dominic, 2019). Although LACHSCNN performs better than EvoCNN, CNN-GA, and CNN-DPSO, the optimization parameter ranges and basic architectures differ, so this does not show that the previous methods are inferior; it only indicates that the optimization of the neural network architecture by LACHS is competitive.

HSCNN, IHSCNN, and GHSCNN are all optimized from the same initial population and the same hyperparameter range. The experimental results in Table 11 show that the accuracy of HSCNN, IHSCNN, and GHSCNN on the Fashion-MNIST dataset is 92.96%, 93.23%, and 93.29%, respectively, and on the CIFAR10 dataset 88.81%, 88.81%, and 88.81%, respectively. CNNs whose hyperparameters are optimized by different classical HS algorithms can thus achieve good results on the Fashion-MNIST dataset, but their performance on the more complex CIFAR10 dataset is not ideal. In contrast, the classification accuracy of LACHSCNN on the Fashion-MNIST dataset is 0.38%, 0.11%, and 0.05% higher than that of HSCNN, IHSCNN, and GHSCNN, respectively. Because the data structure of the Fashion-MNIST dataset is relatively simple, the gaps between the experimental results are small. On the more complex CIFAR10 dataset, the accuracy of LACHSCNN is 1.44%, 1.44%, and 1.44% higher than that of HSCNN, IHSCNN, and GHSCNN, respectively. The experimental results show that, compared with these classical HS algorithms, the LACHS algorithm has more advantages in hyperparameter optimization.

Table 11:

Compared with network model constructed by classical HS algorithms.

Method | Network model | Fashion-MNIST | CIFAR10
Network model based on classical HS algorithms | HSCNN | 92.96 | 88.81
Network model based on classical HS algorithms | IHSCNN | 93.23 | 88.81
Network model based on classical HS algorithms | GHSCNN | 93.29 | 88.81
The network model constructed by this algorithm | LACHSCNN | 93.34 | 90.25

The performance of the intelligent algorithms can be further understood by analyzing their optimization processes and their final optimized populations. The comparison is split into two parts: other types of intelligent optimization algorithms, and classical HS algorithms of the same type.

The optimization process and final optimization population of LACHS algorithm and other intelligent algorithms in the Fashion-MNIST dataset and CIFAR10 dataset are shown in Figs 10 and 11.

Figure 10: Optimization process and final optimization population of LACHS algorithm and intelligent algorithm in Fashion-MNIST dataset.

Figure 11: Optimization process and final optimization population of LACHS algorithm and intelligent algorithm in CIFAR10 dataset.

On the Fashion-MNIST dataset, Fig. 10 shows that the various algorithms achieve good results during the optimization process. Although the search speed of the LACHS algorithm is slightly slower than that of the DE algorithm, its final result is better. The search speed and search ability of the GA and PSO algorithms are both inferior to those of the LACHS algorithm. From the analysis of the final optimized populations, compared with the initial population, the final populations of all algorithms except GA consist of elite combinations. The unsatisfactory final population of the GA algorithm is mainly related to its elimination strategy and population size: GA directly replaces the previous generation with the next one, which makes it hard to guarantee the population quality of each generation when the population size is not large enough. The difference between the PSO algorithm and the DE and LACHS algorithms is that the final PSO population contains no abnormal combinations (combinations whose fitness differs greatly from that of the rest of the population). Keeping a few suitable abnormal combinations helps prevent the algorithm from being trapped in a local optimum for a long time, which is why PSO performs worse than DE and LACHS. However, because the data structure of the Fashion-MNIST dataset is simple and the differences between the optimal results of the intelligent algorithms are small, the more complex CIFAR10 dataset is selected for a further experiment.

As can be seen from Fig. 11, on the more complex CIFAR10 dataset the results of the other algorithms are not ideal: the best solutions found by the GA, DE, and PSO algorithms are still the best harmonies in the original harmony memory, so these algorithms do not contribute any improvement. Although the LACHS algorithm was trapped in local fitting at first, as the iterations increased it used the autonomous decision-making search strategy based on the optimal state to jump out of this dilemma in time and find better solutions. The analysis of the final optimized population shows that the optimization effect of the LACHS algorithm is better: after the same number of searches, the combinations in its final population all belong to the elite combinations and the population has converged, whereas the final populations of the other algorithms have not converged. Therefore, the search speed and search ability of the LACHS algorithm are better than those of the other algorithms.

Therefore, compared with other types of intelligent optimization algorithms, the LACHS algorithm has more advantages in hyperparameter optimization.

The optimization processes and final optimized populations of the LACHS algorithm and the classical HS algorithms on the Fashion-MNIST and CIFAR10 datasets are shown in Figs 12 and 13.

Figure 12: Optimization process and final optimization population of various HS algorithms in Fashion-MNIST dataset.

Figure 13: Optimization process and final optimization population of various HS algorithms in the CIFAR10 dataset.

On the Fashion-MNIST dataset, Fig. 12 shows that the various HS algorithms achieve good results during the optimization process. Although the search speed of the LACHS algorithm is a little slower than that of the other HS algorithms, its final result is better. From the analysis of the final optimized populations, compared with the initial population, the final populations of all the algorithms consist of elite combinations. However, the HS and IHS algorithms do not retain abnormal combinations, which makes it harder for them to escape from local optima; in terms of search ability, therefore, the GHS and LACHS algorithms are stronger. However, because of the simple data structure of the Fashion-MNIST dataset, the optimal results of the algorithms cannot be clearly separated, so the more complex CIFAR10 dataset is selected for a further comparison.

As can be seen from Fig. 13, the results of all algorithms except the LACHS algorithm are not ideal: their best solutions are still the best harmonies in the original harmony memory, so they do not contribute any improvement. Although the LACHS algorithm fell into local fitting at the beginning, as the iterations increased it used the autonomous decision-making search strategy based on the optimal state to jump out of this dilemma in time and find a better solution. The analysis of the final optimized populations shows that the LACHS and GHS algorithms have better optimization effects: after the same number of searches, the combinations in their final populations generally belong to the elite combinations and the populations have converged, whereas the HS and IHS algorithms have not converged. Therefore, the search speed of the LACHS and GHS algorithms is better than that of the HS and IHS algorithms. Although the GHS algorithm has good search speed, its search ability is still not as good as that of the LACHS algorithm, and it fails to jump out of the local fitting dilemma until the end of the search.

Therefore, compared with classical HS algorithms of the same type, the LACHS algorithm has more advantages in hyperparameter optimization.

In short, considering the difficulty of hyperparameter optimization for a given CNN, the contribution of this paper is reasonable.

4.6. Expression recognition case study

To evaluate the performance of LACHSCNN in practical applications, a case study on expression recognition is conducted in this part. Expression recognition can significantly promote the integration and development of many disciplines, such as graphic image processing, artificial intelligence, human-computer interaction, and psychology. A related video emotion database is the foundation of expression recognition research, so this paper uses the SAVEE dataset (Liu et al., 2022) to provide data for the emotion study that requires training and testing. The SAVEE dataset consists of videos of four actors expressing seven emotions, each video being about 3 seconds long, and it annotates 68 facial key points for each subject. A sample from the SAVEE database is shown in Fig. 14 below:

Figure 14: Seven different expressions of the same person in the SAVEE dataset.

In the experiment, one video frame was sampled every 50 frames, yielding a total of 1957 pictures, each of size 48 × 48. The generated image dataset is then divided into a training set and a test set in the proportion of 90% and 10%. In the LACHS algorithm, the coefficients a, b, and C in formula (6) are set to 10, 5, and 10, respectively, and the remaining configurations are unchanged. The CNN obtained by the LACHS algorithm is then trained with the number of training times set to 500. Because of the high computational cost, only the VGG16 network model trained for the same 500 training times is used as the control group. The training results are shown in Table 12.
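As an illustration of this preprocessing, the sketch below samples one frame every 50 frames, resizes it to 48 × 48, and performs the 90%/10% split. OpenCV and scikit-learn are assumed tools, and the directory layout and labelling rule are hypothetical; the paper does not describe its extraction script.

```python
import glob
import os

import cv2
from sklearn.model_selection import train_test_split

def extract_frames(video_paths, step=50, size=(48, 48)):
    """Grab one frame every `step` frames and resize it to 48 x 48 pixels."""
    images, labels = [], []
    for path in video_paths:
        capture = cv2.VideoCapture(path)
        index = 0
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            if index % step == 0:
                images.append(cv2.resize(frame, size))
                # Assumed layout: savee/<emotion>/<clip>.avi, so the folder name is the label.
                labels.append(os.path.basename(os.path.dirname(path)))
            index += 1
        capture.release()
    return images, labels

videos = glob.glob("savee/*/*.avi")          # hypothetical directory layout
images, labels = extract_frames(videos)
x_train, x_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.10, random_state=0)  # 90% / 10% split as in the paper
```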

Table 12:

Comparison with manually designed CNN.

Method | Network model | SAVEE
Manually designed CNN | VGG16 | 95.92
The network model constructed by this algorithm | CNN based on LACHS | 97.96

Table 12 shows that the CNN based on LACHS achieves higher accuracy than VGG16, which verifies the effectiveness of the LACHS algorithm and shows its potential in solving practical application problems.

4.7. Further discussion

The experiments in Section 4 demonstrate the superiority of the LACHS algorithm in CNN hyperparameter optimization. In the experiments, VGGNet is used as the basic network for hyperparameter optimization, and the LACHS algorithm obtains better results than VGGNet16 of the same type and other state-of-the-art CNNs. However, the LACHS algorithm also has its limitations. Because it optimizes the hyperparameters of a given basic network, the improvement it brings is limited if the basic network is not suitable for the problem at hand. There are two reasons for choosing VGGNet as the basic architecture in this paper. First, it is a typical CNN in deep learning and therefore a representative example for hyperparameter optimization with this algorithm. Secondly, for the same effect, its structure is simpler and its computational cost is lower. Choosing an appropriate CNN model for a given task is itself a challenging problem, but it requires further study and is beyond the scope of this paper. Therefore, the contribution of this paper is reasonable for the hyperparameter optimization problem of a given CNN.

5. Conclusions

In this paper, aiming at the two difficulties of CNN hyperparameter optimization, an improved HS algorithm for efficiently optimizing CNN hyperparameters is proposed. The LACHS algorithm adopts a dynamic parameter adjustment strategy, an autonomous decision-making search strategy based on the optimal state, and a local competition update strategy to solve the complex optimization problem in a large-scale search space. A new evaluation function is designed, which saves computational cost without affecting the search results. Comparative experiments on the Fashion-MNIST and CIFAR10 datasets verify the superiority of the improved HS algorithm proposed in this paper.

In the future, the HS algorithm will continue to be used as the search method for CNN hyperparameter optimization. We can further study the influence of hyperparameters on the fitting behavior of the network model, so as to formulate more reasonable evaluation function values and save time and computational cost. Regarding the optimization mechanism, the possibility of combining grid search, random search, local search, and HS will be considered, with the aim of obtaining an improved algorithm that is better suited to hyperparameter optimization.

Acknowledgement

The authors would like to thank P. N. Suganthan for the useful information about meta-heuristic algorithms and optimization problems on his homepages. The authors also thank Prof. Zhihui Zhan of South China University of Technology. This work is supported by the Fund of Innovative Training Program for College Students of Guangzhou University (approval number: S202111078042), Guangzhou City School Joint Fund Project (2023A03J01009), National Nature Science Foundation of China (Grant Nos 52275097 and 61806058), Natural Science Foundation of Guangdong Province (2018A030310063), and Guangzhou Science and Technology Plan (201804010299).

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflict of interest statement

None declared.

References

Aszemi, N. M., & Dominic, P. D. D. (2019). Hyperparameter optimization in convolutional neural network using genetic algorithms. International Journal of Advanced Computer Science and Applications, 10(6).
Awad, N., Mallik, N., & Hutter, F. (2020). Differential evolution for neural architecture search. Preprint.
Bergstra, J., Bardenet, R., Bengio, Y., & Kégl, B. (2011). Algorithms for hyper-parameter optimization. In Proceedings of the 24th International Conference on Neural Information Processing Systems (pp. 2546–2554). Curran Associates Inc.
Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(1), 281–305.
Castelli, M., Silva, S., Manzoni, L., & Vanneschi, L. (2014). Geometric selective harmony search. Information Sciences, 279, 468–482.
Chen, Y., Liu, F., & Pei, K. (2021). Cross-modal matching CNN for autonomous driving sensor data monitoring. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 3110–3119). IEEE.
Doon, R., Rawat, T. K., & Gautam, S. (2018). Cifar-10 classification using deep convolutional neural network. In Proceedings of the 2018 IEEE Punecon (pp. 1–5). IEEE.
Fernandes, F. E., Jr., & Yen, G. G. (2021). Pruning deep convolutional neural networks architectures with evolution strategy. Information Sciences, 552, 29–47.
Feurer, M., & Hutter, F. (2019). Hyperparameter optimization. In Automated machine learning: Methods, systems, challenges (pp. 3–33). Springer.
Geem, Z. W., Kim, J. H., & Loganathan, G. V. (2001). A new heuristic optimization algorithm: Harmony search. Simulation, 76(2), 60–68.
Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013). Maxout networks. In Proceedings of the International Conference on Machine Learning (pp. 1319–1327). PMLR.
Guo, Y., Li, J.-Y., & Zhan, Z.-H. (2020). Efficient hyperparameter optimization for convolution neural networks in deep learning: A distributed particle swarm optimization approach. Cybernetics and Systems, 52(2), 1–22.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 770–778). IEEE.
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2016). Densely connected convolutional networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2261–2269). IEEE.
Huang, J., Xue, B., Sun, Y., Zhang, M., & Yen, G. G. (2022). Particle swarm optimization for compact neural architecture search for image classification. IEEE Transactions on Evolutionary Computation.
Jian, J.-R., Chen, Z.-G., Zhan, Z.-H., & Zhang, J. (2021). Region encoding helps evolutionary computation evolve faster: A new solution encoding scheme in particle swarm for large-scale optimization. IEEE Transactions on Evolutionary Computation, 25(4), 779–793.
Jian, J.-R., Zhan, Z.-H., & Zhang, J. (2020). Large-scale evolutionary optimization: A survey and experimental comparative study. International Journal of Machine Learning and Cybernetics, 11(3), 729–745.
Kandasamy, K., Neiswanger, W., Schneider, J., Barnabás, P., & Xing, E. P. (2018). Neural architecture search with Bayesian optimisation and optimal transport. In NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems (pp. 2020–2029). https://dl.acm.org/doi/abs/10.5555/3326943.3327130
Karpathy, A. (2016). CS231n convolutional neural networks for visual recognition. Neural Networks, 1(1). https://cs231n.github.io/convolutional-networks
Khan, A., Sohail, A., Zahoora, U., & Qureshi, A. S. (2020). A survey of the recent architectures of deep convolutional neural networks. Artificial Intelligence Review, 53, 5455–5516.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 25 (pp. 1097–1105). Curran Associates.
Larochelle, H., Erhan, D., & Courville, A. C. (2007). An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the 24th International Conference on Machine Learning (pp. 473–480). ACM.
Lecun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
Lee, C. Y., Xie, S., Gallagher, P., Zhang, Z., & Tu, Z. (2015). Deeply-supervised nets. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (pp. 562–570). PMLR.
Li, J.-Y., Zhan, Z.-H., Wang, C., Jin, H., & Zhang, J. (2020). Boosting data-driven evolutionary algorithm with localized data generation. IEEE Transactions on Evolutionary Computation, 24(5), 923–937.
Li, J.-Y., Zhan, Z.-H., Xu, J., Kwong, S., & Zhang, J. (2023a). Surrogate-assisted hybrid-model estimation of distribution algorithm for mixed-variable hyperparameters optimization in convolutional neural networks. IEEE Transactions on Neural Networks and Learning Systems, 34, 2338–2352.
Li, J.-Y., Zhan, Z.-H., Tan, K. C., & Zhang, J. (2023b). Dual differential grouping: A more general decomposition method for large-scale optimization. IEEE Transactions on Cybernetics, 53, 3624–3638.
Li, J.-Y., Zhan, Z.-H., & Zhang, J. (2022). Evolutionary computation for expensive optimization: A survey. Machine Intelligence Research, 19(1), 3–23.
Lin, M., Chen, Q., & Yan, S. (2013). Network in network. Preprint.
Liu, R., Sisman, B., Schuller, B., Gao, G., & Li, H. (2022). Accurate emotion strength assessment for seen and unseen speech based on data-driven deep learning. Preprint.
Lu, Z., Whalen, I., Dhebar, Y., Deb, K., Goodman, E. D., Banzhaf, W., & Boddeti, V. N. (2020). Multiobjective evolutionary design of deep convolutional neural networks for image classification. IEEE Transactions on Evolutionary Computation, 25(2), 277–291.
Mahdavi, M., Fesanghary, M., & Damangir, E. (2007). An improved harmony search algorithm for solving optimization problems. Applied Mathematics and Computation, 188(2), 1567–1579.
Omran, M. G. H., & Mahdavi, M. (2008). Global-best harmony search. Applied Mathematics and Computation, 198(2), 643–656.
Pan, Q.-K., Suganthan, P. N., Tasgetiren, M. F., & Liang, J. J. (2010). A self-adaptive global best harmony search algorithm for continuous optimization problems. Applied Mathematics and Computation, 216(3), 830–848.
Qi, Y., Yang, Z., Sun, W., Lou, M., Lian, J., Zhao, W., Deng, X., & Ma, Y. (2022). A comprehensive overview of image enhancement techniques. Archives of Computational Methods in Engineering, 29, 583–607.
Raymond, C., & Beng, O. K. (2007). A comparison between genetic algorithms and evolutionary programming based on cutting stock problem. Engineering Letters, 14(1), 72–77.
Real, E., Moore, S., Selle, A., Saxena, S., Suematsu, Y. L., Tan, J., Le, Q., & Kurakin, A. (2017). Large-scale evolution of image classifiers. In Proceedings of the International Conference on Machine Learning (pp. 2902–2911). PMLR.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. CoRR, preprint.
Springenberg, J. T., Dosovitskiy, A., Brox, T., & Riedmiller, M. (2014). Striving for simplicity: The all convolutional net. Preprint.
Suganuma, M., Kobayashi, M., Shirakawa, S., & Nagao, T. (2020). Evolution of deep convolutional neural networks using cartesian genetic programming. Evolutionary Computation, 28(1), 141–163.
Sun, Y., Xue, B., Zhang, M., & Yen, G. G. (2019). Evolving deep convolutional neural networks for image classification. IEEE Transactions on Evolutionary Computation, 24(2), 394–407.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2014). Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1–9). IEEE.
Tan, M., & Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning (pp. 6105–6114). PMLR.
Turky, A. M., Abdullah, S., & Sabar, N. R. (2014). A hybrid harmony search algorithm for solving dynamic optimisation problems. Procedia Computer Science, 29, 1926–1936.
Wang, S., Chen, L., Xu, L., Fan, W., Sun, J., & Naoi, S. (2016). Deep knowledge training and heterogeneous CNN for handwritten Chinese text recognition. In Proceedings of the 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR) (pp. 84–89). IEEE.
Wang, Y. Q., Li, J. Y., Chen, C. H., Zhang, J., & Zhan, Z. H. (2022a). Scale adaptive fitness evaluation-based particle swarm optimisation for hyperparameter and architecture optimisation in neural networks and deep learning. CAAI Transactions on Intelligence Technology, 1–14.
Wang, Z.-J., Jian, J.-R., Zhan, Z.-H., Li, Y., Kwong, S., & Zhang, J. (2022b). Gene targeting differential evolution: A simple and efficient method for large scale optimization. IEEE Transactions on Evolutionary Computation.
Wang, B., Xue, B., & Zhang, M. (2021). Surrogate-assisted particle swarm optimization for evolving variable-length transferable blocks for image classification. IEEE Transactions on Neural Networks and Learning Systems, 33(8), 3727–3740.
Wang, Z.-J., Zhan, Z.-H., Kwong, S., Jin, H., & Zhang, J. (2020). Adaptive granularity learning distributed particle swarm optimization for large-scale optimization. IEEE Transactions on Cybernetics, 51(3), 1175–1188.
Wu, S.-H., Zhan, Z.-H., & Zhang, J. (2021). SAFE: Scale-adaptive fitness evaluation method for expensive optimization problems. IEEE Transactions on Evolutionary Computation, 25(3), 478–491.
Xiao, H., Rasul, K., & Vollgraf, R. (2017). Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. Preprint.
Yu, Z., Chan, W., & Jaitly, N. (2017). Very deep convolutional networks for end-to-end speech recognition. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4845–4849). IEEE.
Zhan, Z. H., Shi, L., Tan, K. C., & Zhang, J. (2022a). A survey on evolutionary computation for complex continuous optimization. Artificial Intelligence Review, 55, 59–110.
Zhan, Z.-H., Li, J.-Y., & Zhang, J. (2022b). Evolutionary deep learning: A survey. Neurocomputing, 483, 42–58.
Zhang, Q., Zhang, M., Chen, T., Sun, Z., Ma, Y., & Yu, B. (2018). Recent advances in convolutional neural network acceleration. Neurocomputing, 325, 37–51.
Zoph, B., & Le, Q. V. (2016). Neural architecture search with reinforcement learning. Preprint.
