Article

An Improved WOA (Whale Optimization Algorithm)-Based CNN-BIGRU-CBAM Model and Its Application to Short-Term Power Load Forecasting

School of Science, China University of Geosciences (Beijing), Beijing 100083, China
* Author to whom correspondence should be addressed.
Energies 2024, 17(11), 2559; https://doi.org/10.3390/en17112559
Submission received: 26 April 2024 / Revised: 20 May 2024 / Accepted: 22 May 2024 / Published: 24 May 2024
(This article belongs to the Special Issue Simulation, Optimization and Intelligent Control of Energy System)

Abstract
The accuracy requirements for short-term power load forecasting have been increasing with the rapid development of the electric power industry. However, the short-term load exhibits both elasticity and instability, which makes accurate forecasting challenging, and traditional prediction models suffer from inadequate precision and inefficient training. In this work, a model called IWOA-CNN-BiGRU-CBAM is proposed. To address the inability of the Squeeze-and-Excitation (SE) attention mechanism to collect information effectively in the spatial dimension, the Convolutional Block Attention Module (CBAM) is first introduced as a replacement, enhancing the model's ability to capture location attributes. Subsequently, an improved Whale Optimization Algorithm (IWOA) is proposed to address the limitations of the standard WOA, such as its heavy reliance on the initial solution and susceptibility to local optima. The IWOA is then applied to hyperparameter optimization of the Convolutional Neural Network-Bidirectional Gated Recurrent Unit-Convolutional Block Attention Module (CNN-BiGRU-CBAM) to improve prediction precision. Finally, applying the proposed model to short-term power demand forecasting shows that the CBAM effectively compensates for the SE attention mechanism's inability to capture spatial characteristics, and that the IWOA yields a homogeneously dispersed initial population and an effective capability to identify the optimal solution. Compared with other models, the proposed model improves R2 by 0.00224 and reduces the RMSE and MAE by 18.5781 and 25.8940, respectively, validating its applicability and superiority.

1. Introduction

Accurate load forecasting has become crucial in dispatching power to satisfy customer demands, load switching, and infrastructure expansion as modern energy systems become increasingly complex and flexible [1]. Short-term power load forecasting (STLF) is essential for the smooth functioning of the power system, and precise load prediction is crucial for guaranteeing the secure and steady operation of the power system [2]. The significance of predicting short-term load has increased with the advancement of the power industry. Nevertheless, the immediate demand exhibits characteristics of elasticity and unpredictability, resulting in increased challenges in accurate load forecasting.
Researchers have proposed a variety of models over the past decades to make short-term load forecasts more accurate. The methods mainly consist of traditional forecasting methods and artificial intelligence methods [3]. Conventional approaches to STLF primarily include the ARIMA model [4], the grey model (GM) [5], and the Kalman filtering method [6]. The ARIMA model is a statistical model based on time-series data, commonly used for analysing and forecasting trends and periodicity. The grey model is a modelling method based on a small amount of data, particularly suitable for forecasting when data support is insufficient. Kalman filtering is a recursive filtering technique used to estimate system states, particularly suitable for systems with dynamic changes. These three methods have their respective advantages and limitations for different types of load data and forecasting horizons. The ARIMA model suits stable load data and short-term forecasts but may perform poorly on nonlinear and dynamically changing load data; the GM method suits situations with limited data or slowly changing loads but may be insufficient for rapid changes and complex data; the Kalman filtering method suits situations that require consideration of system dynamics and time variation but requires a good understanding and modelling of the system's state-space model. Moreover, owing to the nonlinear characteristics of load data, these traditional forecasting methods have difficulty predicting load trends accurately. In power systems, load data characteristics are complex and variable, with nonlinear features being particularly prominent. The nonlinearity of load data is primarily reflected in the following aspects: (1) Complexity of load demand: power load demand is influenced by numerous factors, including temperature, humidity, seasonal changes, and economic activity, whose complex nonlinear relationships cause the load to exhibit significant nonlinear characteristics. (2) Diversity of user behaviour: industrial, commercial, and residential users show distinct electricity usage patterns, and this behavioural diversity makes the nonlinear characteristics of load data more complex. (3) Dynamic characteristics of power systems: the power system itself is a complex dynamic system; the start-up and shutdown of generation equipment, faults, and maintenance all cause load fluctuations, and supply-demand balancing mechanisms and real-time price fluctuations in the power market also impart nonlinear effects on load data.
Therefore, scholars have turned to artificial intelligence (AI) methods. AI techniques have recently seen increased use in STLF, including the Support Vector Machine (SVM) [7], the Long Short-Term Memory (LSTM) network [8], Bidirectional Long Short-Term Memory (BiLSTM) [9], the Gated Recurrent Unit (GRU) network [10], and various improved models; these can better capture the nonlinear characteristics of power loads and significantly enhance forecasting precision. BiGRU [11] can consider both past and future known data and learn more feature information effectively. Most of these single neural network models make predictions directly on the time series. However, a single neural network model produces inferior forecasts in intricate tasks because it neglects the spatial correlations among data points.
Further, extracting spatiotemporal features in load data can provide more comprehensive and accurate data, thus helping to capture the characteristics of load changes at a finer level and improve prediction accuracy. CNN-BiGRU [12] improves prediction accuracy by introducing a CNN layer to extract intricate high-dimensional spatio-temporal features. The CNN-BiGRU-Attention model enhances its prediction accuracy by incorporating the SE-Attention mechanism into the CNN-BiGRU model. However, factors such as the inability of the SE-Attention mechanism to capture valid information in the spatial dimension and the high human interference in the model parameters lead to poor prediction accuracy of the CNN-BiGRU-Attention.
The selection of model hyperparameters can be treated as an optimization problem, which is generally solved using exact algorithms, such as the Bayesian optimization algorithm [13] and the Adam algorithm [14], or heuristic algorithms. Exact approaches can yield precise solutions but are inefficient. Heuristic algorithms such as the Grey Wolf Optimizer (GWO) [15] and Particle Swarm Optimization (PSO) are highly effective in finding the optimal solution and offer greater optimization efficiency, making them more competitive. Nevertheless, heuristic algorithms exhibit certain limitations. For instance, the GWO algorithm has inadequate population diversity and limited global search capability.
Similarly, the PSO algorithm's convergence decelerates in the later phases of the search, and it is prone to becoming trapped in local optima. Mirjalili and Lewis introduced the WOA heuristic optimization algorithm in 2016 [16]; it offers good global search performance, few control parameters, and ease of implementation. Consequently, it has gained popularity in various problem domains, such as combinatorial optimization, image segmentation, data prediction, and path planning. However, the WOA also suffers from sensitivity to parameters, the possibility of falling into local optima, and a strong dependence on the initial solution.
In summary, an IWOA-CNN-BiGRU-CBAM model is introduced in this paper. To mitigate the SE attention mechanism's restriction to scenarios with many channels and its inability to capture spatio-temporal features effectively, the CBAM is implemented as an alternative. Meanwhile, considering that the WOA can fall into local optima and depends strongly on the initial solution, an improved WOA is proposed and applied to the hyperparameter optimization of CNN-BiGRU-CBAM to enhance prediction accuracy. The proposed approach exhibits high precision and efficiency, making it well-suited to addressing the inadequate accuracy of STLF. This study contributes the following.
  • Aiming at the problem that the CNN-BiGRU-Attention model cannot capture adequate information in the spatial dimension, the CBAM is implemented to boost the model’s capacity to capture positional information.
  • Considering that WOA has the shortcomings of being sensitive to parameters, dependent on the initial solution, and easily falling into local optimal solutions, an improved WOA is proposed, which, by introducing good point sets, improved convergence factors, and mutation mechanisms, boosts the optimization potential of WOA.
  • Through experiments, this study presents a model that achieves high prediction accuracy and training efficiency. Compared with models such as BiLSTM, RMSE and MAE decrease by 291.9470 and 219.9830, respectively, and R2 improves by 0.06941.

2. Related Work

The traditional methods for power load forecasting mainly include the ARIMA model, the grey model (GM), and various scholars' improved models. For example, scholars such as Fei Wu [17] proposed a fractional autoregressive integral moving average (FARIMA) model optimized with the cuckoo search (CS) algorithm, and scholars such as Saadat Bahrami [18] formulated a model integrating the wavelet transform (WT) and GM optimized with the PSO algorithm. There are also linear and nonlinear regression models [19]. The ARIMA model is suitable for linear data and easy to implement, but it performs poorly with nonlinear data and long-term forecasting. The grey model adapts well to small samples and uncertain systems, but it has limited prediction accuracy and limited capability to handle abrupt data changes. The CS-optimized FARIMA model and the PSO-optimized WT-GM model excel at handling complex data and improving prediction accuracy, but they have high computational complexity and are difficult to tune. Linear and nonlinear regression models are simple, intuitive, and widely applied, but the former cannot handle nonlinear relationships, while the latter involves more complex model selection and parameter tuning. The continuous development of these methods has improved the accuracy of power load forecasting, but their respective limitations still need to be considered comprehensively in practical applications.
AI techniques for STLF have grown in popularity in recent years. Machine learning-based approaches have been employed, including Support Vector Machines (SVM) [20,21], Random Forest (RF), and eXtreme Gradient Boosting (XGBoost) [22], as well as artificial neural networks (ANN), BP neural networks [23,24], and deep neural network (DNN) algorithms [25]. Deep learning-based methods are also used, including the Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and various improved models. These methods show broad application prospects in short-term load forecasting: machine learning methods such as SVM, RF, and XGBoost can handle complex nonlinear relationships, while deep learning methods such as GRU, LSTM, and CNN can effectively extract spatio-temporal features. Model fusion and improved structures also bring new ideas for improving prediction accuracy. However, these methods still face challenges in computational complexity, parameter tuning, and handling high-dimensional nonlinear data, which require further research and improvement.
Accordingly, scholars such as Mu Yangyang [26] improved prediction accuracy by combining the sequence-to-sequence structure with LSTM. Fang Liu [27] and other scholars proposed an ultra-short-term power load-forecasting model integrating an attention mechanism, BiLSTM, and CNN, which extracts the spatio-temporal features of load data using CNN and BiLSTM. Wu Kuihua [28] and other scholars suggested a model for STLF that integrates LSTM and BiLSTM with an attention-based CNN. Combined prediction models based on the LSTM module face challenges with long sequential data, such as high computational complexity, difficulty in capturing long-term dependencies, and high memory consumption. The GRU model was introduced with a simpler structure and fewer parameters; it improves training speed and computational efficiency while maintaining performance, making it widely used in sequence modelling tasks. Scholars such as Jia Taorong [29] suggested a short-term power load prediction method combining CEEMDAN, the Multiverse Optimization (MVO) algorithm, and a GRU based on the Rectified Adam (RAdam) optimizer. Building on the GRU, the BiGRU model was proposed to extract contextual information from sequence data more effectively. Liang Rong [30] and other scholars proposed an Adamax-BiGRU model using the Adamax optimization algorithm. Further, Zhang Chu [12] and other scholars proposed an integrated multivariate PV power prediction model incorporating VMD, CNN, and BiGRU that fully considers the geographical and chronological aspects of load data. The attention mechanism can notably enhance a model's predictive efficacy by allocating distinct weights to crucial information. Meng Yuyu [31] and other scholars proposed an ACNN-BiGRU ultra-short-term wind power prediction model, which applies a CNN to extract important spatio-temporal properties from the input data. Xu Yucheng [32] and other scholars proposed a hybrid model called BiGRU-SENet, which incorporates the attention mechanism and is particularly effective at handling the nonlinearities found in high-dimensional time-series data. The SE attention mechanism, however, cannot capture adequate information in the spatial dimension, is mainly applicable to scenarios with many channels, and is not computationally efficient.
The prediction model training process involves many parameters, which poses an optimization problem. It is generally solved by exact optimization algorithms or heuristic algorithms. Biao Yang [33] and other scholars built a Bayes-BiLSTM model by optimizing the BiLSTM parameters with a Bayesian optimization technique. Dashe Li [14] and other scholars established the Enhanced Clustering Algorithm and Adam-based Radial Basis Function Neural Network (ECA-Adam-RBFNN). Although it produces accurate solutions, the exact approach is inefficient because it relies heavily on gradient information. Heuristic algorithms are more competitive, with strong optimality-seeking ability, faster training, higher optimization efficiency, and shorter solution times. Scholars such as Mengdan Feng [34] developed the GWO-XGBOOST-CEEMDAN model for carbon price forecasting by optimizing the parameters of the XGBOOST model with the GWO algorithm; however, the GWO algorithm has poor population diversity and weak global search ability. Jun Guo [35] and other scholars established the PSO-GRU model to forecast goaf coal temperatures; Manzhe Xiao [36] and other scholars proposed an enhanced BP neural network based on the PSO algorithm to predict carbon price. The PSO algorithm's convergence slows in the late stage of the search, and it rather easily becomes stuck in local optima. The WOA has the advantages of good global search performance and easy implementation. Luo Jun [37] and other scholars proposed an ARIMA-WOA-LSTM model, in which the WOA is applied to hyperparameter optimization of the LSTM, and Sun Youzhuang [38] and others developed a WOA-Elman model. These studies demonstrate the potential of combining optimization algorithms with neural network models in various fields. By using optimization algorithms to tune model parameters, researchers have successfully improved the performance of prediction models; this approach adapts better to complex data structures and achieves more accurate predictions. However, some optimization algorithms, such as GWO and PSO, have limitations in population diversity and global search, which may result in slow convergence or convergence to local optima. Future research therefore needs to improve these algorithms further to enhance their efficiency and robustness in optimizing prediction models.

3. Model and Methodology

3.1. CNN-BiGRU-CBAM

Figure 1 shows the architecture of the CNN-BiGRU-CBAM model. After the historical load data enter from the input layer, they pass into the CNN layer, where features are extracted: the convolutional layer captures data correlations, and the pooling layer reduces data dimensionality, improving network learning efficiency. The CBAM module facilitates the model's learning and extraction of local features and long-range dependencies in the data. The data then enter the BiGRU network, where they are fully learned, further improving the accuracy of spatio-temporal feature extraction. Finally, the fully connected layer generates the forecasting output. A minimal sketch of this pipeline follows.
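To make the data flow concrete, the following PyTorch sketch wires the layers together in the order described above. It is a minimal illustration rather than the paper's exact implementation (which was built in MATLAB): the channel counts and attention-MLP width are assumptions, while the 3 × 1 kernels and the 8 + 8 BiGRU hidden neurons follow Table 1.

import torch
import torch.nn as nn

class CNNBiGRUCBAM(nn.Module):
    def __init__(self, n_features=1, hidden=8):
        super().__init__()
        # two convolution + pooling stages, 3x1 kernels as in Table 1
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2))
        # CBAM channel attention: shared MLP over pooled descriptors
        self.mlp = nn.Sequential(nn.Linear(32, 8), nn.ReLU(), nn.Linear(8, 32))
        # CBAM spatial attention: one convolution over stacked avg/max maps
        self.spatial = nn.Conv1d(2, 1, kernel_size=7, padding=3)
        self.bigru = nn.GRU(32, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, 1)

    def forward(self, x):                              # x: (batch, time, features)
        f = self.conv(x.transpose(1, 2))               # (batch, 32, T)
        avg, mx = f.mean(dim=2), f.amax(dim=2)
        f = f * torch.sigmoid(self.mlp(avg) + self.mlp(mx)).unsqueeze(2)
        s = torch.cat([f.mean(dim=1, keepdim=True),
                       f.amax(dim=1, keepdim=True)], dim=1)
        f = f * torch.sigmoid(self.spatial(s))
        out, _ = self.bigru(f.transpose(1, 2))         # (batch, T, 2*hidden)
        return self.fc(out[:, -1])                     # next load value

# usage: a batch of 4 windows of 96 quarter-hourly load values
y_hat = CNNBiGRUCBAM()(torch.randn(4, 96, 1))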

3.1.1. CNN

Convolutional neural networks (CNN) [39] are a subtype of feed-forward neural network characterized by a deep architecture and the use of convolution operations. They are frequently used to address overfitting, inefficiency, and the loss of spatial information. Figure 2 presents the CNN structure.
The CNN model structure can automatically extract features at different levels and scales by combining convolutional and pooling layers, enabling efficient feature-learning and classification tasks.

3.1.2. BiGRU

BiGRU [11] is a deep learning model for processing sequence data. It improves on the GRU [40] by introducing a bidirectional loop structure to better capture contextual information in sequence data. Each BiGRU unit contains two gated recurrent units, one dedicated to processing the sequence in the forward direction and the other in the reverse direction. These two directional units capture different information in the sequence and combine it to provide a more comprehensive contextual understanding. Figure 3 illustrates the structure of the GRU, and Figure 4 depicts the BiGRU structure.
A GRU (Gated Recurrent Unit) [39] consists of an update gate z_t and a reset gate r_t; it merges the input gate and forget gate of the LSTM into a single update gate, which reduces the number of training parameters, lowers training complexity, and shortens convergence time.
r_t = \sigma(W_r x_t + U_r h_{t-1})    (1)
z_t = \sigma(W_z x_t + U_z h_{t-1})    (2)
\tilde{h}_t = \tanh(W x_t + U(r_t \odot h_{t-1}))    (3)
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t    (4)
where W_r, W_z, W, U_r, U_z, and U are the weight matrices of the GRU; \sigma denotes the logistic sigmoid function; \tanh denotes the hyperbolic tangent; \odot denotes element-wise multiplication; z_t denotes the update gate, which determines the degree to which the activation of the GRU unit is updated and is jointly decided by the input and the previous hidden state; r_t denotes the reset gate, whose update is analogous to that of z_t; \tilde{h}_t denotes the candidate hidden state; and h_t denotes the hidden state.
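As a quick check of these four equations, the following NumPy sketch runs one GRU step; the dimensions and random weights are illustrative assumptions, and biases are omitted for brevity.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wr, Ur, Wz, Uz, W, U):
    """One GRU step following Eqs. (1)-(4); biases are omitted for brevity."""
    r = sigmoid(Wr @ x_t + Ur @ h_prev)            # reset gate, Eq. (1)
    z = sigmoid(Wz @ x_t + Uz @ h_prev)            # update gate, Eq. (2)
    h_cand = np.tanh(W @ x_t + U @ (r * h_prev))   # candidate state, Eq. (3)
    return (1.0 - z) * h_prev + z * h_cand         # new hidden state, Eq. (4)

# example: 4-dimensional input, 8-dimensional hidden state
rng = np.random.default_rng(0)
Wr, Wz, W = (rng.normal(size=(8, 4)) for _ in range(3))
Ur, Uz, U = (rng.normal(size=(8, 8)) for _ in range(3))
h = np.zeros(8)
for x in rng.normal(size=(5, 4)):                  # a length-5 input sequence
    h = gru_step(x, h, Wr, Ur, Wz, Uz, W, U)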

3.1.3. CBAM (Convolutional Block Attention Module)

CBAM [41] is an attention mechanism used to enhance the efficiency of convolutional neural networks. The model’s representation capacity is enhanced by incorporating an attention mechanism into the convolutional block, allowing the model to prioritize the significant aspects of the input effectively. The CBAM architecture is illustrated in Figure 5.
Channel attention and spatial attention modules are the two sequential stages of the CBAM. In the channel attention module, the input feature map F undergoes global max pooling and global average pooling to obtain F_max^c and F_avg^c, and the channel attention is generated after these pass through the shared MLP. The two output vectors are merged by element-wise addition, compressed by a sigmoid function, and multiplied with the original input feature map to obtain the weighted feature map. The structure can be observed in Figure 6. The formula for channel attention is displayed below.
M_c(F) = \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))) = \sigma(W_1(W_0(F_{avg}^c)) + W_1(W_0(F_{max}^c)))    (5)
For the spatial attention module, global max pooling and global average pooling are applied along the channel axis of the channel attention output to obtain F_max^s and F_avg^s. The resulting feature maps are stacked and reduced to a single-channel feature map by a convolutional layer, then compressed by a sigmoid function and multiplied with the input feature map to obtain the weighted feature map. The resulting structure is illustrated in Figure 7. Below is the formula for spatial attention:
M_s(F) = \sigma(f^{7 \times 7}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])) = \sigma(f^{7 \times 7}([F_{avg}^s; F_{max}^s]))    (6)
The CBAM enables the model to dynamically acquire knowledge about the significance of each channel and location, hence enhancing the model’s capacity for expression. The CBAM exhibits clear advantages over the SE attention mechanism regarding channel characteristics, spatial characteristics, computational efficiency, and scalability.
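A small NumPy sketch of Equations (5) and (6) may make the two stages concrete. It assumes a ReLU hidden layer in the shared MLP, as in the original CBAM design, and uses a stand-in callable for the learned 7 × 7 convolution.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W0, W1):
    """Eq. (5): shared MLP over globally average- and max-pooled descriptors.
    F has shape (C, H, W); a ReLU hidden layer is assumed in the MLP."""
    f_avg, f_max = F.mean(axis=(1, 2)), F.max(axis=(1, 2))   # F_avg^c, F_max^c
    mc = sigmoid(W1 @ np.maximum(W0 @ f_avg, 0.0)
                 + W1 @ np.maximum(W0 @ f_max, 0.0))
    return F * mc[:, None, None]                             # reweight channels

def spatial_attention(F, conv7x7):
    """Eq. (6): convolution over stacked channel-wise avg/max maps;
    conv7x7 is any callable mapping (2, H, W) -> (H, W)."""
    stacked = np.stack([F.mean(axis=0), F.max(axis=0)])      # F_avg^s, F_max^s
    return F * sigmoid(conv7x7(stacked))                     # reweight locations

# usage sketch: 32 channels on an 8 x 8 map, random weights,
# and an averaging stand-in for the learned 7 x 7 convolution
rng = np.random.default_rng(0)
F = rng.normal(size=(32, 8, 8))
W0, W1 = rng.normal(size=(4, 32)), rng.normal(size=(32, 4))
F = spatial_attention(channel_attention(F, W0, W1), lambda s: s.mean(axis=0))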
Meanwhile, the CBAM can process the high-dimensional spatio-temporal features extracted by the CNN, and the combination of the two handles the spatio-temporal characteristics of load data well. Since BiGRU alone does not account for spatial correlations among data points, this paper combines the CNN and CBAM with BiGRU to enhance prediction accuracy.

3.2. Improved Whale Optimization Algorithm (IWOA)

An IWOA is introduced to address the limitations of the standard WOA, including its sensitivity to parameters, susceptibility to local optima, and heavy reliance on the initial solution. The proposed algorithm incorporates a good point set for initialization, an improved convergence factor, and a friend variation mechanism.

3.2.1. WOA

Mirjalili and Lewis developed the WOA, a heuristic optimization technique, in 2016 [16]. The algorithm was inspired by the hunting behaviour of humpback whales and mathematically models the whales' round-up behaviour and the attack mechanism of bubble-net foraging. The WOA has few control parameters, is uncomplicated to implement, and has effective global search capability. It follows a three-step process that mimics the distinctive hunting behaviour of humpback whales: encircling prey, bubble-net foraging, and searching for prey. The WOA treats each humpback whale's location as a candidate solution, and the optimal solution is obtained by repeatedly updating the whales' locations within the solution space.
1. Encircling prey
The WOA postulates that the current best candidate solution is the target prey or close to the optimum. After the best search agent is identified, the other search agents update their positions toward it. The following formulae express this behaviour.
D = |C \cdot X^*(t) - X(t)|    (7)
X(t+1) = X^*(t) - A \cdot D    (8)
where t denotes the current iteration number and A and C denote coefficient vectors. The position vector X^*(t) represents the best solution obtained so far and is updated at each iteration whenever a superior solution is found. A and C are calculated as follows.
A = 2a \cdot r_1 - a    (9)
C = 2 r_2    (10)
a = 2 - 2t / T_{max}    (11)
where the value of a drops linearly from 2 to 0 over the iterations, r_1 and r_2 are random vectors in [0, 1], and T_{max} is the maximum number of iterations.
2. Bubble-net foraging
Humpback whale predation mainly occurs through bubble-net feeding and encircling. The position update of the humpback whale relative to its prey during bubble-net feeding is modelled by the following logarithmic spiral equation.
X(t+1) = D' \cdot e^{bl} \cdot \cos(2\pi l) + X^*(t)    (12)
D' = |X^*(t) - X(t)|    (13)
where D' represents the distance vector between the current searching individual and the current optimal solution, b denotes a constant determining the shape of the spiral, and l is a random number uniformly distributed in [-1, 1].
Meanwhile, when closing in on the prey, the WOA exhibits two predatory behaviours: shrinking encirclement and bubble-net (spiral) predation. The probability p determines the choice between them, and the position is updated according to the formula below:
X(t+1) = \begin{cases} X^*(t) - A \cdot D, & p < 0.5 \\ D' \cdot e^{bl} \cos(2\pi l) + X^*(t), & p \ge 0.5 \end{cases}    (14)
where p, the probability of each predation mechanism, is a random number in [0, 1].
As the iteration count t increases, both the convergence factor a and the parameter A steadily decrease. When |A| < 1, the whales progressively encircle the current optimal solution; this constitutes the local exploitation phase of the WOA.
3. Searching for prey
To ensure that all whales explore the solution space thoroughly, the WOA updates each whale's position based on its distance from other whales, thereby promoting randomized search. Thus, if |A| ≥ 1, the searching individual swims toward a randomly chosen whale, as follows:
D = |C \cdot X_r(t) - X(t)|    (15)
X(t+1) = X_r(t) - A \cdot D    (16)
where D denotes the distance between the randomly selected individual and the current searching individual, and X_r(t) denotes the position vector of the randomly selected individual.
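The update rules of Equations (7)-(16) can be condensed into a few lines of NumPy. The sketch below is illustrative, not the paper's code: fitness evaluation and bound handling are omitted, and the convergence factor is passed in so that either the linear schedule of Equation (11) or the improved schedule introduced later can be used.

import numpy as np

def woa_update(X, X_best, a, b=1.0, rng=np.random.default_rng()):
    """One generation of WOA position updates per Eqs. (7)-(16).
    X: population of shape (N, dim); X_best: best solution so far;
    a: convergence factor (Eq. (11) in plain WOA, Eq. (18) in IWOA)."""
    N, dim = X.shape
    X_new = np.empty_like(X)
    for i in range(N):
        r1, r2 = rng.random(dim), rng.random(dim)
        A, C = 2 * a * r1 - a, 2 * r2                # Eqs. (9)-(10)
        if rng.random() < 0.5:                       # Eq. (14), first case
            if np.all(np.abs(A) < 1):                # exploit near the best
                D = np.abs(C * X_best - X[i])        # Eq. (7)
                X_new[i] = X_best - A * D            # Eq. (8)
            else:                                    # explore a random whale
                X_rand = X[rng.integers(N)]
                D = np.abs(C * X_rand - X[i])        # Eq. (15)
                X_new[i] = X_rand - A * D            # Eq. (16)
        else:                                        # Eq. (14), spiral case
            l = rng.uniform(-1.0, 1.0)
            D_prime = np.abs(X_best - X[i])          # Eq. (13)
            X_new[i] = D_prime * np.exp(b * l) * np.cos(2 * np.pi * l) + X_best  # Eq. (12)
    return X_new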
The Whale Optimization Algorithm (WOA) offers several advantages, including simplicity in implementation, fewer parameters to tune compared to other optimization algorithms, and strong global search capability. It effectively balances exploration and exploitation during the optimization process, making it suitable for solving complex and high-dimensional optimization problems. Additionally, WOA has demonstrated robustness and efficiency in finding optimal or near-optimal solutions across various applications.
However, the WOA also has some disadvantages: (1) Its performance is highly influenced by its parameters, such as the whales' initial positions, step size, and direction; inadequate parameter selection can result in suboptimal performance or even convergence to a local optimum. (2) Because the simulated whale behaviour is stochastic, the algorithm may become trapped in a local optimum during the search and fail to find the global optimum. (3) The WOA relies significantly on the initial solution; if the initial solution is poorly chosen, the search efficiency and outcome both suffer.

3.2.2. Improvement of the Initial Population

Generating the initial population with a good point set makes it more evenly distributed in the search space, which makes the global optimum easier to find. Let G_s denote the unit cube in s-dimensional Euclidean space, and construct the point set

P_n(k) = \{ (\{r_1 \cdot k\}, \{r_2 \cdot k\}, \ldots, \{r_s \cdot k\}), \; 1 \le k \le n \}    (17)

whose deviation \varphi(n) satisfies \varphi(n) = C(r, \varepsilon) \, n^{-1+\varepsilon}, where C(r, \varepsilon) is a constant depending only on r and \varepsilon (\varepsilon is an arbitrary positive number) and \{\cdot\} denotes the fractional part; P_n(k) is then called a good point set with good point r. Here, r is taken as r_k = \{2 \cos(2\pi k / p)\}, 1 \le k \le s, where p denotes the smallest prime number satisfying (p - 3)/2 \ge s.
For comparison, two-dimensional initial populations of 80 individuals were generated with the good point set method and with the random method; the results are shown in Figures 8 and 9. Evidently, for the same number of points, the good point set method yields a more even distribution than the random method. Hence, mapping the good points of G_s onto the target solution space makes the initial population easier to search and improves the chances of attaining the global optimum.
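The construction is short enough to reproduce directly. The following NumPy sketch builds the good point set of Equation (17) and maps it into arbitrary search bounds; the 80-point, two-dimensional call mirrors the comparison in Figures 8 and 9.

import numpy as np

def smallest_prime(s):
    """Smallest prime p satisfying (p - 3) / 2 >= s."""
    def is_prime(m):
        return m > 1 and all(m % k for k in range(2, int(m ** 0.5) + 1))
    p = 2 * s + 3
    while not is_prime(p):
        p += 1
    return p

def good_point_set(n, s, lb, ub):
    """Eq. (17): map the good point set on the unit cube G_s into the
    search bounds [lb, ub]; np.mod(., 1) takes the fractional part {.}."""
    p = smallest_prime(s)
    r = 2.0 * np.cos(2.0 * np.pi * np.arange(1, s + 1) / p)   # the good point r
    k = np.arange(1, n + 1).reshape(-1, 1)
    unit = np.mod(k * r, 1.0)                                 # (n, s) points in [0, 1)
    return lb + unit * (ub - lb)

# 80 two-dimensional individuals, cf. the comparison in Figures 8 and 9
pop = good_point_set(80, 2, lb=np.array([-5.0, -5.0]), ub=np.array([5.0, 5.0]))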

3.2.3. Convergence Factors

When the convergence factor a is large, the algorithm is more proficient at global search; when a is small, it is more capable of local search. To balance the two, this paper updates the convergence factor as follows:
a = \begin{cases} a_1 + (a_0 - a_1) \left[ \dfrac{1 + \cos((t-1)\pi / (t_{max}-1))}{2} \right]^n, & t \le t_{max}/2 \\ a_1 + (a_0 - a_1) \left[ \dfrac{1 - \cos((t-1)\pi / (t_{max}-1))}{2} \right]^n, & t_{max}/2 < t \le t_{max} \end{cases}    (18)
where a_0 = 2, a_1 = 0, n = 0.4, and t_{max} is the maximum number of iterations.
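As a reference point, a direct transcription of Equation (18) as reconstructed above is given below; if the original piecewise form differs, only the two branch expressions need adjusting.

import numpy as np

def improved_a(t, t_max, a0=2.0, a1=0.0, n=0.4):
    """Cosine-based convergence factor of Eq. (18), replacing the linear
    decay of Eq. (11); the branch switches at the midpoint t_max / 2."""
    c = np.cos((t - 1) * np.pi / (t_max - 1))
    half = (1 + c) / 2 if t <= t_max / 2 else (1 - c) / 2
    return a1 + (a0 - a1) * half ** n

# example: the factor over 10 iterations
print([round(improved_a(t, 10), 3) for t in range(1, 11)])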

3.2.4. Mechanisms for Friend Variation

The extent of an individual's circle of friends is defined in terms of Euclidean distance:
Dis_i(t) = \| X_i(t) - X_i^{WOA}(t+1) \|    (19)
where Dis_i(t) denotes the Euclidean distance between the previous-generation individual X_i(t) and the conventionally updated current-generation individual X_i^{WOA}(t+1). The friend group is then defined by the following condition:
PF_i(t) = \{ X_j(t) \mid D_i(X_i(t), X_j(t)) \le Dis_i(t), \; X_j(t) \in Pop \}    (20)
where PF_i(t) is the friend group of X_i(t) and D_i(X_i(t), X_j(t)) is the Euclidean distance between X_i(t) and X_j(t).
Since each individual's behaviour resembles that of its friends, friends are selected from the friend group for the position update as follows:
X_i^{Fri,d}(t+1) = \begin{cases} X_{i,d}(t) + rand \times (X_{n1,d}(t) - X_{r1,d}(t)) + rand \times (X_{n2,d}(t) - X_{r2,d}(t)), & p < 0.5 \\ X_{i,d}(t) + rand \times (X_{best,d}(t) - X_{r1,d}(t)), & p \ge 0.5 \end{cases}    (21)
where X_i^{Fri,d}(t+1) is the updated individual, X_{i,d}(t) is the previous-generation individual, X_{n1,d}(t) and X_{n2,d}(t) are friends randomly selected from the friend group, X_{r1,d}(t) and X_{r2,d}(t) are individuals randomly selected from the population, X_{best,d}(t) is the globally optimal individual, and rand is a random number drawn from the standard normal distribution. Greedy updating is then performed:
X_i(t+1) = \begin{cases} X_i^{WOA}(t+1), & f(X_i^{WOA}) < f(X_i^{Fri}) \\ X_i^{Fri}(t+1), & \text{otherwise} \end{cases}    (22)
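The following NumPy sketch implements Equations (19)-(22) for a whole population. It is a plausible reading rather than the authors' code; in particular, it applies the p < 0.5 branch per individual rather than per dimension.

import numpy as np

def friend_variation(pop, X_woa, X_best, fitness, rng=np.random.default_rng()):
    """Friend variation per Eqs. (19)-(22): mutate each individual toward
    random 'friends' (neighbours within the distance it moved this
    generation), then greedily keep the better of the WOA update and the
    friend update."""
    N, dim = pop.shape
    new_pop = X_woa.copy()
    for i in range(N):
        radius = np.linalg.norm(pop[i] - X_woa[i])          # Eq. (19)
        dists = np.linalg.norm(pop - pop[i], axis=1)
        friends = np.where(dists <= radius)[0]              # Eq. (20)
        if friends.size == 0:
            continue
        if rng.random() < 0.5:                              # Eq. (21), case 1
            n1, n2 = rng.choice(friends, size=2)
            r1, r2 = rng.integers(N, size=2)
            X_fri = (pop[i]
                     + rng.standard_normal(dim) * (pop[n1] - pop[r1])
                     + rng.standard_normal(dim) * (pop[n2] - pop[r2]))
        else:                                               # Eq. (21), case 2
            X_fri = pop[i] + rng.standard_normal(dim) * (X_best - pop[rng.integers(N)])
        if fitness(X_fri) < fitness(X_woa[i]):              # Eq. (22), greedy step
            new_pop[i] = X_fri
    return new_pop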
The flow of IWOA is as follows:
  • Initialize the population with the good point set method;
  • When p < 0.5, perform the encircling prey behaviour; otherwise, perform the bubble-net behaviour; update the individual X_i and calculate its fitness value;
  • Update the individuals X_i^{Fri} using Equation (21) and calculate their fitness values;
  • Update the individuals in the population using Equation (22);
  • Update the convergence factor a;
  • Determine whether the termination condition is reached; if so, output the optimal solution and stop; otherwise, return to step 2 and continue the loop.
Pseudocode of the IWOA is described in Algorithm 1.
Algorithm 1: Pseudocode of the IWOA.
Input: Number of search agents: N, Dim, tmax.
Output: Optimal fitness value.
Generate the search agent’s initial position by using the good point set method.
Calculate each search agent’s fitness value.
Select the search agent with the best fitness as the lead whale.
While t < tmax
Calculate parameter a by Equation (18).
Calculate parameters A and C by Equations (9) and (10).
If p < 0.5
   If |A| < 1
     Apply Equation (8) to update the current search agent’s position.
   Else
     Apply Equation (16) to update the current search agent’s position.
   End if
Else
   Apply Equation (12) to update the current search agent’s position.
End if
Calculate the fitness value named Fit1 for current search agents.
Calculate the friends radius for current search agents by Equation (19).
Determine friends of every current search agent using Equation (20).
Update the current search agent’s new position named XFri using Equation (21).
Calculate the fitness value named Fit2 of XFri.
Update the current search agent’s position using Equation (22).
Should a better solution emerge, update X*.
t = t + 1
End while
Return X* and optimal fitness value.
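To tie the pieces together, here is a minimal end-to-end driver in the spirit of Algorithm 1. It reuses the good_point_set, improved_a, woa_update, and friend_variation sketches given earlier (a sketch built on sketches, not the authors' MATLAB implementation) and clips positions to the search bounds after each move.

import numpy as np

def iwoa(fitness, dim, lb, ub, N=30, t_max=200, rng=np.random.default_rng(0)):
    """Minimal IWOA driver following Algorithm 1."""
    pop = good_point_set(N, dim, lb, ub)             # step 1: good point set init
    fit = np.array([fitness(x) for x in pop])
    best = pop[fit.argmin()].copy()                  # lead whale
    for t in range(1, t_max + 1):
        a = improved_a(t, t_max)                     # Eq. (18)
        X_woa = np.clip(woa_update(pop, best, a, rng=rng), lb, ub)
        pop = np.clip(friend_variation(pop, X_woa, best, fitness, rng), lb, ub)
        fit = np.array([fitness(x) for x in pop])
        if fit.min() < fitness(best):                # keep the global best
            best = pop[fit.argmin()].copy()
    return best, fitness(best)

# usage: minimize the 10-dimensional sphere function
best, val = iwoa(lambda x: float(np.sum(x ** 2)), dim=10,
                 lb=np.full(10, -5.0), ub=np.full(10, 5.0))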
Figure 10 shows the IWOA flowchart.

4. Experiments and Analyses

4.1. Data Sources, Environmental Configuration, and Evaluation Indicators

This study selects electricity load data samples from a southern region, specifically from 1 January 2017 0:00:00 to 31 January 2017 23:45:00. A total of 96 data points are gathered daily, with a time interval of 15 min. The training set to test set ratio is 7:3. The computer environment used 16 GB of RAM, an NVIDIA GeForce RTX 3060 Laptop GPU, and an AMD Ryzen 7-5800H processor with Radeon Graphics. All experiments were conducted using MATLAB 2021a simulation software.
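For reproducibility, data preparation can be as simple as the following sketch; the file name is hypothetical, and the one-day (96-point) look-back window is an assumption, since the exact input window is not stated.

import numpy as np

def make_windows(series, lookback=96, horizon=1):
    """Slice a 15-min load series (96 points/day) into supervised samples:
    each window of `lookback` past values predicts the value `horizon`
    steps ahead."""
    X, y = [], []
    for i in range(len(series) - lookback - horizon + 1):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback + horizon - 1])
    return np.array(X), np.array(y)

load = np.loadtxt("load_jan2017.csv")      # hypothetical file name
X, y = make_windows(load)
split = int(0.7 * len(X))                  # 7:3 train/test ratio
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]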
Table 1 specifies the model parameters.
This research applies commonly used evaluation indices, namely, RMSE, R2 and MAE, to compare the predictive performance of different models. The calculation of these indices is as follows:
RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 }    (23)
R^2 = 1 - \frac{ \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 }{ \sum_{i=1}^{n} (\bar{y} - y_i)^2 }    (24)
MAE = \frac{1}{n} \sum_{i=1}^{n} | y_i - \hat{y}_i |    (25)
where y_i denotes the true value of the i-th sample point, \hat{y}_i denotes its predicted value, n denotes the total number of test samples, and \bar{y} denotes the mean of all true values.
The RMSE, which is highly responsive to both large and small errors in a set of results, effectively measures prediction accuracy; the smaller the RMSE, the more precise the load forecast. R2 evaluates how closely the predicted values match the true values; a value closer to 1 indicates higher model accuracy. MAE is insensitive to outliers, and a smaller MAE indicates a better fit.
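These three indices are one-liners in NumPy; the definitions below follow Equations (23)-(25) directly.

import numpy as np

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))        # Eq. (23)

def r2(y, y_hat):
    return float(1.0 - np.sum((y_hat - y) ** 2)
                 / np.sum((np.mean(y) - y) ** 2))           # Eq. (24)

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))                # Eq. (25)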

4.2. Model Validation

The IWOA-CNN-BiGRU-CBAM is compared with WOA-CNN-BiGRU-Attention and WOA-CNN-BiGRU-CBAM on the dataset. In each case, the hyperparameters chosen for optimization are the model's learning rate, the number of hidden neurons in the BiGRU layer, and the L2 regularization coefficient; the number of search agents is 6, and the number of iterations is 10. The model parameters are detailed in Table 1. Figure 11 shows the training-loss curve of the IWOA-CNN-BiGRU-CBAM model, Figure 12 compares the predictions of the three models, and Table 2 gives the prediction accuracy of the three models. All results are averages over 50 runs of each model, and the variance across these 50 runs is also recorded in Table 2.
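In code, this hyperparameter search amounts to wrapping model training in a fitness function for the IWOA driver sketched earlier. In the sketch below, train_and_eval is a hypothetical stand-in for training the CNN-BiGRU-CBAM model and returning its validation RMSE, and the search bounds are assumptions.

import numpy as np

def train_and_eval(lr, hidden, l2):
    """Hypothetical stand-in: train the CNN-BiGRU-CBAM model with the given
    hyperparameters and return its validation RMSE. A synthetic placeholder
    is used here so the sketch runs end to end."""
    return (np.log10(lr) + 2.5) ** 2 + 1e-3 * (hidden - 16) ** 2 + (np.log10(l2) + 4.0) ** 2

def fitness(agent):
    """Decode a 3-dimensional search agent into the tuned hyperparameters."""
    lr = 10.0 ** agent[0]               # learning rate on a log scale
    hidden = int(round(agent[1]))       # hidden neurons in the BiGRU layer
    l2 = 10.0 ** agent[2]               # L2 regularization coefficient
    return train_and_eval(lr, hidden, l2)

lb = np.array([-4.0, 4.0, -6.0])        # assumed lower search bounds
ub = np.array([-1.0, 64.0, -2.0])       # assumed upper search bounds
best_agent, best_rmse = iwoa(fitness, dim=3, lb=lb, ub=ub, N=6, t_max=10)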
As shown in Figure 12 and Table 2, comparing WOA-CNN-BiGRU-CBAM with WOA-CNN-BiGRU-Attention, RMSE and MAE decrease by 14.5448 and 12.9315, respectively, and R2 improves by 0.00206, indicating that adding the CBAM enhances prediction accuracy and model fit. Comparing IWOA-CNN-BiGRU-CBAM with WOA-CNN-BiGRU-CBAM, RMSE and MAE are reduced by 18.5781 and 25.8940, respectively, and R2 improves by 0.00224, suggesting that the IWOA outperforms the WOA. In conclusion, the prediction accuracy and fit of the IWOA-CNN-BiGRU-CBAM model established in this study are excellent. Comparing the variances of RMSE and MAE across the three models also shows that the IWOA-CNN-BiGRU-CBAM model is the most stable.

4.3. Validation of IWOA

The performance of the algorithms is evaluated on the CEC2022 test function set, on which the IWOA is compared with WOA, msWOA [42], mFOA [43], mPSO [44], and mSCA [45], each run 100 times. The worst, best, and mean values and the standard deviation are taken as evaluation indices, and Table 3 depicts the results. The overall results were further subjected to the Friedman test [46]. Functions F1, F3, F5, and F9 were tested with dimension 2; functions F2, F4, F6, F10, F11, and F12 with dimension 10; and functions F7 and F8 with dimension 20. Figure 13 displays the convergence curves of each algorithm on the F2, F3, F8, and F12 functions.
As shown in Table 3, when the IWOA optimizes the F1, F2, F3, F5, F6, F8, F9, F10, and F12 functions, its optimal value is closest to the theoretical optimum, its mean value is the best among the algorithms, and its standard deviation is smaller, indicating that its optimization is more stable and its performance is better. To further assess the strengths and weaknesses of the algorithms, the results in the table were subjected to non-parametric tests, with the results displayed in Table 4. The IWOA achieves the highest ranking, and its optimization effect is notably outstanding.
Figure 13 shows that the IWOA performs better in optimizing the F2, F3, F8, and F12 functions. The curves fluctuate more at first but converge rapidly in the early phases; the fluctuations diminish as the iterations progress, and the downward trend of the curves indicates that the search agents are effectively collaborating to update their positions toward better results. Table 3 and Table 4 provide evidence that the IWOA exhibits superior convergence behaviour and a better capacity to balance exploitation and exploration throughout the iteration process compared to the other algorithms.

4.4. Comparative Experiments

In this section, all the model parameters involved are depicted in Table 1, with the average results obtained from 50 runs of the model.
Firstly, the hidden neuron number in the BiGRU layer, the learning rate, the L2 regularization coefficients, the number of searching agents (fixed at six), and the number of iterations (maintained at ten) are optimized by the IWOA algorithm for the CNN-BiGRU, CNN-BiGRU-Attention, and CNN-BiGRU-CBAM models, respectively. The mentioned models are employed for the experiments on the selected dataset, and the forecasts’ accuracy findings are illustrated in Table 5.
In Table 5, introducing the CNN reduces the prediction error RMSE by 13.77% and MAE by 6.81% and improves R2 by 0.00541. Introducing the attention mechanism reduces the RMSE from 158.6322 to 111.2914, a reduction of 29.84%, reduces MAE by 26.15%, and improves R2 by 0.00794, exceeding the gains from introducing the CNN; the attention mechanism thus has a substantial effect on the model's predictive performance. Introducing the CBAM in place of SE reduces the RMSE by 22.34% and MAE by 34.39% and improves R2 by 0.00304, demonstrating the superiority of the CBAM.
Then, five algorithms such as WOA, msWOA, mFOA, mPSO, and mSCA, are optimized for, respectively, hidden neuron count in the BiGRU layer, learning rate, and L2 regularization coefficients of the CNN-BiGRU-CBAM model; this model optimized with IWOA was used for experiments on the selected dataset. Table 6 illustrates the load-forecasting accuracy of various modelling approaches presented.
According to Table 6, the proposed model has the smallest prediction error of the six models. Its RMSE is 86.4299, its R2 is 0.99529, and its MAE is 56.0482, improvements of 18.5781, 0.00224, and 25.8940, respectively, over the WOA-optimized model, which shows that the influence of the IWOA is mainly reflected in the significant reduction of RMSE and MAE. The comparison with msWOA confirms the effectiveness of the improvement measures applied to the WOA in this research. Comparing the IWOA-optimized model against the three improved algorithms mFOA, mPSO, and mSCA, the RMSE and MAE decrease considerably, so prediction precision is greatly enhanced, while the improvement in R2 indicates a better model fit. To summarize, the proposed model improves on all three indicators, RMSE, MAE, and R2, indicating that the overall prediction accuracy and performance of the proposed IWOA-CNN-BiGRU-CBAM model are greatly improved; the efficacy of the model described in this research is thus validated.
Then, the predictions of BiLSTM, IWOA-CNN-BiLSTM-Attention, IWOA-CNN-BiLSTM-CBAM, and IWOA-CNN-BiGRU-CBAM models are compared with each other, and Table 7 depicts the corresponding results.
In Table 7, comparing IWOA-CNN-BiLSTM-Attention with BiLSTM, RMSE and MAE are reduced by 202.2185 and 113.7977, respectively, and R2 improves by 0.03536. Comparing IWOA-CNN-BiLSTM-CBAM with IWOA-CNN-BiLSTM-Attention, RMSE and MAE are reduced by 30.4749 and 25.0097, respectively, and R2 improves by 0.01123. Comparing IWOA-CNN-BiGRU-CBAM with IWOA-CNN-BiLSTM-CBAM, RMSE and MAE decrease by 68.2536 and 81.1756, respectively, and R2 improves by 0.02282. As seen from the above, the proposed IWOA-CNN-BiGRU-CBAM model is more effective than BiLSTM and the other models.

5. Conclusions

An IWOA-CNN-BiGRU-CBAM model is proposed to address the challenge of short-term power load forecasting. To counter the SE attention mechanism's inability to capture information effectively in the spatial dimension and its limited applicability to scenarios with many channels, the CBAM is adopted as a replacement. Meanwhile, considering that the WOA can fall into local optima and depends strongly on the initial solution, an improved WOA is proposed and employed for hyperparameter training of CNN-BiGRU-CBAM to enhance the precision of model training. The proposed IWOA-CNN-BiGRU-CBAM model is applied to the power load dataset for short-term forecasting. The experimental results indicate that the CBAM significantly enhances the model's generalization potential, and the proposed IWOA exhibits enhanced optimization search capability and is well suited to the hyperparameter optimization problem of the CNN-BiGRU-CBAM model. Compared with other models, IWOA-CNN-BiGRU-CBAM improves the R2 metric by 0.00224, reduces RMSE by 18.5781, and reduces MAE by 25.8940. The prediction accuracy and interval coverage of the proposed model on the dataset are higher, indicating superior generalization capacity. Future work will enhance the model's multi-scenario processing capability by incorporating meteorological factors such as rainfall, temperature, and humidity, as well as holidays, to address the more sophisticated task of short-term load forecasting in complex settings.

Author Contributions

Conceptualization, L.D. and H.W.; methodology, L.D.; software, L.D.; validation, L.D. and H.W.; formal analysis, L.D.; investigation, L.D.; resources, L.D.; data curation, L.D.; writing—original draft preparation, L.D.; writing—review and editing, H.W. and L.D.; visualization, L.D.; supervision, L.D.; project administration, H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the 2023 University Students' Innovation and Entrepreneurship Training Project of China University of Geosciences, Beijing (Grant No. S202311415157), and by the 2024 Special Projects for Graduate Education and Teaching Reform of China University of Geosciences, Beijing (Grant Nos. JG2024021 and JG2024013).

Data Availability Statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
Adam: Adaptive Moment Estimation
AI: Artificial Intelligence
ANN: Artificial Neural Network
ARIMA: Autoregressive Integrated Moving Average
BiLSTM: Bidirectional Long Short-Term Memory
BiGRU: Bidirectional Gated Recurrent Unit
BiGRU-SENet: Bidirectional Gated Recurrent Units and Squeeze-and-Excitation Networks
BP: Back Propagation
CBAM: Convolutional Block Attention Module
CEEMDAN: Complete Ensemble Empirical Mode Decomposition with Adaptive Noise
CNN: Convolutional Neural Network
CNN-BiGRU: Convolutional Neural Network-Bidirectional Gated Recurrent Unit
CNN-BiGRU-Attention: Convolutional Neural Network-Bidirectional Gated Recurrent Unit-Squeeze-and-Excitation Block
CNN-BiGRU-CBAM: Convolutional Neural Network-Bidirectional Gated Recurrent Unit-Convolutional Block Attention Module
CS: Cuckoo Search
ECA: Enhanced Clustering Algorithm
FARIMA: Fractional Autoregressive Integral Moving Average
GM: Grey Model
GRU: Gated Recurrent Unit
GWO: Grey Wolf Optimizer
IWOA: Improved Whale Optimization Algorithm
LSTM: Long Short-Term Memory
mFOA: Moth-Flame Optimization Algorithm
mPSO: Modified Particle Swarm Optimization
mSCA: Modified Sine Cosine Algorithm
msWOA: Multi-Strategy Whale Optimization Algorithm
MVO: Multiverse Optimization
PSO: Particle Swarm Optimization
PV: Photovoltaic
RAdam: Rectified Adaptive Moment Estimation
RBFNN: Radial Basis Function Neural Network
SE: Squeeze-and-Excitation
SENet: Squeeze-and-Excitation Networks
STLF: Short-Term Power Load Forecasting
SVM: Support Vector Machine
VMD: Variational Mode Decomposition
WOA: Whale Optimization Algorithm
WT: Wavelet Transform
XGBoost: eXtreme Gradient Boosting
F: an intermediate feature map
F_avg^c, F_avg^s: average-pooled features
F_max^c, F_max^s: max-pooled features
f^{7×7}: a convolution operation with a filter size of 7 × 7
M_c: a 1D channel attention map
M_s: a 2D spatial attention map
MAE: Mean Absolute Error
R2: Coefficient of Determination
RMSE: Root Mean Squared Error
σ: the sigmoid function

References

  1. Sheng, Z.; An, Z.; Wang, H.; Chen, G.; Tian, K. Residual LSTM based short-term load forecasting. Appl. Soft Comput. J. 2023, 144, 110461. [Google Scholar] [CrossRef]
  2. Anh, N.N.; Dat, D.T.; Elena, V.; Vijender, K.S. Short-term forecasting electricity load by long short-term memory and reinforcement learning for optimization of hyper-parameters. Evol. Intell. 2023, 16, 1729–1746. [Google Scholar]
  3. Kim, D.; Lee, D.; Nam, H.; Joo, S.K. Short-Term Load Forecasting for Commercial Building Using Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) Network with Similar Day Selection Model. J. Electr. Eng. Technol. 2023, 18, 4001–4009. [Google Scholar] [CrossRef]
  4. Dima, A.; Mark, L. Short-term load forecasting in smart meters with sliding window-based ARIMA algorithms. Vietnam J. Comput. Sci. 2018, 5, 241–249. [Google Scholar]
  5. Mi, J.; Fan, L.; Duan, X.; Qiu, Y. Short-Term Power Load Forecasting Method Based on Improved Exponential Smoothing Grey Model. Math. Probl. Eng. 2018, 2018, 3894723. [Google Scholar] [CrossRef]
  6. Shalini, S.; Angshul, M.; Victor, E.; Emilie, C. Blind Kalman Filtering for Short-Term Load Forecasting. IEEE Trans. Power Syst. 2020, 35, 4916–4919. [Google Scholar]
  7. Pang, X.; Sun, W.; Li, H.; Wang, Y.; Luan, C. Short-term power load forecasting based on gray relational analysis and support vector machine optimized by artificial bee colony algorithm. Peer J. Comput. Sci. 2022, 8, e1108. [Google Scholar] [CrossRef] [PubMed]
  8. Jin, Y.; Guo, H.; Wang, J.; Song, A. A Hybrid System Based on LSTM for Short-Term Power Load Forecasting. Energies 2020, 13, 6241. [Google Scholar] [CrossRef]
  9. Zhao, H.; Zhou, Z.; Zhang, P. Forecasting of the Short-Term Electricity Load Based on WOA-BILSTM. Int. J. Pattern Recognit. Artif. Intell. 2023, 37, 272–286. [Google Scholar] [CrossRef]
  10. Gao, X.; Li, X.; Zhao, B.; Ji, W.; Jing, X.; He, Y. Short-Term Electricity Load Forecasting Model Based on EMD-GRU with Feature Selection. Energies 2019, 12, 1140. [Google Scholar] [CrossRef]
  11. Ji, X.; Liu, D.; Xiong, P. Multi-model fusion short-term power load forecasting based on improved WOA optimization. Math. Biosci. Eng. 2022, 19, 13399–13420. [Google Scholar] [CrossRef] [PubMed]
  12. Chu, Z.; Tian, P.; Shahzad, N.M. A novel integrated photovoltaic power forecasting model based on variational mode decomposition and CNN-BiGRU considering meteorological variables. Electr. Power Syst. Res. 2022, 213, 108796. [Google Scholar]
  13. Dao, F.; Zeng, Y.; Qian, J. Fault diagnosis of hydro-turbine via the incorporation of bayesian algorithm optimized CNN-LSTM neural network. Energy 2024, 290, 2901–2916. [Google Scholar] [CrossRef]
  14. Li, D.; Wang, X.; Sun, J.; Feng, Y. Radial Basis Function Neural Network Model for Dissolved Oxygen Concentration Prediction Based on an Enhanced Clustering Algorithm and Adam. IEEE Access 2021, 9, 44521–44533. [Google Scholar] [CrossRef]
  15. Ge, L.; Xian, Y.; Wang, Z.; Gao, B.; Chi, F.; Sun, K. A GWO-GRNN based model for short-term load forecasting of regional distribution network. CSEE J. Power Energy Syst. 2020, 7, 1093–1101. [Google Scholar]
  16. Mirjalili, S.; Lewis, A. The Whale Optimization Algorithm. Adv. Eng. Softw. 2016, 95, 51–67. [Google Scholar] [CrossRef]
  17. Wu, F.; Cattani, C.; Song, W.; Zio, E. Fractional ARIMA with an improved cuckoo search optimization for the efficient Short-term power load forecasting. Alex. Eng. J. 2020, 59, 3111–3118. [Google Scholar] [CrossRef]
  18. Bahrami, S.; Hooshmand, R.-A.; Parastegari, M. Short term electric load forecasting by wavelet transform and grey model improved by PSO (particle swarm optimization) algorithm. Energy 2014, 72, 434–442. [Google Scholar] [CrossRef]
  19. Abu-Shikhah, N.; Elkarmi, F.; Aloquili, O.M. Medium-Term Electric Load Forecasting Using Multivariable Linear and Non-Linear Regression. Smart Grid Renew. Energy 2011, 2, 126–135. [Google Scholar] [CrossRef]
  20. Zhang, X.; Wang, J.; Zhang, K. Short-term electric load forecasting based on singular spectrum analysis and support vector machine optimized by Cuckoo search algorithm. Electr. Power Syst. Res. 2017, 146, 270–285. [Google Scholar] [CrossRef]
  21. Liu, T.; Fan, D.; Chen, Y.; Dai, Y.; Jiao, Y.; Cui, P.; Wang, Y.; Zhu, Z. Prediction of CO2 solubility in ionic liquids via convolutional autoencoder based on molecular structure encoding. AIChE J. 2023, 69, e18182. [Google Scholar] [CrossRef]
22. Fan, D.; Xue, K.; Zhang, R.; Zhu, W.; Zhang, H.; Qi, J.; Zhu, Z.; Wang, Y.; Cui, P. Application of interpretable machine learning models to improve the prediction performance of ionic liquids toxicity. Sci. Total Environ. 2023, 908, 168168.
23. Bian, H.; Zhong, Y.; Sun, J.; Shi, F. Study on power consumption load forecast based on K-means clustering and FCM–BP model. Energy Rep. 2020, 6, 693–700.
24. Lin, W.; Zhang, B.; Li, H.; Lu, R. Short-term load forecasting based on EEMD-Adaboost-BP. Syst. Sci. Control Eng. 2022, 10, 846–853.
25. Liu, T.; Chu, X.; Fan, D.; Ma, Z.; Dai, Y.; Zhu, Z.; Wang, Y.; Gao, J. Intelligent prediction model of ammonia solubility in designable green solvents based on microstructure group contribution. Mol. Phys. 2022, 120, e2124203.
26. Mu, Y.; Wang, M.; Zheng, X.; Gao, H. An improved LSTM-Seq2Seq-based forecasting method for electricity load. Front. Energy Res. 2023, 10, 1093667.
27. Liu, F.; Liang, C. Short-term power load forecasting based on AC-BiLSTM model. Energy Rep. 2024, 11, 1570–1579.
28. Wu, K.; Wu, J.; Feng, L.; Yang, B.; Liang, R.; Yang, S.; Zhao, R. An attention-based CNN-LSTM-BiLSTM model for short-term electric load forecasting in integrated energy system. Int. Trans. Electr. Energy Syst. 2020, 31, 576–583.
29. Jia, T.; Yao, L.; Yang, G.; He, Q. A Short-Term Power Load Forecasting Method of Based on the CEEMDAN-MVO-GRU. Sustainability 2022, 14, 16460.
30. Liang, R.; Chang, X.; Jia, P.; Xu, C. Mine Gas Concentration Forecasting Model Based on an Optimized BiGRU Network. ACS Omega 2020, 5, 28579–28586.
31. Meng, Y.; Chang, C.; Huo, J.; Zhang, Y.; Al-Neshmi, H.M.M.; Xu, J.; Xie, T. Research on Ultra-Short-Term Prediction Model of Wind Power Based on Attention Mechanism and CNN-BiGRU Combined. Front. Energy Res. 2022, 10, 920835.
32. Xu, Y.; Jiang, X. Short-term power load forecasting based on BiGRU-Attention-SENet model. Energy Sources Part A Recovery Util. Environ. Eff. 2022, 44, 973–985.
33. Yang, B.; Wang, Y.; Zhan, Y. Lithium Battery State-of-Charge Estimation Based on a Bayesian Optimization Bidirectional Long Short-Term Memory Neural Network. Energies 2022, 15, 4670.
34. Feng, M.; Duan, Y.; Wang, X.; Zhang, J.; Ma, L. Carbon price prediction based on decomposition technique and extreme gradient boosting optimized by the grey wolf optimizer algorithm. Sci. Rep. 2023, 13, 18447.
35. Guo, J.; Chen, C.; Wen, H.; Cai, G.; Liu, Y. Prediction model of goaf coal temperature based on PSO-GRU deep neural network. Case Stud. Therm. Eng. 2024, 53, 103813.
36. Xiao, M.; Luo, R.; Chen, Y.; Ge, X. Prediction model of asphalt pavement functional and structural performance using PSO-BPNN algorithm. Constr. Build. Mater. 2023, 407, 133534.
37. Luo, J.; Gong, Y. Air pollutant prediction based on ARIMA-WOA-LSTM model. Atmos. Pollut. Res. 2023, 14, 101761.
38. Sun, Y.; Zhang, J.; Yu, Z.; Liu, Z.; Yin, P. WOA (Whale Optimization Algorithm) Optimizes Elman Neural Network Model to Predict Porosity Value in Well Logging Curve. Energies 2022, 15, 4456.
39. Chen, L.; Li, S.; Bai, Q.; Yang, J.; Jiang, S.; Miao, Y. Review of Image Classification Algorithms Based on Convolutional Neural Networks. Remote Sens. 2021, 13, 4712.
40. Mansour, A.A.; Tilioua, A.; Touzani, M. Bi-LSTM, GRU and 1D-CNN models for short-term photovoltaic panel efficiency forecasting case amorphous silicon grid-connected PV system. Results Eng. 2024, 21, 101886.
41. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2018; Volume 11211, pp. 3–19.
42. Yang, W.; Xia, K.; Fan, S.; Wang, L.; Li, T.; Zhang, J.; Feng, Y. A Multi-Strategy Whale Optimization Algorithm and Its Application. Eng. Appl. Artif. Intell. 2022, 108, 104558.
43. Mirjalili, S. Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm. Knowl.-Based Syst. 2015, 89, 228–249.
44. Zhou, H.; Pang, J.; Chen, P.-K.; Chou, F.-D. A modified particle swarm optimization algorithm for a batch-processing machine scheduling problem with arbitrary release times and non-identical job sizes. Comput. Ind. Eng. 2018, 123, 67–81.
45. Gupta, S.; Deep, K. A hybrid self-adaptive sine cosine algorithm with opposition based learning. Expert Syst. Appl. 2019, 119, 210–230.
46. Derrac, J.; García, S.; Molina, D.; Herrera, F. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol. Comput. 2011, 1, 3–18.
Figure 1. Structure of CNN-BiGRU-CBAM.
Figure 2. Structure of CNN.
Figure 3. Structure of GRU.
Figure 4. Structure of BiGRU.
Figure 5. CBAM architecture.
Figure 6. Channel attention structure.
Figure 7. Spatial attention structure.
Figure 8. Initial two-dimensional population generated using the good point set method.
Figure 9. Initial two-dimensional population generated using the stochastic method.
Figure 10. Flowchart of the IWOA.
Figure 11. Training loss of the proposed model.
Figure 12. Comparison of forecasting results.
Figure 13. Convergence diagram.
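Figures 8 and 9 contrast good-point-set initialization with purely stochastic initialization of the whale population. As an illustration only, the Python sketch below reproduces one common good point set construction, r_j = 2cos(2πj/p) with p the smallest prime satisfying p ≥ 2D + 3; the exact construction, population size, and search bounds used in the paper are assumptions here.

```python
import numpy as np

def smallest_prime_at_least(n):
    """Return the smallest prime >= n (simple trial division)."""
    def is_prime(k):
        return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))
    while not is_prime(n):
        n += 1
    return n

def good_point_set(pop_size, dim, lower, upper):
    """Initial population from a good point set (one classical variant).

    Uses r_j = 2*cos(2*pi*j/p), p the smallest prime with p >= 2*dim + 3;
    the i-th individual's j-th coordinate is frac(i * r_j), scaled to
    [lower, upper]. Points spread more evenly than uniform sampling.
    """
    p = smallest_prime_at_least(2 * dim + 3)
    j = np.arange(1, dim + 1)
    r = 2.0 * np.cos(2.0 * np.pi * j / p)          # good point, shape (dim,)
    i = np.arange(1, pop_size + 1).reshape(-1, 1)  # individual indices
    frac = np.mod(i * r, 1.0)                      # fractional parts in [0, 1)
    return lower + frac * (upper - lower)

# Example: 100 individuals in 2-D on [-10, 10]^2, as in Figures 8 and 9
pop_gps = good_point_set(100, 2, -10.0, 10.0)
pop_rand = np.random.uniform(-10.0, 10.0, size=(100, 2))  # stochastic baseline
```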
Table 1. Model parameters.

| Parameter | Value |
|---|---|
| CNN | 2 convolutional layers, 2 pooling layers |
| Convolution kernel size | 3 × 1 |
| Initial learning rate | 0.01 |
| Batch size | 128 |
| Epochs | 500 |
| Hidden neurons of BiGRU | 8 + 8 |
| Hidden neurons of BiLSTM | 8 + 8 |
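As a reading aid for Table 1, the sketch below assembles a CNN-BiGRU-CBAM forecaster in PyTorch with the listed hyperparameters. It is a minimal sketch, not the authors' implementation: the number of CNN filters (16), the CBAM reduction ratio (4), the placement of CBAM after the convolutional block, and the single-feature input are all assumptions that Table 1 does not fix.

```python
import torch
import torch.nn as nn

class CBAM1d(nn.Module):
    """Channel attention followed by spatial attention for 1-D features
    (cf. Figures 5-7)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv1d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                          # x: (batch, C, L)
        avg = self.mlp(x.mean(dim=2))              # channel attention from
        mx = self.mlp(x.amax(dim=2))               # avg- and max-pooled stats
        x = x * torch.sigmoid(avg + mx).unsqueeze(2)
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))  # spatial attention

class CNNBiGRUCBAM(nn.Module):
    def __init__(self, in_features=1, filters=16):
        super().__init__()
        # 2 convolutional + 2 pooling layers, 3x1 kernels (Table 1)
        self.cnn = nn.Sequential(
            nn.Conv1d(in_features, filters, 3, padding=1), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(filters, filters, 3, padding=1), nn.ReLU(), nn.MaxPool1d(2))
        self.cbam = CBAM1d(filters)
        self.bigru = nn.GRU(filters, 8, batch_first=True,
                            bidirectional=True)    # 8 + 8 hidden neurons
        self.head = nn.Linear(16, 1)

    def forward(self, x):                          # x: (batch, seq_len, features)
        z = self.cbam(self.cnn(x.transpose(1, 2)))
        out, _ = self.bigru(z.transpose(1, 2))
        return self.head(out[:, -1, :])            # next load value

model = CNNBiGRUCBAM()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # initial LR 0.01
# training would then run for up to 500 epochs with batches of 128
```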
Table 2. Analysis of the forecasting accuracy of several models.

| Model | RMSE/kW | Var | R2 | MAE/kW | Var |
|---|---|---|---|---|---|
| WOA-CNN-BiGRU-Attention | 119.5528 | 83.5879 | 0.99099 | 94.8737 | 81.6412 |
| WOA-CNN-BiGRU-CBAM | 105.008 | 61.5237 | 0.99305 | 81.9422 | 65.8345 |
| IWOA-CNN-BiGRU-CBAM | 86.4299 | 25.4416 | 0.99529 | 56.0482 | 21.5276 |
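The error measures reported in Table 2 (and in Tables 5–7) can be reproduced as follows. This is a minimal sketch; interpreting the Var columns as the variance of RMSE and MAE over repeated training runs is an assumption.

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """RMSE, MAE and R^2 for a load forecast (values in kW)."""
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2

# The Var columns are consistent with the spread of a metric across
# repeated runs, e.g. (hypothetical `trained_models` and inputs):
# rmses = [forecast_metrics(y_true, m.predict(x))[0] for m in trained_models]
# rmse_var = np.var(rmses)
```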
Table 3. The comparison of obtained solutions.

(a) F1–F6

| Function | Metric | IWOA | WOA | msWOA | mFOA | mPSO | mSCA |
|---|---|---|---|---|---|---|---|
| F1 | best | 300 | 300 | 300 | 300 | 300.0001 | 300.0001 |
| | worst | 300 | 300 | 300.0294 | 300 | 300.3384 | 300.0524 |
| | mean | 300 | 300 | 300.0035 | 300 | 300.0486 | 300.0133 |
| | std | 2.27 × 10−12 | 7.54 × 10−8 | 0.004477 | 0 | 0.058025 | 0.0128 |
| F2 | best | 400.2777 | 401.6992 | 413.2647 | 403.3437 | 408.3338 | 421.0018 |
| | worst | 479.3967 | 717.1292 | 1183.94 | 748.6179 | 857.7683 | 502.4041 |
| | mean | 422.1213 | 458.1891 | 612.9995 | 475.8902 | 496.6692 | 455.5791 |
| | std | 24.36634 | 61.39288 | 171.0077 | 73.62923 | 103.6627 | 16.61971 |
| F3 | best | 600 | 600 | 600.0087 | 600 | 600.0081 | 600.0409 |
| | worst | 600.003 | 600.1573 | 600.9438 | 600.0976 | 600.855 | 601.2707 |
| | mean | 600.0001 | 600.0202 | 600.2592 | 600.0023 | 600.2633 | 600.4247 |
| | std | 0.000322 | 0.037195 | 0.171672 | 0.011841 | 0.173438 | 0.238818 |
| F4 | best | 812.9351 | 812.3129 | 812.4637 | 809.7649 | 820.047 | 822.6616 |
| | worst | 876.6685 | 900.7113 | 868.3376 | 853.5478 | 863.1018 | 853.6567 |
| | mean | 843.9564 | 838.2078 | 839.1272 | 825.2598 | 838.2995 | 839.0441 |
| | std | 20.24485 | 14.55806 | 12.25264 | 9.418728 | 7.613952 | 6.627878 |
| F5 | best | 900 | 900 | 900 | 900 | 900 | 900 |
| | worst | 900 | 900 | 900.0076 | 900 | 900.0735 | 900.0028 |
| | mean | 900 | 900 | 900.0006 | 900 | 900.0067 | 900.0004 |
| | std | 1.37 × 10−11 | 1.81 × 10−7 | 0.001221 | 0 | 0.009845 | 0.000393 |
| F6 | best | 1883.559 | 1891.493 | 6177.15 | 1842.683 | 15,077.6 | 41,854.58 |
| | worst | 8135.544 | 42,432.68 | 530,264.3 | 2,982,948 | 1,494,901 | 10,471,639 |
| | mean | 3500.321 | 4691.582 | 110,153.8 | 81,266.17 | 446,976.4 | 1,507,792 |
| | std | 1631.162 | 4312.319 | 99,805.16 | 365,816 | 346,708 | 1,588,267 |

(b) F7–F12

| Function | Metric | IWOA | WOA | msWOA | mFOA | mPSO | mSCA |
|---|---|---|---|---|---|---|---|
| F7 | best | 2089.897 | 2077.67 | 2103.471 | 2075.351 | 2073.808 | 2105.105 |
| | worst | 2384.001 | 2482.748 | 2338.527 | 2348.3 | 2292.895 | 2240.146 |
| | mean | 2204.2 | 2214.59 | 2187.575 | 2178.615 | 2160.963 | 2164.61 |
| | std | 63.97263 | 70.09778 | 48.44177 | 57.09852 | 45.1093 | 24.47191 |
| F8 | best | 2208.254 | 2222.903 | 2230.031 | 2218.175 | 2221.917 | 2220.173 |
| | worst | 2247.829 | 2285.338 | 2374.429 | 2257.819 | 2414.043 | 2239.526 |
| | mean | 2232.685 | 2236.215 | 2261.016 | 2230.435 | 2242.665 | 2232.556 |
| | std | 5.452965 | 11.42288 | 46.94911 | 6.241747 | 28.48983 | 3.501958 |
| F9 | best | 2300 | 2300 | 2300.055 | 2300 | 2300.342 | 2300.031 |
| | worst | 2500 | 2300.031 | 2500.089 | 2500.304 | 2306.865 | 2303.564 |
| | mean | 2304 | 2300.006 | 2323.26 | 2351.198 | 2302.544 | 2301.311 |
| | std | 28.14106 | 0.006375 | 62.57088 | 84.78268 | 1.550569 | 0.760773 |
| F10 | best | 2500.174 | 2500.291 | 2500.566 | 2500.385 | 2500.862 | 2500.655 |
| | worst | 2501.906 | 3740.123 | 2502.508 | 2669.412 | 3943.525 | 2649.952 |
| | mean | 2500.704 | 2600.032 | 2501.291 | 2540.491 | 2607.37 | 2511.309 |
| | std | 0.252649 | 227.4614 | 0.418806 | 63.03421 | 231.2993 | 36.14797 |
| F11 | best | 2600.182 | 2625.687 | 2735.615 | 3010.665 | 2775.496 | 2758.747 |
| | worst | 4654.493 | 4679.533 | 4620.619 | 11,652.84 | 3780.67 | 2883.853 |
| | mean | 2940.107 | 2986.62 | 3154.811 | 5540.764 | 3101.457 | 2785.171 |
| | std | 327.2382 | 270.581 | 275.6916 | 1545.086 | 328.7328 | 19.76031 |
| F12 | best | 2861.531 | 2864.334 | 2896.135 | 2860.835 | 2862.535 | 2864.759 |
| | worst | 2887.873 | 3094.582 | 2961.506 | 3070.876 | 3043.66 | 2879.305 |
| | mean | 2867.674 | 2902.119 | 2952.313 | 2902.824 | 2892.679 | 2870.294 |
| | std | 4.905309 | 43.53633 | 5.842151 | 38.01212 | 32.24627 | 2.236951 |
Table 4. Comparison of the average ranks of the different algorithms.

| Algorithm | Mean Rank |
|---|---|
| IWOA | 2.87 |
| WOA | 4.15 |
| msWOA | 5.77 |
| mFOA | 3.98 |
| mPSO | 5.97 |
| mSCA | 4.15 |
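The mean ranks in Table 4 follow the nonparametric comparison methodology of Derrac et al. [46]: each algorithm is ranked on every test function (rank 1 = best, ties averaged), and the ranks are then averaged over all functions. A minimal sketch using the mean values from Table 3, with only F1 and F2 reproduced for brevity:

```python
import numpy as np
from scipy.stats import rankdata

# Mean objective values per function (rows) and algorithm (columns),
# taken from Table 3; lower is better for these benchmark functions.
algorithms = ["IWOA", "WOA", "msWOA", "mFOA", "mPSO", "mSCA"]
means = np.array([
    [300.0,    300.0,    300.0035, 300.0,    300.0486, 300.0133],  # F1
    [422.1213, 458.1891, 612.9995, 475.8902, 496.6692, 455.5791],  # F2
    # ... remaining rows for F3-F12 would follow ...
])

ranks = np.apply_along_axis(rankdata, 1, means)  # rank per function, ties averaged
for name, r in zip(algorithms, ranks.mean(axis=0)):
    print(f"{name}: mean rank {r:.2f}")
```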
Table 5. Comparison of forecasting accuracy across model variants optimized by the IWOA.

| Model | RMSE/kW | R2 | MAE/kW |
|---|---|---|---|
| IWOA-BiGRU | 183.9619 | 0.9789 | 124.1239 |
| IWOA-CNN-BiGRU | 158.6322 | 0.98431 | 115.6731 |
| IWOA-CNN-BiGRU-Attention | 111.2914 | 0.99225 | 85.4261 |
| IWOA-CNN-BiGRU-CBAM | 86.4299 | 0.99529 | 56.0482 |
Table 6. Comparative analysis of forecast precision across various models.

| Model | RMSE/kW | R2 | MAE/kW |
|---|---|---|---|
| IWOA-CNN-BiGRU-CBAM | 86.4299 | 0.99529 | 56.0482 |
| WOA-CNN-BiGRU-CBAM | 105.008 | 0.99305 | 81.9422 |
| msWOA-CNN-BiGRU-CBAM | 101.7268 | 0.99348 | 77.2248 |
| mFOA-CNN-BiGRU-CBAM | 300.6814 | 0.94303 | 239.2391 |
| mPSO-CNN-BiGRU-CBAM | 116.3347 | 0.99147 | 88.6592 |
| mSCA-CNN-BiGRU-CBAM | 102.1971 | 0.99342 | 69.5965 |
Table 7. Comparison of forecasting accuracy of BiLSTM-based models against the proposed model.

| Model | RMSE/kW | R2 | MAE/kW |
|---|---|---|---|
| BiLSTM | 387.3769 | 0.92588 | 276.0312 |
| IWOA-CNN-BiLSTM-Attention | 185.1584 | 0.96124 | 162.2335 |
| IWOA-CNN-BiLSTM-CBAM | 154.6835 | 0.97247 | 137.2238 |
| IWOA-CNN-BiGRU-CBAM | 86.4299 | 0.99529 | 56.0482 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
