# SCADA Data Based Wind Power Interval Prediction Using LUBE-Based Deep Residual Networks

Huajin Li, et al.

May 15, 2022

## 1 Introduction

Wind power, a major source of renewable energy, has been widely developed worldwide to supplement and replace traditional fossil fuels (He and Kusiak, 2017; Javed et al., 2020). Owing to the intermittent and stochastic nature of wind, wind power systems face challenges in reliability and stability. High-quality wind power prediction is therefore required in practice (Long et al., 2020; Long et al., 2021).

A review of the literature shows that point estimation dominates wind power prediction. Haykin (1994) experimented with multiple neural network architectures to explore their capability for wind turbine energy prediction. Kelouwani et al. (2004) first used a neural network with wind speed as input to forecast wind power based on power curves. Tascikaraoglu and Uzunoglu (2014) proposed an autoregressive integrated moving average (ARIMA) model to forecast short-term wind power. Ren et al. (2014) combined AdaBoost with backpropagation to improve a neural network and obtained better wind power prediction performance. Wu and Peng (2017) performed short-term wind power prediction using k-means clustering with a bagging neural network. Zhang et al. (2016) adopted a probabilistic support vector machine to predict short-term wind power. Deng et al. (2020) trained deep neural networks (DNNs) to forecast short-term wind power. Li et al. (2021a) introduced a framework that uses ICEEMDAN (improved complete ensemble empirical mode decomposition with adaptive noise) to decompose wind power time-series data and found that it enhanced prediction performance. Li et al. (2021b) trained a deep belief network to forecast short-term wind power and used exponentially weighted moving average (EWMA) control charts to monitor abnormal prediction errors. In summary, point-based wind power prediction has already achieved promising performance in practice (Long et al., 2022).

High-quality wind power forecasting is expected to reduce uncertainty at various time scales (Ouyang et al., 2017; Huang et al., 2018; Tang et al., 2020). However, point estimation outputs a single deterministic value and therefore does not quantify prediction uncertainty (Shen and Shen, 2018; Ouyang et al., 2020). Interval prediction at a specified confidence level is consequently gaining popularity among scholars and engineers (Shen et al., 2020). Unlike point estimation, interval prediction quantifies the uncertainty of wind power and provides probabilistic estimates over time.

Among interval prediction methods, models based on the lower and upper bound estimation (LUBE) approach (Khosravi et al., 2010) have become among the most popular and have attracted considerable attention. In the LUBE architecture, the prediction model has two outputs instead of one. The two outputs, representing the upper and lower bounds, share the same input vector and hidden layer, and both are governed by the same loss function and training strategy (Sun et al., 2020a).

In this paper, we combine the LUBE approach with a deep residual network (DRN) for short-term wind power prediction. The DRN is first modified to have two outputs that represent the upper and lower bounds of the prediction interval, and the LUBE approach is then used to train it. The coverage width-based criterion (CWC) serves as the objective function, and the Adam optimizer is used to minimize it. Field data collected from a wind farm in northwest China are used for the case study.

The main contributions of this paper are as follows:

• A new approach combining a DRN and the LUBE method is proposed for wind power interval prediction.

• Supervisory control and data acquisition (SCADA) data covering wind speed, wind direction, ambient temperature, air density, historical power output, gearbox bearing temperature, rotor speed, and pitch angle are used as inputs for power interval prediction.

The remainder of this paper is organized as follows. Section 2 introduces the DRN structure, the LUBE approach, other popular interval prediction algorithms, and evaluation metrics. Section 3 introduces the dataset and the variables used for interval prediction. Section 4 presents the computational results. Section 5 concludes the paper.

## 2 Methodology

### 2.1 Deep Residual Network

DNNs have achieved promising performance in both classification and regression tasks (Li et al., 2020; Li et al., 2022). In practice, however, vanishing or exploding gradients during training pose a challenge. The DRN, which adds residual units to a DNN, offers superior performance in supervised learning tasks such as image classification, target detection, and statistical anomaly detection (Sun et al., 2020b; Shen et al., 2021; Shen and Raksincharoensak, 2021).

Following He et al. (2016), a single residual unit can be expressed as follows:

$$X_{l+1} = f\left(X_l + F(X_l)\right) \tag{1}$$

where $X_l$ and $X_{l+1}$ represent the input and output of the residual unit, respectively; $F(\cdot)$ denotes the residual function, which contains a convolution operator, batch normalization, and a rectified linear unit (ReLU); and $f(\cdot)$ represents the ReLU activation function. The output of the residual function is added to the input and passed through the ReLU activation function. During training, the gradient of the loss function with respect to any hidden layer can be derived using the chain rule, as in standard backpropagation.
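The residual unit described above can be sketched in NumPy. This is a minimal sketch, not the author's implementation: it assumes a hypothetical dense residual function (two fully connected layers with a ReLU between them) in place of the convolution and batch normalization described in the text, and it uses arbitrary layer sizes and random weights.

```python
import numpy as np

def relu(z: np.ndarray) -> np.ndarray:
    """Element-wise rectified linear unit."""
    return np.maximum(z, 0.0)

def residual_unit(x, w1, b1, w2, b2):
    """One residual unit: X_{l+1} = f(X_l + F(X_l)).

    F is a hypothetical dense residual function (dense, ReLU, dense);
    convolution and batch normalization are omitted for brevity.
    """
    f_x = relu(x @ w1 + b1) @ w2 + b2   # residual function F(X_l)
    return relu(x + f_x)               # identity shortcut added, then ReLU f

rng = np.random.default_rng(0)
d = 8                                   # hypothetical feature width
x = rng.normal(size=(4, d))             # batch of 4 input vectors
w1, w2 = rng.normal(scale=0.1, size=(d, d)), rng.normal(scale=0.1, size=(d, d))
b1, b2 = np.zeros(d), np.zeros(d)
y = residual_unit(x, w1, b1, w2, b2)
print(y.shape)  # (4, 8)
```

Note that $F$ must preserve the width of $X_l$ so that the addition is well defined; if all weights are zero, $F(X_l) = 0$ and the unit reduces to $f(X_l)$.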

Compared with the conventional DNN architecture, the DRN has two major advantages. First, it is much less susceptible to vanishing or exploding gradients during training. Second, the identity shortcut lets gradients propagate directly from deeper layers back to shallower layers during backpropagation. The residual connections therefore enable a smooth transfer of information between deep and shallow layers, which makes deep networks considerably easier to train in practice.
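The gradient argument can be made concrete under a simplifying assumption: if the post-addition activation $f$ of the residual unit is replaced by the identity (a common idealization in analyses of residual networks, not the exact form used here), unrolling the units from layer $l$ to a deeper layer $L$ gives the relation below, where $\mathcal{L}$ is a symbol introduced here for the training loss and $I$ is the identity matrix:

$$
X_L = X_l + \sum_{i=l}^{L-1} F(X_i)
\qquad\Longrightarrow\qquad
\frac{\partial \mathcal{L}}{\partial X_l}
= \frac{\partial \mathcal{L}}{\partial X_L}
\left( I + \frac{\partial}{\partial X_l} \sum_{i=l}^{L-1} F(X_i) \right)
$$

The identity term carries the gradient from layer $L$ straight back to layer $l$ even when the derivatives of the residual functions are small, which is the mechanism behind the second advantage above.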

### 2.2 Lower Upper Bound Estimation Approach With Deep Residual Network

A common but often unjustified assumption in interval prediction is that the data follow a particular distribution (Shen et al., 2019). Although such an assumption simplifies the construction of prediction intervals (PIs), the resulting PIs can be unreliable when the data deviate from the assumed distribution (Ouyang et al., 2019b; Ouyang et al., 2019c).

Khosravi et al. (2010) first proposed the LUBE approach for interval prediction. The approach constructs PIs directly with a neural network, which is trained by minimizing a PI-based objective function. Instead of a single output for point estimation, the network has two outputs: the lower and upper bounds of the PI. A PI thus specifies a range of values together with a confidence level, estimated from historical data, that the actual value falls within that range. In general, a high-quality PI contains as many of the actual measured values as possible while remaining as narrow as possible.

In this study, the LUBE approach was combined with a DRN to provide PIs for short-term wind power. Figure 1 shows the modified DRN used with the LUBE approach. Figure 1A shows the general neural network architecture for LUBE-based interval prediction: an input layer, hidden layers, and two outputs representing the lower and upper bounds of the PI. The PI is the interval between the two bounds, and a prediction is correct when the actual value falls within it. The hidden layers of the DRN differ from those of conventional neural networks: instead of layers of hidden nodes (see Figure 1B), the DRN uses residual blocks as hidden layers. As shown in Figure 1C, each residual block passes its input through a residual function, adds the result to the original input, and then applies the ReLU activation function, as described in Section 2.1.
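The LUBE-DRN forward pass described above can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the author's network: the residual function is a hypothetical dense block (dense, ReLU, dense) without convolution or batch normalization; the names `init_lube_drn` and `lube_drn_forward`, the layer width, block count, and initialization scale are all illustrative; and training with the CWC objective is not shown.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def init_lube_drn(n_features, width, n_blocks, seed=0):
    """Randomly initialize a hypothetical LUBE-DRN parameter set."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "w_in": rng.normal(scale=scale, size=(n_features, width)),
        "b_in": np.zeros(width),
        "blocks": [
            (rng.normal(scale=scale, size=(width, width)), np.zeros(width),
             rng.normal(scale=scale, size=(width, width)), np.zeros(width))
            for _ in range(n_blocks)
        ],
        "w_out": rng.normal(scale=scale, size=(width, 2)),  # two output nodes
        "b_out": np.zeros(2),
    }

def lube_drn_forward(params, x):
    """Return (lower, upper) PI bounds for each row of x."""
    h = relu(x @ params["w_in"] + params["b_in"])        # input layer
    for w1, b1, w2, b2 in params["blocks"]:              # residual blocks as hidden layers
        residual = relu(h @ w1 + b1) @ w2 + b2           # residual function F
        h = relu(h + residual)                           # add shortcut, then ReLU
    out = h @ params["w_out"] + params["b_out"]          # shared features feed both outputs
    return out[:, 0], out[:, 1]                          # column 0: lower, column 1: upper

params = init_lube_drn(n_features=8, width=16, n_blocks=3)
x = np.random.default_rng(1).normal(size=(5, 8))         # 5 samples, 8 SCADA inputs
lower, upper = lube_drn_forward(params, x)
```

Both bounds are read from the same final hidden representation, mirroring the shared input vector and hidden layers of the LUBE architecture. Nothing in this sketch forces `lower <= upper`; in practice that ordering has to come from training or from an explicit constraint.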

FIGURE 1. Deep residual network integrated with LUBE approach.

As Figure 1 illustrates, the main advantage of combining a DRN with the LUBE approach for short-term wind power forecasting is that it simplifies PI construction. The LUBE approach estimates the lower and upper bounds of the PI in a feed-forward pass. Because the network outputs two point forecasts representing the two bounds, the actual short-term wind power is expected to fall within the PI.

### 2.3 Other Interval Prediction Algorithms

Besides the DRN, other popular benchmark interval prediction algorithms include artificial neural networks (ANNs), extreme learning machines (ELMs), and kernel extreme learning machines (KELMs). These algorithms have achieved promising results in other time-series interval prediction tasks, so they were also trained in this study, using the same LUBE approach, for comparison with the proposed DRN.

The ANN is a nonparametric supervised learning algorithm that is widely used for classification and regression tasks (Li et al., 2018). Modeled on learning processes in cognitive systems, it can make high-quality predictions, extract patterns from data accurately and effectively, and construct mapping relationships between inputs and outputs. A typical ANN contains an input layer, one or more hidden layers, and an output layer. The output of each neuron depends on the outputs of the neurons in the previous layer and the associated weights, as expressed by Eq. 2:

$$\alpha_i^j = f_j\left(\sum_{k=1}^{n^{(j-1)}} \alpha_k^{(j-1)}\, \omega_{ki}^{(j-1)} + b_i^j\right) \tag{2}$$

where $\alpha_i^j$ and $b_i^j$ are the output and bias of the $i$th neuron in the $j$th hidden layer, respectively; $\alpha_k^{(j-1)}$ and $\omega_{ki}^{(j-1)}$ represent the output and weight of the $k$th neuron in the previous layer, respectively; $n^{(j-1)}$ is the total number of neurons in layer $(j-1)$; and $f_j(\cdot)$ is the activation function of the $j$th layer.
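Eq. 2 computes every neuron of layer $j$ at once when written as a vector-matrix product. The following NumPy sketch is illustrative only; the function name `dense_layer` and the toy weights are hypothetical.

```python
import numpy as np

def dense_layer(alpha_prev, weights, bias, activation):
    """Eq. 2 for all neurons i of layer j at once.

    alpha_prev: outputs alpha_k^(j-1) of layer j-1, shape (n_prev,)
    weights:    omega_ki^(j-1), shape (n_prev, n_j); row k, column i
    bias:       b_i^j, shape (n_j,)
    """
    return activation(alpha_prev @ weights + bias)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

alpha_prev = np.array([1.0, 2.0])
w = np.array([[0.5, -1.0],
              [0.25, 0.5]])
b = np.array([0.0, 0.0])
print(dense_layer(alpha_prev, w, b, lambda z: z))  # [1. 0.]
```

With the identity activation, neuron 1 receives $1 \times 0.5 + 2 \times 0.25 = 1$ and neuron 2 receives $1 \times (-1) + 2 \times 0.5 = 0$; passing `sigmoid` instead applies $f_j$ element-wise to those sums.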

An ELM is a single-hidden-layer feedforward neural network (SLFN) proposed by Huang et al. (2018). Its input weights and hidden biases are initialized randomly and are not updated afterwards, and the number of hidden neurons is defined by the user. Because only the output weights have to be learned, the ELM obtains its optimal output weights in a single calculation step and therefore trains quickly. For a given dataset with inputs $x_j$ and target outputs $t_j$, the ELM for a regression task can be expressed by Eq. 3, and its optimization task by Eq. 4:

$$o_j = \sum_{i=1}^{n} \beta_i\, G(x_j, \omega_i, b_i) \tag{3}$$

$$\min_{\beta} \sum_{j} \lVert o_j - t_j \rVert^2 \tag{4}$$

where $\omega_i$ and $b_i$ are the weights and bias connecting the input vector $x_j$ to the $i$th hidden node, respectively; $\beta_i$ is the weight vector between the $i$th hidden node and the output; $G(\cdot)$ is the hidden-layer activation function; $n$ is the number of hidden nodes; and $o_j$ is the prediction output of the ELM. Eq. 3 can be written in matrix form as $H\beta = T$, where $H$ is the hidden layer output matrix and $T$ is the target output matrix. The solution is expressed in Eq. 5:

$$\hat{\beta} = H^{\dagger} T \tag{5}$$

where $H^{\dagger}$ is the Moore–Penrose pseudoinverse of the hidden layer output matrix $H$.
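The one-step ELM training of Eqs. 3 and 5 can be sketched in NumPy. This is a minimal regression sketch, not the LUBE-trained ELM used in the comparison: it assumes a sigmoid $G$, normally distributed random input weights and biases, and a toy sine target; the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(x, t, n_hidden, seed=0):
    """Fit a basic ELM: random (omega, b) kept fixed, then beta = pinv(H) @ T."""
    rng = np.random.default_rng(seed)
    omega = rng.normal(size=(x.shape[1], n_hidden))   # input weights, never updated
    b = rng.normal(size=n_hidden)                     # hidden biases, never updated
    h = sigmoid(x @ omega + b)                        # hidden layer output matrix H
    beta = np.linalg.pinv(h) @ t                      # Eq. 5: Moore-Penrose solution
    return omega, b, beta

def elm_predict(x, omega, b, beta):
    """Eq. 3: weighted sum of the hidden-node outputs."""
    return sigmoid(x @ omega + b) @ beta

x = np.linspace(-3, 3, 200).reshape(-1, 1)           # toy 1-D inputs
t = np.sin(x[:, 0])                                  # toy regression target
omega, b, beta = elm_fit(x, t, n_hidden=50)
pred = elm_predict(x, omega, b, beta)
```

The only "training" is the single pseudoinverse product, which is why ELM training is fast; the pseudoinverse yields the minimum-norm least-squares solution of $H\beta = T$.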

KELM is an improved version of the ELM with higher generalization capacity and a lower risk of overfitting (Iosifidis et al., 2015). Compared with the standard ELM, it introduces a kernel function $k(x_i, x_j)$ for cases in which the feature mapping $H$ is unknown. The kernel function $k(x_i, x_j)$ replaces the ELM's arbitrary feature mapping, which makes the output weights more robust. The kernel describes the relationship between pairs of data points, which improves the feature mapping of the ELM and, in turn, its generalization capacity on both regression and classification problems. Various kernel functions can be used for KELM, such as polynomial, linear, and radial basis function (RBF) kernels. In practice, the RBF kernel offers considerable learning capacity in interval prediction tasks while requiring few hyperparameters. The RBF kernel was therefore used in this study; it is expressed by Eq. 6:

$$k(x_i, x_j) = \exp\left(-g \lVert x_i - x_j \rVert^2\right) \tag{6}$$

where $g$ is the kernel parameter.
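A KELM with the RBF kernel of Eq. 6 can be sketched in NumPy. This sketch assumes the commonly used regularized KELM form, in which the output weights solve $(I/C + \Omega)\,\alpha = T$ with $\Omega_{ij} = k(x_i, x_j)$; the regularization constant $C$, the values of $g$ and $C$, the toy data, and the function names are assumptions not stated in this paper.

```python
import numpy as np

def rbf_kernel(a, b, g):
    """Eq. 6 for every row pair: k(x_i, x_j) = exp(-g * ||x_i - x_j||^2)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-g * np.maximum(sq_dist, 0.0))  # clip tiny negatives from round-off

def kelm_fit(x, t, g, c):
    """Solve (I / C + Omega) alpha = T, where Omega = K(X, X); C is an assumed regularizer."""
    omega = rbf_kernel(x, x, g)
    return np.linalg.solve(np.eye(len(x)) / c + omega, t)

def kelm_predict(x_new, x_train, alpha, g):
    """Prediction is a kernel-weighted sum over the training points."""
    return rbf_kernel(x_new, x_train, g) @ alpha

x = np.linspace(-3, 3, 100).reshape(-1, 1)        # toy 1-D inputs
t = np.sin(x[:, 0])                               # toy regression target
alpha = kelm_fit(x, t, g=1.0, c=100.0)
pred = kelm_predict(x, x, alpha, g=1.0)
```

No explicit hidden layer or random weights appear: the kernel matrix takes the place of the feature mapping, so the only hyperparameters here are $g$ and the assumed $C$.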

### 2.4 Objective Function and Evaluation Metrics

Once the PIs are constructed, their quality must be evaluated. In general, interval prediction algorithms aim to produce an interval that encompasses the actual values at a given confidence level (Ouyang et al., 2019a). The coverage rate and the interval width are therefore the two key quantitative metrics for evaluating the quality of the constructed PIs.

First, the PI coverage probability (PICP) (Khosravi et al., 2011) was used to measure the coverage rate. The PICP is computed using Eq. 7:

$$\mathrm{PICP} = \frac{1}{N} \sum_{i=1}^{N} c_i \tag{7}$$

where $N$ is the total number of samples and $c_i$ is a binary indicator that equals 1 if the $i$th measured value falls within its PI and 0 otherwise.
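Eq. 7 reduces to the mean of a boolean coverage mask. The NumPy sketch below is illustrative; the function name and toy arrays are hypothetical, and it assumes that a value lying exactly on a bound counts as covered.

```python
import numpy as np

def picp(y, lower, upper):
    """Eq. 7: fraction of measured values that fall inside their PIs."""
    c = (y >= lower) & (y <= upper)      # c_i in {0, 1}; bounds treated as inclusive
    return c.mean()

y = np.array([1.0, 2.0, 3.0, 4.0])       # measured values
lower = np.array([0.5, 2.5, 2.0, 3.0])
upper = np.array([1.5, 3.0, 3.5, 5.0])
print(picp(y, lower, upper))  # 0.75
```

Only the second sample (2.0 against the interval [2.5, 3.0]) is missed, so three of four values are covered.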

Second, the PI normalized average width (PINAW) (Kavousi-Fard et al., 2015) was used in this study to evaluate the PI width. The PINAW is computed as follows:

$$\mathrm{PINAW} = \frac{1}{RN} \sum_{i=1}^{N} (u_i - l_i) \tag{8}$$

where $N$ is the total number of samples measured, $u_i$ and $l_i$ are the upper and lower bounds of the $i$th sample, respectively, and $R$ is the total range of the prediction target.
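Eq. 8 can be computed in a few lines of NumPy. In this sketch (function name and data hypothetical), $R$ is assumed to be the range of the same target values being evaluated; an implementation could instead take $R$ from the training targets.

```python
import numpy as np

def pinaw(y, lower, upper):
    """Eq. 8: summed PI width divided by R * N."""
    r = np.max(y) - np.min(y)            # R, taken here from the evaluated targets (assumption)
    return np.sum(upper - lower) / (r * len(y))

y = np.array([1.0, 2.0, 3.0, 4.0])
lower = np.array([0.5, 2.5, 2.0, 3.0])
upper = np.array([1.5, 3.0, 3.5, 5.0])
print(pinaw(y, lower, upper))  # 5.0 / (3.0 * 4) = 0.41666...
```

Normalizing by $R$ makes the width comparable across datasets with different power ranges.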

In addition, the coverage width-based criterion (CWC) (Taormina and Chau, 2015), which considers both PI width and coverage, was computed in this study using Eq. 9:

$$\mathrm{CWC} = \mathrm{PINAW}\left(1 + \gamma(\mathrm{PICP})\, e^{-\eta(\mathrm{PICP} - \mu)}\right) \tag{9}$$

where the parameters $\eta$ and $\mu$ define the penalty term $e^{-\eta(\mathrm{PICP} - \mu)}$, which balances $\mathrm{PINAW}$ against $\mathrm{PICP}$, with $\mu$ typically set to the nominal confidence level; and $\gamma(\cdot)$, commonly defined in the CWC literature as a step function that equals 1 when $\mathrm{PICP} < \mu$ and 0 otherwise, reduces the risk of violating the PI coverage constraint during training. The CWC is used as the objective function in this study.
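Eq. 9 can be sketched as follows, assuming the step-function form of $\gamma$ noted above. The default values $\mu = 0.95$ and $\eta = 50$ are hypothetical illustration values; the paper does not report the settings it used.

```python
import numpy as np

def cwc(pinaw, picp, mu=0.95, eta=50.0):
    """Eq. 9 with gamma(PICP) = 1 if PICP < mu else 0 (assumed step form)."""
    gamma = 1.0 if picp < mu else 0.0
    return pinaw * (1.0 + gamma * np.exp(-eta * (picp - mu)))

print(cwc(0.2, 0.97))            # 0.2: coverage target met, so no penalty
print(round(cwc(0.2, 0.90), 4))  # 2.6365: 5-point coverage shortfall, penalty e^2.5
```

When coverage meets $\mu$, CWC equals PINAW and minimizing it narrows the PIs; when coverage falls short, the exponential penalty dominates, so the optimizer is pushed to widen the PIs first.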

## 3 Dataset Summary

TABLE 1. SCADA variables utilized in this study.

As listed in Table 1, eight commonly used SCADA variables were used as inputs for the interval prediction task. Half of them are environmental factors, and the other half are measured electrical or mechanical characteristics of the wind turbine. The selected variables overlap with those used in most related studies, which supports the validity of the selection.

## 4 Experimental Results

To perform short-term wind power forecasting, experiments were conducted to train the DRN following the LUBE approach. In this study, the entire-day dataset was used for training, with the wind power 10 min ahead as the target output. The CWC was selected as the objective function, and the Adam optimizer was used to update the parameters (weights) of the DRN.

Figure 2 shows the training process of the DRN together with those of three benchmark interval forecasting algorithms: ANN, ELM, and KELM. All algorithms were trained for 100 epochs using the LUBE approach described in Section 2.2. With the DRN, the CWC converges within about the first 20 epochs to a value significantly lower than those of the other algorithms. Because a lower CWC indicates better PIs, this demonstrates the superiority of the proposed DRN-based approach.

FIGURE 2. Changes of CWC at different training epochs.

This study also explored the relationship between the prediction horizon and the CWC. As shown in Figure 3, in addition to 10-min-ahead forecasting, interval prediction performance was tested for horizons from 20 min to 200 min ahead. As expected, the CWC of every algorithm increases with the prediction horizon. The CWC of the DRN, however, increases more slowly than those of the other algorithms, indicating that it retains its advantage at longer prediction horizons.
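With 10-min SCADA records, a horizon of $h$ minutes corresponds to $h/10$ steps ahead. The paper does not describe how input windows were built, so the following NumPy sketch is only one plausible construction: a hypothetical helper `make_supervised` that pairs a window of past values with the value a fixed number of steps later, shown for a single series (the study itself uses eight SCADA variables).

```python
import numpy as np

def make_supervised(series, n_lags, horizon):
    """Pair each window of `n_lags` past values with the value `horizon` steps later.

    With 10-min records, horizon=1 is 10-min ahead and horizon=20 is 200-min ahead.
    """
    series = np.asarray(series)
    n = len(series) - n_lags - horizon + 1      # number of usable (window, target) pairs
    if n < 1:
        raise ValueError("series too short for the requested lags and horizon")
    x = np.stack([series[i:i + n_lags] for i in range(n)])
    start = n_lags + horizon - 1                # index of the first target
    y = series[start:start + n]
    return x, y

# Hypothetical example: 10 consecutive readings, 3 lags, 20-min (2-step) horizon.
x, y = make_supervised(np.arange(10.0), n_lags=3, horizon=2)
print(x[0], y[0])  # [0. 1. 2.] 4.0
```

Each additional step of horizon removes one usable training pair and widens the gap between the last observed value and the target, which is consistent with the growth in CWC at longer horizons.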

FIGURE 3. Changes of CWC at different prediction horizons.

Finally, 10-min-ahead short-term wind power forecasting was performed on the test dataset. Figure 4 shows the resulting interval forecasts for a whole day in each of the four seasons. The PIs are constructed at a 95% confidence level, and the target is the measured wind power recorded by the SCADA system. Table 2 summarizes the interval forecasting performance on the test data.

FIGURE 4. Constructed PIs and actual target wind power of the test dataset.

TABLE 2. Summary of the interval prediction performance.

As Table 2 shows, all algorithms were trained using the LUBE approach and evaluated on the same test dataset. The proposed DRN produced the highest PICP and the lowest PINAW and CWC values. All evaluation metrics are reported as mean and standard deviation. These results confirm the superiority of the proposed approach.

## 5 Conclusion

In this paper, we propose an interval prediction approach that provides probabilistic short-term forecasts of wind turbine power generation. SCADA data at 10-min resolution were collected from a wind farm in northwestern China for the case studies. A DRN integrated with the LUBE approach was proposed within a short-term interval forecasting framework, and a comparative analysis was performed against three other popular interval prediction algorithms. The computational results showed that the interval prediction error for short-term wind power increased with the prediction horizon and that the proposed DRN-based approach produced the best power interval predictions. Applying this model in practice will require the development of new wind turbine control approaches.

## Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

## Author Contributions

HL conceptualized the study, contributed to the study methodology, data curation, software and formal analysis, and wrote the manuscript.

## Funding

This research is supported by the “Miaozi project” of scientific and technological innovation in Sichuan Province, China (Grant No. 2021090) and the Opening fund of State Key Laboratory of Geohazard Prevention and Geoenvironment Protection (Chengdu University of Technology) (Grant No. SKLGP 2021K014).

## Conflict of Interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

## Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.