# Short-Term Power Load Forecasting Under COVID-19 Based on Graph Representation Learning With Heterogeneous Features

Zhuowei Yu, et al.

Jan 1, 2022

## 1 Introduction

### 1.1 Background

Short-term load forecasting for the future 24 h is one of the most critical techniques to ensure system stability, reliability, and economic efficiency. It affects power system operation in many aspects, including generation dispatch, power flow optimization (Chen et al.,; Meng et al., 2016), and energy bidding in the electricity market. Significant variabilities and uncertainties have been introduced by the diverse end-users and the ever-increasing shares of renewables (Li and Lu, 2020). As a result, accurate short-term load forecasting is a hard task in modern power systems.

Unfortunately, the outbreak of COVID-19 in early 2020 has posed extraordinary challenges to short-term load forecasting. As shown in Figure 1A, the virus has spread globally, and Figures 1B,C show that the cumulative number of confirmed cases has been growing rapidly, exceeding 200 million by August 25, 2021. It has been reported (Ruan et al., 2020; Ruan et al., 2021) that the crisis has profoundly affected electricity utilization owing to changes in people’s living habits and industrial production activities. Such effects vary spatially due to differences in infection speed, vaccination coverage, and quarantine policies, resulting in additional variabilities and uncertainties in electricity consumption, as illustrated in Figure 2.

FIGURE 1. COVID-19 situation worldwide and in China (updated August 25, 2021). (A) Distribution of confirmed cases of COVID-19; (B) Global cases of COVID-19; (C) Daily new confirmed cases at home and abroad.

FIGURE 2. Schematic diagram of the impact of COVID-19 pandemic on power grid load.

However, the infection has seen fast growth in the past 3 months. According to the World Health Organization (WHO, 2020) and the most recent research (Huang et al., 2020), the fight against COVID-19 is far from over, and its effects will last. Instead of sticking to strict quarantine policies or aiming at a sustained zero infection level, many countries have started to resume economic activities and to incorporate disease prevention and control into the day-to-day operation of society. In this context, a short-term load forecasting technique incorporating the effects of COVID-19 is of great significance to both power system operation and economic development, facilitating a smooth transition to a “living with COVID” new normal.

### 1.2 Literature Review

The effects of the pandemic on domestic and international regional electricity consumption are highly uncertain and vary with infection speed and quarantine policies. Some published reports have preliminarily drawn qualitative conclusions: “electricity consumption decreases during the pandemic, but the grid remains reliable” (Bui and Wolfers, 2020; Cicala, 2020). However, the epidemic’s effects are multifaceted and cannot be summarized simply as load reduction (Agdas and Barooah, 2020; Werth et al., 2020). Although (Ruan et al., 2020) has shown significant deviations between the simulated forecasts and the actual loads if the crisis’ effects are omitted, explicitly describing the complex effects and incorporating them into load forecasting are still open questions.

As a graph representation learning method, graph convolution network (GCN) can dig deeper into the intrinsic relationship of heterogeneous data by defining Fourier transform and convolution on the graph. As pioneering attempts, (Han et al., 2021) has built a GCN to forecast nitride emissions from coal-fired power plants, while (Wang and He, 2021) has built a graph attention network (GAT) for fault location in distribution networks. Their results demonstrate that GCN is a promising tool with powerful learning ability and generalization capability.

### 1.3 Contributions

Based on the research gap in short-term load forecasting and the recent progress in GCN, this paper encodes heterogeneous features related to electricity consumption and the status of COVID-19 into a load graph and builds a graph representation learning model to fit the complex mapping between the present load states and the load forecasts for the future.

The contributions in this study can be summarized as follows:

(1) Load graph encoding heterogeneous features. Each node in the graph corresponds to one time moment, while the edge weights are defined to represent temporal correlations between the nodes. The node features are defined with electricity consumption and epidemic status information so that the heterogeneous features can be fused in a graph.

(2) ResGCN with parallel training to learn graph representations and to forecast future loads under COVID-19. By learning residual from the input, ResGCN prevents over-smoothing and fits the mapping from heterogeneous features to the future loads. Besides, a graph concatenation is proposed for parallel training so that the learning efficiency can be improved significantly. Based on this method, precise short-term power load forecasting under COVID-19 is realized, laying the foundation for the stable operation of the power system.

## 2 Short-Term Load Forecasting Based on ResGCN With Heterogeneous Features

### 2.1 Load Graph Encoding Heterogeneous Features

##### 2.1.1 Feature Selection to Describe Epidemic Status

The COVID-19 crisis has had a significant impact on people’s living habits and industrial production activities and thus has led to changes in electricity consumption. To incorporate such effects into power load forecasting, it is important to identify the most representative features to describe the epidemic development status.

Herein, the COVID-EMDA+ dataset is adopted, denoted as $S$, which has collected multi-source features from various sources in the United States since the outbreak of the epidemic, including weather temperature, human behavior, cell phone distribution, and so on. These features describe human activities from different aspects, while those showing similar extents of fluctuations as the epidemic develops are assumed to be more representative. Therefore, after data cleaning and normalization, the Pearson correlation coefficients between all the features in the dataset are computed. Then, the representativeness of each feature is assessed by the absolute value of its correlation coefficients with the others, as

where $F_i$ and $F_j$ denote features $i$ and $j$, respectively.

Finally, the features with higher values of $h$ are selected as inputs. In our implementation, the final selected features are daily confirmed cases, mobility in grocery and pharmacy, and the counts of mobile devices located at home. Apart from the three features describing the epidemic status, the temperature is also taken as input, which is acknowledged to exert significant effects on electricity utilization.
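The selection step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes that the score $h$ of a feature is the sum of its absolute Pearson correlations with all other features (the exact aggregation is an assumption), and the function name `representativeness`, the feature names, and the toy data are hypothetical.

```python
import numpy as np

def representativeness(data: np.ndarray) -> np.ndarray:
    """Score each feature (column) by its summed absolute Pearson
    correlation with every other feature.

    Assumption: the score h is a plain sum over the other features.
    """
    # Min-max normalisation per column. Pearson correlation is scale
    # invariant, so this only mirrors the normalisation step in the text.
    span = data.max(axis=0) - data.min(axis=0)
    normed = (data - data.min(axis=0)) / np.where(span == 0, 1.0, span)

    corr = np.corrcoef(normed, rowvar=False)  # (d, d) Pearson matrix
    np.fill_diagonal(corr, 0.0)               # ignore self-correlation
    return np.abs(corr).sum(axis=1)

# Toy data: three features follow a common trend, one is pure noise.
rng = np.random.default_rng(0)
trend = np.linspace(0.0, 1.0, 200)
names = ["confirmed_cases", "grocery_mobility", "devices_at_home", "unrelated"]
toy = np.column_stack([
    trend + 0.05 * rng.standard_normal(200),
    -trend + 0.05 * rng.standard_normal(200),
    trend + 0.05 * rng.standard_normal(200),
    rng.standard_normal(200),
])

scores = representativeness(toy)
top3 = [names[i] for i in np.argsort(scores)[::-1][:3]]
```

Because absolute values are used, a feature that moves opposite to the others (here the hypothetical `grocery_mobility`) is still ranked as representative.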

##### 2.1.2 Graph Representation of Electricity Consumption and Epidemic Status

Thus, this paper proposes to encode the multi-source features within time window $T_k$ as a load graph, so that not only are the features contained at each time moment, but the inherent correlations between the features can also be considered. As shown in Figure 3, the load graph is fully connected and undirected, denoted as $G(A,X)$, where $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix and $X \in \mathbb{R}^{N \times d}$ is the node feature matrix. The features of nodes and edges are defined as follows.

(1) Node feature encoding heterogeneous information.

FIGURE 3. Graph representation of power load and epidemic status.

Each node corresponds to a time moment within $T_k$, while the node feature vector encodes heterogeneous information essential to load forecasting. The feature vector of node $i$, whose transpose is the $i$th row of the input feature matrix $X \in \mathbb{R}^{N \times d}$, can be formulated as

$\vec{X}_i=[\vec{P}_i,\vec{C}_i,\vec{M}_i,\vec{S}_i,\vec{W}_i]^{T} \quad (2)$

where $\vec{P}_i=[P_i(t)]$, $\vec{C}_i=[C_i(t)]$, $\vec{M}_i=[M_i(t)]$, $\vec{S}_i=[S_i(t)]$, and $\vec{W}_i=[W_i(t)]$, with $t \in [1,T_k]$, denote the vectors of power load, daily confirmed cases, mobility of grocery and pharmacy, the counts of stay-at-home mobile devices, and temperature, respectively.

(2) Edge weight describing the temporal correlation.

The edge weights are defined to represent the temporal correlations between the nodes, based on the assumption that features of closer time moments exhibit more inherent correlations. Herein, the Gaussian kernel function is selected to define the edge weight due to its monotonicity and localizability, as

where $t_i$ and $t_j$ denote the time moments of nodes $i$ and $j$, respectively, and $\xi$ is the scale parameter, which is essential to generalization performance. Finally, with training data in $[0, T_L]$ and sliding size $n$, this paper collects a total of

load graphs as inputs into the ResGCN.
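The construction of load graphs can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the Gaussian kernel is taken as $\exp(-(t_i-t_j)^2/(2\xi^2))$, the number of windows is taken as $\lfloor (T-T_k)/n \rfloor + 1$ for a series of length $T$, and the names `gaussian_adjacency` and `build_load_graphs` as well as the random toy series are hypothetical.

```python
import numpy as np

def gaussian_adjacency(num_nodes: int, xi: float) -> np.ndarray:
    """Edge weights w_ij = exp(-(t_i - t_j)^2 / (2 * xi^2)) (assumed form).

    Nodes are indexed by consecutive time steps, so closer moments get
    larger weights and the diagonal (self-loops) equals 1.
    """
    t = np.arange(num_nodes, dtype=float)
    diff = t[:, None] - t[None, :]
    return np.exp(-(diff ** 2) / (2.0 * xi ** 2))

def build_load_graphs(series: np.ndarray, window: int, stride: int, xi: float):
    """Slide a window over a (T, d) feature series and return (A, X) pairs."""
    adjacency = gaussian_adjacency(window, xi)  # identical for every window
    graphs = []
    for start in range(0, series.shape[0] - window + 1, stride):
        graphs.append((adjacency, series[start:start + window]))
    return graphs

# Toy series: 100 time steps, 5 features per step (load, cases, mobility,
# stay-at-home devices, temperature), filled with random placeholders.
series = np.random.default_rng(1).random((100, 5))
graphs = build_load_graphs(series, window=24, stride=4, xi=3.0)
adjacency, features = graphs[0]
```

With 100 time steps, a window of 24, and a stride of 4, the loop produces 20 graphs, each with a 24 × 24 adjacency matrix and a 24 × 5 feature matrix.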

### 2.2 Load Forecasting Based on ResGCN

##### 2.2.1 Problem Statement

Graph representation learning refers to the technique that extracts desired high-dimension features of a graph so that the representation can be easily used by downstream tasks (Xia et al., 2021). In this paper, the short-term load forecasting problem can be stated as follows: given a set of load graphs and the actual following 24-h load records as labels $\{y_1,y_2,\dots,y_L\}$, our goal is to learn a model that can make 24-h forecasts for unseen load graphs.

Although traditional convolutional neural networks perform well in text processing and image recognition, they can only process data in Euclidean space. To that end, there has been an increasing interest in generalizing convolutions to the graph domain (Jie and Gc, 2020). GCN is one of the most popular methods, which learns node representations by passing and aggregating messages between neighbor nodes while preserving the topological structure. However, an aggregation process with $k$ GCN layers makes use of the information of $k$-order neighbors. As a result, GCN can over-smooth the representations when more GCN layers are stacked. Therefore, inspired by residual modeling, this paper designs a residual graph convolutional network (ResGCN) (Li et al., 2018).

##### 2.2.2 Framework of ResGCN

As shown in Figure 4, the proposed ResGCN comprises the following components:

(1) Fully connected layer, which transforms the graph-structured representations into a sequence $V$;

(2) Residual graph convolutional network (ResGCN) blocks, each of which is tasked to learn an encoder such that the output $Z=\{\vec{z}_1,\vec{z}_2,\dots,\vec{z}_{T_k}\}$ consists of high-level node representations, where $F$ and $F'$ are the dimension of the input features and the dimension of the embedding space, respectively;

(3) LSTM layer, which extracts features from the input sequence $V$;

(4) Pooling layer with attention, which compresses the outputs by LSTM so that the redundant information can be removed. The pooling operation can be expressed as

where || denotes the concatenation of vectors.

(5) Fully connected layers integrating element-wise activation functions, which map the final graph representations to the forecasts of future 24-h load, as

FIGURE 4. Framework of ResGCN for short-term load forecasting.

##### 2.2.3 Construction of one ResGCN Block

Without loss of generality, denote the input of one ResGCN block as $X \in \mathbb{R}^{N \times F}$ and the output as $Z \in \mathbb{R}^{N \times F'}$. Instead of learning the original mapping $h(X)$ directly, the stacked layers in one ResGCN block aim at learning the residual mapping defined as $f(X)=h(X)-X$, which is easier to optimize and can mitigate over-smoothing. Therefore, the original mapping is recast as $f(X)+X$.

As shown in Figure 5, the operation $f(X)+X$ is realized by a shortcut connection, which is simply an identity mapping. The identity mapping ensures that the deeper model performs at least as well as its shallower counterpart. Besides, the residual mapping $f(X)$ is fitted using one or two modified GCN layers, whose general form can be expressed as

where $x_i^{(k)}$ is the feature vector of node $i$ in the embedding space at the $k$th layer; $N(i)$ denotes the neighbors of node $i$; $\phi$ is a non-linear differentiable function, e.g., a multi-layer perceptron (MLP), which updates the representation of node $i$ based on node features and edge weights $\omega_{i,j}$; $\Gamma$ aggregates the representations of the neighbors and is order invariant, such as summation, maximization, and concatenation; and $\gamma$ is a non-linear activation function.

FIGURE 5. Configuration of a ResGCN block.

Herein, this paper modifies the traditional GCN to map the residual, as

$\dots\, x_j^{(k-1)}+\alpha_l\cdot\omega_{i,j}\cdot x_i^{(0)}\Big)\Big((1-\beta_l)I_n+\beta_l\Theta\Big)\Big) \quad (10)$

where $\Theta$ is the learnable weight matrix, $\deg(i)$ denotes the degree of the $i$th node, $\alpha$ and $\beta$ are decay parameters of the residual and weight matrix, respectively, and $I_n$ is the identity matrix.
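The surviving terms of Eq. 10 (an initial-residual term weighted by $\alpha_l$ and a weight matrix blended with the identity through $\beta_l$) suggest a layer of the following kind. The sketch below is an assumption-laden illustration, not the authors' layer: it uses symmetric degree normalization, a $(1-\alpha)$ factor on the propagated term, and ReLU as the activation, none of which can be confirmed from the text, and it omits the edge weight that multiplies the initial-residual term in Eq. 10. The names `sym_normalize` and `residual_gcn_layer` are hypothetical.

```python
import numpy as np

def sym_normalize(adj: np.ndarray) -> np.ndarray:
    """Scale edge weight w_ij by 1 / sqrt(deg(i) * deg(j)) (assumed)."""
    deg = adj.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]

def residual_gcn_layer(h, h0, adj_norm, theta, alpha, beta):
    """One residual graph convolution step.

    h        : (N, F) node representations from the previous layer
    h0       : (N, F) initial node representations (residual source)
    adj_norm : (N, N) normalized weighted adjacency
    theta    : (F, F) learnable weight matrix
    alpha    : decay parameter of the residual
    beta     : decay parameter of the weight matrix
    """
    # Aggregate neighbours, then mix in the initial representation.
    propagated = (1.0 - alpha) * (adj_norm @ h) + alpha * h0
    # Blend the weight matrix with the identity mapping.
    mixing = (1.0 - beta) * np.eye(theta.shape[0]) + beta * theta
    return np.maximum(propagated @ mixing, 0.0)  # ReLU (assumed activation)

# Toy example: 6 nodes with Gaussian-kernel edge weights and 4 features.
rng = np.random.default_rng(0)
t = np.arange(6, dtype=float)
adj = np.exp(-((t[:, None] - t[None, :]) ** 2) / 8.0)
adj_norm = sym_normalize(adj)
h0 = rng.random((6, 4))
theta = rng.standard_normal((4, 4))

# Stack several layers; every layer sees the initial representation h0.
h = h0
for _ in range(4):
    h = residual_gcn_layer(h, h0, adj_norm, theta, alpha=0.1, beta=0.5)

# With alpha = 1 and beta = 0 the layer reduces to the identity on h0.
identity_out = residual_gcn_layer(h, h0, adj_norm, theta, alpha=1.0, beta=0.0)
```

The limiting case at the end illustrates why such a block cannot do worse than passing its input through: when the learned part is switched off, the non-negative input is returned unchanged.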

##### 2.2.4 Loss Function and Training

The proposed ResGCN-based short-term load forecasting is a regression problem, so the loss function is defined by the mean square error. Besides, L2 regularization is adopted to prevent the model from overfitting and to enhance the generalization ability, as

$Loss=\frac{1}{T}\sum_{t=1}^{T}(y_t-\hat{y}_t)^2+\frac{\lambda}{2}\cdot\sum_{i}\omega_i^2 \quad (11)$

where $y_t$ is the training data while $\hat{y}_t$ is the output of ResGCN; $\lambda$ is the weight decay coefficient, which keeps the penalty term from becoming too large; and $\omega_i$ is the parameter weight of the net.
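Eq. 11 can be evaluated as in the short sketch below. The function name `regularized_mse` and the toy numbers are hypothetical; the parameter list `weights` stands for whatever trainable arrays the network holds.

```python
import numpy as np

def regularized_mse(y_true, y_pred, weights, lam):
    """Mean square error plus (lam / 2) * sum of squared parameters."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    penalty = 0.5 * lam * sum(float(np.sum(w ** 2)) for w in weights)
    return mse + penalty

# Toy check: errors (0, 0, 2) give an MSE of 4/3; weights (1, 2) with
# lam = 0.1 give a penalty of 0.5 * 0.1 * 5 = 0.25.
loss = regularized_mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0],
                       weights=[np.array([1.0, 2.0])], lam=0.1)
```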

With the loss function, ResGCN is trained by Adam, wherein the data are split into small batches that are used to calculate the loss function and update the coefficients. Besides, the EarlyStopping mechanism (Prechelt, 2012) is introduced to halt the training when the loss function stops decreasing for several iterations.
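A patience-based stopping rule of this kind can be sketched as follows. The class below is a generic illustration, not the specific mechanism used in the paper: the parameters `patience` and `min_delta` and the sequence of loss values are hypothetical.

```python
class EarlyStopping:
    """Stop when the monitored loss has not improved for `patience` steps."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_steps = 0

    def step(self, loss: float) -> bool:
        """Record a loss value; return True when training should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_steps = 0
        else:
            self.bad_steps += 1
        return self.bad_steps >= self.patience

# Toy loss curve: the loss improves three times, then stalls.
stopper = EarlyStopping(patience=3)
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.70, 0.69]
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

In this toy run the loss fails to improve on 0.7 for three consecutive steps, so training halts at index 5 and the final value 0.69 is never reached; a larger `patience` trades longer training for tolerance of such plateaus.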

##### 2.2.5 Parallel Training

The load graphs in one batch cannot simply be fed into the model simultaneously, and processing them one at a time is not computationally efficient. Therefore, this paper proposes a graph concatenation method for parallel training. Specifically, the adjacency matrices $A$ and feature matrices $X$ of multiple independent load graphs are each concatenated along the diagonal, yielding a giant graph whose large, sparse adjacency matrix and feature matrix contain the information of all the subgraphs, as depicted in Figure 6. Thanks to the weight-sharing mechanism, feeding the giant graph into ResGCN is equivalent to training on the multiple subgraphs separately, while the computation time is reduced significantly.

FIGURE 6. Graph concatenation for parallel training.
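The equivalence claimed for graph concatenation can be checked numerically with a small sketch. It rests on one assumption that should be stated plainly: the adjacency matrices are placed on the diagonal of one large matrix, while the feature matrices are stacked row-wise so that a shared weight matrix still applies to every node. The function name `concat_graphs` and the random toy graphs are hypothetical.

```python
import numpy as np

def concat_graphs(graphs):
    """Merge (A, X) pairs into one block-diagonal adjacency matrix and
    one row-stacked feature matrix."""
    sizes = [adj.shape[0] for adj, _ in graphs]
    total = sum(sizes)
    big_adj = np.zeros((total, total))
    offset = 0
    for (adj, _), n in zip(graphs, sizes):
        # Each subgraph occupies its own diagonal block; the zero
        # off-diagonal blocks mean no messages pass between subgraphs.
        big_adj[offset:offset + n, offset:offset + n] = adj
        offset += n
    big_x = np.vstack([x for _, x in graphs])
    return big_adj, big_x, sizes

# Two independent toy graphs with 4 and 5 nodes and 3 features per node.
rng = np.random.default_rng(2)
graphs = [(rng.random((n, n)), rng.random((n, 3))) for n in (4, 5)]
big_adj, big_x, sizes = concat_graphs(graphs)

# One propagation step on the giant graph versus on each subgraph.
batched = big_adj @ big_x
separate = np.vstack([adj @ x for adj, x in graphs])
```

The arrays `batched` and `separate` agree, which is the sense in which one pass over the giant graph replaces separate passes over the subgraphs. In practice a sparse matrix format would be preferable to the dense array used here.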

## 3 Case Studies

### 3.1 Implementation and Benchmark

The COVID-EMDA+ dataset (Ruan et al., 2020) is adopted to validate the proposed method. It was developed by the group at Tsinghua University led by Prof. Haiwang Zhong, the primary supervisor of this study, in collaboration with two other famous groups at Texas A&M University and Massachusetts Institute of Technology, respectively, and it has won wide recognition and been published in Joule. The dataset integrates historical load information from major United States power markets such as CAISO, MISO, ISO-NE, and NYISO, together with other exogenous information such as epidemic status and population flows.

The distribution of confirmed COVID-19 cases in the United States until January 02, 2021, is shown in Figure 7. It can be observed that, as the most populous city in Texas, Houston has as many as 48,225 confirmed cases, ranking second in the United States. Besides, its GDP exceeded \$512 billion prior to the outbreak of the pandemic, which was more than those of 37 states in the United States and accounted for 27.8% of that of Texas. Moreover, as the fourth largest city, it had an annual electricity consumption of nearly 1.08 billion kWh in 2019. Therefore, the data of ERCOT-Houston from January 23, 2020, to November 23, 2020, were selected from COVID-EMDA+ for training and testing, since Houston is representative in terms of epidemic development, economic status, and electricity consumption.

FIGURE 7. Distribution of cumulative confirmed cases in the United States.

The data are divided into a training set, a validation set, and a test set in the ratio of 8:1:1. The proposed method is compared with five classical algorithms, including the traditional temporal prediction algorithms ARIMA, MAF, and ES, a machine learning-based method, i.e., random forests (RF), and long short-term memory deep neural networks (LSTM-DNN). The inputs and hyperparameters of the algorithms are shown in Table 1. Besides, two metrics are selected to assess their performance, namely mean absolute percentage error (MAPE) and root mean square error (RMSE), which are calculated as

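$MAPE=\frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right| \quad (12)$

$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2} \quad (13)$

where $y_i$ and $\hat{y}_i$ are the testing data and the forecasts by the algorithms, respectively, while $n$ is the total number of time moments in the data set.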
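The two metrics in Eqs 12 and 13 can be computed as in the sketch below. The function names and the three toy load values are hypothetical and serve only to make the arithmetic concrete.

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)

def rmse(actual, forecast):
    """Root mean square error, in the unit of the data."""
    total = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    return math.sqrt(total / len(actual))

# Toy values: relative errors are 10%, 5%, and 0%.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
mape_value = mape(actual, forecast)  # (10 + 5 + 0) / 3 = 5 percent
rmse_value = rmse(actual, forecast)  # sqrt((100 + 100 + 0) / 3)
```

MAPE is undefined when an actual value is zero, which is rarely a concern for aggregated regional load but worth guarding against in other settings.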

TABLE 1. Inputs and hyperparameters of compared algorithms.

### 3.2 Validation of ResGCN for Short-Term Load Forecasting

The results on all the test data are shown in Figure 8A, while those on 1 day are depicted in Figure 8B, which demonstrate that the predictions of the proposed method are closer to the actual values than those of the other algorithms. Besides, it can be observed from Table 2 that the performance of ARIMA, MAF, ES, and RF is not satisfactory in terms of either MAPE or RMSE. Although LSTM-DNN outperforms the above four methods, its performance is still worse than that of the proposed method, with MAPE and RMSE higher by 1.3264 and 15.03%, respectively. This justifies the superiority of the proposed method over other algorithms in short-term load forecasting.

FIGURE 8. Forecasts by different methods. (A) Forecasts on all the testing data; (B) Forecasts on 2020.11.05.

TABLE 2. Performance metrics of different methods.

### 3.3 Validation of Changes in Electricity Consumption Under COVID-19

To validate our argument that the epidemic largely affects electricity utilization, the differences in power load in 2020 with and without the pandemic are compared. To that end, using historical data during 2017–2019, the load forecasts for 2020 by the well-acknowledged linear regression are taken as a benchmark for the electricity utilization without the crisis.
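The idea of such a benchmark can be sketched with a linear trend fitted to pre-pandemic values and extrapolated by one year. The numbers below are arbitrary placeholders, not values from COVID-EMDA+, and the granularity of the regression used in the paper is not specified, so annual totals are assumed here for simplicity.

```python
import numpy as np

# Placeholder annual totals in arbitrary units; NOT taken from the dataset.
years = np.array([2017, 2018, 2019])
totals = np.array([100.0, 103.0, 106.0])

# Least-squares fit of total = slope * (year - 2017) + intercept.
slope, intercept = np.polyfit(years - years[0], totals, deg=1)

# Extrapolate the pre-pandemic trend to 2020 as the "no-COVID" benchmark.
baseline_2020 = slope * (2020 - years[0]) + intercept

# Hypothetical observed value, used only to show how the gap is computed.
actual_2020 = 104.5
gap = baseline_2020 - actual_2020
```

A positive `gap` means the observed value falls short of the extrapolated trend, which is the kind of deviation attributed to the pandemic in the comparison that follows.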

Firstly, the differences in annual total electricity consumption and maximum load are analyzed, as shown in Figures 9A, B, respectively. According to the linear regression-based method, the total electricity consumption in 2020 should reach 110.142 million kWh. However, it was only 107.757 million kWh in reality, with a decrease by 2.3851 million kWh. By contrast, the actual maximum load was higher than the forecasts by 1,284 kW. The results by the linear regression are reasonable in the sense that Houston is still in a stage of high growth based on the trend of its load in the previous 2 years. However, there was a significant drop in electricity consumption and maximum load, which was clearly an anomaly likely caused by the changes in industrial production and economic activities under the epidemic.

FIGURE 9. Comparison of annual total electricity consumption and maximum load with and without COVID-19. (A) Annual total electricity consumption; (B) Annual maximum load.

Secondly, the differences in daily and monthly total electricity consumption and maximum load are also compared, as shown in Figures 10A, B and Figures 11A, B, respectively. It can be seen that the forecasts by linear regression remain larger than the actual values from March to October. Turning to the status of the epidemic and people’s responses, this paper analyzes the data on COVID-19 cases, infection rate and fatality rate, changes of working location, and mobility in public places, which are depicted in Figures 12A-D, respectively. It can be observed that epidemic development and load changes follow similar trends. On March 13, 2020, then-President Trump declared a state of emergency in the United States, after which the ratio of the working-from-home population increased while the electricity consumption immediately dropped. Besides, the number of confirmed cases started to increase in late March, resulting in less mobility in public places while the load also stayed at a lower level. Thus, it is safe to draw the conclusion that electricity consumption is strongly connected with COVID-19 cases and the level of social activities.

FIGURE 10. Comparison of daily and monthly total electricity consumption with and without COVID-19. (A) Daily total electricity consumption; (B) Monthly total electricity consumption.

FIGURE 11. Comparison of daily and monthly maximum load with and without COVID-19. (A) Daily maximum load; (B) Monthly maximum load.

FIGURE 12. COVID-19 status and people’s response in Houston. (A) Daily increase numbers of confirmed cases and deaths; (B) Infection rate and fatality rate; (C) Changes of working location; (D) Mobility in public places.

### 3.4 Validation of Load Graph-Based ResGCN in Short-Term Load Forecasting Under COVID-19

With the correlation between COVID-19 and electricity consumption changes in mind, it is apparent that the effects of the pandemic should be incorporated into short-term load forecasting. To evaluate the proposed load graph encoding heterogeneous features, this paper compares it with another graph that only encodes load and temperature information, termed the naive load graph in the following. Then, the forecasting performance of ResGCN with the two kinds of graphs is analyzed in the scenarios with and without COVID-19.

It can be seen from Table 3 that the ResGCN with the naive load graph achieves satisfactory performance in load forecasting in the scenario of the year 2019, with MAPE as small as 6.0021%. However, the performance declines after the outbreak of the pandemic, with an increase in MAPE by around 1.5%. By contrast, using the load graph encoding heterogeneous features, the forecast performance of ResGCN is much more robust, as shown in Table 4. The superiority of the proposed method is further validated by Figure 13, where the forecasts with consideration of COVID-19 are far closer to the actual data. This again justifies that incorporating the epidemic’s effects can improve the accuracy of short-term load forecasting, which is of significant value for a new normal featured by living with COVID-19.

TABLE 3. Forecasting performance of ResGCN with naive load graph.

TABLE 4. Forecasting performance of ResGCN with load graph encoding heterogeneous features.

FIGURE 13. Short-term load forecasts with and without considering effects of COVID-19 (2020.10.23-2020.10.26).

## 4 Conclusion and Prospects

### 4.1 Conclusion

The fight against COVID-19 is far from over, while many countries have started to resume economic development, aiming at a “living with COVID” new normal. In this context, this paper proposes a novel short-term load forecasting method under COVID-19 based on graph representation learning with heterogeneous features. Unlike existing methods that fit power load data to time series, this study encodes heterogeneous features relevant to electricity consumption and epidemic status into a load graph, so that not only are the features contained at each time moment, but the inherent correlations between the features can also be exploited. Then, a residual graph convolutional network (ResGCN) is constructed to fit the non-linear mapping from the load graph to future loads. Besides, a graph concatenation method for parallel training is proposed to improve the learning efficiency.

The following points can be concluded from the case study using practical data in Houston:

(1) There are strong correlations between the evolution of COVID-19 and changes in electricity utilization.

(2) The proposed load graph is capable of exploiting heterogeneous features, while the accuracy of load forecasting can be improved significantly by considering the effects of the pandemic.

(3) The ResGCN outperforms existing short-term load forecasting methods in accuracy, with a decrease of RMSE by 15.03% compared with LSTM-DNN.

### 4.2 Prospects

In the present forecasting methodology with the load graph, features like vaccination rates have not been considered. As the epidemic develops and vaccination becomes more widespread, these characteristics will become a non-negligible part of the epidemic’s impact on the load. Therefore, it remains for us to refine the load graph as the situation evolves. In addition, the selected features encoded in the load graph only reflect partial impacts of the epidemic, and further research is still needed to grasp their relationships fully.

It is worth noting that the idea of representing load as a graph and using ResGCN to do the forecasting can be applied not only to the regional load forecasting under major social and health events such as epidemics but also to the forecasting tasks that also require the integration of heterogeneous information, e.g., renewable energy output forecasting, which will be the direction of our future research.

As for the impacts of the epidemic on load, as the paper concluded in the case study, they are complex and need to be studied with respect to local policies. The discussions in this paper are focused on Houston only. In fact, the responses of the government and the attitude of people to the epidemic vary significantly from country to country, so that the situation in a larger area shall be investigated in the future.

## Author Contributions

Conceptualization, ZY; Data curation, JY; Formal analysis, YW; Methodology, ZY; Project administration, ZY; Resources, JY; Visualization, YH; Writing, ZY. All authors contributed to the article and approved the submitted version.

## Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

## Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

## Acknowledgments

The authors gratefully acknowledge the support of the Natural Science Foundation of China-Smart Grid Joint Fund of State Grid Corporation of China (U2066212) and the data provided by Haiwang Zhong.