# A R-GCN-Based Correlation Characteristics Extraction Method for Power Grid Infrastructure Planning and Analysis

Shengwei Lu, et al.

May 7, 2022

## Introduction

As the energy revolution continues to advance in depth, the electric power structure will gradually shift from traditional fossil fuel-based power to clean and low-carbon renewable energy power (Erdiwansyah et al., 2021; Zhang et al., 2022). Power grid enterprises shoulder the heavy burden of the national economy and people’s livelihood, and the construction of power grid infrastructure projects has a subtle influence on the security, stability and development of the construction area. In the face of massive infrastructure projects to be selected from various prefectures and cities across the province, power grid companies are facing huge challenges in coordinating the construction of regional and provincial main grid projects and distribution network projects among cities (Liu et al., 2017; Chen et al., 2020; Liu et al., 2021).

Although the current demand for power grid infrastructure is huge, the funds actually available to power grid companies are often lower than the actual construction demand (Chen et al., 2020). Therefore, how to direct limited resources such as capital, manpower and equipment to the most valuable projects is of great significance to power grid planning. Among massive power grid infrastructure projects with different voltage levels, engineering attributes and project properties, some projects may be specially related to one another (Sheng et al., 2020; Li et al., 2021). In such cases, whether a project is constructed largely determines whether and when the following projects are constructed. Moreover, identifying the linkages among projects manually costs a great deal of manpower and time, and it is difficult to cover all attributes and features of the projects in a comprehensive consideration. Therefore, an intelligent correlation characteristics extraction method is necessary. Existing studies have not systematically considered the possible interrelationships among projects, and few have comprehensively analyzed the correlation characteristics among massive power grid infrastructure projects (Xiao et al., 2019; Sheng et al., 2020; Yang et al., 2021). In this context, fully considering the correlation characteristics among the massive infrastructure optimization projects and accurately identifying the linkages among different projects can provide more instructive guidance for the subsequent investment portfolio optimization (Huang et al., 2020; Yang et al., 2021).

In this paper, an R-GCN-based method for identifying the linkages among massive infrastructure projects is designed for power system planning, with the aim of meeting the growth of infrastructure demand and enhancing investment benefit. The key contributions of this study are twofold:

1) From the perspective of the engineering attributes of infrastructure projects and the inherent attributes of the project itself, four project entity node types for massive power grid infrastructure projects are established: power transformation projects, transmission line projects, power transmission and transformation projects, and supporting transmission projects, as well as four specific linkages: mandatory relation, coexistence relation, interdependence relation, and mutual exclusion relation.

2) Based on the R-GCN methodology (Schlichtkrull et al., 2017), an identification method of linkages among massive power grid infrastructure projects is proposed, consisting of four parts: an input of original triples of the form (entity node feature vector of one project, relation, entity node feature vector of another project), an R-GCN encoder, a DistMult decoder and a cross-entropy-based boundary loss calculation.

## Linkages Among Massive Infrastructure Projects

During portfolio optimization, the candidate project library covers a large number of power grid infrastructure projects. From the perspective of project properties, it includes power transformation projects, transmission line projects, and power transmission and transformation projects at voltage levels of 500kV, 220kV, 110kV and 35kV (Xiao et al., 2019; Sheng et al., 2020; Yang et al., 2021). Each type of project covers newly-started projects, continued-construction projects, expansion projects, and renovation projects (Hong et al., 2021). The overall number of projects is huge, and the relations among projects are intricate. The choice of which projects to build and the order of construction will affect the selection of subsequent projects and the management of the construction period (Xiao et al., 2019; Hong et al., 2021; Yang et al., 2021). Therefore, it is necessary to mine the potential linkages among projects more deeply. Considering the engineering attributes and project properties of massive power grid infrastructure projects with multiple voltage levels, the correlation characteristics are analyzed, and four types of project entity nodes are formulated: power transformation projects, transmission line projects, power transmission and transformation projects, and supporting transmission projects. Four specific linkages are also defined: mandatory relation, coexistence relation, interdependence relation, and mutual exclusion relation, as shown in Figure 1.

FIGURE 1. Linkages among massive infrastructure projects.

### Mandatory Relation

The portfolio optimization of power grid infrastructure projects does not focus on a single project but considers the regional grid as a whole. Some projects may play a crucial role in the safety and reliability of the regional grid and should be mandatorily selected, regardless of the comprehensive evaluation results. Such projects must be constructed and put into operation, and are given the highest priority. The mandatory projects cover three voltage levels: 500kV and 220kV for the regional and provincial main grids, and 110kV for the distribution network. Furthermore, in terms of project properties, they include power supply delivery projects, electric railway supporting projects, UHV supporting projects, new energy collection stations and other power grid infrastructure projects.

### Coexistence Relation

The coexistence relation means that two projects only make sense together: either both go into production or neither is selected. When a new power transmission and transformation project is built, substations and transmission lines in the corresponding area are constructed. To ensure the delivery of electric energy, it is necessary to construct supporting transmission projects corresponding to each voltage level. For example, a 220kV power transmission and transformation project and the 110kV transmission project of the 220kV substation are coexistent projects.

### Interdependence Relation

The interdependence relation means that two projects must be constructed in a fixed order, in terms of either time or space: one project can only be arranged after the other is put into operation. On the one hand, because power grid infrastructure projects are large in scale, technically difficult and long in construction period, the power supply delivery and transmission line projects of the regional and provincial main grids are implemented in two or three phases to avoid and reduce risks, so there is an interdependence relation between the phased projects. On the other hand, multi-circuit lines are established for the newly-started and renovation transmission line projects of the regional and provincial main grids and part of the 110kV distribution network, which are spatially consistent. These projects are interdependent, working together to improve the security, stability and reliability of power grids.

### Mutual Exclusion Relation

The mutual exclusion relation means that two projects conflict and cannot be selected simultaneously. Because of the huge number of power grid infrastructure projects, projects may be recorded repeatedly, coverage regions may overlap, and projects with the same function may exist. To avoid the waste of resources caused by repeated construction, such projects should be selected on merit.
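The four linkages can be read as constraints on which projects may be selected together. The following is a minimal sketch of that reading, not the authors' implementation. The project identifiers, the `Linkage` class and the `violations` helper are hypothetical, and two modelling choices are assumptions: a mandatory linkage is stored with the same project as head and tail, and in an interdependence linkage the head is the project that must come first.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    MANDATORY = "mandatory"
    COEXISTENCE = "coexistence"
    INTERDEPENDENCE = "interdependence"
    MUTUAL_EXCLUSION = "mutual_exclusion"


@dataclass(frozen=True)
class Linkage:
    head: str           # hypothetical project identifier
    relation: Relation
    tail: str           # for MANDATORY, assumed equal to head


def violations(selected, linkages):
    """Return the linkages that a set of selected project ids violates."""
    broken = []
    for link in linkages:
        h, t = link.head in selected, link.tail in selected
        if link.relation is Relation.MANDATORY:
            ok = h                    # the project must be selected
        elif link.relation is Relation.COEXISTENCE:
            ok = h == t               # both selected or neither
        elif link.relation is Relation.INTERDEPENDENCE:
            ok = h or not t           # the tail may only follow the head
        else:                         # MUTUAL_EXCLUSION
            ok = not (h and t)        # never both
        if not ok:
            broken.append(link)
    return broken


links = [
    Linkage("P1", Relation.MANDATORY, "P1"),
    Linkage("P2", Relation.COEXISTENCE, "P3"),
    Linkage("P4", Relation.INTERDEPENDENCE, "P5"),
    Linkage("P6", Relation.MUTUAL_EXCLUSION, "P7"),
]
bad = violations({"P1", "P2", "P5", "P6"}, links)
```

In this illustrative selection, P2 is chosen without its coexistent partner P3, and P5 is chosen without its predecessor P4, so those two linkages are reported.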

## Relational Graph Convolutional Neural Network Encoder

### Input

The input of the R-GCN-based identification method is defined as the original triples of the form (entity node feature vector of one project, relation, entity node feature vector of another project), which are essentially composed of a finite set of power grid infrastructure projects and a finite set of linkages among these projects. Therefore, the input $f_{tri}(y)$ can be summarized as the following expression:

$f_{tri}(y) = \{V_{pro}, E_{lin}, X_V, A_V\} \quad (1)$

where $V_{pro} = \{v_i\} \in \mathbb{R}^{n}$ denotes the set of massive power grid project entity nodes and $n$ is the number of power grid project entity nodes in $V_{pro}$. Correspondingly, $E_{lin}$ is the set of defined linkages among massive power grid infrastructure projects. $X_V = \{x_{i,j}\} \in \mathbb{R}^{n \times d}$ is the feature matrix of the power grid project entity nodes in $V_{pro}$, where $d$ is the number of attributes of each node and $x_{i,j}$ denotes the value of the j-th attribute of the power grid project entity node $i$ in $V_{pro}$. $A_V$ is the adjacency matrix of $V_{pro}$, which represents the linkage between every two power grid project entity nodes. The definition of the adjacency matrix $A_V = \{a_{i,j}\} \in \mathbb{R}^{n \times n}$ is shown as follows:

Based on the above definition, the input $f_{tri}(y)$ can be converted into a spectral signal $\hat{f}_{tri}(y)$ by the Graph Fourier Transform, as shown in formula 3.

$\hat{f}_{tri}(y) = U_V^{T} f_{tri}(y) \quad (3)$

where $f_{tri}(y)$ is the input of defined original power grid project triples and $\hat{f}_{tri}(y)$ is the corresponding spectral input. $U_V^{T}$ is the transposed eigenvector matrix which originates from the eigen-decomposition of the normalized Laplacian matrix $L_V$, which corresponds to the adjacency matrix $A_V$.
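The step from triples to the matrices used above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's definition: the adjacency matrix is taken to be symmetric and binary (1 when any linkage connects two projects), and the normalized Laplacian is taken in its usual form $I_n - D^{-1/2} A D^{-1/2}$ with $D$ the degree matrix. The node indices and function names are hypothetical.

```python
import math


def adjacency(n, triples):
    """Symmetric 0/1 adjacency matrix from (head, relation, tail) index triples."""
    a = [[0.0] * n for _ in range(n)]
    for head, _relation, tail in triples:
        a[head][tail] = 1.0
        a[tail][head] = 1.0
    return a


def normalized_laplacian(a):
    """L = I - D^{-1/2} A D^{-1/2}; isolated nodes get no off-diagonal terms."""
    n = len(a)
    degree = [sum(row) for row in a]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degree]
    return [
        [(1.0 if i == j else 0.0) - inv_sqrt[i] * a[i][j] * inv_sqrt[j]
         for j in range(n)]
        for i in range(n)
    ]


# Three hypothetical projects connected in a chain 0 - 1 - 2.
triples = [(0, "coexistence", 1), (1, "interdependence", 2)]
A = adjacency(3, triples)
L = normalized_laplacian(A)
```

A binary matrix of this kind discards the linkage type, which is why the relational encoder described later keeps a separate neighbor set for each linkage.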

### Relational Graph Convolutional Neural Network

Based on the graph convolution methodology and the Graph Fourier Transform, the graph convolution of the input of defined original power grid project triples can be realized in the standard orthogonal space of the spectral domain, as shown below:

$f_{tri}(y) *_G g = F^{-1}\left(F(f_{tri}(y)) \odot F(g)\right) = U_V\left(U_V^{T} f_{tri}(y) \odot U_V^{T} g\right) \quad (4)$

where $g$ is the graph convolution kernel, $*_G$ is the graph convolution operator, and $\odot$ is the Hadamard product.

After converting the Hadamard product into the matrix multiplication, the graph convolution of the input of original power grid project triples is changed into the following formulas:

$f_{tri}(y) *_G g = U_V\, g_\theta\, U_V^{T} f_{tri}(y) \quad (5)$

$U_V^{T} g = [\theta_1, \theta_2, \ldots, \theta_n]^{T} \quad (6)$

$g_\theta = \mathrm{diag}([\theta_1, \theta_2, \ldots, \theta_n]) \quad (7)$

where $\theta_1, \theta_2, \ldots, \theta_n$ are the parameters of the graph convolution kernel $g$.

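Then the feature matrix of power grid project entity nodes output by the graph convolutional neural network at layer $l+1$ can be obtained and represented by the following formula (Peng, 2020):

$H_j^{l+1} = \tau\left(\sum_{i=1}^{s} U_V\, g_{i,j}^{l}\, U_V^{T} H_i^{l}\right), \quad j = 1, 2, \ldots, t \quad (8)$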

where $H_i^{l} \in \mathbb{R}^{n}$ is the value of the i-th input attribute of all the power grid project entity nodes output at layer $l$, $s$ is the number of dimensions of input attributes of all the power grid project entity nodes at layer $l+1$, and similarly $t$ is the number of dimensions of output attributes of all the power grid project entity nodes at layer $l+1$. $g_{i,j}^{l}$ is the spectral graph convolution kernel and $\tau$ denotes the typical non-linear activation function.

Although the above graph convolutional neural network could be stacked to form a multi-layer convolutional neural network, it is not reliable enough, and the eigen-decomposition required in the above calculation leads to high computational complexity. To make up for these shortcomings, the Chebyshev neural network is introduced to parameterize the graph convolution kernel $g$ to be learned:

$g = g(\Lambda) \approx \sum_{k=0}^{K-1} \theta_k T_k(\hat{\Lambda}) \quad (9)$

where $\theta_k$ denotes the coefficients of the Chebyshev polynomial.
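For readers less familiar with the notation in formula 9, the symbols $T_k$ and $\hat{\Lambda}$ can be read with the usual Chebyshev-network conventions. The block below states those conventions as an assumption about the intended meaning: $\Lambda$ is the diagonal matrix of eigenvalues of $L_V$, $\lambda_{\max}$ is its largest eigenvalue, and $K$ is the polynomial order.

```latex
\hat{\Lambda} = \frac{2}{\lambda_{\max}}\,\Lambda - I_n,
\qquad
T_0(x) = 1,\quad T_1(x) = x,\quad
T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)\ \ (k \ge 2)
```

The rescaling maps the eigenvalues into $[-1, 1]$, the interval on which the Chebyshev polynomials are defined, and the recurrence lets each $T_k$ be computed from the two previous terms without any eigen-decomposition.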

If the Chebyshev polynomial of the diagonal eigenvalue matrix is defined as the graph convolution kernel, the graph convolution of the input of original power grid project triples can be computed by the following formula:

$f_{tri}(y) *_G g = U_V\left(\sum_{k=0}^{K-1} \theta_k T_k(\hat{\Lambda})\right) U_V^{T} f_{tri}(y) = \sum_{k=0}^{K-1} \theta_k T_k\left(U_V \hat{\Lambda} U_V^{T}\right) f_{tri}(y) = \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L}) f_{tri}(y) \quad (11)$

Moreover, after the introduction of the Chebyshev neural network, the feature matrix of power grid project entity nodes output by the graph convolutional neural network at layer $l+1$ is shown below:

In order to further simplify the calculation process, the first-order approximation is also introduced to the above graph convolutional neural network. Fixing the maximum eigenvalue of the normalized Laplacian matrix $L_V$ as the constant two (Li et al., 2020; Jalali et al., 2021), formula 11 and formula 12 can be simplified as follows:

$f_{tri}(y) *_G g = \theta_0 f_{tri}(y) - \theta_1 D_V^{-1/2} A_V D_V^{-1/2} f_{tri}(y) \quad (13)$
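The step from formula 11 to formula 13 can be spelled out as a short derivation. It is a sketch that assumes the usual definitions, which the text does not state explicitly: $D_V$ is the degree matrix of $A_V$, the normalized Laplacian is $L_V = I_n - D_V^{-1/2} A_V D_V^{-1/2}$, the scaled Laplacian is $\tilde{L} = \frac{2}{\lambda_{\max}} L_V - I_n$, and the first-order approximation keeps only the terms $k = 0$ and $k = 1$ (that is, $K = 2$), with $T_0(\tilde{L}) = I_n$ and $T_1(\tilde{L}) = \tilde{L}$.

```latex
\begin{aligned}
\tilde{L} &= \frac{2}{\lambda_{\max}} L_V - I_n
           = L_V - I_n
           = -D_V^{-1/2} A_V D_V^{-1/2}
           \qquad (\lambda_{\max} = 2), \\
f_{tri}(y) *_G g
          &\approx \theta_0\, T_0(\tilde{L})\, f_{tri}(y)
                 + \theta_1\, T_1(\tilde{L})\, f_{tri}(y) \\
          &= \theta_0\, f_{tri}(y)
           - \theta_1\, D_V^{-1/2} A_V D_V^{-1/2} f_{tri}(y).
\end{aligned}
```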

To avoid the problem of overfitting, let $\theta = \theta_0 = -\theta_1$. Then formula 13 and formula 14 can be further simplified, as shown below:

$f_{tri}(y) *_G g = \theta\left(I_n + D_V^{-1/2} A_V D_V^{-1/2}\right) f_{tri}(y) \quad (15)$
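Formula 15 amounts to adding each node's own value to a degree-normalized sum over its neighbors. A minimal sketch on a three-node chain is given below, assuming a binary adjacency matrix and a single scalar signal per node; the function name `propagate` and the example graph are illustrative only.

```python
import math


def propagate(a, f, theta=1.0):
    """Apply theta * (I + D^{-1/2} A D^{-1/2}) to a node signal f."""
    n = len(a)
    degree = [sum(row) for row in a]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degree]
    out = []
    for i in range(n):
        neighbours = sum(
            inv_sqrt[i] * a[i][j] * inv_sqrt[j] * f[j] for j in range(n)
        )
        out.append(theta * (f[i] + neighbours))
    return out


# Chain 0 - 1 - 2 with all of the signal initially on node 0.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
signal = [1.0, 0.0, 0.0]
smoothed = propagate(A, signal)
```

After one application, node 1 receives part of node 0's signal while node 2, two hops away, is still untouched; stacking layers widens the neighborhood that each node can see.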

On the basis of the above simplified graph convolutional neural network, the relational graph convolutional neural network comprehensively considers the connection mode with neighbor power grid project entity nodes under different types of defined linkages and adds a special self-connection to each power grid project entity node, so that the information about all the power grid project entity nodes at each layer can be effectively transmitted (Gusmao et al., 2021). Consequently, the feature matrix of power grid project entity nodes output by the relational graph convolutional neural network at layer $l+1$ is defined as follows:

$H_i^{l+1} = \sigma\left(\sum_{r \in R} \sum_{m \in N_i^{r}} \frac{1}{c_{i,r}} W_r^{l} H_m^{l} + W_o^{l} H_i^{l}\right) \quad (17)$

where $\sigma$ is the activation function, $W_r^{l}$ is the regularization weight matrix of the corresponding power grid project entity nodes under linkage r, and $W_o^{l}$ is the weight matrix of each node's self-connection. $r \in R$ represents the r-th linkage of the set of all defined linkages between related power grid project entity nodes, and $N_i^{r}$ denotes the set of neighbor nodes $m$ of the specific power grid project entity node i at layer $l+1$ under the specific linkage r. Specifically, $c_{i,r}$ is a normalization constant that can either be learned or chosen in advance; here let $c_{i,r} = |N_i^{r}|$.
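The relational layer described above can be sketched in plain Python. This is an illustration of the update rule in the cited R-GCN work (Schlichtkrull et al., 2017), not the authors' code: each node sums its self-connection term and, for every linkage type, the mean of its transformed neighbors, which corresponds to $c_{i,r} = |N_i^{r}|$. The choice of ReLU as the activation, the dictionary layout and all example values are assumptions.

```python
def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def relu(v):
    return [max(0.0, x) for x in v]


def rgcn_layer(h, neighbours, w_rel, w_self):
    """One R-GCN layer.

    h          -- list of node feature vectors at layer l
    neighbours -- neighbours[r][i] is the list of neighbour ids of node i under r
    w_rel      -- w_rel[r] is the weight matrix of linkage r
    w_self     -- weight matrix of the self-connection
    """
    out = []
    for i, h_i in enumerate(h):
        total = matvec(w_self, h_i)
        for r, w_r in w_rel.items():
            nbrs = neighbours.get(r, {}).get(i, [])
            if not nbrs:
                continue
            c = len(nbrs)  # normalization constant c_{i,r} = |N_i^r|
            for m in nbrs:
                message = matvec(w_r, h[m])
                total = [t + x / c for t, x in zip(total, message)]
        out.append(relu(total))
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
h0 = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0]]
neighbours = {"coexistence": {0: [1, 2]}, "mutual_exclusion": {1: [0]}}
w_rel = {"coexistence": identity,
         "mutual_exclusion": [[-1.0, 0.0], [0.0, -1.0]]}
h1 = rgcn_layer(h0, neighbours, w_rel, identity)
```

Keeping one weight matrix per linkage is what lets the encoder treat a coexistence neighbor differently from a mutually exclusive one, at the price of a parameter count that grows with the number of linkage types.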

However, when formula 17 is applied to the input of defined original power grid project triples, which is essentially a multi-relational dataset, the number of parameters of the relational graph convolutional neural network increases rapidly with the number of defined linkages among massive power grid infrastructure projects, which can easily lead to overfitting. To address this issue, we introduce basis-decomposition as a method for regularizing the weights of R-GCN layers (Schlichtkrull et al., 2017). With the basis-decomposition over $B$ basis transformations, each $W_r^{l}$ in formula 17 is defined as follows:

$W_r^{l} = \sum_{b=1}^{B} a_{rb}^{l}\, C_b^{l} \quad (18)$

where $W_r^{l}$ is a linear combination of the basis transformations $C_b^{l}$ with the coefficients $a_{rb}^{l}$, which depend only on the corresponding linkage r between specific power grid infrastructure projects.
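The basis-decomposition can be illustrated with a small sketch: every linkage shares the same basis matrices and only owns a short coefficient vector. The helper names, the basis matrices, the coefficients and the layer sizes below are all hypothetical values chosen for illustration.

```python
def combine_bases(coeffs, bases):
    """Return sum_b coeffs[b] * bases[b] as a list-of-rows matrix."""
    rows, cols = len(bases[0]), len(bases[0][0])
    return [
        [sum(a * c[i][j] for a, c in zip(coeffs, bases)) for j in range(cols)]
        for i in range(rows)
    ]


def parameter_counts(num_relations, num_bases, d_in, d_out):
    """(parameters with one full matrix per relation, parameters with bases)."""
    full = num_relations * d_in * d_out
    shared = num_bases * d_in * d_out + num_relations * num_bases
    return full, shared


# Two shared 2x2 basis matrices and one coefficient vector per linkage.
bases = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.0, 1.0], [1.0, 0.0]]]
coeffs = {"mandatory": [1.0, 0.0], "coexistence": [0.5, 0.5]}
weights = {r: combine_bases(a, bases) for r, a in coeffs.items()}

full, shared = parameter_counts(num_relations=4, num_bases=2, d_in=16, d_out=16)
```

With four linkages, two bases and 16-dimensional layers, the per-linkage formulation needs 1024 weights while the shared-basis formulation needs 520, and the gap widens as more linkage types are added.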

## A R-GCN-Based Identification Method of Linkages

FIGURE 2. The overall process of the proposed identification method.

Firstly, the original triples of the form (node feature vector of one project, relation, node feature vector of another project) are used as both positive and negative samples and fed into the relational graph convolutional neural network encoder. After a series of feature-selection operations such as aggregation, updating and circulation, the project entity node feature vectors output by the R-GCN encoder, which extracts features from the original triples, are combined with the candidate linkages to form the recombinant triples.

Next, we use the DistMult decoder as the scoring function to score the recombinant triples and sort the scores in ascending order (Yu et al., 2021). A recombinant triple is scored as in formula 19:

$f_{tri}(s, r, o) = H_s^{T} R_r H_o \quad (19)$

where $H_i$ is the real-valued vector output by the relational graph convolutional neural network encoder for each project entity node $v_i$, and $R_r$ is the matrix associated with the specific linkage $r$.
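As a sketch of the decoding step, the DistMult score of a triple is a three-way product of the two node vectors and a linkage-specific diagonal matrix, so it can be computed from the diagonal alone. The vectors, the diagonals and the helper names below are hypothetical; the ascending sort mirrors the description above, so the best-scoring linkage comes last.

```python
def distmult_score(h_s, r_diag, h_o):
    """h_s^T diag(r_diag) h_o for two node vectors and one linkage diagonal."""
    return sum(a * r * b for a, r, b in zip(h_s, r_diag, h_o))


def rank_linkages(h_s, h_o, relation_diagonals):
    """Score every candidate linkage and sort the scores in ascending order."""
    scores = {
        r: distmult_score(h_s, diag, h_o)
        for r, diag in relation_diagonals.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


relation_diagonals = {
    "coexistence": [0.5, -1.0],
    "mutual_exclusion": [-1.0, 0.0],
    "interdependence": [1.0, 1.0],
}
h_s, h_o = [1.0, 2.0], [4.0, 0.5]
ranked = rank_linkages(h_s, h_o, relation_diagonals)
best_linkage = ranked[-1][0]
```

Because the linkage matrix is diagonal, the score does not change when the two projects are swapped. That suits symmetric linkages such as coexistence and mutual exclusion, but it means direction-sensitive linkages such as interdependence cannot be distinguished by the decoder alone.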

Finally, the boundary loss calculation based on the cross-entropy loss is performed so that the model scores the observable positive samples higher than the negative samples. By optimizing the cross-entropy loss function, the predicted linkages with the highest scores form the final output. The cross-entropy loss function is shown below:

$L = -\frac{1}{(1+\omega)|\hat{E}|} \sum_{(s,r,o,y) \in T} y \log l\left(f_{tri}(s,r,o)\right) + (1-y) \log\left(1 - l\left(f_{tri}(s,r,o)\right)\right) \quad (20)$

where $T$ represents the set of triples, which covers both the positive samples and the negative samples, $\hat{E}$ is an incomplete set of the linkages between projects with $|\hat{E}|$ its size, and $l(\cdot)$ is the logistic sigmoid function. We take $y$ as an indicator set to $y = 1$ for positive samples and $y = 0$ for negative ones, which indicates the status of each triple.
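The loss in formula 20 can be sketched directly from its definition. Two readings are assumptions here, both taken from the cited R-GCN work rather than from the text above: $\omega$ is the number of negative samples generated per positive sample, and $|\hat{E}|$ is the number of observed (positive) triples. The scores in the example are made up.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_entropy_loss(scored, num_positive, omega):
    """scored: iterable of (score, y), y = 1 for positive, 0 for negative."""
    total = 0.0
    for score, y in scored:
        p = sigmoid(score)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / ((1 + omega) * num_positive)


# One positive and one negative triple (omega = 1).
loss_good = cross_entropy_loss([(2.0, 1), (-2.0, 0)], num_positive=1, omega=1)
loss_bad = cross_entropy_loss([(-2.0, 1), (2.0, 0)], num_positive=1, omega=1)
```

The loss is small when positive triples receive high scores and negative triples receive low ones, and it grows when the ordering is reversed, which is the behavior the boundary loss is meant to enforce.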

The overall training process of the R-GCN-based identification method of linkages among massive power grid infrastructure projects is shown in detail as follows:

1) The related parameters of the R-GCN encoder are initialized and the dataset of massive power grid infrastructure projects is sorted out to obtain the original triples input.

2) The dataset of original triples is input into the R-GCN encoder, which performs a series of feature-selection operations and outputs the feature matrix of the set of project entity nodes.

3) The project entity node feature vectors output by the R-GCN encoder are combined with the candidate linkages between projects to form the recombinant triples.

4) The DistMult decoder is used as the scoring function to score the recombinant triples and sort the scores in ascending order.

5) The boundary loss calculation based on the cross-entropy loss function is performed, so that the model scores the observable positive samples higher than the negative samples.

6) The predicted linkages among massive power grid infrastructure projects with the highest scores are output.

7) The error between the predicted linkages and the actual linkages is calculated.

8) Whether the conditions for terminating training are met is checked. If so, the training process is terminated. If not, the error is used to update the weight matrix of the R-GCN encoder, and the process returns to the second step.
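The steps above rely on both positive and negative samples, but the way negatives are produced is not described. One common option, used in the cited R-GCN work, is to corrupt either the head or the tail of each observed triple with a randomly chosen node. The sketch below assumes that approach; the function names, the node indices and the value of `omega` (negatives per positive) are hypothetical.

```python
import random


def corrupt(triple, num_nodes, known, rng):
    """Replace the head or the tail of a positive triple with a random node."""
    head, relation, tail = triple
    while True:
        node = rng.randrange(num_nodes)
        if rng.random() < 0.5:
            candidate = (node, relation, tail)
        else:
            candidate = (head, relation, node)
        if candidate not in known:  # never relabel an observed triple
            return candidate


def build_samples(positives, num_nodes, omega=1, seed=0):
    """Return (triple, y) pairs: y = 1 for observed, 0 for corrupted triples."""
    rng = random.Random(seed)
    known = set(positives)
    samples = [(t, 1) for t in positives]
    for t in positives:
        samples += [(corrupt(t, num_nodes, known, rng), 0) for _ in range(omega)]
    return samples


positives = [(0, "coexistence", 1), (1, "interdependence", 2)]
samples = build_samples(positives, num_nodes=4, omega=2)
```

Each observed triple yields `omega` corrupted copies, so the sample set here contains two positives and four negatives, which matches the $(1+\omega)$ factor in the loss normalization.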

## Case Studies

TABLE 1. The testing results of the R-GCN-based identification method.

As shown in Table 1, the proposed method can effectively identify the linkages among massive power grid infrastructure projects. Here the deviation is defined as the difference between the predicted value and the actual value, expressed as a percentage of the number of actual triples. For the four linkages, the deviation values range from just below 8% to above 16%, that is, the overall accuracy rate is as high as 90%, which indicates that the proposed method is feasible. Furthermore, as the sample size increases, the accuracy rate of the R-GCN-based method for identifying the correlation characteristics on the candidate project library improves. When the sample size exceeds 30,000, the final accuracy rate reaches 94%, verifying the effectiveness of the proposed method.
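The deviation measure defined above is simple to compute. The sketch below takes the absolute difference, which is an assumption since the sign convention is not stated, and the counts in the example are invented for illustration and are not taken from Table 1.

```python
def deviation_percent(predicted, actual):
    """Absolute difference between counts as a percentage of the actual count."""
    if actual <= 0:
        raise ValueError("the number of actual triples must be positive")
    return 100.0 * abs(predicted - actual) / actual


# Hypothetical counts of predicted and actual triples for one linkage type.
example = deviation_percent(predicted=92, actual=100)
```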

Based on the existing engineering attributes and project properties, the candidate project library is converted into the format of original triples as an input of the model, and some of the predicted linkages among massive infrastructure projects are shown in Figure 3. It is not difficult to find that there are complex relations among the massive infrastructure projects, and the proposed method can quickly identify the linkages and extract the correlation characteristics, greatly improving the degree of intelligence for power grid infrastructure planning.

FIGURE 3. Predicted linkages among infrastructure projects.

## Discussion and Conclusion

From the perspective of the engineering attributes and inherent properties of the power grid infrastructure project, this paper analyzes in detail the correlation characteristics among the multi-voltage-level projects, and finally defines four specific linkages among the massive infrastructure projects. Furthermore, based on the R-GCN, a method which can accurately identify the correlation characteristics is proposed. In the follow-up research, the identified linkages can be considered as one of the constraints of the investment optimization model of massive power grid infrastructure projects, so that a more scientific and reasonable investment portfolio can be obtained. As a result, power infrastructure investment could be further promoted from relatively extensive management to sophisticated, intelligent and high-quality development to achieve precise resource allocation.

## Data Availability Statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding authors.

## Author Contributions

Writing the original draft and editing, SL and WZ; Conceptualization, JY; Formal analysis, YZ; Visualization, LQ and SW; Contributed to the discussion of the topic, QW and MZ.

## Funding

This work is supported by the State Grid Science and Technology Project (No.1400-202257234A-1-1-ZN).

## Conflict of Interest

Authors SL, LQ, QW and MZ are employed by State Grid Hubei Electric Power Company Limited. Authors JY and SW are employed by State Grid Hubei Electric Power Company Limited Economic and Technical Research Institute.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

## Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.