# Predicting circRNA-drug sensitivity associations via graph attention auto-encoder – BMC Bioinformatics

May 4, 2022

### Dataset

In this work, we downloaded the circRNA-drug sensitivity associations from the circRic database [14], in which the drug sensitivity data come from the GDSC database [16], comprising 80,076 associations that involve 404 circRNAs and 250 drugs. The circRic database systematically characterizes circRNA expression profiles in 935 cancer cell lines across 22 cancer lineages from the Cancer Cell Line Encyclopedia; it analyzes circRNA biogenesis regulators, the effect of circRNAs on drug response, and the associations of circRNAs with mRNA, protein, and mutation, and predicts RNA regulatory elements in circRNAs. For each individual circRNA, the Wilcoxon test is applied to identify drug sensitivities significantly associated with the circRNA's expression; an association with a false discovery rate (FDR) below 0.05 is defined as significant. In our method, only these significant associations are extracted as the training set, which includes 4134 associations involving 271 circRNAs and 218 drugs. We finally construct an association matrix $A \in R^{271\times 218}$ between circRNAs and drugs based on these significant associations. In $A$, the element $A_{ij}=1$ indicates that circRNA $i$ and the sensitivity of drug $j$ are interrelated; otherwise, $A_{ij}=0$. Here, $i$ and $j$ denote the indices of the circRNA and drug in $A$, respectively. Besides the circRNA-drug sensitivity associations, we also curate the sequences of the host genes of circRNAs and the structure data of drugs, which come from the National Center for Biotechnology Information (NCBI) Gene database and NCBI's PubChem database, respectively [17, 18]. From the host-gene sequences and the structural information of drugs, the corresponding similarities are calculated.
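As a concrete illustration, the binary association matrix can be assembled from the list of significant pairs as follows. This is a minimal NumPy sketch; the pair list shown is hypothetical, the real pairs being the FDR < 0.05 Wilcoxon results from circRic.

```python
import numpy as np

# Hypothetical (circRNA index, drug index) pairs standing in for the
# 4134 significant associations extracted from circRic.
significant_pairs = [(0, 3), (0, 17), (5, 3)]

n_circrnas, n_drugs = 271, 218
A = np.zeros((n_circrnas, n_drugs), dtype=int)
for i, j in significant_pairs:
    A[i, j] = 1  # circRNA i is significantly associated with drug j's sensitivity
```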

### Similarity networks

#### Sequence similarity of host genes of circRNAs

We calculate the sequence similarity between host genes and use it as the similarity of circRNAs. The similarities are computed from the Levenshtein distance of the sequences through the ratio function of Python's Levenshtein package. In this work, the sequence similarities are represented by the matrix $CSS \in R^{271\times 271}$.
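For reference, `Levenshtein.ratio` returns a normalized similarity in [0, 1]. A dependency-free sketch reproducing its definition (substitutions cost 2, insertions and deletions cost 1) could look like this:

```python
def levenshtein_ratio(s1: str, s2: str) -> float:
    """Normalized similarity matching Levenshtein.ratio:
    (len(s1) + len(s2) - distance) / (len(s1) + len(s2)),
    where substitutions cost 2 and insertions/deletions cost 1."""
    m, n = len(s1), len(s2)
    prev = list(range(n + 1))  # distance from empty prefix: j insertions
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 2
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          prev[j - 1] + cost) # match / substitution
        prev = curr
    total = m + n
    return (total - prev[n]) / total if total else 1.0
```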

#### GIP kernel similarity of circRNA

The GIP (Gaussian interaction profile) kernel similarity is widely used for computing similarities between biological entities [19]. We calculate the GIP kernel similarity of circRNAs from the circRNA-drug sensitivity association matrix $A$, based on the assumption that circRNAs associated with the sensitivity of the same drugs are more likely to be similar. The GIP kernel similarity matrix of circRNAs is denoted by $CGS \in R^{271\times 271}$.
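The GIP kernel itself is not spelled out above; a minimal NumPy sketch, assuming the standard formulation in which the bandwidth is normalized by the mean squared norm of the interaction profiles, is:

```python
import numpy as np

def gip_kernel(A: np.ndarray) -> np.ndarray:
    """GIP kernel similarity over the rows (interaction profiles) of A.
    Assumes the common formulation: K[i, j] = exp(-gamma * ||A[i] - A[j]||^2),
    with gamma normalized by the mean squared profile norm."""
    norms_sq = np.sum(A ** 2, axis=1)
    gamma = 1.0 / np.mean(norms_sq)
    # Squared Euclidean distance between every pair of profiles.
    sq_dist = norms_sq[:, None] + norms_sq[None, :] - 2 * A @ A.T
    return np.exp(-gamma * sq_dist)
```

Under this reading, `CGS` would be `gip_kernel(A)` and, analogously, the drug-side matrix would be `gip_kernel(A.T)`.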

#### Structural similarity of drug

Since a drug's structure largely determines its function, we can measure the similarity of drugs through their structures. Following previous studies, we chose the RDKit toolkit and the Tanimoto coefficient to calculate the structural similarity of drugs [20, 21]. After obtaining the structure data from the PubChem database, we first used RDKit to compute the topological fingerprint of each drug and then calculated the structural similarity between drugs with the Tanimoto coefficient. The resulting structural similarity matrix of drugs is denoted by $DSS \in R^{218\times 218}$.
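The Tanimoto coefficient on binary fingerprints is simply intersection over union of the on-bits. A dependency-free sketch (in practice one would apply RDKit's `DataStructs.TanimotoSimilarity` to RDKit fingerprints; plain sets are used here to keep the example self-contained):

```python
def tanimoto(fp1: set, fp2: set) -> float:
    """Tanimoto coefficient between two fingerprints represented as
    sets of on-bit indices: |intersection| / |union|."""
    if not fp1 and not fp2:
        return 1.0  # convention: two empty fingerprints are identical
    inter = len(fp1 & fp2)
    return inter / (len(fp1) + len(fp2) - inter)
```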

#### GIP kernel similarity of drug

Similar to circRNAs, we also calculate the GIP kernel similarity of drugs, represented by $DGS \in R^{218\times 218}$.

#### Similarity fusion method

As described above, we respectively calculate the similarities of circRNAs and drugs from different aspects. To obtain their comprehensive similarity matrix, the similarities from different aspects need to be fused. The circRNA’s comprehensive similarity matrix is constructed as follows.

$$CS_{ij}=\begin{cases} \dfrac{CSS_{ij}+CGS_{ij}}{2}, & \text{if } CSS_{ij} \ne 0 \\ CGS_{ij}, & \text{otherwise} \end{cases}$$

(1)

Similarly, the drug’s comprehensive similarity matrix is computed as follows.

$$DS_{ij}=\begin{cases} \dfrac{DSS_{ij}+DGS_{ij}}{2}, & \text{if } DSS_{ij} \ne 0 \\ DGS_{ij}, & \text{otherwise} \end{cases}$$

(2)

After obtaining the similarity networks, we binarize them for the downstream GATE model. In this step, we set thresholds $c_{th}$ and $d_{th}$ for the binarization of the circRNA and drug similarity networks, respectively: an element of the similarity matrix is set to 1 if its value is greater than the threshold, and to 0 otherwise.
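Equations (1) and (2) together with the thresholding step can be sketched in a few lines of NumPy; the threshold value passed to `binarize` is a hyperparameter ($c_{th}$ or $d_{th}$), not fixed by the text:

```python
import numpy as np

def fuse(sim_primary: np.ndarray, sim_gip: np.ndarray) -> np.ndarray:
    """Eqs. (1)/(2): average the two similarities where the primary one
    (sequence or structure) is non-zero; otherwise fall back to GIP."""
    return np.where(sim_primary != 0, (sim_primary + sim_gip) / 2, sim_gip)

def binarize(sim: np.ndarray, threshold: float) -> np.ndarray:
    """Threshold the fused similarity network for the GATE model."""
    return (sim > threshold).astype(int)
```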

### GATECDA framework

Our GATECDA model, whose flowchart is depicted in Fig. 1, is based on the graph attention auto-encoder. The pipeline consists of several steps: (1) construct the circRNA and drug similarity networks; (2) apply GATE to extract vector representations of circRNAs and drugs; (3) combine the representations of circRNAs and drugs and feed them to a fully connected neural network that predicts the association score of each circRNA-drug sensitivity pair.

### Graph attention auto-encoder

The graph attention auto-encoder (GATE) is an unsupervised model for representation learning on graph-structured data. GATE reconstructs both the node attributes and the graph structure by stacking encoders and decoders. In the encoder, the node attributes are fed in as the initial node representations, and each encoder layer generates new representations by relating nodes through a self-attention mechanism [22]. Further, the encoder updates the representation of the current node with its neighbors' representations. In the decoder, the encoding process is reversed to reconstruct the initial node attributes.

In this study, we use the GATE model to extract the representations of circRNAs and drugs. Through the attention mechanism, GATE assigns a different weight to each neighbor of the current node, which helps the model obtain better node representations.

The GATE model consists of multiple encoder layers and decoder layers. In GATE, encoders and decoders have the same number of layers. The multiple encoder layers can improve the learning ability of the model and produce a better node representation. Figure 2 shows the process of GATE encoding and decoding.

Each encoder layer generates new representations for nodes by taking into account their neighbors' representations according to their relevance. Inspired by the work of Velickovic et al. [22], the GATE model employs a self-attention mechanism with parameters shared across nodes to determine the relation between a node and its neighbors. In the kth encoder layer, the correlation between node $i$ and its neighbor $j$ is calculated as follows:

$$c^{(k)}_{ij}=\mathrm{Sigmoid}\left( {V_s^{(k)}}^{T} \sigma (W^{(k)}h_i^{(k-1)}) + {V_r^{(k)}}^{T} \sigma (W^{(k)}h_j^{(k-1)}) \right)$$

(3)

Here, $W^{(k)} \in R^{d^{(k)} \times d^{(k-1)}}$, $V_s^{(k)} \in R^{d^{(k)}}$, and $V_r^{(k)} \in R^{d^{(k)}}$ are the trainable parameters of the kth encoder layer; $\sigma$ and $\mathrm{Sigmoid}$ denote an activation function and the sigmoid function, respectively.

To make the coefficients of node $i$'s neighbors comparable, we normalize them with the Softmax function, as shown in Eq. (4):

$$\alpha_{ij}^{(k)}= \frac{\exp(c_{ij}^{(k)})}{\sum_{l\in N_i}\exp(c_{il}^{(k)})}$$

(4)

where $N_i$ denotes the neighbors of node $i$, including node $i$ itself.

The node features are taken as the initial node representations, namely $h_i^{(0)}=x_i$, and the representation of node $i$ in the kth layer is then generated by Eq. (5):

$$h_i^{(k)}=\sum_{j\in N_i} \alpha_{ij}^{(k)}\, \sigma (W^{(k)}h_j^{(k-1)})$$

(5)

The output of the last encoder layer is taken as the node representations used in our model.
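A single encoder layer (Eqs. 3-5) can be sketched in NumPy as follows. The choice of `tanh` for the activation $\sigma$ is an assumption, since the text does not fix it, and the adjacency matrix is expected to contain self-loops so that $N_i$ includes $i$:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate_encoder_layer(H, adj, W, v_s, v_r):
    """One GATE encoder layer.
    H:   (n, d_in) previous-layer representations h^(k-1)
    adj: (n, n) binary adjacency with self-loops
    W:   (d_out, d_in) layer weights; v_s, v_r: (d_out,) attention vectors."""
    Z = np.tanh(H @ W.T)                                   # sigma(W h_j) for every node
    c = sigmoid((Z @ v_s)[:, None] + (Z @ v_r)[None, :])   # Eq. (3): c[i, j]
    e = np.where(adj > 0, np.exp(c), 0.0)                  # keep only neighbors N_i
    alpha = e / e.sum(axis=1, keepdims=True)               # Eq. (4): softmax over N_i
    return alpha @ Z                                       # Eq. (5): weighted aggregation
```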

GATE learns node representations in an unsupervised manner by using the same number of decoder layers as encoder layers. Each decoder layer reconstructs the representations of nodes from the representations of their neighbors according to their relevance. The normalized relevance between node $i$ and a neighbor $j$ in the kth decoder layer is calculated by Eqs. (6) and (7):

$$\hat{\alpha}_{ij}^{(k)}=\frac{\exp(\hat{c}_{ij}^{(k)})}{\sum_{l\in N_i}\exp(\hat{c}_{il}^{(k)})}$$

(6)

$$\hat{c}^{(k)}_{ij}=\mathrm{Sigmoid}\left( {\hat{v}_s^{(k)}}^{T} \sigma (\hat{W}^{(k)} \hat{h}_i^{(k)}) + {\hat{v}_r^{(k)}}^{T} \sigma (\hat{W}^{(k)}\hat{h}_j^{(k)}) \right)$$

(7)

Similar to the encoder layers, $\hat{W}^{(k)} \in R^{d^{(k-1)} \times d^{(k)}}$, $\hat{v}_s^{(k)} \in R^{d^{(k-1)}}$, and $\hat{v}_r^{(k)} \in R^{d^{(k-1)}}$ are the trainable parameters of the kth decoder layer. The input of the first decoder layer is the output of the last encoder layer, and the kth decoder layer reconstructs the node representations of layer $k-1$ according to Eq. (8):

$$\hat{h}_i^{(k-1)}=\sum_{j\in N_i} \hat{\alpha}_{ij}^{(k)}\, \sigma (\hat{W}^{(k)}\hat{h}_j^{(k)})$$

(8)

After decoding through $L$ decoder layers, the output of the last decoder layer is taken as the reconstructed node features.

The loss function consists of two parts, namely the reconstruction loss of node features and the reconstruction loss of graph structure. We combine them through the equation as follows:

$$Loss=\sum^{N}_{i=1}\left\Vert x_i-\hat{x}_i\right\Vert_2-\lambda \sum_{j\in N_i}\log\left( \frac{1}{1+\exp(-h^T_i h_j)}\right)$$

(9)

Here, $\lambda$ is a hyperparameter that balances the contribution of the graph-structure reconstruction loss; $x_i$ and $\hat{x}_i$ represent the features and the reconstructed features of node $i$, respectively, and $h_j$ is the representation of a neighbor $j$ of node $i$. High-quality node representations are obtained by minimizing this loss function.
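The loss in Eq. (9) can be sketched directly in NumPy. One reading, assumed here, is that the structure term runs over all neighboring pairs $(i, j)$ picked out by the adjacency matrix:

```python
import numpy as np

def gate_loss(X, X_hat, H, adj, lam):
    """Eq. (9): feature reconstruction loss plus a graph-structure
    term over neighboring pairs, weighted by the hyperparameter lam.
    X, X_hat: (n, d) node features and their reconstructions
    H: (n, d') final node representations; adj: (n, n) binary adjacency."""
    recon = np.linalg.norm(X - X_hat, axis=1).sum()  # sum_i ||x_i - x_hat_i||_2
    logits = H @ H.T                                 # h_i^T h_j for all pairs
    log_sig = -np.log1p(np.exp(-logits))             # log(1 / (1 + exp(-h_i^T h_j)))
    structure = (adj * log_sig).sum()                # restrict to j in N_i
    return recon - lam * structure
```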