# Effects of investor sentiment on stock volatility: new evidences from multi-source data in China’s green stock markets – Financial Innovation

Aug 23, 2022

### Variable constructions

#### Investor sentiment

Internet sentiment. BERT is a deep interactive pre-trained language model whose semantic understanding is derived from the transformer. It uses transformer encoders as feature extractors and adds position encodings so that the model can account for word order. In addition, it uses self-attention to improve the computing capability of the model and adopts the scaled dot product as the attention scoring function. The output vector sequence can be written as

$$Attention(Q,K,V) = softmax\left( \frac{Q^{T} K}{\sqrt{d_{k}}} \right)V,$$

(1)

where Q represents the query vector, K denotes the key vector, V is the value vector, \(1/\sqrt{d_{k}}\) is the scaling factor, and softmax is the normalization function. Furthermore, BERT introduces a multi-head self-attention mechanism to extract more interactive information in multiple spaces. The results of the attention function calculation are then processed by layer normalization, which is defined as follows:

$$LN(x_{i}) = \alpha \times \frac{x_{i} - \mu_{L}}{\sqrt{\sigma_{L}^{2} + \varepsilon}} + \beta,$$

(2)

where \(\mu_{L}\) denotes the mean of the net inputs \(x_{i}\) of the neurons in layer L, \(\sigma_{L}^{2}\) is the variance of the net inputs \(x_{i}\) of the neurons in layer L, and \(\alpha\) and \(\beta\) represent the parameter vectors of scaling and translation, respectively. In addition, \(\varepsilon\) is a very small constant added for numerical stability. After normalization, a feed-forward neural network composed of two fully connected layers is used for the relevant learning. BERT uses the above basic mechanism to yield a pre-trained language model through unsupervised training on massive text.
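The two building blocks in Eqs. (1) and (2) can be illustrated with a minimal pure-Python sketch. It is not BERT's implementation: it handles a single attention head, treats queries, keys, and values as lists of row vectors, and uses scalar `alpha` and `beta` in place of the parameter vectors. The function names and the default `eps` are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention, Eq. (1), for lists of row vectors."""
    d_k = len(keys[0])
    width = len(values[0])
    output = []
    for q in queries:
        # Dot product of the query with every key, scaled by 1/sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Attention output: weighted average of the value vectors.
        output.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(width)]
        )
    return output


def layer_norm(x, alpha=1.0, beta=0.0, eps=1e-5):
    """Layer normalization, Eq. (2); alpha and beta are scalars here (assumption)."""
    mu = sum(x) / len(x)
    var = sum((xi - mu) ** 2 for xi in x) / len(x)
    return [alpha * (xi - mu) / math.sqrt(var + eps) + beta for xi in x]
```

When all keys are identical, the attention weights are uniform and the output reduces to the plain average of the value vectors, which is a quick sanity check on the scoring function.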

Although BERT is a milestone in the sentiment classification of Chinese text, its performance in the financial field still needs to be improved. Therefore, Entropy Jane Technology trained the FinBERT pre-training language model based on BERT, using one million financial and economic news articles, nearly two million various research papers, company announcements, and about one million financial encyclopedia entries in 2020. We add a task-specific output layer and select 30,000 titles from Eastmoney Guba to train this output layer for the target task. The classifier labels negative sentiment as −1, neutral sentiment as 0, and positive sentiment as 1. The overall process is illustrated in Fig. 1.

Then, referring to Antweiler and Frank (2004), we construct the Internet sentiment in Eq. (3).

$$SentiIntern_{i,t} = \ln\left[ \left( 1 + M_{pos,i,t} \right)/\left( 1 + M_{neg,i,t} \right) \right],$$

(3)

where \(SentiIntern_{i,t}\) represents the Internet investor sentiment of stock i on day t, \(M_{pos,i,t}\) indicates the number of positive titles of stock i on day t, and \(M_{neg,i,t}\) represents the corresponding number of negative titles.
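As a small illustration of Eq. (3), the sketch below turns one stock-day of classifier labels into the Internet sentiment value. The function name and the input layout (a flat list of −1, 0, 1 labels) are assumptions, not part of the original study.

```python
import math


def senti_intern(labels):
    """Eq. (3): log ratio of (1 + positive titles) to (1 + negative titles).

    `labels` holds the classifier outputs for one stock on one day:
    -1 (negative), 0 (neutral), or 1 (positive).
    """
    m_pos = sum(1 for label in labels if label == 1)
    m_neg = sum(1 for label in labels if label == -1)
    return math.log((1 + m_pos) / (1 + m_neg))
```

The added 1 in the numerator and denominator keeps the measure defined on days without positive or negative titles; a day with no posts at all yields a sentiment of zero.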

Trading sentiment. To measure investor sentiment systematically and comprehensively, we select several investor sentiment proxies to synthesize the trading sentiment from multiple indicators. Following Fu et al. (2021), we employ the principal component analysis (PCA) method to construct a firm-specific trading sentiment based on three underlying indicators, including turnover rate (TURN), buy-sell imbalance (BSI), and price-earnings ratio (PE).

The TURN indicator is calculated as the share-trading volume divided by the number of outstanding shares. Baker and Wurgler (2006) argue that the turnover rate can measure investor sentiment and reflects how active market trading is. In general, a high turnover rate indicates high demand from emotional investors, which can easily destabilize stock prices (Han and Li 2017).

The BSI indicator is constructed from the imbalance between active buying and selling amounts. Kumar and Lee (2006) were the first to include BSI in the construction of retail sentiment. Since then, BSI has been widely used to construct investor sentiment (Gao and Liu 2020; Li 2021). BSI is calculated as

$$BSI_{i,t} = \frac{BV_{i,t} - SV_{i,t}}{BV_{i,t} + SV_{i,t}},$$

(4)

where \(BV_{i,t}\) is the amount of active buying of stock i in period t, and \(SV_{i,t}\) denotes the amount of active selling of stock i in period t. Specifically, a positive BSI indicates that investor sentiment is high, and a negative BSI indicates that it is low.

PE is the ratio of a stock’s price to its earnings per share. A high PE ratio partly reflects investors’ recognition of a company’s growth potential. If a stock’s PE ratio is much higher than that of its peers, it is generally believed that the company’s future earnings will grow rapidly and that investor sentiment is relatively high. As the core and most commonly used measure of enterprise valuation, the PE ratio is widely used in the construction of trading sentiment (Cheema et al. 2020).

Because these three underlying proxies may be related to investor sentiment either contemporaneously or with a lag, we first produce the lag-one terms of the sentiment indicators. We then conduct the PCA to develop a composite index of firm-specific investor sentiment based on the six indicators, which include both the contemporaneous and lag-one terms of the three underlying proxies. The correlation comparison analysis reveals that the contemporaneous terms of TURN and PE and the lag-one term of BSI rank in the top three. Thus, we apply the PCA method to these three proxies and construct the firm-specific sentiment by retaining the first two principal components, whose cumulative variance contribution rate reaches 73%, as shown in Eq. (5).

$$SentiTrade_{i,t} = 0.365\,TURN_{i,t} + 0.259\,BSI_{i,t-1} + 0.551\,PE_{i,t}$$

(5)
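The three proxies and the composite in Eqs. (4) and (5) can be sketched as follows. The function names are hypothetical, and two points are assumptions rather than statements from the text: a day with no active trading is given a BSI of zero, and the inputs to `senti_trade` are taken to be already on the scale used in the PCA (the text does not say whether the proxies were standardized first).

```python
def turnover(volume, shares_outstanding):
    """TURN: share-trading volume divided by the number of outstanding shares."""
    return volume / shares_outstanding


def buy_sell_imbalance(buy_amount, sell_amount):
    """Eq. (4): (BV - SV) / (BV + SV), bounded between -1 and 1."""
    total = buy_amount + sell_amount
    if total == 0:
        return 0.0  # assumption: no active trading is treated as neutral
    return (buy_amount - sell_amount) / total


def senti_trade(turn, bsi, pe):
    """Eq. (5) applied to aligned daily series for one stock.

    The result starts at day 1 because day 0 has no lagged BSI.
    """
    return [
        0.365 * turn[t] + 0.259 * bsi[t - 1] + 0.551 * pe[t]
        for t in range(1, len(turn))
    ]
```

Note that the output series is one element shorter than the inputs, because BSI enters with a one-day lag.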

#### Volatility and its decompositions

To measure daily volatility, we adopt the realized volatility (RV) proposed by Andersen and Bollerslev (1998), which is based on 5-min high-frequency data. Given stock i with n intraday returns on trading day t, the realized volatility is defined as the sum of the squared 5-min intraday returns:

$$RV_{i,t} = \sum\limits_{j = 1}^{n} r_{i,t(j)}^{2},$$

(6)

where \(r_{i,t(j)}\) is the logarithmic return of the j-th 5-min interval of stock i on day t, j = 1,2,…,n. RV can be considered a consistent estimate of the true volatility under the assumption that stock prices follow a continuous diffusion process. However, continuous-time financial theory posits that an arbitrage-free asset price is a semi-martingale process. That is, the price process is not necessarily continuous and may contain jumps. Therefore, Barndorff-Nielsen and Shephard (2004, 2006) proposed a non-parametric estimator called the realized bi-power variation (RBV) to filter out jump volatility, as shown in Eq. (7).

$$RBV_{i,t} = \mu_{1}^{-2} \left( 1 - 2n^{-1} \right)^{-1} \sum\limits_{j = 3}^{n} \left| r_{i,t(j)} \right| \left| r_{i,t(j-2)} \right|,$$

(7)

where \(\mu_{1}\) is a constant equal to \(\left( 2/\pi \right)^{1/2}\). Assuming that the logarithmic price process is a semi-martingale with finite jumps, the RBV converges in probability to the integrated variance. The difference between the realized volatility and the realized bi-power variation is therefore a consistent estimate of the jump volatility. In theory, the jump volatility should be non-negative, but empirically \(RV_{i,t}\) may be less than \(RBV_{i,t}\). Therefore, based on the method of Andersen et al. (2007), we define \(Jump_{i,t}\) as

$$Jump_{i,t} = \max\left\{ RV_{i,t} - RBV_{i,t}, 0 \right\}.$$

(8)
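Equations (6) to (8) translate almost directly into code. The sketch below is a minimal illustration for one stock-day; the function names are assumptions, the input is a list of 5-min prices, and at least three returns are required for Eq. (7) to be defined.

```python
import math

MU_1 = math.sqrt(2 / math.pi)  # the constant mu_1 = (2/pi)^(1/2) in Eq. (7)


def log_returns(prices):
    """Logarithmic returns between consecutive 5-min prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def realized_volatility(returns):
    """Eq. (6): sum of squared intraday returns."""
    return sum(r * r for r in returns)


def realized_bipower_variation(returns):
    """Eq. (7): scaled sum of |r_j| * |r_(j-2)| for j = 3..n (1-based)."""
    n = len(returns)
    staggered = sum(abs(returns[j]) * abs(returns[j - 2]) for j in range(2, n))
    return MU_1 ** -2 * (1 - 2 / n) ** -1 * staggered


def jump(returns):
    """Eq. (8): jump volatility, truncated at zero."""
    gap = realized_volatility(returns) - realized_bipower_variation(returns)
    return max(gap, 0.0)
```

A single large return inflates RV through its square but enters RBV only as a product with small neighbouring returns, so the truncated difference picks it up as jump volatility.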

#### Information asymmetry and control variables

Information asymmetry. The probability of informed trading (PIN) refers to the probability that a transaction comes from an informed trader with private information, and it serves as an essential indicator of the degree of information asymmetry. The higher the PIN, the more severe the information asymmetry. Because overflow problems are often encountered in the calculation of the PIN, Easley et al. (2011) developed the VPIN estimator to solve this problem. The VPIN method divides the total transaction volume of a trading day into n transaction buckets of equal volume, and the transaction volume of each bucket is denoted as V. Informed traders choose whether to buy or sell based on their private information, resulting in an imbalance between buying and selling transactions. In calculating the imbalance of each transaction bucket, a transaction is regarded as a buyer’s order if the trading amount of the present transaction is higher than that of the previous transaction. Otherwise, the transaction is regarded as a seller’s order. Following Easley et al. (2012), the series of price differences between adjacent transactions in each bucket is standardized and passed through the standard normal distribution function. We can then compute the active buying or selling volume of each transaction. Specifically, the VPIN can be computed by Eq. (9).

$$VPIN = \frac{\sum\nolimits_{\tau = 1}^{n} \left| V_{\tau}^{B} - V_{\tau}^{S} \right|}{nV}.$$

(9)

Here, n denotes the number of buckets, usually taken as 50. \(V_{\tau}^{B}\) represents the active buying volume in bucket \(\tau\), and \(V_{\tau}^{S}\) is the active selling volume in bucket \(\tau\).
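The VPIN calculation can be sketched in two steps: split volume into active buying and selling with the standard normal distribution function, then apply Eq. (9) to the per-bucket totals. This is an illustrative sketch, not the authors' code; the function names are hypothetical, and `sigma` (the standard deviation used to standardize price changes) is assumed to be estimated elsewhere.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def classify_volume(volume, price_change, sigma):
    """Split volume into (buy, sell) parts from a standardized price change."""
    buy = volume * normal_cdf(price_change / sigma)
    return buy, volume - buy


def vpin(buy_volumes, sell_volumes, bucket_volume):
    """Eq. (9): total absolute order imbalance over n equal-volume buckets."""
    n = len(buy_volumes)
    imbalance = sum(abs(b - s) for b, s in zip(buy_volumes, sell_volumes))
    return imbalance / (n * bucket_volume)
```

With a zero price change the volume is split evenly between buying and selling, so a bucket made up only of such trades contributes nothing to the imbalance.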

Control variables. Following Antweiler and Frank (2004) and Sabherwal et al. (2011), we employ stock returns (Return), firm size (Size), book-to-market ratio (BM), and the number of posts (SenNum) as the control variables. Moreover, referring to John and Li (2021), we further add the market credit spread and term spread as control variables. The credit spread is the interest rate difference between the China Securities Index (CSI) AA+ corporate bond and the government bond with a maturity of one year. The term spread is the interest rate difference between the 10-year and 1-year government bonds. Early studies reveal that stock market volatility is closely related to the weekday or calendar effect (Doyle and Chen 2009; Keef et al. 2009). We therefore control for the weekday effect by introducing the following four dummy variables, \(Tues_{t}\), \(Wed_{t}\), \(Thur_{t}\), and \(Fri_{t}\), into the regression models.

$$Tues_{t} = \left\{ \begin{array}{ll} 1, & \text{if } t \text{ is Tuesday} \\ 0, & \text{otherwise,} \end{array} \right.\; Wed_{t} = \left\{ \begin{array}{ll} 1, & \text{if } t \text{ is Wednesday} \\ 0, & \text{otherwise,} \end{array} \right.\; Thur_{t} = \left\{ \begin{array}{ll} 1, & \text{if } t \text{ is Thursday} \\ 0, & \text{otherwise,} \end{array} \right.\; Fri_{t} = \left\{ \begin{array}{ll} 1, & \text{if } t \text{ is Friday} \\ 0, & \text{otherwise.} \end{array} \right.$$

Detailed variable definitions are given in Table 1.
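The four weekday dummies can be generated from a trading date as in the short sketch below. The function name and the dictionary keys are illustrative choices; Monday is the omitted base category, so a Monday maps to all zeros.

```python
from datetime import date


def weekday_dummies(day):
    """Return the Tues, Wed, Thur, and Fri dummies for a trading date."""
    wd = day.weekday()  # Monday = 0, ..., Sunday = 6
    return {
        "Tues": int(wd == 1),
        "Wed": int(wd == 2),
        "Thur": int(wd == 3),
        "Fri": int(wd == 4),
    }
```

For example, `weekday_dummies(date(2022, 8, 23))` sets only the `Tues` entry to 1.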

### Model construction

#### Baseline model

To investigate the impact of investor sentiment on the realized volatility of green stocks, we first include the trading sentiment to conduct a preliminary study employing the following regression:

$$\begin{aligned} RV_{i,t} = & \alpha_{11} + \beta_{11} SentiTrade_{i,t-1} + \sum\limits_{m = 1}^{p} \gamma_{m1} Controls_{i,t-1} + \lambda_{11} RV_{i,t-1} \\ & \quad + \alpha_{i} + \phi_{11} Tues_{t} + \phi_{12} Wed_{t} + \phi_{13} Thur_{t} + \phi_{14} Fri_{t} + \varepsilon_{1,i,t}. \end{aligned}$$

(10)

Specifically, we use the lag-one terms of the independent variables in all regressions to mitigate endogeneity. Considering the persistence of price fluctuations, we add the lag-one term of the dependent variable as a control variable. The Internet sentiment is then added to examine its effect on realized volatility, as shown in Eq. (11).

$$\begin{aligned} RV_{i,t} = & \alpha_{12} + \beta_{12} SentiTrade_{i,t-1} + \delta_{1} SentiIntern_{i,t-1} + \sum\limits_{m = 1}^{p} \gamma_{m2} Controls_{i,t-1} + \lambda_{12} RV_{i,t-1} \\ & \quad + \alpha_{i} + \phi_{21} Tues_{t} + \phi_{22} Wed_{t} + \phi_{23} Thur_{t} + \phi_{24} Fri_{t} + \varepsilon_{2,i,t}. \end{aligned}$$

(11)
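Equations (10) and (11) are panel regressions with a stock fixed effect and lag-one regressors. The sketch below shows one way such a specification could be estimated, assuming a within (demeaning) transformation followed by ordinary least squares; the text does not state the estimator used, so this is an illustration only. The function names and the `panel` layout (stock id mapped to a time-ordered list of per-day dictionaries) are hypothetical, the weekday dummies are omitted for brevity, and the demeaned regressors are assumed to have full rank.

```python
def lagged_rows(panel, y_key, x_keys):
    """Pair y on day t with the regressors and y itself on day t-1, per stock."""
    rows = {}
    for stock, days in panel.items():
        rows[stock] = [
            (cur[y_key], [prev[k] for k in x_keys] + [prev[y_key]])
            for prev, cur in zip(days, days[1:])
        ]
    return rows


def within_demean(rows):
    """Subtract each stock's own means, which absorbs the fixed effect alpha_i."""
    ys, xs = [], []
    for obs in rows.values():
        n = len(obs)
        k = len(obs[0][1])
        y_bar = sum(y for y, _ in obs) / n
        x_bar = [sum(x[c] for _, x in obs) / n for c in range(k)]
        for y, x in obs:
            ys.append(y - y_bar)
            xs.append([xc - mc for xc, mc in zip(x, x_bar)])
    return ys, xs


def ols(ys, xs):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(xs[0])
    a = [
        [sum(x[i] * x[j] for x in xs) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(xs, ys))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]
```

A call such as `ols(*within_demean(lagged_rows(panel, "RV", ["SentiTrade"])))` would return the slope on lagged trading sentiment followed by the slope on lagged RV; the keys `"RV"` and `"SentiTrade"` are placeholders for however the data are stored.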

Under the assumption of a discontinuous diffusion process of stock prices, the realized volatility can be decomposed into continuous and jump volatilities. To further investigate whether the impact of investor sentiment on volatility is mainly attributable to continuous or jump volatility, we replace the realized volatility with continuous volatility in Eqs. (10) and (11). The specific equations are as follows:

$$\begin{aligned} RBV_{i,t} = & \alpha_{21} + \beta_{21} SentiTrade_{i,t-1} + \sum\limits_{k = 1}^{p} \gamma_{k1} Controls_{i,t-1} + \lambda_{21} RBV_{i,t-1} \\ & \quad + \alpha_{i} + \phi_{31} Tues_{t} + \phi_{32} Wed_{t} + \phi_{33} Thur_{t} + \phi_{34} Fri_{t} + \varepsilon_{3,i,t}, \end{aligned}$$

(12)

$$\begin{aligned} RBV_{i,t} = & \alpha_{22} + \beta_{22} SentiTrade_{i,t-1} + \delta_{2} SentiIntern_{i,t-1} + \sum\limits_{k = 1}^{p} \gamma_{k2} Controls_{i,t-1} + \lambda_{22} RBV_{i,t-1} \\ & \quad + \alpha_{i} + \phi_{41} Tues_{t} + \phi_{42} Wed_{t} + \phi_{43} Thur_{t} + \phi_{44} Fri_{t} + \varepsilon_{4,i,t}. \end{aligned}$$

(13)

Similarly, we examine the influence of investor sentiment on jump volatility, as shown in Eqs. (14) and (15).

$$\begin{aligned} Jump_{i,t} = & \alpha_{31} + \beta_{31} SentiTrade_{i,t-1} + \sum\limits_{k = 1}^{p} \gamma_{k1} Controls_{i,t-1} + \lambda_{31} Jump_{i,t-1} \\ & \quad + \alpha_{i} + \phi_{51} Tues_{t} + \phi_{52} Wed_{t} + \phi_{53} Thur_{t} + \phi_{54} Fri_{t} + \varepsilon_{5,i,t}, \end{aligned}$$

(14)

$$\begin{aligned} Jump_{i,t} = & \alpha_{32} + \beta_{32} SentiTrade_{i,t-1} + \delta_{3} SentiIntern_{i,t-1} + \sum\limits_{k = 1}^{p} \gamma_{k2} Controls_{i,t-1} + \lambda_{32} Jump_{i,t-1} \\ & \quad + \alpha_{i} + \phi_{61} Tues_{t} + \phi_{62} Wed_{t} + \phi_{63} Thur_{t} + \phi_{64} Fri_{t} + \varepsilon_{6,i,t}. \end{aligned}$$

(15)

#### Mediating effect model

We further verify the mediating effect of the VPIN in the influence of investor sentiment on stock volatilities. Specifically, based on Eq. (10), we construct the mediating effect model to examine the specific path of investor sentiment on volatility, as shown in Eqs. (16) and (17).

$$\begin{aligned} VPIN_{i,t} = & \omega_{11} + \xi_{11} SentiTrade_{i,t-1} + \sum\limits_{u = 1}^{p} \gamma_{u1} Controls_{i,t-1} + \varphi_{11} VPIN_{i,t-1} \\ & \quad + \omega_{i} + \psi_{11} Tues_{t} + \psi_{12} Wed_{t} + \psi_{13} Thur_{t} + \psi_{14} Fri_{t} + \varepsilon_{7,i,t}, \end{aligned}$$

(16)

$$\begin{aligned} RV_{i,t} = & \alpha_{14} + \beta_{14} SentiTrade_{i,t-1} + \theta_{1} VPIN_{i,t-1} + \sum\limits_{w = 1}^{p} \gamma_{w4} Controls_{i,t-1} + \lambda_{14} RV_{i,t-1} \\ & \quad + \alpha_{i} + \phi_{71} Tues_{t} + \phi_{72} Wed_{t} + \phi_{73} Thur_{t} + \phi_{74} Fri_{t} + \varepsilon_{8,i,t}. \end{aligned}$$

(17)

In addition, we investigate the impact of the VPIN on volatility when both Internet and trading sentiments are present. That is, we add the Internet sentiment to Eqs. (16) and (17), as shown in Eqs. (18) and (19).

$$\begin{aligned} VPIN_{i,t} = & \omega_{12} + \xi_{12} SentiTrade_{i,t-1} + \delta_{4} SentiIntern_{i,t-1} + \sum\limits_{u = 1}^{p} \gamma_{u2} Controls_{i,t-1} + \varphi_{12} VPIN_{i,t-1} \\ & \quad + \omega_{i} + \psi_{21} Tues_{t} + \psi_{22} Wed_{t} + \psi_{23} Thur_{t} + \psi_{24} Fri_{t} + \varepsilon_{9,i,t}, \end{aligned}$$

(18)

$$\begin{aligned} RV_{i,t} = & \alpha_{15} + \beta_{15} SentiTrade_{i,t-1} + \delta_{5} SentiIntern_{i,t-1} + \theta_{2} VPIN_{i,t-1} + \sum\limits_{w = 1}^{p} \gamma_{w5} Controls_{i,t-1} + \lambda_{15} RV_{i,t-1} \\ & \quad + \alpha_{i} + \phi_{81} Tues_{t} + \phi_{82} Wed_{t} + \phi_{83} Thur_{t} + \phi_{84} Fri_{t} + \varepsilon_{10,i,t}. \end{aligned}$$

(19)

Similarly, we replace the dependent variable RV in Eq. (19) with RBV and Jump in turn and conduct the mediating effect analysis on each.