# Reaching the bubble may not be enough: news media role in online political polarization – EPJ Data Science

Aug 13, 2022

### Data

The political situation for each election provides important context for the analysis. In Brazil, polls (footnote 1) taken during the election period indicated high polarization among voters. The dispute was between Jair Messias Bolsonaro, representing the possibility of a 15-year break from the ruling Workers Party (PT), and Fernando Haddad, representing the continuity of PT’s rule, after a brief period in which an elected PT president was replaced by her vice-president as a result of impeachment. The election culminated in Bolsonaro’s victory, with 55.13% of the valid votes against 44.87% for Haddad (footnote 2). In Canada, Justin Trudeau represented the Liberals, who had previously held a parliamentary majority after unseating the Conservatives in 2015. In a close election, the Liberals won 157 (39.47%) seats in parliament, while the Conservatives, led by Andrew Scheer, won 121 (31.89%) (footnote 3). The (left-wing) New Democratic Party continued to lose ground from its 2011 peak, especially in French Quebec, where the Liberals and the separatist Bloc Québécois subsequently gained ground. The 2019 election resulted in the Liberals forming a minority government. Such governments have historically been unstable, since the prime minister relies on representatives of other parties to remain in power (footnote 4). Despite the multi-party character of the Canadian political system, national politics has largely revolved around left-right differences at least since the 1980s [25].

### User polarity estimation

The next step was to estimate each user’s polarity based on the retweeted content. To do so, we used the political orientation of the hashtags that users applied in their tweets as an indication of their polarity. We identified 87,620 and 86,959 unique hashtags for the Brazilian and Canadian cases, respectively. For each of these two sets, we extracted the 100 most frequent hashtags and classified them manually according to their political orientation, using “L”, “N”, “R” and “?” to represent left-wing, neutral, right-wing and uncertain political leaning, respectively. Six volunteers (not the authors), three in each country, classified all of the top 100 hashtags independently of one another. We kept only the hashtags that all three raters classified identically, resulting in 64 and 78 out of 100 hashtags for Brazil and Canada, respectively. We used Fleiss’ kappa [26] to measure the degree of agreement between the raters, obtaining \(\kappa = 0.63\) for Brazil, which means “substantial agreement”, and \(\kappa = 0.80\) for Canada, meaning “almost perfect agreement”, according to Landis and Koch’s (1977) [27] interpretation of kappa scores.
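The agreement step can be illustrated with a short sketch. It assumes the ratings are stored as one list of labels per hashtag, with the same number of raters for every hashtag; the function names `fleiss_kappa` and `unanimous` are hypothetical and the code is not the authors’ implementation.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for `ratings`: one list of category labels per item,
    each list holding one label per rater (same number of raters per item)."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]

    # Observed agreement: mean share of agreeing rater pairs per item.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    shares = [sum(c[k] for c in counts) / (n_items * n_raters) for k in categories]
    p_expected = sum(p * p for p in shares)

    if p_expected == 1.0:  # only one category was ever used (assumed convention)
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


def unanimous(ratings_by_hashtag):
    """Keep only hashtags on which every rater chose the same label."""
    return {
        tag: labels[0]
        for tag, labels in ratings_by_hashtag.items()
        if len(set(labels)) == 1
    }
```

Only the unanimous hashtags would then serve as seeds for the later classification, which mirrors the filtering described in the text.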

To classify the rest of the hashtags at scale, we assumed that hashtags describing a common topic usually occur together in the same tweet. Accordingly, a network of hashtag co-occurrences was built for Brazil and for Canada, in which a node represents a hashtag and an edge between two nodes represents the occurrence of both in the same tweet. Standalone hashtags were eliminated. A semi-supervised machine learning algorithm [28], which uses the edge weights to compute the similarity between nodes, was applied to the network to label the unlabeled hashtags, starting from the manually classified ones. By applying this procedure, 57,487 (65.6%) and 67,639 (77.8%) hashtags were classified in the Brazilian and Canadian datasets, respectively. For the Brazilian case we obtained 35,788 (62.3%) hashtags classified as “R” (right), 643 (1.1%) as “N” (neutral), 21,039 (36.6%) as “L” (left), and 17 (0.0%) as “?” (uncertain). For the Canadian case we obtained 8963 (13.3%) as “R”, 52,416 (77.5%) as “N”, 2597 (3.8%) as “L”, and 3663 (5.4%) as “?”. To assess the consistency of this method, the same hashtag classification procedure was performed 100 times, each time randomly hiding 10% of the manually classified hashtags. The classification results were submitted to the Fleiss’ kappa assessment [26] (simulating 100 raters), yielding \(\kappa = 0.84\) for Brazil and \(\kappa = 0.84\) for Canada, meaning “almost perfect agreement” in both cases. We also considered a more aggressive strategy, hiding 20% of the manually classified hashtags, and obtained \(\kappa = 0.78\) for Brazil and \(\kappa = 0.73\) for Canada, meaning “substantial agreement” in both cases. These results indicate that the classification procedure was robust.
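The sketch below illustrates the idea of building a weighted co-occurrence network and spreading labels from seed hashtags. The semi-supervised algorithm of [28] is not specified in this section, so a simple weighted-vote label spreading is substituted here as a stand-in; the function names, the tie-breaking rule and the `max_rounds` parameter are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(tweets):
    """Build weight[a][b] = number of tweets in which hashtags a and b co-occur.
    Hashtags that only ever appear alone never enter the graph."""
    weight = defaultdict(lambda: defaultdict(int))
    for tags in tweets:
        for a, b in combinations(sorted(set(tags)), 2):
            weight[a][b] += 1
            weight[b][a] += 1
    return weight


def propagate(weight, seeds, max_rounds=50):
    """Stand-in label spreading: each unlabeled hashtag takes the label with
    the largest total edge weight among its already-labeled neighbours.
    Ties are broken alphabetically (an arbitrary, assumed rule)."""
    labels = dict(seeds)
    for _ in range(max_rounds):
        updates = {}
        for node in weight:
            if node in labels:
                continue  # seed and previously assigned labels stay fixed
            votes = defaultdict(int)
            for neighbour, w in weight[node].items():
                if neighbour in labels:
                    votes[labels[neighbour]] += w
            if votes:
                updates[node] = max(sorted(votes), key=votes.get)
        if not updates:
            break
        labels.update(updates)
    return labels
```

The robustness check described in the text could be approximated by calling `propagate` repeatedly with a random 10% or 20% of the seeds removed and comparing the resulting labels.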

Results showed an imbalance in both datasets between hashtags classified as “N” and those classified as “R” or “L”. A closer look at the Canadian dataset shows that users tend to apply neutral hashtags (for example, #cdnpoli, #elxn43, #onpoli, and #abpoli) more frequently, and to combine them with polarized hashtags (such as #TrudeauMustGo, #blackface, #ChooseForward, or #IStandWithTrudeau) that are applied in proportionally smaller numbers. This helps to explain the more prominent number of neutral hashtags in this case. In the Brazilian dataset, on the other hand, left and/or right hashtags are used more often than neutral hashtags, even when they appear together. This imbalance in both datasets was addressed by adding weights to each class during the users’ polarity estimation, presented in Equation (1). Given the low relevance of hashtags with uncertain positioning (“?”) for the analysis, only those classified as “L”, “N” and “R” were maintained. Finally, we removed all tweets from the dataset that did not contain any classified hashtag, to preserve only political domain-related tweets. This removal resulted in 4,217,070 (6.1%) and 2,304,911 (17.4%) tweets for the Brazilian and Canadian datasets, respectively.

The next step was to classify users according to their political orientation, which was performed on a weekly basis. For this, we separated each dataset into six weeks, each starting on Monday and ending on Sunday. Less active users, with fewer than five tweets per week, were removed because they did not create enough tweets to estimate their polarity. With that, we obtained 72,576 and 26,815 unique users for the Brazilian and Canadian datasets, respectively. For each of these users, a list of the hashtags used in all of their tweets for the week was created, and their polarity \(P(H)\) was calculated using Equation (1):

\[
P(H)= \frac{ \vert H_{R} \vert \times W_{R}- \vert H_{L} \vert \times W_{L}}{ \vert H_{L} \vert \times W_{L}+ \vert H_{N} \vert \times W_{N}+ \vert H_{R} \vert \times W_{R}}, \tag{1}
\]

where \(H_{L}\), \(H_{N}\) and \(H_{R}\) are the hashtag multisets (sets that allow multiple instances of each element) for classes “L”, “N” and “R”, respectively. \(W_{L}=\mathrm{avg}(|H_{N}|, |H_{R}|)/|H|\), \(W_{N}=\mathrm{avg}(|H_{R}|, |H_{L}|)/|H|\), and \(W_{R}=\mathrm{avg}(|H_{L}|, |H_{N}|)/|H|\) are the weights for classes “L”, “N” and “R”, respectively. In these equations, \(\mathrm{avg}(\cdot)\) is a function that returns the average number of hashtags in two sets, and \(H\) is a set containing all hashtags applied by a user, i.e., \(H = H_{L} \cup H_{N} \cup H_{R}\). These weights are important to mitigate the class imbalance characteristic of our datasets, and are inspired by common practice in classification scenarios with imbalanced classes [29]. The general idea is to penalize classes with a higher number of hashtags, such as class “N” in Canada and classes “R” and “L” in Brazil, and to increase the relevance of classes with fewer hashtags, such as “R” and “L” in Canada and “N” in Brazil. This is necessary to avoid a polarity estimate biased towards the hashtags that users tweeted most frequently. Without this strategy, most users would have their polarity wrongly estimated as neutral in the Canadian case. For example, a user who tweeted #ChooseForward, #ChooseForwardWithTrudeau, #cdnpoli, #elxn43, #ScheerCowardice, #onpoli, #VoteLiberal, #cdnpoli, #elxn43, #ItsOurVote, #CPC and #climatestrikecanada, which is a representative example from our dataset, would have a \(P(H)\) value of −0.1 without weights and −0.5 with weights, which better reflects the polarity of the user. Similarly, a user who applied the hashtags #cdnpoli, #elxn43, #elxn43, #RCMP, #cdnpoli, #TrudeauIsARacist and #TrudeauCorruption would have a \(P(H)\) value of +0.3 without weights and +0.7 with weights; again, the use of weights expresses the user’s polarity much better. Without weights, the opposite would happen in the Brazilian case: most users would have their \(P(H)\) values improperly skewed to the extremes.
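To make the structure of Equation (1) concrete, the sketch below evaluates it for a single multiset of hashtag labels. It takes the definitions literally and computes the weights from the same multiset as the counts. Two choices are assumptions rather than part of the source: an empty input returns 0.0, and a zero weighted denominator (which occurs under this literal reading when all hashtags fall in one class) falls back to the unweighted value. The sketch is not claimed to reproduce the worked values quoted in the text.

```python
from collections import Counter


def polarity(hashtag_labels, weighted=True):
    """Evaluate Equation (1) for a list of 'L'/'N'/'R' labels, one entry per
    hashtag use (a multiset). Weights come from the same list (literal reading)."""
    counts = Counter(hashtag_labels)
    n_l, n_n, n_r = counts["L"], counts["N"], counts["R"]
    total = n_l + n_n + n_r
    if total == 0:
        return 0.0  # assumption: no classified hashtags means neutral

    unweighted = (n_r - n_l) / total
    if not weighted:
        return unweighted

    # Each class weight is the average size of the two *other* classes over |H|.
    w_l = (n_n + n_r) / 2 / total
    w_n = (n_r + n_l) / 2 / total
    w_r = (n_l + n_n) / 2 / total

    denominator = n_l * w_l + n_n * w_n + n_r * w_r
    if denominator == 0:
        return unweighted  # assumption: single-class users keep the raw value
    return (n_r * w_r - n_l * w_l) / denominator
```

For a hypothetical user with eight neutral and two right-wing hashtag uses, the unweighted value is 0.2 and the weighted value is 0.5, which shows how the weights reduce the pull of the frequent neutral class.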

The result of \(P(H)\) is a value in the continuous range \([-1.0; +1.0]\). Positive values represent a right-wing political orientation, negative values represent a left-wing orientation, and values close to 0.0 represent a neutral orientation. Based on this result, we labelled users according to their political orientation: the first third of values on the \(P(H)\) scale represents the left-wing (L) users, with \(P(H) \in [-1; -1/3[\), the second third the neutral users (N), with \(P(H) \in [-1/3; 1/3]\), and the last third the right-wing users (R), with \(P(H) \in ]1/3; 1]\). It is important to note that the use of the terms left (L), right (R) and neutral (N) to denote the political orientation of hashtags and users was a simplification used to make it possible to compare the two political situations through a common and simplified categorization [25].
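The labelling rule amounts to two threshold comparisons. A minimal sketch follows, with a hypothetical function name; the boundary values −1/3 and +1/3 are assigned to the neutral class, as the closed middle interval in the text indicates.

```python
def orientation(p):
    """Map a polarity value in [-1, 1] to 'L', 'N' or 'R' using thirds."""
    if not -1.0 <= p <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if p < -1 / 3:
        return "L"
    if p > 1 / 3:
        return "R"
    return "N"  # closed interval [-1/3, 1/3]
```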

### Detecting bubble reachers

After identifying all users’ political orientation, a network was created for each week, connecting one user to another through retweets. Each network was represented as a weighted undirected graph, in which each user is a node and each retweet relationship is an edge linking the user who was retweeted and the user who retweeted. The network is treated as undirected because the incoming activity of any individual node (how much they retweet) is dwarfed by the outgoing activity of popular nodes (how much they get retweeted), so that little is lost by ignoring edge direction. The edge weight represents the number of retweets between the two users. Self-loops and isolated nodes were eliminated.
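The network construction can be sketched as follows, assuming one week of retweets is available as pairs of (retweeted user, retweeting user). The function name and the input layout are hypothetical.

```python
from collections import defaultdict


def retweet_graph(retweets):
    """Build a weighted undirected graph as graph[u][v] = number of retweets
    between u and v in either direction. Self-loops are dropped; a user who
    only self-retweets never gets an edge and is therefore absent."""
    weight = defaultdict(lambda: defaultdict(int))
    for retweeted, retweeter in retweets:
        if retweeted == retweeter:
            continue  # skip self-loops
        weight[retweeted][retweeter] += 1
        weight[retweeter][retweeted] += 1
    return {user: dict(neighbours) for user, neighbours in weight.items()}
```

Because nodes are created only when an edge is added, isolated nodes do not need a separate removal step in this representation.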

To achieve the main goal of our study, we needed to detect highly central nodes linked to both sides of the network, i.e., we needed to detect brokers (bubble reachers) on the network capable of reaching users with diverging political views. For this task, betweenness centrality [30] could be a metric to be applied as a starting point, because it ranks nodes by their capacity of acting as information bridges between more pairs of nodes in the network, relying on the number of shortest paths that pass through each node. However, in a polarized situation, we can have nodes with a high betweenness degree within and between polarized groups. That is, highly influential nodes inside bubbles could be ranked the same as highly influential nodes between bubbles. The former are called “local bridges” and the latter “global bridges” [31]. We were interested only in the global bridges, nodes that most of the time act as brokers, by linking both sides of the network.

Considering that betweenness centrality was not an appropriate metric to distinguish local bridges from global bridges, we identified in the literature a relatively new centrality metric called “bridgeness” [31]. This metric is based on the betweenness algorithm, but while computing the number of shortest paths between all pairs of nodes that pass through a source node, it does not include the shortest paths that either start or end in the immediate neighborhood of the source node [31]. Even though the bridgeness algorithm emphasizes global bridges better than local ones, it raises a problem when the considered network has a small average path length, which is precisely the case for the retweet networks we are analyzing: it could disregard some important short paths that either start or end in the neighborhood of a node. Considering that we already know the political orientation of all users in our dataset, and that users with similar orientations tend to form tightly linked groups on the network, we used the political orientation as a filtering criterion for the shortest paths.

Algorithm 1 presents this proposed process, which we called the “intergroup bridging” algorithm. This algorithm builds on the betweenness and bridgeness algorithms. However, while computing the number of shortest paths between all pairs of nodes that pass through a node, it excludes the shortest paths that either start or end in the immediate neighborhood of that node only if the neighbor in question has the same class (political orientation, in our case) as the node. This restriction to neighbors of a different class is the key twist added to the bridgeness algorithm. Put simply, the new algorithm measures a node’s capacity to disseminate information to distinct groups on the network that have a different political orientation from its own. To construct the intergroup bridging algorithm, we relied on the Brandes “faster algorithm” [32]. The proposed changes are presented in line 35: in the count of the shortest paths that pass through node \(w\), it is verified that the considered path is not a self-loop, with \(w \neq s\) (a check that already exists in the original Brandes’ algorithm), and that either \(s\) is not in the neighborhood of \(w\), with \(A[w, s] == 0\), or \(s\) is in the neighborhood of \(w\), with \(A[w,s] >= 1\), and \(s\) has a different political orientation from \(w\), with \(O[w] \neq O[s]\). We refer to the measure created by this algorithm as “intergroup bridging centrality.” Note that this metric is also applicable to other problems with the same characteristics.
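Since Algorithm 1 itself is not reproduced in this section, the following is only a sketch of the accumulation condition as described in the prose, grafted onto a standard Brandes-style computation. It uses unweighted breadth-first shortest paths and ignores retweet counts, which is a simplifying assumption; `adj` (node to set of neighbours) and `orientation` (node to class label) are hypothetical input names.

```python
from collections import deque


def intergroup_bridging(adj, orientation):
    """Brandes-style accumulation in which the dependency of source s on node w
    is counted only if s is not a neighbour of w, or s is a neighbour of w with
    a different orientation. Unweighted paths (assumption); ordered pairs are
    counted, so undirected scores are twice the unordered-pair count."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Single-source shortest paths by breadth-first search.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Back-propagate dependencies, applying the intergroup condition.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                s_is_neighbour = s in adj[w]
                if not s_is_neighbour or orientation[w] != orientation[s]:
                    score[w] += delta[w]
    return score
```

On a three-node path a, b, c, the middle node keeps its full score when both neighbours have a different orientation from it, and loses the contribution of any neighbour that shares its orientation.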

To illustrate the difference between centrality metrics, we used the same synthetic network evaluated by Jensen et al. (2016) [31] to compare betweenness and bridgeness, adding a label to each node that represents its political orientation (L, N or R) so that the intergroup bridging metric could be computed. Figure 1(a) presents this network, including the computed values for all metrics, with nodes colour-coded according to political orientation (L in red, N dark-coloured, and R in blue). Figure 1(b) presents two scatter plots, one relating intergroup bridging to betweenness and the other relating intergroup bridging to bridgeness, in which the colors reflect the user’s political orientation and the size of each point reflects its degree (the bigger, the higher). All values were normalized to the \([0.0;1.0]\) interval for each metric to allow a fair comparison. In this figure, it is possible to observe that the intergroup bridging metric was more effective in detecting nodes that bridge distinct groups on the network with a different political orientation from their own, represented by nodes A and B. To complement this analysis, Fig. 2 shows centrality values for the 25 users with the highest betweenness values in Brazil in week 4. These values were normalized using a min-max strategy for each metric. The user’s political orientation (L, N, or R) is presented next to their name. The values of bridgeness and intergroup bridging centrality follow a similar pattern for most users in Brazil, with the former slightly higher than the latter, showing that the two metrics capture similar information in most cases. However, they differ considerably in specific cases, especially when important nodes reach different sides of the spectrum in a more balanced way, as is the case for @UOLNoticias and @g1, two Brazilian news media profiles.

This result can be difficult to interpret without familiarity with the network structure; thus, Fig. 3 compares the same metrics on a zoomed example of the same network analyzed above. Nodes are sized by their respective centrality measure. The left-wing (red) and right-wing (blue) groups appear on the left and right sides of each panel, and a small number of nodes (dark-coloured) between them link these groups. In the first representation (betweenness centrality), two large nodes appear on the left-wing side of the figure. These are the accounts @Haddad_Fernando and @ManuelaDavila, representing Fernando Haddad and Manuela Davila, candidates for the presidency and vice-presidency, and they were ranked higher than the bridging nodes @UOLNoticias and @G1. In the second representation (bridgeness centrality), as expected, nodes from polarized groups were not highlighted. Rather, @Trump_The_Robot, a spamming account banned from Twitter after data collection, was ranked higher than @UOLNoticias and @G1. Finally, in the third representation (intergroup bridging centrality), nodes from polarized groups were again not highlighted. However, in contrast to bridgeness centrality, @UOLNoticias ranked higher than @Trump_The_Robot and @G1. This same pattern prevailed in other weeks. For Canada (not shown due to lack of space), intergroup bridging also helps to highlight important nodes that receive the attention of users from different sides of the political spectrum. These findings regarding the effectiveness of the intergroup bridging centrality algorithm speak to our first research question.

### Domain, content and topic polarity estimation

The next step was to extract entities (i.e., domain, content, and topic) related to the links to external sites present in tweets made by bubble reachers. Recall that content refers to news represented by its URL, domain refers to the news website domain, and the topic refers to the latent topic in the content extracted using standard automated processes. The process for extracting and estimating the polarity of these entities is presented in this section.

Having cleaned the data, we extracted topics from the pre-processed textual content. For this task, we applied the Latent Dirichlet Allocation (LDA) [33] algorithm using the Gensim library implementation. This algorithm identifies topics in a set of texts, considering that each text has a mixture of topics [33]. The number of topics needs to be chosen manually for the LDA algorithm. Therefore, multiple models were created for each dataset, with the number of topics ranging from 1 to 50. Within this range, we selected the model whose number of topics yielded the highest degree of semantic similarity between texts according to the coherence metric generated with the \(C_{v}\) model (footnote 8), which is based on the indirect cosine similarity of the most representative words from each topic across all documents in the dataset. In all cases, the same seed was used so that the topic identification results could be replicated. Using this method, we found 48 topics in Brazil and 26 in Canada. After extracting the topics, we analyzed the topic dominance of each content and found that 87% and 68% of contents in the Brazilian and Canadian datasets, respectively, were dominated by only one topic, with a dominance of at least 80% over other topics. Considering this result, we extracted the dominant topic of each content and checked the mean number of contents tied to each dominant topic. In the Brazilian case, we found a mean of 6.5 contents per dominant topic, with a standard deviation of 2.5, and, in the Canadian case, a mean of 15.0 contents per dominant topic, with a standard deviation of 4.1. These results indicate that most topics were well-defined (less nebulous) and evenly distributed between contents. Finally, the authors manually evaluated all topics identified and reached a consensus on whether the topics were closely related to the respective political situations.
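The dominance check described above can be sketched independently of the topic model. The code assumes each content is represented by a list of topic probabilities (for instance, as obtained from a fitted LDA model) and reads “a dominance of at least 80%” as the largest probability being at least 0.8. That reading, the function name, and the use of the population standard deviation are assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def dominance_summary(doc_topics, threshold=0.8):
    """Return (share of contents dominated by one topic, mean contents per
    dominant topic, standard deviation of contents per dominant topic)."""
    # Index of the most probable topic for each content (first index on ties).
    dominant = [max(range(len(p)), key=p.__getitem__) for p in doc_topics]
    share = sum(1 for p in doc_topics if max(p) >= threshold) / len(doc_topics)
    sizes = list(Counter(dominant).values())
    return share, mean(sizes), pstdev(sizes)
```

Topics that are never dominant for any content do not appear in the per-topic counts, so the mean is taken over topics that dominate at least one content.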

The last step was to estimate the polarity of domains, content, and topics based on the polarity \(P(H)\) of the users who retweeted a tweet. For this task, a new metric called relative polarity \(RP(H)\) was created, which is calculated as follows: (1) for each entity (i.e., domain, content or topic), a list with 21 positions was created, where each cell holds the number of retweets that the entity received from users in the polarity bins represented by the set \(\{ -1.0, -0.9, \ldots, 0.0, \ldots, +0.9, +1.0 \}\); (2) the entities were arranged in a matrix, in which each row represents an entity and each column one of the 21 polarity bins; (3) to avoid data imbalance across bins in the overall dataset, each cell of the matrix was divided by the maximum value of its column; (4) for each entity (row), the values were normalized using a min-max strategy, placing them in the \([0.0, 1.0]\) interval; (5) for each entity (row), we summed each cell value multiplied by its respective polarity, so that the first cell value was multiplied by −1.0, the second by −0.9, the third by −0.8, and so on, until +1.0, and cells whose value was equal to zero or that represented the polarity 0.0 were disregarded and not counted; (6) finally, the sum was divided by the number of considered cells, which became the \(RP(H)\) value for the entity. The resulting value of the metric \(RP(H)\) was interpreted in the same way as the metric \(P(H)\), presented in Sect. 3.2.
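The six steps above can be sketched as follows, starting from the count matrix of step (2). How a user’s \(P(H)\) is assigned to one of the 21 bins is not stated in this section, so the sketch takes the binned counts as input. The handling of all-zero columns, of rows with identical values, and of rows with no considered cells (all mapped to 0.0) is an assumption, as is the function name.

```python
# Polarity value of each of the 21 bins: -1.0, -0.9, ..., 0.0, ..., +1.0.
BINS = [round(-1.0 + 0.1 * i, 1) for i in range(21)]


def relative_polarity(counts):
    """counts: one row per entity, each row holding 21 retweet counts
    (one per polarity bin). Returns one RP value per entity."""
    n_cols = len(BINS)

    # Step 3: divide each cell by its column maximum.
    col_max = [max(row[j] for row in counts) for j in range(n_cols)]
    scaled = [
        [row[j] / col_max[j] if col_max[j] else 0.0 for j in range(n_cols)]
        for row in counts
    ]

    result = []
    for row in scaled:
        # Step 4: min-max normalization within the row.
        lo, hi = min(row), max(row)
        if hi > lo:
            norm = [(x - lo) / (hi - lo) for x in row]
        else:
            norm = [0.0] * n_cols  # assumption: flat rows carry no signal

        # Step 5: polarity-weighted sum, skipping zero cells and the 0.0 bin.
        terms = [x * b for x, b in zip(norm, BINS) if x != 0 and b != 0]

        # Step 6: average over the considered cells.
        result.append(sum(terms) / len(terms) if terms else 0.0)
    return result
```

An entity retweeted only by users in the −1.0 bin ends up at −1.0, one retweeted only from the +1.0 bin ends up at +1.0, and one whose column-normalized counts are equal at both extremes ends up at 0.0.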