# Networks of international football: communities, evolution and globalization of the game – Applied Network Science

#### ByYang Li and Gonzalo Mateos

Aug 22, 2022

In this section, we first briefly review the theoretical background of community detection. Then, the existence of communities within the global football network is verified by checking the extent to which Granovetter’s strength of weak ties theory holds in the constructed football graphs. Descriptive graph measures are used to quantify the structural properties of the football networks, which further reveal the dynamics of the network evolution. In the end, we advocate a graph similarity measure that comprehensively integrate various graph properties, further enabling the identification of several temporal states that mark specific development stages of the football history. Via thorough comparison with social history, we manage to determine the great correspondence between the football development stages and significant events occurred in history.

### Community detection

One of the most important questions in social network analysis is the identification of “communities”, which are loosely defined as collections of individuals who interact unusually frequently (Tantipathananandh et al. 2007). Community detection aims to detect the community structure inside graphs, to identify graph modules and possibly, their hierarchical organization, by using only the information encoded in the graph topology (Fortunato 2010). Up to now, abundant methods have been proposed for community detection, and most methods can be categorized into traditional methods, modularity-based methods and others; see a thorough review of available algorithms in (Fortunato 2010).

The well-known Girvan and Newman method (Girvan and Newman 2002; Newman and Girvan 2004) gives a new perspective for community detection by introducing the concept of edge betweenness. Communities are detected by sequentially removing the most important edges in the network. The algorithm also introduces the term of modularity, which serves as a criterion for measuring the quality of the division of networks. The basic idea is to maximize the modularity (Newman 2006) of the network

begin{aligned} Q = frac{1}{2|E|}displaystyle sum _{ij}left[ A_{ij}-frac{k_ik_j}{2|E|}right] frac{s_is_j+1}{2}, end{aligned}

(1)

where |E| is the total number of edges, (A_{ij}) is the entry of the adjacency matrix on the ith row and jth column that connects node i and j. (k_i,k_j) are the degree of node ij respectively, and (s_i,s_j) are the community assignment for node i and j. When node i and j are in the same community, (s_is_j = 1), otherwise (s_is_j = -1).

Based on modularity optimization, a whole new set of methods has been proposed. Two advanced approaches were brought up later to speed up the detection process, often referred to as the Fast Newman’s algorithm (Newman 2004) and Louvain algorithm (Blondel et al. 2008). In (Blondel et al. 2008), the algorithm first looks for communities in a local neighborhood of the node. Next, each identified community is aggregated into a new node, adding up to a new network building upon the previous one. Optimize modularity on this secondary network and repeat the steps until a maximum modularity is obtained. This method is among the fastest community detection methods. Consequently, it is implemented in this work for community detection on football networks.

### Strength of weak ties in football networks

The natural property of the network structure reflects its capability to bridge the local and the global components. Complex networks often optimize the tie strengths (connection between nodes) to maximize the overall flow in the network (Goh et al. 2001; Maritan et al. 1996). The weak tie hypothesis (Granovetter 1995; Csermely 2006) emphasizes the importance of weak ties in connecting communities. Connections with high tie strength are more likely to be structurally-embedded within communities, whereas connections with low tie strength correlate with long-range edges joining communities.

To verify the weak tie hypothesis and identify the intrinsic community structures of the football network, we extract a single graph including all the football games spanning from 1995 to 2015, and use participant teams as nodes and games as edges. In this graph, the numeric tie strength (i.e. edge weight) between two nodes is quantified by the total number of football games played between them. Additionally, follow the definition of the neighborhood overlap of an edge (e_{ij} in E) in (Onnela et al. 2007)

begin{aligned} O_{ij}=frac{|n(i)cap n(j)|}{|n(i)cup n(j)|}=frac{n_{ij}}{(k_i-1+k_j-1-n_{ij})},n(i):={j in V: (i,j) in E}, end{aligned}

(2)

where n(i) is the one-hop neighborhood of the node i. (n_{ij}) is the number of common neighbors shared by node i and node j, and (k_i,k_j) denote the degrees of node ij, respectively. Edges with low overlap are related with two end nodes that do not share many common neighbors, and such edges are more likely to exist between nodes in different communities.

Figure 3a and b show the network from year 1995 to 2015. The edge colors represent the tie strengths (edge weights) in Fig. 3a and edge overlap (O_{ij}) in Fig. 3b, respectively. The colors of the nodes correspond to the football confederations they belong to. From the figures, it is clear that edges (in green) with low tie strengths and low overlap are mostly between confederations, while edges (in red) with high tie strengths and high overlap are mostly within confederations. To quantitatively illustrate this property, from all the 5105 edges, edges with the highest 1000 tie strength and edges with the lowest 1000 tie strength are extracted. In each group of 1000 edges, the fraction of edges connecting countries in different confederations is computed. The same procedure is also applied for edge overlap. Table 5 presents the results, which show that edges with high tie strength or high overlap are very unlikely to exist between confederations, while edges with low tie strength or low overlap are more likely to connect countries in different confederations. This result matches the weak tie hypothesis discussed earlier in this section and the visualization in Fig. 3.

In Fig. 4, we use the tie strength and edge overlap as the (xy) coordinates and plot all the edges between 1995 and 2015. In Fig. 4a, edges representing World Cup games are marked as red circles. Most red circles gather in the lower left corner, which indicates that most World Cup games have low tie strength and low overlap. As most World Cup games are played between countries from different continents, this observation again verifies the weak tie hypothesis in the football network where edges with low tie strength and low overlap are most likely between communities. Figure 4b distinguishes games within and between confederations. Red circles that represent games between confederations are located near the origin, validating the existence of weak tie hypothesis in the football network. Figure 4b also provides a visual correspondence to Table 5, showing that most edges with low tie strength and low overlap exist between confederations.

Figures 3 and 4 attest the weak tie hypothesis in the football network. As demonstrated in (Granovetter 1995), most people know about their current jobs from an acquaintance instead of a friend. This fact reveals the role of weak ties in social cohesion and the vital importance of weak ties in message passing within social networks. Similarly, in the football network, edges with low tie strength are between countries with few games played between them. Edges with low overlap indicate that two countries do not share many common neighbors, which means there are few countries that these two countries have both played against. As a result, edges with either low tie strength or low overlap contain vital information about the structure of the football network, and serve as bridges between continents and confederations that contribute to integrate individual football societies into a complete, globalized football network.

To better reveal the community structures of the network using the concepts of tie strength and the edge overlap, edge removal was carried out by removing the strongest edge one by one. The relative size of the giant component (a connected subset of vertices whose size scales extensively (Newman 2001)) was computed to check the impact of the removal of each edge. Same procedure is repeated for removal of weakest edges as well. Figure 5a and b show the size change of the giant component as edge removal is progressing. From the image, we can see that by progressively removing the edge with either the lowest tie strength or the lowest edge overlap , the size of the giant component shows discontinuity and gaps between points, indicating a sudden disintegration of the network. This means that removing edges with low strengths or overlap would lead to a breakdown of the original network and the emergence of multiple smaller communities. On the other hand, removing edges with high strengths and overlap gradually shrinks the whole network and does not in fact break the network apart.

In this section, the weak tie hypothesis has been validated, and the underlying community structures of the football network are revealed by edge removal. The next step is to formally detect these existing communities in the football network via community detection.

### Community structure of static football network

By using the complete data set from year 1872 to year 2016, we constructed a single representative graph of the football network. In this graph, 238 countries involved in football history are included as 238 nodes, and all the 39052 matches are represented by the edges. A (238 times 238) adjacency matrix shown in Fig. 6a was built for this network, whose entries are the number of matches played between two countries.

Applying the Louvain community detection method on this network gives 6 detected communities. We rearrange the rows and columns in Fig. 6a based on the community assignment of each node to ensure that nodes of the same community are next to each other in the reformatted adjacency matrix. The new adjacency matrix exhibits a block-wise diagonal format as shown in Fig. 6b, with each block corresponding to one community. By plotting each community on the world map with different colors in Fig. 7, we see a clear correspondence between the detected communities and the football confederations presented in Table 1.

Although Fig. 7 shows a nice view of the structure of the football network, there exist a few exceptions. For example, Australia joined Asian Football Confederation in 2006 but is still included in the community corresponding to Oceania Football Confederation in Fig. 7. In addition, several countries such as Kazakhstan and Israel left AFC to join UEFA. Such transitions could not be revealed in a single network. Since more and more football matches take place in recent decades, it is reasonable to assume that football networks in earlier years are masked by the crowded networks of recent years where there are much more nodes and edges. The static football network is unable to reveal transitions and changes in the football society. In order to gain a dynamic view of the evolution of the football network, in the next section, we dissect the football history over time and dig into the dynamic properties of the football networks.

### Dynamics of the football network

In this section, we focus on the dynamics of football networks with the aim to unveil the evolution and globalization of football society.

#### Descriptive statistics of dynamic networks

To obtain the temporal dynamics of the football networks, we extract all the football match records from year 1901 to year 2010, and group them into 11 decades to generate one representative football network for each decade. We compute the number of games played either within each football confederation or between confederations per decade. The intention is to distill appealing information, such as which confederations dominate the football world, which confederations experience sudden prosperity or stasis, etc. It would also be interesting to correspond the observations with specific historical events. For example, we would expect to see a significant decrease in the amount of games played in UEFA due to the war, and a sudden spike in the 1950s for CAF as Africa officially enters the football world.

Different football confederations have different development paths. Some entered the football world early while others had a late start. Figure 8 shows the number of football games played within and between confederations. In Fig. 8a, it is clear that the dominant confederation is UEFA shown as the red line with the most games played. It suffered a severe drop down in the number of games in the decade of 1941-1950 due to the second World War. CONMEBOL shown as the sky blue diamond, as the first established confederation, does not experience much interruption and shows a steady growth. AFC and CAF do not have many football games until the 1950s, the decade in which both confederations were established. The first Asian Cup and the first African Cup were also held in that decade.

In addition, it is worth noticing that the 1990s witnesses great increases in number of games played in multiple confederations. Such increase makes sense if we look into football history for reference. The 1998 World Cup grew from 24 teams to 32 teams and allowed more teams from Africa, Asian and North America to participate. This change could significantly increase the eagerness of countries in these areas to join football and also enhance the competition. More friendly games and qualification games would be played within confederations.

Figure 8b shows the interaction between confederations. The communication between confederations is basically growing as more games are played in recent decades. Exception such as the number of games between AFC and OFC, shown as the sky blue line with diamonds, can be explained as Australia left OFC and joined AFC in 2006. Consequently, the original between-confederation edges between Asian countries and Australia now belong to the within-confederation edges of AFC. The line with the blue star shows a significant increase of football games between AFC and UEFA starting from the 1990s. This change is strongly related to the ambition of Asian countries in developing their football and the growing economy of Asia where money are spent to invite European teams for friendly games.

Other graph measures can also be useful to capture the dynamics of networks. In this work, we explore the measure of global efficiency (Latora and Marchiori 2001) which is the average of inverse shortest path length. The average efficiency of a network G is defined as:

begin{aligned} E(G) = frac{1}{n(n-1)}displaystyle sum _{i<j in G}frac{1}{d_{ij}}, end{aligned}

where n denotes the total number of nodes and (d_{ij}) denotes the length of the shortest path between node i and node j. The global efficiency is defined as:

begin{aligned} E_mathrm{{global}}(G) = frac{E(G)}{E(G^mathrm{{ideal}})} end{aligned}

where (G^{rm{ideal}}) is the graph with n nodes and all possible edges are present. Global efficiency serves as a quantitative measure of the average distance it takes for a node to reach another node. Networks with high global efficiency should have more edges thus connections between nodes are efficient.

Figure 9 shows the computed global efficiency for the football networks constructed for each decade. An obvious valley appears in the 1940s when the WWII broke out. This finding matches the results in Fig. 8. During the wartime, there were fewer football games thus most connections between the nodes were cut off, and the global efficiency of the network is severely compromised.

So far, we have looked at several descriptive statistics of the dynamic in the football networks. In Fig. 7, we observe the correspondence between football communities and real-world football confederations. However, football networks do not always appear in such form as different confederation were established at different times in history. In addition, it is practical to assume that communities in early decades are more localized in certain regions, while in recent decades communities are much more spread out and nodes in the same community may have huge geographical distance in between. Thus, it is reasonable to assume that football networks maintain different community structures in different decades.

#### Community structure of dynamic networks

To reveal the dynamics of the community structure of football networks, we applied the Louvain community detection algorithm on the 11 networks for the 11 decades from year 1901 to year 2010. Figures 10 and  11 give the adjacency matrices and visualization of the networks for 4 example decades. Both figures show a clear trend of the community evolution of the football network. Starting from early decades, the communities are quite local, and the correspondence between the identified communities and confederations was not clear. As more countries join the football society, the communities start to grow with more nodes. The boundaries between communities also become more apparent as shown in Figs. 10c and 11a. In these decades, the communities are slowly transforming their structures, sharing more and more similarity with the actual football confederations. Figure 11c and d show the football network of the last decade, in which the community assignment for each node is basically the same with the actual confederation affiliation of each country. From these figures, a clear evolution path of the football network is unveiled, and the temporal features beneath such evolving community structures definitely worth more in-depth research.

#### Temporal states extraction

In the previous section, we brought up the assumption that different stages exist during the evolution of football. It would be helpful to identify individual states in football history which represent different evolution stages. As stated in multiple literatures (Zager and Verghese 2008; Papadimitriou et al. 2010; Koutra et al. 2013), graphs belonging to the same state or same cluster shall exhibit high similarity. As a result, a reasonable way to find temporal states in football history is to first calculate the similarity between football networks per year, and gather the graphs with high similarities into one group.

Multiple literatures (Bunke 2000; Macindoe and Richards 2010; Bunke et al. 2007; Wilson and Zhu 2008) have discussed the calculation of graph similarity. However, each method only consider one aspect of the graph features and may lose other useful information. In (Li et al. 2011), the authors included 20 features regarding graph properties to generate a feature vector per graph. After normalizing the feature vectors, they fed them into a SVM for graph classification. The mean of the node degrees and the mean of node clustering coefficients are combined with other global measures such as the global efficiency in the final feature vectors. This procedure ignores the node correspondence between graphs, and may cause information loss regarding the different degree distribution between graphs. Also, combining all the measures into a singe vector makes it vague to determine the contribution from each measure. In order to preserve the node correspondence and take advantage of the graph-level measures, we bring up the following framework to combine different categories of graph measures for the computation of graph similarities. We continue to use the match records from year 1901 to year 2010, and construct 110 football networks in total, one for each year. The graph measures are categorized as the following 3 types.

(1) Node-level: degree, clustering coefficient (Watts and Strogatz 1998), closeness (Freeman 1978), local efficiency (Latora and Marchiori 2001).

• For each graph with N nodes, compute each of the above measures and generate a (N times 1) vector for each node-level measure;

• Concatenate the vectors for all the 110 networks into a (110 times N) matrix;

• Calculate the correlation coefficient between matrix rows;

• Transform the correlation coefficient (c_{ij}) into similarity measure to constrain the value into [0,1];

begin{aligned} s_{ij} = frac{c_{ij}+1}{2},i,j = 1,…,110 end{aligned}

• Generate a (110 times 110) similarity matrix for each node-level measure.

(2) Graph-level: number of nodes, number of edges, average path length, global efficiency (Latora and Marchiori 2001), diameter, radius, graph energy (Gutman 1978), link density (Black 2008) and transitivity (Newman 2003).

• Calculate the above measures and construct a (9 times 1) feature vector per graph;

• For all the 110 graphs, construct a (110 times 9) feature matrix;

• Normalize the columns with z-normalization;

• Generate a (110 times 110) similarity matrix based on the similarity (defined above) between rows.

(3) Structure-level: vertex-edge-overlap (VEO) (Papadimitriou et al. 2010). The vertex-edge-overlap similarity is defined as

begin{aligned} mathrm{{sim}}_mathrm{{VEO}}(G_1,G_2)=2times frac{E_1cap E_2+V_1cap V_2}{E_1+E_2+V_1+V_2} end{aligned}

where (G_1 = (V_1,E_1),G_2 = (V_2,E_2)). The vertex-edge-overlap measures the structural similarity between graphs. The VEO is computed between every pair of graphs of the 110 graphs, and a (110 times 110) similarity matrix is generated.

After the three similarity matrices are obtained, we further calculate the average of them as the final similarity matrix. Figure 12 presents the exact pipeline for the above procedure to identify temporal states in football history. This framework takes into consideration the node-level graph measures to preserve the node correspondence between graphs, the global graph measures to consider overall graph properties, and the structural properties (nodes and edges) to quantify the topological similarity between graphs. In Fig. 12, a clear dissection of the original complete network in the top left corner is presented. Three levels of similarity calculation are carried out and the results shall be discussed soon in later sections.

Figure 13 presents the graph similarity matrices obtained based on node-level graph measures (degree, closeness, clustering coefficient, local efficiency). In Fig. 13, we can see a clear boundary before and after the 1940s due to the second World War, and along the diagonal several blocks could be identified leading to potential temporal states. Figure 14a shows the graph similarity matrix based on global measures. From the image, we can roughly identify two clusters with a blurry boundary around 1950 to 1960. These years could be identified as a transition stage from earlier stage when the football society just started to grow, to current modern stage with established confederations. Figure 14b shows the vertex-edge-overlap similarity matrix, which exhibits a similar pattern with Fig. 13.

Figure 15 presents the ultimate graph similarity matrix as the average of all the similarity matrices obtained above. This matrix combines all three levels of similarity, taking features of the network from all aspects into account. We can see clear partitions of the 110 years in football history. If we go along the diagonal line, several individual states could be visually identified. We further carry out a community detection procedure on the similarity matrix of Fig. 15, and partition all the 110 years into community of years, enabling the following temporal states to emerge.

• 1901-1908: Start of the history In early years, only a few countries were playing football, and they played mostly against each other. Football networks in these years share high similarity (see the bright red block on the upper left corner) as they consists of similar nodes and roughly identical edges.

• 1909-1913: Embryonic form of globalization In 1908, the first official football tournament was played at the Summer Olympics in London. Most participant countries were from Europe, yet it was the first time that football appeared as an international sport.

• 1914-1918: WWI The first World War broke out in July 1914 and ended in November 1918. During the war, football in Europe was severely impacted. Fewer international football games were played and some football players even joined the army at that time. Meanwhile, other areas such as South America was less affected. CONMEBOL was founded at this time, and the first Copa América was held in 1916. This explains the high similarity between football networks in these years as shown in Fig. 15.

• 1919-1938: Rebuilding After the WWI, peace again returned Europe and football, as the most popular sport there, got prosperous one more time in Europe. Football networks in this stage have high similarities, showing a steady and healthy growth of the game of football. In 1930, Uruguay held the first World Cup with 13 teams. Another 2 World Cups were held in 1934 and 1938 in Italy and France, respectively. Football in this stage showed an increase in the total number of games played, total number of nations participated, and the diversity of the participant nations. Although most nations playing football were from Europe and South America, several Asian and African countries, such as China, Egypt, Palestine, also joined the football world in this period.

• 1939-1945: WWII The World War II, from 1939 to 1945, was a disaster to the whole world. In Fig. 15, we can see a blue cross in this time period as the football networks in these years are not at all similar with the rest of the networks. World Cup was forced to stop due to the war. Countries focused merely on how to survive instead of playing football. However, football was not completely forgotten, as in some neutral nations football was still quite popular. Moreover, football was very popular among soldiers and even inside prisoner-of-war camps.

• 1946-1959: Post-war recovery After the second World War, everything began to recover, including football as well. In 1950, Brazil held the 4th World Cup which had been cut off for 12 years. The whole football world started to heal itself. In this period, besides the international football events such as the World Cup and the Olympics, regional football also embraced a fast growth, including

• UEFA and AFC founded in 1954

• The first Asian Cup was held in 1956, won by South Korea

• CAF is founded and the first African Cup of Nations was held in Sudan

It is also interesting that graphs in this stage share much similarity with the graphs in the stage before the second World War. This exactly shows the attempt of the football society to get recovered to the status as the one before war.

• 1960-1990: Prospering In the previous stage, most regional football confederations were established with the exception of CONCACAF which was founded in 1961. The structure of the modern football world was more or less built up. In this stage, the football world experienced a peaceful growing stage for 30 years. There were not much significant events happening to change the overall landscape. In Fig. 15, we can see a large block from year 1960 to 1990, with a more or less uniform similarity between the networks.

• 1991-2010: The tremendous change In 1991, the Soviet Union dissolved. This historical and political event had its own impact on the football world. In Fig. 15, a new block appears starting from year 1991, indicating that from this moment the network enters a new stage and has less similarity with previous networks. The changes in the new networks are immense as the Soviet Union, which originally was a single and important node in the network, is dissolved into multiple new nodes. With these newly emerged nodes, more football games have been played thus the edges are also significantly influenced. We investigate the political history and generate Table 6. The nations in the table were either emerged as new nations after the cold war, or got independent from Soviet Union. The table also includes the first football game played by new countries, and the last game before cold war and first game after cold war for other countries. 20 new nodes emerged or re-emerged after Soviet Union dissolved. This political change brings significant alternation of the overall structure of the football network, especially due to the fact that most of the new countries joined UEFA, the confederation with the most impact to the whole football world.

The above analysis lists several stages in the football history from year 1901 to year 2010. Over a century of football history is partitioned into 8 states. The interesting fact about these stages is that, although people would assume that the changes of the football networks would be mainly related with the changes within the football world, such as the expansion of World Cup or the commencement of new football tournaments, it is also significantly related with historical and political events, such as wars and political incidents. This shows a perfect evidence of the social impact of football and its ability of offering an insight into the changes in the world. Changes in the world would affect football and football on the other hand, could reflect changes in the world.