In this section, we evaluate how effectively the methods identify communities on the finer-grained co-location network. Next, we perform experiments to study whether the identified communities can help us understand how neighborhoods function.
In this section, we analyze the communities formed by the adolescent representations learned by each of the selected representation learning methods. We render the identified adolescent communities on the Columbus map, where each adolescent is represented by their approximate home location. We set the number of communities to 18, similar to the number reported in Xi et al. (2020), and we also observe that the perplexity metric (Blei et al. 2003) with 18 communities is among the lowest. The communities identified by Deepwalk, LINE, LocationTrails, LDA (Xi et al. 2020), Metis, and Graclus are shown in Figs. 5a, 5b, 6a, 6b, 7a, and 7b, respectively. Next, we analyze the identified communities through a sociological lens.
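The choice of 18 communities rests partly on perplexity. As a minimal sketch, the helpers below compute perplexity in the form given by Blei et al. (2003), exp(-Σ_d log p(w_d) / Σ_d N_d), and rank candidate numbers of communities by it. They assume that per-document held-out log-likelihoods are already available from a fitted model (treating each adolescent as one document is our assumption about the LDA setup); the function names and the perplexity values in the usage lines are hypothetical.

```python
import math

def perplexity(doc_log_likelihoods, doc_lengths):
    """Perplexity as in Blei et al. (2003):
    exp(-sum_d log p(w_d) / sum_d N_d); lower is better."""
    total_tokens = sum(doc_lengths)
    if total_tokens <= 0:
        raise ValueError("need at least one token")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)

def rank_by_perplexity(perplexity_by_k):
    """Candidate numbers of communities, ordered from lowest to highest perplexity."""
    return sorted(perplexity_by_k, key=perplexity_by_k.get)

# Hypothetical perplexity values for three candidate K, for illustration only.
candidates = {10: 512.0, 18: 470.5, 25: 481.2}
print(rank_by_perplexity(candidates))  # [18, 25, 10]
```

Ranking rather than taking a strict minimum matches the text, which picks a K whose perplexity is "among the lowest" while also staying comparable to Xi et al. (2020).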
Qualitative holistic analysis of results
We observe that in white-dominated neighborhoods the evaluated methods often identify residentially proximate communities (see Fig. 1). For instance, all methods identify a community located in Bexley, Ohio (community number: 10, color: blue). Bexley is a white-dominated area (86.5% of its population is white), and the median household income of its residents is double that of residents of Columbus city. Bexley is also rich in organizational resources and was historically considered a relatively insular community given its spatial embeddedness in a largely lower-income context. The emergence of the Bexley community shows that many of its residents share the same activity profiles, which might be due to the abundance of organizational resources in this advantaged neighborhood. Moreover, communities in a few other white-dominated neighborhoods, such as Upper Arlington, Grandview Heights, and Worthington, are commonly identified by Deepwalk, LINE, LocationTrails, Metis, and Graclus.
The methods that rely solely on the graph structure (Deepwalk, LINE, Metis, and Graclus) place adolescents in the same community if they reside in the same black-dominated neighborhood (such as Near East Side (Census Tracts 29 and 36, Franklin, OH) and Milo Grogan (Census Tracts 15 and 23, Franklin, OH)). This result does not align well with existing sociological studies (Basta et al. 2010; Sastry et al. 2004; Browning et al. 2021b; Small and McDermott 2006). These studies note that the lack of organizational resources (e.g., grocery stores, schools) in black-dominated neighborhoods results in adolescents spending a nontrivial proportion of their time outside their residential neighborhoods, where they encounter more heterogeneous exposure to neighborhood racial composition than other adolescents. This often results in dissimilar activity profiles among adolescents residing in these disadvantaged neighborhoods. Hence, it is surprising that these four methods identify residentially proximate communities in black-dominated neighborhoods. In contrast, LocationTrails, which relies on the sequence of locations visited by the adolescents, does not identify residentially proximate communities in black-dominated neighborhoods. We present a detailed community analysis of each method in the following sections.
Community analysis: LocationTrails
The communities identified by LocationTrails on the finer-grained co-location network are consistent with those identified in the peer-reviewed study by Xi et al. (2020) on the coarser-grained AHDC co-location network, which was constructed using a structured data collection approach. Specifically, LocationTrails groups adolescents who reside in Grandview Heights (cluster number: 8, color: light green), Upper Arlington (cluster number: 2, color: black), and Worthington (cluster number: 7, color: green) into neighborhood-specific clusters. All these regions have more than 90% white residents, and the median household income of their residents is double that of residents of Columbus. These communities share characteristics similar to those of Bexley; however, Deepwalk, LINE, and LDA are unable to find these communities. Adolescents living in black-dominated neighborhoods, in contrast, are placed by LocationTrails in different communities. Specifically, adolescents who reside in Near East Side (Census Tracts 29 and 36, Franklin, OH) and Milo Grogan (Census Tracts 15 and 23, Franklin, OH) are placed in different communities. The median household income of residents in these regions is lower than that of residents of Columbus. Adolescents in these disadvantaged neighborhoods need to travel farther, on average, to access organizational resources and have few activity profiles in common. Therefore, LocationTrails assigns them to different communities.
Community analysis: Deepwalk and LINE
From Figs. 5a and 5b, we observe that Deepwalk and LINE identify communities that are often residentially proximate: adolescents who reside in the same neighborhood are often placed in the same community. Such residentially proximate communities appear for most neighborhoods, both white-dominated and black-dominated. This result runs counter to expectations, since residentially proximate communities are less likely to occur in high-poverty neighborhoods. As mentioned previously, youth from high-poverty neighborhoods often spend a nontrivial proportion of their time outside their residential neighborhoods (in order to seek organizationally based resources) and encounter more heterogeneous exposure to neighborhood racial composition than other youth (Browning et al. 2021b). This often results in dissimilar activity profiles among youth residing in these disadvantaged neighborhoods. Drilling down on the raw activity profiles of individuals in these communities, we find that their activity profiles do indeed differ and are quite heterogeneous. These results suggest that LINE and Deepwalk are predisposed (biased) toward identifying residentially proximate communities.
The reason both Deepwalk and LINE identify residentially proximate communities even for segregated high-poverty neighborhoods can be explained as follows. Both methods rely solely on the structure of the graph to learn node representations. Deepwalk relies on random walks on the co-location network, while LINE relies on both explicit (first-order proximity) and implicit (second-order proximity) connectivity between nodes. Hence, if two adolescents residing in the same neighborhood visit a few common locations in that neighborhood (e.g., local stores), an implicit link exists between them, and these methods strongly push their learned representations to be similar. The clustering method then assigns these two adolescents to the same cluster because their representations are similar.
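To make this mechanism concrete, the sketch below assumes a toy bipartite adolescent-location graph (a simplification; the node names and graph shape are hypothetical) and generates DeepWalk-style truncated uniform random walks, then counts how often two adolescents co-occur within a context window. Adolescents who share a single local store co-occur frequently, while adolescents with no shared location never do. The `neighbor_jaccard` helper is only a rough stand-in for LINE's second-order proximity (LINE itself learns embeddings by optimizing an objective over neighbor contexts); it shows that one shared neighbor can already make two nodes look identical.

```python
import random
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency lists from (adolescent, location) edges."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return dict(adj)

def random_walks(adj, walks_per_node, walk_length, seed=0):
    """Truncated uniform random walks, the kind of corpus DeepWalk trains on."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(adj):
            walk = [start]
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

def cooccurrence(walks, window, nodes):
    """Count how often two nodes of interest appear within `window` steps."""
    counts = defaultdict(int)
    for walk in walks:
        for i, u in enumerate(walk):
            if u not in nodes:
                continue
            for v in walk[i + 1 : i + 1 + window]:
                if v in nodes and v != u:
                    counts[tuple(sorted((u, v)))] += 1
    return counts

def neighbor_jaccard(adj, u, v):
    """Overlap of neighbor sets: a crude proxy for second-order proximity."""
    nu, nv = set(adj[u]), set(adj[v])
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

# Hypothetical toy network: a1 and a2 share a local store; a3 and a4 share a park.
edges = [("a1", "store_X"), ("a2", "store_X"), ("a3", "park_Y"), ("a4", "park_Y")]
adj = build_adjacency(edges)
walks = random_walks(adj, walks_per_node=10, walk_length=5)
co = cooccurrence(walks, window=2, nodes={"a1", "a2", "a3", "a4"})
```

A skip-gram model trained on such walks sees a1 and a2 in each other's context repeatedly, which is why their embeddings (and hence cluster assignments) end up close regardless of how different the rest of their activity may be in the full data.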
Community analysis: LDA
From Fig. 6b, we observe that LDA identifies clusters in Bexley (cluster number: 10, color: blue) and Upper Arlington (cluster number: 2, color: black). However, it fails to identify the clusters in the other white-dominated, advantaged neighborhoods that LocationTrails identified.
Community analysis: Metis and Graclus
The communities identified by the standard network science algorithms Metis (Karypis et al. 1997) and Graclus (Dhillon et al. 2007) are shown in Figs. 7a and 7b, respectively. We observe that Metis and Graclus identify residentially proximate clusters in both white-dominated and black-dominated neighborhoods. For instance, Metis and Graclus place adolescents residing in black-dominated neighborhoods, such as South Columbus and the area south of Grandview Heights, in the same communities. As mentioned earlier, these clusters are not aligned with the sociological findings discussed in the section “Sociological studies on the activity profiles”.
To summarize, the above analysis of the identified communities suggests that a method that accounts for the sequence of locations visited by the adolescents while learning node representations (LocationTrails; Gurukar et al. 2021) identifies higher-quality communities from the co-location networks than the other evaluated methods.
Quantitative analysis of the communities
We measure the overlap between the communities identified by different methods using Normalized Mutual Information (NMI) (Estévez et al. 2009). From the qualitative analysis, we observe that adolescents who reside in white-dominated neighborhoods often share the same cluster, and this clustering pattern is observed across different methods. In our quantitative analysis, we therefore focus on the adolescents who reside in white-dominated neighborhoods: we identify their clusters with each method and present the pairwise NMI between the resulting clusterings in Table 2. A similar analysis for adolescents residing in black-dominated neighborhoods is shown in Table 3. In white-dominated neighborhoods, the NMI between the clusters identified by Deepwalk, LINE, LocationTrails, Metis, and Graclus is relatively high. This relatively high NMI, coupled with the visual analysis of the identified clusters, suggests that adolescents who reside in white-dominated neighborhoods often share the same cluster. In black-dominated neighborhoods, the NMI between the clusters identified by Deepwalk, LINE, Metis, and Graclus is relatively higher than the NMI between these methods and LocationTrails. Coupled with the visual analysis, this relatively high NMI suggests that these four methods identify residentially proximate clusters even in black-dominated neighborhoods, which, as mentioned earlier, does not align well with existing sociological studies. We return to this point shortly in the context of neighborhood affinity, which further amplifies it. Note that the NMI of LDA is relatively low in both Tables 2 and 3. The NMI between the identified clusters of adolescents residing in all neighborhoods is reported in Additional file 1 (see section “Quantitative analysis”).
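The pairwise comparisons in Tables 2 and 3 can be illustrated with a small NMI implementation. The sketch below uses arithmetic-mean normalization, I(U;V) / ((H(U) + H(V)) / 2), which is also scikit-learn's default; other normalizations (geometric mean, min, max) exist, and this section does not state which one underlies the tables. The `nmi_on_subset` helper restricts the comparison to a chosen group of adolescents, mirroring the white- and black-dominated subsets; all names and data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (nats) of a clustering given as a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(u, v):
    """Mutual information (nats) between two clusterings of the same items."""
    n = len(u)
    count_u, count_v = Counter(u), Counter(v)
    return sum((n_uv / n) * math.log(n * n_uv / (count_u[a] * count_v[b]))
               for (a, b), n_uv in Counter(zip(u, v)).items())

def nmi(u, v):
    """NMI with arithmetic-mean normalization: I(U;V) / ((H(U) + H(V)) / 2)."""
    if len(u) != len(v) or not u:
        raise ValueError("clusterings must be non-empty and of equal length")
    h_u, h_v = entropy(u), entropy(v)
    if h_u == 0.0 and h_v == 0.0:
        return 1.0  # both clusterings put every item in a single cluster
    return mutual_information(u, v) / ((h_u + h_v) / 2)

def nmi_on_subset(clusters_a, clusters_b, adolescent_ids):
    """NMI restricted to a subset of adolescents (e.g. residents of one tract type)."""
    ids = sorted(adolescent_ids)
    return nmi([clusters_a[i] for i in ids], [clusters_b[i] for i in ids])

# Hypothetical cluster assignments from two methods.
method_a = {"p": 0, "q": 0, "r": 1, "s": 1, "t": 2}
method_b = {"p": 5, "q": 5, "r": 7, "s": 7, "t": 5}
print(round(nmi_on_subset(method_a, method_b, {"p", "q", "r", "s"}), 3))  # 1.0
```

NMI is invariant to relabeling, so two methods that group the same adolescents together score 1.0 even when their community numbers differ, which is what makes it suitable for comparing clusterings produced by unrelated methods.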
Quantitative analysis: neighborhood affinity
In this section, we quantitatively analyze the communities present in the neighborhoods. Following the literature (Xi et al. 2020), we use the census tract as a proxy for the neighborhood and compute the percentage of adolescents who reside in the same census tract and share the same cluster. The neighborhood affinity of a neighborhood is the probability that two randomly selected adolescents who reside in the same census tract also share the same cluster. Since there are multiple neighborhoods, we report the average neighborhood affinity over all neighborhoods, excluding neighborhoods with fewer than five residents. The average neighborhood affinity scores of the different methods are shown in Fig. 8. We also report the average neighborhood affinity score of a Randomization method, which gives the expected score under uniform community assignment: we assign adolescents to communities uniformly at random 1000 times and then average the resulting average neighborhood affinity scores.
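Written out, the affinity of a tract t with n_t residents, of whom n_{t,c} fall in cluster c, is A_t = Σ_c n_{t,c}(n_{t,c} - 1) / (n_t(n_t - 1)), assuming the two adolescents are drawn without replacement. The sketch below computes this per tract, averages over tracts with at least five residents, and estimates the Randomization baseline by assigning each adolescent independently and uniformly to one of k communities for 1000 trials. The independent-uniform assignment and all identifiers are our assumptions; under that assignment the expected affinity of every tract is exactly 1/k.

```python
import random
from collections import Counter, defaultdict

def tract_affinity(cluster_labels):
    """P(two distinct residents of a tract, drawn without replacement,
    share a cluster) = sum_c n_c (n_c - 1) / (n (n - 1))."""
    n = len(cluster_labels)
    if n < 2:
        raise ValueError("need at least two residents")
    same = sum(c * (c - 1) for c in Counter(cluster_labels).values())
    return same / (n * (n - 1))

def average_affinity(home_tract, cluster, min_residents=5):
    """Mean tract affinity over tracts with at least `min_residents` adolescents."""
    by_tract = defaultdict(list)
    for person, tract in home_tract.items():
        by_tract[tract].append(cluster[person])
    scores = [tract_affinity(labels) for labels in by_tract.values()
              if len(labels) >= min_residents]
    if not scores:
        raise ValueError("no tract has enough residents")
    return sum(scores) / len(scores)

def randomization_baseline(home_tract, k, trials=1000, min_residents=5, seed=0):
    """Average affinity under uniform random assignment to k communities;
    also returns the per-trial samples."""
    rng = random.Random(seed)
    people = sorted(home_tract)
    samples = []
    for _ in range(trials):
        random_cluster = {p: rng.randrange(k) for p in people}
        samples.append(average_affinity(home_tract, random_cluster, min_residents))
    return sum(samples) / trials, samples

# Hypothetical data: tract T1 has five residents; T2 has four and is filtered out.
home_tract = {f"a{i}": "T1" for i in range(5)}
home_tract.update({f"b{i}": "T2" for i in range(4)})
cluster = {"a0": 0, "a1": 0, "a2": 0, "a3": 1, "a4": 1,
           "b0": 2, "b1": 2, "b2": 2, "b3": 2}
print(average_affinity(home_tract, cluster))  # 0.4
```

In the example, T1 has 3 + 1 same-cluster pairs out of 10, giving 0.4, while T2 is excluded by the five-resident filter even though all its residents share a cluster.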
From Fig. 8, we observe that Deepwalk has the highest average neighborhood affinity score, irrespective of the number of communities. LINE also identifies residentially proximate clusters and has the second-highest score, again irrespective of the number of communities. The high affinity scores of Deepwalk and LINE quantitatively confirm that they find residentially proximate clusters. The affinity score of LocationTrails is lower than that of Deepwalk because LocationTrails places adolescents who reside in black-dominated, disadvantaged neighborhoods in different communities. On the other hand, the affinity score of LocationTrails is higher than that of LDA because LocationTrails identifies more clusters with similar characteristics (white-dominated, advantaged neighborhoods). The difference between the average neighborhood affinity scores of LDA and Randomization is statistically significant at the 0.01 level (Z-score ≥ 26.0 for all clusters).
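The section does not spell out how the Z-score was computed. One plausible reading, sketched below as an assumption, standardizes LDA's observed average affinity against the Randomization samples, z = (A_obs - mean_null) / sd_null, and compares the result with the standard normal critical value for α = 0.01 (about 2.58 two-sided). A Z-score of 26 lies far beyond that threshold. The function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def z_score(observed, null_samples):
    """Standardize an observed score against a null (randomization) distribution."""
    sigma = pstdev(null_samples)
    if sigma == 0:
        raise ValueError("null distribution has zero spread")
    return (observed - mean(null_samples)) / sigma

def critical_z(alpha=0.01, two_sided=True):
    """Standard normal critical value for significance level alpha."""
    q = 1 - alpha / 2 if two_sided else 1 - alpha
    return NormalDist().inv_cdf(q)

print(round(critical_z(0.01), 3))  # 2.576
```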
Next, we compare the average neighborhood affinity scores of white- vs. black-dominated neighborhoods and of advantaged vs. disadvantaged neighborhoods. The results are shown in Figs. 9 and 10, where the average neighborhood affinity scores are multiplied by 100. For the four representation learning methods (Deepwalk, LINE, LocationTrails, and LDA), the average neighborhood affinity score of adolescents living in white-dominated neighborhoods is higher than that of adolescents in (i) black-dominated neighborhoods and (ii) all neighborhoods. Similarly, for the same four methods, the average neighborhood affinity score of adolescents living in advantaged neighborhoods is higher than that of adolescents in (i) disadvantaged neighborhoods and (ii) all neighborhoods. This analysis suggests that white adolescents, or adolescents residing in advantaged neighborhoods, tend to share more similar activity profiles than their counterparts in black-dominated or disadvantaged neighborhoods. The average neighborhood affinity score of black-dominated/disadvantaged neighborhoods is lower than that of all neighborhoods because adolescents who reside in these neighborhoods are less likely to have common activity patterns; this lack of common patterns might be due to a lack of organizational resources in black-dominated/disadvantaged neighborhoods.
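The group comparisons in Figs. 9 and 10 amount to averaging per-tract affinities within each neighborhood category and scaling by 100. A minimal sketch follows; the criteria that make a tract white-dominated, black-dominated, advantaged, or disadvantaged are not given in this section, so the group labels are supplied by the caller, and the function name and example values are hypothetical.

```python
from collections import defaultdict

def group_average_affinity(tract_scores, tract_groups, scale=100):
    """Mean tract affinity per neighborhood group, multiplied by `scale`.

    tract_scores: tract -> affinity in [0, 1]
    tract_groups: tract -> set of group labels (a tract may belong to several)
    Every tract also contributes to the "all" group.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for tract, score in tract_scores.items():
        for label in set(tract_groups.get(tract, ())) | {"all"}:
            sums[label] += score
            counts[label] += 1
    return {label: scale * sums[label] / counts[label] for label in sums}

# Hypothetical per-tract affinities and group labels.
scores = {"T1": 0.4, "T2": 0.2, "T3": 0.1}
groups = {"T1": {"white", "advantaged"},
          "T2": {"black", "disadvantaged"},
          "T3": {"black", "disadvantaged"}}
result = group_average_affinity(scores, groups)
```

Because the "all" group pools every tract, it necessarily lies between the group means, which is why a group scoring below "all" (as black-dominated/disadvantaged tracts do in the figures) implies another group scores above it.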
Drilldown analysis of communities: LocationTrails
In this section, we present a drilldown analysis of the communities identified by LocationTrails and comment on the activity profiles of the adolescents placed in each community. To preserve the adolescents' privacy, we do not disclose the names of the locations they visit. Information about the types of public and private schools in the United States is provided in [56,57]. Population statistics and economic and political information for Franklin County and the neighborhoods mentioned below can be found on several web portals [58,59].
We observe that several communities identified by LocationTrails are residentially proximate, specifically Communities 0 and 3 (Upper Arlington), 2 and 17 (Clintonville), 6 (Hilliard), 7 (Whitehall), 10 and 15 (Bexley), 13 (east of German Village), 14 (Worthington), and 16 (Grandview Heights). Communities 0, 3, 6, 10, 14, 15, and 16 are located in white-dominated neighborhoods with rich organizational resources. Whitehall has a more diverse racial composition (43% white and 39% black residents) and is moderately affluent. Adolescents in the residentially proximate Community 13 commonly visit one public magnet high school east of German Village and two public parks within 6 miles of that area.
We see that Communities 0 and 3 both fall in Upper Arlington, but the adolescents in Community 0 are middle school students who commonly visit two middle schools in Upper Arlington, while the adolescents in Community 3 are high school students who commonly visit one high school in Upper Arlington. Essentially, LocationTrails distinguishes middle school from high school adolescents based on their activity profiles even though their home locations lie in the same neighborhood. We also note that Community 10 is extremely cohesive and centered in Bexley (students attending the local high school), whereas Community 15 is also largely centered in the Bexley area but includes adolescents whose homes are spread across largely advantaged neighborhoods in the rest of Franklin County. Drilling down, we observe that this is largely because many of the students with shared activity profiles in this cluster attend one of several expensive private schools in Bexley. We point out both of these cases (two distinct clusters each in Upper Arlington and Bexley) because this type of fine-grained distinction is not immediately visible in the communities identified by the other methods in our study.
Next, we observe that in a few communities, such as Communities 5, 8, 11, and 12, the home locations of adolescents are spread out over Columbus city. In these communities, the adolescents often visit schools that have an open enrollment policy, which allows adolescents residing in one school district to attend schools in another district; these schools often serve as magnet schools (for STEM, STEAM, and the Arts) or alternative high schools. Specifically:
Adolescents in Community 5 commonly visit one arts middle school near Downtown and a public magnet school near Downtown.
Adolescents in Community 8 commonly visit three public magnet high schools (one near Clintonville, one north of North Linden and one in Marion-Franklin).
Adolescents in Community 11 commonly visit one STEM school in South Linden and a public-magnet alternative high school in North Linden.
Adolescents in Community 12 commonly visit two public magnet high schools (one between Worthington and Easton and another near downtown) and one public-magnet alternative high school (with intensive arts curriculum).
Finally, we note that Community 4 is spread out over Columbus city because the adolescents in this community share non-school activities, such as going to a popular swimming club, community centers, malls, and church.
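The drilldown above characterizes each community by the places its members commonly visit. The sketch below shows one simple way to produce such a profile: for each community, it lists the (anonymized) locations visited by at least a given share of its members. The 50% default threshold, the identifiers, and the data are hypothetical; the section does not say how "commonly visit" was operationalized.

```python
from collections import Counter, defaultdict

def common_locations(visits, community, min_share=0.5):
    """For each community, the locations visited by at least `min_share`
    of its members, sorted by share (highest first, ties by location id).

    visits: adolescent -> iterable of anonymized location ids
    community: adolescent -> community id
    """
    members = defaultdict(list)
    for person, comm in community.items():
        members[comm].append(person)
    profile = {}
    for comm, people in members.items():
        counts = Counter()
        for person in people:
            # Count each location at most once per adolescent.
            counts.update(set(visits.get(person, ())))
        n = len(people)
        shared = [(loc, c / n) for loc, c in counts.items() if c / n >= min_share]
        profile[comm] = sorted(shared, key=lambda item: (-item[1], item[0]))
    return profile

# Hypothetical, anonymized visit data.
visits = {"p": ["school_A", "pool"], "q": ["school_A"], "r": ["mall"]}
community = {"p": 0, "q": 0, "r": 1}
print(common_locations(visits, community))
# {0: [('school_A', 1.0), ('pool', 0.5)], 1: [('mall', 1.0)]}
```

Counting each location once per adolescent keeps a single frequent visitor from dominating the profile, so the shares reflect how widespread a location is within the community rather than total visit volume.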
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.