Biomes are a key concept in ecology and biogeography (Higgins et al., 2016; Mucina, 2019) and have been widely used in global change ecology (Prentice et al., 2007; Williams et al., 2007; Lehmann et al., 2014; Moncrieff et al., 2016), conservation planning (Hoekstra et al., 2005), and evolutionary biology (Donoghue and Edwards, 2014; Landis et al., 2021a). Although biome definitions have differed (Mucina, 2019), the scientific community has generally settled on a common definition of a biome: “a biotic community finding its expression at large geographic scales, shaped by climatic factors, and perhaps better characterized by physiognomy and functional aspects, rather than by species or life-form composition” (Mucina, 2019). Despite this accord over the definition of a biome, there remains no universally recognized method of delimiting and mapping biomes. Different approaches focus upon different elements of biomes – their physiognomic, floristic, environmental, or functional characteristics – which in turn produce different biome maps. Although such single-criterion biome schemes are operationally useful for understanding plant communities at scales from local to global (Conradi et al., 2020), they cannot define the nature of biomes through time and fail to capture the distribution, structure, and functioning of biomes in an evolutionary continuum.

Most recently, a global-scale conceptual view of biomes has been proposed that considers biomes as the confluence of ecology, evolution, and biogeography (Pennington et al., 2009, 2018; Oliveira-Filho et al., 2013a,b; Moncrieff et al., 2016; Pennington and Lavin, 2016; Mucina, 2019, 2020; Nürk et al., 2020; Ringelberg et al., 2020). While there is a consensus in ecology and biogeography that biomes should be defined based on physiognomy and functional aspects (Mucina, 2019; Pennington et al., 2018), an evolutionary dimension emphasizes the processes that have led to current biome distributions. This concept defines biomes as “evolutionary theaters” within which lineages interact and evolve through time, and as meta-communities regulated by community assembly at large spatial scales (Pennington et al., 2009; Oliveira-Filho et al., 2013b; Pennington and Lavin, 2016). The concept has emerged partly in response to increasing evidence for the prevalence of phylogenetic niche conservatism (Crisp et al., 2009; Pennington et al., 2009; Oliveira-Filho et al., 2013a,b; Kerkhoff et al., 2014; Gagnon et al., 2019; Ringelberg et al., 2020; Segovia et al., 2020). This tendency of plant lineages to inherit their overall ancestral environmental niches is supported by evidence that many plant lineages have dispersed across large distances over evolutionary timescales yet still occupy similar ecological conditions. The general lack of dispersal limitation, combined with the difficulty of acquiring novel environmental adaptations, has led to the popularity of the phrase that, for plants, it is “easier to move than evolve” (Donoghue, 2008).

That plant phylogenies are often more structured ecologically than geographically suggests that ecological gradients are evolutionarily important (Crisp et al., 2009; Oliveira-Filho et al., 2013a). The concept of phylogenetic niche conservatism is strongly linked to environmental filtering, i.e., the process whereby environmental gradients act as strong filters for species distributions (Cavender-Bares et al., 2009; Hardy et al., 2012; Blonder et al., 2015). For example, a previous study across plant lineages has shown that environmental filtering has played an important role in shaping the flora of the Galapagos Islands (Carvajal-Endara et al., 2017). In the presence of environmental filtering, lineages cannot successfully establish unless they have traits that leave them pre-adapted to pass environmental filters, leading to distinct biome assemblies (see also Donoghue and Edwards, 2014).

Biome conservatism and environmental filtering are, of course, not universal. For example, depending on the “evolutionary accessibility” of a new ecological setting, or on a lineage’s biome affinity and location relative to the spatial distribution of other biomes at any given time, there is a spectrum of possible biogeographical scenarios, including those in which lineages can cross ecological barriers more easily than geographical barriers (Edwards and Donoghue, 2013; Landis et al., 2021a). Evidence from global tropical grasslands shows that many lineages colonized this biome from other biomes over the past 10 million years (Simon et al., 2009; Maurin et al., 2014), indicating that some biomes may be more open to outsiders (i.e., non-native or pre-adapted lineages; Edwards and Donoghue, 2013; Donoghue and Edwards, 2014). Such evolutionary biome switching suggests that some environmental gradients (i.e., biome borders) are more easily crossed than others, perhaps because of the ease with which adaptations to these gradients can be acquired (Pennington et al., 2006; Simon and Pennington, 2012). The intermediate disturbance hypothesis (Connell, 1978) posits that if a disturbance is not too extreme, many plant lineages may already have, or can evolve, the traits required to survive it, but the more extreme the disturbance (e.g., extreme drought, extreme cold, or extreme heat), the smaller the number of species that will have these traits, because they are hard to evolve. Quantitative evidence across floras is now needed to understand the relative roles of niche conservatism and species-environment interactions (environmental filtering) across ecological gradients, particularly in the Neotropics, where much biome complexity is found (Pennington et al., 2009; Hughes et al., 2013; Dexter et al., 2018; Silva-de-Miranda et al., 2018).

If phylogenetic niche conservatism and environmental filtering have shaped the macroevolutionary patterns of floristic and functional diversity that make up evolutionary theaters, or biomes, then exploring variation in taxonomic, functional trait, and community phylogenetic data may help delimit more biologically meaningful biomes that are globally comparable. While biomes defined exclusively by individual operational criteria will produce different biome maps suited to different purposes, such as comparative ecology and global change research (Conradi et al., 2020), comparisons of biomes delimited under different criteria, or combinations of criteria, remain rare, particularly across geologically and ecologically complex biogeographic regions like the Caatinga domain of northeast Brazil (NE Brazil, Figure 1) (Queiroz et al., 2017; Fernandes et al., 2020).

Figure 1. The main biogeographical regions of northeast Brazil (including the state of Minas Gerais) where three biomes predominate: the geographically disjunct Amazonian and Atlantic rainforests [(A), green; (D), blue], the succulent-rich Caatinga seasonally dry tropical forests [(B), red], and the grass-rich, fire-prone savannas of the Cerrado [(C), yellow]. For geographic reference, the entire northern and northeastern borders of the inset are formed by the Atlantic Ocean. Photos by PWM (A) and DC (B–D).

The Caatinga region is often treated in biodiversity, evolutionary, conservation, and biogeographical studies of plants and animals as a single, homogeneous unit, generally termed a “biome” (Instituto Brasileiro de Geografia e Estatística [IBGE], 2012; The Brazil Flora Group [BFG], 2015; Garda et al., 2017; Mesquita et al., 2017; Araujo and Silva, 2017; Antonelli et al., 2018; Manhães et al., 2018; Nascimento et al., 2018; Silva and Souza, 2018; Medeiros et al., 2019; Prieto-Torres et al., 2019; Souza-e-Silva et al., 2019; Correia et al., 2020; Dória and Dobrovolski, 2021). However, this definition of the Caatinga as a “biome” conflicts with the more generally accepted definition of a biome at a global scale; the Caatinga is in fact a biogeographic region. Whilst the Caatinga region may have a broadly similar, seasonally dry climate throughout, it includes interdigitating, distinct biomes (as defined and recognized at a global scale), such as rainforests and fire-prone savannas, within a predominant matrix of Caatinga seasonally dry tropical forests (SDTF biome). The use of maps that delimit the Caatinga geographically as a “biome” may affect downstream analyses aimed at disentangling the ecological and historical drivers that have shaped the evolutionary trajectories of Caatinga species diversity (Queiroz, 2006; Cardoso and Queiroz, 2010; Queiroz et al., 2017; Guedes et al., 2014). It may also affect the assessment of priority areas for conservation in the severely impacted SDTF biome across Brazil, which is not confined to the Caatinga but is also found across the Cerrado and Pantanal regions (DRYFLOR, 2016; the Cerrado and Pantanal are also termed “biomes” by IBGE).

Here we explore how taxonomic, functional, and community phylogenetic data can be used to delimit biomes in NE Brazil, and how biomes defined in these different ways are shaped by climatic variables. We aim to demonstrate that a Caatinga SDTF biome, lying at the extremely dry end of the tropical seasonality and rainfall gradient of NE Brazil, differs greatly from nearby and interdigitating, ecologically and evolutionarily distinct rainforest and savanna biomes, and occupies a unique environmental space. We use the three distinct data sources to measure the degree to which the Caatinga SDTF biome differs floristically, phylogenetically, and in functional trait composition, as a first step toward understanding the ease with which these differences can be, and have been, traversed adaptively by plant lineages through time (Wiens et al., 2010; Crisp and Cook, 2012; Donoghue and Edwards, 2014).

We use standard clustering algorithms, which in a geographic context have been termed regionalization approaches (Kreft and Jetz, 2010; Linder et al., 2012; Vilhena and Antonelli, 2015; Daru et al., 2017), to delimit clusters. Several authors have applied biogeographic regionalization methods at similar (Linder et al., 2012; Fayolle et al., 2018; Aleman et al., 2020) or even larger scales (e.g., Kreft and Jetz, 2010; Holt et al., 2013; Vilhena and Antonelli, 2015; Ficetola et al., 2017) and argued that the results are comparable to biomes (Vilhena and Antonelli, 2015; Aleman et al., 2020). However, while extremely useful, previous regionalizations have been based on a single type of data (e.g., floristic data; Linder et al., 2012) and so cannot encompass all aspects of biomes. Our approach differs because we use three independent regionalizations, each based upon a different type of data (floristic, functional, and phylogenetic). By both combining and comparing the results of these regionalizations, we argue that our results approach, for the first time, a data-driven and repeatable biome map that considers all facets of agreed biome definitions.

Materials and Methods

Our approach differs from classical and modern approaches that fall under the umbrella of “biogeographical regionalization” or “bioregionalization” (e.g., Kreft and Jetz, 2010; Linder et al., 2012; Holt et al., 2013; Vilhena and Antonelli, 2015; Daru et al., 2016, 2017; Edler et al., 2017; Ficetola et al., 2017). Most such analyses focus on understanding the signature of historical processes (i.e., geographic barriers) in explaining the spatial distribution of specific “groups” of organisms across geographically confined areas, rather than on delimiting biomes that are applicable across the tree of life. Those that do include broader ranges of taxa still rely only on floristic (e.g., Linder et al., 2012) or phylogenetic (e.g., Daru et al., 2017) data to classify “bioregions,” rather than attempting to integrate floristic, functional, and phylogenetic data. The resulting areas are more akin to “biogeographic regions” (sensu Wallace, 1876; Holt et al., 2013) than to “biomes.” In our study, biomes were delimited by comparing and combining independent regionalization analyses based upon floristic, functional, and phylogenetic data. We therefore consider the impact of geography and the evolutionary and functional distinctiveness of areas, as well as how the environment [here incorporated indirectly through Species Distribution Models (SDMs)] determines where lineages are confined. We consider the results of our analyses to be biologically meaningful biomes.

Floristic data were based on thresholded, statistically significant SDMs for 3,457 flowering plant species, which were also included in a molecular phylogeny. SDMs were used to estimate species lists for all grid cells across the study area. Functional and phylogenetic data for each grid cell were based on seven functional traits and a community phylogeny generated for all the species present for the study area, respectively. Regionalization analyses were carried out using hierarchical clustering based upon floristic, functional, and phylogenetic distance among all cells. All three distance matrices (floristic, functional, and phylogenetic) were summed and used to delimit “total evidence” biomes.

Study Area

We applied our biome delimitation approaches to NE Brazil because of our long-term experience working on the taxonomy, distribution, ecology, and evolution of flowering plants in the region (e.g., Rocha et al., 2004; Queiroz, 2006; Queiroz et al., 2010, 2017; Särkinen et al., 2011; Santos et al., 2012; Cardoso et al., 2014; Fernandes et al., 2020; Moonlight et al., 2020). We defined our study area as NE Brazil (including the state of Minas Gerais) in order to include all areas defined as the Caatinga by Instituto Brasileiro de Geografia e Estatística [IBGE] (2012) and by alternative biome classifications (e.g., Queiroz et al., 2017; Silva-de-Miranda et al., 2018; Moonlight et al., 2020). We have a particular interest in identifying “core” areas that are relatively homogeneous in floristic, functional, and phylogenetic space, so it was important to include areas of all biomes that surround the Caatinga (i.e., areas known to differ in these respects). According to the Instituto Brasileiro de Geografia e Estatística [IBGE] (2012) classification, these include the Cerrado (Brazilian savanna), Mata Atlantica (Brazilian Atlantic rainforest), and Amazonia (Amazon rainforest). Our study area covers 2.144 million km² and includes a buffer of at least 200 km around the IBGE definition of the Caatinga in all directions (Figure 1).

Species Distribution Data

The ongoing Flora do Brasil 2020 project provides a robust taxonomic framework for work in the region, currently recording 16,351 species of flowering plants for NE Brazil (Flora do Brasil 2020, 2021). A relatively large amount of species occurrence data is available for the region (Supplementary Figure 1) thanks to two dynamic networks of local herbaria across Brazil, available through CRIA speciesLink and the Reflora specimen database.

In this study, we attempted to produce an SDM for every angiosperm species with recorded distributions in NE Brazil (Reflora, 2021) using data from CRIA speciesLink and the Reflora specimen database. While many NE Brazilian species have distributions outside Brazil, we only included Brazilian specimen records due to difficulties matching taxonomic backbones across countries. We used the latest version of Reflora (2021) to harmonize the taxonomy of the two specimen databases and update synonymy. Data were cleaned in six stages as described in Appendix S2 of Moonlight et al. (2020), which were designed to remove misidentified specimens and those with coordinate errors. We addressed environmental bias in occurrence data using spatial filtering by retaining a single specimen record within each 10 km radius for every species following Kramer-Schadt et al. (2013). We retained and attempted to model all species with ≥5 records collected in different grid cells at a 0.05° resolution, resulting in 296,439 unique occurrence records for 9,134 species.
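The occurrence filtering described above can be sketched in Python as follows. This is a minimal illustration, assuming records are (species, latitude, longitude) tuples in decimal degrees; the greedy keep-first thinning rule and the helper names `thin_records` and `species_to_model` are hypothetical and not necessarily the exact procedure of Kramer-Schadt et al. (2013).

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def thin_records(records, radius_km=10.0):
    """Hypothetical greedy thinning: keep a record only if no record of the
    same species kept so far lies within radius_km of it."""
    kept = defaultdict(list)
    for species, lat, lon in records:
        if all(haversine_km(lat, lon, klat, klon) > radius_km
               for klat, klon in kept[species]):
            kept[species].append((lat, lon))
    return dict(kept)


def cell_id(lat, lon, res=0.05):
    """Row/column index of the res-degree grid cell containing a point."""
    return (math.floor(lat / res), math.floor(lon / res))


def species_to_model(kept, min_cells=5, res=0.05):
    """Species whose retained records fall in at least min_cells distinct cells."""
    return sorted(
        sp for sp, pts in kept.items()
        if len({cell_id(lat, lon, res) for lat, lon in pts}) >= min_cells
    )
```

Because retained records are more than 10 km apart and the diagonal of a 0.05° cell is shorter than that, thinned records always fall in different cells, so the cell-count criterion effectively counts thinned records. The keep-first rule makes the result depend on record order; repeating the thinning over randomized orders is one possible refinement.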

Species Distribution Modeling

Climatic predictors were derived from remotely sensed temperature (MODIS; Wan and Dozier, 1996; Wan, 2014), rainfall (CHIRPS; Funk et al., 2014), and cloud cover data (MODCF; Wilson and Jetz, 2016). These data are calibrated with data from ground weather stations and outperform those derived from ground data alone (e.g., WorldClim) for SDMs (Deblauwe et al., 2016). All data were downloaded at a 0.05° resolution (c. 5.5 km at the equator). Edaphic data were derived from the SoilGrids 250 m database (accessed February 2017) and interpolated to a 0.05° resolution. Edaphic factors are believed to be important in determining species distributions in NE Brazil (Queiroz et al., 2017) and have been shown to increase SDM performance in the tropical Americas (Figueiredo et al., 2017; Moulatlet et al., 2017; Rapini et al., 2021). Climatic and edaphic predictors (35 and 55 variables, respectively) were summarized in two independent principal component analyses (PCAs), from which four climatic and five edaphic axes were retained, each set explaining >80% of the variation (see Appendix S4 of Moonlight et al., 2020). This process reduces the number of explanatory variables, thus minimizing collinearity (Dormann et al., 2013) and model overfitting (Peterson et al., 2007), while retaining most of the environmental information available for modeling.
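The dimension-reduction step can be illustrated with a short NumPy sketch. It assumes a PCA on standardized variables (i.e., on the correlation matrix), which the text does not specify, and retains the fewest axes whose cumulative explained variance exceeds 80%; the function name `pca_axes` is hypothetical.

```python
import numpy as np


def pca_axes(X, var_threshold=0.80):
    """PCA scores on the fewest axes whose cumulative explained variance
    exceeds var_threshold.

    X: array of shape (n_cells, n_variables), one column per predictor.
    Columns are standardized first (assumes no constant columns).
    Returns (scores, explained_variance_ratio_of_retained_axes).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes; s**2 is proportional to their variance.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    ratio = s**2 / np.sum(s**2)
    # Index of the first cumulative value above the threshold, converted to a count.
    k = int(np.searchsorted(np.cumsum(ratio), var_threshold, side="right")) + 1
    k = min(k, len(ratio))
    return Z @ Vt[:k].T, ratio[:k]
```

Under this scheme the function would be applied separately to the climatic and edaphic variable sets, and the two sets of retained scores used together as SDM predictors.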

SDMs for 9,134 species were run using MaxEnt v.3.3.3 in the R package “dismo” (Hijmans et al., 2017). MaxEnt was chosen because it has been shown to outperform other SDM methodologies, particularly when species have few distribution points. MaxEnt was used with the default settings, 5-fold cross-validation, and all feature classes allowed. Background data (also known as pseudo-absence data) were sampled from a unique area for each species, consisting of NE Brazil plus 250 km buffers around each of the species’ known occurrence points. This extent was a compromise between avoiding predictions into areas not covered by background points, providing a large number of climatically unsuitable points (Anderson and Raza, 2010), and keeping a biologically realistic extent for each species’ model. We controlled for bias in sampling effort (Stolar and Nielsen, 2015) by selecting 10,000 background points for each species using an Epanechnikov kernel (Wiegand and Moloney, 2013) calculated from all angiosperm presence data for Brazil.
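A minimal sketch of bias-weighted background sampling is given below. It assumes grid-cell centres and occurrence points in projected coordinates (km), an unnormalized two-dimensional Epanechnikov kernel (each record contributes 1 - (d/h)² within bandwidth h), and sampling without replacement among cells with non-zero kernel weight. The bandwidth is not reported in the text, so any value passed here is an assumption, and the helper names are hypothetical; the MaxEnt fitting itself is not reproduced.

```python
import numpy as np


def epanechnikov_density(cells_xy, points_xy, bandwidth):
    """Unnormalized 2-D Epanechnikov kernel density at each cell centre.
    Each point adds 1 - (d / bandwidth)**2 within the bandwidth, 0 beyond."""
    d = np.linalg.norm(cells_xy[:, None, :] - points_xy[None, :, :], axis=2)
    return np.clip(1.0 - (d / bandwidth) ** 2, 0.0, None).sum(axis=1)


def species_extent(cells_xy, region_mask, occurrences_xy, buffer_km=250.0):
    """Study region plus circular buffers around one species' occurrences."""
    d = np.linalg.norm(cells_xy[:, None, :] - occurrences_xy[None, :, :], axis=2)
    return region_mask | (d.min(axis=1) <= buffer_km)


def sample_background(cells_xy, extent_mask, all_presences_xy, bandwidth,
                      n=10_000, seed=None):
    """Draw up to n distinct background cells inside extent_mask, with probability
    proportional to the sampling-effort density of all presence records."""
    density = epanechnikov_density(cells_xy, all_presences_xy, bandwidth)
    weights = np.where(extent_mask, density, 0.0)
    candidates = np.flatnonzero(weights > 0)
    n = min(n, candidates.size)
    rng = np.random.default_rng(seed)
    p = weights[candidates] / weights[candidates].sum()
    return rng.choice(candidates, size=n, replace=False, p=p)
```

The dense pairwise distance arrays are adequate for a toy grid but would need a spatial index (e.g., a k-d tree) at the scale of NE Brazil. Cells with zero effort density can never be drawn, which is the intended effect of the bias correction.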

We evaluated model performance using the Continuous Boyce Index (CBI; Hirzel et al., 2006). This evaluation index, which relies upon presence and pseudo-absence data, is based upon the Boyce Index (Boyce et al., 2002) and was calculated with publicly available code. CBI has been shown to be less sensitive to stochastic variation at low numbers of presence points than alternative indices (Hirzel et al., 2006). A CBI >0 indicates that a model is better than chance; of the SDMs for 9,134 species, we retained all models with a mean CBI of ≥0.25 over the five replicates (6,823 species, 75%; see Supplementary Table 3). Retained replicates were summed and converted into binary presence-absence maps using the 10th percentile logistic threshold. This threshold was chosen because it is the strictest of the commonly used threshold values, and so limits the well-documented overestimation associated with summed SDMs. To maximize the compatibility of analyses, we retained for downstream analyses only models for species included in the community phylogeny (see section “Phylogenetic Delimitation” below; 3,457 species). Predicted species lists were estimated for every 0.05° grid cell in NE Brazil and aggregated to a 0.25° resolution due to computational constraints. A species was recorded as present in a 0.25° cell if it was predicted to be present in any constituent cell at the original 0.05° resolution.
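The evaluation, thresholding, and aggregation steps can be sketched in NumPy. The continuous Boyce index below follows the general moving-window idea of Hirzel et al. (2006): for each window of suitability values, F is the proportion of presences falling in the window divided by the proportion of background cells falling in it, and CBI is the Spearman rank correlation between F and the window midpoint. This is a simplified version; established implementations (e.g., `ecospat.boyce` in R) may differ in window placement and the handling of ties. The threshold uses NumPy's default linear interpolation for the 10th percentile, which may differ slightly from MaxEnt's reported value, and `aggregate_any` assumes grid dimensions divisible by the aggregation factor. All function names are hypothetical.

```python
import numpy as np


def _average_ranks(a):
    """Ranks starting at 1, with tied values given their average rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(a.size)
    i = 0
    while i < a.size:
        j = i
        while j + 1 < a.size and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def continuous_boyce(presence_pred, background_pred, n_windows=100, width=None):
    """Spearman correlation between predicted/expected ratio F and window midpoint."""
    pres = np.asarray(presence_pred, dtype=float)
    bg = np.asarray(background_pred, dtype=float)
    lo, hi = bg.min(), bg.max()
    width = (hi - lo) / 10 if width is None else width
    mids, ratios = [], []
    for start in np.linspace(lo, hi - width, n_windows):
        end = start + width
        expected = np.mean((bg >= start) & (bg <= end))
        if expected > 0:  # windows without background cells are skipped
            predicted = np.mean((pres >= start) & (pres <= end))
            mids.append(start + width / 2)
            ratios.append(predicted / expected)
    return np.corrcoef(_average_ranks(mids), _average_ranks(ratios))[0, 1]


def binarize_10th_percentile(suitability, training_presence_pred):
    """Presence where suitability >= 10th percentile of training-presence values."""
    threshold = np.percentile(training_presence_pred, 10)
    return np.asarray(suitability) >= threshold, threshold


def aggregate_any(binary_grid, factor=5):
    """Coarsen a presence grid: a coarse cell is a presence if any fine cell is."""
    g = np.asarray(binary_grid, dtype=bool)
    rows, cols = g.shape
    return g.reshape(rows // factor, factor, cols // factor, factor).any(axis=(1, 3))
```

With `factor=5`, a 0.05° grid is coarsened to 0.25°, and a species counts as present in a coarse cell if any of its 25 fine cells is a predicted presence, matching the rule described above.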

Cluster Delimitation

Floristic Delimitation

To delimit clusters based upon floristic data (floristic regionalization), a distance matrix was computed based on β-diversity (Simpson’s dissimilarity: β-sim) in the R package “betapart” (Baselga et al., 2018). This approach was chosen because it measures floristic turnover (i.e., dissimilarity) between grid cells, not nestedness (Baselga, 2010). The floristic distance matrix was used in an unbiased cluster analysis in which the row order of the distance matrix was randomized 100 times using the R package “recluster” (Dapporto et al., 2013, 2015), so that the resulting tree does not depend on the input order of grid cells. RogueNaRok (Aberer et al., 2012) was used to identify rogue grid cells responsible for reducing resolution in the resulting 50% majority-rule consensus dendrogram. A total of 138 rogue grid cells were identified and removed. Clusters were mapped based upon a process of reciprocal illumination following Moonlight et al. (2020), analogous to the approaches taken by similar analyses (Silva-de-Miranda et al., 2018). Clusters were labeled by comparing the mapped distribution of sub-clusters with our biological knowledge of the vegetation patterns in NE Brazil. Our priorities were to delimit clusters that could be matched with confidence to the four biomes recognized by the Instituto Brasileiro de Geografia e Estatística [IBGE] (2012) classification, and to maximize similarity with the mapped clusters from the phylogenetic and functional classifications (see below). We acknowledge that this approach is not fully repeatable, but argue that it is the best currently available, considering that alternative approaches (e.g., k-means clustering, Amaral et al., 2017) also rely upon prior knowledge to define an expected number of biomes. We term this approach “floristic regionalization.”
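The core of the floristic distance and clustering steps can be sketched with NumPy and SciPy. β-sim between two cells is min(b, c) / (a + min(b, c)), where a is the number of shared species and b and c are the numbers unique to each cell; the denominator therefore equals the richness of the poorer cell. The sketch uses average linkage (UPGMA), which is an assumption, and builds a single tree from one random row order; the 100 randomizations, 50% majority-rule consensus, and RogueNaRok filtering used in the study are not reproduced. Function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def simpson_beta(presence):
    """Pairwise beta-sim between rows of a cells x species 0/1 matrix."""
    m = np.asarray(presence, dtype=float)
    richness = m.sum(axis=1)
    if np.any(richness == 0):
        raise ValueError("every grid cell needs at least one species")
    a = m @ m.T                      # species shared by each pair of cells
    b = richness[:, None] - a        # species only in the row cell
    c = richness[None, :] - a        # species only in the column cell
    min_bc = np.minimum(b, c)
    return min_bc / (a + min_bc)


def cluster_cells(distance, n_clusters, seed=None):
    """Average-linkage clustering after one random shuffle of the row order;
    returns cluster labels in the original cell order."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(distance.shape[0])
    shuffled = distance[np.ix_(perm, perm)]
    tree = linkage(squareform(shuffled, checks=False), method="average")
    labels = np.empty(distance.shape[0], dtype=int)
    labels[perm] = fcluster(tree, t=n_clusters, criterion="maxclust")
    return labels
```

Because β-sim ignores differences in richness, a species-poor cell whose flora is a subset of a richer cell's flora has a distance of zero to it, which is why it captures turnover rather than nestedness.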

Phylogenetic Delimitation

To delimit clusters based upon phylogenetic distances between plant communities (phylogenetic regionalization), we produced a novel community phylogeny for the flora of NE Brazil based on DNA sequences mostly downloaded from GenBank (Benson et al., 2017). We attempted to download sequence data for all 16,351 species recorded in NE Brazil by Flora do Brasil 2020 (2021) for the following regions: matK, atpB, ndhF, rbcL, and trnL. Regions were chosen based upon: (i) wide use across angiosperms; (ii) ease of alignment across angiosperms; (iii) adequate level of sequence variation across orders, families, and genera.

To augment our species sampling in the community phylogeny, we newly generated 445 matK and 444 rbcL sequences for 546 species, from herbarium specimens and from field-collected leaf tissue preserved by silica gel desiccation. DNA extraction, PCR amplification, and robotic sequencing largely followed standard DNA barcoding protocols for community phylogenetics (e.g., Kress et al., 2009). The newly generated DNA sequences are publicly available in GenBank (see Supplementary Table 1 for voucher and accession number details).

The R package “rentrez” (Winters, 2017) was used to query GenBank, specifying a sequence length of 500–5,000 base pairs. For species in NE Brazil for which no sequence data were available in GenBank, we repeated these steps to locate sequences for congeners from outside Brazil, but only for genera with one or two species in NE Brazil, because for such genera any one or two congeners provide the same phylogenetic distances in our analyses as the species present in NE Brazil. Alignments were performed in MAFFT v7.450 for each DNA region with default settings and a maximum of six iterations per region. Data cleaning involved identifying poorly aligned sequences by visual assessment, using VSEARCH v2.14.2 to identify highly variable sequences with <40% sequence similarity, and identifying misplaced species from neighbor-joining trees run for each DNA region with FastTree v2.1.10 (Price et al., 2010). Where alternative sequences were available in GenBank, they were used to replace any species removed during cleaning. The final cleaned community phylogeny contained 10,279 sequences from GenBank and 662 newly generated sequences for a total of 6,296 species (Supplementary Table 2). All DNA regions were combined and analyzed using RAxML-HPC2 on XSEDE via the CIPRES Science Gateway v.3.3 online portal (Miller et al., 2010). The phylogeny was rooted with Nymphaeales. The community phylogeny included 6,296 species from 209 families and 1,775 genera (Supplementary Figure 2), and the relationships among them were consistent with the phylogeny of flowering plants (The Angiosperm Phylogeny Group et al., 2016). A total of 3,457 species from 184 families and 1,325 genera in the phylogeny were also included in the SDMs and were therefore retained for downstream analyses.
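Query construction for the GenBank searches can be illustrated with a small Python sketch. The study used the R package rentrez; here we only build Entrez query strings, using the standard `[Organism]`, `[All Fields]`, and `[SLEN]` (sequence length) search fields, and apply the genus-level fallback described above. Matching the region name in `[All Fields]` is an assumption and the helper names are hypothetical. The strings could be passed to an Entrez client such as Biopython's `Entrez.esearch(db="nucleotide", term=...)`, but the sketch itself does not contact NCBI.

```python
def genbank_term(organism, region, min_len=500, max_len=5000):
    """Entrez nucleotide query for one organism (species or genus) and one DNA
    region, restricted to sequence lengths between min_len and max_len."""
    return (f'"{organism}"[Organism] AND {region}[All Fields] '
            f'AND {min_len}:{max_len}[SLEN]')


def candidate_terms(species, region, ne_genus_counts):
    """Species-level query first; add a genus-level (any congener) fallback only
    for genera with one or two species recorded in NE Brazil.

    ne_genus_counts: hypothetical mapping of genus name -> number of species
    recorded in NE Brazil.
    """
    genus = species.split()[0]
    terms = [genbank_term(species, region)]
    if ne_genus_counts.get(genus, 0) <= 2:
        terms.append(genbank_term(genus, region))
    return terms
```

In practice the genus-level term would be tried only when the species-level search returns no usable sequence.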

Phylogenetic distances between all grid cells in NE Brazil were calculated by estimating the phylogenetic β-diversity among estimated species lists. The phylogenetic Simpson’s index was used following Chave et al. (2007) because it is comparable with β-sim (see above). Phylogenetic regionalization was carried out following the hierarchical clustering method described above under section “Floristic Delimitation.” No rogue grid cells were identified and removed.
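The phylogenetic analogue of β-sim replaces species counts with branch lengths: a is the total length of branches shared by the two communities (taking each species' path to the root), and b and c are the branch lengths unique to each community. A minimal pure-Python sketch follows, assuming the tree is stored as a hypothetical dictionary mapping each node to (parent, branch length), with the root absent as a key. It follows the formulation used by, for example, `phylo.beta.pair` in the R package betapart, and may differ in detail from the implementation of Chave et al. (2007).

```python
def branch_set(species, tree):
    """Branches (identified by their child node) on the paths from each species
    to the root. tree maps node -> (parent, branch_length); the root has no entry."""
    edges = set()
    for sp in species:
        node = sp
        while node in tree:
            edges.add(node)
            node = tree[node][0]
    return edges


def total_length(edges, tree):
    """Summed branch length of a set of branches."""
    return sum(tree[e][1] for e in edges)


def phylo_simpson(comm_i, comm_j, tree):
    """Phylogenetic Simpson dissimilarity, min(b, c) / (a + min(b, c)),
    computed on branch lengths instead of species counts."""
    ei, ej = branch_set(comm_i, tree), branch_set(comm_j, tree)
    a = total_length(ei & ej, tree)
    b = total_length(ei - ej, tree)
    c = total_length(ej - ei, tree)
    m = min(b, c)
    return m / (a + m) if a + m > 0 else 0.0
```

As with β-sim, two cells whose species are different but closely related share most of their branch length, so they are phylogenetically closer than their floristic distance alone would suggest.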

Functional Delimitation

To delimit clusters based upon functional distances between plant communities (functional regionalization), we scored seven independent plant traits for all 3,457 species for which SDMs were generated and which were included in the community phylogeny (100% coverage for all seven traits). These included 931 species scored by Moonlight et al. (2020). Traits were chosen on the basis of: (i) being simple and unambiguous to score from herbarium specimens or the literature; and (ii) having hypothesized links with environmental and functional differences among biomes in NE Brazil. The seven traits chosen were: (i) latex; (ii) corky bark; (iii) spines; (iv) compound leaves; (v) nitrogen nodulation; (vi) Crassulacean acid metabolism (CAM) photosynthesis; and (vii) C4 photosynthesis (Supplementary Table 2). The literature used to score traits is detailed in Appendix S7 of Moonlight et al. (2020).

Functional distance matrices between all grid cells in NE Brazil were created based upon estimated species lists. Functional distance was measured as the Euclidean distance between grid cells in 6-dimensional trait space, where each axis is the proportion of species in a grid cell that possess a given trait. Published functional diversity metrics (e.g., Ricotta et al., 2016) were found to be inappropriate for measuring functional distance between plant communities because they are based on the presence rather than the proportion of a trait. This leads to overestimation of functional similarity at broad spatial scales, where almost all grid cells had at least one predicted species with every trait. Functional regionalization was carried out following the hierarchical clustering method described above under section “Floristic Delimitation.” No rogue grid cells were identified and removed.
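The proportion-based functional distance can be sketched in NumPy. For each cell, the proportion of predicted species carrying each binary trait is computed from a cells × species presence matrix and a species × traits matrix, and cells are compared by Euclidean distance in that space. The function names are hypothetical, and the toy numbers in the note below are illustrative only.

```python
import numpy as np


def trait_proportions(presence, traits):
    """Proportion of predicted species carrying each binary trait, per cell.
    presence: cells x species (0/1); traits: species x traits (0/1)."""
    presence = np.asarray(presence, dtype=float)
    counts = presence @ np.asarray(traits, dtype=float)
    return counts / presence.sum(axis=1, keepdims=True)


def functional_distance(presence, traits):
    """Euclidean distance between cells in trait-proportion space."""
    p = trait_proportions(presence, traits)
    diff = p[:, None, :] - p[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))
```

For example, two cells that each contain at least one species with each of two traits are identical under a presence-based summary, but if their trait proportions are (1.0, 0.5) and (0.5, 1.0) they are separated by a distance of √0.5 ≈ 0.71, which is the distinction the proportion-based measure is meant to preserve.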

Combined Cluster Delimitation

Combined analyses of floristic, phylogenetic, and functional distance matrices were run to delimit “total evidence” clusters based on all three approaches. All pairwise combinations of the three approaches were also run (i.e., floristic + phylogenetic, floristic + functional, and phylogenetic + functional) to assess subsets of the results. Distance matrices were scaled from 0 to 1 to give equal weight to each matrix before being summed, so combined distance values could range from 0 to 2 in analyses with two approaches, or from 0 to 3 in analyses with three approaches. Clusters were estimated, mapped, and named following the hierarchical clustering method described under section “Floristic Delimitation.” The “total evidence” clusters are herein termed biomes because they result from three independent lines of evidence. No rogue grid cells were removed.
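The rescaling and summation can be written in a few lines of NumPy. The text does not state whether matrices were min-max rescaled or divided by their maximum; because distance matrices have a zero diagonal and non-negative entries, the two coincide, and the sketch below uses min-max rescaling. Function names are hypothetical.

```python
import numpy as np


def rescale01(distance):
    """Min-max rescale a distance matrix to the range 0..1."""
    d = np.asarray(distance, dtype=float)
    lo, hi = d.min(), d.max()
    if hi == lo:
        raise ValueError("distance matrix is constant")
    return (d - lo) / (hi - lo)


def combined_distance(*matrices):
    """Sum of rescaled matrices; k input matrices give values between 0 and k."""
    return sum(rescale01(m) for m in matrices)
```

The combined matrix can then be clustered in the same way as a single-source matrix; rescaling first prevents a matrix with a wider numerical range (e.g., Euclidean trait distances) from dominating bounded indices such as β-sim.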

Comparison of Regionalizations: Clusters as Biomes

Regionalizations based on individual approaches (functional, phylogenetic, and floristic) were compared in both geographic and environmental space to highlight “core” biome areas (where all regionalization analyses agreed on biome identity) and transition zones (where regionalizations differed between analyses). “Core” biome areas were visualized by highlighting areas of agreement among the three individual regionalizations (e.g., “core Caatinga”). Transitional biome areas were visualized by highlighting areas where only one or two of the three approaches assigned a grid cell to a given biome (e.g., “transitional Caatinga”). A raster file depicting the core Caatinga and associated transition zones is available as Supplementary Figure 5.
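The core/transitional classification amounts to counting, for each grid cell, how many of the three regionalizations assign it to a given biome. A minimal NumPy sketch, with a hypothetical function name and illustrative labels, is shown below: a count of three gives “core,” one or two gives “transitional,” and zero gives “other.”

```python
import numpy as np


def agreement_class(target, *labelings):
    """Count, per cell, how many labelings assign it to `target`, and classify:
    all labelings -> 'core', some but not all -> 'transitional', none -> 'other'."""
    counts = sum((np.asarray(lab) == target).astype(int) for lab in labelings)
    n = len(labelings)
    classes = np.where(counts == n, "core",
                       np.where(counts >= 1, "transitional", "other"))
    return counts, classes
```

The returned counts also distinguish cells supported by one versus two analyses, as in the yellow and orange classes of Figure 4.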

To investigate whether core biomes overlapped in environmental space, environmental comparison of biome delimitations was achieved by plotting clusters from different analyses in environmental space based on mean annual temperature (bio1) and rainfall data (bio12) extracted for each grid cell. The bio1 and bio12 values for each biome grid cell from each analysis were summed. The mean annual temperature (bio1) from MODIS temperature data (Wan and Dozier, 1996; Wan, 2014) and mean annual rainfall (bio12) from CHIRPS rainfall data (Funk et al., 2014) were used.
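Per-class environmental summaries, such as the rainfall and temperature ranges reported for the “core” Caatinga below, can be computed from per-cell bio1 and bio12 values. The sketch below, with hypothetical function names, returns minimum and maximum values per class and offers a bounding-box overlap test. This is only a crude proxy for overlap in environmental space: two classes can have overlapping ranges without any of their cells sharing a similar climate.

```python
import numpy as np


def env_ranges(classes, bio1, bio12):
    """Per-class (min bio1, max bio1, min bio12, max bio12)."""
    classes, bio1, bio12 = map(np.asarray, (classes, bio1, bio12))
    out = {}
    for cls in np.unique(classes):
        sel = classes == cls
        out[str(cls)] = (bio1[sel].min(), bio1[sel].max(),
                         bio12[sel].min(), bio12[sel].max())
    return out


def boxes_overlap(r1, r2):
    """True if two (tmin, tmax, pmin, pmax) boxes overlap on both axes."""
    return (r1[0] <= r2[1] and r2[0] <= r1[1]
            and r1[2] <= r2[3] and r2[2] <= r1[3])
```

A stricter test would compare the point clouds themselves, for example by checking whether any cell of one class falls within the convex hull of another.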


Major Biomes Across NE Brazil

All three individual regionalizations identified two clusters as Caatinga SDTF, two clusters as Mata Atlantica, and three clusters as Cerrado (Figures 2A–C). The functional analysis additionally identified one more cluster within each of the Mata Atlantica and Cerrado, and an extra cluster that we were unable to assign to any recognized biome (Figure 2A). The combined “total evidence” analysis of all three approaches suggested seven major clusters based on functional, phylogenetic, and floristic data (Figure 2G).

Figure 2. Cluster classifications across northeast Brazil based upon hierarchical clustering analyses of functional, phylogenetic, and floristic data alone (A–C) and in combination (D–G): (A) functional data; (B) phylogenetic data; (C) floristic data; (D) functional and phylogenetic data; (E) phylogenetic and floristic data; (F) functional and floristic data; and (G) functional, phylogenetic, and floristic data. The black line indicates the limits of the Caatinga domain sensu Instituto Brasileiro de Geografia e Estatística (IBGE) (2012).

Differences Between Approaches

Areas where the three regionalizations differ can be considered transitional areas (Figure 3 and Supplementary Figure 4). Most disagreement between the three individual regionalizations is seen in the southern part of the Caatinga biogeographic domain, in the Chapada Diamantina area (Figures 2, 3). Functional and floristic regionalizations identify most of this area as Mata Atlantica, whereas the phylogenetic regionalization identifies most of it as Cerrado (Figures 2, 3).

Figure 3. Areas of agreement and disagreement among biome delimitations across northeast Brazil based upon hierarchical clustering analyses of functional, phylogenetic, and floristic data alone (see Figures 2A–C). Areas of agreement are shown as “core” clusters (A–C) or biomes (D) and areas of disagreement are shown as “transitional zones.” (A) functional and phylogenetic data; (B) phylogenetic and floristic data; (C) functional and floristic data; and (D) functional, phylogenetic, and floristic data. The black line indicates the limits of the Caatinga domain sensu Instituto Brasileiro de Geografia e Estatística (IBGE) (2012). Alternative plots showing the identity of transitional zones are given in Supplementary Figure 4.

Comparison of clusters delimited by single approaches shows that floristic and phylogenetic clusters are highly similar in both spatial extent and number, with both analyses resolving seven major clusters in largely similar areas across NE Brazil (Figures 2, 3). Functional clusters differ most from phylogenetic and floristic clusters and resolve ten major clusters, indicating that functional data offer higher resolution for biome delimitation within NE Brazil and allow vegetation types to be further split into functionally distinct clusters (Figures 2, 3 and Supplementary Figure 3). This is despite our functional classification being based on only seven categorical traits.

The “Core” Caatinga

In terms of the SDTF biome, all three analyses identified two groups as Caatinga SDTF (Figures 2A–C). A large area within NE Brazil is identified here as the “core” Caatinga biome, supported by all three regionalizations (functional, phylogenetic, and floristic) (Figure 4). Areas surrounding the “core” Caatinga that are supported as Caatinga by one or two regionalizations, but not all three, are identified as “transitional Caatinga” (Figure 4). Transitional Caatinga is found along the western and southern borders of the “core” Caatinga, and much less so along its eastern side, at the boundary with the coastal Mata Atlantica domain (Figures 3, 4). Floristic and phylogenetic data support transitional Caatinga in the south and west of the “core” Caatinga, but functional data identify these areas as savanna-like Cerrado (e.g., Chapada do Araripe; Figures 2, 3). As in the individual analyses, all mapped Caatinga groups were largely congruent with the Caatinga domain along its eastern border but differed from it in several places along its western and southern borders, as discussed in detail below.
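The distinction between “core” and “transitional” Caatinga amounts to counting, per grid cell, how many of the three regionalizations assign the cell to the Caatinga. A minimal sketch, assuming per-cell biome labels are already available for each analysis; the function name `caatinga_support` and the example labels are hypothetical:

```python
def caatinga_support(functional, phylogenetic, floristic, biome="Caatinga"):
    """Return, for each grid cell, (number of analyses assigning `biome`, category).

    Categories mirror the Figure 4 legend: 3 -> "core",
    1 or 2 -> "transitional", 0 -> "none".
    """
    if not (len(functional) == len(phylogenetic) == len(floristic)):
        raise ValueError("all regionalizations must cover the same grid cells")
    result = []
    for labels in zip(functional, phylogenetic, floristic):
        n = sum(label == biome for label in labels)  # booleans sum as 0/1
        if n == 3:
            category = "core"
        elif n > 0:
            category = "transitional"
        else:
            category = "none"
        result.append((n, category))
    return result


# Made-up labels for four grid cells
functional = ["Caatinga", "Cerrado", "Cerrado", "Mata Atlantica"]
phylogenetic = ["Caatinga", "Caatinga", "Cerrado", "Mata Atlantica"]
floristic = ["Caatinga", "Caatinga", "Caatinga", "Mata Atlantica"]
print(caatinga_support(functional, phylogenetic, floristic))
# [(3, 'core'), (2, 'transitional'), (1, 'transitional'), (0, 'none')]
```

Keeping the raw count alongside the category preserves the one-versus-two-analysis distinction that Figure 4 shows in yellow and orange.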

Figure 4. Distribution of the Caatinga seasonally dry forest biome within NE Brazil in geographic, environmental (A) and trait (B,C) space. Colors indicate areas delimited as Caatinga in no analyses (gray), one or two analyses (“transitional Caatinga,” yellow and orange), and three analyses (“core” Caatinga, red). The black line in the map indicates the limits of the Caatinga domain sensu Instituto Brasileiro de Geografia e Estatística (IBGE) (2012).

Our integrated biome delimitation using a combination of floristic, functional, and phylogenetic data identifies the “core” Caatinga within the driest and hottest areas of NE Brazil (Figure 4A). These areas are dominated by a floristically, functionally, and phylogenetically distinct flora adapted to dry conditions with seasonal rainfall. The “core” Caatinga occurs in areas with 353–1,271 mm annual rainfall and a mean annual temperature of 19.6–28.0°C, and does not overlap with other core biomes in environmental space (Figure 4A). Its flora is characterized by a high proportion of species that are succulent, nodulating (Figure 4B), and spiny, and that lack corky bark (Figure 4C). The low rainfall in the “core” Caatinga is notable: dry forests in the Americas are found in areas with up to 1,800 mm annual rainfall (DRYFLOR, 2016; Dexter et al., 2018), but none of the “core” Caatinga dry forests approach this threshold, despite the prevalence of areas with higher rainfall in the wider study area of NE Brazil.
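A crude way to probe the claim of non-overlap in climate space is to compare the per-variable ranges (bounding boxes) of grid cells from two core biomes. This is a hypothetical sketch, not the procedure behind Figure 4A: the individual cells are invented, with the example Caatinga cells chosen only so that their ranges match the reported 353–1,271 mm and 19.6–28.0°C. Disjoint boxes guarantee that the two sets of cells do not overlap, whereas overlapping boxes do not by themselves show that they do.

```python
def envelope(cells):
    """Per-variable (min, max) ranges for a list of (rainfall_mm, temperature_c) cells."""
    return [(min(values), max(values)) for values in zip(*cells)]


def envelopes_disjoint(cells_a, cells_b):
    """True if the two bounding boxes fail to overlap on at least one variable."""
    for (lo_a, hi_a), (lo_b, hi_b) in zip(envelope(cells_a), envelope(cells_b)):
        if hi_a < lo_b or hi_b < lo_a:
            return True
    return False


# Invented cells; only the Caatinga ranges match the reported values
core_caatinga = [(353, 28.0), (800, 25.0), (1271, 19.6)]
wetter_biome = [(1500, 24.0), (2200, 25.5)]   # hypothetical wetter core biome
print(envelope(core_caatinga))                          # [(353, 1271), (19.6, 28.0)]
print(envelopes_disjoint(core_caatinga, wetter_biome))  # True
```

A bounding box is the simplest possible envelope; convex hulls or density-based overlap measures would be stricter alternatives.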


Core Biomes and Transition Zones

Biomes have been delimited in various ways, including by the spatial distribution of physiognomic, floristic, or functional discontinuities among plant communities (Mucina, 2019), and even by dynamic global vegetation modeling that combines the physical environment, plant functional types, physiology, and biogeochemical fluxes (Prentice et al., 1992; Kaplan et al., 2003). Physiognomic delimitations have followed plant growth form and canopy structure (Woodward et al., 2004); floristic maps have focused on dominant plant families, genera, and species and their associated ecological correlates (e.g., in Africa, White, 1983; Power et al., 2017; Aleman et al., 2020; in South America, Silva-de-Miranda et al., 2018); and functional systems have relied on the presence of functional groups, such as evergreen trees in the tropical rainforest biome, succulents in the seasonally dry tropical forest or succulent biome, and a continuous grass layer in savannas (Whittaker, 1970; Scholes and Archer, 1997; Schrire et al., 2005). Additionally, strictly environmental delimitations of biomes based on climate, edaphic composition, or degree of fire disturbance have provided important insights into the ecological limits driving the assembly of plant communities (Archibald et al., 2013; Langan et al., 2017). These approaches, based on different types of data, produce different biome maps, but no map produced from a single type of data can satisfy all aspects of the multifaceted definition of a biome as a unit expected to have shaped its biodiversity over evolutionary time; that is, biomes delimited in this way need not be simultaneously floristically, functionally, and ecologically distinct.

Our approach differs in two ways from previous methods depicting the global or regional distribution of biomes, including the Caatinga (e.g., Whittaker, 1970; White, 1983; Schrire et al., 2005; Instituto Brasileiro de Geografia e Estatística [IBGE], 2012; Conradi et al., 2020; Ringelberg et al., 2020): firstly, we identify “core” biome areas supported by three distinct lines of evidence (floristic, functional, and phylogenetic); and secondly, we identify transition zones, supported by some but not all data sources, that point to biologically important areas of transition neglected by most previous biome maps. Our integrated biome analyses highlight “core” areas where all three data sources (floristic, functional, and phylogenetic) agree on the distribution of the same biome (Figure 4). We suggest that such areas of congruence, based on multiple lines of evidence, may be of particular interest to biology and earth system science.

Ultimately, the identification of biologically relevant areas should depend on its intended purpose. If you are interested in the response of the Caatinga to climate change, for example, you should focus on the areas that independent data sources agree are “core” Caatinga (Figure 4), not on areas where data sources disagree (i.e., transition zones), where species composition, ecological function, or phylogenetic diversity may overlap with those of other biomes. Likewise, to measure the functional traits of a species, you would measure an individual at the “core” of the species concept, not a hybrid of dubious identification. Also, because transition zones may be particularly dynamic or vulnerable to climate change (i.e., “zones of tension,” Clements, 1905), if your interest is in conserving the maximum species diversity and ecosystem function of Caatinga dry forests in the face of climate change, it may be best to model the set of species, geographic area, or set of traits that all data agree are Caatinga.

The “Core” Caatinga

The Caatinga and other seasonally dry tropical forests have often been considered equivalent to the trans-continentally distributed “succulent biome,” typically characterized by nutrient-rich substrates with little water-holding capacity and by a highly seasonal, drought-prone climate in which succulent plant lineages (e.g., columnar and arborescent members of Agavaceae, Bromeliaceae, Cactaceae, and Euphorbiaceae) evolved and diversified under a regime of regular and prolonged drought and without the influence of fire and other physical disturbances (Schrire et al., 2005; Pennington et al., 2009; Ringelberg et al., 2020). The climate space of the global succulent biome, as recently modeled from the distribution of stem succulents, involves a mean annual rainfall that closely matches that of South American SDTFs (Ringelberg et al., 2020), which also have a high proportion of stem succulents (DRYFLOR, 2016; Queiroz et al., 2017). Also at intercontinental scales, Segovia et al. (2020) highlight that the evolutionary diversity of trees in the Neotropics is structured along precipitation gradients. Precipitation-related bioclimatic variables were singled out as the most important predictors of the succulent biome in a MaxEnt approach to large-scale distribution modeling of South American SDTFs (Särkinen et al., 2011). Oliveira-Filho et al. (2013a) found these same precipitation variables to be the most important in distinguishing the succulent (including the Caatinga) and savanna (including the Cerrado) biomes. Of all the bioclimatic variables, annual precipitation alone, at a threshold of <1,200 mm, best explained the community phylogenetic distances and biome assignments of 466 floristic sites across eastern Brazil (Oliveira-Filho et al., 2013a). This ecological structure reflects the link between annual precipitation and phylogenetic niche conservatism (Oliveira-Filho et al., 2013a; Segovia et al., 2020). Substrate conditions (with or without water-holding capacity) were shown to be ecologically significant, but they do not appear to mask the more important influence of annual precipitation in explaining community phylogenetic structure (Oliveira-Filho et al., 2013a).

While comparative analyses have employed multiple taxa to interrogate biomes, other approaches seek to identify key taxa or functional groups that can be used to delimit biomes. One such technique was employed by Ringelberg et al. (2020), who used modeled distributions of a single functional group (stem succulents) to map the “succulent biome” (Schrire et al., 2005; Ringelberg et al., 2020). The results do not adequately capture biome complexity in Brazil. For example, Ringelberg et al. (2020) delimit as succulent biome not only the Caatinga but also parts of the Chaco and the campos rupestres of the Chapada Diamantina, which are ecologically, historically, and functionally distinct biomes (Pennington et al., 2000; DRYFLOR, 2016; Silva-de-Miranda et al., 2018; Rapini et al., 2021). Indeed, we demonstrate that not only the “core” Caatinga but also transitional and non-Caatinga areas have high proportions of stem succulents (Figure 4B). Such single-criterion approaches to biome delimitation differ little from an operational view of biomes (Conradi et al., 2020). If biomes are to be defined as evolutionary theaters, the distributions of as broad a range of taxa as possible should be examined in conjunction with their phylogenetic relationships and functional characteristics. Our integrated ecological and evolutionary approach, involving multiple taxa across all growth forms together with measurements of functional and phylogenetic diversity, seems to better describe “core” evolutionary arenas.

Our results show that the “core” Caatinga area is in fact narrower than the widely used definition of the Caatinga by Instituto Brasileiro de Geografia e Estatística [IBGE] (2012): it excludes the wetter end of dry forests and the transitional areas between savanna and dry forests (Figure 4). Moreover, other major biomes exist within the Caatinga region (Queiroz et al., 2017). Disagreement among the three regionalizations in the Chapada Diamantina region, for example, highlights the biological reality of complex variation across environmentally highly heterogeneous areas, and should not be ignored. Thus, areas delimited by an essentially geographic approach, such as that used by Instituto Brasileiro de Geografia e Estatística [IBGE] (2012), should not be termed “biomes.” Such approaches not only oversimplify the interdigitating nature of biomes but also disregard their global dimension, which is fundamental to a more biologically realistic use of biomes in ecology, conservation, and evolutionary biogeography.

Transitional Biome Areas

Perhaps one of the most important messages of our study relates to areas where the three regionalizations disagree in their cluster delimitation (Figure 3). These areas of disagreement between floristic, functional, and phylogenetic data indicate the presence of transition zones between two or more biomes, presumably along environmental gradients. An alternative method of visualizing transition zones is presented in Supplementary Figure 4. Most previous biome delimitation analyses have not included transition zones, treating biomes as categorical variables with no intermediates (e.g., Instituto Brasileiro de Geografia e Estatística [IBGE] (2012); see Figure 1). Here we highlight transition zones as biologically interesting areas in their own right that cover a large percentage of our study area. For example, the area we delimit as “core” Caatinga covers 420k km², whereas the areas we delimit as transition zones cover 480k km² (254k km² delimited as Caatinga by two analyses and 227k km² by one analysis), an area 14% larger than the “core” Caatinga itself.
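The area comparison above can be checked with simple arithmetic. The two reported components sum to 481k km², which the text rounds to 480k km²; the rounded total is about 14% larger than the 420k km² “core” Caatinga, and the unrounded sum about 14.5% larger. The helper name `percent_larger` is illustrative.

```python
core_km2 = 420_000
two_analyses_km2 = 254_000   # delimited as Caatinga by two analyses
one_analysis_km2 = 227_000   # delimited as Caatinga by one analysis

# Sum of the reported components (rounded to 480k km² in the text)
transition_km2 = two_analyses_km2 + one_analysis_km2


def percent_larger(area, reference):
    """How much larger `area` is than `reference`, as a fraction."""
    return area / reference - 1


print(f"transition zones: {transition_km2:,} km²")                             # transition zones: 481,000 km²
print(f"rounded total: {percent_larger(480_000, core_km2):.1%} larger")        # rounded total: 14.3% larger
print(f"component sum: {percent_larger(transition_km2, core_km2):.1%} larger")  # component sum: 14.5% larger
```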

The existence and extent of transition zones raise an important question: what may underlie the differences among the results of our individual regionalizations, and what may these differences tell us about biomes in NE Brazil? Transition zones may reveal areas along environmental gradients between “core” biomes where sets of traits from more than one biome may permit survival, as suggested by the “environmental crossroads hypothesis” (Neves et al., 2020). Additionally, we predict that differences in the adaptations required to persist under different rainfall regimes lead to distinct floristic and functional compositions across biomes. The level of phylogenetic distinctiveness, on the other hand, will depend on the relative ease with which plants have acquired the adaptations required for crossing environmental gradients, i.e., the level of phylogenetic niche and trait conservatism (Donoghue, 2008; Crisp et al., 2009).

Our results show that similar clusters are delimited by phylogenetic and floristic data across NE Brazil, some of which we suggest correspond to biomes recognizable at the global scale, including SDTF, rainforest, and savanna. This suggests that phylogenetic niche conservatism is operating, preventing plant lineages from crossing environmental gradients over evolutionary time (e.g., extreme drought may restrict the dry forests identified here as Caatinga to lineages with the adaptations needed to thrive there). Functional clusters, however, differ clearly from floristic and phylogenetic clusters, both in number and in geographic distribution (Figure 3). The differences between the functional and phylogenetic regionalizations (Figure 3A) indicate that the functional traits included in our analyses are not conserved across the angiosperm phylogeny, or that phylogenetic trait conservatism acts differently on different traits. This may relate to the specific traits used here (e.g., we do not expect spines or corky bark to be conserved), although our analyses do include several well-conserved traits (e.g., nodulation, latex, and C4 photosynthesis). The differences between the floristic and functional regionalizations (Figure 3C) may indicate that certain trait combinations allow limited sets of species to span borders created by ecological and evolutionary processes.

The identity of clusters in the functional analysis that could not be linked to any previously recognized biome is unclear. They may reflect the ability of the functional data to capture the high habitat heterogeneity within biomes across NE Brazil. For example, none of the predominant biomes (savanna, rainforest, and the “core” Caatinga seasonally dry forest) is physiognomically, floristically, or edaphically homogeneous; rather, each also includes highly specialized habitats. Within the “core” Caatinga alone, plant communities seem to have been evolutionarily structured by major soil types such as karst, sand, and crystalline substrates (Santos et al., 2012; Moro et al., 2015; Queiroz et al., 2017); the savanna here encompasses all the campos rupestres vegetation on the mountaintops of the Chapada Diamantina, which mostly involves fire-sensitive plant lineages, including succulents (Rapini et al., 2021); and the Mata Atlantica includes habitats as distinct as the more open coastal restingas (Scarano, 2002; Oliveira et al., 2014; Fernandes and Queiroz, 2015). These habitats may not be distinct enough in their phylogenetic or floristic composition to constitute separate biomes, but they may be distinct in their functional composition.

Agreement in clusters delimited by all three regionalizations in areas such as along the eastern border of the “core” Caatinga suggests ecological filtering is acting in conjunction with phylogenetic niche conservatism along the same environmental gradient (Caatinga-Mata Atlantica border; Figure 4). Transitional Caatinga areas in the southern parts of the Caatinga biogeographic domain indicate that ecological filtering is acting along different environmental gradients to phylogenetic niche conservatism, at least for some of the traits included in our analyses.

Biologically Meaningful Biomes in Evolutionary Biogeography

Comparative phylogenetic analyses of multiple taxa in species-rich yet geologically and climatically complex regions such as the Neotropics have revealed important insights into how lineages and species interact with ecology and geography over evolutionary time. Such analyses often reconstruct ancestral areas across phylogenies to estimate rates of biotic interchange between biogeographic regions. For example, Antonelli et al. (2018) inferred a strikingly large number of dispersal events out of Amazonia into other major Neotropical regions, with at least 85 species of angiosperms, birds, ferns, frogs, mammals, and squamates inhabiting the Caatinga region inferred to have derived from an Amazonian ancestor. In such estimates of ancestral distributions, biogeographic regions are often treated as homogeneous units (i.e., as synonymous with biomes). This approach cannot take into account the complexity of the climatic, evolutionary, and functional spaces that confine species ecologically, as we demonstrate here for NE Brazil. As such, it risks conflating the roles of geography and ecology, such that biome switching is likely to be overestimated in ecologically confined groups with broad geographic distributions but underestimated in biogeographically confined groups with broad ecological distributions. Thus, such comparative approaches do not allow us to assess whether the “core” Caatinga or the other putatively distinct biomes of NE Brazil (Figure 3) are more evolutionarily accessible to lineages from Amazonian tropical rainforests. In other words, some Amazon-to-Caatinga “switches” may be to rainforests within the Caatinga biogeographic region and thus represent geographic movements, not evolutionary switches. Likewise, most studies neglect that the grass-rich, fire-prone savanna biome that predominates in the Cerrado domain is crisscrossed by a network of gallery forests (i.e., rainforests; Silva-de-Miranda et al., 2018), as well as by fire-sensitive seasonally dry, evergreen, or semideciduous forests on patches of high-fertility soils, depending on water availability (Oliveira-Filho et al., 2013b; Bueno et al., 2018).
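The risk of conflating geographic movement with biome switching can be illustrated with a toy example. The sketch below assumes a hypothetical three-tip phylogeny whose ancestral state has already been reconstructed (in practice this requires a model of area or biome evolution); species names and states are invented. Counting state changes along branches under region coding versus biome coding shows how a move into a rainforest enclave within the Caatinga region is scored as a “switch” only when regions are treated as biomes.

```python
# Hypothetical tree: each (parent, child) pair is one branch
edges = [("anc", "sp1"), ("anc", "sp2"), ("anc", "sp3")]

# Invented states; ancestral states are assumed to be already reconstructed
region = {"anc": "Amazonia", "sp1": "Caatinga region",
          "sp2": "Caatinga region", "sp3": "Amazonia"}
biome = {"anc": "rainforest", "sp1": "rainforest",   # sp1: rainforest enclave
         "sp2": "SDTF", "sp3": "rainforest"}


def count_changes(edges, states):
    """Count branches along which the state differs between parent and child."""
    return sum(1 for parent, child in edges if states[parent] != states[child])


print(count_changes(edges, region))  # 2 "switches" when regions stand in for biomes
print(count_changes(edges, biome))   # 1 actual biome switch (sp2 into SDTF)
```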

That lineages can transcend geographic barriers is also a product of the evolutionary accessibility of distinct biomes that create geographical opportunity (Edwards and Donoghue, 2013). So, by inferring biotic interchange across biogeographic regions (e.g., Antonelli et al., 2018), without considering the biotic complexity within them, it is impossible to uncover the true balance of phylogenetic biome conservatism versus evolutionary biome shifts. Fortunately, recent progress has been made toward developing more biogeographically realistic approaches that model how lineages shift between biomes depending on the temporal availability and geographical connectivity of biomes (Landis et al., 2021a,b). A definition that captures the most biologically meaningful nature of a biome in space and time is clearly critical for comparative analyses involving Landis et al.’s (2021a) phylogenetic biome shift model. Thus, we believe that our approach of combining the multiple dimensions describing the spatial distribution of biomes will help to more reliably map and understand the evolution and functioning of biodiversity.

Data Availability Statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

Author Contributions

DC, PWM, and TS designed the study and led the writing with significant contributions from all co-authors. DC, GR, and PWM entered functional trait data. GO generated sequence data for community phylogeny. CD designed the pipeline for producing community phylogenies. PWM led all analyses. All authors commented and agreed on the last version of the manuscript.


Funding

This research was funded by the Natural Environment Research Council-Newton grant NE/N012526/1 and Fundação de Amparo à Pesquisa do Estado de São Paulo grant 2015/50488-5 “Nordeste: New Science for a Neglected Biome,” and Royal Society Advanced Fellowship grant NAF/R1/180331. DC’s research in plant biodiversity is also funded by Fundação de Amparo à Pesquisa do Estado da Bahia (Universal no. APP0037/2016) and Conselho Nacional de Desenvolvimento Científico e Tecnológico Research Productivity PQ-2 grant 308244/2018-4.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.


Acknowledgments

We thank Centro de Referência em Informação Ambiental (CRIA) and Reflora for sharing their complete distribution datasets; Alexandre Brunello, Luiza Cosme, Moabe Fernandes, Raquel Miatto, Marcelo Mizushima, Tony Oliveira, Oliver Phillips, Desirée Ramos, Raphael Rocha, Rubens Santos, Valdemir Silva, and Elmar Veenendaal for helping to collect silica-gel-dried leaf samples during fieldwork; Glaucia Drummond (Fundação Biodiversitas, Canudos) and Eudes Velozo (Fazenda Esperança, Boa Vista do Tupim) for their support during fieldwork; Matt Lavin and two reviewers for their valuable suggestions to improve the manuscript; and Associate Editor Danilo Neves for inviting us to contribute to the special issue “Temporal and large-scale spatial patterns of plant diversity and diversification.” DC acknowledges Jon Lloyd for inviting him to be a speaker at the 44th New Phytologist Symposium “Determinants of tropical vegetation structure and function” in Accra, Ghana, where most ideas of this article were originally presented.

Supplementary Material

The Supplementary Material for this article can be found online at:

Supplementary Figure 1 | Species richness across northeast Brazil showing (A) “raw” species richness from cleaned occurrence data for 9,134 species with ≥5 records; (B) “raw” species richness from cleaned occurrence data for 3,547 species included in all analyses; (C) modeled species richness for 3,547 species included in all analyses.

Supplementary Figure 2 | Community phylogeny for northeast Brazil.

Supplementary Figure 3 | Dendrograms showing the major clusters from the cluster analyses based upon: (A) phylogenetic data; (B) floristic data; (C) functional data; (D) taxonomic and functional data combined; (E) taxonomic and phylogenetic data combined; (F) functional and phylogenetic data combined; (G) taxonomic, functional, and phylogenetic data combined. Clusters are colored by their biome classification and are mapped in geographic space in Figures 2A–G.

Supplementary Figure 4 | Areas of agreement and disagreement among biome classifications across northeast Brazil based upon hierarchical clustering analyses of functional, phylogenetic, and floristic data alone (see Figures 2A–C). Areas of agreement (“core” biomes) are highlighted as per the legend and areas of disagreement (“transitional zones”) are shown with intermediate colors between core biomes. (A) functional and phylogenetic data; (B) phylogenetic and floristic data; (C) functional and floristic data; and (D) functional, phylogenetic, and floristic data. The black line indicates the limits of the Caatinga domain sensu Instituto Brasileiro de Geografia e Estatística [IBGE] (2012).

Supplementary Figure 5 | Raster file used to produce Figure 4A that highlights the “core” Caatinga and associated transition zones.

Supplementary Table 1 | Voucher details for DNA sequences newly generated in this study.

Supplementary Table 2 | Species included in the analyses, including taxonomic family, functional traits, and model performance statistics.

Supplementary Table 3 | The relationship between Continuous Boyce Index (CBI) and the number of independent records by species.



