EvoLaps problem solving process
An EvoLaps analysis unfolds into three steps: data importation, clustering and edition (Fig. 2):
The first step (Fig. 2a) corresponds to the importation of data from a third-party software that generates a consensus of a phylogeographic analysis. Input data is submitted on the fly, it must contain a rooted tree (NEXUS format) with samples and ancestral (consensus) latitude/longitude coordinates for the tips (samples) and the internal nodes of the tree (ancestral species), respectively. This file may be generated using TreeAnnotator from the BEAST software package , following a Bayesian phylogenetic analysis. On the server side (Fig. 2b), the input file is parsed to extract and save the newick string and the latitude/longitude coordinates of each node under a XML format. Then, back to the browser side, the tree is displayed and the geographic map is updated to display the locations of each tip of the tree;
Clustering is the second step (Fig. 2c) of the analysis. It consists in gathering the latitude/longitude coordinates of every individual location into geographical areas. Several methods are available to define clusters of locations (see below). Clusters are color-encoded using a linear color scale (mono or polychromatic) or a 2D matrix color scale which is an efficient way to set progressive color changes between close clusters. The cluster/color list is then sent to the server (Fig. 2d). The tree is color-encoded and is read from its root to its tips to compute a transition diagram;
The third step of the analysis, edition, corresponds to the visualization of the phylogeographic reconstruction (Fig. 2e). The transition diagram is read from its root to its tips: each transition is projected on the map as a path between two clusters. The result is a phylogeographic scenario anchored to the clusters. This scenario is displayed step by step, manually (backward/forward buttons) or automatically (animation with adjustable speed). The phylogeographic pattern can be edited (size and curvature of paths), highlighted (visualization of transition suites) and restricted, thanks to dynamic time slices superimposed on the tree.
The analysis can be iterated from the clustering step. A session starts with a small number of clusters, and the clustering can then be refined on the fly, with subdivisions of one or several of the previous clusters into smaller ones up to a satisfying output.
EvoLaps clustering mode
EvoLaps proposes two modes of clustering of coordinates of samples and/or ancestral lineages (Fig. 3).
Clustering mode 1 The first mode of clustering does not consider ancestral locations in the definition of the clusters. The user defines clusters of sample locations on the geographic map at a given spatial scale, with the help of methods and selection tools, such as K-means clustering and/or manual lasso selections on the map and/or clade selections from the tree. The K-means algorithm  requires (a users’s setting) an initial K number of seeds (cluster centroids) randomly generated within the boundaries of sample locations. Then, iteratively until no change, sample locations are assigned to their closest centroid based on the Euclidean distance and centroids are updated. Clusters are displayed on the map as a list of smoothed polygons containing one or more sample locations (Fig. 3a).
Clustering mode 2 The second clustering mode uses a dynamic grid of latitude/longitude bounds to partition the space at a given density and scale, and each bound can be dragged and dropped to produce a more relevant space division. The grid of latitude/longitude bounds subdivides the whole space into regions. If a region contains one or more ancestral and/or sample locations, it contains a cluster. Ancestral locations are thus taken into account in clustering mode 2. Clusters are then displayed on the map as bounding boxes of their locations (Fig. 3b). This clustering mode may also be used with a K-means clustering. In this case, clusters are identified considering sampled locations only, then minimum and maximum of latitude and longitude coordinates of each cluster are used to position bounds of meridian/parallel.
Phylogenetic tree color-encoding
The phylogenetic tree is color-encoded when a list of clusters and their associated colors is established. Each ancestral location in the tree is associated to one cluster with its specific color, which is used to color the internal node. If clustering mode 1 is used, ancestral locations are associated to the closest cluster (the shortest distance from the ancestral location to the center of clusters). If clustering mode 2 is chosen, ancestral locations are naturally linked to one of the clusters defined by the latitude/longitude grid.
The transition diagram
A transition is defined as an inferred migration across a pair of geographical clusters between subsequent nodes of the tree in a top-down reading, from the root to the tips in a recursive process (Fig. 4a). A default diagram starts with a node corresponding to the ancestral root state i. The state i is the associated cluster knowing the ancestral latitude/longitude linked to the tree node. As in the tree color encoding process, if the first clustering mode was chosen at the clustering step, each ancestral lineage is associated to the cluster that minimizes the distance between the coordinates of the lineage of interest and the center of the cluster examined. Otherwise, if the second clustering mode was chosen at the clustering step, the cluster of the ancestral lineage is directly determined by its coordinates through the parallel/meridian grid. A node is inserted in the transition diagram when a cluster transition i—> j is observed until the tips are reached (Fig. 4b). A compressed version of the diagram is available by collapsing identical transitions having the same ancestor in the default version (Fig. 4c). The transition diagram is represented as a multi-furcating tree-like representation, summarizing the series of transition that took place during the course of evolution. It gives a synthetic view of a phylogeographic pattern without the geographical constraints (Fig. 4d). Several graphical layouts are available (tidy tree, force-directed graphs, etc.). Nodes sizes can be equal, or proportional to the Sz criterion, which is the count of descendants being in the same geographic cluster along the evolutionary path from a root node linked to a transition, to its tips. In case of a compressed version of the transition diagram, Sz values are added for the nodes sharing a geographic cluster at the same generation. For more details related to the transition diagram and the Sz criterion, we refer the reader to [10,11,12]. The transition diagram is then read by generation step from its root to its tips to produce paths between clusters on the geographic map (Fig. 4e).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.