To the best of our knowledge, this is the first study to provide a whole scan for signatures of selection in the MM genome. Our findings shed light on the possible candidate genes/gene groups involved in the regions undergoing selection in this breed. The results and discussion found here can be useful for the comprehension of signatures of selection in other equine breeds.
Overall, quadrupeds use many footfall patterns during locomotion. The gaits are generally considered to be discrete patterns of footfalls and are divided into symmetrical and asymmetrical . The allele A of the DMRT3 gene is only related to the picada gait in the MM, with two genotypes AA and CA, while the genotype CC is related to batida gait ; however, some studies have shown gait ability (the lateral gait pattern) is under the influence of a set of genes [5, 29]. Other discoveries suggested that alleles related to the type of gait were differently fixed within each gait type . In Icelandic horses, no SNP demonstrated genome-wide significance for DMRT3, implying that the ability to pace goes beyond the presence of a single genetic variant . Considering these facts, there is still a lack of information regarding the genetic architecture behind gait.
Selection in the MM breed is based exclusively on competitions where gaited performance records are evaluated relative to that of competitors, often being an empirical selection. Thus, we presumed that time under strong artificial selection is necessary to identify a possible gait type segregation to well-defined lineages. In this regard, it is essential to understand which genes in the MM population are most relevant to accomplish such goals. According to Arnason et al. , the thoroughbred carried out a long history of artificial selection for galloping speed while being ridden by a jockey, and it might be the same for MM. A well-defined breeding scheme can shift the allele frequencies of the desirable phenotype, and well-defined lineages could be achieved by selection. We next focused on exploring the signatures of positive recent selection found in the MM population and understanding the genes and pathways associated with these regions. As no significant region was associated with the DMRT3 gene in our analyses, we included the DMRT3 gene in a network analysis to get insights on the interactions between the observed candidate genes and DMRT3. The identification of genomic regions modified by positive selection has provided discoveries of adaptive directions in different species. Nowadays, the search for signatures of selection is one of the branches of the theoretical and applied evolutionary studies . This study covered three distinct methods to scan for signatures of selection, which diverge according to their concepts and methodology. This implies that each method captures different patterns of genetic variation in different time scales. Due to the density of the genotyping panel used and the complexity of the biological information, these methods still present pitfalls and cannot exploit the genetic variation present within the entire genome. To overcome this limitation, it is essential that results from multiple methods, i. e., in our case, Tajima’s D, iHS and ROH, are used in a complementary way .
Common significant regions were found between iHS and ROH. Eight common candidate genes (C9orf24, CNTFR, DNAI1, ENHO, DCTN3, FAM219A, RPP25L, and SIGMAR1) were located on ECA23, and one common gene (RASGRP1) was located on ECA1. It is interesting that the genes mentioned above on ECA23 are located ~ 28 Mb away from DMRT3. The existence of LD in this region is possible biologically, and determining the exact gene(s) under selection can be challenging. Thus, we performed a network analysis including the DMRT3 gene, and only one occurrence of low co-expression was found between DCTN3 and DMRT3. Therefore, we excluded the possibility of any significant relationship between the eight candidate genes with DMRT3. However, the limitations of using non-model species may have interfered in our presumptions. Besides that, according to Ma et al.  and Ablondi et al. , during evolution, a series of unknown demographic events further increased the difficulty in detecting modified genomic regions due to different selective pressures. The use of next-generation sequencing (NGS) technologies can be promising for elucidating the relationships between loci in ECA23 because sequencing offers a higher genome coverage and more precision on the position of causative mutations and selection signatures . Complementing the conclusions regarding common candidate regions observed among the used approaches, only a few common candidate genes were found for iHS and ROH. Each method presents specific features implying that non-overlapping regions of signatures of selection between different methods should be treated as complementary information to better understand the different patterns of variation in the MM genome.
The TD results suggest that the MM population is under strong balancing selection; however, many hitchhiking effects were highlighted in the statistics based on the extended haplotype homozygosity and footprints on homozygous regions. The pronounced balancing status in the studied population supported by the TD results was an interesting consequence, possibly explained by the nonexistence of any breeding program in the breed during the past years.
In a previous study developed by our group with the same database, we investigated the runs of homozygosity and populational inbreeding (Froh) in the MM breed . A compressive discussion on the length of ROH segments is given in this previous study, which brings important information to understand the breed age and genetic variability. The MM breed is a relatively old breed compared to most modern breeds having originated in the mid-eighteenth century. The class with the highest percentage of ROH was 0–2 Mbps, with 92.78% of the observations, suggesting the majority of ROH segments was classified as short segments .
In the present study, we found only one long ROH segment (> 16 Mbps) located on ECA7 (Fig. 4). The number of short segments was more abundant, possibly due to events of recombination that occurred in the past and caused its reduction [19, 36], or due to the limitation of using genotyping data, thereby overestimating the number of short ROH . Again, sequencing data can add value to ROH studies as it covers more genetic variability . However, one of the strengths of ROH analysis is that long homozygous segments can be reliably identified, even from relatively modest marker densities .
Evidence for this long ROH segment on ECA7 has already been described in the literature . We cannot fail to consider that equestrian sports prioritize high performance, therefore, conditioning to a positive selection based on directional selection [34, 39]. Another view suggested that strong bottlenecks occurred in this region during the breed formation. Ablondi et al.  found similar results for ECA7 in Swedish Warmblood horses and Exmoor ponies. Thus, we speculate that this candidate region of signature of selection in ECA7 is possibly a consequence of a previous bottleneck and not recent positive selection because of the similarity in the results found in distinct breeds. In this sense, our findings corroborate the argument reported in Ablondi et al.  for an intense bottleneck, but pointing to a common moment in the evolutionary process for some breeds.
Four genes (TRIR, TNPO2, WDR83, and WDR83OS) were highlighted within this longest shared homozygosity segment located on ECA7. These genes were identified under biological functions for localization (GO:0051179) and metabolic processes (GO:0008152). It has been shown that the TRIR gene has a significant role in cellular functions . Other genes, TNPO2 and WDR83, were related to tumor development. One region on ECA1 encompassed the gene RASGRP1, which was found in common between ROH and iHS and played a key role in the development of T and B cells . Studies have associated RASGRP with disease phenotypes in bovine animals [42, 43] and dogs .
In general, the genetic signals for the three statistics were most enriched in ontologies corresponding to “biological regulation,” “metabolic process,” and “cellular process.” In the Panther results for iHS and ROH candidate genes, the ontology “localization” was also very representative. Some highlighted candidate genes were associated with gait and locomotor system aspects, with eight of them regulating anterior/posterior pattern specification (Fig. 5).
The HOX genes encode homeodomain transcription factors in developing many embryonic structures in vertebrates and invertebrates . According to Pineault & Wellik , as development progresses, tight spatial and temporal control of gene expression and cellular behavior sculpts the developing embryo, adding specific morphological and functional characteristics that determine the adult animal’s lifestyle and functionality.
The GLI3 gene was identified under the same HOX gene group to regulate anterior/posterior pattern specification. Exploring this information, we found that GLI3 is a transcriptional activator and a repressor of the sonic hedgehog pathway and plays a vital role in limb development. GLI3 has been described in the literature as an embryonic patterning of human limbs and other structures . The relationship between the HOX genes and limb musculoskeletal development has been well described in the literature. Pineault & Wellik  suggested that the integration of the musculoskeletal system is regulated in part by HOX function in the stromal connective tissue and plays critical roles in skeletal patterning throughout the axial and appendicular skeleton. Evidence to support these genes as possibly regulating limb formation and other processes associated with the locomotor system was reported by Grilz-Seger et al. , who found several GO terms shared by more than one breed when studying a set of European and Near Eastern horse breeds; high significance levels were reached for the GO terms “anterior/posterior pattern specification” (GO:0009952), “embryonic skeletal system morphogenesis” (GO:0048704), and “sequence-specific DNA binding” (GO:0043565), mainly based upon the HOXB-cluster in the breeds Gidran, Lipizzan, Posavina, and Noriker.
Other significant signals in the present study were found for the CCL19 and MAP3K6 genes enriched for the activation of JUN kinase (JNK) activity. Exercise stimulates c-Jun NH2 Kinase Activity and c-Jun transcriptional activity in human skeletal muscle, showing that the JNK pathway may serve as a link between contractile activity and transcriptional responses in skeletal muscle . Exercise causes selective changes over gene expression, leading to differentiation in skeletal muscle structure and function, which provides strong evidence that this regulation may be associated with gait type segregation in the skeletal muscle on limbs. The effect of activity during exercise in c-jun mRNA expression is via the phosphorylation of two serine residues through the JNKs in the c-Jun transactivation domain, leading to increased transcriptional activity .
It is well known in the modern horse that athletic performance has been the target of selection in recent years for many breeds. Increasingly, a perfect horse is being idealized in countless sporting modalities. Indeed, candidate genes were highlighted under important biological functions related to exercise physiology, energy mechanisms, catabolic processes, morphogenesis (bone, skeletal system, and cartilage), and fertility. However, these genes/functions do not act alone in the MM performance. As observed in the network analysis, gene functions are dependent, with the major part of them being regulated in sets.
The interpretation of the network analysis is that most candidate genes, either core genes or peripheral genes, are interconnected. Any peripheral gene is likely to affect the regulation or function of a hub gene. An explanation for the high interconnection in networks is that networks have structures consisting of distinct modules of connected nodes and frequent long-range connections. Under these conditions, any two nodes in the graph are usually connected by just a few steps .
Overall, the application of classical and recent techniques in genomics has successfully permitted the identification of several putative selection signatures in the MM population. Based on our discussions, gait may have a polygenic basis and is influenced by many genetic components. Further exploration would be strengthened by searching for signatures of selection by comparing the MM to a non-gaited breed. This method could then be compared to the regions found within the breed and would clarify whether these signatures are unique to the breed (or the gait) rather than being general signatures of selection in horses or if they could potentially detect new genetic bases of gait in the MM. Among the biological processes, genes of biological interest such as the HOX gene family were enriched in the ontology corresponding to “anterior/posterior pattern specification.” Biological processes related to limb morphogenesis, the skeletal system, proximal/distal pattern formation, JUN kinase activity (CCL19 and MAP3K6), and muscle stretch response (MAPK14), among others, were reported. Finally, identifying genes and pathways that drive phenotypes is still a challenge; here, we pinpoint some important genes and gene pathways involved in complex selective processes that could be useful in other studies and for the genetic improvement of this breed.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.