The number of reads obtained per strain ranged from 619,896 for CICESE-188 to 6,256,738 for CAIM 1400. Average read lengths ranged from 71 bp for CICESE-170 to 145 bp for CAIM 1400. Coverage depth differed between strains, from a low of 14.2× for CICESE-188 to a high of 155.9× for CAIM 1400. Assemblies ranged from 143 contigs (N50 = 194,688 bp and L50 = 8) for CICESE-170 to 1590 contigs (N50 = 5908 bp and L50 = 244) for CICESE-188 (Table 2). Among the Mexican genomes, CAIM 1400 had the highest number of detected genes (4901), including 283 new genes and 280 new gene families; CICESE-170 had the lowest (4643), including 87 new genes and 69 new gene families (Table 2).
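For readers unfamiliar with the assembly statistics quoted above, the following is a minimal sketch of how N50 and L50 are conventionally derived from a list of contig lengths. The function name and the toy contig lengths are illustrative assumptions and are not taken from the study's data or pipeline.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in bp.

    N50 is the length of the contig at which the running total of
    lengths, taken from longest to shortest, first reaches half of the
    assembly size. L50 is the number of contigs needed to get there.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_assembly = sum(lengths) / 2
    running_total = 0
    for contig_count, length in enumerate(lengths, start=1):
        running_total += length
        if running_total >= half_assembly:
            return length, contig_count
    raise ValueError("contig_lengths must not be empty")


# Hypothetical toy assembly (300 bp in total), not real data.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(toy_contigs))  # -> (70, 2)
```

Under this definition, a fragmented assembly such as that of CICESE-188 has a small N50 and a large L50, whereas a more contiguous assembly such as that of CICESE-170 shows the opposite pattern.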
The pan- and core-genomes, together with the new genes and new gene families for each strain, are shown in Fig. 1. The strains Peru-466, isolated in 1996, CAIM 1400, isolated in 2004, and CICESE-188, isolated in 2009, presented the most new genes and new gene families relative to the reference genome of the strain RIMD 2210633, which was isolated in 1996, without a clear pattern associated with the year of isolation. The pan-genome increased from 4654 genes in the RIMD 2210633 strain to 5825 genes at CICESE-273, an environmental strain isolated in 2012; in contrast, the core-genome decreased from 4654 genes in RIMD 2210633 to 4013 in CICESE-188, which was isolated in 2009.
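The opposite trends of the pan- and core-genome follow from how the two quantities accumulate as genomes are added: the pan-genome is a running union of gene families and the core-genome a running intersection. The sketch below illustrates this under simplifying assumptions. Each genome is reduced to a set of gene-family identifiers, and the order of addition is supplied by the caller. The function name and the toy data are hypothetical, and this is not the procedure of the tool used in the study.

```python
def accumulate_pan_core(genomes):
    """Track pan- and core-genome sizes as genomes are added in order.

    `genomes` is a list of (name, gene_families) pairs, where
    gene_families is an iterable of gene-family identifiers.
    Returns one (name, pan_size, core_size, new_families) tuple per genome.
    """
    pan = set()
    core = None
    curve = []
    for name, families in genomes:
        families = set(families)
        new_families = families - pan          # families not seen so far
        pan |= families                        # union can only grow
        core = families if core is None else core & families  # intersection can only shrink
        curve.append((name, len(pan), len(core), len(new_families)))
    return curve


# Hypothetical toy input; the letters stand in for gene families.
toy_genomes = [
    ("reference", {"a", "b", "c", "d"}),
    ("strain_1", {"a", "b", "c", "e"}),
    ("strain_2", {"a", "b", "e", "f"}),
]
for row in accumulate_pan_core(toy_genomes):
    print(row)
```

For the first genome added, the pan- and core-genome sizes coincide, which matches the shared starting value of 4654 reported for RIMD 2210633.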
The alignment of chromosomes I and II for the O3:K6 strains isolated in American countries, relative to the reference strain RIMD 2210633, which was isolated in Japan, is presented in Fig. 2. For both chromosomes, the figure indicates the most common mobile genetic elements associated with the pandemic strains, which were identified as the 7 VPaIs, the phage f237, the secretion system (TSS), and the type I pilus, among others. The positions of each element corresponded with those in the genome of the reference strain RIMD 2210633, as described in Hurley et al., Boyd et al., and Chen et al.
The RAST web server analysis detected genes commonly associated with pathogenic strains, in categories such as resistance to antibiotics and toxic compounds, phages, prophages, iron acquisition systems, stress responses, toxins, the regulation of virulence genes, secretion systems, flagellar motility, capsular and extracellular polysaccharides, siderophores, colonization, and biofilm formation. Based on the RAST function-based comparison tool, most of the differences relative to the reference genome RIMD 2210633 were identified in genes associated with categories other than pathogenicity.
Mobile genetic elements associated with O3:K6 pandemic strains were detected in both chromosomes of the studied O3:K6 strains (Fig. 2). Five of the VPaIs (VPaI-1 to -5) were detected in chromosome I, whereas the other 2 (VPaI-6 and -7) were detected in chromosome II. The results indicated that the VPaIs detected in the Mexican strains contained most of the genes that have been described for the reference genome RIMD 2210633, with high similarity (>96.4%). The few variations that were identified were associated with non-coding bases.
The VPaIs 1, 2, 3, 4, and 6 were present in each genome of the Mexican strains and contained the same number of genes previously reported for RIMD 2210633. VPaI-5 was not detected in CICESE-188; however, this strain presented most of the mobile genetic elements that are characteristic of pandemic strains, including transposase, hypothetical protein, lipopolysaccharides (LPS), phage f237, 2 secretion systems (T3SS-1 and -2), an osmotolerance gene cluster, integron class I, type I pilus, a multidrug efflux gene cluster, ferric uptake, gametolysin, biofilm, degradative genes, and the tdh gene. In VPaI-7, most of the Mexican genomes differed from the reference genome in the genes VPA1312, VPA1313, VPA1314, VPA1316, VPA1318, and VPA1357. In addition, the f237-like phage was not detected in most of the Mexican strains; a gap registered in chromosome II, associated with the f237-like phage (6 o’clock position), may be due to a close association with phage f237, which was registered in chromosome I.
Based on the BLAST matrix comparison, the homology between and within predicted amino acids is presented in Fig. 3. The analyzed genomes from strains isolated in various American countries registered a maximum of 4901 proteins within 4696 families, for CAIM 1400, whereas the reference strain RIMD 2210633 contained a total of 4832 proteins within 4654 families. The similarities in shared proteins between the American strains and RIMD 2210633 ranged from 79.6% (CICESE-188) to 88.8% (ATC210). The American strains presented higher percentages of similarity with each other, with CICESE-170 sharing 97.6% (4411) of proteins with CDC_K5058 and ATC210 sharing 97.8% (4393) of proteins with CDC_K5058. CICESE-188, isolated in 2009 in northeastern Mexico, presented the lowest percentage of shared proteins with all studied genomes. The homology among predicted amino acids ranged from 2.8 to 3.5%. Although no clear pattern was associated with isolation year, high percentages of shared proteins were registered between the Peru-466 strain, isolated in 1996, and the strains CICESE-170, isolated in Mexico in 1998, and ATC210, isolated in Chile in 1998; these were the first American countries in which the pandemic clone was isolated.
The pan-genome results presented in Fig. 4 indicated high similarity among the American O3:K6 strains and the reference strain RIMD 2210633. Two clusters could be distinguished in the dendrogram of this figure. The first grouped the strain isolated in the USA in 2007 (CDC_K5058) with those isolated in Mexico in 1998 (CICESE-170), 1999 (CICESE-186), 2000 (CICESE-187), 2009 (CICESE-188), and 2012 (CICESE-273). The strain CICESE-188, which was isolated in a northeastern state of Mexico, showed the lowest degree of similarity with the other strains in this group and also displayed the lowest percentage of shared proteins. The second cluster was formed by the strains isolated in Peru in 1996 (Peru-466), in Chile in 1998 (ATC210), and in Mexico in 2004 (CAIM 1400), together with the reference strain RIMD 2210633, isolated in Japan in 1996.