The deep sea (generally below the 200 m limit of the euphotic zone) accounts for the majority of the world’s ocean (>95%; Costello et al., 2010; Wedding et al., 2013). This vast environment hosts a wealth of hydrocarbon and mineral resources and provides a series of ecosystem services associated with its functioning (e.g., nutrient regeneration and global biogeochemical cycles). It also represents a vast repository of complex organic molecules and unexplored biodiversity (Pikitch et al., 2014; Thurber et al., 2014; Kroodsma et al., 2018).

The biodiversity of deep-sea ecosystems is increasingly threatened by anthropogenic impacts resulting from pollutants and other activities such as the extraction of geochemical resources and minerals (Levin and Le Bris, 2015). Marine biodiversity conservation is in growing conflict with resource exploitation, especially when it comes to key deep-sea habitats such as abyssal plains (associated with manganese nodule mining), hydrothermal vent fields (associated with sulfide deposits) or submarine canyons (associated with oil and gas drilling) (Danovaro et al., 2017a). In the context of increasing climatic and human disturbances, deep-sea ecosystems and the biodiversity found in areas beyond national jurisdiction are prime conservation targets [as identified by the United Nations Convention on the Law of the Sea (UNCLOS) and the International Union for Conservation of Nature (IUCN)], where the preservation of marine ecosystem functions should be balanced with sustainable use of resources (Danovaro et al., 2008, 2020; McIntyre, 2010; Morato et al., 2010; Pusceddu et al., 2014; Ramírez et al., 2017).

Scientific research must thoroughly investigate all life components of these ecosystems prior to the onset of mass industrial activities, among which impending deep-sea mining raises particular concerns (e.g., Koschinsky et al., 2018; Washburn et al., 2019). This demand has resulted in the development of management guidelines for sustainable use of the sea, as reflected in the Aichi Target 11 (Convention on Biological Diversity, CBD) and by the Sustainable Development Goal 14 “Life below water” and the post-2020 Zero draft CBD proposal (UN 2030 Agenda for Sustainable Development; UNEP, 2020; UNESCO, 2020). A more comprehensive and multidimensional understanding of marine biodiversity in all its facets, including how it is shaped by the environment, human impacts, and climate, represents a critical knowledge framework that is needed to inform resource management operators (Howell et al., 2020). To gain this comprehensive knowledge, deep-sea research is merging the information on the number of species (or taxonomic units) with data on their ecological relationships and information on their spatiotemporal distribution (Berry et al., 2019; Costa et al., 2020).

Need for Filling Knowledge Gaps of the Deep-Sea

Biodiversity knowledge relies on access to adequate taxonomic information with emphasis on in situ sample collection, observation, and monitoring strategies (Glover et al., 2018). Nonetheless, the deep sea is still virtually unknown to science as <0.0001% of its surface area has been explored in detail (Ramirez-Llodra et al., 2011). Although it has been argued that richness of marine pelagic species decreases sharply with depth (Colloca et al., 2003; Costello and Chaudhary, 2017), major knowledge gaps still exist with current data likely to be biased by uneven and scattered sampling (Higgs and Attrill, 2015). In particular, less than 1% of the deep pelagic realm has been sampled to date due to its vastness and remoteness (Higgs and Attrill, 2015 and references therein). Overall, it is estimated that about 1.5 million deep-sea species have yet to be discovered (Costello and Chaudhary, 2017; Danovaro et al., 2017b).

Despite growing efforts to collect, store and publicly share biological and ecological data on the deep sea through international programs such as the Census of Marine Life (CoML), the Ocean Biodiversity Information System (OBIS) and the Deep Ocean Observation System (DOOS), the baseline knowledge in biodiversity is still inadequate, and data on the distribution of deep-sea species over extended spatial and temporal scales are almost entirely lacking (Glover et al., 2010; Wedding et al., 2013). Reports of species occurrence in a given area depend on direct sampling for final taxonomic assignment (Glover et al., 2018; Danovaro et al., 2020). This is typically carried out by vessel-assisted methods and technologies [e.g., remotely operated vehicles (ROVs) and autonomous underwater vehicles (AUVs)], with considerable practical and logistic limitations still affecting sample collection and spatiotemporal replication (Aguzzi et al., 2019). Although the capability of vessel-based research expeditions has advanced significantly in the past decades, the data gathered provide only a snapshot of the local biological complexity, restricted to the relatively narrow timeframe of the cruise period (Ruth, 2006). This limitation is further emphasized in the deep sea, where tidal and inertial currents can result in massive displacements of benthic and pelagic populations (Gage and Tyler, 1992; Aguzzi and Company, 2010; Aguzzi et al., 2011a,2015). In addition to the many technical constraints of deep-sea surveys, sampling often targets specific taxonomic groups, habitats, ecological traits, sizes or behaviors, limiting the taxonomic resolution of species inventories (Hatch et al., 2020; McCowin et al., 2020; Weston et al., 2020).
The most notorious example of such limitations is found in deep-sea fishery surveys, where data are collected either by trawl nets of a certain mesh size or by ROVs used for habitat characterization. Both methods target only benthic megafauna and have known biases due to selective capture or sampling (e.g., Common Fishery Policy Data Collection Multiannual Program; Aymà et al., 2016; Jac et al., 2021).

Emerging Technological Advances in Deep-Sea Monitoring

Cabled observatories (seabed oceanographic research platforms connected to network systems to provide continuous monitoring, observation, and recording of various seafloor activities) are transforming ocean research by establishing networks of interactive, globally distributed sensors for real-time data collection (Danovaro et al., 2017a; Aguzzi et al., 2019; Jahanbakht et al., 2021). These platforms combine data collection by optoacoustic (HD video and multi-beam rotary or dual-frequency sonar imaging devices), oceanographic and geochemical sensor technologies in a continuous, high-frequency and long-lasting fashion (e.g., Thomsen et al., 2012, 2017; Howe et al., 2019; Table 1). Coupling the presence of species to the environmental conditions surrounding them makes these platforms the core of emerging in situ marine ecosystem-level laboratories (Rountree et al., 2020). These platforms can provide long-term imagery data sets (e.g., decades), hence enabling the compilation of comprehensive multiannual species richness lists (Juniper et al., 2013; Doya et al., 2017; Chauvet et al., 2018; del Rio et al., 2020). Taxonomic characterization of monitored communities by visual means is also complemented by Passive Acoustic Monitoring (PAM) systems, which use specific acoustic markers for species identification (e.g., Juanes, 2018).

Table 1. List of some of the best-known coastal and deep marine cabled observatories that are presently engaged in the compilation of large image data sets with the implementation of eDNA prospection.

To overcome the spatial constraints imposed by limited fixed-point observation nodes, mobile platforms are being developed to monitor both the seafloor and the water column (Aguzzi et al., 2019). Internet Operated Vehicles (IOVs), such as crawlers and rovers, are benthic mobile platforms that are either tethered to cabled observatories (Purser et al., 2013) or completely free of direct physical connection (Brandt et al., 2016), and can operate with preloaded navigation plans to autonomously return to their docking station (i.e., the cabled observatory) to recharge and offload data (Thomsen et al., 2012, 2017; Aguzzi et al., 2020a). A recent addition to cabled observatories allows the study of subatomic particles such as neutrinos (Agostini et al., 2020). Using a suite of photomultipliers and other light-sensitive sensors, these neutrino telescopes are also capable of continuously monitoring bioluminescence from migrating deep-scattering layers and bacterioplankton (Martini et al., 2013, 2014; Tamburini et al., 2013; Bailly et al., 2021). These cross-disciplinary infrastructures will provide key complementary data for long-term monitoring of bentho-pelagic coupling in a rapidly changing ocean (Chatzievangelou et al., 2021).

Most observatories rely on information acquired by imaging to provide both qualitative and quantitative data on local biodiversity (Bicknell et al., 2016). Thus, the quality of biodiversity information relies upon the ability to classify organisms to the species level, which in turn can be used to compile local inventories (i.e., richness) and relative abundance estimates (Aguzzi et al., 2020a). Unfortunately, imaging does not always allow sufficiently high taxonomic precision, and generally requires the physical collection of samples to validate species identification. Furthermore, organisms’ attraction to or avoidance of submerged infrastructures is likely to cause some degree of bias in the observed local communities (Widder et al., 2005; Aguzzi et al., 2019; Rountree et al., 2020; Garcia-Vazquez et al., 2021).

Significant advances in molecular methodology and bioinformatics, accompanied by a steady increase in computational power, have made “omics” technologies and data increasingly accessible, with great potential to fill gaps in biodiversity monitoring capabilities of deep-sea cabled observatories (Heidelberg et al., 2010; Garcia-Vazquez et al., 2021). One of the more recent contributions of “omics” to biodiversity monitoring is linked to the collection and analysis of genetic material extracted directly from environmental samples (sediment, water, ice, air, etc.; Taberlet et al., 2012; Barnes and Turner, 2016; Cristescu and Hebert, 2018), which can include a mixture of whole organisms and/or environmental DNA (eDNA) (sensu Rodriguez-Ezpeleta et al., 2021). Sequencing of eDNA by means of High Throughput Sequencing (HTS) technology has enabled the development of eDNA metabarcoding. Here, amplicon sequencing with universal primers is used to generate an extremely large number (hundreds of thousands to millions) of DNA (mini)barcode reads (Meusnier et al., 2008; Hajibabaei and McKenna, 2012). These are preprocessed and curated using dedicated bioinformatic pipelines, which include trimming the reads so that only marker sequences remain and quality filtering (e.g., DADA2 – Callahan et al., 2016; Cutadapt – Martin, 2011; Vsearch – Rognes et al., 2016). The quality-filtered reads are then clustered into operational taxonomic units (OTUs) based on similarity (e.g., 99%, 97%) or taxonomically assigned directly using DNA reference databases (e.g., BOLD – Ratnasingham and Hebert, 2007; GenBank – Benson et al., 2013; PR2 – Guillou et al., 2013; SILVA – Quast et al., 2013; PLANiTS – Banchi et al., 2020; MZGdb – Bucklin et al., 2021). Reads that cannot be assigned to the desired taxonomic level (e.g., species, genus, or family) can still be used to assess alpha and beta diversity (e.g., Stefanni et al., 2018).
eDNA metabarcoding approaches are revolutionizing marine biodiversity assessment and monitoring because they can be used to simultaneously determine entire species communities, even when the exact composition of these assemblages is unknown (e.g., Deiner et al., 2017; Djurhuus et al., 2017; Stefanni et al., 2018; Eble et al., 2020; Kolda et al., 2020; McClenaghan et al., 2020; Seymour et al., 2020; Kawato et al., 2021). eDNA metabarcoding is becoming a particularly valuable tool for deep-sea biodiversity research and monitoring given high species diversity, low animal numbers, difficulties in taxonomic identification due to limited taxonomic expertise, large and remote location, and associated logistical constraints for sample/specimen acquisition (Thomsen et al., 2016; Kersten et al., 2019; Atienza et al., 2020; Canals et al., 2021; Kawato et al., 2021; Merten et al., 2021).
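The similarity-based clustering of quality-filtered reads into OTUs described above can be illustrated with a minimal Python sketch. It is a toy version of greedy, abundance-sorted centroid clustering and rests on several assumptions that are not part of the cited pipelines: reads are already trimmed to equal length, identity is computed position by position without alignment, and the function names (`identity`, `greedy_otu_clustering`) are hypothetical. Dedicated tools perform alignment-based comparisons and should be used for real data.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy identity measure needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_clustering(reads: list[str], threshold: float = 0.97) -> dict[str, int]:
    """Cluster reads into OTUs and return {centroid sequence: read count}."""
    # Dereplicate, then visit unique sequences from most to least abundant,
    # so that abundant sequences become OTU centroids first.
    unique = Counter(reads).most_common()
    otus: dict[str, int] = {}
    for seq, count in unique:
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += count  # read joins an existing OTU
                break
        else:
            otus[seq] = count  # no centroid is similar enough: new OTU
    return otus
```

With 40-base reads, a single mismatch gives 97.5% identity and is merged at the 97% threshold, whereas two mismatches (95%) would found a new OTU. Raising the threshold to 99% therefore splits the data into more, narrower units.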

Much of the eDNA work on deep-sea communities has focused on sediment samples to study benthic communities (e.g., Guardiola et al., 2016a; Atienza et al., 2020; Lins et al., 2021) as opposed to fish and pelagic communities. While fish taxa detected by eDNA metabarcoding are generally comparable to those identified by conventional fish survey methods, eDNA captures greater fish diversity than any single conventional approach. For example, eDNA metabarcoding in the deep sea generally outperforms trawling because it detects species that are typically elusive, small, rare or located on rocky surfaces or steep slopes (Thomsen et al., 2016; Closek et al., 2019; Afzali et al., 2020; Fraija-Fernández et al., 2020; McClenaghan et al., 2020). The advantage of deep ocean water eDNA metabarcoding has also been demonstrated for the study of other communities, including cephalopods (Merten et al., 2021; Visser et al., 2021) and zooplankton (Kersten et al., 2019; Laroche et al., 2020b; Govindarajan et al., 2021). In addition, eDNA extracted from water has also been used to study deep-sea benthic communities (Everett and Park, 2018; Laroche et al., 2020a). However, some studies have shown that samples from the water column are not a viable alternative to sediment samples for benthic diversity inventories (Brandt et al., 2021). Due to the patchiness of benthic fauna (Rosli et al., 2017), eDNA analysis of deep-sea sediment requires sampling of multiple biological replicates and larger sample sizes (Guardiola et al., 2016a, b; Atienza et al., 2020; Brandt et al., 2020). eDNA analysis of sediments may describe past rather than present communities, as sediments contain ancient DNA (aDNA) in addition to contemporary DNA. For this reason, sediment eDNA analysis often targets the very top layer of sediment (Atienza et al., 2020; Brandt et al., 2020) and/or longer amplicons (e.g., COI – Leray et al., 2013).

In the present manuscript, we identify and discuss potential developments in the use of eDNA metabarcoding for deep-sea biodiversity assessment at cabled observatories and associated mobile platforms. Methodological developments are discussed in relation to: (i) Integrating eDNA with optoacoustic imaging; (ii) Development of eDNA repositories and cross-linking with other biodiversity databases; (iii) Artificial Intelligence (AI) for eDNA analyses and integration with imaging data; and (iv) Benefits of eDNA augmented observatories for the conservation and sustainable management of deep-sea biodiversity. We conclude by discussing the technical limitations and recommendations for future eDNA monitoring of the deep-sea.

Integrating Environmental DNA With Optoacoustic Imaging

Among the main benefits of using eDNA as a monitoring tool are that it is an indirect, non-invasive technique (i.e., there is no need to capture the target organism) and that it does not require specialist taxonomic expertise to detect taxa across the tree of life (Goricki et al., 2017; Stefanni et al., 2018), though the latter strongly depends on the availability of comprehensive reference DNA databases (as further discussed below). Once an environmental sample such as water, biofilm or sediment is acquired (Brandt et al., 2021), the collected eDNA can be queried either by using “universal” markers targeting whole communities by means of HTS (Jerde et al., 2019), or by targeted species-specific assays usually performed by real-time quantitative PCR (qPCR) or digital PCR (dPCR) (Goldberg et al., 2016). The effectiveness of both approaches depends on the availability of reference data: for the taxonomic identification of sequenced reads in eDNA metabarcoding, and for the development of species-specific assays in the targeted approach. These DNA-based tools offer several advantages over traditional techniques. They improve the ability to unravel the “hidden” biodiversity (e.g., detect rare, cryptic, elusive, and non-native species in the early stage of invasion), which is particularly relevant in the case of remote environments such as the deep sea, and enable a near real-time global census of species (Stat et al., 2017; LeBlanc et al., 2020).

Such features may enable the full integration of eDNA analysis into ecological monitoring procedures when its measurement is coupled with other non-molecular data such as optoacoustic imaging (e.g., Stat et al., 2019; Easson et al., 2020; Mirimin et al., 2021). For this purpose, eDNA water sampling should also be provided in real time by autonomous and independent samplers (e.g., Yamahara et al., 2019; Hansen et al., 2020; Jacobsen, 2021; Moore et al., 2021), with prototypes presently under construction (e.g., the Adjustable Volume eDNA Sampler and the Robotic Cartridge Sampling Instrument, RoCSI) or that can be adjusted for this purpose, such as the SALSA system (Kersten et al., 2019; Brandt et al., 2021). An alternative to water samplers would be the opportunistic use of filter-feeding organisms such as sponges or bivalves, which act as natural “DNA traps,” concentrating eDNA from water that can be retrieved at different time points (Mariani et al., 2019; Turon et al., 2020; Weber et al., 2021). The advantage of adding eDNA to ecological monitoring protocols is its ability to cross-validate data from other methodologies such as imaging (e.g., Aguzzi et al., 2019). On the other hand, samples from the water column reportedly do not provide a good characterization of the underlying benthic taxa, suggesting that benthic biodiversity surveys should also be performed (Antich et al., 2021b; Brandt et al., 2021). Seabed sediment acquisition technologies are continuously being improved and optimized to obtain more authentic and reliable samples, to meet the ever-increasing demands on sampling capabilities (He et al., 2020), and to adapt to cabled observatory infrastructures.

Recent eDNA advancements allow us to study a wide range of taxa (including vertebrates) that are otherwise inaccessible by direct capture or optoacoustic technologies (e.g., Lacoursière-Roussel et al., 2018; Cowart et al., 2020; Laroche et al., 2020b; Canals et al., 2021). Though still limited to near-surface waters, the combined use of video-monitoring and eDNA metabarcoding has also been successfully applied using Baited Remote Underwater Video Systems (BRUVs) to monitor Marine Protected Areas (MPAs) (Stat et al., 2019) or integrated in cabled observatories (Mirimin et al., 2021). In these cases, the taxa analyzed were visually conspicuous biota (mainly fish) and all post-sample collection steps were carried out off site in dedicated molecular laboratories. The way forward involves the integration and development of sampling methodology and sensing protocols adapted to operate on ROVs, AUVs and even biomimetic platforms (e.g., Aguzzi et al., 2021a), hence further expanding the sampling capability to the most remote habitats while minimizing sampling disturbance (e.g., Trenkel et al., 2019).

Development of Environmental DNA Repositories and Cross-Linking With Other Biodiversity Databases

When identifying organisms, scientists can narrow down taxonomic possibilities by using a single approach or, preferably, by combining and integrating multiple approaches, although a degree of uncertainty in taxa identification will always remain (Danovaro et al., 2020). In recent years, molecular tools have been integrated into classical morphology-based taxonomic approaches (e.g., Stefanni et al., 2021), which has proven extremely useful in resolving the taxonomic status of cryptic species (e.g., Carreiro-Silva et al., 2017). However, in an ideal integrative taxonomy framework, different lines of evidence obtained at the genetic, physiological, morphological, behavioral, and habitat level should be considered and all combined within Hutchinson’s (1957) multidimensional niche (Schlick-Steiner et al., 2010).

Nowadays, most biodiversity data are compiled into open-access online databases (Gemeinholzer et al., 2020). In the case of marine life, the most comprehensive database – the World Register of Marine Species (WoRMS) – is regularly updated by active communities of marine taxonomists (Costello et al., 2013). Building on this foundation, the World Register of Deep-Sea Species (WoRDSS; Glover et al., 2021), a taxonomic database of deep-sea species, was launched in 2012 by the International Network for Scientific Investigation of Deep-sea Ecosystems (INDEEP). This database also includes the global-scale trait database for the fauna of deep-sea hydrothermal vents, the sDiv-funded trait database for the Functional Diversity of vents (sFDvent; Chapman et al., 2019). These inventories are exclusively based on records of collected organisms. In parallel, genetic and genomic databases have been implemented that are either inclusive, as in the case of GenBank (Clark et al., 2016) or BOLD (Ratnasingham and Hebert, 2013), or restricted to selected groups of organisms, such as MZGdb (Bucklin et al., 2021), PR2 (Guillou et al., 2013) and PLANiTS (Banchi et al., 2020).

As in conventional DNA barcoding, eDNA sequences are usually compared with a reference database of the expected species community to translate the obtained molecular operational taxonomic units into biological species for the final data interpretation. These matching processes are reliable when based on a comprehensive reference library supported by morphological description of the reference taxon. However, such reference databases are still far from complete, especially for deep-sea communities (Weigand et al., 2019). Additionally, misidentifications of reference sequences have been frequently reported, highlighting the need for refinement and curation of these databases to reduce false negatives and conflicts in taxonomic assignment (Stefanni et al., 2018; Schroeder et al., 2020; Bucklin et al., 2021).

The performance of eDNA in providing accurate estimates of species diversity by matching different genetic repositories has also been tested. For example, fishes are both a frequent target in eDNA studies and widely represented in genetic repositories by multi-marker sequences. Recently, the performance of eDNA from surface water samples in determining fish diversity was evaluated by comparing it to bottom trawl catches (Stoeckle et al., 2021). Fish diversity estimation obtained by eDNA was equal to, or greater than, that obtained from a single 66-million-liter trawl. Most (70–87%) species detected by trawl in a given month were also detected by eDNA, and vice versa, including nearly all (92–100%) abundant species (Stoeckle et al., 2021). For a more comprehensive assessment of the local biodiversity including benthic taxa (from metazoans to protist and prokaryotic communities), eDNA from sediment should also be analyzed, as only a fraction of total molecular clusters is shared between the eDNA of these two environmental matrices (Atienza et al., 2020; Zhao et al., 2020; Brandt et al., 2021). Furthermore, meiofauna, micro-eukaryotes, and bacteria constitute a large portion of deep-sea abundance and biomass and should not be neglected (Rex et al., 2006; Ingels et al., 2021). Even if these small-size organisms cannot be taxonomically identified due to the lack of appropriate reference databases, their contribution to biodiversity can still be evaluated with taxonomy-free approaches (Cordier et al., 2019b).
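The kind of method comparison summarized above (the share of trawl-detected species also recovered by eDNA, and vice versa) reduces to simple set arithmetic on the two species lists. The following Python sketch shows one way to compute it; the function name `detection_overlap` and the placeholder species labels in the usage note are hypothetical and do not reproduce the data of the cited study.

```python
def detection_overlap(edna: set[str], trawl: set[str]) -> dict[str, float]:
    """Summarize agreement between two species lists from the same survey."""
    if not edna or not trawl:
        raise ValueError("both species lists must be non-empty")
    shared = edna & trawl
    return {
        # Share of trawl-detected species that eDNA also recovered.
        "pct_trawl_species_in_edna": 100 * len(shared) / len(trawl),
        # Share of eDNA-detected species that the trawl also caught.
        "pct_edna_species_in_trawl": 100 * len(shared) / len(edna),
        # Jaccard similarity: shared species over all species seen by either method.
        "jaccard": len(shared) / len(edna | trawl),
    }
```

For instance, with five placeholder species detected by eDNA (`sp_A` to `sp_E`) and four by trawl (`sp_A`, `sp_B`, `sp_C`, `sp_F`), three species are shared, so 75% of trawl species are found by eDNA, 60% of eDNA species are found by trawl, and the Jaccard similarity is 0.5. The same calculation applies to the overlap between water and sediment molecular clusters.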

The improvement of existing marine genetic databases and the development of portals exclusively dedicated to eDNA sequences are considered priorities for global biodiversity assessment and for filling taxonomic and spatial gaps in bio-surveys (Berry et al., 2021). Early initiatives have already been undertaken worldwide to integrate eDNA into biodiversity databases that provide accurate spatial information on aquatic species occurrence based solely on eDNA records collected according to standardized protocols (e.g., United States, New Zealand, and Sweden) (Young et al., 2018; DFO, 2020; Sundberg et al., 2020; Abbott et al., 2021). Integrating dedicated eDNA sequence repositories with high-resolution imaging or other attributes collected in situ (e.g., sound generated by animals; Mooney et al., 2020) can maximize the identification of species together with spatial and temporal resolution (e.g., Bicknell et al., 2016; Howell et al., 2019; Horton et al., 2021; Mirimin et al., 2021). Such integrated open-access online biodiversity databases can further enable a putative taxonomic identification of species that are detected by eDNA (as particular OTUs) but not identified. If the closest taxonomic match for an eDNA sequence falls below the percentage that would allow species-level identification, a putative identification of the sequence in question could be made using image or sound identifications taken alongside it, at least until a specimen is collected and properly examined and a reference sequence record is deposited for future use. This would provide information on what to expect in future biodiversity inventories in a given remote area.
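The fallback logic described above can be written as a small decision rule, sketched here in Python. Everything named in the block is an assumption made for illustration: the `Assignment` record, the `assign_otu` function, the status labels, and in particular the 97% species-level threshold, which in practice depends on the marker and the taxonomic group.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assignment:
    otu_id: str
    taxon: Optional[str]
    status: str  # "species-level", "putative (image/sound)" or "unassigned"


def assign_otu(
    otu_id: str,
    best_hit_taxon: Optional[str],
    best_hit_identity: float,
    co_observed_taxon: Optional[str] = None,
    species_threshold: float = 0.97,  # hypothetical cut-off, marker dependent
) -> Assignment:
    """Label an OTU from its best reference match, with an image/sound fallback."""
    # 1. The reference match is good enough for a species-level name.
    if best_hit_taxon is not None and best_hit_identity >= species_threshold:
        return Assignment(otu_id, best_hit_taxon, "species-level")
    # 2. Otherwise borrow a putative name from a co-occurring image or sound
    #    identification, pending a voucher specimen and a reference sequence.
    if co_observed_taxon is not None:
        return Assignment(otu_id, co_observed_taxon, "putative (image/sound)")
    # 3. Nothing to go on: keep the OTU as an unnamed unit.
    return Assignment(otu_id, None, "unassigned")
```

Keeping the status alongside the name lets a repository distinguish sequence-supported records from putative ones, so the latter can be revisited once a reference sequence is deposited.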

A step toward integration of marine biodiversity data repositories has been provided by BOLD, which contains open access records of organisms (including imaging) tagged with one or more standardized short DNA genetic markers (Ratnasingham and Hebert, 2007, 2013). A further step could be the creation of a single open access platform where data of different origins and typologies (including eDNA markers) are freely searchable (as in the case of the Global Biodiversity Information Facility, GBIF) (Andersson et al., 2020; Heberling et al., 2021). These database platforms have begun to include eDNA records as a new type of biological observation that can be accessed alongside millions of conventional biodiversity records (Berry et al., 2021). The development of AI algorithms (as discussed in the following section) can facilitate better operational cross-linking between in situ eDNA data and other complementary data (e.g., temperature, pH, and current). These “Big Data” analyses could be fully embedded into cabled observatory protocols for autonomous data processing to provide reliable spatiotemporal assessment of biodiversity in near real time.

Artificial Intelligence for Environmental DNA Analyses and Integration With Imaging Data

The step forward to efficiently augment the in situ deep-sea ecological monitoring capability of cabled observatories and their docked platforms envisions the ability to collect genetic and imaging data in situ and process the information in real time using automated pipelines (e.g., Osterloff et al., 2016, 2019; Lopez-Vasquez et al., 2020; Zuazo et al., 2020). These developments rely on the establishment of AI algorithms for taxonomic assignment as well as dedicated reference DNA sequence databases.

The fully automated integration of eDNA and imaging data represents one of the core development aspects to augment the monitoring capability of deep-sea biodiversity at cabled observatories, enabling the detection of organisms over a wide range of taxa and, in the case of fishes, of different body sizes. Currently, there are several initiatives to automate in situ eDNA analyses in near real time (Scholin et al., 2017; Ribeiro et al., 2019; Yamahara et al., 2019). Integration of eDNA and imaging data involves the development of appropriate pipelines for: (i) automatic taxonomic identification of eDNA sequences to the highest taxonomic resolution (e.g., species); and (ii) cross-checking of eDNA taxonomic identification against large image repositories, accounting for a multi-annual status of local richness and biodiversity (i.e., based on species tracking and classification, resulting in time series of data on community structure as well as relative abundance). Both steps can be implemented by applying AI algorithms using Machine Learning (ML) methods.

Analysis of eDNA metabarcoding data using ML methods is a new and developing field. There are two main approaches to the use of ML methods for biodiversity monitoring: one operates on taxonomically assigned OTUs, while the other is taxonomy-free and does not need a reference database, thus overcoming the limits of taxonomy-based eDNA bioassessment (Cordier et al., 2018). Such a taxonomy-free approach still requires “training” data sets to feed into predictive models that can be used to make inferences on previously unexplored taxa (Cordier et al., 2018).

Cordier et al. (2017) focused on the problem of lacking inventories for eDNA data from benthic foraminifera and showed that supervised ML approaches (i.e., random-forests and self-organizing-maps) can classify unknown sequences and infer biotic indices of macro-invertebrates reasonably well. They argued that ML makes good predictions and outperforms analyses based only on known sequences (Cordier et al., 2018).
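To make the taxonomy-free idea concrete, the Python sketch below predicts a biotic index for a new sample directly from its OTU abundance profile, using training samples whose index is already known. It deliberately swaps in a much simpler learner than the random forests and self-organizing maps mentioned above: a k-nearest-neighbour average on Bray-Curtis dissimilarity. The function names, the value of `k`, and the toy numbers in the usage note are all hypothetical.

```python
def bray_curtis(u: list[float], v: list[float]) -> float:
    """Bray-Curtis dissimilarity between two OTU abundance profiles."""
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # two empty profiles are treated as identical
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def predict_index(
    train_profiles: list[list[float]],
    train_index: list[float],
    query: list[float],
    k: int = 3,
) -> float:
    """Average the biotic index of the k training samples closest to the query."""
    # No taxon names are used: samples are compared on OTU counts alone.
    ranked = sorted(
        zip(train_profiles, train_index),
        key=lambda pair: bray_curtis(pair[0], query),
    )
    nearest = ranked[:k]
    return sum(value for _, value in nearest) / len(nearest)
```

As a usage example, training profiles `[10, 0, 0]` and `[9, 1, 0]` with index 1.0, and `[0, 0, 10]` and `[0, 1, 9]` with index 5.0, give a prediction of 1.0 for the query `[8, 2, 0]` with `k=2`, because the two closest profiles (dissimilarities 0.1 and 0.2) both carry index 1.0.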

Machine learning tools are currently part of many pipelines for eDNA data analysis. Dully et al. (2021a; 2021b) showed that ML-based pipelines are sufficiently robust even for rarefied samples. Other authors reached similar conclusions (Cordier et al., 2019a; Apothéloz-Perret-Gentil et al., 2021; Frühe et al., 2021; He et al., 2021) and Mathon et al. (2021) reviewed the literature on “eDNA and Machine Learning.” In ML analysis, data are first pre-processed with common bioinformatic pipelines as for general metabarcoding analysis (Mathon et al., 2021) and subsequently processed through an automated DNA-barcode classifier (taxonomy assignment). ML supports this classification task with a consolidated pipeline. Sequences contained in DNA-barcode repositories (e.g., GenBank, GB; Barcoding of Life Database, BOLD) are first used to train an ML-based classifier (e.g., Cordier et al., 2017, 2019a; Frühe et al., 2021). The trained classifier is then ready to identify the taxa contained in the sample. We envisage that, in the framework of cabled observatories, further assessment of the identification results could be obtained by cross-checking the eDNA taxonomy classification with organisms identified through video/image data analysis. In this case, images have to be acquired contextually to eDNA sampling and a content-based image classification has to be performed in order to classify the framed organisms (e.g., fishes).
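The train-then-classify workflow described above can be sketched in a few lines of Python. This is an illustrative stand-in rather than any of the cited classifiers: reference barcodes are summarized as k-mer count profiles per taxon, and a read is assigned to the taxon with the most similar profile (cosine similarity), or left unassigned below a cut-off. The function names, `k=3`, and the 0.8 similarity cut-off are assumptions chosen for the example.

```python
from collections import Counter
from math import sqrt
from typing import Optional


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Count all overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity between two k-mer count profiles."""
    dot = sum(p[key] * q[key] for key in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def train(references: dict[str, list[str]], k: int = 3) -> dict[str, Counter]:
    """Build one pooled k-mer profile per taxon from its reference barcodes."""
    model: dict[str, Counter] = {}
    for taxon, sequences in references.items():
        profile: Counter = Counter()
        for seq in sequences:
            profile.update(kmer_profile(seq, k))
        model[taxon] = profile
    return model


def classify(
    model: dict[str, Counter], read: str, k: int = 3, min_similarity: float = 0.8
) -> tuple[Optional[str], float]:
    """Return (taxon, similarity); taxon is None when no profile is close enough."""
    query = kmer_profile(read, k)
    taxon, score = max(
        ((name, cosine(query, profile)) for name, profile in model.items()),
        key=lambda item: item[1],
    )
    return (taxon, score) if score >= min_similarity else (None, score)
```

Returning `None` below the cut-off matters for deep-sea data, where many reads have no close reference: an explicit "unassigned" outcome is preferable to forcing every read onto the nearest available taxon.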

ML-based underwater image classification has been shown to provide high-quality results (Langenkämper et al., 2020; Lopez-Vasquez et al., 2020; Malde et al., 2020; Mathur et al., 2020). The ML-based image classifier needs to be trained on an image ground-truth dataset. Then, the taxa of the classified specimens can be compared with those returned by the eDNA classifier. The diagram in Figure 1 shows a conceptual pipeline for handling the eDNA data and image cross-check.
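The final comparison step can be sketched as follows in Python, under the assumption that both classifiers output taxon names from a shared checklist and that each image detection carries a timestamp. Only image detections acquired close in time to the eDNA sample are considered, to reflect the requirement that images be acquired contextually to sampling. The function names and the six-hour window are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta


def contextual_image_taxa(
    detections: list[tuple[datetime, str]],
    sample_time: datetime,
    window: timedelta = timedelta(hours=6),  # hypothetical matching window
) -> set[str]:
    """Taxa seen by the image classifier within +/- window of the eDNA sample."""
    return {taxon for when, taxon in detections if abs(when - sample_time) <= window}


def cross_check(
    edna_taxa: set[str],
    detections: list[tuple[datetime, str]],
    sample_time: datetime,
    window: timedelta = timedelta(hours=6),
) -> dict[str, set[str]]:
    """Split taxa into those supported by both methods and those seen by one only."""
    image_taxa = contextual_image_taxa(detections, sample_time, window)
    return {
        "confirmed_by_both": edna_taxa & image_taxa,
        "edna_only": edna_taxa - image_taxa,    # e.g., small, cryptic or elusive taxa
        "image_only": image_taxa - edna_taxa,   # e.g., taxa missing from reference data
    }
```

The appropriate window is an open design choice: it would need to reflect how long eDNA persists and how far it is transported at the site, neither of which is fixed by this sketch.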

Figure 1. This diagram summarizes the main steps of the pipeline for eDNA and imaging data integration. The eDNA is collected from the water or sediment and processed through metabarcoding protocols. This step includes several bioinformatic pre-processing actions before the data go through an automated DNA-barcode classifier pipeline. The images acquired by cameras contextually to eDNA sampling are post-processed through an image classifier routine. Both protocols need independent reference repositories to train the ML classifiers before the cross-checking of the taxonomic assignments derived from eDNA and images.

Benefits of Using Environmental DNA-Augmented Observatories for the Conservation and Sustainable Management of Deep-Sea Biodiversity

Achieving conservation and sustainability goals through ecosystem-based management is challenging, particularly for deep-sea ecosystems, as lack of knowledge hinders science-based prioritization of appropriate management and conservation strategies (Glover et al., 2018; Howell et al., 2020; Manea et al., 2020). Cabled observatories have already been recognized as key tools capable of filling knowledge gaps through systematic monitoring (Danovaro et al., 2017a; Aguzzi et al., 2020a,b, 2021b). Integrating eDNA surveillance into the monitoring capabilities offered by cabled observatories makes them even more promising (Mirimin et al., 2021). Indeed, eDNA has been highlighted as a key approach that will enable conservation managers and marine spatial planners to detect target species for conservation, provide biotic indices for impact assessment, increase the spatio-temporal capability of biodiversity surveys, and map vulnerable deep-sea species or ecosystems (Aylagas et al., 2014; Pawlowski et al., 2018; Bani et al., 2020; Kutti et al., 2020). Another value of eDNA-augmented cabled observatories is their potential contribution to two synergistic global initiatives addressing monitoring of the marine environment: the Essential Ocean Variables (EOVs), supported by the Global Ocean Observing System (GOOS), and the Essential Biodiversity Variables (EBVs), developed by the Group on Earth Observations Biodiversity Observation Network (GEOBON) (Pereira et al., 2013; Bax et al., 2018). These two frameworks are being developed to inform global policies and sustainability strategies, and to produce comparable and integrated data through harmonization of monitoring (Canonico et al., 2019; Jetz et al., 2019).

Addressing conservation priorities in the deep sea and monitoring the effectiveness of conservation measures are critical steps. The use of eDNA analyses has recently been extended to biodiversity assessment in the context of deep-seabed mining of polymetallic nodules, to guide the management of a form of resource exploitation that is foreseen to have one of the highest environmental impacts in the near future (Wedding et al., 2015; Laroche et al., 2020a; Leray and Machida, 2020); eDNA has also been suggested as a cost-effective method (Le et al., 2021). The performance of this high-throughput approach has also been tested in impact assessment of offshore oil and gas drilling and extraction (Laroche et al., 2018), and in fish stock assessment to inform fishery management (Salter et al., 2019).

DNA-based tools coupled with cabled observatories and supported by visual and acoustic census can enhance monitoring capability within MPAs, as has been tested in recent biodiversity assessments (Stat et al., 2019; Gold et al., 2021). Such an approach would greatly benefit the monitoring of Large Scale Marine Protected Areas (LSMPAs). LSMPAs are larger than 150,000 km² and may encompass critical habitats for migratory species (Lewis et al., 2017), but monitoring such areas is challenging, if not impractical (O’Leary et al., 2018).

Furthermore, boosting knowledge of deep-sea biodiversity would help in the prioritization of deep-sea areas for conservation. The Ecologically and Biologically Significant Marine Areas (EBSAs) have previously been proposed to focus attention on where and what type of conservation measures could be established in offshore and deep-sea areas, including the designation of new MPAs (Ardron et al., 2009; Portman et al., 2013; Johnson et al., 2018). However, many potential EBSAs have been removed from the original list because too little was known to inform the selection criteria, and a call has been made to strengthen scientific research in these areas (Johnson et al., 2019). Both LSMPA and EBSA initiatives are hindered by the absence of concrete knowledge of connectivity within and between regions (Cannizzo et al., 2021), as well as by the challenge of describing the links between ocean depths and the fundamental bentho-pelagic coupling (Johnson et al., 2018; O’Leary and Roberts, 2018). These issues might be resolved by eDNA-augmented observatories applying metaphylogeography tools to analyze connectivity within OTUs (Turon et al., 2019; Antich et al., 2021a).

Finally, in response to the urgent need to increase knowledge of deep-sea ecosystems and to manage deep-sea resources in a scientifically sound manner, the Deep Ocean Observing Strategy (DOOS) has been established to coordinate monitoring and observing efforts. As part of this strategy, genetic studies have been identified as key knowledge sources for biodiversity and connectivity assessment (Baco et al., 2016; Levin et al., 2019), and the need to define deep-sea ecological variables to feed global monitoring frameworks has been prioritized (Danovaro et al., 2020).

Technical Limitations and Steps Forward for Environmental DNA Monitoring in the Deep-Sea

While underwater imaging in deep-sea cabled observatories is usually used to detect and identify large to medium-sized animals, advances in image processing and pattern recognition have also made it possible to identify zooplankton automatically or semi-automatically (Gorsky et al., 2010). Zooplankton imaging instruments have gone beyond laboratory bench-top application (e.g., ZooScan) and now allow in-flow onboard counting and classification (e.g., ZooCAM; Colas et al., 2018), are mounted on AUVs (Ohman et al., 2018), or are even integrated into shallow-water cabled observatories (the COSYNA-AWIPEV observatory in the Kongsfjorden Arctic fjord system and the COSYNA-Helgoland observatory; Fischer et al., 2020). It is anticipated that similar imaging systems could be integrated into deep-sea cabled observatories for imaging of meroplanktonic larvae. Future modifications of such systems could be used to analyse meiobenthos (e.g., FlowCAM; Kitahashia et al., 2018), and benthos could also be studied with the assistance of Sediment Profiling Imaging (SPI) systems.

Despite the rapid and widespread adoption of eDNA metabarcoding analysis for species identification, limitations still exist and are the subject of much active research. Sequence length constraints imposed by HTS technology may contribute to the detection of false positives, when the target species is absent but its DNA, or rather the DNA of a close match, is recovered. Moreover, primer biases may generate false negatives, i.e., species that are not detected even though they are present. These limitations have been carefully evaluated but only partially overcome (Taberlet et al., 2012; Cristescu and Hebert, 2018 and references therein). Strategies to address such limitations can be intrinsic to the eDNA approach, e.g., the use of multiple markers (Stefanni et al., 2018; Liu and Zhang, 2021), capture by hybridization (Günther et al., 2021), or long-read sequencing (Davidov et al., 2020), but further improvement is also expected from the integration of multidisciplinary survey approaches (e.g., combining imaging with eDNA).

Although ML methods (see previous section) could provide valuable tools to reduce errors, their application presents some difficulties. ML methods require ad hoc training sets of sequences and images, which are used as benchmark data repositories to reduce problems with the taxonomic assignment of sequences, such as: (i) false positives, when incorrect species are assigned to certain sequences based on sequence similarity with a close match; and (ii) rarity or endemism, when eDNA sequences match species that are not detected by video or in the historical records of the area. The above-mentioned data gaps and erroneous entries in genetic repositories are another source of uncertainty for classifier algorithms. ML methods can help address these issues by using existing datasets and generating multiple species trees based on percentage similarity. Moreover, by applying the Lowest Common Ancestor (LCA) algorithm, ML can still identify unassigned sequences whose taxonomy is deficient due to the lack of reference sequences deposited in publicly accessible repositories. A different ML-based methodology is the taxonomy-free approach, where bio-monitoring information is derived from DNA sequencing data without taxonomic assignment (Apothéloz-Perret-Gentil et al., 2017; Feio et al., 2020). The main limits of this approach are the possibility of under-sampling the input data and the need to calibrate the bio-index used (Apothéloz-Perret-Gentil et al., 2017).
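The idea behind an LCA assignment can be shown with a short sketch. It follows one common formulation, assumed here for illustration: reference hits whose percent identity lies within a margin of the best hit are retained, and the read is assigned to the deepest taxonomic rank that all retained hits share. The function names, the `margin` parameter, and the identity values are hypothetical; the lineages are written out by hand rather than drawn from a repository.

```python
def lowest_common_ancestor(lineages):
    """Return the shared prefix of several root-to-tip lineages."""
    shared = []
    for names_at_rank in zip(*lineages):
        if all(name == names_at_rank[0] for name in names_at_rank):
            shared.append(names_at_rank[0])
        else:
            break  # lineages diverge below this rank
    return shared


def assign_lca(hits, margin=1.0):
    """Assign a read to the LCA of all hits within `margin` % identity of the best.

    hits: list of (percent_identity, lineage) pairs, lineage ordered root to tip.
    """
    if not hits:
        return []
    best = max(identity for identity, _ in hits)
    kept = [lineage for identity, lineage in hits if identity >= best - margin]
    return lowest_common_ancestor(kept)


# Hand-written lineages (kingdom, phylum, class, order, family, genus, species).
COD = ["Animalia", "Chordata", "Actinopterygii", "Gadiformes",
       "Gadidae", "Gadus", "Gadus morhua"]
HADDOCK = ["Animalia", "Chordata", "Actinopterygii", "Gadiformes",
           "Gadidae", "Melanogrammus", "Melanogrammus aeglefinus"]
HERRING = ["Animalia", "Chordata", "Actinopterygii", "Clupeiformes",
           "Clupeidae", "Clupea", "Clupea harengus"]

example_hits = [(99.0, COD), (98.5, HADDOCK), (90.0, HERRING)]
```

With a margin of 1.0, `assign_lca(example_hits)` stops at the family Gadidae, because the two near-identical hits disagree at genus level. Widening the margin to 10.0 pulls in the herring hit and the assignment retreats to class level, which is how the approach trades resolution for caution.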

The discovery and implementation of new barcoding markers will be necessary to address the low resolving power of existing markers in taxa characterized by exceptionally low rates of mitochondrial evolution (e.g., anthozoans; Hebert et al., 2003) or in recently diverged species (e.g., cypraeid marine gastropods; Meyer and Paulay, 2005). A solution may be found in complementing short-read amplicon sequencing with sequencing technologies capable of longer read lengths (e.g., Oxford Nanopore Technologies, ONT; Pacific Biosciences, PacBio), which can cover full genes or even mitogenomes. The development of such approaches could indeed be facilitated by integration with video/image data analysis and reference sequence repositories to further enhance species-level identification capabilities.

Further technological limitations for eDNA methodology derive from knowledge gaps regarding the persistence and transport of eDNA (Collins et al., 2018; Murakami et al., 2019), which are largely unexplored in deep-sea environments. Persistence of eDNA in the marine environment can be assessed according to specific seascape properties of the sampled water mass, which can be easily measured by the multiparametric habitat sensors of cabled observatories (Aguzzi et al., 2010). eDNA decay involves multiple processes, including cellular and microbial degradation, as well as spontaneous degradation of DNA caused and/or accelerated by UV, temperature, and pH (Collins et al., 2018; Harrison et al., 2019; Hunter et al., 2019). Furthermore, the spatial coverage of species detection by eDNA depends on local hydrodynamics (i.e., strength and direction of currents), which affect the dispersal and transport of molecules from neighboring areas (Harrison et al., 2019). The combined action of oceanographic and biogeochemical variables should be carefully considered when inferring the temporal and spatial coverage of the information provided by eDNA markers (Harrison et al., 2019), and taken into account in sampling design and collection.
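How persistence and transport jointly bound the footprint of an eDNA signal can be illustrated with a back-of-the-envelope sketch. It rests on two simplifying assumptions that are not taken from the text: eDNA concentration decays exponentially at a single first-order rate, and the water mass moves at a constant current speed. The function names and all numerical values are hypothetical; real decay rates would have to be estimated for the local temperature, pH, and microbial conditions.

```python
import math


def concentration(c0, decay_rate_per_hour, hours):
    """First-order decay: C(t) = C0 * exp(-k * t)."""
    return c0 * math.exp(-decay_rate_per_hour * hours)


def half_life_hours(decay_rate_per_hour):
    """Time for the signal to halve under first-order decay."""
    return math.log(2) / decay_rate_per_hour


def detection_window_hours(c0, detection_limit, decay_rate_per_hour):
    """Time until the signal drops to the assay's detection limit."""
    if c0 <= detection_limit:
        return 0.0  # already undetectable at release
    return math.log(c0 / detection_limit) / decay_rate_per_hour


def advective_range_km(current_speed_m_s, hours):
    """Distance travelled with a steady current (1 m/s = 3.6 km/h)."""
    return current_speed_m_s * hours * 3.6


# Hypothetical scenario: 1000 copies/L released, 10 copies/L detection limit,
# decay rate 0.1 per hour, steady current of 0.05 m/s.
window = detection_window_hours(1000.0, 10.0, 0.1)
footprint = advective_range_km(0.05, window)
```

Under these invented numbers the signal would remain detectable for roughly two days and could originate several kilometers upstream, which shows why decay and current data need to be read together when interpreting a detection.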

Spatiotemporal coverage of environmental data collected by networks of observatories provides a unique opportunity to define optimized eDNA sampling strategies, in terms of the best timing for seawater collection based on eDNA persistence and passive dispersion in the local marine environment (i.e., a form of “molecular connectivity”). This may be further aided by continuous multiparametric data collection and increasing knowledge of environmental conditions favoring eDNA detection. Data integration, performed for example with Lagrangian models of eDNA dispersal (Andruszkiewicz et al., 2019), can form the basis for innovative sampling scenarios (e.g., timing and repetition of sampling depending on current status). In this way, the spatial and temporal distribution of a species across a given area could be predicted even before planning the eDNA sampling, based on the status of a combined set of local environmental factors (e.g., currents, temperature, and nutrients), complemented by video-counts of target species (see previous section).

Some examples of timing strategies for eDNA sampling that would benefit from this approach include: (i) pre-programming continuous or time-lapse sampling based on modeled, forecasted environmental conditions; (ii) pre-programming surveillance approaches based on real-time remote sampling that is activated only when a predefined set of conditions is met (e.g., within a range of current intensity, pH, temperature, or while the camera is activated); and (iii) synchronous sampling over large areas (by multiple samplers) thanks to the network of IOVs operating away from the cabled observatories.
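Strategy (ii) amounts to a rule evaluated against incoming sensor records. The sketch below is one possible way to express it, not a description of any observatory's control software: the class names, field names, threshold ranges, and the minimum gap between samples are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One multiparametric sensor record (field names are hypothetical)."""
    hour: float
    current_m_s: float
    ph: float
    temperature_c: float
    camera_active: bool


def _within(value, bounds):
    """True when value lies inside the inclusive (low, high) range."""
    low, high = bounds
    return low <= value <= high


@dataclass
class TriggerRule:
    """Predefined conditions under which the eDNA sampler is activated."""
    current_m_s: tuple = (0.02, 0.20)    # placeholder range
    ph: tuple = (7.8, 8.2)               # placeholder range
    temperature_c: tuple = (2.0, 6.0)    # placeholder range
    require_camera: bool = True

    def matches(self, reading):
        return (
            _within(reading.current_m_s, self.current_m_s)
            and _within(reading.ph, self.ph)
            and _within(reading.temperature_c, self.temperature_c)
            and (reading.camera_active or not self.require_camera)
        )


def plan_samples(readings, rule, min_gap_hours=6.0):
    """Return the hours at which the sampler would fire.

    A minimum gap between samples conserves the limited number of filters.
    """
    fired = []
    for reading in readings:
        if rule.matches(reading) and (
            not fired or reading.hour - fired[-1] >= min_gap_hours
        ):
            fired.append(reading.hour)
    return fired
```

The same rule object could in principle be broadcast to several samplers to approximate strategy (iii), with each platform evaluating it against its own sensor stream.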

Final Remarks

There is a pressing need to lay out a roadmap for the effective collection and synthesis of high-quality deep-sea biodiversity data, in order to fill the knowledge gaps that hamper policy decisions and environmental management (Levin et al., 2020). This requires the identification of (i) consensus biodiversity variables to be monitored, and (ii) adequate and harmonized methods for their monitoring and assessment.

While optoacoustics can help generate baseline data on some taxa and their size and relative abundance, integration of DNA-based approaches (Scholin, 2010) can provide precise taxonomic information on species richness, including their response to shifts in local environmental conditions. In particular, eDNA metabarcoding allows augmented monitoring of biodiversity because it has the potential to detect organisms across the tree of life. It can be used for a variety of studies, from detecting invasive species to measuring the impact of human activities on ecosystems.

Integration of datasets obtained from eDNA, images, and other sources such as sound can now be almost completely automated thanks to ML algorithms. Several existing coastal and deep-sea cabled observatories can host pilot studies. For those that have been in operation for many years, long-term time series of biological and environmental data in different ecological contexts are already available, providing solid baseline datasets (Table 1). Some of these observatories have already started to experiment with long-term image acquisition and eDNA analyses, while others are planning to include eDNA surveys in the future. Cabled observatories, and the network of IOVs operating from these platforms, augmented by eDNA sensors could not only provide a framework to evaluate the effectiveness of eDNA protocols in situ, but could, more importantly, improve our knowledge of deep-sea biodiversity at unprecedented spatial and temporal scales. Under this vision, eDNA-augmented observatories offer new opportunities to fill knowledge gaps on deep-sea biodiversity and ecosystem functioning, thus supporting monitoring and conservation strategies and contributing to the decade of deep-sea exploration that is now upon us.

Author Contributions

SS, LM, JA, and SM conceptualized and proposed the idea. SS and LM provided background knowledge to molecular technology and sequencing. LB, EM, and AG provided background knowledge to international conservation and management initiatives. DS and MM provided background knowledge to genetic databanks and bioinformatics. SM and FB provided background knowledge to imaging data analysis and machine learning methods. JA, FD, NC, JR, DC, and SS provided background knowledge to technology development. All authors contributed to the writing and finalization of the manuscript.


Funding

This research has been funded within the framework of the following project activities: ARIM (Autonomous Robotic Sea-Floor Infrastructure for Benthopelagic Monitoring; MarTERA ERA-Net Cofund); RESBIO (TEC2017-87861-R; Ministerio de Ciencia, Innovación y Universidades); JERICO-S3 (Horizon 2020; Grant Agreement no. 871153); ENDURUNS (Research Grant Agreement H2020-MG-2018-2019-2020 n.824348); Slovenian Research Agency (Research Core Funding Nos. P1-0237 and P1-0255 and project ARRS-RPROJ-JR-J1-3015). We also benefited from funding from the Spanish Government through the “Severo Ochoa Centre of Excellence” accreditation (CEX2019-000928-S) and from the Italian Ministry of Education (MIUR) under the “Bando premiale FOE 2015” (nota prot. N. 850, dd. 27 ottobre 2017) with the project EarthCruisers “EARTH’s CRUst Imagery for Investigating Seismicity, Volcanism, and Marine Natural Resources in the Sicilian Offshore”. Ocean Networks Canada was funded through Canada Foundation for Innovation-Major Science Initiative (CFI-MSI) fund 30199.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer XT declared a shared affiliation, though no other collaboration, with the authors DC and JA to the handling editor.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.


Acknowledgments

JA, JR, and DC are members of the Research Unit Tecnoterra (ICM-CSIC/UPC), providing the conceptual framework for technological development applied to ecosystem remote, autonomous, and long-lasting multidisciplinary monitoring. The authors wish to thank the two reviewers whose work has greatly improved the manuscript.



