Research Article
Corresponding author: Leho Tedersoo (bioleho@ut.ee). Academic editor: Thorsten Lumbsch
© 2016 Leho Tedersoo, Ingrid Liiv, Paula Ann Kivistik, Sten Anslan, Urmas Kõljalg, Mohammad Bahram.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Tedersoo L, Liiv I, Kivistik PA, Anslan S, Kõljalg U, Bahram M (2016) Genomics and metagenomics technologies to recover ribosomal DNA and single-copy genes from old fruit-body and ectomycorrhiza specimens. MycoKeys 13: 1-20. https://doi.org/10.3897/mycokeys.13.8140
High-throughput sequencing (HTS) has become a standard technique for genomics, metagenomics and taxonomy, but these analyses typically require large amounts of high-quality DNA that is difficult to obtain from uncultivable organisms including fungi with no living culture or fruit-body representatives. By using 1 ng DNA and low coverage Illumina HiSeq HTS, we evaluated the usefulness of genomics and metagenomics tools to recover fungal barcoding genes from old and problematic specimens of fruit-bodies and ectomycorrhizal (EcM) root tips. Ribosomal DNA and single-copy genes were successfully recovered from both fruit-body and EcM specimens typically <10 years old (maximum, 17 years). Samples with maximum obtained DNA concentration <0.2 ng µl-1 were sequenced poorly. Fungal rDNA molecules assembled from complex mock community and soil revealed a large proportion of chimeras and artefactual consensus sequences of closely related taxa. Genomics and metagenomics tools enable recovery of fungal genomes from very low initial amounts of DNA from fruit-bodies and ectomycorrhizas, but these genomes include a large proportion of prokaryote and other eukaryote DNA. Nonetheless, the recovered scaffolds provide an important source for phylogenetic and phylogenomic analyses and mining of functional genes.
Fungal fruit-bodies, low-coverage genome reconstruction, metagenome analysis, functional gene mining, Illumina HiSeq
DNA sequences of high quality are essential for precise molecular identification of organisms and construction of phylogenies. For these purposes, inclusion of type specimens of the species is of utmost importance, because they carry taxonomic information and anchor the target species amongst potentially multiple cryptic taxa (
Fungi represent one of the most diverse groups of eukaryotes with potentially millions of species and a high incidence of sympatric and allopatric cryptic species (
Other co-occurring organisms in voucher specimens may hamper molecular identification and genomic analyses of the target specimen. In living cultures, only endohyphal bacteria are common, but fruit-bodies are often infested with prokaryotes, protists, other fungi and meiofauna (nematodes, collembolans, Diptera larvae, etc.). Ectomycorrhizal (EcM) root tips and lesions on plant leaves are usually dominated by a single causal biotroph, although a vast diversity of microscopic organisms co-occurs (
DNA sequences from the nuclear ribosomal RNA cistron have been widely used, both for identification and phylogenetics of fungi due to a large number of copies and the level of conservation sufficient for discriminating between individuals (the intergenic spacer; IGS –
The rapid development of high-throughput sequencing (HTS) tools has greatly improved our understanding about the phylogeny, genome structure and functioning of fungi (
Using Illumina HiSeq 2x150 paired-end sequencing technology, we evaluate the usefulness of low-coverage genomics and metagenomics analyses for recovering barcoding and other phylogenetically informative genes from voucher specimens of fruit-bodies and mycorrhizas in 85 samples simultaneously. In particular, we aimed to i) develop a protocol for genomics and metagenomics from minute amounts of material; ii) evaluate the possibility to obtain high-quality rDNA and SCG sequence data from old type specimens and root tips; and iii) explain why fruit-bodies and EcM root tips of certain taxa consistently fail to amplify and sequence. The ultimate purpose of this study is to extend the public record of high-quality DNA sequences from taxonomically valuable fruit-body voucher specimens and EcM fungal lineages.
For genomics analysis, we selected 56 voucher specimens of fruit-bodies collected from all continents within the last 54 years (Table
Herbarium code | Identification, EcM lineage | Category | Collection date | Biosample |
---|---|---|---|---|
DAR69412 | Densospora nuda (holotype) | Old | 1989-08-19 | SAMN04578188 |
DAR69419 | Densospora nanospora (holotype) | Old | 1989-08-31 | SAMN04578189 |
DAR69421 | Densospora solicarpa (holotype) | Old | 1989-08-31 | SAMN04578190 |
DAR69441 | Endogone magnospora (holotype) | Old | 1991-09-25 | SAMN04578191 |
TAAM 042608 | Rutstroemia juglandis (holotype) | Old | 1961-xx-xx | SAMN04578222 |
TAAM 137803 | Sarconiptera vinacea (holotype) | Old | 2000-xx-xx | SAMN04578223 |
TAAM 159500 | Pseudotomentella atrofusca | Old | 1996-09-03 | SAMN04578235 |
TAAM 166877 | Tomentella ferruginea | Old | 1997-08-18 | SAMN04578245 |
TAAM 181146 | Bankera violascens | Old | 2001-09-25 | SAMN04578233 |
TAAM 182408 | Larissia pyrola (holotype) | Old | 1980-xx-xx | SAMN04578220 |
TAAM 190020 | Arctomollisia kolymensis (holotype) | Old | 1975-xx-xx | SAMN04578221 |
TAAM 194916 | Lasiomollisia phalaridis (holotype) | Old | 2003-xx-xx | SAMN04578224 |
TU100021 | Pseudotomentella sp. nov. | Old | 2004-11-03 | SAMN04578243 |
TU100364 | Odontia cf. fibrosa | Regular | 2006-08-04 | SAMN04578228 |
TU100621 | Amaurodon mustialaensis | Regular | 2006-09-28 | SAMN04578251 |
TU100663 | Sarcodon squamosus | Regular | 2006-10-06 | SAMN04578240 |
TU105081 | Thelephorales, Fam. nov. | Regular | 2006-03-05 | SAMN04578226 |
TU108047 | Pseudotomentella mucidula | Regular | 2008-08-27 | SAMN04578242 |
TU108089 | Phellodon tomentosus | Regular | 2008-09-10 | SAMN04578241 |
TU108144 | Tomentellopsis echinospora | Regular | 2008-09-27 | SAMN04578250 |
TU108291 | Tomentella sp. nov. | Regular | 2009-05-01 | SAMN04578247 |
TU108357 | Pseudotomentella armata, comb. ined. | Regular | 2009-05-08 | SAMN04578246 |
TU108377 | Thelephora terrestris | Regular | 2009-08-26 | SAMN04578229 |
TU108482 | Thelephorales, Fam. nov. | Regular | 2010-03-17 | SAMN04578248 |
TU110716 | Ceratobasidiaceae, /ceratobasidium1 | Regular | 2011-12-06 | SAMN04578167 |
TU110838 | Thelephorales, Fam. nov. | Regular | 2012-09-24 | SAMN04578168 |
TU113361 | Endogone | Unseq. | 2014-09-27 | SAMN04578192 |
TU115221 | Thelephorales, Fam. nov. | Regular | 2009-10-19 | SAMN04578249 |
TU115235 | Thelephorales, Fam. nov. | Old | 1997-06-12 | SAMN04578230 |
TU115270 | Pseudotomentella italica, comb. ined. | Regular | 2008-08-09 | SAMN04578244 |
TU115333 | Boletopsis leucomelaena | Regular | 2011-09-09 | SAMN04578187 |
TU115426 | Thelephorales, Fam. nov. | Regular | 2012-08-28 | SAMN04578172 |
TU116148 | Atheliales; /atheliales1 | Regular | 2013-01-14 | SAMN04578173 |
TU116208 | Cantharellus | Unseq. | 2013-07-15 | SAMN04578174 |
TU116326 | Helvella | Unseq. | 2013-09-19 | SAMN04578175 |
TU116380 | Helvella | Unseq. | 2013-10-13 | SAMN04578176 |
TU116400 | Helvella | Unseq. | 2013-11-16 | SAMN04578177 |
TU116448 | Pezizaceae | Unseq. | 2014-08-09 | SAMN04578178 |
TU116491 | Helvella | Unseq. | 2014-08-11 | SAMN04578169 |
TU116505 | Hydnum | Unseq. | 2014-08-11 | SAMN04578179 |
TU116506 | Cantharellus | Unseq. | 2014-08-11 | SAMN04578180 |
TU116517 | Helvella | Unseq. | 2014-08-11 | SAMN04578181 |
TU116528 | Clavulina | Unseq. | 2014-08-12 | SAMN04578182 |
TU116531 | Helvella | Unseq. | 2014-08-12 | SAMN04578171 |
TU116607 | Coltricia | Unseq. | 2014-08-12 | SAMN04578183 |
TU116615 | Helvella | Unseq. | 2014-08-12 | SAMN04578171 |
TU116680 | Endogone | Unseq. | 2014-10-20 | SAMN04578184 |
TU116699 | Glomus macrocarpum | Unseq. | 2014-10-21 | SAMN04578185 |
TU118650 | Hydnellum ferrugineum | Regular | 2012-08-28 | SAMN04578186 |
TU115206 | Pseudotomentella humicola | Old | 1997-xx-xx | SAMN04578231 |
TU123535 | Lenzitopsis oxycedri | Old | 1991-04-26 | SAMN04578232 |
TU100990 | Tomentella subamyloidea (isotype) | Old | 1999-08-24 | SAMN04578234 |
FP133500 | Pseudotomentella fumosa (holotype) | Old | 1972-11-16 | SAMN04578236 |
FP133849 | Pseudotomentella molybdea (holotype) | Old | 1974-11-06 | SAMN04578237 |
FP134609 | Pseudotomentella kaniksuensis (holotype) | Old | 1981-07-23 | SAMN04578238 |
SSMF695-4961 | Pseudotomentella griseopergamacea (holotype) | Old | 1961-10-21 | SAMN04578239 |
For the metagenomics approach, we selected 29 vouchered EcM root tip specimens from TU-linked collections of L. Tedersoo and M. Bahram (Table
Sample code | Identification and EcM lineage | Category | Collection date | Biosample |
---|---|---|---|---|
IO577 | Tulasnellaceae, /tulasnella1 | Rare | 2010-06-xx | SAMN04578193 |
KP016 | Serendipitaceae, /serendipita1 | Rare | 2011-07-xx | SAMN04578194 |
L3043d | Sebacina | Unseq. | 2006-08-xx | SAMN04578195 |
L3078g | Tulasnellaceae, /tulasnella2 | Rare | 2006-08-xx | SAMN04578196 |
L3136g | unidentified | Unseq. | 2006-08-xx | SAMN04578197 |
L3161g | Discinella | Unseq. | 2006-08-xx | SAMN04578198 |
L3185g | Inocybe | Unseq. | 2006-08-xx | SAMN04578199 |
L3196a | Discinella | Unseq. | 2006-08-xx | SAMN04578200 |
L3196g | Discinella | Unseq. | 2006-08-xx | SAMN04578201 |
L3273b | Helotiales, /helotiales5 | Rare | 2006-08-xx | SAMN04578202 |
L3289 | Helotiales, /helotiales4 | Rare | 2006-08-xx | SAMN04578203 |
L3371b | Helotiales, /helotiales3 | Rare | 2006-08-xx | SAMN04578204 |
L3581g | Helotiales, /helotiales6 | Rare | 2006-12-xx | SAMN04578205 |
L3619g | Endogonales, /densospora | Rare | 2006-12-xx | SAMN04578206 |
L7664 | Sordariales, /sordariales1 | Rare | 2010-03-xx | SAMN04578207 |
L8253 | Pyronemataceae, /pyronemataceae1 | Rare | 2010-07-xx | SAMN04578208 |
L8574J | Tomentella | Unseq. | 2013-05-16 | SAMN04578209 |
L8601L | Pyronemataceae, /pyronemataceae2 | Rare | 2013-06-10 | SAMN04578210 |
L8623J | Helvella | Unseq. | 2013-06-11 | SAMN04578211 |
L874 | Helotiales, /helotiales2 | Rare | 2005-07-xx | SAMN04578212 |
L8748B | Helotiales, /helotiales7 | Rare | 2013-07-03 | SAMN04578213 |
L8760B | Sordariales, /sordariales2 | Rare | 2013-07-04 | SAMN04578214 |
L8970d | Tricholoma fulvum | Unseq. | 2013-08-12 | SAMN04578215 |
L9188J | Tulasnella | Unseq. | 2013-09-20 | SAMN04578216 |
L9238J | Fischerula macrospora | Unseq. | 2013-09-22 | SAMN04578217 |
L9302J | Geopora | Unseq. | 2013-10-08 | SAMN04578218 |
N120 | Ceratobasidiaceae, /ceratobasidium2 | Rare | 2008-09-xx | SAMN04578219 |
TRON3.1 | Agaricomycetes, /agaricomycetes1 | Rare | 2012-04-xx | SAMN04578225 |
TS1000 | Pyronemataceae, /genea-humaria | Rare | 2006-08-xx | SAMN04578227 |
The DNA concentration of all samples was measured using the Qubit dsDNA HS Assay Kit (Life Technologies, Carlsbad, CA, USA) and a Qubit 2.0 Fluorometer (Invitrogen, Carlsbad, CA, USA) in January 2015. Since the DNA concentration of most samples was <1 ng µl-1, the DNA (300 µl) was concentrated up to three times using 750 µl 96% ethanol, 2 µl Pellet Paint Co-Precipitant (cat no 69049–3; Novagen, Madison, WI, USA) and sodium acetate (0.3 M, pH 5.2). DNA precipitation was performed overnight at -20 °C. The pellets were washed once with 75% ethanol (-20 °C) and dissolved in MilliQ water, followed by re-determination of the concentration. The obtained ‘maximum concentration’ ranged from 0.05 to 8.13 ng µl-1 (median, 0.57 ng µl-1). All samples were diluted to a concentration of 0.2 ng µl-1 (if below, the maximum concentration was used) and 1 ng of DNA was used as input to prepare sequencing libraries with the Nextera XT kit (Illumina Inc., San Diego, CA, USA) according to the manufacturer's instructions. The concentration of the libraries was measured with the Qubit fluorometer and the libraries were pooled equimolarly. The library pools were concentrated by vacuum evaporation and then validated by TapeStation analysis (Agilent Technologies, Santa Clara, USA) and qPCR with the Kapa Library Quantification Kit (Kapa Biosystems, Wilmington, MA, USA) in order to optimize cluster generation. From each library, 22 pg or 54 pg (dilute samples) of DNA was used in cluster generation and sequenced on a HiSeq2500 rapid flowcell (Illumina Inc.) with the 150 bp paired-end reads protocol.
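The dilution rule described above can be illustrated with a minimal sketch. It assumes a fixed 5 µl input volume per library; this value is our assumption, chosen only because 5 µl at 0.2 ng µl-1 gives the stated 1 ng, and the text itself does not report the volume. The function name is hypothetical.

```python
TARGET_CONC = 0.2   # ng/ul, working concentration stated in the text
INPUT_VOLUME = 5.0  # ul, ASSUMED: 5 ul x 0.2 ng/ul = 1 ng of input DNA


def library_input(max_conc_ng_per_ul):
    """Return (working concentration in ng/ul, DNA input in ng) for one sample.

    Samples at or above the target are diluted to the target concentration;
    samples below it are used undiluted at their maximum concentration.
    """
    if max_conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    working = min(max_conc_ng_per_ul, TARGET_CONC)
    return working, working * INPUT_VOLUME


# Lowest, median and highest 'maximum concentration' reported in the text
for conc in (0.05, 0.57, 8.13):
    working, mass = library_input(conc)
    print(f"max {conc:.2f} ng/ul -> working {working:.2f} ng/ul, input {mass:.2f} ng")
```

Under this assumption, any sample whose maximum concentration stays below 0.2 ng µl-1 enters library preparation with less than 1 ng of DNA.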
The metagenomics reads of individual samples were demultiplexed and quality-filtered using sdm script of the Lotus pipeline (
The reference database for genomic and metagenomic fragments comprised 46 fungal genomes and 30 bacterial genomes (present in samples according to rDNA analysis). For the selected SCGs, we used a reference data set of
To evaluate the relative performance of genomics and metagenomics approaches for recovering genetic information of fungi from root tip and fruit-body material of different quality, we constructed linear regression and ANOVA models. First, we tested the effects of the maximum DNA concentration, age of specimen and age of DNA as well as DNA extraction method on the number of reads, size of all scaffolds (confirmed fungal and total and proportion of known fungal) and the longest scaffolds representing rDNA by use of general linear models and forward selection of variables as implemented in Statistica (Statsoft Inc., Tulsa, OK, USA). We determined Pearson correlations among the recovered length of ribosomal and mitochondrial rDNA and SCGs. Further, we arbitrarily chose a threshold of 1500 bases as a criterion for ‘successful’ sequencing of a barcode, because this value roughly corresponds to the size of mitochondrial SSU and LSU, nuclear SSU and the fragment of commonly amplified nuclear LSU (primers ITS3 and LR5 or LR0R and LR7) as well as SCGs. Differences in sequencing success among markers, sample material (fruit-body vs EcM) and fruit-body type (‘old’, ‘regular’ and ‘unsequenced’, see above) were tested using a series of Fisher’s exact tests.
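As an illustration of the last step, the sketch below classifies samples by the 1500-base threshold and compares two groups with a two-sided Fisher's exact test. It is not the authors' implementation (their software for this test is not stated); the function names are hypothetical and the scaffold lengths are made-up values used only to show the calculation.

```python
from math import comb

THRESHOLD = 1500  # bases; criterion for 'successful' sequencing from the text


def is_success(longest_scaffold_len):
    """A marker counts as recovered if its longest scaffold exceeds the threshold."""
    return longest_scaffold_len > THRESHOLD


def success_table(lengths_a, lengths_b):
    """Build the 2x2 table (success_a, fail_a, success_b, fail_b)."""
    a = sum(is_success(x) for x in lengths_a)
    c = sum(is_success(x) for x in lengths_b)
    return a, len(lengths_a) - a, c, len(lengths_b) - c


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # small relative tolerance guards against floating-point ties
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical longest nuclear rDNA scaffold lengths (bases) for two groups
old = [300, 5200, 800, 1200, 6100, 450, 900]
regular = [5800, 6400, 1400, 7000, 6900, 5500, 6200]
print(fisher_exact_two_sided(*success_table(old, regular)))
```

The exact test is preferred over a chi-square approximation here because the per-category sample sizes are small.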
To shed light on the potential issues with DNA secondary structure on amplification and sequencing success in Sanger sequencing, we calculated the minimum free energy (MFE) of the secondary structure of ITS1 and ITS2 reads using RNAstructure (default options for DNA;
DNA extraction methods yielded similar DNA content and concentration that usually required further concentrating efforts given the small size of our samples. Compared with other methods, the simple ammonium sulphate lysis (cf.
Effect of specimen age on the recovery of reads in the Illumina HiSeq run. Closed circles, ‘old’ fruit-bodies; shaded circles, ‘regular’ fruit-bodies; open circles, ‘unsequenced’ fruit-bodies; shaded triangles, ectomycorrhizal root tips representing unique rare lineages; open triangles, ‘unsequenced’ ectomycorrhizal root tips.
The proportion of genomic and metagenomic sequences belonging strictly to fungi varied greatly across samples, being on average three times lower for EcM root tip (median, 1.5%; SD, 3.7) compared with fruit-body (median, 4.5%; SD, 15.9) samples (F1,82=10.8; R2=0.098; P=0.001). The lack of closely related reference genomes clearly hampered unequivocal assignment of genomic fragments to fungi or other organisms. Of these, bacteria were the most common organisms in fruit-bodies and EcM root tips, whereas plant scaffolds strongly contributed to the EcM-derived metagenome. However, plant contribution was difficult to establish, because of the large size and ample non-coding regions in plant genomes. Samples of old fruit-bodies and particularly EcM root tips included multiple co-inhabiting fungal species. Their coverage was distinctly lower than that of the target species, but unambiguous separation of these satellite taxa was more difficult for relatively fragmented genomes.
The coverage of nuclear and mitochondrial DNA and SCGs and their ratio varied greatly across samples independent of sample origin (fruit-body vs. EcM) and category (Suppl. material
Among the target regions of fruit-body and EcM samples, nuclear rDNA was recovered more efficiently than mitochondrial rDNA, and both were sequenced with greater success than SCGs (P<0.01 in all cases). There was no difference in the recovery rate among individual SCGs (P>0.5), although RPB1 was completely missing in two samples (ectomycorrhiza of the /genea-humaria lineage TS1000 and Sarcodon squamosus TU100663) that exhibited nearly full-length recovery of other SCGs and rDNA. Unlike other taxa, most specimens belonging to Thelephorales contained two highly divergent copies of the TEF1 gene.
The SCGs were significantly less efficiently recovered in EcM samples compared with fruit-body samples (by a factor of 1.9 to 6.2; P<0.001), but the recovery of nuclear and mitochondrial rDNA was comparable between sample types (P>0.1). Across all samples, the maximum DNA concentration (partial effect: F1,82=26.7; R2=0.201; P<0.001; Fig.
Within fruit-body collections, rDNA and SCGs were better recovered from ‘regular’ and recent ‘unsequenced’ collections than ‘old’ material (Suppl. material
Fruit-body samples displayed great variation in genomic sequencing success. The ‘old’ samples were sequenced most poorly: nuclear rDNA >1500 bases could be retrieved for only 43.0% of specimens, significantly fewer than for ‘unsequenced’ (73.3%) and ‘regular’ (84.2%) specimens (P<0.01). Mitochondrial rDNA and SCGs were also relatively poorly recovered in ‘old’ collections, although the differences among the categories were less pronounced (0.01<P<0.15).
The HTS approach highlighted that primer bias and atypically long ITS markers may account for the Sanger sequencing problems in ‘unsequenced’ fruit-body samples. In particular, several Helvella spp. and Cantharellus spp. exhibited ITS1 markers of 500-600 bases that exceed the average values three-fold (
While most collections of stipitate fruit-bodies were relatively free from co-colonization by other fungi, specimens of Helvella and those with hypogeous and resupinate fruit-bodies were commonly inhabited by multiple putatively saprotrophic or mycoparasitic fungal taxa. Of these, Tulasnella, Rhizoctonia (syn. Ceratobasidium) and unidentified genera of Eurotiales and Sordariales were the most common. Their nuclear rDNA scaffolds were of relatively lower coverage even if the sequences were nearly full-length. Similar patterns but notably shorter satellite sequences were evident in mitochondrial rDNA (up to 2000 bases) and SCGs (up to 500 bases).
There were no differences in rDNA and SCG recovery between EcM root tip samples that previously failed determination and those representing rare lineages. Out of 12 previously unidentified EcM root tip samples, only one (L3136g) remained unidentified owing to its low maximum DNA concentration (0.07 ng/µl) and hence low number of retrieved sequences (173,554 reads). Based on the ITS region, the Tasmanian sequences were identified as Sebacina sp. (L3043d), Inocybe australiensis (L3185g), and Discinella sp. (/helotiales4 lineage; L3161g, L3196a, L3196g). The Estonian sequences were identified as Tomentella sp. (L8574J), Helvella sp. (L8623J), Geopora sp. (L9302J), Tulasnella sp. (L9188J), Tricholoma fulvum (L8970d) and Fischerula macrospora (L9238J) based on the full or partial ITS sequences (Suppl. material
Using the metagenomics approach, three out of 17 EcM root tips with successful Sanger sequences (L848, L8601, L8760b) failed to retrieve high-quality nuclear rDNA sequences >1500 bases. A single EcM fungus always dominated in nuclear rDNA, but the samples were often co-inhabited by a myriad of ascomycetes, in particular Helotiales, Sordariales, Hypocreales and Dothideales. Basidiomycetes were less common, although Tulasnella, Ceratobasidiaceae and Tremellales (Cryptococcus) occurred in multiple samples. The ratio of plant to fungal nuclear rDNA varied nearly 80-fold, ranging from 0.21 to 16.3 (median, 1.76) with no apparent differences among host taxa.
Across all 29 EcM root tip metagenomes, fungal TEF1, RPB1 and RPB2 scaffolds >1500 bases were successfully obtained for two, five and fourteen samples, respectively. For 13 samples, none of these SCGs were recovered (scaffolds <500 bases). In successfully sequenced EcM samples, individual SCGs typically occurred in several scaffolds located tens to a few hundred bases apart based on mapping to the alignment. BlastN searches against INSDC and comparisons with rDNA revealed that the largest scaffolds belong to the targeted mycobiont. The co-occurrence of other fungi rendered the taxonomic assignment of the remaining SCG scaffolds ambiguous.
The two highly complex soil metagenomes comprised altogether four fungal nuclear rDNA scaffolds >500 bases in size, three of which were obvious chimeras. The mock community sample included 25 scaffolds encompassing ITS or any of the nuclear rDNA genes (>500 bases). Comparisons with respective Sanger sequences revealed that 32% of these sequences were chimeric, some of which comprised >2 parents. Two of the chimeric sequences were ‘circular’, i.e. consisting of a full-length rDNA and fragments of another taxon at one of the ends. Most of the chimeric breaks were located in the conserved regions of the 3’ half of the SSU and the 5’ end of the LSU, but none were evident in the 5.8S rRNA gene. SSU and LSU of certain congeneric taxa (Lyophyllum spp., Tomentella spp.) were represented by a consensus sequence that matched none of the constituent specimens perfectly. In scaffolds with lower coverage, the 5’ or 3’ ends were sometimes highly diverged from the corresponding Sanger sequence or any database sequences, indicating that artefactual sequences are, to some extent, generated by metagenomics methods.
We recovered partial fungal genomes and metagenomes from <1 ng DNA of fruit-body and EcM root tip samples with variable success, depending on specimen age and DNA quality (see below). This indicates that fungal genomes can be sequenced from minute amounts of DNA if sufficient quality is secured. The current genome sequencing protocols in the 1000 Fungal Genomes project require three to four orders of magnitude more DNA (http://genome.jgi.doe.gov/programs/fungi/1000fungalgenomes.jsf), which cannot be obtained from tiny samples. In comparison, the genomes of prokaryotes are on average ten times smaller and these have been successfully recovered from common species (upwards of 1% relative abundance) in complex environmental material (
Our study aimed to recover the most important genetic markers used for barcoding and phylogenetic reconstruction. Nuclear and mitochondrial rDNA sequences were successfully recovered from most fresh and high-quality samples but typically not from fruit-body specimens >10 years old. For these old specimens, the maximum obtained DNA concentration, a proxy for DNA quality and quantity, remained <0.2 ng/µl. Although other DNA samples were further diluted to this level for library preparation, barcoding markers could usually not be obtained from samples with 0.05-0.2 ng/µl maximum DNA concentration. Because the Nextera approach uses DNA fragmentation and 12 cycles of PCR in the ligation step (‘tagmentation’), the short DNA molecules of degraded material (
Across all samples, nuclear and mitochondrial rDNA were more efficiently recovered compared with SCGs, which reflects the results from amplicon sequencing (
We sought to uncover why certain fungal species and EcM morphotypes have remained unidentified using direct Sanger sequencing of amplicons. We showed that EcM root tip DNA was degraded and/or comprised multiple fungal species, which may have prevented direct Sanger sequencing. In fruit-body samples, excessive length of the ITS1 sequence might have caused low amplification success in several Cantharellus spp. and Helvella spp. Due to rapid evolution of rDNA genes in Cantharellus (
Ribosomal DNA scaffolds from soil and mock community metagenomes indicated artificial generation of a high proportion of chimeric scaffolds during DNA assembly. This demonstrates that markers with long conserved regions such as nuclear rDNA cannot be reliably assembled even in simple fungal communities. Furthermore, artificial consensus sequences were generated for closely related species with nearly identical SSU and LSU. While such artefacts can be relatively easily tracked in mock communities, metagenomic assembly of rDNA is particularly problematic for natural samples from more complex substrates that comprise hundreds to thousands of fungal species. Due to short scaffolds and the paucity of reference data, we cannot estimate the reliability of scaffold assembly in mitochondrial genes and SCGs, but this may be more problematic with closely related species. Such assembly problems are considered of minor importance in prokaryote metagenomes (
Taxonomically informative rDNA genes and SCGs can be sequenced from <1 ng DNA of fruit-body and EcM root tip specimens using genomics and metagenomics approaches, respectively. However, fruit-body specimens >10 years old need specific care for obtaining high-quality DNA or require fragmentation-free options for ligation. HTS methods also enabled us to recover large fragments of fungal genomes for a majority of EcM root tips and fruit-bodies that could not be sequenced using the Sanger method or that represented unique (including type) material. For high-quality DNA samples, two million (meta)genomic reads were sufficient to recover the full-length nuclear rDNA. Recovery of SCGs was more unpredictable among samples, requiring roughly 10 million unpaired reads. This enables sequencing of ca. 50 fungal genomes on a single 2x150 paired-end Illumina HiSeq run at low coverage (5-10 ×; cf.
The currently available sequence length and error rate combination does not allow reliable large-scale assembly of genetic information of eukaryotes from complex communities using a single HTS platform. Besides tens and hundreds of millions of Illumina HiSeq reads, metagenomics analyses would benefit from additional low-coverage sequence analysis of long (up to 3000 bases at 5-8 times circular coverage) fragments as routinely implemented by Pacific Biosciences for in-depth genomic reconstructions. Long amplicon-free backbone sequences reduce the incidence of chimeras and assembly artefacts. Combined with targeted marker capture, this approach would allow greater throughput of eukaryote target genes and more efficient utilization of phylogenetics tools in metabarcoding and community-level functional metagenomic analyses.
LT planned and designed research; UK provided material; IL and PAK performed laboratory analyses; MB, LT and SA analysed data; LT wrote the manuscript with others’ input.
This work is funded by the Estonian Science Foundation grants 9286, 171PUT, and EMP265. We thank I. Saar and K. Pärtel for providing some of the specimens, DNA extracts and associated metadata. We are grateful to four referees for their constructive comments on earlier versions of the manuscript.
Full information and metadata about the genomic and metagenomic samples
Data type: table
Explanation note: Detailed information about metadata, DNA quality and genomic/metagenomic results of fruit-body and EcM root tip samples.