Shotgun metagenomes and multiple primer pair-barcode combinations of amplicons reveal biases in metabarcoding analyses of fungi

Rapid development of high-throughput (HTS) molecular identification methods has revolutionized our knowledge about taxonomic diversity and ecology of fungi. However, PCR-based methods exhibit multiple technical shortcomings that may bias our understanding of the fungal kingdom. This study was initiated to quantify potential biases in fungal community ecology by comparing the relative performance of amplicon-free shotgun metagenomics and amplicons of nine primer pairs over seven nuclear ribosomal DNA (rDNA) regions often used in metabarcoding analyses. The internal transcribed spacer (ITS) barcodes ITS1 and ITS2 provided greater taxonomic and functional resolution and richness of operational taxonomic units (OTUs) at the 97% similarity threshold compared to barcodes located within the ribosomal small subunit (SSU) and large subunit (LSU) genes. All barcode-primer pair combinations provided consistent results in ranking taxonomic richness and recovering the importance of floristic variables in driving fungal community composition in soils of Papua New Guinea. The choice of forward primer explained up to 2.0% of the variation in OTU-level analysis of the ITS1 and ITS2 barcode data sets. Across the whole data set, barcode-primer pair combination explained 37.6–38.1% of the variation, which surpassed any environmental signal. Overall, the metagenomics data set recovered a similar taxonomic overview, but resulted in much lower fungal rDNA sequencing depth, inability to infer OTUs, and high Copyright Leho Tedersoo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. MycoKeys 10: 1–43 (2015) doi: 10.3897/mycokeys.10.4852 http://mycokeys.pensoft.net A peer-reviewed open-access journal


introduction
Fungi are one of the most diverse kingdoms of life on Earth (Blackwell 2011). Traditionally, fungi have been identified based on morphological characters of fruit-bodies or pure cultures in agar medium. Because most fungal species do not seem to form conspicuous fruit-bodies or are difficult to isolate into culture, molecular methods have been widely implemented to shed light onto the occurrence and ecological patterns of these fascinating organisms. In particular, high-throughput sequencing (HTS) technologies enable documentation and characterization of hundreds to thousands of species simultaneously from biological samples. These methods have been used successfully to uncover diversity and community composition of fungi in various substrates and localities, including little-studied habitats such as mangroves, aquatic ecosystems, and arctic soils (Arfi et al. 2012;Blaalid et al. 2013;Livermore and Mattes 2013). Similarly to other amplicon-based methods, HTS techniques are sensitive to PCR and primer biases (Lindahl et al. 2013). At least in theory, ultra-HTS technologies such as Illumina HiSeq may provide sufficient sequencing depth to detect barcode sequences from the entire metagenome as demonstrated for prokaryotes and soil animals (Zhou et al. 2013;Logares et al. 2014). The genomes of eukaryotes are typically several orders of magnitude larger compared to bacterial genomes such that the relative proportion of ribosomal DNA in the metagenome will be lower. This may complicate their detection without enrichment.
The internal transcribed spacer (ITS) region of the nuclear ribosomal DNA (rDNA) is the formal barcode for molecular identification of fungi . Alternative ribosomal DNA markers located in the nuclear rDNA large subunit (LSU/28S) and small subunit (SSU/18S) genes are often used to address phylogenetic diversity of fungi, especially in groups such as the former Zygomycota and Chytridiomycota (the so-called early diverging lineages) and Glomeromycota. Taxa belonging to the little-studied phyla Cryptomycota and Chytridiomycota as well as groups comprising mostly unicellular species are difficult to place phylogenetically based on variable markers such as the ITS region due to high genetic divergence and paucity of reference sequences (Ishii et al. 2015). Because taxonomic resolution strongly depends on the choice of barcodes (Arfi et al. 2012;Schoch et al. 2012;Kohout et al. 2014), the use of different markers and analysis methods renders studies on fungal biodiversity largely incomparable. In addition, alternative primers targeting the same barcode are known to differ in the recovery of taxa both in silico (Bellemain et al. 2010) and in vivo (Tedersoo et al. 2010;Op de Beeck et al. 2014).
Discussion related to potential taxonomic biases in relation to the class Archaeorhizomycetes due to the choice of primers in our global soil analysis  Rosling in press; C.W. Schadt and A. Rosling, pers. comm.) motivated us to undertake this study. Here we address potential biases in diversity and taxonomic composition of fungi in relation to PCR-free approaches and barcode and primer pair combinations of amplicons. We specifically aim to answer the following questions: i) do amplicon-free and PCR-based techniques reveal comparable results?; ii) are there substantial primer biases?; iii) to what extent does the recovered taxonomic richness differ among selected barcode-primer pair combinations?; and iv) do markers differ in their ability to recover ecological patterns in community-level analyses? This study seeks to evaluate the suitability of amplicon-free HTS technologies in fungal metabarcoding and to elaborate on the most suitable barcodes and primers for ampliconbased HTS studies by individual research groups and global metabarcoding consortia based on in vivo and in silico analyses.

Sampling and molecular analysis
Between 5 and 30 November 2011, 34 composite soil samples were collected from woody plant-dominated ecosystems in Papua New Guinea (PNG) following a standard protocol . Briefly, two soil cores (5 cm diam. to 5 cm depth) were collected at a distance of 1-1.5 m from 20 randomly selected tree trunks in each of the 2500-m 2 plots. These 40 soil cores from each plot were pooled and subjected to DNA extraction, analysis of soil chemistry, and assignment of climatic variables as described in Tedersoo et al. (2014). For in-depth molecular analysis, we selected 26 samples that represented various ecosystems in PNG and that had relatively well-preserved and inhibitor-free DNA (Suppl. material 1).
To address barcode and primer biases, we selected seven barcodes in SSU (variable domains V4 and V5), ITS (ITS1 and ITS2), and LSU (variable domains D1, D2, and D3) of the nuclear rDNA ( Figure 1). These short barcodes represent the most common marker choices in cloning and metabarcoding studies of fungi either alone or combined within a longer stretch of rDNA. To amplify these barcodes, we selected the most commonly used primers for each respective marker, targeting primer pairs that produce amplicons of 250-400 nucleotides on average. For the ITS1 and ITS2 barcodes, we used alternative forward primers to further quantify potential biases arising from primer choice. Due to a lack of suitable primers between the variable D1 and D2 domains of LSU, we designed the LF402 primer and its reverse complement LF402F that are relatively specific to fungi compared to other eukaryotes (Appendix 1). For the V4 domain of SSU, we generated a reverse primer Euk742R that covers a wide range of eukaryotes including the most common fungal groups. In addition to previous modifications to the ITS3 and ITS4 primers (ITS3tagmix and ITS4ngs, respectively;Tedersoo et al. 2014), we also optimized the commonly used SSU515F, SSU1196R, ITS1, ITS1F, and LR0R primers to improve their taxonomic coverage and to balance their melting temperature compared to the respective reverse primers. All primers are detailed in Table 2 and their relative positions are indicated in Figure  1 and on the UNITE homepage (https://unite.ut.ee/primers.php). We further evaluated the suitability of these primers through in silico-based assays on SSU and LSU data sets of SILVA release 119 (Quast et al. 2013), the SSU, ITS and LSU data sets of Bellemain et al. (2010), and the ITS reference data set of UNITE ). In addition to manual comparison of primer and template sequences, we used EcoPCR (www.grenoble.prabi.fr/trac/ecoPCR) for SSU and LSU. The mismatched primer variants were further BLASTn-searched against the International Nucleotide Sequence Database consortium (INSDc) to determine the natural frequency of inferred mismatches and to distinguish these from rare sequencing errors. The primer mismatches with fungal templates are outlined in Appendix 1.
A reverse or forward primer for each barcode was supplemented with one of the sixteen 10-base identifier tags (Table 1). We used a mock community sample with known composition of 24 species (Suppl. material 2) and a soil sample from Alaska as positive controls. We also added negative controls and spiked two soil samples (G2655 and G2658) with 2% genomic DNA from Archaeorhizomyces finlayi (isolate Ny10) and A. borealis (isolate 600) to screen accumulation of their sequences in primer pairs that have no or two mismatches (ITS4ngs and LR0Rngs) to the Archaeorhizomycetes. The PCR cocktail consisted of 0.6 µl DNA extract, 0.5 µl each of the primers (20 pmol), 5 µl 5xHOT FIREPol Blend Master Mix (Solis Biodyne, Tartu, Estonia), and 13.4 µl double-distilled water. PCR was carried out in four replicates using the following thermocycling conditions: an initial 15 min at 95 °C, followed by 30 cycles at 95 °C for 30 s, 55 °C for 30 s, 72 °C for 1 min, and a final cycle of 10 min at 72 °C. Four replicate PCR products were pooled and their relative quantity was estimated by running 5 µl amplicon DNA on 1% agarose gel for 15 min. DNA samples yielding no visible band were re-amplified using 35 cycles in an effort to obtain sufficient PCR product, whereas samples with a very strong band were re-amplified with only 25 cycles. Amplicons were subjected to quantity normalization with a SequalPrep Normalization Plate Kit (Invitrogen, Carlsbad, CA, USA) following the manufacturer's instructions. Normalized amplicons were subjected to ligation of Illumina adaptors using two variants of the TruSeq DNA PCR-free HT Sample Prep kit (Illumina Inc., San Diego, CA, USA). All samples were sequenced in a single Illumina MiSeq 2×300 paired-end run. HTS sequences are available in Short Read archive under accession SRP055957.
Selected soil samples (Suppl. material 1) were also subjected to shotgun sequencing of the full metagenome. DNA was fragmented to 300-400 bases on average with the Covaris S2 instrument (Covaris Ltd, Woburn, MA, USA). Samples were prepared for sequencing with the TruSeq Nano DNA HT Library Prep Kit (Illumina Inc.) accor- ding to the instructions of the manufacturer. These samples were sequenced using the Illumina HiSeq 2×250 paired-end protocol covering 80% of a full plate.

Bioinformatics
Paired-end sequencing (2×300 bp) in the Illumina MiSeq sequencer resulted in 12,771,565 reads. LSU and SSU amplicons were paired, quality filtered, and demultiplexed using the LOTUS pipeline (Hildebrand et al. 2014) with the following options: overlap >9 bases, average quality >28, no DNA ambiguities allowed, no differences to primer+identifier allowed, homopolymer length <9, and minimum sequence length >199 bases. Primers and tags were removed and reads were trimmed to 200 bases covering the variable domain, and clustered with cd-hit 4.6.1 (Fu et al. 2012) at 97% sequence similarity into operational taxonomic units (OTUs). Chimeric OTUs were detected using the UCHIME de novo and reference-based algorithms as implemented in USEARCH 7 (Edgar et al. 2011). The SILVA 119 data set was used as reference corpus. The ITS reads were quality filtered using MOTHUR 1.33.3 (Schloss et al. 2009) with the following options: average quality over 15 nucleotide window >30, no ambiguities allowed, and homopolymer length unrestricted. The remaining reads were paired using the PANDASEQ assembler (Masella et al. 2012) with minimum overlap of 25 bases and demultiplexed using mothur (no differences to primer+identifier allowed, minimum sequence length >200 bases). ITS1 and ITS2 sequences were processed in ITSx 1.0.9  to remove flanking 18S, 5.8S and 28S rRNA gene fragments. Potential chimeras were removed as described above, using the UNITE-UCHIME release as reference (Nilsson et al. in press). The few reads containing primer sequences were excluded using Unix commands. The quality-filtered ITS1 and ITS2 sequences were clustered as described above.
Following exclusion of singletons from all HTS data sets (cf. Brown et al. 2015), representative sequences were selected for each OTU based on the greatest similarity to the consensus sequence. These sequences were subjected to taxonomic assignments using the NAÏVE BAYESIAN CLASSIFIER (NBC; Wang et al. 2007) and BLASTn (options: word size=7, gap penalty=-1, gap extension penalty=-2 and match score=1) searches retrieving 10 best BLASTn matches. For both methods, The SILVA 119 eukaryote data set and the INSDc fungal data set served as references for SSU and LSU. The UNITE 7.0beta data set (https://unite.ut.ee/repository.php) and the INSDc eukaryote data set were used as a reference for BLASTn-and NBC-based taxonomic assignment of ITS, respectively.
Because of differential taxonomic resolution among the barcodes, we used the taxonomic assignments of both NBC and BLASTn searches to complement each other as both methods alone provided no assignment for ca 40% of OTUs due to poor representation of fungal data in SILVA, obvious misidentifications in INSDc, and great abundance of taxonomically unassigned sequence data (resulting in poor reso-lution using NBC). To optimize classification, we therefore combined and verified results of different methods and determined approximate E-value and sequence similarity thresholds for robust identification to phylum or class level for each barcode. For the SSU barcodes, we determined that sequences with a BLASTn E-value <e -80 and >90% similarity to verified fungal sequences could be reliably assigned to the fungal kingdom. Sequence similarities at or greater than 95% served to provide class-level assignments. Taxa with short introns in the SSU V5 barcode (primers pair SSU817F-SSU1196Rngs) such as some members of Pezizomycotina and Agaricomycetes exhibited more variation, and these groups were therefore manually evaluated for class-level affinity. For ITS1, ITS2, and LSU D2 (primers LF402Fmix-TW13), we determined that BLASTn E-values <e -50 and sequence similarity >75% over >70% sequence length allowed robust assignment to the fungal kingdom. For individual classes, we relied on an 80% sequence similarity threshold, except the early diverging lineages, Archaeorhizomycetes, and Cantharellales, where we used 75% sequence similarity. For the LSU D1 barcode (LR0Rngs-LF402 primers), BLASTn E-values <e -80 and sequence similarity above 80% were sufficient to consider sequences to be fungal, whereas similarity >85% was indicative of class-level affiliation. For the D3 barcode of LSU (LR3R-LR5 primer pair), BLASTn E-value <e -100 and sequence similarity >85% to fungi was strongly suggestive of fungal origin, and sequence similarity >90% allowed placement to classes. Many Leotiomycetes and Eurotiomycetes possessed introns close to the LR5 primer site, which allowed identification of these groups at >80% similarity level. For all barcodes, phylogenetic placement of BLASTn matches and NBC were highly concordant, except several instances in Glomeromycota, Microbotryomycetes, and Tremellomycetes. For these groups, we relied on the 10 best BLASTn matches.
We followed the taxonomy of INSDc, except raising several early diverging lineages to phylum rank (cf. Tedersoo et al. 2014). The OTUs of ITS1 and ITS2 barcodes were assigned to EcM lineages following Tedersoo and Smith (2013). For several common lineages such as /hebeloma-alnicola, /cortinarius, /cenococcum and /meliniomyces, SSU and LSU barcodes provided insufficient discrimination from non-EcM relatives such that it was impossible to reliably assign ecological categories to SSU and LSU data.
For the shotgun metagenome data, samples were demultiplexed, and LSU and SSU regions of all organisms were extracted using SORTMERNA (Kopylova et al. 2012). Ribosomal DNA sequences were classified against the SILVA 119 reference database (SSU, ITS, and LSU) and UNITE 7.0beta (ITS region). We also generated sets of 10 best BLASTn matches against the INSDc fungal data set. We manually checked conflicting assignments as well as those with <95% similarity to fungal sequences to improve phylum-and class-level classification.
To understand potential amplification biases related to sequence length in the ITS1 and ITS2 barcodes, we downloaded all ITS sequences of the 16 most common fungal classes (based on our amplicon data) from UNITE 7.0beta data set. Ribosomal RNA genes flanking the ITS1 and ITS2 barcodes were trimmed using ITSx 1.0.9. Average and median values and standard deviations were calculated for each group.

Statistical analyses
For OTU-based statistical analyses, we removed all non-fungal sequences and rarefied all amplicon samples to a depth of 8609 sequences using MOTHUR. This depth represents the median number of sequences of the ITS1 (ITS1ngs-ITS2 primers) barcode that was the second lowest among all markers ( Table 2). The LSU D2 barcode comprised several times less sequences, and this marker was therefore removed from comparisons of taxonomic richness and statistical performance of barcodes, but kept for comparisons of taxonomic coverage. Soil sample G2653 was poorly covered by sequence data for three barcodes (<1000 sequences) and was therefore omitted from all analyses. Samples G2666 (SSU V5 barcode), G2677 (SSU V4 barcode), and G2678 (LSU D3 barcode) were variously poorly represented in single data sets. Their values were treated as missing data in the statistical analyses.
OTU accumulation curves and their 95% confidence intervals were computed for ITS1 and ITS2 barcodes using ESTIMATES 9.1.0 (Colwell 2013). To further address whether ITS1Fngs and ITS1ngs or ITS3tagmix and gITS7 primers exhibit additive performance (i.e., recover more OTUs when combined), we pooled the primer pair data by samples and rarefied these combined samples to 8609 sequences, followed by inference of OTU accumulation curves as described above.
Differences in OTU richness among samples and barcode-primer combinations were evaluated based on two-way main-effect ANOVAs supplemented with Unequal n HSD tests for multi-level comparisons. To address the relative performance of the eight barcode-primer pair combinations in recovering the role of spatial, edaphic, floristic, and climatic predictors on fungal community composition, we performed multivariate permutational ANOVAs as implemented in the ADONIS routine of the vegan package of R (R Core Development Team 2013). Geographical coordinates were translated into Principal Coordinates of Neighbouring Matrices (PCNM) vectors with soil element concentrations being logarithm-transformed prior to analyses. All four categories of variables were included in separate matrices for these analyses following Põlme et al. (2014). We used both Bray-Curtis dissimilarity and Hellinger distance in analyses of community composition. We also analyzed the relative importance of primer choice for the ITS1 and ITS2 barcodes in ADONIS. Because data sets differed substantially in matrix connectance, we tested whether this matrix property affects the adjusted determination coefficients (R 2 adj ) that are widely used to interpret model performance quantitatively through the explained variance. Therefore, we re-analyzed the ITS1Fngs-ITS2 data set at 12 levels of connectance by selectively removing species based on their frequency from one (4%) to 12 (48%) and calculated the correlation between the logarithm of connectance and R 2 adj as revealed from ADONIS analyses. To further test changes in phylogenetic community structure among samples and barcode-primer combinations, we assigned the OTUs to fungal classes. Sequence number-based proportions of classes were log-ratio transformed in relation to the proportion of non-Agaricomycetes and unassigned fungi to ensure statistical independence of these groups. The proportion of EcM and arbuscular mycorrhizal (AM) to non-mycorrhizal sequences was similarly log-ratio transformed but only analyzed for the four ITS data sets. Using the Hellinger distance and Bray-Curtis distance, the effects of barcode-primer pair combination were analyzed using ADONIS. Global Nonmetric Multidimensional Scaling (GNMDS) graphs were drawn in parallel using the same options. 95% confidence ellipses were calculated using ORDISURF routine in the Vegan package of R.
Potential analytical biases in the recovery of fungal classes by ribosomal DNA region (SSU, ITS, and LSU) and analysis method (metagenomics and amplicon) were addressed based on the average values of 14 shared samples using two-way maineffect ANOVAs neglecting interactions. rDNA region-based biases in amplicon and metagenomics data sets were further tested using two-way ANOVAs including the samples and regions as fixed factors.
To understand possible amplification biases related to sequence length in the ITS1 and ITS2 barcodes, we downloaded all ITS sequences of the 16 most common fungal classes from UNITE 7.0beta. Ribosomal RNA genes flanking the ITS1 and ITS2 barcodes were trimmed using ITSx 1.0.9. Average and median values and standard deviations were calculated for each group for illustrative purpose.

Amplicons
Although combinations of samples and primer pairs were normalized separately, sequences assigned to each primer pair were differentially represented in the raw and final data sets ( Table 2). Much of the data loss from a total of 11,691,106 demultiplexed sequences was attributable to unpaired reads and stacked reads, where certain reads consisted of two repeats of itself. Some of these incomplete repeats could not be successfully filtered out by bioinformatics tools, but these sequences were easily recognized due to their abnormal length (>480 bases) and were removed manually. Remarkably, one such artefactual OTUs formed the second most abundant taxon in the ITS2 data set. The unequal occurrence of barcode-marker combinations was not related to the average marker length as the least represented LSU D2 (LF402Fmix-TW13) barcode was the shortest among all barcodes. Therefore, we recommend not to multiplex different barcodes in the same libraries unless their roughly equal representation is known.
Barcodes generated by the universal primer pairs SSU515Fngs-Euk742R (SSU V4) and LR3R-LR5 (LSU D3) exhibited the distinctly lowest proportion of fungal sequences (67-68%), suggesting that fungi account for roughly two thirds of eukaryote ribosomal DNA in the studied soils on average. The proportion of fungal sequences was the greatest for the primer pair LR0Rngs-LF402, reaching 99.9% of all sequences. Classifications based on both BLASTn searches and NBC individually assigned >90% of the reads to fungi, indicating that this primer pair could indeed be the most fungus-specific of those tested. Of fungal classes, Agaricomycetes was the dominant group in all data sets.
The barcode-primer combinations exhibited five-fold differences in the number of fungal OTUs recovered in total and on the basis of samples rarefied to 8906 sequences (Figure 2; Suppl. material 3-6). As expected, the most conserved barcode -SSU V5 (SSU817F-SSU1196Rngs) -revealed the lowest number of OTUs, followed by LSU D3 (LR3R-LR5) and SSU V4 (S515Fngs-Euk742R) barcodes. Of rDNA genes, the LSU D1 barcode recovered greater OTU richness compared to other SSU and LSU barcodes. Overall, however, the ITS1 and ITS2 barcodes recovered the greatest number of OTUs. Both primer pairs targeting ITS2 sequences produced significantly more OTUs than those used to amplify ITS1, although the latter is considered somewhat less conserved (Nilsson et al. 2009;Wang et al. 2015). For the ITS2 barcode, the forward primer ITS3tagmix produced significantly more OTUs per sample compared with the gITS7 primer (especially Leotiomycetes and Lecanoromycetes), but their richness estimates were strongly correlated (n=25; R=0.96; P<0.001; Table 3). Similarly, of the two primers targeting the ITS1 barcode, the ITS1Fngs forward primer produced significantly more OTUs compared to the ITS1ngs primer (especially Sordariomycetes), but again these estimates were strongly correlated (n=25; R=0.83; P<0.001). OTU accumulation curves of separate and mixed ITS1 data sets indicated that combining the results of both primer pairs enabled recovery of 6.6% more OTUs, but the fraction of gITS7 data did not add to the ITS3tagmix data ( Figure 3). Overall, primer pairs for the ITS1 barcode were the least correlated with other barcodes in predicting OTU richness and the median value of richness. The LSU D1 barcode and ITS2 barcode (ITS3tagmix-ITS4ngs) had the strongest average correlation with other combinations. Pearson correlations among barcode-primer pair combinations and the soil metagenome in recovering the relative abundance of fungal classes. All correlations are statistically highly significant (P<0.001).

gITS7-ITS4ngs
LR0Rngs-LF402 Across all samples, barcode-primer pair correlations were strongly correlated in recovering the relative abundance of fungal classes (Table 4). These correlations were slightly stronger for primer pairs targeting the same barcode. The ITS3tagmix-ITS4ngs and gITS7-ITS4ngs primer pairs exhibited the strongest correlations to the median values across all barcodes. On average, the most conserved LSU D3 (LR3R-LR5) and SSU V5 (SSU817F-SSU1196Rngs) barcodes had the distinctly lowest correlations to other barcodes and the median values of all barcodes taken together.
Ecological analyses using all barcodes consistently revealed that vegetation structure was the strongest predictor of fungal communities (Table 5). Contrary to our predictions, the most conservative barcodes SSU V5 (SSU817F-SSU1196Rngs) and LSU D3 (LR3R-LR5) indicated that floristic variables explain the greatest amount of variation in the fungal community among all barcodes. Conversely, floristic variables accounted for the lowest amount of variation among all barcodes in all four ITS data sets. We suspected that lower matrix connectance could result in relatively lower detection power in the more sensitive and larger data sets. Successive removal of infrequent species indicated that the logarithm of connectance was extremely strongly correlated with the coefficient of determination up to connectance values of 0.45 (Bray-Curtis distance: n=8; R=0.980; P<0.001; Hellinger distance: n=8; R=0.995; P<0.001). Beyond this point, the adjusted determination coefficients reached a plateau or began to decline ( Figure 4).

Metagenomics
Of the 290,779,313 high-quality metagenome sequences from the PNG soil samples, 1,309,342 (0.45%) were assigned to ribosomal DNA of prokaryotic and eukaryotic organisms. Bacterial sequences and eukaryote sequences unassigned to any kingdom dominated the rDNA subset. Only 16,833 (1.29%) of these sequences were determined to represent fungal nuclear rDNA. Across all samples and regions, Agaricomycetes (41.4%), Eurotiomycetes (10.4%), and Dothideomycetes (5.5%) dominated,    whereas 13.2% sequences could not be assigned to any fungal class (Suppl. material 7). The fungal class Mixiomycetes was only recovered based on metagenome analysis, but many other uncommon groups in the amplicon data set were absent from the metagenome data set, presumably due to the comparatively low number of fungal metagenome sequences. The relative abundance of fungal classes in the ampliconfree soil metagenome was significantly correlated with that of all barcodes; the two most conserved barcode-primer pair combinations exhibited the weakest correlations (Table 4).
Within the soil metagenome, there were substantial differences in the recovery of fungal classes based on SSU, ITS, and LSU ( Figure 6). Both in the metagenome and amplicon data set, 11 out of the 16 most common fungal classes exhibited significant differences in recovery by the rDNA markers. In spite of PCR and primer biases, the taxonomic composition of fungi was more similar based on the same marker using  . Relative abundance of fungal classes in the amplicon and metagenomics data sets divided into SSU, ITS, and LSU subsets averaged over different barcodes (amplicon data) and 14 shared samples. Asterisks in the margins indicate significant differences in recovery of classes among SSU, ITS, and LSU of metagenomics (right) and amplicon (left) data sets. Asterisks in the center indicate significant differences between the metagenomics and amplicon-bases approaches.
either metagenomics-or amplicon-based HTS. Chytridiomycota and Pezizomycetes were significantly less common in the metagenome compared to the amplicon-based HTS data, but Leotiomycetes and the unassigned category were more abundant in the metagenomics data set ( Figure 6).
Metagenomic analysis of the mock community enabled us to estimate class-level biases in identification of fungi. While other groups were roughly evenly represented, Archaeorhizomyces spp. accounted for 17.0%, 5.1%, and 10.1% of metagenomics sequences in the SSU, ITS, and LSU data subsets, respectively, which is consistent with trends in the metagenomics and amplicon data sets of soil samples. Across all three markers, 6.2% and 2.4% of sequences respectively belonged to the unknown category or were affiliated with classes that were not included in the mock community (Suppl. material 7). The proportion of unknowns and incorrectly identified sequences were 4.3 and 2.5 times greater for the LSU compared with ITS data subset. The proportion of incorrect identifications is probably much greater, because our mock community represents sequences from the most common classes and misidentifications among these classes cannot be inferred. Therefore, use of positive controls representing a mock community and single isolates is warranted for evaluation of identification biases in metagenomics data sets.

In silico analyses
In-depth analysis of primer mismatches to fungal templates revealed potential systematic biases inherent to different primer pairs (Appendix 1). The fungus-specific ITS1F primer family has central mismatches to multiple fungal groups as well as 3' terminal mismatches to Chytridiomycota, Saccharomycetes, and several taxa within Dothideomycetes. The "universal" ITS1 primer has two central mismatches to most Sordariomycetes. These two primers complement each other very well in terms of taxon coverage and avoidance of introns, except for the fact that both miss most Microsporidia. The ITS2, ITS3, and gITS7 primers exhibit single mismatches to multiple fungal groups especially within the early diverging lineages. The use of a mixture of multiple modified ITS3 variants accounts for the vast majority of these mismatches (Appendix 1; Tedersoo et al. 2014). Due to extreme variation in the 5.8S rDNA, many lineages in the Tulasnellaceae family are missed by all these primers. The same thing goes for the Microsporidia, whose unusual configuration of the ribosomal genes makes this a very challenging group to target in molecular ecology efforts. The ITS4 and LR0R primer families possess two mismatches to Archaeorhizomycetes and one mismatch to most Chaetothyriales, the latter of which was accounted for by a degenerate position in the ITS4ngs and LR0Rngs primers. Both primers are ill-suited for amplification of a large proportion of Tulasnellaceae and Microsporidia due to multiple central mismatches (Taylor and McCormick 2008;Oja et al. 2015). The LF402 and LF402F primers discriminate among most non-fungal groups and cover most of the fungal kingdom, except for a number of taxa within Cantharellus, Tulasnellaceae, Ustilaginaceae, Candida, and some early diverging lineages. These can be accounted for by adding degenerate positions or constructing primer mixes (Appendix 1). The LR3R, TW13, and LR5 primers are truly universal, with perfect match to nearly all eukaryotes. Some Saccharomycetes taxa exhibit a single central mismatch to all these primers. The SSU primer SSU515F is nearly universal across the three domains of life. However, Microsporidia and Archaeorhizomycetes exhibit a single central mismatch to this primer. This mismatch can be accounted for by a degenerate position in the original variant of this primer (Turner et al. 1999). The original primer contains an intron site common to many lineages in Saccharomycetes and Pezizomycotina. Shortening the SSU515Fngs primer served to remove the intron position from the priming site in this study. The Euk742R and SSU817F primers possess multiple mismatches to several fungal groups, especially Cantharellus, Tulasnellaceae, and Saccharomycetes but also to certain other groups as well. Therefore, these primers are not recommended for fungal metabarcoding. The SSU1196R primer has multiple (including terminal) mismatches to Cantharellus, Tulasnellaceae, Saccharomycetes, some groups of Pezizomycetes, and Glomeromycota. Shortening the primer as in SSU1196Rngs ameliorated the mismatches only in part. Of other commonly used primers, the ITS5 primer is universal to most eukaryotes with a single central mismatch to some fungal and non-fungal groups. The LR5F primer has a perfect match to nearly all fungal groups, animals, and stramenopiles, but at the same time it exhibits a terminal mismatch to plants. Of basidiomycete-specific primers, the ITS4-B has multiple mismatches to nearly half of the major orders, but it effectively excludes other fungal phyla and non-fungal groups. The LB-W primer covers nearly all Basidiomycota (except Cantharellus and Ustilaginomycetes) and early diverging lineages, but it discriminates strongly against Ascomycota.
Fungal taxa differed roughly three-fold in the length of the ITS1 and ITS2 barcodes (Figure 7). The Archaeorhizomycetes exhibited both very short ITS1 and ITS2 barcodes, with the full ITS region averaging 370 bp. On average, Mortierellomycetes, Glomeromycetes, Tremellomycetes, and Wallemiomycetes had a disproportionately short ITS1 barcode compared with the ITS2 barcode. Except for Wallemiomycetes, the abundance of these groups was lower based on the ITS1 barcode in spite of smaller amplicon size. However, critical mismatches to the ITS1Fngs and ITS2 primers (Appendix 1) may have caused relatively lower recovery of Glomeromycota and Tremellomycetes based on the ITS1.

Multiple DNA barcodes
Our analyses of seven barcodes indicate that markers differ substantially in their ability to recover OTUs at the 97% sequence similarity threshold, a threshold value that is almost universally used in HTS studies. Consistent with the lower species-level discrimination power of SSU and LSU compared with ITS , both SSU and LSU barcodes recovered fewer OTUs compared to ITS1 and especially ITS2 barcodes. Because the 97% sequence similarity threshold is considered too low for separation of closely related species for the ITS region (Kõljalg et al. 2013), the true taxonomic richness is probably underestimated with all barcodes used here, but relatively more so with SSU and LSU. Clustering of the mock community sequences indicated that the relatively more species from different fungal orders are inadvertently lumped together based on sequences of conservative domains of SSU and LSU (Suppl. material 2). Although OTU richness estimates of different barcode-primer pair combinations exhibited strong or moderate linear correlations, we caution against using SSU and LSU for metabarcoding. These genes may nonetheless be suitable for class or phylum-level analyses and inference of community phylogenies when functional assignment of taxa is not of concern.
In spite of the low taxonomic resolution of SSU and LSU, these barcodes were relatively more efficient in recovering trends in community composition in terms of greater proportion of variance explained. There are two alternative and perhaps additive explanations to this observation. First, phylogenetic niche conservatism among fungi may reinforce this pattern (Cavender-Bares et al. 2009) as taxonomic richness of fungal classes is driven by different environmental variables such as soil pH and climate . Second, lumping of closely related rare species increases the connectance of community matrices, i.e. results in greater matrix fill. Lower connectance associated with relatively more taxon-rich data sets adversely affects the performance of many community association metrics (Olesen and Jordano 2002). We demonstrated that greater connectance may improve statistical properties of multivariate analyses by reduction of noise. Importantly, qualitative aspects of these analyses were not altered as no other variable beside vegetation became significant with removal of infrequent OTUs. These results suggest that possibilities for optimal performance of multivariate analyses of large and sparse data sets warrant further exploration. It is important to assess the effects of connectance and removal of rare OTUs separately, because these two effects were not disentangled in our study.
Although all barcode-primer pair combinations revealed that floristic variables account for the strongest effects in fungal community composition, there were statistically significant primer biases that were not reported in previous studies (Arfi et al. 2012;Daghino et al. 2012;Blaalid et al. 2013). At the level of OTUs, the choice of primer pairs for the ITS1 barcode explained 2% of the community composition, which is less than any environmental predictor group. Tedersoo et al. (2010) found that selection of either ITS5 or ITS1F forward primers explains 15% of the community variation in ectomycorrhizal roots. Differences in performance can be explained by primer-template mismatches of the ITS1Fngs, ITS5, and ITS1ngs forward primers (Appendix 1) and by the common occurrence of an intron between ITS1Fngs/ ITS5 and ITS1ngs primer sites in many ascomycete groups (Bhattacharya et al. 2001). Lower taxonomic resolution or mismatches to primers in the 3´ end of the original ITS2 primer compared with optimized gITS7 and ITS3tagmix primers may have contributed to lower taxonomic richness of the ITS1 compared with ITS2 barcode. Taxa falling into Basidiomycota (except Agaricomycetes), Glomeromycota, Rozellomycota, Dothideomycetes, and Geoglossomycetes were disproportionately more OTU-rich in the ITS2 data set (Tedersoo et al. 2015b).

Metagenomics approach
Currently, the Illumina HiSeq technology enables generation of 4×10 8 DNA sequences per run. In our soil samples, fungal nuclear rDNA represented ca. 0.005% of all DNA molecules, resulting in an average sequencing depth of approx. 1000 sequences per sample (n=16). In spite of relatively even representation of all metagenomics sequences across samples, the coefficient of variation for the number of fungal sequences was roughly three times higher for metagenome samples compared with amplicons. Such uneven sequencing depth further complicates downstream statistical analyses.
PCR and primer biases in the amplicon data sets are well addressed, but there are also certain biases inherent to metagenomics approaches that are related to base composition and replication of DNA fragments (Gomez-Alvarez et al. 2009). Because the metagenomics sequences exhibit very low overlap across the rDNA, it is impossible to assign these sequences to OTUs and recover taxonomic richness (cf. Bengtsson et al. 2012).
Except for some minor but statistically significant taxonomic biases, the metagenomics data set covering SSU, ITS, and LSU provided highly comparable results to that of all barcodes taken together but especially to the results from ITS1 and ITS2. The metagenomics analyses confirmed that the Agaricomycetes, Eurotiomycetes, and Sordariomycetes are the key players in tropical soils of PNG. Furthermore, the ITS data set of both amplicons and metagenome exhibited a similar proportion of mycorrhizal fungi (Tedersoo et al. 2015b). We caution that some care should be taken in interpreting our comparison of metagenomes and amplicons, because both analyses were performed in a single run, such that there may be differences between the two Illumina models, MiSeq 2×300 and HiSeq 2×250, and their chemistry.
Given the low and uneven recovery of fungal rDNA sequences and difficulties in correct taxonomic assignment (see below), metagenomics with the sole purpose of metabarcoding is clearly a waste of financial and computational resources. Enrichment of targeted molecules such as mitochondrial DNA may improve the problems associated with insufficient sequencing depth (Zhou et al. 2013). Nonetheless, vast amounts of functional data produced in the metagenomics analyses promise a new paradigm in community ecology (Zhou et al. 2015).

Analytical biases
Our in silico analysis of primer mismatches extends the results of Bellemain et al. (2010) and Op De Beeck (2014) by covering many additional primers and pointing to particular mismatching positions and taxa. This has enabled us to fine-tune primers for targeting the ITS region in fungi and other soil eukaryotes for cross-kingdom studies (Tedersoo et al. 2015a). We also demonstrate that there is no universally good solution for covering all fungi, because groups with rapidly evolving nuclear rDNA such as Microsporidia, certain groups within Saccharomycetes, Tulasnellaceae, and Cantharellus commonly escape detection by these primers for all of SSU, ITS, and LSU. Similarly, excessively long ITS barcodes and introns within barcodes may hamper amplification and sequencing of additional groups (Lindahl et al. 2013;Tedersoo et al. 2015b). Besides primer bias, unequal representation of amplicons may result from extreme GC to AT ratio (Aird et al. 2011). Although there was no evidence for length bias in our study, Ihrmark et al. (2012) demonstrated that barcode length has a strong inverse correlation with recovered abundance in a mock community based on 454 pyrosequencing technology. In particular, Archaeorhizomycetes stand out as a group with exceptionally short ITS1 and ITS2 barcodes. Because of the very strong size selection in certain cloning methods and PCR (Kanagawa 2003;Carninci et al. 2011), the unprecedented dominance of Archaeorhizomycetes in molecular studies based on cloning of long DNA fragments (Schadt et al. 2003;Castro et al. 2010) may represent an extreme example of a cloning bias.
The technical biases of sample preparation steps are poorly understood and these may be largely specific to platforms, models, and chemistry. Nonetheless, depending on the base composition and size of DNA fragments, unequal competition for adaptors may occur (Aird et al. 2011). Besides potential competitive disadvantage in ligation, the relatively longer reads in the 300-600 bp frame are less likely to be recovered because of reduced read quality in the last quarter of the sequence and problems with pairing low-quality sequences.
In addition to PCR and primer biases inherent to amplicon-based analyses, a bias related to incompleteness of reference database and uncertainty of taxonomic assignment is common to both amplicon and metagenomics data sets. A part of this so-called "identification bias" results from differential quality and abundance of reference data that may affect the probability of identification of particular taxa (Kõljalg et al. 2013). For example, reference sequences covering the full LSU are available for most early diverging lineages and the most common groups of Ascomycota and Basidiomycota, but not for "minor" groups such as Orbiliomycetes, Geoglossomycetes, and Archaeorhizomycetes. Cryptomycota and Microsporidia were poorly represented in the ITS and LSU reference data sets, because taxonomic studies in these groups are largely focused on the SSU. Similarly, many other early diverging lineages such Zoopagomycota, Neocallimastigomycota, and Blastocladiomycota were underrepresented in the ITS reference data sets. Although the early diverging lineages represented <0.5% relative sequence abundance in all data sets, many of the OTUs not assigned to phyla probably represent one of these lineages. Another aspect of the identification bias was evident when comparing taxonomic assignments of reference databases and results from BLASTn vs. Naïve Bayesian Classifier. Combining the databases and methods enabled us to identify a vast majority of OTUs and metagenome sequences to class level. The identification bias is very difficult to quantify or explicitly test among different barcode data sets and among studies, because of confounding primer effects, PCR conditions as well as subjectively established similarity thresholds and manual re-evaluation based on previous experience and knowledge about sequence variability.
Differential representation of taxa in the reference data sets certainly plays a much greater role in metagenomics data sets targeting all genes (Korsakovsky Pond et al. 2009) or only ribosomal DNA, because the rDNA sequences used for identification may fall into any part of the SSU, ITS, or LSU, or into intergenic spacers (IGS) that are usually not considered. Since the SSU, ITS, and LSU all include variable as well as highly conserved regions, it is very difficult to provide correct taxonomic assignments to short fragments (100-450 bases) at lower taxonomic levels. For several fungal groups, full-length sequences of LSU are lacking, which results in partial ability to identify these. Typically, fungal orders can be distinguished based on specific similarity thresholds and NBC classifications for ITS and variable domains of SSU and LSU. However, orders, classes, and even phyla can be lumped together at the same levels of similarity if the metagenomic sequence happens to cover a highly conserved region. Furthermore, our metagenomics rDNA data comprised a remarkable proportion of SSU and LSU sequences that clearly belonged to the ophistokonts, but could not be reliably linked to animal and fungal kingdoms or minor lineages related to these groups. Therefore, the uncertainty and potential mistakes in taxonomic assignment are much more relevant to the metagenomics approach compared to DNA barcodes. To ameliorate identification biases in metagenomics data, reference databases should be updated with full-length rDNA sequences originating from genomics initiatives such as the 1000 Fungal Genomes Project (http://1000. fungalgenomes.org) and covering all classes of fungi and closely related organisms.

Conclusions
This study demonstrates that PCR-free metagenomics and amplicon-based approaches perform in a comparable fashion in recovering major fungal classes in spite of certain statistical differences. Within the amplicon data set, barcode-primer pair combinations differed strongly in recovering relative abundance of fungal classes and OTU richness (see also Tedersoo et al. 2015b). Nonetheless, these were all in agreement about trends in OTU richness and disentangling the key drivers of community composition. We found no evidence for reduced statistical performance in barcodes with relatively conserved sequences, but the use of conserved barcodes seriously hampers biological relevance of the data due to the inability to approximate species level (or explicitly any taxonomic level) and to assign functional categories such as trophic status (except arbuscular mycorrhizal mutualism). Considering the taxonomic resolution and primer bias, we recommend targeting the ITS2 barcode when using the current HTS technologies that permit <700 bases of high-quality reads. Due to its high taxonomic coverage, the ITS3mix series constitutes a good candidate for a forward primer and there is no better alternative than the ITS4 primer family in the beginning of LSU as a reverse primer. Both can be easily degenerated or multiplexed to address additional fungal or eukaryotic groups (Appendix 1). In strong disagreement with Waud et al. (2014), however, the ITS2 amplicon is unsuitable for studies of orchid mycorrhizal fungi due to multiple mismatches to Tulasnellaceae in primers located in the 5.8S rDNA. With technological advances in read length and quality, we advocate the use of full ITS region due to a better choice of forward primers and the presence of critical differences in either ITS1 or ITS2 in many pairs of closely related species (Blaalid et al. 2013;Kõljalg et al. 2013). Our in silico and in vivo results indicate that combining amplicons of ITS1Fngs and ITS1ngs forward primers effectively accounts for biases arising from primer mismatches and the occurrence of introns (Oja et al. 2015). In addition, the D1 and potentially D2 regions of LSU perform well in higher-level taxonomic assignment of fungi (see also Begerow et al. 2010). Reverse primers LF402 (including potential mixes) and TW13 located in LSU have strong affinities for fungi and for all eukaryotes, respectively. The use of these primers could form a strong basis for phylogenetic placement of isolates recovered from complex samples (Taylor et al. 2014

Appendix 1
Sequences of widely used ITS primers, their mismatches to fungal taxa, and recommendations for metabarcoding studies.

Cloning of ribosomal itS PCR products creates frequent, non-random chimeric sequences -a test involving heterozygotes between Gymnopus dichrous taxa i and ii introduction
Sequence chimeras are common when pooled DNA s are co-amplified by a PCR process (Edgar et al. 2011;Judo et al. 1998;Jumpponen 2007;Meyerhans et al. 1990;Odelberg et al. 1995;Qiu et al. 2001a;Smyth et al. 2010;Tedersoo et al. 2014;Wang and Wang 1997). There has been speculation that chimeras may be the result of incomplete extension of PCR products which subsequently act as primers for the next amplification cycle and, in fact, there seems to be a reduction in chimeric PCR products when extension times are increased (Meyerhans et al. 1990;Qiu et al. 2001a;Smyth et al. 2010;Thompson et al. 2002) or by optomizing the PCR protocol (Qiu et al. 2001b;Wang and Wang 1997). Odelberg et al. (1995) however, demonstrated that chimeras can be generated in a single round of PCR amplification in the absence of heat denaturation and re-annealing which suggests that some polymerase template switching may occur. They demonstrated that template switching was reduced several fold (but not eliminated) by fixing templates to streptavidin magnetic beads.
The overall frequency of chimeras in published studies is unknown. In a study of fungal ITS amplicons derived from soil samples using a PCR/cloning process (O'Brian et al. 2005), 5% were chimeric as determined by an ascomycete/basidiomycete discontinuity between ITS1 and ITS2 regions. Smaller chimeras between related taxa would not have been detected in this study [see later detection of a chimera in this data set by Ryberg et al. (2008)]. For nSSU sequences, the proportion of chimeric sequences can be extensive (Ashelford et al. 2005;Ashelford et al. 2006;Fonseca et al. 2012;Wang and Wang 1997) leading to 'false diversity estimates and false novel taxa. Fonseca et al (2012) demonstrated that nuclear SSU chimeras were produced at high levels during the PCR-process when mixed templates were present. Interestingly, their results with a nematode population demonstrated that chimera formation is higher in species-diverse PCR pools than in pools that are genetically less diverse but that the breakpoints were in regions of sequence similarity.
Most programs for checking and removing chimeric sequences were designed for 16S ribosomal sequences. DECIPHER (Wright et al. 2012) (http://decipher.cee.wisc. edu/index.html) for bacterial 16S sequences searches for sets of short fragments that are uncommon in the phylogenetic group where the sequence is classified, but frequent in other phylogenetic groups. This depends on a robust pre-existing data set. UCHIME (Edgar et al. 2011), a chimera check program, is best when two known parental sequences and a high-quality reference database that is chimera free are available. Bellerophon uses a partial treeing analyses to detect 16S chimeras (Huber et al. 2004). Other chimera checking programs are available [see Fonseca et al. (2012) for discussion]. Nilsson et al. (2012) note the high incidence of ITS chimeras in public ITS databases and suggest a mechanism for identifying them by Blast mismatches between the ITS1 and ITS2 regions and by exploring long branches in phylogenetic trees. They comment that the most frequent site for exchange between two similar templates in a PCR reaction is in the first part of the highly conserved 5.8S segment. An open source chimera checker for the ITS region has been developed and is available at http://www. emerencia.org/chimerachecker.html . Using chimera checker to evaluate 12 300 sequences, 1.5% were identified as chimeric sequences. To better facilitate ITS-based molecular identification of fungi for the scientific community, the UNITE database was established to provide reliable documented ITS sequences for the fungal community (Kõljalg et al. 2013;Nilsson et al. 2015) and a reference set of ITS sequences, each representing a species hypothesis is available in several formats including UCHIME (Nilsson et al. 2015), available at https://unite.ut.ee/repository. php#uchime.
With the establishment of the ribosomal ITS as the fungal barcode , and the identification of highly heterozygous fungal ITS sequences that require cloning to resolve, it becomes increasingly important to understand the consequences of cloning PCR products from mixed templates including cloned PCR products generated during environmental sampling. Divergent Gymnopus dichrous ITS2 haplotypes provide an appropriate experimental system with which to explore this issue.
Gymnopus dichrous is a small, saprobic mushroom commonly found on oak bark and other woody debris in mid-summer in the southern Appalachian Mountains (USA). ITS sequencing identified two ITS subgroups of this mushroom called for convenience G. dichrous I and II. Gymnopus dichrous I and II differ in the ITS2 region (10% divergence) but there are no consistent or significant bp differences between G. dichrous I and II ITS1 or 5.8S regions (average divergence = 0.29% between the G. dichrous I and II ITS1 region, 0% in the 5.8s region). Homozygous collections for G. dichrous I and G. dichrous II were collected in the southern Appalachian Mountains and were designated as parental genotypes (P 1 ). Fruitbodies that were ITS hybrids between G. dichrous I and G. dichrous II were also been collected and were designated as F 1 hybrids (first filial generation as used in standard genetic crosses). For F 1 hybrids, several indels in the ITS2 region obscured electropherograms and prevented recovery of the parental ITS sequences during Sanger sequencing. Cloning of the ITS1-5.8S-ITS2 PCR product was required to recover individual contributing haplotypes, however, a significant portion of recovered haplotypes were chimeric sequences.
Cloned ITS sequences were compared to P 1 (parental/ homozygote) ITS sequences of Gymnopus dichrous I and II to identify chimeric and non-chimeric sequences. Below, we examine chimeras derived from cloned G. dichrous heterozygotes and show that they can be small, frequent and non-random. We also provide evidence that putative chimeras were not due to natural meiotic recombination or variation in the ribosomal repeat.

Methods and materials
Collections. Gymnopus dichrous and G. subnudus are often collected as the same entity and are morphologically difficult to separate. Both are variable in morphology. Putative Gymnopus dichrous collections were made in the field using known morphological and environmental characteristics [e.g., small (ca. 5 cm in height) brown mushrooms, often with a darker, compressed, stem base and growing on wood, usually but not exclusively on oak bark]. Of 116 collections of putative G. dichrous, 16 were G. dichrous I-II hybrids (Table 1). Collections are archived in TENN-Fungi.
Single basidiospore isolation. Single-basidiospore isolates (SBIs) were obtained from fresh spore drops as described in Gordon and Petersen (1992). Monokaryon status of SBI cultures was determined microscopically by lack of clamp connections.
PCR and Cloning procedures. Cloning was carried out using Promega's pGEM-T easy kit and M109 Competent cells according to manufacturer's directions (Promega). Sanger sequencing of ITS-cloned products was performed as described in Hughes et al. (Hughes et al. 2009).
PCR of the nuclear ribosomal ITS area was performed using primers ITS1F (Gardes and Bruns 1993) and ITS4 (White et al. 1990) for all collections in this study. PCR parameters for ITS amplification were 3 min at 94 °C followed by 34 cycles of 94 °C for 1 min, 55 °C for 1 min, 72 °C for 1 min and a final extension of 72 °C for 3 min. Each 50 µl PCR reaction contained 24.25 µl sterile ddH 2 O, 10 µl of 5X PCR buffer (Promega Corporation, 2800 Woods Hollow Road, Madison, WI 53711 USA a), 2.5 µl 5% DMSO (Sigma-Aldrich Company, St. Louis, MO, USA), 6 µl of 25 mM Mg = 3 mM (Promega), 4 µl of 100 mM dNTP mix (Promega), 1 µl each of two primers (10 µM) and 0.25 µl Taq polymerase (Promega). Sequencing of the 5' end of the nuclear ribosomal RNA large subunit gene was performed for single spore isolates only and used primers LR0R and LR5 (Vilgalys and Hester 1990). Parameters for LSU amplification are the same as for ITS except the extension time at 72 °C was 1.5 min.
Identification of template switching regions. There are 25 regions of sequence mismatch within G. dichrous I and G. dichrous II (Fig. 1, red text) which were used to identify apparent template switching. The first difference at ITS2 position 15 and the 1bp indel at position 24 were used to establish whether the 5' end of ITS2 represented G. dichrous I or G. dichrous II. Base pair 31 (A or G) is variable within the G. dichrous I population and when it is an adenine residue, it can be used as an additional marker. Clones exhibiting template switching were identified by an observed sequence change from G. dichrous I to G. dichrous II (or vice-versa) between bases 25 and 332. Four discontinuities were observed in the data set between bp15 and bp25. In two cases, bp25 and an adenine/guanine base pair at bp31 were used to identify the 5' end of the clone as G. dichrous I or II. In the other two cases, bp25 and bp71 were used to identify the 5' end of the clone. The discontinuities may be due to template switching or to PCR generated mutation. We note that rare PCR-induced base pair mutations could affect determination of the correct template switching point.
DNA folding. Potential DNA folding of the ITS2 region for G. dichrous I and II exemplars was estimated at 72 °C (extension phase of PCR) using the MFOLD web server (http://mfold.rna.albany.edu/?q=mfold/DNA-Folding-Form) with a MgCl setting of 3 mM (Zuker 2003) and 0 mM Na.
Chi square analysis. The ITS 2 region was divided into 25 segments of varying length between bases 25 and 332. Each segment was flanked by a sequence difference between G. dichrous I and II that was informative for diagnosing template switching. Bases in red are points where DI and DII haplotypes differ in sequence and were used to determine if template switching had occurred in a cloned PCR product. Eight base pairs at which template switching can be detected are indicated by numbers 1-8. The possible area in which template switching (ts) could have occurred is indicated by vertical arrows and the number of observed template switching events is given above the vertical arrow. Bases that may be involved in intra-strand base pairing as determined by MFOLD are outlined with black boxes. Ambiguity codes indicate intraspecific variation.
For each segment, the number of template switching events was recorded, ranging from zero to six. These constituted observed values. Expected values were based on a null hypothesis of random template switching (each base has an equal probability of template switching).

Results and discussion
The proportion of ITS chimeras obtained from PCR amplification of the ITS regions of G. dichrous I-II heterozygotes is given in Table 1. Of 128 clones from F 1 heterozygotes, 28 (21.9%) were chimeric in the ITS2 region. The number of clones representing non-chimeric ITS sequences was unequally distributed between G. dichrous I and II. From F 1 heterozygotes, 34.4% of the clones were G. dichrous I sequences and 43.75% were G. dichrous II sequences.
The distribution of template switch points resulting in chimeras was not random along the ITS2 region (Chi square = 35.72, P<0.05). Of 25 possible diagnostic base pair sites, template switching was observed between only 8 points (Fig. 1) and the most frequent template switching (six events) occurred within a very short span of bases, bases 122-138 creating 6 identical chimeric sequences. Two regions of 5 tem- plate switching events were also observed ( Fig. 1). It should be noted that the exact point at which template switching occurred cannot be determined in regions where G. dichrous I and II have identical sequence but is between the last 5' diagnostic base pair difference between G. dichrous I and II and the diagnostic base pair at which template switching occurred (Fig. 1, horizontal arrows). The possibility of identical chimeras occurring in GenBank and thus being interpreted as valid taxa was noted by Nilsson et al. (2012). The non-random nature of template switching would suggest that some stable mechanism influences template switching. We investigated the possibility that secondary structure formation during the PCR process might lead to non-random chimera formation, perhaps by briefly stalling taq polymerase transcription at the point of secondary folding and allowing template switching. Ribosomal ITS2 RNA is known to have secondary structure at normal cellular temperatures and conditions (Joseph et al. 1999;Krüger and Gargas 2004;Krüger and Gargas 2008). We wondered if the formation of DNA secondary structure in the ITS2 region might affect chimera formation. Using MFOLD (Zuker 2003) we examined secondary structure of ITS2 DNAs at 72 °C (extension phase). At 72 °C, MFOLD predicted stable secondary structure in both G. dichrous I and G. dichrous II templates. For G. dichrous I, 4 possible folding configurations were reported at increasing levels of free energy (dG). The folding configuration with the lowest free energy is mapped to the sequence in Fig. 1. For G. dichrous II, a single folding configuration was reported. This is also mapped to Fig. 1. We note that factors other than the ITS2 sequence itself influence RNA folding.
The most frequent template switching occurred in a region between bases 122 and 138. This region overlaps and follows a small area of folding (a 4bp neck and 4bp loop) in G. dichrous I templates. The region between 140 and 200 is involved in complex folding patterns that are not consistent from model to model but present in all models. Five template switching events occur in this region. Between bases 200 and the end of the template at base 332, there is no consistently predicted secondary structure and fewer template switching events. Thus there is a loose agreement between secondary structure and regions of template switching but we cannot conclude cause and effect.
The size of detectable chimeras varied from chimeras occurring at the 5' end of the ITS2 sequence (approximately 300bp) to those occurring near the 3' end (45bp). Chimeras occurring between bp15 and bp25 would not have been recorded as such by our procedure but may have occurred (4 discontinuities may be due to template switching or to PCR generated mutation-see methods). We note that there is an area of secondary structure which overlaps bp15 and could be involved in template switching.
Areas where G. dichrous DI and DII differ extensively including indels (bases 147-151, 198-205) do not seem to be involved in template switching. This has been noted in other studies as well (Fonseca et al. 2012;Haas et al. 2011).
To evaluate whether cloning simply uncovered existing variation in the ribosomal repeat region (Lindner and Banik 2011), we sequenced the ITS plus LSU region of single basidiospore isolates (SBIs) derived from F1 heterozygotes ( Table 2). SBIs of F 1 spores were either G. dichrous I or II and did not show any evidence of meiotic recombination between ITS + LSU types I and II. We further examined SBI sequences from two collections of G. dichrous I (Table 2) and found no evidence that any G. dichrous II sequence elements were present in the ribosomal repeat. Finally, we cloned PCR products from three collections of G. dichrous I (TENN68152-8 clones, TENN67834-7 clones, TENN69091-8 clones) and again found no evidence for the presence of G. dichrous II sequence elements. We conclude that there is no current evidence in G. dichrous that cloning is recovering existing intragenomic variation in the ribosomal repeat but cannot exclude that possibility. Results reported as intragenomic variation in Laetiporus cincinnatus by Lindner and Banik (2011) could be due to differences in one or more ribosomal repeats but could also be explained if the sampled fruitbody was a hybrid and the two parental haplotypes and resulting chimeras were recovered.

Conclusions
Chimeras are common in cloned PCR products and tend to obscure contributing parental haplotypes, thus potentially creating errors in DNA sequence repositories. In this study, we show: 1. Template switching is non-random. Of 25 possible markers where ITS2 sequences of G. dichrous I and II differ, only 8 show template switching and template switching is higher in specific regions of the ITS2 sequences. The non-random nature of chimeras could lead to the misinterpretation of chimeras as parental haplotypes when the same chimera is recovered multiple times. 2. There is a loose correlation between areas predicted to form secondary structure and regions where template switching is high. We conclude that formation of secondary structure may affect template switching but speculate that secondary structure formation could either enhance or repress template switching, depending on location and the size of the stem-loop structure. 3. Chimeras occurring near the end of a template may be short and thus not easily detected. 4. Chimeras are not due to recovery of underlying variability in the ribosomal repeat in this system. The origins of chimeras remain obscure and may be due to multiple factors. 5. Chimera control should be exercised in environmental sampling studies and taxonomic studies wherever possible in order to minimize problems with persistent errors in sequence data repositories.
Distributional records of Antarctic fungi based on strains preserved in the Culture Collection of Fungi from extreme environments (CCFee) Mycological Section associated with the italian National Antarctic Museum (MNA)

Purpose
The aim of this study is to provide new distributional data on Antarctic fungi collected in the framework of past and recent expeditions (Italian PNRA, USA and Czech expeditions) and now preserved in the Antarctic section of the Culture Collection of Fungi From Extreme Environments (CCFEE -Antarctic Fungi), a collection associated with the Italian Antarctic National Museum (MNA, Section of Genoa, Italy). CCFEE is located at the Laboratory of Systematic Botany and Mycology, Department of Ecological and Biological Sciences (DEB) of the Tuscia University (Viterbo, Italy).
The dataset is the third Italian contribution to ANTABIF based on materials stored at the Italian National Antarctic Museum (MNA) and in its associated collections. The first MNA dataset published regarded the distributional records of Antarctic Mollusca, collected in the framework of the Latitudinal Gradient Program (Ghiglione et al. 2013), while the second MNA dataset regarded the distributional records of Antarctic Tanaidaceans collected in the Ross Sea (Piazza et al. 2014).

Project details
Project title: Antarctic Fungi from museum samples preserved in the Culture Collection of Fungi From Extreme Environments (CCFEE -Antarctic Fungi), which is a collection associated with the Italian National Antarctic Museum (MNA, Section of Genoa, Italy) hosted in the Laboratory of Systematic Botany and Mycology, Dept of Ecological and Biological Sciences (DEB), Tuscia University (Viterbo, Italy).

Curator and Promoter: Stefano Schiaparelli
Czech expeditions (2007)(2008)(2009)): latitude -63.8; longitude -57.88333 Design description: Data were gathered by assembling distributional records of Antarctic fungal strains stored in the CCFEE. Samples from which all the strains were isolated were obtained in the framework of different past research expeditions, which had different aims and geographical targets. The main purpose of PNRA "X" expedition was to evaluate the adaptation of microfungal strains to low temperatures, thermal stress, desiccation and exposure to radiation. During the "XII", "XVII" and "XIX" PNRA expeditions, the projects aimed at collecting rock samples for studying the biology and biodiversity of fungi from cryptoendolithic communities in the sandstone formations of several continental Antarctic coastal, and inland locations and from Antarctic Peninsula, as well as their role in the alteration of the rock substrate. The project developed in the framework of the "XXVI" PNRA expedition aimed at studying responses of rock meristematic fungi to environmental stresses related with the global change, and their role in the bioaccumulation and biomagnification processes of contaminants in the trophic networks. The main purpose of the Czech expedition was to study fungal biodiversity from soil and rock samples collected in the Antarctic Peninsula. The U.S.A. expeditions aimed at studying the biology and diversity of cryptoendolithic communities of the McMurdo Dry Valleys, and monitoring nanoclimate variations within the rock substrate.

Method step description:
See sampling description below and flowchart of Figure 1.
Study extent description: The distributional data herein considered refer to 53 different locations (Figure 2a, 2b, 2c), located along the Victoria Land (continental Antarctica) and the Antarctic Peninsula. Samples from which fungi were isolated were collected in the framework of the following expeditions, from 1980 to 2011: • PNRA X expedition: samples of soil, mosses, lichens, rocks and sediments were collected from lake area near Camp OASI, Skua Lake, Tethys Bay, Enigma Lake, Veg-    all the sandstone's formations and some granite formations of Northern Victoria Land were visited; • Czech IPY expedition: sampling was performed in the Antarctic Peninsula. Black meristematic fungi were isolated in Masaryk University, Brno, Czech Republic and donated to the CCFEE; The U.S.A. expeditions: aimed to study the biology and diversity of cryptoendolithic communities of the McMurdo Dry Valleys, and sampling was performed in that area. Black meristematic fungi isolated were donated to the CCFEE.
Sampling description: The materials were collected by hand and, whenever needed, by using hammer and chisel in order to chop rock fragments from large boulders.. Samples were then placed in sterile plastic bags and stored at -20 °C (Figure 1). Fungi from rocks were isolated by crushing stones and plating fragments directly on Petri dishes containing Malt Agar (MA) amended with chloramphenicol 100 ppm; plates were incubated at 10 °C and growth was inspected monthly.
Isolation from lichens was performed according to Selbmann et al. (2013). Isolation from other sediments, soils, and mosses was performed by diluting 1g of each substrate in 1 l of sterile water and plating 100 µl of the suspension in a 9 cm Petri dish containing MA or Czapek dox agar (CZA). Plates were incubated at 10 and 25 °C and growth inspected every day. Colonies were transferred in fresh medium immediately after their appearance and identified by macro-and microscopic observations. Scanning electron microscopy observations were performed according to the methods described by Onofri et al. (1980). When morphology was not informative enough for identification, the isolates were studied with a molecular approach according to Selbmann et al. (2005;. Fungi were identified by Selbmann, Onofri, Zucconi and Isola. All the isolates are preserved in the Culture Collection of Fungi from Extreme Environments (CCFEE). All the fungi are preserved as living (at 4 °C), frozen (at -80 °C) or freeze-dried cultures, or as dry samples. Holotype materials are deposited at: CBS-KNAW Fungal Biodiversity Centre (Utrecht, Netherlands); Industrial Yeasts Collection (Perugia, Italy); German Collection of Microorganisms and Cell Cultures (Brunswick, Germany); International Mycological Institute (London, United Kingdom) ( Table.1). Ex-types of newly described taxa are preserved in the CCFEE.
The present dataset of Antarctic fungi has been formatted in order to fulfil Darwin Core standards required by the IPT scheme, according the GBIF Data Toolkit (http:// www.gbif.org/resources/2573). The dataset was uploaded in the ANTABIF database (the geospatial component of GBIF).
Quality control description: Strains were identified by both morphological and molecular tools. DNA sequence data obtained from the studied strains are deposited in Mycobank and GenBank. Species names were crosschecked and made consistent with current ones as reported in MycoBank (last check made on 2015.08.04) ( Figure  1). Distributional records' geographical coordinates are as published in the different papers reported in the reference section. During the more recent expeditions (XIX and XXVI) the coordinates were recorded using a Garmin GPS.

taxonomic coverage
General taxonomic coverage description: Fungal assemblages in Antarctic rocks have been extensively studied with particular emphasis for free-living but also symbiotic fungal populations, which constitute an important part of epilithic and endolithic communities . The interest for these communities is strongly connected to their high adaptability to the Antarctic ice-free environmental conditions, a combination of extremely low temperatures, low water activity, oligotrophy and high solar irradiation; for these reasons they constitute a useful model for studying evolution and adaptations to extreme environments. On the whole, this dataset takes into account 259 fungal strains, out of the 486 available at CCFEE. This initial dataset corresponds to the species that have been well characterized from a taxonomic point of view and have all the metadata required by Darwin Core for publication as distributional records. They belong to three Divisions, i.e. Ascomycota, Basidiomycota and Mucoromycotina (Figure 3), corresponding to a total of 32 genera ( Figure 4) and 38 species. Among them, the 93% belong to Ascomycota Phylum, that includes the almost totality of genera and the totality of new taxa present in the collection (12 species and 6 genera) (Table 1). One basidiomycetous new species, isolated from soil under moss in Kay Island, Cryptococcus vaughanmartiniae Turchetti, Blanchette & Arenz (CCFEE5495=DBVPG5862), was recently described (Turchetti et al. 2015).