Corresponding author: Leho Tedersoo (
Academic editor: T. Lumbsch
Rapid development of high-throughput sequencing (HTS) molecular identification methods has revolutionized our knowledge about taxonomic diversity and ecology of fungi. However, PCR-based methods exhibit multiple technical shortcomings that may bias our understanding of the fungal kingdom. This study was initiated to quantify potential biases in fungal community ecology by comparing the relative performance of amplicon-free shotgun metagenomics and amplicons of nine primer pairs over seven nuclear ribosomal DNA (rDNA) regions often used in metabarcoding analyses. The internal transcribed spacer (ITS) barcodes ITS1 and ITS2 provided greater taxonomic and functional resolution and richness of operational taxonomic units (OTUs) at the 97% similarity threshold compared to barcodes located within the ribosomal small subunit (SSU) and large subunit (LSU) genes. All barcode-primer pair combinations provided consistent results in ranking taxonomic richness and recovering the importance of floristic variables in driving fungal community composition in soils of Papua New Guinea. The choice of forward primer explained up to 2.0% of the variation in OTU-level analysis of the ITS1 and ITS2 barcode data sets. Across the whole data set, barcode-primer pair combination explained 37.6–38.1% of the variation, which surpassed any environmental signal. Overall, the metagenomics data set recovered a similar taxonomic overview, but resulted in much lower fungal rDNA sequencing depth, inability to infer OTUs, and high uncertainty in identification. We recommend the use of ITS2 or the whole ITS region for metabarcoding and we advocate careful choice of primer pairs in consideration of the relative proportion of fungal DNA and expected dominant groups.
Tedersoo L, Anslan S, Bahram M, Põlme S, Riit T, Liiv I, Kõljalg U, Kisand V, Nilsson RH, Hildebrand F, Bork P, Abarenkov K (2015) Shotgun metagenomes and multiple primer pair-barcode combinations of amplicons reveal biases in metabarcoding analyses of fungi. MycoKeys 10: 1–43. doi:
The internal transcribed spacer (ITS) region of the nuclear ribosomal DNA (rDNA) is the formal barcode for molecular identification of fungi (
Discussion related to potential taxonomic biases in relation to the class
Between 5 and 30 November 2011, 34 composite soil samples were collected from woody plant-dominated ecosystems in Papua New Guinea (PNG) following a standard protocol (
To address barcode and primer biases, we selected seven barcodes in SSU (variable domains V4 and V5), ITS (ITS1 and ITS2), and LSU (variable domains D1, D2, and D3) of the nuclear rDNA (Figure
Map of ribosomal DNA indicating variable regions as well as primers used and/or discussed in this study. Primer pairs used for HTS are highlighted.
A reverse or forward primer for each barcode was supplemented with one of the sixteen 10-base identifier tags (Table
Primers and identifier tags used for Illumina MiSeq sequencing in this study.
| Primer name | Features | Primer sequence | Barcode | Reference |
|---|---|---|---|---|
| SSU515Fngs | Fwd, tagged | GCCAGCAGCCGCGGTAA | SSU V4 | This study |
| Euk742R | Rev | AAATCCAAGAATTTCACCTCT | SSU V4 | This study |
| SSU817F | Fwd | TTAGCATGGAATAATRRAATAGGA | SSU V5 | |
| SSU1196Rngs | Rev, tagged | TCTGGACCTGGTGAGTTT | SSU V5 | This study |
| ITS1Fngs | Fwd, tagged | GGTCATTTAGAGGAAGTAA | ITS1 (combination 1) | This study |
| ITS1ngs | Fwd, tagged | TCCGTAGGTGAACCTGC | ITS1 (combination 2) | This study |
| ITS2 | Rev | GCTGCGTTCTTCATCGATGC | ITS1 (combinations 1, 2) | |
| ITS3tagmix1 | Fwd | CTAGACTCGTCATCGATGAAGAACGCAG | ITS2 (combination 1) | |
| ITS3tagmix2 | Fwd | CTAGACTCGTCAACGATGAAGAACGCAG | ITS2 (combination 1) | |
| ITS3tagmix3 | Fwd | CTAGACTCGTCACCGATGAAGAACGCAG | ITS2 (combination 1) | |
| ITS3tagmix4 | Fwd | CTAGACTCGTCATCGATGAAGAACGTAG | ITS2 (combination 1) | |
| ITS3tagmix5 | Fwd | CTAGACTCGTCATCGATGAAGAACGTGG | ITS2 (combination 1) | |
| gITS7 | Fwd | GTGARTCATCGARTCTTTG | ITS2 (combination 2) | |
| ITS4ngs | Rev, tagged | TTCCTSCGCTTATTGATATGC | ITS2 (combinations 1, 2) | |
| LR0Rngs | Fwd, tagged | ACSCGCTGAACTTAAGC | LSU D1 | This study |
| LF402 | Rev | TTCCCTTTYARCAATTTCAC | LSU D1 | This study |
| LF402Fmix1 | Fwd | GTGAAATTGYTRAAAGGGAA | LSU D2 | This study |
| LF402Fmix3 | Fwd | GTGAAATTGTCAAAAGGGAA | LSU D2 | This study |
| TW13 | Rev, tagged | GGTCCGTGTTTCAAGACG | LSU D2 | T.J. White unpublished |
| LR3R | Fwd | GTCTTGAAACACGGACC | LSU D3 | |
| LR5 | Rev, tagged | TCCTGAGGGAAACTTCG | LSU D3 | |
| Tag 001 | Tag | ACGAGTGCGT | All | |
| Tag 002 | Tag | ACGCTCGACA | All | |
| Tag 003 | Tag | AGACGCACTC | All | |
| Tag 026 | Tag | ACATACGCGT | All | |
| Tag 028 | Tag | ACTACTATGT | All | |
| Tag 029 | Tag | ACTGTACAGT | All | |
| Tag 030 | Tag | AGACTATACT | All | |
| Tag 032 | Tag | AGTACGCTAT | All | |
| Tag 033 | Tag | ATAGAGTACT | All | |
| Tag 049 | Tag | ACGCGATCGA | All | |
| Tag 050 | Tag | ACTAGCAGTA | All | |
| Tag 052 | Tag | AGTATACATA | All | |
| Tag 053 | Tag | AGTCGAGAGA | All | |
| Tag 054 | Tag | AGTGCTACGA | All | |
| Tag 077 | Tag | ACGACAGCTC | All | |
| Tag 078 | Tag | ACGTCTCATC | All | |
The first 10 bases (CTAGACTCGT) represent an inert tag that does not align to any organism.
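Assigning reads to samples by their 10-base identifier tags can be illustrated with a short sketch. It is not the LOTUS or MOTHUR code used in the study; it assumes that the tag occupies the first 10 bases of a read and, by default, requires an exact match. Allowing `max_mismatch` substitutions is only unambiguous when it is at most (d − 1) // 2, where d is the minimum pairwise Hamming distance reported by `min_tag_distance`. The example read is hypothetical.

```python
from itertools import combinations
from typing import Dict, Optional

# The sixteen 10-base identifier tags listed in the primer table.
TAGS: Dict[str, str] = {
    "Tag 001": "ACGAGTGCGT", "Tag 002": "ACGCTCGACA", "Tag 003": "AGACGCACTC",
    "Tag 026": "ACATACGCGT", "Tag 028": "ACTACTATGT", "Tag 029": "ACTGTACAGT",
    "Tag 030": "AGACTATACT", "Tag 032": "AGTACGCTAT", "Tag 033": "ATAGAGTACT",
    "Tag 049": "ACGCGATCGA", "Tag 050": "ACTAGCAGTA", "Tag 052": "AGTATACATA",
    "Tag 053": "AGTCGAGAGA", "Tag 054": "AGTGCTACGA", "Tag 077": "ACGACAGCTC",
    "Tag 078": "ACGTCTCATC",
}
TAG_LEN = 10


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_tag_distance(tags: Dict[str, str]) -> int:
    """Smallest pairwise Hamming distance within the tag set."""
    return min(hamming(a, b) for a, b in combinations(tags.values(), 2))


def assign_tag(read: str, tags: Dict[str, str], max_mismatch: int = 0) -> Optional[str]:
    """Return the single tag whose sequence matches the first TAG_LEN bases of
    `read` within `max_mismatch` substitutions; None if no tag or several match."""
    prefix = read[:TAG_LEN].upper()
    if len(prefix) < TAG_LEN:
        return None
    hits = [name for name, seq in tags.items() if hamming(prefix, seq) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    # Hypothetical read: Tag 002 followed by the ITS1Fngs primer sequence.
    read = "ACGCTCGACA" + "GGTCATTTAGAGGAAGTAA"
    print(assign_tag(read, TAGS))
    print("minimum pairwise tag distance:", min_tag_distance(TAGS))
```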
Selected soil samples (Suppl. material
We amplified DNA from two soil samples (G2655 and G2658 that were spiked with
Paired-end sequencing (2×300 bp) on the Illumina MiSeq sequencer yielded 12,771,565 reads. LSU and SSU amplicons were paired, quality filtered, and demultiplexed using the LOTUS pipeline (
The ITS reads were quality filtered using MOTHUR 1.33.3 (
Following exclusion of singletons from all HTS data sets (cf.
Because taxonomic resolution differed among the barcodes, we used the taxonomic assignments from NBC and BLASTn searches to complement each other: each method alone left ca. 40% of OTUs unassigned, owing to the poor representation of fungal data in SILVA, obvious misidentifications in INSDc, and the great abundance of taxonomically unassigned sequence data (which resulted in poor resolution using NBC). To optimize classification, we therefore combined and verified results of different methods and determined approximate
We followed the taxonomy of INSDc, except raising several early diverging lineages to phylum rank (cf.
For the shotgun metagenome data, samples were demultiplexed, and LSU and SSU regions of all organisms were extracted using SORTMERNA (
To understand potential amplification biases related to sequence length in the ITS1 and ITS2 barcodes, we downloaded all ITS sequences of the 16 most common fungal classes (based on our amplicon data) from the UNITE 7.0beta data set. Ribosomal RNA genes flanking the ITS1 and ITS2 barcodes were trimmed using ITSx 1.0.9. Average and median values and standard deviations were calculated for each group for illustrative purposes.
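The per-class summary described above reduces to grouping trimmed ITS1 or ITS2 sequences by class and computing the mean, median, and standard deviation of their lengths. The sketch assumes that sequences have already been extracted with ITSx and labelled with their class. The text does not say whether the sample or population standard deviation was used, so the sample version here is an assumption; the toy records are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def length_summary(records: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, float, float, float]]:
    """Summarise barcode lengths per fungal class.

    `records` yields (class_name, barcode_sequence) pairs whose sequences are
    already trimmed of flanking rRNA genes. Returns {class: (n, mean, median, sd)};
    sd is the sample standard deviation and is 0.0 for a single sequence.
    """
    lengths: Dict[str, List[int]] = defaultdict(list)
    for cls, seq in records:
        lengths[cls].append(len(seq))
    summary = {}
    for cls, values in lengths.items():
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[cls] = (len(values), statistics.mean(values), statistics.median(values), sd)
    return summary


if __name__ == "__main__":
    # Hypothetical trimmed barcodes of two classes.
    toy = [("ClassA", "A" * 100), ("ClassA", "C" * 200), ("ClassA", "G" * 300), ("ClassB", "T" * 150)]
    for cls, (n, mean, median, sd) in length_summary(toy).items():
        print(f"{cls}: n={n} mean={mean:.1f} median={median:.1f} sd={sd:.1f}")
```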
For OTU-based statistical analyses, we removed all non-fungal sequences and rarefied all amplicon samples to a depth of 8609 sequences using MOTHUR. This depth corresponds to the median per-sample sequence count of the ITS1 barcode (ITS1ngs-ITS2 primers), which was the second lowest among all markers (Table
Number of sequences recovered using different barcode-primer pair combinations.
Primer pair | Raw sequences | Quality-filtered sequences | Fungal sequences | % fungal sequences | Data set connectance |
---|---|---|---|---|---|
SSU515Fngs-Euk742R | 2156146 | 1751042 | 1177111 | 67.2 | 0.264 |
SSU817F-SSU1196Rngs | 1583096 | 1431850 | 1382433 | 96.5 | 0.340 |
ITS1Fngs-ITS2 | 1104540 | 697900 | 634098 | 90.9 | 0.128 |
ITS1ngs-ITS2 | 1025094 | 451500 | 327397 | 72.5 | 0.128 |
ITS3tagmix-ITS4ngs | 2665289 | 1943355 | 1706010 | 87.8 | 0.065 |
gITS7-ITS4ngs | 1293599 | 1005751 | 923170 | 91.8 | 0.062 |
LR0Rngs-LF402 | 1001017 | 743637 | 742973 | 99.9 | 0.201 |
LF402Fmix-TW13 | 101161 | 84282 | 64661 | 76.7 | nd |
LR3R-LR5 | 761164 | 567222 | 384357 | 67.8 | 0.359 |
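The rarefaction step described in the text (random subsampling of each amplicon sample to 8609 fungal sequences) can be illustrated as follows. This is a minimal sketch, not MOTHUR's implementation; the OTU names and counts are hypothetical, and returning `None` for samples below the target depth is an assumed convention.

```python
import random
from collections import Counter
from typing import Dict, Optional

RAREFACTION_DEPTH = 8609  # depth stated in the text


def rarefy(counts: Dict[str, int], depth: int = RAREFACTION_DEPTH,
           seed: Optional[int] = None) -> Optional[Dict[str, int]]:
    """Randomly subsample `depth` reads without replacement from one sample.

    `counts` maps OTU identifiers to read counts. Samples with fewer than
    `depth` reads return None (dropping them is an assumed convention).
    """
    total = sum(counts.values())
    if total < depth:
        return None
    rng = random.Random(seed)
    otus = list(counts)
    # random.sample with `counts=` (Python 3.9+) draws without replacement from a
    # population in which each OTU is repeated counts[otu] times.
    drawn = rng.sample(otus, k=depth, counts=[counts[o] for o in otus])
    return dict(Counter(drawn))


if __name__ == "__main__":
    sample = {"OTU_1": 6000, "OTU_2": 3500, "OTU_3": 120}  # hypothetical sample
    rarefied = rarefy(sample, seed=42)
    print(rarefied, sum(rarefied.values()))
```

Because `random.sample` draws exactly `k` items, the rarefied counts always sum to 8609. Fixing `seed` makes one draw reproducible, while repeating the draw with different seeds shows how much per-sample OTU richness varies between subsamples.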
OTU accumulation curves and their 95% confidence intervals were computed for ITS1 and ITS2 barcodes using ESTIMATES 9.1.0 (
Differences in OTU richness among samples and barcode-primer combinations were evaluated using two-way main-effect ANOVAs supplemented with Unequal n HSD tests for multiple comparisons. To assess the relative performance of the eight barcode-primer pair combinations in recovering the role of spatial, edaphic, floristic, and climatic predictors of fungal community composition, we performed permutational multivariate ANOVAs as implemented in the ADONIS routine of the vegan package of R (R Core Development Team 2013). Geographical coordinates were translated into Principal Coordinates of Neighbour Matrices (PCNM) vectors, and soil element concentrations were log-transformed prior to analyses. All four categories of variables were included in separate matrices for these analyses following
To further test changes in phylogenetic community structure among samples and barcode-primer combinations, we assigned the OTUs to fungal classes. Sequence number-based proportions of classes were log-ratio transformed in relation to the proportion of non-
Potential analytical biases in the recovery of fungal classes by ribosomal DNA region (SSU, ITS, and LSU) and analysis method (metagenomics and amplicon) were assessed from the average values of 14 shared samples using two-way main-effect ANOVAs without interaction terms. rDNA region-based biases in the amplicon and metagenomics data sets were further tested using two-way ANOVAs with samples and regions as fixed factors.
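For a layout with one value per cell (for example, rows = samples and columns = rDNA regions), a main-effect two-way ANOVA partitions the total sum of squares into row, column, and residual parts; with a single observation per cell, an interaction term cannot be estimated separately from the error. The pure-Python sketch below assumes a complete, balanced table and is an illustration only, not the software used in the study.

```python
from typing import Dict, Sequence


def two_way_main_effects(y: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Two-way ANOVA without interaction for a complete table, one value per cell.

    Rows are levels of factor A (e.g., samples), columns are levels of factor B
    (e.g., rDNA regions). The row x column residual serves as the error term.
    """
    a = len(y)
    b = len(y[0]) if a else 0
    if a < 2 or b < 2 or any(len(row) != b for row in y):
        raise ValueError("need a complete table with at least 2 rows and 2 columns")
    grand = sum(sum(row) for row in y) / (a * b)
    row_means = [sum(row) / b for row in y]
    col_means = [sum(y[i][j] for i in range(a)) / a for j in range(b)]
    ss_a = b * sum((m - grand) ** 2 for m in row_means)
    ss_b = a * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss_err = ss_total - ss_a - ss_b
    df_a, df_b, df_err = a - 1, b - 1, (a - 1) * (b - 1)
    ms_err = ss_err / df_err  # ZeroDivisionError if the data fit additively without error
    return {
        "SS_A": ss_a, "SS_B": ss_b, "SS_error": ss_err,
        "df_A": df_a, "df_B": df_b, "df_error": df_err,
        "F_A": (ss_a / df_a) / ms_err,
        "F_B": (ss_b / df_b) / ms_err,
    }


if __name__ == "__main__":
    # Hypothetical proportions of one class: 2 samples x 3 regions.
    print(two_way_main_effects([[1, 2, 4], [3, 5, 6]]))
```

P-values can then be obtained from the F distribution, for example with `scipy.stats.f.sf(F_A, df_A, df_error)` if SciPy is available.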
Although combinations of samples and primer pairs were normalized separately, sequences assigned to each primer pair were differentially represented in the raw and final data sets (Table
Barcodes generated by the universal primer pairs SSU515Fngs-Euk742R (SSU V4) and LR3R-LR5 (LSU D3) had the lowest proportions of fungal sequences (67–68%), suggesting that, on average, fungi account for roughly two thirds of eukaryotic ribosomal DNA in the studied soils. The proportion of fungal sequences was greatest for the primer pair LR0Rngs-LF402, reaching 99.9% of all sequences. BLASTn searches and NBC each assigned >90% of these reads to fungi, indicating that this primer pair could indeed be the most fungus-specific of those tested. Of fungal classes,
The barcode-primer combinations exhibited five-fold differences in the number of fungal OTUs recovered, both in total and per sample rarefied to 8609 sequences (Figure
Sample-based OTU richness as recovered by different barcode-primer pair combinations. Error bars denote standard error; different letters indicate statistically different groups.
Rarefied OTU accumulation curves for samples based on the (
Correlation matrix of eight barcode-primer combinations in recovering OTUs per sample rarefied to 8609 sequences. Values denote Pearson correlation coefficients. Values <0.44 are not statistically significant at the 95% confidence level.
| | | SSU515Fngs-Euk742R | SSU817F-SSU1196Rngs | ITS1Fngs-ITS2 | ITS1ngs-ITS2 | ITS3tagmix-ITS4ngs | gITS7-ITS4ngs | LR0Rngs-LF402 |
|---|---|---|---|---|---|---|---|---|
| SSU515Fngs-Euk742R | 0.82 | | | | | | | |
| SSU817F-SSU1196Rngs | 0.81 | 0.88 | | | | | | |
| ITS1Fngs-ITS2 | 0.75 | 0.41 | 0.43 | | | | | |
| ITS1ngs-ITS2 | 0.81 | 0.41 | 0.49 | 0.83 | | | | |
| ITS3tagmix-ITS4ngs | 0.85 | 0.71 | 0.69 | 0.69 | 0.67 | | | |
| gITS7-ITS4ngs | 0.82 | 0.66 | 0.69 | 0.70 | 0.63 | 0.96 | | |
| LR0Rngs-LF402 | 0.90 | 0.93 | 0.87 | 0.48 | 0.50 | 0.81 | 0.77 | |
| LR3R-LR5 | 0.82 | 0.84 | 0.81 | 0.44 | 0.55 | 0.71 | 0.64 | 0.85 |
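The 0.44 significance cutoff in the caption can be related to sample size through the t test for a Pearson correlation. Solving $t = r\sqrt{n-2}/\sqrt{1-r^{2}}$ for $r$ gives the two-sided critical value at $\alpha = 0.05$:

$$
r_{\mathrm{crit}} = \frac{t_{0.975,\,n-2}}{\sqrt{t_{0.975,\,n-2}^{2} + (n-2)}}
$$

Assuming $n = 20$ samples (an assumption; the number of samples behind this table is not stated in this section), $t_{0.975,\,18} \approx 2.101$ and $r_{\mathrm{crit}} \approx 2.101/\sqrt{2.101^{2} + 18} \approx 0.444$, consistent with the stated cutoff.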
Across all samples, barcode-primer pair combinations were strongly correlated in recovering the relative abundance of fungal classes (Table
Pearson correlations among barcode-primer pair combinations and the soil metagenome in recovering the relative abundance of fungal classes. All correlations are statistically highly significant (
| | SSU515Fngs-Euk742R | SSU817F-SSU1196Rngs | ITS1Fngs-ITS2 | ITS1ngs-ITS2 | ITS3tagmix-ITS4ngs | gITS7-ITS4ngs | LR0Rngs-LF402 | LF402F-TW13 | LR3R-LR5 | Median amplicon |
|---|---|---|---|---|---|---|---|---|---|---|
| SSU817F-SSU1196Rngs | 0.82 | | | | | | | | | |
| ITS1Fngs-ITS2 | 0.89 | 0.80 | | | | | | | | |
| ITS1ngs-ITS2 | 0.89 | 0.74 | 0.97 | | | | | | | |
| ITS3tagmix-ITS4ngs | 0.88 | 0.76 | 0.92 | 0.90 | | | | | | |
| gITS7-ITS4ngs | 0.87 | 0.74 | 0.92 | 0.91 | 0.98 | | | | | |
| LR0Rngs-LF402 | 0.87 | 0.75 | 0.90 | 0.89 | 0.91 | 0.90 | | | | |
| LF402F-TW13 | 0.87 | 0.72 | 0.88 | 0.88 | 0.85 | 0.84 | 0.89 | | | |
| LR3R-LR5 | 0.82 | 0.79 | 0.79 | 0.75 | 0.72 | 0.71 | 0.79 | 0.79 | | |
| Median amplicon | 0.92 | 0.80 | 0.96 | 0.95 | 0.98 | 0.97 | 0.95 | 0.91 | 0.79 | |
| Metagenome | 0.83 | 0.77 | 0.85 | 0.84 | 0.82 | 0.80 | 0.86 | 0.85 | 0.76 | 0.87 |
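The class-level comparison in this table can be sketched as building one relative-abundance profile per data set and correlating profiles class by class. The sketch assumes that the "Median amplicon" profile is the class-wise median over the amplicon data sets (it is not defined in this section) and that correlations were computed on untransformed proportions; all abundance values in the example are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_profile(profiles: Dict[str, Sequence[float]]) -> List[float]:
    """Class-wise median across relative-abundance profiles.

    Every profile must list the same fungal classes in the same order.
    """
    if len({len(p) for p in profiles.values()}) != 1:
        raise ValueError("profiles must have the same length")
    return [statistics.median(col) for col in zip(*profiles.values())]


if __name__ == "__main__":
    # Hypothetical relative abundances of four classes in three amplicon data sets.
    amplicons = {
        "ITS1": [0.40, 0.30, 0.20, 0.10],
        "ITS2": [0.35, 0.35, 0.20, 0.10],
        "LSU": [0.30, 0.25, 0.30, 0.15],
    }
    metagenome = [0.38, 0.28, 0.22, 0.12]
    med = median_profile(amplicons)
    print([round(v, 2) for v in med])
    print(round(pearson(med, metagenome), 3))
```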
Ecological analyses using all barcodes consistently revealed that vegetation structure was the strongest predictor of fungal communities (Table
Relationship between connectance and adjusted coefficient of determination (
Effects of environmental parameters on community composition of fungi as revealed by eight rDNA barcode-primer pair combinations and two distance measures.
| | Bray-Curtis dissimilarity | | | | | Hellinger distance | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | DF | Sum of Squares | F | R²adj | P | DF | Sum of Squares | F | R²adj | P |
| | | | | | | | | | | |
| Vegetation | 6 | 2.401 | 1.659 | 0.145 | 0.003 | 6 | 2.049 | 2.131 | 0.197 | 0.001 |
| Climate | 3 | 1.011 | 1.397 | 0.028 | 0.079 | 3 | 0.592 | 1.231 | -0.015 | 0.127 |
| Soil | 5 | 1.071 | 0.888 | -0.068 | 0.714 | 5 | 0.995 | 1.241 | -0.026 | 0.075 |
| Spatial vectors | 1 | 0.104 | 0.430 | -0.029 | 0.988 | 1 | 0.122 | 0.761 | -0.020 | 0.811 |
| Residuals | 8 | 1.930 | | 0.296 | | 8 | 1.282 | | 0.254 | |
| | | | | | | | | | | |
| Vegetation | 6 | 2.139 | 2.161 | 0.245 | 0.001 | 6 | 1.369 | 2.666 | 0.264 | 0.001 |
| Climate | 3 | 0.737 | 1.490 | 0.025 | 0.115 | 3 | 0.355 | 1.383 | -0.014 | 0.085 |
| Soil | 5 | 0.605 | 0.733 | -0.118 | 0.837 | 5 | 0.507 | 1.184 | -0.062 | 0.212 |
| Spatial vectors | 1 | 0.039 | 0.234 | -0.037 | 0.991 | 0 | n.a. | n.a. | n.a. | n.a. |
| Residuals | 8 | 1.320 | | 0.273 | | 9 | 0.771 | | 0.257 | |
| | | | | | | | | | | |
| Vegetation | 6 | 3.682 | 1.596 | 0.113 | 0.002 | 6 | 3.796 | 2.087 | 0.177 | 0.001 |
| Soil | 5 | 2.139 | 1.113 | -0.017 | 0.146 | 5 | 1.870 | 1.234 | -0.025 | 0.038 |
| Climate | 3 | 1.283 | 1.112 | -0.009 | 0.201 | 3 | 1.155 | 1.269 | -0.010 | 0.055 |
| Spatial vectors | 2 | 0.810 | 1.053 | -0.011 | 0.349 | 2 | 0.670 | 1.105 | -0.017 | 0.247 |
| Residuals | 8 | 3.153 | | 0.285 | | 8 | 2.425 | | 0.245 | |
| | | | | | | | | | | |
| Vegetation | 6 | 3.682 | 1.549 | 0.113 | 0.001 | 6 | 3.998 | 2.069 | 0.187 | 0.001 |
| Soil | 5 | 2.139 | 1.080 | -0.017 | 0.241 | 5 | 1.919 | 1.192 | -0.027 | 0.086 |
| Climate | 3 | 1.283 | 1.079 | -0.009 | 0.286 | 3 | 1.166 | 1.207 | -0.013 | 0.104 |
| Spatial vectors | 3 | 1.112 | 0.935 | -0.027 | 0.731 | 3 | 0.917 | 0.949 | -0.041 | 0.612 |
| Residuals | 7 | 2.773 | | 0.252 | | 7 | 2.254 | | 0.220 | |
| | | | | | | | | | | |
| Vegetation | 6 | 3.7079 | 1.723 | 0.129 | 0.001 | 6 | 3.743 | 2.069 | 0.179 | 0.001 |
| Soil | 5 | 2.0253 | 1.129 | -0.024 | 0.102 | 5 | 1.820 | 1.207 | -0.027 | 0.058 |
| Climate | 3 | 1.3506 | 1.255 | 0.001 | 0.023 | 3 | 1.166 | 1.289 | -0.006 | 0.027 |
| Spatial vectors | 3 | 1.1044 | 1.026 | -0.025 | 0.398 | 3 | 0.895 | 0.989 | -0.038 | 0.498 |
| Residuals | 7 | 2.5107 | | 0.235 | | 7 | 2.111 | | 0.217 | |
| | | | | | | | | | | |
| Vegetation | 6 | 3.806 | 1.726 | 0.136 | 0.001 | 6 | 3.921 | 2.178 | 0.192 | 0.001 |
| Soil | 5 | 2.0525 | 1.117 | -0.023 | 0.154 | 5 | 1.864 | 1.243 | -0.027 | 0.037 |
| Climate | 3 | 1.3146 | 1.193 | -0.004 | 0.088 | 3 | 1.185 | 1.316 | -0.007 | 0.037 |
| Spatial vectors | 1 | 0.3236 | 0.881 | -0.012 | 0.721 | 1 | 0.282 | 0.940 | -0.014 | 0.545 |
| Residuals | 9 | 3.3071 | | 0.306 | | 9 | 2.701 | | 0.271 | |
| | | | | | | | | | | |
| Vegetation | 6 | 3.254 | 1.897 | 0.160 | 0.001 | 6 | 2.869 | 2.259 | 0.212 | 0.001 |
| Soil | 5 | 1.792 | 1.253 | -0.006 | 0.049 | 5 | 1.357 | 1.282 | -0.019 | 0.034 |
| Climate | 3 | 0.983 | 1.146 | -0.015 | 0.198 | 3 | 0.730 | 1.150 | -0.024 | 0.208 |
| Spatial vectors | 3 | 0.768 | 0.895 | -0.043 | 0.741 | 3 | 0.578 | 0.911 | -0.049 | 0.704 |
| Residuals | 7 | 2.001 | | 0.227 | | 7 | 1.482 | | 0.211 | |
| | | | | | | | | | | |
| Vegetation | 6 | 2.328 | 2.309 | 0.231 | 0.001 | 6 | 1.408 | 2.339 | 0.214 | 0.001 |
| Soil | 5 | 0.821 | 0.977 | -0.083 | 0.529 | 5 | 0.618 | 1.232 | -0.043 | 0.100 |
| Climate | 3 | 0.582 | 1.154 | -0.026 | 0.248 | 3 | 0.359 | 1.194 | -0.027 | 0.171 |
| Spatial vectors | 1 | 0.316 | 1.878 | 0.016 | 0.035 | 1 | 0.175 | 1.743 | 0.009 | 0.027 |
| Residuals | 8 | 1.345 | | 0.249 | | 8 | 0.802 | | 0.239 | |
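The two distance measures and the F statistic in the table above can be written out directly. In the sketch below, Bray-Curtis dissimilarity and Hellinger distance follow their usual definitions (the latter as the Euclidean distance between square-root-transformed relative abundances, which is what vegan's Hellinger standardization followed by Euclidean distance produces), and pseudo-F is a term's mean square divided by the residual mean square. The permutation step that yields P in ADONIS and the adjusted R² are not reproduced.

```python
import math
from typing import Sequence


def bray_curtis(x: Sequence[float], y: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two OTU abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den


def hellinger(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between square-root relative abundances (Hellinger distance)."""
    sx, sy = sum(x), sum(y)
    return math.sqrt(sum((math.sqrt(a / sx) - math.sqrt(b / sy)) ** 2 for a, b in zip(x, y)))


def pseudo_f(ss_term: float, df_term: int, ss_res: float, df_res: int) -> float:
    """PERMANOVA pseudo-F: mean square of a term over the residual mean square."""
    return (ss_term / df_term) / (ss_res / df_res)


if __name__ == "__main__":
    # Vegetation and residual rows of the first Bray-Curtis block in the table above.
    print(round(pseudo_f(2.401, 6, 1.930, 8), 3))
    # Residual share of the total sum of squares in that block.
    total_ss = 2.401 + 1.011 + 1.071 + 0.104 + 1.930
    print(round(1.930 / total_ss, 3))
```

Applied to the first Bray-Curtis block, `pseudo_f(2.401, 6, 1.930, 8)` rounds to 1.659 and the residual share 1.930/6.517 rounds to 0.296, matching the tabulated values.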
In the ITS1 barcode data set, the choice of primers (ITS1Fngs vs. ITS1ngs) explained 2.0% and 1.8% of the community variation based on Hellinger distance (partial analysis:
Global Nonmetric Multidimensional Scaling (NMDS) graph demonstrating the relative placement of samples (lower case letters, encoded in Suppl. material
Of the 290,779,313 high-quality metagenome sequences from the PNG soil samples, 1,309,342 (0.45%) were assigned to ribosomal DNA of prokaryotic and eukaryotic organisms. Bacterial sequences and eukaryote sequences unassigned to any kingdom dominated the rDNA subset. Only 16,833 (1.29%) of these sequences were determined to represent fungal nuclear rDNA. Across all samples and regions,
Within the soil metagenome, there were substantial differences in the recovery of fungal classes based on SSU, ITS, and LSU (Figure
Relative abundance of fungal classes in the amplicon and metagenomics data sets divided into SSU, ITS, and LSU subsets averaged over different barcodes (amplicon data) and 14 shared samples. Asterisks in the margins indicate significant differences in recovery of classes among SSU, ITS, and LSU of metagenomics (right) and amplicon (left) data sets. Asterisks in the center indicate significant differences between the metagenomics and amplicon-based approaches.
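The proportions reported for the shotgun metagenome can be checked with simple arithmetic. The sketch uses only the read counts given in the text; the per-run projection assumes, purely for illustration, that the fungal fraction observed in these samples would apply unchanged to a run of 4×10⁸ reads (the HiSeq output cited in the discussion).

```python
# Read counts reported for the PNG soil shotgun metagenomes.
TOTAL_READS = 290_779_313
RDNA_READS = 1_309_342
FUNGAL_RDNA_READS = 16_833


def percent(part: float, whole: float) -> float:
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole


rdna_share = percent(RDNA_READS, TOTAL_READS)                    # rDNA among all reads
fungal_share_of_rdna = percent(FUNGAL_RDNA_READS, RDNA_READS)    # fungi within rDNA
fungal_share_of_all = percent(FUNGAL_RDNA_READS, TOTAL_READS)    # fungal rDNA among all reads

# Hypothetical projection: fungal rDNA reads in one 4e8-read run if the
# fraction observed here applied unchanged.
READS_PER_RUN = 4e8
expected_fungal_per_run = READS_PER_RUN * FUNGAL_RDNA_READS / TOTAL_READS

print(f"rDNA: {rdna_share:.2f}% of reads")
print(f"fungal: {fungal_share_of_rdna:.2f}% of rDNA reads, {fungal_share_of_all:.4f}% of all reads")
print(f"about {expected_fungal_per_run:,.0f} fungal rDNA reads per 4e8-read run")
```

This reproduces the 0.45% and 1.29% figures, gives about 0.0058% fungal rDNA among all reads (the "ca. 0.005%" in the discussion), and implies on the order of 23,000 fungal rDNA reads per 4×10⁸-read run.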
Metagenomic analysis of the mock community enabled us to estimate class-level biases in identification of fungi. While other groups were roughly evenly represented,
In-depth analysis of primer mismatches to fungal templates revealed potential systematic biases inherent to different primer pairs (Appendix
Fungal taxa differed roughly three-fold in the length of the ITS1 and ITS2 barcodes (Figure
Differences in sequence length in the ITS1 and ITS2 barcodes of the 16 most abundant fungal classes, as identified from the amplicon libraries in this study. Columns, asterisks, and error bars represent mean and median values and standard deviation, respectively. Numbers inside bars indicate the number of sequences analyzed (
Our analyses of seven barcodes indicate that markers differ substantially in their ability to recover OTUs at the 97% sequence similarity threshold, a threshold value that is almost universally used in HTS studies. Consistent with the lower species-level discrimination power of SSU and LSU compared with ITS (
In spite of the low taxonomic resolution of SSU and LSU, these barcodes were relatively more efficient in recovering trends in community composition, in terms of a greater proportion of variance explained. There are two alternative and perhaps additive explanations for this observation. First, phylogenetic niche conservatism among fungi may reinforce this pattern (
Although all barcode-primer pair combinations revealed that floristic variables account for the strongest effects in fungal community composition, there were statistically significant primer biases that were not reported in previous studies (
Currently, the Illumina HiSeq technology enables generation of 4×10⁸ DNA sequences per run. In our soil samples, fungal nuclear rDNA represented ca. 0.005% of all DNA molecules, resulting in an average sequencing depth of approx. 1000 sequences per sample (
PCR and primer biases in the amplicon data sets are well documented, but there are also certain biases inherent to metagenomics approaches that are related to base composition and replication of DNA fragments (Gomez-Alvarez et al. 2009). Because the metagenomics sequences exhibit very low overlap across the rDNA, it is impossible to assign these sequences to OTUs and recover taxonomic richness (cf.
Except for some minor but statistically significant taxonomic biases, the metagenomics data set covering SSU, ITS, and LSU provided results highly comparable to those of all barcodes taken together, and especially to those of ITS1 and ITS2. The metagenomics analyses confirmed that the
Given the low and uneven recovery of fungal rDNA sequences and difficulties in correct taxonomic assignment (see below), metagenomics with the sole purpose of metabarcoding is clearly a waste of financial and computational resources. Enrichment of targeted molecules such as mitochondrial DNA may alleviate the problems associated with insufficient sequencing depth (
Our in silico analysis of primer mismatches extends the results of
The technical biases of sample preparation steps are poorly understood and may be largely specific to platforms, instrument models, and chemistry. Nonetheless, depending on the base composition and size of DNA fragments, unequal competition for adaptors may occur (
In addition to PCR and primer biases inherent to amplicon-based analyses, a bias related to the incompleteness of reference databases and the uncertainty of taxonomic assignment is common to both amplicon and metagenomics data sets. Part of this so-called “identification bias” results from differential quality and abundance of reference data that may affect the probability of identification of particular taxa (
Differential representation of taxa in the reference data sets certainly plays a much greater role in metagenomics data sets targeting all genes (
This study demonstrates that PCR-free metagenomics and amplicon-based approaches perform in a comparable fashion in recovering major fungal classes in spite of certain statistical differences. Within the amplicon data set, barcode-primer pair combinations differed strongly in recovering relative abundance of fungal classes and OTU richness (see also
We thank C.W. Schadt and A. Rosling for raising discussion about primer biases regarding
Sequences of widely used ITS primers, their mismatches to fungal taxa, and recommendations for metabarcoding studies.
Primers and taxa | Sequences and mismatches |
---|---|
|
|
ITS1F (original: |
|
ITSOF-T ( |
|
ITSOF ( |
|
ITS1Fngs (this study) |
|
****C**C***************C | |
****G***T*************** | |
****C******************* | |
***CC******************* | |
*******************T**** | |
|
****CC****************** |
|
****AC****************** |
|
**********C************* |
******A***************** | |
*********G************** | |
********************C*** | |
|
****A**CT*************** |
Viridiplantae | *****TA**************G*G |
Metazoa | ****K*A***************W* |
ITS5 (original: |
|
Tulasnella p.parte | ******WC************** |
*******M*****A******** | |
|
***T****************** |
****C***************** | |
*******C************** | |
Metazoa | ******W*************** |
Viridiplantae | *****G*G************** |
ITS1 (original: |
|
ITS1ngs (this study) |
|
|
*****T********A**** |
|
G*W*****W********M* |
Viridiplantae, Metazoa | ******************* |
gITS7 ( |
|
fITS7 ( |
|
ITS86F ( |
|
|
*C**********G******** |
no match | |
|
*********T****T****** |
*********TC**GT****** | |
|
*C**RY*************** |
************G******** | |
|
**A**CT***********A** |
|
**********-********** |
|
**************T****** |
|
************G*TC***** |
|
*********T-***T****** |
Neocallimastigales | **********A******C***
Lobulomycetales | *********TA********** |
**********A********** | |
|
**********A***T****** |
************G******** | |
|
************G*T****** |
|
*Y***Y******G******** |
|
************G*T****** |
Metazoa p.parte | A*T*A***CA*********** |
Metazoa p.parte | ****A*TGCA*G*CACA*K** |
Straminipila | ****R*****R*RWY****** |
ITS3 (original: |
|
58A1F ( |
|
58A2F ( |
|
ITS3-Kyo1 ( |
|
ITS3-Kyo3 ( |
|
ITS3mix ( |
|
|
*******************Y*W* |
|
no match |
|
CY********************* |
***************R******* | |
|
************R****C***** |
***C******************* | |
A********************** | |
AY*************A***T*** | |
|
R**C***********R******* |
|
C********************** |
Ophiocordyceps | ****TA**A************** |
|
A********************** |
***A******************* | |
***A******************* | |
******************G**** | |
*******************T*** | |
*************G********* | |
*T**************T****** | |
Thecaphora, Thysanophora | *T********************* |
****************T****** | |
********A************** | |
*T********************* | |
****************T****** | |
***A******************* | |
|
*A********************* |
|
*TGA*********WY*TT** |
Nucleariida,
***A**************** |
Viridiplantae | ****************Y**Y |
*TRA**************** | |
AT*A*********G*AT*** | |
AA***********C**T*** | |
**T****Y*****G*****T | |
*Y*A**************** | |
A**A*********C**TG** | |
A**A********GR****** | |
***C*TN*****GG****** | |
AT*A************T*** | |
|
A**A**************** |
|
***A**************** |
***A**************** | |
*YG*******G*****T*** | |
*Y*N**************** | |
ATT**T************Y* | |
CAG*********G****G** | |
|
A**A**************** |
|
***A**************** |
|
***A**************** |
|
*YG**********GW***** |
AGN***************** | |
GGG********R*******T | |
RGN********R****GG** | |
*T*********A******** | |
TGG********A****GT** | |
GGG****************T | |
*G*******G********** | |
GGR**********C****** | |
|
*T**************T*** |
|
*TG**********G****** |
|
GGG**********G****** |
|
*Y**************Y*** |
|
*T*C*********G****** |
TGG*************Y**T | |
AA*********A***AT**T | |
AA*G*******A***RT**T | |
CA**************T*** | |
Ichtyes p.parte | CGC***************** |
A**Y*********M***T** | |
A****************T** | |
|
****************AC*T |
|
A*WA**************** |
|
A**W**************** |
R**A**************** | |
***A*************GT* | |
*YR*************Y*T* | |
|
RY*W************Y*** |
|
AY*A**************** |
ITS4 (original: |
|
ITS4ngs ( |
|
|
****G*************** |
|
*****GC************* |
*********G*W*A****** | |
|
****S*Y******M****** |
Viridiplantae, |
******************** |
|
*************R****** |
***K*******M*T****** | |
**T***A****A*T****** | |
**TG*******G*T****** | |
***********A*A****** | |
***********A*T****** | |
***********A******** | |
|
*********C*R*A****** |
|
***********G*T****** |
|
***********G******** |
|
***********G******** |
|
***********K*T****** |
|
***********K******** |
*********W*N******** | |
********C**G*A****** | |
******C**C***A****** | |
Ichtyes | ***********G*A****** |
|
*************A****** |
|
***********A*A****** |
|
***********Y******** |
|
*************A****** |
****G*T****G*A****** | |
***********K*A****** | |
********C**G******** | |
|
******************** |
|
***********A***G**** |
|
***********G*T****** |
|
***********A*T****** |
*****G*******A****** | |
|
***W*******R*W****** |
|
***********G*T****** |
Nucleariida | ***********C******** |
ITS4B ( |
|
**A******************** | |
|
**********A************ |
|
**********************A |
***************Y******* | |
|
**********R*******K**GA |
Amylocorticiales | **********A**********GA |
|
***A*****************GA |
|
******R*Y*R*******K**RK |
|
*****A****A************ |
|
************G********** |
|
**A***************G*TG* |
**A**R****K******R****R | |
**A*G****************** | |
|
*CA******************** |
|
**A*************R****RA |
|
**RRR********R****Y**RR |
*****A**********A****** | |
|
**A************T******* |
****GA*T**A*GTGT*****GA | |
****GA***************GA | |
**A*G*****RG*********** | |
|
**A*G****************G* |
|
**AC***T*************R* |
|
**********AGG*WT*****GW |
<40% identical | |
Other fungal phyla, |
<40% identical |
LB-w ( |
|
|
TA**************TG*** |
|
*****************G*** |
|
*******C*********G*** |
|
*******C************* |
|
**************A****T* |
|
*************GA****TC |
|
**************W****W* |
********************* | |
|
**************A****T* |
Viridiplantae | *****************G*** |
|
|
LR0R ( |
|
LR0Rngs (this study) |
|
|
**G*C**NR*Y****** |
|
**G************** |
|
***GC************ |
|
MMMKSY**R******** |
Viridiplantae | **********T****** |
LF402Fmix1 (this study; LF402 is a reverse complement) |
|
LF402Fmix3 (this study) |
|
Most fungi | *********TTG******** |
|
************CG**A*** |
**********Y*GTR***** | |
*********C********** | |
|
***************T**** |
***********A******** | |
|
*********CCA******** |
|
*********C****R***** |
********A*********** | |
********A****T*CW*** | |
|
*********C*A******** |
***********A******** | |
**********CR******** | |
Viridiplantae | T*********C*GG****** |
TW13 (original: T.J. White, unpublished) |
|
LR3 (original: |
|
*****A************** | |
******************** | |
Dictyostelids | RR**YR*R***T****TA** |
LR5-Fung ( |
|
|
********Y*********** |
*********C********** | |
Straminipila, Metazoa | ******************** |
Viridiplantae, |
***A***************T |
LR5 (original: |
|
TW14 (T.J. White et al. unpublished) |
|
********************* | |
Candida p.parte | ***********A********* |
**G********A**TCT**** | |
Straminipila | ****************Y**** |
|
|
SSU515F (original: |
|
SSU515Fngs (this study) |
|
*I***************** | |
|
*********C********* |
|
*********T********* |
Viridiplantae, Metazoa | ******************* |
Euk742R (this study) |
|
Many fungal groups | NR******************* |
|
TGG**ACT*G********C** |
|
GGR**ANM*R********Y** |
*****************Y*** | |
NR***************Y*Y* | |
|
TGG*T****T********A** |
CG*******G********C** | |
|
RW**NMN*********Y**** |
NNR**Y*************** | |
*****Y*************** | |
SSU817F (original: |
|
|
*********G****CSG*WWC*** |
|
*************RCWN*TKGACN |
*********G************** | |
|
**************C********* |
|
*********************R** |
AC********************** | |
**************K********* | |
**************C********* | |
*******************C*A** | |
*********************A** | |
************R******N*N*M | |
NY***W*****************R | |
NN*T********G*CAGG*CC*YT | |
NN************C***T*--** | |
Urocystidales | **************N****Y**** |
*********G************** | |
|
**************C********* |
*********************N** | |
|
*******************K*N*W |
|
*Y************Y********* |
no match | |
SSU1196R (original: |
|
SSU1196Rngs (this study) |
|
|
**CT**************T* |
|
**CT*T***********GT* |
************A******* | |
***************C*GT* | |
***************A**T* | |
**C*****C*********** | |
************R*****T* | |
|
NNNNNR**N***R**R*KT* |
Viridiplantae | ************A******* |
|
|
Recommended primer mixes for the ITS1F family | |
ITS1Fngs-Mix1 ( |
GGTCATTTAGAGGAAGTAA |
ITS1Fngs-Mix2 ( |
GGCCATTTAGAGGAAGTAC |
ITS1Fngs-Mix3 ( |
GGTCATTTAGAGGAACTAA |
ITS1Fngs-Mix4 (various groups) | GGTCGTTTAGAGGAAGTAA |
ITS1Fngs-Mix5 ( |
GGCTATTTAGAGGAAGTAA |
Recommended primer mixes for the ITS1 family | |
ITS1ngs-Mix1 (Most eukaryotes) | TCCGTAGGTGAACCTGC__ |
ITS1ngs-Mix2 ( |
TCCGTTGGTGAACCAGC__ |
Recommended ITS1 and full ITS forward primer mixes for fungi | |
ITS1Fngs (except SSU 5’ intron containing groups) | GGTCATTTAGAGGAAGTAA |
ITS1ngs (except |
TCCGTAGGTGAACCTGC |
Recommended forward primer mixes for ITS2 barcode | |
ITS3-Mix1 ( |
CATCGATGAAGAACGCAG_ |
ITS3-Mix2 ( |
CAACGATGAAGAACGCAG_ |
ITS3-Mix3 ( |
CACCGATGAAGAACGCAG_ |
ITS3-Mix4 ( |
CATCGATGAAGAACGTAG_ |
ITS3-Mix5 ( |
CATCGATGAAGAACGTGG_ |
Recommended reverse primers for ITS2 and full ITS | |
ITS4-Mix1 ( |
TCCTCCGCTTATTGATATGC |
ITS4-Mix2 ( |
TCCTGCGCTTATTGATATGC |
ITS4-Mix3 ( |
TCCTCGCCTTATTGATATGC |
ITS4-Mix4 ( |
TCCTCCGCTGAWTAATATGC |
ITS4-Euk (all eukaryotes) | TCCTSSGCTTANTDATATGC |
Recommended LF402 mixes for fungi | |
LF402f_mix1 ( |
TTCCCTTTYARCAATTTCAC |
LF402f_mix2 ( |
TTCCATTTCAACAATTTCAC |
LF402f_mix3 ( |
TTCCCTTTTGACAATTTCAC |
LF402f_mix4 ( |
TTCCCYACCRACAATTTCAC |
LF402f_mix5 (Cantharellus) | TTCTCCGTCAACAATTTCAC |
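The appendix appears to encode each taxon's primer-binding site with "*" where the template matches the primer and the template base (IUPAC code) where it differs. Below is a minimal sketch of that notation. It assumes the template is already aligned to the primer and written in the primer's orientation (for reverse primers, the template would first need to be reverse-complemented), and it treats a position as matching when every base allowed by the template code is covered by the primer code. Inosine (I) is not handled. The example compares two of the recommended ITS4 mixes against ITS4-Euk.

```python
from typing import Dict, FrozenSet

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC: Dict[str, FrozenSet[str]] = {
    code: frozenset(bases) for code, bases in {
        "A": "A", "C": "C", "G": "G", "T": "T",
        "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
        "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    }.items()
}


def mismatch_string(primer: str, target: str) -> str:
    """Return '*' where all bases allowed at the target position are covered by the
    (possibly degenerate) primer base, otherwise the target code; gaps stay '-'."""
    if len(primer) != len(target):
        raise ValueError("primer and target must be aligned to the same length")
    out = []
    for p, t in zip(primer.upper(), target.upper()):
        if t == "-":
            out.append("-")
        elif IUPAC[t] <= IUPAC[p]:
            out.append("*")
        else:
            out.append(t)
    return "".join(out)


def count_mismatches(primer: str, target: str) -> int:
    """Number of non-matching positions (gaps included)."""
    return sum(c != "*" for c in mismatch_string(primer, target))


if __name__ == "__main__":
    its4_euk = "TCCTSSGCTTANTDATATGC"
    for name, seq in [("ITS4-Mix1", "TCCTCCGCTTATTGATATGC"),
                      ("ITS4-Mix2", "TCCTGCGCTTATTGATATGC")]:
        print(name, mismatch_string(its4_euk, seq), count_mismatches(its4_euk, seq))
```

Both comparisons return twenty "*" characters: the degenerate positions S, N, and D in ITS4-Euk cover the bases of ITS4-Mix1 and ITS4-Mix2 at those sites.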
Table S1. Characteristics of soil samples used in this study. Data type: measurement.
Table S2. Taxonomic composition and clustering of the mock community sample. Data type: measurement.
Table S3. Data set of the SSU V4 and V5 barcodes. Data type: data set.
Table S4. Data set of the ITS1 barcode. Data type: data set.
Table S5. Data set of the ITS2 barcode. Data type: data set.
Table S6. Data set of the LSU D1, D2, and D3 barcodes. Data type: data set.
Table S7. Taxonomic classification of the rDNA of the fungal shotgun metagenome. Data type: taxonomic data.