Research Article |
Corresponding author: Angelina Ceballos-Escalera ( angelinaceballosescalera@gmail.com ) Academic editor: Francesco Dal Grande
© 2022 Angelina Ceballos-Escalera, John Richards, Maria Belen Arias, Daegan J. G. Inward, Alfried P. Vogler.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Ceballos-Escalera A, Richards J, Arias MB, Inward DJG, Vogler AP (2022) Metabarcoding of insect-associated fungal communities: a comparison of internal transcribed spacer (ITS) and large-subunit (LSU) rRNA markers. MycoKeys 88: 1-33. https://doi.org/10.3897/mycokeys.88.77106
|
Full taxonomic characterisation of fungal communities is necessary for establishing ecological associations and early detection of pathogens and invasive species. Complex communities of fungi are regularly characterised by metabarcoding using the Internal Transcribed Spacer (ITS) and the Large-Subunit (LSU) gene of the rRNA locus, but reliance on a single short sequence fragment limits the confidence of identification. Here we link metabarcoding from the ITS2 and LSU D1-D2 regions to characterise fungal communities associated with bark beetles (Scolytinae), the likely vectors of several tree pathogens. Both markers revealed similar patterns of overall species richness and response to key variables (beetle species, forest type), but identification against the respective reference databases using various taxonomic classifiers revealed poor resolution towards lower taxonomic levels, especially the species level. Thus, Operational Taxonomic Units (OTUs) could not be linked via taxonomic classifiers across ITS and LSU fragments. However, using phylogenetic trees (focused on the epidemiologically important Sordariomycetes) we placed OTUs obtained with either marker relative to reference sequences of the entire rRNA cistron that includes both loci and demonstrated the largely similar phylogenetic distribution of ITS and LSU-derived OTUs. Sensitivity analysis of congruence in both markers suggested the biologically most defensible threshold values for OTU delimitation in Sordariomycetes to be 98% for ITS2 and 99% for LSU D1-D2. Studies of fungal communities using the canonical ITS barcode require corroboration across additional loci. Phylogenetic analysis of OTU sequences aligned to the full rRNA cistron shows higher success rate and greater accuracy of species identification compared to probabilistic taxonomic classifiers.
clustering, fungi, ITS, LSU, metabarcoding, pathogens, phylogeny, Scolytinae
Fungal communities associated with insects have been widely studied to disentangle the ecological roles and specificities of these interactions (
There is broad agreement that the internal transcribed spacer (ITS) of the nuclear rRNA gene cluster should be the standard DNA barcode in fungi (
In practice, both the ITS and LSU/SSU markers exhibit particularities whose benefits and drawbacks depend on the aim and scope of a study (
The problem of marker choice and the comparability of metabarcoding studies using either type could be alleviated if both regions were sequenced for the same specimens. Whilst this is a powerful approach for cultured isolates (
Here, we address the problem of identification and unification of information derived from both markers using phylogenetic approaches. Metabarcodes obtained from a given community, as those associated with a single insect, should be composed of the same lineages, and thus occupy the same positions in a phylogenetic tree. Generating trees independently for ITS and LSU does not overcome the problem of associating the sequences from both amplicons, and hence the aim here is to integrate these sequences in the same tree. This may be achieved based on a scaffold of well-identified reference sequences covering the entire rRNA cluster, including ITS and LSU, to which the non-overlapping sequences for each marker are added for a joint tree search. If both markers represent the same fungal community, the corresponding ITS and LSU sequences should appear in a similar place in the tree, relative to a given reference sequence spanning both regions. Besides the greater precision of the phylogenetic position, the use of both barcodes in a single analysis also overcomes the problem of using different reference sets in the prevailing databases for ITS and rRNA markers.
We test this approach for fungal communities associated with bark beetles (Coleoptera: Scolytinae). These insects breed in living or dead trees and form close associations with fungi, which are important for access to nutrients from wood that cannot be utilised directly by the beetles themselves (
Sequence data were generated from 20 specimens per species for four species (Xylosandrus germanus, Xyleborinus saxesenii, Gnathotrichus materiarius, and Tomicus piniperda), for a total of 80 specimens (Table
Beetle species included in the study and relevant life history information.
Forest type | Beetle species | Status | Adapted structures | Feeding mode |
---|---|---|---|---|
Spruce, oak | Xylosandrus germanus | Introduced | Mesonotal mycangia | Xylomycetophagous |
Spruce, oak | Xyleborinus saxesenii | Native | Elytral mycangia | Xylomycetophagous |
Pine, spruce | Gnathotrichus materiarius | Introduced | Tubular opening near precoxae | Xylomycetophagous |
Pine, spruce | Tomicus piniperda | Native | No known mycangia | Xylophagous |
In the laboratory, the specimens were rinsed with pure water to remove loosely adhering fungal tissue, and thoroughly macerated individually to ensure that all fungi associated with the specimens were released. DNA was extracted using the DNeasy Blood and Tissue spin column extraction kit (Qiagen, Valencia, CA, USA). Individual DNA extracts were first tested for correct beetle species identification using the COI barcode marker, which was amplified for a 418 bp fragment and sequenced on Illumina HiSeq following methods of
The DNA extracts were then used for fungal metabarcoding of the ITS2 region with primers ITS86F (5′-GTGAATCATCGAATCTTTGAA-3′) (
Raw reads were demultiplexed, primer sequences trimmed, and singleton reads removed with Cutadapt v. 2.10 (
Forward and reverse reads were merged and quality filtered (Phred score ≥ 30) using PEAR v. 0.9.8 (
Fungal OTUs were classified following three widely used methods for species identification. The Ribosomal Database Project (RDP) Bayesian Classifier (
The Sordariomycetes subset was also used to test the effect of variable sequence similarity thresholds on the classification, by generating OTUs under clustering at 97%, 98%, and 99% similarity and comparing the taxonomic assignments, using the RDP classifier (Warcup 2 and Fungal 11 training sets for ITS2 and LSU D1-D2, respectively) (
Reference sequences for the class Sordariomycetes were downloaded from Genbank, querying the database for various permutations of the gene names for the rRNA cluster composed of SSU, LSU and ITS, separately for each target fungal order. Only sequences that were complete for at least 2/3 of the rRNA operon were chosen (full list of accessions in Suppl. material
Phylogenetic distribution of ITS2 and LSU D1-D2 copies was assessed by metrics of clustering and over-dispersion originally developed for community ecology (
Community ecological analyses were carried out on samples rarefied to 1000 reads, which was sufficient for generating largely complete OTU sets as judged by species accumulation curves (Suppl. material
The Jaccard index was used to calculate beta-diversity between sample pairs based on OTU presence-absence data (richness) (betapart R package;
Sequencing of 80 libraries produced 2,436,075 quality-filtered, merged reads for ITS2 and 1,742,119 reads for LSU D1-D2, which resulted in 1157 OTUs from ITS2 and 548 OTUs from LSU D1-D2 after bioinformatics filtering and clustering at 97% threshold (1546 and 632 OTUs if singleton and doubleton reads were retained and without applying rarefaction on each library). Identifications of OTUs at ≥ 70% confidence level obtained with IDTAXA, Protax-fungi and RDP were higher for ITS2 than for LSU D1-D2 at all hierarchical levels from class to order, family, genus and species level (Fig.
Libraries from 73 beetle specimens remained after rarefaction and harboured a total of 1180 OTUs for ITS2 and 553 OTUs for LSU D1-D2. Using taxonomic classifiers, OTUs were assigned to 24 classes, 66 orders, 129 families and 369 genera. Identification at class level revealed the presence of 23 classes for ITS2 and 17 classes for LSU. The dominant classes were Dothideomycetes for ITS2 and Sordariomycetes for LSU D1-D2 (Fig.
Top panel: The proportion of OTUs identified as members of a fungal Class determined by the ITS2 and LSU D1-D2 regions. For the spruce forest, only nine X. germanus and four X. saxesenii specimens were retained after rarefaction. Lower panel: The number of fungal OTUs per beetle specimen, separate for each beetle species and forest type, for ITS2 and LSU.
Fungal communities obtained with either marker were compared with regard to total richness and differentiation across beetle species and forest type. For both markers, species accumulation curves displayed a similar shape, despite the roughly twice higher OTU number in ITS2, with a slow increase and not reaching a plateau, although LSU D1-D2 generally showed a more pronounced ‘shoulder’ indicating a fraction of OTUs that is encountered commonly in multiple samples. Across the different forest types, species accumulation in oak forest clearly lagged pine and spruce forests (fewer total species, slower accumulation) in both markers (Suppl. material
Richness in a single-beetle extract ranged from 9 to 140 fungal OTUs (average 56 ± 32.34) in ITS2 and from 11 to 109 fungal OTUs (average 48 ± 24.27) in LSU D1-D2 (Fig.
Correlation of species richness with beetle species and forest type. The table shows the result of a GLM analysis showing the percentage of explained variance for each predictor with the F parameter and significance level.
Factor | Explained variance | Fx,y | p | |||
---|---|---|---|---|---|---|
ITS2 | LSU | ITS2 | LSU | ITS2 | LSU | |
Beetle + forest type | 8.46% | 7.58% | F 6, 65 = 2.809 | F 6, 65 = 3.521 | <0.1 * | <0.05 * |
Beetle | 27.07% | 18.25% | F 3, 69 = 30.265 | F 3, 69 = 29.888 | <0.001 *** | <0.001 *** |
Forest type | 18.75% | 27.47% | F 2, 67 = 6.772 | F 2, 67 = 5.236 | <0.01 ** | <0.01 ** |
Unexplained | 45.72% | 46.68 % |
The indval function revealed significant levels of association with the tree species and or the beetle species for 50 and 60 OTUs, respectively, from the ITS2 and LSU D1-D2 regions. Many OTUs showed positive associations with pine and spruce, but much fewer with oak. Regarding the associations with beetle species, many OTUs had positive associations with T. piniperda, and to some extent with G. materiarius, whereas positive associations with Xs. germanus and Xb. saxesenii were limited to a small number of oak associated OTUs. Most other associations in these species were negative; e.g., the pine/spruce associated OTUs were absent, despite the fact that both beetle species were also sampled from spruce. General patterns of OTU associations and non-associations were similar for the two xyleborine species, and they were quite similar to those associated with oak. In contrast, association patterns in T. piniperda and G. materiarius were similar to pine and spruce (Fig.
Heatmap using Pearson’s correlation coefficient between the OTUs generated from the ITS2 and LSU D1-D2 metabarcodes and the analysed beetle species and forest types. Rectangles indicate the strength of association between an OTU and beetle/forest (strongly negative, grey, to strongly positive, red). Fungal OTUs (on the horizontal axis) were classified to genus or species level where possible; they are shown in random order and cannot be linked taxonomically between both markers.
A phylogenetic approach was used to associate ITS-based and LSU-based OTUs with each other, focusing on the class Sordariomycetes that includes the Ophiostomatales of important tree pathogens for which ITS efficiency has been questioned (
Sequence numbers and phylogenetic dispersion in Sordariomycetes OTUs under different threshold values. The table presents the Net relatedness index (NRI), nearest taxon index (NTI), and the number of OTUs recovered for ITS2 and LSU D1-D2. “Mixed” refers to a clustering threshold of 99% for LSU D1-D2 and 98% for ITS2. Reference sequences were included when building the trees used, though pruned (leaving only OTUs in the tree) for the above calculations. Positive NRI and NTI scores indicate phylogenetic clustering of either ITS and LSU sequences (indicating different species sets were sequenced), negative values indicate phylogenetic over-dispersions of ITS and LSU with respect to each other (indicating the same species was sequenced for the two markers).
ITS2 | LSU | |||||||
---|---|---|---|---|---|---|---|---|
97% | 98% | 99% | Mixed | 97% | 98% | 99% | Mixed | |
NRI | -0,111 | -0,805 | 1,497 | -0,122 | 1,328 | 0,697 | -2,55 | -0,653 |
NTI | -1,212 | -3,81 | -2,386 | -2,277 | 1,367 | 0,882 | -0,183 | -1,343 |
OTU count | 138 | 144 | 158 | 144 | 80 | 102 | 150 | 150 |
OTU sequences from both markers were included in a phylogenetic analysis together with publicly available full-length sequences covering the full or most of the rRNA cluster, including the ITS2 and LSU D1-D2 regions, with the SSU gene also present in most accessions. These sequences served as a scaffold to represent the major orders of Sordariomycetes (full list of accessions in Suppl. material
OTUs obtained from ITS2 and LSU D1-D2 were widely distributed on this tree, and across most orders, both types of sequences were interleaved, showing that overall community diversity at the order level could equally be inferred using either region (Suppl. material
Closed-reference clustering against the reference dataset to each order within Sordariomycetes by the RDP classifier revealed species-level matches for both ITS2 and LSU sequences (Fig.
Where closely related reference sequences were missing, ITS2 and LSU sequences may be matched based on their phylogenetic proximity, but the ITS2 and LSU sequences obtained from a single genome may not appear as sister taxa because the gene sequences are non-overlapping and thus lack characters that could group them. We tested the degree to which ITS2 and LSU sequences interleave on the tree, by assessing phylogenetic clustering and dispersion with the NRI and NTI (Table
The detailed observations were confirmed by the global classification of OTUs at order level, which showed an increase in the proportion of identified OTUs with increasing threshold value for LSU, but not ITS2 (Fig.
ML tree of Sordariomycetes constructed from the reference sequence alignments and OTUs for both markers (clustering thresholds: 98% ITS2, 99% LSU D1-D2). Leotia lubrica (Leotiomycetes) was specified as the outgroup. The assignment of OTUs by each of the three classifiers (RDP, IDTAXA, Protax-fungi) is shown by coloured boxes. Terminals missing these boxes are the reference sequences. Coloured dots on the nodes of the tree indicate the hypothetical ancestor defining monophyletic groups corresponding to the various orders of Sordariomycetes. The extent of each order is indicated by the coloured inner ring. Note that the ancestor of an order is defined by the youngest node from which all reference sequences are descended; OTUs falling outside of the resulting clades appear as ‘unassigned’ by the phylogenetic analysis approach. The distribution of ITS2 (red squares) and LSU D1-D2 (blue bullets) relative to the reference set (yellow stars) on each of the tips of the tree. Note the limited presence of ITS sequences in the Ophiostomatales (in top right quadrant).
Order-level trees and splitting/lumping of OTUs at clustering A order-level ML trees with mixed OTU clustering thresholds (99% LSU D1-D2, 98% ITS2). Full tree in supplementary materials. Leotia lubrica was used as the outgroup (not pictured). Brackets indicate reference taxa linked to an ITS2 and/or LSU OTU, with colours indicating potential splitting/lumping (blue, splitting; green, lumping; orange, 1:1) B diagram illustrating the effects of splitting and lumping of an OTU in the fungal community on the tree inference. Four hypothetical species (A to D) in a community are treated under uniform clustering thresholds for ITS2 and LSU. This may result in deviation from the 1:1 ratio of OTUs expected if each species in the community is represented equally by both markers (species A). Threshold values may be too high, resulting in splitting of species into multiples OTUs, which is likely to affect the more variable ITS2 region (species B) or may be too low, resulting in lumping of multiple species into a single OTU, likely to affect the conservative LSU region (species C and D).
Closed reference clustering of OTUs and phylogenetic trees at different thresholds A results from the closed reference clustering of OTUs at each clustering threshold against composite LSU/ITS2 reference sequences. LSU matches in green, ITS2 matches in blue, linked matches (for which both an ITS2 and LSU OTU were matched to a reference sequence of the same species) in yellow. Underlined taxa indicate new matches at each clustering threshold B phylogenetic tree of LSU OTUs under increasingly stringent clustering thresholds, with arrows marking newly added taxa as threshold values are increased.
Metabarcoding has revolutionised the study of fungal communities, revealing the huge proportion of hitherto unobserved species, including the unexpectedly high diversity of fungi associated with bark beetles (
We assessed the congruence of signal from ITS and LSU metabarcoding for the characterisation of fungal communities, by (1) comparing the ecological associations at various taxonomic levels as established with either marker (for the entire fungal set), and (2) testing the species-level correspondence of OTUs from both markers based on their phylogenetic positions (for the class Sordariomycetes only). As we showed in both cases, OTU identification is challenging and depends on the available reference databases, as well as the specific strategy for linking the metabarcode sequences into the taxonomic system. Taxonomic classifiers are now widely used and are becoming increasingly sophisticated. However, placement was possible mostly to higher taxonomic levels only (Fig.
In contrast, simple counts of OTU numbers (a proxy of species richness) produced a good correlation between both markers in several key parameters describing the community composition. First, the numbers of OTUs and the higher-level composition of fungal communities obtained from each treatment (beetle species, forest type) assessed with the ITS2 and LSU D1-D2 data closely mirror each other (Fig.
Yet, the difficulty of linking these metabarcoding sequences across multiple markers leaves some uncertainty about the biological relevance of the community data, which still may represent different species within the major taxonomic groups recovered by either ITS2 and LSU D1-D2, as already suggested for the Ophiostomatales (
As illustrated for the Sordariomycetes, we found that OTU assignments obtained by taxonomic classifiers are broadly in agreement with the phylogenetic analysis. The backbone of the phylogenetic tree from the full-length rRNA reference sequences recovered each order of the Sordariomycetes as monophyletic (Fig.
Second, we used the phylogenetic analysis to determine if both markers reveal the same species-level entities. Under ideal circumstances, each species is represented by exactly one sequence each of LSU and ITS, and these two sequences from both markers find themselves in the same position of the tree. As both markers are non-overlapping, they only can be placed relative to full-length reference sequences rather than to each other, and therefore if 1:1 represented for each species, sequences of ITS and LSU should be uniformly interleaved on the tree (Fig.
There are uncertainties associated with this inference. Across fungal species, intraspecific ITS variability varies considerably, highlighting the challenges and inevitable shortcomings resulting from the selection of a uniform OTU clustering threshold (
We addressed the problem of marker choice in fungal metabarcoding for the study of biodiversity patterns and taxonomic identifications. Community-level diversity metrics showed high consistency of results from metabarcoding with both the ITS2 and LSU D1-D2 regions, using OTU clustering under the widely used 97% threshold level. However, when identified with standard taxonomic classifiers, great discrepancies in the taxonomic composition at species level were evident between both markers. We attempted to reconcile the two distinct ‘images’ of the community using a phylogenetic approach that incorporates barcodes from both regions into a single phylogeny generated from reference sequences covering the full rRNA cistron. We find that the ITS2 and LSU D1-D2 metabarcodes are broadly interleaved in these trees, linking individual sequences across markers. This analysis also was used to select threshold values for clustering in each marker, recommending a mixed strategy of 98% similarity for ITS2 and 99% similarity for LSU D1-D2. Phylogenetic approaches which, unlike taxonomic classifiers, do not rely on sequence similarities with marker-specific reference sets, can link barcodes from different regions and provide greater precision of taxonomic placement. In addition, the approach provides a means to evaluate threshold values for clustering; despite the general tendency for the use of denoised ‘exact sequence reads’ (ASVs;
We express our gratitude for advice and consultations to Carola Gomez-Rodriguez (Santiago de Compostela), Kirsten Miller (Newcastle), and Ruth Bone (Kew Gardens). We thank Nicolas Magain and Brendan Furneaux for their most helpful comments on an earlier version of the manuscript. This work was funded by a PhD studentship of the John Spedan Lewis Foundation to ACE.
Figure S1. Length distribution of the ITS (grey) and LSU (orange) OTUs
Data type: Eps file.
Explanation note: The average length for each amplified region is indicated with a dashed line.
Figure S2. Species accumulation curves of the OTUs generated from the ITS (panel right) and LSU (panel left) metabarcodes
Data type: Eps file.
Explanation note: Colours show the different forest types in which beetles were trapped: oak (red), spruce (blue) or pine (green).
Figure S3. Maximum-likelihood tree constructed in IQ-Tree2 based on three-gene (LSU D1-D2, SSU, ITS2) reference sequence alignments and OTUs for both markers (clustering thresholds: 99% LSU D1-D2 and 98% ITS2)
Data type: Pdf file.
Explanation note: Leotia lubrica (Leotiomycetes) used as the outgroup. Node values indicate ultrafast bootstrap approximation support (n = 1000).
Figure S4. Binding site of ITS86 primer showing mismatched base pairs in Ophiostomatales
Data type: Eps file.
Explanation note: Degenerate primer suggestion with variable base pairs in bold. While all other reference sequences were consistent with the non-Ophiostomatales sequences, only several are shown for clarity.
Table S1. Accession numbers corresponding with the reference sequences used to build the phylogenetic trees
Data type: Xlsx file.
Table S2. Class level identification of OTUs showing the number of OTUs produced with ITS2 and LSU and the proportion of the total OTU set on the rarefied data
Data type: Xlsx file.
Explanation note: OTUs classified at species level but not correctly classified at class level were considered as “Misclassified”.
Table S3. Number of OTUs assigned to each order based on RDP Bayesian classifier
Data type: Xlsx file.
Explanation note: Average confidence scores were calculated over all order-level assignments, though only classifications with > 50% confidence were taxonomised, all others were kept as “unclassified Sordariomycetes”.