﻿Metabarcoding of insect-associated fungal communities: a comparison of internal transcribed spacer (ITS) and large-subunit (LSU) rRNA markers

﻿Abstract Full taxonomic characterisation of fungal communities is necessary for establishing ecological associations and early detection of pathogens and invasive species. Complex communities of fungi are regularly characterised by metabarcoding using the Internal Transcribed Spacer (ITS) and the Large-Subunit (LSU) gene of the rRNA locus, but reliance on a single short sequence fragment limits the confidence of identification. Here we link metabarcoding from the ITS2 and LSU D1-D2 regions to characterise fungal communities associated with bark beetles (Scolytinae), the likely vectors of several tree pathogens. Both markers revealed similar patterns of overall species richness and response to key variables (beetle species, forest type), but identification against the respective reference databases using various taxonomic classifiers revealed poor resolution towards lower taxonomic levels, especially the species level. Thus, Operational Taxonomic Units (OTUs) could not be linked via taxonomic classifiers across ITS and LSU fragments. However, using phylogenetic trees (focused on the epidemiologically important Sordariomycetes) we placed OTUs obtained with either marker relative to reference sequences of the entire rRNA cistron that includes both loci and demonstrated the largely similar phylogenetic distribution of ITS and LSU-derived OTUs. Sensitivity analysis of congruence in both markers suggested the biologically most defensible threshold values for OTU delimitation in Sordariomycetes to be 98% for ITS2 and 99% for LSU D1-D2. Studies of fungal communities using the canonical ITS barcode require corroboration across additional loci. Phylogenetic analysis of OTU sequences aligned to the full rRNA cistron shows higher success rate and greater accuracy of species identification compared to probabilistic taxonomic classifiers.


Introduction
Fungal communities associated with insects have been widely studied to disentangle the nature of their interactions (Ganter 2006;Li et al. 2016;Malacrinò et al. 2017;Raman et al. 2012;Six 2020). For these studies to succeed, an accurate and reliable fungal identi cation is essential. However, these identi cations are challenging due to the cryptic nature and incomplete taxonomy of the fungi, as only 3-8% of fungal species have been described so far (Hibbett et al. 2016; Wedaralalage et al. 2020).
Conventional studies of fungal communities have been conducted by isolating and culturing each fungus recovered from the substrate (Batra 1963), which generated biases towards culturable species. Highthroughput sequencing has provided an alternative methodology, especially through metabarcoding, i.e. the sequencing of short amplicons from mixed communities (Yu et al. 2012). Metabarcoding is now widely applied in characterising species composition and biodiversity of fungal communities associated with insects. In the speci c case of fungal communities associated with bark beetles, metabarcoding usually detects dozens of species of fungi on a single insect specimen (Bálint et  In practise, both ITS and the LSU/SSU markers exhibit particularities whose bene ts and drawbacks depend on the aim and scope of a study (Porras-Alfaro et al., 2014). While the LSU/SSU genes are less variable than the ITS intergenic region, which favours alignment and tree-based analyses, their low molecular rate reduces the taxonomic resolution at the species-level. In turn, ITS provides better species resolution, but the higher mutation rate leads to intragenomic variation of the tandem repeat units, and high variability often translates into poor alignments, making it di cult to apply phylogenetic methods However, these studies generally have applied a coarse-grain approach of higher-level taxonomic analysis, rather than the species level. As the databases become more complete and the quality of HTS reads is improving, increasingly precise matches of database and query allow a greater detail of community analysis, ultimately at the species (Operational Taxonomic Unit, OTU) or haplotype (Ampli ed Sequence Variant, ASV) level. In the few cases where this approach has been taken to date, the choice of markers seems to be critical. For example, the ITS marker may not detect key pathogen species that are recovered using LSU (Hulcr et  This leaves the question about which marker is preferrable, and given the presumed advantages of each, to what degree does the marker choice matter for the conclusions from metabarcoding studies? Here we address these questions for the case of fungal communities associated with bark beetles (Coleoptera: Scolytinae). These insects breed in living or dead trees and form close associations with fungi important for access to nutrients from wood that cannot be utilised directly by the beetles themselves (Batra, 1963). Fungal communities associated with these beetles are highly diverse and form symbioses of various strength and speci city, which in many cases involve the active transport of fungal associates in speci cally adapted pockets of the beetles' exoskeleton, the mycangia (Six, 2020). A beetle-fungus complex can cause enormous damage to forest ecosystems, e.g. resulting in the demise of chestnuts in North America, elms across the Northern Hemisphere, or the recent large-scale decline of conifer forests in Central Europe and North America, which usually involve fungi from the ascomycote orders Ophiostomatales, Microascales and Hypocreales (Class Sordariomycetes) (Ploetz et al. 2013). Metabarcoding provides a tool for detailed studies of these complex communities, but the results may be in uenced by the choice of barcode markers and other problems of using short sequences from PCR mixtures to characterise complex species communities. Con dence in the metabarcoding approach can be improved by analysing ITS and LSU marker sets from a given sample in parallel, which should produce the same outcome. However, given different evolutionary rates, the limited power of phylogenetic placement, and different resolution of species differentiation may result in mismatched conclusions from either marker. We used individuals from four bark beetle species obtained from three forest types to characterise the associated fungal communities, conducting a comparison of the two markers with regard to: (1) broad ecological trends of fungal associations taking a whole-community approach, and (2) species identi cations against existing ITS and LSU fungal reference databases, using various taxonomic classi ers and explicit phylogenetic methods. The side-by-side comparison addresses the power of either markers to infer critical parameters of fungal community metabarcoding, such as the number and taxonomic identity of OTUs, their ecological associations, and the inferences on wholecommunity diversity and turnover.

Samples used and laboratory procedures
Sequence data were generated from 20 specimens each of four beetle species: Xylosandrus germanus, Xyleborinus saxesenii, Gnathotrichus materiarius, and Tomicus piniperda (Table 1). Only the latter is a xylophagous 'bark beetle' in the strict sense, while the three others are considered mycelia feeding (xylomycetophagous) 'ambrosia beetles' that apparently rely on active transport of fungi indicated by the presence mycangia (see Six, 2020b). Specimens were collected by Forest Research during 2013-2015 in the New Forest National Park (50º50'52.08" N 1º35'33.51" W), Hampshire, UK, using Lindgren multiplefunnel traps (Lindgren 1983) (Phero Tech). These traps were located in oak, spruce and pine forests and were baited with lures (100% ethanol, plus α-pinene) (Inward 2019). Propylene glycol (65%) was used as the preservation uid at the bottom of the traps. Specimens were morphologically identi ed and selected at random to obtain the same number of specimens per beetle species and forest type.
In the laboratory, the specimens were rinsed with pure water to remove loosely adhering fungal tissue, and thoroughly macerated individually to ensure that all fungi associated with the specimens were released. DNA was extracted using the Qiagen DNeasy Blood and Tissue spin column extraction kit (Qiagen, Valencia, CA, USA). Individual DNA extracts were rst tested for correct species identi cation using the COI barcode marker, which was ampli ed for a 418 bp fragment and sequenced on Illumina distinguish the libraries. PCRs were pooled from three replicates conducted under slightly different annealing temperatures (54°C, 55°C and 56°C) to accommodate differences in optimal ampli cation conditions of the fungal species (Schmidt et al. 2013) and blank PCR reactions were used as negative control. Successful PCR amplicons were puri ed using the AMPure XP magnetic beads (Beckman Coulter) and amplicons were indexed using a secondary PCR with Nextera XT DNA Library Preparation Kit (Illumina Inc.) and sequenced on an Illumina® HiSeq® 2500 platform to generate 2 x 300 bp paired-end reads.

Bioinformatics
Raw reads were demultiplexed, primer trimmed and singleton reads removed with Cutadapt (Martin 2011). Read quality was evaluated using FASTQC (Andrews 2014). The raw reads generated for these analyses are available in the BioSample Submission Portal as Bio-Project PRJNA727174 and Sequence Read Archive (SRA) accession numbers PRJNA727174 (barcoding) and (metabarcoding).
Forward and reverse reads were merged and quality ltered (Phred score ≥ 30) using PEAR v. 0.9.8 (Zhang et al. 2014). After merging, the average read length was 252 bp for ITS and 357 bp for LSU.
Subsequent steps were carried out using VSEARCH v. 2.15.0 (Rognes et al., 2016). A further quality test was effectuated using, with the --fastx_ lter command and -fastq_maxee 1.0. After dereplication (--derep_fulllength), assemblies were denoised (--cluster_unoise --minsize 4 --unoise_alpha 2) and length ltered for a range of 100 to 500 bp (--fastx_ lter) and all singletons removed. Chimera ltering was performed (--uchime3_denovo) and reads were then clustered into Operational Taxonomic Units (OTUs) at various similarity thresholds (97%, 98%, 99%) using VSEARCH. The average length of the OTUs was 270 bp for ITS and 347 bp for LSU (Supplementary Figure S1). Reads were then mapped to the OTU clusters, outputting an OTU table of read abundances suitable for the ecological analysis of the samples. platform and based on the UNITE fungal database (February 2020). PROTAXFUNGI hierarchically assigns the OTU identities from the root node of the taxonomy through to the species level (Nilsson et al. 2019b). It has not been implemented for LSU, and thus was applied to the ITS data only. A third classi er, IDTAXA, employs machine learning to reduce over-classi cation errors to obtain a higher accuracy (Murali et al. 2018). Taxonomic assignment was carried out separately on class, order, genus, and species level. The percentage accuracy of identi cation from these methods was averaged as a measure of con dence of each identi cation (average of three values for ITS and two for LSU data). A minimum threshold of 70% probability for at least one of the classi ers was set, below which the OTUs were considered as "unclassi ed", together with other sequences that were identi ed with high con dence against database entries labelled as "unclassi ed", "unidenti ed" or "incertae sedis". When identi cations disagreed among the classi ers, the one with the highest probability was selected. Taxonomic composition of samples was presented as the number of OTUs assigned to a given taxonomic level in a barplot created with ggplot2 in Rstudio (team 2015).
In a more detailed study of OTU assignments in the ecologically important class Sordariomycetes, OTUs generated from clustering at 97%, 98%, and 99% similarity thresholds were classi ed using the Warcup training set and the RDP fungal data set for ITS2 and LSU, respectively (Deshpande et al. 2016). All OTUs with a con dence of assignment > 80% to class Sordariomycetes were retained. Order-level assignments (the Sordariomycetes are split into 28 orders) with a con dence > 50% were taxonomized, while all others were kept as "unclassi ed Sordariomycetes". To assess the effects of differing clustering thresholds on downstream taxonomic assignment, OTUs at each clustering threshold were also closed-reference clustered (i.e. only retaining sequences with hits in the reference set) against the composite LSU/ITS2 reference sequences used to construct the tree (Edgar 2010; Rognes et al. 2016).

Alignment and tree building in Sordariomycetes
Reference sequences for the class Sordariomycetes were downloaded from GenBank, consisting of species that were complete for at least 2/3 of the rRNA cluster composed of SSU, LSU, and ITS2 sequence (full list of accessions in Suppl. Table 1). 80% of species in this reference set had accessions covering all three regions. ITS2 reference sequences were processed through ITSx to eliminate redundancy in the concatenated alignment (Bengtsson-Palme et al. 2013).
The reference sequences and OTUs were aligned using MUSCLE (Edgar 2004) under default settings and the aligned matrices were concatenated. The concatenated three-gene alignment was then inspected in Mesquite and Geneious Prime (v 2020.0.4) and problematic accession sequences were removed. The alignment was partitioned for each marker region, and the best model for each partition was selected according to BIC values. Model testing, tree building, and ultrafast bootstrap approximation (n = 1000) were performed in IQ-Tree2 (Chernomor et al. 2016;Minh et al. 2020).

Phylogenetic diversity metrics
Phylogenetic distribution of ITS and LSU copies was assessed by metrics of clustering and overdispersion originally developed for community ecology (Webb et al. 2008). In the ideal case of capturing the same species with both markers, copies of ITS2 and LSU corresponding to the same species should be in close vicinity on the tree, i.e. the copies of each marker should be 'overdispersed' (more dispersed than a random phylogenetic structure). Deviations from this pattern can be assessed with the metrics calculating the Mean Pairwise Distances (MPD) and Mean Nearest Taxon Distances (MNTD) of each set (ITS2 and LSU). We report standardized values as the net relatedness index (NRI) and nearest taxon index (NTI) relative to null models of randomly distributed communities.

Assessment of species richness and community composition
Community ecological analyses were carried out on samples rare ed to 1000 reads, which was su cient for generating largely complete OTU sets as judged by species accumulation curves ( Supplementary Fig.  2), thus retaining 73 samples from the four species. Species accumulation curves were built with the specaccum function of the vegan package (Oksanen and et al. 2018). An OTU table and species classi cation was generated for fungal communities separately from ITS2 and LSU sequencing, after singletons and doubletons reads were removed. Fungal OTU richness among samples was assessed with a Generalised Linear Model (GLM), built using the lme4 package (Bates et al. 2015) of R, with fungal OTU richness as a response variable and beetle species and forest type as dependent variables. The Negative Binomial model was chosen, as it is suitable for over dispersed data. A post hoc pairwise comparison (Tukey HSD test at the 95% signi cance level) was carried out to compare the means among the different factors.
Sorensen and Jaccard indexes were used to calculate beta-diversity between sample pairs based on presence-absence data (richness), and the Bray-Curtis index was used for turnover based on the relative abundance of each OTU. The variation was visualised using multidimensional scaling MDS (betapart R package; (Baselga and Orme 2012), and the signi cance of the distribution was assessed using permutational multivariate analysis of variance (PERMANOVA) under a model for forest type, beetle species and their interaction, calculated with the adonis function of the vegan package (permutations = 999) (Oksanen and et al. 2018). A permutational multivariate analysis of dispersion (PERMUTEST) was performed to evaluate the assumption of homogeneity of variance in the data by using the betadisper function (Anderson and Walsh 2013). The non-euclidean distances between the group centroids were also calculated to assess the group dispersion.
To evaluate the stringency of association of fungal OTUs with tree species and beetle hosts for each assembly, a multilevel pattern analysis was carried out by calculating Pearson's phi coe cient of association ("p.g") (Chytrý et al. 2002) between sample pairs, correcting this index to account for the differences in specimen numbers among the compared groups (function multipatt of the indicspecies R package; (De Cáceres et al. 2011). The identi ed OTUs for which the association values were signi cant were displayed as a heatmap (aheatmap function, NMF R package; (Gaujoux and Seoighe 2010).

Results
Sequencing of 80 libraries produced 2,436,075 quality-ltered and merged reads for ITS2 and 1,742,119 reads for LSU, which resulted in 1157 OTUs from ITS2 and 548 OTUs from LSU after bioinformatics ltering and clustering (1546 and 632 OTUs if singleton and doubleton reads were retained and without applying rarefaction on each library). Identi cations of OTUs at ≥ 70% con dence level obtained with IDtaxa, PROTAXfungi and RDP were higher for ITS2 than for LSU at all hierarchical levels from class to order, family, genus and species level (Fig. 1). Yet, OTUs identi ed by one or multiple identi ers never exceeded 61.5% for ITS and 41.5% for LSU of the total OTUs. Identi cations dropped consistently from class to species level, and with each hierarchical level an increasing proportion of identi cations was provided by a single classi er only, indicating the growing uncertainty of taxonomic assignments. A classi cation at species-level was generally not possible for LSU because of the limitations of the databases that generally provide a taxonomy string to genus level only, and thus nearly 100% of the OTUs remained unidenti ed at this level. Nearly 50% of sequences were identi ed to species level with ITS but in virtually all cases only a single classi er produced these results (Fig. 1).
The 73 beetle specimens remaining after rarefaction harboured a total of 1180 OTUs for ITS and 553 OTUs for LSU. These OTUs could be assigned to 24 classes, 66 orders, 129 families and 369 genera. Identi cation at class level revealed the presence of 23 classes for ITS2 and 17 classes for LSU. The dominant classes were Dothideomycetes for ITS2 and Sordariomycetes for LSU ( Fig. 2A, B, Table S1). ITS2 produced twice as many identi ed OTUs, and in the classes Leotiomycetes and Tremellomycetes more than ve times as many, compared to LSU, due to the greater total number of OTUs and the higher proportion of them being fully identi ed. ITS2 also detected seven fungal classes not retrieved with LSU (Archaeorhizomycetes, Chytridiomycetes, Mucoromycetes, Orbiliomycetes, Spizellomycetes, Tritirachiomycetes and Ustilaginomycetes), while LSU recovered only one class not obtained with the ITS2 (Atractiellomycetes). Only for the Sordariomycetes and Agaricomycetes the proportion of OTUs detected with LSU was higher than with the ITS2 marker.
Effects of the ampli ed sub-region on the ecological analysis Fungal communities obtained with either marker were compared with regard to total richness and differentiation across beetle species and forest type. For both markers, species accumulation curves displayed a similar shape, despite the roughly twice higher OTU number in ITS2, with a slow increase and not reaching a plateau, although LSU generally showed a more pronounced 'shoulder' indicating a fraction of OTUs that is encountered commonly in multiple samples. Across the different forest types, species accumulation in oak forest clearly lagged behind pine and spruce forests in both markers (Supplementary Figure S2).
Richness in a single-beetle extract ranged from 9 to 140 fungal OTUs (mean of 56) in ITS2 and from 11 to 109 fungal OTUs (mean 48) in LSU (Fig. 2C, D). Despite some scatter among individual beetles, the number of OTUs per sample differed in a characteristic way between beetle species and forest types, and these differences were closely correlated in ITS2 and LSU, indicating that both markers detected a similar set of fungal species. This correlation was also evident at specimen level in the two outliers in each of the libraries corresponding to the same beetle individual. The variation in species richness explained by forest type and beetle species was broadly similar in ITS2 and LSU derived fungal communities ( Table 2), but the LSU data attributed a greater proportion of the variation to the forest type alone (27.47% compared to 18.75% from ITS2), rather than to the interaction of forest type and beetle species. Both ITS2 and LSU explained around 7% of the variation to be due to beetle species. MDS plots on the OTU composition revealed a very similar pattern of community separation of the three forest types in ITS2 and LSU (Fig. 3).
Tests of associations of OTUs to particular beetle species and forest types with the indval function revealed a large number of signi cantly positive and negative associations (Fig. 4). Despite representing fewer OTUs in total, the number of OTUs with signi cant levels of association was slightly higher in LSU compared to ITS2 (60 versus 50 OTUs). Positive associations of OTUs were prevalent in pine and spruce forests, but much less in oak. Positive associations were particularly strong with T. piniperda, and to some extent G. materiarius, whereas positive associations with Xs. germanus and Xb. saxesenii were limited to a small number of oak associated OTUs. Most other associations in these species were negative; e.g. the pine/spruce associated OTUs were absent, despite the fact that both beetle species were also sampled from spruce. General patterns of OTU associations and non-associations were similar for the two xyleborine species, and they were quite similar with those in Oak. In contrast, association patterns in T. piniperda and G. materiarius were similar to Pine and Spruce (Fig. 4). The similarity in these association patterns differed only slightly if assessed with the ITS2 and LSU-based OTUs (Fig. 4), despite the fact that the OTUs themselves could not be linked up between both markers, as they mostly were not identi ed or the identi cations did not overlap between the two marker sets.

OTU identi cations across markers
Because it remained unclear if the observed similarities in community composition were based on the same set of OTUs, a phylogenetic approach was used to associate ITS2-based and LSU-based OTUs with each other, focusing on the class Sordariomycetes as a model system. OTUs were clustered at minimum similarity thresholds of 97, 98 and 99%, which resulted in between 120-150 OTUs for ITS and 80-120 OTUs for LSU classi ed as Sordariomycetes using the RDP classi er at con dence > 80% (Table 3). OTU sequences from either marker were used for phylogenetic analysis together with publicly available fulllength sequences covering all or most of the rRNA cluster, including the LSU and ITS2 regions, besides the SSU gene present in most accessions. These sequences served as a scaffold for the major orders of Sordariomycetes (full list of accessions in Suppl. Table S1), to which the metabarcode data were added, using clustering thresholds of 99% similarity for LSU and 98% for ITS2. These conditions were used because they generated a similar number of OTUs for each marker (Table 3), and thus potentially represent a similar set of species.
Maximum-likelihood trees for the three-gene reference alignment and OTUs from metabarcoding resolved basal relationships similar to those found in the literature ( Figure S3). OTUs obtained from ITS and LSU were widely distributed across this tree, and their placement generally matched the order-level assignment obtained with the RDP classi er, which was dominated by the orders Hypocreales, Diaporthales, Ophiostomatales and Xylariales (Fig. 5) (Suppl. Figure S4). Across most orders in the trees, ITS2 and LSU sequences were recovered in close proximity and the distribution of both types of sequences across orders was similar, showing that overall community diversity at the order level could equally be inferred using either ITS2 or LSU (Suppl. Figure  S2). Order-level subsets of trees for these orders showed the placement of ITS2 and LSU sequence fragments relative to the reference set (Figs. 6A-C). If both sequences are derived from the same genomic template in the metabarcoding ampli cation they were expected to be represented by one OTU representative sequence for each marker, and these sequences to fall in close proximity on the tree (taking the same phylogenetic position) (Fig. 6D). However, due to their non-overlapping sequence, they can only be placed relative to the nearest full-length reference sequence. We found 15 instances where one ITS and one LSU barcode were in close proximity together with a reference sequence, potentially representing the same species. In an additional six instances, one or both of the barcodes formed a cluster on zero-length branches when matched to full length rRNA reference sequences, i.e. representing an exact match to an existing database entry.
Closed-reference clustering against the reference dataset to each order within Sordariomycetes by the RDP classi er revealed species-level matches for both ITS2 and LSU sequences (Fig. 7A). Notably, four species had matches to both markers, i.e. the same species were ampli ed. In addition, one ITS2 sequence produced a hit not reciprocated in LSU. Vice versa, LSU sequences produced hits to a minimum of eight additional species not seen in ITS, which was increased to 11 and 17 species under the higher 98 and 99% threshold values, respectively, as the trees became increasingly populated with the additional taxa from splitting of larger OTUs (Fig. 7B). Under these lower threshold values closely related sequences apparently were less affected by 'lumping', which obscured the true diversity in the sequencing mixture.
Where closely related reference sequences were missing, ITS2 and LSU sequences may be matched based on their phylogenetic proximity, but the ITS2 and OTU sequences obtained from a single genome may not appear as sister taxa because the gene sequences are non-overlapping and thus lack characters that could group them. We tested the degree to which ITS2 and LSU sequences interleave on the tree, by assessing phylogenetic clustering and dispersion with the NRI and NTI (Table 3). For ITS2, most values were negative, indicating overdispersion relative to the LSU sequences, except for the 99% similarity value, which produced positive NRI (clustering) possibly from selective oversplitting of OTUs that is not matched in the less variable LSU sequences. For LSU there was a progression from positive (clustering) at 97% similarity to negative (indicating overdispersion) at 99% similarity, which coincided with a near doubling in the number of OTUs (against only a small increase in the ITS2 data) ( Table 3). This indicated that OTUs newly formed by splitting were not clustered, unlike the ITS2-derived OTUs, but instead were widely distributed on the tree and interleaved with the ITS2 sequences. A 'mixed' threshold value of 98% for ITS2 and 99% for LSU presented slightly negative NRI/NTI values for both markers ( Table 3).
The detailed observations were con rmed by the global classi cation of OTUs at order level, which showed an increase in the proportion of identi ed OTUs with increasing threshold value for LSU, but not ITS2 (Fig. 8). Both markers produced broadly similar proportions of the four dominant orders, Xyliariales, Ophiostomatales, Diaporthales and Hypocreales, but differed to various degrees in the assignment of the 'small' orders. It was also evident that OTU numbers in Ophiostomatales were comparatively lower in ITS2, as also suggested from the phylogenetic tree (Fig. 5). This may be explained by the fact that the ITS2 forward primer binding site in this group differs from the consensus (Suppl. Figure S4). The RDP classi cations and tree-based assignments were largely in agreement regarding the relative proportion of order-level taxa, although generally the trees assigned a greater proportion of OTUs, reaching nearly 95%, and the result was more variable under the various similarity threshold values. This was particularly evident in ITS2, which saw hardly any change in the RDP classi cation under changing threshold values.

Discussion
Metabarcoding has revolutionised the study of fungal communities, revealing the huge proportion of hitherto unobserved species, including the unexpectedly diverse communities associated with bark beetles (Tedersoo et al. 2019). However, these inferences are based on short sequences and lack the biological underpinning of conventional assessment of communities using fungal cultures. Independent corroboration of species limits is needed, and principally can be achieved by using multiple markers that con rm the same groupings independently (e.g. (DeSalle et al. 2005). This was attempted here, in part motivated by the claims that the most widely used ITS fungal barcode marker may not be suitable for e cient ampli cation of key groups of fungi (Hulcr et al. 2020). The test of phylogenetic congruence is complicated because the amplicons come from complex mixtures of species, which does not allow to establish genetic linkage (phasing) across the two markers, despite the close proximity of the ITS2 and LSU in the genome. Instead, an indirect approach had to be used that identi es the amplicons relative to full-length reference sequences separately for ITS2 and LSU, which was not straightforward given the highly incomplete and taxonomically incompatible databases available for both markers.
We addressed the equivalency of ITS and LSU for the characterisation of fungal communities, by (1) assessing the ecological associations at the higher taxonomic level for the entire fungal set, and (2) establishing explicit correspondences of species-level taxa in metabarcoding from both markers (for the class Sordariomycetes only). As we showed with both approaches, OTU identi cation is challenging and depends on the available reference databases, as well as the speci c strategy for linking the metabarcode sequences into the taxonomic system. Taxonomic classi ers are now widely used and are becoming increasingly sophisticated. However, we note that placement is possible mostly to higher taxonomic levels, in line with comparisons of taxonomic classi cation software showing that the RDP classi er performed best in taxonomic classi cation to order (Richardson et al. 2017). Only a fairly small proportion of reads can be placed to genus or species level, in particular for the LSU marker (Fig. 1). Of those OTUs that could be placed, the exact taxonomic membership could be established only with one of the three classi ers. These di culties in identifying species greatly compromised the comparison of community composition obtained with either marker, which was impossible to link with this approach, as virtually none of the species level entities encountered were labelled with the same Linnaean binomial (Fig. 3). Thus, we did not obtain a corroboration of community composition from using both markers via taxonomic identi cation of reads against their respective reference databases.
In contrast, simple counts of OTU numbers (a proxy of species richness) produced a good correlation between both markers in several key parameters describing the community composition. First, plots of the number of OTUs obtained from each treatment (beetle species, forest type) from the ITS2 and LSU data closely mirror each other (Fig. 1). Equally, the proportion of explained OTU diversity by beetle species, forest type and beetle x forest interactions was closely similar between both markers, even if the absolute number of OTUs was generally much lower in LSU (Table 2). For both markers, communities from different forest types occupy remarkably similar portions of the multivariate space (Fig. 4). In addition, the higher-level composition of fungal communities associated with each beetle species is very similar whether generated with ITS2 or LSU (Fig. 2), which also holds for the composition of classes within the order Sordariomycetes (Fig. 8). The broad patterns of individual OTU associations in the indval analysis also show similar a nities with the beetle species and tree type (Fig. 4), even if the correspondences of species identity between ITS2 and LSU datasets could not be determined. All of these ndings point to a high level of corroboration between both markers and provide justi cation for the widely used approach of fungal community analysis based on higher level classi cation and read abundances. The latter is particularly remarkable given the frequently raised concern about PCR bias in Yet, the di culty of linking these metabarcoding sequences amongst markers leaves great uncertainty about the biological relevance of the community data, which still may represent entirely different species within the major taxonomic groups recovered by either marker, as already suggested for the Ophiostomatales (Hulcr et al., 2020). Thus, ultimately the metabarcoding approach may fall short of linking any particular fungal species to a beetle, unlike the conventional approaches of culturing speci c isolates that were used to reveal the symbioses of particular fungal and beetle species of direct relevance to the ecological processes. Phylogenetic analysis of individual sequences can improve the precision of placement, beyond the assignment to a broad taxonomic group, and using the phylogenetic position of particular ITS2 and LSU sequences relative to each other can potentially link the corresponding reads representing a given species from either marker, thus reconciling the distinct 'views' of the true community composition as revealed by each marker (Fig. 6D).
As shown by our extensive analysis of the Sordariomycetes, phylogenetic analyses provide clear insight into the taxonomic makeup of fungal communities and the most appropriate treatment of fungal metabarcodes for species delimitation. First, we found that the classi ers are broadly in agreement with the phylogenetic analysis. However, average RDP classi er con dence scores varied from order to order, with LSU assignments less con dent overall, ranging from an average of 41% in Sordariomycetes, to values generally between 80-100% in the ITS sequences (see Suppl. Table S2). Some OTUs were not identi ed beyond the class level by the RDP classi er, despite clear placement in the tree. Notably, the order Myrmecridiales was missing entirely from the classi er results, despite the presence of several OTUs placed clearly within the order and OTUs matching Myrmecridium schulzeri found in the closedreference clustering at all three thresholds (Figs. 7A, B). This outcome highlights the potential shortcomings of using classi ers that are dependent on reference databases and probabilistic assessments of short k-mers, which may be particularly problematic due to the limited sequence length of metabarcoding reads ( Second, the phylogenetic analysis attempted to determine if both markers reveal the same species-level entities. Under ideal circumstances, each species is represented by exactly one sequence each of LSU and ITS, and these two sequences from both markers nd themselves in the same position of the tree. Thus, the 1:1 represented sequences of ITS and LSU should be interleaved on the tree, which predicts a uniform phylogenetic distribution of both markers, contrary to a scenario of overdispersion or clustering of either type (Fig. 6D). However, deviations from the uniform distribution are due to two factors; the nonoverlapping sequence data that only can place ITS2 and LSU sequences relative to full-length reference sequences rather than to each other, and the problem of similarity thresholds that produce different numbers of potential species representatives by either lumping (too few species) or oversplitting (too many species) of OTUs in one or both of these loci. We here used the NRI/NTI framework to assess the uniformity of distributions of ITS and LSU sequences relative to each other on the phylogenetic tree.
Overall, the increase of the similarity threshold had a greater impact on the LSU than ITS2, almost doubling in numbers of recognised OTUs versus a small increase only (Table 3), and parity of OTUs in both markers was greatest at a 'mixed' threshold of 99% for LSU and 98% for ITS2. Slightly negative NRI/NTI values under these OTU thresholds (Table 3) approximate a pattern of overdispersed 'communities' of LSU and ITS2 sequences, as expected for the 1:1 correspondence of each marker, although the exact relationships were more complex. The 99% threshold for LSU is also supported by the greater matches in the closed-reference clustering (Fig. 7). The similarities in OTU counts and relative NRI/NTI values thus were considered the best estimate of the OTU diversity in each marker. Placement in proximity on the tree therefore would suggest that the respective ITS2 and LSU sequences in fact are derived from closely related fungal species, if not from the same genomic template. Frequently in our analysis this was corroborated by the fact that these closely related ITS2 and LSU sequences were obtained from the same specimen sample (not shown).
There are uncertainties associated with this inference, as it is well known that there is great variation in the rate of sequence evolution. Across many fungal classes, intraspeci c ITS variability varies considerably, highlighting the challenges and inevitable shortcomings resulting from the selection of a uniform OTU clustering threshold (Nilsson et al. 2008 (Fig. 6) may be the result of oversplitting of highly variable ITS2 OTUs (Nilsson et al. 2008). Yet, whether these OTUs correspond to biological species or some other hierarchical level, the approach generally is capable of linking close relatives across different markers. Reference sequences spanning both markers are required, but as shown for the Sordariomycetes, even an incomplete set can provide the scaffold for placing non-overlapping sequences, and in several instances the idealized placement of OTUs and reference sequences was found, in some cases across all clustering thresholds (Fig. 5), while in many cases some uncertainties remain where reference sequences are distant. Matching of ITS2 and LSU sequences was even possible in the Ophistomatales, despite the deviation in the ITS2 primer binding site in this group (Suppl. Figure S4), as the bias against the ampli cation presumably is overcome by permissive PCR conditions, and similar effects can be expected in other groups where such variation may exist, although we did not nd any evidence in the taxonomically broad reference sequences used here.

Conclusion
Two streams of evidence suggest the equivalency of the two most widely used metabarcoding markers, ITS2 and LSU, in assessing fungal communities associated with insects: community-level diversity metrics showed great consistency when inferred with either marker, and the taxonomic and phylogenetic distribution of taxa recovered were highly similar. Metabarcoding of these communities only began recently, revealing a huge diversity of mostly anonymous ("dark") fungal isolates that have been used to draw conclusions on biodiversity patterns, but the taxonomic assignment and estimates of species richness were lacking independent corroboration. The use of two independent metabarcode markers changes this situation dramatically, as each inference is independent, and although the markers are genetically linked, i.e. likely represent the same evolutionary history, the experimental approach is treating these samples as separate inferences of the fungal communities. This provides strong support for the metabarcoding approach of fungal diversity directly from the insect specimens, which can reveal critical information about the distribution, spread and epidemic conditions of fungal tree pathogens by sampling the insect specimens.

Declarations
Funding: John Spedan Lewis Foundation, London, UK Con icts of interest/Competing interests: APV is co-founder and scienti c advisor of NatureMetrics, a company providing commercial services in DNA based monitoring.
Sordariomycetes based on a four-gene phylogeny. Mycologia 98(6): 1076-1087   Tables   Due to technical limitations, table 1 to 3 is only available as a download in the Supplemental Files section. Figure 1 The proportion of fungi classi ed with IDtaxa, RDP and PROTAXfungi from class to species level. "ALL"

Figures
refers to the proportion of OTUs for which the three classi ers agreed in their classi cation.    Phylogenetic tree of Sordariomycetes. The circular tree shows the orders represented by the colour of branches, and the distribution of ITS (blue) and LSU (green) relative to the full-length ITS-LSU reference set (yellow) on the outer circle. Note the limited presence of ITS sequences in the Ophiostomatales (in top right quadrant). Red dots mark the three orders Xylariales, Ophiostomatales and Diaporthales shown in greater detail in Figure 6. The orange dot marks the Myrmecridiales whose presence in the sample was only detected with the tree-based method, but not with the classi ers. Order-level trees with mixed OTU clustering thresholds (99% LSU, 98% ITS2). Full tree in supplementary materials. Leotia lubrica was used as the outgroup (not pictured). Brackets indicate reference taxa linked to an ITS2 and/or LSU OTU, with colours indicating potential splitting/lumping (blue, splitting; green, lumping; orange, 1:1) by one of the markers as illustrated in Figure 6B. The interaction of non-overlapping ITS2 and LSU sequences in the combined community tree is only possible through their placement relative to the full-length reference sequences that include both markers (labelled "Reference"). The combined OTU inference shows how the different clustering artefacts may present themselves on the tree.

Figure 7
Results from the closed reference clustering of OTUs at each clustering threshold against composite LSU/ITS2 reference sequences. Figure 7A. LSU matches in green, ITS2 matches in blue, linked matches (for which both an ITS2 and LSU OTU were matched to a reference sequence of the same species) in yellow. Underlined taxa indicate new matches at each clustering threshold. Figure 7B. Phylogenetic tree of LSU OTUs under increasingly stringent clustering thresholds, with arrows marking newly added taxa as threshold values are increased.