Annotating public fungal ITS sequences from the built environment according to the MIxS-Built Environment standard – a report from a May 23–24, 2016 workshop (Gothenburg, Sweden)

Recent molecular studies have identified substantial fungal diversity in indoor environments. Fungi and fungal particles have been linked to a range of potentially unwanted effects in the built environment, including asthma, decay of building materials, and food spoilage. The study of the built mycobiome is hampered by a number of constraints, one of which is the poor state of the metadata annotation of fungal DNA sequences from the built environment in public databases. In order to enable precise interrogation of such data – for example, “retrieve all fungal sequences recovered from bathrooms” – a workshop was organized at the University of Gothenburg (May 23–24, 2016) to annotate public fungal barcode (ITS) sequences according to the MIxS-Built Environment annotation standard (http://gensc.org/mixs/). The 36 participants assembled a total of 45,488 data points from the published literature, including the addition of 8,430 instances of countries of collection from a total of 83 countries, 5,801 instances of building types, and 3,876 instances of surface-air contaminants. The results were implemented in the UNITE database for molecular identification of fungi (http://unite.ut.ee) and were shared with other online resources. Data obtained from human/animal pathogenic fungi will furthermore be verified on culture based metadata for subsequent inclusion in the ISHAM-ITS database (http://its.mycologylab.org).


Introduction
Fungi are found throughout the biosphere, and the built environment is no exception. The taxonomic composition of indoor fungal communities tends to reflect the local outdoor communities, although the majority of fungal particles found indoors is thought to represent spores, hyphal fragments, and other dormant and passively distributed stages (Seo et al. 2015). Although most of the fungi recovered from indoor environments would not be able to live in the built environment for any extended period of time, a minority of these species are able to cope with, and will even thrive in, the harsh conditions that the built environment presents (Hamada and Abe 2010; Nevalainen et al. 2015; Zupančič et al. 2016). These species are mainly saprotrophic, and their degree of active growth largely depends on water availability (Adams et al. 2013). They can be a serious cause of decay and other concerns in water-damaged buildings, but they are also found in buildings not subject to moisture issues – and even in buildings where very strict sanitization and filtration regimes are applied (e.g., La Duc et al. 2012; Checinska et al. 2015). Exposure to aerosolized fungal particles has been linked to asthma onset in humans and may furthermore play a role in eczema development and other issues in human health (Reijula et al. 2003; Knutsen et al. 2012). Indoor fungi may also contribute to other unwanted processes, such as food spoilage and wall staining (Varga et al. 2014). The built mycobiome is thus of interest to a range of scientific fields, including mycology, medicine, food biology, construction, and engineering.
Traditional, morphology-based studies of fungal spores and cultures derived from indoor sampling have recognized ca. 90 species of common indoor fungi (Flannigan et al. 2002). Efforts based on high-throughput DNA sequencing, in contrast, have revealed a vast and hitherto unknown diversity of indoor fungi. In a global study of indoor dust samples, Amend et al. (2010) using next-generation sequencing found ca. 4,500 fungal operational taxonomic units (OTUs; Blaxter et al. 2005) approximately at the species level. Similarly, another next-generation sequencing-powered study – Nonnenmann et al. (2012) – recovered 450 fungal species from 50 indoor dust samples in Yakima Valley, WA (USA). Although precise species delimitation and species counts from next-generation sequencing data remain challenging (Nguyen et al. 2016), the taxonomic span of the fungal assemblages recovered in Amend et al. (2010) and Nonnenmann et al. (2012) is far larger than that occupied by the fungi traditionally thought of as common indoor fungi (cf. Flannigan et al. 2002). Thus, whereas these studies should not be used as estimates of the total number of indoor fungi, they do testify to the substantial diversity of fungi in the built environment. The lack of taxonomic reference sequences makes precise identification of many of these species problematic, and it is not unusual that a sizable proportion of the OTUs in environmental sequencing studies remain unassigned beyond the kingdom or phylum levels (e.g., Tedersoo et al. 2014; Fouquier et al. 2016; Nilsson et al. 2016). There is clearly a need to generate reliable reference sequences, most notably from type material, to address this issue (cf. Schoch et al. 2014). However, the estimated number of extant species of fungi – 1.5–6 million (Hawksworth 2001; Taylor et al. 2014) – stands in stark contrast to the number of described species (~130,000 as of March 2016; www.speciesfungorum.org), and strongly suggests that molecular identification of fungi will remain challenging for the foreseeable future. In some cases, even reference barcode (nuclear ribosomal internal transcribed spacer, ITS) sequences from type material will not be enough. Several fungal genera regularly recovered from built environment samples – such as Aspergillus, Cladosporium, Fusarium, and Penicillium – show little or no ITS variation across sets of two to several species (Bensch et al. 2012; Samson et al. 2014; Visagie et al. 2014; O'Donnell et al. 2015). Additional genetic markers are needed for robust species-level identification in these cases.
A second problem that compounds the scientific understanding of the built mycobiome has been the lack of a standardized vocabulary for sequence annotation. The International Nucleotide Sequence Database Collaboration (INSDC; Cochrane et al. 2016) holds more than 5,000 Sanger-derived fungal ITS (barcode) sequences from the built environment, but their level of metadata annotation differs widely. This unfortunately applies to most available fungal ITS sequences (cf. Nilsson et al. 2014); for example, a modest 43% are known to be annotated with something as simple and straightforward as country of collection (Tedersoo et al. 2011). In addition, where metadata exist they are not always provided in standardized and searchable formats, making precise queries difficult. There is, for instance, no straightforward way to download all fungal ITS sequences from bathrooms, or to target the substrate of gypsum board. It is reasonable to think that analysis of fungi recovered from bathrooms may prove a rewarding scientific enterprise, as indeed should be the case for fungi collected on specific building materials, under different moisture regimes, or any other particular parameter or setting. The full potential of such searches cannot presently be utilized due to the poor state of sequence annotation -primarily omitted by the original sequence authors -in the public sequence databases.
The new MIxS-Built Environment annotation standard (Glass et al. 2014; http://gensc.org/mixs/) addresses the need for a thorough, standardized vocabulary for microbiological analysis of the built microbiome. If all relevant fungal ITS sequences in the INSDC were annotated according to this standard, then this would open up the body of extant molecular data to detailed, precise scientific queries in the context of the built mycobiome. Going through and annotating large sequence sets is a daunting effort for any researcher, but fortunately such efforts are easy to split among a set of individual researchers. This paper presents the outcome of a sequence metadata annotation workshop (University of Gothenburg, May 23–24, 2016) to annotate the ~6,500 public fungal ITS sequences from the built environment according to the most relevant parts of the MIxS-Built Environment annotation standard. In recognition of the fact that fungi found indoors are typically found outdoors as well, the workshop also annotated closely related outdoor sequences according to basic geo-ecological parameters. The workshop was organized jointly with the UNITE and ISHAM databases (Irinyi et al. 2015). UNITE is a general-purpose sequence management environment seeking to reconcile molecular ecology and taxonomy of fungi and fungal communities. The ISHAM database centers on identification of human and animal pathogenic fungi to guide antifungal treatment choices. Both databases focus, at least for the time being, on the ITS region and share views on the importance of openness, free accessibility, and community participation.

Materials and methods
The workshop comprised 20 physical participants, mainly local Ph.D. students and postdocs -but also other researchers -in systematics and ecology. In addition, another 16 researchers participated remotely through Skype, Google Docs, and email. The participants focused on the public fungal ITS sequences of the INSDC as mirrored in the UNITE and ISHAM databases. To single out INSDC sequences associated with the built environment, we used a set of 24 keywords such as "dust", "gypsum", and "floor" (Suppl. material 1). Keyword matches were made to the title of the underlying publication (the INSDC field "title"), the INSDC fields "source" and "tissue type", and the UNITE field "sequence source". We refer to this set of sequences as the built mycobiome set (BMS). To single out outdoor sequences with a direct relation to the BMS, we extracted all UNITE species hypotheses with at least one BMS sequence. We then built the outdoor mycobiome set (OMS) from all sequences that did not match any of our keywords but that were found in the same species hypothesis as at least one BMS sequence. Sequences that initially were assigned to the BMS set, but that on closer inspection turned out not to qualify as the built mycobiome ("collected outside hospital", for example), were transferred to the OMS set.
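The selection logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the workshop's actual tooling: the `Record` class, its field names, and the example accessions are hypothetical, and only the three keywords quoted in the text are used (the full list of 24 is in Suppl. material 1).

```python
from dataclasses import dataclass

# Only three of the 24 keywords are quoted in the text; the rest are in Suppl. material 1.
KEYWORDS = ("dust", "gypsum", "floor")


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified view of one sequence entry."""
    accession: str
    species_hypothesis: str      # UNITE species hypothesis the sequence belongs to
    title: str = ""              # INSDC field "title"
    source: str = ""             # INSDC field "source"
    tissue_type: str = ""        # INSDC field "tissue type"
    sequence_source: str = ""    # UNITE field "sequence source"


def matches_keyword(rec, keywords=KEYWORDS):
    """True if any keyword occurs as a substring in the searched fields."""
    text = " ".join(
        (rec.title, rec.source, rec.tissue_type, rec.sequence_source)
    ).lower()
    return any(keyword in text for keyword in keywords)


def split_bms_oms(records, keywords=KEYWORDS):
    """Return (BMS, OMS): keyword hits, and non-hits sharing a species hypothesis with a hit."""
    bms = [r for r in records if matches_keyword(r, keywords)]
    bms_hypotheses = {r.species_hypothesis for r in bms}
    oms = [
        r for r in records
        if not matches_keyword(r, keywords)
        and r.species_hypothesis in bms_hypotheses
    ]
    return bms, oms


records = [
    Record("A1", "SH1", title="Fungi in house dust"),
    Record("A2", "SH1", source="forest soil"),
    Record("A3", "SH2", source="leaf litter"),
]
bms, oms = split_bms_oms(records)
```

Plain substring matching is what makes manual review necessary: a keyword such as "hospital" also matches "collected outside hospital", so such hits have to be moved to the OMS by hand.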
For each BMS sequence we tried to locate any underlying publication through the INSDC fields TITLE, JOURNAL, and PUBMED. If these were not informative, we resorted to ISI Thomson, Google/Google Scholar, and ResearchGate searches. We examined the publications for the nine items of the MIxS-Built Environment annotation standard that we felt were the most relevant and the most likely to be covered by the studies: building occupancy type, indoor space, indoor surface, surface material, surface-air contaminant, space typical state, substructure type, ventilation type, and filter type (http://gensc.org/mixs/). In addition, we targeted the country and host of collection and the nature of the fungus-host association (e.g., "plant: wood", "plant: leaf", and "human/animal: skin"), as applicable, for all sequences. We only targeted metadata and information that were clearly and unequivocally specified in the paper. A research professional (G. Bok) from a building-related technical institute was present to assist with technical, analytical, and construction-related questions in the context of the built environment. For the OMS we similarly retrieved the underlying publications and annotated the sequences to country and host of collection plus host association (as applicable, and if and when these data were missing). All results were entered into an Excel sheet for upload into UNITE and ISHAM (after culture-based verification in the case of the latter), and for sharing with other online resources.
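To illustrate why a fixed vocabulary matters, the following Python sketch stores annotations keyed by the nine MIxS-Built Environment items named above and runs the kind of query mentioned in the Introduction. The accessions, the example values ("bathroom", "gypsum board"), and the helper names are hypothetical; the controlled values of the actual standard are not reproduced here.

```python
# The nine MIxS-Built Environment items targeted in the workshop, plus the
# general items collected for all sequences.
MIXS_BE_ITEMS = (
    "building occupancy type", "indoor space", "indoor surface",
    "surface material", "surface-air contaminant", "space typical state",
    "substructure type", "ventilation type", "filter type",
)
GENERAL_ITEMS = ("country", "host", "host association")
ALLOWED_ITEMS = set(MIXS_BE_ITEMS) | set(GENERAL_ITEMS)


def validate(fields):
    """Reject item names outside the agreed vocabulary (a typo would hide data from queries)."""
    unknown = set(fields) - ALLOWED_ITEMS
    if unknown:
        raise ValueError(f"unknown metadata items: {sorted(unknown)}")
    return fields


def query(annotations, item, value):
    """Return accessions whose annotation for `item` equals `value`."""
    return [acc for acc, fields in annotations.items() if fields.get(item) == value]


# Hypothetical accessions and values, for illustration only.
annotations = {
    "SEQ0001": validate({"indoor space": "bathroom",
                         "surface material": "gypsum board",
                         "country": "Sweden"}),
    "SEQ0002": validate({"indoor space": "kitchen", "country": "Finland"}),
    "SEQ0003": validate({"indoor space": "bathroom"}),
}

bathroom_sequences = query(annotations, "indoor space", "bathroom")
```

Because every record uses the same item names, "retrieve all fungal sequences recovered from bathrooms" becomes a single exact-match lookup rather than a free-text search.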

Results
A total of 6,526 BMS and 11,574 OMS sequences from a total of 255 separate studies were annotated with at least one metadata item. A total of 45,488 annotations were made during the workshop. For example, "building occupancy type" was established for 5,801 sequences, and "ventilation type" was established for 2,235 sequences (Table 1; Figures 1–3). The results were uploaded into UNITE via its data management system PlutoF (https://plutof.ut.ee; Abarenkov et al. 2010) for open query by the scientific community and were shared with the INSDC as an Excel sheet (Suppl. material 2).

Discussion
The workshop compiled a total of 45,488 metadata items, making them available for scientific query through UNITE and other venues. These metadata, although typically "published" and thus "available", were previously not open for direct query. This highlights the wealth of relevant scientific information that lies buried in the last few decades' worth of scientific publications -formally available, yet only available to those who know where to look, and reachable only to those with access to that literature. Fortunately, we live in a digital age where the infrastructure for recovering and sharing such information is falling into place (Martin and Martin 2010). Furthermore, there is a growing awareness of the need to annotate newly generated sequences beyond the barest minimum when these are first deposited into public sequence databases (Hyde et al. 2013;Schoch et al. 2014). Such annotations unlock significant scientific potential of those molecular data, increase the citability of the underlying scientific studies, and  fulfill funding agencies' demands for openness and maximum scientific use of research funding. We certainly hope that the mycological community will be quick to embrace a more integrative approach to sequence annotation. The public sequence databases can similarly make it even easier and faster to provide such metadata upon sequence submission. We speculate that excessive time consumption is the primary reason why some sequence depositors do not annotate their sequences as well as they could have. We managed to process nearly all BMS sequences -for which we could retrieve the underlying publication(s) -for at least one metadata item. A total of 4,985 sequences were false positives -our keywords indicated them to belong to the BMS whereas in reality they did not. A sequence could stem from "outside city hospital" (keyword "hospital"), for instance. 
These sequences were annotated for country and host of sampling, plus the nature of the relation to the host, whenever the underlying scientific study could be retrieved and interpreted. It is reasonable to assume that our initiative suffered from a fair number of false negatives as well -sequences that should have been a part of the BMS, but that were not. Although we used no fewer than 24 keywords in our efforts to capture the built environment, we presumably missed one or more important terms in the field. We similarly missed out on all built-environment sequences that featured no relevant annotation whatsoever -perhaps just a species name and the country of origin were available. Thus, whereas we managed to do at least something about nearly all BMS sequences we recovered, we do not claim to have annotated all public fungal ITS sequences from the built environment.
The workshop identified several potential avenues for amendments to the MIxS-BE standard. For example, "floor" was found to be a common place for sampling of, e.g., dust, yet the data point of "floor" could not easily be fitted into any extant MIxS-BE category. Similarly, "air" could not be represented in a straightforward way in the MIxS-BE standard (but rather applied to other packages of the MIxS standard). We also felt the need for a "laboratory" flag to indicate that a sequence stemmed from sampling in a laboratory. In addition, we were surprised by the number of fungal sequences generated from environments that must be considered to qualify as "built" or at least altered by man, but that nevertheless were difficult to fit into the present MIxS-BE categories. The examples included tombs, crypts, and mummies (Šimonovičová et  Barcaccia et al. 2015). In these cases, we tried to capture the essence of the underlying sequence entries to the extent that the MIxS-BE standard allowed. We used our free-text field "Comment" to provide additional information that we felt was important with respect to future queries of these entries. These potential avenues for improvements of the MIxS-BE standard have been communicated to MIxS-BE representatives from the Genomic Standards Consortium's MIxS Compliance and Implementation working group (http://gensc.org/mixs/mixs-compliance-and-implementation/).

Conclusions
The present study used a workshop-style approach to accomplish a task that would have taken several months for a single researcher to accomplish. Costs were kept low by recruiting many of the participants among local Ph.D. students and postdocs in systematics and ecology, and workshop participation was made attractive by providing the opportunity to contribute to this workshop report. We can recommend this model when tackling projects of a similar kind, such as data assembly and analysis in molecular ecology and systematics. As an added benefit, the more junior participants obtain experience in scientific collaboration and communication as well as in carrying out scientific projects (cf. Ryberg et al. 2016). The workshop was funded by an Alfred P. Sloan foundation grant to improve the support for the built mycobiome in UNITE and elsewhere. Other events include a forthcoming (2017) taxonomic sequence annotation workshop and the generation and public release of sequences from type material. We invite feedback and participation in these events, and we welcome any other idea to take molecular identification of the built mycobiome to the next level.

High habitat-specificity in fungal communities in

Introduction
Aquatic fungi play an important role in the cycling of carbon and nutrients in ecosystems (Gleason et al. 2008; Wurzbacher et al. 2010; Jobard et al. 2010; Grossart and Rojas-Jimenez 2016). Fungi may be involved in many stages of nutrient cycling, but can also be quite specific in their ecological functions. The degradation of recalcitrant plant, algal and animal residues may be carried out by a number of poorly known groups within the phyla Chytridiomycota and Rozellomycota (Corsaro et al. 2014, syn. Cryptomycota; Jones et al. 2011), and by ecological groups of aquatic hyphomycetes and yeasts (reviewed by Wurzbacher et al. 2010; Jobard et al. 2010). Parasitism by Chytridiomycota species facilitates the trophic transfer of nutrients from otherwise inedible phytoplankton to filter-feeding zooplankton (termed the "mycoloop"; Kagami et al. 2007, 2014). Aquatic fungi also form symbiotic relationships, such as endophytic or mycorrhiza-forming fungi (Kohout et al. 2012) or Chytridiomycota symbioses with algae (Picard et al. 2013). Despite their important functional role in lakes, the biodiversity of freshwater fungi remains poorly known. Estimates of total fungal diversity currently range from 1.5–3 million species worldwide (Hawksworth 2012). Of these, roughly 100,000 species are described, with only ca. 3000 of these from aquatic habitats (Shearer et al. 2007; Tsui et al. 2016). The low diversity of aquatic compared to terrestrial (e.g., soil) fungi partly results from the fact that mycological studies in aquatic systems remain rare. Apart from a few well studied lotic ecosystems and wetlands (Wong et al. 1998; Shearer et al. 2007; Gulis et al. 2009; Krauss et al. 2011), the total diversity of aquatic fungi has not been linked to habitat heterogeneity. Most studies in freshwaters have focussed on marshlands (reviewed in Kuehn 2008) and examined the open water, leaf litter or emergent macrophytes (e.g., Typha, Phragmites).
Studies in lakes have often concentrated on seasonal patterns in the water column (e.g., van Donk and Ringelberg 1983; Holfeld 1998; Lefèvre et al. 2012; Rasconi et al. 2012) or have compared different lakes (e.g., Zhao et al. 2011; Lefèvre et al. 2012; Taib et al. 2013). Several studies have found evidence for vertical and horizontal structuring of fungal communities in the water column (Lefèvre et al. 2007; Chen et al. 2008; Lepère et al. 2010), suggesting that there is an important spatial component of diversity. A recent meta-analysis of global diversity found that aquatic fungi clustered in habitat-specific biomes, with freshwater biomes having the highest diversity at the phylum level (Panzer et al. 2015). The authors attributed this to the high substrate diversity and temporal dynamics of environmental parameters in freshwater ecosystems.
Considering the multitude of available niches and fungal lifestyles in aquatic habitats (Karling et al. 1977;Wurzbacher et al. 2010), the actual species number of aquatic fungi is likely to be much higher than what is currently recognized. Freshwater systems contain a great diversity of habitats including the boundaries that connect them to terrestrial and groundwater ecosystems (Vadeboncoeur et al. 2002;Schindler and Scheuerell 2002). Temperate, stratified lakes encompass horizontal gradients from shallow (littoral zone) to open water (pelagic zone) habitats, as well as vertical gradients from the surface associated epilimnion (often photic, light) to the deeper hypolimnion (often aphotic, dark) and the sediment. Shore regions are transition zones between terrestrial and aquatic habitats, and include biogeochemical gradients and macrostructures such as aquatic macrophytes, animals, plant debris and biofilms. These shore regions may thus be "hot spots" of aquatic, amphibious and terrestrial fungal diversity (Wurzbacher et al. 2010). In contrast, pelagic habitats have little or no macrostructure, and pelagic fungi may be limited to planktonic substrates such as dissolved organic matter (DOM), phytoplankton and zooplankton (living or dead). In particular, accompanying the change of substrate from coarse particulate organic matter (CPOM) near the edges of the lake to fine particulate organic matter (FPOM) in the open water, filamentous Dikarya are expected to be replaced by less abundant single celled yeasts and flagellated Chytridiomycota (Wurzbacher et al. 2010). We hypothesize that such a change in "fungal morphotypes" to unicellular fungi is linked to a change in the abundance and size of substrates present in the various lake habitats.
We examined the fungal diversity of a temperate lake in North-East Germany (Lake Stechlin) using a high throughput sequencing and metabarcoding approach. Our first aim was to examine the effect of habitat specificity on the fungal community by measuring the extent to which different habitat types contained similar communities, or whether there was a pronounced taxa turnover among habitats. Our second aim was to test the morphotype hypothesis, specifically whether fungal groups present were related to the availability of major types of particulate organic matter (POM). We expected that the broad diversity of substrate size and structures sampled (e.g. plankton, macrophytes) would reveal a more heterogeneous fungal community than previously detected by traditional lake sampling strategies.

Sampling site
Lake Stechlin is a deep (maximum depth: 69.5 m), oligo-mesotrophic, dimictic hard-water lake in North-East Germany (53°10'N; 13°02'E). It has a surface area of 4.25 km² and is divided into three distinct basins (Figure 1). The lake has a littoral reed belt of Phragmites australis that is interspersed with areas of underwater macrophytes (mainly Characeae). It is surrounded by mixed forest dominated by Pinus sylvestris and Fagus sylvatica. Lake Stechlin is part of the Global Lake Ecological Observatory Network (GLEON) and has been monitored since 1959 (Casper 1985). Of the many publications from Lake Stechlin, few have examined the fungi (Casper 1965; Luo et al. 2011; Wurzbacher et al. 2014). This study thus represents the first attempt to characterize Lake Stechlin's mycobiota. During the course of our field sampling (April–June 2010), the phytoplankton community was dominated by diatoms and by filamentous cyanobacteria (Dolichospermum flos-aquae). The nutrient status of the lake during the sampling period is detailed in Suppl. material 1.
Figure 1. Sampling sites in Lake Stechlin. Integrated water samples, above-sediment water, plankton (> 55 µm), and sediment were taken from pelagic locations. Surface water samples, reed plants (Phragmites australis), biofilm samples (from stone, wood and macrophytes) and benthic samples (detritus, macrozoobenthos) were taken from littoral locations.

Sampling
We sampled eight different habitat types (Table 1) at three time points encompassing spring and early summer 2010 (8–9 April; 11–12 May; and 9–10 June). Sampling was carried out relatively early in the year to avoid an over-representation of wood-degrading Basidiomycetes that are introduced as airborne spores from the surrounding forest from July to November (personal observation). Our sampling scheme was designed to cover both pelagic (defined here as areas with > 20 m depth and > 100 m from shore) and littoral (< 10 m from shore) habitats (Figure 1). Habitats were defined as follows: "Pelagic" samples consisted of a 1 litre water sample integrated from three depths: 1 m below the surface, at mid-depth, and at 2–3 m above the sediment. These were collected using a Niskin-type water sampler (Hydro-Bios, Germany); "Plankton" was obtained from an integrated sample (surface to 2–3 m above sediment) from a plankton net (55 µm mesh; Hydro-Bios, Germany); "Above Sediment" was a water sample from 0–20 cm above the sediment that was retrieved together with "Sediment", which itself comprised 1 ml of the uppermost cm of the core, using a sediment corer (6 cm diameter; Uwitec, Austria). "Littoral" samples consisted of a 1 litre water sample taken from 0.5–1 m depth in the littoral zone; "Reed" samples were taken from aerial, submerged, and rhizosphere parts of reed plants, following the physical removal of biofilm; "Biofilm" samples were taken from stones, woody debris, and reed stems (removed using a scalpel); and "Benthos" consisted of detritus and zoobenthos sampled from the littoral zone using a sediment grabber (Ekman-Birge bottom sampler, Hydro-Bios, Germany). Each of the eight habitat types was sampled at 3 locations in each of the 3 basins at each of the 3 time points (n = 27), for a total of 216 samples. Samples were pooled by combining one sample from each of the three basins, resulting in 3 representative samples of each habitat per time point. These were further pooled for sequencing analysis (see below). Water samples were filtered on a 0.22 µm Sterivex filter (Millipore, USA), plankton-net samples were filtered onto a 12 µm cellulose acetate filter (Sartorius AG, Germany), and 1 ml of sediment was transferred to a cryotube for storage. All samples and filters were stored at -80 °C until further processing. We categorized the typically predominant POM type for each habitat. Water samples were FPOM dominated and Reed and Benthos habitats were CPOM dominated, while Sediment and Biofilm were classified as a mixture of both POM types (Table 1).
Table 1. Overview of the total abundance of eukaryotic sequences and OTUs (97% similarity clustering) recovered for each lake habitat. Fungal contribution (reads, OTUs) was calculated as median percentage with standard deviation. Habitat abbreviations used in Figs 2 and 3 are indicated to the right of habitat names. POM mainly consisted of three types: fine (FPOM), coarse particulate organic matter (CPOM), or a mixture of both (MIX). Reported values were obtained from analysis of 3 samples in each habitat, except for Reed habitat which had only 1 sample. Each of these samples contained pooled DNA from three time points (April–June 2010). The shared Chao OTU richness is given as a range between a conservative and a non-conservative estimate (see Method section). n.a.: not applicable. * Chao estimate may not be reliable for small sample sizes.
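The sample counts in the design can be checked with a short enumeration. The Python sketch below reproduces the numbers given in the text (27 samples per habitat, 216 in total, 3 pooled samples per habitat per time point) and in the Table 1 caption (3 sequenced samples per habitat, 1 for Reed). How samples from the three basins were matched for pooling is not stated, so pairing them by location index is an assumption made purely for illustration, as is the choice of which Reed replicate was sequenced.

```python
from itertools import product

HABITATS = ["Pelagic", "Plankton", "Above Sediment", "Sediment",
            "Littoral", "Reed", "Biofilm", "Benthos"]
BASINS = [1, 2, 3]
LOCATIONS = [1, 2, 3]                  # sampling locations within each basin
TIME_POINTS = ["April", "May", "June"]

# Every habitat x basin x location x time point combination is one field sample.
field_samples = list(product(HABITATS, BASINS, LOCATIONS, TIME_POINTS))
per_habitat = len(field_samples) // len(HABITATS)

# Pooling step 1: combine one sample from each basin.
# Assumption: samples sharing a location index are combined.
pooled_by_basin = {(h, loc, t) for h, _basin, loc, t in field_samples}

# Pooling step 2: combine the three time points into one sequencing replicate.
sequencing_replicates = {(h, loc) for h, loc, _t in pooled_by_basin}

# Only one Reed replicate was sequenced (which one is an arbitrary choice here).
sequenced = {(h, loc) for h, loc in sequencing_replicates
             if h != "Reed" or loc == 1}

print(len(field_samples), per_habitat, len(pooled_by_basin),
      len(sequencing_replicates), len(sequenced))
```

Under these assumptions the design yields 72 basin-pooled samples (9 per habitat, matching the 9 PCR products per habitat mentioned in the library preparation) and 22 sequenced libraries.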

DNA extraction
Total DNA was extracted using the Power Soil kit (MoBio Laboratories, Carlsbad, USA) for Sediment samples; the Qiagen Plant kit (Qiagen, Hilden, Germany) for Reed, Biofilm, and Benthos samples; and the Qiagen Blood & Tissue kit for Littoral, Pelagic, Above Sediment and Plankton samples. Manufacturers' instructions were followed with the following modifications: Reed and Benthos samples were homogenized with a mill (Pulverisette 9, rpm = self-optimize speed, 20 sec, Fritsch, Germany) and all other samples were subjected to a bead-beating step prior to extraction (MMX400, 2 × 2 min, f = 30 s⁻¹, Retsch, Germany). We added 20 µl Proteinase K (Qiagen, Netherlands) to the lysis buffer for Sediment, Reed, Biofilm, and Benthos samples, and incubated these for 1 h at 56 °C. DNA concentrations were measured using a PicoGreen assay (Invitrogen, USA). Approximately 20 ng of DNA was used as template for PCR.

Library preparation for pyrosequencing
DNA metabarcoding was carried out on all samples using the D1/D2 variable region of the ribosomal LSU with the eukaryotic primers NLF184cw (TACCCGCTGAAYTTAAGCATAT; modified from Van der Auwera et al. 1994) and Euk573rev (AGACTCCTTGGTCCRTGT; modified from NLR818, Van der Auwera et al. 1994). After in silico tests using TestPrime (Klindworth et al. 2012) we found the primer pair covered 84% of all eukaryotes deposited in the SILVA database (LSU r123 version) when allowing for two mismatches, neither of which was in the last 3 bp of the 3' region. The primer pair potentially excludes single eukaryotic lineages within Amoebozoa, Excavata, and Cercozoa. Within fungi it covers 93.4% of deposited sequences in all phyla, except Microsporidia. Oomycetes were covered at 76%. Among the fungal phyla, the lowest coverage was 85% for Basidiomycota, followed by Zygomycota with 93%. Primers were modified with 5' sequencing adaptors (extended primer list in Suppl. material 2), consisting of barcodes recommended by Roche and Lennon et al. (2010) and Lib-L adapters (Roche). PCR was conducted with AccuPrime Taq Polymerase High Fidelity (Invitrogen, USA) in a 40 µl reaction with the following conditions: initial denaturation for 3 min at 98 °C followed by 32 cycles of 1 min denaturation at 94 °C and 2 min annealing/elongation at 60 °C. The quality and intensity of the amplicons were checked on an agarose gel to ensure semi-quantitative assumptions. PCR amplicons were purified using AMPure XP Beads (Beckman Coulter) and quality was verified by microfluidics electrophoresis (Bioanalyzer, Agilent). The 9 PCR products per habitat (3 replicates per sampling time) were then pooled into three final replicates for sequencing, each of which contained all 3 time points. As a result, the sequencing triplicates were representative of the habitat biota within the sampled timespan of April–June. Pooling also helps to ameliorate PCR bias and template stochasticity. We sequenced only one of the triplicates of the reed habitat. Afterwards all amplicons were pooled in equimolar amounts for emulsion PCR and subjected to pyrosequencing library preparation and sequencing following the manufacturer's recommendations (Lib-L, FLX titanium chemistry, Roche, Switzerland). The sequence data were deposited at ENA (http://www.ebi.ac.uk/ena) under the accession number PRJEB14236.
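The in silico coverage criterion (at most two mismatches, none within the last 3 bp at the 3' end) can be illustrated with a simple ungapped scan. This Python sketch is not TestPrime and does not query SILVA; the function name, its parameters, and the short test templates are hypothetical. It only shows how a degenerate primer such as NLF184cw, which contains the IUPAC code Y, is compared against a template.

```python
# IUPAC nucleotide codes: each primer symbol maps to the bases it accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

FORWARD_PRIMER = "TACCCGCTGAAYTTAAGCATAT"  # NLF184cw


def primer_hits(primer, template, max_mismatches=2, protected_3prime=3):
    """Return start positions where the primer binds under the stated criteria.

    A position counts as a hit if the number of mismatches does not exceed
    `max_mismatches` and no mismatch falls in the last `protected_3prime`
    bases of the primer. Gaps are not considered.
    """
    n = len(primer)
    hits = []
    for start in range(len(template) - n + 1):
        window = template[start:start + n]
        mismatches = [
            i for i, (p, base) in enumerate(zip(primer, window))
            if base not in IUPAC[p]
        ]
        in_protected_end = any(i >= n - protected_3prime for i in mismatches)
        if len(mismatches) <= max_mismatches and not in_protected_end:
            hits.append(start)
    return hits


# Hypothetical templates: a perfect site, and one with a mismatch near the 3' end.
perfect = "GG" + "TACCCGCTGAACTTAAGCATAT" + "CC"
bad_3prime = "GG" + "TACCCGCTGAACTTAAGCATTT" + "CC"

print(primer_hits(FORWARD_PRIMER, perfect))
print(primer_hits(FORWARD_PRIMER, bad_3prime))
```

Coverage of a reference database would then be the fraction of reference sequences with at least one hit for both primers, which is the quantity the percentages above report.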

Sequence data processing
Sequences were processed as briefly outlined in Suppl. material 3. Raw 454 sequencing data were transformed by coding any nucleotide with a Phred score < 11 as N. We removed all reads shorter than 300 nt and trimmed reads with trailing Ns. The D1 region is highly variable and has a pronounced length polymorphism, which renders an accurate alignment difficult. We therefore defined an end position to serve as an alignment anchor by screening the SILVA reference database (v123) for a conserved eukaryotic region located within our amplicon. We identified a conserved 42-nt sequence (GAGNCCGATAGNNNACAAGTANNGNGANNGAAAGWTGNAAAG) located after the D1 region as being suitable to serve as a stable 3' end for the alignment by using the probe design tool of ARB (Ludwig et al. 2004). We subsequently clipped all filtered reads (fastq format) after the last nucleotide using Shore oligo-match: a sequence context-aware clipping tool (Ossowski et al. 2008). This normalized the length of the reads to a fixed position in a global alignment (average read length: 360 ± 13, n = 596k). We allowed for mismatches by scoring each match with 3, mismatches with -1, and gaps with -4. The threshold for clipping was set to score MAX > 0.5 and the effect on the size-frequency distribution can be found in Suppl. material 3. Unclipped sequences were rejected and analysed separately (Suppl. material 3). Clipped reads were processed in Mothur following the 454 SOP (Schloss et al. 2009; accessed in August 2012). Quality filtering was achieved by using the sliding-window option (quality threshold of 25). For the alignment-based procedure, we constructed a reference dataset with long, high-quality reads processed with pyrotagger (http://pyrotagger.jgi-psf.org) using a cutoff at > 500 nt. These were aligned to the eukaryotic backbone provided by the SILVA database LSURef (version 111; www.arb-silva.de) using the SINA aligner. This reference alignment was used to align our reads in Mothur.
After clustering at 97% sequence similarity (average neighbour algorithm), the OTU abundance matrix was imported into R (www.r-project.org, version 3.3.1) for further analysis (see below). As a comparison to a fixed 97% sequence similarity cutoff, we employed a coalescent-based clustering analysis as implemented by the GMYC model (Powell et al. 2011; Fujisawa and Barraclough 2013) with a UPGMA tree for the early diverging lineages (456 OTUs). OTUs were classified by the RDP classifier using the RDP fungal LSU training dataset with a confidence level of 80% (version 11; Liu et al. 2011; Figure 2; see Suppl. material 4 for classifications).

Habitat richness and statistics
OTU count was positively correlated with read count (Pearson's r = 0.90); we therefore avoided richness estimates based on single samples. For richness estimates of the habitats we applied a shared corrected Chao index (Chiu et al. 2014). More specifically, we provided a range for the Chao estimates based on a lower, conservative OTU filtering and an upper overestimate based on unfiltered OTUs. In the former case, all OTUs that occurred in only one sample of the dataset (independent of the absolute OTU frequency) were removed, while in the latter case all OTUs, including singletons, were kept. Supporting rarefaction curves displaying the sampling effort for Eukarya and Fungi based on a singleton-filtered OTU matrix are provided in Suppl. material 5.
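The conservative filtering used for the lower bound can be sketched as follows. This is a minimal illustration that assumes the OTU table is held as a dictionary of per-sample read counts; the function name and data layout are hypothetical, and the Chao estimation itself is not shown.

```python
def drop_single_sample_otus(otu_table):
    """Keep only OTUs detected (count > 0) in at least two samples.

    otu_table maps an OTU id to a list of read counts, one per sample.
    An OTU found in a single sample is removed regardless of how many
    reads it has there.
    """
    return {
        otu: counts
        for otu, counts in otu_table.items()
        if sum(1 for c in counts if c > 0) >= 2
    }


if __name__ == "__main__":
    table = {"otu1": [5, 0, 0], "otu2": [1, 1, 0], "otu3": [0, 0, 1]}
    # otu1 has five reads but occurs in only one sample, so it is dropped.
    print(drop_single_sample_otus(table))
```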
POM and habitat types were compared by employing parametric statistics (ANOVA and Tukey's honest significant difference test) on logit-transformed proportional read (read counts) or OTU (OTU counts) data. Plankton samples were excluded because their low fungal proportion skewed the distribution (including Plankton samples still leads to a significant Kruskal-Wallis test, p < 0.001, but renders post hoc tests difficult to apply). Normality and homogeneity of variances were confirmed by Shapiro-Wilk tests and Levene's tests, respectively.
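For readers unfamiliar with the transformation, the logit of a proportion p is ln(p / (1 - p)). The sketch below applies it to a fungal read proportion. The clamping constant eps, which keeps proportions of exactly 0 or 1 finite, is an assumption of this example and is not taken from the study.

```python
import math


def logit(p, eps=1e-6):
    """Logit transform ln(p / (1 - p)) of a proportion.

    Values are clamped to [eps, 1 - eps] so that 0 and 1 stay finite;
    the clamping is an assumption of this sketch.
    """
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def logit_fungal_proportion(fungal_reads, total_reads):
    """Logit of the fraction of reads assigned to fungi in one sample."""
    return logit(fungal_reads / total_reads)


if __name__ == "__main__":
    # A sample with 250 fungal reads out of 1000 in total.
    print(logit_fungal_proportion(250, 1000))
```

The transform is symmetric around p = 0.5 and stretches proportions near 0 and 1, which is why it is commonly applied before parametric tests on proportional data.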

Multivariate analyses
For all subsequent β-diversity analyses, an OTU table without singletons was used to account for noise in the data (e.g., Reeder and Knight 2009). Differences among habitats within the fungal sub-community were examined with a non-metric multidimensional scaling (NMDS) ordination plot based on the Cao distance (Cao et al. 1997), which accounts for variable sampling intensity. Ellipses correspond to the standard deviation around the habitat group centroids. Stress values below 0.1 can be considered a very good fit. We additionally tested whether habitats (excluding Reed) and POM types were significantly separated, using a PERMANOVA (1000 permutations) on the distance matrices. The robustness of the results was evaluated by comparing them with a presence/absence-transformed OTU matrix using Jaccard distances, as well as with a classified taxonomy abundance table generated with SILVA NGS (see below) using Cao distance. Both additional analyses resulted in similar outcomes (Suppl. material 6).
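The presence/absence robustness check rests on the Jaccard distance, which for two samples is one minus the number of shared OTUs divided by the number of OTUs found in either sample. A minimal sketch follows, assuming each sample is given as a list of OTU counts in the same OTU order; the function name is hypothetical and the Cao distance used for the main analysis is not reproduced here.

```python
def jaccard_distance(counts_a, counts_b):
    """Jaccard distance between two samples after presence/absence transformation.

    counts_a and counts_b are read counts for the same OTUs in the same order.
    Two samples without any OTU are treated as identical (distance 0.0).
    """
    present_a = {i for i, c in enumerate(counts_a) if c > 0}
    present_b = {i for i, c in enumerate(counts_b) if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


if __name__ == "__main__":
    # One shared OTU out of three observed in total.
    print(jaccard_distance([3, 0, 1, 0], [1, 2, 0, 0]))
```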

Alternative sequence data processing
As an alternative to the OTU-based RDP classification of our sequences, we performed two additional analyses with the aim of gaining resolution in the taxonomic classification of our sequences (see also Suppl. material 3). First, clipped sequences were demultiplexed and quality trimmed in Mothur as described above and then submitted to SILVA NGS (www.arb-silva.de/ngs/) (Quast et al. 2013) for classification at the minimum similarity level of 85% against the LSU reference database (version 123). This resulted in 57 fungal taxonomic paths (unique taxonomic names, hierarchical, see Quast et al. 2013; Suppl. material 4). Second, we performed an analysis in which we pooled all clipped sequences of one habitat and then subjected these to a BLAST search (BLAST+) against the nt database (GenBank, accessed January 2015) for eukaryotes. Sequences were then classified using the LCA classifier implemented in MEGAN5 (Huson et al. 2011) using the following parameters: Min. Score = 100, Max. Expected = 0.01, Top % = 5.0, Min. Support % = 0.01, Min. Support = 2, LCA = 75 %, Min. Complexity = 0. Habitats were compared based on square root normalization (Suppl. material 7).
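The idea behind a lowest common ancestor (LCA) assignment can be shown with a deliberately simplified sketch: hits below a minimum bit score are discarded, hits within a given percentage of the best score are retained, and the read is assigned to the deepest taxon shared by all retained hits. Only the Min. Score and Top % settings are mirrored here; the remaining parameters listed above (Max. Expected, Min. Support, the 75 % LCA setting, Min. Complexity) are not modelled, and the function names and input layout are hypothetical rather than part of any existing tool's interface.

```python
def lowest_common_ancestor(paths):
    """Return the longest shared prefix of several taxonomic paths.

    Each path is a list of taxon names ordered from root to tip.
    """
    lca = []
    if not paths:
        return lca
    for ranks in zip(*paths):
        if all(r == ranks[0] for r in ranks):
            lca.append(ranks[0])
        else:
            break
    return lca


def classify_read(hits, min_score=100.0, top_percent=5.0):
    """Assign a read to the LCA of its best hits.

    hits is a list of (bit_score, taxonomic_path) tuples. Hits scoring
    below min_score are ignored; of the rest, only those within
    top_percent of the best score contribute to the LCA.
    """
    kept = [(s, p) for s, p in hits if s >= min_score]
    if not kept:
        return []
    cutoff = max(s for s, _ in kept) * (1 - top_percent / 100)
    return lowest_common_ancestor([p for s, p in kept if s >= cutoff])


if __name__ == "__main__":
    hits = [
        (200, ["Eukaryota", "Fungi", "Chytridiomycota", "Rhizophydiales"]),
        (195, ["Eukaryota", "Fungi", "Chytridiomycota", "Spizellomycetales"]),
        (120, ["Eukaryota", "Metazoa"]),
    ]
    # The third hit falls outside the top 5% and does not affect the result.
    print(classify_read(hits))
```

A conservative assignment of this kind explains why many reads end up at high ranks such as "fungi" when their best hits disagree at lower ranks.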

Phylogenetic inference
We recovered several Neocallimastigales (rumen fungi) sequences, as classified by SILVA NGS and also by the RDP classifier with a low probability (< 55%). We took 26 representative sequences and constructed a phylogenetic tree with an extension of the reference dataset from James et al. (2006) in order to confirm or reject this potential classification. A matrix was aligned using the SINA aligner, followed by manual inspection. For tree reconstruction, we used MrBayes (v3.2.6; Ronquist et al. 2012) with 10 million generations and an "invgamma" model.

Results
A total of 54 sampling stations, representing eight habitat types, were sampled at three time points in spring of 2010 and analysed using pyrosequencing of the large ribosomal subunit (LSU) as a universal eukaryotic marker. Across all habitats, the total number of eukaryotic OTUs was 3695, as estimated using alignment-based clustering at 97%, 47% of which were singletons. The lower limit of shared estimated OTUs (shared corrected Chao index) varied considerably among habitats, with the highest values found in Sediment, Benthos and Biofilm habitats, and the lowest in Plankton and water samples (Table 1). Of the total OTUs, 1027 (27%) were classified as fungi by RDP (48% of which were singletons). The GMYC method of OTU delimitation for the non-Dikarya taxa (mainly aquatic lineages that comprised 52% of the fungal OTUs in our data) resulted in 65% of OTUs with more than one occurrence, compared with 68% of units defined by GMYC clustering. The ability of each type of taxonomic unit to predict habitat (97%: adj. r² = 0.59; GMYC: adj. r² = 0.61) was very similar, and we thus decided to hereafter use the more conservative OTUs based on the 97% criterion. The shared lower estimated fungal OTUs followed similar trends as for all eukaryotes, with Benthos and Biofilm ranking highest (231 and 225 estimated OTUs, respectively), markedly lower ranks for the water samples (42-85 estimated OTUs), and only 8 estimated OTUs for Plankton samples (Table 1). Both the fungal proportions (fungal reads) and the proportional fungal diversity (fungal OTUs) were significantly different for each POM type (reads: ANOVA, F = 104.4, p = 6.6 × 10⁻¹⁰, Tukey post hoc test p < 0.001 (Figure 3); OTUs: ANOVA, F = 132.9, p = 1.08 × 10⁻¹⁰, Tukey post hoc test p < 0.01).
Fungal community composition was significantly structured into different habitats according to the NMDS clustering of OTUs (Figure 4, stress = 0.08; PERMANOVA, r² = 0.71, p < 0.001) and POM types (PERMANOVA, r² = 0.36, p < 0.001). The three water samples (Pelagic, Littoral, and Above Sediment) appeared to be very similar, whereas all other habitats were distinct (Figure 4). Comparable results were found by NMDS clustering of the presence/absence OTU matrix or by clustering the fungal taxonomic paths generated by SILVA NGS, indicating that the habitat clustering was robust and took place at even higher taxonomic levels (phylum to order level; Suppl. material 6).
[Figure caption fragment, apparently Figure 4, beginning not recovered: ... (Cao et al. 1997), which are insensitive to differences in sampling effort. Ellipses are based on standard deviations around habitat centroids (based on a confidence level of 0.95) and are coloured according to their POM type: FPOM (blue), MIX (magenta), CPOM (brown). Habitat codes and POM categories are taken from Table 1.]
Only 23% of the fungal OTUs could be classified to the family or genus level, and around 20% of the sequences could not be assigned to the kingdom level at a 0.8 confidence threshold (Figure 2). The RDP classifier seems to provide limited classification success when certain early diverging lineages are targeted (e.g., see "Pel" sample in Suppl. material 4). We therefore decided to evaluate alternatives. Using BLAST against the nucleotide database of NCBI, most fungal sequences could only be classified as "fungi" or "environmental samples" (mean: 70.2% of sequences; range: 36.5-89.7% of sequences in a given habitat; see Suppl. material 7). By processing the sequences with SILVA NGS, we obtained a classification at the order level. Hence, we use RDP to discuss the fine-scale resolution and the SILVA classification for overall comparisons at the order or phylum level (Figures 3 and 5). The orders Spizellomycetales and Rhizophydiales (both Chytridiomycetes) comprised the majority of fungal sequences in the four pelagic habitats and the Littoral water sample, with a greater proportion of Spizellomycetales in the three types of water samples compared to more Rhizophydiales in the Plankton and Sediment habitats (Figure 5). In contrast, the Biofilm habitat harboured a good representation of all major fungal phyla (Figure 3) with Chytridiales, Rhizophydiales (both Chytridiomycetes), and Agaricomycetes (Basidiomycota) forming the most prominent orders (Figure 4). Capnodiales and Helotiales (both Ascomycota) were the most prominent orders in the Benthic habitat whereas the Reed habitat was dominated by Pleosporales (Ascomycota) (Figure 4). Only a small proportion of fungal sequences (0-6%) could be assigned to what we assume are forest taxa (Agaricales, Auriculariales, Boletales, Cantharellales, Gloeophyllales, Hymenochaetales, Polyporales, Russulales; Suppl. material 1).
This proportion was significantly different among habitats (Kruskal-Wallis test, df = 5, p = 0.023), being highest in the Sediment samples (mean = 4.9%, SD = 0.9). From all habitats, we recovered sequences from oomycetes (i.e. Albugo, Aphanomyces, Phytophthora, Pythium, Saprolegnia), a group that was formerly grouped with aquatic fungi and that occupies similar ecological niches (Sparrow 1960). They have a Chytridiomycota-like life cycle and act as parasites (e.g., the agent of the European crayfish plague) and saprophytes in aquatic systems. These sequences were 1-3 orders of magnitude lower in abundance compared to the fungal sequences, with maxima in Benthos and Sediment samples (Suppl. material 1).

Discussion
In the following discussion, we first address methodological considerations and then discuss fungal diversity separately for the major habitats.

Methodological considerations
The occurrence of early diverging fungal lineages as well as members of the Dikarya renders a comprehensive assessment of the aquatic mycobiota challenging. This is due to the difficulty in finding a universally suitable marker (i.e., "DNA barcode") with both sufficient coverage of evolutionarily distant groups and meaningful resolution within any of the individual groups. We employed the D1 region of the LSU as a marker because of its high variability while still being conserved enough to amplify across the fungal kingdom (Porter and Golding 2012). Both D1 and D2 regions were formerly used as molecular markers for fungi, especially yeasts (Kurtzman and Robnett 1997), and perform almost as well as the commonly used ITS region in discriminating fungal groups (Schoch et al. 2012). The LSU is an established phylogenetic marker for Chytridiomycota and, unlike the ITS region, it can be used to delimit distant aquatic fungal lineages (Lefèvre et al. 2012; Wurzbacher et al. 2014). The small ribosomal subunit (SSU) is also well established for early diverging lineages (e.g., Jobard et al. 2012; Ishii et al. 2015); however, it is less suitable for fungal groups within Dikarya (Tedersoo et al. 2015) and would fail to generate meaningful OTUs for a broad spectrum of simultaneously occurring fungal phyla, such as in our biofilm habitat. Currently, the major disadvantage of using LSU regions as taxonomic markers for aquatic fungi is the lack of reference sequences that allow assignment. Although the RDP classifier identified at least two aquatic hyphomycete genera in our dataset (Spirosphaera and Tetracladium), most previous work on this ecological group was done with ITS (Duarte et al. 2014). The ITS region may be a better solution for those lake habitats that were dominated by Dikarya; however, as of 2014 only 26% of described aquatic hyphomycete species had an ITS database record (Duarte et al. 2014).
The RDP database also had problems with classifying Chytridiomycota (and Rozellomycota) from the water samples, and similar problems may arise with freshwater Chytridiomycota for ITS data (currently there are 1121 Chytridiomycota sequences in UNITE version 7.0, excluding Batrachochytrium sequences), pointing to larger gaps in the reference datasets for aquatic species. The UNITE species hypothesis concept introduced by Kõljalg et al. (2013) might be a good interim solution for dealing with undocumented species; however, the LSU offered alternatives here, namely (i) it was possible to employ species delimitation methods for clustering (GMYC) and classification (see below) and (ii) it allowed the multivariate analyses to use evolutionary (or trait-based) diversity indices such as UniFrac, which is frequently used in microbial ecology (Lozupone et al. 2011).

Water and large plankton
All three water habitats (Pelagic, Littoral, and Above Sediment) had a low proportion of fungal sequences. Each was characterised by a predominance of Chytridiomycota. This was also the case for the habitats directly connected to processes in the open water (Plankton and Sediment). The proportion of fungal sequences in all samples classified as FPOM was significantly lower than in the other POM types (Figure 3). Previous results from Lake Stechlin also showed a low proportion of fungi in water samples (Luo et al. 2011); the low proportion in our data may be related to the fact that we did not enrich for fungi by prefiltration or by primer selection. Lefèvre et al. (2012) provided a summary of fungal and chytrid percentages ranging from 1-50% in water samples, relating these observations to prefiltration and the primer pair used. Like Monchy et al. (2011), we observed similar communities in all of the water samples (Littoral, Pelagic, and Above Sediment). The greater number of OTUs per read abundance may originate from a rare fungal parasite community, whereby parasitic chytrids can recruit for temporally variable infection opportunities such as may occur over time scales of a few weeks (e.g., Ibelings et al. 2004; Alster and Zohary 2007).
There was a limited number of fungal taxa associated with waterborne zoo- and phytoplankton samples (Plankton; > 55 µm), which presumably represent attached or infective stages of fungi. Of these fungi, 84-93% belonged to Rhizophydiales, a group of well-described phytoplankton parasites. By contrast, Rhizophydiales accounted for only 14-17% in the pelagic (open) water samples. This is important because most microscopic studies on chytrids refer to infected algae of approximately 50 µm or larger (e.g., Hohlfeld 1998; Ibelings et al. 2004). However, in an unbiased water sample (i.e. not fractionated by filtration or enriched by a plankton net), they were replaced as the dominant group by the order Spizellomycetales, which are common saprotrophs in soil and may underline the importance of saprotrophic chytrids in aquatic environments (Wurzbacher et al. 2014).
This may establish the mycoloop (Kagami et al. 2014) as a trophic link during times with low prevalence of algal infections, based on the mineralization of detritus (cf. Gleason et al. 2008). Finally, there was a low proportion of Rozellomycota in the large plankton. Rozellomycota have been discussed as, inter alia, attached algal parasites (Jones et al. 2011); however, their under-representation in Plankton samples indicates that they were not relevant parasites of the larger plankton in Lake Stechlin, where Chytridiomycota occupied this niche. Owing to their small size, Rozellomycota may instead be specialized on smaller hosts, which do not provide enough resources for Chytridiomycota to complete their life cycle.

Sediment
The profundal sediment temperature in Lake Stechlin remains ca. 4 °C year-round while the upper sediment surface (~ 5 mm) is usually oxic. The sediment has a high water content (> 95%) and high organic matter content at the sampled sites because it receives sinking matter from pelagic organisms. Thus the sediments serve as a fungal spore bank. We therefore expected to observe elevated proportions of forest fungi that had probably blown in or been washed in as spores. The dominant fungal group was the Rhizophydiales (Chytridiomycota), as also found in the large plankton samples. Similar to their hosts, parasitic chytrids develop thick-walled resting spores (cysts), which can be found in sediment, while other parasitic species can actively infect algal resting stages in sediments (Canter 1948, 1968). The few studies that have investigated lake or pond sediments reported Chytridiomycota and Rozellomycota (at that time referred to as LKM11 & LKM15) to be the dominant fungal phyla (Luo et al. 2005; Slapeta et al. 2005). Rozellomycota species appear to occur in the hypolimnion of lakes (Lepère et al. 2010) and also in anoxic habitats (Jones et al. 2011), but their ecological function remains unclear. Similarly enigmatic was the appearance of Zygomycota (Mortierella) at the sediment surface in our study. Some of them can grow at low temperatures under oxic conditions, e.g. under snow packs in sub-alpine regions (Schmidt et al. 2008). Very surprising was the appearance of Neocallimastigomycota, which are by definition obligate, mutualistic, anaerobic rumen fungi. They are exceptional in that they break down a broad variety of plant polymers under anaerobic conditions (Solomon et al. 2016). These fungi must have had an environmental ancestor and it is possible that anoxic sediments may represent such an ancestral habitat. However, the sequences were only approx. 90% similar to Orpinomyces and the RDP classifier assigned a low probability to this classification (< 0.55). Lefèvre et al. (2012) also described sequences from lake plankton samples that may support such a new environmental lineage of "rumen fungi". Our test of whether those sequences clustered within the Neocallimastigomycota in a phylogenetic tree (Suppl. material 8) found no support for this. The sequences resembled unknown fungal lineages or belonged to Rozellomycota or Zygomycota lineages with a moderate probability.

Biofilm (Periphyton)
Biofilm samples appeared to represent an intermediate fungal habitat between sediments and benthic samples by including a high diversity of early diverging lineages as well as elevated proportions of Dikarya (16-36%, Figure 3). Fungi formed a significant proportion of the overall eukaryotic biofilm community recovered (28% of the OTUs recorded in the habitat were identified as fungi), which was dominated by biofilm-forming algae (the ratio of fungi to periphyton/epilithic algae was roughly 1:5, see Suppl. material 4, SILVA NGS). Biofilms represent a complex environment (exhibiting the highest eukaryotic taxon richness of all eight habitats) and this is also reflected by a broad range of fungal groups and taxa. Along with Rhizophydiales and Spizellomycetales, we found other chytrids of the orders Chytridiales and Cladochytridiales. The spatial proximity of host cells in periphyton could be ideal for chytrid species, facilitating a high encounter rate with potential hosts and substrates. In contrast to the water samples, the RDP classifier was able to classify more Chytridiomycota to genus level: Nowakowskiella, Chytridium and Betamyces. The RDP classifier also identified the earlier mentioned aquatic hyphomycetes. However, a large part of the sequences was only classified to phylum level (Suppl. material 4). These autotrophic lake biofilms seem to be a rich source of fungal biodiversity and are promising target habitats for future studies. Biofilms (in our case mainly littoral periphyton and epilithic biofilms) have rarely been examined for fungi, and only a few studies on stream ecosystems have investigated the fungal occurrence (measured as ergosterol) on substrates other than leaves (Tank and Dodds 2003; Artigas et al. 2004; Aguilera et al. 2007; Frossard et al. 2012). In lakes and streams, periphyton can contribute substantially to the primary production of the whole ecosystem (Lalonde et al. 1991; Vadeboncoeur et al. 2007 and references therein; Vis et al. 2007) and can be the primary food source for macrozoobenthic grazers (Cattaneo and Mousseau 1995). Our findings suggest that periphyton is not only a rich source of widely divergent fungal lineages, but that fungi might play an important ecological role in it, turning over a significant amount of algal carbon and thus total carbon in the lake.

Benthic and reed samples (CPOM)
In contrast to water samples, fungal sequences were dominant in CPOM (Benthos, Reed) samples and their relative proportions were significantly elevated in this POM type. Samples consisted mainly of submerged plant residues in addition to algae and benthic animals. Mitosporic ascomycete lineages were predominant, followed by a small percentage of chytrids (mainly Cladochytridiales) and very few Basidiomycota. This appears congruent with our initial "morphotype hypothesis". Mitosporic ascomycetes are effective plant decomposers in freshwater systems (Gessner et al. 2007), where they are ecologically grouped together as aquatic hyphomycetes (e.g. Spirosphaera, a potential aero-aquatic hyphomycete, increased to 12% in Benthos samples). The Benthos habitat had a high proportion of fungal OTUs and fungal reads and is probably home to those fungi responsible for the breakdown of submerged plant remains. Interestingly, the importance of aquatic hyphomycetes for plant litter breakdown has thus far only been demonstrated in lotic environments, with lakes not yet investigated in detail (see Chauvet et al. 2016). In contrast to the benthic samples, the Reed sample exhibited a dominance of the fungal order Pleosporales (93%). Early molecular work has already established the high diversity of reed endophytes (Neubert et al. 2005; Angelini et al. 2012) and we could confirm their presence. The Reed sample can be seen as an outgroup in our study, as it comprised the emergent parts of plants. Sequences of the order Pleosporales were largely restricted to the reed and, to a lesser extent, benthic samples.

Conclusions
Fungi play an important role in the cycling of carbon and nutrients in a wide range of freshwater habitats (Bärlocher and Boddy 2016). While much of our understanding of their diversity and ecological roles stems from research in the terrestrial realm, there is increasing interest in their taxonomic and functional diversity in freshwater systems (Grossart and Rojas-Jimenez 2016). We examined fungal community composition in eight different habitats of a single lake, in contrast to most studies, which have compared water samples among seasons or lakes (e.g., Monchy et al. 2011). We found pronounced differences in diversity and community composition among the sampled habitat types, and conclude that the habitat heterogeneity within a single lake offers a wide range of fungal niches. The results extend previous research on fungal diversity and distribution in freshwaters and clearly indicate that lake biofilms can be hotspots for aquatic fungi. The fungal communities of the water samples were rather homogeneous in their composition, with a clear dominance of Chytridiomycota. This may be due to the predominance of FPOM in the sampled habitats. Our study highlights the importance of habitat heterogeneity and, we hope, will stimulate further research on under-sampled lake habitats, such as sediments, biofilms, and submerged macrophytes. A more holistic approach to evaluating fungal diversity, with a more comprehensive inclusion of habitat types and taxonomic markers, should provide deeper insights into the multiple ecological roles of fungi in diverse freshwater environments.