Research Article
Corresponding author: R. Henrik Nilsson (henrik.nilsson@bioenv.gu.se)
Academic editor: Jozsef Geml
© 2018 R. Henrik Nilsson, Andy F. S. Taylor, Rachel I. Adams, Christiane Baschien, Johan Bengtsson-Palme, Patrik Cangren, Claudia Coleine, Heide-Marie Daniel, Sydney I. Glassman, Yuuri Hirooka, Laszlo Irinyi, Reda Iršėnaitė, Pedro M. Martin-Sanchez, Wieland Meyer, Seung-Yoon Oh, Jose Paulo Sampaio, Keith A. Seifert, Frantisek Sklenář, Dirk Stubbe, Sung-Oui Suh, Richard Summerbell, Sten Svantesson, Martin Unterseher, Cobus M. Visagie, Michael Weiss, Joyce HC Woudenberg, Christian Wurzbacher, Silke Van den Wyngaert, Neriman Yilmaz, Andrey Yurkov, Urmas Kõljalg, Kessy Abarenkov.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Nilsson RH, Taylor AFS, Adams RI, Baschien C, Bengtsson-Palme J, Cangren P, Coleine C, Daniel H-M, Glassman SI, Hirooka Y, Irinyi L, Iršėnaitė R, Martin-Sanchez PM, Meyer W, Oh S-Y, Sampaio JP, Seifert KA, Sklenář F, Stubbe D, Suh S-O, Summerbell R, Svantesson S, Unterseher M, Visagie CM, Weiss M, Woudenberg JHC, Wurzbacher C, den Wyngaert SV, Yilmaz N, Yurkov A, Kõljalg U, Abarenkov K (2018) Taxonomic annotation of public fungal ITS sequences from the built environment – a report from an April 10–11, 2017 workshop (Aberdeen, UK). MycoKeys 28: 65-82. https://doi.org/10.3897/mycokeys.28.20887
Recent DNA-based studies have shown that the built environment is surprisingly rich in fungi. These indoor fungi – whether transient visitors or more persistent residents – may hold clues to the rising levels of human allergies and other medical and building-related health problems observed globally. The taxonomic identity of these fungi is crucial in such pursuits. Molecular identification of the built mycobiome is no trivial undertaking, however, given the large number of unidentified, misidentified, and technically compromised fungal sequences in public sequence databases. In addition, the sequence metadata required to make informed taxonomic decisions – such as country and host/substrate of collection – are often lacking even from reference and ex-type sequences. Here we report on a taxonomic annotation workshop (April 10–11, 2017) organized at the James Hutton Institute/University of Aberdeen (UK) to facilitate reproducible studies of the built mycobiome. The 32 participants went through public fungal ITS barcode sequences related to the built mycobiome for taxonomic and nomenclatural correctness, technical quality, and metadata availability. A total of 19,508 changes – including 4,783 name changes, 14,121 metadata annotations, and the removal of 99 technically compromised sequences – were implemented in the UNITE database for molecular identification of fungi (https://unite.ut.ee/) and shared with a range of other databases and downstream resources. Among the genera that saw the largest number of changes were Penicillium, Talaromyces, Cladosporium, Acremonium, and Alternaria, all of them of significant importance in both culture-based and culture-independent surveys of the built environment.
Indoor mycobiome, built environment, molecular identification, fungi, taxonomy, systematics, sequence annotation, metadata, open data
The built environment presents dry, harsh conditions for fungal life, and traditional estimates of “indoor” fungi run in the low hundreds (
There is good reason to study the built mycobiome and the built microbiome at large (
Molecular identification of fungi is largely centred on the nuclear ribosomal internal transcribed spacer (ITS) region, which is the formal fungal DNA barcoding marker (
The UNITE database for molecular identification of fungi (https://unite.ut.ee/;
The workshop was held at the James Hutton Institute/University of Aberdeen on April 10–11, 2017, and comprised 19 in situ and 12 remote participants. Fifteen of the participants had a taxonomic background and were tasked with assessing the public fungal ITS sequences and SHs within their respective areas of expertise with respect to assigned names, nomenclature, and recent taxonomic progress. Four of the participants had a general background in built-environment mycology and were asked to annotate recent sequences from the built environment according to the MIxS-BE standard. Nine participants had a background in other fields of mycology and were asked to harvest missing sequence metadata from the literature for fungal groups relevant to the built environment. Finally, four participants had a background in bioinformatics and were asked to process the corpus of public fungal ITS sequences from a technical point of view. All participants worked under the expectation that their contributions should meet the highest quality standards, and that their work would be incorporated in UNITE, adopted by the downstream resources that make use of UNITE data (see, e.g., https://unite.ut.ee/repository.php), and shared with the INSDC and the recently established ISHAM database, a comprehensive, expertly curated ITS database of clinically important fungal pathogens (
The participants examined the public sequences from their respective fungal groups of expertise from nomenclatural and taxonomic points of view through the PlutoF workbench of UNITE (
In UNITE, all sequences that are at least 80% similar are grouped into compound clusters, which are further clustered into SHs (
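The two-level grouping described above (compound clusters at 80% similarity, then SHs within them) can be illustrated with a toy example. This is a sketch under stated assumptions, not UNITE's pipeline: it assumes single-linkage clustering, replaces a real pairwise alignment with a naive identity score on pre-aligned, equal-length strings, and treats the SH-level threshold as a free parameter because no value is given here (the default of 0.985 is an illustrative assumption). The names `identity`, `single_linkage`, and `two_level_clustering` are hypothetical and are not part of UNITE or PlutoF.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned sequences of
    equal length (a toy stand-in for a real pairwise alignment score)."""
    if len(a) != len(b):
        raise ValueError("expected pre-aligned sequences of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def single_linkage(ids, seqs, threshold):
    """Single-linkage clustering via union-find: two ids share a cluster if a
    chain of pairs with identity >= threshold connects them."""
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in combinations(ids, 2):
        if identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


def two_level_clustering(seqs, outer=0.80, inner=0.985):
    """Compound clusters at `outer` identity, then SH-like clusters at the
    (hypothetical) `inner` threshold within each compound cluster."""
    compound = single_linkage(sorted(seqs), seqs, outer)
    return [single_linkage(members, seqs, inner) for members in compound]


seqs = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAA",  # 90% identical to s1
    "s3": "ACGTACGGAA",  # 90% identical to s2, 80% identical to s1
    "s4": "TTTTTTTTTT",  # unrelated
}
print(two_level_clustering(seqs, outer=0.80, inner=0.90))
# [[['s1', 's2', 's3']], [['s4']]]
```

With `inner=0.90`, s1 and s3 end up in the same inner cluster even though they are only 80% identical, because each is 90% identical to s2; this chaining is characteristic of single linkage. Raising `inner` to 0.95 splits that compound cluster into three singletons.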
Several of the workshop participants had a background in bioinformatics and focused on quality-related aspects of public fungal ITS sequences with and without a direct relation to the built environment. Chimera control was done following
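The criteria used to mark sequences as being of low read quality are not spelled out in this section. Purely as an illustration, a screen of that kind might flag sequences that are very short or that contain a high proportion of IUPAC ambiguity codes (such as N). Both thresholds below and the function name `flag_low_quality` are hypothetical choices for this sketch, not the workshop's actual criteria, and chimera detection is a separate problem that this sketch does not address.

```python
# IUPAC nucleotide ambiguity codes (anything other than A, C, G, T).
AMBIGUITY_CODES = frozenset("RYSWKMBDHVN")


def flag_low_quality(seq: str, min_length: int = 100,
                     max_ambiguous_fraction: float = 0.01) -> bool:
    """Return True if `seq` looks technically compromised under two
    HYPOTHETICAL criteria: it is shorter than `min_length`, or more than
    `max_ambiguous_fraction` of its bases are ambiguity codes."""
    s = seq.upper()
    if len(s) < min_length:
        return True
    n_ambiguous = sum(base in AMBIGUITY_CODES for base in s)
    return n_ambiguous / len(s) > max_ambiguous_fraction


clean = "ACGT" * 50            # 200 bp, no ambiguous bases
noisy = "ACGT" * 49 + "NNNN"   # 200 bp, 2% ambiguous bases
short = "acgt" * 10            # 40 bp, too short
print([flag_low_quality(s) for s in (clean, noisy, short)])
# [False, True, True]
```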
The names of 4,783 sequences from a total of 387 distinct SHs were updated during the workshop (Supplementary material
Overview of genera. The 10 genera that saw the largest number of taxonomic changes during the workshop, plus the number of such changes.
Genus | Number of changes |
---|---|
Penicillium | 714 |
Talaromyces | 601 |
Cladosporium | 533 |
Mortierella | 372 |
Phialocephala | 327 |
Funneliformis | 196 |
Cyphellophora | 167 |
Acremonium | 136 |
Alternaria | 132 |
Leohumicola | 106 |
Total | 3284 |
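The Total row can be checked directly against the ten per-genus counts. The short script below uses values transcribed from the table; it sums them and confirms that the genera are listed in descending order of changes.

```python
# Per-genus counts transcribed from the table above (insertion order kept).
genus_changes = {
    "Penicillium": 714, "Talaromyces": 601, "Cladosporium": 533,
    "Mortierella": 372, "Phialocephala": 327, "Funneliformis": 196,
    "Cyphellophora": 167, "Acremonium": 136, "Alternaria": 132,
    "Leohumicola": 106,
}

total = sum(genus_changes.values())
counts = list(genus_changes.values())
is_descending = all(a >= b for a, b in zip(counts, counts[1:]))
print(total, is_descending)  # 3284 True
```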
Results of the taxonomic annotation part of the workshop. Name updates = number of sequences whose names were updated. RefS designations = number of reference sequences designated for individual SHs. Chimeras = number of chimeric sequences identified. Low read quality = number of sequences marked as being of substandard technical quality. The chimeras and the low read quality sequences were excluded from further use in UNITE (although kept in the system for future reference). Studies = number of distinct studies that saw at least one change to at least one sequence.
| | Name updates | RefS designations | Chimeras | Low read quality | Sum of changes | Studies |
|---|---|---|---|---|---|---|
| Sequences | 4783 | 505 | 5 | 94 | 5387 | 250 |
A total of 922 of the 924 sequences from the built environment – corresponding to 33 different studies deposited since
Results of the metadata annotation part of the workshop, specified for the built mycobiome sequence set (BMS) and the outdoor mycobiome sequence set (OMS). Country and host of collection plus host association were assembled for both of these. The number of sequences processed, plus the number of underlying published and unpublished scientific studies, are also provided. For the BMS, the nine MIxS-BE annotation standard items targeted at the workshop are specified in separate columns. The sequence numbers shown in the table refer to the number of sequences annotated for each data item.
Sequence set | Number of sequences (annotated) | Number of different studies | Country of collection | Different countries | Host of collection | Different hosts | Host association | Comment |
---|---|---|---|---|---|---|---|---|
BMS | 924 (922) | 33 | 543 | 10 | 218 | 2 | 218 | 865 |
OMS | 7657 (5264) | 218 | 4452 | 84 | 1524 | 275 | 1272 | 3181 |
Both jointly | 8581 (6186) | 250 | 4995 | 84 | 1742 | 276 | 1490 | 4046 |

Sequence set | build_occup_type | space_typ_state | substructure_type | ventilation_type | indoor_space | indoor_surf | surf_material | surface-air contaminant | filter_type |
---|---|---|---|---|---|---|---|---|---|
BMS | 597 | 732 | 19 | 95 | 4 | 76 | 130 | 195 | 0 |
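The metadata table lends itself to a consistency check. In the sketch below, values are transcribed from the table and the dictionary keys are our own shorthand. Sequence and annotation counts add up across the BMS and OMS; the OMS annotations sum to 10,429, the figure given in the text; and adding the BMS annotations, including the nine MIxS-BE items, gives the 14,121 metadata annotations reported in the abstract. The distinct-count columns do not add up. This is consistent with, though not proof of, some studies, countries, and hosts occurring in both sets, assuming the joint figure is the size of the union.

```python
# Sequence counts (total, annotated) per set, from the first table.
seq_counts = {"BMS": (924, 922), "OMS": (7657, 5264), "joint": (8581, 6186)}

# Annotations per data item, from the first table.
annotations = {
    "BMS":   {"country": 543,  "host": 218,  "host_association": 218,  "comment": 865},
    "OMS":   {"country": 4452, "host": 1524, "host_association": 1272, "comment": 3181},
    "joint": {"country": 4995, "host": 1742, "host_association": 1490, "comment": 4046},
}

# MIxS-BE items, annotated for the BMS only (second table).
mixs_be_bms = {
    "build_occup_type": 597, "space_typ_state": 732, "substructure_type": 19,
    "ventilation_type": 95, "indoor_space": 4, "indoor_surf": 76,
    "surf_material": 130, "surface-air contaminant": 195, "filter_type": 0,
}

# Counts of sequences and of annotations are additive across the two sets.
for i in range(2):
    assert seq_counts["BMS"][i] + seq_counts["OMS"][i] == seq_counts["joint"][i]
for item, joint_value in annotations["joint"].items():
    assert annotations["BMS"][item] + annotations["OMS"][item] == joint_value

oms_total = sum(annotations["OMS"].values())
bms_total = sum(annotations["BMS"].values()) + sum(mixs_be_bms.values())
print(oms_total, bms_total, oms_total + bms_total)  # 10429 3692 14121

# Distinct studies, countries, and hosts are not additive. Assuming the joint
# figure is the size of the union, the difference is the implied overlap.
distinct = {"studies": (33, 218, 250), "countries": (10, 84, 84), "hosts": (2, 275, 276)}
overlap = {name: b + o - j for name, (b, o, j) in distinct.items()}
print(overlap)  # {'studies': 1, 'countries': 10, 'hosts': 1}
```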
Analysis of the built environment sequences for country of collection. Country centroids based on the geographical centres of contiguous country land masses are marked with bubbles of different size on the global map to indicate the number of built environment sequences originating from these countries as stated explicitly in the underlying INSDC records or as restored during the present effort and in
Analysis of the MIxS-BE “building occupancy type” (type of building where the underlying sample was taken). The figure is based on
Krona chart of the taxonomic affiliation of the built environment sequences down to order level. The Krona chart lists all annotated built environment sequences except those classified as Fungi sp. (32%) and those of non-fungal origin (1%). An interactive version of the Krona chart is provided as Supplementary material
A total of 5,264 sequences from 218 distinct studies were annotated with at least one metadata item. In all, 10,429 metadata annotations were made for these sequences during the workshop, including 4,452 country-of-collection annotations (84 distinct countries) and 1,524 host-of-collection annotations (275 distinct hosts; Table
Five sequences were marked for removal from the SH system because they were chimeric. Another 94 sequences were marked for removal because of low read quality.
Jointly the workshop participants implemented a total of 19,508 changes in UNITE (Tables
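The grand total can be decomposed using figures stated elsewhere in this report. The four taxonomic-annotation categories in the results table (name updates, RefS designations, chimeras, and low-read-quality sequences) sum to the 5,387 in its "Sum of changes" column. Adding the 14,121 metadata annotations from the abstract gives 19,508. The 5 chimeric and 94 low-quality sequences also account for the 99 removals mentioned in the abstract. A minimal check:

```python
# Taxonomic-annotation results, transcribed from the results table.
taxonomic_changes = {
    "name_updates": 4783,
    "refs_designations": 505,
    "chimeras": 5,
    "low_read_quality": 94,
}
metadata_annotations = 14121  # as stated in the abstract

taxonomic_total = sum(taxonomic_changes.values())
removed = taxonomic_changes["chimeras"] + taxonomic_changes["low_read_quality"]
grand_total = taxonomic_total + metadata_annotations
print(taxonomic_total, removed, grand_total)  # 5387 99 19508
```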
Several participants expressed frustration that numerous scientific studies had released hundreds of sequences identified only as “Uncultured fungus” (or similar) even when a more informative name would have been only seconds away through, e.g., a BLAST search (
Another issue that surfaced repeatedly during the workshop was the occurrence of legacy names, some of them grossly outdated, and other obsolete data. In one case, we found a name that had been synonymized more than 20 years ago. We take this to indicate that many researchers do not feel a personal responsibility for their INSDC submissions once those have become part of the public corpus. However, this view goes against the INSDC policies (https://www.ncbi.nlm.nih.gov/genbank/submit/), which make it clear that sequence authors should approach the INSDC whenever additional explanatory information pertinent to their entries becomes available. Major changes to INSDC entries, such as changes to species names or to the sequence data itself, will also reach UNITE automatically. We hope that this workshop will serve as a general call to taxonomists and other researchers to revisit their previous INSDC submissions to see whether they can be updated or whether additional data can be provided. At an altruistic level, any such additional data are likely to move the study of fungi forward – in whatever context they are found – which should be close to the heart of every mycologist. At a more personal level, researchers who ensure that “their” group of fungi is properly annotated in the public sequence databases will soon start to see additional sequences for “their” fungi being identified and deposited by other researchers. This should translate into new opportunities for knowledge expansion and scientific collaboration, to the benefit of the initial researcher and, ultimately, everyone else.
The workshop also identified several shortcomings and avenues for improvement of the UNITE database. For example, recent taxonomic progress in fungi traditionally classified in the polyphyletic genera Candida, Cryptococcus, and Rhodotorula resulted in the recognition of a number of new genera and species names (e.g.,
In conclusion, the present workshop implemented a total of 19,508 changes in UNITE relating to fungi in the built environment. This will undoubtedly improve the taxonomic resolution in studies of the built, as well as many other, mycobiomes. Although truly uncharacterized lineages of fungi are repeatedly found in the built environment (e.g.,
The UNITE database community gratefully acknowledges support from the Alfred P. Sloan Foundation. HN and CW gratefully acknowledge financial support from Stiftelsen Olle Engkvist Byggmästare, Stiftelsen Lars Hiertas Minne, Kapten Carl Stenholms Donationsfond, and Birgit och Birger Wålhströms Minnesfond. CW gratefully acknowledges a Marie Skłodowska-Curie postdoctoral grant from the ERC. Leho Tedersoo is gratefully acknowledged for providing helpful feedback on an earlier draft of this manuscript.
The sequences renamed during the workshop. The INSDC accession number, the original INSDC name, and the new UNITE name are shown.
Data type: molecular data
The MIxS-BE annotations implemented for the built environment sequences during the workshop
Data type: molecular data
The metadata annotations for the sequences that were found in the same SHs as sequences from the built environment
Data type: molecular data
The interactive Krona chart associated with Figure
Data type: molecular data