Research Article
Corresponding author: Kessy Abarenkov (kessy.abarenkov@ut.ee)
Academic editor: Andrew Miller
© 2016 Kessy Abarenkov, Rachel I. Adams, Irinyi Laszlo, Ahto Agan, Elia Ambrosio, Alexandre Antonelli, Mohammad Bahram, Johan Bengtsson-Palme, Gunilla Bok, Patrik Cangren, Victor Coimbra, Claudia Coleine, Claes Gustafsson, Jinhong He, Tobias Hofmann, Erik Kristiansson, Ellen Larsson, Tomas Larsson, Yingkui Liu, Svante Martinsson, Wieland Meyer, Marina Panova, Nuttapon Pombubpa, Camila Ritter, Martin Ryberg, Sten Svantesson, Ruud Scharn, Ola Svensson, Mats Töpel, Martin Unterseher, Cobus Visagie, Christian Wurzbacher, Andy F.S. Taylor, Urmas Kõljalg, Lynn Schriml, R. Henrik Nilsson.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Abarenkov K, Adams RI, Irinyi L, Agan A, Ambrosio E, Antonelli A, Bahram M, Bengtsson-Palme J, Bok G, Cangren P, Coimbra V, Coleine C, Gustafsson C, He J, Hofmann T, Kristiansson E, Larsson E, Larsson T, Liu Y, Martinsson S, Meyer W, Panova M, Pombubpa N, Ritter C, Ryberg M, Svantesson S, Scharn R, Svensson O, Töpel M, Unterseher M, Visagie C, Wurzbacher C, Taylor AFS, Kõljalg U, Schriml L, Nilsson RH (2016) Annotating public fungal ITS sequences from the built environment according to the MIxS-Built Environment standard – a report from a May 23-24, 2016 workshop (Gothenburg, Sweden). MycoKeys 16: 1-15. https://doi.org/10.3897/mycokeys.16.10000
Recent molecular studies have identified substantial fungal diversity in indoor environments. Fungi and fungal particles have been linked to a range of potentially unwanted effects in the built environment, including asthma, decay of building materials, and food spoilage. The study of the built mycobiome is hampered by a number of constraints, one of which is the poor state of the metadata annotation of fungal DNA sequences from the built environment in public databases. To enable precise queries of such data (for example, “retrieve all fungal sequences recovered from bathrooms”), a workshop was organized at the University of Gothenburg (May 23-24, 2016) to annotate public fungal barcode (ITS) sequences according to the MIxS-Built Environment annotation standard (http://gensc.org/mixs/). The 36 participants assembled a total of 45,488 data points from the published literature, including 8,430 instances of countries of collection from a total of 83 countries, 5,801 instances of building types, and 3,876 instances of surface-air contaminants. The results were implemented in the UNITE database for molecular identification of fungi (http://unite.ut.ee) and were shared with other online resources. Data obtained from human/animal pathogenic fungi will furthermore be verified against culture-based metadata for subsequent inclusion in the ISHAM-ITS database (http://its.mycologylab.org).
Built environment, Indoor fungi, ITS, Annotation, Mycobiome
Fungi are found throughout the biosphere, and the built environment is no exception. The taxonomic composition of indoor fungal communities tends to reflect the local outdoor communities, although the majority of fungal particles found indoors is thought to represent spores, hyphal fragments, and other dormant and passively distributed stages (
Traditional, morphology-based studies of fungal spores and cultures derived from indoor sampling have recognized ca. 90 species of common indoor fungi (
A second problem that hampers the scientific understanding of the built mycobiome is the lack of a standardized vocabulary for sequence annotation. The International Nucleotide Sequence Database Collaboration (INSDC;
The new MIxS-Built Environment annotation standard (
The workshop comprised 20 on-site participants, mainly local Ph.D. students and postdocs, but also other researchers, in systematics and ecology. Another 16 researchers participated remotely through Skype, Google Docs, and email. The participants focused on the public fungal ITS sequences of the INSDC as mirrored in the UNITE and ISHAM databases. To single out INSDC sequences associated with the built environment, we used a set of 24 keywords such as “dust”, “gypsum”, and “floor” (Suppl. material
For each BMS sequence, we tried to locate any underlying publication through the INSDC fields TITLE, JOURNAL, and PUBMED. If these were not informative, we resorted to ISI Thomson, Google/Google Scholar, and ResearchGate searches. We examined the publications for the nine items of the MIxS-Built Environment annotation standard that we felt were the most relevant and the most likely to be covered by the studies: building occupancy type, indoor space, indoor surface, surface material, surface-air contaminant, space typical state, substructure type, ventilation type, and filter type (http://gensc.org/mixs/). In addition, we targeted the country and host of collection and the nature of the fungus-host association (e.g., “plant: wood”, “plant: leaf”, and “human/animal: skin”), as applicable, for all sequences. We only recorded metadata that were clearly and unequivocally specified in the paper. A research professional (G. Bok) from a building-related technical institute was present to assist with technical, analytical, and construction-related questions in the context of the built environment. For the OMS, we similarly retrieved the underlying publications and annotated the sequences with country and host of collection plus host association (as applicable, and where these data were missing). All results were entered into an Excel sheet for upload into UNITE and ISHAM (after culture-based verification in the case of the latter), and for sharing with other online resources.
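To make the annotation scheme concrete, the following minimal Python sketch models one annotated sequence as a record holding the nine targeted MIxS-Built Environment items plus country, host, and host association. The class name, field names, accession strings, and field values are hypothetical placeholders chosen for illustration; they are not the official MIxS field identifiers and not the workshop's actual Excel layout.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class BuiltEnvAnnotation:
    """One annotated sequence; every metadata item is optional (hypothetical layout)."""
    accession: str
    country: Optional[str] = None
    host: Optional[str] = None
    host_association: Optional[str] = None
    # The nine MIxS-Built Environment items targeted at the workshop.
    building_occupancy_type: Optional[str] = None
    indoor_space: Optional[str] = None
    indoor_surface: Optional[str] = None
    surface_material: Optional[str] = None
    surface_air_contaminant: Optional[str] = None
    space_typical_state: Optional[str] = None
    substructure_type: Optional[str] = None
    ventilation_type: Optional[str] = None
    filter_type: Optional[str] = None

    def annotated_items(self) -> List[str]:
        """Names of the metadata items that actually carry a value."""
        return [f.name for f in fields(self)
                if f.name != "accession" and getattr(self, f.name) is not None]


def query(records, **criteria):
    """Return the records whose fields equal all of the given criteria."""
    return [r for r in records
            if all(getattr(r, key) == value for key, value in criteria.items())]


# Made-up records; accessions and values are placeholders, not real data.
records = [
    BuiltEnvAnnotation("EXAMPLE001", country="Sweden",
                       building_occupancy_type="residence",
                       indoor_space="bathroom"),
    BuiltEnvAnnotation("EXAMPLE002", country="Finland", indoor_space="kitchen"),
    BuiltEnvAnnotation("EXAMPLE003", country="Sweden",
                       host_association="plant: wood"),
]

# "Retrieve all fungal sequences recovered from bathrooms."
bathroom_hits = query(records, indoor_space="bathroom")
print([r.accession for r in bathroom_hits])
```

Keeping every item optional mirrors the workshop's rule of recording only what a paper states unequivocally: an absent value stays absent rather than being guessed.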
A total of 6,526 BMS and 11,574 OMS sequences from a total of 255 separate studies were annotated with at least one metadata item. A total of 45,488 annotations were made during the workshop. For example, “building occupancy type” was established for 5,801 sequences, and “ventilation type” was established for 2,235 sequences (Table
Analysis of the BMS sequences for country of collection. Country centroids marked with bubbles of different size on the global map indicate the number of BMS sequences originating from these countries (54 distinct countries, sequence count ranging from 1 to 2,914). For an additional 2.9% of the sequences, country information could not be recovered during the workshop. The figure includes pre-existing data plus the data added during the workshop, such that it indicates the scientific state of ITS-based Sanger-derived sequencing of the built mycobiome as of spring 2016. Sequences that were not annotated with a single built environment-related term in the INSDC were not included in this effort, and are not represented in the figure.
Krona chart of the taxonomic affiliation of the BMS sequences down to order level. The Krona chart lists all annotated BMS sequences except those classified as Fungi sp. (36.4%) and those of non-fungal origin (0.9%). An interactive version of the Krona chart is provided as Suppl. material
Results of the annotation workshop, specified for the built mycobiome sequence set (BMS) and the outdoor mycobiome sequence set (OMS). Countries and hosts of collections plus host association were assembled for both of these. The number of sequences processed, plus the number of underlying published and unpublished scientific studies, are also provided. For the BMS, the nine MIxS-Built Environment annotation standard items targeted at the workshop are specified in separate columns. The sequence numbers shown in the table refer to the number of sequences annotated for each data item.
Sequence set | Number of sequences (annotated) | Number of studies | Country of collection | Different countries | Host of collection | Different hosts |
---|---|---|---|---|---|---|
BMS | 6550 (6526) | 144 | 2447 | 29 | 881 | 15 |
OMS | 16766 (11574) | 128 | 5983 | 83 | 5632 | 859 |
Total | 23316 (18100) | 255 unique | 8430 | 83 unique | 6513 | 865 unique |

Sequence set | Host association | Comments | Building occupancy type | Indoor space | Indoor surface | Surface material |
---|---|---|---|---|---|---|
BMS | 764 | 2348 | 5801 | 1223 | 1207 | 1318 |
OMS | 2892 | 1293 | N/A | N/A | N/A | N/A |
Total | 3656 | 3641 | 5801 | 1223 | 1207 | 1318 |

Sequence set | Surface-air contaminant | Space typical state | Substructure type | Ventilation type | Filter type |
---|---|---|---|---|---|
BMS | 3876 | 5618 | 96 | 2235 | 1874 |
OMS | N/A | N/A | N/A | N/A | N/A |
Total | 3876 | 5618 | 96 | 2235 | 1874 |
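As a consistency check on the table, the per-item counts can be summed and compared with the reported total of 45,488 annotations. The short Python sketch below does this using the figures from the table; the dictionary names are illustrative only. The reported total is reproduced only when the “Comments” entries are counted as annotations.

```python
# Per-item annotation counts copied from the table (BMS and OMS rows).
bms_counts = {
    "country of collection": 2447,
    "host of collection": 881,
    "host association": 764,
    "comments": 2348,
    "building occupancy type": 5801,
    "indoor space": 1223,
    "indoor surface": 1207,
    "surface material": 1318,
    "surface-air contaminant": 3876,
    "space typical state": 5618,
    "substructure type": 96,
    "ventilation type": 2235,
    "filter type": 1874,
}
oms_counts = {
    "country of collection": 5983,
    "host of collection": 5632,
    "host association": 2892,
    "comments": 1293,
}

reported_total = 45488  # total number of annotations stated in the text

bms_total = sum(bms_counts.values())
oms_total = sum(oms_counts.values())
grand_total = bms_total + oms_total

print(f"BMS annotations: {bms_total}")
print(f"OMS annotations: {oms_total}")
print(f"All annotations: {grand_total} (reported: {reported_total})")
```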
The workshop compiled a total of 45,488 metadata items, making them available for scientific query through UNITE and other venues. These metadata, although typically “published” and thus “available”, were previously not open for direct query. This highlights the wealth of relevant scientific information that lies buried in the last few decades’ worth of scientific publications: formally available, yet findable only by those who know where to look, and reachable only by those with access to that literature. Fortunately, we live in a digital age where the infrastructure for recovering and sharing such information is falling into place (
For nearly all BMS sequences whose underlying publication(s) we could retrieve, we managed to annotate at least one metadata item. A total of 4,985 sequences were false positives: our keywords indicated them to belong to the BMS whereas in reality they did not. A sequence could stem from “outside city hospital” (keyword “hospital”), for instance. These sequences were annotated for country and host of sampling, plus the nature of the relation to the host, whenever the underlying scientific study could be retrieved and interpreted. It is reasonable to assume that our initiative suffered from a fair number of false negatives as well, that is, sequences that should have been a part of the BMS but were not. Although we used no fewer than 24 keywords in our efforts to capture the built environment, we presumably missed one or more important terms in the field. We similarly missed all built-environment sequences that featured no relevant annotation whatsoever, for instance where just a species name and the country of origin were available. Thus, whereas we added at least one annotation to nearly all of the BMS sequences we recovered, we do not claim to have annotated all public fungal ITS sequences from the built environment.
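The way keyword screening produces both kinds of error can be illustrated with a minimal Python sketch. It assumes that screening amounts to whole-word matching against free-text annotation; the actual matching procedure used at the workshop is not specified here. Only four of the 24 keywords are named in the text, so the list is deliberately incomplete, and the example annotation strings (apart from “outside city hospital”) are invented.

```python
import re

# Four of the 24 keywords named in the text; the full list is in the
# supplementary material and is not reproduced here.
KEYWORDS = ("dust", "gypsum", "floor", "hospital")


def matched_keywords(annotation: str):
    """Return the keywords that occur as whole words in a free-text annotation."""
    tokens = set(re.findall(r"[a-z]+", annotation.lower()))
    return [keyword for keyword in KEYWORDS if keyword in tokens]


# True positive: the annotation describes an indoor sample.
print(matched_keywords("house dust, bedroom floor"))

# False positive: the keyword matches, but the sample is from outdoors.
print(matched_keywords("outside city hospital"))

# False negative: an indoor-derived sequence annotated with only a species
# name and a country offers nothing for any keyword to match.
print(matched_keywords("Aspergillus sp., Sweden"))
```

A keyword hit is therefore only a candidate for the BMS. Deciding whether the sample truly came from the built environment still requires reading the underlying publication, which is what the workshop participants did.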
The workshop identified several potential avenues for amendments to the MIxS-BE standard. For example, “floor” was found to be a common place for sampling of, e.g., dust, yet the data point of “floor” could not easily be fitted into any extant MIxS-BE category. Similarly, “air” could not be represented in a straightforward way in the MIxS-BE standard (the term rather applies to other packages of the MIxS standard). We also felt the need for a “laboratory” flag to indicate that a sequence stemmed from sampling in a laboratory. In addition, we were surprised by the number of fungal sequences generated from environments that must be considered to qualify as “built” or at least altered by man, but that nevertheless were difficult to fit into the present MIxS-BE categories. The examples included tombs, crypts, and mummies (
The present study used a workshop-style approach to accomplish a task that would have taken a single researcher several months. Costs were kept low by recruiting many of the participants among local Ph.D. students and postdocs in systematics and ecology, and workshop participation was made attractive by providing the opportunity to contribute to this workshop report. We can recommend this model when tackling projects of a similar kind, such as data assembly and analysis in molecular ecology and systematics. As an added benefit, the more junior participants obtain experience in scientific collaboration and communication as well as in carrying out scientific projects (cf.
We gratefully acknowledge financial support from the Alfred P. Sloan Foundation and the Swedish Research Council for Environment, Agricultural Sciences, and Spatial Planning (FORMAS, 215-2011-498). VRMC thanks CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico (SWE 232695/2014-8) for providing a shared Ph.D. scholarship. A.A. is funded by the Swedish Research Council (B0569601), the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013, ERC Grant Agreement n. 331024), and a Wallenberg Academy Fellowship.
Keywords used to identify fungal sequences from the built environment in the INSDC
Data type: text
Explanation note: Keywords used to identify fungal sequences from the built environment in the INSDC.
Annotations made during the workshop
Data type: metadata
Explanation note: The annotations made during the workshop shown with original INSDC data. For the BMS, we targeted nine MIxS-BE items plus country of collection, host of collection, host association, and a general “Comment” field. For the OMS, we targeted country of collection, host of collection, host association, and a general “Comment” field.
Krona chart
Data type: html
Explanation note: Interactive Krona chart for visualizing the taxonomic distribution of annotated BMS sequences down to order level. Sequences classified as Fungi sp. (36.4%) or non-fungal (0.9%) were excluded from this dataset.