Research Article
Corresponding author: R. Henrik Nilsson (henrik.nilsson@bioenv.gu.se). Academic editor: Thorsten Lumbsch
© 2020 Louisa Durkin, Tobias Jansson, Marisol Sanchez, Maryia Khomich, Martin Ryberg, Erik Kristiansson, R. Henrik Nilsson.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Durkin L, Jansson T, Sanchez M, Khomich M, Ryberg M, Kristiansson E, Nilsson RH (2020) When mycologists describe new species, not all relevant information is provided (clearly enough). MycoKeys 72: 109-128. https://doi.org/10.3897/mycokeys.72.56691
Taxonomic mycology struggles with what seems to be a perpetual shortage of resources. Logically, fungal taxonomists should therefore leverage every opportunity to highlight and visualize the importance of taxonomic work, the usefulness of taxonomic data far beyond taxonomy, and the integrative and collaborative nature of modern taxonomy at large. Is mycology really doing that, though? In this study, we went through ten years’ worth (2009–2018) of species descriptions of extant fungal taxa – 1,097 studies describing at most ten new species – in five major mycological journals plus one plant journal. We estimated the frequency at which a range of key words, illustrations, and concepts related to ecology, geography, taxonomy, molecular data, and data availability were provided with the descriptions. We also considered a range of science-demographical aspects such as gender bias and the rejuvenation of taxonomy and taxonomists as well as public availability of the results. Our results show that the target audience of fungal species descriptions appears to be other fungal taxonomists, because many aspects of the new species were presented only implicitly, if at all. Although many of the parameters we estimated show a gradual, and in some cases marked, change for the better over time, they still paint a somewhat bleak picture of mycological taxonomy as a male-dominated field where the wants and needs of an extended target audience are often not understood or even considered. This study hopes to leave a mark on the way fungal species are described by putting the focus on ways in which fungal taxonomy can better anticipate the end users of species descriptions – be they mycologists, other researchers, the public at large, or even algorithms. In the end, fungal taxonomy, too, is likely to benefit from such measures.
collaboration, gender equality, metadata, reproducibility, species description, taxonomy
Taxonomy is the science that discovers, identifies, classifies, and describes organisms. Like in any scientific field, the knowledge gained in taxonomy has a value in itself, but it also caters to the needs of other research areas. Almost all studies in biology, and many other sciences, are performed on a taxon (often a species), derivatives from samples of a specific taxon (e.g., a protein), or pertain to the diversity of taxa. This view of the fundamental nature of taxonomy is certain to be shared by scientists and decision makers alike, but surprisingly this is not enough to guarantee a steady long-term supply of resources to taxonomy (
Since the “taxonomy crisis” has been acting out gradually during at least the last 20 years, it is reasonable to think that few biologists are unaware of it. Taxonomists, in particular, are certain to be all too familiar with it, often reporting feeling marginalized in comparison to ecological or molecular initiatives in the context of, e.g., grant writing and scientific funding (
Several of the present authors have spent significant time going through published species descriptions for key data on, e.g., taxonomy, ecology, and geography for compilation into community-driven efforts such as UNITE (
To assess whether fungal species descriptions are attuned to both the wants and needs of a target audience beyond taxonomists and the sign of the times, we explored 10 years’ worth of fungal species descriptions of extant mycological taxa in five major mycological journals (plus one botany journal for reference) for a range of factors pertaining to inter- and intra-scientific terms and concepts, science-demographical aspects, and illustrations and visualisations (Tables
We went through each issue (2009–2018) of five major, well-reputed mycological journals known to publish new species regularly (Table
Journal name | Journal field | Continent |
---|---|---|
Mycologia | Mycology | North America |
Fungal Biology (Mycological Research) | Mycology | Europe |
Mycoscience | Mycology | Asia |
Mycological Progress | Mycology | Europe |
Studies in Mycology | Mycology | Europe |
Plant Systematics and Evolution | Botany | Europe |
The resulting 1,097 PDF files were converted to text using pdftotext version 2.1.4 (https://pypi.org/project/pdftotext/). The text files were mined using a Python script (Suppl. material
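The mining step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' script: it assumes each PDF has already been converted to a plain-text string, and the parameter groups and regular expressions shown are hypothetical stand-ins for the patterns documented in the supplementary material.

```python
import re

# Hypothetical parameter groups and patterns, for illustration only.
# The patterns actually used are documented in the supplementary script.
PARAMETER_PATTERNS = {
    "Ecology, the term": re.compile(r"\becolog(?:y|ical|ically)\b", re.IGNORECASE),
    "Altitude": re.compile(r"\b(?:altitude|elevation)\b", re.IGNORECASE),
    "Molecular data used": re.compile(r"\b(?:PCR|DNA|sequencing)\b", re.IGNORECASE),
}


def score_paper(text):
    """Return {parameter group: True/False} for the text of one paper."""
    return {
        name: bool(pattern.search(text))
        for name, pattern in PARAMETER_PATTERNS.items()
    }


def percentages(texts):
    """Percentage of papers in which each parameter group was found."""
    scores = [score_paper(text) for text in texts]
    if not scores:
        return {name: 0.0 for name in PARAMETER_PATTERNS}
    return {
        name: 100.0 * sum(score[name] for score in scores) / len(scores)
        for name in PARAMETER_PATTERNS
    }


# Two invented text snippets standing in for converted PDF files.
papers = [
    "Ecology: on dead wood. DNA was extracted and amplified by PCR.",
    "Collected at an elevation of 1,200 m in a temperate forest.",
]
result = percentages(papers)
```

A keyword match of this kind only shows that a term occurs in the text, not that the corresponding information is actually provided, which is why estimates of this sort benefit from manual verification.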
Estimates obtained by the automated and manual parsing of the PDF files, broken down to three individual years (columns 2–4) as well as overall (column 5). Column 6 indicates our interpretation of the mycological repercussions of the trend in the data. Suppl. material
Parameter group (automated search) | 2009 | 2013 | 2018 | All years | Trend interpretation |
---|---|---|---|---|---|
Altitude | 26.76 | 13.64 | 19.30 | 16.86 | Unclear |
Biodiversity | 12.68 | 22.73 | 27.19 | 23.25 | Positive |
Climate | 7.04 | 10.91 | 18.42 | 13.67 | Positive |
Climate zone | 88.73 | 83.64 | 92.98 | 89.15 | Positive |
Collection (specimen/culture repository) | 76.06 | 85.45 | 92.11 | 84.59 | Positive |
Distribution (geography) | 74.65 | 78.18 | 92.11 | 82.77 | Positive |
Ecological association | 92.96 | 75.45 | 93.86 | 88.15 | Unclear |
Ecological mode | 77.46 | 69.09 | 82.46 | 75.39 | Positive |
Ecology, the term | 29.58 | 45.45 | 56.14 | 42.02 | Positive |
Family (classification) | 64.79 | 74.55 | 85.96 | 75.39 | Positive |
GIS/GPS | 8.45 | 2.73 | 4.39 | 4.47 | Negative |
Index Fungorum | 4.23 | 6.36 | 14.91 | 11.12 | Positive |
Locality, the term | 29.58 | 42.73 | 37.72 | 35.64 | Unclear |
Molecular availability (TreeBase/Dryad) | 33.80 | 50.91 | 69.30 | 53.78 | Positive |
Molecular search (BLAST) | 9.86 | 34.55 | 46.49 | 35.46 | Positive |
Molecular database (e.g., GenBank) | 57.75 | 84.55 | 94.74 | 85.69 | Positive |
Molecular data used | 71.83 | 90.91 | 97.37 | 90.79 | Positive |
Mycobank | 53.52 | 94.55 | 92.11 | 88.06 | Positive |
Order (classification) | 57.75 | 64.55 | 71.93 | 64.81 | Positive |
Phylum (classification) | 21.13 | 34.55 | 50.88 | 31.91 | Positive |
Societal implications | 50.70 | 48.18 | 64.91 | 53.60 | Positive |
Supplementary data | 5.63 | 19.09 | 37.72 | 24.43 | Positive |
Threatened (endangered) | 0.00 | 2.73 | 3.51 | 2.37 | Positive |
Parameter group (manual search) | |||||
Colour photo/illustration | 30.99 | 70.91 | 88.60 | 73.02 | Positive |
Determination key provided | 29.58 | 27.27 | 18.42 | 24.52 | Negative |
Discussion section present | 71.83 | 75.45 | 79.82 | 77.58 | Positive |
Electron microscopy used | 23.94 | 26.36 | 22.81 | 24.89 | No change |
Fungal culture shown | 22.54 | 21.82 | 34.21 | 25.16 | Unclear |
Lead author male | 72.73 | 69.33 | 60.26 | 68.73 | Positive |
Macro-photo indicates size | 60.98 | 58.02 | 60.64 | 63.34 | No change |
Manual micromorphology illustration | 40.85 | 53.64 | 35.09 | 42.66 | Unclear |
Map used | 9.86 | 10.00 | 7.02 | 6.38 | Negative |
Paper available | 71.83 | 74.55 | 72.81 | 77.85 | No change |
Phylogenetic tree shown | 61.97 | 84.55 | 93.86 | 84.59 | Positive |
Photo showing biological context | 52.11 | 59.09 | 71.93 | 62.26 | Positive |
Photo of micromorphology | 81.69 | 68.18 | 72.81 | 74.20 | Unclear |
Spore print provided | 0.00 | 0.00 | 0.00 | 0.09 | No change |
Averages (manual search) | |||||
Academic age, last author | 29.47 | 30.66 | 28.00 | 27.99 | Unclear |
Academic age, lead author | 23.11 | 20.65 | 12.30 | 18.11 | Positive |
Co-authors | 3.66 | 4.18 | 4.97 | 4.40 | Positive |
Co-author continents | 1.38 | 1.41 | 1.55 | 1.45 | Positive |
Co-author countries | 1.52 | 1.77 | 2.06 | 1.85 | Positive |
Co-author departments | 2.38 | 2.95 | 3.33 | 2.98 | Positive |
Female co-authors | 0.89 | 1.06 | 1.71 | 1.21 | Positive |
Pages | 9.00 | 9.75 | 12.87 | 10.88 | Positive |
Data visualizations | 0.17 | 0.13 | 0.09 | 0.17 | Negative |
The number of co-authors, distinct co-author departments, countries of origin and continents of origin (using the seven-continent system) of the co-authors were counted manually to quantify the extent to which taxonomy is practised as a collaborative pursuit. We sought to establish the gender of all co-authors by brief queries in Google, Google Scholar, and ORCID (https://orcid.org/). Only articles where we could determine the gender of all co-authors were used to infer the proportion of female co-authors and male lead authors. In an attempt at quantifying recruitment of aspiring researchers into taxonomy, we made the admittedly coarse assumptions that the last author was the supervisor, mentor, or taxonomic expert, and that the first author was a student or a nascent taxonomist. Google Scholar was used to determine the academic age of an author: the year of the most recent publication minus the year of the oldest publication, in a way that dismissed obvious homonyms and ambiguous entries. Unresolved cases were left out from the comparison.
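The demographical measures lend themselves to a small worked sketch. It assumes academic age is the span between an author's oldest and most recent publication, and it uses a hypothetical "F"/"M" coding for gender. The function names and input data are invented for illustration and are not taken from the study's data files.

```python
def academic_age(publication_years):
    """Years between an author's oldest and most recent publication.

    Returns None for an unresolved author (no usable publication list),
    so that the case can be left out of later averages.
    """
    if not publication_years:
        return None
    return max(publication_years) - min(publication_years)


def mean_academic_age(authors):
    """Average academic age over the resolved authors only."""
    ages = [academic_age(years) for years in authors]
    resolved = [age for age in ages if age is not None]
    if not resolved:
        return None
    return sum(resolved) / len(resolved)


def female_proportion(genders):
    """Proportion of female co-authors on one paper.

    Returns None if the gender of any co-author is undetermined, which
    mirrors the rule that only fully resolved papers are used.
    """
    if not genders or any(g not in ("F", "M") for g in genders):
        return None
    return genders.count("F") / len(genders)


# Invented examples: three lead authors, one of them unresolved.
lead_authors = [[2001, 2015, 2019], [], [2010, 2012]]
average_age = mean_academic_age(lead_authors)
```

Returning `None` rather than zero for unresolved cases keeps them from silently pulling the averages down.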
For convenience we group our results and discussion under the headings Ecology and geography, Systematics and taxonomy, Metadata and data availability, Visualisation, and Demographical aspects. The overall automated and manual estimates are found in Table
Most biologists would probably agree that taxonomy should be pursued in light of as many data sources as possible, including molecular, morphological, and ecological information. The output of taxonomic work should similarly be rich and many-faceted. However, the fact that the word “ecology” (and its variations) was used in only 42.0% of the examined studies somehow speaks against this assertion. On a more positive note, explicit reference to host, substrate, habitat, or partner was made in 88.1% of the cases, and a reference to the nutritional mode of the new species was made in 75.4% of the cases. The word “ecology” and any of the 19 other ecology-related keywords (Suppl. material
We acknowledge that when a new species is described, there may be no or limited occurrence data beyond the type locality. Still, variations of the words “distribution” and “geography” were mentioned in a strong 82.8% of the studies. Although explicit reference to variations of “climate” was found in only 13.7% of the studies, a full 89.2% of the studies featured climate-related words such as “temperate” or “tropical”. GIS/GPS co-ordinates were provided in a much more modest 4.5% of the studies, and 6.4% of the studies provided a map. Of the studies that did not provide GIS/GPS co-ordinates, 62 (5.9%) provided a map instead. A total of 89.9% of the studies provided neither GIS/GPS co-ordinates nor a map, and 64.4% lacked any relevant variation of the word “locality”. This does little to facilitate re-collection of the species at the type locality. Altitude/elevation was mentioned explicitly in 16.9% of the studies. It strains credibility that more than 80% of all fungi described during 2009–2018 were collected at sea level, suggesting that the absence of altitude information should not be taken to mean sea level.
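The way the map and co-ordinate percentages fit together can be checked with a short calculation. The totals of 1,097 studies and 62 map-only studies come from the text. The counts of 49 studies with co-ordinates and 70 with a map are assumptions, back-calculated from the reported 4.47% and 6.38%.

```python
N_STUDIES = 1097         # from the text
N_MAP_WITHOUT_GPS = 62   # from the text
N_GPS = 49               # assumption: 4.47% of 1,097, as whole studies
N_MAP = 70               # assumption: 6.38% of 1,097, as whole studies

# Studies lacking co-ordinates form the denominator for the 5.9% figure.
n_without_gps = N_STUDIES - N_GPS
pct_map_among_without_gps = 100 * N_MAP_WITHOUT_GPS / n_without_gps

# Studies with co-ordinates, a map, or both (counted without overlap).
n_either = N_GPS + N_MAP_WITHOUT_GPS
n_both = N_MAP - N_MAP_WITHOUT_GPS

# Studies with neither co-ordinates nor a map.
n_neither = N_STUDIES - n_either
pct_neither = 100 * n_neither / N_STUDIES

print(f"map but no co-ordinates: {pct_map_among_without_gps:.1f}%")  # 5.9%
print(f"neither map nor co-ordinates: {pct_neither:.1f}%")           # 89.9%
```

Under these assumed counts, eight studies would have provided both a map and co-ordinates.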
62.3% of the studies featured at least one photo or illustration that gave at least some sort of feeling for the biological context of the new species, typically by showing the collection site, the collection spot, or the substrate of collection. We feel that there is room for improvement here, particularly if taxonomy indeed seeks to produce results of relevance and interest that extend beyond the field.
It is surprisingly common to describe a new fungal species without mentioning where in the fungal tree of life it belongs: a phylum-level name was found in 31.9% of the studies; order, 64.8%; and family, 75.4%. The intersection of these estimates was 20.8%. In a few cases, some of this information may be truly unknown for the species being described (e.g.,
Although taxonomy represents a core aspect of biodiversity, variations of the word “biodiversity” are not commonly used in papers describing new species of fungi – only 23.2% of the studies used it. This comes across as a missed opportunity to place the new species in a richer context – and to have the underlying paper indexed properly in search engines and automated classifiers of scientific papers. Highlighting the importance or relevance of the new species to society (e.g., agriculture, forestry, or biotechnology) – if motivated – would similarly lead to a wider readership and better article indexation. However, a moderate 53.6% of the studies featured such keywords. A much lower number of studies – 2.4% – made a reference to the threatened or endangered nature of the new species or its habitat, although this may be difficult to know at the time of description.
Where is the underlying specimen or culture deposited? We found it quite common (15.4% of the cases) to provide this information in a way that does not employ any variations of the words “herbarium”, “fungarium”, “museum”, or “culture collection” – an example would be “deposited in H”. The reader would then have to know – or find out – that H is a herbarium at the University of Helsinki, Finland. This poses no challenge to a seasoned taxonomist (through recourse to, e.g., Index Herbariorum at http://sweetgum.nybg.org/science/ih/ or GRSciColl at https://www.gbif.org/grscicoll), but we imagine that other readers would struggle with this, as would data mining efforts to extract information from scientific papers. Improving clarity by writing, say, “deposited in herbarium H” – and why not write out the name of the herbarium in full? – should be easy enough. 11.1% of the papers mention “Index Fungorum” (http://www.indexfungorum.org/names/names.asp) and 88.1% “MycoBank” (
Identification keys help define what exactly differentiates the new species from others, and 24.5% of the papers we examined featured an identification key. There can be many reasons why a key would be premature or impossible to construct for various fungal taxa, such that whether 24.5% is a comforting estimate or not is hard to say. Determination keys are, however, becoming rarer over time (Table
The proportion of species descriptions making use of DNA sequence data – as deduced from the use of variations of keywords such as “PCR”, “DNA”, and “sequencing” – is on the rise, from 71.8% in 2009 to 97.4% in 2018 (Fig.
A Data and metadata in the description of fungal species 2009–2018. The x axis depicts year and the y axis depicts proportion of studies (from 0 to 1) fulfilling a specific criterion. Dark green – proportion of studies mentioning the word “ecology” or its variations; brown – proportion of studies giving a complete account of the taxonomic affiliation of the new species (family, order, and phylum); purple – proportion of studies with a macroscopic colour photo/illustration of the new species; pink – proportion of studies with macroscopic photos that also indicate the size of the depicted object through a scale bar or a fiducial marker; light green – proportion of studies with an identification key; yellow – proportion of openly available papers for each year as assessed in 2020. B Demographical and publication trends showing the average number of co-authors (dark green), departments (brown), countries (purple), continents (pink), and number of data visualizations (light green) over time. The bars indicate the yearly standard error. C The average academic age of the first (green) and last (brown) co-author over time. The bars indicate the yearly standard error. D The proportion of female co-authors (green) over time plus the proportion of female lead authors (brown).
To simulate whether the general reader could access the underlying PDF publications by Google searches, we queried Google by pasting the name of the paper in quotation marks and then scrutinizing the first two pages of hits manually (February 2020). We did this from computers not connected to any university network. We accepted hits to PDF files and full-text papers in the HTML format of both the final, published paper and to any preprints in, e.g., bioRxiv (https://www.biorxiv.org/), and we accepted both legal and legally more dubious sources of PDF files. If any sort of registration was needed to access the PDF file, we scored it as “not available”. We found that 77.8% of the studies could be accessed from outside university networks. The observation that more than 20% of the taxonomic output of the mycological community cannot be readily accessed by the general public comes across as unfortunate. However, all of the journals we targeted allow the submission of preprints to online repositories. Thus, submitting a vetted preprint at least post-publication (in order not to confuse effective publication dates of names) is a way around this inaccessibility (cf.
Many cases of taxonomic mistakes, redundant species descriptions, and laboratory contaminations would have been avoided if the authors had subjected the newly generated DNA sequences to a simple BLAST search in, e.g., GenBank (
The average study was 10.9 pages long, although we did not correct for the number of described species in each paper. The studies grew more voluminous over time (Table
The average number of co-authors was 4.4, which was higher than we expected given that taxonomy is sometimes touted as a solitary discipline. The average numbers of departments, countries, and continents were 3.0, 1.9, and 1.5, respectively – again higher than we had expected. Plotting the number of co-authors and countries over time (Fig.
Although the key terms and concepts to look for in a mycological species description will be somewhat different from those of a botanical counterpart, we did find some notable differences between the description of fungi and plants. It should be kept in mind that our botanical reference corpus was limited to a single journal and 40 papers, and the extent to which our results can be extrapolated to botany at large remains unknown. Nevertheless, botany comes out on top of mycology when it comes to specifying the type locality through either a map or GIS/GPS co-ordinates: 65% of the botanical studies did this, as compared to only 10.4% of the mycological studies. On the other hand, the use of molecular data is more widespread in mycological species descriptions (90.8%) than in botanical ones (60.0%). 59.2% of the mycological studies that relied on molecular data made these available in TreeBase/Dryad, compared to 8.3% of the corresponding botanical papers. Full-colour macro-illustrations of the species being described were more common in mycology (73.0%) than in botany (55.0%). The number of co-authors on botanical species descriptions is lower (3.6) than in mycology (4.4), and so is the average number of female co-authors (0.94 vs. 1.2). Mycology comes across as a somewhat more collaborative discipline in that the average numbers of co-authors from different departments, countries, and continents are all higher in mycology, but botany struggles somewhat less with recruitment of aspiring taxonomists (Suppl. material
The semi-automated nature of our approach is not without potential shortcomings, and we are likely to have both under- and over-estimated some of our parameter values. As an example of an overestimation, a study could mention “DNA” or perhaps “PCR” without actually making use of sequence data in the description of the new species. This would have led us to the incorrect conclusion that DNA sequence data was used in the description of the species. As an example of an underestimation, a study could conceivably provide information on the ecology or nutritional mode of the new species without using any of the ~20 terms we looked for, leading us to the erroneous conclusion that nothing was said about the ecology of the new species. Since we processed nearly 1,100 mycological papers, such outlier cases will not have contributed much to our estimates. Our manual verification of 10% of the papers did not reveal significant cause for concern with respect to over- or under-estimations.
A potentially larger bias lies in our choice of journals. We purposely selected five major international mycological journals with significant impact factors, stringent editorial and review processes, and very detailed author instructions. The journals are not solely focused on taxonomy but cover a wide spectrum of mycological subdisciplines, and the papers published therein can therefore be expected to be geared towards a more general mycological audience. However, fungal species are described also in other outlets. For instance, there are 29 mycological journals with a formal Web of Science impact factor for 2019. Indeed,
The International Code of Nomenclature for algae, fungi, and plants (Turland 2018) stipulates the minimal requirements for publication of new names (species). Notions of, for example, ecology or geographical distributions, or inclusion of illustrations, are not part of those requirements (
We were happy to note that the proportion of species descriptions using sequence data is on the rise (Fig.
Adapting a species description to be meaningful also to a non-taxonomic reader may be challenging enough, but we argue that mycological taxonomy needs to go one step further. In a world where information is increasingly culled through automatic means, mycologists should no longer assume that all readers of species descriptions are human to begin with. This means that all data and metadata items should be machine-readable and available online, and should come with globally unique and persistent identifiers (including ORCIDs for humans, accession numbers/DOIs for sequence data/datasets, DOIs for cited publications, and Open Tree of Life identifiers for phylogenies). The notion of automated readers also brings about changes in the way manuscripts should be written in that it becomes particularly important to provide clear and precise information, almost to the point of tabularization. We argue that standardized terms should be used even when they cannot be parameterized; “Ecology: unknown” is incomparably more helpful to human and automated readers alike than simply leaving out the word “ecology” altogether. Along the same lines, and although brevity may suffer somewhat, “in herbarium GB (University of Gothenburg)” is immeasurably more helpful than “in GB”. No automated reader, and few non-taxonomists, will be able to contextualize the acronym “GB” in a meaningful way. To assume that the reader will be able to extrapolate the position of the species in the fungal tree of life, or the GPS co-ordinates of the collection spot, and so on, should similarly be avoided.
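What such a near-tabular, machine-readable description might look like can be sketched as follows. The field names, the record layout, and the species name are hypothetical and serve only to illustrate the principle that every standardized term is present, with an explicit “unknown” where no value can be given. They do not represent an established metadata standard.

```python
import json

# Hypothetical set of standardized fields, for illustration only.
REQUIRED_FIELDS = (
    "name",
    "phylum",
    "order",
    "family",
    "ecology",
    "herbarium",
    "latitude",
    "longitude",
    "elevation_m",
)


def build_record(**fields):
    """Build a record in which every required field is stated explicitly.

    Fields that are not supplied are set to the string "unknown" rather
    than being left out, so that human and automated readers alike can
    tell "not known" apart from "not reported".
    """
    unexpected = set(fields) - set(REQUIRED_FIELDS)
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    return {key: fields.get(key, "unknown") for key in REQUIRED_FIELDS}


record = build_record(
    name="Examplea hypothetica",  # invented species name
    phylum="Basidiomycota",
    order="Agaricales",
    herbarium="GB (University of Gothenburg)",
)
print(json.dumps(record, indent=2))
```

In this sketch the family, ecology, co-ordinates, and elevation all appear in the output as “unknown”, so a parser never has to infer them from their absence.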
Our demographical estimates suffered from various potential shortcomings and biases: online information can be hard to find (particularly when it comes to authors who have not registered on ORCID), Google Scholar profiles are not necessarily complete or correct in terms of their publication lists, the last author does not have to be a supervisor or a mentor figure, and so on. Getting around these shortcomings in a study of the present kind is next to impossible, and we feel that our demographical results should be seen merely as rough estimates of trends. But a surprisingly strong signal still came out of them: taxonomy is no longer – and perhaps never really was – an entirely solitary discipline, but instead comes across as a reasonably collaborative, international discipline where knowledge seems to be passed on to younger researchers, at least to some extent. This offers hope for the future – taxonomy may actually be on its way to shaking off some of the misconceptions surrounding it (
Our data generally, but not exclusively, indicate what we feel is a gradual improvement in the richness of species descriptions and in the demographical aspects of fungal taxonomy over time (Table
RHN acknowledges financial support from the Swedish Research Council for Environment, Agricultural Sciences, and Spatial Planning (FORMAS, 215-2011-498). Dmitry Schigel and Nils Hallenberg are acknowledged for very valuable feedback on an earlier draft of the manuscript.
Python code responsible for the PDF parsing
Data type: Source code (PDF)
Explanation note: The Python code responsible for the PDF parsing, including the full details on how searches were made for each of the entries in Table
The automatic and manual estimates for the targeted mycological papers
Data type: Raw data (statistics)
Explanation note: The automatic and manual estimates for each of the 1,097 mycological papers targeted. The corresponding plant data are available on the second sheet.