TRACE: Tennessee Research and Creative Exchange

Cloning of ribosomal ITS PCR products creates frequent, non-random chimeric sequences – a test involving heterozygotes between Gymnopus dichrous taxa I and II.

Gymnopus dichrous exists in the southern Appalachians (USA) as two distinct entities with essentially identical nuclear ribosomal ITS1 sequences but differing ITS2 and LSU sequences (for convenience, called G. dichrous I and II). F1 ITS heterozygotes between the two are routinely collected from nature. Cloning of ITS PCR products from F1 heterozygotes produced sequences of both parental haplotypes but also numerous chimeric sequences (21.9%). The location of template switching was non-random, leading to recovery of the same chimera several times, and the chimeric region varied from 45 bp to 300 bp. By comparison, single-basidiospore isolates from heterozygote F1 fruitbodies showed no recombinant haplotypes within the ITS + LSU span, and clones derived from P1 homozygotes were identical to the P1 parent. Thus, chimeric sequences are likely an artifact of the PCR-cloning process rather than a consequence of natural recombination, nor are they due to hidden existing variation within the ribosomal repeat. Chimeras and PCR-induced mutations are common in cloned PCR products and may result in incorrect sequence information in public databases.


Introduction
Sequence chimeras are common when pooled DNAs are co-amplified by PCR (Edgar et al. 2011; Judo et al. 1998; Jumpponen 2007; Meyerhans et al. 1990; Odelberg et al. 1995; Qiu et al. 2001a; Smyth et al. 2010; Tedersoo et al. 2014; Wang and Wang 1997). It has been suggested that chimeras result from incomplete extension of PCR products, which then act as primers in the next amplification cycle; consistent with this, chimeric PCR products appear to be reduced when extension times are increased (Meyerhans et al. 1990; Qiu et al. 2001a; Smyth et al. 2010; Thompson et al. 2002) or when the PCR protocol is otherwise optimized (Qiu et al. 2001b; Wang and Wang 1997). Odelberg et al. (1995), however, demonstrated that chimeras can be generated in a single round of PCR amplification in the absence of heat denaturation and re-annealing, which suggests that polymerase template switching may also occur. They showed that template switching was reduced several-fold (but not eliminated) by fixing templates to streptavidin magnetic beads.
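The incomplete-extension hypothesis can be illustrated with a deliberately simplified string model. In the hypothetical sketch below, sequences are treated as pre-aligned, equal-length single strands (no indels, no strand complementarity), and a truncated product copied from one haplotype is simply completed using a second haplotype as template. The haplotypes are toy sequences, not G. dichrous data, and the function name is introduced here for illustration.

```python
def extend_on_template(partial: str, template: str) -> str:
    """Toy model of one extension step on a second, homologous template.

    Assumes pre-aligned, equal-length sequences without indels: the product
    keeps the bases already copied into `partial` and takes the remaining
    bases from `template`.
    """
    if len(partial) > len(template):
        raise ValueError("partial product is longer than the template")
    return partial + template[len(partial):]


# Hypothetical haplotypes that differ at 0-based positions 2 and 8.
hap_i = "ACGTACGTAC"
hap_ii = "ACTTACGTTC"

# An extension that stopped after 5 bases on haplotype I ...
truncated = hap_i[:5]
# ... primes on haplotype II in the next cycle, giving an I/II chimera.
chimera = extend_on_template(truncated, hap_ii)
print(chimera)  # ACGTACGTTC
```

The product carries the haplotype I base at position 2 and the haplotype II base at position 8, so it matches neither parent exactly.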
The overall frequency of chimeras in published studies is unknown. In a study of fungal ITS amplicons derived from soil samples using a PCR/cloning process (O'Brian et al. 2005), 5% were chimeric, as determined by an ascomycete/basidiomycete discontinuity between the ITS1 and ITS2 regions. Smaller chimeras between related taxa would not have been detected in that study [see the later detection of a chimera in this data set by Ryberg et al. (2008)]. For nSSU sequences, the proportion of chimeric sequences can be substantial (Ashelford et al. 2005; Ashelford et al. 2006; Fonseca et al. 2012; Wang and Wang 1997), leading to false diversity estimates and false novel taxa. Fonseca et al. (2012) demonstrated that nuclear SSU chimeras were produced at high levels during PCR when mixed templates were present. Interestingly, their results with a nematode population showed that chimera formation is higher in species-diverse PCR pools than in genetically less diverse pools, but that breakpoints fell in regions of sequence similarity.
Most programs for checking and removing chimeric sequences were designed for 16S ribosomal sequences. DECIPHER (Wright et al. 2012) (http://decipher.cee.wisc.edu/index.html), designed for bacterial 16S sequences, searches for sets of short fragments that are uncommon in the phylogenetic group where the sequence is classified but frequent in other phylogenetic groups; this approach depends on a robust pre-existing data set. UCHIME (Edgar et al. 2011), a chimera-checking program, performs best when the two parental sequences are known and a high-quality, chimera-free reference database is available. Bellerophon uses a partial treeing analysis to detect 16S chimeras (Huber et al. 2004). Other chimera-checking programs are available [see Fonseca et al. (2012) for discussion]. Nilsson et al. (2012) note the high incidence of ITS chimeras in public ITS databases and suggest identifying them through BLAST mismatches between the ITS1 and ITS2 regions and by exploring long branches in phylogenetic trees. They comment that the most frequent site of exchange between two similar templates in a PCR reaction is the first part of the highly conserved 5.8S segment. An open-source chimera checker for the ITS region has been developed and is available at http://www.emerencia.org/chimerachecker.html (Nilsson et al. 2010); when it was used to evaluate 12,300 sequences, 1.5% were identified as chimeric. To facilitate ITS-based molecular identification of fungi, the UNITE database was established to provide reliably documented ITS sequences (Kõljalg et al. 2013; Nilsson et al. 2015), and a reference set of ITS sequences, each representing a species hypothesis, is available in several formats, including one for UCHIME (Nilsson et al. 2015), at https://unite.ut.ee/repository.php#uchime.
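Reference-based checks such as those above ask whether a query is better explained by a single reference or by a combination of two. The sketch below is a deliberately simplified illustration of that idea, not the scoring used by UCHIME, DECIPHER, Bellerophon or the ITS chimera checker. It assumes the query and both candidate parents are already aligned to equal length and scores every single-breakpoint combination by the number of matching columns. The function name, return layout and toy sequences are hypothetical.

```python
from itertools import accumulate


def chimera_scan(query: str, parent_a: str, parent_b: str):
    """Compare single-parent and single-breakpoint two-parent explanations.

    Returns (single_score, chimeric_score, breakpoint, orientation), where
    scores count matching alignment columns. Breakpoint k means the left
    parent explains columns [0, k) and the right parent explains [k, n).
    Breakpoint and orientation are None if no two-parent model beats the
    best single parent; among tied breakpoints the earliest is reported.
    """
    n = len(query)
    if not (len(parent_a) == len(parent_b) == n):
        raise ValueError("query and parents must be aligned to equal length")
    # Prefix sums: pre_x[i] = matches to parent x within columns [0, i).
    pre_a = [0, *accumulate(int(q == a) for q, a in zip(query, parent_a))]
    pre_b = [0, *accumulate(int(q == b) for q, b in zip(query, parent_b))]
    single_score = max(pre_a[n], pre_b[n])
    best_score, best_k, best_orient = single_score, None, None
    for k in range(1, n):
        a_then_b = pre_a[k] + (pre_b[n] - pre_b[k])
        b_then_a = pre_b[k] + (pre_a[n] - pre_a[k])
        if a_then_b > best_score:
            best_score, best_k, best_orient = a_then_b, k, "A|B"
        if b_then_a > best_score:
            best_score, best_k, best_orient = b_then_a, k, "B|A"
    return single_score, best_score, best_k, best_orient


hap_a = "ACGTACGTAC"  # hypothetical parent A
hap_b = "ACTTACGTTC"  # hypothetical parent B (differs at columns 2 and 8)
query = "ACGTACGTTC"  # A-like at column 2, B-like at column 8
print(chimera_scan(query, hap_a, hap_b))  # (9, 10, 3, 'A|B')
```

Here the two-parent model explains one more column than either parent alone. The reported breakpoint (column 3) is only the earliest of several equally good positions: any breakpoint from 3 to 8 gives the same score, because the parents are identical in between. A practical tool would also need a threshold for how much better the two-parent model must be before a sequence is flagged.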
With the establishment of the ribosomal ITS as the fungal barcode, and the identification of highly heterozygous fungal ITS sequences that require cloning to resolve, it becomes increasingly important to understand the consequences of cloning PCR products from mixed templates, including cloned PCR products generated during environmental sampling. Divergent Gymnopus dichrous ITS2 haplotypes provide an appropriate experimental system with which to explore this issue.
Gymnopus dichrous is a small, saprobic mushroom commonly found on oak bark and other woody debris in mid-summer in the southern Appalachian Mountains (USA). ITS sequencing identified two ITS subgroups of this mushroom, called for convenience G. dichrous I and II. Gymnopus dichrous I and II differ in the ITS2 region (10% divergence), but there are no consistent or significant base-pair differences between their ITS1 or 5.8S regions (average divergence 0.29% in ITS1 and 0% in 5.8S). Homozygous collections of G. dichrous I and G. dichrous II were made in the southern Appalachian Mountains and were designated as parental genotypes (P1). Fruitbodies that were ITS hybrids between G. dichrous I and G. dichrous II have also been collected and were designated F1 hybrids (first filial generation, as used in standard genetic crosses). For F1 hybrids, several indels in the ITS2 region obscured electropherograms and prevented recovery of the parental ITS sequences by direct Sanger sequencing. Cloning of the ITS1-5.8S-ITS2 PCR product was therefore required to recover the individual contributing haplotypes; however, a substantial portion of the recovered haplotypes were chimeric sequences.
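The divergence values quoted above are percentages of differing sites between aligned sequences. The text does not state how gaps or ambiguity codes were handled, so the sketch below simply assumes an uncorrected p-distance in which alignment columns containing a gap in either sequence are skipped and any other character difference (including an ambiguity code) counts as a mismatch. The function name and toy sequences are hypothetical.

```python
def percent_divergence(seq1: str, seq2: str, gap: str = "-") -> float:
    """Uncorrected p-distance, in percent, between two aligned sequences.

    Assumption: columns with a gap in either sequence are skipped
    (pairwise deletion); every other mismatch counts as one difference.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if gap in (a, b):
            continue  # skip gapped columns
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * differences / compared


print(percent_divergence("ACGTACGTAC", "ACTTACGTTC"))  # 20.0 (2 of 10 sites)
print(percent_divergence("AC-TA", "ACGTT"))  # 25.0 (1 of 4 ungapped sites)
```

Other conventions (counting indels as differences, or applying a substitution-model correction) would give somewhat different values for the same alignment.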
Cloned ITS sequences were compared to P1 (parental/homozygote) ITS sequences of Gymnopus dichrous I and II to identify chimeric and non-chimeric sequences. Below, we examine chimeras derived from cloned G. dichrous heterozygotes and show that they can be small, frequent and non-random. We also provide evidence that putative chimeras were not due to natural meiotic recombination or to variation in the ribosomal repeat.

Methods and materials
Collections. Gymnopus dichrous and G. subnudus are often collected as a single entity and are morphologically difficult to separate; both are variable in morphology. Putative Gymnopus dichrous collections were made in the field using known morphological and environmental characteristics [e.g., small (ca. 5 cm in height) brown mushrooms, often with a darker, compressed stem base, growing on wood, usually but not exclusively on oak bark]. Of 116 collections of putative G. dichrous, 16 were G. dichrous I-II hybrids (Table 1). Collections are archived in TENN-Fungi.
Single basidiospore isolation. Single-basidiospore isolates (SBIs) were obtained from fresh spore drops as described in Gordon and Petersen (1992). Monokaryotic status of SBI cultures was confirmed microscopically by the absence of clamp connections.
PCR and cloning procedures. Cloning was carried out using Promega's pGEM-T Easy kit and JM109 competent cells according to the manufacturer's directions (Promega). Sanger sequencing of cloned ITS products was performed as described in Hughes et al. (2009).
PCR of the nuclear ribosomal ITS region was performed using primers ITS1F (Gardes and Bruns 1993) and ITS4 (White et al. 1990) for all collections in this study. PCR parameters for ITS amplification were 3 min at 94 °C, followed by 34 cycles of 94 °C for 1 min, 55 °C for 1 min and 72 °C for 1 min, and a final extension at 72 °C for 3 min. Each 50 µl PCR reaction contained 24.25 µl sterile ddH2O, 10 µl of 5X PCR buffer (Promega Corporation, 2800 Woods Hollow Road, Madison, WI 53711 USA), 2.5 µl 5% DMSO (Sigma-Aldrich Company, St. Louis, MO, USA), 6 µl of 25 mM Mg (final concentration 3 mM; Promega), 4 µl of 100 mM dNTP mix (Promega), 1 µl of each primer (10 µM) and 0.25 µl Taq polymerase (Promega). The 5' end of the nuclear ribosomal RNA large subunit (LSU) gene was sequenced for single-spore isolates only, using primers LR0R and LR5 (Vilgalys and Hester 1990). Parameters for LSU amplification were the same as for ITS except that the extension time at 72 °C was 1.5 min.
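The final concentrations implied by this recipe follow from C1V1 = C2V2. The sketch below recomputes a few of them from the listed volumes and stock concentrations; the component labels are shorthand introduced here. Note that the listed components sum to 49 µl of the 50 µl reaction, and the recipe as given does not state what makes up the remaining volume.

```python
def final_concentration(stock: float, volume_ul: float, total_ul: float) -> float:
    """Solve C1*V1 = C2*V2 for the final concentration C2 (units of stock)."""
    return stock * volume_ul / total_ul


TOTAL_UL = 50.0
# Component volumes (µl) as listed in the ITS PCR recipe.
mix_ul = {
    "ddH2O": 24.25,
    "5X PCR buffer": 10.0,
    "DMSO": 2.5,
    "25 mM Mg": 6.0,
    "dNTP mix": 4.0,
    "ITS1F (10 µM)": 1.0,
    "ITS4 (10 µM)": 1.0,
    "Taq polymerase": 0.25,
}

listed_ul = sum(mix_ul.values())
mg_mM = final_concentration(25.0, mix_ul["25 mM Mg"], TOTAL_UL)
buffer_x = final_concentration(5.0, mix_ul["5X PCR buffer"], TOTAL_UL)
primer_uM = final_concentration(10.0, mix_ul["ITS1F (10 µM)"], TOTAL_UL)

print(f"listed volume: {listed_ul} of {TOTAL_UL} µl")
print(f"Mg: {mg_mM} mM, buffer: {buffer_x}X, each primer: {primer_uM:.1f} µM")
```

The Mg calculation reproduces the 3 mM final concentration stated in the recipe, and the buffer works out to 1X.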
Identification of template switching regions. There are 25 sites of sequence mismatch between G. dichrous I and G. dichrous II (Fig. 1, red text), which were used to identify apparent template switching. The first difference, at ITS2 position 15, and the 1 bp indel at position 24 were used to establish whether the 5' end of ITS2 represented G. dichrous I or G. dichrous II. Base pair 31 (A or G) is variable within the G. dichrous I population; when it is an adenine, it can be used as an additional marker. Clones exhibiting template switching were identified by an observed change from G. dichrous I to G. dichrous II sequence (or vice versa) between bases 25 and 332. Four discontinuities were observed in the data set between bp 15 and bp 25. In two of these cases, bp 25 and the A/G site at bp 31 were used to identify the 5' end of the clone as G. dichrous I or II; in the other two cases, bp 25 and bp 71 were used. These discontinuities may be due to template switching or to PCR-generated mutation. We note that rare PCR-induced base-pair mutations could affect determination of the correct template switching point.
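The logic for localizing a switch can be written down compactly: once each informative site in a clone has been scored as matching G. dichrous I or II, a switch lies somewhere between two consecutive scored sites that disagree. In the hypothetical sketch below, the function name, positions and haplotype calls are illustrative only (loosely modeled on the 122-138 interval discussed in the Results), and sites that could not be scored are assumed to have been dropped beforehand.

```python
def switch_intervals(calls: list[tuple[int, str]]) -> list[tuple[int, int, str, str]]:
    """Find template-switch intervals from ordered haplotype calls.

    calls: (ITS2 position, haplotype) pairs sorted by position, where the
    haplotype is "I" or "II" at each informative site.
    Returns one (last_site_before, first_site_after, from_hap, to_hap) tuple
    per switch; the switch itself lies somewhere between the two sites.
    """
    intervals = []
    for (pos0, hap0), (pos1, hap1) in zip(calls, calls[1:]):
        if pos1 <= pos0:
            raise ValueError("calls must be sorted by strictly increasing position")
        if hap1 != hap0:
            intervals.append((pos0, pos1, hap0, hap1))
    return intervals


# Hypothetical clone: G. dichrous I at the 5' sites, II from site 138 onward.
clone = [(25, "I"), (31, "I"), (71, "I"), (122, "I"), (138, "II"), (205, "II"), (332, "II")]
print(switch_intervals(clone))  # [(122, 138, 'I', 'II')]
```

A clone with more than one change of haplotype simply yields more than one interval, and a clone with no change yields an empty list.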
DNA folding. Potential DNA folding of the ITS2 region of G. dichrous I and II exemplars was estimated at 72 °C (the extension phase of PCR) using the MFOLD web server (http://mfold.rna.albany.edu/?q=mfold/DNA-Folding-Form; Zuker 2003) with Mg2+ set to 3 mM and Na+ set to 0 mM.
Fig. 1. Bases in red are points where G. dichrous I and II haplotypes differ in sequence and were used to determine whether template switching had occurred in a cloned PCR product. Eight base pairs at which template switching was detected are indicated by the numbers 1-8. The possible area in which template switching (ts) could have occurred is indicated by vertical arrows, and the number of observed template switching events is given above each vertical arrow. Bases that may be involved in intra-strand base pairing as determined by MFOLD are outlined with black boxes. Ambiguity codes indicate intraspecific variation.

Chi square analysis. The ITS2 region was divided into 25 segments of varying length between bases 25 and 332. Each segment was flanked by a sequence difference between G. dichrous I and II that was informative for diagnosing template switching. For each segment, the number of template switching events was recorded (ranging from zero to six); these constituted the observed values. Expected values were based on a null hypothesis of random template switching (each base has an equal probability of template switching).
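Under this null hypothesis, each segment's expected count is the total number of observed switching events multiplied by the segment's share of the total length. A minimal sketch of the resulting goodness-of-fit statistic follows. The segment boundaries and counts in the example are hypothetical, and counting segment length inclusively (end - start + 1) is an assumption about how lengths were measured.

```python
def chi_square_by_length(
    segments: list[tuple[int, int]], observed: list[int]
) -> tuple[float, int]:
    """Chi-square goodness-of-fit statistic for switching events per segment.

    Null hypothesis: every base is equally likely to host a switch, so the
    expected count for a segment is proportional to its length.
    segments: (start, end) base positions, inclusive.
    Returns (chi-square statistic, degrees of freedom).
    """
    if len(segments) != len(observed):
        raise ValueError("need one observed count per segment")
    lengths = [end - start + 1 for start, end in segments]
    total_length = sum(lengths)
    total_events = sum(observed)
    if total_events == 0:
        raise ValueError("no events observed")
    statistic = 0.0
    for length, obs in zip(lengths, observed):
        expected = total_events * length / total_length
        statistic += (obs - expected) ** 2 / expected
    return statistic, len(segments) - 1


# Hypothetical example: four segments of 10, 20, 30 and 40 bases, 10 events.
segments = [(1, 10), (11, 30), (31, 60), (61, 100)]
observed = [6, 2, 1, 1]
stat, df = chi_square_by_length(segments, observed)
print(f"chi-square = {stat:.2f}, df = {df}")
```

If SciPy is available, `scipy.stats.chi2.sf(stat, df)` gives the corresponding upper-tail P value. With few events spread over many segments, expected counts are small and the chi-square approximation becomes less reliable (a common rule of thumb asks for expected counts of about 5 or more); a Monte Carlo or exact multinomial test is one alternative in that situation.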

Results and discussion
The proportion of ITS chimeras obtained from PCR amplification of the ITS region of G. dichrous I-II heterozygotes is given in Table 1. Of 128 clones from F1 heterozygotes, 28 (21.9%) were chimeric in the ITS2 region. Non-chimeric clones were unequally distributed between G. dichrous I and II: 34.4% of the clones from F1 heterozygotes (44 clones) were G. dichrous I sequences and 43.75% (56 clones) were G. dichrous II sequences.
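As a quick arithmetic check, the clone counts for the two non-chimeric classes can be back-calculated from the reported percentages and the total of 128 clones (44 and 56 clones); together with the 28 chimeras they account for every clone.

```python
# Clone counts from F1 heterozygotes; the I and II counts are back-calculated
# from the reported percentages (34.4% and 43.75% of 128 clones).
clone_counts = {"chimeric": 28, "G. dichrous I": 44, "G. dichrous II": 56}

total = sum(clone_counts.values())
percent = {name: 100 * n / total for name, n in clone_counts.items()}

for name, pct in percent.items():
    print(f"{name}: {clone_counts[name]}/{total} = {pct:.3f}%")
```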
The distribution of template switch points resulting in chimeras was not random along the ITS2 region (χ² = 35.72, P < 0.05). Of 25 possible diagnostic base-pair sites, template switching was observed at only 8 (Fig. 1), and the most frequent template switching (six events) occurred within a very short span, bases 122-138, creating 6 identical chimeric sequences. Two regions with 5 template switching events each were also observed (Fig. 1). The exact point at which template switching occurred cannot be determined where G. dichrous I and II have identical sequence; it lies between the last diagnostic base-pair difference 5' of the switch and the diagnostic base pair at which the switch was detected (Fig. 1, horizontal arrows). Nilsson et al. (2012) noted that identical chimeras may occur in GenBank and thus be interpreted as valid taxa.

The non-random nature of template switching suggests that some consistent mechanism influences where it occurs. We investigated whether secondary structure formation during PCR might lead to non-random chimera formation, perhaps by briefly stalling Taq polymerase extension at points of secondary folding and allowing template switching. Ribosomal ITS2 RNA is known to have secondary structure at normal cellular temperatures and conditions (Joseph et al. 1999; Krüger and Gargas 2004; Krüger and Gargas 2008), and we wondered whether DNA secondary structure in the ITS2 region might likewise affect chimera formation. Using MFOLD (Zuker 2003), we examined the secondary structure of ITS2 DNAs at 72 °C (extension phase). At 72 °C, MFOLD predicted stable secondary structure in both G. dichrous I and G. dichrous II templates. For G. dichrous I, 4 possible folding configurations were reported at increasing levels of free energy (ΔG); the configuration with the lowest free energy is mapped to the sequence in Fig. 1. For G. dichrous II, a single folding configuration was reported, which is also mapped in Fig. 1. We note that factors other than the ITS2 sequence itself influence RNA folding.
The most frequent template switching occurred in the region between bases 122 and 138. This region overlaps and follows a small area of folding (a 4 bp stem and 4 bp loop) in G. dichrous I templates. The region between bases 140 and 200 is involved in complex folding patterns that differ among models but are present in all of them; five template switching events occur in this region. Between base 200 and the end of the template at base 332, there is no consistently predicted secondary structure and there are fewer template switching events. Thus, there is loose agreement between secondary structure and regions of template switching, but we cannot conclude cause and effect.
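One way to make the comparison between predicted folding and switch locations explicit is to ask, for each switch interval, whether it contains any base predicted to be paired. The sketch below assumes the predicted structure has been written out in dot-bracket notation (one character per base: "(" and ")" for paired bases, "." for unpaired). The toy structure, a 4 bp stem closing a 4-base loop, and the intervals are hypothetical, and the function names are introduced here for illustration.

```python
def paired_positions(dot_bracket: str, first_position: int = 1) -> set[int]:
    """Return the positions of paired bases in a dot-bracket string."""
    open_stack: list[int] = []
    paired: set[int] = set()
    for pos, symbol in enumerate(dot_bracket, start=first_position):
        if symbol == "(":
            open_stack.append(pos)
        elif symbol == ")":
            if not open_stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            paired.update((open_stack.pop(), pos))  # record both partners
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if open_stack:
        raise ValueError("unmatched '(' in structure")
    return paired


def overlaps_structure(interval: tuple[int, int], paired: set[int]) -> bool:
    """True if any base in the inclusive interval is predicted to be paired."""
    start, end = interval
    return any(pos in paired for pos in range(start, end + 1))


# Hypothetical 20-base region: stem at positions 3-6 paired with 11-14, loop 7-10.
structure = "..((((....))))......"
paired = paired_positions(structure)
for interval in [(5, 9), (15, 20)]:
    print(interval, overlaps_structure(interval, paired))
```

Tallying switch events in intervals that do and do not touch predicted pairing gives a simple contingency table for the kind of loose association described above.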
The length of detectable chimeric regions ranged from approximately 300 bp, for switches near the 5' end of ITS2, to 45 bp, for switches near the 3' end. Chimeras arising between bp 15 and bp 25 would not have been recorded as such by our procedure but may have occurred (the 4 discontinuities in this span may be due to template switching or to PCR-generated mutation; see Methods). We note that an area of secondary structure overlaps bp 15 and could be involved in template switching.
Areas where G. dichrous I and II differ extensively, including indels (bases 147-151 and 198-205), do not seem to be involved in template switching. This has also been noted in other studies (Fonseca et al. 2012; Haas et al. 2011).
To evaluate whether cloning simply uncovered existing variation in the ribosomal repeat region (Lindner and Banik 2011), we sequenced the ITS plus LSU region of single-basidiospore isolates (SBIs) derived from F1 heterozygotes (Table 2). SBIs from F1 spores were either G. dichrous I or II and showed no evidence of meiotic recombination between ITS + LSU types I and II. We further examined SBI sequences from two collections of G. dichrous I (Table 2) and found no evidence that any G. dichrous II sequence elements were present in the ribosomal repeat. Finally, we cloned PCR products from three collections of G. dichrous I (TENN68152, 8 clones; TENN67834, 7 clones; TENN69091, 8 clones) and again found no evidence of G. dichrous II sequence elements. We conclude that there is currently no evidence in G. dichrous that cloning recovers existing intragenomic variation in the ribosomal repeat, although we cannot exclude that possibility. The results reported as intragenomic variation in Laetiporus cincinnatus by Lindner and Banik (2011) could be due to differences among ribosomal repeats, but could also be explained if the sampled fruitbody was a hybrid and the two parental haplotypes and resulting chimeras were recovered.

Conclusions
Chimeras are common in cloned PCR products and tend to obscure contributing parental haplotypes, potentially creating errors in DNA sequence repositories. In this study, we show:
1. Template switching is non-random. Of 25 possible markers where the ITS2 sequences of G. dichrous I and II differ, only 8 show template switching, and template switching is concentrated in specific regions of the ITS2 sequence. Because the same chimera can be recovered multiple times, chimeras could be misinterpreted as parental haplotypes.
2. There is a loose correlation between areas predicted to form secondary structure and regions where template switching is frequent. We conclude that secondary structure formation may affect template switching, and we speculate that it could either enhance or repress template switching, depending on the location and size of the stem-loop structure.
3. Chimeras occurring near the end of a template may be short and thus not easily detected.
4. In this system, chimeras are not due to recovery of underlying variability in the ribosomal repeat. The origins of chimeras remain obscure and may involve multiple factors.
5. Chimera control should be exercised wherever possible in environmental sampling and taxonomic studies to minimize persistent errors in sequence data repositories.