Testing the phylogenetic utility of MCM7 in the Ascomycota

The Ascomycota are a group of filamentous fungi that occur as saprobes, pathogens, and symbionts. They are of immense industrial, medical, ecological, and economic importance. The search for new markers appropriate for molecular phylogenetic analysis of Ascomycota remains a challenging problem. In this study, we explore the phylogenetic utility of a single-copy protein-coding gene, MCM7, newly recognized as useful for inferring phylogenetic relationships among the major classes of the Ascomycota. Our specific goals were to: 1) test the phylogenetic utility of MCM7 for estimating phylogenies at various taxonomic ranks (class and below) with an emphasis on non-lichenized ascomycetes; and 2) compare the congruence, robustness and resolving power of MCM7-based phylogenies with that of nuclear large subunit rDNA (LSU)-based phylogenies for the same taxon set. A dataset of MCM7 and LSU sequences was assembled for 80 species belonging to 63 genera of lichenized and non-lichenized ascomycetes in the classes Dothideomycetes, Eurotiomycetes, Geoglossomycetes, Lecanoromycetes, Leotiomycetes, and Sordariomycetes. We obtained 93 new sequences, of which 65 are MCM7 and 28 are LSU. Maximum likelihood and Bayesian analyses were performed using single as well as combined gene datasets and partitions. We also assessed substitution saturation for the MCM7 gene. Results indicate that MCM7 can be used successfully for determining phylogenetic relationships of ascomycetes and provides good resolution and support at half the cost compared to LSU. Phylogenetic informativeness profiles showed that MCM7 was more phylogenetically informative than LSU. The MCM7 gene is also a valuable phylogenetic marker for both lower- and higher-level phylogenetic analyses within the Ascomycota, especially when used in
MycoKeys 1: 63–94 (2011) doi: 10.3897/mycokeys.1.1966 www.pensoft.net/journals/mycokeys
Copyright H.A. Raja et al.
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Research article. A peer-reviewed open-access journal.


Introduction
The Ascomycota, commonly referred to as the sac-fungi (Eriksson 2009), is the largest and most phylogenetically diverse group of organisms within the Kingdom Fungi and consists of an estimated 64,000 described species (Kirk et al. 2008). Currently the Ascomycota comprises three subphyla, 15 classes and 68 orders (Kirk et al. 2008). Species belonging to the Ascomycota can be found in all ecosystems, where they inhabit a diverse array of ecological niches, acting as saprobes that decay dead organic matter, pathogens of plants and animals, as well as mutualists (lichen-forming fungi) and endophytes. In addition, numerous taxa within the Ascomycota are of industrial, medical and economic importance. A large proportion of taxa that reside within the Ascomycota are known only from their mitosporic or asexual states (Gams and Seifert 2008), which makes it difficult to determine phylogenetic and evolutionary relationships within this megadiverse group of fungi. The advent of molecular systematics has revolutionized our knowledge of the phylogenetics of the Ascomycota.
Early fungal phylogenetic studies used DNA sequences from nuclear ribosomal genes such as small subunit (18S) and large subunit (28S) rDNA genes (Bruns et al. 1991, 1992, Berbee and Taylor 1995, Spatafora 1995, Taylor et al. 1994, Tehler et al. 2000). Because these genes are present in a large number of copies within the genome that are subject to concerted evolution (Zimmer et al. 1990), and because they are easy to amplify (Hillis and Dixon 1991), 18S and 28S sequence data were used early on and still dominate the fungal sequence data in GenBank (Begerow et al. 2010, Lutzoni et al. 2004). Recently, however, fungal systematists have started using a number of single-copy protein-coding genes for investigating deep phylogenetic relationships among the fungi. This has largely become possible due to the advent of fungal phylogenomics (Galagan et al. 2005). This task has been achieved due in part to the efforts of research consortia among fungal systematists such as "Assembling the Fungal Tree of Life" (Hibbett et al. 2007, Lutzoni et al. 2004) and "Deep Hypha" (Blackwell et al. 2006).
Despite their widely accepted use in inferring evolutionary relationships among the ascomycete fungi, a number of protein-coding genes have been shown to perform variably (Aguileta et al. 2008). In fact, a number of studies have used varying definitions of phylogenetic informativeness to compare genes to one another. In addition to the aforementioned study by Aguileta et al. (2008), which compared gene-based trees to an ideal tree, Townsend et al. (2007) used character rates, Graybeal et al. (1994) used empirical saturation plots, and Collins et al. (2005) used base compositional stationarity, amongst others. The Townsend et al. (2007) measure, which selects genes with an optimal rate as it is projected backwards in time, was applied by Schoch et al. (2009a) to a taxon set comprising all major classes in the Ascomycota for DNA sequences from three ribosomal genes (two nuclear, one mitochondrial) and three protein-coding genes. These studies showed that different genes behave differently when recovering older versus younger divergences. In the majority of cases the selected protein-coding genes were more informative than the ribosomal genes over all time periods. Using different criteria, Aguileta et al. (2008) showed that several protein-coding genes used routinely in fungal phylogenetic studies were not among the best performing genes when tested against 246 single-copy orthologous genes extracted from 30 fungal genomes (see supplementary material in Aguileta et al. 2008). The authors discovered two orthologous single-copy protein-coding gene loci, MS277 and MS456, which outperformed all other protein-coding genes in their study. MS456, commonly referred to as MCM7, codes for a licensing factor required for DNA replication initiation and cell proliferation. The protein encoded by this gene is one of the highly conserved mini-chromosome maintenance proteins (MCM) that are essential for the initiation of eukaryotic genome replication (Kearsey and Labib 1998). Schmitt et al. (2009b) subsequently developed fungal-specific primers for these two loci and tested their phylogenetic utility across a wide range of classes from the Ascomycota, with a majority of taxa sampled from within the lichenized fungi in the Lecanoromycetes. Notably, the large and diverse class Dothideomycetes did not have representatives in this study. Data from this study suggested that, compared to MS277 (TSR1), the MCM7 primers were able to amplify a greater number of diverse taxa within the Ascomycota. However, the authors did not compare MCM7 with any other gene commonly used for fungal phylogenies. This includes the 28S large-subunit nuclear ribosomal DNA (LSU), which is currently one of the most widely used ribosomal genes for assessing phylogenetic relationships at the class level and below for Fungi (Begerow et al. 2010, Lutzoni et al. 2004).
The major objectives of this study, therefore, were to: 1) test the phylogenetic utility of MCM7 for estimating phylogenies at various taxonomic ranks (class and below) with a focus on non-lichenized ascomycetes; 2) expand use of the MCM7 gene to include taxa in the Dothideomycetes, Geoglossomycetes, Leotiomycetes, and Sordariomycetes; and 3) compare the congruence, robustness and resolving power of MCM7-based phylogenies with that of LSU-based phylogenies for the same taxon set. Comparing the phylogenetic utility of the new gene with that of existing ones helps build robust and well-resolved phylogenies among ascomycete fungi while improving cost management of molecular studies.

Taxon sampling
Taxa used in this study are listed in Table 1, along with information on the source of the isolates as well as their country of origin, where available. The focus of our taxon sampling was to include non-lichenized ascomycetes representing terrestrial and freshwater ascomycete taxa from the Dothideomycetes, Geoglossomycetes, Leotiomycetes, and Sordariomycetes for both MCM7 and LSU genes. We assembled datasets of each gene for the same 89 taxa. Six classes from within the rankless taxon Leotiomyceta (Schoch et al. 2009b) were sampled: Dothideomycetes, Eurotiomycetes, Geoglossomycetes, Lecanoromycetes, Leotiomycetes, and Sordariomycetes. Based on results of previous phylogenetic analyses (James et al. 2006, Lutzoni et al. 2004, Spatafora et al. 2006), one representative each from Saccharomycotina and Taphrinomycotina was used as an outgroup taxon for all analyses. For some taxa, more than one representative was sequenced for both genes to verify its identity as well as to assess the utility of MCM7 in comparison to LSU at lower taxonomic levels. Newly generated sequences are deposited in GenBank and their accession numbers are listed in Table 1.

Molecular Methods (DNA extraction, primers and sequencing)
Total genomic DNA from terrestrial ascomycetes was extracted using methods outlined in Promputtha and Miller (2010), whereas DNA from freshwater ascomycete taxa was extracted from axenic cultures obtained from single-spore isolates following Campbell et al. (2007). PCR reactions were carried out using known LSU and MCM7 primers (Rehner and Samuels 1995, Schmitt et al. 2009b, Vilgalys and Hester 1990). The LSU gene was amplified using thermocycler conditions outlined in Miller and Huhndorf (2004), and MCM7 was amplified using the following thermocycler conditions: initial denaturing at 94 °C for 5 min; 30 cycles of denaturing at 94 °C for 45 sec, annealing at 50-56 °C for 50 sec, and extension at 72 °C for 1 min; and a final extension step of 72 °C for 5 min. For taxa that were difficult to amplify, the annealing temperature was decreased to 45 °C. PCR reactions using illustra Ready-To-Go™ PCR Beads (GE Healthcare, Waukesha, WI) contained 1-5 µL genomic DNA, 2.5 µL of BSA (bovine serum albumin, New England Biolabs, Ipswich, MA) and/or 2.5 µL of DMSO (dimethyl sulfoxide, Fisher Scientific, Pittsburgh, PA), 1 µL of each 10 mM primer, and enough distilled water to bring the reaction volume to 25 µL. Purified PCR products were used in 11 µL sequencing reactions with BigDye Terminators v. 3.1 (Applied Biosystems, Foster City, CA) in combination with the following LSU primers: LROR, LR3, LR3R, LR6; and MCM7 primers: Mcm7-709for, Mcm7-1384rev. Sequences were generated on an Applied Biosystems 3730XL high-throughput DNA capillary sequencer at the UIUC Keck Center for Comparative and Functional Genomics.
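As a rough check on run time, the MCM7 cycling program described above can be written out as data and summed. This is only a sketch: the variable names and the helper `total_program_seconds` are illustrative, the annealing temperature of 50 °C is a placeholder for the reported 50-56 °C range, and ramp times between temperatures are ignored, so an actual run would take longer.

```python
# MCM7 thermocycler program as described in the text.
# Each step is (label, temperature in deg C, hold time in seconds).
INITIAL_DENATURATION = ("initial denaturation", 94, 5 * 60)
CYCLE_STEPS = [
    ("denaturation", 94, 45),
    ("annealing", 50, 50),   # placeholder: the text gives 50-56 C (45 C for difficult taxa)
    ("extension", 72, 60),
]
N_CYCLES = 30
FINAL_EXTENSION = ("final extension", 72, 5 * 60)


def total_program_seconds(initial, cycle_steps, n_cycles, final):
    """Sum hold times only; ramping between temperatures is not modelled."""
    per_cycle = sum(duration for _, _, duration in cycle_steps)
    return initial[2] + n_cycles * per_cycle + final[2]


total_s = total_program_seconds(
    INITIAL_DENATURATION, CYCLE_STEPS, N_CYCLES, FINAL_EXTENSION
)
print(f"Total hold time: {total_s} s ({total_s / 60:.1f} min)")
```

Keeping the program as plain data makes it easy to change the annealing temperature or cycle number for difficult templates without touching the arithmetic.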

Sequence alignment
Each sequence fragment was subjected to an individual BLAST search to verify its identity. MCM7 sequences from GenBank were assembled and aligned with newly obtained sequences using Sequencher 4.9 (Gene Codes Corp.), optimized by eye, and manually corrected. For the LSU data, the newly obtained sequences were aligned with sequences from GenBank using the multiple sequence alignment program MUSCLE® (Edgar 2004) with default parameters. MUSCLE® was implemented using the program Seaview v. 4.1 (Gouy et al. 2010). The LSU sequences were aligned in MUSCLE® using a previous (trusted) alignment made by eye in Sequencher v. 4.9, based on a method called "jump-starting alignment" (Morrison 2006). The final alignment was again optimized by eye and manually corrected using MacClade v. 4.08 (Maddison and Maddison 2000) and Se-Al v. 2.0a8 (Rambaut 1996). The separate and combined alignments are available from the authors upon request.

Maximum likelihood and Bayesian search strategies for phylogenetic analyses
Maximum likelihood (ML) and Bayesian Inference (BI) methods were used in phylogenetic analyses for both the MCM7 and LSU datasets. The Akaike Information Criterion (AIC) (Posada and Buckley 2004) as implemented in Modeltest v. 3.7 (Posada and Crandall 1998) was used to determine the best-fit model of evolution for each dataset for both ML and BI. For the separate and combined datasets, the best-fit model of evolution was the GTR + I + G model. Likelihood analyses were conducted using PhyML (Guindon and Gascuel 2003) under the following parameters: the GTR model was implemented with six rate classes and invariable sites; among-site rate variation parameters were fixed at the values obtained from Modeltest; and 1000 bootstrap replicates were performed from a BioNJ starting tree employing the best of nearest neighbor interchange (NNI) and subtree pruning and regrafting (SPR) branch swapping. Maximum likelihood analyses were also performed using RAxML v. 7.0.4 (Stamatakis 2006) run on the CIPRES Portal v. 2.0 (Miller et al. 2010) with the default rapid hill-climbing algorithm and GTR model, employing 1000 fast bootstrap searches. Clades with bootstrap values ≥ 70% were considered significant and strongly supported (Hillis and Bull 1993). Bayesian analyses employing a Markov Chain Monte Carlo (MCMC) algorithm were run with MrBayes v. 3.1 (Huelsenbeck and Ronquist 2001) on the CIPRES Portal v. 2.0 as an additional means of assessing branch support. These analyses incorporated the general time reversible model (Rodríguez et al. 1991), including an estimation of invariant sites and assuming a gamma distribution parameter (GTR + I + G) with six rate categories. Four independent chains of MCMC were run for 50 million generations to ensure that the same tree space was being sampled during each analysis and that the trees were not trapped in local optima. Trees were sampled every 1000th generation, resulting in 50,000 total trees. The first 10,000 trees, which extended beyond the burn-in phase in each analysis, were discarded, and Bayesian posterior probabilities (BPP) were determined from a consensus tree generated from the remaining 40,000 trees in PAUP* 4.0b10 (Swofford 2002). Clades with posterior probability ≥ 95% were considered significant and strongly supported.
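The sampling arithmetic in the Bayesian analysis can be verified in a few lines. The sketch below uses only the figures reported above; the variable names are illustrative and it does not reproduce any MrBayes or PAUP* command.

```python
# Figures reported in the text for the Bayesian analyses.
generations = 50_000_000   # MCMC generations per run
sample_every = 1_000       # one tree saved every 1000th generation
burnin_trees = 10_000      # trees discarded before building the consensus

sampled_trees = generations // sample_every
retained_trees = sampled_trees - burnin_trees
burnin_fraction = burnin_trees / sampled_trees

print(f"sampled:  {sampled_trees}")
print(f"retained: {retained_trees}")
print(f"burn-in:  {burnin_fraction:.0%} of sampled trees")
```

Under these figures the discarded portion corresponds to the first 10 million generations of each run.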

Combined analyses and test for conflict
The individual LSU and MCM7 datasets were examined for potential conflict before they were combined into a single dataset for total evidence analyses (Eernisse and Kluge 1993, Kluge 1989). Since previous studies have shown that the incongruence length difference (ILD) test performs poorly (Barker and Lutzoni 2002, Dolphin et al. 2000, Dowton and Austin 2002, Yoder et al. 2001), a simple test was used for comparing and assessing the combinability of the data from individual datasets. The individual gene phylogenies were considered to be incongruent if clades with significant ML bootstrap support and BI BPP (i.e. ≥ 70% BS or ≥ 95% BPP) were conflicting in the tree topologies (Alfaro et al. 2003, Wiens 1998, Lutzoni et al. 2004). Where incongruent clades have < 70% BS and < 95% BPP, the conflict is statistically unsupported. If there is no conflict based on the above assumptions, it supports the argument that the individual genes possess similar phylogenetic histories and can be combined. Since no significant conflict was observed among clades in each of the individual datasets, they were combined to achieve maximum phylogenetic resolution and support. The combined dataset was analyzed with the same parameters as above except that the protein-coding dataset was partitioned based on codon positions. For BI we used flat priors and unlinked model parameters across partitions. The combined datasets were partitioned and analyzed so as to allow separate parameter estimation for each gene as well as for each codon position for MCM7.
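The conflict criterion described above can be sketched in code. The following is an illustration under stated assumptions, not the procedure used in the study: clades are represented as sets of taxon names from rooted trees on the same taxon set, a clade counts as supported if it reaches either threshold (following the parenthetical "≥ 70% BS or ≥ 95% BPP"), and the function names and toy data are hypothetical.

```python
from itertools import product

BS_MIN = 70.0   # ML bootstrap threshold from the text (percent)
BPP_MIN = 95.0  # Bayesian posterior probability threshold (percent)


def is_supported(bs, bpp):
    """A clade counts as supported if either measure reaches its threshold."""
    return bs >= BS_MIN or bpp >= BPP_MIN


def clades_conflict(a, b):
    """Two rooted clades conflict if they overlap without one nesting in the other."""
    return bool(a & b) and not (a <= b or b <= a)


def supported_conflicts(tree1, tree2):
    """Return pairs of supported clades, one from each gene tree, that conflict.

    Each tree is a dict mapping frozenset(taxon names) -> (bootstrap, posterior).
    """
    strong1 = [c for c, (bs, bpp) in tree1.items() if is_supported(bs, bpp)]
    strong2 = [c for c, (bs, bpp) in tree2.items() if is_supported(bs, bpp)]
    return [(a, b) for a, b in product(strong1, strong2) if clades_conflict(a, b)]


# Toy example with made-up support values: the only topological disagreement
# involves a clade ({B, C}) that is weakly supported, so no conflict is reported.
lsu_tree = {
    frozenset({"A", "B"}): (98, 100),
    frozenset({"A", "B", "C"}): (65, 90),
}
mcm7_tree = {
    frozenset({"B", "C"}): (55, 80),
    frozenset({"A", "B", "C"}): (88, 99),
}
conflicts = supported_conflicts(lsu_tree, mcm7_tree)
print("combinable" if not conflicts else f"{len(conflicts)} supported conflict(s)")
```

An empty result corresponds to the situation reported here, in which the two gene datasets were judged combinable.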

Substitution saturation test
All 89 sequences from the MCM7 alignment were used to assess transition/transversion (ti/tv) substitution saturation at first, second, and third codon positions. Observed ti/tv was plotted against Jukes-Cantor (JC69) corrected distance (Jukes and Cantor 1969) for each codon position separately as well as combined using the program DAMBE (Xia and Xie 2001, Xia 2009). Transitions and transversions at a codon position can be considered saturated if the scatter points on the two-dimensional plot appear to level off with an increase in sequence divergence. In addition, the Iss statistic, which is a measure of substitution saturation in molecular phylogenetic datasets developed by Xia et al. (2003) and implemented in DAMBE, was also used to detect saturation. Nucleotide statistics for both genes were calculated in PAUP* 4.0b10 (Swofford 2002), SeqState v. 1.4.1 (Müller 2005), and Mega v. 4 (Tamura et al. 2007).
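The quantities plotted in such a saturation analysis can be computed for a single sequence pair as follows. This is a minimal sketch and not DAMBE's implementation: it assumes aligned, in-frame sequences, skips any site with a gap or ambiguity code, and uses the standard Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3); the function names and the two short sequences are illustrative.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def pairwise_counts(seq1, seq2):
    """Count transitions, transversions and compared sites for two aligned sequences."""
    ti = tv = n = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes
        n += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ti += 1   # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1   # purine<->pyrimidine
    return ti, tv, n


def jc69_distance(p):
    """Jukes-Cantor (1969) corrected distance from the proportion of differing sites."""
    if p >= 0.75:
        return math.inf  # the correction is undefined at or beyond full saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def codon_position(seq, pos):
    """Extract codon position pos (1, 2 or 3), assuming the sequence starts in frame."""
    return seq[pos - 1::3]


# Hypothetical sequences used only to exercise the functions.
seq_a = "ATGGCCAAATTT"
seq_b = "ATGGCTAAGTTT"
for pos in (1, 2, 3):
    ti, tv, n = pairwise_counts(codon_position(seq_a, pos), codon_position(seq_b, pos))
    d = jc69_distance((ti + tv) / n)
    print(f"codon position {pos}: ti={ti} tv={tv} JC69={d:.3f}")
```

Repeating this for every pair of taxa and plotting ti and tv against the corrected distance gives the kind of scatter plot in which a plateau indicates saturation.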

Phylogenetic Informativeness
We performed a phylogenetic informativeness (PI) measure on our combined dataset as proposed by Townsend (2007) using the PhyDesign online tool developed by López-Giráldez and Townsend (2011). PhyDesign measures per-site estimates to project the utility of a particular gene for resolving phylogeny-related questions across historical time. This method allows for a comparison of different genes and loci used for phylogenetics by providing an estimate of the cost effectiveness of character sampling for specific time units. The time units used herein are relative time periods. Schoch et al. (2009a) compared PI for the Ascomycota using a 6-gene phylogeny, but MCM7 was not evaluated in their study.
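To make the distinction between net and per-site informativeness concrete, the sketch below assumes the closed-form expression for a single site given by Townsend (2007), 16 λ² T exp(-4 λ T), where λ is the site's substitution rate and T the time depth. It is not the PhyDesign code: site rates would normally be estimated from the alignment and a dated tree, whereas the rates here are invented for illustration.

```python
import math


def site_informativeness(rate, t):
    """Informativeness of one site with substitution rate `rate` at time depth t."""
    return 16.0 * rate**2 * t * math.exp(-4.0 * rate * t)


def net_informativeness(rates, t):
    """Net PI of a locus: the sum over its sites, so it grows with sequence length."""
    return sum(site_informativeness(r, t) for r in rates)


def per_site_informativeness(rates, t):
    """Per-site PI: net PI divided by the number of sites."""
    return net_informativeness(rates, t) / len(rates)


# Hypothetical site rates: a short locus and a locus twice as long with the
# same rate distribution.
short_locus = [0.2, 0.5, 1.0]
long_locus = short_locus * 2

for t in (0.1, 0.25, 0.5, 1.0):
    print(
        f"T={t:<4} net short={net_informativeness(short_locus, t):.3f} "
        f"net long={net_informativeness(long_locus, t):.3f} "
        f"per-site={per_site_informativeness(short_locus, t):.3f}"
    )
```

In this toy case the longer locus has twice the net PI but the same per-site PI, which is why per-site profiles are used to compare loci without the confounding effect of length. Under the assumed formula each site's curve peaks at T = 1/(4λ), so fast sites inform recent divergences and slow sites inform older ones.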

Taxon sampling
A total of 89 taxa were included in the study, which comprises 80 species belonging to 63 genera of lichenized and non-lichenized ascomycetes in the classes Dothideomycetes, Eurotiomycetes, Geoglossomycetes, Lecanoromycetes, Leotiomycetes, and Sordariomycetes (Table 1).

New taxa sequenced
We report 93 new sequences, of which 65 are MCM7 and 28 are LSU (Table 1). Table 1 provides accession numbers for sequences used from GenBank in addition to those newly generated in this study. Most of the newly generated data for both MCM7 and LSU are from ascomycetes that occur as saprobes on wood in terrestrial (Miller and Huhndorf 2009) or freshwater habitats (Shearer and Raja 2010). Our study resulted in a > 80% sequencing success rate for MCM7, which is comparable to that found by Schmitt et al. (2009b). We obtained the best PCR amplification results for MCM7 with about 5 µL of total genomic DNA per 25 µL PCR reaction.
Both LSU and MCM7 alignments consisted of the same 89-taxon dataset. The original LSU dataset had a total of 1484 nucleotides. After aligning in MUSCLE and excluding nucleotides from the 5´ and 3´ ends due to missing data in most sequences, the LSU dataset consisted of 1141 nucleotides. The final LSU dataset, after excluding 57 ambiguous characters and two short intron regions from Saccharomyces cerevisiae, consisted of 1076 nucleotides. The LSU dataset had 551 constant characters, 123 variable characters, and 402 parsimony informative characters (Table 2). The MCM7 dataset consisted of a total of 642 nucleotides (193 constant, 449 variable); there were no missing characters, ambiguous regions, or introns. The majority of informative characters were in third codon positions (Table 2). The GC content of MCM7 was slightly higher than that of LSU, although nucleotide percentages were somewhat similar in both datasets (Table 2). Although MCM7 had fewer nucleotides analyzed, it had a higher percentage (62%) of parsimony informative characters than LSU (37%) (Table 2). The final LSU and MCM7 combined dataset had 1718 nucleotides.
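The reported character counts can be cross-checked with simple arithmetic. In the sketch below, the MCM7 parsimony informative count is an inference rather than a reported value: it is taken as the combined total of 801 reported for the concatenated dataset minus the 402 LSU characters. Variable names are illustrative.

```python
# Character counts reported in the text.
lsu_length = 1076
lsu_constant = 551
lsu_variable_uninformative = 123
lsu_informative = 402
mcm7_length = 642
combined_length_reported = 1718
combined_informative_reported = 801  # reported for the concatenated dataset

# Assumption: the MCM7 informative-character count is not stated directly, so
# it is inferred as the combined total minus the LSU count.
mcm7_informative = combined_informative_reported - lsu_informative


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


lsu_sum_ok = lsu_constant + lsu_variable_uninformative + lsu_informative == lsu_length
length_sum_ok = lsu_length + mcm7_length == combined_length_reported

print(f"LSU character classes sum to alignment length: {lsu_sum_ok}")
print(f"Gene lengths sum to combined length: {length_sum_ok}")
print(f"LSU:  {percent(lsu_informative, lsu_length):.1f}% parsimony informative")
print(f"MCM7: {percent(mcm7_informative, mcm7_length):.1f}% parsimony informative")
```

Under that inference the percentages agree with the rounded values of 37% and 62% given in the text.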

Phylogenetic analyses
The estimated model parameter values obtained from AIC with Modeltest are listed in Table 3. Application of separate models on the different codon positions for MCM7 did not affect the topology and posterior probabilities of clades (data not shown). Since PhyML and BI analyses produced trees with nearly identical topologies, only PhyML phylograms are shown (Figs 1-3).
Class-level relationships: The overall tree topologies of the LSU and MCM7 genes were identical, with the represented classes of fungi occurring as monophyletic (Figs 1 and 2). A total of 46 clades received strong support (≥ 70% BS and ≥ 95% BPP) with PhyBS, 52 with RAxBS, and 62 with BPP for LSU, whereas 38, 39, and 46 clades were strongly supported with PhyBS, RAxBS, and BPP, respectively, for MCM7 (Table 2). Overall, major lineages within the Ascomycota were somewhat more strongly supported with LSU than with MCM7 data. For LSU, nine nodes were strongly supported with PhyBS and ten with RAxBS, while twelve were strongly supported with BPP (Table 4). For MCM7, ten nodes were strongly supported with PhyBS and RAxBS, while only nine were strongly supported based on BPP (Table 4). The somewhat higher support for the LSU gene may be due to the greater sequence length of LSU compared to MCM7 in the present study. Min and Hickey (2007) have shown that reducing sequence length can have a profound effect on the resolution of resulting phylogenetic trees. The net PI profile, which depends on sequence length, also shows that the LSU gene has slightly higher phylogenetic informativeness than MCM7 at older nodes (Table 5, Fig. 6).
Genus and species level relationships: Within the Dothideomycetes, Eurotiomycetes, Geoglossomycetes, and Sordariomycetes, we selected more than one species/ strain within a genus to assess the performance of MCM7 (MS456) versus LSU.
Our data show that MCM7 can be used for assessing interspecific relationships of taxa within genera such as Camarops, Lasiosphaeria (Sordariomycetes), Aspergillus (Eurotiomycetes), Geoglossum (Geoglossomycetes), and Aliquandostipite (Dothideomycetes). The above taxa, sampled from their different classes within the Leotiomyceta, each formed a monophyletic clade with high internal resolution and support based on MLBS and BPP values in each gene tree (see Figs 1 and 2, and Table 5). However, the combined gene tree showed even better resolution of relationships and clade support for the above genera (Fig. 3, Table 5). Removing the third codon position had a slight negative effect on clade support within genera (Fig. 5, Table 5).

Combined analysis
Since no significant conflict occurred among well-supported clades in each tree topology, we concatenated the two gene regions. The combined LSU-MCM7 gene tree (Fig. 3) had a total of 801 parsimony informative characters (Table 2) and provided a more robust phylogenetic hypothesis of the Ascomycota than either individual tree topology. A total of 62 clades were strongly supported with PhyBS, 63 with RAxBS, and 61 with BPP (Table 2). The combined tree also received higher nodal support for the major lineages included, with 13 lineages strongly supported with PhyBS, twelve with RAxBS, and twelve with BPP (Table 4). Compared to the separate gene analyses, the combined dataset yielded both a higher total number of strongly supported clades and a higher number of strongly supported nodes for the major lineages (see Tables 3, 4 and Figs 1-3). A number of nodes that were moderately (< 70% BS and < 95% BPP) or poorly (< 50% BS and < 70% BPP) supported in the separate gene analyses received strong support in the combined gene analyses (Table 4).

Substitution saturation
There is no indication of substitution saturation in the first or second codon positions (Fig. 4). However, for the third codon position, the scatter plot levels off when transitions and transversions are plotted against pairwise sequence divergence (Fig. 4). It is also clear that third codon position transitions reach a plateau. Saturation tests therefore indicate poor phylogenetic signal at the third codon position, and transitions appear to be saturated on a plot of substitution type against JC corrected genetic distances. The test of Xia et al. (2003) suggested that for the first and second codon positions of MCM7 sequences, the values for the index of substitution saturation Iss were 0.253 and 0.152, respectively, for 32 OTUs, and the critical Iss.c values were 0.659 and 0.658. This suggests that there were no significant levels of substitution saturation at the first and second positions (Iss < Iss.c, P < 0.0001). However, for the third codon position of MCM7, the observed Iss value of 0.807 is significantly greater than the Iss.c value of 0.658, suggesting that the third codon position has experienced substitution saturation (Xia et al. 2003). This statistical test therefore corroborates the scatter plot data and suggests that the third codon position is saturated and might therefore possess a poor phylogenetic signal. We therefore carried out an additional set of ML and BI analyses using a method called site stripping (Verbruggen and Theriot 2008), in which we entirely removed the third codon position in order to assess the effects of the saturated third codon position on tree topology and statistical clade support. The PhyML tree resulting from an analysis of only first and second codon positions for MCM7 is presented in Fig. 5. The topology of this phylogenetic tree is not congruent with the LSU and MCM7 trees or the combined gene trees. One major difference was that the Xylariomycetidae clade nested within the Sordariomycetidae clade. In addition, the nodal support for the major lineages was quite poor for the first and second codon position tree when compared to the separate LSU and MCM7 gene trees or the combined gene tree (Table 4). For example, Dothideomycetes and Eurotiomycetes did not receive support with PhyBS, RAxBS, or BPP (Table 4). However, in the MCM7 gene tree with all codon positions included, the Dothideomycetes were strongly supported and the Eurotiomycetes lineage was strongly supported with PhyBS (Table 4).
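Removing third codon positions from an alignment, as in the site-stripping analysis above, amounts to dropping every third column. The sketch below is illustrative only: it assumes an alignment held as a dictionary of equal-length sequences that begins at a first codon position and contains no introns, and the function name and toy sequences are hypothetical.

```python
def strip_third_positions(alignment):
    """Return a copy of the alignment with every third column removed.

    `alignment` maps taxon name -> aligned sequence. All sequences are assumed
    to have the same length and to start at a first codon position.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return {
        name: "".join(base for i, base in enumerate(seq) if i % 3 != 2)
        for name, seq in alignment.items()
    }


# Hypothetical two-taxon alignment of two codons.
toy = {"taxon1": "ATGGCC", "taxon2": "ATGGCT"}
stripped = strip_third_positions(toy)
print(stripped)
```

Applied to a 642-column in-frame alignment, this would leave 428 columns for analysis.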

Phylogenetic Informativeness
We derived the profiles from rates of evolution of sites within genes using PhyDesign, an online platform for profiling PI (López-Giráldez and Townsend 2011, see also Townsend 2007), which provides a unique empirical metric for guiding marker selection and facilitates locus prioritization. The net PI correlates with the degree of nodal support, while the per-site PI compares the relative power of gene performance without confounding effects of gene length (Townsend 2007, López-Giráldez and Townsend 2011). Net PI showed a higher pulse for MCM7 than LSU. However, LSU had a higher pulse of PI for older time units (beyond 0.4) (Fig. 6). Based on a per-site comparison, the MCM7 gene fragment (642 bp) produced a pulse of higher PI across relative time units compared to LSU (Fig. 6).

Class-level relationships
The topologies of the major classes obtained using the LSU gene (Fig. 1) as well as the MCM7 gene (Fig. 2) broadly agree with previously published multi-gene phylogenies of Ascomycota (James et al. 2006, Lutzoni et al. 2004, Schoch et al. 2009a, Spatafora et al. 2006). We show that all classes in the present study are monophyletic, which corroborates earlier hypotheses by Eriksson and Winka (1997) 4), but, however, without strong support for its placement in relation to other classes of Leotiomyceta as noted previously (Schoch et al. 2009b). We did not recover support for the expanded subclass Pleosporomycetidae as found in a previous multi-gene study focused only on Dothideomycetes (Schoch et al. 2009c), and the influence of additional gene data and improved taxon sampling cannot be ruled out.
In spite of this, results of our study are also in agreement with those of Aguileta et al. (2008), who showed that MCM7 is a reliable marker for establishing phylogenetic relationships among fungi, and concur with Schmitt et al.'s (2009b) results regarding the phylogenetic utility of MCM7 for resolving relationships among the Ascomycota.

Genus and species-level relationships
We included more than one species or isolate of various genera such as Camarops, Lasiosphaeria (Sordariomycetes), Graddonia (Leotiomycetes), Aspergillus (Eurotiomycetes), and Aliquandostipite (Dothideomycetes) to test how MCM7 would perform in resolving relationships at the genus level. Although several species of Jahnula were included in our study, we do not discuss results for this genus in more detail since independent data strongly suggest the genus may be polyphyletic within the order Jahnulales (Campbell et al. 2007, Suetrong et al. 2010). Currently, the ribosomal 18S small subunit and 28S large subunit are widely used genes for placing newly described genera of fungi within a class in the Ascomycota (see Begerow et al. 2010). Here we show that MCM7 can be used along with LSU to resolve genus-level relationships. In general, we found slightly better resolution and support with MLBS and BPP for the aforementioned genera with MCM7 in comparison with LSU (Figs 1 and 2; Table 5). All clades within these genera were highly supported based on the combined gene analysis (Fig. 3; Table 5). Our results are in agreement with those of Schmitt et al. (2009b), who showed the utility of MCM7 at the genus level for taxa such as Aspergillus, Lecanora, and Malcomiella. Peterson et al. (2010) also used MCM7 successfully with other protein-coding genes such as RPB2 and TSR1 to resolve phylogenetic relationships of the genus Hamigera, an ascomycete fungus belonging to the Eurotiomycetes. The MCM7 gene was also recently used in species delimitation of a lichen-forming fungus, Xanthoparmelia (Leavitt et al. 2011). The authors reported a high number of parsimony informative variable characters in MCM7 compared to another protein-coding marker (Beta-tubulin) as well as ribosomal gene markers (ITS, LSU). More recently, Spribille et al. (2011) used MCM7 for phylogenetic analysis of the boreal lichen Mycoblastus sanguinarius. Although MCM7 was reported as being highly variable and showed good phylogenetic signal, it showed a higher level of transition saturation at the third codon position (Spribille et al. 2011). The authors concluded that caution must be taken when using MCM7 to recover gene phylogenies. Although we did not find a significant difference in the nodal support between the MCM7 and LSU genes (Table 5), overall, based on our study and results of some recent studies, it seems likely that MCM7 shows good potential as a candidate gene for evaluating interspecific relationships among the Ascomycota.

Combined gene analyses
Combining datasets generally provides better resolution and nodal support for clades in phylogenetic analyses of the fungal kingdom (Lutzoni et al. 2004). Our combined LSU and MCM7 dataset showed enhanced phylogenetic resolution (Fig. 3) and increased nodal support for clades that were not strongly supported when analyzed separately (Table 4). Our data are in agreement with other Ascomycota studies that have shown that combining protein-coding data with nuclear ribosomal genes (either LSU or SSU) provides an increased number of supported nodes in phylogenetic analyses (Geiser et al. 2006, Hansen et al. 2005, Miller and Huhndorf 2005, Schoch et al. 2006, Spatafora et al. 2006, Tang et al. 2007). Hofstetter et al. (2007) concluded that, for better resolution and support of clades in phylogenetic analyses of fungi, more characters, and protein-coding genes in particular, are important. Our study also supports the prediction by Schmitt et al. (2009b), who suggested that MCM7 has a higher potential to resolve phylogenetic relationships between fungi when analyzed in combination with other commonly used genes such as LSU. In addition, our PI analyses using PhyDesign show that MCM7 was a more phylogenetically informative gene than LSU. Schoch et al. (2009a) also found that protein-coding genes had better PI profiles than those of rDNA genes.

MCM7 codon saturation
In this study the third codon position in MCM7 appears to be saturated based on scatter plots of substitution saturation curves (Fig. 4), which agrees with results of empirical tests by Xia et al. (2003). Spribille et al. (2011) also showed a higher level of transition saturation at the third codon position for the MCM7 gene in their phylogenetic analyses. Substitution saturation appears to be a common problem among protein-coding genes routinely used for inferring phylogenetic relationships among fungi (Liu et al. 1999, Hansen et al. 2005, Matheny et al. 2007, Miller and Huhndorf 2005, Sung et al. 2007). There are currently two schools of thought regarding the inclusion or exclusion of third codon positions from saturated protein-coding genes and their method of utilization for phylogenetic analyses. One group is of the opinion that third codon positions should be excluded in ML analysis because these fast-evolving, saturated characters can decrease the signal/noise ratio, thus providing misleading interpretations of evolutionary relationships (Blouin et al. 1998, Swofford et al. 1996, Xia et al. 2003). Conversely, the other group suggests the inclusion of the third codon position, since the presence of more phylogenetically informative characters potentially decreases stochastic errors and increases branch-support values (Edwards et al. 1991, Källersjö et al. 1998, Müller et al. 2006, Simmons et al. 2006). Björklund (1999), however, suggests that unless one finds evidence that third codon positions are significantly misleading, they should not be eliminated from analyses a priori.
Based on our analyses of the MCM7 dataset with and without third codon positions (Fig. 2, all codon positions included, and Fig. 5, third codon positions excluded), we found that exclusion of third codon positions did not have a major effect on the monophyly of the classes, except that the subclass Xylariomycetidae was nested within the Sordariomycetidae when third codon positions were excluded (Fig. 5). However, exclusion of third codon positions led to a loss of nodal support (MLBS and BPP) for several clades at both the class and genus level (Tables 4, 5). These results are in agreement with those of Edwards et al. (1991), who found that removal of third codon positions in mitochondrial genes in a group of birds resulted in "biological unreasonable" groupings as well as loss of BS for one of the branches in their phylogenetic tree. Hackett (1996) also found that removal of saturated third codon positions from mitochondrial genes in another bird study resulted in a loss of phylogenetically informative transversions. Therefore, for our 89-taxon MCM7 gene phylogeny it seems appropriate to include the third codon positions in order to retain appropriate tree topology as well as MLBS and BPP nodal support. We concur with the conclusions of Simmons et al. (2006) that, despite indications of saturation, third codon positions must be included in phylogenetic analyses since they contain a large number of phylogenetically informative characters.
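The trade-off described above can be made concrete with a small Python sketch that strips third codon positions from an in-frame alignment and counts parsimony-informative sites before and after. The count of parsimony-informative sites (columns in which at least two states each occur in at least two taxa) is used here only as a simple proxy for lost signal; it is not the phylogenetic informativeness profile computed by PhyDesign. The function names and the in-frame assumption are hypothetical choices for this example.

```python
from collections import Counter


def drop_third_positions(seq):
    """Remove every third base, assuming the sequence starts in frame."""
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


def parsimony_informative_sites(seqs):
    """Count alignment columns in which at least two nucleotide states
    each occur in at least two sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be non-empty and equally long")

    count = 0
    for column in zip(*seqs):
        # Ignore gaps and ambiguity codes when tallying states.
        states = Counter(b for b in column if b in "ACGT")
        if sum(1 for n in states.values() if n >= 2) >= 2:
            count += 1
    return count
```

Comparing `parsimony_informative_sites(seqs)` with the same count on `[drop_third_positions(s) for s in seqs]` shows how many informative columns are discarded along with the saturated positions.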

Conclusions
We have presented evidence for the phylogenetic utility of MCM7 among the Ascomycota. Results of the PI profiles show that MCM7 was more informative than LSU. Here we show that this locus can also be used successfully for determining phylogenetic relationships of non-lichenized ascomycetes, and that it provides good resolution and support at half the cost of LSU, because we used only two primers to sequence the MCM7 gene as opposed to the four primers used routinely for LSU. In addition, no introns were present in the MCM7 gene for the taxa sequenced in our study. MCM7 seems to qualitatively contribute to better resolution of higher as well as lower taxonomic level clades. We also show that the combined LSU and MCM7 gene phylogeny had superior resolving power for both class and genus level relationships, since all major classes received high BS in both PhyML and RAxML bootstrap analyses as well as high BPP values. We report that although the third codon position of MCM7 is saturated, it may be better to analyze the dataset with all codon positions included. Exclusion of third codon positions compromised the overall topology of the tree and, in some clades, resulted in poor nodal support with MLBS and BPP, perhaps due to exclusion of a significant number of phylogenetically informative characters. Lutzoni et al. (2004) suggested "there is a great need for housekeeping protein-coding genes to be sequenced and combined with other loci to assemble the fungal tree of life". The results from this study suggest that MCM7 will make an important contribution toward such an effort.

Future Directions
MCM7 shows good potential as a candidate gene for fungal phylogeny reconstruction, especially for the Ascomycota. However, future studies comparing MCM7 with RPB1, RPB2, and EF1 alpha are warranted for the Ascomycota to better understand which single-copy protein-coding locus is easiest to PCR amplify and sequence while also providing the greatest amount of phylogenetic informativeness.

Figure 1. Maximum Likelihood phylogeny of Leotiomyceta (Ascomycota) based on 28S nrDNA large subunit data set (1141 bp) of 89 taxa using PhyML ((-ln)L score 14971). Thickened branches indicate significant Bayesian posterior probabilities ≥ 95%; numbers refer to PhyML/RAxML bootstrap support values ≥ 70% based on 1000 replicates. One representative each from Saccharomycotina and Taphrinomycotina was used as outgroup taxa. The major classes are shaded in grey. Classification following Hibbett et al. (2007) is shown on the right.
Comparison of PhyML bootstrap support (PhyBS), RAxML bootstrap support (RAxBS), and Bayesian posterior probabilities (BPP) of all lineages within the Leotiomyceta (Ascomycota) included in the present study, obtained from separate and combined data partitions of LSU rDNA and MCM7 sequence data. Only values > 70% BS and > 95% BPP are shown.

Figures 4a, b. Nucleotide substitution saturation plots: the proportions of transitions (s) and transversions (v) were plotted against sequence divergence, measured as Jukes-Cantor evolutionary distance, in the program DAMBE.

Figure 6. Phylogenetic informativeness profiles for two genes, LSU (1076 bp) and MCM7 (642 bp), through 1.4 time units using the PhyDesign online tool. The tree was obtained with PhyML. The relative time units are shown on the X-axis and profiles of net and per-site phylogenetic informativeness are shown on the Y-axis. Profiles of the LSU gene are shown in red and those of MCM7 in green.

Table 2.
Comparison of datasets and trees in phylogenetic analyses.
a Excluding sites at 5' and 3' ends.
b Divided into first, second, and third codon positions for MCM7; total shown in parentheses.

Table 3.
Maximum likelihood best-fit evolutionary models and parameters for separate and combined data sets selected by Akaike Information Criterion.
a General time reversible model (Rodríguez et al. 1990) with unequal base frequencies, gamma-distributed among-site rate variation, and a proportion of invariable sites.
b Proportion of invariable sites.
c Variable sites gamma distribution parameter.

Table 5.
Genus level relationships among selected genera used in the present study. Support values and analyses are the same as in Table 4.
Recently, Schoch et al. (2009b) proposed a new class, Geoglossomycetes, based on a multi-gene phylogeny. Our results for the MCM7 gene are in agreement with Schoch et al.'s study, as Geoglossomycetes is shown as a monophyletic group with strong MLBS and BPP support (Fig. 2, Table