Estimating the evidence of selection and the reliability of inference in unigenic evolution
 Andrew D Fernandes^{1, 2}Email author,
 Benjamin P Kleinstiver^{1},
 David R Edgell^{1},
 Lindi M Wahl^{2} and
 Gregory B Gloor^{1}
https://doi.org/10.1186/17487188535
© Fernandes et al; licensee BioMed Central Ltd. 2010
Received: 6 May 2010
Accepted: 8 November 2010
Published: 8 November 2010
Abstract
Background
Unigenic evolution is a largescale mutagenesis experiment used to identify residues that are potentially important for protein function. Both currentlyused methods for the analysis of unigenic evolution data analyze 'windows' of contiguous sites, a strategy that increases statistical power but incorrectly assumes that functionallycritical sites are contiguous. In addition, both methods require the questionable assumption of asymptoticallylarge sample size due to the presumption of approximate normality.
Results
We develop a novel approach, termed the Evidence of Selection (EoS), removing the assumption that functionally important sites are adjacent in sequence and and explicitly modelling the effects of limited samplesize. Precise statistical derivations show that the EoS score can be easily interpreted as an expected logoddsratio between two competing hypotheses, namely, the hypothetical presence or absence of functional selection for a given site. Using the EoS score, we then develop selection criteria by which functionallyimportant yet nonadjacent sites can be identified. An approximate power analysis is also developed to estimate the reliability of inference given the data. We validate and demonstrate the the practical utility of our method by analysis of the homing endonuclease IBmol, comparing our predictions with the results of existing methods.
Conclusions
Our method is able to assess both the evidence of selection at individual amino acid sites and estimate the reliability of those inferences. Experimental validation with IBmol proves its utility to identify functionallyimportant residues of poorly characterized proteins, demonstrating increased sensitivity over previous methods without loss of specificity. With the ability to guide the selection of precise experimental mutagenesis conditions, our method helps make unigenic analysis a more broadly applicable technique with which to probe protein function.
Availability
Software to compute, plot, and summarize EoS data is available as an opensource package called 'unigenic' for the 'R' programming language at http://www.fernandes.org/txp/article/13/ananalyticalframeworkforunigenicevolution.
Keywords
Background
One of the principal reasons for studying molecular evolution is that the function of a novel protein can be deduced, in part, by comparing it with a similar previouslycharacterized protein. But what recourse do we have if the novel protein does not exhibit significant sequence similarity to other proteins? More problematically, what if it is similar only to proteins of unknown function? In practice, even when the novel protein shares regions of extensive similarity to proteins of known function, it may be difficult to elucidate the importance of individual sites in the novel protein.
Unigenic Evolution
One innovative experimental approach that can help identify specific domains or residues required for function is unigenic evolution, first described and developed by Deminoff et al. [1]. Unigenic evolution can be applied to any protein where the loss of function can be used as a selectable phenotype [1–5].
The procedure consists of random mutagenesis and amplification of a single wildtype sequence via mutagenic polymerase chain reaction (PCR) with subsequent cloning and functional selection [6]. Functional clones are isolated and characterized by DNA sequencing. In contrast to traditional structurebased mutagenesis screening, unigenic evolution experiments produce an unbiased estimate of functionallyimportant residues regardless of putative structural role or conservation.
Deminoff's Analysis
The selection process ensures that, in functional clones, amino acids essential for function will be conserved relative to nonessential sites. However, differential mutation sensitivity can be caused by more than structural or functional constraints. Mutation rates of residues may differ due to differential transition/transversion rates, codon usage, and genetic code degeneracy. To correct for these confounding factors, Deminoff et al. developed a statistical analysis that compared the expected versus the observed mutation frequency for each codon, where the expected frequencies were derived from a population of clones that had not been subject to selection.
Deminoff et al. clearly demonstrated the importance of accounting for nonuniform transition versus transversion probabilities when computing expected mutational frequencies. To increase the inferential power of their analyses, they also developed a 'slidingwindow' χ^{2}analysis, binning together a 'window' of adjacent codons, assuming that residues critical for protein function would be nearby in primary structure. By comparing the probabilities of silent versus missense mutation in these windows, regions of either restrained or excessive mutability were identified as hypo or hypermutable, respectively.
Behrsin's Analysis
The subsequent analysis of Behrsin et al. [7] advanced the statistical framework of Deminoff et al. by improving three key features. These features were (a) the fixed window size of the χ^{2}analysis, and (b) the effect of samplesize on the codon mutation probability, and (c) accounting for multiple nucleotide mutations per codon. First, window size for the χ^{2}analysis was addressed by using windows of different sizes and comparing estimated falsediscovery rates. The 'best' window was selected via tradeoff between the estimated sensitivity and specificity for classifying hypo or hypermutable residues. Second, nucleotide substitution frequencies were computed using the continuity correction of Yates [8] resulting in more consistent codon mutation frequencies. Third, codon mutation frequencies were computed analytically from nucleotide substitution frequencies without the assumption that only one substitution per codon was likely.
Further Improvements
 1.
relaxing the assumption that sample sizes are large enough such that asymptotic normality necessarily applies,
 2.
relaxing the assumption that selectionsensitive regions of a protein are contiguous,
 3.
clarifying the relationship between Fisherstyle pvalues and NeymanPearson TypeI and TypeII error probabilities with regard to testing hypotheses of functional selection,
 4.
relaxing the the assumption that the PCR amplification protocol does not meaningfully affect mutation probabilities, and
 5.
addressing the ability to of unigenic evolution to detect hypermutability.
We expand upon each of these points, in turn, below.
First, both Deminoff et al. and Behrsin et al. equate observed event relative counts with the respective event probabilities. This equivalence is effectively true when either sample sizes are asymptotically large or probabilities are nonextreme (not too close to either zero or one). However, experimentallyfeasible sample sizes are typically limited to the order of 50100 replicates (clones) and even the most mutagenic of PCR conditions result in low probabilities (≈ 0.001 to 0.01) of point mutation. Therefore it is unlikely that observed counts have a simple relationship with the event frequency, even accounting for the continuity correction of Yates [8]. The difficulty of estimating probability parameters from eventcounts when the likelihood of the event is very small is a wellknown problem from the inference of binomial and multinomial frequency parameters [9]. The most obvious consequence of assuming "counts ≈ probabilities" under these constraints is that the normal approximation, on which the χ^{2} statistic is critically dependent, may be invalid enough to yield misleading results. At the very least, the sampling variance of the χ^{2} statistic itself is necessarily quite large. The anticipated parameter ranges above, for example, yield a coefficient of variation for χ^{2} to be on the order of 100300%. An additional problem with equating counts and probabilities is that, in doing so, the analysis of Behrsin et al. implicitly conditions on the total number of mutations as given. We anticipate that the actual number of mutations would be roughly Poisson distributed, implying that the variance of the mutation count will be on the order of the expected count itself, further degrading the validity (or at least power) of the χ^{2} statistic to correctly determine the effect of selection.
Second, the assumption that selectionsensitive regions of a protein are contiguous is incorrect because proteins are threedimensional amino acid chains where secondary, tertiary, and quaternary folds bring distantinsequence residues to threedimensional proximity. Perhaps the most widely known example of functionallyimportant yet noncontiguous sites is the catalytic triad of residues in serine proteases such as trypsin [10, 11]. Trypsin proteins have three absolutely required residues that form a chargerelay system needed for activity; with respect to the human sequence these are H57, D102, and S195. These three residues are widely separated in sequence, but are adjacent in the 3 D fold of the protein. Furthermore, S195 is surrounded by a conserved set of residues, but H57 and D102 are not. The residues surrounding the H58 and D102 equivalents in other organisms are very different and are drawn from all classes of amino acids. Thus using a 'windowing' procedure to identify hypomutable regions would fail in the H57 and D102 instances since many different amino acids are tolerated adjacent to an absolutely conserved position in a protein family. If selectionsensitive residues cannot be presumed contiguous, it is unclear how sites can be partitioned into 'selected' and 'nonselected' groups while correcting for implicit and combinatoriallyincreasing number of multiple comparisons.
Third, the use of a Fisherstyle hypothesis test to compute a pvalue directly is not equivalent to the estimation of the NeymanPearson TypeI and TypeII error probabilities α and β, respectively [12]. Although often confused in the literature, p and α are not interchangeable. Specifically, Fisher's pvalue expresses the probability of the observed and more extreme data given the null hypothesis, and can be considered a random variable whose distribution is uniform on (0, 1) under that null. In contrast, the NeymanPearson α and β values directly compare the probability of the observed data given either the null or alternate hypotheses as correct. The value of α must be fixed before the observations are made and is subject to the minimization of β. Confusion between α and p results in the systematic exaggeration of evidence against the null hypothesis [13–17].
Fourth, current unigenic evolution analyses treat the mutagenic PCR protocol as a 'black box' process that takes a wildtype sequence as input and produces a pool of mutagenized clones as output. However, the physical process of mutagenic PCR via nonproofreading Taq polymerase imposes significant constraints on the amplification process in turn constrains properties of the final clone population. We show herein that the ultimate probabilities of the different classes of codon mutation are complicated, nonlinear functions of the Taq misincorporation probabilities. Given the complicated relationship between Taq misincorporation frequencies and codon mutation frequencies, it is important to determine if the PCR protocol meaningfully constrains observed codon mutation frequencies.
Last, the claim that unigenic evolution can detect hypermutability follows the fact that critical values of the χ^{2} statistic can be observed due to either too few or too many observed mutation events. However, mutagenic PCR of a nucleotide sequence is an anisotropic 'random drift' through sequencespace. Selection can only act on the drifting sequence by 'slowing' its progression along trajectories that realize lessfunctional mutants, since these lessfunctional sequences are preferentially discarded. Neither mutagenic PCR nor functional selection are capable of 'accelerating' the sequence drift, thus implying that a 'large' number of observed mutations at a given site, while improbable, are not unexpected. We therefore claim that unigenic evolution is fundamentally incapable of detecting hypermutability, positing that unexpectedlylarge sitemutation counts stem from either ordinary sampling variance, or nonmodelled systematic, procedural, or other experimental errors.
The statistical framework described herein provides estimates of both the evidence of selection, and the statistical power available to detect that selection, independently for each codon site. It provides explicit comparisons with internal positive and negative controls, thus reducing the impact of systematic or experimental errors. It identifies individual sites rather than broad regions for followup analysis, and can guide wildtype sequence optimization with regard to unigenic mutability. With its emphasis on analytical rigour and its availability as an easytouse software addin package, this work helps to make the analysis of saturatingmutagenesis experiments both statistically sound and broadly accessible.
Results
To implement the aforementioned improvements we have developed a new method for analyzing data from unigenic evolution experiments. The analytical framework we have developed is uniquely powerful because it has no assumption of normality or other asymptotic approximations; has no requirement that critical residues be contiguous; provides detailed estimates of TypeI and TypeII errors; explicitly models the relationship between polymerase misincorporation probabilities and PCR mutation probabilities; and easily adapts to different codon compositions, nucleotide biases, and genetic codes. Our method, including subroutines for data visualization and summarization, was implemented as a package called unigenic as part of the R statistical software system [18] and is available under an opensource license.
Overall Procedure
Like previous methods, our procedure begins by estimating the 'background' nucleotide mutation frequencies given by the mutagenic PCR on a control population of clones that are not subject to functional selection. This population, termed the 'unselected' population, serves as a control group that describes expected mutation frequencies under the null hypotheses of 'no functional selection'. The nucleotide mutation frequencies of the unselected clones are used to compute synonymous and nonsynonymous mutation probabilities for the codons of the wildtype sequence under the null hypothesis. Finally, the observed number of synonymous and nonsynonymous mutations occurring at each codon of the functionallyselected clones are tested to see whether they are more concordant with the null hypothesis or a generalized alternative.
The overall procedure of using nucleotide mutation frequencies to estimate codon mutation frequencies was first suggested by Deminoff et al. who argued that adequate statistical power could not be realized if only codontriplet mutations were analyzed. During the development of our method, we tested the hypothesis that synonymous and nonsynonymous mutation counts alone have sufficient power to resolve functional versus nonfunctional proteins. Our test used a sitebysite test of multinomial homogeneity as described by Wolpert [19]. A test of multinomial homogeneity in our context is a test that, given the count of synonymous and nonsynonymous mutations at a particular site in both the selected and unselected populations, asks if the the counts are compatible with the hypothesis that the mutation frequencies are equal. This test is unique in that it does not require inferring or comparing frequencies directly. Instead, all possible frequencies that are compatible with the data are considered simultaneously. We found, for the experimental system described below, that codon mutation frequencies alone are insufficient to discriminate between populations. Results of the homogeneity tests are shown in Additional File 1. This lack of ability to discriminate populations underlies the necessity of formulating a null hypothesis via nucleotide mutation frequencies, and confirms the supposition of Deminoff et al..
Experimental System
Complete details of the experimental system and data analyzed in conjunction with the development of our method are presented in Kleinstiver et al. [20]. Briey, our experimental system analyzed the 266 amino acid GIYYIG homing endonuclease IBmol[21–23], a sitespecific DNA endonuclease consisting of an Nterminal catalytic domain connected by a linker region to a Cterminal DNAbinding domain. The Nterminal domain of ≈ 90 amino acids contains the classdefining GIYYIG motif that is highly conserved in almost all the members of this endonuclease family. IBmol binds to a ≈ 38 bp recognition sequence (the homing site) and introduces a doublestranded DNA break leaving a singlestranded 2nucleotide 3'overhang. We took advantage of the sitespecific endonuclease activity of IBmol in the genetic selection to isolate functional variants after 30 cycles of mutagenic PCR. The genetic selection utilizes a chloramphenicol resistance plasmid (pExp) to express wildtype IBmol or mutant variants, and a second ampicillin resistant compatible plasmid (pTox) that contains the IBmol homing site. pTox also encodes a DNA gyrase toxin under the control of an inducible arabinose promoter. Cells that contain both plasmids only survive plating under selective conditions if IBmol is functional and can cleave pTox. Cleavage of pTox by IBmol generates a linear plasmid that is rapidly degraded, thus removing the gyrase toxin and promoting cell survival. If IBmol is nonfunctional and pTox is not linearized, cells will not survive due to the activity of the gyrase toxin. Under nonselective conditions, all cells will survive regardless of whether IBmol is functional because there is no requirement to cleave the toxic plasmid. Using this genetic selection, we introduced random pointmutations in the IBmol coding region by mutagenic PCR, and isolated and sequenced 87 functional 'selected' clones after plating on selective media. We also isolated and sequenced 87 clones isolated on nonselective media resulting in a pool of 'unselected' clones in order to establish baseline mutagenesis frequencies.
PCR Misincorporation Frequencies
Sample Misincorporation Counts Under H_{0}
Unselected Clone Counts  Selected Clone Counts  

  A  C  G  T  A  C  G  T 
A  23 656  18  27  182  24 024  11  12  80 
C  65  18 045  1  184  24  18 058  1  72 
G  282  4  11 176  29  154  0  11 203  9 
T  270  29  19  15 439  71  27  7  15 673 
Total  24 273  18 096  11 223  15 834  24 273  18 096  11 223  15 834 
To remedy these difficulties, the columns of P were presumed to be multinomial probabilities of a 'blackbox' mutagenic process. Given an 'input' wildtype nucleotide j, column j of P gives the multinomial probabilities of the resultant 'output' clonal nucleotide. Under the hypothesis that the output nucleotide depends only on the input nucleotide, each of the four columns of P describe four, independent multinomial distributions.
where α is a vector of hyperparameters with each component set to 1/2. The justification and derivation of (1) are detailed under the 'Methods' subsection 'Multinomial Estimation'. Equation (1) defines P as a standard linear Markov operator. Let the fourdimensional vector x_{ j } denote the frequencies of the wildtype nucleotides A, C, G, or T at site j. In general the wildtype sequence will not display polymorphism, implying that x_{ j } is equal to one of [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], or [0, 0, 0, 1]. The action of P on x_{ j } , given by standard matrixvector multiplication, results in y_{ j } = P_{ xj } representing the frequency of nucleotides expected in site j within the unselectedclone population.
Modelling the Polymerase
The main difficulty with accepting the hypothesis that the columns of P describe independent multinomial processes is that this hypothesis is not concordant with inspection of observed data. Deminoff et al. specifically noted correlated differences in mutation events that were attributed to differences between transition and transversion processes. Behrsin et al. observed ostensibly the same phenomenon, noting that complementary misincorporation events, such as {C ← A} and {G ← T}, always had similar counts [[7], Table 1]. Again, this similarity was attributed to differences between the mutagenic mechanisms leading to transition and transversion.
Estimates for the Expected Nucleotide Mutation Frequency.
Relative Count  Natural Parameter  Maximum a posteriori  

  A  C  G  T  A  C  G  T  A  C  G  T 
A  97.46  0.10  0.24  1.15  97.46  0.10  0.24  1.15  97.47  0.13  0.20  1.14 
C  0.27  99.72  0.01  1.16  0.27  99.72  0.01  1.16  0.24  99.66  0.02  1.17 
G  1.16  0.02  99.58  0.18  1.16  0.02  99.58  0.18  1.16  0.02  99.65  0.23 
T  1.11  0.16  0.17  97.51  1.11  0.16  0.17  97.51  1.12  0.19  0.13  97.46 
We denote (2) as a 'secondary' null hypothesis since T is estimated using only the unselected clone counts C, thereby still representing the hypothesis of 'no functional selection'. A detailed description of the model underlying ${{H}^{\prime}}_{0}$ and the computational challenges associated with computing the posterior distribution of P as a function of T are discussed under the 'Methods' subsection 'Modelling PCR via Polymerase'. Again, P under ${{H}^{\prime}}_{0}$ is defined such that it too is a linear Markov operator. Frequencies P inferred under ${{H}^{\prime}}_{0}$ are subtly different than those derived under H_{0} or via relative counts and are shown in Table 2. Although appearing small, the significance of these differences is difficult to discern by inspection for two reasons. First, codon mutation frequencies under the null hypothesis are computed as the product of three nucleotide mutation frequencies. The effect of small differences among the p_{ ij } parameters is therefore geometrically amplified with respect to codons. Second, the PCR process is a nonlinear, exponential amplification of misincorporation rates T, implying that small changes in T can result in large changes in both nucleotide and codon mutation frequencies.
Codon Mutation Frequencies
Unigenic evolution presumes that selection operates on protein function and not during transcription or translation. Differences in protein function are caused by nonsynonymous amino acid substitution. Therefore the frequencies of nonsynonymous mutation need to be computed from the given frequencies of independent nucleotide mutation. Although Behrsin et al. provide a number of different sample formulas for deriving codon mutation frequencies given nucleotide frequencies, we describe a generalized method of deriving these frequencies for two reasons. First, our framework immediately accommodates different genetic codes, such as mitochondrial or chloroplast. Second, we require ready generalization, beyond the two classes of synonymous or nonsynonymous mutation currently used, for future work.
where M is a 64 × 64 linear Markov operator that operates on the space of codon frequencies and '⊗' denotes the standard Kronecker matrixproduct. An explicit depiction of the Kronecker product and how it relates nucleotide mutation to amino acid mutation is shown in Additional File 2. As with nucleotides, given wildtype codon frequencies w_{ j } at codonsite j, the quantity z_{ j } = Mw_{ j } represents the frequency of sitej clone codons after mutagenic PCR under the null hypothesis.
and the probability of nonsynonymous mutation as p_{ns} = 1  p_{sn}. Such matrix partitioning is simple to code in languages supporting namedindex arrayslicing, such as R [18]. It is also simple to adapt the required bookkeeping to any desired genetic code. For computational efficiency and ease of notation, we denote p_{sn,j}and p_{ns,j}to be the probabilities of synonymous and nonsynonymous mutation at codon j, using the subscripts to clearly differentiate them from the entries of nucleotide mutation matrix P, above.
Unit of Analysis
Rather than using only the two classes of synonymous and nonsynonymous mutation, it is theoretically possible to compare the observed counts for all 20 amino acids at each codon site. Such a comparison reduces M to a 20 × 64 matrix mapping codons to amino acids. Preliminary analyses of the ≈ 100 clones sequenced in each of our selected and unselected populations, however, showed insufficient power to make meaningful inferences at the amino acid level.
Amino acids or synonymous/nonsynonymous mutation are not the only possible units of analysis, however. For example, the assumption that selection operates at the protein level implies that the functional assay is independent of transcription or translation efficiency. Future work could easily test this hypothesis by reducing M to the three classes of 'identical', 'synonymous but not identical', and 'nonsynonymous'. Possibly, even different classes of codons could be defined. The main tradeoff with using more classes for analysis is the requirement for larger sample sizes. However, the basic hypothesistesting framework described below could be used with only trivial modification.
Alternate Hypothesis
The analysis of a unigenic evolution experiment compares observed sitespecific mutation counts between two populations. The first population, the control group, is not subject to selection; these are the 'unselected' clones and they provide the values of p_{sn,j}and p_{ns,j}under the null hypothesis.
The second population is subject to functional selection and results in a set of 'selected' clones. The frequency of mutation at each codon of the selected population provides an alternate hypothesis that can be compared with expectations under the null.
For codonsite j we denote ${n}_{\text{sn},j}^{\text{us}}$ and ${n}_{\text{ns},j}^{\text{us}}$ to be the respective number of observed synonymous and nonsynonymous mutations in the unselected population, and ${n}_{\text{sn},j}^{\text{mx}}$ and ${n}_{\text{ns},j}^{\text{mx}}$ to be the respective number of observed synonymous and nonsynonymous mutations in the selected population. The total number of clones sequenced from each pool is therefore ${n}^{\text{us}}={n}_{\text{sn},j}^{\text{us}}+{n}_{\text{ns},j}^{\text{us}}$ and ${n}^{\text{mx}}={n}_{\text{sn},j}^{\text{mx}}+{n}_{\text{ns},j}^{\text{mx}}$, both of which are constant for all j.
Likelihood Model
with codon mutations presumed to be mutually independent. The conditional hypothesis H dictates the origin of the probability parameters ${p}_{\text{sn},j}^{\text{us}}$, ${p}_{\text{ns},j}^{\text{us}}$, ${p}_{\text{sn},j}^{\text{mx}}$, and ${p}_{\text{ns},j}^{\text{mx}}$. Under the null H = H_{0} these parameters are computed via nucleotide counts C using (1). Under the secondary null $H={{H}^{\prime}}_{0}$ they are computed via polymerase misincorporation frequencies T using (2). Under the alternate hypothesis H = H_{ A } , they are inferred using only the sitespecific counts ${n}_{\text{sn},j}^{\text{us}}$, ${n}_{\text{ns},j}^{\text{us}}$, ${n}_{\text{sn},j}^{\text{mx}}$ and ${n}_{\text{ns},j}^{\text{mx}}$, respectively, theoretically accommodating any type of selection mechanism.
as discussed previously. Again, α is a vector of hyperparameters with each component set to 1/2 and the justification and derivation of (7) are detailed under the 'Methods' subsection 'Multinomial Estimation'.
Note that since parameters inferred under H_{ A } encompass arbitrary types of selection, they necessarily overlap with parameters under the null. However, under H_{ A } parameters ${p}_{\text{sn},j}^{k}$ and ${p}_{\text{ns},j}^{k}$ are relatively diffuse since they are estimated from ≈100 sequenced clones. Under H_{0} or ${{H}^{\prime}}_{0}$ these same parameters are estimated by ≈ 20 000 nucleotide misincorporations and are thus more precisely determined.
Lastly, we test both selected and unselected populations for consistency between hypotheses since doing so treats the unselected population as a negative control. This helps identify possible systematic or experimental errors during cloning or sequencing, thereby reducing false signals of selection.
Evidence of Selection
the expected logoddsratio of H_{ A } versus H_{0} given the number of observed mutations for codon j in clonepopulation k. We call ${R}_{j}^{k}$ the Evidence of Selection (EoS) for codon j in clonepopulation k.
as expected. Such 'addition' holds over very general conditions, and underlies not only the validity of (9) for computing ${R}_{j}^{k}$, but the foundations of information theory and the KullbackLeibler divergence [30].
The interpretation of logoddsratios in the literature is traditionally taken as the 'strength of evidence' favoring one hypothesis over another [31, 32]. However, since H_{ A } describes a general case that embeds H_{0} as an alternative, this traditional interpretation is inappropriate for ${R}_{j}^{k}$. Instead, ${R}_{j}^{k}$ is interpreted in the NeymanPearson sense, where it describes the relative probability of correctly classifying a specific observation to be due to either H_{ A } or H_{0}. Interpretation of ${R}_{j}^{k}$ requires considering three cases:

${R}_{j}^{k}>0$: H_{ A }is more probable than H_{0}, with ${R}_{j}^{k}$ being the expected logarithm of the truepositive to falsenegative ratio for determining whether or not selection operated on the site.

${R}_{j}^{k}\simeq 0$: Both H_{ A }and H_{0} are supported equally by the data, implying that the data are unable to differentiate between whether or not functional selection has occurred.

${R}_{j}^{k}<0$: Although technically implying that H_{0} is more probable than H_{ A }, the embedding of H_{0} within H_{ A }implies that negative values of ${R}_{j}^{k}$ will likely have small magnitudes and can be interpreted as if they are zero.
The interpretation of ${R}_{j}^{k}<0$ follows from Gibbs' inequality which guarantees the probabilityweighted average of ${R}_{j}^{k}\left({n}_{\text{sn},j}^{k},{n}_{\text{ns},j}^{k}\right)$ over all possible mutation counts to be nonnegative. Therefore observations for which ${R}_{j}^{k}<0$ are likely due to sampling variance and are best interpreted, unless large in magnitude, as if they were zero. In practice, as shown both in Figure 1 and the data of Kleinstiver et al. [20], large negative values for ${R}_{j}^{k}$ have not been observed.
A concrete example of interpreting ${R}_{j}^{k}$ is given by examining the GIYYIG motif of IBmol, shown greyhighlighted in Figure 1. Precise values of ${R}_{j}^{k}$ along with the expected and observed number of nonsynonymous mutations are given in Additional File 3. Although all six motifresidues are well conserved within this homing endonuclease family, Figure 1 shows that unigenic selection is only detectable for the tyrosine residues which show posterior oddsratios of 2^{5.8} ≈ 59.7toone in favour of selection. The fourtoone odds ratio or less shown by the other residues is by general convention considered to be negligible.
This example highlights how the lack of evidence of selection does not imply a lack of functional importance. Lack of evidence is precisely that: there is not enough data to classify a given site as either 'important' or 'unimportant'. Often, as can be seen with the GIYYIG motif, many codons are intrinsically resistant to nonsynonymous changes under mutagenic PCR. The glycine residues, for example, can be seen to have had five mutations at site 4 in the selected population but less than one of them is expected to be nonsynonymous. Much of the reason that selection is detectable at the tyrosine residues is the large number of expected nonsynonymous mutations, a number principally dependent on the tyrosine codon's nucleotide composition. The number of nonsynonymous mutations expected for different clone population sizes is shown in Additional File 4 and is described more fully later.
We note an advantage of our method over previous work is shown by examining ${R}_{j}^{k}$ values for the unselected population. There, the negligible values of ${R}_{j}^{k}$ act as a negative control indicating 'no evidence of selection' when no selection is actually present. Large values of ${R}_{j}^{k}$ in the unselected clone population would indicate the presence of systematic bias or experimental errors.
Power and Reliability
which can itself be integrated over the multinomial parameter posteriors, as before, to yield the overall expected divergence ${D}_{{H}_{A}}$. As interpreted by Kullback [26, 30], ${D}_{{H}_{A}}$ measures how distinguishable two random variables are in terms of the expected truepositive versus falsenegative rate. Conditioning on H_{0}, the complementary ${D}_{{H}_{0}}$ provides the expected truenegative versus falsepositive rate. Together, ${D}_{{H}_{A}}$ and ${D}_{{H}_{0}}$ specify the confusion matrix between hypotheses, thus providing a detailed power estimate of hypothesis distinguishability.
Effect of Sample Size
To help elucidate the practical effects of sample size on EoS values with respect to both the unselected and selected clone population, subsampled clone populations were analyzed with results displayed in Additional File 5. In brief, even as few as 10 unselected clones (yielding 100300 nucleotide misincorporation counts) were capable of giving reasonable estimates of parameter matrix T and sensitivity. Reasonable specificity however, as judged by the ability to correctly detect selection of the methionine start signal, was not achieved with fewer than all 87 of the unselected clones. With respect to the misincorporation frequencies estimated from the counts in Table 1, percentiles of the likely number of nonsynonymous mutations observed for given clone population sizes under the null hypothesis of 'no selection' are shown in Additional File 4. This table shows considerable nonnormality that is particularly pronounced for mutationresistant codons, highlighting the requirement (and opportunity) to 'tune' the effective selection pressure on individual residues by adjusting codon composition. Again, since normality is a requirement for the validity of χ^{2}based statistics, the nonnormality displayed by many residues even under very large sample sizes (> 500 clones) calls the validity of such analyses into question.
Global Insights from Local
The derivation of ${R}_{j}^{k}$ (EoS), although a primary result of this work, is alone insufficient to analyze unigenic evolution data. The following subsections provide additional computational details required for a complete analysis.
Selecting Selected Sites
One of the major shortcomings of previous work was the difficulty of discerning which groups of sites were under functional selection given statistical procedures that were designed under the assumption of siteindependence. Given n codons with 'sufficientlyhigh' EoS score, there are n! different ways to partition those n codons into functionallyimportant and functionallyunimportant categories and thereby estimate the falsediscovery rate. This huge number of partitions implies that traditional techniques for multiplecomparison correction reduce statistical power to impractically low levels.
The 'windowing' analyses of Deminoff et al. and Behrsin et al. were used to constrain the number of required multipletest corrections to a reasonable level.
can be interpreted as the log of relative probability that all sites in J were observed due to the action of selection as hypothesized by H_{ A } . Unlike traditional multipletest corrections such as Bonferroni's, ${R}_{J}^{k}$ intrinsically 'selfcorrects' for the number of sites considered.
The benefit of using ${R}_{J}^{k}$ as evidence of selection in unigenic evolution is clearly demonstrated by Figure eight of Kleinstiver et al. [20] via the functional analysis of IBmol. There, assays of N12 D, S20Q, H31A, I67N, and I71N mutants clearly implicate these residues, as identified by their EoS scores, as functionally important. This importance is seen experimentally as the generation of a phenotype distinct from wildtype for each respective mutant.
Note that ${R}_{J}^{k}$ systematically underestimates the posterior odds that a set of sites are subject to selection since it predicates on all sites of J being selected. If only one or two of these sites were falsepositives, ${R}_{J}^{k}$ behaves as if all sites J were falsepositives, artificially reducing the actual truepositive rate. It is straightforward, though tedious, to compute precise overall truepositive rate given ${R}_{j}^{k}$ for j ∈ J. However, in practice the individual values ${R}_{j}^{k}$ generally display a sharp boundary between 'large' and 'small' values, making the choice of putative functional sites straightforward via simple inspection (see Figure 1).
We further note that neither ${R}_{j}^{k}$ or ${R}_{J}^{k}$ are directly comparable to either the Hscores or χ^{2}values of Deminoff et al. or Behrsin et al. since the former condition explicitly on observed data while the latter condition on unobserved hypothetical data. Restated, the former is congruent to a TypeI error probability (α), while the latter is a Fishertype significance pvalue. Although correlated, these two values have no simple relationship.
Protein Mutation Count
For the experimental conditions used herein, nonsynonymous mutation probabilities varied between ≈ 010%, depending on the given codon. These mutation probabilities in turn determine κ, the total number of nonsynonymous mutations expected in the overall protein. The overall mutation count is an important experimental diagnostic since too few mutations lead to inefficient mutagenesis and high sequencing costs, while too many mutations result in nearlycertain functional knockout. The simplest method of computing the distribution of κ uses standard Monte Carlo techniques to perform in silico mutagenesis of the wildtype sequence given nucleotide mutation parameters under H_{0}.
Note that like previous analyses, both our null and alternate hypotheses assume siteindependence among mutations. Secondsite suppression or other modes of siteinteraction are not taken into account. Under rare conditions for IBmol, it is possible that up to 20 mutations could be expected for this 266site protein, making it likely that interaction effects are nonnegligible factors of selection.
Comparison with Previous Work
Nonetheless we note that even using the highlysensitive ${\chi}_{1}^{2}$ values, only 10 of the 266 sites exceeded the critical pvalue of p < 0.05, whereas using EoS, 41 of 266 sites exceeded the critical logodds ratio of 20:1 (see Figure 4). Thus previous methods identify less than 25% of sites identified with EoS. Most importantly, the additional sensitivity of the EoS value appears without detriment to specificity, as shown by the estimated falsepositive to truenegative ratios in Figure 1, ratios that are impractical to estimate via previous methods. Thus our method appears both significantly more sensitive and selective than previous work for the unigenic analysis of IBmol.
Another advantage of EoS values over χ^{2}based statistics is illustrated by the nontrivial comparison of the p < 0.05 and 20:1 oddsratio critical values shown in Figure 4. Although detailed interpretations and the differences thereof have already been discussed, in the particular case of the IBmol data shown in Figure 4 it is important to realize that Bonferroni correction of the ${\chi}_{1}^{2}$ values would render none of the sites significant by conventional measures. In contrast, the logodds ratio shows comparatively little powerloss due to multiplecomparison correction.
From a theoretical viewpoint, another advantage of the EoS value is that it deals with a very direct statistical question: how likely are the observed data given the model? Unrealistic EoS values are necessary and sufficient to diagnose unrealistic assumptions in the mathematical description of unigenic evolution. In contrast, summary statistics such as χ^{2} values similarly condition on model accuracy, but necessarily further condition on the statistic being a good indicator of the phenomenon under investigation  in this case, selection. Thus unrealistic values of χ^{2} can be similarly be attributed to model misspecification, but could also be due to the inadequacy of the chosen test statistic.
Additional Results
Although the main result of this work is the EoS score ${R}_{j}^{k}$ and its associated reliability estimate, two additional results were discovered during the analysis of the IBmol system. The first result concerned differences between H_{0} and ${{H}^{\prime}}_{0}$, the second concerned the treatment of stop codons.
Polymerase Versus PCR Modelling
A surprising result was that ${R}_{j}^{k}$ and power estimates computed under H_{0}, modelling only the overall PCR process, and ${{H}^{\prime}}_{0}$, modelling the misincorporation of nucleotides by Taq polymerase, were effectively indistinguishable. Observed logodds differences were all on the order of the Monte Carlo samplingvariance cutoff used to estimate each and were therefore effectively zero. The inability of the data to discriminate between H_{0} and ${{H}^{\prime}}_{0}$ implies that there is no effective difference between these models. In effect, there is no meaningful difference in the ${R}_{j}^{k}$ scores computed by either (a) treating mutagenic PCR as a 'blackbox' process or (b) modelling mutagenic PCR to be be consistent with the action of errorprone polymerase.
This finding is surprising because modelling the effect of the polymerase over multiple PCR cycles appears to be required in order to produce mutation frequencies, such as those in Table 2, where complementarybase frequencies are always almost equal. The similarity of respective ${R}_{j}^{k}$ values imply that differences between estimates of P under each hypothesis are negligible compared to the magnitude of uncertainty inherent in estimating P given C.
A consequence of this finding is that it implies that the polymerase may not be the major mechanism responsible for nucleotide mutation. For example, cytosine deamination via thermal decomposition during PCR cycling [33–35] provides a credible, alternative mutation mechanism. In this case the nonproofreading property of Taq would be a more important factor than its misincorporation characteristics.
Optimizing the experimental conditions under which mutagenesis occurs can have important consequences in the efficiency and expected outcome of unigenic evolution. For example, note the ≈ 100fold difference between {C ← G} and {G ← A} mutation frequencies shown in Table 2. Small changes in mutation frequencies due to different mutagenic protocols could effect large changes in the distribution of mutant codons, suggesting that future work should investigate mechanisms to elucidate the factors affecting the precise characteristics of mutagenic PCR.
Stop Codon Assumption
One of the more significant assumptions of this work is the presumption that the effect of stop codons can be ignored. Unlike other mutations, the appearance of a premature stop codon affects every subsequent codon, negates our assumption of siteindependence, and likely results in a complete loss of function. For the system examined herein and in Kleinstiver et al. [20], we found that the frequency of stop codon production was sufficiently small as to not significantly affect results or interpretation. Given this putatively small effect, a more exact treatment of stop codons and their functional effects are left for future work since a correct and rigorous treatment would likely add considerable algebraic and computational complexity.
Conclusions
From an experimental point of view, the evidence of selection at a given site represents only part of the required information; in general, the reliability of that evidence must also be assessed. Quantifying reliability is important since small sample sizes, mutation events that are too rare to be reliably estimated, and the effects of multiple comparisons can complicate the interpretation of unigenic evolution experiments. Our method, which computes both evidence and reliability, represents a significant advance over previous work since it simultaneously assesses both the evidence of selection and the reliability of inference.
Experimental validation of our methods was provided via analysis of the poorlycharacterized homing endonuclease IBmol, where previouslydescribed methods from the literature were unable to elucidate functionallycritical residues. With the ability to guide the selection of precise experimental mutagenesis conditions, our method makes unigenic analysis a more broadly applicable technique with which to probe protein function.
Methods
Herein we provide technical and implementation details for our analytical framework, the most important of which are (a) the estimation of multinomial frequencies from counts, (b) our model of mutagenic PCR via the action of an errorprone polymerase, and (c) selecting the prior and sampling the posterior of the polymerase misincorporation frequencies.
Multinomial Estimation
The estimation of multinomial frequencies from counts is one of the oldest subjects in statistics. When asymptoticallymany observations are available, both Bayesian and frequentist methods infer nearly identical parameter values, where frequencies are simply proportional to counts. However, when observations are rare, prior beliefs will always, necessarily significantly affect inferred results. These effects, thoroughly described by Jaynes [36], can be understood though a simple example. Suppose that 10 000 clones were sequenced of which 5000 were found to have nonsynonymous mutations at a given site. Using the normal approximation to the binomial distribution both the mean frequency of nonsynonymous mutation and its standard error are easily computed with high accuracy. However, if only one nonsynonymous mutation had been observed, the actual frequency of mutation is not clear since mutation frequencies of 0.5, 1, or 2 mutations per 10000 clones, a range of 400%, are all realistic and compatible with the given data. Prior belief that the mutation rate should be 'somewhat high' will favor the higher rate, surprise at seeing any mutation would imply the lower rate is more believable.
The consensus in the statistical community is now that there is no 'best' notion of 'prior ignorance' that can be universally considered correct [37, 38]. Instead, research has focused on developing methods with precise and wellcharacterized assumptions in order to minimize, in some sense, the influence of prior assumptions on the inference. These wellcharactered assumptions are called 'objective' or 'reference' priors and are the type of prior we choose as a basis for inferring both multinomial nucleotide mutation frequencies and nonsynonymous codon mutation frequencies.
where Pr(np) is the likelihood of the counts given parameters and Pr(p) is our assumed prior distribution of the parameters before any data is observed. Detailed informationtheoretic studies by Berger and Bernardo [25] found that p ~ Dirichlet(α) with all components of vector α set to 1/2 was a prior that formally minimized the inuence of the prior on the posterior Pr(pn). This specific prior was found to be invariant to reparameterization and is identical to the one derived by Jeffreys [39].
From an experimental viewpoint, invariance to reparameterization is an critical requirement for inferring frequencies from counts since the property implies that the same inference would be made if, for example, relative mutation rates had been estimated rather than frequencies. Any other choice of prior would yield different posterior values of p even when given identical data.
for each frequencycomponent i, where ψ denotes the digamma function. Equation (13) is termed the 'natural' parameter mean for p, and when n_{ i } is sufficiently large it is approximately equal to n_{ i } /Σ _{ i }n_{ i } . This similarity is evident when comparing the 'Relative Count' and 'Natural Parameter' estimates for nucleotide mutation frequencies in Table 2 where all four multinomial parameter set estimates agree to within 1%. However, for codon mutation counts on the order of 02 observations per 100 clones, differences between estimates can be considerable.
Modelling PCR via Polymerase
Modelling the full mutagenic PCR process requires formally describing both the process of nucleotide misincorporation via polymerase and the action of multiple cycles of denaturation, synthesis, and reassociation that are the basis of PCR. Our model uses a Bayesian framework that explicitly accounts for the rarity of nonsynonymous mutations to infer parameter values of this model.
Polymerase Misincorporation
Mutagenic PCR is composed of of multiple cycles of lowfidelity Taqbased amplification. The model we adopt assumes that under mutagenic conditions [6]Taq polymerase

induces errors only by nucleotide misincorporation,

has negligible slippage, stutter, or other errors due to repeats,

and has siteindependent misincorporation probabilities.
The assumption that slippage and stutter are negligible is substantiated by previous studies of Taq errors [6, 34] and visual inspection of our data. The assumption that nucleotide misincorporation events are independent is somewhat stronger since it may be argued that spatial distortions induced by templateadduct mispairing affect subsequent DNA synthesis. However, inspection shows misincorporation events to be sufficiently rare that this effect, if it exists, is of sufficiently small magnitude as to be negligible.
where τ_{ ij } denotes the probability that adductnucleotide i is basepaired to template nucleotide j.
With this notation, the columns of T sum to one. Further, since misincorporation events are rare, the counterdiagonal components τ_{AT}, τ_{CG}, τ_{GC}, and τ_{TA}are assumed to be ≈ 1, while all other parameters are ≪ 1. We emphasize that a full specification of T requires sixteen parameters and four constraints, yielding twelve independent degrees of freedom. These constraints imply that both matrix T and an an implicit twelveparameter model describing the relative Taq misincorporation frequencies are equivalent.
The Mutagenic PCR Process
For practical computation, however, equation (16) should be used to avoid excessive underflow and truncation errors.
where, again, each column sums to one. Given a wildtype nucleotide sensestrand statevector w_{ s } and Taq error probabilities T, operator P transforms w_{ s } into m_{ s } = Pw_{ s } , where m_{ s } is the probability statevector for a mutant sensestrand after k cycles of mutagenic PCR.
and each frequency p_{ ij } is complicated, nonlinear function of the misincorporation frequencies T. Since the entries of T, not P, are the fundamental parameters of likelihood (18), the standard Dirichlet prior described above is inappropriate. Instead, a corresponding noninformative 'objective' prior is derived for it, below.
Polymerase Priors and Posteriors
Choosing a prior distribution for likelihood (18) is not trivial, especially since there is no universal notion of "complete prior ignorance" [9, 37, 40]. Choice of prior is therefore governed by specific criteria assumed by the investigator to be important. One nearly universallyaccepted criterion is that of reparameterizationinvariance, a criterion requiring that the inference should not depend on the units of either the parameters or observations. The importance of such invariance has been detailed by Jeffreys [39], Wallace and Freeman [41], Jermyn [42], and many others. In our context, such invariance ensures that parameterizing the likelihood by, for example, either "the expected number of observed misincorporations per PCR cycle" or its reciprocal "the expected number of PCR cycles before a misincorporation is observed" yield equivalent inferences. Since there is no meaningful physical difference between these two parameterizations it is essential that each yield equivalent results.
Under even highlymutagenic conditions, the error probabilities given by the columns of T are known a priori to be very close to the extremes of either zero or one. Taq polymerase is one of the beststudied polymerases in molecular biology and its error rate is known to be heavily influenced by the precise experimental conditions under which it is used [6, 43]. Therefore, since the precise relative scales of the different types of polymerase errors are not known, we derived a prior for T that is 'objective' in the sense of Berger et al. [38]. An 'objective' prior is one that formally minimizes, in an informationtheoretic sense, the influence of the prior on the posterior and is constructed to always be invariant to reparameterization.
An Objective Polymerase Prior
where 'vec' is the standard matrixvectorization operator. Each columnconstraint of T, namely that it sum to one, is equivalent to removing a particular onedimensional subspace from span $\left\{\tilde{T}\right\}$.
The quotientspace (20) which fully describes the 16parameter T is therefore isomorphic to the 12dimensional additive Lie group of ℝ^{12} which in itself is a Hilbert space [27].
Further justification for assuming that ℝ^{12} is the 'correct' structure with which to understand T comes from realizing that although the columns of T are parameters in our likelihood function, they are also discrete and finite probability densities in their own right. Each column of T represents the relative frequency of concrete events, namely Taq misincorporations. When normalized, they represent multinomial probabilities; when nonnormalized they represent relative Poisson frequencies.
Our model is only interested in relative relative rates as given by the multinomial model, and it is wellknown that the natural parameter space of the multinomial distribution, as a member of the exponential family, is the logarithm of the multinomial parameters.
where F(T) denotes the Fisher Information Matrix (FIM) of the given likelihood model, in this case given by (18). However, it is not entirely obvious which likelihood model should be utilized. The likelihoods of codon mutation, as given by (5) and (6), condition on the the codonmutation probabilities. However, our nullhypothesis is computed at the nucleotide level because the nullhypothesis states that the selectedclone mutation probabilities are consistent with the unselectedclone mutation probabilities.
Incidentally, we note that although it is widely reported in the literature that Jeffreys' prior fails for 'simple' distributions such as the univariate normal with unknown mean and variance. In fact, Jeffreys' prior results in compatible with frequentist methods if a parameterization with Abelian Lie group structure is enforced.
with the expectation being taken over all possible observations.
where c_{++} is the total number of observed wildtypetomutantclone nucleotide pairs and p_{ j } is the probability of that the wildtype nucleotide is of type j. Estimation of p_{ j } is equivalent to determining the ratio of (G+C)/(A+T) content for the wildtype sequence.
where c_{ +j } is the rowvector of columntotals of c_{ ij } and can be taken as a given value. Note that in the pedagogical or extremely rare case that (27) is not accurate, the (G+C)content can always be estimated via standard Bayesian methods at the expense of the simplified computation that we utilize. It is also worth noting that mutagenic PCR never incorporates enough nucleotide changes to appreciably change (G+C)content.
where each parameter p_{ ij } is a function of T, and Jeffreys' prior easily computed via the 12dimensional pseudodeterminant. All that remains is to compute the first and secondorder derivatives of p_{ ij } with respect to the 16 entries of T.
These derivatives could be computed analytically. However, since typical experiments use on the order of k ≈ 30 PCR cycles, analytic derivatives of P with respect to T result in unwieldily and numericallyunstable expressions. Instead, secondorder differentiation arithmetic [44, 45] is used via operator overloading in Fortran to compute all required derivatives of p_{ ij } with respect to T. Briey, differentiation arithmetic computes a function and its gradient and Hessian simultaneously by utilizing the algebra of differential operators. The simple structure of all relevant equations make them particularly amenable toward straightforward implementation.
Sampling from the Posterior
where $\mathcal{N}$ denotes the multivariate normal and ${\tilde{F}}^{1}\left(T\right)$ denotes the inverse of the FIM divided by the samplesize used to estimate $\widehat{T}$. Frequencies P computed from the MAP estimate of T are shown in Table 2 and appear very similar to those estimated via relative frequencies and natural parameters. Even though (29) describes an asymptotic relationship, it can be exploited to sample the exact posterior of T via the Metropolis algorithm [9]. Specifically, the mean expected value of T, computed via (13), can be combined with our estimate of F via (29) to yield a viable proposal function for Metropolis sampling. With Metropolis sampling, differences between the true and approximate posterior of T are eliminated due to the use of rejection sampling. The approximation only affects sampling efficiency, not accuracy: the poorer the approximation, the larger the proportion of rejected samples. In practice, we find that acceptance ratios even as low as 110% yield posterior samples more than rapidly enough for practical analysis of large proteins.
Contrasting Methods
The classical molecular evolution literature suggests two contrasting approaches to the analytical model and method we have described which deserve particular recognition. The first contrast is between models of sequence evolution and the second is between established statistical methods.
Codon Evolution Models
Codonspecific models for classical molecular evolution have been previously described by Goldman et al. [46] and Mayrose et al. [47]. These models seek to describe the combined effects of codon mutation and selection through a continuoustime Markov process by prescribing a constrained form of the infinitesimal Markov generator Q for the hypothesized 64by64 codon substitution matrix M. The principal constraints used enforce the ideas that (a) only singlenucleotide changes may have nonzero rates, and (b) the nonzero substitution rates have a biologicallyrelevant parameterization. In contrast, unigenic evolution proceeds through multiple rounds of mutation from a single ancestor before concluding with a single selective sweep. It therefore describes a process fundamentally different from the combined mutationplusselection process approximated by classical molecular evolution models. Toward this end, we describe unigenic evolution by a discretetime Markov transition model M. Note that since classical molecular evolution must account for hypothetical unobserved states throughout continuous time, it is necessarily parameterized by the instantaneous Markov generator Q. An immediate consequence of this differing parameterization is that two nucleotide misincorporations per codon per "instantaneous mutation step" necessarily have zero probability under the classical molecular evolution model but nonnegligible probability under the unigenic evolution model.
Furthermore, even approximate comparisons between the two models are difficult to describe because the combined mutationplusselection model of classical molecular evolution is necessarily constrained to satisfy the detailbalance relationship Q_{ ij }p_{ j } = Q_{ ji }p_{ i } , where p denotes the unique stationary vector of Q. In contrast, the stepwisemutation mechanism of Taq misincorporation that drives unigenic evolution implies that detail balance is not required nor even desirable.
Thus we conclude that the models and processes describing classical molecular evolution and unigenic evolution are sufficiently different to preclude straightforward comparison.
Maximum Likelihood Methods
Both likelihood methods [48] and Bayesian methods have a rich history of use in classical molecular evolution, and the general differences between these approaches have been discussed extensively in both the evolutionary and statistical literature [9]. To a first approximation, likelihood methods are well known to be equivalent to their Bayesian counterparts under the conditions of (a) an asymptotically uniform prior and (b) asymptotically large sample sizes. Under these conditions, the task of parameter estimation is essentially equivalent to the task of hypothesis testing, and different statistical frameworks yield essentially identical inferences.
In unigenic evolution experiments we know a priori that nucleotide misincorporation probabilities are very small. Since our hypothesis tests rely on indirectly estimating the probability of very rare events, it is therefore difficult to justify the assumption of "asymptotically large sample size" required by likelihood techniques. Furthermore, it is well known for these types of multinomialinference problems that uniform priors are generally inappropriate [25, 38]. Thus the two key requirements of maximum likelihood theory are not met by our model of unigenic evolution, motivating our decision to use Bayesian methods.
Furthermore, our methods have been developed specifically to test hypotheses about sitespecific selection and not estimate the strength of selection. Unlike likelihood methods, for Bayesian methods the tasks of "model selection" and "parameter estimation" are often not equivalent and can give seeminglyinconsistent inferences without careful analysis (see Kass et al. [31, 49] for details). We believe that the most direct comparison with previous work can be done in the context of NeymanPearson testing [12]. For traditional NeymanPearson testing, a critical value of observed nonsynonymous substitutions would be computed for each site based on estimated Taq misincorporation frequencies. If fewer nonsynonymous substitutions than that critical value are observed, that site is classified as being under selection. In some sense this procedure "doubledips" the data; on one hand using observations to infer misincorporation frequencies, and on the other using observations to actually classify the site.
In contrast, our KullbackLeiblerbased approach uses the data only to test hypotheses at each site, integrating over all possible hypothetical data sets that could have been observed given plausible misincorporation rates. More specific and extensive comparisons between the KullbackLeibler approach and the NeymanPearson approach have been extensively studied in the statistical literature, although they are not especially well known [[26], especially pp. 45]. More important than their differences, however, are their similarities. Both compute truepositive/falsenegative and truenegative/falsepositive classification ratios for the null and alternate hypotheses; they simply use different methodological approaches to do so.
Software Availability
Declarations
Acknowledgements
This work was supported by the Natural Sciences and Engineering Research Council of Canada, the Government of Ontario, and the University of Western Ontario. Data analysis was performed using software from the Rproject [18].
Authors’ Affiliations
References
 Deminoff SJ, Tornow J, Santangelo GM: Unigenic evolution: a novel genetic method localizes a putative leucine zipper that mediates dimerization of the Saccharomyces cerevisiae regulator Gcr1p. Genetics. 1995, 141 (4): 12631274.PubMedPubMed CentralGoogle Scholar
 Friedman KL, Cech TR: Essential functions of aminoterminal domains in the yeast telomerase catalytic subunit revealed by selection for viable mutants. Genes and Development. 1999, 13 (21): 28632874. 10.1101/gad.13.21.2863PubMedPubMed CentralView ArticleGoogle Scholar
 San Filippo J, Lambowitz AM: Characterization of the Cterminal DNAbinding/DNA endonuclease region of a group II intronencoded protein. Journal of Molecular Biology. 2002, 324 (5): 933951. 10.1016/S00222836(02)011476PubMedView ArticleGoogle Scholar
 Zeng X, Zhang D, Dorsey M, Ma J: Hypomutable regions of yeast TFIIB in a unigenic evolution test represent structural domains. Gene. 2003, 309: 4956. 10.1016/S03781119(03)00492XPubMedView ArticleGoogle Scholar
 Behrsin CD, Bailey ML, Bateman KS, Hamilton KS, Wahl LM, Brandl CJ, Shilton BH, Litchfield DW: Functionally important residues in the peptidylprolyl isomerase Pin1 revealed by unigenic evolution. Journal of Molecular Biology. 2007, 365 (4): 11431162. 10.1016/j.jmb.2006.10.078PubMedView ArticleGoogle Scholar
 Cadwell RC, Joyce GF: Mutagenic PCR. Genome Research (PCR Methods and Applications). 1994, 3 (6): S136S140. 10.1101/gr.3.6.S136View ArticleGoogle Scholar
 Behrsin CD, Brandl CJ, Litchfield DW, Shilton BH, Wahl LM: Development of an unbiased statistical method for the analysis of unigenic evolution. BMC Bioinformatics. 2006, 7: 150 10.1186/147121057150PubMedPubMed CentralView ArticleGoogle Scholar
 Yates F: Contingency Tables Involving Small Numbers and the http://www.jstor.org/stable/2983604 10.2307/2983604View ArticleGoogle Scholar
 Robert CP: The Bayesian choice: from decisiontheoretic foundations to computational implementation. 2001, Springer texts in statistics, New York: Springer, 2,Google Scholar
 Rawlings ND, Barrett AJ: Families of serine peptidases. Methods in Enzymology. 1994, 244: 1961. full_textPubMedView ArticleGoogle Scholar
 Polgár L: The catalytic triad of serine peptidases. Cellular and Molecular Life Sciences. 2005, 62 (1920): 21612172. 10.1007/s000180055160xPubMedView ArticleGoogle Scholar
 Neyman J, Pearson E: On the Problem of the Most Efficient Tests of Statistical Hypotheses. Philosophical Transactions of the Royal Society of London, Series A. 1933, 231: 289337. 10.1098/rsta.1933.0009View ArticleGoogle Scholar
 Hubbard R, Bayarri MJ: Confusion Over Measures of Evidence (p's) Versus Errors (α's) in Classical Statistical Testing. The American Statistician. 2003, 57 (3): 171178. 10.1198/0003130031856View ArticleGoogle Scholar
 Berger J: Could Fisher, Jeffreys and Neyman Have Agreed on Testing?. Statistical Science. 2003, 18: 112. 10.1214/ss/1056397485. http://www.jstor.org/stable/3182859 10.1214/ss/1056397485View ArticleGoogle Scholar
 Hubbard R, Bayarri MJ: P Values are not Error Probabilities. Tech Rep TR1403. 2003, University of Valencia, Department of Statistics and Operations Research, http://www.uv.es/sestio/TechRep/tr1403.pdfGoogle Scholar
 Christensen R: Testing Fisher, Neyman, Pearson, and Bayes. The American Statistician. 2005, 59 (2): 121126. 10.1198/000313005X20871View ArticleGoogle Scholar
 Hubbard R, Armstrong JS: Why We Don't Really Know What Statistical Significance Means: Implications for Educators. Journal of Marketing Education. 2006, 28: 114120. 10.1177/0273475306288399View ArticleGoogle Scholar
 R Development Core Team: R: A Language and Environment for Statistical Computing. 2009, Vienna, Austria: R Foundation for Statistical Computing, http://www.Rproject.orgGoogle Scholar
 Wolpert DH: Determining Whether Two Data Sets are from the Same Distribution. Maximum Entropy and Bayesian Methods: Proceedings of the Fifteenth International Workshop on Maximum Entropy and Bayesian Methods. Edited by: Hanson KM, Silver RN. 1995,Google Scholar
 Kleinstiver BP, Fernandes AD, Gloor GB, Edgell DR: A unified genetic, computational and experimental framework identifies functionally relevant residues of the homing endonuclease IBmol. Nucleic Acids Res. 2010, 38 (7): 241127. 10.1093/nar/gkp1223PubMedPubMed CentralView ArticleGoogle Scholar
 Edgell DR, Shub DA: Related homing endonucleases IBmoI and ITevI use different strategies to cleave homologous recognition sites. Proceedings of the National Academy of Sciences USA. 2001, 98 (14): 7898903. 10.1073/pnas.141222498View ArticleGoogle Scholar
 Edgell DR, Stanger MJ, Belfort M: Importance of a single base pair for discrimination between introncontaining and intronless alleles by endonuclease IBmoI. Current Biology. 2003, 13 (11): 9738. 10.1016/S09609822(03)003403PubMedView ArticleGoogle Scholar
 Edgell DR, Stanger MJ, Belfort M: Coincidence of cleavage sites of intron endonuclease ITevI and critical sequences of the host thymidylate synthase gene. Journal of Molecular Biology. 2004, 343 (5): 123141. 10.1016/j.jmb.2004.09.005PubMedView ArticleGoogle Scholar
 Aitchison J, Shen SM: LogisticNormal Distributions: Some Properties and Uses. Biometrika. 1980, 67 (2): 261272. 10.2307/2335470. http://www.jstor.org/stable/2335470 10.2307/2335470View ArticleGoogle Scholar
 Berger JO, Bernardo JM: Ordered Group Reference Priors with Application to the Multinomial Problem. Biometrika. 1992, 79: 2537. 10.1093/biomet/79.1.25View ArticleGoogle Scholar
 Kullback S: Information theory and statistics Dover. 1978,Google Scholar
 Egozcue JJ, PawlowskyGlahn V, MateuFigueras G, BarcelóVidal C: Isometric Logratio Transformations for Compositional Data Analysis. Mathematical Geology. 2003, 35 (3): 279300. 10.1023/A:1023818214614View ArticleGoogle Scholar
 Egozcue JJ, PawlowskyGlahn V: Groups of parts and their balances in compositional data analysis. Mathematical Geology. 2005, 37 (7): 795828. 10.1007/s1100400573819View ArticleGoogle Scholar
 Egozcue JJ, DíazBarrero JL, PawlowskyGlahn V: Hilbert space of probability density functions based on Aitchison geometry. Acta Mathematica Sinica. 2006, 22 (4): 11751182. 10.1007/s1011400506782View ArticleGoogle Scholar
 Kullback S, Leibler RA: On information and sufficiency. Annals of Mathematical Statistics. 1951, 22: 7986. 10.1214/aoms/1177729694View ArticleGoogle Scholar
 Kass R, Raftery A: Bayes Factors. Journal of the American Statistical Association. 1995, 90 (430): 773795. 10.2307/2291091View ArticleGoogle Scholar
 Jeffreys H: Theory of probability. 1961, The International series of monographs on physics, Oxford: Clarendon Press, 3,Google Scholar
 Frederico LA, Kunkel TA, Shaw BR: Cytosine deamination in mismatched base pairs. Biochemistry. 1993, 32 (26): 65236530. 10.1021/bi00077a005PubMedView ArticleGoogle Scholar
 Lindahl T: The Croonian Lecture, 1996: Endogenous Damage to DNA. Philosophical transactions of the Royal Society of London, Series B. 1996, 351 (1347): 15291538. 10.1098/rstb.1996.0139View ArticleGoogle Scholar
 Hofreiter M, Jaenicke V, Serre D, Haeseler Av A, Pääbo S: DNA sequences from multiple amplifications reveal artifacts induced by cytosine deamination in ancient DNA. Nucleic Acids Research. 2001, 29 (23): 47934799. 10.1093/nar/29.23.4793PubMedPubMed CentralView ArticleGoogle Scholar
 Jaynes ET, Bretthorst GL: Probability Theory: the Logic of Science. 2003, Cambridge, UK: Cambridge University Press, http://www.loc.gov/catdir/samples/cam033/2002071486.htmlView ArticleGoogle Scholar
 Bernardo JM, Rueda R: Bayesian hypothesis testing: a reference approach. International Statistical Review. 2002, 70 (3): 351372. 10.1111/j.17515823.2002.tb00175.xView ArticleGoogle Scholar
 Berger JO, Bernardo JM, Sun D: The Formal Definition of Reference Priors. Annals of Statistics. 2009, 37 (2): 905938. 10.1214/07AOS587View ArticleGoogle Scholar
 Jeffreys H: An Invariant Form for the Prior Probability in Estimation Problems. Proceedings of the Royal Society of London, Series A. 1946, 186: 453461. 10.1098/rspa.1946.0056View ArticleGoogle Scholar
 Bernardo JM: Reference Posterior Distributions for Bayesian Inference. Proceedings of the Royal Society of London, Series B. 1974, 41: 113147.Google Scholar
 Wallace CS, Freeman PR: Estimation and Inference by Compact Coding. Journal of the Royal Statistical Society, Series B. 1987, 49 (3): 240265. http://www.jstor.org/stable/2985992Google Scholar
 Jermyn IH: Invariant Bayesian estimation on manifolds. Annals of Statistics. 2005, 33: 583605. 10.1214/009053604000001273View ArticleGoogle Scholar
 Spee JH, de Vos WM, Kuipers OP: Efficient random mutagenesis method with adjustable mutation frequency by use of PCR and dITP. Nucleic Acids Research. 1993, 21 (3): 777778. http://nar.oxfordjournals.org 10.1093/nar/21.3.777PubMedPubMed CentralView ArticleGoogle Scholar
 Rall LB: Automatic differentiation: techniques and applications. 1981, 120: Berlin: SpringerVerlag,View ArticleGoogle Scholar
 Griewank A: Evaluating derivatives: principles and techniques of algorithmic differentiation. 2000, 19: Philadelphia: Society for Industrial and Applied Mathematics, http://www.loc.gov/catdir/enhancements/fy0621/99052587d.htmlGoogle Scholar
 Goldman N, Yang Z: A codonbased model of nucleotide substitution for proteincoding DNA sequences. Molecular Biology and Evolution. 1994, 11 (5): 72536.PubMedGoogle Scholar
 Mayrose I, DoronFaigenboim A, Bacharach E, Pupko T: Towards realistic codon models: among site variability and dependency of synonymous and nonsynonymous rates. Bioinformatics. 2007, 23 (13): i319i327. 10.1093/bioinformatics/btm176PubMedView ArticleGoogle Scholar
 Goldman N: Statistical tests of models of DNA substitution. Journal of Molecular Evolution. 1993, 36: 182198. [10.1007/BF00166252], http://dx.doi.org/10.1007/BF00166252 10.1007/BF00166252PubMedView ArticleGoogle Scholar
 Kass RE, Greenhouse JB: Comments on "Investigating therapies of potentially great benefit: ECMO". Statistical Science. 1989, 4: 310317. 10.1214/ss/1177012386View ArticleGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.