From now on, we assume that the positions in the sequence are independently distributed. We denote by p(x) the background probability associated with the letter x of the alphabet Σ. By extension, we write p(u) for the probability of the word u = u_1 ... u_m: p(u) = p(u_1) × ⋯ × p(u_m).
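Under this independence assumption, the probability of a word is simply the product of its letter probabilities. As a minimal sketch (the alphabet and probabilities below are hypothetical examples, not taken from the paper):

```python
# Hypothetical DNA-like background model; the probabilities are examples only.
p = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

def word_probability(u, p):
    """p(u) = p(u_1) * ... * p(u_m) under positional independence."""
    prob = 1.0
    for x in u:
        prob *= p[x]
    return prob

print(word_probability("ACG", p))  # 0.3 * 0.2 * 0.2, i.e. about 0.012
```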
Definition of the score distribution
The computation of the P-value is done through the computation of the score distribution. This concept is at the core of the large majority of existing algorithms [9–11, 15]. Given a matrix M of length m and a score α, we define Q(M, α) as the probability that the background model achieves a score equal to α. In other words, Q(M, α) is the probability of the set {u ∈ Σ^m | Score(u, M) = α}. In the case where α is not an accessible score, Q(M, α) = 0.
The computation of Q is easily performed by dynamic programming. For that purpose, we need some preliminary notation. Given two integers i, j satisfying 0 ≤ i, j ≤ m, M [i..j] denotes the submatrix of M obtained by selecting only columns from i to j for all character symbols. M [i..j] is called a slice of M. By convention, if i > j, then M [i..j] is an empty matrix.
The score distribution for the slice M[1..i] is expressed from the score distribution of the previous slice M[1..i − 1] as follows.
\begin{array}{lll}Q(M[1..0],s) & = & \left\{\begin{array}{ll}1 & \text{if } s=0\\ 0 & \text{otherwise}\end{array}\right.\\ Q(M[1..i],s) & = & {\displaystyle \sum_{x\in\Sigma} Q(M[1..i-1],\, s-M(i,x))\times p(x)}\end{array}
(1)
The time complexity is in O(m × |Σ| × S), and the space complexity in O(S), where S is the number of scores that have to be visited. If the coefficients of M are natural numbers, then S is bounded by m × max{M(i, x) | x ∈ Σ, 1 ≤ i ≤ m}. Equation 1 makes it possible to solve both the From score to P-value and From P-value to score problems. Given a score α, the P-value is obtained with the relation:
\text{Pvalue}(M,\alpha )={\displaystyle \sum _{s\ge \alpha}Q(M,s)}
Conversely, given a P-value P, Threshold(M, P) is computed from Q by searching downwards from the greatest accessible score until the required P-value is reached.
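A direct implementation of this recurrence can be sketched as follows, with a dictionary holding only the accessible scores; the matrix, alphabet, and all function names here are illustrative stand-ins, not the paper's code:

```python
from collections import defaultdict

# Sketch of Equation 1. Columns of M are dicts mapping letters to scores;
# the data below is a toy example for illustration only.
def score_distribution(M, p):
    Q = {0: 1.0}                         # Q(M[1..0], s): all mass at score 0
    for col in M:
        nxt = defaultdict(float)
        for s, q in Q.items():
            for x, px in p.items():
                nxt[s + col[x]] += q * px    # recurrence of Equation 1
        Q = dict(nxt)
    return Q

def pvalue(M, p, alpha):
    """Pvalue(M, alpha): total mass of accessible scores s >= alpha."""
    return sum(q for s, q in score_distribution(M, p).items() if s >= alpha)

def threshold(M, p, P):
    """Greatest accessible score whose P-value reaches P (downward scan)."""
    Q = score_distribution(M, p)
    acc = 0.0
    for s in sorted(Q, reverse=True):
        acc += Q[s]
        if acc >= P:
            return s

p = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
M = [{"A": 1, "C": 0, "G": 0, "T": 0},
     {"A": 1, "C": 0, "G": 0, "T": 0}]
print(pvalue(M, p, 2))        # only "AA" scores 2: 1/16 = 0.0625
print(threshold(M, p, 0.4))   # scores >= 1 have mass 7/16 >= 0.4, so 1
```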
Computing the score distribution for a range of scores
Formula 1 does not explicitly state which score ranges should be taken into account in intermediate steps of the calculation of Q. To this end, we introduce the best score and the worst score of a matrix slice.
Definition 1 (Best and worst scores) Let M be a matrix. The best score of the slice M [i..j] is defined as
\text{BS}(M[i..j])={\displaystyle \sum _{k=i}^{j}\max\{M(k,x)\mid x\in \Sigma \}}
Similarly, the worst score of the slice M [i..j] is defined as
\text{WS}(M[i..j])={\displaystyle \sum _{k=i}^{j}\min\{M(k,x)\mid x\in \Sigma \}}
The notion of best scores is already present in [16], where it is used to speed up the search for occurrences of a matrix in a text. It gives rise to lookahead scoring: since the maximal remaining score is known, best scores make it possible to stop the calculation of Score(u, M) early, as soon as it is guaranteed that the score threshold cannot be achieved. This idea has been exploited in [5, 6] in the same context. Here we adapt it to the score distribution problem. Let α and β be two scores such that α ≤ β. If one wants to compute the score distribution Q for the range [α, β], then given an intermediate score s and a matrix position i, we say that Q(M[1..i], s) is useful if there exists a word v of length m − i such that α ≤ s + Score(v, M[i + 1..m]) ≤ β. Lemma 2 characterizes useful intermediate scores.
Lemma 2 Let M be a matrix of length m, let α and β be two score bounds defining a score range for which we want to compute the score distribution Q. Q(M [1..i], s) is useful if, and only if,
α − BS(M[i + 1..m]) ≤ s ≤ β − WS(M[i + 1..m])
Proof. This is a straightforward consequence of Definition 1.
This result is implemented in Algorithm SCOREDISTRIBUTION, displayed in Figure 3. The algorithm ensures that only accessible scores are visited. In practice, this is done by using a hash table for storing values of Q.
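The pruning of Lemma 2 can be sketched as follows. This is a schematic reading of the idea, not the algorithm of Figure 3 itself: a plain dict plays the role of the hash table, and the matrix, alphabet, and score range are illustrative:

```python
from collections import defaultdict

# Schematic sketch of the pruning in Lemma 2 (the actual SCOREDISTRIBUTION
# algorithm is given in Figure 3). All names and data are illustrative.
def suffix_best_worst(M):
    """BS(M[i+1..m]) and WS(M[i+1..m]) for all i, as suffix sums."""
    m = len(M)
    BS, WS = [0] * (m + 1), [0] * (m + 1)
    for i in range(m - 1, -1, -1):
        BS[i] = BS[i + 1] + max(M[i].values())
        WS[i] = WS[i + 1] + min(M[i].values())
    return BS, WS

def score_distribution_range(M, p, alpha, beta):
    """Q(M, s) restricted to final scores s in [alpha, beta]."""
    BS, WS = suffix_best_worst(M)
    Q = {0: 1.0}
    for i, col in enumerate(M):
        nxt = defaultdict(float)
        lo, hi = alpha - BS[i + 1], beta - WS[i + 1]   # bounds of Lemma 2
        for s, q in Q.items():
            for x, px in p.items():
                t = s + col[x]
                if lo <= t <= hi:          # keep only useful scores
                    nxt[t] += q * px
        Q = dict(nxt)
    return Q

p = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
M = [{"A": 2, "C": 0, "G": 0, "T": 0},
     {"A": 2, "C": 1, "G": 0, "T": 0}]
Q = score_distribution_range(M, p, 3, 4)
# Q maps 4 -> 0.0625 (word AA) and 3 -> 0.0625 (word AC); every other
# intermediate score is pruned as useless.
```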
If one wants only to calculate the P-value of a given score, without knowing the full score distribution, Algorithm SCOREDISTRIBUTION can be further improved. We introduce a complementary optimization that leads to a significant speed-up. The idea is that for good words, we can anticipate that the final score will be above the given threshold without calculating it.
Definition 2 (Good words) Let α be a score and i be a position of M. Given u = u_1 ... u_i a word of Σ^i, we say that u is good for α if the following conditions are fulfilled:

1. Score(u, M[1..i]) ≥ α − WS(M[i + 1..m])

2. Score(u_1 ... u_{i−1}, M[1..i − 1]) < α − WS(M[i..m])
Lemma 3 Let u be a good word for α. Then for all v in uΣ^{m−|u|}, we have Score(v, M) ≥ α.
Proof. Let w in Σ^{m−|u|} be such that v = uw and let i be the length of u. We have
\begin{array}{lll}\text{Score}(v,M) & = & \text{Score}(u,M[1..i])+\text{Score}(w,M[i+1..m])\\ & \ge & \text{Score}(u,M[1..i])+\text{WS}(M[i+1..m])\\ & \ge & \alpha \end{array}
Lemma 4 Let u be a string of Σ^m such that Score(u, M) ≥ α. Then there exists a unique prefix v of u such that v is good for α.
Proof. We first remark that if Score(u, M) ≥ α, then Score(u, M) ≥ α − WS(M[m + 1..m]). So there exists at least one prefix of u satisfying the first condition of Definition 2: u itself. Now, consider a prefix v of length i such that Score(v, M[1..i]) ≥ α − WS(M[i + 1..m]). Then for each letter x of Σ, we have Score(vx, M[1..i + 1]) ≥ α − WS(M[i + 2..m]): this follows from the fact that M(i + 1, x) ≥ WS(M[i + 1..m]) − WS(M[i + 2..m]). This property implies that if a prefix v of u satisfies the first condition of Definition 2, then all longer prefixes do too. According to the second condition of Definition 2, it follows that only the shortest prefix v such that Score(v, M[1..i]) ≥ α − WS(M[i + 1..m]) is a good word.
Lemma 5 Let M be a matrix of length m.
\text{Pvalue}(M,\alpha )={\displaystyle \sum _{\begin{array}{c}1\le i\le m,\ x\in \Sigma,\ s<\alpha -\text{WS}(M[i..m])\\ s+M(i,x)\ge \alpha -\text{WS}(M[i+1..m])\end{array}}Q(M[1..i-1],s)\times p(x)}
Proof. We consider the set \mathcal{P}(α) of words whose score is greater than or equal to α: \mathcal{P}(α) = {w ∈ Σ^m | Score(w, M) ≥ α}. According to Lemma 4, each word of \mathcal{P}(α) has a unique prefix that is good for α. Conversely, Lemma 3 ensures that each word with a prefix that is good for α belongs to \mathcal{P}(α). \mathcal{P}(α) can thus be expressed as a union of disjoint sets.
\mathcal{P}(\alpha )={\displaystyle \bigcup_{u\ \text{is good for}\ \alpha} u\,{\Sigma}^{m-|u|}}
(2)
It follows that
\text{Pvalue}(M,\alpha )={\displaystyle \sum _{u\ \text{is good for}\ \alpha} p(u)}
(3)
where p(u) denotes the probability of the string u in the background model. By definition of Q, we can deduce the expected result from Formula 3.
Lemma 5 shows that it is not necessary to build the entire dynamic programming table for Q. Only the values Q(M[1..i], s) such that s < α − WS(M[i + 1..m]) have to be computed. This gives rise to the FASTPVALUE algorithm, described in Figure 4.
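A schematic reading of this shortcut (the actual FASTPVALUE algorithm is the one in Figure 4): at column i, any transition that lifts an intermediate score to at least α − WS(M[i + 1..m]) completes a good word, so its probability mass is credited to the P-value at once and the score is never propagated further. Names and data below are illustrative:

```python
from collections import defaultdict

# Hedged sketch of the good-word shortcut of Lemma 5, not the paper's code.
def fast_pvalue(M, p, alpha):
    m = len(M)
    WS = [0] * (m + 1)                    # WS[i] = WS(M[i+1..m])
    for i in range(m - 1, -1, -1):
        WS[i] = WS[i + 1] + min(M[i].values())
    pv = 0.0
    Q = {0: 1.0}                          # intermediate scores still undecided
    for i, col in enumerate(M):
        nxt = defaultdict(float)
        for s, q in Q.items():
            for x, px in p.items():
                t = s + col[x]
                if t >= alpha - WS[i + 1]:   # good word: every completion wins
                    pv += q * px
                else:
                    nxt[t] += q * px         # undecided: keep propagating
        Q = dict(nxt)
    return pv

p = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
M = [{"A": 1, "C": 0, "G": 0, "T": 0},
     {"A": 1, "C": 0, "G": 0, "T": 0}]
print(fast_pvalue(M, p, 2))   # only "AA" reaches 2: 1/16 = 0.0625
```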
Permuting columns of the matrix
Algorithms 1 and 2 can also be used in combination with permuted lookahead scoring [16]. The matrix M can be transformed by permuting its columns without modifying the overall score distribution. This is possible because the columns of the matrix are assumed to be independent. We show that this transformation is also relevant for P-value calculation.
Lemma 6 Let M and N be two matrices of length m such that there exists a permutation π on {1,..., m} satisfying, for each letter x of Σ, M(i, x) = N(π_i, x). Then for any α, Q(M, α) = Q(N, α).
Proof. Let u be a word of Σ^m and let v = u_{π_1} ... u_{π_m}. By construction of N, we have Score(u, M) = Score(v, N). Since the background model is multinomial, we have p(u) = p(v). This completes the proof.
The question is how to permute the columns of a given matrix to enhance the performance of the algorithms. In [6], it is suggested to sort the columns by decreasing information content. We refine this rule of thumb and propose to minimize the total size of all score ranges involved in the dynamic programming decomposition for Q in Algorithm SCOREDISTRIBUTION. For each i, 1 ≤ i ≤ m, define δ_i as δ_i = BS(M[i..i]) − WS(M[i..i]).
Lemma 7 Let M be a matrix such that δ_1 ≥ ... ≥ δ_m. Then M minimizes the total size of all score ranges amongst all matrices that can be obtained by permutation of M.
Proof. We write SR(M) for the total size of all score ranges of the matrix M. We have
\begin{array}{lll}SR(M) & = & {\displaystyle \sum_{i=1}^{m-1}\left(\beta -\text{WS}(M[i+1..m])-(\alpha -\text{BS}(M[i+1..m]))\right)+(\beta -\alpha )}\\ & = & m(\beta -\alpha )+{\displaystyle \sum_{i=2}^{m}\text{BS}(M[i..m])-\text{WS}(M[i..m])}\\ & = & m(\beta -\alpha )+{\displaystyle \sum_{i=2}^{m}\sum_{j=i}^{m}{\delta}_{j}}\\ & = & m(\beta -\alpha )+{\displaystyle \sum_{i=2}^{m}(i-1){\delta}_{i}}\end{array}
Since a permutation of the matrix induces a permutation of the sequence δ_2, ..., δ_m, the value {\displaystyle \sum_{i=2}^{m}(i-1){\delta}_{i}} is minimal when δ_1 ≥ δ_2 ≥ ... ≥ δ_m.
In the remainder of this paper, we shall always assume that the matrix M has been permuted so that it fulfills the condition on (δ_i)_{1≤i≤m} of Lemma 7. This is simply a preprocessing of the matrix that does not affect the course of the algorithms.
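This preprocessing amounts to sorting the columns by decreasing δ. A minimal sketch, with a hypothetical helper name and toy columns:

```python
# Minimal sketch of the column preprocessing suggested by Lemma 7: order the
# columns so that delta_i = BS(M[i..i]) - WS(M[i..i]) is non-increasing.
# The permutation is returned so positions can be mapped back if needed.
def sort_columns_by_delta(M):
    delta = lambda col: max(col.values()) - min(col.values())
    order = sorted(range(len(M)), key=lambda i: delta(M[i]), reverse=True)
    return [M[i] for i in order], order

M = [{"A": 1, "C": 0}, {"A": 5, "C": 0}, {"A": 3, "C": 0}]
N, perm = sort_columns_by_delta(M)
print(perm)   # [1, 2, 0]: deltas 5, 3, 1 in decreasing order
```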
Efficient algorithms for computing the P-value without error
We now come to the presentation of two exact algorithms, which are the main algorithms of this paper. In Algorithms SCOREDISTRIBUTION and FASTPVALUE, the number of accessible scores plays an essential role in the time and space complexity. As mentioned in the Background section, this number can be as large as |Σ|^m. In practice, it strongly depends on the matrix involved and on the way the score distribution is approximated by round matrices. The choice of the precision is critical: Algorithms SCOREDISTRIBUTION and FASTPVALUE must compromise between accuracy, with a faithful approximation, and efficiency, with a rough approximation.
To overcome this problem, we propose to define successive discretized score distributions with growing accuracy. The key idea is to take advantage of the shape of the score distribution Q, and to use small granularity values only in the portions of the distribution where it is required. This is a kind of selective zooming process. Discretized score distributions are built from round matrices.
Definition 3 (Round matrix)
Let M be a matrix of real coefficient values of length m and let ε be a positive real number. We denote by M_ε the round matrix deduced from M by rounding each value by ε:
{M}_{\epsilon}(i,x)=\epsilon \lfloor \frac{M(i,x)}{\epsilon}\rfloor
ε is called the granularity. Given ε, we can define E, the maximal error induced by M_ε.
E={\displaystyle \sum _{i=1}^{m}\max\{M(i,x)-{M}_{\epsilon}(i,x)\mid x\in \Sigma \}}
(4)
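The rounding of Definition 3 and the error bound of Equation 4 can be sketched directly; the coefficients below are illustrative:

```python
import math

# Sketch of Definition 3 and Equation 4: round each coefficient down to a
# multiple of the granularity eps, then sum the per-column maximal errors.
def round_matrix(M, eps):
    return [{x: eps * math.floor(v / eps) for x, v in col.items()} for col in M]

def max_error(M, M_eps):
    return sum(max(col[x] - col_eps[x] for x in col)
               for col, col_eps in zip(M, M_eps))

M = [{"A": 1.37, "C": -0.52}, {"A": 0.81, "C": 0.09}]
M_eps = round_matrix(M, 0.1)
E = max_error(M, M_eps)
# Each coefficient loses less than eps, so 0 <= E < m * eps; by Lemma 8 the
# score of any word under M_eps underestimates its score under M by at most E.
```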
Lemma 8 Let M be a matrix, ε the granularity, and E the associated maximal error. For each word u of Σ^m, we have 0 ≤ Score(u, M) − Score(u, M_ε) ≤ E.

Proof. This is a straightforward consequence of Definition 3 for M_ε and E.
Lemma 9
Let M, N and N' be three matrices of length m, let E, E' be two nonnegative real numbers, and let α, β be two scores such that α ≤ β, satisfying the following hypotheses:
(i) for each word u in Σ^m, Score(u, N) ≤ Score(u, M) ≤ Score(u, N) + E,
(ii) for each word u in Σ^m, Score(u, N') ≤ Score(u, N) ≤ Score(u, M) ≤ Score(u, N') + E',
(iii) Pvalue(N, α − E) = Pvalue(N, α),
(iv) Pvalue(N', β − E') = Pvalue(N', β),
then
{\displaystyle \sum _{\begin{array}{c}\alpha \le t<\beta \\ t\ \text{accessible}\end{array}}Q(N,t)}={\displaystyle \sum _{\begin{array}{c}\alpha \le t<\beta \\ t\ \text{accessible}\end{array}}Q(M,t)}
Proof. Let u be a string in Σ^m. It is enough to establish that α ≤ Score(u, N) < β if, and only if, α ≤ Score(u, M) < β. The proof is divided into four parts.

If α ≤ Score(u, N), then α ≤ Score(u, M): this is a consequence of Score(u, N) ≤ Score(u, M) in (i).

If α ≤ Score(u, M), then α ≤ Score(u, N): by hypothesis (i) on E, α ≤ Score(u, M) implies α − E ≤ Score(u, N). Since Pvalue(N, α − E) = Pvalue(N, α) by (iii), it follows that α ≤ Score(u, N).

If Score(u, N) < β, then Score(u, M) < β: by hypothesis (ii), Score(u, N) < β implies that Score(u, N') < β. According to (iv), this ensures that Score(u, N') < β − E', which with (ii) guarantees Score(u, M) < β.

If Score(u, M) < β, then Score(u, N) < β: this is a consequence of Score(u, N) ≤ Score(u, M) in (i).
What does this statement tell us? It provides a sufficient condition for the score distribution Q computed with a round matrix to be valid for the initial matrix M. Assume that one can observe two plateaux, ending respectively at α and β, in the score distribution of M_ε. Then the approximation of the total probability for the score range [α, β[ obtained with the round matrix is indeed the exact probability. In other words, there is no need to use smaller granularity values in this region to improve the result.
From score to P-value
Lemma 9 is used through a stepwise algorithm to compute the P-value of a score threshold. Let α be the score for which we want to determine the associated P-value. We estimate the score distribution Q iteratively. For that, we consider a series of round matrices M_ε for decreasing values of ε, and calculate the successive values Pvalue(M_ε, α). The efficiency of the method is guaranteed by two properties. First, we introduce a stop condition that allows us to halt as soon as it is guaranteed that the exact P-value has been reached. Second, we carefully select the relevant portions of the score distribution for which the computation should go on. This tends to restrain the score range to inspect at each step. The algorithm is displayed in Figure 5.
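The overall loop can be sketched as follows. This is only a schematic stand-in for Algorithm 3 of Figure 5: the real algorithm restricts the score ranges at each step instead of naively recomputing the whole distribution, but the stop condition is the same. All helper functions are minimal illustrations in the spirit of Lemma 9:

```python
import math
from collections import defaultdict

# Schematic stand-in for Algorithm 3: refine the granularity until the stop
# condition Pvalue(M_eps, alpha - E) == Pvalue(M_eps, alpha) certifies, via
# Lemma 9, that the computed P-value is exact for M.
def pvalue(M, p, alpha):
    Q = {0.0: 1.0}
    for col in M:
        nxt = defaultdict(float)
        for s, q in Q.items():
            for x, px in p.items():
                nxt[round(s + col[x], 9)] += q * px   # tame float drift in keys
        Q = dict(nxt)
    return sum(q for s, q in Q.items() if s >= alpha)

def pvalue_exact(M, p, alpha, eps0=1.0, k=10.0, max_steps=8):
    eps = eps0
    for _ in range(max_steps):
        M_eps = [{x: eps * math.floor(v / eps) for x, v in c.items()} for c in M]
        E = sum(max(c[x] - ce[x] for x in c) for c, ce in zip(M, M_eps))
        pv = pvalue(M_eps, p, alpha)
        if pvalue(M_eps, p, alpha - E) == pv:
            return pv                       # exact by Lemma 9 (stop condition)
        eps /= k                            # zoom in: smaller granularity
    return pv                               # gave up: eps-limited accuracy

p = {"A": 0.5, "C": 0.5}
M = [{"A": 1.5, "C": 0.0}, {"A": 1.0, "C": 0.25}]
print(pvalue_exact(M, p, 1.7))   # words AA (2.5) and AC (1.75): 0.5
```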
The correctness of the algorithm comes from the next two lemmas. The first one establishes that the loop invariants hold.
Lemma 10 Throughout Algorithm 3, the variables β and P satisfy the invariant relation P = Pvalue(M, β).
Proof. This is a consequence of invariant 1 and invariant 2 in Algorithm 3. Both invariants are valid for the initial conditions, where P = 0 and β = BS(M) + 1: Pvalue(M, BS(M) + 1) = 0. Regarding invariant 2, choose N' = M_ε.

There are two cases to consider for invariant 1.

- If s does not exist: P and β remain unchanged, so we still have P = Pvalue(M, β). Regarding invariant 2, if there exists such a matrix N' at the former step for M_{kε}, then it is still suitable for M_ε.

- If s actually exists: invariant 1 implies that P is updated to Pvalue(M, β) + ∑_{s≤t<β} Q(M_ε, t). According to Lemma 9 and invariant 2, we have ∑_{s≤t<β} Q(M_ε, t) = ∑_{s≤t<β} Q(M, t). Hence P = Pvalue(M, s). Since β is updated to s, it follows that P = Pvalue(M, β). Regarding invariant 2, take N' = M_ε.
The second Lemma shows that when the stop condition is met, the final value of the variable P is indeed the expected result Pvalue(M, α).
Lemma 11 At the end of Algorithm 3, P = Pvalue(M, α).
Proof. When s = α − E, then β = α. According to Lemma 10, it implies P = Pvalue(M_ε, α). Since the stop condition implies that Pvalue(M_ε, α − E) = Pvalue(M_ε, α), Lemma 9 ensures that Pvalue(M_ε, α) = Pvalue(M, α).
From P-value to score
Similarly, Lemma 9 is used to design an algorithm that computes the score threshold associated with a given P-value. We first show that the score threshold obtained with a round matrix for a given P-value provides some insight into the potential score interval for the initial matrix M.
Lemma 12 Let M be a matrix, ε a granularity and E the associated maximal error. Given P, 0 ≤ P ≤ 1, we have

Threshold(M_ε, P) ≤ Threshold(M, P) ≤ Threshold(M_ε, P) + E
Proof. Let β = Threshold(M_ε, P). According to Lemma 8, Pvalue(M_ε, β) ≥ P implies Pvalue(M, β) ≥ P, which yields β ≤ Threshold(M, P). So it remains to establish that Threshold(M, P) ≤ β + E. If Pvalue(M, β + E) = 0, then the highest accessible score for M is smaller than β + E. In this case, the expected result is straightforward. Otherwise, there exists β' such that β' is the lowest accessible score for M that is strictly greater than β + E. Since s → Pvalue(M, s) is a decreasing function of s, we have to verify that Pvalue(M, β') < P to complete the proof of the lemma. Assume that Pvalue(M, β') ≥ P. Let γ = min{Score(u, M_ε) | u ∈ Σ^m ∧ Score(u, M) ≥ β'}. On the one hand, the definition of γ implies that

Pvalue(M, β') ≤ Pvalue(M_ε, γ)
(5)

On the other hand, γ is an accessible score for M_ε that satisfies γ ≥ β' − E > β. By hypothesis on β, it follows that

Pvalue(M_ε, γ) < P
(6)

Inequalities 5 and 6 contradict the assumption that Pvalue(M, β') ≥ P. Thus Pvalue(M, β') < P.
The sketch of the algorithm is as follows. Let P be the desired P-value. We iteratively compute the associated score threshold for successive decreasing values of ε. At each step, we use Lemma 12 to speed up the calculation for the matrix M_ε. This lemma allows us to restrain the computation of the detailed score distribution Q to a small interval of length 2 × E. For the remainder of the distribution, we can use the FASTPVALUE algorithm. Lemma 13 ensures that when Pvalue(M_ε, α − E) = Pvalue(M_ε, α), then α is the required score value for M. The algorithm is displayed in more detail in Figure 6.
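The bracketing of Lemma 12 can be illustrated with naive helpers (schematic stand-ins, not the algorithm of Figure 6): the threshold of the round matrix confines the exact threshold to an interval of length E, which is where the detailed computation has to take place:

```python
import math
from collections import defaultdict

# Illustration of the bracketing of Lemma 12 with naive helper functions:
# Threshold(M_eps, P) <= Threshold(M, P) <= Threshold(M_eps, P) + E.
def distribution(M, p):
    Q = {0.0: 1.0}
    for col in M:
        nxt = defaultdict(float)
        for s, q in Q.items():
            for x, px in p.items():
                nxt[s + col[x]] += q * px
        Q = dict(nxt)
    return Q

def threshold(M, p, P):
    """Greatest accessible score reaching P-value P (downward scan)."""
    Q = distribution(M, p)
    acc = 0.0
    for s in sorted(Q, reverse=True):
        acc += Q[s]
        if acc >= P:
            return s

p = {"A": 0.5, "C": 0.5}
M = [{"A": 1.37, "C": -0.52}, {"A": 0.81, "C": 0.09}]
eps = 0.5
M_eps = [{x: eps * math.floor(v / eps) for x, v in c.items()} for c in M]
E = sum(max(c[x] - ce[x] for x in c) for c, ce in zip(M, M_eps))
t = threshold(M_eps, p, 0.25)
assert t <= threshold(M, p, 0.25) <= t + E   # bracketing of Lemma 12
```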
Lemma 13 Let M be a matrix, ε the granularity and E the associated maximal error. If Pvalue(M_ε, α − E) = Pvalue(M_ε, α), then Pvalue(M, α) = Pvalue(M_ε, α).
Proof. This is a corollary of Lemma 9, with M_ε in the role of both N and N', and BS(M) + E in the role of β.