### The competitive two-species Langmuir model of microarray hybridization

We focus on Affymetrix GeneChip microarray data obtained after the chips have been hybridized and scanned and the images have been summarized into hundreds of thousands of paired intensity values of perfect-match (PM) and mismatch (MM) probes. The intensity of probe "p" on chip "c" is well described by the Langmuir adsorption isotherm [12, 14, 20–22],

\begin{array}{ccc}{\text{I}}_{\text{p}}^{\text{P}}\ast ={\text{M}}_{\text{c}}\cdot {\Theta}_{\text{p}}^{\text{P}}+{\text{O}}_{\text{c}}& \text{with}& {\Theta}_{\text{p}}^{\text{P}}=\frac{{\text{X}}_{\text{p}}^{\text{P}}}{1+{\text{X}}_{\text{p}}^{\text{P}}}\end{array}.

(1)

Here the superscript denotes the probe-type (P = PM, MM). The indices "p" and "c" assign probe- and chip-specific parameters, respectively. The probe-index implies the chip specificity as well, i.e. p = p, c. This model predicts that the fraction of "occupied", i.e. dimerized oligonucleotides of a probe spot, Θ_{p}^{P} (also called surface probe coverage or occupancy), is directly related to the observed intensity, I_{p}^{P}* [14, 18]. The proportionality constant, M_{c}, specifies the maximum intensity referring to complete occupancy, Θ_{p}^{P} = 1, if all oligonucleotides of the respective probe spot on the given chip are dimerized. The minimum intensity referring to the absence of bound transcripts, Θ_{p}^{P} = 0, gives rise to the "optical" background intensity, O_{c}. Throughout the paper we will consider only "net" intensities which have been corrected for the optical background before further analysis, {\text{I}}_{\text{p}}^{\text{P}}={\text{I}}_{\text{p}}^{\text{P}}\ast -{\text{O}}_{\text{c}}, using, for example, the zone-algorithm provided by Affymetrix [23].
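A minimal Python sketch of Eq. (1) and of the subsequent background correction, assuming the binding strength X is already known. The chip constants `M_C` and `O_C` are hypothetical values chosen only for illustration, not calibrated GeneChip numbers.

```python
def langmuir_coverage(x):
    """Surface coverage Theta = X / (1 + X) of a probe spot with binding strength X, Eq. (1)."""
    return x / (1.0 + x)


def raw_intensity(x, m_c, o_c):
    """Observed intensity I* = M_c * Theta + O_c, Eq. (1)."""
    return m_c * langmuir_coverage(x) + o_c


def net_intensity(i_raw, o_c):
    """'Net' intensity I = I* - O_c after subtracting the optical background."""
    return i_raw - o_c


# Hypothetical chip constants (illustration only).
M_C = 20000.0  # maximum intensity at complete occupancy (Theta = 1)
O_C = 50.0     # optical background (Theta = 0)

for x in (0.0, 0.01, 1.0, 100.0):
    i_raw = raw_intensity(x, M_C, O_C)
    print(f"X = {x:7.2f}  Theta = {langmuir_coverage(x):.4f}  "
          f"I* = {i_raw:9.1f}  I = {net_intensity(i_raw, O_C):9.1f}")
```

At X = 1 half of the oligomers are occupied, so the net intensity equals M_C / 2; for X >> 1 it approaches M_C.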

The surface coverage changes as a hyperbolic function of the "binding strength", X_{p}^{P}, which additively decomposes into contributions due to specific and non-specific hybridization

{\text{X}}_{\text{p}}^{\text{P}}={\text{X}}_{\text{p}}^{\text{P},\text{S}}+{\text{X}}_{\text{p}}^{\text{P},\text{N}}.

(2)

Since the binding strengths follow the mass action law they are related to the concentration of specific and non-specific transcripts, [S]_{p} and [N]_{c}, respectively, and to the respective effective association constants of duplex formation, K_{p}^{P,h} (h = S, N) (see [1] for details),

{\text{X}}_{\text{p}}^{\text{P},\text{S}}={\left[\text{S}\right]}_{\text{p}}\cdot {\text{K}}_{\text{p}}^{\text{P},\text{S}};\quad {\text{X}}_{\text{p}}^{\text{P},\text{N}}={\left[\text{N}\right]}_{\text{c}}\cdot {\text{K}}_{\text{p}}^{\text{P},\text{N}}.

(3)

The latter equation assumes that the large number of different non-specific RNA fragments in the hybridization solution effectively acts like a single species with a common concentration [N]_{c} for all probes of the chip [15, 16]. In contrast, the concentration of specific transcripts, [S]_{p}, refers to a particular probe sequence, i.e., it represents a "single-probe" property. Microarrays of the GeneChip type use so-called probe sets of several probes (usually N_{set} = 11) to estimate the expression of each gene considered. One therefore expects all probes of a set to probe the same, common transcript concentration, i.e., [S]_{set} = [S]_{p} for p ∈ set, provided that effects such as alternative splicing have been appropriately considered during probe design.
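As a sketch of Eqs. (2) and (3), the following evaluates the binding strengths of a hypothetical probe set: all probes share the same [S]_set and the chip-wide [N]_c, but each probe has its own effective association constants. The `Probe` class and all numbers are illustrative assumptions in consistent but unspecified units.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Hypothetical probe with effective association constants K^{P,S} and K^{P,N}."""
    k_s: float
    k_n: float


def binding_strengths(probe, s_set, n_c):
    """Return (X^S, X^N, X) for one probe, Eqs. (2) and (3)."""
    x_s = s_set * probe.k_s   # specific contribution, [S]_set * K^{P,S}
    x_n = n_c * probe.k_n     # non-specific contribution, [N]_c * K^{P,N}
    return x_s, x_n, x_s + x_n


def coverage(x):
    """Langmuir occupancy, Eq. (1)."""
    return x / (1.0 + x)


# One probe set: common [S]_set and [N]_c, probe-specific constants (illustrative values).
S_SET, N_C = 1e-11, 1e-9
probe_set = [Probe(k_s=1e10, k_n=1e6), Probe(k_s=3e10, k_n=5e6), Probe(k_s=2e9, k_n=2e5)]

for p in probe_set:
    x_s, x_n, x = binding_strengths(p, S_SET, N_C)
    print(f"X^S = {x_s:.3g}  X^N = {x_n:.3g}  Theta = {coverage(x):.4f}")
```

Although the three probes "see" the same transcript concentration, their coverages differ because of their different association constants.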

The competitive two-species Langmuir adsorption isotherm (Eq. (1)) considers the effects of non-specific "background" hybridization and of saturation at small and large concentrations of specific transcripts, respectively. The maximum intensity at saturation, M_{c}, depends on factors such as the number of oligonucleotides per probe spot (which in turn is related to the density of oligomers and to the spot size), the mean number of optical labels per bound target and the settings of the scanner. These factors affect the PM and MM in nearly the same fashion, giving rise to virtually identical values of M_{c} at complete saturation of the probe spots under equilibrium conditions (X_{p}^{P} >> 1) [16, 18].

Recent studies report significantly higher limiting intensities for the PM than for the MM, i.e., M^{PM} > M^{MM} [22]. The authors interpreted this result by assuming a probe-dependent partial dissociation of the duplexes during the post-hybridization washing step. An additional explanation might be the truncation of a considerable fraction of the probe oligomers due to incomplete synthesis, because this effect causes an asymptote-like flattening of the hybridization isotherms at intermediate and large transcript concentrations in a sequence-dependent manner [1, 9].

In the following analysis we apply the special case of a common intensity asymptote for all probes of the chip according to Eq. (1). Possible consequences of deviations from this assumption for the data analysis will be addressed in a separate study.

### Matched and mismatched microarray probes

The probes on expression microarrays of the GeneChip type are usually designed in a pairwise fashion. Each probe pair consists of 25-meric PM and MM probes, where the PM sequence is assumed to perfectly match a 25-meric section of the target gene. The MM sequence differs from that of the PM by a single complementary mismatch in the centre of the sequence. The different middle bases of the two probes of one pair cause different base pairings in the respective probe/target duplexes and thus different binding constants (see below and [16]). Let us define the pairwise PM/MM ratios of the binding constants of specific and non-specific hybridization,

\begin{array}{ccc}{\text{s}}_{\text{p}}\equiv \frac{{\text{X}}_{\text{p}}^{\text{PM},\text{S}}}{{\text{X}}_{\text{p}}^{\text{MM},\text{S}}}=\frac{{\text{K}}_{\text{p}}^{\text{PM},\text{S}}}{{\text{K}}_{\text{p}}^{\text{MM},\text{S}}}& \text{and}& {\text{n}}_{\text{p}}\equiv \frac{{\text{X}}_{\text{p}}^{\text{PM},\text{N}}}{{\text{X}}_{\text{p}}^{\text{MM},\text{N}}}=\frac{{\text{K}}_{\text{p}}^{\text{PM},\text{N}}}{{\text{K}}_{\text{p}}^{\text{MM},\text{N}}}\end{array},

(4)

respectively, which quantify the effect of the different base pairings formed by the PM and MM. For example, the binding strength of the complementary Watson-Crick (WC) base pairing in the middle of the specific duplexes of the PM exceeds that of the specific duplexes of the MM, which form a weaker self-complementary mismatch at this position [15–18]. Consequently, the ratio of the specific binding constants is s_{p} > 1. In contrast, the ratio of the non-specific binding constants is n_{p} < 1 for purines (adenine, guanine) and n_{p} > 1 for pyrimidines (thymine, cytosine) in the middle of the PM sequence, owing to the purine-pyrimidine asymmetry of WC base-pair interactions in RNA/DNA duplexes [13, 24]. Hence, the parameters s_{p} and n_{p} specify the PM/MM affinity gain of a selected probe pair upon specific and non-specific binding, respectively. Both PM and MM probes obey the hyperbolic adsorption isotherm, Eq. (1) [18]. With Eq. (4) one obtains for the binding strengths of the PM and MM probes

\begin{array}{l}\begin{array}{cc}{\text{X}}_{\text{p}}^{\text{PM}}(\text{R})={\text{X}}_{\text{p}}^{\text{PM},\text{N}}\cdot \left(\text{R}+1\right)& \text{and}\end{array}\hfill \\ {\text{X}}_{\text{p}}^{\text{MM}}(\text{R})={\text{X}}_{\text{p}}^{\text{PM},\text{N}}\cdot \left(\text{R}/{\text{s}}_{\text{p}}+1/{\text{n}}_{\text{p}}\right)\hfill \end{array}

(5)

Eq. (5) expresses the binding strengths, and thus via Eq. (1) the intensities, of the PM and MM probes as functions of the relative hybridization degree,

\text{R}\equiv {\text{R}}_{\text{p}}^{\text{PM}}=\frac{{\text{X}}_{\text{p}}^{\text{PM},\text{S}}}{{\text{X}}_{\text{p}}^{\text{PM},\text{N}}}=\frac{{\left[\text{S}\right]}_{\text{p}}}{{\left[\text{N}\right]}_{\text{c}}}\cdot \frac{{\text{K}}_{\text{p}}^{\text{PM},\text{S}}}{{\text{K}}_{\text{p}}^{\text{PM},\text{N}}}.

(6)

This S/N-ratio, R, gives the specific binding strength of the PM in units of the non-specific one. It can serve as a relative measure of the expression degree because it is directly proportional to the concentration of specific transcripts, [S]_{p}. The proportionality factor, K_{p}^{PM,S}/([N]_{c}·K_{p}^{PM,N}), depends on the probe, i.e., R scales the expression degree in a probe-specific fashion.
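A small consistency check of Eqs. (4) to (6), under hypothetical concentrations and constants: the binding strengths computed directly from Eqs. (2) and (3) coincide with the R-parameterized form of Eq. (5). The MM constants are derived from the PM constants through Eq. (4).

```python
def sn_ratio(s_conc, n_conc, k_pm_s, k_pm_n):
    """S/N-ratio R = ([S]/[N]) * (K^{PM,S}/K^{PM,N}), Eq. (6)."""
    return (s_conc / n_conc) * (k_pm_s / k_pm_n)


def pm_mm_strengths(r, x_pm_n, s_p, n_p):
    """Binding strengths X^PM(R) and X^MM(R), Eq. (5)."""
    x_pm = x_pm_n * (r + 1.0)
    x_mm = x_pm_n * (r / s_p + 1.0 / n_p)
    return x_pm, x_mm


# Hypothetical concentrations and PM constants; MM constants follow from Eq. (4).
S, N = 1e-11, 1e-9
K_PM_S, K_PM_N = 1e10, 1e6
S_P, N_P = 50.0, 0.8
K_MM_S, K_MM_N = K_PM_S / S_P, K_PM_N / N_P

# Direct route: Eqs. (2) and (3).
x_pm_direct = S * K_PM_S + N * K_PM_N
x_mm_direct = S * K_MM_S + N * K_MM_N

# Parameterized route: Eqs. (5) and (6).
R = sn_ratio(S, N, K_PM_S, K_PM_N)
x_pm_model, x_mm_model = pm_mm_strengths(R, N * K_PM_N, S_P, N_P)

print(R, x_pm_direct, x_pm_model, x_mm_direct, x_mm_model)
```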

Part a of Figure 1 shows the intensities of a typical PM/MM pair as a function of the parameter R (see Eq. (6)). The PM intensity increases sigmoidally from its minimum value, I_{p}(R = 0), to I_{p}(R = ∞) = M_{c}, at small and large abscissa values, respectively. In these limiting cases the probes are either exclusively non-specifically hybridized or completely saturated, with surface coverages of Θ_{p}^{PM}(0) = X_{p}^{PM,N}/(1 + X_{p}^{PM,N}) ≈ X_{p}^{PM,N} and Θ_{p}^{PM}(∞) = 1, respectively. The concentration and S/N-ratio at the inflection point of the isotherm, half-way between these values, are

\begin{array}{ccc}{\left[\text{S}\right]}_{\text{p}}^{50\%}=\frac{1+{\text{X}}_{\text{p}}^{\text{PM},\text{N}}}{{\text{K}}_{\text{p}}^{\text{PM},\text{S}}}\approx \frac{1}{{\text{K}}_{\text{p}}^{\text{PM},\text{S}}}& \text{and}& {\text{R}}^{50\%}=\frac{1+{\text{X}}_{\text{p}}^{\text{PM},\text{N}}}{{\text{X}}_{\text{p}}^{\text{PM},\text{N}}}\approx \frac{1}{{\text{X}}_{\text{p}}^{\text{PM},\text{N}}}\end{array},

(7)

respectively. They specify the condition at which 50% of the free probes available in the absence of specific transcripts become occupied. The approximations at the right-hand side of Eq. (7) refer to small X_{p}^{PM,N} << 1.
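The exact form of Eq. (7) can be checked numerically: at R^{50%} = (1 + X_p^{PM,N})/X_p^{PM,N} the PM coverage lies exactly half-way between Θ(0) and 1. The values of X_p^{PM,N} below are hypothetical; the largest one shows how the approximation R^{50%} ≈ 1/X_p^{PM,N} deteriorates when non-specific binding is not weak.

```python
def theta_pm(r, x_pm_n):
    """PM coverage as a function of R, Eqs. (1) and (5)."""
    x = x_pm_n * (r + 1.0)
    return x / (1.0 + x)


def r50_pm(x_pm_n):
    """S/N-ratio at the PM inflection point, exact form of Eq. (7)."""
    return (1.0 + x_pm_n) / x_pm_n


for x_pm_n in (1e-4, 1e-2, 0.5):
    r50 = r50_pm(x_pm_n)
    midway = 0.5 * (theta_pm(0.0, x_pm_n) + 1.0)   # half-way between Theta(0) and 1
    print(f"X^N = {x_pm_n:g}: R50 = {r50:.4g} (approx. 1/X^N = {1 / x_pm_n:.4g}), "
          f"Theta(R50) = {theta_pm(r50, x_pm_n):.6f}, midway = {midway:.6f}")
```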

The MM intensity responds in a similar fashion to that of the PM with increasing R (see part a of Figure 1). The limiting surface coverage of exclusively non-specifically hybridized MM probes at R = 0 differs from that of the PM (see Eq. (4)), Θ_{p}^{MM}(0) = X_{p}^{PM,N}/(n_{p} + X_{p}^{PM,N}) ≈ X_{p}^{PM,N}/n_{p}. The isotherm of the MM is clearly shifted to larger abscissa values in the intermediate R-range owing to its smaller binding strength for specific hybridization (s_{p} > 1, see above). For the inflection point of the isotherm one obtains, in analogy to Eq. (7),

\begin{array}{ccc}{\left[\text{S}\right]}_{\text{p}}^{50\%}={\text{s}}_{\text{p}}\cdot \frac{1+{\text{X}}_{\text{p}}^{\text{PM},\text{N}}/{\text{n}}_{\text{p}}}{{\text{K}}_{\text{p}}^{\text{PM},\text{S}}}\approx \frac{{\text{s}}_{\text{p}}}{{\text{K}}_{\text{p}}^{\text{PM},\text{S}}}& ;& {\text{R}}^{50\%}={\text{s}}_{\text{p}}\cdot \frac{1+{\text{X}}_{\text{p}}^{\text{PM},\text{N}}/{\text{n}}_{\text{p}}}{{\text{X}}_{\text{p}}^{\text{PM},\text{N}}}\approx \frac{{\text{s}}_{\text{p}}}{{\text{X}}_{\text{p}}^{\text{PM},\text{N}}}\end{array},

(8)

which shows that the horizontal shift between the PM- and MM-isotherms is approximately log(s_{p}) in the log-scale of both [S] and R; the two shifts coincide because R is proportional to [S]_{p} for a given probe pair (see Eq. (6)).
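A numerical check that does not rely on the closed-form expressions: the half-occupation point of each isotherm is located by bisection on a logarithmic R-scale, using only Eqs. (1) and (5). The probe-pair parameters and the conversion constant `K_PM_S` are hypothetical. Because [S]_p and R differ only by a probe-specific factor (Eq. (6)), the PM/MM shift comes out identical on both log scales and close to log s_p for small X_p^{PM,N}.

```python
import math


def coverage(x):
    """Langmuir occupancy, Eq. (1)."""
    return x / (1.0 + x)


def half_occupancy_r(theta_of_r, lo=1e-6, hi=1e15, steps=200):
    """Geometric bisection for the R at which Theta is half-way between Theta(0) and 1."""
    target = 0.5 * (theta_of_r(0.0) + 1.0)
    for _ in range(steps):
        mid = math.sqrt(lo * hi)       # midpoint on a log scale
        if theta_of_r(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical probe pair; K_PM_S only converts R into [S] = R * X^{PM,N} / K^{PM,S} (Eq. (6)).
X_PM_N, S_P, N_P, K_PM_S = 1e-4, 100.0, 0.8, 1e10

r50_pm = half_occupancy_r(lambda r: coverage(X_PM_N * (r + 1.0)))               # PM, Eq. (5)
r50_mm = half_occupancy_r(lambda r: coverage(X_PM_N * (r / S_P + 1.0 / N_P)))   # MM, Eq. (5)
s50_pm, s50_mm = r50_pm * X_PM_N / K_PM_S, r50_mm * X_PM_N / K_PM_S

print("shift in log R  :", math.log10(r50_mm / r50_pm))
print("shift in log [S]:", math.log10(s50_mm / s50_pm))
print("log s_p         :", math.log10(S_P))
```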

### The delta- and sigma-transformations

The MM probes were designed as a reference for estimating the non-specific background contribution to the respective PM intensity [2, 5, 25]. However, the "simple" subtraction of the MM intensity from that of the PM has partly failed as a correction method, because the two probes respond differently to non-specific and specific hybridization owing to their complementary middle bases; one consequence is the occurrence of negative PM-MM intensity differences [15].

According to the Langmuir model, the behaviour of the PM and MM can be understood on the basis of the same hybridization rules, while the two probe types differ in their effective association constants for probe/target dimerization (see above). The intensities of the PM and MM are consequently expected to correlate in a well-defined fashion. This mutual relation is determined by the mismatch design of the reference probe, by the particular probe sequences and by the concentrations of specific and of non-specific RNA fragments in the sample solution used for hybridization on the particular chip [18].

Let us empirically analyze the relation between the PM and MM signals in terms of two simple linear combinations of the log-intensities of a probe pair, namely their difference and average value,

\begin{array}{l}{\Delta}_{\text{p}}\equiv \Delta \mathrm{log}{\text{I}}_{\text{p}}=\mathrm{log}{\text{I}}_{\text{p}}^{\text{PM}}-\mathrm{log}{\text{I}}_{\text{p}}^{\text{MM}}\hfill \\ {\Sigma}_{\text{p}}\equiv \Sigma \mathrm{log}{\text{I}}_{\text{p}}=\frac{1}{2}\left(\mathrm{log}{\text{I}}_{\text{p}}^{\text{PM}}+\mathrm{log}{\text{I}}_{\text{p}}^{\text{MM}}\right)\hfill \end{array},

(9)

(log ≡ log_{10} is the decadic logarithm). The intensity model predicts for this transformation (see Eqs. (1) and (5))
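The Δ/Σ transformation of Eq. (9) and its inverse can be sketched in a few lines. The explicit positivity check is an added guard of this sketch: non-positive net intensities would make the logarithm undefined.

```python
import math


def delta_sigma(i_pm, i_mm):
    """Delta and Sigma coordinates of Eq. (9) from net (background-corrected) intensities."""
    if i_pm <= 0 or i_mm <= 0:
        raise ValueError("log-transformation needs positive net intensities")
    log_pm, log_mm = math.log10(i_pm), math.log10(i_mm)
    return log_pm - log_mm, 0.5 * (log_pm + log_mm)


def intensities_from(delta, sigma):
    """Inverse of Eq. (9): log I^PM = Sigma + Delta/2 and log I^MM = Sigma - Delta/2."""
    return 10 ** (sigma + 0.5 * delta), 10 ** (sigma - 0.5 * delta)


delta, sigma = delta_sigma(1000.0, 100.0)
print(delta, sigma)                    # Delta = 1.0, Sigma = 2.5
print(intensities_from(delta, sigma))  # recovers (1000.0, 100.0) up to rounding
```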

\begin{array}{l}{\Delta}_{\text{p}}(\text{R})={\Delta}_{\text{p}}^{\text{start}}+{\Delta}_{\text{p}}^{\text{Linear}}(\text{R})-\mathrm{log}\left\{\frac{{\text{B}}_{\text{p}}^{\text{PM}}(\text{R})}{{\text{B}}_{\text{p}}^{\text{MM}}(\text{R})}\right\}\hfill \\ \text{and}\hfill \\ {\Sigma}_{\text{p}}(\text{R})={\Sigma}_{\text{p}}^{\text{start}}+{\Sigma}_{\text{p}}^{\text{Linear}}(\text{R})-\frac{1}{2}\mathrm{log}\left\{{\text{B}}_{\text{p}}^{\text{PM}}(\text{R})\cdot {\text{B}}_{\text{p}}^{\text{MM}}(\text{R})\right\}\hfill \end{array},

(10)

with the "start", "linear" and the "saturation terms"

\begin{array}{ccc}{\Delta}_{\text{p}}^{\text{start}}=\mathrm{log}{\text{n}}_{\text{p}}& \text{and}& {\Sigma}_{\text{p}}^{\text{start}}=\mathrm{log}{\text{M}}_{\text{c}}-{\beta}_{\text{p}}\end{array}

\begin{array}{ccc}{\Delta}_{\text{p}}^{\text{Linear}}(\text{R})=\mathrm{log}\left\{\frac{(\text{R}+1)}{(\text{R}\cdot {10}^{-{\alpha}_{\text{p}}}+1)}\right\}& \text{and}& {\Sigma}_{\text{p}}^{\text{Linear}}(\text{R})=\frac{1}{2}\mathrm{log}\left\{\left(\text{R}+1\right)\cdot \left(\text{R}\cdot {10}^{-{\alpha}_{\text{p}}}+1\right)\right\}\end{array}

\begin{array}{ccc}{\text{B}}_{\text{p}}^{\text{PM}}(\text{R})=1+{10}^{-\left({\beta}_{\text{p}}-\frac{1}{2}{\Delta}_{\text{p}}^{\text{start}}\right)}\left(\text{R}+1\right)& \text{and}& {\text{B}}_{\text{p}}^{\text{MM}}(\text{R})=1+{10}^{-\left({\beta}_{\text{p}}+\frac{1}{2}{\Delta}_{\text{p}}^{\text{start}}\right)}\left(\text{R}\cdot {10}^{-{\alpha}_{\text{p}}}+1\right)\end{array}

respectively. The limiting values of Σ(R) and Δ(R) in the absence of specific transcripts (R = 0) are

\begin{array}{l}\begin{array}{ccc}{\Delta}_{\text{p}}(0)={\Delta}_{\text{p}}^{\text{start}}+{\text{o}}_{\Delta}& \text{and}& {\Sigma}_{\text{p}}(0)={\Sigma}_{\text{p}}^{\text{start}}+{\text{o}}_{\Sigma}\end{array}\hfill \\ \begin{array}{cccc}\text{with}& {\text{o}}_{\Delta}=\mathrm{log}\frac{1+{\text{X}}_{\text{p}}^{\text{PM},\text{N}}/{\text{n}}_{\text{p}}}{1+{\text{X}}_{\text{p}}^{\text{PM},\text{N}}}& \text{and}& {\text{o}}_{\Sigma}=-\frac{1}{2}\mathrm{log}\left(\left(1+{\text{X}}_{\text{p}}^{\text{PM},\text{N}}\right)\cdot \left(1+{\text{X}}_{\text{p}}^{\text{PM},\text{N}}/{\text{n}}_{\text{p}}\right)\right)\end{array}\hfill \end{array}.

(11)

In the limit of weak non-specific binding (X_{p}^{P,N} << 1) the o-terms vanish and the limiting Δ- and Σ-coordinates are given by their start values. The probe-specific exponents in Eq. (10) are defined as

\begin{array}{ccc}{\alpha}_{\text{p}}=\mathrm{log}\frac{{\text{s}}_{\text{p}}}{{\text{n}}_{\text{p}}}& \text{and}& {\beta}_{\text{p}}=\frac{1}{2}\mathrm{log}{\text{n}}_{\text{p}}-\mathrm{log}{\text{X}}_{\text{p}}^{\text{PM},\text{N}}\end{array}

(12)

In summary, the hyperbolic intensity functions of the PM and MM can be transformed into Δ and Σ coordinates which are governed by essentially four parameters, the start values Δ_{p}^{start} ≅ Δ_{p}(0) and Σ_{p}^{start} ≅ Σ_{p}(0) and the exponents *α*_{p} and *β*_{p}. They were chosen to provide a simple geometrical interpretation of the Δ-vs-Σ trajectory in terms of its start-coordinates and its vertical and horizontal dimension with respect to the start values (see below and Figure 1 and Figure 2).
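The following sketch compares the start/linear/saturation form of Eq. (10), with the parameters of Eqs. (11) and (12), against a direct evaluation of Eqs. (1) and (5) for a hypothetical probe pair. It uses the saturation term B_p^{MM}(R) = 1 + 10^{-(β_p + Δ_p^{start}/2)}(R·10^{-α_p} + 1), which is what Eq. (5) implies for 1 + X_p^{MM}.

```python
import math


def direct_delta_sigma(r, x_pm_n, s_p, n_p, m_c):
    """Delta and Sigma from the Langmuir intensities of Eqs. (1) and (5)."""
    x_pm = x_pm_n * (r + 1.0)
    x_mm = x_pm_n * (r / s_p + 1.0 / n_p)
    log_pm = math.log10(m_c * x_pm / (1.0 + x_pm))
    log_mm = math.log10(m_c * x_mm / (1.0 + x_mm))
    return log_pm - log_mm, 0.5 * (log_pm + log_mm)


def trajectory_parameters(x_pm_n, s_p, n_p, m_c):
    """alpha, beta (Eq. (12)) and the start values Delta^start, Sigma^start."""
    alpha = math.log10(s_p / n_p)
    beta = 0.5 * math.log10(n_p) - math.log10(x_pm_n)
    return alpha, beta, math.log10(n_p), math.log10(m_c) - beta


def model_delta_sigma(r, alpha, beta, d_start, s_start):
    """Delta(R) and Sigma(R) in the start/linear/saturation form of Eq. (10)."""
    mm_term = r * 10 ** -alpha + 1.0
    lin_d = math.log10((r + 1.0) / mm_term)
    lin_s = 0.5 * math.log10((r + 1.0) * mm_term)
    b_pm = 1.0 + 10 ** -(beta - 0.5 * d_start) * (r + 1.0)
    b_mm = 1.0 + 10 ** -(beta + 0.5 * d_start) * mm_term
    return (d_start + lin_d - math.log10(b_pm / b_mm),
            s_start + lin_s - 0.5 * math.log10(b_pm * b_mm))


PARAMS = dict(x_pm_n=1e-3, s_p=50.0, n_p=0.8, m_c=2e4)   # hypothetical probe pair
coeffs = trajectory_parameters(**PARAMS)
for r in (0.0, 1.0, 1e2, 1e4, 1e6):
    print(r, direct_delta_sigma(r, **PARAMS), model_delta_sigma(r, *coeffs))
```

Both routes agree to rounding precision, which illustrates that α_p, β_p and the two start values fully determine the trajectory once M_c is fixed.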

### The hybridization regimes

Parts b and c of Figure 1 show the transformed intensities taken from part a of the figure as functions of the parameter R = R_{p}^{PM}. The course of the log-difference, Δ_{p}(R), can be roughly divided into five regimes, which reflect different hybridization characteristics of the PM and MM probes with increasing degree of specific hybridization (see Figure 1, part b):

(1) **N-regime**: In the non-specific regime, at small values R → 0, both the PM and MM hybridize nearly exclusively with non-specific transcripts. Saturation can typically be neglected in this range (B_{p}^{P} ≈ 1, see Eqs. (10) and (11)). For X_{p}^{P,N} << 1 the limiting ordinate value estimates the ratio of the binding constants referring to the respective pair of complementary middle bases in the PM and MM sequences, Δ_{p}(0) ≈ log n_{p} (see Eq. (4)). We will use the approximation of weak non-specific binding throughout the paper.

(2) **mix-regime**: In the subsequent mixed regime, both specific and non-specific transcripts contribute significantly to the observed intensity of the probes. The log-difference Δ increases with increasing amount of specific transcripts. The positive slope of Δ_{p}(R) implies ∂Δ/∂R ~ (1-10^{-α}) > 0, and thus *α*_{p} > 0 or, equivalently, s_{p} > n_{p} (see Eq. (12)). The increase of Δ_{p}(R) consequently reflects the simple fact that the specific binding constant of the PM exceeds that of the respective MM, i.e., K_{p}^{PM,S} > K_{p}^{MM,S}, if one assumes K_{p}^{PM,N} ≈ K_{p}^{MM,N} (see below and Eq. (4)).

(3) **S-regime**: In the specific-regime the probes predominantly hybridize with specific transcripts. As a consequence, Δ_{p} reaches a maximum at \text{R}={\text{R}}_{\mathrm{max}}\approx {10}^{0.5\left({\alpha}_{\text{p}}+{\beta}_{\text{p}}\right)} with the ordinate value

{\Delta}_{\text{p}}({\text{R}}_{\mathrm{max}})\approx {\Delta}_{\text{p}}(0)+{\alpha}_{\text{p}}-\mathrm{log}\left\{\frac{1+{10}^{-0.5\left({\beta}_{\text{p}}-{\alpha}_{\text{p}}\right)}}{1+{10}^{-0.5\left({\beta}_{\text{p}}+{\alpha}_{\text{p}}\right)}}\right\}.

(13)

This rough approximation assumes |Δ_{p}(0)| << 1 < *β*_{p} and R_{max} >> 1. Under conditions of weak saturation, 0.5(*β*_{p} - *α*_{p}) >> 1, Eq. (13) simplifies to {\Delta}_{p}\left({R}_{\mathrm{max}}\right)\approx {\Delta}_{p}(0)+{\Delta}_{p}^{Linear}\left(\infty \right)={\Delta}_{p}(0)+{\alpha}_{p}\approx \mathrm{log}{s}_{p}>0 (see Eqs. (10) to (12)). Under these conditions the ordinate of the maximum directly provides the log-transformed PM/MM-ratio of the specific binding constants, log s_{p}, whereas its height above the start point equals *α*_{p}.

(4) **sat-regime**: In the saturation regime the probes become progressively saturated with bound transcripts (B_{p}^{P} > 1). This effect first and foremost affects the PM owing to its higher specific binding constant (see above). As a consequence, Δ_{p} starts to decrease.

(5) **as-regime**: At very large expression degrees both the PM and MM reach their maximum intensity upon complete saturation. In this asymptotic regime the trajectory approaches the abscissa for R → ∞, Δ_{p}(∞) ≈ 0 (see Eq. (10)).
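The rough estimates for the maximum of Δ_p(R) can be compared with a brute-force search. This sketch assumes a hypothetical probe pair with n_p = 1, so that Δ_p(0) = 0 as required by the approximation; the grid search is simply a convenient choice, not the procedure used in the paper.

```python
import math


def delta(r, x_pm_n, s_p, n_p):
    """Delta(R) = log(I^PM / I^MM) from Eqs. (1) and (5); M_c cancels."""
    x_pm = x_pm_n * (r + 1.0)
    x_mm = x_pm_n * (r / s_p + 1.0 / n_p)
    return math.log10((x_pm / (1.0 + x_pm)) / (x_mm / (1.0 + x_mm)))


# Hypothetical probe pair with n_p = 1, i.e. Delta(0) = 0 (weak non-specific binding).
X_PM_N, S_P, N_P = 1e-4, 100.0, 1.0
ALPHA = math.log10(S_P / N_P)                        # Eq. (12): 2
BETA = 0.5 * math.log10(N_P) - math.log10(X_PM_N)    # Eq. (12): 4

# Brute-force maximum on a grid of log10(R) from -2 to 8 in steps of 0.001.
log_r_max = max((k / 1000.0 for k in range(-2000, 8001)),
                key=lambda lr: delta(10 ** lr, X_PM_N, S_P, N_P))
delta_max = delta(10 ** log_r_max, X_PM_N, S_P, N_P)

delta_0 = delta(0.0, X_PM_N, S_P, N_P)
eq13 = delta_0 + ALPHA - math.log10((1 + 10 ** (-0.5 * (BETA - ALPHA)))
                                    / (1 + 10 ** (-0.5 * (BETA + ALPHA))))

print(f"log R_max: grid {log_r_max:.3f}, approx 0.5(alpha+beta) = {0.5 * (ALPHA + BETA):.3f}")
print(f"Delta_max: grid {delta_max:.3f}, Eq. (13) {eq13:.3f}, no-saturation limit {delta_0 + ALPHA:.3f}")
```

For these parameters the position of the maximum is reproduced closely, while Eq. (13) overestimates the height by a few hundredths of a decade, in line with its character as a rough approximation.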

The respective mean log-intensity, Σ_{p}(R), is shown in part c of Figure 1. It varies in a similar, sigmoidal fashion as the individual log-intensities of the PM and MM (compare parts a and c of Figure 1). Here the mix-, S- and sat-regimes merge into one region of increasing Σ, whereas the N- and as-regimes provide the minimum and maximum values, {\Sigma}_{\text{p}}(0)\approx {\Sigma}_{\text{p}}^{\text{start}} and Σ_{p}(∞) = log M_{c}, respectively. With Eqs. (11) and (12) one obtains for the difference

*β*_{p} ≈ Σ_{p}(∞) - Σ_{p}(0) > 0

(14)

*β*_{p} specifies the span between the maximum and minimum Σ-values. The Σ-coordinate of the maximum of Δ_{p}(R) at R = R_{max} becomes

{\Sigma}_{\text{p}}\left({\text{R}}_{\mathrm{max}}\right)\approx {\Sigma}_{\text{p}}(0)+\frac{1}{2}{\beta}_{\text{p}}.

(15)

Eqs. (15) and (14) provide {\Sigma}_{\text{p}}\left({\text{R}}_{\mathrm{max}}\right)-{\Sigma}_{\text{p}}(0)\approx \frac{1}{2}\left({\Sigma}_{\text{p}}\left(\infty \right)-{\Sigma}_{\text{p}}(0)\right)>0, i.e., the maximum of Δ_{p}(*R*) roughly bisects the total range of the Σ-coordinate.
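The bisection property can be checked with the same kind of hypothetical probe pair (α_p = 2, β_p = 4): the sketch locates the maximum of Δ_p(R) on a grid and compares the Σ-coordinates of the N-point, the maximum and the asymptote, taking Σ_p(∞) = log M_c as in the text.

```python
import math


def coords(r, x_pm_n, s_p, n_p, m_c):
    """(Delta, Sigma) of Eq. (9) evaluated with the Langmuir model, Eqs. (1) and (5)."""
    x_pm = x_pm_n * (r + 1.0)
    x_mm = x_pm_n * (r / s_p + 1.0 / n_p)
    lp = math.log10(m_c * x_pm / (1.0 + x_pm))
    lm = math.log10(m_c * x_mm / (1.0 + x_mm))
    return lp - lm, 0.5 * (lp + lm)


# Hypothetical probe pair: alpha = 2, beta = 4 (Eq. (12)).
X_PM_N, S_P, N_P, M_C = 1e-4, 100.0, 1.0, 2e4
BETA = 0.5 * math.log10(N_P) - math.log10(X_PM_N)

log_r_max = max((k / 1000.0 for k in range(-2000, 8001)),
                key=lambda lr: coords(10 ** lr, X_PM_N, S_P, N_P, M_C)[0])
sigma_0 = coords(0.0, X_PM_N, S_P, N_P, M_C)[1]
sigma_inf = math.log10(M_C)                     # Sigma(infinity) = log M_c
sigma_at_max = coords(10 ** log_r_max, X_PM_N, S_P, N_P, M_C)[1]

print("beta:", BETA, " Sigma(inf) - Sigma(0):", sigma_inf - sigma_0)
print("Sigma(Rmax) - Sigma(0):", sigma_at_max - sigma_0, " beta/2:", BETA / 2)
```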

### The Δ-vs-Σ trajectory

In the next step we plot the transformed intensities in Δ-vs-Σ coordinates (see Figure 2, part a). This presentation, also known as an M-vs-A plot (log-ratio vs. mean log-intensity), reflects the binding isotherms of a PM/MM probe pair. The resulting Δ-vs-Σ trajectory shows a characteristic curved shape with start-, end- and maximum-points referring to the S/N-ratios R = 0, R = ∞ and R = R_{max}, respectively. These points consequently mark the N-, as- and S-hybridization regimes. The mix- and sat-regimes can be attributed to the increasing and decreasing parts of the Δ-vs-Σ trajectory, respectively.

The parameters *α*_{p} and *β*_{p} define the height and the width of the obtained Δ-vs-Σ curve (see also Eqs. (13) and (14)). The Δ- and Σ-coordinates of the characteristic points depend on the PM/MM-ratios of the binding constants (see Eq. (4)), on the maximum intensity, Σ_{p}(∞) = logM_{c}, and on the mean intensity of the chemical background due to non-specific hybridization, Σ_{p}(0) ∝ log(I_{p}^{PM}(0)) + log(I_{p}^{MM}(0)). Hence, the Δ-vs-Σ trajectory links the observed probe intensities with essential hybridization characteristics in terms of simple geometric parameters.

### The horizontal scale of the Δ-vs-Σ trajectory

In Appendix A we show that the difference between the actual Σ-coordinate and its "asymptotic value", Σ_{p}-Σ_{p}(∞), estimates the mean probe coverage of the PM and MM probes,

\langle{\Theta}_{\text{p}}\rangle={10}^{\left({\Sigma}_{\text{p}}-{\Sigma}_{\text{p}}\left(\infty \right)\right)},

(16)

whereas the difference between the Σ-coordinate and its "start value", Σ_{p}-Σ_{p}(0) characterizes the relation between the amount of specific and non-specific hybridization in terms of the fraction of specifically occupied binding sites of the respective probe spot

\langle{\text{x}}_{\text{p}}^{\text{S}}\rangle\approx \frac{1-{10}^{-\left({\Sigma}_{\text{p}}-{\Sigma}_{\text{p}}\left(0\right)\right)}}{1-{10}^{-{\beta}_{\text{p}}}}.

(17)

Eqs. (16) and (17) provide mean values averaged over the respective PM/MM probe pair. The "individual" coverages of the PM and MM probes, Θ_{p}^{PM} and Θ_{p}^{MM}, and the respective fractions of specifically hybridized oligomers, x_{p}^{PM,S} and x_{p}^{MM,S}, additionally depend on the relative Δ-coordinates Δ-Δ(∞) and Δ-Δ(0), respectively (see Eqs. (42) and (45) in Appendix A).

Part b of Figure 2 shows the surface coverage and the fraction of specifically occupied oligomers for the Δ-vs-Σ trajectory plotted in part a of the figure. Note that x^{P, S} and Θ^{P} exponentially scale with the coordinate differences Σ-Σ(0) and Σ-Σ(∞), respectively (see Eqs. (17) and (16), respectively).

Consequently, the fraction of specifically occupied probes increases steeply in the rising part of the Δ-vs-Σ trajectory (mix-regime), whereas the probe coverage increases steeply in its decaying part (sat-regime). The contribution of non-specific hybridization and/or the effect of saturation of a particular probe can essentially be neglected if the distance of its Σ-coordinate from the start and/or end point exceeds unity. In particular, one obtains <x_{p}^{S}> > 0.9 for Σ-Σ(0) > 1 and <Θ_{p}> < 0.1 for Σ(∞)-Σ > 1.
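Eqs. (16) and (17) translate directly into code. The trajectory end points below are hypothetical; the second part checks, for a hypothetical probe pair, that 10^{(Σ_p - log M_c)} equals the geometric mean of the individual PM and MM coverages, which is how the "mean" coverage of Eq. (16) follows from the definition of Σ in Eq. (9).

```python
import math


def mean_coverage(sigma, sigma_inf):
    """<Theta_p> = 10^(Sigma - Sigma(inf)), Eq. (16)."""
    return 10 ** (sigma - sigma_inf)


def mean_specific_fraction(sigma, sigma_0, beta):
    """<x_p^S> = (1 - 10^-(Sigma - Sigma(0))) / (1 - 10^-beta), Eq. (17)."""
    return (1.0 - 10 ** -(sigma - sigma_0)) / (1.0 - 10 ** -beta)


# Hypothetical trajectory end points.
SIGMA_0, SIGMA_INF = 0.3, 4.3
BETA = SIGMA_INF - SIGMA_0              # Eq. (14)

print(mean_specific_fraction(SIGMA_0 + 1.0, SIGMA_0, BETA))   # slightly above 0.9
print(mean_coverage(SIGMA_INF - 1.0, SIGMA_INF))              # approximately 0.1

# Cross-check of Eq. (16): 10^(Sigma - log M_c) is the geometric mean of the PM and MM coverages.
x_pm_n, s_p, n_p, m_c, r = 1e-4, 100.0, 0.8, 2e4, 300.0
x_pm, x_mm = x_pm_n * (r + 1.0), x_pm_n * (r / s_p + 1.0 / n_p)
th_pm, th_mm = x_pm / (1 + x_pm), x_mm / (1 + x_mm)
sigma = 0.5 * (math.log10(m_c * th_pm) + math.log10(m_c * th_mm))
print(mean_coverage(sigma, math.log10(m_c)), math.sqrt(th_pm * th_mm))
```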

The horizontal shift between the PM and MM curves in part b of Figure 2 illustrates the "delayed response" of the MM with respect to the specific transcript concentration: the MM reach a given level of surface coverage and of the fraction of specifically bound probes at larger abscissa values, and thus at larger concentrations of specific transcripts, than the PM (see also Eqs. (7) and (8)).

The fraction of specifically bound probes directly transforms into the mean S/N-ratio of the PM and MM (see Appendix A and also Eq. (6)),

\langle\text{R}\rangle\approx \frac{{10}^{\left({\Sigma}_{\text{p}}-{\Sigma}_{\text{p}}\left(0\right)\right)}-1}{1-{10}^{\left({\Sigma}_{\text{p}}-{\Sigma}_{\text{p}}\left(\infty \right)\right)}}.

(18)

For abscissa values Σ < Σ(∞) -1, Eq. (18) simplifies into log(<R> + 1) ≈ Σ-Σ(0). Hence, the Σ-axis nearly linearly scales with the logarithm of the mean S/N-ratio. For the S/N-ratio of the PM, this equation modifies into \mathrm{log}({\text{R}}_{\text{p}}^{\text{PM}}+1)\approx \left(\Sigma -\Sigma (0)\right)+\frac{1}{2}\left(\Delta -\Delta (0)\right) (see Eq. (46) below), i.e., it depends in addition on the vertical coordinate of the Δ-vs-Σ trajectory.
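A sketch of Eq. (18) and of its PM variant. The first part uses hypothetical trajectory coordinates far below saturation; the second evaluates a hypothetical probe pair with the Langmuir model and checks that (Σ - Σ(0)) + (Δ - Δ(0))/2 tracks log(R_p^{PM} + 1) as long as saturation is weak.

```python
import math


def mean_sn_ratio(sigma, sigma_0, sigma_inf):
    """<R> from the Sigma-coordinate, Eq. (18)."""
    return (10 ** (sigma - sigma_0) - 1.0) / (1.0 - 10 ** (sigma - sigma_inf))


def model_coords(r, x_pm_n, s_p, n_p, m_c):
    """(Delta, Sigma) of a probe pair from Eqs. (1), (5) and (9)."""
    x_pm = x_pm_n * (r + 1.0)
    x_mm = x_pm_n * (r / s_p + 1.0 / n_p)
    lp = math.log10(m_c * x_pm / (1.0 + x_pm))
    lm = math.log10(m_c * x_mm / (1.0 + x_mm))
    return lp - lm, 0.5 * (lp + lm)


# Far below saturation, log(<R> + 1) is close to Sigma - Sigma(0).
r_mean = mean_sn_ratio(2.0, 1.0, 5.0)
print(math.log10(r_mean + 1.0))            # close to 1.0

# PM variant: log(R^PM + 1) ~ (Sigma - Sigma(0)) + (Delta - Delta(0)) / 2 (hypothetical pair).
PAIR = dict(x_pm_n=1e-5, s_p=100.0, n_p=0.8, m_c=2e4)
d0, s0 = model_coords(0.0, **PAIR)
for r in (10.0, 100.0, 1000.0):
    d, sg = model_coords(r, **PAIR)
    print(r, math.log10(r + 1.0), (sg - s0) + 0.5 * (d - d0))
```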

For intermediate abscissa values, Σ(0) + 1 < Σ < Σ(∞) - 1, the occupancy of the probe spots (Eqs. (16) and (42)) provides an approximation of the binding strength of specific hybridization of the PM and MM probes (Θ_{p}^{P} ≈ X_{p}^{P,S}, see also Eq. (1)) and of their mean,

\begin{array}{l}\begin{array}{ccc}\mathrm{log}{\text{X}}_{\text{p}}^{\text{PM},\text{S}}\approx -\left({\Sigma}_{\text{p}}\left(\infty \right)-{\Sigma}_{\text{p}}\right)+\frac{1}{2}{\Delta}_{\text{p}}& ;& \mathrm{log}{\text{X}}_{\text{p}}^{\text{MM},\text{S}}\approx -\left({\Sigma}_{\text{p}}\left(\infty \right)-{\Sigma}_{\text{p}}\right)-\frac{1}{2}{\Delta}_{\text{p}}\end{array}\hfill \\ \begin{array}{cc}\text{and}& \mathrm{log}{\text{X}}_{\text{p}}^{\text{S}}\equiv \frac{1}{2}\left(\mathrm{log}{\text{X}}_{\text{p}}^{\text{PM},\text{S}}+\mathrm{log}{\text{X}}_{\text{p}}^{\text{MM},\text{S}}\right)\approx -\left({\Sigma}_{\text{p}}\left(\infty \right)-{\Sigma}_{\text{p}}\right)\end{array}\hfill \end{array}.

(19)
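The estimates of Eq. (19) can be tested on a hypothetical probe pair whose coordinates lie in the intermediate range, Σ(0) + 1 < Σ < Σ(∞) - 1. The sketch computes Δ and Σ from the Langmuir model and compares the read-off values with the true specific binding strengths.

```python
import math

# Hypothetical probe pair and an intermediate S/N-ratio (Sigma(0) + 1 < Sigma < Sigma(inf) - 1).
X_PM_N, S_P, N_P, M_C, R = 1e-6, 100.0, 0.8, 2e4, 1e4

x_pm_s, x_mm_s = R * X_PM_N, R * X_PM_N / S_P            # specific strengths, Eqs. (4) and (6)
x_pm = x_pm_s + X_PM_N                                   # Eq. (2)
x_mm = x_mm_s + X_PM_N / N_P
log_pm = math.log10(M_C * x_pm / (1.0 + x_pm))           # Eq. (1)
log_mm = math.log10(M_C * x_mm / (1.0 + x_mm))
delta, sigma = log_pm - log_mm, 0.5 * (log_pm + log_mm)  # Eq. (9)
sigma_inf = math.log10(M_C)

# Eq. (19): estimates read off the trajectory coordinates.
est_pm = -(sigma_inf - sigma) + 0.5 * delta
est_mm = -(sigma_inf - sigma) - 0.5 * delta
est_mean = -(sigma_inf - sigma)

print(est_pm, math.log10(x_pm_s))
print(est_mm, math.log10(x_mm_s))
print(est_mean, 0.5 * (math.log10(x_pm_s) + math.log10(x_mm_s)))
```

The residual deviations of a few thousandths of a decade come from the small saturation and non-specific contributions that Eq. (19) neglects.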

In summary, the position of a probe point along the Σ-coordinate estimates the hybridization degree of the respective probe spot in terms of relative concentration measures characterizing the S/N-ratio (Eq. (18)), the relative occupancy of the probe oligomers with specific transcripts (Eq. (17)), their overall degree of occupancy (Eq. (16)) and the specific binding strength of the considered probe pair (Eq. (19)).

The probe coverage (Eq. (16)) provides an additional interpretation of the horizontal dimension of the Δ-vs-Σ trajectory: for the N-point, Σ = Σ(0), one obtains the coverage due to non-specific hybridization, \langle{\Theta}_{\text{p}}^{\text{N}}(\text{R}=0)\rangle\approx {10}^{\left({\Sigma}_{\text{p}}(0)-{\Sigma}_{\text{p}}(\infty )\right)}={10}^{-{\beta}_{\text{p}}}, because (almost) exclusively non-specific transcripts bind to the probes. Note that this "non-specific" coverage is exponentially related to the "width" exponent, *β*_{p} (see Eq. (14)), and thus to the horizontal distance between the N- and the as-points. The remaining unoccupied, i.e., free oligomers serve as potential binding sites for specific targets, \langle{\Theta}_{\text{p}}^{\text{free}}(\text{R}=0)\rangle=1-{10}^{-{\beta}_{\text{p}}}. The horizontal dimension of the Δ-vs-Σ trajectory consequently specifies the maximum amount of free probes available for specific binding at R = 0 and thus the measurement range of the probe spots for estimating the expression degree. The narrowing of the model curves reflects the diminishing capacity of the respective probes for specific transcript binding. Figure 3 (part a) illustrates the narrowing of the Δ-vs-Σ trajectory upon increasing the non-specific background contribution. The ideal limiting case *β* → ∞ consequently refers to hybridization without non-specific background.
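The relation between the width exponent and the non-specific and free coverages at R = 0 is simple arithmetic; the sketch below evaluates it for hypothetical values of X_p^{PM,N} (with n_p = 0.8) and compares 10^{-β_p} with the exact geometric mean of the PM and MM coverages from Eqs. (1) and (5).

```python
import math


def nonspecific_and_free(x_pm_n, n_p):
    """Return (beta, <Theta^N>(R=0), <Theta^free>(R=0)) following Eqs. (12) and (16)."""
    beta = 0.5 * math.log10(n_p) - math.log10(x_pm_n)
    theta_n = 10 ** -beta
    return beta, theta_n, 1.0 - theta_n


# Increasing non-specific binding strength narrows the trajectory (hypothetical values, n_p = 0.8).
for x_pm_n in (1e-5, 1e-3, 1e-1):
    beta, theta_n, theta_free = nonspecific_and_free(x_pm_n, 0.8)
    # Exact geometric mean of the PM and MM coverages at R = 0, Eqs. (1) and (5).
    th_pm, th_mm = x_pm_n / (1 + x_pm_n), (x_pm_n / 0.8) / (1 + x_pm_n / 0.8)
    print(f"X^N={x_pm_n:g}: beta={beta:.2f}, <Theta^N>={theta_n:.3g} "
          f"(exact {math.sqrt(th_pm * th_mm):.3g}), free={theta_free:.3f}")
```

For the largest X_p^{PM,N} the estimate 10^{-β_p} departs noticeably from the exact value, reflecting the weak-binding assumption behind Eq. (16) at R = 0.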

### The vertical scale of the Δ-vs-Σ trajectory

The Δ-coordinate of a probe is directly related to the so-called discrimination score, DS, which Affymetrix uses as a relative measure of the PM-MM intensity difference:

\begin{array}{ccc}{\Delta}_{p}=\mathrm{log}\frac{1+D{S}_{p}}{1-D{S}_{p}}\approx \frac{2}{\mathrm{ln}10}\cdot D{S}_{p}& \text{with}& D{S}_{p}=\frac{{I}_{p}^{PM}-{I}_{p}^{MM}}{{I}_{p}^{PM}+{I}_{p}^{MM}}\end{array}.

(20)

The discrimination score roughly estimates the fraction of the signal due to specific hybridization (see [26]). The approximation on the right-hand side of Eq. (20) holds for small values, |DS| << 1.
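Eq. (20) and its small-DS approximation can be sketched as follows; the intensity pairs are hypothetical. This is only the algebraic relation between DS and Δ, not a reimplementation of the MAS5 detection-call procedure.

```python
import math


def discrimination_score(i_pm, i_mm):
    """DS = (I^PM - I^MM) / (I^PM + I^MM), Eq. (20)."""
    return (i_pm - i_mm) / (i_pm + i_mm)


def delta_from_ds(ds):
    """Exact relation Delta = log((1 + DS) / (1 - DS)), Eq. (20)."""
    return math.log10((1.0 + ds) / (1.0 - ds))


def delta_from_ds_linear(ds):
    """Small-DS approximation Delta ~ 2 DS / ln 10."""
    return 2.0 / math.log(10.0) * ds


for i_pm, i_mm in ((105.0, 95.0), (300.0, 100.0), (100.0, 300.0)):
    ds = discrimination_score(i_pm, i_mm)
    print(f"DS = {ds:+.3f}  Delta = {delta_from_ds(ds):+.4f}  "
          f"log(PM/MM) = {math.log10(i_pm / i_mm):+.4f}  linear = {delta_from_ds_linear(ds):+.4f}")
```

For DS = 0.05 the linear form agrees with the exact value to about 3·10^{-5}, whereas for DS = 0.5 it underestimates Δ by roughly 0.04.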

The discrimination score serves as the basic parameter in the MAS5 algorithm for calculating the so-called detection call (DC), which judges the "presence" or "absence" of a gene. Hence, the vertical scale of the Δ-vs-Σ trajectory is related to the detection call: the higher the Δ_{p}-value of a probe, the higher the probability that the respective specific transcript is present in the hybridization solution. We will discuss this point in more detail in the accompanying paper in connection with our alternative method for classifying genes as present or absent (see below).

The vertical scale of the Δ-vs-Σ trajectory admits an additional interpretation in terms of the different strengths of the base pairings of the PM and MM probes. In particular, the Δ-coordinates of the N- and the S-points estimate the ratio of the binding constants of the PM and MM upon non-specific and specific hybridization according to Eqs. (4), (11) and (13). We have previously shown that the log-ratio of the binding constants of the PM and MM probes can be interpreted in terms of the effective free energy difference for duplex formation with the respective targets [15, 16, 18]. For the MM design used on GeneChip expression arrays, it roughly refers to the effective free energy change upon replacement of the Watson-Crick (WC) pairing in the middle position of the probe/target duplexes with the respective self-complementary (SC) pairing in the specific duplexes and with the complementary WC pairing in the non-specific duplexes, respectively, i.e.,

\begin{array}{l}\begin{array}{ccc}\mathrm{log}{\text{s}}_{\text{p}}\approx -\Delta {\epsilon}_{13}^{\text{WC}-\text{SC}}({\text{B}}_{\text{p}})& \text{and}& \mathrm{log}{\text{n}}_{\text{p}}\approx -\Delta {\epsilon}_{13}^{\text{WC}-\text{WC}}({\text{B}}_{\text{p}})\end{array}\hfill \\ \text{with}\hfill \\ \begin{array}{ccc}\Delta {\epsilon}_{13}^{\text{WC}-\text{SC}}({\text{B}}_{\text{p}})\equiv \left({\epsilon}_{13}^{\text{PM},\text{S}}({\text{B}}_{\text{p}})-{\epsilon}_{13}^{\text{MM},\text{S}}({\text{B}}_{\text{p}}^{\text{c}})\right)& \text{and}& \Delta {\epsilon}_{13}^{\text{WC}-\text{WC}}({\text{B}}_{\text{p}})\equiv \left({\epsilon}_{13}^{\text{PM},\text{N}}({\text{B}}_{\text{p}})-{\epsilon}_{13}^{\text{MM},\text{N}}({\text{B}}_{\text{p}}^{\text{c}})\right)\end{array}\hfill \end{array}.

(21)

Here Δ*ε*_{13}^{WC-SC} denotes the dimensionless free energy gain (given in units of the thermal energy, RT) upon replacements of the type B•b^{c} → B^{c}•b^{c} (i.e. WC → SC) for the base B_{p} = A, T, G, C at sequence position 13 of the probe (for example, C•g → G•g; upper case letters refer to the DNA-probe; lower case letters refer to the bound RNA-fragment, b = a, u, g, c; the superscript "c" indicates the respective complement). Accordingly, Δ*ε*_{13}^{WC-WC} is the respective free energy change upon WC-reversals, B•b^{c} → B^{c}•b (for example, C•g → G•c).

Hence, the ordinate position of the starting point of the Δ-vs-Σ trajectory estimates the effective free energy change upon reversing the central complementary WC pairing, i.e., Δ_{p}(0) ≈ -Δ*ε*^{WC-WC}(B_{p}) (see Eqs. (11) and (21)). The ordinate value of the maximum is related to the respective free energy change upon replacing the central WC pairing in the specific PM duplexes by the respective SC pairing in the MM duplexes, i.e., Δ_{p}(R_{max}) ≈ -Δ*ε*^{WC-SC}(B_{p}) (see Eqs. (13) and (21)).

Figure 3 illustrates that the maximum height of the Δ-vs-Σ trajectory starts to decrease for relatively small widths, corresponding to strong non-specific hybridization (*β*_{p} < 3), because saturation sets in already within the mix-range. In such cases the observed vertical dimension of the trajectory can underestimate the height parameter *α*_{p} (see Eq. (13)), which can, however, be obtained by appropriate curve fitting using Eq. (10) (see below).

In summary, the Δ-vs-Σ trajectory spans a natural, intrinsic metric system between distinctive points that characterize the binding thermodynamics of the probes of the particular microarray. The horizontal dimension characterizes the measurement range of the respective probe, whereas the vertical dimension reflects the free energy gain due to the change of the central base pairing in the respective PM and MM duplexes.

### Δ-vs-Σ trajectories of individual probes

Each probe is characterized by its "individual" Δ-vs-Σ trajectory, which describes the intensity change upon increasing content of S-transcripts in the range 0 ≤ R ≤ ∞. We used the Affymetrix HG-133 spiked-in data set (http://www.affymetrix.com/support/technical/sample_data/datasets.affx) to study the R-dependence of selected probes. This data set was generated by Affymetrix to calibrate the observed intensities on the basis of known transcript concentrations. In particular, transcripts of 42 selected genes were titrated at increasing concentrations onto a series of chips using a Latin-square design. The non-specific background was taken into account by adding to all hybridizations a HeLa cell line extract, which does not contain the spiked-in transcripts.
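A Latin-square design assigns each group of spiked-in transcripts every concentration level exactly once across the series of chips, while each chip carries every level exactly once. The cyclic construction below is a generic sketch of that idea; the group names and concentration levels are hypothetical and do not reproduce the actual layout of the Affymetrix data set.

```python
def cyclic_latin_square(groups, levels):
    """Chip k gives the j-th transcript group the level with index (j + k) mod len(levels)."""
    if len(groups) != len(levels):
        raise ValueError("a Latin square needs as many groups as levels")
    size = len(levels)
    return [
        {group: levels[(j + chip) % size] for j, group in enumerate(groups)}
        for chip in range(size)
    ]


# Hypothetical example: 4 transcript groups and 4 concentration levels (pM).
groups = ["G1", "G2", "G3", "G4"]
levels = [0.0, 0.5, 2.0, 8.0]
design = cyclic_latin_square(groups, levels)
for chip, assignment in enumerate(design, start=1):
    print(f"chip {chip}: {assignment}")
```

Because every group passes through all levels, each probe set is observed over the full titration range, which is what allows the R-dependence of individual probes to be followed.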

Part a of Figure 4 shows the trajectories of six selected probes together with fits by means of Eq. (10) (compare curves and symbols). The probe labels 1 to 6 are assigned in order of increasing number of C and decreasing number of A in the probe sequence (see Figure 4). The observed intensities, and thus also the trajectories, are functions of the binding constants for DNA/RNA duplex formation, which in turn depend on the sequences of the 25-mer probes. For example, the binding affinity of C•g WC-pairings exceeds that of A•u pairs in the hybrid duplexes. In general, probes with more cytosines are therefore expected to bind the RNA fragments more strongly than probes with more adenines. An increase of K_{p}^{P,N} (and thus of logX_{p}^{PM,N}, see Eq. (3)) is predicted to decrease the horizontal dimension, *β*_{p}, of the respective probe trajectory.

Indeed, increasing cytosine content narrows the trajectories by shifting their start point, Σ_{p}(0), towards larger abscissa values, whereas the end point, Σ_{p}(∞) = const., remains invariant; it is assumed to be the same for all probes because of their common maximum binding capacity. Note that the width of the trajectories, and thus the binding strength of the non-specific background, varies over about two orders of magnitude, logX_{p}^{PM,N} ≈ -4 to -2, for the six selected probes.
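The shift of the start point can be checked by evaluating the Langmuir isotherm at R = 0. The sketch assumes Σ = ½·log(I^{PM}·I^{MM}) with decimal logarithms, X^{MM,N} = X^{PM,N}/n with n = 1.2, and a hypothetical maximum intensity M = 1; it takes the horizontal width as Σ_p(∞) - Σ_p(0), with Σ_p(∞) = log M because both probes saturate. These are illustrative assumptions, not the authors' definition of *β*_{p}.

```python
import math


def start_point_and_width(log_x_n, n=1.2, M=1.0):
    """Return (Sigma(0), Sigma(inf) - Sigma(0)) for log10 X^{PM,N} = log_x_n."""
    x_pm = 10.0 ** log_x_n          # non-specific PM binding strength
    x_mm = x_pm / n                 # assumed MM strength, reduced by the ratio n
    i_pm = M * x_pm / (1.0 + x_pm)  # Langmuir net intensities at R = 0
    i_mm = M * x_mm / (1.0 + x_mm)
    sigma_start = 0.5 * math.log10(i_pm * i_mm)
    sigma_end = math.log10(M)       # R -> infinity: both probes saturate at I = M
    return sigma_start, sigma_end - sigma_start


for log_x_n in (-4.0, -3.0, -2.0):
    start, width = start_point_and_width(log_x_n)
    print(f"log X^N = {log_x_n:.0f}: Sigma(0) = {start:.2f}, width = {width:.2f}")
```

In the dilute limit the width is roughly -log X^{PM,N} + ½·log n, so under these assumptions the range logX ≈ -4 to -2 corresponds to widths of about 4 to 2 log units, while the end point stays fixed.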

The Δ-coordinates of the starting- and maximum-points of the selected probe-trajectories show considerable variation without obvious correlation to their sequence characteristics. We calculated the trajectories of all spiked-in probes (~500) using the results of our previous analysis of the hybridization isotherms (see refs. [18] and [16] for details) to estimate the variance of the positions of their starting- and maximum-points. The boxplot in part b of Figure 4 visualizes the center and the width of the distributions of the obtained Δ_{p}(0)- and Δ_{p}(R_{max})-data in vertical and horizontal directions.
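A box summary of such coordinate distributions reduces to the median and the 25% and 75% quartiles. A minimal sketch with Python's standard `statistics` module follows; the input values are hypothetical placeholders, not data from the spiked-in probes, and `statistics.quantiles` uses its default "exclusive" interpolation, which may differ slightly from the quartile rule of other boxplot tools.

```python
import statistics


def box_summary(values):
    """Return (Q1, median, Q3, IQR) of a sequence of trajectory coordinates."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return q1, median, q3, q3 - q1


# Hypothetical Delta_p(0) values for a handful of probes (illustration only).
delta_start = [0.02, 0.05, 0.07, 0.08, 0.10, 0.11, 0.13, 0.16, 0.21]
q1, median, q3, iqr = box_summary(delta_start)
print(f"Q1 = {q1:.3f}, median = {median:.3f}, Q3 = {q3:.3f}, IQR = {iqr:.3f}")
```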

The coordinates of the starting and maximum points of the individual probe trajectories depend mainly on the pairings of the middle bases in the non-specific and specific duplexes, respectively (see Eqs. (11) – (13) and (21)). To filter out the underlying sequence effects we calculated "mean" trajectories for all probe pairs with a given middle base (see Figure 4, part b). These middle-base related mean trajectories are shifted relative to one another in the vertical direction according to C ≈ T > G ≈ A for the N-point and C > G ≈ T > A for the S-point. This systematic trend agrees with Eq. (21), which predicts that the vertical positions of the N- and S-points are functions of the middle base of the respective probe sequences. The observed relations reflect the purine-pyrimidine asymmetry of the binding strength of complementary WC-pairings at the N-point (i.e., Δ*ε*_{13}^{WC-WC}(B) ≠ Δ*ε*_{13}^{WC-WC}(B^{c})) and the higher stability of WC-pairings compared with SC-mismatches at the S-point, Δ*ε*_{13}^{WC-SC}(B) (see Eq. (21)). Note that the specific binding constants of the PM probes exceed those of the MM probes on average by a factor of s ≈ 7, whereas for non-specific binding one obtains a mean ratio of n ≈ 1.2.
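Grouping probes by their middle base is a simple bookkeeping step: take the base at position 13 of each 25-mer (index 12 in zero-based Python) and average the start and maximum ordinates within each group. The function name and the tuple layout of the input are assumptions for illustration, and the toy probes below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def middle_base_means(probes):
    """Average (Delta_p(0), Delta_p(R_max)) per middle base.

    `probes` is an iterable of (sequence, delta_start, delta_max) tuples,
    where `sequence` is the 25-mer probe sequence.
    """
    groups = defaultdict(list)
    for sequence, delta_start, delta_max in probes:
        if len(sequence) != 25:
            raise ValueError(f"expected a 25-mer, got {len(sequence)} bases")
        groups[sequence[12]].append((delta_start, delta_max))  # position 13
    return {
        base: (mean(d0 for d0, _ in pairs), mean(dm for _, dm in pairs))
        for base, pairs in groups.items()
    }


# Hypothetical toy input: two probes with central C, one with central A.
toy = [
    ("A" * 12 + "C" + "A" * 12, 0.10, 0.80),
    ("T" * 12 + "C" + "T" * 12, 0.30, 1.00),
    ("G" * 12 + "A" + "G" * 12, 0.05, 0.60),
]
print(middle_base_means(toy))
```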

The comparison of the middle-base related trajectories with the widths of the N- and S-boxes indicates that the systematic middle-base effect explains the variability of the Δ_{p}(0)- and Δ_{p}(R_{max})-coordinates within the limits of their 25% and 75% quartiles. Taking the nearest neighbors of the middle base into account further broadens this range: for illustration, we show the mean trajectories for the "middle triples" CCC and CGC, which provide the strongest and weakest binding affinities, respectively, among the 64 possible combinations of three adjacent bases (see [18] and [13]).
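The 64 middle triples are all ordered combinations of three bases centered on position 13, i.e. positions 12 to 14 of the 25-mer. The sketch enumerates them and tallies how often each occurs as the middle triple in a set of probe sequences; the helper names and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product

# All 4**3 = 64 ordered triples of DNA bases.
TRIPLES = ["".join(t) for t in product("ACGT", repeat=3)]


def middle_triple(sequence):
    """Bases at positions 12-14 (1-based) of a 25-mer, centred on the middle base."""
    if len(sequence) != 25:
        raise ValueError(f"expected a 25-mer, got {len(sequence)} bases")
    return sequence[11:14]


def triple_counts(sequences):
    """Count probes per middle triple, listing all 64 triples (zero if absent)."""
    counts = Counter(middle_triple(s) for s in sequences)
    return {triple: counts[triple] for triple in TRIPLES}


probes = ["A" * 11 + "CCC" + "A" * 11, "T" * 11 + "CGC" + "T" * 11]
counts = triple_counts(probes)
print(len(TRIPLES), counts["CCC"], counts["CGC"], counts["AAA"])
```

Mean trajectories per triple would then be obtained exactly as for single middle bases, only with 64 groups instead of 4.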

In summary, the transformed intensity data of individual probes are well described by the Δ-vs-Σ trajectories predicted by the Langmuir-isotherms. The presented data illustrate the probe-specific variability of the Δ-vs-Σ trajectories due to sequence effects. The positions of the start- and maximum-points can be attributed to the differences between the PM and MM probe-sequences which affect the respective binding constants in a middle-base dependent fashion.