Linear model for fast background subtraction in oligonucleotide microarrays
- K Myriam Kroll^{1},
- Gerard T Barkema^{2, 3} and
- Enrico Carlon^{1}
DOI: 10.1186/1748-7188-4-15
© Kroll et al; licensee BioMed Central Ltd. 2009
Received: 21 July 2009
Accepted: 16 November 2009
Published: 16 November 2009
Abstract
Background
One important preprocessing step in the analysis of microarray data is background subtraction. In high-density oligonucleotide arrays this is recognized as a crucial step for the global performance of the data analysis from raw intensities to expression values.
Results
We propose here an algorithm for background estimation based on a model in which the cost function is quadratic in a set of fitting parameters such that minimization can be performed through linear algebra. The model incorporates two effects: 1) Correlated intensities between neighboring features in the chip and 2) sequence-dependent affinities for non-specific hybridization fitted by an extended nearest-neighbor model.
Conclusion
The algorithm has been tested on 360 GeneChips from publicly available data of recent expression experiments. The algorithm is fast and accurate. Strong correlations between the fitted values for different experiments as well as between the free-energy parameters and their counterparts in aqueous solution indicate that the model captures a significant part of the underlying physical chemistry.
Background
The fluorescence intensity I measured from a probe is the sum of a specific and a non-specific contribution:
I = I_{SP}(c) + I_{bg}     (1)
where I_{SP}(c) is the specific signal due to the hybridization of the surface-bound probe sequence with a complementary target sequence. This quantity depends on the concentration c of the complementary strand in solution (target). The non-specific term I_{bg} has different origins. It arises from spurious effects such as incomplete hybridization, in which probe sequences bind to only partially complementary targets, or from optical effects.
Models based upon the physical chemistry of hybridization (see e.g. [2]) predict a linear increase of the specific signal until saturation is approached. In the case of highly expressed genes the specific part of the signal I_{SP}(c) dominates the total signal intensity I, and hence one can safely make the approximation I ≈ I_{SP}(c). For lowly expressed genes, as well as for sequences with a low binding affinity, the specific and the non-specific contributions to the total intensity can be of comparable magnitude. In this case an accurate estimate of I_{bg} is crucial to draw reliable conclusions concerning the expression level; estimates based on the intensity distribution over the whole chip suggest that this is the case for roughly a quarter to a half of the probes [3]. Once the background is calculated, the gene expression level is computed from the background-subtracted data I - I_{bg}.
In this paper we present an algorithm for the calculation of the background level for Affymetrix expression arrays, also known as GeneChips. In these arrays the probe sequences come in pairs: for each perfect match (PM) probe, which is exactly complementary to the transcript sequence in solution, there is a second probe with a single non-complementary nucleotide with respect to the specific target. The latter is called the mismatch (MM) probe.
Several algorithms for background analysis of Affymetrix chips are available. Some of these use the MM intensities as corrections for non-specific hybridization, while others rely on PM intensities only. For instance, the Affymetrix MAS 5.0 software (Microarray Analysis Suite version 5.0) uses the difference (I^{PM} - I^{MM}) as an estimator of the specific signal; an adjusted MM intensity (ideal MM) is used when the MM intensity exceeds the PM signal [4]. The Robust Multiarray Algorithm (RMA) [5] uses a different type of subtraction scheme which does not involve the MM intensities. A more recent version of this algorithm, GCRMA, performs background subtraction using information on the probe sequence composition through the calculation of binding affinities [6]. The position-dependent nearest-neighbor model (PDNN) [7] fits the background intensity using weight factors which depend on the position along the probe; the free energy parameters then enter in a nonlinear function. In the VSN algorithm [8] a generalized log transform is used to background-correct the data. A study dedicated to the performance of different algorithms showed that the type of background subtraction used has a large effect on the global performance of the algorithms [9]. It is therefore not surprising that the background issue has attracted a lot of interest from the scientific community.
In this paper we present an algorithm for background estimation which combines information from the sequence composition and physical neighbors on the chip. This algorithm relies on previous work by the authors [10]. While the previous algorithm performed well with respect to the accuracy of the background estimation, the computational effort (per probe) involved was a severe limiting factor concerning its practical usability. The main cause of this significant computational effort was the iterative minimization of a cost function with nonlinear terms. The algorithm presented in this work involves a different cost function which is quadratic in the parameters. Its minimization can be performed via standard matrix computations of linear algebra. The algorithm is fast and accurate and is therefore suited for large scale analysis.
This paper is organized as follows. In Methods we discuss the optimization step via singular value decomposition and provide the details of the selected cost function. In Results a test of the algorithm on about 360 GeneChips from recent (2006 onwards) experiments from the Gene Expression Omnibus (GEO) is presented. Finally, the advantages of this scheme and its overall performance as a background subtraction method are highlighted.
Methods
Approach
where N_{f} is the number of fitting parameters. Ω_{iα} is a sequence- and position-dependent element of the N_{dim} × N_{f}-dimensional matrix Ω, which will be defined below.
(Note that M = Ω^{T}Ω is symmetric and dim(M) = N_{f} × N_{f}.)
of the optimal parameter values. If the matrix M is singular, M^{-1} has to be replaced by its pseudoinverse M^{+}, which can be obtained by means of singular value decomposition (SVD). In this work a standard SVD algorithm based on Golub and Reinsch is used (see e.g. [11, 12]).
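As an illustration of this linear-algebra step, the following minimal Python sketch solves the normal equations through an SVD-based pseudoinverse. It assumes that the right-hand side is Γ = Ω^{T}y, with y the vector of values being fitted (the defining equation is not reproduced in this text). The function name, the argument names and the relative cutoff rcond are illustrative choices, and NumPy's LAPACK-based SVD stands in for the Golub-Reinsch routine.

```python
import numpy as np


def solve_normal_equations(omega_matrix, targets, rcond=1e-10):
    """Least-squares parameters from M w = Gamma, using the pseudoinverse of M.

    omega_matrix : (N_dim, N_f) array, the matrix Omega
    targets      : (N_dim,) array of fitted values (assumed right-hand side)
    rcond        : relative cutoff below which singular values count as zero
    """
    m = omega_matrix.T @ omega_matrix      # M = Omega^T Omega, symmetric
    gamma = omega_matrix.T @ targets       # assumed form of the vector Gamma
    u, s, vt = np.linalg.svd(m)            # M = U diag(s) V^T
    # Invert only the singular values above the cutoff; the rest are set to 0.
    keep = s > rcond * s.max()
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    m_pinv = (vt.T * s_inv) @ u.T          # M^+ = V diag(1/s) U^T
    return m_pinv @ gamma
```

When Ω is rank deficient, the vector returned by this sketch is the minimum-norm least-squares solution, i.e. the same solution that np.linalg.lstsq returns for Ω and y.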
Due to the symmetry of M only half of the off-diagonal elements need to be generated, hence reducing the computational effort. For the chips tested in this paper with dimensions up to 1164 × 1164 features, the computational time on a standard PC (x86_64 Intel Core 2 Duo with 3 GHz, 3 GB RAM) required to estimate the background intensities is 8 to 10 seconds for the larger chips, and faster for the smaller ones.
This makes our algorithm an order of magnitude faster than our previous version [10], 3 to 5 times faster than GC-RMA, PDNN and MAS5, and about twice as slow as RMA and DFCM (Bioconductor packages were used for the testing). Note that for our algorithm, the time estimate includes both reading in the CEL-file and the background calculation, as it is done in one step.
This computation involves the generation of the matrix M and vector Γ (from Eqs. (6) and (7)), the SVD of M to solve Eq. (5) and the estimation of the background intensity for all PM probes through Eq. (2). Unlike other approaches in which the cost function is minimized by means of Monte Carlo methods [13] or other dynamical algorithms [10], the SVD solution provides the exact minimum of the cost function Eq. (3). Hence, there is no risk of getting stuck in local minima different from the global one.
Data Set - Parameter Optimization
As mentioned above, probes on Affymetrix arrays form PM/MM pairs. Consider now a target sequence at a concentration c in solution. The analysis of Affymetrix spike-in data (see e.g. [14]) shows that not only the PM signal but also the MM intensity increases with increasing target concentration c. This indicates that a single MM nucleotide only partially prevents probe-target hybridization. Therefore the intensity of MM probes can also be decomposed into a non-specific and a specific part as in Eq. (1). Supported by Affymetrix spike-in data analysis, our assumption is that the non-specific part of the hybridization is about equal for PM and MM probes: I_{bg}^{PM} ≈ I_{bg}^{MM}. The specific part of the signal is different in those two cases; equilibrium thermodynamics suggests a constant ratio I_{SP}^{PM}/I_{SP}^{MM}, independent of the target concentration, as observed in experiments [3].
These insights are useful for the selection of probes for the optimization set in Eq. (3): the set includes all MM probes whose intensities are below a certain threshold I_{0} and whose corresponding PM intensities also fulfill I^{PM} < I_{0} (a similar selection criterion was recently used by Chen et al. [15]). The threshold I_{0} is chosen on the basis of the total distribution of the intensities. The set contains a significant fraction of the mismatch probes: typically 35%. Since the specific signal of MM intensities is lower than that of their corresponding PMs, they provide more reliable information on the background. The coordinates and sequences of the probes in the set are then fitted to the intensities of these probes, yielding the parameters ω. With those newly acquired parameters ω the background signal of all MM probes is estimated based upon the assumption I_{bg}^{PM} ≈ I_{bg}^{MM}.
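The selection rule for the optimization set can be sketched in a few lines of Python. This is only an illustration: the function and argument names are hypothetical, and the threshold I_{0} is taken as an input because the text does not give a formula for deriving it from the intensity distribution.

```python
import numpy as np


def select_optimization_set(pm_intensity, mm_intensity, threshold):
    """Boolean mask of PM/MM pairs that enter the optimization set.

    A pair is kept when both its MM intensity and its PM intensity lie
    below the threshold I_0 (passed in as `threshold`).
    """
    pm = np.asarray(pm_intensity, dtype=float)
    mm = np.asarray(mm_intensity, dtype=float)
    return (mm < threshold) & (pm < threshold)


# Toy intensities for four PM/MM pairs (made-up numbers, not chip data).
pm_example = [100.0, 50.0, 300.0, 80.0]
mm_example = [60.0, 40.0, 90.0, 200.0]
mask_example = select_optimization_set(pm_example, mm_example, threshold=150.0)
```

In this toy example the third pair is rejected because its PM intensity exceeds the threshold, and the fourth because its MM intensity does.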
The matrix Ω
The choice of the matrix elements of Ω is dictated by input from physical chemistry as well as by the architecture of the microarray. Different schemes involving different choices for Ω with a varying number of parameters N_{f} were tested. Given a choice of Ω and in particular the number N_{f} of fitting parameters, the accuracy of the background estimation is reflected by the value of S from the minimization of Eq. (3). While the addition of fitting parameters always yields lower values of S, too large a set of fitting parameters runs the risk of "overfitting". The final choice of Ω is a compromise between a minimization of S and the use of the smallest possible set of parameters.
In the present model the number of parameters is N_{f} = 50. Similarly to the previous work [10], these parameters can be split into two groups: a first group describes the correlation of the background intensities with features which are physical neighbors on the chip; the second group consists of nearest-neighbor parameters which describe affinities for non-specific hybridization to the chip.
Physical Neighbors on the Chip
so that the intensities of the neighboring features explicitly enter the calculation of the background intensity of (x_{i}, y_{i}) as matrix elements Ω_{iα} (2 ≤ α ≤ 9). In analogy to Eqs. (9,10) we define Ω_{iα} with 10 ≤ α ≤ 18 corresponding to the sequences with a central pyrimidine.
Nearest-Neighbor Free Energy Parameters
according to the order given above. The sum runs over all the 24 dinucleotides along a probe sequence.
The matrix element Ω_{iα} is equal to the number of dinucleotides of a given type in the sequence s(i). For instance, if the sequence s(i) contains 4 dinucleotides of type CC and 2 of type GC, then Ω_{i,19} = 4 and Ω_{i,20} = 2. Hybridization thermodynamics predicts log I ∝ ΔG, where ΔG is the hybridization free energy.
In the nearest-neighbor model [17] the free energy is written as a sum of dinucleotide terms. Therefore, the parameters ω_{α} (19 ≤ α ≤ 34) are the analogues of the free energy parameters of the nearest-neighbor model.
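The dinucleotide counts that fill these matrix elements can be computed as in the following Python sketch. The column order used here (alphabetical over A, C, G, T) is an assumption made for illustration; it is not the ordering of the original model, in which CC and GC occupy the first two nearest-neighbor columns. All names are hypothetical.

```python
from itertools import product

import numpy as np

# Hypothetical column order: AA, AC, AG, AT, CA, ..., TT (16 types).
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]
INDEX = {dinuc: k for k, dinuc in enumerate(DINUCLEOTIDES)}


def dinucleotide_counts(sequence):
    """Number of occurrences of each dinucleotide type along a probe sequence."""
    counts = np.zeros(len(DINUCLEOTIDES), dtype=int)
    # A sequence of length n contains n - 1 overlapping dinucleotides.
    for left, right in zip(sequence, sequence[1:]):
        counts[INDEX[left + right]] += 1
    return counts


# Toy 25-mer (not a real probe): the period-4 pattern gives 24 dinucleotides.
example_probe = "ACGT" * 6 + "A"
example_counts = dinucleotide_counts(example_probe)
```

For any 25-mer the counts add up to 24, the number of dinucleotides along the probe.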
Position-Dependent Nearest-Neighbors
where l_{m} = 12.5, i.e. each dinucleotide is given a parabolic weight according to its position relative to the center l_{m} of the sequence. Thus, possible "unzipping" effects of the DNA-RNA duplex are approximately accounted for by Eq. (14).
The introduction of a position-dependence effect is in analogy with work done by other groups [7, 16, 18, 19]. However, we do not introduce a position-dependent weight for each position along the 25-mer sequences. Instead, we limit ourselves to a parabolic modulation of the parameters along the chain, which drastically reduces the number of parameters involved in the model.
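A parabolic modulation of this kind might be coded as in the sketch below. Since the defining equation is not reproduced in this text, the exact form is an assumption: each dinucleotide at position l = 1, ..., 24 is taken to contribute (l - l_{m})^{2} to the column of its type, with l_{m} = 12.5. The column order and all names are hypothetical.

```python
from itertools import product

import numpy as np

# Hypothetical column order: AA, AC, AG, AT, CA, ..., TT (16 types).
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]
INDEX = {dinuc: k for k, dinuc in enumerate(DINUCLEOTIDES)}
L_M = 12.5  # center of the dinucleotide positions 1..24 of a 25-mer


def position_weighted_counts(sequence, l_m=L_M):
    """Sum of the assumed parabolic weights (l - l_m)^2 per dinucleotide type."""
    weights = np.zeros(len(DINUCLEOTIDES))
    pairs = zip(sequence, sequence[1:])
    for position, (left, right) in enumerate(pairs, start=1):
        weights[INDEX[left + right]] += (position - l_m) ** 2
    return weights


# Toy 25-mer (not a real probe).
toy_weights = position_weighted_counts("ACGT" * 6 + "A")
```

Under this assumed form, dinucleotides near either end of the probe receive the largest weights, and the weights of any 25-mer add up to the same constant, the sum of (l - 12.5)^{2} over l = 1, ..., 24, which equals 1150.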
Invariances
since the shifting of Eq. (17) compensates the one introduced by Eq. (16). This reparametrization, valid for any real λ, leaves S invariant, and produces a zero eigenvalue of the matrix M of Eq. (6).
Similarly, one can verify that there is at least a second zero eigenvalue: a shift of the position-dependent nearest-neighbor parameters = _{ α }+ λ (for 35 ≤ α ≤ 50) as well as of , leaves S invariant. To obtain the latter equations Eq. (14) and have to be applied.
Having zero eigenvalues, the matrix M is therefore not invertible; the SVD thus provides the appropriate pseudo-inverse as discussed above. Accidental degeneracies or quasi-degeneracies of M could also occur, yielding eigenvalues close to zero in machine precision. These are, however, rare, and were actually never found in the calculations presented here.
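The origin of such a zero eigenvalue can be reproduced on a toy design matrix. The sketch below is not the 50-parameter model of this paper: it assumes only a constant column followed by 16 dinucleotide-count columns, built from random 25-mers, with hypothetical names throughout. Because the counts of every 25-mer add up to 24, shifting all dinucleotide parameters by λ and the constant by -24λ leaves every fitted value unchanged, so M = Ω^{T}Ω must have a vanishing eigenvalue.

```python
import numpy as np

LETTERS = "ACGT"


def toy_design_matrix(sequences):
    """Constant column followed by 16 dinucleotide-count columns."""
    rows = []
    for seq in sequences:
        counts = np.zeros(16)
        for left, right in zip(seq, seq[1:]):
            counts[4 * LETTERS.index(left) + LETTERS.index(right)] += 1
        rows.append(np.concatenate(([1.0], counts)))
    return np.array(rows)


rng = np.random.default_rng(0)
sequences = ["".join(rng.choice(list(LETTERS), size=25)) for _ in range(500)]
omega = toy_design_matrix(sequences)
m = omega.T @ omega

# eigvalsh is appropriate because M is symmetric; eigenvalues are ascending.
eigenvalues = np.linalg.eigvalsh(m)

# Direction of the invariance: constant shifted by -24, all counts by +1.
null_vector = np.concatenate(([-24.0], np.ones(16)))
residual = omega @ null_vector
```

In this toy setting the smallest eigenvalue of M vanishes to machine precision and the product Ω times the shift direction is identically zero, which is the situation in which the pseudoinverse replaces M^{-1}.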
Results
Overview over organisms and number of CEL-files analyzed
Organism | GEO # | Chiptype (dimension) | # files | S_{min} |
---|---|---|---|---|
A. Thaliana | GSE4847 | ATH1-121501 (712 × 712) | 18 | 0.0259 |
A. Thaliana | GSE7642 | ATH1-121501 (712 × 712) | 12 | 0.0544 |
A. Thaliana | GSE9311 | ATH1-121501 (712 × 712) | 8 | 0.0546 |
C. Elegans | GSE6547 | Celegans (712 × 712) | 25 | 0.0361 |
C. Elegans | GSE8159 | Celegans (712 × 712) | 7 | 0.0396 |
D. Melanogaster | GSE3990 | Drosophila_2 (732 × 732) | 6 | 0.0620 |
D. Melanogaster | GSE6558 | DrosGenome1 (640 × 640) | 24 | 0.0605 |
D. Rerio | GSE4859 | Zebrafish (712 × 712) | 8 | 0.0357 |
E. Coli | GSE11779 | E_coli_2 (478 × 478) | 3 | 0.0869 |
E. Coli | GSE2928 | Ecoli (544 × 544) | 12 | 0.0172 |
E. Coli | GSE6195 | E_coli_2 (478 × 478) | 4 | 0.0664 |
H. Sapiens | GSE10433 | HG-U133A_2 (732 × 732) | 12 | 0.0757 |
H. Sapiens | GSE5054 | HG-U133A (712 × 712), HG-U133A_2 (732 × 732) | 20 | 0.0392 |
M. Musculus | GSE7148 | HG-U133A (712 × 712) | 14 | 0.0296 |
M. Musculus | GSE8514 | HG-U133_Plus_2 (1164 × 1164) | 15 | 0.0738 |
M. Musculus | GSE11897 | MOE430A (712 × 712), MOE430B (712 × 712), Mouse430_2 (1002 × 1002) | 11 | 0.0640 |
M. Musculus | GSE6210 | Mouse430_2 | 12 | 0.0594 |
M. Musculus | GSE6297 | Mouse430_2 | 24 | 0.0325 |
O. Sativa | GSE15071 | Rice (1164 × 1164) | 20 | 0.1157 |
R. Norvegicus | GSE4494 | RG_U34A (534 × 534) | 59 | 0.0488 |
R. Norvegicus | GSE7493 | Rat230_2 (834 × 834) | 9 | 0.0497 |
R. Norvegicus | GSE8238 | Rat230_2 (834 × 834) | 4 | 0.0640 |
S. Aureus | GSE7944 | S_aureus (602 × 602) | 6 | 0.0746 |
S. Cerevisiae | GSE6073 | YG_S98 (534 × 534) | 12 | 0.0283 |
S. Cerevisiae | GSE8379 | YG_S98 (534 × 534) | 8 | 0.0180 |
X. Laevis | GSE3368 | Xenopus_laevis (712 × 712) | 20 | 0.0514 |
Optimized parameter values as obtained from the minimization of Eq. (3).
Organism | A. Thaliana | A. Thaliana | A. Thaliana | C. Elegans | C. Elegans | D. Melanogaster | D. Melanogaster | D. Rerio |
---|---|---|---|---|---|---|---|---|
GEO no | GSE4847 | GSE7642 | GSE9311 | GSE6547 | GSE8159 | GSE3990 | GSE6558 | GSE4859 |
ω_{1} | 0.178 | 0.196 | 0.261 | 0.379 | 0.520 | 0.270 | 0.421 | 0.338 |
ω_{2} | 0.050 | 0.056 | 0.042 | 0.047 | 0.050 | 0.022 | 0.025 | 0.041 |
ω_{3} | 0.051 | 0.056 | 0.041 | 0.034 | 0.042 | 0.024 | 0.024 | 0.050 |
ω_{4} | -0.013 | -0.012 | -0.013 | -0.012 | -0.016 | -0.005 | -0.011 | -0.012 |
ω_{5} | -0.013 | -0.011 | -0.012 | -0.010 | -0.012 | -0.003 | -0.010 | -0.008 |
ω_{6} | 0.186 | 0.198 | 0.224 | 0.168 | 0.271 | 0.228 | 0.195 | 0.140 |
ω_{7} | 0.004 | 0.005 | 0.003 | 0.003 | 0.005 | 0.001 | 0.005 | 0.006 |
ω_{8} | 0.002 | 0.009 | 0.003 | 0.002 | 0.005 | 0.001 | 0.006 | 0.002 |
ω_{9} | 0.013 | 0.027 | 0.010 | 0.008 | 0.008 | 0.009 | 0.012 | 0.014 |
ω_{10} | -0.174 | -0.192 | -0.258 | -0.375 | -0.517 | -0.267 | -0.418 | -0.334 |
ω_{11} | 0.063 | 0.060 | 0.059 | 0.058 | 0.062 | 0.042 | 0.057 | 0.068 |
ω_{12} | 0.069 | 0.064 | 0.064 | 0.052 | 0.059 | 0.046 | 0.061 | 0.075 |
ω_{13} | -0.017 | -0.015 | -0.016 | -0.011 | -0.012 | -0.012 | -0.013 | -0.016 |
ω_{14} | -0.018 | -0.016 | -0.019 | -0.010 | -0.009 | -0.011 | -0.014 | -0.012 |
ω_{15} | 0.258 | 0.299 | 0.328 | 0.301 | 0.459 | 0.336 | 0.316 | 0.246 |
ω_{16} | 0.004 | 0.006 | 0.005 | 0.005 | 0.009 | 0.002 | 0.008 | 0.007 |
ω_{17} | 0.002 | 0.007 | 0.003 | 0.003 | 0.006 | 0.001 | 0.007 | 0.004 |
ω_{18} | 0.013 | 0.025 | 0.012 | 0.012 | 0.015 | 0.009 | 0.016 | 0.016 |
ω_{19} | 0.068 | 0.138 | 0.143 | 0.149 | 0.155 | 0.159 | 0.217 | 0.140 |
ω_{20} | 0.052 | 0.191 | 0.169 | 0.090 | 0.157 | 0.124 | 0.152 | 0.140 |
ω_{21} | -0.021 | -0.038 | 0.006 | -0.033 | 0.030 | -0.060 | -0.033 | -0.069 |
ω_{22} | -0.021 | 0.051 | 0.056 | -0.025 | 0.072 | -0.084 | 0.003 | -0.068 |
ω_{23} | -0.032 | -0.162 | -0.147 | -0.038 | -0.121 | -0.060 | -0.094 | -0.104 |
ω_{24} | 0.041 | 0.075 | 0.072 | 0.024 | 0.015 | 0.030 | 0.048 | 0.035 |
ω_{25} | -0.063 | -0.221 | -0.162 | -0.122 | -0.140 | -0.172 | -0.200 | -0.190 |
ω_{26} | -0.060 | -0.130 | -0.119 | -0.118 | -0.109 | -0.173 | -0.154 | -0.181 |
ω_{27} | 0.000 | -0.008 | -0.047 | 0.035 | -0.041 | 0.052 | 0.035 | 0.055 |
ω_{28} | 0.036 | 0.148 | 0.076 | 0.062 | 0.063 | 0.105 | 0.096 | 0.128 |
ω_{29} | -0.026 | -0.058 | -0.069 | -0.046 | -0.044 | -0.065 | -0.085 | -0.052 |
ω_{30} | -0.050 | -0.032 | -0.093 | -0.065 | -0.052 | -0.106 | -0.102 | -0.078 |
ω_{31} | 0.057 | 0.037 | 0.056 | 0.082 | 0.012 | 0.133 | 0.103 | 0.130 |
ω_{32} | 0.093 | 0.180 | 0.173 | 0.125 | 0.112 | 0.211 | 0.181 | 0.221 |
ω_{33} | 0.011 | -0.074 | -0.021 | -0.008 | -0.026 | 0.015 | -0.036 | 0.001 |
ω_{34} | -0.009 | -0.017 | -0.016 | -0.023 | -0.014 | -0.035 | -0.047 | -0.022 |
Corr. Coeff | 0.775 | 0.686 | 0.738 | 0.814 | 0.700 | 0.754 | 0.826 | 0.705 |
ω_{35} | 0.020 | 0.016 | 0.016 | 0.019 | 0.013 | 0.016 | 0.012 | 0.018 |
ω_{36} | 0.020 | 0.010 | 0.012 | 0.021 | 0.011 | 0.016 | 0.016 | 0.017 |
ω_{37} | 0.025 | 0.025 | 0.022 | 0.029 | 0.019 | 0.028 | 0.027 | 0.031 |
ω_{38} | 0.024 | 0.019 | 0.019 | 0.028 | 0.016 | 0.028 | 0.023 | 0.031 |
ω_{39} | 0.026 | 0.036 | 0.034 | 0.030 | 0.030 | 0.027 | 0.031 | 0.033 |
ω_{40} | 0.021 | 0.021 | 0.020 | 0.026 | 0.021 | 0.022 | 0.023 | 0.026 |
ω_{41} | 0.028 | 0.039 | 0.035 | 0.036 | 0.031 | 0.035 | 0.038 | 0.039 |
ω_{42} | 0.027 | 0.032 | 0.031 | 0.035 | 0.028 | 0.033 | 0.033 | 0.038 |
ω_{43} | 0.023 | 0.024 | 0.026 | 0.024 | 0.023 | 0.019 | 0.021 | 0.021 |
ω_{44} | 0.020 | 0.013 | 0.017 | 0.022 | 0.015 | 0.015 | 0.017 | 0.016 |
ω_{45} | 0.024 | 0.028 | 0.028 | 0.030 | 0.024 | 0.027 | 0.029 | 0.029 |
ω_{46} | 0.025 | 0.025 | 0.029 | 0.030 | 0.024 | 0.028 | 0.029 | 0.030 |
ω_{47} | 0.020 | 0.022 | 0.020 | 0.022 | 0.021 | 0.015 | 0.019 | 0.016 |
ω_{48} | 0.017 | 0.012 | 0.012 | 0.019 | 0.013 | 0.010 | 0.014 | 0.011 |
ω_{49} | 0.023 | 0.029 | 0.026 | 0.028 | 0.023 | 0.023 | 0.028 | 0.026 |
ω_{50} | 0.023 | 0.023 | 0.023 | 0.028 | 0.021 | 0.024 | 0.026 | 0.026 |
Corr. Coeff | -0.679 | -0.618 | -0.679 | -0.759 | -0.631 | -0.683 | -0.783 | -0.658 |
Parameters of physical neighbors in the Chip
The parameters ω_{2} to ω_{9} and ω_{11} to ω_{18} describe the coupling of the background intensities to the physically neighboring features on the chip. As already mentioned, our estimate of the PM background is based on the non-specific intensity of the MM sequence. An Affymetrix chip is designed such that MM and PM are found in rows at equal y-coordinates. In addition, given a PM at (x, y), the corresponding MM feature is at (x, y + 1).
Influence of neighboring spot on background intensity in % of eight (randomly chosen) CEL-files of different organisms.
Organism | CEL-file (influence) | CEL-file (influence) |
---|---|---|
X. Laevis | 74500.CEL 35% | 76190.CEL 37% |
C. Elegans | 201989.CEL 43% | 201994.CEL 52% |
H. Sapiens | 263931.CEL 41% | 263930.CEL 39% |
S. Cerevisiae | 207569.CEL 29% | 207570.CEL 29% |
Nearest-Neighbor Parameters
The parameters ω_{α} (19 ≤ α ≤ 34) are the analogues of the nearest-neighbor free energy parameters. The nearest-neighbor model is commonly used to study the thermodynamics of hybridization of nucleic acids in solution (see e.g. [17]). In this model it is assumed that the stability and thus the hybridization free energy ΔG of a dinucleotide depends on the orientation and identity of the neighboring base pairs. For RNA/DNA duplexes there are 16 hybridization free energy parameters which were measured in aqueous solution by Sugimoto et al. [20].
Recent experiments [21] focusing on specific hybridization show a good degree of correlation between the hybridization free energies in solution and those directly determined from microarray data. Concerning background data, we also expect a certain degree of correlation between the parameters ω_{α} (19 ≤ α ≤ 34) and their corresponding Sugimoto free energy parameters.
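A correlation of this kind can be quantified with a Pearson coefficient, as in the short Python sketch below. Whether this is the exact statistic behind the correlation coefficients reported in the parameter table is an assumption; the function name is illustrative, and the reference free energies must be supplied by the user in the same dinucleotide order as the fitted parameters (no values from [20] are reproduced here).

```python
import numpy as np


def pearson_correlation(fitted_parameters, reference_free_energies):
    """Pearson correlation between fitted parameters and reference values."""
    fitted = np.asarray(fitted_parameters, dtype=float)
    reference = np.asarray(reference_free_energies, dtype=float)
    if fitted.shape != reference.shape:
        raise ValueError("both inputs must list the same dinucleotides")
    # corrcoef returns the 2 x 2 correlation matrix of the two inputs.
    return float(np.corrcoef(fitted, reference)[0, 1])
```

The sign of the coefficient depends on the sign convention chosen for the free energies, so a strong negative value can indicate agreement just as well as a strong positive one.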
Position-dependent nearest neighbor parameters
Comparing estimated vs. measured background
Discussion
We have presented a background subtraction scheme for Affymetrix GeneExpression arrays which is both accurate and usable on a standard x86_64 Intel Core 2 PC. The algorithm centers around a cost function which is quadratic in its fitting parameters. This allows for a rapid minimization through linear algebra, in particular through singular value decomposition. The accuracy of the present algorithm is very similar to that of a background algorithm previously presented by the authors [10]. The latter had been tested on Affymetrix spike-in data and its performance was compared to background schemes such as MAS5, RMA and GCRMA. Regarding spike-in data, the analysis had shown that the proposed algorithm is definitely more accurate than background computations done with MAS5 and RMA, and also improves on GCRMA [10].
The proposed algorithm has two categories of fitting parameters. The first category exploits correlations between features which are neighbors on the chip. The second category is based on the strong similarity between probe-target hybridization and duplex stability in solution, and involves stacking free energies in analogy to those in the nearest-neighbor model. Existing algorithms are either of the first [4] or the second [6, 7, 9] category, but not both.
The background subtraction scheme has been tested on 360 GeneChips from publicly available data of recent expression experiments. Since the fitted values for the same parameters in different experiments do not show much variation, the algorithm is robust and can be easily transferred to other experiments. Due to its speed and accuracy the present method is suited for large scale computations. An R-package integrating the background analysis scheme with the computation of expression values from background-subtracted data will be made freely available to the community (a preliminary version of this package can be found at http://itf.fys.kuleuven.ac.be/~enrico/ilm.html). The performance of this approach is discussed in [22].
Declarations
Acknowledgements
We acknowledge financial support from FWO (Research Foundation - Flanders) grant n. G.0311.08. Stimulating discussions with N. Naouar are gratefully acknowledged.
Authors’ Affiliations
References
1. Gentleman R, Carey V, Huber W, Irizarry R, Dudoit S: Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Statistics for Biology and Health, Springer; 2003.
2. Held GA, Grinstein G, Tu Y: Modeling of DNA microarray data by using physical properties of hybridization. Proc Natl Acad Sci. 2003, 100: 7575-7580.
3. Ferrantini A, Allemeersch J, Van Hummelen P, Carlon E: Thermodynamic scaling behavior in genechips. BMC Bioinformatics. 2009, 10: 3.
4. New statistical algorithms for monitoring gene expression on genechip probe arrays. Tech rep, Affymetrix; 2001.
5. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostat. 2003, 4 (2): 249-264. 10.1093/biostatistics/4.2.249.
6. Wu Z, Irizarry R, Gentleman R, Martinez-Murillo F, Spencer F: A Model-Based Background Adjustment for Oligonucleotide Expression Arrays. Journal of the American Statistical Association. 2004, 99 (468): 909. 10.1198/016214504000000683.
7. Zhang L, Miles MF, Aldape KD: A model of molecular interactions on short oligonucleotide microarrays. Nature Biotech. 2003, 21: 818. 10.1038/nbt836.
8. Huber W, von Heydebreck A, Sütmann H, Poustka A, Vingron M: Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics. 2002, 18 (Suppl 1): S96-104.
9. Irizarry RA, Wu Z, Jaffee HA: Comparison of Affymetrix GeneChip expression measures. Bioinformatics. 2006, 22 (7): 789.
10. Kroll KM, Barkema GT, Carlon E: Modeling background intensity in DNA microarrays. Phys Rev E. 2008, 77: 061915. 10.1103/PhysRevE.77.061915.
11. Golub GH, Reinsch C: Singular value decomposition and least squares solutions. Numer Math. 1970, 14: 403-420. 10.1007/BF02163027.
12. Golub GH, Van Loan CF: Matrix computations. The Johns Hopkins University Press, London; 1996.
13. Ono N, Suzuki S, Furusawa C, Agata T, Kashiwagi A, Shimizu H, Yomo T: An improved physico-chemical model of hybridization on high-density oligonucleotide microarrays. Bioinformatics. 2008, 24 (10): 1278-1285.
14. Burden CJ, Pittelkow Y, Wilson SR: Adsorption models of hybridization and post-hybridization behaviour on oligonucleotide microarrays. J Phys: Cond Matt. 2006, 18 (23): 5545. 10.1088/0953-8984/18/23/024.
15. Chen Z, McGee M, Liu Q, Kong M, Deng Y, Scheuermann R: A distribution-free convolution model for background correction of oligonucleotide microarray data. BMC Genomics. 2009, 10 (Suppl 1): S19.
16. Binder H, Preibisch S: Specific and nonspecific hybridization of oligonucleotide probes on microarrays. Biophys J. 2005, 89: 337.
17. Bloomfield VA, Crothers DM, Tinoco I: Nucleic Acids Structures, Properties and Functions. University Science Books, Mill Valley; 2000.
18. Naef F, Magnasco MO: Solving the riddle of the bright mismatches: Labeling and effective binding in oligonucleotide arrays. Phys Rev E. 2003, 68: 011906. 10.1103/PhysRevE.68.011906.
19. Zhang L, Wu C, Carta R, Zhao H: Free energy of DNA duplex formation on short oligonucleotide microarrays. Nucleic Acids Res. 2007, 35 (3): e18.
20. Sugimoto N, Nakano S, Katoh M, Matsumura A, Nakamuta H, Ohmichi T, Yoneyama M, Sasaki M: Thermodynamic Parameters To Predict Stability of RNA/DNA Hybrid Duplexes. Biochemistry. 1995, 34: 11211-11216.
21. Hooyberghs J, Van Hummelen P, Carlon E: The effects of mismatches on hybridization in DNA microarrays: determination of nearest neighbor parameters. Nucleic Acids Res. 2009, 37 (7): e53.
22. Mulders GC, Barkema GT, Carlon E: Inverse Langmuir method for oligonucleotide microarray analysis. BMC Bioinformatics. 2009, 10: 64.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.