Distributional fold change test – a statistical approach for detecting differential expression in microarray experiments

  • Vadim Farztdinov (corresponding author)

  • Fionnuala McDyer

      Algorithms for Molecular Biology 2012, 7:29

      DOI: 10.1186/1748-7188-7-29

      Received: 18 June 2012

      Accepted: 22 October 2012

      Published: 2 November 2012



      Because of the large volume of data and the intrinsic variation of data intensity observed in microarray experiments, different statistical methods have been used to systematically extract biological information and to quantify the associated uncertainty. The simplest method to identify differentially expressed genes is to evaluate the ratio of average intensities in two different conditions and consider all genes that differ by more than an arbitrary cut-off value to be differentially expressed. This filtering approach is not a statistical test and there is no associated value that can indicate the level of confidence in the designation of genes as differentially expressed or not differentially expressed. At the same time, the fold change by itself provides valuable information, and it is important to find unambiguous ways of using this information in expression data treatment.


      A new method of finding differentially expressed genes, called the distributional fold change (DFC) test, is introduced. The method is based on an analysis of the intensity distribution of all microarray probe sets mapped to a three dimensional feature space composed of average expression level, average difference of gene expression and total variance. The proposed method allows one to rank each feature based on the signal-to-noise ratio and to ascertain for each feature the confidence level and power for being differentially expressed. The performance of the new method was evaluated using the total and partial area under receiver operating characteristic curves, tested on 11 data sets from the Gene Expression Omnibus database with independently verified differentially expressed genes, and compared with the t-test and shrinkage t-test. Overall the DFC test performed the best: on average it had higher sensitivity and partial AUC, and its elevation was most prominent in the low range of differentially expressed features, typical for formalin-fixed paraffin-embedded sample sets.


      The distributional fold change test is an effective method for finding and ranking differentially expressed probesets on microarrays. The application of this test is advantageous to data sets using formalin-fixed paraffin-embedded samples or other systems where degradation effects diminish the applicability of correlation adjusted methods to the whole feature set.


      Differential expression; Microarray; Feature selection; Fold change; Statistical test; ROC curve; FFPE



      AUC: Area under ROC curve
      CAT: Correlation adjusted t (test)
      CAT(diag): CAT-test with option ‘diagonal’
      DE: Differentially expressed
      DEG: Differentially expressed gene
      DFC: Distributional fold change (test)
      EE: Equally expressed
      FC: Fold change
      FF: Fresh frozen
      FFPE: Formalin-fixed and paraffin-embedded
      FPR: False positive rate
      logFC: Logarithm to base 2 of fold change
      LTA: Logit transformed AUC, 0.5⋅ln(AUC/(1-AUC))
      MAS5: (Affymetrix) MicroArray Suite version 5
      RMA: Robust multi-chip average
      ROC: Receiver operating characteristic
      RT-PCR: Real-time polymerase chain reaction
      ShrinkT: Shrinkage t-test, same as CAT(diag)
      SPA: Standardized partial area under ROC curve
      TPR: True positive rate
      WAD: Weighted average difference


      Youden Index.


      The development of technology over the past two decades has established microarrays as a standard tool for genomic research and discovery [1, 2]. Nowadays, scientists can simultaneously measure the expression of tens of thousands of genes from an experimental sample and identify those genes that demonstrate a significant change in expression level under the impact of certain experimental conditions. Numerous methods have been proposed to determine differentially expressed genes (DEGs); see, for example, [2–9] and references cited therein. In the majority of cases, the utility of these methods was demonstrated by application to the analysis of expression levels of RNA extracted from fresh frozen (FF) tissue samples. However, clinical genomic research is often focused on retrospective studies, utilizing archival samples stored in formalin-fixed and paraffin-embedded (FFPE) blocks (note a). By nature of the fixation method, FFPE samples are partially degraded and contain low amounts of total RNA (see [10] and references therein for more details), leading to increased expression variability [10, 11]. This RNA degradation is dependent on a number of factors, including fixation protocol, storage time and storage conditions, with the resulting variability introducing a number of challenges for gene expression studies [10, 11]. Apart from high technical variance, FFPE samples typically exhibit low gene expression intensities and a compression of fold change across experimental groups relative to matched FF samples (see, for example, [11]), thereby compromising the ability to detect DEGs in samples preserved in this manner. Additionally, RNA transcripts from FFPE samples degrade at different rates and to different levels [11–13], which can introduce false negative and false positive correlations between the expression levels of genes. These differential degradation effects impede the direct application of correlation adjusted methods [14, 15] to FFPE samples, and a pre-selection of the most stable (decaying at the same rate) genes should be considered [12]. Therefore, the development of a method dedicated to the analysis of RNA differential expression from FFPE samples is necessary to support the many studies attempting to make discoveries from the wealth of FFPE archival material available. The absence of such a method is especially surprising in view of the enormous improvement of the methods and protocols for the extraction of RNA from FFPE samples in recent years [16].

      In order to shrink the large technical variance inherent in expression levels measured from FFPE tissue samples, one should have enough samples, N s >> 1. Typically microarrays have a very large number of probesets, N p > 10^4 [17]. Therefore FFPE-derived gene expression experiments fall within the N p >> N s >> 1 paradigm, with the associated complications for subsequent analysis [18]. If we assume that asymptotically N p → ∞, we may then introduce a dependence of the distributions of variables such as fold change and total variance on the expression level, and develop an approach where the estimated significance of a gene’s differential expression accounts for its expression level.

      Compression of the expression distribution in FFPE samples towards the lower side [10, 11] necessitates a DEG selection method that works equally well with features at any expression level. Spanning the full expression scale will enable the selection of features with low expression (typically comprising the main distribution of features in FFPE samples) as well as features with high expression.

      Summarizing the requirements for a successful DEG selection method for FFPE sample sets, we can say that it should work with a reasonable number of samples N s >> 1, pick up DEGs equally well at any expression level, and not be bound to a specific pre-processing method. The same requirements are actually applicable to a successful method working with samples obtained by any preservation method, be it FF, FFPE or some other [19, 20].

      In the following, we will use the term feature instead of probeset, transcript, gene, or protein, to emphasize that the methodology presented has general applicability.

      This paper presents the description of a method, called the distributional fold change (DFC) test, which is based on the analysis of the distribution of intensities of all features on a microarray mapped to a three dimensional feature space composed of the average difference of gene expression (logarithm of fold change), total variance and average expression level. It introduces a score based on signal-to-noise ratio that can be used for accurate ranking of DEGs independently of the expression range they come from – high, medium or low, which is extremely important for DEGs from FFPE samples. It also allows the introduction of a statistical (and expression dependent) threshold for the fold change and in this way removes one of the drawbacks of standard methods of filtering based on fold change – the arbitrariness of a cut-off value.

      We evaluate the performance of the new ranking method by comparison with the standard t-test (selected as a basic reference test) and with the shrinkage CAT-test [7, 14], which was shown [7] (see also [9]) to be a good representative of the set of methods [4–6] developed to stabilize gene expression variance. Accounting for variance in the data is very important for FFPE data sets, so in the performance evaluation of the DFC test we limited our comparison to these tests. An extended comparison of AUC values obtained by the DFC test with those from t-test based methods [4–7] and fold change based tests [9] is provided in Additional file 1. The MATLAB source code of the DFC test program is provided in Additional file 2.

      Data sets with established DEGs were selected for testing as these had been previously used for comparison of different methods for detecting differential expression [8, 9]. We limited our comparison to such real life data sets in order to exclude any possibility of bias that could foster the advantage of DFC test.


      Distributional fold change test: general approach

      In a two class comparison setting, the purpose of the DFC test is to select features based on the analysis of the difference between the average expressions in Class 1 and Class 2 respectively:

      $$ d = E[X_1] - E[X_2] \tag{1} $$

      Here X = log2(I), the logarithm to base 2 of intensity I. The variable d is also called logFC because of its close connection with the logarithm of fold change, which is usually defined as the ratio of mean intensities:

      $$ FC = \frac{E[I_1]}{E[I_2]} \tag{2} $$

      The connection between FC and d is FC = 2 d when expression variances in both classes are close (and/or when expectations in (2) are replaced by medians).
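As a minimal numerical sketch of the relation between d and FC, the snippet below computes d as the difference of class means on the log2 scale and compares 2^d with the ratio of mean intensities. The intensity values are invented for illustration and the function name `log_fc` is hypothetical. In this example the spread on the log2 scale is the same in both classes, so the two fold change estimates agree.

```python
import math
from statistics import mean

def log_fc(intensities_1, intensities_2):
    """d = mean(log2 I) in class 1 minus mean(log2 I) in class 2."""
    x1 = [math.log2(i) for i in intensities_1]
    x2 = [math.log2(i) for i in intensities_2]
    return mean(x1) - mean(x2)

# Invented intensities of one feature in two classes.
class_1 = [400.0, 800.0, 1600.0]
class_2 = [100.0, 200.0, 400.0]

d = log_fc(class_1, class_2)
fc_from_d = 2 ** d                               # fold change implied by d
fc_from_means = mean(class_1) / mean(class_2)    # ratio of mean intensities
print(d, fc_from_d, fc_from_means)
```

When the within-class variances differ, 2^d (a ratio of geometric means) and the ratio of arithmetic means will in general no longer coincide.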

      First, we assume that the log transformed intensities have independent normal distributions and therefore their means μ 1 = E[X 1] and μ 2 = E[X 2] and d, as their difference, also have normal distributions. The variance of d can then be estimated as a sum of variances var(μ 1) and var(μ 2):

      $$ \mathrm{var}(d) = \mathrm{var}(\mu_1) + \mathrm{var}(\mu_2) = \frac{v_1}{N_1} + \frac{v_2}{N_2} \tag{3} $$

      where v i is the variance of X within class i and N i is the number of samples in the corresponding class. It is generally accepted that, for small sample sizes, traditional estimation of variance can be inaccurate and therefore needs a stabilizing correction. We apply a minimal correction approach and use the following ansatz:

      Here http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-29/MediaObjects/13015_2012_160_IEq1_HTML.gif is an average variance of unregulated features having (nearly) the same expression (see eq. (9) below for definition of http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-29/MediaObjects/13015_2012_160_IEq2_HTML.gif ). Note that definition (4) extrapolates the variance from standard unbiased definition of variance when http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-29/MediaObjects/13015_2012_160_IEq3_HTML.gif and is equivalent to the definition from likelihood maximization when http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-29/MediaObjects/13015_2012_160_IEq4_HTML.gif . More complicated shrinkage approaches can be applied to improve test performance on data sets with very small sample size < 10.

      The analysis of microarray gene expression data has shown that distributions of d and total and internal variances are expression dependent (Figure 1). We will use a simple approximation of these dependencies as dependence on the mean expression μ = (μ 1 + μ 2)/2 only.
      Figure 1

      Distribution of features in the two-dimensional space of log2(variance) and average expression. Data for two different pre-processing methods: MAS5 – left panel and RMA – right panel from data set GSE6011 (see Table 1) consisting of 37 samples. Blue line provides the mean E[μ |log2(v T)] under fixed variance and red line the mean E[log2(v T) | μ] under fixed average expression. The following colour scheme is used for plotting 2D distribution: green – minimum (0), yellow – maximum. Bright yellow spots therefore indicate high density location of features. On each panel, marginalized distribution of features over variance is shown on the left side and marginalized distribution over average expression is shown at the bottom of the panel.

      Next, we suppose that all features on a microarray can be considered as a mixture of unregulated (equally expressed) and regulated (differentially expressed) features. We will also suppose, for simplicity, that the logFC distribution of unregulated features d 0 at each expression level, μ, can be described by a normal distribution N(0, σ 0(μ)^2).

      We are interested in finding features that are significantly different from unregulated features. Therefore we test the null hypothesis that the centre of a feature’s logFC distribution coincides with the centre of the unregulated features distribution: http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-29/MediaObjects/13015_2012_160_IEq6_HTML.gif . Note that this test differs from testing the hypothesis μ 1 − μ 2 = 0 in that it takes into account the null (unregulated) logFC distribution, which is supposed to be known and independent from the distribution of regulated features (the variance of the null distribution is further defined in the next section, see eq. (12)). A test statistic for evaluating the significance level of each feature with respect to this hypothesis is defined as the DFC-score:

      This statistic is an intermediate between the normal Z-statistic and T-statistic because of the presence of the variance of null features logFC distribution, which is expected to be (almost) independent of the sample size. Note that this definition of significance level statistic is similar to those of moderated t-statistics, used in a series of papers on variance stabilization [7] (and references cited therein), but principally differs from them in that the additional term v 0(μ) in variance is defined not through the variance of mean internal variance, but mainly through the variance of null features logFC distribution and only to a limited extent through the features’ internal variance.

      Even without knowing the exact statistic for the DFC-score, it can be used for ranking features and selection of a fixed number, or best fraction of features with highest score.
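The ranking step can be sketched as follows. Since the explicit formula of the DFC-score is not reproduced here, the code assumes a signal-to-noise form in which d is divided by the square root of var(d) plus a null-variance term v 0(μ); this form, the function names and all numbers are illustrative assumptions rather than the authors' implementation.

```python
import math

def dfc_like_score(d, var_d, v0):
    """Signal-to-noise score: logFC over the root of its variance plus a null term.

    The exact form of the DFC-score is an assumption of this sketch.
    """
    return d / math.sqrt(var_d + v0)

def top_k(features, k):
    """Return the names of the k features with the largest absolute score."""
    ranked = sorted(features, key=lambda f: abs(f["score"]), reverse=True)
    return [f["name"] for f in ranked[:k]]

# Invented example rows: (name, d, var(d), v0 at that expression level).
raw = [
    ("f1", 1.2, 0.04, 0.05),
    ("f2", -0.3, 0.01, 0.05),
    ("f3", 2.0, 0.50, 0.05),
    ("f4", -1.5, 0.02, 0.05),
]
features = [
    {"name": n, "score": dfc_like_score(d, v, v0)} for n, d, v, v0 in raw
]
best_two = top_k(features, 2)
print(best_two)
```

Note that the feature with the largest |d| (f3) is not ranked first here, because its large variance lowers its score.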

      Null (unregulated) features distribution and variance threshold

      Previously we supposed that we knew the properties of the null features distribution. Here we consider how one can establish them.

      As mentioned previously, the log fold change d and total variance v T depend on average expression μ. We suppose that the number of features is large enough to accurately define these dependences, which will be exact in the limit N p → ∞.

      Consider features in a slice (μ − ∆μ/2, μ + ∆μ/2) of the three dimensional space of log fold change d, log total variance log2 v T and average expression μ. With the assumption of N p → ∞, this slice can be made infinitesimally thin. The two-dimensional probability distribution f(log2 v T, d | μ) is used below to find the expectation of log variance LV = log2 v T, conditioned on the value of log fold change. According to our assumption, the unconditional distribution function can be considered as a mixture of unregulated (EE: equally expressed) and regulated (DE: differentially expressed) features

      $$ f(LV, d \mid \mu) = (1 - \pi)\, f_{EE}(LV, d \mid \mu) + \pi\, f_{DE}(LV, d \mid \mu) \tag{6} $$

      Here π is the prior probability of a feature to be differentially expressed and is supposed to be very small, π << 1. For unregulated features the probability distribution can be written as a product of two marginal distributions

      $$ f_{EE}(LV, d \mid \mu) = f^{M}_{EE}(LV \mid \mu)\, f^{M}_{EE}(d \mid \mu) \tag{7} $$
      Here and below http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-29/MediaObjects/13015_2012_160_IEq7_HTML.gif and http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-29/MediaObjects/13015_2012_160_IEq8_HTML.gif . Using (7) and notation
      we can rewrite eq. (6) in integral form
      The relationship (8) can be simplified if we find such LV and d values, at which F DE (LV, d|μ) < or ≈ F EE (LV, d|μ) and therefore with account of π <<1 one can replace the expression in curly brackets by 1. In Additional file 1 it is shown that this can be done for some range of |d| around d = 0 and LV < LV Th (μ), with the threshold value defined as
      In this range the eq. (8) can be reduced to
      We will suppose that approximation (10) holds for all d values, that is, for all d and all log2 v T < LV Th (μ) the distribution function f(LV, d|μ) ≈ f EE (LV, d|μ). The threshold (9) is an approximate way to separate a subset of unregulated (null) features:

      $$ \{ d_0(\mu) \} = \{ d(\mu) : \log_2 v_T < LV_{Th}(\mu) \} \tag{11} $$

      and can be used as a boundary to set up a variance filter. Its application to remove null features is shown in Figure 2. We supposed in the previous section that f EE M (d|μ) ~ N(0, σ 0(μ)^2). Based on approximation (10) and using the definition (11), the dependence σ 0(μ) can be estimated (note b) from the fit
      Figure 2

      Application of an expression dependent threshold (14). Scatterplot of features in the two-dimensional space of log2(variance) and average expression for two different pre-processing methods: MAS5 – left panel and RMA – right panel. Data from data set GSE6011 (see Table 1) consisting of 37 samples. Blue dots represent features satisfying condition (11) and therefore considered as coming from the null distribution. Green points represent features having total variance above the expression dependent threshold and considered as non-nulls. On each panel, marginalized distributions of all and non-null features over variance are shown on the left side and marginalized distributions of all and non-null features over average expression are shown at the bottom of the panel.
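One way to picture the estimation of the null spread σ 0(μ) is the sketch below. It is a simplified stand-in, not the fit used in the paper: features are binned by average expression, the variance threshold in each bin is taken to be the mean log2 total variance of that bin, and σ 0 is estimated as the root mean square of d (null centre at 0) over the features below the threshold. The binning, the threshold choice, the function name `estimate_sigma0` and the data are all assumptions made for illustration.

```python
import math
from statistics import mean

def estimate_sigma0(mu, d, log2_vt, bin_width=1.0):
    """Estimate the null logFC spread sigma_0 per expression bin.

    Assumptions of this sketch: the variance threshold in a bin is the mean
    log2 total variance of the bin, and sigma_0 is the root mean square of d
    over the features below that threshold.
    """
    bins = {}
    for m, di, lv in zip(mu, d, log2_vt):
        bins.setdefault(math.floor(m / bin_width), []).append((di, lv))
    sigma0 = {}
    for b, items in bins.items():
        threshold = mean([lv for _, lv in items])
        null_d = [di for di, lv in items if lv < threshold]
        if null_d:
            sigma0[b * bin_width] = math.sqrt(mean([x * x for x in null_d]))
    return sigma0

# Invented features: average expression, logFC and log2 total variance.
mu      = [6.1, 6.4, 6.8, 6.9, 10.2, 10.5, 10.7]
d       = [0.3, -0.3, 2.0, 0.1, 0.1, -0.1, 1.5]
log2_vt = [-3.0, -3.0, 1.0, -3.0, -5.0, -5.0, -1.0]

sigma0 = estimate_sigma0(mu, d, log2_vt)
print(sigma0)
```

In this toy data the high-variance features with large |d| are excluded from the null subset, and the estimated null spread is larger in the low-expression bin than in the high-expression bin.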

      Significance level and power for testing each individual feature

      The standard deviation σ 0(μ) reflects the expression dependence of the unregulated features probability distribution and, together with the significance parameter α (for Type I error), can now be used to set an expression dependent threshold on the absolute value of the logFC

      $$ \Delta_{1Th}(\alpha, \mu) = \sigma_0(\mu)\, \Phi^{-1}(1 - \alpha/2) \tag{13} $$

      Here Φ −1 is normal inverse cumulative distribution function. Below this threshold, all features are considered as having insufficient evidence for differential expression at the confidence level α. As this is specified for the null distribution obtained from analysis of all features on a microarray with nearly the same expression that is through sharing information across these features, the parameter α indicates the significance level of taking multiple testing into account. For α = 1 the threshold (13) turns to 0 and no information about multiple testing is included into finding differentially expressed features.
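A small sketch of such a threshold, assuming a two-sided form in which the null standard deviation is multiplied by the normal quantile at 1 − α/2. This form is an assumption of the example; it is chosen because it reproduces the stated property that the threshold turns to 0 for α = 1. The value of σ 0 is invented.

```python
from statistics import NormalDist

def logfc_threshold(sigma0, alpha):
    """Two-sided threshold on |logFC| for a null N(0, sigma0^2) at level alpha.

    Assumed form: sigma0 * Phi^{-1}(1 - alpha / 2).
    """
    return sigma0 * NormalDist().inv_cdf(1.0 - alpha / 2.0)

sigma0_example = 0.25                                    # invented null spread
threshold_005 = logfc_threshold(sigma0_example, 0.05)    # about 0.49
threshold_1 = logfc_threshold(sigma0_example, 1.0)       # no restriction
print(threshold_005, threshold_1)
```

Because σ 0 depends on the expression level, the same α translates into a different fold change cut-off at each expression level.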

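      To define a power (probability of not committing Type II error) of detecting a DE feature, we calculate from eq. (3) the standard deviation of d, with v i the variance within class i and N i the number of samples in class i,

      $$ s(\mu) = \sqrt{\frac{v_1}{N_1} + \frac{v_2}{N_2}} \tag{14} $$

      and use Student’s t(d(μ)/s(μ),DF) distribution with degrees of freedom DF,

      $$ DF = \frac{\left( v_1/N_1 + v_2/N_2 \right)^2}{\dfrac{(v_1/N_1)^2}{N_1 - 1} + \dfrac{(v_2/N_2)^2}{N_2 - 1}} \tag{15} $$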
      as an alternative distribution to impose statistical power requirements. Only features with power at least equal to 1− βTh above a level specified by the significance α shall pass the filter:

      Here T –1 is Student's t inverse cumulative distribution function. Note that in the definition of non-null features {d D (μ)}, the requirement for the variance to be above the threshold is also included in order to reflect that condition (11) was used to define properties of the null features distribution. The condition is not directly required and is optional in the software implementation (note c).

      Strictly speaking in (16.a) we should not assume that d(μ)/s(μ) follows the Student’s t-distribution as stabilized variances (4) are used to calculate s(μ) (14), but keeping in mind that Welch’s definition of degrees of freedom (15) is an approximate solution of Behrens-Fisher problem [21] and that correction (4) is small except in rare cases of very small number of samples, we suppose that the t-distribution is a sufficient approximation.
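For reference, the standard deviation of d and Welch's approximate degrees of freedom can be computed as in the sketch below. It uses plain unbiased class variances rather than the stabilized variances of definition (4), so it illustrates only the uncorrected case; the function name and the log2 expression values are invented.

```python
import math
from statistics import variance

def welch_s_and_df(x1, x2):
    """Standard deviation of d and Welch degrees of freedom for two classes."""
    n1, n2 = len(x1), len(x2)
    a = variance(x1) / n1          # variance of the class 1 mean
    b = variance(x2) / n2          # variance of the class 2 mean
    s = math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return s, df

# Invented log2 expression values of one feature in two classes.
x1 = [8.0, 9.0, 10.0, 11.0]
x2 = [7.0, 7.5, 8.0, 8.5]

s, df = welch_s_and_df(x1, x2)
print(s, df)
```

The resulting DF always lies between min(N 1, N 2) − 1 and N 1 + N 2 − 2, moving towards the lower bound as the class variances become more unequal.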

      The information obtained here can be used to calculate the power (of testing feature for being DE) conditional on significance level α, for selected features. For |log FC| > Δ 1Th (α, μ):
      Here T is Student's t cumulative distribution function. Note that conditions (16) can be transferred onto a requirement for fold change conditional power:

      Thus the DFC filter incorporates three different statistical filters: the multiple testing based threshold through parameter α, the t-test conditioned on the values of α through parameter β and the variance filter. Compared with a traditional fold change filter where the threshold is arbitrarily selected, the DFC threshold is defined by the features significance level and conditional power and depends on the properties of a particular data set. This method has the advantage of being self-adjusting through the accurate estimation of the unregulated features distribution d 0 and taking into account the d(μ) distribution of regulated features thus providing an option to impose power requirements. The two significance parameters, α and β, allow for a controlled tuning of filtering threshold.

      When α = 1, the method is reduced to the selection of features by a standard t-test with threshold p Th = 2β Th combined with the variance filter; when β Th = 0.5 (and α < 1) the method is reduced to selection based on the ‘Unusual Ratio’ variant of the fold change method (see, for example, [2]) with an internal definition of the null feature distribution. There is no need to set restrictive values for α and β; the standard settings α = 0.05 and β = 0.2 should be sufficient, as their purpose is to remove unregulated features. Once the (α, β Th ) selection criteria are applied and unregulated features removed, ranking of differentially expressed features can be performed by the DFC score (5) and used for selecting the best subset of differentially expressed features.

      Evaluation method

      To evaluate the performance of the DFC algorithm, we use the receiver operating characteristic (ROC) curve [22]. This is a graphical plot of the parametric dependence of the fraction of true positives τ = true positive rate (TPR) on the fraction of false positives η = false positive rate (FPR) as the number of features predicted to be differentially expressed (K or, equivalently, ν = K/N p ) varies. For a given range of η or τ, one ROC curve is better than another if it lies to the northwest of the other (τ is higher for fixed η, or η is lower for fixed τ).

      We use the area under ROC curve (AUC):

      $$ AUC = \int_0^1 \tau(\eta)\, d\eta \tag{19} $$

      as one of the criteria for comparison, because it has an important statistical property: the AUC of a test is equivalent to the probability that the test will rank a randomly chosen positive instance higher than a randomly chosen negative instance [23]. AUCs and ROC curves have been used in some previous works for comparison of different feature selection tests (see, for example, [7–9]), and are standard metrics used for the evaluation and comparison of diagnostic tests.
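The equivalence between the AUC and the ranking probability can be checked numerically. The sketch below computes the AUC in two ways for invented scores: as the fraction of (positive, negative) pairs in which the positive scores higher (ties counted as one half), and as the trapezoidal area under the empirical ROC curve. The function names and the scores are hypothetical.

```python
def auc_by_pairs(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def auc_by_roc(pos_scores, neg_scores):
    """Trapezoidal area under the empirical ROC curve over all score thresholds."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0
    for t in thresholds:
        tpr = sum(p >= t for p in pos_scores) / len(pos_scores)
        fpr = sum(n >= t for n in neg_scores) / len(neg_scores)
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_fpr, prev_tpr = fpr, tpr
    return area

# Invented scores: three true DEGs and five non-DEGs, with one tie at 2.1.
pos = [5.0, 3.2, 2.1]
neg = [4.0, 1.5, 1.0, 0.3, 2.1]

auc_pairs = auc_by_pairs(pos, neg)
auc_roc = auc_by_roc(pos, neg)
print(auc_pairs, auc_roc)
```

Both routes give the same value, which is why the AUC can be read directly as a measure of ranking quality.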

      The number of features on a microarray N p is usually extremely large (N p > 10^4) and is much higher than the number of true DEGs N T (less than 100 for data sets listed in Table 1): N p >> N T . This is even more valid for data sets from FFPE samples (see also section Background). Therefore, when dealing with FF and FFPE sample sets, it is of much higher interest to assess the performance of an algorithm relative to the ideal one for only a small fraction

      $$ \nu = K / N_p \ll 1 \tag{20} $$

      of best features selected by a method (say up to ν ~ 0.05, which for the HG-U133A microarray would correspond to ~ 1000 features). Taking into account the relation

      $$ \nu = \tau\, \frac{N_T}{N_p} + \eta \left( 1 - \frac{N_T}{N_p} \right) \tag{21} $$

      one can also use η to estimate ν (or vice versa), unless η drops to values below ~0.001.
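A short worked example shows why η approximates ν only while η is not too small. It relies on the counting identity K = TP + FP, that is K = τ·N T + η·(N p − N T), which is presumably the relation referred to above. N p = 22283 is the HG-U133A probeset count, while N T = 100 and the rates are illustrative values.

```python
def selected_fraction(tpr, fpr, n_true, n_total):
    """nu = K / N_p, with K = TP + FP = tpr * N_T + fpr * (N_p - N_T)."""
    k = tpr * n_true + fpr * (n_total - n_true)
    return k / n_total

N_P = 22283   # probesets on the HG-U133A microarray
N_T = 100     # illustrative number of true DEGs

nu_a = selected_fraction(tpr=0.9, fpr=0.05, n_true=N_T, n_total=N_P)
nu_b = selected_fraction(tpr=0.9, fpr=0.001, n_true=N_T, n_total=N_P)
print(nu_a, nu_b)   # close to 0.05 in the first case, about 0.005 in the second
```

At η = 0.05 the true positives contribute little to K and ν stays close to η, whereas at η = 0.001 the true positives dominate and ν is several times larger than η.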
      Table 1

      Data sets from GEO database

      GEO data set | Experiment summary/Title | N A | N B | N PC | N Ka
       | Study of whether inadequate protein intake differentially affects skeletal muscle transcript levels and expression profiles in older adults [24] | | | |
       | DNA methyltransferase 3B (DNMT3B) mutations in ICF syndrome [25] | | | |
      GSE2638 and 2639 | GSE2639: HUVEC were left untreated or stimulated for 5h with 2 ng/ml TNF. Comparison of the gene profiles revealed TNF-mediated gene expression changes in HUVEC [26]. Study TNF stimulated vs controls. | | | |
      GSE2638 and 2639 | GSE2638: HMEC cultures were left untreated or stimulated for 5h with 2 ng/ml TNF. Comparison of the gene expression profiles revealed the TNF-mediated gene expression changes [26]. Study HMEC vs HUVEC | | | |
       | Comparison of Hutchinson–Gilford Progeria Syndrome fibroblast cell lines to control fibroblast cell lines [27]. | | | |
       | Gene expression in Stage 1,2 Normal and Tumor kidney cancer [28] | | | |
       | Dioxin-induced gene expression changes in MCF-7 human breast cancer cells [29] | | | |
       | Comparison of transcriptional profiles of CD4+ and CD8+ T cells from HIV-infected patients and uninfected control group [30]. Study of CD4+ T cells | | | |
       | Comparison of transcriptional profiles of CD4+ and CD8+ T cells from HIV-infected patients and uninfected control group [30]. Study of CD8+ T cells | | | |
       | Expression data from quadriceps muscle of young DMD patients and age matched controls [31] | | | |
       | Total RNA from two commonly used choriocarcinoma cell lines, JEG3 and BeWo, are compared in this experiment to identify differentially expressed transcripts [32]. | | | |
      Total N P | | | | |

      Data sets from GEO database [33], used for testing the efficiency of the DFC test. Samples in all data sets were profiled on Affymetrix GeneChip HG-U133A microarrays with 22283 probesets. Abbreviations: N A – number of samples in condition A, N B – number of samples in condition B, N PC – number of probesets checked by RT-PCR. The total number of probesets checked by RT-PCR is 284. For easy access to detailed information on each data set, the last column gives N Ka – the data set’s number in the description file of ref [9].

      It is possible for a high-AUC test to perform worse than a low-AUC test in a specific region of ROC space. In our case, for evaluation of a method working well also with FFPE sample sets, the range (20) of small ν and η is of highest interest. Here, a more appropriate parameter is the partial AUC [22], which is defined as the area under the ROC curve when integration in (19) is carried out only up to η: pAUC(η) = ∫_0^η τ(η') dη'. For an ideal receiver τ(η) = 1, therefore pAUC ideal(η) = η, and the pAUC of a method, standardized on the pAUC of the ideal receiver, will be:

      $$ SPA(\eta) = \frac{pAUC(\eta)}{\eta} = \frac{1}{\eta} \int_0^{\eta} \tau(\eta')\, d\eta' \tag{22} $$

      We use standardized partial area (SPA) curves and their ratios as the main criteria for comparison. Note that SPA(1) = AUC, and that the value of the standardized partial area shows how close the performance of a method is to the performance of an ideal method in the range of FPR [0, η]. SPA can also be considered as the average TPR over the same range [0, η]. We use both AUC and SPA to assess the performance of the DFC test.
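The partial and standardized partial areas can be computed from a ranked feature list as sketched below. The ROC curve is built by calling the top K features positive for K = 0..N and is integrated with trapezoids up to η; SPA is the partial area divided by η. The function names and the ranked list are invented for illustration.

```python
def roc_points(ranked_labels):
    """ROC points (fpr, tpr) when the top K features are called positive, K = 0..N.

    ranked_labels: booleans, best-ranked feature first; True marks a true DEG.
    """
    n_pos = sum(ranked_labels)
    n_neg = len(ranked_labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for is_pos in ranked_labels:
        if is_pos:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def partial_auc(points, eta):
    """Integrate the piecewise-linear ROC curve from 0 to eta (trapezoids)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= eta:
            break
        if x1 > eta:  # clip the last segment at eta
            y1 = y0 + (y1 - y0) * (eta - x0) / (x1 - x0)
            x1 = eta
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

def spa(points, eta):
    """Standardized partial area: pAUC(eta) divided by the ideal value eta."""
    return partial_auc(points, eta) / eta

# Invented ranking of 8 features, 3 of which are true DEGs.
ranked = [True, True, False, True, False, False, False, False]
points = roc_points(ranked)
spa_full = spa(points, 1.0)    # equals the AUC
spa_low = spa(points, 0.2)     # average TPR over FPR in [0, 0.2]
print(spa_full, spa_low)
```

In this toy ranking the full AUC is high, while the SPA over the low-FPR range is noticeably lower, which is the kind of difference the SPA curves are meant to expose.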

      In situations typical of FFPE data sets, where N p >> N T, ROC curves on a linear scale of η are of little use and are much more informative on a logarithmic scale; hence we present our results on a log10 η scale.


      Data sets

      We evaluated the performance of the DFC test using 11 publicly available Homo sapiens microarray data sets, listed in Table 1, each of which has had a portion of its discovered DEGs experimentally validated by real-time polymerase chain reaction (RT-PCR). They are chosen from FF sample sets, listed and described in Ref. [9]. The selection of experimental data sets was based on the requirement that the total number of DEGs confirmed by RT-PCR should be above ~10 (see Additional file 1 for details of subset selection). Having a large number (>>1) of verified DEGs (note d) is important for building representative ROC curves and for the estimation of area and partial area under ROC curves.

      It is known [8] that the majority of true DEGs verified by RT-PCR in experimental studies on FF samples tend to have high expression levels. This was also exploited in some feature selection methods [9]. The DFC method is designed to pick up DEGs independent of their expression level and therefore should work in these as well as in FFPE data sets where the expression values tend to be comparatively lower.

      Following [8, 9] we consider that the evaluation of results based on real experimental data sets should take precedence over those based on artificial data sets. Therefore analysis of the test performance is based on real-world experimental data sets only.

      There are several methods available for pre-processing data profiled on Affymetrix microarrays [1, 34]. We used Affymetrix Expression Console with standard settings to apply two of the most frequently used pre-processing methods: MAS5 [35, 36], which is designed to work on a single chip basis, and RMA [37, 38], a multiarray-based approach. As can be seen from Figure 1, these two methods provide very different distributions of features in expression – variance space and we considered it sufficient to concentrate only on these two methods.


      Within the DFC algorithm, features are ranked on the basis of the Z d score (5) and their relevance to differential expression is assessed using two criteria (13,16): fold change should have an appropriate significance level < α and power > 1 – β Th. The latter two are complemented by the requirement that variance should be above a specified threshold. To create continuous ROC curves we set α = 1 and β Th = 0.5 and ranked features using Z d p-values, calculated based on the assumption that Z d follows a normal distribution (note e). Specific values of α and β Th define the starting point on the curve and their selection is equivalent to setting appropriate cut-off p-values. For the t-test and shrinkage t-test this is typically done by controlling the false discovery rate.

      Our aim is to develop and check the performance of a test for systems where technical variation is large (such as FFPE sample sets) and assessment of the reliability of detecting differential expression is of extreme importance. Therefore we compared the performance of the DFC test with t-test based methods: the standard t-test and the CAT-test [14] with the ‘diagonal’ option (note f). This option is equivalent [14] to the shrinkage t-test [7], which was shown [7, 9] (see also Additional file 1) to perform similarly to other variance stabilization derivatives of the t-test [4–6], and can be considered as their representative. The ordinary t-test is provided as a reference for the improvement of any t-test based method, which the DFC test and CAT test clearly are. According to [7] the ordinary t statistic shows average though never optimal performance (regardless of the variance structure across features). A detailed comparison of AUCs for the DFC test and a set of t-test based methods [4–7], as well as with the fold change test and its ad hoc modification, the weighted average difference (WAD) method [9], is presented in Additional file 1.

      The AUC values for MAS5- and RMA-pre-processed data for the selected experimental data sets (described in Table 1), are shown in Table 2. One can see that, on average, the DFC test achieves higher AUCs than the t-test and shrinkage t-test.
      Table 2

      AUC performance of DFC test, t-test, and shrinkage t-test

      GEO data set | N s | AUC for MAS5 pre-processed data | AUC for RMA pre-processed data

      AUC performance of DFC test, t-test, and shrinkage t-test on MAS5 and RMA pre-processed data from data sets described in Table 1. N s is the number of samples in the set. (a) ShrinkT-test values were calculated with the CAT-test [14], option ‘diagonal’. (b) The average was calculated for logit transformed AUC values, LTA = 0.5⋅ln(AUC/(1-AUC)), and then transformed back to the AUC scale.
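      The averaging rule in note (b) can be sketched as follows. The transformation is the one given in the legend; the function names and the three example AUC values are hypothetical and serve only to illustrate the calculation.

```python
import math


def logit_half(auc):
    """LTA = 0.5 * ln(AUC / (1 - AUC)); requires 0 < AUC < 1."""
    return 0.5 * math.log(auc / (1.0 - auc))


def inverse_logit_half(lta):
    """Map an LTA value back to the AUC scale."""
    return 1.0 / (1.0 + math.exp(-2.0 * lta))


def logit_mean_auc(aucs):
    """Average AUCs on the logit scale, then transform back."""
    lta = [logit_half(a) for a in aucs]
    return inverse_logit_half(sum(lta) / len(lta))


# Hypothetical AUC values (not taken from Table 2).
aucs = [0.90, 0.95, 0.99]
print(logit_mean_auc(aucs), sum(aucs) / len(aucs))
```

      For values close to 1 the logit average is pulled towards the larger AUCs, so it differs from the arithmetic mean; an AUC of exactly 0 or 1 cannot be transformed and would have to be handled separately.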

      To estimate the significance of differences in AUC values we applied a paired-sample single-sided t-test. The observed AUC values are very close to 1 and, consequently, their distributions and the distributions of their differences cannot be approximated well by normal distributions. To obtain a more comprehensive estimate of the significance of the differences, we applied a paired-sample single-sided Wilcoxon signed rank test to the AUC values and a paired-sample single-sided t-test to the logit transformed AUC values, 0.5⋅ln(AUC/(1-AUC)). The logit transformation [39] maps the interval (0,1) onto (−∞, +∞) and makes the transformed variables more normally distributed, and the t-test therefore more applicable. The results shown in Table 3 indicate that all differences are significant (at a significance level below 0.05).
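      The Wilcoxon part of this comparison can be sketched with an exact enumeration, which is feasible for 11 paired values (2^11 sign patterns). This is an illustrative standard-library implementation, not the software used in the paper; the function names are hypothetical and the AUC values below are invented for the example.

```python
from itertools import product


def average_ranks(values):
    """1-based ranks of values, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_one_sided(x, y):
    """Exact p-value for the alternative that x tends to exceed y (paired)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under the null every sign pattern is equally likely: enumerate them all.
    hits = total = 0
    for signs in product((0, 1), repeat=len(ranks)):
        total += 1
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus:
            hits += 1
    return w_plus, hits / total


# Hypothetical AUCs for 11 data sets (illustrative, not the paper's values).
auc_dfc = [0.991, 0.975, 0.962, 0.988, 0.947, 0.979, 0.956, 0.983, 0.969, 0.994, 0.951]
auc_t = [0.985, 0.960, 0.965, 0.970, 0.930, 0.971, 0.957, 0.962, 0.958, 0.990, 0.939]
w_plus, p_value = wilcoxon_one_sided(auc_dfc, auc_t)
print(w_plus, p_value)
```

      The enumeration grows as 2^n, so for larger sample counts a normal approximation or a library routine would be the practical choice.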
      Table 3

      Significance of differences in AUC

      Columns: DFC – t-test | DFC – CAT | DFC – t-test | DFC – CAT

      Rows: Wilcoxon on AUC | t-test on 0.5⋅ln(AUC/(1-AUC))

      Paired-sample single-sided Wilcoxon test p-values calculated for AUC and paired-sample single-sided t-test p-values calculated for logit transformed AUC, variable 0.5⋅ln(AUC/(1-AUC)).

      One of the most important characteristics of the method is its ability to find DEGs independently of the pre-processing method applied to the data. This should be evident from the AUC as an overall characteristic of the test’s performance. Calculation of correlation coefficients between (logit transformed) AUCs for MAS5 and RMA pre-processed data (see Table A4 in Additional file 1) showed that the DFC test has the highest correlation between AUCs (ρ DFC = 0.92), although its advantage is not large enough to make it significantly different from the other tests (ρ t-test = 0.88 and ρ shrinkT = 0.87): the differences in the correlation coefficients have p-values above 0.3 (see also Additional file 1 for a broader range of comparisons).
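      One common way to compare two correlation coefficients is the Fisher z-transformation. The text does not state which test was used, so the sketch below is an assumption; it also treats the two coefficients as coming from independent samples, which is a simplification here because all tests were run on the same 11 data sets. The coefficients and the sample size are those quoted above; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_difference(r1, r2, n1, n2):
    """One-sided p-value for r1 > r2 using Fisher's z-transformation."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    # Standard error of the difference of two independent transformed values.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    return z, 1.0 - NormalDist().cdf(z)


# rho(DFC) = 0.92 versus rho(t-test) = 0.88, each from 11 data sets.
z_stat, p_one_sided = fisher_z_difference(0.92, 0.88, 11, 11)
print(round(z_stat, 3), round(p_one_sided, 3))
```

      Under these assumptions the difference between 0.92 and 0.88 is well within sampling noise for 11 pairs, in line with the statement that the p-values exceed 0.3.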

      Figure 3 shows ROC and SPA curves for 3 out of the 11 analysed data sets, selected to represent different pre-processing methods and different numbers of features verified by RT-PCR. The first data set was pre-processed with MAS5 and has the highest number of samples. The other two data sets were pre-processed with RMA and have a reasonable number of samples and of features tested by RT-PCR. Curves for all data sets are provided in Additional file 1. One can see that, independent of the pre-processing method, the DFC test in general performs slightly better than CAT(diag) and much better than the t-test. This observation is confirmed when the 〈ROC|ν〉 and 〈SPA|ν〉 curves are compared. These curves are obtained by averaging the parametric dependences over all 11 data sets (indicated by angular brackets) at a fixed fraction ν of top ranked features selected. The dependences are shown in Figures 4 and 5 by thick lines, and the plots are provided for both pre-processing methods, MAS5 and RMA. To reveal the extent of variance in the data for each method, Figure 4 also shows thin lines drawn at half of the standard error above and below the corresponding average curve.
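      The parametric form of a ROC curve, with the fraction ν of top ranked features as the parameter, can be sketched as follows. The sketch assumes that every feature not verified by RT-PCR is counted as a negative and that the list contains at least one positive and one negative; the function names and the six-feature example are hypothetical.

```python
def roc_points(ranked_is_deg):
    """ROC coordinates for a ranked list of booleans (best-ranked first).

    Returns three lists indexed by the number k of selected features:
    nu = k / N, tau = true positive rate, eta = false positive rate.
    """
    n_total = len(ranked_is_deg)
    n_pos = sum(ranked_is_deg)
    n_neg = n_total - n_pos
    tp = fp = 0
    nu, tau, eta = [0.0], [0.0], [0.0]
    for k, is_deg in enumerate(ranked_is_deg, start=1):
        if is_deg:
            tp += 1
        else:
            fp += 1
        nu.append(k / n_total)
        tau.append(tp / n_pos)
        eta.append(fp / n_neg)
    return nu, tau, eta


def auc_trapezoid(tau, eta):
    """Area under the (eta, tau) curve by the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(tau)):
        area += (eta[i] - eta[i - 1]) * (tau[i] + tau[i - 1]) / 2.0
    return area


# Hypothetical ranking of six features; True marks a verified DEG.
ranked = [True, True, False, True, False, False]
nu, tau, eta = roc_points(ranked)
print(auc_trapezoid(tau, eta))
```

      Averaging τ over data sets at each fixed ν, as done for the 〈ROC|ν〉 curves, would additionally require interpolating every curve onto a common grid of ν values.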
      Figure 3

      Receiver operating characteristic curves (left panel) and standardized partial AUC curves (right panel) for different data sets. Upper row: data set GSE6011, 37 samples, MAS5 pre-processing, 10 true DEGs; middle row: data set GSE6344, 20 samples, RMA pre-processing, 19 true DEGs; lower row: data set GSE6740, 20 samples, RMA pre-processing, 62 true DEGs. To facilitate comparison of dependencies at low false positive rates, a log10 scale is applied (also in subsequent figures).

      Figure 4

      Average ROC curves. Average ROC curves for two different pre-processing methods: MAS5 – left panel and RMA – right panel. Data from 11 data sets having 284 true DEGs. Thick lines are 〈τ|ν〉 and thin lines represent 〈τ|ν〉 ± se(〈τ|ν〉)/2 (half of the standard error below and above the corresponding line of the same colour).

      Figure 5

      Average SPA curves. Average standardized partial area (SPA) curves for two different pre-processing methods: MAS5 – left panel and RMA – right panel. Data from 11 data sets having 284 true DEGs.

      The behaviour of the DFC test ROC and SPA curves displayed in Figures 4 and 5 agrees with what one would expect from a test performing better than the standard t-test on a reasonably sized (more than 10 samples) data set with ~ 100 differentially expressed features. When a high fraction of features, ν > 0.5, is taken as differentially expressed, the difference between the DFC test and the t-test is minimal, as both tests remove the most easily detectable, non-expressed features. When a very small fraction of features, ν ~ 1/N p, is taken as differentially expressed, so that only a few features are selected, the difference between the DFC test and the t-test is again small, as the differential expression of these few features should be very strong and they can be effectively selected by the t-test alone. One can expect an improvement of DFC over the t-test in the intermediate range (20).

      To quantify the improvement of the DFC test over the t-test and the CAT-test, we calculated the sensitivity ratios 〈τ(DFC)|ν〉 / 〈τ(other)|ν〉 and partial area ratios 〈SPA(DFC)|ν〉 / 〈SPA(other)|ν〉 as a function of ν (top fraction of ranked features). These are shown in Figures 6 and 7 for both pre-processing methods. One can see that the improvement over the t-test is significant (at the z-test level of ≤ 0.1) in the most important range (20). This is true for both the average sensitivity and the partial area increase. Taking into account the confidence intervals, the behaviour of the DFC test on MAS5 and RMA pre-processed data sets is equivalent. The increase in sensitivity 〈τ|ν〉 over the t-test is around 50–100% for 0.0003 < ν < 0.001; it then gradually decreases to ~ 0% at ν > 0.2, passing through ~ 30% at ν ~ 0.01. The partial area increase follows nearly the same dependence, except that it decreases gradually to ~ 2% at ν = 1.
      Figure 6

      DFC test sensitivity increase. DFC test sensitivity 〈τ|ν〉 increase over t- and CAT(diag)- test as a function of ν for two different pre-processing methods: MAS5 – left panel and RMA – right panel. Thick lines show the ratios and corresponding thin lines show ±1.28σ deviations from the ratio. Data from 11 data sets having 284 true DEGs.

      Figure 7

      DFC test partial area increase. Partial area 〈SPA|ν〉 increase over t- and CAT(diag)-test values for two different pre-processing methods: MAS5 – left panel and RMA – right panel. Thick lines are the ratios and corresponding thin lines show ±1.28σ deviations from the ratio. Data from 11 data sets having 284 true DEGs.

      The improvement of the DFC test over the CAT-test is confined to a narrower region. This can be clearly seen from Figure 7, where the improvement in the partial area under the ROC curve is significant only for ν > 0.0015. It decreases from ~ 30–50% to 10% as ν changes from 0.0015 to 0.01, and then gradually to ~ 1% at ν = 1.

      Using the data represented in Figure 4, one can also calculate the Youden Index (YI), which is the maximum difference between the true positive and false positive rates, YI = max(τ(ν)−η(ν)) [22]. The YI ranges between 0 for a random test and 1 for an ideal test. The threshold at the point ν max on the ROC curve corresponding to the YI is often taken to be the optimal threshold (see, for example, [12, 22]). Results for YI and ν max = argmax(τ(ν)−η(ν)) are provided in Table 4 and show that the DFC test outperforms the shrinkage CAT-test and the t-test: it has the highest YI and the lowest ν max. All data sets were profiled on Affymetrix GeneChip HG-U133A microarrays with 22283 probesets. Therefore the optimal range for the number of features selected is approximately (2.7–4) × 10^4 for the t-test, approximately (1.8–2.7) × 10^4 for the CAT-test and approximately (0.9–2) × 10^4 for the DFC test.
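      The Youden Index and ν max can be read off a tabulated ROC curve as in the sketch below. The curve points are invented for illustration and do not reproduce Figure 4 or Table 4; only the probeset count of 22283 is taken from the text, and the function name is hypothetical.

```python
def youden_index(nu, tau, eta):
    """Return (YI, nu_max) with YI = max(tau - eta) over the tabulated points."""
    best = max(range(len(nu)), key=lambda i: tau[i] - eta[i])
    return tau[best] - eta[best], nu[best]


# Hypothetical ROC points indexed by the selected fraction nu.
nu = [0.0, 0.1, 0.2, 0.5, 1.0]
tau = [0.0, 0.6, 0.9, 1.0, 1.0]
eta = [0.0, 0.05, 0.15, 0.48, 1.0]

yi, nu_max = youden_index(nu, tau, eta)

# Convert the optimal fraction into a number of features on an array
# with 22283 probesets (HG-U133A, as stated in the text).
n_features = round(nu_max * 22283)
print(yi, nu_max, n_features)
```

      If several points share the maximum, the sketch returns the one with the smallest ν, which corresponds to the shortest feature list.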
      Table 4

      Youden Index YI, 80% confidence interval (CI) for YI, and ν max for the DFC-, CAT- and t-test

      CI for YI: [0.78, 0.88] | [0.79, 0.90] | [0.72, 0.85] | [0.76, 0.87] | [0.78, 0.90]

      We have proposed a new method for removing non-differentially expressed features and ranking differentially expressed features from gene expression data.

      It was designed to work with expression data from microarrays containing a large number of features (N p > 10^4), allowing one to analyse the distribution of all features on a microarray mapped to a three dimensional space composed of the average difference of feature expression (logarithm of fold change), the total variance and the average expression level. A simple approach was introduced to define the expression dependent distribution of null features and to estimate their expression dependent average variance (9) and variance of logFC (12). These dependences are incorporated into the DFC test score Z d (5) for an individual feature, which in this way explicitly takes into account information about the presence of other features and can be used for accurate feature ranking.

      The definition of the score Z d (5) is similar to the moderated t-statistics used in a series of papers on variance stabilization ([1, 7] and references cited therein), but differs from them in principle in that the variance stabilization is defined through the variance of the null features logFC distribution (12) and only to a limited extent through the features’ internal variance.

      The same dependences (9) and (12) were used to introduce a statistical (and expression dependent) threshold for the fold change based on specification of the power 1 – β at a given significance level α. This method has the advantage of being self-adjusting through the accurate estimation of the unregulated features distribution f(d 0) and of taking into account the f(d|μ) distributions of regulated features, thus providing an option to impose power requirements. The two parameters, α and β, control Type I and Type II errors and allow the threshold (16), below which features are considered to have insufficient evidence to be called differentially expressed, to be tuned to the particular purposes of an experiment. One can show that all features passing the DFC test have (ordinary t-test) p-values below an expression dependent threshold, p < p Th (we use the notation p Th to distinguish it from α), which includes a correction dependent on the properties of the unregulated features distribution.

      When α = 1, the method reduces to selection of features by the t-test with threshold p Th = 2β Th (combined with the variance filter); when β Th = 0.5, the method reduces to selection based on the ‘Unusual Ratio’ variant of the fold change method [2] with an internal definition of the null features distribution. Once the selection criteria (α, β Th) are applied and the set of unexpressed features is removed, ranking of differentially expressed features can be performed by the DFC score (5).

      Standard approaches for multiple test correction [1, 2, 18] (and references therein) do not take into account expression dependence of the threshold (22). This problem will be considered in a separate publication. Here we note only that multiplicity correction affects only the arbitrary threshold choice and does not change the ranking of features [1]. Ranking of features with score (5) should be complemented with functional analysis (see, e.g. [1, chapter 5]) for final reduction of the number of false positives based on biological grounds.

      The definition of the Type II error (17) has some similarity with the re-centered t-statistic [40], but differs from the TREAT method in how the threshold is defined. In ref. [40] “a pre-specified threshold (τ) for the log-fold-change below which differential expression is not of material interest” [34] is introduced in order to address the thresholded null hypothesis H 0: |d| ≤ τ against the alternative H 1: |d| > τ. The relevance of a particular choice (τ = log2(1.1), τ = log2(1.5) or τ = log2(2) were used in [40] for three data sets) to a particular data set has to be independently verified, while in our approach the threshold (13) is 1) expression dependent and 2) defined through the significance parameter α, so that it fully reflects the properties of the particular experiment. Ranking of features in [40] is performed using the TREAT test p-value, which is equivalent to 2β (17) but with ∆1Th(α μ) replaced by an arbitrary threshold τ. The parameter β (conditional on the value of α, or of τ according to the definition in [40]) is good for defining the threshold (16) above which the differential expression of features can be considered reliably detected, but we believe it is not well suited for ranking features (see also [41] for a discussion of fold change and p-value cutoffs). The best parameter for this purpose is the signal-to-noise ratio Z d (5); as shown in the paper and in Additional file 1, it outperforms ranking by moderated t-test statistics and by fold change based methods.

      The performance of the DFC test was verified using 11 real experimental data sets, with DEGs independently verified by RT-PCR. Their selection was based on the requirement that each set contain a sufficiently large number of verified DEGs to build the AUC. The total number of verified DEGs in these data sets was 284. We demonstrated that the DFC test is significantly better than the t-test in terms of the total and partial area under receiver operating curves. The improvement was dramatic (on average > 30%) in the range of the number of selected features, K < 1000, that is most important for FF and FFPE sample sets.

      Some improvement was obtained in comparison with the shrinkage t-test [7, 14], which can be considered one of the best variance stabilizing methods, although the improvement in partial area under the ROC curve was within confidence limits (for a 0.1 confidence level) when the number of selected features was below ~30. Variance stabilization is very important for small data sets, although, as the comparison shows, even for medium-sized data sets of 10–30 samples the improvement can be significant. Taking into account that the DFC test was not optimized for variance stabilization (FFPE sample sets are seldom small), its performance can potentially benefit from the inclusion of expression dependent stabilization of variance.

      Analysis of correlation coefficients between AUCs for MAS5 and RMA pre-processed data showed that the DFC method works equally well with both pre-processing methods. The correlation is very high (ρ DFC = 0.92) and is higher (though not significantly) than for the other tests considered. This demonstrates that the DFC method accurately takes into account the expression dependence of fold change and total variance, which differ considerably between MAS5 and RMA pre-processed data; see, for example, Figure 1 for the variance dependences.

      As mentioned above, our comparison was limited to tests that take into account a feature’s variance (which is very important for FFPE data sets as they have high technical variance [10, 11]). The fold change test has no associated value that can indicate the level of confidence in the designation of a feature as DE. Its performance depends on the features’ variances, which can be very different for different pre-processing methods applied to the data [42]; see, for example, Figure 1 for a comparison of MAS5 and RMA pre-processed data. The fold change test was shown [7] to be good only if the features’ variances are all fairly similar. Based on this observation, and taking into account that the features’ variances are fairly similar for RMA pre-processed data in the high expression range (e.g., 9 – 12 in Figure 1) and decrease with expression for MAS5 pre-processed data (e.g., for expressions in the range 6 – 12 in Figure 1), one can expect that the fold change test should perform well on RMA pre-processed data when a small number of features is sought, and fail on MAS5 pre-processed data. In contrast, the WAD method [9] should perform well on data with variances inversely proportional to the expression. Therefore it should work well for MAS5 pre-processed data and fail on RMA pre-processed data. This agrees with the findings in [9] (see also Additional file 1). Nevertheless, when the set sizes and the number of independently verified features are restricted to be reasonable, N s and N PC > 10, the DFC test and moderated t-tests [4–7] perform better than either of them (see Additional file 1).

      The independence of the fold change test from the features’ variances prompted researchers to look for combined approaches that require DE features to satisfy both p-value and fold change criteria simultaneously [40]. Here the question arises as to how to combine these two tests: it was shown recently [41] that the cutoffs can significantly alter microarray interpretations. The DFC test is free from these shortcomings, as the ranking of features is performed using the signal-to-noise ratio (5) and the threshold (16) is defined by the expression dependent properties of the particular experiment and only removes unreliable features. No artificial fold change thresholds are introduced.

      Summarizing the discussion, we can say that the DFC method was developed for, and shown to work with, a reasonable number of samples, N s >> 1; it picks up DEGs equally well at any expression level and is not bound to a specific pre-processing method.


      We have proposed a new method, called the distributional fold change test, for removing non-differentially expressed genes and ranking differentially expressed genes from gene expression data. The method was designed to work with data sets of FFPE samples profiled on microarrays containing a large number of genes (> 10^4) and to accurately select and rank differentially expressed genes, taking into account their expression level.

      The method is based on analysis of the distribution of all genes on a microarray mapped to a three dimensional feature space composed of the average difference of gene expression (logarithm of fold change), the total variance and the average expression level. It allows for the imposition of a statistical (and expression dependent) threshold for the fold change and the introduction of a score based on the signal-to-noise ratio, which is used for accurate gene ranking.

      Performance of the DFC test was verified using 11 real experimental data sets, with DEGs independently verified by RT-PCR. We demonstrated that the DFC test is significantly better than the t-test in terms of detecting DEGs, as measured by the total and partial area under receiver operating curves. Its advantage is most prominent in the range of a low fraction of DEGs, which is the most important range for the analysis of fresh frozen and especially FFPE sample sets. Given its excellent performance we believe that the DFC test should be routinely used for the analysis of microarray data.


      a. Such studies benefit from the availability of complete (or near complete) clinical information on patient history, treatments and prognosis/survival.

      b. Details of the fitting procedure used to obtain the dependence σ 0(μ) are provided in Additional file 1.

      c. The condition log2 v T > LV Th(μ) is a convenient way of imposing an expression dependent variance filter with a threshold defined by the properties of the null features distribution (see eq. 11). Its application is favourable when stringent selection criteria are imposed. When imposing mild selection criteria or ranking all features, it should be switched off (see also endnote e).

      d. These DEGs may comprise only a portion of the true DEGs, since not all DEGs can be physically checked by RT-PCR due to limitations of the method, but they nevertheless allow a comparative analysis of the DFC test’s performance against the reference tests.

      e. For two data sets, GSE6740_2 (MAS5 pre-processing) and GSE9499 (RMA pre-processing), we had to lift the variance filter in order to calculate the AUC.

      f. This option was chosen because, for extremely high-dimensional data, estimating correlation is very difficult and in such instances it is recommended to conduct a diagonal analysis [15].



      This research was conducted as a part of the Almac Diagnostics company program for developing methods specifically applicable to expression analysis of RNA extracted from FFPE samples. It was supported by the Invest Northern Ireland grant 1009/101038722 and partly by the European Sustainable Competitiveness Programme 2007–2013 under the European Regional Development Fund. The authors gratefully acknowledge Vitali Proutski for continuous support during this work and Miika Ahdesmäki for providing the Matlab version of the shrinkage CAT score package. Discussions with colleagues Steve Deharo, Gera Jellema, Eamonn O’Brien, Vitali Proutski, and others are highly appreciated. The authors are thankful to Miika Ahdesmäki and Timothy Davison for their suggestions for improving the manuscript content. Timothy Davison also contributed to improving the language of the manuscript.

      Authors’ Affiliations

      Almac Diagnostics


      1. Göhlmann H, Talloen W: Gene Expression Studies Using Affymetrix Microarrays. Boca Raton: CRC Press; 2009.
      2. Zhang A: Advanced analysis of gene expression data. Singapore: World Scientific; 2006.
      3. Kim SY, Lee JW, Sohn IS: Comparison of various statistical methods for identifying differential gene expression in replicated microarray data. Stat Methods Med Research 2006, 15:3–20.
      4. Smyth GK: Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 2004, 3(1):Article 3.
      5. Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 2001, 98(9):5116–5121.
      6. Sartor MA, Tomlinson CR, Wesselkamper SC, Sivaganesan S, Leikauf GD, Medvedovic M: Intensity-based hierarchical Bayes method improves testing for differentially expressed genes in microarray experiments. BMC Bioinformatics 2006, 7:538.
      7. Opgen-Rhein R, Strimmer K: Accurate ranking of differentially expressed genes by a distribution-free shrinkage approach. Statist Appl Genet Mol Biol 2007, 6:9.
      8. Hu J, Xu J: Density based pruning for identification of differentially expressed genes from microarray data. BMC Genomics 2010, 11(Suppl 2):S3.
      9. Kadota K, Nakai Y, Shimizu K: A weighted average difference method for detecting differentially expressed genes from microarray data. Algorithm Mol Biol 2008, 3:8.
      10. Farragher SM, Tanney A, Kennedy RD, Harkin PD: RNA expression analysis from formalin fixed paraffin embedded tissues. Histochem Cell Biol 2008, 130:435–445.
      11. Abdueva D, Wing M, Schaub B, Triche T, Davicioni E: Quantitative expression profiling in formalin-fixed paraffin-embedded samples by affymetrix microarrays. J Mol Diagn 2010, 12:409–17.
      12. Kennedy RD, Bylesjo M, Kerr P, Davison T, Black JM, Kay EW, Holt RJ, Proutski V, Ahdesmaki M, Farztdinov V, Goffard N, Hey P, McDyer F, Mulligan K, Mussen J, O'Brien E, Oliver G, Walker SM, Mulligan JM, Wilson C, Winter A, O'Donoghue D, Mulcahy H, O'Sullivan J, Sheahan K, Hyland J, Dhir R, Bathe OF, Winqvist O, Manne U, et al.: Development and independent validation of a prognostic assay for stage II colon cancer using formalin-fixed paraffin-embedded tissue. J Clin Oncol 2011, 29:4620–4626.
      13. Mittempergher L, de Ronde JJ, Nieuwland M, Kerkhoven RM, Simon I, et al.: Gene expression profiles from formalin fixed paraffin embedded breast cancer tissue are largely comparable to fresh frozen matched tissue. PLoS One 2011, 6(2):e17163.
      14. Zuber V, Strimmer K: Gene ranking and biomarker discovery under correlation. Bioinformatics 2009, 25:2700–2707.
      15. Ahdesmäki M, Strimmer K: Feature selection in omics prediction problems using cat scores and false non-discovery rate control. Ann Appl Stat 2010, 4:503–519.
      16. Klopfleisch R, Weiss AT, Gruber AD: Excavation of a buried treasure–DNA, mRNA, miRNA and protein analysis in formalin fixed, paraffin embedded tissues. Histol Histopathol 2011, 26(6):797–810.
      17. Affymetrix, Inc: Technical Note: Design and Performance of the Gene-Chip Human Genome U133 Plus 2.0 and Human Genome U133A Plus 2.0 Arrays, 2003. Affymetrix, Inc. Technical Note: GeneChip® Expression Platform: Comparison, Evolution, and Performance, 2004. http://media.affymetrix.com/support/technical/technotes/expression_comparison_technote.pdf
      18. Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning; Data Mining, Inference and Prediction. 2nd edition. New York: Springer; 2009.
      19. Braun M, Menon R, Nikolov P, Kirsten R, Petersen K, Schilling D, Schott C, Gündisch S, Fend F, Becker KF, Perner S: The HOPE fixation technique–a promising alternative to common prostate cancer biobanking approaches. BMC Cancer 2011, 11:511.
      20. Klopfleisch R, von Deetzen M, Weiss AT, Weigner J, Weigner F, Plendl J, Gruber AD: Weigners fixative--an alternative to formalin fixation for histology with improved preservation of nucleic acids. Vet Pathol 2012. Apr 26. [Epub ahead of print]
      21. Sawilowsky SS: Fermat, Schubert, Einstein, and Behrens–Fisher: the probable difference between two means when σ1 ≠ σ2. Journal Mod App Stat Meth 2002, 1:461–472.
      22. Krzanowski WJ, Hand DJ: ROC curves for continuous data. Boca Raton: CRC Press; 2009. [Monographs on statistics and applied probability, vol 111]
      23. Fawcett T: An introduction to ROC analysis. Pattern Recogn Lett 2006, 27:861–874.
      24. Thalacker-Mercer AE, Fleet JC, Craig BA, Carnell NS, et al.: Inadequate protein intake affects skeletal muscle transcript profiles in older humans. Am J Clin Nutr 2007, 85:1344–52.
      25. Jin B, Tao Q, Peng J, Soo HM, et al.: DNA methyltransferase 3B (DNMT3B) mutations in ICF syndrome lead to altered epigenetic modifications and aberrant expression of genes regulating development, neurogenesis and immune function. Hum Mol Genet 2008, 17:690–709.
      26. Viemann D, Goebeler M, Schmid S, Nordhues U, et al.: TNF induces distinct gene expression programs in microvascular and macrovascular human endothelial cells. J Leukoc Biol 2006, 80:174–85.
      27. Csoka AB, English SB, Simkevich CP, Ginzinger DG, et al.: Genome-scale expression profiling of Hutchinson-Gilford progeria syndrome reveals widespread transcriptional misregulation leading to mesodermal/mesenchymal defects and accelerated atherosclerosis. Aging Cell 2004, 3:235–43.
      28. Gumz ML, Zou H, Kreinest PA, Childs AC, et al.: Secreted frizzled-related protein 1 loss contributes to tumor phenotype of clear cell renal cell carcinoma. Clin Cancer Res 2007, 13:4740–9.
      29. Hsu EL, Yoon D, Choi HH, Wang F, et al.: A proposed mechanism for the protective effect of dioxin against breast cancer. Toxicol Sci 2007, 98:436–44.
      30. Hyrcza MD, Kovacs C, Loutfy M, Halpenny R, et al.: Distinct transcriptional profiles in ex vivo CD4+ and CD8+ T cells are established early in human immunodeficiency virus type 1 infection and are characterized by a chronic interferon response as well as extensive transcriptional changes in CD8+ T cells. J Virol 2007, 81:3477–86.
      31. Pescatori M, Broccolini A, Minetti C, Bertini E, et al.: Gene expression profiling in the early phases of DMD: a constant molecular signature characterizes DMD muscle from early postnatal life throughout disease progression. FASEB J 2007, 21:1210–26.
      32. Burleigh DW, Kendziorski CM, Choi YJ, Grindle KM, et al.: Microarray analysis of BeWo and JEG3 trophoblast cell lines: identification of differentially expressed transcripts. Placenta 2007, 28:383–9.
      33. Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Edgar R: NCBI GEO: mining tens of millions of expression profiles--database and tools update. Nucleic Acids Res 2007, 35(Database issue):D760–D765.
      34. Bolstad B: Preprocessing and Normalization for Affymetrix GeneChip Expression Microarrays. In Methods in microarray normalization. Edited by: Stafford P. Boca Raton: CRC Press; 2008:41–60.
      35. Hubbell E, Liu WM, Mei R: Robust estimators for expression analysis. Bioinformatics 2002, 18:1585–1592.
      36. Affymetrix, Inc: White paper: Statistical Algorithms Description Document. 2002. http://www.affymetrix.com/support/technical/whitepapers/saddwhitepaper.pdf
      37. Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP: Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 2003, 31(4):e15.
      38. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 2003, 4:249–264.
      39. Cramer JS: Logit Models from Economics and Other Fields. Cambridge: Cambridge University Press; 2003.
      40. McCarthy DJ, Smyth GK: Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics 2009, 25(6):765–71.
      41. Dalman MR, Deeter A, Nimishakavi G, Duan ZH: Fold change and p-value cutoffs significantly alter microarray interpretations. BMC Bioinformatics 2012, 13(Suppl. 2):S11.
      42. Cui X, Churchill GA: Statistical tests for differential expression in cDNA microarray experiments. Genome Biol 2003, 4:210.

      This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.