Microarray analysis is often used to detect differentially expressed genes (DEGs) under different conditions. Because gene ranking methods differ considerably in how well they perform [1, 2], choosing the best method for ranking these genes is important. Furthermore, Affymetrix GeneChip users must choose a preprocessing algorithm from among a number of competitors in order to obtain expression-level measurements.
We and another group recently reported that there are suitable combinations of preprocessing algorithms and gene ranking methods [1, 2]. We evaluated three preprocessing algorithms, MAS, RMA, and DFW, and eight gene ranking methods, WAD, AD, FC, RP, modT, samT, shrinkT, and ibmT, using a total of 38 datasets (including 36 real experimental datasets). Meanwhile, Pearson evaluated nine preprocessing algorithms, MAS, RMA, DFW, MBEI, CP, PLIER, GCRMA, mmgMOS, and FARMS, and five gene ranking methods, modT, FC, a standard t-test, cyberT, and PPLR, using only one artificial 'spike-in' dataset, the Golden Spike dataset.
When we re-evaluated the two reports using the algorithms and methods common to both, we found that the suitable gene ranking methods for each of the three shared preprocessing algorithms, i.e., MAS, RMA, and DFW, converge to the same choices: the combinations of MAS and modT (MAS/modT), RMA/FC, and DFW/FC can thus be recommended. However, the final conclusions of the original reports are understandably different: our recommendations are MAS/WAD, RMA/FC, and DFW/RP, while Pearson recommends mmgMOS/PPLR, GCRMA/FC, and so on. This difference arises mainly because fewer preprocessing algorithms were evaluated in our previous study.
We investigated suitable gene ranking methods for each of six preprocessing algorithms: MBEI, VSN, PLIER, GCRMA, FARMS, and mmgMOS. We also investigated the best combination of a preprocessing algorithm and gene ranking method using another evaluation metric, the percentage of overlapping genes (POG), proposed by the MAQC study.
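As an illustration of the POG metric (a sketch, not code from either study), POG measures reproducibility as the fraction of genes shared between the top-k DEG lists produced by two independent rankings; the function name and inputs below are hypothetical:

```python
def pog(ranked_a, ranked_b, k):
    """Percentage of overlapping genes (POG) between the top-k genes
    of two ranked gene lists (e.g., rankings from two replicate
    datasets processed with the same algorithm/method combination).

    ranked_a, ranked_b: gene identifiers sorted from most to least
    likely differentially expressed.
    Returns a percentage in [0, 100]; higher means more reproducible.
    """
    top_a = set(ranked_a[:k])
    top_b = set(ranked_b[:k])
    return 100.0 * len(top_a & top_b) / k


# Example: the two rankings agree on their top-2 genes but not beyond.
pog(["g1", "g2", "g3", "g4"], ["g2", "g1", "g5", "g6"], k=2)  # 100.0
pog(["g1", "g2", "g3", "g4"], ["g2", "g1", "g5", "g6"], k=4)  # 50.0
```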
Most authors of methodological papers have claimed that their methods yield higher values of the area under the receiver operating characteristic curve (AUC), i.e., both high sensitivity and specificity [1, 2]. However, reproducibility is rarely mentioned. A good method should produce high POG values, which indicate reproducibility, as well as high AUC values, which indicate sensitivity and specificity. We will discuss suitable combinations of preprocessing algorithms and gene ranking methods.
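For a spike-in dataset where the true DEGs are known, the AUC of a gene ranking can be computed with the rank-sum (Mann-Whitney) formulation: it equals the probability that a randomly chosen true DEG is ranked above a randomly chosen non-DEG. The sketch below (hypothetical function name; ties between scores are ignored for brevity) is not taken from either study:

```python
def auc_from_ranking(scores, is_deg):
    """AUC of a gene ranking against known truth, via the
    rank-sum (Mann-Whitney U) formulation.

    scores: per-gene statistic, higher = more likely a DEG.
    is_deg: per-gene booleans, True for the known spiked-in DEGs.
    Returns a value in [0, 1]; 1.0 means every true DEG outranks
    every non-DEG, 0.5 is no better than random.
    """
    # Sort genes by score in ascending order so that ranks run 1..n.
    pairs = sorted(zip(scores, is_deg), key=lambda p: p[0])
    n_pos = sum(is_deg)
    n_neg = len(is_deg) - n_pos
    # Sum the ranks of the true DEGs, then convert the rank sum
    # into the Mann-Whitney U statistic.
    rank_sum = sum(rank + 1 for rank, (_, deg) in enumerate(pairs) if deg)
    u = rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Perfect ranking: both DEGs scored above both non-DEGs.
auc_from_ranking([0.9, 0.8, 0.2, 0.1], [True, True, False, False])  # 1.0
```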