Skip to main content
Figure 3 | Algorithms for Molecular Biology

Figure 3

From: PhyloScan: identification of transcription factor binding sites using cross-species evidence

Figure 3

ROC Curves for PhyloScan and MONKEY. Shown are Receiver Operating Characteristic (ROC) curves for algorithms applied to intergenic regions containing a pair of full-strength Crp sites, a pair of 1/2-strength sites, and a pair of 1/3-strength sites. The simulated sequence data is for fourteen prokaryotic species organized into four clades; the orthologous intergenic sequences are 500 bp and are multiply-aligned within each clade but not between clades. ROC curves are shown for fully enabled PhyloScan and MONKEY. Additionally, ROC curves for PhyloScan applied to only the Enterobacteriales clade are shown. The ROC curves for PhyloScan with its multiple-clades capability enabled but its multiple-sites capability disabled are not shown because they are nearly indistinguishable from the fully enabled PhyloScan. A comparison of the "PhyloScan (1 clade)" curves to the "MONKEY (1 clade)" curves shows that there is value in combining evidence from multiple sites within an intergenic region using the Neuwald-Green calculation. A comparison of the "PhyloScan (4 clades)" curves to the "PhyloScan (1 clade)" curves indicates that there is additional value in considering data from multiple clades. For instance, if p-value cutoffs are chosen so that type I error is 0.1% (i.e., the specificity is 99.9%) then PhyloScan correctly classifies 99.85% of the full-strength-Crp intergenic regions, 72.68% of the 1/2-strength regions, and 32.64% of the 1/3-strength regions. The corresponding numbers for "PhyloScan (1 clade)" are 96.98%, 33.01%, and 10.11%. The corresponding numbers for MONKEY are 79.02%, 21.66%, and 6.33%. It is possible that sensitivities for the four-clades curves would have been even stronger if we had not prohibited the non-Enterobacteriales clades from rescuing intergenic regions in the Enterobacteriales clade that had failed to pass our 0.05 p-value cutoff.

Back to article page