
Table 2 Confusion matrices for the evolved motif (top) and original motif (bottom).

From: Evolving DNA motifs to predict GeneChip probe performance

Evolved motif: GGGG|CGCC|G(G|C){4}|CCC

                      Whole training set  Test set            2nd test set
Median correlation    < 0.3     ≥ 0.3     < 0.3     ≥ 0.3     < 0.3     ≥ 0.3
match                 410       4448      448       4436      425       4553
-v (no match)         173       10061     174       10045     178       9947

Original motif: GGGG

                      Whole training set  Test set            2nd test set
Median correlation    < 0.3     ≥ 0.3     < 0.3     ≥ 0.3     < 0.3     ≥ 0.3
match                 195       479       209       434       208       462
-v (no match)         388       14030     413       14047     395       14038
  1. The performance on the training data is given on the left. "Out of sample" data (i.e. not used for training) gives a better indication of true performance (middle). On the test set the evolved motif correctly predicts 448 of 622 poor probes and 10 045 of 14 481 good probes. (Poor probes are those whose average correlation with their own probeset is below 0.3.) The new motif is much better than GGGG at finding poor probes, 448 v. 209, but this is at the cost of incorrectly flagging more good probes as potentially flawed, 4436 v. 434. Performance does not fall significantly between the training set and the test set, indicating there is no overfitting. Values for the second (unused) test set are given on the right.
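
The totals quoted in the note (622 poor probes and 14 481 good probes in the test set) can be recovered by summing the columns of the table. The short Python sketch below does that arithmetic for the test-set counts only. The names `TEST_COUNTS`, `rates` and `SUMMARY` are illustrative and do not come from the article, and the sketch assumes that the unlabelled row holds probes that match the motif and the `-v` row holds probes that do not.

```python
# Test-set counts copied from Table 2, in the order:
# (match & poor, match & good, no match & poor, no match & good)
# "poor" = median correlation < 0.3, "good" = median correlation >= 0.3.
# Assumption: the unlabelled row is "motif matches", the -v row is "no match".
TEST_COUNTS = {
    "evolved": (448, 4436, 174, 10045),
    "GGGG": (209, 434, 413, 14047),
}


def rates(match_poor, match_good, nomatch_poor, nomatch_good):
    """Summarise one confusion matrix as column totals and per-class fractions."""
    poor_total = match_poor + nomatch_poor
    good_total = match_good + nomatch_good
    return {
        "poor_total": poor_total,
        "good_total": good_total,
        # fraction of poor probes the motif flags (correct predictions)
        "poor_found": match_poor / poor_total,
        # fraction of good probes the motif leaves unflagged (correct predictions)
        "good_passed": nomatch_good / good_total,
    }


SUMMARY = {name: rates(*counts) for name, counts in TEST_COUNTS.items()}

for name, r in SUMMARY.items():
    print(
        f"{name}: {r['poor_found']:.1%} of {r['poor_total']} poor probes flagged, "
        f"{r['good_passed']:.1%} of {r['good_total']} good probes passed"
    )
```

Under these assumptions the evolved motif flags about 72% of the poor probes in the test set against about 34% for GGGG, while the share of good probes left unflagged drops from about 97% to about 69%, which is the trade-off the note describes.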