Motif: GGGG|CGCC|G(G|C){4}|CCC

| Median correlation | Whole training set, < 0.3 | Whole training set, ≥ 0.3 | Test set, < 0.3 | Test set, ≥ 0.3 | 2nd test set, < 0.3 | 2nd test set, ≥ 0.3 |
|---|---|---|---|---|---|---|
| match | 410 | 4448 | 448 | 4436 | 425 | 4553 |
| -v (no match) | 173 | 10061 | 174 | 10045 | 178 | 9947 |

Motif: GGGG

| Median correlation | Whole training set, < 0.3 | Whole training set, ≥ 0.3 | Test set, < 0.3 | Test set, ≥ 0.3 | 2nd test set, < 0.3 | 2nd test set, ≥ 0.3 |
|---|---|---|---|---|---|---|
| match | 195 | 479 | 209 | 434 | 208 | 462 |
| -v (no match) | 388 | 14030 | 413 | 14047 | 395 | 14038 |

- The performance on the training data is given on the left. "Out of sample" data (i.e. data not used for training) gives a better indication of true performance (middle). On the test set, the new motif correctly predicts 448 of 622 poor probes and 10 045 of 14 481 good probes. (Poor probes are those whose average correlation with their own probeset is below 0.3.) The new motif is much better than GGGG at finding poor probes (448 v. 209), but this is at the cost of incorrectly flagging more good probes as potentially flawed (4436 v. 434). Performance on the test set does not fall significantly relative to the training set, indicating there is no overfitting. Values for the second (unused) test set are given on the right.
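
The fractions quoted in the note can be recomputed from the test-set columns of the two tables. The sketch below is only an illustration: the class and property names (`Counts`, `poor_found`, `good_kept`) are invented here, and it assumes that the unlabelled row counts probes matching the motif while the `-v` row counts probes that do not, which is consistent with the "448 of 622" figure above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Probe counts for one motif on one data set (illustrative names)."""
    flagged_poor: int  # motif matches, median correlation < 0.3
    flagged_good: int  # motif matches, median correlation >= 0.3
    passed_poor: int   # -v row (assumed: no match), median correlation < 0.3
    passed_good: int   # -v row (assumed: no match), median correlation >= 0.3

    @property
    def poor_found(self) -> float:
        """Fraction of poor probes that the motif flags."""
        return self.flagged_poor / (self.flagged_poor + self.passed_poor)

    @property
    def good_kept(self) -> float:
        """Fraction of good probes that the motif leaves unflagged."""
        return self.passed_good / (self.flagged_good + self.passed_good)


# Test-set (middle) columns copied from the two tables.
test_set = {
    "GGGG|CGCC|G(G|C){4}|CCC": Counts(448, 4436, 174, 10045),
    "GGGG": Counts(209, 434, 413, 14047),
}

for motif, c in test_set.items():
    print(f"{motif}: poor probes found {c.poor_found:.1%}, "
          f"good probes kept {c.good_kept:.1%}")
# GGGG|CGCC|G(G|C){4}|CCC: poor probes found 72.0%, good probes kept 69.4%
# GGGG: poor probes found 33.6%, good probes kept 97.0%
```

Read this way, the longer motif finds roughly twice as many of the poor probes as GGGG does, while leaving a smaller share of the good probes unflagged, which is the trade-off the note describes.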