
Table 2 Confusion matrices for the evolved motif (top) and original motif (bottom).

From: Evolving DNA motifs to predict GeneChip probe performance

GGGG|CGCC|G(G|C){4}|CCC

                      Whole training set     Test set               2nd test set
Median correlation    < 0.3      ≥ 0.3       < 0.3      ≥ 0.3       < 0.3      ≥ 0.3
Match                 410        4448        448        4436        425        4553
-v (no match)         173        10061       174        10045       178        9947

GGGG

                      Whole training set     Test set               2nd test set
Median correlation    < 0.3      ≥ 0.3       < 0.3      ≥ 0.3       < 0.3      ≥ 0.3
Match                 195        479         209        434         208        462
-v (no match)         388        14030       413        14047       395        14038

  1. The performance on the training data is given on the left. "Out of sample" data (i.e. not used for training) gives a better indication of true performance (middle). On the test set, the evolved motif correctly predicts 448 of 622 poor probes and 10 045 of 14 481 good probes. (Poor probes are those whose average correlation with their own probeset is below 0.3.) The new motif is much better than the original at finding poor probes (448 v. 209), but this is at the cost of incorrectly flagging more good probes as potentially flawed (4436 v. 434). Performance does not fall significantly between the training and test sets, indicating that there is no overfitting. Values for the second (unused) test set are given on the right.
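
The comparison in the footnote can be checked with a short calculation. The Python sketch below is illustrative only and is not the authors' code. It assumes each motif can be read as an ordinary regular expression that flags a probe when it matches anywhere in the probe sequence. The helper names `flags_probe` and `rates` and the example sequence are hypothetical. The counts are copied from the test-set (middle) columns of the table.

```python
import re

# Motifs as printed in the table, treated here as plain regular expressions
# (an assumption about how the motif syntax should be interpreted).
EVOLVED = re.compile(r"GGGG|CGCC|G(G|C){4}|CCC")
ORIGINAL = re.compile(r"GGGG")


def flags_probe(motif, sequence):
    """Return True if the motif occurs anywhere in the probe sequence."""
    return motif.search(sequence) is not None


def rates(poor_flagged, poor_missed, good_flagged, good_passed):
    """Return (fraction of poor probes flagged, fraction of good probes not flagged)."""
    poor_found = poor_flagged / (poor_flagged + poor_missed)
    good_kept = good_passed / (good_flagged + good_passed)
    return poor_found, good_kept


# Test-set counts from the table: match row, then -v row.
evolved_test = rates(448, 174, 4436, 10045)
original_test = rates(209, 413, 434, 14047)

if __name__ == "__main__":
    # Hypothetical sequence: contains CGCC but not GGGG.
    example = "ATCGCCAT"
    print(flags_probe(EVOLVED, example), flags_probe(ORIGINAL, example))
    print("evolved:  poor found %.2f, good kept %.2f" % evolved_test)
    print("original: poor found %.2f, good kept %.2f" % original_test)
```

On the test set this gives roughly 72% of poor probes found by the evolved motif against 34% for GGGG, while the share of good probes left unflagged drops from about 97% to about 69%, which is the trade-off the footnote describes.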