Table 3 Evaluation of contiguous motifs on Prosite data.

From: Evaluating deterministic motif significance measures in protein databases

PS entry  Motif             NumSeqs  DiffNGrams  Rel. Supp(%)  Supp Rank  ZScore    LogOdd    Pratt     IG      Info
PS00341   IPCCPV            9        702         77.8          9          21        65        166       13      217
PS00415   LRRRLSDS          12       3582        91.6          9          503       1058      2103      11      1784
PS00047   GAKRH             105      653         93.3          21         61        109       216       27      460
PS00984   CFWKYC            19       1256        100           1          1         1         785       1       5
PS00541   SKRKYRK           6        144         100           1          85        110       131       3       134
PS00822   PFDRHDW           9        2251        100           1          1         5         204       1       400
PS00419   CDGPGRGGTC        207      32936       100           1          1         1         3         1       158
PS00349   RKRKYFKKHEKR      18       2929        100           1          38        86        2884      19      310
PS00861   GWTLNSAGYLLGP     32       888         100           1          66        301       179       1       569
PS01024   EFDYLKSLEIEEKIN   60       5527        100           1          620       2427      5266      1       5244
PS00291   AGAAAAGAVVGGLGGY  136      2423        100           1          1033      1770      184       3       1984
R_m                                                            0.2340     4.526E-3  1.854E-3  9.075E-4  0.1358  9.764E-4
Ranking results for eleven Prosite datasets, each identified by its Prosite (PS) entry. For each dataset, the table gives the number of protein sequences (NumSeqs), the number of different n-grams (DiffNGrams), where n equals the motif length, and the relative support of the target motif (Rel. Supp). Motifs are ranked with information-theoretic measures. Ranks obtained by support (Supp Rank) and information gain (Info) are also provided for comparison. The last row gives the R_m value of each measure; the best results are obtained by support and IG.
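
As an illustration of the two dataset statistics in the table, the following minimal Python sketch counts the distinct n-grams in a set of sequences and computes the relative support of a motif. It rests on two assumptions that the table does not spell out: an n-gram is taken to be a contiguous substring of length n, and relative support is taken to be the percentage of sequences containing the motif at least once. The function names and the toy sequences are hypothetical and are not drawn from the Prosite datasets.

```python
def distinct_ngrams(sequences, n):
    """Return the set of distinct contiguous substrings of length n.

    Assumption: this is what the DiffNGrams column counts.
    """
    grams = set()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            grams.add(seq[i:i + n])
    return grams


def relative_support(sequences, motif):
    """Return the percentage of sequences containing the motif at least once.

    Assumption: this is what the Rel. Supp(%) column reports.
    """
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if motif in seq)
    return 100.0 * hits / len(sequences)


# Hypothetical toy data, for illustration only.
toy_sequences = ["AAGAKRHTT", "CCGAKRHGG", "TTGAKRHAA", "MKVLAAGIT"]
motif = "GAKRH"

n_distinct = len(distinct_ngrams(toy_sequences, len(motif)))
rel_supp = relative_support(toy_sequences, motif)
print(n_distinct, rel_supp)
```

Under the assumed definition, a relative support of 77.8% over 9 sequences (as in the PS00341 row) would correspond to the motif occurring in 7 of the 9 sequences.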