Table 3 Evaluation of contiguous motifs on Prosite data.

From: Evaluating deterministic motif significance measures in protein databases

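| PS entry | Motif | NumSeqs | DiffNGrams | Rel. Supp (%) | Supp Rank | ZScore | LogOdd | Pratt | IG | Info |
|---|---|---|---|---|---|---|---|---|---|---|
| PS00341 | IPCCPV | 9 | 702 | 77.8 | 9 | 21 | 65 | 166 | 13 | 217 |
| PS00415 | LRRRLSDS | 12 | 3582 | 91.6 | 9 | 503 | 1058 | 2103 | 11 | 1784 |
| PS00047 | GAKRH | 105 | 653 | 93.3 | 21 | 61 | 109 | 216 | 27 | 460 |
| PS00984 | CFWKYC | 19 | 1256 | 100 | 1 | 1 | 1 | 785 | 1 | 5 |
| PS00541 | SKRKYRK | 6 | 144 | 100 | 1 | 85 | 110 | 131 | 3 | 134 |
| PS00822 | PFDRHDW | 9 | 2251 | 100 | 1 | 1 | 5 | 204 | 1 | 400 |
| PS00419 | CDGPGRGGTC | 207 | 32936 | 100 | 1 | 1 | 1 | 3 | 1 | 158 |
| PS00349 | RKRKYFKKHEKR | 18 | 2929 | 100 | 1 | 38 | 86 | 2884 | 19 | 310 |
| PS00861 | GWTLNSAGYLLGP | 32 | 888 | 100 | 1 | 66 | 301 | 179 | 1 | 569 |
| PS01024 | EFDYLKSLEIEEKIN | 60 | 5527 | 100 | 1 | 620 | 2427 | 5266 | 1 | 5244 |
| PS00291 | AGAAAAGAVVGGLGGY | 136 | 2423 | 100 | 1 | 1033 | 1770 | 184 | 3 | 1984 |
| R_m | | | | | 0.2340 | 4.526E-3 | 1.854E-3 | 9.075E-4 | 0.1358 | 9.764E-4 |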

  1. Ranking results for eleven Prosite datasets, each identified by its Prosite (PS) entry. For each dataset, the table gives the number of protein sequences (NumSeqs), the number of different n-grams (DiffNGrams), where n is equal to the motif length, and the relative support of the target motif (Rel. Supp). Motifs are ranked with information-theoretic measures. Ranks obtained by support (Supp Rank) and information gain (Info) are also provided for comparison purposes. The last row gives the R_m value of each measure; the best results are obtained by support and IG.
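
This excerpt does not define the R_m statistic reported in the last row. The published values are, however, consistent with the reciprocal of the mean rank that each measure assigns to the eleven target motifs, so a higher R_m means the target motifs sit closer to the top of the ranking. The following Python sketch recomputes the row under that assumption. The function name `reciprocal_mean_rank` and the dictionary layout are illustrative and are not taken from the paper.

```python
# Ranks of the eleven target motifs, copied column by column from Table 3
# (row order: PS00341, PS00415, PS00047, PS00984, PS00541, PS00822,
#  PS00419, PS00349, PS00861, PS01024, PS00291).
ranks = {
    "Supp Rank": [9, 9, 21, 1, 1, 1, 1, 1, 1, 1, 1],
    "ZScore": [21, 503, 61, 1, 85, 1, 1, 38, 66, 620, 1033],
    "LogOdd": [65, 1058, 109, 1, 110, 5, 1, 86, 301, 2427, 1770],
    "Pratt": [166, 2103, 216, 785, 131, 204, 3, 2884, 179, 5266, 184],
    "IG": [13, 11, 27, 1, 3, 1, 1, 19, 1, 1, 3],
    "Info": [217, 1784, 460, 5, 134, 400, 158, 310, 569, 5244, 1984],
}


def reciprocal_mean_rank(rank_list):
    """Return 1 / (mean rank), i.e. N / sum(ranks).

    ASSUMPTION: this formula is inferred from the numbers in the table,
    not quoted from the paper's definition of R_m.
    """
    return len(rank_list) / sum(rank_list)


# One R_m value per significance measure.
r_m = {measure: reciprocal_mean_rank(r) for measure, r in ranks.items()}

for measure, value in r_m.items():
    print(f"{measure:10s} {value:.4E}")
```

Under this assumption the recomputed values agree with the last row of the table to the printed precision, apart from the final digit, which the table appears to truncate. Support (11/47) and IG (11/81) give the two largest values, which matches the footnote's statement that they obtain the best results.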