Table 4 Experimental results for protein family classification

From: WildSpan: mining structured motifs from protein sequences

| Method/Database | Time (s) | Sensitivity (%) | Precision (%) | Specificity (%) | MCC¹ |
| --- | --- | --- | --- | --- | --- |
| PROSITE | - | 85.717 | 93.043 | 99.996 | 0.857 |
| RISOTTO | 18.635 | 47.003 | 99.957 | 100 | 0.470 |
| Pratt | 1598.3 | 81.507 | 94.159 | 99.995 | 0.815 |
| Teiresias | 0.908 | 76.798 | 0.2523 | 41.163 | 0.030 |
| WildSpan (Family-based) | 89.782 | 99.042 | 97.481 | 99.993 | 0.990 |

  1. The table shows the performance of WildSpan's family-based mining on protein family classification, based on PA10F. The results are compared with PROSITE annotated patterns and with three other pattern mining methods: RISOTTO, Teiresias, and Pratt. The input data were prepared by collecting proteins from release 50.9 of UniProtKB/Swiss-Prot (235673 entries), and the discovered patterns were verified against all protein sequences in release 2010/08 of UniProtKB/Swiss-Prot (518415 entries). Fragments and partial matches were excluded from both the training and the testing data. All parameters of all methods were left at their default values.
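  2. ¹ Matthews correlation coefficient (MCC): (TP×TN - FP×FN) / SQRT( (TP+FP) × (TP+FN) × (TN+FP) × (TN+FN) ), where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives.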
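
As an illustration of how the measures in the table relate to the underlying confusion counts, the following minimal Python sketch computes sensitivity, precision, specificity, and MCC from TP, TN, FP, and FN. It assumes the standard definitions (sensitivity = TP/(TP+FN), precision = TP/(TP+FP), specificity = TN/(TN+FP)); these definitions are not spelled out in the table notes. The function name `classification_metrics`, the convention of returning 0 when a denominator is empty, and the example counts are all assumptions made for illustration and are not taken from the paper.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Compute sensitivity, precision, specificity and MCC from confusion counts.

    All values are returned as fractions in [0, 1] (MCC in [-1, 1]);
    multiply the first three by 100 to express them as percentages.
    """

    def ratio(num: float, den: float) -> float:
        # Assumed convention: return 0.0 when the denominator is zero
        # instead of raising ZeroDivisionError.
        return num / den if den else 0.0

    # Denominator of the Matthews correlation coefficient.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

    return {
        "sensitivity": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not taken from Table 4.
    metrics = classification_metrics(tp=90, tn=900, fp=5, fn=5)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

Because negatives vastly outnumber positives when patterns are scanned against a whole sequence database, specificity can stay close to 100% even when precision is very low, which is one reason a balanced measure such as MCC is useful alongside the other columns.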