
Table 1 Constraint models of gapped motifs employed in previous studies

From: WildSpan: mining structured motifs from protein sequences

| Gap constraint models | Descriptions | Examples of existing algorithms |
| --- | --- | --- |
| Model 1 | At least L non-wildcard symbols must be present in a pattern of maximum length W (e.g. 'A-x-K-H-x(2)-E'). | Teiresias [6] and SPLASH [4] |
| Model 2 | A gap with a maximum flexibility of FL is allowed between any pair of pattern symbols; related constraints: maximum number of flexible gaps, maximum product of the gap flexibilities (e.g. 'A-x(2,3)-W-x-H-x(4,6)-E'). | Pratt [19] |
| Model 3 | A gap with a minimum length of LB (e.g. LB = 1) and a maximum length of UB (e.g. UB = 10) is allowed between any pair of pattern symbols (e.g. 'A-W-x(1,5)-H-x(4,10)-E'). | Refs. [35, 36] |
| Model 4 | A gap of any length (denoted as *) is allowed between any pair of continuous words in a pattern; related constraint: minimum length of continuous words (e.g. 'A-W-D-A-x(*)-H-E-D-x(*)-K-R'). | Refs. [7, 11, 14] |
| Model 5 | A gap with a minimum length of LB and a maximum length of UB is allowed between any pair of symbols in a pattern block; a gap with a minimum length of LB" and a maximum length of UB" is allowed between any pair of pattern blocks; related constraint: minimum length of a pattern block (e.g. MAGIIC [24]: 'A-W-x(2,3)-H-x(45,60)-E-x-D-x(1,2)-K', with pattern blocks 'A-W-x(2,3)-H' and 'E-x-D-x(1,2)-K'; RISOTTO [15]: 'R-G-I-T-I-T-x(16,18)-P-G-H-A-D-F', where one mismatch is allowed in a pattern block). | MAGIIC [24] and RISOTTO [15] |
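
The gap notation used in the example patterns above maps fairly directly onto regular expressions, which may help when comparing the models. The following is a minimal sketch, not code from any of the cited tools: the function name `motif_to_regex` is hypothetical, and it assumes that symbols are separated by '-', that 'x' is a single wildcard, that 'x(n)' and 'x(a,b)' are fixed and bounded gaps, and that 'x(*)' is a gap of any length. Mismatches, as allowed by RISOTTO, are not handled.

```python
import re

# Matches a gap token: 'x', 'x(3)', 'x(2,5)', 'x(45, 60)' or 'x(*)'.
_GAP = re.compile(r"^x(?:\((\*|\d+(?:\s*,\s*\d+)?)\))?$")


def motif_to_regex(pattern: str) -> str:
    """Translate a gapped motif such as 'A-W-x(1,5)-H' into a regex string.

    Hypothetical helper for illustration; assumes the notation of Table 1.
    """
    parts = []
    for token in pattern.split("-"):
        token = token.strip()
        gap = _GAP.match(token)
        if gap is None:
            # Anything that is not a gap must be a single residue letter.
            if len(token) != 1 or not token.isalpha():
                raise ValueError(f"unrecognised pattern symbol: {token!r}")
            parts.append(token)
        elif gap.group(1) is None:
            parts.append(".")            # bare 'x': exactly one wildcard
        elif gap.group(1) == "*":
            parts.append(".*")           # 'x(*)': gap of any length (Model 4)
        else:
            bounds = gap.group(1).replace(" ", "")
            parts.append(".{" + bounds + "}")  # 'x(n)' or 'x(a,b)'
    return "".join(parts)


if __name__ == "__main__":
    # Model 1 example from the table: rigid gaps only.
    print(motif_to_regex("A-x-K-H-x(2)-E"))          # A.KH.{2}E
    # Model 3 example: flexible gaps bounded by LB and UB.
    regex = motif_to_regex("A-W-x(1,5)-H-x(4,10)-E")
    print(bool(re.search(regex, "MMAWGGHKKKKKEQ")))  # True
```

Under these assumptions, the five models differ only in which gap tokens they permit and where: Model 1 uses fixed-width wildcards, Models 2 and 3 use bounded ranges, Model 4 uses unbounded gaps between words, and Model 5 combines short intra-block ranges with long inter-block ranges.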