Table 1 mBed performance on the ten biggest Pfam/HOMSTRAD families.

From: Sequence embedding for fast construction of guide trees for multiple sequence alignment

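| Name | Size | Len | %ID | Embedding Time (s): (1) | Embedding Time (s): (2) | Embedding Time (s): (3) | Embedding Time (s): (4) | Distance Matrix Calculation Time (s): (1) | Distance Matrix Calculation Time (s): (2) | Distance Matrix Calculation Time (s): (3) | Distance Matrix Calculation Time (s): (4) | Alignment Column Score (%): (1) | Alignment Column Score (%): (2) | Alignment Column Score (%): (3) | Alignment Column Score (%): (4) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PF01381 | 9993 | 53 | 23 | - | 25 | 55 | 136 | 764 | 57 | 55 | 175 | 13.3 | 26.7 | 25.3 | 34.7 |
| PF00006 | 9796 | 209 | 43 | - | 134 | 248 | 280 | 4364 | 48 | 49 | 88 | 42.8 | 36.6 | 36.6 | 38.0 |
| PF00989 | 9681 | 95 | 17 | - | 43 | 88 | 197 | 1281 | 50 | 51 | 159 | 46.5 | 33.3 | 31.8 | 34.1 |
| PF00486 | 9615 | 75 | 30 | - | 34 | 69 | 107 | 950 | 55 | 52 | 104 | 63.9 | 92.8 | 64.9 | 89.7 |
| PF00571 | 9551 | 119 | 19 | - | 73 | 143 | 268 | 1993 | 54 | 50 | 152 | 6.15 | 3.08 | 1.54 | 1.54 |
| PF00097 | 9423 | 41 | 33 | - | 18 | 38 | 94 | 517 | 44 | 43 | 115 | 53.2 | 54.8 | 61.3 | 54.8 |
| PF01479 | 9352 | 47 | 32 | - | 17 | 40 | 90 | 496 | 45 | 46 | 124 | 58.3 | 91.7 | 89.6 | 79.2 |
| PF00046 | 9305 | 54 | 35 | - | 20 | 43 | 85 | 651 | 41 | 42 | 77 | 59.4 | 44.9 | 46.4 | 60.9 |
| PF00550 | 9249 | 63 | 25 | - | 28 | 59 | 136 | 794 | 47 | 47 | 141 | 51.3 | 32.9 | 55.3 | 59.2 |
| PF00149 | 9072 | 198 | 14 | - | 133 | 256 | 552 | 3515 | 47 | 46 | 172 | 75.4 | 71.9 | 72.3 | 76.1 |
| Average | 9503 | 95 | 27 | 0 | 53 | 104 | 195 | 1533 | 49 | 48 | 131 | 47.0 | 48.9 | 48.5 | 52.8 |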

  1. The ten biggest Pfam entries that contain 9,000-10,000 sequences and have a corresponding HOMSTRAD alignment are used here. Four different methods were applied to each entry to calculate a distance matrix: (1) the traditional process of calculating a full distance matrix from the sequence data using an alignment distance measure; (2) mBed default; (3) mBed followed by the 'usePivotObjects' method; (4) mBed followed by the 'usePivotGroups' method. A UPGMA guide tree is constructed from each matrix and used for progressive alignment of the sequences. Each resulting alignment is then scored against the corresponding HOMSTRAD structural alignment using the Column Score.
  2. (1) Full d(x, y) distance matrix; (2) mBed; (3) mBed + usePivotObjects; (4) mBed + usePivotGroups
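
The Column Score used in the last four columns can be illustrated with a minimal sketch. It assumes the commonly used definition (the percentage of reference columns that are reproduced exactly in the test alignment) and assumes that both alignments contain the same sequences in the same order. The function names `column_signatures` and `column_score` are hypothetical and are not taken from mBed or from the scoring software used for the table.

```python
def column_signatures(alignment, gap="-"):
    """Describe each column by the residue index it holds in every sequence.

    A gap is recorded as None, so two columns are equal only if they
    align exactly the same residues of exactly the same sequences.
    """
    counters = [0] * len(alignment)  # next residue index per sequence
    signatures = []
    for column in zip(*alignment):  # assumes rows of equal length
        signature = []
        for row, char in enumerate(column):
            if char == gap:
                signature.append(None)
            else:
                signature.append(counters[row])
                counters[row] += 1
        signatures.append(tuple(signature))
    return signatures


def column_score(test, reference, gap="-"):
    """Percentage of reference columns found unchanged in the test alignment."""
    test_columns = set(column_signatures(test, gap))
    # Ignore reference columns that consist only of gaps (assumption).
    reference_columns = [
        sig for sig in column_signatures(reference, gap)
        if any(index is not None for index in sig)
    ]
    if not reference_columns:
        return 0.0
    matched = sum(1 for sig in reference_columns if sig in test_columns)
    return 100.0 * matched / len(reference_columns)


# Toy example (not data from the table): two of the five reference
# columns are misaligned in the test alignment.
reference = ["AC-GT", "ACTGT"]
test = ["ACG-T", "ACTGT"]
score = column_score(test, reference)
```

In the toy example, three of the five reference columns are recovered, so `score` is 60.0. In the setting of the table, the reference would presumably be restricted to the sequences shared with the HOMSTRAD alignment before scoring; that extraction step is not shown here.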