Table 1 mBed performance on the ten biggest Pfam/HOMSTRAD families.

From: Sequence embedding for fast construction of guide trees for multiple sequence alignment

                            Embedding Time (s)   Dist. Matrix Calc. Time (s)    Alignment Column Score (%)
Name      Size  Len  %ID    (1)  (2)  (3)  (4)      (1)    (2)    (3)    (4)      (1)    (2)    (3)    (4)
PF01381   9993   53   23      -   25   55  136      764     57     55    175     13.3   26.7   25.3   34.7
PF00006   9796  209   43      -  134  248  280     4364     48     49     88     42.8   36.6   36.6   38.0
PF00989   9681   95   17      -   43   88  197     1281     50     51    159     46.5   33.3   31.8   34.1
PF00486   9615   75   30      -   34   69  107      950     55     52    104     63.9   92.8   64.9   89.7
PF00571   9551  119   19      -   73  143  268     1993     54     50    152     6.15   3.08   1.54   1.54
PF00097   9423   41   33      -   18   38   94      517     44     43    115     53.2   54.8   61.3   54.8
PF01479   9352   47   32      -   17   40   90      496     45     46    124     58.3   91.7   89.6   79.2
PF00046   9305   54   35      -   20   43   85      651     41     42     77     59.4   44.9   46.4   60.9
PF00550   9249   63   25      -   28   59  136      794     47     47    141     51.3   32.9   55.3   59.2
PF00149   9072  198   14      -  133  256  552     3515     47     46    172     75.4   71.9   72.3   76.1
Average   9503   95   27      0   53  104  195     1533     49     48    131     47.0   48.9   48.5   52.8
  1. The ten largest Pfam entries that contain 9,000-10,000 sequences and have a corresponding HOMSTRAD alignment are used here. Four different methods were applied to each entry to calculate a distance matrix: (1) the traditional process of calculating a full distance matrix from the sequence data using an alignment distance measure; (2) mBed with default settings; (3) mBed followed by the 'usePivotObjects' method; (4) mBed followed by the 'usePivotGroups' method. A UPGMA guide tree is constructed from each matrix and used to guide the progressive alignment of the sequences. The resulting alignment is then scored against the corresponding HOMSTRAD structural alignment using the Column Score.
  2. (1) Full d(x, y) distance matrix; (2) mBed; (3) mBed + usePivotObjects; (4) mBed + usePivotGroups
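
The Column Score used in the last group of columns can be illustrated with a minimal sketch. It assumes the common definition (the percentage of reference columns that are reproduced exactly in the test alignment) and that both alignments list the same sequences in the same order, with `-` as the gap character. The function names `alignment_columns` and `column_score` are hypothetical, and the scoring program actually used for the table may differ in details such as which reference columns are counted.

```python
def alignment_columns(rows):
    """Describe each column by the residue index it holds in every sequence.

    rows: list of equal-length aligned strings, '-' marks a gap (assumption).
    Returns a list of tuples; None stands for a gap. All-gap columns are skipped.
    """
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    counters = [0] * len(rows)  # next residue index for each sequence
    columns = []
    for col in zip(*rows):
        entry = []
        for i, ch in enumerate(col):
            if ch == "-":
                entry.append(None)
            else:
                entry.append(counters[i])
                counters[i] += 1
        if any(e is not None for e in entry):
            columns.append(tuple(entry))
    return columns


def column_score(reference, test):
    """Percentage of reference columns found unchanged in the test alignment."""
    ref_cols = alignment_columns(reference)
    test_cols = set(alignment_columns(test))
    if not ref_cols:
        return 0.0
    matched = sum(1 for c in ref_cols if c in test_cols)
    return 100.0 * matched / len(ref_cols)


if __name__ == "__main__":
    reference = ["AC-GT", "ACAGT"]
    test = ["ACG-T", "ACAGT"]
    print(column_score(reference, test))
```

Because a column counts only when every sequence in it is placed identically, a single misplaced residue invalidates the whole column, which is one reason the scores in the table can swing widely between methods on the same family.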