Table 2 Sequence data and pairing accuracy

From: Algorithms for matching partially labelled sequence graphs

                          Number of seq.s  Percent code match
PDB   Start  spec.  ones  dist.  topol.    dist.         topol.        no trans.
1aoz  8553   720    122   2588   982       39.7 (62.9)   58.6 (55.5)   54.8 (55.3)
      6356
      7940
1lci  5843   1579   565   3841   2045      63.8 (76.0)   75.7 (74.6)   60.8 (62.2)
      5830
      5655
1pkm  6243   1566   1026  3275   1989      72.3 (81.8)   80.0 (81.2)   79.9 (81.2)
      6499
      6532
3ctz  2095   634    487   1085   785       93.2 (94.7)   91.9 (91.5)   91.1 (90.8)
      2143
      2137
3vqt  4886   1132   633   2896   1858      95.8 (97.3)   91.3 (91.2)   91.6 (91.5)
      4886
      4886
4rcn  6018   705    120   2426   1151      50.2 (72.0)   53.1 (50.6)   53.1 (50.2)
      4212
      5879
  1. For each protein (“PDB”), the number of sequences found for each domain in the initial databank search is tabulated under “start”, one value per domain, followed by the number of species common to all domains (“spec.”) and the number of species with a single sequence entry (“ones”). After processing by the distance-based algorithm, the number of sequences common to all domains dropped (“dist.”), with a further drop on application of the more restrictive topology-based algorithm (“topol.”). The rough measure of matching success, based on the identity of paired sequence codes, is given as a percentage for the two methods (“dist.” and “topol.”), along with the success rate for the topology-based method when the transitivity bias is omitted (“no trans.”). These values are averages over the three domain pairings, but as these matches are not independent, the percentages over the first two domain pairs (1,2 and 2,3) are given in parentheses.
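
The percentage columns can be illustrated with a minimal sketch. It assumes that a "code match" means the two paired sequences carry identical sequence codes, and that the third domain pairing is (1,3); neither detail is spelled out in the footnote. The function names and the toy codes are hypothetical and are not taken from the paper.

```python
from statistics import mean


def percent_code_match(pairs):
    """Percentage of pairs whose two sequence codes are identical (assumed definition)."""
    if not pairs:
        return 0.0
    hits = sum(1 for code_a, code_b in pairs if code_a == code_b)
    return 100.0 * hits / len(pairs)


def summarise(pairings):
    """Return (mean over all domain pairings, mean over pairings (1,2) and (2,3)).

    `pairings` maps a domain pair such as (1, 2) to its list of matched code pairs.
    """
    per_pair = {dp: percent_code_match(p) for dp, p in pairings.items()}
    overall = mean(per_pair.values())
    first_two = mean(per_pair[dp] for dp in [(1, 2), (2, 3)])
    return overall, first_two


# Toy data with made-up codes; these are not values from the table.
pairings = {
    (1, 2): [("A", "A"), ("B", "B"), ("C", "D"), ("D", "C")],  # 2 of 4 match
    (2, 3): [("A", "A"), ("B", "B"), ("C", "C"), ("D", "E")],  # 3 of 4 match
    (1, 3): [("A", "A"), ("B", "C"), ("C", "B"), ("D", "E")],  # 1 of 4 match
}

overall, first_two = summarise(pairings)
print(f"{overall:.1f} ({first_two:.1f})")  # prints "50.0 (62.5)"
```

The printed string follows the layout of the table cells: the average over all three pairings, then the average over the first two pairings in parentheses.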