Table 2 Sequence data and pairing accuracy

From: Algorithms for matching partially labelled sequence graphs

| PDB | Start (per domain) | spec. | No. of seq.s: ones | No. of seq.s: dist. | No. of seq.s: topol. | % code match: dist. | % code match: topol. | % code match: no trans. |
|---|---|---|---|---|---|---|---|---|
| 1aoz | 8553 / 6356 / 7940 | 720 | 122 | 2588 | 982 | 39.7 (62.9) | 58.6 (55.5) | 54.8 (55.3) |
| 1lci | 5843 / 5830 / 5655 | 1579 | 565 | 3841 | 2045 | 63.8 (76.0) | 75.7 (74.6) | 60.8 (62.2) |
| 1pkm | 6243 / 6499 / 6532 | 1566 | 1026 | 3275 | 1989 | 72.3 (81.8) | 80.0 (81.2) | 79.9 (81.2) |
| 3ctz | 2095 / 2143 / 2137 | 634 | 487 | 1085 | 785 | 93.2 (94.7) | 91.9 (91.5) | 91.1 (90.8) |
| 3vqt | 4886 / 4886 / 4886 | 1132 | 633 | 2896 | 1858 | 95.8 (97.3) | 91.3 (91.2) | 91.6 (91.5) |
| 4rcn | 6018 / 4212 / 5879 | 705 | 120 | 2426 | 1151 | 50.2 (72.0) | 53.1 (50.6) | 53.1 (50.2) |

For each protein (“PDB”), the number of sequences found for each domain in the initial databank search is tabulated under “Start”, followed by the number of species common to all domains (“spec.”) and the number of species with a single sequence entry (“ones”). After processing by the distance-based algorithm, the number of sequences common to all domains dropped (“dist.”), with a further drop on application of the more restrictive topology-based algorithm (“topol.”). The rough measure of matching success, based on the identity of paired sequence codes, is given as a percentage for the two methods (“dist.” and “topol.”), along with the success rate for the topology-based method when the transitivity bias is omitted (“no trans.”). These values are averages over the three domain pairings but, as these matches are not independent, the percentage over the first two domain pairs (1,2 and 2,3) is given in parentheses.
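
As a rough illustration of how a "percent code match" of this kind could be computed, the sketch below scores each domain pairing by the fraction of paired sequences whose codes are identical, then reports the mean over all three pairings and the mean over the first two. The function names, the representation of a pairing as a list of code tuples, the labelling of the third pairing as (1,3), and the choice to average per-pairing percentages (rather than pool all pairs) are assumptions made for illustration, not details taken from the source. The example codes are invented and do not correspond to any row of the table.

```python
from typing import List, Tuple

# A pairing is a list of (code in one domain, code in the other domain).
# This representation is hypothetical, chosen only for the sketch.
Pairing = List[Tuple[str, str]]


def percent_code_match(pairs: Pairing) -> float:
    """Percentage of pairs whose two sequence codes are identical."""
    if not pairs:
        return 0.0
    hits = sum(1 for left, right in pairs if left == right)
    return 100.0 * hits / len(pairs)


def summarise(p12: Pairing, p23: Pairing, p13: Pairing) -> Tuple[float, float]:
    """Return (mean over all three pairings, mean over pairings 1,2 and 2,3).

    Assumption: the table's averages are means of per-pairing percentages.
    """
    scores = [percent_code_match(p) for p in (p12, p23, p13)]
    mean_all = sum(scores) / len(scores)
    mean_first_two = (scores[0] + scores[1]) / 2
    return mean_all, mean_first_two


# Invented toy pairings (not data from the table).
p12 = [("A", "A"), ("B", "B"), ("C", "D"), ("E", "E")]
p23 = [("A", "A"), ("B", "C")]
p13 = [("A", "B"), ("C", "C"), ("D", "D"), ("E", "F")]

mean_all, mean_first_two = summarise(p12, p23, p13)
print(f"{mean_all:.1f} ({mean_first_two:.1f})")
```

The printed string follows the table's layout, with the three-pairing average first and the two-pairing average in parentheses.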