Algorithms for matching partially labelled sequence graphs

Table 2 Sequence data and pairing accuracy

For each protein (“PDB”), the number of sequences found for each domain in the initial databank search is tabulated under “start”, followed by the number of species common to all domains (“spec.”) and the number of species with a single sequence entry (“ones”). After processing by the distance-based algorithm, the number of sequences common to all domains dropped (“dist.”), with a further drop on application of the more restrictive topology-based algorithm (“topol.”). The rough measure of matching success, based on the identity of paired sequence codes, is given as a percentage for the two methods (“dist.” and “topol.”), along with the success rate for the topology-based method when the transitivity bias is omitted (“no trans.”). These values are averages over the three domain pairings but, as these matches are not independent, the percentages over the first two domain pairs (1,2 and 2,3) are given in parentheses.
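
The arithmetic behind the two reported percentages can be illustrated with a minimal sketch. It rests on several assumptions that the legend does not spell out: a pair is counted as a match when its two sequence codes are exactly equal, the third domain pairing is (1,3), and the average is an unweighted mean of the per-pairing percentages. The function names and the toy codes are hypothetical and are not taken from the table.

```python
from typing import Dict, List, Tuple

# A pairing is a list of (code in one domain, code in the other domain).
Pairing = List[Tuple[str, str]]


def percent_matched(pairs: Pairing) -> float:
    """Percentage of pairs whose two codes are identical (assumed criterion)."""
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if a == b)
    return 100.0 * hits / len(pairs)


def summarise(pairings: Dict[Tuple[int, int], Pairing]) -> Tuple[float, float]:
    """Return (mean over all pairings, mean over pairings (1,2) and (2,3))."""
    scores = {key: percent_matched(pairs) for key, pairs in pairings.items()}
    mean_all = sum(scores.values()) / len(scores)
    # The first two pairings are reported separately because the third
    # is not independent of them.
    mean_first_two = (scores[(1, 2)] + scores[(2, 3)]) / 2
    return mean_all, mean_first_two


# Toy, made-up codes for three domains of one protein.
example = {
    (1, 2): [("A", "A"), ("B", "B"), ("C", "D"), ("D", "C")],
    (2, 3): [("A", "A"), ("B", "B"), ("C", "C"), ("D", "D")],
    (1, 3): [("A", "A"), ("B", "C"), ("C", "B"), ("D", "E")],
}
overall, first_two = summarise(example)
print(f"{overall:.1f} ({first_two:.1f})")
```

The printed form mirrors the layout described in the legend, with the average over all three pairings followed by the average over the first two pairings in parentheses.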

ISSN: 1748-7188