Figure 1 | Algorithms for Molecular Biology

Figure 1

From: Lossless filter for multiple repeats with bounded edit distance

An (L, d, 2)-repeat and a parallelogram. An example of an (L, d, 2)-repeat with L = 11 and d = 2. Diagonals 30, 31, and 32 are shown. Among them, 30 and 32 have distance 2, while 30 and 31 (as well as 31 and 32) are consecutive. Assuming that q = 2, a q-hit is represented by a thicker diagonal of length 2 plus a small black circle representing its pair of coordinates. The q-hit (19, 49) refers to the q-gram TA, and 19 (resp. 49) is its first (resp. second) projection. The q-hit (17, 49) refers to the same q-gram TA but has a different first projection. The words inside the grey boxes are two distinct fragments of the same sequence s, namely s[10, 20] and s[42, 52]; they have length 11, and their edit distance is 2. We obtain one word from the other by deleting s[13] (hence no q-hits in positions 12, 13) and by inserting s[48] (hence no q-hits in positions 47, 48). We have p = 6, and the set S of 7 q-hits in diagonals 31 and 32 satisfies properties (1), (2), (3), (4) and (5). If we add the q-hit in diagonal 30 in order to obtain a new set S′, properties (1) and (2) still hold, but properties (3), (4) and (5) are no longer satisfied.
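
The relationship between q-hits, projections, and diagonals described in the caption can be illustrated with a minimal Python sketch. It assumes that the diagonal of a q-hit (i, j) is j - i, which is consistent with the caption's numbers ((19, 49) lies on diagonal 30 and (17, 49) on diagonal 32) but is not stated explicitly here. The function names `q_hits` and `by_diagonal` and the toy sequence are hypothetical, and the toy sequence uses 0-based positions; the figure's actual sequence s is not reproduced.

```python
from collections import defaultdict

def q_hits(s, q):
    """Return all pairs (i, j) with i < j and s[i:i+q] == s[j:j+q] (0-based)."""
    positions = defaultdict(list)
    for i in range(len(s) - q + 1):
        positions[s[i:i + q]].append(i)
    hits = []
    for occ in positions.values():
        # Every pair of occurrences of the same q-gram is one q-hit.
        for a in range(len(occ)):
            for b in range(a + 1, len(occ)):
                hits.append((occ[a], occ[b]))
    return sorted(hits)

def by_diagonal(hits):
    """Group q-hits by diagonal, assumed here to be j - i."""
    diagonals = defaultdict(list)
    for i, j in hits:
        diagonals[j - i].append((i, j))
    return dict(diagonals)

# The two q-hits named in the caption, both for the q-gram TA.
caption_diagonals = by_diagonal([(19, 49), (17, 49)])

# A hypothetical toy sequence: an exact repeat puts all its q-hits on one diagonal.
toy_hits = q_hits("ACGTACGT", 2)
toy_diagonals = by_diagonal(toy_hits)
```

In the toy sequence every q-hit falls on a single diagonal because the repeat is exact. Insertions and deletions, as in the figure, shift later q-hits onto neighbouring diagonals.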
