Fig. 3 | Algorithms for Molecular Biology

Fig. 3

From: Space-efficient computation of k-mer dictionaries for large values of k

A Spelling \(K^{c}=K^{c}_{1}=\texttt {atgat}\) from \(H[b_{1}]\). The white arrow on the left indicates that the figure is read bottom-up. The jth circle is the bucket \(H[b_{j}]\) in the reference chain S. The black string is \(K^{c}_{j}\) and the grey string is \(\hat{K}^{c}_{j}\). The incomplete string next to each circle shows the symbols of \(K^{c}\) known up to that bucket. The green symbol is the one we extract from \(K^{o}_{j}\) and insert at one of the inner ends of \(K^{c}\). The arrow from \(H[b_{j}]\) to \(H[b_{j+1}]\) indicates the text overlap \(H[b_{j}].r = b_{j+1}\). The red line is the prefix of length \(k-1\) in \(H[b_{j}]\) that matches a suffix of length \(k-1\) in \(H[b_{j+1}]\) (blue line). B The k-mers \(K^{o}_{j}\) used to spell \(K^{c}\). When the spelling direction changes from \(H[b_{j+1}]\) to \(H[b_{j+2}]\), \(K^{o}_{j+2}\) does not match \(K^{o}_{j}\) because the chain of buckets \(S=b_9,\ldots ,b_2,b_1\) spelling \(K^{c}\) cannot have repeated elements (see Lemma 3). Vertical lines in the figure mark the mismatching symbols of \(K^{o}_{j}\) and \(K^{o}_{j+2}\). Note that changes in the spelling direction are induced by the order in which the k-mers are inserted into H and by the reference bucket available at the moment of insertion.
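
The prefix/suffix overlap marked by the red and blue lines can be illustrated with a minimal Python sketch. It checks only the overlap condition between two k-mers and is not the spelling procedure from the article. The function name and the example strings are hypothetical and are not taken from the figure.

```python
def prefix_matches_suffix(kmer_j: str, kmer_next: str) -> bool:
    """Return True if the (k-1)-prefix of kmer_j equals the (k-1)-suffix of kmer_next.

    Hypothetical helper for illustration only; both arguments are plain strings
    standing in for the k-mers associated with two consecutive buckets.
    """
    if len(kmer_j) != len(kmer_next):
        raise ValueError("both k-mers must have the same length k")
    # kmer_j[:-1] drops the last symbol (prefix of length k-1);
    # kmer_next[1:] drops the first symbol (suffix of length k-1).
    return kmer_j[:-1] == kmer_next[1:]


# Assumed example strings with k = 5: "gatt" is shared by both k-mers.
overlap_found = prefix_matches_suffix("gatta", "agatt")
no_overlap = prefix_matches_suffix("gatta", "ttagc")
```

When the overlap holds, the two k-mers differ by a single symbol at each outer end, which is why one symbol per bucket suffices to extend the partially spelled string.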
