Fig. 1 | Algorithms for Molecular Biology

Fig. 1

From: Space-efficient computation of k-mer dictionaries for large values of k

Example of our compact hash table H filled by GetDict with the k-mers of length \(k=4\) of the read \(R=\texttt {cgttagttaa}\). The arrows indicate the references from which we recover the \((k-1)\)-prefixes of the k-mers. We first map \(R[1\ldots k] = \texttt {cgtt}\) to its bucket \(H[h(\texttt {cgtt})=4]\). However, as we do not yet know a reference bucket \(H[b_{pr}]\) from which to recover the prefix \(R[1\ldots k-1]=\texttt {cgt}\), we store the k-mer's full sequence in the dynamic buffer as \(B[l=1\ldots l+k-1=4]=\texttt {cgtt}\) and set \(H[4]=(f=1,r=l,a=\varepsilon )\). The next k-mer in \(\mathcal {R}\) is \(R[2\ldots k+1] = \texttt {gtta}\), whose designated bucket is \(H[h(\texttt {gtta})=7]\). In this case, the preceding k-mer \(K_{pr}=\texttt {cgtt}\) and its bucket \(b_{pr}=4\) are available. Thus, we encode gtta as \(H[7]=(1,4,\texttt {a})\). We continue with the remaining k-mers of R in the same way.
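
The encoding described in the caption can be illustrated with a small Python sketch. This is a simplified illustration under stated assumptions, not the authors' GetDict: the hash function `kmer_hash`, the linear probing, the table size, and the reading of `f` as an occurrence count are all assumptions made here for the example, and indices are 0-based rather than 1-based. Because the hash is a toy one, the bucket numbers will not match those in the figure.

```python
def kmer_hash(kmer, size):
    # Hypothetical toy hash (NOT the paper's h): base-4 value of the k-mer mod size.
    code = {"a": 0, "c": 1, "g": 2, "t": 3}
    value = 0
    for ch in kmer:
        value = value * 4 + code[ch]
    return value % size


def decode(table, buffer, b, k):
    """Recover the k-mer of bucket b by following prefix references."""
    appended = []
    f, r, a = table[b]
    while a != "":              # a non-empty: r refers to another bucket
        appended.append(a)
        f, r, a = table[r]
    # a empty: r is an offset into the buffer holding a full k-mer
    base = "".join(buffer[r:r + k])
    return (base + "".join(reversed(appended)))[-k:]


def get_dict_sketch(read, k, size=16):
    # Assumes size exceeds the number of distinct k-mers (no resizing here).
    table = [None] * size       # each bucket is None or [f, r, a]
    buffer = []                 # dynamic buffer B with full k-mer sequences
    b_pr = None                 # bucket of the preceding k-mer, if any
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        b = kmer_hash(kmer, size)
        # Linear probing (an assumption): skip buckets holding other k-mers.
        while table[b] is not None and decode(table, buffer, b, k) != kmer:
            b = (b + 1) % size
        if table[b] is not None:
            table[b][0] += 1                    # k-mer seen before
        elif b_pr is None:
            l = len(buffer)
            buffer.extend(kmer)                 # no reference: store full k-mer
            table[b] = [1, l, ""]
        else:
            table[b] = [1, b_pr, kmer[-1]]      # reference plus last symbol
        b_pr = b
    return table, buffer


table, buffer = get_dict_sketch("cgttagttaa", 4)
for b, entry in enumerate(table):
    if entry is not None:
        print(b, entry, decode(table, buffer, b, 4))
```

In this sketch only the first k-mer, cgtt, is written to the buffer; every later k-mer is stored as a reference to the bucket of its predecessor plus its last symbol, which is the situation the arrows in the figure depict.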