
Table 4 Breakdown of the space usage of the variants of algorithm A4

From: A representation of a compressed de Bruijn graph for pan-genome analysis that enables search

| Algorithm | Data structure | 62 E. coli | 7 × Chr1 | 7 × HG |
|---|---|---|---|---|
| A4 | Wt-bwt | 0.42 (23.83 %) | 0.44 (36.23 %) | 0.43 (22.68 %) |
| A4 | Nodes | 0.10 (5.94 %) | 0.03 (2.61 %) | 0.04 (2.02 %) |
| A4 | \(B_{r}\) | 0.16 (8.93 %) | 0.16 (12.86 %) | 0.16 (8.25 %) |
| A4 | \(B_{l}\) | 0.14 (8.04 %) | 0.14 (11.57 %) | 0.14 (7.42 %) |
| A4 | Wt-doc | 0.93 (53.26 %) | 0.45 (36.73 %) | 1.13 (59.63 %) |
| A4compr1 | Wt-bwt | 0.42 (28.57 %) | 0.44 (47.83 %) | 0.43 (26.85 %) |
| A4compr1 | Nodes | 0.10 (7.12 %) | 0.03 (3.44 %) | 0.04 (2.39 %) |
| A4compr1 | \(B_{r}\) | 0.00 (0.23 %) | 0.00 (0.12 %) | 0.00 (0.09 %) |
| A4compr1 | \(B_{l}\) | 0.00 (0.23 %) | 0.00 (0.12 %) | 0.00 (0.08 %) |
| A4compr1 | Wt-doc | 0.93 (63.85 %) | 0.45 (48.49 %) | 1.13 (70.59 %) |
| A4compr2 | Wt-bwt | 0.16 (13.03 %) | 0.22 (31.01 %) | 0.22 (15.62 %) |
| A4compr2 | Nodes | 0.10 (8.67 %) | 0.03 (4.55 %) | 0.04 (2.76 %) |
| A4compr2 | \(B_{r}\) | 0.00 (0.28 %) | 0.00 (0.16 %) | 0.00 (0.10 %) |
| A4compr2 | \(B_{l}\) | 0.00 (0.28 %) | 0.00 (0.16 %) | 0.00 (0.10 %) |
| A4compr2 | Wt-doc | 0.93 (77.74 %) | 0.45 (64.11 %) | 1.13 (81.42 %) |
  1. The first column gives the variant of algorithm A4 used in the experiment (k-mer size k = 50). The second column lists the data structures: wt-bwt is the wavelet tree of the \(\mathsf {BWT}\) (including rank and select support), nodes is the array of nodes (the implicit graph representation), \(B_{r}\) and \(B_{l}\) are the bit vectors described in the "Computation of right-maximal k-mers" section (including rank support), and wt-doc is the wavelet tree of the document array. The remaining columns give, for each data set, the memory usage in bytes per base pair and, in parentheses, its percentage of the total memory used by that variant on that data set.
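The percentages in parentheses can be re-derived from the bytes-per-base-pair values: each component's share is its size divided by the sum of all five components for the same variant and data set. The Python sketch below does this for the plain A4 rows. Because the table rounds sizes to two decimals, the recomputed shares only approximate the printed ones (here they agree to within half a percentage point). The dictionary layout and the `shares` helper are illustrative names, not taken from the paper.

```python
# Space usage in bytes per base pair, copied from the A4 rows of Table 4.
# Sizes are rounded to two decimals in the table, so shares recomputed
# from them differ slightly from the printed percentages.
TABLE4_A4 = {
    "62 E. coli": {"Wt-bwt": 0.42, "Nodes": 0.10, "B_r": 0.16, "B_l": 0.14, "Wt-doc": 0.93},
    "7 × Chr1": {"Wt-bwt": 0.44, "Nodes": 0.03, "B_r": 0.16, "B_l": 0.14, "Wt-doc": 0.45},
    "7 × HG": {"Wt-bwt": 0.43, "Nodes": 0.04, "B_r": 0.16, "B_l": 0.14, "Wt-doc": 1.13},
}


def shares(parts):
    """Return (total bytes/bp, {component: percentage of that total})."""
    total = sum(parts.values())
    return total, {name: 100.0 * size / total for name, size in parts.items()}


if __name__ == "__main__":
    for dataset, parts in TABLE4_A4.items():
        total, pct = shares(parts)
        print(f"{dataset}: total {total:.2f} bytes/bp")
        for name, p in pct.items():
            print(f"  {name:7s} {p:6.2f} %")
```

For 62 E. coli, for example, the five components sum to 1.75 bytes per base pair, and wt-doc alone accounts for roughly half of it.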
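The bit vectors \(B_{r}\) and \(B_{l}\) are stored with rank support, i.e. a query rank1(i) returns the number of 1 bits before position i. A common textbook layout packs the bits into 64-bit words and stores one cumulative count per word, so a query needs one lookup plus one popcount. The Python sketch below follows that layout under our own assumptions (block size 64, hypothetical class name `RankBitVector`); the structures actually measured in Table 4 may be organized differently, and production libraries typically use a two-level directory to reduce the overhead of the samples.

```python
class RankBitVector:
    """Plain bit vector with one sampled rank count per 64-bit word (a sketch)."""

    BLOCK = 64

    def __init__(self, bits):
        self.n = len(bits)
        # Pack the bits into integers of BLOCK bits each (bit j of a word = position start + j).
        self.words = []
        for start in range(0, self.n, self.BLOCK):
            word = 0
            for offset, b in enumerate(bits[start:start + self.BLOCK]):
                if b:
                    word |= 1 << offset
            self.words.append(word)
        # samples[j] = number of 1 bits in words[0 .. j-1]
        self.samples = [0]
        for w in self.words:
            self.samples.append(self.samples[-1] + bin(w).count("1"))

    def access(self, i):
        """Return the bit at position i (0 or 1)."""
        return (self.words[i // self.BLOCK] >> (i % self.BLOCK)) & 1

    def rank1(self, i):
        """Number of 1 bits in positions [0, i)."""
        if not 0 <= i <= self.n:
            raise IndexError(i)
        j, r = divmod(i, self.BLOCK)
        count = self.samples[j]
        if r:
            # Popcount of the low r bits of word j.
            count += bin(self.words[j] & ((1 << r) - 1)).count("1")
        return count
```

For example, `RankBitVector([1, 0, 1, 1]).rank1(3)` returns 2.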
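The largest component, wt-doc, is a wavelet tree over the document array. In the usual textbook definition, the document array records, for each position of the suffix array (and hence of the \(\mathsf {BWT}\)), the index of the sequence in which the corresponding suffix starts. The naive Python sketch below makes that definition concrete, assuming each sequence is terminated by a `$` separator and suffixes are ordered by plain string comparison; the function name `document_array` is hypothetical, the construction is quadratic, and the paper's own construction may differ.

```python
from bisect import bisect_right


def document_array(docs):
    """Naive sketch: D[i] = index of the sequence containing the i-th smallest suffix."""
    # Concatenate the sequences, each terminated by a '$' separator.
    text = "".join(d + "$" for d in docs)
    # starts[k] = offset in text at which sequence k begins.
    starts = []
    offset = 0
    for d in docs:
        starts.append(offset)
        offset += len(d) + 1
    # Suffix array by direct sorting (all suffixes are distinct, so the order is total).
    sa = sorted(range(len(text)), key=lambda p: text[p:])
    # Map each suffix start position to the sequence it falls in.
    return [bisect_right(starts, p) - 1 for p in sa]
```

For the two sequences `AC` and `CA`, the sorted suffixes of `AC$CA$` are `$`, `$CA$`, `A$`, `AC$CA$`, `C$CA$`, `CA$`, so the document array is `[1, 0, 1, 0, 0, 1]`.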