
Table 4: Breakdown of the space usage of the variants of Algorithm A4

From: Erratum to: A representation of a compressed de Bruijn graph for pan-genome analysis that enables search

| Algorithm | Part | 62 E. coli | 7 × Chr1 | 7 × HG |
| --- | --- | --- | --- | --- |
| A4 | wt-bwt | 0.42 (23.83%) | 0.44 (36.23%) | 0.43 (22.68%) |
| A4 | Nodes | 0.10 (5.94%) | 0.03 (2.61%) | 0.04 (2.02%) |
| A4 | \(B_r\) | 0.16 (8.93%) | 0.16 (12.86%) | 0.16 (8.25%) |
| A4 | \(B_l\) | 0.14 (8.04%) | 0.14 (11.57%) | 0.14 (7.42%) |
| A4 | wt-doc | 0.93 (53.26%) | 0.45 (36.73%) | 1.13 (59.63%) |
| A4compr1 | wt-bwt | 0.42 (28.57%) | 0.44 (47.83%) | 0.43 (26.85%) |
| A4compr1 | Nodes | 0.10 (7.12%) | 0.03 (3.44%) | 0.04 (2.39%) |
| A4compr1 | \(B_r\) | 0.00 (0.23%) | 0.00 (0.12%) | 0.00 (0.09%) |
| A4compr1 | \(B_l\) | 0.00 (0.23%) | 0.00 (0.12%) | 0.00 (0.08%) |
| A4compr1 | wt-doc | 0.93 (63.85%) | 0.45 (48.49%) | 1.13 (70.59%) |
| A4compr2 | wt-bwt | 0.16 (13.03%) | 0.22 (31.01%) | 0.22 (15.62%) |
| A4compr2 | Nodes | 0.10 (8.67%) | 0.03 (4.55%) | 0.04 (2.76%) |
| A4compr2 | \(B_r\) | 0.00 (0.28%) | 0.00 (0.16%) | 0.00 (0.10%) |
| A4compr2 | \(B_l\) | 0.00 (0.28%) | 0.00 (0.16%) | 0.00 (0.10%) |
| A4compr2 | wt-doc | 0.93 (77.74%) | 0.45 (64.11%) | 1.13 (81.42%) |
1. The first column shows the algorithm variant used in the experiment (k-mer length k = 50). The second column specifies the data structure: wt-bwt is the wavelet tree of the \(\mathsf {BWT}\) (including rank and select support), Nodes is the array of nodes (the implicit graph representation), \(B_r\) and \(B_l\) are the bit vectors described in the section “Computation of right-maximal k-mers and node identifiers” (including rank support), and wt-doc is the wavelet tree of the document array. The remaining columns give the memory usage in bytes per base pair for each data set and, in parentheses, its percentage of the total memory usage of that variant on that data set.
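Because each percentage is a share of the variant's total, the totals per variant can be recovered by summing the parts. The Python sketch below does this for the 62 E. coli column; the dictionary `TABLE4_62_ECOLI` and the helper `total_and_shares` are illustrative names, not part of the original work. Since the table values are rounded to two decimals, the recomputed shares differ from the printed percentages by up to roughly 0.4 percentage points, and the compressed \(B_r\) and \(B_l\) come out as exactly 0.

```python
# Bytes per base pair for the 62 E. coli column of Table 4, as printed
# (rounded to two decimals, so derived totals and shares are approximate).
TABLE4_62_ECOLI = {
    "A4":       {"wt-bwt": 0.42, "Nodes": 0.10, "B_r": 0.16, "B_l": 0.14, "wt-doc": 0.93},
    "A4compr1": {"wt-bwt": 0.42, "Nodes": 0.10, "B_r": 0.00, "B_l": 0.00, "wt-doc": 0.93},
    "A4compr2": {"wt-bwt": 0.16, "Nodes": 0.10, "B_r": 0.00, "B_l": 0.00, "wt-doc": 0.93},
}


def total_and_shares(parts):
    """Return the total bytes/bp and each part's percentage of that total."""
    total = sum(parts.values())
    return total, {name: 100.0 * value / total for name, value in parts.items()}


for variant, parts in TABLE4_62_ECOLI.items():
    total, shares = total_and_shares(parts)
    print(f"{variant}: {total:.2f} bytes/bp, wt-doc {shares['wt-doc']:.1f}%")
# A4: 1.75 bytes/bp, wt-doc 53.1%
# A4compr1: 1.45 bytes/bp, wt-doc 64.1%
# A4compr2: 1.19 bytes/bp, wt-doc 78.2%
```

By this estimate, compressing \(B_r\) and \(B_l\) (A4compr1) saves about 0.30 bytes per base pair on 62 E. coli, A4compr2 saves a further 0.26 bytes per base pair in wt-bwt, and wt-doc is the same in all three variants.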
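The footnote refers to rank and select support on the wavelet trees (wt-bwt, wt-doc) and rank support on the bit vectors \(B_r\) and \(B_l\). To illustrate what these queries compute, here is a minimal Python sketch of a balanced, pointer-based wavelet tree built on bit vectors with rank and select. It is not the implementation measured in the table: the class names `RankBitVector` and `WaveletTree` are hypothetical, and the bit vector keeps a full integer prefix count per position instead of a succinct o(n)-bit index, so its space use is far above the figures reported here.

```python
from bisect import bisect_left
from typing import Optional, Sequence


class RankBitVector:
    """Bit vector with rank/select via full prefix counts (clear, not succinct)."""

    def __init__(self, bits: Sequence[int]) -> None:
        self.bits = list(bits)
        self.ones = [0]  # ones[i] = number of 1s in bits[0:i]
        for b in self.bits:
            self.ones.append(self.ones[-1] + b)
        # zeros[i] = number of 0s in bits[0:i]
        self.zeros = [i - k for i, k in enumerate(self.ones)]

    def rank1(self, i: int) -> int:
        return self.ones[i]

    def rank0(self, i: int) -> int:
        return self.zeros[i]

    def select1(self, j: int) -> int:
        """0-based position of the j-th 1 (j >= 1)."""
        return bisect_left(self.ones, j) - 1

    def select0(self, j: int) -> int:
        """0-based position of the j-th 0 (j >= 1)."""
        return bisect_left(self.zeros, j) - 1


class WaveletTree:
    """Balanced wavelet tree over integer symbols in the range [lo, hi]."""

    def __init__(self, seq: Sequence[int], lo: int, hi: int) -> None:
        self.lo, self.hi = lo, hi
        self.bv: Optional[RankBitVector] = None
        self.left: Optional["WaveletTree"] = None
        self.right: Optional["WaveletTree"] = None
        if lo == hi or not seq:
            return  # leaf, or empty subtree
        mid = (lo + hi) // 2
        # Bit 1 means the symbol belongs to the upper half of the alphabet.
        self.bv = RankBitVector([1 if c > mid else 0 for c in seq])
        self.left = WaveletTree([c for c in seq if c <= mid], lo, mid)
        self.right = WaveletTree([c for c in seq if c > mid], mid + 1, hi)

    def rank(self, c: int, i: int) -> int:
        """Number of occurrences of symbol c in seq[0:i]."""
        if c < self.lo or c > self.hi:
            return 0
        if self.lo == self.hi:
            return i
        if self.bv is None:
            return 0  # empty subtree
        mid = (self.lo + self.hi) // 2
        if c <= mid:
            return self.left.rank(c, self.bv.rank0(i))
        return self.right.rank(c, self.bv.rank1(i))

    def select(self, c: int, j: int) -> int:
        """0-based position of the j-th occurrence of c (assumes it exists)."""
        if self.lo == self.hi:
            return j - 1
        mid = (self.lo + self.hi) // 2
        if c <= mid:
            return self.bv.select0(self.left.select(c, j) + 1)
        return self.bv.select1(self.right.select(c, j) + 1)

    def access(self, i: int) -> int:
        """Symbol stored at position i."""
        if self.lo == self.hi:
            return self.lo
        if self.bv.bits[i]:
            return self.right.access(self.bv.rank1(i))
        return self.left.access(self.bv.rank0(i))


# Example: the BWT of "banana$", with symbols mapped to 0..sigma-1.
bwt = "annb$aa"
alphabet = sorted(set(bwt))
code = {ch: k for k, ch in enumerate(alphabet)}
wt = WaveletTree([code[ch] for ch in bwt], 0, len(alphabet) - 1)
print(wt.rank(code["a"], 7))   # 3 ('a' occurs three times)
print(wt.select(code["a"], 2))  # 5 (second 'a' is at position 5)
print(alphabet[wt.access(3)])  # b
```

In BWT-based indexes, rank queries on the BWT are the core operation of backward search and LF-mapping; a space-efficient implementation would store each level's bits packed, with sampled rank counts, rather than Python lists.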