
Table 4 Breakdown of the space usage of the variants of algorithm A4

From: A representation of a compressed de Bruijn graph for pan-genome analysis that enables search

| Algo | Part | 62 E. coli | 7 × Chr1 | 7 × HG |
|---|---|---|---|---|
| A4 | Wt-bwt | 0.42 (23.83 %) | 0.44 (36.23 %) | 0.43 (22.68 %) |
| A4 | Nodes | 0.10 (5.94 %) | 0.03 (2.61 %) | 0.04 (2.02 %) |
| A4 | \(B_{r}\) | 0.16 (8.93 %) | 0.16 (12.86 %) | 0.16 (8.25 %) |
| A4 | \(B_{l}\) | 0.14 (8.04 %) | 0.14 (11.57 %) | 0.14 (7.42 %) |
| A4 | Wt-doc | 0.93 (53.26 %) | 0.45 (36.73 %) | 1.13 (59.63 %) |
| A4compr1 | Wt-bwt | 0.42 (28.57 %) | 0.44 (47.83 %) | 0.43 (26.85 %) |
| A4compr1 | Nodes | 0.10 (7.12 %) | 0.03 (3.44 %) | 0.04 (2.39 %) |
| A4compr1 | \(B_{r}\) | 0.00 (0.23 %) | 0.00 (0.12 %) | 0.00 (0.09 %) |
| A4compr1 | \(B_{l}\) | 0.00 (0.23 %) | 0.00 (0.12 %) | 0.00 (0.08 %) |
| A4compr1 | Wt-doc | 0.93 (63.85 %) | 0.45 (48.49 %) | 1.13 (70.59 %) |
| A4compr2 | Wt-bwt | 0.16 (13.03 %) | 0.22 (31.01 %) | 0.22 (15.62 %) |
| A4compr2 | Nodes | 0.10 (8.67 %) | 0.03 (4.55 %) | 0.04 (2.76 %) |
| A4compr2 | \(B_{r}\) | 0.00 (0.28 %) | 0.00 (0.16 %) | 0.00 (0.10 %) |
| A4compr2 | \(B_{l}\) | 0.00 (0.28 %) | 0.00 (0.16 %) | 0.00 (0.10 %) |
| A4compr2 | Wt-doc | 0.93 (77.74 %) | 0.45 (64.11 %) | 1.13 (81.42 %) |

  1. The first column shows the algorithm variant used in the experiment (the k-mer size is 50). The second column specifies the data structure: wt-bwt is the wavelet tree of the \(\mathsf{BWT}\) (including rank and select support), nodes is the array of nodes (the implicit graph representation), \(B_{r}\) and \(B_{l}\) are the bit vectors described in the "Computation of right-maximal k-mers" section (including rank support), and wt-doc is the wavelet tree of the document array. The remaining columns show the memory usage in bytes per base pair and, in parentheses, each part's share of that variant's total space usage on the given data set.
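
The percentages in parentheses can be read as each part's share of one variant's total on one data set; within each variant and data set they sum to 100 %. The following is a minimal sketch of that arithmetic for variant A4 on the 62 E. coli data set. The dictionary and the function name `percentage_shares` are illustrative and not taken from the paper. Because the sketch starts from the two-decimal values printed in the table, its results differ slightly from the printed percentages, which were presumably derived from unrounded sizes.

```python
# Bytes per base pair for variant A4 on the 62 E. coli data set,
# copied from the table (already rounded to two decimals).
a4_ecoli = {
    "wt-bwt": 0.42,
    "nodes": 0.10,
    "B_r": 0.16,
    "B_l": 0.14,
    "wt-doc": 0.93,
}


def percentage_shares(parts):
    """Return each part's share of the summed size, in percent."""
    total = sum(parts.values())
    return {name: 100.0 * size / total for name, size in parts.items()}


shares = percentage_shares(a4_ecoli)
for name, share in shares.items():
    print(f"{name}: {share:.2f} %")
```

The same function applies unchanged to the other variants and data sets by substituting the corresponding column of values.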