A representation of a compressed de Bruijn graph for pan-genome analysis that enables search

Table 6 Runtime and main memory usage for finding sequences that correspond to given nodes

k	Algorithm	62 E. coli	7 × Chr1	7 × HG
50	A4	10.84 (1.81)	3.31 (1.28)	15.33 (1.96)
50	A4compr1	10.91 (1.52)	3.17 (0.98)	14.88 (1.66)
50	A4compr2	11.02 (1.20)	3.07 (0.70)	13.02 (1.39)
100	A4	8.31 (1.78)	2.72 (1.26)	10.99 (1.94)
100	A4compr1	8.11 (1.49)	2.83 (0.97)	9.10 (1.64)
100	A4compr2	8.23 (1.17)	2.84 (0.68)	9.25 (1.37)
500	A4	2.43 (1.73)	1.32 (1.26)	4.51 (1.93)
500	A4compr1	2.78 (1.43)	1.32 (0.96)	4.22 (1.63)
500	A4compr2	2.32 (1.11)	1.29 (0.67)	4.30 (1.36)

The first column shows the k-mer size and the second column specifies the algorithm used in the experiment. The remaining columns show, for each of the data sets described in the text, the runtime in seconds for determining the sequences to which each node belongs and, in parentheses, the maximum main memory usage in bytes per base pair. The nodes correspond to 10,000 patterns of length 900 that occur in the pan-genome.

ISSN: 1748-7188