Table 4 The dictionary and parse sizes for prefixes of a database of Salmonella genomes, with three settings of the parameters w and p

From: Prefix-free parsing for building big BWTs

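| Number of genomes | Size | \(w = 6, p = 20\) Dict. | \(w = 6, p = 20\) Parse | \(w = 6, p = 20\) % | \(w = 8, p = 50\) Dict. | \(w = 8, p = 50\) Parse | \(w = 8, p = 50\) % | \(w = 10, p = 100\) Dict. | \(w = 10, p = 100\) Parse | \(w = 10, p = 100\) % |
|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 249 | 68 | 43 | 44 | *77* | *20* | 39 | 91 | 10 | 40 |
| 100 | 485 | 83 | 85 | 35 | *99* | *39* | 28 | 122 | 19 | 29 |
| 500 | 2436 | 273 | 424 | 29 | 314 | 194 | 21 | *377* | *96* | 19 |
| 1000 | 4861 | 475 | 847 | 27 | 541 | 388 | 19 | *643* | *192* | 17 |
| 5000 | 24936 | 2663 | 4334 | 28 | 2915 | 1987 | 20 | *3196* | *985* | 17 |
| 10,000 | 49420 | 4190 | 8611 | 26 | 4652 | 3939 | 17 | *5176* | *1955* | 14 |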
  1. Again, all sizes are reported in megabytes; percentages are the sums of the sizes of the dictionaries and parses, divided by the sizes of the uncompressed files
  2. For each prefix, the sizes are in italics for the settings with the best overall compression
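
The percentage defined in the first note can be recomputed from the other columns. The following is a minimal sketch in Python, not code from the article: the function name `overall_percentage` and the `settings` dictionary are hypothetical, and the numbers are copied from the 10,000-genome row of the table. Because the published sizes are rounded to whole megabytes, recomputed values can differ from the printed percentages by about one point in some rows.

```python
def overall_percentage(dict_mb, parse_mb, uncompressed_mb):
    """Return 100 * (dictionary size + parse size) / uncompressed size."""
    return 100.0 * (dict_mb + parse_mb) / uncompressed_mb


# Values from the 10,000-genome row: uncompressed size in megabytes,
# then (dictionary, parse) sizes in megabytes for each (w, p) setting.
uncompressed_mb = 49420
settings = {
    "w=6, p=20": (4190, 8611),
    "w=8, p=50": (4652, 3939),
    "w=10, p=100": (5176, 1955),
}

percentages = {
    name: overall_percentage(d, p, uncompressed_mb)
    for name, (d, p) in settings.items()
}

# The setting with the smallest percentage has the best overall compression.
best = min(percentages, key=percentages.get)

for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}%")
print("best:", best)
```

For this row, the smallest percentage belongs to the \(w = 10, p = 100\) setting, which matches the 14 reported in the table.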