Table 3 The dictionary and parse sizes for several files from the Pizza and Chili repetitive corpus, with three settings of the parameters w and p

From: Prefix-free parsing for building big BWTs

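| File | Size | Dict. \(w = 6, p = 20\) | Parse \(w = 6, p = 20\) | % \(w = 6, p = 20\) | Dict. \(w = 8, p = 50\) | Parse \(w = 8, p = 50\) | % \(w = 8, p = 50\) | Dict. \(w = 10, p = 100\) | Parse \(w = 10, p = 100\) | % \(w = 10, p = 100\) |
|---|---|---|---|---|---|---|---|---|---|---|
| cere | 440 | 61 | 77 | 31 | 43 | 159 | 46 | 89 | 17 | 24 |
| cere_no_Ns | 409 | 33 | 77 | 27 | 43 | 33 | 18 | 60 | 17 | 19 |
| dna.001.1 | 100 | 8 | 20 | 27 | 13 | 9 | 21 | 21 | 4 | 25 |
| einstein.en.txt | 446 | 2 | 87 | 20 | 3 | 39 | 9 | 4 | 17 | 5 |
| influenza | 148 | 16 | 28 | 30 | 32 | 12 | 29 | 49 | 6 | 37 |
| kernel | 247 | 14 | 52 | 26 | 14 | 20 | 13 | 15 | 10 | 10 |
| world_leaders | 45 | 5 | 5 | 21 | 8 | 2 | 21 | 11 | 1 | 26 |
| world_leaders_no_dots | 23 | 4 | 5 | 34 | 6 | 2 | 31 | 7 | 1 | 33 |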
  1. All sizes are reported in megabytes; percentages are the sums of the sizes of the dictionaries and parses, divided by the sizes of the uncompressed files
  2. For each file, the sizes are in italics for the settings with the best overall compression
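
The percentage defined in the first footnote can be checked with a few lines of Python. This is only a minimal sketch: the function name `combined_percentage` and the dictionary layout are illustrative assumptions, not part of the paper or its software, and the inputs are the rounded megabyte values for the cere row of the table. Because the table reports sizes rounded to whole megabytes, percentages recomputed this way can differ from the printed ones by about one point for some rows.

```python
def combined_percentage(dict_mb, parse_mb, file_mb):
    """Dictionary size plus parse size, as a percentage of the uncompressed file size."""
    return 100.0 * (dict_mb + parse_mb) / file_mb


# Rounded sizes in megabytes for the cere row (uncompressed size 440 MB).
# Keys are the parameter settings; values are (dictionary, parse) sizes.
cere_file_mb = 440
cere_settings = {
    "w=6, p=20": (61, 77),
    "w=8, p=50": (43, 159),
    "w=10, p=100": (89, 17),
}

cere_percentages = {
    setting: round(combined_percentage(d, p, cere_file_mb))
    for setting, (d, p) in cere_settings.items()
}

# The setting with the best overall compression is the one with the smallest percentage.
best_setting = min(cere_percentages, key=cere_percentages.get)

print(cere_percentages)
print(best_setting)
```

For the cere row the recomputed values agree with the table (31, 46 and 24), and the smallest percentage is obtained with \(w = 10, p = 100\).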