Table 3 The dictionary and parse sizes for several files from the Pizza and Chili repetitive corpus, with three settings of the parameters w and p

From: Prefix-free parsing for building big BWTs

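| File | Size | Dict. \(w = 6, p = 20\) | Parse \(w = 6, p = 20\) | % \(w = 6, p = 20\) | Dict. \(w = 8, p = 50\) | Parse \(w = 8, p = 50\) | % \(w = 8, p = 50\) | Dict. \(w = 10, p = 100\) | Parse \(w = 10, p = 100\) | % \(w = 10, p = 100\) |
|---|---|---|---|---|---|---|---|---|---|---|
| cere | 440 | 61 | 77 | 31 | 43 | 159 | 46 | 89 | 17 | 24 |
| cere_no_Ns | 409 | 33 | 77 | 27 | 43 | 33 | 18 | 60 | 17 | 19 |
| dna.001.1 | 100 | 8 | 20 | 27 | 13 | 9 | 21 | 21 | 4 | 25 |
| einstein.en.txt | 446 | 2 | 87 | 20 | 3 | 39 | 9 | 4 | 17 | 5 |
| influenza | 148 | 16 | 28 | 30 | 32 | 12 | 29 | 49 | 6 | 37 |
| kernel | 247 | 14 | 52 | 26 | 14 | 20 | 13 | 15 | 10 | 10 |
| world_leaders | 45 | 5 | 5 | 21 | 8 | 2 | 21 | 11 | 1 | 26 |
| world_leaders_no_dots | 23 | 4 | 5 | 34 | 6 | 2 | 31 | 7 | 1 | 33 |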

  1. All sizes are reported in megabytes; percentages are the sums of the sizes of the dictionaries and parses, divided by the sizes of the uncompressed files
  2. For each file, the sizes are in italics for the settings with the best overall compression
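
The percentage described in footnote 1 can be sketched as a small calculation. The function name `combined_percentage` and its parameter names are illustrative assumptions, not taken from the article; the numbers passed in are the rounded megabyte values reported for `cere`.

```python
def combined_percentage(dict_mb: float, parse_mb: float, file_mb: float) -> float:
    """Return (dictionary size + parse size) as a percentage of the file size.

    Hypothetical helper: it mirrors the definition in footnote 1,
    using sizes in megabytes as reported in the table.
    """
    if file_mb <= 0:
        raise ValueError("file size must be positive")
    return 100.0 * (dict_mb + parse_mb) / file_mb


# Rounded table values for the file "cere" (440 MB uncompressed).
cere_w6 = combined_percentage(61, 77, 440)    # setting w = 6, p = 20
cere_w10 = combined_percentage(89, 17, 440)   # setting w = 10, p = 100

print(f"cere, w = 6, p = 20:   {cere_w6:.1f}%")
print(f"cere, w = 10, p = 100: {cere_w10:.1f}%")
```

Because the table lists sizes rounded to whole megabytes, a percentage recomputed this way can differ from the reported one by about a point. For example, `cere_no_Ns` at \(w = 8, p = 50\) gives 76/409, roughly 18.6%, while the table reports 18, presumably because the reported value was computed from unrounded sizes.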