Table 2 The weights and sizes of various string set representations

From: Disk compression of k-mer sets

| Dataset | UST # strings | UST #char/\(k\)-mer | ESS-Tip-Compress # strings | ESS-Tip-Compress #char/\(k\)-mer | ESS-Compress # strings | ESS-Compress #char/\(k\)-mer | Eq. (3.1) lower bound #char/\(k\)-mer |
|---|---|---|---|---|---|---|---|
| R. sphaeroides | 240,562 | 2.22 | 61,909 | 1.38 | 36,456 | 1.29 | 1.28 |
| Human RNA-seq | 4,098,389 | 2.22 | 1,834,945 | 1.60 | 1,098,938 | 1.42 | 1.39 |
| Gingiva metagenome | 3,095,476 | 1.91 | 1,499,270 | 1.48 | 917,388 | 1.33 | 1.32 |
| Soybean RNA-seq | 1,806,078 | 1.49 | 1,137,350 | 1.32 | 515,244 | 1.17 | 1.17 |
| Tongue metagenome | 6,030,814 | 2.10 | 2,664,422 | 1.53 | 1,327,701 | 1.33 | 1.32 |
| Whole human | 22,072,219 | 1.32 | 21,320,263 | 1.28 | 10,321,275 | 1.15 | 1.14 |

1. The rightmost column shows the lower bound computed by Eq. (3.1) in Sect. "The weight of the ESS-Compress representation". The weight of ESS-Compress was verified to match the value predicted by Theorem 3.2.
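As a reading aid, the short Python sketch below recomputes two quantities from the table's #char/\(k\)-mer columns. The first is the absolute gap between ESS-Compress and the Eq. (3.1) lower bound. The second is the fractional reduction of ESS-Compress relative to UST. It uses only the rounded two-decimal values printed in the table. The gaps are therefore accurate only to about ±0.01, and this is an illustrative summary, not an analysis taken from the paper. The names `TABLE` and `summarize` are hypothetical.

```python
# Hypothetical helper: values transcribed from Table 2 (#char/k-mer columns only).
# They are rounded to two decimals, so derived gaps are approximate.
TABLE = {
    # dataset: (UST, ESS-Tip-Compress, ESS-Compress, Eq. (3.1) lower bound)
    "R. sphaeroides": (2.22, 1.38, 1.29, 1.28),
    "Human RNA-seq": (2.22, 1.60, 1.42, 1.39),
    "Gingiva metagenome": (1.91, 1.48, 1.33, 1.32),
    "Soybean RNA-seq": (1.49, 1.32, 1.17, 1.17),
    "Tongue metagenome": (2.10, 1.53, 1.33, 1.32),
    "Whole human": (1.32, 1.28, 1.15, 1.14),
}


def summarize(table):
    """Return {dataset: (gap_to_bound, fraction_saved_vs_ust)} for ESS-Compress."""
    out = {}
    for name, (ust, _tip, ess, bound) in table.items():
        gap = ess - bound            # distance of ESS-Compress from the lower bound
        saved = (ust - ess) / ust    # fractional reduction in #char/k-mer vs UST
        out[name] = (gap, saved)
    return out


if __name__ == "__main__":
    for name, (gap, saved) in summarize(TABLE).items():
        print(f"{name:20s} gap={gap:.2f}  saved vs UST={saved:.1%}")
```

On these rounded values, ESS-Compress is at most 0.03 characters per \(k\)-mer above the bound (Human RNA-seq). Its savings over UST range from roughly 13% (Whole human) to roughly 42% (R. sphaeroides).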