Table 3 The compression sizes, as measured in bits per \(k\)-mer in the compressed output

From: Disk compression of k-mer sets

| Dataset | Read FASTA | One \(k\)-mer per line | BOSS | UST-Compress | ESS-Tip-Compress | ESS-Compress |
|---|---|---|---|---|---|---|
| R. sphaeroides | 45.4 | 28.4 | 6.55 | 3.93 | 2.90 | 2.87 |
| Human RNA-seq | 45.8 | 31.7 | 6.89 | 4.14 | 3.43 | 3.33 |
| Gingiva metagenome | 48.0 | 32.4 | 10.64 | 3.76 | 3.22 | 3.05 |
| Soybean RNA-seq | 43.0 | 33.1 | 5.97 | 2.83 | 2.66 | 2.55 |
| Tongue metagenome | 48.1 | 33.3 | 3.59 | 4.07 | 3.32 | 3.07 |
| Whole human | 31.9 | 48.2 | 4.65 | 2.49 | 2.46 | 2.40 |

  1. All string representations (i.e. all except BOSS) are compressed using MFC in the final step. Since BOSS is a binary representation, we use LZMA for its final compression step.
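
The metric in the table, bits per \(k\)-mer, is the size of the compressed output in bits divided by the number of distinct \(k\)-mers it represents. The following is a minimal sketch of that arithmetic, not the authors' evaluation script. The function names `bits_per_kmer` and `lzma_bits_per_kmer` are hypothetical, and Python's standard `lzma` module stands in for the LZMA step applied to a binary representation; the MFC step for string representations is not shown.

```python
import lzma


def bits_per_kmer(compressed_size_bytes: int, num_kmers: int) -> float:
    """Return the compressed size in bits divided by the number of k-mers."""
    if num_kmers <= 0:
        raise ValueError("num_kmers must be positive")
    return 8 * compressed_size_bytes / num_kmers


def lzma_bits_per_kmer(payload: bytes, num_kmers: int) -> float:
    """Compress a binary payload with LZMA and report bits per k-mer.

    `payload` is assumed to be a binary representation of the k-mer set;
    how that representation is built is outside the scope of this sketch.
    """
    compressed = lzma.compress(payload, preset=9)
    return bits_per_kmer(len(compressed), num_kmers)
```

For example, a 1,000,000-byte compressed file that represents 2,000,000 \(k\)-mers gives `bits_per_kmer(1_000_000, 2_000_000) == 4.0`.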