Table 3 Experiments on references and read sets of single genomes with k = 52 and a min abundance of 10 for human and 1 for the others

From: Eulertigs: minimum plain text representation of k-mer sets without repetitions in linear time

| Genome | Algorithm | CL ratio | SC ratio | Time [s] | Memory [GiB] |
|---|---|---|---|---|---|
| C. elegans (reads) | Unitigs | 1.788 | 2.824 | 2278 | 5.94 |
| | UST | 1.034 | 1.079 | 3164 (1.39) | 15.0 (2.53) |
| | Eulertigs | 1 | 1 | 3101 (1.36) | 24.8 (4.17) |
| B. mori (reads) | Unitigs | 1.911 | 3.133 | 7157 | 9.35 |
| | UST | 1.050 | 1.117 | 10530 (1.47) | 52.3 (5.59) |
| | Eulertigs | 1 | 1 | 10006 (1.40) | 78.3 (8.38) |
| H. sapiens (reads) | Unitigs | 1.414 | 2.135 | 56418 | 12.0 |
| | UST | 1.016 | 1.043 | 57174 (1.01) | 16.1 (1.35) |
| | Eulertigs | 1 | 1 | 57252 (1.01) | 25.9 (2.17) |
| C. elegans | Unitigs | 1.059 | 3.145 | 72.9 | 1.22 |
| | UST | 1.002 | 1.088 | 76.2 (1.05) | 1.22 (1.00) |
| | Eulertigs | 1 | 1 | 82.0 (1.13) | 1.22 (1.00) |
| B. mori | Unitigs | 1.259 | 3.296 | 259 | 3.33 |
| | UST | 1.017 | 1.153 | 295 (1.14) | 3.33 (1.00) |
| | Eulertigs | 1 | 1 | 311 (1.20) | 3.33 (1.00) |
| H. sapiens | Unitigs | 1.190 | 3.521 | 1509 | 10.0 |
| | UST | 1.014 | 1.190 | 1708 (1.13) | 10.0 (1.00) |
| | Eulertigs | 1 | 1 | 1845 (1.22) | 10.0 (1.00) |

  1. The CL and SC ratios are given relative to the CL-optimal Eulertigs. For time and memory, we report the total time and maximum memory required to compute the tigs from the respective data set. BCALM2 directly computes unitigs, while UST- and Eulertigs require a run of BCALM2 first before they can be computed themselves. Prophasm can only be run for \(k \le 32\), which does not make sense for large genomes. The number in parentheses after time and memory indicates the slowdown/increase relative to computing just unitigs with BCALM2. BCALM2 was run with 28 threads, while all other tools support only one thread. The genome lengths are 100 Mbp for C. elegans, 482 Mbp for B. mori and 3.21 Gbp for H. sapiens. The read data sets have a coverage of 64x for C. elegans, 58x for B. mori and 300x for H. sapiens.
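As an illustration of how the parenthesized factors relate to the raw columns, the following minimal sketch divides a tool's total time and memory by the corresponding BCALM2 unitig baseline, using the C. elegans (reads) rows. The function name `overhead_factor` and the dictionary layout are assumptions made for this example and are not taken from the paper or its tooling. Recomputing from the rounded table entries can differ in the last digit from the printed factors, presumably because those were derived from unrounded measurements.

```python
def overhead_factor(value: float, baseline: float) -> float:
    """Return value / baseline, i.e. the slowdown or memory increase
    relative to computing only unitigs (hypothetical helper)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return value / baseline


# Values copied from the C. elegans (reads) rows of the table:
# total time in seconds and maximum memory in GiB.
unitigs = {"time_s": 2278, "memory_gib": 5.94}  # BCALM2 only
ust = {"time_s": 3164, "memory_gib": 15.0}      # BCALM2 + UST

time_factor = overhead_factor(ust["time_s"], unitigs["time_s"])
memory_factor = overhead_factor(ust["memory_gib"], unitigs["memory_gib"])

print(f"UST time: {time_factor:.2f}x, memory: {memory_factor:.2f}x")
# prints "UST time: 1.39x, memory: 2.53x"
```

Both results match the factors printed for UST in that row group.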