
Table 5 Experiments on references and read sets of single genomes with \(k = 102\) and a min abundance of 10 for human and 1 for the others

From: Eulertigs: minimum plain text representation of k-mer sets without repetitions in linear time

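| Genome | Algorithm | CL ratio | SC ratio | Time [s] | Memory [GiB] |
|---|---|---|---|---|---|
| C. elegans (reads) | Unitigs | 1.742 | 2.588 | 5585 | 5.91 |
| C. elegans (reads) | UST | 1.023 | 1.049 | 6292 (1.13) | 11.8 (2.00) |
| C. elegans (reads) | Eulertigs | 1 | 1 | 6565 (1.18) | 21.6 (3.66) |
| B. mori (reads) | Unitigs | 1.891 | 3.003 | 34979 | 10.8 |
| B. mori (reads) | UST | 1.042 | 1.093 | 38272 (1.09) | 47.1 (4.36) |
| B. mori (reads) | Eulertigs | 1 | 1 | 38939 (1.11) | 77.3 (7.17) |
| H. sapiens (reads) | Unitigs | 1.334 | 1.927 | 191808 | 9.15 |
| H. sapiens (reads) | UST | 1.008 | 1.021 | 192219 (1.00) | 10.8 (1.18) |
| H. sapiens (reads) | Eulertigs | 1 | 1 | 192464 (1.00) | 13.8 (1.50) |
| C. elegans | Unitigs | 1.042 | 3.061 | 176 | 2.14 |
| C. elegans | UST | 1.001 | 1.063 | 179 (1.01) | 2.14 (1.00) |
| C. elegans | Eulertigs | 1 | 1 | 186 (1.05) | 2.14 (1.00) |
| B. mori | Unitigs | 1.133 | 2.805 | 756 | 3.15 |
| B. mori | UST | 1.005 | 1.071 | 771 (1.02) | 3.15 (1.00) |
| B. mori | Eulertigs | 1 | 1 | 801 (1.06) | 3.15 (1.00) |
| H. sapiens | Unitigs | 1.060 | 3.189 | 5204 | 17.4 |
| H. sapiens | UST | 1.003 | 1.101 | 5277 (1.01) | 17.4 (1.00) |
| H. sapiens | Eulertigs | 1 | 1 | 5474 (1.05) | 17.4 (1.00) |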

  1. The CL and SC ratios are given relative to the CL-optimal Eulertigs. For time and memory, we report the total time and maximum memory required to compute the tigs from the respective data set. BCALM2 computes unitigs directly, while UST and Eulertigs first require a run of BCALM2 before they can be computed. Prophasm can only be run for \(k \le 32\), which does not make sense for large genomes. The number in parentheses after each time and memory value indicates the slowdown or memory increase relative to computing just the unitigs with BCALM2. BCALM2 was run with 28 threads, while all other tools support only one thread. The genome lengths are 100 Mbp for C. elegans, 482 Mbp for B. mori and 3.21 Gbp for H. sapiens, and the read data sets have a coverage of 64x for C. elegans, 58x for B. mori and 300x for H. sapiens.
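The parenthesized factors can be recomputed from the table as the ratio of a tool's total time or peak memory to the corresponding unitig (BCALM2) value. The following is a minimal sketch of that arithmetic for the C. elegans (reads) rows. The helper name `overhead` is hypothetical and not part of any of the tools. Because the table entries are themselves rounded, a recomputed factor may differ from the printed one in the last digit.

```python
def overhead(value: float, baseline: float) -> float:
    """Return value / baseline, rounded to two decimals (hypothetical helper)."""
    return round(value / baseline, 2)


# C. elegans (reads) rows copied from the table: (time in s, memory in GiB)
unitigs = (5585, 5.91)  # baseline: BCALM2 only
ust = (6292, 11.8)      # BCALM2 followed by UST

time_factor = overhead(ust[0], unitigs[0])
mem_factor = overhead(ust[1], unitigs[1])

print(f"UST: time x{time_factor:.2f}, memory x{mem_factor:.2f}")
# prints "UST: time x1.13, memory x2.00"
```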