Table 2 Experiments on (references of) pangenomes with \(k = 31\) and a minimum abundance of 1

From: Eulertigs: minimum plain text representation of k-mer sets without repetitions in linear time

| Pangenome | Tigs | CL ratio | SC ratio | Time [s] | Memory [MiB] |
|---|---|---|---|---|---|
| 1102x N. gonorrhoeae | Unitigs | 1.615 | 3.052 | 29.1 | 4351 |
| | UST | 1.022 | 1.072 | 31.4 (1.08) | 4351 (1.00) |
| | ProphAsm | 1.00004 | 1.00013 | 734 (25.2) | 208 (0.05) |
| | Eulertigs | 1 | 1 | 30.2 (1.04) | 4351 (1.00) |
| 616x S. pneumoniae | Unitigs | 1.679 | 3.055 | 26.1 | 3146 |
| | UST | 1.026 | 1.080 | 30.8 (1.18) | 3146 (1.00) |
| | ProphAsm | 1.00004 | 1.00012 | 412 (15.8) | 434 (0.14) |
| | Eulertigs | 1 | 1 | 29.3 (1.12) | 3146 (1.00) |
| 3682x E. coli | Unitigs | 1.705 | 3.092 | 334 | 7117 |
| | UST | 1.031 | 1.092 | 418 (1.25) | 7117 (1.00) |
| | ProphAsm | 1.00008 | 1.00023 | 7066 (21.1) | 7221 (1.01) |
| | Eulertigs | 1 | 1 | 398 (1.19) | 7117 (1.00) |
| \(\sim\)309kx Salmonella | Unitigs | 1.830 | 3.151 | 82417 | 13007 |
| | UST | 1.049 | 1.126 | 82836 (1.01) | 13007 (1.00) |
| | Eulertigs | 1 | 1 | 82732 (1.00) | 13007 (1.00) |
| 2505x Human | Unitigs | 1.479 | 3.201 | 77582 | 411472 |
| | ProphAsm | 1.00004 | 1.00017 | 82797* (1.07) | 411472* (1.00) |
| | Eulertigs | 1 | 1 | 79198 (1.02) | 411472 (1.00) |

  1. The CL and SC ratios are given relative to the CL-optimal Eulertigs. For time and memory, we report the total time and maximum memory required to compute the tigs from the respective data set. BCALM2 computes unitigs directly, while UST and Eulertigs each require a prior run of BCALM2 before they can be computed. ProphAsm is run directly on the source data. The number in parentheses after time and memory indicates the slowdown/increase over computing just unitigs with BCALM2. BCALM2 was run with 28 threads, while all other tools support only one thread. The N. gonorrhoeae pangenome contains 8.36 million unique k-mers, the S. pneumoniae pangenome contains 19.3 million unique k-mers, the E. coli pangenome contains 341 million unique k-mers, the Salmonella pangenome contains 657 million unique k-mers and the human pangenome contains 2.8 billion unique k-mers. Due to its size, ProphAsm could not be run on the Salmonella pangenome. Also due to size, BCALM2 did not run on the human pangenome, hence we used Cuttlefish 2. To still be able to compare against competitors, we ran ProphAsm on the unitigs produced by Cuttlefish 2 (UST requires extra information specific to BCALM2).
  2. * Indicates that resource usage includes running Cuttlefish 2 for ProphAsm
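
The parenthesised slowdown/increase factors described in the footnote can be recomputed from the table itself. The following is a minimal sketch, assuming the factor is simply the tool's value divided by the unitig (BCALM2) value of the same data set; the names `BASELINE`, `TOOLS`, `relative_to_baseline` and `ratios` are hypothetical and chosen only for illustration. The inputs are the N. gonorrhoeae rows of Table 2. Because the table reports rounded values, factors recomputed this way may differ from the printed ones in the last digit for some rows.

```python
# Values copied from the N. gonorrhoeae rows of Table 2.
# BASELINE is the unitig row (computed with BCALM2).
BASELINE = {"time_s": 29.1, "memory_mib": 4351}
TOOLS = {
    "UST": {"time_s": 31.4, "memory_mib": 4351},
    "ProphAsm": {"time_s": 734, "memory_mib": 208},
    "Eulertigs": {"time_s": 30.2, "memory_mib": 4351},
}


def relative_to_baseline(value, baseline):
    """Return value / baseline (assumed definition of the factor in parentheses)."""
    return value / baseline


def ratios(tools, baseline):
    """Compute the time and memory factors of every tool against the baseline."""
    return {
        name: {key: relative_to_baseline(row[key], baseline[key]) for key in baseline}
        for name, row in tools.items()
    }


if __name__ == "__main__":
    for name, factors in ratios(TOOLS, BASELINE).items():
        print(f"{name}: time x{factors['time_s']:.2f}, memory x{factors['memory_mib']:.2f}")
```

A factor above 1 means the tool needed more time or memory than computing unitigs alone; a factor below 1, as for ProphAsm's memory on this data set, means it needed less.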