
Table 3 Assessment of assembly qualities for LazyB, Canu, Wtdbg2, HASLR, Wengan and short-read only assemblies for two model organisms

From: LazyB: fast and cheap genome assembly

| Org. | X | Tool | Compl. [%] | #ctg | #MA | MM | InDels | NA50 |
|---|---|---|---|---|---|---|---|---|
| Yeast | \({\sim} 5\times \) | LazyB | 90.466 | 127 | 9 | 192.56 | 274.62 | 118843 |
| | | LazyB+QM | 94.378 | 64 | 12 | 174.77 | 245.05 | 311094 |
| | | Canu | 14.245 | 115 | 5 | 361.47 | 2039.15 | |
| | | Wtdbg2 | 22.237 | 177 | 0 | 849.07 | 805.31 | |
| | | HASLR | 64.158 | 111 | 1 | 14.87 | 34.86 | 60316 |
| | | DBG2OLC | 45.645 | 53 | 20 | 2066.64 | 1655.92 | |
| | | Wengan | 95.718 | 41 | 11 | 49.14 | 68.47 | 438928 |
| | \({\sim} 11\times \) | LazyB | 97.632 | 33 | 15 | 193.73 | 300.20 | 505126 |
| | | LazyB+QM | 94.211 | 34 | 14 | 234.59 | 329.4 | 453273 |
| | | Canu | 92.615 | 66 | 15 | 107.00 | 1343.37 | 247477 |
| | | Wtdbg2 | 94.444 | 42 | 8 | 420.96 | 1895.28 | 389196 |
| | | HASLR | 92.480 | 57 | 1 | 7.89 | 33.91 | 251119 |
| | | DBG2OLC | 97.689 | 38 | 25 | 55.06 | 1020.48 | 506907 |
| | | Wengan | 96.036 | 37 | 4 | 32.35 | 53.04 | 496058 |
| | \({\sim} 80\times \) | Abyss | 95.247 | 283 | 0 | 9.13 | 1.90 | 90927 |
| Fruit fly | \({\sim} 5\times \) | LazyB | 71.624 | 1879 | 68 | 446.19 | 492.43 | 64415 |
| | | LazyB+QM | 75.768 | 1164 | 79 | 322.49 | 349.29 | 167975 |
| | | Canu | | | | | | |
| | | Wtdbg2 | 6.351 | 2293 | 2 | 916.77 | 588.19 | |
| | | HASLR | 24.484 | 1407 | 10 | 31.07 | 58.96 | |
| | | DBG2OLC | 25.262 | 974 | 141 | 1862.85 | 969.26 | |
| | | Wengan | 81.02 | 2129 | 192 | 105.35 | 123.33 | 77215 |
| | \({\sim} 10\times \) | LazyB | 80.111 | 596 | 99 | 433.37 | 486.28 | 454664 |
| | | LazyB+QM | 80.036 | 547 | 100 | 416.34 | 467.14 | 485509 |
| | | Canu | 49.262 | 1411 | 275 | 494.66 | 1691.11 | |
| | | Wtdbg2 | 41.82 | 1277 | 155 | 2225.12 | 1874.01 | |
| | | HASLR | 67.059 | 2463 | 45 | 43.83 | 84.89 | 36979 |
| | | DBG2OLC | 82.52 | 487 | 468 | 739.47 | 1536.32 | 498732 |
| | | Wengan | 84.129 | 926 | 237 | 114.96 | 154.03 | 221730 |
| | \({\sim} 45\times \) | Abyss | 83.628 | 5811 | 123 | 6.20 | 8.31 | 67970 |
| Human | \({\sim} 10\times \) | LazyB | 67.108 | 13210 | 2915 | 1177.59 | 1112.84 | 168170 |
| | \({\sim} 43\times \) | Unitig | 69.422 | 4146090 | 252 | 93.07 | 13.65 | 338 |
| | \({\sim} 43\times \) | Abyss | 84.180 | 510315 | 2669 | 98.53 | 25.03 | 7963 |

  1. LazyB outperforms Canu and Wtdbg2 in all categories, while significantly reducing contig counts compared to short-read only assemblies. While HASLR is more accurate, it covers significantly lower fractions of the genomes, at a higher contig count and a drastically lower NA50. DBG2OLC produces few contigs at a high NA50 in the higher-coverage cases, but calls significantly more mis-assemblies. Wengan performs well for yeast, but produces more mis-assemblies at a higher contig count on fruit fly. Merging LazyB assemblies with the set of short-read contigs (+QM) has a positive effect at 5\(\times \) long-read coverage but negligible influence at higher coverage. Mismatches and InDels are given per 100 kb. Accordingly, errors in LazyB's unpolished output constitute \(<1\)% except for human. Wtdbg2 assemblies were not polished. Column descriptions: X: coverage of sequencing data; Compl.: completeness of the assembly; #ctg: number of contigs; #MA: number of mis-assemblies (breakpoints relative to the reference assembly); MM and InDels: mismatches and InDels relative to the reference genomes; NA50: NA50 of correctly assembled contigs. We follow the definition of QUAST: given a set of fragments, defined as the sub-regions of the original contigs that were correctly aligned to the reference, the NA50 (also named NGA50) is the minimal length of a fragment needed to cover 50% of the genome. This value is omitted when \(< 50\%\) is correctly recalled.
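The NA50 definition and the per-100 kb error figures in the note above can be made concrete with a minimal Python sketch. It is not QUAST's implementation: the function names `na50` and `error_percent` are hypothetical, the list of aligned fragment lengths is assumed to be already available (for example from an alignment report), and summing mismatches and InDels into one error rate, with each event counted as a single error, is an assumption made here for illustration.

```python
from typing import Optional, Sequence


def na50(aligned_fragment_lengths: Sequence[int], genome_length: int) -> Optional[int]:
    """Length of the fragment at which the correctly aligned fragments,
    taken from longest to shortest, first cover 50% of the genome.

    Returns None when the fragments cover less than 50% in total,
    mirroring the empty NA50 cells in the table.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    target = genome_length / 2
    covered = 0
    for length in sorted(aligned_fragment_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None


def error_percent(mismatches_per_100kb: float, indels_per_100kb: float) -> float:
    """Convert per-100 kb event counts into a combined percentage.

    Assumption: every mismatch or InDel event counts as one erroneous base.
    """
    return (mismatches_per_100kb + indels_per_100kb) / 100_000 * 100


if __name__ == "__main__":
    # Toy fragments: 50 + 40 + 30 = 120 >= 100, so the NA50 is 30.
    print(na50([50, 40, 30, 20, 10], genome_length=200))
    # Fragments covering only 15% of the genome: no NA50 is reported.
    print(na50([10, 5], genome_length=100))
    # Yeast, ~5x, LazyB row: about 0.47%, below 1%.
    print(round(error_percent(192.56, 274.62), 2))
    # Human, ~10x, LazyB row: about 2.29%, above 1%.
    print(round(error_percent(1177.59, 1112.84), 2))
```

Under these assumptions the yeast and fruit fly LazyB rows stay below 1% combined error, while the human row does not, in line with the note.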