
Table 3 Assessment of assembly qualities for LazyB, Canu, Wtdbg2, HASLR, Wengan and short-read only assemblies for two model organisms

From: LazyB: fast and cheap genome assembly

| Org. | X | Tool | Compl. [%] | #ctg | #MA | MM | InDels | NA50 |
|---|---|---|---|---|---|---|---|---|
| Yeast | \({\sim} 5\times \) | LazyB | 90.466 | 127 | 9 | 192.56 | 274.62 | 118843 |
| | | LazyB+QM | 94.378 | 64 | 12 | 174.77 | 245.05 | 311094 |
| | | Canu | 14.245 | 115 | 5 | 361.47 | 2039.15 | |
| | | Wtdbg2 | 22.237 | 177 | 0 | 849.07 | 805.31 | |
| | | HASLR | 64.158 | 111 | 1 | 14.87 | 34.86 | 60316 |
| | | DBG2OLC | 45.645 | 53 | 20 | 2066.64 | 1655.92 | |
| | | Wengan | 95.718 | 41 | 11 | 49.14 | 68.47 | 438928 |
| | \({\sim} 11\times \) | LazyB | 97.632 | 33 | 15 | 193.73 | 300.20 | 505126 |
| | | LazyB+QM | 94.211 | 34 | 14 | 234.59 | 329.4 | 453273 |
| | | Canu | 92.615 | 66 | 15 | 107.00 | 1343.37 | 247477 |
| | | Wtdbg2 | 94.444 | 42 | 8 | 420.96 | 1895.28 | 389196 |
| | | HASLR | 92.480 | 57 | 1 | 7.89 | 33.91 | 251119 |
| | | DBG2OLC | 97.689 | 38 | 25 | 55.06 | 1020.48 | 506907 |
| | | Wengan | 96.036 | 37 | 4 | 32.35 | 53.04 | 496058 |
| | \({\sim} 80\times \) | Abyss | 95.247 | 283 | 0 | 9.13 | 1.90 | 90927 |
| Fruit fly | \({\sim} 5\times \) | LazyB | 71.624 | 1879 | 68 | 446.19 | 492.43 | 64415 |
| | | LazyB+QM | 75.768 | 1164 | 79 | 322.49 | 349.29 | 167975 |
| | | Canu | | | | | | |
| | | Wtdbg2 | 6.351 | 2293 | 2 | 916.77 | 588.19 | |
| | | HASLR | 24.484 | 1407 | 10 | 31.07 | 58.96 | |
| | | DBG2OLC | 25.262 | 974 | 141 | 1862.85 | 969.26 | |
| | | Wengan | 81.02 | 2129 | 192 | 105.35 | 123.33 | 77215 |
| | \({\sim} 10\times \) | LazyB | 80.111 | 596 | 99 | 433.37 | 486.28 | 454664 |
| | | LazyB+QM | 80.036 | 547 | 100 | 416.34 | 467.14 | 485509 |
| | | Canu | 49.262 | 1411 | 275 | 494.66 | 1691.11 | |
| | | Wtdbg2 | 41.82 | 1277 | 155 | 2225.12 | 1874.01 | |
| | | HASLR | 67.059 | 2463 | 45 | 43.83 | 84.89 | 36979 |
| | | DBG2OLC | 82.52 | 487 | 468 | 739.47 | 1536.32 | 498732 |
| | | Wengan | 84.129 | 926 | 237 | 114.96 | 154.03 | 221730 |
| | \({\sim} 45\times \) | Abyss | 83.628 | 5811 | 123 | 6.20 | 8.31 | 67970 |
| Human | \({\sim} 10\times \) | LazyB | 67.108 | 13210 | 2915 | 1177.59 | 1112.84 | 168170 |
| | \({\sim} 43\times \) | Unitig | 69.422 | 4146090 | 252 | 93.07 | 13.65 | 338 |
| | \({\sim} 43\times \) | Abyss | 84.180 | 510315 | 2669 | 98.53 | 25.03 | 7963 |
  1. LazyB outperforms Canu and Wtdbg2 in all categories, while significantly reducing contig counts compared to short-read only assemblies. While HASLR is more accurate, it covers significantly lower fractions of the genomes at a higher contig count and drastically lower NA50. DBG2OLC produces few contigs at a high NA50 for the higher-coverage cases, but calls significantly more mis-assemblies. Wengan performs well for yeast, but produces more mis-assemblies at a higher contig count on fruit fly. Merging LazyB assemblies with the set of short-read contigs (+QM) has a positive effect at 5\(\times \) long-read coverage but negligible influence at higher coverage. Mismatches and InDels are given per 100 kb. Accordingly, errors in LazyB's unpolished output constitute \(<1\)% except for human. Wtdbg2 assemblies were not polished. Column descriptions: X: coverage of sequencing data; Compl.: completeness of the assembly; #ctg: number of contigs; #MA: number of mis-assemblies (breakpoints relative to the reference assembly); MM and InDels: mismatches and insertions/deletions relative to the reference genomes; NA50: NA50 of correctly assembled contigs. We follow the definition of QUAST: given a set of fragments as the sub-regions of the original contigs that were correctly aligned to the reference, the NA50 (also named NGA50) is defined as the minimal length of a fragment needed to cover 50% of the genome. This value is omitted when \(< 50\%\) is correctly recalled.
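The NA50 definition and the per-100 kb error rates in the footnote can be illustrated with a minimal Python sketch. It is not QUAST's implementation: the function names `na50` and `error_percent` are hypothetical, the input is assumed to be a plain list of correctly aligned fragment lengths plus the reference genome length, and each mismatch or InDel is assumed to count as one erroneous position.

```python
from typing import Iterable, Optional


def na50(aligned_fragment_lengths: Iterable[int], genome_length: int) -> Optional[int]:
    """Length of the fragment at which the cumulative sum of correctly
    aligned fragments (longest first) first reaches 50% of the genome.

    Returns None when the fragments never cover half of the genome,
    mirroring the omitted NA50 entries in the table.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    target = genome_length / 2
    covered = 0
    for length in sorted(aligned_fragment_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None


def error_percent(mismatches_per_100kb: float, indels_per_100kb: float) -> float:
    """Convert per-100 kb mismatch and InDel counts into a percentage,
    assuming (hypothetically) one erroneous position per event."""
    return (mismatches_per_100kb + indels_per_100kb) / 100_000 * 100


if __name__ == "__main__":
    # Toy fragment lengths, not data from the table.
    print(na50([50, 30, 20, 10], genome_length=120))
    # Yeast, ~5x long-read coverage, LazyB row of the table.
    print(error_percent(192.56, 274.62))
```

With the LazyB values for human (1177.59 mismatches and 1112.84 InDels per 100 kb), `error_percent` gives a value above 1, in line with the footnote's exception for human.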