From: Estimating evolutionary distances between genomic sequences from spaced-word matches

Distances calculated by different alignment-free methods. Distances were calculated for pairs of simulated DNA sequences and plotted against their ‘real’ distances d, measured in substitutions per site. Plots on the left-hand side are for sequence pairs without insertions or deletions; plots on the right-hand side show the corresponding results for sequences with an indel probability of 1% at each site and an average indel length of 25. From top to bottom, the applied methods were: (1) Spaced Words with the single-pattern approach, using the Jensen-Shannon distance (squares) and the distance d_N defined in equation (4) of this paper (circles); (2) the multiple-pattern version of Spaced Words, using sets of m = 100 patterns with the same distance functions; (3) K_r [37]; (4) kmacs [47] and ACS [30]; and (5) Co-phylog [38].
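
As a rough illustration of the single-pattern approach in the top panel, the sketch below counts spaced words for one binary pattern and compares two sequences by the Jensen-Shannon divergence of their relative word frequencies. It is a minimal sketch under assumptions, not the Spaced Words implementation: the function names `spaced_words` and `jensen_shannon`, the toy pattern, and the choice of base-2 logarithms are all hypothetical, and the paper's exact definition of the Jensen-Shannon distance may differ (for example, by taking a square root). The distance d_N from equation (4) is not reproduced here.

```python
import math
from collections import Counter


def spaced_words(seq, pattern):
    """Count spaced words of `seq` for a binary `pattern`.

    '1' marks a match position (character is kept),
    '0' marks a don't-care position (character is skipped).
    """
    match_pos = [i for i, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    return Counter(
        "".join(seq[start + i] for i in match_pos)
        for start in range(len(seq) - span + 1)
    )


def jensen_shannon(counts_p, counts_q):
    """Jensen-Shannon divergence (base 2, range 0..1) of two count tables.

    Assumption: counts are normalised to relative frequencies first.
    """
    total_p = sum(counts_p.values())
    total_q = sum(counts_q.values())
    js = 0.0
    for word in set(counts_p) | set(counts_q):
        p = counts_p.get(word, 0) / total_p
        q = counts_q.get(word, 0) / total_q
        m = 0.5 * (p + q)
        if p > 0:
            js += 0.5 * p * math.log2(p / m)
        if q > 0:
            js += 0.5 * q * math.log2(q / m)
    return js


if __name__ == "__main__":
    pattern = "1101"  # toy pattern: three match positions, one don't-care
    a = spaced_words("ACGTACGTTGCA", pattern)
    b = spaced_words("ACGTACCTTGCA", pattern)
    print(jensen_shannon(a, b))
```

For the multiple-pattern variant, one plausible reading is to repeat this calculation for each of the m patterns and combine the results, but how the paper combines them is not stated in this caption.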

