Towards a practical O(n logn) phylogeny algorithm

Table 1 The effect of heuristics on the performance of the algorithm

method	% taxa inserted	RF accuracy (%)
basic RW+NJ guide tree	56.3 ± 2.4	46.3 ± 4.6
basic RW+true guide tree (not feasible in practice)	57.3 ± 2.1	49.4 ± 5.0
5 quartets per node query, UM	76.4 ± 2.0	41.0 ± 3.8
5 quartets, WM	85.6 ± 1.6	48.6 ± 3.7
5 quartets, WM, 2E	95.4 ± 1.0	45.5 ± 3.4
5 quartets, WM, CT, 2E	84.1 ± 1.8	57.4 ± 3.5
5 quartets, WM, re-running the RW, 2E	78.8 ± 2.4	59.5 ± 3.7
5 quartets, WTA, CT, 2E	80.2 ± 2.1	62.3 ± 2.9
20 quartets, WTA, CT, 2E	92.1 ± 1.4	60.8 ± 2.9
NJ	n/a	62.6

Results are shown for the COG840 data set with 1250 taxa. We show our algorithm’s performance in various settings and compare it to Neighbor Joining (NJ). We report accuracies using the Robinson-Foulds (RF) measure. With the full set of heuristics (WTA, CT, 2E), our algorithm places approximately 80% to 90% of taxa with accuracy around 60%. We ran each version of the algorithm 100 times. In all cases, the guide tree contains 200 taxa. Except in the second row of the table, which uses the true guide tree, the guide tree was generated with Neighbor Joining and had an RF accuracy of 50% ± 3%. Three voting schemes were used in the experiments: unweighted majority (UM), weighted majority (WM), and winner-takes-all (WTA). In some experiments, we also added 2 additional rounds of insertions (2E) and a confidence threshold for insertion (CT).
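To make the RF accuracy column concrete, the following is a minimal sketch of one common way to compute a Robinson-Foulds-style accuracy: the fraction of non-trivial bipartitions (splits) of a reference tree that also appear in the estimated tree. The caption does not spell out the exact definition used, so this reading is an assumption, as are the nested-tuple tree encoding and the function names `splits` and `rf_accuracy`. Both trees are assumed to be on the same taxon set.

```python
def splits(tree):
    """Return the non-trivial bipartitions of a tree given as nested tuples.

    Leaves are taxon labels (e.g. strings); internal nodes are tuples of children.
    Each split is stored as the side that does NOT contain a fixed reference
    taxon, so that the same split always has the same representation.
    """
    clades = []

    def collect(node):
        # A leaf contributes only itself.
        if not isinstance(node, tuple):
            return frozenset([node])
        # An internal node's clade is the union of its children's clades.
        clade = frozenset().union(*(collect(child) for child in node))
        clades.append(clade)
        return clade

    all_taxa = collect(tree)
    reference_taxon = min(all_taxa)  # assumes taxon labels are comparable

    result = set()
    for clade in clades:
        other = all_taxa - clade
        # Skip trivial splits (a single taxon, or nothing, on one side).
        if len(clade) >= 2 and len(other) >= 2:
            result.add(other if reference_taxon in clade else clade)
    return result


def rf_accuracy(reference, estimate):
    """Fraction of the reference tree's splits recovered by the estimate (assumed definition)."""
    ref_splits = splits(reference)
    if not ref_splits:
        return 1.0
    return len(ref_splits & splits(estimate)) / len(ref_splits)


# Hypothetical five-taxon example: the estimate swaps taxa "b" and "c".
true_tree = ((("a", "b"), "c"), ("d", "e"))
estimated_tree = ((("a", "c"), "b"), ("d", "e"))
accuracy = rf_accuracy(true_tree, estimated_tree)
```

In this toy example the two trees share one of the two non-trivial splits of the reference, so `accuracy` is 0.5, which would appear as 50% in a table such as the one above.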

ISSN: 1748-7188