Fig. 9 | Algorithms for Molecular Biology

Fig. 9

From: Unifying duplication episode clustering and gene-species mapping inference

This figure summarizes the coefficients of variation (CV) of the gene-species distributions obtained for unlabeled leaves across 36 simulated datasets, each with 100 gene trees (datasets with \(p=0\) are excluded). Each bar gives the average number of gene-species distributions, for leaves with unknown label, across the entire tree (A), the left subtree (B), and the right subtree (C) whose CV values fall within the specified range. All other distributions covering a subtree have all frequencies at most 1.5 and are therefore not included here. Panel D summarizes the cases where the distributions do not span any subtree of the species tree. Panel E shows the average number of removed leaf labels in the corresponding datasets. For example, the highest blue bar for \(S_4\) with \(p=0.6\) in A represents approximately 2400 leaves labeled \(\bot\) (out of an average of 3786.53 in this gene tree set) whose mapping inferences assign every leaf of \(S_4\) a nearly identical frequency, as indicated by the corresponding CV values falling within the interval \([0, 0.05)\). The key to the histograms is on the right: each bar represents the average count of gene-species distributions for \(\bot\)-leaves in a gene tree set with CV values falling within a specific interval. Intervals with CV values greater than 0.25 are excluded due to their low frequencies.
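To make the binning in the caption concrete, the following is a minimal sketch of how a CV could be computed for one gene-species distribution and assigned to a histogram interval. It assumes the standard definition of the CV (standard deviation divided by mean) and uses the population standard deviation; the caption does not say which variant the authors used. The function names and the bin width of 0.05 up to 0.25 are illustrative assumptions inferred from the interval \([0, 0.05)\) and the stated 0.25 cutoff, not taken from the authors' code.

```python
import bisect
import statistics

# Assumed bin edges: width 0.05 up to the 0.25 cutoff mentioned in the caption.
EDGES = [0.0, 0.05, 0.10, 0.15, 0.20, 0.25]


def coefficient_of_variation(freqs):
    """CV = population standard deviation / mean (assumed definition)."""
    mean = statistics.fmean(freqs)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.pstdev(freqs) / mean


def cv_bin(cv):
    """Return the half-open interval [lo, hi) containing cv, or None if excluded.

    Values at or above the last edge are treated as excluded (an assumption;
    the caption only says intervals above 0.25 are omitted).
    """
    if cv < EDGES[0] or cv >= EDGES[-1]:
        return None
    i = bisect.bisect_right(EDGES, cv) - 1
    return (EDGES[i], EDGES[i + 1])


# Hypothetical frequencies of species assigned to one unlabeled leaf.
uniform = [10, 10, 10, 10]   # every species equally frequent
skewed = [1, 3]              # strongly uneven
print(cv_bin(coefficient_of_variation(uniform)))
print(cv_bin(coefficient_of_variation(skewed)))
```

Under these assumptions, a distribution that gives every species the same frequency has a CV of 0 and lands in the first interval, which corresponds to the tallest bars described in the caption.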
