Algorithms for Molecular Biology

Table 4 Feature importances of the random forest trained on the largest dataset (\(M=1000\) and \(\max L=100\)), based on (a) normal and (b) LGT network data

From: Constructing phylogenetic networks via cherry picking and machine learning

Feature	Importance (a) Normal	Importance (b) LGT
Leaf distance (t)	0.190	0.162
Trivial	0.155	0.184
Cherry in tree	0.143	0.146
Leaf distance (d)	0.122	0.114
LCA distance (t)	0.068	0.056
Depth x/y (t)	0.050	0.058
Cherry depth (t)	0.047	0.045
Depth x/y (d)	0.043	0.038
LCA distance (d)	0.028	0.032
Leaf depth x (t)	0.023	0.024
Leaf depth y (t)	0.023	0.023
Cherry depth (d)	0.020	0.023
Leaf depth x (d)	0.020	0.022
Leaf depth y (d)	0.020	0.022
Before/after	0.015	0.016
Tree depth (d)	0.012	0.013
Tree depth (t)	0.011	0.011
New cherries	0.006	0.006
Leaves in tree	0.004	0.003

Higher importance indicates that a feature has more effect on the trained model. Within each column the values sum to one, up to rounding of the printed figures. The features are described in Table 1
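
The normalization stated in the note can be checked directly against the printed values. The following is a minimal sketch, not the authors' code: the importances are copied from the table above, and the names `TABLE_4`, `column_sum`, and `ranking` are hypothetical helpers introduced only for illustration.

```python
# Importances copied from Table 4 as (feature, normal, LGT).
# TABLE_4, column_sum and ranking are illustrative names, not from the paper.
TABLE_4 = [
    ("Leaf distance (t)", 0.190, 0.162),
    ("Trivial", 0.155, 0.184),
    ("Cherry in tree", 0.143, 0.146),
    ("Leaf distance (d)", 0.122, 0.114),
    ("LCA distance (t)", 0.068, 0.056),
    ("Depth x/y (t)", 0.050, 0.058),
    ("Cherry depth (t)", 0.047, 0.045),
    ("Depth x/y (d)", 0.043, 0.038),
    ("LCA distance (d)", 0.028, 0.032),
    ("Leaf depth x (t)", 0.023, 0.024),
    ("Leaf depth y (t)", 0.023, 0.023),
    ("Cherry depth (d)", 0.020, 0.023),
    ("Leaf depth x (d)", 0.020, 0.022),
    ("Leaf depth y (d)", 0.020, 0.022),
    ("Before/after", 0.015, 0.016),
    ("Tree depth (d)", 0.012, 0.013),
    ("Tree depth (t)", 0.011, 0.011),
    ("New cherries", 0.006, 0.006),
    ("Leaves in tree", 0.004, 0.003),
]


def column_sum(rows, col):
    """Sum one importance column, rounded to the table's three decimals."""
    return round(sum(row[col] for row in rows), 3)


def ranking(rows, col):
    """Feature names ordered from most to least important in one column."""
    ordered = sorted(rows, key=lambda row: row[col], reverse=True)
    return [row[0] for row in ordered]


normal_sum = column_sum(TABLE_4, 1)
lgt_sum = column_sum(TABLE_4, 2)
top_normal = ranking(TABLE_4, 1)[:3]
top_lgt = ranking(TABLE_4, 2)[:3]

print("Sum (a) Normal:", normal_sum)
print("Sum (b) LGT:", lgt_sum)
print("Top 3 (a):", top_normal)
print("Top 3 (b):", top_lgt)
```

With the printed figures, column (a) sums to 1.000 and column (b) to 0.998; the small shortfall in (b) is presumably a rounding effect of reporting three decimals. The same three features lead both columns, with "Leaf distance (t)" and "Trivial" swapping the first two places between normal and LGT data.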


ISSN: 1748-7188
