Table 3 Trained random forest models on different datasets for different combinations of \(\max L\) (maximum number of leaves per network) and M (number of networks)

From: Constructing phylogenetic networks via cherry picking and machine learning

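(a) Normal

| \(\max L\) | M | Accuracy | Num. data | Training (min) | Data gen. (hour/core) |
|---|---|---|---|---|---|
| 20 | 5 | 1.0 | 840 | 00:00 | 00:00:12 |
| 20 | 10 | 0.994 | 1804 | 00:00 | 00:00:22 |
| 20 | 100 | 0.998 | 17,388 | 00:03 | 00:04:19 |
| 20 | 500 | 0.994 | 73,168 | 00:16 | 00:15:18 |
| 20 | 1000 | 0.993 | 151,308 | 00:42 | 00:29:49 |
| 50 | 5 | 0.994 | 3580 | 00:00 | 00:01:21 |
| 50 | 10 | 0.997 | 7860 | 00:01 | 00:02:22 |
| 50 | 100 | 0.996 | 53,988 | 00:11 | 00:18:07 |
| 50 | 500 | 0.997 | 268,552 | 01:04 | 01:31:18 |
| 50 | 1000 | 0.998 | 535,624 | 04:01 | 02:56:21 |
| 100 | 5 | 1.0 | 4944 | 00:00 | 00:01:13 |
| 100 | 10 | 0.999 | 12,444 | 00:01 | 00:04:05 |
| 100 | 100 | 0.999 | 128,824 | 00:25 | 00:41:54 |
| 100 | 500 | 0.999 | 676,768 | 04:21 | 04:15:49 |
| 100 | 1000 | 0.999 | 1,362,220 | 12:10 | 08:08:58 |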

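(b) LGT

| \(\max L\) | M | Accuracy | Num. data | Training (min) | Data gen. (hour/core) |
|---|---|---|---|---|---|
| 20 | 5 | 0.974 | 768 | 00:01 | 00:00:19 |
| 20 | 10 | 0.994 | 1548 | 00:02 | 00:00:41 |
| 20 | 100 | 0.976 | 12,244 | 00:09 | 00:04:20 |
| 20 | 500 | 0.975 | 58,900 | 00:24 | 00:19:13 |
| 20 | 1000 | 0.975 | 118,104 | 00:27 | 00:35:38 |
| 50 | 5 | 0.997 | 2952 | 00:01 | 00:00:43 |
| 50 | 10 | 0.995 | 3796 | 00:03 | 00:01:01 |
| 50 | 100 | 0.995 | 44,116 | 00:23 | 00:14:01 |
| 50 | 500 | 0.994 | 219,472 | 01:39 | 01:06:45 |
| 50 | 1000 | 0.994 | 421,204 | 02:45 | 02:10:45 |
| 100 | 5 | 0.996 | 5080 | 00:06 | 00:01:23 |
| 100 | 10 | 0.996 | 7540 | 00:05 | 00:01:58 |
| 100 | 100 | 0.998 | 114,900 | 00:31 | 00:34:25 |
| 100 | 500 | 0.998 | 605,652 | 04:44 | 02:54:15 |
| 100 | 1000 | 0.998 | 1,175,628 | 10:23 | 05:31:13 |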

  1. Each row in the table represents one model. For each model, the testing accuracy is given under “Accuracy”, and the total number of data points retrieved from all M networks is given under “Num. data”. Each dataset is split into a training set (90%) and a testing set (10%). The training duration for the random forest is given in column “Training”, and the time needed to generate the training data is given in column “Data gen.”, in hours per core (we used 16 cores in total).
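
The evaluation protocol described in the footnote (a 90%/10% train/test split, followed by random forest training and a test-accuracy measurement) can be sketched as follows. This is a minimal illustration, not the authors' code: the use of scikit-learn, the synthetic data from `make_classification`, the number of features, and all hyperparameters are assumptions made for the example. Only the dataset size of 840 points is taken from the table (first row of panel (a)).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data points. The size (840) matches the
# first row of panel (a); features and labels are arbitrary assumptions.
X, y = make_classification(n_samples=840, n_features=10, random_state=0)

# Split into 90% training data and 10% testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0, stratify=y
)

# Hyperparameters are illustrative only, not taken from the paper.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Testing accuracy: fraction of held-out points classified correctly.
test_accuracy = model.score(X_test, y_test)
print(f"train={len(X_train)} test={len(X_test)} accuracy={test_accuracy:.3f}")
```

Because the data here is synthetic, the printed accuracy is not comparable to the values in the “Accuracy” column; the sketch only shows how such a number would be obtained.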