
Table 2 Long-gap-length model conditions: parameter values and summary statistics

From: Non-parametric and semi-parametric support estimation using SEquential RESampling random walks on biomolecular sequences

| Model condition | Tree height | Insertion/deletion probability | NHD | Gappiness | True align length | Est align length | SP-FN | SP-FP |
|---|---|---|---|---|---|---|---|---|
| 10.long.A | 0.4 | 0.13 | 0.276 | 0.440 | 1804.8 | 1433.7 | 0.272 | 0.315 |
| 10.long.B | 0.7 | 0.1 | 0.363 | 0.481 | 1926.7 | 1447.8 | 0.381 | 0.426 |
| 10.long.C | 1 | 0.06 | 0.455 | 0.456 | 1853.5 | 1413.3 | 0.510 | 0.537 |
| 10.long.D | 1.6 | 0.031 | 0.542 | 0.432 | 1754.1 | 1403.1 | 0.725 | 0.729 |
| 10.long.E | 4.3 | 0.013 | 0.660 | 0.445 | 1811.0 | 1560.1 | 0.899 | 0.897 |

1. Our simulation study included additional 10-taxon model conditions that used the long gap length distribution from the study of Liu et al. [15]. The model parameters are the model tree height and the insertion/deletion probability; each model condition corresponds to a distinct combination of parameter values. The long-gap-length model conditions are named 10.long.A through 10.long.E in order of generally increasing sequence divergence. The table columns list average summary statistics for each model condition (n = 20). "NHD" is the average normalized Hamming distance between pairs of aligned sequences in the true alignment. "Gappiness" is the proportion of true alignment cells that consist of indels (gap characters). "True align length" is the length of the true alignment. "Est align length" is the length of the MAFFT-estimated alignment [9], which was provided as input to the support estimation methods. "SP-FN" is the proportion of homologies in the true alignment that are missing from the MAFFT-estimated alignment, and "SP-FP" is the proportion of homologies in the MAFFT-estimated alignment that are absent from the true alignment.
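To make the column definitions concrete, the summary statistics can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's own code: alignments are lists of equal-length strings with gaps written as `-`; the pairwise Hamming distance is normalized over columns where neither sequence has a gap (the footnote does not say how gaps are treated); and a homology is a pair of residues from two different sequences placed in the same column. The function names and toy alignments are hypothetical.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol


def nhd(alignment):
    """Average normalized Hamming distance over all sequence pairs.

    Assumption: each pair's distance counts mismatches only in columns
    where neither sequence has a gap, divided by the number of such columns.
    """
    dists = []
    for a, b in combinations(alignment, 2):
        shared = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
        if shared:
            mismatches = sum(1 for x, y in shared if x != y)
            dists.append(mismatches / len(shared))
    return sum(dists) / len(dists) if dists else 0.0


def gappiness(alignment):
    """Proportion of alignment cells that hold a gap character."""
    cells = sum(len(row) for row in alignment)
    gaps = sum(row.count(GAP) for row in alignment)
    return gaps / cells


def homology_pairs(alignment):
    """Set of ((seq_i, residue_i), (seq_j, residue_j)) pairs sharing a column.

    Residues are indexed by position in the unaligned sequence, so the same
    homology gets the same key in the true and the estimated alignment.
    """
    next_residue = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != GAP:
                present.append((s, next_residue[s]))
                next_residue[s] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sp_errors(true_aln, est_aln):
    """Return (SP-FN, SP-FP) of an estimated alignment against the true one."""
    h_true = homology_pairs(true_aln)
    h_est = homology_pairs(est_aln)
    sp_fn = len(h_true - h_est) / len(h_true)  # true homologies missed
    sp_fp = len(h_est - h_true) / len(h_est)   # estimated homologies not in truth
    return sp_fn, sp_fp


# Hypothetical toy alignments of the same three unaligned sequences.
true_aln = ["ACGT-", "AC-TA", "ACCTA"]
est_aln = ["ACGT-", "ACTA-", "ACCTA"]
print(nhd(true_aln), gappiness(true_aln), sp_errors(true_aln, est_aln))
```

In this toy case the estimate shifts the last two residues of the second sequence one column to the left, so 3 of the 11 true homologies are missed (SP-FN = 3/11) and 4 of the 12 estimated homologies are spurious (SP-FP = 4/12 = 1/3). Keying homologies by unaligned residue index rather than by column is what lets two alignments of different lengths, such as the true and MAFFT-estimated alignments in the table, be compared directly.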