Non-parametric and semi-parametric support estimation using SEquential RESampling random walks on biomolecular sequences

Table 2 Long-gap-length model conditions: parameter values and summary statistics

Model condition	Tree height	Insertion/deletion probability	NHD	Gappiness	True align length	Est align length	SP-FN	SP-FP
10.long.A	0.4	0.13	0.276	0.440	1804.8	1433.7	0.272	0.315
10.long.B	0.7	0.1	0.363	0.481	1926.7	1447.8	0.381	0.426
10.long.C	1	0.06	0.455	0.456	1853.5	1413.3	0.510	0.537
10.long.D	1.6	0.031	0.542	0.432	1754.1	1403.1	0.725	0.729
10.long.E	4.3	0.013	0.660	0.445	1811.0	1560.1	0.899	0.897

Our simulation study included additional 10-taxon model conditions that used the long gap length distribution from the study of Liu et al. [15]. The model parameters consisted of model tree height and insertion/deletion probability, and each model condition corresponds to a distinct set of model parameter values. The long-gap-length model conditions are named 10.long.A through 10.long.E in order of generally increasing sequence divergence. The remaining table columns list average summary statistics for each model condition (\(n=20\)). “NHD” is the average normalized Hamming distance of a pair of aligned sequences in the true alignment. “Gappiness” is the proportion of true alignment cells that consist of indels. “True align length” is the length of the true alignment. “Est align length” is the length of the MAFFT-estimated alignment [9], which was provided as input to the support estimation methods. “SP-FN” and “SP-FP” are the proportion of homologies that appear in the true alignment but not in the MAFFT-estimated alignment and vice versa, respectively.
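The summary statistics defined above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names are hypothetical, alignments are assumed to be lists of equal-length strings with "-" as the gap character, a homology is taken to be a pair of residues from two different sequences placed in the same column, and NHD is assumed to be computed only over columns where both sequences have a residue (the source does not state how gaps are handled).

```python
from itertools import combinations

GAP = "-"  # assumed gap character


def gappiness(alignment):
    """Fraction of alignment cells that are gaps."""
    cells = sum(len(row) for row in alignment)
    gaps = sum(row.count(GAP) for row in alignment)
    return gaps / cells


def mean_nhd(alignment):
    """Average normalized Hamming distance over all sequence pairs.

    Assumption: only columns where both sequences have a residue are
    compared; pairs with no such column are skipped.
    """
    distances = []
    for a, b in combinations(alignment, 2):
        shared = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
        if shared:
            mismatches = sum(1 for x, y in shared if x != y)
            distances.append(mismatches / len(shared))
    return sum(distances) / len(distances)


def homologies(alignment):
    """Set of residue pairs that share a column.

    Each residue is identified by (sequence index, residue index), so
    the same homology can be recognized in alignments of different length.
    """
    counters = [0] * len(alignment)
    pairs = set()
    for column in zip(*alignment):
        present = []
        for seq_idx, char in enumerate(column):
            if char != GAP:
                present.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sp_errors(true_alignment, est_alignment):
    """Return (SP-FN, SP-FP) for an estimated alignment."""
    true_h = homologies(true_alignment)
    est_h = homologies(est_alignment)
    sp_fn = len(true_h - est_h) / len(true_h)  # true homologies that were missed
    sp_fp = len(est_h - true_h) / len(est_h)   # estimated homologies that are wrong
    return sp_fn, sp_fp


# Toy example (made-up sequences, not data from the study).
true_aln = ["AC-GT", "A-CGA"]
est_aln = ["ACGT", "ACGA"]
print(gappiness(true_aln), mean_nhd(true_aln), sp_errors(true_aln, est_aln))
```

In this toy example the estimated alignment recovers every true homology but adds one spurious pairing of the two C residues, so SP-FN is 0 and SP-FP is 1/4. The values in Table 2 would be averages of such per-replicate quantities over the 20 replicates of each model condition.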

ISSN: 1748-7188