Non-parametric and semi-parametric support estimation using SEquential RESampling random walks on biomolecular sequences

Table 5 Support estimation method performance on main model conditions

Model condition	GUIDANCE1 PR-AUC (%)	SERES + GUIDANCE1 PR-AUC (%)	Pairwise t-test corrected q-value	GUIDANCE1 ROC-AUC (%)	SERES + GUIDANCE1 ROC-AUC (%)	DeLong et al. test corrected q-value
10.A	88.74	91.17	\(5.4 \times 10^{-7}\)	80.22	85.57	\(<10^{-10}\)
10.B	82.21	86.26	\(1.5 \times 10^{-6}\)	84.83	88.66	\(<10^{-10}\)
10.C	76.23	83.49	\(1.9 \times 10^{-4}\)	86.98	91.23	\(<10^{-10}\)
10.D	74.65	85.81	\(1.9 \times 10^{-4}\)	88.55	93.72	\(<10^{-10}\)
10.E	42.61	59.20	\(3.1 \times 10^{-4}\)	82.24	87.40	\(<10^{-10}\)
50.A	98.22	98.92	\(5.3 \times 10^{-10}\)	83.09	90.64	\(<10^{-10}\)
50.B	97.84	98.69	\(2.8 \times 10^{-9}\)	82.85	90.39	\(<10^{-10}\)
50.C	95.08	96.80	\(5.6 \times 10^{-8}\)	85.54	90.64	\(<10^{-10}\)
50.D	90.79	95.75	\(5.3 \times 10^{-6}\)	88.89	94.56	\(<10^{-10}\)
50.E	62.47	79.14	\(8.0 \times 10^{-10}\)	91.02	93.23	\(<10^{-10}\)

Model condition	GUIDANCE2 PR-AUC (%)	SERES + GUIDANCE2 PR-AUC (%)	Pairwise t-test corrected q-value	GUIDANCE2 ROC-AUC (%)	SERES + GUIDANCE2 ROC-AUC (%)	DeLong et al. test corrected q-value
10.A	92.55	93.33	\(7.4 \times 10^{-6}\)	87.17	88.34	\(<10^{-10}\)
10.B	88.08	89.31	\(8.4 \times 10^{-4}\)	89.45	90.56	\(<10^{-10}\)
10.C	84.28	86.86	\(3.1 \times 10^{-4}\)	91.36	92.88	\(<10^{-10}\)
10.D	86.03	88.75	\(1.9 \times 10^{-4}\)	93.34	94.69	\(<10^{-10}\)
10.E	51.17	62.30	\(1.3 \times 10^{-3}\)	86.00	88.28	\(<10^{-10}\)
50.A	98.98	99.14	\(5.3 \times 10^{-6}\)	91.17	92.50	\(<10^{-10}\)
50.B	98.79	98.96	\(1.5 \times 10^{-6}\)	91.24	92.44	\(<10^{-10}\)
50.C	96.86	97.45	\(3.2 \times 10^{-7}\)	90.81	92.31	\(<10^{-10}\)
50.D	94.04	96.23	\(1.5 \times 10^{-5}\)	92.67	95.09	\(<10^{-10}\)
50.E	72.61	81.47	\(1.5 \times 10^{-8}\)	92.94	94.22	\(<10^{-10}\)

Results are shown for five 10-taxon model conditions (named 10.A through 10.E in order of generally increasing sequence divergence) and five 50-taxon model conditions (similarly named 50.A through 50.E). We compared two state-of-the-art methods for multiple sequence alignment (MSA) support estimation, GUIDANCE1 [18] and GUIDANCE2 [20], against re-estimation on both SERES replicates and parametrically resampled replicates (using the parametric techniques of either GUIDANCE1 or GUIDANCE2; see the “Methods” section for details). For each method, we calculated precision-recall (PR) and receiver operating characteristic (ROC) curves. Performance was evaluated using the aggregate area under the curve (AUC) across all replicates of a model condition (\(n=20\)). The top rows compare GUIDANCE1 (“GUIDANCE1”) with SERES combined with parametric techniques from GUIDANCE1 (“SERES + GUIDANCE1”), and the bottom rows compare GUIDANCE2 (“GUIDANCE2”) with SERES combined with parametric techniques from GUIDANCE2 (“SERES + GUIDANCE2”). In every model condition and pairwise comparison, the SERES-based method achieved the higher AUC. Statistical significance of PR-AUC and ROC-AUC differences was assessed using a one-tailed pairwise t-test and the DeLong et al. [5] test, respectively, and multiple test correction was performed using the method of Benjamini and Hochberg [1]. Corrected q-values are reported (\(n=20\)); all were significant (\(\alpha =0.05\)).
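The evaluation above relies on three standard computations: ROC-AUC, PR-AUC, and Benjamini-Hochberg q-values. The Python sketch below illustrates them under stated assumptions and is not the authors' evaluation code. It assumes, hypothetically, that each scored item (for example, an aligned residue pair) carries a binary label (1 if it is correct with respect to the reference alignment, 0 otherwise) and a support score. ROC-AUC is computed as the Mann-Whitney probability that a random positive outscores a random negative. PR-AUC is summarized as non-interpolated average precision, which is one common estimator; other estimators (e.g., trapezoidal integration of the PR curve) can give slightly different values. The q-values follow the Benjamini-Hochberg step-up procedure. The one-tailed t-test and the DeLong et al. test are not shown.

```python
"""Sketch of ROC-AUC, PR-AUC (average precision) and Benjamini-Hochberg q-values.

Hypothetical setup (assumption): labels[i] is 1 if item i is correct and 0
otherwise; scores[i] is its support value. Not the authors' evaluation code.
"""
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """PR-AUC estimated as non-interpolated average precision.

    Items are ranked by decreasing score; tied scores keep their input order
    (Python's sort is stable), so ties are not averaged.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this positive's rank
    return total / n_pos


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    # Walk from the largest p-value down so q-values stay monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.1]
    print(roc_auc(labels, scores))            # 0.75
    print(average_precision(labels, scores))  # (1 + 2/3) / 2, about 0.833
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

In this toy example, three of the four positive-negative pairs are ranked correctly, giving ROC-AUC 0.75, while the two positives sit at ranks 1 and 3, giving average precision (1 + 2/3)/2.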

ISSN: 1748-7188