Model condition | PR-AUC (%), GUIDANCE2 | PR-AUC (%), SERES + GUIDANCE2 | Pairwise t-test corrected q-value |
---|---|---|---|
10.long.A | 92.32 | 92.94 | \(9.7 \times 10^{-4}\) |
10.long.B | 90.62 | 91.64 | \(3.3 \times 10^{-6}\) |
10.long.C | 85.10 | 87.93 | \(9.7 \times 10^{-4}\) |
10.long.D | 79.22 | 86.18 | \(9.7 \times 10^{-4}\) |
10.long.E | 67.63 | 78.48 | \(9.7 \times 10^{-4}\) |
Model condition | ROC-AUC (%), GUIDANCE2 | ROC-AUC (%), SERES + GUIDANCE2 | DeLong et al. test corrected q-value |
---|---|---|---|
10.long.A | 89.99 | 90.99 | \(<10^{-10}\) |
10.long.B | 91.84 | 93.02 | \(<10^{-10}\) |
10.long.C | 93.14 | 94.59 | \(<10^{-10}\) |
10.long.D | 93.89 | 96.13 | \(<10^{-10}\) |
10.long.E | 92.62 | 94.38 | \(<10^{-10}\) |
- GUIDANCE2 and SERES + GUIDANCE2 are compared across model conditions 10.long.A through 10.long.E (named in order of generally increasing sequence divergence). Aggregate PR-AUC and ROC-AUC are reported across all replicate datasets in a model condition (\(n=20\)), and the best AUC for each pairwise method comparison on a model condition is shown in italics. Statistical significance of PR-AUC and ROC-AUC differences was assessed using a one-tailed pairwise t-test and the DeLong et al. [5] test, respectively, and multiple-test correction was performed using the method of Benjamini and Hochberg [1]. Corrected q-values are reported (\(n=20\)), and all were significant (\(\alpha =0.05\)).
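
The multiple-test correction named in the caption can be illustrated with a short sketch of the Benjamini-Hochberg step-up adjustment. This is a minimal illustration under stated assumptions, not the code used in the study: the function name `benjamini_hochberg` and the raw p-values in the example are hypothetical and were chosen only to show the mechanics.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values, in the input order.

    Each raw p-value is scaled by m / rank (rank 1 = smallest p-value),
    then a running minimum is taken from the largest rank downward so
    that the q-values are monotone in the raw p-values.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        scaled = p_values[idx] * m / rank
        running_min = min(running_min, scaled)
        q_values[idx] = running_min
    return q_values


# Hypothetical raw p-values (NOT taken from the study).
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)

# A q-value is declared significant when it does not exceed alpha.
alpha = 0.05
example_significant = [q <= alpha for q in example_q]
print(example_q, example_significant)
```

Because of the running-minimum step, distinct raw p-values can map to the same adjusted q-value, so repeated q-values within one corrected family are not unusual.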