| Model condition | GUIDANCE2 PR-AUC (%) | SERES + GUIDANCE2 PR-AUC (%) | Pairwise t-test corrected q-value |
|---|---|---|---|
| 10.long.A | 92.32 | *92.94* | \(9.7 \times 10^{-4}\) |
| 10.long.B | 90.62 | *91.64* | \(3.3 \times 10^{-6}\) |
| 10.long.C | 85.10 | *87.93* | \(9.7 \times 10^{-4}\) |
| 10.long.D | 79.22 | *86.18* | \(9.7 \times 10^{-4}\) |
| 10.long.E | 67.63 | *78.48* | \(9.7 \times 10^{-4}\) |

| Model condition | GUIDANCE2 ROC-AUC (%) | SERES + GUIDANCE2 ROC-AUC (%) | DeLong et al. test corrected q-value |
|---|---|---|---|
| 10.long.A | 89.99 | *90.99* | \(<10^{-10}\) |
| 10.long.B | 91.84 | *93.02* | \(<10^{-10}\) |
| 10.long.C | 93.14 | *94.59* | \(<10^{-10}\) |
| 10.long.D | 93.89 | *96.13* | \(<10^{-10}\) |
| 10.long.E | 92.62 | *94.38* | \(<10^{-10}\) |

- The performance of GUIDANCE2 and SERES + GUIDANCE2 is compared across model conditions 10.long.A through 10.long.E (named in order of generally increasing sequence divergence). Aggregate PR-AUC and ROC-AUC are reported across all replicate datasets in a model condition (\(n=20\)), and the better AUC in each pairwise method comparison on a model condition is shown in italics. Statistical significance of PR-AUC and ROC-AUC differences was assessed using a one-tailed pairwise t-test and the DeLong et al. [5] test, respectively, and multiple test correction was performed using the method of Benjamini and Hochberg [1]. Corrected q-values are reported (\(n=20\)), and all were significant (\(\alpha =0.05\)).
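
As a rough illustration of the multiple test correction step named in the caption, the following is a minimal sketch of the Benjamini and Hochberg step-up adjustment in plain Python. It is not the authors' code: the function name `benjamini_hochberg` and the input p-values are hypothetical and chosen only for illustration, and the sketch assumes the raw p-values have already been computed by the relevant test.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values, in the input order.

    Illustrative sketch only; the name and interface are assumptions.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    # Walk from the largest p-value down so that each q-value is the
    # minimum of p * m / rank over all ranks at or above its own.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Made-up raw p-values for five hypothetical tests (not from the table).
raw_p = [0.0004, 0.0009, 0.0002, 0.03, 0.0007]
q = benjamini_hochberg(raw_p)
# A test is called significant when its q-value is at most alpha.
alpha = 0.05
significant = [value <= alpha for value in q]
```

Because each q-value is a running minimum taken from the largest p-value downward, several tests can end up sharing the same q-value even when their raw p-values differ.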