

Fig. 1 | Algorithms for Molecular Biology

Fig. 1

From: Bayesian localization of CNV candidates in WGS data within minutes


Pipeline for CNV calls in rat populations divergently selected for tame and aggressive behavior. After individual barcoding and multiplex sequencing, counts of mapped start positions for the tame population are subtracted from those for the aggressive population, which removes shared additive bias from the data. Because coverage is low, the data is then averaged over 20 positions to make the noise approximately Gaussian. Next, the data is transformed into a breakpoint array data structure, composed of sufficient statistics and a pointer structure that facilitates rapid creation of compressed data blocks for a given threshold. The breakpoint array generates block boundaries corresponding to the discontinuities obtained in Haar wavelet regression. The universal threshold is used for compression, based on the lowest noise variance sampled in the emission parameters of the HMM during Forward–Backward Gibbs sampling.
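
The preprocessing and compression steps named in the caption can be illustrated with a minimal Python sketch. It is not the authors' breakpoint array implementation: the function names are hypothetical, the averaging is assumed to use non-overlapping windows, and block boundaries are found by a direct (and slower) scan over orthonormal Haar coefficients on a signal whose length is assumed to be a power of two. The threshold is the standard universal threshold, sigma * sqrt(2 ln n), where sigma would be the square root of the lowest sampled noise variance.

```python
import math


def difference_track(aggressive, tame):
    """Subtract tame counts from aggressive counts, position by position."""
    if len(aggressive) != len(tame):
        raise ValueError("both tracks must have the same length")
    return [a - t for a, t in zip(aggressive, tame)]


def bin_average(values, width=20):
    """Average over non-overlapping windows (assumption); drop a partial tail."""
    n_bins = len(values) // width
    return [sum(values[i * width:(i + 1) * width]) / width
            for i in range(n_bins)]


def universal_threshold(sigma, n):
    """Universal threshold sigma * sqrt(2 ln n) for n data points."""
    return sigma * math.sqrt(2.0 * math.log(n))


def haar_breakpoints(y, threshold):
    """Return sorted block boundaries implied by thresholded Haar coefficients.

    Simplified stand-in for the breakpoint array: every Haar wavelet whose
    orthonormal coefficient exceeds the threshold contributes its three
    discontinuities (start, middle, end of its support) as boundaries.
    """
    n = len(y)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two in this sketch")
    # Prefix sums give the sum over any interval in constant time.
    prefix = [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)
    breaks = {0, n}
    size = n
    while size >= 2:
        half = size // 2
        for start in range(0, n, size):
            mid, end = start + half, start + size
            left = prefix[mid] - prefix[start]
            right = prefix[end] - prefix[mid]
            coeff = (left - right) / math.sqrt(size)
            if abs(coeff) > threshold:
                breaks.update((start, mid, end))
        size = half
    return sorted(breaks)


if __name__ == "__main__":
    signal = [0.0] * 4 + [4.0] * 4          # one jump in the middle
    t = universal_threshold(1.0, len(signal))
    print(haar_breakpoints(signal, t))
```

A lower threshold keeps more coefficients and therefore yields more, smaller blocks. Using the smallest sampled noise variance is the conservative choice in this sense, since it compresses the data no more than any of the HMM states would justify.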
