Figure 4 | Algorithms for Molecular Biology

Figure 4

From: Computing distribution of scale independent motifs in biological sequences

Kernel density for L = 4 and S = 1 applied to the concatenation of 20 promoter regions of Bacillus subtilis (see Discussion). The density is displayed both as a 3D bar plot (top) and as a 2D gray-scale heat map (bottom). The accurate capture of conserved tetranucleotide segments is illustrated for the TATA-box in the latter view, and for the TTGACA binding site at position -35 in the former. The two views also illustrate the two types of decomposition of conserved sequences. For the TTGACA sequence, the decomposition is performed at the resolution of the kernel (L = 4), and all 3 tetranucleotides embedded in the 6-unit sequence are identified. The density scale is normalized to the length of the sequence so that the average height is one unit, which is to say that the area of the density distribution is, as it should be for a unit square base, unitary by definition. The three tables at the top detail the densities of the possible tetranucleotides for each of the trinucleotide quadrants. In each of them, the conserved segment invariably has the highest density. The decomposition of the TATA-box, in the bottom view, is instead illustrated for a succession of scales, from mononucleotide to tetranucleotide. The cumulative distribution of densities is displayed at the top left, disclosing a skew towards lower values, with over 60% of densities below the unit average.
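
The arithmetic behind two statements in the caption can be sketched in a few lines of Python: a 6-unit sequence contains 6 - 4 + 1 = 3 overlapping tetranucleotides, and a density scaled by the number of observed windows has a mean of one unit over all 4^L cells. This is a minimal sketch that uses plain k-mer counts as a stand-in; it is not the kernel density estimator of the article, and the function names `kmers`, `normalized_density`, and `fraction_below_mean` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def normalized_density(seq, k):
    """Count-based stand-in for a density over all 4**k k-mers.

    Counts are scaled so that the mean over all cells is exactly 1.
    Assumption: windows containing symbols outside ACGT are skipped.
    """
    counts = Counter(w for w in kmers(seq, k) if set(w) <= set(ALPHABET))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no valid k-mers")
    cells = len(ALPHABET) ** k
    return {
        "".join(p): counts.get("".join(p), 0) * cells / total
        for p in product(ALPHABET, repeat=k)
    }

def fraction_below_mean(density):
    """Fraction of cells whose density is below the unit average."""
    return sum(1 for d in density.values() if d < 1.0) / len(density)

# Decomposition of the 6-unit site at the kernel resolution L = 4.
print(kmers("TTGACA", 4))  # ['TTGA', 'TGAC', 'GACA']
```

Applied to a real sequence, `fraction_below_mean(normalized_density(seq, 4))` gives a rough count-based analogue of the cumulative distribution summary in the caption, although the values will differ from those of the kernel estimate shown in the figure.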
