
Fig. 2 | Algorithms for Molecular Biology

Fig. 2

From: Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments

a Motif in the form of a regular expression (RE). Base coloring applies throughout the figure.
b Deterministic finite state automaton (DFA) corresponding to the regular expression in a. The initial state is shown in gray, and the end state is marked by a double circle.
c Transition state probability matrix (TPM) associated with the model in b.
d Embedded Markov model (eDFA) for two observed occurrences of the motif. States are pre-indexed with the number of motifs already observed.
e Embedded transition state probability matrix (eTPM) associated with the eDFA. The yellow matrix is an exact copy of the yellow matrix from c. The gray entries have zero probability. The transition probabilities from the end state of the DFA model (red/orange entries in the matrix from c) are shifted forward and contain the initial state of the next motif occurrence, except for any end-to-end transition probability (which occurs for REs ending with a *), which remains in the DFA template (red entry). The final state of the eDFA ((2,4) in d) is an absorbing state with a transition probability of 1 to itself, indicated in black.
f Heat diagrams of the n-step eTPM, reflecting the probability of moving between states in the eDFA given a random sequence of length n with a specific base composition. The row corresponding to the initial state (0,1) holds the probability distribution of going from the start state to any state in the eDFA in n steps. The last entry of this row (red entry) holds the probability of observing the motif n_obs or more times in the sequence, where n_obs is the observed number of motifs (the SSP).
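The construction in panels b to f can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Regmex implementation: the motif `ACG` is a hypothetical fixed string rather than the regular expression shown in panel a, the function names (`build_dfa`, `build_tpm`, `build_etpm`, `ssp`) are invented for this sketch, and the state layout is a simplified "one DFA copy per motif count plus one absorbing state" scheme that may differ from the exact indexing used in panels d and e. Instead of raising the eTPM to the n-th power, the sketch propagates the row of the initial state n times, which yields the same row of the n-step matrix.

```python
def build_dfa(motif, alphabet="ACGT"):
    """Transition table delta[state][base] for a fixed-string motif.

    State s means "the last s bases read match the first s bases of the
    motif"; state len(motif) is the end (accepting) state.
    """
    m = len(motif)
    delta = []
    for s in range(m + 1):
        row = {}
        for b in alphabet:
            seen = motif[:s] + b
            # Longest suffix of the bases just read that is a motif prefix.
            k = min(m, len(seen))
            while k > 0 and seen[len(seen) - k:] != motif[:k]:
                k -= 1
            row[b] = k
        delta.append(row)
    return delta


def build_tpm(delta, base_probs):
    """TPM of the DFA: entry [s][t] sums the probabilities of bases s -> t."""
    size = len(delta)
    tpm = [[0.0] * size for _ in range(size)]
    for s, row in enumerate(delta):
        for b, t in row.items():
            tpm[s][t] += base_probs[b]
    return tpm


def build_etpm(delta, base_probs, n_obs):
    """Embedded TPM (assumed layout): state (k, s) has index k * size + s,
    where k counts motifs seen so far; the last index is the absorbing state
    reached once n_obs motifs have been observed."""
    size = len(delta)
    end = size - 1
    absorbing = n_obs * size
    etpm = [[0.0] * (absorbing + 1) for _ in range(absorbing + 1)]
    for k in range(n_obs):
        for s, row in enumerate(delta):
            for b, t in row.items():
                # Entering the end state completes one more motif occurrence.
                k_next = k + 1 if t == end else k
                target = absorbing if k_next == n_obs else k_next * size + t
                etpm[k * size + s][target] += base_probs[b]
    etpm[absorbing][absorbing] = 1.0  # absorbing: stays put with probability 1
    return etpm


def ssp(motif, base_probs, n, n_obs):
    """Probability of n_obs or more motif occurrences in a random sequence
    of length n with independent bases drawn from base_probs."""
    delta = build_dfa(motif, alphabet="".join(base_probs))
    etpm = build_etpm(delta, base_probs, n_obs)
    size = len(etpm)
    dist = [0.0] * size
    dist[0] = 1.0  # start in state (0, start)
    for _ in range(n):
        dist = [sum(dist[i] * etpm[i][j] for i in range(size))
                for j in range(size)]
    return dist[-1]  # last entry: mass in the absorbing state


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(ssp("ACG", uniform, n=100, n_obs=2))
```

With uniform base probabilities and n = 6, the only sequence containing the hypothetical motif twice is `ACGACG`, so `ssp("ACG", uniform, 6, 2)` equals 1/4^6, which gives a quick sanity check of the construction. Supporting general regular expressions would require replacing `build_dfa` with a regex-to-DFA conversion; the embedding step would stay the same.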
