Table 1 Theoretical computational requirements

From: Efficient algorithms for training the parameters of hidden Markov models using stochastic expectation maximization (EM) training and Viterbi training

**Training one parameter at a time**

| type of training | algorithm | time | memory | reference |
|---|---|---|---|---|
| Viterbi | Viterbi | O(T_max L M) | O(M L) | [17] |
| | Lam-Meyer | O(T_max L M) | O(M) | this paper |
| Baum-Welch | Baum-Welch | O(T_max L M) | O(M L) | [13] |
| | checkpointing | O(T_max L M log(L)) | O(M log(L)) | [34] |
| | linear-memory | O(T_max L M) | O(M) | [29] |
| stochastic EM | forward & back-tracing | O(T_max L (M + K)) | O(M L) | [32] |
| | Lam-Meyer | O(T_max L M K) | O(M K + T_max) | this paper |

**Training P of Q parameters at the same time, with P ∈ {1, ..., Q} and Q/P ∈ ℕ**

| type of training | algorithm | time | memory | reference |
|---|---|---|---|---|
| Viterbi | Viterbi | O(T_max L M Q/P) | O(M L) | [17] |
| | Lam-Meyer | O(T_max L M Q/P) | O(M P) | this paper |
| Baum-Welch | Baum-Welch | O(T_max L M Q/P) | O(M L + P) | [13] |
| | checkpointing | O(T_max L M log(L) Q/P) | O(M log(L)) | [34] |
| | linear-memory | O(T_max L M Q/P) | O(M) | [29] |
| stochastic EM | forward & back-tracing | O(T_max L (M + K) Q/P) | O(M L) | [32] |
| | Lam-Meyer | O(T_max L M K Q/P) | O(M K P + T_max) | this paper |

  1. Overview of the theoretical time and memory requirements for Viterbi training, Baum-Welch training and stochastic EM training for an HMM with M states, a connectivity of T_max and Q free parameters. K denotes the number of state paths sampled in each iteration for every training sequence in stochastic EM training. The time and memory requirements above are the requirements per iteration for a single training sequence of length L. It is up to the user to decide whether to train the Q free parameters of the model sequentially, i.e. one at a time, or in parallel in groups; the two tables above cover all possibilities.
  2. In the general case we are dealing with a training set X = {X_1, X_2, ..., X_N} of N sequences, where training sequence X_i has length L_i. If training involves the entire training set, i.e. all training sequences simultaneously, L in the formulae above needs to be replaced by ∑_{i=1}^{N} L_i for the memory requirements and by max_i{L_i} for the time requirements. If, on the other hand, training considers one training sequence at a time, L in the formulae above needs to be replaced by ∑_{i=1}^{N} L_i for the time requirements and by max_i{L_i} for the memory requirements.
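To make the substitution rule in footnote 2 concrete, here is a minimal Python sketch (not from the paper; all function and variable names are illustrative) that evaluates the Table 1 expressions for Baum-Welch training [13] under both training regimes. Constant factors hidden by the O-notation are ignored.

```python
# Illustrative sketch of the Table 1 bounds for Baum-Welch training [13].
# Names and example values are hypothetical; O(.) constants are dropped.

def baum_welch_time(T_max: int, L: int, M: int, Q: int, P: int) -> int:
    """O(T_max * L * M * Q/P) time per iteration (Table 1, [13])."""
    return T_max * L * M * Q // P

def baum_welch_memory(L: int, M: int, P: int) -> int:
    """O(M * L + P) memory per iteration (Table 1, [13])."""
    return M * L + P

lengths = [400, 250, 600]      # L_i for a training set of N = 3 sequences
M, T_max, Q, P = 20, 5, 12, 3  # Q/P = 4 groups of parameters

# All sequences simultaneously (footnote 2):
#   time uses L -> max_i{L_i}, memory uses L -> sum_i L_i.
t_par = baum_welch_time(T_max, max(lengths), M, Q, P)
m_par = baum_welch_memory(sum(lengths), M, P)

# One sequence at a time (footnote 2):
#   time uses L -> sum_i L_i, memory uses L -> max_i{L_i}.
t_seq = baum_welch_time(T_max, sum(lengths), M, Q, P)
m_seq = baum_welch_memory(max(lengths), M, P)

print(f"simultaneous: time ~ {t_par}, memory ~ {m_par}")
print(f"sequential:   time ~ {t_seq}, memory ~ {m_seq}")
```

Note that the two regimes trade time against memory: processing the whole training set at once stores a dynamic-programming table per sequence, while sequential processing reuses one table of size governed by the longest sequence.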