Interpretation and approximation tools for big, dense Markov chain transition matrices in population genetics
 Katja Reichel^{1}Email authorView ORCID ID profile,
 Valentin Bahier†^{1},
 Cédric Midoux†^{1},
 Nicolas Parisey^{1},
 JeanPierre Masson^{1} and
 Solenn Stoeckel^{1}
https://doi.org/10.1186/s1301501500615
© Reichel et al. 2015
Received: 20 August 2015
Accepted: 4 December 2015
Published: 30 December 2015
Abstract
Background
Markov chains are a common framework for individualbased state and time discrete models in evolution. Though they played an important role in the development of basic population genetic theory, the analysis of more complex evolutionary scenarios typically involves approximation with other types of models. As the number of states increases, the big, dense transition matrices involved become increasingly unwieldy. However, advances in computational technology continue to reduce the challenges of “big data”, thus giving new potential to staterich Markov chains in theoretical population genetics.
Results
Using a population genetic model based on genotype frequencies as an example, we propose a set of methods to assist in the computation and interpretation of big, dense Markov chain transition matrices. With the help of network analysis, we demonstrate how they can be transformed into clear and easily interpretable graphs, providing a new perspective even on the classic case of a randomly mating, finite population with mutation. Moreover, we describe an algorithm to save computer memory by substituting the original matrix with a sparse approximate while preserving its mathematically important properties, including a closely corresponding dominant (normalized) eigenvector. A global sensitivity analysis of the approximation results in our example shows that size reduction of more than 90 % is possible without significantly affecting the basic model results. Sample implementations of our methods are collected in the Python module mamoth.
Conclusion
Our methods help to make stochastic population genetic models involving big, dense transition matrices computationally feasible. Our visualization techniques provide new ways to explore such models and concisely present the results. Thus, our methods will contribute to establish staterich Markov chains as a valuable supplement to the diversity of population genetic models currently employed, providing interesting new details about evolution e.g. under nonstandard reproductive systems such as partial clonality.
Keywords
Background
Natural systems often possess inherently discrete states in space, time or both. Atoms, molecules and cells, organs, individuals, populations and taxa usually appear as distinct entities; along the time axis, the radiation cycles we use as the basis for atomic clocks, neuronal action potentials, developmental stages in an organisms life cycle, generations and the revolutions of the earth around the sun are examples for similar patterns.
Modeling these discrete systems as such can have advantages over continuous approximations. One of the earliest examples comes from thermodynamics [1], where heat emission spectra could only be predicted correctly if energy “comes in packets”, known as “quanta”. This discovery led to the new field of quantum mechanics, which provided the necessary theory for understanding the photovoltaic effect [2], thus proving essential for the invention of solar cells. In biology, the rediscovery of Mendel's rules and thus of the “quantal” nature of genetic heritability, at about the same time as Planck’s famous speech, has had a similar impact on the study of evolution as the latter’s research has had on thermodynamics [3]. While most of the objects of biological research have long been recognised as discrete (e.g., the word individual literally means not dividable, a notion very similar to that of a quantum), we still struggle with understanding the processes, such as evolution, linking them to potential emergent properties (analogous to the physicists’ heat spectra) at higher levels. Mathematical models preserving the discrete nature of the biological system are thus an interesting field of study.
Markov chains are a classical framework for modeling state and time discrete stochastic systems. Based on the assumption that the modeled system is memoryless (Markov property, [4]), the basic model equation consists in multiplying a “start” vector, providing the state of the system at a given time, with a typically square “step” matrix. This matrix holds the transition probabilities, which depend on the model parameters and typically remain constant through time, between all possible states of the system within one time step. By analyzing the transition matrix, both the “short term” transient behavior and the “long term” limiting behavior of the model can be studied, thus putting the matrix at the center of attention for the biological interpretation of the results. Markov chains and other related forms of matrixbased models, such as Leslie models in population dynamics, are already widely in use (e.g. [5– 7]), yet in many cases the number of modeled states is comparatively small and/or a major part of the transitions are considered impossible. The latter property leads to many zeros in the transition matrix, which then becomes sparse, as opposed to a dense matrix where zeros are rare. Computationally, sparse matrices are advantageous since memory may be saved by storing only those values which are different from zero. Special algorithms exist to carry out standard operations (e.g. matrix multiplication) directly on matrices stored in a sparse format (e.g. [8, 9]).
Examples of matrix size based on the StoeckelMasson model. Memory use is approximate and assumes 64bit accuracy
N  \(\mathcal {P}\)  \(\mathcal {L}\)  \(\mathcal {A}\)  g  S  Memory use 

20  2  1  2  3  231  420 KB 
100  2  1  2  3  5151  205 MB 
500  2  1  2  3  125,751  120 GB 
1000  2  1  2  3  501,501  2 TB 
20  4  1  2  5  10,626  865 MB 
20  2  2  2  9  3,108,105  75 TB 
20  2  1  4  10  10,015,005  730 TB 
20  2  2  4  100  \(9.8 \times 10^{20}\)  \(6.5 \times 10^{21}\) YB 
In this article, we provide a set of methods for visualizing and interpreting both the transient and limiting behavior of population genetic models involving staterich, irreducible, aperiodic and timehomogeneous Markov chains, based on the transition matrix and its dominant eigenvector, as well as a method for approximating a dense transition matrix by a sparse substitute. For the first part, we combine de Finetti diagrams [17] with network analysis, extending both concepts to provide clear and informative diagrams for the analysis of population genetic processes. For the second part, we use a predefined threshold (minimal percentage of information contained in the transition matrix) to keep only the more probable transient behavior of the model, while at the same time ensuring that mathematically important matrix properties are kept. The model presented in [16] serves as an example to illustrate our methods.
Model example
The population genetic model of Stoeckel and Masson [16] describes the evolution of genotype frequencies based on a single locus with two alleles a and A in a fixedsize population of diploid, partially asexual organisms. States are defined as assignations of the N individuals in the population to the three possible genotypes (aa, aA, AA). The transition probabilities between the states depend on a symmetric mutation rate \(\mu\) and a constant rate of asexual reproduction c, defined as the probability that an individual in the next generation was derived clonally from a single parent.
Transition matrices M resulting from this model are generally square, due to the fixed population size (a common feature of many population genetic models, compare [3]). They also have a density of one—transitions between all states are possible in one step, although some of them (e.g. all individuals aa to all individuals AA) are very unlikely. The corresponding Markov chain is thus irreducible (single communicating class, no absorbing states) and aperiodic (period of all states equals one, same state possible in consecutive time steps). Since the mutation rate \(\mu\) is symmetric, i.e. changes from a to A are just as likely as the inverse, M is also partially symmetric: if the transition probabilities from one particular state to all others have been calculated, swapping the names of all alleles also gives a correct result (compare Figs. 1, 2). The notation in this article assumes leftstochastic matrices (columns represent the transition probabilities from one state to all others and thus sum to one), which implies that the limiting behavior of the Markov chain is described by its transition matrices’ (normalized) right eigenvector v to the eigenvalue with the largest absolute value (and multiplicity one, [18]): one.
Results
Working with big, dense transition matrices poses two connected problems: on the one hand, the storage size of the matrix may considerably slow down calculations or be altogether too big for the computer, on the other hand, the relevant information about the model may be difficult to extract from the great amount of data contained in the matrix. Visualization techniques for the interpretation of matrix data can, however, also help to find matrix properties which allow reducing the storage size, such as partial symmetry or the occurrence of many nearzero transition probabilities. We therefore start by describing the visualization techniques in the first part, and then move on to storage size reduction by sparse approximation in the second part of the results.
Visualization
An intuitive first step in analyzing the transient behavior of a Markov chain model is a diagnostic visualization of the transition matrix. By summarizing results in an accessible way, the resulting diagram may ideally also provide a basis for direct biological interpretation. With one exception (landscape plot), all the following visualization methods are available using the functions histogrid, histo3d and networkplot (with its support function percolation) in the mamoth module; an example for the runtime of each method is given in Additional file 1.
Heat map

In our example, the heat map shows that many of the transition probabilities in the matrix are, though not equal, very close to zero. After reordering the states, the partial symmetry of the matrix also becomes visible.
Network display
The duality between matrices and graphs (e.g. [7, 25]) provides an alternative for the visualization and mathematical analysis of either structure. In a graph \(\mathcal {G(V, E)}\), the states of a Markov chain are thus represented as nodes/vertices \(\mathcal {V}\) and the transitions as (weighted and directed) edges \(\mathcal {E}\) connecting them, which is especially useful for sparse transition matrices.
For big, dense matrices, the number of edges in the resulting complete multidigraph (of edge multiplicity two) equals the number of entries in the transition matrix and is thus too big for easy interpretation. Concepts from network theory can be used to selectively display edges and summarize information about each state of the model system on the nodes. This leads to a variety of very clear synthetic representations constructed with different parameters and taking into account different time scales: from one generation (based on M) across t generations (based on \(M^{t}\)) up to the longtime equilibrium (dominant eigenvector of M, v).
The following visualization techniques are based on selectively displaying the network’s edges:
Most probable path This is the counterpart of a shortest path if distances (edge weights) represent probabilities. For each noncommutative pair of states i and j, there exists at least one series of consecutive edges connecting i to j along which the product of the edge weights is maximal. It can be determined by using an “ordinary” shortest path algorithm (e.g. [26, 27]) on a negative log transform of the transition matrix. The most probable path is the most likely trajectory of the model system to get from one state to another.

In our example (Fig. 2), a change from a population with only the aa genotype to one with only the AA genotype would closely follow the HardyWeinberg curve.

In our example (Additional file 3), horizontal transitions along the base of the triangle, where no heterozygotes are produced despite of two homozygous genotypes being present in the population, would be excluded.
The following visualization techniques are based on changing the appearance of the network’s nodes:

In our example (Fig. 2), the nodes with the highest indegree are nearest neighbors to the largest number of nodes; if all states were equally likely at the current generation, those next to (0.25; 0.5; 0.25) on the HardyWeinberg curve would be the most likely in the next generation.
Betweennesscentrality Based on the same concept as the most probable path, this can be redefined as the number of most probable paths passing through each node when connections between each pair of nodes are considered. It can be derived in a similar way as the most probable path, by applying a standard algorithm developed for additive distances to a negative log transform of the multiplicative probabilities in M. Nodes with a high betweennesscentrality represent frequent transient states.

In our example, these are all the states along the HardyWeinberg curve except for the fixation states (Additional file 4).
 \(p_{\text {stay}}\) :

probability to stay for one time step
\(p_{\text {stay}}(i) = p_{i,i}\), the probabilities on the matrix diagonal; for each state i this is the probability that the system remains at state i for the next time step (“stickiness”). This probability allows the easy detection of (near)absorptive states.
In population genetics, the fixation states \(\lbrace (N;0;0),\) \((0;0;N)\rbrace\) are typical examples (Fig. 2).
 \(p_{\text {out}}\) :

probability to leave in one time step
\(p_{\text {out}}(i) = 1p_{i,i},\) the column sums of the matrix without the diagonal; for each state i this is the probability that the system changes state at the next time step (“conductivity”). Being the opposite of \(p_{stay}\), this probability allows the detection of states which are rarely occupied for consecutive time steps. In our example, these are the states where the population consists of an approximately even mixture of both homozygotes (central basis of the triangle) or only of heterozygotes (top of the triangle; Additional file 3). In contrast, the row sums of a leftstochastic matrix may exceed one and are thus not probabilities. As a result of the Markov property, a probability to arrive always depends on the state at the previous time step, which results in a number of possible definitions.
 p(ij) :

probability to arrive from state j in one time step
\(p(ij) = p_{j, i}, j \in S\), all probabilities in one column of the transition matrix; the probability distribution (mean, variance, skew according to arrangement of nodes) for transitions starting from one particular state. This allows the prediction of the most likely states for the next time step. In our example, the variance around the fixation states is much more limited than at the interior states of the triangle (Additional file 4).
 \(p_{\text {in}}\) :

probability to arrive in one time step
\(p_{\text {in}}(i) = 1/(S1) \cdot \sum _j p_{j, i}\) for \(i \ne j\), the row sums of the matrix divided by the number of other states; probability to arrive at state i if all previous states are equally likely. This shows states which are generally very likely destinations for onestep transitions. In our example, these are the states around the HardyWeinberg curve (additional file 3).
 \(p_{\text {in}}^{\infty }\) :

probability to arrive in an infinite run
\(p_{\text {in}}^{\infty }(i) = \sum _j p_{j, i} \cdot v_{j}\) for \(i \ne j\), the sum over the elementwise product of eigenvector and matrix row, without the diagonal; probabilities to arrive at state i if the likelihood of the previous states is distributed according to the limiting distribution. This shows the states which are the most frequent destination of transitions in an infinite run of the model. In our example, these are the two states next to the fixation states where there is exactly one “foreign” allele (Additional file 4).
 \(p^{\infty }\) :

limiting distribution/eigenvectorcentrality
\(p^{\infty }(i) = v_{i}\), the eigenvector; probability to find the system at state i after infinitely many time steps, or proportion of time spent in each state averaged over infinitely many time steps (limiting distribution). This is the prediction for the most likely states independently of the start state. As is well known for our example, these are the fixation states (Additional file 3).

For our example, plotting the expected time to the fixation states shows that it depends predominantly on the current state’s allele frequencies (Additional file 4).
Landscape plot

In our example, the expected dynamics of the genotype frequencies show convergence to the HardyWeinberg equilibrium (Additional file 5).
Note: because of its dependence on a function or matrix specifying the distances between states, and on the triangular gridlike structure of the state space, this method is not included in the mamoth source code.
Approximation

External memory: the whole matrix is stored on a (sufficiently large) hard drive, only parts are loaded into active storage when needed (analogous to [28])

Iterative/selective matrix creation: the whole matrix is never stored, only parts are created when needed (e.g. in combination with algorithms such as [29])

Lumping states based on model properties: if a group of states has the same (sum of) transition probabilities leading into it and out of it to any other (group of) states and the same analytical meaning (e.g. same value of \(F_{IS}\)) they can be combined into one ([30, 31]); other algorithms of state aggregation, such as [32], lead to an approximation of the original matrix

Sparse approximation: turning a dense matrix into a sparse matrix by approximating very small matrix elements to zero (e.g. as in [33, 34])
Because of the symmetry between the two allele frequencies in our model example, almost half of all states could be pairwise lumped, thus reducing matrix size to a little over a quarter of the original. The exception are the states on the symmetry axis of the de Finetti diagram (compare Figs. 1, 2), which do not have a “lumping partner”. Symmetry with respect to the allele frequencies is often found in population genetics models [3]. However, because of this dependency on model structure a size reduction algorithm based on lumping would not be applicable to nonsymmetric extensions of the original model, e.g. with an asymmetric mutation rate or directional selection. Allele frequencies would have to be analyzed jointly, as the new states retain only the ratio of both; once lumped, “unpacking” the states becomes difficult.
The high number of very small values in the Markov chain transition matrix (Fig. 1) of our model example suggests that sparse approximation would be very effective. Moreover, as each column of the matrix corresponds to a probability distribution (constant sum of one) which becomes less uniform as the number of states/population size increases (the expected convergence to a multinormal distribution with variance proportional to 1/N is the underlying principle of the wellknown diffusion approximation), the proportion of very small transition probabilities is likely to augment as the matrix size increases. While sparse approximation is independent of modelspecific properties such as symmetry and does not change the states as such, it has the disadvantage of changing the actual content of the transition matrix, potentially leading to the loss of relevant properties such as leftstochasticity or irreducibility.
The sparse approximation algorithm we propose ensures that the resulting sparse matrix still has all the properties relevant to its function in the Markov chain model. Additionally, it can be executed iteratively so that the complete dense matrix need not be stored. The algorithm iterates over all columns of the transition matrix M and excludes (almost) all values which, in total, contribute less than a threshold value \(s \in [0,1]\) to the column sum:
 1.
Create a permutation R of the row indices so that the corresponding entries are ranked according to size: \(R \leftarrow \text {ordinalrank}(j\, \, 1 \ge C^i_{j} \ge 0)\)
 2.
Find the minimal rank (index of R) so the corresponding entries sum at least to the threshold value s: \(r \leftarrow \text {min}(k)\) for \(\sum _{R_1}^{R_k} C^i_{R_k} \ge s\)
 3.
Keep at least the two biggest values per column: \(r \leftarrow \text {max}(2,r)\)
 4.
Keep all values of equal rank: while \(C^i_{R_{r+1}} = C^i_{R_{r}}\) : \(r \leftarrow r+1\)
 5.
Round all values with ranks greater then r to zero, but keep those on the main diagonal and the first lower and first upper diagonals: \(C^i_{R_k} \leftarrow 0\) for all k with \(k > r \wedge R_k \notin \{(i1, i, i+1) \, \text {mod} \, S\}\)
 6.
Rescale the column to sum to 1: \(C^i \leftarrow C^i/\text {sum}(C^i)\).
Both the efficiency, i.e. the density or memory use of the resulting matrix, and the bias vary according to the value of s and the distribution of values in the original matrix. If s is low or the probability distribution in the column is far from uniform, more values will be discarded (compare Fig. 3). An appropriate value for s has to be determined heuristically by testing successively increasing values, up to the point where the bias due to the approximation no longer interferes with the interpretability of the model results. The sum of the differences between the entries of the approximate and original matrices has a theoretical upper limit of \((1s) \cdot S\), but the effect of this perturbation on the model output may be more complex.
The results of the GSA show that all four model parameters may generally have nonlinear/interacting effects on the quality of the approximation, but in the mean these effects are not very strong (Fig. 5). Memory reduction is highly efficient as the mean density of the sparse matrices was only \(\approx 0.11\). Individual densities ranged from \(\approx 0.42\) (small matrix, high threshold) to \(\approx 0.03\) (big matrix, low threshold), varying most strongly with the population size, though all four parameters have a significant influence. On our reference system (Intel Core i73930K 3.2 GHz processor with 64 GB RAM), calculating the sparse approximation based on the original matrix took on average 1.7 s for \(N=50\) (14.6 s to construct the original), and 31.3 s for \(N=100\) (221.7 s to construct the original). Finding the dominant eigenvector of sparse approximate and original matrix took on average 0.1 s (sparse) versus 51.7 s (original) for \(N=50\) and 2.4 s (sparse) versus 7869.1 s (2 h, 11 min, 9.1 s, original) for \(N=100\), so that in both cases less than one percent of the original runtime was needed with the sparse approximate matrix.
In conclusion, sparse approximation using our algorithm has the advantage of being easily applicable to all transition matrices independently of the properties of the underlying model, and is well suited to provide an overview of the equilibrium \(F_{IS}\) distribution under different rates of asexual reproduction in our model example. However, it needs an initial effort to verify the model results derived from the approximate matrix and to estimate their final bias. For finescale analyses, lumping states may provide an approximationfree alternative, but is not always possible as it depends on the model structure.
Discussion
As the technological obstacles of working with “big data” become smaller, new opportunities arise especially for stochastic models, e.g. in population genetics. Yet these opportunities also lead to new challenges: results need to be brought into an interpretable form, and the technological boundaries further pushed back to allow even more complexity. We developed methods to help with the computational analysis and interpretation of staterich time and spacediscrete Markov chain models in population genetics, focusing on the particularly challenging case of very dense matrices.
Markov chain models are a versatile framework also for population genetic questions, and may often provide a first step in the development of analytic formulae [3]. Further relevant parameters such as selection, migration or “unusual” reproductive systems can be easily included in such a model. Yet even for randomly mating population with genetic drift and mutation, a standard case of population genetics, a Markov chain model such as [16] may still yield additional information with the help of our visualization methods: In particular, the shortterm dynamics, e.g. probabilistic trajectories connecting a current and a previous or predicted state, and the resulting variation around the expectation of convergence to the HardyWeinberg equilibrium are made visible. Especially for small populations, which are highly relevant e.g. for conservation genetics [41], and questions relating to development through time rather than just the longterm equilibrium, such Markov chain models may thus become valuable tools.
Though the size limitation for computational matrix analysis may never be completely removed, we showed that there are ways to circumvent it: even without access to specialized hardware, big, dense transition matrices may be manageable either by lumping states, or by approximating rare transitions to zero with our sparse approximation algorithm. In our model example, the approximation provided sufficiently accurate results for the limiting distribution of \(F_{IS}\). Though there is an initial effort of verification, the advantage of sparse approximate matrices is considerable as they can subsequently be used also on less powerful hardware e.g. to speed up or allow the calculation of eigenvectors on systems incapable of storing the full model. In our model example, the size reduction of sometimes more than 90 % would e.g. make it possible to use the equilibrium \(F_{IS}\) distributions for the inference of model parameters in an analysis software without having to store a—necessarily incomplete—reference collection of precalculated distributions for very big matrices. Moreover, some of our visualization methods (e.g. most probable neighbor, \(p_{\text {in}}\), \(p_{\text {stay}}\), \(p_{\text {out}}\), p(ij)) can be used without ever storing the whole matrix, while providing even very powerful conclusions about model behavior. Our sparse approximation method is not intended to substitute other approaches, and we did not test if it outperforms the accuracy of other approximations (e.g. diffusion approximation) for any specific question. Rather, it is a supplement, allowing to keep the structure of the original Markov chain model with the corresponding interpretation techniques beyond the technical limit, and a potential reference for the existing methods.
Individualbased models are becoming more and more popular in biology [42, 43], which will further increase the frequency of encountering computationally challenging cases such as the one we presented. In population genetics, modeling more complex evolutionary parameters such as life cycles and reproductive mechanisms, multidimensional fitness landscapes or dispersal may often lead to the necessity of extending the traditional models from allele frequencies [3] to genotypes. Due to the diploid/polyploid nature of most higher organisms, this will necessarily increase the size of transition matrices and equation systems to be analysed. By presenting our approach, we hope to encourage and inspire others to extend and adapt our methods, thus further paving the way for the use of Markov Chain models with big, dense transition matrices.
Conclusion
We described and evaluated a set of tools, implemented in the Python module mamoth, for working with staterich Markov chain models in population genetics. These tools ease the interpretation of model behavior by providing diagnostic visualizations of transition matrices, and allow substituting dense transition matrices with a sparse counterpart by applying an iterative approximation algorithm that is independent of model symmetry. Thus, our methods permit an advanced analysis of increasingly complex Markov chain models in population genetics, without giving up their space and time discrete structure. They may therefore contribute e.g. to the study of the population genetic consequences of partially clonal reproduction.
Availability and requirements
The methods we described can be easily implemented in any scientific programming environment; we provide sample code for Python for all methods which do not rely on the specific state definitions of our model example.
Project name: mamoth.
Project home page: http://www6.rennes.inra.fr/igepp_eng/Productions/Software
Operating system(s): Platform independent.
Programming language: Python.
Other requirements: Python 2.7 or 3.4 and higher, extension modules numpy/scipy, matplotlib and networkx ([44– 46]).
License: GNU public license, version 2 (GPL2).
Any restrictions to use by nonacademics: see GPL2 license.
Notes
Declarations
Authors’ contributions
JPM and SS wrote the population genetic model and proposed first ideas. VB, CM and KR contributed equally to the development of the final methods, which was guided by SS. NP added the sensitivity analysis of the approximation method. SS was in charge of acquiring funds. KR wrote most of the manuscript, to which all authors contributed. All authors read and approved the final manuscript.
Acknowledgements
We thank Jürgen Angst, Sophie ArnaudHaond, Sina Brunsch, Florent Malrieu, Romuald Rouger and François Timon for constructive discussions, and all reviewers for their helpful criticism. This study is part of the CLONIX project (ANR11BSV7007) financed by the French National Research Agency. Katja Reichel receives a PhD grant by the Région Bretagne and the division “Plant Health and Environment” of the French National Institute of Agricultural Research (INRA), and would like to thank her supervisors Solenn Stoeckel and JeanChristophe Simon.
Competing interests
The authors declare that they have no competing interests.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
 Planck M. Zur Theorie des Gesetzes der Energieverteilung im Normalspectrum. In: Verhandlungen der Deutschen Physikalischen Gesellschaft. vol. 2; 1900. p. 237–245.Google Scholar
 Einstein A. Über einen die Erzeugung und Verwandlung des Lichtes betreffenden heuristischen Gesichtspunkt. Annalen der Physik. 1905;322(6):132–48.View ArticleGoogle Scholar
 Ewens WJ. Mathematical population genetics: I. Theoretical introduction. 2nd ed. Interdisciplinary applied mathematics. New York: Springer; 2004.Google Scholar
 Markov AA. Extension of the limit theorems of probability theory to a sum of variables connected in a chain. Proc Soc Phys Math Univ Kazan. 1906;15(2):135–56.Google Scholar
 Feller W. An introduction to probability theory and its applications. New York: Wiley; 1971.Google Scholar
 Otto SP, Day T. A biologist’s guide to mathematical modeling in ecology and evolution. Princeton: Princetown University Press; 2007.Google Scholar
 Allen LJS. An introduction to stochastic processes with applications to biology. 2nd ed. Boca Raton: Chapman & Hall; 2011.Google Scholar
 Davis TA. Direct methods for sparse linear systems. Fundamentals of Algorithms: Society for Industrial and Applied Mathematics; 2006.Google Scholar
 Davis TA. Algorithm 915 SuiteSparseQR: multifrontal multithreaded rankrevealing sparse QR factorization. ACM Trans Math Softw. 2011;38(1):8:1–22.Google Scholar
 Hardy GH. Mendelian proportions in a mixed population. Sci. 1908; 49–50.Google Scholar
 Weinberg W. Über den Nachweis der Vererbung beim Menschen. In: Eichler J, editor. Jahreshefte des Vereins für vaterländische Naturkunde in Württemberg, vol. 64. Stuttgart: Verein für vaterländische Naturkunde in Württemberg; 1908. p. 368–82.Google Scholar
 Orive ME. Effective population size in organisms with complex lifehistories. Theor Popul Biol. 1993;44(3):316–40.PubMedView ArticleGoogle Scholar
 Ceplitis A. Coalescence times and the Meselson effect in asexual eukaryotes. Genet Res. 2003;82(3):183–90 (WOS:000220642000004).PubMedView ArticleGoogle Scholar
 Gale JS. Theoretical population genetics. New York: Springer; 1990.View ArticleGoogle Scholar
 Greenbaum G. Revisiting the time until fixation of a neutral mutant in a finite population—a coalescent theory approach. J Theor Biol. 2015;380:98–102.PubMedView ArticleGoogle Scholar
 Stoeckel S, Masson JP. The exact distributions of F\(_{\rm {IS}}\) under partial asexuality in small finite populations with mutation. PLoS One. 2014;9(1):e85228.PubMedPubMed CentralView ArticleGoogle Scholar
 de Finetti B. Conservazione e Diffusione dei Caratteri Mendeliani. Nota I. Caso Panmittico. In: Rendiconti della R. Accademia Nazionale dei Lincei. vol. V (110–12); 1927. p. 913–921.Google Scholar
 Perron O. Zur Theorie der Matrices. Math Ann. 1907;64(2):248–63.View ArticleGoogle Scholar
 Brookes AJ. The essence of SNPs. Gene. 1999;234(2):177–86.PubMedView ArticleGoogle Scholar
 Drake JW. A constant rate of spontaneous mutation in DNAbased microbes. Proc Natl Acad Sci. 1991;88(16):7160–4.PubMedPubMed CentralView ArticleGoogle Scholar
 Ellegren H, Smith NG, Webster MT. Mutation rate variation in the mammalian genome. Curr Opin Genet Dev. 2003;13(6):562–8.PubMedView ArticleGoogle Scholar
 Kronholm I, Loudet O, de Meaux J. Influence of mutation rate on estimators of genetic differentiationlessons from Arabidopsis thaliana. BMC Genet. 2010;11(1):33.PubMedPubMed CentralView ArticleGoogle Scholar
 Wright S. Coefficients of inbreeding and relationship. Am Nat. 1922;56(645):330–8.View ArticleGoogle Scholar
 Halkett F, Simon J, Balloux F. Tackling the population genetics of clonal and partially clonal organisms. Trends Ecol Evol. 2005;20(4):194–201.PubMedView ArticleGoogle Scholar
 Aghagolzadeh M, Barjasteh I, Radha H. Transitivity matrix of social network graphs. In: Statistical Signal Processing Workshop (SSP), 2012 IEEE; 2012. p. 145–148.Google Scholar
 Dijkstra EW. A note on two problems in connexion with graphs. Numer Math. 1959;1:269–71.View ArticleGoogle Scholar
 Biswas SS, Alam B, Doja MN. Generalisation of Dijkstra’s algorithm for extraction of shortest paths in directed multigraphs. J Comput Sci. 2013;9(3):377–82.View ArticleGoogle Scholar
 Dongarra JJ, Sorensen DC. Linear algebra on high performance computers. Appl Math Comput. 1986;20(1):57–88.View ArticleGoogle Scholar
 Lehoucq RB, Sorensen DC, Yang C. ARPACK users’ guide: solution of large scale eigenvalue problems with implicitly restarted arnoldi methods. Software Environ Tools. 1997;6.Google Scholar
 Kemeny JG, Snell LJ. Finite Markov Chains. Undergraduate texts in mathematics. New York: SpringerVerlag; 1976.Google Scholar
 Schapaugh AW, Tyre AJ. A simple method for dealing with large state spaces. Methods Ecol Evol. 2012;3(6):949–57.View ArticleGoogle Scholar
 Deng K. Model reduction of Markov chains with applications to building systems [Dissertation]. Illinois: University of Illinois at UrbanaChampaign. Urbana; 2012.Google Scholar
 Kumar S, Mohri M, Talwalkar A. On samplingbased approximate spectral decomposition. In: Proceedings of the 26th International Conference on Machine Learning. Montreal, Canada; 2009.Google Scholar
 Talwalkar A. Matrix approximation for largescale learning. Courant Institute of Mathematical Sciences New York; 2010.Google Scholar
 Morris MD. Factorial sampling plans for preliminary computational experiments. Technometrics. 1991;33(2):161–74.View ArticleGoogle Scholar
 Saltelli A, editor. Sensitivity analysis in practice: a guide to assessing scientific models. Hoboken: Wiley; 2004.Google Scholar
 Wainwright HM, Finsterle S, Jung Y, Zhou Q, Birkholzer JT. Making sense of global sensitivity analyses. Comput Geosci. 2014;65:84–94.View ArticleGoogle Scholar
 R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2013.Google Scholar
 Pujol G, Iooss B, Janon A. Sensitivity: sensitivity analysis. R package version 1.11.; 2015.Google Scholar
 Cressie N, Read TRC. Multinomial goodnessoffit tests. J R Stat Soc. 1984;46(3):440–64.Google Scholar
 Ellstrand NC, Elam DR. Population genetic consequences of small population size: implications for plant conservation. Annu Rev Ecol Evol Syst. 1993;217–42.Google Scholar
 Zipkin EF, Jennelle CS, Cooch EG. A primer on the application of Markov chains to the study of wildlife disease dynamics: modelling disease dynamics with Markov chains. Methods Ecol Evol. 2010;1(2):192–8.View ArticleGoogle Scholar
 Black AJ, McKane AJ. Stochastic formulation of ecological models and their applications. Trends Ecol Evol. 2012;27(6):337–45.PubMedView ArticleGoogle Scholar
 Oliphant TE. Python for scientific computing. Comput Sci Eng. 2007;9(3):10–20.View ArticleGoogle Scholar
 Hunter JD. Matplotlib: a 2D graphics environment. Comput Sci Eng. 2007;9(3):90–5.View ArticleGoogle Scholar
 Hagberg AA, Schult DA, Swart PJ. Exploring network structure, dynamics, and function using NetworkX. In: Proceedings of the 7th Python in Science Conference (SciPy2008). Pasadena, USA; 2008. p. 11–15.Google Scholar