Uniform Random Generation
The problem of uniform random generation of combinatorial structures, that is, the problem of randomly generating objects (of a previously fixed input size) of a specified class such that each object occurs with the same (or a similar) probability, has been studied extensively in the past. Special attention has been paid to the wide class of decomposable structures, which are basically defined as combinatorial structures that can be constructed recursively in an unambiguous way.
In principle, two general (systematic) approaches have been developed for the uniform generation of these structures: first, the recursive method, originated in [24] (to generate various data structures) and later systematized and extended in [25] (to decomposable data structures), where general combinatorial decompositions are used to generate objects at random based on counting possibilities; second, and more recently, the so-called Boltzmann method [1, 26], where random objects (under the corresponding Boltzmann model) have a fluctuating size, but objects of the same size invariably occur with the same probability. Note that according to [26], Boltzmann samplers may be employed for approximate-size (objects with a randomly varying size are drawn) as well as fixed-size (objects of a strictly fixed size are drawn) random generation and are an alternative to standard combinatorial generators based on the recursive method. However, fixed-size generation is considered the standard paradigm for the random generation of combinatorial structures.
(Admissible) Constructions and Specifications
According to [25], a decomposable structure is a structure that admits an equivalent combinatorial specification:
Definition 0.1 ( [25]). Let $\mathcal{A}=\left({\mathcal{A}}_{1},\dots,{\mathcal{A}}_{r}\right)$ be an r-tuple of classes of combinatorial structures. A specification for $\mathcal{A}$ is a collection of r equations with the i-th equation being of the form ${\mathcal{A}}_{i}={\varphi}_{i}\left({\mathcal{A}}_{1},\dots,{\mathcal{A}}_{r}\right)$, where ${\varphi}_{i}$ denotes a term built of the ${\mathcal{A}}_{j}$ using the constructions of disjoint union, cartesian product, sequence, set and cycle, as well as the initial (neutral and atomic) classes.
The needed formalities that will also be used in the sequel are given as follows:
Definition 0.2 ( [27]). If $\mathcal{A}$ is a combinatorial class, then ${\mathcal{A}}^{n}$ denotes the class of objects in $\mathcal{A}$ that have size (defined as number of atoms) n. Furthermore:

Objects of size 0 are called neutral objects or tags and a class consisting of a single neutral object ϵ is called a neutral class, which will be denoted by ε (ε_{1}, ε_{2},... to distinguish multiple neutral classes containing the objects ϵ_{1}, ϵ_{2}, ..., respectively).

Objects of size 1 are called atomic objects or atoms and a class consisting of a single atomic object is called an atomic class, which will be denoted by $\mathcal{Z}$ (${\mathcal{Z}}_{a}$, ${\mathcal{Z}}_{b}$, ... to distinguish the classes containing the atoms a, b, ..., respectively).

If ${\mathcal{A}}_{1},\dots,{\mathcal{A}}_{k}$ are combinatorial classes and ϵ_{1}, ..., ϵ_{k} are neutral objects, the combinatorial sum or disjoint union is defined as ${\mathcal{A}}_{1}+\dots+{\mathcal{A}}_{k}:=\left({\epsilon}_{1}\times {\mathcal{A}}_{1}\right)\cup \dots\cup \left({\epsilon}_{k}\times {\mathcal{A}}_{k}\right)$, where ∪ denotes set-theoretic union.

If $\mathcal{A}$ and $\mathcal{B}$ are combinatorial classes, the cartesian product is defined as $\mathcal{A}\times \mathcal{B}:=\left\{\left(\alpha ,\beta \right) \mid \alpha \in \mathcal{A}\ \mathsf{\text{and}}\ \beta \in \mathcal{B}\right\}$, where size((α, β)) = size(α) + size(β).
Note that the constructions of disjoint union, cartesian product, sequence, set and cycle are all admissible:
Definition 0.3 ( [27]). Let $\varphi$ be an m-ary construction that associates to any collection of classes ${\mathcal{B}}_{1},\dots,{\mathcal{B}}_{m}$ a new class $\mathcal{A}:=\varphi \left[{\mathcal{B}}_{1},\dots,{\mathcal{B}}_{m}\right]$. The construction $\varphi$ is admissible iff the counting sequence (a_{n}) of $\mathcal{A}$ only depends on the counting sequences (b_{1,n}), ..., (b_{m,n}) of ${\mathcal{B}}_{1},\dots,{\mathcal{B}}_{m}$, where the counting sequence of a combinatorial class $\mathcal{A}$ is the sequence of integers (a_{n})_{n≥0} for ${a}_{n}=\mathsf{\text{card}}\left({\mathcal{A}}^{n}\right)$.
The framework of (admissible) specifications obviously resembles that of context-free grammars (CFGs) known from formal language theory (we assume the reader has basic knowledge of the notions concerning context-free languages and grammars; an introduction can be found, for instance, in [28]). In order to translate a CFG into the framework of admissible constructions, it is sufficient to make each terminal symbol an atom and to assume each nonterminal A to represent a class $\mathcal{A}$ (the set of all words which can be derived from nonterminal A). However, for representing CFGs, only the admissible constructions disjoint union, cartesian product and sequence are needed: words are constructed as cartesian products of atoms, sentential forms as cartesian products of atoms and the classes assigned to the corresponding nonterminal symbols. For instance, a production rule A → aB translates into the symbolic equation $\mathcal{A}=a\times \mathcal{B}$. Different production rules with the same left-hand side give rise to the union of the corresponding cartesian products. Nevertheless, it should be noted that [25] also shows how to reduce specifications to standard form; the corresponding standard specifications constitute the basis of the recursive method for uniform random generation and extend the usual Chomsky normal form (CNF) for CFGs. Briefly, in standard specifications, all sums and products are binary, and the constructions of sequences, sets and cycles are replaced with other constructions (for details see [25]).
The prime advantage of standard specifications is that they translate directly into procedures for computing the sizes of all combinatorial subclasses of the considered class $\mathcal{C}$ of combinatorial objects. This means they can be used to count the number of structures of a given size that are generated from a given nonterminal symbol. Moreover, standard specifications immediately translate into procedures for generating one such structure uniformly at random. The corresponding procedures (for class size calculations and structure generations) are actually required for (uniform) random generation of words of a given CFG by means of unranking.
Simply speaking, the unranking of decomposable structures (such as RNA secondary structures, which can be uniquely decomposed into distinct structural components) works as follows: each structure s in the combinatorial class ${\mathcal{S}}^{n}$ of all feasible structures having size n is given a number (rank) $i\in \left\{0,\dots,\mathsf{\text{card}}\left({\mathcal{S}}^{n}\right)-1\right\}$, defined by a particular ranking method. Based on this ordering of the considered structure class ${\mathcal{S}}^{n}$, the corresponding unranking algorithm for a given input number $i\in \left\{0,\dots,\mathsf{\text{card}}\left({\mathcal{S}}^{n}\right)-1\right\}$ computes the single structure $s\in {\mathcal{S}}^{n}$ having number i in the ranking scheme defined for class ${\mathcal{S}}^{n}$.
Note that in this context of unranking particular elements from a considered structure class, the corresponding algorithms make heavy use of their decomposability, as the distinct structural components are unranked from the corresponding subclasses. In fact, the class sizes can be derived according to the following recursion:
$$\mathsf{\text{size}}\left(\mathcal{C},n\right):=\begin{cases}1 & \mathcal{C}\ \text{is neutral and}\ n=0,\\ 0 & \mathcal{C}\ \text{is neutral and}\ n\ne 0,\\ 1 & \mathcal{C}\ \text{is atomic and}\ n=1,\\ 0 & \mathcal{C}\ \text{is atomic and}\ n\ne 1,\\ \sum_{i=1}^{k}\mathsf{\text{size}}\left({\mathcal{A}}_{i},n\right) & \mathcal{C}={\mathcal{A}}_{1}+\dots+{\mathcal{A}}_{k},\\ \sum_{j=0}^{n}\mathsf{\text{size}}\left(\mathcal{A},j\right)\cdot \mathsf{\text{size}}\left(\mathcal{B},n-j\right) & \mathcal{C}=\mathcal{A}\times \mathcal{B}.\end{cases}$$
Note that when computing the sums for cartesian products, we can either consider the values for j in the sequential (also called lexicographic) order (1, 2, 3, ..., n) or in the so-called boustrophedon order $\left(1,n,2,n-1,\dots,\lceil \frac{n}{2}\rceil \right)$. In either case, given a fixed number of considered combinatorial (sub)classes (or corresponding nonterminal symbols), the precomputation of all class size tables up to size n requires $\mathcal{O}\left({n}^{2}\right)$ operations on coefficients. One random generation step then needs $\mathcal{O}\left({n}^{2}\right)$ arithmetic operations when using the sequential method and $\mathcal{O}\left(n\cdot \mathrm{log}\left(n\right)\right)$ operations when using the boustrophedon method (for details we refer to [25]). Obviously, when using uniform unranking procedures to construct the i-th structure of size n for a randomly drawn number i, any structure of size n is generated equiprobably. Consequently, in order to make sure that, for a given size n and a sample set of random numbers i, the corresponding structures are in accordance with an appropriate probability distribution (as for instance observed from real-life RNA data), it is mandatory to use a corresponding non-uniform unranking method or an alternative non-uniform random generation approach.
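The boustrophedon order of split points mentioned above can be generated with a few lines of Python (a small sketch; the function name is our own choice):

```python
def boustrophedon(n):
    # Interleave split points from both ends: 1, n, 2, n-1, ...
    order, lo, hi = [], 1, n
    while lo <= hi:
        order.append(lo)
        if hi != lo:
            order.append(hi)
        lo, hi = lo + 1, hi - 1
    return order

print(boustrophedon(6))  # [1, 6, 2, 5, 3, 4]
```

Intuitively, scanning from both ends bounds the work spent on each product by the smaller of the two component sizes, which is the key to the $\mathcal{O}(n \cdot \mathrm{log}(n))$ bound shown in [25].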
Non-Uniform Random Generation
Coming back to the random testing problem from software engineering, we observe that generating objects of a given class of input data according to a uniform distribution is sufficient for testing the correctness of particular algorithms. However, if one intends to gather information about the "real-life behaviour" of an algorithm (e.g. with respect to runtime or space requirements), we need to perform simulations with input data that are related as closely as possible to the corresponding application. This means that to obtain suitable test data, we need to specify a distribution on the considered class that is similar to the one observed in real life and draw objects at random according to this (non-uniform) distribution. Deriving such a "realistic" distribution on a given class of objects can easily be done by modeling the class by an appropriate stochastic context-free grammar (SCFG). Details will follow in the next section.
As regards RNA, it has been proven that both the combinatorial model (which is based on a uniform distribution such that all structures of a given size are equiprobable and which completely abstracts from the primary structure, see e.g. [29–31]) and the Bernoulli model (which is capable of incorporating information on the possible RNA sequences for a given secondary structure, see e.g. [32–34]) for RNA secondary structures are rather unrealistic. However, modeling these structures by an appropriate SCFG yields a more realistic RNA model, where the probability distribution on all structures is determined from a database of real-world RNA data (see e.g. [35, 36]).
Based on this observation, the problem of non-uniform random generation of combinatorial structures has recently been addressed in [20]. There, it is described how to obtain algorithms for the random generation of objects of a previously fixed size according to an arbitrary (non-uniform) distribution implied by a given SCFG. In principle, the construction scheme introduced in [20] extends the recursive method for (uniform) random generation [25] and adapts it to the unranking problem of [37]: the basic principle is that any (complex) combinatorial class can be decomposed into (or constructed from) simpler classes by using admissible constructions.
Essentially, in [20], a new admissible construction called weighting has been introduced in order to make non-uniform random generation possible. By weighting, we understand the generation of distinguishable copies of objects. Formally:
Definition 0.4. If $\mathcal{A}$ is a combinatorial class and λ is an integer, the weighting of $\mathcal{A}$ by λ is defined as the disjoint union of λ copies of $\mathcal{A}$, that is, $\lambda \mathcal{A}:=\mathcal{A}+\dots+\mathcal{A}$ (λ summands). We will call two objects from a combinatorial class copies of the same object iff they only differ in the tags added by weighting operations.
For example, if we weight the class $\mathcal{A}=\left\{a\right\}$ by two, we assume the result to be the set {a, a}; weighting $\mathcal{B}=\left\{b\right\}$ by three generates {b, b, b}. Thus, $2\mathcal{A}+3\mathcal{B}=\left\{a,a,b,b,b\right\}$ and within this class, a has relative frequency $\frac{2}{5}$, while b has relative frequency $\frac{3}{5}$. Hence, in this way it becomes possible to model non-uniformly distributed classes.
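The effect of weighting on relative frequencies can be checked directly; in the following sketch, the tags added by weighting are modeled as plain integers (this encoding is our own, purely for illustration):

```python
from fractions import Fraction

# 2A + 3B as a set of (tag, object) pairs: the tags added by the
# weighting operation make the copies of a and b distinguishable.
weighted = [(t, "a") for t in range(2)] + [(t, "b") for t in range(3)]

freq_a = Fraction(sum(1 for _, x in weighted if x == "a"), len(weighted))
freq_b = Fraction(sum(1 for _, x in weighted if x == "b"), len(weighted))
print(freq_a, freq_b)  # 2/5 3/5
```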
As weighting a class can be replaced by a disjoint union, $\mathsf{\text{size}}\left(\lambda \mathcal{A},n\right)=\lambda \cdot \mathsf{\text{size}}\left(\mathcal{A},n\right)$ holds and the complexity results from [37] also apply to weighted classes. Hence, the corresponding class size computations up to n need $\mathcal{O}\left({n}^{2}\right)$ time.
Stochastic Context-Free Grammars
As already mentioned, stochastic context-free grammars (SCFGs) are a powerful tool for modeling combinatorial classes and the essence of the non-uniform random sampling approach that will be worked out in this article. Therefore, we will now give the needed background information.
Basic Concepts
Briefly, SCFGs are an extension of traditional CFGs: usual CFGs are only capable of modeling the class of all generated structures and thus inevitably induce a uniform distribution on the objects, while SCFGs additionally produce a (non-uniform) probability distribution on the considered class of objects. In fact, an SCFG is derived by equipping the productions of a corresponding CFG with probabilities such that the induced distribution on the generated language models the distribution of the sample data as closely as possible.
The needed formalities are given as follows:
Definition 0.5 ( [38]). A weighted context-free grammar (WCFG) is a 5-tuple $\mathcal{G}=\left(I,T,R,S,W\right)$, where I (resp. T) is an alphabet (finite set) of intermediate (resp. terminal) symbols (I and T are disjoint), S ∈ I is a distinguished intermediate symbol called axiom, R ⊂ I × (I ∪ T)* is a finite set of production rules and W : R → ℝ^{+} is a mapping such that each rule f ∈ R is equipped with a weight w_{f} := W(f). If $\mathcal{G}$ is a WCFG, then $\mathcal{G}$ is a stochastic context-free grammar (SCFG) iff the following additional restrictions hold:

 1. For all f ∈ R, we have W(f) ∈ (0, 1], which means the weights are probabilities.

 2. The probabilities are chosen in such a way that for all A ∈ I, we have ${\sum}_{f\in R,Q\left(f\right)=A}{w}_{f}=1$, where Q(f) denotes the premise of the production f, i.e. the first component A of a production rule (A, α) ∈ R. In the sequel, we will write w_{f} : A → α instead of f = (A, α) ∈ R, w_{f} = W(f).
However, at this point, we decided not to recall the basic concepts regarding SCFGs in full, as they are not strictly necessary for the understanding of this article. The interested reader is referred to the corresponding section in [21]. For a more fundamental introduction to stochastic context-free languages, see for example [39]. In fact, the only information needed in the sequel is that if structures are modeled by a consistent SCFG, then the probability distribution on the production rules of the SCFG implies a probability distribution on the words of the generated language and thus on the modeled structures. To ensure that an SCFG is consistent, one can for example assign relative frequencies to the productions, which are computed by counting the production rules used in the leftmost derivations of a finite sample of words from the generated language. For unambiguous SCFGs, the relative frequencies can be counted efficiently, as for every word, there is only one leftmost derivation to consider.
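The mentioned training by relative frequencies can be sketched as follows; the sample derivations, given as lists of used rules, are made up purely for illustration:

```python
from collections import Counter, defaultdict

# Leftmost derivations of four sample words; each rule is a pair
# (left-hand side, right-hand side). The data is illustrative only.
derivations = [
    [("S", "(S)"), ("S", "")],   # derivation of the word "()"
    [("S", "")],                  # derivation of the empty word
    [("S", "")],
    [("S", "")],
]

counts = Counter(rule for d in derivations for rule in d)
totals = defaultdict(int)
for (lhs, _), c in counts.items():
    totals[lhs] += c

# relative frequency of each rule among all rules with the same premise
probs = {rule: c / totals[rule[0]] for rule, c in counts.items()}
print(probs[("S", "")], probs[("S", "(S)")])  # 0.8 0.2
```

By construction, the estimated probabilities of all rules with the same left-hand side sum to 1, as required by Definition 0.5.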
Modeling RNA Secondary Structure via SCFGs
Besides the popular planar graph representation of unknotted secondary structures, many other ways of formalizing RNA folding have been described in the literature. One well-established example is the so-called bar-bracket representation, where a secondary structure is modeled as a string over the alphabet Σ := {(, ), |}, with a bar | and a pair of corresponding brackets ( ) representing an unpaired nucleotide and two paired bases in the molecule, respectively (see, e.g. [30]). Obviously, both models abstract from primary structure, as they only consider the number of base pairs and unpaired bases and their positions. Moreover, there exists a one-to-one correspondence between both representations, as illustrated by the following example:
Example 0.1. The secondary structure shown in Figure 1 has an equivalent bar-bracket representation that can be decomposed into subwords corresponding to the basic structural motifs that are distinguished in state-of-the-art thermodynamic models.
Note that the reading order of secondary structures is from left to right, which is due to the chemical structure of the molecule.
Consequently, secondary structures without pseudoknots can be encoded as words of a context-free language and the class of all feasible structures can thus effectively be modeled via a corresponding CFG. Basically, this CFG can be constructed to describe a number of classical constraints (e.g. the presence of particular motifs in structures) and it can also express long-range interactions (e.g. base pairings). By extending it to a corresponding SCFG, we can also model the fact that specific motifs of RNA secondary structures are more likely to be folded at certain stages than others (and not all possible motifs are equiprobable at any folding stage).
In fact, it has been known for a long time that SCFGs can be used to model RNA secondary structures (see e.g. [40]). Additionally, SCFGs have already been used successfully for the prediction of RNA secondary structure [14, 15]. Moreover, they can be employed for identifying structural motifs as well as for deriving stochastic RNA models that are, with respect to the expected shapes, more realistic than other models [36]. Furthermore, note that an SCFG mirror of the famous Turner energy model has been used in [21] to perform the first analytical analysis of the free energy of RNA secondary structures; this SCFG marks a cornerstone between stochastic and physics-based approaches towards RNA structure prediction.
Random Generation With SCFGs
SCFGs can easily be used for the random generation of combinatorial objects according to the probability distribution induced by a sample set; the only problem is that they do not allow the user to fix the length of the generated structures. In particular, given an SCFG $\mathcal{G}$ and the corresponding language (combinatorial class) $\mathcal{L}\left(\mathcal{G}\right)$, a random word $w\in \mathcal{L}\left(\mathcal{G}\right)$ can be generated in the following way:

 1) Start with the sentential form given by the axiom S.

 2) As long as the current sentential form contains a nonterminal symbol, consider its leftmost nonterminal A and draw a random number r ∈ (0, 1].

 3) Substitute A by the right-hand side of the A-production determined by r and continue with step 2).
This means: consider all m ≥ 1 rules p_{1} : A → α_{1}, ..., p_{m} : A → α_{m} having left-hand side A, where according to the definition of SCFGs, ${\sum}_{i=1}^{m}{p}_{i}=1$ must hold. Then, find k ≥ 1 with ${\sum}_{i=1}^{k-1}{p}_{i}<r\le {\sum}_{i=1}^{k}{p}_{i}$, i.e. determine k ≥ 1 with $r\in \left({\sum}_{i=1}^{k-1}{p}_{i},{\sum}_{i=1}^{k}{p}_{i}\right]$. The production corresponding to the randomly drawn number r ∈ (0, 1] is then given by A → α_{k} and hence, in the currently considered sentential form, the nonterminal symbol A is substituted by α_{k}.
Note that the choice of the production made in 3) according to the previously drawn random number is appropriate, since it conforms to the probability distribution on the grammar rules.
Example 0.2. Consider the language generated by the SCFG with productions ¾ : S → ϵ and ¼ : S → (S). We start with the sentential form S, then consider the leftmost nonterminal symbol, which is given by S, and draw a random number r ∈ (0, 1]. If 0 < r ≤ ¾, the production determined by r is S → ϵ and thus, we get the empty word and are finished. Otherwise, ¾ < r ≤ ¾ + ¼ = 1, which means we have to consider S → (S) for the substitution in step 3) and thus obtain the sentential form (S). Afterwards, we must repeat the process, as there is still one nonterminal symbol left.
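A minimal sampler for this example grammar might look as follows (a sketch under our own naming conventions; we use r ∈ [0, 1) from random.random() instead of (0, 1], which only changes events of probability zero):

```python
import random

# 3/4: S -> eps, 1/4: S -> (S); rules stored as (probability, rhs).
RULES = {"S": [(0.75, ""), (0.25, "(S)")]}

def sample_word(rng):
    sentential = "S"
    while any(sym in RULES for sym in sentential):
        # position of the leftmost nonterminal symbol
        i = next(j for j, s in enumerate(sentential) if s in RULES)
        r = rng.random()
        acc = 0.0
        for p, rhs in RULES[sentential[i]]:
            acc += p            # pick the first k with r <= p_1 + ... + p_k
            if r <= acc:
                break
        sentential = sentential[:i] + rhs + sentential[i + 1:]
    return sentential

rng = random.Random(2024)
words = [sample_word(rng) for _ in range(10000)]
frac_empty = words.count("") / len(words)
print(frac_empty)  # close to 3/4, the probability of S -> eps
```

Note that the size of the generated word varies from run to run, which is exactly the problem discussed next.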
Unfortunately, there is one major problem that comes with this approach for the (non-uniform) random generation of combinatorial objects: the underlying (consistent) SCFG $\mathcal{G}$ implies a probability distribution on the whole language $\mathcal{L}\left(\mathcal{G}\right)$, such that we generate a word of arbitrary size. In order to fix the size, we can proceed along the following lines:
 1) We translate the grammar $\mathcal{G}$ into a new framework which allows us to consider fixed sizes for the random generation, such that

 2) the distribution implied on $\mathcal{L}\left(\mathcal{G}\right)$ conditioned on any fixed size n is kept within the new framework.
A well-known approach which allows for 1) is connected to the concept of admissible constructions used to describe a decomposable combinatorial class (see above). As the operations (like cartesian products, unions, and so on) used to construct the combinatorial objects are also used to define an order on them, it becomes possible to identify the i-th object of a given size, and the problem of generating objects uniformly at random reduces to the problem of unranking, that is, the problem of constructing the object of order (rank) i, for i a random number (see e.g. [41]).
Remark. Some might think that with an appropriate SCFG (modeling a given class of objects) at hand, it is not really necessary to use an unranking method that implies cumbersome formalities such as admissible constructions and decomposable classes if we want to generate random objects of a fixed size n. As a matter of principle, they are right: we could also use a conditional sampling method. If we need to generate a word of size n from nonterminal symbol A, where there are m ≥ 1 rules f_{i} = A → α_{i}, 1 ≤ i ≤ m, having left-hand side A, then we just need to choose the next production f_{i} according to

$$\frac{\mathsf{\text{Prob}}\left(A\to {\alpha}_{i}{\Rightarrow}^{*}x \mid \mathsf{\text{size}}\left(x\right)=n\right)}{\mathsf{\text{Prob}}\left(A{\Rightarrow}^{*}x \mid \mathsf{\text{size}}\left(x\right)=n\right)},$$

which is the posterior probability that we used production rule f_{i} under the condition that a word of size n is generated.
Similarly, if the production rule is of the type A → BC (assuming the grammar is in Chomsky normal form (CNF), which does not pose a problem, as an unambiguous SCFG can be efficiently transformed into CNF [39]), we can choose a way to split size n into sizes j and n − j for the lengths generated from nonterminal symbols B and C. This requires precomputing n length-dependent probabilities (i.e. all probabilities for generating a word of any length up to n) for each nonterminal symbol, which might seem similar (with respect to complexity) to precomputing all class sizes up to n for all considered combinatorial (sub)classes, as needs to be done for unranking. However, there is a striking difference between the two approaches: while conditional sampling makes heavy use of rather small floating point values, with all the well-known problems and discomforting details like underflows or resorting to logarithms associated with them, our unranking approach builds on integer values only, which we consider a major advantage. There is another striking difference: length-dependent probabilities (which, by the way, yield a so-called length-dependent SCFG (LSCFG), see [42], and have already been used in [43]) require a very rich training set. In fact, if the RNA data set used for determining the distribution induced by the grammar is not rich enough, then the corresponding stochastic RNA model is underestimated and its quality decreases. This is especially a problem when considering comprehensive CFGs that distinguish between many different structural motifs in order to get a realistic picture of the molecules' behaviour; such a grammar should, however, be preferred over simple lightweight grammars as the basis for a non-uniform random generation method. Nevertheless, this problem does not surface when sticking to conventional probabilities and the corresponding traditional SCFG model.
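The floating point issue alluded to above is easy to demonstrate: products of many small rule probabilities underflow double precision, whereas the exact integer counts used by unranking do not (the concrete numbers below are arbitrary):

```python
prob = 1.0
for _ in range(80):
    prob *= 1e-5        # 10^-400 is below the smallest positive double

count = 1
for _ in range(80):
    count *= 10 ** 5    # Python integers have arbitrary precision

print(prob, count == 10 ** 400)  # 0.0 True
```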
Actually, since we consider a huge CFG where all possible structural motifs are created by distinct productions, we generally obtain realistic probability distributions and RNA models (see [21]).
Finally, note that we could of course make use of random sampling strategies originally designed to sample structures for a given sequence in order to generate a random secondary structure only. However, such algorithms typically need linear time to sample a single base pair (see, e.g., [6]), such that the time to sample a complete structure is quadratic in its length. This causes no problems for the original application of such algorithms, since the sequence-dependent preprocessing which is part of their overall procedure takes at least quadratic time and is thus the dominating part. Here our approach is of advantage (replacing a factor n by log(n)), and since our preprocessing only depends on the size of the structure to be generated, it is performed once and stored to disk for later reuse. Last but not least, we are not sure whether the different existing approaches just mentioned could easily be made as fast as ours by simple changes only.
The bottom line is that hooking up to the unranking of combinatorial classes offers three significant benefits compared to conditional sampling, namely a fast sampling strategy, the usage of integers instead of floating point values, and a greater independence of the richness of the training data (compared to length-dependent models). For this reason, we consider our unranking algorithm a valuable contribution, even though it requires a more cumbersome framework.
Unranking of Combinatorial Objects
The problem of unranking can easily be solved along the composition of the objects at hand, i.e. the operations used for their construction, once we know the number of possible choices for each substructure. Assume for example we want to unrank objects from a class $\mathcal{C}=\mathcal{A}+\mathcal{B}$. We will assume all elements of $\mathcal{A}$ to be of smaller order than those of $\mathcal{B}$ (this way we use the construction of the class to imply an ordering). Finding the i-th element of $\mathcal{C}$, i.e. unranking class $\mathcal{C}$, now becomes possible by deciding whether $i<\mathsf{\text{card}}\left(\mathcal{A}\right)$. In this case, we recursively call the unranking procedure for $\mathcal{A}$. Otherwise (i.e. if $i\ge \mathsf{\text{card}}\left(\mathcal{A}\right)$), we consider $\mathcal{B}$, searching for its $\left(i-\mathsf{\text{card}}\left(\mathcal{A}\right)\right)$-th element.
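The case distinction for a disjoint union described above can be sketched as follows (the two classes are represented as explicitly ordered lists purely for illustration; all names are ours):

```python
# C = A + B with all elements of A ordered before those of B.
def unrank_union(i, card_a, unrank_a, unrank_b):
    if i < card_a:               # rank i falls into A
        return unrank_a(i)
    return unrank_b(i - card_a)  # otherwise shift the rank into B

A = ["a0", "a1", "a2"]
B = ["b0", "b1"]
print(unrank_union(3, len(A), A.__getitem__, B.__getitem__))  # b0
```

In the actual algorithms, the two unranking callbacks are themselves recursive and card(A) is read off the precomputed class size tables.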
Formally, we first need to specify an order on all objects of the considered combinatorial class that have the same size. This can be done in a recursive way according to the admissible specification of the class:
Definition 0.6 ( [37]). Neutral and atomic classes contain only one element, such that there is only one possible ordering. Furthermore, let ${<}_{{C}^{n}}$ denote the ordering within the combinatorial class ${\mathcal{C}}^{n}$, then

If $\mathcal{C}={\mathcal{A}}_{1}+\dots+{\mathcal{A}}_{k}$ and $\gamma ,{\gamma}^{\prime}\in {\mathcal{C}}^{n}$, then $\gamma {<}_{{\mathcal{C}}^{n}}{\gamma}^{\prime}$ iff

$$\left[\gamma \in {\left({\mathcal{A}}_{i}\right)}^{n}\ \text{and}\ {\gamma}^{\prime}\in {\left({\mathcal{A}}_{j}\right)}^{n}\ \text{and}\ i<j\right]\ \text{or}\ \left[\gamma ,{\gamma}^{\prime}\in {\left({\mathcal{A}}_{i}\right)}^{n}\ \text{and}\ \gamma {<}_{{\left({\mathcal{A}}_{i}\right)}^{n}}{\gamma}^{\prime}\right].$$

If $\mathcal{C}=\mathcal{A}\times \mathcal{B}$ and $\gamma =\left(\alpha ,\beta \right),{\gamma}^{\prime}=\left({\alpha}^{\prime},{\beta}^{\prime}\right)\in {\mathcal{C}}^{n}$, then $\gamma {<}_{{\mathcal{C}}^{n}}{\gamma}^{\prime}$ iff

$$\left[\mathsf{\text{size}}\left(\alpha \right)<\mathsf{\text{size}}\left({\alpha}^{\prime}\right)\right]\ \text{or}\ \left[j=\mathsf{\text{size}}\left(\alpha \right)=\mathsf{\text{size}}\left({\alpha}^{\prime}\right)\ \text{and}\ \alpha {<}_{{\mathcal{A}}^{j}}{\alpha}^{\prime}\right]\ \text{or}\ \left[\alpha ={\alpha}^{\prime}\ \text{and}\ \beta {<}_{{\mathcal{B}}^{n-j}}{\beta}^{\prime}\right]$$

when considering the lexicographic order (1, 2, 3, ..., n), which is induced by the specification ${\mathcal{C}}^{n}={\mathcal{A}}^{0}\times {\mathcal{B}}^{n}+{\mathcal{A}}^{1}\times {\mathcal{B}}^{n-1}+{\mathcal{A}}^{2}\times {\mathcal{B}}^{n-2}+\dots+{\mathcal{A}}^{n}\times {\mathcal{B}}^{0}$.

If $\mathcal{C}=\mathcal{A}\times \mathcal{B}$ and $\gamma =\left(\alpha ,\beta \right),{\gamma}^{\prime}=\left({\alpha}^{\prime},{\beta}^{\prime}\right)\in {\mathcal{C}}^{n}$, then $\gamma {<}_{{\mathcal{C}}^{n}}{\gamma}^{\prime}$ iff

$$\begin{array}{c}\left[\mathrm{min}\left(\mathsf{\text{size}}\left(\alpha \right),\mathsf{\text{size}}\left(\beta \right)\right)<\mathrm{min}\left(\mathsf{\text{size}}\left({\alpha}^{\prime}\right),\mathsf{\text{size}}\left({\beta}^{\prime}\right)\right)\right]\ \text{or}\\ \left[\mathrm{min}\left(\mathsf{\text{size}}\left(\alpha \right),\mathsf{\text{size}}\left(\beta \right)\right)=\mathrm{min}\left(\mathsf{\text{size}}\left({\alpha}^{\prime}\right),\mathsf{\text{size}}\left({\beta}^{\prime}\right)\right)\ \text{and}\ \mathsf{\text{size}}\left(\alpha \right)<\mathsf{\text{size}}\left({\alpha}^{\prime}\right)\right]\ \text{or}\\ \left[j=\mathsf{\text{size}}\left(\alpha \right)=\mathsf{\text{size}}\left({\alpha}^{\prime}\right)\ \text{and}\ \alpha {<}_{{\mathcal{A}}^{j}}{\alpha}^{\prime}\right]\ \text{or}\ \left[\alpha ={\alpha}^{\prime}\ \text{and}\ \beta {<}_{{\mathcal{B}}^{n-j}}{\gamma}^{\prime}\right]\end{array}$$

when considering the boustrophedon order $\left(1,n,2,n-1,\dots,\lceil \frac{n}{2}\rceil \right)$, induced by the specification ${\mathcal{C}}^{n}={\mathcal{A}}^{0}\times {\mathcal{B}}^{n}+{\mathcal{A}}^{n}\times {\mathcal{B}}^{0}+{\mathcal{A}}^{1}\times {\mathcal{B}}^{n-1}+{\mathcal{A}}^{n-1}\times {\mathcal{B}}^{1}+\dots$
Considering ${<}_{{C}^{n}}$, the actual unranking algorithms are quite straightforward. Therefore, they will not be presented here and we refer to [20, 44] for details.
Recall that in [20], the basic approach towards non-uniform random generation is the weighting of combinatorial classes, which makes it possible for the classes to be non-uniformly distributed. If these combinatorial classes are to correspond to a given SCFG, we face the problem that maximum likelihood (ML) training introduces rational weights for the production rules, whereas weighting as an admissible construction requires integer arguments.
When translating rational probabilities into integral weights, we have to ensure that the relative weight of each (unambiguously) generated word remains unchanged. This can be achieved by scaling all productions by the same factor (a common denominator of all probabilities), while ensuring that derivations of words of the same size have equal length (guaranteed by using grammars in CNF). A much more elegant way, however, is to scale each production according to its contribution to the length of the generated word, that is, a production lengthening the word by k is scaled by c^{k}. Since we consider CFGs, the lengthening of a production of the form A → α is given by |α| − 1. Note that this rule leaves productions whose right-hand side has length 1 unweighted, so we have to ensure that all such productions already have integral weights. Furthermore, ϵ-productions need special treatment. We do not discuss the full details here and conclude by noting that the reweighting normal form (RNF) takes care of all these issues:
Definition 0.7 ( [20]). If $\mathcal{G}=\left(I,T,R,S,W\right)$ is a WCFG, $\mathcal{G}$ is said to be in reweighting normal form (RNF) iff
 1. $\mathcal{G}$ is loop-free and ϵ-free.
 2. For all A → α ∈ R with A = S, we have |α| ≤ 1.
 3. For all A → α ∈ R with A ≠ S, we have |α| > 1 or W(A → α) ∈ ℕ.
 4. For all A ∈ I there exists α ∈ (I ∪ T)* such that A → α ∈ R.
Note that the last condition (that every intermediate symbol occurs as the premise of at least one production) is not required for reweighting, but is necessary for translating a grammar into an admissible specification.
Definition 0.8 ( [20]). A WCFG $\mathcal{G}$ is called loop-free iff there exists no nonempty derivation A ⇒^{+} A for A ∈ I. It is called ϵ-free iff there exists no (A, ϵ) ∈ R with A ≠ S and there exists no (A, α_{1}Sα_{2}) ∈ R, where ϵ denotes the empty word.
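For an ϵ-free grammar, a derivation A ⇒^{+} A can only run through chain productions A → B with B ∈ I (no symbol can be erased, so every step of such a derivation must preserve length 1). Loop-freeness therefore reduces to cycle detection in the chain-production graph, which can be sketched as follows (the rule representation as (lhs, rhs) pairs is our assumption):

```python
def is_loop_free(intermediates, rules):
    """Check loop-freeness of an epsilon-free grammar.
    intermediates: set of intermediate symbols I;
    rules: iterable of (lhs, alpha) with alpha a tuple of symbols.
    Loops can only use chain productions A -> B with B in I, so we
    search the chain-production graph for a cycle via DFS."""
    graph = {a: set() for a in intermediates}
    for lhs, alpha in rules:
        if len(alpha) == 1 and alpha[0] in graph:
            graph[lhs].add(alpha[0])
    visiting, done = set(), set()

    def acyclic(v):
        visiting.add(v)
        for w in graph[v]:
            if w in visiting or (w not in done and not acyclic(w)):
                return False  # back edge found: a cycle exists
        visiting.discard(v)
        done.add(v)
        return True

    return all(a in done or acyclic(a) for a in graph)
```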
If $\mathcal{G}$ and ${\mathcal{G}}^{\prime}$ are WCFGs, then $\mathcal{G}$ and ${\mathcal{G}}^{\prime}$ are said to be word-equivalent iff $\mathcal{L}\left(\mathcal{G}\right)=\mathcal{L}\left({\mathcal{G}}^{\prime}\right)$ and for each word $w\in \mathcal{L}\left(\mathcal{G}\right)$, we have W(w) = W′(w).
In [20], it is shown how to transform an arbitrary WCFG into a word-equivalent, loop-free and ϵ-free grammar, that grammar into one in RNF, and the latter into the corresponding admissible specification. Formally:
Theorem 0.1 ( [39]). If $\mathcal{G}$ is an SCFG, there exists an SCFG ${\mathcal{G}}^{\prime}$ in Chomsky normal form (CNF) that is word-equivalent to $\mathcal{G}$, and ${\mathcal{G}}^{\prime}$ can be effectively constructed from $\mathcal{G}$.
The construction given in [39] assumes that $\mathcal{G}$ is ϵ-free. It can, however, be extended to non-ϵ-free grammars by adding an additional step after the intermediate grammar has been created (see e.g. [20]). Furthermore, it should be noted that an unambiguous grammar is necessarily loop-free.
Theorem 0.2 ( [20]). If $\mathcal{G}$ is a loop-free, ϵ-free WCFG, there exists a WCFG ${\mathcal{G}}^{\prime}$ in RNF that is word-equivalent to $\mathcal{G}$, and ${\mathcal{G}}^{\prime}$ can be effectively constructed from $\mathcal{G}$.
Altogether, starting with an arbitrary unambiguous SCFG ${\mathcal{G}}_{0}$ that models the class of objects to be randomly generated, we have to proceed along the following lines:
 1. Transform ${\mathcal{G}}_{0}$ into a corresponding ϵ-free and loop-free SCFG ${\mathcal{G}}_{1}$.
 2. Transform ${\mathcal{G}}_{1}$ into a grammar ${\mathcal{G}}_{2}$ in RNF (where all production weights are rational).
 3. Reweight the production rules of ${\mathcal{G}}_{2}$ (such that all production weights are integral), yielding the reweighted WCFG ${\mathcal{G}}_{3}$.
 4. Transform ${\mathcal{G}}_{3}$ (with integral weights) into the corresponding admissible specification.
This specification (with weighted classes) can be translated directly into a recursion for the counting function size of all involved combinatorial (sub)classes (where class sizes are weighted), yielding the desired weighted unranking algorithm for generating random elements of $\mathcal{L}\left({\mathcal{G}}_{0}\right)$.
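The reweighting step (step 3 above) can be sketched as follows, assuming a grammar already in RNF whose rules are given as (lhs, rhs, weight) triples with Fraction weights (this representation is our assumption, not the one used in [20]):

```python
from fractions import Fraction
from math import lcm

def reweight(rules):
    """Scale each production A -> alpha that lengthens the word by
    k = |alpha| - 1 > 0 by c**k, where c is the least common multiple of
    the denominators of all lengthening productions. Productions with
    |alpha| <= 1 must already carry integral weights (RNF, Definition 0.7).
    Returns the integrally weighted rules together with the scale c."""
    dens = [w.denominator for (_, alpha, w) in rules if len(alpha) > 1]
    c = lcm(*dens) if dens else 1
    reweighted = []
    for lhs, alpha, w in rules:
        k = len(alpha) - 1
        if k > 0:
            w = w * c ** k  # every denominator divides c, so this is integral
        if w.denominator != 1:
            raise ValueError(f"non-integral weight for {lhs} -> {alpha}")
        reweighted.append((lhs, alpha, int(w)))
    return reweighted, c
```

Since the lengthening contributions along any derivation of a word of size n sum to the same total, every word's weight is multiplied by the same power of c, so relative weights, and hence the induced distribution, are preserved.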
A small example that shows how to proceed from SCFG to reweighted normal form and the corresponding weighted combinatorial classes which allow for nonuniform generation by means of unranking is discussed in the Appendix.