BiC2PAM: constraint-guided biclustering for biological data analysis with domain knowledge
 Rui Henriques^{1} and
 Sara C. Madeira^{1}
DOI: 10.1186/s13015-016-0085-5
© The Author(s) 2016
Received: 2 February 2016
Accepted: 16 August 2016
Published: 14 September 2016
Abstract
Background
Biclustering has been widely used in biological data analysis, enabling the discovery of putative functional modules from omic and network data. Despite the recognized importance of incorporating domain knowledge to guide biclustering and guarantee a focus on relevant and nontrivial biclusters, this possibility has not yet been comprehensively addressed. The reason is that most existing algorithms can only deliver suboptimal solutions under restrictive assumptions on the structure, coherency and quality of biclustering solutions, which prevents the upfront satisfaction of knowledge-driven constraints. In recent years, a clearer understanding of the synergies between pattern mining and biclustering gave rise to a new class of algorithms, termed pattern-based biclustering algorithms. These algorithms, able to efficiently discover flexible biclustering solutions with optimality guarantees, are thus good candidates for knowledge incorporation. In this context, this work aims to bridge the current lack of solid views on the use of background knowledge to guide (pattern-based) biclustering tasks.
Methods
This work extends (pattern-based) biclustering algorithms to guarantee the satisfiability of constraints derived from background knowledge and to effectively explore efficiency gains from their incorporation. In this context, we first show the relevance of constraints with succinct, (anti-)monotone and convertible properties for the analysis of expression data and biological networks. We further show how pattern-based biclustering algorithms can be adapted to effectively prune the search space in the presence of such constraints, as well as be guided in the presence of biological annotations. Relying on these contributions, we propose BiClustering with Constraints using PAttern Mining (BiC2PAM), an extension of the BicPAM and BicNET biclustering algorithms.
Results
Experimental results on biological data demonstrate the importance of incorporating knowledge within biclustering to foster efficiency and enable the discovery of nontrivial biclusters with heightened biological relevance.
Conclusions
This work provides the first comprehensive view and sound algorithm for biclustering biological data with constraints derived from user expectations, knowledge repositories and/or literature.
Introduction
Biological data are characterized by the presence of local patterns, whose discovery has been widely studied and motivated in the context of biclustering [1, 2]. In particular, the relevance of biclustering has been largely shown in the analysis of gene expression data (to discover transcriptional modules described by subsets of genes correlated in subsets of samples [2]) and biological networks (to unravel meaningfully dense regions from weighted adjacency matrices derived from interaction data [3]). A key question in the field of biclustering is how to benefit from the increasingly available domain knowledge. Initial attempts to incorporate background knowledge from user expectations [4–6] and knowledge-based repositories [7–10] within biclustering showed its importance to explore efficiency gains and guarantee relevant solutions. However, these attempts only support very specific forms of knowledge and cannot be extended to flexibly constrain the desirable properties of outputted biclusters. Furthermore, due to the complexity of the biclustering task^{1}, most of the existing algorithms: (1) are based on greedy or stochastic approaches, producing suboptimal solutions; and (2) usually place restrictions on the allowed structure, coherency and quality of biclusters, compromising the flexibility of the outputs [2, 11]. In this context, these biclustering approaches cannot be extended to incorporate knowledge-driven constraints since their restrictions may a priori contradict the inputted constraints.
Recent attempts to perform biclustering based on enhanced pattern mining searches [8, 12, 13], termed pattern-based biclustering, showed the unprecedented possibility to efficiently discover arbitrarily positioned biclusters with parameterizable size, coherency and quality [2, 14]. In this context, two valuable synergies can be identified between pattern-based biclustering and knowledge incorporation. First, the optimality and flexibility of pattern-based biclustering solutions provide an adequate basis upon which knowledge-driven constraints can be incorporated. Pattern-based biclustering tackles the restrictions of peer algorithms, being an adequate candidate to flexibly constrain the desirable properties of the target solution space. Second, the effective use of domain knowledge to guide pattern mining searches has been largely studied in the context of domain-driven pattern mining [15, 16].
Despite these synergies, two major problems persist. First, there is a lack of understanding on whether domain-driven pattern mining and biclustering can be consistently integrated. In particular, there is no solid ground on how to map the commonly available background knowledge into constraints that guide the biclustering task. Second, pattern-based biclustering algorithms depend on a specific variant of pattern mining, referred to as full-pattern mining, which has been scarcely studied in the context of domain-driven pattern mining. In fact, although new full-pattern mining searches have been recently proposed to guarantee the scalability of the biclustering task over large and dense data [17, 18], there are not yet contributions on how these searches can be adapted to incorporate background knowledge.

This work addresses these problems through the following contributions:

integrative view of domain-driven pattern mining and (pattern-based) biclustering. The consistency of this view is shown for patterns given by frequent itemsets, association rules and sequences;

principles for biclustering tabular data in the presence of an arbitrary number of annotations per observation (derived from knowledge repositories and literature);

list of meaningful constraints with succinct, (anti-)monotone and convertible properties for biological data contexts with a focus on gene expression and network data;

principles to specify, process and incorporate different types of constraints;

extension of full-pattern miners based on pattern-growth searches to optimally explore efficiency gains from constraints with succinct, (anti-)monotone and convertible properties. In particular we show:
In this context, we propose BiClustering with Constraints using PAttern Mining (BiC2PAM), an algorithm that integrates recent breakthroughs on pattern-based biclustering [3, 14, 19, 20] and extends them to effectively incorporate constraints and annotations from domain knowledge.
Experimental results on synthetic and real data show the importance of incorporating background knowledge within pattern-based biclustering to seize large efficiency gains by adequately pruning the search space and to guarantee nontrivial and (biologically) relevant solutions.
This paper is structured as follows. First, we provide background on domain-driven pattern mining for pattern-based biclustering. Second, key contributions and limitations from related work are surveyed. Third, we list meaningful constraints in gene expression data and biological networks, and describe an algorithmic basis (BiC2PAM) for their incorporation. BiC2PAM is further extended to attain efficiency gains from constraints with nice properties. Fourth, we provide initial empirical evidence of BiC2PAM’s efficiency and ability to unravel nontrivial yet biologically significant biclusters. Finally, concluding remarks and major implications are synthesized.
Background
Biclustering, full-pattern mining and pattern-based biclustering
Definition 1
Given a real-valued matrix A with n rows X = \(\{x_1,\ldots,x_n\}\) and m columns Y = \(\{y_1,\ldots,y_m\}\), and elements \(a_{ij}\) relating row \(x_i\) and column \(y_j\), the biclustering task aims to identify a set of biclusters \(\{B_1,\ldots,B_p\}\), where each bicluster \(B_k\) = \((I_k,J_k)\) is defined by a subset of rows \(I_k\subset X\) and columns \(J_k\subset Y\) satisfying specific criteria of homogeneity and statistical significance.
The homogeneity criteria determine the structure, coherency and quality of biclustering solutions, while the statistical significance of a bicluster determines whether its probability of occurrence deviates from expectations. The homogeneity of a biclustering model is commonly guaranteed through a merit function. Following Madeira’s taxonomy [2], existing biclustering algorithms can be grouped according to their homogeneity criteria (defined by the underlying merit function) and search paradigm (determining how the merit function is applied). The structure of a biclustering solution is essentially defined by the number, size and positioning of biclusters. Flexible structures are characterized by an arbitrarily high number of (possibly overlapping) biclusters. The coherency of a bicluster is defined by the observed correlation of values (coherency assumption) and by the allowed deviation from expectations (coherency strength). A bicluster can have coherency of values across its rows, columns or overall elements, where the values typically follow constant, additive, symmetric and order-preserving assumptions [2]. Finally, the quality of a bicluster is defined by the type and amount of accommodated noise. Definitions 2 and 3 formalize these concepts, while Fig. 2 shows a set of biclusters with different coherencies in a symbolic dataset.
Definition 2
Let the elements in a bicluster \(a_{ij}\in (I,J)\) have coherency across rows given by \(a_{ij}=k_j+\gamma _i+\eta _{ij}\), where \(k_j\) is the expected value for column j, \(\gamma _i\) is the adjustment for row i, and \(\eta _{ij}\) is the noise factor (affecting the quality of the bicluster). Let \(\bar{A}\) be the amplitude of values in a matrix A. Given a matrix A, the coherency strength is a real value \(\delta \in [0,\bar{A}]\), such that \(a_{ij}=k_j+\gamma _i+\eta _{ij}\) where \(\eta _{ij}\in [-\delta /2,\delta /2]\).
Definition 3
The \(\gamma\) factors define the coherency assumption: constant when \(\gamma =0\), and additive otherwise. Symmetries can be accommodated on rows, \(a_{ij}\times c_i\) where \(c_i\in \{1,-1\}\). The order-preserving assumption is verified when the values of rows induce the same linear ordering across columns.
Definition 4
Given a bicluster B = (I, J), the bicluster pattern \(\varphi _{B}\) is given by the sequence of expected values (\(k_j\)) according to a permutation of columns in the absence of adjustments (\(\gamma _i=0\)) and noise (\(\eta _{ij}=0\)): \(\{k_j \mid y_j\in J\}\), while its support is given by the number of rows satisfying the pattern: \(|I|\).
Consider the additive bicluster (I,J) = (\(\{x_1,x_2\}\),\(\{y_1,y_2,y_3\}\)) in \(\mathbb {N}_0^+\) with coherency across rows. Assuming the values of \(x_1\) over J are \(\{1,3,2\}\) and the values of \(x_2\) over J are \(\{3,5,4\}\), this bicluster can be described by \(a_{ij}=k_j+\gamma _i\) with the pattern \(\varphi\) = {\(k_1\) = 0, \(k_2\) = 2, \(k_3\) = 1}, supported by two rows with additive factors \(\gamma _1\) = 1 and \(\gamma _2\) = 3.
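The additive model of Definitions 2 to 4 can be checked mechanically. The following is a minimal sketch, not part of BiC2PAM: the function name `additive_fit`, the choice of anchoring the pattern at the minimum of the first row, and the illustrative rows are assumptions made here for exposition.

```python
from typing import List, Optional, Tuple


def additive_fit(rows: List[List[float]], delta: float = 0.0
                 ) -> Optional[Tuple[List[float], List[float]]]:
    """Try to describe rows as a_ij = k_j + gamma_i + eta_ij with |eta_ij| <= delta / 2.

    Assumption of this sketch: the pattern k is read from the first row and
    anchored so that min_j k_j = 0. Returns (pattern, gammas) or None.
    """
    reference = rows[0]
    shift = min(reference)
    pattern = [value - shift for value in reference]
    gammas = []
    for row in rows:
        offsets = [a - k for a, k in zip(row, pattern)]
        gamma = sum(offsets) / len(offsets)          # row adjustment gamma_i
        if any(abs(o - gamma) > delta / 2 for o in offsets):
            return None                              # noise exceeds the coherency strength
        gammas.append(gamma)
    return pattern, gammas


# Illustrative rows over J = {y1, y2, y3}
fit = additive_fit([[1, 3, 2], [3, 5, 4]])
print(fit)
```

With a coherency strength of zero the two rows are accepted only if they differ by a constant shift; raising `delta` tolerates bounded noise around the same pattern.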
Despite the relevance of discovering optimal and flexible biclustering solutions to effectively incorporate knowledge-driven constraints, most of the existing biclustering algorithms are based on greedy or stochastic searches, producing suboptimal solutions, and place restrictions (such as simplistic forms of coherency, fixed number of biclusters, nonoverlapping structures) that prevent the flexibility of the outputs [2, 14].
Pattern-based biclustering. In recent years, a clearer understanding of the synergies between pattern mining and biclustering gave rise to a new class of algorithms, referred to as pattern-based biclustering, aiming to address these limitations (no guarantees of optimality and flexibility). Pattern-based biclustering is inherently prepared to efficiently find exhaustive solutions of biclusters with the unprecedented possibility to customize their structure, coherency and quality. Such behavior explains why these algorithms are receiving increasing attention for biological data analysis [3, 8, 12, 14, 19–21]. The major potentialities include: (1) efficient searches with optimality guarantees; (2) biclusters with flexible coherency strength and assumption [14, 19, 20]; (3) robustness to noise, missing values and discretization problems [14] by introducing the possibility to assign or impute multiple symbols to a single data element; (4) non-fixed number of biclusters arbitrarily positioned [12, 21]; (5) applicability to network data and sparse data matrices [3, 22]; among others.
At its core, pattern-based biclustering relies on the (iterative application of the) full-pattern mining task [14]. A full-pattern defines a region from the input data space, thus enclosing not only the underlying pattern (itemset, association rule, sequential pattern or graph with frequency and length above certain thresholds), but also its supporting rows and columns.
Definition 5
Let \(\mathcal {L}\) be a finite set of items, and let a pattern P be a composition of items, either an itemset (\(P\subseteq \mathcal {L}\)), association rule (\(P\,{:}\;P_1\rightarrow P_2\) where \(P_1\subseteq \mathcal {L}\wedge P_2\subseteq \mathcal {L}\)) or sequence (P = \(P_1\ldots P_n\) where \(P_i\subseteq \mathcal {L}\)). Let a transactional database D be a finite set of rows/transactions, each defining a composition of items. A transaction is commonly given by an itemset or sequence. Given D, let the coverage \(\Phi _{P}\) of pattern P be the set of rows in D in which P is satisfied/occurs, and its support \(sup_P\) be the coverage size, \(|\Phi _{P}|\). Let the length of a pattern P, \(|P|\), be the number of items.
Definition 6
Given a matrix A, let D be a transactional database derived from A: either the concatenation of items with their column index (transactions given by itemsets) or the ordering of column indexes according to the values per row (transactions given by sequences). A full-pattern is a tuple \((P,\Phi _{P},\Psi _P,\Upsilon _P)\), where P is the pattern in D, \(\Phi _{P}\subset X\) is its coverage (rows satisfying P), \(\Psi _P\subset Y\) is the set of indexes (columns), and \(\Upsilon _P\) is the original pattern in A (the corresponding itemset, rule or sequence prior to the concatenation or ordering of column indexes).
Definition 7
Given a matrix A, the mapped transactional database D, and minimum support \(\theta _1\) and pattern length \(\theta _2\) thresholds, full-pattern mining consists of computing: \(\{(P,\Phi _{P},\Psi _P,\Upsilon _P) \mid sup_P \ge \theta _1\wedge |P|\ge \theta _2\}\).
Frequent itemsets can be discovered to compose constant, additive and multiplicative models [14]; sequential patterns are used to learn order-preserving models [19]; and rules can be composed to learn plaid models or tolerate parameterizable levels of localized noise [20]. Figure 3 further illustrates the paradigmatic cases where full-pattern mining is applied to discover constant and order-preserving biclusters.
In this context, the set of maximal biclusters (biclusters not contained in larger biclusters) is mapped from closed full-patterns (frequent yet not contained in larger patterns with the same support). Definition 8 specifies the mapping between a full-pattern and a bicluster. For real-valued matrices, (real-valued) biclusters are mapped from full-patterns discovered under a parameterizable coherency strength (\(\delta \propto 1/|\mathcal {L}|\) where \(\mathcal {L}\) is the discretization alphabet).
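The relation between coherency strength and alphabet size can be made concrete with a small discretization sketch. Equal-width binning is an assumption made here for simplicity (the source does not prescribe a discretization scheme at this point), and `discretize` is a hypothetical helper name.

```python
from typing import List, Sequence


def discretize(matrix: Sequence[Sequence[float]], alphabet: str = "abc") -> List[List[str]]:
    """Map real values to symbols using equal-width bins (an assumption of this sketch).

    With |L| symbols, each bin spans amplitude / |L|, so a larger alphabet
    corresponds to a stricter coherency strength.
    """
    values = [v for row in matrix for v in row]
    low, high = min(values), max(values)
    width = (high - low) / len(alphabet)
    if width == 0:                      # constant matrix: a single symbol suffices
        return [[alphabet[0]] * len(row) for row in matrix]

    def symbol(value: float) -> str:
        # the maximum value falls on the upper edge, so clamp it into the last bin
        return alphabet[min(int((value - low) / width), len(alphabet) - 1)]

    return [[symbol(v) for v in row] for row in matrix]


symbolic = discretize([[0.0, 1.5, 3.0], [0.9, 2.1, 2.9]], "abc")
print(symbolic)
```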
Definition 8
Given a transactional database D derived from a real-valued matrix, the set of maximal biclusters \(\cup _k (I_k,J_k)\) can be derived from the set of closed full-patterns \(\cup _k P_k\) by mapping \(I_k\) = \(\Phi _{P_k}\) and \(J_k\) = \(\Psi _{P_k}\), where \(\varphi _{B_k}\) = \(\Upsilon _{P_k}\).
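Definitions 6 to 8 can be illustrated with a brute-force sketch for the itemset case. It is exponential in the number of rows and is only meant to show the mapping from closed full-patterns to maximal biclusters; it is not the F2G search used in practice, and the function names and the toy matrix are assumptions of this example.

```python
from itertools import combinations


def to_transactions(matrix):
    """Itemize a symbolic matrix: each item is a (column index, symbol) pair."""
    return [frozenset(enumerate(row)) for row in matrix]


def closed_full_patterns(matrix, min_support, min_length):
    """Enumerate closed full-patterns as (items, rows, columns, symbols) tuples.

    A closed pattern equals the intersection of the transactions supporting it,
    so intersecting every group of at least min_support rows finds all of them.
    """
    db = to_transactions(matrix)
    found = {}
    for size in range(max(1, min_support), len(db) + 1):
        for rows in combinations(range(len(db)), size):
            pattern = frozenset.intersection(*(db[r] for r in rows))
            if len(pattern) >= min_length:
                found[pattern] = [i for i, t in enumerate(db) if pattern <= t]
    result = []
    for pattern, coverage in found.items():
        items = sorted(pattern)
        columns = [col for col, _ in items]      # Psi_P
        symbols = [sym for _, sym in items]      # Upsilon_P
        result.append((items, coverage, columns, symbols))
    return result


# Each closed full-pattern maps to a maximal bicluster (I, J) with pattern symbols
toy = [list("abc"), list("abc"), list("cbc")]
for _, rows, columns, symbols in closed_full_patterns(toy, min_support=2, min_length=2):
    print("I =", rows, "J =", columns, "pattern =", symbols)
```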
Constraint-based biclustering
To formalize the task targeted in this work, we introduce below the concept of constraint in the context of biclustering, and further describe different types of constraints according to the selected full-pattern mining task.
A constraint is traditionally seen as a conjunction of relations (predicate) over a set of variables describing a given dataset [23]. Definitions 9 and 10 revise this notion to guarantee its proper applicability within (pattern-based) biclustering tasks.
Definition 9
In the context of pattern mining, a constraint is a predicate on the powerset of items \(C{:}\;2^{\mathcal {L}}\rightarrow \{true,false\}\). In the context of full-pattern mining, a full-constraint is a predicate on the powerset of original items, transactions, indexes and/or concatenations, \(C\,{:}\;\{2^{\mathbf {Y}}\times 2^\mathcal {L},2^{\mathbf {X}},2^{\mathbf {Y}},2^{\mathcal {L}}\}\rightarrow \{true,false\}\). A full-pattern \((P,\Phi _{P},\Psi _P,\Upsilon _P)\) satisfies a full-constraint C if \(C(P,\Phi _P,\Psi _P,\Upsilon _P)\) is true.
Definition 10
A biclustering constraint is a predicate on a bicluster’s values per column, rows I, columns J and pattern \(\varphi _B\), \(C\,{:}\;\{2^{\mathbf {Y}}\times 2^\mathcal {L},2^{\mathbf {X}},2^{\mathbf {Y}},2^{\mathcal {L}}\}\rightarrow \{true,false\}\). A bicluster B satisfies a constraint C if \(C(\varphi _B\cdot J,I,J,\varphi _B)\) is true (or, alternatively, when the associated full-pattern satisfies a full-constraint).
Consider a matrix mapped into a transactional database with \(\mathcal {L}\) = {a,b,c}. An illustrative full-constraint is \(y_1a\in P\wedge \{x_2,x_3\}\subseteq \Phi _P\wedge y_4\in \Psi _P\wedge \{b\}\subseteq \Upsilon _P\), and the associated biclustering constraint is \(y_1a\in B \wedge \{x_2,x_3\}\subseteq I\wedge y_4\in J\wedge \{b\}\subseteq \varphi _B\). Minimum support and minimum pattern length are the default full-constraints in full-pattern mining: \(C_{support}\) = \(|\Phi _{P}|\ge \theta\) and \(C_{length}\) = \(|P|\ge \theta\).
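A full-constraint is simply a predicate over the four components of a full-pattern. The sketch below encodes the illustrative constraint and the two default constraints; the `FullPattern` container, its field names and the sample pattern are assumptions introduced for this example.

```python
from typing import FrozenSet, NamedTuple, Tuple


class FullPattern(NamedTuple):
    items: FrozenSet[Tuple[str, str]]   # P: (column, symbol) concatenations such as ("y1", "a")
    coverage: FrozenSet[str]            # Phi_P: supporting rows
    columns: FrozenSet[str]             # Psi_P: column indexes
    symbols: FrozenSet[str]             # Upsilon_P: original items


def illustrative_constraint(fp: FullPattern) -> bool:
    """y1a in P, {x2, x3} within the coverage, y4 among the columns, {b} within the symbols."""
    return (("y1", "a") in fp.items
            and {"x2", "x3"} <= fp.coverage
            and "y4" in fp.columns
            and {"b"} <= fp.symbols)


def default_constraints(fp: FullPattern, theta_support: int, theta_length: int) -> bool:
    """Minimum support and minimum pattern length."""
    return len(fp.coverage) >= theta_support and len(fp.items) >= theta_length


# Hypothetical full-pattern used only to exercise the predicates
sample = FullPattern(
    items=frozenset({("y1", "a"), ("y4", "b")}),
    coverage=frozenset({"x2", "x3", "x5"}),
    columns=frozenset({"y1", "y4"}),
    symbols=frozenset({"a", "b"}),
)
print(illustrative_constraint(sample), default_constraints(sample, 3, 2))
```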
More interesting constraints with properties of interest include regular expressions or aggregate functions. In the presence of matrices with numeric or ordinal values, further constraints can be specified. In this context, a cost table is specified in addition to the alphabet of items (e.g. {a:0, b:1, c:2}). Depending on the type of full-pattern, multiple constraints can be applied against a cost table, including the paradigmatic cases of aggregate functions such as length, maximum, minimum, range, sum, mean and variance [24].
Some of these constraints are said to exhibit nice properties when their input can be effectively pushed deep into the pattern mining task [15] to prune the search space and therefore achieve efficiency gains. Below, we explore different types of constraints according to the selected full-pattern mining task for biclustering: itemset, rule-based and sequential-pattern constraints.
Itemset constraints
Regular expressions and aggregate functions are the most common form of constraints to guide frequent itemset mining. In this context, efficiency gains can be seized in the presence of constraints with succinct, (anti-)monotone and convertible properties.
Definition 11
Let \(\mathcal {L}\) be a set of items and P be an itemset, \(P\subseteq \mathcal {L}\). Let each item \(\sigma \in \mathcal {L}\) have a correspondence with a real value, \(c{:}\, \mathcal {L}\rightarrow \mathbb {R}\), according to a well-defined cost table. Let v be a real-valued constant and let \(range(P)=max(P)-min(P)\), \(max(P)=\max _{\sigma \in P}c(\sigma )\), \(min(P)=\min _{\sigma \in P}c(\sigma )\) and \(avg(P)=\sum \nolimits _{\sigma \in P}\frac{c(\sigma )}{|P|}\) be well-defined aggregate functions. In this context:

A constraint C is monotone if for any P satisfying C, all supersets of P satisfy C (e.g. \(range(P)\ge v\)).

A constraint C is anti-monotone if for any P not satisfying C, no superset of P satisfies C (e.g. \(max(P)\le v\)).

Given a pattern \(P'\) satisfying a constraint C, C is succinct over P if P contains \(P'\) (e.g. \(min(P)\le v\)).

A constraint C is convertible with regard to an ordering of items \(R_{\Sigma }\) if for any itemset P satisfying C, the suffixes of P satisfy C and/or the itemsets having P as suffix satisfy C (e.g. \(avg(P)\ge v\)).
To instantiate the formalized constraints, consider three observations (\(\mathbf {x}_1=\{a,b,c\}\), \(\mathbf {x}_2=\{a,b,c,d\}\), \(\mathbf {x}_3=\{a,d\}\)), a minimum support \(\theta _1\) = 1 and length \(\theta _2\) = 2, and the cost table {a:0, b:1, c:2, d:3}. The set of closed full-patterns satisfying: the monotone constraint \(range(P)\ge 2\) is \(\{(\{a,b,c\},\{x_1,x_2\}),(\{a,d\},\{x_2,x_3\}),(\{a,b,c,d\},\{x_2\})\}\); the anti-monotone constraint \(sum(P)\le 1\) is \(\{(\{a,b\},\{x_1,x_2\})\}\); the succinct constraint \(P\supseteq \{c,d\}\) is \(\{(\{a,b,c,d\},\{x_2\})\}\); and the convertible constraint \(avg(P)\ge 2\) is \(\{(\{b,c,d\},\{x_2\})\}\). Here a pattern is reported when it satisfies the constraint and has no larger satisfying pattern with the same coverage.
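The four constraint types can be evaluated by brute force over the three observations and the cost table of this example. The sketch below is exhaustive rather than efficient, and it assumes that a pattern is reported when it satisfies the constraint and has no larger satisfying pattern with the same coverage; helper names such as `constrained_closed` are hypothetical.

```python
from itertools import combinations

COST = {"a": 0, "b": 1, "c": 2, "d": 3}
DB = {"x1": {"a", "b", "c"}, "x2": {"a", "b", "c", "d"}, "x3": {"a", "d"}}


def cost_range(p):
    return max(COST[i] for i in p) - min(COST[i] for i in p)


def cost_sum(p):
    return sum(COST[i] for i in p)


def cost_avg(p):
    return cost_sum(p) / len(p)


def coverage(p):
    """Rows of DB that contain every item of p."""
    return frozenset(x for x, t in DB.items() if p <= t)


def constrained_closed(constraint, min_support=1, min_length=2):
    """Satisfying patterns with no larger satisfying pattern of equal coverage."""
    candidates = {}
    for size in range(min_length, len(COST) + 1):
        for combo in combinations(sorted(COST), size):
            p = frozenset(combo)
            cov = coverage(p)
            if len(cov) >= min_support and constraint(p):
                candidates[p] = cov
    return {p: cov for p, cov in candidates.items()
            if not any(p < q and cov == cq for q, cq in candidates.items())}


monotone = constrained_closed(lambda p: cost_range(p) >= 2)
anti_monotone = constrained_closed(lambda p: cost_sum(p) <= 1)
succinct = constrained_closed(lambda p: {"c", "d"} <= p)
convertible = constrained_closed(lambda p: cost_avg(p) >= 2)
for name, found in [("monotone", monotone), ("anti-monotone", anti_monotone),
                    ("succinct", succinct), ("convertible", convertible)]:
    print(name, [(sorted(p), sorted(c)) for p, c in found.items()])
```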
Association rule constraints
Constraints satisfying these properties can also be effectively applied in the context of association rule mining (for the discovery of noise-tolerant biclusters [1, 20]). In this context, constraints need to be satisfied by the antecedent, consequent, or can be alternatively applied during the generation of frequent itemsets, prior to the composition of rules.
Additional constraints to guarantee specific correlation/interestingness criteria [25] or the dissimilarity and minimality of rules [26] can be specified.
In the context of association rule-based biclustering, a full-constraint is evaluated against the union of items on the antecedent and consequent as well as the union of supporting transactions of the antecedent and consequent. Given \(P{:}\;P_1\rightarrow P_2\) and a constraint C, P satisfies C if the full-pattern given by \((\Upsilon _{P_1\cup P_2},\Phi _{P_1}\cup \Phi _{P_2},\Psi _{P_1\cup P_2},P_1\cup P_2)\) satisfies C.
Sequential pattern constraints
The introduced concepts can be further extended for the incorporation of constraints in the context of sequential pattern mining (for the discovery of order-preserving biclusters [19]). A sequence P is an ordered set of itemsets, each itemset being a set of indexes in Y. Given a matrix (X, Y) with n = 5 rows and m = 3 columns and a minimum support \(\theta _1\) = 3, (\(y_2\le y_1\wedge y_2\le y_3,\{x_2,x_4,x_5\},\{y_1,y_2,y_3\}\), \(\langle y_2(y_1y_3) \rangle\)) is an illustrative full-pattern. Interestingly, the sequential pattern \(\Upsilon _{P}\) does not explicitly disclose the value expectations \(\varphi _B\). Instead, \(\Upsilon _{P}\) is associated with an ordering relation (such as \(y_2\le y_1\wedge y_2\le y_3\)). In this context, the following constraints can be specified: item constraints (e.g. \(\{y_1,y_3\}\subseteq P\)); length constraints (minimum/maximum number of precedences and/or co-occurrences); super-pattern constraints (patterns that contain a particular set of patterns as sub-patterns, e.g. \(\{y_2\le y_1\}\subseteq P\)); and, more interestingly, regular expressions (e.g. \(P\equiv y_{\bullet }\le \{y_{\bullet },y_{\bullet }\}\)). Constraints concerning value expectations can also be specified using the values from a given ordering based on the median of values from the supporting rows and columns (e.g. \(b\le a\) or \(1.3\le 0.4\)). As a result, aggregate functions can be additionally specified within sequential pattern constraints.
With regard to the properties of the aforementioned constraints: length constraints are anti-monotone, while super-pattern constraints are monotone. Item constraints, length constraints and super-pattern constraints are all succinct. Some aggregate constraints and regular expressions can also show nice properties [27].
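The ordering view of a sequential full-pattern can be sketched as follows. The matrix values are hypothetical, chosen so that rows \(x_2\), \(x_4\) and \(x_5\) support the ordering relation \(y_2\le y_1\wedge y_2\le y_3\); `row_sequence` and `coverage` are illustrative helper names rather than part of any cited algorithm.

```python
# Hypothetical 5 x 3 matrix (values are assumptions made for this sketch)
MATRIX = {
    "x1": {"y1": 1, "y2": 3, "y3": 2},
    "x2": {"y1": 4, "y2": 1, "y3": 5},
    "x3": {"y1": 2, "y2": 5, "y3": 1},
    "x4": {"y1": 3, "y2": 0, "y3": 3},
    "x5": {"y1": 6, "y2": 2, "y3": 4},
}


def row_sequence(row):
    """Order column indexes by value; ties form one co-occurring itemset."""
    by_value = {}
    for column, value in row.items():
        by_value.setdefault(value, []).append(column)
    return [sorted(by_value[v]) for v in sorted(by_value)]


def coverage(precedences, matrix):
    """Rows in which row[a] <= row[b] holds for every (a, b) precedence."""
    return sorted(x for x, row in matrix.items()
                  if all(row[a] <= row[b] for a, b in precedences))


ORDER = [("y2", "y1"), ("y2", "y3")]          # y2 <= y1 and y2 <= y3
rows = coverage(ORDER, MATRIX)
columns = sorted({c for pair in ORDER for c in pair})
print(rows, columns, row_sequence(MATRIX["x4"]))

# An item constraint such as {y1, y3} within P is a simple subset test
item_constraint_ok = {"y1", "y3"} <= set(columns)
```

The ordering relation alone determines the supporting rows, which is why the value expectations of the bicluster are not disclosed by the sequential pattern itself.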
Related work
Related work is surveyed according to: (1) the contributions and limitations of existing attempts to perform biclustering with domain knowledge; (2) the state-of-the-art on domain-driven pattern mining; and (3) the existing efforts towards full-pattern mining and their adequacy to accommodate domain knowledge.
Knowledge-driven biclustering
The use of domain knowledge to guide biclustering has been increasingly stressed since solutions with good homogeneity and statistical significance may not necessarily be biologically relevant. However, few biclustering algorithms are able to incorporate domain knowledge.
AIISA [7], GenMiner [8] and scatter biclustering [10] are able to annotate data with functional terms retrieved from repositories with ontologies and use these annotations to guide the search.
COBIC [28] is able to adjust its behavior (maximum-flow/minimum-cut parameters) in the presence of background knowledge. Similarly, the priors and architectures of generative biclustering algorithms [29] can also be parameterized to accommodate specific forms of background knowledge. However, COBIC and its generative peers support only the definition of constraints concerning the algorithm’s behavior and are not able to deliver flexible biclustering solutions.
Fang et al. [4] proposed a constraint-based algorithm enabling the discovery of dense biclusters associated with high-order combinations of single-nucleotide polymorphisms (SNPs). Data-Peeler [5], as well as algorithms from formal concept analysis [6] and bi-sets mining [30], are able to efficiently discover dense biclusters in binary matrices in the presence of (anti-)monotone constraints. However, these algorithms impose a very restrictive form of homogeneity in the delivered biclusters.
Domain-driven pattern mining
A large number of studies explored how constraints can be used to guide pattern mining tasks. Two major paradigms are available: constraint-programming (CP) [16] and dedicated searches [15, 31]. CP allows pattern mining to be declaratively defined according to sets of constraints [16, 32]. These declarative models can allow for complex mathematical expressions on the set of full-patterns. Nevertheless, due to the poor scalability of CP methods, they have only been used in highly constrained settings, small-to-medium sized data, or to mine approximate patterns [16, 32].
Pattern mining searches have been adapted to seize efficiency gains from different types of constraints [15, 31, 33]. These efforts aim to replace naïve solutions based on post-filtering to guarantee the satisfaction of constraints. Instead, the constraints are pushed as deep as possible within the mining step for an optimal pruning of the search space. The nice properties exhibited by constraints, such as anti-monotone and succinct properties, have been initially seized in the context of frequent itemset mining by Apriori methods [31] to affect the generation of candidates. Convertible constraints can hardly be pushed in Apriori methods but can be adequately handled by pattern-growth methods such as FP-Growth [15]. FICA, FICM, and more recently MCFPTree [15], are FP-Growth extensions to further explore opportunities from diverse constraints. The inclusion of monotone constraints is more complex. Filtering methods, such as ExAnte [34], are able to combine anti-monotone and monotone pruning based on reduction procedures. Empirical evidence shows that these reductions are optimally handled within pattern-growth methods by adequately growing and pruning small FP-Trees (referred to as FP-Bonsais) [33].
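The difference between post-filtering and pushing a constraint into the search can be seen in a few lines. The sketch below uses a plain depth-first enumeration of itemsets, which is an assumption made for brevity and not Apriori, FP-Growth or any of the cited extensions; the data and the cost table are illustrative.

```python
COST = {"a": 0, "b": 1, "c": 2, "d": 3}
DB = [{"a", "b", "c"}, {"a", "b", "c", "d"}, {"a", "d"}]


def mine(db, items, min_support, anti_monotone=None):
    """Depth-first itemset enumeration; returns (patterns, number of visited candidates)."""
    results, visited = [], 0

    def grow(prefix, start):
        nonlocal visited
        for i in range(start, len(items)):
            candidate = prefix | {items[i]}
            visited += 1
            if sum(1 for t in db if candidate <= t) < min_support:
                continue                      # infrequent: no extension can be frequent
            if anti_monotone is not None and not anti_monotone(candidate):
                continue                      # violated: no superset can satisfy it
            results.append(candidate)
            grow(candidate, i + 1)

    grow(frozenset(), 0)
    return results, visited


def sum_at_most_one(p):
    return sum(COST[i] for i in p) <= 1


# Post-filtering: mine everything, then discard violating patterns
everything, visited_all = mine(DB, "abcd", min_support=1)
filtered = [p for p in everything if sum_at_most_one(p)]

# Pushing the anti-monotone constraint into the search
pushed, visited_pushed = mine(DB, "abcd", min_support=1, anti_monotone=sum_at_most_one)
print(len(filtered), visited_all, visited_pushed)
```

Both strategies return the same patterns, but the pushed variant abandons a branch as soon as the constraint is violated and therefore evaluates fewer candidates.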
These contributions were extended for association rule mining [33, 35]. In particular, nice properties were studied for item constraints [35], support constraints [36], bounds on interestingness criteria [37], and constraints on the structure and dissimilarity of rules (respectively referred to as schema and opportunistic) [38].
Similarly, some studies proposed ways to effectively incorporate constraints within Apriori and pattern-growth searches for sequential pattern mining (SPM) [27, 39]. Apriori searches were first extended to incorporate temporal constraints and user-defined taxonomies [39]. Mining frequent episodes in a sequence of events [40] can also be viewed as a constrained SPM task by seeing episodes as constraints in the form of acyclic graphs. SPIRIT [41] revises the Apriori search to incorporate a broader range of constraints with nice properties and regular expressions. Pattern-growth searches based on data projections, such as PrefixSpan, were only later extended by Pei et al. [27, 42] to support a wide set of constraints with nice properties. Although multiple studies have been proposed on the use of temporal constraints for SPM, including length and gap constraints [27, 43], these constraints are not relevant for the aim of learning order-preserving models.
Full-pattern mining with constraints
There are three major classes of full-pattern mining searches [1, 44, 45]: (1) AprioriTID-based searches, generally suffering from costs of candidate generation for dense datasets and low support thresholds; (2) searches with vertical projections, which show efficiency bottlenecks for data with a high number of transactions since the bitset cardinality becomes large and associated intersection procedures expensive; and (3) recently proposed pattern-growth searches based on the annotation of original pattern-growth structures with transactions’ identifiers. In particular, F2G [17] and IndexSpan [18] (default options in BicPAM, BiP, BicNET and BicSPAM biclustering algorithms [14, 19, 20, 22]) were the first pattern-growth searches for full-pattern mining aiming to surpass memory and time bottlenecks associated with bitset and diffset structures used by AprioriTID and vertical-based searches.
Despite the high number of contributions from domain-driven pattern mining, the ability of pattern-growth searches to effectively incorporate full-constraints with nice properties (Definition 9) has not yet been demonstrated.
Solution: Pattern-based biclustering with domain knowledge
This section extends pattern-based biclustering algorithms [1] to accommodate constraints by proposing BiC2PAM (BiClustering with Constraints using PAttern Mining). In what follows, we first provide principles for biclustering annotated biological data. Second, meaningful full-constraints with nice properties are listed to guide expression data analysis and network data analysis. The possibility to specify alternative constraints in order to customize the structure, coherency, quality and statistical significance of biclustering solutions according to available knowledge is discussed in the Appendix. Third, we describe a set of principles for the specification, processing and incorporation of constraints within pattern-based biclustering. Finally, we adapt the full-pattern mining searches used within BiC2PAM in order to seize heightened efficiency gains by exploring the properties associated with the inputted constraints.
Biclustering with annotations extracted from knowledge repositories and literature
Domain knowledge often comes in the form of annotations associated with specific rows and columns in a matrix (or nodes in a network). These annotations are often retrieved from knowledge repositories, semantic sources and/or literature. Annotations can be either directly derived from the properties associated with each row/column/node (e.g. properties of a gene or a sample in gene expression data) or can be implicitly predicted based on the observed values by using feature extraction procedures. For instance, consider the set of functional annotations associated with gene ontology (GO) terms [46]. A GO term is associated with an interrelated group of genes associated with a specific biological process. Since a gene can participate in multiple biological processes, genes can have an arbitrary number of functional annotations. As such, rows in an expression matrix (or nodes in a biological network) can be annotated with a non-fixed number of labels.
Pattern-based biclustering supports the integrated analysis of matrices and annotations by resorting to one of two strategies. First, association rules or sequential rules can be used to guide the biclustering task in the presence of annotations according to the principles introduced by Martinez et al. [8]. In this context, annotations can appear in the consequent, in the antecedent or on both sides of an association rule. Biclusters can then be inferred from these rules using the principles introduced by Henriques et al. [1]. For illustration, a rule \(\{y_12,y_42\}\rightarrow \{T_1, T_2\}\) supported by \(\{x_1,x_3,x_5\}\) rows can be used to compose a bicluster \((\{y_1,y_4\},\{x_1,x_3,x_5\})\) with elements consistently associated with annotations \(T_1\) and \(T_2\). Learning association rules with levels of confidence (or alternative interestingness scores) below 100 % [20] is relevant to discover biclusters with consistent annotations without requiring a subset of annotations to appear on all rows/columns of each bicluster.
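The mapping from an annotated association rule to a bicluster can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `rule_to_bicluster`, the `(column, value)` item encoding and the toy rows are assumptions introduced here.

```python
def rule_to_bicluster(antecedent, consequent, transactions):
    """Compose a bicluster from a rule `antecedent -> consequent`.

    antecedent:   set of (column, value) items, e.g. {("y1", 2), ("y4", 2)}
    consequent:   set of annotation terms, e.g. {"T1", "T2"}
    transactions: dict row id -> (set of items, set of annotation terms)
    Returns ((columns, rows), confidence).
    """
    # Rows whose values match the antecedent pattern.
    covering = {r for r, (items, _) in transactions.items() if antecedent <= items}
    # Rows that additionally carry every annotation in the consequent.
    supporting = {r for r in covering if consequent <= transactions[r][1]}
    confidence = len(supporting) / len(covering) if covering else 0.0
    columns = {column for column, _ in antecedent}
    return (columns, supporting), confidence


# Hypothetical toy data: x2 matches the pattern but lacks annotation T2.
toy = {
    "x1": ({("y1", 2), ("y4", 2), ("y2", 0)}, {"T1", "T2"}),
    "x2": ({("y1", 2), ("y4", 2)}, {"T1"}),
    "x3": ({("y1", 2), ("y4", 2)}, {"T1", "T2"}),
    "x4": ({("y1", 0), ("y4", 2)}, set()),
    "x5": ({("y1", 2), ("y4", 2)}, {"T1", "T2"}),
}
bicluster, confidence = rule_to_bicluster({("y1", 2), ("y4", 2)}, {"T1", "T2"}, toy)
```

On this toy input the rule holds with a confidence below 100 %, which is the situation in which relaxing the confidence threshold still yields a bicluster with consistent annotations.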
Second, the annotations can be included directly within the data, since pattern mining is able to handle rows of arbitrary length. To this end, each annotation is associated with a new dedicated symbol and appended to the respective rows, possibly leading to a set of observations with varying length. If annotations \(T_1\) and \(T_2\) are respectively associated with genes \(\{x_1,x_3,x_4\}\) and \(\{x_3,x_5\}\), an illustrative transactional database of itemsets for this scenario would be \(\{x_1=\{a_{11},\ldots,a_{1m},T_1\},x_2=\{a_{21},\ldots,a_{2m}\},x_3=\{a_{31},\ldots,a_{3m},T_1,T_2\},\ldots\}\). Databases of sequences (for order-preserving biclustering) can be composed by appending terms either at the end or at the beginning of each sequence.
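The construction of such an enriched transactional database can be sketched as below. The helper name `annotate_transactions` and the string encoding of items (`"y1=2"`) are assumptions made for illustration only; the annotation memberships follow the example in the text.

```python
def annotate_transactions(rows, annotations):
    """Append annotation symbols to the rows they are associated with.

    rows:        dict row id -> list of items (one item per column)
    annotations: dict annotation term -> set of row ids carrying that term
    Returns a dict row id -> list of items followed by the row's terms.
    """
    enriched = {}
    for row_id, items in rows.items():
        # Sorted so that the appended symbols have a deterministic order.
        terms = sorted(t for t, members in annotations.items() if row_id in members)
        enriched[row_id] = list(items) + terms
    return enriched


# Hypothetical item values; T1 and T2 memberships as in the example above.
rows = {
    "x1": ["y1=2", "y2=-1"],
    "x2": ["y1=0", "y2=3"],
    "x3": ["y1=2", "y2=3"],
    "x4": ["y1=1", "y2=1"],
    "x5": ["y1=2", "y2=0"],
}
annotations = {"T1": {"x1", "x3", "x4"}, "T2": {"x3", "x5"}}
enriched = annotate_transactions(rows, annotations)
```

The resulting transactions have varying length (two to four items here), which an itemset miner can consume directly.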
Given these enriched databases, pattern mining can then be applied on top of the annotated transactions with succinct, (anti)monotone and convertible constraints. Succinct constraints can be incorporated to guarantee the inclusion of certain terms (such as \(P\cap \{T_1,T_2\}\ne \emptyset\)). This is useful to discover, for instance, biclusters with genes participating in specific functions of interest. (Anti)monotone convertible constraints can alternatively be incorporated to guarantee, for instance, that a bicluster associated with a discovered pattern is functionally consistent, meaning that it can be mapped to a single annotation. The \(P\cap \{T_1,T_2\}\ge 1\) constraint is antimonotone and satisfies the convertible condition: if P satisfies C, the P suffixes also satisfy C.
Interestingly, the two previous strategies can be seen as equivalent when assuming that the discovery of the introduced class of association rules is guided by rule-based constraints and the discovery of patterns from annotated data is guided by itemset/sequence constraints.
Biological constraints with properties of interest
Different types of constraints were introduced in Definition 11. In order to show how these constraints can be specified and instantiated, this section provides examples of meaningful constraints for gene expression and network data analysis.
Note that similar constraints can be formulated for the analysis of alternative biological data, including: structural genome variations to enable the discovery of high-order single-nucleotide polymorphisms; genome-wide data to find promoters where mutations or emerging binding sites show properties of interest; or medical data to force the inclusion of certain clinical features or to focus on less-trivial disease markers.
Gene expression data analysis
First, succinct constraints in gene expression analysis allow the discovery of genes with specific constrained levels of expression across a subset of conditions. For illustration, \(min(\varphi _B)=-3\) implies an interest in biclusters (putative biological processes) where genes are highly repressed in at least one condition. Alternatively, succinct constraints can be used to discover non-trivial biclusters by focusing on non-highly differential expression (e.g. patterns with symbols {−2,2}). Such an option contrasts with the large focus on dense biclusters [2], thus enabling the discovery of less-trivial yet coherent modules.
Second, (anti)monotone constraints are key to capture background knowledge and guide biclustering. For instance, the non-succinct monotonic constraint countVal \((\varphi _B)\ge 2\) implies that at least two different levels of expression must be present within a bicluster (putative biological process). In gene expression analysis, biclusters should be able to accommodate genes with different ranges of up-regulation and/or down-regulation. Yet, the majority of existing biclustering approaches can only model a single value across conditions [2, 14]. When constraints such as the value-counting inequality are available, efficiency bottlenecks can be tackled by adequately pruning the search space.
Finally, convertible constraints also play an important role in biological settings to guarantee, for instance, that the observed patterns have an average of values within a specific range. For illustration, the antimonotonic convertible constraint \(avg(\varphi _B)\le 0\) indicates a preference for patterns with repression mechanisms without a strict exclusion of activation mechanisms. These constraints are useful to focus the discovery on specific expression levels, while still allowing for noise deviations. Understandably, they are a robust alternative to the strict bounds imposed by succinct constraints with maximum–minimum inequalities.
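The three kinds of constraints instantiated above can be written as simple predicates over the values of a pattern \(\varphi _B\). The sketch below is illustrative only; the function names and the toy pattern are assumptions, and the values are taken from the symmetric alphabet {−3,…,3} used in the examples.

```python
def min_equals(values, target):
    """Succinct constraint: min(pattern) = target."""
    return min(values) == target


def count_val_at_least(values, k):
    """Monotone constraint: at least k distinct expression levels."""
    return len(set(values)) >= k


def avg_at_most(values, bound):
    """Convertible constraint: avg(pattern) <= bound."""
    return sum(values) / len(values) <= bound


pattern = [-3, 2, -2]          # hypothetical pattern of expression levels
checks = (
    min_equals(pattern, -3),   # highly repressed in at least one condition
    count_val_at_least(pattern, 2),
    avg_at_most(pattern, 0),   # repression dominates; one activation tolerated
)

# Monotonicity of countVal: once satisfied, every superset also satisfies it.
before = count_val_at_least([-3, -3], 2)
after = count_val_at_least([-3, -3, 2], 2)
```

The average-based predicate accepts the pattern even though it contains an activation level, which is the tolerance to deviations that strict minimum/maximum bounds do not offer.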
Biological network data analysis
To motivate the relevance of inputting similar constraints for the analysis of biological networks, we use again the tabular dataset provided in Fig. 4. In this context, rows and columns correspond to nodes associated with biological entities (such as genes, proteins, protein complexes or other molecular compounds), and the values in the matrix correspond to the strength of the interactions between the nodes. As such, the strength of the interactions is either negative {−3, −2} (e.g. inhibition), weak {−1, 0, 1} or positive {2, 3} (e.g. activation).
First, succinct constraints can be specified for the discovery of sets of nodes with specific interaction patterns of interest. For illustration, \(\{-2,2\}\subseteq \varphi _B\) implies an interest in non-dense network modules (coherent interactions with soft inhibition and activation) to disclose non-trivial regulatory activity, and \(min(\varphi _B)=-3\wedge max(\varphi _B)=3\) implies a focus on modules with the simultaneous presence of highly positive and highly negative interactions.
Second, (anti)monotone constraints are key to discover network modules with distinct yet coherent regulatory interactions. For instance, the non-succinct monotonic constraint countVal \((\varphi _B)\ge 3\) implies that at least three different types of interactions must be present within a module.
Finally, convertible constraints are useful to place non-strict expectations on the desirable patterns while still accommodating deviations from those expectations. For illustration, \(avg(\varphi _B)\le 0\) indicates a preference for network modules with negative interactions without a strict exclusion of positive interactions.
Constraints with nice properties can alternatively be applied to networks with qualitative interactions. Regulatory interactions, such as “binds”, “activates” or “enhances”, are increasingly observed for a wide variety of protein-protein and gene interaction networks [47, 48]. In this context, assuming the presence of {a, b, c} types of biological interactions, an illustrative antimonotone constraint is \(\varphi _B\cap \{a,b\}\ge 0\).
Biological data analysis with full-constraints
Although less motivated, constraints can also be defined on the powerset of rows, columns and/or values per column. In fact, the minimum support and minimum pattern length can be seen as constraints over the I and J indexes, respectively. An alternative constraint over I and J is to require that biclusters include a minimum number of rows/columns from a particular subset of rows/columns of interest. An illustrative succinct constraint in \(Y\times \mathcal {L}\) is \(P\cap \{y_2{-}3,\,y_23\}\ne \emptyset\), which implies an interest in biclusters with differential expression (or interactions) associated with the \(\mathbf {y}_2\) sample/gene/node.
Note that the constraints instantiated throughout this section represent a small subset of all possible constraints of interest, and are mainly introduced to motivate the relevance of succinct, (anti)monotone and convertible properties. The specification of constraints of interest always depends on the learning goal and the peculiarities of the input data. As such, an exhaustive listing and discussion of relevant constraints for biological data contexts is considered to be out of the scope of this work.
Biclustering with full-constraints

if native constraints are inputted, BiC2PAM maps them into parameterizations along the mapping, mining and closing steps of BicPAMS (Appendix);
if constraints without nice properties are inputted, BiC2PAM satisfies them by resorting to post-filtering verifications;
if constraints with nice properties are inputted, BiC2PAM implements pruning heuristics from previous research on constraint-based Apriori-based methods [36, 41].
Similarly, constraints from \(\psi _P\in 2^{Y}\) are mapped to constraints over \(P\in 2^{Y\times \mathcal {L}}\). Illustrating, \(y_2\in Y\) is mapped as \(P\cap \{y_2a,y_2b,\ldots\}\ne \emptyset\).
Finally, constraints from \(\Phi _P\in 2^{X}\) are incorporated by adjusting the Apriori searches to effectively prune the search space. Consider a succinct constraint that specifies a set of transactions to be included in the resulting biclusters. In this case, as soon as a generated candidate is no longer supported by any transaction of interest, there is no need to further generate new candidates and, thus, the search space can be pruned at this point.
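The pruning described in this paragraph can be sketched with a small level-wise (Apriori-style) search. This is a simplified illustration rather than the search used in BiC2PAM: the function name, the reading of the constraint as "supported by at least one transaction of interest" and the toy database are assumptions. Because the supporters of a superset are always a subset of the supporters of the original itemset, a candidate that has lost every transaction of interest can be discarded together with all its supersets.

```python
from itertools import combinations


def apriori_with_rows_of_interest(transactions, min_support, rows_of_interest):
    """Level-wise itemset search pruned by support and by rows of interest.

    transactions:     dict row id -> frozenset of items
    rows_of_interest: set of row ids; a pattern is kept only while at least
                      one of these rows still supports it (assumed reading).
    Returns a dict pattern (frozenset) -> set of supporting row ids.
    """
    def supporters(itemset):
        return {r for r, items in transactions.items() if itemset <= items}

    def keep(itemset):
        rows = supporters(itemset)
        return len(rows) >= min_support and bool(rows & rows_of_interest)

    items = sorted({i for t in transactions.values() for i in t})
    level = [frozenset([i]) for i in items if keep(frozenset([i]))]
    found = []
    while level:
        found.extend(level)
        survivors = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            # Join step: merge patterns differing in one item, and require
            # every (k-1)-subset to have survived the previous level.
            if len(union) == len(a) + 1 and all(
                frozenset(s) in survivors for s in combinations(union, len(a))
            ):
                candidates.add(union)
        level = [c for c in sorted(candidates, key=sorted) if keep(c)]
    return {pattern: supporters(pattern) for pattern in found}


db = {
    "x1": frozenset("abc"),
    "x2": frozenset("ab"),
    "x3": frozenset("bc"),
    "x4": frozenset("ac"),
}
patterns = apriori_with_rows_of_interest(db, min_support=2, rows_of_interest={"x3"})
```

In this toy run item `a` is frequent but is never supported by `x3`, so it is dropped at the first level and no candidate containing it is ever generated.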
Understandably, despite the inherent simplicity of incorporating constraints with nice properties in Apriori-based searches, there is a critical drawback: the inability to rely on key pattern-growth searches, such as F2G (for the discovery of constant/additive/symmetric/plaid biclusters) and IndexSpan (for the discovery of order-preserving biclusters). These pattern-growth searches were previously shown to be able to mine large data with superior efficiency [17, 18]. Adding to this observation, there is considerable agreement that the underlying structures of pattern-growth searches, such as frequent-pattern trees and prefix-growth trees, provide a more adequate representation of the search space for improved pruning.
Exploring efficiency gains from constraints with nice properties
Although the incorporation of constraints with nice properties can only be easily supported under Apriori-based searches, there is large consensus that pattern-growth searches are better positioned to seize efficiency gains from these constraints than peer Apriori-based and vertical searches. As such, F2GBonsai and IndexSpanPG, described below, respectively extend the recently proposed F2G (full-frequent itemset miner) and IndexSpan (full-sequential pattern miner) algorithms to guarantee a more effective pruning of the search space in the presence of constraints. These extensions are integrated in BiC2PAM. Native constraints are effectively incorporated in BiC2PAM through adequate parameterizations of pattern-based biclustering algorithms (Appendix).
F2GBonsai: F2G with itemset constraints
Compliance with different types of constraints
Unlike candidate generation methods, pattern-growth searches provide further pruning opportunities. Pruning principles can be applied in a standard way on both the original database (FP-tree) and each projected database (conditional FP-tree).
The CFG method extends pattern-growth searches [15] to seize the properties of nice constraints using simple principles. Supersets of itemsets violating antimonotone constraints are removed from each (conditional) FP-tree. For illustration, in the presence of \(sum(\Upsilon _P)\le 3\), when analyzing the \(y_12\) conditional database, the items \(\cup _{i=1}^{m} \{y_i2,y_i3\}\) can be removed to avoid conflicts, as their sum violates the given constraint. For an effective pruning, it is recommended to order the symbols in the header table according to their value and support [15, 24]. F2G is compliant with these pruning heuristics, since it allows the rising of transaction IDs in the FP-tree according to the order of candidate items for removal in the header table (see Algorithms 1 and 2 in [17]).
For the particular case of an antimonotone convertible constraint, itemsets that satisfy the constraint are efficiently generated under a pattern-growth search [24]. This is done by assuming that original/conditional FP-trees are built according to a price table and by pruning patterns that no longer satisfy the antimonotone convertible constraint, since the inclusion of new items cannot restore its satisfaction. For illustration, since \(\{y_1{-}3,\,y_42,\,y_23\}\) does not satisfy \(avg(\Upsilon _P)\le 0\), there is no need to further build \(\{y_1{-}3,\,y_42,\,y_23\}\)-conditional trees. Therefore, this principle provides an important criterion to stop FP-tree projections and/or prune items in a (conditional) FP-tree.
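The stopping criterion for a convertible antimonotone constraint can be sketched without an FP-tree, using plain prefix growth over items ordered by ascending value (the role played by the price table). This is an illustrative simplification, not F2GBonsai: support counting is omitted, and the function name and item labels are assumptions. Under an ascending order, once a prefix has an average above the bound, every item that could still be appended has a value at least as large as the prefix maximum, so the average can never drop back below the bound.

```python
def itemsets_with_avg_at_most(item_values, bound):
    """Enumerate every itemset with avg(values) <= bound by prefix growth.

    item_values: dict item label -> numeric value
    Items are visited in ascending value order, which makes avg <= bound
    behave antimonotonically along each prefix.
    """
    ordered = sorted(item_values, key=lambda i: (item_values[i], i))
    valid = []

    def grow(prefix, total, start):
        for k in range(start, len(ordered)):
            item = ordered[k]
            new_total = total + item_values[item]
            if new_total / (len(prefix) + 1) > bound:
                # Later items are at least as large, so neither this
                # extension nor any later sibling or descendant can pass.
                break
            new_prefix = prefix + [item]
            valid.append(tuple(new_prefix))
            grow(new_prefix, new_total, k + 1)

    grow([], 0, 0)
    return valid


# Items from the example above: y1 with value -3, y4 with 2, y2 with 3.
items = {"y1:-3": -3, "y4:2": 2, "y2:3": 3}
valid = itemsets_with_avg_at_most(items, 0)
```

On the three items of the example, the search stops as soon as it reaches the itemset with average 2/3 and never builds any extension of it.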
Finally, the transactions and items within a (conditional) FP-tree that conflict with a given constraint can be directly removed without causing any changes on the resulting set of valid patterns. For illustration, given the \(min(\Upsilon _P)=0\) constraint, the transactions \(\mathbf {x}_1=\{y_1{-}1,\,y_23,\,y_31\}\) and \(\mathbf {x}_4=\{y_11,\,y_2{-}1,\,y_32\}\) can be directly removed as they do not satisfy this succinct constraint. Similarly, given the same constraint, \(min(\Upsilon _P)=0\), the items with values below 0 can be removed. With regards to transactions \(\mathbf {x}_1\) and \(\mathbf {x}_4\), this means removing the \(a_{1,1}=y_1{-}1\) and \(a_{4,2}=y_2{-}1\) items.
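Both reductions for the \(min(\Upsilon _P)=0\) example can be sketched as a single pass over the database. The function name and the rows `x2` and `x3` are assumptions added for illustration; `x1` and `x4` follow the text.

```python
def reduce_for_min_equals(transactions, target):
    """Data reduction for the succinct constraint min(pattern) = target.

    transactions: dict row id -> dict column -> value
    Items below the target can never belong to a valid pattern, and a row
    without any item equal to the target cannot support one.
    """
    reduced = {}
    for row, items in transactions.items():
        kept = {col: val for col, val in items.items() if val >= target}
        if target in kept.values():
            reduced[row] = kept
    return reduced


database = {
    "x1": {"y1": -1, "y2": 3, "y3": 1},   # from the text: no 0-valued item
    "x2": {"y1": 0, "y2": -2, "y3": 1},   # hypothetical row
    "x3": {"y1": 0, "y2": 0, "y3": 3},    # hypothetical row
    "x4": {"y1": 1, "y2": -1, "y3": 2},   # from the text: no 0-valued item
}
reduced = reduce_for_min_equals(database, 0)
```

Only `x2` and `x3` survive, and `x2` loses its negative item, so the mining step runs on a smaller database while producing the same valid patterns.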
Furthermore, constraint checks can be avoided for subsets of itemsets satisfying a monotone constraint. For illustration, no further checks are needed in the presence of the countVal \((\Upsilon _P)\ge 2\) constraint when the range of values in the suffix of a pattern is \(\ge 2\) under the \(\{y_10,y_11\}\)-conditional FP-tree.
Combination of constraints with nice properties
The previous extensions to pattern-growth searches are not able to effectively comply with monotone constraints when antimonotone constraints (such as minimum support) are also considered. In FP-Bonsai [33], principles to further explore the monotone properties for pruning the search space are considered without reducing antimonotone pruning opportunities. This method is based on data-reduction operations originally implemented in ExAnte to seize efficiency gains from the properties of monotone constraints. There are two data reductions: \(\mu\)-reduction, which deletes transactions not satisfying C; and \(\alpha\)-reduction, which deletes from transactions single items not satisfying C. Thanks to the recursive projections of FP-growth, the ExAnte data-reduction methods can be applied on each conditional FP-tree to obtain a compact number of smaller FP-trees (FP-Bonsais). The FP-Bonsai method can be combined with the previously introduced principles, which are particularly suited to handle succinct and convertible antimonotone constraints. F2G can be extended to support these reductions on the (conditional) FP-trees by guaranteeing that transactions consistently rise up. The only requirement is to preserve the order of items in the header table [17]. As such, F2G complies with the FP-Bonsai extension (see Algorithm 2).
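The interplay between the two data reductions can be sketched as a fixpoint loop over a flat transaction list. This is a simplified sketch under stated assumptions, not the tree-based procedure of FP-Bonsai or F2GBonsai: the function name is hypothetical, and \(\alpha\)-reduction is read here as removing items that fail the minimum-support (antimonotone) constraint, which is one interpretation of the constraint C mentioned in the text.

```python
def exante_reduce(transactions, min_support, monotone_ok):
    """Alternate mu- and alpha-reductions until nothing changes.

    transactions: list of sets of items
    monotone_ok:  predicate on a set of items (a monotone constraint)
    """
    data = [set(t) for t in transactions]
    while True:
        # mu-reduction: a transaction violating a monotone constraint cannot
        # contain any satisfying pattern, so it is dropped.
        kept = [t for t in data if monotone_ok(t)]
        # alpha-reduction (assumed reading): drop items that became infrequent.
        counts = {}
        for t in kept:
            for item in t:
                counts[item] = counts.get(item, 0) + 1
        frequent = {i for i, c in counts.items() if c >= min_support}
        reduced = [t & frequent for t in kept]
        reduced = [t for t in reduced if t]
        if reduced == data:
            return reduced
        data = reduced


def at_least_two_levels(transaction):
    """Monotone constraint countVal >= 2 over (column, value) items."""
    return len({value for _, value in transaction}) >= 2


toy_db = [
    {("y1", 2), ("y2", 2), ("y3", 2)},
    {("y1", 2), ("y2", 3)},
    {("y1", 2), ("y2", 3), ("y3", 1)},
    {("y1", 0), ("y3", 2)},
]
bonsai = exante_reduce(toy_db, min_support=2, monotone_ok=at_least_two_levels)
```

Removing the first transaction makes the item `("y3", 2)` infrequent, whose removal in turn empties the last transaction. Each reduction therefore creates new opportunities for the other, which is why they are iterated.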
IndexSpanPG: IndexSpan with sequential pattern constraints
The work of Pei et al. [27] provides principles to extend pattern-growth searches with prefix-based database projections and no candidate generation in order to effectively incorporate regular expressions and constraints with nice properties. To this end, the prefix-monotone property is defined. A constraint is called prefix-monotone if it is prefix antimonotonic or prefix monotonic. With a prefix-monotone constraint, there is only the need to search in the projected databases for prefixes that satisfy the constraint. When a constraint C is: (1) prefix antimonotonic, if C(P) = false, then there exists no sequential pattern that has P as a prefix and also satisfies C; (2) prefix monotonic, if C(P) = true, then every sequential pattern having P as a prefix satisfies C; and (3) a regular expression, if the prefix of a given sequential pattern conflicts with the regular expression C, then there is no need to further expand it (i.e. there are no sequential patterns with the same prefix that also satisfy C). As such, since monotonic, antimonotonic and regular expression constraints are prefix-monotone, they can be pushed deep into the search. Understandably, the efficiency gains associated with such constraints cannot be attained under Apriori-based searches [41]. Although succinct constraints are not necessarily prefix antimonotonic or prefix monotonic, they can also be easily pushed deep into the mining process (independently of the applied SPM method).
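How a prefix antimonotonic constraint stops the growth of a prefix can be sketched with a minimal prefix-projection search over sequences of single items. This is a PrefixSpan-style illustration written for this purpose, not IndexSpanPG: the function name, the `prefix_ok` callback and the toy sequences are assumptions.

```python
def prefix_growth(sequences, min_support, prefix_ok, prefix=()):
    """Yield (pattern, support) pairs via prefix projection.

    sequences: list of lists of items (the database projected on `prefix`)
    prefix_ok: predicate on a pattern, assumed prefix antimonotonic, so a
               failing prefix is never extended.
    """
    counts = {}
    for seq in sequences:
        for item in set(seq):
            counts[item] = counts.get(item, 0) + 1
    for item in sorted(counts):
        if counts[item] < min_support:
            continue
        pattern = prefix + (item,)
        if not prefix_ok(pattern):
            continue  # no pattern with this prefix can satisfy the constraint
        yield pattern, counts[item]
        # Project: keep what follows the first occurrence of the item.
        projected = [seq[seq.index(item) + 1:] for seq in sequences if item in seq]
        yield from prefix_growth(projected, min_support, prefix_ok, pattern)


# Hypothetical column orderings for three rows (order-preserving setting).
orderings = [
    ["y2", "y1", "y3"],
    ["y2", "y3", "y1"],
    ["y1", "y2", "y3"],
]
# Regular-expression-like constraint: patterns must start with y2.
starts_with_y2 = lambda pattern: pattern[0] == "y2"
found = list(prefix_growth(orderings, 2, starts_with_y2))
```

Prefixes starting with `y1` or `y3` are rejected before any projection is computed, so their projected databases are never built.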
According to these principles, we extended IndexSpan [18], an extension of PrefixSpan that explores efficiency gains from the intrinsic properties of the order-preserving biclustering task. IndexSpan is compliant with the enumerated principles. The minimalist data structures, fast database projections and early pruning techniques [18] do not interfere with the underlying prefix-growth behavior, the essential requirement to incorporate prefix-monotone constraints. Furthermore, given that IndexSpan explores item-indexable properties associated with the order-preserving biclustering task, testing constraints is done in an efficient and elegant way (see Algorithm 3). This is true with regards to both: (1) the validation of whether an antimonotonic constraint (or regular expression) cannot be satisfied by a given prefix (in order to stop its growth), and (2) the validation of whether a monotonic constraint cannot be satisfied by a given (projected) sequence (in order to prune the search).
BiC2PAM: algorithmic details
Understandably, the behavior and performance of Algorithm 1 are essentially dependent on the underlying domain-driven pattern mining searches. Algorithms 2 and 3 respectively describe F2GBonsai and IndexSpanPG in accordance with the pruning principles respectively introduced in the "F2GBonsai: F2G with itemset constraints" and "IndexSpanPG: IndexSpan with sequential pattern constraints" sections. In F2GBonsai, reductions of the search space are efficiently applied during the creation of the initial FP-tree and of each conditional FP-tree (lines 7 and 32). Succinct, monotone, frequency and antimonotone reductions are efficiently applied in this order. In IndexSpanPG, the pruning of sequences or items conflicting with sequential constraints is done after the initial construction of the item-indexable database and after each database projection (lines 6, 24 and 29). Moreover, the growing of a given prefix is stopped whenever the prefix contradicts an antimonotonic constraint or regular expression (lines 21 and 26). In order to avoid an unnecessary overhead for biclustering tasks in the presence of a high number of constraints, the pruning principles in F2GBonsai and IndexSpanPG might be applied only for certain database projections. In this case, the periodicity \(\tau\) of projections eligible for pruning should be given as input to the algorithms (\(\tau\) = 1 by default).
The computational complexity of BiC2PAM is bounded by the complexity of the pattern-based biclustering task in the absence of constraints. The complexity of pattern-based biclustering tasks for dense and sparse matrices can be respectively consulted in the documentation of BicPAM [14] and BicNET [3].
BiC2PAM also provides default behaviors in order to guarantee a friendly environment for users without expertise in biclustering. To this end, BiC2PAM makes available: (1) default parameterizations (data-independent setting) and (2) dynamic parameterizations (data-dependent setting). Default parameterizations include: (1) zero-mean row-oriented normalization followed by overall Gaussian discretization with n/4 items for order-preserving coherencies (for an adequate tradeoff of precedences vs. co-occurrences) and a set of \(\{3,5,7\}\) items for the remaining coherencies; (2) iterative discovery of biclusters with distinct coherencies (constant, symmetric, additive and order-preserving); (3) F2GBonsai search for closed FIM and association rule mining, and IndexSpanPG search for SPM; (4) multi-item assignments; (5) merging of biclusters with over 70 % Jaccard-based similarity; (6) a filtering procedure for biclusters without statistical significance (according to [49]) and with a 60 % Jaccard-based similarity against a larger bicluster; and (7) no constraints. For the default setting, BiC2PAM iteratively decreases the support threshold by 10 % (starting with \(\theta\) = 80 %) until the output solution contains 50 dissimilar biclusters or reaches a minimum coverage of 10 % of the inputted matrix elements or network interactions. Dynamic parameterizations enable the: (1) selection of data-driven normalization and discretization procedures according to their fitting error, and (2) activation of data partitioning procedures for large matrices: over 100 million elements (excluding missing values) for the discovery of constant biclusters and over 1 million elements for the remaining coherencies.
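The default support-lowering loop can be sketched as follows. This is a hedged illustration only: the names `mine`, `is_enough` and `floor` are hypothetical, the 10 % decrease is assumed to be multiplicative (the text does not say whether it is relative or absolute), and the toy miner merely stands in for a biclustering run.

```python
def discover_with_decreasing_support(mine, is_enough, theta=0.80, decay=0.10, floor=0.05):
    """Lower the support threshold until the stopping test passes.

    mine:      callable theta -> list of biclusters (stand-in for the search)
    is_enough: callable on that list implementing the stopping criterion,
               e.g. 50 dissimilar biclusters or 10 % coverage
    floor:     hypothetical lower bound that guarantees termination
    """
    while True:
        biclusters = mine(theta)
        if is_enough(biclusters) or theta <= floor:
            return theta, biclusters
        # Assumed multiplicative decrease: theta <- theta * (1 - 10 %).
        theta = max(floor, theta * (1.0 - decay))


calls = []


def toy_miner(theta):
    """Toy stand-in: each call with a lower threshold finds 10 more biclusters."""
    calls.append(theta)
    return [f"B{k}" for k in range(10 * len(calls))]


final_theta, found_biclusters = discover_with_decreasing_support(
    toy_miner, lambda bs: len(bs) >= 30
)
```

Starting from a high threshold keeps the first runs cheap, since fewer patterns are frequent; the threshold is relaxed only while the solution is still too small.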
Results
This section provides empirical evidence of the soundness of the proposed contributions and of the relevance of using constraints within (pattern-based) biclustering to prune the search space and guarantee biologically significant solutions. To this end, we assessed the performance of BiC2PAM on synthetic data, gene expression data and biological networks in the presence of domain knowledge. BiC2PAM was parameterized with default behavior and applied with F2GBonsai for the discovery of constant biclusters with itemset constraints and with IndexSpanPG for the discovery of order-preserving biclusters with sequential pattern constraints. The stopping criterion of BiC2PAM was specified as a minimum of 20 dissimilar biclusters for synthetic data contexts and 50 dissimilar biclusters for real data contexts. BiC2PAM is implemented in Java (JVM v1.6.0-24). The experiments were run on an Intel Core i5 2.30 GHz with 6 GB of RAM.
Results on synthetic data
Synthetic data
Properties of the generated dataset settings.
| Non-exhaustive list of matrices (\(\sharp\)rows \(\times\) \(\sharp\)columns) | 500 × 50 | 1000 × 100 | 2000 × 200 | 4000 × 400 |
| --- | --- | --- | --- | --- |
| Number of hidden biclusters (K) | \(6\times \frac{1}{\mu }\) | \(10\times \frac{1}{\mu }\) | \(15\times \frac{1}{\mu }\) | \(20\times \frac{1}{\mu }\) |
| Number of rows per hidden bicluster | \(\mu\)[50,70] | \(\mu\)[70,100] | \(\mu\)[100,200] | \(\mu\)[200,300] |
| Number of columns per hidden bicluster | \(\mu\)[5,7] | \(\mu\)[7,10] | \(\mu\)[8,12] | \(\mu\)[10,15] |
Uninformative elements
Incorporating annotations
Itemset constraints
Sequential pattern constraints
Fullpattern growth searches
Results on biological data
Real data
Uninformative elements
Annotations
Succinct, monotone and convertible constraints
Figures 16 and 17 show the impact of inputting biologically meaningful constraints on the efficiency and effectiveness of BiC2PAM. For this purpose, we used the complete gasch dataset (6152 × 176) [54] with five levels of expression (\(\mathcal {L}\) = 6). The impact of considering a diverse set of constraints on the efficiency levels of BiC2PAM is provided in Fig. 16. The observed results demonstrate the relevance of using meaningful constraints with succinct, (anti)monotone and convertible properties not only to guarantee a user-guided focus on specific regions of interest, but also to make biclustering tractable for computationally complex biological problems and analyses.
Conclusions and future work
This work motivates the relevance of constraint-guided biclustering for biological data analysis with domain knowledge. To address this task, we explored the synergies between pattern-based biclustering and domain-driven pattern mining. As a result, the BiC2PAM algorithm was proposed with two major goals: (1) to learn biclustering models in the presence of an arbitrary number of annotations from knowledge repositories and literature, and (2) to effectively incorporate constraints with nice properties derived from user expectations. BiC2PAM can therefore be applied in the presence of domain knowledge to guarantee a focus on relevant regions and explore potentially high efficiency gains.
We further demonstrated the consistency between domain-driven pattern mining and pattern-based biclustering based on the notion of full-patterns; surveyed the major drawbacks of existing research towards this end; and extended pattern-growth searches with state-of-the-art principles to prune the search space by pushing constraints with nice properties deep into the mining process. In particular, we showed the compliance of F2G searches with principles to effectively prune (conditional) FP-trees, and the compliance of IndexSpan searches with principles to effectively prune prefix-growth structures. These searches were respectively extended to support pattern-based biclustering with constant and order-preserving assumptions.
Meaningful constraints with succinct, monotone, antimonotone and convertible properties were presented for distinct biological tasks (gene expression analysis and network data analysis) in order to focus the search space on less-trivial yet coherent regions.
Results from synthetic and real data show that the incorporation of background knowledge leads to large efficiency gains that make the biclustering task tractable for large-scale data. We further provide initial evidence of the relevance of the supported types of constraints to discover non-trivial yet meaningful biclusters in expression and network data with heightened biological significance.
Four major directions are identified for future work. First, the extension of the proposed contributions towards classification tasks based on the discriminative properties of biclusters in labeled data contexts. Second, an in-depth systematization of constraints with nice properties across biological data domains, including a structured view on their relevance for omic, genome-wide and chemical data analysis. Third, a broader quantification of the impact of incorporating constraints across these data domains. Finally, the extension of the proposed framework to the tasks of biclustering time series data and triclustering multivariate time series data in the presence of temporal constraints.
Data and software availability
The datasets and BiC2PAM software are available in http://web.ist.utl.pt/rmch/software/bic2pam/.
Biclustering involves combinatorial optimization to select and group rows and columns, and it is known to be an NP-hard problem (proven by mapping the problem of finding a maximum edge (bi)clique in a bipartite graph into the problem of finding dense biclusters with maximum size [2, 10]). The problem complexity increases for non-binary data contexts and when elements are allowed to participate in more than one bicluster (non-exclusive structure) and in no bicluster at all (non-exhaustive structure).
Abbreviations
BicNET: Biclustering NETworks (algorithm)
BiC2PAM: BiClustering with Constraints using PAttern Mining (algorithm)
BicPAM: BiClustering using PAttern Mining (algorithm)
BicSPAM: Biclustering using Sequential PAttern Mining (algorithm)
BiModule: Biclustering Modules (algorithm)
BiP: Biclustering Plaid models (algorithm)
DeBi: Differentially expressed Biclustering (algorithm)
F2G: Full Frequent-pattern Growth
FIM: Frequent Itemset Mining
FP: Frequent Pattern
GO: Gene Ontology
SPM: Sequential Pattern Mining
Declarations
Authors’ contributions
RH designed the algorithms under the close supervision of SCM. Both authors revised the final manuscript. Both authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Ethical approval and consent to participate
Not Applicable. The manuscript does not report new studies involving any animal or human data or tissue.
Funding and acknowledgments
This work was supported by Fundação para a Ciência e Tecnologia under the project Neuroclinomics2 PTDC/EEISII/1937/2014, InescID plurianual with reference UID/CEC/50021/2013, the research Grant SFRH/BD/75924/2011 to RH, and the sabbatical leave Grant SFRH/BSAB/1427/2014 to SCM. SCM was also partially funded by the EURIAS Fellowship Programme and the European Commission (MarieSklodowskaCurie actions CoFUND ProgrammeFP7) through a grant for a junior fellowship position at Istituto di Studi Avanzati, University of Bologna, Italy.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
 Henriques R, Antunes C, Madeira SC. A structured view on pattern miningbased biclustering. Pattern Recogn. 2015;48(12):3941–58.View ArticleGoogle Scholar
 Madeira SC, Oliveira AL. Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans Comput Biol Bioinformatics. 2004;1:24–45.View ArticlePubMedGoogle Scholar
 Henriques R, Madeira SC. BicNET: flexible module discovery in largescale biological networks using biclustering. Algorithms Mol Biol. 2016;11:1–30.View ArticleGoogle Scholar
 Fang G, Haznadar M, Wang W, Yu H, Steinbach M, Church TR, Oetting WS, Van Ness B, Kumar V, Highorder SNP combinations associated with complex diseases: efficient discovery, statistical power and functional interactions. Plos One. 2012;7:e33531. doi:10.1371/journal.pone.0033531.View ArticlePubMedPubMed CentralGoogle Scholar
 Guerra I, Cerf L, Foscarini J, Boaventura M, Meira W. Constraintbased search of straddling biclusters and discriminative patterns. JIDM. 2013;4(2):114–23.Google Scholar
 Kuznetsov SO, Poelmans J. Knowledge representation and processing with formal concept analysis. Wiley Interdisc Rev Data Min Knowl Discov. 2013;3(3):200–15.View ArticleGoogle Scholar
 Visconti A, Cordero F, Pensa RG. Leveraging additional knowledge to support coherent bicluster discovery in gene expression data. Intell Data Anal. 2014;18(5):837–55.Google Scholar
 Martinez R, Pasquier C, Pasquier N, Martinez R, Pasquier C, Pasquier N. GenMiner: mining informative association rules from genomic data. In BIBM. Washington, D.C.: IEEE CS; 2007.
 Nepomuceno JA, Troncoso A, Nepomuceno-Chamorro IA, Aguilar-Ruiz JS. Integrating biological knowledge based on functional annotations for biclustering of gene expression data. Computer Methods Programs Biomed. 2015;119(3):163–80.
 Peeters R. The maximum edge biclique problem is NP-complete. Discrete Appl Math. 2003;131(3):651–4.
 Hochreiter S, Bodenhofer U, Heusel M, Mayr A, Mitterecker A, Kasim A, Khamiakova T, Van Sanden S, Lin D, Talloen W, Bijnens L, Göhlmann HWH, Shkedy Z, Clevert DA. FABIA: factor analysis for bicluster acquisition. Bioinformatics. 2010;26(12):1520–7.
 Serin A, Vingron M. DeBi: discovering differentially expressed biclusters using a frequent itemset approach. Algorithms Mol Biol. 2011;6:1–12.
 Okada Y, Okubo K, Horton P, Fujibuchi W. Exhaustive search method of gene expression modules and its application to human tissue data. IAENG Int J Comput Sci. 2007;34:119–26.
 Henriques R, Madeira S. BicPAM: pattern-based biclustering for biomedical data analysis. Algorithms Mol Biol. 2014;9:27.
 Pei J, Han J. Can we push more constraints into frequent pattern mining? In: KDD. New York: ACM; 2000. p. 350–4.
 Bonchi F, Lucchese C. Extending the state-of-the-art of constraint-based pattern discovery. Data Knowl Eng. 2007;60(2):377–99.
 Henriques R, Madeira SC, Antunes C. F2G: efficient discovery of full-patterns. In: ECML/PKDD nfMCP. Prague; 2013.
 Henriques R, Antunes C, Madeira S. Methods for the efficient discovery of large item-indexable sequential patterns. In: Appice A, Ceci M, Loglisci C, Manco G, Masciari E, Ras ZW, editors. New frontiers in mining complex patterns. Lecture Notes in Computer Science, vol 8399. Springer; 2014. p. 100–116.
 Henriques R, Madeira S. BicSPAM: flexible biclustering using sequential patterns. BMC Bioinform. 2014;15:130.
 Henriques R, Madeira S. Biclustering with flexible plaid models to unravel interactions between biological processes. IEEE/ACM Trans Comput Biol Bioinform. 2015;12:738–52.
 Okada Y, Fujibuchi W, Horton P. A biclustering method for gene expression module discovery using closed itemset enumeration algorithm. IPSJ Trans Bioinform. 2007;48(SIG5):39–48.
 Henriques R, Madeira SC. BicNET: efficient biclustering of biological networks to unravel nontrivial modules. In: Algorithms in bioinformatics (WABI), LNCS. Berlin: Springer-Verlag; 2015.
 Marriott K, Stuckey P. Programming with constraints: an introduction. adaptive computation and machine. Cambridge: MIT Press; 1998.
 Pei J, Han J. Constrained frequent pattern mining: a pattern-growth view. SIGKDD Explor Newslett. 2002;4:31–9.
 Tan PN, Kumar V, Srivastava J. Selecting the right interestingness measure for association patterns. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’02. New York: ACM; 2002. p. 32–41.
 Alves R, Rodríguez-Baena DS, Aguilar-Ruiz JS. Gene association analysis: a survey of frequent pattern mining from gene expression data. Briefings Bioinform. 2010;11(2):210–24.
 Pei J, Han J, Wang W. Constraint-based sequential pattern mining: the pattern-growth methods. J Intell Inf Syst. 2007;28(2):133–60.
 Mouhoubi K, Létocart L, Rouveirol C. A knowledge-driven biclustering method for mining noisy datasets. In: Neural information processing. Berlin: Springer; 2012. p. 585–93.
 Henriques R, Antunes C, Madeira S. Generative modeling of repositories of health records for predictive tasks. Data Min Knowl Discov. 2015;29(4):999–1032. doi:10.1007/s10618-014-0385-7.
 Besson J, Robardet C, De Raedt L, Boulicaut JF. Mining bi-sets in numerical data. In: Knowledge discovery in inductive databases. Berlin: Springer; 2007. p. 11–23.
 Ng RT, Lakshmanan LVS, Han J, Pang A. Exploratory mining and pruning optimizations of constrained associations rules. SIGMOD R. 1998;27(2):13–24.
 Khiari M, Boizumault P, Crémilleux B. Constraint programming for mining n-ary patterns. In: Principles and practice of constraint programming. Berlin: Springer; 2010. p. 552–67.
 Bonchi F, Goethals B. FP-Bonsai: the art of growing and pruning small FP-trees. In: Dai H, Srikant R, Zhang C, editors. Advances in knowledge discovery and data mining. Berlin Heidelberg: Springer; 2004. p. 155–60.
 Bonchi F, Giannotti F, Mazzanti A, Pedreschi D. ExAnte: a preprocessing method for frequent-pattern mining. IEEE Intell Syst. 2005;20(3):25–31.
 Srikant R, Vu Q, Agrawal R. Mining association rules with item constraints. KDD. 1997;97:67–73.
 Wang K, He Y, Han J. Pushing support constraints into association rules mining. IEEE Trans Knowl Data Eng. 2003;15(3):642–58.
 Bayardo RJ, Agrawal R, Gunopulos D. Constraint-based rule mining in large, dense databases. In: 15th international conference on data engineering. New York: IEEE; 1999. p. 188–97.
 Baralis E, Cagliero L, Cerquitelli T, Garza P. Generalized association rule mining with constraints. Inf Sci. 2012;194:68–84.
 Srikant R, Agrawal R. Mining sequential patterns: generalizations and performance improvements. In: Proceedings of the 5th international conference on extending database technology: advances in database technology, EDBT ’96. London: Springer-Verlag; 1996. p. 3–17.
 Mannila H, Toivonen H, Verkamo AI. Discovery of frequent episodes in event sequences. Data Min Knowl Discov. 1997;1(3):259–89.
 Garofalakis MN, Rastogi R, Shim K. SPIRIT: sequential pattern mining with regular expression constraints. VLDB. 1999;99:7–10.
 Pei J, Han J, Wang W. Mining sequential patterns with constraints in large databases. In: Proceedings of the eleventh international conference on information and knowledge management. New York: ACM; 2002. p. 18–25.
 Antunes C, Oliveira AL. Generalization of pattern-growth methods for sequential pattern mining with gap constraints. In: Machine learning and data mining in pattern recognition. Berlin: Springer; 2003. p. 239–51.
 Han J, Cheng H, Xin D, Yan X. Frequent pattern mining: current status and future directions. Data Min Knowl Discov. 2007;15:55–86.
 Mabroukeh NR, Ezeife CI. A taxonomy of sequential pattern mining algorithms. ACM Comput Surv. 2010;43:3:1–41.
 Martin D, Brun C, Remy E, Mouren P, Thieffry D, Jacq B. GOToolBox: functional analysis of gene datasets based on gene ontology. Gen Biol. 2004;12:101.
 MacPherson JI, Dickerson J, Pinney J, Robertson D. Patterns of HIV-1 protein interaction identify perturbed host-cellular subsystems. PLoS Comput Biol. 2010;6(7):e1000863.
 Mukhopadhyay A, Maulik U, Bandyopadhyay S. A novel biclustering approach to association rule mining for predicting HIV-1–human protein interactions. PLoS One. 2012;7(4):e32289.
 Henriques R. Learning from high-dimensional data using local descriptive models. PhD thesis, Instituto Superior Tecnico, Universidade de Lisboa, Lisboa; 2016.
 Rosenwald A. dlblc team: the use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med. 2002;346(25):1937–47.
 Lee W, Tillo D, Bray N, Morse RH, Davis RW, Hughes TR, Nislow C. A high-resolution atlas of nucleosome occupancy in yeast. Nat Genet. 2007;39(10):1235–44.
 Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D, Brown PO. Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell. 2000;11(12):4241–57.
 Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al. STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucl Acids Res. 2015;43:D447–52.
 Gasch AP, Werner-Washburne M. The genomics of yeast responses to environmental stress and starvation. Funct Integr Genom. 2002;2(4–5):181–92.