BiC2PAM: constraint-guided biclustering for biological data analysis with domain knowledge

Table 1 Properties of the generated dataset settings.

Non-exhaustive list of matrices (\(\sharp\)rows \(\times\) \(\sharp\)columns)	500 × 50	1000 × 100	2000 × 200	4000 × 400
Number of hidden biclusters (K)	\(6\times \frac{1}{\mu }\)	\(10\times \frac{1}{\mu }\)	\(15\times \frac{1}{\mu }\)	\(20\times \frac{1}{\mu }\)
Number of rows per hidden bicluster	\(\mu\)[50,70]	\(\mu\)[70,100]	\(\mu\)[100,200]	\(\mu\)[200,300]
Number of columns per hidden bicluster	\(\mu\)[5,7]	\(\mu\)[7,10]	\(\mu\)[8,12]	\(\mu\)[10,15]

where \(\mu\) defines the flexibility of the underlying coherency assumption (\(\mu\) = 1 for constant and \(\mu\) = 2 for order-preserving)
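The scaling by \(\mu\) in Table 1 can be made concrete with a small sketch. It assumes that \(\mu\)[a,b] denotes the interval [\(\mu\)a, \(\mu\)b], that \(\mu\) is an integer, and that K is rounded to the nearest integer when \(\frac{1}{\mu}\) does not divide evenly; none of these conventions is stated in the table. The names `BASE_SETTINGS`, `scaled_setting`, and `sample_bicluster_shapes` are hypothetical and are not part of BiC2PAM.

```python
import random

# Base values copied from Table 1, keyed by matrix size (rows, columns).
BASE_SETTINGS = {
    (500, 50): {"k": 6, "rows": (50, 70), "cols": (5, 7)},
    (1000, 100): {"k": 10, "rows": (70, 100), "cols": (7, 10)},
    (2000, 200): {"k": 15, "rows": (100, 200), "cols": (8, 12)},
    (4000, 400): {"k": 20, "rows": (200, 300), "cols": (10, 15)},
}


def scaled_setting(size, mu):
    """Apply the mu scaling: fewer but larger biclusters as mu grows.

    Assumption: mu[a, b] means [mu * a, mu * b]; K is rounded to an integer.
    """
    base = BASE_SETTINGS[size]
    return {
        "k": max(1, round(base["k"] / mu)),
        "rows": (mu * base["rows"][0], mu * base["rows"][1]),
        "cols": (mu * base["cols"][0], mu * base["cols"][1]),
    }


def sample_bicluster_shapes(size, mu, seed=0):
    """Draw one (rows, columns) shape per hidden bicluster, uniformly at random.

    Assumption: uniform sampling within the scaled intervals.
    """
    rng = random.Random(seed)
    setting = scaled_setting(size, mu)
    return [
        (rng.randint(*setting["rows"]), rng.randint(*setting["cols"]))
        for _ in range(setting["k"])
    ]


if __name__ == "__main__":
    # Constant (mu = 1) versus order-preserving (mu = 2) on the smallest matrix.
    print(scaled_setting((500, 50), 1))
    print(scaled_setting((500, 50), 2))
```

Under these assumptions, the order-preserving setting (\(\mu\) = 2) on the 500 × 50 matrix plants 3 biclusters with 100 to 140 rows and 10 to 14 columns each, instead of 6 biclusters with 50 to 70 rows and 5 to 7 columns.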
Additional properties (default settings in bold):
Coherency strength \(\delta\) = {5, 10, 15, 20, 25, 33 %} (or number of symbols \(|\mathcal {L}|\) = {20, 10, 7, 5, 4, 3})
Deviations on data values in {0, \(\boldsymbol{\delta}/2\), \(\delta\), 2\(\delta\)}, and degree of noisy and missing elements in {0, 2, 5, 10 %}
Overlapping degree \(\theta\) = {0, 0.1, 0.2, 0.4}, with plaid effects\(^2\) described by f = {sum, product, weighted} (cumulative function), \(\nu\) = {1, 0.7, 0.4} (cumulative effect), \(\epsilon\) = {0.1, 0.2} (noise), \(\kappa\) = {0.5, 0.3, 0.1 K} (average number of interacting biclusters), and \(\phi\) = {1, 0.8, 0.5} (distribution of overlapping areas between the \(\kappa\) biclusters); variables according to [20]
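The pairing between coherency strength \(\delta\) and the number of symbols \(|\mathcal{L}|\) listed above can be checked with a short sketch. It assumes that \(\delta\) is the share of the value range covered by one symbol, so that \(\delta\) = 100 % / \(|\mathcal{L}|\); this relation is inferred from the listed values and is not stated in the table. The function names are hypothetical.

```python
# Listed pairs from the text: alphabet size |L| -> coherency strength in percent.
LISTED = {20: 5, 10: 10, 7: 15, 5: 20, 4: 25, 3: 33}


def coherency_strength_percent(alphabet_size: int) -> float:
    """Assumed relation: delta (%) = 100 / |L|, one symbol's share of the range."""
    if alphabet_size <= 0:
        raise ValueError("alphabet size must be positive")
    return 100.0 / alphabet_size


def max_gap_to_listed() -> float:
    """Largest absolute gap (percentage points) between 100 / |L| and the listed delta."""
    return max(
        abs(coherency_strength_percent(size) - percent)
        for size, percent in LISTED.items()
    )


if __name__ == "__main__":
    for size, percent in LISTED.items():
        print(size, percent, round(coherency_strength_percent(size), 2))
```

Under this assumption the listed values agree with 100 / \(|\mathcal{L}|\) to within one percentage point; the largest gap is for \(|\mathcal{L}|\) = 7, where 100 / 7 is about 14.3 % and the listed value is 15 %.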

ISSN: 1748-7188