# Identification of alternative topological domains in chromatin

- Darya Filippova^{1,2}
- Rob Patro^{1}
- Geet Duggal^{1,2}
- Carl Kingsford^{1}

*Algorithms for Molecular Biology* 2014, **9**:14

**DOI:** 10.1186/1748-7188-9-14

© Filippova et al.; licensee BioMed Central Ltd. 2014

**Received:** 7 December 2013

**Accepted:** 14 April 2014

**Published:** 3 May 2014

### Abstract

Chromosome conformation capture experiments have led to the discovery of dense, contiguous, megabase-sized topological domains that are similar across cell types and conserved across species. These domains are strongly correlated with a number of chromatin markers and have since been included in a number of analyses. However, functionally relevant domains may exist at multiple length scales. We introduce a new and efficient algorithm that is able to capture persistent domains across various resolutions by adjusting a single scale parameter. The ensemble of domains we identify allows us to quantify the degree to which the domain structure is hierarchical as opposed to overlapping, and our analysis reveals a pronounced hierarchical structure in which larger stable domains tend to completely contain smaller domains. The identified novel domains are substantially different from domains reported previously and are highly enriched at their boundaries for binding of the insulator protein CTCF and for histone marks.

### Keywords

Alternative topological domains; Chromatin conformation capture; Dynamic programming

## Background

Chromatin interactions obtained from a variety of recent experimental techniques in chromosome conformation capture (3C) [1] have significantly advanced our understanding of the geometry of chromatin structure [2], its relation to the regulation of gene expression, nuclear organization, cancer translocations [3], and copy number alterations in cancer [4]. Recently, dense, contiguous regions of chromatin termed *topological domains* have been discovered in both mammals [5] and fruit flies [6]. Topological domains have since been incorporated into many subsequent analyses [7–9] because they are persistent across cell types, conserved across species, and serve as a skeleton for the placement of many functional elements of the genome [10, 11].

However, the single collection of megabase-sized domains may not be the only topologically and functionally relevant collection of domains. On closer inspection of the block-diagonal matrix structure in Figure 1, it becomes clear that there are alternative contiguous regions of the chromosome that self-interact frequently and are likely more spatially compact than their surrounding regions (dotted lines). Some of these regions appear to be completely nested within others, suggesting a hierarchy of compact regions along the chromosome, while others appear to overlap each other. These observations suggest that functionally relevant chromosomal domains may exist at multiple scales, potentially forming a hierarchy of domains or a more complex relationship between domains.

We introduce a new algorithm to efficiently identify topological domains in 3C interaction matrices for a given domain-length scaling factor *γ*. Our formulation of this problem as a dynamic program allows for an efficient traversal of the solution space to obtain alternative optimal and near-optimal domain sets. Our results suggest that there exist a handful of characteristic resolutions across which domains are similar. Based on this finding, we identify a consensus set of domains that persists across various resolutions. We find that the domains discovered by our algorithm are dense: their internal interactions occur at a higher frequency than inter-domain interactions. Additionally, we show that inter-domain regions within the consensus domain set are highly enriched for the insulator protein CTCF and for histone modification marks. We analyze a set of domains from multiple optimal domain sets across scales and establish that the organization of domains is highly hierarchical, suggesting that the generated domains can be used as the basis for understanding the hierarchical organization of the genome and its role in gene regulation. We argue that our straightforward approach retains the essence of the more complex multi-parameter HMM introduced in [5] while allowing for the flexibility to identify biologically relevant domain structures at various scales.

## Problem definition

Given the resolution of the 3C experiment (say, 40 kbp), the chromosome is broken into $n$ evenly sized fragments. 3C contact maps record interactions between different sections of the chromosome in the form of a weighted adjacency matrix $\mathbf{A}$, where two fragments $i$ and $j$ interact with frequency $\mathbf{A}_{ij}$.

### Problem 1 (Resolution-specific domains)

Given an $n \times n$ weighted adjacency matrix $\mathbf{A}$ and a resolution parameter $\gamma \ge 0$, we wish to identify a set of domains $D_\gamma$, where each domain is represented as an interval $d_i = [a_i, b_i]$ with $1 \le a_i < b_i \le n$, such that no two domains $d_i$ and $d_j$ overlap for any $i \ne j$. Additionally, each domain should have a larger interaction frequency within the domain than to its surrounding regions.

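In particular, we seek a set of domains $D_\gamma$ that optimizes the following objective:

$$\max_{D_\gamma} \sum_{[a_i,b_i] \in D_\gamma} q(a_i, b_i, \gamma) \tag{1}$$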

where $D_\gamma$ is chosen from the set of all possible domains, and $q$ is a function that quantifies the quality of a domain $[a_i, b_i]$ at resolution $\gamma$. Here, the parameter $\gamma$ is inversely related to the average domain size in $D_\gamma$: lower $\gamma$ results in sets of larger domains, and higher $\gamma$ corresponds to sets of smaller domains. Because domains are required to contain consecutive fragments of the chromosome, this problem differs from clustering the graph of 3C interactions induced by $\mathbf{A}$, since such a clustering may place non-contiguous fragments of the chromosome into a single cluster. In fact, this additional requirement allows for an efficient optimal algorithm.

### Problem 2 (Consensus domains across resolutions)

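Given a weighted adjacency matrix $\mathbf{A}$ and a set of resolutions $\Gamma = \{\gamma_1, \gamma_2, \ldots\}$, identify a set of non-overlapping domains $D_c$ that are most persistent across resolutions in $\Gamma$:

$$\max_{D_c} \sum_{[a_i,b_i] \in D_c} p(a_i, b_i, \Gamma) \tag{2}$$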

where $D_c$ is the set of non-overlapping persistent domains across resolutions, and $p(a_i, b_i, \Gamma)$ is the persistence of domain $[a_i, b_i]$, corresponding to how often it appears across resolutions.

## Algorithms

### Domain identification at a particular resolution

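Objective (1) can be optimized at a single resolution $\gamma$ with the following dynamic program:

$$\mathrm{OPT}_1(l) = \max_{k<l} \left\{ \mathrm{OPT}_1(k-1) + \max\{q(k,l,\gamma), 0\} \right\} \tag{3}$$

where $\mathrm{OPT}_1(l)$ is the optimal solution for objective (1) for the sub-matrix defined by the first $l$ positions on the chromosome ($\mathrm{OPT}_1(0)=0$). The choice of $k$ encodes the size of the domain immediately preceding location $l$. We define negative-scoring domains as non-domains; as such, only domains with $q>0$ in the max term in (3) are retained.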

The quality function $q$ is:

$$q(k,l,\gamma) = s_\gamma(k,l) - \mu_s(l-k) \tag{4}$$

where

$$s_\gamma(k,l) = \frac{A(k,l)}{(l-k)^\gamma} \tag{5}$$

is a *scaled density* of the subgraph induced by the interactions $\mathbf{A}_{gh}$ between genomic loci $k$ and $l$, and $A(k,l)$ is the sum of the upper-triangular portion of the submatrix of $\mathbf{A}$ defined by the domain interval $[k,l]$. When $\gamma=1$, the scaled density is the weighted subgraph density [12] of the subgraph induced by the fragments between $k$ and $l$, i.e., $A(k,l)$ divided by the domain length $l-k$. When $\gamma=2$, the scaled density is half the internal density of a graph cluster [13]. For larger values of $\gamma$, the length of a domain in the denominator is amplified; hence, smaller domains produce larger objective values than bigger domains with similar interaction frequencies. Equation (4) zero-centers the scaled density (5): $\mu_s(l-k)$ is the mean value of (5) over all sub-matrices of length $l-k$ along the diagonal of $\mathbf{A}$, and can be pre-computed for a given $\mathbf{A}$. We disallow domains for which fewer than 100 sub-matrices are available to compute the mean. By doing this, we only exclude domains larger than $n-100$ fragments, which in practice means disallowing domains that are hundreds of megabases long. Values for the numerator in (5) are also pre-computed using an efficient algorithm [14], resulting in an overall run-time of $O(n^2)$ to compute $\mathrm{OPT}_1(n)$.

### Enumerating multiple optimal and near-optimal solutions

The set of domains found by the dynamic program in Equation 3 may not be the only set obtaining the maximum value of $\mathrm{OPT}_1(\cdot)$: there may be multiple optimal solutions, as well as solutions that are near-optimal. The domain structures that appear in alternative optimal or near-optimal solutions are of interest, especially if they are significantly different, since they represent a potentially diverse array of alternative domains that are excluded from the initially computed optimal solution only by the arbitrary tie-breaking in the dynamic program or by a marginally lower score. We wish to account for such alternative solutions by enumerating them efficiently in order of decreasing score.

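Because the recurrence in Equation 3 allows non-domains (regions where $q(k,l,\gamma) \le 0$) to be split arbitrarily without affecting the optimal score, we modified the procedure as shown in Equation 6 to explicitly disallow adjacent non-domains:

$$\mathrm{OPT}'_1(l) = \max\left\{ \mathrm{OPT}_D(l),\ \max_{k<l} \mathrm{OPT}_D(k-1) \right\} \tag{6}$$

where the optimal score for $l$ ending a domain is

$$\mathrm{OPT}_D(l) = \max_{k<l} \left\{ \mathrm{OPT}'_1(k-1) + q(k,l,\gamma) \right\} \tag{7}$$

and $\mathrm{OPT}'_1(l) = \mathrm{OPT}_D(l) = 0$ for $l \in \{0,1\}$. In Equation 6, $\max_{k<l} \mathrm{OPT}_D(k-1)$ represents the optimal score at $l$ where $l$ ends a non-domain region. This solution to Problem 1 produces a set of domains with the same optimal score as Equation 3, but guarantees that alternative optimal and near-optimal domain sets do not contain adjacent non-domains.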

To efficiently identify alternative optimal and near-optimal solutions, we use the fact that the dynamic program in Equation (6) can be conceptually represented as a directed acyclic graph $G$ in which each term of the recurrence is a node connected by an edge to every other term it depends on; for example, the overall optimum at position $l$ depends on $\mathrm{OPT}_D(l)$ and on $\{\mathrm{OPT}_D(k)\}_{k<l}$. For each edge $e=(k,l)$ in $G$, the weight of $e$ is $q'(k,l,\gamma)$. Thus, finding a set of domains with an optimal score is equivalent to finding a highest-weight path in $G$ starting from the node representing the overall optimum at position $n$. To find the top-$K$ solutions, we then find the $K$ highest-weight paths in $G$ using a standard procedure [15].
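What the modified recurrence in Equation 6 buys, namely distinct domain sets listed in decreasing order of score with no two adjacent non-domains, can be illustrated with a simple top-$K$ dynamic program that keeps the $K$ best partial solutions for each state. This is a plainer substitute for the lazy $k$-best procedure of [15], not the authors' implementation. The sketch assumes the quality function is supplied as a callable `q(k, l)` on 1-based inclusive intervals and, as in (3), lets only intervals with positive quality be labeled as domains; the name `k_best_domain_sets` is hypothetical.

```python
import heapq
from typing import Callable, List, Tuple

# A partial solution: (score, segments), each segment (start, end, is_domain),
# in 1-based inclusive fragment coordinates.
Partial = Tuple[float, Tuple[Tuple[int, int, bool], ...]]


def k_best_domain_sets(n: int, q: Callable[[int, int], float], K: int
                       ) -> List[Tuple[float, List[Tuple[int, int]]]]:
    """Top-K domain sets over fragments 1..n, best first.

    Each solution tiles 1..n with domains (score q(k, l) > 0) and
    non-domains (score 0). Two non-domains are never adjacent, so every
    tiling corresponds to a distinct set of domains.
    """
    def top(cands: List[Partial]) -> List[Partial]:
        return heapq.nlargest(K, cands, key=lambda c: c[0])

    # ends_dom[l]: best partials covering 1..l whose last segment is a domain
    # (the empty prefix counts, so a leading non-domain is allowed).
    # ends_any[l]: best partials covering 1..l with any last segment.
    ends_dom: List[List[Partial]] = [[(0.0, ())]] + [[] for _ in range(n)]
    ends_any: List[List[Partial]] = [[(0.0, ())]] + [[] for _ in range(n)]
    for l in range(1, n + 1):
        dom: List[Partial] = []
        for k in range(1, l):  # a domain [k, l] requires k < l
            score = q(k, l)
            if score > 0:
                dom += [(s + score, segs + ((k, l, True),))
                        for s, segs in ends_any[k - 1]]
        ends_dom[l] = top(dom)
        non: List[Partial] = []
        for k in range(1, l + 1):  # a non-domain [k, l] must follow a domain
            non += [(s, segs + ((k, l, False),)) for s, segs in ends_dom[k - 1]]
        ends_any[l] = top(ends_dom[l] + non)
    return [(s, [(a, b) for a, b, is_dom in segs if is_dom])
            for s, segs in ends_any[n]]
```

Keeping $K$ partial solutions per state is sufficient because any solution in the global top $K$ must extend one of the top $K$ partial solutions of every state it passes through; the lazy algorithm of [15] reaches the same result without materializing all of these lists.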

### Obtaining a consensus set of persistent domains across resolutions

For objective (2), we use the procedure above to construct the set $\mathcal{D} = \bigcup_{\gamma \in \Gamma} D_\gamma$ of domains identified at each resolution. $\mathcal{D}$ is a set of overlapping intervals, or domains, each with a quality score defined by its persistence $p$ across resolutions. To extract a set of highly persistent, non-overlapping domains from $\mathcal{D}$, we reduce Problem 2 to the weighted interval scheduling problem [16], in which competing requests to reserve a resource in time are resolved by finding the highest-priority set of non-conflicting requests. To find a consensus set of domains, we map each request, associated with an interval of time, to a domain and its corresponding interval on the chromosome. The priority of a request maps to a domain's persistence $p$ across length scales.

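This reduction can be solved with the dynamic program

$$\mathrm{OPT}_2(j) = \max\left\{ p(a_j, b_j, \Gamma) + \mathrm{OPT}_2(c(j)),\ \mathrm{OPT}_2(j-1) \right\} \tag{9}$$

where $\mathrm{OPT}_2(j)$ is the score of the optimal non-overlapping set of domains among the first $j$ domains in a list of domains sorted by their endpoints ($\mathrm{OPT}_2(0)=0$), and $c(j)$ is the closest domain before $j$ that does not overlap with $j$. The first and second terms in (9) correspond to choosing or not choosing domain $j$, respectively. We pre-compute a domain's persistence $p$ as:

$$p(a_i, b_i, \Gamma) = \sum_{\gamma \in \Gamma} \mathbb{1}\left[\, [a_i, b_i] \in D_\gamma \,\right] \tag{10}$$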

Equation (10) is therefore a count of how often domain $i$ appears across all resolutions in $\Gamma$ for domain sets identified by the dynamic program at a single resolution. It may be desirable to treat multiple highly overlapping, non-equivalent domains as a single domain; however, we conservatively count only exact repetitions of a domain across resolutions, since this setting serves as a lower bound on the persistence of the domain. If $m$ is the number of distinct domains identified across all resolutions, then pre-computing persistence takes $O(m|\Gamma|)$ time, and $c(j)$ is pre-computed after sorting the intervals by their endpoints. The limiting factor when computing $\mathrm{OPT}_2(m)$ is the time to compute $c(j)$, which is $O(m \log m)$. Thus, the overall algorithm runs in $O(m \log m + (n^2 + m)|\Gamma|)$ time, taking into account an additional $O(n^2|\Gamma|)$ time for computing the domain sets at all resolutions.
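A compact Python sketch of the persistence count (10) and the weighted interval scheduling recurrence (9) follows. It assumes that each resolution's domain set is given as a list of 1-based inclusive `(a, b)` tuples and that two domains are compatible only when one ends strictly before the other starts; the function name `consensus_domains` is hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # 1-based inclusive [a, b]


def consensus_domains(domain_sets: Iterable[Iterable[Interval]]
                      ) -> Tuple[int, List[Interval]]:
    """Weighted interval scheduling over the domains found at each resolution.

    domain_sets holds one domain set D_gamma per resolution in Gamma.
    """
    # Persistence (10): number of resolutions whose set contains the domain.
    p = Counter(d for ds in domain_sets for d in set(ds))
    doms = sorted(p, key=lambda d: (d[1], d[0]))  # sort by endpoint
    ends = [b for _, b in doms]
    m = len(doms)
    # c[j]: number of domains ending strictly before domain j starts, i.e.
    # the 1-based index of the closest compatible domain (0 if none).
    c = [0] * (m + 1)
    for j in range(1, m + 1):
        c[j] = bisect_left(ends, doms[j - 1][0])
    opt = [0] * (m + 1)
    for j in range(1, m + 1):
        # recurrence (9): take domain j, or skip it
        opt[j] = max(p[doms[j - 1]] + opt[c[j]], opt[j - 1])
    chosen: List[Interval] = []
    j = m
    while j > 0:  # trace back one optimal choice
        if p[doms[j - 1]] + opt[c[j]] >= opt[j - 1]:
            chosen.append(doms[j - 1])
            j = c[j]
        else:
            j -= 1
    return opt[m], chosen[::-1]
```

Because the endpoints are sorted once, each $c(j)$ is a binary search, which is where the $O(m \log m)$ term above comes from.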

## Results

We used chromatin conformation capture data from Dixon et al. [5] for human fibroblast and mouse embryonic cells. The 3C contact matrices were already aggregated at a fragment size of 40 kbp and were corrected for experimental bias according to [17]. We compared our multiscale domains and consensus sets against the domains generated by Dixon et al. for the corresponding cell type and species. For human fibroblast cells, we used CTCF binding sites from [18]. For CTCF binding sites and chromatin modification marks in mouse embryonic cells, we used data from Shen et al. [19].

### Ability to identify densely interacting domains across scales

The distribution of mean intra-domain frequencies for Dixon et al. is skewed more to the left than that of the multiscale domains (Figure 2(b)). This difference can be partially explained by the fact that multiscale domains are on average smaller ($\mu = 0.2$ Mb, $\sigma = 1.2$ Mb) than the domains reported by Dixon et al. ($\mu = 1.2$ Mb, $\sigma = 0.9$ Mb).

### Domain persistence across scales

*γ*, suggesting a hierarchical domain structure. The stability of these domains across resolutions indicates that the underlying chromosomal structure is dense within these domains and that these domains interact with the rest of the chromosome at a much lower frequency.

A pairwise comparison of domain configurations displays regions of stability across multiple resolutions (Figure 4(b)). We use the variation of information (VI) [20], a metric for comparing two sets of clusters, to compute the distance between two sets of domains. To capture the similarities between two domain sets $D$ and $D'$ and the inter-domain regions induced by the domains, we construct new derived sets $C$ and $C'$, where $C$ contains all domains $d \in D$ as well as the non-domain regions ($C'$ is computed similarly). To compute the entropy $H(C) = -\sum_i p_i \log p_i$, we define the probability of seeing each interval $c_i = [a_i, b_i]$ in $C$ as $p_i = (b_i - a_i)/L$, where $L$ is the length of the chromosome. When computing the mutual information $I(C, C') = \sum_{i,j} p_{ij} \log \frac{p_{ij}}{p_i p'_j}$ between two sets of intervals $C$ and $C'$ (with $p'_j$ defined analogously for the $j$th interval $c'_j = [a'_j, b'_j]$ of $C'$), we define the joint probability $p_{ij}$ to be $|c_i \cap c'_j|/L$.

We then compute the variation of information on these two new sets: $VI(C, C') = H(C) + H(C') - 2I(C, C')$. Chromosome 1, for example, has three visually pronounced groups of resolutions within which domain sets tend to be more similar than across groups ($\gamma \in [0.00, 0.20]$, $[0.25, 0.70]$, and $[0.75, 1.00]$; see Figure 4(b)).
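The derived sets and the VI computation described above can be sketched as follows. This illustrative Python version rests on two assumptions of ours: intervals are half-open `[a, b)` in chromosome coordinates, so that the lengths $(b - a)/L$ sum to 1 over the tiling $C$, and natural logarithms are used (the base only rescales VI). The helper names `with_gaps` and `variation_of_information` are hypothetical.

```python
import math
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [a, b) in chromosome coordinates


def with_gaps(domains: List[Interval], L: int) -> List[Interval]:
    """Derived set C: the (non-overlapping) domains plus the regions between them."""
    tiles: List[Interval] = []
    pos = 0
    for a, b in sorted(domains):
        if a > pos:
            tiles.append((pos, a))  # non-domain region
        tiles.append((a, b))
        pos = b
    if pos < L:
        tiles.append((pos, L))  # trailing non-domain region
    return tiles


def variation_of_information(D1: List[Interval], D2: List[Interval], L: int) -> float:
    C1, C2 = with_gaps(D1, L), with_gaps(D2, L)
    p1 = [(b - a) / L for a, b in C1]
    p2 = [(b - a) / L for a, b in C2]
    h1 = -sum(p * math.log(p) for p in p1)  # H(C)
    h2 = -sum(p * math.log(p) for p in p2)  # H(C')
    mi = 0.0  # I(C, C')
    for (a1, b1), pi in zip(C1, p1):
        for (a2, b2), pj in zip(C2, p2):
            pij = max(0, min(b1, b2) - max(a1, a2)) / L  # overlap length / L
            if pij > 0:
                mi += pij * math.log(pij / (pi * pj))
    return h1 + h2 - 2 * mi
```

For identical domain sets the VI is 0 (up to rounding), while comparing a chromosome of length 4 split into [0, 2) and [2, 4) against the same chromosome left whole gives $\ln 2$.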

### Comparison with the previously identified set of domains in Dixon et al

At higher resolutions, domains identified by our algorithm are smaller than those reported by Dixon et al. (Figure 3). As the resolution parameter decreases to 0.0, the average size of the domains increases. The composition of the domains we identify is different from that of Dixon et al. as illustrated in Figure 4(a) and captured by the variation of information in Figure 4(b).

We use the consensus domains algorithm to obtain a consensus set of domains $D_c$ persistent across resolutions. We construct the set $\Gamma$ by defining the range of our scale parameter to be $[0, \gamma_{\max}]$ and incrementing $\gamma$ in steps of 0.05. To compare more directly with previous results, we set $\gamma_{\max} = 0.5$ for human and $\gamma_{\max} = 0.25$ for mouse, since these are the scales at which the maximum domain sizes in Dixon et al.'s sets match the maximum domain sizes in our sets.

### Enrichment of CTCF and histone modifications near boundaries

We assess the enrichment of the transcription factor CTCF and of the histone modifications H3K4me3 and H3K27ac within the inter-domain regions induced by the consensus domains. These enrichments provide evidence that the boundary regions between topological domains correlate with genomic regions that act as insulators and barriers, suggesting that topological domains may play a role in controlling transcription in mammalian genomes [5].

### Multiple optimal solutions across scales reveal the hierarchical organization of topological domains

*γ* conform to a hierarchical structure empirically identifiable in Figures 4(a) and 7.

We consider the combined set of domains $\mathcal{D} = \bigcup_{\gamma \in \Gamma} \bigcup_{i=1}^{K} D_\gamma^i$, where $D_\gamma^i$ is the $i$th optimal solution at resolution $\gamma$ and $K$ total solutions are found at each resolution. We quantify the extent to which domains in this set are nested by determining the fraction of sufficiently different domain pairs $\{d_i, d_j\}$ in which either $d_i$ or $d_j$ is completely contained in the other:

$$h(\mathcal{D}) = \frac{1}{|P|} \sum_{\{d_i, d_j\} \in P} \mathbb{1}\left[ d_i \subseteq d_j \lor d_j \subseteq d_i \right]$$

where $\mathbb{1}[\cdot]$ is the indicator function and $P$ contains all pairs of domains $\{d_i, d_j\}$ from $\mathcal{D}$ such that $\alpha = |d_i \,\Delta\, d_j| / |d_i \cup d_j|$, the fraction of genomic fragments that differ between $d_i$ and $d_j$ relative to the union of all fragments comprising the two domains, is greater than a user-specified threshold. For our tests, we consider two domains to be different if more than 10% of their fragments differ ($\alpha > 0.1$). If no domain is fully contained in any other domain, the score is $h(\cdot) = 0$. If, for every pair of domains, one of the domains is fully contained in the other, the score attains its maximum value $h(\cdot) = 1$. We empirically observe that randomly generated domains result in $h(\cdot) \approx 0.5$.

To determine whether the set of all identified domains is significantly more hierarchical than expected by chance, we randomly shuffle domains while maintaining the same domain and non-domain length distributions as the sets of domains we find [21]. At each resolution, we identify the *K*=10 optimal and near-optimal solutions for all chromosomes in the human fibroblast cell line (IMR90) and in mouse embryonic cells (mESC). We chose *K*=10 for computational efficiency: even for such a low *K*, the score of the next-best solution drops off quickly at lower *γ*, whereas for *γ*=0.5 the optimal score changes by only 0.02% (from 16774.7 to 16771.2) even after 50,000 solutions are considered. Alternatively, a weaker null hypothesis could be constructed from a randomly shuffled Hi-C matrix. However, this approach does not control for the distribution of domain lengths, a previously established property of topological domains [5, 6]. In addition, it has recently been shown that randomly shuffled Hi-C matrices lack a clear domain structure, since they exhibit significantly depleted insulation scores [24]. This weaker null hypothesis is thus not appropriate for determining the significance of hierarchical domain structure. For both organisms, we find that *h*(·) for the identified set of domains is significantly larger than *h*(·) for the randomized domains (Benjamini-Hochberg corrected *P*<0.001 over all chromosomes). The mean value of *h*(·) for the identified sets of domains is ≈0.95, compared with ≈0.70 for 1,000 randomized sets of domains sampled from each resolution. Computing *h*(·) on the combined set of domains is conservative, since domains from multiple optimal and near-optimal solutions within a length scale are likely to overlap without being completely contained in one another.
This suggests that the multiple optimal and near-optimal domains across scales exhibit a hierarchical structure and that the ensemble of domains can be used as the basis of a more detailed analysis of the hierarchical organization of these genomes.
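The shuffling null model can be sketched as below. We assume, as one reading of "maintaining the same domain and non-domain length distributions", that the domain lengths and the gap lengths of a single non-overlapping domain set (for example, one optimal solution) are permuted independently and laid back out along the chromosome. The function `shuffle_domains` is hypothetical and is not necessarily how [21] implements the randomization.

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based inclusive fragment interval [a, b]


def shuffle_domains(domains: List[Interval], n: int,
                    rng: random.Random) -> List[Interval]:
    """Re-place non-overlapping domains on fragments 1..n at random, keeping
    the multisets of domain lengths and of non-domain (gap) lengths."""
    doms = sorted(domains)
    lengths = [b - a + 1 for a, b in doms]
    # Gaps before the first domain, between consecutive domains, after the last.
    edges = [0] + [b for _, b in doms]
    starts = [a for a, _ in doms] + [n + 1]
    gaps = [s - e - 1 for e, s in zip(edges, starts)]
    rng.shuffle(lengths)
    rng.shuffle(gaps)
    out: List[Interval] = []
    pos = 1
    for gap, length in zip(gaps, lengths):  # the leftover gap ends the chromosome
        pos += gap
        out.append((pos, pos + length - 1))
        pos += length
    return out
```

Passing an explicitly seeded `random.Random` makes each of the randomized sets reproducible.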

## Discussion and conclusions

In this paper, we introduce an algorithm to identify topological domains in chromatin using interaction matrices from recent high-throughput chromosome conformation capture experiments. Our algorithm produces domains that display much higher interaction frequencies within domains than between domains (Figure 2), and whose boundaries exhibit substantial enrichment for several insulator and barrier-like elements (Figure 6). To identify these domains, we use a multiscale approach that finds domains at various size scales and generates multiple optimal and near-optimal solutions.

We define a consensus set to be a set of domains that persist across multiple resolutions and give an efficient algorithm that finds such a set optimally.

Our method uses a score function that encodes the quality of putative domains in an intuitive manner based on their local density of interactions. Variations of the scoring function in (4), for example, by median centering rather than mean centering or by optimizing the homogeneity of interaction frequencies instead of total frequencies, can be explored to test the robustness of the enrichments described here.

Our method is particularly appealing in that it requires only a single user-specified parameter, $\gamma_{\max}$. For our experiments, $\gamma_{\max}$ was set based on the maximum domain sizes observed in Dixon et al.'s experiments so that we could easily compare our domains to theirs. This parameter can also be set intrinsically from properties of the Hi-C interaction matrices. For example, we observe similar enrichments in both human and mouse when we set $\gamma_{\max}$ to be the smallest $\gamma \in \Gamma$ such that the median domain size is >80 kbp (two consecutive Hi-C fragments at a resolution of 40 kbp). This is a reasonable choice, since domains consisting of just one or two fragments do not capture higher-order spatial relationships (e.g., triadic closure), and interaction frequencies between adjacent fragments are likely to be large by chance [22].

We compared the fraction of the genome covered by the domains identified by Dixon et al. with that covered by the domains obtained from our method at various resolutions. Dixon et al.'s domains cover 85% of the genome, while our sets tend to cover less of the genome (≈65% for a resolution that results in the same number of domains as those of Dixon et al.). The fact that our domain boundaries are more enriched for CTCF sites indicates that our smaller, denser domains may be more desirable from the perspective of genome function. The dense, functionally enriched domains discovered by our algorithm provide strong evidence that alternative chromatin domains exist and that a single length scale is insufficient to capture the hierarchical and overlapping domain structure visible in heat maps of 3C interaction matrices.

We provided the first quantitative analysis testing the hypothesis that the domain structure across scales is significantly hierarchically organized, suggesting that the domains we identify can be used as the basis for studying the hierarchical organization of genomes and how this structure impacts gene regulation. By incorporating multiple optimal and near optimal solutions into this analysis, we provide evidence that the observed hierarchical structure persists not only across scales but across a variety of plausible high-scoring domain sets. However, multiple optimal solutions are not necessary to quantify the hierarchical structure of the domains since single optimal solutions across scales can already reveal a hierarchical structure. There are many more near-optimal solutions at higher values of *γ* since the domain sizes tend to be smaller. For this special case, it would be desirable to develop a method that more concisely characterizes these larger solution spaces, and this is an interesting direction for future work. The quantitative evidence of the hierarchical structure of topological domains also motivates the development of novel methods for domain discovery that directly account for such hierarchy in the models they assume and the functions they optimize.

The method for discovering topological domains that we have introduced is practical for existing datasets. Our implementation computes the domain sets across resolutions for the human fibroblast cell line and extracts the consensus set in 24 minutes on a personal computer with a 2.3 GHz Intel Core i5 processor and 8 GB of RAM. Computing optimal and near-optimal solutions adds only a small overhead to the overall running time: when computing the top 20 optimal and near-optimal solutions for each *γ* setting (*γ* from 0.0 to 0.9 in steps of 0.05), the computation finishes in 25 minutes 34 seconds.

A preliminary version of this manuscript appeared in the 2013 Workshop on Algorithms in Bioinformatics (WABI) [25].

## Availability and requirements

A C++11 implementation of the algorithms and instructions for compilation and use are available at http://www.cs.cmu.edu/~ckingsf/software/armatus/.

## Declarations

### Acknowledgements

This work has been partially funded by National Science Foundation (CCF-1256087, CCF-1053918, and EF-0849899) and National Institutes of Health (R21AI085376, R01HG007104). C.K. received support as an Alfred P. Sloan Research Fellow. D.F. is a predoctoral trainee supported by NIH T32 training grant T32 EB009403 as part of the HHMI-NIBIB Interfaces Initiative.

## Authors’ Affiliations

## References

- de Wit E, de Laat W: **A decade of 3C technologies: insights into nuclear organization.** *Genes Dev* 2012, **26:**11–24.
- Gibcus JH, Dekker J: **The hierarchy of the 3D genome.** *Mol Cell* 2013, **49**(5):773–782.
- Cavalli G, Misteli T: **Functional implications of genome topology.** *Nat Struct Mol Biol* 2013, **20**(3):290–299.
- Fudenberg G, Getz G, Meyerson M, Mirny LA: **High order chromatin architecture shapes the landscape of chromosomal alterations in cancer.** *Nat Biotechnol* 2011, **29**(12):1109–1113.
- Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, Hu M, Liu JS, Ren B: **Topological domains in mammalian genomes identified by analysis of chromatin interactions.** *Nature* 2012, **485**(7398):376–380.
- Sexton T, Yaffe E, Kenigsberg E, Bantignies F, Leblanc B, Hoichman M, Parrinello H, Tanay A, Cavalli G: **Three-dimensional folding and functional organization principles of the *Drosophila* genome.** *Cell* 2012, **148**(3):458–472.
- Hou C, Li L, Qin ZS, Corces VG: **Gene density, transcription, and insulators contribute to the partition of the *Drosophila* genome into physical domains.** *Mol Cell* 2012, **48**(3):471–484.
- Kölbl AC, Weigl D, Mulaw M, Thormeyer T, Bohlander SK, Cremer T, Dietzel S: **The radial nuclear positioning of genes correlates with features of megabase-sized chromatin domains.** *Chromosome Res* 2012, **20**(6):735–752.
- Lin YC, Benner C, Mansson R, Heinz S, Miyazaki K, Miyazaki M, Chandra V, Bossen C, Glass CK, Murre C: **Global changes in the nuclear positioning of genes and intra- and interdomain genomic interactions that orchestrate B cell fate.** *Nat Immunol* 2012, **13**(12):1196–1204.
- Bickmore WA, van Steensel B: **Genome Architecture: domain organization of interphase chromosomes.** *Cell* 2013, **152**(6):1270–1284.
- Tanay A, Cavalli G: **Chromosomal domains: epigenetic contexts and functional implications of genomic compartmentalization.** *Curr Opin Genet Dev* 2013, **23**(2):197–203.
- Goldberg AV: *Finding a Maximum Density Subgraph*. Tech. Rep. 171, University of California, Berkeley, CA; 1984.
- Schaeffer SE: **Graph clustering.** *Comput Sci Rev* 2007, **1:**27–64.
- Filippova D, Gadani A, Kingsford C: **Coral: an integrated suite of visualizations for comparing clusterings.** *BMC Bioinformatics* 2012, **13:**276.
- Huang L, Chiang D: **Better k-best parsing.** In *Proceedings of the Ninth International Workshop on Parsing Technology*. Stroudsburg, PA, USA: Association for Computational Linguistics; 2005:53–64.
- Kleinberg J, Tardos E: *Algorithm Design*. Boston, MA: Addison-Wesley; 2005.
- Yaffe E, Tanay A: **Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture.** *Nat Genet* 2011, **43**(11):1059–1065.
- Kim TH, Abdullaev ZK, Smith AD, Ching KA, Loukinov DI, Green RD, Zhang MQ, Lobanenkov VV, Ren B: **Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome.** *Cell* 2007, **128**(6):1231–1245.
- Shen Y, Yue F, McCleary DF, Ye Z, Edsall L, Kuan S, Wagner U, Dixon J, Lee L, Lobanenkov VV, Ren B: **A map of the cis-regulatory sequences in the mouse genome.** *Nature* 2012, **488:**116–120.
- Meilă M: **Comparing clusterings by the variation of information.** *Learn Theory Kernel Mach* 2003, **2777:**173–187.
- Duggal G, Wang H, Kingsford C: **Higher-order chromatin domains link eQTLs with the expression of far-away genes.** *Nucleic Acids Res Adv Access* 2013, **42**(1):87–96.
- Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, Sandstrom R, Bernstein B, Bender MA, Groudine M, Gnirke A, Stamatoyannopoulos J, Mirny LA, Lander ES, Dekker J: **Comprehensive mapping of long-range interactions reveals folding principles of the human genome.** *Science* 2009, **326**(5950):289–293.
- Zhou X, Lowdon RF, Li D, Lawson HA, Madden PA, Costello JF, Wang T: **Exploring long-range genome interactions using the WashU Epigenome Browser.** *Nat Methods* 2013, **10**(5):375–376.
- Nagano T, Lubling Y, Stevens TJ, Schoenfelder S, Yaffe E, Dean W, Laue ED, Tanay A, Fraser P: **Single-cell Hi-C reveals cell-to-cell variability in chromosome structure.** *Nature* 2013, **502**(7469):59–64.
- Filippova D, Patro R, Duggal G, Kingsford C: **Multiscale Identification of Topological Domains in Chromatin.** In *Proceedings of the 13th Workshop on Algorithms in Bioinformatics (WABI), Volume 8126*. Heidelberg, Germany; 2013:300–312.

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.