CHiCANE is a toolkit for identifying chromatin regions that interact more often than expected by chance given the specific interaction dynamics of a capture Hi-C (CHi-C) experiment. It offers extensive flexibility in implementing interaction calling via numerous optional parameters so that it can be tailored to a wide range of CHi-C experiments. Additionally, CHiCANE includes multiple helper functions for use in both pre- and post-processing/analysis of the interaction calls.
For details of statistical concepts implemented by CHiCANE and their application to both the region CHi-C and promoter CHi-C datasets, please see our manuscript: Holgerson, Erle, Gillespie, Andrea, et al. “Identifying High Confidence Capture Hi-C Interactions Using CHiCANE” (Nature Protocols 2021).
The main function for running the CHiCANE method is chicane()
. This takes a bam file and bed files with all restriction fragments and targeted restriction fragments as input, and produces a table with fragment interactions and associated p-values.
The package comes with a subset of reads from a capture Hi-C experiment on the Bre80 normal epithelial breast tissue cell line (Baxter et al. 2018). The experiment targeted several breast cancer risk loci in non-coding regions of the genome, and aimed to associate these risk variants with genes. Two of the risk SNPs are rs13387042 and rs16857609, and reads that map to them have been included in the file Bre80_2q35.bam.
library(chicane);
#> Loading required package: gamlss.tr
#> Loading required package: gamlss.dist
#> Loading required package: MASS
#> Loading required package: gamlss
#> Loading required package: splines
#> Loading required package: gamlss.data
#>
#> Attaching package: 'gamlss.data'
#> The following object is masked from 'package:datasets':
#>
#> sleep
#> Loading required package: nlme
#> Loading required package: parallel
#> ********** GAMLSS Version 5.3-4 **********
#> For more on GAMLSS look at https://www.gamlss.com/
#> Type gamlssNews() to see new features/changes/bug fixes.
#> Loading required package: data.table
# example BAM file, baits, and restriction fragments
<- system.file('extdata', 'Bre80_2q35.bam', package = 'chicane');
bam <- system.file('extdata', '2q35.bed', package = 'chicane');
baits <- system.file('extdata', 'GRCh38_HindIII_chr2.bed.gz', package = 'chicane'); # HindIII fragments on chromosome 2
fragments
if( bedtools.installed() ) {
<- chicane(
chicane.results bam = bam,
baits = baits,
fragments = fragments
);
print( chicane.results[ 1:10 ] );
}#> Taking input= as a system command ('sh /tmp/RtmpdNJTjh/Rinsta536160f7fb70/chicane/bin/prepare_bam.sh -b /tmp/RtmpdNJTjh/Rinsta536160f7fb70/chicane/extdata/Bre80_2q35.bam -c /tmp/RtmpdNJTjh/Rinsta536160f7fb70/chicane/extdata/2q35.bed -f /tmp/RtmpdNJTjh/Rinsta536160f7fb70/chicane/extdata/GRCh38_HindIII_chr2.bed.gz -d /tmp/RtmpdNJTjh/Rbuilda536136379101/chicane/vignettes ') and a variable has been used in the expression passed to `input=`. Please use fread(cmd=...). There is a security concern if you are creating an app, and the app could have a malicious user, and the app is not running in a secure environment; e.g. the app is running as root. Please read item 5 in the NEWS file for v1.11.6 for more information and for the option to suppress this message.
#> target.id bait.id target.chr target.start
#> 1: chr2:217491009-217492415 chr2:217035649-217042355 chr2 217491009
#> 2: chr2:216650000-216651600 chr2:217430817-217437011 chr2 216650000
#> 3: chr2:217290048-217292530 chr2:217430817-217437011 chr2 217290048
#> 4: chr2:216269087-216273140 chr2:217430817-217437011 chr2 216269087
#> 5: chr2:216551051-216551968 chr2:217430817-217437011 chr2 216551051
#> 6: chr2:216879929-216883553 chr2:217430817-217437011 chr2 216879929
#> 7: chr2:216696003-216698549 chr2:217430817-217437011 chr2 216696003
#> 8: chr2:216687613-216696003 chr2:217430817-217437011 chr2 216687613
#> 9: chr2:216906471-216908390 chr2:217430817-217437011 chr2 216906471
#> 10: chr2:217162758-217166250 chr2:217430817-217437011 chr2 217162758
#> target.end bait.chr bait.start bait.end bait.to.bait bait.trans.count
#> 1: 217492415 chr2 217035649 217042355 FALSE 0
#> 2: 216651600 chr2 217430817 217437011 FALSE 0
#> 3: 217292530 chr2 217430817 217437011 FALSE 0
#> 4: 216273140 chr2 217430817 217437011 FALSE 0
#> 5: 216551968 chr2 217430817 217437011 FALSE 0
#> 6: 216883553 chr2 217430817 217437011 FALSE 0
#> 7: 216698549 chr2 217430817 217437011 FALSE 0
#> 8: 216696003 chr2 217430817 217437011 FALSE 0
#> 9: 216908390 chr2 217430817 217437011 FALSE 0
#> 10: 217166250 chr2 217430817 217437011 FALSE 0
#> target.trans.count distance count expected p.value q.value
#> 1: 0 452710.0 28 6.222070 0.0001352420 0.2346449
#> 2: 0 783114.0 16 2.992273 0.0004798454 0.8172368
#> 3: 0 142625.0 33 8.889424 0.0010581410 0.8172368
#> 4: 0 1162800.5 8 2.101973 0.0019527803 0.8172368
#> 5: 0 882404.5 11 2.722403 0.0021130042 0.8172368
#> 6: 0 552173.0 11 3.822236 0.0036069810 0.8172368
#> 7: 0 736638.0 16 4.162590 0.0043689364 0.8172368
#> 8: 0 742106.0 15 3.999764 0.0055330101 0.8172368
#> 9: 0 526483.5 11 4.150239 0.0063231596 0.8172368
#> 10: 0 269410.0 24 8.030272 0.0063767687 0.8172368
The chicane
function returns a data.table
object sorted by q-value. Each row contains information about the target and bait fragments, and the observed and expected number of reads linking the two fragments.
Before running the main chicane()
function, a minumum of three input files must be obtained. The chicane()
function, which is the main interaction calling step, can be run on these three input files directly as it will work as a wrapper function to prepare the data and the run interaction calling in a single step. Alternatively, there is a prepare.data()
function which allows a user to run data preparation on these same three input files to make an interactions data table, which can subsequently be used as input for interaction calling separately in a step-wise fashion (as well as a number of other useful functions as discussed below). This section dicusses how to obtain each of the three required inputs, followed by a discussion of the optional preparation of an interactions data table.
The first required input is at least one BAM file containing mapped interaction reads. To convert the BAM file into a text file, R calls the shell script prepare_bam.sh
. This script relies on bedtools to identify the restriction fragments corresponding to each read (based on the second required input, the \(in\) \(silico\) genome digest) and select reads with at least one end overlapping the baitmap file (the third required input). The reads are also filtered to remove those in which both reads map to the same fragment. The resulting data is read into a data.table
object, and the number of reads linking any two fragments is calculated.
The second required input is the \(in\) \(silico\) digest file, which contains the coordinates of the fragments created by the restriction enzyme (\(e.g.\) HindIII) used in the experiment, in BED format. For users that have a digested genome output from HiCUP, this needs to be converted to a strict 3 column, header-free BED format to be compatible with bedtools. CHiCANE includes a helper function, convert.hicup.digest.bed()
, which can be used to convert a digested genome in HiCUP format to a BED file of the restiction digest fragments. Any BED file containing the coordinates of restriction enzyme digested fragments of the genome build used according to the experiment are appropriate to be used with CHiCANE. For the purposes of this vignette, the fragments
variable is assigned to the HindIII fragments of GRCh38 chromosome 2 (sample data included in the package) at the top of this document. Here is an example of what this should look like showing the top six entries in the example fragments file included in the package:
chr2 0 15298
chr2 15298 16049
chr2 16049 18132
chr2 18132 24992
chr2 24992 28070
chr2 28070 28179
The third and final input required is a baitmap in BED format containing the coordinates of only the restriction digested fragments which contain a bait from the capture step of the experiment. This can be obtained by conducting a bedtools intersect
command on the fragment digest above and a BED file with the coordinates of the individual probes included on the array used in the capture step. The following is an example command to create the baitmap, then limit it to only unique fragments with an overlap of at least 20bp:
# intersect restriction fragments file with coordinates for probes included on the array
bedtools intersect -wo -a /path/to/fragments.bed -b /path/to/capture_regions.bed \
> /path/to/fragments_overlap_full.bed
# limit to overlap of at least 20bp and unique fragments only
awk '$7>=20 {print $1 "\t" $2 "\t" $3}' /path/to/fragments_overlap_full.bed | uniq > /path/to/baitmap.bed
The final output of the above command (baitmap.bed)
is then used as the baits
parameter in either the chicane
or prepare.data
function calls. At the top of this vignette we have set baits
to the sample baitmap at the 2q35 locus which is included in the package. As an example of how this file should look, this is how the BED file for the sample baits at 2q35 would appear:
chr2 217035649 217042355
chr2 217430817 217437011
NOTE: The BED format used for the fragments and baits parameters is a header-free, tab-separated, 3 column (chr, start, end) format. Containing strictly 3 columns is necessary both as input to the CHiCANE function calls and for the snippet of code above which creates the baitmap to function properly.
To speed up model fitting, it is possible to pre-process the data with the prepare.data()
function and run chicane
directly on the resulting data table. Furthermore, running in this step-wise manner is useful for those wanting to add custom covariates not already included in the data table or test variations of parameters and/or distibutions in the interaction calling. To first compare biological replicates using the compare.replicates()
function, prepare.data()
should be run on each replicate separately to assess concordance before deciding to merge them. This processing can take a while (~45 minutes for a 19GB BAM file on a commodity computer with 16GB RAM), and only needs to be done once for each BAM.
if( bedtools.installed() ) {
<- prepare.data(
interaction.data bam = bam,
baits = baits,
fragments = fragments
);
}#> Taking input= as a system command ('sh /tmp/RtmpdNJTjh/Rinsta536160f7fb70/chicane/bin/prepare_bam.sh -b /tmp/RtmpdNJTjh/Rinsta536160f7fb70/chicane/extdata/Bre80_2q35.bam -c /tmp/RtmpdNJTjh/Rinsta536160f7fb70/chicane/extdata/2q35.bed -f /tmp/RtmpdNJTjh/Rinsta536160f7fb70/chicane/extdata/GRCh38_HindIII_chr2.bed.gz -d /tmp/RtmpdNJTjh/Rbuilda536136379101/chicane/vignettes ') and a variable has been used in the expression passed to `input=`. Please use fread(cmd=...). There is a security concern if you are creating an app, and the app could have a malicious user, and the app is not running in a secure environment; e.g. the app is running as root. Please read item 5 in the NEWS file for v1.11.6 for more information and for the option to suppress this message.
The interaction data object contains details of the fragments, and the number of reads linking the two fragments.
if( bedtools.installed() ) print(interaction.data);
#> target.id bait.id target.chr target.start
#> 1: chr2:200013520-200014510 chr2:217035649-217042355 chr2 200013520
#> 2: chr2:200018581-200020582 chr2:217035649-217042355 chr2 200018581
#> 3: chr2:200020582-200023537 chr2:217430817-217437011 chr2 200020582
#> 4: chr2:200033118-200034885 chr2:217430817-217437011 chr2 200033118
#> 5: chr2:200040560-200041826 chr2:217430817-217437011 chr2 200040560
#> ---
#> 4477: chr2:229867689-229872871 chr2:217035649-217042355 chr2 229867689
#> 4478: chr2:229872871-229875614 chr2:217035649-217042355 chr2 229872871
#> 4479: chr2:229882404-229887121 chr2:217430817-217437011 chr2 229882404
#> 4480: chr2:229894321-229904945 chr2:217035649-217042355 chr2 229894321
#> 4481: chr2:229917543-229919836 chr2:217430817-217437011 chr2 229917543
#> target.end bait.chr bait.start bait.end bait.to.bait bait.trans.count
#> 1: 200014510 chr2 217035649 217042355 FALSE 0
#> 2: 200020582 chr2 217035649 217042355 FALSE 0
#> 3: 200023537 chr2 217430817 217437011 FALSE 0
#> 4: 200034885 chr2 217430817 217437011 FALSE 0
#> 5: 200041826 chr2 217430817 217437011 FALSE 0
#> ---
#> 4477: 229872871 chr2 217035649 217042355 FALSE 0
#> 4478: 229875614 chr2 217035649 217042355 FALSE 0
#> 4479: 229887121 chr2 217430817 217437011 FALSE 0
#> 4480: 229904945 chr2 217035649 217042355 FALSE 0
#> 4481: 229919836 chr2 217430817 217437011 FALSE 0
#> target.trans.count distance count
#> 1: 0 17024987 1
#> 2: 0 17019420 1
#> 3: 0 17411854 2
#> 4: 0 17399912 1
#> 5: 0 17392721 1
#> ---
#> 4477: 0 12831278 1
#> 4478: 0 12835240 1
#> 4479: 0 12450848 1
#> 4480: 0 12860631 1
#> 4481: 0 12484776 1
The interactions
parameter of the chicane()
function can take either a data.table
object from prepare.data
or the path to such a table.
if( bedtools.installed() ) {
<- tempfile('interaction_data');
file.name write.table(interaction.data, file.name, row.names = FALSE);
<- chicane(interactions = file.name);
chicane.results }
CHiCANE models the expected number of paired reads containing two restriction fragments as a function of distance between them and how frequently the bait interacts with other fragments.
In the CHiCANE model, \(Y_{ij}\) represents the number of reads linking bait \(i\) and target fragment \(j\), such that it is assumed to follow a distribution with a mean \(\mu_{ij}\) as:
\[\begin{equation} \mu_{ij} = \beta_0 + \beta_1\log(d_{ij}) + \beta_2\log(t_i + 1) \end{equation}\]
where \(d_{ij}\) represents the distance between the two fragments, and \(t_i\) represents the number of reads linking the bait with a \(trans\) fragment.
For bait to bait interactions, terms are added to the model to adjust for \(trans\) counts of the other end, fragment \(j\), and the product of \(trans\) counts of both fragments as:
\[\begin{equation} \mu_{ij} = \beta_0 + \beta_1\log(d_{ij}) + \beta_2\log(t_i + 1) + \beta_3\log(t_j + 1) + \beta_4\log(t_i + 1)\log(t_j + 1) \end{equation}\]
Each possible interaction is assaigned an expected \(\mu_{ij}\) and a \(p\)-value for the observed counts \(y_{ij}\) versus what is expected by chance is calculated as:
\[\begin{equation} p = P(Y_{ij} \geq y_{ij}) \end{equation}\]
This probability is dependent on how the distribution is modelled.
The CHiCANE method supports several different distributions of the counts \(Y_{ij}\) conditional on the mean \(\mu_{ij}\). By default the counts are assumed to follow a negative binomial distribution with \(E(Y_{ij}) = exp (\mu_{ij})\). After fitting the model by a maximum likelihood over all possible interactions, the fitted model is used to estimate a \(p\)-value for each pair. Poisson, truncated Poisson, or truncated negative binomial distributions are also supported and can override the default negative binomial by being passed as the distribution
parameter.
Negative binomial is the default distribution in CHiCANE as it is deemed more appropriate for modelling datasets with high variance in comparison to Poisson due to its inclusion of a term for overdispersion.
Using this default distribution, the counts are modelled as \(Y \sim NB(\mu, \theta)\), with mean \(\mu\) and dispersion term \(\theta\). The variance of the counts conditional on the mean is then
\[\begin{equation} Var(Y) = \mu + \frac{\mu^2}{\theta} \end{equation}\]
This model implements the glm.nb()
function in the MASS package.
If the model fails to converge due to a lack of overdispersion in the data chicane()
will attempt to diagnose the error and if this is found to be the case a Poisson distribution will be implemented instead.
The Poisson distribution assumes that \(Y \sim \text{Poisson}(\mu)\), such that \(Var(Y) = \mu\). As the Poisson model does not allow variance of counts to exceed the mean, capture Hi-C counts usually possesses more variability than can be explained using this distribution, the result of which is false positives.
By default CHiCANE only includes observed interactions, rather than all possible interactions, \(i.e.\) leaving out zero counts (see Inclusion of Zeros). One way to account for this is in the statistical model, by fitting a truncated negative binomial distribution in the chicane()
call which implements the GAMLSS R package (Stasinopoulos and Rigby 2016). It assigns \(P(Y = 0) = 0\), and assumes proportional probabilities to the negative binomial model for values \(y > 0\).
The truncated Poisson distribution is also supported, which behaves much like the truncated negative binomial model, but rather assumes the probabilities are proportional to Poisson probabilities for values \(y>0\).
It is worth noting that specifying either of the truncated distributions will significantly slow down the model fitting.
CHiCANE allows users to specify additional covariates that can be added to the model with the adjustment.terms
parameter. For example, we can add a term to adjust for the chromosome of the target fragment as shown below.
data(bre80);
<- chicane(
adjusted.results interactions = bre80,
adjustment.terms = 'target.chr'
);
print( adjusted.results[ 1:5 ] );
#> target.id bait.id target.chr target.start
#> 1: chr19:1155128-1228414 chr2:217035649-217042355 chr19 1155128
#> 2: chr1:117125276-117135137 chr2:217035649-217042355 chr1 117125276
#> 3: chr1:14950583-14951791 chr2:217035649-217042355 chr1 14950583
#> 4: chr1:49076960-49080684 chr2:217035649-217042355 chr1 49076960
#> 5: chr3:145680459-145682483 chr2:217035649-217042355 chr3 145680459
#> target.end bait.chr bait.start bait.end bait.to.bait bait.trans.count
#> 1: 1228414 chr2 217035649 217042355 FALSE 9494
#> 2: 117135137 chr2 217035649 217042355 FALSE 9494
#> 3: 14951791 chr2 217035649 217042355 FALSE 9494
#> 4: 49080684 chr2 217035649 217042355 FALSE 9494
#> 5: 145682483 chr2 217035649 217042355 FALSE 9494
#> target.trans.count distance count expected p.value q.value
#> 1: 2 NA 2 1.003963 0.2656989 0.6417435
#> 2: 2 NA 2 1.003992 0.2657096 0.6417435
#> 3: 2 NA 2 1.003992 0.2657096 0.6417435
#> 4: 2 NA 2 1.003992 0.2657096 0.6417435
#> 5: 2 NA 2 1.004014 0.2657179 0.6417435
The adjustment.terms
parameter also supports expressions such as log transformations. Multiple adjustment terms can be added by passing as a vector.
Any adjustment term must correspond to a column already present in the data. If a user would like to adjust for anything that is not already present in the CHiCANE output (\(e.g.\) GC content), there’s a three step approach:
prepare.data()
on your BAM file(s)chicane()
by setting the interactions
parameter to the output from step 2By default, all baits and targets are included when fitting the CHiCANE model. An alternative approach is to filter out baits and fragments with low (Dryden et al. 2014) or high (Cairns et al. 2016) degree of “interactibility,” which is inferred via \(trans\) counts for each bait. CHiCANE also supports this filtering through the bait.filters
and target.filters
parameters.
Each of these take a vector of length two, where each element corresponds to the proportion of fragments that should be considered to fall below the “lower limit” and “upper limit.” For example, passing in bait.filters = c(0.3, 0.9)
means that the baits with the lowest 30% or highest 10% of \(trans\) counts will be removed before fitting the model.
Filtering fragments may affect multiple testing correction by changing the number of tests performed.
The helper function compare.replicates()
can be used to assess concordance between biological replicates before deciding whether to merge them to improve signal to noise ratio. This is done by conducting the prepare.data()
step on each BAM separately before passing to the compare.replicates()
function.
CHiCANE can merge replicates at the data processing step. If more than one BAM file is available, these can be passed as a vector to the bam
parameter of the chicane()
and prepare.data()
functions. The replicate.merging.method
parameter determines how replicates are merged. Available options are ‘sum’ and ‘weighted-sum,’ with the former being the default.
In order to assess goodness of fit for the model implemented, CHiCANE outputs rootogram model fit plots and statistics to the directory provided through the interim.data.directory
parameter in the chicane()
function. The default NULL
will skip these so it is important for a user to set this directory in order assess the model fit.
Adjusting for distance is done in a piecewise linear fashion. \(cis\) data is ordered by distance, and split into equally sized bins. The count model is then fit separately within each bin, and results are merged at the end by concatenating the results from the individual model fits.
By default, CHiCANE will split the data into distance quantiles, \(i.e.\) 100 equally sized groups. However, if the resulting datasets are too small, the number of distance bins is halved until all resulting datasets are large enough for the model to be fit. To override this default, pass the desired number of distance bins to distance.bins
. For example, setting distance.bins = 4
results in the count model being fit separately in each quartile. \(trans\) and \(cis\) interactions are fit separately.
Finally, CHiCANE applies the Benjamini-Hochberg method of multiple testing correction separately for each bait. To change this to a global multiple testing correction, set multiple.testing.correction = 'global'
in the main chicane()
function.
The chicane()
function’s final output is a q-value sorted data table of all interactions included in the model fitting. This full interactions table is useful in some instances such as the locus plot and background interactions functionality discussed below. However, its large size is rather unwieldy for most downstream analysis, so it is useful to also filter for only interactions below a desired q-value threshold and save this more workable table as well.
In order to assist users with interpretation of interaction calls, CHiCANE also includes several helper functions for this purpose designed to be applicable to a myriad of experimental designs as delineated below and in more detail in our aforementioned manuscript.
The first step in interpretation of the interaction calls is likely to be annotation of the target fragments. Selecting just columns 3-5 from the CHiCANE interactions data table and saving the resulting coordinates as a BED file will then allow a bedtools intersect
to be done between the target fragments and a genome annotation file also in BED format. The resulting output will indicate the genes that any target fragments overlap with.
Similarly, an intersect could be performed with VCF files containing variant calls or BED files containing ChIP-Seq peak calls for histone marks or CTCF binding sites. There are many disparate datasets which use a file type compatible with bedtools intersect
and numerous options for the command to customise the output.
CHiCANE also contains a function to convert its results into a format compatible with the WashU Epigenome Browser (Li et al. 2019), convert.standard.format()
. The path to significant interaction calling results is provided to the chicane.results
parameter and the desired name for the output file is passed to file.name
. Once standard format is obtained it can be uploaded to the browser from the ‘Tracks’ dropdown, selecting ‘Local Text tracks.’ In the upload window select ‘long-range format by CHiCANE’ as the text file type and then your standard format file in ‘Browse.’ Once the track is loaded in the browser, you can right click on the track and set the Display Mode to ‘ARC.’
CHiCANE results are also usable with the Gviz (Hahne and Ivanek 2016) package to visualise along with highly customisable genome anootation tracks. Additionally, the create.locus.plot()
function in the CHiCANE package will make a basic implementation of such a plot with 6 tracks:
The user needs to provide the genome build to the genome
parameter and coordinates of the locus desired to be plotted to the chr
, start
, and end
parameters. A genome annotation file in GTF format needs to be provided to gene.data
. The genomic feature chosen to be included will be passed as a BED format file to genomic.features
and the desired name for this track is passed as feature.name
. Finally, the path to the FULL interaction calls table is passed to interaction.data
and the name for the output plot is set by file.name
. By default only interactions below an FDR threshold of 0.05 will be included in the interactions track; a user can override this default by setting the `fdr.filter’ parameter. A brief example keeping all other parameters at default is shown below:
if( bedtools.installed() ) {
create.locus.plot(
genome = 'hg38',
chr = 'chr2', start = 111100000, end = 112200000,
gene.data = '/path/to/gene_data.gtf',
genomic.features = '/path/to/baitmap.bed',
feature.name = 'baits',
interaction.data = '/path/to/interaction_calls.txt',
file.name = '/path/to/locus_plot.pdf'
); }
In order to accurately estimate histone marker enrichment, the helper function stratified.enrichment.sample()
will randomise interactions which match the number and proportion of observed interactions which serve as background for comparison, stratified by distance bins: 0-100kbp, 100kbp-1Mbp, 1Mbp-10Mbp, 10Mbp+, and \(trans\). Full paths to data tables containing significant and insignificant calls are passed to significant.results
and nonsignificant.results
, respectively.
The code in this workflow is intended to be run on Baxter T-47D from the Zenodo directory of Supplementary Data for our manuscript at: https://zenodo.org/record/4073433
First pre-processing of input files needs to be completed as discussed above (see Pre-Processing Data). The BAM files referred to in this vignette have been created using HiCUP, the digested fragments were processed with the convert.hicup.digest.bed()
function, and the baitmap was created via bedtools intersect
between the aforementioned fragments and array bait probes used in the capture step. The fragments used here (hg38, HindIII digested) are found in Zenodo directory under data/
and the baitmap is found in data/captured_fragments/T47D/
as used in the prepare.data()
call below.
The interactions data table can then be prepared with these processed inputs (aligned BAMs, fragments BED, and baitmap BED). The resulting output can then be saved before being passed to the main chicane()
function for interaction calling. Once completed save both the full interaction calls results as well as calls filtered for significance to be used for different purposes.
if( bedtools.installed() ) {
# initial prep of interactions data table
<- prepare.data(
interaction.data bam = list.files('data/bams/T47D', pattern = '.bam', full.names = TRUE),
baits = 'data/captured_fragments/T47D/Baxter_captured_fragments.bed',
fragments = 'data/GRCh38_HindIII_fragments.bed'
);
# save interactions data table
<- tempfile('interaction_data');
file.name write.table(interaction.data, file.name, row.names = FALSE);
# run interaction calling and save model fit plots and statistics
<- chicane(
chicane.results interactions = file.name,
interim.data.dir = 'graphs/T47D/model_fits'
);
# save full interaction calls table
write.table(
chicane.results, file = 'data/chicane/T47D/interaction_calls.txt',
row.names = FALSE,
quote = FALSE,
sep = '\t'
);
# filter for significant only and save these results separately
<- chicane.results[q.value < 0.05]
significant.results
write.table(
significant.results, file = 'data/chicane/T47D/interaction_calls_significant.txt',
row.names = FALSE,
quote = FALSE,
sep = '\t'
);
}
After completion of interaction calling, the model fit plots and statistics in the interim.data.dir
should be examined to ensure goodness of fit (detailed discussion of this in manuscript). Additionally, it is useful to examine the proportion of bait-to-bait, \(cis\) and \(trans\) interactions in the significant calls and the number of interaction calls by distance bins, which can be done with the following code:
if( bedtools.installed() ) {
# calculate proportions by interaction type
<- nrow(significant.results);
total <- sum(is.na(significant.results$distance))/total;
trans.prop <- sum(!is.na(significant.results$distance))/total;
cis.prop <- sum(significant.results$bait.to.bait)/total;
b2b.prop <- c('trans' = trans.prop, 'cis' = cis.prop, 'bait.to.bait' = b2b.prop);
int.data print(int.data);
# get number of interactions by distance bins
<- NULL;
binned.data
<- list(
distance.bins "0-10kbp" = c(0, 1e4),
"10-100kbp" = c(1e4, 1e5),
"100kbp-1Mbp" = c(1e5, 1e6),
"1-10Mbp" = c(1e6, 1e7),
"10-100Mbp" = c(1e7, 1e8),
"100-1000Mbp" = c(1e8, 1e9)
);
for(dist.i in names(distance.bins)){
<- length(which(significant.results$distance >= distance.bins[[dist.i]][1] & significant.results$distance < distance.bins[[dist.i]][2]));
bin.sum <- cbind(binned.data, bin.sum);
binned.data
}
colnames(binned.data) <- names(distance.bins);
print(binned.data);
}
Once the interaction calls are determined to be reasonable, it is useful to examine the significant interactions in context. Uploading the files to the WashU Epigenome Browser is a quick and user-friendly way to do so. Use the CHiCANE helper function to easily convert to the correct format for the browser.
if( bedtools.installed() ) {
# make a browser compatible file from significant interactions
<- tempfile('significant_calls_standard.txt')
browser.file convert.standard.format(
chicane.results = 'data/chicane/T47D/interaction_calls_significant.txt',
file.name = browser.file
);
}
Viewing in the browser will help to elucidate interaction peaks for loci of interest, then a user can make a plot of interactions at a particular locus using another of CHiCANE’s helper functions. In the following example we plot interactions at the 2q35 locus.
if( bedtools.installed() ) {
# set genome annotation file
<- system.file('extdata', 'gencode_2q35.gtf', package = 'chicane');
gene.ann
# generate 2q35 locus plot
create.locus.plot(
genome = 'hg38',
chr = 'chr2',
start = 216600000,
end = 217140000,
gene.data = gene.ann,
genomic.features = 'data/captured_fragments/T47D/Baxter_captured_fragments.bed',
feature.name = 'baits',
interaction.data = 'data/chicane/T47D/interaction_calls.txt',
file.name = 'graphs/T47D/2q35.pdf'
);
}
sessionInfo();
#> R version 3.6.0 (2019-04-26)
#> Platform: x86_64-conda_cos6-linux-gnu (64-bit)
#> Running under: CentOS Linux 8 (Core)
#>
#> Matrix products: default
#> BLAS/LAPACK: /data/scratch/DBC/UBCN/BCRBIOIN/SHARED/software/conda_envs/R-3.6.0/lib/R/lib/libRblas.so
#>
#> locale:
#> [1] C
#>
#> attached base packages:
#> [1] parallel splines stats graphics grDevices utils datasets
#> [8] methods base
#>
#> other attached packages:
#> [1] chicane_0.1.8 data.table_1.14.0 gamlss.tr_5.1-7 gamlss_5.3-4
#> [5] nlme_3.1-151 gamlss.data_6.0-1 gamlss.dist_5.3-2 MASS_7.3-53.1
#>
#> loaded via a namespace (and not attached):
#> [1] bslib_0.2.4 compiler_3.6.0 formatR_1.11
#> [4] jquerylib_0.1.4 bedr_1.0.7 futile.logger_1.4.3
#> [7] iterators_1.0.13 R.methodsS3_1.8.1 futile.options_1.0.1
#> [10] R.utils_2.10.1 tools_3.6.0 testthat_3.0.2
#> [13] digest_0.6.27 jsonlite_1.7.2 evaluate_0.14
#> [16] lattice_0.20-41 rlang_0.4.11 foreach_1.5.1
#> [19] Matrix_1.3-2 yaml_2.2.1 VennDiagram_1.6.20
#> [22] xfun_0.22 stringr_1.4.0 knitr_1.34
#> [25] sass_0.3.1 grid_3.6.0 R6_2.5.1
#> [28] survival_3.2-10 rmarkdown_2.11 lambda.r_1.2.4
#> [31] magrittr_2.0.1 codetools_0.2-18 htmltools_0.5.1.1
#> [34] stringi_1.5.3 R.oo_1.24.0
Development of CHiCANE was supported by Breast Cancer Now.