ChIA-PET long-range chromatin interactions
This genome browser demo presents 15 ChIA-PET datasets of long-range chromatin interactions on the human genome assembly GRCh37 (hg19) for comparative studies. The ChIA-PET data were generated by ENCODE Phase 2 and cover 8 experiments in 5 cell lines (K562, MCF-7, HeLa-S3, HCT116 and NB4) with 3 target genes (POLR2A, CTCF and ESR1).
Data preparation for GIVE
We used the ENCODE ChIA-PET long-range chromatin interaction data in bed (bed12) format, which can be downloaded from the ENCODE Experiment Matrix. Alternatively, you can use batch_download.txt with the following shell command to download all the data.
## run the following command in linux shell
xargs -n 1 curl -O -L < batch_download.txt
The following table lists all 15 bed-format datasets available from ENCODE Phase 2.
DataFile | Experiment Accession | Target gene | Cell line | Description |
---|---|---|---|---|
ENCFF001THT.bed.gz | ENCSR000BZX | POLR2A | HCT116 | POLR2A ChIA-PET on human HCT-116 |
ENCFF001THU.bed.gz | ENCSR000BZW | POLR2A | HeLa-S3 | POLR2A ChIA-PET on human HeLa-S3 |
ENCFF001THV.bed.gz | ENCSR000CAC | CTCF | K562 | CTCF ChIA-PET on human K562 |
ENCFF001THW.bed.gz | ENCSR000BZY | POLR2A | K562 | POLR2A ChIA-PET on human K562 |
ENCFF001THX.bed.gz | ENCSR000CAD | CTCF | MCF-7 | CTCF ChIA-PET on human MCF-7 |
ENCFF001THY.bed.gz | ENCSR000CAD | CTCF | MCF-7 | CTCF ChIA-PET on human MCF-7 |
ENCFF001THZ.bed.gz | ENCSR000BZZ | ESR1 | MCF-7 | ESR1 ChIA-PET on human MCF-7 |
ENCFF001TIA.bed.gz | ENCSR000BZZ | ESR1 | MCF-7 | ESR1 ChIA-PET on human MCF-7 |
ENCFF001TIB.bed.gz | ENCSR000BZZ | ESR1 | MCF-7 | ESR1 ChIA-PET on human MCF-7 |
ENCFF001TIC.bed.gz | ENCSR000BZY | POLR2A | K562 | POLR2A ChIA-PET on human K562 |
ENCFF001TID.bed.gz | ENCSR000CAA | POLR2A | MCF-7 | POLR2A ChIA-PET on human MCF-7 |
ENCFF001TIE.bed.gz | ENCSR000CAA | POLR2A | MCF-7 | POLR2A ChIA-PET on human MCF-7 |
ENCFF001TIF.bed.gz | ENCSR000CAA | POLR2A | MCF-7 | POLR2A ChIA-PET on human MCF-7 |
ENCFF001TIG.bed.gz | ENCSR000CAB | POLR2A | NB4 | POLR2A ChIA-PET on human NB4 |
ENCFF001TIJ.bed.gz | ENCSR000CAA | POLR2A | MCF-7 | POLR2A ChIA-PET on human MCF-7 |
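Before converting the files, it can help to confirm that every file in the table was downloaded and is a valid gzip archive. The following is a minimal sketch rather than part of the original workflow; the function name `report_bad_downloads` is hypothetical, and it assumes the files sit in a single directory under the names shown in the DataFile column.

```shell
# Hypothetical helper (not part of GIVE or ENCODE tooling):
# print every expected file that is missing or fails gzip's integrity test.
# Usage: report_bad_downloads <directory> <file name>...
report_bad_downloads() {
  dir=$1
  shift
  for f in "$@"; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
    elif ! gzip -t "$dir/$f" 2>/dev/null; then
      echo "corrupt: $f"
    fi
  done
}
```

For example, `report_bad_downloads . ENCFF001THT.bed.gz ENCFF001THU.bed.gz` prints nothing when both files are present and intact.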
The following table shows a sample of the bed12 format used in ENCODE datasets, and the full description can be found here.
chrom | chromStart | chromEnd | name | score | strand | thickStart | thickEnd | itemRgb | blockCount | blockSizes | blockStarts |
---|---|---|---|---|---|---|---|---|---|---|---|
chr1 | 3507144 | 3538145 | chr1:3507144..3509308-chr1:3534421..3538145,3 | 300 | . | 3507144 | 3538145 | 255,0,0 | 2 | 2164,3724 | 0,27277 |
chr1 | 3507584 | 3520603 | chr1:3507584..3509631-chr1:3518303..3520603,3 | 300 | . | 3507584 | 3520603 | 255,0,0 | 2 | 2047,2300 | 0,10719 |
chr1 | 761369 | 763199 | chr1:761369..763199-chr8:182999..184804,2 | 200 | . | 761369 | 763199 | 255,0,0 | 1 | 1830 | 0 |
chr8 | 182999 | 184804 | chr1:761369..763199-chr8:182999..184804,2 | 200 | . | 182999 | 184804 | 255,0,0 | 1 | 1805 | 0 |
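In the rows with blockCount 2, the two anchor regions written in the name column can also be recomputed from chromStart, blockSizes and blockStarts: each anchor starts at chromStart plus its block start and extends by its block size. The sketch below illustrates that arithmetic; the function name `bed12_blocks_to_anchors` is hypothetical, and the input is assumed to be tab-separated bed12 lines without header lines.

```shell
# Hypothetical helper: for bed12 lines on stdin that have exactly two blocks,
# print both anchors as chrom:start..end-chrom:start..end.
bed12_blocks_to_anchors() {
  awk -F '\t' '$10 == 2 {
    split($11, size, ",")     # blockSizes, e.g. 2164,3724
    split($12, offset, ",")   # blockStarts relative to chromStart, e.g. 0,27277
    for (i = 1; i <= 2; i++) {
      sep = (i == 1) ? "" : "-"
      start = $2 + offset[i]
      printf "%s%s:%d..%d", sep, $1, start, start + size[i]
    }
    printf "\n"
  }'
}
```

For the first sample row this gives 3507144 + 0 = 3507144 to 3507144 + 2164 = 3509308 for the first anchor, and 3507144 + 27277 = 3534421 to 3534421 + 3724 = 3538145 for the second, which matches the name column.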
The first two rows in the sample are two intra-chromosome interactions, and the last two rows are duplicates that together describe one inter-chromosome interaction. As you may have noticed, the name column alone contains all the interaction information. We therefore wrote a simple script, chiapet2give.sh, to convert the datasets to the interaction bed format supported by GIVE (the format definition can be found in the GIVE Manual). Running the following command produces the converted GIVE interaction bed files, which are named with the prefix give_x_, such as give_x_ENCFF001THU.bed.gz.bed.
## run in linux shell
ls ENCFF*.bed.gz | xargs -P 4 -I {} bash ../../chiapet2give.sh {} ./
The following table shows the GIVE interaction bed format. Datasets in this format can be loaded into the GIVE MySQL server.
ID | chrom | Start | End | linkID | value | dirFlag |
---|---|---|---|---|---|---|
1 | chr20 | 47889560 | 47895795 | 1 | 16.026310742063 | -1 |
2 | chr20 | 47896527 | 47898203 | 1 | 16.026310742063 | -1 |
3 | chr17 | 79827812 | 79838989 | 2 | 15.548584214411 | -1 |
4 | chr17 | 79848037 | 79871266 | 2 | 15.548584214411 | -1 |
5 | chr17 | 27046828 | 27048611 | 3 | 15.5357777367182 | -1 |
6 | chr17 | 27048612 | 27049990 | 3 | 15.5357777367182 | -1 |
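The idea behind the conversion, reading both anchors from the name column and writing one row per anchor with a shared linkID, can be sketched as follows. This is not the actual chiapet2give.sh, whose contents are not shown here. The function name `bed12_to_anchor_rows` is hypothetical, the input is assumed to be tab-separated bed12 without header lines, the number after the comma in the name column is used as the value purely as an assumption (the values in the table above are evidently derived differently), and the dirFlag of -1 is simply copied from the sample rows.

```shell
# Hypothetical sketch, NOT the real chiapet2give.sh.
# Reads ENCODE ChIA-PET bed12 lines on stdin and prints tab-separated rows:
# ID, chrom, Start, End, linkID, value, dirFlag
bed12_to_anchor_rows() {
  awk -F '\t' '!seen[$4]++ {          # inter-chromosome pairs appear twice; keep the first
    # name looks like chr1:761369..763199-chr8:182999..184804,2
    split($4, part, ",")              # part[1] = both anchors, part[2] = number after the comma
    split(part[1], anchor, "-")       # anchor[1] and anchor[2]
    linkID++
    for (i = 1; i <= 2; i++) {
      split(anchor[i], f, "[:.]+")    # f[1] = chrom, f[2] = start, f[3] = end
      # value = part[2] is an assumption; dirFlag -1 follows the sample table
      printf "%d\t%s\t%s\t%s\t%d\t%s\t-1\n", ++id, f[1], f[2], f[3], linkID, part[2]
    }
  }'
}
```

For example, `gunzip -c ENCFF001THU.bed.gz | bed12_to_anchor_rows` would print two rows per interaction, both carrying the same linkID.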
Build track in MariaDB
You need a server to build a genome browser with GIVE. Please read the prerequisites and configuration of the GIVE server. That tutorial page also explains how to prepare the MariaDB database, and the GIVE Toolbox documentation describes how to build a reference genome for GIVE. Once you have prepared MariaDB and built an hg19 database, you can use the GIVE_chiapetTrack.sql file with the following command template to load all the datasets into MariaDB and build the 15 tracks.
Alternatively, you can set up the GIVE Docker container, which is mostly preconfigured.
## run these commands in linux shell
# replace <your user name> with your MariaDB user name
mysql -u "<your user name>" -p < ./GIVE_chiapetTrack.sql
Build genome browser
Once the tracks are built in MariaDB, building a genome browser is straightforward. The following code is what we used for our demo. You can copy and paste it into JSFiddle to get a genome browser backed by our GIVE server. If you built the tracks on your own GIVE server, you only need to replace the URLs in the code with your own server’s URLs.
<!-- change the url to your own server path -->
<script src="https://www.givengine.org/libWC/webcomponents-lite.min.js"></script>
<!-- change the url to your own server path-->
<link rel="import" href="https://www.givengine.org/lib/chart-controller/chart-controller.html">
<!-- Embed the browser in your web page -->
<chart-controller title-text="ChIA-PET long-range chromatin interactions"
ref="hg19" num-of-subs="2"
coordinate='["chr17:7520037-7643128", "chr17:7441031-7588154"]'
group-id-list='["genes", "ENCODE2_ChIA-PET", "customTracks"]'>
</chart-controller>
You can read this tutorial to learn how to tweak the genome browser. Our demo is based on the tweaked HTML file give_chiapet.html.