ChIA-PET long-range chromatin interactions

This genome browser demo presents 15 datasets of ChIA-PET long-range chromatin interactions along with human genome assembly GRCh37 (hg19) for comparative studies. ChIA-PET data were generated by ENCODE Phase 2. The ChIA-PET chromatin interaction data includes 8 experiments in 5 cell lines (K562, MCF-7, HeLa-S3, HCT116 and NB4) with 3 kinds of target genes (POLR2A, CTCF and ESR1).

fig

Data preparation for GIVE

We used the ENCODE ChIA-PET long-range chromatin interactions bed (bed12) format data, which can be downloaded from ENCODE Experiment Matrix. Alternatively, you can use batch_download.txt with following shell command to download all the data.

## run the following command in linux shell 
xargs -n 1 curl -O -L < batch_download.txt

The following table shows all the 15 bed format data sets we can get from ENCODE Phase 2.

DataFile	Experiment Accession	Target gene	Cell line	Description
ENCFF001THT.bed.gz	ENCSR000BZX	POLR2A	HCT116	POLR2A ChIA-PET on human HCT-116
ENCFF001THU.bed.gz	ENCSR000BZW	POLR2A	HeLa-S3	POLR2A ChIA-PET on human HeLa-S3
ENCFF001THV.bed.gz	ENCSR000CAC	CTCF	K562	CTCF ChIA-PET on human K562
ENCFF001THW.bed.gz	ENCSR000BZY	POLR2A	K562	POLR2A ChIA-PET on human K562
ENCFF001THX.bed.gz	ENCSR000CAD	CTCF	MCF-7	CTCF ChIA-PET on human MCF-7
ENCFF001THY.bed.gz	ENCSR000CAD	CTCF	MCF-7	CTCF ChIA-PET on human MCF-7
ENCFF001THZ.bed.gz	ENCSR000BZZ	ESR1	MCF-7	ESR1 ChIA-PET on human MCF-7
ENCFF001TIA.bed.gz	ENCSR000BZZ	ESR1	MCF-7	ESR1 ChIA-PET on human MCF-7
ENCFF001TIB.bed.gz	ENCSR000BZZ	ESR1	MCF-7	ESR1 ChIA-PET on human MCF-7
ENCFF001TIC.bed.gz	ENCSR000BZY	POLR2A	K562	POLR2A ChIA-PET on human K562
ENCFF001TID.bed.gz	ENCSR000CAA	POLR2A	MCF-7	POLR2A ChIA-PET on human MCF-7
ENCFF001TIE.bed.gz	ENCSR000CAA	POLR2A	MCF-7	POLR2A ChIA-PET on human MCF-7
ENCFF001TIF.bed.gz	ENCSR000CAA	POLR2A	MCF-7	POLR2A ChIA-PET on human MCF-7
ENCFF001TIG.bed.gz	ENCSR000CAB	POLR2A	NB4	POLR2A ChIA-PET on human NB4
ENCFF001TIJ.bed.gz	ENCSR000CAA	POLR2A	MCF-7	POLR2A ChIA-PET on human MCF-7

The following table shows a sample of the bed12 format used in ENCODE datasets, and the full description can be found here.

chrom	chromStart	chromEnd	name	score	strand	thickStart	thickEnd	itermRgb	blockCount	blockSizes	blockStarts
chr1	3507144	3538145	chr1:3507144..3509308-chr1:3534421..3538145,3	300	.	3507144	3538145	255,0,0	2	2164,3724	0,27277
chr1	3507584	3520603	chr1:3507584..3509631-chr1:3518303..3520603,3	300	.	3507584	3520603	255,0,0	2	2047,2300	0,10719
chr1	761369	763199	chr1:761369..763199-chr8:182999..184804,2	200	.	761369	763199	255,0,0	1	1830	0
chr8	182999	184804	chr1:761369..763199-chr8:182999..184804,2	200	.	182999	184804	255,0,0	1	1805	0

The first two rows in the sample show two intra-chromosome interactions, and the last two rows duplicates show one inter-chromosome interactions. So you may have found that we only need the name column to get all the interaction information. So we wrote a simple script chiapet2give.sh to convert the datasets to the GIVE supported interaction bed format (the format definition can be found in GIVE Manual). Run following command results in converted GIVE interaction bed files, which are named with a prefix give_x_, such as give_x_ENCFF001THU.bed.gz.bed.

## run in linux shell
ls ENCFF*.bed.gz | xargs -n 1 -P 4 -I {} bash ../../chiapet2give.sh {} ./

The following table shows the GIVE interaction bed format. These datasets can be loaded to GIVE MySQL server.

ID	chrom	Start	End	linkID	value	dirFlag
1	chr20	47889560	47895795	1	16.026310742063	-1
2	chr20	47896527	47898203	1	16.026310742063	-1
3	chr17	79827812	79838989	2	15.548584214411	-1
4	chr17	79848037	79871266	2	15.548584214411	-1
5	chr17	27046828	27048611	3	15.5357777367182	-1
6	chr17	27048612	27049990	3	15.5357777367182	-1

Build track in MariaDB

You need a server to build a genome browser with GIVE. Please read the prerequisites and configuration of GIVE server. In that tutorial page, you will also learn how to prepare MariaDB database and the GIVE Toolbox has information on how to build a reference genome for GIVE. When you have prepared MariaDB and built a hg19 database, you can use GIVE_chiapetTrack.sql file and following command template to load all the datasets to MariaDB and build 15 tracks.

Alternatively, you can set up the GIVE Docker container, which is mostly preconfigured.

## run these commands in linux shell
# change `<your user name>` to your user name of MariaDB
mysql -u `<your user name>` -p <./GIVE_chiapetTrack.sql

Build genome browser

When you have built tracks in MariaDB, it’s very easy to build a genome browser. The following code is what we used to build our demo. You can just copy and paste it in jsfiddle and then you can get the genome browser supported by our GIVE server. If you have built the tracks on your own GIVE server, you only need to replace the url in the code with your own server’s url.

<!-- change the url to your own server path -->
<script src="https://www.givengine.org/libWC/webcomponents-lite.min.js"></script> 
<!-- change the url to your own server path-->
<link rel="import" href="https://www.givengine.org/lib/chart-controller/chart-controller.html">
<!-- Embed the browser in your web page -->
<chart-controller title-text="ChIA-PET long-range chromatin interactions" 
    ref="hg19" num-of-subs="2" 
    coordinate='["chr17:7520037-7643128", "chr17:7441031-7588154"]'
    group-id-list='["genes", "ENCODE2_ChIA-PET", "customTracks"]'>
</chart-controller>

You can read this tutorial to learn how to simply tweak the genome browser. Our demo is based on the tweaked give_chiapet.html HTML file.

Genomic-Interactive-Visualization-Engine

Genomic Interactive Visualization Engine

ChIA-PET long-range chromatin interactions

Data preparation for GIVE

Build track in MariaDB

Build genome browser