| ← 3.1. GIVE-Toolbox | ↑ Index | 4.1. Bed Format → |
MySQL commands for managing data in GIVE data source
To visualize new data with GIVE, you can add it to the MySQL-compatible data source. ** *NOTE: If you are not very familiar with SQL commands, it is recommended that you manage data used by GIVE using GIVE-Toolbox. Detailed manual of GIVE-Toolbox is shown in 3.1. GIVE-Toolbox usages for managing data in GIVE data source. ***
Table of Contents
Creating a new reference genome for GIVE
Creating a new reference genome database
To visualize a new reference genome, GIVE only needs to know:
- The names of the species for this reference (Latin and common names are recommended but any name should work);
- Its chromosomal information, including names, sizes and the location of centromeres.
These are stored in two locations within the data source. First you need to create a database with your_reference_database:
CREATE DATABASE `<your_reference_database>`;
Also, make sure your GIVE Database User has access to this database (if you haven’t granted privilege for *.*):
GRANT SELECT, CREATE TEMPORARY TABLES ON `<your_reference_database>`.* TO `<give_data_user>`@'%';
Creating a cytoBandIdeo table and populate it with data
Then you need to create a cytoBandIdeo table with chromosomal information in your reference database with the following columns:
| Column name | Type | Description |
|---|---|---|
chrom |
varchar |
Chromosome name |
chromStart |
unsigned int |
The start coordinate of the band |
chromEnd |
unsigned int |
The end coordinate of the band |
name |
varchar |
Name of the band |
gieStain |
varchar |
Giemsa Stain info, to identify bands, centromeres, etc. |
The SQL code to create this table is shown below:
CREATE TABLE `<your_reference_database>`.`cytoBandIdeo` ( `chrom` varchar(255) NOT NULL, `chromStart` int(10) unsigned NOT NULL, `chromEnd` int(10) unsigned NOT NULL, `name` varchar(255) NOT NULL, `gieStain` varchar(255) NOT NULL, KEY `chrom` (`chrom`(23), `chromStart`) ) ENGINE=InnoDB;
The cytoBandIdeo table also needs to be populated by actual data. The following SQL command can be used if the file cytoBandIdeo is already on the server:
LOAD DATA LOCAL INFILE "<cytoBandIdeo_file_path>" INTO TABLE `<your_reference_database>`.`cytoBandIdeo`;
NOTE: The annotation files, including the cytoBandIdeo file for all references available on GIVE can be downloaded from UCSC or the following URL: https://demo.give.genemo.org/annotations/. Currently, hg19, hg38, mm9 and mm10 are available.
***
Adding an entry in ref table of compbrowser
After creating the table, you also need to add one entry in table ref of database compbrowser. Notice that the browserActive field needs to be set to 1 and in the settings field, the JSON string also has its browserActive attribute set as true. (You may want to try http://www.objgen.com/json to get a JSON string with ease.)
Please see
The SQL code is shown below:
INSERT INTO `compbrowser`.`ref` (
`dbname`,
`name`,
`commonname`,
`browserActive`,
`settings`
) VALUES (
'<your_reference_database>',
'<species_name>',
'<species_common_name>',
1,
'{
"browserActive": true
}'
);
Creating track groups
Tracks in GIVE belong to track groups for better management and these groups need their place in the database. A grp table is required in the reference database to manage track groups with the following columns:
| Column name | Type | Description |
|---|---|---|
name |
varchar |
Group name |
label |
varchar |
Description of the group |
priority |
float |
Order for this group in the browser, less is upper |
defaultIsClosed |
tinyint |
Whether the group will be closed by default, reserved |
singleChoice |
tinyint |
Whether the group will only allow one track to be active at any time |
The SQL code to create this table is shown below:
CREATE TABLE `<your_reference_database>`.`grp` ( `name` char(150) NOT NULL, `label` char(255) NOT NULL DEFAULT '', `priority` float NOT NULL DEFAULT '0', `defaultIsClosed` tinyint(2) DEFAULT NULL, `singleChoice` tinyint(1) NOT NULL DEFAULT '0', PRIMARY KEY (`name`) ) ENGINE=InnoDB;
Individual track groups can be created by adding entries in the grp table, using the following SQL command:
INSERT INTO `<your_reference_database>`.`grp` VALUES ( '<group_name>', '<group_description>', <the_priority_of_the_group>, <default_is_closed_value>, <whether_the_group_only_allows_one_choice> );
Creating the track definition table
Tracks themselves also need a place to store their annotation and data. This is achieved by creating a table named trackDb in the reference database with the following columns:
| Column name | Type | Description |
|---|---|---|
tableName |
varchar |
Name of the track table |
type |
varchar |
Type of the track (Important) |
priority |
float |
Order for the track (within group) |
url |
longblob |
URL for the track, reserved |
html |
longtext |
HTML description for the track, reserved |
grp |
varchar |
Group of the track, should be the same as __* |
settings |
longtext |
Detailed track settings, JSON format. See Database table properties documentation for details. |
The SQL code to create this table is shown below:
CREATE TABLE `<your_reference_database>`.`trackDb` (
`tableName` varchar(150) NOT NULL,
`type` varchar(255) NOT NULL,
`priority` float NOT NULL,
`url` longblob,
`html` longtext,
`grp` varchar(150) NOT NULL,
`settings` longtext NOT NULL,
PRIMARY KEY (`tableName`),
FOREIGN KEY `group_id` (`grp`) REFERENCES
`<your_reference_database>`.`grp` (`name`)
ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB;
After this step, GIVE will be able to display the new reference genome and also data tracks within it.
Adding data
Adding tracks to GIVE database typically involves two steps:
- Create a table corresponding to the data type and populate it;
- Add an entry in the
trackDbtable with the metadata of the track.
Adding genomic span data (BED)
The table for BED tracks needs to contain the following columns:
| Column name | Type | Description |
|---|---|---|
name |
varchar |
Name of the span (gene, transcript, etc.) |
chrom |
varchar |
Chromosome name |
strand |
char |
Strand of the span |
txStart |
int |
Start coordinate of the span |
txEnd |
int |
End coordinate of the span |
cdsStart |
int |
Start coordinate of the ‘thick’ portion of the span |
cdsEnd |
int |
End coordinate of the ‘thick’ portion of the span |
exonCount |
int |
Number of exons in the span |
exonStarts |
longblob |
The start coordinates of all exons, separated by comma |
exonEnds |
longblob |
The end coordinates of all exons, separated by comma |
proteinID |
varchar |
The ID of corresponding protein, only applies to gene tracks |
alignID |
varchar |
The ID of alignment segments, only applies to gene tracks |
Note: This is different than BED12 format: 1) field order is slightly different; 2) the 9th and 10th columns represent the start and end coordinate of all the exons, instead of the start within the gene and length of the exon in BED12. ***
The SQL command to create such a table is shown below:
CREATE TABLE `<your_reference_database>`.`<track_table_name>` ( `name` varchar(255) NOT NULL DEFAULT '', `chrom` varchar(255) NOT NULL DEFAULT '', `strand` char(2) NOT NULL DEFAULT '', `txStart` int(10) unsigned NOT NULL DEFAULT '0', `txEnd` int(10) unsigned NOT NULL DEFAULT '0', `cdsStart` int(10) unsigned NOT NULL DEFAULT '0', `cdsEnd` int(10) unsigned NOT NULL DEFAULT '0', `exonCount` int(10) unsigned NOT NULL DEFAULT '0', `exonStarts` longblob NOT NULL, `exonEnds` longblob NOT NULL, `proteinID` varchar(40) NOT NULL DEFAULT '', `alignID` varchar(255) NOT NULL DEFAULT '', KEY `name` (`name`), KEY `chrom` (`chrom`(16),`txStart`), KEY `chrom_2` (`chrom`(16),`txEnd`), KEY `protein` (`proteinID`(16)), KEY `align` (`alignID`) ) ENGINE=InnoDB;
After the table is created, you can populate it with the actual data:
LOAD DATA LOCAL INFILE "<bed_data_file_path>" INTO TABLE `<your_reference_database>`.`<track_table_name>`;
The entry in the trackDb table can be added via the following SQL command:
INSERT INTO `<your_reference_database>`.`trackDb` VALUES (
'<track_table_name>',
'bed',
1,
NULL,
NULL,
'genes', -- Group name, should be the same as grp.name
'{
"group":"<group_name>",
"longLabel":"<long_label>",
"priority":1,
"shortLabel":"<short_label>",
"track":"<track_table_name>",
"type":"bed",
"visibility":"pack",
"adaptive":true
}'
);
Adding linear tracks (bigWig)
The table for bigWig tracks only needs to contain the following column:
| Column name | Type | Description |
|---|---|---|
fileName |
varchar |
The path or URL of the bigWig file |
The SQL command to create such a table is shown below:
CREATE TABLE `<your_reference_database>`.`<track_table_name>` ( `fileName` varchar(255) NOT NULL ) ENGINE=InnoDB;
After the table is created, you can populate it with the actual data:
INSERT INTO `<your_reference_database>`.`<track_table_name>` VALUES ( '<bigWig_file_path>' );
The entry in the trackDb table can be added via the following SQL command:
INSERT INTO `<your_reference_database>`.`trackDb` VALUES (
'<track_table_name>',
'bigWig',
1,
NULL,
NULL,
'genes', -- Group name, should be the same as grp.name
'{
"group":"<group_name>",
"longLabel":"<long_label>",
"priority":1,
"shortLabel":"<short_label>",
"track":"<track_table_name>",
"type":"bigWig",
"visibility":"full",
"autoScale":false,
}'
);
Adding interaction tracks
Adding interaction tracks (in interaction format) is similar to adding BED or GenePred tracks. The table for interaction tracks needs to contain the following columns:
| Column name | Type | Description |
|---|---|---|
ID |
int |
ID of the interaction segment (can be generated by auto-increment) |
chrom |
varchar |
Chromosome name |
Start |
int |
Start coordinate of the span |
End |
int |
End coordinate of the span |
linkID |
int |
ID of the link (segments with the same linkID are linked together) |
value |
float |
The value of the link |
dirFlag |
tinyint |
The direction of the link (the link should go from the segment with dirFlag = 0 to the one with dirFlag = 1) if the link has direction, ‘-1’ if the link does not have direction |
The SQL command to create such a table is shown below:
CREATE TABLE ``.`<track_table_name>` ( `ID` int(10) unsigned NOT NULL AUTO_INCREMENT, `chrom` varchar(255) NOT NULL DEFAULT '', `start` int(10) unsigned NOT NULL DEFAULT '0', `end` int(10) unsigned NOT NULL DEFAULT '0', `linkID` int(10) unsigned NOT NULL DEFAULT '0', `value` float NOT NULL DEFAULT '0', `dirFlag` tinyint(4) NOT NULL DEFAULT '-1', PRIMARY KEY (`ID`), KEY `chrom` (`chrom`(16),`start`), KEY `chrom_2` (`chrom`(16),`end`), KEY `linkID` (`linkID`) ) ENGINE=InnoDB; </pre> After the table is created, you can populate it with the actual data: LOAD DATA LOCAL INFILE "<interaction_data_file_path>" INTO TABLE `<your_reference_database>`.`<track_table_name>`;The entry in the `trackDb` table can be added via the following SQL command:INSERT INTO `<your_reference_database>`.`trackDb` VALUES ( '<track_table_name>', 'interaction', 1, NULL, NULL, 'genes', -- Group name, should be the same as grp.name '{ "group":"<group_name>", "longLabel":"<long_label>", "priority":1, "shortLabel":"<short_label>", "track":"<track_table_name>", "type":"interation", "visibility":"full", "thresholdPercentile": [ <percentile_values_for_interaction_thresholds> ], }' );## Property references For both references and tracks, the `settings` column in the corresponding tables (`ref` table in `compbrowser` database for a reference, and `trackDb` table in your reference database for a track.) includes some properties that are available. This section lists some of the useful properties. ### Reference table properties documentation Here are a brief description of the properties you may use for a reference in the `settings` column of `ref` in the `compbrowser` database: __browserActive__ > Type: `Boolean` > Default: `false` > Whether this reference will be enabled for the GIVE browser. If you would > like to have this reference available, you will need to set this value to > `true`. __geneCoorTable__ > Type: `String` > Default: `''` > The database table for the gene name search function of `` > element. This table should have the columns shown below. > | Column name | Type | Description | > | --- | --- | --- | > | ` ` | `VARCHAR` | Gene names/symbols (this can be a column of the table itself, or a column in the linked table(s)). This value can be specified in `geneSymbolColumn` property, see below. | > | `chrom` | `VARCHAR` | The name of the chromosome the gene is on | > | `chromStart` | `INT` | Chromosomal start coordinates | > | `chromEnd` | `INT` | Chromosomal end coordinates | __geneSymbolColumn__ > Type: `String` > Default: `'name'` > The column for gene symbols in `geneCoorTable`. __geneDescTable__ > Type: `String` > Default: `''` > When implementing gene name search function of ` `, this > table can be used to provide the gene description in the search results. This > table should have the columns shown below. > | Column name | type | Description | > | --- | --- | --- | > | ` ` | `VARCHAR` | Gene symbols (corresponding to ` ` in ` `). This value can be specified in `descSymbolColumn` property, see below. | > | `description` | `VARCHAR` | Gene description | __descSymbolColumn__ > Type: `String` > Default: `'Symbol'` > The column for gene symbols in `geneDescTable`. __aliasTable__ > Type: `String` > Default: `''` > When implementing gene name search function of ` `, this > table can be used to link all the different aliases to their official symbol. > (For example, `OCT3/4` to `POU5F1`.) This table should have the columns shown > below. > | Column name | type | Description | > | --- | --- | --- | > | ` ` | `VARCHAR` | gene symbols (corresponding to ` ` in ` `). This value can be specified in `aliasSymbolColumn` property, see below. | > | `alias` | `VARCHAR` | Gene aliases, each alias shall occupy one row | > | `isSymbol` | `BIT` | Indicating whether this alias is the same as ` `, 1 if the alias is the same, 0 if not | __aliasSymbolColumn__ > Type: `String` > Default: `'Symbol'` > The column for gene symbols in `aliasTable`. ### Database table properties documentation Here are a brief description of the properties you may use in the `settings` column of `trackDb` in your reference database: __shortLabel__ > Type: `String` > Default: `''` > The short label shown by the track. __longLabel__ > Type: `String` > Default: `''` > The long label providing more information. Reserved. __visibility__ > Type: `enum` (`'full'`, `'pack'`, `'collapsed'`, `'dense'`, and `'hide'`) > Default: `'hide'` > The visibility of the track, values on the left are more visible/prominent. __adaptive__ > Type: `Boolean` > Default: `false` > Whether the visibility will adapt to the height of the ending result. If the > height exceeds certain threshold, the visibility will be degraded by one. __dataType__, __cellType__, __trackFeature__, __labName__, __groupDataType__, __groupFeature__, __groupSampleType__ > Type: `String` > Default: `null` > These are metadata of the tracks. Reserved for future features. __thresholdPercentile__ > Type: `Array ` > Default: `null` > A series of numbers defining the percentile of the data points in the track. > This is used to color interaction graphs by their corresponding signal level. > May be applied to other types of tracks in a future release. |||| | --- | --- | --- | | [← 3.1. GIVE-Toolbox](/Genomic-Interactive-Visualization-Engine/manuals/3.1-GIVE-Toolbox-usages.html) | [↑ Index](/Genomic-Interactive-Visualization-Engine/manuals/) | [4.1. Bed Format →](/Genomic-Interactive-Visualization-Engine/manuals/4.1-bed.html) |