
Gene Annotation Tracks

A gene annotation track shows the locations and structures of genes and transcripts. Gene annotations come in many formats and from many sources, such as the UCSC known gene table, GTF, GFF, and BED. Currently, GIVE only supports the UCSC known gene table format; GTF/GFF support is planned for the next update. In the GIVE data source, a gene annotation track is set to the genePred type.

UCSC known gene table format

The UCSC known gene table format is used by the UCSC Known Genes dataset, which is constructed by a fully automated process based on protein data from Swiss-Prot/TrEMBL (UniProt) and the associated mRNA data from GenBank. It is a tab-separated text format with 12 columns. The content of each column is described below.

name: Name of the gene. This name is shown in the gene annotation track of the GIVE genome browser.
chrom: Reference sequence chromosome or scaffold
strand: + or - for strand
txStart: Transcription start position (or end position for minus strand item)
txEnd: Transcription end position (or start position for minus strand item)
cdsStart: Coding region start (or end position for minus strand item)
cdsEnd: Coding region end (or start position for minus strand item)
exonCount: Number of exons
exonStarts: Exon start positions (or end positions for minus strand item)
exonEnds: Exon end positions (or start positions for minus strand item)
proteinID: (Currently NOT used in GIVE) UniProt ID, UniProt accession, or RefSeq protein ID
alignID: (Currently NOT used in GIVE) Unique identifier (GENCODE transcript ID for GENCODE Basic)
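To make the column layout concrete, the following is a minimal Python sketch that parses one line of a UCSC known gene table into a record and checks that exonCount agrees with the exon lists. It assumes, as in files exported from UCSC, that exonStarts and exonEnds are comma-separated lists, usually with a trailing comma. The names `KnownGene`, `parse_known_gene`, and `tss` are hypothetical helpers for illustration, not part of GIVE. The `tss` method reflects the strand rule above: for a minus-strand item, the stored txEnd is where transcription actually starts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KnownGene:
    """One row of a UCSC known gene table (12 tab-separated columns)."""
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_count: int
    exon_starts: List[int]
    exon_ends: List[int]
    protein_id: str  # not used by GIVE
    align_id: str    # not used by GIVE

    def tss(self) -> int:
        """Biological transcription start site: txEnd for minus-strand items."""
        return self.tx_start if self.strand == "+" else self.tx_end


def _int_list(field: str) -> List[int]:
    # Exon coordinates are comma-separated; empty pieces (trailing comma) are skipped.
    return [int(x) for x in field.split(",") if x]


def parse_known_gene(line: str) -> KnownGene:
    """Parse one data line; raises ValueError on a malformed row."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 12:
        raise ValueError(f"expected 12 columns, got {len(cols)}")
    gene = KnownGene(
        name=cols[0], chrom=cols[1], strand=cols[2],
        tx_start=int(cols[3]), tx_end=int(cols[4]),
        cds_start=int(cols[5]), cds_end=int(cols[6]),
        exon_count=int(cols[7]),
        exon_starts=_int_list(cols[8]), exon_ends=_int_list(cols[9]),
        protein_id=cols[10], align_id=cols[11],
    )
    # The number of listed exons must match the exonCount column.
    if not (len(gene.exon_starts) == len(gene.exon_ends) == gene.exon_count):
        raise ValueError("exonCount does not match exonStarts/exonEnds")
    return gene
```

For example, a made-up minus-strand row with exonStarts `1000,4000,` and exonEnds `2000,5000,` parses into two exons, and `tss()` returns 5000 (its txEnd). A file downloaded from the UCSC Table Browser may begin with a `#` header line, which should be skipped before calling the parser.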

A gene annotation file in UCSC known gene table format can be downloaded from the UCSC Table Browser. By default, the first column holds the UCSC known gene ID (kgID, such as uc031tla.1), which is what the genome browser displays. You might prefer to show gene symbols instead of kgIDs. This can be done in three steps.
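The renaming itself can also be scripted. The Python sketch below is one possible approach and is not necessarily the three-step procedure referred to above. It assumes you already have a two-column, tab-separated mapping from kgID to gene symbol (for example, the kgID and geneSymbol columns exported from the UCSC kgXref table). The function names `load_symbol_map` and `rename_genes` are hypothetical. Only the first column is rewritten; IDs without a known symbol, `#` header lines, and blank lines are passed through unchanged.

```python
from typing import Dict, Iterable, Iterator


def load_symbol_map(lines: Iterable[str]) -> Dict[str, str]:
    """Read assumed 'kgID<TAB>geneSymbol' lines into a dict, skipping headers."""
    mapping: Dict[str, str] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 2 and cols[1]:
            mapping[cols[0]] = cols[1]
    return mapping


def rename_genes(table_lines: Iterable[str], symbols: Dict[str, str]) -> Iterator[str]:
    """Replace column 1 (kgID) with its gene symbol, keeping all other columns."""
    for line in table_lines:
        if line.startswith("#") or not line.strip():
            yield line  # header or blank line: leave untouched
            continue
        cols = line.rstrip("\n").split("\t")
        cols[0] = symbols.get(cols[0], cols[0])  # keep the kgID if no symbol is known
        yield "\t".join(cols) + "\n"
```

To rewrite a whole file, open the downloaded table and the mapping file, build the dict with `load_symbol_map`, and pass `rename_genes(...)` to the output file's `writelines`. Because several transcripts (kgIDs) can belong to the same gene, different rows may end up with the same symbol, so the track can show repeated labels for alternative isoforms.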

GTF Format

Coming soon.