
Human Assembly and Annotation Information

Homo sapiens

Summary

This site presents data from the manual annotation of the human genome by the Havana group at the Wellcome Trust Sanger Institute. A first-pass annotation of the whole genome has been completed as part of the GENCODE project. Vega also shows loss-of-function (LoF) loci.

Additional MHC and LRC Haplotypes

Vega also shows manual annotation of loci and regions of particular interest:

External database identifiers

  • Vega human has CCDS identifiers assigned to translations where appropriate. Transcripts with CCDS identifiers attached are highlighted in light blue on Location-based views. The CCDS identifiers themselves are accessible on the Gene Summary, Gene External References and Transcript Summary pages.
  • Records are downloaded from HGNC, and Vega gene names are associated with the identifiers in the downloaded file. The external sources added this way are HGNC, EntrezGene, OMIM, PubMed and RefSeq.
  • UniProt records, Gene Ontology (GO) terms and Gene Ontology Annotation (GOA) records, generated by the EBI UniProt and GOA teams, are imported into Vega.
  • lncRNAs in Vega are incorporated into the ENA, and reciprocal links to the ENA are added to the corresponding transcripts in Vega.
  • Links to IMGT-GeneDB are added using the Vega gene names, and links to IMGT/HLA are added using associations downloaded from IMGT/HLA.

Publications

  • Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, Barnes I, Bignell A, Boychenko V, Hunt T, Kay M, Mukherjee G, Rajan J, Despacio-Reyes G, Saunders G, Steward C, Harte R, Lin M, Howald C, Tanzer A, Derrien T, Chrast J, Walters N, Balasubramanian S, Pei B, Tress M, Rodriguez JM, Ezkurdia I, van Baren J, Brent M, Haussler D, Kellis M, Valencia A, Reymond A, Gerstein M, Guigó R, Hubbard TJ.
    GENCODE: the reference human genome annotation for The ENCODE Project.
    Genome Res. 2012 Sep;22(9):1760-74. doi:10.1101/gr.135350.111.

Genome Summary

Last Full Update 7 February 2017
Datafreeze Date 18 October 2016
Total Bases 3,354,901,136
Golden Path Length 3,085,168,840
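As a small arithmetic sketch, the gap between Total Bases and Golden Path Length can be computed directly from the two figures above. Reading that gap as sequence outside the primary reference path (for example alternative haplotypes and GRC patches) is an assumption based on the usual Ensembl/Vega meaning of "golden path"; this page does not state it.

```python
# Figures copied from the Genome Summary above.
TOTAL_BASES = 3_354_901_136
GOLDEN_PATH_LENGTH = 3_085_168_840


def bases_off_golden_path(total: int, golden: int) -> int:
    """Bases counted in the total but not on the golden path.

    Assumption: these are bases in alternative haplotypes and patches.
    """
    return total - golden


def golden_path_fraction(total: int, golden: int) -> float:
    """Fraction of all bases that lie on the golden path."""
    return golden / total


if __name__ == "__main__":
    off = bases_off_golden_path(TOTAL_BASES, GOLDEN_PATH_LENGTH)
    frac = golden_path_fraction(TOTAL_BASES, GOLDEN_PATH_LENGTH)
    print(f"{off:,} bases off the golden path; {frac:.1%} of bases on it")
```

With these figures, 269,732,296 bases fall outside the golden path, so roughly 92% of the counted bases lie on it.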

GRCh38 assembly genes

Havana: 50,580
  Protein coding 19,768
  lncRNAs: 14,175
    lincRNA 7,513
    antisense 5,526
    sense intronic 903
    sense overlapping 190
    3prime overlapping ncRNA 31
    bidirectional promoter lncRNA 8
    non coding 3
    macro lncRNA 1
  ncRNAs: 11
    snoRNA 8
    scRNA 1
    vaultRNA 1
    snRNA 1
  Unclassified processed transcripts 535
  Pseudogenes: 14,613
    processed pseudogene 10,240
    unprocessed pseudogene 2,668
    transcribed unprocessed pseudogene 751
    transcribed processed pseudogene 452
    IG pseudogene 201
    unitary pseudogene 116
    transcribed unitary pseudogene 97
    polymorphic pseudogene 54
    TR pseudogene 34
  IG 213
  TR 197
  Other: 1,068
    TEC 1,068
Readthrough genes 761
LOF: 258
  Protein coding 245
  lncRNAs: 1
    lincRNA 1
  Unclassified processed transcripts 4
  Pseudogenes: 7
    polymorphic pseudogene 7
  TR 1
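Checking the sums clarifies how the GRCh38 counts are organised: the biotype groups add up exactly to the Havana total, while the Readthrough genes and LOF figures are reported alongside it rather than as part of it. The sketch below, with values transcribed by hand from this page, performs that check; the variable and function names are illustrative only.

```python
# GRCh38 assembly gene counts, transcribed from the table above.
HAVANA_TOTAL = 50_580

# Top-level groups that together make up the Havana total.
GROUPS = {
    "Protein coding": 19_768,
    "lncRNAs": 14_175,
    "ncRNAs": 11,
    "Unclassified processed transcripts": 535,
    "Pseudogenes": 14_613,
    "IG": 213,
    "TR": 197,
    "Other": 1_068,
}

# Subtype breakdowns for the groups that have one.
SUBTYPES = {
    "lncRNAs": {
        "lincRNA": 7_513, "antisense": 5_526, "sense intronic": 903,
        "sense overlapping": 190, "3prime overlapping ncRNA": 31,
        "bidirectional promoter lncRNA": 8, "non coding": 3, "macro lncRNA": 1,
    },
    "ncRNAs": {"snoRNA": 8, "scRNA": 1, "vaultRNA": 1, "snRNA": 1},
    "Pseudogenes": {
        "processed pseudogene": 10_240, "unprocessed pseudogene": 2_668,
        "transcribed unprocessed pseudogene": 751,
        "transcribed processed pseudogene": 452, "IG pseudogene": 201,
        "unitary pseudogene": 116, "transcribed unitary pseudogene": 97,
        "polymorphic pseudogene": 54, "TR pseudogene": 34,
    },
    "Other": {"TEC": 1_068},
}

# LOF loci are listed separately, with their own group breakdown.
LOF_TOTAL = 258
LOF_GROUPS = {
    "Protein coding": 245, "lncRNAs": 1,
    "Unclassified processed transcripts": 4, "Pseudogenes": 7, "TR": 1,
}


def mismatched_groups() -> list:
    """Names of groups whose subtype counts do not sum to the group count."""
    return [g for g, parts in SUBTYPES.items() if sum(parts.values()) != GROUPS[g]]


if __name__ == "__main__":
    print("groups with mismatched subtypes:", mismatched_groups())
    print("groups sum:", sum(GROUPS.values()), "vs Havana total:", HAVANA_TOTAL)
    print("LOF groups sum:", sum(LOF_GROUPS.values()), "vs LOF total:", LOF_TOTAL)
```

Every breakdown matches its heading. Adding Readthrough genes (761) to `GROUPS` would break the equality with the Havana total, which is why it is kept out.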

GRC patch genes

Havana: 2,258
  Protein coding 883
  lncRNAs 431
  Unclassified processed transcripts 36
  Pseudogenes 713
  IG 104
  TR 70
  Other 21
Readthrough genes 26
LOF: 4
  Protein coding 4

Haplotype genes

Havana: 1,955
  Protein coding 1,052
  lncRNAs 1
  Unclassified processed transcripts 264
  Pseudogenes 638
Readthrough genes 20
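The same consistency check applies to the GRC patch and haplotype counts. In this sketch (values transcribed by hand, names illustrative only), the Readthrough and LOF figures are again treated as separate from the Havana total, which is what the sums on this page imply.

```python
# GRC patch and haplotype gene counts, transcribed from the tables above.
# Readthrough genes (26 on patches, 20 on haplotypes) and LOF loci are
# reported separately and are deliberately left out of "groups".
SUMMARIES = {
    "GRC patch genes": {
        "total": 2_258,
        "groups": {
            "Protein coding": 883, "lncRNAs": 431,
            "Unclassified processed transcripts": 36, "Pseudogenes": 713,
            "IG": 104, "TR": 70, "Other": 21,
        },
    },
    "Haplotype genes": {
        "total": 1_955,
        "groups": {
            "Protein coding": 1_052, "lncRNAs": 1,
            "Unclassified processed transcripts": 264, "Pseudogenes": 638,
        },
    },
}


def total_matches(name: str) -> bool:
    """True if the group counts of a summary add up to its Havana total."""
    summary = SUMMARIES[name]
    return sum(summary["groups"].values()) == summary["total"]


if __name__ == "__main__":
    for name in SUMMARIES:
        print(name, "consistent:", total_matches(name))
```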
