Annotation Attributes

A standard set of attributes with strictly defined meanings is being added to the Vega annotation. Where present, these are shown on the Gene Summary and Transcript Summary panels.

Transcript level attributes

bicistronic
transcript contains two confidently annotated CDSs. Support may come from e.g. proteomic data, cross-species conservation or published experimental work
CAGE supported TSS
transcript 5' end overlaps ENCODE or Fantom CAGE cluster
dotter confirmed
transcript checked using DOTTER dotplot alignment of homology evidence to genomic sequence to confirm exon structure
inferred exon combination
transcript model contains all possible in-frame exons supported by homology, experimental evidence or conservation, but the exon combination is not directly supported by a single piece of evidence and may not be biological. Used for large genes with repetitive exons (e.g. titin, TTN) to represent all the exons that individual transcript variants can draw from
inferred transcript model
transcript model is not supported by a single piece of transcript evidence. May be supported by multiple fragments of transcript evidence or by combining different evidence sources e.g. protein homology, RNA-seq data, published experimental data
low sequence quality
transcript supported by transcript evidence that, while mapping best-in-genome, shows regions of poor sequence quality
not organism-supported
transcript built from mRNA, EST or protein homology evidence from orthologous loci in other species. Such evidence can be used to build variants on the condition that the homology is perfectly co-linear and all normal splicing rules are upheld
non-submitted evidence
transcript supported by sequence evidence from an as-yet-unpublished experimental study
readthrough
transcript connecting two independent loci, i.e. transcript has exons that overlap exons from transcripts belonging to two or more different loci
retained intron CDS
CDS codes through an internal retained intron (compared to a reference variant)
retained intron final
CDS ends in, or downstream of, a retained intron that, compared to a reference variant, is immediately downstream of the last coding exon
retained intron first
CDS starts in, or upstream of, a retained intron that, compared to a reference variant, is immediately upstream of the first coding exon
RNA-Seq supported only
transcript is either supported in full by RNA-seq data or has a unique splice feature that is supported only by RNA-seq data
RP supported TIS
transcript contains a CDS that has a translation initiation site supported by ribosome profiling data
upstream ATG
an upstream ATG exists, but the ATG for the current CDS has been chosen taking into account factors like cross-species conservation, strength of Kozak sequence, signal peptides, experimental evidence, and ribosome profiling
3' nested supported extension
3' end extended based on RNA-seq data
3' standard supported extension
3' end extended based on RNA-seq data
454 RNA-Seq supported
annotated based on RNA-seq data
5' nested supported extension
5' end extended based on RNA-seq data
5' standard supported extension
5' end extended based on RNA-seq data
RNA-Seq supported only
annotated based on RNA-seq data
RNA-Seq supported partial
annotated based on mixture of RNA-seq data and EST/mRNA/protein evidence
nested 454 RNA-Seq supported
annotated based on RNA-seq data
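
As an illustration of how a positional attribute such as "CAGE supported TSS" could be checked computationally, the following is a minimal Python sketch. It is not the procedure used by the Vega annotators. The `Interval` class, the function names, the 1-based inclusive coordinates and the requirement that the cluster lies on the same strand as the transcript are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Interval:
    """Hypothetical genomic interval; coordinates are 1-based and inclusive."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def five_prime_end(transcript: Interval) -> int:
    """Return the genomic coordinate of the transcript's 5' end.

    On the forward strand this is the lowest coordinate; on the reverse
    strand it is the highest.
    """
    return transcript.start if transcript.strand == "+" else transcript.end


def has_cage_supported_tss(transcript: Interval, clusters: Iterable[Interval]) -> bool:
    """True if the transcript's 5' end falls inside any CAGE cluster.

    Assumption: a cluster only counts when it is on the same chromosome
    and the same strand as the transcript.
    """
    tss = five_prime_end(transcript)
    return any(
        c.chrom == transcript.chrom
        and c.strand == transcript.strand
        and c.start <= tss <= c.end
        for c in clusters
    )


# Made-up example data, for illustration only.
clusters = [
    Interval("chr1", 995, 1010, "+"),
    Interval("chr1", 4990, 5005, "-"),
]
forward_hit = Interval("chr1", 1000, 5000, "+")   # 5' end at 1000
reverse_hit = Interval("chr1", 1000, 5000, "-")   # 5' end at 5000
forward_miss = Interval("chr1", 2000, 5000, "+")  # 5' end at 2000

print(has_cage_supported_tss(forward_hit, clusters))
print(has_cage_supported_tss(reverse_hit, clusters))
print(has_cage_supported_tss(forward_miss, clusters))
```

The strand check matters in this sketch because the 5' end of a reverse-strand transcript is its highest genomic coordinate, so comparing only start positions would test the wrong end.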

Gene level attributes

fragmented locus
locus consists of non-overlapping transcript fragments either because of genome assembly issues (i.e., gaps or mis-assemblies), or because supporting transcripts (e.g., from another species) cannot be completely mapped, or because the supporting transcripts are non-overlapping end pairs (i.e., 5' and 3' ESTs from a single cDNA)
orphan
protein-coding locus with no paralogues or orthologues
overlapping locus
exon(s) of the locus overlap exon(s) of a readthrough transcript or a transcript belonging to another locus
reference genome error
locus overlaps a sequence error or an assembly error in the reference genome that affects its annotation (e.g., 1 or 2 bp insertion/deletion, substitution causing premature stop codon). The main effect is that affected transcripts that would have had a CDS are currently annotated without one
retrogene
protein-coding locus created via retrotransposition
ncRNA host
locus is host to ncRNAs such as piRNA, miRNA, snoRNA, etc.
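
The "overlapping locus" definition above amounts to an exon-level interval overlap test between loci. A minimal Python sketch of such a test follows. It is an illustration under stated assumptions rather than the method used to assign the attribute in Vega: the `Exon` tuple layout, the function names and the locus identifiers are hypothetical, coordinates are taken to be 1-based and inclusive, and strand is ignored for simplicity.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical exon representation: (chromosome, start, end), 1-based inclusive.
Exon = Tuple[str, int, int]


def exons_overlap(a: Exon, b: Exon) -> bool:
    """True if two exons are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def find_overlapping_loci(exons_by_locus: Dict[str, List[Exon]]) -> Set[str]:
    """Return the loci that have an exon overlapping an exon of another locus.

    Every pair of loci is compared, which is quadratic in the number of
    loci; that is acceptable for a sketch but not for a whole genome.
    """
    flagged: Set[str] = set()
    for (locus_a, exons_a), (locus_b, exons_b) in combinations(exons_by_locus.items(), 2):
        if any(exons_overlap(a, b) for a in exons_a for b in exons_b):
            flagged.update((locus_a, locus_b))
    return flagged


# Made-up example data, for illustration only.
loci = {
    "GENE_A": [("chr2", 100, 200), ("chr2", 300, 400)],
    "GENE_B": [("chr2", 380, 500)],  # overlaps the second exon of GENE_A
    "GENE_C": [("chr2", 210, 290)],  # lies within an intron of GENE_A
}

print(sorted(find_overlapping_loci(loci)))
```

In this example `GENE_C` is not flagged because it sits inside an intron of `GENE_A` without touching any of its exons, which matches the exon-based wording of the definition.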