

Gene Catalogs

KEGG GENES is a collection of gene catalogs for all complete genomes (see release history) generated from publicly available resources, mostly NCBI RefSeq and GenBank. The catalogs are subject to SSDB computation and KO assignment (gene annotation) by the KOALA tool. KEGG MGENES is a collection of supplementary gene catalogs for metagenomes, which are given automatic KO assignment by GhostKOALA with GENES used as the reference data set. The collection of viral genomes in RefSeq is also included in KEGG GENES with the standard annotation procedures.

Furthermore, a KEGG original protein sequence database is being developed as the GENES Addendum category. Protein sequences whose functions are experimentally characterized are collected from PubMed references and used to define new KOs that have not been covered by complete genomes (see KO).

Category | DBGET | Remark
KEGG organisms (Complete genomes) | GENES | Complete genomes with KOALA and manual annotations
Viruses | vg | Viral genomes with KOALA and manual annotations
Addendum | ag | PubMed-based collection of functionally characterized proteins
Metagenomes | MGENES | Metagenomes with automatic (GhostKOALA) annotation

Data Source of KEGG GENES

The following table shows the data sources of the KEGG GENES database.

Category | Original DB (1) | Content (2) | Genome identifier | Gene identifier
Eukaryotes | RefSeq | RefSeq release (complete) | T0 numbers (three- or four-letter organism codes) | GeneID
Prokaryotes | RefSeq | NCBI reference genomes | T0 numbers (three- or four-letter organism codes) | Locus_tag
Prokaryotes | GenBank | Other complete genomes | T0 numbers (three- or four-letter organism codes) | Locus_tag
Viruses | RefSeq | RefSeq release (viral) | T40000 (vg); T4 numbers | GeneID
Addendum | KEGG | Functionally characterized proteins | T10000 (ag) | ProteinID

1 Original DB name is shown in the definition field of each GENES entry.
2 RefSeq bimonthly releases are used to update eukaryotes and viruses.
  Prokaryotic genomes are selected from ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/.
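
The genome identifier column can be read as a simple classification rule. The following is a minimal sketch of that rule, not part of KEGG itself: it assumes that a T number is the letter "T" followed by exactly five digits, and the function name `classify_t_number` is hypothetical.

```python
import re

# Assumption: a T number is "T" followed by exactly five digits.
_T_NUMBER = re.compile(r"T(\d{5})")


def classify_t_number(t_number: str) -> str:
    """Return the GENES category suggested by a genome identifier."""
    match = _T_NUMBER.fullmatch(t_number)
    if match is None:
        raise ValueError(f"not a T number: {t_number!r}")
    # Fixed identifiers for the two collection-level categories.
    if t_number == "T10000":
        return "Addendum (ag)"
    if t_number == "T40000":
        return "Viruses (vg)"
    # Otherwise the leading digit distinguishes organisms from viruses.
    leading_digit = match.group(1)[0]
    if leading_digit == "0":
        return "KEGG organism (complete genome)"
    if leading_digit == "4":
        return "Virus"
    return "unknown"
```

For example, `classify_t_number("T00001")` falls in the organism category, while any identifier outside the ranges listed in the table is reported as "unknown" rather than guessed.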


Gene Annotation

The annotation of KEGG GENES involves assignment of KO identifiers (K numbers). Internally, this is done using the KOALA and GFIT annotation tools based on the SSDB database (see: Genome Annotation in KEGG). For outside users, the following services are provided.

Annotate genomes using KEGG GENES
  BlastKOALA: automatic KO assignment by BLASTP search
  GhostKOALA: automatic KO assignment by GHOSTX search
Search similar sequences in KEGG GENES
  BLAST: sequence similarity search by BLAST
  FASTA: sequence similarity search by FASTA

Gene Name Conversion

KEGG GENES entries can be retrieved by giving identifiers from outside databases, such as NCBI-ProteinID (INSDC accession), NCBI-GeneID (Entrez Gene ID) and UniProt accession numbers.
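
Identifier conversion can also be scripted. The sketch below assumes the KEGG REST `conv` operation at `https://rest.kegg.jp`, which this page does not describe, and assumes it returns tab-delimited lines of the form `source_id<TAB>target_id`. The helper names `conv_url`, `parse_conv` and `convert` are hypothetical.

```python
from typing import Dict, List
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: base URL of the KEGG REST service.
KEGG_REST = "https://rest.kegg.jp"


def conv_url(target_db: str, source: str) -> str:
    """Build a conv URL, e.g. target 'genes' and source 'ncbi-geneid:<id>'."""
    return f"{KEGG_REST}/conv/{quote(target_db)}/{quote(source, safe=':+')}"


def parse_conv(text: str) -> Dict[str, List[str]]:
    """Parse tab-delimited conv output into {source_id: [target_id, ...]}."""
    mapping: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        source_id, target_id = line.split("\t")
        mapping.setdefault(source_id, []).append(target_id)
    return mapping


def convert(target_db: str, source: str) -> Dict[str, List[str]]:
    """Fetch and parse a conversion (requires network access)."""
    with urlopen(conv_url(target_db, source)) as response:
        return parse_conv(response.read().decode("utf-8"))
```

A call such as `convert("genes", "uniprot:<accession>")` would then return the matching GENES entries, if any. Parsing is kept separate from the network call so that the mapping logic can be checked offline.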



Last updated: September 15, 2016
KEGG GenomeNet Kanehisa Laboratories