

Molecular building blocks of life in the genomic space

Search for a gene by entering org:gene (example: syn:ssr3451).

Gene Catalogs

KEGG GENES is a collection of gene catalogs for all complete genomes (see release history) generated from publicly available resources, mostly NCBI RefSeq and GenBank. The catalogs are subject to SSDB computation and KO assignment (gene annotation) by the KOALA tool. KEGG DGENES and MGENES are supplementary gene catalogs for draft genomes and metagenomes, which are given automatic KO assignment by BlastKOALA and GhostKOALA, respectively, with GENES used as the reference data set. The collections of viral genomes and plasmids in RefSeq are also included in KEGG GENES and follow the standard annotation procedures.

Furthermore, a KEGG original protein sequence database is being developed as the GENES Addendum category. Protein sequences whose functions are experimentally characterized are collected from PubMed references and used to define new KOs that have not been covered by complete genomes (see KO).

Category          DBGET    Remark
Complete genomes  GENES    Complete genomes with KOALA and manual annotations
Viruses           GENES    Viral genomes with KOALA and manual annotations
Plasmids          GENES    Plasmids with KOALA and manual annotations
Addendum          GENES    PubMed-based collection of functionally characterized proteins
Draft genomes     DGENES   Draft genomes with automatic (BlastKOALA) annotation
Metagenomes       MGENES   Metagenomes with automatic (GhostKOALA) annotation

Data Source of KEGG GENES

The following table shows the data source of the KEGG GENES database.

Category     Original DB (1)  Content (2)                          Genome identifier                                  Gene identifier
Eukaryotes   RefSeq           RefSeq release (complete)            T0 numbers (three- or four-letter organism codes)
Prokaryotes  RefSeq           NCBI reference genomes               T0 numbers (three- or four-letter organism codes)  Locus_tag
Prokaryotes  GenBank          Other complete genomes               T0 numbers (three- or four-letter organism codes)  Locus_tag
Viruses      RefSeq           RefSeq release (viral)               T40000 (vg), T4 numbers
Plasmids     RefSeq           RefSeq release (plasmid)             T20000 (pg)                                        GeneID
Addendum     KEGG             Functionally characterized proteins  T10000 (ag)                                        ProteinID

1 Original DB name is shown in the definition field of each GENES entry.
2 RefSeq bimonthly releases are used to update eukaryotes, viruses and plasmids.
  Prokaryotic genomes are selected from

Gene Annotation

The annotation of KEGG GENES involves assignment of KO identifiers (K numbers). Internally, this is done using the KOALA and GFIT annotation tools based on the SSDB database (see: Genome Annotation in KEGG). The annotation of KEGG DGENES and MGENES is done automatically using the BlastKOALA and GhostKOALA programs, respectively, shown below.

Annotate genomes using KEGG GENES
BlastKOALA: automatic KO assignment by BLASTP search
GhostKOALA: automatic KO assignment by GHOSTX search
Search similar sequences in KEGG GENES
BLAST: sequence similarity search by BLAST
FASTA: sequence similarity search by FASTA

Gene Name Conversion

KEGG GENES can be retrieved by giving identifiers of outside databases, such as NCBI-ProteinID (INSDC accession), NCBI-GeneID (Entrez Gene ID) and UniProt accession numbers.
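
As a rough illustration of this kind of lookup, the Python sketch below builds a conversion request and parses a reply into the org:gene form used for GENES entries. It is not taken from this page. It assumes the KEGG REST service at https://rest.kegg.jp offers a "conv" operation that accepts the prefixes ncbi-proteinid, ncbi-geneid and uniprot and answers with tab-delimited lines. The function names are hypothetical, and the accession P00000 in the usage lines is an invented placeholder, not a real record. No network request is made.

```python
from typing import Dict, List, Tuple
from urllib.parse import quote

# Assumption: base URL of the KEGG REST service (not stated on this page).
KEGG_REST_BASE = "https://rest.kegg.jp"

# Assumption: prefixes the "conv" operation accepts for outside databases.
OUTSIDE_DB_PREFIXES = {"ncbi-proteinid", "ncbi-geneid", "uniprot"}


def build_conv_url(outside_db: str, accession: str) -> str:
    """Build a URL asking which GENES entry matches one outside identifier."""
    if outside_db not in OUTSIDE_DB_PREFIXES:
        raise ValueError(f"unsupported outside database: {outside_db!r}")
    return f"{KEGG_REST_BASE}/conv/genes/{outside_db}:{quote(accession)}"


def parse_conv_response(text: str) -> Dict[str, List[str]]:
    """Parse lines of 'outside_id<TAB>org:gene' into a dictionary of lists."""
    mapping: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        outside_id, kegg_id = line.split("\t")
        mapping.setdefault(outside_id, []).append(kegg_id)
    return mapping


def split_kegg_id(kegg_id: str) -> Tuple[str, str]:
    """Split an 'org:gene' identifier into organism code and gene name."""
    org, sep, gene = kegg_id.partition(":")
    if not sep or not org or not gene:
        raise ValueError(f"not an org:gene identifier: {kegg_id!r}")
    return org, gene


# Usage with an invented reply line (placeholder accession, no network access).
url = build_conv_url("uniprot", "P00000")
sample_reply = "uniprot:P00000\tsyn:ssr3451\n"
converted = parse_conv_response(sample_reply)
org_code, gene_name = split_kegg_id(converted["uniprot:P00000"][0])
```

Keeping URL construction and response parsing separate from any HTTP call means the parsing logic can be checked offline; the actual request could then be made with whatever HTTP client is preferred.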


Last updated: June 1, 2016
KEGG GenomeNet Kanehisa Laboratories