Gene Catalogs

KEGG GENES is a collection of gene catalogs for all complete genomes (see release history), generated from publicly available resources, mostly NCBI RefSeq and GenBank. The catalogs are subject to SSDB computation and KO assignment (gene annotation) by the KOALA tool (see annotation statistics). KEGG MGENES is a collection of supplementary gene catalogs for metagenomes, which are given automatic KO assignment by GhostKOALA with GENES used as the reference data set. The collection of viral genomes in RefSeq is also included in KEGG GENES and follows the standard annotation procedure.

Furthermore, an original KEGG protein sequence database is being developed as the GENES Addendum category. Protein sequences with experimentally characterized functions are collected from PubMed references and used to define new KOs that are not yet covered by complete genomes (see KO).

Category                           DBGET        Remark
KEGG organisms (Complete genomes)  GENES <org>  Complete genomes with KOALA and manual annotations
Viruses                            vg           Viral genomes with KOALA and manual annotations
Addendum                           ag           PubMed-based collection of functionally characterized proteins
Metagenomes                        MGENES       Metagenomes with automatic (GhostKOALA) annotation

<org> three- or four-letter organism code

Data Source of KEGG GENES

The following table shows the data sources of the KEGG GENES database.

Category     Original DB (1)  Content (2)                          Genome identifier                                  Gene identifier
Eukaryotes   RefSeq           RefSeq release (complete)            T0 numbers (three- or four-letter organism codes)
Prokaryotes  RefSeq           NCBI reference genomes                                                                  Locus_tag
             GenBank          Other complete genomes                                                                  Locus_tag
Viruses      RefSeq           RefSeq release (viral)               T40000 (vg), T4 numbers
Addendum     KEGG             Functionally characterized proteins  T10000 (ag)                                        ProteinID

1 The original DB name is shown in the definition field of each GENES entry.
2 RefSeq bimonthly releases are used to update eukaryotes and viruses.
  Prokaryotic genomes are selected from
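
The identifier conventions in the two tables above can be sketched in code. The following is a minimal illustration, not part of KEGG itself: the function names are hypothetical, and it assumes that a T number is the letter T followed by five digits and that organism codes consist of lowercase letters only.

```python
import re

# Collection-level identifiers listed in the tables above.
COLLECTION_T_NUMBERS = {"T10000": "Addendum (ag)", "T40000": "Viruses (vg)"}
COLLECTION_PREFIXES = {"ag": "Addendum", "vg": "Viruses"}


def classify_t_number(t_number: str) -> str:
    """Map a genome identifier (T number) to its GENES category."""
    if t_number in COLLECTION_T_NUMBERS:
        return COLLECTION_T_NUMBERS[t_number]
    # Assumption: a T number is "T" followed by exactly five digits.
    if not re.fullmatch(r"T\d{5}", t_number):
        raise ValueError(f"not a T number: {t_number!r}")
    if t_number[1] == "0":
        return "KEGG organism (complete genome)"
    if t_number[1] == "4":
        return "Virus (individual genome)"
    # Other T number ranges are not described in the tables above.
    return "Other"


def classify_dbget_prefix(prefix: str) -> str:
    """Map a DBGET database name (<org>, vg, ag) to its GENES category."""
    if prefix in COLLECTION_PREFIXES:
        return COLLECTION_PREFIXES[prefix]
    # Assumption: organism codes are three or four lowercase letters.
    if re.fullmatch(r"[a-z]{3,4}", prefix):
        return "KEGG organism (complete genome)"
    raise ValueError(f"unrecognized DBGET prefix: {prefix!r}")
```

For example, `classify_dbget_prefix("hsa")` falls into the organism category, while `classify_t_number("T40000")` is recognized as the viral collection rather than an individual virus genome.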

Gene Annotation

The annotation of KEGG GENES involves assignment of KO identifiers (K numbers). Internally, this is done using the KOALA and GFIT annotation tools based on the SSDB database (see: Genome Annotation in KEGG). For outside users, the following services are provided.

Annotate genomes using KEGG GENES
  BlastKOALA: automatic KO assignment by BLASTP search
  GhostKOALA: automatic KO assignment by GHOSTX search
Search similar sequences in KEGG GENES
  BLAST: sequence similarity search by BLAST
  FASTA: sequence similarity search by FASTA
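
The result of an automatic KO assignment is, in essence, a mapping from each query gene to at most one K number. The sketch below summarizes such a mapping. It assumes a tab-separated text input with the gene identifier in the first column and an optional K number in the second; this layout is an assumption for illustration and is not specified on this page. The function names and the gene names in the usage note are hypothetical.

```python
import re
from typing import Dict, Optional

# A K number is the letter K followed by five digits.
K_NUMBER = re.compile(r"K\d{5}")


def parse_ko_assignments(text: str) -> Dict[str, Optional[str]]:
    """Parse 'gene<TAB>K number' lines; genes without a K number map to None."""
    assignments: Dict[str, Optional[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        gene = fields[0].strip()
        ko = fields[1].strip() if len(fields) > 1 else ""
        assignments[gene] = ko if K_NUMBER.fullmatch(ko) else None
    return assignments


def annotation_rate(assignments: Dict[str, Optional[str]]) -> float:
    """Fraction of genes that received a KO assignment."""
    if not assignments:
        return 0.0
    annotated = sum(1 for ko in assignments.values() if ko is not None)
    return annotated / len(assignments)
```

Given the illustrative input `"gene1\tK00001\ngene2\ngene3\tK00002\n"`, two of the three genes carry a K number, so `annotation_rate` reports two thirds.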

Gene Identifier Conversion

KEGG GENES entries can be retrieved using identifiers from outside databases, such as NCBI-ProteinID (INSDC accession), NCBI-GeneID (Entrez Gene ID) and UniProt accession numbers.
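
Identifier conversion can also be done programmatically. The sketch below assumes the `conv` operation of the KEGG REST API at `https://rest.kegg.jp`, which is not described on this page, and assumes that its response is tab-separated text with the outside identifier in the first column and the KEGG GENES identifier in the second. The function names are hypothetical, and no network request is made here; fetching the URL is left to the caller.

```python
from typing import List, Tuple

# Assumption: base URL of the public KEGG REST service.
KEGG_REST_BASE = "https://rest.kegg.jp"


def build_conv_url(target_db: str, source: str) -> str:
    """Build a conv URL, e.g. target_db='genes', source='ncbi-geneid:10458'."""
    return f"{KEGG_REST_BASE}/conv/{target_db}/{source}"


def parse_conv_response(text: str) -> List[Tuple[str, str]]:
    """Parse tab-separated (outside ID, KEGG GENES ID) pairs from a response body."""
    pairs: List[Tuple[str, str]] = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pairs.append((fields[0].strip(), fields[1].strip()))
    return pairs
```

A caller would pass an outside identifier with its database prefix, such as `ncbi-geneid:`, `ncbi-proteinid:` or `uniprot:`, fetch the resulting URL with any HTTP client, and hand the response body to `parse_conv_response`.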

Last updated: March 3, 2018
KEGG GenomeNet Kanehisa Laboratories