Gene Catalogs

KEGG GENES is a collection of gene catalogs for all complete genomes (see release history), generated from publicly available resources, mostly NCBI RefSeq and GenBank. The catalogs undergo SSDB computation and KO assignment (gene annotation) by the KOALA tool (see annotation statistics). KEGG MGENES is a collection of supplementary gene catalogs for metagenomes; these receive automatic KO assignment by GhostKOALA, with GENES used as the reference data set. Viral genomes from RefSeq are also included in KEGG GENES and are annotated with the standard procedures.

In addition, an original KEGG protein sequence database is being developed as the GENES Addendum category. Protein sequences whose functions have been experimentally characterized are collected from PubMed references and used to define new KOs that are not yet covered by complete genomes (see KO).

Category                             DBGET    Remark
KEGG organisms (complete genomes)
  GENES                              <org>    Complete genomes with KOALA and manual annotations
  Viruses                            vg       Viral genomes with KOALA and manual annotations
  Addendum                           ag       PubMed-based collection of functionally characterized proteins
Metagenomes
  MGENES                                      Metagenomes with automatic (GhostKOALA) annotation

<org>: three- or four-letter organism code (e.g., hsa for Homo sapiens)
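Entries in these catalogs are referred to in DBGET as "<prefix>:<gene>", where the prefix is an organism code or one of the category codes vg and ag from the table above. The sketch below splits such an identifier and reports which catalog it belongs to. It is a minimal illustration under stated assumptions: the names parse_genes_id, catalog_of and GenesId are hypothetical, and the prefix pattern (three or four lowercase letters, or vg/ag) is inferred from this page rather than taken from an official KEGG grammar.

```python
import re
from typing import NamedTuple

# Assumption: a valid prefix is a three- or four-letter lowercase organism
# code, or one of the DBGET category codes "vg" (viruses) / "ag" (addendum).
_PREFIX = re.compile(r"^(?:[a-z]{3,4}|vg|ag)$")


class GenesId(NamedTuple):
    prefix: str  # organism code, "vg" or "ag"
    gene: str    # gene identifier within that catalog


def parse_genes_id(text: str) -> GenesId:
    """Split an identifier such as 'hsa:7157' at the first colon."""
    prefix, sep, gene = text.strip().partition(":")
    if not sep or not gene or not _PREFIX.match(prefix):
        raise ValueError(f"not a KEGG GENES identifier: {text!r}")
    return GenesId(prefix, gene)


def catalog_of(gid: GenesId) -> str:
    """Map a parsed identifier to the catalog names used in the table."""
    if gid.prefix == "vg":
        return "Viruses"
    if gid.prefix == "ag":
        return "Addendum"
    return "GENES"
```

For example, parse_genes_id("hsa:7157") yields the prefix "hsa" and the gene part "7157", which catalog_of places in GENES. The check is deliberately loose: it validates only the shape of the prefix, not whether the organism code actually exists in KEGG.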

Data Source of KEGG GENES

The following table shows the data source of the KEGG GENES database.

Category     Content                      Original DB (1)                              Genome identifier                                   Gene identifier
Eukaryotes   Complete genomes             RefSeq (some from GenBank)                   T0 numbers (three- or four-letter organism codes)
Prokaryotes  Complete genomes             GenBank (some from RefSeq)                   T0 numbers (three- or four-letter organism codes)   Locus_tag
Viruses      Complete viral genomes       RefSeq release (viral)                       T40000 (vg); T4 numbers
Addendum     Functionally characterized   GenBank (GenPept) records selected by KEGG   T10000 (ag)                                         ProteinID

(1) The name of the original DB is shown in the definition field of each GENES entry.
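The genome identifiers in the table follow a simple numbering pattern, so a T number can be roughly classified by its leading digits. The sketch below encodes only what the table states (T0 numbers for KEGG organisms, T40000 and T4 numbers for viruses, T10000 for the Addendum). The function name genome_category is hypothetical, and the assumption that every T number is "T" followed by five digits comes from the examples T10000 and T40000, not from an official specification. Anything outside those ranges is reported as "unknown" rather than guessed.

```python
def genome_category(t_number: str) -> str:
    """Classify a KEGG genome identifier (T number) per the table above.

    Simplified reading of the table, not an official KEGG rule:
      T10000  -> Addendum (ag)
      T4xxxx  -> Viruses (T40000 is the vg collection itself)
      T0xxxx  -> KEGG organisms (eukaryotes and prokaryotes)
    """
    t = t_number.strip().upper()
    # Assumption: "T" followed by exactly five digits.
    if len(t) != 6 or t[0] != "T" or not t[1:].isdigit():
        raise ValueError(f"not a T number: {t_number!r}")
    if t == "T10000":
        return "Addendum"
    if t[1] == "4":
        return "Viruses"
    if t[1] == "0":
        return "KEGG organisms"
    return "unknown"  # ranges not described in the table
```

Note that the prefix alone cannot separate eukaryotes from prokaryotes, since both use T0 numbers; that distinction has to come from the genome entry itself.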

Gene Annotation

The annotation of KEGG GENES consists of assigning KO identifiers (K numbers). Internally, this is done with the KOALA/KoAnn and GFIT annotation tools (see KO Database). External users can use the following services.

Annotate genomes using KEGG GENES
  BlastKOALA: automatic KO assignment by BLASTP search
  GhostKOALA: automatic KO assignment by GHOSTX search
Search for similar sequences in KEGG GENES
  BLAST: sequence similarity search by BLAST
  FASTA: sequence similarity search by FASTA
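The core idea behind similarity-based KO assignment is to transfer the K number of a well-scoring hit in an annotated reference set to the query. The sketch below shows the simplest form of that idea, best-hit transfer with a bit-score cutoff. It is not the KOALA algorithm, which uses its own scoring scheme; the function best_hit_ko, the (query, subject, bit score) hit tuples and the 60-bit default cutoff are all hypothetical choices made for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# One similarity hit: (query id, subject GENES id, bit score).
Hit = Tuple[str, str, float]


def best_hit_ko(
    hits: Iterable[Hit],
    subject_ko: Dict[str, str],
    min_bitscore: float = 60.0,  # hypothetical cutoff, not a KOALA parameter
) -> Dict[str, Optional[str]]:
    """Give each query the K number of its best-scoring annotated hit.

    Queries whose hits are all below the cutoff, or whose hits all lack
    a K number, are reported with None (no assignment).
    """
    best: Dict[str, Tuple[float, Optional[str]]] = {}
    for query, subject, score in hits:
        ko = subject_ko.get(subject)  # None if the subject has no KO
        if ko is None or score < min_bitscore:
            # Record the query so it still appears in the output.
            best.setdefault(query, (float("-inf"), None))
            continue
        # Keep the highest-scoring annotated hit; ties keep the first seen.
        if score > best.get(query, (float("-inf"), None))[0]:
            best[query] = (score, ko)
    return {query: ko for query, (_, ko) in best.items()}
```

Using only the single best hit is easy to follow but sensitive to multidomain proteins and paralogs, which is one reason dedicated tools such as BlastKOALA and GhostKOALA apply more elaborate scoring.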

Gene Identifier Conversion

KEGG GENES entries can be retrieved using identifiers from outside databases, such as NCBI-ProteinID (INSDC accession), NCBI-GeneID (Entrez Gene ID) and UniProt accession numbers.
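Programmatically, this kind of identifier conversion is offered by the "conv" operation of the KEGG REST API, which returns tab-separated "source<TAB>target" lines. The sketch below only builds the request URL and parses a response body, so it runs offline. The helper names conv_url and parse_conv are hypothetical, and the target database names used in the examples ("genes", "ncbi-geneid") reflect our reading of the KEGG API documentation; check that documentation before relying on them.

```python
from typing import Dict, List
from urllib.parse import quote

KEGG_REST = "https://rest.kegg.jp"  # public KEGG REST endpoint


def conv_url(target_db: str, ids: List[str]) -> str:
    """Build a 'conv' URL; multiple identifiers are joined with '+'."""
    joined = "+".join(quote(i, safe=":") for i in ids)
    return f"{KEGG_REST}/conv/{target_db}/{joined}"


def parse_conv(body: str) -> Dict[str, List[str]]:
    """Parse 'source<TAB>target' lines into source -> list of targets."""
    mapping: Dict[str, List[str]] = {}
    for line in body.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        source, target = line.split("\t", 1)
        mapping.setdefault(source, []).append(target)
    return mapping
```

A live request could then be made with urllib.request.urlopen(conv_url("genes", ["uniprot:P04637"])), decoding the bytes before passing them to parse_conv. A list is kept per source because one outside identifier may map to several KEGG GENES entries.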


Last updated: January 1, 2021
KEGG GenomeNet Kanehisa Laboratories