
KEGG ORTHOLOGY (KO) Database

Linking genomes to pathways by ortholog annotation


KEGG Orthology (KO) System

The KEGG pathway maps, BRITE functional hierarchies, and KEGG modules are developed by capturing experimental knowledge from the published literature. At the same time, this knowledge is extended from experimentally studied organisms to other organisms through the KEGG Orthology (KO) system. The KO system is a collection of manually defined ortholog groups (KO entries), which are categorized under the hierarchies of KEGG pathways and BRITE ontologies. Using the genome annotation procedure described below, genes in the KEGG GENES database are assigned KO identifiers, known as K numbers.
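Because KO entries are categorized under KEGG pathways, the relation between K numbers and pathways can also be retrieved programmatically. The following is a minimal Python sketch, assuming the KEGG REST API `link` operation at `https://rest.kegg.jp/link/pathway/...` and its tab-separated `ko:Kxxxxx<TAB>path:...` output format. `parse_link` and `fetch_ko_pathways` are hypothetical helper names, not part of any KEGG library.

```python
from collections import defaultdict
from urllib.request import urlopen

# Assumed base URL of the KEGG REST API.
KEGG_REST = "https://rest.kegg.jp"


def parse_link(text: str) -> dict[str, set[str]]:
    """Parse KEGG REST `link` output ("ko:K...<TAB>path:...") into {K number: {pathway IDs}}."""
    links: defaultdict[str, set[str]] = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        source, target = line.split("\t", 1)
        # Drop the database prefixes so keys are bare K numbers and values bare pathway IDs.
        links[source.removeprefix("ko:")].add(target.removeprefix("path:"))
    return dict(links)


def fetch_ko_pathways(k_numbers: list[str]) -> dict[str, set[str]]:
    """Query the (assumed) /link/pathway/<entries> endpoint; requires network access."""
    entries = "+".join(f"ko:{k}" for k in k_numbers)
    with urlopen(f"{KEGG_REST}/link/pathway/{entries}", timeout=30) as resp:
        return parse_link(resp.read().decode("utf-8"))
```

For example, `fetch_ko_pathways(["K00161", "K00162"])` would return a dictionary keyed by K number. The result may contain both reference (`map`) and KO-based (`ko`) pathway identifiers, so filter by prefix if only one kind is wanted.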
KEGG Orthology (KO)
A unique aspect of the KEGG resource is that functional information is associated with ortholog groups. Sequence-similarity-based inference, which generalizes a limited amount of experimental evidence to other genes, is predefined in KEGG. As implemented in BlastKOALA, a sequence similarity search against KEGG GENES is a search for the most appropriate K numbers. Once K numbers are assigned to the genes in a genome, the KEGG pathway maps, BRITE hierarchies, and KEGG modules are automatically reconstructed, enabling biological interpretation of higher-level functions (see KEGG mapping for details).
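The reconstruction step can be illustrated with a toy example. This sketch assumes that each gene carries a single K number and that pathway membership is supplied as a plain mapping from a pathway ID to its set of K numbers. It reports which genes of the genome fall into each pathway and what fraction of the pathway's K numbers the genome covers. It is not KEGG's own mapping code, and `reconstruct_pathways` is a hypothetical name.

```python
from collections import defaultdict


def reconstruct_pathways(gene_to_ko: dict[str, str],
                         pathway_to_kos: dict[str, set[str]]) -> dict[str, dict]:
    """For each pathway hit by the genome, list its genes and the fraction of its K numbers present."""
    # Invert the annotation: one K number may be carried by several genes (e.g. paralogs).
    ko_to_genes: defaultdict[str, set[str]] = defaultdict(set)
    for gene, ko in gene_to_ko.items():
        ko_to_genes[ko].add(gene)

    result = {}
    for pathway, kos in pathway_to_kos.items():
        found = kos & set(ko_to_genes)
        if not found:
            continue  # pathway not represented in this genome
        result[pathway] = {
            "genes": sorted(g for ko in found for g in ko_to_genes[ko]),
            "coverage": len(found) / len(kos),
        }
    return result
```

A coverage well below 1.0 can point to a missing enzyme or to a gene that has not yet received a K number, which is the kind of gap a completeness check is meant to surface.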


Functionally Characterized Sequence Data

New efforts have been initiated to associate each KO entry with experimental evidence in the form of functionally characterized sequence data, which is now shown in the SEQUENCE subfield of the REFERENCE field.

Genome Annotation in KEGG

Genome annotation in KEGG is essentially cross-species annotation: K numbers are given to orthologous genes in all available genomes. It is currently done as follows.
  1. Experimental evidence on known functions of genes and proteins is organized in the KO database, which is created together with the KEGG PATHWAY, KEGG BRITE, and KEGG MODULE databases.
  2. Gene catalogs of complete genomes are generated from RefSeq and other public resources, and stored in the KEGG GENES database.
  3. Sequence similarity scores and best hit relations are computed from GENES by pairwise genome comparisons using SSEARCH, and stored in the KEGG SSDB database.
  4. For each gene in a genome, a GFIT (Gene Function Identification Tool) table is created, detailing the best-hit genes, including paralogs, in all other genomes.
  5. In the past, annotators used GFIT tables to assign K numbers manually with the GFIT tool, which is integrated with other tools, including the gene cluster tool for checking the consistency of operon-like structures and the ortholog table for checking the completeness of pathway modules and complexes.
  6. The KOALA (KEGG Orthology And Links Annotation) tool was developed in 2008 to computerize KEGG annotators' knowledge of how to use GFIT tables. KOALA processes all GFIT tables at once and makes computational K number assignments.
  7. GFIT tables are continuously updated, and KOALA's computational assignments are automatically applied for a selected set of well-curated K numbers (about 80%) in newly determined genomes, as well as in existing genomes that meet various other criteria.
  8. KOALA's computational assignments are repeated every two to three days, and a summary of discrepancies between its assignments and the current annotations is presented. Annotators examine these discrepancies using the manual versions of the KOALA and GFIT tools.
  9. Annotation results can be mapped to KEGG pathways, BRITE hierarchies, and KEGG modules to infer the systemic functions of individual organisms, groups of organisms (e.g., pangenomes), and combinations of organisms (e.g., host-pathogen and human-microbiome relationships).
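Step 3 records best-hit relations between genomes. A common way to express such a relation is the reciprocal (bidirectional) best hit: genes a and b are paired when each is the other's top-scoring hit. The sketch below assumes the SSEARCH scores have already been computed and are supplied as `(query, subject, score)` tuples. It is a generic illustration, and the exact best-hit definitions stored in SSDB may differ.

```python
def best_hits(hits: list[tuple[str, str, float]]) -> dict[str, str]:
    """Keep the highest-scoring subject for each query (the first one wins on ties)."""
    best: dict[str, tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: list[tuple[str, str, float]],
                         b_vs_a: list[tuple[str, str, float]]) -> list[tuple[str, str]]:
    """Return (a, b) pairs where b is a's best hit in genome B and a is b's best hit in genome A."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)
```

Running both search directions matters because similarity searches are not symmetric in practice: a gene's top hit in another genome need not point back to it, for example when paralogs are present.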
The read-only version of KEGG annotation tools is available for public view.
  • KOALA - linked from each KO entry page
  • GFIT - linked from each KEGG GENES entry page

Last updated: September 1, 2014
KEGG GenomeNet Kanehisa Laboratories