KEGG Mapping

KEGG mapping as a set operation

KEGG mapping is the process of mapping molecular objects (genes, proteins, small molecules, etc.) to molecular network objects (KEGG pathway maps, BRITE hierarchies and KEGG modules). It is not simply an enrichment process; rather, it is a set operation that generates a new set. From the beginning of the KEGG project, the basic idea was to automatically generate organism-specific pathways by a set operation between manually created pathway maps (called reference pathway maps) and the annotated set of genes in a genome. Genes were originally (until 1999) annotated with EC numbers and, since 2000, have been annotated with KO (KEGG Orthology) identifiers, or K numbers. The KEGG mapping set operation has thus helped expand the KEGG knowledge base by converting KOs to individual gene IDs in many organisms (KO expansion). The same procedure has also been used to help integrate and interpret users' datasets, especially large-scale datasets generated by high-throughput technologies (see: KEGG Mapper tools).
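The set operation described above can be sketched in a few lines of Python. This is a minimal illustration, not KEGG's implementation: it assumes a reference pathway reduced to a plain set of K numbers (taken from the glycolysis BRITE sample later on this page) and a hypothetical genome annotation in which each placeholder gene ID carries exactly one K number.

```python
# Minimal sketch of KEGG mapping as a set operation (illustrative, not KEGG's code).
# Assumption: each gene carries exactly one K number; real annotations may differ.

# K numbers in a reference pathway map (taken from the htext sample on this page).
reference_pathway_kos = {"K00844", "K12407", "K00845"}

# Hypothetical genome annotation: placeholder gene ID -> K number.
genome_annotation = {
    "org:gene1": "K00844",
    "org:gene2": "K12407",
    "org:gene3": "K99999",  # placeholder KO that is not in this pathway
}


def map_genes_to_pathway(pathway_kos, annotation):
    """Generate the new, organism-specific set: genes whose KO is in the pathway."""
    return {gene for gene, ko in annotation.items() if ko in pathway_kos}


def shared_kos(pathway_kos, annotation):
    """Intersection of the pathway's KOs with the KOs found in the genome."""
    return pathway_kos & set(annotation.values())


print(sorted(map_genes_to_pathway(reference_pathway_kos, genome_annotation)))
# -> ['org:gene1', 'org:gene2']
print(sorted(shared_kos(reference_pathway_kos, genome_annotation)))
# -> ['K00844', 'K12407']
```

The result is a new set of gene IDs rather than a score, which is the distinction the paragraph draws between KEGG mapping and enrichment.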

This page explains the network objects of pathway maps and BRITE hierarchies.

KEGG Pathway Maps

Graphical map objects

The KEGG pathway map is a molecular interaction/reaction network diagram represented in terms of KO functional orthologs, so that experimental evidence in specific organisms can be generalized to other organisms. Each map is manually drawn with in-house software called KegSketch, which generates a KGML+ file, an XML file with SVG graphics data. The basic graphics objects in regular reference pathway maps are:

boxes - KOs identified by K numbers and enzymatic reactions identified by R numbers
circles - other molecules, mostly chemical substances identified by C/G/D numbers, but including DNAs, peptides, etc. that are not subject to KO expansion
lines - non-enzymatic reactions identified by R numbers

and in global and overview reference pathway maps:

lines - KOs identified by K numbers and reactions identified by R numbers
circles - chemical substances identified by C/G numbers

In computationally generated organism-specific pathway maps, KOs are expanded to individual gene identifiers (see below).
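KO expansion for a single map can be pictured as replacing the K numbers attached to each box with the genes that carry them in one organism. In this sketch the box IDs, the gene IDs, and the K number K99998 are hypothetical placeholders; a box left with no genes would correspond to an uncolored box in the organism-specific map.

```python
# Minimal sketch of KO expansion for one map (illustrative, not KEGG's code).
# Box IDs, gene IDs, and K99998 are hypothetical placeholders.

# Reference map: each box is linked to one or more K numbers.
reference_boxes = {
    "box1": {"K00844", "K12407", "K00845"},
    "box2": {"K99998"},
}

# Hypothetical organism annotation: K number -> genes carrying that KO.
genes_by_ko = {
    "K00844": ["org:geneA", "org:geneB"],
    "K12407": ["org:geneC"],
}


def expand_boxes(boxes, genes_by_ko):
    """Replace each box's K numbers with the organism's gene IDs."""
    return {
        box: sorted(gene for ko in kos for gene in genes_by_ko.get(ko, []))
        for box, kos in boxes.items()
    }


organism_map = expand_boxes(reference_boxes, genes_by_ko)
print(organism_map)
# -> {'box1': ['org:geneA', 'org:geneB', 'org:geneC'], 'box2': []}
```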

Convention of map number prefix

Each pathway map is identified by a 2-4 letter prefix followed by a 5-digit number (see KEGG Identifiers). The prefixes have the following meanings:

map - Reference pathway
ko - Reference pathway (KO)
ec - Reference pathway (EC)
rn - Reference pathway (Reaction)
org - Organism-specific pathway map (org stands for the KEGG organism code, e.g. hsa)

Only the "map" reference pathway is manually drawn; all other maps are computationally generated from it. For metabolic pathways, each box (or line) in the "map" reference pathway is linked to the K number (KO identifier), the EC number, and the R number (reaction identifier), whereas the KO, EC, and reaction maps are each linked to only one of these identifier types. For all metabolic and non-metabolic maps, K numbers are converted to gene identifiers in each organism to generate organism-specific pathways.
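The prefix convention can be sketched with a regular expression that splits an ID into its prefix and number. The sketch assumes that any prefix other than map, ko, ec, or rn is an organism code such as hsa, and it shows that all variants of a map share the same 5-digit number.

```python
import re

# Sketch: split a pathway ID into prefix and 5-digit number.
# Assumption: any prefix other than map/ko/ec/rn is a KEGG organism code.
REFERENCE_PREFIXES = {
    "map": "Reference pathway",
    "ko": "Reference pathway (KO)",
    "ec": "Reference pathway (EC)",
    "rn": "Reference pathway (Reaction)",
}
_PATHWAY_ID = re.compile(r"^([a-z]{2,4})(\d{5})$")


def parse_pathway_id(pathway_id):
    """Return (prefix, number, description) for IDs such as 'hsa00010'."""
    match = _PATHWAY_ID.match(pathway_id)
    if match is None:
        raise ValueError(f"not a pathway ID: {pathway_id!r}")
    prefix, number = match.groups()
    description = REFERENCE_PREFIXES.get(prefix, "Organism-specific pathway map")
    return prefix, number, description


def related_maps(pathway_id, org_code):
    """List the reference and organism-specific variants sharing one map number."""
    _, number, _ = parse_pathway_id(pathway_id)
    return [prefix + number for prefix in [*REFERENCE_PREFIXES, org_code]]


print(parse_pathway_id("hsa00010"))
# -> ('hsa', '00010', 'Organism-specific pathway map')
print(related_maps("map00010", "hsa"))
# -> ['map00010', 'ko00010', 'ec00010', 'rn00010', 'hsa00010']
```

Following the paragraph above, the ec and rn variants are only meaningful for metabolic maps; the sketch does not check this.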

[Example maps: map00010, ko00010, hsa00010]

As shown in these examples, "map" pathways are not colored, "ko/ec/rn" pathways are colored blue, and organism-specific pathways are colored green; coloring indicates that a map object exists and is linked to corresponding entries.

For global metabolism maps, the convention is reversed: "map" pathways are fully colored, and "ko/ec/rn" pathways and organism-specific pathways are generated by removing the coloring from objects whose corresponding entries are absent.
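The two coloring conventions can be contrasted as two directions of the same operation: regular maps start uncolored and gain color, while global maps start fully colored and lose it. The sketch uses hypothetical object IDs and is not KEGG's rendering code; in both cases the final colored set is the objects that have corresponding entries.

```python
# Sketch of the two coloring conventions (object IDs are hypothetical).


def color_regular_map(all_objects, objects_with_entries):
    """Regular maps: start uncolored and color objects linked to entries."""
    colored = set()
    for obj in all_objects:
        if obj in objects_with_entries:
            colored.add(obj)
    return colored


def color_global_map(all_objects, objects_with_entries):
    """Global maps: start fully colored and remove color where entries are absent."""
    colored = set(all_objects)
    for obj in all_objects:
        if obj not in objects_with_entries:
            colored.discard(obj)
    return colored


objects = ["obj1", "obj2", "obj3"]
present = {"obj1", "obj3"}
print(sorted(color_regular_map(objects, present)))  # -> ['obj1', 'obj3']
print(sorted(color_global_map(objects, present)))   # -> ['obj1', 'obj3']
```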

About KGML files

KGML (KEGG Markup Language) is an exchange format for KEGG pathway maps. It is intended for outside users and is not used in any service or database update procedure within KEGG. KGML files are computationally generated from the manually created KGML+ files and contain information about entries (KEGG objects) and two types of relationships:

relations - relationships between boxes
reactions - relationships between circles
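Because KGML is plain XML, it can be read with Python's standard xml.etree.ElementTree module. The snippet below is hand-written and heavily simplified: it uses pathway, entry, relation, and reaction elements with placeholder names and is not an excerpt of a real KGML file, which carries more elements and attributes.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified KGML; all names are placeholders, not real KEGG data.
KGML_SNIPPET = """
<pathway name="path:org00010" org="org" number="00010">
  <entry id="1" name="org:gene1" type="gene"/>
  <entry id="2" name="org:gene2" type="gene"/>
  <entry id="3" name="cpd:compound1" type="compound"/>
  <entry id="4" name="cpd:compound2" type="compound"/>
  <relation entry1="1" entry2="2" type="ECrel"/>
  <reaction id="1" name="rn:reaction1" type="irreversible">
    <substrate id="3" name="cpd:compound1"/>
    <product id="4" name="cpd:compound2"/>
  </reaction>
</pathway>
""".strip()

root = ET.fromstring(KGML_SNIPPET)

# Entries: KEGG objects keyed by their entry id within this file.
entries = {e.get("id"): (e.get("name"), e.get("type")) for e in root.findall("entry")}

# Relations: links between gene-product entries (boxes).
relations = [(r.get("entry1"), r.get("entry2"), r.get("type")) for r in root.findall("relation")]

# Reactions: links from substrate compounds to product compounds (circles).
reactions = [
    (
        rx.get("name"),
        [s.get("name") for s in rx.findall("substrate")],
        [p.get("name") for p in rx.findall("product")],
    )
    for rx in root.findall("reaction")
]

print(relations)  # -> [('1', '2', 'ECrel')]
print(reactions)  # -> [('rn:reaction1', ['cpd:compound1'], ['cpd:compound2'])]
```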

BRITE Functional Hierarchies

BRITE hierarchy files

The KEGG BRITE database is a collection of BRITE hierarchy files, called htext (hierarchical text) files, with additional files for binary relations. The htext file is manually created with in-house software called KegHierEditor. Each line of an htext file begins with a letter ("A", "B", "C", etc.) in the first column that indicates its hierarchy level.

A Metabolism
B   Carbohydrate Metabolism
C     00010 Glycolysis / Gluconeogenesis [PATH:ko00010]
D       K00844  HK; hexokinase [EC:2.7.1.1]
D       K12407  GCK; glucokinase [EC:2.7.1.2]
D       K00845  glk; glucokinase [EC:2.7.1.2]
D       ......
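Because the level letter sits in the first column, an htext file can be turned into a tree with a simple stack. This sketch assumes that only lines beginning with an uppercase letter carry hierarchy entries and skips everything else; real htext files contain additional header and formatting lines that a complete parser would need to handle.

```python
# Minimal htext parser sketch: uses the level letter in the first column.
# Assumption: only lines starting with an uppercase letter are hierarchy entries.

HTEXT_SAMPLE = """\
A Metabolism
B   Carbohydrate Metabolism
C     00010 Glycolysis / Gluconeogenesis [PATH:ko00010]
D       K00844  HK; hexokinase [EC:2.7.1.1]
D       K12407  GCK; glucokinase [EC:2.7.1.2]
D       K00845  glk; glucokinase [EC:2.7.1.2]
"""


def parse_htext(text):
    """Return (level, label, ancestors) tuples in file order."""
    stack = []  # labels of the current ancestors, one per level
    rows = []
    for line in text.splitlines():
        if not line or not ("A" <= line[0] <= "Z"):
            continue  # skip blank lines and lines without a level letter
        depth = ord(line[0]) - ord("A")  # A -> 0, B -> 1, C -> 2, ...
        label = line[1:].strip()
        del stack[depth:]  # drop siblings and deeper levels
        stack.append(label)
        rows.append((line[0], label, stack[:-1]))
    return rows


for level, label, ancestors in parse_htext(HTEXT_SAMPLE):
    if level == "D":
        # The first token of a D-level label is the K number.
        print(label.split()[0], "<-", ancestors[-1])
```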

Each BRITE hierarchy file represents a classification system of KEGG objects identified by the KEGG Identifiers; for example, pathway-based gene classification or protein family classification by the K numbers, compound classification by C numbers, drug classification by D numbers, and disease classification by H numbers.

The binary relation files contain relationships between KEGG objects and attributes, which can be dynamically added to the hierarchy file as additional columns using the join feature of the BRITE hierarchy viewer. Many binary relation files are computationally generated from KEGG database contents and are shown in the left panel of the BRITE hierarchy viewer.
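The join feature can be thought of as a left join from hierarchy leaves to a binary relation. The sketch below is an assumption about the data model, not the viewer's implementation: a binary relation is taken to be a list of (object, attribute) pairs, the EC numbers come from the htext sample above, and an object with several attributes gets them combined into one cell.

```python
from collections import defaultdict

# Sketch of the join feature as a left join (data model is an assumption).
leaves = ["K00844", "K12407", "K00845"]  # leaf objects of a hierarchy

# Binary relation as (object, attribute) pairs; EC numbers from the htext sample.
ec_relation = [
    ("K00844", "2.7.1.1"),
    ("K12407", "2.7.1.2"),
    ("K00845", "2.7.1.2"),
]


def join_column(leaves, pairs, sep="; "):
    """Attach one extra column to each leaf; leaves without a match get ''."""
    by_object = defaultdict(list)
    for obj, attr in pairs:
        by_object[obj].append(attr)
    return [(leaf, sep.join(by_object.get(leaf, []))) for leaf in leaves]


for leaf, value in join_column(leaves, ec_relation):
    print(leaf, value)
```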

Convention of brite number prefix

Each BRITE hierarchy file is identified by a 2-4 letter prefix followed by a 5-digit number (see KEGG Identifiers). The prefixes have the following meanings:

ko - Reference hierarchy (KO)
org - Organism-specific hierarchy (org stands for the KEGG organism code, e.g. hsa)
br - Non-KO hierarchy
jp - Non-KO hierarchy in Japanese

Thus, "ko" hierarchy files are manually created for the functional classification of genes and proteins using K numbers. Organism-specific hierarchy files are then computationally generated by converting K numbers to gene identifiers in each organism. "br" hierarchy files are created for the functional classification of chemical compounds, reactions, drugs, diseases, organisms, etc., using KEGG identifiers other than K numbers.
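Generating an organism-specific hierarchy can be sketched as a line-by-line rewrite of a "ko" htext file. The sketch assumes K numbers appear only on D-level leaf lines and writes a simplified output line (gene ID followed by K number); real organism-specific files may use a different layout, and the gene IDs here are hypothetical.

```python
import re

# Sketch of generating an organism-specific hierarchy from a "ko" htext file.
# Assumptions: K numbers appear only on D-level lines; the output line format
# (gene ID followed by K number) is simplified; gene IDs are hypothetical.
_KO_LEAF = re.compile(r"^(D\s+)(K\d{5})\b")


def expand_htext(ko_htext, genes_by_ko):
    """Replace each K-number leaf with one line per gene; drop leaves with no gene."""
    out = []
    for line in ko_htext.splitlines():
        match = _KO_LEAF.match(line)
        if match is None:
            out.append(line)  # category lines (A, B, C, ...) are kept unchanged
            continue
        indent, ko = match.groups()
        for gene in genes_by_ko.get(ko, []):
            out.append(f"{indent}{gene}  {ko}")
    return "\n".join(out)


ko_htext = "\n".join([
    "C     00010 Glycolysis / Gluconeogenesis [PATH:ko00010]",
    "D       K00844  HK; hexokinase [EC:2.7.1.1]",
    "D       K00845  glk; glucokinase [EC:2.7.1.2]",
])
print(expand_htext(ko_htext, {"K00844": ["org:geneA", "org:geneB"]}))
```

In this simplified version the category line is copied verbatim; a fuller generator would presumably also rewrite links such as [PATH:ko00010] to the organism's own map.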

Last updated: November 7, 2025