KEGG Markup Language

The KEGG Markup Language (KGML) is an exchange format for KEGG graph objects, especially the KEGG pathway maps, which are manually drawn and updated. KGML enables automatic drawing of KEGG pathways and provides facilities for computational analysis and modeling of protein networks and chemical networks.

Background

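The KEGG pathway maps are graphical image maps representing networks of interacting molecules responsible for specific cellular functions. There are two types of KEGG pathways: reference pathways, which are manually drawn, and organism-specific pathways, which are computationally generated from the reference pathways.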

The KGML files contain computerized information about graphical objects and their relations in the KEGG pathways as well as information about orthologous gene assignments in the KEGG GENES database.

In KGML the pathway element specifies one graph object with the entry elements as its nodes and the relation and reaction elements as its edges. The relation and reaction elements indicate the connection patterns of rectangles (gene products) and the connection patterns of circles (chemical compounds), respectively, in the KEGG pathways. The two types of graph objects, those consisting of entry and relation elements and those consisting of entry and reaction elements, are called the protein network and the chemical network, respectively. Since the metabolic pathway can be viewed both as a network of proteins (enzymes) and as a network of chemical compounds, another distinction of KEGG pathways is between metabolic pathways, which can be viewed as both protein networks and chemical networks, and regulatory pathways, which are viewed as protein networks only.

Overview

The following figure shows an overview of KGML.

The pathway element is a root element, and one pathway element is specified for one pathway map in KGML. The entry, relation, and reaction elements specify the graph information, and additional elements are used to specify more detailed information about nodes and edges of the graph.
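As an illustration of this structure, the following Python sketch parses a small KGML fragment with the standard-library xml.etree.ElementTree module and counts the child elements of the pathway root. The fragment is hand-written for this example and is not an actual KEGG file: the entry names, coordinates and the helper name summarize are assumptions, and only the element and attribute names are taken from this specification.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, hand-written KGML fragment (not an actual KEGG file).
KGML_TEXT = """\
<pathway name="path:ko00030" org="ko" number="00030"
         title="Pentose phosphate pathway">
  <entry id="1" name="ko:K00001" type="ortholog" reaction="rn:R02749">
    <graphics name="K00001" type="rectangle" x="190" y="51"
              width="45" height="17"/>
  </entry>
  <entry id="2" name="cpd:C05378" type="compound">
    <graphics name="C05378" type="circle" x="100" y="51"/>
  </entry>
  <entry id="3" name="cpd:C01243" type="compound">
    <graphics name="C01243" type="circle" x="280" y="51"/>
  </entry>
  <reaction id="1" name="rn:R02749" type="reversible">
    <substrate id="2" name="cpd:C05378"/>
    <product id="3" name="cpd:C01243"/>
  </reaction>
</pathway>
"""


def summarize(kgml_text):
    """Return the pathway name and a count of its direct child elements."""
    root = ET.fromstring(kgml_text)
    if root.tag != "pathway":
        raise ValueError(f"expected a pathway root element, got {root.tag!r}")
    counts = Counter(child.tag for child in root)
    return root.get("name"), dict(counts)


name, counts = summarize(KGML_TEXT)
print(name, counts)
```

Counting only the direct children of the root keeps the nodes (entry) and the edges (relation, reaction) separate from the subelements that describe them in more detail.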

PATHWAY

pathway element

The pathway element specifies graph information stored in the KEGG pathway map. The attributes of this element are as follows.

attribute name  data type       explanation                                                      REQUIRED/IMPLIED
name            keggid.type     the KEGGID of this pathway map                                   REQUIRED
org             maporg.type     ko/ec/[org prefix]                                               REQUIRED
number          mapnumber.type  the map number of this pathway map                               REQUIRED
title           string.type     the title of this pathway map                                    IMPLIED
image           url.type        the resource location of the image file of this pathway map      IMPLIED
link            url.type        the resource location of the information about this pathway map  IMPLIED

name attribute

The name attribute contains the KEGG identifier of this pathway map.

attribute value         explanation
path:ko*****            the KEGGID of a reference pathway map, ex) name="path:ko00010"
path:[org prefix]*****  the KEGGID of an organism-specific pathway map, ex) name="path:hsa00010"

Here ***** represents the five-digit pathway map number and [org prefix] is a three- or four-letter organism code in KEGG.

org attribute

The org attribute specifies the classification of this pathway map. The distinction of reference pathways and pathways for various organisms is made according to the attribute value.

attribute value  explanation
ko               the reference pathway map represented by KO identifiers
ec               the reference pathway map represented by ENZYME identifiers
[org prefix]     the organism-specific pathway map for "org"

number attribute

The number attribute specifies the five-digit pathway map number.

attribute value explanation
five-digit integer ex) number="00030"
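
The name, org and number attributes can be cross-checked in code. The Python sketch below is a minimal example under one assumption that the examples suggest but the text does not state outright: that name is always "path:" followed by the org value and the five-digit number. The function name describe_pathway and the returned labels are hypothetical.

```python
import re

# "path:" + prefix (ko, ec or an organism code) + five-digit map number.
# Assumption: prefixes consist of lowercase letters only.
PATHWAY_NAME = re.compile(r"^path:([a-z]+)(\d{5})$")


def describe_pathway(name, org, number):
    """Check name/org/number for consistency and classify the pathway map."""
    match = PATHWAY_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected pathway name: {name!r}")
    prefix, map_number = match.groups()
    # Assumption: name == "path:" + org + number, as in the examples.
    if (prefix, map_number) != (org, number):
        raise ValueError("name, org and number attributes disagree")
    if org == "ko":
        kind = "reference (KO)"
    elif org == "ec":
        kind = "reference (ENZYME)"
    else:
        kind = "organism-specific"
    return {"org": org, "number": number, "kind": kind}


print(describe_pathway("path:hsa00010", "hsa", "00010"))
```

The map number is kept as a string so that leading zeros, as in "00030", are preserved.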
title attribute

The title attribute specifies the title of this pathway map.

attribute value explanation
string ex) title="Pentose phosphate pathway"
image attribute

The image attribute specifies the resource location of the image file for this pathway map in the KEGG Web service.

attribute value explanation
URL ex) image="http://www.genome.jp/kegg/pathway/ko/ko00010.png"
link attribute

The link attribute specifies the resource location of the information about this pathway map in the KEGG Web service.

attribute value explanation
URL ex) link="http://www.genome.jp/kegg-bin/show_pathway?ko00010"

ENTRY

entry element

The entry element contains information about a node of the pathway. The attributes of this element are as follows.

attribute name  data type        explanation                                                REQUIRED/IMPLIED
id              id.type          the ID of this entry in the pathway map                    REQUIRED
name            keggid.type      the KEGGID of this entry                                   REQUIRED
type            entry_type.type  the type of this entry                                     REQUIRED
link            url.type         the resource location of the information about this entry  IMPLIED
reaction        keggid.type      the KEGGID of corresponding reaction                       IMPLIED

id attribute

The id attribute specifies the identification number of this entry. Each entry element in the pathway element is uniquely specified according to this id attribute value.

attribute value explanation
positive integer the identification number of this entry
name attribute

The name attribute contains the KEGG identifier of this entry, which is generally in the form of db:accession where db is the database name and accession is the accession number.

attribute value           explanation                       ex)
path:(accession)          pathway map                       name="path:map00040"
ko:(accession)            KO (ortholog group)               name="ko:E3.1.4.11"
ec:(accession)            enzyme                            name="ec:1.1.3.5"
rn:(accession)            reaction                          name="rn:R00120"
cpd:(accession)           chemical compound                 name="cpd:C01243"
gl:(accession)            glycan                            name="gl:G00166"
[org prefix]:(accession)  gene product of a given organism  name="eco:b1207"
group:(accession)         complex of KOs                    name="group:ORC"

For group, if accession is undefined, "undefined" is specified.
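
The db:accession form of the name attribute can be split with a few lines of Python. This is a minimal sketch, not part of the specification: the function name split_entry_name is hypothetical, and the handling of several whitespace-separated identifiers in one attribute value and of a bare "undefined" value are assumptions.

```python
def split_entry_name(name):
    """Split an entry name into (db, accession) pairs.

    Assumptions: one attribute value may hold several identifiers
    separated by whitespace, and a token without a colon (such as a
    bare "undefined") has no database part.
    """
    pairs = []
    for token in name.split():
        db, sep, accession = token.partition(":")
        if sep:
            pairs.append((db, accession))
        else:
            pairs.append((None, token))
    return pairs


print(split_entry_name("eco:b1207"))
print(split_entry_name("ko:E3.1.4.11"))
```

Only the first colon is treated as the separator, so any further colons would stay in the accession.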
type attribute

The type attribute specifies the type of this entry. Note that when the pathway map is linked to another map, the linked pathway map is treated as a node, a clickable graphics object (round rectangle) in the KEGG Web service.

attribute value  explanation
ortholog         the node is a KO (ortholog group)
enzyme           the node is an enzyme
reaction         the node is a reaction
gene             the node is a gene product (mostly a protein)
group            the node is a complex of gene products (mostly a protein complex)
compound         the node is a chemical compound (including a glycan)
map              the node is a linked pathway map

link attribute

The link attribute specifies the resource location of the information about this entry in the KEGG Web service. In the organism-specific pathways, this attribute is not defined if the organism does not have the entry (gene).

attribute value explanation
URL ex) link="http://www.genome.jp/dbget-bin/www_bget?eco+b1207"
reaction attribute

The reaction attribute specifies the KEGGID of the corresponding chemical reaction(s) in the KEGG LIGAND database.

attribute value explanation
rn:(accession) ex) reaction="rn:R02749"

graphics element

The graphics element is a subelement of the entry element, specifying drawing information about the graphics object.

attribute name  data type            explanation                                        REQUIRED/IMPLIED
name            string.type          the label of this graphics object                  IMPLIED
x               number.type          the X axis position of this graphics object        IMPLIED
y               number.type          the Y axis position of this graphics object        IMPLIED
coords          string.type          the polyline coordinates                           IMPLIED
type            graphics.type        the shape of this graphics object                  IMPLIED
width           number.type          the width of this graphics object                  IMPLIED
height          number.type          the height of this graphics object                 IMPLIED
fgcolor         graphics-color.type  the foreground color used by this graphics object  IMPLIED
bgcolor         graphics-color.type  the background color used by this graphics object  IMPLIED

name attribute

The name attribute contains the label that is associated with this graphics object. When two or more name attributes are specified in the same entry element, the first one is taken as the attribute value. When the type attribute value of the entry element is "gene", the gene name is specified for this attribute value.

attribute value explanation
string the label of this graphics object
ex)
     name="1.1.1.43"
     name="Methane metabolism"
x attribute

The x attribute specifies the x-coordinate value of this graphics object in the manually drawn KEGG pathway map.

attribute value explanation
positive integer ex) x="190"
y attribute

The y attribute specifies the y-coordinate value of this graphics object in the manually drawn KEGG pathway map.

attribute value explanation
positive integer ex) y="51"
coords attribute

The coords attribute specifies a set of coordinates, x1,y1,x2,y2,..., for the line object.

attribute value explanation
string ex) coords="573,729,573,779"
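
A coords value can be turned into a list of points as follows. This Python sketch assumes integer coordinates, as in the example value, and the function name parse_coords is hypothetical.

```python
def parse_coords(coords):
    """Convert "x1,y1,x2,y2,..." into a list of (x, y) integer pairs."""
    values = [int(v) for v in coords.split(",")]
    if len(values) % 2 != 0:
        raise ValueError("coords must hold an even number of values")
    # Pair every even-indexed value (x) with the value that follows it (y).
    return list(zip(values[0::2], values[1::2]))


print(parse_coords("573,729,573,779"))
```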
type attribute

The type attribute specifies the shape of this object. The default value is "rectangle".

attribute value explanation
rectangle the shape is a rectangle, which is used to represent a gene product and its complex (including an ortholog group).
circle the shape is a circle, which is used to specify any other molecule such as a chemical compound and a glycan.
roundrectangle the shape is a round rectangle, which is used to represent a linked pathway.
line the shape is a polyline, which is used to represent a reaction or a relation (and also a gene or an ortholog group).
width attribute

The width attribute specifies the width this object. The default value is "45".

attribute value explanation
positive integer ex) width="73"
height attribute

The height attribute specifies the height of this object. The default value is "17".

attribute value explanation
positive integer ex) height="34"
fgcolor attribute

The fgcolor attribute specifies the foreground color of this object. It applies to the frame and the character string. The default value is "#000000".

attribute value explanation
numerical RGB ex) fgcolor="#000000"
bgcolor attribute

The bgcolor attribute specifies the background color of this object. The default value is "#FFFFFF". The background color for the gene product is "#BFFFBF".

attribute value explanation
numerical RGB ex) bgcolor="#BFFFBF"
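
Because every graphics attribute is IMPLIED, a reader has to supply the defaults stated in the attribute descriptions. The Python sketch below is one way to do that. The names GRAPHICS_DEFAULTS, graphics_with_defaults and hex_to_rgb are hypothetical, and the colors are assumed to be written as "#RRGGBB", as in the examples.

```python
import xml.etree.ElementTree as ET

# Defaults as stated in the attribute descriptions of the graphics element.
GRAPHICS_DEFAULTS = {
    "type": "rectangle",
    "width": "45",
    "height": "17",
    "fgcolor": "#000000",
    "bgcolor": "#FFFFFF",
}


def graphics_with_defaults(graphics):
    """Return the attributes of a graphics element with defaults filled in."""
    attrs = dict(GRAPHICS_DEFAULTS)
    attrs.update(graphics.attrib)  # explicit values override the defaults
    return attrs


def hex_to_rgb(color):
    """Convert "#RRGGBB" into an (r, g, b) tuple of integers."""
    digits = color.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


element = ET.fromstring('<graphics name="b1207" x="190" y="51" bgcolor="#BFFFBF"/>')
attrs = graphics_with_defaults(element)
print(attrs["width"], attrs["height"], hex_to_rgb(attrs["bgcolor"]))
```

Attribute values are kept as strings, as they appear in the XML; convert them to numbers only where a calculation needs it.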

component element

The component element is a subelement of the entry element and is used when the entry element is a complex node, namely when the type attribute value of the entry element is "group". The nodes that constitute the complex are specified by repeating the component element. For example, when the complex is composed of two nodes, two component elements are specified. The attribute of this element is as follows.

attribute name  data type   explanation                                           REQUIRED/IMPLIED
id              idref.type  the ID of the component which is part of the complex  REQUIRED

id attribute

The id attribute specifies the identification number of this component. The entry element of "group" type is specified by a complete set of component elements.

attribute value explanation
positive integer the identification number of this component
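
Resolving a group entry into its members amounts to looking up each component id among the entry ids of the same pathway. The Python sketch below shows this on a hand-written fragment. The KO identifiers, entry ids and the function name group_members are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a complex (entry 12) made of entries 10 and 11.
KGML_GROUP = """\
<pathway name="path:ko00010" org="ko" number="00010">
  <entry id="10" name="ko:K00001" type="ortholog"/>
  <entry id="11" name="ko:K00002" type="ortholog"/>
  <entry id="12" name="group:ORC" type="group">
    <component id="10"/>
    <component id="11"/>
  </entry>
</pathway>
"""


def group_members(root):
    """Map the id of each group entry to the names of its component entries."""
    entries = {entry.get("id"): entry for entry in root.findall("entry")}
    groups = {}
    for entry in entries.values():
        if entry.get("type") != "group":
            continue
        groups[entry.get("id")] = [
            entries[component.get("id")].get("name")
            for component in entry.findall("component")
        ]
    return groups


members = group_members(ET.fromstring(KGML_GROUP))
print(members)
```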

RELATION

relation element

The relation element specifies a relationship between two proteins (gene products), two KOs (ortholog groups), or a protein and a compound, which is indicated by an arrow or a line connecting two nodes in the KEGG pathways. The relation element has a subelement named subtype. When the name attribute value of the subtype element has directionality, as "activation" does, the direction of the interaction is from entry1 to entry2. The attributes of this element are as follows.

attribute name  data type           explanation                                        REQUIRED/IMPLIED
entry1          idref.type          the first (from) entry that defines this relation  REQUIRED
entry2          idref.type          the second (to) entry that defines this relation   REQUIRED
type            relation-type.type  the type of this relation                          REQUIRED

entry1 attribute

The entry1 attribute specifies the id attribute value of the first entry element.

attribute value explanation
positive integer the ID of node which takes part in this relation
entry2 attribute

The entry2 attribute specifies the id attribute value of the second entry element.

attribute value explanation
positive integer the ID of node which takes part in this relation
type attribute

The type attribute specifies one of three types of relations, the so-called generalized protein interactions in KEGG (ECrel, PPrel and GErel), or one of two additional types: PCrel for an interaction between a protein and a chemical compound, and maplink for a link between a protein and a map. The maplink relation is provided for an interaction between a protein and another protein in the specified map.

attribute value  explanation
ECrel            enzyme-enzyme relation, indicating two enzymes catalyzing successive reaction steps
PPrel            protein-protein interaction, such as binding and modification
GErel            gene expression interaction, indicating relation of transcription factor and target gene product
PCrel            protein-compound interaction
maplink          link to another map

subtype element

The subtype element specifies more detailed information about the nature of the interaction or the relation. The attributes of this element are as follows.

attribute name  data type           explanation                          REQUIRED/IMPLIED
name            subtype-name.type   Interaction/relation information     REQUIRED
value           subtype-value.type  Interaction/relation property value  REQUIRED

name and value attributes

The name attribute specifies the subcategory and/or the additional information in each of the three types of the generalized protein interactions. The correspondence between the type attribute of the relation element (ECrel, PPrel or GErel) and the name and value attributes of the subtype element is shown below.

name                 value       ECrel  PPrel  GErel  Explanation
compound             (entry id)  *      *             shared with two successive reactions (ECrel) or intermediate of two interacting proteins (PPrel)
hidden compound      (entry id)  *                    shared with two successive reactions but not displayed in the pathway map
activation           -->                *             positive and negative effects which may be associated with molecular information below
inhibition           --|                *
expression           -->                       *      interactions via DNA binding
repression           --|                       *
indirect effect      ..>                *      *      indirect effect without molecular details
state change         ...                *             state transition
binding/association  ---                *             association and dissociation
dissociation         -+-                *
missing interaction  -/-                *      *      missing interaction due to mutation, etc.
phosphorylation      +p                 *             molecular events
dephosphorylation    -p                 *
glycosylation        +g                 *
ubiquitination       +u                 *
methylation          +m                 *

(entry id): the id attribute value of the entry element for the compound, or for the hidden compound.
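
Relations and their subtypes map naturally onto directed edges of the protein network. The Python sketch below collects them from a hand-written fragment. The gene identifiers and the function name relation_edges are hypothetical, and the direction from entry1 to entry2 is only meaningful for subtypes with directionality, as noted in the description of the relation element.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: entry 1 activates and phosphorylates entry 2.
KGML_RELATION = """\
<pathway name="path:hsa00010" org="hsa" number="00010">
  <entry id="1" name="hsa:0001" type="gene"/>
  <entry id="2" name="hsa:0002" type="gene"/>
  <relation entry1="1" entry2="2" type="PPrel">
    <subtype name="activation" value="--&gt;"/>
    <subtype name="phosphorylation" value="+p"/>
  </relation>
</pathway>
"""


def relation_edges(root):
    """Return (entry1, entry2, type, subtypes) for every relation element."""
    edges = []
    for relation in root.findall("relation"):
        subtypes = [
            (subtype.get("name"), subtype.get("value"))
            for subtype in relation.findall("subtype")
        ]
        edges.append(
            (relation.get("entry1"), relation.get("entry2"),
             relation.get("type"), subtypes)
        )
    return edges


edges = relation_edges(ET.fromstring(KGML_RELATION))
print(edges)
```

A relation may carry more than one subtype element, so the subtypes are kept as a list rather than a single value.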

REACTION

reaction element

The reaction element specifies a chemical reaction between a substrate and a product, indicated by an arrow connecting two circles in the KEGG pathways. The reaction element has the substrate element and the product element as subelements. The attributes of this element are as follows.

attribute name  data type           explanation                  REQUIRED/IMPLIED
id              idref.type          the ID of this reaction      REQUIRED
name            keggid.type         the KEGGID of this reaction  REQUIRED
type            reaction-type.type  the type of this reaction    REQUIRED

id attribute

The id attribute specifies the identification number of this reaction.

attribute value explanation
positive integer the identification number of this reaction
name attribute

The name attribute contains the KEGG identifier of the REACTION database.

attribute value explanation
rn:(accession) ex) name="rn:R02749"
type attribute

The type attribute specifies the distinction between reversible and irreversible reactions, which are indicated by bi-directional and uni-directional arrows in the KEGG pathways. Note that the terms "reversible" and "irreversible" do not necessarily reflect the biochemical properties of each reaction. Rather, they indicate the direction of the reaction as drawn on the pathway map, which is extracted from textbooks and the literature.

attribute value explanation
reversible reversible reaction
irreversible irreversible reaction

substrate element

The substrate element specifies the substrate node of this reaction. The attributes of this element are as follows.

attribute name  data type    explanation                   REQUIRED/IMPLIED
id              idref.type   the ID of this substrate      REQUIRED
name            keggid.type  the KEGGID of substrate node  REQUIRED

id attribute

The id attribute specifies the identification number of this substrate.

attribute value explanation
positive integer the identification number of this substrate
name attribute

The name attribute contains the KEGG identifier of the COMPOUND database or the GLYCAN database.

attribute value explanation
cpd:(accession)
gl:(accession)
ex) cpd:C05378
      gl:G00037

product element

The product element specifies the product node of this reaction. The attributes of this element are as follows.

attribute name  data type    explanation                   REQUIRED/IMPLIED
id              idref.type   the ID of this product        REQUIRED
name            keggid.type  the KEGGID of product node    REQUIRED

id attribute

The id attribute specifies the identification number of this product.

attribute value explanation
positive integer the identification number of this product
name attribute

The name attribute contains the KEGG identifier of the COMPOUND database or the GLYCAN database.

attribute value explanation
cpd:(accession)
gl:(accession)
ex) cpd:C05378
      gl:G00037
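
The reaction, substrate and product elements together describe the edges of the chemical network. The Python sketch below turns them into compound-to-compound edges and adds the reverse edge for reactions marked "reversible". The fragment combines identifiers used as examples in this document in a hypothetical way, the function name reaction_edges is an assumption, and, as noted for the type attribute, "reversible" reflects how the map is drawn rather than a biochemical property.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one reversible reaction between two compounds.
KGML_REACTION = """\
<pathway name="path:ko00030" org="ko" number="00030">
  <entry id="2" name="cpd:C05378" type="compound"/>
  <entry id="3" name="cpd:C01243" type="compound"/>
  <reaction id="1" name="rn:R02749" type="reversible">
    <substrate id="2" name="cpd:C05378"/>
    <product id="3" name="cpd:C01243"/>
  </reaction>
</pathway>
"""


def reaction_edges(root):
    """Return (from_compound, to_compound, reaction_name) edges."""
    edges = []
    for reaction in root.findall("reaction"):
        substrates = [s.get("name") for s in reaction.findall("substrate")]
        products = [p.get("name") for p in reaction.findall("product")]
        for substrate in substrates:
            for product in products:
                edges.append((substrate, product, reaction.get("name")))
                if reaction.get("type") == "reversible":
                    # Drawn with a bi-directional arrow: add the reverse edge.
                    edges.append((product, substrate, reaction.get("name")))
    return edges


chemical_edges = reaction_edges(ET.fromstring(KGML_REACTION))
print(chemical_edges)
```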

alt element

The alt element specifies the alternative name of its parent element. The attribute of this element is as follows.

attribute name  data type    explanation         REQUIRED/IMPLIED
name            keggid.type  the KEGGID of node  REQUIRED

name attribute

The name attribute contains the KEGG identifier of the COMPOUND database or the GLYCAN database.

attribute value explanation
cpd:(accession)
gl:(accession)
ex) cpd:C05378
      gl:G00037

Access methods

The entire set of KGML files may be downloaded from the FTP site (academic users only).

Non-academic users are requested to obtain a licensing agreement. Please refer to the page below.

Release history

KGML v0.7.1
Release April 1, 2010
--------------------------------------------------------------------------------
KGML v0.7
Release October 1, 2009
--------------------------------------------------------------------------------
KGML v0.6.1
Release April 25, 2006
--------------------------------------------------------------------------------
KGML v0.6
Release March 1, 2005
--------------------------------------------------------------------------------
KGML v0.5
Release December 10, 2004
--------------------------------------------------------------------------------
KGML v0.4
Release April 1, 2004
--------------------------------------------------------------------------------
KGML v0.3
Release October 1, 2003
--------------------------------------------------------------------------------
KGML v0.2
Release April 1, 2003
--------------------------------------------------------------------------------
KGML v0.1
Release January 1, 2003
-------------------------------------------------------------------------------