Chemical Structures

KEGG COMPOUND is a collection of small molecules, biopolymers, and other chemical substances that are relevant to biological systems. Each entry is identified by a C number, such as C00047 for L-lysine, and contains the chemical structure and associated information, as well as links to other KEGG databases and to outside databases. Representative entries in KEGG COMPOUND are classified in a BRITE hierarchy file.
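To make the entry layout concrete, here is a minimal Python sketch that checks the C-number format and splits a KEGG-style flat-file entry into fields. It assumes that a C number is "C" followed by five digits, that field names occupy the first 12 columns with values starting at column 13, and that "///" ends an entry. The sample text is an illustrative excerpt built for this example, not a copy of the live C00047 entry, and the helper names `is_c_number` and `parse_flat_entry` are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Assumption: a KEGG COMPOUND identifier is "C" followed by five digits (e.g. C00047).
C_NUMBER = re.compile(r"C\d{5}")


def is_c_number(identifier: str) -> bool:
    """Return True if the string has the assumed C-number shape."""
    return C_NUMBER.fullmatch(identifier) is not None


def parse_flat_entry(text: str) -> Dict[str, List[str]]:
    """Split a KEGG-style flat-file entry into {field name: [value lines]}.

    Assumes field names sit in columns 1-12 and values start at column 13;
    a line with a blank name area continues the previous field.
    """
    fields: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if line.startswith("///"):
            break  # assumed end-of-entry marker
        name, value = line[:12].strip(), line[12:].strip()
        if name:
            current = name
            fields.setdefault(current, [])
        if current is not None and value:
            fields[current].append(value)
    return fields


# Illustrative excerpt built for this example; not a verbatim copy of the live entry.
sample = "\n".join([
    "ENTRY".ljust(12) + "C00047      Compound",
    "NAME".ljust(12) + "L-Lysine;",
    " " * 12 + "2,6-Diaminohexanoic acid",
    "FORMULA".ljust(12) + "C6H14N2O2",
    "///",
])

entry = parse_flat_entry(sample)
entry_id = entry["ENTRY"][0].split()[0]
print(entry_id, is_c_number(entry_id))  # C00047 True
```

Continuation lines are kept as separate list items so that multi-valued fields such as NAME stay easy to iterate over.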

Chemical Compounds in Pathways and Diseases

KEGG COMPOUND entries are major constituents of KEGG pathway maps and KEGG modules. Some of them are also represented as KEGG GLYCAN or KEGG DRUG entries, connected by "Same as" links. Compound-disease associations are now being explored in the new KEGG NETWORK database.

A correlation is often observed between KEGG modules, defined as functional units conserved across species, and KEGG network units, defined from disease-related gene variants in Homo sapiens. Here are some examples (see also KEGG GLYCAN).

Pathway           Module       Disease   Network   Content

hsa00220                                           Arginine biosynthesis
hsa00220+M00029   hsa_M00029                       Urea cycle
hsa00220+H01398                H01398              Urea cycle disorders
                                         nt06010   Urea cycle

hsa00860                                           Porphyrin and chlorophyll metabolism
hsa00860+M00868   hsa_M00868                       Heme biosynthesis, animals and fungi
hsa00860+H01763                H01763              Porphyria
                                         nt06011   Heme biosynthesis
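The correspondence in the table can also be read as data. The Python sketch below transcribes the rows and collects, for each pathway, the module, disease entry, and network unit listed under it. The grouping rules (a bare pathway ID opens a block, `pathway+ID` rows belong to that block, and a network row with an empty Pathway column attaches to the current block) are assumptions read off the table layout, not a KEGG file format, and `group_by_pathway` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

# Rows transcribed from the table: (pathway, module, disease, network, content).
Row = Tuple[str, str, str, str, str]
ROWS: List[Row] = [
    ("hsa00220", "", "", "", "Arginine biosynthesis"),
    ("hsa00220+M00029", "hsa_M00029", "", "", "Urea cycle"),
    ("hsa00220+H01398", "", "H01398", "", "Urea cycle disorders"),
    ("", "", "", "nt06010", "Urea cycle"),
    ("hsa00860", "", "", "", "Porphyrin and chlorophyll metabolism"),
    ("hsa00860+M00868", "hsa_M00868", "", "", "Heme biosynthesis, animals and fungi"),
    ("hsa00860+H01763", "", "H01763", "", "Porphyria"),
    ("", "", "", "nt06011", "Heme biosynthesis"),
]


def group_by_pathway(rows: List[Row]) -> Dict[str, Dict[str, Optional[str]]]:
    """Collect the module, disease, and network unit listed under each pathway block."""
    groups: Dict[str, Dict[str, Optional[str]]] = {}
    current: Optional[str] = None
    for pathway, module, disease, network, _content in rows:
        if pathway:
            # "hsa00220+M00029" -> base pathway "hsa00220"
            base, _, _suffix = pathway.partition("+")
            current = base
            groups.setdefault(base, {"module": None, "disease": None, "network": None})
        if current is None:
            continue  # a row before any pathway block is ignored
        block = groups[current]
        if module:
            block["module"] = module
        if disease:
            block["disease"] = disease
        if network:
            block["network"] = network
    return groups


for pathway, units in group_by_pathway(ROWS).items():
    print(pathway, units["module"], units["disease"], units["network"])
# hsa00220 hsa_M00029 H01398 nt06010
# hsa00860 hsa_M00868 H01763 nt06011
```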

Biosynthetic Codes

The structures of DNA, RNA, and proteins are determined by template-based syntheses of replication, transcription, and translation with the genetic code. In contrast, the structures of glycans, lipids, polyketides, nonribosomal peptides, and various secondary metabolites are determined by biosynthetic pathways. KEGG COMPOUND, as well as KEGG GLYCAN, is a resource for understanding such biosynthetic codes and for inferring chemical repertoires of these diverse substances from genomic information.

Peptide entries in KEGG COMPOUND are designated with "Peptide" in the first Entry line. They are always represented as sequence information using the three-letter amino acid codes, but they may or may not include a full atomic structure representation. Small bioactive peptides are categorized in a BRITE hierarchy file.
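As a small illustration of the three-letter representation, the following Python sketch converts a hyphen-separated three-letter sequence into one-letter codes. The hyphen separator and the mapping of non-standard residues to "X" are assumptions for this example rather than a description of how KEGG stores sequences; `to_one_letter` is a hypothetical helper, and the input is the Met-enkephalin pentapeptide Tyr-Gly-Gly-Phe-Met.

```python
# Standard three-letter to one-letter codes for the 20 canonical amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(sequence: str, sep: str = "-", unknown: str = "X") -> str:
    """Convert a separator-delimited three-letter sequence to one-letter codes.

    Residues outside the 20 canonical ones (e.g. modified residues) map to `unknown`.
    """
    return "".join(THREE_TO_ONE.get(res, unknown) for res in sequence.split(sep) if res)


print(to_one_letter("Tyr-Gly-Gly-Phe-Met"))  # YGGFM
```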
Lipids
KEGG BRITE contains a classification of lipids, and KEGG PATHWAY contains pathway maps for lipid biosynthesis and metabolism and a structure map for polyunsaturated fatty acids.
Polyketides and nonribosomal peptides
Polyketide (PK) and nonribosomal peptide (NRP) entries are designated in KEGG COMPOUND with "PK", "NRP", or "PKNRP" (mixed type) in the first Entry line. They are represented as sequence information using abbreviation codes for their carboxylic acid and amino acid monomer units. Some PKs and NRPs are known antibiotics.
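The designation keywords can be used to sort entries by type. This Python sketch scans an ENTRY line for "Peptide", "PK", "NRP", or "PKNRP" and returns the first match, or None for an entry carrying none of them. It assumes whitespace-separated tokens with the keyword somewhere after the identifier; the `CXXXXX` identifiers are placeholders, not real entries, and `designation` is a hypothetical helper.

```python
from typing import Optional

# Designation keywords named in the text, with a short description of each.
DESIGNATIONS = {
    "Peptide": "peptide",
    "PK": "polyketide",
    "NRP": "nonribosomal peptide",
    "PKNRP": "mixed polyketide/nonribosomal peptide",
}


def designation(entry_line: str) -> Optional[str]:
    """Return the first designation keyword found after the identifier on an ENTRY line."""
    tokens = entry_line.split()
    if not tokens or tokens[0] != "ENTRY":
        return None  # not an ENTRY line
    for token in tokens[2:]:  # tokens[1] is assumed to be the identifier
        if token in DESIGNATIONS:
            return token
    return None


# Placeholder ENTRY lines for illustration only.
for line in ["ENTRY       CXXXXX      PKNRP", "ENTRY       CXXXXX      Peptide"]:
    kind = designation(line)
    print(kind, DESIGNATIONS[kind] if kind else "no designation")
```

Exact token matching keeps "PK" from being confused with "PKNRP", which a substring test would not.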

Divergent Compounds Biosynthesized and Biodegraded

Plant secondary metabolites
Plants are known to produce diverse compounds, many of which have practical value. Knowledge of plant secondary metabolites is being organized in KEGG BRITE and KEGG PATHWAY.
Xenobiotic compounds
In addition to biochemical compounds, KEGG COMPOUND contains xenobiotic compounds, many of which can be broken down by microbial degradation pathways.

Database Search Tools

SIMCOMP and SUBCOMP are database search programs for chemical structures. SIMCOMP is based on graph matching that finds maximal common subgraphs while allowing mismatches, whereas SUBCOMP is a bit-string-based search program that finds exactly matching substructures or superstructures.
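The bit-string idea behind substructure screening can be sketched in a few lines of Python. Each structure is reduced to an integer whose bits mark the presence of structural features; a query can only be a substructure of a target if every query bit is also set in the target, and the reverse test screens for superstructures. This is a generic fingerprint screen using hypothetical feature definitions and molecule names (`mol_a`, `mol_b`, `mol_c`), not SUBCOMP's actual algorithm. Because shared bits are necessary but not sufficient, real hits would still need an exact structure comparison.

```python
from typing import Dict, Iterable, List


def fingerprint(feature_bits: Iterable[int]) -> int:
    """Pack feature indices into an integer bit string (bit i set = feature i present)."""
    bits = 0
    for i in feature_bits:
        bits |= 1 << i
    return bits


def may_be_substructure(query: int, target: int) -> bool:
    """Screen: every feature of the query must also be present in the target."""
    return (query & target) == query


def may_be_superstructure(query: int, target: int) -> bool:
    """Screen: every feature of the target must also be present in the query."""
    return (query & target) == target


# Hypothetical feature bits: 0 = carboxyl group, 1 = primary amine, 2 = aromatic ring, 3 = hydroxyl.
library: Dict[str, int] = {
    "mol_a": fingerprint([0, 1, 3]),
    "mol_b": fingerprint([0, 2]),
    "mol_c": fingerprint([0, 1]),
}
query = fingerprint([0, 1])

sub_hits: List[str] = [name for name, fp in library.items() if may_be_substructure(query, fp)]
super_hits: List[str] = [name for name, fp in library.items() if may_be_superstructure(query, fp)]
print(sub_hits, super_hits)  # ['mol_a', 'mol_c'] ['mol_c']
```

The screen costs one AND and one comparison per pair, which is why this style of test is commonly used to prune candidates before a more expensive graph comparison.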

Last updated: May 21, 2021