Chemical Structures

KEGG COMPOUND is a collection of small molecules, biopolymers, and other chemical substances that are relevant to biological systems. Each entry is identified by a C number, such as C00047 for L-lysine, and contains the chemical structure and associated information, as well as links to other KEGG databases and to outside databases. Some COMPOUND entries are also represented as GLYCAN and DRUG entries, connected through "Same as" links. The classification of representative entries in KEGG COMPOUND is given in a BRITE hierarchy file.
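As a minimal sketch of how C numbers might be handled in a script, the snippet below checks the identifier format and builds a retrieval URL. It assumes that a C number is the letter "C" followed by exactly five digits (inferred from examples such as C00047) and that entries can be requested through the KEGG REST `get` operation with the `cpd:` prefix; neither detail is stated in the text above, and the function names are hypothetical.

```python
import re

# Assumption: "C" followed by exactly five digits, as in C00047.
C_NUMBER = re.compile(r"C\d{5}")


def is_c_number(identifier: str) -> bool:
    """Return True if the string looks like a KEGG COMPOUND C number."""
    return C_NUMBER.fullmatch(identifier) is not None


def compound_url(identifier: str) -> str:
    """Build a URL for one compound entry (the URL is not fetched here).

    Assumption: the KEGG REST 'get' operation with the 'cpd:' prefix.
    """
    if not is_c_number(identifier):
        raise ValueError(f"not a C number: {identifier!r}")
    return f"https://rest.kegg.jp/get/cpd:{identifier}"


if __name__ == "__main__":
    print(is_c_number("C00047"))
    print(compound_url("C00047"))
```

Validating the identifier before building the URL keeps malformed input from reaching the network layer.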

Chemical Compounds in Pathways and Diseases

KEGG COMPOUND is one of the four original databases, together with KEGG PATHWAY, KEGG GENES and KEGG ENZYME, introduced at the start of the KEGG project. Its role has always been to enable links from individual molecular data to molecular network data. Chemical compound entries are constituents of KEGG pathway maps and KEGG modules, and they are used, for example, to analyze metabolomics data and uncover metabolic features with the KEGG Mapper tools. In addition to pathways and modules, chemical compound entries are now used to represent disease-associated perturbed networks, such as those for congenital disorders of metabolism, in KEGG NETWORK.
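The idea of linking compound-level data to network-level data can be illustrated with a small sketch that counts how many observed compounds fall into each pathway. This is not how KEGG Mapper is implemented; the function name, the pathway labels, and the pathway memberships below are all hypothetical toy data, and only the C numbers themselves appear in this document.

```python
from collections import Counter


def count_pathway_hits(observed, pathway_members):
    """Count how many observed compounds belong to each pathway.

    Returns (pathway, hit_count) pairs, highest count first; pathways
    with no observed compound are omitted.
    """
    observed = set(observed)
    hits = Counter()
    for pathway, members in pathway_members.items():
        overlap = observed & set(members)
        if overlap:
            hits[pathway] = len(overlap)
    return hits.most_common()


if __name__ == "__main__":
    # Hypothetical memberships for illustration only, not real KEGG data.
    toy_pathways = {
        "pathway_A": {"C00047", "C00873"},
        "pathway_B": {"C11996"},
    }
    # "C99999" stands in for a compound that maps to no pathway.
    measured = ["C00047", "C00873", "C99999"]
    print(count_pathway_hits(measured, toy_pathways))
```

A real analysis would take the memberships from the pathway database and would usually add a statistical test on top of the raw counts.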

Naturally processed peptides from gene products, such as C00873 for human angiotensin I, are sometimes represented in the KEGG COMPOUND database. They are always associated with sequence information written in the three-letter amino acid codes, but they may or may not contain the full atomic structure representation. Small bioactive peptides are categorized in a BRITE hierarchy file.
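Sequence information in three-letter codes is easy to convert to the one-letter form that most sequence tools expect. The sketch below assumes a hyphen-separated string of the 20 standard residues; the exact layout of the sequence field in a COMPOUND entry is not described here, so the separator and the function name are assumptions.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(sequence: str, sep: str = "-") -> str:
    """Convert e.g. 'Asp-Arg-Val' to 'DRV'.

    Assumption: residues are separated by `sep` and limited to the
    20 standard amino acids; anything else raises ValueError.
    """
    residues = [r.strip() for r in sequence.split(sep) if r.strip()]
    try:
        return "".join(THREE_TO_ONE[r] for r in residues)
    except KeyError as exc:
        raise ValueError(f"unknown residue code: {exc.args[0]!r}") from None


if __name__ == "__main__":
    # Angiotensin I sequence typed by hand for illustration,
    # not read from the C00873 entry.
    print(to_one_letter("Asp-Arg-Val-Tyr-Ile-His-Pro-Phe-His-Leu"))
```

Modified or non-standard residues are rejected rather than guessed, which seems the safer default for curated peptide data.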

Biosynthetic Codes

The structures of DNA, RNA, and proteins are determined by template-based syntheses of replication, transcription, and translation with the genetic code. In contrast, the structures of glycans, lipids, polyketides, nonribosomal peptides, and various secondary metabolites are determined by biosynthetic pathways. KEGG COMPOUND, as well as KEGG GLYCAN, is a resource for understanding such biosynthetic codes and for inferring chemical repertoires of these diverse substances from genomic information.
Fatty acids
Fatty acid biosynthesis and metabolism are represented in various KEGG pathway maps and KEGG reaction modules.
KEGG BRITE contains classifications of lipids and lipid biosynthesis proteins.
Polyketides and nonribosomal peptides
Polyketide (PK) and nonribosomal peptide (NRP) entries, such as C11996 for methymycin, are represented as sequences written with abbreviation codes for their monomeric units, which are carboxylic acids and amino acids. Some PKs and NRPs are known antibiotics.
Glycans
Details are given in KEGG GLYCAN.

Divergent Compounds Biosynthesized and Biodegraded

Plant secondary metabolites
Plants are known to produce diverse compounds, many of which have practical value. Our knowledge of plant secondary metabolites is organized in KEGG BRITE and KEGG PATHWAY.
Xenobiotic compounds
In addition to biochemical compounds, KEGG COMPOUND contains xenobiotic compounds, many of which may be broken down by microbial degradation pathways.

Database Search Tools

SIMCOMP and SUBCOMP are database search programs for similar chemical structures. SIMCOMP is based on graph matching and finds maximal common subgraphs while allowing mismatches, whereas SUBCOMP is a bit-string-based search program that finds exactly matching substructures or superstructures.
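To give a feel for what a bit-string-based substructure test looks like, here is a generic sketch in which each structure is encoded as a set of feature bits and a query can only be a substructure of a target if all of its bits are also set in the target. This is not SUBCOMP's actual encoding or algorithm; the feature vocabulary and function names are hypothetical.

```python
def fingerprint(features, vocabulary):
    """Encode a set of structural features as an integer bit string.

    Bit i is set when vocabulary[i] is present in `features`.
    """
    bits = 0
    for index, feature in enumerate(vocabulary):
        if feature in features:
            bits |= 1 << index
    return bits


def may_contain(target_bits, query_bits):
    """True if every bit set in the query is also set in the target."""
    return (target_bits & query_bits) == query_bits


if __name__ == "__main__":
    # Hypothetical feature vocabulary, for illustration only.
    vocab = ["amine", "carboxyl", "hydroxyl", "aromatic_ring"]
    target = fingerprint({"amine", "carboxyl"}, vocab)
    query = fingerprint({"amine"}, vocab)
    print(may_contain(target, query))  # substructure direction
    print(may_contain(query, target))  # superstructure direction
```

Swapping the arguments turns the substructure test into a superstructure test. With a coarse feature vocabulary like this one, passing the test is necessary but not sufficient, so a match would still need to be confirmed on the actual atom graph.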


Last updated: July 25, 2022