Chemical Structures

KEGG COMPOUND is a collection of small molecules, biopolymers, and other chemical substances that are relevant to biological systems. Each entry is identified by a C number, such as C00047 for L-lysine, and contains the chemical structure and associated information, as well as links to other KEGG databases and outside databases. The COMPOUND database is maintained in the KEGG LIGAND relational database. The classification of representative entries in KEGG COMPOUND is given in a BRITE hierarchy file.
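As a minimal sketch of how C numbers might be handled in a script, the following Python checks the identifier format and builds a lookup URL. It assumes that a C number is the letter "C" followed by exactly five digits (as in C00047) and that entries can be retrieved through the public KEGG REST service at rest.kegg.jp; the function names are illustrative and not part of KEGG.

```python
import re

# Assumption: a C number is "C" followed by exactly five digits (e.g. C00047).
C_NUMBER = re.compile(r"C\d{5}")

# Assumption: the public KEGG REST service is used for retrieval.
KEGG_REST_BASE = "https://rest.kegg.jp"


def is_c_number(identifier: str) -> bool:
    """Return True if the string has the shape of a KEGG COMPOUND C number."""
    return C_NUMBER.fullmatch(identifier) is not None


def compound_url(identifier: str) -> str:
    """Build a retrieval URL for a compound entry (no network access here)."""
    if not is_c_number(identifier):
        raise ValueError(f"not a C number: {identifier!r}")
    return f"{KEGG_REST_BASE}/get/cpd:{identifier}"
```

Calling `compound_url("C00047")` only constructs the address for the L-lysine entry; fetching it is left to whichever HTTP client the reader prefers.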

Biosynthetic Codes

The structures of DNA, RNA, and proteins are determined by template-based syntheses of replication, transcription, and translation with the genetic code. In contrast, the structures of glycans, lipids, polyketides, nonribosomal peptides, and various secondary metabolites are determined by biosynthetic pathways. KEGG COMPOUND, as well as KEGG GLYCAN, is a resource for understanding such biosynthetic codes and for inferring chemical repertoires of these diverse substances from genomic information [1-3].

Peptides
Peptide entries in KEGG COMPOUND are designated with "Peptide" in the first Entry line. They are always represented as sequence information using the three-letter amino acid codes, but they may or may not contain the full atomic structure representation. Small bioactive peptides are categorized in a BRITE hierarchy file.
Glycans
Lipids
KEGG BRITE contains a classification of lipids, and KEGG PATHWAY contains pathway maps for lipid biosynthesis and metabolism and a structure map for polyunsaturated fatty acids [2].
Polyketides and nonribosomal peptides
Polyketide (PK) and nonribosomal peptide (NRP) entries are designated in KEGG COMPOUND with "PK", "NRP", or "PKNRP" (mixed type) in the first Entry line. They are represented as sequence information using the abbreviation codes for the monomeric units of carboxylic acids and amino acids. Some PKs and NRPs are known antibiotics.
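The designations described above ("Peptide", "PK", "NRP", "PKNRP") can be picked out of an entry programmatically. The sketch below assumes a flat-file layout in which the first line begins with the keyword ENTRY followed by whitespace-separated tokens; the sample text and the helper name `sequence_type` are hypothetical and serve only as illustration.

```python
from typing import Optional

# Designations named in the text for sequence-represented entries.
SEQUENCE_TYPES = {"Peptide", "PK", "NRP", "PKNRP"}


def sequence_type(entry_text: str) -> Optional[str]:
    """Return the designation found on the ENTRY line, or None if absent.

    Assumption: the entry is plain text whose ENTRY line holds
    whitespace-separated tokens, one of which may be a designation.
    """
    for line in entry_text.splitlines():
        if line.startswith("ENTRY"):
            for token in line.split()[1:]:  # skip the "ENTRY" keyword itself
                if token in SEQUENCE_TYPES:
                    return token
            return None  # ENTRY line found, but no designation on it
    return None  # no ENTRY line at all


# Hypothetical sample text; the identifier and layout are made up.
sample = "ENTRY       C99999            PKNRP     Compound\nNAME        Example"
print(sequence_type(sample))
```

Matching whole tokens rather than substrings keeps "PK" from being reported for an entry that is actually designated "PKNRP".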

Divergent Compounds Biosynthesized and Biodegraded

Plant secondary metabolites
Plants produce diverse compounds, many of which have practical value. Knowledge of plant secondary metabolites is being organized in KEGG BRITE and KEGG PATHWAY.
Xenobiotic compounds
In addition to biochemical compounds, KEGG COMPOUND contains xenobiotic compounds, many of which may be degraded by microbial degradation pathways.

Database Search Tools

SIMCOMP and SUBCOMP are database search programs for similar chemical structures. SIMCOMP uses graph matching to find maximal common subgraphs while allowing mismatches, whereas SUBCOMP is a bit-string-based search program that finds exactly matching substructures or superstructures [4,5].
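To illustrate the general idea behind a bit-string search, the sketch below encodes each structure as an integer whose bits mark the presence of structural features, then tests containment with a bitwise AND. This is a generic illustration and not SUBCOMP's actual implementation; the feature names, the database entries, and the function names are all hypothetical.

```python
# Hypothetical feature vocabulary; each feature is assigned one bit position.
FEATURES = ["amino", "carboxyl", "hydroxyl", "phenyl"]
BIT_INDEX = {name: i for i, name in enumerate(FEATURES)}


def to_bits(features):
    """Encode a collection of feature names as an integer bit string."""
    bits = 0
    for name in features:
        bits |= 1 << BIT_INDEX[name]
    return bits


def contains_all(container, contained):
    """True if every bit set in `contained` is also set in `container`."""
    return (container & contained) == contained


def search(query, database, mode):
    """Return sorted names of entries related to the query.

    mode="superstructure": entries whose bits include all query bits.
    mode="substructure":   entries whose bits are all included in the query.
    """
    if mode not in ("superstructure", "substructure"):
        raise ValueError(f"unknown mode: {mode!r}")
    hits = []
    for name, bits in database.items():
        if mode == "superstructure" and contains_all(bits, query):
            hits.append(name)
        elif mode == "substructure" and contains_all(query, bits):
            hits.append(name)
    return sorted(hits)


# Hypothetical database of three entries.
DATABASE = {
    "entry_a": to_bits(["amino", "carboxyl"]),
    "entry_b": to_bits(["amino", "carboxyl", "hydroxyl"]),
    "entry_c": to_bits(["phenyl"]),
}

print(search(to_bits(["amino", "carboxyl"]), DATABASE, "superstructure"))
```

In general a bit-string comparison of this kind is only a necessary condition for structural containment, so a real search program would need its feature encoding to be designed so that matching bits correspond to matching structures.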

KegDraw Tool

KegDraw is a standalone Java application for drawing chemical compound structures and glycan structures in a way similar to ChemDraw and ISIS/Draw. KegDraw runs on Mac, Windows, and Linux, and is made freely available to both academic and non-academic users.

  1. Hashimoto, K., Tokimatsu, T., Kawano, S., Yoshizawa, A.C., Okuda, S., Goto, S., and Kanehisa, M. Comprehensive analysis of glycosyltransferases in eukaryotic genomes for structural and functional characterization of glycans. Carbohydr. Res. 344, 881-887 (2009).
  2. Hashimoto, K., Yoshizawa, A.C., Okuda, S., Kuma, K., Goto, S., and Kanehisa, M. The repertoire of desaturases and elongases reveals fatty acid variations in 56 eukaryotic genomes. J. Lipid Res. 49, 183-191 (2008).
  3. Minowa, Y., Araki, M., and Kanehisa, M. Comprehensive analysis of distinctive polyketide and nonribosomal peptide structural motifs encoded in microbial genomes. J. Mol. Biol. 368, 1500-1517 (2007).
  4. Hattori, M., Okuno, Y., Goto, S., and Kanehisa, M. Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J. Am. Chem. Soc. 125, 11853-11865 (2003).
  5. Hattori, M., Tanaka, N., Kanehisa, M., and Goto, S. SIMCOMP/SUBCOMP: chemical structure search servers for network analyses. Nucleic Acids Res. 38, W652-W656 (2010).

Last updated: September 14, 2016