KEGG COMPOUND Database

Chemical Structures

KEGG COMPOUND is one of the four original databases, together with KEGG PATHWAY, KEGG GENES and KEGG ENZYME, introduced at the start of the KEGG project in 1995. It is a collection of small molecules, biopolymers, and other chemical substances that are relevant to biological systems. Each entry is identified by a C number, such as C00047 for L-lysine, and contains the chemical structure and associated information, as well as links to other KEGG databases and to outside databases. Some COMPOUND entries are also represented as GLYCAN and DRUG entries, connected through "Same as" links.
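The identifier scheme can be illustrated with a minimal sketch. It assumes that every identifier is a prefix letter (C, G or D) followed by exactly five digits, which is inferred from examples such as C00047 rather than stated above. The database prefixes `cpd`, `gl` and `dr` and the `https://rest.kegg.jp/get/` URL pattern are likewise assumptions about the KEGG REST interface, and the function names are hypothetical.

```python
import re

# Prefix letter -> (database name, REST database prefix).
# The REST prefixes are assumptions for illustration, not taken from this page.
KEGG_PREFIXES = {
    "C": ("COMPOUND", "cpd"),
    "G": ("GLYCAN", "gl"),
    "D": ("DRUG", "dr"),
}

# Assumed format: one prefix letter followed by exactly five digits.
_ID_PATTERN = re.compile(r"^([CGD])(\d{5})$")


def classify_kegg_id(identifier: str) -> str:
    """Return the database name implied by the identifier's prefix letter."""
    match = _ID_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a C/G/D number: {identifier!r}")
    return KEGG_PREFIXES[match.group(1)][0]


def rest_get_url(identifier: str) -> str:
    """Build a retrieval URL for one entry (assumed URL pattern)."""
    identifier = identifier.strip()
    classify_kegg_id(identifier)  # raises ValueError on malformed input
    rest_prefix = KEGG_PREFIXES[identifier[0]][1]
    return f"https://rest.kegg.jp/get/{rest_prefix}:{identifier}"


if __name__ == "__main__":
    print(classify_kegg_id("C00047"))
    print(rest_get_url("C00047"))
```

Validating the identifier before building the URL keeps malformed input from reaching the network layer.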

Current statistics (2026/6/12)

	Compound (C number)	Glycan (G number)	Drug (D number)
Number of entries	19,572	11,271	12,854
Entries linked to pathway maps	6,653	344	2,372
Entries linked to brite hierarchies	9,349	2,248	10,693
Entries linked to brite tables	911	219	2,961
Entries linked to network variation maps	362	65	213
C numbers same as G/D numbers		263	3,074
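
As a small worked example, the counts in the table can be turned into coverage fractions. The numbers below are copied from the table as shown on this page; the variable and function names are hypothetical.

```python
# Figures copied from the statistics table above.
entries = {"Compound": 19572, "Glycan": 11271, "Drug": 12854}
pathway_linked = {"Compound": 6653, "Glycan": 344, "Drug": 2372}


def coverage(linked: dict, total: dict) -> dict:
    """Fraction of entries in each database that carry the given kind of link."""
    return {name: linked[name] / total[name] for name in total}


pathway_coverage = coverage(pathway_linked, entries)

if __name__ == "__main__":
    for name, fraction in pathway_coverage.items():
        print(f"{name}: {fraction:.1%} of entries linked to pathway maps")
```

The same helper can be applied to the brite hierarchy, brite table and network variation map rows.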

While GLYCAN entries are represented as tree structures with monosaccharide codes, COMPOUND entries for peptides and polyketides, such as C11996 for methymycin, are represented as sequences using the abbreviation codes for the monomeric units of amino acids and carboxylic acids.

Naturally processed peptides from gene products, such as C00873 for human angiotensin I, are sometimes represented in the KEGG COMPOUND database. They are always associated with sequence information using the three-letter amino acid codes, but they may or may not contain the full atomic structure representation.
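
A sequence written in three-letter amino acid codes can be converted to the one-letter form with a short lookup. The sketch below covers only the 20 standard residues and assumes the codes are separated by spaces or hyphens; the exact layout of the sequence field in a KEGG entry is not described here, so the input string is an assumption. The example uses the well-known ten-residue sequence of angiotensin I.

```python
import re

# Standard three-letter to one-letter amino acid codes (20 common residues only).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(sequence: str) -> str:
    """Convert a space- or hyphen-separated three-letter sequence to one-letter codes."""
    tokens = [t for t in re.split(r"[\s\-]+", sequence.strip()) if t]
    try:
        return "".join(THREE_TO_ONE[t.capitalize()] for t in tokens)
    except KeyError as exc:
        raise ValueError(f"unknown residue code: {exc.args[0]}") from None


if __name__ == "__main__":
    # Assumed input layout; the residues are those of angiotensin I.
    print(to_one_letter("Asp Arg Val Tyr Ile His Pro Phe His Leu"))
```

Non-standard monomers, such as the carboxylic acid units used for polyketides, are outside the scope of this lookup and raise a `ValueError`.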

Chemical Compound Categories

The classifications of representative categories of KEGG COMPOUND, as well as biosynthetic genes, are given in the following BRITE hierarchy files.

br08001: Compounds with biological roles
br08011: Secondary metabolites in Pathway Maps

br08002: Lipids
ko01004: Lipid biosynthesis proteins
ko01008: Polyketide biosynthesis proteins

br08003: Phytochemical compounds
br08012: Dietary phytochemicals

br08020: Nucleotide sugars
br08021: Glycosides

br08005: Bioactive peptides
br08009: Natural toxins

br08006: Endocrine disrupting compounds
br08007: Pesticides

br08308: Narcotics and psychotropics in Japan

Chemical Compounds in Pathways and Diseases

The role of KEGG COMPOUND has always been to enable links from molecular-level data to molecular network-level data. Chemical compound entries are constituents of KEGG pathway maps, KEGG modules, reaction modules and network variation maps. They are used, for example, to analyze metabolome data and uncover higher-level functional features using the KEGG Mapper tools.
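
The idea of linking measured compounds to pathway maps can be sketched as a simple grouping step. This is not how KEGG Mapper works internally. It assumes compound-to-pathway links are available as tab-separated lines of the form `cpd:<C number>` and `path:<map id>`; the sample lines below are illustrative and were not retrieved from KEGG, and all function names are hypothetical.

```python
from collections import defaultdict


def group_by_pathway(link_lines):
    """Build {pathway id: set of C numbers} from 'cpd:Cxxxxx<TAB>path:mapxxxxx' lines."""
    pathways = defaultdict(set)
    for line in link_lines:
        line = line.strip()
        if not line:
            continue
        compound, pathway = line.split("\t")
        # Drop the database prefixes ("cpd:", "path:") if present.
        pathways[pathway.split(":", 1)[-1]].add(compound.split(":", 1)[-1])
    return dict(pathways)


def rank_pathways(pathways, measured):
    """List (pathway, hit count) pairs, most hits first, skipping pathways with none."""
    hits = {p: len(members & measured) for p, members in pathways.items()}
    return sorted(
        ((p, n) for p, n in hits.items() if n),
        key=lambda item: (-item[1], item[0]),
    )


# Illustrative link lines (assumed format and content).
sample_links = [
    "cpd:C00047\tpath:map00300",
    "cpd:C00047\tpath:map00310",
    "cpd:C00049\tpath:map00300",
    "cpd:C00025\tpath:map00250",
]
# Hypothetical set of compounds detected in a metabolome experiment.
measured_compounds = {"C00047", "C00049"}
ranking = rank_pathways(group_by_pathway(sample_links), measured_compounds)

if __name__ == "__main__":
    for pathway, count in ranking:
        print(pathway, count)
```

Counting hits per pathway is only a first pass; a real analysis would also account for pathway size and for compounds that appear in many maps.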

In addition, chemical compound entries are used to represent disease-associated perturbed networks, such as nt06014 for congenital disorders of sphingolipid metabolism in the KEGG NETWORK database.

Biosynthetic codes

The structures of DNA, RNA, and proteins are determined by template-based syntheses of replication, transcription, and translation with the genetic code. In contrast, the structures of glycans, lipids, polyketides, nonribosomal peptides, and various plant secondary metabolites are determined by biosynthetic pathways. Attempts are being made to develop overview pathway maps by defining reaction modules and to understand such biosynthetic codes.

map01212: Fatty acid metabolism
rm020_21: Variation of fatty acid synthesis

Biodegradation codes

KEGG COMPOUND also contains xenobiotic compounds, many of which may be broken down by microbial degradation pathways. Here again, attempts are being made to develop overview pathway maps together with reaction modules.

map01220: Degradation of aromatic compounds

Database Search Tools

SIMCOMP and SUBCOMP are database search programs for similar chemical structures. SIMCOMP uses graph matching to find maximal common subgraphs while allowing mismatches, whereas SUBCOMP is a bit-string-based search program that finds exactly matching substructures or superstructures.

SIMCOMP search (served by GenomeNet)
SUBCOMP search (served by GenomeNet)
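
The general idea behind a bit-string search can be shown with a toy example. This is a generic screening sketch, not SUBCOMP's actual algorithm or encoding: the feature vocabulary, the toy entries and all function names are hypothetical. Each structure is reduced to a bit string with one bit per feature, and a query can only be a substructure of a target if every bit set in the query is also set in the target.

```python
# Hypothetical feature vocabulary: each structural feature is assigned one bit.
FEATURES = ["carboxyl", "primary_amine", "hydroxyl", "aromatic_ring", "thiol"]
BIT = {name: 1 << i for i, name in enumerate(FEATURES)}


def fingerprint(features) -> int:
    """Encode a collection of feature names as an integer bit string."""
    bits = 0
    for name in features:
        bits |= BIT[name]
    return bits


def may_contain(target_bits: int, query_bits: int) -> bool:
    """True when every feature bit of the query is also set in the target."""
    return (target_bits & query_bits) == query_bits


# Toy database with made-up feature annotations.
TOY_DB = {
    "toy_A": fingerprint(["carboxyl", "primary_amine"]),
    "toy_B": fingerprint(["carboxyl", "primary_amine", "hydroxyl", "aromatic_ring"]),
    "toy_C": fingerprint(["carboxyl", "primary_amine", "thiol"]),
}


def substructure_candidates(query_bits: int):
    """Entries that could contain the query as a substructure."""
    return sorted(k for k, bits in TOY_DB.items() if may_contain(bits, query_bits))


def superstructure_candidates(query_bits: int):
    """Entries that could themselves be substructures of the query."""
    return sorted(k for k, bits in TOY_DB.items() if may_contain(query_bits, bits))


if __name__ == "__main__":
    query = fingerprint(["carboxyl", "primary_amine", "thiol"])
    print(substructure_candidates(query))
    print(superstructure_candidates(query))
```

A bit-string test of this kind only yields candidates. Confirming an exact substructure match still requires comparing the atom-level graphs.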


Last updated: April 22, 2025