LIGAND - Database of Chemical Compounds and Reactions in Biological Pathways
Release 29.0, March 2004
User Manual
Susumu Goto, Masahiro Hattori, Takaaki Nishioka(*),
and Minoru Kanehisa
Bioinformatics Center, Institute for Chemical Research, Kyoto University
and
(*)Graduate School of Agricultural Sciences, Kyoto University
http://www.genome.ad.jp/ligand/
Feedback: http://www.genome.ad.jp/feedback/?category=ligand
==========================================================================
TABLE OF CONTENTS
==========================================================================
1. INTRODUCTION
2. CONVENTIONS
3. COMPOUND SECTION
4. GLYCAN SECTION
5. REACTION SECTION
6. ENZYME SECTION
7. NUMBERS OF DATA ENTRIES
8. CHANGES IN THIS RELEASE
9. UPCOMING CHANGES
10. ACKNOWLEDGMENTS
11. REFERENCES
==========================================================================
1. INTRODUCTION
LIGAND, Database of Chemical Compounds and Reactions in Biological Pathways,
is designed to provide the linkage between chemical and biological aspects of
life in the light of enzymatic reactions. The database consists of four
sections: COMPOUND, GLYCAN, REACTION, and ENZYME.
The COMPOUND section is a collection of metabolic and other compounds including
substrates, products, inhibitors of metabolic pathways as well as drugs and
xenobiotic chemicals. Each of the chemical substances that appear in the
REACTION and ENZYME sections and KEGG/PATHWAY database is identified by an
accession number and stored in this section. Each entry of COMPOUND
contains information of naming, chemical formula, structural formula in
separate GIF, MOL and KCF files, metabolic and other pathways, related enzymes,
related protein structures, prosthetic groups and CAS (Chemical Abstracts
Service) registry number.
The GLYCAN section is a collection of carbohydrate structures (sequences).
There was a well-known database for carbohydrates called CarbBank/CCSD. Since
funding for that database was discontinued, it has no longer been updated or
maintained. We have taken over this activity and created a new database,
which we call GLYCAN, by merging entries whose carbohydrate structures are the
same and by adding new information, such as related pathways. Each GLYCAN
entry contains information on names, monosaccharide composition, links to
metabolic pathways, related proteins and reactions.
The structural formula of each carbohydrate is provided in separate GIF and KCF
files. The KCF (KEGG Chemical Function) format represents a graph; in the case
of a carbohydrate structure, monosaccharides and glycosidic bonds are
represented by nodes and edges, respectively. The KCF format for chemical
compounds is
described in:
Hattori, M., Okuno, Y., Goto, S., and Kanehisa, M., "Development of a
chemical structure comparison method for integrated analysis of chemical
and genomic information in the metabolic pathways", Journal of the American
Chemical Society, 125:11853-11865 (2003). [PMID:14505407]
The REACTION section is a collection of chemical reactions that appear in the
pathway diagrams of the KEGG/PATHWAY database as well as in the ENZYME section.
It includes non-enzymatic reactions and enzymatic reactions whose EC numbers
have not been assigned yet, in addition to those with EC numbers.
The ENZYME section is a collection of all known enzymatic reactions
classified according to the nomenclature of the International Union of
Biochemistry and Molecular Biology (IUBMB):
International Union of Biochemistry and Molecular Biology,
"Enzyme Nomenclature: Recommendations (1992) of the Nomenclature
Committee of the International Union of Biochemistry and Molecular
Biology", Academic Press, New York (1992).
Current classification is maintained at
http://www.chem.qmw.ac.uk/iubmb/enzyme/
and the basic information about the enzymes is taken from the site.
Each entry of ENZYME is identified by the EC (Enzyme Commission) number, and
contains information of naming, chemical reactions, metabolic compounds,
metabolic pathways, references, genes encoding the enzyme for several organisms
(mainly completely sequenced ones), genetic diseases, and links to other
databases including protein sequence motifs and 3D structural data.
The LIGAND database is a major component of the DBGET/LinkDB integrated
database system (http://www.genome.ad.jp/dbget/), providing useful links
among the existing databases. LIGAND is tightly coupled to the pathway
database of the KEGG system (http://www.genome.ad.jp/kegg/), providing
links to the gene catalogs of a number of organisms. Substructure searches
of COMPOUND and REACTION can be done by the Chemscape server
(http://ligand.genome.ad.jp:8080/compound/ and
http://ligand.genome.ad.jp:8080/reaction/). Substructure and similar-structure
searches for GLYCAN can be done on the glycan web page
(http://glycan.genome.ad.jp/), which uses a Java applet.
The basic concept of the LIGAND database is described in:
Goto, S., Nishioka, T., and Kanehisa, M., "LIGAND: Chemical Database for
Enzyme Reactions", Bioinformatics, 14:591-599 (1998). [PMID:8730924]
and the most up-to-date information is described in:
Goto, S., Okuno, Y., Hattori, M., Nishioka, T., and Kanehisa, M.,
"LIGAND: database of chemical compounds and reactions in biological
pathways", Nucleic Acids Research, 30:402-404 (2002). [PMID:11752349]
Please cite them accordingly when making use of the LIGAND database.
2. CONVENTIONS
2.1. General Data Format
LIGAND is constructed as a flat-file database. Similar to the data
formats of PIR and GenBank databases, a fixed number of columns are
assigned to specify the attributes of data. Each attribute is identified
by the keyword that appears on columns 1-12. When these columns are
blank, the attribute of the preceding line continues. Columns from 13
are used to describe the entities of data.
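The layout described above can be read with a few lines of Python. The
following is only a minimal sketch: the function name parse_entry and the
returned dictionary layout are illustrative assumptions, and the sketch
assumes that keywords are left-aligned in columns 1-12 exactly as stated.

```python
def parse_entry(lines):
    """Group the lines of one flat-file entry by the keyword in columns 1-12.

    Returns a dict mapping each keyword to the list of data lines
    (columns 13 onward) that belong to it. Hypothetical helper, not part
    of any LIGAND distribution.
    """
    fields = {}
    keyword = None
    for line in lines:
        if line.startswith("///"):      # end-of-entry data item
            break
        head = line[:12].strip()        # columns 1-12: keyword or blank
        body = line[12:].rstrip()       # columns 13-: data
        if head:
            keyword = head              # a new data item starts here
        if keyword is None:
            continue                    # data before any keyword is skipped
        fields.setdefault(keyword, []).append(body)
    return fields


# A shortened entry, padded so that the data start on column 13.
entry = [
    "ENTRY".ljust(12) + "C00116",
    "NAME".ljust(12) + "Glycerol",
    "".ljust(12) + "Glycerin",
    "FORMULA".ljust(12) + "C3H8O3",
    "///",
]
print(parse_entry(entry))
```

Keeping every physical line as a separate list element leaves the handling of
wrapped values and continuation lines to the caller, since the rules differ
between data items.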
For the COMPOUND section, the following data items appear on columns 1-12:
ENTRY
NAME
FORMULA
GLYCAN
PATHWAY
REACTION
ENZYME
STRUCTURES
DBLINKS
///
A COMPOUND entry starts with the ENTRY data item, which is followed by the
other data items in the order shown above, and ends with the end-of-entry data
item (///). The data items ENTRY, NAME, and end-of-entry are mandatory, while
the other data items are optional.
Example:
ENTRY C00116
NAME Glycerol
Glycerin
1,2,3-Trihydroxypropane
1,2,3-Propanetriol
FORMULA C3H8O3
REACTION R00841 R00847 R00850 R01034 R01036 R01039 R01041 R01043
R01044 R01045 R01046 R01047 R01048 R01104 R01350 R01351
R01352 R02591 R03331 R03616
PATHWAY PATH: MAP00052 Galactose metabolism
PATH: MAP00561 Glycerolipid metabolism
ENZYME 1.1.1.1 1.1.1.2 1.1.1.6 1.1.1.21
1.1.1.72 1.1.1.156 1.1.99.22 2.3.1.21
2.7.1.30 2.7.1.79 2.7.1.142 3.1.1.23
3.1.1.34 3.1.3.19 3.1.3.21 3.1.4.38
3.1.4.43 3.2.1.22 3.2.1.23 4.2.1.30
DBLINKS CAS: 56-81-5
///
In addition, the molecular structure is stored in separate MDL/MOL, GIF and KCF
format files for each compound. The GIF image is automatically displayed
between the FORMULA and REACTION data items in the WWW version of DBGET and is
a link to the Chemscape version of COMPOUND.
For the GLYCAN section, the following data items appear on columns 1-12:
ENTRY
NAME
COMPOSITION
MASS
CLASS
BINDING
COMMENT
COMPOUND
PATHWAY
REACTION
ENZYME
ORTHOLOG
DBLINKS
///
A GLYCAN entry starts with the ENTRY data item, which is followed by the other
data items in the order shown above, and ends with the end-of-entry data item
(///). The data items ENTRY and end-of-entry are mandatory, while the other
data items are optional.
Example:
ENTRY G00249 Glycan
NAME Raffinose
COMPOSITION (Glc)1 (Gal)1 (Fruf)1
MASS 504.44248 (Glc)1 (Gal)1 (Fruf)1
CLASS Oligosaccharide
BINDING ENZYME: beta-D-glucosidase/beta-D-xylosidase [PMID:8156553]
ENZYME: beta-glucosidase, (1->6)-alpha-arabinofuranosidase
[PMID:7986378]
ENZYME: alpha-galactosidase from Candida guilliermondii H-404
[PMID:7772826]
ENZYME: Aspergillus sydowi beta-fructofuranosidase (2,1- and
2,6-linked) [PMID:7766019]
ENZYME: honeybee alpha-glucosidase I [PMID:2204617]
COMPOUND C00492
PATHWAY PATH: MAP00052 Galactose metabolism
REACTION R06042 R06070 R06071 R06100
R06139 R06152
ENZYME 2.4.1.67 2.4.1.82 2.4.1.166 3.2.1.22
3.2.1.26 3.2.1.55 3.2.1.80
DBLINKS CCSD: 1887 8349 8657 9314 9361 10112 10774 11014 12280
12485 12488 12490 12535 12595 12624 13207 13409 13437
13512 13793 14929 15415 15496 15512 15806 16158 16761
17154 17162 17164 18130 18355 18361 18537 18732 19002
19141 19400 19742 20848 20854 20860 22731 22733 22860
22861 23008 23018 23116 23598 24270 24326 24652 24997
25071 25160 25372 25442 25483 25832 26128 26134 26189
26545 26580 26641 26681 26697 26777 27199 27240 27406
27410 27413 27419 27751 28306 28314 29193 29196 29578
29852 30154 30671 30675 31884 32410 32514 33563 3446
34479 37937 41648 42252 42283 42346 42601 44983 45673
46955 47474 48497 48556 48613
///
In addition, the molecular structure is stored in a separate KCF format file
as well as in a GIF image format file for each glycan. The GIF image is
automatically displayed between the MASS and CLASS data items in the WWW
version of DBGET and is a link to the Java applet version of GLYCAN, where
glycan structure searches can be performed.
For the REACTION section, the following data items appear on columns 1-12:
ENTRY
NAME
DEFINITION
EQUATION
PATHWAY
ENZYME
COMMENT
///
A REACTION entry starts with the ENTRY data item, which is followed by the
other data items in the order shown above, and ends with the end-of-entry data
item (///). The data items ENTRY, DEFINITION, EQUATION, and end-of-entry are
mandatory, while the other data items are optional.
Example:
ENTRY R00005
NAME Urea-1-carboxylate amidohydrolase
DEFINITION Urea-1-carboxylate + H2O <=> 2 CO2 + 2 NH3
EQUATION C01010 + C00001 <=> 2 C00011 + 2 C00014
PATHWAY PATH: RN00220 Urea cycle and metabolism of amino groups
PATH: RN00910 Nitrogen metabolism
PATH: RN00791 Atrazine degradation
ENZYME 3.5.1.54
COMMENT The yeast enzyme (but not that from green algae) also catalyses the
reaction of EC 6.3.4.6 urea carboxylase, thus bringing about the
hydrolysis of urea to CO2 and NH3 in the presence of ATP and
bicarbonate.
R00774 (6.3.4.6)
///
In addition, the molecular structure is stored in a separate GIF image format
file, which is created from the GIF images of the compounds in the reaction.
The GIF image is automatically displayed between the EQUATION and PATHWAY data
items in the WWW version of DBGET and is a link to the Chemscape version of
REACTION.
For the ENZYME section, the following data items appear on columns 1-12:
ENTRY
NAME
CLASS
SYSNAME
REACTION
SUBSTRATE
PRODUCT
COMMENT
REFERENCE
PATHWAY
ORTHOLOG
GENES
DISEASE
MOTIF
STRUCTURES
DBLINKS
///
An ENZYME entry begins with the ENTRY data item, which is followed by the other
data items in the order shown above, and ends with the end-of-entry data item.
The data items ENTRY, NAME, CLASS, and end-of-entry are mandatory and the other
data items are optional.
Example:
ENTRY EC 2.7.1.30
NAME glycerol kinase
glycerokinase
GK
ATP:glycerol-3-phosphotransferase
glycerol kinase (phosphorylating)
glyceric kinase
CLASS Transferases
Transferring phosphorus-containing groups
Phosphotransferases with an alcohol group as acceptor
SYSNAME ATP:glycerol 3-phosphotransferase
REACTION ATP + glycerol = ADP + sn-glycerol 3-phosphate
SUBSTRATE ATP
glycerol
PRODUCT ADP
sn-glycerol 3-phosphate
COMMENT Glycerone and L-glyceraldehyde can act as acceptors; UTP (and, in
the case of the yeast enzyme, ITP and GTP) can act as donors.
REFERENCE 1
Bergmeyer, H.-U., Holz, G., Kauder, E.M., Mollering, H. and
Wieland, O. Kristallisierte Glycerokinase aus Candida mycoderma.
Biochem. Z. 333 (1961) 471-480.
2
Bublitz, C. and Kennedy, E.P. Synthesis of phosphatides in isolated
mitochondria. III. The enzymatic phosphorylation of glycerol. J.
Biol. Chem. 211 (1955) 951-961.
3
Wieland, O. and Suyter, M. Glycerokinase: Isolierung und
Eigenschaften des Enzyms. Biochem. Z. 329 (1957) 320-331.
PATHWAY PATH: MAP00561 Glycerolipid metabolism
ORTHOLOG KO: K00864 glycerol kinase
GENES HSA: 2710(GK) 2712(GK2)
MMU: 14933(Gyk)
RNO: 79223(Gyk)
DME: CG1271-PA(CG1271) CG1271-PB(CG1271) CG1271-PD(CG1271)
CG18374-PA(CG18374) CG18374-PB(CG18374) CG7995-PA(CG7995)
CEL: C55B6.1 R11F4.1
ATH: At1g80460(F5I6.22)
PFA: PF13_0269
SCE: YHL032C(GUT1)
ECO: b3926(glpK)
ECJ: JW3897(glpK)
ECE: Z5471(glpK)
ECS: ECs4851
ECC: c4878(glpK)
STY: STY3784(glpK)
STT: t3532(glpK)
STM: STM4086(glpK)
YPE: YPO0090(glpK') YPO3312
YPK: y0047(glpK) y0876(glpK)
SFL: SF4004(glpK)
SFX: S3743(glpK)
PLU: plu4768(glpK)
HIN: HI0691(glpK)
PMU: PM1446(glpK)
XFA: XF2268
XFT: PD1304(glpK)
XCC: XCC0358(glpK)
XAC: XAC0358(glpK)
VCH: VCA0744
VVU: VV11787
VVY: VV2624
VPA: VP2386
PAE: PA1487 PA3579 PA3582(glpK)
PPU: PP1075(glpK)
PST: PSPTO2403 PSPTO4168(glpK)
SON: SO4230(glpK)
CBU: CBU0932(gplK)
CVI: CV0251(glpK)
RSO: RS00495(glpK)
BPE: BP3825(glpK)
BPA: BPP3969(glpK)
BBR: BB4442(glpK)
NEU: NE2128 NE2129
GSU: GSU2762(glpK)
BBA: Bd2718(glpK)
MLO: mll0700
SME: SMb21009(glpK)
ATU: Atu1903(glpK) Atu3890(glpK)
ATC: AGR_C_3493 AGR_L_1914
BME: BMEII0823 BMEII0824
BMS: BRA0443(glpK)
BJA: blr1410(glpK)
RPA: RPA3705(glpK)
CCR: CC1223
BSU: BG10187(glpK)
BHA: BH1093(glpK)
BAN: BA1026(glpK)
BCE: BC1035
OIH: OB2475
SAU: SA1141(glpK)
SAV: SAV1301(glpK)
SAM: MW1183(glpK)
SEP: SE0978
LMO: lmo1034 lmo1538
LIN: lin1573
LLA: L0014(glpK)
SPY: SPy1684(glpK)
SPM: spyM18_1696(glpK)
SPG: SpyM3_1468(glpK)
SPS: SPs0398
SPN: SP2186
SPR: spr1991(glpK)
SAG: SAG0273(glpK)
SAN: gbs0263(glpK)
LPL: lp_0370(glpK1) lp_0834(glpK2)
EFA: EF1929(glpK)
CAC: CAC1321(glpK)
CPE: CPE2552(glpK)
CTC: CTC01758 CTC02462
TTE: TTE2002(glpK)
MGE: MG038(glpK)
MPN: D09_orf508(glpK)
MPU: MYPU_2210(glpK)
MPE: MYPE6360(glpK)
MGA: MGA_0644(glpK)
MMY: MSC_0258(glpK)
MTU: Rv3696c(glpK)
MTC: MT3798
MBO: Mb3721c(glpKb) Mb3722c(glpKa)
MLE: ML2314(glpK)
MPA: MAP0353(glpK)
CGL: NCgl2790(Cgl2890)
CEF: CE2721
CDI: DIP2235(glpK)
SCO: SCO0509(glpK2) SCO1660(SCI52.02)
SMA: SAV6664(glpK1) SAV6963(glpK2) SAV7201(glpK3)
FNU: FN1839
RBA: RB3762(glpK)
BBU: BB0241(glpK)
TDE: TDE1916(glpK)
LIL: LA2119(glpK1) LA3565(glpK2)
SYN: slr1672(glpK)
SYW: SYNW1953
GVI: gll1751(glpK)
ANA: all1811
CTE: CT0185(glpK)
DRA: DR1928
AAE: aq_434(glpK)
TMA: TM0952 TM1430
AFU: AF0866(glpK)
HAL: VNG1967G(glpK)
TAC: Ta1096
TVO: TVG1192129
PAB: PAB2406(glpk)
PFU: PF2004
APE: APE0306
SSO: SSO1600 SSO2133(glpK-2)
DISEASE MIM: 307030 Glycerol kinase deficiency
MOTIF PS: PS00445 [GSA]-x-[LIVMFYW]-x-G-[LIVM]-x(7,8)-[HDENQ]-[LIVMF]-
x(2)-[AS]-[STALIVM]-[LIVMFY]-[DEQ]
PS: PS00933 [MFYGS]-x-[PST]-x(2)-K-[LIVMFYW]-x-W-[LIVMF]-x-
[DENQTKR]-[ENQH]
STRUCTURES PDB: 1GLC 1GLD 1GLE 1GLL 1GLF 1GLJ 1BO5 1GLA 1BWF 1BU6
1BOT 1GLB
DBLINKS IUBMB Enzyme Nomenclature: 2.7.1.30
ExPASy - ENZYME nomenclature database: 2.7.1.30
WIT (What Is There) Metabolic Reconstruction: 2.7.1.30
BRENDA, the Enzyme Database: 2.7.1.30
CAS: 9030-66-4
///
2.2. Continuation Lines
The name of an enzyme or a chemical compound is sometimes too long to fit
in one line, in which case continuation lines are used. The continuation
line is indicated by the dollar sign ($) on column 13. Note that a long
name is simply separated into two lines without any hyphenation. This
rule applies to the following data items: NAME, SYSNAME, REACTION,
SUBSTRATE, PRODUCT, and DEFINITION.
Examples:
NAME 5-Methyltetrahydropteroyltriglutamate--homocysteine
$S-methyltransferase
REACTION UDP-N-acetyl-D-galactosamine +
(N-Acetylneuraminyl)-D-galactosyl-D-glucosylceramide =
UDP + N-Acetyl-D-galactosaminyl-(N-acetylneuraminyl)-D-
$galactosyl-D-glucosylceramide
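A sketch of how continuation lines might be merged back into one value is
given below. The function name merge_continuations is hypothetical, and the
joining rule is an assumption inferred only from the two examples above:
the pieces are joined directly when the preceding line ends with a hyphen,
and with a single space otherwise. The manual itself does not state how
spaces at the break are handled.

```python
def merge_continuations(values):
    """Merge '$'-prefixed continuation lines into the preceding value.

    `values` holds the data part (column 13 onward) of consecutive lines
    of one data item. Assumed rule: join without a space if the previous
    piece ends with '-', otherwise insert one space.
    """
    merged = []
    for value in values:
        if value.startswith("$") and merged:
            tail = value[1:]                       # drop the '$' marker
            glue = "" if merged[-1].endswith("-") else " "
            merged[-1] = merged[-1] + glue + tail
        else:
            merged.append(value)
    return merged


names = [
    "5-Methyltetrahydropteroyltriglutamate--homocysteine",
    "$S-methyltransferase",
]
print(merge_continuations(names))
```

Lines without the dollar sign are left as separate elements, because their
meaning depends on the data item: in NAME each line is a separate name, while
in REACTION an unmarked line is simply wrapped text.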
2.3. Database Links
The reference to other databases is made by the convention used in the
DBGET/LinkDB system; namely, the combination of a database name and an
identifier (entry name or accession number) separated by a colon (:) is
used for cross-reference. The database name may be an abbreviation defined
in DBGET/LinkDB. This rule applies to the following data items: PATHWAY,
GENES, DISEASE, MOTIF, STRUCTURES, ORTHOLOG, REFERENCE and DBLINKS.
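As an illustration of this convention, the sketch below splits a link value
such as "PATH: MAP00052 Galactose metabolism" into its parts. The function
name parse_link is hypothetical, and the sketch assumes a single identifier
followed by an optional description; items that list several identifiers
after one database name (for example STRUCTURES or GENES) would need to split
the remainder differently.

```python
def parse_link(value):
    """Split 'DB: identifier description' into (db, identifier, description).

    Assumes the first colon separates the database name from the rest and
    that exactly one identifier follows it.
    """
    db, _, rest = value.partition(":")
    parts = rest.split(None, 1)          # identifier, then optional text
    identifier = parts[0] if parts else ""
    description = parts[1].strip() if len(parts) > 1 else ""
    return db.strip(), identifier, description


print(parse_link("PATH: MAP00052 Galactose metabolism"))
print(parse_link("CAS: 56-81-5"))
```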
3. COMPOUND SECTION
3.1. The ENTRY Data Item
The ENTRY data item contains the compound accession number of the LIGAND
database. This number also corresponds to the name of the GIF, MOL and KCF
files containing the molecular structure. This item is mandatory for all
entries.
3.2. The NAME Data Item
The NAME data item contains the recommended name and, if any, alternative
names. One name is given per line, except for names too long to fit on one
line (see 2.2. Continuation Lines). This item is mandatory for all entries.
3.3. The FORMULA Data Item
The FORMULA data item contains the chemical formula of the compound.
3.4. The Molecular Structure File
The molecular structure (structural formula) of the compound may be viewed and
manipulated in the WWW version of DBGET. The image of the molecular structure
stored in a GIF file can be seen between the FORMULA and REACTION data items.
The two dimensional atomic coordinates are stored in an MDL/MOL file and a KCF
file. The MOL file can be retrieved and opened in a suitable application, such
as ISIS/Draw or ChemDraw, or in a plug-in for your WWW browser, such as Chime.
See the instructions below for details.
http://www.genome.ad.jp/dbget/isis_doc.html
The molecular structure may also be searched in the Chemscape version of
LIGAND. A query substructure prepared with ISIS/Draw can be submitted through
a browser to retrieve the compounds that contain it. This feature is
accessible via
http://ligand.genome.ad.jp:8080/compound/
The KCF files can be downloaded from the LIGAND FTP site:
ftp://ftp.genome.ad.jp/pub/kegg/ligand/
3.5. The GLYCAN Data Item
The GLYCAN data item contains the link information to the GLYCAN section.
Monosaccharides and small oligosaccharides are registered in both the COMPOUND
and GLYCAN sections, and this data item specifies the corresponding entry in
the GLYCAN section.
3.6. The REACTION Data Item
The REACTION data item contains the link information to the REACTION section.
Each reaction in which the compound appears as a substrate or a product is
listed by its reaction number.
3.7. The PATHWAY Data Item
The PATHWAY data item contains the link information to the KEGG (Kyoto
Encyclopedia of Genes and Genomes) pathway database: the pathway map accession
number followed by the description. When a reference map is available, its
accession number is displayed; otherwise, the accession number of an
organism-specific map is displayed. By clicking on this number in the WWW
version of DBGET, the pathway diagram is displayed with the corresponding
compound highlighted.
3.8. The ENZYME Data Item
The ENZYME data item contains the link information to the ENZYME section.
Each enzymatic reaction in which the compound appears as a substrate or a
product is listed by its EC number.
3.9. The STRUCTURES Data Item
The STRUCTURES data item contains the link information to the Protein Data
Bank (PDB), which stores the 3-D structure information of proteins. Compounds
with the STRUCTURES field are included in the 3-D structure data as cofactors
or substrates, which are listed in the HETATM lines of PDB entries.
3.10. The DBLINKS Data Item
The DBLINKS data item contains the link information to other databases.
Currently this data item includes links to the CAS (Chemical Abstracts
Service) registry number and PROMISE (Prosthetic groups and Metal Ions in
Protein Active Sites Database).
3.11. The end-of-entry Data Item
The end-of-entry data item marks the end of the entry. It is denoted by
the identifier consisting of three consecutive slashes, '///'. This item
is mandatory for all entries.
4. GLYCAN SECTION
4.1. The ENTRY Data Item
The ENTRY data item contains the glycan accession number of the LIGAND
database. This number also corresponds to the name of the GIF file containing
the carbohydrate structure. This item is mandatory for all entries.
4.2. The NAME Data Item
The NAME data item contains the recommended name and, if any, alternative
names. One name is given per line, except for names too long to fit on one
line (see 2.2. Continuation Lines).
4.3. The COMPOSITION Data Item
The COMPOSITION data item contains the compositions of monosaccharides.
Frequently used monosaccharides are listed with their full names below.
GlcNAc: N-acetyl D-glucosamine
Gal: D-galactose
Man: D-mannose
Glc: D-glucose
LFuc: L-fucose
Neu5Ac: N-acetyl neuraminate
GalNAc: N-acetyl D-galactosamine
Xyl: D-xylose
LRha: L-rhamnose
GlcA: D-glucuronate
GlcN: D-glucosamine
4.4. The MASS Data Item
The MASS data item contains the mass of the carbohydrate, calculated by
summing the masses of its monosaccharides (or other units) and subtracting
the mass of water multiplied by the number of bonds. When an entry contains
a unit whose mass is not registered in the database, this data item is
omitted for that entry.
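The calculation described above can be sketched as follows. The function
name glycan_mass, the water mass constant and the unit masses in the usage
example are assumptions made for illustration; the masses actually registered
in the database are not given in this manual, so the result is not expected
to reproduce a stored MASS value to the last digit.

```python
WATER_MASS = 18.01528  # assumed average mass of H2O, not taken from LIGAND


def glycan_mass(composition, unit_masses, n_bonds):
    """Sum the unit masses and subtract one water per bond.

    composition: dict mapping unit name to count, e.g. {"Glc": 1, "Gal": 1}
    unit_masses: dict mapping unit name to mass (hypothetical table)
    n_bonds:     number of bonds between units
    Returns None when a unit has no registered mass, mirroring the rule
    that MASS is not listed for such entries.
    """
    total = 0.0
    for unit, count in composition.items():
        if unit not in unit_masses:
            return None
        total += unit_masses[unit] * count
    return total - n_bonds * WATER_MASS


# Raffinose-like example: three hexose units joined by two bonds.
masses = {"Glc": 180.156, "Gal": 180.156, "Fruf": 180.156}  # assumed values
print(glycan_mass({"Glc": 1, "Gal": 1, "Fruf": 1}, masses, 2))
```

The number of bonds is passed in explicitly rather than derived from the
number of units, because the manual defines the subtraction in terms of bonds.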
4.5. The Molecular Structure File
The molecular structure (structural formula) of the carbohydrate may be
viewed and manipulated in the WWW version of DBGET. The image of the
carbohydrate structure stored in a GIF file can be seen between the MASS
and CLASS data items. The two dimensional monosaccharide coordinates are
stored in a KCF file, which can be retrieved via Java applet.
The molecular structure may also be searched in the Java applet version
of GLYCAN. A carbohydrate structure drawn with the applet can be submitted
through a browser, and similar carbohydrates can be searched for using
various options. This feature is accessible via
http://glycan.genome.ad.jp/
4.6. The CLASS Data Item
The CLASS data item contains the class selected for the carbohydrate.
The list of current "Class; Subclass" values is shown below.
Glycoprotein; N-Glycan
Glycoprotein; O-Glycan
Glycoprotein; Glycosaminoglycan
Glycoprotein; GPI anchor
Glycoprotein; Others
Glycolipid; Sphingolipid
Glycolipid; Glycerolipid
Glycolipid; LPS
Glycolipid; Others
Polysaccharide
Oligosaccharide
Glycoside
Neoglycoconjugate
Others
4.7. The BINDING Data Item
The BINDING data item contains information on related proteins, including
enzymes involved in transferase activities that use or produce the
carbohydrate, proteins that recognize the carbohydrate as a ligand, and
proteins to which the carbohydrate is attached.
4.8. The COMMENT Data Item
The COMMENT data item contains text commenting on the
carbohydrate.
4.9. The COMPOUND Data Item
Relatively small carbohydrates are also registered as compound entries in
the COMPOUND section. The COMPOUND data item contains the ID of the compound
entry that corresponds to the carbohydrate.
4.10. The PATHWAY Data Item
The PATHWAY data item contains the link information to the KEGG pathway
database: the pathway map accession number followed by the description.
By clicking on this number in the WWW version of DBGET, the metabolic
pathway diagram highlighting the carbohydrate is displayed.
4.11. The REACTION Data Item
The REACTION data item contains the link information to the REACTION section.
The reactions in which the carbohydrate is used are listed by their reaction
numbers.
4.12. The ENZYME Data Item
The ENZYME data item contains the link information to the ENZYME section.
The enzymatic reactions in which the carbohydrate is used are listed by
their EC numbers.
4.13. The ORTHOLOG Data Item
The ORTHOLOG data item contains the link information to the KEGG/KO
database: the KO identifier followed by the description. KO is a database
of ortholog groups created from the amino acid sequences by clustering
them according to their similarities.
4.14. The DBLINKS Data Item
The DBLINKS data item contains the link information to the CCSD entries
that have the same carbohydrate structure.
4.15. The end-of-entry Data Item
The end-of-entry data item marks the end of the entry. It is denoted by
the identifier consisting of three consecutive slashes, '///'. This item
is mandatory for all entries.
5. REACTION SECTION
5.1. The ENTRY Data Item
The ENTRY data item contains the reaction accession number of the LIGAND
database. This number also corresponds to the name of the GIF file containing
the molecular structure. This item is mandatory for all entries.
5.2. The NAME Data Item
The NAME data item contains the recommended name. This item is mainly
based on the systematic name of the enzyme that catalyzes the corresponding
reaction (see 6.4. The SYSNAME Data Item).
5.3. The DEFINITION Data Item
The DEFINITION data item contains the chemical reaction in the form of an
equation; substrates and products are separated by '<=>', and each compound
in substrates and products is separated by ' + '. There may be a coefficient
before the compound name. This item is mandatory for all entries.
5.4. The EQUATION Data Item
The EQUATION data item also contains the chemical reaction in the form of an
equation. This item represents the chemical compounds by their compound IDs,
whereas the names of the compounds are listed in the DEFINITION data item.
This item is mandatory for all entries.
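The EQUATION format can be turned into coefficient and compound ID pairs with
a short parser. This is only a sketch: the function name parse_equation is
hypothetical, and it assumes integer coefficients and IDs made of the letter
C or G followed by five digits, as in the examples of this manual. Other
coefficient forms would need a wider pattern.

```python
import re

# Optional integer coefficient, then a compound (C) or glycan (G) ID.
TERM = re.compile(r"(?:(\d+)\s+)?([CG]\d{5})")


def parse_equation(equation):
    """Parse 'C01010 + C00001 <=> 2 C00011 + 2 C00014'.

    Returns (substrates, products), each a list of (coefficient, id)
    tuples. A missing coefficient is read as 1.
    """
    left, right = equation.split("<=>")

    def parse_side(text):
        terms = []
        for term in text.split(" + "):
            match = TERM.fullmatch(term.strip())
            if match is None:
                raise ValueError(f"unrecognized term: {term!r}")
            terms.append((int(match.group(1) or 1), match.group(2)))
        return terms

    return parse_side(left), parse_side(right)


print(parse_equation("C01010 + C00001 <=> 2 C00011 + 2 C00014"))
```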
5.5. The Molecular Structure File
The molecular structure (structural formula) of the reaction may be
viewed and manipulated in the WWW version of DBGET. The image of the
molecular structure stored in a GIF file can be seen between the EQUATION
and PATHWAY data items. The GIF file is created from the GIF images of
the compounds in the reaction.
The molecular structure may also be searched in the Chemscape version of
LIGAND. A query substructure prepared with ISIS/Draw can be submitted through
a browser to retrieve the reactions that involve a compound containing the
query structure. This feature is accessible via
http://ligand.genome.ad.jp:8080/reaction/
5.6. The PATHWAY Data Item
The PATHWAY data item contains the link information to the KEGG (Kyoto
Encyclopedia of Genes and Genomes) pathway database: the pathway map accession
number followed by the description. Instead of using the accession number
starting with "MAP" like the corresponding data items in the other sections,
the accession number in this section starts with "RN", meaning the reference
pathway map for reactions. The difference between "MAP" and "RN" comes from
the linked objects for the EC numbers in each map. The EC numbers in the "RN"
pathway maps are linked to the REACTION section, whereas the ones in the "MAP"
pathway maps are linked to the ENZYME section.
By clicking on the accession number in the WWW version of DBGET, the pathway
diagram highlighting the reaction is displayed.
5.7. The ENZYME Data Item
The ENZYME data item contains the link information to the ENZYME section.
The enzyme entries that catalyse the corresponding reaction are listed
by their EC numbers.
5.8. The COMMENT Data Item
The COMMENT data item contains the text information commenting on the
reaction.
5.9. The end-of-entry Data Item
The end-of-entry data item marks the end of the entry. It is denoted by
the identifier consisting of three consecutive slashes, '///'. This item
is mandatory for all entries.
5.10. The Format of reaction_name.lst, reaction.lst and reaction_main.lst Files
On the FTP site of the LIGAND database, we also provide files that list the
reactions. Each reaction is written on one line that starts with the reaction
ID. Following the reaction ID and ':', the reaction is written in a chemical
equation format.
Example of the reaction in reaction_name.lst:
R00004: Pyrophosphate + H2O <=> 2 Orthophosphate
Example of the reaction in reaction.lst:
R00004: C00013 + C00001 <=> 2 C00009
Example of the reaction in reaction_main.lst:
R00004: C00013 <=> C00009
The reaction_main.lst file contains the chemical reactions as they appear in
the KEGG pathway diagrams. Because such information depends on the pathway
maps, we currently store it in the reaction_mapformula.lst file (see 5.11).
The information in reaction_main.lst will instead be replaced with information
that better reflects the biochemical nature of each reaction.
5.11. The Format of reaction_mapformula.lst File
In addition to the three files listed in 5.10, the LIGAND FTP site contains
the reaction_mapformula.lst file, in which each line has the following format.
R00004: 00190: C00013 => C00009
Here the second column (00190 in the above case) is the pathway map number,
and the equation lists the chemical compounds that appear in the corresponding
KEGG pathway map. The direction of the reaction is also included in this
format: => and <= for unidirectional reactions and <=> for bidirectional
reactions.
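A line of reaction_mapformula.lst can be split into its three columns as
sketched below. The function name parse_mapformula and the keys of the
returned dictionary are illustrative assumptions; only the line layout is
taken from the description above.

```python
def parse_mapformula(line):
    """Parse a line such as 'R00004: 00190: C00013 => C00009'.

    Returns a dict with the reaction ID, the pathway map number, the
    direction arrow and the compound IDs on each side.
    """
    reaction_id, map_number, formula = (
        part.strip() for part in line.split(":", 2)
    )
    # Test '<=>' first, because it contains both '=>' and '<='.
    for arrow in ("<=>", "=>", "<="):
        if arrow in formula:
            left, right = formula.split(arrow, 1)
            break
    else:
        raise ValueError(f"no direction arrow in: {line!r}")
    return {
        "reaction": reaction_id,
        "map": map_number,
        "direction": arrow,
        "left": [c.strip() for c in left.split(" + ")],
        "right": [c.strip() for c in right.split(" + ")],
    }


print(parse_mapformula("R00004: 00190: C00013 => C00009"))
```

The map number is kept as a string so that leading zeros, as in 00190, are
preserved.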
6. ENZYME SECTION
6.1. The ENTRY Data Item
The ENTRY data item contains the entry identifier, which is the EC number
assigned by NC-IUBMB (Nomenclature Committee of International Union of
Biochemistry and Molecular Biology). The EC number was originally devised
by the first Enzyme Commission in 1961, representing the hierarchical
classification scheme with the four figures separated by periods:
(1) the first figure is one of the six main divisions (classes),
(2) the second figure indicates the subclass,
(3) the third figure gives the sub-subclass, and
(4) the fourth figure is the serial number in the sub-subclass.
These numbers are prefixed by 'EC ' in this data item. The entire table
of classification may be browsed both in the DBGET and KEGG systems
(http://www.genome.ad.jp/dbget-bin/get_htext?ECtable). The meaning of the
first three elements is given in the CLASS data item.
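The four figures of an EC number can be separated as in the following sketch.
The function name parse_ec and the dictionary keys are hypothetical, and the
sketch assumes that all four figures are integers, as in 'EC 2.7.1.30'.

```python
def parse_ec(value):
    """Split an ENTRY value such as 'EC 2.7.1.30' into its four figures.

    Any text after the number (for example an 'Obsolete' flag) is ignored.
    """
    tokens = value.split()
    if not tokens:
        raise ValueError("empty EC value")
    # Skip the 'EC' prefix when it is present.
    number = tokens[1] if tokens[0] == "EC" and len(tokens) > 1 else tokens[0]
    figures = number.split(".")
    if len(figures) != 4:
        raise ValueError(f"expected four figures in: {number!r}")
    main_class, subclass, sub_subclass, serial = (int(f) for f in figures)
    return {
        "class": main_class,           # one of the main divisions
        "subclass": subclass,
        "sub_subclass": sub_subclass,
        "serial": serial,              # serial number in the sub-subclass
    }


print(parse_ec("EC 2.7.1.30"))
```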
Some entries have been deleted or transferred in the course of the EC number
updates by NC-IUBMB. These entries are still listed in the ENZYME section and
are designated as 'Obsolete' starting from the 31st column of this data item.
This ENTRY data item is mandatory for all entries.
6.2. The NAME Data Item
The NAME data item contains the recommended name and, if any, alternative
names. One name is given per line, except for names too long to fit on one
line (see Conventions). The recommended name is always placed on the first
line. This item is mandatory for all entries.
6.3. The CLASS Data Item
The CLASS data item contains the meaning of the EC number. Each line
corresponds to the class, subclass, and sub-subclass of the enzyme. This
item is mandatory for all entries.
6.4. The SYSNAME Data Item
The SYSNAME data item contains the systematic name given by the Enzyme
Commission, representing the nature of the chemical reaction.
6.5. The REACTION Data Item
The REACTION data item contains the chemical reaction in the form of an
equation or a text description; for example:
REACTION S-adenosyl-L-methionine + nicotinamide =
S-adenosyl-L-homocysteine + 1-methylnicotinamide
REACTION Hydrolysis of 1,4-beta-linkages between N-acetylmuramic acid and
N-acetyl-D-glucosamine residues in a peptidoglycan and between
N-acetyl-D-glucosamine residues in chitodextrins
If necessary, one name may continue to the next line (see Conventions).
If there is more than one reaction, each reaction except for the last one
ends with a semicolon (;).
By clicking on this item name 'REACTION' in the WWW version of DBGET,
the corresponding reactions in the REACTION section are displayed.
6.6. The SUBSTRATE and PRODUCT Data Items
The SUBSTRATE and PRODUCT data items contain the chemical compounds
that appear on the left and right sides, respectively, of the reaction
equation given in the REACTION data item. One compound name is given per
line, except for names too long to fit on one line (see Conventions).
6.7. The COMMENT Data Item
The COMMENT data item contains the text information commenting on the
enzyme.
6.8. The REFERENCE Data Item
The REFERENCE data item contains references describing the enzyme. Each
reference is numbered within the entry, followed by the MEDLINE UI or PMID
if available, in the first line. The author names, title, journal, volume,
year and pages follow in the subsequent lines.
6.9. The PATHWAY Data Item
The PATHWAY data item contains the link information to the KEGG (Kyoto
Encyclopedia of Genes and Genomes) pathway database: the pathway map
accession number followed by the description. By clicking on this number
in the WWW version of DBGET, the metabolic pathway diagram highlighting
the enzyme is displayed.
6.10. The ORTHOLOG Data Item
The ORTHOLOG data item contains the link information to the KEGG/KO database:
the KO identifier followed by the description. KO is a database of ortholog
groups created from the amino acid sequences of enzymes from the organisms
listed in the GENES item below.
6.11. The GENES Data Item
The GENES data item contains the link information to the gene catalogs: the
abbreviation of the organism followed by the list of genes that encode the
enzyme. The organisms are limited to those supported by the KEGG/GENES
database, mainly completely sequenced organisms.
Please refer to
http://www.genome.ad.jp/kegg/catalog/org_list.html
for the meaning of the three-letter abbreviations. As of March 2004, 162
organisms are included.
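A possible way to read the GENES item into a per-organism table is sketched
below in Python. The exact line layout is an assumption here: the organism
abbreviation is taken to end with a colon and to be followed by gene
identifiers separated by whitespace, with lines lacking an abbreviation
treated as continuations of the previous organism. The function name and the
gene identifiers in the sample are hypothetical.

```python
def parse_genes_item(lines):
    """Map each organism abbreviation to its list of gene identifiers."""
    genes = {}
    current = None
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].endswith(":"):
            # Assumed layout: 'ORG: gene gene ...' starts a new organism.
            current = tokens[0][:-1]
            genes.setdefault(current, [])
            tokens = tokens[1:]
        if current is not None:
            genes[current].extend(tokens)
    return genes


# Placeholder gene identifiers, for illustration only.
sample_genes = parse_genes_item([
    "HSA: gene1 gene2",
    "ECO: gene3",
    "     gene4",
])
```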
6.12. The DISEASE Data Item
The DISEASE data item contains the link information to the OMIM (Online
Mendelian Inheritance in Man) database: the MIM number followed by the
description.
6.13. The MOTIF Data Item
The MOTIF data item contains the link information to the PROSITE
database: the PROSITE accession number followed by the sequence pattern.
6.14. The STRUCTURES Data Item
The STRUCTURES data item contains the link information to the Protein Data
Bank (PDB), which stores the 3-D structure information of proteins.
6.15. The DBLINKS Data Item
The DBLINKS data item contains the link information to other databases,
including
o IUBMB Enzyme Nomenclature,
o ExPASy - ENZYME Nomenclature database at Swiss Institute of Bioinformatics,
o WIT (What Is There) Interactive Metabolic Reconstruction on the Web
at Argonne National Laboratory,
o UM-BBD (Biocatalysis/Biodegradation Database) at University of Minnesota,
o BRENDA at University of Koeln,
o SCOP (Structural Classification of Proteins) at MRC Laboratory of
Molecular Biology and Centre for Protein Engineering, and
o CAS (Chemical Abstracts Service) registry number.
6.16. The end-of-entry Data Item
The end-of-entry data item marks the end of the entry. It is denoted by
the identifier consisting of three consecutive slashes, '///'. This item
is mandatory for all entries.
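Because every entry ends with '///', a flat file of entries can be split
with a few lines of code. The Python sketch below is one way to do so, under
stated assumptions: the function names are hypothetical, a data item name is
assumed to start in the first column with continuation lines starting with
white space, and the sample entry is an abridged illustration rather than a
verbatim database record.

```python
def split_entries(text):
    """Split flat-file text into entries, each a list of lines."""
    entries, current = [], []
    for line in text.splitlines():
        if line.strip() == "///":
            if current:
                entries.append(current)
            current = []
        else:
            current.append(line)
    # Lines after the last '///' belong to an incomplete entry and are ignored.
    return entries


def group_items(entry_lines):
    """Group the lines of one entry by data item name."""
    items = {}
    name = None
    for line in entry_lines:
        if not line.strip():
            continue
        if not line[0].isspace():
            # A line starting in the first column opens a data item.
            name, _, line = line.partition(" ")
            items.setdefault(name, [])
        value = line.strip()
        if name is not None and value:
            items[name].append(value)
    return items


# Abridged, illustrative entry (layout assumed).
sample = (
    "ENTRY       EC 2.1.1.1\n"
    "NAME        nicotinamide N-methyltransferase\n"
    "REACTION    S-adenosyl-L-methionine + nicotinamide =\n"
    "            S-adenosyl-L-homocysteine + 1-methylnicotinamide\n"
    "///\n"
)
entries = split_entries(sample)
items = group_items(entries[0])
```

Items that occur more than once in an entry, such as REFERENCE, are
accumulated under the same name in this sketch.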
7. NUMBERS OF DATA ENTRIES
As of March 2004, the numbers of data entries are as follows.
COMPOUND: 10,788
GLYCAN: 10,146
REACTION: 5,902
ENZYME: 4,306 including 501 deleted or transferred entries
Please see
http://www.genome.ad.jp/kegg/docs/upd_ligand.html
for more detailed and up-to-date statistics of the COMPOUND, GLYCAN, REACTION
and ENZYME sections.
8. CHANGES IN THIS RELEASE
8.1. GIF image files for reactions
GIF files for reaction equations have been added as reaction_gif.tar.Z on the
LIGAND FTP site.
9. UPCOMING CHANGES
9.1. New NODE and EDGE Data Items in COMPOUND and GLYCAN sections
The NODE and EDGE data items will be added to the entries in the COMPOUND and
GLYCAN sections for KCF format. These data items will be extracted from the
KCF files currently stored separately and will appear just before the
end-of-entry data items.
10. ACKNOWLEDGMENTS
We thank Dr. Mikita Suyama for his excellent work during the initial phase of
the database development. We also thank Dr. Yukiteru Sugiyama, Hiroko Ishida,
Takako Nishikawa, Saeko Adachi, and Mitsuteru Nakao for their contributions to
the initial phase of the development of the COMPOUND and REACTION sections.
Nobue Takeuchi, Yuko Yoshinaga, Makiko Nakase, Michiru Motoyoshi and Yasushi
Okuno also contributed to the COMPOUND, GLYCAN and REACTION sections.
We appreciate information from carbohydrate research groups at Kyoto University,
such as Prof. Kawasaki's group, for updating the GLYCAN section. Current data
entry is performed by Rumiko Yamamoto, Tomoko Komeno, Yoshinobu Igarashi,
Masaaki Kotera, Yuriko Matsuura, Masami Hamajima, Junko Nishida, Tomomi Kamiya,
Shin Kawano, Kosuke Hashimoto and Atsuko Yano. The LIGAND search engines based
on Chemscape and Java applet are developed by Koichiro Tonomura and Fujitsu
Kyushu System Engineering, respectively.
For the curation process of the LIGAND database, especially for the COMPOUND
section, comments and corrections from outside our research group have
been quite helpful. We thank all those who sent us reports, especially Dr.
Marcus Ennis and Dr. Kirill Degtyarenko at the European Bioinformatics
Institute and Dr. Masanori Arita at the University of Tokyo.
The LIGAND database is supported by grants from the Ministry of Education,
Culture, Sports, Science and Technology (MEXT), the Japan Society for the
Promotion of Science (JSPS), and the Japan Science and Technology Corporation
(JST).
11. REFERENCES
1) Nishioka, T., Sumi, K., and Oda, J., "Finding lead structures from
amino acid sequence similarities of target proteins", In "Probing
Bioactive Mechanism" (Magee, P.S., Henry, D.R., and Block, J.H., eds.),
pp. 105-122, American Chemical Society, New York (1989).
2) Sumi, K., Nishioka, T., and Oda, J., "Similarity graphing and enzyme-
reaction database: methods to detect sequence regions of importance
for recognition of chemical structures", Protein Eng. 4:413-420 (1991).
[PMID:1881867]
3) Suyama, M., Ogiwara, A., Nishioka, T., and Oda, J., "Searching for
amino acid sequence motifs among enzymes: the Enzyme-Reaction Data base",
Comp. Appl. Biosci. 9:9-15 (1993). [PMID:8435774]
4) Nishioka, T., and Oda, J., "Analysis of Amino Acid Sequence-Function
Relationship in Proteins", In "QSAR and Drug Design: New Developments
and Applications" (Fujita, T., ed.), pp. 215-233, Elsevier (1995).
5) Goto, S., Nishioka, T., and Kanehisa, M., "LIGAND: Chemical Database
for Enzyme Reactions", Bioinformatics 14:591-599 (1998). [PMID:9730924]
6) Goto, S., Nishioka, T., and Kanehisa, M., "LIGAND database for enzymes,
compounds, and reactions", Nucleic Acids Res. 27:377-379 (1999).
[PMID:9847234]
7) Goto, S., Nishioka, T., and Kanehisa, M., "LIGAND: chemical database of
enzyme reactions", Nucleic Acids Res. 28:380-382 (2000). [PMID:10592281]
8) Goto, S., Okuno, Y., Hattori, M., Nishioka, T., and Kanehisa, M.,
"LIGAND: database of chemical compounds and reactions in biological
pathways", Nucleic Acids Res. 30:402-404 (2002). [PMID:11752349]