/       _ _/    ___  |     _  /      _  _/  _ _/  _ ___ |
           /         /     /   _/     /  |        / |    /     /    |
          /         /     /          /   |       /  |   /     /     |
         /         /     /          ____ |      /   |  /     /     /
        /         /     /  _ _/    /     |     /    | /     /     /
       /         /     /    /     /      |    /      /     /     /
     ______/  ___/    _____/   ___/   ___/ ___/   ___/   _______/



  LIGAND - Database of Chemical Compounds and Reactions in Biological Pathways

                         Release 29.0, March 2004

                               User Manual



               Susumu Goto, Masahiro Hattori, Takaaki Nishioka(*),
                            and Minoru Kanehisa

   Bioinformatics Center, Institute for Chemical Research, Kyoto University
                                   and
      (*)Graduate School of Agricultural Sciences, Kyoto University

                     http://www.genome.ad.jp/ligand/
         Feedback: http://www.genome.ad.jp/feedback/?category=ligand


==========================================================================
TABLE OF CONTENTS
==========================================================================

1.  INTRODUCTION
2.  CONVENTIONS
3.  COMPOUND SECTION
4.  GLYCAN SECTION
5.  REACTION SECTION
6.  ENZYME SECTION
7.  NUMBERS OF DATA ENTRIES
8.  CHANGES IN THIS RELEASE
9.  UPCOMING CHANGES
10. ACKNOWLEDGMENTS
11. REFERENCES

==========================================================================

1. INTRODUCTION

LIGAND, Database of Chemical Compounds and Reactions in Biological Pathways,
is designed to provide the linkage between chemical and biological aspects of
life in the light of enzymatic reactions.  The database consists of four
sections: COMPOUND, GLYCAN, REACTION, and ENZYME.

The COMPOUND section is a collection of metabolic and other compounds including
substrates, products, inhibitors of metabolic pathways as well as drugs and
xenobiotic chemicals.  Each of the chemical substances that appear in the
REACTION and ENZYME sections and KEGG/PATHWAY database is identified by an
accession number and stored in this section.  Each COMPOUND entry contains
names, the chemical formula, the structural formula in separate GIF, MOL and
KCF files, metabolic and other pathways, related enzymes, related protein
structures, prosthetic groups and the CAS (Chemical Abstracts Service)
registry number.

The GLYCAN section is a collection of carbohydrate structures (sequences).
CarbBank/CCSD was a well-known database for carbohydrates, but since its
funding was discontinued it has been neither updated nor maintained.  We have
taken over this activity and created a new database, which we call GLYCAN, by
merging entries whose carbohydrate structures are the same and by adding new
information, such as related pathways.  Each GLYCAN entry contains names, the
monosaccharide composition, links to metabolic pathways, and related proteins
and reactions.

The structural formula of each carbohydrate is provided in separate GIF and KCF
files.  The KCF (KEGG Chemical Function) format represents a graph; in the case
of a carbohydrate structure, monosaccharides and glycosidic bonds are
represented by nodes and edges, respectively.  The KCF format for chemical
compounds is described in:

    Hattori, M., Okuno, Y., Goto, S., and Kanehisa, M., "Development of a
    chemical structure comparison method for integrated analysis of chemical
    and genomic information in the metabolic pathways", Journal of the
    American Chemical Society, 125:11853-11865 (2003). [PMID:14505407]

The REACTION section is a collection of chemical reactions that appear in the
pathway diagrams of the KEGG/PATHWAY database as well as in the ENZYME section.
It includes non-enzymatic reactions and enzymatic reactions whose EC numbers
have not been assigned yet, in addition to those with EC numbers.

The ENZYME section is a collection of all known enzymatic reactions
classified according to the nomenclature of the International Union of
Biochemistry and Molecular Biology (IUBMB):

    International Union of Biochemistry and Molecular Biology,
    "Enzyme Nomenclature: Recommendations (1992) of the Nomenclature
    Committee of the International Union of Biochemistry and Molecular
    Biology", Academic Press, New York (1992).

The current classification is maintained at
    http://www.chem.qmw.ac.uk/iubmb/enzyme/
and the basic information about the enzymes is taken from that site.

Each entry of ENZYME is identified by the EC (Enzyme Commission) number, and
contains names, chemical reactions, metabolic compounds,
metabolic pathways, references, genes encoding the enzyme for several organisms
(mainly completely sequenced ones), genetic diseases, and links to other
databases including protein sequence motifs and 3D structural data.

The LIGAND database is a major component of the DBGET/LinkDB integrated
database system (http://www.genome.ad.jp/dbget/), providing useful links
among the existing databases.  LIGAND is tightly coupled to the pathway
database of the KEGG system (http://www.genome.ad.jp/kegg/), providing
links to the gene catalogs of a number of organisms.  Substructure searches
of COMPOUND and REACTION can be performed on the Chemscape server
(http://ligand.genome.ad.jp:8080/compound/ and
http://ligand.genome.ad.jp:8080/reaction/).  Substructure and similar-structure
searches of GLYCAN can be performed on the glycan web page
(http://glycan.genome.ad.jp/), which uses a Java applet.

The basic concept of the LIGAND database is described in:

    Goto, S., Nishioka, T., and Kanehisa, M., "LIGAND: Chemical Database for
    Enzyme Reactions", Bioinformatics, 14:591-599 (1998). [PMID:8730924]

and the most up-to-date information is described in:

    Goto, S., Okuno, Y., Hattori, M., Nishioka, T., and Kanehisa, M.,
    "LIGAND: database of chemical compounds and reactions in biological
    pathways", Nucleic Acids Research, 30:402-404 (2002). [PMID:11752349]

Please cite them accordingly when making use of the LIGAND database.



2. CONVENTIONS

2.1.  General Data Format

LIGAND is constructed as a flat-file database.  Similar to the data
formats of PIR and GenBank databases, a fixed number of columns are
assigned to specify the attributes of data.  Each attribute is identified
by the keyword that appears on columns 1-12.  When these columns are
blank, the attribute of the preceding line continues.  Columns from 13
are used to describe the entities of data.
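
The column rule above can be illustrated with a short Python sketch.  It is
not part of the LIGAND distribution; the function name parse_entry and the
sample lines are hypothetical, and the sketch assumes that the lines of a
single entry have already been read into a list.

```python
def parse_entry(lines):
    """Group the lines of one entry by the keyword in columns 1-12."""
    fields = {}
    keyword = None
    for line in lines:
        if line.startswith("///"):      # end-of-entry data item
            break
        head = line[:12].strip()        # columns 1-12: keyword or blank
        body = line[12:].rstrip()       # columns 13 onward: the data
        if head:
            keyword = head              # a new data item starts here
        if keyword is not None:         # blank keyword: previous item continues
            fields.setdefault(keyword, []).append(body)
    return fields


# Sample lines built with ljust(12) so that the column layout is explicit.
sample = [
    "ENTRY".ljust(12) + "C00116",
    "NAME".ljust(12) + "Glycerol",
    "".ljust(12) + "Glycerin",
    "FORMULA".ljust(12) + "C3H8O3",
    "///",
]
entry = parse_entry(sample)
print(entry["NAME"])
```

Each data item is kept as a list of lines so that multi-line items such as
NAME retain one element per line.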

For the COMPOUND section, the following data items appear on columns 1-12:

        ENTRY
        NAME
        FORMULA
        GLYCAN
        REACTION
        PATHWAY
        ENZYME
        STRUCTURES
        DBLINKS
        ///

A COMPOUND entry starts with the ENTRY data item, which is followed by the
other data items in the order shown above, and ends with the end-of-entry data
item (///).  The data items ENTRY, NAME, and end-of-entry are mandatory, while
the other data items are optional.

Example:

ENTRY       C00116
NAME        Glycerol
            Glycerin
            1,2,3-Trihydroxypropane
            1,2,3-Propanetriol
FORMULA     C3H8O3
REACTION    R00841 R00847 R00850 R01034 R01036 R01039 R01041 R01043
            R01044 R01045 R01046 R01047 R01048 R01104 R01350 R01351
            R01352 R02591 R03331 R03616
PATHWAY     PATH: MAP00052  Galactose metabolism
            PATH: MAP00561  Glycerolipid metabolism
ENZYME      1.1.1.1         1.1.1.2         1.1.1.6         1.1.1.21
            1.1.1.72        1.1.1.156       1.1.99.22       2.3.1.21
            2.7.1.30        2.7.1.79        2.7.1.142       3.1.1.23
            3.1.1.34        3.1.3.19        3.1.3.21        3.1.4.38
            3.1.4.43        3.2.1.22        3.2.1.23        4.2.1.30
DBLINKS     CAS: 56-81-5
///

In addition, the molecular structure is stored in separate MDL/MOL, GIF and KCF
format files for each compound.  The GIF image is automatically displayed
between the FORMULA and REACTION data items in the WWW version of DBGET and is
a link to the Chemscape version of COMPOUND.


For the GLYCAN section, the following data items appear on columns 1-12:

        ENTRY
        NAME
        COMPOSITION
        MASS
        CLASS
        BINDING 
        COMMENT
        COMPOUND
        PATHWAY
        REACTION
        ENZYME
        ORTHOLOG
        DBLINKS
        ///

A GLYCAN entry starts with the ENTRY data item, which is followed by the other
data items in the order shown above, and ends with the end-of-entry data item
(///).  The data items ENTRY and end-of-entry are mandatory, while the other
data items are optional.

Example:

ENTRY       G00249            Glycan
NAME        Raffinose
COMPOSITION (Glc)1 (Gal)1 (Fruf)1
MASS        504.44248 (Glc)1 (Gal)1 (Fruf)1
CLASS       Oligosaccharide
BINDING     ENZYME: beta-D-glucosidase/beta-D-xylosidase [PMID:8156553]
            ENZYME: beta-glucosidase, (1->6)-alpha-arabinofuranosidase
                    [PMID:7986378]
            ENZYME: alpha-galactosidase from Candida guilliermondii H-404
                    [PMID:7772826]
            ENZYME: Aspergillus sydowi beta-fructofuranosidase (2,1- and
                    2,6-linked) [PMID:7766019]
            ENZYME: honeybee alpha-glucosidase I [PMID:2204617]
COMPOUND    C00492
PATHWAY     PATH: MAP00052  Galactose metabolism
REACTION    R06042         R06070         R06071         R06100
            R06139         R06152         
ENZYME      2.4.1.67       2.4.1.82       2.4.1.166      3.2.1.22
            3.2.1.26       3.2.1.55       3.2.1.80       
DBLINKS     CCSD: 1887  8349  8657  9314  9361  10112  10774  11014  12280
                  12485  12488  12490  12535  12595  12624  13207  13409  13437
                  13512  13793  14929  15415  15496  15512  15806  16158  16761
                  17154  17162  17164  18130  18355  18361  18537  18732  19002
                  19141  19400  19742  20848  20854  20860  22731  22733  22860
                  22861  23008  23018  23116  23598  24270  24326  24652  24997
                  25071  25160  25372  25442  25483  25832  26128  26134  26189
                  26545  26580  26641  26681  26697  26777  27199  27240  27406
                  27410  27413  27419  27751  28306  28314  29193  29196  29578
                  29852  30154  30671  30675  31884  32410  32514  33563  3446 
                  34479  37937  41648  42252  42283  42346  42601  44983  45673
                  46955  47474  48497  48556  48613
///

In addition, the molecular structure is stored in a separate KCF format file
as well as in a GIF image format file for each glycan.  The GIF image is
automatically displayed between the MASS and CLASS data items in the WWW
version of DBGET and links to the Java applet version of GLYCAN for glycan
structure searches.


For the REACTION section, the following data items appear on columns 1-12:

        ENTRY
        NAME
        DEFINITION
        EQUATION
        PATHWAY
        ENZYME
        COMMENT
        ///

A REACTION entry starts with the ENTRY data item, which is followed by the
other data items in the order shown above, and ends with the end-of-entry data
item (///).  The data items ENTRY, DEFINITION, EQUATION, and end-of-entry are
mandatory, while the other data items are optional.

Example:

ENTRY       R00005
NAME        Urea-1-carboxylate amidohydrolase
DEFINITION  Urea-1-carboxylate + H2O <=> 2 CO2 + 2 NH3
EQUATION    C01010 + C00001 <=> 2 C00011 + 2 C00014
PATHWAY     PATH: RN00220  Urea cycle and metabolism of amino groups
            PATH: RN00910  Nitrogen metabolism
            PATH: RN00791  Atrazine degradation
ENZYME      3.5.1.54
COMMENT     The yeast enzyme (but not that from green algae) also catalyses the
            reaction of EC 6.3.4.6 urea carboxylase, thus bringing about the
            hydrolysis of urea to CO2 and NH3 in the presence of ATP and
            bicarbonate.
            R00774 (6.3.4.6)
///

In addition, the molecular structure is stored in a separate GIF image format
file, which is created from the GIF images of the compounds in the reaction.
The GIF image is automatically displayed between the EQUATION and PATHWAY data
items in the WWW version of DBGET and is a link to the Chemscape version of
REACTION.


For the ENZYME section, the following data items appear on columns 1-12:

        ENTRY
        NAME
        CLASS
        SYSNAME
        REACTION
        SUBSTRATE
        PRODUCT
        COMMENT
        REFERENCE
        PATHWAY
        ORTHOLOG
        GENES
        DISEASE
        MOTIF
        STRUCTURES
        DBLINKS
        ///

An ENZYME entry begins with the ENTRY data item, which is followed by the other
data items in the order shown above, and ends with the end-of-entry data item.
The data items ENTRY, NAME, CLASS, and end-of-entry are mandatory and the other
data items are optional.

Example:

ENTRY       EC 2.7.1.30
NAME        glycerol kinase
            glycerokinase
            GK
            ATP:glycerol-3-phosphotransferase
            glycerol kinase (phosphorylating)
            glyceric kinase
CLASS       Transferases
            Transferring phosphorus-containing groups
            Phosphotransferases with an alcohol group as acceptor
SYSNAME     ATP:glycerol 3-phosphotransferase
REACTION    ATP + glycerol = ADP + sn-glycerol 3-phosphate
SUBSTRATE   ATP
            glycerol
PRODUCT     ADP
            sn-glycerol 3-phosphate
COMMENT     Glycerone and L-glyceraldehyde can act as acceptors; UTP (and, in 
            the case of the yeast enzyme, ITP and GTP) can act as donors.
REFERENCE   1
            Bergmeyer, H.-U., Holz, G., Kauder, E.M., Mollering, H. and 
            Wieland, O. Kristallisierte Glycerokinase aus Candida mycoderma. 
            Biochem. Z. 333 (1961) 471-480.
            2
            Bublitz, C. and Kennedy, E.P. Synthesis of phosphatides in isolated
            mitochondria. III. The enzymatic phosphorylation of glycerol. J. 
            Biol. Chem. 211 (1955) 951-961.
            3
            Wieland, O. and Suyter, M. Glycerokinase: Isolierung und 
            Eigenschaften des Enzyms. Biochem. Z. 329 (1957) 320-331.
PATHWAY     PATH: MAP00561  Glycerolipid metabolism
ORTHOLOG    KO: K00864  glycerol kinase
GENES       HSA: 2710(GK) 2712(GK2)
            MMU: 14933(Gyk)
            RNO: 79223(Gyk)
            DME: CG1271-PA(CG1271) CG1271-PB(CG1271) CG1271-PD(CG1271)
                 CG18374-PA(CG18374) CG18374-PB(CG18374) CG7995-PA(CG7995)
            CEL: C55B6.1 R11F4.1
            ATH: At1g80460(F5I6.22)
            PFA: PF13_0269
            SCE: YHL032C(GUT1)
            ECO: b3926(glpK)
            ECJ: JW3897(glpK)
            ECE: Z5471(glpK)
            ECS: ECs4851
            ECC: c4878(glpK)
            STY: STY3784(glpK)
            STT: t3532(glpK)
            STM: STM4086(glpK)
            YPE: YPO0090(glpK') YPO3312
            YPK: y0047(glpK) y0876(glpK)
            SFL: SF4004(glpK)
            SFX: S3743(glpK)
            PLU: plu4768(glpK)
            HIN: HI0691(glpK)
            PMU: PM1446(glpK)
            XFA: XF2268
            XFT: PD1304(glpK)
            XCC: XCC0358(glpK)
            XAC: XAC0358(glpK)
            VCH: VCA0744
            VVU: VV11787
            VVY: VV2624
            VPA: VP2386
            PAE: PA1487 PA3579 PA3582(glpK)
            PPU: PP1075(glpK)
            PST: PSPTO2403 PSPTO4168(glpK)
            SON: SO4230(glpK)
            CBU: CBU0932(gplK)
            CVI: CV0251(glpK)
            RSO: RS00495(glpK)
            BPE: BP3825(glpK)
            BPA: BPP3969(glpK)
            BBR: BB4442(glpK)
            NEU: NE2128 NE2129
            GSU: GSU2762(glpK)
            BBA: Bd2718(glpK)
            MLO: mll0700
            SME: SMb21009(glpK)
            ATU: Atu1903(glpK) Atu3890(glpK)
            ATC: AGR_C_3493 AGR_L_1914
            BME: BMEII0823 BMEII0824
            BMS: BRA0443(glpK)
            BJA: blr1410(glpK)
            RPA: RPA3705(glpK)
            CCR: CC1223
            BSU: BG10187(glpK)
            BHA: BH1093(glpK)
            BAN: BA1026(glpK)
            BCE: BC1035
            OIH: OB2475
            SAU: SA1141(glpK)
            SAV: SAV1301(glpK)
            SAM: MW1183(glpK)
            SEP: SE0978
            LMO: lmo1034 lmo1538
            LIN: lin1573
            LLA: L0014(glpK)
            SPY: SPy1684(glpK)
            SPM: spyM18_1696(glpK)
            SPG: SpyM3_1468(glpK)
            SPS: SPs0398
            SPN: SP2186
            SPR: spr1991(glpK)
            SAG: SAG0273(glpK)
            SAN: gbs0263(glpK)
            LPL: lp_0370(glpK1) lp_0834(glpK2)
            EFA: EF1929(glpK)
            CAC: CAC1321(glpK)
            CPE: CPE2552(glpK)
            CTC: CTC01758 CTC02462
            TTE: TTE2002(glpK)
            MGE: MG038(glpK)
            MPN: D09_orf508(glpK)
            MPU: MYPU_2210(glpK)
            MPE: MYPE6360(glpK)
            MGA: MGA_0644(glpK)
            MMY: MSC_0258(glpK)
            MTU: Rv3696c(glpK)
            MTC: MT3798
            MBO: Mb3721c(glpKb) Mb3722c(glpKa)
            MLE: ML2314(glpK)
            MPA: MAP0353(glpK)
            CGL: NCgl2790(Cgl2890)
            CEF: CE2721
            CDI: DIP2235(glpK)
            SCO: SCO0509(glpK2) SCO1660(SCI52.02)
            SMA: SAV6664(glpK1) SAV6963(glpK2) SAV7201(glpK3)
            FNU: FN1839
            RBA: RB3762(glpK)
            BBU: BB0241(glpK)
            TDE: TDE1916(glpK)
            LIL: LA2119(glpK1) LA3565(glpK2)
            SYN: slr1672(glpK)
            SYW: SYNW1953
            GVI: gll1751(glpK)
            ANA: all1811
            CTE: CT0185(glpK)
            DRA: DR1928
            AAE: aq_434(glpK)
            TMA: TM0952 TM1430
            AFU: AF0866(glpK)
            HAL: VNG1967G(glpK)
            TAC: Ta1096
            TVO: TVG1192129
            PAB: PAB2406(glpk)
            PFU: PF2004
            APE: APE0306
            SSO: SSO1600 SSO2133(glpK-2)
DISEASE     MIM: 307030  Glycerol kinase deficiency
MOTIF       PS: PS00445  [GSA]-x-[LIVMFYW]-x-G-[LIVM]-x(7,8)-[HDENQ]-[LIVMF]-
                         x(2)-[AS]-[STALIVM]-[LIVMFY]-[DEQ]
            PS: PS00933  [MFYGS]-x-[PST]-x(2)-K-[LIVMFYW]-x-W-[LIVMF]-x-
                         [DENQTKR]-[ENQH]
STRUCTURES  PDB: 1GLC  1GLD  1GLE  1GLL  1GLF  1GLJ  1BO5  1GLA  1BWF  1BU6  
                 1BOT  1GLB  
DBLINKS     IUBMB Enzyme Nomenclature: 2.7.1.30
            ExPASy - ENZYME nomenclature database: 2.7.1.30
            WIT (What Is There) Metabolic Reconstruction: 2.7.1.30
            BRENDA, the Enzyme Database: 2.7.1.30
            CAS: 9030-66-4
///


2.2.  Continuation Lines

The name of an enzyme or a chemical compound is sometimes too long to fit
in one line, in which case continuation lines are used.  The continuation
line is indicated by the dollar sign ($) on column 13.  Note that a long
name is simply separated into two lines without any hyphenation.  This
rule applies to the following data items: NAME, SYSNAME, REACTION,
SUBSTRATE, PRODUCT, and DEFINITION.

Examples:

  NAME          5-Methyltetrahydropteroyltriglutamate--homocysteine
                $S-methyltransferase

  REACTION      UDP-N-acetyl-D-galactosamine +
                (N-Acetylneuraminyl)-D-galactosyl-D-glucosylceramide =
                UDP + N-Acetyl-D-galactosaminyl-(N-acetylneuraminyl)-D-
                $galactosyl-D-glucosylceramide
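
The dollar-sign rule can be sketched in Python as follows.  The function name
merge_continuations is hypothetical, and the input is assumed to be the text
from column 13 onward of each line of one data item.  The manual does not say
whether a space is implied when the break falls between two words, so this
sketch joins the two parts directly, as in the REACTION example above.

```python
def merge_continuations(bodies):
    """Append each line that starts with '$' to the line before it."""
    merged = []
    for body in bodies:
        if body.startswith("$") and merged:
            merged[-1] += body[1:]      # drop the '$' and join directly
        else:
            merged.append(body)         # an ordinary line is kept as-is
    return merged


lines = merge_continuations([
    "UDP + N-Acetyl-D-galactosaminyl-(N-acetylneuraminyl)-D-",
    "$galactosyl-D-glucosylceramide",
])
print(lines[0])
```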


2.3.  Database Links

The reference to other databases is made by the convention used in the
DBGET/LinkDB system; namely, the combination of a database name and an
identifier (entry name or accession number) separated by a colon (:) is
used for cross-reference.  The database name may be an abbreviation defined
in DBGET/LinkDB.  This rule applies to the following data items: PATHWAY,
GENES, DISEASE, MOTIF, STRUCTURES, ORTHOLOG, REFERENCE and DBLINKS.
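
A minimal Python sketch of reading such a cross-reference is given below.
The function names split_link and split_pathway are hypothetical.  The sketch
assumes that the first colon on the line ends the database name, and, for
PATHWAY lines, that the accession number is followed by a free-text
description as in the examples of section 2.1.

```python
def split_link(body):
    """Split 'DB: rest' at the first colon into (database, rest)."""
    database, sep, rest = body.partition(":")
    if not sep:
        raise ValueError("no database name found: %r" % body)
    return database.strip(), rest.strip()


def split_pathway(body):
    """Split a PATHWAY line into (database, accession, description)."""
    database, rest = split_link(body)
    accession, _, description = rest.partition(" ")
    return database, accession, description.strip()


print(split_link("CAS: 56-81-5"))
print(split_pathway("PATH: MAP00052  Galactose metabolism"))
```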



3. COMPOUND SECTION

3.1.  The ENTRY Data Item

The ENTRY data item contains the compound accession number of the LIGAND
database.  This number also corresponds to the name of the GIF, MOL and KCF
files containing the molecular structure.  This item is mandatory for all
entries.


3.2.  The NAME Data Item

The NAME data item contains the recommended name and, if any, alternative
names.  One name is given per line, except for names too long to fit on one
line (see Continuation Lines).  This item is mandatory for all entries.


3.3.  The FORMULA Data Item

The FORMULA data item contains the chemical formula of the compound.


3.4.  The Molecular Structure File

The molecular structure (structural formula) of the compound may be viewed and
manipulated in the WWW version of DBGET.  The image of the molecular structure
stored in a GIF file can be seen between the FORMULA and REACTION data items.
The two-dimensional atomic coordinates are stored in an MDL/MOL file and a KCF
file.  The MOL file can be retrieved to launch a suitable application, such as
ISIS/Draw or ChemDraw, or a plug-in, such as Chime, in your WWW browser.  See
the instructions below for details.
http://www.genome.ad.jp/dbget/isis_doc.html

The molecular structure may also be searched in the Chemscape version of
LIGAND.  A substructure query drawn with ISIS/Draw can be submitted through
the browser, and the compounds containing the query structure are returned.
This feature is accessible via
http://ligand.genome.ad.jp:8080/compound/

The KCF files can be downloaded from the LIGAND FTP site:
ftp://ftp.genome.ad.jp/pub/kegg/ligand/


3.5.  The GLYCAN Data Item

The GLYCAN data item contains the link information to the GLYCAN section.
Monosaccharides and small oligosaccharides are registered in both the COMPOUND
and GLYCAN sections, and this data item specifies the corresponding entry in
the GLYCAN section.


3.6.  The REACTION Data Item

The REACTION data item contains the link information to the REACTION section.
Each reaction in which the compound appears as a substrate or a product is
listed by its reaction number.


3.7.  The PATHWAY Data Item

The PATHWAY data item contains the link information to the KEGG (Kyoto
Encyclopedia of Genes and Genomes) pathway database: the pathway map accession
number followed by the description.  When a reference map is available, its
accession number is displayed; otherwise, the accession number of an
organism-specific map is displayed.  Clicking on this number in the WWW
version of DBGET displays the pathway diagram with the corresponding compound
highlighted.


3.8.  The ENZYME Data Item

The ENZYME data item contains the link information to the ENZYME section.
Each enzyme whose reaction has the compound as a substrate or a product is
listed by its EC number.


3.9.  The STRUCTURES Data Item

The STRUCTURES data item contains the link information to the Protein Data
Bank (PDB), which stores the 3-D structure information of proteins.  Compounds
that have the STRUCTURES data item appear in the 3-D structure data as
cofactors or substrates, which are listed in the HETATM lines of PDB entries.


3.10.  The DBLINKS Data Item

The DBLINKS data item contains the link information to other databases.
Currently this data item includes links to the CAS (Chemical Abstracts
Service) registry number and PROMISE (Prosthetic groups and Metal Ions in
Protein Active Sites Database).


3.11.  The end-of-entry Data Item

The end-of-entry data item marks the end of the entry.  It is denoted by
the identifier consisting of three consecutive slashes, '///'.  This item
is mandatory for all entries.



4. GLYCAN SECTION

4.1.  The ENTRY Data Item

The ENTRY data item contains the glycan accession number of the LIGAND 
database.  This number also corresponds to the name of the GIF file containing
the carbohydrate structure.  This item is mandatory for all entries.


4.2.  The NAME Data Item

The NAME data item contains the recommended name and, if any, alternative
names.  One name is given per line, except for names too long to fit on one
line (see Continuation Lines).


4.3.  The COMPOSITION Data Item

The COMPOSITION data item contains the monosaccharide composition of the
carbohydrate.  Frequently used monosaccharides are listed below with their
full names.

  GlcNAc: N-acetyl D-glucosamine
  Gal:    D-galactose
  Man:    D-mannose
  Glc:    D-glucose
  LFuc:   L-fucose
  Neu5Ac: N-acetyl neuraminate
  GalNAc: N-acetyl D-galactosamine
  Xyl:    D-xylose
  LRha:   L-rhamnose
  GlcA:   D-glucuronate
  GlcN:   D-glucosamine
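
As a sketch, a COMPOSITION line written in the style of the G00249 example in
section 2.1, "(Glc)1 (Gal)1 (Fruf)1", can be read into a table of counts with
a few lines of Python.  The function name parse_composition is hypothetical,
and the "(unit)count" pattern is inferred from that single example rather than
from a formal definition.

```python
import re


def parse_composition(body):
    """Return {unit: count} for a string such as '(Glc)1 (Gal)1 (Fruf)1'."""
    pairs = re.findall(r"\(([^()]+)\)(\d+)", body)
    return {unit: int(count) for unit, count in pairs}


composition = parse_composition("(Glc)1 (Gal)1 (Fruf)1")
print(composition)
```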


4.4.  The MASS Data Item

The MASS data item contains the mass of the carbohydrate, which is calculated
by summing the masses of its monosaccharides (or other units) and subtracting
the mass of water multiplied by the number of bonds.  When an entry contains a
unit whose mass is not registered in the database, this data item is omitted
for that entry.
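
The calculation can be sketched in Python as follows.  The unit masses and the
mass of water below are assumptions, computed from standard atomic weights for
C6H12O6 and H2O rather than taken from the database; the names UNIT_MASS,
WATER and glycan_mass are hypothetical.  With these values the sketch
reproduces the MASS value of the raffinose entry (G00249) shown in section
2.1, a trisaccharide with two glycosidic bonds.

```python
WATER = 18.01528                        # assumed mass of H2O
UNIT_MASS = {                           # assumed masses of hexose units
    "Glc": 180.15768,
    "Gal": 180.15768,
    "Fruf": 180.15768,
}


def glycan_mass(composition, n_bonds):
    """Sum the unit masses and subtract one water per bond.

    Returns None when a unit has no registered mass, mirroring the rule
    that MASS is not listed for such entries.
    """
    total = 0.0
    for unit, count in composition.items():
        if unit not in UNIT_MASS:
            return None
        total += UNIT_MASS[unit] * count
    return total - n_bonds * WATER


mass = glycan_mass({"Glc": 1, "Gal": 1, "Fruf": 1}, n_bonds=2)
print(round(mass, 5))
```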


4.5.  The Molecular Structure File

The molecular structure (structural formula) of the carbohydrate may be 
viewed and manipulated in the WWW version of DBGET.  The image of the
carbohydrate structure stored in a GIF file can be seen between the MASS
and CLASS data items.  The two-dimensional monosaccharide coordinates are
stored in a KCF file, which can be retrieved via the Java applet.

The molecular structure may also be searched in the Java applet version
of GLYCAN.  A carbohydrate structure drawn with the applet can be submitted
through the browser, and similar carbohydrates can be searched for using
various options.  This feature is accessible via
http://glycan.genome.ad.jp/


4.6.  The CLASS Data Item

The CLASS data item contains the class selected for the carbohydrate.
The list of current "Class; Subclass" values is shown below.

  Glycoprotein; N-Glycan
  Glycoprotein; O-Glycan
  Glycoprotein; Glycosaminoglycan
  Glycoprotein; GPI anchor
  Glycoprotein; Others
  Glycolipid; Sphingolipid
  Glycolipid; Glycerolipid
  Glycolipid; LPS
  Glycolipid; Others
  Polysaccharide
  Oligosaccharide
  Glycoside
  Neoglycoconjugate
  Others


4.7.  The BINDING Data Item

The BINDING data item contains the information on the related proteins,
including enzymes which are involved in transferase activities using or
producing the carbohydrate, proteins which recognize the carbohydrate as
a ligand, and proteins to which the carbohydrate is attached.


4.8.  The COMMENT Data Item

The COMMENT data item contains the text information commenting on the
carbohydrate.


4.9.  The COMPOUND Data Item

Relatively small carbohydrates are also registered as compound entries in
the COMPOUND section.  The COMPOUND data item contains the ID of the compound
entry that corresponds to the carbohydrate.


4.10.  The PATHWAY Data Item

The PATHWAY data item contains the link information to the KEGG pathway
database: the pathway map accession number followed by the description.
By clicking on this number in the WWW version of DBGET, the metabolic
pathway diagram highlighting the carbohydrate is displayed.


4.11.  The REACTION Data Item

The REACTION data item contains the link information to the REACTION section.
The reactions in which the carbohydrate takes part are listed by their
reaction numbers.


4.12.  The ENZYME Data Item

The ENZYME data item contains the link information to the ENZYME section.
The enzymes whose reactions involve the carbohydrate are listed by their
EC numbers.


4.13.  The ORTHOLOG Data Item

The ORTHOLOG data item contains the link information to the KEGG/KO 
database: the KO identifier followed by the description.  KO is a database
of ortholog groups created from the amino acid sequences by clustering 
them according to their similarities.


4.14.  The DBLINKS Data Item

The DBLINKS data item contains the link information to the CCSD entries
that have the same carbohydrate structure.


4.15.  The end-of-entry Data Item

The end-of-entry data item marks the end of the entry.  It is denoted by
the identifier consisting of three consecutive slashes, '///'. This item
is mandatory for all entries.



5. REACTION SECTION

5.1.  The ENTRY Data Item

The ENTRY data item contains the reaction accession number of the LIGAND
database.  This number also corresponds to the name of the GIF file containing
the molecular structure.  This item is mandatory for all entries.


5.2.  The NAME Data Item

The NAME data item contains the recommended name.  This item is mainly
based on the systematic name of the enzyme (See SYSNAME Data Item section
of ENZYME) that catalyzes the corresponding reaction.


5.3.  The DEFINITION Data Item

The DEFINITION data item contains the chemical reaction in the form of an
equation; substrates and products are separated by '<=>', and each compound
in substrates and products is separated by ' + '. There may be a coefficient
before the compound name.  This item is mandatory for all entries.


5.4.  The EQUATION Data Item

The EQUATION data item also contains the chemical reaction in the form of an
equation.  This item represents the chemical compounds by their compound IDs,
whereas the names of the compounds are listed in the DEFINITION data item.
This item is mandatory for all entries.
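
The equation format shared by DEFINITION and EQUATION can be read with a short
Python sketch.  The function names parse_side and parse_equation are
hypothetical.  The sketch assumes that coefficients are plain integers and
that ' + ' never occurs inside a compound name; neither assumption is
guaranteed by this manual.

```python
import re


def parse_side(side):
    """Return [(coefficient, compound), ...] for one side of the equation."""
    terms = []
    for term in side.split(" + "):
        term = term.strip()
        match = re.match(r"^(\d+)\s+(.+)$", term)
        if match:                       # e.g. '2 C00011'
            terms.append((int(match.group(1)), match.group(2)))
        else:                           # no coefficient means one
            terms.append((1, term))
    return terms


def parse_equation(equation):
    """Split at '<=>' and parse the substrates and the products."""
    left, right = equation.split(" <=> ")
    return parse_side(left), parse_side(right)


substrates, products = parse_equation(
    "C01010 + C00001 <=> 2 C00011 + 2 C00014")
print(substrates, products)
```

Because a coefficient must be followed by a space, a name that merely begins
with a digit, such as 1,2,3-Propanetriol, is left intact.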


5.5.  The Molecular Structure File

The molecular structure (structural formula) of the reaction may be
viewed and manipulated in the WWW version of DBGET.  The image of the
molecular structure stored in a GIF file can be seen between the EQUATION
and PATHWAY data items.  The GIF file is created from the GIF images of
the compounds in the reaction.

The molecular structure may also be searched in the Chemscape version of
LIGAND.  A substructure query drawn with ISIS/Draw can be submitted through
the browser, and the reactions that involve a compound containing the query
structure are returned.  This feature is accessible via
http://ligand.genome.ad.jp:8080/reaction/


5.6.  The PATHWAY Data Item

The PATHWAY data item contains the link information to the KEGG (Kyoto
Encyclopedia of Genes and Genomes) pathway database: the pathway map accession
number followed by the description.  Instead of using the accession number
starting with "MAP" like the corresponding data items in the other sections,
the accession number in this section starts with "RN", meaning the reference
pathway map for reactions.  The difference between "MAP" and "RN" comes from
the linked objects for the EC numbers in each map.  The EC numbers in the "RN"
pathway maps are linked to the REACTION section, whereas the ones in the "MAP"
pathway maps are linked to the ENZYME section.

By clicking on the accession number in the WWW version of DBGET, the pathway
diagram highlighting the reaction is displayed.


5.7.  The ENZYME Data Item

The ENZYME data item contains the link information to the ENZYME section.
The enzymes that catalyse the corresponding reaction are listed by their
EC numbers.


5.8.  The COMMENT Data Item

The COMMENT data item contains the text information commenting on the
reaction.


5.9.  The end-of-entry Data Item

The end-of-entry data item marks the end of the entry.  It is denoted by
the identifier consisting of three consecutive slashes, '///'.  This item
is mandatory for all entries.


5.10.  The Format of reaction_name.lst, reaction.lst and reaction_main.lst Files

On the FTP site of the LIGAND database, we also provide files that list the
reactions.  Each reaction is written on one line that starts with the reaction
ID.  The reaction ID and ':' are followed by the reaction in chemical equation
format.

Example of the reaction in reaction_name.lst:

  R00004: Pyrophosphate + H2O <=> 2 Orthophosphate

Example of the reaction in reaction.lst:

  R00004: C00013 + C00001 <=> 2 C00009

Example of the reaction in reaction_main.lst:

  R00004: C00013 <=> C00009
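
A minimal Python sketch for reading lines of this form is shown below.  The
function name read_reaction_list is hypothetical, and the sketch assumes that
each reaction ID occurs on only one line of a given file.

```python
def read_reaction_list(lines):
    """Map each reaction ID to the equation text that follows the ':'."""
    reactions = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue                    # skip blank lines
        reaction_id, sep, equation = line.partition(":")
        if not sep:
            raise ValueError("missing ':' in line: %r" % line)
        reactions[reaction_id.strip()] = equation.strip()
    return reactions


by_name = read_reaction_list(
    ["  R00004: Pyrophosphate + H2O <=> 2 Orthophosphate"])
by_id = read_reaction_list(["  R00004: C00013 + C00001 <=> 2 C00009"])
print(by_name["R00004"])
print(by_id["R00004"])
```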

The reaction_main.lst file contains the chemical reactions as they appear in
the KEGG pathway diagrams.  Because such information depends on the pathway
maps, we currently store it in the reaction_mapformula.lst file (see 5.11).
The information in reaction_main.lst will instead be replaced with information
that better reflects the biochemical nature of the reaction.
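
As an illustration, a line of these files might be parsed as in the Python
sketch below.  The function names are hypothetical, and the sketch assumes
what the examples above suggest: terms are separated by ' + ', the two sides
by '<=>', and an optional integer coefficient precedes the compound.
Coefficients that are not plain integers are left as part of the compound
text.

```python
import re


def parse_side(side):
    """Parse one side of an equation into (coefficient, compound) pairs."""
    terms = []
    for term in side.split(" + "):
        term = term.strip()
        # An integer followed by whitespace is read as a coefficient, so a
        # name such as "2-Oxoglutarate" is not split.
        match = re.match(r"^(\d+)\s+(.+)$", term)
        if match:
            terms.append((int(match.group(1)), match.group(2)))
        else:
            terms.append((1, term))
    return terms


def parse_reaction_line(line):
    """Return (reaction ID, left-hand terms, right-hand terms)."""
    reaction_id, equation = line.split(":", 1)
    left, right = equation.split("<=>")
    return reaction_id.strip(), parse_side(left), parse_side(right)


print(parse_reaction_line("R00004: C00013 + C00001 <=> 2 C00009"))
print(parse_reaction_line("R00004: Pyrophosphate + H2O <=> 2 Orthophosphate"))
```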


5.11.  The Format of reaction_mapformula.lst File

In addition to the three files listed in 5.10, the LIGAND FTP site contains the
reaction_mapformula.lst file, in which each line has the following format.

  R00004: 00190: C00013 => C00009

Here the second column (00190 in the above case) is the pathway map number,
and the equation lists the chemical compounds that appear in the corresponding
KEGG pathway map.  The direction of the reaction is also recorded in this
format: => and <= for unidirectional reactions and <=> for bidirectional
reactions.
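
A line in this format could be read as in the following Python sketch.  The
function name, the dictionary keys, and the direction labels are assumptions
chosen for this example; only the three arrow symbols and the column layout
are taken from the description above.

```python
def parse_mapformula_line(line):
    """Split a reaction_mapformula.lst line into its parts."""
    reaction_id, map_number, equation = [p.strip() for p in line.split(":", 2)]
    # '<=>' must be tested first because it contains both '<=' and '=>'.
    arrows = (("<=>", "bidirectional"),
              ("=>", "left-to-right"),
              ("<=", "right-to-left"))
    for arrow, direction in arrows:
        if arrow in equation:
            left, right = equation.split(arrow, 1)
            break
    else:
        raise ValueError("no reaction arrow found: %r" % line)
    return {
        "reaction": reaction_id,
        "map": map_number,
        "direction": direction,
        "left": [c.strip() for c in left.split(" + ")],
        "right": [c.strip() for c in right.split(" + ")],
    }


print(parse_mapformula_line("R00004: 00190: C00013 => C00009"))
```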



6. ENZYME SECTION

6.1.  The ENTRY Data Item

The ENTRY data item contains the entry identifier, which is the EC number
assigned by NC-IUBMB (Nomenclature Committee of the International Union of
Biochemistry and Molecular Biology).  The EC number was originally devised
by the first Enzyme Commission in 1961 and represents a hierarchical
classification scheme with four figures separated by periods:

    (1) the first figure is one of the six main divisions (classes),
    (2) the second figure indicates the subclass,
    (3) the third figure gives the sub-subclass, and
    (4) the fourth figure is the serial number in the sub-subclass.

These numbers are prefixed by 'EC ' in this data item.  The entire table
of classification may be browsed both in the DBGET and KEGG systems
(http://www.genome.ad.jp/dbget-bin/get_htext?ECtable).  The meaning of the
first three elements is given in the CLASS data item.

Some entries have been deleted or transferred as NC-IUBMB has updated the EC
numbers.  These entries are still listed in the ENZYME section and are
designated as 'Obsolete' starting from the 31st column of this data item.
This ENTRY data item is mandatory for all entries.
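
The structure of this data item can be illustrated with a short Python sketch
that extracts the four figures and checks for the 'Obsolete' designation.  The
function name and the sample line are hypothetical; the sample assumes that
the keyword 'ENTRY' is padded to 12 columns, which is not stated in this
section.

```python
import re

# 'EC ' followed by four figures separated by periods.
EC_PATTERN = re.compile(r"EC (\d+)\.(\d+)\.(\d+)\.(\d+)")


def parse_entry_line(line):
    """Extract the EC number and the obsolete flag from an ENTRY line."""
    match = EC_PATTERN.search(line)
    if match is None:
        raise ValueError("no EC number found: %r" % line)
    ec_class, subclass, sub_subclass, serial = (int(g) for g in match.groups())
    return {
        "class": ec_class,
        "subclass": subclass,
        "sub_subclass": sub_subclass,
        "serial": serial,
        # The 31st column corresponds to index 30 when counting from zero.
        "obsolete": line[30:].startswith("Obsolete"),
    }


# Illustrative line only; the layout is an assumption (see above).
sample_line = "ENTRY       EC 1.1.1.5".ljust(30) + "Obsolete"
print(parse_entry_line(sample_line))
```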


6.2.  The NAME Data Item

The NAME data item contains the recommended name and, if any, alternative
names.  One name is given per line, except for names too long to fit on one
line (see Conventions).  The recommended name is always placed on the first
line.  This item is mandatory for all entries.


6.3.  The CLASS Data Item

The CLASS data item contains the meaning of the EC number.  The lines
correspond to the class, subclass, and sub-subclass of the enzyme.  This
item is mandatory for all entries.


6.4.  The SYSNAME Data Item

The SYSNAME data item contains the systematic name given by the Enzyme
Commission, representing the nature of the chemical reaction.


6.5.  The REACTION Data Item

The REACTION data item contains the chemical reaction in the form of an
equation or a text description; for example:

  REACTION    S-adenosyl-L-methionine + nicotinamide = 
              S-adenosyl-L-homocysteine + 1-methylnicotinamide

  REACTION    Hydrolysis of 1,4-beta-linkages between N-acetylmuramic acid and 
              N-acetyl-D-glucosamine residues in a peptidoglycan and between 
              N-acetyl-D-glucosamine residues in chitodextrins

If necessary, a reaction may continue on the next line (see Conventions).
If there is more than one reaction, each reaction except the last one
ends with a semicolon (;).

By clicking on this item name 'REACTION' in the WWW version of DBGET,
the corresponding reactions in the REACTION section are displayed.
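
The semicolon rule can be sketched in Python as follows.  The function name
split_reactions is hypothetical, and the sketch assumes that the item keyword
has already been removed from each line and that continuation lines can simply
be joined with a space; the exact continuation rules are given in Conventions
and are not reproduced here.

```python
def split_reactions(lines):
    """Group the lines of a REACTION data item into individual reactions."""
    reactions, current = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if text.endswith(";"):
            # A trailing semicolon closes every reaction except the last one.
            current.append(text[:-1].rstrip())
            reactions.append(" ".join(current))
            current = []
        else:
            current.append(text)
    if current:
        # The last reaction has no trailing semicolon.
        reactions.append(" ".join(current))
    return reactions


item_lines = [
    "S-adenosyl-L-methionine + nicotinamide = ",
    "S-adenosyl-L-homocysteine + 1-methylnicotinamide",
]
print(split_reactions(item_lines))
```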


6.6.  The SUBSTRATE and PRODUCT Data Items

The SUBSTRATE and PRODUCT data items contain the chemical compounds
that appear on the left and right sides, respectively, of the reaction
equation given in the REACTION data item.  One compound name is given
per line, except for names too long to fit on one line (see Conventions).


6.7.  The COMMENT Data Item

The COMMENT data item contains the text information commenting on the
enzyme.


6.8.  The REFERENCE Data Item

The REFERENCE data item contains references describing the enzyme.  Each
reference is numbered within the entry, followed by the MEDLINE UI or PMID
if available, on the first line.  The author names, title, journal, volume,
year and pages follow on the subsequent lines.


6.9.  The PATHWAY Data Item

The PATHWAY data item contains the link information to the KEGG (Kyoto
Encyclopedia of Genes and Genomes) pathway database: the pathway map
accession number followed by the description.  By clicking on this number
in the WWW version of DBGET, the metabolic pathway diagram highlighting
the enzyme is displayed.


6.10.  The ORTHOLOG Data Item

The ORTHOLOG data item contains the link information to the KEGG/KO database:
the KO identifier followed by the description. KO is a database of ortholog
groups created from the amino acid sequences of enzymes from the organisms
listed in the GENES item below.


6.11.  The GENES Data Item

The GENES data item contains the link information to the gene catalogs: the
abbreviation of the organism followed by the list of genes that encode the
enzyme.  The organisms are limited to those supported by the KEGG/GENES
database, mainly completely sequenced organisms.

Please refer to
http://www.genome.ad.jp/kegg/catalog/org_list.html
for the meaning of the three-letter abbreviations.  As of March 2004, 162
organisms are included.


6.12.  The DISEASE Data Item

The DISEASE data item contains the link information to the OMIM (On-line
Mendelian Inheritance in Man) database: the MIM number followed by the
description.


6.13.  The MOTIF Data Item

The MOTIF data item contains the link information to the PROSITE
database: the PROSITE accession number followed by the sequence pattern.


6.14.  The STRUCTURES Data Item

The STRUCTURES data item contains the link information to the Protein Data
Bank (PDB), which stores the 3-D structure information of proteins.


6.15.  The DBLINKS Data Item

The DBLINKS data item contains the link information to other databases,
including
o IUBMB Enzyme Nomenclature,
o ExPASy - ENZYME Nomenclature database at Swiss Institute of Bioinformatics,
o WIT (What Is There) Interactive Metabolic Reconstruction on the Web
  at Argonne National Laboratory,
o UM-BBD (Biocatalysis/Biodegradation Database) at University of Minnesota,
o BRENDA at University of Koeln,
o SCOP (Structural Classification of Proteins) at MRC Laboratory of
  Molecular Biology and Centre for Protein Engineering, and
o CAS (Chemical Abstracts Service) registry number.


6.16.  The end-of-entry Data Item

The end-of-entry data item marks the end of the entry.  It is denoted by
the identifier consisting of three consecutive slashes, '///'.  This item
is mandatory for all entries.



7. NUMBERS OF DATA ENTRIES

As of March 2004, the numbers of data entries are as follows.
  COMPOUND: 10,788
  GLYCAN:   10,146
  REACTION:  5,902
  ENZYME:    4,306 including 501 deleted or transferred entries
Please see
        http://www.genome.ad.jp/kegg/docs/upd_ligand.html
for more detailed and up-to-date statistics of the COMPOUND, GLYCAN, REACTION
and ENZYME sections.


8. CHANGES IN THIS RELEASE

8.1.  GIF image files for reactions

GIF files for reaction equations have been added as reaction_gif.tar.Z on the
LIGAND FTP site.


9. UPCOMING CHANGES

9.1.  New NODE and EDGE Data Items in COMPOUND and GLYCAN sections

The NODE and EDGE data items will be added to the entries in the COMPOUND and
GLYCAN sections for the KCF format.  These data items will be extracted from
the KCF files currently stored separately and will appear just before the
end-of-entry data items.


10. ACKNOWLEDGMENTS

We thank Dr. Mikita Suyama for his excellent work during the initial phase of
the database development.  We also thank Dr. Yukiteru Sugiyama, Hiroko Ishida,
Takako Nishikawa, Saeko Adachi, and Mitsuteru Nakao for their contribution in
the initial phase of the development of the COMPOUND and REACTION sections.
Nobue Takeuchi, Yuko Yoshinaga, Makiko Nakase, Michiru Motoyoshi and Yasushi
Okuno also contributed to the COMPOUND, GLYCAN and REACTION sections.

We appreciate information from carbohydrate research groups at Kyoto University,
such as Prof. Kawasaki's group, for updating the GLYCAN section.  Current data 
entry is performed by Rumiko Yamamoto, Tomoko Komeno, Yoshinobu Igarashi,
Masaaki Kotera, Yuriko Matsuura, Masami Hamajima, Junko Nishida, Tomomi Kamiya,
Shin Kawano, Kosuke Hashimoto and Atsuko Yano.  The LIGAND search engines based
on Chemscape and on a Java applet were developed by Koichiro Tonomura and
Fujitsu Kyushu System Engineering, respectively.

For the curation process of the LIGAND database, especially for the COMPOUND
section, comments and corrections from outside our research group have been
quite helpful.  We thank all those who sent us reports, especially Dr.
Marcus Ennis and Dr. Kirill Degtyarenko at European Bioinformatics Institute
and Dr. Masanori Arita at University of Tokyo.

The LIGAND database is supported by grants from the Ministry of Education,
Culture, Sports, Science and Technology (MEXT), the Japan Society for the
Promotion of Science (JSPS), and the Japan Science and Technology Corporation
(JST).


11. REFERENCES

 1) Nishioka, T., Sumi, K., and Oda, J., "Finding lead structures from
    amino acid sequence similarities of target proteins", In "Probing
    Bioactive Mechanism" (Magee, P.S., Henry, D.R., and Block, J.H., eds.),
    pp. 105-122, American Chemical Society, New York (1989).
 2) Sumi, K., Nishioka, T., and Oda, J., "Similarity graphing and enzyme-
    reaction database: methods to detect sequence regions of importance
    for recognition of chemical structures", Protein Eng. 4:413-420 (1991).
    [PMID:1881867]
 3) Suyama, M., Ogiwara, A., Nishioka, T., and Oda, J., "Searching for
    amino acid sequence motifs among enzymes: the Enzyme-Reaction Data base",
    Comp. Appl. Biosci. 9:9-15 (1993). [PMID:8435774]
 4) Nishioka, T., and Oda, J., "Analysis of Amino Acid Sequence-Function
    Relationship in Proteins", In "QSAR and Drug Design: New Developments
    and Applications" (Fujita, T., ed.), pp. 215-233, Elsevier (1995).
 5) Goto, S., Nishioka, T., and Kanehisa, M., "LIGAND: Chemical Database
    for Enzyme Reactions", Bioinformatics, 14:591-599 (1998). [PMID:9730924]
 6) Goto, S., Nishioka, T., and Kanehisa, M., "LIGAND database for enzymes,
    compounds, and reactions", Nucleic Acids Res. 27:377-379 (1999).
    [PMID:9847234]
 7) Goto, S., Nishioka, T., and Kanehisa, M., "LIGAND: chemical database of
    enzyme reactions", Nucleic Acids Res. 28:380-382 (2000). [PMID:10592281]
 8) Goto, S., Okuno, Y., Hattori, M., Nishioka, T., and Kanehisa, M.,
    "LIGAND: database of chemical compounds and reactions in biological
    pathways", Nucleic Acids Res. 30:402-404 (2002). [PMID:11752349]