KegArray ReadMe file v1.2

  Copyright (c) 2004 - 2009 Kanehisa Laboratories, All rights reserved.

  KegArray is a Java application that provides an environment for analyzing
  both transcriptome data (gene expression profiles) and metabolome data
  (compound profiles). Tightly integrated with the KEGG database, KegArray
  enables you to easily map those data to KEGG resources including PATHWAY,
  BRITE and genome maps.

----------------------------------------------------------------------------
Table of contents

  1. Requirements

  2. Instructions
  2-1. Installation and starting KegArray
  2-2. Data format of input files
  2-2-1. Format for transcriptome data
  2-2-2. Format for metabolome data
  2-2-3. Entry IDs from databases other than KEGG
  2-2-4. Using data prepared by Microsoft Excel
  2-3. KegArray control panel
  2-4. KegArray tool-bar menu

  3. Limitations
  3-1. Out of memory error

  4. About the License

  5. Acknowledgment

  6. Feedback

----------------------------------------------------------------------------

1. Requirements

  - MacOS X 10.3.9 or higher
  - Java 1.4.x

  or

  - Windows XP
  - Java 1.4.x

  or
  
  - CentOS Linux 4.3
  - Java 1.5.0


2. Instructions

2-1. Installation and starting KegArray

  For MacOS X
   Simply drag and drop the KegArray program file included in the disk
   image (KegArray-*.dmg file) onto your local disk.

   Start KegArray by double-clicking the KegArray icon.
   
  or

  For Windows
   Simply expand the KegArray archive file (KegArray-*_jar.zip / 
   KegArray-*_exe.zip file).
   
   Start KegArray by double clicking the KegArray bat / exe file
   (KegArray.bat / KegArray.exe).

  or
  
  For Linux (CentOS)
   Simply expand the KegArray archive file (KegArray-*.tar.gz file).
   
   Start KegArray using the following command.
    > cd [Installation Directory]
    > ./KegArray.sh


2-2. Data format of input files

2-2-1. Format for transcriptome data

  KegArray can read the data format of the EXPRESSION database
  (http://www.genome.jp/kegg/expression/) or tab-delimited text similar to
  the EXPRESSION format.

  Each EXPRESSION entry consists of a brief description of the experiment,
  reference information, and a set of intensity values or two-channel
  ratios derived from a DNA microarray.

  An example with intensity values is given below.

   -------------------------------------------------------------------------
   #organism: syn
   #ORF      x       y       Control-sig     Control-bkg     Target-sig      Target-bkg
   slr1485   1       2       1037.13         502.62          1593.30         695.25
   slr1119   1       3       1261.63         494.72          2685.37         742.87
   sll0708   1       4       922.97          561.38          1598.37         727.28
   sll1120   1       6       2152.80         560.96          2591.23         771.07
   sll1734   1       7       1918.47         574.57          5968.97         823.66
       :
       :
   -------------------------------------------------------------------------

  An example with expression ratios between the two channels is given below.

   -------------------------------------------------------------------------
   #organism: syn
   #ORF	    x	y	ratio
   slr1485	1	2	0.610282
   slr1119	1	3	2.360655
   sll0708	1	4	0.842321
   sll1120	1	5	0.769038
       :
       :
   -------------------------------------------------------------------------

  All lines beginning with the "#" character (other than the '#organism:' or
  '#source:' line) are regarded as comments and skipped by KegArray. The
  organism information is necessary to identify the ORFs, because an entry
  identifier in the KEGG GENES database is a combination of the organism code
  and the gene identifier joined by a colon (:), as in 'org:entry_id'. The
  organism must be given as the three-letter (or four-letter) organism code
  used in KEGG (e.g. 'hsa' for human and 'mmu' for mouse; the full list is
  available from http://www.genome.jp/kegg/catalog/org_list.html).

  The tab-delimited lines below the #ORF header line contain the gene
  expression profile data. The columns are defined as follows.

   - Columns for intensity values
       First column    KEGG GENES ID which is the unique identifier of 
                       the ORF in the organism.
       Second column   X-axis coordinate information of the ORF on the 
                       microarray. 
       Third column    Y-axis coordinate information of the ORF on the
                       microarray.
                       The second and third columns are used for specifying
                       the location of the ORF in the schematic of the DNA
                       microarray (ArrayImage in KegArray analysis view).
       Fourth column   Signal intensity of the control channel.
       Fifth column    Background intensity of the control channel.
       Sixth column    Signal intensity of the target channel.
       Seventh column  Background intensity of the target channel.

       When the background-subtracted intensity (4th column - 5th column,
       or 6th column - 7th column) is negative, KegArray treats it as 1.

   - Columns for ratio values
       First column    KEGG GENES ID which is the unique identifier of
                       the ORF in the organism.
       Second column   X-axis coordinate information of the ORF on the 
                       microarray. 
       Third column    Y-axis coordinate information of the ORF on the
                       microarray.
                       The second and third columns are used for specifying
                       the location of the ORF in the schematic of the DNA
                       microarray (ArrayImage in KegArray analysis view).
       Fourth column   Ratio value between control channel and target channel. 

  When there is no coordinate information (i.e. the second and third columns
  are blank), KegArray assigns the coordinates automatically.
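
  The column rules above can be sketched in a few lines of Python. This is an
  illustration only, not KegArray's own code: the function names are
  hypothetical, the second data row is made up to show the negative case, and
  fields are assumed to be separated by single tab characters.

```python
def corrected(signal, background):
    """Background-subtracted intensity; a negative result is treated as 1."""
    diff = signal - background
    return 1.0 if diff < 0 else diff


def parse_intensities(text):
    """Return (organism, rows) where rows are (gene_id, control, target)."""
    organism = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # '#organism:' carries data; other '#' lines are skipped here.
            if line.startswith("#organism:"):
                organism = line.split(":", 1)[1].strip()
            continue
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # this sketch handles the seven-column intensity format only
        if organism is None:
            raise ValueError("missing '#organism:' line before the data rows")
        gene_id = f"{organism}:{fields[0].strip()}"  # 'org:entry_id'
        control = corrected(float(fields[3]), float(fields[4]))
        target = corrected(float(fields[5]), float(fields[6]))
        rows.append((gene_id, control, target))
    return organism, rows


sample = (
    "#organism: syn\n"
    "#ORF\tx\ty\tControl-sig\tControl-bkg\tTarget-sig\tTarget-bkg\n"
    "slr1485\t1\t2\t1037.13\t502.62\t1593.30\t695.25\n"
    "xxx0001\t1\t3\t100.00\t250.00\t300.00\t100.00\n"  # made-up row, control < background
)
organism, rows = parse_intensities(sample)
```

  Because the x and y fields are read by position and then ignored, rows with
  blank coordinates still parse as long as the tab separators are present.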


2-2-2. Format for metabolome data

  Only ratio values can be used for metabolome data, as in the following
  example.

   -------------------------------------------------------------------------
   #COMPOUND	ratio
   C00668       1.2
   C00221       0.5
   C01172       2.2
   C00118       1.0
       :
       :
   -------------------------------------------------------------------------

       First column    KEGG COMPOUND ID
                       (e.g. C00668 for alpha-D-Glucose 6-phosphate)
       Second column   Relative amount of the target compound compared with the
                       control.
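
  A minimal Python sketch of reading this two-column format follows. The
  function name is hypothetical, and the ID check assumes that KEGG COMPOUND
  IDs are the letter 'C' followed by five digits, as in the example above.

```python
import re

# Assumption: a KEGG COMPOUND ID is 'C' followed by five digits (e.g. C00668).
COMPOUND_ID = re.compile(r"^C\d{5}$")


def parse_compound_ratios(text):
    """Return a dict mapping KEGG COMPOUND ID to its ratio."""
    ratios = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#COMPOUND' header
        compound_id, ratio = line.split()[:2]
        if not COMPOUND_ID.match(compound_id):
            raise ValueError(f"unexpected compound ID: {compound_id}")
        ratios[compound_id] = float(ratio)
    return ratios


sample = "#COMPOUND\tratio\nC00668\t1.2\nC00221\t0.5\nC01172\t2.2\nC00118\t1.0\n"
ratios = parse_compound_ratios(sample)
```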


2-2-3. Entry IDs from databases other than KEGG

   KegArray can convert external database IDs to KEGG GENES IDs, which are
   necessary for mapping the array data to KEGG resources such as pathway
   maps. The following example shows the case where NCBI-GIs are used in the
   first column.

   -------------------------------------------------------------------------
   #organism: bsu
   2633829     1       1       938     189     725     249
   2633830     2       1       2692    189     2253    249
   2633899     3       1       958     189     444     249
   2636068     4       1       6703    189     2533    249
       :
       :
   -------------------------------------------------------------------------

   Using the ID converter (see "ID conversion" in section 2-3), you can
   convert the NCBI-GIs to KEGG GENES IDs. Currently, the following external
   databases are supported:

   External database  Database prefix
   -----------------  ---------------
   NCBI GI            ncbi-gi
   NCBI Entrez Gene   ncbi-gene
   GenBank            gb
   UniGene            unigene
   UniProt            up
   IPI                ipi


2-2-4. Using data prepared by Microsoft Excel

  To use data prepared in Microsoft Excel with KegArray, first arrange the
  columns in the order of the KegArray format, then save the sheet as
  tab-delimited text using the "File->Save as" menu in Excel.
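
  If the spreadsheet has already been exported as CSV, the same reordering can
  be scripted. The Python sketch below is one possible approach, not part of
  KegArray: the function name and the CSV column headers are assumptions
  chosen to match the intensity example in section 2-2-1.

```python
import csv
import io

# Column order of the intensity format described in section 2-2-1.
KEGARRAY_ORDER = ["ORF", "x", "y", "Control-sig", "Control-bkg",
                  "Target-sig", "Target-bkg"]


def csv_to_kegarray(csv_text, organism):
    """Reorder CSV columns and return tab-delimited text with headers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    out.write(f"#organism: {organism}\n")
    out.write("#" + "\t".join(KEGARRAY_ORDER) + "\n")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Pick the fields by header name, whatever their order in the CSV.
        writer.writerow([row[name] for name in KEGARRAY_ORDER])
    return out.getvalue()


csv_text = (
    "Target-sig,Target-bkg,ORF,x,y,Control-sig,Control-bkg\n"
    "1593.30,695.25,slr1485,1,2,1037.13,502.62\n"
)
converted = csv_to_kegarray(csv_text, "syn")
```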


2-3. KegArray control panel

  Once you launch KegArray, you will see the KegArray control panel.
  Each function in the panel is described below.

  *Data

    There are two tabs to select the Gene/Compound or Clustering pane at the
    top of the KegArray control panel.
    In the Gene/Compound pane, you can load a data file of transcriptome
    and/or metabolome experiments and set parameters.
    In the Clustering pane, you can load several data files of transcriptome
    experiments and set an intensity threshold.

    *Gene/Compound pane

      *File:
        There are five buttons and a checkbox to display the pop-up window for
        specifying the input data file.

        [Local] button
          Open a pop-up window to select a data file on your local disk. The
          data file should comply with the format described in section
          2-2-1.

        [GenomeNet] button
          Open a pop-up window to retrieve the data stored in the GenomeNet
          EXPRESSION database. Available entry IDs are listed in the window,
          and once you select one, its description will be displayed.

        [Compound data]
          This box should be checked (default) for loading metabolome data.
          You can ignore the metabolome data after loading by unchecking it.

          [Local] button
            Open a pop-up window to select a compound data file on your local
            disk. The data file should comply with the format described in
            section 2-2-2.

      *Threshold and normalization

        *Linear pane
          There are three input boxes to specify the parameters (Ratio
          threshold for transcriptome data and metabolome data and Intensity
          threshold for transcriptome data) for the confidence lines
          discriminating the regulated genes/compounds from unregulated ones.

        *CC pane
          There are three input boxes to specify the parameters (Window length,
          Window size and Significance level) for the confidence curves
          discriminating the regulated genes from unregulated ones.

        *Cancel/Apply buttons
          Once the values in the threshold and normalization input boxes are
          modified, "Linear" and "CC" in the pane names are marked with '*',
          which means the modified values have not been applied yet. You have
          to click the "Apply" button to use the new values in subsequent
          analyses.

      *KegArray analysis view
        After loading a transcriptome data file, a window is automatically
        launched to display information on the data in four panes.

        *Statistics
          In the Statistics pane, the distributions of gene expression
          intensities and ratios are shown, which can be used for choosing
          the thresholds.

        *ArrayImage
          In the ArrayImage pane, a schematic view of the DNA microarray is
          shown. The colors of the spots represent the degree of increase or
          decrease in target gene expression relative to the control. The
          coloring scheme can be changed in the preference menu
          (KegArray > Preferences > Color). Each spot is clickable and linked
          to the corresponding KEGG GENES database entry.

        *Scatter plot (for Linear pane)
          The scatter plot of the data is shown in this pane. The colors of
          the spots represent the degree of increase or decrease in target
          gene expression relative to the control. The coloring scheme can be
          changed in the preference menu (KegArray > Preferences > Color).

          A zoomed-in view is launched by dragging over an area of interest.
          In this view, each spot is clickable and linked to the corresponding
          KEGG GENES database entry.

        *MA plot (for CC pane)
          The MA plot of the data is shown in this pane. The colors of the
          spots represent the degree of increase or decrease in target gene
          expression relative to the control. The coloring scheme can be
          changed in the preference menu (KegArray > Preferences > Color).

          A zoomed-in view is launched by dragging over an area of interest.
          In this view, each spot is clickable and linked to the corresponding
          KEGG GENES database entry.
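
          For readers unfamiliar with MA plots, the Python sketch below
          computes the two coordinates under the conventional definition
          (M is the log2 ratio, A is the mean log2 intensity). Whether
          KegArray uses exactly this definition, and which channel it takes
          as the numerator, is not stated here; target over control is an
          assumption, and the function name is hypothetical.

```python
import math


def ma_values(control, target):
    """Return (M, A) for one spot from background-corrected intensities.

    M = log2(target / control)                  (assumed direction)
    A = (log2(target) + log2(control)) / 2
    Both intensities must be positive.
    """
    m = math.log2(target / control)
    a = 0.5 * (math.log2(target) + math.log2(control))
    return m, a


m, a = ma_values(control=4.0, target=16.0)
```

          With control = 4 and target = 16, M is 2 (a fourfold increase) and
          A is 3.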

    *Clustering pane

      *Files:
        You can load the data files from your local disk or the EXPRESSION
        database.

        [Local] button
          Open a pop-up window to select data files on your local disk. Each
          data file should comply with the format described in section
          2-2-1.

        [GenomeNet] button
          Open a pop-up window to retrieve the data stored in the GenomeNet
          EXPRESSION database. Available entry IDs are listed with checkboxes
          in the window. The description of an entry is displayed when you
          select it.

        [Clustering] button
          Once you select more than one data file, this button becomes active.
          Clicking this button performs hierarchical clustering of the gene
          expression profiles constructed from the listed files.

          A tree view window is shown when the clustering is completed. You can
          change the number of clusters (1 - 6) by specifying the number in the
          input box at the top of the tree view window. Different clusters are
          shown in different colors. Clicking the [Set results] button saves
          the color coding for further analysis using the Tools section.

      *Organism
        The organism name of the input data (specified by "#organism:" header).

      *Number of files
        The number of data files used for clustering.

      *Intensity threshold
        This is used to specify the threshold for the confidence lines
        discriminating the regulated genes from the unregulated ones.
        Only the genes with the intensity value above this threshold will be
        used for clustering.

      *Clustering algorithm
        Select a clustering algorithm from the pull-down menu (currently
        only complete linkage is available).
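
        As background, complete-linkage hierarchical clustering can be
        sketched in Python as below. This is a generic textbook version, not
        KegArray's implementation: the function names are hypothetical and
        the Euclidean distance between profiles is an assumption, since the
        distance measure is not specified here.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length profiles (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def complete_linkage(profiles, n_clusters):
    """Merge clusters until n_clusters remain.

    profiles maps a gene ID to its list of expression values.
    Returns a list of sets of gene IDs.
    """
    clusters = [{gene} for gene in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: the distance between two clusters is the
                # largest distance between any pair of their members.
                d = max(euclidean(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


profiles = {
    "g1": [1.0, 1.1],
    "g2": [1.1, 1.0],
    "g3": [5.0, 5.2],
    "g4": [5.1, 5.0],
}
clusters = complete_linkage(profiles, 2)
```

        In this toy input, g1 and g2 end up in one cluster and g3 and g4 in
        the other.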

  *Tools

    *Mapping to Pathway
      Map the gene expression and/or compound profiles onto the KEGG PATHWAY
      database. By clicking the [Go] button, a list of pathways with the genes
      and/or compounds specified in the PathwayMap window is shown.

    *Mapping to BRITE
      Map the gene expression and/or compound profiles onto the KEGG BRITE
      database, a collection of hierarchical classifications representing
      our knowledge on various aspects of biological systems.
      By clicking the [Go] button, a list of BRITE hierarchy data with the
      genes specified in the BRITE window is shown.

    *Mapping to Genome map
      Map the gene expression profiles onto the genome map provided by
      GenomeNet. The coloring scheme representing gene expression profiles can 
      be changed in the preference menu (KegArray > Preferences > Color).

    *Mapping to KEGG DAS
      Map the gene expression profiles onto the genome map provided by KEGG
      DAS. The coloring scheme representing gene expression profiles can be
      changed in the preference menu (KegArray > Preferences > Color).

    Please note that only KEGG IDs are available for mapping genes/compounds
    to the genome map, pathway map and BRITE data. If you use IDs other than
    KEGG IDs, ID conversion is necessary before using these tools (see below).

    *ID conversion
      ORF IDs from databases other than KEGG can be converted to KEGG GENES
      IDs by using the ID converter provided by GenomeNet (Internet access is
      required). NCBI-GI, GenBank, IPI, NCBI-Gene, UniProt and UniGene IDs
      are supported as input.

      The conversion results can be viewed from the menu Tools > Conversion
      Table.


2-4. KegArray tool-bar menu

  *KegArray

    [About KegArray]
      Show the version and copyright of KegArray

    *Preferences

      [Color]
        In the Color pane, you can specify the coloring scheme for the gene
        expression and compound profiles and the number of color gradient
        steps.

      [Network]
        In the Network pane, you can specify the server URL of the link action
        target and the proxy server URL.

      [Conversion]
        In the Conversion pane, you can specify the server URL providing the
        database ID conversion tool (only GenomeNet is available now). The list
        of databases can be edited to reflect updates to the database list on
        the GenomeNet server.

  *File

    *Gene

     [Load data from EXPRESSION]
        Specify the entry ID of the GenomeNet EXPRESSION database.

     [Load data from local file]
        Specify the file name of the gene expression data on your local disk.

    *Compound

     [Load data from local file]
        Specify the file name of the compound and ratio data on your local
        disk.

  *Edit

    [Find]
      Search for a gene by KEGG GENES ID. The gene will be marked on the 
      ArrayImage, Scatter plot and MA plot viewers in the KegArray analysis
      view window.

  *View

    [Statistics]
      Show Statistics viewer.

    [ArrayImage]
      Show ArrayImage viewer.

    [Scatter Plot (linear)]
      Show Scatter plot viewer.

    [MA Plot (CC)]
      Show MA plot viewer.

  *Tools

    [PathwayMap]
      Map the gene expression and/or compound profiles onto the KEGG PATHWAY
      database. By clicking the [Go] button, a list of pathways with the genes
      and/or compounds specified in the PathwayMap window is shown.

    [BRITE]
      Map the gene expression and/or compound profiles onto the KEGG BRITE
      database, a collection of hierarchical classifications representing
      our knowledge on various aspects of biological systems.
      By clicking the [Go] button, a list of BRITE hierarchy data with the
      genes specified in the BRITE window is shown.

    [GenomeMap]
      Map the gene expression profiles onto the genome map provided by
      GenomeNet. The coloring scheme representing gene expression profiles
      can be changed in the preference menu (KegArray > Preferences > Color).

    [GenomeMap (KEGG DAS)]
      Map the gene expression profiles onto the genome map provided by KEGG
      DAS. The coloring scheme representing gene expression profiles can be
      changed in the preference menu (KegArray > Preferences > Color).

    [ID conversion]
      ORF IDs from databases other than KEGG can be converted to KEGG GENES
      IDs by using the ID converter provided by GenomeNet (Internet access is
      required). NCBI-GI, GenBank, IPI, NCBI-Gene, UniProt and UniGene IDs
      are supported as input.

      The conversion results can be viewed from the [Conversion Table] menu
      item described next.

    [Conversion Table]
      Show conversion result table.

    Please note that only KEGG IDs are available for mapping genes/compounds
    to the genome map, pathway map and BRITE data. If you use IDs other than
    KEGG IDs, ID conversion is necessary before using these tools.

  *List

    *Gene

      The following menu items display a pop-up table listing the regulated
      genes. The number of listed genes can be modified by specifying a value
      in the box at the upper right of the pop-up table.

      [Up-regulated (Linear)]
        List the up-regulated genes whose intensity ratios are greater than 
        the upper linear confidence line on the scatter plot.

      [Down-regulated (Linear)] 
        List the down-regulated genes whose intensity ratios are less than
        the lower linear confidence line on the scatter plot.

      [Up-regulated (CC)]
        List the up-regulated genes whose intensity ratios are greater than
        the upper confidence curve on the MA plot.

      [Down-regulated (CC)]
        List the down-regulated genes whose intensity ratios are less than
        the lower confidence curve on the MA plot.

    *Compound

      The following menu items display a pop-up table listing the regulated
      compounds. The number of listed compounds can be modified by specifying
      a value in the box at the upper right of the pop-up table.

      [Up-regulated]
        List the up-regulated compounds whose ratios are greater than the
        threshold specified in the control panel.

      [Down-regulated] 
        List the down-regulated compounds whose ratios are less than the
        threshold specified in the control panel.
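
        The selection of regulated compounds can be illustrated with the
        Python sketch below. It is a guess at the general idea rather than
        KegArray's actual rule: the function name is hypothetical, and using
        the reciprocal of the threshold as the lower cutoff is an assumption.
        The ratios are the example values from section 2-2-2.

```python
def split_regulated(ratios, threshold):
    """Return (up, down) lists of compound IDs.

    Assumption: 'up' means ratio > threshold and 'down' means
    ratio < 1 / threshold (a symmetric fold-change cutoff).
    """
    up = sorted(c for c, r in ratios.items() if r > threshold)
    down = sorted(c for c, r in ratios.items() if r < 1.0 / threshold)
    return up, down


ratios = {"C00668": 1.2, "C00221": 0.5, "C01172": 2.2, "C00118": 1.0}
up, down = split_regulated(ratios, threshold=1.5)
```

        With a threshold of 1.5, C01172 is listed as up-regulated and C00221
        as down-regulated.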


3. Limitations

3-1. Out of memory error

  Clustering a large number of expression profiles requires a large heap
  memory size, and KegArray may terminate the process with an "out of memory
  error" message. To avoid this, set the intensity threshold higher (to
  decrease the number of genes) or run KegArray from a terminal with a
  maximum heap size larger than the default (usually 64M), as follows.
    > java -Xmx<memory size>M -jar KegArray.jar
  (e.g. > java -Xmx256M -jar KegArray.jar)


4. About the License

  The KegArray license corresponds to the license of KEGG. Refer to the page
  below.
  http://www.genome.jp/kegg/legal.html

  Title and intellectual property rights in and to any content displayed by or
  accessed through this software belong to the respective content owner. Such
  content may be protected by copyright or other intellectual property laws and
  treaties, and may be subject to terms of use of the third party providing
  such content.


5. Acknowledgment

  Some charts in this product have been developed by using the JFreeChart
  libraries (http://www.jfree.org/jfreechart/).


6. Feedback

  We appreciate any suggestions and comments. Please use the GenomeNet
  feedback form at the following URL to send your comments.
  http://www.genome.jp/feedback/?category=kegtools


==Escape Clause

  THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED WARRANTIES,
  INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
  FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL KANEHISA
  LABORATORIES OR ITS STAFF BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
  PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
  OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
  WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
  OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
  ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
