DBGET (Integrated database retrieval system)
=============================================

Laboratory of Chemical Life Science, Bioinformatics Center, 
Institute for Chemical Research, Kyoto University
Uji, Kyoto 611-0011, Japan
Feedback: https://www.genome.jp/feedback/

1. Introduction
================

DBGET is a simple database retrieval system for finding and obtaining specific entries of
diverse databases. Here a database is simply considered a sequential collection of entries,
which may be stored in a single file or multiple files. Most of the existing molecular biology
databases can be treated in this simplified manner, or as so-called flat-file databases.
Because each entry of a database is given a unique identifier, i.e., an entry name or an
accession number, molecular biology databases in the world can be retrieved uniformly by the
combination of the database name and the identifier. 

      dbname:identifier

Moreover, DBGET has been expanded to include the KEGG gene catalogs, which may also be
considered flat-file databases where the combination of the organism name and the gene name 

      organism:gene

is used for identification. 
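
As a minimal sketch of how such identifiers might be handled in a shell script,
the function below splits a dbname:identifier (or organism:gene) string at the
first colon. The name split_entry is hypothetical and is not part of DBGET.

```shell
# split_entry: hypothetical helper, not part of DBGET.
# Prints the database (or organism) name and the identifier, separated by a space.
split_entry() {
    case "$1" in
        ?*:?*) ;;                           # need text on both sides of a colon
        *) echo "expected dbname:identifier, got: $1" >&2
           return 1 ;;
    esac
    printf '%s %s\n' "${1%%:*}" "${1#*:}"   # split at the first colon
}

split_entry hsa:126       # prints "hsa 126"
split_entry hin:HI0012    # prints "hin HI0012"
```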

2. Databases
===========

DBGET is an integrated database retrieval system for the following biological databases.
For more details, see https://www.genome.jp/dbget/.

KEGG database files can be downloaded from the KEGG FTP site.
For more details, see the KEGG FTP web site https://www.kegg.jp/kegg/download/.

2-1. KEGG Databases

   pathway     path      KEGG pathways
   brite       br        Functional hierarchies
   module      md        KEGG modules
   network     ne        KEGG network
   variant     hsa_var   KEGG variant
   orthology   ko        KEGG orthology
   genome      gn        KEGG organisms
   mgenome     mgnm      Metagenomes
   genes       org code  Gene catalogs in high-quality genomes
   mgenes      -         Metagenomes Gene catalogs in metagenomes
   compound    cpd       Chemical compounds
   glycan      gl        Glycans
   reaction    rn        Chemical Reactions
   rclass      rc        Reaction class
   enzyme      ec        Enzyme nomenclature
   disease     ds        Human diseases
   drug        dr        Drugs
   dgroup      dg        Drug groups
   genomes     -         genome+mgenome
   ligand      -         compound+glycan+reaction+rclass+enzyme
   kegg        -         brite+pathway+module+network+variant+disease+dgroup+drug+orthology+genes+mgenes+genomes+ligand
   expression  ex        Gene expression profiles (submitted by authors)

2-2. Other Databases
refnuc	-	NCBI Reference Sequence Database	NCBI (https://www.ncbi.nlm.nih.gov/RefSeq/)
refpep	-	NCBI Reference Sequence Database	NCBI (https://www.ncbi.nlm.nih.gov/RefSeq/)
swissprot	sp	UniProt (Universal Protein Resource) protein sequence database	SIB/EBI (https://www.uniprot.org)
trembl	tr	UniProt (Universal Protein Resource) protein sequence database	SIB/EBI (https://www.uniprot.org)
pdb	pdb	PDB (Protein Data Bank) 3D structure database	RCSB (https://www.rcsb.org/)
epd	epd	Eukaryotic promoters	ISREC (https://epd.vital-it.ch/)
prosite	ps	Protein domains and families	ExPASy (https://prosite.expasy.org/)
pfam	pf	Collection of protein families	EBI (https://pfam.xfam.org/)
ncbi-cdd	ncbi-cdd	NCBI Conserved Domain Database	NCBI (https://www.ncbi.nlm.nih.gov/cdd/)
pmd	pmd	Protein mutants	DDBJ (http://pmd.ddbj.nig.ac.jp/~pmd/pmd.html)
aaindex1	aax1	Amino acid indices	Kyoto (https://www.genome.jp/aaindex/)
aaindex2	aax2	Amino acid indices	Kyoto (https://www.genome.jp/aaindex/)
aaindex3	aax3	Amino acid indices	Kyoto (https://www.genome.jp/aaindex/)
pdbstr	pdbstr	Protein sequences generated from PDB	Kyoto
carbbank	ccsd	Carbohydrate structures	Teikyo U /U Georgia
prosdoc	pdoc	Prosite literature	ExPASy (https://prosite.expasy.org/)
refseq	rs	refnuc+refpep
uniprot	up	swissprot+trembl
aaindex	aax	aaindex1+aaindex2+aaindex3
motifdic	-	prosite+pfam+ncbi-cdd
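
Several names in the tables above are composites of other databases
(for example, refseq is refnuc+refpep). The sketch below shows one way a
script might expand such a name into its members. The function expand_db is
hypothetical and not part of DBGET; the long "kegg" composite is omitted for
brevity.

```shell
# expand_db: hypothetical helper, not part of DBGET.
# Maps a composite database name from the tables above to its members.
expand_db() {
    case "$1" in
        genomes)  echo "genome mgenome" ;;
        ligand)   echo "compound glycan reaction rclass enzyme" ;;
        refseq)   echo "refnuc refpep" ;;
        uniprot)  echo "swissprot trembl" ;;
        aaindex)  echo "aaindex1 aaindex2 aaindex3" ;;
        motifdic) echo "prosite pfam ncbi-cdd" ;;
        *)        echo "$1" ;;          # not a composite: return unchanged
    esac
}

expand_db uniprot    # prints "swissprot trembl"
expand_db pathway    # prints "pathway"
```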

3. Build Requirements
======================

DBGET is developed on Linux and tested on macOS and Cygwin.
It may also build on other UNIX-like operating systems.

3.1 Linux
----------
3.1.1
  - Ubuntu 22.04 LTS
  - Ubuntu GNU Compiler Collection version 11.3.0
   or
  - Ubuntu clang version 14.0.0
3.1.2
  - CentOS 7.9
  - GNU Compiler Collection version 4.8.5
   or
  - LLVM/Clang version 3.4.2
3.1.3
  - Fedora 37
  - GNU Compiler Collection version 12.2.1
   or
  - LLVM/Clang version 15.0.4
3.1.4
  - Red Hat Enterprise Linux Server release 7.6 (Maipo)
  - GNU Compiler Collection version 4.8.5

3.2 macOS
----------
3.2.1
  - macOS Ventura 13.0
  - Xcode 14.1
  - Apple clang version 14.0.0

3.3 Cygwin
----------
3.3.1
  - 3.3.6 on Windows 11
  - GNU Compiler Collection version 11.3.0
   or
  - LLVM/Clang version 8.0.1
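
Before building, it may help to check which compiler is installed. The
following is a minimal sketch that reports the first of gcc or clang found
on PATH; first_available is a hypothetical helper and not part of DBGET.

```shell
# first_available: hypothetical helper, not part of DBGET.
# Prints the first command name among its arguments that is found on PATH.
first_available() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            return 0
        fi
    done
    return 1
}

if cc=$(first_available gcc clang); then
    "$cc" --version | head -n 1     # compare with the versions listed above
else
    echo "no C compiler found; install GCC or Clang first" >&2
fi
```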

4. Extracting DBGET
===================

The following directories will be created
(#.## is the version number, e.g., dbget-6.5):

   dbget-#.## 			top level directory
   dbget-#.##/bin		executables
   dbget-#.##/doc		online manual
   dbget-#.##/etc		external data files
   dbget-#.##/src		source code
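
A quick way to confirm that extraction produced this layout is sketched
below. The function missing_subdirs is hypothetical and not part of DBGET; it
only checks for the four subdirectories listed above.

```shell
# missing_subdirs: hypothetical helper, not part of DBGET.
# Prints each expected subdirectory that is absent from the given top-level directory.
missing_subdirs() {
    top=$1
    for sub in bin doc etc src; do
        [ -d "$top/$sub" ] || echo "$sub"
    done
}

# Example (directory name assumed from the version used later in this README):
#   missing_subdirs dbget-6.5
```

No output means that all four subdirectories are present.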


5. REST tools
===========

The REST tools connect to the GenomeNet REST server through the internet.
The dbget-#.##/bin directory contains the following shell scripts:

   bgets
   bfinds
   blists
   blinks
   bconvs
   binfos

All of the tools can be used without additional settings.
If a required command is not installed on your system, you need to install it.
Examples are given in the header of each script.

The REST tools are also provided at 
ftp://ftp.genome.jp/pub/tools/dbget/rest_tools.tar.gz.


6. Compile
===========

Running "configure" automatically generates "Makefile" from Makefile.in
without the operating system having to be specified, so compilation is very simple.

   % cd dbget-#.##    Moves into the distribution toplevel directory.
   % ./configure      Configures the software for your system.
   % make             Builds the binaries. 

After make, the built binaries and scripts are placed in the directory
"dbget-#.##/bin".


You may specify the following options [defaults in brackets] in configuration:

  --max-process=N     specify the number of max processor [1]
  --optimize=OPT      compile with optimize option [-O2]
  --cc=clang          compile with LLVM/Clang
  --rest              connect bfind, bget and btit to GenomeNet REST server
  --mcdb              create index with mcdb
  --help|--h          print this message


7. Set up
===========

7-1. KEGG FTP subscriber
In this example, suppose the KEGG data has been downloaded under $HOME.
Set BIOROOT and PATH by entering

   on sh, bash, zsh:
      % BIOROOT=$HOME
      % export BIOROOT
      % PATH=$PATH:$BIOROOT/bin
   on csh, tcsh:
      % setenv BIOROOT $HOME
      % setenv PATH ${PATH}:$BIOROOT/bin

   % cd $HOME
   % /bin/ls -F
   README.kegg  RELEASE  brite/  dbget.6.5.tar.gz  genes/  ligand/  medicus/  module/  pathway/  xml/
   % tar xvzf dbget.6.5.tar.gz
   % cd dbget-6.5
   % ./install.sh

By default, the install.sh command installs the DBGET package under BIOROOT.
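
The environment settings above can be checked with a short script. The
following is a minimal sketch; path_contains is a hypothetical helper and not
part of DBGET.

```shell
# path_contains: hypothetical helper, not part of DBGET.
# Succeeds if the colon-separated list $1 has $2 as one of its components.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if [ -z "${BIOROOT:-}" ]; then
    echo "BIOROOT is not set" >&2
elif ! path_contains "$PATH" "$BIOROOT/bin"; then
    echo "$BIOROOT/bin is not in PATH" >&2
fi
```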

7-2. In the case of using GenomeNet REST API
Run "configure" with the --rest option, then run make.
This option affects only bfind, bget and btit.

   % cd dbget-#.##
   % ./configure --rest
   % make clean
   % make

The created bfind, bget and btit do not require any external files.
These commands can be used without additional settings.

   % unset BIOROOT (for bash/zsh)

   % unsetenv BIOROOT (for csh/tcsh)

   % cd bin 
   % ./bfind hsa alcohol dehydrogenase
   % ./bget hsa:126

7-3. Otherwise
First, you must define the environment variable BIOROOT to specify
the top directory of DBGET.
For example, if you use the directory bio in your home directory, BIOROOT should be ~/bio.

(for bash/zsh)
   % BIOROOT=$HOME/bio
   % export BIOROOT
   % PATH=$PATH:$BIOROOT/bin

(for csh/tcsh)
   % setenv BIOROOT $HOME/bio
   % setenv PATH ${PATH}:$BIOROOT/bin

And then make DBGET available as follows:

   % mkdir -p $BIOROOT/{bin,etc,db/kegg}
   % cd ~/dbget-#.##/bin 
   % cp bget bfind btit btab seqnew $BIOROOT/bin 
   % cd ~/dbget-#.##/etc 

Put the taxonomy file here.

   % ./make_genestab
   % cd etc 
   % cp dbtab keggtab genestab \
   databases databases-kegg databases-genes \
   services services-kegg services-genes \
   $BIOROOT/etc

   % cd $BIOROOT/db 
   % cd kegg
   % mkdir brite disease drug genes genome ko ligand module network pathway
   % cd genes 

Put the flat-files of KEGG GENES here.

If you need other species, save their files in the same manner as for
H. influenzae. The file names are in lower case without a filename extension.

   % seqnew T00001 T00001
   % bfind hin 012
   % bget hin:HI0012

The following command may slightly speed up the search.

   % btab -p $BIOROOT/etc/dbtab -o $BIOROOT/etc/dbtab 

The following shell loops may be helpful for creating the indexes.

(for bash/zsh)
   % for f in brite genome ko pathway module compound enzyme glycan reaction rclass disease drug dgroup network variant
   > do
   > ./seqnew $f $f
   > done
   % cd $BIOROOT/db/kegg/genes
   % for f in T0[0-9][0-9][0-9][0-9]
   > do
   > seqnew $f $f
   > done

(for csh/tcsh)
   % foreach f (brite genome ko pathway module compound enzyme glycan reaction rclass disease drug dgroup network variant)
   foreach? ./seqnew $f $f
   foreach? end
   % cd $BIOROOT/db/kegg/genes
   % foreach f (T0[0-9][0-9][0-9][0-9])
   foreach? seqnew $f $f
   foreach? end
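
The loops above stop reporting nothing if a file is missing. As a minimal
sketch, the wrapper below skips missing files and reports failures instead.
The function index_each is hypothetical and not part of DBGET; it takes the
indexing command as its first argument and calls it in the same
"seqnew $f $f" form used above.

```shell
# index_each: hypothetical wrapper, not part of DBGET.
# Usage: index_each INDEXER FILE...
# Runs "INDEXER FILE FILE" for every FILE that exists, mirroring the
# "seqnew $f $f" calls above; returns non-zero if any file was missing or failed.
index_each() {
    indexer=$1
    shift
    rc=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "skip: $f (no such file)" >&2
            rc=1
            continue
        fi
        "$indexer" "$f" "$f" || { echo "failed: $f" >&2; rc=1; }
    done
    return "$rc"
}

# Example, run in $BIOROOT/db/kegg/genes:
#   index_each seqnew T0[0-9][0-9][0-9][0-9]
```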

If you need the online manuals, set them up as follows:

   % mkdir -p ~/man/man{1,5}
   % cp ~/dbget-#.##/doc/*.1 ~/man/man1
   % cp ~/dbget-#.##/doc/*.5 ~/man/man5

   % export MANPATH=$MANPATH:~/man (for bash/zsh)

   % setenv MANPATH ${MANPATH}:${HOME}/man (for csh/tcsh)

The online manuals can then be displayed with the man command.

   % man bget


8. FAQ
=============

o Does DBGET work on my platform?

  This "README" file includes information about installing DBGET on
  particular platforms. The target operating systems for DBGET are
  Linux and macOS.

o Modifying dbtab has no effect. Why?

  Is there a "dbtab.cdb" as well as a "dbtab" in $BIOROOT/etc?
  If so, you must update "dbtab.cdb" with the "btab" command.
  Please see the online manual for "btab".
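
  A stale "dbtab.cdb" can be detected by comparing modification times. The
  following is a minimal sketch; cdb_is_stale is a hypothetical helper and
  not part of DBGET, and it assumes that "dbtab.cdb" should never be older
  than "dbtab".

```shell
# cdb_is_stale: hypothetical helper, not part of DBGET.
# Succeeds if DIR/dbtab.cdb exists and DIR/dbtab is newer than it.
# The -nt file test is widely supported (bash, zsh, ksh, dash).
cdb_is_stale() {
    [ -f "$1/dbtab.cdb" ] && [ "$1/dbtab" -nt "$1/dbtab.cdb" ]
}

if cdb_is_stale "${BIOROOT:-.}/etc"; then
    echo "dbtab.cdb is older than dbtab; regenerate it with btab" >&2
fi
```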

==================================================================
Last update: 2022/12/08

------------------------------------------------------------------
If you have problems building DBGET, please use the feedback form.
Feedback: https://www.genome.jp/feedback/
==================================================================