DBGET (Integrated database retrieval system)
=============================================

    Laboratory of Chemical Life Science,
    Bioinformatics Center,
    Institute for Chemical Research, Kyoto University
    Uji, Kyoto 611-0011, Japan

    Feedback: https://www.genome.jp/feedback/


1. Introduction
================

  DBGET is a simple database retrieval system for finding and obtaining
  specific entries of diverse databases. Here a database is simply
  considered a sequential collection of entries, which may be stored in
  a single file or in multiple files. Most of the existing molecular
  biology databases can be treated in this simplified manner, that is,
  as so-called flat-file databases. Because each entry of a database is
  given a unique identifier, i.e., an entry name or an accession number,
  molecular biology databases in the world can be retrieved uniformly by
  the combination of the database name and the identifier.

      dbname:identifier

  Moreover, DBGET has been expanded to include the KEGG gene catalogs,
  which may also be considered flat-file databases where the combination
  of the organism name and the gene name

      organism:gene

  is used for identification.


2. Databases
===========

  DBGET is an integrated database retrieval system for the following
  biological databases. For more details, see
  https://www.genome.jp/dbget/.

  KEGG database files can be downloaded from the KEGG FTP site. For more
  details, see the KEGG FTP web site
  https://www.kegg.jp/kegg/download/.

 2-1. KEGG Databases

  pathway      path      KEGG pathways
  brite        br        Functional hierarchies
  module       md        KEGG modules
  network      ne        KEGG network
  variant      hsa_var   KEGG variant
  orthology    ko        KEGG orthology
  genome       gn        KEGG organisms
  mgenome      mgnm      Metagenomes
  genes        org code  Gene catalogs in high-quality genomes
  mgenes       -         Metagenomes Gene catalogs in metagenomes
  compound     cpd       Chemical compounds
  glycan       gl        Glycans
  reaction     rn        Chemical Reactions
  rclass       rc        Reaction class
  enzyme       ec        Enzyme nomenclature
  disease      ds        Human diseases
  drug         dr        Drugs
  dgroup       dg        Drug groups

  genomes      -         genome+mgenome
  ligand       -         compound+glycan+reaction+rclass+enzyme
  kegg         -         brite+pathway+module+network+variant+disease+dgroup+drug+orthology+genes+mgenes+genomes+ligand

  expression   ex        Gene expression profiles
                         Submitted by authors

 2-2. Other Databases

  refnuc       -         NCBI Reference Sequence Database
                         NCBI (https://www.ncbi.nlm.nih.gov/RefSeq/)
  refpep       -         NCBI Reference Sequence Database
                         NCBI (https://www.ncbi.nlm.nih.gov/RefSeq/)
  swissprot    sp        UniProt (Universal Protein Resource) protein sequence database
                         SIB/EBI (https://www.uniprot.org)
  trembl       tr        UniProt (Universal Protein Resource) protein sequence database
                         SIB/EBI (https://www.uniprot.org)
  pdb          pdb       PDB (Protein Data Bank) 3D structure database
                         RCSB (https://www.rcsb.org/)
  epd          epd       Eukaryotic promoters
                         ISREC (https://epd.vital-it.ch/)
  prosite      ps        Protein domains and families
                         ExPASy (https://prosite.expasy.org/)
  pfam         pf        Collection of protein families
                         EBI (https://pfam.xfam.org/)
  ncbi-cdd     ncbi-cdd  NCBI Conserved Domain Database
                         NCBI (https://www.ncbi.nlm.nih.gov/cdd/)
  pmd          pmd       Protein mutants
                         DDBJ (http://pmd.ddbj.nig.ac.jp/~pmd/pmd.html)
  aaindex1     aax1      Amino acid indices
                         Kyoto (https://www.genome.jp/aaindex/)
  aaindex2     aax2      Amino acid indices
                         Kyoto (https://www.genome.jp/aaindex/)
  aaindex3     aax3      Amino acid indices
                         Kyoto (https://www.genome.jp/aaindex/)
  pdbstr       pdbstr    Protein sequences generated from PDB
                         Kyoto
  carbbank     ccsd      Carbohydrate structures
                         Teikyo U / U Georgia
  prosdoc      pdoc      Prosite literature
                         ExPASy (https://prosite.expasy.org/)

  refseq       rs        refnuc+refpep
  uniprot      up        swissprot+trembl
  aaindex      aax       aaindex1+aaindex2+aaindex3
  motifdic     -         prosite+pfam+ncbi-cdd


3. Build Requirements
======================

  DBGET is developed on Linux and tested on macOS and Cygwin. It may
  also build on other UNIX-like operating systems.

3.1 Linux
----------

  3.1.1
    - Ubuntu 22.04 LTS
    - Ubuntu GNU Compiler Collection version 11.3.0 or
    - Ubuntu clang version 14.0.0

  3.1.2
    - CentOS 7.9
    - GNU Compiler Collection version 4.8.5 or
    - LLVM/Clang version 3.4.2

  3.1.3
    - Fedora 37
    - GNU Compiler Collection version 12.2.1 or
    - LLVM/Clang version 15.0.4

  3.1.4
    - Red Hat Enterprise Linux Server release 7.6 (Maipo)
    - GNU Compiler Collection version 4.8.5

3.2 macOS
----------

  3.2.1
    - macOS Ventura 13.0
    - Xcode 14.1
    - Apple clang version 14.0.0

3.3 Cygwin
----------

  3.3.1
    - Cygwin 3.3.6 on Windows 11
    - GNU Compiler Collection version 11.3.0 or
    - LLVM/Clang version 8.0.1


4. Extracting DBGET
===================

  The following directories will be created (#.## is the version
  number, e.g. dbget-6.5):

    dbget-#.##          top level directory
    dbget-#.##/bin      executables
    dbget-#.##/doc      online manual
    dbget-#.##/etc      external data files
    dbget-#.##/src      source code


5. REST tools
===========

  The REST tools connect to the GenomeNet REST server through the
  internet. The dbget-#.##/bin directory contains the following shell
  scripts:

    bgets bfinds blists blinks bconvs binfos

  All of the tools can be used without additional settings. If a command
  required by a script is not installed on your system, you need to
  install it. Examples are given in the header of each script.

  The REST tools are also provided at
  ftp://ftp.genome.jp/pub/tools/dbget/rest_tools.tar.gz.


6. Compile
===========

  "configure" automatically produces "Makefile" from Makefile.in without
  your having to specify the operating system, so the compilation is
  very simple.

    % cd dbget-#.##     Moves into the distribution top-level directory.
    % ./configure       Configures the software for your system.
    % make              Builds the binaries.

  After make, the built binaries and scripts are installed in the
  directory "dbget-#.##/bin".

  You may specify the following options [defaults in brackets] in
  configuration:

    --max-process=N     specify the maximum number of processors [1]
    --optimize=OPT      compile with optimize option [-O2]
    --cc=clang          compile with LLVM/Clang
    --rest              connect bfind, bget and btit to GenomeNet REST server
    --mcdb              create index with mcdb
    --help|--h          print this message


7. Set up
===========

 7-1. KEGG FTP subscriber

  In this example, suppose the KEGG data is downloaded under $HOME.

  Set BIOROOT by entering

  on sh, bash, zsh:
    % BIOROOT=$HOME
    % export BIOROOT
    % PATH=$PATH:$BIOROOT/bin

  on csh, tcsh:
    % setenv BIOROOT $HOME
    % setenv PATH ${PATH}:$BIOROOT/bin

    % cd $HOME
    % /bin/ls -F
    README.kegg RELEASE brite/ dbget.6.5.tar.gz genes/ ligand/ medicus/ module/ pathway/ xml/
    % tar xvzf dbget.6.5.tar.gz
    % cd dbget-6.5
    % ./install.sh

  By default, the install.sh command installs the DBGET package under
  BIOROOT.

 7-2. In the case of using GenomeNet REST API

  Run "configure" with the --rest option and then make. This option
  affects only bfind, bget and btit.

    % cd dbget-#.##
    % ./configure --rest
    % make clean
    % make

  The resulting bfind, bget and btit do not require any external files.
  These commands can be used without additional settings.

    % unset BIOROOT         (for bash/zsh)
    % unsetenv BIOROOT      (for csh/tcsh)
    % cd bin
    % ./bfind hsa alcohol dehydrogenase
    % ./bget hsa:126

 7-3. Otherwise

  First, you must define the environment variable BIOROOT to specify the
  top directory of DBGET. If you use a directory named bio in your home
  directory, BIOROOT should be ~/bio.

  (for bash/zsh)
    % BIOROOT=$HOME/bio
    % export BIOROOT
    % PATH=$PATH:$BIOROOT/bin

  (for csh/tcsh)
    % setenv BIOROOT $HOME/bio
    % setenv PATH ${PATH}:$BIOROOT/bin

  Then make DBGET available as follows:

    % mkdir -p $BIOROOT/{bin,etc,db/kegg}
    % cd ~/dbget-#.##/bin
    % cp bget bfind btit btab seqnew $BIOROOT/bin
    % cd ~/dbget-#.##/etc
        Put the taxonomy file here.
    % ./make_genestab
    % cd etc
    % cp dbtab keggtab genestab \
         databases databases-kegg databases-genes \
         services services-kegg services-genes \
         $BIOROOT/etc
    % cd $BIOROOT/db
    % cd kegg
    % mkdir brite disease drug genes genome ko ligand module network pathway
    % cd genes
        Put the flat-files of KEGG GENES here.
        If you need other species, you should save them in the same
        manner as H.influenzae. They are in lower case without filename
        extension.
    % seqnew T00001 T00001
    % bfind hin 012
    % bget hin:HI0012

  The following command may slightly speed up the search.

    % btab -p $BIOROOT/etc/dbtab -o $BIOROOT/etc/dbtab

  The following shell loops may be helpful for creating the indexes.

  (for bash/zsh)
    % for f in brite genome ko pathway module compound enzyme glycan reaction rclass disease drug dgroup network variant
    > do
    > ./seqnew $f $f
    > done
    % cd $BIOROOT/db/kegg/genes
    % for f in T0[0-9][0-9][0-9][0-9]
    > do
    > seqnew $f $f
    > done

  (for csh/tcsh)
    % foreach f (brite genome ko pathway module compound enzyme glycan reaction rclass disease drug dgroup network variant)
    foreach? ./seqnew $f $f
    foreach? end
    % cd $BIOROOT/db/kegg/genes
    % foreach f (T0[0-9][0-9][0-9][0-9])
    foreach? seqnew $f $f
    foreach? end

  If you need the online manuals, please set them up as follows:

    % mkdir -p ~/man/man{1,5}
    % cp ~/dbget-#.##/doc/*.1 ~/man/man1
    % cp ~/dbget-#.##/doc/*.5 ~/man/man5
    % export MANPATH=$MANPATH:~/man            (for bash/zsh)
    % setenv MANPATH ${MANPATH}:${HOME}/man    (for csh/tcsh)

  The online manuals can then be displayed using the man command.

    % man bget


8. FAQ
=============

  o Does DBGET work on my platform?

    This file "README" includes information about installing DBGET on
    particular platforms. The target operating systems for DBGET are
    Linux and macOS.

  o My modification to dbtab has no effect. Why?

    Is there a "dbtab.cdb" as well as "dbtab" in $BIOROOT/etc? If so,
    you must update "dbtab.cdb" by using the "btab" command. Please see
    the online manual for "btab".

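  The dbtab.cdb check in the FAQ item above can be sketched as a small
  shell function. This is only an illustration: the function name
  dbtab_cdb_status is hypothetical and not part of DBGET, and it does
  nothing more than compare the modification times of "dbtab" and
  "dbtab.cdb" in a given etc directory. It does not run btab; take the
  actual update command from the btab manual.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DBGET): report whether dbtab.cdb
# appears to be out of date relative to dbtab in the given directory.
# Prints one of: missing, no-cdb, stale, ok
dbtab_cdb_status() {
    etc_dir=$1
    if [ ! -f "$etc_dir/dbtab" ]; then
        # There is no dbtab at all in this directory.
        echo "missing"
    elif [ ! -f "$etc_dir/dbtab.cdb" ]; then
        # Only the plain-text table exists, so no index can be stale.
        echo "no-cdb"
    elif [ -n "$(find "$etc_dir/dbtab" -newer "$etc_dir/dbtab.cdb")" ]; then
        # dbtab was modified after dbtab.cdb was last written.
        echo "stale"
    else
        echo "ok"
    fi
}
```

  For example, calling dbtab_cdb_status "$BIOROOT/etc" and getting
  "stale" suggests that dbtab was edited after dbtab.cdb was last
  rebuilt, which is the situation the FAQ item describes.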
==================================================================
Last update: 2022/12/08
------------------------------------------------------------------
If you have problems building DBGET, please use the feedback form.
Feedback: https://www.genome.jp/feedback/
==================================================================