How to Use DBGET/LinkDB

DBGET and LinkDB

DBGET is a simple database retrieval system for a diverse range of molecular biology databases. Here a database is considered simply as a set of entries, which may be stored in a single file or in multiple files. Most existing molecular biology databases can be treated in this simplified manner, that is, as so-called flat-file databases. Our definition of flat-file is not limited to text data; it also includes other types of data such as images for KEGG pathways, Java graphics for genome maps and expression profiles, and 3D graphics for protein structures. This is accomplished by treating a collection of HTML files as a database.

Because each entry of a database is given a unique identifier, i.e., an entry name or an accession number, the molecular biology databases in the world can be retrieved uniformly by specifying the combination of the database name and the identifier, in the form dbname:identifier.

The KEGG gene catalogs are also treated as flat-file databases, where the combination of the organism name and the gene name is used for identification.
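As an illustration of the dbname:identifier convention, the following Python sketch splits such a reference into its two parts. It is not part of DBGET: the names EntryRef and parse_entry_ref and the entry exampledb:ENTRY001 are hypothetical, and splitting on the first colon is an assumption made for this example.

```python
from typing import NamedTuple


class EntryRef(NamedTuple):
    """A database entry reference (hypothetical helper type)."""
    dbname: str
    identifier: str


def parse_entry_ref(text: str) -> EntryRef:
    """Split 'dbname:identifier' on the first colon (assumed convention)."""
    dbname, sep, identifier = text.strip().partition(":")
    if not sep or not dbname or not identifier:
        raise ValueError(f"expected dbname:identifier, got {text!r}")
    return EntryRef(dbname, identifier)


if __name__ == "__main__":
    # exampledb:ENTRY001 is a made-up reference used only for illustration.
    print(parse_entry_ref("exampledb:ENTRY001"))
```

The same function would apply to a gene catalog reference, with the organism name taking the place of the database name.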

It is a common practice to cross-reference related data among different databases. Thus, the molecular biology databases in the world form a web of data and data links, which is a huge graph object like the WWW. DBGET has powerful capabilities to search against this graph object, which is stored in the LinkDB database.

LinkDB is a database of links, each of which is represented as a binary relation in the form of:

LinkDB contains all cross-reference links, both original and reverse, extracted from all the databases in DBGET. Furthermore, LinkDB provides additional links, called equivalent links, which represent the relationship between the same objects in different databases. Thus, the links in LinkDB are of the following three types: original, reverse, and equivalent.
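The three link types can be sketched in Python as follows. This is only an in-memory illustration under stated assumptions, not the actual LinkDB storage format: the function names build_links and related are hypothetical, the entry names are made up, and storing equivalent links in both directions is a choice made for this example.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]
Link = Tuple[str, str, str]  # (source entry, target entry, link type)


def build_links(original: Iterable[Pair], equivalent: Iterable[Pair]) -> Set[Link]:
    """Derive a typed link set from original and equivalent pairs."""
    links: Set[Link] = set()
    for src, dst in original:
        links.add((src, dst, "original"))
        # A reverse link is the original link with its direction swapped.
        links.add((dst, src, "reverse"))
    for a, b in equivalent:
        # Assumption: equivalence is symmetric, so keep both directions.
        links.add((a, b, "equivalent"))
        links.add((b, a, "equivalent"))
    return links


def related(links: Set[Link], entry: str) -> List[str]:
    """Return all entries reachable from `entry` by any link type."""
    return sorted({dst for src, dst, _ in links if src == entry})


if __name__ == "__main__":
    links = build_links(
        original=[("dbA:X1", "dbB:Y1")],    # made-up cross-reference
        equivalent=[("dbB:Y1", "dbC:Z1")],  # made-up equivalence
    )
    print(related(links, "dbB:Y1"))
```

In this sketch, dbB:Y1 is related to dbA:X1 through a reverse link and to dbC:Z1 through an equivalent link, even though dbB:Y1 itself cites neither.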

Basic command line search in DBGET/LinkDB

DBGET/LinkDB has three basic commands (or three basic modes in the Web version), bfind, bget, and blink, to search and extract database entries. bget retrieves the database entries specified by the combination dbname:identifier. bfind searches for entries by keywords. One notable feature of DBGET is that no keyword indexing is performed when a database is installed or updated. Instead, selected fields are extracted and stored in separate files for bfind searches, which allows rapid database updates. The target fields are documented separately.

Once entries of interest are found, blink, which is the LinkDB search, can be used to retrieve related entries in a given database or in all databases in GenomeNet. Related entries are found not only by the original links but also by the reverse and equivalent links. In addition to finding links for a single entry or multiple entries, blink can find links for all entries in a database. This database-to-database link capability is especially useful when a local database is to be included in the web of molecular biology databases. Suppose a new genome is completely sequenced and each ORF is linked to an entry of an existing database by sequence similarity search. By combining the binary relation file containing the ORF to dbname:identifier links with, for example, the existing links between that database and KEGG PATHWAY, it is possible to map ORFs onto KEGG pathway maps.
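The ORF-to-pathway mapping described above amounts to composing two binary relations. The following Python sketch shows the idea under stated assumptions: the function name compose and all entry names are hypothetical, and the relations are held in memory as pairs rather than read from actual binary relation files.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def compose(first: Iterable[Pair], second: Iterable[Pair]) -> Set[Pair]:
    """Join (a, b) pairs with (b, c) pairs to obtain (a, c) pairs."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for mid, dst in second:
        index[mid].add(dst)
    result: Set[Pair] = set()
    for src, mid in first:
        # Entries with no outgoing link in `second` contribute nothing.
        for dst in index.get(mid, ()):
            result.add((src, dst))
    return result


if __name__ == "__main__":
    # Made-up data: ORFs linked to database entries by similarity search,
    # and database entries linked to pathway maps.
    orf_to_entry = [("neworg:ORF1", "dbA:X1"), ("neworg:ORF2", "dbA:X2")]
    entry_to_pathway = [("dbA:X1", "path:map1"), ("dbA:X1", "path:map2")]
    for pair in sorted(compose(orf_to_entry, entry_to_pathway)):
        print(pair)
```

ORFs whose matched entry has no pathway link, such as neworg:ORF2 here, simply drop out of the result.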

The primary purpose of molecular biology databases is to collect and store factual data, and the associated text information is not always complete or appropriate. GenomeNet Bioinformatics Tools, including sequence similarity searches by BLAST and FASTA, sequence motif searches by MOTIF, and biological searches in KEGG, are all linked to the DBGET system on the web. Please refer to "URLs for making DBGET/KEGG queries" for linking on the web.

Database names

Each database (or organism) has a full name and an abbreviation as shown in:

In addition, generic names (compound database names) are predefined to facilitate search against similar databases, such as:

Advanced search by bfind

At the Unix command level, the bfind command takes the following form:

Here expression is what is entered in the search box in the Web version, and it may contain Boolean operators. The default search is case-insensitive, and it behaves as if wildcard characters were added to both ends of the given keyword. To restrict the query, use the following options:

When two keywords are given, the default search identifies entries that contain both keywords. In other words, the default is an AND search. To modify this condition, use the following Boolean operators:

To search for a block of keywords in sequence, use double quotes:

Without double quotes, the search is made for separate keywords with the AND operator. You can also use parentheses to specify the priority of evaluation. For example:

Parentheses should be escaped by a backslash (\) in the command line mode.
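The default matching behavior described above (case-insensitive, substring match, AND across keywords, double quotes for a phrase) can be approximated by the Python sketch below. It is not the bfind implementation: the function name matches is hypothetical, shlex.split is used here only as a convenient way to honor double quotes, and Boolean operators, parentheses, and restriction options are deliberately left out.

```python
import shlex


def matches(expression: str, text: str) -> bool:
    """Return True if every keyword or quoted phrase occurs in `text`."""
    # shlex.split keeps a double-quoted phrase together as one term.
    terms = shlex.split(expression)
    if not terms:
        return False
    haystack = text.lower()
    # Substring test: behaves as if wildcards surround each term.
    return all(term.lower() in haystack for term in terms)


if __name__ == "__main__":
    definition = "Human protein kinase"   # made-up entry text
    print(matches("kinase human", definition))      # separate keywords, AND
    print(matches('"kinase protein"', definition))  # phrase, order matters
```

With separate keywords the order does not matter, whereas the quoted phrase must occur in the text exactly in the given sequence.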

Advanced retrieval by bget

The bget command syntax is the following:

where the marked items may be entered in the DBGET search box when the bget mode is selected. Thus, more than one entry may be specified in the search box, and if the second form is used, entries in different databases may be retrieved. A most useful command option is: which is to obtain only the sequence data in the FASTA format. When an entry contains multiple sequences, such as in PDB and GENES, use to select the sequence, where # is the sequence number, or it can be a for the amino acid sequence and n for the nucleotide sequence in the GENES database.

The -f option can also be used to obtain chemical structure information in an MDL/MOL file format or a KCF (KEGG Chemical Function) format from the COMPOUND and DRUG databases:

Brief History of DBGET and LinkDB

Links and References

  1. Kanehisa, M.I.; Los Alamos sequence analysis package for nucleic acids and proteins. Nucleic Acids Res. 10, 183-196 (1982).
  2. Kanehisa, M., Klein, P., Greif, P., and DeLisi, C.; Computer analysis and structure prediction of nucleic acids and proteins. Nucleic Acids Res. 12, 417-428 (1984).
  3. Fujibuchi, W., Goto, S., Migimatsu, H., Uchiyama, I., Ogiwara, A., Akiyama, Y., and Kanehisa, M.; DBGET/LinkDB: an integrated database retrieval system. Pacific Symp. Biocomputing 1998, 683-694 (1997).
  4. Kanehisa, M.; Linking databases and organisms: GenomeNet resources in Japan. Trends Biochem Sci. 22, 442-444 (1997).

Created: 21 August 1995
Last Updated: 9 March 2012