DBGET is a simple database retrieval system for a diverse range of molecular biology databases. Here a database is considered simply as a set of entries, which may be stored in a single file or in multiple files. Most existing molecular biology databases can be treated in this simplified manner, that is, as so-called flat-file databases. Our definition of flat-file is not limited to text data; it also includes other types of data such as GIF images for KEGG pathways, Java graphics for genome maps and expression profiles, and 3D graphics for protein structures. This is accomplished by treating a collection of HTML files as a database.
Because each entry of a database is given a unique identifier (an entry name or an accession number), any entry in the world's molecular biology databases can be retrieved uniformly by specifying the combination of the database name and the identifier, written as dbname:identifier.
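As an illustration of this addressing scheme, the following minimal sketch splits a dbname:identifier string into its two parts. The `EntryRef` type, the `parse_entry_ref` function, and the identifier `EXAMPLE_ID` are hypothetical names introduced only for this example; they are not part of DBGET.

```python
from typing import NamedTuple


class EntryRef(NamedTuple):
    """A database entry reference (hypothetical helper type)."""
    dbname: str
    identifier: str


def parse_entry_ref(text: str) -> EntryRef:
    """Split 'dbname:identifier' at the first colon."""
    dbname, sep, identifier = text.strip().partition(":")
    if not sep or not dbname or not identifier:
        raise ValueError(f"expected 'dbname:identifier', got {text!r}")
    return EntryRef(dbname, identifier)


# 'EXAMPLE_ID' is a placeholder, not a real entry name.
ref = parse_entry_ref("swissprot:EXAMPLE_ID")
print(ref.dbname, ref.identifier)
```

Splitting at the first colon only means that an identifier may itself contain colons; whether real identifiers ever do is not assumed here.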
It is now common practice to cross-reference related data among different databases. Thus, the world's molecular biology databases form a web of data and data links, which is a huge graph object like the WWW. DBGET has powerful capabilities for searching this graph object, which is stored in the LinkDB database.
LinkDB is a database of links, each of which is represented as a binary relation in the form of:
The architecture of the DBGET system is illustrated in the figure. DBGET has three basic commands (or three basic modes in the Web version), bfind, bget, and blink, to search and extract database entries. bget retrieves database entries specified by the combination dbname:identifier. bfind searches for entries by keywords. One notable feature of DBGET, which distinguishes it from Entrez, SRS, and other text search systems, is that no keyword indexing is performed when a database is installed or updated. Instead, selected fields are extracted and stored in separate files for bfind searches. This is an advantage for rapid database updates, but sometimes a disadvantage for elaborate searching. To supplement bfind, the full text search STAG is provided by GenomeNet. The primary purpose of molecular biology databases is to collect and store factual data, and the associated text information is not always complete or appropriate. As illustrated in the figure, sequence similarity searches by BLAST and FASTA, sequence motif searches by MOTIF, and biological searches in KEGG are all linked to the DBGET system.
Once entries of interest are found, blink, which is the LinkDB search, can be used to retrieve related entries in a given database or in all databases in GenomeNet. Related entries are found not only through the original links but also through reverse links and indirect links. How indirect links are computed is defined in the link table for each database, which may be edited if necessary. In previous versions of LinkDB, indirect links were precomputed and stored in the database. Currently, indirect links are computed on the fly using a data structure called a suffix array.
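The three kinds of links can be sketched with a small in-memory model. This is only an illustration: it uses plain dictionary indexes rather than the suffix array that LinkDB uses, and the link data, the function names, and the idea of describing an indirect link as a list of database names to pass through are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical links; every identifier below is a placeholder.
LINKS = [
    ("genbank:G1", "swissprot:S1"),
    ("genbank:G2", "swissprot:S1"),
    ("swissprot:S1", "pdb:X1"),
]


def build_indexes(links):
    """Index each (source, target) pair in both directions."""
    forward = defaultdict(set)   # original links
    reverse = defaultdict(set)   # reverse links
    for source, target in links:
        forward[source].add(target)
        reverse[target].add(source)
    return forward, reverse


def neighbors(entry, forward, reverse):
    """Entries reachable in one step, following a link in either direction."""
    return forward.get(entry, set()) | reverse.get(entry, set())


def follow_path(start, path, forward, reverse):
    """Indirect links: step through the databases named in `path`, in order."""
    current = {start}
    for dbname in path:
        current = {
            n
            for entry in current
            for n in neighbors(entry, forward, reverse)
            if n.split(":", 1)[0] == dbname
        }
    return current


forward, reverse = build_indexes(LINKS)
print(sorted(forward["genbank:G1"]))      # original links
print(sorted(reverse["swissprot:S1"]))    # reverse links
print(sorted(follow_path("genbank:G1", ["swissprot", "pdb"], forward, reverse)))
```

Computing indirect links at query time, as in `follow_path`, avoids storing every derived pair, which is the trade-off the text describes between precomputed and on-the-fly indirect links.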
In addition to finding links for a single entry or multiple entries, blink can find links for all entries in a database. This database-to-database link capability is especially useful when a local database is to be included in the web of molecular biology databases. Suppose, for example, that a new genome is completely sequenced and each ORF is linked to an entry in the existing databases by sequence similarity search. By uploading the binary relation file containing ORF to dbname:identifier links, and defining paths for indirect links, it is possible, for example, to assign EC numbers to ORFs or to map ORFs onto KEGG pathways.
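The ORF example amounts to composing two binary relations. The sketch below assumes a file layout of two whitespace-separated columns per line, which is a guess made for illustration and not a documented upload format; the `ec` database name, the ORF names, and the protein identifiers are likewise placeholders.

```python
import io
from collections import defaultdict


def read_relations(handle):
    """Read one link per line: two whitespace-separated columns (assumed layout)."""
    relation = defaultdict(set)
    for line in handle:
        fields = line.split()
        if line.startswith("#") or len(fields) != 2:
            continue  # skip comments and malformed lines
        relation[fields[0]].add(fields[1])
    return relation


def compose(first, second):
    """Relational composition: a -> c whenever a -> b in first and b -> c in second."""
    result = {}
    for a, bs in first.items():
        cs = set()
        for b in bs:
            cs |= second.get(b, set())
        if cs:
            result[a] = cs
    return result


# Hypothetical data standing in for an uploaded file and for existing links.
orf_file = io.StringIO(
    "orf0001\tswissprot:S1\n"
    "orf0002\tswissprot:S2\n"
    "orf0003\tswissprot:S9\n"
)
ec_file = io.StringIO(
    "swissprot:S1\tec:1.1.1.1\n"
    "swissprot:S2\tec:2.7.1.1\n"
)

orf_to_protein = read_relations(orf_file)
protein_to_ec = read_relations(ec_file)
orf_to_ec = compose(orf_to_protein, protein_to_ec)

for orf in sorted(orf_to_ec):
    print(orf, sorted(orf_to_ec[orf]))
```

An ORF whose similarity hit has no EC link, such as `orf0003` here, simply drops out of the composed relation.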
Each database (or organism) has a full name and an abbreviation as shown in:
| Fixed release | Abbreviation | Cumulative updates | Abbreviation | Both | Abbreviation |
| genbank | gb | genbank-upd | gbu | genbank-today | gbt |
| embl | emb | embl-upd | embu | embl-today | embt |
| swissprot | sp | swissprot-upd | spu | swissprot-today | spt |
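A client that accepts either form of a database name could normalize it with a lookup built from the table above. This is a minimal sketch; `resolve_dbname` is a hypothetical helper, and only the names listed in the table are covered.

```python
# Abbreviation -> full name, copied from the table above.
ABBREVIATIONS = {
    "gb": "genbank", "gbu": "genbank-upd", "gbt": "genbank-today",
    "emb": "embl", "embu": "embl-upd", "embt": "embl-today",
    "sp": "swissprot", "spu": "swissprot-upd", "spt": "swissprot-today",
}
FULL_NAMES = set(ABBREVIATIONS.values())


def resolve_dbname(name: str) -> str:
    """Return the full database name for a full name or an abbreviation."""
    if name in FULL_NAMES:
        return name
    if name in ABBREVIATIONS:
        return ABBREVIATIONS[name]
    raise KeyError(f"unknown database name: {name!r}")


print(resolve_dbname("gbu"))
print(resolve_dbname("swissprot"))
```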
The Web version of DBGET offers a choice between the bfind mode (default) and the bget mode. In the bfind mode, keywords and optional characters can be entered in the search box, and you get a list of entries that contain matching keywords. Selecting an entry name in the list retrieves the database entry. When you already know the entry name (or the primary accession number or the primary gene name) of interest, it is much faster to switch to the bget mode and enter the entry name (or the accession number or the gene name) in the search box. Once you have an entry in either mode, you can retrieve related entries in different databases by clicking on marked items. To obtain all related entries at once, click on LinkDB at the top line or on the marked entry name. This invokes the LinkDB search, that is, the blink mode search.
The default file searched in the bfind mode contains the title description, which is derived from the entry name, primary accession, gene names and synonyms (GENES database only), and definition (or title) fields. The field names in the major databases are as follows.
| Database | Entry name | Accession | Gene names | Definition or title |
| GenBank/RefSeq | LOCUS | ACCESSION | | DEFINITION |
| EMBL/SWISS-PROT | ID | AC | | DE |
| PIR | ENTRY | ACCESSIONS | | TITLE |
| PDB | HEADER | | | COMPND |
| LIGAND | ENTRY | | | NAME |
| GENES | ENTRY | | NAME | DEFINITION |
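To make the idea of a title description concrete, the sketch below pulls the listed fields out of a flat-file entry and joins them into one searchable line. It is an illustration under simplifying assumptions, not DBGET's extraction code: each field is taken from the first line that starts with its keyword, continuation lines are ignored, and the sample entry, including its accession, is made up.

```python
# Field keywords per database, following the table above (blank cells omitted).
TITLE_FIELDS = {
    "GenBank/RefSeq": ["LOCUS", "ACCESSION", "DEFINITION"],
    "EMBL/SWISS-PROT": ["ID", "AC", "DE"],
    "PIR": ["ENTRY", "ACCESSIONS", "TITLE"],
}

# Made-up entry in a GenBank-like layout; not a real record.
SAMPLE_ENTRY = """\
LOCUS       EXAMPLE1     120 bp    DNA
DEFINITION  Hypothetical example sequence,
            continued on a second line.
ACCESSION   X00000
"""


def extract_title_fields(entry_text, field_names):
    """Return {keyword: value} for the first line starting with each keyword."""
    found = {}
    for line in entry_text.splitlines():
        if not line or line[0].isspace():
            continue  # blank or continuation line (ignored in this sketch)
        parts = line.split(None, 1)
        keyword = parts[0]
        if keyword in field_names and keyword not in found:
            value = parts[1] if len(parts) > 1 else ""
            found[keyword] = " ".join(value.split())  # collapse runs of spaces
    return found


def title_description(entry_text, field_names):
    """Join the extracted fields, in table order, into one line."""
    found = extract_title_fields(entry_text, field_names)
    return " ".join(found[name] for name in field_names if name in found)


fields = TITLE_FIELDS["GenBank/RefSeq"]
print(extract_title_fields(SAMPLE_ENTRY, fields))
print(title_description(SAMPLE_ENTRY, fields))
```

Keeping only these few fields per entry keeps the searchable file small, which fits the approach of extracting selected fields instead of building a keyword index.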
At the Unix command level, the bfind command takes the following form:
The bget command syntax is the following: