Documentation

Overview

Alignment view

In the alignment viewer, users can browse an alignment of any set of uploaded nucleotide sequences (see example: alignment of three sequences). In the alignment, high-similarity regions detected by BLASTn or tBLASTx are color-coded by the reported %-identity. In addition, dot plots summarizing these high-similarity regions are shown beside the alignment. Positions in the alignment are adjusted automatically (including adjustment of circular permutation) to improve visualization (e.g., for quick understanding of genomic collinearity). Positions can also be adjusted manually. For details of the genomic alignment view, see this section.

Gene prediction and similarity search

Gene prediction and protein similarity search against NCBI/nr can be performed in parallel with alignment computation. The results can be browsed through the web interface and downloaded as tables. In the alignment view, the predicted gene positions are drawn, and the best hit against nr is used as the label of each gene.

Resource for computation

The DiGAlign server is part of the GenomeNet service. Computation time is provided by the Supercomputer System of the Institute for Chemical Research, Kyoto University.

Upload sequences

From the upload page, users can upload nucleotide sequences and choose computational options (i.e., the type of BLAST and whether to perform gene and function predictions). After the uploaded sequences are validated, a new session is created and computation begins.

The current limits on the number of sequences and the constraints on sequence IDs are listed below.

  • Input FASTA should include genomic nucleotide sequences (up to 200 sequences).
  • Two or more sequences should be uploaded. When only two sequences are uploaded, no tree is generated.
  • Each sequence should be no longer than 10 million nucleotides and no shorter than 100 nucleotides.
  • Sequence IDs should be no longer than 30 characters. If an ID is longer, only the first 30 characters are used.
  • Letters, digits, and special characters (dots, hyphens, and underscores) can be used in sequence IDs. Any other character is replaced with an underscore.
  • If a sequence ID begins or ends with special characters, those special characters are removed.
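
The limits and ID rules above can be checked locally before uploading. The following Python sketch is only an illustration and is not DiGAlign's own validation code: the function names, the order in which the ID rules are applied, and the restriction to ASCII letters are assumptions.

```python
import re

# Limits taken from the list above.
MAX_ID_LENGTH = 30
MIN_SEQ_LENGTH = 100
MAX_SEQ_LENGTH = 10_000_000
MIN_SEQ_COUNT = 2
MAX_SEQ_COUNT = 200


def normalize_sequence_id(raw_id: str) -> str:
    """Approximate the ID rules listed above (the order of steps is assumed)."""
    # Replace anything other than ASCII letters, digits, dots, hyphens, underscores.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", raw_id)
    # Keep only the first 30 characters.
    cleaned = cleaned[:MAX_ID_LENGTH]
    # Remove special characters at the beginning and the end.
    return cleaned.strip("._-")


def check_sequences(records: dict) -> list:
    """Return a list of problems found; an empty list means none were found.

    `records` maps a sequence ID to its nucleotide sequence (hypothetical layout).
    """
    problems = []
    if not MIN_SEQ_COUNT <= len(records) <= MAX_SEQ_COUNT:
        problems.append(f"expected 2 to 200 sequences, got {len(records)}")
    for seq_id, seq in records.items():
        if not MIN_SEQ_LENGTH <= len(seq) <= MAX_SEQ_LENGTH:
            problems.append(f"{seq_id}: length {len(seq)} is out of range")
    return problems
```

For example, under these assumptions normalize_sequence_id("_phage A|v1.") gives "phage_A_v1". The server may apply the rules in a different order, so IDs that are close to the 30-character limit are best shortened by hand.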

Users who already have gene annotations can upload a gene list. The list should be in a tab-separated, expanded BED format with the following columns.

  1. genome sequence ID
  2. start position of gene (0-based position)
  3. stop position of gene
  4. gene ID (acceptable characters: digits, letters, dots, hyphens, and underscores)
  5. anything (not used by DiGAlign)
  6. strand (+/-)
  7. (optional) description label used in the alignment viewer. If this is not given, the GHOSTX result is used (default behavior).
  8. (optional) color name/code used in the alignment viewer. If this is not given, a default color is assigned.
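
A gene list can be checked against the column layout above before it is uploaded. The Python sketch below is a minimal illustration under stated assumptions: the names GeneRecord and parse_gene_line are hypothetical, gene IDs are assumed to use ASCII letters only, and an empty optional column is treated as "not given". It is not the parser used by DiGAlign.

```python
import re
from typing import NamedTuple, Optional

# Acceptable gene ID characters: digits, letters, dots, hyphens, underscores.
GENE_ID_PATTERN = re.compile(r"[A-Za-z0-9._-]+")


class GeneRecord(NamedTuple):
    genome_id: str
    start: int                    # 0-based start position
    stop: int
    gene_id: str
    strand: str                   # "+" or "-"
    label: Optional[str] = None   # optional column 7
    color: Optional[str] = None   # optional column 8


def parse_gene_line(line: str) -> GeneRecord:
    """Parse one tab-separated line of the gene list; raise ValueError if malformed."""
    fields = line.rstrip("\n").split("\t")
    if not 6 <= len(fields) <= 8:
        raise ValueError(f"expected 6 to 8 columns, got {len(fields)}")
    # Column 5 is not used by DiGAlign, so it is read and ignored.
    genome_id, start, stop, gene_id, _unused, strand = fields[:6]
    if not GENE_ID_PATTERN.fullmatch(gene_id):
        raise ValueError(f"unacceptable character in gene ID: {gene_id!r}")
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    label = fields[6] if len(fields) > 6 and fields[6] else None
    color = fields[7] if len(fields) > 7 and fields[7] else None
    # int() raises ValueError when a position is not an integer.
    return GeneRecord(genome_id, int(start), int(stop), gene_id, strand, label, color)
```

A line such as "genomeA", "0", "300", "gene_1", ".", "+", "terminase", "red" joined by tabs parses into a record with label "terminase" and color "red"; a line with only the first six columns leaves both optional fields empty.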

Computation starts

Computation steps include BLAST execution, distance matrix generation, tree generation, gene finding, and gene similarity search. Clicking the submit button on the upload page redirects to a page that reports calculation progress, like this. The page reloads automatically to report the progress of the calculation. When all calculation steps for the upload are finished, a "session" is activated, and a notification email is sent to the address submitted with the upload. Users can then browse all the results from the session main page, whose location is given on the progress page and in the email. An example of the session main page is shown here.

Computation time depends on several factors. One is the total length of the uploaded sequences, because sequence length affects the computation time of the BLAST search. Note that the gene similarity search by GHOSTX against NCBI/nr takes a relatively long time. We tested various input sequences with various options, and in most cases calculations finished within minutes to a couple of hours (except when the computer system is fully occupied).

Genomic alignment view

Caution: Microsoft Internet Explorer takes quite a long time to generate genomic alignments. Please use another browser (Google Chrome, Edge, Firefox, Safari, etc.).

The alignment view page contains (1) a panel for configuration/download and (2) the alignment view.

This view also provides pairwise dot plots of the sequences included in the alignment. A color bar in the upper left corner of the view shows the %-identity scale used in the alignment and dot plots. This bar can be enlarged or shrunk by scrolling the mouse wheel over it.

Panel for configuration/download

A panel for configuration/download is shown above the alignment image. The panel contains three navigation tabs: "Basic parameters", "Customize sequences", and "Download". The tabs provide various functions for custom visualization of the alignment.

The "Basic parameters" tab provides parameters related to genome positioning, gene labels, size adjustment, etc.

  • positioning of each sequence (auto/as is/reset)—switch for auto adjustment of alignment position for clear representation of sequence collinearity (i.e., circular permutation, reverse orientation, and shift of the start position)
  • gene labels (shown/not shown)—switch for gene labels shown over arrows representing genes.
  • gene labels angle—rotate gene labels
  • %-identity interval (default/customize)—%-identity corresponding to the alignment colors can be customized by inputting comma-separated multiple values (0 to 100; decimal point is accepted)
  • colors (default/rainbow/iridescent/monochromed/customize)—the figure can be monochromed or alignment colors can be customized by inputting one color (used as seed) or comma-separated multiple colors (all colors specified). If one color is specified, gradation from the given color to white is generated. When more than 10 intervals are given in "%-identity interval" section, "rainbow colors" will be used as default.
  • ticks of genomes / vertical dashed lines (shown/not shown)—ticks and vertical dashed lines interval for each sequence
  • representation of BLAST hits (normal/position emphasized)—"position emphasized" is effective to check the exact position of similar regions.
  • scale of sequences—parameter for enlargement/shrinkage of displayed sequences.
  • interval between sequences—vertical space length between sequences
  • interval of dotplot grid—grid interval in dot plots
  • font family, base font size and font weight—parameters for the font setting
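
The "%-identity interval" and "colors" parameters above can be pictured as a set of cut values plus one color per interval. The Python sketch below is a hedged illustration only: the function names are hypothetical, linear interpolation in RGB is an assumption about how a gradation "from the given color to white" might be built, and the way DiGAlign actually assigns colors to intervals is not described in this document.

```python
from bisect import bisect_right


def parse_cut_values(text: str) -> list:
    """Parse comma-separated %-identity values such as "70, 80, 90.5"."""
    values = sorted(float(v) for v in text.split(","))
    if any(not 0 <= v <= 100 for v in values):
        raise ValueError("each %-identity value must be between 0 and 100")
    return values


def gradient_to_white(seed_hex: str, steps: int) -> list:
    """Interpolate linearly in RGB from a seed color (e.g. "#ff0000") to white."""
    digits = seed_hex.lstrip("#")
    seed = [int(digits[i:i + 2], 16) for i in (0, 2, 4)]
    colors = []
    for k in range(steps):
        # t runs from 0.0 (seed color) to 1.0 (white).
        t = k / (steps - 1) if steps > 1 else 0.0
        rgb = [round(c + (255 - c) * t) for c in seed]
        colors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colors


def interval_index(identity: float, cuts: list) -> int:
    """Return the index of the interval that contains the given %-identity."""
    return bisect_right(cuts, identity)
```

With cut values parsed from "90, 70, 80.5", a hit at 85 %-identity falls into interval 2 (between 80.5 and 90), and gradient_to_white("#ff0000", 3) starts at the seed color and ends at white.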

The "Customize sequences" tab provides a table for adding, altering, or deleting sequences in the alignment and for reordering them. Each genome in the alignment can be repositioned manually or automatically by circular permutation, strand reversal, and shifting of the start position.
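
To make the repositioning operations concrete, the following Python sketch shows what circular permutation and strand reversal mean for a sequence and its coordinates. It is an illustration with hypothetical function names; the viewer itself repositions the drawing, and its internal implementation is not described here.

```python
# Complement table for unambiguous bases; other characters (e.g. N) are kept as is.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def circular_permutation(seq: str, new_start: int) -> str:
    """Rotate a circular sequence so that 0-based position new_start comes first."""
    if not seq:
        return seq
    offset = new_start % len(seq)
    return seq[offset:] + seq[:offset]


def rotated_position(pos: int, new_start: int, length: int) -> int:
    """Map a 0-based position on the original sequence to the rotated sequence."""
    return (pos - new_start) % length


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand."""
    return seq.translate(_COMPLEMENT)[::-1]
```

For the circular sequence "ACGTTG", choosing position 2 as the new start gives "GTTGAC", and the base originally at position 0 moves to position 4. Strand reversal of "AACG" gives "CGTT".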

The "Download" tab provides a download link for the displayed alignment image in SVG format.

Tree view

The tree view page contains (1) the tree and (2) a panel for configuration/download.

Tree

The sequence tree is shown on the right of the screen (or on the bottom when the screen width is short). Users can select one of two types of tree view: "circular view" and "rectangular view". The circular view is designed for a comprehensive visualization of the tree regardless of the number of sequences (even in the case of thousands of sequences). On the other hand, the rectangular (i.e., linear) view is suitable for browsing detailed information. When the number of sequences is small enough, the rectangular view can also be comprehensive.

In the rectangular view, each inner node, represented by a filled circle, is linked to an alignment of the genomes included in its subtree. These inner-node links can also be shown in the circular view by enabling the "show link to alignment" parameter in the configuration panel. This panel provides various functions for configuring the tree as well as for image/data download. For details of the panel, see this section.

Note about the appearance of trees

  • The tree construction algorithm follows that of ViPTree. Namely, a BIONJ tree is generated based on genomic similarity score (SG), which is computed for each pair of sequences based on the BLAST result (please visit ViPTree documentation for more information). The scale of the tree indicates SG. Please note that the tree itself just facilitates identification of closely related sequences and by no means provides biological relationships to infer their evolution and phylogeny.
  • DiGAlign generates rooted trees using "midpoint rooting", so please keep in mind that the location of the root may not be appropriate.

Panel for configuration/download

A panel for configuration/download is also shown on the left of the tree (or on the top when the screen width is short). To switch between the circular view and the rectangular view, select the corresponding tab at the top of the panel and click the "redraw tree" button. A download tab provides download links for two files, namely, a BIONJ tree in the Newick format and the displayed tree image in the SVG format.

This panel also provides many visualization parameters for a tree. For example, the parameters available in the "Circular tree" tab are listed below.

  • branch length scaling (log/linear/non-scaled)—Representation of branch length scaling can be changed.
  • show link to alignment (shown/not shown)—Links to a genomic alignment are shown or hidden.
  • show labels of genomes (shown/not shown)—Labels of each sequence can be shown and hidden.
  • tree radius and space for labels—Size of tree drawing and space for label descriptions.
  • base font size, font family, and font weight—Parameters for font settings.

The "Download" tab provides the two download links listed below.

  • A Newick formatted file of the tree (calculated by BIONJ based on genomic distance matrix)
  • An image file of current tree view (SVG format)
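
The downloaded Newick file is plain text and can be inspected with a few lines of code. The Python sketch below lists the leaf (sequence) names in a Newick string. It is a minimal illustration with a hypothetical function name, and it assumes unquoted labels without whitespace, which matches the sequence ID rules of this service but not every Newick file; a dedicated phylogenetics library is preferable for anything beyond a quick check.

```python
import re

# A leaf label directly follows "(" or ","; labels of inner nodes follow ")"
# and are therefore not matched. Quoted labels are not supported.
_LEAF_PATTERN = re.compile(r"[(,]\s*([^\s(),:;]+)")


def leaf_names(newick: str) -> list:
    """Return leaf labels in the order in which they appear in the Newick text."""
    return _LEAF_PATTERN.findall(newick)


# Example with made-up names and branch lengths.
example = "((phageA:0.1,phageB:0.2):0.3,phageC:0.4);"
print(leaf_names(example))
```

The text of a downloaded file can be passed to leaf_names in the same way after reading it into a string.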

After download, the SVG image can be edited and/or converted to other formats (e.g., PDF, PNG, and TIFF) with software such as Adobe Illustrator or Inkscape. Inkscape is freely available for Windows, macOS, and Linux.

Session main page

The session main page is a portal for investigating the results. On the left of the screen (or on the top when the screen width is short), basic information about the session is listed. On the right of the screen (or on the bottom when the screen width is short), menus for all the viewers and file downloads are provided.

Contents of the basic information

  • Uploaded sequences
  • Selected options (type of BLAST, gene prediction, genetic code, function prediction)
  • Times of data upload, calculation completion, and session expiration
  • Session ID and DiGAlign release by which the session is generated

Important note about session expiration

  • A session expires three months after the sequences are uploaded.
  • In addition, if DiGAlign is updated after the session is created, the session might not work properly. Data compatibility of the session cannot be relied on once the visualization and genome tables start to use new data.

Contents of the menu

  • Browse alignment and tree
  • Browse table (gene similarity search)
  • Download files (zip file containing BLAST outputs, gene prediction results, gene similarity search results, or all of them)