KAAS (KEGG Automatic Annotation Server) provides functional annotation of genes in a genome by BLAST comparisons against a manually curated set of ortholog groups in KEGG GENES. The result contains KO (KEGG Orthology) assignments and automatically generated KEGG pathways. This server is used internally to annotate DGENES for draft genomes and EGENES for EST contigs.
Normally, the query sequences are amino acid sequences representing the set of protein-coding genes in a complete genome, and KO assignments are based on BLASTP results.
Check the "Nucleotide" checkbox if queries are nucleotide sequences representing a set of EST contigs or ESTs. In this case, KO assignments are based on results from BLASTX and TBLASTN.
In both cases, the sequences must be in multi-FASTA format with unique IDs.
Gene IDs must not contain tab characters.
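Before submitting, it can help to check a query file against these two rules. The following is a minimal Python sketch, not part of KAAS itself. It assumes that a sequence's ID is the text between `>` and the first space of its header line; the function name `check_fasta_ids` is hypothetical.

```python
def check_fasta_ids(text: str) -> list[str]:
    """Return a list of problems found in FASTA header IDs (empty if none).

    Assumption: the ID is everything between '>' and the first space.
    """
    problems = []
    seen = set()
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.startswith(">"):
            continue  # sequence lines are not checked here
        gene_id = line[1:].split(" ", 1)[0]
        if not gene_id:
            problems.append(f"line {line_no}: empty ID")
        elif "\t" in gene_id:
            problems.append(f"line {line_no}: ID contains a tab")
        elif gene_id in seen:
            problems.append(f"line {line_no}: duplicate ID {gene_id!r}")
        seen.add(gene_id)
    return problems
```

For example, a file with two headers both reading `>g1` yields one "duplicate ID" problem, while a file with distinct, tab-free IDs yields an empty list.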
The URL for accessing the results will be sent to the specified e-mail address once the assignments are complete.
GENES data set
One or more species may be specified as a template data set for KO assignment.
The computation time is proportional to the size of the data set. Accuracy improves if the data set contains species closely related to the query organism.
- Pre-selected data set of species from each taxonomic group in GENES (about 25 species)
- Manual selection
Template species may be selected by KEGG organism code, except for codes in DGENES and EGENES.
KO assignment may be based on either the bi-directional best hit (BBH, default) or the single-directional best hit (SBH) of BLAST.
The BBH method takes about twice as long to compute as SBH. However, BBH is more accurate than SBH when the number of query sequences is large enough (genome scale). When the number of query sequences is small, SBH should suffice and saves time.
Example of result
User query list
| ID | name | state | result | start (GMT) | end (GMT) | retry |
|---|---|---|---|---|---|---|
| 1156789000 | Human | computing | | 4/01 13:00 2007 | | |
| 1156789012 | E.coli | complete | html, text | 4/02 7:04 2007 | 4/02 7:15 2007 | |
| 1156789055 | query070403 | failed | | 4/03 12:20 2007 | | retry |
'Failed' means that the process terminated abnormally because of a server or network problem. You can retry the computation with the same file and conditions.
The flat list of query genes with the K numbers assigned by KAAS.
BRITE hierarchies (changed from "KO hierarchy")
The hierarchical list of annotated genes, categorized according to the BRITE database.
The list of pathways with links to graphical pathway maps.
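The flat K number list can be loaded for downstream analysis. The Python sketch below assumes a text layout in which each line holds a query gene ID, optionally followed by a tab and the assigned K number, with unassigned genes listed by ID alone. Check this against your actual output file before relying on it; the function name `parse_ko_list` is hypothetical.

```python
def parse_ko_list(text: str) -> dict:
    """Map each query gene ID to its K number, or None if none was assigned.

    Assumed line format: "<gene_id>\t<K number>" or "<gene_id>" alone.
    """
    assignments = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        gene_id = fields[0]
        ko = fields[1].strip() if len(fields) > 1 and fields[1].strip() else None
        assignments[gene_id] = ko
    return assignments
```

Because the sketch splits on tabs, a tab inside a gene ID would corrupt the mapping, which is presumably one reason gene IDs must not contain tab characters.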
First, BLAST bit scores between a query sequence and the reference sequence set (taken from the KEGG GENES database) are computed to find homologs in the reference set. Next, homologs scoring above the threshold are selected as ortholog candidates based on the BLAST score and the bi-directional hit rate (BHR) defined below. The ortholog candidates are then divided into KO groups according to their annotation in the KEGG GENES database. Finally, an assignment score is calculated for each KO group based on likelihood and heuristics, and the K number of the highest-scoring KO group is assigned to the query sequence.
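The grouping and selection steps can be illustrated with a simplified Python sketch. It assumes each ortholog candidate for a single query is given as a hypothetical `(target_gene, ko, bit_score)` tuple, where `ko` is the K number annotated to the target gene (or `None`). As a deliberate simplification, each KO group is scored only by its highest bit score; the actual KAAS assignment score also includes length-normalization and weighting terms, which this sketch omits.

```python
from collections import defaultdict


def assign_ko(candidates):
    """Pick a K number for one query from its ortholog candidates.

    candidates: iterable of (target_gene, ko, bit_score) tuples.
    Simplification: a KO group's score is its highest bit score only.
    """
    best_in_group = defaultdict(float)
    for _target, ko, bit_score in candidates:
        if ko is None:
            continue  # unannotated reference genes belong to no KO group
        best_in_group[ko] = max(best_in_group[ko], bit_score)
    if not best_in_group:
        return None  # no candidate carries a K number
    # The KO group with the highest score wins.
    return max(best_in_group, key=best_in_group.get)
```

For example, candidates annotated K00001 (scores 150 and 180) and K00002 (score 200) lead to K00002 under this simplified score.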
The genome to be annotated (genome A) is compared against each genome in the reference set of the KEGG GENES database (genome B) by BLAST searches in both the forward and reverse directions: each gene in genome A is used as a query against all genes in genome B, and vice versa. BLAST hits with bit scores below 60 are removed. Because the bit scores of a gene pair a and b (from genomes A and B, respectively) can differ between the forward and reverse directions, and because the top BLAST scores do not necessarily reflect the order of the rigorous Smith-Waterman scores, we define the BHR as:
BHR = Rf * Rr
Here, R = S'/Sb, where S' is the bit score of a against b, and Sb is the score of a against its best-hit gene in genome B (which is not necessarily b). Rf is the ratio R from the forward BLAST (A against B), and Rr is the ratio R from the reverse BLAST (B against A). The BBH method selects genes whose BHR is greater than 0.95; the SBH method selects genes whose Rf is greater than 0.95.
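The BHR filter can be sketched in Python as follows. This is an illustration, not the KAAS source: it assumes BLAST has already been run and that forward and reverse bit scores are supplied as dictionaries keyed by gene pairs (a hypothetical input layout). It applies the bit-score cutoff of 60, computes Rf and Rr as defined above, and keeps the pairs whose BHR (BBH) or Rf (SBH) exceeds 0.95.

```python
MIN_BIT_SCORE = 60.0  # hits scoring below this are discarded
THRESHOLD = 0.95      # cutoff for BHR (BBH) or Rf (SBH)


def best_scores(hits):
    """Highest bit score per query gene, i.e. Sb for each query."""
    best = {}
    for (query, _target), score in hits.items():
        best[query] = max(best.get(query, 0.0), score)
    return best


def select_candidates(forward, reverse, method="BBH"):
    """Return {(a, b): value} for gene pairs passing the BBH or SBH filter.

    forward: {(a, b): bit score of gene a (genome A) against gene b (genome B)}
    reverse: {(b, a): bit score of gene b against gene a}
    """
    if method not in ("BBH", "SBH"):
        raise ValueError(f"unknown method: {method}")
    forward = {k: s for k, s in forward.items() if s >= MIN_BIT_SCORE}
    reverse = {k: s for k, s in reverse.items() if s >= MIN_BIT_SCORE}
    best_f = best_scores(forward)  # best hit of each a in genome B
    best_r = best_scores(reverse)  # best hit of each b in genome A
    selected = {}
    for (a, b), s in forward.items():
        rf = s / best_f[a]
        # A missing reverse hit gives Rr = 0, so the pair fails BBH.
        if b in best_r:
            rr = reverse.get((b, a), 0.0) / best_r[b]
        else:
            rr = 0.0
        value = rf * rr if method == "BBH" else rf
        if value > THRESHOLD:
            selected[(a, b)] = value
    return selected
```

Here a pair can pass SBH (high Rf) yet fail BBH when b's own best reverse hit is a different gene in genome A, which is the case the bi-directional check is meant to catch.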
We define a score for each ortholog group in order to assign the best fitting K numbers to the query gene:
where Sh is the highest score among all ortholog candidates in the ortholog group, m and n are the sequence lengths of the query and the BLAST target, respectively, N is the number of organisms in the ortholog group, x is the number of organisms in the original ortholog group from which this group is derived, and p is the ratio of the size of the original ortholog group to the size of the entire GENES database. The second term normalizes the first term by sequence length, and the third term is a weighting factor that accounts for the number of ortholog candidates found in the original ortholog group.
Download stand-alone version
Stand-alone KAAS for Linux and OS X
Please use the feedback page to send your comments or questions about KAAS.