KAAS Help

KAAS (KEGG Automatic Annotation Server) provides functional annotation of genes in a genome by BLAST comparisons against a manually curated set of ortholog groups in KEGG GENES. The result contains KO (KEGG Orthology) assignments and automatically generated KEGG pathways. This server is used internally to annotate DGENES for draft genomes and EGENES for EST contigs.

Query form

Query sequences

Normally query sequences are amino acid sequences representing a set of protein-coding genes in a complete genome. KO assignments are based on results from BLASTP.

Check the "Nucleotide" checkbox if queries are nucleotide sequences representing a set of EST contigs or ESTs. In this case, KO assignments are based on results from BLASTX and TBLASTN.

In both cases, sequences should be in multi-FASTA format with unique IDs.
Gene IDs must not contain tab characters.

Example:
>0001
MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFSLNNLGRFADKLPSEP
RENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSACSVVAALMAMNEHCGKPLND
TRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDIISQQVPGFDEWLWVLAYPGI
KVSTAEARAILPAQYRRQDCIAHGRHLAGFIHACYSRQPELAAKLMKDVIAEPYRERLLP
GFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVADWLGKNYLQNQEGFVHICRL
DTAGARVLEN
>0002
MKLYNLKDHNEQVSFAQAVTQGLGKNQGLFFPHDLPEFSLTEIDEMLKLDFVTRSAKILS
AFIGDEIPQEILEERVRAAFAFPAPVANVESDVGCLELFHGPTLAFKDFGGRFMAQMLTH
IAGDKPVTILTATSGDTGAAVAHAFYGLPNVKVVILYPRGKISPLQEKLFCTLGGNIETV
AIDGDFDACQALVKQAFDDEELKVALGLNSANSINISRLLAQICYYFEAVAQLPQETRNQ
LVVSVPSGNFGDLTAGLLAKSLGLPVKRFIAATNVNDTVPRFLHDGQWSPKATQATLSNA
MDVSQPNNWPRVEELFRRKIWQLKELGYAAVDDETTQQTMRELKELGYTSEPHAAVAYRA
LRDQLNPGEYGLFLGTAHPAKFKESVEAILGETLDLPKELAERADLPLLSHNLPADFAAL
RKLMMNHQ
>0003
MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQH
YEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR
...
..
.
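
The format requirements above (multi-FASTA, unique IDs, no tab characters) can be checked locally before submitting. The following Python sketch is only an illustration and is not part of KAAS; the function name check_multifasta is hypothetical, and it assumes that the ID is the first whitespace-delimited token on each header line.

```python
def check_multifasta(text):
    """Return a list of problems found in a multi-FASTA string (empty if none)."""
    problems = []
    seen = set()
    current_id = None
    has_sequence = True  # becomes False while a header is waiting for sequence lines
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith(">"):
            if current_id is not None and not has_sequence:
                problems.append(f"record {current_id!r} has no sequence")
            header = line[1:]
            if "\t" in header:
                problems.append(f"line {lineno}: tab character in header")
            fields = header.split()
            # Assumption: the ID is the first token of the header line.
            current_id = fields[0] if fields else ""
            if not current_id:
                problems.append(f"line {lineno}: empty ID")
            elif current_id in seen:
                problems.append(f"line {lineno}: duplicate ID {current_id!r}")
            seen.add(current_id)
            has_sequence = False
        elif current_id is None:
            problems.append(f"line {lineno}: sequence data before first header")
        else:
            has_sequence = True
    if current_id is not None and not has_sequence:
        problems.append(f"record {current_id!r} has no sequence")
    return problems
```

An empty list means that none of the checked problems were found; it does not guarantee that the server will accept the file.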

E-mail address

The URL to access the results will be sent to this address after the assignments are completed.

GENES data set

One or more species may be specified as a template data set for KO assignment.

Computation time is proportional to the size of the data set. Accuracy improves if the data set contains species closely related to the query.

- Representative set (for GENES, for Eukaryotes, for Prokaryotes)

A pre-selected data set of species from each taxonomic group in GENES (about 25 species).

- Manual selection

Template species may be selected by KEGG organism code, except for codes in DGENES and EGENES.

Assignment method

KO assignment may be based on either the bi-directional best hit (BBH, default) of BLAST or the single-directional best hit (SBH).

The BBH method takes about twice as long as SBH. However, BBH is more accurate than SBH when the number of query sequences is large (genome scale). If the number of query sequences is small, the SBH method should suffice and saves time.

Example of result

User's query list

ID          name         state      result      start (GMT)      end (GMT)        retry
1156789000  Human        computing              4/01 13:00 2007
1156789012  E.coli       complete   html, text  4/02 7:04 2007   4/02 7:15 2007
1156789055  query070403  failed                 4/03 12:20 2007                   retry

'Failed' means the process terminated abnormally because of trouble with the server or network. You can retry the computation with the same file and conditions.

KO list

A flat list of query genes with the K numbers assigned by KAAS.

example1

BRITE hierarchies (changed from "KO hierarchy")

A hierarchical list of annotated genes, categorized according to the BRITE database.

example2 example3

Pathway map

A list of pathways with links to graphical pathway maps.

example4 example4

Method

First, the BLAST bit scores between a query sequence and the reference sequence set (taken from the KEGG GENES database) are computed, and homologs are found in the reference set. Next, homologs ranked above the threshold are selected as ortholog candidates based on the BLAST score and the bi-directional hit rate (BHR) defined below. Ortholog candidates are divided into KO groups according to the annotation of the KEGG GENES database. Finally, an assignment score is calculated for each KO group based on likelihood and heuristics, and the K number of the KO group with the highest score is assigned to the query sequence.
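
The last two steps (dividing ortholog candidates into KO groups and choosing the highest-scoring group) can be sketched in Python as follows. This is an illustration only, not the KAAS implementation: the function name assign_k_number, the tuple layout of the candidates, and the gene IDs and K numbers in the usage example are all hypothetical, and the scoring function is passed in by the caller.

```python
from collections import defaultdict

def assign_k_number(candidates, score_group):
    """Pick the K number whose KO group gets the highest score.

    candidates: list of (gene_id, k_number, bit_score) tuples for one query.
    score_group: callable that scores a list of (gene_id, bit_score) pairs.
    Returns None when there are no ortholog candidates.
    """
    groups = defaultdict(list)
    for gene_id, k_number, bit_score in candidates:
        groups[k_number].append((gene_id, bit_score))
    if not groups:
        return None
    return max(groups, key=lambda k: score_group(groups[k]))

# Usage with made-up data and a stand-in score (highest bit score in the group).
example = [
    ("orgA:g1", "K00001", 300.0),
    ("orgB:g7", "K00001", 280.0),
    ("orgC:g3", "K00002", 310.0),
]
best = assign_k_number(example, lambda members: max(s for _, s in members))
```

The stand-in score here is deliberately simple; it only shows where a per-group score plugs into the selection.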

Bi-directional hit

The genome to be annotated is compared against each genome in the reference set of the KEGG GENES database by BLAST searches in both forward and reverse directions, taking each gene in genome A as a query compared against all genes in genome B, and vice versa. BLAST hits with bit scores less than 60 are removed. Because the bit scores of a gene pair a and b from two genomes A and B, respectively, can be different in forward and reverse directions, and because the top scores do not necessarily reflect the order of the rigorous Smith-Waterman scores, we define the BHR as:

BHR = Rf * Rr

Here, R = S'/Sb, where S' is the bit score of a against b, and Sb is the bit score of a against its best-hit gene in genome B (which is not necessarily b). Rf is R computed from the forward BLAST (A against B), and Rr is R computed from the reverse BLAST (B against A). We select genes whose BHR is greater than 0.95 in the BBH method, or whose Rf is greater than 0.95 in the SBH method.
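
The selection rule can be illustrated with a short Python sketch for one query genome A and one reference genome B. It is not the KAAS code: the function names and the dictionary layout are hypothetical, and it assumes that Rr is computed symmetrically to Rf, that is, the bit score of b against a divided by the best bit score of b against any gene in A.

```python
BIT_SCORE_CUTOFF = 60.0  # hits with bit scores below this are removed
THRESHOLD = 0.95         # BHR (BBH method) or Rf (SBH method) must exceed this

def best_scores(hits):
    """Map each query gene to its best bit score among the retained hits."""
    best = {}
    for (query, _target), score in hits.items():
        if score >= BIT_SCORE_CUTOFF:
            best[query] = max(best.get(query, 0.0), score)
    return best

def select_candidates(forward, reverse, method="BBH"):
    """Return {(a, b): value} for gene pairs that pass the threshold.

    forward: {(a, b): bit score of a against b}  (A against B)
    reverse: {(b, a): bit score of b against a}  (B against A)
    """
    best_f = best_scores(forward)
    best_r = best_scores(reverse)
    selected = {}
    for (a, b), s_f in forward.items():
        if s_f < BIT_SCORE_CUTOFF:
            continue
        r_f = s_f / best_f[a]
        if method == "SBH":
            value = r_f
        else:
            s_r = reverse.get((b, a), 0.0)
            if s_r < BIT_SCORE_CUTOFF:
                continue  # no usable reverse hit, so no bi-directional hit
            value = r_f * (s_r / best_r[b])  # BHR = Rf * Rr
        if value > THRESHOLD:
            selected[(a, b)] = value
    return selected
```

With SBH only the forward ratio is checked, so a pair can pass even when b has a clearly better partner in genome A; the BBH product rejects such pairs.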

Assignment score

We define a score for each ortholog group in order to assign the best fitting K numbers to the query gene:

S_KO = Sh - log2(m * n) - log2( sum over k = N..x of C(x, k) * p^k * (1 - p)^(x - k) )

where Sh is the highest score among all ortholog candidates in the ortholog group, m and n are the sequence lengths of the query and the target of BLAST, respectively, N is the number of organisms in the ortholog group, x is the number of organisms in the original ortholog group from which this group is derived, and p is the ratio of the size of the original ortholog group to the size of the entire GENES database. The second term normalizes the first term by sequence lengths, and the third term is a weighting factor that accounts for the number of ortholog candidates found in the original ortholog group.
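
As a numerical illustration, the Python sketch below evaluates a three-term score of the kind described above. It assumes, following the cited KAAS paper, that the second term is log2(m * n) and that the third term is log2 of a binomial tail probability summed from N to x; treat that form as an assumption rather than as the server's exact code. The function and parameter names are hypothetical.

```python
from math import comb, log2

def assignment_score(s_h, m, n, big_n, x, p):
    """Score a KO group under the assumed three-term form.

    s_h:   highest bit score among the ortholog candidates in the group
    m, n:  sequence lengths of the query and the BLAST target
    big_n: number of organisms in the ortholog group (N in the text)
    x:     number of organisms in the original ortholog group
    p:     size of the original ortholog group / size of the GENES database
    """
    # Assumed third term: probability of seeing at least N of x organisms.
    tail = sum(comb(x, k) * p**k * (1 - p)**(x - k) for k in range(big_n, x + 1))
    return s_h - log2(m * n) - log2(tail)
```

A smaller tail probability (many candidates found where few are expected by chance) raises the score. For very small p and large N the tail can underflow to zero in floating point, which this sketch does not guard against.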

Download of stand-alone version

Stand-alone KAAS for Linux and OS X

Reference

Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A., and Kanehisa, M.; KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 35, W182-W185 (2007). [pubmed] [NAR]

Feedback

Please use the feedback page to send your comments or questions to KAAS.

GenomeNet Feedback Form