KO assignment by BLAST search
This KO assignment tool is an interactive equivalent of the BlastKOALA server, intended for small datasets. BlastKOALA assigns KOs (K numbers) to a given set of amino acid sequences for subsequent analysis with the KEGG Mapper Reconstruct tool.
About KO assignment methods
BlastKOALA annotation
Protein-coding genes in a new genome are assigned KOs by BLAST comparison against the reference GENES dataset. This initial annotation is then updated by the KOALA annotation described next. The BlastKOALA server, as well as the interactive tool on this page, uses the same database and procedure as the internal BlastKOALA annotation.
KOALA annotation
New or updated genomes are subject to SSEARCH comparison against the entire GENES dataset and registered in the SSDB database. For each gene, an organism-based list of top-hit similarity neighbors is generated and stored in tabular form, called the GFIT table. GFIT tables are used to assign KOs both computationally and manually.
Computational assignment is performed with a simple algorithm, internally called the new KOALA algorithm. The measure of similarity is a modified identity score: the identity score of the overlap (aligned) region given by SSEARCH, weighted by min(1, overlap*2/(aalen1+aalen2)). Normally the top ten hits in the GFIT table are used to assign KOs. Computational assignments are used in most cases to automatically rewrite KO records of GENES entries.
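The weighting can be illustrated with a minimal Python sketch. It is not the KOALA implementation: the function names are hypothetical, the identity is assumed to be a fraction between 0 and 1, and the weight is assumed to be applied by simple multiplication, which the description above does not state explicitly.

```python
def coverage_weight(overlap: int, aalen1: int, aalen2: int) -> float:
    """Weight min(1, overlap*2/(aalen1+aalen2)) from the text above.

    overlap is the length of the aligned region; aalen1 and aalen2 are
    the amino acid lengths of the two sequences being compared.
    """
    if aalen1 <= 0 or aalen2 <= 0:
        raise ValueError("sequence lengths must be positive")
    return min(1.0, 2.0 * overlap / (aalen1 + aalen2))


def weighted_identity(identity: float, overlap: int,
                      aalen1: int, aalen2: int) -> float:
    """Identity of the aligned region scaled by the coverage weight.

    Assumption: the weight multiplies the identity score directly.
    """
    return identity * coverage_weight(overlap, aalen1, aalen2)


# A 150-residue alignment between two 300-residue proteins covers half of
# the average length, so an identity of 0.8 is scaled by 0.5.
half_covered = weighted_identity(0.8, 150, 300, 300)

# An alignment at least as long as the average sequence length (gaps can
# make it longer than either sequence) is capped at a weight of 1.
fully_covered = weighted_identity(0.8, 200, 180, 200)
```

Under these assumptions, a hit that aligns over only a short fragment of either protein is penalized in proportion to the uncovered length, while a hit that aligns across the average sequence length keeps its identity score unchanged.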
Manual annotation
Manual annotation of KOs is performed whenever new KOs are defined or existing KOs are modified. The consistency of the entire GENES annotation is checked every day, and additional candidates and possible misannotations are presented for human intervention.
Reference GENES dataset
Reference genomes in KEGG are well-curated genomes and/or representative genomes of taxonomic groups. They are not subject to automatic KOALA annotation. The reference GENES dataset is a collection of protein-coding genes from reference genomes, together with publication-based sequence data associated with KOs, many of which are in the GENES Addendum (ag) category.