KEGG Syntax – Genome Similarity

Genome similarity

In the genome alignment tool, a genome is characterized by its sequence of KOs, and genome similarity is obtained by comparing KO sequences. Here, the genome of an organism or a virus is instead characterized by its composition (the distinct set) of KOs or modules, and a simple measure of genome similarity is introduced to rapidly identify similar genomes and organism groups. Three types of similarity measure are defined as shown below.
    similarity  = match / (num1 + num2 - match)
    similarity1 = match / num1
    similarity2 = (match / num1 + match / num2) / 2
where
    num1  = number of distinct KOs/modules in genome 1
    num2  = number of distinct KOs/modules in genome 2
    match = number of matching KOs/modules in genomes 1 and 2
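The second type, similarity1, is asymmetric: it is the fraction of the KOs/modules in genome 1 that are also found in genome 2, so it may represent whether and how a shorter query genome is embedded in a longer genome.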
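
The three measures can be sketched in a few lines of Python. This is only an illustration of the formulas above, not the code behind the KEGG tool; the function name `ko_similarity` and the toy K numbers are assumptions made up for the example.

```python
def ko_similarity(genome1, genome2):
    """Return (similarity, similarity1, similarity2) for two KO/module collections.

    Illustrative helper (hypothetical name); inputs are reduced to distinct sets.
    """
    set1, set2 = set(genome1), set(genome2)
    num1, num2 = len(set1), len(set2)   # distinct KOs/modules in each genome
    match = len(set1 & set2)            # KOs/modules found in both genomes
    if num1 == 0 or num2 == 0:
        raise ValueError("each genome needs at least one KO or module")
    similarity = match / (num1 + num2 - match)
    similarity1 = match / num1
    similarity2 = (match / num1 + match / num2) / 2
    return similarity, similarity1, similarity2


# Toy data (not real annotations): a short query fully contained in a longer genome.
query = ["K00001", "K00002", "K00003"]
target = ["K00001", "K00002", "K00003", "K00004", "K00005", "K00006"]

print(ko_similarity(query, target))  # (0.5, 1.0, 0.75)
```

In this toy case similarity1 is 1.0 because every KO of the query occurs in the target, while the first measure is only 0.5 because the target has three KOs that the query lacks.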

Search KEGG organisms and viruses with similar KO composition
Input: organism code or vtax number, with a threshold similarity (%)
Examples: hsa eco mge 2681611 (Lambda phage)

Search KEGG organisms and viruses with similar module composition
Input: organism code, with a threshold similarity (%)
Examples: hsa eco mge
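
A threshold search of this kind might look roughly like the sketch below. It assumes that the compositions are held in an in-memory dictionary and that the threshold is applied to the first measure expressed as a percentage; the page does not state which measure the search uses, so both are assumptions, and the identifiers `search_similar`, `org_a` to `org_d` and the K numbers are invented for the example.

```python
def search_similar(query_id, compositions, threshold_percent):
    """Rank other entries by percent similarity to the query, keeping those at or above the threshold.

    Assumption: similarity = match / (num1 + num2 - match), scaled to a percentage.
    """
    query = compositions[query_id]
    hits = []
    for other_id, kos in compositions.items():
        if other_id == query_id:
            continue  # do not report the query against itself
        match = len(query & kos)
        union = len(query) + len(kos) - match
        percent = 100.0 * match / union if union else 0.0
        if percent >= threshold_percent:
            hits.append((other_id, percent))
    # Most similar first; ties broken by identifier for a stable order.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


# Toy compositions (hypothetical identifiers, not real KEGG data).
compositions = {
    "org_a": {"K00001", "K00002", "K00003", "K00004"},
    "org_b": {"K00001", "K00002", "K00003", "K00005"},
    "org_c": {"K00001", "K00009"},
    "org_d": {"K00001", "K00002", "K00003", "K00004"},
}

hits = search_similar("org_a", compositions, 50)
print(hits)  # [('org_d', 100.0), ('org_b', 60.0)]
```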

Organism group similarity

An organism group is a collection of genomes and can be characterized by the combined and distinct set of KOs. The third type of similarity measure, similarity2, is used to compare KEGG organism groups: the six top-level groups of animals, plants, fungi, protists, bacteria and archaea, and the second-level groups. The resulting dendrograms, shown below, indicate that the top-level and second-level groupings correspond to each other, except for protists, which are more like eukaryotes other than animals, plants and fungi.
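
As a rough illustration of the group comparison described above, the sketch below pools the KOs of each group's genomes into one distinct set and then applies similarity2. The function names and the toy groups are hypothetical, and the clustering step that produces the dendrograms is not shown.

```python
def group_composition(genomes):
    """Combine the KO sets of all genomes in a group into one distinct set."""
    combined = set()
    for kos in genomes:
        combined |= set(kos)
    return combined


def similarity2(set1, set2):
    """Average of the two one-sided match fractions (third measure above)."""
    match = len(set1 & set2)
    return (match / len(set1) + match / len(set2)) / 2


# Toy groups (invented K numbers), each a list of per-genome KO sets.
group_x = [{"K00001", "K00002", "K00003"}, {"K00002", "K00003", "K00004"}]
group_y = [{"K00003", "K00004", "K00005"}, {"K00004", "K00005", "K00006"}]

set_x = group_composition(group_x)  # K00001..K00004
set_y = group_composition(group_y)  # K00003..K00006

print(similarity2(set_x, set_y))  # 0.5
```

Here each combined set holds four KOs and two of them are shared, so both one-sided fractions are 0.5.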


Virus similarity

Since the KO assignment rate is very low for viral proteins, computationally generated VOGs may be used to measure similarity among viruses. Here the 30% level VOG (VOG30) is used.

Search viruses and KEGG organisms with similar VOG composition
Input: vtax number, with a threshold similarity (%)
Examples: 2697049 (Coronavirus) 2681611 (Lambda phage) 212035 (Mimivirus)

When virus groups are compared by the combined and distinct set of VOGs, they are found to be very different even at the Family level, as shown below.


Metagenome similarity

The KEGG Metagenome dataset at GenomeNet is annotated with K numbers by GhostKOALA. Although the annotation quality is low, the KO composition may still be used to uncover organism groups in the metagenome.

Search metagenomes and KEGG organisms with similar KO composition
Input: metagenome T3 number or organism code, with a threshold similarity (%)
Examples: T30798 bth


Last updated: April 24, 2026