KEGG OC (KEGG Ortholog Cluster) is a database of ortholog clusters (OCs) based on whole-genome comparison. The OCs were constructed by applying a novel clustering method to all possible protein-coding genes in all complete genomes, based on their amino acid sequence similarities. KEGG OC has three distinguishing features in terms of coverage, efficiency, and usability. First, it covers all fully sequenced genomes from a wide range of organisms across the three domains of life (eukaryotes, bacteria, and archaea). Second, the OCs are computationally efficient to calculate, which makes it possible to update the contents regularly. Third, it is compatible with the KEGG database, which makes it easy to link the OCs with KEGG PATHWAY, BRITE functional hierarchies, KEGG MODULE, KEGG MEDICUS, and other KEGG resources.

OC search

ID or terms query

ex. hsa:362, K04517, M00170, pyruvate dehydrogenase org=bsu
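The example queries above mix several identifier shapes. As a rough illustration only, the sketch below sorts such a query string into categories by pattern. The function name `classify_query`, the category labels, and the regular expressions are assumptions made for this example (based on the usual look of KEGG gene IDs, K numbers, and module IDs); they are not taken from the KEGG OC search implementation.

```python
import re

# Assumed identifier shapes (illustrative, not an official grammar):
GENE_ID = re.compile(r"^[a-z]{3,4}:\S+$")   # e.g. hsa:362 (organism code : gene)
KO_ID = re.compile(r"^K\d{5}$")             # e.g. K04517
MODULE_ID = re.compile(r"^M\d{5}$")         # e.g. M00170
ORG_FILTER = re.compile(r"\borg=(\w+)")     # e.g. org=bsu


def classify_query(query):
    """Return a dict describing one query: its kind, value, and organism filter."""
    query = query.strip()
    org_match = ORG_FILTER.search(query)
    org = org_match.group(1) if org_match else None
    # Remove the organism filter so only the ID or free-text terms remain.
    value = ORG_FILTER.sub("", query).strip()

    if GENE_ID.match(value):
        kind = "gene"
    elif KO_ID.match(value):
        kind = "ko"
    elif MODULE_ID.match(value):
        kind = "module"
    else:
        kind = "terms"
    return {"kind": kind, "value": value, "org": org}


examples = ["hsa:362", "K04517", "M00170", "pyruvate dehydrogenase org=bsu"]
results = [classify_query(q) for q in examples]
for q, r in zip(examples, results):
    print(q, "->", r)
```

The last example shows free-text terms combined with an organism restriction; how the real search form interprets such input may differ from this sketch.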


Amino acid sequence query

ex. hsa:5162

Statistics

Number of organisms: 5,491
Number of eukaryotes: 441
Number of bacteria: 4,777
Number of archaea: 273
Number of sequences: 24,835,030
Number of clusters: 2,065,892

Cluster type            Singleton    Multiple      Total
domain common                   0         606        606
eukaryotes, bacteria            0       4,402      4,402
eukaryotes, archaea             0       1,205      1,205
bacteria, archaea               0       4,515      4,515
eukaryotes                496,405     274,921    771,326
bacteria                  714,100     477,788  1,191,888
archaea                    51,194      40,756     91,950
Total                   1,261,699     804,193  2,065,892
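The figures in the statistics above can be cross-checked with simple arithmetic: each row's singleton and multiple counts should add up to its total, the rows should add up to the Total line, and the three domain counts should add up to the number of organisms. A minimal sketch, with the counts copied from this page and all variable and function names chosen for this example:

```python
# Counts copied from the statistics table: (singleton, multiple, total).
ROWS = {
    "domain common": (0, 606, 606),
    "eukaryotes, bacteria": (0, 4_402, 4_402),
    "eukaryotes, archaea": (0, 1_205, 1_205),
    "bacteria, archaea": (0, 4_515, 4_515),
    "eukaryotes": (496_405, 274_921, 771_326),
    "bacteria": (714_100, 477_788, 1_191_888),
    "archaea": (51_194, 40_756, 91_950),
}
REPORTED_TOTAL = (1_261_699, 804_193, 2_065_892)

# Organism counts copied from the statistics list.
ORGANISMS = {"eukaryotes": 441, "bacteria": 4_777, "archaea": 273}
REPORTED_ORGANISMS = 5_491


def table_is_consistent(rows, reported_total):
    """Check that every row and every column of the table adds up."""
    rows_ok = all(s + m == t for s, m, t in rows.values())
    column_sums = tuple(sum(r[i] for r in rows.values()) for i in range(3))
    return rows_ok and column_sums == reported_total


table_ok = table_is_consistent(ROWS, REPORTED_TOTAL)
organisms_ok = sum(ORGANISMS.values()) == REPORTED_ORGANISMS
print("table consistent:", table_ok)
print("organism counts consistent:", organisms_ok)
```

The total in the last column also matches the "Number of clusters" figure, 2,065,892.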

ver. 2018-10-29