VOG (virus ortholog group) can be searched by protein name, virus name, virus family name, Baltimore class, or K number. Try, for example, "RNA polymerase" or "RNA polymerase -ssRNA".
KEGG Virus Resource
KEGG Virus is a resource for integrated analysis of viruses and cellular organisms. It is part of the GENES, KO, GENOME, BRITE, PATHWAY, MODULE, NETWORK, DISEASE and DRUG databases in KEGG. It also contains a computationally generated dataset of virus ortholog groups (VOGs).
Virus data taken from RefSeq release 229
Realm | Class | #Vtax | #Gene |
Riboviria | dsDNA-RT | 131 | 581 |
Riboviria | ssRNA-RT | 92 | 340 |
Riboviria | dsRNA | 414 | 1,956 |
Riboviria | +ssRNA | 3,104 | 9,900 |
Riboviria | -ssRNA | 1,244 | 6,059 |
Ribozyviria | -ssRNA | 11 | 18 |
Duplodnaviria | dsDNA | 5,305 | 568,713 |
Adnaviria | dsDNA | 34 | 1,741 |
Varidnaviria | dsDNA | 293 | 54,659 |
Varidnaviria | ssDNA | 1 | 16 |
Singelaviria | dsDNA | 9 | 358 |
Monodnaviria | ssDNA | 1,559 | 7,750 |
Monodnaviria | dsDNA | 340 | 2,174 |
Riboviria | - | 970 | 2,105 |
unclassified | dsDNA | 159 | 26,420 |
unclassified | ssDNA | 123 | 562 |
other | - | 588 | 5,271 |
Total | | 14,377 | 688,623 |
VOG version 2025-03-27
VOG threshold | 30% | 50% | 70% |
Number of VOGs | 50,667 | 76,299 | 87,373 |
Number of proteins in VOGs | 605,412 | 550,952 | 494,621 |
Total number of proteins | 676,333 |
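As a small worked example of reading the VOG table, the sketch below computes, for each threshold, the fraction of all proteins that fall into a VOG and the mean number of proteins per VOG. The numbers are copied from the table; the variable and function names are illustrative only and are not part of KEGG.

```python
# Figures copied from the VOG statistics table; names are hypothetical.
TOTAL_PROTEINS = 676_333
VOG_COUNTS = {30: 50_667, 50: 76_299, 70: 87_373}
PROTEINS_IN_VOGS = {30: 605_412, 50: 550_952, 70: 494_621}


def coverage(threshold: int) -> float:
    """Fraction of all proteins that belong to some VOG at this threshold."""
    return PROTEINS_IN_VOGS[threshold] / TOTAL_PROTEINS


def mean_vog_size(threshold: int) -> float:
    """Average number of proteins per VOG at this threshold."""
    return PROTEINS_IN_VOGS[threshold] / VOG_COUNTS[threshold]


if __name__ == "__main__":
    for t in sorted(VOG_COUNTS):
        print(f"{t}%: coverage={coverage(t):.1%}, mean size={mean_vog_size(t):.2f}")
```

Raising the threshold splits the data into more, smaller groups: coverage falls from roughly 89.5% at the 30% threshold to roughly 73.1% at 70%, while the mean group size falls from about 12 to under 6 proteins.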
Current statistics (2025/4/28)
Number of viral genes (vg entries) | 688,623 |
Number of viral mature peptides (vp entries) | 377 |
Number of vg/vp entries with assigned KOs | 57,715 |
Number of KOs assigned to vg/vp entries | 1,758 |
Number of virus-specific KOs | 1,325 |
Examples of integrated analysis
Complete modules such as M00793 and M00988 |
Conserved gene orders such as this |
Virus genes, genomes and taxonomy
The virus category of the GENES and GENOME databases is generated from the bimonthly release of the NCBI RefSeq database. RefSeq GeneIDs are used as gene identifiers, and the organism code "vg" is used for the entire set of virus genes. Thus, each virus gene in KEGG is identified by vg:<gene_id>, where <gene_id> is the NCBI GeneID.
In order to distinguish virus genomes, the Vtax identifier is used in the GENOME database in the form of gn:<vtax>. Vtax is the same as the NCBI taxonomy ID, so a single Vtax entry may occasionally contain multiple genome sequences that were determined separately.
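The identifier forms described above can be handled with a small parser. The following is a minimal sketch, not a KEGG tool: the `VirusEntry` type and `parse_entry` function are hypothetical names, and the numeric suffix accepted for "vp" entries (as in vp:43740568-1, which appears later on this page) is an assumption about the mature peptide format.

```python
import re
from typing import NamedTuple, Optional


class VirusEntry(NamedTuple):
    prefix: str             # "vg" (gene), "vp" (mature peptide) or "gn" (genome)
    number: int             # NCBI GeneID for vg/vp, NCBI taxonomy ID (Vtax) for gn
    peptide: Optional[int]  # numeric suffix of a vp entry, otherwise None


# Assumed syntax: <prefix>:<digits>, with an optional "-<digits>" suffix.
_ENTRY = re.compile(r"^(vg|vp|gn):(\d+)(?:-(\d+))?$")


def parse_entry(identifier: str) -> VirusEntry:
    """Split an identifier such as 'vg:43740568' into its parts."""
    match = _ENTRY.match(identifier)
    if match is None:
        raise ValueError(f"not a recognised virus identifier: {identifier!r}")
    prefix, number, peptide = match.groups()
    if peptide is not None and prefix != "vp":
        # Assumption: only mature peptide (vp) entries carry a suffix.
        raise ValueError(f"unexpected suffix on {prefix} entry: {identifier!r}")
    return VirusEntry(prefix, int(number), int(peptide) if peptide else None)


if __name__ == "__main__":
    print(parse_entry("vg:43740568"))
    print(parse_entry("vp:43740568-2"))
```

Keeping the number as an integer makes it easy to pass the same value to NCBI resources, since the GeneID and taxonomy ID are shared with NCBI.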
All viruses are classified according to the NCBI taxonomy, which is based on the ICTV (International Committee on Taxonomy of Viruses) taxonomy, supplemented by KEGG with the traditional Baltimore classification. The correspondence between the ICTV realm, kingdom, phylum, class ranks and the seven types of Baltimore classification is shown below. Additional family-level assignments without higher ranks are also included.

Riboviria
  Pararnavirae
    Artverviricota
      Revtraviricetes
        Blubervirales (VII dsDNA-RT)
        Ortervirales (VI ssRNA-RT)
          Caulimoviridae (VII dsDNA-RT)
  Orthornavirae
    Duplornaviricota (III dsRNA)
    Pisuviricota
      Duplopiviricetes (III dsRNA)
        Durnavirales
          Hypoviridae (IV +ssRNA)
          Fusariviridae (IV +ssRNA)
      Pisoniviricetes (IV +ssRNA)
      Stelpaviricetes (IV +ssRNA)
    Kitrinoviricota (IV +ssRNA)
    Lenarviricota (IV +ssRNA)
    Negarnaviricota (V -ssRNA)
Ribozyviria (V -ssRNA)

Duplodnaviria (I dsDNA)
Adnaviria (I dsDNA)
Varidnaviria (I dsDNA)
  Bamfordvirae
    Preplasmiviricota
      Ainoaviricetes (II ssDNA)
Singelaviria (I dsDNA)
Monodnaviria (II ssDNA)
  Shotokuvirae
    Cossaviricota
      Papovaviricetes (I dsDNA)

The resulting classification is shown in the following Brite hierarchy files, where 08621 is used as the default file.
- 08620 KEGG viruses in the NCBI taxonomy
- 08621 KEGG viruses in taxonomic ranks (fixed levels of taxonomic ranks reorganized from 08620)
The Vtax identifiers used in these taxonomy files are not stable identifiers, for they may change when NCBI changes the taxonomy IDs. Thus, the T4 identifiers are given to selected viruses, especially for the purpose of creating links to/from the DISEASE database.
- 08622 KEGG selected viruses
Virus KOs
Virus-specific KOs are defined on the basis of experimental evidence, as summarized in the following BRITE hierarchy file.
- 03200 Viral proteins
KO | Name | Sequence with experimental evidence |
K25001 | SARS coronavirus spike protein S1 | vp:43740568-1 |
K25002 | SARS coronavirus spike protein S2 | vp:43740568-2 |
K24152 | SARS coronavirus spike glycoprotein | vg:43740568 |
K24324 | MERS coronavirus spike glycoprotein | vg:14254594 |
K24325 | Betacoronavirus (excluding SARS and MERS) spike glycoprotein | vg:39105218 |
K19254 | Coronaviridae (excluding betacoronavirus) spike glycoprotein | vg:918758 |
Virus ortholog group (VOG)
Because experimental evidence is scarce, the definition and assignment of KOs for virus genes is very limited. To supplement KOs, virus ortholog groups (formerly called virus ortholog clusters) are generated computationally from the same annotation resource, the GFIT tables (see KO assignment tools).
Currently, virus ortholog groups are generated by the simple procedure shown below.
- The result of the vg-vg comparison is stored in the paralog GFIT table, a variant form of which may be viewed from the "Paralog" button of each vg entry page.
- Similarity is measured by a modified identity: the identity of the overlap (aligned) region multiplied by the weight min(1, overlap*2/(aalen1+aalen2)).
- For each gene, its GFIT table is used to collect similar genes above a given threshold of modified identity.
- The GFIT tables of these similar genes are then used to collect additional similar genes, and this process is repeated until no further genes are added.
- This can be viewed as single-linkage clustering of truncated GFIT tables, which are processed in order of decreasing table size.
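The procedure above can be sketched in a few lines of Python. This is an illustrative reading of the listed steps, not KEGG's actual implementation. It makes three assumptions: identity is a fraction between 0 and 1, aalen1 and aalen2 are the amino acid lengths of the two proteins, and a "truncated GFIT table" is the set of hits for one gene at or above the threshold. The input tuple layout and all function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record for one pairwise comparison:
# (gene_a, gene_b, identity, overlap, aalen_a, aalen_b)
Hit = Tuple[str, str, float, int, int, int]


def modified_identity(identity: float, overlap: int, aalen1: int, aalen2: int) -> float:
    """Identity of the aligned region, down-weighted when the overlap is short."""
    weight = min(1.0, overlap * 2 / (aalen1 + aalen2))
    return identity * weight


def truncated_tables(hits: Iterable[Hit], threshold: float) -> Dict[str, Set[str]]:
    """Per-gene sets of similar genes at or above the threshold."""
    tables: Dict[str, Set[str]] = {}
    for a, b, identity, overlap, len_a, len_b in hits:
        tables.setdefault(a, set())
        tables.setdefault(b, set())
        if modified_identity(identity, overlap, len_a, len_b) >= threshold:
            tables[a].add(b)
            tables[b].add(a)
    return tables


def ortholog_groups(hits: Iterable[Hit], threshold: float) -> List[Set[str]]:
    """Single-linkage grouping, seeding from the largest tables first."""
    tables = truncated_tables(hits, threshold)
    assigned: Set[str] = set()
    groups: List[Set[str]] = []
    for seed in sorted(tables, key=lambda g: (-len(tables[g]), g)):
        if seed in assigned:
            continue
        group, frontier = {seed}, [seed]
        while frontier:  # repeat until no gene is added
            gene = frontier.pop()
            for other in tables[gene]:
                if other not in group and other not in assigned:
                    group.add(other)
                    frontier.append(other)
        assigned |= group
        groups.append(group)
    return groups


if __name__ == "__main__":
    demo = [
        ("a", "b", 0.8, 100, 100, 100),  # weight 1.0, modified identity 0.8
        ("b", "c", 0.9, 50, 100, 100),   # weight 0.5, modified identity 0.45
        ("c", "d", 0.6, 100, 100, 110),  # weight about 0.95
    ]
    print(ortholog_groups(demo, threshold=0.5))
    print(ortholog_groups(demo, threshold=0.3))
```

In the demo data, the b-c pair has high identity but a short overlap, so its modified identity is 0.45. At a threshold of 0.5 the four genes form two groups, while at 0.3 the b-c link is kept and single linkage merges everything into one group.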
The most important usage of VOGs is described in KEGG Syntax involving the analysis of conserved genes and gene orders among viruses and also between viruses and cellular organisms.
Reference
- Jin, Z., Sato, Y., Kawashima, M., and Kanehisa, M.; KEGG tools for classification and analysis of viral proteins. Protein Sci. 32, e4820 (2023). VOC (virus ortholog cluster) described in this paper is now called VOG (virus ortholog group).
- Kanehisa, M., Furumichi, M., Sato, Y., Matsuura, Y., and Ishiguro-Watanabe, M.; KEGG: biological systems database as a model of the real world. Nucleic Acids Res. 53, D672-D677 (2025). See also this paper about VOG.
Last updated: April 8, 2025