
MAPLE - Metabolic And Physiological potentiaL Evaluator

for mapping genes to KEGG functional modules and calculating module completion ratios (MCRs)

MAPLE Help

Metabolic and physiological potential evaluator (MAPLE) is an automated system for mapping the genes of individual genomes and metagenomes to the functional modules defined by KEGG and for calculating module completion ratios (MCRs), the percentage of a module's components filled by the input genes. It first assigns KEGG Orthology (KO) identifiers to the query genes using the KEGG Automatic Annotation Server (KAAS), then maps the KO-assigned genes to the KEGG functional modules, and finally calculates the MCR of each functional module. MAPLE provides a user-friendly web interface for submitting genomic and metagenomic data and for viewing mapping patterns and MCR results.
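The MCR idea can be illustrated with a minimal Python sketch. It assumes a heavily simplified module model: a flat list of steps, where each step is a set of alternative KOs and counts as filled if any one of them was assigned to the query genes. Real KEGG module definitions also contain complexes and alternative paths, so this is not MAPLE's own algorithm; the function name `module_completion_ratio` and the toy KO sets are hypothetical.

```python
from typing import Iterable, Sequence, Set


def module_completion_ratio(steps: Sequence[Set[str]], assigned_kos: Iterable[str]) -> float:
    """Return the percentage of module steps filled by the assigned KOs.

    Simplified model (an assumption, not MAPLE's algorithm): each step is a
    set of alternative KO identifiers, and a step is filled when at least one
    of them occurs among the KOs assigned to the query genes.
    """
    if not steps:
        raise ValueError("a module must contain at least one step")
    present = set(assigned_kos)
    # Count steps that share at least one KO with the query assignments.
    filled = sum(1 for alternatives in steps if alternatives & present)
    return 100.0 * filled / len(steps)


if __name__ == "__main__":
    # Toy three-step module; the KO sets are made up for illustration.
    toy_module = [{"K00001", "K00002"}, {"K00003"}, {"K00004"}]
    # KOs assigned to a hypothetical query genome.
    query_kos = ["K00002", "K00004", "K99999"]
    print(f"MCR = {module_completion_ratio(toy_module, query_kos):.1f}%")  # prints: MCR = 66.7%
```

Here two of the three steps are filled, giving an MCR of about 66.7%.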

Data Submission

Query sequences


Submission form: users can select a data set file to upload.

Individual organism
Query sequences must be amino acid sequences representing a set of protein-coding genes from the complete or partial genome of an individual organism. KEGG Orthology (KO) identifiers are assigned to the query sequences by KAAS on the basis of BLASTP searches against the KEGG GENES database. Query sequences should be in multi-FASTA format with unique IDs, and the IDs must not contain tab characters. After KO assignment is completed for every query sequence, the KO-assigned sequences are mapped to the KEGG functional modules, and the module completion ratios are then calculated.

Example of data set: complete genes
>G0001
MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFSLNNLGRFADKLPSEP
RENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSACSVVAALMAMNEHCGKPLND
TRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDIISQQVPGFDEWLWVLAYPGI
KVSTAEARAILPAQYRRQDCIAHGRHLAGFIHACYSRQPELAAKLMKDVIAEPYRERLLP
GFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVADWLGKNYLQNQEGFVHICRL
DTAG
>G0002
MKLYNLKDHNEQVSFAQAVTQGLGKNQGLFFPHDLPEFSLTEIDEMLKLDFVTRSAKILS
AFIGDEIPQEILEERVRAAFAFPAPVANVESDVGCLELFHGPTLAFKDFGGRFMAQMLTH
IAGDKPVTILTATSGDTGAAVAHAFYGLPNVKVVILYPRGKISPLQEKLFCTLGGNIETV
AIDGDFDACQALVKQAFDDEELKVALGLNSANSINISRLLAQICYYFEAVAQLPQETRNQ
LVVSVPSGNFGDLTAGLLAKSLGLPVKRFIAATNVNDTVPRFLHDGQWSPKATQATLSNA
MDVSQPNNWPRVEELFRRKIWQLKELGYAAVDDETTQQTMRELKELGYTSEPHAAVAYRA
LRDQLNPGEYGLFLGTAHPAKFKESVEAILGETLDLPKELAERADLPLLSHNLPADFAAL
RKLMMNHQ
>G0003
MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQH
YEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR
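Before uploading, the FASTA requirements above (unique IDs, no tab characters in IDs) can be checked locally. The following Python sketch is a hypothetical pre-submission helper, not part of MAPLE; it assumes that the whole header line after `>` is the sequence ID.

```python
from typing import Dict, List, Optional


def parse_multifasta(text: str) -> Dict[str, str]:
    """Parse multi-FASTA text into {sequence ID: amino acid sequence}.

    Raises ValueError for an empty, duplicated, or tab-containing ID, or for
    sequence data appearing before the first header line.
    """
    chunks: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line_no, line in enumerate(text.splitlines(), start=1):
        if line.startswith(">"):
            header = line[1:]
            # IDs must not contain tab characters.
            if "\t" in header:
                raise ValueError(f"line {line_no}: ID contains a tab character")
            seq_id = header.strip()
            if not seq_id:
                raise ValueError(f"line {line_no}: empty ID")
            # IDs must be unique within the file.
            if seq_id in chunks:
                raise ValueError(f"line {line_no}: duplicate ID {seq_id!r}")
            chunks[seq_id] = []
            current = seq_id
        elif line.strip():
            if current is None:
                raise ValueError(f"line {line_no}: sequence data before the first header")
            chunks[current].append(line.strip())
    # Join wrapped sequence lines into one string per record.
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


if __name__ == "__main__":
    example = ">G0001\nMVKVYAPASS\nANMSVGFDVL\n>G0002\nMKLYNLKDHN\n"
    seqs = parse_multifasta(example)
    print({k: len(v) for k, v in seqs.items()})  # prints: {'G0001': 20, 'G0002': 10}
```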


Metagenome
Query sequences must be amino acid sequences representing a set of protein-coding genes predicted from metagenomic sequences generated by high-throughput DNA sequencers such as the Roche 454 and Illumina MiSeq. KO identifiers are assigned to the query sequences by KAAS on the basis of BLASTP searches against the KEGG GENES database. Query sequences need not be complete genes, but sequences longer than 100 amino acid (aa) residues are recommended for accurate KO assignment. Query sequences should be in multi-FASTA format with unique IDs that do not contain tab characters. The query file must not exceed 30 MB (roughly 100,000 to 200,000 sequences); if it does, an error message is displayed. After KO assignment is completed for every query sequence, the KO-assigned sequences are mapped to the KEGG functional modules, and the module completion ratios are then calculated. Computation time is usually roughly proportional to the size of the query data; 200,000 sequences take several days.

Example of data set: complete or incomplete genes predicted from short-read sequences
>MG0001
MVKVYAPASSANMSVGFDVLGAVTPVDGALLGDVVTVEAAETFSLNNLGRFADKLPSEPR
ENIVYQCWERFCQELGKQIPVAMTLEKNSSACVAALMAMNEHCGKPLNDTRLLALMGELE
GRISGSIHYDNVAPCFLG
>MG0002
MKLYNLKDHNEQVSFAQAVTQGLGKNQGLFFPDLPEFSLTEIDEMLKLDFVTRSAKILSA
FIGDEIPQEILEERVRAAFAFPAPVVESDVGCLELFHGPTLAFKDFGGRFMAQMLTHIAG
DKPVTILTATSGDTGDTVPRFLHDGQWSPKAT
>MG0003
MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQH
YEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHRAL
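For metagenomic uploads, the 30 MB limit and the 100-residue recommendation can be checked before submission. This Python sketch is hypothetical and not part of MAPLE; it assumes that "30 MB" means 30 × 1024 × 1024 bytes, and that sequences of 100 residues or fewer are worth flagging because the text recommends sequences longer than 100 residues.

```python
import os
import tempfile
from typing import List, Tuple

# Assumption: "30 MB" is interpreted as 30 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 30 * 1024 * 1024
# Sequences longer than this many residues are recommended for KO assignment.
MIN_RECOMMENDED_LENGTH = 100


def sequence_lengths(path: str) -> List[Tuple[str, int]]:
    """Return (ID, length in residues) for each record of a multi-FASTA file."""
    lengths: List[Tuple[str, int]] = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                lengths.append((line[1:], 0))
            elif line and lengths:
                # Add this (possibly wrapped) sequence line to the current record.
                seq_id, n = lengths[-1]
                lengths[-1] = (seq_id, n + len(line))
    return lengths


def precheck_metagenome(path: str) -> List[str]:
    """Return report lines; raise ValueError if the file exceeds the size limit."""
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{path} is {size} bytes; the limit is 30 MB")
    records = sequence_lengths(path)
    short = [seq_id for seq_id, n in records if n <= MIN_RECOMMENDED_LENGTH]
    report = [f"{len(records)} sequences, {size} bytes"]
    if short:
        report.append(
            f"{len(short)} sequences have {MIN_RECOMMENDED_LENGTH} residues or fewer; "
            "KO assignment may be less accurate"
        )
    return report


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
        tmp.write(">MG0001\n" + "M" * 150 + "\n>MG0003\n" + "M" * 40 + "\n")
    for message in precheck_metagenome(tmp.name):
        print(message)
    os.remove(tmp.name)
```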



Waiting state of a calculation.

After a data set is submitted, the message shown above is displayed. A URL for accessing the results is sent by e-mail once the computation has been completed.

Result pages for MCR calculations

  • Output example of an MCR calculation.
  • Output example of a comparison of multiple results.
  • The URL sent to the user opens a result page containing a job list. Once a calculation has been completed, its Job ID can be opened to view and interpret the results.


    Initial page shown when the submitter opens the URL sent by e-mail

    Initial result page for each job ID


    Module information and mapping pattern to the KEGG module

    On the first page, users can view the MCRs of KEGG modules as a histogram and download the raw MCR results in Excel format. Clicking a module ID displays the mapping pattern for that KEGG module, as shown below.

    Module completion pattern of reference organisms

    (A) Modules completed by almost all reference organisms. (B) Modules completed by very few organisms. The organisms that complete each module are color-coded by phylum.


    Module completion patterns of the KEGG modules across 1288 prokaryotic species

    The distribution of module completion ratios in 1288 prokaryotic species (one genome per species) can be classified into four patterns (universal, restricted, diversified, and non-prokaryotic) regardless of module type (pathway, structural complex, signature, or functional set), taking 70% of all species as the threshold for a majority. Pattern A, "universal," comprises modules completed by more than 70% of the 1288 species. Pattern B, "restricted," comprises modules completed by less than 30% of the species, including 37 rare modules completed by less than 10% of the 1288 species. Pattern C, "diversified," accounts for 40.3% of all pathway modules and comprises modules whose completion ratios vary widely. Pattern D, "non-prokaryotic," accounts for 35% of all pathway modules and comprises modules that are not completed by any prokaryotic species. These four patterns, together with the taxonomic breakdown for each module, help in interpreting module completion patterns.
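The thresholds in the paragraph above can be written as a small classification rule. With 1288 species, 70% is 901.6, so "more than 70%" means at least 902 species; 30% is 386.4 (at most 386 species for "restricted") and 10% is 128.8 (at most 128 species for "rare"). The Python sketch below is hypothetical: the text defines pattern C only as modules whose completion ratios vary widely, so treating every remaining module as "diversified" is an assumption made here for illustration.

```python
def completion_pattern(completing: int, total: int = 1288) -> str:
    """Classify a module by how many of `total` species complete it.

    Thresholds follow the text: universal (> 70% of species), restricted
    (< 30%, "rare" below 10%), non-prokaryotic (completed by no species).
    Assumption: every other module is labelled "diversified".
    """
    if total <= 0 or not 0 <= completing <= total:
        raise ValueError("need total > 0 and 0 <= completing <= total")
    fraction = completing / total
    if completing == 0:
        return "D: non-prokaryotic"
    if fraction > 0.70:
        return "A: universal"
    if fraction < 0.10:
        return "B: restricted (rare)"
    if fraction < 0.30:
        return "B: restricted"
    return "C: diversified"


if __name__ == "__main__":
    for n in (0, 50, 300, 600, 1000):
        print(n, completion_pattern(n))
```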

    List of K numbers

    Some KO identifiers assigned to a KEGG module are shared with other modules; these other modules are listed together with their MCRs. When a module's MCR is low, it should be interpreted in light of the MCRs of the other modules that share its KO IDs. In addition, all pathway maps containing the KO IDs assigned to the module are listed in the rightmost column.
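Finding the modules that share KO IDs with a given module amounts to building an inverted index from KO to modules. The Python sketch below is a hypothetical illustration, not MAPLE's code; the module IDs, KO IDs, and MCR values are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def modules_sharing_kos(
    target: str,
    module_kos: Dict[str, Set[str]],
    mcr: Dict[str, float],
) -> List[Tuple[str, Set[str], float]]:
    """List other modules that share KOs with `target`, with their MCRs.

    Returns (module ID, shared KOs, MCR) tuples sorted by decreasing MCR.
    """
    # Inverted index: KO -> every module whose definition contains it.
    ko_to_modules: Dict[str, Set[str]] = defaultdict(set)
    for module, kos in module_kos.items():
        for ko in kos:
            ko_to_modules[ko].add(module)

    # Collect the KOs each other module shares with the target.
    shared: Dict[str, Set[str]] = defaultdict(set)
    for ko in module_kos[target]:
        for other in ko_to_modules[ko]:
            if other != target:
                shared[other].add(ko)

    return sorted(
        ((other, kos, mcr.get(other, 0.0)) for other, kos in shared.items()),
        key=lambda item: (-item[2], item[0]),
    )


if __name__ == "__main__":
    # Placeholder module definitions and MCRs for illustration only.
    module_kos = {"M_A": {"K1", "K2", "K3"}, "M_B": {"K2", "K4"}, "M_C": {"K3", "K5"}, "M_D": {"K6"}}
    mcr = {"M_A": 33.3, "M_B": 100.0, "M_C": 50.0, "M_D": 0.0}
    for other, kos, value in modules_sharing_kos("M_A", module_kos, mcr):
        print(other, sorted(kos), value)
```

A low MCR for `M_A` alongside a high MCR for `M_B` would suggest that the shared KO is being used by `M_B` rather than indicating a partially present `M_A`.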

    Mapping pattern

    KO-assigned sequences from a genome or metagenome are mapped to KEGG modules. Because several metagenomic sequences are often assigned to the same KO, the IDs of all such sequences are mapped to the module, and the abundance of each ortholog is indicated by shades of red.
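The mapping and shading described above can be sketched in Python. This assumes the KO assignments are available as (sequence ID, KO) pairs, for example parsed from a two-column, tab-separated query-to-KO list; the white-to-red colour scale is a hypothetical choice, not the one MAPLE uses.

```python
from typing import Dict, Iterable, List, Tuple


def map_to_module(assignments: Iterable[Tuple[str, str]], module_kos: Iterable[str]) -> Dict[str, List[str]]:
    """Group query sequence IDs by KO, keeping only KOs that belong to the module."""
    wanted = set(module_kos)
    mapped: Dict[str, List[str]] = {ko: [] for ko in sorted(wanted)}
    for seq_id, ko in assignments:
        if ko in wanted:
            mapped[ko].append(seq_id)
    return mapped


def red_shade(count: int, max_count: int) -> str:
    """Hypothetical colour scale: white for 0, pure red (#FF0000) for the maximum count."""
    if max_count <= 0 or count <= 0:
        return "#FFFFFF"
    count = min(count, max_count)
    # Lower green/blue components give a deeper red for more abundant orthologs.
    level = round(255 * (1 - count / max_count))
    return f"#FF{level:02X}{level:02X}"


if __name__ == "__main__":
    assignments = [("MG0001", "K1"), ("MG0002", "K1"), ("MG0003", "K2"), ("MG0004", "K9")]
    mapped = map_to_module(assignments, ["K1", "K2", "K3"])
    top = max(len(ids) for ids in mapped.values())
    for ko, ids in mapped.items():
        print(ko, ids, red_shade(len(ids), top))
```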

    Comparison of multiple results

    Comparison of several jobs
    When several job IDs are checked in the job list, their MCRs and module mapping patterns are displayed together for comparison, as follows.


    User's own job list

    Comparative results of three jobs

    Comparison of job results against pre-analyzed individual organisms

    Comparison of job results against individual organisms
    To compare their own job results with those of pre-analyzed individual organisms, users can select the organisms to be compared from the organism list.
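The web interface performs these comparisons itself; the Python sketch below only illustrates how MCR results, once downloaded and loaded into dictionaries, might be combined offline into a module-by-column table. The function name, job IDs, organism names, and values are hypothetical.

```python
from typing import Dict, List


def mcr_matrix(results: Dict[str, Dict[str, float]]) -> List[List[str]]:
    """Combine per-job or per-organism MCRs into a module-by-column table.

    `results` maps a column label (job ID or organism name) to {module ID: MCR}.
    Modules missing from a column are shown as "-".
    """
    columns = list(results)
    modules = sorted({module for mcrs in results.values() for module in mcrs})
    table = [["module"] + columns]
    for module in modules:
        row = [module]
        for column in columns:
            value = results[column].get(module)
            row.append("-" if value is None else f"{value:.1f}")
        table.append(row)
    return table


if __name__ == "__main__":
    # Hypothetical user job and reference organism results.
    results = {"job_1": {"M_A": 100.0, "M_B": 50.0}, "org_X": {"M_A": 100.0, "M_C": 75.0}}
    for row in mcr_matrix(results):
        print("\t".join(row))
```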