RefGene: Microbial reference gene catalogs
===========================================

Laboratory of Chemical Life Science, Bioinformatics Center,
Institute for Chemical Research, Kyoto University
Uji, Kyoto 611-0011, Japan
Feedback: http://www.genome.jp/feedback/
============================================================

RefGene is a set of non-redundant microbial reference gene catalogs. It consists 
of three previously published catalogs: OM-RGC and IGC, which are derived from 
metagenomes, and MATOU, which is derived from metatranscriptomes. They contain 
genes from marine (OM-RGC and MATOU) and human gut (IGC) microorganisms. Taxonomic 
and functional annotation was performed at GenomeNet by sequence similarity searches 
against the KEGG GENES database using GhostKOALA (http://www.kegg.jp/ghostkoala/), 
at the amino acid level for OM-RGC and IGC and at the translated DNA level for MATOU. 
For further information about the catalogs, see [1] for OM-RGC, [2] for IGC and [3] for MATOU.

This directory contains the following files:
RG001: annotated gene entries of OM-RGC.
RG001.nuc: FASTA formatted nucleotide sequences of OM-RGC.
RG001.pep: FASTA formatted amino acid sequences of OM-RGC.
RG001.gene.occurrences.matrix.gz: A matrix in which rows correspond to genes and columns correspond to samples.
    Values are gene occurrences, expressed as the number of bases normalized by gene length.
    Sample names have the form: TARA_<ocean station number>_<sampling depth>_<filter size (um)>
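
As an illustration of the matrix layout described above, the following Python sketch streams such a file row by row. It is not part of the original distribution. It assumes that the matrix is tab-separated, that the first row is a header whose first cell labels the gene column, and that sample names split cleanly on underscores into four fields. The function names and the demo sample names and values are made up for the example.

```python
import csv
import gzip
import io


def iter_gene_rows(handle):
    """Yield (gene_id, {sample: value}) from a tab-separated matrix.

    Assumption: the first row is a header; its first cell labels the gene
    column and the remaining cells are sample names.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene_id, values = row[0], row[1:]
        yield gene_id, {s: float(v) for s, v in zip(samples, values)}


def split_sample_name(name):
    """Split 'TARA_<station>_<depth>_<filter size>' (assumed layout)."""
    _prefix, station, depth, size = name.split("_", 3)
    return {"station": station, "depth": depth, "filter_size_um": size}


def open_matrix(path="RG001.gene.occurrences.matrix.gz"):
    """Open the gzipped matrix as text for use with iter_gene_rows."""
    return gzip.open(path, "rt", newline="")


# Tiny in-memory stand-in for the real file; names and values are invented.
demo = io.StringIO(
    "gene\tTARA_004_DCM_0.22-1.6\tTARA_004_SRF_0.22-1.6\n"
    "geneA\t1.5\t0\n"
    "geneB\t0\t2.25\n"
)
rows = list(iter_gene_rows(demo))
```

With the real file, `iter_gene_rows(open_matrix())` would be consumed lazily rather than collected into a list, since the matrix may be too large to hold in memory.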

RG002: annotated gene entries of IGC.
RG002.nuc: FASTA formatted nucleotide sequences of IGC.
RG002.pep: FASTA formatted amino acid sequences of IGC.
RG002.gene.occurrences: A matrix in which rows correspond to genes and columns correspond to samples.
    Values are relative gene abundances.

RG003: annotated unigene entries of MATOU.
RG003.nuc: FASTA formatted nucleotide sequences of MATOU.
RG003.gene.occurrences.tsv.gz: Tab-delimited file with the following columns:
    UnigeneID
    SampleCode: a string concatenating the following information:
        Station
        Sampling depth (categorical)
        Size fraction (categorical)
        Internal code used for the sequenced template
    Occurrence (in RPKM)
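
The three-column layout above can be read with a short Python sketch such as the one below. It is an illustration only. It assumes the columns appear in the order listed, skips any row whose third field is not numeric (for example a header line, if the file has one), and treats SampleCode as an opaque string because the separator used inside it is not stated here. The function name and the demo values are hypothetical.

```python
import csv
import gzip
import io
from collections import defaultdict


def read_occurrences(handle):
    """Return {unigene_id: {sample_code: rpkm}} from a three-column TSV.

    Assumption: columns are UnigeneID, SampleCode, Occurrence, in that order.
    """
    table = defaultdict(dict)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # ignore blank or malformed lines
        unigene_id, sample_code, value = row
        try:
            rpkm = float(value)
        except ValueError:
            continue  # non-numeric third field, e.g. a header row
        table[unigene_id][sample_code] = rpkm
    return dict(table)


def open_occurrences(path="RG003.gene.occurrences.tsv.gz"):
    """Open the gzipped TSV as text for use with read_occurrences."""
    return gzip.open(path, "rt", newline="")


# Tiny in-memory stand-in for the real file; IDs and values are invented.
demo = io.StringIO(
    "UnigeneID\tSampleCode\tOccurrence\n"
    "unigene_1\tsampleA\t3.0\n"
    "unigene_1\tsampleB\t0.5\n"
    "unigene_2\tsampleA\t1.25\n"
)
table = read_occurrences(demo)
totals = {u: sum(s.values()) for u, s in table.items()}
```

Note that this sketch loads the whole table into memory; for the full file it may be preferable to filter for the unigenes or samples of interest inside the loop.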

Gene occurrence files are as provided with the original publications associated with the catalogs [1,2,3] and were downloaded from:
http://ocean-microbiome.embl.de/companion.html (RG001)
http://meta.genomics.cn/meta/dataTools (RG002)
http://www.genoscope.cns.fr/tara/ (RG003)


REFERENCES
==========
[1] Sunagawa, S. et al. (2015) "Structure and function of the global ocean microbiome." Science, 348(6237)
[2] Li, J. et al. (2014) "An integrated catalog of reference genes in the human gut microbiome." Nat Biotechnol, 32(8)
[3] Carradec, Q. et al. (2018) "A global ocean atlas of eukaryotic genes." Nat Commun, 9, 373

============================================================
Last update: 2018/12/18

