EPD Home page <http://www.epd.isb-sib.ch/index.html>

------------------------------------------------------------------------


  EUKARYOTIC PROMOTER DATABASE
  USER MANUAL

Written by: Philipp Bucher, Rouaida Cavin Périer, Viviane Praz and
Christoph Schmid

Swiss Institute of Bioinformatics
and Swiss Institute for Experimental Cancer Research
Ch. des Boveresses 155
CH-1066 Epalinges s/Lausanne
Switzerland

Electronic mail:

via webmail form <http://www.epd.isb-sib.ch/webmail.html>

    This manual and the database it accompanies may be copied and
    redistributed freely, without advance permission, provided that this
    statement is reproduced with each copy.

    Published Research assisted by the Eukaryotic Promoter Database
    should cite:
    EPD in its twentieth year: towards complete promoter coverage of
    selected model organisms
    Schmid, C.D., Perier, R., Praz, V. and Bucher, P. (2006) Nucleic
    Acids Res, 34, D82-85.

------------------------------------------------------------------------


    EPD RELEASE 100, June 2009

*WHAT IS NEW IN RELEASE 100:* adaptations to 3-digit release numbers
  EPD release history
<http://www.epd.isb-sib.ch/current/EPD_release_history.html>


    CONTENTS

   1. INTRODUCTION <#INTRODUCTION>
   2. PROMOTER SELECTION <#PROMOTER_SELECTION>
   3. ASSIGNMENT OF INITIATION SITE
      <#ASSIGNMENT_OF_TRANSCRIPTION_INITIATION_SITE>
   4. FORMAT CONVENTIONS <#FORMAT_CONVENTIONS>
         1. The title line <#The_title_line>
         2. Promoter entries <#Promoter_entries>
               1. The ID line <#The_ID_line>
               2. The AC line <#The_AC_line>
               3. The DT line <#The_DT_line>
               4. The DE line <#The_DE_line>
               5. The OS line <#The_OS_line>
               6. The HG line <#The_HG_line>
               7. The AP line <#The_AP_line>
               8. The NP line <#The_NP_line>
               9. The DR line <#The_DR_line>
              10. The RN, RX, RA, RT and RL lines
                  <#The_RN,_RX,_RA,_RT_and_RL_lines>
              11. The ME line <#The_ME_line>
              12. The SE line <#The_SE_line>
              13. The FL line <#The_FL_line>
              14. The IF line <#The_IF_line>
              15. The TX line <#The_TX_line>
              16. The KW line <#The_KW_line>
              17. The FP, DO and RF lines <#The_FP,_DO_and_RF_lines>
              18. The // line <#The_//_line>
         3. Line types retained from the old format
            <#Line_types_retained_from_the_old_format>
               1. The FP line <#The_FP_line>
               2. Documentation <#Documentation>
               3. Literature references <#Literature_references>
               4. Miscellaneous <#Miscellaneous>
         4. Distinct format of 'preliminary' entries in epd_bulk.dat
            <#Distinct_format_of_preliminary_entries_in_epd_bulk.dat>
   5. CLASSIFICATION <#CLASSIFICATION>
   6. HOMOLOGOUS PROMOTERS <#HOMOLOGOUS_PROMOTERS>
   7. PROMOTER SEQUENCE RETRIEVAL <#PROMOTER_SEQUENCE_RETRIEVAL>
   8. REFERENCES <#REFERENCES>

   1. APPENDIX A : SURVEY OF RELEASE
      <http://www.epd.isb-sib.ch/current/SURVEY.html>
   2. APPENDIX B  : CODES AND ABBREVIATIONS <#APPENDIXB>
         1. SPECIES CODES <#SPECIES_CODES>
         2. JOURNAL CODES <#JOURNAL_CODES>
         3. ABBREVIATIONS <#ABBREVIATIONS>


    1 INTRODUCTION

The Eukaryotic Promoter Database EPD was designed and developed at the
Weizmann Institute of Science in Rehovot (Israel) and is currently
maintained at ISREC in Epalinges s/Lausanne (Switzerland). EPD is a
specialized annotation database of the EMBL Data Library. It provides
information about eukaryotic promoters available in the EMBL Data
Library and is intended to assist experimental researchers, as well as
computer analysts, in the investigation of eukaryotic transcription
signals. The present version originated from a previous compilation
published in an article (1
<http://www.epd.isb-sib.ch/current/usrman.html#ref_1>) and is organized
as a hierarchically ordered and documented "functional position set" (2
<http://www.epd.isb-sib.ch/current/usrman.html#ref_2>) pointing to
transcription initiation sites. All information is either directly
extracted from scientific literature or, starting from release 73,
compiled by a new in silico primer extension method (16 <#ref_16>). Thus
promoter information in EPD is independent of the EMBL sequence entry
descriptions. As a consequence, many of the initiation sites referred to
in EPD do not appear in corresponding EMBL feature tables. A coordinated
updating procedure has been set up by the two laboratories that will
ensure future compatibility between the position references in EPD and
the sequence data in the main data library. Investigators who access
EMBL via publicly available programs should be aware of the fact that
software producers occasionally modify the sequence data in ways that
render position references inaccurate. EPD is generally not compatible
with sequence data of another release because EMBL sequence entries are
not designed as stable data units. The completeness and accuracy of EPD
benefit greatly from user feedback. Any report of mistakes or omissions
would be very much appreciated. Direct communication of newly published
transcript mapping or gene expression data is also welcome. Please
forward all correspondence to the address given at the top of this
document. Use electronic mail if possible.


    2 PROMOTER SELECTION

EPD is a rigorously selected database. In order to be included in EPD, a
promoter must be:

   1. recognized by eukaryotic RNA POL II,
   2. active in a higher eukaryote,
   3. experimentally defined, or homologous and sufficiently similar to
      an experimentally defined promoter,
   4. biologically functional,
   5. available in the current EMBL release,
   6. distinct from other promoters in the database.

Explanations:

   1. Transcription by RNA POL II is bona fide assumed for protein
      coding genes but must be supported by alpha-amanitin data if the
      end product is an RNA.
   2. All eukaryotes except phycophyta, fungi, myxomycetes, and protozoa
      are considered higher eukaryotes. Note that the expression "active
      in" does not always refer to the source organism of the promoter
      (e.g. in viruses). EPD currently contains promoter sequences from
      139 different species <#SPECIES_CODES>.
   3. A promoter is experimentally determined if a corresponding
      transcription initiation site is mapped with a precision of +/- 5
      bp or higher. Any technique that characterizes the 5' terminus of
      an in vivo or in vitro generated RNA is acceptable. Single
      nuclease-protection or primer-extension data must be accompanied
      by additional evidence unless the gene's intron-exon organization
      is well established. Similarity is considered "sufficient" if
      percent identity (as defined in Section 6) is >=60% between -79
      and +20 or >=75% between -49 and +10.
   4. A promoter is biologically functional if it contributes to the
      source organism's survival and/or reproduction. This is bona fide
      assumed except for promoters of pseudogenes, minor transcription
      initiation sites (<20% of total gene transcripts), promoters
      giving rise to an unstable RNA product, and mutant promoters.
   5. The minimum sequence requirement is 45 bp between -49 and +10.
   6. Promoters are considered distinct if they originate from different
      gene loci or different species. Identity is assumed if two
      promoters from the same species exhibit >95% similarity between
      -79 and +20 while their genetic relationship is unknown. Multiple
      isolates of viruses or transposable elements are considered
      distinct if at least one promoter region fails to fulfill the
      above similarity criterion.
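
The similarity thresholds of explanation 3 can be written as a small
predicate. This is only a sketch: it assumes that the two percent
identity values have already been computed as defined in Section 6, and
the function name is hypothetical.

```python
def is_sufficiently_similar(identity_79_20: float, identity_49_10: float) -> bool:
    """Return True if either similarity threshold of Section 2 is met.

    identity_79_20: percent identity between -79 and +20
    identity_49_10: percent identity between -49 and +10
    """
    return identity_79_20 >= 60.0 or identity_49_10 >= 75.0
```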


    3 ASSIGNMENT OF TRANSCRIPTION INITIATION SITE

A eukaryotic promoter is defined as a DNA sequence around a
transcription initiation site. The position reference to the initiation
site is therefore the central part of a promoter entry. Its assignment
is based directly on experimental data shown in an article, proposed
adjustments originating from consensus sequence considerations being
ignored. In the case of minor discrepancies between different
publications averaged positions are given. Position references are
subject to permanent re-evaluation. A transcription initiation site may
be reassigned upon publication of new data. Position references are
replaced if longer upstream sequences of the same promoter become
available in a new EMBL sequence entry.
Several initiation sites preceding the same gene appear as alternative
promoters if they are clearly separated from each other or
differentially regulated. The minimum distance required between two
alternative initiation sites is 20 bp. Otherwise, they are considered a
single promoter region.
Four types of promoters are distinguished by one-letter codes in order
to account for the variety of transcription initiation patterns in
eukaryotes:

    * S: Single initiation site: >90% of all reported transcripts
      initiate within 10 bp (the experimental data usually do not allow
      distinction between a single cap-site and small mRNA 5'
      heterogeneity).
    * M: Multiple initiation sites: >75% of all reported transcripts
      initiate within 20 bp.
    * R: Initiation region: >75% of all reported transcripts initiate
      within 100 bp.
    * U: Undefined transcription initiation pattern, exclusively in
      'preliminary' entries in epd_bulk.dat (see next section).
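
The thresholds for the codes S, M and R can be illustrated with a short
sketch. It is not the procedure used by the EPD curators. It assumes
that "within N bp" means a window spanning fewer than N bp between the
first and last start site, and that the input is a plain list of mapped
transcript 5' ends (one number per transcript). The function names are
hypothetical, and None is returned when no criterion is met.

```python
from typing import List, Optional


def max_fraction_in_window(starts: List[int], width: int) -> float:
    """Largest fraction of start sites that fall in any window of `width` bp."""
    ordered = sorted(starts)
    best = 0
    left = 0
    for right, pos in enumerate(ordered):
        # Advance the left edge until the window spans fewer than `width` bp.
        while pos - ordered[left] >= width:
            left += 1
        best = max(best, right - left + 1)
    return best / len(ordered)


def initiation_type(starts: List[int]) -> Optional[str]:
    """Return 'S', 'M' or 'R', or None if no criterion is met."""
    if not starts:
        return None
    if max_fraction_in_window(starts, 10) > 0.90:
        return "S"
    if max_fraction_in_window(starts, 20) > 0.75:
        return "M"
    if max_fraction_in_window(starts, 100) > 0.75:
        return "R"
    return None
```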

Note that in addition to true alternative promoter activity, variability
in the position of the transcription initiation site might also be due
to experimental constraints, a biological variability in the activity of
RNA polymerase II, or the presence of highly similar (pseudo-) genes
with distinct transcription initiation sites.
In sequence entries that contain a complete RNA or DNA genome of a
retrovirus or a retrovirus-like transposable element, the position
reference points to the U3/R boundary of the 3'-terminal LTR.


    4 FORMAT CONVENTIONS

EPD is distributed as two ASCII flatfiles (epd.dat, epd_bulk.dat) in
essentially identical format. Differences in the format of 'preliminary'
entries in 'epd_bulk.dat' are described in paragraph 4.4
<#Distinct_format_of_preliminary_entries_in_epd_bulk.dat>. EPD files
contain a title line followed by a number of promoter entries.
Interspersed are group headings whose function and format are described
in the next section. The title line and parts of the promoter entries
are rigidly formatted so that the entire database conforms to the
standards of an FPS file (functional position set) of our current signal
search analysis (1
<http://www.epd.isb-sib.ch/current/usrman.html#ref_1>,2
<http://www.epd.isb-sib.ch/current/usrman.html#ref_2>) software.


      4.1. The title line

The title line of EPD is shown below:

TI   EPD83     Eukaryotic Promoter Database / Release 83              EP

The TI line contains the following fields:

 

columns 	data type
1- 2 	"TI"
3- 5 	(blank)
6-15 	FPS name
16-70 	title
71-72 	FPS code

Explanations:

    * FPS name and FPS code are used by our data extraction software to
      generate default names for output files.
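
Because the TI line is column-oriented, it can be read by slicing. The
following is a minimal sketch, not part of the EPD software; the
function name is hypothetical, and the example line is rebuilt with
explicit padding so that the fields fall on the columns listed above.

```python
def parse_title_line(line: str) -> dict:
    """Split a TI line into its fixed-width fields (columns are 1-based)."""
    if line[0:2] != "TI":
        raise ValueError("not a TI line")
    return {
        "fps_name": line[5:15].strip(),   # columns 6-15
        "title": line[15:70].strip(),     # columns 16-70
        "fps_code": line[70:72].strip(),  # columns 71-72
    }


# "TI" plus three blanks, then a 10-column name, a 55-column title, the code.
example = (
    "TI   "
    + "EPD83".ljust(10)
    + "Eukaryotic Promoter Database / Release 83".ljust(55)
    + "EP"
)
fields = parse_title_line(example)
```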


      4.2. Promoter entries

An EPD entry contains the following types of information:

    * Promoter identification and description.
    * Machine-readable pointers to the transcription initiation site in
      corresponding sequence entries.
    * Description of the experimental evidence defining the
      transcription start site.
    * Various kinds of promoter classifications useful for extraction of
      biologically meaningful promoter subsets.
    * Information on regulatory properties.
    * Cross-references to other databases.
    * Bibliographic references.

Promoter entries are presented in a format similar to that of EMBL and
SWISS-PROT sequence entries. Each line starts with a line code
identifying the type of information presented. The current line types
and line codes, in the order in which they appear in an entry, are shown
below:

    ID  - IDentification.
    AC  - ACcession number(s).
    DT  - DaTe.
    DE  - DEscription.
    OS  - Organism Species.
    HG  - Homology Group.
    AP  - Alternative Promoter.
    NP  - Neighbouring Promoter.
    DR  - Database cross-References.
    RN  - Reference Number.
    RX  - Reference cross-references.
    RA  - Reference Authors.
    RT  - Reference Title.
    RL  - Reference Location.
    ME  - MEthods.
    SE  - SEquence.
    FL  - Full Length.
    IF  - Initiation Frequency.
    TX  - TaXonomy.
    KW  - KeyWords.
    FP  - Functional Position.
    DO  - DOcumentation.
    RF  - literature ReFerence.
    //  - Termination line.

Spacer lines (XX) are inserted in order to make the promoter database
easier to read by eye. Some line types occur many times in a single
entry. Each entry must begin with an identification line (ID) and end
with a terminator line (//). Text does not exceed column 72. Below is an
example of a promoter entry:

      ID   HS_MYC_2     standard; single; VRT.
      XX
      AC   EP11148;
      XX
      DT   ??-APR-1987 (Rel. 11, created)
      DT   07-MAR-2005 (Rel. 82, Last annotation update).
      XX
      DE   c-myc (cellular homologue of myelocytomatosis virus 29 oncogene),
      DE   promoter 2.
      OS   Homo sapiens (human)
      XX
      HG   Homology group 53; Mammalian c-myc proto-oncogene, promoter 2
      AP   Alternative promoter #2 of 2; exon 1; site 2; major promoter.
      NP   none.
      XX
      DR   GENOME; NT_008046.15; NT_008046; [-41966656, 15188617].
      DR   EPD; EP11146; HS_MYC_1; alternative promoter; [-162; +].
      DR   CLEANEX; HS_MYC.
      DR   EMBL; AC103819.3; [-87815, 60206].
      DR   EMBL; X00364.2; [-2489, 8507].
      DR   EMBL; D10493.1; [-2487, 5569].
      DR   EMBL; K01910.1; [-2451, 49].
      DR   EMBL; M16261.1; [-1843, 1048].
      DR   EMBL; J03253.1; [-1759, 461].
      DR   EMBL; L00057.1; [-810, 2795].
      DR   EMBL; K03015.1; [-555, 458].
      DR   EMBL; X00196.1; [-532, 2792].
      DR   EMBL; M12026.1; [-511, 678].
      DR   EMBL; K01708.1; [-410, 500].
      DR   EMBL; K00559.1; [-345, 1020].
      DR   EMBL; K02280.1; [-302, 178].
      DR   EMBL; K01909.1; [-266, 1365].
      DR   EMBL; S65124.1; [-266, 1023].
      DR   EMBL; M14206.1; [-266, 446].
      DR   EMBL; M20013.1; [-240, 982].
      DR   EMBL; AF111270.1; [-142, 264].
      DR   EMBL; K02275.1; [-96, 780].
      DR   EMBL; X00675.1; [-96, 404].
      DR   EMBL; K02277.1; [-96, 157].
      DR   SWISS-PROT; P01106; MYC_HUMAN.
      DR   TRANSFAC; R01157; HS$CMYC_01; [-211, -189]; by position.
      DR   TRANSFAC; R01158; HS$CMYC_02; [-168, -145]; by position.
      DR   TRANSFAC; R01804; HS$CMYC_04; [-300, -283]; by position.
      DR   TRANSFAC; R01851; HS$CMYC_05; [-65, -57]; by position.
      DR   TRANSFAC; R01852; HS$CMYC_06; [-42, -34]; by position.
      DR   TRANSFAC; R04076; HS$CMYC_12; [-251, -228]; by position.
      DR   TRANSFAC; R04076; HS$CMYC_12; [-252, -229]; by position.
      DR   TRANSFAC; R04076; HS$CMYC_12; [-253, -230]; by position.
      DR   TRANSFAC; R04621; HS$CMYC_17; [-313, -262]; by position.
      DR   TRANSFAC; R08503; HS$CMYC_18; [-50, -41]; by position.
      DR   TRANSFAC; R16688; HS$CMYC_24; [-7, 41]; by position.
      DR   TRANSFAC; R16689; HS$CMYC_25; [-7, 41]; by position.
      DR   TRANSFAC; R17051; HS$CMYC_30; [-510, -480]; by position.
      DR   TRANSFAC; R18503; HS$CMYC_31; [-185, -170]; by position.
      DR   TRANSFAC; R18504; HS$CMYC_32; [-153, -168]; by position.
      DR   RefSeq; NM_002467.
      DR   MIM; 190080.
      XX
      RN   [1]
      RX   MEDLINE; 84026482.
      RA   Battey J., Moulding C., Taub R., Murphy W., Stewart T., Potter H.,
      RA   Lenoir G., Leder P.;
      RT   "The human c-myc oncogene: structural consequences of
      RT   translocation into the IgH locus in Burkitt lymphoma";
      RL   Cell 34:779-787(1983).
      RN   [2]
      RX   MEDLINE; 84131953.
      RA   Bernard O.D., Cory S., Gerondakis S., Webb E., Adams J.M.;
      RT   "Sequence of the murine and human cellular myc oncogenes and two
      RT   modes of myc transcription resulting from chromosome translocation
      RT   in B lymphoid tumours";
      RL   EMBO J. 2:2375-2383(1983).
      RN   [3]
      RX   MEDLINE; 87257828.
      RA   Lipp M., Schilling R., Wiest S., Laux G., Bornkamm G.W.;
      RT   "Target sequences for cis-acting regulation within the dual
      RT   promoter of the human c-myc gene.";
      RL   Mol. Cell. Biol. 7:1393-1400(1987).
      RN   [4]
      RX   MEDLINE; 88038843.
      RA   Broome H.E., Reed J.C., Godillot E.P., Hoover R.G.;
      RT   "Differential promoter utilization by the c-myc gene in mitogen-
      RT   and interleukin-2-stimulated human lymphocytes.";
      RL   Mol. Cell. Biol. 7:2988-2993(1987).
      XX
      ME   Nuclease protection [1,4].
      ME   Nuclease protection; transfected or transformed cells [3].
      ME   Length measurement of an RNA product; low-precision data [1].
      XX
      SE   agggagggatcgcgctgagtataaaagccggttttcggggctttatctaACTCGCTGTAG
      XX
      TX   6. Vertebrate promoters
      TX   6.1. Chromosomal genes
      TX   6.1.5. Hormones, growth factors, regulatory proteins
      TX   6.1.5.16. Various cellular protooncogenes
      XX
      KW   Proto-oncogene, Nuclear protein, DNA-binding, Glycoprotein,
      KW   Transcription regulation.
      XX
      FP   Hs c-myc         P2+:+S  EU:NC_000008.9       1+ 128817660; 11148.053 010*2
      XX
      DO        Experimental evidence: 4,4#,2l
      DO        Expression/Regulation: +mitogen
      RF        Cell34:779     EMBOJ2:2375    MCB7:1393      MCB7:2988
      //
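
An entry laid out as above can be read line by line. The sketch below
groups the contents of each entry by line code. It assumes that the line
code occupies the first two columns, that the content starts in column
6, and that lines are not indented as they are in this manual. The
function name is hypothetical, and the group headings mentioned in
Section 4 are not handled.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def iter_entries(lines: Iterable[str]) -> Iterator[Dict[str, List[str]]]:
    """Yield one dict per entry, mapping each line code to its contents."""
    entry: Dict[str, List[str]] = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        code, content = line[:2], line[5:].strip()
        if code == "//":
            # The terminator closes the current entry.
            if entry:
                yield dict(entry)
            entry = defaultdict(list)
        elif code in ("XX", "TI") or not line.strip():
            continue  # spacer line, title line or blank line
        else:
            entry[code].append(content)


sample = [
    "ID   HS_MYC_2     standard; single; VRT.",
    "XX",
    "AC   EP11148;",
    "XX",
    "DE   c-myc (cellular homologue of myelocytomatosis virus 29 oncogene),",
    "DE   promoter 2.",
    "//",
]
entries = list(iter_entries(sample))
```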

A detailed description of each line type is given below.


        4.2.1. The ID line

The identification line is always the first line of an entry. The
general form of the ID line is:

ID   ENTRY_NAME data class; initiation site type; TAXONOMIC DIVISION.

    * /ENTRY_NAME/ is a unique entry identifier (e.g. "HS_MYC_2") which
      obeys rigorous naming conventions. It contains 2 or 3 fields
      separated by the `_' sign. The first field is the species
      identification code, at most 4 alphanumeric characters
      representing the biological source of the promoter. The second
      field identifies the gene, using the protein code of the
      SWISS-PROT ID (if available). For human EPD entries, the official
      gene symbol approved by the HUGO nomenclature committee
      <http://www.gene.ucl.ac.uk/nomenclature/> (if available) is used
      instead of the SWISS-PROT ID. The third field is optional; it is
      either a number which represents alternative promoters or a
      letter for promoters of duplicated genes.
    * The /data class/ field relates to the quality of the information:
      "standard" means that the information is complete and correct
      according to the standards laid down in this document;
      "preliminary" means that the entry has not yet undergone all
      quality checks necessary for being classified as "standard".
    * The /initiation site type/ is one of "single", "multiple" or
      "region", as defined in Section 3
      <http://www.epd.isb-sib.ch/current/usrman.html#ASSIGNMENT_OF_TRANSCRIPTION_INITIATION_SITE>.
    * /TAXONOMIC DIVISION/ is one of
          o PLN for plants
          o NEM for nematodes
          o ART for arthropods
          o MLS for molluscs
          o ECH for echinoderms
          o VRT for vertebrates.
      Note that these codes relate to the organism in which the promoter
      is expressed, not to the source organism in which the promoter is
      replicated as defined on the OS line. 

The ID line is terminated by a period.
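
As an illustration, the ID line can be taken apart as follows. This is a
sketch under the assumption that the gene field itself contains no `_'
sign; the function name and the dictionary keys are hypothetical.

```python
def parse_id_line(line: str) -> dict:
    """Parse 'ID   ENTRY_NAME data class; initiation site type; DIVISION.'"""
    if not line.startswith("ID"):
        raise ValueError("not an ID line")
    body = line[2:].strip().rstrip(".")
    entry_name, rest = body.split(None, 1)
    data_class, site_type, division = [f.strip() for f in rest.split(";")]
    name_fields = entry_name.split("_")
    return {
        "entry_name": entry_name,
        "species_code": name_fields[0],
        "gene": name_fields[1],
        # Optional third field: promoter number or duplicated-gene letter.
        "suffix": name_fields[2] if len(name_fields) > 2 else None,
        "data_class": data_class,
        "site_type": site_type,
        "division": division,
    }
```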


        4.2.2. The AC line

AC   EP11148;

The accession number consists of the character string "EP" followed by 5
digits representing the EMBL release number followed by the EPD entry
order. Most EPD entries currently have only one accession number. If
necessary, more than one AC will be used, separated by semicolons; the
list is terminated by a semicolon.


        4.2.3. The DT line

The date lines show the date of entry or last modification of the entry.

DT   DD-MMM-YEAR (Rel. XX, Comment)

where `DD' is the day, `MMM' the month, `YEAR' the year, and `XX' the
EPD release number. The comment portion of the line indicates the action
taken on that date.

    * The first DT line indicates when the entry first appeared in the
      database.
    * The second DT line indicates when the promoter data was last
      modified. It is terminated by a period.
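
A DT line can be matched with a regular expression. The sketch below
assumes the layout shown above and accepts "??" in place of the day, as
it appears in the first DT line of the example entry. The pattern and
function name are hypothetical.

```python
import re

DT_PATTERN = re.compile(
    r"^DT\s+(\S{2})-([A-Z]{3})-(\d{4}) \(Rel\. (\d+), (.+?)\)\.?$"
)


def parse_dt_line(line: str) -> dict:
    """Extract date parts, release number and comment from a DT line."""
    match = DT_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised DT line: {line!r}")
    day, month, year, release, comment = match.groups()
    return {
        "day": int(day) if day.isdigit() else None,  # None when given as "??"
        "month": month,
        "year": int(year),
        "release": int(release),
        "comment": comment,
    }
```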


        4.2.4. The DE line

DE   c-myc (cellular homologue of myelocytomatosis virus 29 oncogene),
DE   promoter 2.

The description lines contain general descriptive information about the
promoter. The description is given in ordinary English and is
free-format. It contains the SWISS-PROT gene names when known. In some
cases, more than one DE line is required; in this case, the text is
divided only between words. The last DE line is terminated by a period.


        4.2.5. The OS line

OS   Mus musculus (house mouse)

The species line specifies the source organism(s) of the promoter. The
species names are based on NCBI's taxonomy and thus can be automatically
hyperlinked to the NCBI's taxonomy web pages.


        4.2.6. The HG line

HG   Homology group 53; Mammalian c-myc proto-oncogene, promoter 2

The homology group <http://www.epd.isb-sib.ch/current/HG.html> line is
optional. It contains 2 fields: a homology group number that allows
identification of all sequence-wise similar promoters in EPD, and a
homology group name.


        4.2.7. The AP line

AP   Alternative promoter #2 of 2; 5' exon 1; site 2; major promoter.

The AP line is optional and provides information on alternative
promoters <http://www.epd.isb-sib.ch/current/AP.html> of the same gene
(for more details, see Section 4.3.1.). It contains 3 or 4 fields,
separated by semicolons, providing the following types of information:

      The fields consist of descriptive text followed by:

    * Two numbers indicating, respectively, the promoter's relative
      position along the gene, and the total number of alternative
      promoters of the gene. Promoters are numbered in the 5' to 3'
      direction starting with one.
    * A number referring to the exon preceded by the promoter. Note
      that multiple promoters may be associated with the same
      (3'-coterminal) exon or with different exons. Known exons are
      numbered in the 5' to 3' direction starting with one. Note that
      the nomenclature of 5'-exons in EPD may differ from the usage in
      the literature.
    * A number indicating the promoter's relative position among the
      subset of promoters preceding the same exon.
    * An optional keyword indicating major promoters.

The AP line is terminated by a period.
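
The numeric fields of an AP line can be extracted as sketched below. The
pattern is an assumption derived from the two AP lines shown in this
manual (with and without "5'" before "exon"); it is not an official
grammar, and the names are hypothetical.

```python
import re

AP_PATTERN = re.compile(
    r"#(\d+) of (\d+);\s*(?:5' )?exon (\d+);\s*site (\d+)(?:;\s*(.+?))?\.$"
)


def parse_ap_line(line: str) -> dict:
    """Extract promoter rank, promoter count, exon, site and keyword."""
    match = AP_PATTERN.search(line.strip())
    if match is None:
        raise ValueError(f"unrecognised AP line: {line!r}")
    rank, total, exon, site, keyword = match.groups()
    return {
        "rank": int(rank),    # position of this promoter along the gene
        "total": int(total),  # number of alternative promoters of the gene
        "exon": int(exon),
        "site": int(site),
        "keyword": keyword,   # e.g. "major promoter", or None if absent
    }
```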


        4.2.8. The NP line

NP   Neighbouring Promoter; EP23008; MM_H2B1; [-209; -].

The NP line is optional and provides information on promoters which are
physically closer to each other than 1000 bp. It contains 3 fields,
separated by semicolons, providing the following types of information:

    * The EPD accession number of the neighbouring promoter.
    * The EPD identifier of the neighbouring promoter.
    * The last field indicates, respectively, the position and the
      direction of the neighbouring promoter relative to the
      transcription initiation site given in the promoter entry.
          o Negative numbers indicate the upstream region of this entry
            and positive ones indicate the downstream region.
          o The sign indicates the transcription direction of the
            neighbouring promoter relative to the promoter entry:

            "+" means same direction
            "-" means opposite direction 


        4.2.9. The DR line

The DR lines contain cross-references to other EPD entries (if there are
alternative promoters of the same gene), or to entries from other
databases. So far, we have incorporated links to CLEANEX
<http://www.cleanex.isb-sib.ch/current/CleanEx_manual.html>, EMBL (3
<http://www.epd.isb-sib.ch/current/usrman.html#ref_3>), GenBank (4
<http://www.epd.isb-sib.ch/current/usrman.html#ref_4>), DDBJ (5
<http://www.epd.isb-sib.ch/current/usrman.html#ref_5>),  SWISS-PROT (6
<http://www.epd.isb-sib.ch/current/usrman.html#ref_6>), TRANSFAC (7
<http://www.epd.isb-sib.ch/current/usrman.html#ref_7>),  Flybase (8
<http://www.epd.isb-sib.ch/current/usrman.html#ref_8>), MIM (9
<http://www.epd.isb-sib.ch/current/usrman.html#ref_9>) and MGD (10
<http://www.epd.isb-sib.ch/current/usrman.html#ref_10>). The precise
format of these lines depends on the target database. Note that some
cross-references include numbers enclosed in square brackets indicating
the relative position of a linked sequence object, or keywords
characterising the nature of the relationship between the entries. For
instance, the ranges associated with cross-references to EMBL entries
define the extensions of the EMBL sequences relative to the initiation
site described by the EPD entry. The multiplicity of EMBL
cross-references in some entries mirrors the redundancy of the sequence
database. The first of these references corresponds to the longest
promoter region, except when the sequences have been removed from the
EMBL database but still exist in GenBank or DDBJ.
The format of the DR line is shown by the following example lines:

     DR   GENOME; NT_037436.1; NT_037436; [-14139754, 9212459].
     DR   EPD; EP11146; HS_MYC_1; alternative promoter; [-162; +].
     DR   EMBL; J00120.1; [-2489, 8507].
     DR   SWISS-PROT; P01106; MYC_HUMAN.
     DR   SPTREMBL; Q8IQL1.
     DR   FLYBASE; FBgn0013718; nuf.
     DR   TRANSFAC; R01804; HS$CMYC_04; [-300, -283]; by position.
     DR   MIM; 190080.
     DR   RefSeq; NM_003529.
     DR   MGD; MGI:88468; Cola2.
     DR   ENSEMBL; CG32140.
     DR   TRANSCRIPTOME; DMe000571.

Explanations (for detailed information go to Guidelines
<http://www.epd.isb-sib.ch/current/guidelines.html>):

    * The first item on the DR line is the abbreviated name of the data
      collection to which reference is made. The currently defined data
      bank identifiers are the following:

       
      GENOME          NCBI Reference Sequence (RefSeq) of genomic
                      sequence contigs
      EPD             Eukaryotic Promoter Database: alternative
                      promoters of the same gene
      CLEANEX         Gene expression database for human EPD promoters
      EMBL            Nucleotide sequence database of the EMBL
      SWISS-PROT      Protein sequence database
      SPTREMBL        Subset of the protein sequence database TrEMBL.
                      It contains the entries which should eventually
                      be incorporated into SWISS-PROT. SWISS-PROT
                      accession numbers have been assigned for all
                      SP-TrEMBL entries
      FLYBASE         Drosophila genome database
      TRANSFAC        Transcription factor (TF) database
      MIM             Mendelian Inheritance in Man Database
      RefSeq          Reference Sequence Database
      MGD             Mouse Genome Database
      ENSEMBL         Metazoan genome annotation
      TRANSCRIPTOME   Catalog of transcripts and their mapping onto
                      the genome (LICR Lausanne branch)
      TIGR            'gene identifiers' from the 'Rice Genome
                      Annotation' project at TIGR

    * The second item is the primary accession number (or an equivalent
      unique identifier of another data bank) of the entry to which
      reference is made.
    * The third item (if it exists) is a secondary identifier or name
      for the cross-referenced database entry.
    * The fourth item for EMBL and Transfac indicates the location and
      extension of the sequences given in these entries relative to the
      transcription initiation site given in the promoter entry.
      Negative numbers indicate the upstream region of this site and
      positive ones indicate the downstream part.
    * The fifth item
          o in the EPD line, indicates the position and the direction of
            the alternative promoter as it is defined for the
            neighbouring promoter in the NP
            <http://www.epd.isb-sib.ch/current/usrman.html#The_NP_line>
            line last field
          o in the TRANSFAC line, designates the criteria used to
            collect the TF entry:

            - by position: The TF binding site is situated between -500
            and +100, +1 being the transcription initiation site
            - by function: The TF binding site is known to regulate the
            corresponding promoter. 

/NB/: TRANSFAC cross-reference lines should not exceed the real number
of binding sites found in "TRANSFAC Site Table". Thus the position given
in this DR line is related to the longest EMBL entry common to both EPD
and TRANSFAC (version 6.3) databases.
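
When reading DR lines by program, note that the bracketed field of an
EPD cross-reference contains a semicolon of its own ("[-162; +]"), so a
plain split on ";" would break it apart. The sketch below splits only on
semicolons outside square brackets and converts a numeric range into a
pair of integers. Both function names are hypothetical.

```python
from typing import List, Tuple


def split_dr_fields(line: str) -> List[str]:
    """Split a DR line on semicolons, ignoring those inside [...]."""
    body = line[2:].strip()
    if body.endswith("."):
        body = body[:-1]  # drop the terminating period
    fields: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in body:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == ";" and depth == 0:
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    fields.append("".join(current).strip())
    return fields


def parse_range(text: str) -> Tuple[int, int]:
    """Turn a range such as '[-2489, 8507]' into (-2489, 8507)."""
    first, second = text.strip("[]").split(",")
    return int(first), int(second)
```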


        4.2.10. The RN, RX, RA, RT and RL lines

These lines comprise the literature citations within EPD. The citations
indicate the papers from which the data has been abstracted. The
reference lines for a given citation occur in a block, and are always in
the order RN, RX, RA, RT, RL. Within each such reference block the RN
line occurs once, the RX line occurs zero or more times, and the RA, RT
and RL lines each occur one or more times. If several references are
given, there will be a reference block for each. An example of a
complete reference is:

RN   [1]
RX   MEDLINE; 84026482.
RA   Battey J., Moulding C., Taub R., Murphy W., Stewart T., Potter H.,
RA   Lenoir G., Leder P.;
RT   "The human c-myc oncogene: structural consequences of
RT   translocation into the IgH locus in Burkitt lymphoma";
RL   Cell 34:779-787(1983).

The formats of the individual lines are explained below.


        4.2.10.1. The RN line

The RN line gives a sequential number to each reference citation in an
entry. This number is used to indicate the reference in the ME lines.


        4.2.10.2 The RX line

The RX line is an optional line which is used to indicate the identifier
assigned to a specific reference in PubMed (PMID, from the National
Library of Medicine (NLM)).


        4.2.10.3 The RA line

The RA lines list the authors of the paper (or other work) cited. The
authors are listed in the order given in the paper. The names are
listed surname first followed by a blank followed by initial(s) with
periods. The authors' names are separated by commas and terminated by a
semicolon. Author names are not split between lines. 
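
Since author names are never split between lines, the RA lines of one
reference can simply be joined and split on commas. A minimal sketch,
with a hypothetical function name:

```python
from typing import List


def parse_authors(ra_lines: List[str]) -> List[str]:
    """Return the author names from the RA lines of one reference block."""
    # Drop the "RA" line code, then join the lines into one string.
    text = " ".join(line[2:].strip() for line in ra_lines)
    text = text.rstrip(";")  # the author list ends with a semicolon
    return [name.strip() for name in text.split(",") if name.strip()]
```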


        4.2.10.4 The RT line

The RT lines contain the title of the reference citation.


        4.2.10.5 The RL line

The RL lines contain the conventional citation information for the
reference. In general, the RL lines alone are sufficient to find the
paper in question. It includes the journal abbreviation, the volume
number, the page range, and the year. Journal names are abbreviated
according to the conventions used by the National Library of Medicine
(NLM) and are based on the existing ISO and ANSI standards.
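
The components of an RL line can be recovered with a regular expression.
The pattern below is an assumption based on the RL lines in the example
entry (journal, volume, page range, year) and will not match citations
laid out differently. The names are hypothetical.

```python
import re

RL_PATTERN = re.compile(r"^RL\s+(.+?) (\d+):(\d+)-(\d+)\((\d{4})\)\.$")


def parse_rl_line(line: str) -> dict:
    """Split an RL line into journal, volume, page range and year."""
    match = RL_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised RL line: {line!r}")
    journal, volume, first_page, last_page, year = match.groups()
    return {
        "journal": journal,
        "volume": int(volume),
        "first_page": int(first_page),
        "last_page": int(last_page),
        "year": int(year),
    }
```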


        4.2.11. The ME line

The method lines describe experiments defining the transcription
initiation site. The format of the ME line is as follows:

ME   Method_description [; Qualifier...] [n,...].

A complete list of method descriptions is given in Section 4.3.2.
Qualifiers may indicate that an experimental gene transcription system
was used, that data are of low precision (less precise than +/- 5 bp),
or that the experiments were done with a closely related gene. The
number(s) enclosed in square brackets link the method descriptions to
the bibliographic references included in the promoter entry. The method
lines from the example are:

ME   Nuclease protection [1,4].
ME   Nuclease protection; transfected or transformed cells [3].
ME   Length measurement of an RNA product; low-precision data [1].
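
The three parts of an ME line (method description, optional qualifiers,
reference numbers) can be separated as sketched below. The pattern
follows the format given above; the function name and dictionary keys
are hypothetical.

```python
import re

ME_PATTERN = re.compile(r"^ME\s+(.+?)\s*\[([\d,\s]+)\]\.$")


def parse_me_line(line: str) -> dict:
    """Return method, qualifiers and reference numbers of an ME line."""
    match = ME_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised ME line: {line!r}")
    parts = [part.strip() for part in match.group(1).split(";")]
    references = [int(number) for number in match.group(2).split(",")]
    return {
        "method": parts[0],
        "qualifiers": parts[1:],   # empty list if no qualifier is given
        "references": references,  # RN numbers within the same entry
    }
```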


        4.2.12. The SE line

The sequence line shows a short sequence segment corresponding to the
-49 to +10 region of the promoter. Transcribed and untranscribed
nucleotides are represented by upper and lower case characters,
respectively. This line type is not meant to provide sequence data but
serves as a control string for sequence extraction.
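
The use of the SE line as a control string can be sketched in Python as
follows. The helper names are hypothetical, and the comparison is simply
case-insensitive string equality; the checks performed by the actual
extraction programs are not described in this manual.

```python
def split_se_sequence(se_sequence):
    """Return (untranscribed, transcribed) parts of an SE control string.

    Lower-case characters are taken as untranscribed (upstream) bases,
    upper-case characters as transcribed bases.
    """
    boundary = 0
    while boundary < len(se_sequence) and se_sequence[boundary].islower():
        boundary += 1
    upstream, transcribed = se_sequence[:boundary], se_sequence[boundary:]
    if any(base.islower() for base in transcribed):
        raise ValueError("lower-case base found downstream of the initiation site")
    return upstream, transcribed


def matches_control_string(se_sequence, extracted_sequence):
    """Case-insensitive comparison of an extracted segment with the SE string."""
    return se_sequence.upper() == extracted_sequence.upper()
```

For a -49 to +10 segment one would expect 49 lower-case characters
followed by 10 upper-case characters, assuming there is no position 0.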


        4.2.13. The FL line

The Full length line designates the large-scale cDNA sequencing projects:
NEDO (11 <http://www.epd.isb-sib.ch/current/usrman.html#ref_11>), MGC
(12 <http://www.epd.isb-sib.ch/current/usrman.html#ref_12>), and BDGP
(15 <http://www.epd.isb-sib.ch/current/usrman.html#ref_15>).


        4.2.13. The IF line

The Initiation Frequency lines reflect the frequency at which each
nucleotide within the initiation region is found at the 5' end of bona
fide full-length cDNA clone inserts.


        4.2.14. The TX line

The TX (TaXonomy) lines define a promoter's location within EPD's
hierarchical classification system (see Section 5). Note that starting
from release 72, the classification system
<http://www.epd.isb-sib.ch/current/usrman.html#CLASSIFICATION> is no
longer maintained.


        4.2.15. The KW line

The KW lines define a number of keywords
<http://www.epd.isb-sib.ch/current/keywords.html> describing an entry.


        4.2.16. The FP, DO and RF lines

These lines pertain to the old EPD format; see the next section.


        4.2.17. The // line

The // (terminator) line contains no data or comments. It designates the
end of an entry.


      4.3. Line types retained from the old format

The last six lines of an entry present essential information in the more
concise, old format. The original description of the old format follows.
Each entry starts with an FP line that contains a position reference to
a transcription initiation site, and ends with a terminator (//). Below
is an example of a promoter entry:

FP   Hs c-myc         P2+:+S  EU:NC_000008.9       1+ 128817660; 11148.053 010*2
XX
DO        Experimental evidence: 4,4#,<2>
DO        Expression/Regulation: +mitogen
RF        Cell34:779     EMBOJ2:2375    MCB7:1393      MCB7:2988
//


        4.3.1. The FP line

The FP line contains the following fields and subfields:

    columns      data type

    1- 2         "FP"
    3- 5         (blank)
    6-30         description:
      6-25         promoter name
      26-26        ":"
      27-27        independent subset status (see section 6
                   <http://www.epd.isb-sib.ch/current/usrman.html#HOMOLOGOUS_PROMOTERS>)
      28-28        type of initiation site (see section 3
                   <http://www.epd.isb-sib.ch/current/usrman.html#ASSIGNMENT_OF_TRANSCRIPTION_INITIATION_SITE>)
      29-30        (blank)
    31-63        functional position reference:
      31-51        sequence reference:
        31-32        genome db code
        33-33        ":"
        34-51        genome db entry accession number
      52-52        sequence type (0 = circular, 1 = linear)
      53-53        strand (+ or -)
      54-63        position number
    64-64        ";"
    65-70        entry code
    71-71        "."
    72-74        homology group number (see section 6
                 <http://www.epd.isb-sib.ch/current/usrman.html#HOMOLOGOUS_PROMOTERS>)
    75-75        (blank)
    76-80        alternative promoter identification code:
      76-78        gene number
      79-79        "*"
      80-80        initiation site number

Explanations:

    * The promoter name begins with a species code usually followed by a
      gene locus or gene product name. Species codes consist of the
      initials of genus and species name. Occasionally, three characters
      are required to generate unique codes. Standard abbreviations
      identify viruses. The full names of the organisms are given in
      appendix B.1. Subspecies or strains are specified in parentheses.
      Chromosomal locations (genetic or cytogenetic loci, genomic map
      units, etc.) may appear in square brackets immediately following
      species codes. Many gene products are referred to by abbreviations
      explained in appendix B.3. Alternative promoters are identified by
      a right-justified "P" and a digit indicating the corresponding
      initiation site, numbered sequentially from 5' to 3'. An optional
      "E" and digit refer to the corresponding 5' exons, if known.
      Identical numbers indicate 3' co-terminal exons. The strongest
      initiation site is marked by a trailing "+" if known (see also List
      of alternative promoters <http://www.epd.isb-sib.ch/current/AP.html>).
    * genome db codes currently used are 'EM' for EMBL database, and
      'EU' for genome contigs or chromosomal genome assemblies of the
      RefSeq database.
    * The EMBL accession number always relates to the first EMBL
      cross-reference. This one is usually the longest promoter region
      except when the entry is cancelled from the EMBL database, but
      still present in GenBank or DDBJ.
    * The sequence type indicates whether the sequence is circular or
      linear. A sequence comprising exactly one repeat unit of a tandem
      repeat cluster is also considered circular. Note that the
      annotation as circular or linear sequences in EPD is not always in
      agreement with the corresponding annotation in EMBL.
    * The entry code is a five-digit number which is the only part of a
      promoter entry that is stable from release to release.
    * Alternative promoter identification code: Genes represented by
      multiple promoter entries in EPD are assigned a promoter group
      number (the gene number). The corresponding initiation sites are
      numbered sequentially from 5' to 3'.
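
Because the FP line is a fixed-column record, it can be read by slicing
at the column positions listed above. The following Python sketch is a
minimal illustration; the field names and the function parse_fp_line are
chosen for this example only and are not defined by EPD.

```python
# 1-based, inclusive column ranges taken from the field table above.
# The dictionary keys are illustrative names, not official EPD terms.
FP_FIELDS = {
    "promoter_name": (6, 25),
    "independent_subset": (27, 27),
    "initiation_site_type": (28, 28),
    "genome_db_code": (31, 32),
    "accession": (34, 51),
    "sequence_type": (52, 52),
    "strand": (53, 53),
    "position": (54, 63),
    "entry_code": (65, 70),
    "homology_group": (72, 74),
    "gene_number": (76, 78),
    "initiation_site_number": (80, 80),
}


def parse_fp_line(line):
    """Slice an 80-column FP line into its documented fields."""
    if not line.startswith("FP"):
        raise ValueError("not an FP line")
    padded = line.rstrip("\n").ljust(80)
    record = {
        name: padded[start - 1:end].strip()
        for name, (start, end) in FP_FIELDS.items()
    }
    # The position number is numeric; leave it as None if the field is empty.
    record["position"] = int(record["position"]) if record["position"] else None
    return record
```

For the FP line of the example entry, this yields the accession
NC_000008.9, strand "+", position 128817660, entry code 11148, homology
group 053, gene number 010 and initiation site number 2, provided the
column alignment of the line is preserved exactly.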


        4.3.2. DO lines: Documentation

Documentation of promoter entries is presented on lines starting with
"DO". They are essentially free format and so far not processed by
specific programs. In the present release, there are two DO lines per
entry: the first refers to the transcript mapping experiments that
define the promoter, the second gives information about expression and
regulation. The various experimental techniques are identified by number
codes. The PubMed identifiers and "example" links given in parentheses
point, respectively, to the abstract and to the full-text article
describing a related experiment.

 
codes 	experiments
1 	Direct RNA sequencing (1634116
<http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=1634116>)

2 	Length measurement of an RNA product (1989694
<http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=1989694>)

3 	Nuclease protection : Length measurement of a nuclease-protected
complementary RNA or DNA fragment (2845126
<http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=2845126>)
(8294473
<http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=8294473>)

4 	RNA sequencing by primer extension : by dideoxy-terminated primer
extension (3396543
<http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=3396543>)

5 	Sequencing of a full-length cDNA (8294473
<http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=8294473>)

6 	Primer extension : Length measurement of a primer extension product 
(10187799
<http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=10187799>
, example <http://www.jbc.org/cgi/content/full/274/15/10154/F3>)
(9880555
<http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=9880555>
, example <http://www.jbc.org/cgi/content/full/274/3/1736/F2>)
7 	DNA sequencing of a full-length processed pseudogene (3584116
<http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=3584116>)

8 	Reverse direction primer extension with homologous sequence ladder :
Length measurement of an in vitro synthesised DNA primed upstream of the
initiation site and blocked by the 5'end of the RNA hybridized to the
template (2451027
<http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=2451027>)

9 	Rapid amplification of cDNA ends (RACE) (9116864
<http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=9116864&dopt=Books>)

10 	RNA sequencing, type not specified
11 	Oligo-capping : artificial capping of mRNA followed by sequencing of
the 5' end of cDNA (11375929
<http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=11375929&dopt=Books>,
11337467
<http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=11337467&dopt=Books>
and examples)

12 	Mammalian gene collection (MGC) full-length cDNA cloning (10521335
<http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=10521335&dopt=Books>
and example)

13 	5' end confirmed by alignment of first 100 downstream nucleotides to
EST database.
14 	Oligo-capping: Berkeley Drosophila Genome Project (12537569
<http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12537569>)

15 	Oligo-capping: Rice full-length cDNA cloning (12869764
<http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12869764&query_hl=2>)


Special characters appended to the number codes designate an
experimental gene expression system where the RNA for the corresponding
experiments was synthesized.
 
* 	RNA POL II in vitro system
o 	injected amphibian oocytes
# 	transfected or transformed cells, injected neurons
! 	transgenic organisms

	
r 	experiments performed with a closely related gene
h 	homologous sequence ladder used for length measurement of a nuclease
protection or primer extension product
l 	low-precision data (error > +/- 5 bp)

Explanations and additional conventions:

    * The full-length assumption of a cDNA clone or a processed
      pseudogene is based on consistency with accompanying
      nuclease-protection or primer extension data or, alternatively,
      the existence of multiple 5' co-terminal clones or pseudogenes.
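
The number codes and appended special characters can be decoded
mechanically. The Python sketch below is an illustration only: the
function decode_evidence and the shortened method names are assumptions
of this example, and tokens that do not follow the "number plus special
characters" pattern (such as "<2>" in the example entry, whose notation
is not explained in this section) are returned undecoded.

```python
import re

# Method names abbreviated from the code table above.
METHODS = {
    1: "Direct RNA sequencing",
    2: "Length measurement of an RNA product",
    3: "Nuclease protection",
    4: "RNA sequencing by primer extension",
    5: "Sequencing of a full-length cDNA",
    6: "Primer extension",
    7: "DNA sequencing of a full-length processed pseudogene",
    8: "Reverse direction primer extension with homologous sequence ladder",
    9: "Rapid amplification of cDNA ends (RACE)",
    10: "RNA sequencing, type not specified",
    11: "Oligo-capping",
    12: "Mammalian gene collection (MGC) full-length cDNA cloning",
    13: "5' end confirmed by alignment to EST database",
    14: "Oligo-capping: Berkeley Drosophila Genome Project",
    15: "Oligo-capping: Rice full-length cDNA cloning",
}

QUALIFIERS = {
    "*": "RNA POL II in vitro system",
    "o": "injected amphibian oocytes",
    "#": "transfected or transformed cells, injected neurons",
    "!": "transgenic organisms",
    "r": "experiments performed with closely related gene",
    "h": "homologous sequence ladder used for length measurement",
    "l": "low-precision data (error > +/- 5 bp)",
}

TOKEN = re.compile(r"^(\d+)([*o#!rhl]*)$")


def decode_evidence(field):
    """Decode a comma-separated list of evidence codes.

    Returns one (token, method, qualifiers) tuple per token; method is
    None when the token cannot be interpreted by this sketch.
    """
    decoded = []
    for token in (t.strip() for t in field.split(",")):
        match = TOKEN.match(token)
        if match is None or int(match.group(1)) not in METHODS:
            decoded.append((token, None, []))
            continue
        qualifiers = [QUALIFIERS[c] for c in match.group(2)]
        decoded.append((token, METHODS[int(match.group(1))], qualifiers))
    return decoded
```

For the field "4,4#,<2>" this gives two decoded tokens (code 4, once
without and once with the "#" qualifier) and one undecoded token.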

The information on expression/regulation may include indication of
developmental stages, tissues, cell types, cell cycle stages, and
various regulatory features. Conventions:

    * A semicolon delimits the two fields: expression and regulation.
    * A comma delimits alternative keywords (e.g. liver, kidney).
    * "+" means "induced by" or "strongly expressed in".
    * "-" means "repressed by" or "weakly expressed in".
    * "~" means "modulated by".
    * Cell cycle stages are given in square brackets.


        4.3.3. RF line: Literature references

The first four references from the RN, RX, RA, RT and RL lines are
repeated in a highly condensed form. Each reference occupies a field of
15 characters and indicates the journal, volume, and starting page of
the cited article (at most 14 characters). The journal codes are
explained in Appendix B.2.


The references primarily point to the articles in which the experimental
promoter evidence is presented. Additional potential subjects are
homology to other promoters, gene expression and regulation, and
nomenclature. Papers containing only sequence data are usually not cited
because they are easy to find via the corresponding EMBL sequence entry
descriptions.
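
A condensed RF line can be taken apart as sketched below in Python. Two
points are assumptions of this example rather than statements of the
manual: that the reference data start in column 11 (as in the example
entry), and that the volume is the last run of digits before the colon.
The function name parse_rf_line is hypothetical.

```python
import re

# Assumed layout of one condensed reference: journal code, volume, ":", page.
REFERENCE = re.compile(r"^(?P<journal>.+?)(?P<volume>\d+):(?P<page>\d+)$")


def parse_rf_line(line, data_offset=10, width=15):
    """Split an RF line into (journal, volume, page) tuples.

    data_offset is the number of characters before the first reference
    (assumed to be 10); width is the documented 15-character spacing.
    """
    if not line.startswith("RF"):
        raise ValueError("not an RF line")
    data = line.rstrip("\n")[data_offset:]
    references = []
    for start in range(0, len(data), width):
        token = data[start:start + width].strip()
        if not token:
            continue
        match = REFERENCE.match(token)
        if match is None:
            # Keep tokens that do not fit the assumed layout unchanged.
            references.append((token, None, None))
        else:
            references.append(
                (match.group("journal"), int(match.group("volume")),
                 int(match.group("page")))
            )
    return references
```

With the RF line of the example entry, the first reference would be
read as journal code "Cell", volume 34, starting page 779.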


        4.3.4. Miscellaneous

    * Greek letters are sometimes represented by the corresponding Latin
      letters followed by an apostrophe:

       
      a' = alpha 	b' = beta 	g' = gamma 	d' = delta 	e' = epsilon
      z' = zeta 	h' = eta 	th'= theta 	k' = kappa 	l' = lambda
      n' = nu 	r' = rho

    * Sub- and superscripts are sometimes indicated by preceding "_" and
      "^", respectively.


      4.4. Distinct format of 'preliminary' entries in epd_bulk.dat


        4.4.1. The title line:

TI   epd83     Bulk Section Eukaryotic Promoter Database / Release 83 EP


        4.4.2. The ID line

The identification line is always the first line of an entry. The form
of the ID line in 'epd_bulk.dat' is:

ID   OS_bAAAA     preliminary; undefined; TAXONOMIC DIVISION.

    * A unique entry identifier "OS_bAAAA" is constructed using the
      species identification code ('OS') with at most 4 alphanumeric
      characters representing the biological source of the promoter and
      a 'b' (for bulk) followed by an arbitrary 4-letter code.
    * "preliminary" /data class/ field indicates that the entry has not
      (yet) undergone all quality checks necessary for being classified
      as "standard".
    * "undefined" as /initiation site type/ due to insufficient data to
      define transcription initiation patterns (Section 3
      <http://www.epd.isb-sib.ch/current/usrman.html#ASSIGNMENT_OF_TRANSCRIPTION_INITIATION_SITE>).
    * /TAXONOMIC DIVISION/ codes are
          o PLN for plant
          o NEM for nematode
          o ART for arthropod
          o MLS for mollusc
          o ECH for echinoderm
          o VRT for vertebrates.
      Note that these codes relate to the organism in which the promoter
      is expressed, not to the source organism in which the promoter is
      replicated as defined on the OS line. 

The ID line is terminated by a period.
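
The structure of the bulk ID line can be checked with a regular
expression, as in the following Python sketch. The pattern is an
assumption derived from the description above (in particular, that the
4-character code after 'b' is alphanumeric), and the identifier used in
the usage note is invented for illustration.

```python
import re

DIVISIONS = {
    "PLN": "plant",
    "NEM": "nematode",
    "ART": "arthropod",
    "MLS": "mollusc",
    "ECH": "echinoderm",
    "VRT": "vertebrates",
}

# Assumed pattern for: ID   OS_bAAAA     preliminary; undefined; DIVISION.
BULK_ID = re.compile(
    r"^ID\s+(?P<species>[A-Za-z0-9]{1,4})_b(?P<code>[A-Za-z0-9]{4})\s+"
    r"(?P<data_class>\w+);\s*(?P<site_type>\w+);\s*(?P<division>[A-Z]{3})\.\s*$"
)


def parse_bulk_id_line(line):
    """Parse an ID line from epd_bulk.dat into its documented parts."""
    match = BULK_ID.match(line)
    if match is None:
        raise ValueError(f"not a recognisable bulk ID line: {line!r}")
    record = match.groupdict()
    if record["division"] not in DIVISIONS:
        raise ValueError(f"unknown taxonomic division: {record['division']}")
    record["division_name"] = DIVISIONS[record["division"]]
    return record
```

For a hypothetical line "ID   HS_bABCD     preliminary; undefined; VRT."
the sketch returns species code "HS", code "ABCD", data class
"preliminary", initiation site type "undefined" and division "VRT".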


        4.4.3. The AC line

AC   EP00001;

The accession number consists of the character string "EP" followed by 5
digits. Previously the first two digits of the AC designated the release
number of initial appearance of the specific entry followed by the EPD
entry order. AC numbers in 'epd_bulk.dat' are continuous numbers,
excluding ACs already used for entries in the main file 'epd.dat'.


    5 CLASSIFICATION

*Starting from release 72, the classification system is no longer
maintained. New entries are presently added by default to an
'?Unclassified' category. The classification system might still provide
valuable information for entries added before release 72. However for
any category, consider the possible existence of additional, potentially
corresponding EPD entries in the default categories.*


/The entries of the Eukaryotic Promoter Database are embedded in a
hierarchical classification
<http://www.epd.isb-sib.ch/current/epd_classif.html> system. A
promoter's taxonomic location is made clear by interspersed group
headings. The example shown below is taken from the top of the database.
A contrasting format has been chosen to emphasize the very different
nature of this information./

/*----------------------------------------------------------------------*
*    1. Plant promoters                                                *
*----------------------------------------------------------------------*
*    1.1. Chromosomal genes                                            *
*----------------------------------------------------------------------*
*    1.1.1. Small nuclear RNAs                                         *
*----------------------------------------------------------------------*/

/A group heading consists of a series of node numbers and a title. The
highest classification level distinguishes between promoters active in
major eukaryotic taxa (phyla). Further below, grouping considers
replicon type and functional properties of gene products. On the lowest
level, homology (as defined in section 6) is the criterion. A survey of
the upper part of the classification pyramid is presented in appendix
A. The proposed classification system has a highly tentative character,
as it is often unclear how a new promoter should be classified,
especially if the gene product is a multifunctional protein. Users
should therefore not be surprised or discouraged if they don't find a
promoter at the initially expected place./


    6 HOMOLOGOUS PROMOTERS

Homology is defined as sequence similarity due to common phylogenetic
origin. In EPD, two promoters are considered homologous if they exhibit
>=50% sequence similarity between -79 and +20. Similarity is calculated
from optimal alignments generated with the aid of the UWGCG subroutine
ShiftAlign (13 <http://www.epd.isb-sib.ch/current/usrman.html#ref_13>)
using the following symbol comparison table:

 

        A     C     G     N     T

   A   1.0   0.0   0.0   0.5   0.0
   C         1.0   0.0   0.5   0.0
   G               1.0   0.5   0.0
   N                     0.5   0.5
   T                           1.0

Gap weight and gap length weight are specified as 3 and 0, respectively.
Terminal gaps are ignored. Percent similarity is understood as alignment
score divided by segment length, times 100. Groups of homologous
promoters are identified by homology group numbers (see 4.3.1.).
Definition of these groups is based on similarity scores as defined
above and a tree generation method called UPGMA (14
<http://www.epd.isb-sib.ch/current/usrman.html#ref_14>). In a few cases,
similarities between 50% and 56% were ignored if the protein sequences
of the corresponding genes were not related. Similarities were also
ignored between alternative promoter sequences that are spaced by less
than 50 bp. A subset of "independent" promoters is marked by "+" in
column 27 of the FP line. This set contains only one member per homology
group (usually, the promoter with the longest upstream sequence
available) and is intended to be used for statistical analysis of
functional patterns where it is important to avoid bias by multiples of
closely related sequences.
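
The percent similarity defined above can be illustrated with a short
Python sketch that scores an alignment which has already been computed.
It does not reproduce ShiftAlign; how that program applies the gap
weight internally is not described here, so charging the gap weight once
per gap and the gap length weight once per gapped column is an
assumption of this example, as are the function names.

```python
def symbol_score(a, b):
    """Score a pair of symbols according to the comparison table above."""
    a, b = a.upper(), b.upper()
    if "N" in (a, b):
        return 0.5
    return 1.0 if a == b else 0.0


def percent_similarity(aligned_a, aligned_b, segment_length,
                       gap_weight=3.0, gap_length_weight=0.0):
    """Alignment score divided by segment length, times 100.

    aligned_a and aligned_b are equally long strings with "-" for gaps.
    Terminal gaps are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aligned_a, aligned_b))
    # Ignore terminal gaps at both ends of the alignment.
    while columns and "-" in columns[0]:
        columns.pop(0)
    while columns and "-" in columns[-1]:
        columns.pop()
    score = 0.0
    previous_gap_in = None  # which sequence carried a gap in the last column
    for a, b in columns:
        if a == "-" or b == "-":
            gap_in = 0 if a == "-" else 1
            if gap_in != previous_gap_in:
                score -= gap_weight  # charged once per gap
            score -= gap_length_weight  # charged per gapped column
            previous_gap_in = gap_in
        else:
            score += symbol_score(a, b)
            previous_gap_in = None
    return 100.0 * score / segment_length
```

For the region from -79 to +20 the segment length would be 99, assuming
there is no position 0. Two promoters would then count as homologous if
the returned value is at least 50.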


    7 PROMOTER SEQUENCE RETRIEVAL

Promoter sequence listings have not been incorporated into EPD for two
reasons: (i) to avoid duplication of data already existing elsewhere in
the EMBL data library, and (ii) to encourage usage of FPS-dependent
sequence retrieval programs, which enable users to specify suitable
5' and 3' boundaries of the requested sequence segments themselves.
Effort is under way to motivate producers of standard nucleotide
sequence analysis packages to provide such tools in the future. In the
meantime, users with some programming experience will find it easy to
write their own routines. Our local sequence extraction programs run in
a UWGCG environment
(13 <http://www.epd.isb-sib.ch/current/usrman.html#ref_13>)
and have been implemented at several sites in Europe and the United
States. They are documented and freely available on request.
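
A user-written extraction routine of the kind mentioned above might look
like the following Python sketch. It is not the local UWGCG-based
program; the function name, the argument names and the handling of
coordinates (1-based positions, no position 0, linear sequences only)
are assumptions made for this illustration.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def extract_promoter(sequence, position, strand, start=-49, end=10):
    """Extract the segment from `start` to `end` relative to an initiation site.

    position is the 1-based coordinate of the initiation site (+1) on the
    forward strand of a linear sequence; strand is "+" or "-". Upstream
    bases are returned in lower case and transcribed bases in upper case.
    """
    if start == 0 or end == 0 or start > end:
        raise ValueError("start and end must be non-zero with start <= end")

    def offset(relative):
        # Distance in bases from the initiation site (there is no position 0).
        return relative - 1 if relative > 0 else relative

    if strand == "+":
        first, last = position + offset(start), position + offset(end)
    elif strand == "-":
        first, last = position - offset(end), position - offset(start)
    else:
        raise ValueError("strand must be '+' or '-'")
    if first < 1 or last > len(sequence):
        raise ValueError("requested segment extends beyond the sequence")

    segment = sequence[first - 1:last]
    if strand == "-":
        segment = reverse_complement(segment)
    n_upstream = min(end, -1) - start + 1 if start < 0 else 0
    return segment[:n_upstream].lower() + segment[n_upstream:].upper()
```

With the default boundaries (-49 to +10) the result can be compared,
ignoring case, with the control string on the SE line of the entry.
Circular sequences (sequence type 0) would need additional wrap-around
handling that is not shown here.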


    8 REFERENCES

   1. Bucher, P. & Trifonov, E.N., /Compilation and analysis of
      eukaryotic POL II promoter sequences/, Nucl. Acids Res. *14*,
      10009-10026 (1986). (3808945
      <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&dopt=Citation&list_uids=3808945>)

   2. Bucher, P. & Bryan, B., /Signal search analysis: a new method to
      localize and characterize functionally important DNA sequences/,
      Nucl. Acids Res. *12*, 287-305 (1984). (84118736
      <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&dopt=Citation&list_uids=84118736>)

   3. Stoesser, G., Tuli, M.A., Lopez, R. and Sterk, P., /The EMBL
      nucleotide sequence database/, Nucleic Acids Res., *27*, 18-24
      (1999). (99063644
      <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&dopt=Citation&list_uids=99063644>)

   4. Benson, D.A., Boguski, M.S., Lipman, D.J., Ostell, J., Ouellette,
      B.F.F., Rapp, B.A. and Wheeler, D.L., /GenBank/, Nucleic Acids
      Res., *27*, 12-17 (1999). (99063643
      <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&dopt=Citation&list_uids=99063643>)

   5. Sugawara, H., Miyazaki, S., Gojobori, T. and Tateno, Y., /DNA Data
      Bank of Japan dealing with large-scale data submission/, Nucleic
      Acids Res., *27*, 25-28 (1999). (99063645
      <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&dopt=Citation&list_uids=99063645>)

   6. Bairoch, A. and Apweiler, R., /The SWISS-PROT protein sequence
      data bank and its supplement TrEMBL in 1999/, Nucleic Acids Res.,
      *27*, 49-54 (1999). (99063650
      <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&dopt=Citation&list_uids=99063650>)

   7. Heinemeyer, T., Chen, X., Karas, H., Kel, A.E., Kel, O.V.,
      Liebich, I., Meinhardt, T., Reuter, I., Schacherer, F. and
      Wingender, E., /Expanding the TRANSFAC database towards an expert
      system of regulatory molecular mechanisms/, Nucleic Acids Res.,
      *27*, 318-322 (1999). (99063727
      <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&dopt=Citation&list_uids=99063727>)

   8. The FlyBase consortium, /The FlyBase database of the Drosophila
      genome projects and community literature/, Nucleic Acids Res.,
      *27*, 85-88 (1999). (99063659
      <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&dopt=Citation&list_uids=99063659>)

   9. Pearson, P., Francomano, C., Foster, P., Bocchini, C., Li, P. and
      McKusick, V., /The status of online Mendelian inheritance in man
      (OMIM) medio 1994, /Nucleic Acids Res., *22*, 3470-3473 (1994).
      (95023074
      <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&dopt=Citation&list_uids=95023074>)

  10. Blake, J.A., Richardson, J.E., Davisson, M.T., Eppig, J.T. and the
      Mouse Genome Database Group, /The Mouse Genome Database (MGD):
      genetic and genomic information about the laboratory mouse/,
      Nucleic Acids Res., *27*, 95-98 (1999). (99063661
      <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&dopt=Citation&list_uids=99063661>)

  11. Suzuki Y., Yamashita R., Nakai K., Sugano S., /DBTSS: database of
      human transcriptional start sites and full-length cDNAs. /Nucleic
      Acids Res. *30*(1):328-331(2002). (11752328
      <http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=11752328&dopt=Books>)

  12. Strausberg, R.L., Feingold, E.A., Klausner, R.D., Collins, F.S.,
      /The Mammalian Gene Collection/, Science, *286*, 455-457 (1999).
      (10521335
      <http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=10521335&dopt=Books>)

  13. Devereux, J., Haeberli, P. & Smithies, O., /A comprehensive set of
      sequence analysis programs for the VAX/, Nucl. Acids Res. *12*,
      387-395 (1984). (84118744
      <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&dopt=Citation&list_uids=84118744>)

  14. Sneath, P.H.A. & Sokal, R.R., /Numerical taxonomy/, W.H. Freeman,
      San Francisco, London (1973).

  15. Stapleton M., Liao GC., Brokstein P., Hong L., Carninci P.,
      Shiraki T., Hayashizaki Y., Champe M., Pacleb J., Wan K., Yu C.,
      Carlson J., George R., Celniker S., and Rubin GM., /The Drosophila
      Gene Collection: Identification of Putative Full-Length cDNAs for
      70% of D. melanogaster Genes. /Genome Res., *12*:1294-1300 (2002).
      (12176937
      <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&dopt=Citation&list_uids=12176937>)

  16. Schmid C.D., Praz V., Delorenzi M., Périer R., and Bucher P., The
      Eukaryotic Promoter Database EPD: the impact of in silico primer
      extension. Nucleic Acids Res. *32,* D82-5 (2004). (14681364
      <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&dopt=Citation&list_uids=14681364>)

        


    A.  APPENDIX A : SURVEY OF RELEASE
    <http://www.epd.isb-sib.ch/current/SURVEY.html>


    B.  APPENDIX B : CODES AND ABBREVIATIONS


      B.1. SPECIES CODES


 

*Code* 	/Scientific name/ (English name)
AAV2 	/Adeno-associated virus 2
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Adeno-associated+virus&lvl=0&srchmode=1>/

Ac 	/Aplysia californica
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Aplysia+californica&lvl=0&srchmode=1>/
(California sea hare)
AcNPV 	/Autographa californica nuclear polyhedrosis virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Autographa+californica+nuclear+polyhedrosis+virus&lvl=0&srchmode=1>/

Ad2 	/Human adenovirus type 2
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Human+adenovirus+type+2&lvl=0&srchmode=1>/

Ad5 	/Human adenovirus type 5
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Human+adenovirus+type+5&lvl=0&srchmode=1>/

Ad7 	/Human adenovirus type 7
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Human+adenovirus+type+7&lvl=0&srchmode=1>/

Ad12 	/Human adenovirus type 12
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Human+adenovirus+type+12&lvl=0&srchmode=1>/

Ag 	/Ateles geoffroyi
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Ateles+geoffroyi&lvl=0&srchmode=1>/
(black-handed spider monkey)
ALV 	/Avian leukosis virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Avian+leukosis+virus&lvl=0&srchmode=1>/

Am 	/Antirrhinum majus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Antirrhinum+majus&lvl=0&srchmode=1>/
(snapdragon)
Ab-MLV 	/Abelson murine leukemia virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Abelson+murine+leukemia+virus&lvl=0&srchmode=1>/

Apo 	/Antheraea polyphemus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Antheraea+polyphemus&lvl=0&srchmode=1>/
(polyphemus moth)
Ap 	/Anas platyrhynchos
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Anas+platyrhynchos&lvl=0&srchmode=1>/
(mallard, domestic duck)
As 	/Avena sativa
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Avena+sativa&lvl=0&srchmode=1>/
(oat)
At 	/Agrobacterium tumefaciens
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Agrobacterium+tumefaciens&lvl=0&srchmode=1>/

Ath 	/Arabidopsis thaliana
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1>/
(thale cress)
Atr 	/Aotus trivirgatus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Aotus+trivirgatus&lvl=0&srchmode=1>/
(douroucouli)
Ay 	/Antheraea yamamai
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Antheraea+yamamai&lvl=0&srchmode=1>/

B19 	/Human parvovirus B19
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Human+parvovirus+B19&lvl=0&srchmode=1>/

Be 	/Bertholletia excelsa
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Bertholletia+excelsa&lvl=0&srchmode=1>/
(Brazil nut)
BKV 	/Papovavirus BKV
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Papovavirus+BKV&lvl=0&srchmode=1>/

BLV 	/Bovine leukemia virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Bovine+leukemia+virus&lvl=0&srchmode=1>/

Bm 	/Bombyx mori
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Bombyx+mori&lvl=0&srchmode=1>/
(silkworm)
Bn 	/Brassica napus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Brassica+napus&lvl=0&srchmode=1>/
(rape)
BPV1 	/Bovine papillomavirus type 1
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Bovine+papillomavirus+type+1&lvl=0&srchmode=1>/

Bt 	/Bos taurus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Bos+taurus&lvl=0&srchmode=1>/
(cattle)
CaMV 	/Cauliflower mosaic virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Cauliflower+mosaic+virus&lvl=0&srchmode=1>/

Cco 	/Coturnix coturnix
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Coturnix+coturnix&lvl=0&srchmode=1>/
(quail)
Ce 	/Caenorhabditis elegans
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Caenorhabditis+elegans&lvl=0&srchmode=1>/

Cg 	/Canavalia gladiata
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Canavalia+gladiata&lvl=0&srchmode=1>/
(sword bean)
Cgr 	/Cricetulus griseus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Cricetulus+griseus&lvl=0&srchmode=1>/
(Chinese hamster)
Ch 	/Capra hircus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Capra+hircus&lvl=0&srchmode=1>/
(goat)
Cl 	/Canis lupus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Canis+lupus&lvl=0&srchmode=1>/
(gray wolf)
Cm 	/Cairina moschata
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Cairina+moschata&lvl=0&srchmode=1>/
(muscovy duck)
Cp 	/Cavia porcellus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Cavia+porcellus&lvl=0&srchmode=1>/
(domestic guinea pig)
Cpe 	/Cucurbita pepo
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Cucurbita+pepo&lvl=0&srchmode=1>/
(zucchini)
Ct 	/Chironomus thummi
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Chironomus+thummi&lvl=0&srchmode=1>/
(midge)
Cte 	/Chironomus tentans
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Chironomus+tentans&lvl=0&srchmode=1>/

Dc 	/Daucus carota
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Daucus+carota&lvl=0&srchmode=1>/
(carrot)
Df 	/Drosophila funebris
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Drosophila+funebris&lvl=0&srchmode=1>/
(fruit fly)
Dh 	/Drosophila hydei
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Drosophila+hydei&lvl=0&srchmode=1>/
(fruit fly)
DHBV 	/Duck hepatitis B virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Duck+hepatitis+B+virus&lvl=0&srchmode=1>/

Dm 	/Drosophila melanogaster
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Drosophila+melanogaster&lvl=0&srchmode=1>/
(fruit fly)
Dma 	/Drosophila mauritiana
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Drosophila+mauritiana&lvl=0&srchmode=1>/
(fruit fly)
Dmo 	/Drosophila mojavensis
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Drosophila+mojavensis&lvl=0&srchmode=1>/
(fruit fly)
Dmu 	/Drosophila mulleri
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Drosophila+mulleri&lvl=0&srchmode=1>/
(fruit fly)
Do 	/Drosophila orena
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Drosophila+orena&lvl=0&srchmode=1>/
(fruit fly)
Dp 	/Drosophila pseudoobscura
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Drosophila+pseudoobscura&lvl=0&srchmode=1>/
(fruit fly)
Ds 	/Drosophila simulans
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Drosophila+simulans&lvl=0&srchmode=1>/
(fruit fly)
Dse 	/Drosophila sechellia
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Drosophila+sechellia&lvl=0&srchmode=1>/
(fruit fly)
Dv 	/Drosophila virilis
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Drosophila+virilis&lvl=0&srchmode=1>/
(fruit fly)
EBV 	/Human herpesvirus 4
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Human+herpesvirus+4&lvl=0&srchmode=1>/
(Epstein-Barr virus)
Ec 	/Equus caballus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Equus+caballus&lvl=0&srchmode=1>/
(horse)
FBJ-MSV 	/Murine osteosarcoma virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Murine+osteosarcoma+virus&lvl=0&srchmode=1>/
(Finkel-Biskis-Jinkins)
FBR-MSV 	/Murine osteosarcoma virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Murine+osteosarcoma+virus&lvl=0&srchmode=1>/
(Finkel-Biskis-Reilly)
F-MCF 	/Friend mink cell focus-forming virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Friend+mink+cell+focus-forming+virus&lvl=0&srchmode=1>/
(Murine)
Fs 	/Felis silvestris
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Felis+silvestris&lvl=0&srchmode=1>/
(wild cat)
F-SFFV 	/Friend spleen focus-forming virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Friend+spleen+focus-forming+virus&lvl=0&srchmode=1>/

Ft 	/Flaveria trinervia
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Flaveria+trinervia&lvl=0&srchmode=1>/

GA-FeLV 	/Gardner-Arnstein feline leukemia oncovirus B
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Gardner-Arnstein+feline+leukemia+oncovirus+B&lvl=0&srchmode=1>/

GALV 	/Gibbon ape leukemia virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Gibbon+ape+leukemia+virus&lvl=0&srchmode=1>/

Gg 	/Gallus gallus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Gallus+gallus&lvl=0&srchmode=1>/
(chicken)
Ggo 	/Gorilla gorilla
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Gorilla+gorilla&lvl=0&srchmode=1>/
(gorilla)
Gm 	/Glycine max
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Glycine+max&lvl=0&srchmode=1>/
(soybean)
GSHV 	/Ground squirrel hepatitis virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Ground+squirrel+hepatitis+virus&lvl=0&srchmode=1>/

H-1 	/Parvovirus H1
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Parvovirus+H1&lvl=0&srchmode=1>/
(Murine)
Ha 	/Helianthus annuus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Helianthus+annuus&lvl=0&srchmode=1>/
(common sunflower)
Hb 	/Hevea brasiliensis
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Hevea+brasiliensis&lvl=0&srchmode=1>/
(para rubber tree)
HBV 	/Human hepatitis B virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Human+hepatitis+B+virus&lvl=0&srchmode=1>/

HCMV 	/Human cytomegalovirus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Human+cytomegalovirus&lvl=0&srchmode=1>/

Hg 	/Halichoerus grypus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Halichoerus+grypus&lvl=0&srchmode=1>/
(grey seal)
HIV-1 	/Human immunodeficiency virus type 1
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Human+immunodeficiency+virus+type+1&lvl=0&srchmode=1>/

HIV-2 	/Human immunodeficiency virus type 2
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Human+immunodeficiency+virus+type+2&lvl=0&srchmode=1>/

HPV16 	/Human papillomavirus type 16
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Human+papillomavirus+type+16&lvl=0&srchmode=1>/

HPV18 	/Human papillomavirus type 18
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Human+papillomavirus+type+18&lvl=0&srchmode=1>/

Hs 	/Homo sapiens
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Homo+sapiens&lvl=0&srchmode=1>/
(human)
HSV-1 	/Human herpesvirus 1
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Human+herpesvirus+1&lvl=0&srchmode=1>/

HSV-2 	/Human herpesvirus 2
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Human+herpesvirus+2&lvl=0&srchmode=1>/

HTLV-I 	/Human T-cell leukemia virus type I
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Human+T-cell+leukemia+virus+type+I&lvl=0&srchmode=1>/

HTLV-II 	/Human T-cell leukemia virus type II
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Human+T-cell+leukemia+virus+type+II&lvl=0&srchmode=1>/

Hv 	/Hordeum vulgare
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Hordeum+vulgare&lvl=0&srchmode=1>/
(barley)
HVS 	/Herpesvirus saimiri
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Herpesvirus+saimiri&lvl=0&srchmode=1>/

JCV 	/Human polyomavirus JCV
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=human+polyomavirus+JCV&lvl=0&srchmode=1>/

Le 	/Lycopersicon esculentum
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Lycopersicon+esculentum&lvl=0&srchmode=1>/
(tomato)
Leu 	/Lepus europaeus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Lepus+europaeus&lvl=0&srchmode=1>/
(European hare)
Lm 	/Locusta migratoria
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Locusta+migratoria&lvl=0&srchmode=1>/
(migratory locust)
Lp 	/Lytechinus pictus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Lytechinus+pictus&lvl=0&srchmode=1>/
(painted urchin)
Lpe 	/Lycopersicon peruvianum
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Lycopersicon+peruvianum&lvl=0&srchmode=1>/
(Peruvian tomato)
Lv 	/Lytechinus variegatus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Lytechinus+variegatus&lvl=0&srchmode=1>/
(green urchin)
Ma 	/Mesocricetus auratus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Mesocricetus+auratus&lvl=0&srchmode=1>/
(golden hamster)
Mc 	/Macaca fascicularis
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Macaca+fascicularis&lvl=0&srchmode=1>/
(crab-eating macaque)
MCMV 	/Murine cytomegalovirus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Murine+cytomegalovirus&lvl=0&srchmode=1>/

MLV_AKV 	/AKV murine leukemia virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=AKV+murine+leukemia+virus&lvl=0&srchmode=1>/

MLVxeno 	/Xenotropic murine leukemia virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Xenotropic+murine+leukemia+virus&lvl=0&srchmode=1>/

Mm 	/Mus musculus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Mus+musculus&lvl=0&srchmode=1>/
(house mouse)
M-MLV 	/Moloney murine leukemia virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Moloney+murine+leukemia+virus&lvl=0&srchmode=1>/

M-MSV 	/Moloney murine sarcoma virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Moloney+murine+sarcoma+virus&lvl=0&srchmode=1>/

MMTV 	/Mouse mammary tumor virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Mouse+mammary+tumor+virus&lvl=0&srchmode=1>/

Ms 	/Medicago sativa
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Medicago+sativa&lvl=0&srchmode=1>/
(alfalfa)
MSV 	/Maize streak virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Maize+streak+virus&lvl=0&srchmode=1>/

Np 	/Nicotiana plumbaginifolia
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Nicotiana+plumbaginifolia&lvl=0&srchmode=1>/
(curled-leaved tobacco)
Ns 	/Nicotiana sylvestris
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Nicotiana+sylvestris&lvl=0&srchmode=1>/
(wood tobacco)
Nt 	/Nicotiana tabacum
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Nicotiana+tabacum&lvl=0&srchmode=1>/
(common tobacco)
Nto 	/Nicotiana tomentosiformis
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Nicotiana+tomentosiformis&lvl=0&srchmode=1>/

Oa 	/Ovis aries
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Ovis+aries&lvl=0&srchmode=1>/
(sheep)
Oc 	/Oryctolagus cuniculus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Oryctolagus+cuniculus&lvl=0&srchmode=1>/
(rabbit)
Os 	/Oryza sativa
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Oryza+sativa&lvl=0&srchmode=1>/
(rice)
Ph 	/Petunia hybrida
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Petunia+hybrida&lvl=0&srchmode=1>/
(e.g. Petunia strain Mitchell)
Pa 	/Papio anubis
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Papio+anubis&lvl=0&srchmode=1>/
(olive baboon)
Pc 	/Petroselinum crispum
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Petroselinum+crispum&lvl=0&srchmode=1>/
(parsley)
Pl 	/Paracentrotus lividus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Paracentrotus+lividus&lvl=0&srchmode=1>/
(common urchin)
Pm 	/Psammechinus miliaris
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Psammechinus+miliaris&lvl=0&srchmode=1>/
(sand urchin)
Polyoma 	/Mouse polyomavirus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Mouse+polyomavirus&lvl=0&srchmode=1>/

Ppy 	/Photinus pyralis
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Photinus+pyralis&lvl=0&srchmode=1>/
(North American firefly)
Pp 	/Pongo pygmaeus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Pongo+pygmaeus&lvl=0&srchmode=1>/
(orangutan)
Ps 	/Pisum sativum
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Pisum+sativum&lvl=0&srchmode=1>/
(pea)
Pt 	/Pan troglodytes
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Pan+troglodytes&lvl=0&srchmode=1>/
(chimpanzee)
Pth 	/Pinus thunbergii
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Pinus+thunbergii&lvl=0&srchmode=1>/
(Japanese black pine)
Pv 	/Phaseolus vulgaris
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Phaseolus+vulgaris&lvl=0&srchmode=1>/
(kidney bean)
RAV2 	/Rous associated virus type 2
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Rous+associated+virus+type+2&lvl=0&srchmode=1>/
(Avian)
Rc 	/Ricinus communis
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Ricinus+communis&lvl=0&srchmode=1>/
(castor bean)
R-MCF 	/Rauscher mink cell focus-forming virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Rauscher+mink+cell+focus-forming+virus&lvl=0&srchmode=1>/

Rn 	/Rattus norvegicus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Rattus+norvegicus&lvl=0&srchmode=1>/
(Norway rat)
RSV 	/Rous sarcoma virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Rous+sarcoma+virus&lvl=0&srchmode=1>/
(Avian) 
Sa 	/Sinapis alba
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Sinapis+alba&lvl=0&srchmode=1>/
(white mustard)
SA7P 	/Simian adenovirus (7P)
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Simian+adenovirus+&lvl=0&srchmode=1>/

Sd 	/Strongylocentrotus droebachiensis
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Strongylocentrotus+droebachiensis&lvl=0&srchmode=1>/

Se 	/Nannospalax ehrenbergi
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Nannospalax+ehrenbergi&lvl=0&srchmode=1>/
(Ehrenberg's mole-rat)
Sg 	/Oncorhynchus mykiss
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Oncorhynchus+mykiss&lvl=0&srchmode=1>/
(rainbow trout)
SIV 	/Simian immunodeficiency virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Simian+immunodeficiency+virus+&lvl=0&srchmode=1>/

SNV 	/Spleen necrosis virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Spleen+necrosis+virus&lvl=0&srchmode=1>/

So 	/Spinacia oleracea
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Spinacia+oleracea&lvl=0&srchmode=1>/

Sp 	/Strongylocentrotus purpuratus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Strongylocentrotus+purpuratus&lvl=0&srchmode=1>/

Spe 	/Sarcophaga peregrina
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Sarcophaga+peregrina&lvl=0&srchmode=1>/

Sr 	/Sesbania rostrata
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Sesbania+rostrata&lvl=0&srchmode=1>/

SRV-1 	/Simian AIDS retrovirus SRV-1
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Simian+AIDS+retrovirus+SRV-1&lvl=0&srchmode=1>/

Ss 	/Sus scrofa
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Sus+scrofa&lvl=0&srchmode=1>/
(pig)
SSV 	/Simian sarcoma virus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Simian+sarcoma+virus&lvl=0&srchmode=1>/

St 	/Solanum tuberosum
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Solanum+tuberosum&lvl=0&srchmode=1>/
(potato)
Sv 	/Sorghum bicolor
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Sorghum+bicolor&lvl=0&srchmode=1>/
(sorghum)
SV40 	/Simian virus 40
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Simian+virus+40&lvl=0&srchmode=1>/

Ta 	/Triticum aestivum
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Triticum+aestivum&lvl=0&srchmode=1>/
(wheat)
Visna 	/Visna lentivirus
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Visna+lentivirus&lvl=0&srchmode=1>/

Xb 	/Xenopus borealis
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Xenopus+borealis&lvl=0&srchmode=1>/
(Kenyan clawed frog)
Xl 	/Xenopus laevis
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Xenopus+laevis&lvl=0&srchmode=1>/
(African clawed frog)
Xt 	/Xenopus tropicalis
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Xenopus+tropicalis&lvl=0&srchmode=1>/
(western clawed frog)
Zm 	/Zea mays
<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Zea+mays&lvl=0&srchmode=1>/
(maize)



      B.2. JOURNAL CODES


 

Code 	Journal name
ARB 	Annual Review of Biochemistry
ARP 	Annual Review of Physiology
BBA 	Biochimica et Biophysica Acta
BBRC 	Biochemical and Biophysical Research Communications
Bch 	Biochemistry
Bchi 	Biochimie
BchJ 	Biochemical Journal
BCHS 	Biological Chemistry Hoppe-Seyler
BrJR 	British Journal of Rheumatology
BrainR 	Brain Research
Btech 	Biotechnology
CanR 	Cancer Research
Cell 	Cell
CGD 	Cell Growth and Differentiation
Chrom 	Chromosoma
CSHS 	Cold Spring Harbor Symposia on Quantitative Biology
CTMI 	Current Topics in Microbiology and Immunology
CurG 	Current Genetics
DCB 	DNA and Cell Biology
DevB 	Developmental Biology
Diab 	Diabetes
DNA 	DNA
ECR 	Experimental Cell Research
EJBc 	European Journal of Biochemistry
EJCB 	European Journal of Cell Biology
EMBOJ 	EMBO Journal
EMBOR 	EMBO Reports
Evo 	Evolution
FEBS 	FEBS Letters
GDev 	Genes and Development
Gene 	Gene
GChC 	Genes Chromosomes Cancer
GnmR 	Genome Research
Gnms 	Genomics
Gnts 	Genetics
HGEN 	Human Genetics
IJCa 	International Journal of Cancer
ImTo 	Immunology Today
JBC 	Journal of Biological Chemistry
JBch 	Journal of Biochemistry
JCB 	Journal of Cell Biology
JEM 	Journal of Experimental Medicine
JGV 	Journal of General Virology
JI 	Journal of Immunology
JMAG 	Journal of Molecular and Applied Genetics
JMB 	Journal of Molecular Biology
JME 	Journal of Molecular Evolution
JMEnd 	Journal of Molecular Endocrinology
JNeSc 	Journal of Neuroscience
JVir 	Journal of Virology
MB 	Molecular Biology
MBE 	Molecular Biology and Evolution
MBM 	Molecular Biology and Medicine
MBR 	Molecular Biology Reports
MCB 	Molecular and Cellular Biology
MCEnd 	Molecular and Cellular Endocrinology
MEnd 	Molecular Endocrinology
MImm 	Molecular Immunology
MEnz 	Methods in Enzymology
MGG 	Molecular and General Genetics
MNeub 	Molecular Neurobiology
MPMI 	Molecular Plant-Microbe Interactions
NAR 	Nucleic Acids Research
Nat 	Nature
Oncg 	Oncogene
OncR 	Oncogene Research
Pla 	Planta
PlJ 	Plant Journal
PMB 	Plant Molecular Biology
PSL 	Plant Science Letters
RPHR 	Recent Progress in Hormone Research
PNAS 	Proceedings of the National Academy of Sciences of the United
States of America
Sci 	Science
SCMG 	Somatic Cell and Molecular Genetics
TiG 	Trends in Genetics
Vir 	Virology
VirR 	Virus Research



      B.3. ABBREVIATIONS


 

1-25OH2D3 	1,25-(OH)_2 vitamin D_3
20-OHE 	20-Hydroxyecdysone
4CL 	4-coumarate coenzyme A ligase
a1 	Gene locus 1 involved in anthocyanin biosynthesis
abd-g. 	Abdominal ganglion
abl 	Abelson murine leukemia virus oncogene
ACC 	1-aminocyclopropane-1-carboxylic acid
AChR 	Acetylcholine receptor
ACP 	b'-ketoacyl-acyl carrier protein of fatty acid synthase
ACTH 	Adrenocorticotropic hormone
ADA 	Adenosine deaminase
ADH 	Alcohol dehydrogenase
ADPg-s GT 	ADPglucose-starch glucosyltransferase
adult-HA 	Adult hermaphrodite
AFW1 	Adult fast-white (myosin heavy chain) 1
Ag 	Antigen
(AGM) 	"from African green monkey"
AGP 	Acid glycoprotein
AGPP 	ADP glucose pyrophosphorylase
AIRS 	Aminoimidazole ribonucleotide synthase
ALA-synt. 	5-Aminolevulinate synthase
ALDH_2 	Aldehyde dehydrogenase 2
AlkExo 	Alkaline exonuclease
Amy 	Amylase
antp 	"antennapedia" locus
aP2 	Adipocyte homologue of myelin P2
apolipop. 	Apolipoprotein
apoVLDLII 	Very low density apolipoprotein II
APRT 	Adenine phosphoribosyltransferase
AR 	Adrenergic receptor
ARF 	ADP-ribosylation factor
arg 	Arginine
AS 	Argininosuccinate synthetase
AS-C 	"achaete-scute" complex locus
AspAT 	Aspartate aminotransferase
ass. 	Associated
AT 	Antitrypsin
ATIII 	Antithrombin III
ATCase 	Aspartate transcarbamylase
ATP 	Adenosine triphosphate
awd 	"abnormal wing disk" locus
BB 	Bowman-Birk (protease inhibitor)
BCKDHA 	Branched-chain alpha-keto acid dehydrogenase complex
Bcl-2 	B-cell leukemia/lymphoma 2 proto-oncogene
BMMC 	Bone marrow-derived mast cell
BPTI 	Bovine pancreatic trypsin inhibitor
BSF 	B-cell stimulating factor
bsg25D 	Blastoderm specific locus 25D
c- 	Cellular protooncogene ..
c1 	Regulatory locus of anthocyanin synthesis (maize)
C4BP 	Complement component C4-binding protein
CA 	Carbonic anhydrase
CAD 	Carbamoyl-phosphate synthetase (glutamine-hydrolysing)/aspartate
carbamoyl transferase/dihydroorotase
cab 	Chlorophyll a/b-binding protein
cAMP 	Cyclic AMP (adenosine monophosphate)
card-m. 	Cardiac muscle
cc-ind. 	Cell cycle-independent
CD3 	T-cell differentiation antigen CD3
CD4 	T-cell differentiation antigen CD4
CD8 	T-cell differentiation antigen CD8
CEA 	Carcinoembryonic antigen
CG 	Chorionic gonadotropin
CNS 	Central nervous system
CNTF 	Ciliary neurotrophic factor
car. 	Cartilage
col. 	Collagen
conglyc. 	Conglycinin
cor. 	Cornea
cotyl. 	Cotyledon
cp 	Cytoplasm(ic)
CPS 	Carbamyl-phosphate synthetase
CRF 	Corticotropin-releasing factor
CRP 	C-reactive protein
cs 	Cytosol(ic)
CSF 	Colony stimulating factor
cyt 	Cytokinin gene (coding for isopentenyltransferase)
DAF 	Decay-accelerating factor
dbp 	DNA binding protein
DDC 	DOPA decarboxylase
DDH 	Dihydrodiol dehydrogenase
dep. 	dependent
dev. 	Development(ally)
DHFR 	Dihydrofolate reductase
diff. 	differentiation, differentiated
DL/R 	Left and right duplicated region
dnc 	"dunce" locus
dUTPase 	Deoxyuridine triphosphatase
E 	1. Early, 2. Erythroid cell-specific
E8 	Ethylene inducible gene during fruit ripening 8
EAS 	5-epi-aristolochene synthase (sesquiterpene cyclase)
EBNA 	Epstein-Barr virus nuclear antigens
ecd-ind. 	Ecdysone-inducible
EDF 	Eosinophil differentiation factor
EFW1 	Embryonic fast-white (myosin heavy chain) 1
EGF 	Epidermal growth factor
EIa 	Adenovirus early Ia region (transactivating element)
Eip 	Ecdysone-induced protein
ELH 	Egg-laying hormone
em 	Embryo, embryonic
epithel 	epithelial or epithelium
EPSP 	5-Enolpyruvylshikimate-3-phosphate
erbA,B 	(Avian) erythroblastosis virus oncogene A,B
E-resp. 	Estrogen-responsive
ERV3 	Endogenous retrovirus 3
E.Tn 	Early transposon
et-hypocot. 	Etiolated hypocotyl
ev1 	(Avian) endogenous virus 1
eve 	"even-skipped" locus
exch. 	Exchanger
f. 	Factor
fib. 	Fibers
fibrob. 	Fibroblasts
FMRFamide 	Phe-Met-Arg-Phe-NH(2) neuropeptide
FNR 	Ferredoxin-(NADP+)-oxidoreductase
FBP 	Folate Binding Protein
fos 	FBJ (Finkel-Biskis-Jinkins) osteosarcoma virus oncogene
FSH 	Follicle stimulating hormone
ftz 	"fushi tarazu" locus
g. 	Gene
G0S.. 	G0/G1 switch regulatory gene ..
G6PD 	Glucose-6-phosphate dehydrogenase
GA 	Gibberellic acid
GADPH 	Glyceraldehyde-3-phosphate dehydrogenase
GARS 	Glycinamide ribonucleotide synthase
Gart 	"Gart" locus (-> GARS, AIRS, GART)
GART 	Glycinamide ribonucleotide transformylase
gC 	Glycoprotein C
G-CSF 	Granulocyte colony stimulating factor
gD 	Glycoprotein D
GdX 	X-linked gene downstream of G6PD gene
gE 	Glycoprotein E
GFAP 	Glial fibrillary acidic protein
g'GT 	g'-Glutamyl transpeptidase
gln 	Glutamine
globul-12s 	12s globulin (oat seed storage protein)
glucc 	Glucocorticoid
GLUT1 	Glucose transporter type 1
GM-CSF 	Granulocyte/Macrophage colony stimulating factor
GnRH 	Gonadotropin-releasing hormone
gp 	Glycoprotein
GPD 	Glycerol-3-phosphate dehydrogenase
GPT 	UDP-GlcNAc:dolichol phosphate N-acetylglucosamine-1-phosphate
transferase
granulo-c 	Granulocyte
GRF 	Growth hormone-releasing factor
GRP 	Glycine-rich (cell wall) protein
GS17 	Gastrula-specific transcript 17
GSHPx 	Glutathione peroxidase
G-spec. 	Gastrula-specific
GST 	Glutathione S-transferase
H 	1. Heavy chain, 2. Housekeeping-type promoter
Ha-ras 	Rat-derived Harvey murine sarcoma virus oncogene
haptoblob 	haptoglobin
hb 	"hunchback" locus
Hc 	High-cysteine (chorion protein)
HDC 	L-histidine decarboxylase
hematop. 	hematopoietic
HGT 	High-(glycine+tyrosine) keratin
hist. 	Histone
HMG- 	High mobility group chromosomal protein
HMG-CoA 	3-Hydroxy-3-methylglutaryl coenzyme A
HPRT 	Hypoxanthine phosphoribosyltransferase
hs 	Heatshock
hsc 	Constitutive analogue of heatshock gene/protein
HSF 	Hepatocyte-stimulating factor
hsp 	Heatshock protein
Ht 	Testicular histone
HTF 	Restriction endonuclease HpaII tiny fragments
I-FABP 	Intestinal fatty-acid binding protein
IAA 	Indoleacetic acid
IAP 	Intracisternal A-particles
ICP 	Infected cell protein
IE 	Immediate early (gene, RNA)
IF 	Intermediate filament
IFI 	Interferon-induced gene/protein
IFN 	Interferon
Ig 	Immunoglobulin
IGF 	Insulin-like growth factor
IL 	Interleukin
inf. 	Infected
inh. 	Inhibitor
iNOS 	Inducible nitric oxide synthase
IRF 	Interferon regulatory factor
ISG 	Interferon-stimulated gene
k. 	Kinase
keratino-c 	Keratinocyte
Ki-ras 	Rat-derived Kirsten murine sarcoma virus oncogene
L 	1. Light chain; 2. Late
larva-1,2,.. 	First, second, .. instar larva
LAT.. 	Lycopersicon anther-specific gene ..
LCAT 	Lecithin-cholesterol acyltransferase
lck 	T-cell- or lymphocyte-specific tyrosine kinase
LDH 	Lactate dehydrogenase
leghem. 	Leghemoglobin
LeIF 	Leukocyte interferon
leuko-c 	Leukocyte
LH 	Luteinizing hormone
LHC 	Light-harvesting complex
LHRH 	Luteinizing hormone-releasing hormone
liv. 	liver
LMW 	Low molecular weight
LPH 	Lipotropic hormone
LPS 	Lipopolysaccharide
LTR 	Long Terminal Repeat
lympho-c 	Lymphocyte
lys 	Lysosomal
MBP 	Myelin basic protein
(MAC) 	Macaque
MC 	Methylcholanthrene
MCK 	Muscle-specific creatine kinase
mGK 	Submaxillary gland kallikrein
MHCI/MHCII 	Class I/II transplantation antigens of major
histocompatibility complex
MIF 	Macrophage migration inhibitory factor
minipara 	Miniparamyosin
mit 	Mitochondrial
mono-c 	Monocyte
mononuc-c. 	Mononuclear cells
MOPC.. 	Mineral oil-induced plasmacytoma
mos 	Moloney murine sarcoma virus oncogene
MP 	Macrophage
MPC.. 	Mouse plasma cell tumor
MRP 	MIF-related protein (see MIF)
MSF 	Megakaryocyte stimulating factor
msp 	Major sperm protein gene
MT 	Metallothionein
mst 	Male-specific transcript
MUP 	Major urinary protein
myb 	(Avian) myeloblastosis virus oncogene
myc 	Myelocytomatosis virus 29 oncogene
NCA 	nonspecific cross-reacting (with -> CEA) antigen
nerv. sys 	Nervous system
neu 	Ethyl-nitrosourea-induced rat neuroblastoma oncogene
neuropep. 	Neuropeptide
NGF 	Nerve growth factor
ninaE 	"neither inactivation nor afterpotential" locus E
NMDH 	NADP-malate dehydrogenase
NOS 	Nitric oxide synthase
nos 	Nopaline synthetase
NR 	Nitrate reductase
N-ras 	Neuroblastoma ras-like (-> Ha-ras) oncogene
NS 	Nervous system
OAT 	Ornithine aminotransferase
ocs 	Octopine synthetase
ODC 	Ornithine decarboxylase
Ori 	Origin of replication
OTC 	Ornithine transcarbamylase
ovalb. 	Ovalbumin
p. 	Protein
P-450 	Cytochrome P-450
p53 	53K phosphoprotein
panc. 	pancreas, pancreatic
parath. 	Parathyroid
PB 	Phenobarbital
PBGD 	Porphobilinogen deaminase
PCNA 	Proliferating cell nuclear antigen
PDEase 	cAMP phosphodiesterase
PDGF 	Platelet-derived growth factor
PEPCase 	Phosphoenolpyruvate carboxylase
PEPCK 	Phosphoenolpyruvate carboxykinase
PG 	Prostaglandin
PGK 	3-Phosphoglycerate kinase
PHA 	Phytohemagglutinin
PK 	Protein kinase
P_L 	Late promoter
PLP 	Proteolipid protein
POL 	Polymerase
POMC 	Proopiomelanocortin
pp.. 	Phosphoprotein ..
PR1a 	Pathogenesis-related protein 1a
PRBP 	Plasma retinol-binding protein
PRL 	Prolactin
prog. 	Progesterone
prolyl 4-hydr. 	Prolyl 4-hydroxylase
PrP 	Prion protein
PSG1,PSG2,. 	Pregnancy-specific glycoproteins 1,2,.
PSBP 	Prostatic steroid binding protein
PSP 	Parotid secretory protein
PTH 	Parathyroid hormone
pTiN 	Nopaline type tumor inducing plasmid
pTiO 	Octopine type tumor inducing plasmid
r 	"rudimentary" locus
R 	1. Regulatory subunit, 2. Erythroid cell-specific
RAB 	Gene responsive to ABA
ras 	Homologue of -> Ha-ras, Ki-ras, etc.
rec. 	Receptor
red. 	Reductase
reg. 	Regulated
rep-dep. 	Replication-dependent
rig 	Rat insulinoma gene
RnBP 	Renin-binding protein
RNR1, RNR2 	Ribonucleotide reductase large, small subunit
rp 	Ribosomal protein
rTn 	Retrotransposon
RuBPCss 	Ribulose-1,5-bisphosphate carboxylase small subunit
RuBPCA 	Ribulose-1,5-bisphosphate carboxylase/oxygenase activase
s. 	Small
saliv-g. 	Salivary gland
SBP 	Spermine-binding protein
SC 	Stem cells
sem-v. 	Seminal vesicle
ser. 	Serum
sgs 	Salivary gland secretion protein
sis 	Simian sarcoma virus oncogene
sk-m. 	Skeletal muscle
skel-m. 	Skeletal muscle
smooth-m. 	Smooth muscle
snRNA 	Small nuclear RNA
snRNP 	Small nuclear ribonucleoprotein
SOD 	Superoxide dismutase
som 	Somatic
spat-reg. 	Spatially regulated
Spec 	Strongylocentrotus purpuratus ectoderm enriched RNA
SPI 	Serine protease inhibitor
sry 	"serendipity" locus
SV40T 	Tumor antigen of simian virus 40 (SV40)
SVS 	Seminal vesicle secretory protein
synt. 	Synthase
T3d' 	T-cell antigen receptor-associated T3-complex delta chain
TAT 	Tyrosine aminotransferase
TCDD 	2,3,7,8-Tetrachlorodibenzo-p-dioxin
TCGF 	T-cell growth factor
TCR 	T-cell receptor
TdT 	Terminal deoxynucleotidyltransferase
test. 	testis
TF 	Transcription factor
TGA1a 	TGACG-specific DNA-binding protein 1a
TGF-b' 	Transforming growth factor beta
TH 	Tyrosine hydroxylase
thyr. 	Thyroxine
Thy-1.2 	Thy-1 (thymocyte) antigen/glycoprotein allotype 2
TIF 	Trans-inducing factor
TIM 	Triosephosphate isomerase
tis. 	Tissue
TM 	Tropomyosin
tmr 	"tumor morphology root" locus
TNF 	Tumor necrosis factor
TnI 	Troponin I (inhibitory subunit)
TnT 	Troponin T (tropomyosin-binding subunit)
TO 	Tryptophan oxygenase
TP1,TP2,. 	Transition protein 1,2,.
TPA 	12-O-tetradecanoyl-phorbol-13-acetate
TPI 	Triosephosphate isomerase
tr.,tr- 	Transcript
TRF 	T-cell replacing factor
TRH 	Thyrotropin-releasing hormone
TS 	Thymidylate synthetase
TSH 	Thyroid stimulating hormone
T/t 	Large/small T(tumor) antigen
Ubx 	"ultrabithorax" locus
uPA 	Urine plasminogen activator
URO-D 	Uroporphyrinogen decarboxylase
Vg1 	Vegetal hemisphere-specific mRNA 1
vir-inf. 	Viral infection
VL30 	Retrovirus-like 30s RNA
VLDL 	Very low density lipoprotein
V_NP 	(Immunoglobulin heavy chain) variable region specific for
4-hydroxy-3-nitrophenylacetyl
VP5 	Virion protein 5 (HSV-1/2: =major capsid protein)
VSP 	Virion stimulatory protein
vWf 	von Willebrand factor
Zen 	"zerknuellt" protein
