Database: UniProt
Entry: H2KUQ7_CLOSI
ID   H2KUQ7_CLOSI            Unreviewed;      1303 AA.
AC   H2KUQ7;
DT   21-MAR-2012, integrated into UniProtKB/TrEMBL.
DT   21-MAR-2012, sequence version 1.
DT   27-MAR-2024, entry version 41.
DE   SubName: Full=Collagen alpha-2(I) chain {ECO:0000313|EMBL:GAA31451.2};
GN   ORFNames=CLF_109882 {ECO:0000313|EMBL:GAA31451.2};
OS   Clonorchis sinensis (Chinese liver fluke).
OC   Eukaryota; Metazoa; Spiralia; Lophotrochozoa; Platyhelminthes; Trematoda;
OC   Digenea; Opisthorchiida; Opisthorchiata; Opisthorchiidae; Clonorchis.
OX   NCBI_TaxID=79923 {ECO:0000313|EMBL:GAA31451.2, ECO:0000313|Proteomes:UP000008909};
RN   [1] {ECO:0000313|EMBL:GAA31451.2}
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RC   STRAIN=Henan {ECO:0000313|EMBL:GAA31451.2};
RX   PubMed=22023798; DOI=10.1186/gb-2011-12-10-r107;
RA   Wang X., Chen W., Huang Y., Sun J., Men J., Liu H., Luo F., Guo L., Lv X.,
RA   Deng C., Zhou C., Fan Y., Li X., Huang L., Hu Y., Liang C., Hu X., Xu J.,
RA   Yu X.;
RT   "The draft genome of the carcinogenic human liver fluke Clonorchis
RT   sinensis.";
RL   Genome Biol. 12:R107-R107(2011).
CC   ---------------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium, see https://www.uniprot.org/terms
CC   Distributed under the Creative Commons Attribution (CC BY 4.0) License
CC   ---------------------------------------------------------------------------
DR   EMBL; DF144154; GAA31451.2; -; Genomic_DNA.
DR   Proteomes; UP000008909; Unassembled WGS sequence.
DR   GO; GO:0005581; C:collagen trimer; IEA:UniProtKB-KW.
DR   GO; GO:0005201; F:extracellular matrix structural constituent; IEA:InterPro.
DR   Gene3D; 2.60.120.1000; -; 1.
DR   InterPro; IPR008160; Collagen.
DR   InterPro; IPR000885; Fib_collagen_C.
DR   PANTHER; PTHR24023; COLLAGEN ALPHA; 1.
DR   PANTHER; PTHR24023:SF1082; COLLAGEN ALPHA-1(X) CHAIN; 1.
DR   Pfam; PF01410; COLFI; 1.
DR   Pfam; PF01391; Collagen; 9.
DR   SMART; SM00038; COLFI; 1.
DR   PROSITE; PS51461; NC1_FIB; 1.
PE   4: Predicted;
KW   Collagen {ECO:0000313|EMBL:GAA31451.2};
KW   Reference proteome {ECO:0000313|Proteomes:UP000008909};
KW   Signal {ECO:0000256|SAM:SignalP}.
FT   SIGNAL          1..39
FT                   /evidence="ECO:0000256|SAM:SignalP"
FT   CHAIN           40..1303
FT                   /evidence="ECO:0000256|SAM:SignalP"
FT                   /id="PRO_5003563208"
FT   DOMAIN          1079..1303
FT                   /note="Fibrillar collagen NC1"
FT                   /evidence="ECO:0000259|PROSITE:PS51461"
FT   REGION          41..640
FT                   /note="Disordered"
FT                   /evidence="ECO:0000256|SAM:MobiDB-lite"
FT   REGION          654..1069
FT                   /note="Disordered"
FT                   /evidence="ECO:0000256|SAM:MobiDB-lite"
FT   COMPBIAS        53..75
FT                   /note="Pro residues"
FT                   /evidence="ECO:0000256|SAM:MobiDB-lite"
FT   COMPBIAS        166..183
FT                   /note="Pro residues"
FT                   /evidence="ECO:0000256|SAM:MobiDB-lite"
FT   COMPBIAS        382..396
FT                   /note="Pro residues"
FT                   /evidence="ECO:0000256|SAM:MobiDB-lite"
FT   COMPBIAS        496..529
FT                   /note="Pro residues"
FT                   /evidence="ECO:0000256|SAM:MobiDB-lite"
FT   COMPBIAS        535..550
FT                   /note="Basic and acidic residues"
FT                   /evidence="ECO:0000256|SAM:MobiDB-lite"
FT   COMPBIAS        727..752
FT                   /note="Pro residues"
FT                   /evidence="ECO:0000256|SAM:MobiDB-lite"
FT   COMPBIAS        1041..1064
FT                   /note="Pro residues"
FT                   /evidence="ECO:0000256|SAM:MobiDB-lite"
SQ   SEQUENCE   1303 AA;  126751 MW;  984104F607DBE8D5 CRC64;
     MSRFDSEMSR RATFKQWCLL LPLFAFCLVL LSSINTVNGQ VSSPSEALGP RGPRGDPGPP
     GPEGGMGPPG PRGPIGLPGT DGERGPPGPQ GVPGTPGRDG RPGVPGVQGT PGPQGIAGSP
     GEPGPEGSPG PQGYMGPQGE KGDDGFEGAK GSQGVMGAQG LQGPVGPIGP PGPMGPPGPA
     GPSGETGPAG RIGPLGIQGS VGSPGTPGIP GSVGQKGERG EPGKKGAKGS RGTAGNPGKP
     GQAGQPGIQG EPGHDGRPGV RGEPGERGPV GAVGVDGRPG ERGLMGRLGA QGPKGAQGES
     GVIGNVGPTG PIGIKGEKGQ VGRPGGVGEP GPLGPRGLMG PVGVKGVRGE LGPSGSPGSP
     GLDGQPGTDG QPGEPGTPGE QGISGPPGPP GRPGLDGSPG PRGENGPPGL NGLPGLKGSS
     GPMGPAGLPG LNGKPGATGP AGPMGQVGPR GISGPVGADG EKGTAGLPGP QGFPGDVGDA
     GEPGADGEDG PEGPAGAIGP PGPPGPAGEQ GPPGTPGMAG PPGPTGYDGP PGPDGRDGEP
     GKDGKPGEQG EPGEPGKTGP QGRPGQRGYL GPQGPRGIKG DTGIMGPPGV YGIIGAAGFP
     GESGRQGTEG EPGEKGYPGM PGNKGRRGLR GAQGIRGPPG IAEVGQVINR TVVGSPGMVG
     QRGATGPPGP SGVPGNKGPR GRKGIQGDRG DYGERGPPGP AGEPGADGEP GRDGENGPDG
     ATGEAGPQGP PGPPGDLGTM GPQGPPGPPG PKGAVGLTGL RGEPGRRGPP GPPGLTGEVG
     GIGPIGAPGI SGNLGRKGPT GQRGSPGPRG KPGLPGEAGK PGAKGHVGYP GFMGPPGEPG
     PEGPAGTEGS EGPPGEQGPS GKYGEAGEAG NIGQPGPPGR PGPPGRRGPV GATGARGAQG
     AVGKPGEIGL TGSIGFPGSR GPRGEPGEPG EVGPKGEAGL PGASGSKGHT GPRGDSGRPG
     EAGKEGRPGK QGEPGPKGTP GGKGPVGLPG PPGLDGPMGY PGDQGPRGTP GPIGERGPMG
     ARGKRGDRGD PGEVGPVGPP GRDGDPGPPG PQGIMGPMGP PGPPGQVVSM QARASRTKGW
     MFSDEKAIRR RFGAIAPADP QGTQDAPART CAQMYSKFPN KPDGQYWINP SGSPLNEPTK
     AICRSRNKQT CISSKKSRFE SKEWSTATPN EKRVWLQHIN NFGEFDYAIE SEQLNFLKLL
     SNKATQQIIL RCMKQEEQSS QRNETTSAPP QLVQLLADDD TLLSPSALKR KVSITQNTCG
     LSTEGVTVAF VDSRPSLLPL RDIQLTIDTS TIISVELGEA CFS
//