ID H2KUQ7_CLOSI Unreviewed; 1303 AA.
AC H2KUQ7;
DT 21-MAR-2012, integrated into UniProtKB/TrEMBL.
DT 21-MAR-2012, sequence version 1.
DT 27-MAR-2024, entry version 41.
DE SubName: Full=Collagen alpha-2(I) chain {ECO:0000313|EMBL:GAA31451.2};
GN ORFNames=CLF_109882 {ECO:0000313|EMBL:GAA31451.2};
OS Clonorchis sinensis (Chinese liver fluke).
OC Eukaryota; Metazoa; Spiralia; Lophotrochozoa; Platyhelminthes; Trematoda;
OC Digenea; Opisthorchiida; Opisthorchiata; Opisthorchiidae; Clonorchis.
OX NCBI_TaxID=79923 {ECO:0000313|EMBL:GAA31451.2, ECO:0000313|Proteomes:UP000008909};
RN [1] {ECO:0000313|EMBL:GAA31451.2}
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RC STRAIN=Henan {ECO:0000313|EMBL:GAA31451.2};
RX PubMed=22023798; DOI=10.1186/gb-2011-12-10-r107;
RA Wang X., Chen W., Huang Y., Sun J., Men J., Liu H., Luo F., Guo L., Lv X.,
RA Deng C., Zhou C., Fan Y., Li X., Huang L., Hu Y., Liang C., Hu X., Xu J.,
RA Yu X.;
RT "The draft genome of the carcinogenic human liver fluke Clonorchis
RT sinensis.";
RL Genome Biol. 12:R107-R107(2011).
CC ---------------------------------------------------------------------------
CC Copyrighted by the UniProt Consortium, see https://www.uniprot.org/terms
CC Distributed under the Creative Commons Attribution (CC BY 4.0) License
CC ---------------------------------------------------------------------------
DR EMBL; DF144154; GAA31451.2; -; Genomic_DNA.
DR Proteomes; UP000008909; Unassembled WGS sequence.
DR GO; GO:0005581; C:collagen trimer; IEA:UniProtKB-KW.
DR GO; GO:0005201; F:extracellular matrix structural constituent; IEA:InterPro.
DR Gene3D; 2.60.120.1000; -; 1.
DR InterPro; IPR008160; Collagen.
DR InterPro; IPR000885; Fib_collagen_C.
DR PANTHER; PTHR24023; COLLAGEN ALPHA; 1.
DR PANTHER; PTHR24023:SF1082; COLLAGEN ALPHA-1(X) CHAIN; 1.
DR Pfam; PF01410; COLFI; 1.
DR Pfam; PF01391; Collagen; 9.
DR SMART; SM00038; COLFI; 1.
DR PROSITE; PS51461; NC1_FIB; 1.
PE 4: Predicted;
KW Collagen {ECO:0000313|EMBL:GAA31451.2};
KW Reference proteome {ECO:0000313|Proteomes:UP000008909};
KW Signal {ECO:0000256|SAM:SignalP}.
FT SIGNAL 1..39
FT /evidence="ECO:0000256|SAM:SignalP"
FT CHAIN 40..1303
FT /evidence="ECO:0000256|SAM:SignalP"
FT /id="PRO_5003563208"
FT DOMAIN 1079..1303
FT /note="Fibrillar collagen NC1"
FT /evidence="ECO:0000259|PROSITE:PS51461"
FT REGION 41..640
FT /note="Disordered"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT REGION 654..1069
FT /note="Disordered"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 53..75
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 166..183
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 382..396
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 496..529
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 535..550
FT /note="Basic and acidic residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 727..752
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 1041..1064
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
SQ SEQUENCE 1303 AA; 126751 MW; 984104F607DBE8D5 CRC64;
MSRFDSEMSR RATFKQWCLL LPLFAFCLVL LSSINTVNGQ VSSPSEALGP RGPRGDPGPP
GPEGGMGPPG PRGPIGLPGT DGERGPPGPQ GVPGTPGRDG RPGVPGVQGT PGPQGIAGSP
GEPGPEGSPG PQGYMGPQGE KGDDGFEGAK GSQGVMGAQG LQGPVGPIGP PGPMGPPGPA
GPSGETGPAG RIGPLGIQGS VGSPGTPGIP GSVGQKGERG EPGKKGAKGS RGTAGNPGKP
GQAGQPGIQG EPGHDGRPGV RGEPGERGPV GAVGVDGRPG ERGLMGRLGA QGPKGAQGES
GVIGNVGPTG PIGIKGEKGQ VGRPGGVGEP GPLGPRGLMG PVGVKGVRGE LGPSGSPGSP
GLDGQPGTDG QPGEPGTPGE QGISGPPGPP GRPGLDGSPG PRGENGPPGL NGLPGLKGSS
GPMGPAGLPG LNGKPGATGP AGPMGQVGPR GISGPVGADG EKGTAGLPGP QGFPGDVGDA
GEPGADGEDG PEGPAGAIGP PGPPGPAGEQ GPPGTPGMAG PPGPTGYDGP PGPDGRDGEP
GKDGKPGEQG EPGEPGKTGP QGRPGQRGYL GPQGPRGIKG DTGIMGPPGV YGIIGAAGFP
GESGRQGTEG EPGEKGYPGM PGNKGRRGLR GAQGIRGPPG IAEVGQVINR TVVGSPGMVG
QRGATGPPGP SGVPGNKGPR GRKGIQGDRG DYGERGPPGP AGEPGADGEP GRDGENGPDG
ATGEAGPQGP PGPPGDLGTM GPQGPPGPPG PKGAVGLTGL RGEPGRRGPP GPPGLTGEVG
GIGPIGAPGI SGNLGRKGPT GQRGSPGPRG KPGLPGEAGK PGAKGHVGYP GFMGPPGEPG
PEGPAGTEGS EGPPGEQGPS GKYGEAGEAG NIGQPGPPGR PGPPGRRGPV GATGARGAQG
AVGKPGEIGL TGSIGFPGSR GPRGEPGEPG EVGPKGEAGL PGASGSKGHT GPRGDSGRPG
EAGKEGRPGK QGEPGPKGTP GGKGPVGLPG PPGLDGPMGY PGDQGPRGTP GPIGERGPMG
ARGKRGDRGD PGEVGPVGPP GRDGDPGPPG PQGIMGPMGP PGPPGQVVSM QARASRTKGW
MFSDEKAIRR RFGAIAPADP QGTQDAPART CAQMYSKFPN KPDGQYWINP SGSPLNEPTK
AICRSRNKQT CISSKKSRFE SKEWSTATPN EKRVWLQHIN NFGEFDYAIE SEQLNFLKLL
SNKATQQIIL RCMKQEEQSS QRNETTSAPP QLVQLLADDD TLLSPSALKR KVSITQNTCG
LSTEGVTVAF VDSRPSLLPL RDIQLTIDTS TIISVELGEA CFS
//