ID W6US03_ECHGR Unreviewed; 1561 AA.
AC W6US03;
DT 16-APR-2014, integrated into UniProtKB/TrEMBL.
DT 16-APR-2014, sequence version 1.
DT 27-MAR-2024, entry version 38.
DE SubName: Full=Collagen alpha-2(I) chain {ECO:0000313|EMBL:EUB61137.1};
GN ORFNames=EGR_03985 {ECO:0000313|EMBL:EUB61137.1};
OS Echinococcus granulosus (Hydatid tapeworm).
OC Eukaryota; Metazoa; Spiralia; Lophotrochozoa; Platyhelminthes; Cestoda;
OC Eucestoda; Cyclophyllidea; Taeniidae; Echinococcus;
OC Echinococcus granulosus group.
OX NCBI_TaxID=6210 {ECO:0000313|EMBL:EUB61137.1, ECO:0000313|Proteomes:UP000019149};
RN [1] {ECO:0000313|EMBL:EUB61137.1, ECO:0000313|Proteomes:UP000019149}
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RX PubMed=24013640; DOI=10.1038/ng.2757;
RA Zheng H., Zhang W., Zhang L., Zhang Z., Li J., Lu G., Zhu Y., Wang Y.,
RA Huang Y., Liu J., Kang H., Chen J., Wang L., Chen A., Yu S., Gao Z.,
RA Jin L., Gu W., Wang Z., Zhao L., Shi B., Wen H., Lin R., Jones M.K.,
RA Brejova B., Vinar T., Zhao G., McManus D.P., Chen Z., Zhou Y., Wang S.;
RT "The genome of the hydatid tapeworm Echinococcus granulosus.";
RL Nat. Genet. 45:1168-1175(2013).
CC -!- CAUTION: The sequence shown here is derived from an EMBL/GenBank/DDBJ
CC whole genome shotgun (WGS) entry which is preliminary data.
CC {ECO:0000313|EMBL:EUB61137.1}.
CC ---------------------------------------------------------------------------
CC Copyrighted by the UniProt Consortium, see https://www.uniprot.org/terms
CC Distributed under the Creative Commons Attribution (CC BY 4.0) License
CC ---------------------------------------------------------------------------
DR EMBL; APAU02000023; EUB61137.1; -; Genomic_DNA.
DR STRING; 6210.W6US03; -.
DR EnsemblMetazoa; XM_024493234.1; XP_024352333.1; GeneID_36339700.
DR OMA; HCREIME; -.
DR OrthoDB; 2970887at2759; -.
DR Proteomes; UP000019149; Unassembled WGS sequence.
DR GO; GO:0005581; C:collagen trimer; IEA:UniProtKB-KW.
DR GO; GO:0005201; F:extracellular matrix structural constituent; IEA:InterPro.
DR Gene3D; 2.60.120.1000; -; 1.
DR Gene3D; 2.60.120.200; -; 1.
DR InterPro; IPR008160; Collagen.
DR InterPro; IPR013320; ConA-like_dom_sf.
DR InterPro; IPR000885; Fib_collagen_C.
DR PANTHER; PTHR24023; COLLAGEN ALPHA; 1.
DR PANTHER; PTHR24023:SF1082; COLLAGEN ALPHA-1(X) CHAIN; 1.
DR Pfam; PF01410; COLFI; 1.
DR Pfam; PF01391; Collagen; 10.
DR SMART; SM00038; COLFI; 1.
DR SUPFAM; SSF49899; Concanavalin A-like lectins/glucanases; 1.
DR PROSITE; PS51461; NC1_FIB; 1.
PE 4: Predicted;
KW Collagen {ECO:0000313|EMBL:EUB61137.1};
KW Reference proteome {ECO:0000313|Proteomes:UP000019149}.
FT DOMAIN 1318..1561
FT /note="Fibrillar collagen NC1"
FT /evidence="ECO:0000259|PROSITE:PS51461"
FT REGION 143..191
FT /note="Disordered"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT REGION 223..754
FT /note="Disordered"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT REGION 802..1271
FT /note="Disordered"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 144..158
FT /note="Acidic residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 159..173
FT /note="Basic and acidic residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 382..417
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 647..661
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 686..702
FT /note="Basic and acidic residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 1072..1087
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 1129..1144
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 1217..1240
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 1250..1269
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
SQ SEQUENCE 1561 AA; 155054 MW; 4CFC782A935FD184 CRC64;
MKPGYKGELF IIYSSQGRVL LSVGVDKQVV VAYQEAAAGV RAARISTSGT AIHSEKIGPA
LDDNEWHRVG LNFKDGRVRL AVDCKAVEES TIVLPVKFID KRNTLSLLPH GFNGVLQDLM
LVLGDRSLEQ QCLIYTPDCP TDVDDAMMND EGEPGEQGDP GEPGEKGEPG KPGPRGPDGE
QGPPGPTGSL FVVPLNLGIG GGANARAAHF RQLLQQHLES LRGVHGPRGL TGPTGPDGPM
GPPGLKGETG PQGDTGLQGR RGPMGPVGFP GPRGKTGRDG DRGDPGPAGP KGPDGLPGDS
GLMGPKGLRG PRGPPGPVGE QGSPGEEGSR GPTGDAGDQG DMGSYGPRGP PGLQGPLGPR
GKTGSRGPVG LQGEVGEPGP TGMPGPMGPP GPAGPEGPAG PRGVVGPPGP PGKQGPPGAT
GPEGRPGYPG SKGEMGLKGD TGPEGQKGEP GLPGPRGVKG ARGIVGISGP KGDTGKPGAV
GVPGEAGPKG LKGSRGSPGV MGLMGPQGEK GEPGKSGPFG PRGDRGLQGP PGLMGPAGPQ
GLLGDPGPTG PAGPQGREGD KGETGPVGPP GADGEQGQHG APGPRGIQGK VGPAGEKGDK
GATGPPGPPG PTGELGFQGH PGPSGLPGPA GPMGRQGPQG FPGPRGEPGE IGPPGPRGPI
GMTGPVGPAG EILNFFQQGP TGDPGKEGKD GPKGDKGPKG ERGNRGPPGQ RGPRGYLGLD
GMPGPAGPEG FKGETGPDGE VGPTGPKGYR GDAGILGQVG PPVIQSISIW IGSLLPPEAT
SKDKGNGTIF SISSEFLRLL TGPLGPRGAQ GIQGEKGPKG PDGAMGTVGP PGSRGSRGRS
GPTGPIGAPG FDGEKGERGP PGPQGKKGPP GPQGPRGIVG PIGSAGMPGI QGPPGAPGQP
GEPGPMGLIG PDGLTGPSGL PGPMGMEGAP GPIGKRGPKG DRGREGPQGS PGPMGPMGSM
GLTGPAGNPG PTGPAGAVGE PGDKGQQGPR GSPGYEGSPG NQGPQGRKGP DGPAGQVGPP
GDPGMEGPIG PQGPTGPLGP PGSQGARGIP GGRGSPGPVG EDGAPGNTGP QGKPGPRGPP
GPQGPQGPAG QAGPQGLPGP RGSKGPKGEP GPQGPPGETG NPGPRGLSGP PGLRGPPGPA
GLQGPPGVKG PEGEVGPEGI PGATGADGGK GETGPKGPKG AYGPVGPMGP PGPRGERGPT
GVEGERGRIG PKGAQGPPGP TGDPGPPGDS GPPGPEGPMG VRGPVGPGGP RGPKGKPGPQ
GPPGPPGPVK ILDLTAGYFK FEPGRTKRSI NEDELYDMKD PDASLFPNNV PAIGAVLRRL
YARIESLESA VRYYRRPIGT RAYPARHCRE IMEATDSPHG PVSGEYWIDP NLGSSRDAIK
VECKFSGSVA KTCVHATPES KALRLVNLRK SGGEGSWWFS KLLEGNTNGT QRLYYAPRNQ
FNFLQLLHHR AEQSITALCR GSVVYYDSRN KNYDLAANLL LFNGKVINTH LDRRVRGEGG
FVQLEVNIKD DCMDRSPTGS TANFDLVANH PELLPIIDMK MFDFGEDNQQ LGYYVNEVCF
S
//