ID I3JC78_ORENI Unreviewed; 1232 AA.
AC I3JC78;
DT 11-JUL-2012, integrated into UniProtKB/TrEMBL.
DT 17-JUN-2020, sequence version 2.
DT 27-MAR-2024, entry version 66.
DE SubName: Full=Collagen type II alpha 1 chain {ECO:0000313|Ensembl:ENSONIP00000006468.2};
GN Name=COL2A1 {ECO:0000313|Ensembl:ENSONIP00000006468.2};
OS Oreochromis niloticus (Nile tilapia) (Tilapia nilotica).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Actinopterygii; Neopterygii; Teleostei; Neoteleostei; Acanthomorphata;
OC Ovalentaria; Cichlomorphae; Cichliformes; Cichlidae; African cichlids;
OC Pseudocrenilabrinae; Oreochromini; Oreochromis.
OX NCBI_TaxID=8128 {ECO:0000313|Ensembl:ENSONIP00000006468.2, ECO:0000313|Proteomes:UP000005207};
RN [1] {ECO:0000313|Proteomes:UP000005207}
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RG Broad Institute Genome Assembly Team;
RG Broad Institute Sequencing Platform;
RA Di Palma F., Johnson J., Lander E.S., Lindblad-Toh K.;
RT "The Genome Sequence of Oreochromis niloticus (Nile Tilapia).";
RL Submitted (JAN-2012) to the EMBL/GenBank/DDBJ databases.
RN [2] {ECO:0000313|Ensembl:ENSONIP00000006468.2}
RP IDENTIFICATION.
RG Ensembl;
RL Submitted (NOV-2023) to UniProtKB.
CC ---------------------------------------------------------------------------
CC Copyrighted by the UniProt Consortium, see https://www.uniprot.org/terms
CC Distributed under the Creative Commons Attribution (CC BY 4.0) License
CC ---------------------------------------------------------------------------
DR AlphaFoldDB; I3JC78; -.
DR STRING; 8128.ENSONIP00000070225; -.
DR Ensembl; ENSONIT00000006473.2; ENSONIP00000006468.2; ENSONIG00000005146.2.
DR eggNOG; KOG3544; Eukaryota.
DR GeneTree; ENSGT00940000155224; -.
DR HOGENOM; CLU_001074_2_3_1; -.
DR TreeFam; TF344135; -.
DR Proteomes; UP000005207; Linkage group LG5.
DR GO; GO:0005201; F:extracellular matrix structural constituent; IEA:InterPro.
DR GO; GO:0046872; F:metal ion binding; IEA:UniProtKB-KW.
DR Gene3D; 2.60.120.1000; -; 1.
DR Gene3D; 2.10.70.10; Complement Module, domain 1; 1.
DR InterPro; IPR008160; Collagen.
DR InterPro; IPR000885; Fib_collagen_C.
DR InterPro; IPR001007; VWF_dom.
DR PANTHER; PTHR24023; COLLAGEN ALPHA; 1.
DR PANTHER; PTHR24023:SF1108; ENDOSTATIN DOMAIN-CONTAINING PROTEIN; 1.
DR Pfam; PF01410; COLFI; 1.
DR Pfam; PF01391; Collagen; 9.
DR Pfam; PF00093; VWC; 1.
DR SMART; SM00038; COLFI; 1.
DR SMART; SM00214; VWC; 1.
DR SUPFAM; SSF57603; FnI-like domain; 1.
DR PROSITE; PS51461; NC1_FIB; 1.
DR PROSITE; PS01208; VWFC_1; 1.
DR PROSITE; PS50184; VWFC_2; 1.
PE 4: Predicted;
KW Disulfide bond {ECO:0000256|ARBA:ARBA00023157};
KW Extracellular matrix {ECO:0000256|ARBA:ARBA00022530};
KW Glycoprotein {ECO:0000256|ARBA:ARBA00023180};
KW Hydroxylation {ECO:0000256|ARBA:ARBA00023278};
KW Metal-binding {ECO:0000256|ARBA:ARBA00022723};
KW Reference proteome {ECO:0000313|Proteomes:UP000005207};
KW Secreted {ECO:0000256|ARBA:ARBA00022530}; Signal {ECO:0000256|SAM:SignalP}.
FT SIGNAL 1..22
FT /evidence="ECO:0000256|SAM:SignalP"
FT CHAIN 23..1232
FT /evidence="ECO:0000256|SAM:SignalP"
FT /id="PRO_5025578488"
FT DOMAIN 32..90
FT /note="VWFC"
FT /evidence="ECO:0000259|PROSITE:PS50184"
FT DOMAIN 998..1232
FT /note="Fibrillar collagen NC1"
FT /evidence="ECO:0000259|PROSITE:PS51461"
FT REGION 97..221
FT /note="Disordered"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT REGION 302..515
FT /note="Disordered"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT REGION 553..852
FT /note="Disordered"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT REGION 867..983
FT /note="Disordered"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 135..149
FT /note="Basic and acidic residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 157..173
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 360..374
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 727..742
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 945..962
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
SQ SEQUENCE 1232 AA; 122097 MW; 5F7BA2FAD988C369 CRC64;
MDSRTVLLLV ASQVCLLAVV RCQVEDDQED AFSCVQDGQR YNDKDVWKPE PCRICVCDTG
TVLCDEIVCE ELKDCPKPEI PFGECCPICA ADQPSTSGIP GVKGQKGEPG DITDVIGPRG
PPGPTGPPGE QGPRGSRGEK GEKGSPGPRG RDGEPGTPGN PGPPGPPGPN GPPGLGGNFA
AQMAGGFDEK AGGAQMGVMQ GPMGPMGPRG PPGPPGKPGD DVSVTVATKM KCPFFGARGF
PGTPGLPGIK GHRGYPGRDG AKGETGAVGA KVRWSDFSDS DPQLVLVCLP LLKMTLGPVG
PSGAPGFPGS PGAKGEAGPT GARGPEGAQG PRGESGTPGS PGPSGASVSD GAPGIAGAPG
FPGPRGPPGP QGATGPLGPK GTSVCTGPPG LQGPNGPQGE EGKRGPRGEP GSAGPRGPPG
ERGAPGNRGF PGQDGLAGPK GAPGERGPSG ASGPKGASGD PGRPGEPGLP GARLFSVWLD
FTQGAPGEDG RPGPPGPQGA RGQPGVMGFP GPKGATVSKK KKKYLLYGSN ILQCVAMSFS
FSFFSPFKGV PGEAGPAGAT GPRGERGFPG ERGAAGSQGL QGPRGLPGTP GTDGPKGAIG
PAGTAGAQGP PGLQGMPGER GGAGIPGPKG DRVSHDGKQG DRGETGPPGP AGFAGPPGAD
GQPGTKGEQG EPGQKGDAGA PGPQGPSGAP GPAGPTGVSG PKGARGAQGP PGATGFPGAA
GRVGPPGPNG NPGPPGPAGP PGKDGPKGVR GDGGPPGRQG DAGLRGPAGA SGEKGDAGED
GPPGPPGPSG PQGLAGQRGI VGLPGQRGER GFPGLPGPSG EPGKQGASGG PGDRGPPGPV
GPPGLTGPAG EPGREVREYL EIHTHITTLC SQGPQGPRGD KGEAGEAGER GQKGHRGFTG
LQGLPGPPGP PGPVGPAGKD GTNGLPGPIG PPGPRGRSGE TGPAGPPGNP GPPGPPGPPG
PGIDMSAFAG LGQTEKGPDP LRYMRADQAS GNLRQHDAEV DATLKSLNNQ IENIRSPEGS
KKNPARTCRD LKLCHPDWKS GEYWIDPNQG CTVDAIKVFC NMETGESCVH PKPSSIPRKN
WWTSKSTVPK HVWFGESMNG GFHFNYGDDS LAPNTAAIQM TFLRLLSTEA SQNITYHCKN
SVAYMDGATG NLKKAMLLQG SNDVEIRAEG NSRFTYAVLE DGCTRHTGRW GKTVIEYRSQ
KTSRLPILDI APMDIGGADQ EFGVDVGAVC FL
//