ID H2YGA3_CIOSA Unreviewed; 1333 AA.
AC H2YGA3;
DT 18-APR-2012, integrated into UniProtKB/TrEMBL.
DT 18-APR-2012, sequence version 1.
DT 27-MAR-2024, entry version 53.
DE RecName: Full=Fibrillar collagen NC1 domain-containing protein {ECO:0000259|PROSITE:PS51461};
OS Ciona savignyi (Pacific transparent sea squirt).
OC Eukaryota; Metazoa; Chordata; Tunicata; Ascidiacea; Phlebobranchia;
OC Cionidae; Ciona.
OX NCBI_TaxID=51511 {ECO:0000313|Ensembl:ENSCSAVP00000004352.1, ECO:0000313|Proteomes:UP000007875};
RN [1] {ECO:0000313|Proteomes:UP000007875}
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RA Birren B., Nusbaum C., Abebe A., Abouelleil A., Adekoya E., Ait-zahra M.,
RA Allen N., Allen T., An P., Anderson M., Anderson S., Arachchi H.,
RA Armbruster J., Bachantsang P., Baldwin J., Barry A., Bayul T.,
RA Blitshsteyn B., Bloom T., Blye J., Boguslavskiy L., Borowsky M.,
RA Boukhgalter B., Brunache A., Butler J., Calixte N., Calvo S., Camarata J.,
RA Campo K., Chang J., Cheshatsang Y., Citroen M., Collymore A., Considine T.,
RA Cook A., Cooke P., Corum B., Cuomo C., David R., Dawoe T., Degray S.,
RA Dodge S., Dooley K., Dorje P., Dorjee K., Dorris L., Duffey N., Dupes A.,
RA Elkins T., Engels R., Erickson J., Farina A., Faro S., Ferreira P.,
RA Fischer H., Fitzgerald M., Foley K., Gage D., Galagan J., Gearin G.,
RA Gnerre S., Gnirke A., Goyette A., Graham J., Grandbois E., Gyaltsen K.,
RA Hafez N., Hagopian D., Hagos B., Hall J., Hatcher B., Heller A.,
RA Higgins H., Honan T., Horn A., Houde N., Hughes L., Hulme W., Husby E.,
RA Iliev I., Jaffe D., Jones C., Kamal M., Kamat A., Kamvysselis M.,
RA Karlsson E., Kells C., Kieu A., Kisner P., Kodira C., Kulbokas E.,
RA Labutti K., Lama D., Landers T., Leger J., Levine S., Lewis D., Lewis T.,
RA Lindblad-toh K., Liu X., Lokyitsang T., Lokyitsang Y., Lucien O., Lui A.,
RA Ma L.J., Mabbitt R., Macdonald J., Maclean C., Major J., Manning J.,
RA Marabella R., Maru K., Matthews C., Mauceli E., Mccarthy M., Mcdonough S.,
RA Mcghee T., Meldrim J., Meneus L., Mesirov J., Mihalev A., Mihova T.,
RA Mikkelsen T., Mlenga V., Moru K., Mozes J., Mulrain L., Munson G.,
RA Naylor J., Newes C., Nguyen C., Nguyen N., Nguyen T., Nicol R., Nielsen C.,
RA Nizzari M., Norbu C., Norbu N., O'donnell P., Okoawo O., O'leary S.,
RA Omotosho B., O'neill K., Osman S., Parker S., Perrin D., Phunkhang P.,
RA Piqani B., Purcell S., Rachupka T., Ramasamy U., Rameau R., Ray V.,
RA Raymond C., Retta R., Richardson S., Rise C., Rodriguez J., Rogers J.,
RA Rogov P., Rutman M., Schupbach R., Seaman C., Settipalli S., Sharpe T.,
RA Sheridan J., Sherpa N., Shi J., Smirnov S., Smith C., Sougnez C.,
RA Spencer B., Stalker J., Stange-thomann N., Stavropoulos S., Stetson K.,
RA Stone C., Stone S., Stubbs M., Talamas J., Tchuinga P., Tenzing P.,
RA Tesfaye S., Theodore J., Thoulutsang Y., Topham K., Towey S., Tsamla T.,
RA Tsomo N., Vallee D., Vassiliev H., Venkataraman V., Vinson J., Vo A.,
RA Wade C., Wang S., Wangchuk T., Wangdi T., Whittaker C., Wilkinson J.,
RA Wu Y., Wyman D., Yadav S., Yang S., Yang X., Yeager S., Yee E., Young G.,
RA Zainoun J., Zembeck L., Zimmer A., Zody M., Lander E.;
RL Submitted (AUG-2003) to the EMBL/GenBank/DDBJ databases.
RN [2] {ECO:0000313|Ensembl:ENSCSAVP00000004352.1}
RP IDENTIFICATION.
RG Ensembl;
RL Submitted (NOV-2023) to UniProtKB.
CC ---------------------------------------------------------------------------
CC Copyrighted by the UniProt Consortium, see https://www.uniprot.org/terms
CC Distributed under the Creative Commons Attribution (CC BY 4.0) License
CC ---------------------------------------------------------------------------
DR Ensembl; ENSCSAVT00000004416.1; ENSCSAVP00000004352.1; ENSCSAVG00000002577.1.
DR GeneTree; ENSGT00940000167228; -.
DR Proteomes; UP000007875; Unassembled WGS sequence.
DR GO; GO:0005201; F:extracellular matrix structural constituent; IEA:InterPro.
DR Gene3D; 2.60.120.1000; -; 1.
DR InterPro; IPR008160; Collagen.
DR InterPro; IPR000885; Fib_collagen_C.
DR PANTHER; PTHR24023; COLLAGEN ALPHA; 1.
DR PANTHER; PTHR24023:SF1100; FIBRILLAR COLLAGEN NC1 DOMAIN-CONTAINING PROTEIN; 1.
DR Pfam; PF01410; COLFI; 2.
DR Pfam; PF01391; Collagen; 8.
DR SMART; SM00038; COLFI; 1.
DR PROSITE; PS51461; NC1_FIB; 1.
PE 4: Predicted;
KW Extracellular matrix {ECO:0000256|ARBA:ARBA00022530};
KW Reference proteome {ECO:0000313|Proteomes:UP000007875};
KW Secreted {ECO:0000256|ARBA:ARBA00022530}.
FT DOMAIN 1125..1332
FT /note="Fibrillar collagen NC1"
FT /evidence="ECO:0000259|PROSITE:PS51461"
FT REGION 1..70
FT /note="Disordered"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT REGION 127..1116
FT /note="Disordered"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 18..41
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 304..318
FT /note="Polar residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 433..450
FT /note="Basic and acidic residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 784..799
FT /note="Pro residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 1005..1025
FT /note="Basic and acidic residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
FT COMPBIAS 1094..1116
FT /note="Basic and acidic residues"
FT /evidence="ECO:0000256|SAM:MobiDB-lite"
SQ SEQUENCE 1333 AA; 129917 MW; DCDA05FB6A4506CD CRC64;
QGQKGAKGEQ AVVEPGIFIP GPPGPPGDPG LDGPPGFQGP PGGPGEYGDR GPPGRDGLPG
EKGLPGPPGR HIMIPFRIAQ PNGEKGPGSN APAEMQAQLA LTQARLAMQG PPGPQGLAGM
PGTAGDMGPA GIKGESGEPG PMGQRGPRGS MGPDGMPGKP GRVGEDGLRG EPGAMGSKGD
RGYDGLPGLP GPKGHRVSHV GPIGPPGGDG EQGEQGEAGP RGLPGEAGPR GEMGHKGPPG
VSGPPGPQGE PGPPGQQGTS GPQGIPGPQG MMGPPGTKGP HGKAGLPGIP GADGAPGHPG
NEGPTGSKGS QGQPGPQGPQ GYPGTRGVKG ENGVRGIKGN KGEKGLDGLP GFKGDMGPKG
DTGIAGPTGS RGEDGPEGPK GREGARGELG PVGLTGEKGK IGVPGLPGYP GRSGIKGSLG
KPGKPGQMGL KGDRGLEGKR GQEGQRGPKG PRGKRGPRGS TGKAGPKGDR GQDGPQGPIG
ERGPPGPRGP SGYVGTKGPP GPPGKDGLPG HPGSRGETGF QGKTGPPGPA GVVGPQGPTG
ENGPSGNRGH PGPPGPAGEP GLSGSAGKEG SKGDRGPRGP IGKMGATGSQ GFPGSRGPQG
PVGAPGLKGS EGPPGPPGPL GAVGQRGPQG PAGQIGPAGS ANGPPGPQGE NGSPGGPGGP
GPAGRDGLQG PVGLPGAPGS IGPRGEDGDK GEAGPPGATG LKGGKGEHGP PGPPGVQGNS
GDPGPPGNDG EPGQRGQQGL YGEKGDEGPR GFPGPPGPRG LQGIPGPSGS KGDTGDAGPL
GPPGPAGQRG PPGGPEGPPG TYGQPGNVGD KGEPGGPGAP GISGEPGPMG PKGENGEKGE
AGLTGPQGEA GPRGPRGDDG PKGNPGPVGF PGDPGVSGIP GTPGDEGVPG DVGDTGAPGE
PGPPGPSGEV GPSGGPGRRG ESGGIGPVGE PGPHGLQGKT GKRGTTGLQG LPGPAGAPGL
PGSSGADGPV GPMGPSGLKG IKGEMGVSGE KGHPGLIGLV GPPGEEGEKG ERGPQGRDGP
HGAKGDDGRP GPSGPVGPMG APGLPGSLGS KGNKGSLGPT GPKGDEGIQG PPGPPGPPGQ
VYNASPLTAN SMKARRRRST EEEGLHREKR QAQDESFIEY PEGLEEIYAA METLKQELEM
MKEPMGRTQD NPGRSCKDIW LCHPDFPSGN YWIDPNGGCS ADAIEVFCDF EAEGDTCISP
VERTASVSWL TCLSDPPLVC IPQFNFLRLL SSQAKQRFTY KCVNSIGWEN QQTGSFDQAI
HLLAANDEVL TYGSEHLTVI EDNCKTGHGN GQVVLELRTR EVDLLPLFDY KAFDFGTRSQ
RHGYQLDRVC FSG
//