ID PGCA_CHICK Reviewed; 2109 AA.
AC P07898; Q90810; Q90820; Q90991; Q91047;
DT 01-AUG-1988, integrated into UniProtKB/Swiss-Prot.
DT 01-NOV-1997, sequence version 2.
DT 03-APR-2013, entry version 126.
DE RecName: Full=Aggrecan core protein;
DE AltName: Full=Cartilage-specific proteoglycan core protein;
DE Short=CSPCP;
DE Flags: Precursor;
GN Name=ACAN; Synonyms=AGC1;
OS Gallus gallus (Chicken).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves;
OC Neognathae; Galliformes; Phasianidae; Phasianinae; Gallus.
OX NCBI_TaxID=9031;
RN [1]
RP NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1).
RC STRAIN=White leghorn; TISSUE=Embryo;
RX PubMed=8226878;
RA Li H., Schwartz N.B., Vertel B.M.;
RT "cDNA cloning of chick cartilage chondroitin sulfate (aggrecan) core
RT protein and identification of a stop codon in the aggrecan gene
RT associated with the chondrodystrophy, nanomelia.";
RL J. Biol. Chem. 268:23504-23511(1993).
RN [2]
RP NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 2).
RC TISSUE=Cartilage;
RX PubMed=1339285;
RA Chandrasekaran L., Tanzer M.L.;
RT "Molecular cloning of chicken aggrecan. Structural analyses.";
RL Biochem. J. 288:903-910(1992).
RN [3]
RP ERRATUM.
RX PubMed=8280087;
RA Chandrasekaran L., Tanzer M.L.;
RL Biochem. J. 296:885-887(1993).
RN [4]
RP NUCLEOTIDE SEQUENCE [MRNA] OF 1042-1559 (ISOFORMS 1/2).
RC TISSUE=Embryo;
RX PubMed=1694853;
RA Krueger R.C. Jr., Fields T.A., Mensch J.R. Jr., Schwartz N.B.;
RT "Chick cartilage chondroitin sulfate proteoglycan core protein. II.
RT Nucleotide sequence of cDNA clone and localization of the S103L
RT epitope.";
RL J. Biol. Chem. 265:12088-12097(1990).
RN [5]
RP NUCLEOTIDE SEQUENCE [GENOMIC DNA] OF 1492-1610.
RC STRAIN=White leghorn; TISSUE=Chondrocyte;
RX PubMed=7827752; DOI=10.1016/0945-053X(94)90195-3;
RA Primorac D., Stover M.L., Clark S.H., Rowe D.W.;
RT "Molecular basis of nanomelia, a heritable chondrodystrophy of
RT chicken.";
RL Matrix Biol. 14:297-305(1994).
RN [6]
RP NUCLEOTIDE SEQUENCE [MRNA] OF 1693-2109 (ISOFORM 2).
RX PubMed=3460082; DOI=10.1073/pnas.83.14.5081;
RA Sai S., Tanaka T., Kosher R.A., Tanzer M.L.;
RT "Cloning and sequence analysis of a partial cDNA for chicken cartilage
RT proteoglycan core protein.";
RL Proc. Natl. Acad. Sci. U.S.A. 83:5081-5085(1986).
RN [7]
RP NUCLEOTIDE SEQUENCE [GENOMIC DNA] OF 1894-2109.
RX PubMed=3170613;
RA Tanaka T., Har-El R., Tanzer M.L.;
RT "Partial structure of the gene for chicken cartilage proteoglycan core
RT protein.";
RL J. Biol. Chem. 263:15831-15835(1988).
RN [8]
RP PROTEIN SEQUENCE OF 718-736; 998-1023 AND 1247-1275, AND GLYCOSYLATION
RP AT THR-728; SER-1006; SER-1010; SER-1016; SER-1249; SER-1253;
RP SER-1259; SER-1263 AND SER-1269.
RX PubMed=2365711;
RA Krueger R.C. Jr., Fields T.A., Hildreth J. IV, Schwartz N.B.;
RT "Chick cartilage chondroitin sulfate proteoglycan core protein. I.
RT Generation and characterization of peptides and specificity for
RT glycosaminoglycan attachment.";
RL J. Biol. Chem. 265:12075-12087(1990).
CC -!- FUNCTION: This proteoglycan is a major component of extracellular
CC matrix of cartilaginous tissues. A major function of this protein
CC is to resist compression in cartilage. It binds avidly to
CC hyaluronic acid via an N-terminal globular region. May play a
CC regulatory role in the matrix assembly of the cartilage.
CC -!- SUBCELLULAR LOCATION: Secreted, extracellular space, extracellular
CC matrix (By similarity).
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=2;
CC Name=1;
CC IsoId=P07898-1; Sequence=Displayed;
CC Name=2;
CC IsoId=P07898-2; Sequence=VSP_003073;
CC -!- DOMAIN: Two globular domains, G1 and G2, comprise the N-terminus
CC of the proteoglycan, while another globular region, G3, makes up
CC the C-terminus. G1 contains Link domains and thus consists of
CC three disulfide-bonded loop structures designated as the A, B and B'
CC motifs. G2 is similar to G1. The keratan sulfate (KS) and the
CC chondroitin sulfate (CS) attachment domains lie between G2 and G3.
CC -!- PTM: Contains mostly chondroitin sulfate, but also keratan sulfate
CC chains, N-linked and O-linked oligosaccharides.
CC -!- DISEASE: Note=Defects in ACAN are the cause of nanomelia, a lethal
CC connective tissue disorder affecting cartilage development
CC (chondrodystrophy) characterized by shortened and malformed limbs.
CC Aggrecan is truncated at its C-terminus in the CS-2 binding domain
CC and is no longer secreted from the chondrocytes.
CC -!- SIMILARITY: Belongs to the aggrecan/versican proteoglycan family.
CC -!- SIMILARITY: Contains 1 C-type lectin domain.
CC -!- SIMILARITY: Contains 1 EGF-like domain.
CC -!- SIMILARITY: Contains 1 Ig-like V-type (immunoglobulin-like)
CC domain.
CC -!- SIMILARITY: Contains 4 Link domains.
CC -!- SIMILARITY: Contains 1 Sushi (CCP/SCR) domain.
CC -----------------------------------------------------------------------
CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC Distributed under the Creative Commons Attribution-NoDerivs License
CC -----------------------------------------------------------------------
DR EMBL; L21913; AAB19128.1; -; mRNA.
DR EMBL; M88101; -; NOT_ANNOTATED_CDS; mRNA.
DR EMBL; M38187; AAA48731.1; -; mRNA.
DR EMBL; S74657; AAC60751.1; -; Genomic_DNA.
DR EMBL; S74656; AAC60751.1; JOINED; Genomic_DNA.
DR EMBL; M13993; AAA48720.1; -; mRNA.
DR EMBL; J04028; AAA48719.1; -; Genomic_DNA.
DR IPI; IPI00575366; -.
DR IPI; IPI00592668; -.
DR PIR; I50421; I50421.
DR UniGene; Gga.3977; -.
DR ProteinModelPortal; P07898; -.
DR SMR; P07898; 1897-2020.
DR PaxDb; P07898; -.
DR eggNOG; NOG12793; -.
DR HOGENOM; HOG000168421; -.
DR HOVERGEN; HBG007982; -.
DR InParanoid; P07898; -.
DR OrthoDB; EOG43BMN5; -.
DR GO; GO:0005578; C:proteinaceous extracellular matrix; IEA:UniProtKB-SubCell.
DR GO; GO:0005509; F:calcium ion binding; IEA:InterPro.
DR GO; GO:0030246; F:carbohydrate binding; IEA:InterPro.
DR GO; GO:0005540; F:hyaluronic acid binding; IEA:InterPro.
DR GO; GO:0007155; P:cell adhesion; IEA:InterPro.
DR Gene3D; 2.60.40.10; -; 1.
DR Gene3D; 3.10.100.10; -; 5.
DR InterPro; IPR001304; C-type_lectin.
DR InterPro; IPR016186; C-type_lectin-like.
DR InterPro; IPR018378; C-type_lectin_CS.
DR InterPro; IPR016187; C-type_lectin_fold.
DR InterPro; IPR000742; EG-like_dom.
DR InterPro; IPR001881; EGF-like_Ca-bd.
DR InterPro; IPR013032; EGF-like_CS.
DR InterPro; IPR000152; EGF-type_Asp/Asn_hydroxyl_site.
DR InterPro; IPR018097; EGF_Ca-bd_CS.
DR InterPro; IPR007110; Ig-like_dom.
DR InterPro; IPR013783; Ig-like_fold.
DR InterPro; IPR003599; Ig_sub.
DR InterPro; IPR013106; Ig_V-set.
DR InterPro; IPR000538; Link.
DR InterPro; IPR000436; Sushi_SCR_CCP.
DR Pfam; PF00008; EGF; 1.
DR Pfam; PF00059; Lectin_C; 1.
DR Pfam; PF00084; Sushi; 1.
DR Pfam; PF07686; V-set; 1.
DR Pfam; PF00193; Xlink; 4.
DR PRINTS; PR01265; LINKMODULE.
DR SMART; SM00032; CCP; 1.
DR SMART; SM00034; CLECT; 1.
DR SMART; SM00179; EGF_CA; 1.
DR SMART; SM00409; IG; 1.
DR SMART; SM00445; LINK; 4.
DR SUPFAM; SSF56436; C-type_lectin_fold; 5.
DR SUPFAM; SSF57535; Complement_control_module; 1.
DR PROSITE; PS00010; ASX_HYDROXYL; 1.
DR PROSITE; PS00615; C_TYPE_LECTIN_1; 1.
DR PROSITE; PS50041; C_TYPE_LECTIN_2; 1.
DR PROSITE; PS00022; EGF_1; 1.
DR PROSITE; PS50026; EGF_3; 1.
DR PROSITE; PS01187; EGF_CA; 1.
DR PROSITE; PS50835; IG_LIKE; 1.
DR PROSITE; PS01241; LINK_1; 4.
DR PROSITE; PS50963; LINK_2; 4.
DR PROSITE; PS50923; SUSHI; 1.
PE 1: Evidence at protein level;
KW Alternative splicing; Calcium; Complete proteome;
KW Direct protein sequencing; Disulfide bond; EGF-like domain;
KW Extracellular matrix; Glycoprotein; Immunoglobulin domain; Lectin;
KW Metal-binding; Proteoglycan; Reference proteome; Repeat; Secreted;
KW Signal; Sushi.
FT SIGNAL 1 16 Potential.
FT CHAIN 17 2109 Aggrecan core protein.
FT /FTId=PRO_0000017504.
FT DOMAIN 34 143 Ig-like V-type.
FT DOMAIN 149 244 Link 1.
FT DOMAIN 250 347 Link 2.
FT DOMAIN 520 615 Link 3.
FT DOMAIN 621 717 Link 4.
FT REPEAT 1363 1382 1.
FT REPEAT 1383 1402 2.
FT REPEAT 1403 1422 3.
FT REPEAT 1423 1442 4.
FT REPEAT 1443 1462 5.
FT REPEAT 1463 1482 6.
FT REPEAT 1483 1502 7.
FT REPEAT 1503 1522 8.
FT REPEAT 1523 1542 9.
FT REPEAT 1543 1562 10.
FT REPEAT 1563 1582 11.
FT REPEAT 1583 1602 12.
FT REPEAT 1603 1622 13.
FT REPEAT 1623 1642 14.
FT REPEAT 1643 1662 15.
FT REPEAT 1663 1682 16.
FT REPEAT 1683 1702 17.
FT REPEAT 1703 1722 18.
FT REPEAT 1723 1742 19.
FT DOMAIN 1855 1892 EGF-like.
FT DOMAIN 1901 2019 C-type lectin.
FT DOMAIN 2022 2082 Sushi.
FT REGION 48 137 G1-A.
FT REGION 148 243 G1-B.
FT REGION 249 346 G1-B'.
FT REGION 519 613 G2-B.
FT REGION 620 715 G2-B'.
FT REGION 718 803 KS.
FT REGION 805 1264 CS-1.
FT REGION 1265 1742 CS-2.
FT REGION 1363 1742 19 X 20 AA tandem repeats of E-[TA]-S-T-
FT [ADHIFSRVT]-[YQLRH]-E-[IVTAG]-[SR]-[GS]-
FT E-[SAT]-[SP]-[AG]-[FYL]-P-[EA]-[TIV]-
FT [SRTG]-[IVT].
FT REGION 1893 2109 G3.
FT METAL 1958 1958 Calcium 1 (By similarity).
FT METAL 1962 1962 Calcium 1 (By similarity).
FT METAL 1962 1962 Calcium 3 (By similarity).
FT METAL 1982 1982 Calcium 2 (By similarity).
FT METAL 1984 1984 Calcium 2 (By similarity).
FT METAL 1985 1985 Calcium 1 (By similarity).
FT METAL 1991 1991 Calcium 1; via carbonyl oxygen (By
FT similarity).
FT METAL 1991 1991 Calcium 2 (By similarity).
FT METAL 1992 1992 Calcium 1 (By similarity).
FT METAL 1992 1992 Calcium 3 (By similarity).
FT METAL 2005 2005 Calcium 2 (By similarity).
FT METAL 2006 2006 Calcium 2 (By similarity).
FT METAL 2006 2006 Calcium 2; via carbonyl oxygen (By
FT similarity).
FT SITE 1001 1001 Not glycosylated.
FT CARBOHYD 76 76 N-linked (GlcNAc...) (Potential).
FT CARBOHYD 122 122 N-linked (GlcNAc...) (Potential).
FT CARBOHYD 330 330 N-linked (GlcNAc...) (Potential).
FT CARBOHYD 388 388 N-linked (GlcNAc...) (Potential).
FT CARBOHYD 439 439 N-linked (GlcNAc...) (Potential).
FT CARBOHYD 644 644 N-linked (GlcNAc...) (Potential).
FT CARBOHYD 700 700 N-linked (GlcNAc...) (Potential).
FT CARBOHYD 728 728 O-linked (Xyl...) (keratan sulfate).
FT CARBOHYD 765 765 N-linked (GlcNAc...) (Potential).
FT CARBOHYD 801 801 N-linked (GlcNAc...) (Potential).
FT CARBOHYD 1006 1006 O-linked (Xyl...) (chondroitin sulfate).
FT CARBOHYD 1010 1010 O-linked (Xyl...) (chondroitin sulfate).
FT CARBOHYD 1016 1016 O-linked (Xyl...) (chondroitin sulfate).
FT CARBOHYD 1020 1020 O-linked (Xyl...) (chondroitin sulfate)
FT (Probable).
FT CARBOHYD 1249 1249 O-linked (Xyl...) (chondroitin sulfate).
FT CARBOHYD 1253 1253 O-linked (Xyl...) (chondroitin sulfate).
FT CARBOHYD 1259 1259 O-linked (Xyl...) (chondroitin sulfate).
FT CARBOHYD 1263 1263 O-linked (Xyl...) (chondroitin sulfate).
FT CARBOHYD 1269 1269 O-linked (Xyl...) (chondroitin sulfate).
FT CARBOHYD 1273 1273 O-linked (Xyl...) (chondroitin sulfate)
FT (Probable).
FT DISULFID 51 129 By similarity.
FT DISULFID 171 242 By similarity.
FT DISULFID 195 216 By similarity.
FT DISULFID 269 345 By similarity.
FT DISULFID 293 314 By similarity.
FT DISULFID 542 613 By similarity.
FT DISULFID 566 587 By similarity.
FT DISULFID 640 715 By similarity.
FT DISULFID 664 685 By similarity.
FT DISULFID 1859 1870 By similarity.
FT DISULFID 1864 1879 By similarity.
FT DISULFID 1881 1890 By similarity.
FT DISULFID 1897 1908 By similarity.
FT DISULFID 1925 2017 By similarity.
FT DISULFID 1993 2009 By similarity.
FT DISULFID 2024 2067 By similarity.
FT DISULFID 2053 2080 By similarity.
FT VAR_SEQ 1856 1892 Missing (in isoform 2).
FT /FTId=VSP_003073.
FT CONFLICT 362 362 E -> D (in Ref. 2).
FT CONFLICT 601 601 G -> D (in Ref. 2).
FT CONFLICT 1000 1000 P -> R (in Ref. 2; M88101).
FT CONFLICT 1029 1029 A -> P (in Ref. 2; M88101).
FT CONFLICT 1042 1043 VT -> PA (in Ref. 4; AAA48731).
FT CONFLICT 1251 1251 E -> D (in Ref. 2 and 8).
FT CONFLICT 1587 1587 I -> T (in Ref. 5; AAC60751).
FT CONFLICT 1590 1590 I -> V (in Ref. 5; AAC60751).
FT CONFLICT 1594 1594 T -> S (in Ref. 5; AAC60751).
FT CONFLICT 1602 1610 IETSTVREI -> VLCRCSVLR (in Ref. 5;
FT AAC60751).
FT CONFLICT 1603 1603 E -> A (in Ref. 2; M88101).
FT CONFLICT 1672 1672 S -> G (in Ref. 2; M88101).
FT CONFLICT 1796 1796 E -> G (in Ref. 2 and 6; AAA48720).
FT CONFLICT 1988 1988 F -> S (in Ref. 7; AAA48719).
SQ SEQUENCE 2109 AA; 223493 MW; 7F824FD5B3A2ABDA CRC64;
MTTLLLVFVC LQAITTAASA ELSDSSDGLE VKIPEQSPLR VVLGSSLNIP CYFNIPEEED
TNALLTPRIK WSKLSNGTEI VLLVATGGKI RLNAEYREVI SLPNYPAIPT DATLEIKALR
SNHTGIYRCE VMYGIEDRQD TIEVLVKGVV FHYRAISTRY TLNFERAKQA CIQNSAVIAT
PEQLQAAYED GYEQCDAGWL ADQTVRYPIH LPRERCYGDK DEFPGVRTYG VRETDETYDV
YCYAEQMQGK VFYATSPEKF TFQEAFDKCH SLGARLATTG ELYLAWKDGM DMCSAGWLAD
RSVRYPISRA RPNCGGNLVG VRTVYLNPAN QTGYPHPSSR YDAICYSGDD FEALVPGLFT
DEVGTELGSA FTIQTVTQTE VELPLPRNVT EEEARGSIAT LEPMEITATA TELYEAFTVL
PDLFATSVTV ETASPREENV TREEITGIWA VPEEVTTSVS GTAFTTGMAE VSSVEEAIAV
TATPGLESAS PFTIEDHLVQ VTAAPDVALL PRQPISPTGV VFHYRAATSR YAFSFIQAQQ
ACLENNAVIA TPEQLQAAYE AGFDQCDAGW LRDQTVRYPI VNPRSNCVGD KESSPGVRSY
GMRPASETYD VYCYIDRLKG EVFFATQPEQ FTFQEAQLYC ESQNATLASA GQLHAAWKQG
LDRCYPGWLA DGSLRYPIVS PRPACGGDAP GVRTIYQHHN QTGFPDPLSR HHAFCFRALP
SVVEEGVTSL FEEEVMVTQL IPGVEGIPSG EETTVETELS SEPENQTAQG TEVFPTDVSL
LSARPSAFPP ATVIPEETST NASIPEVSGE FPESGEHPTS GEPSASGAPD TSGEPTSVGF
ELSGEQSGIG ESGLPSVDLQ SSGFVPGESG LPSGDVSGLP SGIVDISGLP SAEEEVTVSV
SRIPEVSGMP SGAESSGLHS GFSGEISGTE LISGLPSGEE SGLASGFPTI SLVDSTLVEV
VTAAPGRQEE GKGSIGVSGE EELSGFPSAE WDSSGARGLP SGAETSGEQS GVPELSGEHS
GVPGLSGEAF EVPELSGEHS GVTELSGEHS GLPELSGEPF GVPELSGFPS GLDISGEPSG
APEVSGPVDV SGLTSGVDGS GEVSGVTFIS TSLQEVTTPS VAEAEAKEIL EISGLPSGET
SGMVSGSLDV SGQPSGHIGF GGSASGVLEM SGFPGGAVES SGEASGVEVT SGLASGEESG
LTSGFPTVSL VDTTLVEVVT QTSVAQEVGE GPSGMIEISG FLSGDRGVSG EGSGAVQSSG
LPSGTGDFSG EPSGIPYFSG DISGATDLSG QPSAVTDISG EDSGLPEVTL VTSDLVEVVT
RPTVSQELGG ETAVTFPYVF GPSGEGSASG DLSGGASAEG GIETSTAYEI SGESSAFPET
SIETSTDQEI SGEASAYPEI SVETSTHLET SGETSAYPEI STETSTIQEV SGETSAFPEI
STETSTIQEI SGETSAFPEI RIETSTFQEI SGETSAFPEI RIETSTSQEA RGETSAFPEI
TIEASTVHET SGETSAFPEI SIETSTVHET SGETSAFPEI SIETSTVHEI SGESSAFPEI
RIETSTSQEA RGETSAFPEI TIEASTIQEI SGETSAFPEI SIETSTVREI SGETSAFPEI
RIETSTSQEA RGETSALPEI TIETSTVHET SGEASAFPEI SIETSTRQEA RSEASAYPEV
SIEASTTQEV SGESSAFPEI SVETSTSQEA RGETSAFPEI GIETSTAHEG SGETPGLPAV
STDTAATSLA SGEPSGAPEK ETPDTTSHLI TGVSGETSVP DAVISTSAPD VELAQEPRNT
EETQLEIEPS TPAASGQETE TAAVLDNPHL PATATAALHP ASQEAVDALG PTTEDTDECH
SSPCLNGATC VDGIDSFKCL CLPSYGGDLC EIDLANCEEG WIKFQGHCYR HFEERETWMD
AESRCREHQA HLSSIITPEE QEFVNSHAQD YQWIGLSDRA VENDFRWSDG HSLQFENWRP
NQPDNFFFAG EDCVVMIWHE QGEWNDVPCN YHLPFTCKKG TVACGDPPVV ENARTFGRKK
DRYEINSLVR YQCDHGYIQR HVPTIRCQPN GHWEEPRISC TNPSSYQRRL YKRSPRSRLR
PGVVHRPTH
//