
Biochem. J. (1988) 253, 919-922 (Printed in Great Britain)

Structure of a full-length cDNA clone for the preproα1(I) chain of human type I procollagen
Gerard TROMP,* Helena KUIVANIEMI,* Alex STACEY,† Hideo SHIKATA,* Clinton T. BALDWIN,*
Rudolf JAENISCH† and Darwin J. PROCKOP*‡
* Department of Biochemistry and Molecular Biology, Jefferson Institute of Molecular Medicine, Thomas Jefferson University,
Jefferson Medical College, Philadelphia, PA 19107, U.S.A., and † Whitehead Institute for Biomedical Research
and Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142, U.S.A.

A full-length cDNA clone for the human preproα1(I) chain of type I procollagen was characterized.
Nucleotide sequencing of the first 1500 nucleotide residues at the 5'-end of the cDNA clone provided 729
nucleotide residues, and the codons for 243 amino acid residues, not previously defined from any species. The
data made it possible, for the first time, to compare codon usage completely for the human α1(I) and α2(I)
chains.

INTRODUCTION
Type I collagen is the major component of bone,
tendon, skin and dentine (for reviews see Prockop &
Kivirikko, 1984; Burgeson & Morris, 1986). It is a
heterotrimer with the chain composition [α1(I)]₂α2(I),
and it is first synthesized as a procollagen comprising
two proα1(I) chains and one proα2(I) chain. Although data on
the complex structure of type I collagen are now available,
a complete nucleotide sequence for the preproα1(I)
cDNA has not hitherto been determined.
Here we report the partial sequence of a full-length
human cDNA clone for the preproα1(I) chain that was
isolated previously (Stacey et al., 1987). The information
provided by sequencing this clone should be of considerable
help in studying mutations in type I procollagen genes in
diseases such as osteogenesis imperfecta, Ehlers-Danlos
syndrome and Marfan syndrome (see Prockop &
Kivirikko, 1984; Byers & Bonadio, 1985; Prockop &
Kuivaniemi, 1986).
MATERIALS AND METHODS
Nucleotide sequencing of the cDNA clone for preproα1(I)
collagen
The isolation and partial characterization of the full-length human preproα1(I) cDNA clone (pHUCI) have
[Fig. 1 (image): (a) sequencing strategy, scale bar 100 bp; (b) partial restriction map of pHUCI with the positions of clones Hf404 and Hf677, scale bar 500 bp]

Fig. 1. Sequencing strategy used and partial restriction map of the cDNA clone for the preproα1(I) chain (pHUCI)
(a) Sequencing strategy used for the approximately 1.5 kb EcoRI/XhoI fragment of pHUCI. Arrows starting with X indicate clones that
were generated by the Sequenest transposon-deletion system. Other arrows indicate clones starting at the corresponding restriction
sites. (b) Partial restriction map of the entire pHUCI. Open box, untranslated region. The regions coding for the protein domains
(propeptides, telopeptides, triple-helical domain and signal peptide) are indicated by different shading and hatching. Also shown
are the relative sizes of two clones previously reported for the human proα1(I) chain, Hf404 and Hf677 (Chu et al., 1982;
Bernard et al., 1983b). Symbols: Av, AvaI site; Ba, BamHI site; E, EcoRI site; P, PvuII site; R, RsaI site; X, XhoI site.

‡ To whom correspondence should be addressed.


These sequence data have been submitted to the EMBL/GenBank Libraries.

Vol. 253

G. Tromp and others

[Fig. 2 (image): nucleotide and deduced amino acid sequence of the 5' region of pHUCI (through nucleotide 1536 and amino acid residue 472), aligned with the chicken and bovine sequences; see the legend to Fig. 2]

cDNA for preproα1 chain of human type I procollagen

been described previously (Stacey et al., 1987). For
sequencing, the clone was inserted into the Sequenest
transposon-deletion vectors pAA-PZ618 and
pAA-PZ619 (Gold BioTechnology, St. Louis, MO,
U.S.A.; Peng & Wu, 1986). Deletions were generated
according to the manufacturer's recommendations. Single-stranded templates for sequencing were obtained by
single-strand rescue with the helper bacteriophage
M13K07 and were sequenced with a primer specific for the
vectors pAA-PZ618 and pAA-PZ619. Subsequently,
fragments of pHUCI were subcloned into bacteriophages
M13mp18 and M13mp19 and sequenced with
universal primers (Sanger et al., 1977; Messing, 1983).
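Sequencing from both ends of the insert allows each base to be read on both strands. For example, the paired M13mp18/M13mp19 subclones carry the polylinker in opposite orientations. The Python sketch below checks that kind of completeness. It assumes each read is summarized by hypothetical 1-based start and end coordinates on the insert plus a strand label. The read layout in the example is invented for illustration and does not describe the authors' actual clones.

```python
from typing import List, Tuple

# A read is (start, end, strand): 1-based, inclusive coordinates on the insert,
# with strand "+" or "-". This representation is a hypothetical convenience.
Read = Tuple[int, int, str]


def uncovered(length: int, reads: List[Read], strand: str) -> List[int]:
    """Positions 1..length that no read on `strand` covers."""
    covered = [False] * (length + 1)  # index 0 is unused
    for start, end, read_strand in reads:
        if read_strand != strand:
            continue
        for pos in range(max(1, start), min(length, end) + 1):
            covered[pos] = True
    return [pos for pos in range(1, length + 1) if not covered[pos]]


def sequenced_both_strands(length: int, reads: List[Read]) -> bool:
    """True if every position is read at least once on each strand."""
    return not uncovered(length, reads, "+") and not uncovered(length, reads, "-")


# Hypothetical read layout for a 1500 bp insert (not the authors' actual clones).
reads = [
    (1, 400, "+"), (350, 800, "+"), (750, 1200, "+"), (1150, 1500, "+"),
    (1, 500, "-"), (450, 1000, "-"), (950, 1500, "-"),
]
print(sequenced_both_strands(1500, reads))  # True
gap = uncovered(1500, reads[:-1], "-")
print(gap[0], gap[-1])  # 1001 1500
```

Dropping the last minus-strand read leaves positions 1001-1500 read on one strand only. The check reports exactly that gap.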

RESULTS AND DISCUSSION


Restriction map and nucleotide sequence of the cDNA
The clone pHUCI (human collagen) is a cDNA clone
corresponding to the complete 4.8 kb human proα1(I)
mRNA (Stacey et al., 1987) (Fig. 1). To establish that the
clone was full length and coded for a functional α1(I)
collagen mRNA, it was cloned into a retroviral vector
capable of expressing inserted sequences in host mouse
cells. Infection of Mov13 mouse cells (Harbers et al.,
1984; Schnieke et al., 1987) with the vector resulted in
the production of stable type I collagen consisting of two
human α1 chains and one mouse α2 chain (Stacey et al.,
1987).
The previously analysed cDNAs covered the codons
for amino acid residues 247-1014 of the α-chain domain,
the 26 amino acid residues of the C-terminal telopeptide,
the 246 amino acid residues of the C-propeptide and
Fig. 2. Nucleotide and amino acid sequence of the cDNA clone
for the preproα1(I) chain (pHUCI)
The nucleotides are numbered from the start site for
transcription (Chu et al., 1985) and the amino acids from
the first amino acid of the preproα1(I) chain. The 1500 bp
nucleotide sequence from the 5'-end of the human cDNA
clone is given in the second line. The overlap with
previously published sequences from Hf404 (Bernard et
al., 1983b) starts at nucleotide position 1392 (amino acid
residue 425). The amino acid sequence encoded by the
clone is given below the nucleotide sequence. Line 1:
nucleotide sequence of the chicken preproα1(I) chain
where it is known (bases 27-571; Finer et al., 1987)
and differs from the human sequence. Lines 2 and 3:
nucleotide and amino acid sequences of pHUCI defined in
the present work. Line 4: amino acid sequence of the
chicken preproα1(I) chain where it is known, namely
residues 1-151 derived from a genomic clone (Finer et
al., 1987) and residues 158-301 defined by Edman
degradation of peptide fragments (see Galloway, 1982).
Line 5: amino acid sequence of the bovine proα1(I) chain
defined by Edman degradation of peptide
fragments (see Galloway, 1982). Symbols: -, identical
amino acid; -, missing nucleotides in the human or
chicken cDNA; *, missing amino acid; empty space in
amino acid sequence, not known; +61 and +103, possible
start sites for translation that both end in a stop codon
after 12 nucleotide residues; +120, start site for translation; vertical lines, beginnings of exons, where
known (Chu et al., 1984); V, cleavage site for signal
peptidase; +, cleavage site for procollagen N-proteinase;
l, beginning of α-chain domain.
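According to the legend, the ATGs at +61 and +103 each open a reading frame that reaches a stop codon after only 12 nucleotide residues. Translation of the preproα1(I) chain starts at +120. The Python sketch below shows that kind of scan. It assumes a DNA sense-strand string numbered from 1. The function name and the toy sequence are hypothetical and are not taken from pHUCI.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def atg_orfs(seq: str) -> List[Tuple[int, Optional[int], Optional[int]]]:
    """For every ATG, return (start, stop, coding_length).

    start and stop are 1-based positions of the A of ATG and of the first base
    of the first in-frame stop codon; coding_length is the number of nucleotides
    from the ATG up to (not including) that stop. stop and coding_length are
    None if no in-frame stop occurs before the end of `seq`.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG":
            continue
        # Step codon by codon in the ATG's frame until the first stop codon.
        stop = next(
            (j for j in range(i, len(seq) - 2, 3) if seq[j:j + 3] in STOP_CODONS),
            None,
        )
        if stop is None:
            orfs.append((i + 1, None, None))
        else:
            orfs.append((i + 1, stop + 1, stop - i))
    return orfs


# Toy sequence (not pHUCI): a short upstream ORF (ATG AGC GGA CGC TAG,
# i.e. Met-Ser-Gly-Arg-Stop) followed by an ATG that stays open.
toy = "GGATGAGCGGACGCTAGCCATGTTCAGC"
print(atg_orfs(toy))  # [(3, 15, 12), (20, None, None)]
```

The first toy ATG behaves like the +61 and +103 sites: 12 coding nucleotides and then a stop. The second ATG is still open at the end of the string, as a genuine start site would be over a short window.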

Table 1. Codon usage in the triple-helical domain of the proα1(I) and proα2(I) chains of human type I procollagen
Values are the fractions of the codons for each amino acid that have the indicated third base.
Data for the triple-helical domain of the proα2(I) chain are
from Kuivaniemi et al. (1988) and Bernard et al. (1983a).

Amino acid            Third base   α1(I)   α2(I)
Gly                   U            0.50    0.51
                      C            0.28    0.22
                      A            0.18    0.22
                      G            0.03    0.05
  Codons examined                  342     342
Pro (total)           U            0.60    0.62
                      C            0.38    0.21
                      A            0.02    0.16
                      G            0       0.01
  Codons examined                  235     199
Pro (Yaa position)    U            0.84    0.73
                      C            0.13    0.09
                      A            0.03    0.16
                      G            0       0.02
  Codons examined                  117     91
Ala                   U            0.75    0.76
                      C            0.20    0.15
                      A            0.05    0.08
                      G            0       0
  Codons examined                  118     107

224 bp of the 3'-non-translated region of the mRNA. In
total, they included 3344 bp of the mRNA and codons
for 1040 amino acid residues (Chu et al., 1982; Bernard
et al., 1983b). In the present work we determined 729
nucleotide residues and the codons for 243 amino acid
residues (residues 182-424 in Fig. 2) not previously defined
from any species. In addition, we re-examined the
sequence of 626 bp previously defined by sequencing genomic
clones containing exons 1 to 6 (Chu et al., 1984, 1985).
As indicated in Fig. 1, the entire 1500 bp was sequenced
in both directions.
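The residue and nucleotide counts in this paragraph can be cross-checked with simple arithmetic: an inclusive range of n residues is encoded by 3n nucleotides. Below is a short Python check. The helper `codon_span` is illustrative only.

```python
def codon_span(first_residue: int, last_residue: int):
    """Residue count for an inclusive range of residue numbers, and its codon length in nt."""
    n_residues = last_residue - first_residue + 1
    return n_residues, 3 * n_residues


# Residues 182-424 (Fig. 2), newly defined in this work.
new_residues, new_nt = codon_span(182, 424)
print(new_residues, new_nt)  # 243 729

# Previously analysed cDNAs: residues 247-1014 of the alpha-chain domain,
# plus the 26-residue C-telopeptide and the 246-residue C-propeptide.
helical_residues, _ = codon_span(247, 1014)
print(helical_residues + 26 + 246)  # 1040
```

Both totals agree with the text: 243 residues and 729 nucleotides for the new region, and 768 + 26 + 246 = 1040 residues for the earlier clones.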

Comparison of codon usage between the human α1(I)
and α2(I) cDNAs
The data developed here made it possible, for the first
time, to compare completely the codon usage for the human
α1(I) chain with that for the human α2(I) chain. The
results (Table 1) demonstrated a marked preference for
U as the third base in codons for glycine, proline and
alanine: U was used in 0.50-0.84 of these codons
in both chains. Even more striking was the rare use of G
as the third base of the same codons. G was found in
only nine of the 342 glycine codons in the α1(I) chain, and it was
not used in any of that chain's 353 proline and alanine
codons. The results therefore indicate a strong selective
pressure against the use of G in these coding sequences.
There was also a preference for U in the third position of
codons for proline in the Yaa position of the
repeating -Gly-Xaa-Yaa- sequence of the collagen α-chains. This preference had been noted in the sequence of
the α2(I) chain (Kuivaniemi et al., 1988), but was even
more pronounced in the sequence of the α1(I) chain.
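Each entry in Table 1 is the fraction of the codons for one amino acid that end in a given base. The Python sketch below performs that tally. It assumes an in-frame DNA coding sequence for the triple-helical domain as input. The function names and the toy fragment are hypothetical. The Yaa rule also assumes the codons are phased so that each Gly-Xaa-Yaa triplet begins with a Gly codon.

```python
from collections import Counter
from typing import Dict, List, Set

# Codons (DNA sense strand) for the three amino acids in Table 1.
GLY = {"GGT", "GGC", "GGA", "GGG"}
PRO = {"CCT", "CCC", "CCA", "CCG"}
ALA = {"GCT", "GCC", "GCA", "GCG"}
TO_RNA = {"T": "U", "C": "C", "A": "A", "G": "G"}  # report third bases as in Table 1


def split_codons(cds: str) -> List[str]:
    """Split an in-frame coding sequence into codons, dropping any partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def third_base_fractions(codons: List[str], family: Set[str]) -> Dict[str, float]:
    """Fraction of codons in `family` whose third base is U, C, A or G."""
    counts = Counter(TO_RNA[c[2]] for c in codons if c in family)
    total = sum(counts.values())
    return {base: (counts[base] / total if total else 0.0) for base in "UCAG"}


def yaa_proline_codons(codons: List[str]) -> List[str]:
    """Pro codons in the Yaa position of Gly-Xaa-Yaa triplets.

    Assumes the codon list starts at the Gly of the first triplet, as in a
    triple-helical domain read in frame.
    """
    return [
        codons[i + 2]
        for i in range(0, len(codons) - 2, 3)
        if codons[i] in GLY and codons[i + 2] in PRO
    ]


# Toy triple-helical fragment (not real collagen sequence): Gly-Pro-Pro x 3.
toy = split_codons("GGTCCTCCTGGCCCACCCGGACCTCCT")
print(third_base_fractions(toy, GLY))
print(third_base_fractions(toy, PRO))
print(third_base_fractions(yaa_proline_codons(toy), PRO))
```

For the toy fragment, the Gly fractions are 1/3 each for U, C and A, and the Yaa-position Pro fractions are 2/3 U and 1/3 C. Table 1 is based on this kind of tally over the full α1(I) and α2(I) triple-helical sequences.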

The work presented here was supported in part by N.I.H.
Research Grant AR-38188 and by a grant from the March of
Dimes-Birth Defects Foundation.

REFERENCES
Bernard, M. P., Myers, J. C., Chu, M.-L., Ramirez, F.,
Eikenberry, E. F. & Prockop, D. J. (1983a) Biochemistry 22,
1139-1145
Bernard, M. P., Chu, M.-L., Myers, J. C., Ramirez, F.,
Eikenberry, E. F. & Prockop, D. J. (1983b) Biochemistry 22,
5213-5223
Burgeson, R. E. & Morris, N. P. (1986) in Connective Tissue
Disease: Molecular Pathology of the Extracellular Matrix
(Uitto, J. & Perejda, A. J., eds.), pp. 3-28, Marcel Dekker,
New York
Byers, P. H. & Bonadio, J. F. (1985) in Genetic and Metabolic
Diseases in Pediatrics (Lloyd, J. & Scriver, C. R., eds.), pp.
56-90, Butterworths, London
Chu, M.-L., Myers, J. C., Bernard, M. P., Ding, J.-F. &
Ramirez, F. (1982) Nucleic Acids Res. 10, 5925-5934
Chu, M.-L., de Wet, W., Bernard, M., Ding, J.-F., Morabito,
M., Myers, J., Williams, C. & Ramirez, F. (1984) Nature
(London) 310, 337-340
Chu, M.-L., de Wet, W., Bernard, M. & Ramirez, F. (1985)
J. Biol. Chem. 260, 2315-2320
Finer, M. H., Aho, S., Gerstenfeld, L. C., Boedtker, H. &
Doty, P. (1987) J. Biol. Chem. 262, 13323-13332
Galloway, D. (1982) in Collagen in Health and Disease (Weiss,
J. B. & Jayson, M. I. V., eds.), pp. 528-557, Churchill Livingstone, Edinburgh
Harbers, K., Kuehn, M., Delius, H. & Jaenisch, R. (1984) Proc.
Natl. Acad. Sci. U.S.A. 81, 1504-1508
Kuivaniemi, H., Tromp, G., Chu, M.-L. & Prockop, D. J.
(1988) Biochem. J. 252, 633-640
Messing, J. (1983) Methods Enzymol. 101, 20-78
Peng, Z.-G. & Wu, R. (1986) Gene 45, 247-252
Prockop, D. J. & Kivirikko, K. I. (1984) N. Engl. J. Med. 311,
376-386
Prockop, D. J. & Kuivaniemi, H. (1986) Rheumatology 10,
246-271
Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl.
Acad. Sci. U.S.A. 74, 5463-5467
Schnieke, A., Dziadek, M., Bateman, J., Mascara, T., Harbers,
K., Gelinas, R. & Jaenisch, R. (1987) Proc. Natl. Acad. Sci.
U.S.A. 84, 764-768
Stacey, A., Mulligan, R. & Jaenisch, R. (1987) J. Virol. 61,
2549-2554

Received 5 April 1988/23 May 1988; accepted 1 June 1988
