Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Prediction of Glass Transition Temperatures from Monomer and Repeat Unit Structure
Using Computational Neural Networks
Brian E. Mattioni and Peter C. Jurs*
Department of Chemistry, The Pennsylvania State University, 152 Davey Laboratory,
University Park, Pennsylvania 16802
Received June 24, 2001
Quantitative structure-property relationships (QSPR) are developed to correlate glass transition temperatures
and chemical structure. Both monomer and repeat unit structures are used to build several QSPR models for
Parts 1 and 2 of this study, respectively. Models are developed using numerical descriptors, which encode
important information about chemical structure (topological, electronic, and geometric). Multiple linear
regression analysis (MLRA) and computational neural networks (CNNs) are used to generate the models
after descriptor generation. Optimization routines (simulated annealing and genetic algorithm) are utilized
to find information-rich subsets of descriptors for prediction. A 10-descriptor CNN model was found to be
optimal in predicting Tg values using the monomer structure (Part 1) for 165 polymers. A committee of 10
CNNs produced a training set rms error of 10.1K (r2 ) 0.98) and a prediction set rms error of 21.7K (r2 )
0.92). An 11-descriptor CNN model was developed for 251 polymers using the repeat unit structure (Part
2). A committee of CNNs produced a training set rms error of 21.1K (r2 ) 0.96) and a prediction set rms
error of 21.9K (r2 ) 0.96).
INTRODUCTION
PREDICTION
OF
MATTIONI
AND JURS
polymer name
Tg (K)
compd no.
polymer name
Tg (K)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
poly(vinyl chloride)
poly(2-n-butyl-1,4-butadiene)
poly(vinyl n-octyl ether)
poly(3-hexoxypropylene oxide)
poly(3-butoxypropylene oxide)
poly(ethylene)
poly(2-n-propyl-1,4-butadiene)
poly(vinyl n-decyl ether)
poly(2-ethyl-1,4-butadiene)
poly(isobutylene)
poly(isoprene)
poly(propylene oxide)
poly(vinyl n-pentyl ether)
poly(vinyl 2-ethylhexyl ether)
poly(n-octyl acrylate)
poly(vinyl n-hexyl ether)
poly(3-methoxypropylene oxide)
poly(pentadiene)
poly(n-heptyl acrylate)
poly(-caprolactone)
poly(n-nonyl acrylate)
poly(n-hexyl acrylate)
poly(dodecyl methacrylate)
poly(n-butyl acrylate)
poly(1-heptene)
poly(vinyl n-butyl ether)
poly(2-isopropyl-1,4-butadiene)
poly(1-hexene)
poly(1-pentene)
poly(chloroprene)
poly(propylene sulfide)
poly(1-butene)
poly(2-octyl acrylate)
poly(n-propyl acrylate)
poly(propylene)
poly(vinylidene fluoride)
poly(2-heptyl acrylate)
poly(6-methyl-1-heptene)
poly(2-bromo-1,4-butadiene)
poly(isobutyl acrylate)
poly(vinyl isobutyl ether)
poly(ethyl acrylate)
poly(n-octyl methacrylate)
poly(vinyl sec-butyl ether)
poly(sec-butyl acrylate)
poly(vinyl ethyl ether)
poly(vinylidene chloride)
poly(3-pentyl acrylate)
poly(5-methyl-1-hexene)
poly(n-hexyl methacrylate)
poly(vinyl isopropyl ether)
poly(R-vinyl naphthalene)
poly(-vinyl naphthalene)
poly(3-phenoxypropylene oxide)
poly(R, , -trifluoro styrene)
poly(methyl R-cyano acrylate)
poly(o-hydroxymethyl styrene)
poly(vinyl methyl sulfide)
poly(vinyl butyrate)
poly(p-n-hexoxymethyl styrene)
poly(p-n-butyl styrene)
poly(methyl acrylate)
poly(vinyl propionate)
poly(2-ethylbutyl methacrylate)
poly(o-n-octoxy styrene)
poly(2-tert-butyl-1,4-butadiene)
poly(n-butyl methacrylate)
poly(2-methoxyethyl methacrylate)
poly(p-n-propoxymethyl styrene)
poly(3,3,3-trifluoropropylene)
poly(vinyl acetate)
poly(4-methyl-1-pentene)
poly(vinyl formate)
poly(vinyl chloroacetate)
poly(neopentyl methacrylate)
poly(n-propyl methacrylate)
poly(12-aminododecanoic acid)
poly(4-cyclohexyl-1-butene)
poly(pentafluoroethyl ethylene)
poly(11-aminoundecanoic acid)
348
192
194
188
194
195
196
197
197
199
203
206
207
207
208
209
211
213
213
213
215
216
218
219
220
221
221
223
223
225
226
228
228
229
233
233
235
239
241
249
251
251
253
253
253
254
256
257
259
268
270
432
424
315
475
433
433
272
278
278
279
281
283
284
286
293
293
293
295
300
301
302
304
304
306
308
310
313
314
315
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
poly(tert-butyl acrylate)
poly(2,3,3,3-tetrafluoropropylene)
poly(10-aminodecanoic acid)
poly(3,3-dimethylbutyl methacrylate)
poly(N-butyl acrylamide)
poly(vinyl trifluoroacetate)
poly(p-n-butoxy styrene)
poly(isobutyl methacrylate)
poly(3-methyl-1-butene)
poly(9-aminononanoic acid)
poly(8-aminocaprylic acid)
poly(hexafluoropropylene)
poly(ethyl methacrylate)
poly(isopropyl methacrylate)
poly(n-butyl R-chloroacrylate)
poly(7-aminoheptanoic acid)
poly(sec-butyl methacrylate)
poly(p-isopentoxy styrene)
poly(heptafluoropropyl ethylene)
poly(3-cyclopentyl-1-propene)
poly(3-phenyl-1-propene)
poly(-caprolactam)
poly(p-n-propoxy styrene)
poly(n-propyl R-chloroacrylate)
poly(N-vinyl carbazole)
poly(sec-butyl R-chloroacrylate)
poly(3-cyclohexyl-1-propene)
poly(vinyl cyclopentane)
poly(2-hydroxypropyl methacrylate)
poly(p-methoxymethyl styrene)
poly(cyclohexyl methacrylate)
poly(vinyl alcohol)
poly(p-sec-butyl styrene)
poly(p-ethoxy styrene)
poly(2-hydroxyethyl methacrylate)
poly(p-isopropyl styrene)
poly(2-methyl-5-tert-butyl styrene)
poly(p-methoxy styrene)
poly(isopropyl R-chloroacrylate)
poly(4-methoxy-2-methyl styrene)
poly(vinyl cyclohexane)
poly(m-chloro styrene)
poly(2-chloroethyl methacrylate)
poly(ethyl R-chloroacrylate)
poly(m-methyl styrene)
poly(chlorotrifluoroethylene)
poly(styrene)
poly(p-methyl styrene)
poly(2,5-difluoro styrene)
poly(o-ethyl styrene)
poly(3,5-dimethyl styrene)
poly(o-vinyl pyridine)
poly(methyl methacrylate)
poly(acrylonitrile)
poly(o-fluoro styrene)
poly(2,3,4,5,6-pentafluoro styrene)
poly(acrylic acid)
poly(p-fluoro styrene)
poly(tert-butyl methacrylate)
poly(3,4-dimethyl styrene)
poly(2-fluoro-5-methyl styrene)
poly(2,4-dimethyl styrene)
poly(p-methoxycarbonyl styrene)
poly(3-methyl-4-chloro styrene)
poly(cyclohexyl R-chloroacrylate)
poly(p-chloro styrene)
poly(o-chloro styrene)
poly(2,5-dichloro styrene)
poly(phenyl methacrylate)
poly(methacrylonitrile)
poly(R-p-dimethyl styrene)
poly(3-fluoro-4-chloro styrene)
poly(m-hydroxymethyl styrene)
poly(3,4-dichlorostyrene)
poly(p-tert-butyl styrene)
poly(2,4-dichloro styrene)
poly(o-methyl styrene)
poly(R-methyl styrene)
poly(p-phenyl styrene)
poly(p-hydroxymethyl styrene)
315
315
316
318
319
319
320
321
323
324
324
425
324
327
330
330
330
330
331
333
333
335
343
344
423
347
348
348
349
350
356
358
359
359
359
360
360
362
363
363
363
363
365
366
370
373
373
374
374
376
377
377
378
378
378
378
379
379
380
384
384
385
386
387
387
389
392
393
393
393
394
395
398
401
402
406
409
409
411
413
PREDICTION
OF
Table 1 (Continued)
compd no.
polymer name
161
162
163
164
165
poly(p-vinyl pyridine)
poly(2,5-dimethyl styrene)
poly(p-bromo styrene)
poly(2-methyl-4-chloro styrene)
poly(n-vinyl pyrrolidone)
415
416
417
418
418
208
209
210
211
212
166
167
poly(oxytetramethylene)
poly(oxytrimethylene)
190
195
213
214
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
poly(oxyoctamethylene)
poly(oxyhexamethylene)
poly(tetramethylene adipate)
poly(oxyethylene)
poly(decamethylene adipate)
poly(oxymethylene)
poly(ethylene azelate)
poly(ethylene adipate)
poly(ethylene sebacate)
perfluoropolymer 2
perfluoropolymer 1
poly(1,2-butadiene)
poly(ethylene succinate)
poly(hexamethylene sebacamide)
poly(decamethylene sebacamide)
poly(vinyl butyral)
poly(ethyl-p-xylylene)
poly(methyl-p-xylylene)
poly(hexamethylene adipamide)
poly(p-xylylene)
poly(ethylene terephthalate)
poly(chloro-p-xylylene)
poly(bromo-p-xylylene)
poly(cyano-p-xylylene)
poly(R,R,R,R-tetrafluoro-p-xylylene)
poly(oxy-2,2-dichloromethyltrimethylene)
poly(2,5-dimethyl-p-xylylene)
poly(vinyl formal)
perfluoropolymer 3
polyetherimide 1
poly(1,1-dichloroethylene bis[4-phenyl] carbonate)
poly(thio bis[4-phenyl] carbonate)
poly(1,1-ethane bis[4-phenyl] carbonate)
poly(2,2-butane bis[4-phenyl] carbonate)
poly(2,2-pentane bis[4-phenyl] carbonate)
poly(methane bis[4-phenyl] carbonate)
poly(1,1-butane bis[4-phenyl] carbonate)
poly(oxy-1,4-phenylene-oxy-1,4-phenylenecarbonyl-1,4-phenylene)
poly(4,4-heptane bis[4-phenyl] carbonate)
poly(1,1-[2-methyl propane] bis[4-phenyl] carbonate)
203
204
205
206
217
218
228
233
243
255
260
269
272
313
319
324
298
328
330
333
345
353
353
363
363
265
373
378
390
401
430
388
403
407
410
420
396
419
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
206
207
polymer name
Tg (K)
440
444
449
449
457
467
473
482
482
493
500
505
512
520
520
541
546
573
580
583
593
606
615
618
658
668
668
673
324
423
220
240
337
344
355
360
397
403
411
545
420
373
384
493
550
421
422
rms )
( fi - xi)
MATTIONI
AND JURS
type
coefficient
error
range
V6CH-18
N6PC-19
MOLC-8
MOLC-9
NSB-12
1SP2-1
MDE-44
GRAV-4
QSUM-1
RPCS-1
constant
topological
topological
topological
topological
topological
topological
topological
geometric
electronic
hybrid
1781.0
-1.639
46.71
57.39
-23.23
-68.15
16.57
4.93e-02
50.78
6.10
126.1
138.0
0.240
11.28
6.62
1.53
6.76
3.51
9.70e-03
6.60
1.02
19.27
0-0.136
0-183
0-1.69
1.66-4.97
0-15
0-2
0-11.52
83-2921
0.57-4.28
3e-14-19.04
a
V6CH-18, atom valence-corrected sixth-order chain values;41
N6PC-19, number of sixth-order path clusters;41 MOLC-8, heteroatom
and aromatic ring corrected fourth-order path cluster;45 MOLC-9,
heteroatom and aromatic ring corrected average distance sum connectivity;45 NSB-12, number of single bonds; 1SP2-1, number of sp2
hybridized carbons bound to 1 other carbon; MDE-44, molecular
distance edge between quaternary carbons;31 GRAV-4, gravitational
index using all pairs of atoms ((m1*m2/r2));42 QSUM-1, sum of
absolute charges;35 RPCS-1, relative positive charged surface area
(SAMPOS/RPCG where RPCG ) (charge of most positive atom)/(sum
total positive charge)).36
PREDICTION
OF
type
range
V5C-10
V6CH-18
MOLC-8
NSB-12
1SP2-1
ELOW-1
SHDW-4
DPOL-1
CARB-1
DPSA-3
topological
topological
topological
topological
topological
topological
geometric
electronic
electronic
hybrid
0-0.306
0-0.136
0-1.69
0-15
0-2
0-6.6
0.428-0.585
0.23e-03-4.98
-0.74e-02-0.32
17.9-72.1
a
V6CH-18, MOLC-8, NSB-12, 1SP2-1, see Table 2 caption; V5C10, atom valence-corrected fifth-order cluster;41 ELOW-1, through space
distance between EMIN and EMAX;30 SHDW-4, standardized area
projected onto the XY-plane;33 DPOL-1, dipole moment; CARB-1,
average charge on carbonyl carbons; DPSA-3, difference in charged
weighted partial surface areas [(+SAi)(Q+i) - (-SAi)(Q-i)] where
(+SAi) and (-SAi) are the surface area contributions of the ith positive
and or negative atom in the molecule, while Q+i and Q-i are the partial
atomic charges for the ith positive and negative atoms.36
different segments of the polymer chain. The hybrid descriptor RPCS-136 measures the relative positively charged surface
area of a molecule, which can also affect Tg due to
intermolecular interactions.
The descriptors from the best Type I model were then
passed as inputs to a CNN to form nonlinear Type II models.
CNN architectures ranging from 10-3-1 to 10-5-1 were
explored. A 10-5-1 CNN proved to be the best model in
accordance with rms error and predictive generalization. This
model generated rms errors of 15.67K (r2 ) 0.952), 15.08K
(r2 ) 0.959), and 21.76K (r2 ) 0.919) for the training set,
cross-validation set, and prediction set, respectively. This is
a 39% improvement for the training set and an 18%
improvement for the prediction set over the linear Type I
results. Results were obtained by averaging the output values
for each compound over 10 individual network trainings.
Averaging minimizes the dependence of network training
in the selection of initial random weights and biases.43 This
generally produces predictions much closer to experimental
values than any single training run.
Due to the improvement of using CNNs in Type II
modeling, Type III models were developed. The best model
found using Type III feature selection included the 10
descriptors shown in Table 3. The descriptors V6CH-18,
MOLC-8, NSB-12, and 1SP2-1 appear again in the Type
III model which were described in detail previously. This
clearly shows the important molecular contribution of these
four descriptors and their effect on the Tg.
The descriptor V5C-10 is similar to V6CH-18 in that it
encodes molecular features such as size and branching. The
hybrid descriptor ELOW-1 combines electronic and topological aspects of a molecule to form atom-level topological
indexes.30 This descriptor encodes the through-space distance
between atoms with the maximum and minimum E-state
values. Electrotopological indexes, such as ELOW-1, describe the probability of interaction between atoms in a
compound with other nearby atoms. The geometric descriptor
SHDW-4 encodes the standardized area projected onto the
XY-plane, which encodes information about the size and
shape of the 3-D structure. DPOL-1 represents the dipole
moment of the molecule. The descriptor CARB-1 represents
MATTIONI
AND JURS
coefficient
error
range
KAPA-6
V3C-8
V5C-10
S4C-9
NN-4
NSB-12
WTPT-2
EMIN-1
MDE-12
MDE-13
constant
-6.23
79.7
181.1
-212.4
43.5
-5.50
398.2
-16.1
-11.5
-3.4
-371.7
1.48
12.1
53.9
28.6
4.7
1.42
31.4
3.1
2.8
1.1
57.9
0-17.13
0-1.9
0-0.5
0-1.78
0-4
1-36
1.5-2.1
-8.9-2
0-5.4
0-37.9
a KAPA-6, atom valence-corrected third-order kappa index;27 V3C8, V5C-10, S4C-9 third- and fifth-order atom valence-corrected cluster,
simple fourth-order cluster respectively;41 NN-4, number of nitrogen
molecules; NSB-12, number of single bonds; WTPT-2, ratio of molecular ID for molecule to number of atoms in molecule;44 EMIN-1, minimum atomic E-state value;30 MDE-12, MDE-13, molecular distance
edge between primary/secondary and primary/tertiary carbons respectively.31
range
KAPA-6
V6C-11
NN-4
NDB-13
NLP-19
WTPT-2
3SP2-1
2SP3-1
EDIF-1
MDE-11
MDE-12
0-17.13
0-0.14
0-4
0-5
0-16
1.5-2.1
0-11
0-12
0-23.4
0-2.89
0-4.70
a KAPA-6, NN-4, WTPT-2, MDE-12, see Table 4 caption; V6C11, sixth-order valence cluster;41 NDB-13, number of double bonds;
NLP-19, number of lone pairs; 3SP2-1, number of sp2 hybridized
carbons bound to three other carbons; 2SP3-1, number of sp3 hybridized
carbons bound to two other carbons; EDIF-1, difference between max
E-state value and minimum E-state value;30 MDE-11, molecular distance
edge between primary carbons.31
PREDICTION
OF
MATTIONI
AND JURS
CI010062O