33 1211
2017 © The Japan Society for Analytical Chemistry
Original Papers
*1 Department of Chemistry and Center of Excellence for Innovation in Chemistry, Faculty of Science and
Graduate School, Chiang Mai University, Chiang Mai 50200, Thailand
*2 Department of Statistics, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
*3 Department of Pharmaceutical Technology and Biopharmaceutics, University of Vienna, Vienna 1090, Austria
*4 Institute of Theoretical Chemistry, University of Vienna, Vienna 1090, Austria
*5 Department of Pharmaceutical Sciences, Chiang Mai University, Chiang Mai 50200, Thailand
A quantitative structure–retention relationship (QSRR) study was applied to estimate the retention times of secondary
volatile metabolites in Thai jasmine rice. Chemical components of the rice seed were obtained by solvent
extraction, then separated and identified by gas chromatography–mass spectrometry (GC-MS). A set of molecular
descriptors was generated for the compounds identified by GC-MS to represent their molecular structures
numerically. Principal component analysis (PCA) and principal component regression analysis (PCR)
were used to model the retention times of these compounds as a function of the theoretically derived descriptors. The
best-fitted regression model had an R-squared of 0.900. The informative chemical properties related to retention
time were elucidated. The results demonstrate that molecular weight combined with the
autocorrelation functions of two-dimensional interatomic distance (molecular polarizability, atom identity,
sigma charge, sigma electronegativity and polarizability) can be considered comprehensive factors for predicting the
retention times of volatile compounds in rice.
(Received February 27, 2017; Accepted July 4, 2017; Published November 10, 2017)
substances were found in a comparison of volatile compounds in California long-grain rice cultivar21 and Thai rice (KDML 105)22 using GC-MS. The number of alcohol compounds in Thai rice is higher than that of California long-grain rice; however, the numbers of aldehyde, ketone, aromatic, acid, ester and nitrogenous compounds in Thai rice are lower than in California long-grain rice.

A quantitative description of molecular structures provided through the parameters and descriptors is a prerequisite for quantitative structure–property relationship (QSPR) studies.23 Molecular descriptors can be defined as the outcome of a logical and mathematical procedure that changes chemical information encoded within a symbolic representation of a molecule into a practicable number.24 QSPR studies have been utilized to investigate the relationship of the property to the relevant part of structures. Consequently, if essential information of each structure can be extracted and screened properly, a rational predictive model can be constructed. There are several ways to generate molecular descriptors containing topological, geometric and electronic features to project all dimensions and information of each structure. The technique is associated with several advantages and applications, such as estimation of physicochemical properties using substituent constant, reduction of the number of compounds to be synthesized, and faster detection and identification of the most favorable compounds.

There are numerous statistical techniques for extended QSPR modeling.25 Principal component analysis (PCA) and principal component regression analysis (PCR) are used for classification in linear models and built with the help of a training set and validation using an external prediction set. Statistical evaluation has been suggested to produce an appropriate predictive model.25

Molecular descriptors have been designed for encoding structural and physicochemical features and fingerprints. They can be applied in various fields of structural design and property prediction, such as analysis of high throughput screening (HTS) results, finding new lead structures and lead hopping, modeling biological activities26 and in the study of chromatographic retention.27

Retention time in gas chromatographic analysis is the most important criterion to separate and identify the composition of the substances. It is commonly used to identify the type of substance by comparing with authentic compounds. Nevertheless, most of the samples are not pure and are sometimes complex mixtures; therefore, the development of a theoretical model for estimating the retention times seems to be useful for reducing the time spent on analysis. This quantitative determination of the retention time in chromatographic studies can be defined as a quantitative structure–retention relationship (QSRR). QSRR has been demonstrated to be a powerful tool in chromatographic studies for estimation of retention data of novel compounds provided through their molecular descriptors. The models have been successfully elaborated for many types of chromatography including gas chromatography, planar chromatography, column liquid chromatography, micellar liquid chromatography and affinity chromatography.28

According to a number of previous studies, the superiority of the QSRR model has been shown in describing retention data using physicochemical properties and Moreau–Broto autocorrelation topological descriptors. Physicochemical properties are an important factor as molecules having a similar structure will also have a similar physicochemical property.29–31 Moreau–Broto autocorrelations are 2D-descriptors derived from the molecular graph weighted by atom physicochemical properties based on spatial autocorrelation and contain encoded information on structural fragments and therefore seem to be particularly suitable for describing differences in congeneric series of molecules.32,33

In this study, retention times of volatile organic compounds from Thai jasmine rice obtained from GC-MS were used for QSRR modelling. PCA and PCR were applied to predict retention times of these compounds as a function of the theoretically derived descriptors.

Experimental

Materials
Thai jasmine rice (Oryza sativa) variety Khao Dawk Mali 105 (KDML105) used in this study was obtained from the Agricultural Technology Research Institute, Lampang, Thailand. After harvesting, the rice seeds were kept in cool conditions (–20°C) until further use in experiments.
For use as an internal standard, 2,4,6-trimethylpyridine (TMP), 99% purity, was purchased from Aldrich Chemical Co., Milwaukee, WI. The internal standard solution containing 0.25 ppm of TMP was prepared by dissolving an exact weight of it in a volume of 0.1 M HCl.

Preparation of rice grain extracts
First, 50 grams of the rice seeds were extracted with 0.1 M hydrochloric acid. TMP 0.25 ppm was used as the internal standard. The extracted solution was made alkaline by 0.1 M NaOH and then extracted with dichloromethane. The organic phase was dried by anhydrous sodium prior to removal of solvent using a rotary evaporator with reduced pressure. Finally, the residue was subjected to analysis by GC-MS.

GC-MS analysis
The profile of volatile compounds from the rice extract was determined using GC-MS (Model 6890N/5973, Agilent, Palo Alto, CA). The GC-MS temperature program ran from 45 to 250°C with a rate of increase of 3°C/min, and was held for 30 min. A capillary column HP-5MS with dimensions of 30 m × 0.25 mm i.d. and 0.5 μm film thickness was used. The injection port temperature was set at 250°C. Purified helium gas was used as the carrier gas with a flow rate of 1.3 mL/min. The GC injector was in a split mode with a 1:10 split ratio. The MS was operated in the electron impact (EI) mode with an ionization voltage of 70 eV, and the ion source temperature was set to 230°C. The MS quadrupole temperature was 150°C and the mass scan was in the range of 45–550 amu. The volatile compounds were tentatively identified by matching their mass spectra with reference spectra compiled in NIST05 and Wiley7n mass spectral libraries. The structures of these volatile compounds were confirmed by linear retention index (RI) using n-alkanes (Supelco) as the reference. This experiment was done in triplicate.

Data set
The retention times of volatile compounds in extracts of KDML105 were obtained from GC-MS analysis. The data set was divided into two subsets, a training set and a test set. It is difficult to give a general rule on how to choose the number of observations in each of the two parts. A typical split might be 20–25% for the test set; therefore, 35 compounds were used for the training set and 10 compounds for the test set in this case. The training set was used to model the retention times of these compounds as a function of chemical descriptors and the test set was used to evaluate the predictive ability of the regression model. The structures obtained from GC-MS
ANALYTICAL SCIENCES NOVEMBER 2017, VOL. 33 1213
Table 1 Structural assignments and EI mass spectral data of volatile compounds in KDML 105 rice analyzed by GC-MS
No    Structure    Retention time/min    m/z (% relative abundance)    MW    Match, %
Table 3 Spearman rank correlation coefficient between molecular descriptor and retention time of volatile compounds in rice
Descriptor    Weight   HAcc    XlogP   TPSA    Polariz  Dipole  LogS     Ident1  Ident2  Ident3
Coefficient   0.937a   0.317   0.663a  0.347a  0.898a   0.264   –0.626a  0.952a  0.963a  0.838a
p-value       0        0.064   0       0.041   0        0.125   0        0       0       0
N             35       35      35      35      35       35      35       35      35      35

Descriptor    Ident4   Ident5  Ident6  Ident7  Ident8  Ident9  Ident10  Ident11  SigChg1  SigChg2
Coefficient   0.858a   0.881a  0.906a  0.869a  0.794a  0.777a  0.746a   0.757a   0.575a   –0.06
p-value       0        0       0       0       0       0       0        0        0        0.731
N             35       35      35      35      35      35      35       35       35       35

Descriptor    SigChg3  SigChg4  SigChg5  SigChg6  SigChg7  SigChg8  SigChg9  SigChg10  SigChg11  PiChg1
Coefficient   0.541a   0.685a   0.711a   0.790a   0.795a   0.786a   0.731a   0.729a    0.751a    0.286
p-value       0.001    0        0        0        0        0        0        0         0         0.095
N             35       35       35       35       35       35       35       35        35        35

Descriptor    PiChg2  PiChg3  PiChg4  PiChg5  PiChg6  PiChg7  PiChg8  PiChg9  PiChg10  PiChg11
Coefficient   –0.26   –0.146  –0.319  –0.166  0.091   –0.105  0.126   0.022   0.105    0.287
p-value       0.131   0.402   0.061   0.342   0.604   0.546   0.469   0.902   0.547    0.095
N             35      35      35      35      35      35      35      35      35       35

Descriptor    TotChg1  TotChg2  TotChg3  TotChg4  TotChg5  TotChg6  TotChg7  TotChg8  TotChg9  TotChg10
Coefficient   0.564a   –0.045   0.482a   0.702a   0.707a   0.778a   0.778a   0.786a   0.697a   0.716a
p-value       0        0.798    0.003    0        0        0        0        0        0        0
N             35       35       35       35       35       35       35       35       35       35

Descriptor    TotChg11  SigEN1  SigEN2  SigEN3  SigEN4  SigEN5  SigEN6  SigEN7  SigEN8  SigEN9
Coefficient   0.743a    0.924a  0.908a  0.701a  0.785a  0.889a  0.929a  0.905a  0.800a  0.779a
p-value       0         0       0       0       0       0       0       0       0       0
N             35        35      35      35      35      35      35      35      35      35

Descriptor    SigEN10  SigEN11  PiEN1  PiEN2  LpEN1  Polrz1  Polrz2  Polrz3  Polrz4  Polrz5
Coefficient   0.747a   0.760a   0.235  0.172  0.326  0.848a  0.841a  0.752a  0.799a  0.825a
p-value       0        0        0.174  0.323  0.056  0       0       0       0       0
N             35       35       35     35     35     35      35      35      35      35

Descriptor    Polrz6  Polrz7  Polrz8  Polrz9  Polrz10  Polrz11
Coefficient   0.816a  0.859a  0.791a  0.771a  0.745a   0.758a
p-value       0       0       0       0       0        0
N             35      35      35      35      35       35
a. Correlation is significant at the 0.01 level (2-tailed).
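
As an illustration of how a coefficient such as those in Table 3 is obtained, the following minimal sketch computes a Spearman rank correlation as the Pearson correlation of average ranks. It uses only the Python standard library; the function names and the example values are hypothetical and are not the data of this study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical molecular weights and retention times (min), for illustration only
weights = [86.1, 100.2, 106.1, 128.2, 142.2, 154.3]
retention = [4.2, 6.9, 6.1, 14.8, 19.5, 23.0]
print(round(spearman_rho(weights, retention), 3))
```

Because only ranks enter the calculation, the coefficient measures monotonic rather than strictly linear association, which suits descriptors on very different scales.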
between selected molecular descriptor and retention time of volatile compounds in rice. The Spearman rank correlation coefficient was used to identify the physicochemical properties and autocorrelation of 2D interatomic distance descriptors (abbreviated in Table 2, with the subscript denoting a certain topological distance k in Eq. (1)) associated with the retention time (Table 3).

QSRR model and informative descriptors elucidation
PCA is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. PCA is applied for reduction of the molecular descriptor dimension. Selected descriptors obtained from these structures were used for PCA to extract the relevant elements, which were reduced to eight components accounting for 94.64% of the total variance, as shown in Table 4. The molecular descriptors obtained in each component are shown in Table 5.

Modeling of retention times as a function of theoretically derived descriptors of each chemical structure was established by PCA and PCR. The eight components from PCA of molecular descriptors were selected to build an appropriate model to determine the relationship between retention time of a compound and its chemical structure. PCR was carried out using the retention time as the dependent variable and the eight major components of the molecular descriptor variables as independent variables. All enter regression, forward selection, backward elimination and stepwise regression were applied. All techniques give the same best-fitted models with R-squared 0.900 as shown in Table 6, and the equation for prediction of retention time in this model is defined as Eq. (2):

y = –0.023 + 0.945 x_1    (2)

The discrimination power of the variables affecting retention time was in the order of molecular polarizability of the molecule, molecular weight of the compound, σ atom electronegativities, σ atom charges, and total atom charges (Ident1, Weight, Ident2, Polariz, SigEN7, SigEN1, SigEN2, SigEN9, Polrz9, Ident9, SigEN6, Ident7, SigEN11, Ident11, SigEN10, Polrz11, Polrz10, SigEN8, Ident10, TotChg8, Ident6, SigChg8, Ident5, Polrz7, Ident8, SigChg11, Polrz8, Polrz1, SigEN5, TotChg11, Polrz6, SigChg10, Ident4, SigChg9, TotChg10, Polrz2, Ident3, TotChg9, Polrz5, SigEN4, SigEN3, SigChg7, TotChg7, SigChg5).

Table 4 Cumulative variation and eigenvalue in each principal component of chemical structure
Total variance explained
Component   Initial eigenvalue   % of Variance   Cumulative, %
1           36.175               47.599          47.599
2           13.469               17.723          65.322
3           8.171                10.751          76.073
4           4.850                6.382           82.455
5           2.882                3.793           86.248
6           2.687                3.535           89.783
7           2.000                2.632           92.415
8           1.693                2.228           94.643

Table 6 Statistical parameters of PCR model
All enter regression
Variable    b (unstandardized coefficient)   SEb (standard error)   β (standardized coefficient)   t       p-value
PCA1(x1)    0.947                            0.057                  0.95                           16.72   0
PCA2(x2)    0.037                            0.056                  0.037                          0.657   0.517
PCA3(x3)    0.043                            0.057                  0.043                          0.757   0.456
PCA4(x4)    0.076                            0.056                  0.078                          1.364   0.184
PCA5(x5)    0.019                            0.056                  0.02                           0.344   0.733
PCA6(x6)    0.02                             0.056                  0.02                           0.351   0.728
PCA7(x7)    –0.046                           0.056                  –0.047                         –0.82   0.42
PCA8(x8)    –0.061                           0.056                  –0.061                         –1.08   0.29
Constant –0.025; SEest (standard error of the estimate) = ±0.33; R = 0.957; R² = 0.916; F = 35.464; p-value <0.001.

Stepwise regression
Variable    b       SEb     β       t        p-value
PCA1(x1)    0.945   0.055   0.949   17.249   0
Constant –0.023; SEest (standard error of the estimate) = ±0.32; R = 0.949; R² = 0.900; F = 297.513; p-value <0.001.
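
To make the regression step concrete, the following is a minimal sketch of a one-component principal component regression, mirroring the stepwise result in which only the first component is retained. It is not the software or procedure used in this study: it assumes z-scored descriptors and retention times, finds the first component by power iteration on the correlation matrix, and standardizes the component score before fitting. All function names and the example numbers are hypothetical.

```python
from math import sqrt


def standardize(values):
    """Return z-scores using the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def leading_loadings(z_columns, iterations=200):
    """Approximate the first eigenvector of the correlation matrix by power iteration."""
    p, n = len(z_columns), len(z_columns[0])
    corr = [[sum(a * b for a, b in zip(z_columns[i], z_columns[j])) / (n - 1)
             for j in range(p)] for i in range(p)]
    vec = [1.0] * p  # assumes the first component is not orthogonal to this start vector
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec


def fit_one_component_pcr(descriptor_columns, retention_times):
    """Regress standardized retention time on the standardized first component score."""
    z_cols = [standardize(col) for col in descriptor_columns]
    y = standardize(retention_times)
    loadings = leading_loadings(z_cols)
    raw_scores = [sum(w * col[i] for w, col in zip(loadings, z_cols))
                  for i in range(len(y))]
    x = standardize(raw_scores)
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Made-up descriptor columns (not the data of this study)
columns = [
    [100.0, 120.0, 140.0, 160.0, 180.0, 200.0],
    [10.0, 12.5, 13.8, 16.1, 18.3, 19.9],
    [1.1, 1.4, 2.0, 2.1, 2.9, 3.0],
]
times = [5.0, 9.0, 13.5, 17.0, 21.2, 25.0]
intercept, slope, r_squared = fit_one_component_pcr(columns, times)
print(round(slope, 3), round(r_squared, 3))
```

With both variables standardized, the slope equals the correlation between the component score and the retention time, so its square equals R-squared; this is the same relationship seen between β = 0.949 and R² = 0.900 in the stepwise model of Table 6.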
Table 5 Molecular descriptors obtained in each principal component
Component   Descriptors
1           Ident1, Weight, Ident2, Polariz, SigEN7, SigEN1, SigEN2, SigEN9, Polrz9, Ident9, SigEN6, Ident7, SigEN11, Ident11, SigEN10, Polrz11, Polrz10, SigEN8, Ident10, TotChg8, Ident6, SigChg8, Ident5, Polrz7, Ident8, SigChg11, Polrz8, Polrz1, SigEN5, TotChg11, Polrz6, SigChg10, Ident4, SigChg9, TotChg10, Polrz2, Ident3, TotChg9, Polrz5, SigEN4, SigEN3, SigChg7, TotChg7, SigChg5
2           HAcc, LpEN1, SigChg2, TPSA, SigChg1, TotChg1, PiEN1, TotChg2, Dipole, PiEN2, LogS, XlogP, PiChg1, PiChg2, PiChg6, PiChg7, PiChg5
3           Polrz3, Polrz4, SigChg3, SigChg4, TotChg5
4           TotChg3, TotChg4
5           SigChg6, TotChg6
6           PiChg9, PiChg11, PiChg10, PiChg8
7           PiChg4
8           PiChg3
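
The percentages of variance reported in Table 4 can be checked from the eigenvalues. The sketch below assumes that the PCA was run on the correlation matrix of the 76 descriptors listed in Table 3, in which case the eigenvalues sum to the number of descriptors; that assumption is not stated in the text but reproduces the tabulated values to within rounding. The function name is hypothetical.

```python
# Eigenvalues of the first eight components, copied from Table 4
EIGENVALUES = [36.175, 13.469, 8.171, 4.850, 2.882, 2.687, 2.000, 1.693]

# Assumption: 76 standardized descriptors, so the total variance equals 76
N_DESCRIPTORS = 76


def explained_variance(eigenvalues, total_variance):
    """Return per-component and cumulative percentages of variance."""
    percents = [100.0 * value / total_variance for value in eigenvalues]
    cumulative = []
    running = 0.0
    for percent in percents:
        running += percent
        cumulative.append(running)
    return percents, cumulative


percents, cumulative = explained_variance(EIGENVALUES, N_DESCRIPTORS)
for number, (pct, cum) in enumerate(zip(percents, cumulative), start=1):
    print(number, round(pct, 3), round(cum, 3))
```

Small differences in the third decimal place relative to Table 4 are expected, because the tabulated eigenvalues are themselves rounded.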
Table 7 An external test set of compounds used to test the performance of the QSRR model
a. Z-score of predicted retention time values. b. Predicted retention time values. c. Experimental retention time values.
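
The footnotes of Table 7 indicate that the model output is a z-score that is then expressed as a retention time. A minimal sketch of that workflow is given below: it applies Eq. (2) to a first-component score, converts the z-score back to minutes, and summarizes the agreement with experimental values by a root-mean-square error. The mean, standard deviation, scores and experimental times in the example are hypothetical placeholders, not values from this study, and the function names are illustrative.

```python
from math import sqrt

INTERCEPT = -0.023  # constant of Eq. (2)
SLOPE = 0.945       # coefficient of x1 in Eq. (2)


def predict_z(pc1_score):
    """Eq. (2): predicted z-score of retention time from the first component score."""
    return INTERCEPT + SLOPE * pc1_score


def z_to_minutes(z, mean_rt, sd_rt):
    """Undo the z-scoring; mean_rt and sd_rt must come from the training set."""
    return mean_rt + z * sd_rt


def rmse(predicted, observed):
    """Root-mean-square error between predicted and experimental values."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical training-set statistics and test-set inputs, for illustration only
mean_rt, sd_rt = 30.0, 12.0
test_scores = [-1.2, -0.3, 0.4, 1.5]
experimental = [16.0, 27.5, 33.0, 48.0]

predicted = [z_to_minutes(predict_z(s), mean_rt, sd_rt) for s in test_scores]
print([round(p, 2) for p in predicted], round(rmse(predicted, experimental), 2))
```

Reporting the error in minutes rather than in z-score units makes the external test set result directly comparable with the chromatographic run time.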
molecular structure and chemical properties of the solute determine the type and extent of the interactions of the solute with these phases. The differences between these properties govern the retention behavior through the column.

The ultimate goal of this study has been accomplished. QSRR models for the prediction of GC retention time of various volatile components from Thai rice can successfully be developed. The proposed models have good predictive ability and are of high statistical significance. The models are helpful for the discovery of new components in Thai rice using retention time projected to molecular descriptors of the compounds, which can be used as fragment information for structural elucidation of the unknown component, and the PCA is useful for highlighting the key molecular descriptor for explaining chromatographic mechanisms.

Acknowledgements

We gratefully acknowledge the Center of Excellence for Innovation in Chemistry (PERCH-CIC) and the Graduate School, Chiang Mai University, for financial support. P. N. and N. W. acknowledge partial financial support from CMU-IC research project for Asean+3 Cross Border Research, the Center of Excellence for Innovation in Analytical Science, CMU and Standardization and Development of Miang Extract and Chemical Analysis Methodology Project, ARDA & NRCT, Thailand.

References

1. W. E. Marshall and J. I. Wadsworth, “Rice Science and Technology”, 1993, Taylor & Francis, New York.
2. B. O. Juliano, “Rice: Chemistry and Technology”, 1985, American Association of Cereal Chemists, Minnesota.
3. V. Leardkamolkarn, W. Thongthep, P. Suttiarporn, R. Kongkachuichai, S. Wongpornchai, and A. Wanavijitr, Food Chem., 2011, 125, 978.
4. B. M. Rao, U. V. R. V. Saradhi, N. S. Rani, S. Prabhakar, G. S. V. Prasad, G. S. Ramanjaneyulu, and M. Vairamani, Food Chem., 2007, 105, 736.
5. T. Sriseadka, S. Wongpornchai, and P. Kitsawatpaiboon, J. Agric. Food Chem., 2006, 54, 8183.
6. R. G. Buttery, L. C. Ling, B. O. Juliano, and J. G. Turnbaugh, J. Agric. Food Chem., 1983, 31, 823.
7. R. J. Bryant and A. M. McClung, Food Chem., 2011, 124, 501.
8. N. J. N. Yau and T. T. Liu, J. Sens. Stud., 1999, 14, 209.
9. A. M. D. Mundo and B. O. Juliano, J. Texture Stud., 1981, 12, 107.
10. G. Reineccius, “Flavor Chemistry and Technology”, 2nd ed., 2005, CRC Press, New York.
11. X. Yang and T. Peppard, J. Agric. Food Chem., 1994, 42, 1925.
12. A. Steffen and J. Pawliszyn, J. Agric. Food Chem., 1996, 44, 2187.
13. G. B. Lockwood, J. Chromatogr. A, 2001, 936, 23.
14. C. C. Grimm, C. Bergman, J. T. Delgado, and R. Bryant, J. Agric. Food Chem., 2001, 49, 245.
15. H. S. Lam and A. Proctor, J. Food Sci., 2003, 68, 2676.
16. E. T. Champagne, J. F. Thompson, K. L. Bett-Garber, R. Mutters, J. A. Miller, and E. Tan, Cereal Chem., 2004, 81, 444.
17. S. Wongpornchai, K. Dumri, S. Jongkaewwattana, and B. Siri, Food Chem., 2004, 87, 407.
18. Z. Zeng, H. Zhang, J. Y. Chen, T. Zhang, and R. Matsunaga, Cereal Chem., 2007, 84, 423.
19. D. S. Yang, R. L. Shewfelt, K. S. Lee, and S. J. Kays, J. Agric. Food Chem., 2008, 56, 2780.
20. K. Mahattanatawee and R. L. Rouseff, Food Chem., 2014, 154, 1.
21. R. G. Buttery, J. G. Turnbaugh, and L. C. Ling, J. Agric. Food Chem., 1988, 36, 1006.
22. S. Mahatheeranont, S. Promdang, and A. Chiampiriyakul, Kasetsart J. Nat. Sci., 1995, 29, 508.
23. M. Grover, B. Singh, M. Bakshi, and S. Singh, Pharm. Sci. Technol. Today, 2000, 3, 28.
24. Z. Garkani-Nejad, M. Karlovits, W. Demuth, T. Stimpfl, W. Vycudilik, M. Jalali-Heravi, and K. Varmuza, J. Chromatogr. A, 2004, 1028, 287.
25. L. Xu and W.-J. Zhang, Anal. Chim. Acta, 2001, 446, 475.
26. M. Wagener, J. Sadowski, and J. Gasteiger, J. Am. Chem. Soc., 1995, 117, 7769.
27. T. Gobbo-Neto, J. Schmidt, and F. B. Da Costa, J. Chem. Inf. Model., 2015, 55, 26.
28. K. Héberger, J. Chromatogr. A, 2007, 1158, 273.
29. S. Z. Kovacevic, S. O. Podunavac-Kuzmanovic, L. R. Jevric, P. T. Jovanov, E. A. Djurendic, and J. J. Ajdukovic, Eur. J. Pharm. Sci., 2016, 93, 1.
30. M. M. Talmaciu, E. Bodoki, J. Platts, and R. Oprean, Stud. Ubb. Che., 2016, 4, 99.
31. L. T. Qin, S. S. Liu, F. Chen, Q. F. Xiao, and Q. S. Wu, Chemosphere, 2013, 90, 300.
32. T. B. Oliveira, L. Gobbo-Neto, T. J. Schmidt, and F. B. Da Costa, J. Chem. Inf. Model., 2015, 55, 26.
33. M. H. Fatemi and H. Malekzadeh, J. Iran. Chem. Soc., 2015, 12, 405.
34. T. Puzyn, J. Leszczynski, and M. T. Cronin, “Recent Advances in QSAR Studies: Methods and Applications”, 2010, Springer Netherlands, Dordrecht.