
Prediction of Drug-Like Properties

Gisbert Schneider

"Models are to be used, not believed." (H. Theil)

The Drug-Likeness Concept


Historically, computer-aided molecular design (CAMD) has focused on lead identification and lead optimization, and many innovative strategies have been developed that assist in improving the binding affinities of drug candidates to specific receptors. One such method, QSAR, was discussed in the previous Chapter. In this Chapter, we will discuss the emerging concept of drug-likeness, as well as the computational modeling of a set of physicochemical and biological properties that play an important role in the transformation of a clinical lead into a marketed drug. Although high potency is an important factor in pharmacological design, one must also recognize the huge gulf between a tightly bound inhibitor and a bioavailable drug.1 Far too often, promising candidates are abandoned during clinical trials, or worse, withdrawn after market launch in the medico-economic phase, for a variety of reasons, including low bioavailability, high toxicity, poor pharmacokinetics, or drug-drug interactions. In addition, the advent of parallel synthesis methods and high-throughput screening has placed increasing stress on the technology that has traditionally been used to assess potential drug candidates in non-clinical development. Due to the limited time and resources available to conduct formal in vivo studies, typically only tens of candidates will be screened. Thus, prioritization by computational means prior to experiment is important in order to ensure that valuable resources are apportioned to the most promising candidates.

Drug molecules generally act on specific targets at the cellular level, and exert therapeutic action upon binding to receptors that subsequently modify the cellular machinery. Before a drug molecule exerts its pharmaceutical (pharmacodynamic) effect on the body via interaction with its target, it must travel through the body to reach the site of drug action. The study of pharmacokinetics refers to the journey of the drug from its point of entry to the site of action. Broadly speaking, this process can be defined by the following phases: absorption, distribution, metabolism, and excretion (ADME). The first hurdle for an orally administered drug is adequate absorption from the gut wall into the blood circulatory system. Upon absorption, it will be transported to the liver, where it is liable to modification by a panel of hepatic microsomal enzymes; some molecules may be metabolized and some may be excreted via the bile. If a drug molecule survives this first-pass metabolism, it will enter arterial circulation and is subsequently distributed to the body, including the target tissue. Once the drug has triggered the desirable therapeutic response, it should be steadily eliminated from the body; otherwise bioaccumulation may become a concern. In addition, a drug must not cause any serious toxic side effects, including, but not limited to, interference with the actions of any other drugs the patient may be taking. Such interference is normally caused by enzyme induction, a process in which one drug stimulates an enzyme, thereby causing a change in the metabolism of a second drug.

It is not surprising, then, that even though the chemical structures of drugs can differ greatly, in accordance with the requirement of complementary interactions to diverse target receptors, successful drugs on the market today do share certain similarities in their physicochemical properties. Primarily, such characteristics determine the pharmacokinetics of the drug, where favorable ADME properties are required.2,3 Perhaps the best-known study in this area is the work of Lipinski and coworkers at Pfizer, who performed a statistical analysis of 2,200 drugs from the World Drug Index (WDI).4 They established a set of heuristics that appears to be generally valid for the majority of the drugs considered in the study, normally referred to as the Pfizer rule, or the rule of five, which states that the absorption or permeation of a drug (that is not a substrate for a biological transporter) is likely to be impaired when:

- logP > 5
- Molecular weight > 500
- Number of hydrogen-bond donor groups > 5
- Number of hydrogen-bond acceptor groups > 10
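Because all four parameters are cheap to compute, the rule is straightforward to implement in code. The following is a minimal sketch using RDKit's standard property calculators; the function name and the aspirin example are our own illustration, not part of the original study.

```python
# A minimal rule-of-five filter sketch, assuming RDKit is available.
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def rule_of_five_violations(smiles: str) -> int:
    """Count how many of the four rule-of-five criteria a molecule violates."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    violations = 0
    if Descriptors.MolLogP(mol) > 5:        # calculated logP (Wildman-Crippen)
        violations += 1
    if Descriptors.MolWt(mol) > 500:        # molecular weight
        violations += 1
    if Lipinski.NumHDonors(mol) > 5:        # hydrogen-bond donor count
        violations += 1
    if Lipinski.NumHAcceptors(mol) > 10:    # hydrogen-bond acceptor count
        violations += 1
    return violations

# Example: aspirin passes with no violations.
print(rule_of_five_violations("CC(=O)Oc1ccccc1C(=O)O"))  # -> 0
```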

The beauty of this rule lies in its simplicity. Because all parameters can be easily computed, the Pfizer rule (or its variants) has become the most widely applied filter in virtual library design today. However, it should be stressed that compliance with the rule does not necessarily make a molecule drug-like. In fact, the Pfizer rule by itself appears to be a rather ineffective discriminator between drugs and nondrugs. Frimurer et al showed that, using the above criteria, only 66% of the compounds in the MDL Drug Data Report (MDDR) database, which contains compounds with demonstrated biological activities, were classified as drug-like, whereas 75% of the supposedly nondrug-like compounds from the Available Chemical Directory (ACD) were in fact regarded as drug-like.5 In other words, if the primary objective is to isolate drugs from nondrugs in the broadest sense, the Pfizer rule fares little better than making random assignments. Obviously, a more complex set of logical rules is required to recognize molecules with drug-like properties. Independently, two research groups investigated the use of artificial neural networks to develop virtual screening tools that can distinguish between drug-like and nondrug-like molecules. The results of their work were published in two back-to-back articles in the Journal of Medicinal Chemistry in 1998.6,7 The first paper was a contribution from Ajay, Walters, and Murcko at Vertex

Pharmaceuticals.6 They selected a set of approximately 5,000 compounds from the Comprehensive Medicinal Chemistry (CMC) database to serve as a surrogate for drug-like molecules. They also chose a similar number of drug-sized compounds from the ACD to represent nondrug-like molecules. Seven simple 1D descriptors were generated to encode each molecule: molecular weight, number of hydrogen-bond donors, number of hydrogen-bond acceptors, number of rotatable bonds, the 2κα shape index (a measure of the degree of branching of a molecule), aromatic density, and logP. To augment these 1D features, Ajay and coworkers also considered a second set of 2D descriptors: the 166 binary ISIS keys, which contain information on the presence or absence of certain substructural features in a given molecule. A Bayesian neural network (BNN) was trained on a subset of 7,000 compounds comprising approximately equal numbers of compounds from the CMC and ACD sets. The trained neural network was then applied to the remaining CMC and ACD compounds outside the training set. As an external validation, they also tested their network on a large collection of compounds from the MDDR database, which was assumed to contain mostly drug-like candidates. The accuracy of classification for the test predictions using different combinations of the 1D and 2D descriptor sets is summarized in Table 1. Neural network models using the seven 1D descriptors alone classified about 83% of the CMC compounds as drugs, and about 73% of the ACD set as nondrugs. The majority (65%) of the MDDR compounds were predicted to be drug-like, in accordance with general expectation. When the 2D ISIS descriptors were utilized, the classification accuracy for the ACD (82%) and MDDR (83%) compounds improved significantly, though at the expense of inferior prediction for the CMC set (78%). The combined use of 1D and 2D descriptors yielded the best prediction overall: the classification accuracy for both CMC and ACD approached 90%, and, in addition, about 78% of the MDDR compounds were classified as drug-like. Furthermore, the Vertex team was able to extract the most informative descriptors, and suggested that all seven 1D descriptors and only 71 of the 166 ISIS descriptors provided relevant information to the neural network. It was demonstrated that the prediction accuracy of a neural network using this reduced set of 78 descriptors was essentially identical to that of the full model. Finally, to demonstrate the utility of this drug-likeness filter, the researchers conducted a series of simulated library design experiments and concluded that their system could dramatically increase the probability of picking drug-like molecules from a large pool of mostly nondrug-like entities.

Table 1 Average drug-likeness prediction performance of a Bayesian neural network with five hidden nodes on 10 independent test sets6

Descriptors          CMC / %   ACD / %   MDDR drug-like / %
7 1D                 81-84     71-75     61-68
166 ISIS             77-79     81-83     83-84
7 1D + 166 ISIS      89-91     88-89     77-79
7 1D + 71 ISIS       88-90     87-88     77-80

Another neural network-based drug-likeness scoring scheme was reported by Sadowski and Kubinyi from BASF.7 They selected 5,000 compounds each from the World Drug Index (WDI) and the ACD to serve as their databases of drug-like and nondrug-like compounds. The choice of molecular descriptors in their application was based on Ghose and Crippen atom-types,8 which have been used successfully in the prediction of other physicochemical properties such as logP. In this study, each molecule was represented by the actual count of each of the 120 atom-types found. The full set of descriptors was pruned to a smaller subset of 92 that were populated in at least 20 training molecules, a procedure designed to safeguard against the neural network learning single peculiarities. Their neural network, a 92-5-1 feed-forward model, classified 77% of the WDI and 83% of the ACD compounds correctly. Application of the neural network to the complete WDI and ACD databases (containing >200,000 compounds) yielded similar classification accuracy. It was noteworthy that, in spite of this apparently good predictivity, Sadowski and Kubinyi did not advocate the use of such a scoring scheme to evaluate single compounds, because they believed that there was still considerable risk of misclassifying molecules on an individual basis. Instead, they believed that it would be more appropriate to apply it as a filter to weed out designs with very low predicted scores.

Recently, Frimurer and coworkers from Novo Nordisk extended these earlier works by attempting to create a drug-likeness classifier that uses a neural network trained with a larger set of data.5 Again, the MDDR and the ACD were used as the sources of drug-like and nondrug-like entities. The MDDR compounds were partitioned into two sets. The first set comprised 4,500 compounds that have progressed to at least Phase I of clinical trials (i.e., they should be somewhat drug-like), and the second was a larger collection of 68,500 molecules that carry the status label "Biological Testing" (i.e., lead-like). To decrease the redundancy of the data sets, a diversity filter was applied so that any MDDR compounds with a pairwise Tanimoto coefficient (based on ISIS fingerprints) greater than 0.85 were removed. This procedure discarded about 100 compounds from the drug-like MDDR set and 8,500 compounds from the lead-like set.
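A greedy version of such a redundancy filter is easy to sketch. The snippet below uses RDKit's 166-bit MACCS keys as a stand-in for the ISIS fingerprints of the original study; the 0.85 cutoff follows the text, everything else is illustrative.

```python
# A sketch of the redundancy filter described above: greedily drop any
# compound whose Tanimoto similarity to an already-kept compound exceeds 0.85.
from rdkit import Chem, DataStructs
from rdkit.Chem import MACCSkeys

def diversity_filter(smiles_list, cutoff=0.85):
    kept, kept_fps = [], []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip unparsable entries
        fp = MACCSkeys.GenMACCSKeys(mol)  # 166-bit MDL-style keys
        if all(DataStructs.TanimotoSimilarity(fp, ref) <= cutoff
               for ref in kept_fps):
            kept.append(smi)
            kept_fps.append(fp)
    return kept
```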

To reinforce the nondrug-like nature of the ACD set, any compounds similar to the 4,400-member MDDR drug-like set (Tanimoto coefficient greater than 0.85) were eliminated, leaving 90,000 ACD compounds for data analysis. After removing the redundant entries, the 4,400-member MDDR drug-like set was partitioned into 3,000 training compounds and 1,400 test compounds, and the ACD compounds into a 60,000-member training set and a 30,000-member test set. The 60,000 lead-like MDDR compounds were not utilized in any way during model construction, but were used only as external validation data. Each compound was represented by three molecular descriptors (number of atoms, number of heavy atoms, and total charge) in addition to 77 CONCORD atom-type descriptors encoding the frequency of occurrence (normalized over the entire data set) of particular atom types. Empirically, it was concluded that the optimal neural network configuration contained 200 hidden nodes, based on the quality of test set predictions. This neural network gave training and test Matthews correlation coefficients of 0.65 and 0.63, respectively (Eq. 2.18).9 Using a threshold value of 0.5 as the criterion to distinguish drug from nondrug, the neural network was able to classify 98% of the ACD compounds correctly, but only 63% of the MDDR drug-like set. By lowering the prediction threshold, an increasing number of MDDR drug-like compounds would be correctly identified, at the expense of more false positives for the ACD set. They claimed that a threshold value of 0.15 (anything above it was classified as a drug) was the optimal cutoff, providing the best discrimination between the two data sets. At this threshold value, 88% of both the MDDR drug-like set and the ACD database were correctly classified. In addition, 75% of the MDDR lead-like molecules were predicted to be drug-like. The decrease in percentage from the drug-like to the lead-like set was not unexpected, given that there may still be some intrinsic differences between the two classes of compounds.

Finally, Frimurer and coworkers probed for the most informative descriptors that allowed for discrimination between drugs and nondrugs, by setting each of the 80 descriptors systematically to a constant value (zero in this case) and monitoring the variation in training error of each sub-system. They argued that the removal of an important descriptor from the input would lead to a substantial increase in training error. Fifteen key descriptors were identified by this method: aromatic X-H, non-aromatic X-H, C-H, C=O, sp2 conjugated C, =N-, non-aromatic N, N=C, non-aromatic O, sp2 O, sp2 P, F, Cl, number of atoms, and total charge. The performance of their neural network was commendable, even with this vastly reduced set of descriptors. Using a prediction threshold of 0.15, 82% of the MDDR and 87% of the ACD compounds were correctly classified.

An interesting aspect of the drug-likeness scoring function that was briefly discussed in the Frimurer publication concerns the setting of the threshold value. For example, if the purpose of the scoring function is to limit the number of false-positive predictions, then a higher cutoff value

should be used for the threshold. Table 2 gives the percentages of ACD and MDDR compounds that are correctly classified using different cutoff values.

Table 2 Percentage of ACD and MDDR compounds that are correctly predicted at the corresponding threshold values in the drug-likeness classifier of Frimurer et al5

Cutoff   ACD correctly predicted / %   MDDR correctly predicted / %
0.05     72                            95
0.15     88                            88
0.35     95                            74
0.50     98                            63

As reported, a model with a higher cutoff value produces fewer false positives (i.e., nondrugs that are predicted to be drug-like), although this comes at the expense of worse MDDR classification. It is important to keep in mind that the cutoff value should be set depending on whether false positives or false negatives are more harmful for the intended application.10 In a typical virtual screening application, we usually like to first identify, and then remove, molecules that are predicted to be nondrug-like from a large compound library. Let us assume that x is the fraction of compounds in the library that are actually drugs, that pD is the probability that a drug is correctly identified as drug-like, and that pN is the probability that a nondrug is correctly identified as nondrug-like. To gauge the performance of the drug-likeness scoring function, one would compute what percentage of the compounds flagged as drug-like are actually drugs. This quantity, denoted henceforth as the drug fraction, is given by Equation 1:
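drug fraction = (x · pD) / [x · pD + (1 − x) · (1 − pN)]        (Equation 1)

(The denominator counts all compounds flagged as drug-like, i.e., correctly identified drugs plus misclassified nondrugs.)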

If we assume that, for a given threshold value, pD and pN take the values of %MDDR and %ACD correctly classified, we can plot how the drug fraction varies with x, the fraction of drugs in the complete library. Figure 1 shows the hypothetical curves for each threshold value listed in Table 2. In all cases the drug scoring function gives substantial enrichment of drugs after the initial filtering. This is particularly true in situations where the fraction of actual drug molecules in the library is very small, a scenario that is perhaps closest to reality. Based on the statistics reported by Frimurer et al, the reduction of false positives is, in fact, the key to this kind of virtual screening application, and therefore a high threshold value should be set. Thus, although on a percentage basis a 0.15 threshold seems the most discriminating (88% for both ACD and MDDR), the premise under which virtual screening is applied calls for more rigorous removal of false positives, even at the expense of a loss of true positives.
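The curves of Figure 1 can be regenerated directly from Equation 1 and the values in Table 2; the short script below is our illustrative reconstruction (matplotlib assumed).

```python
# Reproduce the Figure 1 analysis: for each cutoff in Table 2, plot the
# drug fraction (Equation 1) as a function of x, taking pD and pN from
# the %MDDR and %ACD columns respectively.
import numpy as np
import matplotlib.pyplot as plt

cutoffs = {0.05: (0.95, 0.72), 0.15: (0.88, 0.88),
           0.35: (0.74, 0.95), 0.50: (0.63, 0.98)}  # cutoff: (pD, pN)

x = np.linspace(0.001, 1.0, 500)  # fraction of actual drugs in the library
for cutoff, (pD, pN) in cutoffs.items():
    drug_fraction = x * pD / (x * pD + (1 - x) * (1 - pN))
    plt.plot(x, drug_fraction, label=f"cutoff = {cutoff}")

plt.xlabel("fraction of drugs in library (x)")
plt.ylabel("drug fraction after filtering")
plt.legend()
plt.show()
```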

Figure 1
Graphs showing how the drug fraction varies with the fraction of drugs in the library (see text).

Finding a generally applicable scoring function to predict the drug-likeness of a molecule will remain one of the most sought-after goals for pharmaceutical researchers in the coming years. The tools that exist today can discriminate between molecules that come from presumably drug-like (e.g., MDDR, CMC, WDI) or nondrug-like (e.g., ACD) databases. In our opinion, the majority of the MDDR and CMC compounds should be regarded as lead-like and not, strictly speaking, drug-like. Ideally, the drug-like training set should contain only drugs that have passed all safety hurdles. We also believe that the nondrug set should consist of molecules that closely resemble marketed drugs (i.e., are at least somewhat lead-like) but were abandoned during pre-clinical or clinical development. We anticipate that the analysis will benefit from these more rigorous definitions of drug and nondrug, because the intrinsic difference between them (presumably owing to their pharmacokinetic or toxicological characteristics) will be amplified. In a recent review article, Walters, Ajay, and Murcko wrote:11 "[What] we may witness in coming years might be attempts to predict the various properties that contribute to a drug's success, rather than the more complex problem of drug-likeness itself. These might include oral absorption, blood-brain barrier penetration, toxicity, metabolism, aqueous solubility, logP, pKa, half-life, and plasma protein binding. Some of these properties are themselves rather complex and are likely to be extremely difficult to model, but in our view it should be possible for the majority of properties to be predicted with better-than-random accuracy."

This divide-and-conquer approach to drug-likeness scoring also brings better interpretability to the results. The potential liabilities of a drug candidate become more transparent, and appropriate remedies can be sought accordingly. In the following sections of this Chapter we will discuss the role played by adaptive modeling and artificial intelligence methods in the prediction of individual properties that contribute to the overall drug-likeness of a molecule.

Physicochemical Properties
An implicit statement of the Pfizer rule is that a drug must have a balanced hydrophilic-lipophilic character. Two physicochemical parameters have the most profound influence on the drug-like properties of a molecule: (i) aqueous solubility, which is critical to drug delivery; and (ii) hydrophobicity, which plays a key role in drug absorption, transport, and distribution.

Aqueous Solubility
A rapidly advancing area of modern pharmaceutical research is the prediction of the aqueous solubility of complex drug-sized compounds from their molecular structures. The ability to design novel entities with sufficient aqueous solubility can bring many benefits to both pre-clinical research and clinical development. For example, accurate activity measurements can be obtained only if the substance is sufficiently soluble, i.e., above the detection limits of the assay. Otherwise, a potentially good SAR can be obscured by apparent poor activity due to insufficient solubility rather than inadequate potency. Finding a ligand with adequate solubility is also a key factor that determines the success of macromolecular structure determination. In X-ray crystallography, the formation of crystals appears to be very sensitive to the solubility of ligands. Most biostructural NMR experiments require ligands dissolved at a relatively high concentration in a buffer. At a more downstream level in drug development, the solubility of a drug candidate has perhaps the most profound effect on absorption. Although pro-drug strategies or special methods in pharmaceutical formulation can help to increase oral absorption, solubility largely dictates the route of drug administration and, quite often, the fate of the drug candidate.

The aqueous solubility of a substance is often expressed in log units of molar solubility (mol/L), or logS. It has been suggested that solubility is determined by three major thermodynamic components that describe the solubilization process.4 The first is the crystal packing energy, which measures the strength of the solid lattice. The second is the cavitation energy, which accounts for the loss of hydrogen bonds between the structured water molecules upon the formation of a cavity to host the solute. The third is the solvation energy, which gauges the interaction energy between the solute and the water molecules. To account for these effects, a number of experimental and theoretical descriptors have been introduced into solubility models over the years. Some of them include melting points,12-14

cohesive interaction indices,15 solvatochromic parameters,16,17 shape, electronic, and topological descriptors,18-23 and mobile order parameters.24 Most of this work has been summarized in an excellent review by Lipinski et al,4 and will not be discussed here. In this Section, we will focus on some of the most recent developments involving the use of neural networks to correlate a set of physicochemical or topological descriptors with experimental solubility.

The earliest neural network-based solubility model in the literature was reported by Bodor and coworkers.18 Fifty-six molecular descriptors, which mostly accounted for geometric (e.g., surface, volume, and ovality), electronic (e.g., dipole moment, partial charges on various atom-types), and structural (e.g., alkenes, aliphatic amines, number of N-H bonds) properties, were generated from the AMPAC-optimized structures of 311 compounds. Empirically, Bodor et al determined that 17 of the 56 descriptors seemed most relevant for solubility, and the resulting 17-18-1 neural network yielded a standard deviation of error of 0.23, which was superior to the corresponding regression model (0.30) based on identical descriptors. In spite of such success, we think that there are two major deficiencies in this neural network model. First, the use of 18 nodes in the hidden layer may be excessive for this application, given that there are only about 300 training examples. Second, some of the 17 input descriptors are, in our opinion, redundant. For example, the inclusion of functional transforms of a descriptor (e.g., QN^2 and QN^4 are functions of QN) might be unnecessary because a neural network should be able to handle such mappings implicitly. To overcome such limitations, PCA and smaller networks could be applied.

The research group of Jurs at Pennsylvania State University has investigated many QSPR/QSAR models for a wide range of physical or biological properties based on molecular structures.21-23,25-31 Recently, they published two solubility studies using their in-house ADAPT (Automated Data Analysis and Pattern Recognition Toolkit) routine and neural network modeling.22,23 Briefly, each molecule was entered into the ADAPT system as a sketch, and the three-dimensional structure was optimized using MOPAC with the PM3 Hamiltonian. In addition to topological indices, many geometric and electronic descriptors, including solvent-accessible surface area and volume, moments of inertia, shadow area projections, gravitational indices, and charged partial surface area (CPSA), were computed. To reduce the descriptor set, they applied genetic algorithm and simulated annealing techniques to select a subset of descriptors that yielded optimal predictivity for a 'validation' set (here, a small set of test molecules that was typically 10% of the training set). In the first study,22 application of the ADAPT procedure to 123 organic compounds led to the selection of nine descriptors for solubility correlation. The rms errors of the regression and the 9-3-1 neural network models were 0.277 and 0.217 log units, respectively. In the next study,23 the same methodology was applied to a much larger data set containing 332 compounds, whose solubility spanned a range of over 14 log units. The best model reported in this study was a 9-6-1 neural

network yielding an rms error of 0.39 log units for the training compounds. It is noteworthy that there was no correspondence between any of the current descriptors and the set selected by their previous model. A possible explanation is that the ADAPT descriptors may be highly intercorrelated, and therefore the majority of the descriptors are interchangeable in the model with no apparent loss in predictivity.

Perhaps the most comprehensive neural network studies of solubility were performed by Huuskonen et al.32-34 In their first study,32 system-specific ANN models were developed to predict solubility for three different drug classes, comprising 28 steroids, 31 barbituric acid derivatives, and 24 heterocyclic reverse transcriptase (RT) inhibitors. The experimental logS values of these compounds ranged from −5 to −2. For each class of compounds, the initial list of descriptors contained 30 molecular connectivity indices, shape indices, and E-state indices. Five representative subgroups of descriptors were established, based on the clustering of their pairwise Pearson correlation coefficients. A set of five parameters was then selected, one from each subgroup, as inputs to a 5-3-1 ANN for correlation analysis. Several five-descriptor combinations were tried, and those that gave the best fit to the training data were investigated further. To minimize overtraining, an early-stopping strategy was applied, so that the training of the neural network stopped when the leave-one-out cross-validation statistics began to deteriorate. The final models yielded q2 values of 0.80, 0.86, and 0.72 for the steroid, barbiturate, and RT inhibitor classes, respectively. Overall, the standard error of prediction was approximately 0.3 to 0.4 log units. Since each ANN was optimized with respect to a specific compound class, it was not surprising that application of solubility models derived from a particular class to other classes of compounds yielded unsatisfactory results. It was more surprising, however, that the effort to develop a universal solubility model applicable to all three classes of compounds also proved unsuccessful. (Note: they could, in theory, obtain reasonable predictivity for the combined set if an indicator variable was introduced to specify each compound class. However, this would obviously defeat the purpose of a generally applicable model.) One possible explanation is that the combined data set (83 compounds in total) contained compounds segregated in distinct regions of chemical space, and it would be difficult to find a set of common descriptors that could accurately account for the behavior of each group of compounds.

In their next study,33 Huuskonen et al collated the experimental solubilities of 211 drugs and related analogs from the literature. This set of compounds spanned approximately six log units (from −5.6 to 0.6), almost twice the range of their previous study. Thirty-one molecular descriptors, which included 21 E-state indices, 7 molecular connectivity indices, the number of hydrogen-bond donors, the number of hydrogen-bond acceptors, and an aromaticity indicator, were used initially in model building. The final number of descriptors was later pruned to a subset of 23 by probing the contribution of

each individual parameter. The final ANN model had a 23-5-1 configuration, and yielded r2 = 0.90 and s = 0.46 for the 160-member training set, and r2 = 0.86 and s = 0.53 for the remaining 51 test compounds. Besides these descriptive statistical parameters, the authors also published the individual predictions for these compounds. Because all 24 RT inhibitors from the previous study were part of the data set, this allowed us to investigate the relative merit of a system-specific solubility model versus a generally applicable one. Of the 24 compounds, 20 were selected for the training set, and 4 were used as test compounds. Figure 2(a) shows the predicted versus observed aqueous solubilities for the RT inhibitors in their previous system-specific model, and Figure 2(b) is the corresponding plot for the predicted solubilities from the general-purpose model. It is clear that, although the predictions for most RT inhibitors were within the correct solubility range (logS −2 to −5), a comparison of individual predictions for this class of compounds reveals a very weak correlation (r2 = 0.16; s = 0.73). This result contrasted sharply with the very good predictivity (r2 = 0.73; s = 0.41) when the RT inhibitors had been considered on their own.32 This supports the notion that a system-specific solubility predictor is more accurate than a general one, though the former obviously has only limited scope. Thus, one must choose an appropriate prediction tool depending on the nature of the intended application. For instance, if the emphasis is on a single class of compounds, we should consider the construction of a specialist model (provided there are sufficient experimental data for the series) or recalibrate the general model by seeding its training set with compounds of interest.

Figure 2
Predicted versus experimental solubility for the 24 RT inhibitors using (a) a system-specific solubility model and (b) a general model.

In his most recent study,34 Huuskonen attempted to improve the accuracy and generality of his aqueous solubility model by considering a large, diverse collection of 1,300 compounds. The logS values for these compounds ranged from −11.6 to +1.6, which essentially covers the range of solubilities that can be reliably measured. The full data set was partitioned into a randomly chosen training set of 884 compounds and a test set of 413. Starting from 55 molecular connectivity, structural, and E-state descriptors, he applied an MLR stepwise backward-elimination strategy to reduce the set to 30 descriptors. For the training data, this equation yielded r2 = 0.89 and s = 0.67, with r2cv = 0.88 and scv = 0.71. The statistical parameters for the 413-member test set were essentially identical to those of leave-one-out cross-validation, indicating the generally robust nature of this model. He then applied ANN modeling to the same set of parameters in order to determine whether the prediction could be further improved via nonlinear dependencies. Using a 30-12-1 ANN, he obtained r2 = 0.94 and s = 0.47 for the training set, and r2 = 0.92 and s = 0.60 for the test set, both significantly better than the MLR model. The general applicability of the MLR and ANN models was further verified by application to a set of 21 compounds suggested by Yalkowsky,35 which has since become a benchmark for novel methods. The r2 and s values were 0.83 and 0.88 for the MLR model, and 0.91 and 0.63 for the ANN, in good agreement with their respective cross-validated and external test statistics. Both results were, however, significantly better than those derived from the previous model constructed using 160 training compounds (r2 = 0.68, s = 1.25). This indicates that a large and structurally diverse set of compounds is required to train a model capable of giving reasonable solubility predictions for structures of pharmaceutical and environmental interest, such as the set of compounds under consideration in this study.
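The recurring experimental design in these studies, a linear model compared against a small feed-forward network on the same descriptor matrix, can be sketched generically. In the snippet below, scikit-learn stands in for the original tools, and the descriptor matrix and logS values are random placeholders rather than real data.

```python
# Generic MLR-versus-ANN comparison on a descriptor matrix X and logS values y.
# X and y below are synthetic placeholders; substitute real descriptors/logS.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(1300, 30))                             # placeholder descriptors
y = X @ rng.normal(size=30) + 0.3 * rng.normal(size=1300)   # placeholder logS

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=413, random_state=0)

for name, model in [("MLR", LinearRegression()),
                    ("ANN", MLPRegressor(hidden_layer_sizes=(12,), max_iter=2000))]:
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    s = mean_squared_error(y_te, pred) ** 0.5  # rms error in log units
    print(f"{name}: r2 = {r2_score(y_te, pred):.2f}, s = {s:.2f}")
```

The hidden layer of 12 nodes mirrors the 30-12-1 architecture quoted above; on real data, the hidden-layer size should be tuned against a held-out set exactly as in the studies described.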

logP
The n-octanol/water partition coefficient of a chemical is the ratio of its concentration in n-octanol to that in the aqueous medium at equilibrium. The logarithm of this coefficient, logP, is perhaps the best-known descriptor in classical QSAR studies. The usefulness of this property stems from its correlation with the hydrophobicity of organic substances, which plays a key role in the modulation of many key ADME processes. Specifically, drug-membrane interactions, drug transport, biotransformation, distribution, accumulation, and protein and receptor binding are all related to drug hydrophobicity.36 The significance of logP is also captured by the rule of five,4 which states that a molecule will likely be poorly absorbed if its logP value exceeds five. Other researchers have also established links between logP and blood-brain barrier (BBB) penetration, a critical component in the realization of activity on the central nervous system (CNS).10,37-42 For CNS-active compounds, a logP of around 4-5 is usually required.

One of the earliest attempts to derive logP values by computational means was the f-constant method proposed by Rekker.43 Later, Leo and Hansch made a significant advance to this fragment-based approach that ultimately led to the successful development of the widely popular ClogP program.44 In essence, they assumed that the hydrophobicity contributions of different molecular fragments are additive, with parameter values calibrated by statistical analysis of a large experimental database. To estimate the logP value of a novel molecule, the chemical structure is first decomposed into smaller fragments that can be recognized by the program. The logP value of the molecule is simply the incremental sum of the parameter values of the composite fragments plus, in some cases, additional correction factors. The main advantage of a fragment-based method is that it tends to be very accurate. However, this approach suffers from two major problems. The first is that the molecular decomposition process is often very tricky. The second, and more serious, concerns missing parameter values when a given structure cannot be decomposed into fragments for which parameter values are available. Thus, it has become more fashionable to treat the molecule in its entirety, and to correlate its logP value with descriptors that are easy to calculate. Most published reports follow this scheme and are based on the use of MLR or ANN with some combination of electronic and steric properties. For example, molecular descriptors such as atomic charges, hydrogen-bond effects, molecular volumes, or surface areas have been considered in this role.

Schaper and Samitier proposed a logP method based on an ANN to determine the lipophilicity of unionized organic substances by recognition of structural features.45 Molecules were encoded by a connection table, where indicator variables were used to denote the presence or absence of specific atoms or bonds at different molecular positions. Eight different atom types (C, N, O, S, F, Cl, Br, and I) and four different bond types (single, double, triple, and resonant) were represented in their implementation. For compounds with up to 10 non-hydrogen atoms, a full description of the molecule required 260 variables (10 × 8 indicator variables for atoms and 45 × 4 for bonds). After preliminary analysis of their data set, which comprised 268 training and 50 test compounds, 147 non-zero descriptors were retained. They experimented with three different hidden-layer configurations (2, 3, and 4 hidden nodes) and suggested that an ANN with three hidden-layer neurons was the optimal choice, based on the prediction accuracy of the test set. The 147-3-1 NN yielded a Pearson correlation coefficient (rtrn) of 0.98 and a standard deviation (strn) of 0.25 between observed and calculated logP values for the training compounds. It is interesting to note that, despite the use of a large number of adjustable parameters (448), this particular NN showed little evidence of overfitting: the test set correlation coefficient (rtst) was 0.88 with a standard deviation (stst) of 0.66. The authors suggested that with a decrease in the ρ (rho) ratio (either an increase in data objects or a reduction in non-critical indicator variables), the predictivity of this type of NN system would increase further. The major shortcoming of this approach is that the molecular representation

is based on a connection matrix of indicator variables. Their study was limited to compounds containing no more than 10 non-hydrogen atoms. With a connection matrix, the number of input descriptors to the ANN increases quadratically with the maximum number of allowed atoms (N_MaxAtom) in the data set. For example, using the current scheme of 8 atom-types and 4 bond-types, the total number of descriptors is calculated by:
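N_descriptors = 8 · N_MaxAtom + 4 · [N_MaxAtom · (N_MaxAtom − 1) / 2]

For N_MaxAtom = 10 this gives 80 + 180 = 260 variables, in agreement with the count quoted above.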

If we were to apply this method to drug-sized molecules, which contain on average 20-25 non-hydrogen atoms, the ANN would need to deal with approximately 1,000 indicator variables. The introduction of new atom-types, such as phosphorus, would add further complexity to the molecular description. This raises the question: are all these descriptors necessary to produce a sound logP model? The answer is, most probably, no. We speculate that because molecular connectivity descriptors have no physical meaning, a large number of them is required to depict or correlate physicochemical properties. If physically meaningful descriptors are used, then one may obtain a more direct relationship from fewer predictors.

In a recent study, Breindl, Beck, and Clark applied semi-empirical methods to obtain a small set of quantum chemical descriptors to correlate the logP values of 105 organic molecules.46 They used the CONCORD program to convert 2D connectivity into standard 3D structures,47 whose geometries were further refined by energy minimization using SYBYL.48 The structures were then optimized using VAMP,49 a semi-empirical program. The input descriptors, which included both electrostatic and shape properties of the molecules, were derived from AM1 and PM3 calculations. Using MLR analysis, they derived a 10-term equation with an rtrn value of 0.94 and an rcv of 0.87. The choice of descriptors for this MLR model was further analyzed using an ANN. With a 12-4-1 back-propagation network, they improved the fit to the training set to rtrn = 0.96 and rcv = 0.93; furthermore, the neural network also seemed to perform consistently well on 18 test set molecules. Finally, this approach was validated with a larger data set of 1,085 compounds, of which 980 molecules were used as the training set and 105 were held back for testing. The best performance was obtained with a 16-25-1 network, which yielded rtrn = 0.97 and rcv = 0.93 with the AM1 parameters, and slightly worse results (rtrn = 0.94, rcv = 0.91, strn = 0.45) for the PM3 set. Again, the validity of the neural network result was confirmed by accurate test set predictions, which yielded impressive statistical parameters of rtst = 0.95 and stst = 0.53 for the AM1 result and, again, slightly worse values for the PM3 set (rtst = 0.91; stst = 0.67). The deficiency of the PM3 set was further analyzed, and it was concluded that there was a systematic problem with the

estimation of logP values for compounds with large alkyl chains. They reasoned that the large error was due to the uncertainty of obtaining appropriate conformations from gas-phase geometries under their setup. By systematically varying the values of one input descriptor while keeping the others fixed, they concluded that the logP values were predominantly influenced by three descriptors, namely the polarizability, the balance parameter, and the charge descriptor OSUM. Furthermore, a direct linear dependence between logP and polarizability was observed. On the other hand, the effects of the balance parameter and OSUM were shown to be highly non-linear with respect to logP. Overall, it seems that reliable logP models can be obtained using a few quantum chemical parameters, although the time-consuming nature of the calculation makes this approach less attractive for the analysis of large virtual libraries.

To address some of the limitations of the older QSPR approaches, Huuskonen and coworkers proposed the use of atom-type electrotopological state (E-state) indices for logP correlation.36 The E-state indices were first introduced by Kier and Hall,50,51 and have been validated in many QSAR and QSPR applications. They capture both the electronic and topological characteristics of an atomic center as well as its neighboring environment. In the implementation of E-state descriptors by Huuskonen et al, several new atom-types corresponding to contributions from amino, hydroxyl, and carbonyl groups in different bonding environments were introduced. This level of detail seems particularly relevant for the purpose of hydrophobicity modeling. For instance, it is known that an aromatic amino group is generally less basic than its aliphatic counterpart, which makes the former less likely to ionize and presumably more hydrophobic. The use of the extended parameter set was justified by a significant improvement in the cross-validated statistics of the 1,754-member training set. An MLR model using 34 basic E-state descriptors yielded a q2 value of 0.81 and an RMScv of 0.64, whereas with 41 extended parameters the corresponding values were 0.85 and 0.55. Huuskonen et al also applied an ANN to model higher-order nonlinearity between the input descriptors and logP. The final model, which had a 39-5-1 architecture, gave a q2 value of 0.90 and an RMScv of 0.46 for leave-one-out cross-validation. Further validation on three independent test sets yielded a similar RMS error (0.41), confirming the consistency of the predictions. The logP predictions of this new method were compared to those derived from commercial programs, and it was found that the method was as reliable as, or better than, the established methods even for the most complex structures.

In our opinion, the approach of Huuskonen and coworkers represents a method of choice for fast logP estimation, particularly for applications where both speed and accuracy are critical. Because the algorithm does not depend on the identification of suitable basis fragments, the method is generally applicable. Unlike methods that utilize quantum chemical descriptors, the calculation is genuinely high-throughput because E-state indices can be computed directly from SMILES line notation without costly structure optimization. Furthermore, this hydrophobicity model, which was

developed using 40 descriptors, can account for most, if not all, molecules of pharmaceutical interest. In contrast, a connection-table representation may require on the order of thousands of input values, which also increases the risk of chance correlation. The major limitation of the Huuskonen hydrophobicity method is the difficulty of chemical interpretation. This is due in part to the topological nature of the molecular description and in part to the use of nonlinear neural networks for property correlation. In particular, it is hard to isolate the individual contributions of the constituent functional groups to the overall hydrophobicity, or, conversely, to design modifications that will lead to a desirable property profile (i.e., the inverse QSPR problem). Another important issue that has not been addressed concerns the treatment of ionizable compounds, which may adopt distinct protonation states in different solvent environments (e.g., water and 1-octanol). Currently, this phenomenon is either ignored or assumed to be handled implicitly. Together with the inverse QSPR problem, the correct handling of such molecules will be a major question to be answered by the next generation of logP prediction systems.
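For readers who want to experiment with fast, atom-contribution (rather than fragment-decomposition) logP estimation of this general kind, RDKit ships the Wildman-Crippen method; note that this is not Huuskonen's E-state model, which is not publicly distributed.

```python
# High-throughput logP estimation directly from SMILES, with no structure
# optimization, using RDKit's Wildman-Crippen atom-contribution method.
from rdkit import Chem
from rdkit.Chem import Crippen

for smiles in ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]:
    mol = Chem.MolFromSmiles(smiles)
    print(smiles, round(Crippen.MolLogP(mol), 2))
```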

Bioavailability
Bioavailability is the percentage of a drug dose that reaches the central circulation in unaltered form from the site of administration. By definition, a drug that is administered intravenously has 100% bioavailability. By comparing the systemic drug levels achieved after intravenous injection with those from other drug delivery routes, an absolute bioavailability can be measured. Since oral administration is, for several reasons, the preferred route of drug delivery, a major challenge for biopharmaceutical research is to achieve high oral bioavailability. Several factors contribute to the reduction of oral bioavailability. First, drug molecules may bind to other substances present in the gastrointestinal tract, such as food constituents. The extent of this reduction may vary significantly with an individual's diet. Second, the drug may be poorly absorbed due to unfavorable physicochemical properties, such as those outlined in the Pfizer rule. Third, the drug may be metabolized as it passes through the gut wall or, more commonly, by the liver during first-pass metabolism.

Due to the complexity of the different processes affecting oral bioavailability, as well as the scarcity of data, the development of a generally applicable quantitative structure-bioavailability relationship (QSBR) has proven to be a formidable task. The most extensive QSBR study to date was reported by Yoshida and Topliss,52 who correlated the oral bioavailability of 232 structurally diverse drugs with their physicochemical and structural attributes. Specifically, they introduced a new parameter, ΔlogD, which is the difference between the logarithm of the distribution coefficient at pH 6.5 (intestine) and that at pH 7.4 (blood) for an ionizable species. The purpose of this descriptor was to account for the apparent

higher bioavailability observed for many acidic compounds. They also included 15 descriptors to encode structural motifs with well-known metabolic transformations, thereby accounting for the reduction of bioavailability due to the first-pass effect. Using these descriptors and a method termed ORMUCS (ordered multicategorical classification method using the simplex technique), they achieved an overall classification rate of 71% (97% within one class) when the compounds were separated into four classes according to bioavailability. Furthermore, 60% (95% within one class) of the 40 independent test compounds were also correctly classified using this linear QSAR equation. The result of this study indicates that it might be feasible to obtain reasonable estimates of oral bioavailability from molecular structures when physically and biologically meaningful descriptors are employed. In the following sections, we give a brief review of how neural network methods have been applied to the modeling of absorption and metabolism processes.
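For a monoprotic compound with known logP and pKa, a ΔlogD-style descriptor can be computed directly from the Henderson-Hasselbalch relation. The sketch below is our own illustration of the idea, not the ORMUCS implementation of Yoshida and Topliss.

```python
# Compute delta logD = logD(pH 6.5) - logD(pH 7.4) for a monoprotic acid or
# base, from logP and pKa via the Henderson-Hasselbalch relation.
import math

def log_d(log_p: float, pka: float, ph: float, acid: bool) -> float:
    """Distribution coefficient of an ionizable compound at a given pH."""
    exponent = (ph - pka) if acid else (pka - ph)
    return log_p - math.log10(1 + 10 ** exponent)

def delta_log_d(log_p: float, pka: float, acid: bool = True) -> float:
    return log_d(log_p, pka, 6.5, acid) - log_d(log_p, pka, 7.4, acid)

# A weak acid (pKa 4.5 here, values illustrative) is noticeably less ionized
# at intestinal pH than at blood pH, giving the positive delta that tracks
# the higher bioavailability of many acidic compounds noted above.
print(round(delta_log_d(log_p=2.0, pka=4.5, acid=True), 2))  # ~ +0.90
```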

Human Intestinal Absorption


The major hurdle in the development of a robust absorption model (and other models) is very often the lack of reliable experimental data. Experimental percent human intestinal absorption (%HIA) data generally have large variability and are usually skewed toward either very low or very high values, with only a few compounds in the intermediate range. Jurs and coworkers collated a data set of 86 compounds with measured %HIA from the literature.31 The data were divided into three groups: a training set of 67 compounds, a validation set of 9 compounds, and an external prediction set of 10 compounds. Using their in-house ADAPT program, 162 real-valued descriptors were generated that encoded the topological, electronic, and geometric characteristics of every structure. In addition, 566 binary descriptors were added to the set to indicate the presence of certain substructural fragments. Two approaches were applied to prune this initial set of 728 descriptors to a smaller pool of 127. First, descriptors with a variance below a user-defined minimum threshold were removed, to limit the influence of single-example peculiarities in the data set. Second, a correlation analysis was performed to discard potentially redundant descriptors. Application of a GA-NN hybrid system to this data set yielded a six-descriptor QSAR model. The mean absolute error was 6.7 %HIA units for the training set, 15.4 %HIA units for the validation set, and 11 %HIA units for the external prediction set. The six descriptors selected by the GA could elucidate the mechanism of intestinal absorption via passive transport, which is controlled by diffusion through lipid and aqueous media. Three descriptors are related to hydrogen-bonding capability, which reflects the lipophilic and lipophobic characteristics of the molecule. The fourth descriptor is the number of single bonds, which can be regarded as a measure of structural flexibility. The other two descriptors represent geometric properties providing information about molecular size. This set of descriptors, in our opinion, shares a certain similarity to the ones that

define the Pfizer rule. However, it is fair to point out that the great popularity of the Pfizer rule amongst medicinal chemists stems, in the words of Lipinski et al,4 from the fact that "the calculated parameters are very readily visualized structurally and are presented in a pattern recognition format." In contrast, the use of more complex 3D descriptors and neural network modeling may enhance prediction accuracy, although probably at the expense of diminished practical acceptance. Overall, the result of this initial attempt to model absorption is encouraging, and more work in this area is assured. Because in vivo data are generally more variable and expensive, there will be a strong emphasis on correlating oral absorption with in vitro permeability obtained from model systems such as Caco-2 cells or immobilized artificial membranes. In addition, future absorption models may include a molecular recognition component to handle compounds that are substrates for biological transporters.
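The two-stage descriptor pruning described above (a variance filter followed by a correlation filter) translates into a few lines of numpy; the thresholds below are illustrative, not the ones used by Jurs and coworkers.

```python
# Prune a descriptor matrix X (rows = compounds, columns = descriptors):
# stage 1 drops near-constant descriptors, stage 2 greedily drops any
# descriptor highly correlated with one already kept.
import numpy as np

def prune_descriptors(X, min_var=1e-4, max_corr=0.95):
    keep = np.where(X.var(axis=0) > min_var)[0]          # stage 1: variance filter
    corr = np.abs(np.corrcoef(X[:, keep], rowvar=False)) # pairwise |r| of survivors
    selected = []
    for i in range(len(keep)):                           # stage 2: correlation filter
        if all(corr[i, j] < max_corr for j in selected):
            selected.append(i)
    return keep[selected]

# Usage: cols = prune_descriptors(X); X_pruned = X[:, cols]
```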

Drug Metabolism
Drug metabolism refers to the enzymatic biotransformations to which drug molecules are subjected in the body. This is an important defensive mechanism of our bodies against potential toxins, which are generally lipophilic and are converted to more soluble derivatives that can be excreted more readily. Most drug metabolism occurs in the liver, where the degradation of drugs is catalyzed by a class of enzymes called hepatic microsomal enzymes. This constitutes the first-pass effect, which can limit a drug's systemic oral bioavailability. In the past, relatively few researchers paid special attention to drug clearance until a lead molecule had advanced nearly to the stage of clinical candidate selection. More recently, this attitude has changed as the need for pharmacokinetic data for the purposes of correct dose calibration has been recognized. Thus, there is considerable interest in the development of in vitro or in vivo physiological models to predict hepatic metabolic clearance during the lead optimization stage.

Lavé and coworkers at Roche attempted to correlate human pharmacokinetic data with in vitro and in vivo metabolic data.53 They collated experimental data for 22 structurally diverse literature and in-house compounds. The in vitro metabolic data were derived from the metabolic stability of the substances in hepatocytes isolated from rats, dogs, and humans, and the in vivo pharmacokinetic data were measured after intravenous administration in the same species. All in vitro data, as well as the in vivo data for rats and dogs, were used in combination to predict the human in vivo data. Their statistical analysis included multiple linear regression (MLR), principal component regression (PCR), partial least squares (PLS) regression, and artificial neural networks (ANN). The results of their study are summarized in Table 3. The major conclusion from this study is that the strongest predictors of human in vivo data were the human and rat hepatocyte data; the in vivo clearance data from either rats or dogs did not contribute significantly to any statistical model.

One possible explanation is that the results from in vivo experiments are generally more variable and are therefore noisier when used as predictors. It is also clear that all statistical methods (MLR, PCR, PLS, and ANN) appeared to work satisfactorily for this data set; in fact, from a statistical viewpoint the results are practically identical. It is interesting to note that the non-linear mapping capability of a neural network was not required in this case, probably because of the already strong linear correlation between the human in vivo data and the human hepatocyte data (r = 0.88) and the rat hepatocyte data (r = 0.81). Overall, despite the limitation of a modest data set size, the results of this study provide further support for early in vitro screening of drug candidates, because satisfactory human pharmacokinetic predictions can be obtained through mathematical modeling of these less expensive parameters. It is also fair to point out that the accuracy of their model comes at a price: one must first synthesize a compound and determine the appropriate biological parameters before a prediction can be made.

Table 3 Accuracy of the statistical models for human in vivo clearance prediction53

Modela          No. of termsb   r2     q2
MLR             5               0.84   0.74
MLR             2               0.84   0.79
PCR             2               0.85   0.79
PLS             2               0.86   0.77
PLS             1               0.83   0.79
PLS             1               0.83   0.79
NN_linear       5               0.86   0.79
NN_sigmoidal    3               0.88   0.77
NN_sigmoidal    2               0.88   0.77

a The original table also indicated which descriptors each model used: r_h = rat hepatocyte; d_h = dog hepatocyte; h_h = human hepatocyte; r_a = rat in vivo data; d_a = dog in vivo data (the per-model selections are not recoverable from this copy).
b MLR: number of descriptors; PCR: number of principal components; PLS: number of components; ANN: number of descriptors.

To overcome this problem, some researchers prefer to focus on theoretical descriptors that can be computed from the molecular structure. Recently, Quiñones et al tried to correlate drug half-life values with physicochemical and topological descriptors derived from a series of 30 structurally diverse antihistamines.54 These descriptors were used as input values to an ANN, which

was trained against the experimental half-life of a drug, i.e., the time it takes for one-half of a standard dose to be eliminated from the body. Initially, they tried to formulate a model using seven physicochemical descriptors: logP, pKa, molecular weight, molar refractivity, molar volume, parachor, and polarizability. However, this did not lead to a statistically significant model. They then investigated the use of the CODES descriptors, which capture the atomic character as well as the neighboring chemical environment of each individual atom. In their study, they picked four CODES descriptors that corresponded to a common chemical substructure present in all 30 antihistamines. Two neural network configurations, one with five hidden nodes and another with six, were tested. The cross-validated predictions of their model were very encouraging, and were mostly consistent with the range of experimental half-life values (Fig. 3). A test set of five other antihistamines was used to evaluate the two ANN models. Again, there was good agreement between the experimental and calculated half-life values, indicating the general robustness of their models, at least within the domain of biogenic amines.

Figure 3
Calculated half-life values from a neural network versus experimental values. The cross-validated predictions for the 30 training compounds are shown as open circles; the predictions for the 5 test set compounds are shown as filled squares. The experimental values for some compounds were reported as a range and are plotted accordingly. The diagonal line represents a perfect correlation between experimental and calculated half-life values.

Both approaches described above have their strengths and limitations. From a virtual screening perspective, the approach of Quiñones et al is more attractive, since their model does not rely on any experimental parameters. However, the association between the four CODES descriptors and metabolism is unclear, and the current model is relevant only to a specific class of compounds that share a common substructure. In this regard, it is our opinion that the set of structural descriptors used by Yoshida and Topliss52 is particularly informative, because they represent some well-characterized metabolic liabilities. Nevertheless, drug metabolism is immensely complex, and biotransformations are catalyzed by many enzymes, some of which may still be unknown to us. As different enzymes have different substrate specificities according to their structural requirements, it will be a challenging task to formulate a simple theoretical model that is generally applicable to many diverse chemical classes. On the other hand, the method proposed by Lavé is likely to be more general, because their approach relies on experimental parameters and is thus less dependent on the particular metabolic pathways involved.53 It is conceivable that we will see a hybrid of the two processes in the future. A panel of lead compounds with similar structures could be synthesized and tested in vitro, which is generally less expensive and time-consuming than in vivo animal testing. These in vitro data would serve as calibration data for correlation with a set of relevant theoretical descriptors obtained directly from molecular structures. Because of the strong relationship between in vitro and in vivo data, the resulting QSAR model could be used to predict human pharmacokinetic clearance for compounds within the scope of the original lead class. Further research to establish the relationship between in vivo data and in vitro assays on tissue cultures from major metabolic sites (e.g., liver, kidneys, lungs, gastrointestinal tract) appears to be justified.
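The calibration step of such a hybrid scheme amounts to a simple regression of in vivo clearance on in vitro measurements. The sketch below uses placeholder data and scikit-learn, and is meant only to illustrate the workflow, not to reproduce the Lavé model.

```python
# Calibrate a linear model of human in vivo clearance against in vitro
# hepatocyte data for a small lead series, then predict for a new member.
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder data: hepatocyte intrinsic clearance (human, rat) for 22
# compounds, plus the corresponding measured human in vivo clearance.
rng = np.random.default_rng(1)
X = rng.uniform(1, 50, size=(22, 2))                    # [human_hep, rat_hep]
y = 0.6 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 2, 22)

model = LinearRegression().fit(X, y)
print("r^2 on calibration data:", round(model.score(X, y), 2))
print("predicted clearance for a new compound:", model.predict([[20.0, 15.0]]))
```

The choice of human and rat hepatocyte data as inputs mirrors the conclusion quoted above that these were the strongest predictors of human in vivo clearance.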

CNS Activity
The nervous system of higher organisms is divided into a central nervous system (CNS), comprising the brain and spinal cord, and a peripheral nervous system (PNS), encompassing the remaining nervous tissue in the body. The CNS coordinates our bodily functions, including the collection of information from the environment by means of receptors, the integration of such signals, the storage of information as memory, and the generation of adaptive patterns of behavior. Many factors, such as infectious diseases, hormonal disorders, and neurodegenerative disorders, can disrupt the balance of this extremely complex system, leading to the manifestation of CNS-related diseases. These include depression, anxiety, sleep disorders, eating disorders, meningitis, and Alzheimer's and Parkinson's diseases. The prevalence of such diseases in the modern world is reflected in part by the continuous growth of the market for CNS drugs, which is now the third-highest-selling therapeutic category behind cardiovascular and metabolic products, and is predicted to reach over $60 billion worldwide by 2002.55

These drugs, which have the brain as their site of action, must cross the barrier between the brain capillaries and brain tissue (the blood-brain barrier, or BBB). This barrier helps to protect the brain from sudden chemical changes and allows only a tiny fraction of a dose of most drugs to penetrate into the cerebrospinal fluid and enter the brain. Knowledge of the extent of drug penetration through the BBB is of great importance in drug discovery, not only for new CNS drugs but also for peripherally acting drugs whose exposure to the brain should be limited in order to minimize the potential risk of CNS-related side effects. It is believed that molecules capable of BBB penetration, the extent of which is often quantified by logBB (the logarithm of the ratio of the steady-state concentration of drug in brain to that in blood), share certain common physicochemical characteristics. These attributes include size, lipophilicity, hydrogen-bonding propensity, charge, and conformation. About 20 years ago, Levin reported a strong relationship between rat brain penetration and molecular weight for drugs with MW less than 400.37 In a later study, Young et al observed that logBB could be related to the difference between experimental logPoctanol/water and logPcyclohexane/water values for a set of histamine H2 antagonists.56 This provided a rationale for improving the blood-brain penetration of new designs by reducing the overall hydrogen-bonding propensity. The earliest correlative logBB study involving theoretical descriptors was that of Kansy and van de Waterbeemd, who reported a two-descriptor MLR model using polar surface area (PSA) and molecular volume for a small set of compounds.38 Although their model seemed to work well for the 20 compounds within the training set, predictions for other compounds were rather unreliable, presumably due to erroneous extrapolation.57 To overcome this problem, Abraham and coworkers examined a larger data set of 65 compounds and formulated QSAR models based on excess molar refraction, molecular volume, dipolarity/polarizability, and hydrogen-bonding parameters, as well as the experimental logP value.39 Later, Lombardo et al performed semi-empirical calculations on a subset of 57 compounds selected from the Abraham training set and derived a solvation free energy parameter that correlated well with the logBB values.58 Norinder and coworkers developed PLS models of logBB using a set of MolSurf parameters, which provide information on physicochemical properties including lipophilicity, polarity, polarizability, and hydrogen bonding.40 More recently, Luco applied the PLS method using topological and constitutional descriptors (e.g., element counts, the number of nitrogen atoms, and indicator variables for individual atoms or molecular fragments) to correlate logBB.59 In the past two years, several research groups have revisited the use of PSA and logP in attempts to create models that are easy to interpret and also generally applicable.

These include Clark's MLR models,42 which are two-descriptor models based on PSA and logP values computed using different methods; the Österberg PLS model, which considered logP and a simple count of hydrogen bond donor and acceptor atoms;60 and the Feher MLR model,10 which utilized logP, polar surface area, and the number of solvent-accessible hydrogen bond acceptors. Most recently, Keserü and Molnár reported a significant correlation between logBB and the solvation free energy derived from generalized Born/surface area (GB/SA) continuum calculations, establishing an efficient means to predict CNS penetration in terms of thermodynamic properties, whose utility had previously been limited by high computational cost.61 The statistical parameters reported by the various studies discussed above are shown in Table 4. The following general comments can be made on these studies:

Table 4 Summary of representative linear logBB models that have appeared in the literature
LogBB Model      N    r2    s     Model: Descriptors^a
Kansy38          20   0.70  0.45  MLR: PSA, Mol_vol
Abraham I39      57   0.91  0.20  MLR: R2, π2H, Σα2H, Σβ2H, Vx
Abraham II39     49   0.90  0.20  MLR: logPoct, Σα2H, Σβ2H
Lombardo58       55   0.67  0.41  LR: ΔGw0
Norinder I40     28   0.86  0.31  PLS: MolSurf parameters
Norinder II40    56   0.78  0.31  PLS: MolSurf parameters
Luco59           58   0.85  0.32  PLS: topological, constitutional
Kelder62         45   0.84  -     LR: dPSA
Clark I42        55   0.79  0.35  MLR: PSA, ClogP
Clark II42       55   0.77  0.37  MLR: PSA, MlogP
Österberg I60    69   0.76  0.38  PLS: #HBAo, #HBAn, #HBD, logP
Österberg II60   45   0.72  0.49  PLS: #HBAo, #HBAn, #HBD, logP
Feher10          61   0.73  0.42  MLR: nacc,solv, logP, Apol
Keserü61         55   0.72  0.37  LR: ΔGsolv

^a Molecular descriptors: polar surface area (PSA, Apol); dynamic polar surface area (dPSA); excess molar refraction (R2); dipolarity/polarizability (π2H); hydrogen-bond acidity (Σα2H); hydrogen-bond basicity (Σβ2H); characteristic volume of McGowan (Vx); experimental logP (logPoct); free energy of solvation in water (ΔGw0, ΔGsolv); calculated logP (ClogP, MlogP, logP); number of hydrogen-bond-accepting oxygen and nitrogen atoms (#HBAo, #HBAn); number of hydrogen-bond donors (#HBD); and number of solvent-accessible hydrogen-bond acceptors (nacc,solv).

1. most models were developed from an analysis of a core set of 50 structures introduced by Young et al and Abraham et al;56,39
2. the various linear models (either MLR or PLS) report r2 values in the range of 0.7 to 0.9 and standard errors of 0.3 to 0.4 log units; this accuracy is acceptable given that most data sets have logBB values that span over 3 log units;
3. the descriptors used can be categorized into the following classes: hydrophilic (PSA and its variants, hydrogen bond propensities), hydrophobic (calculated or measured logP values), solvation free energy (which arguably characterizes both the hydrophilic and hydrophobic properties of a molecule), or topological indices (which encode, perhaps indirectly, the above physicochemical properties);
4. with the exception of the study of Keserü and Molnár, few models have been validated extensively on a sufficiently large test set, probably owing to the scarcity of reliable data.

The results from the linear models indicate that it is feasible to estimate the blood-brain penetration of a candidate using physicochemical parameters computed from its molecular structure. The major drawback of the above models is that they were developed from limited data, and therefore their general applicability may be questionable.
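As an illustration of how cheaply such two-descriptor models can be evaluated in a virtual screening setting, the sketch below computes a Clark-type logBB estimate with RDKit. The TPSA and Crippen logP implementations stand in for the PSA and ClogP/MlogP descriptors used in the original studies, and the coefficients are illustrative values of the reported magnitude rather than an exact reproduction of any published equation.

```python
# Sketch of a Clark-type two-descriptor logBB estimate from PSA and logP.
# RDKit's TPSA and Crippen logP stand in for the descriptors used in the
# original models, and the coefficients below are illustrative values of the
# reported magnitude; consult the original papers before relying on them.
from rdkit import Chem
from rdkit.Chem import Crippen, rdMolDescriptors

def estimate_logbb(smiles, a=-0.015, b=0.152, c=0.139):
    """logBB ~ a*PSA + b*logP + c (two-descriptor MLR form)."""
    mol = Chem.MolFromSmiles(smiles)
    psa = rdMolDescriptors.CalcTPSA(mol)
    logp = Crippen.MolLogP(mol)
    return a * psa + b * logp + c

# Caffeine: a polar molecule expected to show only modest brain penetration
print(estimate_logbb("Cn1cnc2c1c(=O)n(C)c(=O)n2C"))
```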

A logical next step, therefore, was to increase the diversity of the training set; a larger data set would also safeguard to some degree against model overfitting. This was the strategy followed by Ajay and coworkers at Vertex,41 who developed a Bayesian neural network (BNN) to predict BBB penetration using knowledge acquired from a large collection (about 65,000 molecules) of supposedly CNS-active and CNS-inactive compounds. To construct this data set, they selected compounds from the CMC and MDDR databases based on therapeutic indication. In their initial classification, compounds within the following activity classes were defined as CNS-active: anxiolytic, antipsychotic, neuronal injury inhibitor, neuroleptic, neurotropic, antidepressant, non-opioid analgesic, anticonvulsant, antimigraine, cerebral anti-ischemic, opioid analgesic, antiparkinsonian, sedative, hypnotic, central stimulant, antagonist to narcotics, centrally acting agent, nootropic agent, neurologic agent, and epileptic. Compounds that did not fall into these categories were considered CNS-inactive, an assumption that was later shown to be invalid. Based on this classification scheme, there were over 15,000 CNS-active molecules and over 50,000 inactive ones. To minimize the risk of chance correlation, they elected to start with only a few molecular descriptors. The seven one-dimensional descriptors adopted in their earlier drug-likeness prediction system6 were also used in this work: molecular weight (MW), number of hydrogen bond donors (Don), number of hydrogen bond acceptors (Acc), number of rotatable bonds (Rot), 2κα (which indicates the degree of branching of a molecule), aromatic density (AR), and MlogP. The authors believed that this set of descriptors was related to the physical attributes that correlate with BBB penetration, thereby allowing the neural network to discriminate between CNS-active and -inactive compounds.

Using a BNN with just the seven physicochemical descriptors, they achieved a prediction accuracy of 75% on active compounds and 65% on inactive ones. Further, they analyzed the false positives among the supposedly inactive CMC compounds and discovered that a significant portion of them actually had no information in the activity class field (i.e., their inactivity labeling was somewhat dubious). Interestingly, most of the remaining false positives belonged to the following categories: tranquilizer, antivertigo, anorexic, narcotic antagonist, serotonin antagonist, anti-anxiety, sleep enhancer, sigma opioid antagonist, antiemetic, antinauseant, antispasmodic, and anticholinergic. Thus, it is evident that there were significant omissions of therapeutic indication in the initial CNS activity definition, and furthermore, that the BNN made sound generalizations that led to the correct identification of other known CNS agents. Additional validation of the method on a database of 275 compounds yielded prediction accuracies of 93% and 72% for the CNS-active and -inactive compounds, respectively. The BNN method also ranked the relative importance of the seven descriptors in the CNS model, namely: Acc > AR ≈ Don ≈ 2κα > MW ≈ MlogP > Rot. Ajay and coworkers concluded that CNS activity was negatively correlated with MW, 2κα, Rot, and Acc, and positively correlated with AR, Don, and MlogP, a result consistent with known attributes of CNS drugs. They found that the addition of 166 2D ISIS keys to the seven 1D descriptors yielded a significant improvement, which confirmed their earlier drug-likeness prediction result.6 Using the combined 1D and 2D descriptors, the BNN yielded prediction accuracies of 81% on the active compounds and 78% on the inactive ones.

The utility of this BNN as a filter in designing a virtual library against CNS targets was subsequently demonstrated. As for any filter designed to handle large compound collections, the principal consideration was the throughput of the calculation. With their in-house implementation, they achieved a throughput of almost one million compounds per day on a single processor (195 MHz R10000). The CNS activity filter was tested on a large virtual library consisting of about one million molecules constructed from 100 drug-like scaffolds63 combined with the 300 most common side chains. Two types of filters were applied to prune this library: the first was substructure-based, to exclude compounds containing reactive functional groups; the second was property-based, to discard molecules with undesirable physicochemical properties, including high MW, high MlogP, and, in this example, low predicted CNS activity. From the remaining compounds, they identified several classes of molecules that have favorable BBB penetration properties and are also particularly amenable to combinatorial library synthesis. As a result, such libraries can be regarded as privileged compound classes for addressing CNS targets.
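A minimal sketch of such a two-stage pruning step is shown below. The SMARTS patterns and property cutoffs are illustrative assumptions, not the actual rules used by the Vertex group, and Crippen logP stands in for MlogP.

```python
# Minimal sketch of the two-stage library filter described above; the SMARTS
# patterns and property cutoffs are illustrative assumptions, not the actual
# rules used by the Vertex group. Requires RDKit.
from rdkit import Chem
from rdkit.Chem import Descriptors, Crippen

# Stage 1 patterns: a few example "reactive group" alerts (hypothetical list)
REACTIVE_SMARTS = [
    "[CX3](=O)[Cl,Br,I]",   # acyl halide
    "C1OC1",                # epoxide
    "[N+](=O)[O-]",         # nitro group (often flagged)
]
REACTIVE = [Chem.MolFromSmarts(s) for s in REACTIVE_SMARTS]

def passes_filters(smiles, mw_max=500.0, logp_max=5.0):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False
    # Stage 1: reject reactive functional groups
    if any(mol.HasSubstructMatch(patt) for patt in REACTIVE):
        return False
    # Stage 2: reject undesirable physicochemical properties
    # (Crippen logP is used here as a stand-in for MlogP)
    if Descriptors.MolWt(mol) > mw_max:
        return False
    if Crippen.MolLogP(mol) > logp_max:
        return False
    return True

library = ["CCOC(=O)c1ccccc1", "ClC(=O)c1ccccc1"]   # toy examples
print([smi for smi in library if passes_filters(smi)])
```

In a real enumeration, a predicted-CNS-activity cutoff would be applied as a third stage, exactly as described for the virtual library above.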


Toxicity
No substance is free of possible harmful effects. Of the tens of thousands of current commercial chemical products, only perhaps hundreds have been extensively characterized and evaluated for their safety and potential toxicity.64,65 There is strong evidence linking pesticides and industrial byproducts to numerous health problems, including birth defects, cancer, digestive disorders, mutagenicity, tumorigenicity, chronic headaches, fatigue, and irritability. The effect of widespread use of toxic substances on the environment and public health can be devastating. For example, each year about 10,000 U.S. tons of mercury is mined for industrial use, half of which is lost to the environment. The most notorious episode of methylmercury poisoning in history occurred in the 1950s at Minamata Bay in Japan, where mercury discharged by a chemical factory was ingested and accumulated by fish, and ultimately by the people of the community. Many people developed a severe neurological disorder that later became known as Minamata disease. Despite tremendous effort to restore Minamata Bay, it was not until 1997, 50 years later, that the bay was declared mercury-free again. Some biologically persistent chemicals are introduced into the environment in the form of insecticides and pesticides. One of the best-known compounds in this category is DDT (dichlorodiphenyltrichloroethane), which was used to protect crops from insect pests and is also credited with the marked decline of insect-vectored human diseases such as malaria and yellow fever in Asia, Africa, and South America. However, there is now strong evidence that DDT contamination has contributed to the marked decline of many animal species via bioaccumulation through food chains. Other chemical agents that can cause great harm even in small quantities are, ironically, medicines, whose function is to alleviate the state of disease. For example, anticancer drugs are often highly toxic because they can disrupt the cellular metabolism of normal tissue as well as that of tumors. Two government agencies are responsible for regulating the release of potentially hazardous substances in the United States. The Environmental Protection Agency (EPA) seeks to control pollution caused by pesticides, toxic substances, noise, and radiation. The U.S. Food and Drug Administration (FDA) oversees the safety of drugs, foods, cosmetics, and medical devices. The FDA issues regulations that make the drug review process more stringent, requiring that new drugs be proven effective as well as safe. Because of the potential economic, environmental, and health impacts, finding reliable means to assess chemical toxicity is of enormous interest to both the pharmaceutical and agricultural industries. A large-scale ecological assessment of toxicity for a new agricultural chemical is very expensive and often not feasible. Likewise, traditional in vivo toxicity screening involves animal testing, which is slow and costly, and therefore unsuitable for the mass screening of many potential drug candidates.

A remedy to this severe problem is the establishment of standardized in vitro or in vivo tests on model systems relevant to safety assessment. To a large extent, it is the increasing availability of experimental data that has facilitated the ongoing development of computational toxicology, also known as in silico toxicology, ComTox, or e-Tox.66 The major goal of this emerging technology is to analyze toxicological data and create SAR models that can provide toxicity predictions on the basis of molecular structures alone. The work in this field to date can be categorized into two main approaches. The first is the expert system, based on a set of rules derived from prior knowledge of similar chemical classes or substructures. Depending on whether the rules are inferred by human experts or extracted by artificial intelligence algorithms, the system is referred to as either a human expert system or an artificial expert system (see Chapter 2).67 When a query structure is presented for assessment, the rules associated with the structure are identified from the knowledge base to invoke a decision, often together with a possible mechanism of toxic action. The commercial programs DEREK68 and ONCOLOGIC69 represent the most advanced systems in this category. The major criticism of rule-based systems is that they tend to give false-positive predictions.67 The second is a statistical approach that uses correlative algorithms to determine quantitative structure-toxicity relationships (QSTR) from a large, heterogeneous source of chemical structures with harmonized assay data. Two well-known toxicity prediction programs, TOPKAT (Oxford Molecular Inc.) and MCASE (MultiCASE Inc.), are based on this method.70,71 Briefly, TOPKAT relies on physicochemical descriptors, (size-corrected) E-state indices, and topological shape indices to characterize the physical attributes of a molecule. MCASE, on the other hand, reduces a molecule to its constituent fragments (from 2 to 10 atoms in size) and treats them as fundamental descriptors. The fragments, or biophores, associated with most of the toxic chemicals in the database are identified, and potential toxicity is predicted by summing the individual fragment contributions. Recent QSTR developments in computational pharmacotoxicology, including case studies of aquatic toxicity, mutagenicity, and carcinogenicity, will be discussed in the next Section.
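The fragment-contribution idea can be illustrated with a toy scorer: count predefined fragments in a query molecule and sum fitted contributions. The fragment set and weights below are invented for illustration and are not MCASE's actual biophores or coefficients.

```python
# Toy sketch of fragment-contribution toxicity scoring in the spirit of MCASE;
# the fragment set and weights are invented for illustration, not MCASE's
# actual biophores. Requires RDKit.
from rdkit import Chem

# Hypothetical biophore fragments with hypothetical fitted contributions
FRAGMENT_WEIGHTS = {
    "[N+](=O)[O-]": 0.9,    # nitro group
    "c1ccccc1O":    0.4,    # phenol
    "[Cl]":         0.2,    # organochlorine
}

def toxicity_score(smiles):
    """Sum fragment contributions over all matches in the molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    score = 0.0
    for smarts, weight in FRAGMENT_WEIGHTS.items():
        patt = Chem.MolFromSmarts(smarts)
        score += weight * len(mol.GetSubstructMatches(patt))
    return score

print(toxicity_score("Oc1ccc(Cl)cc1[N+](=O)[O-]"))  # chloro-nitro-phenol example
```

A real system would fit the weights against a harmonized assay database and, as MCASE does, also report which fragments triggered the prediction.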

Aquatic Toxicity
Aquatic toxicity is one of the key toxicological indicators commonly used to assess the potential risk posed to human and environmental health by chemical substances. A number of marine and freshwater organisms, such as Pimephales promelas, Tetrahymena pyriformis, Daphnia magna, Daphnia pulex and Ceriodaphnia dubia, have become ecotoxicity systems of choice because of their fast growth rates under simple and inexpensive culture conditions. More significantly, the establishment of standard testing protocols for these species makes comparisons of inter-laboratory results meaningful.

The most comprehensive resource for aquatic toxicity is the Aquatic Toxicity Information Retrieval (AQUIRE) database maintained by the EPA, which contains data on 3,000 organisms and 6,000 environmental chemicals, extracted from over 7,000 publications. These abundant experimental data provide a foundation for some of the multivariate QSTR modeling work described here. Basak and coworkers proposed a new approach, called hierarchical QSAR, for predicting the acute aquatic toxicity (LC50) of a set of benzene derivatives.72 The data set included benzene and 68 substituted benzene derivatives bearing chloro, bromo, nitro, methyl, methoxy, hydroxy, or amino substituents. The toxicity tests for these compounds were performed against Pimephales promelas (fathead minnow), with pLC50 values ranging from 3.04 to 6.37. Ninety-five descriptors belonging to four major categories were computed to characterize each molecule:

- 35 topostructural indices (TSI), which encode information on the molecular graph without any information on the chemical nature of the atoms or bonds;
- 51 topochemical indices (TCI), which were derived from the molecular graph weighted by relevant chemical or physical atom-type properties;
- 3 geometric indices (3D), which carried three-dimensional information on the geometry of the molecule;
- 6 quantum chemical parameters (QCI), which were calculated using the MOPAC program (HOMO, LUMO, heat of formation, etc.).

To reduce the number of model parameters, variables from the TSI and TCI categories were clustered based on their inter-correlation. The index most correlated with its cluster was automatically selected, along with other poorly correlated (r2 < 0.7) indices that helped to explain the data variance. The clustering procedure eliminated most of the descriptors from these two groups; only five TSI and nine TCI descriptors were retained. All nine descriptors in the 3D and QCI categories were kept, because they were relatively few in number and, in addition, were poorly correlated amongst themselves. As a result of this pre-processing, a group of 23 molecular descriptors was used for further statistical analysis. In the next step, the authors followed an incremental approach, hierarchical QSAR, to build a linear model based on the reduced descriptor set. First, an exhaustive enumeration of the five TSI descriptors yielded a four-parameter linear regression model with a cross-validated r2 value (rcv2) of 0.37. It was apparent that the TSI parameters alone could not produce a satisfactory model. Next, they added the nine TCI descriptors to the four TSI features selected in the first model, and again performed an exhaustive search over all combinations of linear models. This led to an improved four-descriptor model, yielding rcv2 = 0.75. Repeating this procedure to include the 3D and QCI descriptors resulted in a four-descriptor and a seven-descriptor model that gave rcv2 values of 0.76 and 0.83, respectively. For comparison, they performed variable clustering on all 95 descriptors and selected seven descriptors to build another linear model.

It was noteworthy that this model gave essentially the same predictivity (rcv2 = 0.83) as the (different) selection based on the hierarchical procedure (see the discussion below). Basak and coworkers also explored nonlinear models using neural networks. Owing to the high computational expense, instead of an exhaustive enumeration at every tier, they used all descriptors in each of the four categories at each level of the hierarchy. The cross-validation results for the four models are shown in Table 5. The trend of improving predictive performance paralleled that of the regression models; inclusion of the TCI descriptors appeared to yield the biggest improvement in rcv2. The neural network results also demonstrated that a smaller subset of descriptors can yield results similar to those of the full 95-descriptor ANN model. Perhaps the most surprising result was that the linear regression model consistently outperformed the ANN in this study, at least in terms of cross-validation statistics. We think one possible reason is overfitting of the data by the neural network, which hampered its ability to generalize from the training patterns. One of the ANN models reported by Basak et al had a 95-15-1 configuration, meaning that over 1,400 adjustable parameters were available to map a data set that was modest in size.69 Consequently, the predictions were likely a result of memorization rather than generalization, leading to a deterioration of predictive performance. This example clearly demonstrates the usefulness of comparing different models on the same problem.
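The preprocessing step of this study lends itself to a compact illustration. The sketch below performs a greedy correlation-based descriptor clustering (using the r2 < 0.7 threshold mentioned above) followed by a leave-one-out cross-validated linear fit; the data are random placeholders, so the resulting statistic is meaningless until real descriptors and pLC50 values are substituted, and the greedy scheme is only a loose stand-in for the actual clustering algorithm used.

```python
# Minimal sketch of correlation-based descriptor clustering followed by a
# cross-validated linear fit, loosely following the preprocessing described
# above; the data are random placeholders. Requires numpy and scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(69, 20))            # 69 compounds x 20 raw descriptors
y = rng.normal(size=69)                  # placeholder pLC50 values

# Greedy clustering: keep a descriptor only if its squared correlation with
# every previously kept descriptor is below the 0.7 threshold from the text
corr2 = np.corrcoef(X, rowvar=False) ** 2
kept = []
for j in range(X.shape[1]):
    if all(corr2[j, k] < 0.7 for k in kept):
        kept.append(j)

# Leave-one-out cross-validation of an MLR model on the reduced descriptor set
model = LinearRegression()
scores = cross_val_score(model, X[:, kept], y, cv=LeaveOneOut(),
                         scoring="neg_mean_squared_error")
press = -scores.sum()                     # predictive residual sum of squares
r2cv = 1.0 - press / ((y - y.mean()) ** 2).sum()
print(f"{len(kept)} descriptors kept, cross-validated r2 = {r2cv:.2f}")
```

The hierarchical variant would repeat the fit tier by tier (TSI, then TSI + TCI, and so on), which is what exposes the contribution of each descriptor class.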

Table 5 Statistical results of the hierarchical QSAR models reported by Basak et al72
Model^a                  Artificial neural network   Multiple linear regression
                         # Desc   rcv2    s          # Desc   rcv2    s
TSI                       5       0.30    0.63        4       0.37    0.63
TSI + TCI                14       0.62    0.47        4       0.75    0.39
TSI + TCI + 3D           17       0.66    0.44        4       0.76    0.38
TSI + TCI + 3D + QC      23       0.77    0.36        7       0.83    0.34
All 95 indices           95       0.76    0.37        7       0.83    0.34

^a Descriptor classes: topostructural indices (TSI); topochemical indices (TCI); geometric indices (3D); and quantum chemical parameters (QC).

Based on the regression studies, the two seven-descriptor models yielded essentially identical statistics. This begs the question: what is the real merit of the hierarchical approach versus the traditional "kitchen sink" approach? From a computational point of view, the latter is probably simpler to execute. We believe that the biggest benefit of the hierarchical approach is that it probes the dependence of activity prediction on a particular descriptor class. In their study, Basak et al demonstrated that the use of TCI descriptors substantially increased the efficacy of both the neural network and the regression model, a piece of information that would not have been easily deciphered from the single-cluster approach.

However, to make this unequivocal, one should examine all combinations of cluster-based QSAR models. In addition, there is the issue of a possible order dependence in the build-up of the hierarchical layers: had TCI been used as the first tier, the final descriptor choice would likely differ from the reported set.

Recently, Niculescu et al published an in-depth study of the modeling of chemical toxicity to Tetrahymena pyriformis using molecular fragments and a probabilistic neural network (PNN).65 They obtained 48-hour IGC50 sublethal toxicity data for T. pyriformis for 825 compounds from the TerraTox database.73 This group of compounds covered a range of chemical classes, many with known mechanisms of toxic action, including narcotics, oxidative phosphorylation uncouplers, acetylcholinesterase inhibitors, respiratory inhibitors, and reactive electrophiles.74 Instead of physicochemical descriptors such as logP, Niculescu et al preferred functional group descriptors representing specific molecular fragments. Each structural descriptor encoded the number of occurrences of a specific substructural or atomic feature, and a total of 30 descriptors were used. They saw two major advantages in this molecular representation. First, substructural descriptors are considerably cheaper to calculate than most physicochemical or 3D descriptors, and are not subject to common errors in their generation (e.g., missing fragments or atom types in logP calculations, or uncertainty in the 3D conformation). Second, they argued that the use of substructure-based descriptors would lead to a more general QSAR model. The authors suggested that one factor contributing to their superior results relative to other toxicity prediction systems (such as ASTER, CNN, ECOSAR, OASIS, and TOPKAT) was that their model did not rely on logP as an independent variable. The 825-compound data set was partitioned into a larger set of 750 compounds, which was used to derive the quantitative models, and a smaller set of 75 compounds reserved for model validation. On the larger set they performed a five-way leave-group-out cross-validation experiment (i.e., 20% of the data, 150 randomly selected compounds, were left out in turn). Five sub-models, m1 to m5, were constructed from the resulting data partitions using the same input descriptors. They achieved very good performance for the five sets of training compounds (r2 from 0.93 to 0.95) and also very respectable test performance (r2 from 0.80 to 0.86). Further, the m1-m5 models, together with a model m0 trained on the entire set of 750 compounds, were combined to produce a series of linearly corrected models: M1 simply accounted for the unexplained variance of the m0 model; M2 and M3 were derived from geometric regression to force a linear correction; and M4 was a multivariate regression fit to the outputs of all sub-models.
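The exact correction equations for M1-M4 are not reproduced here, so the sketch below shows only the general scheme: a full-set sub-model plus five leave-group-out sub-models, combined by an M4-style multivariate regression on their stacked predictions. Kernel ridge regression stands in for the probabilistic neural network, and the data are random placeholders.

```python
# Sketch of the ensemble scheme described above: five leave-group-out sub-models
# (m1-m5) plus a full-set model (m0), combined by an M4-style multivariate linear
# correction. Kernel ridge regression stands in for the PNN, and the data are
# random placeholders. Requires numpy and scikit-learn.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
X, y = rng.normal(size=(750, 30)), rng.normal(size=750)   # 30 fragment counts

def fit(Xtr, ytr):
    return KernelRidge(alpha=1.0).fit(Xtr, ytr)

# m0 trained on all 750 compounds; m1-m5 from the 5-way leave-group-out splits
sub_models = [fit(X, y)]
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=1).split(X):
    sub_models.append(fit(X[train_idx], y[train_idx]))

# M4-style correction: regress y on the stacked sub-model predictions
P = np.column_stack([m.predict(X) for m in sub_models])
M4 = LinearRegression().fit(P, y)

def predict(X_new):
    """Final prediction for external compounds (e.g., the held-out test set)."""
    P_new = np.column_stack([m.predict(X_new) for m in sub_models])
    return M4.predict(P_new)

print(predict(rng.normal(size=(5, 30))))
```

Because every training compound contributes to the correction step, an honest estimate of predictivity requires an external test set, which is exactly the point made next.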


Since all 750 compounds played a role in the building of the M1-M4 models, their true predictivity could only be evaluated on the 75 compounds that had been left out in the initial partition. Figure 4 summarizes the data partitioning scheme. Judging from the results obtained for these external compounds, it is encouraging that all four models gave consistent predictions, with r2 ranging from 0.88 to 0.89. The authors concluded that it is viable to use in silico prescreens to evaluate chemical toxicity towards a well-characterized endpoint.

Figure 4
Data partitioning scheme of Niculescu et al.65 The full data set is split into a 750-compound training set and a 75-compound external test set. Six sub-models are derived: one from the full training set (m0) and five from leave-group-out cross-validation experiments (m1-m5). These sub-models are combined using linear corrections, yielding four final QSAR models (M1-M4), which are then validated using the external test set. The gray color coding represents the different groups of compounds used for model building (i.e., external test set and cross-validation).


To demonstrate the generality of this approach to toxicity prediction, Niculescu et al applied the same technique to an analysis of the acute aquatic toxicity of 700 highly structurally diverse chemicals to Daphnia magna, another model organism in widespread use for ecotoxicological screening.75 Using the same set of fragment descriptors as in their previous work,65 they obtained five sub-models (m1-m5), with reported r2 values in the range of 0.88 to 0.90 for the training compounds and about 0.5 to 0.72 for the test objects. Again, they constructed four linearly corrected models, M1-M4, that combined the characteristics of the individual sub-models. Consistent with the previous work, these corrected models yielded very similar predictivity, although the M4 model, based on a linear combination of m1-m5, appeared to give the best overall result. Application of this model to 76 external compounds yielded an error standard deviation of 0.7 log units, which is very impressive considering that the measured activities span approximately 9 orders of magnitude. In contrast, ECOSAR, the aquatic toxicity prediction program developed by the EPA, yielded an error standard deviation of 1.4 log units for these compounds. Thus it seems that, although ECOSAR contains information on over 150 SARs for more than 50 chemical classes, the PNN models cover a wider scope for general application.

Carcinogenicity
Carcinogenicity is a toxicity endpoint that concerns the ability of a chemical to produce cancer in animals or humans. The extent of carcinogenicity of a substance is indicated qualitatively through the following categories: Clear Evidence or Some Evidence for positive results, Equivocal Evidence for an uncertain finding, and No Evidence in the absence of observable effects. Every year the National Toxicology Program publishes a list of carcinogens or potential carcinogens in the Annual Report on Carcinogens. An approach for predicting the carcinogenic activity of polycyclic aromatic compounds using calculated semi-empirical parameters was described by Barone et al.76 Central to this work was the use of electronic indices, which were first applied to a set of 26 non-methylated polycyclic aromatic hydrocarbons (PAH). They reasoned that the carcinogenicity of these compounds is related to the local density of states over the ring that contains the highest bond order, and to the energy gap (ΔH) between the HOMO and HOMO-1, the energy level immediately below the HOMO. Six descriptors were considered:

- HOMO, the energy of the highest occupied molecular orbital;
- HOMO-1, the energy of the level immediately below the HOMO;
- ΔH, the difference between the energies of the HOMO and HOMO-1;
- CH, the HOMO contribution to the local density of states;
- CH-1, the HOMO-1 contribution to the local density of states;
- η = CH - CH-1.

A simple rule was formulated in this study: if η > 0 and ΔH > 0.408 eV, the molecule is likely to be a carcinogen. The use of the electronic index methodology (EIM) was extended in a more recent study, in which 81 non-methylated and methylated PAH were analyzed using principal component analysis (PCA) and neural network methods.77 For a set of 26 non-methylated compounds, PCA of the six electronic parameters yielded two principal components that accounted for 42.6% and 35.2% of the total variance; for the 46 methylated compounds, the first two principal components captured 81.5% and 15.9% of the variance, respectively. From a two-dimensional principal component score plot they characterized two clusters of active compounds and a region of inactive compounds, with only three carcinogenic molecules incorrectly classified. The use of an ANN yielded the best results, both for the separate non-methylated and methylated series and for the combined set (Table 6). They concluded that the EIM, which exploits the energy separation between frontier molecular orbitals, offers a relevant set of descriptors for accurate modeling of the carcinogenic activity of this class of compounds.
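The EIM rule itself reduces to a few lines of code. In the sketch below, the inequality directions follow the reconstruction given above (the original text is garbled at this point), so they should be treated as an assumption; in practice the orbital energies and local-density-of-states contributions would come from a semi-empirical package such as MOPAC.

```python
# Tiny sketch of the electronic-index rule as reconstructed above; the
# inequality directions are an assumption taken from the reconstructed text.
def eim_carcinogen_flag(e_homo, e_homo_1, c_h, c_h_1):
    """Apply the two-descriptor EIM rule to one PAH.

    e_homo, e_homo_1 : HOMO and HOMO-1 energies (eV), e.g. from MOPAC
    c_h, c_h_1       : HOMO and HOMO-1 contributions to the local density
                       of states over the ring with the highest bond order
    """
    delta_h = e_homo - e_homo_1        # ΔH, the HOMO / HOMO-1 gap
    eta = c_h - c_h_1                  # η = CH - CH-1
    return eta > 0.0 and delta_h > 0.408

# Hypothetical orbital data for one molecule (placeholder numbers)
print(eim_carcinogen_flag(-8.21, -8.75, 0.62, 0.31))
```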

Table 6 Percentage of correct classifications by the three different methodologies for 32 non-methylated and 46 methylated polycyclic aromatic hydrocarbons. The number of correct classifications in each case is shown in parentheses.77
Method   Non-Methylated   Methylated   All
EIM      84.4% (27)       73.9% (34)   78.2% (61)
PCA      84.4% (27)       78.3% (36)   80.8% (63)
NN       93.8% (32)       78.3% (36)   84.6% (66)

In another study, Bahler et al constructed a rodent carcinogenicity model using a number of machine learning paradigms, including decision trees, neural networks, and Bayesian classifiers.78 They analyzed a set of 904 rodent bioassay experiments, of which 468 were defined as either clear evidence or some evidence (referred to as positive), 276 as no evidence (referred to as negative), and 160 as equivocal; the equivocal cases were eliminated from the analysis. They used 258 different attributes to characterize each subject. Some of these attributes were theoretical descriptors, but the majority were observations from histopathological examination; they included 20 physicochemical parameters, 21 substructural alerts, 209 subchronic histopathology indicators, 4 attributes for the maximally tolerated dose, the sex and species exposed, the route of administration, and the Salmonella mutagenesis test result. A ten-way leave-group-out cross-validation was used to assess the predictivity of a neural network model. Using all attributes, the training accuracy converged to 90%, whilst the test set accuracy reached about 80% before performance deteriorated as the neural network began to overtrain.

Later, they applied a feature selection routine, termed the single hidden unit method,79 to partition the descriptors into relevant and irrelevant features. They discovered that the removal of irrelevant features greatly improved the predictive accuracy on the test set (89%). In addition, the model successfully classified 392 of 452 (87%) of the positive cases and 272 of 292 (93%) of the negative cases. More significantly, they were able to extract the knowledge embedded in the trained neural network and establish a set of classification rules. The conversion of the neural network to such a rule-based system led to only a minimal loss of prediction accuracy: for the same set of compounds, the rule-based system correctly identified 389 true positives and 256 true negatives, corresponding to an overall accuracy of 87%. Because heuristic systems offer superior interpretability, while a neural network is generally more predictive, there is great optimism that the two methods can work synergistically in toxicological prediction.
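As a loose illustration of rule extraction (not Bahler's single hidden unit method), the sketch below trains a shallow decision tree on toy carcinogenicity attributes and dumps it as human-readable rules; the attribute names and labels are invented placeholders.

```python
# Loose illustration of rule extraction for carcinogenicity classification: a
# shallow decision tree (not Bahler's single-hidden-unit method) trained on toy
# attributes and dumped as human-readable rules. Requires scikit-learn.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)
feature_names = ["mutagenicity_test", "max_tolerated_dose", "logP",
                 "structural_alert", "liver_pathology"]
X = rng.integers(0, 2, size=(300, 5)).astype(float)            # toy attributes
y = (X[:, 0].astype(bool) & X[:, 3].astype(bool)).astype(int)  # toy label

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=feature_names))          # if/then rules
```

The printed if/then paths play the same role as the classification rules extracted from the trained network in the study above: they make the model's reasoning inspectable.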

In the past ten years we have witnessed significant development in computational toxicology.67 In contrast to the experimental determination of physicochemical properties (e.g., logP, solubility), toxicological endpoints have much greater variability and are often assessed qualitatively (toxic/nontoxic; clear/some/equivocal/no evidence of carcinogenicity) rather than quantitatively. Because of this intrinsic uncertainty, and because most, if not all, toxicity models are derived from limited data, predictions must be treated with appropriate caution. Further refinement in reliability estimation, such as the concept of the optimum prediction space (i.e., the portion of chemical space within which the model is applicable) advocated in TOPKAT, should be a priority in future development. The major problem in toxicology modeling is certainly the lack of consistent data with uniform assay evaluation criteria. Not surprisingly, most of the larger data sets compiled to date concern aquatic toxicity in simple model organisms, which are useful indicators for risk assessment in environmental safety. For the purpose of drug discovery, however, it would be desirable to have predictions extrapolated from mammalian toxicity data; but traditional in vivo toxicity screening involves animal testing, which is costly and time-consuming, and is impractical for the mass screening of a large library. With the advent of molecular biology, in particular innovative proteomics technology, one should soon be able to devise rapid in vitro toxicity methods that can gauge the effects of chemicals on human cellular proteins. An early fruit of such multidisciplinary research has appeared in the realm of toxicogenomics, offering hope for the prediction of mechanisms of toxic action through DNA microarray analysis. Besides technological advances, another possible way to accelerate the compilation of toxicology data could be through a virtual laboratory,80 where consortium members exchange proprietary data in order to minimize unnecessary duplication of work. In regard to methodology development, the next major advance will probably come from the combined use of expert systems and correlative methods. We must emphasize that expert systems and neural network technologies should not be regarded as competitors, but rather as ideal complements. A neural network method can achieve impressive predictive accuracy even though the method itself is naïve, in the sense that it neither relies on any fundamental theory nor provides any clue to the formulation of its answer. While the ability to generalize is the key to the extraordinary power of neural networks, the lack of theory behind the predictions does, to a certain extent, impede their use, because scientists are generally less enthusiastic about relying on results conceived by a black-box approach that offers little or no qualitative explanation. Thus, it would be desirable to have an expert system, whether human or artificial, that could provide logical reasoning complementary to the predictions made by the network. In this regard, the work of Bahler et al gives a first glimpse of this cross-fertilization of artificial intelligence research in the form of a future expert neural network.78

Multidimensional Lead Optimization


It has been argued that the advent of combinatorial chemistry and parallel synthesis methods has led to lead compounds that are less drug-like (i.e., of higher molecular weight, higher lipophilicity, and lower solubility).4,34,81,82 Because there is seemingly unlimited diversity in the chemical entities that can be synthesized, and the vast majority of these molecules appear to be pharmaceutically uninteresting, it is of great practical importance to eliminate poor designs reliably and early in the drug discovery process. Filters such as the Pfizer rule (rule of five)4,82 and the REOS system (Rapid Elimination Of Swill)11 were developed as a means to prioritize compounds for synthesis. We have also witnessed a paradigm change in pre-clinical drug discovery: in the not-so-distant past, researchers optimized almost entirely for potency, whereas there is now an increasing emphasis on properties such as bioavailability and toxicity in parallel with potency improvement during lead optimization. This strategy, referred to as multidimensional lead optimization, is depicted in Figure 5. For clarity, only some of the key factors are shown inside the middle box; obviously, additional factors such as patent coverage, synthetic feasibility, ease of scale-up and formulation, and chemical stability can also be taken into consideration. Consistent with this strategy, computational chemistry is becoming increasingly important in addressing these tasks. Traditionally, computer-aided drug discovery has focused on ligand design using structure-based or QSAR methods. Today, significant resources are dedicated to the development and application of in silico ADME and toxicity models to predict physicochemical and biological properties that are relevant to in vivo efficacy and therapeutic drug safety.83 While it is still too early to judge the impact of in silico prediction methods on the drug discovery process, we are confident that the continuing integration of medicinal chemistry, high-throughput screening, pharmacology, toxicology, computer modeling, and information management will greatly enhance our ability and efficiency in discovering novel medicines.

Figure 5
Schematic illustration of a multidimensional lead optimization strategy.


References
1. Verlinde CLMJ, Hol WGJ. Structure-based drug design: progress, results and challenges. Structure. 1994;2:577-587.
2. Navia MA, Chaturvedi PR. Design principles for orally bioavailable drugs. Drug Discov Today. 1996;1:179-189.
3. Chan OH, Stewart BH. Physicochemical and drug-delivery considerations for oral drug bioavailability. Drug Discov Today. 1996;1:461-473.
4. Lipinski CA, Lombardo F, Dominy BW et al. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Delivery Rev. 1997;23:3-25.
5. Frimurer TM, Bywater R, Nærum L et al. Improving the odds in discriminating drug-like from non drug-like compounds. J Chem Inf Comput Sci. 2000;40:1315-1324.
6. Ajay, Walters P, Murcko MA. Can we learn to distinguish between drug-like and nondrug-like molecules? J Med Chem. 1998;41:3314-3324.
7. Sadowski J, Kubinyi H. A scoring scheme for discriminating between drugs and nondrugs. J Med Chem. 1998;41:3325-3329.
8. Viswanadhan VN, Ghose AK, Revankar GR et al. Atomic physicochemical parameters for three dimensional structure directed quantitative structure-activity relationships. 4. Additional parameters for hydrophobic and dispersive interactions and their applications for an automated superposition of certain naturally occurring nucleoside antibiotics. J Chem Inf Comput Sci. 1989;29:163-172.
9. Matthews BW. Comparison of predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta. 1975;405:442-451.
10. Feher M, Sourial E, Schmidt JM. A simple model for the prediction of blood-brain partitioning. Int J Pharm. 2000;201:239-247.
11. Walters WP, Ajay, Murcko MA. Recognizing molecules with drug-like properties. Curr Opin Chem Biol. 1999;3:384-387.
12. Abramovitz R, Yalkowsky SH. Estimation of aqueous solubility and melting point of PCB congeners. Chemosphere. 1990;21:1221-1229.
13. Suzuki T. Development of an automatic estimation system for both the partition coefficient and aqueous solubility. J Comput-Aided Mol Des. 1991;5:149-166.


14. Kamlet MJ. Linear solvation energy relationships: an improved equation for correlation and prediction of aqueous solubilities of aromatic solutes including polycyclic aromatic hydrocarbons and polychlorinated biphenyls. Prog Phys Org Chem. 1993;19:293-317.
15. Jorgensen WL, Duffy EM. Prediction of drug solubility from Monte Carlo simulations. Bioorg Med Chem Lett. 2000;10:1155-1158.
16. Abraham MH, McGowan JC. The use of characteristic volumes to measure cavity terms in reversed phase liquid chromatography. Chromatographia. 1987;23:243-246.
17. Abraham MH. Scales of solute hydrogen-bonding: their construction and application to physicochemical and biochemical processes. Chem Soc Rev. 1993;22:73-83.
18. Bodor N, Harget A, Huang N-J. Neural network studies. 1. Estimation of the aqueous solubility of organic compounds. J Am Chem Soc. 1991;113:9480-9483.
19. Patil GS. Correlation of aqueous solubility and octanol-water partition coefficient based on molecular structure. Chemosphere. 1991;22:723-738.
20. Patil GS. Prediction of aqueous solubility and octanol-water partition coefficient for pesticides based on their molecular structure. J Hazard Mater. 1994;36:35-43.
21. Nelson TM, Jurs PC. Prediction of aqueous solubility of organic compounds. J Chem Inf Comput Sci. 1994;34:601-609.
22. Sutter JM, Jurs PC. Prediction of aqueous solubility for a diverse set of heteroatom-containing organic compounds using a quantitative structure-activity relationship. J Chem Inf Comput Sci. 1996;36:100-107.
23. Mitchell BE, Jurs PC. Prediction of aqueous solubility of organic compounds from molecular structure. J Chem Inf Comput Sci. 1998;38:489-496.
24. Ruelle P, Kesselring UW. Prediction of the aqueous solubility of proton-acceptor oxygen-containing compounds by the mobile order solubility model. J Chem Soc, Faraday Trans. 1997;93:2049-2052.
25. Egolf LM, Wessel MD, Jurs PC. Prediction of boiling points and critical temperatures of industrially important organic compounds from molecular structure. J Chem Inf Comput Sci. 1994;34:947-956.
26. Xu L, Ball JW, Dixon SL et al. Quantitative structure-activity relationships for toxicity of phenols using regression analysis and computational neural networks. Environ Toxicol Chem. 1994;13:841-851.
27. Sutter JM, Dixon SL, Jurs PC. Automated descriptor selection for quantitative structure-activity relationships using generalized simulated annealing. J Chem Inf Comput Sci. 1995;35:77-84.


28. Wessel MD, Jurs PC. Prediction of normal boiling points for a diverse set of industrially important organic compounds from molecular structure. J Chem Inf Comput Sci. 1995;35:841-850.
29. Mitchell BE, Jurs PC. Prediction of autoignition temperature of organic compounds from molecular structure. J Chem Inf Comput Sci. 1997;37:538-547.
30. Engelhardt HL, Jurs PC. Prediction of supercritical carbon dioxide solubility of organic compounds from molecular structure. J Chem Inf Comput Sci. 1997;37:478-484.
31. Wessel MD, Jurs PC, Tolan JW et al. Prediction of human intestinal absorption of drug compounds from molecular structure. J Chem Inf Comput Sci. 1998;38:726-735.
32. Huuskonen J, Salo M, Taskinen J. Neural network modeling for estimation of the aqueous solubility of structurally related drugs. J Pharm Sci. 1997;86:450-454.
33. Huuskonen J, Salo M, Taskinen J. Aqueous solubility prediction of drugs based on molecular topology and neural network modeling. J Chem Inf Comput Sci. 1998;38:450-456.
34. Huuskonen J. Estimation of aqueous solubility for a diverse set of organic compounds based on molecular topology. J Chem Inf Comput Sci. 2000;40:773-777.
35. Yalkowsky SH, Banerjee S. Aqueous Solubility: Methods of Estimation for Organic Compounds. New York: Marcel Dekker, 1992.
36. Huuskonen JJ, Livingstone DJ, Tetko IV. Neural network modeling for estimation of partition coefficient based on atom-type electrotopological state indices. J Chem Inf Comput Sci. 2000;40:947-955.
37. Levin VA. Relationship of octanol/water partition coefficient and molecular weight to rat brain capillary permeability. J Med Chem. 1980;23:682-684.
38. Kansy M, van de Waterbeemd H. Hydrogen bonding capacity and brain penetration. Chimia. 1992;46:299-303.
39. Abraham MH, Chadha HS, Mitchell RC. Hydrogen bonding. 33. Factors that influence the distribution of solutes between blood and brain. J Pharm Sci. 1994;83:1257-1268.
40. Norinder U, Sjöberg P, Österberg T. Theoretical calculation and prediction of brain-blood partitioning of organic solutes using MolSurf parameterization and PLS statistics. J Pharm Sci. 1998;87:952-959.
41. Ajay, Bemis GW, Murcko MA. Designing libraries with CNS activity. J Med Chem. 1999;42:4942-4951.


42. Clark DE. Rapid calculation of polar molecular surface area and its application to the prediction of transport phenomena. 2. Prediction of blood-brain barrier penetration. J Pharm Sci. 1999;88:815-821.
43. Rekker RE. The Hydrophobic Fragment Constant. Amsterdam: Elsevier, 1976.
44. Leo AJ, Jow PY, Silipo C et al. Calculation of hydrophobic constant (logP) from π and f constants. J Med Chem. 1975;18:865-868.
45. Schaper K-J, Samitier MLR. Calculation of octanol/water partition coefficients (logP) using artificial neural networks and connection matrices. Quant Struct-Act Relat. 1997;16:224-230.
46. Breindl A, Beck B, Clark T. Prediction of the n-octanol/water partition coefficient, logP, using a combination of semiempirical MO-calculations and a neural network. J Mol Model. 1997;3:142-155.
47. CONCORD. University of Texas, Austin, TX.
48. SYBYL. Tripos, Inc., St Louis, MO.
49. VAMP. Oxford Molecular Group, Oxford, UK.
50. Kier LB, Hall LH. An electrotopological state index for atoms in molecules. Pharm Res. 1990;7:801-807.
51. Hall LH, Kier LB. Electrotopological state indices for atom types: a novel combination of electronic, topological, and valence shell information. J Chem Inf Comput Sci. 1995;35:1039-1045.
52. Yoshida F, Topliss JG. QSAR model for drug human oral bioavailability. J Med Chem. 2000;43:2575-2585.
53. Schneider G, Coassolo P, Lavé T. Combining in vitro and in vivo pharmacokinetic data for prediction of hepatic drug clearance in humans by artificial neural networks and multivariate statistical techniques. J Med Chem. 1999;42:5072-5076.
54. Quiñones C, Cáceres J, Stud M et al. Prediction of drug half-life values of antihistamines based on the CODES/neural network model. Quant Struct-Act Relat. 2000;19:448-454.
55. http://www.pjbpubs.com/scriprep/bs1024.htm
56. Young RC, Mitchell RC, Brown TH et al. Development of a new physicochemical model for brain penetration and its application to the design of centrally acting H2 receptor histamine antagonists. J Med Chem. 1988;31:656-671.


57. Calder JA, Ganellin CR. Predicting the brain-penetrating capability of histaminergic compounds. Drug Des Discov. 1994;11:259-268.
58. Lombardo F, Blake JF, Curatolo WJ. Computation of brain-blood partitioning of organic solutes via free energy calculations. J Med Chem. 1996;39:4750-4755.
59. Luco JM. Prediction of the brain-blood distribution of a large set of drugs from structurally derived descriptors using partial least-squares (PLS) modeling. J Chem Inf Comput Sci. 1999;39:396-404.
60. Österberg T, Norinder U. Prediction of polar surface area and drug transport processes using simple parameters and PLS statistics. J Chem Inf Comput Sci. 2000;40:1408-1411.
61. Keserü GM, Molnár L. High-throughput prediction of blood-brain partitioning: a thermodynamic approach. J Chem Inf Comput Sci. 2001;41:120-128.
62. Kelder J, Grootenhuis PD, Bayada DM et al. Polar molecular surface as a dominating determinant for oral absorption and brain penetration of drugs. Pharm Res. 1999;16:1514-1519.
63. Bemis GW, Murcko MA. The properties of known drugs. 1. Molecular frameworks. J Med Chem. 1996;39:2887-2893.
64. Young SS, Profeta SJ, Unwalla RJ et al. Exploratory analysis of chemical structure, bacterial mutagenicity and rodent tumorigenicity. Chemom Intell Lab Syst. 1997;37:115-124.
65. Niculescu SP, Kaiser KLE, Schultz TW. Modeling the toxicity of chemicals to Tetrahymena pyriformis using molecular fragment descriptors and probabilistic neural networks. Arch Environ Contam Toxicol. 2000;39:289-298.
66. Matthews EJ, Benz RD, Contrera JF. Use of toxicological information in drug design. J Mol Graph Model. 2000;18:605-615.
67. http://www.netsci.org/Science/Special/feature05.html
68. Sanderson DM, Earnshaw CG. Computer prediction of possible toxic action from chemical structure: the DEREK system. Human Exp Toxicol. 1991;10:261-273.
69. Woo Y, Lai DY, Argus MF et al. Development of structure-activity relationship rules for predicting carcinogenic potential of chemicals. Toxicol Lett. 1995;79:219-228.
70. Klopman G. Artificial intelligence approach to structure-activity studies: computer automated structure evaluation of biological activity of organic molecules. J Am Chem Soc. 1984;106:7315-7321.


71. Matthews EJ, Contrera JF. A new highly specific method for predicting the carcinogenic potential of pharmaceuticals in rodents using enhanced MCASE QSAR-ES software. Regul Toxicol Pharmacol. 1998;28:242-264.
72. Basak SC, Grunwald GD, Gute BD et al. Use of statistical and neural net approaches in predicting toxicity of chemicals. J Chem Inf Comput Sci. 2000;40:885-890.
73. TerraTox 2000. TerraBase Inc., Burlington, Ontario, Canada.
74. Russom CL, Bradbury SP, Broderius SJ et al. Predicting modes of toxic action from chemical structure: acute toxicity in the fathead minnow (Pimephales promelas). Environ Toxicol Chem. 1997;16:948-967.
75. Kaiser KLE, Niculescu SP. Modeling acute toxicity of chemicals to Daphnia magna: a probabilistic neural network approach. Environ Toxicol Chem. 2001;20:420-431.
76. Barone PMVB, Camilo A Jr, Galvão DS. Theoretical approach to identify carcinogenic activity of polycyclic aromatic hydrocarbons. Phys Rev Lett. 1996;77:1186-1189.
77. Vendrame R, Braga RS, Takahata Y et al. Structure-activity relationship studies of carcinogenic activity of polycyclic aromatic hydrocarbons using calculated molecular descriptors with principal component analysis and neural network. J Chem Inf Comput Sci. 1999;39:1094-1104.
78. Bahler D, Stone B, Wellington C et al. Symbolic, neural, and Bayesian machine learning models for predicting carcinogenicity of chemical compounds. J Chem Inf Comput Sci. 2000;40:906-914.
79. Stone B. Feature selection and rule extraction for neural networks in the domain of predictive toxicology. Department of Computer Science, North Carolina State University, 1999.
80. Vedani A, Dobler M. Multi-dimensional QSAR in drug research: predicting binding affinities, toxicity and pharmacokinetic parameters. Prog Drug Res. 2000;55:105-135.
81. Fecik RA, Frank KE, Gentry EJ et al. The search for oral drug bioavailability. Med Res Rev. 1998;18:149-185.
82. Lipinski CA. Drug-like properties and the causes of poor solubility and poor permeability. J Pharmacol Toxicol Methods. 2000;44:235-249.
83. Ekins S, Waller CL, Swaan PW et al. Progress in predicting human ADME parameters in silico. J Pharmacol Toxicol Methods. 2000;44:251-272.

