
Molecular Descriptors

C371 Fall 2004

INTRODUCTION
Molecular descriptors are numerical values that characterize properties of molecules.
Examples:
Physicochemical properties (empirical)
Values from algorithms, such as 2D fingerprints

Vary in complexity of encoded information and in compute time

Descriptors for Large Data Sets


Descriptors representing properties of complete molecules
Examples: LogP, Molar Refractivity

Descriptors calculated from 2D graphs


Examples: Topological Indexes, 2D fingerprints

Descriptors requiring 3D representations


Example: Pharmacophore descriptors

DESCRIPTORS CALCULATED FROM 2D STRUCTURES


Simple counts of features
Lipinski Rule of Five (H bonds, MW, etc.)
Number of ring systems
Number of rotatable bonds

Not likely to discriminate sufficiently when used alone
Combined with other descriptors for best effect
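The simple counts above can be turned into a quick filter. What follows is a minimal sketch, assuming the commonly cited Rule of Five thresholds (molecular weight at most 500, logP at most 5, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors). The function name is hypothetical, and the input values are assumed to be precomputed elsewhere, for example by a cheminformatics toolkit.

```python
def lipinski_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Count how many Rule of Five criteria a molecule violates.

    Thresholds are the commonly cited ones; they are an assumption here,
    since the slide only names the rule.
    """
    checks = [
        mol_weight > 500,   # molecular weight above 500
        log_p > 5,          # logP above 5
        h_donors > 5,       # more than 5 hydrogen bond donors
        h_acceptors > 10,   # more than 10 hydrogen bond acceptors
    ]
    return sum(checks)


# Approximate, illustrative values for a small drug-like molecule.
print(lipinski_violations(mol_weight=180.16, log_p=1.2, h_donors=1, h_acceptors=4))
```

A count like this is coarse on its own, which is why such descriptors are usually combined with others.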

Physicochemical Properties
Hydrophobicity
LogP: the logarithm of the partition coefficient between n-octanol and water

ClogP (Leo and Hansch): based on a small set of fragment values derived from a small set of simple molecules
BioByte: http://www.biobyte.com/
Daylight's MedChem Help page:
http://www.daylight.com/dayhtml/databases/medchem/medchem-help.html
Isolating carbon: one not doubly or triply bonded to a heteroatom
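The definition of LogP given above is a one-line calculation once the two equilibrium concentrations are known. A minimal sketch, assuming both concentrations are measured in the same units; the function name and the numbers are hypothetical.

```python
import math


def log_p(conc_octanol, conc_water):
    """Return log10 of the n-octanol/water partition coefficient."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


# A solute 100 times more concentrated in octanol than in water.
print(log_p(100.0, 1.0))  # 2.0
```

Positive values indicate a hydrophobic solute; negative values indicate a preference for the water phase.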

ACD Labs Calculated Properties


http://www.acdlabs.com
ACD Labs values now incorporated into the CAS Registry File for millions of compounds
I-Lab: http://ilab.acdlabs.com/
Name generation
NMR prediction
Physical property prediction

Molar Refractivity
MR = \frac{n^2 - 1}{n^2 + 2} \cdot \frac{MW}{d}
where n is the refractive index, d is density, and MW is molecular weight.
Measures the steric bulk of a molecule.
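The molar refractivity formula can be evaluated directly. A minimal sketch; the function name is hypothetical, and the inputs for water are approximate textbook values (refractive index about 1.333, density about 0.997 g/cm^3, molecular weight about 18.015 g/mol).

```python
def molar_refractivity(n, density, mol_weight):
    """MR = (n^2 - 1) / (n^2 + 2) * MW / d.

    With MW in g/mol and d in g/cm^3, the result is in cm^3/mol.
    """
    return (n ** 2 - 1) / (n ** 2 + 2) * mol_weight / density


mr_water = molar_refractivity(n=1.333, density=0.997, mol_weight=18.015)
print(round(mr_water, 2))
```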

Topological Indexes
Single-valued descriptors calculated from the 2D graph of the molecule
Characterize structures according to size, degree of branching, and overall shape
Example: the Wiener Index takes the number of bonds on the shortest path between each pair of atoms as the distance between them and sums these distances over all pairs
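The Wiener Index can be sketched with a breadth-first search over the molecular graph. This is a minimal illustration, assuming a hydrogen-suppressed graph given as an adjacency dictionary; the function names and the two example molecules are chosen here for illustration.

```python
from collections import deque


def shortest_path_lengths(adjacency, start):
    """Breadth-first search: bond counts from start to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def wiener_index(adjacency):
    """Sum of shortest-path bond distances over all unordered atom pairs."""
    total = 0
    for atom in adjacency:
        total += sum(shortest_path_lengths(adjacency, atom).values())
    return total // 2  # every pair was counted once from each end


# Carbon skeletons only (hydrogens suppressed).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(wiener_index(n_butane), wiener_index(isobutane))  # 10 9
```

The branched isomer gets the smaller value, which is how the index reflects branching.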

Topological Indexes: Others


Molecular Connectivity Indexes
Randić (et al.) branching index
Defines the degree of an atom as the number of adjacent non-hydrogen atoms
Bond connectivity value is the reciprocal of the square root of the product of the degrees of the two atoms in the bond
Branching index is the sum of the bond connectivities over all bonds in the molecule

Chi indexes introduce valence values to encode sigma, pi, and lone pair electrons
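The branching index described above follows directly from a bond list. A minimal sketch, assuming a hydrogen-suppressed graph given as pairs of atom indexes; the function name and example molecules are illustrative, and the valence-based chi indexes are not covered.

```python
import math


def randic_index(bonds):
    """Sum over bonds of 1 / sqrt(degree(a) * degree(b))."""
    degree = {}
    for a, b in bonds:
        # Degree = number of adjacent non-hydrogen atoms.
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sum(1.0 / math.sqrt(degree[a] * degree[b]) for a, b in bonds)


# Carbon skeletons only (hydrogens suppressed).
n_butane_bonds = [(0, 1), (1, 2), (2, 3)]
isobutane_bonds = [(0, 1), (0, 2), (0, 3)]
print(round(randic_index(n_butane_bonds), 3))
print(round(randic_index(isobutane_bonds), 3))
```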

Kappa Shape Indexes


Characterize aspects of molecular shape
Compare the molecule with the extreme shapes possible for that number of atoms
Range from linear molecules to the completely connected graph
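The comparison with the extreme shapes can be made concrete for the first-order kappa index. The formula is not given on the slide; this sketch assumes the usual Kier definition without the atom-type (alpha) correction, 1-kappa = 2 * P_max * P_min / P^2, where P is the number of bonds between non-hydrogen atoms, P_min = A - 1 is the bond count of the linear (acyclic) extreme, and P_max = A(A - 1)/2 is that of the completely connected graph on A atoms. The function name is hypothetical.

```python
def kappa1(num_atoms, num_bonds):
    """First-order kappa shape index (no alpha correction; assumed formula)."""
    if num_atoms < 2 or num_bonds < 1:
        raise ValueError("need at least two atoms and one bond")
    p_min = num_atoms - 1                     # linear (acyclic) extreme
    p_max = num_atoms * (num_atoms - 1) // 2  # completely connected graph
    return 2 * p_max * p_min / num_bonds ** 2


# Four heavy atoms: chain (3 bonds), ring (4 bonds), complete graph (6 bonds).
print(kappa1(4, 3), kappa1(4, 4), kappa1(4, 6))  # 4.0 2.25 1.0
```

Under this definition the acyclic extreme gives a value equal to the atom count, and the completely connected graph gives 1.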

2D Fingerprints
Two types:
One based on a fragment dictionary
Each bit position corresponds to a specific substructure fragment
Fragments that occur infrequently may be more useful

Another based on hashed methods


Not dependent on a pre-defined dictionary
Any fragment can be encoded

Originally designed for substructure searching, not for molecular descriptors
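The hashed approach can be illustrated with a toy path-based fingerprint. This is a simplified sketch, not any vendor's algorithm: it assumes fragments are linear atom paths labeled only by element symbol (bond orders ignored), and it uses SHA-256 to map each fragment to one bit. All names and parameters (`n_bits`, `max_atoms`) are hypothetical.

```python
import hashlib


def enumerate_paths(atoms, adjacency, max_atoms):
    """Collect canonical labels for all simple paths of up to max_atoms atoms."""
    paths = set()

    def extend(path):
        forward = "-".join(atoms[i] for i in path)
        backward = "-".join(atoms[i] for i in reversed(path))
        paths.add(min(forward, backward))  # same label in either direction
        if len(path) == max_atoms:
            return
        for neighbor in adjacency[path[-1]]:
            if neighbor not in path:
                extend(path + [neighbor])

    for start in range(len(atoms)):
        extend([start])
    return paths


def hashed_fingerprint(atoms, adjacency, n_bits=64, max_atoms=4):
    """Hash every path fragment to a bit position; return the bits as an int."""
    bits = 0
    for fragment in enumerate_paths(atoms, adjacency, max_atoms):
        digest = hashlib.sha256(fragment.encode("utf-8")).digest()
        position = int.from_bytes(digest[:4], "big") % n_bits
        bits |= 1 << position
    return bits


# Heavy-atom graphs of ethanol (C-C-O) and 1-propanol (C-C-C-O).
ethanol = (["C", "C", "O"], {0: [1], 1: [0, 2], 2: [1]})
propanol = (["C", "C", "C", "O"], {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]})
fp_ethanol = hashed_fingerprint(*ethanol)
fp_propanol = hashed_fingerprint(*propanol)

# Every bit set for the substructure is also set for the larger molecule.
print(fp_ethanol & fp_propanol == fp_ethanol)  # True
```

The final check is the screening property that made such fingerprints useful for substructure searching: a query can only match molecules whose fingerprints contain all of the query's bits. Because different fragments can hash to the same bit, the converse does not hold.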

Atom-Pair Descriptors
Encode all pairs of atoms in a molecule
Include the length of the shortest bond-by-bond path between them
Each atom is described by its elemental type plus the number of attached non-hydrogen atoms and the number of bonding electrons
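A reduced version of the atom-pair scheme can be sketched as follows. It assumes a hydrogen-suppressed graph and a simplified atom type made of the element symbol plus the number of attached non-hydrogen atoms; the bonding-electron component of the atom type is deliberately left out. Function names are hypothetical.

```python
from collections import Counter, deque


def bond_distances(adjacency, start):
    """Shortest bond-by-bond path length from start to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def atom_pairs(elements, adjacency):
    """Count (type, distance, type) triples over all atom pairs."""
    # Simplified atom type: element + number of attached heavy atoms.
    types = [f"{el}{len(adjacency[i])}" for i, el in enumerate(elements)]
    pairs = Counter()
    for i in range(len(elements)):
        dist = bond_distances(adjacency, i)
        for j in range(i + 1, len(elements)):
            if j in dist:
                first, second = sorted((types[i], types[j]))
                pairs[(first, dist[j], second)] += 1
    return pairs


# Heavy-atom graph of ethanol: C-C-O.
ethanol_pairs = atom_pairs(["C", "C", "O"], {0: [1], 1: [0, 2], 2: [1]})
for triple, count in sorted(ethanol_pairs.items()):
    print(triple, count)
```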

BCUT Descriptors
Designed to encode atomic properties that govern intermolecular interactions
Used in diversity analysis
Encode atomic charge, atomic polarizability, and atomic hydrogen bonding ability

DESCRIPTORS BASED ON 3D REPRESENTATIONS


Require the generation of 3D conformations
Can be computationally time consuming with large data sets
Usually must take into account conformational flexibility
3D fragment screens encode spatial relationships between atoms, ring centroids, and planes

Pharmacophore Keys & Other 3D Descriptors


Based on atoms or substructures thought to be relevant for receptor binding
Typically include hydrogen bond donors and acceptors, charged centers, aromatic ring centers, and hydrophobic centers
Others: 3D topographical indexes, geometric atom pairs, quantum mechanical calculations for HOMO and LUMO

DATA VERIFICATION AND MANIPULATION


Data spread and distribution
Coefficient of variation (standard deviation divided by the mean)

Scaling (standardization): making sure that each descriptor has an equal chance of contributing to the overall analysis
Correlations
Reducing the dimensionality of a data set: Principal Components Analysis
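The first three checks above can be sketched with the Python standard library. This assumes that standardization means z-score scaling (subtract the mean, divide by the sample standard deviation) and that correlation means the Pearson coefficient; both are common choices rather than something the slide specifies. The descriptor values are made up for illustration. Principal Components Analysis is not shown.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (mean must be nonzero)."""
    return statistics.stdev(values) / statistics.mean(values)


def standardize(values):
    """Z-score scaling: zero mean and unit sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def pearson(xs, ys):
    """Pearson correlation computed from the standardized columns."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Hypothetical descriptor columns for four molecules.
mol_weight = [180.2, 151.2, 206.3, 194.2]
log_p_values = [1.2, 0.5, 3.5, -0.1]

print(round(coefficient_of_variation(mol_weight), 3))
print([round(z, 2) for z in standardize(mol_weight)])
print(round(pearson(mol_weight, log_p_values), 3))
```

After standardization, a descriptor measured in the hundreds (molecular weight) and one measured in single digits (logP) are on the same scale, so neither dominates a distance or similarity calculation simply because of its units.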
