Abstract: Fragment-based drug design (FBDD) is a promising approach for the discovery and optimization of lead compounds. Despite its successes, FBDD also faces some internal limitations and challenges. FBDD requires high-quality target protein and good solubility of fragments. Biophysical techniques for fragment screening necessitate expensive detection equipment, and the strategies for evolving fragment hits to leads remain to be improved. Regardless, FBDD is necessary for investigating larger chemical space and can be applied to challenging biological targets. In this scenario, cheminformatics and computational chemistry can be used as alternative approaches that can significantly improve the efficiency and success rate of lead discovery and optimization. Cheminformatics and computational tools assist FBDD in a very flexible manner. Computational FBDD can be used independently or in parallel with experimental FBDD for efficiently generating and optimizing leads. Computational FBDD can also be integrated into each step of experimental FBDD and play a synergistic role by maximizing its performance. This review will provide critical analysis of the complementarity between computational and experimental FBDD and highlight recent advances in new algorithms and successful examples of their applications. In particular, fragment-based cheminformatics tools, high-throughput fragment docking, and fragment-based de novo drug design will be the focus of this review. We will also discuss the advantages and limitations of different methods and the trends in new developments that should inspire future research. © 2012 Wiley Periodicals, Inc. Med. Res. Rev., 00, No. 0, 1–45, 2012
Key words: computational fragment-based drug design; fragment informatics; fragment docking;
fragment-based de novo design
Correspondence to: Wannian Zhang or Chunquan Sheng, Department of Medicinal Chemistry, School of Pharmacy, Second Military Medical University, 325 Guohe Road, Shanghai 200433, People's Republic of China.
E-mail: zhangwnk@hotmail.com or shengcq@hotmail.com
SHENG AND ZHANG
1. INTRODUCTION
The identification of small molecules that selectively bind to a biological target is the key step in drug discovery. High-throughput screening (HTS) is a routine method for hit or lead discovery in the pharmaceutical industry.¹ It has proven to be effective in many research programs, particularly with improved lead-like libraries. Although HTS has produced a number of lead molecules for various drug targets, it also has several limitations. An HTS campaign usually screens 10⁶–10⁷ compounds, which cover only a small portion of drug-like chemical space (about 10⁶⁰ molecules²). Moreover, the hit rate of HTS is generally low and the resulting leads are difficult to optimize into drug-like candidates, because many of them have large molecular weights (MWs) and are strongly hydrophobic.³ In this context, fragment-based drug design (FBDD) is becoming an alternative approach for drug discovery.⁴
Taking advantage of both random screening and structure-based drug design (SBDD), FBDD constructs novel lead structures from small molecular fragments. Since the introduction of the SAR by NMR method in 1996,⁵ FBDD has become a practical and promising tool in drug discovery.⁶ The workflow of an FBDD study is depicted in Figure 1. The first part of FBDD is to identify weak to moderate binders of the desired target by fragment screening. The libraries for fragment screening contain hundreds to thousands of small, low-MW fragments, which are screened at high concentration.⁷ Because the binding affinities of fragments are relatively weak (5 mM–1 μM), highly sensitive detection methods have been developed for this purpose.⁸ The biophysical techniques for fragment screening mainly include nuclear magnetic resonance (NMR),⁹ mass spectrometry (MS),¹⁰,¹¹ X-ray crystallography,¹² and surface plasmon resonance (SPR) spectroscopy.¹³,¹⁴ Then, various optimization strategies can be used to increase the affinity and drug likeness of fragment hits to evolve them into high-quality leads. The hit-to-lead optimization process may involve a combination of fragment linking, fragment evolution, fragment optimization, and fragment self-assembly, which is often guided by the structural information of the target-fragment complex.¹⁵ The lead optimization stage is technically similar to that of conventional SBDD. More than ten clinical candidates have been generated by the FBDD strategy.⁴ In 2011, the B-Raf inhibitor vemurafenib (Zelboraf),¹⁶ the first FBDD-derived drug, was approved by the FDA for the treatment of melanoma. It took only 6 years from concept to approval for vemurafenib. Inspired by these encouraging results, FBDD is attracting more and more attention from both the pharmaceutical industry and the academic community.¹⁷⁻¹⁹
Figure 1. The complementarity between computational and experimental FBDD.
Medicinal Research Reviews DOI 10.1002/med
FRAGMENT INFORMATICS AND COMPUTATIONAL FBDD
Compared to HTS, FBDD has several advantages, including generation of higher chemical diversity (sampling a larger chemical space), higher hit rates, and higher ligand efficiency (LE = −log IC₅₀ / number of heavy atoms).²⁰ However, current FBDD approaches also face some internal limitations and challenges. First, FBDD methods still cover a small fraction of the total diversity space. It is estimated that a library of 10³ fragments can typically sample the chemical diversity space of 10⁹ molecules. Although the combinatorial advantage of FBDD provides a significant increase in diversity space relative to HTS, exploring a larger region of drug-like space is still needed. Second, current fragment screening methods place significant demands on the amount, purity, solubility, and suitability of target proteins for labeling or crystallization. Although progress has been made, the successful application of FBDD to membrane proteins (e.g. G-protein coupled receptors, GPCRs) remains a significant challenge.²¹ Moreover, FBDD is more suitable for certain classes of targets whose binding sites often consist of multiple distinct subsites (such as kinases), as individual fragments may occupy different subsites and can be joined later into complete molecules. On the other hand, the process of fragment optimization is often guided by structure-based design, which is difficult to apply to targets whose structures are unknown. Third, most FBDD methods do not take ligand specificity or selectivity into account. Although there are good examples of selective fragments targeted to kinases, the methodologies of FBDD need to be improved to efficiently identify fragments that bind to the sites responsible for target specificity.²² Fourth, the geometries and key interactions of the original fragment hits may change when they are evolved into lead compounds.²³ New methods should be developed to efficiently select proper linkers to bridge fragments, find proper groups (fragments) to be added to the initial fragment hits, and predict the binding mode of the newly generated molecules. Last, the techniques of FBDD often require specialized equipment and specific expertise,²⁴ which limits the broad application of FBDD.
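As a worked illustration of the ligand efficiency definition quoted in this paragraph, the following minimal Python sketch computes LE as the negative base-10 logarithm of IC₅₀ divided by the heavy-atom count. The sign convention, the use of molar units for IC₅₀, and the two example compounds are assumptions made for illustration only; they are not data from the cited studies.

```python
import math


def ligand_efficiency(ic50_molar: float, heavy_atoms: int) -> float:
    """Return LE = -log10(IC50) / heavy-atom count (IC50 assumed to be in mol/L)."""
    if ic50_molar <= 0 or heavy_atoms <= 0:
        raise ValueError("IC50 and heavy-atom count must be positive")
    return -math.log10(ic50_molar) / heavy_atoms


# Hypothetical compounds, chosen only to show the trend discussed in the text.
fragment_le = ligand_efficiency(1e-4, 12)  # 100 uM fragment with 12 heavy atoms
lead_le = ligand_efficiency(1e-8, 36)      # 10 nM lead with 36 heavy atoms

print(f"fragment LE = {fragment_le:.3f}, lead LE = {lead_le:.3f}")
```

Under these assumed numbers the weakly binding fragment has the higher LE, which is the sense in which small fragment hits can be more "efficient" binders than larger, more potent molecules.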
Cheminformatics and computational approaches provide an alternative to the experimental FBDD methods. The incorporation of computational methods into the FBDD process can significantly improve the efficiency and success rate of lead discovery and optimization. Moreover, the low-throughput nature of experimental FBDD makes computational tools an attractive way to explore larger commercially available fragment databases. Computational chemistry tools can significantly improve the efficiency of each step of FBDD, such as fragment library design, active site characterization, fragment hit discovery, and hit-to-lead-to-candidate optimization. Several reviews have covered the topic of computational FBDD approaches.²⁵⁻³² Our goal here is to highlight recent advances in new algorithms, analyze their advantages and limitations, and discuss the trends to inspire future research. We will first provide a comprehensive overview of the complementarity between computational and experimental FBDD. Then, a broad set of cheminformatics and computational FBDD tools as well as their successful applications will be discussed in detail. In particular, we will focus on recent advances in fragment-based cheminformatics tools, high-throughput fragment docking, and fragment-based de novo drug design.
2. OVERVIEW OF THE COMPLEMENTARITY BETWEEN COMPUTATIONAL AND EXPERIMENTAL FBDD
Computational chemistry provides complementary methods for experimental FBDD, and has assisted the implementation of FBDD in an efficient and cost-effective manner. Computational approaches play an important role throughout the process of FBDD (Fig. 1). The construction of high-quality fragment libraries is the first step in the FBDD process. The library for fragment screening should have good diversity to represent drug-like chemical space and also meet certain criteria of physicochemical properties, solubility, and synthetic accessibility.⁷,⁸,³³ Such properties can be quickly obtained by computational methods and then used as filters for commercially available fragment databases. Computational methods are also helpful to remove fragments with unwanted chemical groups and incorporate the most frequently occurring fragments from known drugs. The fragments also need to be highly soluble, because they are screened at a high concentration. Approaches for the prediction of aqueous solubility mainly include quantitative structure–activity relationship (QSAR) and quantitative structure–property relationship (QSPR) modeling.³⁴,³⁵
During the fragment screening stage, molecular docking has been used as a prescreening tool to reduce experimental effort. Virtual fragment screening can also directly yield potent hits without using a direct detection technique of experimental FBDD. For the hits identified from fragment screening, computational approaches (e.g. substructure search and similarity-based search) can be used to facilitate hit expansion and obtain SAR information for a secondary screen. For example, a research group from Vertex used the NMR SHAPES³⁶ method to identify several micromolar hits by performing a secondary screen on 500 compounds that were obtained from substructure and similarity searching around the fragment hits.³⁷ Although hit expansion of existing compounds may be a practical approach, it is more desirable to design and synthesize totally new compounds using fragment hits as seeds. Computational analysis of a protein-hit complex can provide useful information to prioritize the most promising fragment hits for the subsequent fragment-to-lead process. Structure-based in silico methods can iteratively assist the buildup of the fragment hits into a new lead compound that possesses improved potency and drug likeness. For example, the selection of an appropriate linker to join fragment hits is of great importance to generate high-affinity ligands. The flexibility of the linker is important for the binding geometries of the original fragment hits and the binding affinity of the resulting molecules.³⁸ In this context, computational methods (e.g. de novo drug design algorithms³⁹) are helpful to virtually screen linker libraries. Moreover, de novo drug design methods can not only significantly aid the assembly of fragment hits into novel compounds, but can also automatically design novel ligands. Molecular docking and molecular dynamics simulations can efficiently predict the binding affinity and binding pose of the designed ligands, and thus they are powerful tools for the evolution of fragments into potent leads.⁴⁰,⁴¹ According to Vangreveling's review,³⁰ 31 out of 36 successful FBDD examples used structure-based design tools for fragment-to-lead optimization. Such structure-based approaches are also popular for the optimization of the lead into a clinical candidate.²⁹
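The similarity-based hit expansion mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the cited Vertex study: it assumes each compound has already been reduced to a set of structural feature labels, uses the Tanimoto coefficient as the similarity measure, and works on a hypothetical three-compound library with an arbitrary 0.5 threshold.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient between two feature sets: |A and B| / |A or B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def expand_hit(hit_features: set, library: dict, threshold: float = 0.5) -> list:
    """Return (name, similarity) pairs at or above the threshold, most similar first."""
    scored = [(name, tanimoto(hit_features, feats)) for name, feats in library.items()]
    selected = [(name, score) for name, score in scored if score >= threshold]
    return sorted(selected, key=lambda pair: -pair[1])


# Hypothetical fragment hit and compound library (feature labels are illustrative).
hit = {"aromatic_ring", "amine", "carbonyl"}
library = {
    "cmpd_1": {"aromatic_ring", "amine", "carbonyl", "halogen"},
    "cmpd_2": {"aromatic_ring", "amine"},
    "cmpd_3": {"sulfonamide", "alkyl"},
}

neighbours = expand_hit(hit, library)
print(neighbours)
```

In practice the feature sets would come from a fingerprinting tool and the threshold would be tuned to the library; both are left open here.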
3. FRAGMENT INFORMATICS
What is a fragment? It is difficult to give a precise definition. Generally, fragments are small, low-MW, highly soluble molecules that have weak binding affinity for the target protein. Fragments are often used to build a larger lead compound with improved biological activity. Also, the term fragment can be regarded as a substructure or structural part of a more complex molecule. In 2003, Congreve et al. found that fragment hits possessed physicochemical properties that meet the criteria of the rule of three (RO3), namely (i) MW ≤ 300 Da; (ii) hydrogen bond donors and acceptors ≤ 3; (iii) LogP ≤ 3.⁴² Additional physicochemical properties for a fragment include three or fewer rotatable bonds and a polar surface area less than 60 Å². Typically, the MW of fragment hits is in the range of 120–250 and the binding affinity is 30 μM–1 mM.¹⁵ More recently, modifications or extensions of these rules have been suggested.⁴³⁻⁴⁵ Molecular fragments have long been used as descriptors for chemical similarity searches or diversity analysis and have played an important role in chemoinformatics analysis. In addition, fragments have also been associated with specific biological activities, privileged structural motifs of specific target families, and absorption, distribution, metabolism, excretion, and toxicity (ADME/T) profiles.⁴⁶⁻⁴⁸ In the following sections, recent progress in fragment informatics and its impact on FBDD will be described (Fig. 2).
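The rule-of-three criteria described in this section lend themselves to a simple library filter. The sketch below is a minimal illustration under stated assumptions: the thresholds are read as upper bounds (MW up to 300 Da, at most 3 donors, at most 3 acceptors, LogP up to 3, at most 3 rotatable bonds, polar surface area below 60 Å²), the descriptor values are taken as precomputed inputs, and the three fragments are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentProps:
    name: str
    mw: float             # molecular weight, Da
    h_donors: int         # hydrogen bond donors
    h_acceptors: int      # hydrogen bond acceptors
    logp: float
    rotatable_bonds: int
    psa: float            # polar surface area, square angstroms


def passes_rule_of_three(f: FragmentProps, extended: bool = True) -> bool:
    """Check the core RO3 criteria and, optionally, the two additional ones."""
    core = f.mw <= 300 and f.h_donors <= 3 and f.h_acceptors <= 3 and f.logp <= 3
    if not extended:
        return core
    return core and f.rotatable_bonds <= 3 and f.psa < 60


# Hypothetical descriptor values, for illustration only.
library = [
    FragmentProps("frag_A", 180.0, 1, 2, 1.2, 1, 45.0),
    FragmentProps("frag_B", 320.0, 1, 2, 1.0, 2, 40.0),  # fails on MW
    FragmentProps("frag_C", 250.0, 2, 3, 2.5, 5, 55.0),  # fails on rotatable bonds
]

strict = [f.name for f in library if passes_rule_of_three(f)]
core_only = [f.name for f in library if passes_rule_of_three(f, extended=False)]
print(strict, core_only)
```

Keeping the two additional criteria behind a flag makes it easy to compare how much of a candidate library each variant of the rule retains.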
A. Fragmentation Approaches and Fragment Space
Breaking molecules into fragments is the first step in fragment mining or fragment informatics analysis. Fragmentation of molecules allows the comparison of molecules using standard cheminformatics approaches. Nowadays, there are various publicly or commercially available databases, such as PubChem,⁴⁹ eMolecules,⁵⁰ WOMBAT,⁵¹ ZINC,⁵² WDI,⁵³ Medchem,⁵⁴ MDDR,⁵⁵ and CMC,⁵⁶ that consist of structure, property, and/or biological activity data for millions of small molecules. These easily accessible databases provide useful sources for cheminformatics analysis. Fragments are often obtained by in silico fragmentation of molecular structures. The average size of fragments and the composition of the fragment population are determined by two important parameters: the maximum number of permitted bond deletions per iteration and the total number of iterations.
Figure 2. The influence of fragment informatics on FBDD.
A number of well-defined computational fragmentation schemes have been reported. They can be mainly classified into substructure methods and building block methods. The substructure approaches treat the fragment as a substructure of the molecules and aim at a complete analysis of all possible fragments.⁵⁷ Such methods are not specific to fragments and are often used in QSAR or similarity searches. The building block methods mainly focus on chemically meaningful fragments and have wide applications in computational FBDD. Predefined breaking rules are used to dissect molecules into building blocks. For example, building blocks can be defined as rings, functional groups, side chains, or linkers. Our group has decomposed the MDDR database⁵⁵ into rings, linkers, and side chains, which can be used to build drug-like fragment libraries.⁵⁸ Another efficient way of fragmentation is virtual retrosynthesis. RECAP is the most widely used method and employs some common chemical reactions as the rules to break structures.⁵⁹ During the process of RECAP fragmentation, the bonds formed by one of these reactions are cleaved. RECAP has been successfully applied to explore the fragment space, analyze drug-like fragments in marketed drugs,⁶⁰,⁶¹ and construct synthetically feasible fragment libraries for de novo ligand design.⁶² More recently, Schulz et al. evaluated six different cheminformatics tools for the construction of a fragment library.⁶³ An iterative removal protocol proved to be the best method to design a diverse fragment library that can maximally represent the commercially available chemical space.
Chemical fragment space means combinations of molecular fragments and their connection rules. A rather small number of fragments can span a huge space of virtual compounds due to the combinatorial explosion.⁶⁴ Because the chemical space of fragments is significantly smaller than that of drug-like molecules,⁶⁵ good sampling in the fragment space may be achieved by FBDD. An important goal of fragment informatics analysis is to construct a drug-like and chemically tractable fragment space that can generate potent active compounds against a large variety of targets.⁶⁶
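The idea of a fragment space as "fragments plus connection rules" can be made concrete with a toy example. The sketch below is an illustration only: the fragment names, the attachment-point types, and the two connection rules are hypothetical, and the upper-bound count assumes that every position can be filled independently by any fragment, which ignores chemistry.

```python
def virtual_space_size(n_fragments: int, n_positions: int) -> int:
    """Upper bound on products when each position takes any fragment independently."""
    return n_fragments ** n_positions


def enumerate_pairs(fragments: dict, rules: set) -> list:
    """List unordered fragment pairs whose attachment types are allowed to connect."""
    names = sorted(fragments)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if frozenset({fragments[first], fragments[second]}) in rules:
                pairs.append((first, second))
    return pairs


# Hypothetical fragments mapped to the type of their single attachment point.
fragments = {
    "aryl_amine": "N",
    "alkyl_amine": "N",
    "benzoic_acid": "C(=O)",
    "acetic_acid": "C(=O)",
    "phenol": "O",
}
# Hypothetical connection rules: amide-like (N with C(=O)) and ester-like (O with C(=O)).
rules = {frozenset({"N", "C(=O)"}), frozenset({"O", "C(=O)"})}

pairs = enumerate_pairs(fragments, rules)
print(len(pairs), "allowed two-fragment products")
print(virtual_space_size(1000, 3), "three-fragment combinations from 1000 fragments")
```

With three independently chosen positions, 10³ fragments give 10⁹ combinations, the same order of magnitude as the estimate quoted in the Introduction; connection rules then prune this bound to the chemically sensible subset.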
Mauser et al. generated a fragment space on the scale of thousands of fragments through fragmentation of the WDI 2004⁵³ and the Medchem03⁵⁴ databases using the RECAP principle.⁶⁷ The fragment space contained two subsets: a subset containing the most frequently occurring fragments (2039 fragments) and a substructure-based diverse subset (1923 fragments). Validation studies revealed that the two subsets were complementary to each other and that their combination covered a larger part of drug-like chemical space. Tanaka et al. performed network analysis of fragment libraries by extracting relatively small compounds from the ZINC database.⁵² Moreover, an efficient compound-prioritization method was proposed for fragment linking. The variety of linkers was also shown to be relatively important for molecular diversity when fragment linking was performed. More recently, the fragment subset of a large compound database was analyzed. Deursen et al. visualized 4.5 million fragments in PubChem and provided important information on the distribution of structural diversity.⁶⁸
Although RECAP has been widely used, it only covers a very limited number of generally applicable reactions. Other methods constructed fragment spaces that avoid splitting known molecules by retrosynthetic rules. For example, Cramer's group from Tripos developed the ChemSpace technology⁶⁹ and AllChem⁷⁰ to navigate through known chemistry by the Topomer search methodologies.⁷¹ Another study used the Feature Trees Fragment Space Search (Ftrees-FS) method⁶⁴,⁷² to generate a huge fragment space encoding about 5 × 10¹¹ compounds based on established in-house synthetic protocols for combinatorial libraries.⁷³
B. Fragment Frequency Analysis and Fragment Mining
In terms of fragments, a great deal of information can be obtained from existing or virtual compound databases, such as the distribution of fragment types and their frequencies of occurrence and co-occurrence. Fragment frequency analysis is typically useful for understanding the nature of fragment–activity and fragment–drug-likeness relationships. Cheminformatics analysis of the differences in fragments between drugs and nondrugs, or between various classes of drugs, will provide medicinal chemists with useful information for prioritizing screening libraries or designing drug-like compounds.
There are two types of fragment frequency analysis: occurrence analysis and co-occurrence analysis. Occurrence analysis means the characterization of fragment distributions in large databases, while co-occurrence analysis is used to compare fragment sets in a pairwise manner. Pioneering work on the analysis of drug-like fragments was performed by Bemis and Murcko.⁶⁰,⁶¹ They identified frequently occurring molecular frameworks and side chains in drug sets selected from the CMC database. A similar strategy has also been applied to various databases to find drug-like fragments.⁷⁴⁻⁷⁷ More recently, Wang and Hou provided an update on drug-like fragment analysis and identified high-quality fragments for drug design.⁷⁸ Frequencies for three kinds of building blocks (ring system, drug scaffold, and small fragment) were calculated for an FDA-approved drug database (ADDS) and an extended drug dataset (EDDS). Most top fragments were found to be essentially common to both drug datasets. Moreover, there is a significant difference in the distribution of chemical fragments between oral drugs and injectable drugs.⁷⁹ Therefore, compounds designed from fragments of marketed oral drugs are more likely to have good bioavailability. Fragments that are recurrent in compounds with different activities are a useful source for the design of multitarget ligands.⁸⁰,⁸¹ Sheridan et al. used common substructures from the MDDR database to identify fragment replacements in drug-like molecules⁸² and fragments that are associated with multiple biological activities.⁸¹ Drug-like bioisosteric groups have been identified by the cheminformatics analysis of the frequency of occurrence of organic substituents in more than 3 million molecules.⁸³ Haubertin and Bruneau systematically analyzed one-to-one chemical replacements occurring in an in-house drug-like dataset of AstraZeneca and built a web-based database of historically observed chemical replacements.⁸⁴ Furthermore, fragment frequency analysis has also been used to build predictive models for ADME/T prediction.⁸⁵,⁸⁶
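The distinction between occurrence and co-occurrence analysis drawn at the start of this discussion can be sketched as two counting passes over a fragmented dataset. The example assumes each molecule has already been reduced to a set of fragment labels by some fragmentation scheme; the four molecules and their labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def occurrence(molecules: dict) -> Counter:
    """Count in how many molecules each fragment occurs."""
    counts = Counter()
    for fragment_set in molecules.values():
        counts.update(fragment_set)
    return counts


def co_occurrence(molecules: dict) -> Counter:
    """Count in how many molecules each unordered fragment pair occurs together."""
    counts = Counter()
    for fragment_set in molecules.values():
        counts.update(combinations(sorted(fragment_set), 2))
    return counts


# Hypothetical dataset: molecule id -> set of fragment labels.
molecules = {
    "mol1": {"benzene", "piperidine", "amide"},
    "mol2": {"benzene", "amide"},
    "mol3": {"pyridine", "piperidine", "amide"},
    "mol4": {"benzene", "pyridine"},
}

occ = occurrence(molecules)
co = co_occurrence(molecules)
print(occ.most_common(2))
print(co.most_common(2))
```

Comparing such counts between two datasets (for example drugs and nondrugs) is one simple way to surface fragments that are over-represented in one class.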
Bajorath's group introduced a new method named MolBlaster that used randomly generated fragment populations to evaluate molecular similarity relationships.⁸⁷ MolBlaster generates fragment profiles of molecules by random deletion of bonds in connectivity tables and compares them quantitatively using entropy-based metrics. The term fragment profile differs from molecular fingerprint, because it is randomly generated and does not depend on predefined structural or property descriptors. Fragment profiles can encode sufficient information for similarity evaluation, and the approach has been developed into a new tool for ligand-based virtual screening.⁸⁸ Furthermore, the same group developed a new methodology to mine and organize randomly generated molecular fragment populations.⁸⁹ Unique fragment signatures were identified for molecular sets with similar activities, and then fragment pathways of biologically active molecules were mapped. The results indicated that compound class-specific information and activity-specific fragment hierarchies could be obtained from random fragment profiles. More recently, Lounkine et al. developed a new approach termed FragFCA to identify molecular fragments and fragment combinations that are specific for compounds having different activity profiles or that are unique to highly potent molecules.⁹⁰ FragFCA uses chemically intuitive queries of varying complexity to systematically identify sets of signature fragments in a flexible and interactive manner, and is applicable to fragments derived by any fragmentation scheme.
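The random-deletion idea behind fragment profiles can be illustrated with a toy sketch. This is not the MolBlaster implementation: it assumes a molecule is given as a list of heavy-atom symbols plus a bond list, uses the sorted element symbols of each connected piece as a crude stand-in for a canonical fragment structure, and summarizes the profile with Shannon entropy as one example of an entropy-based metric. The two parameters mirror those named earlier in the text (bond deletions per iteration and number of iterations).

```python
import math
import random
from collections import Counter


def connected_components(n_atoms: int, bonds: list) -> list:
    """Return the connected components of the bond graph as sorted atom-index lists."""
    adjacency = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


def fragment_profile(atoms, bonds, deletions=2, iterations=200, seed=0) -> Counter:
    """Delete `deletions` random bonds per iteration and count the resulting pieces."""
    rng = random.Random(seed)
    profile = Counter()
    for _ in range(iterations):
        removed = set(rng.sample(range(len(bonds)), deletions))
        kept = [bond for i, bond in enumerate(bonds) if i not in removed]
        for component in connected_components(len(atoms), kept):
            # Crude fragment key (assumption): sorted element symbols of the piece.
            profile["".join(sorted(atoms[i] for i in component))] += 1
    return profile


def shannon_entropy(profile: Counter) -> float:
    """Shannon entropy (bits) of the fragment frequency distribution."""
    total = sum(profile.values())
    return -sum((c / total) * math.log2(c / total) for c in profile.values())


# Hypothetical four-atom chain N-C-C-O (heavy atoms only).
atoms = ["N", "C", "C", "O"]
bonds = [(0, 1), (1, 2), (2, 3)]
profile = fragment_profile(atoms, bonds)
print(profile, round(shannon_entropy(profile), 3))
```

Because the profile depends only on the random deletions and not on predefined descriptors, two molecules can be compared by comparing their profile distributions; a fixed seed is used here only to make the sketch reproducible.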
Lameijer et al. reported fragment mining results for the NCI database.⁹¹ The database was split into more than 60,000 fragments of varying types: ring systems, linkers (26 linkers) and substituents. After analyzing the fragment occurrence and co-occurrence, the authors identified chemical clichés that indicated the most frequently occurring fragments and frequently co-occurring pairs of fragments. The resulting fragment libraries and correlations can give medicinal chemists more ideas about lead optimization, clusters of biologically active structures, and relatively unexplored parts of chemical space.
Analysis of fragment frequencies in biologically active compounds can also be used to build interpretable models for the prediction of activity and target space. A research group from Lilly described a simple approach for fragmentation of a literature-based dataset and constructed naive Bayes models for predicting potency against individual kinases by comparing the similarity of fragment fingerprints.⁷⁶ The statistical models had good predictive ability for kinase potency in both retrospective and prospective tests. Moreover, the comparison of fragment distributions in active molecules is also useful to assess target similarity. This method based on fragment-derived similarities complements sequence-based comparisons and whole-molecule approaches.
C. Fragment Tree
During the process of lead optimization in drug discovery, medicinal chemists often synthesize a series of analogues with a common or similar core fragment (scaffold). Thus, the structure of a biologically active compound can be dissected into scaffold and substituents (R-groups). In order to find a highly potent compound against a given target, most medicinal chemistry efforts are focused on the variation of the composition of the scaffolds and the substituents. For a small dataset, the importance of the scaffolds and substituents is straightforward to assess and the SARs can be easily understood. But for large databases, especially those containing a series of similar scaffold substructures, it is difficult to clearly elucidate SAR. In this case, organizing the structures in the form of a hierarchical tree is often beneficial to rationalize SAR. In a hierarchical tree, structures with a common core fragment are arranged in branches and each node is a substructure (fragment) shared by all of its descendant (smaller) nodes. A well-constructed hierarchical tree can bring insights into which core fragments and which peripheral substituents are responsible for the activity, toxicity, or other relevant properties.
The methods for building a hierarchical tree are largely dependent on the nature of the dataset. If the compounds were synthesized by linking various substituents with similar core fragments in a stepwise manner, it is easy to construct a fragmentation tree on the basis of the synthetic procedures. A descendant hierarchy can also be constructed by using the known common scaffolds as the root fragments. However, scaffolds are not always known ahead of time for a collection of structurally similar molecules without specific information about common substructures. Thus, algorithms should be developed to determine which parts of a structure are most scaffold-like.
Several methods have been developed for the extraction, identification, and classification of chemical scaffolds.⁹²⁻⁹⁴ Using these approaches, the scaffold tree can be built and the scaffold universe of the dataset can be visualized. More recently, Clark et al. reported new algorithms for common scaffold alignment, multiple scaffold detection, and scaffold substructure assignment.⁹⁵ These methods can address the issues of multiple scaffolds, noncommon scaffolds, and symmetrical common scaffolds and produce informative data for structure–activity analysis. Furthermore, the same group described a more informative method for producing two-dimensional (2D) depiction layout coordinates for each node in a scaffold tree.⁹⁶ The algorithm includes generating a fragment tree, mapping sibling fragments onto each other in an optimal way, and using this mapping to guide a 2D depiction process. The advantage of the approach is that common ancestor fragments can be depicted and oriented in a consistent way and thus common structural features are readily evident to medicinal chemists. An interactive tool called the scaffold explorer⁹⁷ differs from other automated scaffold classification algorithms in that the scaffolds can be of arbitrary complexity and the user can construct a scaffold tree interactively. Scaffold explorer allows medicinal chemists to accommodate their intuition and shows good interactivity for mapping SAR across different chemotypes.
The above-mentioned methods for building the scaffold trees are mainly based on chemistry-derived rules and are primarily used to map chemical space. If the generation of scaffold trees can be guided by both chemistry and bioactivity-derived rules, chemical space and its related biological space can be navigated with better efficiency. Waldmann and colleagues reported an interactive tool, named Scaffold Hunter, for intuitive hierarchical structuring, analysis, and visualization of complex structure and biological activity data.⁹⁸,⁹⁹ Scaffold Hunter reads data containing both chemical structure and biological activity (e.g. data from HTS). Then, the program extracts chemically meaningful scaffolds and iteratively deconstructs those large scaffolds (child scaffolds) one ring at a time to create small scaffolds (parent scaffolds). Biochemical and biological activities are used as major criteria to guide the hierarchical arrangement of parent scaffolds and child scaffolds to create a tree with various branches. Thus, the resulting tree can be associated with potency data. The method has been validated by retrospective analysis of two large databases, PubChem and WOMBAT. The advantages of Scaffold Hunter include: (i) it investigates large chemical and biological spaces more rapidly and efficiently; (ii) it can identify virtual (or new) scaffolds that possess bioactivity similar to the respective child or parent scaffolds; (iii) it can simplify structurally complex compounds (e.g. natural products) to two-ring to four-ring scaffolds with retained bioactivity that are synthetically tractable and can be used to design new active chemotypes. Figure 3 outlines the process of Scaffold Hunter for identification of new active scaffolds. The seven-ring scaffold 5-lipoxygenase (5-LOX) inhibitor 1 was successively deconstructed one ring at a time.⁹⁸ A branch of smaller molecules, except 5, was annotated with 5-LOX inhibitory activity in WOMBAT. After testing for 5-LOX inhibitory activity, the three-ring scaffold 5 (IC₅₀ = 9.5 μM) and its derivative 8 (IC₅₀ = 3 μM) were found to be novel 5-LOX inhibitors. Although they are less active than the four-ring scaffold 3 (IC₅₀ = 1.5 μM), compound 5 had higher LE values.¹⁰⁰ Moreover, compounds 5 and 8 did not contain typical functional groups found in classical 5-LOX inhibitors and represent a new scaffold for hit optimization. By a similar procedure, a tetrahydroisoquinoline scaffold was identified to possess estrogen receptor (ER) antagonistic activity,⁹⁸ and subsequent hit optimization led to novel ligands with a simple two-ring core and good selectivity toward ER (ER).¹⁰¹
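The ring-by-ring deconstruction with activity-guided parent selection described above can be sketched on a toy representation. This is not the Scaffold Hunter algorithm, which applies chemical prioritization rules to real structures: here a scaffold is assumed to be just a set of ring labels with a ring-adjacency map, a ring may be removed only if the remaining rings stay connected, and among the candidate parents the one with the best annotated activity (lowest hypothetical IC₅₀) is chosen. Scaffolds on the branch without an annotation correspond to the "virtual" scaffolds mentioned in the text.

```python
import math


def is_connected(rings: frozenset, adjacency: dict) -> bool:
    """True if the given rings form one connected piece in the ring-adjacency map."""
    if not rings:
        return False
    start = next(iter(rings))
    seen, stack = {start}, [start]
    while stack:
        ring = stack.pop()
        for neighbour in adjacency.get(ring, ()):
            if neighbour in rings and neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == set(rings)


def parent_candidates(scaffold: frozenset, adjacency: dict) -> list:
    """All scaffolds obtained by removing one ring while staying connected."""
    return [scaffold - {ring} for ring in sorted(scaffold)
            if is_connected(scaffold - {ring}, adjacency)]


def build_branch(scaffold: frozenset, adjacency: dict, activity: dict) -> list:
    """Deconstruct one ring at a time, preferring parents with the lowest IC50."""
    branch = [scaffold]
    while len(scaffold) > 1:
        candidates = parent_candidates(scaffold, adjacency)
        if not candidates:
            break
        # Unannotated scaffolds sort last; ties are broken by ring labels.
        scaffold = min(candidates, key=lambda s: (activity.get(s, math.inf), sorted(s)))
        branch.append(scaffold)
    return branch


# Hypothetical four-ring chain A-B-C-D with made-up IC50 annotations (micromolar).
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
activity_um = {frozenset("ABCD"): 1.0, frozenset("ABC"): 2.0, frozenset("BC"): 5.0}

branch = build_branch(frozenset("ABCD"), adjacency, activity_um)
virtual = [s for s in branch if s not in activity_um]
print([sorted(s) for s in branch], [sorted(s) for s in virtual])
```

In this toy branch the single-ring scaffold has no annotation, so it would be a candidate for experimental testing, loosely analogous to scaffold 5 in the 5-LOX example.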
4. ACTIVE SITE MAPPING AND CHARACTERIZATION BY FRAGMENT-BASED APPROACHES
An initial step of SBDD is the identification of hot spots in the binding pocket or active site of the drug target. The hot spots (or consensus sites) are important regions that can bind small drug-like molecules and contribute substantially to the binding free energy. Therefore, identification and characterization of such hot spots is critical for rational drug design. Multiple solvent crystal structures (MSCS) is an experimental tool to predict ligand-binding sites of target proteins.¹⁰²,¹⁰³ To use this technique, a crystalline protein is exposed to various organic solvents (smaller fragments), and then the consensus sites on the protein's surface that are colocalized with multiple solvent molecules can be treated as potential ligand-binding regions. Although MSCS and other experimental methods are efficacious tools for active site mapping, they generally require an expensive investment in equipment and resources. As an alternative, various computational approaches are available to predict the ligand-binding sites of a protein.¹⁰⁴
Figure 3. Illustration of the procedure for the generation of the scaffold tree and brachiation-based identification of new active scaffolds for 5-lipoxygenase inhibitors.
There are two classes of algorithms for structure-based pocket prediction: (i) geometric
algorithms and (ii) probe mapping/docking algorithms.
105
For the latter, fragments are used
as molecular probes for protein surface mapping and identification of hot spots. GRID
106, 107
and multiple copy simultaneous search (MCSS)
108
are two well-accepted methods for active site
characterization. GRID calculates three-dimensional (3D) energy maps around protein binding
sites, thus highlighting favorable sites for small functional groups. MCSS randomly places
thousands of copies of small functional groups into the binding site, and the most energetically
favorable position of each copy is determined by energy minimization. The copies with the
FRAGMENT INFORMATICS AND COMPUTATIONAL FBDD
Table I. Summary of Advantages and Disadvantages of Fragment-Based Computational Tools for Active
Site Mapping

Method: GRID
Advantages: Global search of the entire protein surface
Disadvantages: Requires empirical parametrization; lack of water molecules in the model

Method: CS-Map
Advantages: Better sampling, the ability to find small buried pockets, and a desolvation term in the free energy calculation
Disadvantages: No consideration for bonded interactions and different dielectric constants for different targets

Method: FTMAP
Advantages: Fast FFT correlation approach to efficiently reduce the computational costs
Disadvantages: Lack of water molecules in the model

Method: MCSS
Advantages: The most established method, reasonable use of physicochemical potential functions, and incorporation of the standard molecular simulation framework
Disadvantages: No consideration for the cooperative effects of water; locates minimum enthalpy poses rather than finding hot spots

Method: 3D-RISM-based method
Advantages: A realistic model including the coexistence of water and revealing the dependence of ligand binding modes on the ligand concentration
Disadvantages: No consideration of protein structural change induced by ligand binding

Method: Grand canonical Monte Carlo simulation
Advantages: Fast and simple parameters without prior knowledge and calibration
Disadvantages: Lack of complete validation; case sensitive

Method: Barril's method
Advantages: Nonparametric and applicable to any target class; detects hot spots for both small molecules and macromolecules
Disadvantages: Computationally expensive and limited sampling
lowest energies highlight hot spots of ligand binding. The methodology and applications of
MCSS have been reviewed by Schubert et al.
109
Besides GRID and MCSS, other computational
approaches based on fragment mapping/docking and scoring are summarized in Table I.
Earlier methods in this field have been reviewed,
110, 111
and the following sections mainly focus
on important progress in recent years.
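The idea behind a GRID-style energy map can be sketched in a few lines. The code below is an illustration only, not the GRID program or its force field: it assumes a simple 12-6 Lennard-Jones plus Coulomb probe energy, a single hypothetical protein atom, and illustrative parameter values (epsilon, sigma, dielectric constant).

```python
import math

def probe_energy(point, atoms, probe_charge, epsilon=0.1, sigma=3.5, dielectric=4.0):
    """Lennard-Jones plus Coulomb energy (kcal/mol) of a probe placed at `point`."""
    total = 0.0
    for x, y, z, charge in atoms:
        r = max(math.dist(point, (x, y, z)), 0.5)  # clamp to avoid dividing by ~0
        sr6 = (sigma / r) ** 6
        total += 4.0 * epsilon * (sr6 * sr6 - sr6)                    # van der Waals term
        total += 332.06 * probe_charge * charge / (dielectric * r)   # electrostatic term
    return total

def energy_map(atoms, origin, n_points, spacing, probe_charge):
    """Evaluate the probe energy on a cubic lattice; returns {(x, y, z): energy}."""
    grid = {}
    for i in range(n_points):
        for j in range(n_points):
            for k in range(n_points):
                point = (origin[0] + i * spacing,
                         origin[1] + j * spacing,
                         origin[2] + k * spacing)
                grid[point] = probe_energy(point, atoms, probe_charge)
    return grid

# Hypothetical one-atom "protein": (x, y, z, partial charge).
toy_atoms = [(0.0, 0.0, 0.0, -0.5)]
grid = energy_map(toy_atoms, origin=(-4.0, -4.0, -4.0), n_points=9,
                  spacing=1.0, probe_charge=0.5)
best_point = min(grid, key=grid.get)  # most favorable lattice point for this probe
```

Lattice points with the lowest energies play the role of favorable sites for the probe; a real map would use a full protein structure and a calibrated potential.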
Vajda's group reported a fragment-based computational mapping program named CS-Map.
90
CS-Map uses a three-step mapping algorithm
81
that includes: (i) finding regions with
favorable electrostatics and solvation by rigid body search; (ii) refinement of free energy and
docking; and (iii) clustering, scoring, and ranking. As compared with earlier mapping methods,
CS-Map performs better sampling of regions with favorable desolvation and electrostatics. Its
scoring function takes the desolvation effect into account and the positions of the docked
ligands are clustered and ranked on the basis of their average free energies. More recently,
the same group proposed a new algorithm named FTMAP that uses the Fourier transform
(FT) correlation method for sampling protein-probe complexes in combination with a highly
accurate energy function.
80
FTMAP is more efficient than CS-Map and free to academic users.
A recent validation study revealed that FTMAP could reproduce the MSCS data successfully
for two targets of Parkinson's disease.
112
Moreover, this method can discover hot spots that are
not found in the MSCS experiments.
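The clustering and ranking step described for CS-Map can be illustrated as follows. This is a minimal sketch under stated assumptions: a greedy distance-based clustering seeded at the lowest-energy pose, a hypothetical 4 Å radius, and made-up probe positions and free energies. It is not the published CS-Map code.

```python
import math

def cluster_and_rank(poses, radius=4.0):
    """Greedy clustering of (xyz, free_energy) poses, ranked by mean free energy."""
    remaining = sorted(poses, key=lambda pose: pose[1])  # lowest energy first
    clusters = []
    while remaining:
        center = remaining[0][0]  # best unassigned pose seeds a new cluster
        members = [p for p in remaining if math.dist(p[0], center) <= radius]
        remaining = [p for p in remaining if math.dist(p[0], center) > radius]
        mean_energy = sum(energy for _, energy in members) / len(members)
        clusters.append({"center": center, "size": len(members),
                         "mean_energy": mean_energy})
    return sorted(clusters, key=lambda c: c["mean_energy"])

# Hypothetical docked probe positions (angstrom) and free energies (kcal/mol).
toy_poses = [((0, 0, 0), -5.0), ((1, 0, 0), -4.0),
             ((10, 0, 0), -3.0), ((11, 0, 0), -2.0),
             ((30, 0, 0), -1.0)]
ranked = cluster_and_rank(toy_poses)
```

The top-ranked cluster is the candidate consensus site; with several probe types, sites where clusters of different probes overlap would be the hot spots.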
Imai et al. used a 3D reference interaction site model (3D-RISM) to identify the most
favorable positions and orientations of fragment molecules on a protein surface.
46
A unique
feature of the 3D-RISM-based method is that the ligand mapping calculation is performed
within a realistic model by considering ligands and water at the same level in terms of the site
distribution. Therefore, this method may achieve entire ligand mapping on a protein surface
in a real solution system. In addition, the 3D-RISM-based method can investigate the influence
of ligand concentration on the binding mode.
Some computationally expensive procedures, such as molecular dynamics simulation and
Monte Carlo simulation, have also been used for fragment mapping, clustering, and ranking.
Clark et al. reported a rapid and practical method to compute binding free energies of a large
number of fragment poses on the entire protein surface.
113
The method uses grand canonical
Monte Carlo (GCMC) simulation to compute ligand-protein binding and predicts the
affinities and preferred binding poses of small molecular fragments. Thus, the fragments can
be computationally assembled into higher MW lead compounds. Barril's group developed a
new method to detect binding sites based on first-principles molecular simulations.
47
Moreover,
this method is able to quantify the maximal binding affinity that a ligand may achieve,
and thus can efficiently measure the druggability of the target. Because the method is not
trained on a dataset, it is applicable to any target class. Although it is computationally de-
manding, it provides very detailed information about the interaction preferences of the binding
sites.
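For readers unfamiliar with GCMC, the acceptance rules for inserting and deleting a fragment can be sketched as below. The sketch uses the textbook Adams formulation, in which a parameter B combines the chemical potential and the system volume; whether the cited study uses exactly this form is an assumption, and the function and parameter names are illustrative.

```python
import math

def _acceptance(log_ratio):
    """Metropolis acceptance probability min(1, exp(log_ratio)) without overflow."""
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)

def accept_insertion(n_fragments, delta_u, b_param, beta):
    """Probability of accepting N -> N+1; delta_u = U(N+1) - U(N)."""
    return _acceptance(b_param - beta * delta_u - math.log(n_fragments + 1))

def accept_deletion(n_fragments, delta_u, b_param, beta):
    """Probability of accepting N -> N-1; delta_u = U(N-1) - U(N)."""
    if n_fragments == 0:
        return 0.0  # nothing to delete
    return _acceptance(math.log(n_fragments) - b_param - beta * delta_u)
```

Lowering B makes insertions harder to accept, so the sites that remain occupied at low B are those where the fragment binds most favorably.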
5. FRAGMENT DOCKING AND VIRTUAL FRAGMENT SCREENING
In most experimental FBDD studies, only hundreds to thousands of fragments can be screened.
In contrast, at least 250,000 fragments are commercially available,
114
leaving a large portion
of fragment libraries untested. Because commercially available fragments are too numerous
to be screened experimentally, virtual fragment screening by molecular docking seems to be
a complementary approach. However, docking and scoring fragments accurately remains a
challenge. First, fragments are small in size and have low MWs. During docking calcula-
tions, a number of interaction sites on protein surfaces (closely related energy minima) might
be found to accommodate the fragment, which would lead to false docking positions. Even
if fragments are placed into the correct pocket, if the binding pocket is large, it might re-
sult in incorrect binding modes.
115
Second, the internal degrees of freedom of fragments are
generally fewer than those of larger compounds. It is more difficult to predict their binding pose because
alternative binding modes might yield similar docking scores or calculated binding energies.
Third, fragments are typically weak binders and current scoring functions are not accurate
enough to differentiate an active fragment among many nonactive fragments, because most
of the scoring functions have been developed and optimized on the basis of larger drug-like
molecules.
116
Shoichet's group reported pioneering results for fragment docking and screening of
AmpC β-lactamase inhibitors.
117
A database containing 137,639 fragments was docked by
DOCK3.5.54. Forty-eight top-ranked fragments were subjected to an in vitro enzyme inhibition
assay and 23 hits with Ki
values in the range of 0.7–9.2 mM were identified. For AmpC
β-lactamase, the hit rate of the in silico fragment screening (48%) was considerably higher
than those of both virtual screening and HTS of larger molecules. The accuracy of the docking poses
of the active fragments was further investigated by solving the crystal structures of fragment-enzyme
complexes. For the eight cocrystallized fragments, four fragments had good pose fidelity
(RMSD range: 1.2–1.4 Å) and two fragments retained most key interactions (RMSD values:
2.4 Å and 2.6 Å). The high hit rate and docking accuracy in this case study support the
feasibility of molecular docking to prioritize molecules from commercially available fragment
libraries. Moreover, this study also highlighted the importance of the selection of an appropriate
docking method and scoring function. In Shoichet's study, DOCK3.5.54 was selected to screen
fragments because its physics-based scoring function can prioritize active fragments from inac-
tive ones.
117
In contrast, energy-based scoring functions
118
are limited in their applications for
ranking fragments.
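The two quantities used to judge the AmpC study, the hit rate and the pose RMSD, are simple to compute. The sketch below reproduces the reported 48% from the 23 hits among 48 tested fragments; the RMSD helper assumes the docked and crystallographic atoms are already listed in matching order, and its function name is illustrative.

```python
import math

def hit_rate(n_hits, n_tested):
    """Fraction of experimentally tested compounds that were confirmed as hits."""
    return n_hits / n_tested

def pose_rmsd(docked, crystal):
    """Heavy-atom RMSD (same units as the input) between two matched coordinate lists."""
    if len(docked) != len(crystal):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(docked, crystal))
    return math.sqrt(squared / len(docked))

ampc_hit_rate = hit_rate(23, 48)  # numbers taken from the study summarized above
```

No superposition is performed here, which is the usual convention for docking poses because both structures share the frame of the protein.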
Besides DOCK, several de novo drug design programs, such as LUDI
107
and SEED,
119
also can dock fragments into the correct pocket of the active site. For example, Caflisch's
group developed the program DAIM,
102
which can automatically decompose molecules into
fragments and then select the anchor fragments for docking. DAIM uses the docking algorithms
from SEED
103, 119
and has been validated with six different target enzymes.
106
Glide
120, 121
is
another efficient tool for fragment docking.
122, 123
Researchers from AstraZeneca evaluated
the performance of Glide for virtual screening of fragment inhibitors of DNA ligase and
prostaglandin D2 synthase.
122
The results indicated that using GlideSP with its default settings
gave the best performance and provided an enrichment that was better than random sampling
and comparable to virtual screening of drug-like molecules. More recently, evaluation studies
suggested that the sampling efficacy of Glide was adequate for fragment docking, but the
performance of scoring functions required further improvement.
123
Another newly reported
study revealed that there is no significant difference in docking performance between fragments
and drug-like compounds.
124
Better docking performance was observed for compounds with
higher LE values, mainly because they can form high-quality interactions with the target.
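Enrichment relative to random sampling, as used in the Glide evaluation above, is commonly expressed as an enrichment factor. The sketch below assumes a ranked list of activity labels (1 for active, 0 for inactive, best score first); the toy labels are hypothetical and not taken from the cited studies.

```python
def enrichment_factor(ranked_labels, top_fraction):
    """Ratio of the active rate in the top-ranked subset to the active rate overall."""
    n_total = len(ranked_labels)
    n_top = max(1, round(n_total * top_fraction))
    actives_total = sum(ranked_labels)
    if actives_total == 0:
        raise ValueError("the library contains no actives")
    actives_top = sum(ranked_labels[:n_top])
    return (actives_top / n_top) / (actives_total / n_total)

# Hypothetical ranking of ten fragments, three of which are active.
toy_ranking = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
ef_top20 = enrichment_factor(toy_ranking, top_fraction=0.2)
```

A value near 1 means the docking ranking is no better than random selection; larger values mean that actives are concentrated at the top of the list.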
In 2011, Knehans et al. reported a successful example of in silico fragment screening of
dengue virus (DENV) protease inhibitors.
125
A library of 149,151 fragments was obtained from
RECAP fragmentation of the drug-like molecules from the ZINC database, which were subse-
quently docked into a homology model of DENV protease by AutoDock Vina.
126
A total of
220 top-ranked fragment hits were discovered and subsequently linked to 815 new molecules.
Virtual high-throughput docking was performed again to finally select 23 candidates for biological
testing and two hits were proven to be active in the micromolar range. It is expected that
higher hit rates would be achieved if the accuracy of fragment docking is improved. Regardless,
this computational strategy effectively mimicked the process of experimental FBDD, which
integrates fragment library design, fragment docking, fragment linking, and biological testing.
Two other studies used a similar approach for ligand design,
127, 128
but the resulting molecules
have not yet been validated by experimental studies. Fragment docking can also be used in
parallel to experimental fragment screening. A recent study discovered orally active inhibitors
of Hsp90 molecular chaperone by merging structural elements of different hits derived from
parallel fragment screening and fragment docking.
129
Currently, there are four kinds of strategies to improve the accuracy of fragment docking.
The first is to add sophisticated and intensive computational tools (e.g., MM/PBSA,
MM/GBSA, and QM/MM) to the postdocking process. Gleeson et al. evaluated the per-
formance of QM/MM-based models to reoptimize and rescore cross-docking poses of nine
fragment-like kinase inhibitors.
130
Hybrid QM/MM calculations were proven to be useful as
a tool for kinase FBDD. In contrast, Kawatkar's results indicated that adding more computationally
intensive procedures to Glide docking, such as MM/GBSA rescoring, did not
improve the enrichment.
122
Therefore, success with these computationally expensive rescoring
strategies might be system dependent.
The second strategy to optimize the fragment docking procedure is to improve the per-
formance of scoring functions. The main problem with fragment docking failures is that the
scoring functions are often unable to distinguish the correct binding mode from the incor-
rect ones.
124
Marcou et al. found that scoring by the similarity of interaction fingerprints for
posing and prioritizing either fragments or molecular scaffolds was statistically superior to
conventional scoring functions.
131
One possible option to overcome the limitation of scoring
functions might be the development of fragment-specific scoring functions.
123
To achieve this
goal, the binding nature of fragments should be taken into account. For example, fragments
have fewer rotatable bonds, hydrogen bond acceptors and donors, and form fewer specific interactions
than drug-like molecules. These features should be considered when estimating their
enthalpic and entropic contributions during the parameterization of currently available scoring
functions. In addition, force field-based scoring protocols and other more advanced methods
for evaluating protein-ligand interaction energies are likely to be good solutions for improving
fragment-specific scoring functions.
124
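Scoring poses by interaction fingerprint similarity, as mentioned above, can be sketched with a Tanimoto coefficient. The representation below (a set of residue and interaction-type pairs) and all residue labels are hypothetical simplifications; published fingerprint schemes typically encode interactions as fixed-length bit strings.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two interaction fingerprints stored as sets."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_poses(poses, reference_fp):
    """Order pose names by similarity of their fingerprint to a reference fingerprint."""
    return sorted(poses, key=lambda name: tanimoto(poses[name], reference_fp),
                  reverse=True)

# Hypothetical fingerprints: (residue label, interaction type) pairs.
reference = {("RES1", "hbond"), ("RES2", "hbond"), ("RES3", "hydrophobic")}
toy_poses = {
    "pose_a": {("RES1", "hbond"), ("RES2", "hbond")},
    "pose_b": {("RES3", "hydrophobic"), ("RES4", "hbond")},
}
ranking = rank_poses(toy_poses, reference)
```

The reference fingerprint would normally come from a known ligand complex, so this kind of scoring needs prior structural knowledge that a conventional scoring function does not.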
The third strategy is to make the fragments larger when docking them into the active site.
Fukunishi et al. proposed the replica generation (FSRG) method to optimize fragment docking.
In the FSRG method, a set of larger molecules (replica molecules) are generated by adding
side chains to the fragment. In the docking simulation, only complementarity between the
surface of the compound and protein was evaluated, whereas hydrogen-bonding and Coulombic
interactions were ignored.
132
In a validation study where inhibitors of six target proteins were
screened, the FSRG method was proven to be effective in finding active fragments among the
decoy compounds.
The fourth strategy is to dock multiple fragments simultaneously. In real cases, multiple
ligands are always involved in the process of molecular recognition whereas most of the docking
methods only dock one ligand at a time. Li et al. proposed the multiple ligand simultaneous
docking (MLSD) strategy that can mimic real molecular binding processes and improve the
sampling of docking poses and scoring of binding free energy.
133
The MLSD method has
been used for fragment-based discovery of novel inhibitors of signal transducer and activator
of transcription 3 (STAT3).
134
A small library of drug scaffolds was simultaneously docked
into hot spots of STAT3 by MLSD to identify optimal fragment combinations. Linking of
the fragment hits in combination with similarity search and structural optimization led to the
discovery of two novel STAT3 inhibitors.
Moreover, preparing the protein structure carefully and choosing appropriate parameters
are also important for accurate fragment docking.
108
Molecular dynamics simulation is an efficient
tool to investigate the conformational space of the target proteins and provide reasonable
conformation or conformation ensembles for subsequent fragment docking. Ekonomiuk's example
indicated that using molecular dynamics snapshots of NS3 protease for fragment-based
docking could identify two small-molecule inhibitors that could not be identified by simply
using the X-ray structure.
135
6. FROM FRAGMENTS TO LEADS: DE NOVO DRUG DESIGN
A. De novo Drug Design versus FBDD
Experimental FBDD uses sensitive biophysical techniques to identify low-affinity fragment
hits.
4
Then fragment hits are evolved into leads or candidates by various structure-based design
strategies.
15
In contrast, de novo drug design can be seen as completely virtual FBDD because
they have similar concepts and objectives. Both approaches start from small fragments (building
blocks) and aim to convert them into drug-like compounds with novel chemotypes and desired
pharmacological properties. De novo design was first introduced around 1989,
136
and
was regarded as a complementary approach to HTS and FBDD. In principle, de novo design
is cost and time effective, and can explore larger chemical space. Although more than 30 de
novo design tools have been developed,
39, 62
the success of de novo design in lead discovery
and optimization lags far behind that of experimental FBDD. Compared with the more than ten
clinical candidates identified by experimental FBDD,
6
de novo drug design rarely generates
novel molecules with nanomolar activity. The main problems of de novo design include (i) low
efficiency in the sampling of the chemical space (or drug-like space); (ii) little consideration for
synthetic feasibility when constructing new structures; (iii) poor accuracy of scoring functions
to predict the binding free energy of the designed compounds; and (iv) unreasonable ADME/T
properties.
39
The success of experimental FBDD greatly motivates the improvement of de novo design.
First, the core idea of de novo drug design is very similar to that of experimental FBDD. Developing
new methods that can overcome the limitations of traditional de novo design methods
should help to find high-quality leads efficiently. Second, de novo design could also complement
current experimental FBDD methods by providing potential solutions and reducing
experimental resources. Third, de novo design tools are helpful to address current challenges
of experimental FBDD. For instance, it is difficult to experimentally investigate the linkers that
enable fragment hits to maintain their binding mode. In this case, de novo design tools can use
simple geometric descriptors to suggest reasonable linkers and predict the binding mode of the
resulting molecules by docking and scoring.
The starting point for de novo design can be atoms or fragments. It is noted that all
the atom-based tools were developed two decades ago.
39
Atom-based approaches have the
advantage that they can systematically search both the chemical space and structural diversity.
However, these tools always suggest a huge number of potential solutions and the resulting
molecules are often problematic in terms of chemical stability, synthetic possibility, and drug
likeness. These limitations can be overcome by fragment-based approaches because the search
space can be significantly reduced. The synthetic accessibility and drug likeness of the designed
ligand can be more readily captured. The typical components of a computational fragment-
based de novo design process include a fragment library, a compound build-up scheme, a
scoring function, and an optimization procedure. Table II summarizes the key features of new
de novo design methods reported from 2006 to 2011. In the following sections, we will present
recent progress made in the methodologies and applications of fragment-based de novo design
with a focus on current solutions to address the problems of de novo design.
B. Strategies to Assemble and Optimize Fragments
There are two major strategies to assemble fragments into novel molecules: fragment-based
growing and/or linking strategies and fragment hybridization strategies. It is estimated that
the search space of drug-like molecules is about 10^60
compounds, which makes it difficult to
evaluate all solutions in a reasonable computational time. Thus, the realistic function of de
novo drug design is to efficiently find good solutions rather than find the best solution. The
optimization algorithms can improve the efficiency of sampling the huge chemical space of drug-like
molecules and guide the search to appropriate regions where candidate molecules can be
found. Natural computing algorithms including particle swarm optimization (PSO), evolutionary
algorithms (e.g., genetic algorithm, evolution strategy),
137
and ant colony optimization
(ACO)
138
have been widely used in de novo drug design.
139
Such nature-inspired techniques are
useful for searching very large chemical space to converge on chemically meaningful optima.
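How an evolutionary algorithm searches for good rather than provably best solutions can be shown with a toy example. Everything in the sketch is an assumption made for illustration: molecules are reduced to three fragment slots, the fitness is an additive made-up score, and the operators (truncation selection, one-point crossover, point mutation) are generic choices rather than those of any tool cited here.

```python
import random

def evolve(fragments, fitness, n_slots=3, pop_size=20, generations=30,
           mutation_rate=0.2, seed=0):
    """Toy evolutionary search over fixed-length tuples of fragment names."""
    rng = random.Random(seed)
    population = [tuple(rng.choice(fragments) for _ in range(n_slots))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]        # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_slots)          # one-point crossover
            child = list(mother[:cut] + father[cut:])
            for i in range(n_slots):                 # point mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.choice(fragments)
            children.append(tuple(child))
        population = parents + children
    return max(population, key=fitness)

# Hypothetical per-fragment scores; a real fitness would be a docking or similarity score.
toy_scores = {"amide": 3.0, "phenyl": 2.0, "methyl": 0.5, "nitro": -1.0}

def toy_fitness(molecule):
    return sum(toy_scores[name] for name in molecule)

best = evolve(list(toy_scores), toy_fitness)
```

Only a small fraction of candidates is ever scored, which is the point of such heuristics when the search space cannot be enumerated.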
1. Fragment-Based Growing and Linking Strategies
For the growth approaches, a predefined binding conformation between a seed (fixed scaffold) and the
binding pocket of the receptor is necessary. The binding complex can be obtained from molecu-
lar docking or experimental techniques. Then, the seed grows fragment by fragment to comple-
ment the active site geometrically and energetically. At each step of fragment growth, scoring
functions are used to accept or reject the modications. Early methods of this type include:
SmoG,
140, 141
GrowMol,
142
GroupBuild,
143
SPROUT,
144
and GROW.
145
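The accept-or-reject logic of fragment growth can be sketched as a greedy loop. The scoring function below is a hypothetical stand-in for a docking score (lower is better) with a made-up size penalty, and the fragment names are placeholders; real growth tools also handle attachment geometry and conformations, which are omitted here.

```python
def grow(seed, fragments, score, max_steps=5):
    """Greedy growth: add the best-scoring fragment while the score keeps improving."""
    molecule = [seed]
    best = score(molecule)
    for _ in range(max_steps):
        candidates = [(score(molecule + [frag]), frag) for frag in fragments]
        new_score, fragment = min(candidates)
        if new_score >= best:      # reject: no candidate improves the score
            break
        molecule.append(fragment)  # accept the modification
        best = new_score
    return molecule, best

# Hypothetical energy contributions (kcal/mol); each fragment type counts once.
toy_contrib = {"seed": -3.0, "amide": -2.0, "phenyl": -1.5, "methyl": -0.2}

def toy_score(molecule):
    interaction = sum(toy_contrib[name] for name in set(molecule))
    return interaction + 0.5 * len(molecule)  # penalty for every added fragment

grown, final_score = grow("seed", ["amide", "phenyl", "methyl"], toy_score)
```

In this toy run the methyl group is never added because its small contribution does not outweigh the size penalty, which mirrors how a scoring function gates each growth step.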
Table II. Important Features of the De Novo Design Methods Reported from 2006 to 2011.

Method: Fragment hopping
Building blocks: Five general-purpose libraries: the basic fragment library, the bioisostere library, the rules for metabolic stability, the toxicophore library, and the side chain library
Assembly rules: Fragment linking by LUDI
Search procedure: Mapping fragments on the minimal pharmacophoric elements
Fitness functions: Consensus docking scoring, ADMET filters

Method: AutoGrow
Building blocks: Filtering the ZINC database by atom numbers
Assembly rules: Fragment-based growing strategies
Search procedure: Evolutionary algorithms
Fitness functions: AutoDock scores

Method: COLIBREE
Building blocks: Pseudoretrosynthetic fragmentation of six compound libraries
Assembly rules: A fixed build-up scheme of scaffold, linkers, and building blocks
Search procedure: A discrete version of particle swarm optimization (PSO)
Fitness functions: Using CATS topological pharmacophore similarity to reference ligands as fitness function

Method: FlexNovo
Building blocks: Large fragment spaces with up to several thousand fragments
Assembly rules: Sequential growth strategy, consider synthetic possibility by well-defined connection rules
Search procedure: Incremental construction algorithm within FlexX
Fitness functions: Scoring functions from FlexX, physicochemical property filters, pose geometry, and diversity filter

Method: Flux
Building blocks: Virtual retrosynthesis of the COBRA dataset by RECAP
Assembly rules: A restricted set of reaction schemes
Search procedure: Evolutionary algorithms
Fitness functions: Ligand-based similarity scoring using reference compound

Method: FOG
Building blocks: Collection of connectivity statistics for fragments of interest from a database of small molecules
Assembly rules: The sequential growth of small molecules constrained by the transition probabilities of the growth fragment
Search procedure: Statistically biasing the growth of molecules with desired features
Fitness functions: A linear scoring algorithm (TopClass)

Method: Fragment shuffling
Building blocks: Alignment of the protein-ligand complexes, ligand fragmentation, and calculation of fragment score
Assembly rules: Incremental construction of novel ligands
Search procedure: A tree search algorithm
Fitness functions: QXP scores together with the overall fragment scores

Method: GANDI
Building blocks: Fragment library from Molinspiration Cheminformatics
Assembly rules: Linking predocked fragments by SEED
Search procedure: A genetic algorithm and a tabu search
Fitness functions: A linear combination of three scoring functions including force field energy and 2D or 3D similarity
Table II. Continued

Method: MED-Hybridise
Building blocks: Using MED-SuMo-Fragmentor to define 3D protein-fragment patterns called MED-Portion
Assembly rules: Recombining chemical moieties from MED-Portions into putative ligand molecules
Search procedure: Query MED-SuMo database
Fitness functions: MED-SuMo hit descriptors, bioinformatic descriptors, and physicochemical properties; target-specific filters for hybridized molecules

Method: MEGA
Building blocks: A substructure mining tool including frequent subgraph mining and RECAP rule
Assembly rules: Growing strategy
Search procedure: Combines multiobjective evolutionary techniques with graph theory
Fitness functions: Binding affinity scorers, molecular similarity scorers, and chemical structure scorers

Method: SQUIRRELnovo
Building blocks: Multiconformer database (17,934 fragments) by RECAP-based fragmentation of COBRA collection
Assembly rules: Bioisosteric replacements
Search procedure: Alignment of fragments to a reference compound by graph matching algorithm
Fitness functions: LIQUID fuzzy pharmacophore function

Method: Hecht's method
Building blocks: Fragmentation of literature libraries into scaffolds and R-group fragments
Assembly rules: Fragment linking and docking
Search procedure: Evolved fragment assembly
Fitness functions: Binding affinity score and artificial neural network

Method: Recore
Building blocks: Fragmentation of CSD database by RECAP rules and filtered by various criteria
Assembly rules: Scaffold replacement
Search procedure: R-tree index and k-nearest-neighbor search
Fitness functions: Geometric ranking

Method: EAISFD
Building blocks: A fragment library with over 1300 fragments extracted from MDDR
Assembly rules: Scaffold hopping or substructure optimization
Search procedure: Evolutionary algorithm
Fitness functions: Surflex-Dock score

Method: LigBuilder 2.0
Building blocks: Extraction from WDI database
Assembly rules: Growing and linking
Search procedure: Genetic algorithm
Fitness functions: Binding affinity prediction, physicochemical properties evaluation, lock-key match evaluation, synthesizability prediction
Table II. Continued

Method: NovoFLAP
Building blocks: A fragment library with over 1300 fragments extracted from MDDR
Assembly rules: Modification or hopping of the starting structure
Search procedure: Evolutionary algorithm (EA-Inventor)
Fitness functions: Ligand-based scoring function: flexible ligand alignment protocol (FLAP)

Method: PROTOBUILD
Building blocks: 594 fragments derived from decomposition of Thomson Pharma database
Assembly rules: Fragment growth in the binding site according to a set of fragment-fragment interconnectivity rules
Search procedure: Genetic algorithm
Fitness functions: PROTOSCORE that includes improved terms for the estimation of entropy plus terms for various nonbonded interactions

Method: PhDD
Building blocks: Eight types of fragment databases (corresponding to eight popular pharmacophore features) by fragmentation of MDDR and CMC
Assembly rules: Linking fragments that are fitted to pharmacophore model
Search procedure: Alignment of fragments to the pharmacophore hypothesis
Fitness functions: Fitness score to the pharmacophore hypothesis, assessment of drug likeness, and synthetic accessibility
Fragment-based growing strategies are computationally efficient and the generated
molecules always have good chemical diversity. Moreover, the growth methods always com-
bine docking software for binding pose prediction and scoring. For example, the incremental
construction algorithm of docking software FlexX
146
was used to develop the new de novo de-
sign algorithm FlexNovo.
147
FlexNovo works with the fragments directly and deals with large
fragment spaces (several thousand fragments). Moreover, various filters including physicochemical
properties, diversity, and placement geometry have been incorporated into the fragment
build-up process. More recently, FlexNovo was implemented into a comprehensive project
named NovoBench.
148
The NovoBench project integrates several tools, such as generation
of fragment space (Colibri
149
and FragView), structure-based (FlexNovo
147
), property-based
(FragEnum
150
), and ligand-based (Ftrees-FS
64
) search algorithms to meet the various demands
of de novo design.
In many growing strategies, the binding mode or pose of the seed fragment is assumed
to be fixed upon fragment growth. However, this assumption is often invalid.
Durrant et al. developed AutoGrow that combines elements of fragment-growing, docking,
and evolutionary algorithm.
151
In AutoGrow, each new compound generated during fragment
addition is dynamically redocked into the binding pocket by AutoDock
152
to generate new
poses for each molecule. An evolutionary algorithm is used to explore new chemical space by
evaluating the docking scores of every population member and selecting the best molecules for the
subsequent generation.
Another method, FOG, also grows molecules by adding fragments to a nascent molecule.
153
The novelty of FOG lies in that the growth of molecules depends on the frequency of specic
fragment-fragment connections by mining a specific molecular database. In addition, FOG can
be trained to grow new molecules with chemical and topological features similar to a desired
class of compounds (e.g. natural products and drugs) by the Topology Classier (TopClass)
algorithm.
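Growth biased by fragment-fragment connection frequencies can be pictured as sampling from transition probabilities. The sketch below is a deliberate simplification and not the FOG implementation: molecules are treated as linear chains of named fragments, the small database is hypothetical, and the TopClass classifier is not modeled.

```python
import random
from collections import Counter, defaultdict

def transition_probabilities(molecules):
    """Estimate P(next fragment | current fragment) from connection counts."""
    counts = defaultdict(Counter)
    for fragments in molecules:
        for current, following in zip(fragments, fragments[1:]):
            counts[current][following] += 1
    return {frag: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for frag, c in counts.items()}

def grow_molecule(start, probabilities, max_fragments, rng):
    """Add fragments one at a time, sampling each from the learned probabilities."""
    molecule = [start]
    while len(molecule) < max_fragments and molecule[-1] in probabilities:
        options = probabilities[molecule[-1]]
        choice = rng.choices(list(options), weights=list(options.values()))[0]
        molecule.append(choice)
    return molecule

# Hypothetical database of molecules written as linear fragment chains.
toy_database = [
    ["phenyl", "amide", "piperidine"],
    ["phenyl", "amide", "methyl"],
    ["phenyl", "ether", "methyl"],
    ["pyridine", "amide", "piperidine"],
]
probs = transition_probabilities(toy_database)
new_molecule = grow_molecule("phenyl", probs, max_fragments=3, rng=random.Random(0))
```

Because only connections observed in the database have nonzero probability, every grown molecule is built from fragment pairings that already occur in known compounds.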
Fragment-based linking strategies map fragments onto the key regions in the binding
pocket to determine various energetically favorable positions and then link them together to
build new molecules. The above-mentioned active site mapping approaches, such as GRID
and MCSS, are commonly used to position seed fragments or functional groups to the correct
locations in the binding pocket. The link concept has been widely used in a number of de
novo design methods including CONCERTS,
154
LUDI,
155, 156
CAVEAT,
157
NEWLEAD,
158
DLD,
159
BUILDER,
160
and SKELGEN.
161
These methods can search large chemical space
using a relatively small number of initial fragments. The resulting molecules are expected to
bind more tightly with the target than the individual fragment. However, fragment linking might
lead to a change of the overall conformation of the generated molecules, and key interactions
between the initial fragment and target might be lost. Thus, redocking the new ligands might
be necessary in the postprocessing step.
GANDI is a new de novo design tool for automatically linking predocked fragments
with a user-defined fragment library.
162
GANDI uses SEED
163
for fragment docking and
its optimization procedure combines a genetic algorithm and a tabu search. An important
feature of GANDI is its multiobjective evolutionary optimization strategy that simultaneously
optimizes the force field energy and a 3D-overlap term to known binding modes or a 2D-similarity
term to known inhibitors. Thus, GANDI can be both structure-based and ligand-based
according to the user's need. In addition, GANDI is free to academic users, which
provides more opportunities to validate the method.
Ji et al. proposed fragment hopping as a new fragment-based tool for de novo inhibitor
design.
164, 165
Fragment hopping is a pharmacophore-driven strategy focusing on isozyme se-
lectivity and ligand diversity. The derivation of the minimal pharmacophoric element for each
pharmacophore is the key point of this approach. The minimal pharmacophoric element can
be an atom, a cluster of atoms, a virtual graph, or vector(s), which can be derived from a
combinatorial application of different active site analysis and pharmacophore identification
methods. The novelty of minimal pharmacophoric elements lies in that they can map an impor-
tant interaction pattern between a ligand and hot spots for both isozyme selectivity and ligand
binding based on a priori knowledge and experience. Five fragment libraries are implemented
within fragment hopping. The basic fragment library and the bioisostere library are queried
to generate a focused fragment library with diverse structures that can match the requirements
of the minimal pharmacophoric elements. Then, the focused fragment library is filtered by the
rules for metabolic stability and the toxicophore library. The binding positions of the resulting
fragments to each pharmacophore are searched by LUDI and the MCSS program. Finally, the
desired molecules are generated by linking these fragments using the side chain library. The
evaluation process includes docking, consensus scoring, and ADMET filters. Moreover, this
new de novo design methodology is an open and interactive system that can be adapted to the medicinal
chemist's requirements for a specific research project. The application of fragment hopping to
de novo design of highly potent and selective neuronal nitric oxide synthase (nNOS) inhibitors
will be introduced in the section of case studies.
More recently, Lai's group developed LigBuilder 2.0,
166
which is an improved version of
their previously reported method LigBuilder 1.0.
167
LigBuilder uses a genetic algorithm to
construct ligands iteratively by fragment linking or growing. Compared with LigBuilder 1.0,
the new version takes synthetic accessibility into account by an embedded chemical reaction
database and a retrosynthesis analyzer (SYLVIA).
168
Moreover, an accurate cavity detection
program (Cavity 1.0) and the Drug Space Exploring Algorithm were incorporated into the
design process. Various filters including binding affinity evaluation, physicochemical properties
evaluation, and lock-key match evaluation are used to find the best molecules.
2. Molecular Hybridization and Scaffold Hopping
Molecular hybridization is a common design strategy in medicinal chemistry.[169, 170] The
core idea of molecular hybridization is to combine pharmacophoric fragments from different
bioactive compounds to generate a new hybrid molecule with improved biological or
physicochemical properties. This concept has been used to develop new de novo design
algorithms. BREED is the first automated method that produces novel molecules from structures
of different known ligands targeting a common receptor.[171] This method takes advantage
of the structural information of known ligand-target complexes, aligns the 3D coordinates
of two ligands, and recombines fragments at overlapping bonds to generate hybrid
molecules.
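The recombination step can be illustrated with a small sketch. This is not the BREED implementation: it assumes two ligands that are already aligned, represented by hypothetical atom-coordinate dictionaries and bond lists, and it treats two bonds as overlapping when both pairs of end atoms lie within a distance tolerance (the 1.0 Å default is an assumption chosen for illustration). Each overlapping, non-ring bond yields two hybrids by swapping the halves on either side of the cut.

```python
import math
from collections import deque

def side_atoms(bonds, start, cut):
    """Atoms reachable from `start` once the bond `cut` is removed."""
    blocked = frozenset(cut)
    seen, queue = {start}, deque([start])
    while queue:
        atom = queue.popleft()
        for u, v in bonds:
            if frozenset((u, v)) == blocked or atom not in (u, v):
                continue
            other = v if atom == u else u
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return frozenset(seen)

def breed_like_hybrids(coords_a, bonds_a, coords_b, bonds_b, tol=1.0):
    """List (atoms kept from ligand A, atoms kept from ligand B) for each hybrid."""
    hybrids = []
    for a1, a2 in bonds_a:
        for b1, b2 in bonds_b:
            for c1, c2 in ((b1, b2), (b2, b1)):  # try both bond orientations
                if (math.dist(coords_a[a1], coords_b[c1]) > tol
                        or math.dist(coords_a[a2], coords_b[c2]) > tol):
                    continue
                a_left = side_atoms(bonds_a, a1, (a1, a2))
                b_left = side_atoms(bonds_b, c1, (c1, c2))
                if a2 in a_left or c2 in b_left:
                    continue  # ring bond: cutting it does not split the ligand
                a_right = side_atoms(bonds_a, a2, (a1, a2))
                b_right = side_atoms(bonds_b, c2, (c1, c2))
                hybrids.append((a_left, b_right))
                hybrids.append((a_right, b_left))
    return hybrids

# Two hypothetical, pre-aligned four-atom chains offset by 0.3 Angstrom.
coords_a = {f"A{i}": (1.5 * i, 0.0, 0.0) for i in range(4)}
coords_b = {f"B{i}": (1.5 * i, 0.3, 0.0) for i in range(4)}
bonds_a = [("A0", "A1"), ("A1", "A2"), ("A2", "A3")]
bonds_b = [("B0", "B1"), ("B1", "B2"), ("B2", "B3")]
hybrids = breed_like_hybrids(coords_a, bonds_a, coords_b, bonds_b)
```

In this toy input every bond of one chain overlaps exactly one bond of the other, so each of the three matched bonds contributes two hybrids. A practical implementation would also need to check bond order and bond direction, which are omitted here.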
Inspired by the BREED method, Wang's group developed an automatic method named automatic
tailoring and transplanting (AutoT&T) that can effectively utilize the results of virtual
screening in fragment-based lead optimization.[172] AutoT&T identifies suitable fragments
from virtual screening hits and then transplants them onto a predefined lead compound
to generate new ligand molecules with improved binding affinities. Compared with
conventional de novo design methods, AutoT&T has several advantages. First, it detects fragments
directly from other organic molecules and does not rely on a predefined building block library.
The input molecule databases are flexible, with no limitations in terms of size or type.
Second, AutoT&T is more efficient and does not suffer from combinatorial explosion.
It performs structural transplantation on the basis of the matched bonds between the lead
compound and each given steak molecule without adopting a sequential build-up approach.
Third, synthetic feasibility and drug-likeness are taken into account during the
invention of new molecules.
The mix-and-match idea of BREED was also used to develop several new algorithms,
such as MED-Hybridise and fragment shuffling. Fragment shuffling differs from BREED in
that it can hybridize multiple ligands within one iteration step and includes
fragment scores to guide the incremental construction of the new ligands.[173] MED-Hybridise
is a computational drug design toolkit at PDB scale that combines the local similarity of
protein surfaces with a fragment-based approach.[174] MED-Hybridise takes advantage of the
rich structural information of target-ligand complexes in the PDB database. An important step
in MED-Hybridise is to define the MED-Portions, which are the 3D protein-fragment patterns
obtained from mining all available protein-ligand crystal complexes within a library of small
molecules. For any binding surface query, matched MED-Portions can be retrieved using
MED-SuMo[175, 176] to superimpose similar protein interaction surfaces. The resulting
MED-Portion chemical moieties are collected and used to generate new 3D hybrid molecules. In a
retrospective validation study, MED-Hybridise successfully retrieved scaffolds of known
active compounds for a GPCR target (β2-adrenergic receptor) and a protein kinase target
(vascular endothelial growth factor receptor 2, VEGFR-2).
The concept of scaffold hopping[177] is similar to that of molecular hybridization. The
core idea of scaffold hopping is to replace a central element of the molecular scaffold with
a new molecular fragment. There are several computational tools for scaffold hopping.[178]
Recore is a fast and effective approach for scaffold replacement that uses 3D fragments
as queries and can search pharmacophore-type features.[179] Moreover, Recore incorporates
k-nearest-neighbor searches and a voting system to enable the exploration of large search
spaces.
3. Ligand-Based Methods
Although most de novo design tools are structure-based, such methods face several major
challenges, such as the accuracy of the scoring functions and the flexibility of the
receptor. Moreover, it remains a challenge to solve the crystal structures of many
membrane-bound drug targets (e.g. GPCRs). In this case, ligand-based methods provide an alternative
for de novo drug design. Such approaches do not rely on the 3D structure of the drug target;
instead, their design process is guided by maximizing the similarity between the generated
molecules and the known active compounds.
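As a rough illustration of similarity-guided scoring, the sketch below ranks candidate molecules by their best Tanimoto coefficient against a set of known actives. The fingerprints are assumed to be precomputed sets of "on" bit indices, and the bit values and molecule names are invented for illustration; the tools discussed in this section use their own descriptors and metrics.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

def similarity_fitness(candidate_fp, reference_fps):
    """Fitness of a candidate: highest similarity to any known active."""
    return max(tanimoto(candidate_fp, ref) for ref in reference_fps)

def rank_candidates(candidates, reference_fps):
    """Return candidate names sorted from most to least similar to the actives."""
    return sorted(
        candidates,
        key=lambda name: similarity_fitness(candidates[name], reference_fps),
        reverse=True,
    )

# Hypothetical fingerprints for two known actives and three generated molecules.
actives = [{1, 2, 3, 4}, {2, 3, 5}]
generated = {"cand_1": {1, 2, 3}, "cand_2": {7, 8}, "cand_3": {2, 3, 5, 6, 9}}
ranking = rank_candidates(generated, actives)
```

Taking the maximum over the reference set rewards a candidate for resembling any one active; averaging over the set would instead favor candidates that resemble all of them.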
Ligand-based methods have become an active research area in recent years. Schneider's group
has proposed three promising approaches (i.e. COLIBREE, Flux, and SQUIRREL).[62, 180]
COLIBREE uses a fixed build-up strategy by adding various building blocks and linkers to a
predefined molecular scaffold to generate a focused combinatorial library.[181] The
optimization procedure is guided by a stochastic optimization algorithm, PSO,[180] with CATS
topological pharmacophore similarity[177] to reference ligands as a scoring function. Flux
assembles molecules by a restricted set of reaction schemes, and the chemical synthesis of the
constructed molecules is eased by using molecular building blocks obtained from the RECAP
principle.[182, 183] A stochastic search algorithm is implemented in Flux with similarity-based
descriptors and metrics as fitness functions. SQUIRREL is a new algorithm to compare both
molecular shape and potential pharmacophore features.[184] It was used to develop a
ligand-based de novo design tool that can suggest bioisosteric replacement groups for a reference
compound.[185]
Tripos's EA-Inventor is a generic structure invention engine based on an evolutionary
algorithm.[186] It operates on the connection tables of an initial population of structures, and
the evolutionary process of structure invention can be driven by any user-defined scoring
function (binding affinity, pharmacophores, similarity, or other desired properties). EA-Inventor
has been used in several structure-based or ligand-based de novo design approaches.[187-190]
Liu et al. reported a structure-based method named EAISFD,[189] which combines EA-Inventor for
structure evolution with Surflex-Dock[191, 192] for docking and scoring. EAISFD introduced the
Tagged Fragment (TF) strategy for the multiobjective build-up process. A TF can be either a
fragment (substructure) of the ligand or a new fragment attached to the ligand that serves to
anchor key binding interactions. Thus, the TF strategies can be used for partial or full drug
design, such as scaffold hopping, substructure optimization, and structure extension. EAISFD
has since been developed into a commercial product, Muse (http://www.tripos.com),
for de novo drug design. More recently, researchers from Tripos reported NovoFLAP, a
ligand-based approach that combines EA-Inventor with a powerful scoring function, the Flexible
Ligand Alignment Protocol (FLAP). FLAP uses both molecular shape and pharmacophoric
features in a multiconformational context.[187]
Pharmacophore hypotheses can also be useful in de novo drug design. Fragments that fit in
different parts of a pharmacophore model can be linked together by various spacers to
generate novel structures. NEWLEAD is the first pharmacophore-based de novo design method.[158]
However, it can only process pharmacophoric functional groups rather than abstract
chemical features such as hydrogen bond donors and acceptors, and hydrophobic features. Yang's
group made important improvements to pharmacophore-based de novo design. Their new
method, PhDD, is able to work with abstract pharmacophore models and is implemented with
comprehensive evaluators including drug likeness, bioactivity, and synthetic accessibility.[193]
A research group from Eli Lilly reported a fragment-based method for the de novo design of
kinase inhibitors.[194] Fragmentation of existing kinase inhibitors was used to generate building
blocks that were subsequently recombined to create de novo chemical libraries. The libraries
were driven by a general kinase pharmacophore model, and a support vector machine based
method (SVMFP)[195] was used to predict combinations of fragments. The overall hit rate of the
pharmacophore-driven de novo library was very high (92%), which highlights the superiority
of this fragment-based strategy over virtual screening or structure-based minor modifications
of existing inhibitors.
C. Scoring Functions and Multiobjective Optimization
Scoring functions are crucial to guide the optimization process and to evaluate the binding
affinity or chemical similarity of the designed molecules. During the process of sampling
chemical space, a huge number of candidate structures must be evaluated by a scoring function.
Thus, scoring functions are required to be fast, but this feature compromises their accuracy.
Scoring functions from molecular docking are popular in structure-based de novo methods.[196]
These scorers are adept at discriminating between inactive and active compounds, but their
ability to rank ligands with similar chemotypes is relatively poor. Therefore, the application
of docking-based scoring functions in de novo design is less successful than in virtual
screening. On the other hand, comprehensive physics-based approaches, such as
MM-GBSA/PBSA,[197, 198] free-energy perturbation,[199, 200] single-step perturbation,[201]
GCMC simulations,[113] and thermodynamic integration,[202] can yield very accurate binding
free energies. However, these methods are computationally expensive and thus are not suitable
for the search process of de novo design. Even so, they are very helpful for improving the
success rate of de novo design by re-evaluating the final molecules after postprocessing and
identifying the best ones for chemical synthesis and biological testing. The availability of
computational resources (e.g. cloud computing) is increasing dramatically, which will enable
a wider use of physics-based scoring functions in
de novo design.
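The division of labor described above, a fast scorer during the search and an expensive one for the final candidates, can be sketched as a two-stage selection. The function and the score tables below are hypothetical placeholders: in practice the fast scorer would be a docking score and the accurate scorer a physics-based free-energy estimate.

```python
def two_stage_selection(candidates, fast_score, accurate_score, keep=3):
    """Rank all candidates with a cheap score, then re-rank only the best
    `keep` with an expensive one. Lower scores are better, as for binding
    free energies."""
    shortlist = sorted(candidates, key=fast_score)[:keep]
    return sorted(shortlist, key=accurate_score)

# Made-up scores (kcal/mol) standing in for real scoring calculations.
docking = {"m1": -7.0, "m2": -9.0, "m3": -8.5, "m4": -6.0, "m5": -8.0}
free_energy = {"m1": -10.0, "m2": -7.5, "m3": -9.2, "m4": -11.0, "m5": -8.1}

expensive_calls = []

def accurate(name):
    expensive_calls.append(name)  # record how often the costly scorer runs
    return free_energy[name]

final_ranking = two_stage_selection(docking, docking.get, accurate, keep=3)
```

Only three of the five molecules reach the expensive scorer. The example also shows the price of the shortcut: `m4` has the best made-up free energy but is discarded by the fast scorer and never re-evaluated.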
Computational intelligence, such as machine learning approaches, has been used in drug
design for the automatic selection of important features and the optimization of models.[203]
In combination with traditional scoring functions, computational intelligence tools can quickly
and efficiently search diversity space for good solutions. Hecht's method uses an evolved
fragment assembly algorithm for directed searches of novel leads.[204] The novelty of the method
lies in its use of a computational intelligence screening tool for compound selection. The
screening tool integrates evolved artificial neural nets, docking software, and QSAR and
QSPR models.
Early de novo design approaches were created to satisfy a single objective, and most of
them focused on the interaction scores with the binding pocket. However, these methods
ignore the multiobjective nature of drug discovery and development. Thus, applying
multiobjective optimization strategies in the design workflow helps to improve the drug-like
behavior of the generated molecules. Moreover, multiobjective optimization methods can avoid
local optima and dead ends corresponding to a single objective and lead to a more efficient
search process. MEGA is a new de novo design algorithm for multiobjective optimization
that combines graph theory with evolutionary techniques to perform an efficient global search
for promising solutions.[205] Three kinds of fitness functions, namely binding affinity scorers,
molecular similarity scorers, and chemical structure scorers, are used to guide the optimization
process. Therefore, structurally diverse molecules with good binding energy and drug-like
properties can be designed by MEGA.
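One common way to handle several objectives at once is to keep the Pareto-optimal (non-dominated) candidates rather than collapsing all scores into one number. The sketch below illustrates that general idea only; it is not the MEGA algorithm, and the molecule names and score tuples (affinity, similarity, structure; higher is better) are invented.

```python
def dominates(a, b):
    """True if score tuple `a` is at least as good as `b` in every objective
    and strictly better in at least one (higher = better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scored):
    """Names of the non-dominated molecules in a {name: scores} mapping."""
    return [
        name
        for name, scores in scored.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in scored.items()
            if other != name
        )
    ]

# Hypothetical normalized scores: (binding affinity, similarity, structure).
population = {
    "mol_a": (0.9, 0.2, 0.5),
    "mol_b": (0.6, 0.8, 0.6),
    "mol_c": (0.5, 0.7, 0.5),
    "mol_d": (0.9, 0.2, 0.4),
}
front = pareto_front(population)
```

Here `mol_c` is dominated by `mol_b` and `mol_d` by `mol_a`, so only `mol_a` and `mol_b` survive. Neither survivor beats the other on every objective, which is why a multiobjective search retains both instead of committing to a single optimum.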
D. Synthetic Accessibility and Drug Likeness
Synthetic accessibility is one of the key issues that remain to be addressed in de novo design.
Two types of approaches have been reported to improve the synthetic accessibility of
computer-designed structures. The first approach uses connection rules and synthesizable building
blocks to construct new molecules. Synthesizable building blocks can be obtained either from
decomposition of compound databases by virtual retrosynthesis rules or from commercially
available compounds. The connection rules are mainly derived from organic synthesis
reactions.[206, 207] As mentioned above, RECAP[59] rules have been commonly used to disassemble
and reconstruct synthetically feasible molecules (e.g. TOPAS[206] and Flux). The advantage of
using RECAP rules in de novo design is that the chemical environment around a new bond is
similar to that in drug-like and synthesizable compounds. Compounds created from such rules are
more synthetically accessible. The limitations of RECAP rules are that they are crude
abstractions of actual chemical reactions and cover only a small number of reaction types.
Moreover, compounds constructed from RECAP rules are only potentially synthesizable, and no
scores or synthetic routes can be suggested. Other methods, such as SYNOPSIS,[207] use known
chemical reactions to form bonds between readily available building blocks. A total of 70
selected organic reactions were implemented in SYNOPSIS, and the building blocks are taken
from the ACD[208] database.
The second approach is complementary to the first one and uses additional scoring functions
to evaluate the synthesizability of the generated candidates in the postprocessing step. For
example, FOG generates synthetically tractable molecules by use of the software[168]
SYLVIA.[153] Other computer-aided organic synthesis design methods, such as Route
Designer,[209] can suggest synthetic routes for experimental validation studies of de novo
design. More recently, new cheminformatics tools (e.g. reaction vectors[210] and
Reaction-MQL[211]) for the representation and encoding of organic reactions have appeared
and should contribute to the de novo design of molecules with good synthetic accessibility.
A new algorithm named DOGS is under development by Schneider's group.[62, 212] DOGS is based
on a series of known reactions selected from the literature and can propose at least one
possible synthetic route for each designed compound. Other methods
for synthetic evaluation are reviewed by Kutchukian et al.[213]
Drug likeness is another important constraint in de novo design. As described above, the
consideration of drug likeness has been incorporated into every stage of de novo drug design.
The fragment libraries for molecular invention are often obtained from the decomposition of a
drug or drug-like database, based on the assumption that molecules built from drug-like
building blocks are more likely to possess drug-like properties. An efficient search of drug-like
chemical space is also necessary to invent high-quality molecules. In the postprocessing stage,
Lipinski's rule of five[214] is used as a popular filter for prioritizing candidates before
synthesis and biological evaluation. Moreover, recent progress in computational ADMET
prediction strategies[215] can also be applied to de novo drug design, which helps to re-evaluate
the candidate molecules in a cost-effective manner.
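A postprocessing filter based on the rule of five can be sketched as follows. The thresholds are the widely used ones (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors). The property values are assumed to be precomputed by a cheminformatics toolkit, the two candidates are invented, and allowing one violation is a common convention adopted here as an assumption.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    mol_weight: float  # molecular weight in Da
    logp: float        # calculated octanol-water partition coefficient
    h_donors: int      # hydrogen bond donors
    h_acceptors: int   # hydrogen bond acceptors

def lipinski_violations(c):
    """Count how many of the four rule-of-five criteria are violated."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])

def passes_rule_of_five(c, max_violations=1):
    """Accept a candidate with at most `max_violations` violations."""
    return lipinski_violations(c) <= max_violations

# Hypothetical de novo candidates with precomputed properties.
designs = [
    Candidate("design_1", 350.4, 2.5, 2, 5),
    Candidate("design_2", 650.8, 6.2, 3, 8),
]
kept = [c.name for c in designs if passes_rule_of_five(c)]
```

Because the filter works on four scalar properties, it is cheap enough to apply to every generated structure before the more expensive ADMET predictions are run.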
E. Recent Examples of Fragment-Based De Novo Design
Although de novo design tools are far from perfect and rarely generate ligands with nanomolar
activity, they can be viewed as idea generators that provide novel chemotypes for medicinal
chemists.[196] Integrating de novo design software with the expert knowledge of medicinal
chemists is likely still necessary to find high-quality hits or leads. Table III summarizes
recent examples of de novo design.[216-233] In combination with our own work in this field, four
successful examples will be discussed in detail.
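Table III reports a ligand efficiency (LE) value for most examples. Assuming the common definition, LE is the binding free energy per heavy atom, LE = -ΔG / N_heavy with ΔG = RT ln K. The sketch below implements that definition; the affinity and heavy-atom count in the example are hypothetical, and using an IC50 in place of a true dissociation constant is an approximation.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def ligand_efficiency(affinity_molar, heavy_atoms, temperature=298.15):
    """Ligand efficiency in kcal/mol per heavy atom.

    `affinity_molar` is a K_D, K_i, or (as an approximation) IC50 in mol/L.
    """
    delta_g = R_KCAL * temperature * math.log(affinity_molar)  # negative below 1 M
    return -delta_g / heavy_atoms

# Hypothetical ligand: 1 micromolar affinity, 25 heavy atoms.
le_example = ligand_efficiency(1e-6, 25)
```

For a fixed number of heavy atoms a more potent compound has a higher LE, and for a fixed potency a smaller compound has a higher LE, which is why the metric is used to compare hits of different sizes.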
1. Case 1: De Novo Design of Selective nNOS Inhibitors
Nitric oxide synthases (NOS) represent a family of enzymes that produce nitric oxide (NO).
There are three isozymic forms of NOS: endothelial (eNOS), macrophage or inducible (iNOS),
and neuronal (nNOS). Among them, nNOS is an important target for various neurodegenerative
disorders.[234] However, structure-based design of isoform-selective inhibitors is a difficult
and challenging task because the active sites are nearly identical for all three NOS
isoforms.[235-238] Ji et al. reported successful examples of the de novo design of selective
nNOS inhibitors on the basis of the minimal pharmacophoric elements and fragment
hopping.[164] The active site of NOS was investigated by two different methods (i.e. GRID and
MCSS), and the obtained information combined with previous SAR results led to the generation
of the minimal pharmacophoric
Table III. Selected Examples of Fragment-Based De Novo Design from 2006 to 2011
(Chemical structure drawings are not reproduced.)

Date | Target | Method | Activity | LE | Ref
2006 | Peroxisome proliferator activated receptor (PPAR) | LeapFrog | KD = 6.86 μM | 0.16 | 222
2006 | FK506-binding protein (FKBP) | LUDI / Project Library | Half-maximal neuroprotective effect concentration at about 10 pM | - | 233
2006 | Estrogen receptor (ER) | SkelGen | IC50 = 0.34 μM | 0.38 | 223
2006 | Plasmodium falciparum dihydroorotate dehydrogenase | SPROUT | IC50 = 42.6 μM | 0.17 | 225
2007 | HIV-1 reverse transcriptase (RT) | LUDI | IC50 = 3.5 μM | 0.42 | 226
2007 | Histamine H3 receptor | SkelGen | Ki = 0.3 nM | 0.31 | 267
2008 | Tat-TAR RNA | FLUX | Inhibits Tat-TAR interaction at 250 μM | - | 231
2008 | p38 MAP kinase | LigBuilder | IC50 = 83 nM (whole blood) | 0.15 | 219
2008 | Bacteria RNA polymerase (RNAP) | SPROUT | IC50 = 62 μM | 0.12 | 216
2008 | β-secretase (BACE1) | AlleGrow | IC50 = 1 nM | 0.20 | 221
2008 | T-type calcium channel | SPROUT | IC50 = 0.11 μM | 0.17 | 224
2009 | gp41 | LCT | IC50 = 31 μM | 0.13 | 228, 229
2009 | D-alanyl-D-lactate ligase (VanA) and D-alanyl-D-alanine ligase (DdlB) | SPROUT | 83% inhibition at 500 μM | - | 232
2009 | Two Cdc25 phosphatases (Cdc25A and Cdc25B) | LigBuilder | IC50 = 5.1 μM (Cdc25A); IC50 = 1.2 μM (Cdc25B) | LE1 = 0.22; LE2 = 0.25 | 230
2009 | HCV helicase | LigBuilder | IC50 = 0.26 μM | 0.27 | 227
2009 | Human cannabinoid-1 receptor | TOPAS | Ki = 3.3 nM | 0.26 | 217
2010 | β-secretase (BACE1) | AlleGrow | Ki = 1 nM | 0.18 | 220
2011 | Peroxisome proliferator activated receptors (PPARs) | PROTOBUILD | Active at 1 μM as dual PPAR agonists | - | 218
Figure 4. (A) Minimal pharmacophoric elements for selective nNOS inhibitor design. (B) Binding
mode of inhibitor 9 with nNOS; hydrogen bonds are displayed as red dashed lines. (C) Chemical
structures of two novel nNOS inhibitors derived by fragment-based de novo design. The
structural information was obtained from the Protein Data Bank (PDB code: 3B3N).
elements for nNOS. The minimal pharmacophoric elements identified for selective nNOS
inhibitor design included an amidino group, four nitrogen atoms, and two hydrophobic (or
steric) groups (Fig. 4A). The amidino group is positioned close to E592 of nNOS to form
charge-charge and hydrogen-bonding interactions. An sp3-hybridized nitrogen cation is placed
close to the selective region defined by D597, while the other three nitrogen atoms are near the
heme propionate to form charge-charge interactions and hydrogen bonds. The regions where
hydrophobic and/or steric interactions play important roles are the positions closest to D597
and the heme propionate. Using a design process of fragment hopping, a focused fragment
library was generated to match the minimal pharmacophoric elements, and the fragments were
linked by LUDI to build new molecules.
After evaluation, compound 9 (Fig. 4C) and its analogues were subjected to chemical
synthesis and inhibitory activity testing. Compound 9 showed potent inhibitory activity toward
nNOS (Ki = 388 nM). Moreover, it was also a highly selective nNOS inhibitor with 1100-fold
and 150-fold selectivity over eNOS and iNOS, respectively. The crystal structure of nNOS in
complex with compound 9 indicated that only the (3
S, 4
R, 4
R, 4