
Biotechnology

A problem approach
Fundamentals and Practice - I
Fifth edition


Pranav Kumar
Former faculty,
Department of Biotechnology,
Jamia Millia Islamia,
New Delhi, India

Praveen Verma
Scientist VI and Group Leader,
National Institute of Plant Genome Research (NIPGR),
New Delhi, India

Usha Mina
Senior Scientist,
CESCRA,
Indian Agricultural Research Institute (IARI),
New Delhi, India

Pathfinder Publication
New Delhi, India
Biotechnology: A problem approach, Fifth edition

ISBN: 978-93-80473-00-0 (paperback)

Copyright © 2017 by Pathfinder Publication, all rights reserved.

This book contains information obtained from authentic and highly regarded
sources. Reasonable efforts have been made to publish reliable data
and information, but the author and the publisher cannot assume responsibility
for the validity of all materials or for the consequences of their use.
No part of this book may be reproduced by any mechanical, photographic, or
electronic process, or in the form of a phonographic recording, nor may it be
stored in a retrieval system, transmitted, or otherwise copied for public or
private use, without written permission from the publisher.

Publisher : Pathfinder Publication


Production editor : Ajay Kumar
Copy editor : Jomesh Joseph
Illustration and layout : Pradeep Verma
Cover design : Monu
Marketing director : Arun Kumar
Production coordinator : Murari Kumar Singh
Printer : Ronit Enterprises, Uttar Pradesh, India

Pathfinder Publication
A unit of Pathfinder Academy Private Limited, New Delhi, India.
pathfinderpublication.in
Preface
The present century is often described as the century of biotechnology, because the field
has vast potential to produce an extensive range of valuable products. This branch of
science is regarded as vital for life, with numerous applications in several fields of human
endeavour. It is so significant for mankind that many of the major discoveries of the second
half of the last century and the early part of this century would not have been possible
without our accomplishments in this discipline. Biotechnology: A problem approach covers the
basic concepts, methodologies and applications of biotechnology. The book provides a balanced
introduction to all major areas of the subject. Chapters such as Biomolecules and catalysis,
Bioenergetics and metabolism, Cell structure and functions, Immunology, Genetics,
Bioinformatics and Bioprocess engineering were selected in a sharply focused manner, without
overwhelming or excessive detail. Sincere efforts have been made to support the textual
explanations with flow charts, figures and tables to make learning easy and convincing. The
chapters are supplemented with self-tests and questions so that readers can check their own
level of knowledge. This book has been designed to serve as a comprehensive biotechnology
textbook as well as a wide-ranging reference book.

Acknowledgements
Our students were the original inspiration for the first edition of this book, and we remain
continually grateful to all of them, because we learn from them how to think about the life
sciences and how to communicate knowledge in the most meaningful way. We thank Abhai Kumar,
Rizwan Ansari, Sarika Srivastava, Shashi Prakash Singh, Lekha Nath and Mr. Ajay Kumar,
reviewers of this book, whose comments and suggestions were invaluable in improving the
text. Any book of this kind requires meticulous and painstaking efforts by all its
contributors. Several diligent and hardworking minds have come together to bring out this
book in its complete form. This book is a team effort, and producing it would have been
impossible without the outstanding people of Pathfinder Publication. It was a pleasure to
work with many other dedicated and creative people, especially Pradeep Verma of Pathfinder
Publication, during the production of this book.

Pranav Kumar

Praveen Verma

Usha Mina

Contents
Chapter 1
Biomolecules and Catalysis
1.1 Amino acids and Proteins 1

1.1.1 Optical properties 3

1.1.2 Absolute configuration 4

1.1.3 Standard and non-standard amino acids 5

1.1.4 Titration of amino acids 7

1.1.5 Peptide and polypeptide 12

1.1.6 Peptide bond 13

1.1.7 Protein structure 16

1.1.8 Denaturation of proteins 20

1.1.9 Solubilities of proteins 21

1.1.10 Simple and conjugated proteins 22

1.2 Fibrous and globular proteins 22

1.2.1 Collagen 23

1.2.2 Elastin 24

1.2.3 Keratins 25

1.2.4 Myoglobin 25

1.2.5 Hemoglobin 27

1.2.6 Models for the behavior of allosteric proteins 32

1.3 Protein folding 33

1.3.1 Molecular chaperones 35

1.3.2 Amyloid 36

1.3.3 Ubiquitin mediated protein degradation 36

1.3.4 N–end rule 38

1.4 Protein sequencing and assays 39

1.5 Nucleic acids 48

1.5.1 Nucleotides 48

1.5.2 Chargaff’s rules 52

1.6 Structure of dsDNA 53

1.6.1 B-DNA 53

1.6.2 Z-DNA 55

1.6.3 Triplex DNA 55

1.6.4 G-quadruplex 56

1.6.5 Stability of the dsDNA helix 57

1.6.6 Thermal denaturation 57

1.6.7 Quantification of nucleic acids 59

1.6.8 Supercoiled forms of DNA 59

1.6.9 DNA: A genetic material 61

1.7 RNA 63

1.7.1 Alkali-catalyzed cleavage of RNA 64

1.7.2 RNA world hypothesis 65

1.7.3 RNA as genetic material 65

1.8 Carbohydrates 66

1.8.1 Monosaccharide 66

1.8.2 Epimers 68

1.8.3 Cyclic forms 68

1.8.4 Derivatives of monosaccharide 70

1.8.5 Disaccharides and glycosidic bond 71

1.8.6 Polysaccharides 73

1.8.7 Glycoproteins 76

1.8.8 Reducing and non-reducing sugar 76

1.9 Lipids 76

1.9.1 Fatty acids 77

1.9.2 Triacylglycerol and Wax 79

1.9.3 Phospholipids 80

1.9.4 Glycolipids 82

1.9.5 Steroid 83

1.9.6 Eicosanoid 83

1.9.7 Plasma lipoproteins 85

1.10 Vitamins 86

1.10.1 Water-soluble vitamins 86

1.10.2 Fat-soluble vitamins 90

1.11 Reactive oxygen species and antioxidant 93

1.12 Enzymes 93

1.12.1 Naming and classification of enzyme 94

1.12.2 How enzymes operate? 96

1.12.3 Enzyme kinetics 99

1.12.4 Enzyme inhibition 106

1.12.5 Regulatory enzymes 110

1.12.6 Isozymes 112

1.12.7 Zymogen 113

1.12.8 Ribozyme 114

1.12.9 Examples of enzymatic reactions 114

Chapter 2
Bioenergetics and Metabolism
2.1 Bioenergetics 123

2.2 Metabolism 128

2.3 Respiration 129

2.3.1 Aerobic respiration 129

2.3.2 Glycolysis 130

2.3.3 Pyruvate oxidation 135

2.3.4 Krebs cycle 137

2.3.5 Anaplerotic reaction 140

2.3.6 Oxidative phosphorylation 141

2.3.7 Inhibitors of electron transport 145

2.3.8 Electrochemical proton gradient 146

2.3.9 Chemiosmotic theory 147

2.3.10 ATP synthase 148

2.3.11 Uncoupling agents and ionophores 150

2.3.12 ATP-ADP exchange across the inner mitochondrial membrane 150

2.3.13 Shuttle systems 151

2.3.14 P/O ratio 153

2.3.15 Fermentation 154

2.3.16 Pasteur effect 156

2.3.17 Warburg effect 156

2.3.18 Respiratory quotient 157

2.4 Glyoxylate cycle 157

2.5 Pentose phosphate pathway 158

2.6 Entner-Doudoroff pathway 160

2.7 Photosynthesis 160

2.7.1 Photosynthetic pigment 161

2.7.2 Absorption and action spectra 164

2.7.3 Fate of light energy absorbed by photosynthetic pigments 166

2.7.4 Concept of photosynthetic unit 167

2.7.5 Hill reaction 168

2.7.6 Oxygenic and anoxygenic photosynthesis 168

2.7.7 Concept of pigment system 169

2.7.8 Stages of photosynthesis 171

2.7.9 Light reactions 171

2.7.10 Prokaryotic photosynthesis 178

2.7.11 Non-chlorophyll based photosynthesis 180

2.7.12 Dark reaction: Carbon reduction and fixation cycle 180

2.7.13 Starch and sucrose synthesis 184

2.8 Photorespiration 185

2.8.1 C4 cycle 186

2.8.2 CAM pathway 188

2.9 Carbohydrate metabolism 191

2.9.1 Gluconeogenesis 191

2.9.2 Glycogen metabolism 196

2.10 Lipid metabolism 201

2.10.1 Synthesis and storage of triacylglycerols 201

2.10.2 Biosynthesis of fatty acid 203

2.10.3 Fatty acid oxidation 207

2.10.4 Biosynthesis of cholesterol 214

2.10.5 Steroid hormones and Bile acids 215

2.11 Amino acid metabolism 217

2.11.1 Amino acid synthesis 217

2.11.2 Biological nitrogen fixation 220

2.11.3 Amino acid catabolism 224

2.11.4 Molecules derived from amino acids 229

2.12 Nucleotide metabolism 230

2.12.1 Nucleotide synthesis 230

2.12.2 Nucleotide degradation 237

Chapter 3
Cell Structure and Functions
3.1 What is a Cell? 243

3.2 Structure of eukaryotic cells 244

3.2.1 Plasma membrane 244

3.2.2 ABO blood group 252

3.2.3 Transport across plasma membrane 254

3.3 Membrane potential 261

3.4 Transport of macromolecules across plasma membrane 271

3.4.1 Endocytosis 271

3.4.2 Fate of receptor 276

3.4.3 Exocytosis 276

3.5 Ribosome 277

3.5.1 Protein targeting and translocation 279

3.6 Endoplasmic reticulum 280

3.6.1 Endomembrane system 285

3.6.2 Transport of proteins across the ER membrane 285

3.6.3 Transport of proteins from ER to cis Golgi 289

3.7 Golgi complex 291

3.7.1 Transport of proteins through cisternae 292

3.7.2 Transport of proteins from the TGN to lysosomes 293

3.8 Vesicle fusion 294

3.9 Lysosome 296

3.10 Vacuoles 298

3.11 Mitochondria 298

3.12 Plastids 302

3.13 Peroxisome 302

3.14 Nucleus 303

3.15 Cytoskeleton 307

3.15.1 Microtubules 307

3.15.2 Kinesins and Dyneins 310

3.15.3 Cilia and Flagella 311

3.15.4 Centriole 313

3.15.5 Actin filament 314

3.15.6 Myosin 316

3.15.7 Muscle contraction 317

3.15.8 Intermediate filaments 321

3.16 Cell junctions 322

3.17 Cell adhesion molecules 325

3.18 Extracellular matrix of animals 327

3.19 Plant cell wall 328

3.20 Cell signaling 330

3.20.1 Signal molecules 331

3.20.2 Receptors 331

3.20.3 GPCR and G-proteins 333

3.20.4 Ion channel-linked receptors 342

3.20.5 Enzyme-linked receptors 342

3.20.6 Nitric oxide 349

3.20.7 Two-component signaling systems 350

3.20.8 Chemotaxis in bacteria 351

3.20.9 Quorum sensing 352

3.20.10 Scatchard plot 353

3.21 Cell Cycle 355

3.21.1 Role of Rb protein in cell cycle regulation 365

3.21.2 Role of p53 protein in cell cycle regulation 366

3.21.3 Replicative senescence 367

3.22 Mechanics of cell division 368

3.22.1 Mitosis 368

3.22.2 Meiosis 375

3.22.3 Nondisjunction and aneuploidy 379

3.23 Apoptosis 382

3.24 Cancer 385

Chapter 4
Prokaryotes and Viruses
4.1 General features of Prokaryotes 397
4.2 Phylogenetic overview 398
4.3 Structure of bacterial cell 398
4.4 Bacterial genome : Bacterial chromosome and plasmid 409
4.5 Bacterial nutrition 413
4.5.1 Culture media 415
4.5.2 Bacterial growth 416
4.6 Horizontal gene transfer and genetic recombination 419
4.6.1 Transformation 420
4.6.2 Transduction 422
4.6.3 Conjugation 426
4.7 Bacterial taxonomy 431
4.8 General features of important bacterial groups 432

4.9 Archaebacteria 434


4.10 Bacterial toxins 436
4.11 Control of microbial growth 437
4.12 Virus 441
4.12.1 Bacteriophage (Bacterial virus) 443
4.12.2 Life cycle of bacteriophage 444
4.12.3 Plaque assay 447
4.12.4 Genetic analysis of phage 450
4.12.5 Animal viruses 453
4.12.6 Plant viruses 463
4.13 Prions and Viroid 464
4.13.1 Bacterial and viral disease 465

Chapter 5
Immunology
5.1 Innate immunity 469
5.2 Adaptive immunity 471
5.3 Cells of the immune system 473
5.3.1 Lymphoid progenitor 474
5.3.2 Myeloid progenitor 476
5.4 Organs involved in the adaptive immune response 477
5.4.1 Primary lymphoid organs 477

5.4.2 Secondary lymphoid organs/tissues 478
5.5 Antigens 479
5.6 Major-histocompatibility complex 483
5.6.1 MHC molecules and antigen presentation 485
5.6.2 Antigen processing and presentation 486
5.6.3 Laboratory mice 488
5.7 Immunoglobulins : Structure and function 489
5.7.1 Basic structure of antibody molecule 489
5.7.2 Different classes of immunoglobulin 491
5.7.3 Action of antibody 494
5.7.4 Antigenic determinants on immunoglobulins 494
5.8 B-cell maturation and activation 496
5.9 Kinetics of the antibody response 502
5.10 Monoclonal antibodies and Hybridoma technology 503
5.10.1 Engineered monoclonal antibodies 504
5.11 Organization and expression of Ig genes 506
5.12 Generation of antibody diversity 512
5.13 T-cells and CMI 515
5.13.1 Superantigens 525
5.14 Cytokines 526
5.15 The complement system 529
5.16 Hypersensitivity 533
5.17 Autoimmunity 535

5.18 Transplantation 536

5.19 Immunodeficiency diseases 536

5.20 Failures of host defense mechanisms 537

5.21 Vaccines 539

Chapter 6
Genetics
6.1 Mendel’s principles 545

6.1.1 Mendel’s laws of inheritance 547

6.1.2 Incomplete dominance and codominance 551

6.1.3 Multiple alleles 552

6.1.4 Lethal alleles 554

6.1.5 Penetrance and expressivity 555

6.1.6 Probability 555

6.2 Chromosomal basis of inheritance 558

6.3 Gene interaction 559

6.3.1 Dominant epistasis 561

6.3.2 Recessive epistasis 562

6.3.3 Duplicate recessive epistasis 562

6.3.4 Duplicate dominant interaction 563

6.3.5 Dominant and recessive interaction 563

6.3.6 Genetic dissection to investigate gene action 565

6.3.7 Pleiotropy 566

6.4 Genetic linkage and gene mapping 566

6.4.1 Genetic mapping 570

6.4.2 Gene mapping from two point cross 572

6.4.3 Gene mapping from three point cross 572

6.4.4 Interference and coincidence 575

6.5 Tetrad analysis 576

6.5.1 Analysis of ordered tetrad 577

6.5.2 Analysis of unordered tetrad 579

6.6 Sex chromosomes and sex determination 580

6.6.1 Sex chromosome 580

6.6.2 Sex determination in animals 581

6.6.3 Sex determination in plants 585

6.6.4 Mosaicism 585

6.6.5 Sex-linked traits and sex-linked inheritance 585

6.6.6 Sex-limited traits 587

6.6.7 Sex-influenced traits 587

6.6.8 Pedigree analysis 587

6.7 Quantitative inheritance 591

6.7.1 Quantitative trait locus analysis 595

6.7.2 Heritability 595

6.8 Extranuclear inheritance and maternal effect 596

6.8.1 Maternal effect 599

6.9 Cytogenetics 601

6.9.1 Human karyotype 601

6.9.2 Chromosome banding 602

6.9.3 Variation in chromosome number 603

6.9.4 Chromosome aberrations 607

6.9.5 Position effect 612

6.10 Genome 613

6.10.1 Genome complexity 614

6.10.2 Transposable elements 617

6.10.3 Gene 625

6.10.4 Introns 626

6.10.5 Acquisition of new genes 628

6.10.6 Fate of duplicated genes 628

6.10.7 Gene families 629

6.10.8 Human nuclear genome 631

6.10.9 Organelle genome 632

6.10.10 Yeast S. cerevisiae genome 633

6.10.11 E. coli genome 633

6.11 Eukaryotic chromatin and chromosome 633

6.11.1 Packaging of DNA into chromosomes 635

6.11.2 Histone modification 639

6.11.3 Heterochromatin and euchromatin 640

6.11.4 Polytene chromosomes 644

6.11.5 Lampbrush chromosomes 644

6.11.6 B-chromosomes 645

6.12 DNA replication 645

6.12.1 Semiconservative replication 646

6.12.2 Replicon and origin of replication 647

6.12.3 DNA replication in E. coli 650

6.12.4 Telomere replication 661

6.12.5 Rolling circle replication 662

6.12.6 Replication of mitochondrial DNA 663

6.13 Recombination 663

6.13.1 Homologous recombination 664

6.13.2 Site-specific recombination 669

6.14 DNA repair 671

6.14.1 Direct repair 671

6.14.2 Excision repair 671

6.14.3 Mismatch repair 673

6.14.4 Recombinational repair 674

6.14.5 Repair of double strand DNA break 676

6.14.6 SOS response 677

6.15 Transcription 678

6.15.1 Transcription unit 679

6.15.2 Prokaryotic transcription 679

6.15.3 Eukaryotic transcription 685

6.15.4 Role of activator and co-activator 690

6.15.5 Long-range regulatory elements 691

6.15.6 DNA binding motifs 693

6.16 RNA processing 695

6.16.1 Processing of eukaryotic pre-mRNA 696

6.16.2 Processing of pre-rRNA 706

6.16.3 Processing of pre-tRNA 709

6.17 mRNA degradation 710

6.18 Regulation of gene transcription 711

6.18.1 Operon model 711

6.18.2 Tryptophan operon system 718

6.18.3 Riboswitches 722

6.19 Bacteriophage lambda : A transcriptional switch 723

6.20 Regulation of transcription in eukaryotes 726

6.20.1 Influence of chromatin structure on transcription 726

6.20.2 DNA methylation and gene regulation 728

6.20.3 Post-transcriptional gene regulation 730

6.21 RNA interference 731

6.22 Epigenetics 734

6.23 Genetic code 735

6.24 Protein synthesis 740

6.24.1 Incorporation of selenocysteine 752

6.24.2 Cap snatching 753

6.24.3 Translational frameshifting 753

6.24.4 Antibiotics and toxins 753

6.24.5 Post-translational modification of polypeptides 754

6.25 Mutation 757

6.25.1 Mutagen 762

6.25.2 Types of mutation 765

6.25.3 Fluctuation test 769

6.25.4 Replica plating experiment 770

6.25.5 Ames test 771

6.25.6 Complementation test 771

6.26 Developmental genetics 773

6.26.1 Genetic control of embryonic development in Drosophila 773

6.26.2 Genetic control of vulva development in C. elegans 779

6.27 Population genetics 780

6.27.1 Calculation of allelic frequencies 780

6.27.2 Hardy-Weinberg Law 781

6.27.3 Inbreeding 786

Chapter 7
Recombinant DNA technology
7.1 DNA cloning 797

7.2 Enzymes for DNA manipulation 799

7.2.1 Template-dependent DNA polymerase 799

7.2.2 Nucleases 799

7.2.3 End-modification enzymes 803

7.2.4 Ligases 805

7.2.5 Linkers and adaptors 805

7.3 Vectors 808

7.3.1 Vectors for E. coli 809

7.3.2 Cloning vectors for yeast, S. cerevisiae 814

7.3.3 Vectors for plants 815

7.3.4 Vectors for animals 819

7.4 Introduction of DNA into the host cells 819

7.4.1 In bacterial cells 819

7.4.2 In plant cells 819

7.4.3 In animal cells 822

7.5 Selectable and screenable marker 824

7.6 Selection of transformed bacterial cells 826

7.7 Recombinant screening 827

7.8 Expression vector 829

7.8.1 Expression system 830

7.8.2 Fusion protein 831

7.9 DNA library 831

7.10 Polymerase chain reaction 834

7.11 DNA sequencing 838

7.12 Genome mapping 842

7.12.1 Genetic marker 842

7.12.2 Types of DNA markers 843

7.12.3 Physical mapping 847

7.12.4 Radiation hybrids 849

7.13 DNA profiling 850

7.14 Genetic manipulation of animal cells 851

7.14.1 Transgenesis and transgenic animals 851

7.14.2 Gene knockout 853

7.14.3 Formation and selection of recombinant ES cells 855

7.15 Nuclear transfer technology and animal cloning 856

7.16 Gene therapy 857

7.17 Transgenic plants 862

7.17.1 General procedure used to make a transgenic plant 862

7.17.2 Antisense technology 865

7.17.3 Molecular farming 866

7.18 Plant tissue culture 867

7.18.1 Cellular totipotency 867

7.18.2 Tissue culture media 867

7.18.3 Types of cultures 869

7.18.4 Somaclonal and gametoclonal variation 874

7.18.5 Somatic hybridization and cybridization 874

7.18.6 Applications of cell and tissue culture 875

7.19 Animal cell culture 878

7.19.1 Primary cultures 878

7.19.2 Cell line 878

7.19.3 Growth cycle 880

7.19.4 Culture media 881

Chapter 8
Bioprocess engineering
8.1 Concept of material and energy balance 887

8.1.1 Material balance 887

8.1.2 Energy balance 892

8.2 Cell growth kinetics 894

8.3 Fermentation 902

8.3.1 Fermentation processes 902

8.3.2 Fermentation media 903

8.4 Bioreactor 904

8.4.1 Agitation and aeration 904

8.4.2 Types of bioreactors 905

8.4.3 Mass balances for bioreactor 909

8.4.4 Ideal batch reactor 909

8.5 Basic operation and process control 915

8.6 Sterilization 917

8.7 Genetic instability 920

8.8 Mass and Heat transfer 921

8.8.1 Mass transfer 921

8.8.2 Heat transfer 925

8.9 Rheology of fermentation fluids 929

8.10 Enzyme immobilization 930

8.11 Scale up 935

8.12 Downstream processing 935

8.13 Industrial production of chemicals 942

8.14 Wastewater treatment 945

8.15 Bioremediation 947

Chapter 9
Bioinformatics
9.1 Introduction 954

9.2 Biological databases 954

9.3 Sequence formats 957

9.4 Biosequence analysis 960

9.5 Sequence alignment 961

9.6 Molecular phylogenetics 967

9.7 Protein structure prediction 970

9.8 Bioinformatics resources on the web 973

9.9 Genomics and proteomics 974

9.9.1 Genomics 974

9.9.2 Proteomics 974

Answers of self test 979

Index 981

Chapter 01
Biomolecules and Catalysis

A biomolecule is a carbon-based organic compound that is produced by a living organism. More than 25 naturally
occurring chemical elements are found in biomolecules, but biomolecules consist primarily of carbon, hydrogen,
nitrogen, oxygen, phosphorus and sulfur. Four of these elements (hydrogen, oxygen, nitrogen and carbon)
together account for over 99% of the atoms in most cells.
Biomolecules include both small and large molecules. The small biomolecules are low molecular weight (less
than 1000) compounds, which include sugars, fatty acids, amino acids, nucleotides, vitamins, hormones,
neurotransmitters, and primary and secondary metabolites. Sugars, fatty acids, amino acids and nucleotides constitute
the four major families of small biomolecules in cells. Large biomolecules, which have high molecular weight, are
called macromolecules and are mostly polymers of small biomolecules. These macromolecules are proteins,
carbohydrates and nucleic acids.
Small biomolecules      Macromolecules
Sugars                  Polysaccharides
Amino acids             Polypeptides (proteins)
Nucleotides             Nucleic acids
Fatty acids

Nucleic acids and proteins are informational macromolecules. Proteins are polymers of amino acids and constitute
the largest fraction (besides water) of cells. The nucleic acids, DNA and RNA, are polymers of nucleotides. They
store, transmit, and translate genetic information. The polysaccharides, polymers of simple sugars, have two
major functions. They serve as energy-yielding fuel stores and as extracellular structural elements.

1.1 Amino acids and Proteins


Amino acids are compounds containing carbon, hydrogen, oxygen and nitrogen. They serve as the monomers (building
blocks) of proteins. In an α-amino acid, an amino group, a carboxyl group, a hydrogen atom and a distinctive side
chain (R group) are all bonded to the same carbon atom, which is called the α-carbon. The various α-amino acids
differ with respect to the side chain attached to their α-carbon. The general structure of an amino acid is:

Figure 1.1 General structure of an amino acid. The α-carbon (Cα) carries the α-amino group (H3N+), the α-carboxyl
group (COO−), a hydrogen atom and the side chain (R).


This structure is common to all but one of the α-amino acids (proline is the exception). The R group, or side chain,
attached to the α-carbon is different in each amino acid. In the simplest case, the R group is a hydrogen atom and
the amino acid is glycine.

Figure 1.2 Structure of glycine and lysine. In glycine the R group is H; in lysine the R group is a chain of four
CH2 groups (the β, γ, δ and ε carbons) ending in an NH3+ group.

In α-amino acids, both the amino group and the carboxyl group are attached to the same carbon atom. However,
many naturally occurring amino acids that are not found in proteins have structures that differ from this. In
these compounds the amino group is attached to a carbon atom other than the α-carbon, and they are called
β-, γ-, δ- or ε-amino acids depending on the carbon atom to which the amino group is attached.

Amino acids can act as acids and bases


Amino acids contain both an amino (–NH2) and a carboxyl (–COOH) group. The amino group is basic (a proton acceptor)
and the carboxyl group is acidic (a proton donor). Amino acids are therefore amphoteric (amphiprotic): the same
molecule can either donate or accept a proton, thus acting either as an acid or as a base. At high concentrations of
protons (low pH), the carboxylate group accepts a proton and becomes uncharged, so that the overall charge on the
molecule is positive. Similarly, at low concentrations of protons (high pH), the protonated amino group loses its
proton and becomes uncharged; thus the overall charge on the molecule is negative. At a specific pH value called the
isoelectric point (pI), an amino acid exists predominantly as a dipolar ion, or zwitterion. A zwitterion is a compound
that has no overall electrical charge but contains both positively and negatively charged groups.

Figure 1.3 The acid-base behaviour of an amino acid in solution. At low pH (pH < pI), the positively charged species
(H3N+–CHR–COOH) predominates. As the pH increases, the electrically neutral zwitterion (H3N+–CHR–COO−) becomes
predominant (pH = pI). At higher pH (pH > pI), the negatively charged species (H2N–CHR–COO−) predominates.
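
The charge behaviour described above can be sketched numerically. The short Python example below is an illustrative sketch only: it assumes an amino acid with a non-ionizable side chain, treats the two groups as independent and applies the Henderson-Hasselbalch relation to each. The pKa values (2.34 and 9.60, approximate literature values for glycine) and the function names are assumptions introduced here, not taken from the text.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of a group that still holds its proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def net_charge(ph: float, pka_carboxyl: float, pka_amino: float) -> float:
    """Average net charge of an amino acid with a non-ionizable side chain."""
    # The protonated amino group (-NH3+) contributes +1.
    positive = fraction_protonated(ph, pka_amino)
    # The deprotonated carboxyl group (-COO-) contributes -1.
    negative = 1.0 - fraction_protonated(ph, pka_carboxyl)
    return positive - negative


def isoelectric_point(pka_carboxyl: float, pka_amino: float) -> float:
    """For two ionizable groups, pI is the mean of the two pKa values."""
    return (pka_carboxyl + pka_amino) / 2.0


# Assumed, approximate pKa values for glycine (not given in the text).
PKA_COOH = 2.34
PKA_NH3 = 9.60

if __name__ == "__main__":
    pi = isoelectric_point(PKA_COOH, PKA_NH3)
    for ph in (1.0, pi, 12.0):
        charge = net_charge(ph, PKA_COOH, PKA_NH3)
        print(f"pH {ph:5.2f}: average net charge {charge:.3f}")
```

Under these assumptions the average charge is close to +1 at low pH, essentially zero at the pI, and close to −1 at high pH, which matches the three species of Figure 1.3.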


1.1.1 Optical properties


All amino acids except glycine are optically active, i.e. they rotate the plane of plane-polarized light. Optically
active molecules contain a chiral carbon. A tetrahedral carbon atom with four different substituents is said to be
chiral. All amino acids except glycine have a chiral α-carbon and hence are optically active. An optically active
compound can rotate the plane of polarized light either clockwise (to the right) or counterclockwise (to the left).
Optically active compounds that rotate the plane of polarized light clockwise are said to be dextrorotatory. By
convention, this direction is designated by a plus sign (+). Optically active compounds that rotate the plane of
polarized light counterclockwise are said to be levorotatory. This is designated by a minus sign (–). The + and –
forms have also been termed d- and l-, respectively.
Figure 1.4 Amino acids showing achiral and chiral carbon. In glycine (R = H) the α-carbon is achiral; in alanine
(R = CH3) the α-carbon is chiral.

Optical activity is the ability of an optically active compound to rotate the plane of linearly polarized light; it
is measured with a polarimeter. Optical rotation is a quantitative measure of the rotation of light caused by the
compound. The magnitude of the optical rotation indicates the extent to which the plane of linearly polarized light
is rotated, and its sign represents the direction of rotation. The optical rotation of an optically active compound
depends on the concentration of the compound, the temperature, the wavelength of light used, the solvent used to
dissolve the sample and the light path length. The optical rotation of a solution at a given temperature and
wavelength is given by

$$\alpha = [\alpha]_{\lambda}^{T} \times C \times l$$

where, $\alpha$ = observed rotation in degrees
C = concentration of the solution in g/ml
l = light path length in decimeters (dm)
$[\alpha]_{\lambda}^{T}$ = the specific rotation of the compound at temperature T (in degrees Celsius) and wavelength λ (in nm).
If the wavelength of the light used is 589 nm, the symbol ‘D’ is used: $[\alpha]_{D}^{T}$.

Specific rotation is the reference value of optical rotation for a given concentration of compound at a given
temperature and fixed wavelength. At a given temperature and for a given wavelength of light, the specific rotation
is defined as the observed value of optical rotation when plane-polarized light is passed through a sample with a
path length of 1 decimeter and a sample concentration of 1 g per milliliter.
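
The relation between observed and specific rotation can be checked with a few lines of Python. This is a minimal sketch of the formula above; the function names and the sample readings (an observed rotation of +1.33 degrees at 0.02 g/ml in a 1 dm tube) are hypothetical values chosen only for illustration.

```python
def observed_rotation(specific_rotation: float, conc_g_per_ml: float, path_dm: float) -> float:
    """Observed rotation (degrees) = specific rotation x C (g/ml) x l (dm)."""
    return specific_rotation * conc_g_per_ml * path_dm


def specific_rotation(observed_deg: float, conc_g_per_ml: float, path_dm: float) -> float:
    """Specific rotation from a polarimeter reading, by rearranging the same formula."""
    if conc_g_per_ml <= 0 or path_dm <= 0:
        raise ValueError("concentration and path length must be positive")
    return observed_deg / (conc_g_per_ml * path_dm)


def direction(rotation_deg: float) -> str:
    """Sign convention: (+) is dextrorotatory, (-) is levorotatory."""
    if rotation_deg > 0:
        return "dextrorotatory (+)"
    if rotation_deg < 0:
        return "levorotatory (-)"
    return "no net rotation"


if __name__ == "__main__":
    # Hypothetical reading: +1.33 degrees, 0.02 g/ml, 1 dm sample tube.
    value = specific_rotation(1.33, 0.02, 1.0)
    print(f"specific rotation: {value:.1f}, {direction(value)}")
```

Doubling either the concentration or the path length doubles the observed rotation, whereas the specific rotation stays constant at a fixed temperature and wavelength.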

Figure 1.5 When plane polarized light is passed through a solution that contains an optically active compound,
there is net rotation of the plane polarized light. The light is rotated either clockwise (dextrorotatory) or
counterclockwise (levorotatory) by an angle that depends on the molecular structure and concentration of
the compound, the pathlength and the wavelength of the light. (Schematic: normal light, polarizer, plane-polarized
light, sample tube containing a chiral compound, rotation of plane-polarized light.)


peptides are cyclic in nature. Two cyclic decapeptides (peptides containing 10 amino acid residues) produced by the
bacterium Bacillus brevis are common examples. Both of these peptides, gramicidin S and tyrocidine A, are
antibiotics, and both contain D-amino acids as well as L-amino acids. In addition, both contain the amino acid
ornithine, which does not occur in proteins. Small peptides play many roles in organisms. Some, such as oxytocin
and vasopressin, are important hormones. Others, like glutathione, regulate oxidation–reduction reactions. Still
others, such as enkephalins, are naturally occurring painkillers. Aspartame is a commercially synthesized dipeptide,
L-aspartylphenylalanyl methylester, and is used as an artificial sweetener.
When many amino acid residues are joined, the product is called a polypeptide. Amino acids that have been
incorporated into a peptide or polypeptide are termed amino acid residues. By convention, the left end of a
polypeptide is represented by the first amino acid and the right end by the last amino acid. The first amino acid
is also called the N-terminal amino acid residue, and the last amino acid is called the C-terminal amino acid residue.

Figure 1.11 A series of amino acids joined by peptide bonds form a polypeptide chain, and each amino acid
unit in a polypeptide is called a residue. A polypeptide chain has polarity because its ends are different, with
an α-amino group at one end and an α-carboxyl group at the other. (Structure shown, N-terminal to C-terminal:
H2N–CHR–CO–NH–CHR–CO–NH–CHR–COOH.)
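
The left-to-right convention and the polarity of the chain can be made concrete with a small data structure. The sketch below is illustrative only: the class name `Polypeptide` and its attributes are hypothetical, and the example sequence (Tyr-Gly-Gly-Phe-Leu, the sequence of Leu-enkephalin) is an assumption brought in here, not something stated in the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Polypeptide:
    """A linear chain of residues, written N-terminus first (left to right)."""

    residues: Tuple[str, ...]

    def __post_init__(self) -> None:
        if not self.residues:
            raise ValueError("a polypeptide needs at least one residue")

    @property
    def n_terminal(self) -> str:
        """First residue: carries the free alpha-amino group."""
        return self.residues[0]

    @property
    def c_terminal(self) -> str:
        """Last residue: carries the free alpha-carboxyl group."""
        return self.residues[-1]

    @property
    def peptide_bonds(self) -> int:
        """A linear chain of n residues is held together by n - 1 peptide bonds."""
        return len(self.residues) - 1

    def reversed(self) -> "Polypeptide":
        """Same residues in the opposite order: a different molecule."""
        return Polypeptide(tuple(reversed(self.residues)))


if __name__ == "__main__":
    peptide = Polypeptide(("Tyr", "Gly", "Gly", "Phe", "Leu"))
    print(peptide.n_terminal, peptide.c_terminal, peptide.peptide_bonds)
    print(peptide.reversed() == peptide)
```

Because the two ends are chemically different, reversing the order of the residues gives a different peptide, which is why the comparison in the last line is false for this sequence.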

The peptide bonds in proteins are formed between the α-amino and the α-carboxyl groups. However, peptides do occur
naturally in which the peptide linkage involves a carboxyl or amino group attached to a carbon atom other
than the α-carbon. For example, a dipeptide formed between the γ-carboxyl group of glutamic acid and the amino
group of alanine is called γ-glutamylalanine.

1.1.6 Peptide bond


Peptides and polypeptides are linear and unbranched polymers composed of amino acids linked together by peptide
bonds. Peptide bonds are amide linkages formed between the α-amino group of one amino acid and the α-carboxyl
group of another. This reaction is a dehydration reaction, that is, a water molecule is removed. Peptide bond
formation is an endergonic process, with ΔG ≈ +21 kJ/mol.

Figure 1.12 The formation of a peptide bond (also called an amide bond) between the α-carboxyl group of
one amino acid and the α-amino group of another amino acid is accompanied by the loss of a water molecule.
(Reaction shown: H2N–CHR1–COOH + H2N–CHR2–COOH gives H2N–CHR1–CO–NH–CHR2–COOH + H2O.)
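
Because each peptide bond is formed with the loss of one water molecule, the molar mass of a linear peptide is the sum of the masses of its free amino acids minus one water per bond. The Python sketch below illustrates that bookkeeping; the function name is hypothetical, and the molar masses (glycine 75.067, alanine 89.094, water 18.015 g/mol) are standard values supplied here as assumptions rather than taken from the text.

```python
# Assumed molar masses in g/mol (standard values, not given in the text).
WATER_MASS = 18.015
FREE_AMINO_ACID_MASS = {
    "Gly": 75.067,  # glycine, C2H5NO2
    "Ala": 89.094,  # alanine, C3H7NO2
}


def linear_peptide_mass(residues: list) -> float:
    """Molar mass of a linear peptide built from the listed amino acids.

    One water molecule is lost for each of the n - 1 peptide bonds.
    """
    if not residues:
        raise ValueError("at least one amino acid is required")
    total = sum(FREE_AMINO_ACID_MASS[name] for name in residues)
    bonds = len(residues) - 1
    return total - bonds * WATER_MASS


if __name__ == "__main__":
    dipeptide = ["Gly", "Ala"]
    print(f"Gly-Ala: {linear_peptide_mass(dipeptide):.3f} g/mol")
```

For the dipeptide Gly-Ala this gives 75.067 + 89.094 − 18.015 = 146.146 g/mol, that is, the two free amino acids less a single water molecule.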


Figure 1.21 Intramolecular cross-links form between allysine residues after oxidative deamination of the ε-amino
groups of lysine residues by lysyl oxidase. The aldehydes of two side chains then link covalently in a spontaneous,
nonenzymatic aldol condensation. (Scheme shown: lysine, then allysine, then aldol cross-link.)

The intermolecular cross-linking of tropocollagens involves the formation of a unique hydroxypyridinium structure
from one lysine and two hydroxylysine residues.

Sequence of events in the biosynthesis of collagen:


1. Synthesis and entry of polypeptide into lumen of RER.
2. Hydroxylation of prolyl and lysyl residues.
3. Glycosylation.
4. Formation of tropocollagen.
5. Packaging into transport vesicles.
6. Exocytosis.
7. Lateral covalent cross-linking of tropocollagens.
8. Aggregation of fibrils.

1.2.2 Elastin
Elastin is a highly hydrophobic connective tissue protein that is responsible for extensibility and elasticity. It is the
second major protein of the extracellular matrix and the main component of the elastic fibers found in ligaments,
large arteries and lungs. After synthesis, a 72 kDa molecule of soluble tropoelastin is secreted into the matrix. This
protein is rich in nonpolar amino acids and unusually rich in proline and glycine. Unlike collagen, it is not glycosylated
and contains some hydroxyproline but no hydroxylysine. Tropoelastin is composed largely of two types of short
sequences that alternate along the polypeptide chain: hydrophobic segments, and alanine- and lysine-rich segments.
After secretion from the cell, the tropoelastin molecules become highly cross-linked to one another, generating an
extensive network of elastin fibers. Certain lysyl residues of tropoelastin are first oxidatively deaminated to
aldehydes by lysyl oxidase. The condensation of three of these lysine-derived aldehydes with an unmodified
lysine then forms a tetrafunctional cross-link called desmosine. Once cross-linked in its mature,
extracellular form, elastin is highly insoluble and extremely stable.


Figure 1.22 Intramolecular desmosine cross-links in elastin.

1.2.3 Keratins
Keratins are fibrous proteins present in eukaryotes. They form a large family, with about 30 members being
distinguished. Keratins have been classified as either α-keratins or β-keratins.

Proteins           α-keratin            β-keratin
Characteristics    Tough, insoluble     Soft, flexible
Conformation       Helical              Extended chain
Basic unit         Protofibril          Antiparallel β-pleated sheet

α-keratins are intermediate filament proteins present in many metazoans, including vertebrates. In vertebrates,
α-keratins constitute almost the entire dry weight of hair, wool, feathers, nails, claws, scales, horns, hooves, and
much of the outer layer of skin. The α-keratin polypeptide chain, which forms the polymerized α-keratin structure, is a
right-handed α-helix rich in the hydrophobic amino acid residues Ala, Val, Leu, Ile, Met and Phe. α-keratin
polypeptide chains pair to form heterodimers. Each heterodimer is made up of one type I (acidic) and one type II
(neutral/basic) α-keratin polypeptide chain. The two chains in a heterodimer have a parallel arrangement. Two
heterodimers join in an antiparallel manner to form the fundamental tetrameric subunit (a protofilament). Two
protofilaments constitute a protofibril. Four protofibrils constitute a microfibril, which associates with other microfibrils
to form a macrofibril.
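
The assembly hierarchy described above implies a fixed number of polypeptide chains at each level. The following is a minimal Python sketch of that arithmetic; the names LEVELS and chains_per_level are illustrative and do not come from the text.

```python
# Multiplier at each assembly level, as stated in the text.
LEVELS = [
    ("heterodimer", 2),    # one type I chain + one type II chain
    ("protofilament", 2),  # two heterodimers
    ("protofibril", 2),    # two protofilaments
    ("microfibril", 4),    # four protofibrils
]

def chains_per_level(levels=LEVELS):
    """Return the number of polypeptide chains contained in each level."""
    counts = {}
    chains = 1
    for name, multiplier in levels:
        chains *= multiplier
        counts[name] = chains
    return counts

print(chains_per_level())
# {'heterodimer': 2, 'protofilament': 4, 'protofibril': 8, 'microfibril': 32}
```

On this counting, a microfibril contains 32 α-keratin polypeptide chains.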

1.2.4 Myoglobin
Myoglobin (Mb), a globular protein, contains a single polypeptide chain of 153 amino acid residues (molecular
weight 17,800), and a single heme group. The inside of myoglobin consists almost exclusively of nonpolar residues,
whereas the outside contains both polar and nonpolar residues. About 75% of the polypeptide chain is α-helical.
There are eight helical segments. These eight helical segments are commonly labeled A–H, starting from the
NH2-terminal end. The interhelical regions are designated as AB, BC, CD,..., GH, respectively. The iron atom of the
heme is directly bonded to a nitrogen atom of a histidine side chain of globin.

Heme
The globin of Mb binds a single heme group through a co-ordinate bond. The heterocyclic ring system of heme is a
porphyrin derivative. The porphyrin in heme is known as protoporphyrin IX. It is made up of four pyrrole rings
linked by methine (=CH–) bridges to form a tetrapyrrole ring. The Fe atom, in either the Fe2+ or the Fe3+
oxidation state, sits at the center of the protoporphyrin IX ring.

Biological interaction
Biological interactions (or bonds) in living systems fall under two categories – covalent and non-covalent interactions.

Covalent interactions
A covalent bond is formed when two atoms share one or more pairs of electrons. Covalent bonds are strong
bonds and very stable in nature. These bonds may be either polar or nonpolar. In a nonpolar covalent bond, such
as that in the hydrogen molecule, H2, the electron pair is shared equally between the two hydrogen nuclei. Both
hydrogen atoms have the same electronegativity (the electronegativity difference between the atoms is zero).
Covalent bonds, such as the one in HF, in which the electron pair is shared unequally because of a difference in
electronegativity are called polar covalent bonds. In HF, the shared electron pair is displaced towards
fluorine, since the electronegativity of fluorine is far greater than that of hydrogen. The
atom towards which the electron pair shifts acquires a slight negative charge, while the other atom acquires a slight
positive charge. The magnitude of the electronegativity difference reflects the degree of polarity: the greater the
difference in the electronegativities of the atoms forming the bond, the greater the charge separation and
hence the greater the polarity of the bond. The polarity of a molecule can be expressed by its dipole
moment, which measures the separation of charge within the molecule.

Non-covalent interactions
Non-covalent interactions include ionic bonds, hydrogen bonds, van der Waals forces and hydrophobic interactions.
These interactions are weak interactions. The energy required to break non-covalent interactions is only
1-5 kcal/mol which is much less than the bond energies of covalent bonds.

Ionic bonds
An ionic bond is a chemical bond formed by the electrostatic attraction between positive and negative ions. A
positively charged ion is called a cation and a negatively charged ion is called an anion. In ionic (electrovalent)
bonding, the atoms are bound by electrostatic attraction of opposite ions, whereas in covalent bonding, atoms
are bound by sharing electrons to attain stable electron configurations. An ionic bond forms when the electronegativity
difference between two elements is large, as between a metal and a nonmetal.
An important aspect of ionic compounds in aqueous solution is the hydration of ions. Because water molecules
are polar, they are attracted to charged ions. Shells of water molecules, referred to as solvation spheres, cluster
around both positive and negative ions. As ions become hydrated, the attractive force between them is reduced,
and the charged species dissolves in the water.

Hydrogen bonds
Hydrogen bonding is a weak electrostatic attractive force that exists between a hydrogen atom covalently
bonded to a very electronegative atom, X, and a lone pair of electrons on another small, electronegative atom,
Y. A typical hydrogen bond may be depicted as X–H•••Y–, where the three dots denote the bond. X–H represents
the hydrogen bond donor. The atoms X and H are covalently bonded to one another and the X–H bond is
polarized; the strength of the H•••Y bond increases with the electronegativity of X. Usually, hydrogen
bonding is seen where X and Y are F, O or N. Y is an electronegative atom with a lone pair of
electrons. Although considerably weaker than ionic and covalent bonds, hydrogen bonds are stronger than
most other non-covalent interactions. Hydrogen bonds can be intermolecular or intramolecular. They are
both longer and weaker than covalent bonds between the same atoms. For example, the bond energy of the O–H
covalent bond is 110 kcal mol–1, whereas the energy of a hydrogen bond in water is only about 5 kcal mol–1.

van der Waals forces


van der Waals forces are weak intermolecular interactions. They occur between permanent or induced
dipoles. The attraction between molecules is greatest at a distance called the van der Waals radius.
If molecules approach each other more closely, a repulsive force develops. The magnitude of van der Waals
forces depends on how easily an atom is polarized. Electronegative atoms with unshared pairs of electrons are
easily polarized.


1.5 Nucleic acids


Nucleic acid was first discovered by Friedrich Miescher, who isolated it from the nuclei of pus cells (leukocytes)
obtained from discarded surgical bandages and called it nuclein. Nuclein was later shown to be a mixture of a basic
protein and a phosphorus-containing organic acid, now called nucleic acid. There are two types of nucleic acids:
ribonucleic acid (RNA) and deoxyribonucleic acid (DNA).

1.5.1 Nucleotides
Nucleic acids are polymers. The monomeric units of nucleic acids are called nucleotides. Nucleic acids are therefore
also called polynucleotides. Nucleotides are phosphate esters of nucleosides and are made up of three components:
1. A base that has a nitrogen atom (nitrogenous base)
2. A five carbon sugar
3. An ion of phosphoric acid

Nitrogenous bases
Nitrogenous bases are heterocyclic, planar and relatively water insoluble aromatic molecules. There are two general
types of nitrogenous bases - pyrimidines and purines.

[Figure: ring-numbering schemes of the purine ring (atoms 1–9) and the pyrimidine ring (atoms 1–6).]

Purines
Two different nitrogenous bases with a purine ring (composed of carbon and nitrogen) are commonly found in both DNA
and RNA: adenine (6-aminopurine) and guanine (6-oxy-2-aminopurine).
Adenine has an amino group (–NH2) on the C6 position of the ring (carbon at position 6 of the ring). Guanine has an
amino group at the C2 position and a carbonyl group at the C6 position.

Pyrimidines
The two major pyrimidine bases found in DNA are thymine (5-methyl-2,4-dioxypyrimidine) and cytosine (2-oxy-4-
aminopyrimidine) and in RNA they are uracil (2,4-dioxypyrimidine) and cytosine. Thymine contains a methyl group
at the C5 position with carbonyl groups at the C4 and C2 positions. Cytosine contains a hydrogen atom at the C5
position and an amino group at C4. Uracil is similar to thymine but lacks the methyl group at the C5 position. Uracil
is not usually found in DNA. It is a component of RNA.

[Figure: structures of adenine, guanine, cytosine, uracil and thymine.]

Sugars
Naturally occurring nucleic acids have two types of pentose sugars: ribose and deoxyribose sugar.
Ribose sugar is found in RNA. It is a five carbon monosaccharide with a hydroxyl group (–OH) on each carbon.
Deoxyribose sugar is found in DNA. It is a five carbon monosaccharide, lacking one oxygen atom at 2’ position. The
hydroxyl group (–OH) at 2’ position of ribose sugar is replaced by a hydrogen (–H).


1.6.2 Z-DNA
Z-DNA is a left-handed double helical structure with two anti-parallel strands that are held together by Watson-Crick
base pairing. The transition from the B- to the Z-DNA conformation occurs most readily in DNA segments containing
alternating purines and pyrimidines, especially alternations of C and G on one strand (and also in DNA segments
containing alternations of T and G on one strand and C and A on the other). The existence of Z-DNA was first suggested
by optical studies demonstrating that a polymer of alternating C and G in one strand gave a nearly inverted circular
dichroism spectrum in a 4 M NaCl solution. The
physical reason for this finding remained a mystery until an X-ray crystallographic study of a self-complementary
DNA hexamer d(CG)3 revealed a left-handed double helix with two anti-parallel chains that were held together by
Watson–Crick base pairing.
Z-DNA is thinner (18 Å) than B-DNA (20 Å) and has only one deep, narrow groove, equivalent to the minor
groove in B-DNA. No major groove exists. In contrast to B-DNA, where the repeating unit is 1 base pair, in Z-DNA
the repeating unit is 2 base pairs. This dinucleotide repeat causes the backbone to follow a zigzag path, giving rise
to the name Z-DNA. The glycosidic bond conformations alternate between anti and syn (anti for pyrimidines; syn
for purines). Similarly, the sugar puckers alternate between C3’-endo and C2’-endo (C2’-endo for pyrimidines and
C3’-endo for purines).
Z-DNA can form in regions of alternating purine-pyrimidine sequence; GCGCGC... sequences form Z-DNA most
easily. TGTGTGTG… sequences also form Z-DNA but they require a greater stabilization energy. Formation of
Z-DNA conformation is generally unfavourable. Certain conditions promote Z-DNA conformation from B-DNA
conformation; such as negative DNA supercoiling, high salt concentration or 5-methylated deoxycytosine.

Table 1.11 Comparison of different forms of DNA

Geometry attribute             A-form             B-form             Z-form
Helix sense                    Right-handed       Right-handed       Left-handed
Repeating unit                 1 bp               1 bp               2 bp
Rotation/bp (twist angle)      33.6°              34.3°              60°/2
Mean bp/turn                   10.7               10.4               12
Base pair tilt                 20°                –6°                7°
Rise/bp along axis             2.3 Å              3.32 Å             3.8 Å
Pitch/turn of helix            24.6 Å             33.2 Å             45.6 Å
Mean propeller twist           +18°               +16°               0°
Glycosidic bond conformation   Anti               Anti               Anti (pyrimidines), syn (purines)
Sugar pucker                   C3’-endo           C2’-endo           C2’-endo and C3’-endo
Diameter                       23 Å               20 Å               18 Å
Major groove                   Narrow and deep    Wide and deep      Flat
Minor groove                   Wide and shallow   Narrow and deep    Narrow and deep
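
Two rows of Table 1.11 can be related to the others by simple geometry: the twist per base pair is roughly 360° divided by the number of base pairs per turn, and the pitch is roughly the rise per base pair multiplied by the base pairs per turn. The Python sketch below applies these relations to the tabulated values; the function names are illustrative, and the relations are idealizations of an average helix rather than the method used to build the table.

```python
def twist_per_bp(bp_per_turn):
    """Average helical rotation per base pair, in degrees."""
    return 360.0 / bp_per_turn

def pitch(rise_per_bp, bp_per_turn):
    """Helix pitch in angstroms: axial rise per bp times bp per turn."""
    return rise_per_bp * bp_per_turn

# (rise per bp in angstroms, mean bp per turn) taken from Table 1.11
forms = {"A": (2.3, 10.7), "B": (3.32, 10.4), "Z": (3.8, 12)}

for name, (rise, bp_turn) in forms.items():
    print(name, round(twist_per_bp(bp_turn), 1), round(pitch(rise, bp_turn), 1))
```

The A- and Z-form results reproduce the table (33.6° and 24.6 Å; 30°, i.e. 60°/2, and 45.6 Å). For the B-form the computed values (about 34.6° and 34.5 Å) differ slightly from the tabulated 34.3° and 33.2 Å; the tabulated pitch is what 3.32 Å per base pair gives at exactly 10 bp per turn.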

1.6.3 Triplex DNA


In certain circumstances (e.g. low pH), a DNA sequence containing a long segment in which a polypurine
strand is hydrogen bonded to a polypyrimidine strand can form a triple helix. The triple helix is written as
(dT).(dA).(dT), with the third strand in italics. Triple-stranded DNA is formed by laying a third strand into the major
groove of DNA. A third strand makes a hydrogen bond to another surface of the duplex. The third strand pairs in
a Hoogsteen base-pairing scheme. The central strand of the triplex must be purine rich. Thus, triple-stranded
DNA requires a homopurine: homopyrimidine region of DNA. If the third strand is purine rich, it forms reverse
Hoogsteen hydrogen bonds in an antiparallel orientation with the purine strand of the Watson-Crick helix. If the
third strand is pyrimidine rich, it forms Hoogsteen bonds in a parallel orientation with the Watson-Crick-paired
purine strand.


A triple helix can be intermolecular or intramolecular. In the intermolecular Pu.Pu.Py triple helix, the polypurine
third strand is organized antiparallel with respect to the purine strand of the original Watson-Crick duplex. In the
intermolecular Py.Pu.Py triplex, the polypyrimidine third strand is organized parallel with respect to the purine
strand and the phosphate backbone is positioned.

Figure 1.49 Intermolecular Pu.Pu.Py triple helix. The polypurine third strand (black color) is organized antiparallel
with respect to the purine strand of the original double strand DNA.

An intramolecular triplex (also referred to as H-DNA) could form within a single homopurine-homopyrimidine
duplex DNA region in the supercoiled DNA. As in intermolecular triplexes, when the third strand is the pyrimidine
strand, it forms Hoogsteen pairs in a parallel fashion with the central purine strand. When the third strand is the
purine strand, it forms reverse Hoogsteen pairs in an antiparallel fashion with the central purine strand.

1.6.4 G-quadruplex
Nucleic acid sequences that are rich in guanine are capable of forming four-stranded structures called
G-quadruplexes (also called G-quartets). These consist of a square arrangement of guanines (a tetrad), stabilized
by Hoogsteen hydrogen bonding. The formation and stability of G-quadruplexes is monovalent cation-dependent.
A monovalent cation is present in the center of the tetrads. G-quadruplexes can be formed of DNA or RNA. They
can be formed from one, two or four separate strands of DNA or RNA. Depending on the direction of the strands or
parts of a strand that form the tetrads, structures may be described as parallel or antiparallel. Parallel quadruplexes
have all guanine glycosidic angles in an anti conformation. Antiparallel quadruplexes have both syn and anti
conformations.

Figure 1.50 Four-stranded structures can arise from square arrangement of guanines. [Figure: four guanines, all in
the anti conformation, hydrogen bonded around a central monovalent cation M+.]


Figure 1.51 G-Quadruplex DNA. Quadruplex structures may be parallel or antiparallel.

1.6.5 Stability of the dsDNA helix


The helical structure of dsDNA is stabilized by non-covalent interactions. These interactions include stacking
interactions (major) between adjacent bases and hydrogen bonding (minor) between complementary strands.
The core of the helix consists of the base pairs which stack together through stacking interactions. These interactions
include hydrophobic interactions and van der Waals interactions between base pairs that contribute significantly to
the overall stability. Base stacking also helps to minimize contact of the bases with water.
Internal and external hydrogen bonds also stabilize the double helix. The two strands of DNA are held together by
hydrogen bonds that form between the complementary purines and pyrimidines, two hydrogen bonds in an A:T pair
and three hydrogen bonds in a G:C pair, while the polar atoms in the sugar-phosphate backbone form external
hydrogen bonds with surrounding water molecules.
The overall energy of hydrogen bonding depends predominantly on base composition, that is, on the relative numbers
of A • T and C • G base pairs. On the other hand, base stacking energies depend on the sequence of the DNA. Some
combinations of base pairs form more stable interactions than others. For example, a (GC).(GC) dinucleotide stack has
a stacking energy of –14.59 kcal/mol/stacked pair, whereas a (TA).(TA) stack has an energy of –3.82 kcal/mol/stacked pair.
Once the DNA double helix is formed, it is remarkably stable. The individual interactions stabilizing the helix are
weak, but the sum of all interactions makes a very stable helix.

1.6.6 Thermal denaturation


DNA denaturation is a process in which dsDNA separates into two single strands due to disruption of hydrogen
bonds and stacking interactions. Several factors (such as extreme
pH, temperature or ionic strength) cause DNA denaturation. If temperature is the denaturing agent, the double
helix is said to have melted. DNA denaturation is a co-operative process. Denaturation is accompanied by
a change in the physical properties of the DNA. It increases the relative absorbance of the DNA solution at
260 nm. This increase in absorbance is known as the hyperchromic shift. Stacked bases in dsDNA absorb less


1.7.2 RNA world hypothesis


The concept of an RNA World is a way of answering the basic problem of what was the self-replicating molecule
present at the beginning of life. This hypothesis proposes that RNA was actually the first life-form on earth, later
developing a cell membrane around it and becoming the first prokaryotic cell (the phrase RNA World was first used
by Walter Gilbert in 1986). This hypothesis is supported by the RNA’s ability to store, transmit, and duplicate genetic
information, just like DNA does and to catalyze chemical reactions, just like protein does. Because RNA can perform
the tasks of both genetic materials and enzymes, RNA is believed to have once been capable of independent life.

1.7.3 RNA as genetic material


Some viruses contain RNA as genetic material. One of the first experiments that established RNA as the genetic
material in RNA viruses was the reconstitution experiment of H. Fraenkel-Conrat and B. Singer. They took two
different strains of Tobacco Mosaic Virus (TMV), separated the RNAs from their protein coats, and reconstituted
hybrid viruses by mixing the proteins of one strain with the RNA of the second strain, and vice versa. When the
hybrid virus was spread on tobacco leaves, the lesions that developed corresponded to the TMV from which the
RNA had been obtained. Thus, it was concluded that RNA serves as the genetic material in TMV.

Figure 1.59 In vitro reconstitution of a hybrid TMV. The two strains of virus (TMV type A and
type B) were each separated into protein and RNA. The protein of one strain (type B) was allowed to recombine
with the RNA of the other (type A), and the hybrid was used to infect a tobacco leaf. The progeny of this hybrid
(TMV type A) had the protein originally associated with its RNA. This shows that the genetic material of TMV is RNA,
not protein.

Problem

What is the approximate molecular weight of duplex DNA required to code for glyceraldehyde phosphate
dehydrogenase (MW 40,000)?
Solution
The average molecular weight of an amino acid residue in a protein is 110. Thus, a protein whose molecular weight
is 40,000 contains 40,000/110 = ~364 amino acids and requires a minimum DNA duplex of 3 × 364 = ~1090
nucleotide pairs. Since each nucleotide pair has an average molecular weight of about 650, the molecular weight of
this gene would be about 1090 × 650 = 708,500. On average, the molecular weight of coding DNA is about 18
times that of the corresponding protein.
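
The same calculation can be written as a short Python sketch. The constants are the averages used in the solution above; the function name coding_dna_mw is illustrative.

```python
AA_RESIDUE_MW = 110   # average mass of an amino acid residue (from the text)
BASE_PAIR_MW = 650    # average mass of a nucleotide pair (from the text)

def coding_dna_mw(protein_mw):
    """Return (residues, base pairs, duplex DNA mass) for a protein mass."""
    residues = round(protein_mw / AA_RESIDUE_MW)
    base_pairs = 3 * residues            # triplet code: 3 bp per residue
    return residues, base_pairs, base_pairs * BASE_PAIR_MW

residues, base_pairs, dna_mw = coding_dna_mw(40_000)
print(residues, base_pairs, dna_mw)      # 364 1092 709800
print(round(dna_mw / 40_000, 1))         # 17.7
```

Without the intermediate rounding of 1092 to 1090 nucleotide pairs, the duplex mass comes to 709,800 rather than 708,500; either way the DNA is roughly 18 times heavier than the protein it encodes.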

Problem

The molecular weight of bacteriophage T4 dsDNA is $1.3 \times 10^{8}$.


1. How many amino acids can be coded for by T4 DNA?
2. How many different proteins of MW 55000 could be coded for by T4 DNA?


Solution

1. The genetic code is a triplet code. That is, it takes a sequence of three nucleotides on the coding strand of DNA
to specify one amino acid. The DNA of T4 contains:

$$\frac{1.3 \times 10^{8}}{650} = 2 \times 10^{5} \text{ nucleotide pairs} = 2 \times 10^{5} \text{ nucleotides in the coding strand}$$

$$\frac{2 \times 10^{5}}{3} \approx 6.7 \times 10^{4} \text{ codons}$$

2. The average MW of an amino acid residue is 110. A protein of MW 55000 contains:

$$\frac{55000}{110} = 500 \text{ amino acids}$$

$6.7 \times 10^{4}$ codons can yield:

$$\frac{6.7 \times 10^{4}}{500} = 134 \text{ proteins}$$
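
As a cross-check, the two parts of this solution can be sketched in Python. The default masses are the averages used above; the function name coding_capacity is illustrative.

```python
def coding_capacity(dna_mw, protein_mw, bp_mw=650, residue_mw=110):
    """Return (base pairs, codons, number of proteins) for a duplex DNA mass."""
    base_pairs = dna_mw / bp_mw                 # nucleotide pairs in the genome
    codons = base_pairs / 3                     # one codon per three nucleotides
    residues_per_protein = protein_mw / residue_mw
    return base_pairs, codons, codons / residues_per_protein

base_pairs, codons, proteins = coding_capacity(1.3e8, 55_000)
print(round(base_pairs), round(codons), round(proteins, 1))
```

Carried through without rounding, the result is about 66,667 codons and about 133 proteins; the figure of 134 above comes from rounding the codon count to 6.7 × 10^4 before dividing.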

Nucleic acid conversion factors


Average molecular mass of a DNA base pair = 650 Da
1 A260 unit = ~50 microgram/ml of double strand DNA
1 A260 unit = ~40 microgram/ml of single strand RNA
1 A260 unit = ~33 microgram/ml of single strand DNA
1000 bp DNA open reading frame ≅ 333 amino acids ≅ 37,000 Da protein
To calculate the concentration of plasmid DNA in solution using absorbance at 260 nm:
(Observed A260) × (dilution factor) × (0.050) = DNA concentration in μg/μl
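
The conversion factors above can be applied as in the following Python sketch. The A260 reading and dilution factor are hypothetical values chosen for illustration, and the function name is not from the text.

```python
# Concentration (micrograms per mL) corresponding to 1 A260 unit, from the list above.
UG_PER_ML_PER_A260 = {"dsDNA": 50, "ssRNA": 40, "ssDNA": 33}

def concentration_ug_per_ml(a260, dilution_factor=1, kind="dsDNA"):
    """Nucleic acid concentration of the undiluted sample in micrograms per mL."""
    return a260 * dilution_factor * UG_PER_ML_PER_A260[kind]

# Hypothetical plasmid prep: A260 of 0.25 measured on a 1:20 dilution.
conc = concentration_ug_per_ml(0.25, dilution_factor=20)
print(conc)          # 250.0 micrograms per mL
print(conc / 1000)   # 0.25 micrograms per microlitre
```

Dividing by 1000 converts μg/ml to μg/μl, which is why the plasmid formula above uses the factor 0.050 instead of 50.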

1.8 Carbohydrates
Carbohydrates are polyhydroxy aldehydes or polyhydroxy ketones, or compounds that can be hydrolyzed to them.
In the majority of carbohydrates, H and O are present in the same ratio as in water, hence they are also called hydrates
of carbon. Carbohydrates are the most abundant biomolecules on Earth. Carbohydrates are classified into the following
classes depending on whether they undergo hydrolysis and, if so, on the number of products formed:
Monosaccharides are simple carbohydrates that consist of a single polyhydroxy aldehyde or ketone unit.
Oligosaccharides are polymers made up of two to ten monosaccharide units joined together by glycosidic linkages.
Oligosaccharides can be classified as di-, tri-, tetra- and so on, depending on the number of monosaccharides present.
Amongst these the most abundant are the disaccharides, with two monosaccharide units.
Polysaccharides are polymers with hundreds or thousands of monosaccharide units. Polysaccharides are not sweet
in taste, hence they are also called non-sugars.

1.8.1 Monosaccharide
Monosaccharides consist of a single polyhydroxy aldehyde or ketone unit. Monosaccharides are the simple sugars
and they have the general formula CnH2nOn. Monosaccharides are colorless, crystalline solids that are freely soluble
in water but insoluble in nonpolar solvents. The most abundant monosaccharide in nature is D-glucose.
Monosaccharides can be further subclassified on the basis of:

I. Number of the carbon atoms


Monosaccharides can be named by a system that is based on the number of carbons, with the suffix -ose added.
Monosaccharides with four, five, six and seven carbon atoms are called tetroses, pentoses, hexoses and heptoses,
respectively.
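
The naming rule and the general formula CnH2nOn can be combined in a short Python sketch. The atomic masses are rounded standard values that do not appear in the text, and the function names are illustrative.

```python
# Carbon-number prefixes listed in the text (four to seven carbons).
PREFIXES = {4: "tetr", 5: "pent", 6: "hex", 7: "hept"}

# Rounded standard atomic masses in g/mol (assumed values, not from the text).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

def class_name(n_carbons):
    """Class name of a monosaccharide with the given number of carbons."""
    return PREFIXES[n_carbons] + "ose"

def formula(n_carbons):
    """Molecular formula according to the general formula CnH2nOn."""
    return f"C{n_carbons}H{2 * n_carbons}O{n_carbons}"

def molar_mass(n_carbons):
    """Molar mass in g/mol implied by CnH2nOn."""
    n = n_carbons
    return n * ATOMIC_MASS["C"] + 2 * n * ATOMIC_MASS["H"] + n * ATOMIC_MASS["O"]

print(class_name(6), formula(6), round(molar_mass(6), 2))   # hexose C6H12O6 180.16
```

For a hexose such as D-glucose this gives C6H12O6 and a molar mass of about 180 g/mol.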


1.8.7 Glycoproteins
Compounds consisting of carbohydrates covalently linked to non-carbohydrate constituents are
classified under the general name glycoconjugates. The major types of glycoconjugates are the glycoproteins,
peptidoglycans, glycolipids and lipopolysaccharides. Carbohydrates covalently linked with proteins are called
glycoproteins. The carbohydrate may be in the form of a monosaccharide, disaccharide, oligosaccharide,
polysaccharide, or their derivatives. The term glycoprotein also includes proteoglycans, which in the past were
considered as a separate class of glycoconjugates. Proteoglycans are a subclass of glycoproteins in which the
carbohydrates are glycosaminoglycans. In glycoproteins, carbohydrates are attached either to the amide nitrogen
atom in the side chain of asparagine (termed as N-linkage) or to the oxygen atom in the side chain of serine or
threonine (termed as O-linkage).

Figure 1.67 Carbohydrates are covalently attached to many different proteins to form glycoproteins.
Carbohydrates are attached either to the amide nitrogen atom in the side chain of asparagine (termed an
N-linkage) or to the oxygen atom in the side chain of serine or threonine (termed an O-linkage).

1.8.8 Reducing and non-reducing sugar


Sugars capable of reducing ferric or cupric ions are called reducing sugars. A reducing sugar is any sugar that either
has an aldehyde group or is capable of forming one in solution through isomerization. This functional group allows
the sugar to act as a reducing agent.
All monosaccharides, whether aldoses or ketoses, in their hemiacetal or hemiketal form are reducing sugars. All
disaccharides formed by head-to-tail condensation are also reducing sugars, i.e. disaccharides other than sucrose and
trehalose are reducing sugars. All reducing sugars undergo mutarotation in aqueous solution.

Disaccharides such as sucrose and trehalose, which are not capable of reducing ferric or cupric ions, are called
non-reducing sugars. In sucrose and trehalose, the anomeric carbons of both monosaccharides participate in glycosidic
bond formation, so they do not contain free anomeric carbon atoms. Sucrose and trehalose are therefore non-reducing
sugars: they have no reducing end and cannot be oxidized by cupric or ferric ions. In describing disaccharides or
polysaccharides, the end of a chain that has a free anomeric carbon (i.e. is not involved in a glycosidic bond) is called
the reducing end of the chain.

1.9 Lipids
Biological lipids are a chemically diverse group of organic compounds which are insoluble or only poorly soluble in
water. They are readily soluble in nonpolar solvents such as ether, chloroform, or benzene. The hydrophobic nature
of lipids is due to the predominance of hydrocarbon chains (—CH2—CH2—CH2—) in their structures. Unlike the
proteins, nucleic acids, and polysaccharides, lipids are not polymers.


Leukotrienes are hydroxy fatty acid derivatives of arachidonic acid and do not contain a ring structure. Leukotrienes
are distinguished by containing a conjugated triene double-bond arrangement. They are involved in chemotaxis,
inflammation, and allergic reactions.

Figure 1.77 Structure of leukotriene A.

Table 1.19 Biological effects of eicosanoids

Type              Major functions
Prostaglandins    Mediation of inflammatory response
                  Regulation of nerve transmission
                  Inhibition of gastric secretion
                  Sensitization to pain
                  Stimulation of smooth muscle contraction
Thromboxanes      Platelet aggregation
                  Aorta constriction
Prostacyclins     Thromboxane antagonists
Leukotrienes      Bronchoconstriction
                  Leukotaxis

1.9.7 Plasma lipoproteins


Triacylglycerols, phospholipids, cholesterol and cholesterol esters are transported in human plasma in association
with proteins as lipoproteins. These lipid-protein complexes function as a lipid transport system because
isolated lipids are insoluble in blood. Blood plasma contains a number of soluble lipoproteins, which are classified,
according to their densities, into four major types: chylomicrons, very
low density lipoproteins (VLDL), low density lipoproteins (LDL), and high density lipoproteins (HDL). A lipoprotein
contains a core of neutral lipids, which includes triacylglycerols and cholesterol esters. This core is coated with a
monolayer of phospholipids in which proteins (called apolipoproteins) and cholesterol are embedded.

Table 1.20 Some properties of major classes of human plasma lipoproteins

Lipoprotein    Density (g/mL)   Protein   Phospholipids   Free cholesterol   Cholesterol esters   Triacylglycerols   Apolipoproteins
Chylomicrons   <1.006           1.5–2.5   7–9             1–3                3–5                  85                 A-I, C-I, B-48
VLDL           0.95–1.006       5–10      15–20           5–10               10–15                50                 B-100, C-I, C-II
LDL            1.006–1.063      20–25     15–20           7–10               35–40                7–10               B-100
HDL            1.063–1.210      50–55     20–25           3–4                15                   3–4                A-I, A-II, C-I


1.10 Vitamins
Vitamins are organic compounds required by the body in trace amounts to perform specific cellular functions. They
can be classified according to their solubility and their functions in metabolism. The requirement for any given
vitamin depends on the organism. Not all vitamins are required by all organisms. Vitamins are not synthesized by
humans, and therefore must be supplied by the diet. Vitamins may be water soluble or fat soluble. Nine vitamins
(thiamine, riboflavin, niacin, biotin, pantothenic acid, folic acid, cobalamin, pyridoxine, and ascorbic acid) are
classified as water soluble, whereas four vitamins (vitamins A, D, E and K) are termed fat soluble. Except for
vitamin C, the water soluble vitamins are all precursors of coenzymes.

1.10.1 Water-soluble vitamins


Thiamine (vitamin B1)

Thiamine pyrophosphate (TPP) is the biologically active form of the vitamin, formed by the transfer of a pyrophosphate
group from ATP to thiamine. Thiamine is composed of a substituted thiazole ring joined to a substituted pyrimidine
by a methylene bridge.

Thiamine + ATP → Thiamine pyrophosphate (TPP) + AMP   (TPP synthetase)

Figure 1.78 Structure of thiamine and thiamine pyrophosphate. [Figure labels: thiazolium ring with its reactive
carbon; aminopyrimidine ring.]

TPP serves as a coenzyme in the oxidative decarboxylation of α-keto acids, and in the formation or degradation of
α-ketols (hydroxy ketones) by transketolase.

Pyruvate (α-keto acid) → Acetaldehyde + CO2   (pyruvate decarboxylase)

Xylulose-5-phosphate + Ribose-5-phosphate ⇌ Glyceraldehyde-3-phosphate + Sedoheptulose-7-phosphate   (transketolase)

Beri-Beri is a severe thiamine-deficiency syndrome found in areas where polished rice is the major component of the
diet.

Riboflavin (vitamin B2)


Riboflavin is a constituent of flavin mononucleotide (FMN) and flavin adenine dinucleotide (FAD). FMN is synthesized
by the addition of a phosphate group to riboflavin, and FAD is formed by the transfer of an AMP moiety from ATP to FMN.
FMN and FAD are each capable of reversibly accepting two hydrogen atoms, forming FMNH2 or FADH2. The oxidized form
of the isoalloxazine structure absorbs light around 450 nm. The color is lost when the ring is reduced.


Riboflavin + ATP → FMN + ADP
FMN + ATP → FAD + PPi
FAD (oxidized) + 2H+ + 2e– ⇌ FADH2 (reduced)

Figure 1.79 Structure and biosynthesis of flavin mononucleotide (FMN) and flavin adenine dinucleotide (FAD).

Niacin

Niacin, or nicotinic acid, is a substituted pyridine derivative. The biologically active coenzyme forms are nicotinamide
adenine dinucleotide (NAD+) and its phosphorylated derivative, nicotinamide adenine dinucleotide phosphate
(NADP+). Nicotinamide is a derivative of nicotinic acid that contains an amide instead of a carboxyl group. NAD+
and NADP+ serve as coenzymes in oxidation-reduction reactions in which the coenzyme undergoes reduction of the
pyridine ring by accepting a hydride ion (H–). The reduced forms of NAD+ and NADP+ are NADH and NADPH,
respectively.

[Figure: NAD+ consists of a nicotinamide nucleotide and adenosine joined 5' to 5' through a pyrophosphate bridge.
The nicotinamide ring accepts H+ and 2 e– (a hydride ion) to give NADH (reduced).]

Figure 1.80 Structure of NAD+ (Oxidized).

Deficiency of niacin causes pellagra, a disease involving the skin and central nervous system. The symptoms of
pellagra progress through the three Ds: Dermatitis, Diarrhoea, Dementia, and, if untreated, death.

Biotin

Biotin is a coenzyme in carboxylation reactions, in which it serves as a mobile carboxyl group carrier. Biotin is
covalently bound to the enzyme by an amide linkage between the carboxyl group of its valerate side chain and
the ε-amino group of an enzyme Lys residue to form a biocytin (alternatively, biotinyllysine) residue.

Chapter 02
Bioenergetics and Metabolism

2.1 Bioenergetics
Bioenergetics is the quantitative study of the energy transductions that occur in living cells and of the nature and
functions of the chemical processes underlying these transductions.

Thermodynamic principles

The First law of thermodynamics states that energy is neither created nor destroyed, although it can be
transformed from one form to another, i.e. the total energy of a system and its surroundings remains constant.

Mathematically, it can be expressed as:


ΔU = Δq – Δw
where ΔU is the change in internal energy,
Δq is the heat absorbed by the system from the surroundings,
Δw is the work done by the system.

If Δq is positive, heat has been transferred to the system, giving an increase in internal energy. When Δq is
negative, heat has been transferred to the surroundings, giving a decrease in internal energy. When Δw is positive,
work has been done by the system, giving a decrease in internal energy. When Δw is negative, work has been done
by the surroundings, giving an increase in internal energy.
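
The sign convention above can be checked with a small numerical sketch. The function name and the example values (in joules) are illustrative assumptions, not data from the text.

```python
def internal_energy_change(q, w):
    """Return dU = q - w.

    q: heat absorbed by the system (negative if heat is lost to the surroundings)
    w: work done by the system (negative if work is done on the system)
    """
    return q - w


# Hypothetical case 1: the system absorbs 100 J of heat and does 40 J of work.
print(internal_energy_change(q=100.0, w=40.0))    # internal energy rises

# Hypothetical case 2: the system loses 50 J of heat and has 20 J of work done on it.
print(internal_energy_change(q=-50.0, w=-20.0))   # internal energy falls
```

In the first case heat input outweighs the work done by the system, so ΔU is positive; in the second the heat lost outweighs the work received, so ΔU is negative.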

The Second law of thermodynamics states that the total entropy of a system and its surroundings must increase if a
process is to occur spontaneously. Mathematically, it can be expressed as:

ΔS ≥ Δq/T

where ΔS is the change in entropy of the system, Δq is the heat absorbed by the system and T is the absolute
temperature.
Entropy is a measure of the energy that is unavailable for useful work. Because it is difficult to determine directly,
a more convenient thermodynamic function called free energy is defined.

Free energy
Free energy or Gibbs free energy indicates the portion of the total energy of a system that is available for useful
work (also known as chemical potential). The change in free energy is denoted as ΔG.
Under constant temperature and pressure, the relationship between the free energy change (ΔG) of a reacting system
and the change in entropy (ΔS) is expressed by the following equation:

ΔG = ΔH – TΔS

Where, ΔH is the change in enthalpy and T is absolute temperature. ΔH is the measure of change in heat content of
reactants and products. The change in the free energy, ΔG, can be used to predict the direction of a reaction at
constant temperature and pressure.


If ΔG is negative, the reaction proceeds spontaneously with the loss of free energy (exergonic).
If ΔG is positive, the reaction proceeds only when free energy can be gained (endergonic).
If ΔG is 0, the system is at equilibrium; both forward and reverse reactions occur at equal rates.
ΔG of the reaction A → B depends on the concentration of reactant and product. At constant temperature and
pressure, the following relation can be derived:

ΔG = ΔG° + RT ln ([B]/[A])

where ΔG° is the standard free energy change;
R is the gas constant;
T is the absolute temperature;
[A] and [B] are the actual concentrations of reactant and product.
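
As a rough illustration of how the concentration term can reverse the sign of ΔG, the relation above can be evaluated numerically. This is a minimal sketch: the function name, the ΔG° value of +5.0 kJ/mol and the concentrations are hypothetical, and R is taken as 8.314 × 10^-3 kJ mol^-1 K^-1.

```python
import math

R = 8.314e-3  # gas constant in kJ mol^-1 K^-1


def delta_g(delta_g0, conc_b, conc_a, temp_k=298.15):
    """Actual free energy change (kJ/mol) for A -> B.

    delta_g0: standard free energy change (kJ/mol)
    conc_b, conc_a: actual concentrations of product and reactant (same units)
    temp_k: absolute temperature in kelvin (25 degrees C by default)
    """
    return delta_g0 + R * temp_k * math.log(conc_b / conc_a)


# Hypothetical reaction with an unfavourable standard free energy change.
dg0 = 5.0

# With [B] = [A] the logarithmic term vanishes and dG equals dG0.
print(delta_g(dg0, conc_b=1.0, conc_a=1.0))

# If the product is kept 100-fold lower than the reactant, dG becomes negative.
print(delta_g(dg0, conc_b=0.01, conc_a=1.0))
```

The second call shows why a reaction with a positive ΔG° can still proceed in the cell when the product is continually removed.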

Standard free energy change


The actual change in free energy (ΔG) during a reaction is influenced by temperature, pressure and the initial
concentrations of reactants and products, and usually differs from the standard free energy change, ΔG°.
Each chemical reaction has a characteristic standard free energy change, which is constant for a given reaction. It
can be calculated from the equilibrium constant of the reaction under standard conditions, i.e., at a solute concentration
of 1.0 M, a temperature of 25°C and a pressure of 1.0 atm. The free energy change that corresponds to this
standard state is known as the standard free energy change, ΔG°.

Relationship between ΔG° and Keq

In a reaction A ⇌ B, a point of equilibrium is reached at which no further net chemical change takes place; that is,
B is being converted to A as fast as A is being converted to B. In this state, the ratio of [B] to
[A] is constant, regardless of the actual concentrations of the two compounds:

Keq = [B]eq / [A]eq

where Keq is the equilibrium constant, and [A]eq and [B]eq are the concentrations of A and B at equilibrium. The
concentration of reactants and products at equilibrium define the equilibrium constant, Keq. The equilibrium constant
Keq depends on the nature of reactants and products, the temperature and the pressure. Under standard physical
conditions (25°C and 1 atm pressure, for biological systems), the Keq is always the same for a given reaction,
whether or not a catalyst is present.

If the reaction A ⇌ B is allowed to go to equilibrium at constant temperature and pressure, then at equilibrium
the overall free energy change (ΔG) is zero. Therefore,

ΔG° = –RT ln ([B]eq/[A]eq)

So, ΔG° = –RT ln Keq

This equation allows some simple predictions:

Keq      ΔG°         Reaction
> 1.0    Negative    proceeds forward
1.0      Zero        is at equilibrium
< 1.0    Positive    proceeds in reverse

The ionic composition of an acid or base varies with pH, so the standard free energy change calculated
according to the biochemistry convention is valid only at pH 7. Hence, under the biochemistry convention, ΔG° is
symbolized by ΔG°′ and, likewise, the biochemical equilibrium constant is represented by K′eq.

So, ΔG°′ = –RT ln K′eq
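
The relation between the standard free energy change and the equilibrium constant can be sketched in code in both directions. The function names are illustrative, and the temperature is assumed to be 25°C (298.15 K) with R = 8.314 × 10^-3 kJ mol^-1 K^-1.

```python
import math

R = 8.314e-3   # gas constant in kJ mol^-1 K^-1
T = 298.15     # assumed temperature in kelvin (25 degrees C)


def standard_delta_g(keq, temp_k=T):
    """Standard free energy change (kJ/mol) from an equilibrium constant."""
    return -R * temp_k * math.log(keq)


def equilibrium_constant(delta_g0, temp_k=T):
    """Equilibrium constant from a standard free energy change (kJ/mol)."""
    return math.exp(-delta_g0 / (R * temp_k))


# Keq > 1 gives a negative value, Keq = 1 gives zero, Keq < 1 gives a positive value.
for keq in (10.0, 1.0, 0.1):
    print(keq, round(standard_delta_g(keq), 2))
```

Because the dependence is logarithmic, each tenfold change in the equilibrium constant shifts the standard free energy change by the same fixed amount at a given temperature.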


2.3 Respiration
Living cells require an input of free energy. Energy is required for the maintenance of highly organized structures,
synthesis of cellular components, movement, generation of electrical currents and for many other processes. Cells
acquire free energy from the oxidation of organic compounds that are rich in potential energy.
Respiration is an oxidative process, in which free energy released from organic compounds is used in the formation
of ATP. The compounds that are oxidized during the process of respiration are known as respiratory substrates,
which may be carbohydrates, fats, proteins or organic acids. Carbohydrates are most commonly used as respiratory
substrates.
During oxidation within a cell, the energy contained in respiratory substrates is not released all at once in a single step.
Free energy is released in multiple steps in a controlled manner and used to synthesise ATP, which is broken down
whenever (and wherever) energy is needed. Hence, ATP acts as the energy currency of the cell.
During cellular respiration, respiratory substrates such as glucose may undergo complete or incomplete oxidation.
The complete oxidation of substrates occurs in the presence of oxygen, which releases CO2, water and a large
amount of energy present in the substrate. A complete oxidation of respiratory substrates in the presence of
oxygen is termed as aerobic respiration.
Although carbohydrates, fats and proteins can all be oxidized as fuel, the processes are described here by
taking glucose as the respiratory substrate. Oxidation of glucose is an exergonic process; an exergonic reaction
proceeds with a net release of free energy. When one mole of glucose (180 g) is completely oxidized to CO2 and
water, approximately 2870 kJ (686 kcal) of energy is liberated. Part of this energy is used for the synthesis of ATP. For
each molecule of glucose degraded to carbon dioxide and water by respiration, the cell makes up to about 30 or 32
ATP molecules, each storing about 7.3 kcal/mol of free energy.

C6H12O6 + 6O2 → 6CO2 + 6H2O + Energy (ATP + Heat)
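
The fraction of the energy of glucose that is captured in ATP can be estimated from the figures quoted above (686 kcal per mole of glucose, 7.3 kcal per mole of ATP, 30 to 32 ATP per glucose). The sketch below is only a back-of-the-envelope calculation under standard-state values; the function and constant names are illustrative.

```python
ENERGY_PER_GLUCOSE_KCAL = 686.0   # complete oxidation of one mole of glucose
ENERGY_PER_ATP_KCAL = 7.3         # free energy stored per mole of ATP


def atp_capture_efficiency(atp_count):
    """Fraction of the energy of glucose conserved in ATP."""
    return atp_count * ENERGY_PER_ATP_KCAL / ENERGY_PER_GLUCOSE_KCAL


for n in (30, 32):
    print(f"{n} ATP per glucose: {atp_capture_efficiency(n):.1%} of the energy conserved")
```

On these numbers roughly one third of the energy is conserved as ATP, and the remainder is released as heat.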

The incomplete oxidation of respiratory substrates occurs under anaerobic conditions, i.e. in the absence of oxygen.
As the substrate is never totally oxidized, the energy generated through this type of respiration is less than that
generated during aerobic respiration.

2.3.1 Aerobic respiration


Enzyme-catalyzed reactions during aerobic respiration can be grouped into three major processes: glycolysis, the citric
acid cycle and oxidative phosphorylation. Glycolysis takes place in the cytosol of cells in all living organisms. The
citric acid cycle takes place within the mitochondrial matrix of eukaryotic cells and in the cytosol of prokaryotic
cells. In eukaryotes, oxidative phosphorylation takes place in the inner mitochondrial membrane, whereas in
prokaryotes it takes place in the plasma membrane.

Table 2.3 Intracellular location of major processes of aerobic respiration

Process                      Eukaryotes                      Prokaryotes
Glycolysis                   Cytosol                         Cytosol
Citric acid cycle            Mitochondrial matrix            Cytosol
Oxidative phosphorylation    Inner mitochondrial membrane    Plasma membrane


2.3.2 Glycolysis
Glycolysis (from the Greek glykys, meaning sweet, and lysis, meaning splitting), also known as the Embden-Meyerhof
pathway, is an oxidative process in which one mole of glucose is partially oxidized to two moles of pyruvate
in a series of enzyme-catalyzed reactions. Glycolysis occurs in the cytosol of all cells. It occurs under both
aerobic and anaerobic conditions and does not involve molecular oxygen.

[Figure: Preparatory phase (energy investment phase) of glycolysis]
1. Glucose (G) → Glucose-6-phosphate (G6P); hexokinase, Mg2+; ATP → ADP; ΔG° = –16.7 kJ/mol
2. Glucose-6-phosphate ⇌ Fructose-6-phosphate (F6P); phosphoglucoisomerase; ΔG° = +1.7 kJ/mol
3. Fructose-6-phosphate → Fructose-1,6-bisphosphate (FBP); phosphofructokinase, Mg2+; ATP → ADP; ΔG° = –14.2 kJ/mol
4. Fructose-1,6-bisphosphate ⇌ Glyceraldehyde-3-phosphate (G3P) + Dihydroxyacetone phosphate; aldolase, Zn2+; ΔG° = +23.9 kJ/mol
5. Dihydroxyacetone phosphate ⇌ Glyceraldehyde-3-phosphate; triose phosphate isomerase; ΔG° = +7.6 kJ/mol

Step 1 : (Phosphorylation) Glucose is phosphorylated by ATP to form glucose 6-phosphate. The negative
charge of the phosphate prevents the passage of glucose 6-phosphate through the plasma membrane, trapping
glucose inside the cell. This irreversible reaction is catalyzed by hexokinase. Hexokinase is present in all cells of all
organisms. Hexokinase requires divalent metal ions such as Mg2+ or Mn2+ for activity. Hepatocytes and β-cells of
the pancreas also contain a form of hexokinase called glucokinase (hexokinase D). Hexokinase and glucokinase
are isozymes. Glucokinase has a higher Km and Vmax than hexokinase.

Step 2 : (Isomerization) A readily reversible rearrangement of the chemical structure (isomerization) moves the
carbonyl oxygen from carbon 1 to carbon 2, forming a ketose from an aldose sugar. Thus, the isomerization of
glucose 6-phosphate to fructose 6-phosphate is a conversion of an aldose into a ketose.


Experimental proof of chemiosmotic hypothesis


Experimental proof of chemiosmotic hypothesis was provided by Andre Jagendorf and Ernest Uribe in 1966. In an
elegant experiment, isolated chloroplast thylakoid vesicles containing F0F1 particles were equilibrated in the dark
with a buffered solution at pH 4.0. When the pH in the thylakoid lumen became 4.0, the vesicles were rapidly mixed
with a solution at pH 8.0 containing ADP and Pi. A burst of ATP synthesis accompanied the transmembrane movement
of protons driven by the electrochemical proton gradient. In similar experiments using inside-out preparations of
submitochondrial vesicles, an artificially generated membrane electric potential also resulted in ATP synthesis.
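[Figure: thylakoid vesicles bearing CF0CF1 ATPase, initially at pH 7, are equilibrated at pH 4 until the lumen reaches
pH 4, then transferred to a medium at pH 8 containing ADP + Pi; H+ flows out across the thylakoid membrane and
ATP is formed.]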
Figure 2.15 Synthesis of ATP by F0F1 depends on a pH gradient across the membrane.

2.3.10 ATP synthase


The use of proton motive force for ATP synthesis is catalyzed by ATP synthase. The multiprotein ATP synthase, also
called the F0F1 complex or complex V, catalyzes ATP synthesis as protons flow back through the inner membrane down the
electrochemical proton gradient. It consists of two components: the F0 component and F1 ATPase. The F0 component
is embedded in the inner mitochondrial membrane. F0 contains one ‘a’ subunit, two ‘b’ subunits and 9–12 ‘c’
subunits. The c subunit consists of two α helices that span the membrane. An aspartic acid residue in the second
helix lies at the center of the membrane. F0 is a transmembrane complex that forms a regulated H+ channel. The
antibiotic oligomycin completely blocks ATP synthesis by blocking the flow of protons through F0 of ATP synthase
(the subscript ‘o’ denotes its inhibition by oligomycin). F1 ATPase (made up of 3α, 3β, γ, δ and ε subunits) is tightly
bound to F0 and protrudes into the matrix; its three β-subunits are the sites of ATP synthesis. At the
center of F1 ATPase is the γ-subunit, which extends through F1 and interacts with F0. The γε and c9–12 ring
complex is the rotor (moving unit), and the a, b2 and α3β3δ complex is the stator (stationary unit). Rotational motion
is imparted to the rotor by the passage of protons.

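[Figure: ATP synthase. F0 (the a subunit, the b subunits and the ring of c subunits) lies in the inner mitochondrial
membrane; F1 ATPase (α, β, γ, δ and ε subunits) projects into the matrix. H+ enters from the intermembrane space.]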

Figure 2.16 The enzyme complex consists of an F0 component and F1 ATPase. Protons passing through the
ring of c subunits cause it and the attached γ-subunit to rotate. The γ-subunit fits inside the F1 ATPase, a hexamer
of three α- and three β-subunits, which is fixed to the membrane and does not rotate.


ATP synthase synthesizes ATP by harnessing the proton motive force. ATP synthase can also function in reverse to
hydrolyze ATP and pump H+ across the inner mitochondrial membrane. It thus acts as a reversible coupling device,
interconverting the energy of the electrochemical proton gradient and chemical bond energy.
F1 ATPase was first extracted from the mitochondrial inner membrane and purified by Efraim Racker and his
colleagues. Isolated F1 cannot synthesize ATP from ADP and Pi, but it can catalyze the hydrolysis of ATP. Thus the
enzyme was originally called F1 ATPase. The complete F0F1 complex, like isolated F1, can hydrolyze ATP to ADP and
Pi, but its biological function is to catalyze the condensation of ADP and Pi to form ATP. The F0F1 complex is,
therefore, more appropriately called ATP synthase.

[Figure: submitochondrial vesicles of inner mitochondrial membrane stripped of F1 (F0 only) show electron transport,
but no ATP synthesis; the detached F1 ATPase shows ATPase activity, but no electron transport and ATP synthesis.]

Figure 2.17 F1 particles are required for ATP synthesis, but not for electron transport. Submitochondrial
vesicles from which F1 is removed by mechanical agitation cannot catalyze ATP synthesis. Because F1 separated
from membranes is capable of catalyzing ATP hydrolysis, it has been called the F1 ATPase.

ATP synthesis

The binding change mechanism is a widely accepted model of ATP synthesis. Paul Boyer developed the binding
change, or flip-flop mechanism, which postulated that ATP synthesis is coupled with a conformational change in the
ATP synthase generated by rotation of the gamma subunit. Proton translocation through F0 powers the rotation of
the γ-subunit of F1 ATPase, leading to changes in the conformation of the nucleotide-binding sites in the F1
β-subunits (as described below). By means of this binding change mechanism, the F0F1 complex harnesses the
proton-motive force to power ATP synthesis.

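[Figure: stage 1 and stage 2 of the binding-change cycle. The three β-subunits (β1, β2, β3) occupy the O, L and T
states around the central γ-subunit; a 120° rotation of γ (counter-clockwise as viewed from the top) interconverts
the states, ADP + Pi bind and ATP is released.]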

Figure 2.18 The binding-change mechanism of ATP synthesis from ADP and Pi by the F0F1 complex. The
molecule contains three binding sites, which interconvert between three conformational states as the γ-subunit
rotates. The diagram shows one stage of the active cycle. The three αβ-dimers are in three different states.
In stage 1, the open state O is empty, the loose state L contains ADP + Pi, and the tight state T contains ATP. In the
logical intermediate stage (bracketed), rotation of the γ-subunit within the (αβ)3 hexamer converts the L state to a T
state, the T state to an O state, and the O state to an L state. The L state can accept a new charge of substrate, and
the T state can form ATP. At stage 2, the ATP has been released from the O state, new ADP + Pi have bound to the
L state, and ATP has been synthesized in the T state.
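
The cycling of the three catalytic sites can be mimicked with a toy model. The sketch below is only a bookkeeping illustration of the state changes described in the caption (O to L, L to T, T to O at each 120° step); it assumes that one ATP is released each time a site passes from T to O, and all names are hypothetical.

```python
# State reached by a site after one 120-degree rotation of the gamma subunit.
NEXT_STATE = {"O": "L", "L": "T", "T": "O"}


def rotate(sites):
    """Advance every catalytic site by one 120-degree step."""
    return tuple(NEXT_STATE[state] for state in sites)


def simulate(steps, sites=("O", "L", "T")):
    """Return the final site states and the number of ATP released."""
    atp_released = 0
    for _ in range(steps):
        sites = rotate(sites)
        # A site that has just become open (it was T before the step) releases its ATP.
        atp_released += sites.count("O")
    return sites, atp_released


print(rotate(("O", "L", "T")))   # states after a single 120-degree step
print(simulate(3))               # one full 360-degree turn
```

In this model a full turn of the γ-subunit returns every site to its starting state and releases three ATP, one from each site.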


A major function of GSH in the erythrocyte is to eliminate H2O2 and organic hydroperoxides. H2O2, a toxic product
of various oxidative processes, reacts with double bonds in the fatty acid residues of the erythrocyte cell membrane
to form organic hydroperoxides. These, in turn, result in premature cell lysis. Peroxides are eliminated through the
action of glutathione peroxidase, yielding glutathione disulfide (GSSG). So, G6PD deficiency results in hemolytic
anemia caused by the inability to detoxify oxidizing agents.

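[Figure: NADPH from the pentose phosphate pathway is used by glutathione reductase to reduce G–S–S–G to 2 G–SH,
with NADPH oxidized to NADP+; glutathione peroxidase uses 2 G–SH to reduce H2O2 to 2 H2O, regenerating G–S–S–G.]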

Figure 2.30 Role of the pentose phosphate pathway in the reduction of oxidized glutathione.

2.6 Entner-Doudoroff pathway


Entner-Doudoroff pathway is an alternative pathway that catabolizes glucose to pyruvate using a set of enzymes
different from those used in either glycolysis or the pentose phosphate pathway. This pathway, first reported by
Michael Doudoroff and Nathan Entner, occurs only in prokaryotes, mostly in gram-negative bacteria such as
Pseudomonas aeruginosa, Azotobacter, Rhizobium.
In this pathway, glucose-6-phosphate is oxidized to 6-phosphogluconate, which is dehydrated to
2-keto-3-deoxy-6-phosphogluconate (KDPG). KDPG is cleaved by KDPG aldolase to pyruvate and
glyceraldehyde-3-phosphate. The latter is oxidized to pyruvate by the glycolytic pathway, in which two ATPs are
produced by substrate-level phosphorylation. Because one ATP is consumed in phosphorylating glucose, the
process has a net yield of one ATP, one NADH and one NADPH for every glucose molecule.

[Figure: Glucose → (ATP → ADP) Glucose-6-phosphate → (NADP+ → NADPH) 6-Phosphogluconate → (loss of H2O)
2-Keto-3-deoxy-6-phosphogluconate → Pyruvate + Glyceraldehyde-3-phosphate;
Glyceraldehyde-3-phosphate → (NAD+ → NADH, 2 ADP → 2 ATP) Pyruvate.]

Figure 2.31 Entner-Doudoroff pathway.

2.7 Photosynthesis
Photosynthesis is a physicochemical process by which photosynthetic organisms convert light energy into chemical
energy in the form of reducing power (as NADPH) and ATP, and use these chemicals to drive carbon dioxide
fixation.


[Figure: sunlight drives the light reaction, which produces ATP + NADPH; the Calvin cycle uses them to convert CO2
to triose phosphate and then glucose.]

Figure 2.32 Photosynthesis is a two stage process. The first process is a light dependent one (light reactions)
that requires the direct energy of light to make energy carrier molecules that are used in the second process.
The Calvin cycle (light independent process) occurs when the products of the light reaction are used in the
formation of carbohydrate.

On the basis of generation of oxygen during photosynthesis, the photosynthetic organisms may be oxygenic or
anoxygenic. Oxygenic photosynthetic organisms include both eukaryotes as well as prokaryotes whereas anoxygenic
photosynthetic organisms include only prokaryotes.

Oxygenic photosynthetic organisms


Eukaryotes – Plants and Photosynthetic protists
Prokaryotes – Cyanobacteria

Anoxygenic photosynthetic organisms


Prokaryotes – Green and purple photosynthetic bacteria

In oxygenic photosynthetic organisms, photosynthetic oxygen generation occurs via the light-dependent oxidation
of water to molecular oxygen. This can be written as the following simplified chemical reaction:

nCO2 + 2nH2O → (CH2O)n + nH2O + nO2

2.7.1 Photosynthetic pigment


The solar energy required for photosynthesis is captured by photosynthetic pigment molecules. Different types of
pigments, collectively described as photosynthetic pigments, participate in this process. The major photosynthetic
pigment is chlorophyll.

Chlorophylls

Chlorophyll, a light-absorbing green pigment, contains a polycyclic, planar tetrapyrrole ring structure. Chlorophyll
is a lipid soluble pigment. It has the following important features:
1. The central metal ion in chlorophyll is Mg2+.
2. Chlorophyll has a cyclopentanone ring (ring V) fused to pyrrole ring III.
3. The propionyl group on ring IV of chlorophyll is esterified to a long-chain tetraisoprenoid alcohol. In chlorophyll
a and b it is phytol.


Glycogen storage diseases

Glycogen storage diseases are caused by a genetic deficiency of one or another of the enzymes of glycogen
metabolism. Several such diseases, each resulting from an inherited deficiency of a particular enzyme, have been
characterized. They are listed in Table 2.17.

Table 2.17 Glycogen storage diseases

Name                    Enzyme deficiency
Von Gierke’s disease    Liver glucose-6-phosphatase
Pompe’s disease         Lysosomal α1 → 4 and α1 → 6 glucosidase (acid maltase)
Hers’ disease           Liver phosphorylase
Tarui’s disease         Muscle and erythrocyte phosphofructokinase 1
McArdle’s disease       Muscle glycogen phosphorylase
Andersen’s disease      Amylo (1,4 → 1,6) transglycosylase (Branching enzyme)

2.10 Lipid metabolism


2.10.1 Synthesis and storage of triacylglycerols
All animals and plants have the ability to synthesize triacylglycerol (TAG). In animals, many cell types and organs
have the ability to synthesize triacylglycerols, but the liver and intestines are most active. Within all cell types, even
those of the brain, triacylglycerols are stored as cytoplasmic lipid droplets (also termed fat globules, oil bodies, lipid
particles, adiposomes, etc.) enclosed by a monolayer of phospholipids and hydrophobic proteins, such as the
perilipins in adipose tissue or oleosins in seeds. Two main biosynthetic pathways are known, the sn-glycerol-3-
phosphate pathway, which predominates in liver and adipose tissue, and a monoacylglycerol pathway in the intestines.
The most important route to triacylglycerol biosynthesis is the sn-glycerol-3-phosphate or Kennedy pathway.

[Figure: Glycerol-3-phosphate → (1, fatty acyl-CoA) Lysophosphatidic acid → (2, fatty acyl-CoA) Phosphatidic acid
→ (3, Pi released) Diacylglycerol → (4, fatty acyl-CoA) Triacylglycerol.
Enzymes: 1 Glycerol-3-phosphate acyltransferase; 2 Acylglycerophosphate acyltransferase;
3 Phosphatidic acid phosphohydrolase; 4 Diacylglycerol acyltransferase.]

Figure 2.72 Triacylglycerol biosynthetic pathway.

Chapter 03

Cell Structure and Functions

3.1 What is a Cell?


The basic structural and functional unit of cellular organisms is the cell. It is an aqueous compartment bound by cell
membrane, which is capable of independent existence and performing the essential functions of life. All organisms,
more complex than viruses, consist of cells. Viruses are noncellular organisms because they lack cell or cell-like
structure. In the year 1665, Robert Hooke first discovered cells in a piece of cork and also coined the word cell. The
word cell is derived from the Latin word cellula, which means small compartment. Hooke published his findings in
his famous work, Micrographia. Actually, Hooke only observed cell walls because cork cells are dead and without
cytoplasmic contents. Anton van Leeuwenhoek was the first person who observed living cells under a microscope
and named them animalcules, meaning little animals.
On the basis of the internal architecture, all cells can be subdivided into two major classes, prokaryotic cells and
eukaryotic cells. Cells that have unit membrane bound nuclei are called eukaryotic, whereas cells that lack a
membrane bound nucleus are prokaryotic. Eukaryotic cells have a much more complex intracellular organization
with internal membranes as compared to prokaryotic cells. Besides the nucleus, the eukaryotic cells have other
membrane bound organelles (little organs) like the endoplasmic reticulum, Golgi complex, lysosomes, mitochondria,
microbodies and vacuoles. The region of the cell lying between the plasma membrane and the nucleus is the
cytoplasm, comprising the cytosol (or cytoplasmic matrix) and the organelles. The prokaryotic cells lack such unit
membrane bound organelles.

Cell theory
In 1839, Schleiden, a German botanist, and Schwann, a German zoologist, developed the cell theory
or cell doctrine. According to this theory, all living things are made up of cells, and the cell is the basic structural and
functional unit of life. In 1855, Rudolf Virchow proposed an important extension of the cell theory: all living cells
arise from pre-existing cells (omnis cellula e cellula). The cell theory holds true for all cellular organisms.
Non-cellular organisms such as viruses do not obey the cell theory. Over time, the theory has continued to evolve. The
modern cell theory includes the following components:
• All cellular organisms are made up of one or more cells.
• The cell is the structural and functional unit of life.
• All cells arise from pre-existing cells by division.
• Energy flow occurs within cells.
• Cells contain hereditary information (DNA) which is passed from cell to cell.
• All cells have basically the same chemical composition.

Evolution of the cell


The earliest cells probably arose about 3.5 billion years ago in the rich mixture of organic compounds, the primordial
soup, of prebiotic times; they were almost certainly chemoheterotrophs. Primitive heterotrophs gradually acquired
the capability to derive energy from certain compounds in their environment and to use that energy to synthesize
more and more of their own precursor molecules, thereby becoming less dependent on outside sources of these
molecules, that is, less extremely heterotrophic. A very significant evolutionary event was the development of the
photosynthetic ability to fix CO2 into more complex organic compounds. The original electron (hydrogen) donor for these
photosynthetic organisms was probably H2S, yielding elemental sulfur as the byproduct, but at some point, cells
developed the enzymatic capacity to use H2O as the electron donor in photosynthetic reactions, producing O2. The
cyanobacteria are the modern descendants of these early photosynthetic O2 producers.
One important landmark along this evolutionary road occurred when there was a transition from small cells with
relatively simple internal structures - the so-called prokaryotic cells, which include various types of bacteria - to a
flourishing of larger and radically more complex eukaryotic cells such as are found in higher animals and plants.
The fossil record shows that earliest eukaryotic cells evolved about 1.5 billion years ago. Details of the evolutionary
path from prokaryotes to eukaryotes cannot be deduced from the fossil record alone, but morphological and
biochemical comparison of modern organisms has suggested a reasonable sequence of events consistent with the
fossil evidence.
Three major changes must have occurred as prokaryotes gave rise to eukaryotes. First, as cells acquired more
DNA, mechanisms evolved to fold it compactly into discrete complexes with specific proteins and to divide it equally
between daughter cells at cell division. These DNA-protein complexes, called chromosomes, become especially
compact at the time of cell division. Second, as cells became larger, intracellular membrane-bound organelles developed.
Eukaryotic cells have a nucleus, which contains most of the cell’s DNA, enclosed by a double layer of membrane.
The DNA is thereby kept in a compartment separate from the rest of the contents of the cell, the cytoplasm, where
most of the cell’s metabolic reactions occur.
Finally, primitive eukaryotic cells, which were incapable of photosynthesis or of aerobic metabolism, pooled their
assets with those of aerobic bacteria or photosynthetic bacteria to form symbiotic associations that became
permanent. Some aerobic bacteria evolved into the mitochondria of modern eukaryotes, and some photosynthetic
cyanobacteria became the chloroplasts of modern plant cells.

3.2 Structure of eukaryotic cells


3.2.1 Plasma membrane
The plasma membrane is a dynamic, fluid structure that forms the external boundary of cells. It regulates the
molecular traffic across this boundary and exhibits selective permeability; that is, it allows some solutes to cross
it more easily than others. Different models were
proposed to explain the structure and composition of plasma membranes. In 1972, Jonathan Singer and Garth
Nicolson proposed the fluid-mosaic model, which is now the most widely accepted model. In this model, membranes
are viewed as quasi-fluid structures in which proteins are inserted into lipid bilayers. It describes both the mosaic
arrangement of proteins embedded throughout the lipid bilayer and the fluid movement of lipids and proteins alike.

[Figure: a phospholipid bilayer with integral proteins embedded in it and peripheral proteins associated with its
surfaces.]

Figure 3.1 Fluid mosaic model for membrane structure. The fatty acyl chains in the lipid bilayer form a
fluid, hydrophobic region. Integral proteins float in this lipid bilayer. Both proteins and lipids are free to move
laterally in the plane of the bilayer, but movement of either from one face of the bilayer to the other is restricted.


Thermodynamics of transport
The amount of energy needed for the transport of a solute against a concentration gradient can be calculated from
the initial concentration gradient. When there is transport of one mole of a solute (uncharged) from a region in
which its concentration is C1 to a place where its concentration is C2 and the standard free energy change (ΔG°) is
zero, then free energy change (ΔG) is given by

ΔG = RT ln (C2/C1)   ... (1)

According to this equation, if C2 is less than C1, ΔG is negative, and the process is thermodynamically favourable.
As more and more substance is transferred, C1 decreases and C2 increases, until C2 = C1. At this point ΔG = 0, and
the system is in equilibrium.
However, if the solute is an ion of charge Z, then the free energy change for transport across a cell membrane
involves two contributors: the normal concentration term, as given in equation (1), plus a second term describing
the energy change involved in moving a mole of ions across the potential difference. If we consider a process in
which ions are transported from outside to inside of a cell, then ΔG is given by:

ΔG = RT ln (Cin/Cout) + Z F Δψ

Here F is the Faraday constant (96.5 kJ mole–1 V–1) and Δψ is the trans-membrane electrical potential (in volts).
Eukaryotic cells typically have electrical potentials across their plasma membranes of about 0.05 to 0.1 V (with the
inside negative relative to the outside).
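
The two contributions to the free energy of ion transport can be compared with a short calculation. This is a sketch under stated assumptions: the concentrations (12 mM inside, 145 mM outside), the potential of –0.07 V, the temperature of 37°C and the function name are illustrative values chosen for the example, and Δψ is taken as the inside potential minus the outside potential.

```python
import math

R = 8.314e-3   # gas constant in kJ mol^-1 K^-1
F = 96.5       # Faraday constant in kJ mol^-1 V^-1


def transport_delta_g(c_in, c_out, charge=0, potential_v=0.0, temp_k=310.15):
    """Free energy change (kJ/mol) for moving a solute from outside to inside.

    c_in, c_out: concentrations inside and outside the cell (same units)
    charge: charge Z of the ion (0 for an uncharged solute)
    potential_v: membrane potential in volts (inside relative to outside)
    """
    concentration_term = R * temp_k * math.log(c_in / c_out)
    electrical_term = charge * F * potential_v
    return concentration_term + electrical_term


# Uncharged solute at equal concentrations: no free energy change.
print(transport_delta_g(1.0, 1.0))

# Hypothetical monovalent cation entering a cell whose interior is negative.
print(round(transport_delta_g(12.0, 145.0, charge=1, potential_v=-0.07), 2))
```

For this hypothetical cation both terms are negative, so inward movement is favourable; pumping the ion outward would require an input of at least the same amount of free energy.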

3.3 Membrane potential


All cells have an electrical potential difference, or membrane potential, across their plasma membrane. The electrical
potential across the plasma membrane is a function of the ion concentrations in the intracellular and extracellular
solutions and of the selective permeability of the membrane to those ions. Active transport of ions by ATP-driven ion
pumps generates and maintains ionic gradients. In addition to ion pumps, which transport ions against electrochemical
gradients, the plasma membrane also contains channel proteins that allow ions to move along their electrochemical
gradients. Movement of ions occurs passively through ion channels. Ion channels may be either leaky channels or gated
channels. Leaky channels, which are open all the time, permit unregulated leakage of a specific ion across the
membrane. Gated channels, in contrast, have gates that can be open or closed, permitting ion passage through the
channels when open and preventing ion passage through the channels when closed. Ion concentration gradients
across the plasma membrane and selective movements of ions along these gradients create a difference in electric
potential, or voltage, across the plasma membrane. This is called the membrane potential.

How do membrane potentials arise?


To help explain how an electric potential across the plasma membrane can arise, we first consider a set of simplified
experimental systems in which a membrane that is permeable only to K+ separates a 1 M KCl solution on the left
from a 1 M KCl solution on the right. Because the concentrations of K+ on the two sides of the membrane are equal, there
is no net flow of ions across the membrane and thus no electric potential is generated. If the concentrations of K+ ions
on the two sides of the membrane differ, as shown in the figure, then K+ ions tend to move down their concentration
gradient from the left side to the right, leaving an excess of negative Cl– ions compared with K+ ions on the left side
and generating an excess of positive K+ ions compared with Cl– ions on the right side. The resulting separation of
charge across the membrane constitutes an electric potential, or voltage, with the left side of the membrane having
excess negative charge with respect to the right. However, continued left-to-right movement of the K+ ions eventually
is inhibited by the mutual repulsion between the excess positive charges accumulated on the right side of the
membrane and by the attraction of K+ ions to the excess negative charges built up on the left side. The system soon
reaches an equilibrium point at which the two opposing factors that determine the movement of K+ ions—the
membrane electric potential and the ion concentration gradient—balance each other out. At equilibrium, no net
movement of K+ ions occurs across the membrane.
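
The balance point described above can be estimated from the free energy expression for ion transport given in the previous section: setting ΔG = 0 for K+ movement from left to right gives ψright – ψleft = (RT/ZF) ln (Cleft/Cright). The Python sketch below applies this relation. The ten-fold concentration difference and the temperature of 298 K are assumptions for illustration, since the concentrations used in the figure are not reproduced here.

```python
import math

R = 8.314    # gas constant, J mol^-1 K^-1
F = 96500.0  # Faraday constant, J mol^-1 V^-1 (96.5 kJ mol^-1 V^-1)


def equilibrium_potential(c_left, c_right, z=1, temp_k=298.0):
    """Potential of the right side relative to the left (volts) at which
    an ion of charge z shows no net movement across the membrane."""
    if c_left <= 0 or c_right <= 0:
        raise ValueError("concentrations must be positive")
    if z == 0:
        raise ValueError("an uncharged solute has no equilibrium potential")
    return (R * temp_k) / (z * F) * math.log(c_left / c_right)


# Equal K+ concentrations on both sides: no potential is generated.
v_equal = equilibrium_potential(1.0, 1.0)

# Assumed example: 1 M KCl on the left, 0.1 M KCl on the right.
v_gradient = equilibrium_potential(1.0, 0.1)

print(f"equal concentrations: {v_equal * 1000:.1f} mV")
print(f"ten-fold gradient:    {v_gradient * 1000:.1f} mV (right side relative to left)")
```

A positive result means the right side is positive relative to the left, which matches the description of the left side carrying the excess negative charge.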


Problem

When a neurotoxin is placed in the solution bathing an isolated neuron, it affects the action potential of the neuron
as shown in the figure below. What is the probable mechanism of action of this drug on this neuron?

[Figure: plot of membrane potential (mV; axis from –80 to +80) against time, with traces labelled +Drug and –Drug and a region labelled "Neuron depolarized".]

Solution

One possibility is that the drug maintains the voltage-gated Na+ channels in an open position, although the drug is
not capable of opening voltage-gated channels by itself. Another possibility is that it prevents opening of the
voltage-gated K+ channels that are responsible for the quick return to the resting potential.

3.4 Transport of macromolecules across plasma membrane


The plasma membrane is a dynamic structure that functions to segregate the chemically distinct intracellular milieu
(the cytoplasm) from the extracellular environment by regulating and coordinating the entry and exit of small and
large molecules. Essential small molecules, such as amino acids, sugars and ions, can traverse the plasma membrane
through the action of integral membrane protein pumps or channels. Macromolecules must be carried into the cell
in membrane bound vesicles derived by the invagination and pinching-off of pieces of the plasma membrane in a
process termed endocytosis.

3.4.1 Endocytosis
The term endocytosis was coined by Christian de Duve in the year 1963. Endocytosis is a process whereby
eukaryotic cells internalize material from their surrounding environment. Internalization is achieved by the formation
of membrane-bound vesicles at the cell surface that arise by progressive invagination of the plasma membrane,
followed by pinching off and release of free vesicles into the cytoplasm.

Classically, endocytosis has been divided into phagocytosis (cellular eating) and pinocytosis (cellular drinking).

Phagocytosis or cell eating (first reported by Metchnikoff) describes the internalization of large particles following
particle binding to specific plasma membrane receptors and by the formation of large endocytic vesicles (generally
>250 nm in diameter) called phagosomes. The phagosomes fuse with lysosomes to form phagolysosomes. In
protozoa, phagocytosis is a form of feeding: large food particles taken up into phagosomes end up in lysosomes. In
multicellular eukaryotes, a few specialized cells, the so-called professional phagocytes, perform phagocytosis for
non-nutritive purposes. In mammals, two classes of white blood cells act as professional phagocytes: macrophages and
neutrophils. Phagocytosis is an active, actin mediated and highly regulated process involving specific cell-surface
receptors and signalling cascades mediated by Rho-family GTPases.

Pinocytosis or cell drinking (also termed as fluid-phase endocytosis) involves the ingestion of fluid by the formation
of small endocytic vesicles (termed pinocytic vesicles) of about 100 nm in diameter. Virtually all eukaryotic cells
perform pinocytosis. Uptake of soluble material dissolved in extracellular fluid during pinocytosis occurs both
selectively as well as non-selectively. Selective and efficient uptake occurs when solutes are captured by specific
high-affinity receptors (receptor mediated endocytosis). In receptor-mediated endocytosis, a specific receptor on
the cell surface binds tightly to the extracellular macromolecule (the ligand) that it recognizes. The plasma membrane


3.6.1 Endomembrane system


The endomembrane system is composed of membrane bound structures that are suspended in the cytoplasm of a
eukaryotic cell. These membranes divide the cell into functional and structural compartments. The membrane
bound structures (organelles) of the endomembrane system include: the nuclear envelope, the endoplasmic reticulum,
the Golgi apparatus, lysosomes, vacuoles and transport vesicles. The system is defined more accurately as the set
of membranes that form a single functional and developmental unit, either being connected together directly, or
exchanging material through vesicular transport. The endomembrane system does not include the membranes of
mitochondria and chloroplasts. Many authors do not consider the peroxisome to be part of the endomembrane system.
However, growing evidence supports the view that the peroxisome is in fact a part of the endomembrane system that
originates from the endoplasmic reticulum.

3.6.2 Transport of proteins across the ER membrane


Proteins synthesized by membrane bound ribosomes include soluble and membrane proteins that reside in the ER
itself, resident proteins in the lumen of Golgi complex and lysosomes, integral proteins in the membrane of these
organelles and the plasma membrane and proteins that are secreted from the cell. Proteins destined to be secreted
move through the secretory pathway in the following order: rough ER → ER-to-Golgi transport vesicles → Golgi
cisternae → secretory or transport vesicles → cell surface.
Proteins synthesized by membrane-bound ribosomes are translocated across the ER membrane co-translationally. Some
proteins, however, are translocated into the ER after their synthesis has been completed (post-translational translocation).
Synthesis of secretory proteins begins on an unattached ribosome in the cytosol. Ribosomes engaged in their synthesis
are then targeted to the ER by a signal sequence (a short sequence of 15 to 35 amino acids
that contains a stretch of at least six non-polar amino acids) at the N-terminus of the growing polypeptide chain.

[Figure labels: nucleus, rough ER, trans-Golgi network, secretory vesicle, lysosomes, plasma membrane.]

Figure 3.36 Diagrammatic representation of secretory pathway. Newly synthesized proteins are inserted
into the lumen of the ER. Those proteins that are transported out of the ER may then pass through various
sub-compartments of the Golgi until they reach the trans-Golgi network, the exit side of the Golgi. In the
trans-Golgi network, proteins are segregated and sorted. Proteins destined for the plasma membrane or
those that are secreted in a constitutive manner are carried out to the cell surface in transport vesicles.
Some proteins enter late endosomes and are selectively transferred to lysosomes.


Lipid-linked membrane proteins

GPI-linked proteins are lipid-linked (or lipid-anchored) membrane proteins. These proteins are initially made like type I
transmembrane proteins, with a cleaved N-terminal signal sequence and an internal stop-transfer sequence. They
are synthesized and initially inserted into the ER membrane. After insertion into the ER membrane, they
are transferred to a glycosylphosphatidylinositol (GPI) anchor. The enzyme transamidase, present in the ER
membrane, cuts the protein free from its membrane-bound C-terminus and simultaneously attaches the new
C-terminus to an amino group on a GPI. GPI helps to direct these proteins to cell membranes.

[Figure labels: cytosol, ER lumen, preformed GPI anchor; polypeptide termini shown as COO– and NH3+.]

Figure 3.41 GPIs are added to polypeptides anchored in the membrane by a carboxy-terminal membrane
spanning region. The membrane-spanning region is cleaved, and the new carboxy terminus is joined to the
NH2 group after translocation is completed leaving the protein attached to the membrane by the GPI anchor.

3.6.3 Transport of proteins from ER to cis Golgi


Proteins entering into the lumen of the ER are of two types – resident proteins such as BiP and export proteins such
as secretory proteins and lysosomal proteins. Following the ER-specific folding, oligomerization and processing,
export proteins are exported from the ER to the cis Golgi network, the first compartment of the Golgi apparatus.
This transport occurs through the formation of transport vesicles followed by the targeting and fusion of these
vesicles. Most of the protein components in a transport vesicle are highly specific in order to maintain organelle
distinction. To be transported from one compartment to another, protein products must be packaged into transport
vesicles. Transport vesicles arise from specialized coated regions of membranes, which are surrounded by a coat
of proteins covering the cytosolic face so that these membranes eventually bud off as coated transport vesicles.
Prior to fusing with the target membrane, this protein coat is discarded to allow the membranes to fuse directly.
Three main types of coated vesicles are known, each with a different type of protein coat and formed by reversible
polymerization of a distinct set of protein subunits. In addition to coat proteins, various adaptor proteins and small
GTP-binding proteins are required for formation of coated vesicles. Each type of vesicle transports proteins from
particular parent organelles to particular destination organelles.
Clathrin coated : Clathrin forms multiple complexes based on its association with different adaptor proteins
(APs). Clathrin that is associated with AP1 and AP3 forms vesicles for transport from the
trans-Golgi network to the lysosome. Clathrin associated with AP2 forms vesicles from the
plasma membrane during endocytosis that transport to the early endosomes.
COPI coated : COPI (Coat protein I) forms vesicles for both intra-Golgi transport and retrograde transport
from the Golgi to the ER. ADP-ribosylation factor 1 (ARF1) is a small GTPase that regulates
COPI vesicle formation by recruiting coatomer (for coat protomer). As with all small GTPases,
activation of ARF1 is catalyzed by a guanine nucleotide exchange factor (GEF), while
its deactivation is catalyzed by a GTPase-activating protein (GAP). A lactone antibiotic,
Brefeldin A, prevents the formation of COPI-coated vesicles. It targets the activity of the GEF that
catalyzes the activation of ARF1.


3.9 Lysosome
Lysosomes are single membrane-bound organelles present in animal cells. They are heterogeneous structures that
vary greatly in size and shape. Lysosomes have an acidic internal pH (about 5) and are filled with hydrolytic enzymes.
They contain about 40 different types of hydrolytic enzymes (including proteases, nucleases, glycosidases, lipases,
phospholipases, phosphatases and sulfatases) which are responsible for the controlled intracellular digestion of
macromolecules. All are acid hydrolases because they require an acidic environment for optimal activity and the
lysosome provides this by maintaining a pH of about 5.0. A vacuolar H+ ATPase in the lysosomal membrane uses
the energy of ATP hydrolysis to pump H+ into the lysosome, thereby maintaining the internal acidic pH.

[Figure labels: lysosome (0.2–0.5 µm, pH ~5), cytosol (pH ~7.2), H+, ATP, ADP.]

Figure 3.47 The interior of lysosomes has a pH of about 5.0. To create the low pH, V-type H+ ATPases located in
the lysosomal membrane pump protons into the lysosome using energy supplied by ATP. All the lysosomal
enzymes work most efficiently at acidic pH and are collectively termed acid hydrolases.
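
As a rough illustration of what this pH difference implies, the Python sketch below converts a lysosomal pH of about 5 and a cytosolic pH of about 7.2 into a proton concentration ratio, and into the concentration term RT ln ([H+]lumen/[H+]cytosol) for moving protons into the lysosome. The temperature of 310 K and the function names are assumptions, and the electrical term is left out because the text gives no potential for the lysosomal membrane.

```python
import math

R = 8.314e-3  # gas constant, kJ mol^-1 K^-1


def proton_ratio(ph_lumen, ph_cytosol):
    """Ratio of H+ concentration in the lumen to that in the cytosol."""
    return 10 ** (ph_cytosol - ph_lumen)


def proton_concentration_term(ph_lumen, ph_cytosol, temp_k=310.0):
    """RT ln([H+]lumen / [H+]cytosol) in kJ/mol; the electrical term is ignored."""
    return R * temp_k * math.log(proton_ratio(ph_lumen, ph_cytosol))


ratio = proton_ratio(5.0, 7.2)
dg_conc = proton_concentration_term(5.0, 7.2)

print(f"[H+] lumen / [H+] cytosol: about {ratio:.0f}-fold")
print(f"concentration term:        {dg_conc:+.1f} kJ/mol")
```

The positive concentration term is consistent with the need for an ATP-driven pump to keep the lumen acidic.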

There are two types of lysosomes: primary lysosomes (do not contain materials for intracellular digestion) and
secondary lysosomes (contain materials that are undergoing digestion or that already have been digested).
Lysosomes are responsible for the digestion of both extracellular as well as intracellular materials. Lysosomal
digestion of materials can be classified into autophagy and heterophagy. The process by which substances are
taken into the cell from external environment and broken down by lysosome is called heterophagy. In contrast,
the degradation of cytoplasmic components within lysosomes is called autophagy.
In heterophagy, two different pathways bring extracellular materials to lysosomes for degradation.
Phagocytic cells, such as macrophages and neutrophils in vertebrates, engulf large particles by the process of
phagocytosis. During phagocytosis, a single-membrane phagosome is generated, and this compartment fuses
directly with a lysosome to form a phagolysosome.
Virtually all eukaryotic cells continually internalize fluid substances in small pinocytic (endocytic) vesicles by the
process of pinocytosis. Most endocytosed substances eventually end up in lysosomes, where they are degraded. In
this process, the endocytosed substances first move from the endocytic vesicles to the endosomes. At the end of
this pathway, the late endosomes convert to endolysosomes and lysosomes as a result of both their fusion with
preexisting lysosomes and progressive acidification.
Autophagy is an intracellular degradation process of cytoplasmic constituents within lysosomes. During autophagy,
sequestration begins with the formation of a phagophore. Phagophores form de novo in the cytoplasm from a
cup-shaped membrane that expands into a double-membrane bound autophagosome surrounding a portion of the
cytoplasm. The autophagosome may fuse with an endosome. The product of the endosome-autophagosome fusion is
called an amphisome. The completed autophagosome or amphisome fuses with a lysosome, which supplies acid hydrolases.
The enzymes in the resulting compartment, an autolysosome, break down the inner membrane from the
autophagosome and degrade the materials. The resulting macromolecules are released and recycled in the cytosol.


Figure 3.48 (a) Schematic overview of three pathways by which materials are moved to lysosomes: phagocytosis,
pinocytosis and autophagy. (b) Process of autophagy.
[Figure labels: (a) cytosol, plasma membrane, phagosome, early endosome, late endosome, lysosome, phagophore,
autophagosome; (b) phagophore engulfing cytoplasmic components, autophagosome, fusion with lysosome,
autolysosome, degradation of cytoplasmic components.]

Some lysosomes participate in exocytosis. This enables cells to eliminate undigested contents. For most cells, this
seems to be a minor pathway, used only when the cells are stressed. Some cell types, however, contain specialized
lysosomes that have acquired the necessary machinery for fusion with the plasma membrane. Melanocytes (melanin-
producing cells) in the skin, for example, produce and store melanin pigments in their lysosomes. These pigment
containing melanosomes release their pigment into the extracellular space of the epidermis by exocytosis.

Table 3.10 Examples of some acid hydrolases present in lysosomes


Enzyme                          Natural substrate

Phosphatases
  Acid phosphatase              Most phosphomonoesters
  Acid phosphodiesterase        Oligonucleotides and other phosphodiesters
Nucleases
  Acid ribonuclease             RNA
  Acid deoxyribonuclease        DNA
Polysaccharide/mucopolysaccharide hydrolyzing enzymes
  β-Galactosidase               Galactosides
  α-Glucosidase                 Glycogen
  α-Mannosidase                 Mannosides, glycoproteins
  β-Glucuronidase               Polysaccharides and mucopolysaccharides
  Hyaluronidase                 Hyaluronic acids; chondroitin sulfates
Proteases
  Cathepsin(s)                  Proteins
  Collagenase                   Collagen
  Peptidases                    Peptides
Lipid-degrading enzymes
  Esterase(s)                   Fatty acyl esters
  Phospholipase(s)              Phospholipids


3.10 Vacuoles
Most plant and fungal cells contain one or several very large, fluid-filled vesicles called vacuoles. They are
surrounded by a single membrane called the tonoplast and are related to the lysosomes of animal cells, containing a
variety of hydrolytic enzymes, but their functions are remarkably diverse. Like that of a lysosome, the lumen of a vacuole has an
acidic pH, which is maintained by similar transport proteins in the vacuolar membrane. The plant vacuole contains
water and dissolved inorganic ions, organic acids, sugars, enzymes and a variety of secondary metabolites. Solute
accumulation causes osmotic water uptake by the vacuole, which is required for plant cell enlargement. This water
uptake generates the turgor pressure.
The vacuole is different from the contractile vacuole. A contractile vacuole is an organelle involved in osmoregulation.
It pumps excess water out of the cell. It is found predominantly in protists (such as Paramecium, Amoeba) and in
unicellular algae (Chlamydomonas). It was previously known as pulsatile or pulsating vacuole.

3.11 Mitochondria
Mitochondria (term coined by C. Benda) are energy-converting organelles, which are present in virtually all eukaryotic
cells. They are the sites of aerobic respiration. They produce cellular energy in the form of ATP, hence they are
called ‘power houses’ of the cell. Mitochondria are membrane-bound organelles that are both mobile and plastic. Each
mitochondrion is a double membrane-bound structure with outer and inner membranes. The space between the
outer and inner membranes is called the intermembrane space. The outer membrane is fairly smooth, but the inner
membrane is highly convoluted, forming folds called cristae. The inner membrane is also highly impermeable to
many solutes owing to its very high content of a phospholipid called cardiolipin. The cristae greatly increase the inner
membrane’s surface area. The two faces of this membrane are referred to as the matrix side (N-side) and the
cytosolic side (P-side). Inner membrane contains enzyme complex called ATP synthase (or F0-F1 ATPase or oxysome)
that makes ATP. The outer membrane protects the organelle, and contains specialized transport proteins such as
porin which allows free passage for various molecules into the intermembrane space of the mitochondria. Mitochondrial
porins, or voltage-dependent anion-selective channels (VDAC) allow the passage of small molecules across the
mitochondrial outer membrane.

[Figure labels: outer membrane, intermembrane space, inner membrane, matrix, ATP synthase (F0-F1 ATPase).]

Figure 3.49 A mitochondrion has double-membraned organization and contains: the outer mitochondrial
membrane, the intermembrane space (the space between the outer and inner membranes), the inner
mitochondrial membrane, and the matrix (space within the inner membrane).

The matrix (large internal space) contains multiple copies of the dsDNA (as genetic material), mitochondrial ribosomes
(ranging from 55S-75S), tRNAs and various proteins. Mitochondrial dsDNA is mostly circular. The size of mitochondrial
DNA also varies greatly among different species.

Organism                        Size (kb)

Human                           16.6
Xenopus (frog)                  18.4
Drosophila (fruit fly)          18.4
Saccharomyces (yeast)           75.0
Arabidopsis (mustard plant)     367.0


3.12 Plastids
Plastids are double-membrane-bound, semi-autonomous organelles present in all living plant cells and photosynthetic
protists. All plastids contain multiple copies of dsDNA as genetic material and 70S ribosomes for protein
synthesis. Plastids differentiate from proplastids, which are inherited with the cytoplasm of plant egg cells. As
immature plant cells differentiate, the proplastids develop according to the needs of the specialized cell: they can
become chloroplasts, leucoplasts or chromoplasts. Plastid is the collective term for these different kinds of organelles,
all derived from proplastids. The chloroplast is the most important member of the plastid family. It occurs in all photosynthetic
eukaryotes and acts as site of photosynthesis. It has a double membrane which encloses a fluid-filled region called
the stroma. Embedded in the stroma is a complex network of stacked sacs. Each stack is called a granum and each
of the flattened sacs which makes up the granum is called a thylakoid. The thylakoid membrane, that encloses a
fluid-filled thylakoid interior space, contains photosynthetic pigments. There are many grana in each chloroplast
(usually 10 to 100 grana) which are interconnected by unstacked stromal lamellae. The lipids of the thylakoid
membrane have a distinctive composition. About 80% of the lipids are uncharged mono- and digalactosyl diacylglycerols
and only about 10% are phospholipids.

[Figure labels: outer membrane, inner membrane, stroma, granum, thylakoid membrane, thylakoid lumen, stroma lamella.]

Figure 3.52 The two envelope membranes enclose the stroma. The stacks of the thylakoid termed grana
are connected by tubes, forming a continuous thylakoid lumen.

In dark-grown plants, proplastids develop into etioplasts, which have a yellow chlorophyll precursor pigment
protochlorophyll instead of chlorophyll. When exposed to light, the etioplasts rapidly change into chloroplasts by
converting this precursor to chlorophyll.
Chromoplasts are plastids responsible for pigment synthesis and storage. They are rich in carotenoids and mainly
responsible for the yellow, orange, or red colors of many fruits and flowers, as well as of autumn leaves. Leucoplasts
are colorless (non-pigmented) plastids and act as storage organelles. Based on the kind of substance they store,
they are further classified into amyloplasts (for starch storage), elaioplasts (for fat storage) and proteinoplasts or
aleuroplasts (for storing and modifying proteins).

3.13 Peroxisome
Peroxisome (discovered by Christian de Duve in 1965) is a single membrane bound small organelle (approximately
0.5–1 μm in diameter) present in all eukaryotes. The term peroxisome was proposed by de Duve because the organelle
produces and consumes hydrogen peroxide. Peroxisomes lack DNA and ribosomes. Thus, all peroxisomal proteins
(peroxisomal matrix and membrane proteins) are encoded by nuclear genes, synthesized on ribosomes present in
the cytosol and then incorporated into pre-existing peroxisomes.
The ability of peroxisomes to divide themselves suggests that the peroxisome may have had an endosymbiotic
origin similar to mitochondria. However, the localization of peroxisomal proteins to the endoplasmic reticulum and
the similarity of some peroxisomal proteins to those localized in the ER suggest an alternative hypothesis: that the
peroxisome was developed from the ER (de novo origin). Aspects of both views may be true. Most peroxisomal
membrane proteins are made in the cytosol by membrane free ribosomes and insert into the membrane of preexisting
ones. However, a few others are first synthesized by membrane-bound ribosomes of the ER and then integrated into the
ER membrane from where they may bud in specialized peroxisomal precursor vesicles. New peroxisome precursor
vesicles may then fuse with one another and begin importing additional peroxisomal proteins synthesized by membrane
free cytosolic ribosomes to grow into mature peroxisomes, which can enter into a cycle of growth and fission.

Like mitochondria, peroxisomes contain several oxidative enzymes, such as catalase and oxidases. Peroxisomal oxidases
transfer hydrogen atoms to molecular oxygen and form hydrogen peroxide. The enzyme catalase (a member of
the peroxidase family) present in the peroxisome uses the hydrogen peroxide to oxidize a variety of other substrates
such as phenols, formic acid, formaldehyde and alcohol by the peroxidation reaction.

H2O2 + RH2 → R + 2H2O        (catalyzed by catalase)

When excess hydrogen peroxide accumulates in the cell, catalase converts it to H2O through the reaction:

H2O2 + H2O2 → 2H2O + O2        (catalyzed by catalase)

A major oxidative reaction carried out in peroxisomes is β-oxidation. In mammalian cells, β-oxidation occurs
both in mitochondria and peroxisomes; in plant cells, however, it is found exclusively in peroxisomes. Peroxisomes
also have two important roles in plants – photorespiration and glyoxylate cycle. In photorespiration, 2-phosphoglycolate
produced by oxygenase activity of rubisco is metabolized into serine, CO2 and NH3. This pathway involves three
subcellular compartments, the chloroplasts, peroxisomes and mitochondria.
Glyoxysome is a specialized form of peroxisome present in some plant cells, mainly the cells of germinating
seeds. Glyoxysomes contain the enzymes of the glyoxylate cycle – which help to convert stored lipid into
carbohydrates that can be translocated throughout the young plant to provide energy for growth. In the glyoxylate
cycle, two molecules of acetyl-CoA produced by fatty acid breakdown are used to make succinic acid, which then
leaves the glyoxysome and is converted into glucose in the cytosol. The glyoxylate cycle does not occur in animal
cells, and animals are therefore unable to convert the fatty acids in fats into carbohydrates.

Targeting of peroxisomal proteins synthesized by membrane-free ribosomes from cytosol to peroxisome
Transport of proteins from the cytosol to peroxisomes occurs post-translationally. Peroxisomal proteins synthesized on
cytosolic ribosomes generally fold into their mature conformation in the cytosol before import into the organelle.
Proteins that are involved in peroxisome biogenesis, including peroxisome generation, division as well as matrix and
membrane protein import are called peroxins. At least 23 distinct peroxins participate in the import process,
which is driven by ATP hydrolysis. Proteins that are imported into the peroxisome have peroxisomal targeting
sequences–PTS1 and PTS2. The PTS1 is a tri- or tetrapeptide at the C-terminus. The consensus sequence of PTS1 is
(S/A/C)–(K/R/H)–(L/M). It was first characterized in firefly luciferase as a Ser-Lys-Leu sequence (SKL in one-letter code) at
the very C-terminus. PTS1-containing proteins are recognized by the cytoplasmic receptor Pex5 and are imported
into peroxisomes in their fully folded form. The PTS2 signal is a sequence of nine amino acids that can be located
near the N-terminus or internally and is recognized by the soluble receptor Pex7. PTS2 exhibits the consensus
sequence (R/K)–(L/V/I)–X5–(H/Q)–(L/A). The importance of the import process in peroxisomes is dramatically
demonstrated by the inherited human disease, Zellweger syndrome. It is a rare, congenital disorder, characterized
by the reduction or absence of peroxisomes due to defect in importing proteins into peroxisomes.
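
The two consensus sequences given above translate directly into regular expressions. The Python sketch below is a minimal illustration and not a validated prediction tool. The function names and the test sequences are hypothetical, only the tripeptide form of PTS1 is checked, and the position of PTS2 within the protein is not restricted.

```python
import re

# PTS1: (S/A/C)-(K/R/H)-(L/M) at the extreme C-terminus.
PTS1_PATTERN = re.compile(r"[SAC][KRH][LM]$")
# PTS2: (R/K)-(L/V/I)-X5-(H/Q)-(L/A), where X is any residue.
PTS2_PATTERN = re.compile(r"[RK][LVI][A-Z]{5}[HQ][LA]")


def has_pts1(sequence):
    """True if the one-letter sequence ends with the PTS1 tripeptide consensus."""
    return PTS1_PATTERN.search(sequence.upper()) is not None


def find_pts2(sequence):
    """Return the 0-based start of the first PTS2 consensus match, or None."""
    match = PTS2_PATTERN.search(sequence.upper())
    return match.start() if match else None


# Hypothetical sequences, invented for illustration only.
examples = {
    "ends_in_SKL": "MADEQTTGSKL",
    "internal_PTS2": "MRLQVVLGHLAAS",
    "no_signal": "MGTEEDWPAF",
}

for name, seq in examples.items():
    print(name, has_pts1(seq), find_pts2(seq))
```

A consensus match only suggests a candidate signal; whether a given protein is actually imported depends on recognition by Pex5 or Pex7.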

3.14 Nucleus
The nucleus is the controlling center of the eukaryotic cell. It contains most of the genetic material of the cell. Most
eukaryotic cells have one nucleus (uninucleate), but some have many nuclei (multinucleate) and certain cells,
such as mature red blood cells, have none. Paramecia (unicellular ciliate protozoa) have two nuclei: a macronucleus
and a micronucleus. Genes in the macronucleus control the everyday functions of the cell, such as feeding, waste
removal, and maintenance of water balance. The micronucleus controls sexual reproduction.
Nuclei differ in size depending on the cell type. Most nuclei are spherical, but multilobed nuclei are also common,
such as those found in polymorphonuclear leukocytes or mammalian epididymal cells. A nucleus in G0 phase has
four components: nuclear envelope, nucleolus, nuclear matrix and chromatin.


Intermediate filament proteins are classified into six major types based on their sequences and tissue distribution:

Type   Protein                           Site of expression

I      Acidic keratins                   Epithelial cells
II     Neutral or basic keratins         Epithelial cells
III    Vimentin                          Most widely distributed of all intermediate filament proteins; typically
                                         expressed in leukocytes, blood vessel endothelial cells, some epithelial
                                         cells, and mesenchymal cells such as fibroblasts.
       Desmin                            Muscle cells
       Glial fibrillary acidic protein   Glial cells
IV     Neurofilament proteins            Neurons. In mammals, three different neurofilament proteins have been
                                         recognized: NF-L, NF-M and NF-H, for low, middle and high molecular
                                         weight, respectively.
V      Nuclear lamins                    Most ubiquitous group of intermediate filaments, found exclusively in
                                         the nucleus. Lamins form a network structure, termed the nuclear lamina,
                                         that lines the inside surface of the inner nuclear membrane.
VI     Nestin                            Stem cells of the central nervous system

3.16 Cell junctions


Many cells in tissues are linked to one another and to the extracellular matrix at specialized contact sites called cell
junctions. The cell junctions are critical to the development and functions of multicellular organisms. Cell junctions
can be classified into three functional groups: occluding junctions, anchoring junctions and communicating junctions.

1. Occluding junctions

Occluding junctions seal cells together in an epithelium in a way that prevents even small molecules from leaking
from one side of the sheet to the other (i.e. they form a permeability barrier across epithelial cell sheets). These
junctions are of two types: tight junctions and septate junctions.

Tight junctions (or zonula occludens) are cell-cell occluding junctions mediated by two major transmembrane
proteins, claudins and occludin. Claudins and occludin associate with intracellular peripheral membrane proteins
called ZO proteins. Tight junctions make the closest contact between adjacent cells and prevent the free passage
of molecules (including ions) across an epithelial sheet in the spaces between cells. They also maintain the polarity
of epithelial cells by preventing the diffusion of molecules between the apical and the basolateral regions of the
plasma membrane. Septate junctions are the main occluding junctions in invertebrates.

Figure 3.67 Tight junctions allow cell sheets to serve as barriers to solute diffusion. Schematic drawing showing
how a small extracellular molecule present on one side of an epithelial cell sheet cannot traverse the tight
junctions that seal adjacent cells together.
[Figure labels: lumen, tight junction, Cell 1 to Cell 4.]


2. Anchoring junctions

Anchoring junctions mechanically attach cells (and their cytoskeletons) to their neighbours or to the extracellular
matrix and perform the key task of holding cells together into tissues. They include two main types of junctions:
adherens junctions and desmosomes.

Adherens junctions
Adherens junctions connect bundles of actin filaments from cell to cell or from cell to the extracellular matrix.
Adhesion belt (or zonula adherens): It is a cell to cell junction, mediated by actin filaments and proteins belonging
to the cadherin family. Adhesion belts are usually located near the apical surface, just below the tight junctions.
Focal contact (or adhesion plaque): It is a cell-matrix junction mediated by transmembrane adhesion
proteins of the integrin family and by actin filaments.

Desmosomes
Desmosomes are buttonlike points of intercellular contact that bond neighbouring cells together. Each has a dense
cytoplasmic plaque composed of a mixture of intracellular attachment proteins, including plakoglobin and
desmoplakins. The cytoplasmic plaque is responsible for connecting the cytoskeleton to the transmembrane linker
proteins of the cadherin family of cell-cell adhesion molecules. Desmosomes contain two specialized cadherin
proteins, desmoglein and desmocollin. Through extracellular domains, cadherins are responsible for holding the
adjacent membranes together. Each plaque is associated with a thick network of keratin intermediate filaments (in
most epithelial cells) and desmin intermediate filaments (in heart muscle cells), which are attached to the surface of
the plaque.
Hemidesmosomes, or half-desmosomes, resemble desmosomes, but instead of joining adjacent epithelial cell
membranes, they connect the basal surface of epithelial cells to the underlying basal lamina, a specialized mat of
extracellular matrix at the interface between the epithelium and connective tissue. The transmembrane linker
proteins in hemidesmosomes belong to the integrin family of extracellular matrix receptors, rather than to the
cadherin family of cell-cell adhesion proteins used in desmosomes.

(Diagram labels. A. Desmosomes: cell, plasma membrane, cytoplasmic plaque, keratin intermediate filaments,
cadherin, intercellular space. B. Hemidesmosomes: cell, intermediate filaments, cytoplasmic plaque, plasma
membrane, integrin, extracellular matrix.)

Figure 3.68 A. Schematic drawing of a desmosome. On the cytoplasmic surface of each interacting plasma
membrane is a cytoplasmic plaque. Each plaque is associated with a network of keratin intermediate filaments.
Transmembrane adhesion proteins, which belong to the cadherin family of cell-cell adhesion molecules,
bind to the plaques and interact through their extracellular domains to hold the adjacent membranes together.
B. Schematic drawing of a hemidesmosome, which joins the basal surface of an epithelial cell to the underlying
basal lamina through transmembrane proteins of the integrin family.


(Diagram labels. (A) Laminin: α, β and γ chains with their N and C termini; three-stranded coiled-coil
α-helical region; globular region with domains LG1 to LG5; integrin binding domain; α-dystroglycan
(proteoglycan) binding domain. (B) Fibronectin: two chains with N and C termini, joined by S–S bonds near the
C termini; fibrin binding domains; heparin binding domains; collagen binding domains; cell binding domains
containing RGD; heparin and fibrin binding domains.)

Figure 3.73 (A) Structure of laminin – a heterotrimeric glycoprotein. (B) A fibronectin molecule consists of
two nearly identical polypeptide chains joined by two disulfide bonds near their carboxyl ends. Each polypeptide
chain is folded into a series of globular domains linked by short, flexible segments. The globular domains
have binding sites for extracellular matrix components or for specific receptors on the cell surface. The cell-
binding domain contains the tripeptide sequence RGD (arginine-glycine-aspartate), which is recognized by
fibronectin receptors.

Table 3.16 Comparison of extracellular matrix of animals and plants

                                  Animals                      Plants
Chemical nature                   Protein rich                 Carbohydrate rich
Structural fiber                  Collagens and elastins       Cellulose
Components of hydrated matrix     Proteoglycans                Pectins
Adhesive molecules                Fibronectins and laminins    Hemicelluloses

3.19 Plant cell wall


Many cells are surrounded by insoluble secreted macromolecules. Cells of bacteria, fungi, many protists and plants
are surrounded by rigid cell walls, which are an integral part of the cell. The cell walls of eukaryotes (including fungi,
plants) are composed principally of polysaccharides. The basic structural polysaccharide of fungal cell walls is chitin
(a polymer of N-acetylglucosamine residues). The cell wall of plant cells is composed principally of cellulose, which
is the single most abundant polymer on Earth.

Structural components of plant cell walls


Cellulose microfibrils
Cellulose is a linear polymer of glucose residues joined by β(1 → 4) linkages. Several
chains then associate in parallel with one another to form ~30 nm diameter cellulose microfibrils. The extensive
noncovalent bonding between adjacent chains (18 to 24) within a cellulose microfibril gives this structure a high
tensile strength. Cellulose is also insoluble, chemically stable and relatively resistant to chemical and enzymatic
attack. Cellulose microfibrils are synthesized by a plasma membrane-bound enzyme complex, cellulose synthase.
Cellulose synthases in plants are encoded by a gene family named cellulose synthase A (CESA). In expanding
cells, the newly synthesized cellulose microfibrils are deposited parallel to the cortical microtubules underlying the
plasma membrane.

Matrix polysaccharides
Cellulose microfibrils are embedded in a matrix consisting of proteins and polysaccharides. The major polysaccharides
of the matrix are synthesized by membrane-bound enzymes in the Golgi apparatus and are delivered to the cell
wall via exocytosis of tiny vesicles. Two major types of matrix polysaccharides are hemicelluloses and pectins.
Hemicelluloses (cross linking glycan) are a heterogeneous group of highly branched polysaccharides (such as
xyloglucan, arabinoxylan) that are hydrogen-bonded to the surface of cellulose microfibrils. Pectins are a heterogeneous
group of polysaccharides, characteristically containing acidic sugars such as galacturonic acid. The pectins are the
gel-forming components of the matrix. Pectin has roles in forming connections between plant cells, adjusting pH and ion
balance, recognizing foreign molecules to alert the cell to the presence of microbes and establishing cell wall
porosity.

Lignin
Lignin is a phenolic polymer. It is a highly branched polymer of three simple phenolic alcohols (coniferyl alcohol,
coumaryl alcohol and sinapyl alcohol) known as monolignols. Precursors of lignin are synthesized from phenylalanine
and are secreted to the wall. Lignin is insoluble in water and most organic solvents. As lignin forms in the wall, it
displaces water from the matrix and forms a hydrophobic network that bonds tightly to cellulose and prevents wall
enlargement. Lignin adds significant mechanical strength to cell walls and reduces the susceptibility of walls to
attack by pathogens.

Structural proteins
The cell wall also contains several classes of structural proteins. These proteins usually are classified according to
their predominant amino acid composition, for example, hydroxyproline-rich glycoprotein, glycine-rich protein,
arabinogalactan protein and proline-rich protein. With the exception of glycine-rich proteins, all are glycosylated
and contain hydroxyproline. Extensin, a major structural protein in the cell walls of higher plants, is a
hydroxyproline-rich glycoprotein. Cell walls also contain functional proteins such as expansin, which causes the
pH-dependent extension and stress relaxation of cell walls. The molecular basis for expansin action is still uncertain,
but most evidence indicates that expansins act by disrupting non-covalent interactions between wall polysaccharides.

Primary and secondary cell walls


Plant cell walls commonly are classified into two major types: primary cell walls and secondary cell walls. A plant cell
first secretes a relatively thin and flexible wall called the primary cell wall. In general, the primary cell wall is
composed of approximately 25% cellulose, 25% hemicelluloses and 35% pectins, with perhaps 1 to 8% structural
protein. However, large deviations from these values may be found.
When the cell matures and stops growing, it strengthens its wall. Some plant cells do this simply by secreting
hardening substances into the primary wall. Other cells add a secondary cell wall between the plasma membrane
and the primary wall. Secondary walls are more specialized in structure and composition as compared to the
primary cell wall. They are often quite thick and often layered. In wood, three layers of secondary cell wall,
referred to as the S1, S2 and S3 lamellae, result from different arrangement of the cellulose microfibrils. Secondary
walls contain up to 45% cellulose, 20-30% hemicellulose and are often (but not always) impregnated with lignin.

Middle lamella
A thin layer of material, the middle lamella, is present at the junction, where the walls of neighboring cells come into
contact. It acts as cementing material. The composition of the middle lamella differs from the rest of the wall. It is
high in pectin (as calcium pectate) and may be complexed with hydroxyproline-rich glycoproteins.


3.20 Cell signaling


All cells receive and respond to signals from their surroundings. This is accomplished by a variety of signal molecules
that are secreted or expressed on the surface of one cell and bind to receptors expressed by other cells, thereby
integrating and coordinating the functions of the many individual cells that make up organisms. Each cell is
programmed to respond to specific extracellular signal molecules. Extracellular signaling usually involves the
following steps:
1. Synthesis and release of the signaling molecule by the signaling cell;
2. Transport of the signal to the target cell;
3. Binding of the signal by a specific receptor leading to its activation;
4. Initiation of signal-transduction pathways.

In animals, extracellular signaling by signal molecules can be classified into four categories—endocrine, paracrine,
autocrine and juxtacrine signaling.
In endocrine signaling, the signaling molecules act on target cells located far from their site of synthesis. It is
a long-range mode of signaling in which the signal molecule is transported by the bloodstream.
In paracrine signaling, the signaling molecules released by a cell affect only target cells in close proximity. An
example is the action of neurotransmitters in carrying signals between nerve cells at a synapse.
In autocrine signaling, the signaling molecule acts on the same cell that produces it. One important
example is the response of cells of the vertebrate immune system to foreign antigens. Certain types of
T-lymphocytes respond to antigenic stimulation by synthesizing a growth factor that drives their own proliferation,
thereby increasing the number of responsive T-lymphocytes and amplifying the immune response.
In juxtacrine signaling, the signal molecules do not diffuse away from the cell that produces them; instead, the cell
bearing the signal molecules interacts with receptor proteins on adjacent responding cells. Unlike other modes of cell
signaling, juxtacrine signaling requires physical contact between the cells involved. Notch signaling and classical
cadherin signaling are examples of juxtacrine signaling.
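
The four categories described above can be read as a short decision procedure. The Python sketch below is only an illustration of that reading: the function name classify_signaling and its three yes/no parameters are hypothetical simplifications introduced here, not a standard scheme, and a real signal molecule may act in more than one mode.

```python
from enum import Enum


class Mode(Enum):
    ENDOCRINE = "endocrine"
    PARACRINE = "paracrine"
    AUTOCRINE = "autocrine"
    JUXTACRINE = "juxtacrine"


def classify_signaling(requires_contact: bool,
                       acts_on_producing_cell: bool,
                       carried_by_bloodstream: bool) -> Mode:
    """Assign one of the four modes from three simplified yes/no features.

    The parameters are illustrative assumptions, not measured quantities.
    """
    if requires_contact:
        # Signal stays on the producing cell; neighbours must touch it.
        return Mode.JUXTACRINE
    if acts_on_producing_cell:
        # Secreted signal acts back on the cell that released it.
        return Mode.AUTOCRINE
    if carried_by_bloodstream:
        # Long-range signal reaching distant target cells.
        return Mode.ENDOCRINE
    # Secreted signal acting only on nearby cells.
    return Mode.PARACRINE


# Examples taken from the text above.
notch = classify_signaling(True, False, False)             # Mode.JUXTACRINE
t_cell_growth_factor = classify_signaling(False, True, False)  # Mode.AUTOCRINE
neurotransmitter = classify_signaling(False, False, False)     # Mode.PARACRINE
```

The order of the tests matters only because the sketch forces a single answer; it is a teaching device rather than a claim about how cells distinguish these modes.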

(Diagram labels. Endocrine signaling: signal molecule, bloodstream, target cell. Paracrine signaling: target cell.
Autocrine signaling.)

Figure 3.74 Signaling between cells is called endocrine when the signal molecule is transported over long
distances by the bloodstream (typical for hormones), paracrine when the signal diffuses between neighboring cells
across the extracellular matrix (typical for neurotransmitters and many so-called tissue hormones or local
mediators), and autocrine when the signal acts back on the cell that released it.


3.20.1 Signal molecules


Signal molecules are chemically heterogeneous compounds. They are divided into two categories:
membrane-bound and secretory signal molecules. Membrane-bound signal molecules remain bound to the surface
of the cells and mediate contact-dependent signaling. In most cases, however, signal molecules are secreted by
signaling cells. Secreted extracellular signal molecules are further divided into three general categories based on the
distance over which signals are transmitted: endocrine, paracrine and autocrine signal molecules.
Extracellular signal molecules are synthesized and released by signaling cells and produce a specific response only
in target cells that have either cell-surface receptors or intracellular receptors for them. Extracellular
signal molecules fall into two broad categories: small lipophilic molecules that diffuse across the plasma membrane
and interact with intracellular receptors, and hydrophilic molecules that bind to cell-surface receptors. A few lipophilic
signal molecules also bind to cell-surface receptors. Most of these are eicosanoids, which
include prostaglandins, prostacyclin, thromboxanes and leukotrienes. All eicosanoids are synthesized from arachidonic
acid, which is formed from phospholipids. However, most extracellular signal molecules are hydrophilic and
bind to cell-surface receptors of the target cell.

Examples of signal molecules that interact with cell surface receptor


Epinephrine, Norepinephrine, Glucagon, Insulin, Gastrin, Secretin, Cholecystokinin and ACTH

Examples of signal molecules (hormones) that interact with cytosolic or nuclear receptor
Steroid hormones (Progesterone, Estradiol, Testosterone, Cortisol, Corticosterone, Aldosterone),
Steroid like hormone (α-ecdysone) and
Non-steroid hormones (Thyroid hormone and Retinoic acid).

Binding of extracellular signaling molecules to cell-surface receptors leads to an increase (or decrease) in the
concentration of low-molecular-weight intracellular signaling molecules termed second messengers. These
low-molecular-weight signaling molecules include cAMP, cGMP, diacylglycerol (DAG), inositol 1,4,5-trisphosphate (IP3),
phosphoinositides and calcium.

3.20.2 Receptors
The cellular response to a particular extracellular signal molecule depends on its binding to a specific receptor
located on the surface of a target cell or in its nucleus or cytosol. Receptors are chemically protein or glycoprotein
molecules which bind to signaling molecules (termed ligand). Binding of a ligand to its receptor causes a conformational
change in the receptor that initiates a sequence of reactions leading to a specific cellular response. Based on
location, receptors are classified into two broad categories - intracellular receptors and cell-surface receptors.

Intracellular receptors

Intracellular receptor proteins are located in the cytosol or the nucleus. These include receptors for steroid hormones,
thyroid hormones, retinoids and vitamin D as well as different “orphan” receptors. The intracellular receptors are
all structurally related and belong to the nuclear receptor superfamily. Within the cell, the receptor-ligand
complex controls the activities of responsive genes. A large number of nuclear receptors have been identified
through sequence similarity to known receptors but have no identified natural ligand; these are referred to as
nuclear orphan receptors. Nuclear receptors (NRs) are a family of highly conserved transcription factors that regulate
transcription in response to small lipophilic signal molecules. They are ubiquitous in, and unique to, the animal kingdom.
Members of the nuclear-receptor superfamily can both positively and negatively regulate transcription.
Most members of the nuclear receptor superfamily consist of an N-terminal activation domain (also known as the
A/B region), a central DNA binding domain (DBD) and a C-terminal ligand binding domain (LBD). The N-terminal
domain contains an AF-1 (activation function-1) sequence which functions as a ligand-independent transcriptional
activator. AF-1 is recognized by coactivators and/or other transcription factors. The DBD is comprised of two zinc-


3.21 Cell Cycle


The cell cycle is the ordered sequence of events by which a cell duplicates its genome and
eventually divides into two daughter cells. The cell cycle has two main phases: interphase and M phase. The period
of actual division, corresponding to the visible mitosis, is called M phase (mitotic phase). Interphase is the time
during which the cell prepares for division by undergoing both cell growth and DNA replication in an orderly
manner. Interphase is further subdivided into:
G1 phase (Gap 1, the period between the end of M phase and the start of DNA replication);
S phase (Synthesis, the period during which DNA synthesis occurs); and
G2 phase (Gap 2, the gap period following DNA replication and preceding the initiation of the M phase).

Cells that do not divide enter the G0 state. Cells can enter reversible (quiescent) or irreversible (senescent
and terminally differentiated) G0 states. Most cells in our body are in the G0 state. The quiescent state is a
reversible resting state. Cells in this state remain metabolically active but no longer proliferate unless called on to
do so, depending on the requirements of the organism. This state can last for days, weeks, or even years before
proliferation resumes. Senescent cells are dysfunctional cells that have ceased proliferation and are permanently
withdrawn from the cell cycle. Senescence is characterized by the irreversible loss of proliferative potential. Terminally
differentiated cells (e.g. mammalian skeletal muscle cells and nerve cells) are cells that, in the course of
acquiring specialized functions, have irreversibly lost their ability to proliferate.

(Diagram labels: eukaryotic cell cycle; G1 phase (Gap 1); S phase (DNA synthesis); G2 phase (Gap 2);
M phase (mitosis); G0.)

Figure 3.94 The four successive phases of a standard eukaryotic cell cycle. During interphase the cell grows
continuously; during M phase it divides. DNA replication is confined to the part of interphase known as
S phase. G1 phase is the gap between M phase and S phase; G2 is the gap between S phase and M phase.
Cells in G1, if they have not yet committed themselves to DNA replication, can pause in their progress around
the cycle and enter a specialized resting state, often called G0.

Approximately 95% of the cell cycle is spent in interphase. The duration of the three stages (G1, S and G2) varies
from species to species, and also from cell to cell within a species. Although the length of all phases of the cycle is
variable to some extent, by far, the greatest variation occurs in the duration of G1. Its length is adjusted in response
to growth conditions. In most cells, the whole of M phase takes only about an hour, which is only a small fraction of the
total cycle time. For a typical rapidly proliferating human cell in culture with a total cycle time of approximately
24 hours, the G1 phase might last about 11 hours, S phase about 8 hours, G2 about 4 hours and M about 1 hour.
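
The figures quoted above can be checked with a few lines of arithmetic. The short Python sketch below uses only the phase durations given in the text for a rapidly proliferating human cell in culture; the variable names are arbitrary and chosen here for illustration.

```python
# Phase durations in hours, as quoted in the text for a cultured human cell.
phase_hours = {"G1": 11, "S": 8, "G2": 4, "M": 1}

total_hours = sum(phase_hours.values())                  # 24
interphase_hours = total_hours - phase_hours["M"]        # 23

# Share of the whole cycle spent in each phase, in percent.
phase_percent = {phase: 100 * hours / total_hours
                 for phase, hours in phase_hours.items()}

interphase_percent = 100 * interphase_hours / total_hours

for phase, percent in phase_percent.items():
    print(f"{phase}: {percent:.1f}% of the cycle")
print(f"Interphase: {interphase_percent:.1f}% of the cycle")
```

With these values interphase occupies 23 of 24 hours, about 95.8% of the cycle, which agrees with the statement that approximately 95% of the cell cycle is spent in interphase.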

Diploid cell nucleus

                                     G1 phase      S phase       G2 phase
Chromosome number and DNA content    2n and 2C     2n and 4C     2n and 4C

Figure 3.95 Change in number of chromosomes and amount of DNA during interphase.


Cell fusion experiments


The cell cycle is an ordered series of dependent events, divided into four distinct phases. Control mechanisms
operate to regulate the onset of each phase and avoid improper transitions between phases. Initial insights
into the nature of cell-cycle regulation came from the cell fusion experiments of Rao and Johnson in 1970. These
investigators used an inactivated Sendai virus to fuse together mammalian cells that had reached different
stages of the cell cycle. The resulting heterokaryons possessed two different nuclei, one from each parent
cell, and shared the same cytoplasm. The behaviour of the two nuclei was then monitored.
In the first set of experiments, when G1 cells were fused with S phase cells, the G1 nuclei promptly began DNA
synthesis and entered S phase. Thus, the cytoplasm of S phase cells contained factors that initiated
DNA synthesis in the G1 nucleus.
In the second set of experiments, S-phase cells were fused with G2-phase cells. Although chromosome replication
continued in the S nucleus of the heterokaryon, the G2 nucleus was unable to synthesize DNA, indicating that
the G2 nucleus is prevented from entering further rounds of DNA replication.
Finally, when G1 cells were fused with G2 cells, the rate of initiation of DNA synthesis and of mitosis was similar
to that of G1/G1 cells rather than G2/G2 cells. Although the G2 nucleus had no effect on the G1 nucleus, entry
of the G2 nucleus into M phase was delayed by factors associated with the G1 component. This elegant series of
experiments was the first indication in mammalian cells that the sequential and unidirectional phases of the cell
cycle are controlled by a series of chemical signals that can diffuse freely between the nucleus and cytoplasm.
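
The three fusion outcomes can be summarized by a simple rule: a nucleus replicates its DNA in the heterokaryon only if an S phase partner supplies the initiating factors and the nucleus itself has not already completed replication. The Python sketch below encodes that rule as a toy model. The function name and the two conditions are assumptions made here to restate the observations; they are not a description of the underlying molecular mechanism.

```python
def replicates_dna(nucleus_phase: str, partner_phase: str) -> bool:
    """Toy rule for whether a nucleus synthesizes DNA soon after fusion.

    nucleus_phase and partner_phase are "G1", "S" or "G2".
    """
    # Assumption 1: S phase cytoplasm carries the factors that start replication.
    s_phase_factors_present = "S" in (nucleus_phase, partner_phase)
    # Assumption 2: a G2 nucleus has already replicated and cannot respond again.
    nucleus_can_respond = nucleus_phase in ("G1", "S")
    return s_phase_factors_present and nucleus_can_respond


# The three fusions described in the text.
print(replicates_dna("G1", "S"))   # True: G1 nucleus enters S phase
print(replicates_dna("G2", "S"))   # False: no DNA synthesis in the G2 nucleus
print(replicates_dna("S", "G2"))   # True: the S nucleus continues replication
print(replicates_dna("G1", "G2"))  # False: G2 has no effect on the G1 nucleus
```

The sketch covers only the DNA synthesis outcomes; it does not model the delayed entry of the G2 nucleus into M phase seen in the G1/G2 fusions.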

Fusion                     Outcome
G1 phase + S phase         G1 nucleus enters into S phase
G2 phase + S phase         No DNA synthesis in G2 nucleus
G1 phase + G2 phase        G2 nucleus had no effect on the G1 nucleus

Figure 3.102 Cell fusion experiments. Cells in S phase were fused either to cells in G1 or to cells in G2.
When G1 cells were fused with S phase cells, the G1 nucleus immediately began to replicate DNA. In contrast,
when G2 cells were fused with S phase cells, only the S phase nucleus continued DNA replication. It therefore
appeared that the G2 nucleus had to pass through M and enter G1 before another round of DNA replication
could be initiated.


Types of meiosis
Meiosis occurs during the sexual life cycle of eukaryotes. Fertilization and meiosis alternate in sexual life cycles,
maintaining a constant number of chromosomes in each species from one generation to the next. Although the
alternation of meiosis and fertilization is common to all organisms that reproduce sexually, the timing of these
two events in the life cycle varies, depending on the species. Meiosis is of three types (gametic, zygotic and
sporogenic, also called sporic), depending on the stage of the sexual cycle at which it occurs.

Gametic meiosis
Cycle: diploid organism (2n) → gametic meiosis → gametes (n) (male and female) → fertilization → zygote (2n)
In gametic meiosis, meiosis occurs at the time of gamete formation. Gametes are the only haploid cells. Gametes
fuse at fertilization to form the zygote. The diploid zygote undergoes repeated mitotic divisions to give rise to the
organism. This type of meiosis occurs in most animals, including humans.

Zygotic meiosis
Cycle: haploid organism (n) → gametes (n) (male and female) → fertilization → zygote (2n) → zygotic meiosis →
zoospores (n)
Zygotic meiosis occurs in most fungi and algae. In this case, meiosis occurs after gametic fusion and formation of the
diploid zygote. Meiosis does not produce gametes but haploid cells that then divide by mitosis.

Sporic meiosis
Cycle: sporophyte (2n) → sporic meiosis → spores (n) → gametophyte (n) → gametes (n) (male and female) →
fertilization → zygote (2n)
Plants and some species of algae exhibit sporogenic meiosis. In this case a multicellular diploid body (called the
sporophyte) produces haploid spores by meiotic division. Unlike a gamete, a haploid spore doesn't fuse with
another cell but divides mitotically, generating a multicellular haploid body called the gametophyte. Cells of the
gametophyte give rise to gametes by mitosis. Fusion of two haploid gametes at fertilization results in a diploid
zygote, which develops into the next sporophyte generation.

Figure 3.122 Gametic, zygotic and sporogenic meiosis.


Stem cells
Stem cells are unspecialized (undifferentiated) cells that have the ability to renew themselves and to differentiate
into other cell types. These cells divide to produce one daughter cell that remains a stem cell and one that divides and
differentiates. Because the division of stem cells produces new stem cells as well as differentiated daughter
cells, stem cells are self renewing populations of cells that can serve as a source for the production of differentiated
cells throughout life. Typically, stem cells generate an intermediate cell type or types before they achieve their
fully differentiated state.
The intermediate cell is called a precursor or progenitor cell. The ability to differentiate is the potential to
develop into other cell types. Depending on the ability to differentiate into other cell types, stem cells can be
classified as totipotent, pluripotent and multipotent stem cells. Totipotent stem cells are cells that can give rise
to a fully functional organism as well as to every cell type of the body. Pluripotent stem cells can differentiate
into nearly all cell types. Multipotent stem cells can differentiate into a limited number of closely related families
of cells.

Totipotent stem cell
These cells have unlimited capability, and have the ability to form extraembryonic membranes and tissues, the
embryo itself, and all postembryonic tissues and organs.

Pluripotent stem cell
These cells are capable of giving rise to most, but not all, tissues of an organism. An example is the cells of the
inner cell mass.

Multipotent stem cell
These cells are committed to give rise to cells that have a specific function. An example is the blood stem cell,
which gives rise to RBCs, WBCs and platelets. (The diagram also shows other committed stem cells alongside blood
stem cells.)

There are two broad types of stem cells: embryonic stem cells, which are isolated from the inner cell mass of
blastocysts, and adult stem cells, which are found in various tissues. Embryonic stem cells can become all cell
types of the body because they are pluripotent. An adult stem cell (also termed a somatic stem cell) is an
undifferentiated cell, found among differentiated cells in a tissue or organ, that can renew itself and differentiate to
yield the major specialized cell types of that tissue or organ. The primary roles of adult stem cells in a living
organism are to maintain and repair the tissue in which they are found. Unlike embryonic stem cells, which are
defined by their origin (the inner cell mass of the blastocyst), the origin of adult stem cells in mature tissues is
unknown. Most adult stem cells are multipotent. The bone marrow contains two kinds of stem cells. One population,
called hematopoietic stem cells, forms all the types of blood cells in the body. A second population, called bone
marrow stromal cells, generates bone, cartilage, fat and fibrous connective tissue. The adult brain also contains
stem cells that are able to generate the brain's three major cell types: neurons (nerve cells), and astrocytes and
oligodendrocytes, which are non-neuronal cells.


3.23 Apoptosis
Apoptosis (from the Greek words apo = from and ptosis = falling) is an energy-dependent biochemical mechanism
of programmed cell death. It is a genetically programmed process that occurs normally during embryogenesis,
metamorphosis and aging. For example, the differentiation of human fingers in a developing embryo requires the
cells between the fingers to initiate apoptosis so that the fingers can separate. Apoptosis also occurs as a defense
mechanism, such as in immune reactions or when cells are damaged by disease or noxious agents. Although
apoptosis is the most common form of programmed cell death (PCD), several non-apoptotic forms of programmed
cell death, such as autophagy and necroptosis, have also been reported.
The demise of cells by apoptosis is marked by a well-defined sequence of morphological changes. Apoptotic cells
become more compact, blebbing occurs at the membranes, chromatin becomes condensed and DNA is fragmented.
During the early stage of apoptosis, cell shrinkage and pyknosis (i.e. chromatin condensation) occur. With cell
shrinkage, the cells become smaller, with dense cytoplasm. Pyknosis is the most characteristic feature of
apoptosis. Later, extensive plasma membrane blebbing occurs and separation of cell fragments occurs in the form
of small membrane-bound apoptotic bodies by a process called budding. Apoptotic bodies consist of cytoplasm
with tightly packed organelles with or without a nuclear fragment. These bodies are subsequently phagocytosed by
macrophages or surrounding cells. Chemical changes in the surface of apoptotic cells or bodies allow the surrounding
cells or macrophages to recognize and engulf them. An especially important change occurs in the plasma membrane
of apoptotic cells. The negatively charged phospholipid phosphatidylserine is normally exclusively located in the
inner leaflet of the lipid bilayer of the plasma membrane, but it flips to the outer leaflet in apoptotic cells, where
it can serve as a marker of these cells. There is essentially no inflammatory reaction associated with the process
of apoptosis nor with the removal of apoptotic cells because:
1. apoptotic cells do not release their cellular constituents into the surrounding interstitial tissue;
2. they are quickly phagocytosed by surrounding cells and,
3. the engulfing cells do not produce pro-inflammatory cytokines.

Apoptosis versus necrosis


Necrosis refers to the degradative processes that occur after cell death. It is not a mechanism of cell death. The
process that leads to necrosis is called oncosis. In contrast to necrosis, which is an energy-independent form of
cell death that results from acute tissue injury, apoptosis is carried out in an ordered process that generally confers
advantages during an organism’s life cycle.

Apoptosis                                   Necrosis
Cell shrinkage and convolution              Cell swelling
Pyknosis and karyorrhexis                   Karyolysis, pyknosis and karyorrhexis
Intact cell membrane                        Disrupted cell membrane
Cytoplasm retained in apoptotic bodies      Cytoplasm released
No inflammation                             Inflammation usually present

Pyknosis (or karyopyknosis) is the irreversible condensation of chromatin in the nucleus of a cell undergoing
necrosis or apoptosis. Karyorrhexis is the fragmentation of the nucleus, and karyolysis is the complete dissolution
of the chromatin.

Mechanisms of apoptosis
The mechanisms of apoptosis are highly complex and regulated, involving an energy-dependent cascade of molecular
events. There are multiple apoptotic pathways. These pathways are both caspase-dependent as well as caspase-
independent. The classical, caspase-dependent apoptosis is initiated either by extrinsic or intrinsic factors. There
are two main caspase-dependent apoptotic pathways:

Prokaryotes and Viruses

Table 4.11 Comparison of mechanism of DNA transfer

Mechanism            Main feature                      Size of DNA transferred           Polarity    Sensitivity to DNase
Transformation       Naked DNA transferred             About 20 genes                    No          Yes
Transduction         DNA enclosed in a                 Usually part of the               No          No
                     bacteriophage capsid              chromosome
Conjugation
  Chromosomal DNA    Cell-to-cell contact required     Small fraction of chromosome      Yes         No
  F plasmid          Cell-to-cell contact required     Entire F plasmid                  Yes         No

Problem

Several Hfr strains are mated with an auxotrophic F⁻ strain by the interrupted mating technique. The order of
transfer of the loci for each strain is given in the table.
Strain 1 Strain 2 Strain 3 Strain 4
thr str his thy
lip thy thy his
trp his str trp
his trp ilv lip
thy lip thr thr
What is the order of the loci on the chromosome?

Solution
thr-lip-trp-his-thy-str-ilv

Problem

An F+ strain of E. coli gave rise to Hfr progeny by random integration of the F factor into the circular chromosome
at many points, so that the segregants transfer the genetic markers in different orders. When six of the Hfr
segregants were checked for the order of marker transfer to a recipient by interrupted mating experiments, the
following results were obtained. What is the order of the markers?
Hfr segregant   Order of marker transfer
1               PAQB
2               CZEF
3               EFBQ
4               FEZC
5               ZCWD
6               APDW

Solution

APDWCZEFBQAP
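
One way to arrive at this order is to treat every pair of consecutively transferred markers as neighbours on the chromosome and then walk around the resulting circle. The sketch below assumes the data are complete enough that each marker ends up with exactly two neighbours; the function name `circular_order` and the choice of starting marker are illustrative, not from the original text.

```python
def circular_order(transfer_orders, start):
    """Rebuild a circular marker order from overlapping Hfr transfer orders."""
    neighbours = {}
    for order in transfer_orders:
        # Markers transferred one after the other are adjacent on the chromosome.
        for a, b in zip(order, order[1:]):
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    if any(len(adjacent) != 2 for adjacent in neighbours.values()):
        raise ValueError("each marker needs exactly two neighbours to close the circle")

    path = [start]
    previous, current = None, start
    while True:
        # Step to the neighbour we did not just come from
        # (at the start, the alphabetically first neighbour sets the direction).
        nxt = sorted(neighbours[current] - {previous})[0]
        if nxt == start:
            break
        path.append(nxt)
        previous, current = current, nxt
    if len(path) != len(neighbours):
        raise ValueError("markers do not form a single circle")
    return path


segregants = ["PAQB", "CZEF", "EFBQ", "FEZC", "ZCWD", "APDW"]
marker_order = circular_order(segregants, start="A")
print("".join(marker_order) + marker_order[0])  # the first marker is repeated to close the circle
```

Starting from A, the walk visits the markers in the order A P D W C Z E F B Q before returning to A, which is the same circle as the solution above. Reading the circle in the opposite direction would be equally valid.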

4.7 Bacterial taxonomy


Taxonomy is the science of classification, identification, and nomenclature. In the process of classification (orderly
arrangement of organisms into groups), organisms are usually organized into species, genera, families, and higher
orders. The basic taxonomic unit is the species. Each species is assigned to a genus, the next rank in the taxonomic
hierarchy. For eukaryotes, the definition of the species usually stresses the ability of similar organisms to reproduce
sexually with the formation of a zygote and to produce fertile offspring. However, bacteria do not undergo sexual


Hepatitis virus
Hepatitis is a liver inflammation commonly caused by an infectious agent. Hepatitis sometimes results in destruction
of functional liver anatomy and cells, a condition known as cirrhosis. Some forms of hepatitis may lead to liver
cancer. Although many viruses and a few bacteria can cause hepatitis, a restricted group of viruses, termed
hepatitis viruses, is often associated with liver disease. Hepatitis viruses are diverse, and none of them are
genetically related, but all infect cells in the liver, causing hepatitis.

Characteristics of hepatitis viruses


Virus         Features             Incubation period
Hepatitis A   ssRNA; no envelope   2–6 weeks
Hepatitis B   dsDNA; enveloped     4–26 weeks
Hepatitis C   ssRNA; enveloped     2–22 weeks
Hepatitis D   ssRNA; enveloped     6–26 weeks
Hepatitis E   ssRNA; no envelope   2–6 weeks

The genome of hepatitis B virus (a hepadnavirus) is among the smallest known of any virus, 3-4 kb. Like retroviruses,
hepatitis B virus uses reverse transcriptase during its replication cycle. However, the DNA genome of hepatitis B
virus is replicated through an RNA intermediate, the opposite of what occurs in retroviruses, whose RNA genome is
replicated through a DNA intermediate. Hepatitis
D virus, classified as a hepatitis delta virus, is considered to be a subviral satellite because it can propagate only in
the presence of the hepatitis B virus. Transmission of hepatitis D virus can occur either via simultaneous infection
with hepatitis B virus (coinfection) or via infection of an individual previously infected with hepatitis B virus
(superinfection). The hepatitis D virus genome consists of a single stranded, negative sense, circular RNA.

4.12.6 Plant viruses


Plant viruses are rod-shaped or polyhedral. Most plant viruses have genomes consisting of a single RNA strand
of plus (+) sense. The best-known plant virus is the rod-shaped tobacco mosaic virus (TMV). Relatively few
plant viruses have DNA genomes, and there are only two classes of DNA-containing plant viruses. The cauliflower
mosaic virus belongs to the first class, which contains a double-stranded DNA genome in a polyhedral capsid. The
second class of DNA-containing plant viruses are the geminiviruses (gemini = twins), characterized by a connected
pair of capsids, each containing a circular, single-stranded DNA molecule of about 2500 nucleotides.
Tobacco mosaic virus (TMV) causes leaf mottling and discoloration in tobacco and many other plants. It was the
first virus to be discovered (by Dmitri Ivanovsky) and the first virus to be crystallized (by W. Stanley). TMV is a
rod-shaped virus with ~2130 capsomeres arranged in a hollow right-handed helix. It contains a single genomic RNA
(ss, plus sense) of ~6400 nucleotides.

[Figure labels: RNA; Capsid]

Figure 4.51 Tobacco mosaic virus has a rod-like appearance. Its capsid is made up of ~2130 capsomeres.
One molecule of genomic ssRNA, 6400 nucleotides long, is present in the centre of the capsid. The capsomeres
self-assemble into the rod-like helical structure (16.3 capsomeres per helical turn) around the RNA.
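
The figures quoted for TMV can be combined in a short back-of-the-envelope calculation. The sketch below uses only the approximate values given in the text (about 2130 capsomeres, 16.3 capsomeres per helical turn, about 6400 nucleotides); the variable names are illustrative.

```python
# Approximate figures quoted in the text and in Figure 4.51.
CAPSOMERES = 2130            # capsomeres per virion
CAPSOMERES_PER_TURN = 16.3   # capsomeres per helical turn
GENOME_NT = 6400             # nucleotides in the genomic ssRNA

helical_turns = CAPSOMERES / CAPSOMERES_PER_TURN
nt_per_capsomere = GENOME_NT / CAPSOMERES

print(f"helical turns per virion: about {helical_turns:.0f}")
print(f"nucleotides per capsomere: about {nt_per_capsomere:.1f}")
```

With these inputs the helix makes roughly 130 turns, and each capsomere corresponds to about three nucleotides of the genomic RNA.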


4.13 Prions and viroids


Prions are proteins. Prion proteins are designated PrP. The word prion, coined in 1982 by Stanley B. Prusiner, is
derived from the words protein and infection. The endogenous, normal cellular form is denoted PrPC (for Cellular),
while the disease-causing, infectious, misfolded form is denoted PrPSc (for Scrapie, after one of the diseases
first linked to prions). Prions are glycosylated proteins linked to the membrane by a GPI anchor. The infectious
form, PrPSc, is responsible for neurodegenerative diseases in animals, including humans. The normal cellular PrPC
form is converted into PrPSc through a process in which a portion of its α-helical and coil structure is refolded into
a β-sheet. This structural transition is accompanied by profound changes in the physicochemical properties of
PrP. PrPC is sensitive to proteases, whereas PrPSc is protease resistant. The high β-sheet content of PrPSc results in the
formation of amyloid fibrils, which are absent from the PrPC form. The PrPSc form can perpetuate itself by
causing newly synthesized PrP protein to take up the PrPSc form instead of the PrPC form.

Figure 4.52 The normal prion proteins (PrPC) have a large percentage of α-helix, but the abnormal forms (PrPSc)
have more β-pleated sheets.

Prions are novel transmissible agents that cause a group of neurodegenerative diseases, which can be perpetuated by
inoculating an animal with tissue extracts from an infected one. Collectively, prion diseases are described as spongiform
encephalopathies. No prion diseases of plants are known. In 1997, the American scientist Stanley B. Prusiner won
the Nobel Prize for his pioneering work on these diseases and on the prion proteins. Kuru was the first naturally
occurring spongiform encephalopathy of humans shown to be caused by prions. It was first described by Gajdusek
and Zigas in 1957. Kuru is characterized by cerebellar ataxia and a shivering-like tremor that produces complete
motor incoordination.

Table 4.23 Prion diseases of humans and animals

Disease                                              Organism
Creutzfeldt-Jakob disease                            Human
Kuru                                                 Human
Bovine spongiform encephalopathy (mad cow disease)   Cow
Scrapie                                              Sheep
Chronic wasting disease                              Mule deer

Viroid and virusoid


A viroid is an infectious agent of plants consisting of a single-stranded, covalently closed circular RNA (about 250 to 400
nucleotides long) that is not associated with any protein. Viroid RNA does not code for any proteins. Viroids (discovered
and named by Otto Diener) have so far been shown to infect plants only. A few well-studied viroids include coconut
cadang-cadang viroid and potato spindle tuber viroid (PSTV). No viroid diseases of animals are known, and the

Chapter 05
Immunology

Immunology is the science concerned with the immune response to foreign challenges. Immunity (derived
from the Latin term immunis, meaning exempt) is the ability of an organism to resist infection by pathogens, or the state
of protection against foreign organisms or substances. The array of cells, tissues and organs that carry out this
activity constitutes the immune system. Immunity is typically divided into two categories: innate and adaptive immunity.

5.1 Innate immunity


Innate (native/natural) immunity is present from birth and consists of many factors that are relatively nonspecific;
that is, it operates against almost any foreign molecule or pathogen rather than against one particular pathogen. It
provides the first line of defense against pathogens. It does not rely on previous exposure to a pathogen, is
functional from birth and has no memory.

Elements of innate immunity

Physical barriers
Physical barriers are the first line of defense against microorganisms. They include the skin and mucous membranes. Most
organisms and foreign substances cannot penetrate intact skin but can enter the body if the skin is damaged.
In addition, the acidic pH of sweat and sebaceous secretions and the presence of various fatty acids and hydrolytic
enzymes such as lysozyme inhibit the growth of most microorganisms. The respiratory and gastrointestinal tracts
are lined by mucous membranes, which entrap foreign microorganisms. The respiratory tract is also
covered by cilia, which are hairlike projections of the epithelial-cell membranes. The synchronous movement of
the cilia propels mucus-entrapped microorganisms out of the tract. Similarly, the conjunctiva is a specialized,
mucus-secreting epithelial membrane that lines the interior surface of each eyelid. It is kept moist by the continuous
flushing action of tears (lacrimal fluid) from the lacrimal glands. Tears contain lysozyme, lactoferrin and IgA, and thus
provide chemical as well as physical protection.
Microorganisms do occasionally breach the epithelial barricades. It is then up to the innate and adaptive immune
systems to recognize and destroy them, without harming the host. In the case of innate immune response, several
antimicrobial chemicals and phagocytic cells provide protection against pathogens.

Chemical mediator
A variety of chemicals mediates protection against microbes during the period before adaptive immunity develops.
The molecules of the innate immune system include complement proteins, cytokines, pattern recognition molecules,
acute-phase proteins, cationic peptides, enzymes like lysozyme and many others.

Complement proteins
The complement proteins are soluble proteins/glycoproteins that are mainly synthesized by the liver and circulate in
the blood and extracellular fluid. They were originally identified by their ability to amplify and complement the
action of antibodies; hence the name complement. The complement system also bridges innate and adaptive immunity and
removes immune complexes. It is composed of over 30 serum proteins. Activation of complement
proteins in response to certain microorganisms results in a controlled enzymatic cascade, which targets the membrane
of pathogenic organisms and leads to their destruction.

Cytokines
The term cytokine is a generic term for any low molecular weight soluble protein or glycoprotein released by one
cell population which acts as an intercellular mediator. It includes monokines, lymphokines, interleukins, interferons
and others. Cytokines are required for immunoregulation of both innate and adaptive immune responses.
Interferons are cytokines made by cells in response to virus infection, which essentially induce a generalized
antiviral state in surrounding cells.
Chemokines are small, positively charged secreted proteins that have a central role in guiding the migrations of
various types of white blood cells. They bind to the surface of endothelial cells, and to negatively charged proteoglycans
of the extracellular matrix in organs. By binding to G-protein-linked receptors on the surface of specific blood cells,
chemokines attract these cells from the bloodstream into an organ, guide them to specific locations within the
organ, and then help stop migration.

Pattern recognition molecule


Many molecules involved in innate immunity can recognize a given class of molecules, that is, a
pattern. Patterns are structures that are conserved and invariant among microorganisms of a given class. Pattern recognition
molecules that recognize pathogen-associated molecular patterns (PAMPs) may be soluble, circulating proteins or cell
surface receptors. Many PAMPs are recognized by pattern recognition molecules present on the surface of phagocytic
cells. Mannose-binding lectin (MBL) and C-reactive protein (CRP) are soluble pattern recognition molecules that
bind to microbial surfaces and promote their opsonization. Toll-like receptors (TLRs) are a class of pattern recognition
molecules that function exclusively as signaling receptors. Toll, the protein after which they are named, was originally
identified as a protein involved in the establishment of dorso-ventral polarity in developing fly embryos. It is also
involved, however, in the adult fly's resistance to fungal infections. There are at least 10 distinct TLRs in humans,
which recognize ligands such as lipopolysaccharide, peptidoglycan, zymosan and CpG DNA. For example, TLR-4 signals the
presence of bacterial lipopolysaccharide (LPS) and heat-shock proteins. TLR-2 signals the presence of bacterial
lipoproteins and peptidoglycans. The TLR family proteins consist of extracellular leucine-rich repeat (LRR) motifs and a
cytoplasmic tail containing a Toll/IL-1 receptor homology (TIR) domain. The LRR motifs are responsible for ligand
recognition, and the TIR domain is essential for triggering intracellular signaling pathways.

TLR → Recruitment of adaptor proteins → Recruitment and activation of protein kinase → Activation of transcription factors → Gene transcription

Figure 5.1 TLR and basic signaling mechanisms.

Acute phase proteins are a heterogeneous group of plasma proteins mainly produced in the liver as the result of
a microbial stimulus. They include C-reactive protein (CRP), serum amyloid protein A (SAA) and mannose binding
protein (MBP). Cytokines (IL-1, IL-6, IL-8, etc.) released by macrophages upon activation by bacteria stimulate the
liver to rapidly produce acute-phase proteins. These proteins maximize activation of the complement system and
opsonization of invading microbes.

Cellular defenses

Many specialized cell types, such as neutrophils, macrophages, monocytes and natural killer cells, participate in innate host
defense mechanisms. Once a pathogen evades the physical and chemical barriers, these specialized cells play an
important role in protection. Phagocytosis is a fundamental protective mechanism carried out by these cell types,


5.5 Antigens
Adaptive immune responses arise as a result of exposure to foreign compounds. The compound that evokes the
response is referred to as antigen, a term initially coined due to the ability of these compounds to cause antibody
responses to be generated. An antigen is any agent capable of binding specifically to a T-cell receptor (TCR) or an
antibody molecule (membrane bound or soluble). The ability of a compound to bind with an antibody or a TCR is
referred to as antigenicity. There is a functional distinction between the term antigen and immunogen. An
immunogen is any agent capable of inducing an immune response and is therefore immunogenic. The distinction
between the terms is necessary because there are many compounds that are incapable of inducing an immune
response, yet they are capable of binding with components of the immune system that have been induced specifically
against them. Thus all immunogens are antigens, but not all antigens are immunogens.

Requirements for immunogenicity

A substance must possess the following characteristics to be immunogenic:

1. Foreignness
The most important feature of an immunogen is that an effective immunogen must be foreign with respect to
the host. The adaptive immune system recognizes and eliminates only foreign (nonself) antigens. Self antigens
are not recognized and thus individuals are tolerant to their own self molecules, even though these same
molecules have the capacity to act as immunogens in other individuals of the same species.

2. Size
The second requirement for being immunogenic is that the compound must have a certain minimal molecular
weight. There is a relationship between the size of immunogen and its immunogenicity. In general, small
compounds with a molecular weight <1000 Da (e.g. penicillin, aspirin) are not immunogenic; those of molecular
weight between 1000 and 6000 Da (e.g. insulin, adrenocorticotropic hormone) may or may not be immunogenic;
and those of molecular weight >6000 Da (e.g. albumin, tetanus toxin) are generally immunogenic. The most
active immunogens tend to have a molecular mass of 100,000 Da or more. In short, relatively small substances
have decreased immunogenicity, whereas large substances have increased immunogenicity.

3. Chemical complexity
The third characteristic necessary for a compound to be immunogenic is a certain degree of chemical complexity.
For example, homopolymers of amino acids or sugars are seldom good immunogens regardless of their size.
Similarly, a homopolymer of poly-γ-D-glutamic acid (the capsular material of Bacillus anthracis) with a molecular
weight of 50,000 Da is not immunogenic. The absence of immunogenicity is because these compounds, although
of high molecular weight, are not sufficiently chemically complex.
Virtually all proteins are immunogenic. Furthermore, the greater the degree of complexity of the protein, the
more vigorous will be the immune response to that protein. Carbohydrates are immunogenic only if they have
a complex polysaccharide structure or are part of complex molecules such as glycoproteins. Nucleic acids and
lipids are poor immunogens by themselves, but they become immunogenic when they are conjugated to
protein carriers.

4. Dosage and route of administration


An insufficient dose of immunogen may not stimulate an immune response, either because the amount
administered fails to activate enough lymphocytes or because such a dose renders the responding cells
unresponsive. Besides the need to administer a threshold amount of immunogen to induce an immune response,
the number of doses administered also affects the outcome of the immune response generated.
The route of administration also affects the outcome of the immunization because this determines which organs
and cell populations will be involved in the response. Immunogens can be administered through a number of
common routes: Intravenous (into a vein); intradermal (into the skin); subcutaneous (beneath the skin);


[Figure panels]
Normal cell: class I MHC binds the inhibitory receptor, and the ligand binds the activating receptor, on the NK cell. No killing.
Altered self cell: only the ligand binds the activating receptor on the NK cell. Killing.

Figure 5.46 An activating receptor on NK-cells interacts with its ligand on normal and altered self cells,
inducing an activation signal that results in killing. However, interaction of inhibitory NK-cell receptors with
class I MHC molecules delivers an inhibition signal that counteracts the activation signal. Expression of class I
molecules on normal cells thus prevents their destruction by NK-cells. Because class I expression is often
decreased on altered self cells (virus infected cells and tumor cells), the killing signal predominates, leading
to their destruction.
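
The opposing-signals rule described in Figure 5.46 can be written as a small decision function. This is a deliberately simplified sketch: it treats each signal as present or absent, whereas real NK cells integrate graded signals from many receptors. The function and parameter names are illustrative and not from the original text.

```python
def nk_cell_kills(activating_ligand_bound: bool, class_i_mhc_bound: bool) -> bool:
    """Simplified rule: kill only when the activation signal is not counteracted."""
    # Engagement of the inhibitory receptor by class I MHC overrides activation.
    return activating_ligand_bound and not class_i_mhc_bound


# Normal cell: ligand and class I MHC are both present.
normal_cell_killed = nk_cell_kills(activating_ligand_bound=True, class_i_mhc_bound=True)

# Altered self cell (virus-infected or tumor cell): class I MHC expression is lost.
altered_cell_killed = nk_cell_kills(activating_ligand_bound=True, class_i_mhc_bound=False)

print("normal cell killed:", normal_cell_killed)
print("altered self cell killed:", altered_cell_killed)
```

Under this rule the normal cell is spared and the altered self cell is killed, matching the two panels of the figure.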

5.13.1 Superantigens
Superantigens are viral or bacterial proteins that bind simultaneously to the variable domain of the β chain (Vβ) of a T-cell
receptor (TCR) and to the α-chain of a class II MHC molecule (i.e. outside the peptide-binding groove). Because of
their unique binding ability, superantigens can activate large numbers of T-cells irrespective of their antigenic
specificity. Superantigens can be exogenous or endogenous. Exogenous superantigens are soluble proteins secreted
by bacteria, whereas endogenous superantigens are cell-membrane proteins encoded by certain viruses that infect
mammalian cells.

[Figure labels: APC with class II MHC (α and β chains); Ag; TH cell with TCR (α and β chains); Superantigen]

Figure 5.47 Superantigen-mediated cross-linkage of T-cell receptor (TCR) and class II MHC molecules.
A superantigen binds to a class II MHC molecule and to a part of the Vβ chain of the T-cell receptor that is outside
the normal antigen-binding site, and this binding is sufficient to trigger T-cell activation. A superantigen binds
to all TCRs bearing a particular Vβ sequence regardless of their antigen specificity.


5.14 Cytokines
Cytokines are low-molecular-mass (generally less than 30 kDa) soluble proteins/glycoproteins, non-immunoglobulin
in nature, that are secreted by a variety of cell types and act nonenzymatically through specific receptors to regulate host
cell function. They do not include the peptide and steroid hormones of the endocrine system. Cytokines play major
roles in the development of cellular and humoral immune responses, induction of the inflammatory response,
regulation of hematopoiesis, and control of cellular proliferation and differentiation.
Cytokines can affect the same cell responsible for their production (an autocrine function) or nearby cells (a
paracrine function), or they can be distributed by the circulatory system to distant target cells (an endocrine
function). They are highly potent hormone-like substances, active even at femtomolar concentrations. However,
they differ from endocrine hormones in being produced not by glands but by widely distributed cells. Cytokines
produce biological actions only when they bind to specific, high-affinity receptors on the surface of target cells. The
biological activities of cytokines exhibit pleiotropy (a given cytokine has different biological effects on different
target cells), redundancy (two or more cytokines mediate similar functions), synergy (the combined effect of two
cytokines on cellular activity is greater than the additive effect of the individual cytokines) and antagonism (the effect
of one cytokine inhibits the effect of another cytokine).

Attribute    Cytokine(s) from activated TH cells   Target cell   Effect
Pleiotropy   IL-4                                  B cell        Activation, proliferation, differentiation
                                                   Thymocyte     Proliferation
                                                   Mast cell     Proliferation
Redundancy   IL-2, IL-4, IL-5                      B cell        Proliferation
Synergy      IL-4 + IL-5                           B cell        Induces class switch to IgE
Antagonism   IL-4, IFN-γ                           B cell        IFN-γ blocks class switch to IgE induced by IL-4

Figure 5.48 Cytokine attributes of pleiotropy, redundancy, synergy (synergism), antagonism.

Cytokines differ from hormones and growth factors. All three are secretory proteins that elicit their biological
effects at very low concentrations by binding to receptors on target cells. Growth factors tend to be produced
constitutively, whereas cytokines and hormones are secreted in response to discrete stimuli. Unlike hormones,
which generally act long range in an endocrine fashion, most cytokines act over a short distance in an autocrine or
paracrine fashion. In addition, most hormones are produced by specialized glands and tend to have a unique action
on one or a few types of target cell. In contrast, cytokines are often produced by, and bind to, a variety of cells.
There are over 100 different cytokines. The generic name of cytokines includes all proteins with a small molecular
weight, released by cells of the immune system, especially by monocytes and T-lymphocytes. But they are also
secreted by many cells in addition to those of the immune system, such as endothelial cells and fibroblasts. They used
to have different names depending either on their origin, such as lymphokines (produced by lymphocytes), monokines
(substances produced by monocytes or macrophages) or on their activity: chemokines, interleukins, interferons.
Cytokines may be grouped into the following categories: hematopoietins, interleukins, interferons, chemokines and
members of the tumor necrosis factor (TNF) family.


Antibodies do not play a role in type IV hypersensitivity reactions. On activation, the TH1 cells release cytokines
that cause accumulation and activation of macrophages, which, in turn, cause local damage. The tuberculin skin
test is an example of type IV hypersensitivity. The test is done by injecting a small amount of tuberculin purified
protein derivative (PPD) under the top layer of skin. If the person has ever been exposed to Mycobacterium
tuberculosis, the skin reacts to the antigens by developing a firm red bump at the site within 2 days. It is a standard
method of determining whether a person is infected with Mycobacterium tuberculosis.

Overview of hypersensitivity

Immediate (symptoms are manifested within minutes or hours after exposure)
    Ab-mediated
        Type I hypersensitivity: IgE-mediated
        Type II hypersensitivity: IgG/IgM-mediated
    Ag-Ab mediated
        Type III hypersensitivity: IgG-mediated
Delayed (symptoms are manifested within days after exposure)
    T-cell mediated
        Type IV hypersensitivity

5.17 Autoimmunity
The body is normally able to distinguish its own self-antigens from foreign nonself antigens and does not mount an
immunologic attack against itself. This phenomenon is called immune tolerance. Autoimmunity is a condition in
which structural or functional damage is produced by the action of immunologically competent cells or Ab against
self antigen. Autoimmunity literally means protection against self, but it actually implies injury to self, and
the term is therefore sometimes criticized.
Autoimmune disease results from the activation of self-reactive T and B-cells that, following stimulation by
genetic or environmental triggers, cause actual tissue damage. Four factors influence the development of autoimmune
disease. These factors are genetic, viral, hormonal and psycho-neuro-immunological (the influence of stress and
neurochemicals). All four of these factors can affect gene expression, which directly or indirectly interferes with
important immunoregulatory actions. Based on the site of involvement and nature of lesions autoimmune diseases
may be classified as hemocytolytic, localized (or organ specific), systemic (or non-specific) and transitory diseases.
Important examples of autoimmune diseases in humans and their respective autoantigens are given in the table below.

Table 5.14 Some autoimmune diseases in humans


Disease                        Autoantigen
Autoimmune hemolytic anemia    Rh blood group
Graves' disease                Thyroid-stimulating hormone receptor
Multiple sclerosis             Myelin basic protein
Myasthenia gravis              Acetylcholine receptor
Rheumatoid arthritis           Unknown synovial joint antigen
Systemic lupus erythematosus   DNA, histones, snRNP
Type 1 diabetes mellitus       Pancreatic beta cell antigen


5.18 Transplantation
The immune system has evolved as a way of discriminating between self and non-self. This discriminating power
is undesirable when tissue is transplanted from one individual to another for therapeutic purposes, because
transplants can culminate in graft rejection.
Before discussing the immunological mechanisms associated with graft rejection, it is important to
understand the various gradations in relationship between donor and recipient.
Isograft : Graft between genetically identical individuals (syngeneic). In humans, an isograft (or syngraft) can
be performed between monozygotic twins.
Allograft : Transplants between genetically different individuals within a species.
Xenograft : A graft between individuals from different species.
Autograft : A graft or transplant from one body part to another on the same individual.

Transplanting tissue that is not immunologically privileged generates the possibility that the recipient’s cells will
recognize the donor’s tissue as foreign. This triggers the recipient’s immune mechanisms, which may destroy the
donor tissue. Such a response is called a graft rejection reaction. Some transplanted tissues do not stimulate an
immune response. For example, a transplanted cornea is rarely rejected because lymphocytes do not circulate into
the anterior chamber of the eye. This site is considered an immunologically privileged site. Another example of a
privileged tissue is the heart valve.
A tissue rejection reaction can occur by two different mechanisms. First, foreign class II MHC molecules on the transplanted
tissue (the graft) are recognized by host T-helper cells, which aid cytotoxic T-cells in graft destruction. Cytotoxic
T-cells then recognize the graft through the foreign class I MHC molecules. This response is much like the activation
of CTLs by virally infected host cells. A second mechanism involves the T-helper cells reacting to the graft and
releasing cytokines. The cytokines stimulate macrophages to enter, accumulate within the graft, and destroy it. The
MHC molecules play a dominant role in the tissue rejection reaction because of their unique association with the
recognition system of T-cells.

5.19 Immunodeficiency diseases


Immunodeficiencies occur when one or more components of the immune system are defective. Immunodeficiencies
may be primary or secondary.
A deficiency caused by a defect in one or more genes involved in the development or function of the immune
system is called primary immunodeficiency.
A deficiency in the immune system that is acquired after birth, usually because of infection and that is not related
to a genetic defect is called secondary or acquired immunodeficiency.

Examples of primary immunodeficiency diseases:

Severe combined immunodeficiency (SCID)

SCID is a genetic disorder characterized by a very low number of circulating lymphocytes. Both arms
(B-cells and T-cells) of the adaptive immune system are non-functional. Such patients make neither specific
T-cell dependent antibody responses nor cell-mediated immune responses, and thus cannot develop immunological
memory. Several different defects can lead to the SCID phenotype. In X-linked SCID, which is the commonest
form of SCID, T-cells fail to develop because of a defect in the gene coding for the common γ chain shared by several
cytokine receptors, including those for the interleukins IL-2, IL-4, IL-7, IL-9 and IL-15. The autosomally inherited SCID
occurs due to adenosine deaminase deficiency. Adenosine deaminase catalyzes the conversion of adenosine to inosine, and
its deficiency results in accumulation of adenosine, which interferes with purine metabolism and leads to an
accumulation of nucleotide metabolites that are particularly toxic to developing T-cells.


Chediak-Higashi syndrome

Chediak-Higashi syndrome is an autosomal recessive disease. It is characterized by recurrent bacterial infections and
lack of skin and eye pigment. Phagocytes from patients with this immune defect contain giant granules but do not
have the ability to kill bacteria. With time, patients develop massive infiltrates of lymphocytes and macrophages in
the liver, spleen, and lymph nodes. The molecular basis of the defect is a mutation in a protein involved in the
regulation of intracellular trafficking. The mutation impairs the targeting of proteins to lysosomes.

DiGeorge syndrome

DiGeorge syndrome, or congenital thymic aplasia, is not hereditary but occurs sporadically and is the result of a
deletion in chromosome 22. The syndrome is caused by defective migration of fetal neural crest cells into the third
and fourth pharyngeal pouches. In its most severe form, the thymus is completely absent.
This developmental defect causes immunodeficiency along with hypoparathyroidism and congenital heart disease.
The immune defect includes a profound depression of T-cell numbers and absence of T-cell responses. Although
B-cells are present in normal numbers, affected individuals do not produce antibody in response to immunization
with specific antigens.

5.20 Failures of host defense mechanisms


The propagation of a pathogen depends on its ability to multiply in a host. Pathogens must therefore grow
without activating an immune response. The most successful pathogens persist either because they do not elicit an
immune response, or because they evade the response once it has occurred. Pathogens have developed various
strategies for avoiding destruction by the immune system. Some of the strategies are mentioned below:
• One way in which a pathogenic agent can evade the immune response is by altering its antigens. There are
three ways in which antigenic variation can occur:
First, many pathogenic agents exist in a wide variety of antigenic types. For example, there are many types of
Streptococcus pneumoniae. Each type differs from the others in the structure of its polysaccharide capsule. The
different types are distinguished by serological tests and so are often known as serotypes. Infection with one
serotype of Streptococcus pneumoniae can lead to type-specific immunity, which protects against reinfection
with that type but not with a different serotype.
A second, more dynamic mechanism of antigenic variation arises due to antigenic drift and antigenic shift.
Antigenic drift is caused by point mutations in the genes encoding epitopes. For example, in influenza virus new
variants arise with mutations in genes encoding the major surface proteins hemagglutinin and neuraminidase.
Individuals who were previously infected with, and hence are immune to, the old variant are thus susceptible
to the new variant. Periodically influenza viruses also show an antigenic shift through reassortment of their
segmented genome with another influenza virus, changing their surface antigens radically. Such antigenic shift
variants are not recognized by individuals immune to influenza.
The third mechanism of antigenic variation involves programmed rearrangements in the DNA of the pathogen.
For example, African trypanosomes, which cause sleeping sickness in humans, change the major surface
antigen repeatedly within a single infected host. The trypanosome is coated with a single type of glycoprotein,
the Variant-Specific Glycoprotein (VSG), which elicits an antibody response. The trypanosome genome, however,
contains about 1000 VSG genes, each encoding a protein with distinct antigenic properties. Only one of these
is expressed at any one time. The VSG gene expressed can be changed by gene rearrangement. So, by having
their own system of gene rearrangement that can change the VSG protein produced, trypanosomes keep one
step ahead of an immune system capable of generating many distinct antibodies by gene rearrangement.

Immunology

5.21 Vaccines
An individual may be exposed to an antigen to induce formation of antibodies, a type of immunity known as artificial
active immunity. The material used to induce artificial active immunity, the antigen or a mixture of antigens, is
known as a vaccine (or an immunogen), and the process of generating such an immune response is immunization.
Immunization is commonly known as vaccination. The term vaccine has been derived from the Latin word vaccinus
meaning from cows. A vaccine can be defined as a nontoxic or nonvirulent preparation of antigenic material that can be
used to induce long-term humoral as well as cell-mediated immune responses against pathogens.

Type of antigens used in vaccines

In principle, anything from whole organisms to small subcellular fragments can be used as antigen in vaccines. Most
current vaccines in use for humans are either whole organism vaccines, which contain live but attenuated
organisms or killed organisms, or purified antigen vaccines (toxoid, capsular
polysaccharide and recombinant microbial antigens). However, recent advances in molecular biology have provided
alternative methods for producing vaccines such as DNA vaccines and recombinant vector vaccines.

Whole organism vaccines

Whole organism vaccines consist of live but attenuated or inactivated bacterial cells or viral particles.

Live but attenuated vaccines

These vaccines are prepared by attenuating pathogenic organisms by growing them in unfavorable conditions,
which result in gene mutations that cause the organism to lose its pathogenicity while retaining its capacity for transient
growth. Attenuated vaccines have advantages as well as disadvantages. Due to their capacity for transient growth,
these vaccines show prolonged immunogenicity and eliminate the need for repeated boosters. But a major
disadvantage of these vaccines is the associated risk of reversion to a virulent form. However, genetic
engineering is now also used to introduce site-directed mutations that irreversibly remove virulence genes
from attenuated organisms, making them safer for use.
The Sabin polio vaccine is an example of attenuated vaccine, consisting of three attenuated strains of poliovirus.
Sabin vaccine in the intestines induces production of secretory IgA, which serves as an important defense against
naturally acquired poliovirus.

Inactivated (killed) vaccines


Inactivated vaccines are produced by inactivating pathogenic organisms by heat or chemical treatment so that
the organisms are unable to multiply in the host. Inactivation is carried out in such a way that the structure of
epitopes on surface antigens is maintained. Heat treatment is generally not satisfactory since it can
cause protein denaturation and thus loss of some epitopes. Chemical inactivation using formaldehyde or other
alkylating agents is therefore more common. Inactivated vaccines are effective, but they are less immunogenic, so they often
require several boosters and normally do not adequately stimulate cell-mediated immunity or secretory IgA
production. In contrast, attenuated vaccines usually are given in a single dose and stimulate both humoral and
cell-mediated immunity.

Table 5.15 Comparison of attenuated and inactivated vaccines

Feature                       Attenuated vaccine                    Inactivated vaccine
Booster dose                  Generally requires single booster     Requires multiple boosters
Relative stability            Less stable                           More stable
Immunity induced              Humoral and cell-mediated             Mainly humoral
Reversion to virulent form    May revert to a virulent form         Cannot revert to a virulent form


Purified antigen vaccines


Purified antigen vaccines (sometimes called subunit vaccines) are composed of molecules purified directly from the
pathogen. Three general forms of subunit vaccines are in current use: toxoids, capsular polysaccharides and recombinant
microbial antigens. Many exotoxins can be modified chemically so they retain their antigenicity but are no longer
toxic. Such a modified exotoxin is called a toxoid. Toxoids are usually not as efficient as the original exotoxin for
producing immunity, but they can be given safely and in high doses.
In the case of recombinant antigen vaccine, the gene encoding any immunogenic protein can be cloned and
expressed in bacterial, yeast or mammalian cells using recombinant DNA technology. The first recombinant antigen
vaccine approved for human use is the hepatitis B vaccine. The immune response generated by recombinant
antigen vaccines is primarily humoral. The antigens are processed via the MHC class II pathway, and therefore do
not induce a cellular immune response.

Figure 5.58 Schematic representation of the development of a recombinant antigen vaccine.
[Figure labels: enveloped virus; expression vector with viral gene; host for vector; synthesis of viral protein.]

DNA vaccines

DNA vaccines, also known as genetic vaccines, use the genetic material of the pathogen itself to immunize the
individual. In this class of vaccine, fragments of the pathogen’s genome encoding antigenic proteins are injected
directly into the host cells, where they can integrate into the chromosomal DNA or exist as episomes. Expression of
these genes within the host generates foreign proteins to which the host immune system responds. Hence, with a DNA vaccine,
an immune response is made against the protein encoded by the vaccine DNA. The DNA itself is not immunogenic.
DNA vaccines induce both humoral and cell mediated immunity.

Figure 5.59 DNA vaccines and humoral and cell mediated immunity.
[Figure labels: DNA vaccine; gene for antigenic protein; antigenic protein coded by plasmid; antigenic peptides; humoral immunity; cell mediated immunity.]

Chapter 06
Genetics

All living organisms reproduce. Reproduction results in the formation of offspring of the same kind. However, the
resulting offspring need not and, most often, does not totally resemble the parent. Several characteristics may
differ between individuals belonging to the same species. These differences are termed variations. The mechanism
of transmission of characters, resemblances as well as differences, from the parental generation to the offspring,
is called heredity. The scientific study of heredity, variations and the environmental factors responsible for these, is
known as genetics (from the Greek word genno = give birth). The word genetics was first suggested to describe the
study of inheritance and the science of variation by prominent British scientist William Bateson.

Genetics can be divided into three areas: classical genetics, molecular genetics and evolutionary genetics. In
classical genetics, we are concerned with Mendel’s principles, sex determination, sex linkage and cytogenetics.
Molecular genetics is the study of the genetic material: its structure, replication and expression, as well as the
information revolution emanating from the discoveries of recombinant DNA techniques. Evolutionary genetics is the
study of the mechanisms of evolutionary change or changes in gene frequencies in populations (population genetics).

Classical genetics
6.1 Mendel’s principles
Gregor Johann Mendel (1822–1884), known as the Father of Genetics, was an Austrian monk. In 1866, he published
the results of his hybridization experiments, titled Experiments on Plant Hybrids, in the Proceedings of the
Brünn Society of Natural History, and postulated the principles of inheritance which are popularly known as Mendel’s
laws. But his work was largely ignored by scientists at that time. In 1900, the work was independently rediscovered
by three biologists - Hugo de Vries of Holland, Carl Correns of Germany and Erich Tschermak of Austria. Mendel did
a statistical study (he had a mathematical background). He discovered that individual traits are inherited as discrete
factors which retain their physical identity in a hybrid. Later, these factors came to be known as genes. The term
was coined by Danish botanist Wilhelm Johannsen in 1909. A gene is defined as a unit of heredity that may
influence the outcome of an organism’s traits.

Mendel’s experiment
Mendel chose the garden pea, Pisum sativum, for his experiments since it had the following advantages.
1. Well-defined discrete characters
2. Bisexual flowers
3. Predominant self fertilization
4. Easy hybridization
5. Easy to cultivate and relatively short life cycle


Characters studied by Mendel


The characteristics of an organism are described as characters or traits. Traits studied by Mendel were clear cut and
discrete. Such clear-cut, discrete characteristics are known as Mendelian characters. Mendel studied seven characters/
traits (all having two variants) and these are:
     Character            Dominant     Recessive
1.   Stem length          Tall         Dwarf
2.   Flower position      Axial        Terminal
3.   Flower color         Violet       White
     (Seed coat color     Grey         White)
4.   Pod shape            Inflated     Constricted
5.   Pod color            Green        Yellow
6.   Cotyledon color      Yellow       Green
7.   Seed form            Round        Wrinkled

Flower color is positively correlated with seed coat color. Seeds with white seed coats were produced by plants
that had white flowers and those with grey seed coats came from plants that had violet flowers.

Allele

Each gene may exist in alternative forms known as alleles, which code for different versions of a particular inherited
character. We may also define alleles as genes occupying corresponding positions on homologous chromosomes
and controlling the same characteristic (e.g. height of plant) but producing different effects (tall or short). The term
homologous refers to chromosomes that carry the same set of genes in the same sequence, although they may
not necessarily carry identical alleles of each gene.

Wild-type versus Mutant alleles


Prevalent alleles in a population are called wild-type alleles. These alleles typically encode proteins that are made
in the right amount and function normally. Alleles that are present at less than 1% in the population and have been
altered by mutation are called mutant alleles. Such alleles usually result in a reduction in the amount or function of
the wild-type protein and are most often inherited in a recessive fashion.

Dominant and Recessive alleles


A dominant allele masks or hides expression of a recessive allele and it is represented by an uppercase letter.
A recessive allele is an allele that exerts its effect only in the homozygous state and in the heterozygous condition
its expression is masked by a dominant allele. It is represented by a lowercase letter.

Homozygous and Heterozygous


Each parent (diploid) has two alleles for a trait — they may be:
1. Homozygous, indicating they possess two identical alleles for a trait.
a. Homozygous dominant genotypes possess two dominant alleles for a trait (TT).
b. Homozygous recessive genotypes possess two recessive alleles for a trait (tt).

2. Heterozygous genotypes possess one of each allele for a particular trait (Tt).


6.1.5 Penetrance and expressivity


The percentage of individuals that show a particular phenotype among those capable of showing it is known as
penetrance. Let us take the example of polydactyly in humans, which is produced by a dominant allele. The homozygous
recessive genotype does not cause polydactyly. However, some heterozygous individuals are not polydactylous.
Suppose 20% of heterozygous individuals do not show polydactyly; this means that the gene has a penetrance of
80%. A particular gene may also produce different degrees
of expression in different individuals. This degree of expression of a trait is known as expressivity. Different degrees of expression in different
individuals may be due to variation in the allelic constitution of the rest of the genome or to environmental factors.
Thus, the terms penetrance and expressivity quantify the modification of gene expression by varying environment
and genetic background; they measure respectively the percentage of cases in which the gene is expressed and the
level of expression.
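
The penetrance figure in the polydactyly example can be reproduced with a short calculation. The sketch below is only illustrative: the function name `penetrance` and the counts (100 heterozygotes, 80 of them polydactylous) are assumptions chosen to match the 80% quoted in the text, not data from a real study.

```python
from fractions import Fraction

def penetrance(n_showing_trait, n_with_genotype):
    """Fraction of individuals carrying the genotype that show the phenotype."""
    if n_with_genotype <= 0:
        raise ValueError("n_with_genotype must be positive")
    return Fraction(n_showing_trait, n_with_genotype)

# Hypothetical counts: 100 heterozygotes, 20 of whom are not polydactylous.
p = penetrance(n_showing_trait=80, n_with_genotype=100)
print(f"Penetrance = {float(p):.0%}")  # Penetrance = 80%
```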

Phenocopy
A phenotype that is not genetically controlled but looks like a genetically controlled one is called a phenocopy. It is an
environmentally induced phenotype that resembles the phenotype determined by the genotype. An example is provided by
vitamin-D-resistant rickets, a genetic disorder: a dietary deficiency of vitamin D produces rickets that is
virtually indistinguishable from the genetically caused form.

6.1.6 Probability
The chance that an event will occur in the future is called the event’s probability. For example, if you flip a coin, the
probability is 0.50, or 50%, that the head side will be showing when it lands. The probability depends on the number
of possible outcomes. In this case, there are two possible outcomes (head and tail), which are equally likely. This
allows us to predict that there is a 50% chance that a coin flip will produce a head. The general formula for the
probability is:

Probability = (Number of times an event occurs) / (Total number of events)

Phead = 1 head/(1 head + 1 tail) = 1/2 = 50%

A probability calculation allows us to predict the likelihood that an event will occur in the future. The accuracy of this
prediction, however, depends to a great extent on the size of the sample.
In genetic problems, we are often interested in the probability that a particular type of offspring will be produced.
For example, when two heterozygous tall pea plants (Tt) are crossed, the phenotypic ratio of the offspring is
3 tall : 1 dwarf. This information can be used to calculate the probability for either type of offspring:

Probability = (Number of individuals with a given phenotype) / (Total number of individuals)

Ptall = 3 tall/(3 tall + 1 dwarf) = 3/4 = 0.75 = 75% and

Pdwarf = 1 dwarf/(3 tall + 1 dwarf) = 1/4 = 0.25 = 25%

The probability of obtaining a tall plant is 75% and a dwarf plant 25%. When we add together the probabilities of all
the possible outcomes (tall and dwarf), we should get a sum of 100% (here, 75% + 25% = 100%).
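
The 3/4 and 1/4 values can be obtained by listing every equally likely pairing of gametes from the two Tt parents. The following is a minimal sketch; the function name `monohybrid_phenotype_probs` and the assumption that each parent contributes either allele with equal chance are illustrative choices, not part of the original text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def monohybrid_phenotype_probs(parent1, parent2, dominant="T"):
    """Enumerate equally likely gamete pairings and return phenotype probabilities."""
    counts = Counter()
    for g1, g2 in product(parent1, parent2):
        # One dominant allele is enough to give the tall phenotype.
        phenotype = "tall" if dominant in (g1, g2) else "dwarf"
        counts[phenotype] += 1
    total = sum(counts.values())
    return {ph: Fraction(n, total) for ph, n in counts.items()}

probs = monohybrid_phenotype_probs("Tt", "Tt")
print(probs["tall"], probs["dwarf"])  # 3/4 1/4
print(sum(probs.values()))            # 1
```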
There are two basic laws of probability that are used for genetic analysis. The first law, the multiplicative law
(product rule) of probability, states that the chance of two or more independent events occurring together is the
product of the probability of the events occurring separately. Independent events are events whose outcomes do
not influence one another. This is also known as the and rule. The product rule can be used to predict the
probability of independent events that occur in a particular order.
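
As an illustration of the product rule, consider the chance that three successive offspring of a Tt × Tt cross are tall, tall and dwarf, in that order. The sketch below assumes that each offspring is an independent event with the probabilities derived above; the helper name `probability_of_ordered_events` is hypothetical.

```python
from fractions import Fraction
from math import prod

P_TALL = Fraction(3, 4)
P_DWARF = Fraction(1, 4)

def probability_of_ordered_events(probabilities):
    """Product rule: multiply the probabilities of independent events."""
    return prod(probabilities, start=Fraction(1))

p = probability_of_ordered_events([P_TALL, P_TALL, P_DWARF])
print(p)  # 9/64
```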


Figure 6.7 Random alignment of bivalents at metaphase of meiosis I explains Mendel’s law of independent
assortment. [Figure: a heterozygous (YyRr) diploid cell from a plant with round yellow seeds passes through
meiosis I and meiosis II; the two alternative alignments give the possible haploid gametes Yr and yR, or yr and YR.]

6.3 Gene interaction


According to Mendel, genes function independently of each other, i.e. each of the seven traits considered was
controlled by a single gene. But many traits of an organism are determined by the complex contribution of many
different genes. When two or more different (non-allelic) genes influence the outcome of a single trait, this is known
as a gene interaction.
The first case of two different genes interacting to affect a single trait was discovered by William Bateson and
Reginald Punnett in 1906. They discovered an unexpected gene interaction when they studied crosses involving the
sweet pea, Lathyrus odoratus. When they crossed a true breeding purple flowered plant to a true breeding white
flowered plant, the F1 generation was all purple flowered plants and the F2 generation (produced by self fertilization
of the F1 generation) contained purple and white flowered plants in a 3 : 1 ratio. But when they crossed two
different varieties of white flowered plants, all F1 generation plants had purple flowers. When these purple
flowered plants were allowed to self fertilize, the F2 generation contained purple and white flowers in a ratio of
9 purple : 7 white. How can this unexpected result be explained? Bateson
and Punnett explained it by considering the involvement of two different (non-allelic) genes, because the F2 9 : 7 ratio is a
variation of the 9 : 3 : 3 : 1 ratio. Let us consider the formation of the purple pigment in which products of two
different genes are involved.

                       Enzyme A                             Enzyme B
Colorless precursor  ------------>  Colorless intermediate  ------------>  Purple pigment (Anthocyanin)
                  Genotype (CC or Cc)                   Genotype (PP or Pp)

C (purple color producing) allele is dominant to c (white)
P (purple color producing) allele is dominant to p (white)

In the above pathway, a colorless precursor molecule must be acted on by two different enzymes to produce the
purple pigment. Gene C encodes a functional enzyme A, which converts the colorless precursor into a colorless
intermediate, and gene P encodes enzyme B, which converts the colorless intermediate into the purple pigment.
If either of these two genes is in the homozygous recessive condition (cc or pp), the purple color will not appear.
Thus the genotype cc can hide or mask the phenotypic expression of genotype PP or Pp.

P generation     White flowered plant (CCpp) × White flowered plant (ccPP)

F1 generation    All purple (CcPp)

The F1 hybrid plants are allowed to self fertilize: CcPp × CcPp

F2 generation

        CP              Cp              cP              cp
CP      CCPP Purple     CCPp Purple     CcPP Purple     CcPp Purple
Cp      CCPp Purple     CCpp White      CcPp Purple     Ccpp White
cP      CcPP Purple     CcPp Purple     ccPP White      ccPp White
cp      CcPp Purple     Ccpp White      ccPp White      ccpp White

Figure 6.8 9 : 7 phenotypic ratio in F2 generation.

The purple color appears only when dominant alleles of both genes are present. When one or both genes have only
recessive alleles, the color will be white.
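
The 9 : 7 ratio can be checked by enumerating all sixteen gamete combinations of the CcPp × CcPp cross. This is a minimal sketch that assumes the two genes assort independently, so that the four gamete types are equally frequent; the function names `gametes` and `flower_color` are illustrative.

```python
from collections import Counter
from itertools import product

def gametes(genotype):
    """Gametes of a two-gene genotype such as 'CcPp': one allele per gene."""
    return [a + b for a, b in product(genotype[:2], genotype[2:])]

def flower_color(gamete1, gamete2):
    """Purple needs at least one dominant allele of both gene C and gene P."""
    alleles = gamete1 + gamete2
    return "purple" if "C" in alleles and "P" in alleles else "white"

f1 = "CcPp"
f2 = Counter(flower_color(g1, g2) for g1, g2 in product(gametes(f1), repeat=2))
print(f2["purple"], ":", f2["white"])  # 9 : 7
```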

Epistasis
The term epistasis (Greek for standing upon) describes a type of gene interaction in which one gene masks or
modifies the expression of another gene at a distinct locus. Any gene that masks the expression of another non-allelic
gene is epistatic to that gene. The gene suppressed is hypostatic. In the pathway discussed for the formation of
purple color, when either gene is homozygous recessive (cc or pp), that gene is epistatic to the other.
Epistasis is different from dominance. Epistasis is the interaction between different genes (non-alleles). Dominance
is the interaction between different alleles of the same gene, i.e. intraallelic.


Purple (12): AABB(1), AABb(2), AaBB(2), AaBb(4), AAbb(1), Aabb(2)
    These have at least one functional allele A and convert all the substrate to purple product.

Red (3): aaBB(1), aaBb(2)
    Lack any functional enzyme A, but have a functional enzyme B, which converts the substrate to a red product.

White (1): aabb(1)
    Have no functional enzymes and cannot synthesize any colored pigment.

6.3.2 Recessive epistasis


In the case of recessive epistasis, in a pair of non-allelic genes, one produces its phenotypic effect independently
in a dominant state, but another cannot produce a phenotypic effect independently. However, the latter can produce
its effect when they are together in dominant state. For example, A and B are two non-allelic genes and A can
produce a phenotypic effect independently in dominant state, but second gene B cannot produce a phenotypic
effect independently. In this case, the recessive genotype aa suppresses the expression of alleles at the B locus.
But, in the presence of dominant allele at the A locus, the alleles of the B locus express. Thus the genotypes A-B-
and A-bb produce two additional phenotypes. The 9 : 3 : 3 : 1 ratio becomes a 9 : 3 : 4 ratio.

Explanation : Let us take the following case, in which F2 phenotypic ratio is 9 Purple : 3 Red : 4 White.

Parents    Parent 1: AAbb (Red) × Parent 2: aaBB (White)
F1         AaBb (Purple)
F2         9 purple : 3 red : 4 white

In this example, the biochemical pathway would again be a simple chain, but the product of enzyme A would be red
in color.
                   Enzyme A                  Enzyme B
White substance  ------------>  Red product  ------------>  Purple product

Purple (9): AABB(1), AaBB(2), AABb(2), AaBb(4)
    have at least one functional copy of both A and B and therefore can synthesize the purple pigment.
Red (3): AAbb(1), Aabb(2)
    have only functional enzyme A and produce red pigment but do not convert it to purple pigment.
White (4): aaBB(1), aaBb(2)
    have no functional enzyme A and so cannot synthesize the red product that is the substrate for enzyme B
    and will remain white.
White (4, continued): aabb(1)
    have no functional enzymes and cannot synthesize the purple pigment.
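
The genotype counts and the 9 : 3 : 4 ratio for this pathway can be tallied from the sixteen combinations of an AaBb × AaBb cross. The sketch below assumes independent assortment of the two genes; the helper names `phenotype` and `genotype` are hypothetical and are used only for this illustration.

```python
from collections import Counter
from itertools import product

def phenotype(alleles):
    """Classify by the pathway: white -(enzyme A)-> red -(enzyme B)-> purple."""
    if "A" not in alleles:
        return "white"          # aa blocks the first step, whatever B is
    return "purple" if "B" in alleles else "red"

def genotype(g1, g2):
    """Normalise e.g. gametes 'aB' + 'Ab' to 'AaBb' (dominant allele first)."""
    gene_a = "".join(sorted(g1[0] + g2[0]))
    gene_b = "".join(sorted(g1[1] + g2[1]))
    return gene_a + gene_b

gamete_types = ["AB", "Ab", "aB", "ab"]   # gametes of an AaBb parent
pairs = list(product(gamete_types, repeat=2))

phenotype_counts = Counter(phenotype(g1 + g2) for g1, g2 in pairs)
genotype_counts = Counter(genotype(g1, g2) for g1, g2 in pairs)

print(phenotype_counts["purple"], phenotype_counts["red"], phenotype_counts["white"])  # 9 3 4
print(genotype_counts["AAbb"], genotype_counts["Aabb"])  # 1 2
print(genotype_counts["aaBB"], genotype_counts["aaBb"])  # 1 2
```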

6.3.3 Duplicate recessive epistasis


If two non-allelic genes are involved in a specific pathway and functional products from both are required for
expression, then a homozygous recessive genotype at either locus results in the mutant phenotype. In
such a case, the genotypes aaBB, aaBb, AAbb, Aabb and aabb produce one phenotype and the genotypes AABB, AaBB, AABb and
AaBb produce another phenotype (9 : 7). Because both dominant alleles complement each other for the correct
phenotype, these non-allelic genes are called complementary genes. Hence, this interaction is also termed
complementary gene interaction.


6.3.4 Duplicate dominant interaction


If the dominant alleles of both gene loci produce the same phenotype without cumulative effect, the 9 : 3 : 3 : 1 ratio is
modified into a 15 : 1 ratio. Duplicate gene interaction allows dominant alleles of either duplicate gene to produce the
wild-type phenotype. Only organisms that are homozygous recessive for both genes have a mutant phenotype.
The mechanism by which wheat kernel color is determined is an example of duplicate gene action. In wheat, kernel
color is dependent upon a biochemical reaction that converts a colorless precursor substance into a colored product,
and this reaction can be performed with the product of either gene A or gene B. Thus, having either an A allele or
a B allele produces color in the kernel, but a lack of both dominant alleles produces a white kernel that is devoid of color.
So, if two plants with genotype AaBb are crossed with each other, the genotypes AABB, AABb, AaBB, AaBb, AAbb,
Aabb, aaBB and aaBb produce the color phenotype and the genotype aabb produces no color. In this cross, whenever
a dominant allele is present at either locus, the biochemical conversion occurs, and a colored kernel results. Thus,
only the double homozygous recessive genotype produces a phenotype with no color, and the resulting phenotypic
ratio of color to noncolor is 15 : 1.
                 Enzyme A (Product of gene A)
Precursor     ---------------- or ---------------->     Product
(Colorless)      Enzyme B (Product of gene B)           (Colored)
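
The 15 : 1 ratio can also be checked with the product rule. The worked equation below assumes that the two genes assort independently, so that each behaves as a separate Aa × Aa (or Bb × Bb) monohybrid cross:

```latex
P(aabb) = P(aa)\times P(bb) = \tfrac{1}{4}\times\tfrac{1}{4} = \tfrac{1}{16},
\qquad
P(\text{colored}) = 1 - P(aabb) = \tfrac{15}{16}
```

The colored and colorless classes therefore occur in the proportion 15/16 : 1/16, that is 15 : 1.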

6.3.5 Dominant and recessive interaction


Dominant and recessive interaction is similar to dominant epistasis but occurs when a dominant allele of one gene
completely suppresses the phenotypic expression of alleles of another gene. This type of epistasis is sometimes
called dominant suppression, because the deviation from 9 : 3 : 3 : 1 is caused by a single allele that produces a
dominant phenotype.
For example, in the Primula plant, the pigment malvidin creates blue-colored flowers. Synthesis of malvidin is controlled
by gene A, yet production of this pigment can be suppressed by the non-allelic gene B. In this case, the dominant allele B is epistatic
to gene A, so plants with the genotype AaBb will not produce malvidin because of the presence of the B allele.
So, if two plants with genotype AaBb are crossed with each other, the genotypes AABB, AABb, AaBB, AaBb, aaBB,
aaBb and aabb produce the white color and the genotypes AAbb and Aabb produce blue color, giving a ratio of
13 white : 3 blue. In this case, the presence of the B allele suppresses the production of malvidin.

               Product of gene A
Precursor    -------------------->    Malvidin
(Colorless)          ×                (Colored)
               Product of gene B (blocks this step)
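
The proportion of blue-flowered plants in the AaBb × AaBb cross follows from the product rule. The sketch below assumes independent assortment of genes A and B; the constant names are illustrative.

```python
from fractions import Fraction

# Single-gene probabilities among offspring of Aa x Aa (or Bb x Bb).
P_DOMINANT = Fraction(3, 4)    # A- (or B-)
P_RECESSIVE = Fraction(1, 4)   # aa (or bb)

# Blue requires malvidin synthesis (A-) and no suppressor allele (bb).
p_blue = P_DOMINANT * P_RECESSIVE
p_white = 1 - p_blue

print(p_blue, p_white)    # 3/16 13/16
print(p_white / p_blue)   # 13/3
```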

Summary of different forms of gene interactions


Each gene pair affecting a different character
1. Complete dominance at both gene pairs:
Example : Pisum sativum
Phenotype classes Genotypes
9 yellow round AABB (1), AABb (2), AaBB (2), AaBb (4)
3 yellow wrinkled AAbb (1), Aabb (2)


6.6.3 Sex determination in plants


Sexually reproducing plant species may be 'sexually monomorphic' or 'sexually polymorphic'. In sexually monomorphic
condition, individual plants have both sexes – whether present within single flower (hermaphrodite) or in separate
male and female flowers (monoecious). A minority of plant species are 'sexually polymorphic', including dioecious
species. Dioecious species are the ones showing animal-like sexual dimorphism, with female plants bearing unisexual
flowers containing only carpels and male plants bearing unisexual flowers containing only stamens. Many, but not all,
dioecious plants have a non-identical pair of chromosomes associated with the sex determination. Of the species with
non-identical sex chromosomes, a large proportion have an XY system. For example, the dioecious plant Melandrium
album has 22 chromosomes per cell: 20 autosomes plus 2 sex chromosomes, with XX females and XY males.

6.6.4 Mosaicism
Mosaicism is a condition in which cells within the same individual have a different genetic makeup. Individuals
showing mosaicism are referred to as mosaics. Mosaicism can be caused by DNA mutations, epigenetic alterations
of DNA, chromosomal abnormalities (change in chromosome number and structure) and the spontaneous reversion
of inherited mutations. Mosaicism can be associated with changes in either nuclear or mitochondrial DNA. An
individual with two or more cell types, differing in chromosome number or structure is either a mosaic or a chimera.
If the two cell types originated from a single zygote, the individual is a mosaic; if they originated from two or
more zygotes that subsequently fused, the individual is a chimera.
Mosaicism can exist in both somatic cells (somatic mosaicism) and germ line cells (germline mosaicism). As their
names imply, somatic and germ line mosaicism refer to the presence of genetically distinct groups of cells within
somatic and germ line tissues, respectively. If the event leading to mosaicism occurs during development, it is
possible that both somatic and germ line cells will become mosaic. In this case, both somatic and germ line tissue
populations would be affected, and an individual could transmit the mosaic genotype to his or her offspring.
Conversely, if the triggering event occurs later in life, it could affect either a germ line or a somatic cell population.
If the mosaicism occurs only in a somatic cell population, the phenotypic effect will depend on the extent of the
mosaic cell population; however, there would be no risk of passing on the mosaic genotype to offspring. On the
other hand, if the mosaicism occurs only in a germ line cell population, the individual would be unaffected, but the
offspring could be affected.
How is somatic mosaicism generated? There are many possible reasons, including somatic mutations, epigenetic
changes in DNA, alterations in chromosome structure and/or number, and spontaneous reversal of inherited mutations.
In all of these cases, a given cell and those cells derived from it could exhibit altered function.

6.6.5 Sex-linked traits and sex-linked inheritance


In an XY-chromosomal system of sex determination, both X and Y-chromosomes are sex chromosomes. In general,
genes on sex chromosomes are described as sex linked genes. However, the term sex linked usually refers to loci
found only on the X-chromosome; the term Y-linked is used to refer to loci found only on the Y-chromosome, which
control holandric traits (traits found only in males).
Cytogeneticists have divided the X and Y-chromosomes of some species into homologous and non-homologous
regions. The latter are called differential regions. These differential regions contain genes that have no counterparts
on the other sex chromosome. Genes in the differential regions are said to be hemizygous (half zygous). Genes
in the differential region of the X show an inheritance pattern called X-linkage; those in the differential region of
the Y show Y-linkage. Genes in the homologous region show what might be called X-and-Y linkage.
Another important feature of sex linked genes in the XY-chromosomal system of sex determination is that, because females
have two X-chromosomes, they can have normal homozygous and heterozygous allelic combinations. But males,
with only one copy of the X-chromosome, can be neither homozygous nor heterozygous. Hence the term hemizygous
is used for X-linked genes in males. Since only one allele is present, a single copy of a recessive allele can determine


In X-linked inheritance, the pattern of inheritance for loci on the heteromorphic sex chromosome differs from the
pattern for loci on the homomorphic autosomal chromosomes, because sex chromosome alleles are inherited in
association with the sex of offspring. Alleles on a male’s X-chromosome go to his daughters, but not to his sons,
because the presence of his X-chromosome normally determines that his offspring is a daughter. The father thus
passes a trait to his daughters, who pass it to their sons. Hence, this pattern of inheritance is known as the criss-cross
pattern of inheritance. In Drosophila, eye color has nothing to do with sex determination, so we see that genes on
the sex chromosomes are not necessarily related to sexual function. The same is true in humans, for whom
pedigree analysis has revealed many X-linked genes, of which few could be construed as being connected to
sexual function.

6.6.6 Sex-limited traits


Sex hormones influence the action of certain genes. In some cases, a given genotype is so dependent on the
presence of these hormones that its expression is limited to one sex. The result is a sex-limited trait, which is
expressed in only one sex, although the genes are present in both sexes. Sex-limited traits are usually determined
by autosomal genes and primarily concerned with the secondary sexual characters. In humans, for example,
breast development is a trait that is normally limited to females, whereas beard growth is limited to males.

6.6.7 Sex-influenced traits


The sex-limited trait is an extreme example of how the expression of a gene can be controlled by hormones. In
other less extreme cases of sex controlled characteristics, only the dominance relationship of the two alleles is
affected. Characteristics of this type are known as sex-influenced (or sex-conditioned) traits, in which an allele is
dominant in one sex but recessive in the other. In humans, pattern baldness provides an example of
a sex-influenced trait. Pattern baldness is characterized by the premature loss of hair from the front and top of the
head. It is more common in males than in females. Women who have the genotype for pattern baldness typically
show only thinning of hair rather than a complete loss. The gene that causes pattern baldness is inherited as an
autosomal trait. When a male is heterozygous for the baldness allele, he will become bald.

Genotype    Phenotype
            Male        Female
BB          Bald        Bald
Bb          Bald        Non-bald
bb          Non-bald    Non-bald

In contrast, a heterozygous female will not be bald; only women who are homozygous for the baldness allele
develop the trait. The sex-influenced nature of pattern baldness appears to be related to the levels of the male sex
hormones.
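
The genotype-to-phenotype rule for pattern baldness can be written as a short lookup. This is a minimal sketch; the function name is an illustrative assumption, and B and b denote the baldness and non-baldness alleles as in the table above.

```python
def baldness_phenotype(genotype, sex):
    """Phenotype for pattern baldness (B = baldness allele, b = non-baldness allele)."""
    n_bald_alleles = genotype.count("B")
    if sex == "male":
        # B behaves as a dominant allele in males: one copy is enough.
        return "Bald" if n_bald_alleles >= 1 else "Non-bald"
    if sex == "female":
        # B behaves as a recessive allele in females: two copies are required.
        return "Bald" if n_bald_alleles == 2 else "Non-bald"
    raise ValueError("sex must be 'male' or 'female'")

# Only the heterozygote differs between the sexes.
heterozygote = (baldness_phenotype("Bb", "male"), baldness_phenotype("Bb", "female"))
```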

6.6.8 Pedigree analysis


A pedigree is a family tree or chart made of symbols and lines that represent a person’s genetic family history. In
a pedigree, symbols represent people and lines represent genetic relationships. The pedigree is a visual tool for
documenting the biological relationships in families and determining the mode of inheritance (dominant, recessive etc.)
of genetic diseases. Pedigrees are most often constructed by medical geneticists or genetic counselors. A sample
pedigree is given below:


6.9 Cytogenetics
A chromosome is an organized structure of DNA and protein that is found in the nucleus of a eukaryotic cell. The
study of the structure, function and abnormalities of chromosomes is called cytogenetics, a discipline that combines
cytology with genetics.

6.9.1 Human karyotype


The number, sizes and shapes of the metaphase chromosomes constitute the karyotype or karyogram, which is
distinctive for each species. The useful karyotypic characteristics are: chromosome size, chromosome number, sex
chromosomes, centromere position, nucleolar organizer position, heterochromatin pattern, secondary constriction
and banding patterns. A karyotype presented as a photograph or diagram of all the metaphase chromosomes arranged
in homologous pairs according to decreasing length and position of centromere is described as an idiogram.

Table 6.6 Symbols used in describing a karyotype

Symbol       Meaning
p (petit)    Short arm
q (queue)    Long arm
13p          Short arm of chromosome 13
13q          Long arm of chromosome 13
del          Deletion
del(2)       Deletion in chromosome 2
dup          Duplication
dup(1)       Duplication in chromosome 1
inv          Inversion
inv(4)       Inversion in chromosome 4
t            Translocation
t(2;5)       Reciprocal translocation between a chromosome 2 and a chromosome 5
tel          Telomere
cen          Centromere
+ or –       Indicate gain or loss of part of a chromosome
2q–          Deletion of the long arm of chromosome 2

Tjio and Levan (1956), working in Sweden, found that human cells have 23 pairs, or 46 chromosomes. Of the 23 pairs, 22 are
perfectly matched in both males and females, and are called autosomes. The remaining pair, the sex chromosomes,
consists of two similar chromosomes in females and two dissimilar chromosomes in males. In humans, females are
designated XX and males XY. The largest autosome is number 1, and the smallest is number 21.

Denver system
According to the ‘Denver system’ of classification, the human chromosomes (22 pairs of autosomes plus X and Y) are placed in seven groups as follows:

Group      Position of centromere            Idiogram number
I (A)      Metacentric or submetacentric     1, 2, 3
II (B)     Submetacentric                    4, 5
III (C)    Submetacentric                    6, 7, 8, 9, 10, 11, 12 and X
IV (D)     Acrocentric                       13, 14 and 15
V (E)      Metacentric or submetacentric     16, 17 and 18
VI (F)     Metacentric                       19 and 20
VII (G)    Acrocentric                       21, 22 and Y
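
The group assignments in the table can be held in a simple lookup. The sketch below encodes only the chromosome-to-group mapping listed above; the variable and function names are illustrative assumptions.

```python
# Denver groups and their member chromosomes, as listed in the table above.
DENVER_GROUPS = {
    "A": ["1", "2", "3"],
    "B": ["4", "5"],
    "C": ["6", "7", "8", "9", "10", "11", "12", "X"],
    "D": ["13", "14", "15"],
    "E": ["16", "17", "18"],
    "F": ["19", "20"],
    "G": ["21", "22", "Y"],
}

# Invert the table so that each chromosome name points to its group letter.
CHROMOSOME_TO_GROUP = {
    chromosome: group
    for group, members in DENVER_GROUPS.items()
    for chromosome in members
}

def denver_group(chromosome):
    """Return the Denver group letter for a chromosome given as 1-22, 'X' or 'Y'."""
    return CHROMOSOME_TO_GROUP[str(chromosome).upper()]
```

For example, `denver_group(21)` and `denver_group("Y")` both fall in group G, while `denver_group("X")` falls in group C.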


Figure 6.37 The karyotype of a human male and female. (Panels: Male, chromosome pairs 1–22 and XY; Female, chromosome pairs 1–22 and XX.)

6.9.2 Chromosome banding


Chromosome banding is a cytological procedure of differential staining of mitotic chromosomes along the longitudinal
axis. The differential staining reactions reflect the heterogeneity and complexity of the chromosome along its
length. The molecular mechanisms involved in producing the various banding patterns are not precisely defined.
Chromosome painting is different from banding. It refers to the hybridization of fluorescently labeled,
chromosome-specific, composite probe pools to chromosomes.
The most common methods of dye-based chromosome banding are G- (Giemsa), R- (reverse), C- (centromere)
and Q- (quinacrine) banding. Bands that show strong staining are referred to as positive bands; weakly staining
bands are negative bands. Features of commonly used banding techniques are described in Table 6.7.

Table 6.7 Chromosome banding techniques

Technique    Procedure                                               Banding pattern
G-banding    Mild proteolysis with trypsin followed by staining      Dark bands are AT-rich (low gene density);
             with Giemsa (G stands for Giemsa).                      pale bands are GC-rich (high gene density).
R-banding    Heat denaturation followed by staining with Giemsa.     Dark bands are GC-rich;
             Reverse of G-banding (R stands for Reverse).            pale bands are AT-rich.
Q-banding    Staining with quinacrine mustard (a fluorescent         Brightly fluorescing bands are AT-rich;
             stain). Q stands for Quinacrine.                        dull bands are GC-rich.
C-banding    Denaturation with barium hydroxide followed by          Dark bands contain constitutive
             staining with Giemsa. C stands for Constitutive         heterochromatin.
             heterochromatin.

Regions, bands and sub-bands


A region is an area that lies between two landmarks. Regions are divided into bands. A band is that part of a
chromosome that is distinctly different from the adjacent area by virtue of being lighter or darker in staining
intensity. Each band is approximately 5 to 10 megabase pairs of DNA that may include hundreds of genes. The


Molecular genetics

6.10 Genome
The genome is the sum total of all genetic material of an organism, which stores its biological information. The
genome may be either DNA or RNA. All eukaryotes and prokaryotes have a DNA genome, but viruses may
have either a DNA genome or an RNA genome. The eukaryotic genome consists of two distinct parts: the nuclear genome
and the organelle (mitochondrial and chloroplast) genomes. The nuclear genome consists of linear dsDNA. In a few
lower eukaryotes, double-stranded circular plasmid DNA (for example, the 2-micron circle in yeast) is also present
within the nucleus.
The amount of DNA present in the genome of a species is called the C-value, which is characteristic of each species.
The value ranges from <10^6 bp, as in the smallest prokaryote, Mycoplasma, to more than 10^11 bp for eukaryotes such
as amphibians. The genomes of higher eukaryotes contain a large amount of DNA.

Figure 6.48 The DNA content of the haploid genome of a range of phyla. The range of values within a phylum
is indicated by the shaded area. (Groups plotted: flowering plants, mammals, reptiles, birds, amphibians, fish,
echinoderms, insects, worms, algae and fungi. Horizontal axis: size of eukaryotic haploid genome in base pairs, 10^6 to 10^11.)

The DNA content of the organism’s genome is related to the morphological complexity of lower eukaryotes, but
varies extensively among the higher eukaryotes. In lower eukaryotic organisms like yeast, the amount of DNA increases
with increasing complexity of organisms. However, in higher eukaryotes there is no correlation between increased
genome size and complexity. This lack of correlation between genome size and genetic complexity is referred to as the
C-value paradox. For example, humans are more complex than amphibians in terms of genetic development, but
some amphibian cells contain 30 times more DNA than human cells. Moreover, the genomes of different species of
amphibians can vary 100-fold in their DNA contents.


Table 6.9 Genome size in some eukaryotes

Organism                        Genome size (Mb)
S. cerevisiae (yeast)           12
A. thaliana (mustard plant)     120
D. melanogaster (fruit fly)     170
H. sapiens (human)              3,300
H. vulgare (barley)             5,300

6.10.1 Genome complexity


Genome complexity is the total length of different sequences of DNA. It can be measured through the renaturation
kinetics of denatured DNA. Renaturation of DNA occurs through complementary base pairing. Renaturation of DNA
depends on the random collision of the complementary strands, and follows second-order kinetics. A DNA renaturation
(reassociation) reaction is described by the Cot1/2. If large DNA is sheared into uniform fragments and allowed to
renature, then the rate of renaturation of denatured DNA is expressed as

$$\frac{dC}{dt} = -kC^{2}$$

where k is the second-order rate constant. C is the concentration of single-stranded DNA at time t and the second
order rate equation for two complementary strands coming together is given by the rate of decrease in C.

Starting with a concentration, C0, of completely denatured DNA at t=0, the amount of single-stranded DNA remaining
at some time t is
$$\frac{C}{C_0} = \frac{1}{1 + k C_0 t}$$

The time for half of the DNA to renature (when C/C0 = 0.5) is defined as t = t1/2. Then,

$$0.5 = \frac{1}{1 + k C_0 t_{1/2}}$$

and thus $1 + k C_0 t_{1/2} = 2$, yielding

$$C_0 t_{1/2} = \frac{1}{k}$$

The product of C0 × t1/2 is called the Cot1/2. It is inversely proportional to the rate constant. Since the Cot1/2 is the
product of the concentration and time required to proceed halfway, a greater Cot1/2 implies a slower reaction. The
renaturation of DNA usually is followed in the form of a Cot curve. A graph of the fraction of single-stranded DNA
reannealed (1 – C/C0) as a function of Cot on a semilogarithmic plot is referred to as a Cot curve.
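
The relations above are easy to check numerically. The following is a minimal sketch; the function names and the value of the rate constant k are illustrative assumptions, not values taken from any experiment.

```python
def fraction_single_stranded(cot, k):
    """C/C0 = 1 / (1 + k * C0 * t), where cot stands for the product C0 * t."""
    return 1.0 / (1.0 + k * cot)

def fraction_reassociated(cot, k):
    """Fraction of DNA reannealed, 1 - C/C0, the quantity plotted on a Cot curve."""
    return 1.0 - fraction_single_stranded(cot, k)

def cot_half(k):
    """Cot value at which half of the DNA has renatured: C0 * t1/2 = 1/k."""
    return 1.0 / k

k_example = 2.0  # hypothetical second-order rate constant
halfway = fraction_reassociated(cot_half(k_example), k_example)

# Points of a Cot curve on a logarithmic Cot axis (10^-3 to 10^3).
curve = [(10.0 ** e, fraction_reassociated(10.0 ** e, k_example)) for e in range(-3, 4)]
```

By construction `halfway` equals 0.5, and a smaller k gives a larger Cot1/2, that is, a slower reassociation.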

Figure 6.49 Cot curve of dsDNA from the indicated source. (Vertical axis: fraction reassociated, 0 to 100%. Horizontal
axis: Cot, 10^–6 to 10^2. Sources, with genome size and Cot1/2: poly U:poly A, 1 bp, 2×10^–6; MS2, 3500 bp, 8×10^–3;
T4, 1.7×10^5 bp, 3×10^–1; E. coli, 4.2×10^6 bp, 9.)


6.10.10 Yeast S. cerevisiae genome


The yeast genome consists of 16 linear chromosomes, each containing a centromeric region required for chromosome
segregation. The nucleotide sequence of the entire S. cerevisiae genome has been determined and found to contain
12,068 kb of DNA. Sequence analysis has identified 5885 potential protein coding genes and another 455 RNA
coding genes (rRNA, snRNA and tRNA genes). Almost 70% of the yeast genome is devoted to protein coding
sequences. Interestingly, unlike most other eukaryotic genes, only about 4% of the roughly 6000 yeast genes have
introns, and even then, most of these genes contain only a single intron within the coding sequence.
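
The figures quoted above imply a very compact genome, which a little arithmetic makes explicit. This is a rough sketch that treats "almost 70%" as exactly 0.70 and "about 6000" as exactly 6000; the variable names are illustrative.

```python
GENOME_KB = 12_068        # sequenced S. cerevisiae genome, in kb
PROTEIN_GENES = 5_885     # potential protein coding genes
CODING_FRACTION = 0.70    # "almost 70%", taken here as exactly 0.70 (assumption)
INTRON_FRACTION = 0.04    # "about 4%" of genes have introns
APPROX_GENES = 6_000      # "about 6000" genes

# Average genome length available per protein coding gene.
kb_per_gene = GENOME_KB / PROTEIN_GENES

# Average protein coding sequence per gene implied by the coding fraction.
mean_coding_kb = CODING_FRACTION * GENOME_KB / PROTEIN_GENES

# Approximate number of intron-containing genes.
genes_with_introns = round(INTRON_FRACTION * APPROX_GENES)
```

Under these assumptions there is roughly one protein coding gene for every 2 kb of DNA, about 1.4 kb of which is coding sequence, and only a few hundred genes carry an intron.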

6.10.11 E. coli genome


The E. coli genome comprises a single main chromosome and plasmids. The main chromosome is made up of circular
dsDNA with a homogeneous distribution of genes. Computer analysis of the E. coli DNA sequence identified 4288
actual and proposed gene-coding sequences. It was found that approximately 88% of the genome encodes proteins
or RNAs, ~11% appears to be utilized for gene regulatory functions, and <1% consists of repetitive DNA sequences.
The average distance between E. coli genes is only 120 bp.

6.11 Eukaryotic chromatin and chromosome


Chromatin is an organized complex of DNA and protein that is found in the nucleus of eukaryotic cells. Each chromatin
fiber contains a single dsDNA molecule in coiled and condensed form. Chromatin and chromosomes are basically the same thing. The
difference is that chromatin is less condensed, extended DNA, while chromosomes are highly condensed DNA. The
word chromosome comes from the Greek words chroma (color) and soma (body), owing to their property of being very
strongly stained by particular dyes. The extent of chromatin condensation varies during the life cycle of the cell. In
non-dividing cells as well as in interphase, most of the chromatin remains relatively decondensed. The
light-staining, less condensed portions of chromatin are termed euchromatin. The darkly stained and highly condensed
regions of chromatin are termed heterochromatin. In interphase nuclei, chromatin appears to be attached to a
nuclear matrix, a proteinaceous structure. DNA sequences attached to the nuclear matrix are called MARs (matrix
attachment regions). MARs are usually ~70% A·T-rich, but lack any consensus sequence. A chromatin DNA molecule
contains three specific nucleotide sequences: Centromere, Telomere and Origin of replication.

Centromere
The centromere is a constricted region of a eukaryotic chromatin/chromosome where the kinetochore is assembled
and sister chromatids are held together. Although this constriction is termed as centromere, it is usually not located
exactly in the center of the chromosome and, in some cases, is located almost at the chromosome’s end. The
regions on either side of the centromere are referred to as the chromosome’s arms. Kinetochore associated with
the centromere is a complex of proteins where spindle fibers attach to the chromosome during mitosis/meiosis and
help in the proper segregation of sister chromatids or homologous chromosomes. The centromere has no defined
DNA sequence. It typically consists of large arrays of tandemly repeated DNA sequences. In humans, the centromeric
sequences are made up of a 171 bp repeating unit and are called alphoid DNA. In the yeast Saccharomyces cerevisiae,
the centromeric sequence (CEN) is about 110 bp long and consists of three types of sequence element:

• CDE-I - 9 bp sequence;
• CDE-II - >90% A·T-rich sequence of 80–90 bp;
• CDE-III - 11 bp highly conserved sequence.

[Figure: organization of the yeast CEN sequence. CDE-I: TCACATGAT (complementary strand AGTGTACTA); CDE-II: 80–90 bp,
>90% (A+T); CDE-III: TGATTTCCGAA (complementary strand ACTAAAGGCTT).]


6.11.4 Polytene chromosomes


Polytene chromosomes (also known as giant chromosomes) were discovered by Balbiani in 1881 in larval salivary
glands of Chironomus. Polytene chromosomes are specialized interphase chromosomes present in certain insect
cells. Cells with polytene chromosomes differ from mitotically dividing cells. These cells undergo repeated rounds
of DNA replication without cell division (endomitosis). In this case the cell cycle consists of just two periods,
synthetic and intersynthetic. At the end of each replication period, daughter chromatids do not segregate, rather,
they remain paired with each other to different degrees.
Polyteny has been most studied in the salivary gland cells of Drosophila larvae, in which the DNA in each of the four
Drosophila chromosomes has been replicated through 10 cycles without separation of the daughter chromosomes,
so that 1024 (2^10) identical strands of chromatin are lined up side by side. When polytene chromosomes are viewed
in the light microscope after staining, distinct alternating dark bands and light interbands are visible. About 95%
of the DNA in polytene chromosomes is in bands, and about 5% is in interbands. The chromatin in each band
appears dark, either because it is much more condensed than the chromatin in the interbands, or because it
contains a higher proportion of proteins, or both. Both bands and interbands in polytene chromosomes contain
genes. There are approximately 5000 bands and 5000 interbands in the complete set of Drosophila polytene
chromosomes. Bands that are sites of gene expression expand to give chromosome puffs (Balbiani rings). A puff
consists of a region in which the chromatin fibers unwind from their usual state of packing in the band. The puffs are
sites where RNA is being synthesized. A characteristic pattern of puffs is found in each tissue at any given time.
Organs containing cells with polytene chromosomes are, as a rule, involved in intense secretory functions
accomplished during a short time against a background of rapid growth. The features of polyteny provide the
conditions necessary to accomplish these functions.
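
The strand count quoted for Drosophila follows directly from repeated doubling without separation. A minimal sketch (the function name is an illustrative assumption):

```python
def polytene_strands(replication_cycles):
    """Number of chromatin strands lying side by side after the given number of
    replication cycles in which the daughter strands do not separate."""
    return 2 ** replication_cycles

# Strand count after each of the first ten cycles: 2, 4, 8, ...
strand_counts = [polytene_strands(n) for n in range(1, 11)]
```

After 10 cycles the count reaches 1024, the value cited for the salivary gland chromosomes.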

Figure 6.78 A light micrograph of polytene chromosomes present in Drosophila salivary glands. Each parental
chromosome is tightly paired with its homologue. All the chromosomes are linked together by the pericentromeric
region to create a single chromocenter. Under light microscope, distinct alternating dark bands and light
bands (known as interbands) are visible. (Labels: X-chromosome; right and left arms of chromosome 2; right and
left arms of chromosome 3; chromocenter.)

6.11.5 Lampbrush chromosomes


Lampbrush chromosomes were first observed by Flemming in 1882 in amphibian oocytes. They develop during the diplotene
stage of meiotic prophase during oogenesis in oocytes of many animal species (except mammals). The lampbrush
chromosomes are meiotic bivalents, each consisting of two pairs of sister chromatids held together by chiasmata.


The loop shown in Figure 6.79 is an extruded segment of DNA that is being actively transcribed. The lateral
loops extend in pairs, one from each sister chromatid. The loops are surrounded by a matrix of ribonucleoproteins
that contain nascent RNA chains. Lampbrush chromosomes are thought to assist in fulfilling the high demand for
transcripts during oogenesis.

Figure 6.79 Lampbrush chromosome structure. Most of the DNA in each chromosome remains highly
condensed in the chromomeres. Each of the two chromosomes shown consists of two closely apposed sister
chromatids. This four stranded structure is characteristic of diplotene stage of meiosis. (Labels: maternal
chromosome, paternal chromosome, chromomere, chromatin loop, sister chromatids; inset: enlarged section of a chromosome.)

6.11.6 B-chromosomes
The B-chromosomes (also referred to as supernumerary or accessory chromosomes) are additional (extra)
chromosomes that are present in some individuals in some species. In eukaryotic cells normal chromosomes are
termed as A-chromosomes. Most B-chromosomes are mainly or entirely heterochromatic and genetically inert.
They are thought to be selfish genetic elements with no defined functions. The evolutionary origin of B-chromosomes
is not clear, but presumably they must have been derived from heterochromatic segments of normal A-chromosomes.

6.12 DNA replication


Transmission of chromosomal DNA from generation to generation is crucial to cell propagation. This can only be
achieved when chromosomal DNA is accurately replicated, providing two copies of the entire genome for faithful
distribution into each daughter cell.


6.12.1 Semiconservative replication


It is crucial that the genetic material is reproduced accurately. When Watson and Crick worked out the double-helix
structure of DNA in 1953, they recognized that the complementary nature of the two strands - A paired with T and
G paired with C - might play an important role in its replication. Because the two polynucleotide strands are joined
only by hydrogen bonds, they are able to separate without requiring breakage of covalent bonds. If the two strands
of a parental double helix of DNA are separated, the base sequence of each parental strand could serve as a
template for the synthesis of a new complementary strand, producing two identical progeny double helices. This
process is called semiconservative replication because the parental double helix is half conserved, each parental
single strand remaining intact. The alternative models are conservative and dispersive. In conservative replication,
the whole original double helix acts as a template for a new one; one daughter molecule would consist of the
original parental DNA, and the other daughter would be totally new DNA. In dispersive replication, some parts of
the original double helix are conserved, and some parts are not. In this model, the parental double-stranded helix
is broken into double-stranded DNA segments and, as in the conservative mode of replication, new
double-stranded DNA segments are synthesized.

A. Conservative model B. Semiconservative model C. Dispersive model

Figure 6.80 A. In the conservative model, after one round of replication two daughter dsDNA molecules form,
of which one contains both parental DNA strands and the other contains
two newly synthesized DNA strands. B. In the semiconservative model, the two parental DNA strands separate
and each of those strands then serves as a template for the synthesis of a new DNA strand. The result is two
DNA double helices, both of which consist of one parental and one new strand. C. In the dispersive model, the
parental double helix is broken into double-stranded DNA segments. The segments then reassemble into
complete DNA double helices, each with parental and newly synthesized dsDNA segments interspersed.
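
The three models make different predictions about how parental material is distributed among daughter molecules, and these can be tabulated with a toy simulation. The sketch below is a simplification under stated assumptions: each strand is reduced to the fraction of parental ("heavy") material it carries, the dispersive model is taken to split parental material evenly between the two daughters, and all names are illustrative.

```python
from collections import Counter

def replicate(duplexes, model):
    """One round of replication with unlabeled precursors.

    A duplex is a pair of numbers: the parental (heavy) fraction of each strand.
    """
    daughters = []
    for s1, s2 in duplexes:
        if model == "semiconservative":
            # Each old strand pairs with one entirely new strand.
            daughters += [(s1, 0.0), (s2, 0.0)]
        elif model == "conservative":
            # The old duplex stays intact; the other daughter is entirely new.
            daughters += [(s1, s2), (0.0, 0.0)]
        elif model == "dispersive":
            # Old material is shared evenly between both daughters (assumption).
            share = (s1 + s2) / 4
            daughters += [(share, share), (share, share)]
        else:
            raise ValueError("unknown model: " + model)
    return daughters

def density_classes(model, generations):
    """Map parental fraction per duplex -> share of all duplexes, starting from one fully parental duplex."""
    duplexes = [(1.0, 1.0)]
    for _ in range(generations):
        duplexes = replicate(duplexes, model)
    counts = Counter((s1 + s2) / 2 for s1, s2 in duplexes)
    return {fraction: n / len(duplexes) for fraction, n in sorted(counts.items())}
```

Comparing `density_classes(model, 1)` and `density_classes(model, 2)` across the three models shows where they diverge: after one round the semiconservative and dispersive models both give a single intermediate class, whereas the conservative model gives two classes; after two rounds the semiconservative model alone gives equal intermediate and fully new classes.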

Meselson and Stahl experiment


Meselson and Stahl experimentally demonstrated the semiconservative replication of DNA in E. coli in 1958. They
grew E. coli cells in a medium in which the sole nitrogen source was 15N-labeled ammonium chloride (15N is a heavy
isotope of nitrogen). The 15N-containing E. coli cell culture was then transferred to a 14N medium and allowed to
continue growing (14N is a light isotope of nitrogen). Samples were harvested at regular intervals. The DNA was
extracted and its buoyant density determined by centrifugation in CsCl density gradients. The isolated DNA showed
a single band in the density gradient, midway between the light 14N-DNA and the heavy 15N-DNA bands. After two


Topoisomerase
A DNA topoisomerase is a nuclease that breaks a phosphodiester bond in a DNA strand. This reaction is reversible,
and the phosphodiester bond reforms as the enzyme leaves. The first DNA topoisomerase was discovered by
James Wang in 1971 from E. coli. There are several types of topoisomerases present in eukaryotes and
prokaryotes. All topoisomerases can be classified into two classes– type I and type II, depending on whether
they cleave one or two strands of DNA, respectively. Type I topoisomerases cleave one DNA strand and pass
the other strand through the break before resealing it, while type II topoisomerases cleave both DNA strands and
pass another double strand through the break, followed by resealing of the double-strand break. Enzymes with
an odd Roman numeral after their name (for example, topo I and topo V) fall into the type I class, whereas
those with an even Roman numeral after their name are type II. Type I topoisomerases do not require ATP for
activity; the reaction is driven by the energy stored in the supercoiled DNA. So far, the only exception is reverse
gyrase, which introduces positive supercoils with the aid of ATP hydrolysis. Type II topoisomerases also do not
require an external source of energy for the cleavage and religation during reaction, but they do utilize ATP
hydrolysis to drive conformational changes in the protein during the reaction cycle.
All topoisomerases contain a nucleophilic tyrosine, which they use to promote strand cleavage. The tyrosyl
oxygen attacks and breaks a phosphodiester bond, at the same time forming a covalent phosphotyrosine
bond. Rejoining of the DNA strand occurs by a second transesterification reaction, which is basically the reverse
of the first.
Type I topoisomerases operate by forming a transient phosphotyrosine covalent bond with one end of the
broken DNA strand, either the 5' or the 3' end, followed by passage of the unbroken strand through the break,
and ultimately resealing of the break. These topoisomerases can be further divided into two subfamilies: Type
IA and Type IB topoisomerases. During DNA cleavage, type IA topoisomerases covalently bind the 5’-phosphate,
whereas type IB enzymes form a covalent bond with the 3’-phosphate. Type IA topoisomerases pass a single-stranded
DNA segment through a transient break in a second single DNA strand. On the contrary, type IB topoisomerases
nick one DNA strand, allowing one duplex end to rotate with respect to the other around the remaining
phosphodiester bond.

Figure 6.85 Type IA topoisomerases effect topological changes in DNA through a ‘strand passage’
mechanism, in which one strand of dsDNA is cleaved and the second DNA strand is passed through the gap.
After passage of the second DNA strand, the broken strand is resealed. Type IB topoisomerases effect supercoil
relaxation by nicking a single strand of dsDNA and allowing one DNA strand end to rotate with respect to the
other around the intact phosphodiester bond on the opposing strand. (Panels: Type IA and Type IB, each showing
dsDNA with the active-site Tyr covalently linked to the phosphate (P) at the break and a free hydroxyl (HO) end.)

Type IA topoisomerases comprise three distinct classes– eubacterial Topo IA, eubacterial and eukaryotic Topo III
and eubacterial and archaeal reverse gyrase. These enzymes are primarily responsible for relaxing positively
or negatively supercoiled DNA, except for reverse gyrase, which can introduce positive supercoils into DNA.
Type IB topoisomerases appear to be represented by a single family member (Topo IB).


Type II topoisomerases make a transient double-strand break in the DNA helix and form a covalent linkage
to both strands of the DNA helix at the same time. During catalysis, the enzyme introduces a double-strand
break in one DNA segment, termed the G-segment, and passes a second DNA segment, termed the T-segment, through the
transient break.

Figure 6.86 General mechanism of type II topoisomerases. Type II topoisomerases cleave both strands of
a dsDNA and pass another dsDNA through the transient break. The enzyme binds to and bends the G-segment.
The binding of two ATP molecules allows the opening of the DNA gate and passage of the T-segment. The
T-segment is dissociated from the bottom of the enzyme, leading to unidirectional strand-passage. Hydrolysis
of the ATP molecules dissociates the ATPase domains and resets the enzyme. (Labels: G-segment, T-segment,
bound ATP; 2 ATP converted to 2 ADP + 2 Pi.)

There are two subfamilies of type II topoisomerases– type IIA topoisomerases and type IIB topoisomerases.
Type IIA topoisomerases are found throughout all cellular organisms, as well as in some viruses and can be
divided into three classes– eukaryotic topoisomerase II (topo II), bacterial topoisomerase IV (topo IV) and
bacterial and archaeal DNA gyrase. Type IIB topoisomerases include topo VI from plants and homologues of
Spo11 present in Saccharomyces cerevisiae. Both type IIA and type IIB topoisomerases use a duplex strand
passage mechanism and have the same ATPase and cleavage domains but differ in overall tertiary structure.
Both type I and type II topoisomerases change the linking number of DNA. Type IA topoisomerases change the
linking number by one and type IB topoisomerases change the linking number by any integer, while type IIA and
type IIB topoisomerases change the linking number by two.
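
The step sizes above constrain which linking-number changes each class can bring about. The sketch below only does this bookkeeping: it ignores the direction of the change, energetics and ATP requirements, and the names used are illustrative assumptions.

```python
# Linking-number step per catalytic event; None means any integer (type IB).
LK_STEP = {"IA": 1, "IB": None, "IIA": 2, "IIB": 2}

def min_events(topo_type, lk_start, lk_target):
    """Minimum number of catalytic events needed to change Lk from lk_start
    to lk_target, or None if the change is impossible for this enzyme class."""
    diff = abs(lk_target - lk_start)
    if diff == 0:
        return 0
    step = LK_STEP[topo_type]
    if step is None:
        return 1               # a single nick-and-rotate event can span any integer
    if diff % step != 0:
        return None            # e.g. a type II enzyme cannot produce an odd change
    return diff // step

# Hypothetical example: a change in linking number of 6.
events_needed = {t: min_events(t, 20, 26) for t in LK_STEP}
```

In this bookkeeping a change of 6 takes six type IA events, three type II events, or in principle a single type IB event, while an odd change cannot be produced by a type II enzyme alone.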

Problem

Some viruses, like SV40, are closed circular DNAs carrying nucleosomes. If the SV40 virus is treated with
topoisomerase, and the histone is then removed, it is found to still be supercoiled. However, if histones are
removed before topoisomerase treatment, the DNA is relaxed. Explain.

Solution

The DNA makes 1.75 left-handed superhelical turns around each nucleosome. These turns are ‘constrained’ and
cannot be removed while the histones are still present. However, if the histones are first removed, the resulting
DNA will have unconstrained supercoils, which can be relaxed by topoisomerase.


6.14 DNA repair


Although genetic variation is important for evolution, the survival of the individual also demands genetic
stability. Maintaining genetic stability requires not only an extremely accurate mechanism for replicating DNA, but also
mechanisms for repairing the many accidental lesions that occur continually in DNA. Most such spontaneous changes
in DNA are temporary because they are immediately corrected by a set of processes that are collectively called
DNA repair. Without repair systems, a genome would not be able to maintain its essential cellular functions. Most
cells possess four different categories of DNA repair system: Direct repair, Excision repair, Mismatch repair and
Recombination repair.

6.14.1 Direct repair


Direct repair systems act directly on damaged nucleotides, converting each one back to its original structure. But
only a few types of damaged nucleotide can be repaired directly. One very common type of UV-induced
damage, the pyrimidine dimer, is repaired by a light-dependent direct system called photoreactivation. In
E. coli, the process involves the enzyme called DNA photolyase. When stimulated by light with a wavelength
between 300 and 500 nm, the enzyme binds to pyrimidine dimers and converts them back to the original monomeric
nucleotides. Photoreactivation is a widespread but not universal type of repair.

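[Figure: adjacent thymines (T T) in a DNA strand are converted by UV-B into a thymine dimer joined through the C5 and C6
positions; photolyase plus blue light restores the adjacent thymines.]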

Another example is the repair of O6-methylguanine, which forms in the presence of alkylating agents and is a
common and highly mutagenic lesion. It tends to pair with thymine rather than cytosine during replication. Direct
repair of O6-methylguanine is carried out by O6-methylguanine DNA methyltransferase (an alkyl transferase),
which catalyzes the transfer of the methyl group of O6-methylguanine to a specific Cys residue in the same protein.

[Figure: alkyl transferase removes the CH3 group from the O6 position of guanine in a G·C base pair, restoring unmodified guanine.]

6.14.2 Excision repair


Excision repair involves the excision of a segment of the polynucleotide containing a damaged site, followed by
resynthesis of the correct nucleotide sequence by a DNA polymerase. These pathways fall into two categories:

Base-excision repair

Base excision repair involves removal of a damaged nucleotide base, excision of a short piece of the polynucleotide
and resynthesis with a DNA polymerase. It is used to repair many kinds of minor damage, such as alkylation and deamination,
resulting from exposure to mutagenic agents. The enzyme DNA glycosylase initiates the repair process. A DNA


Introns in tRNA genes are unrelated and there is no consensus sequence that could be recognized by the splicing
enzymes. Thus splicing of tRNA depends principally on recognition of a common secondary structure in tRNA. All
the introns include a sequence that is complementary to the anticodon of the tRNA. The exact sequence and size of
the intron is not important.

Figure 6.144 Splicing of pre tRNA intron. (Steps shown: nuclease; phosphodiesterase + kinase; RNA ligase.
Labels: intron; 5'-OH and 5'-PO4(2–); 2'-3' PO4(2–) and 3'-OH.)

6.17 mRNA degradation


Prokaryotic mRNA

The average half-life of bacterial mRNAs is only about 1.5 minutes. Degradation of mRNA occurs in the 3’–5’
direction and is mediated by several endo- and exonucleases. No enzyme capable of RNA degradation in the 5’–3’
direction has yet been reported in bacteria. Endonucleases (RNase E and RNase III) make internal cuts in RNA
molecules, whereas exonucleases (RNase II and polynucleotide phosphorylase, PNPase) remove nucleotides
sequentially from the 3’ end of an mRNA. In E. coli, RNase E and PNPase along with RNA helicase are located within
a multiprotein complex called the degradosome.

Eukaryotic mRNA
The average half-life of eukaryotic mRNA is much longer than that of its bacterial counterpart, ranging from 10–20 minutes in
lower eukaryotes like yeast to several hours in mammals. For example, mRNAs encoding β-globin have half-lives of
more than 10 hours. In eukaryotes, there are several degradation pathways of mRNA which occur in both 3’–5’ and
5’–3’ directions. Cytosolic mRNAs are degraded by three different pathways– deadenylation dependent, deadenylation
independent and endonucleolytic pathway. Most eukaryotic mRNA degradation is deadenylation dependent. In this
degradation process, removal of the poly(A) tail occurs first. Removal of the tail is catalyzed by a deadenylase enzyme. The
deadenylated mRNA then may either (1) be decapped and degraded by a 5’–3’ exonuclease or (2) be degraded by a
3’–5’ exonuclease.
In the major 5’–3’ degradation pathway, deadenylation at the 3’ end triggers decapping at the 5’ end. This shows
that each end of the mRNA influences events that occur at the other end. Decapping reaction occurs by cleavage of
1–2 bases from the 5’ end. Removal of the cap triggers the 5’–3’ degradation pathway in which the mRNA is
degraded rapidly from the 5’ end, by the 5’–3’ exonuclease.
In the second degradation pathway, deadenylated mRNAs are degraded by the 3’–5’ exonuclease activity. Exosome,
which is related to degradosome, degrades the mRNA in the 3’ to 5’ direction. The exosome is also found in the
nucleus, where it degrades unspliced precursors to mRNA. In deadenylation independent pathway, mRNAs are
decapped and degraded by the 5’–3’ exonuclease. Some mRNAs are degraded by an endonucleolytic pathway that
does not involve decapping or deadenylation.
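
To put the half-lives quoted above in perspective, the following minimal Python sketch computes the fraction of an mRNA pool that remains after a given time. It assumes simple first-order (exponential) decay, which the text does not state explicitly; the function name `fraction_remaining`, the 10-minute observation window and the use of 10 hours as a lower bound for β-globin mRNA are illustrative choices.

```python
def fraction_remaining(elapsed_min: float, half_life_min: float) -> float:
    """Fraction of transcripts left after elapsed_min, assuming first-order decay."""
    if half_life_min <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (elapsed_min / half_life_min)


# Half-lives quoted in the text, converted to minutes.
# The beta-globin value is a lower bound ("more than 10 hours").
half_lives_min = {
    "bacterial mRNA (average)": 1.5,
    "yeast mRNA (10 min end of range)": 10.0,
    "beta-globin mRNA (at least 10 h)": 600.0,
}

for name, t_half in half_lives_min.items():
    left = fraction_remaining(10.0, t_half)
    print(f"{name}: {left:.3f} of the pool remains after 10 min")
```

Under this assumption, a pool of average bacterial mRNA is almost entirely gone within 10 minutes, whereas a long-lived mammalian mRNA is barely depleted over the same period.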


mRNA surveillance
mRNA surveillance is a conserved mRNA degradation mechanism utilized by organisms to ensure the fidelity and
quality of mRNA molecules. There are a number of surveillance mechanisms present within cells. The two most important
surveillance mechanisms are nonsense-mediated mRNA decay and nonstop-mediated mRNA decay.
Nonsense-mediated decay (NMD) is involved in the detection and decay of mRNA transcripts that contain premature
termination codons. This process plays an important role in checking that mRNAs have been properly synthesized
and are functional. A critical issue is how normal and aberrant mRNAs are distinguished and how that distinction leads to
differences in mRNA stability. NMD is a translation coupled mechanism that eliminates mRNAs containing premature
translation-termination codons. In eukaryotic cells, NMD requires both active mRNA translation and NMD-specific
trans-acting factors. In yeasts, three well-investigated trans-acting factors in NMD are the proteins encoded by the
UPF1, UPF2 and UPF3 genes. These genes are evolutionarily conserved, and their deletion prevents NMD. UPF1 is a
cytosolic protein that has a Cys-His-rich region at its N-terminus. It is a helicase that has RNA-dependent ATPase
and ATP-dependent 5’ to 3’ helicase activities. It interacts with translation release factors eRF1 and eRF3, providing
a direct link between the translation termination complex and the NMD machinery. In the cytoplasm, ribosomes
associate and translate the mRNA, but are stalled on encountering a premature termination codon. This results in
binding of factors such as UPF1, eRF1 and eRF3 to the ribosome. Subsequent steps that are still being elucidated
lead to mRNA decay.
A unique aspect of mammalian NMD is the involvement of the EJC (Exon Junction Complex), a complex of proteins
deposited at exon-exon junctions during mRNA splicing. In mammals, a premature termination codon is recognized
by its position relative to the last exon-exon junction. As a general rule, mammalian transcripts that contain a stop
codon more than ~50 nucleotides upstream of the last exon-exon junction will be subjected to NMD.
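
The positional rule stated above can be expressed as a short check. This is a minimal sketch, not a predictor of real NMD outcomes: the function name `predicts_nmd`, the coordinate convention (positions counted in nucleotides along the spliced mRNA) and the example coordinates are assumptions made for illustration.

```python
def predicts_nmd(stop_codon_end: int, exon_junctions: list[int], threshold: int = 50) -> bool:
    """Apply the ~50-nucleotide rule to a spliced mRNA.

    stop_codon_end -- position (nt along the spliced mRNA) of the last base of the stop codon
    exon_junctions -- positions of the exon-exon junctions along the spliced mRNA
    threshold      -- approximate distance cut-off quoted in the text (~50 nt)
    """
    if not exon_junctions:
        # No junctions means no exon junction complex downstream of the stop codon.
        return False
    last_junction = max(exon_junctions)
    # The stop codon must lie more than `threshold` nt upstream of the last junction.
    return (last_junction - stop_codon_end) > threshold


# Hypothetical transcript with exon-exon junctions at positions 120 and 450.
junctions = [120, 450]
print(predicts_nmd(300, junctions))  # stop codon 150 nt upstream of the last junction
print(predicts_nmd(420, junctions))  # stop codon only 30 nt upstream
print(predicts_nmd(500, junctions))  # stop codon in the last exon
```

Only the first transcript satisfies the rule; a stop codon close to, or downstream of, the last junction is treated as a normal termination codon.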
Nonstop-mediated decay is involved in the detection and decay of mRNA transcripts that lack an in-frame stop
codon. It is hypothesized that these transcripts are identified during translation when the ribosome arrives at the
3’-end of the mRNA and stalls. Presumably the ribosome stalling recruits additional cofactors and the exosome
complex. The exosome degrades the transcript.

6.18 Regulation of gene transcription


Prokaryotic transcriptional regulation is accomplished by gene regulatory proteins, which bind to regulatory
sequences located near the beginning of transcription units. Gene regulatory proteins, the products of regulatory
genes, act as activators or repressors. The binding of an activator protein to its target DNA site (located near the
promoter) increases the rate of transcription. This is referred to as positive regulation, because the
presence of an activator is required for an increased rate of transcription. The binding of a repressor protein to
its target DNA site (called the operator and located within or adjacent to the promoter) prevents a gene from being expressed.
Because binding of a repressor prevents gene expression, it is referred to as negative regulation. Thus, gene
regulation may be positive or negative: negative regulation is mediated by a repressor that blocks or turns off transcription,
and positive regulation is mediated by an activator that is required for an increased rate of transcription.
A typical bacterium contains several thousand genes. Some genes are essential to the life of the cell and hence
remain active at all times. They are expressed constitutively, meaning that they are expressed at
a reasonably constant rate and are not subject to regulation. These are referred to as housekeeping genes. In
multicellular eukaryotes, housekeeping genes need to be expressed in essentially all types of nucleated cells
because they encode key products required to fulfill general functions in all cells. The genes encoding the
enzymes of glycolysis are an example of housekeeping genes.

6.18.1 Operon model


The basic concept about how gene regulation occurs at the level of transcription in bacteria was provided by the
classical model called operon model (formulated by Jacob and Monod in 1961). An operon is a unit of bacterial gene
expression and regulation, which includes structural genes and regulatory sequences recognized by regulatory


6.21 RNA interference


RNA interference (abbreviated RNAi) is an evolutionarily conserved mechanism of gene regulation that is induced
by small silencing RNA in a sequence-specific manner. In 1998, Fire and Mello first established this in C. elegans.
Historically, RNA interference was known by other names, including post transcriptional gene silencing (PTGS),
transgene silencing and quelling. RNAi has been observed in most eukaryotes, from yeast to mammals. RNA interference
has an important role in post-transcriptional gene regulation, transposon regulation and defending cells against
viruses. Two types of small silencing RNA molecules – small interfering RNA (siRNA) and microRNA (miRNA) – are
central to RNA interference.

siRNA-mediated RNAi


In the siRNA-mediated RNAi pathway, long dsRNAs are processed by an enzyme called Dicer into siRNA duplexes composed of two
~21-nucleotide strands with two-nucleotide overhangs at the 3’ ends. Dicer is a ~200 kDa multidomain enzyme of the
RNase III family that processes dsRNA into siRNA. It includes an ATPase/RNA helicase domain, catalytic RNase III
domains and a dsRNA-binding domain. Dicer and a dsRNA-binding protein (which together form the RISC-loading complex)
then load the RNA duplex into the RNA-induced silencing complex (RISC). Only one of the two strands, known as the
guide strand, directs gene silencing. The other strand, the anti-guide or passenger strand, is degraded during RISC
activation. The siRNA is thought to provide target specificity to RISC through base pairing of the guide strand with
the target mRNA. The active components of RISC are endonucleases called argonaute proteins, which cleave the target
mRNA strand complementary to their bound siRNA.
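
The base-pairing step that gives RISC its specificity can be illustrated with a small search for sites that are perfectly complementary to a guide strand. This is a hedged sketch: the sequences are invented, the function names are hypothetical, and real target recognition is more complex than an exact string match.

```python
COMPLEMENT = str.maketrans("AUGC", "UACG")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def find_target_sites(guide: str, mrna: str) -> list[int]:
    """Return 0-based start positions in mrna that pair perfectly with the guide strand."""
    site = reverse_complement(guide)  # the mRNA sequence the guide would pair with
    positions = []
    start = mrna.find(site)
    while start != -1:
        positions.append(start)
        start = mrna.find(site, start + 1)
    return positions


# Hypothetical 21-nt target region embedded in a toy mRNA.
target_region = "GCUAGGAUCCAUGGCAUUCGA"
toy_mrna = "AUGGCC" + target_region + "UUUAAA"
guide_strand = reverse_complement(target_region)

print(guide_strand)
print(find_target_sites(guide_strand, toy_mrna))
```

The search reports the position at which the guide strand would pair with the toy mRNA, which is where an argonaute protein would be directed to cleave in this simplified picture.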

[Figure: long dsRNA → Dicer → siRNA duplex (guide strand and passenger strand) → RISC loading complex → pre-RISC → RISC (guide strand) → target cleavage]

Figure 6.164 dsRNA precursors are processed by Dicer to generate siRNA duplexes containing guide and
passenger strands. RISC-loading complex loads the duplex into RISC. The passenger strand is later destroyed
and the guide strand directs RISC to the target RNA.

miRNA-mediated RNAi


miRNAs (microRNAs) are small, non-coding RNA molecules encoded in the genomes of plants, animals and their
viruses. These highly conserved, 20–25 mer RNAs appear to regulate gene expression post-transcriptionally by
binding to the 3'-untranslated regions (3'-UTR) of specific mRNAs. Victor Ambros and colleagues identified the


Table 6.30 Types of small silencing RNAs

Type            Organism                            Length (nt)   Features
miRNA           Animals, plants, protists           20–25         Dicer/Drosha-dependent
siRNA
  Exo-siRNA     Animals, plants, fungi, protists    ~21           Dicer-dependent
  Endo-siRNA    Animals, plants, fungi, protists    ~21           Dicer-dependent
piRNA           Metazoans                           24–30         Dicer-independent

Noncoding RNA
There are large numbers of functional RNAs that are transcribed but do not encode proteins. These functional
RNAs are called noncoding RNAs (ncRNAs). Noncoding RNAs perform a variety of biological functions. They
regulate gene expression at the levels of transcription, RNA processing and translation. They protect genomes
from foreign nucleic acids. They can guide DNA synthesis or genome rearrangement. Most noncoding RNAs
operate as RNA-protein complexes, including ribosomes, snRNPs, snoRNPs, telomerase, miRNAs and lncRNAs.
Group I and II introns: Catalytic RNAs (ribozymes), catalyze RNA splicing.
RNase P RNAs: Ribozymes, catalyze removal of 5’ leader sequence from pre-tRNAs.
Hammerhead and hepatitis delta virus: Ribozymes, induce RNA cleavage to form 2’,3’-cyclic phosphate and
5’-OH termini; also catalyze the reverse reaction and RNA ligation.
gRNA (guide RNA): Base pairs with an RNA target, orienting bound proteins to carry out a site-specific cleavage,
ligation or modification reaction.
Xist (X-inactive-specific transcript RNA): Coats one X-chromosome in mammalian females, triggering
heterochromatization and transcriptional repression.
Telomerase RNA: Provides template for telomeric DNA synthesis and scaffolds protein assembly.
snoRNA (small nucleolar RNA): Essential for pre-rRNA processing or modification by serving as a guide RNA to
direct methylation or pseudouridylation of complementary sequence in rRNA.
siRNA (small interfering RNA): Product of Dicer cleavage of dsRNA; when complexed with an AGO protein,
induces cleavage of a perfectly complementary target RNA.
scaRNA (small Cajal body-associated RNA): Function similar to snoRNAs, but located in the Cajal body to guide
modification of snRNAs.
Riboswitch: RNA element within an mRNA that switches between two conformations upon exposure to a
small-molecule ligand or other stimulus and inhibits or promotes gene expression at the level of transcription, translation,
or RNA splicing.
piRNA (PIWI-associated RNA): RNA that directs the modification of chromatin to repress transcription; best
characterized in the male germline.
lncRNA (long noncoding RNA): Autonomously transcribed RNA that does not encode a protein; often capped and
polyadenylated; can be nuclear, cytoplasmic or both.

6.22 Epigenetics
Although all cells in an organism contain essentially the same DNA, cell types and functions differ because of
qualitative and quantitative differences in their gene expression. Epigenetics refers to both heritable and
non-heritable changes in gene expression that are not caused by changes in DNA sequence. The epigenetic processes that
stably alter gene expression patterns are thought to include:
1. cytosine methylation,
2. posttranslational modification of histone proteins and remodelling of chromatin and
3. RNA-based mechanisms.


Methylation of the 5-position of cytosine residues is a reversible covalent modification of DNA, resulting in production
of 5-methylcytosine. In general, DNA methylation is associated with gene repression. As DNA methylation
patterns can be maintained following DNA replication and mitosis, this epigenetic modification is also associated
with inheritance of the repressed state.
The effect of posttranslational modification of histone proteins on transcription is complex, and knowledge of it is
constantly expanding. Three general principles are thought to be involved:
1. It directly affects the structure of chromatin, regulating its higher order conformation and thus acting in cis to
regulate transcription;
2. It disrupts the binding of proteins that are associated with chromatin (trans effect);
3. It attracts certain effector proteins to the chromatin (trans effect).

RNA-based mechanisms of epigenetic regulation are less well understood than mechanisms based on DNA
methylation and histones. A number of non-coding RNAs (small non-coding RNAs as well as long non-coding RNAs)
play important roles in modifying the sequence, structure or expression of mRNAs and thereby also change the
protein expression from these genes.

6.23 Genetic code


General features of genetic code
• The genetic code is a triplet code; each triplet is called a codon.
How many nucleotides in DNA are needed to specify each amino acid in a protein? We know that the information
in DNA must reside in the sequence of the four nucleotides that constitute the DNA: A, T, G and C. A doublet
code involving two adjacent nucleotides would not be adequate, as four kinds of nucleotides taken two at a time
can generate only 4² = 16 different combinations. But with three nucleotides per word, the number of different
words that can be produced with an alphabet of just four letters is 4³ = 64. This number is more than sufficient
to code for 20 different amino acids. Such mathematical arguments led biologists to suspect the existence of a
triplet code. Later Francis Crick, Sydney Brenner and their colleagues provided genetic evidence for the triplet
nature of the code by studying the mutagenic effects of the chemical proflavin on bacteriophage T4.
• Certain codons act as start and stop signals to initiate and terminate translation. The initiation codon is usually
AUG, which specifies methionine. In a few mRNAs, GUG or UUG also acts as the initiation codon. Out of 64 codons,
three do not code for any amino acid and are called stop or termination codons (UAA, UAG and UGA).
• The code is unambiguous, meaning that each triplet specifies only a single amino acid.
• No internal punctuation (commas) is used in the code. Thus, the code is said to be commaless. Once the
translation of mRNA begins, the codons are read one after the other with no breaks between them.
• The code is degenerate, meaning that a given amino acid can be specified by more than one triplet codon.
This is the case for 18 of the 20 standard amino acids. The different codons for a given amino acid are said to be
synonymous. For example, UUU and UUC are synonyms for phenylalanine, whereas serine is encoded by the
synonyms UCU, UCC, UCA, UCG, AGU and AGC.

Table 6.31 Amino acids and their synonymous codons

Amino acids                                     Number of synonymous codons
Leu, Ser, Arg                                   6
Gly, Pro, Ala, Val, Thr                         4
Ile                                             3
Phe, Tyr, Cys, His, Gln, Glu, Asn, Asp, Lys     2
Met, Trp                                        1


• The code is nonoverlapping. After translation commences, any single ribonucleotide at a specific location
within the mRNA is part of only one triplet.
• It is usual to describe the genetic code as a universal code, meaning that the same code is used throughout
all life forms. This is not strictly true. There are a few examples of context-dependent codons. For example,
selenocysteine is coded by UGA and pyrrolysine by UAG. These codons therefore have a dual meaning,
because they are mainly used as stop codons. Similarly, some differences in the genetic code have been found,
especially in mitochondria, chloroplasts, some protozoans and others, as mentioned in Table 6.32. In this
sense the code is nearly universal: with only minor exceptions, a single coding dictionary is used by almost all
viruses, bacteria, archaea and eukaryotes.
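
The counting arguments in the features listed above (16 doublets, 64 triplets, three stop codons, and the degeneracy classes of Table 6.31) can be checked with a short Python sketch. The compact 64-letter string is a common way of writing the standard code in one-letter amino acid symbols with the bases ordered U, C, A, G; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acids for the 64 codons in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every triplet (UUU, UUC, UUA, ...) to its amino acid.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

doublets = len(list(product(BASES, repeat=2)))  # 4 ** 2 possible two-letter words
triplets = len(CODON_TABLE)                     # 4 ** 3 possible three-letter words
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")

# Number of synonymous codons per amino acid (stop codons excluded).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

# Group amino acids by codon count, mirroring the layout of Table 6.31.
by_count: dict[int, list[str]] = {}
for aa, n in degeneracy.items():
    by_count.setdefault(n, []).append(aa)

print(doublets, triplets, stop_codons)
for n in sorted(by_count, reverse=True):
    print(n, "codons:", " ".join(sorted(by_count[n])))
```

Running the sketch groups the 20 amino acids into the same five classes as Table 6.31, with only Met (M) and Trp (W) having a single codon.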

Table 6.32 Some differences between the universal code and mitochondrial genetic codes.

Codon       Universal code   Unusual code   Occurrence
UGA         Stop             Trp            Mycoplasma, Spiroplasma, mitochondria of many species
CUG         Leu              Thr            Mitochondria in yeasts
UAA, UAG    Stop             Gln            Acetabularia, Tetrahymena, Paramecium, etc.
UGA         Stop             Cys            Euplotes

                               Second position
First position                                                      Third position
(5’-end)         U            C            A            G           (3’-end)

    U        UUU  Phe     UCU  Ser     UAU  Tyr     UGU  Cys            U
             UUC  Phe     UCC  Ser     UAC  Tyr     UGC  Cys            C
             UUA  Leu     UCA  Ser     UAA  Stop    UGA  Stop           A
             UUG  Leu     UCG  Ser     UAG  Stop    UGG  Trp            G

    C        CUU  Leu     CCU  Pro     CAU  His     CGU  Arg            U
             CUC  Leu     CCC  Pro     CAC  His     CGC  Arg            C
             CUA  Leu     CCA  Pro     CAA  Gln     CGA  Arg            A
             CUG  Leu     CCG  Pro     CAG  Gln     CGG  Arg            G

    A        AUU  Ile     ACU  Thr     AAU  Asn     AGU  Ser            U
             AUC  Ile     ACC  Thr     AAC  Asn     AGC  Ser            C
             AUA  Ile     ACA  Thr     AAA  Lys     AGA  Arg            A
             AUG  Met     ACG  Thr     AAG  Lys     AGG  Arg            G

    G        GUU  Val     GCU  Ala     GAU  Asp     GGU  Gly            U
             GUC  Val     GCC  Ala     GAC  Asp     GGC  Gly            C
             GUA  Val     GCA  Ala     GAA  Glu     GGA  Gly            A
             GUG  Val     GCG  Ala     GAG  Glu     GGG  Gly            G

Figure 6.167 The coding dictionary.

Codon bias
Codon bias is the probability that a given codon will be used to code for an amino acid over a different codon that
codes for the same amino acid. It refers to the fact that not all codons are used equally in the genes of a particular
organism. For example, of the four valine codons, human genes use GTG four times more frequently than GTA. The
biological reason for codon bias is not fully understood, but all organisms have a codon bias.
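
Codon bias is usually quantified by counting how often each synonymous codon is used. The following minimal sketch does this for the four valine codons in a toy coding sequence. The sequence is invented so that GTG appears four times as often as GTA; it is not real human data, and the function name `relative_usage` is illustrative.

```python
from collections import Counter

VALINE_CODONS = ("GTT", "GTC", "GTA", "GTG")


def relative_usage(coding_dna: str, synonymous: tuple[str, ...]) -> dict[str, float]:
    """Fraction of each synonymous codon among all in-frame occurrences of the group."""
    codons = [coding_dna[i:i + 3] for i in range(0, len(coding_dna) - 2, 3)]
    counts = Counter(c for c in codons if c in synonymous)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in synonymous}


# Hypothetical in-frame sequence: 4 x GTG, 2 x GTC, 1 x GTA, 1 x GTT.
toy_codons = ["ATG", "GTG", "GTG", "GTA", "GTC", "GTG", "GTT", "GTC", "GTG", "TAA"]
toy_gene = "".join(toy_codons)

usage = relative_usage(toy_gene, VALINE_CODONS)
for codon, fraction in usage.items():
    print(codon, f"{fraction:.3f}")
```

Applied to a large set of real coding sequences from one organism, the same counting gives a codon usage table for that organism.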

Problem

A hypothetical peptide has the sequence Phe–Tyr–Met–Pro–His.

1. Indicate why more than one nucleotide sequence can encode it.
2. Calculate the number of possible nucleotide sequences.
Solution
1. The genetic code is degenerate: most amino acids are specified by more than one codon.
2. Phe, Tyr and His have two codons each, Pro has four and Met only one. So, 2 × 2 × 2 × 4 × 1 = 32.
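
The same calculation can be generalized to any peptide by multiplying the codon counts from Table 6.31. The sketch below is one way to do so; the function name `count_coding_sequences` and the hyphen-separated input format are illustrative choices.

```python
from math import prod

# Number of synonymous codons per amino acid, from Table 6.31.
CODON_COUNTS = {
    "Leu": 6, "Ser": 6, "Arg": 6,
    "Gly": 4, "Pro": 4, "Ala": 4, "Val": 4, "Thr": 4,
    "Ile": 3,
    "Phe": 2, "Tyr": 2, "Cys": 2, "His": 2, "Gln": 2,
    "Glu": 2, "Asn": 2, "Asp": 2, "Lys": 2,
    "Met": 1, "Trp": 1,
}


def count_coding_sequences(peptide: str) -> int:
    """Number of distinct mRNA sequences that encode a hyphen-separated peptide."""
    residues = peptide.replace("–", "-").split("-")  # accept en dashes as well
    return prod(CODON_COUNTS[residue] for residue in residues)


print(count_coding_sequences("Phe-Tyr-Met-Pro-His"))  # 2 * 2 * 1 * 4 * 2 = 32
```

The count excludes the stop codon; including a termination signal would multiply the result by three.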


6.24.2 Cap snatching


Some viruses, such as influenza virus, perform a unique cap-snatching process. Cap snatching is a transcription
initiation process in which a sequence of 10 to 13 nucleotides is cleaved from the 5’ end of host
mRNAs by an endonuclease activity present in the viral RNA-dependent RNA polymerase. The capped
oligonucleotide removed from the host mRNA is subsequently used as a primer for transcription of the viral genome, which
ultimately leads to the synthesis of capped viral mRNAs.

6.24.3 Translational frameshifting


Translational frameshifting is a mechanism by which the translational machinery (ribosomes) shifts the frame in
which it decodes the mRNA. The result is that the mRNA does not encode the protein by a continuous run of three
nucleotide codons, known as an open reading frame (ORF). Rather, the information encoding the protein comes
from two distinct ORFs. Frameshifting is a stochastic process, meaning that each translating ribosome has a certain
probability of undergoing the shift, but that only a fraction of the ribosomes does so. Classes of signals have been
identified that direct a fraction of elongating ribosomes to shift reading frame by one base in the 5’ (–1) or 3’ (+1) direction.
In general, it is believed that the occurrence and frequency of translational frameshifting are determined
predominantly by two elements of mRNA: a slippery sequence and a downstream RNA structure. Various slippery
sequences have been identified from different retroviruses and from other viruses. A typical slippery sequence is
a heptanucleotide X XXY YYZ (where X ≠ C, Y = A or U and Z ≠ G). There are two kinds of downstream RNA
structures associated with frameshifting: the stem-loop and the pseudoknot. However, not all stem-loop or pseudoknot
structures can induce ribosomal frameshifting. Thus, the specific RNA structures required, and how they
mediate ribosomal frameshifting, remain to be fully characterized.

       Ser Thr Phe Leu Asn Gly Phe Ala
5’ ... UCA ACG UUU UUA AAC GGG UUU GCG ...   ORF1

                          Arg Val Cys
5’ ... UC AAC GUU UUU AAA CGG GUU UGC G ...   ORF2 (–1 frameshift)

Figure 6.183 Translational frameshift in SARS coronavirus.
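
The reading-frame change shown in Figure 6.183 can be reproduced with a short translation sketch. It decodes the sequence from the figure in frame 0 through the slippery heptanucleotide UUUAAAC and then steps back one nucleotide. The function names are hypothetical, amino acids are written in one-letter symbols, and the model assumes that the heptanucleotide ends on a frame-0 codon boundary; it ignores the downstream RNA structure and the stochastic nature of the shift.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str, start: int = 0) -> str:
    """Translate complete codons from `start` to the end (no stop-codon handling)."""
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(start, len(mrna) - 2, 3))


def translate_with_minus1_shift(mrna: str, slippery: str = "UUUAAAC") -> str:
    """Decode in frame 0 through the slippery site, then slip back one nucleotide."""
    site = mrna.find(slippery)
    if site == -1:
        return translate(mrna)  # no slippery site: ordinary in-frame translation
    shift_point = site + len(slippery)       # first base after the heptanucleotide
    upstream = translate(mrna[:shift_point])  # frame 0, up to and including the site
    downstream = translate(mrna, start=shift_point - 1)  # re-read the last base (-1)
    return upstream + downstream


orf = "UCAACGUUUUUAAACGGGUUUGCG"  # sequence shown in Figure 6.183
print(translate(orf))                    # frame 0: Ser Thr Phe Leu Asn Gly Phe Ala
print(translate_with_minus1_shift(orf))  # shifted: ... Asn Arg Val Cys
```

The shifted product shares its N-terminal residues with the in-frame product and diverges after the slippery site, matching the Arg, Val and Cys codons marked in the figure.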

Ribosomes that frameshift produce a translational fusion of the two overlapping ORFs, whereas those that do not
frameshift continue normal in-frame decoding and terminate at the end of the first ORF. The two products thus
share a common N-terminal region. Many viruses use programmed translational frameshifting to ensure synthesis
of the correct ratios of virus-encoded proteins required for proper viral particle assembly and maturation. The
phenomenon was first described in 1985 as the way in which the Gag-Pol polyprotein of the retrovirus Rous
Sarcoma Virus (RSV) is expressed from the overlapping gag and pol ORFs. It has been demonstrated that when
ribosomes translate the unspliced genomic RNA of retroviruses, 95% of translation yields Gag proteins while only
about 5% of translation produces Gag-Pol proteins through –1 ribosomal frameshifting.

6.24.4 Antibiotics and toxins


Protein synthesis is a target of a wide variety of naturally occurring antibiotics and toxins. The mechanisms of action of
some common antibiotics and toxins that inhibit protein synthesis in prokaryotes and eukaryotes are described
below:
Streptomycin : Streptomycin, a basic trisaccharide, binds with the 30S subunit of the bacterial ribosome and
causes misreading of mRNA at relatively low concentrations.
Chloramphenicol : Chloramphenicol binds to the 50S ribosomal subunit and blocks peptide bond formation through
inhibition of peptidyl transferase, but does not affect the cytosolic protein synthesis in eukaryotes.


Tetracycline : Tetracycline binds to the 30S ribosomal subunit and interferes with aminoacyl-tRNA binding.
Erythromycin : Binds to the 50S ribosomal subunit and inhibits peptide chain elongation.
Fusidic acid : Fusidic acid binds to EF-G and blocks translocation.
Cycloheximide : Cycloheximide blocks the peptidyl transferase of 80S ribosome but not that of 70S bacterial
(and mitochondrial and chloroplast) ribosomes.
Puromycin : Puromycin is a secondary metabolite of Streptomyces alboniger that blocks protein biosynthesis.
Puromycin is a structural analogue of the 3’ end of aminoacyl transfer RNA, but differs from
tRNA insofar as the aminoacyl residue is linked to the ribose via an amide bond rather than an
ester bond. Puromycin, like aminoacyl-tRNA, binds to the A site of the ribosome peptidyl-
transferase center. When the A site is occupied by puromycin, peptidyl-transferase links the
peptide residues of the peptidyl-tRNA in the ribosomal P site covalently to puromycin. Since
the amide bond cannot be cleaved by the ribosome, no further peptidyl transfer takes place,
and the peptidyl-puromycin complex falls off the ribosome.

[Figure: chemical structures of the 3’ end of tRNA-phenylalanine and of puromycin.]

Diphtheria toxin : Diphtheria toxin, an exotoxin of Corynebacterium diphtheriae infected with a specific temperate
phage (Corynephage β), stops the protein synthesis in eukaryotes by inactivating the elongation
factor eEF2. Inactivation of elongation factor eEF2 occurs due to ADP-ribosylation, which is
catalyzed by A fragment of toxin.
Ricin : A toxic protein of the castor bean (Ricinus communis) that inactivates the 60S subunit of
eukaryotic ribosomes by depurinating a specific adenosine in 28S rRNA.

6.24.5 Post-translational modification of polypeptides


Chemical modification

Primary translation products often undergo a variety of modification reactions involving the covalent addition of
chemical groups to the polypeptide. These can be simple chemical modifications, such as
hydroxylation and phosphorylation of the side chains of single amino acids, or the addition of different types of
carbohydrate or lipid groups.


Second step: Transesterification


In this step, the side-chain of the first residue of the C-extein attacks the ester (or thioester) bond at the amino end
of the intein. Here too the attack is by a polar side chain of a Ser, Thr (both –OH) or Cys (–SH). This leads to a
transesterification and formation of thioester or ester bond between N-extein and C-extein.

Third step: Asn cyclization


Cyclization of the Asn side chain leads to cleavage of the peptide bond between the intein and the C-extein
(C-terminal splice junction). This reaction removes intein from the ligated exteins, which are linked together via the
ester bond.

Fourth step: O–N shift


This step of protein splicing is spontaneous. An O–N or S–N shift (the reverse of an N–O or N–S shift) takes place, and
a peptide bond forms between the N- and C-exteins.
Some inteins also show sequence-specific endonuclease activity. Such inteins cut the DNA of an intein-minus gene at
a specific point and allow a copy of the DNA sequence coding for the intein to integrate. This event is similar to intron homing
and is termed intein homing.

6.25 Mutation
The genome is not a static entity; it is dynamic and subject to different types of heritable genetic changes.
A heritable change in the genetic material of an organism that gives rise to alternate forms of any gene is
called a mutation. The process by which mutations are produced is called mutagenesis. An organism exhibiting a
novel phenotype as a result of the presence of a mutation is referred to as a mutant. In a broad sense, the term
mutation includes all types of heritable genetic changes of an organism not explainable by recombination of preexisting
genetic variability. Mutation may include changes in chromosome number, chromosomal aberrations and changes in the
chemistry of genes. Here, however, we describe ‘mutation’ in terms of a change in the chemistry of a gene, which is known
as gene mutation.

General characteristics of mutation


• Mutations are generally recessive, but dominant mutations also occur.
• Mutations are generally harmful to the organisms.
• Mutations are random, occur at any time and in any cell of an organism.
• Mutations are recurrent i.e. the same mutation may occur again and again.

Role of mutation
• Mutation is the ultimate source of all genetic variation and provides the raw material for evolution.
• Mutation results in the formation of new alleles. Without mutation, all genes would exist in only one form.
• Mutation enables organisms to evolve and adapt to environmental change.

Molecular basis of gene mutation


Mutations arise in two ways. Spontaneous mutations occur without treatment of the organism with an
exogenous mutagen. A mutagen is an agent that increases the frequency of occurrence of mutations.
Spontaneous mutations account for the ‘background rate’ of mutation and are presumably the ultimate source of the
natural genetic variation that is seen in populations. Spontaneous mutations can occur because of replication
errors, spontaneous lesions and transposition of transposable elements during the normal growth of the cell. Other
mutations, called induced mutations, arise because a mutagen has reacted with the parent DNA, causing a structural
change that affects the base-pairing capability of the altered nucleotide.


Loss- and gain- of function mutations

In principle, mutation of a gene might cause a phenotypic change in either of two ways:
• Loss of function (null) mutation : the product may have reduced or no function.
• Gain of function mutation : the product may have increased or new function.

Because mutation events introduce random genetic changes, most of the time they result in loss of function.
Generally, loss of function mutations are found to be recessive. In a wild type diploid cell, there are two wild type
alleles of a gene, both making normal gene product. In heterozygotes, the single wild type allele may be able to
provide enough normal gene product to produce a wild type phenotype. In such cases, loss of function mutations
are recessive. However, some loss of function mutations are dominant. In such cases, the single wild type allele in
the heterozygote cannot provide enough gene product for the cells to be wild type (haploinsufficiency). Gain of
function mutations usually cause dominant phenotypes, because the presence of a normal allele does not prevent
the mutant allele from behaving abnormally.

6.25.3 Fluctuation test


The fluctuation test was devised by Luria and Delbrück in 1943 to determine whether mutations in bacteria arise at
random or in response to selection. They grew a series of E. coli cultures in different flasks and then added T1 bacteriophage to each one.
Most of the bacteria were killed by the phage, but a few T1-resistant mutants were able to survive. Luria and
Delbrück measured the number of mutants resistant to bacteriophage T1 in a large number of replicate cultures of
E. coli. If mutations are induced only after the culture is exposed to the phage, then little variation should occur among cultures
in the number of mutants. However, if mutations arise at random during nonselective growth of the cells, each culture
should contain a different number of resistant mutants, and the number depends on how early during the growth period the
first mutant cell arose. A mutation during the early generations gives rise to a large clone of
mutant cells, whereas a late mutation gives rise to only a few mutant cells. Among a large set of identical cultures of
dividing cells, the few cultures in which the mutation happened in the early generations have a large number of
mutants, whereas the majority of the cultures have none or a few mutants. This is what Luria and Delbrück observed.

[Figure: in wild-type E. coli, T1 binds to the normal receptor and the cell is lysed; in the mutant type, T1 cannot bind to the mutant receptor.]

Figure 6.193 When bacteriophage T1 infects wild-type E. coli, it binds to a receptor in the outer membrane,
protein TonB. After phage replication, the E. coli cell is lysed and new phages are released. A mutation in the
tonB gene results in an altered receptor to which T1 can no longer bind and so the cells survive.


Their test is known as the fluctuation test because it measures the degree of fluctuation in the number of mutants
found in replicate cultures. They proved that mutations occur before selection. The fluctuation test is also useful in
determining mutation rates during nonselective growth.
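
The logic of the fluctuation test can be illustrated with a small simulation that compares the two hypotheses. This is a simplified sketch, not a reconstruction of the original analysis: each culture starts from one cell and doubles synchronously, the number of new mutations per generation is drawn from a Poisson distribution, and the parameter values, seed and function names are arbitrary illustrative choices. The ratio of variance to mean (close to 1 for a Poisson distribution) is used as the measure of fluctuation.

```python
import math
import random
from statistics import mean, variance


def poisson(lam: float, rng: random.Random) -> int:
    """Draw a Poisson-distributed integer (Knuth's method, fine for small means)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def spontaneous_culture(generations: int, mu: float, rng: random.Random) -> int:
    """Mutations arise at random during growth; every mutant founds a growing clone."""
    normal, mutants = 1, 0
    for _ in range(generations):
        new = min(poisson(normal * mu, rng), normal)  # new mutations in this generation
        normal = 2 * normal - new
        mutants = 2 * mutants + new                   # existing mutants also divide
    return mutants


def induced_culture(expected_mutants: float, rng: random.Random) -> int:
    """Resistance arises only on exposure: each final cell has the same small chance."""
    return poisson(expected_mutants, rng)


def fano(counts: list[int]) -> float:
    """Variance-to-mean ratio of the mutant counts across cultures."""
    return variance(counts) / mean(counts)


rng = random.Random(1943)
GENERATIONS, MU, CULTURES = 20, 1e-6, 500
# Expected mutants per culture under the spontaneous model, used to match the two models.
expected = GENERATIONS * 2 ** (GENERATIONS - 1) * MU

spontaneous = [spontaneous_culture(GENERATIONS, MU, rng) for _ in range(CULTURES)]
induced = [induced_culture(expected, rng) for _ in range(CULTURES)]

print("spontaneous: variance/mean =", round(fano(spontaneous), 1))
print("induced:     variance/mean =", round(fano(induced), 1))
```

In this model, rare early mutations produce occasional cultures with very many mutants, so the variance greatly exceeds the mean under the spontaneous hypothesis, whereas the induced hypothesis gives a variance close to the mean.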

6.25.4 Replica plating experiment


The replica plating experiment suggests that resistant cells are selected by the environmental agent rather than
produced by it (nonadaptive nature of mutation). The technique was developed by Joshua and Esther Lederberg in
1952. A population of bacteria was plated on nonselective medium (that is, medium containing no phages), and from each cell
a colony grew. This plate was called the master plate. A sterile piece of velvet was pressed down lightly on the
surface of the master plate, and the velvet picked up cells wherever there was a colony.

[Figure: replica plating from a master plate (nonselective medium) onto replica plates containing nonselective medium and selective medium, shown after incubation.]

Figure 6.194 Replica plating. For the detection of mutants, cells are transferred on to successive plates
containing either a selective medium or a non-selective medium. Colonies form on the non-selective plate
in the same pattern as on the master plate. Only mutant cells can grow on the selective plate; the mutant
colonies that are formed derive from colonies on the master plate that are mutant.

In this way, the velvet picked up a colony ‘imprint’ from the whole plate. On touching the velvet to replica plates
containing selective medium (that is, containing T1 phages), cells clinging to the velvet are inoculated onto the
replica plates in the same relative positions as those of the colonies on the original master plate. As expected, rare
Ton^r (T1-resistant) mutant colonies were found on the replica plates, and the multiple replica plates showed identical patterns of
resistant colonies. If the mutations had occurred after exposure to the selective agent, the patterns for each plate
would have been as random as the mutations themselves. The mutation events must therefore have occurred before exposure
to the selective agent.
Replica plating has become an important technique of microbial genetics. It is useful in screening for mutants that
fail to grow under the selective regime. The position of an absent colony on the replica plate is used to retrieve the
mutant from the master. For example, replica plating can be used to screen auxotrophic mutants in precisely this
way. In general, replica plating is a way of retaining an original set of strains on a master plate while simultaneously
subjecting replicas to various kinds of tests on different media or under different environmental conditions.
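
The retrieval step described above can be sketched as a simple set comparison. This is only an illustration: the function name, the colony coordinates and the choice of a complete and a minimal medium as the two replica plates are assumptions made for the example, not data from an experiment.

```python
def find_auxotroph_candidates(master, complete_replica, minimal_replica):
    """Return master-plate positions that grew on the complete-medium replica
    but left an empty spot on the minimal-medium replica."""
    # Only colonies that were actually transferred by the velvet are informative.
    transferred = master & complete_replica
    # An absent colony on the selective replica marks a candidate mutant.
    return sorted(transferred - minimal_replica)


# Hypothetical colony positions (row, column) read off each plate.
master = {(1, 2), (3, 5), (4, 1), (6, 6)}
complete_replica = {(1, 2), (3, 5), (4, 1), (6, 6)}
minimal_replica = {(1, 2), (4, 1), (6, 6)}

candidates = find_auxotroph_candidates(master, complete_replica, minimal_replica)
print(candidates)
```

Each position returned is then used to pick the corresponding colony from the master plate.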


6.25.5 Ames test


The Ames test, named for its developer, Bruce Ames, is a method to test chemicals for their cancer-causing
properties. The use of the Ames test is based on the assumption that any substance that is mutagenic may also turn
out to be a carcinogen; that is, to cause cancer.
The assay is based on the reversion of mutations in the histidine (his) operon in the genetically altered tester
strains of bacterium Salmonella typhimurium. The his operon encodes enzymes required for the biosynthesis of the
amino acid histidine. Strains with mutations in the his operon are histidine auxotrophs — they are unable to grow
without added histidine. However, this mutation can be reversed, a back mutation, with the gene regaining its
function. These revertants are able to grow on a medium lacking histidine. The tester strains are specially constructed
to have both frameshift and point mutations in the genes required to synthesize histidine, which allows for the
detection of mutagens acting via different mechanisms. The tester strains also carry mutations in the genes responsible
for lipopolysaccharide synthesis, making the cell wall of the bacteria more permeable, and in the excision repair
system to make the test more sensitive.
The Ames test can detect mutagens that work directly to alter DNA. In humans, however, many chemicals are
promutagens, agents that must be activated to become true mutagens. Activation, involving a chemical modification,
often occurs in the liver as a consequence of normal liver activity on unusual substances. Bacteria such as
S. typhimurium do not produce the enzymes required to activate promutagens, so promutagens would not be
detected by the Ames test unless they were first activated. An important part of the Ames test therefore involves mixing
the test compound with enzymes from rat liver that convert promutagens into active mutagens. These potentially
activated promutagens are then used in the Ames test. If the liver enzymes convert the agent to a mutagen, the
Ames test will detect it, and it will be labeled as a promutagenic agent.

Problem

In the Ames test, auxotrophic strains of Salmonella that are unable to produce histidine are mixed with a rat liver
extract and a suspected mutagen. The cells are then plated on a medium without histidine. The plates are incubated
to allow any revertant bacteria (those able to produce histidine) to grow. The number of colonies is a measure of
the mutagenicity of the suspected mutagen. Why is the rat liver extract included?

Solution
Most mutagens cannot act unless they are converted to electrophiles by liver enzymes called mixed-function oxidases,
which include the cytochrome P-450s. The rat liver extract in the Ames test contains enzymes for converting
suspected mutagens to compounds that would be physiologically relevant mutation-causing agents in a mammal.

6.25.6 Complementation test


If two recessive mutations arise independently and both have the same phenotype, how do we know whether they
are both mutations of the same gene? The complementation test allows us to determine whether two mutations,
both of which produce a similar phenotype, are in the same gene (that is, whether they are alleles) or represent mutations
in separate genes whose proteins are involved in the same function. In genetics, complementation occurs when
two strains of an organism with different homozygous recessive mutations that produce the same phenotype
produce offspring with the wild-type phenotype when mated or crossed. Complementation will occur only if the
mutations are in different genes.
In a diploid organism the complementation test of allelism (allelism test) is performed by intercrossing homozygous
recessive mutants two at a time and observing whether or not the progeny have a wild-type phenotype. If the two
recessive mutations are in separate genes and are not alleles of one another, then following the cross, all F1
progeny are heterozygous for both genes. Complementation is said to occur. Because each mutation is in a
separate gene and each F1 progeny is heterozygous at both loci, the normal products of both genes are produced.
If the two mutations affect the same gene and are alleles of one another, complementation does not occur. Because


5. Large population size: The population is sufficiently large so that the frequencies of alleles do not change from
generation to generation because of chance. In small populations, significant random fluctuations in allele
frequencies are possible due to sampling error. The random change in allele frequencies simply as a result of
chance from one generation to next in a finite population is called genetic drift. Drift ultimately leads to the
fixation of one allele at a locus and the loss of all other alleles. In diploid organisms, the rate at which genetic
variability is lost by random genetic drift is 1/2N, where N is the population size.

Let’s take one example of a population of diploid organisms having a gene with two alleles A and a, with
respective frequencies p and q. Let’s assume that neither allele has any effects on fitness; that is, A and a are
selectively neutral. Furthermore let’s assume that the population mates randomly and that in any given generation,
the genotypes are present in Hardy-Weinberg proportions. In a very large population – essentially infinite in
size – the frequencies of A and a will be constant and the frequency of the heterozygotes that carry these two
alleles will be 2pq. In a small population of finite size N, the allele frequencies will change randomly as a result
of genetic drift. Because of these changes, the frequency of heterozygotes will also change. To express the magnitude
of this change over one generation, let’s consider the current frequency of heterozygotes as H and the frequency
of heterozygotes in the next generation as H’. Then the mathematical relationship between H and H’ is

$$H' = \left(1 - \frac{1}{2N}\right) H$$

This equation tells us that in one generation, random genetic drift causes the heterozygosity to decline by a
fraction 1/(2N) of its current value. Over many generations, the heterozygosity will eventually be reduced to 0, at which point all
genetic variability in the population will be lost. At this point the population will possess only one allele of the
gene, and either p = 1 and q = 0, or p = 0 and q = 1. Thus, through random changes in allele frequencies, drift
steadily erodes the genetic variability of a population, ultimately leading to the fixation and loss of alleles.
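
Applying the one-generation relationship repeatedly gives the expected heterozygosity after t generations as H multiplied by (1 − 1/2N) raised to the power t. The short sketch below illustrates this under the same assumptions as the text (neutral alleles, random mating, constant population size N). The function name and the example values are hypothetical.

```python
def expected_heterozygosity(h0, population_size, generations):
    """Expected heterozygosity after repeated drift, H_t = H_0 * (1 - 1/(2N))**t."""
    factor = 1.0 - 1.0 / (2.0 * population_size)
    return h0 * factor ** generations


# Compare a small and a large population starting from H = 0.5 (p = q = 0.5).
for n in (10, 1000):
    values = [round(expected_heterozygosity(0.5, n, t), 4) for t in (0, 1, 10, 100)]
    print(f"N = {n}: {values}")
```

The loss is much faster in the small population, which is the point made in the paragraph above.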

Problem

If the frequency of a homozygous dominant genotype in a randomly mating population is 0.09, what is the frequency
of the dominant allele? What is the combined frequency of all the other alleles of this gene?
Solution

$p^2 = 0.09$, and so $p = (0.09)^{1/2} = 0.30$. All other alleles have a combined frequency of $1 - 0.30 = 0.70$.

Problem

A particular recessive disorder is present in one in ten thousand individuals. If the population is in Hardy-Weinberg
equilibrium, what are the frequencies of the two alleles?
Solution
If the population is in equilibrium, the genotype frequencies should be $p^2$ (AA), $2pq$ (Aa) and $q^2$ (aa).
Since 1 in 10,000 individuals shows the recessive trait, this frequency is $q^2$.

Therefore,

$$q = \sqrt{\frac{1}{10{,}000}} = \sqrt{0.0001} = 0.01$$

Since $p + q = 1$, $p = 1 - 0.01 = 0.99$.
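
The same calculation can be written as a small sketch. It assumes, as the solution does, a single gene with two alleles in Hardy-Weinberg equilibrium. The function names are hypothetical, and the carrier frequency is an additional quantity derived from the same p and q rather than part of the original problem.

```python
import math


def allele_frequencies_from_recessive(recessive_frequency):
    """Return (p, q) given the frequency of the homozygous recessive genotype (q^2)."""
    q = math.sqrt(recessive_frequency)
    p = 1.0 - q
    return p, q


def genotype_frequencies(p, q):
    """Hardy-Weinberg genotype frequencies for alleles A (p) and a (q)."""
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


p, q = allele_frequencies_from_recessive(1 / 10_000)
genotypes = genotype_frequencies(p, q)
print(f"p = {p:.2f}, q = {q:.2f}, carriers (Aa) = {genotypes['Aa']:.4f}")
```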

6.27.3 Inbreeding
Inbreeding is a mating between individuals that are closely related through common ancestry. The extent of
inbreeding occurring in a population is measured by the inbreeding coefficient. The inbreeding coefficient (expressed
as F) is the probability that two alleles of a given gene in an individual are identical by descent. Such a genotype
would be homozygous and considered autozygous since the alleles were inherited from a common ancestor


(homozygosity by descent). Hence, inbreeding coefficient is also defined as the probability of autozygosity. When
two alleles are not identical by descent, we call the genotype allozygous (allo- means other). Note that an allozygous
genotype can be either homozygous or heterozygous.

Figure 6.202 A diagram showing how both allozygous and autozygous individuals can be generated within
the same family. Two unrelated heterozygotes in first generation produced four offspring, all with different
genotypes (second generation). Inbreeding occurs among the siblings of second generation resulting in
two allozygous and one autozygous individual in third generation. The allozygous individuals include both
a heterozygote and a homozygote.

Genotype frequencies under inbreeding


In most species, including all mammals, inbreeding is associated with reduction in heterozygote frequencies and
increase in homozygote frequencies. When two heterozygote individuals mate, the expected genotype frequencies
among the progeny are one half heterozygous genotypes and one quarter of each homozygous genotype. Under continued
self-fertilization, the heterozygote frequency declines by one-half in every generation, while one-quarter of the heterozygote frequency is
added to the frequencies of each homozygote. Eventually, the population will lose all heterozygosity although allele
frequencies will remain constant.
Let us consider a hypothetical plant population consisting exclusively of Aa heterozygotes. With self-fertilization,
each plant would produce offspring in the proportions 1/4 AA, 1/2 Aa and 1/4 aa. Thus, one generation of self-
fertilization reduces the proportion of heterozygotes from 1 to 1/2. In the second generation, only the heterozygotes
can again produce heterozygous offspring and only half of their offspring will again be heterozygotes. Heterozygosity
is therefore reduced to 1/4 of what it was originally. Three generations of self-fertilization reduce the heterozygosity
to 1/4 × 1/2 = 1/8, and so forth. Reduction in heterozygote frequency can be calculated according to the following
formula:
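
The halving described in this paragraph can also be checked numerically. The sketch below is an illustration only (the function name is hypothetical). It tracks genotype frequencies under repeated self-fertilization, starting from a population made up entirely of Aa heterozygotes.

```python
from fractions import Fraction


def self_fertilize(freqs):
    """One generation of selfing on (AA, Aa, aa) genotype frequencies."""
    hom_dominant, het, hom_recessive = freqs
    quarter = het / 4  # each homozygote class gains a quarter of the heterozygotes
    return (hom_dominant + quarter, het / 2, hom_recessive + quarter)


freqs = (Fraction(0), Fraction(1), Fraction(0))
history = [freqs]
for _ in range(3):
    freqs = self_fertilize(freqs)
    history.append(freqs)

for generation, (aa_dom, het, aa_rec) in enumerate(history):
    allele_a = aa_dom + het / 2  # frequency of allele A
    print(generation, aa_dom, het, aa_rec, allele_a)
```

The heterozygote frequency falls as 1, 1/2, 1/4, 1/8 while the frequency of allele A stays at 1/2 throughout.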


Pedigree 3

Path = DBACE
Number of individuals per path (n) = 5

$$F_I = \left(\frac{1}{2}\right)^{n} (1 + F_A)$$

Since A is not inbred, $F_A$ is zero; then

$$F_I = \left(\frac{1}{2}\right)^{5} = \frac{1}{32}.$$
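
The single-path calculation for Pedigree 3 can be reproduced with exact fractions. This is a minimal sketch with a hypothetical function name. It handles one path only and takes the inbreeding coefficient of the common ancestor as an input.

```python
from fractions import Fraction


def path_contribution(path, ancestor_f=Fraction(0)):
    """Contribution of one path through a common ancestor: (1/2)^n * (1 + F_A),
    where n is the number of individuals in the path."""
    n = len(path)
    return Fraction(1, 2) ** n * (1 + ancestor_f)


f_i = path_contribution("DBACE")  # common ancestor A is not inbred, so F_A = 0
print(f_i)
```

If the common ancestor were itself inbred, its coefficient would be passed as `ancestor_f`, which raises the result accordingly.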

Wahlund effect
So far we have applied population genetics within a single, uniform population. In practice, a species may consist
of a number of separate populations, each more or less isolated from the others. For example, the members of a
species might inhabit a number of islands, with each island population being separated by the sea from the others.
Individuals might migrate between islands from time to time, but each island population would evolve to some
extent independently. A species with a number of more or less independent subpopulations is said to have population
subdivision. The effect of population subdivision on genotypic frequencies was first investigated by S. Wahlund in
1928. In the large, fused population there are fewer homozygotes than in the average for the set of subdivided
populations. The increased frequency of homozygotes in subdivided populations is called the Wahlund effect. A
subdivided population contains fewer heterozygotes than predicted despite the fact that all subdivisions are in
Hardy-Weinberg equilibrium.
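
A small numerical illustration of the Wahlund effect follows. The two subpopulations, their allele frequencies and their equal sizes are assumptions chosen for the example. Each subpopulation is taken to be in Hardy-Weinberg equilibrium.

```python
def hardy_weinberg(p):
    """Genotype frequencies for allele A at frequency p under Hardy-Weinberg equilibrium."""
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


# Hypothetical allele frequencies in two equally sized, isolated subpopulations.
subpopulation_p = [0.2, 0.8]
per_subpopulation = [hardy_weinberg(p) for p in subpopulation_p]

# Observed heterozygote frequency, averaged over the subdivided populations.
mean_heterozygotes = sum(g["Aa"] for g in per_subpopulation) / len(per_subpopulation)

# Heterozygote frequency expected if the same individuals formed one fused, randomly mating population.
pooled_p = sum(subpopulation_p) / len(subpopulation_p)
fused_heterozygotes = hardy_weinberg(pooled_p)["Aa"]

deficit = fused_heterozygotes - mean_heterozygotes
print(round(mean_heterozygotes, 2), round(fused_heterozygotes, 2), round(deficit, 2))
```

The subdivided set has fewer heterozygotes (and correspondingly more homozygotes) than the fused population, even though each subpopulation is itself in equilibrium.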

Chapter 07
Recombinant DNA technology

Recombinant DNA technology is the set of techniques that enable the DNA from different sources to be identified,
isolated and recombined so that new characteristics can be introduced into an organism. The invention of recombinant
DNA technology—the way in which genetic material from one organism is artificially introduced into the genome of
another organism and then replicated and expressed by that other organism—was largely the work of Paul Berg,
Herbert W. Boyer, and Stanley N. Cohen, although many other scientists made important contributions to the new
technology as well. Paul Berg developed the first recombinant DNA molecules that combined DNA from SV40 virus
and lambda phage. Later, Herbert Boyer and Stanley Cohen developed recombinant DNA technology, showing that
genetically engineered DNA molecules may be cloned in foreign cells.
One important aspect in recombinant DNA technology is DNA cloning. It is a set of techniques that are used to
assemble recombinant DNA molecules and to direct their replication within host organisms. The use of the word
cloning refers to the fact that the method involves the replication of a single DNA molecule starting from a single
living cell to generate a large population of cells containing identical DNA molecules.

7.1 DNA cloning


DNA cloning is the production of a large number of identical DNA molecules from a single ancestral DNA molecule.
The essential characteristic of DNA cloning is that the desired DNA fragments must be selectively amplified resulting
in a large increase in copy number of selected DNA sequences. In practice, this involves multiple rounds of DNA
replication catalyzed by a DNA polymerase acting on one or more types of template DNA molecule. Essentially two
different DNA cloning approaches are used: cell-based and cell-free DNA cloning.

Cell-based DNA cloning


This was the first form of DNA cloning to be developed and is an in vivo cloning method. The first step in this
approach involves attaching foreign DNA fragments in vitro to DNA sequences which are capable of independent
replication. The recombinant DNA fragments are then transferred into suitable host cells where they can be propagated
selectively.

The essence of cell-based DNA cloning involves the following steps:

Construction of recombinant DNA molecules


Recombinants are hybrid DNA molecules consisting of an autonomously replicating DNA segment plus inserted elements.
Such hybrid molecules are also called chimeras. Recombinant DNA molecules are constructed by in vitro covalent
attachment (ligation) of the desired DNA fragments (target DNA) to a replicon (any sequence capable of independent
DNA replication). This step is facilitated by cutting the target DNA and replicon molecules with specific restriction
endonucleases before joining the different DNA fragments using the enzyme DNA ligase.


Cell-free DNA cloning

The polymerase chain reaction (PCR) is a newer form of DNA cloning which is enzyme mediated and is conducted
entirely in vitro. PCR (developed in 1983 by Kary Mullis) is a revolutionary technique used for selective amplification
of a specific target sequence of nucleic acid by using short primers. It is a rapid, inexpensive and simple method of
copying a specific DNA sequence.

7.2 Enzymes for DNA manipulation


The enzymes used in the recombinant DNA technology fall into four broad categories:

7.2.1 Template-dependent DNA polymerase


DNA polymerase enzymes that synthesize new polynucleotides complementary to an existing DNA or RNA template
are included in this category. Different types of DNA polymerase are used in gene manipulation.

DNA polymerase I (Kornberg enzyme) has both the 3’-5’ and 5’-3’ exonuclease activities and 5’-3’ polymerase
activity.

Reverse transcriptase, also known as RNA-directed DNA polymerase, synthesizes DNA from RNA.
Reverse transcriptase was discovered by Howard Temin at the University of Wisconsin, and independently by David
Baltimore at about the same time. The two shared the 1975 Nobel Prize in Physiology or Medicine.

Taq DNA polymerase is a thermostable DNA polymerase derived from the thermophilic bacterium Thermus aquaticus. It operates
at 72°C, is reasonably stable above 90°C and is used in PCR. It has a 5’ to 3’ polymerase activity and a 5’ to 3’
exonuclease activity, but it lacks a 3’ to 5’ exonuclease (proofreading) activity.

7.2.2 Nucleases
Nucleases are enzymes that degrade nucleic acids by breaking the phosphodiester bonds that link one nucleotide
to the next. Ribonucleases (RNases) attack RNA and deoxyribonucleases (DNases) attack DNA. Some nucleases
will only attack single stranded nucleic acids, others will only attack double-stranded nucleic acids and a few will
attack either kind. Nucleases are of two different kinds – exonucleases and endonucleases. Exonucleases remove
nucleotides one at a time from the end of a nucleic acid whereas endonucleases are able to break internal
phosphodiester bonds within a nucleic acid. Any particular exonuclease attacks either the 3’-end or the 5’-end but
not both.

Mung bean nuclease


The mung bean nuclease is an endonuclease specific for ssDNA and RNA. It is purified from mung bean sprouts. It
digests single-stranded nucleic acids, but will leave intact any region which is double stranded. It requires Zn2+ for
catalytic activity.

S1 nuclease
The S1 nuclease is an endonuclease purified from Aspergillus oryzae. This enzyme degrades RNA or single stranded
DNA, but does not degrade dsDNA or RNA-DNA hybrids in native conformation. Thus, its activity is similar to that of mung
bean nuclease; however, the enzyme will also cleave a strand opposite a nick on the complementary strand.

RNase A
RNase A is an endonuclease, which digests ssRNA at the 3’ end of pyrimidine residues.

RNase H
It is an endonuclease which digests the RNA strand of an RNA-DNA heteroduplex. The enzyme does not digest ss or
dsDNA.


Restriction endonuclease
A restriction endonuclease (or restriction enzyme) is a bacterial enzyme that cuts dsDNA into fragments after
recognizing specific nucleotide sequences known as recognition or restriction site. The term restriction comes from
the fact that these enzymes restrict the entry of foreign DNA into the bacteria. Restriction enzymes, therefore, are
believed to have evolved in bacteria as a means of resisting viral attack.
The existence of restriction enzymes was first postulated by W. Arber. He noticed that when the DNA of a bacteriophage
entered a host bacterium it was cut up into smaller pieces and, for this, he theorized the presence of restriction
enzyme. In 1970, Hamilton Smith and his co-workers first isolated a restriction enzyme from the bacterium
Haemophilus influenzae strain Rd. The enzyme, called HindII, recognizes a six base-pair dsDNA sequence. After the
discovery of HindII, another restriction enzyme, EcoRI, was isolated and characterized from Escherichia coli strain RY13.

Nomenclature
The name of any restriction endonuclease consists of three parts:
1. An abbreviation of the genus and species of the organism to three letters, e.g. Eco for Escherichia coli
identified by the first letter of the genus and the first two letters of the species.
2. A letter, number or combination of the two to indicate the strain of the relevant species.
3. A Roman numeral to indicate the order in which different restriction modification systems were found in the
same organism or strain.
For example, the name of the EcoRI restriction enzyme was derived as:
E     Escherichia (genus)
co    coli (species)
R     RY13 (strain)
I     First identified (order of identification in the bacterium)
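
The three-part naming rule can be sketched as a small function. The function name and its parameters are hypothetical. Which letter or number of the strain designation is carried into the name is passed in explicitly, because the rule above does not fix it.

```python
ROMAN = {1: "I", 2: "II", 3: "III", 4: "IV", 5: "V"}


def restriction_enzyme_name(genus, species, strain_code, order):
    """Build a name from genus, species, strain code and order of identification."""
    prefix = genus[0].upper() + species[:2].lower()  # e.g. Eco for Escherichia coli
    return prefix + strain_code + ROMAN[order]


print(restriction_enzyme_name("Escherichia", "coli", "R", 1))
print(restriction_enzyme_name("Haemophilus", "influenzae", "d", 2))
```

With these inputs the function returns EcoRI and HindII, the two enzymes named in this section.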

Restriction sites
Rather than cutting DNA indiscriminately, a restriction enzyme cuts only double-helical segments that contain a
particular nucleotide sequence of four to eight base pairs in length, known as a restriction or recognition site. These
are generally palindromic sequences. The position at which the restriction enzyme cuts is usually shown by the
symbol ‘/’. Restriction enzymes make either blunt or staggered cuts. Thus, restriction fragments may have:
• Blunt ends (the cleavage points occur exactly on the axis of symmetry).
• Overhanging ends (the cleavage points do not fall on the symmetry axis, so that the resulting restriction
fragments possess sticky ends or cohesive ends).

5’ G A T A T C 3’      EcoRV      5’ G A T        A T C 3’
3’ C T A T A G 5’                 3’ C T A        T A G 5’

Cleavage site: GAT/ATC            Blunt (flush) cut

5’ G A A T T C 3’      EcoRI      5’ G                A A T T C 3’
3’ C T T A A G 5’                 3’ C T T A A                G 5’

Cleavage site: G/AATTC            Staggered cut

After the staggered cuts, the resulting restriction fragments possess either 5’ overhangs or 3’ overhangs. For
example, the recognition site for the EcoRI enzyme is 5’-GAATTC-3’. Once the staggered cuts have been made, the
resulting fragments have 5’ overhangs (staggered ends). Similarly, the restriction enzyme PstI creates staggered cuts
in its recognition site (5’-CTGCAG-3’) that result in 3’ overhangs (staggered ends).
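
The ideas of a palindromic recognition site, a cut position and the resulting end type can be sketched in a few lines. This is a simplified illustration, not a digestion tool: it works on the top strand of a linear sequence only, the test sequence is invented, and all function names are hypothetical. The cut positions used are G/AATTC for EcoRI, GAT/ATC for EcoRV and CTGCA/G for PstI.

```python
# Recognition site and the top-strand cut offset within that site.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G/AATTC
    "EcoRV": ("GATATC", 3),  # GAT/ATC
    "PstI": ("CTGCAG", 5),   # CTGCA/G
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """A site is palindromic if it reads the same 5'->3' on both strands."""
    return site == reverse_complement(site)


def end_type(enzyme):
    """Classify the ends produced, using the mirrored cut on the bottom strand."""
    site, offset = ENZYMES[enzyme]
    mirrored = len(site) - offset
    if offset == mirrored:
        return "blunt"
    return "5' overhang" if offset < mirrored else "3' overhang"


def digest(sequence, enzyme):
    """Cut the top strand of a linear sequence at every recognition site."""
    site, offset = ENZYMES[enzyme]
    fragments, start = [], 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


dna = "TTGAATTCAAGATATCGG"  # invented sequence with one EcoRI and one EcoRV site
for name in ENZYMES:
    print(name, is_palindromic(ENZYMES[name][0]), end_type(name), digest(dna, name))
```

An enzyme whose site is absent from the sequence, such as PstI here, returns the sequence uncut as a single fragment.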


7.3.4 Vectors for animals


For insects

P element, a transposon, is used as a vector in Drosophila. The P element is 2.9 kb in length and contains three
genes flanked by short inverted repeat sequences at either end of the element. The genes code for transposase,
the enzyme that carries out the transposition process. The inverted repeats form the recognition sequences that
enable the enzyme to identify the two ends of the inserted transposon. The vector is a plasmid that carries two
P elements, one of which contains the insertion site for the DNA that will be cloned. Insertion of the new DNA into
this P element results in disruption of its transposase gene. But the second P element carried by the plasmid has an
intact version of the transposase gene that provides transposase enzyme to carry out transposition.
Vectors for insects based on viral DNA are not common. However, dsDNA of baculoviruses is used as cloning
vector for many insects. Baculoviruses have rod-shaped capsids and large, dsDNA genomes.

For mammals
The genomes of many viruses are used as cloning vectors for mammals. The first vector used for mammalian cells
was based on the SV40 virus genome. SV40 is a small virus that infects monkeys (simian). Now the genomes of many
viruses, such as adenoviruses and papillomaviruses, which have a relatively high insert capacity, are used as vectors
for cloning/expression of genes in mammalian cells. At present, retroviruses are the most commonly used vectors.

7.4 Introduction of DNA into the host cells


7.4.1 In bacterial cells
The process by which bacterial cells take up naked DNA molecules is called transformation. There are basically two
general methods for transforming bacteria.

Chemical transformation method: Bacteria that are able to take up DNA are called ‘competent’ and are made
so by chemical treatment. Competency is a physiological state in which the structure and permeability of the
cell membrane are changed so that naked DNA can enter the cell. The chemical transformation method utilizes CaCl2 and heat
shock to promote DNA entry into cells: the bacteria are incubated with DNA in an ice-cold
salt solution containing CaCl2, followed by a brief heat shock at 42°C. Exactly how this treatment works is not
understood. Possibly CaCl2 causes the DNA to precipitate onto the surface of the cells, or perhaps the salt is
responsible for some kind of change in the cell wall that improves DNA binding.
Electroporation: Competency can also be achieved through the use of electrical pulses, a method called electroporation. It
uses a short pulse of electric charge to facilitate DNA uptake. Electroporation induces formation of microscopic
pores within a biological membrane. These pores, called electropores, allow molecules, ions and water to pass from
one side of the membrane to the other.

7.4.2 In plant cells


The process of transferring exogenous DNA into plant cells is called transformation. Gene transfer to plant cells is
achieved using two different methods:

A. Vector-mediated methods
The vector-mediated methods (or indirect gene transfer methods) exploit the natural ability of certain bacteria
(Agrobacterium species) and viruses to transfer DNA to the genomes of infected plant cells.

Agrobacterium-mediated transformation
Members of the genus Agrobacterium are also known as natural genetic engineers of plants since these bacteria
have the ability to transfer the T-DNA of their plasmids (Ti and Ri) into the plant genome upon infection of cells at the wound site.
In the natural environment, Agrobacterium introduces its T-DNA into compatible host plant cells and via highly
evolved molecular mechanisms stably integrates the new DNA into the plant genome.


The foreign gene is cloned in the T-DNA region of Ti- or Ri-plasmid by replacing unwanted sequences. Agrobacterium
transfers T-DNA, which makes up a small (~5%–10%) region of the Ti- or Ri-plasmid. Transfer requires three
major elements:
1. The right and left border sequences that flank the T-DNA (imperfect, direct repeats of 25 base pairs and the
only essential cis-elements for T-DNA transfer),
2. vir genes located on the Ti and Ri-plasmid and
3. Some chromosomal genes (chromosomal virulence and other genes) located on the bacterial chromosomes.
These chromosomal genes generally are involved in bacterial exopolysaccharide synthesis, maturation and
secretion.

The first step in the process of gene transfer to plant cell involves the formation of the recombinant plasmid. For the
recombinant formation, T-DNA needs to be disarmed. T-DNA contains phytohormone synthesis genes, whose
expression causes infected plants to suffer from unregulated growth. Thus, wild-type Ti plasmids are not suitable as
general vectors. Hence, we must use vectors in which the T-DNA has been disarmed. To do this, the genes encoding
the proteins for the production of phytohormones are simply removed from the T-DNA fragment. New DNA can,
then, be inserted between the left and right border repeats.
Earlier, introducing a gene of interest into T-DNA for subsequent transfer to plants was a very cumbersome process.
This was because Ti and Ri-plasmids are very large, low copy number plasmids that are difficult to isolate and manipulate in
vitro, and do not replicate in Escherichia coli. Large DNA molecules also rarely contain unique restriction
sites. However, this problem has been resolved by using two novel strategies:

Binary vector strategy: The T-DNA does not need to be physically associated with the vir genes in order to
become integrated into the plant genome. The T-DNA and vir regions of Ti-plasmids can therefore be split onto two separate replicons.
As long as both of these replicons are located within the same Agrobacterium cell, proteins encoded by vir genes
could act upon T-DNA in trans to mediate its processing and export to the plant. Systems in which T-DNA and vir
genes are located on separate replicons were eventually termed T-DNA binary systems. Thus, in binary vector
strategy, two plasmids are used and both complement each other in the same bacterial cell. The T-DNA carried by
one plasmid is transferred to the plant chromosomal DNA by proteins coded by vir genes carried by other plasmid.

[Figure labels: binary vector (ori, T-DNA with RB, marker, gene of interest and LB); vir helper (ori, vir genes).]

Figure 7.13 Schematic diagram of T-DNA binary vector systems. Genes of interest are maintained within
the T-DNA region of a binary vector. Vir proteins encoded by genes on a separate replicon (vir helper) mediate
T-DNA processing from the binary vector and T-DNA transfer from the bacterium to the host cell.

Co-integration vector strategy: Although disarmed wild-type Ti plasmids can be used as vectors, their large size
makes them difficult to manipulate in vitro and there are no unique
restriction sites in the T-DNA. This problem can be overcome by the construction of co-integrative vectors. In this
strategy, the gene of interest to be introduced into the Ti plasmid vector is first sub-cloned in a conventional E. coli
plasmid vector (such as pBR322) for easy manipulation, producing a so-called intermediate vector. The insertion of
gene of interest into a Ti plasmid results from the recombination of intermediate vector and a Ti plasmid. The


these pores. Plant cell electroporation generally utilizes the protoplast because thick plant cell walls restrict
macromolecule movement. Electrical pulses are applied to a suspension of protoplasts with DNA placed between
electrodes in an electroporation cuvette. Short high-voltage electrical pulses induce the formation of transient
micropores in cell membranes allowing DNA to enter the cell and then the nucleus.
Microinjection: Extensively used with the animal cell, microinjection of DNA into plant cells has achieved only a
limited success. This is largely because of difficulties in getting the protoplasts immobilized and injecting DNA into
the protoplast without damaging the tonoplast, which surrounds the plant cell vacuole.
Particle bombardment: Particle bombardment (also called microprojectile bombardment or biolistic transformation) employs
high-velocity gold or tungsten particles (0.2–0.4 μm) coated with foreign DNA to deliver the DNA into plant cells. Different
approaches are being used to accelerate the particles. Particles accelerated by a particle gun can penetrate deep into
the tissues. This method is widely used because of its ability to deliver foreign DNA into regenerable cells,
tissues or organs of both monocots and dicots. Because of the physical nature of the process, there is no
biological limitation to the actual DNA delivery; thus it is genotype independent.

Chloroplast transformation
Genetic material in plants is distributed among the nucleus and, in the cytoplasm, the chloroplasts and mitochondria. Each of
these three compartments carries its own genome and expresses heritable traits. Chloroplasts are present in
photosynthetic eukaryotes. There are up to 300 chloroplasts in one plant cell. Chloroplast genomes are usually
circular dsDNA and usually vary in length from 120 to 190 kb. Chloroplasts are maternally
inherited in most (~80%) angiosperm species. The chloroplast genome is also not influenced by polyploidy, gene duplication and
recombination, which are widespread features of the nuclear genomes of plants. Therefore, chloroplast DNA varies
little among angiosperms in terms of size, structure and gene content.
Chloroplast transformation involves delivery of DNA into chloroplasts. For chloroplast transformation, DNA
has to be delivered through the cell wall and through at least three membranes (the plasma membrane and two
chloroplast membranes). Efficient chloroplast transformation has been achieved both through particle bombardment
and polyethylene glycol (PEG)-mediated transformation. PEG-mediated transformation of plastids requires
enzymatically removing the cell wall to obtain protoplasts, then exposing the protoplasts to purified DNA in the
presence of PEG. The protoplasts first shrink in the presence of PEG, then lyse due to disintegration of the cell
membrane. Removing PEG before the membrane is irreversibly damaged reverses the process. Biolistic delivery
is the routine system for most laboratories. Flowering plants contain a variety of plastids (including chloroplasts,
leucoplasts and chromoplasts); thus, the term plastid transformation is more accurate than chloroplast transformation.
Plants with transformed plastid genomes are termed transplastomic.
The major difficulty in chloroplast transformation for the production of transplastomic plants is in generating homoplasmic
plants, in which all the chloroplasts are uniformly transformed. This is because about 10-100 chloroplasts are present
in one cell, each of which has up to 100 copies of the chloroplast genome, which makes the homoplasmic
state difficult to achieve. Apart from this, obtaining a high level of protein expression, even though the gene copy number is high, is another
problem.
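
As a rough worked figure using only the upper bounds quoted in this paragraph (these are not measurements), the number of plastid genome copies that must all carry the transgene in a single homoplasmic cell can be estimated as:

$$\text{genome copies per cell} \le 100 \ \text{chloroplasts} \times 100 \ \text{copies per chloroplast} = 10{,}000$$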

7.4.3 In animal cells


Gene transfer to animal cells has been carried out in many different cell types in culture, either to study gene
function and regulation or to produce large amounts of recombinant protein. Gene transfer to animal cells can be
achieved essentially via two processes. One is transfection which includes techniques to introduce foreign DNA
either directly into the cells (attack strategies) or persuade cells to take up DNA from their surroundings (stealth
strategies). The attack strategy uses physical methods to force DNA into cells e.g. biolistics and microinjection.
The chemical transfection methods such as calcium phosphate precipitation, DEAE dextran-mediated transfection
and liposome-mediated transfection are stealth strategies.


The second method is to package the DNA inside an animal virus, since viruses have evolved mechanisms to
naturally infect cells and introduce their own nucleic acid. The transfer of foreign DNA into a cell by this route is
termed transduction.

Transfection
The direct transfer of DNA into animal cells can be accomplished by a number of techniques that either force the
cells to take in DNA by breaching the cell membrane or exploit the natural ability of cells to internalize certain
molecules in their environment. The term transfection was originally coined to describe the introduction of phage
DNA into bacterial cells. In the same way, forcing animal cells to take up DNA from the surrounding medium using
a variety of chemical and physical methods is also termed transfection, and can be a highly efficient way to
introduce DNA either transiently or stably into cultured cells or cells in vivo. When the term transformation is
applied to animal cells, it usually refers to a stable change of genotype brought about, either by incorporation of the
transfected DNA into the genome, or its long-term episomal maintenance. However, the same term is also used to
indicate oncogenic transformation, that is, the change in phenotype resulting from the activation of an oncogene.
The fate of DNA introduced into the cells depends on the vector system being used. In one fate, the introduced DNA
replicates and is expressed without integration, i.e. it is maintained in the nucleus in an extrachromosomal state
(episomally). This is known as transient transfection. In the second fate, the DNA integrates into a random chromosomal
site of the host genome and replicates as a normal part of the genome. If the introduced DNA integrates into the host
genome and is maintained permanently in the cell, this is called stable transfection. If the transfected exogenous DNA
is non-replicative, stable transfection must occur by integration of the DNA into the genome.

Chemical transfection strategies


The principle of chemical transfection involves the interaction of negatively charged nucleic acids with positively
charged carrier molecules, like polymers or lipids, enabling the nucleic acid to come into contact with the negatively
charged membrane components and incorporating the gene into the cell by endocytosis.
Transfection with calcium phosphate: This chemical transfection can be achieved by washing cultured cells in
a phosphate buffer, adding the DNA, and then adding calcium chloride to the mixture. Under these circumstances,
a fine DNA-calcium phosphate coprecipitate forms; it is thought that the precipitate settles on the surface of cells
and is then internalized through endocytosis.
Transfection with DEAE-dextran: This chemical transfection method utilizes diethylaminoethyl dextran (DEAE-
dextran), a soluble polycationic carbohydrate that promotes interactions between DNA and the cell and thus their
internalization.
Liposomes and Lipofection: Liposomes are vesicles that have an aqueous compartment enclosed by a phospholipid
bilayer. They can be used as a DNA delivery system either by entrapping the DNA inside the aqueous compartment or
by complexing it to the phospholipid bilayer. When mixed with cells in culture, the vesicles fuse with the cell
membrane and deliver DNA directly into the cytoplasm. The efficiency of liposome-mediated gene transfer can be
enhanced by incorporating viral proteins that facilitate the active fusion between viral envelopes and cell membranes.
Such fusogenic particles have been termed virosomes.
Lipofection involves cationic or neutral lipid mixtures which spontaneously associate with negatively charged DNA
to form complexes. Unlike liposome-mediated transfection, where the DNA is encapsulated within a lipid vesicle,
lipofection involves the formation of a DNA lipid complex (lipoplex) which is taken up efficiently by endocytosis.
Cell or protoplast fusion: Certain chemicals (called fusogens), such as polyethylene glycol (PEG), cause cell
membranes to fuse together. This can be exploited to transfect animal cells by mixing them with other cells
containing large amounts of plasmid DNA. Schaffner first successfully used bacterial protoplasts to transfect
mammalian cells in culture by treating bacterial cells with chloramphenicol to amplify the plasmid contents and
lysozyme to remove the cell wall. The protoplasts were then induced to fuse with mammalian cells.
Receptor-mediated transfection: Receptor-mediated transfection involves the delivery of DNA to particular
cells by conjugation to a specific ligand. The ligand interacts with receptors on the cell surface, allowing both the
ligand and the attached DNA to be internalized. One problem associated with this technique is that the ligand-DNA complexes


7.8 Expression vector


An expression vector contains regulatory elements allowing the expression of any foreign DNA it carries. A
foreign gene present on an expression vector can be efficiently transcribed and translated by the host cell. The
simplest expression vectors, transcription vectors, allow transcription, but not translation, of cloned foreign
DNA. Typical protein expression vectors allow both the transcription and translation of cloned DNA, and thus
facilitate the production of recombinant protein.
All protein expression vectors carry a transcription unit containing the sequences required for efficient gene
expression. These comprise transcription regulatory sequences, RNA processing signals and sequences for protein
synthesis and targeting. For transcription, a promoter site and a terminator site are necessary. Transcription of the
desired gene begins at the promoter site and ends at the terminator site. The promoter is the most critical component
of an expression vector since it is the site where RNA polymerase binds. It also regulates the rate of transcription.
An expression vector should carry a strong promoter so that the highest possible rate of gene expression can be
achieved. Regulation of the promoter is another important factor to be considered during construction of an expression
vector. Two important ways of regulating a promoter in E. coli are:
Induction : Where transcription of a gene is switched on by the addition of a chemical.
Repression : Where gene transcription is switched off upon addition of a regulatory chemical.

[Figure 7.18: map of an E. coli expression vector. Labelled elements: promoter, regulatory gene, ribosome binding site, start codon, coding sequence, stop codon, C-terminal tag, transcription terminator, selectable marker, origin.]

Figure 7.18 The basic architecture of an E. coli expression vector. It contains the following features: origin of
replication, promoter, regulatory gene (repressor), selectable marker, transcription terminator, ribosome binding
site (Shine-Dalgarno sequence), multiple cloning site and N- or C-terminal tags. N- or C-terminal tags offer
several potential advantages, such as improved expression and solubility and easier detection and purification.

Most frequently used promoters for an E. coli expression vector:


The lac promoter : It regulates transcription of the lacZ gene coding for β-galactosidase. It can be induced by
isopropylthiogalactoside (IPTG). Fusing the lac promoter sequence to a target gene results in
lactose- (or IPTG-) dependent expression of that gene. However, the lac promoter suffers
from two problems. First, it is fairly weak and therefore cannot drive
very high levels of protein production; second, the lac genes are transcribed to a significant
level in the absence of induction.
The trp promoter : It regulates transcription of a cluster of genes involved in tryptophan biosynthesis. It is repressed
by tryptophan and easily induced by 3-β-indoleacrylic acid.
The tac promoter : It is a hybrid of the trp and lac promoters, but is stronger than either of them. It is induced by IPTG.
The λPL promoter : It is a very strong promoter responsible for transcription of the λ DNA molecule in E. coli. It is repressed
by the product of the λ cI gene, called the λ repressor. An expression vector with the λPL promoter is used with
a mutant E. coli host that synthesizes a temperature-sensitive form of the λ repressor protein. At
low temperature (<30°C), this mutant λ repressor protein is able to repress the λPL promoter; at
higher temperature the protein is inactivated, resulting in transcription of the cloned gene.


7.12 Genome mapping


Genome mapping is a method used to identify the locations of genetic markers (which can be genes and other DNA
sequences) and the relative distances between genetic markers on a genome. There is a difference between a
genome map and a genome sequence. A genome sequence spells out the order of every nucleotide in the genome,
while a map simply identifies a series of landmarks in the genome. A genome map is less detailed than a genome
sequence. There are three kinds of maps: genetic, cytological (or cytogenetic) and physical.

The genetic map gives the relative position of genetic markers according to the frequency of recombination,
expressed in terms of centimorgans (cM). Genetic maps illustrate the order of genetic markers on a chromosome
and the relative distances between those markers.
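
As a minimal sketch of how a recombination frequency translates into a map distance, the following Python snippet assumes the usual convention that 1 cM corresponds to a recombination frequency of 1% (a reasonable approximation only for closely linked markers). The function name and the progeny counts are hypothetical and serve only as an illustration.

```python
def map_distance_cm(recombinant_progeny: int, total_progeny: int) -> float:
    """Return the genetic map distance in centimorgans (cM).

    Assumes 1 cM = 1% recombination, i.e. distance = 100 * (recombinants / total).
    """
    if total_progeny <= 0:
        raise ValueError("total_progeny must be positive")
    if not 0 <= recombinant_progeny <= total_progeny:
        raise ValueError("recombinant_progeny must lie between 0 and total_progeny")
    recombination_frequency = recombinant_progeny / total_progeny
    return 100.0 * recombination_frequency


# Hypothetical test cross: 120 recombinant offspring out of 1000 scored.
distance = map_distance_cm(120, 1000)
print(f"Map distance between the two markers: {distance:.1f} cM")
```

The calculation gives only a relative distance between two markers; it says nothing about the number of base pairs separating them, which is what a physical map provides.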

The cytological map depicts the locations of genetic markers in a chromosome relative to visible landmarks. In
most cases, each chromosome has a characteristic banding pattern, which may be either naturally present, (e.g. in
polytene chromosomes of Drosophila) or more commonly generated by specific staining protocols (e.g. in case of
human chromosomes); the genetic markers are mapped cytologically relative to these band locations. Fluorescent in
situ hybridization (FISH) is widely used to map the cytological locations of genes and other DNA sequences within
large eukaryotic chromosomes.

The physical map describes the absolute distance between two genetic markers in terms of base pairs.

7.12.1 Genetic marker


A gene or DNA sequence having a known location on a chromosome and associated with a particular trait or gene
is used as a genetic marker. Genes were the first markers to be used to prepare the first genetic maps of fruit fly.
Genetic markers used in genetics and plant breeding can be classified into two categories: classical markers and
DNA markers.

Classical markers include morphological markers, cytological markers and biochemical markers.
Morphological markers: Morphological (or visible) markers are usually visually characterized phenotypic traits
or characters such as flower color, seed shape, growth habits or pigmentation. However, morphological markers
are very limited, and many of these markers are not associated with important economic traits (e.g. yield and
quality). These markers are also influenced by environmental factors or the developmental stages.
Cytological markers: Cytological markers are the unique structural features of chromosomes, such as bands and
secondary constrictions. These chromosome features are used not only for characterization of normal chromosomes
and detection of chromosomal mutations, but also in mapping and linkage group identification. However,
direct use of cytological markers has been very limited in genetic mapping.
Biochemical markers: Biochemical markers are gene products that can be detected easily by electrophoresis and
specific staining. Enzyme variants such as isozymes and allozymes are commonly used as biochemical markers.
Allozymes are variants of an enzyme that are encoded by different alleles of the same gene but have the same
catalytic activity or function. They can be separated by electrophoresis and other separation techniques on the basis
of differences in molecular size, shape and electrical charge. Isozymes, in contrast, are enzymes that perform the
same catalytic function but are encoded by different nonallelic genes located at different loci. Biochemical
markers are also called protein markers. Their major disadvantage is that they are limited in number.
A DNA marker is defined as a particular segment of DNA that is representative of the differences at the genome
level. A DNA marker is also called a molecular marker. Strictly speaking, protein markers and DNA markers are
both molecular markers, but the current use of the term is limited to DNA markers. DNA markers should not be
considered as normal genes, as they usually do not have any biological effect, and instead can be thought of as
constant landmarks in the genome. They are identifiable DNA sequences, found at specific locations of the genome,


and transmitted by the standard laws of inheritance from one generation to the next. An ideal DNA marker should
meet the following criteria:
1. High level of polymorphism,
2. Even distribution across the whole genome,
3. Provide adequate resolution of genetic differences,
4. Co-dominance in expression (so that heterozygotes can be distinguished from homozygotes),
5. Have linkage to distinct phenotypes,
6. Genome-specific in nature.

7.12.2 Types of DNA markers


Various types of DNA markers have been described in the literature. They can be broadly divided into two classes
based on the method of their detection: hybridization-based (such as RFLP) and PCR-based (such as RAPD, AFLP, SSLP).
PCR-based techniques can further be subdivided into two subcategories: arbitrarily primed PCR-based techniques
or sequence nonspecific techniques (such as RAPD, AFLP) and sequence targeted PCR-based techniques (such as
SSLP, SNP). The molecular markers can also be classified on the basis of sequence variation (e.g. RFLP) and
length variation (e.g. SSR). DNA markers may be described as codominant or dominant. This description is based
on whether markers can discriminate between homozygotes and heterozygotes. Codominant markers indicate
differences in size whereas dominant markers are either present or absent.

[Gel diagrams: (a) lanes P1 (AA), P2 (aa) and F1 (Aa); (b) lanes P1 (BB), P2 (bb) and F1 (Bb).]

Figure 7.27 Comparison between (a) codominant and (b) dominant markers. Codominant markers can
clearly discriminate between homozygotes and heterozygotes whereas dominant markers do not. Genotypes
at two marker loci (A and B) are indicated below the gel diagrams.
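
The difference between the two marker classes can be sketched in a few lines of Python. This is an illustrative model only: it assumes that each allele of the codominant locus A gives its own band, and that at the dominant locus B only allele B gives a band while allele b gives none. The function names are hypothetical.

```python
def codominant_bands(genotype: str) -> set:
    """Codominant marker: every allele present produces its own band."""
    return set(genotype)


def dominant_band_present(genotype: str, detected_allele: str = "B") -> bool:
    """Dominant marker: scored only as band present or band absent."""
    return detected_allele in genotype


# Locus A (codominant): the heterozygote shows both parental bands.
for genotype in ("AA", "aa", "Aa"):
    print(genotype, sorted(codominant_bands(genotype)))

# Locus B (dominant): BB and Bb give the same pattern, so they cannot be told apart.
for genotype in ("BB", "bb", "Bb"):
    print(genotype, "band" if dominant_band_present(genotype) else "no band")
```

In this model the three genotypes at locus A give three distinct patterns, whereas at locus B the homozygote BB and the heterozygote Bb are indistinguishable.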

RFLPs
RFLP (Restriction Fragment Length Polymorphisms) is the most widely used hybridization-based molecular marker.
RFLP markers were first used in 1975 to identify DNA sequence polymorphisms for genetic mapping. RFLPs arise
because mutations can create or destroy the sites recognized by specific restriction enzymes, leading to variations
between individuals in the length of restriction fragments produced from identical regions of the genome. Although
two individuals of the same species have almost identical genomes, they will always differ at a few nucleotides due
to point mutation and insertion/deletion. Some of the differences in DNA sequences at the restriction sites can
result in the gain, loss or relocation of a restriction site.
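
A minimal sketch of how the loss of a restriction site changes the fragment pattern is given below. The sequences are invented for illustration, the enzyme is assumed to be EcoRI (recognition site GAATTC, cutting after the first G), and the DNA is treated as a linear, single-stranded string; none of these details come from the text above.

```python
def digest(sequence: str, site: str, cut_offset: int = 1) -> list:
    """Return the fragment lengths after cutting a linear DNA at every site.

    cut_offset is the position of the cut within the recognition site
    (1 for EcoRI, which cuts G^AATTC).
    """
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cut_positions + [len(sequence)]
    return [right - left for left, right in zip(boundaries, boundaries[1:])]


# Two hypothetical alleles of the same 34-bp region; they differ by one base
# inside the EcoRI site (GAATTC versus GAGTTC).
allele_with_site = "ATGC" * 3 + "GAATTC" + "ATGC" * 4
allele_without_site = "ATGC" * 3 + "GAGTTC" + "ATGC" * 4

print(digest(allele_with_site, "GAATTC"))     # two fragments
print(digest(allele_without_site, "GAATTC"))  # one uncut fragment
```

The single base change abolishes the site, so the second allele yields one long fragment where the first yields two shorter ones; this length difference is what is scored as an RFLP.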
A single base change within a restriction site is a readily detectable genetic marker because the mutated site is no
longer cleaved by the enzyme in question. Two chromosomes that differ by such a mutation are then distinguishable
on the basis of a restriction fragment length polymorphism (RFLP), which arises because a particular cleavage
site is present in only one of the two DNA molecules. A mutation that gives rise to an RFLP thus represents a genetic
marker. An RFLP of this kind has only two alleles (the site is present or absent), so the maximum heterozygosity is 0.5.
RFLP markers are codominantly inherited and highly reproducible.
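
The value of 0.5 follows from a short calculation. As a sketch, assume random mating (Hardy-Weinberg proportions) and let p and q = 1 − p denote the frequencies of the "site present" and "site absent" alleles; these symbols are introduced here for illustration and are not part of the original text.

```latex
\begin{align*}
H &= 1 - (p^2 + q^2) = 2pq = 2p(1-p), \\
\frac{dH}{dp} &= 2 - 4p = 0 \;\Rightarrow\; p = q = \tfrac{1}{2}, \\
H_{\max} &= 2 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = 0.5 .
\end{align*}
```

Under these assumptions, a two-allele marker is most informative when both alleles are equally frequent, and even then only half of the individuals are expected to be heterozygous.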


[Figure 7.37: flow diagram. Ovum → nucleus removed → enucleated oocyte; mammary gland cells of 6-year-old ewe → induce G0 phase; fusion and activation → renucleated oocyte → in vitro embryo culture → implant.]

Figure 7.37 Cloning sheep by nuclear transfer. The nucleus of an ovum is removed with a pipette. Cells
from the mammary epithelium of an adult are grown in culture, and the G0 state is induced by inhibiting cell
growth. A G0 cell and an enucleated ovum are fused, and the renucleated ovum is grown in culture or in
ligated oviducts until an early embryonic stage before it is implanted into a foster mother, where development
proceeds to term.

7.16 Gene therapy


Gene therapy is a technique for correcting defective genes responsible for disease development. Gene therapy
typically aims to supplement a defective mutant allele with a functional one. Scientists may use one of several
approaches for correcting defective or abnormal genes:
• A normal gene may be inserted into a nonspecific location within the genome (gene addition). This is the most
common approach.
• An abnormal gene can be replaced by a normal gene through homologous recombination (gene replacement).
• An abnormal gene can be repaired through selective reverse mutation, which returns the gene to its normal
function.
Gene therapy may be germ-line or somatic cell gene therapy. Current gene therapy is exclusively somatic gene
therapy, which involves the introduction of genes into somatic cells of an affected individual. Germ-line gene
therapy involves the permanent transmissible modification of the genome of a gamete, a zygote or an early
embryo. Human germ-line gene therapy is currently not sanctioned.
Gene therapy may also be classical or nonclassical. In classical gene therapy, genes are delivered to
appropriate target cells with the aim of obtaining the optimal expression of the introduced genes. The idea of
nonclassical gene therapy is to inhibit the expression of genes associated with pathogenesis, or to correct a
genetic defect in order to restore normal gene expression.

Potential use of somatic gene therapy


The potential use of this therapy is to cure genetic diseases. The first case of gene therapy occurred in 1990, at the
NIH in Bethesda, Maryland. On that occasion, a four-year-old patient with severe combined immunodeficiency
(due to adenosine deaminase enzyme deficiency) received an infusion of white blood cells that had been genetically
modified to contain the gene that was non-functional in the patient's genome. Since then, gene therapy has been studied
and experimentally tested for several medical conditions.


7.18 Plant tissue culture


The field of plant tissue culture is based on the fact that plants can be separated into their component parts (organs,
tissues or cells), which can be manipulated in vitro and then grown back into complete plants. Plant cells or tissues
will continue to grow if supplied with the appropriate nutrients and conditions. The culture of plant cells, tissues and
organs such as roots, shoot tips and leaves in artificial nutrient media aseptically under defined physical and
chemical conditions is referred to as plant tissue culture. ‘Tissue culture’ is commonly used as a broad term to
describe all types of plant cultures, namely callus, cell, protoplast, anther, meristem, embryo and organ cultures.

Plant cells - Unique features


A plant cell is a eukaryotic cell and shares many features with the typical eukaryotic cell. However, some features
are uniquely present in plant cells. Their distinctive features include:
• A cell wall outside the cell membrane which is composed of cellulose, hemicellulose, pectin and in many cases lignin.
• A large central vacuole enclosed by a membrane known as the tonoplast which maintains the cell’s turgor,
controls movement of molecules between the cytosol and sap, stores useful material and digests waste proteins
and organelles.
• Specialized cell-cell communication through plasmodesmata, pores in the primary cell wall through which the
plasmalemma and endoplasmic reticulum of adjacent cells are continuous.
• Plastids such as chloroplasts which contain chlorophyll for photosynthesis, amyloplasts for starch storage,
elaioplasts for fat storage and chromoplasts for the synthesis and storage of pigments.
• A specialized peroxisome called the glyoxysome for the operation of the glyoxylate cycle.
• Cytokinesis by formation of a phragmoplast and cell plate.
• Absence of centrioles in the microtubule organizing centre (MTOC); centrioles are present in animal cells.
• Plant cells are totipotent, which means that, in principle, every cell contains all genetic information to grow a
new plant.

7.18.1 Cellular totipotency


Totipotency is the ability of a single cell to divide and produce all the differentiated cells in an organism. In a
multicellular organism, a cell undergoes differentiation after regulated division; differentiation is the process by
which a cell's functions become specialized. Isolated cells from differentiated tissues are generally non-dividing and
quiescent; to show totipotency, the differentiation process has to be reversed (de-differentiation) and repeated again
(re-differentiation). De-differentiation is the reversion of a differentiated cell to an undifferentiated state, whereas
re-differentiation is the ability of a de-differentiated cell to form a whole organism or organs. Theoretically, all living
cells can revert to an undifferentiated state through de-differentiation. However, the more differentiated a cell has
become, the more difficult it is to induce its de-differentiation. In plants, even highly mature or differentiated cells
have the ability to regress to a meristematic state and show totipotency as long as they are viable. This
developmental plasticity sets plant cells apart from most of their animal counterparts, in which differentiation is
generally irreversible.

7.18.2 Tissue culture media


The success of tissue culture depends on the composition of the growth medium and culture conditions such as
temperature, pH, light and humidity. Growth and morphogenesis of plants in vitro are largely governed by the
composition of the culture media. Media compositions are formulated considering the requirements of a particular
culture system. Culture media used for the in vitro culture of plant cells are composed of four basic components:

1. Essential elements, or mineral ions, supplied as a complex mixture of salts


The principal components of most plant tissue culture media are inorganic nutrients (macronutrients like nitrogen,
phosphorus, potassium, calcium, magnesium and sulphur and micronutrients like iron, manganese, zinc, boron,
copper and molybdenum).
Chapter 08
Bioprocess engineering

Bioprocess engineering is a specialization of chemical engineering that deals with the design and development of
equipment and processes for the manufacturing of products such as food, pharmaceuticals and polymers from
biological materials. It uses the capabilities of organisms in industrial, medical, environmental or agricultural processes
in order to produce useful biological materials.

Applications of bioprocess engineering include:

• Design and operation of fermentation systems,
• Development of food processing systems,
• Application and testing of product separation technologies,
• Design of instrumentation to monitor and control biological processes, and much more.

Bioprocess engineers work at the frontiers of biological and engineering sciences to bring engineering to life
through the conversion of biological materials into other forms needed by mankind. One of the main tasks of a
bioprocess engineer is the control and maintenance of biological processes such as the production of beverages,
pharmaceuticals, antibiotics, enzymes and biochemicals, food processing and biological waste treatment. These processes
require a well-designed growth environment to obtain the maximum yield of the product; consequently, these
conditions need to be carefully controlled. Environmental design comprises the determination of the environment of
the process, while fermentation engineering provides the means for meeting those requirements.

8.1 Concept of material and energy balance


8.1.1 Material balance
System and process
In performing the material balance, we apply thermodynamic terms – system and process. A system is defined as
a part of the universe that is under consideration. All space outside the system is known as the surroundings. A
system is separated from the surrounding by a system boundary, which may be real or imaginary. If the boundary
doesn’t allow mass to pass from system to surroundings and vice versa, the system is considered as a closed
system with constant mass. If the system boundary allows the mass to pass from system to surroundings and vice
versa, then it is an open system.

A process causes changes in the system or surroundings. A bioprocess can be operated as a batch, semi-batch,
fed-batch or continuous process.

A batch process operates in a closed system. All materials are added to the system at the start of the process, the
system is then closed and products removed only when the process is complete.


A semi-batch process allows either input or output of mass, but not both.
A fed-batch process allows input of material to the system but not output.
A continuous process allows matter to flow in and out of the system. If rates of mass input and output are equal,
continuous processes can be operated indefinitely.

Steady state and Equilibrium


If all properties of a system (such as temperature, volume, mass etc.) do not vary with time, the process is said to
be at steady state. An equilibrium can be considered as a special case of a steady state. If a system is in
equilibrium then it is definitely in a steady state, but the reverse is not necessarily true. That is, if the system is in
a steady state then it is not necessarily in an equilibrium state. In both an equilibrium and a steady state there is no
net observable change taking place in the system. A system at equilibrium is one in which all opposing forces are
exactly counter-balanced so that the properties of the system do not change with time. At equilibrium there is no
net change in either the system or the universe. Equilibrium implies that there is no net driving force for change;
the energy of the system is at a minimum and, in rough terms, the system is 'static'.
According to the above definition of steady state, batch, fed-batch and semi-batch processes cannot operate under
steady-state conditions. The mass of the system is either increasing or decreasing with time during fed-batch and
semi-batch processes; in batch processes, even though the total mass is constant, changes occurring inside the system
cause the system properties to vary with time. Such processes are called transient or unsteady-state processes.
Continuous processes may be steady or unsteady. In continuous processes at steady state, mass is constantly
exchanged with the surroundings; this disturbance drives the system away from equilibrium so that a net change in
both the system and the surroundings can occur.

Law of conservation of mass


Material balances are based on the law of conservation of mass. The law of ‘conservation of mass’ states that mass
cannot be created or destroyed. Doing a mass balance is similar in principle to accounting. In accounting, accountants
do balances of what happens to a company’s money. In the process of mass balance, the first step is to look at the
three basic categories: mass in, mass out and mass stored. The mass can be total mass, the mass of a particular
molecular or atomic species, or biomass.

General mass balance equation

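\[
\begin{pmatrix}\text{Mass in}\\ \text{through system}\\ \text{boundaries}\end{pmatrix}
-
\begin{pmatrix}\text{Mass out}\\ \text{through system}\\ \text{boundaries}\end{pmatrix}
+
\begin{pmatrix}\text{Mass}\\ \text{generated}\\ \text{within system}\end{pmatrix}
-
\begin{pmatrix}\text{Mass}\\ \text{consumed}\\ \text{within system}\end{pmatrix}
=
\begin{pmatrix}\text{Mass}\\ \text{accumulated}\\ \text{within system}\end{pmatrix}
\qquad (1)
\]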

Note: During chemical or biochemical reactions, the following two quantities are conserved:
• Total mass, so that total mass of reactants = total mass of products.
• Number of atoms of each element, so that, for example, the number of C, H and O atoms in the reactants = the
number of C, H and O atoms, respectively, in the products.

Bioprocess engineers do a mass balance to account for what happens to each of the chemicals that is used in a
chemical process. For example, in a plant that is producing sugar, if the total quantity of sugar going into the plant
is not equalled by the total of the purified sugar and the sugar in the waste liquors, then there is something wrong.
Sugar is either being burned (chemically changed) or accumulating in the plant or else it is going unnoticed down
the drain somewhere. In this case, the mass balance is:

Raw materials = Products + Waste products + Stored products + Losses
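
The sugar example can be written as a short calculation. The sketch below rearranges the balance above to find the unaccounted losses; the function name and all quantities are hypothetical and are chosen only to illustrate the bookkeeping.

```python
def unaccounted_losses(raw_materials: float, products: float,
                       waste_products: float, stored_products: float) -> float:
    """Solve: Raw materials = Products + Waste products + Stored products + Losses."""
    return raw_materials - (products + waste_products + stored_products)


# Hypothetical daily figures for sugar in a processing plant, in kg.
losses = unaccounted_losses(
    raw_materials=1000.0,   # sugar entering the plant
    products=920.0,         # purified sugar leaving the plant
    waste_products=50.0,    # sugar in the waste liquors
    stored_products=10.0,   # sugar accumulating inside the plant
)
print(f"Sugar not accounted for: {losses:.1f} kg")
```

A non-zero result indicates that some sugar is being chemically changed or is leaving the plant through a stream that has not been measured.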

Mass balances can be based on total mass, mass of dry solids or mass of particular components, for example
protein. If a mass balance is written using the total mass in each process stream, then it is called total balance.


A separate mass balance can be written for a particular chemical component in the total mass. This is called
component balance. Thus, for a component mass balance the simplest expression is:

Input – Output + Formation – Disappearance = Accumulation

Types of mass balance


The general mass-balance equation can be applied in both continuous and batch processes. For continuous processes,
it is usual to collect information about the process referring to a particular instant of time. Amounts of mass entering and
leaving the system are specified using flow rates. A mass balance based on rates is called a differential balance.
An alternative approach is required for batch and semi-batch processes. Information about these systems is usually
collected over a period of time rather than at a particular instant. This type of balance is called an integral balance.

Mass balance in steady and unsteady state


As defined above, a process is at steady state if no property of the system (such as temperature, volume or mass)
varies with time; in unsteady state, system properties vary with time. Batch and fed-batch processes cannot
operate under steady-state conditions: in a fed-batch process the total mass changes with time, whereas in a batch
process, even though the total mass remains constant, changes occurring inside the system cause the system
properties to vary with time. Continuous processes may be steady or unsteady; an unsteady-state condition generally
exists during startup of a continuous process. If a continuous process is at steady state, the accumulation term in
equation 1 must be zero: because all properties of the system, including its mass, must be unchanging with time, a
system at steady state cannot accumulate mass. Under these conditions, we can write the mass balance equation as,

Mass in + Mass generated = Mass out + Mass consumed

This is called the general steady-state mass balance equation. If there is no reaction in the system, then there is
neither generation nor consumption of mass. In such situation at steady state,

Mass in = Mass out


At steady state, does mass in = mass out?

Material                                   Without reaction    With reaction
Total mass                                 Yes                 Yes
Total number of moles                      Yes                 No
Mass of a molecular species                Yes                 No
Number of moles of a molecular species     Yes                 No
Mass of an atomic species                  Yes                 Yes
Number of moles of an atomic species       Yes                 Yes
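
The "with reaction" column can be checked numerically. The sketch below uses the conversion of glucose to ethanol and carbon dioxide (C6H12O6 → 2 C2H5OH + 2 CO2) as an assumed example reaction that is not taken from the text, with rounded standard atomic masses; all names in the code are hypothetical.

```python
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}  # g/mol, rounded

SPECIES = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "ethanol": {"C": 2, "H": 6, "O": 1},
    "CO2": {"C": 1, "O": 2},
}

# Stoichiometric coefficients (mol) on each side of the reaction.
REACTANTS = {"glucose": 1}
PRODUCTS = {"ethanol": 2, "CO2": 2}


def molar_mass(formula: dict) -> float:
    """Molar mass in g/mol from an element -> count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def total_mass(side: dict) -> float:
    """Total mass (g) of all species on one side of the reaction."""
    return sum(moles * molar_mass(SPECIES[name]) for name, moles in side.items())


def total_moles(side: dict) -> int:
    """Total number of moles of molecules on one side of the reaction."""
    return sum(side.values())


def atom_moles(side: dict, element: str) -> int:
    """Moles of one atomic species on one side of the reaction."""
    return sum(moles * SPECIES[name].get(element, 0) for name, moles in side.items())


print("Total mass in/out (g):", round(total_mass(REACTANTS), 3), round(total_mass(PRODUCTS), 3))
print("Total moles in/out:", total_moles(REACTANTS), total_moles(PRODUCTS))
for element in ATOMIC_MASS:
    print(element, "atoms in/out:", atom_moles(REACTANTS, element), atom_moles(PRODUCTS, element))
```

Total mass and the amount of each atomic species balance across the reaction, whereas the total number of moles and the mass of each molecular species do not.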

In unsteady state, when the mass of the system varies as a function of time we can apply the following flow diagram.

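[Flow diagram: a stream with mass flow rate Min enters the system and a stream with mass flow rate Mout leaves it; within the system, the mass M changes through generation (RG) and consumption (RC).]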

If, M = the mass of species A in the system
Min = the mass flow rate of A entering the system
Mout = the mass flow rate of A leaving the system
RG = the rate of generation of species A
RC = the rate of consumption of species A

In this situation, the general mass balance equation is

\[ \frac{dM}{dt} = M_{in} - M_{out} + R_G - R_C \]
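
The unsteady-state balance can be integrated numerically. The following is a minimal sketch using explicit Euler steps; the function name, the step size and the fed-batch numbers (constant feed, no outflow, first-order consumption) are assumptions chosen only for illustration.

```python
import math


def integrate_mass(m0, m_in, m_out, r_gen, r_con, t_end, dt=0.01):
    """Explicit Euler integration of dM/dt = Min - Mout + RG - RC.

    m0 is the initial mass of species A (kg); each rate argument is a
    function of (t, M) returning a rate in kg/h; t_end and dt are in hours.
    """
    steps = round(t_end / dt)
    t, m = 0.0, m0
    for _ in range(steps):
        dm_dt = m_in(t, m) - m_out(t, m) + r_gen(t, m) - r_con(t, m)
        m += dm_dt * dt
        t += dt
    return m


# Hypothetical fed-batch case: 2 kg/h of A is fed, nothing is withdrawn,
# and A is consumed by a first-order reaction with rate 0.1 * M (kg/h).
m_final = integrate_mass(
    m0=10.0,
    m_in=lambda t, m: 2.0,
    m_out=lambda t, m: 0.0,
    r_gen=lambda t, m: 0.0,
    r_con=lambda t, m: 0.1 * m,
    t_end=5.0,
)

# For this particular case the balance has the closed form M = 20 - 10 exp(-0.1 t).
m_exact = 20.0 - 10.0 * math.exp(-0.1 * 5.0)
print(f"Euler: {m_final:.3f} kg, analytical: {m_exact:.3f} kg")
```

Setting dM/dt to zero in the same balance recovers the steady-state form, here M = 20 kg.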

Bioprocess engineering

unfamiliar with the relative magnitudes of the various forms of energy entering into a particular processing situation,
it is wise to put them all down. Then after some preliminary calculations, the important ones emerge and minor
ones can be lumped together or even ignored without introducing substantial errors.

Let us take the following system in which mass Min enters the system while mass Mout leaves. Both these masses
have energy (enthalpy, kinetic and potential) associated with them. During bioprocessing, high velocity motion and
large changes in height or electromagnetic field do not generally occur. Thus, we can ignore the values of kinetic
and potential energy. Enthalpy (H) is the total heat content of the system. It is defined as the sum of the internal
energy and the product of the pressure and volume.
Energy leaves the system as heat, Q, and work, WS, is done on the system by the surroundings. The total change or
accumulation of energy in the system is

ΔE = (MH)in – (MH)out – Q + WS

[Figure: mass Min enters the system and mass Mout leaves it; work WS is done on the system.]

This equation is the mathematical expression of the law of conservation of energy. Energy flows represented by Q and
WS can be directed either into or out of the system; appropriate signs must be used to indicate the direction of flow.
In the above figure we have followed the convention that:
• Work is positive when energy flows from the surroundings to the system.
• Work is negative when the system supplies work energy to the surroundings.
• Heat is positive when energy flows from the system to the surroundings, i.e. when the temperature of the system is
higher than that of the surroundings.
At steady state, there is no change in the energy of the system; ΔE = 0
The steady state energy balance equation is (MH)in – (MH)out – Q + WS = 0
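
A short sketch of how the steady-state energy balance might be used: solving it for Q gives the heat that must be removed from the system. The function name and the stream values below are hypothetical.

```python
def heat_removed(m_in, h_in, m_out, h_out, w_s):
    """Solve (MH)in - (MH)out - Q + WS = 0 for Q.

    m_in, m_out: mass flow rates (kg/h)
    h_in, h_out: specific enthalpies relative to a common reference (kJ/kg)
    w_s: work done on the system (kJ/h)
    A positive result means heat leaves the system, following the sign
    convention used in the text.
    """
    return m_in * h_in - m_out * h_out + w_s


# Hypothetical numbers: 1000 kg/h enters at 150 kJ/kg and leaves at 120 kJ/kg,
# while the stirrer delivers 5000 kJ/h of work to the broth.
q = heat_removed(m_in=1000.0, h_in=150.0, m_out=1000.0, h_out=120.0, w_s=5000.0)
print(f"Q = {q:.0f} kJ/h")
```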

Enthalpy calculation
The most common energy form is heat energy. In a constant pressure system, with negligible changes in potential
and kinetic energies, the energy balance can be cast in terms of enthalpy changes. A heat balance is calculated on
the basis of the enthalpies of the substances taking part in a process and the heats of the corresponding chemical
reactions. Enthalpy is a measure of the total energy. It is an extensive property. Since internal energy cannot be
measured in absolute terms, the absolute enthalpy of a compound cannot be calculated. Changes in enthalpy are
evaluated relative to reference states. A change in enthalpy can occur as a result of temperature change, phase
change, mixing and reaction.

Temperature change
The amount of energy released or absorbed by a substance during a change of temperature, without a change of
phase, is termed sensible heat. The sensible heat of a process may be calculated as the product of the body’s mass,
its specific heat and its change in temperature:

ΔH = M CP ΔT

Where, M = mass of the body,
CP = specific heat capacity at constant pressure
ΔT = change in temperature
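
As a worked illustration of the sensible heat equation, the sketch below estimates the heat needed to bring a batch of medium up to sterilization temperature. The mass, the temperatures and the use of a water-like specific heat of about 4.18 kJ/(kg K) are assumptions for illustration only.

```python
def sensible_heat(mass_kg, cp_kj_per_kg_k, delta_t_k):
    """Sensible heat change, dH = M * Cp * dT, in kJ."""
    return mass_kg * cp_kj_per_kg_k * delta_t_k


# Hypothetical case: 500 kg of medium heated from 25 degC to 121 degC,
# treating the medium as water with Cp of roughly 4.18 kJ/(kg K).
delta_h = sensible_heat(500.0, 4.18, 121.0 - 25.0)
print(f"dH = {delta_h:.0f} kJ")
```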


Phase change
When a substance changes from one phase of matter to another, we say that it has undergone a change of phase.
A change of phase is always accompanied by a transfer of heat: heat either enters the material or leaves it during
the change. Although the heat content of the material changes, its temperature does not. The amount of energy
released or absorbed during a phase change is called latent heat. The latent heat for a given mass of the substance
can be calculated using the equation:

Q = ML

where, Q is the amount of energy released or absorbed during the change of phase of the substance
M is the mass of the substance and
L is the specific latent heat (per gram) for a particular substance, written as Lf for the specific
latent heat of fusion and as Lv for the specific latent heat of vaporization.
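
The latent heat equation can be sketched in the same way. The values used for water (about 334 kJ/kg for fusion and about 2257 kJ/kg for vaporization at 100 °C) are approximate, commonly tabulated figures, and the 10 kg mass is hypothetical; the units here are per kilogram rather than per gram.

```python
def latent_heat(mass_kg, specific_latent_heat_kj_per_kg):
    """Energy absorbed or released during a phase change, Q = M * L, in kJ."""
    return mass_kg * specific_latent_heat_kj_per_kg


L_F_WATER = 334.0   # approximate specific latent heat of fusion of water, kJ/kg
L_V_WATER = 2257.0  # approximate specific latent heat of vaporization of water, kJ/kg

# Hypothetical case: 10 kg of water melted, and 10 kg of water evaporated.
q_fusion = latent_heat(10.0, L_F_WATER)
q_vaporization = latent_heat(10.0, L_V_WATER)
print(f"melting: {q_fusion:.0f} kJ, evaporation: {q_vaporization:.0f} kJ")
```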

Enthalpy change due to mixing

When compounds are mixed or dissolved, bonds between molecules in the solvent and solute are broken and
reformed, so a net absorption or release of energy takes place and the internal energy and enthalpy of the mixture
change. The enthalpy change during mixing of two non-ideal compounds A and B is given by

ΔHmixing = Hmixture – (HA + HB)

For an ideal solution or ideal mixture, ΔHmixing = 0

Enthalpy change due to reaction


Bioprocessing involves enzyme-catalyzed reactions. During the reaction, relatively large changes in internal energy
and enthalpy occur. The enthalpy of a reaction is the amount of heat released or absorbed during the reaction and is
equal to the difference between the enthalpy of the products and that of the reactants. For an exothermic reaction,
the enthalpy of reaction is negative. For an endothermic reaction, the enthalpy of reaction is positive.

ΔHreaction = Hproduct – Hreactant
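
As an illustration of the reaction enthalpy equation, the sketch below evaluates the complete oxidation of glucose, C6H12O6 + 6 O2 → 6 CO2 + 6 H2O. The choice of reaction and the standard heats of formation (approximate, commonly tabulated values in kJ/mol) are assumptions added here and are not taken from the text.

```python
def reaction_enthalpy(products, reactants):
    """dH_reaction = H_products - H_reactants.

    Each argument is a list of (moles, molar enthalpy in kJ/mol) pairs,
    with all enthalpies taken relative to the same reference state.
    """
    h_products = sum(n * h for n, h in products)
    h_reactants = sum(n * h for n, h in reactants)
    return h_products - h_reactants


# Approximate standard heats of formation, kJ/mol (assumed values).
H_GLUCOSE = -1273.3
H_O2 = 0.0
H_CO2 = -393.5
H_H2O_LIQUID = -285.8

dh = reaction_enthalpy(
    products=[(6, H_CO2), (6, H_H2O_LIQUID)],
    reactants=[(1, H_GLUCOSE), (6, H_O2)],
)
print(f"dH_reaction = {dh:.1f} kJ per mol glucose")
```

The negative sign of the result marks the reaction as exothermic, in line with the convention stated above.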

8.2 Cell growth kinetics


Cell growth is an orderly increase in cell mass and cell number. In batch culture, when cells are inoculated into
a flask containing fresh culture medium and incubated, after a short lag phase the cells enter a rapid growth phase
during which they divide and increase their population in the medium. Since the cells are not transferred
to a new medium and no fresh nutrients are added, the increasing population of cells, after some time,
enters a stationary phase with the exhaustion of the required nutrients and the accumulation of inhibitory end
products in the medium. Eventually, the stationary phase culminates in a death phase, when the
viable cells begin to die. A batch culture can be considered to be a closed system.
Growth in batch culture can be divided into four distinct phases: lag phase, log phase, stationary phase
and death phase. The rate of growth varies depending on the growth phase.

Lag phase
During the lag phase, there is no increase in cell number. It is a period of adaptation of cells to a new environment.
There is no change in number, but an increase in mass. Thus in this phase, cells are not dormant. The length of the
lag phase is determined in part by the characteristics of the cells and conditions in the media. Multiple lag phases
may sometimes be observed when the medium contains more than one carbon source. This phenomenon is known
as diauxic growth. It occurs due to a shift in metabolic pathways in the middle of a growth cycle. After one carbon
source is exhausted, the cells adapt their metabolic activities to utilize the second carbon source.


The residual substrate concentration in the reactor is controlled by the dilution rate. Any alteration to the dilution rate
results in a change in the growth rate of the cells that depends on substrate availability at the new dilution
rate. Thus, growth is controlled by the availability of a rate-limiting nutrient. This system, where the concentration
of the rate-limiting nutrient entering the system is fixed, is often described as a chemostat, as opposed to operation
as a turbidostat, where nutrients in the medium are not limiting. In a turbidostat, the turbidity of the culture is monitored
and maintained at a constant value by regulating the dilution rate, i.e. cell concentration is held constant.

Table 8.2 Comparison between turbidostat and chemostat continuous cultures

Parameter                         Turbidostat            Chemostat
Methods of growth rate control    Internal               External
Growth rate of culture            At or close to µmax    From just above zero to just below µmax
Culture volume                    Constant               Constant
Environmental conditions          Constant               Constant
Duration of culture               Indefinite             Indefinite
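
The dependence of residual substrate on dilution rate in a chemostat can be sketched numerically. The snippet assumes Monod growth kinetics and the steady-state condition that the specific growth rate equals the dilution rate; neither relation is stated in this passage, and the parameter values are hypothetical.

```python
def residual_substrate(dilution_rate, mu_max, k_s):
    """Steady-state residual substrate in a chemostat under assumed Monod kinetics.

    Setting mu = mu_max * S / (Ks + S) equal to D and solving for S gives
    S = Ks * D / (mu_max - D), which is only meaningful for 0 < D < mu_max.
    """
    if not 0.0 < dilution_rate < mu_max:
        raise ValueError("a steady state requires 0 < D < mu_max")
    return k_s * dilution_rate / (mu_max - dilution_rate)


# Hypothetical organism: mu_max = 0.5 per hour, Ks = 0.2 g/L.
for d in (0.1, 0.25, 0.4, 0.45):
    s = residual_substrate(d, mu_max=0.5, k_s=0.2)
    print(f"D = {d:.2f} per hour -> residual substrate = {s:.2f} g/L")
```

Under these assumptions the residual substrate rises steeply as the dilution rate approaches µmax, which matches the chemostat operating range given in Table 8.2.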

8.3 Fermentation
Fermentation (derived from the Latin verb fervere, to boil) is a general term for the anaerobic catabolism of organic
compounds such as sugar to obtain energy. In biochemistry, the term ‘fermentation’ has been used in a strict sense
to mean an energy-generation process in which organic compounds act as both electron donors and terminal
electron acceptors. However, industrial microbiologists have extended the term fermentation to describe any anaerobic
or aerobic process for the production of a product by the mass culture of microorganisms. The bioreactors
used for fermentation processes can be called fermentors. Although the term bioreactor is often considered to be
synonymous with fermentor, not all bioreactors are fermentors. Bioreactors are the apparatus in which biochemical
reactions are performed, involving organisms or biochemically active substances derived from such
organisms. Bioreactors which use living cells are usually called fermentors. Some bioprocess engineers use the term
fermentor for vessels used to grow microbial cells and bioreactor for those in which plant and mammalian cells are
cultured.

8.3.1 Fermentation processes


On the commercial scale, there are five major groups of fermentation processes:
1. Produce microbial cells (or biomass) as a product.
Bakers’ yeast, used in the baking industry, is an example of a produced cell mass. Others include single-cell
proteins for food sources.
2. Produce microbial enzymes.
3. Produce microbial metabolites.
4. Produce recombinant products.
5. Processes that modify a compound that is added to the fermentation process are referred to as biotransformations.
Biotransformations occur using the inherent enzymatic capability of most cells. Cells of all types can be employed
to biocatalyze a transformation of certain compounds via dehydration, oxidation, hydroxylation, amination or
isomerization.

Component parts of a fermentation process


Regardless of the type of fermentation an established process may be divided into six basic component parts:
1. The formulation of the medium to be used in culturing the process organism during the development of the
inoculum and in the production fermenter.


8.4 Bioreactor
A bioreactor (in biochemical engineering, we also use terms like biochemical reactor, biological reactor, fermenter
or microbial reactor, which are all synonymous) is a vessel in which biochemical reactions are performed, involving
organisms or biochemically active substances derived from such organisms. Bioreactors are commonly
cylindrical, ranging in size from a few litres to several cubic metres, and are often made of stainless steel. The process
in the bioreactor can be either aerobic or anaerobic. The term bioreactor is often used synonymously with fermenter;
however, in the strict sense, a fermenter is a system in which living cells are used.

Bioreactor design
Bioreactor design is a relatively complex engineering task. The goal of an effective bioreactor is to control, contain
and positively influence the biological reaction. Suitable bioreactor design criteria include:
• Microbiological and biochemical characteristics of the cell systems (microbial, mammalian, plant cell).
• Hydrodynamic characteristics of the bioreactor.
• Mass and heat transfer characteristics of the bioreactor.
• Kinetics of cell growth and product formation.
• Genetic stability characteristics of the cell system.
• Sterilization and maintenance of sterility.
• Agitation (for mixing of cells and medium) and aeration (aerobic fermenters; for O2 supply).
• Process monitoring and control (regulation of factors like temperature, pH, pressure, aeration, nutrient).
• Implication of bioreactor design on downstream product separation.
• Capital and operating costs of the bioreactor.
• Potential for bioreactor scale-up.

In addition to controlling these, the bioreactor must be designed to both promote formation of the optimal morphology
of the organism and eliminate or reduce contamination by unwanted organisms or mutation of the organisms.
There are a wide variety of bioreaction systems, and any attempt to categorize them by their various attributes will
naturally result in some overlap of system characteristics.

8.4.1 Agitation and aeration


Agitation

Mixing is one of the most important operations in bioprocessing. Within a fermenter, there is a need to mix three
different phases:
• Liquid phase, which contains dissolved nutrients and metabolites.
• Gaseous phase, which is predominantly oxygen and CO2.
• Solid phase, which is made up of the cells and any solid substance that may be present.

Purpose of mixing
• Air bubble dispersion;
• Mass transfer from air bubbles (i.e. oxygen supply) to the liquid and then to the cells;
• Supply of the nutrient components to cells (more precisely, cell agglomerates);
• Prevention of sedimentation;
• Securing of heat transfer;
• Dissolution of nutrient components that are poorly soluble.


Chemicals controlling foams have been classified into antifoams, which are added in the medium to prevent foam
formation, and defoamers which are added to knock down foams once these are formed. Natural antifoams include
plant oils (e.g. from sunflower and rapeseed), deodorized fish oil and mineral oils. The synthetic antifoams are
mostly silicone oils, polyalcohols and alkylated glycols. An ideal antifoam should have the following properties:

1. Should disperse readily and have fast action on an existing foam.
2. Should be active at low concentrations.
3. Should not be metabolized by the microorganism and also non-toxic to the microorganism.
4. Should not cause any problem in the extraction and purification of the product.
5. Should have no effect on oxygen transfer.
6. Should be heat sterilizable.

Table 8.5 List of monitoring parameters and measuring devices

Monitoring parameter    Measuring devices
Temperature             Resistance thermometer or thermistors
Pressure                Diaphragm gauges
Flow rate               Rotameter or thermal mass flow meter
Viscosity               Rotational viscometers
Turbidity               Photometer
Foam                    Conductance probe, capacitance probe
Agitation               Tachometer

8.6 Sterilization
If a culture medium or a part of the equipment used for fermentation becomes contaminated by living foreign
microorganisms, the target microorganisms must grow in competition with the contaminating microorganisms.
Thus, not only the medium but also all of the fermentation equipment being used must be sterilized prior to the start
of fermentation, so that they are perfectly free from any living microorganisms and spores. In case of aerobic
fermentations, the air supplied to the fermentor should also be free from contaminating microorganisms. Sterilization
is a term referring to any process that eliminates or kills all forms of life. Sterilization can be achieved by
applying the proper combinations of heat, chemicals, irradiation and filtration.

Thermal sterilization
For all microorganisms, there is a maximum temperature for growth, beyond which viability decreases. At very
high temperature, virtually all macromolecules lose their structure and their ability to function. The destruction of
microorganisms by heat at a constant temperature follows a first-order rate equation. If the initial number of
cells = N0, the number of destroyed cells = N’ at time t and N the surviving cells, then the death rate is

\[ \frac{dN}{dt} = -k_d (N_0 - N') = -k_d N \]

where kd is the specific death constant, dependent on microbial type and temperature. When integrated between N0 at
time t = 0 and N at time t = t, the following equation is obtained.

\[ \ln \frac{N}{N_0} = -k_d t \]

\[ N = N_0 e^{-k_d t} \]

where t is time (min).
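
The first-order death equation can be used to estimate the number of survivors after a given holding time, or the holding time needed to reach a target level of contamination. The sketch below assumes a constant temperature (so that kd is constant); the function names and the numerical values are hypothetical.

```python
import math


def survivors(n0, k_d, t_min):
    """Surviving cells after t_min minutes: N = N0 * exp(-kd * t)."""
    return n0 * math.exp(-k_d * t_min)


def holding_time(n0, n_target, k_d):
    """Time (min) to reduce N0 to n_target, from ln(N / N0) = -kd * t."""
    return math.log(n0 / n_target) / k_d


# Hypothetical case: 1e10 viable cells initially, kd = 1.0 per minute at the
# holding temperature, and an accepted survival level of 1e-3 cells.
t_hold = holding_time(n0=1e10, n_target=1e-3, k_d=1.0)
print(f"holding time = {t_hold:.1f} min")
print(f"survivors after 10 min = {survivors(1e10, 1.0, 10.0):.3g}")
```

A fractional target such as 1e-3 is read here as a probability of one surviving cell per thousand batches; that interpretation is a common convention and is not taken from this passage.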

The rate constant kd is a function of temperature. It increases sharply with temperature and can be experimentally
determined for an organism. According to the Arrhenius equation, the relationship between temperature and kd can be
expressed as:


(e.g. water and ethanol) are Newtonian fluids. In non-Newtonian fluids, viscosity varies with shear or agitation
rates. Depending on how the shear stress varies with the shear rate, these are categorized into pseudoplastic,
dilatant and Bingham plastic fluids. The viscosity of pseudoplastic fluids decreases with increasing shear rate,
whereas dilatant fluids show an increase in viscosity with shear rate. Bingham plastic fluids do not flow until a
threshold stress called the yield stress is applied, after which the shear stress increases linearly with the shear rate.

[Plot: shear stress (Pa) against shear rate (s⁻¹), with curves for Bingham, pseudoplastic, Newtonian and dilatant fluids.]

Figure 8.14 Relationship between shear rate and shear stress for Newtonian and non-Newtonian fluids.
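
The behaviour shown in the figure can be reproduced with two widely used rheological models, which are assumptions added here since the text does not name them: the power-law model, τ = K γⁿ (n < 1 pseudoplastic, n = 1 Newtonian, n > 1 dilatant), and the Bingham model, τ = τ0 + Kp γ. All parameter values are hypothetical.

```python
def power_law_stress(shear_rate, k, n):
    """Shear stress (Pa) from the assumed power-law model, tau = K * gamma**n."""
    return k * shear_rate ** n


def bingham_stress(shear_rate, tau_0, k_p):
    """Shear stress (Pa) of a flowing Bingham plastic, tau = tau_0 + Kp * gamma."""
    return tau_0 + k_p * shear_rate


def apparent_viscosity(stress, shear_rate):
    """Apparent viscosity (Pa s) as the ratio of shear stress to shear rate."""
    return stress / shear_rate


# Apparent viscosity at a low and a high shear rate for each fluid type.
for label, n in (("pseudoplastic", 0.5), ("Newtonian", 1.0), ("dilatant", 1.5)):
    low = apparent_viscosity(power_law_stress(1.0, 2.0, n), 1.0)
    high = apparent_viscosity(power_law_stress(100.0, 2.0, n), 100.0)
    print(f"{label}: {low:.2f} Pa s at 1 per s, {high:.2f} Pa s at 100 per s")

print(f"Bingham stress at 10 per s = {bingham_stress(10.0, 5.0, 0.1):.1f} Pa")
```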

For Newtonian fluids the viscosity is independent of the shear rate – i.e. it is constant – whereas for non-Newtonian
fluids it is a function of the shear rate. The viscosities of fluids vary over a wide range. Water and many fermentation
broths containing yeasts or bacteria, can be considered to be Newtonian fluids. However, fermentations involving
filamentous fungi or fermentations in which polymers are excreted, will often exhibit non-Newtonian behaviour.
Broth rheology, which influences bioreactor performance, is determined by:

• Biomass concentration
• Cell morphology, including size, shape and mass
• Biomass growth rate
• Flexibility and deformability of cells
• Osmotic pressure of the suspending fluid
• Concentration of polymeric substrate
• Concentration of polymeric product

8.10 Enzyme immobilization


Enzymes are protein molecules which serve to accelerate the biochemical reactions in living cells. Enzymes display
great specificity and are not permanently modified by their participation in reactions. Since they are not changed
during the reactions, it is cost-effective to use them more than once. However, if the enzymes are in solution with
the reactants and/or products, it is difficult to separate them. Therefore, if they can be attached to the reactor in
some way, they can be used again after the products have been removed. The term immobilized means unable to
move or stationary. An immobilized enzyme remains attached to inert, insoluble materials and the restriction of
enzyme mobility in a fixed space is known as enzyme immobilization. Immobilized enzymes are also sometimes
referred to as insolubilized, supported or matrix-linked enzymes.
Enzyme immobilization provides enzyme reutilization, eliminates costly enzyme recovery and purification processes,
and may result in increased activity by providing a more suitable microenvironment for the enzyme. However,
immobilization may also cause enzyme instability, loss of activity and a shift in optimal conditions (pH, ionic strength).


To obtain maximum reaction rates, the particle size of the support material and enzyme loading need to be
optimized, and a support material with the correct surface characteristics must be selected.

Benefits of immobilizing an enzyme


There are several advantages of attaching enzymes to a solid support and some of them are:
• Multiple or repetitive use of a single batch of enzymes.
• The ability to stop the reaction rapidly by removing the enzyme from the reaction solution (or vice versa).
• Product is not contaminated with the enzyme (especially useful in the food and pharmaceutical industries).
• Stability of enzymes may be enhanced over a broader range (pH, temperature).

Various methods for enzyme immobilization may be subdivided into two general classes: entrapment and bound
methods.

Entrapment method

The entrapment method is based on the localization of an enzyme within the micro space. Enzymes can be immobilized
by entrapment in a porous matrix or by encapsulation in a semipermeable membrane capsule or between membranes,
such as in a hollow-fiber unit. In entrapment, the enzymes are not directly attached to the support surface, but
simply trapped inside the polymer matrix. Thus, loss of enzyme activity upon immobilization is minimal.

Entrapment can be matrix entrapment and membrane entrapment.

Matrix entrapment
Matrix entrapment involves entrapping enzymes within the interstitial spaces of a cross-linked water-insoluble
polymer. Matrices used are usually polymeric materials such as calcium alginate, agar and polyacrylamide. All
these gels can be formed by simple and similar procedures. In all the protocols, enzymes are well mixed with
monomers/polymers and cross-linking agents in a solution. The solution is then exposed to polymerization promoters
to start the process of gel formation. The solution is poured into a mold to achieve the desired shape. Among
different gels, polyacrylamide is the most widely used matrix for entrapping enzymes. It has the advantage of
being non-ionic.

Membrane entrapment

Membrane entrapment involves:

Entrapment in porous hollow fibres: In this case, hollow-fibre units are used to entrap an enzyme solution between
thin permeable membranes (such as nylon, cellulose, polysulfone, polyacrylate).
Entrapment in microcapsules (microencapsulation): In this case, the enzyme is immobilized within microcapsules
(ranging from 5 to 300 μm in diameter) prepared from organic polymers, so that the enzyme cannot escape, but
substrates and products can enter and leave the capsule by diffusion through the membrane.

[Figure: enzyme molecules (E) shown matrix entrapped and micro-encapsulated within a membrane.]

Bound method

Bound methods include:


• Covalent bonding of enzyme to a water insoluble matrix.
• Intermolecular cross-linking of enzyme molecules.
• Adsorption of enzyme to a water insoluble matrix.


8.11 Scale up
The term scale up is used for the step from small scale to production scale. Transferring a process successfully
developed at the lab scale to an industrial scale is not a simple procedure. Scale up is the successful start up and
operation of a commercial size unit whose design and operating procedures are in part based on experimentation
and demonstration at a smaller scale. While the scale-up of any bioprocess can involve a host of issues, the
challenges are compounded when the process involves batch fermentation. The phenomena that need to be
taken into account during scale up can be divided into physical processes (transport phenomena) and metabolic
processes (microbial kinetics).
Due to the typical fragility of the engineered microorganisms, large-scale fermentation vessels must be designed
with the ability to:
• Remove the heat buildup that results from metabolic processes;
• Manage agitation and mixing with minimal shear damage;
• Effectively control the highly variable liquid flow rates and turndowns that are associated with batch fermentation;
• Execute safeguards and sterilization techniques to guard against potential contamination.

8.12 Downstream processing


Any industrial fermentation operation can be divided into three main stages, viz, upstream processing, the
fermentation process and downstream processing. Upstream processing includes formulation of the fermentation
medium, sterilization of air, fermentation medium and the fermenter, inoculum preparation and inoculation of the
medium. The fermentation process involves the propagation of the microorganism and production of the desired
product. Downstream processing includes the recovery of the products in a pure state.

Crude raw materials → Upstream process → Processed raw materials → Fermentation process → Crude products → Downstream process → Finished products

Bioprocessing treats raw materials and generates useful products. A problem common to all biological processes,
whether based on fermentation or cell culture technology, is the need to recover the product. Fermentation broths
are complex aqueous mixtures of cells, soluble extracellular products, intracellular products and any
unconverted substrates. The fermentation broth has to be processed and passed through several stages of
separation and purification. Any treatment of the culture broth after fermentation to concentrate and purify the
product is known as downstream processing. Individual operations or steps in the process that alter the
properties of materials are called unit operations. The term usually refers to processes that cause physical modifications
to materials, such as a change of phase or component concentration. Unit operations include centrifugation,
chromatography, crystallization, membrane processes (such as dialysis, ultrafiltration, microfiltration, and reverse
osmosis), distillation, drying, evaporation, mixing, precipitation, solvent extraction etc.
The recovery of product from fermentation broths depends very much on the type of cell and how the bioreactor
is designed and operated. The selection of an appropriate purification step depends on the nature of the end product,
its concentration, the side products present, the stability of the biological materials and the necessary degree of
purification. Although each recovery scheme will be different, the sequence of steps in downstream processing
can be generalized depending on whether the biomass itself is the desired product (e.g. bakers’ yeast), whether
the product is contained within the cells (e.g. enzymes and recombinant proteins), or whether the product
accumulates outside the cells in the fermentation liquor (e.g. ethanol, antibiotics, and monoclonal antibodies).
General schemes for these three types of downstream processing operation are represented in the figure.
The strategy during downstream processing can be generalized into five steps:

1. Cell separation: It involves the separation of cells from fermentation broths. This step involves unit operations
such as filtration, centrifugation and flocculation.

2. Cell disruption and cell debris removal: Where the required products are intracellular (i.e. located inside the
cells), the cells must first be disrupted. Some products may be present in solution within the cytoplasm,
while others may be insoluble and exist as membrane-bound proteins or small
insoluble particles called inclusion bodies. In the latter case, these must be solubilized before further purification.

Unit operations such as high-pressure homogenization and ultrasonication are used to break open the cells and
release their contents for subsequent purification. The cell debris generated during cell disruption is separated
from the product by filtration or centrifugation. A variety of methods are available to disrupt cells. Cell disruption
methods can be classified as either mechanical or non-mechanical. Typical animal cells are fragile and can be
easily ruptured by using a low shearing force or a change in the osmotic pressure. In contrast, bacteria, fungi
such as yeasts, and plant cells have rigid cell walls, the disruption of which requires high shearing forces.
Mechanical methods include grinding with abrasives, high-speed agitation, high-pressure homogenization
and ultrasonication. Non-mechanical methods such as osmotic shock, enzymic digestion of cell walls, and
treatment with solvents and detergents can also be applied. A widely used technique for cell disruption is
high-pressure homogenization. In this process, cell disintegration takes place upon the application of high hydrostatic
pressure followed by immediate pressure release as the cell suspension passes through a valve. Ultrasonication
(on the order of 20 kHz) causes high-frequency pressure fluctuations in the liquid, leading to the repeated
formation and collapse of bubbles. Cell disruption by ultrasonication is used extensively in the laboratory.

3. Product isolation or concentration: It involves removal of those components whose properties differ markedly
from those of the desired product. Solvent extraction, adsorption, ultrafiltration and precipitation are some of the
unit operations involved in product isolation. Liquid extraction, also known as solvent extraction and partitioning,
is a method to separate compounds based on their relative solubilities in two different immiscible liquids,
usually water and an organic solvent. It is an extraction of a substance from one liquid into another liquid phase.

4. Product purification: It is done to separate those contaminants that resemble the product very closely in
physical and chemical properties. Examples of some unit operations used in purification include chromatography
and membrane separation (ultrafiltration and reverse osmosis).

5. Product preparation: It describes the final processing steps which end with the packaging of the product in a
form that is stable, easily transportable and convenient. This step involves unit operations such as drying,
crystallization and freeze-drying.


Malic acid                   Leuconostoc brevis (bacteria)
Penicillins                  Penicillium chrysogenum (fungi)
Cephalosporins               Acremonium chrysogenum (fungi)
Bacitracin                   Bacillus licheniformis (bacteria)
Gramicidin                   Bacillus brevis (bacteria)
B12 (Cyanocobalamin)         Pseudomonas denitrificans (bacteria)
β-Carotene (Provitamin A)    Blakeslea trispora (fungi)
Ascorbic acid (vitamin C)    Acetobacter suboxydans (bacteria)
Alginate                     Azotobacter vinelandii (bacteria)
Cellulose                    Acetobacter xylinum (bacteria)
Dextran                      Leuconostoc mesenteroides (bacteria)
Pullulan                     Aureobasidium pullulans (fungi)
Xanthan                      Xanthomonas campestris (bacteria)

8.14 Wastewater treatment


Wastewater is the end-product or by-product of municipal, agricultural and industrial activity. It contains a variety
of organic and inorganic compounds of anthropogenic and natural origin. The contaminants in wastewater include
biodegradable organic matter, suspended solids, nutrients, heavy metals and pathogens.

Quantification of biodegradable material in wastewater


The biodegradable material in a wastewater sample can be expressed in two ways: biological oxygen demand
(BOD) and chemical oxygen demand (COD). The BOD test estimates the amount of oxygen required by aerobic
microorganisms to oxidize the biodegradable materials in the wastewater over a fixed period of time (normally 5
days), at constant temperature (20°C) in the dark. A wastewater sample is saturated with oxygen and seeded with
an inoculum containing a diverse range of microbes. Its oxygen concentration is measured before and after a 5-day
incubation period and the results are expressed as milligrams of oxygen per litre of waste.
COD determines the amount of oxygen required to chemically oxidize any oxidizable organic material present in a
wastewater. Organic compounds are oxidized by a strong chemical oxidant, and using the reaction stoichiometry,
the organic content is calculated. This test involves the addition of a known volume of sample to a mixture of
oxygen-rich potassium dichromate and concentrated sulphuric acid. Almost all organic compounds present in
wastewater are oxidized by strong chemical oxidants. Therefore, the COD of a wastewater sample usually
exceeds the measured BOD (COD > BOD). The BOD:COD ratios for sewage are normally between 0.2:1 and 0.5:1.
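
The BOD calculation described above can be sketched as follows. The dilution of the sample in the incubation bottle is an assumption added here (the text does not mention it), any oxygen uptake by the seed itself is ignored, and all readings are hypothetical.

```python
def bod5(do_initial, do_final, sample_volume_ml, bottle_volume_ml):
    """Five-day BOD in mg O2 per litre of undiluted waste.

    do_initial, do_final: dissolved oxygen (mg/L) before and after incubation
    sample_volume_ml, bottle_volume_ml: used for the assumed dilution factor;
    pass equal volumes for an undiluted sample.
    """
    dilution_factor = bottle_volume_ml / sample_volume_ml
    return (do_initial - do_final) * dilution_factor


def bod_to_cod_ratio(bod, cod):
    """Ratio of BOD to COD for the same sample."""
    return bod / cod


# Hypothetical readings: dissolved oxygen falls from 8.8 to 4.2 mg/L in a
# 300 mL bottle containing 6 mL of sample; the measured COD is 600 mg/L.
bod = bod5(8.8, 4.2, sample_volume_ml=6.0, bottle_volume_ml=300.0)
ratio = bod_to_cod_ratio(bod, 600.0)
print(f"BOD5 = {bod:.0f} mg/L, BOD:COD = {ratio:.2f}:1")
```

For these numbers the ratio falls inside the 0.2:1 to 0.5:1 range quoted for sewage.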

Treatment processes
Waste materials in wastewater can be classified into three major categories: industrial wastes, domestic wastes
and agricultural wastes. Each of these waste materials has its own characteristics, and thus treatment methods
vary. The waste treatment methods include physical, chemical and biological treatments.

1. Physical treatment
Physical treatment includes screening, flocculation, sedimentation and filtration, which are usually used for the
removal of insoluble materials.

2. Chemical treatment
Chemical treatment includes chemical oxidations and chemical precipitation.

3. Biological treatment
Biological treatment includes the aerobic and anaerobic treatment of wastewater by a mixed culture of
microorganisms.

Bioprocess engineering

The physical, chemical and biological treatments can be further categorized as primary and secondary (or biological)
treatments.
Primary treatment includes physical and chemical processes, such as coagulation and sedimentation, to remove
particles too large to pass through simple screening devices. Primary treatment typically reduces the BOD by 25 to
40 percent and removes about 60 percent of the suspended solids.
In secondary treatment (or biological treatment), the physical and chemical processes that make up primary
treatment are augmented with processes that involve the microbial oxidation of wastes. Such biological treatment
mimics nature by utilizing microorganisms to oxidize the organic compounds. The main purpose of secondary
treatment is to lower BOD.
Any treatment beyond secondary treatment is defined as tertiary treatment. This treatment is sometimes called
the final or advanced treatment and consists of removing the organic load left after secondary treatment.

A typical wastewater treatment process includes the primary and secondary treatment, depending on the degree of
purification. Biological treatment may be achieved aerobically or anaerobically in a number of ways. The most
widely used aerobic processes are trickling filters, activated sludge processes and their modifications. The anaerobic
processes (digestion, filtration and sludge blankets) are used both in the treatment of specific wastewaters and in
sludge conditioning.

Trickling filters
The basic principle of aerobic trickling filters is that a microbial population is allowed to develop as a biofilm on an
inert support material within a biological reactor. Wastewater is continuously sprayed over the surface of the
support material and percolates through the filter bed, where it is biodegraded by the microbial population.
Aeration is achieved by exploiting the difference in temperature between the inside and the outside of the reactor,
resulting in a countercurrent of air. High microbial activity within the reactor causes a rise in temperature, and warm
air rises and allows fresh air to enter at the bottom of the reactor.

Activated sludge
Activated sludge processes include a well-agitated and aerated continuous flow reactor and a settling tank. This is
a two-step process, involving biological treatment and secondary settlement. Biological treatment is performed in
an aeration tank containing a diverse range of microorganisms. To maintain aerobic conditions, air or oxygen is
pumped into the tank and the mixture is kept thoroughly agitated.
Secondary settlement occurs when the treated effluent (mixed liquor) from the aeration tank passes into a secondary
settlement tank. In the settlement tank, solids (mostly bacterial masses) are separated from the liquid by settling.

Sludge treatment
The purpose of the trickling filter and activated sludge processes is to lower the BOD of the wastewater before the treated
liquid is released to a water body. What remains to be disposed of is a mixture of solids and water, called sludge.
Sludge treatment processes can be divided into two categories: aerobic and anaerobic. In aerobic treatment
sludge is brought into contact with the mixed microbial population of aerobic microorganisms and oxygen. During
the process, part of the biodegraded material is converted into carbon dioxide and a portion becomes new biomass.
A major problem associated with aerobic treatment is the disposal of excess biomass produced during the degradation
of waste. The traditional method of sludge processing utilizes anaerobic treatment. That is, it involves bacteria that
thrive in the absence of oxygen. Anaerobic digestion is slower than aerobic digestion, but has the advantage that
only a small percentage of the wastes is converted into new bacterial cells. Most of the organics are converted into
carbon dioxide and methane gas.
The anaerobic digestion process is very complex and can be divided into two phases. In the first phase, complex
organic compounds such as proteins, lipids and carbohydrates are converted into simpler organic materials. The
bacteria that perform this conversion are commonly referred to as acid formers. In the second phase, organic
materials are converted slowly into CO2, CH4 and other stable end products by methane forming bacteria. These
bacteria are very sensitive to temperature, pH, toxins and oxygen. The optimal temperature and pH range for
methanogenic bacteria are 35° to 40°C and pH 7 to 7.8. Methanogenic bacteria used for this purpose are
Methanobacterium, Methanobacillus and Methanococcus.

Oxidation ponds
Oxidation ponds are large, shallow ponds, typically 1-2 m deep. They act as shallow waste-treatment reactors where
raw or partially treated sewage is decomposed by microorganisms. The conditions are similar to those of a eutrophic lake.
The ponds can be designed to maintain aerobic conditions. Oxidation ponds are also used to augment secondary
treatment, in which case they are often called polishing ponds.

Advanced wastewater treatment


Advanced wastewater treatments are designed for the purpose of removing nitrogen and phosphorus. Nitrogen-containing
organic compounds are first oxidized biologically to ammonium ions, which are further oxidized to nitrite
and nitrate by the genera Nitrosomonas and Nitrobacter, respectively. The second phase is anaerobic denitrification,
which releases nitrogen gas. A number of bacteria can act as denitrifiers, such as Pseudomonas, Alcaligenes and Arthrobacter.
Phosphorus in wastewater exists in many forms but all of it ends up as orthophosphate. Removing phosphate is
most often accomplished by adding a coagulant, usually alum or lime. Phosphate removal from wastewater by
biological means involves assimilation or storage.

8.15 Bioremediation
Bioremediation is a process whereby organic wastes are biologically degraded under controlled conditions.
This process involves the use of living organisms, primarily microorganisms, to degrade the environmental
contaminants. In this process, contaminant compounds are transformed by living organisms through reactions that
take place as a part of their metabolic processes. For bioremediation to be effective, microorganisms must
enzymatically attack the contaminants and convert them to harmless products. Hence, it is effective only where
environmental conditions permit microbial growth and activity. Thus, its application involves the manipulation of
environmental parameters to allow microbial growth and degradation to proceed at a faster rate. The control and
optimization of bioremediation processes is a complex phenomenon. Various factors influencing this process include:
the existence of a microbial population capable of degrading the pollutants; the availability of contaminants to the
microbial population; and the environmental factors (type of soil, temperature, pH, the presence of oxygen or other
electron acceptors, and nutrients).

Bioremediation strategies
Bioremediation strategies can be in-situ or ex-situ. In-situ bioremediation involves treating the contaminated material
at the site while ex-situ bioremediation involves the removal of the contaminated material to be treated elsewhere.
In-situ bioremediation techniques are generally the most desirable options due to lower cost and less disturbance
since they provide the treatment at a site avoiding excavation and transport of contaminants. Ex-situ bioremediation
requires transport of the contaminated water or excavation of contaminated soil prior to remediation treatments.
In-situ and ex-situ bioremediation strategies involve different technologies such as bioventing, biosparging, bioreactor,
composting, landfarming, bioaugmentation and biostimulation.

Bioventing is an in-situ bioremediation technology that uses microorganisms to biodegrade organic constituents
adsorbed on soils in the unsaturated zone (extends from the top of the ground surface to the water table). Bioventing
enhances the activity of indigenous bacteria and stimulates the natural in-situ biodegradation of contaminated
materials in soil by inducing air or oxygen flow into the unsaturated zone and, if necessary, by adding nutrients.

Biosparging is also an in-situ bioremediation technology that uses indigenous microorganisms to biodegrade
organic constituents in the saturated zone. In biosparging, air (or oxygen) and nutrients (if needed) are injected
into the saturated zone to increase the biological activity of the indigenous microorganisms.

Chapter 09
Bioinformatics

9.1 Introduction
Bioinformatics is a discipline at the intersection of biology, computer science, information technology and mathematics.
There are a number of definitions put forth for bioinformatics. The most widely accepted definition is ‘a
subject of genetic data collection, analysis and dissemination to the research community’. Bioinformatics aims at
integrating and analyzing a wealth of biological data in order to identify genes and their products and assign a function to each. It
is applied, for example, in the construction of genetic and physical maps of genomes, gene discovery, the inference
of the molecular function and three-dimensional structure of their products, the interpretation of the effect of gene
variations on the phenotype, the reconstruction of interaction and signal transduction pathways and the simulation
of biological systems.

9.2 Biological databases


Bioinformatics is about exploring biological information. This information is kept safely in databases. A database
consists of an organized collection of persistent data that provides a standardized way for locating, adding, and
changing data. Biological data are available in the form of sequences and structures of proteins and nucleic acids.
The biological information of nucleic acids is available as sequences while the data of proteins is available as
sequences and structures. Sequences are represented in a single dimension whereas the structure contains the
three-dimensional data of sequences.
The first database was created after the insulin protein sequence was made available in 1956. Insulin (51 residues)
was the first protein to be sequenced. Later, the three-dimensional structures of proteins were studied and the
well-known Protein Data Bank was developed as the first protein structure database.

Database classification
Biological databases can be classified into sequence and structure databases or primary and secondary databases.
Primary and secondary databases are classified on the basis of source of data.

Primary databases
Databases consisting of data derived experimentally such as nucleotide sequences and three-dimensional structures
are known as primary databases. Examples of these include GenBank, EMBL and DDBJ for nucleotide sequences
and the Protein Data Bank (PDB) for 3D-protein structures. GenBank, EMBL and DDBJ are three major public
sequence databases that store raw nucleic acid sequence data produced and submitted by researchers worldwide;
all are freely available on the Internet.

Secondary databases
A secondary database is derived from the analysis or treatment of a primary database. A secondary sequence
database contains information such as the conserved sequences, signature sequences and active site residues of
protein families, arrived at by multiple sequence alignment of a set of related proteins.

Table 9.1 Examples of primary and secondary databases


Sequence databases

Primary databases
EMBL (the European Molecular Biology Laboratory)
GenBank
DDBJ (DNA Data Bank of Japan)
Swiss-Prot
PIR (Protein Information Resource)
UniProt

Secondary databases
TrEMBL (Translation of EMBL nucleotide sequence database)
Pfam
Prosite

Structure databases

Primary databases
PDB (Protein Data Bank)

Secondary databases
SCOP (Structural Classification of Proteins)
CATH (Class, Architecture, Topology and Homologous superfamily)
NDB (Nucleic Acid Database)
DSSP (Database of Secondary Structure Assignments)
HSSP (Homology-derived Secondary Structure of Proteins)

Swiss-Prot
Swiss-Prot (developed at the Swiss Institute of Bioinformatics) is a protein sequence database which strives to
provide a high level of annotations (such as the description of the function of a protein, its domains structure, post-
translational modifications, variants etc.), a minimal level of redundancy and high level of integration with other
databases. Two related databases closely associated with Swiss-Prot are the ENZYME DB and PROSITE (a set of
motifs pattern database). The ENZYME DB stores the following information about enzymes:
• EC Number
• Recommended name
• Alternative names, if any
• Catalytic activity
• Cofactors, if any
• Pointers to Swiss-Prot and other data banks
• Pointers to disease associated with enzyme deficiency if any known

Protein Information Resource (PIR)


PIR was established in 1984 by the National Biomedical Research Foundation (NBRF) as a resource to assist
researchers in the identification and interpretation of protein sequence information.

PIR maintains several databases about proteins:
• PIR-PSD: the main protein sequence database.
• iProClass: classification of proteins according to structure and function.
• ASDB: annotation and similarity database.

Genome databases
Genome sequences form entries in the standard nucleic acid sequence databases. Many species, such as Arabidopsis
thaliana, C. elegans and rice, have special databases that bring together the genome sequence and its annotation
with other data related to the species.

• Microbial genome database
http://www.ncbi.nlm.nih.gov:80/PMGGifs/Genomes/micr.html
• TIGR: The comprehensive Microbial Resource
http://www.tigr.org/tigr-scripts/CMR2:CMRHomepage.spl
• Arabidopsis thaliana genome displayer
http://www.kazusa.or.jp/kaos
• Caenorhabditis elegans (worm) database
http://www.wormbase.org/
• EBI genomes
http://www.ebi.ac.uk/genomes/

NCBI

The National Center for Biotechnology Information (NCBI) was established on November 4, 1988 at the National
Institutes of Health (NIH) with an objective to develop new information technologies to help in the understanding of
fundamental molecular and genetic processes that control health and disease. The NCBI has been charged with
creating automated systems for storing and analyzing knowledge about molecular biology, biochemistry, and genetics;
facilitating the use of such databases and software by the research and medical community; coordinating efforts to
gather biotechnology information both nationally and internationally; and performing research into advanced methods
of computer-based information processing for analyzing the structure and function of biologically important molecules.

At NCBI, databases are linked through a unique search and retrieval system, called Entrez. Entrez allows a user
to not only access and retrieve specific information from a single database but also access integrated information
from many NCBI databases. It is a gateway that allows text-based searches for a wide variety of data, including
annotated genetic sequence information, structural information, as well as citations and abstracts, full papers, and
taxonomic data.
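Text-based Entrez searches can also be issued programmatically through NCBI's E-utilities web interface. The sketch below only builds the search URL and makes no network request; the helper name esearch_url and the example query are assumptions for illustration, and the current E-utilities documentation should be checked for usage limits and the full parameter list.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(database, term, retmax=20):
    """Build an ESearch URL for a text query against one Entrez database.

    database : Entrez database name, e.g. "protein", "nucleotide", "pubmed"
    term     : free-text query
    retmax   : maximum number of record identifiers to return
    """
    query = urlencode({"db": database, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


# Hypothetical query: up to five protein records matching "insulin".
print(esearch_url("protein", "insulin", retmax=5))
```

Fetching the resulting URL returns a list of record identifiers, which can then be passed to other E-utilities calls to retrieve the records themselves.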

Specialized databases
Specialized databases normally serve a specific research community or focus on a particular organism. Many
individuals or groups select, annotate, and recombine data focused on particular topics, and include links affording
streamlined access to information about subjects of interest. The protein kinase resource is a specialized
compilation that includes sequences, structures and functional information, laboratory procedures, a list of interested
scientists, tools for analysis, a bulletin board and links. The HIV protease database stores structures of HIV1
proteinases, HIV2 proteinases and SIV proteinases, and their complexes, and provides tools for their analysis and
other links.

9.3 Sequence formats


Protein and nucleic acid sequences can be stored in computer files. Once in the computer, the sequences can
be analyzed by a variety of methods. Most sequence analysis programs require that the information in a sequence
file be stored in a particular format. Format refers to the arrangement of data within a file that
permits the data to be read or written by a particular application. In other words, it is an organization of data
in a particular order. Some of the commonly used sequence formats are discussed below:

NBRF/PIR sequence format


The NBRF (National Biomedical Research Foundation) format has the following features. The first line begins with a
“>” character followed by a two-character code: a letter such as P for a complete sequence or F for a fragment, then
a 1 or 2 to indicate the type of sequence. This is followed by a semicolon and a four- to six-character unique name for the entry.
There is also an essential second line with the full name of the sequence, a hyphen, then the species of origin. The
sequence terminates with an asterisk.

>P1;CRAB_ANAPL
ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN).
MDITIHNPLIRRPLFSWLAPSRIFDQIFGEHLQESELLPASPSLSPFLMRSPIFRMPSWL
ETGLSEMRLEKDKFSVNLDVKHFSPEELKVKVLGDMVEIHGKHEERQDEHGFIAREFN
RKYRIPADVDPLTITSSLSLDGVLTVSAPRKQSDVPERSIPITREEKPAIAGAQRK*

Figure 9.4 NBRF/PIR sequence entry format.
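The layout described above (header line, description line, sequence ending in an asterisk) can be read with a few lines of code. The following is a minimal sketch for a single entry; the function name parse_nbrf and the returned dictionary keys are assumptions for illustration, and real files may contain several entries and further variations.

```python
def parse_nbrf(text):
    """Parse one NBRF/PIR entry into its parts.

    Returns a dict with the two-character type code, the entry name,
    the description line and the sequence without the trailing '*'.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) < 3:
        raise ValueError("an entry needs a header, a description and a sequence")
    header, description = lines[0], lines[1]
    if not header.startswith(">") or ";" not in header:
        raise ValueError("header must look like '>P1;NAME'")
    code, name = header[1:].split(";", 1)
    sequence = "".join(lines[2:])          # sequence may span several lines
    if not sequence.endswith("*"):
        raise ValueError("sequence must terminate with an asterisk")
    return {"code": code, "name": name,
            "description": description, "sequence": sequence[:-1]}


ENTRY = """>P1;CRAB_ANAPL
ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN).
MDITIHNPLIRRPLFSWLAPSRIFDQIFGEHLQESELLPASPSLSPFLMRSPIFRMPSWL
ETGLSEMRLEKDKFSVNLDVKHFSPEELKVKVLGDMVEIHGKHEERQDEHGFIAREFN
RKYRIPADVDPLTITSSLSLDGVLTVSAPRKQSDVPERSIPITREEKPAIAGAQRK*
"""

record = parse_nbrf(ENTRY)
print(record["code"], record["name"], len(record["sequence"]))
```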

9.4 Biosequence analysis


The determination of the linear sequence of amino acids in proteins and of nucleotides in DNA and RNA creates
the need to compile and analyze sequence data. Sequence analysis is the process of investigating the
information content of linear raw nucleic acid and protein sequence data.

Amino acid sequence analysis


Apart from maintaining the large database, mining useful information from these sets of primary and secondary
databases is very important. Linear chains of amino acids, in proteins, the product of gene translation, are normally
found in cells folded into functionally active structures. It is established that the primary sequence of the protein,
that is, its amino acid sequence, determines the ultimate conformation of the protein and therefore its biological
function. However, the flexibility of long-chain polypeptides can generate an almost infinite number of shapes, and
the computational task of predicting correct structures is beyond the reach of current knowledge. Predicting the
shape of a protein from its linear amino acid sequence is one of the important goals of computational biology.
A lot of efficient algorithms have been developed for data mining and knowledge discovery. These are computation
intensive and need fast and parallel computing facilities for handling multiple queries simultaneously. It is these
search tools that connect the user with the databases. One of the most widely used search programs is BLAST (Basic
Local Alignment Search Tool).

Nucleic acid sequence analysis


Nucleic acid sequence analysis includes assembling partially overlapping fragments, analyzing sequences, comparing
sequences and detecting functional (RNA coding) regions. The bulk of genomic DNA does not code for proteins, and
the protein-coding regions of human genes are not collinear but arranged with exons interspersed with introns.
Therefore, an important question for computational biology is how to detect protein-coding regions within genomic
DNA.
Current DNA sequencing technologies are not capable of generating a complete sequence of long nucleic acid
molecules in a single sequencing run and so it is necessary to utilize computational methods to assemble contiguous
sequences from individual short-sequence determinations. If a large DNA molecule is randomly broken into smaller
pieces for the actual sequence determinations then a contiguous linear sequence can be reconstructed by aligning
the overlapping portions from different random fragments.
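The reconstruction of a contiguous sequence from overlapping fragments can be sketched with a simple greedy strategy: repeatedly merge the two fragments that share the longest suffix-prefix overlap. This is an illustrative toy, not the method used by production assemblers; the function names, the min_len threshold and the example fragments are assumptions, and sequencing errors, repeats and reverse complements are ignored.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Merge fragments pairwise by largest overlap until none remains."""
    frags = list(fragments)
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break                      # no overlaps left; contigs stay separate
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]
    return frags


# Three hypothetical reads taken from the sequence ATGCGTACCTGA.
print(greedy_assemble(["CGTACC", "ATGCGT", "ACCTGA"]))
```

The result is a list of contigs: a single element when all fragments could be joined, several when some fragments share no overlap of at least min_len bases.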
A common question arising when new genes are cloned and sequenced is whether the sequence is already known
or does not occur in current databases. Answering this question requires comparing the newly obtained sequence
to every sequence in the database.

9.5 Sequence alignment


Sequence alignment refers to the procedure of comparing two or more sequences of nucleic acid or protein by
looking for a series of individual characters or character patterns that are in the same order in the sequences. It is
used to identify regions of similarity that may be a consequence of functional, structural or evolutionary relationships
between the sequences.

Global alignments and local alignments


Computational approaches to sequence alignment generally fall into two categories: global alignments and local
alignments.
In global alignment, two sequences to be aligned are assumed to be generally similar over their entire length.
Alignment is carried out from beginning to end of two sequences to find the best possible alignment across the
entire length between the two sequences. It attempts to align every residue in every sequence. Sequences that are
quite similar and approximately of the same length are suitable candidates for global alignment. A general global
alignment technique is the Needleman-Wunsch algorithm, which is based on dynamic programming.
Local alignment, on the other hand, does not assume that the two sequences in question have similarity over the
entire length. It only finds local regions with the highest level of similarity between the two sequences and aligns
these regions without regard for the alignment of the rest of the sequence regions. Local alignments are more
useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within
their larger sequence context. The Smith-Waterman algorithm is a general local alignment algorithm, also based on
dynamic programming. With sufficiently similar sequences, there is no difference between local and global alignments.
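The difference between the two approaches can be seen in a compact dynamic programming sketch that computes only the optimal alignment score. With local=False it fills the table in the Needleman-Wunsch manner; with local=True it applies the Smith-Waterman modification, in which cell values never fall below zero and the best cell anywhere in the table is reported. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration; real applications use substitution matrices and more elaborate gap penalties.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Optimal global (default) or local alignment score of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(1, rows):
            table[i][0] = i * gap
        for j in range(1, cols):
            table[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = table[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diagonal,
                        table[i - 1][j] + gap,    # gap in b
                        table[i][j - 1] + gap)    # gap in a
            if local:
                score = max(score, 0)             # restart instead of going negative
            table[i][j] = score
            best = max(best, score)
    return best if local else table[-1][-1]


print(alignment_score("ACGT", "AGT"))                       # global
print(alignment_score("TTACGTTT", "GGACGGG", local=True))   # local
```

In the second call the two sequences share only the short region ACG, which a local alignment picks out while ignoring the dissimilar flanks.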

Pairwise and multiple sequence alignments

Pairwise sequence alignment

Pairwise alignment compares two sequences at a time. It is used to identify regions of similarity that
may indicate functional, structural and/or evolutionary relationships between two biological sequences (protein or
nucleic acid). It involves matching of homologous positions in two sequences. Positions with no homologous pair
are matched with a space ‘–’ and a group of consecutive spaces is a gap.

C A – – G A T T C G A A T
C G C C G A T T – – – A T
                |___|
                 gap

The three primary methods of producing pairwise alignments are dot-matrix methods, dynamic programming
and word methods. Although each method has its individual strengths and weaknesses, all three pairwise methods
have difficulty with highly repetitive sequences.

Dot matrix analysis


The dot matrix (alternatively, a dotplot or diagonal plot) is a simple picture that gives an overview of pairwise
sequence similarity. It is a graphical way of comparing two sequences in a two-dimensional matrix. Each dot represents
a match between residues of the two sequences. The rows correspond to the residues of one sequence and
the columns to the residues of the other. In its simplest form, the positions in the dotplot are left blank if the
residues are different, and filled if they match. The dot plots of very closely related sequences will appear as a
single line along the matrix’s main diagonal.
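A dot matrix in its simplest form is easy to construct. The sketch below marks a cell when the row residue equals the column residue; the function names are assumptions, and unmatched cells are printed as "." rather than left blank so that the grid stays visible in plain text. Practical dot-plot programs usually compare short windows rather than single residues to reduce noise.

```python
def dot_matrix(row_seq, col_seq):
    """Boolean matrix: True where the row residue equals the column residue."""
    return [[r == c for c in col_seq] for r in row_seq]


def render_dot_matrix(row_seq, col_seq):
    """Plain-text picture of the dot matrix ('*' = match, '.' = no match)."""
    lines = ["  " + " ".join(col_seq)]
    for residue, row in zip(row_seq, dot_matrix(row_seq, col_seq)):
        lines.append(residue + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


# Two identical sequences give an unbroken main diagonal.
print(render_dot_matrix("GATTACA", "GATTACA"))
```

Off-diagonal marks in the output arise wherever the same residue occurs at different positions, which is why repeated residues or repeated segments show up as additional dots or parallel diagonals.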

Dynamic programming
The idea behind dynamic programming is that, to solve a given problem, we solve different parts of the problem
(subproblems), then combine the solutions of the subproblems to reach an overall solution. Often, many of these
subproblems are really the same. The dynamic programming approach seeks to solve each subproblem only once,

9.6 Molecular phylogenetics


Phylogenetics is the science of estimating and analyzing evolutionary relatedness among various groups of organisms.
Molecular phylogenetics is the use of the structure of molecules to gain information on an organism’s evolutionary
relationships. Evolution, at the molecular level, is observable as nucleotide changes in the nucleic acids and amino
acid changes in proteins. Nucleic acids (DNA and RNA) and proteins are ‘information molecules’ in that they retain
information of an organism’s evolutionary history. The approach is to compare nucleic acid or protein sequences
from different organisms using computer programmes and estimate the evolutionary relationships based on the
degree of similarity between the sequences. Nucleic acids and proteins are linear molecules made of smaller units
called nucleotides and amino acids, respectively. The nucleotide differences within a gene or amino acid differences
within a protein reflect the evolutionary distance between two organisms. In other words, closely related organisms
will exhibit fewer sequence differences than distantly related organisms. Thus, the field of molecular phylogenetics
can be defined as the study of evolutionary relationships of genes or proteins by analyzing mutations at various
positions in their sequences and developing hypotheses about the evolutionary relatedness of the biomolecules.
Based on the sequence similarity of the molecules, evolutionary relationships between the organisms can often be
inferred. Phylogenetic studies construct a tree-like pattern that describes the evolutionary relationships between
the organisms being studied.

Phylogenetic trees
In phylogenetic studies, the most convenient way of visually presenting evolutionary relationships among a group
of organisms is through illustrations called phylogenetic trees. A phylogenetic tree is a two-dimensional graph showing
the evolutionary relationships between organisms, or between genes from various organisms. It is represented by branches and
nodes. Nodes can be internal or external. Each internal node represents the last common ancestor of the
lineages that descend from it. External nodes (also termed terminal nodes, leaves or Operational Taxonomic Units) represent the tips of
the tree, i.e. the extant taxonomic units under consideration. Nodes correspond to species, organisms or sequences.
Similarly, branches can be internal or external. Internal branches or internodes connect two nodes, whereas
external branches connect a tip and a node.
The branching pattern in a tree is called tree topology. When all branches bifurcate on a phylogenetic tree, it is
referred to as dichotomy. In this case, each ancestor divides and gives rise to two descendants. Sometimes, a
branch point on a phylogenetic tree may have more than two descendants, resulting in a multifurcating node. The
phylogeny with multifurcating branches is called polytomy.
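The terms above map naturally onto a small data structure. In the sketch below a tree is written as nested tuples: a string is an external node (leaf) and a tuple is an internal node whose elements are its descendants. This representation and the function names are assumptions made for illustration; phylogenetics software normally uses richer formats that also store branch lengths.

```python
def leaves(tree):
    """List the external nodes (tips) of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def count_internal_nodes(tree):
    """Number of internal nodes, i.e. inferred common ancestors."""
    if isinstance(tree, str):
        return 0
    return 1 + sum(count_internal_nodes(child) for child in tree)


def is_dichotomous(tree):
    """True if every internal node has exactly two descendants."""
    if isinstance(tree, str):
        return True
    return len(tree) == 2 and all(is_dichotomous(child) for child in tree)


bifurcating = ((("A", "B"), "C"), "D")      # every ancestor splits in two
polytomy = (("A", "B", "C"), "D")           # one node with three descendants

print(leaves(bifurcating), count_internal_nodes(bifurcating), is_dichotomous(bifurcating))
print(leaves(polytomy), count_internal_nodes(polytomy), is_dichotomous(polytomy))
```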

Figure 9.6 A phylogenetic tree. (Labels in the figure: external nodes A, B, C and D; branch; internal node.)

A phylogenetic tree may be rooted or unrooted. A rooted tree implies the existence of a common ancestor from
which all the other species originate and indicates the direction of the evolutionary process. A rooted tree in which
every node has two descendants is called a binary tree. An unrooted tree does not provide any information about
their common ancestor and shows only the evolutionary relationships between the organisms. Most phylogenetic
trees are rooted.

This analysis is continued for every position in the sequence alignment. Finally, those trees that produce the
smallest number of changes overall for all sequence positions are identified. This method is best suited for sequences
that are quite similar and is limited to small numbers of sequences.
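Counting the changes a tree requires at each alignment position, and summing over positions, can be sketched with Fitch's algorithm, one standard way of computing this minimum for a fixed rooted binary tree. The nested-tuple tree representation, the function names and the toy alignment are assumptions for illustration; the text does not state which counting procedure it has in mind.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one column.

    tree   : nested 2-tuples with leaf names as strings
    states : dict mapping each leaf name to its residue in this column
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:                                   # no change needed here
        return common, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Total minimum number of changes over all alignment positions."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[pos] for name, seq in alignment.items()})[1]
        for pos in range(length)
    )


# Hypothetical aligned sequences for four taxa.
alignment = {"A": "AAG", "B": "AAG", "C": "ACG", "D": "TCG"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_1, alignment), parsimony_score(tree_2, alignment))
```

The tree with the smaller total is preferred under the parsimony criterion; here tree_1, which groups A with B and C with D, needs fewer changes than tree_2.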
Maximum likelihood approach: This method uses probability calculations to find a tree that best accounts for the
variation in a set of sequences. All possible trees are considered. Hence, the method is only feasible for a small
number of sequences. For each tree, the number of sequence changes or mutations that may have occurred to give the
observed sequence variation is considered. Because the rate of appearance of new mutations is very small, the more
mutations needed to fit a tree to the data, the less likely the tree.
The maximum likelihood method presents an additional opportunity to evaluate trees with variations in mutation
rates in different lineages, and to use explicit evolutionary models such as the Jukes-Cantor and Kimura models.
The method can be used to explore relationships among more diverse sequences and conditions that are not well
handled by maximum parsimony methods.
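As an illustration of what an explicit evolutionary model provides, the Jukes-Cantor model assumes that all nucleotide substitutions occur at the same rate and corrects the observed proportion of differing sites p for multiple substitutions at the same site: d = -(3/4) ln(1 - 4p/3). The sketch below applies this correction to a pair of aligned sequences; the function names and example sequences are assumptions, and the code shows only the distance correction, not a full maximum likelihood tree search.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def jukes_cantor_distance(p):
    """Expected substitutions per site under the Jukes-Cantor model."""
    if not 0 <= p < 0.75:
        raise ValueError("the correction is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical aligned sequences differing at one site in ten.
p = p_distance("ACGTACGTAC", "ACGTACGTAT")
print(p, round(jukes_cantor_distance(p), 4))
```

The corrected distance is always at least as large as p, and the gap widens as sequences diverge, reflecting substitutions that are hidden by later changes at the same site.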

9.7 Protein structure prediction


Genome sequencing projects are producing linear amino acid sequences, but full understanding of the biological
role of these proteins will require knowledge of their structure and function. One of the major goals of bioinformatics
is to understand the relationship between amino acid sequence and the three-dimensional structure of proteins. If
these relationships are known then the structure of a protein could be reliably predicted from the amino acid
sequence. Although experimental structure determination methods are providing high-resolution structure information
about a subset of the proteins, computational structure prediction methods will provide valuable information for the
large fraction of sequences whose structures will not be determined experimentally.

Methods for prediction of protein structure from amino acid sequence include:
• Secondary structure prediction: predicting secondary structure without attempting to assemble these regions in three dimensions.
• Homology modeling: prediction of the three-dimensional structure of a protein from the known structures of one
or more related proteins.
• Fold recognition: given a library of known structures, determining which of them shares a folding pattern with a
query protein of known sequence but unknown structure.
• Prediction of novel folds, either by a priori or knowledge-based methods.

Secondary structure prediction


Secondary structure prediction is a set of techniques that aim to predict the local secondary structures of proteins
based only on knowledge of their primary structure - amino acid sequence. For proteins, a prediction consists of
assigning regions of the amino acid sequence as likely α-helices, β-strands, or turns. The prediction is based on the
fact that secondary structures have a regular arrangement of amino acids, stabilized by hydrogen bonding patterns.
The secondary structure prediction methods can be either ab initio based, which make use of single sequence
information only, or homology based, which make use of multiple sequence alignment information. The ab initio
methods predict secondary structures based on statistical calculations of the residues of a single query sequence.
The homology-based methods do not rely on statistics of residues of a single sequence, but on common secondary
structural patterns conserved among multiple homologous sequences.

Ab initio based methods


This type of method predicts the secondary structure based on a single query sequence. It measures the relative
propensity of each amino acid belonging to a certain secondary structure element. The propensity scores are
derived from known crystal structures. Examples of ab initio prediction are the Chou-Fasman and Garnier, Osguthorpe,
Robson (GOR) methods.

The Chou-Fasman method for secondary structure prediction is one of the oldest and simplest methods. The basic
idea is that each amino acid residue is assigned three numbers that describe its propensity to be part of α-helices,
β-sheets and turns, respectively. A large number (above 100) corresponds to a propensity for that kind of structure.
These parameters may be determined from the occurrence of different amino acids in different types of secondary
structure in known protein structures.
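The propensity idea can be sketched in a few lines. The code below is a deliberately simplified illustration, not the full Chou-Fasman procedure (which uses nucleation and extension rules): it averages the three propensities over a short window and assigns each residue the structure with the highest average. The propensity values are approximate figures as commonly quoted on the ×100 scale for a handful of residues and should be treated as illustrative; the window size, state letters and function name are assumptions.

```python
# (helix, sheet, turn) propensities, x100 scale. Approximate, illustrative
# values for seven residues only; consult a published table for real use.
PROPENSITY = {
    "A": (142, 83, 66),
    "E": (151, 37, 74),
    "L": (121, 130, 59),
    "V": (106, 170, 50),
    "I": (108, 160, 47),
    "G": (57, 75, 156),
    "P": (57, 55, 152),
}
STATES = "HET"   # H = alpha-helix, E = beta-strand, T = turn


def predict_secondary_structure(sequence, window=3):
    """Assign H, E or T to each residue from window-averaged propensities."""
    half = window // 2
    prediction = []
    for i in range(len(sequence)):
        segment = sequence[max(0, i - half): i + half + 1]
        averages = [
            sum(PROPENSITY[res][k] for res in segment) / len(segment)
            for k in range(3)
        ]
        prediction.append(STATES[averages.index(max(averages))])
    return "".join(prediction)


print(predict_secondary_structure("AEAEA"))
print(predict_secondary_structure("VIVIV"))
print(predict_secondary_structure("GPGPG"))
```

Residues missing from the small table raise a KeyError, which is intentional in this sketch: it avoids silently guessing a propensity.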
The GOR method is based on the assumption that amino acids flanking the central amino acid residue influence the
secondary structure that the central residue is likely to adopt. This method uses principles of information theory to
derive predictions. This method takes into account not only the probability of each amino acid having a particular
secondary structure, but also the conditional probability of the amino acid assuming each structure given the
contributions of its neighbors (it does not assume that the neighbors have that same structure). The approach is
both more sensitive and more accurate than that of Chou and Fasman because amino acid structural propensities
are only strong for a small number of amino acids such as proline and glycine.
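The window idea can be sketched as follows. This is a simplified, hedged illustration rather than the published GOR procedure: it assumes a window of 8 residues on each side of the central position, estimates an information value log(P(state | residue at offset m) / P(state)) for every offset, and sums these values independently across the window. The pseudocount, the function names and the toy training data are all assumptions.

```python
import math
from collections import Counter, defaultdict

WINDOW = 8  # assumed number of flanking residues considered on each side

def train(sequences, structures, pseudo=1.0):
    """Estimate log(P(state | residue at offset m) / P(state)) from labelled data."""
    counts = defaultdict(Counter)   # (offset, residue) -> state counts
    state_counts = Counter()        # overall state counts
    for seq, ss in zip(sequences, structures):
        for j, s in enumerate(ss):
            state_counts[s] += 1
            for m in range(-WINDOW, WINDOW + 1):
                if 0 <= j + m < len(seq):
                    counts[(m, seq[j + m])][s] += 1
    states = sorted(state_counts)   # only states seen in training
    n = sum(state_counts.values())
    k = len(states)
    info = {}
    for key, by_state in counts.items():
        denom = sum(by_state.values()) + pseudo * k
        for s in states:
            p_cond = (by_state[s] + pseudo) / denom
            p_prior = (state_counts[s] + pseudo) / (n + pseudo * k)
            info[key + (s,)] = math.log(p_cond / p_prior)
    return info, states

def predict(seq, info, states):
    """Assign each position the state with the highest summed information."""
    out = []
    for j in range(len(seq)):
        scores = {s: 0.0 for s in states}
        for m in range(-WINDOW, WINDOW + 1):
            if 0 <= j + m < len(seq):
                for s in states:
                    # Unseen (offset, residue) pairs contribute nothing.
                    scores[s] += info.get((m, seq[j + m], s), 0.0)
        out.append(max(states, key=scores.get))
    return "".join(out)

# Hypothetical toy training set: an all-helix and an all-strand peptide.
info, states = train(["AAAAAAAA", "VVVVVVVV"], ["HHHHHHHH", "EEEEEEEE"])
```

Because the score for a position depends on every residue in the window, a residue with weak intrinsic preference can still be assigned confidently when its neighbors carry strong signals.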

Homology-based methods
Homology-based methods combine the ab initio secondary structure prediction of individual sequences and alignment
information from multiple homologous sequences (> 35% identity). The idea behind this approach is that close
protein homologs should adopt the same secondary and tertiary structure.
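A minimal sketch of this combination is given below, assuming that the sequences have already been aligned and that each one has its own per-residue prediction written in alignment coordinates. The first function computes percent identity over ungapped columns (one of several definitions in use), and the second takes a simple majority vote per column; real programs weight the votes in more elaborate ways. All names and example strings are hypothetical.

```python
from collections import Counter

def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        matches += (x == y)
    return 100.0 * matches / compared if compared else 0.0

def consensus(aligned_predictions, gap="-"):
    """Majority vote per alignment column, ignoring gap positions."""
    length = len(aligned_predictions[0])
    if any(len(p) != length for p in aligned_predictions):
        raise ValueError("aligned predictions must have equal length")
    out = []
    for column in zip(*aligned_predictions):
        votes = Counter(s for s in column if s != gap)
        out.append(votes.most_common(1)[0][0] if votes else gap)
    return "".join(out)

# Hypothetical aligned pair and per-sequence predictions (H, E, C).
identity = percent_identity("ACDE-G", "ACNEFG")
combined = consensus(["HHHCC", "HHHEC", "HH-EC"])
```

In practice, `percent_identity` could be used to decide which homologs pass the identity threshold before their predictions are passed to `consensus`.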

Table 9.2 Selected programs for performing protein secondary structure prediction

Program                          Method
BSC                              Linear discrimination
NNPRED                           Neural networks enhanced to detect sequence periodicity
Protein sequence analysis (PSA)  Discrete space models (hidden Markov models) for patterns of alpha helices, beta strands, tight turns, and loops in specific structural classes
PREDATOR                         Analysis of long and short-range amino acid interactions and alignments of sequence pairs
PredictProtein server            Neural networks applied to multiple sequence alignments
PSSP                             Nearest neighbor enhanced by non-intersecting local and multiple sequence alignments
SOPM, SOPMA                      Nearest-neighbor method
SSP                              Linear discriminant analysis based on amino acid composition of local and adjacent regions

Homology modeling
Homology means having a common evolutionary origin, but does not necessarily mean similarity. It is a qualitative
description of the nature of the relationship between two or more things, and it cannot be partial. Either there is an
evolutionary relationship or there is not.
A major goal of structural biology is to predict the three-dimensional structure of proteins from the sequence of
amino acids. Techniques such as X-ray diffraction and NMR are used to determine the three-dimensional structure
of proteins. However, many proteins are too large for NMR analysis or cannot be crystallized for X-ray diffraction,
so alternative strategies are applied to develop models of protein structure. One method that can be
applied to generate reasonable models of protein structures is homology modeling. It is based on the reasonable
assumption that two homologous proteins will share very similar structures. This procedure, also termed comparative
modeling or knowledge-based modeling, develops a three-dimensional model from a protein sequence based on
the structures of homologous proteins. It is based on two major observations:
1. The structure of a protein is uniquely determined by its amino acid sequence.
2. During evolution, the structure is more stable and changes much more slowly than the associated sequence, so that
similar sequences adopt practically identical structures, and distantly related sequences still fold into similar structures.


9.9 Genomics and proteomics


9.9.1 Genomics
The root word genome is universally defined as the sum total of all genetic material present in a cell or an
organism. The term genomics therefore refers to the study of an organism’s genome, although this definition is too
simplistic. Genomics is concerned with the structure, function, fine-scale mapping and sequencing of the genome.
The field of genomics includes:

• Structural genomics – It refers to the initial phase of genome analysis, which includes construction of genetic
and physical maps of a genome, identification of genes, annotation of gene features and comparison of genome
structures.

• Functional genomics – It focuses on the dynamic aspects such as gene transcription, translation and protein–
protein interactions, as opposed to the static aspects of the genomic information such as DNA sequence or
structures. Functional genomics attempts to determine not only the function of the identified genes but also
the organization and control of the genetic pathways that come together to make up the physiology of an organism.

• Comparative genomics – It focuses on the analysis and comparison of genomes from different species. It
includes comparison of gene number, gene location, and gene content from these genomes. The comparison
helps to reveal the extent of conservation among genomes, which will provide insights into the mechanism of
genome evolution and gene transfer among genomes. It helps to identify genes that are conserved among species, as well as genes that give each organism
its unique characteristics. In addition, the evolutionary perspective may prove extremely helpful in understanding
disease susceptibility. For example, chimpanzees do not suffer from some of the diseases that occur in humans,
such as malaria and AIDS, even though chimpanzees’ DNA sequence is 98.8 percent identical to ours. A
comparison of the sequence of genes involved in disease susceptibility may reveal the reasons for this species
barrier, thereby suggesting new pathways for prevention of human disease. Similarly, comparative analysis of
the fruit fly genome with the human genome revealed that about 60 percent of genes are conserved between
fly and human. Researchers have found that two-thirds of human genes known to be involved in cancer have
counterparts in the fruit fly. Even more surprisingly, when scientists inserted a human gene associated with
early-onset Parkinson’s disease into fruit flies, the flies displayed symptoms similar to those seen in humans with
the disorder, raising the possibility that the tiny insects could serve as a new model for testing therapies aimed
at Parkinson’s.
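A comparison of gene content can be sketched with simple set operations, assuming that the genes of both genomes have already been mapped to shared ortholog-group identifiers by some upstream analysis. The function name and the identifiers below are hypothetical placeholders, not real gene names or real data.

```python
def compare_gene_content(genes_a, genes_b):
    """Summarize shared and genome-specific genes for two genomes.

    Both inputs are iterables of ortholog-group identifiers (an assumption:
    orthology assignment is done beforehand and is not shown here).
    """
    a, b = set(genes_a), set(genes_b)
    shared = a & b
    return {
        "shared": shared,
        "only_a": a - b,
        "only_b": b - a,
        "fraction_of_a_conserved": len(shared) / len(a) if a else 0.0,
    }

# Hypothetical ortholog-group identifiers for two genomes.
genome_a = ["OG1", "OG2", "OG3", "OG4", "OG5"]
genome_b = ["OG1", "OG2", "OG3", "OG9"]
result = compare_gene_content(genome_a, genome_b)
```

Here `shared` corresponds to genes conserved between the two genomes, while `only_a` and `only_b` list candidates for genes that give each organism its unique characteristics.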

9.9.2 Proteomics
Proteomics involves studying the structure, expression, localization, interactions and cellular roles of all of the
proteins present in a particular organism, i.e. its proteome. The word proteome refers to the complete set of proteins
encoded by the genome, including the added variation due to post-translational modifications; the proteome is
neither as uniform nor as static as the genome. Proteomics is a field that encompasses protein expression and purification,
separation, visualization and identification, quantification, interactions, sequence analysis, structural analysis and
protein modification. This field also focuses on the development of new high-throughput techniques and the
computational machinery needed to analyze the data. The central aim of proteomics is the quantitative detection of
proteins in cells and tissues and comparison in different conditions (health, disease, differentiation, drug treatment,
etc.). In general, proteomic approaches can be used:

a. for proteome profiling,
b. for comparative expression analysis of two or more protein samples,
c. for the localization and identification of post-translational modifications, and
d. for the study of protein–protein interactions.
