
341: Introduction to Bioinformatics

Dr. Nataša Pržulj & Prof. Yike Guo
Department of Computing, Imperial College London
natasha@imperial.ac.uk
Winter 2011

Course overview
Explosion in the availability of biological data:
  Sequences and microarrays (Prof. Guo)
  Networks: expected to be as useful as the sequence data in uncovering new biology (Dr. Pržulj)

The goal of systems biology:
  Systems-level understanding of biological systems, e.g. the cell
  Analyze not only individual components, but also their interactions and the functioning of the system as a whole
  E.g.: learn new biology from the topology of such interaction networks

However, biological network research faces considerable challenges:
  Incomplete and noisy data
  Computational infeasibility of many graph-theoretic problems

We will cover:
1. Biological aspects:
   Basic biological concepts (e.g., DNA, genes, proteins, gene expression, ...)
   Different types of biological networks
   Experimental techniques for acquiring the data and their biases
   Public databases and other sources of biological network data
2. Sequence analysis (Prof. Yi-Ke Guo)
3. Microarray analysis (Prof. Yi-Ke Guo)
4. Graph theoretic aspects:
   Fundamental topics in graph theory (e.g., basic graph notation, graph representation, and special graph types)
   Basic graph algorithms (e.g., graph search/traversal algorithms and running time analysis)
   Important computational complexity concepts (e.g., complexity classes, subgraph isomorphism, and NP-completeness), which pose challenges for analyzing biological networks
5. Existing approaches for analyzing and modeling biological networks:
   Structural properties of large networks
   Network models
   Network clustering
   Graph alignment
   Software tools for network analysis
6. Applications: interplay of topology and biology
   Learn how the above methods have been applied
   Discuss valuable insights that have been gained into biological function, evolution, complex diseases (e.g., cancer), and drug discovery

Grading scheme:
Two homework assignments
  Each assignment is worth the same
  Due at the beginning of class
Written exam
Standard College Grading Scheme will be used

Course organization:
1. Lectures
   Relevant theoretical concepts and examples
2. Tutorials
   Exercises covering concepts covered in class
3. Two homework assignments
   Opportunity to solve practical problems using the methods learned in class
4. Written exam
   Testing students' understanding of the concepts learned in lectures

Textbooks and readings
Recommended textbooks:
  Junker and Schreiber, Analysis of Biological Networks, Wiley, 2008.
  West, Introduction to Graph Theory, 2nd edition, Prentice Hall, 2001.
  T. Cormen et al., Introduction to Algorithms, 3rd edition, MIT Press, 2009.

Recommended readings:
  A list of up-to-date research papers selected by the instructor.
  F. Kepes (Author, Editor), Biological Networks (Complex Systems and Interdisciplinary Science), 1st edition, World Scientific Publishing Company, 2007.
  Bornholdt and Schuster (Editors), Handbook of Graphs and Networks: From the Genome to the Internet, Wiley, 2003.
  Dorogovtsev and Mendes (Authors), Evolution of Networks: From Biological Nets to the Internet and WWW (Physics), Oxford University Press, 2003.
  Chapter 17 from: Chen and Lonardi (Editors), Biological Data Mining, Chapman and Hall/CRC Press, 2009.
  Chapter 4 from: Jurisica and Wigle (Editors), Knowledge Discovery in Proteomics, CRC Press, 2005.
  Kurt Mehlhorn and Stefan Näher, LEDA: A Platform for Combinatorial and Geometric Computing, Cambridge University Press, 1999.

When and where:
Fridays, 2-5 pm (2 hours of lecture, 1 hour of tutorial)
145 Huxley

Contact:
E-mail: natasha@imperial.ac.uk
Subject: 341 Bioinformatics

Office hours:
Fridays after class, 5 pm
Office: 407 A Huxley

Prerequisites: none
  Basic programming skills are desirable
  An introduction to biological concepts will be provided

Course website (curriculum, class material, etc.):


http://www.doc.ic.ac.uk/~natasha/course/index.html

Academic code of honor

Topics
Introduction: biology (Dr. Pržulj)
Sequence analysis (Prof. Guo, 2 lectures)
Microarray analysis (Prof. Guo, 3 lectures)
Network biology (Dr. Pržulj):
  Introduction to graph theory
  Network properties
    Network/node centralities
    Network motifs
  Network models
  Network/node clustering
  Network comparison/alignment
  Software tools for network analysis
  Interplay between topology and biology

Any questions so far?

About you


Introduction: biology

Cell - the building block of life
Cytoplasm and organelles separated by membranes:
Mitochondria, nucleus, etc.

Distinguish between:
Prokaryotes
  Single-celled, with no cell nucleus or any other membrane-bound organelles
    The genetic material in prokaryotes is not membrane-bound
  The bacteria and the archaea
  Model organism: E. coli
Eukaryotes
  Have "true" nuclei containing their DNA
  May be unicellular, as in amoebae
  May be multicellular, as in plants and animals
  Model organism: S. cerevisiae (baker's yeast)

Nucleus contains DNA
Deoxyribonucleic acid

DNA nucleotides: A pairs with T, C pairs with G
DNA structure: double helix
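
The pairing rule (A with T, C with G) means that one strand of the double helix determines the other. A minimal sketch, assuming the usual convention (not stated on the slide) that the two strands run in opposite directions, so the partner strand is read in reverse. The function names and the example sequence are hypothetical.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'->3' direction."""
    return complement(strand)[::-1]


strand = "ATGCGT"  # hypothetical example sequence
print(complement(strand))          # TACGCA
print(reverse_complement(strand))  # ACGCAT
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check on the pairing table.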

Chromosomes
RNA: similar to DNA, except that T is replaced by U and RNA is single-stranded

Main role of DNA: long-term storage of genetic information
Genes: DNA segments that carry this information
  Intron: part of a gene that is not translated into protein; it is spliced out of the mRNA
  Exon: part of a gene that is retained; the mRNA translated into protein consists only of exon-derived sequences
Genome: total set of (unique) genes in an organism
Every cell (except sex cells and mature red blood cells) contains the complete genome of an organism

Codons: sets of three nucleotides
4 nucleotides → 4^3 = 64 possible codons

Each codon codes for an amino acid or a stop signal
  The 64 codons encode 20 different amino acids (3 codons are stop signals)
  More than one codon can stand for the same amino acid
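
The counting argument above can be checked directly. The sketch below enumerates all triplets over the four DNA bases and maps them to amino acids using the standard genetic code, written as a 64-letter string in the conventional T, C, A, G order, where `*` marks a stop codon. The variable names are illustrative, and other genetic codes (e.g., mitochondrial) differ slightly.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"  # conventional base order for codon tables
# Standard genetic code, one letter per codon, in the order produced by
# product(BASES, repeat=3); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
codon_table = dict(zip(codons, AMINO_ACIDS))

# How many codons stand for each amino acid (or for "stop")?
counts = Counter(codon_table.values())
amino_acids = set(counts) - {"*"}

print(len(codons))       # 64 = 4^3
print(len(amino_acids))  # 20
print(counts["*"])       # 3 stop codons
print(counts["L"])       # 6 codons for leucine
print(counts["M"])       # 1 codon for methionine (ATG)
```

Since 61 codons are shared among 20 amino acids, most amino acids have more than one codon; only methionine and tryptophan have exactly one.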

Polypeptide:
String of amino acids, composed from a 20-character alphabet

Proteins:

String composed of one or more polypeptides (70-3000 amino acids)
Sequence of amino acids is defined by a gene
Gene expression: information transmission from DNA to proteins

Proteome: total set of proteins in an organism

The 20 amino acids

Levels of protein structure:

Genes vs. proteins
Genes passive; proteins active

Protein synthesis: from genes to proteins


Transcription (in nucleus)
Splicing (eukaryotes)
Translation (in cytoplasm)

Transcription (in nucleus)
RNA polymerase enzyme builds an RNA strand from a gene (DNA is "unzipped")
The gene is transcribed to messenger RNA (mRNA)
Transcription is regulated by proteins called transcription factors
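
As a rough illustration of transcription at the sequence level, the sketch below builds an mRNA string from a DNA template strand by base pairing, with U taking the place of T in the RNA. It assumes the template is written 3'->5', so the result reads 5'->3'. It ignores regulation by transcription factors entirely, and the function name and example sequence are hypothetical.

```python
# Pairing used when RNA is built on a DNA template:
# template A -> U, T -> A, C -> G, G -> C.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_strand: str) -> str:
    """Build the mRNA (5'->3') for a DNA template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_strand)


template = "TACGGCATT"  # hypothetical template strand, written 3'->5'
mrna = transcribe(template)
print(mrna)  # AUGCCGUAA
```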

Splicing (eukaryotes)
Regions that do not code for proteins (introns) are removed from the sequence
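
Viewed purely as a string operation, splicing cuts out the intron intervals and joins the remaining exons. The sketch below assumes the intron positions are already known and given as 0-based, end-exclusive intervals. In a real cell they are recognized by the splicing machinery, which this example does not model. The function name and sequences are hypothetical.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove the given intron intervals (0-based, end-exclusive) and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # exon after the last intron
    return "".join(exons)


# Hypothetical pre-mRNA: exon (6 bases) + intron (12 bases) + exon (6 bases).
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UUUUAA"
mature_mrna = splice(pre_mrna, [(6, 18)])
print(mature_mrna)  # AUGGCCUUUUAA
```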

Translation (in cytoplasm)
Ribosomes synthesize proteins from mRNA
mRNA is decoded and used as a template to guide the synthesis of a chain of amino acids that form a protein
Translation: the process of converting the mRNA codon sequences into an amino acid polypeptide chain
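
Decoding mRNA codon by codon can be sketched as a table lookup. The example below assumes the standard genetic code, written as a 64-letter string in U, C, A, G order with `*` for stop codons. As a simplification, translation starts at the first AUG and ends at the first stop codon. The function name and example sequence are hypothetical, and the output uses one-letter amino acid codes.

```python
from itertools import product

BASES = "UCAG"  # RNA alphabet, conventional codon-table order
# Standard genetic code, one letter per codon; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCCUUUUAA"))  # MAF (Met-Ala-Phe)
```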

Microarrays:
Measure mRNA abundance for each gene
The amount of transcribed mRNA correlates with gene expression
  Gene expression: the rate at which a gene produces the corresponding protein
It is hard to measure protein level directly!


Every cell (except sex cells and mature red blood cells) contains the complete genome of an organism
How is the variety of different tissues encoded and expressed?

-ome and -omics
  Genome and genomics
  Proteome and proteomics

