Computational Problems in Molecular Biology Dong Xu Computer Science Department 109 Engineering Building West 573-882-7064.

Lecture Outline
- From DNA to gene
- Protein sequence and structure
- Gene expression
- Protein interaction and pathway
- Provide a roadmap for the entire course
- Biology at the system level (computational perspective)

About Life
- Life is wonderful: amazing mechanisms
- Life is not perfect: errors and diseases
- Life is a result of evolution

Cells
- Basic unit of life
- Prokaryotes/eukaryotes
- Different types of cell:
  - Skin, brain, red/white blood
  - Different biological function
- Cells produced by cells
  - Cell division (mitosis)
  - 2 daughter cells

DNA
- Double helix (Watson & Crick)
- Nitrogenous base pairs
  - Adenine ↔ Thymine [A, T]
  - Cytosine ↔ Guanine [C, G]
  - Weak bonds (can be broken)
  - Form long chains
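The pairing rule on this slide (A with T, C with G) is enough to derive one strand of the helix from the other. The sketch below is illustrative only: the function names are hypothetical, and the reversal step relies on the standard fact, not stated on the slide, that the two strands run in opposite directions.

```python
# Watson-Crick pairing as listed on the slide: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the opposite strand read in its own 5'-to-3' direction.

    Assumption: the strands are antiparallel, so the complement is reversed.
    """
    return complement(strand)[::-1]


print(reverse_complement("AAC"))
```

A strand such as `ACGT`, which equals its own reverse complement, is called palindromic in this sense.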

Genome
- Each cell contains a full genome (DNA)
- The size varies:
  - Small for viruses and prokaryotes (10 kbp to 20 Mbp)
  - Medium for lower eukaryotes
    - Yeast, unicellular eukaryote: 13 Mbp
    - Worm (Caenorhabditis elegans): 100 Mbp
    - Fly, invertebrate (Drosophila melanogaster): 170 Mbp
  - Larger for higher eukaryotes
    - Mouse and man: 3000 Mbp
  - Very variable for plants (many are polyploid)
    - Mouse-ear cress (Arabidopsis thaliana): 120 Mbp
    - Lilies: 60,000 Mbp

Differences in DNA
[Figure: DNA differences of ~2%, ~4%, and ~0.2%]

Genes
- Chunks of DNA sequence that give rise to functional biomolecules (protein, RNA)
- About 2% of the human DNA sequence codes for genes
- 32,000 human genes, 100,000 genes in tulips

Gene Structure
- General structure of a eukaryotic gene
- Unlike eukaryotic genes, a prokaryotic gene typically consists of only one contiguous coding region

Informational Classes in Genomic DNA
- Transcribed sequences (exons and introns)
- Messenger sequences (mRNA, exons only)
- Coding sequences (CDS, part of the exons only)
- Heads and tails: untranslated parts (UTR)
- Regulatory sequences
- ... and all the rest
→ Identify them: gene-finding
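As a rough illustration of what gene-finding means for the simplest case, a contiguous coding region with no introns, the sketch below scans the forward strand for open reading frames. Everything in it is an assumption made for illustration: the name `find_orfs`, the `min_codons` threshold, the use of ATG as the only start codon, and the restriction to one strand. It is not the method taught in the course.

```python
# Stop codons of the standard genetic code, written in DNA letters.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2):
    """Return (start, end, sequence) for each ATG...stop run, forward strand only.

    start/end are 0-based, end-exclusive indices into dna.
    min_codons counts the codons before the stop codon (hypothetical threshold).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                       # three possible reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):  # step codon by codon
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                        # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None                     # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGAAATGACC"))
```

Eukaryotic genes, with exons interrupted by introns, need more than this scan, which is one reason gene-finding is a computational problem in its own right.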

Genetic Code
Code  Abbr.  Amino acid
A     Ala    Alanine
C     Cys    Cysteine
D     Asp    Aspartic acid
E     Glu    Glutamic acid
F     Phe    Phenylalanine
G     Gly    Glycine
H     His    Histidine
I     Ile    Isoleucine
K     Lys    Lysine
L     Leu    Leucine
M     Met    Methionine
N     Asn    Asparagine
P     Pro    Proline
Q     Gln    Glutamine
R     Arg    Arginine
S     Ser    Serine
T     Thr    Threonine
V     Val    Valine
W     Trp    Tryptophan
Y     Tyr    Tyrosine

Protein Synthesis
AGCCACTTAGACAAACTA (DNA)
- Transcribed to: AGCCACUUAGACAAACUA (mRNA)
- Translated to: SHLDKL (Protein)
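The two steps on this slide can be reproduced in a few lines. This is a minimal sketch under stated assumptions: the DNA string is treated as the coding strand (so transcription is just T to U, as on the slide), translation starts at the first base rather than searching for a start codon, and the codon table is the standard genetic code packed into a 64-letter string in UCAG order. The names `transcribe` and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; "*" marks a stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA to mRNA: replace every T with U."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from position 0 and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("AGCCACTTAGACAAACTA")
print(mrna)             # AGCCACUUAGACAAACUA
print(translate(mrna))  # SHLDKL
```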

About Protein
- Tens to thousands of amino acids (average 300)
- Lysozyme sequence (129 amino acids):
  KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS DGNGMNAWVA WRNRCKGTDV QAWIRGCRL
[Figure: protein backbones and side chain]

Evolution of Genes: Mutation
- Genes alter (slightly) during reproduction
  - Caused by errors, radiation, or toxicity
  - 3 possibilities: deletion, insertion, substitution (alteration)
- Deletion: ACGTTGACTC → ACGTGACTC
- Insertion: ACGTTGACTC → AGCGTTGACTC
- Substitution: ACGTTGACTC → ACGATGACTC
- Mutations are mostly deleterious
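Each example on this slide differs from the original sequence by exactly one deletion, insertion, or substitution. One common way to count such changes between two sequences is the edit (Levenshtein) distance. The sketch below is a textbook dynamic-programming version, offered as an illustration and not as the course's own algorithm; the name `edit_distance` is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-letter deletions, insertions, and
    substitutions needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))        # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        curr = [i]                        # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,              # delete char_a
                curr[j - 1] + 1,          # insert char_b
                prev[j - 1] + cost,       # match or substitute
            ))
        prev = curr
    return prev[-1]


original = "ACGTTGACTC"
for mutant in ("ACGTGACTC", "AGCGTTGACTC", "ACGATGACTC"):
    print(mutant, edit_distance(original, mutant))  # each distance is 1
```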

Evolution and Homology
[Diagram labels: ancestor; gene duplication; X; Y; recombination (75% X, 25% Y); mixed homology]
- Orthologs: similar function
- Paralogs: related functions
- Twilight zone: undetectable homology (<20% sequence identity)

Sequence Comparison
- Pairwise sequence comparison
- Multiple alignment

  SAANLEYLKNVLLQFIFLKPG--SERERLLPVINTMLQLSPEEKGKLAAV  O15045
  NEKNMEYLKNVFVQFLKPESVP-AERDQLVIVLQRVLHLSPKEVEILKAA  P34562
  KNEKIAYIKNVLLGFLEHKE----QRNQLLPVISMLLQLDSTDEKRLVMS  Q06704
  REINFEYLKHVVLKFMSCRES---EAFHLIKAVSVLLNFSQEEENMLKET  Q92805
  MLIDKEYTRNILFQFLEQRD----RRPEIVNLLSILLDLSEEQKQKLLSV  O42657
  EPTEFEYLRKVMFEYMMGR-----ETKTMAKVITTVLKFPDDQAQKILER  O70365
  DPAEAEYLRNVLYRYMTNRESLGKESVTLARVIGTVARFDESQMKNVISS  Q21071
  STSEIDYLRNIFTQFLHSMGSPNAASKAILKAMGSVLKVPMAEMKIIDKK  Q18013
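Once two sequences sit in the same alignment, a simple pairwise comparison is the percentage of aligned positions that carry the same residue. The sketch below assumes one particular definition: identical columns divided by columns where neither row has a gap. Other definitions exist, and the name `percent_identity` is illustrative.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent of gap-free aligned columns in which both rows agree.

    Assumption: columns with a gap in either row are excluded entirely.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            continue                      # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# First two rows of the alignment on this slide (O15045 and P34562).
row_1 = "SAANLEYLKNVLLQFIFLKPG--SERERLLPVINTMLQLSPEEKGKLAAV"
row_2 = "NEKNMEYLKNVFVQFLKPESVP-AERDQLVIVLQRVLHLSPKEVEILKAA"
print(round(percent_identity(row_1, row_2), 1))
```

Applying the function to every pair of rows gives an identity matrix for the whole alignment.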

Phylogenetic Trees Understand evolution

Protein Structure
[Figure: lysozyme structure shown as ball & stick, strand, and surface]

Structure Features of Folded Proteins
- Compact
- Secondary structures: loop, α-helix, β-sheet
Protein cores mostly consist of α-helices and β-sheets

Protein Structure Comparison
- Structure is better conserved than sequence
  - A structure can accommodate a wide range of mutations.
  - Physical forces favor certain structures.
- The number of folds is limited.
  - Currently: ~700
  - Total: 1,000 to 10,000
[Figure: TIM barrel]

Protein Folding Problem
A protein folds into a unique 3D structure under physiological conditions.
Lysozyme sequence: KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS DGNGMNAWVA WRNRCKGTDV QAWIRGCRL

Structure-Function Relationship
- A certain level of function can be found without a structure, but a structure is key to understanding the detailed mechanism.
- A predicted structure is a powerful tool for function inference.
[Figure: Trp repressor as a function switch]

Structure-Based Drug Design
Structure-based rational drug design is still a major method for drug discovery.
[Figure: HIV protease inhibitor]

Gene Expression
All cells carry the same DNA, but only a few percent of genes are expressed in common across cells (house-keeping genes). A few examples:
(1) Specialized cells: hemoglobin is over-represented in blood cells.
(2) Different stages of the life cycle: hemoglobins before and after birth; caterpillar and butterfly.
(3) Different environments: microbes in nutrient-poor or nutrient-rich environments.
(4) Special treatment: response to a wound.

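Eukaryote Gene Expression Control
[Diagram: DNA → primary RNA transcript → mRNA (nucleus) → mRNA (cytosol) → protein, with inactive mRNA and inactive protein as end states; the nucleus and cytosol are separated by the nuclear membrane]
Control points:
- Transcriptional control
- RNA processing control
- RNA transport control
- Translation control
- mRNA degradation control
- Protein activity control
Methods: mass spectrometry, microarray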

Gene Regulation
[Diagram labels: DNA sequence; promoter; operator; start of transcription]

Microarray Experiments
Microarray data
→ Regulation/function/pathway/cellular state/phenotype
→ Disease: diagnosis/gene identification/sub-typing
[Figure: microarray chip]

Genetic vs. Physical Interaction
[Diagram labels: regulatory network; genetic interaction; complex system; physical interaction; gene/protein interaction; expressed gene; transcription factor]

Biological Pathway

Studying Pathways through Systems Biology Approach
[Diagram labels: sequence (RGYSLGNWVC AAKFESNFNT QATNRNTDGS TDYGILQINS RWWCNDGRTP GSRNLCNIPC); structure; function; protein interaction; gene regulation; pathway (cross-talk)]

Discussion
- Possible impacts of biotechnology on our lives

Assignments
- Required reading:
  - Chapter 13 in Pavel Pevzner, "Computational Molecular Biology: An Algorithmic Approach", MIT Press, 2000
  - Larry Hunter, "Molecular Biology for Computer Scientists"
- Optional reading: