Macromolecular and Physical Data Michael J. Watts 1.

Slides:



Advertisements
Similar presentations
Protein Targetting Prokaryotes vs. Eukaryotes Mutations
Advertisements

Molecular Genetics PaCES Summer Program in Environmental Science.
DNA replication—when? Where? Why? What else does a cell do?
Introduction to Bioinformatics Spring 2008 Yana Kortsarts, Computer Science Department Bob Morris, Biology Department.
© 2006 W.W. Norton & Company, Inc. DISCOVER BIOLOGY 3/e
Transcription & Translation Biology 6(C). Learning Objectives Describe how DNA is used to make protein Explain process of transcription Explain process.
Principles of Biology By Frank H. Osborne, Ph. D. Molecular Genetics.
How Proteins are Made. I. Decoding the Information in DNA A. Gene – sequence of DNA nucleotides within section of a chromosome that contain instructions.
Unit 7 RNA, Protein Synthesis & Gene Expression Chapter 10-2, 10-3
Intelligent Systems for Bioinformatics Michael J. Watts
RNA and Protein Synthesis
RNA & Protein Synthesis.
Chapter 8 Microbial Genetics part A. Life in term of Biology –Growth of organisms Metabolism is the sum of all chemical reactions that occur in living.
DNA. Week 2 Review 1.Draw and label a diagram showing the cell membrane. 2.Define Osmosis 3.Define Active and Passive Transport 4.Describe the difference.
Chapter 8: From DNA to Protein Section Transcription
Transcription and Translation. Central Dogma of Molecular Biology  The flow of information in the cell starts at DNA, which replicates to form more DNA.
Introduction to Molecular Biology and Genomics BMI/CS 776 Mark Craven January 2002.
CFE Higher Biology DNA and the Genome Transcription.
Protein Synthesis RNA, Transcription, and Translation.
Gene Expression DNA, RNA, and Protein Synthesis. Gene Expression Genes contain messages that determine traits. The process of expressing those genes includes.
Unit 1: DNA and the Genome Structure and function of RNA.
8.2 KEY CONCEPT DNA structure is the same in all organisms.
Chapter 13- RNA and Protein Synthesis
Topic 25 – RNA and protein synthesis
PROTEIN SYNTHESIS.
From Gene to Protein pp Discover Biology: C15 From Gene to Protein pp
RNA carries DNA’s instructions.
Transcription, Translation & Protein Synthesis
The Central Dogma Transcription & Translation
Molecular Biology DNA Expression
Chapter 21 DNA and Biotechnology.
Chapter 5 RNA and Transcription
Chapter 13: Protein Synthesis
Notes over Active Transport and Protein Synthesis
Let’s Review Who discovered the structure of DNA?
Protein Synthesis in Detail
Central Dogma of Molecular Biology From Genes to Protein
The Central Dogma of Life.
RNA.
Outline What is an amino acid / protein
Transcription and Translation
PROTEIN SYNTHESIS.
Transcription.
Protein Synthesis Chapter 10.
KEY CONCEPT DNA structure is the same in all organisms.
How Proteins are Made.
UNIT 5 Protein Synthesis.
DNA and the Genome Key Area 3b Transcription.
What is RNA? Do Now: What is RNA made of?
RNA and Transcription DNA RNA PROTEIN.
How Proteins are Made Biology I: Chapter 10.
Transcription Credit for the original presentation is given to Mrs. Boyd, Westlake High School.
Transcription and Translation
Central Dogma Central Dogma categorized by: DNA Replication Transcription Translation From that, we find the flow of.
Transcription and Translation
Biology, 9th ed,Sylvia Mader
CHAPTER 10 Molecular Biology of the Gene
Let’s Review Who discovered the structure of DNA?
Higher Biology Unit 1: 1.3 Transcription.
4/6 Objective: Explain the steps and key players in transcription.
RNA and protein synthesis
LECTURE 5: DNA, RNA & PROTEINS
Genes and Protein Synthesis Review
RNA.
Chapter 14: Protein Synthesis
4/2 Objective: Explain the steps and key players in transcription.
Gene Expression How proteins are made..
Genes Determine the characteristics of individuals.
So how do we get from DNA to Protein?
Presentation transcript:

Macromolecular and Physical Data Michael J. Watts 1

Lecture Outline ● Basic biochemistry ● Sources of biochemical data ● Representation of biochemical data ● Uses of biochemical data 2

What is DNA? DeoxyriboNucleicAcid storage medium of genetic information in higher organisms encapsulated in cell nucleus of eukaryotic (multi- cellular) organisms consists of long chains of nucleotides Two chains in double helix structure 3

What is DNA? (continued) Four bases: Adenine A Guanine G Thymine T Cytosine C sequence of bases encodes genetic information 4

How is DNA Processed in Cells? “Central Dogma of Molecular Genetics” describes the flow of information from DNA to protein 5

DNA Processing (continued) DNA is transcribed into messenger RNA (mRNA) RNA is a less stable relative of DNA replaces Thymine (T) with Uracil (U) RNA strand read by ribosome to produce protein (translation) 6

Transcription DNA split into single strands RNA polymerase binds to DNA strand at promoter site RNA strand formed from DNA base complements A -> U G -> C C -> G T -> A 7

Transcription (continued) mRNA cut into sections Coding (exon) portions of RNA strands spliced together Non-coding (intron) segments discarded 8

RNA Translation Triplets of RNA bases (codons) translated to amino acid (residue) The genetic code Amino acids linked to form protein Protein folds according to electrostatic forces Shape of protein determines its function 9

DNA Data Many different kinds of DNA data and DNA related data in existence DNA promoter data RNA splice junction data The “genetic code” Protein sequences and configurations 10

DNA Data ● Lists of bases AAGCTTCGTGAGCTGCGTAGGCTAGGGCTTTAGGCTCCGAGTCC GTAAGCTCGAGACTGCTAGAGCTCTAGAGCTATAGCGCTATACGG ACTATCGAGCTCTGGGCTATATATTTTATCGCGTTATAGAGAGATC TCGAGATCGCGCGATCGAGCTTAGCAGCTATATCGGCTATCAGG CATCATAGCTTCGTGAGCTGCGTAGGCTAGGGCTTTAGGCTCCG AGTCCGTAAGCTCGAGACTGCTAGAGCTCTAGAGCTATAGCGCT ATACGGACTATCGAGCTCTGGGCTATATATTTTATCGCGTTATAGA GAGATCTCGAGATCGCGCGATCGAGCTTAGCAGCTATATCGGCTA TCAGGCATCAT 11

Sources of DNA Data ● EMBL – ● BLAST – ● GenBank – 12

Uses of DNA Data ● gene finding ● disease detection ● disease prediction ● genetic engineering ● gene therapy? 13

Protein Data ● Long chains of amino acids (residues) ● Can be sequenced – yields long lists of residues ALA VAL SER LYS VAL TYR ALA ARG SER VAL TYR ASP SER ARG GLY ASN PRO THR VAL GLU VAL GLU LEU THR THR GLU LYS GLY VAL PHE ARG SER ILE VAL PRO SER GLY ALA SER THR GLY VAL HIS GLU ALA LEU GLU MET ARG ASP GLY ASP LYS SER LYS TRP MET GLY LYS GLY VAL LEU 14

Protein Data ● Proteins have complex 3D shapes 3D shape of protein (conformation) affects its biological function 15

Sources of Protein Data ● PDB – Protein Data Bank, Brookhaven – ● SwissProt – ● Proteome, Inc. – 16

Uses of Protein Data ● Proteins are the means by which genes are expressed ● Study of proteins tells us how genes affect the organism ● Proteomics – MUCH larger field than genomics 17

Representing Biological Data ● Depends what you’re doing with it ● Data can be processed by statistical methods – need some numeric representation ● Data can be visualised – need to represent base identity 18

Representing Biological Data Basic sequence data string of letters A, C, G, T (DNA) A,C,D,E, etc for Amino Acids Can be represented in several ways Substitute arbitrary numbers for letters e.g. A=1, C=2, G=3, T=4 doesn’t reflect some properties of the bases problems dealing with uncertainty 19

Representing Biological Data Binary representation orthogonal encoding each base can be one of four (or 20) represent each base by four bits i.e. A = , C = , etc. handles uncertainty better e.g. A or C = still ignores properties of the bases 20

Representing Biological Data Electron Ion Interaction Potential (EIIP) Measure of the chemical properties of DNA bases Preserves information about the properties of the bases specific to DNA Problems with uncertainty remain 21

Representing Biological Data ● Charge, hydrophobicities – biophysical properties of amino acids – specific to proteins 22

Representing Biological Data ● Most biochem / molecular biology databases are badly organised ● Use flat text files to store data ● Poor search facilities ● Major legacy problems ● Big opportunity for INFO graduates 23

Processing Biological Data ● Homology finding ● Gene expression ● Protein structure ● Gene mutations 24

Homology ● Finding similarity between sequences ● Implies evolutionary relationships between species ● Done via sequence alignment ● Sequence alignment compares two or more sequences 25

Sequence Alignment ● Goal is to maximise number of matching characters in each sequence ● Count number of matches and differences ● Differences include – substitutions – additions – Deletions 26

Sequence Alignment ● Substitution – a character has been replaced with another ● Addition (or insertion) – a character has been inserted into the sequence ● Deletion – a character has been removed from the sequence 27

Sequence Alignment ● This makes sequence alignment a difficult problem ● The sequences may be of different lengths ● Need to identify and align homologous groups / modules – local vs global alignment 28

Gene Expression ● Each cell nucleus contains a complete copy of the organism’s genome ● BUT only specific genes are expressed in each tissue / cell type ● Gene expression analysis seeks to identify these genes 29

Gene Expression ● What use is this? ● Gene expression in tumors – find genes that are expressed in cancers and not other tissues – target drugs to these genes – disable tumors 30

Protein Structure ● Shape of protein determines its function ● Enzyme activity – enzymes bind to substrates – lock and key arrangement – shapes must match – therapeutic applications 31

Protein Structure ● Structure can be determined directly – crystallise the protein – examine with X-ray diffraction or NMR ● Difficult – not all proteins crystallise well ● Slow! 32

Protein Structure ● ‘Holy Grail’ of proteomics – predicting structure accurately from primary sequence ● Homology / alignments used so far ● Intelligent techniques can also be used – EC – ANN 33

Gene Mutations ● Single Nucleotide Polymorphisms ● Point mutations ● Change of a single nucleotide ● Examples – Sickle-cell disease – Lactose tolerance 34

Gene Mutations ● Can be used to diagnose disease – Sickle-cell disease – Huntington’s chorea ● Can be used to predict disease – Stomach cancer – Alzheimers 35

Gene Mutations ● Targets for gene therapy ● Speculative treatment ● Uses retroviruses to replace defective genes ● No major successes yet ● Use of nanotechnology has been proposed.. 36

Gene Mutations ● Can be discovered via sequence alignment ● Can be discovered via analysis of gene expression ● Once discovered, can be screened for (relatively) easily 37

Summary ● Bioinformatics is a very large field ● Filled with many challenges – and lots of $$$$! ● HUGE amount of data exists and being continuously produced ● Problems with processing it all ● Many opportunities for INFO-type folk 38