Presentation is loading. Please wait.

Presentation is loading. Please wait.

Introduction to Bioinformatics 2. Genetics Background Course 341 Department of Computing Imperial College, London © Simon Colton.

Similar presentations


Presentation on theme: "Introduction to Bioinformatics 2. Genetics Background Course 341 Department of Computing Imperial College, London © Simon Colton."— Presentation transcript:

1 Introduction to Bioinformatics 2. Genetics Background Course 341 Department of Computing Imperial College, London © Simon Colton

2 Coursework 1 coursework – worth 20 marks – Work in pairs Retrieving information from a database Using Perl to manipulate that information

3 The Robot Scientist Performs experiments Learns from results – Using machine learning Plans more experiments Saves time and money Team member: – Stephen Muggleton

4 Biological Nomenclature Need to know the meaning of: – Species, organism, cell, nucleus, chromosome, DNA – Genome, gene, base, residue, protein, amino acid – Transcription, translation, messenger RNA – Codons, genetic code, evolution, mutation, crossover – Polymer, genotype, phenotype, conformation – Inheritance, homology, phylogenetic trees

5 Substructure and Effect (Top Down/Bottom Up) Species Organism Cell Nucleus Chromosome DNA strand Gene Base Protein Amino Acid Folds into Affects the Function of Affects the Behaviour of Prescribes

6 Cells Basic unit of life Different types of cell: – Skin, brain, red/white blood – Different biological function Cells produced by cells – Cell division (mitosis) – 2 daughter cells Eukaryotic cells – Have a nucleus

7 Nucleus and Chromosomes Each cell has nucleus Rod-shaped particles inside – Are chromosomes – Which we think of in pairs Different number for species – Human(46),tobacco(48) – Goldfish(94),chimp(48) – Usually paired up X & Y Chromosomes – Humans: Male(xy), Female(xx) – Birds: Male(xx), Female(xy)

8 DNA Strands Chromosomes are same in every cell of organism – Supercoiled DNA (Deoxyribonucleic acid) Take a human, take one cell – Determine the structure of all chromosonal DNA – You’ve just read the human genome (for 1 person) – Human genome project 13 years, 3.2 billion chemicals (bases) in human genome Other genomes being/been decoded: – Pufferfish, fruit fly, mouse, chicken, yeast, bacteria

9 DNA Structure Double Helix (Crick & Watson) – 2 coiled matching strands – Backbone of sugar phosphate pairs Nitrogenous Base Pairs – Roughly 20 atoms in a base – Adenine  Thymine [A,T] – Cytosine  Guanine [C,G] – Weak bonds (can be broken) – Form long chains called polymers Read the sequence on 1 strand – GATTCATCATGGATCATACTAAC

10 Differences in DNA 2% tiny Roughly 4% Share Material DNA differentiates: – Species/race/gender – Individuals We share DNA with – Primates,mammals – Fish, plants, bacteria Genotype – DNA of an individual Genetic constitution Phenotype – Characteristics of the resulting organism Nature and nurture

11 Genes Chunks of DNA sequence – Between 600 and 1200 bases long – 32,000 human genes, 100,000 genes in tulips Large percentage of human genome – Is “junk”: does not code for proteins “Simpler” organisms such as bacteria – Are much more evolved (have hardly any junk) – Viruses have overlapping genes (zipped/compressed) Often the active part of a gene is spit into exons – Seperated by introns

12 The Synthesis of Proteins Instructions for generating Amino Acid sequences – (i) DNA double helix is unzipped – (ii) One strand is transcribed to messenger RNA – (iii) RNA acts as a template ribosomes translate the RNA into the sequence of amino acids Amino acid sequences fold into a 3d molecule Gene expression – Every cell has every gene in it (has all chromosomes) – Which ones produce proteins (are expressed) & when?

13 Transcription Take one strand of DNA Write out the counterparts to each base – G becomes C (and vice versa) – A becomes T (and vice versa) Change Thymine [T] to Uracil [U] You have transcribed DNA into messenger RNA Example: Start: GGATGCCAATG Intermediate: CCTACGGTTAC Transcribed: CCUACGGUUAC

14 Genetic Code How the translation occurs Think of this as a function: – Input: triples of three base letters (Codons) – Output: amino acid – Example: ACC becomes threonine (T) Gene sequences end with: – TAA, TAG or TGA

15 Genetic Code A=Ala=Alanine C=Cys=Cysteine D=Asp=Aspartic acid E=Glu=Glutamic acid F=Phe=Phenylalanine G=Gly=Glycine H=His=Histidine I=Ile=Isoleucine K=Lys=Lysine L=Leu=Leucine M=Met=Methionine N=Asn=Asparagine P=Pro=Proline Q=Gln=Glutamine R=Arg=Arginine S=Ser=Serine T=Thr=Threonine V=Val=Valine W=Trp=Tryptophan Y=Tyr=Tyrosine

16 Example Synthesis TCGGTGAATCTGTTTGAT Transcribed to: AGCCACUUAGACAAACUA Translated to: SHLDKL

17 Proteins DNA codes for – strings of amino acids Amino acids strings – Fold up into complex 3d molecule – 3d structures:conformations – Between 200 & 400 “residues” – Folds are proteins Residue sequences – Always fold to same conformation Proteins play a part – In almost every biological process

18 Evolution of Genes: Inheritance Evolution of species – Caused by reproduction and survival of the fittest But actually, it is the genotype which evolves – Organism has to live with it (or die before reproduction) – Three mechanisms: inheritance, mutation and crossover Inheritance: properties from parents – Embryo has cells with 23 pairs of chromosomes – Each pair: 1 chromosome from father, 1 from mother – Most important factor in offspring’s genetic makeup

19 Evolution of Genes: Mutation Genes alter (slightly) during reproduction – Caused by errors, from radiation, from toxicity – 3 possibilities: deletion, insertion, alteration Deletion: ACGTTGACTC  ACGTGACTC Insertion: ACGTTGACTC  AGCGTTGACTC Substitution: ACGTTGACTC  ACGATGACTT Mutations are almost always deleterious – A single change has a massive effect on translation – Causes a different protein conformation

20 Evolution of Genes: Crossover (Recombination) DNA sections are swapped – From male and female genetic input to offspring DNA

21 Bioinformatics Application #1 Phylogenetic trees Understand our evolution Genes are homologous – If they share a common ancestor By looking at DNA seqs – For particular genes – See who evolved from who Example: – Mammoth most related to African or Indian Elephants? LUCA: – Last Universal Common Ancestor – Roughly 4 billion years ago

22 Genetic Disorders Disorders have fuelled much genetics research – Remember that genes have evolved to function Not to malfunction Different types of genetic problems Downs syndrome: three chromosome 21s Cystic fibrosis: – Single base-pair mutation disables a protein – Restricts the flow of ions into certain lung cells – Lung is less able to expel fluids

23 Bioinformatics Application #2 Predicting Protein Structure Proteins fold to set up an active site – Small, but highly effective (sub)structure – Active site(s) determine the activity of the protein Remember that translation is a function – Always same structure given same set of codons – Is there a set of rules governing how proteins fold? – No one has found one yet – “Holy Grail” of bioinformatics

24 Protein Structure Knowledge Both protein sequence and structure – Are being determined at an exponential rate 1.3+ Million protein sequences known – Found with projects like Human Genome Project 20,000+ protein structures known – Found using techniques like X-ray crystallography Takes between 1 month and 3 years – To determine the structure of a protein – Process is getting quicker

25 Sequence versus Structure 00959085 0 100000 200000 300000 400000 500000 Year Number Protein sequence Protein structure

26 Database Approaches Slow(er) rate of finding protein structure – Still a good idea to pursue the Holy Grail Structure is much more conservative than sequence – 1.3m genes, but only 2,000 – 10,000 different conformations First approach to sequence prediction: – Store [sequence,structure] pairs in a database – Find ways to score similarity of residue sequences – Given a new sequence, find closest matches A good match will possibly mean similar protein shape E.g., sequence identity > 35% will give a good match – Rest of the first half of the course about these issues

27 Potential (Big) Payoffs of Protein Structure Prediction Protein function prediction – Protein interactions and docking Rational drug design – Inhibit or stimulate protein activity with a drug Systems biology – Putting it all together: “E-cell” and “E-organism” – In-silico modelling of biological entities and process

28 Further Reading Human Genome Project at Sanger Centre – http://www.sanger.ac.uk/HGP/ Talking glossary of genetic terms – http://www.genome.gov/glossary.cfm Primer on molecular genetics – http://www.ornl.gov/TechResources/Human_Genome/publicat/primer/toc.html http://www.ornl.gov/TechResources/Human_Genome/publicat/primer/toc.html


Download ppt "Introduction to Bioinformatics 2. Genetics Background Course 341 Department of Computing Imperial College, London © Simon Colton."

Similar presentations


Ads by Google