Introduction to Bioinformatics, Class 01, October 2010.



Introduction Rasmus Wernersson, Associate Professor; Anders Gorm Pedersen, Docent. Center for Biological Sequence Analysis, DTU

Overview Data & databases: –Taxonomy –DNA –Protein –Protein structure Methods: –Alignment (pairwise + multiple) –BLAST (search) –Phylogenetic trees –PyMOL (3D visualization) Wrap-up exercise: –Malaria vaccine

The exercises are the primary focus

Course plan on our wiki

On evolution and sequences Background information

Classification: Linnaeus Carl Linnaeus

Classification: Linnaeus Hierarchical system –Kingdom –Phylum –Class –Order –Family –Genus –Species

Classification depicted as a tree

No “mixed” animals Source:

Classification depicted as a tree Species, Genus, Family, Order, Class

Comparison of limbs Image source:

Theory of evolution Charles Darwin

Phylogenetic basis of systematics Linnaeus: Ordering principle is God. Darwin: Ordering principle is shared descent from common ancestors. Today, systematics is explicitly based on phylogeny.

Natural Selection: Darwin’s four postulates –More young are produced each generation than can survive to reproduce. –Individuals in a population vary in their characteristics. –Some differences among individuals are based on genetic differences. –Individuals with favorable characteristics have higher rates of survival and reproduction. Evolution by means of natural selection. Presence of “design-like” features in organisms: quite often features are there “for a reason”.

Evolution at the sequence level

About DNA DNA contains the recipes for how to make proteins/enzymes. Every time a cell divides, its DNA is duplicated, and each daughter cell gets a copy.

The DNA alphabet The information in the DNA is written in a four letter code: A, T, G, C. The DNA can be “sequenced” and the result stored in a computer file. ATGGCCCTGTGGAT

DNA is always written 5’ → 3’ 5’ AGCC 3’ 3’ TCGG 5’ 5’ ATGGCCAGGTAA 3’ [Figure: DNA backbone and (deoxy)ribose (ribose vs. deoxyribose), with 5’ and 3’ ends labeled]
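The strand-pairing convention on this slide can be sketched in a few lines of Python. This is only an illustration, not part of the original slides; the function names `complement` and `reverse_complement` are made up for the example. It shows that the partner of 5’ AGCC 3’ reads TCGG when written 3’ → 5’ (as drawn on the slide) and GGCT when written in the usual 5’ → 3’ direction.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Return the base-by-base complement (reads 3'->5' if seq is 5'->3')."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Return the opposite strand written in the conventional 5'->3' direction."""
    return complement(seq)[::-1]


top_strand = "AGCC"                       # 5' AGCC 3' from the slide
print(complement(top_strand))             # TCGG  (3' TCGG 5', as drawn)
print(reverse_complement(top_strand))     # GGCT  (same strand, written 5'->3')
```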

ATGGCCCTGTGGATGCG Can DNA be changed?

ATGGCCCTGTGGATGCG ATGGCCCTATGGATGCG Can DNA be changed?

A history of mutations ATGGCCCTGTGTATGCG ATGGCAATGTGGATGCA ATGGCCCTGTGGATGCG ATGGCCCCGTGGATGCG ATGTCCCCGTGGATGCG ATGGCCCCGTGGAACCG Time

Species1: ATGGCAATGTGGATGCA Species2: ATGGCCCCGTGGAACCG Species3: ATGTCCCCGTGGATGCG “DNA alignment” Pairwise differences: Species2 vs. Species3: 3, Species1 vs. Species2: 6, Species1 vs. Species3: 5
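The numbers 3, 6 and 5 on this slide match the count of mismatching positions between each pair of sequences. A minimal Python sketch of that count follows; it assumes the sequences are already aligned and of equal length, and the helper name `count_differences` is invented for the example.

```python
from itertools import combinations

# The three aligned sequences from the slide.
seqs = {
    "Species1": "ATGGCAATGTGGATGCA",
    "Species2": "ATGGCCCCGTGGAACCG",
    "Species3": "ATGTCCCCGTGGATGCG",
}


def count_differences(a: str, b: str) -> int:
    """Count positions where two equal-length, aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# One distance per unordered pair of species.
distances = {
    (m, n): count_differences(seqs[m], seqs[n])
    for m, n in combinations(seqs, 2)
}

for (m, n), d in distances.items():
    print(f"{m} vs {n}: {d}")
# Species1 vs Species2: 6
# Species1 vs Species3: 5
# Species2 vs Species3: 3
```

Species2 and Species3 differ the least, which fits a tree in which they share the most recent common ancestor.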

Real life example: Alignment Insulin from 7 different species
Homo:   ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGACCCAGCCGCAGCCTTTGTGAA
Pan:    ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGTGCTGCTGGCCCTCTGGGGACCTGACCCAGCCTCGGCCTTTGTGAA
Sus:    ATGGCCCTGTGGACGCGCCTCCTGCCCCTGCTGGCCCTGCTGGCCCTCTGGGCGCCCGCCCCGGCCCAGGCCTTCGTGAA
Ovis:   ATGGCCCTGTGGACACGCCTGGTGCCCCTGCTGGCCCTGCTGGCACTCTGGGCCCCCGCCCCGGCCCACGCCTTCGTCAA
Canis:  ATGGCCCTCTGGATGCGCCTCCTGCCCCTGCTGGCCCTGCTGGCCCTCTGGGCGCCCGCGCCCACCCGAGCCTTCGTTAA
Mus:    ATGGCCCTGTTGGTGCACTTCCTACCCCTGCTGGCCCTGCTTGCCCTCTGGGAGCCCAAACCCACCCAGGCTTTTGTCAA
Gallus: ATGGCTCTCTGGATCCGATCACTGCCTCTTCTGGCTCTCCTTGTCTTTTCTGGCCCTGGAACCAGCTATGCAGCTGCCAA

Real life example: Tree

Interpretation of Multiple Alignments Conserved features are assumed to be important for functionality. For instance: conserved pairs of cysteines indicate a possible disulphide bridge.
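As a rough illustration of how conserved positions can be picked out of a multiple alignment, the sketch below reports the columns where every sequence carries the same residue. The three short protein sequences are invented toy data (not from the slides), and `conserved_columns` is a hypothetical helper name; real analyses would typically use a scoring scheme rather than strict identity.

```python
def conserved_columns(alignment: list[str]) -> list[int]:
    """Return 0-based indices of columns where all rows have the same residue."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must be aligned to the same length")
    return [
        i
        for i, column in enumerate(zip(*alignment))
        if len(set(column)) == 1 and column[0] != "-"   # ignore all-gap columns
    ]


# Invented toy alignment, for illustration only.
toy_alignment = [
    "MKCALTCG",
    "MRCSLSCG",
    "MKCTLTCA",
]

conserved = conserved_columns(toy_alignment)
print(conserved)                                    # [0, 2, 4, 6]

# Conserved cysteines (C) are candidates for a disulphide bridge.
conserved_cys = [i for i in conserved if toy_alignment[0][i] == "C"]
print(conserved_cys)                                # [2, 6]
```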

Sequences are related –Darwin: all organisms are related through descent with modification. –Prediction: similar molecules have similar functions in different organisms. –Protein synthesis is carried out by very similar RNA-containing molecular complexes (ribosomes) that are present in all known organisms.

Sequences are related, II Related oxygen-binding proteins in humans

DNA as Biological Information Rasmus Wernersson

Overview Learning objectives –About Biological Information –A note about DNA sequencing techniques and DNA data –File formats used for biological data –Introduction to the GenBank database

Information flow in biological systems

DNA sequences = summary of information 5’ AGCC 3’ 3’ TCGG 5’ 5’ ATGGCCAGGTAA 3’ [Figure: DNA backbone and (deoxy)ribose (ribose vs. deoxyribose), with 5’ and 3’ ends labeled]

PCR –Melting: 96 °C, 30 sec –Annealing: ~55 °C, 30 sec –Extension: 72 °C, 30 sec –35 cycles Animation:
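To get a feel for the numbers in this protocol, here is a small back-of-the-envelope sketch in Python. It assumes idealized amplification (every template is copied in every cycle) and ignores heating and cooling time between steps; real reactions fall short of this and eventually plateau. The names `copies_after` and `efficiency` are made up for the example.

```python
def copies_after(cycles: int, start_copies: int = 1, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the copy number by (1 + efficiency)."""
    return start_copies * (1 + efficiency) ** cycles


# Step durations in seconds, taken from the slide.
STEP_SECONDS = {"melting": 30, "annealing": 30, "extension": 30}
CYCLES = 35

total_seconds = CYCLES * sum(STEP_SECONDS.values())

print(copies_after(CYCLES))          # 34359738368.0, i.e. 2**35 copies from one template
print(total_seconds / 60)            # 52.5 minutes of pure step time
```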

PCR Animation: PCR graph:

Gel electrophoresis DNA fragments are separated using gel electrophoresis –Typically 1% agarose. –Stained with EtBr or SYBR Green (glows in UV light). –A DNA “ladder” is used for identification of known DNA lengths. Gel picture: PCR setup: + -

The Sanger method of DNA sequencing [Figure labels: Terminator, OH, X-ray sequencing gel]

Automated sequencing The major breakthrough in sequencing came through automation. –Fluorescent dyes. –Laser-based scanning. –Capillary electrophoresis. –Computer-based base-calling and assembly. Images:

Handout exercise: “base-calling” Handout: Chromatogram. Groups of 2-3. Tasks: –Identify “difficult” regions. –Identify “difficult” sequence stretches. –Try to estimate the best interval to use.

Biological data on computers The GenBank database File formats –FASTA –GenBank

NCBI GenBank GenBank is one of the main international DNA databases. GenBank is hosted by NCBI: National Center for Biotechnology Information. GenBank has existed since [year missing]. The database is public - no restrictions on the use of the data within.

FASTA format
>alpha-D
ATGCTGACCGACTCTGACAAGAAGCTGGTCCTGCAGGTGTGGGAGAAGGTGATCCGCCAC
CCAGACTGTGGAGCCGAGGCCCTGGAGAGGTGCGGGCTGAGCTTGGGGAAACCATGGGCA
AGGGGGGCGACTGGGTGGGAGCCCTACAGGGCTGCTGGGGGTTGTTCGGCTGGGGGTCAG
CACTGACCATCCCGCTCCCGCAGCTGTTCACCACCTACCCCCAGACCAAGACCTACTTCC
CCCACTTCGACTTGCACCATGGCTCCGACCAGGTCCGCAACCACGGCAAGAAGGTGTTGG
CCGCCTTGGGCAACGCTGTCAAGAGCCTGGGCAACCTCAGCCAAGCCCTGTCTGACCTCA
GCGACCTGCATGCCTACAACCTGCGTGTCGACCCTGTCAACTTCAAGGCAGGCGGGGGAC
GGGGGTCAGGGGCCGGGGAGTTGGGGGCCAGGGACCTGGTTGGGGATCCGGGGCCATGCC
GGCGGTACTGAGCCCTGTTTTGCCTTGCAGCTGCTGGCGCAGTGCTTCCACGTGGTGCTG
GCCACACACCTGGGCAACGACTACACCCCGGAGGCACATGCTGCCTTCGACAAGTTCCTG
TCGGCTGTGTGCACCGTGCTGGCCGAGAAGTACAGATAA
>alpha-A
ATGGTGCTGTCTGCCAACGACAAGAGCAACGTGAAGGCCGTCTTCGGCAAAATCGGCGGC
CAGGCCGGTGACTTGGGTGGTGAAGCCCTGGAGAGGTATGTGGTCATCCGTCATTACCCC
ATCTCTTGTCTGTCTGTGACTCCATCCCATCTGCCCCCATACTCTCCCCATCCATAACTG
TCCCTGTTCTATGTGGCCCTGGCTCTGTCTCATCTGTCCCCAACTGTCCCTGATTGCCTC
TGTCCCCCAGGTTGTTCATCACCTACCCCCAGACCAAGACCTACTTCCCCCACTTCGACC
TGTCACATGGCTCCGCTCAGATCAAGGGGCACGGCAAGAAGGTGGCGGAGGCACTGGTTG
AGGCTGCCAACCACATCGATGACATCGCTGGTGCCCTCTCCAAGCTGAGCGACCTCCACG
CCCAAAAGCTCCGTGTGGACCCCGTCAACTTCAAAGTGAGCATCTGGGAAGGGGTGACCA
GTCTGGCTCCCCTCCTGCACACACCTCTGGCTACCCCCTCACCTCACCCCCTTGCTCACC
ATCTCCTTTTGCCTTTCAGCTGCTGGGTCACTGCTTCCTGGTGGTCGTGGCCGTCCACTT
CCCCTCTCTCCTGACCCCGGAGGTCCATGCTTCCCTGGACAAGTTCGTGTGTGCCGTGGG
CACCGTCCTTACTGCCAAGTACCGTTAA
(Handout)
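The FASTA layout (a ">" header line followed by one or more sequence lines) is simple enough to read with a few lines of Python. The sketch below is a minimal reader written for illustration, not the course's own tool; `parse_fasta` is a hypothetical name, and the example text contains only the first lines of the two records above, so the sequences are deliberately truncated.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into a {name: sequence} dictionary."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()       # header line: everything after '>'
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name].append(line)    # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


# Truncated excerpt of the two records on the slide (first lines only).
example = """>alpha-D
ATGCTGACCGACTCTGACAAGAAGCTGGTCCTGCAGGTGTGGGAGAAGGTGATCCGCCAC
CCAGACTGTGGAGCCGAGGCCCTGGAGAGGTGCGGGCTGAGCTTGGGGAAACCATGGGCA
>alpha-A
ATGGTGCTGTCTGCCAACGACAAGAGCAACGTGAAGGCCGTCTTCGGCAAAATCGGCGGC
"""

sequences = parse_fasta(example)
for seq_name, seq in sequences.items():
    print(seq_name, len(seq), seq[:12])
```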

GenBank format Originates from the GenBank database. Contains both a DNA sequence and annotation of features (e.g. the location of genes). (Handout)

GenBank format - HEADER
LOCUS       CMGLOAD 1185 bp DNA linear VRT 18-APR-2005
DEFINITION  Cairina moschata (duck) gene for alpha-D globin.
ACCESSION   X01831
VERSION     X GI:62724
KEYWORDS    alpha-globin; globin.
SOURCE      Cairina moschata (Muscovy duck)
  ORGANISM  Cairina moschata
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archosauria; Aves; Neognathae; Anseriformes; Anatidae; Cairina.
REFERENCE   1 (bases 1 to 1185)
  AUTHORS   Erbil,C. and Niessing,J.
  TITLE     The primary structure of the duck alpha D-globin gene: an unusual 5' splice junction sequence
  JOURNAL   EMBO J. 2 (8), (1983)
  PUBMED
COMMENT     Data kindly reviewed (13-NOV-1985) by J. Niessing.

GenBank format - ORIGIN section
ORIGIN
        1 ctgcgtggcc tcagcccctc cacccctcca cgctgataag ataaggccag ggcgggagcg
       61 cagggtgcta taagagctcg gccccgcggg tgtctccacc acagaaaccc gtcagttgcc
      121 agcctgccac gccgctgccg ccatgctgac cgccgaggac aagaagctca tcgtgcaggt
      181 gtgggagaag gtggctggcc accaggagga attcggaagt gaagctctgc agaggtgtgg
      241 gctgggccca gggggcactc acagggtggg cagcagggag caggagccct gcagcgggtg
      301 tgggctggga cccagagcgc cacggggtgc gggctgagat gggcaaagca gcagggcacc
      361 aaaactgact ggcctcgctc cggcaggatg ttcctcgcct acccccagac caagacctac
      421 ttcccccact tcgacctgca tcccggctct gaacaggtcc gtggccatgg caagaaagtg
      481 gcggctgccc tgggcaatgc cgtgaagagc ctggacaacc tcagccaggc cctgtctgag
      541 ctcagcaacc tgcatgccta caacctgcgt gttgaccctg tcaacttcaa ggcaagcggg
      601 gactagggtc cttgggtctg ggggtctgag ggtgtggggt gcagggtctg ggggtccagg
      661 ggtctgagtt tcctggggtc tggcagtcct gggggctgag ggccagggtc ctgtggtctt
      721 gggtaccagg gtcctggggg ccagcagcca gacagcaggg gctgggattg catctgggat
      781 gtgggccaga ggctgggatt gtgtttggaa tgggagctgg gcaggggcta gggccagggt
      841 gggggactca gggcctcagg gggactcggg gggggactga gggagactca gggccatctg
      901 tccggagcag gggtactaag ccctggtttg ccttgcagct gctggcacag tgcttccagg
      961 tggtgctggc cgcacacctg ggcaaagact acagccccga gatgcatgct gcctttgaca
     1021 agttcttgtc cgccgtggct gccgtgctgg ctgaaaagta cagatgagcc actgcctgca
     1081 cccttgcacc ttcaataaag acaccattac cacagctctg tgtctgtgtg tgctgggact
     1141 gggcatcggg ggtcccaggg agggctgggt tgcttccaca catcc
//
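In the ORIGIN section each line starts with the position of its first base, followed by the sequence in blocks of ten, and the record ends with "//". A plain sequence can be recovered by dropping the numbers and spaces, as in the Python sketch below. It is a simplified illustration with a made-up function name (`parse_origin`), and the sample holds only the first two lines of the record; applied to the full section it should yield 1185 bases, the length given on the LOCUS line.

```python
def parse_origin(block: str) -> str:
    """Extract the raw sequence from a GenBank ORIGIN section."""
    pieces: list[str] = []
    for line in block.splitlines():
        line = line.strip()
        if line == "//":                  # end-of-record marker
            break
        if not line or line.upper().startswith("ORIGIN"):
            continue
        # Drop the leading position number, keep the 10-base blocks.
        pieces.extend(token for token in line.split() if not token.isdigit())
    return "".join(pieces).upper()


# First two lines of the ORIGIN section shown on the slide.
sample = """ORIGIN
        1 ctgcgtggcc tcagcccctc cacccctcca cgctgataag ataaggccag ggcgggagcg
       61 cagggtgcta taagagctcg gccccgcggg tgtctccacc acagaaaccc gtcagttgcc
//
"""

sequence = parse_origin(sample)
print(len(sequence))      # 120
print(sequence[:10])      # CTGCGTGGCC
```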

GenBank format - FEATURE section
FEATURES             Location/Qualifiers
     source
                     /organism="Cairina moschata"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:8855"
     CAAT_signal
     TATA_signal
     precursor_RNA
                     /note="primary transcript"
     exon
                     /number=1
     CDS             join( , , )
                     /codon_start=1
                     /product="alpha D-globin"
                     /protein_id="CAA "
                     /db_xref="GI: "
                     /db_xref="GOA:P02003"
                     /db_xref="InterPro:IPR000971"
                     /db_xref="InterPro:IPR002338"
                     /db_xref="InterPro:IPR002340"
                     /db_xref="InterPro:IPR009050"
                     /db_xref="UniProt/Swiss-Prot:P02003"
                     /translation="MLTAEDKKLIVQVWEKVAGHQEEFGSEALQRMFLAYPQTKTYFP
                     HFDLHPGSEQVRGHGKKVAAALGNAVKSLDNLSQALSELSNLHAYNLRVDPVNFKLLA
                     QCFQVVLAAHLGKDYSPEMHAAFDKFLSAVAAVLAEKYR"
     repeat_region
                     /note="direct repeat 1"
     intron
                     /number=1
     repeat_region
                     /note="direct repeat 1"
     exon
                     /number=2
     intron
                     /number=2
     exon
                     /number=3
     polyA_signal
     polyA_signal    1114

Exercise: GenBank Work in groups of 2-3 people. The exercise guide is linked from the course programme. Read the guide carefully - it contains a lot of information about GenBank.