MBV3070 Bioinformatics. Reading list MBV3070 - Bioinformatics: Arthur M. Lesk: Introduction to Bioinformatics. Oxford University Press 2002. 270 pages.

MBV3070 Bioinformatics

Reading list MBV3070 Bioinformatics
Arthur M. Lesk: Introduction to Bioinformatics. Oxford University Press 2002. 270 pages.
In addition:
1. Tom Kristensen: Sekvenssammenstillinger (Sequence alignments). 7 pages.
2. Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22.
3. Higgins, D.G., Thompson, J.D. and Gibson, T.J.: Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 266 (1996).
4. ??? (gene finding)
5. ???? (microarrays)

Schedule
Introduction. Sequencing. Databases. Entrez and SRS.
Dot plots
Pairwise sequence alignment
FASTA and BLAST
Multiple sequence alignment. ClustalW/ClustalX
Motifs, profiles, PSI-BLAST
Phylogeny
Genomes. Analysis of genomic DNA. Gene finding
Microarrays (Ola Myklebost/Ole Chr. Lingjærde)
Protein modelling (Vincent Eijsink)

Useful websites for MBV3070
Course home page: molbio/MBV3070/v04/
Textbook home page:

What is bioinformatics?
The NIH Biomedical Information Science and Technology Initiative Consortium agreed on the following definitions of bioinformatics and computational biology, recognizing that no definition could completely eliminate overlap with other activities or preclude variations in interpretation by different individuals and organizations.
Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.

Other ways to define bioinformatics
"The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information." (Fredj Tekaia, Institut Pasteur)
"The use of computers to store, retrieve, analyze or predict the composition or the structure of biomolecules." (Damian Counsell, bioinformatics.org)

"For the last three and a half billion years, evolution has been taking notes. It tries experiments. It wakes up every morning, does a little mutagenesis, changes a nucleotide here and there, and sees how it works. If it's a success, it keeps the notes. In this notebook, we have all of the information of the greatest experimental tinkerer ever."
Dr. Eric Lander, Director of the Whitehead Institute/MIT Center for Genome Research

What does this mean?

Base symbols
A  Adenine
C  Cytosine
G  Guanine
T  Thymine
U  Uracil
R  Guanine / Adenine (puRine)
Y  Cytosine / Thymine (pYrimidine)
K  Guanine / Thymine (Keto)
M  Adenine / Cytosine (aMino)
S  Guanine / Cytosine (Strong)
W  Adenine / Thymine (Weak)
B  Guanine / Thymine / Cytosine (not A)
D  Guanine / Adenine / Thymine (not C)
H  Adenine / Cytosine / Thymine (not G)
V  Guanine / Cytosine / Adenine (not T)
N  Adenine / Guanine / Cytosine / Thymine (any base)
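Because each ambiguity symbol stands for a set of bases, a pattern written with these symbols can be searched for directly. A minimal sketch, assuming a DNA target (U is simply treated as T) and using a hypothetical helper name `iupac_to_regex`, turns each symbol into a regular-expression character class:

```python
import re

# IUPAC nucleotide symbols from the table above -> the bases they stand for.
# Assumption: targets are DNA, so U is mapped to T.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_to_regex(pattern):
    """Compile a (possibly ambiguous) nucleotide pattern into a case-insensitive regex."""
    parts = []
    for symbol in pattern.upper():
        bases = IUPAC[symbol]  # raises KeyError for symbols outside the table
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts), re.IGNORECASE)

# Usage: R and Y let one pattern match several concrete sequences.
match = iupac_to_regex("RGATCY").search("ttagatctgg")
print(match.start(), match.group())
```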

Why ambiguous symbols?
Sequencing instruments cannot always read a sequence unambiguously.
Ambiguous symbols are useful in consensus sequences:
Sequence 1  aagcggtaccag
Sequence 2  aaacagcaccaa
Consensus   aarcrgyaccar
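The consensus in this example is built column by column: identical bases are kept, and a column with different bases gets the symbol that covers exactly that set (g/a becomes r, t/c becomes y). A small sketch of that rule, assuming equal-length, gap-free DNA sequences with no ambiguous input symbols (the function name `iupac_consensus` is illustrative):

```python
# Set of observed bases -> IUPAC symbol (lowercase, as on the slide).
SYMBOL_FOR = {
    frozenset("A"): "a", frozenset("C"): "c", frozenset("G"): "g", frozenset("T"): "t",
    frozenset("AG"): "r", frozenset("CT"): "y", frozenset("GT"): "k",
    frozenset("AC"): "m", frozenset("CG"): "s", frozenset("AT"): "w",
    frozenset("CGT"): "b", frozenset("AGT"): "d", frozenset("ACT"): "h",
    frozenset("ACG"): "v", frozenset("ACGT"): "n",
}

def iupac_consensus(sequences):
    """Column-wise IUPAC consensus of equal-length, gap-free DNA sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    columns = zip(*(s.upper() for s in sequences))
    # Each column's set of distinct bases picks exactly one symbol.
    return "".join(SYMBOL_FOR[frozenset(col)] for col in columns)

print(iupac_consensus(["aagcggtaccag", "aaacagcaccaa"]))
```

Run on the two sequences above, it reproduces the consensus shown on the slide.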

The genetic code
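The codon table itself was an image on the slide. As a stand-in, the sketch below encodes the standard genetic code (NCBI translation table 1) in the usual compact form: the 64 codons in TCAG order, with the third base varying fastest, and `*` marking stop codons. The function name `translate` and the `X` for unreadable codons are illustrative choices:

```python
from itertools import product

BASES = "TCAG"
# Amino acid for each codon TTT, TTC, TTA, TTG, TCT, ... (third base fastest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna, to_stop=False):
    """Translate a coding sequence codon by codon (reading frame starts at base 1)."""
    dna = dna.upper().replace("U", "T")  # accept mRNA as well as DNA
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X for codons with ambiguous bases
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGTTTTAA"))
```

The bacterial entry shown later uses `/transl_table=11`; that table assigns the same amino acids to all 64 codons and differs from table 1 only in which codons may serve as alternative starts.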

Amino acid symbols
A  Ala  alanine
B  Asx  aspartic acid or asparagine
C  Cys  cysteine
D  Asp  aspartic acid
E  Glu  glutamic acid
F  Phe  phenylalanine
G  Gly  glycine
H  His  histidine
I  Ile  isoleucine
K  Lys  lysine
L  Leu  leucine
M  Met  methionine
N  Asn  asparagine
P  Pro  proline
Q  Gln  glutamine
R  Arg  arginine
S  Ser  serine
T  Thr  threonine
U  Sec  selenocysteine
V  Val  valine
W  Trp  tryptophan
X  Xaa  unknown or 'other' amino acid
Y  Tyr  tyrosine
Z  Glx  glutamic acid or glutamine (or substances such as 4-carboxyglutamic acid and 5-oxoproline that yield glutamic acid on acid hydrolysis of peptides)
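The table is essentially a lookup from one-letter to three-letter codes. A minimal sketch (the dictionary is copied from the table; the function name `to_three_letter` is hypothetical) that rewrites a one-letter protein sequence, here the start of the superoxide dismutase translation from the GenBank entry below:

```python
# One-letter -> three-letter amino acid codes, as listed in the table above.
THREE_LETTER = {
    "A": "Ala", "B": "Asx", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu", "M": "Met",
    "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg", "S": "Ser", "T": "Thr",
    "U": "Sec", "V": "Val", "W": "Trp", "X": "Xaa", "Y": "Tyr", "Z": "Glx",
}

def to_three_letter(protein, sep="-"):
    """Rewrite a one-letter protein sequence using three-letter codes."""
    try:
        return sep.join(THREE_LETTER[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"unknown amino acid symbol: {err.args[0]!r}") from None

print(to_three_letter("MTYELPK"))
```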

Two ways to sequence
Shotgun sequencing: This is the strategy Celera chose for its commercial sequencing of the human genome.
Ordered sequencing (top down): This strategy was used in the "public" sequencing of the genome, an international collaboration.

Top-down sequencing strategy

Two ways to sequence the genome
BAC to BAC Sequencing: The BAC to BAC approach first creates a crude physical map of the whole genome before sequencing the DNA. Constructing a map requires cutting the chromosomes into large pieces and figuring out the order of these big chunks of DNA before taking a closer look and sequencing all the fragments.
Whole Genome Shotgun Sequencing: The shotgun sequencing method goes straight to the job of decoding, bypassing the need for a physical map. Skipping the mapping step makes it much faster.

Fragmenting the genome
BAC to BAC Sequencing / Whole Genome Shotgun Sequencing

Cloning the fragments
BAC to BAC Sequencing / Whole Genome Shotgun Sequencing

Placing the BAC clones on the map
BAC to BAC Sequencing / Whole Genome Shotgun Sequencing (this step is not needed in shotgun sequencing)

Subclones from the BAC clones
BAC to BAC Sequencing / Whole Genome Shotgun Sequencing (this step is not needed in shotgun sequencing)

Sequencing the clones
BAC to BAC Sequencing / Whole Genome Shotgun Sequencing

Raw sequence from a sequencing instrument

Building contiguous sequences (contigs)
BAC to BAC Sequencing / Whole Genome Shotgun Sequencing

Assembling individual sequences into larger sequences
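The slides show assembly only as pictures. As a toy illustration of the idea (not the assembler used by Celera or by the public project), the sketch below repeatedly merges the pair of reads with the longest suffix/prefix overlap until no overlap of at least `min_len` bases remains. The names `overlap` and `greedy_assemble` and the default `min_len=3` are assumptions; real assemblers must also handle sequencing errors, repeats and both DNA strands, which this sketch ignores:

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):         # earliest hit gives the longest overlap
            return len(a) - start
        start += 1

def greedy_assemble(reads, min_len=3):
    """Greedily merge reads into contigs by best pairwise overlap."""
    reads = list(dict.fromkeys(reads))      # drop exact duplicates, keep order
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]  # drop contained reads
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break                           # no overlaps left: remaining reads are separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

print(greedy_assemble(["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]))
```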

DNA sequencing 2001

Biological databases
Primary databases (archival): GenBank, EMBL, DDBJ, PDB
Secondary databases (curated): PIR, Swiss-Prot and everything else

Database categories list
Genomics Databases (non-vertebrate)
Human and other Vertebrate Genomes
Human Genes and Diseases
Metabolic and Signaling Pathways
Microarray Data and other Gene Expression Databases
Nucleotide Sequence Databases
Other Molecular Biology Databases
Protein sequence databases
Proteomics Resources
RNA sequence databases
Structure Databases
In all, 548 databases (162 more than one year earlier)

GenBank entry
LOCUS       LISOD         756 bp    DNA             BCT       30-JUN-1993
DEFINITION  L.ivanovii sod gene for superoxide dismutase.
ACCESSION   X64011 S78972
NID         g44010
VERSION     X  GI:44010
KEYWORDS    sod gene; superoxide dismutase.
SOURCE      Listeria ivanovii.
  ORGANISM  Listeria ivanovii
            Bacteria; Firmicutes; Bacillus/Clostridium group; Bacillaceae;
            Listeria.
REFERENCE   1  (bases 1 to 756)
  AUTHORS   Haas,A. and Goebel,W.
  TITLE     Cloning of a superoxide dismutase gene from Listeria ivanovii by
            functional complementation in Escherichia coli and
            characterization of the gene product
  JOURNAL   Mol. Gen. Genet. 231 (2), (1992)
  MEDLINE
REFERENCE   2  (bases 1 to 756)
  AUTHORS   Kreft,J.
  TITLE     Direct Submission
  JOURNAL   Submitted (21-APR-1992) J. Kreft, Institut f. Mikrobiologie,
            Universitaet Wuerzburg, Biozentrum Am Hubland, 8700 Wuerzburg, FRG

GenBank entry (cont.)
FEATURES             Location/Qualifiers
     source
                     /organism="Listeria ivanovii"
                     /strain="ATCC 19119"
                     /db_xref="taxon:1638"
     RBS
                     /gene="sod"
     gene
                     /gene="sod"
     CDS
                     /gene="sod"
                     /EC_number=" "
                     /codon_start=1
                     /transl_table=11
                     /product="superoxide dismutase"
                     /protein_id="CAA "
                     /db_xref="SWISS-PROT:P28763"
                     /translation="MTYELPKLPYTYD…
     terminator
                     /gene="sod"
BASE COUNT      247 a    136 c    151 g    222 t
ORIGIN
        1 cgttatttaa ggtgttacat agttctatgg aaatagggtc tatacctttc gccttacaat
       61 gtaatttctt
//
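In the ORIGIN section each line starts with the position of its first base, followed by the bases in blocks of ten, and `//` ends the record. A minimal stdlib-only sketch (function names are hypothetical; Biopython's GenBank parser would be the usual tool in practice) that collects the sequence and recomputes the BASE COUNT line:

```python
from collections import Counter

def origin_sequence(record_text):
    """Collect the bases listed after ORIGIN in a GenBank flat-file record."""
    bases = []
    in_origin = False
    for line in record_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if line.startswith("//"):          # end of record
            break
        if in_origin:
            # Skip the leading position number; keep the 10-base blocks.
            bases.extend(tok for tok in line.split() if not tok.isdigit())
    return "".join(bases).lower()

def base_count(seq):
    """Counts of a, c, g and t, in the order used by the BASE COUNT line."""
    counts = Counter(seq)
    return {b: counts.get(b, 0) for b in "acgt"}

excerpt = """ORIGIN
        1 cgttatttaa ggtgttacat agttctatgg aaatagggtc tatacctttc gccttacaat
       61 gtaatttctt
//"""
seq = origin_sequence(excerpt)
print(len(seq), base_count(seq))
```

On the complete record, the four counts should sum to the 756 bp given on the LOCUS line (247 + 136 + 151 + 222 = 756); the excerpt above is only the first 70 bases.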

EMBL database entry
EMBL:TRBG361
ID   TRBG361    standard; RNA; PLN; 1859 BP.
XX
AC   X56734; S46826;
XX
SV   X
XX
DT   12-SEP-1991 (Rel. 29, Created)
DT   15-MAR-1999 (Rel. 59, Last updated, Version 9)
XX
DE   Trifolium repens mRNA for non-cyanogenic beta-glucosidase
XX
KW   beta-glucosidase.
XX
OS   Trifolium repens (white clover)
OC   Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
OC   Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; Rosidae;
OC   eurosids I; Fabales; Fabaceae; Papilionoideae; Trifolieae; Trifolium.
XX

EMBL database entry (cont.)
RN   [5]
RP
RX   MEDLINE;
RA   Oxtoby E., Dunn M.A., Pancoro A., Hughes M.A.;
RT   "Nucleotide and derived amino acid sequence of the cyanogenic
RT   beta-glucosidase (linamarase) from white clover (Trifolium repens L.).";
RL   Plant Mol. Biol. 17: (1991).
XX
RN   [6]
RP
RA   Hughes M.A.;
RT   ;
RL   Submitted (19-NOV-1990) to the EMBL/GenBank/DDBJ databases.
RL   M.A. Hughes, UNIVERSITY OF NEWCASTLE UPON TYNE, MEDICAL SCHOOL, NEW CASTLE
RL   UPON TYNE, NE2 4HH, UK
XX
DR   AGDR; X56734; X
DR   MENDEL; 11000; Trirp;1162;
DR   SWISS-PROT; P26204; BGLS_TRIRP.
XX

EMBL database entry (cont.)
FH   Key             Location/Qualifiers
FH
FT   source
FT                   /db_xref="taxon:3899"
FT                   /organism="Trifolium repens"
FT                   /tissue_type="leaves"
FT                   /clone_lib="lambda gt10"
FT                   /clone="TRE361"
FT   CDS
FT                   /db_xref="SWISS-PROT:P26204"
FT                   /note="non-cyanogenic"
FT                   /EC_number=" "
FT                   /product="beta-glucosidase"
FT                   /protein_id="CAA "
FT                   /translation="MDFIVAIFALFVISSFTITSTNAVEASTLLDIGNLSRSSFPRGFI
FT                   FGAGSSAYQFEGAVNEGGRGPSIWDTFTHKYPEKIRDGSNADITVDQYHRYKEDVGIMK
FT                   DQNMDSYRFSI….
FT   mRNA
FT                   /evidence=EXPERIMENTAL
XX
SQ   Sequence 1859 BP; 609 A; 314 C; 355 G; 581 T; 0 other;
     aaacaaacca aatatggatt ttattgtagc catatttgct ctgtttgtta ttagctcatt        60
     cacaattact tccacaaatg cagttgaagc ttctactctt cttgacatag gtaacctgag       120
     tcggagcagt tttcctcgtg

EMBL database fields
Note that each line begins with a two-character line code, which indicates the type of information contained in the line. The currently used line types, along with their respective line codes, are listed below:
ID - identification (begins each entry; 1 per entry)
AC - accession number (>=1 per entry)
SV - new sequence identifier (>=1 per entry)
DT - date (2 per entry)
DE - description (>=1 per entry)
KW - keyword (>=1 per entry)
OS - organism species (>=1 per entry)
OC - organism classification (>=1 per entry)
OG - organelle (0 or 1 per entry)
RN - reference number (>=1 per entry)
RC - reference comment (>=0 per entry)

EMBL database fields (cont.)
RP - reference positions (>=1 per entry)
RX - reference cross-reference (>=0 per entry)
RA - reference author(s) (>=1 per entry)
RT - reference title (>=1 per entry)
RL - reference location (>=1 per entry)
DR - database cross-reference (>=0 per entry)
FH - feature table header (0 or 2 per entry)
FT - feature table data (>=0 per entry)
CC - comments or notes (>=0 per entry)
XX - spacer line (many per entry)
SQ - sequence header (1 per entry)
bb - (blanks) sequence data (>=1 per entry)
// - termination line (ends each entry; 1 per entry)
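Because every line is tagged by its two-character code, a first-pass reader only has to group lines by that code. A minimal sketch, assuming the data starts in column 6 (as in the entry above) and using the hypothetical name `parse_embl`; it does not interpret the contents of any field:

```python
from collections import defaultdict

def parse_embl(lines):
    """Group one EMBL flat-file entry by two-character line code.

    Returns (fields, sequence): fields maps each line code (ID, AC, DE, ...)
    to the list of its data lines; sequence is the concatenated sequence data.
    """
    fields = defaultdict(list)
    sequence = []
    for line in lines:
        if not line.strip():
            continue                          # ignore empty lines
        code = line[:2]
        if code == "//":                      # termination line: end of entry
            break
        if code == "XX":                      # spacer line: no data
            continue
        if code == "  ":                      # 'bb' lines: sequence data
            tokens = line.split()
            if tokens and tokens[-1].isdigit():
                tokens.pop()                  # drop the running base count at line end
            sequence.append("".join(tokens))
            continue
        fields[code].append(line[5:].rstrip())  # data starts in column 6
    return dict(fields), "".join(sequence)

entry = [
    "ID   TRBG361    standard; RNA; PLN; 1859 BP.",
    "XX",
    "AC   X56734; S46826;",
    "SQ   Sequence 1859 BP; 609 A; 314 C; 355 G; 581 T; 0 other;",
    "     aaacaaacca aatatggatt ttattgtagc catatttgct ctgtttgtta ttagctcatt        60",
    "//",
]
fields, seq = parse_embl(entry)
print(fields["AC"], len(seq))
```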

The feature table
The overall goal of the feature table design is to provide an extensive vocabulary for describing features in a flexible framework for manipulating them. The Feature Table documentation represents the shared rules that allow the three databases (DDBJ, EMBL and GenBank) to exchange data on a daily basis. The range of features to be represented is diverse, including regions which:
- perform a biological function,
- affect or are the result of the expression of a biological function,
- interact with other molecules,
- affect replication of a sequence,
- affect or are the result of recombination of different sequences,
- are a recognizable repeated unit,
- have secondary or tertiary structure,
- exhibit variation, or
- have been revised or corrected.

Feature table terminology
The format and wording in the feature table use common biological research terminology whenever possible. For example, an item in the new feature table such as:

Key             Location/Qualifiers
CDS             23..400
                /product="alcohol dehydrogenase"
                /gene="adhI"

might be read as: The feature CDS is a coding sequence beginning at base 23 and ending at base 400, has a product called 'alcohol dehydrogenase' and corresponds to the gene called 'adhI'.

Feature table terminology (cont.)
A more complex description:

Key             Location/Qualifiers
CDS             join( , )
                /product="T-cell receptor beta-chain"
                /partial

which might be read as: This feature, which is a partial coding sequence, is formed by joining the indicated elements to form one contiguous sequence encoding a product called T-cell receptor beta-chain.
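Reading a location like `23..400` or `join(...)` amounts to cutting 1-based, inclusive spans out of the entry's sequence and concatenating them. A minimal sketch under stated assumptions: it accepts only simple `start..end` ranges, a single `join(...)`, a `complement(...)` wrapped around the whole location, and the `<`/`>` partial-end markers; the real feature-table grammar allows many more forms (single bases, nested operators, references to other entries). The names `parse_location` and `extract` are hypothetical:

```python
import re

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def parse_location(loc):
    """Return ([(start, end), ...] as 1-based inclusive spans, is_complement)."""
    loc = loc.strip()
    complement = False
    if loc.startswith("complement(") and loc.endswith(")"):
        complement = True
        loc = loc[len("complement("):-1]
    if loc.startswith("join(") and loc.endswith(")"):
        loc = loc[len("join("):-1]
    spans = []
    for part in loc.split(","):
        # '<' / '>' mark a partial end (as with /partial); the number itself is kept.
        m = re.fullmatch(r"<?(\d+)\.\.>?(\d+)", part.strip())
        if m is None:
            raise ValueError(f"unsupported location: {part!r}")
        spans.append((int(m.group(1)), int(m.group(2))))
    return spans, complement

def extract(seq, loc):
    """Cut the feature's bases out of seq (joined spans, reverse-complemented if needed)."""
    spans, complement = parse_location(loc)
    piece = "".join(seq[start - 1:end] for start, end in spans)
    if complement:
        piece = piece.translate(_COMPLEMENT)[::-1]
    return piece

print(extract("AAACCCGGGTTT", "join(1..3,7..9)"))
```

Applied to the first example, a CDS at `23..400` covers 400 - 23 + 1 = 378 bases, i.e. 126 codons.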

Feature key examples
Key             Description
conflict        Separate determinations of the "same" sequence differ
rep_origin      Origin of replication
protein_bind    Protein binding site on DNA
CDS             Protein-coding sequence
misc_RNA        Generic label for an undefined RNA
insertion_seq   Insertion element
D-loop          Mitochondrial or other D-loop structure