It & Health 2009 Summary Thomas Nordahl Petersen.

Teachers: Thomas Nordahl Petersen, Rasmus Wernersson, Lisbeth Nielsen Fink, Anders Gorm Pedersen, Bent Petersen, Ramneek Gupta, Thomas Blicher

Outline of the course
Topics will cover a general introduction to bioinformatics:
– Evolution
– DNA / protein
– Alignment and scoring matrices: how does it work & what are the numbers?
– Visualization of multiple alignments: phylogenetic trees and logo plots
– Commonly used databases: UniProt/GenBank & genome browsers
– Protein 3D-structure
– Artificial neural networks & case stories
– Practical use of bioinformatics tools
Preparation for exam

Topics covered (some of them)

Information flow in biological systems

Amino acids: amine and carboxyl groups. The side chain 'R' is attached to the C-alpha carbon. The amino acids found in the proteins of living organisms are L-amino acids.

Amino acids - peptide bond: a peptide chain runs from its N-terminal to its C-terminal end.

1- and 3-letter codes
1. There are 20 naturally occurring amino acids.
2. Normally the one- or three-letter codes are used:
Ala - A   Cys - C   Asp - D   Glu - E   Phe - F
Gly - G   His - H   Ile - I   Lys - K   Leu - L
Met - M   Asn - N   Pro - P   Gln - Q   Arg - R
Ser - S   Thr - T   Val - V   Trp - W   Tyr - Y
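The code table can be used as a simple lookup for converting between the two notations. This is a minimal Python sketch; the names `THREE_TO_ONE`, `to_one_letter` and `to_three_letter` are hypothetical, chosen for illustration.

```python
# Three-letter -> one-letter codes for the 20 standard amino acids
THREE_TO_ONE = {
    "Ala": "A", "Cys": "C", "Asp": "D", "Glu": "E", "Phe": "F",
    "Gly": "G", "His": "H", "Ile": "I", "Lys": "K", "Leu": "L",
    "Met": "M", "Asn": "N", "Pro": "P", "Gln": "Q", "Arg": "R",
    "Ser": "S", "Thr": "T", "Val": "V", "Trp": "W", "Tyr": "Y",
}
# Reverse lookup: one-letter -> three-letter
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues):
    """Convert e.g. ["Met", "ALA"] to "MA" (input case does not matter)."""
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)


def to_three_letter(sequence):
    """Convert e.g. "MA" to "Met-Ala"."""
    return "-".join(ONE_TO_THREE[c] for c in sequence.upper())


print(to_one_letter(["Met", "Ala", "Trp"]))  # MAW
print(to_three_letter("NVIF"))               # Asn-Val-Ile-Phe
```

An unknown code raises a `KeyError`, which is usually preferable to silently producing a wrong sequence.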

Center for Biological Sequence Analysis: Theory of evolution, Charles Darwin

Phylogenetic tree

Global versus local alignments
Global alignment: align the full length of both sequences (the "Needleman-Wunsch" algorithm).
Local alignment: find the best partial alignment of two sequences (the "Smith-Waterman" algorithm).
[Figure: global and local alignment of Seq 1 and Seq 2]
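The difference between global and local alignment can be seen in a small dynamic-programming sketch that only computes the best score. It assumes a hypothetical, illustrative scoring scheme (match +1, mismatch -1, linear gap -2); real tools use substitution matrices such as BLOSUM62 and usually affine gap penalties. Global mode penalises leading and trailing gaps and reads the score from the last cell; local mode floors every cell at zero and takes the best cell anywhere.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best global (Needleman-Wunsch) or local (Smith-Waterman) score.

    Linear gap penalty; the scores are illustrative, not BLOSUM-based.
    """
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps cost, so the first row/column accumulate gap penalties.
        for i in range(1, rows):
            F[i][0] = i * gap
        for j in range(1, cols):
            F[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          F[i - 1][j] + gap,    # gap in b
                          F[i][j - 1] + gap)    # gap in a
            if local:
                # Local: never drop below zero, i.e. an alignment may start anywhere.
                F[i][j] = max(F[i][j], 0)
                best = max(best, F[i][j])
    # Global ends in the bottom-right cell; local ends at the best cell anywhere.
    return best if local else F[-1][-1]


print(alignment_score("TTACGTGG", "ACGT"))              # global: -4
print(alignment_score("TTACGTGG", "ACGT", local=True))  # local: 4
```

The short sequence matches perfectly inside the long one, so the local score is 4, while the global score is pulled down by the four unavoidable end gaps.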

Pairwise alignment: the solution is "dynamic programming" (the Needleman-Wunsch algorithm)
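A minimal sketch of Needleman-Wunsch with traceback: fill the score matrix F cell by cell from its three neighbours, then walk back from the bottom-right corner to recover the aligned strings. The scoring (match +1, mismatch -1, linear gap -2) is a hypothetical choice for illustration, and ties in the traceback are broken in a fixed order (diagonal, then up, then left), so other equally good alignments may exist.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    def s(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Traceback: retrace which neighbour produced each cell's value.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

Filling the matrix takes time proportional to len(a) × len(b), which is why heuristic tools such as BLAST are used for large database searches.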

Sequence alignment - BLAST

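BLOSUM & PAM matrices
BLOSUM matrices are the most commonly used substitution matrices: BLOSUM50, BLOSUM62, BLOSUM80.
PAM - Point Accepted Mutations (the statements below refer to the mutation probability matrices from which the PAM scores are derived):
– PAM-0 is the identity matrix.
– In PAM-1 the diagonal entries deviate slightly from 1 and the off-diagonal entries deviate slightly from 0.
– PAM-250 is PAM-1 multiplied by itself 250 times, i.e. $\mathrm{PAM}_{250} = (\mathrm{PAM}_1)^{250}$.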
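The matrix-power relation between PAM-1 and PAM-250 can be checked on a toy example. This sketch uses a hypothetical two-letter alphabet with made-up PAM-1 probabilities (not Dayhoff's data) and plain Python lists so that it needs no external libraries; `pam(k)` is a hypothetical helper name.

```python
# Hypothetical 2-letter toy alphabet; these numbers are NOT Dayhoff's data.
# Row i gives the probabilities that letter i is replaced by letter j in PAM-1.
PAM1 = [
    [0.99, 0.01],
    [0.01, 0.99],
]


def identity(n):
    """PAM-0: no change at all, i.e. the identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def pam(k, pam1=PAM1):
    """PAM-k = PAM-1 raised to the k-th power (k = 0 gives the identity)."""
    result = identity(len(pam1))
    for _ in range(k):
        result = matmul(result, pam1)
    return result


pam250 = pam(250)
for row in pam250:
    print([round(x, 3) for x in row])
```

Each row still sums to 1, but after 250 steps the diagonal has fallen from 0.99 to roughly 0.5: at large evolutionary distances the original letter is barely more likely than any other.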

Sequence profiles (1J2J.B)
>1J2J.B mol:aa PROTEIN TRANSPORT
NVIFEDEEKSKMLARLLKSSHPEDLRAANKLIKEMVQEDQKRMEK

Log-odds scores
BLOSUM is a log-likelihood matrix:
– The likelihood of observing j given you have i is $P(j \mid i) = P_{ij} / P_i$, where $P_{ij}$ is the frequency with which i and j are observed aligned and $P_i = Q_i$ is the frequency of i.
– The prior likelihood of observing j is $Q_j$, which is simply the frequency of j.
– The log-likelihood score is $S_{ij} = 2\log_2\left(P(j \mid i) / Q_j\right) = 2\log_2\left(P_{ij} / (Q_i Q_j)\right)$,
– where $\log_2(x) = \log_n(x) / \log_n(2)$.
– S has been normalized to half bits, hence the factor 2.
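The log-odds formula can be evaluated directly. In this minimal sketch the pair and background frequencies are hypothetical numbers chosen so the ratios are powers of two; the helper names are also hypothetical. The score is computed through the conditional form $P(j \mid i)/Q_j$, which equals $P_{ij}/(Q_i Q_j)$, and is rounded to an integer because substitution matrix entries are stored as integers.

```python
import math


def log2(x):
    # Change of base, as on the slide: log2(x) = ln(x) / ln(2)
    return math.log(x) / math.log(2)


def log_odds_half_bits(p_ij, q_i, q_j):
    """S_ij = 2 * log2(P(j|i) / Q_j) = 2 * log2(P_ij / (Q_i * Q_j))."""
    p_j_given_i = p_ij / q_i  # P(j|i) = P_ij / P_i, with P_i = Q_i
    return 2 * log2(p_j_given_i / q_j)


def blosum_score(p_ij, q_i, q_j):
    """Round the half-bit score to the nearest integer, as in the matrices."""
    return round(log_odds_half_bits(p_ij, q_i, q_j))


# Hypothetical frequencies: both residues occur 5% of the time,
# so chance alone predicts the pair with frequency 0.05 * 0.05 = 0.0025.
print(blosum_score(0.02, 0.05, 0.05))      # 8x more often than chance: 6
print(blosum_score(0.0025, 0.05, 0.05))    # exactly as expected by chance: 0
print(blosum_score(0.000625, 0.05, 0.05))  # 4x less often than chance: -4
```

Positive scores mean a pair is seen more often than expected by chance, negative scores less often, and zero means no preference.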

BLAST Exercise

Genome browsers - UCSC: intron-exon structure, single nucleotide polymorphism (SNP)

SNPs

Protein 3D-structure

Protein structure
– Primary structure: amino acid sequence
– Secondary structure: helix/beta sheet
– Tertiary structure: fold, 3D coordinates

Protein structure  -helix    helix3 residues/turn - few, but not uncommon  - helix3.6 residues/turn - by far the most common helix Pi-helix4.1 residues/turn - very rare

Protein structure  strand/sheet

Protein folds
– Class: the 4th class is 'few secondary structures'
– Architecture: overall shape of a domain
– Topology: shares secondary structure connectivity

Protein 3D-structure

Neural networks: from knowledge to information
Protein sequence → biological feature

Use of artificial neural networks
A data-driven method to predict a feature, given a set of training data. In biology, the input features could be amino acid or nucleotide sequences. Examples:
– Secondary structure prediction
– Signal peptide prediction
– Surface accessibility
– Propeptide prediction
[Figure: a protein from N- to C-terminus: signal peptide, propeptide, mature/active protein]
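The slides do not say how a sequence is presented to the network. A common approach, assumed here rather than taken from the course, is a sliding window in which every residue is one-hot encoded over the 20 amino acids, and the network predicts the feature for the central residue. The window size of 5, the padding symbol `X` (encoded as all zeros) and the function names are hypothetical choices for this sketch.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 one-letter codes


def one_hot(residue):
    """20-dimensional one-hot vector; unknown or padding symbols become all zeros."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def window_inputs(seq, window=5):
    """One input vector per residue: the one-hot codes of the window centred on it."""
    if window % 2 == 0:
        raise ValueError("window size must be odd so it has a central residue")
    half = window // 2
    padded = "X" * half + seq + "X" * half  # hypothetical padding symbol
    inputs = []
    for centre in range(len(seq)):
        vector = []
        for residue in padded[centre:centre + window]:
            vector.extend(one_hot(residue))
        inputs.append(vector)
    return inputs


# Usage: the first residues of the 1J2J.B sequence from the slides
inputs = window_inputs("NVIFED", window=5)
print(len(inputs), len(inputs[0]))  # 6 100
```

Each residue yields 5 × 20 = 100 input values; a trained network would map each such vector to, for example, helix/strand/coil or buried/exposed for the central residue.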

Prediction of biological features: surface accessibility
Predict surface accessibility from the amino acid sequence only.

Logo plots: information content, how is it calculated and what does it mean?

Logo plots - Information content
Sequence logo: calculate the information content at each position,
$I = \sum_a p_a \log_2 p_a + \log_2(4)$,
where $p_a$ is the frequency of letter a at that position. For DNA (4 letters) the maximal value is 2 bits.
The total height at a position is the 'information content' measured in bits. The height of a letter is proportional to the frequency of that letter.
A logo plot is a visualization of a multiple alignment.
[Figure annotations: '~0.5 each', 'Completely conserved']
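The information-content formula can be computed per column of a DNA alignment. In this minimal sketch the letter heights are taken as frequency × information content, so the letters in a column stack up to the total height $I$. Assumptions: columns contain only A, C, G and T, there are no gaps, no small-sample correction is applied, and the alignment and function names are hypothetical.

```python
import math
from collections import Counter


def column_frequencies(column):
    """Frequency p_a of each letter a in one alignment column."""
    counts = Counter(column)
    return {a: c / len(column) for a, c in counts.items()}


def information_content(column, alphabet_size=4):
    """I = sum_a p_a * log2(p_a) + log2(alphabet_size); letters with p_a = 0 add nothing."""
    freqs = column_frequencies(column)
    return sum(p * math.log2(p) for p in freqs.values()) + math.log2(alphabet_size)


def letter_heights(column):
    """Height of each letter: its frequency times the column's information content."""
    ic = information_content(column)
    return {a: p * ic for a, p in column_frequencies(column).items()}


# Hypothetical alignment of four DNA sequences
alignment = ["ACGT", "ACGA", "AGGC", "ATGG"]
for pos, letters in enumerate(zip(*alignment), start=1):
    column = "".join(letters)
    print(pos, column, round(information_content(column), 2))
```

A completely conserved column gives the maximal 2 bits, two letters at 0.5 each give 1 bit, and a column where all four letters are equally frequent gives 0 bits, so that position is invisible in the logo.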