It & Health 2009 Summary Thomas Nordahl Petersen.


1 It & Health 2009 Summary Thomas Nordahl Petersen

2 Teachers: Thomas Nordahl Petersen, Rasmus Wernersson, Lisbeth Nielsen Fink, Anders Gorm Pedersen, Bent Petersen, Ramneek Gupta, Thomas Blicher

3 Outline of the course
Topics will cover a general introduction to bioinformatics
–Evolution
–DNA / Protein
–Alignment and scoring matrices: how does it work & what are the numbers
–Visualization of multiple alignments: phylogenetic trees and logo plots
–Commonly used databases: Uniprot/Genbank & genome browsers
–Protein 3D-structure
–Artificial neural networks & case stories
–Practical use of bioinformatics tools: preparation for exam

4 Topics covered - (some of them)

5 Information flow in biological systems

6 Amino Acids Amine and carboxyl groups. Sidechain ‘R’ is attached to the C-alpha carbon. The amino acids found in living organisms are L-amino acids.

7 Amino Acids - peptide bond N-terminal C-terminal

8 1 and 3-letter codes
1. There are 20 naturally occurring amino acids
2. Normally the one- or three-letter codes are used
Ala - A, Cys - C, Asp - D, Glu - E, Phe - F, Gly - G, His - H, Ile - I, Lys - K, Leu - L, Met - M, Asn - N, Pro - P, Gln - Q, Arg - R, Ser - S, Thr - T, Val - V, Trp - W, Tyr - Y
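The code table on this slide maps directly onto a lookup dictionary. The following is a minimal Python sketch; the helper name `three_to_one` and the hyphen-separated input format are assumptions made for the example, not something the slides define.

```python
# Mapping copied from the slide's table of the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Cys": "C", "Asp": "D", "Glu": "E", "Phe": "F",
    "Gly": "G", "His": "H", "Ile": "I", "Lys": "K", "Leu": "L",
    "Met": "M", "Asn": "N", "Pro": "P", "Gln": "Q", "Arg": "R",
    "Ser": "S", "Thr": "T", "Val": "V", "Trp": "W", "Tyr": "Y",
}


def three_to_one(peptide: str) -> str:
    """Convert a hyphen-separated peptide such as 'Met-Ala-Lys' to 'MAK'.

    Hypothetical helper: raises KeyError for codes not in the table.
    """
    return "".join(THREE_TO_ONE[res.capitalize()] for res in peptide.split("-"))


print(three_to_one("Met-Ala-Lys"))
```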

9 CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Theory of evolution Charles Darwin 1809-1882

10 Phylogenetic tree

11 Global versus local alignments Global alignment: align the full length of both sequences (the “Needleman-Wunsch” algorithm). Local alignment: find the best partial alignment of two sequences (the “Smith-Waterman” algorithm). [Figure: Seq 1 and Seq 2 shown in a global alignment and in a local alignment]
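What makes an alignment local can be seen in a few lines of code: every cell of the scoring table is floored at zero, and the result is the best cell anywhere in the table rather than the bottom-right corner. The sketch below computes only the Smith-Waterman score; the scoring values (match +2, mismatch -1, linear gap -2) are assumptions chosen for illustration, not values from the course.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty (toy scoring)."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets an alignment restart anywhere: this is the "local" part.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared core GATTACA is found even though the flanks do not match.
print(smith_waterman_score("XXGATTACAXX", "YYGATTACAYY"))
```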

12 Pairwise alignment: the solution “Dynamic programming” (the Needleman-Wunsch algorithm)
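The dynamic programming idea can be sketched as follows: fill a table where each cell holds the best score for aligning two prefixes, then trace back from the bottom-right corner to recover one optimal global alignment. This is a minimal sketch with assumed toy scoring (match +1, mismatch -1, linear gap -1) instead of a substitution matrix such as Blosum62.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Traceback from the bottom-right corner to the origin.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


s, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(s)
print(top)
print(bottom)
```

Several alignments can share the optimal score; the traceback above returns just one of them, preferring the diagonal move when there is a tie.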

13 Sequence alignment - Blast

14

15 Blosum & PAM matrices Blosum matrices are the most commonly used substitution matrices: Blosum50, Blosum62, Blosum80. PAM - Point Accepted Mutation (1 PAM = 1 accepted mutation per 100 residues). PAM-0 is the identity matrix. In PAM-1 the diagonal has small deviations from 1 and the off-diagonal has small deviations from 0. PAM-250 is PAM-1 multiplied by itself 250 times.
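The statement that PAM-250 is PAM-1 multiplied by itself 250 times can be illustrated with a toy example. The 3x3 matrix below is a made-up stand-in for the real 20x20 PAM-1 mutation probability matrix (its numbers are assumptions, not Dayhoff's data); the helper names `mat_mul` and `mat_pow` are likewise hypothetical.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Hypothetical PAM-1-like matrix for a toy 3-letter alphabet:
# diagonal close to 1, off-diagonal close to 0, each row sums to 1.
TOY_PAM1 = [
    [0.990, 0.006, 0.004],
    [0.006, 0.990, 0.004],
    [0.005, 0.005, 0.990],
]

pam0 = mat_pow(TOY_PAM1, 0)      # identity matrix: no evolutionary time
pam250 = mat_pow(TOY_PAM1, 250)  # PAM-1 multiplied by itself 250 times

for row in pam250:
    print([round(v, 3) for v in row])
```

After 250 steps the diagonal is far below 1, which is why high-numbered PAM matrices suit distantly related sequences. The scoring matrices used in practice are log-odds transforms of such probability matrices.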

16 Sequence profiles (1J2J.B)
>1J2J.B mol:aa PROTEIN TRANSPORT
NVIFEDEEKSKMLARLLKSSHPEDLRAANKLIKEMVQEDQKRMEK

17 Log-odds scores BLOSUM is a log-likelihood matrix:
Likelihood of observing j given you have i is
–$P(j|i) = P_{ij} / P_i$, where $P_i$ is the frequency of i (written $Q_i$ below)
The prior likelihood of observing j is
–$Q_j$, which is simply the frequency of j
The log-likelihood score is
–$S_{ij} = 2\log_2\left(P(j|i) / Q_j\right) = 2\log_2\left(P_{ij} / (Q_i Q_j)\right)$
–where $\log_2(x) = \log_n(x) / \log_n(2)$
–S has been normalized to half bits, therefore the factor 2
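A short worked example of the log-odds score in half bits. The function name and the three probabilities are made-up values for illustration; they are not taken from a real BLOSUM matrix.

```python
import math


def log_odds_half_bits(p_ij: float, q_i: float, q_j: float) -> float:
    """S_ij = 2 * log2(P_ij / (Q_i * Q_j)), i.e. the score in half bits."""
    return 2 * math.log2(p_ij / (q_i * q_j))


# Hypothetical numbers: the pair (i, j) is observed with probability 0.01,
# while chance alone would give 0.05 * 0.05 = 0.0025.
print(log_odds_half_bits(0.01, 0.05, 0.05))
```

With these numbers the pair occurs four times more often than expected by chance, which is 2 bits, or a score of about 4 in half bits. A pair seen exactly as often as expected scores 0, and a pair seen less often than expected scores below 0.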

18 BLAST Exercise

19 Genome browsers - UCSC Intron - Exon structure Single Nucleotide polymorphism - SNP

20 SNPs

21 Protein 3D-structure

22 Protein structure Primary structure: amino acid sequence. Secondary structure: helix / beta sheet. Tertiary structure: fold, 3D coordinates.

23 Protein structure α-helix
3-10 helix: 3 residues/turn - few, but not uncommon
α-helix: 3.6 residues/turn - by far the most common helix
π-helix: 4.1 residues/turn - very rare

24 Protein structure β strand/sheet

25 Protein folds Class: the 4th class is ‘few secondary structures’. Architecture: overall shape of a domain. Topology: shared secondary structure connectivity.

26 Protein 3D-structure

27 Neural Networks From knowledge to information. [Figure: protein sequence, biological feature]

28 Use of artificial neural networks
A data-driven method to predict a feature, given a set of training data.
In biology, input features could be amino acid or nucleotide sequences.
–Secondary structure prediction
–Signal peptide prediction
–Surface accessibility
–Propeptide prediction
[Figure: protein from N to C terminus: signal peptide, propeptide, mature/active protein]
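The slides do not say how a sequence is presented to the network. One common choice, shown here purely as an assumption, is a sliding window around each residue with every position one-hot encoded over the 20 amino acids. The function names and the window size of 3 are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 one-letter codes


def one_hot(residue: str) -> list:
    """20-element 0/1 vector; all zeros for padding or unknown residues."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def encode_windows(sequence: str, window: int = 3) -> list:
    """One flat input vector per residue, built from a centred window.

    Assumes an odd window size; sequence ends are padded with '-'.
    """
    half = window // 2
    padded = "-" * half + sequence + "-" * half
    vectors = []
    for center in range(len(sequence)):
        vec = []
        for residue in padded[center:center + window]:
            vec.extend(one_hot(residue))
        vectors.append(vec)
    return vectors


inputs = encode_windows("NVIFE", window=3)
print(len(inputs), len(inputs[0]))
```

Each vector would be paired with the known feature of the central residue (for example buried or exposed) to form one training example.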

29 Prediction of biological features Surface accessibility: predict surface accessibility from the amino acid sequence only.

30 Logo plots Information content, how is it calculated - what does it mean.

31 Logo plots - Information Content
A logo plot (sequence logo) is a visualization of a multiple alignment.
Calculate the Information Content: $I = \sum_a p_a \log_2 p_a + \log_2(4)$. The maximal value is 2 bits.
The total height at a position is the ‘Information Content’ measured in bits.
The height of a letter is proportional to the frequency of that letter.
[Figure annotations: "~0.5 each"; "Completely conserved"]
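The information content formula can be checked on single alignment columns. This is a minimal sketch for DNA (alphabet size 4, hence the log2(4) term); it assumes columns without gaps and applies no small-sample correction, and the function names are hypothetical.

```python
import math
from collections import Counter


def information_content(column: str, alphabet_size: int = 4) -> float:
    """I = sum_a p_a * log2(p_a) + log2(alphabet_size), in bits."""
    total = len(column)
    entropy_term = sum((n / total) * math.log2(n / total)
                       for n in Counter(column).values())
    return entropy_term + math.log2(alphabet_size)


def letter_heights(column: str, alphabet_size: int = 4) -> dict:
    """Height of each letter in the logo: its frequency times I."""
    ic = information_content(column, alphabet_size)
    total = len(column)
    return {letter: (n / total) * ic for letter, n in Counter(column).items()}


print(information_content("AAAA"))  # completely conserved column
print(information_content("AATT"))  # two letters, 0.5 each
print(information_content("ACGT"))  # all four letters equally frequent
print(letter_heights("AATT"))
```

A completely conserved column reaches the maximum of 2 bits, a column split evenly between two letters carries 1 bit, and a column with all four nucleotides equally frequent carries no information.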

