
Bioinformatics A Summary seminar (with many hints for exam questions)


1 Bioinformatics A Summary seminar (with many hints for exam questions)

2 1) Introduction 1. The question: transfer of information 2. The tools: MRS, BLAST, Clustal, databases, SwissProt 3. Amino acid knowledge: understand secondary structure 4. Secondary structure -> protein structure 5. Protein structure helps make alignments 6. Alignments allow for transfer of information

3 Bioinformatics Necessary evil, panacea, or just a useful tool? With a month in the lab you can easily avoid having to sit for an hour in front of the computer. Nothing is impossible for a biologist who doesn’t have to discover it himself or herself.

4 Bio + informatics

5 Genome annotation

6 Bioinformatics and medicines One day we will know everything about all human (and flu) proteins, and then we can start to ‘calculate’ flu medicines.

7 Drug Design

8 Human vs parasite Parasite Active site

9 H1N1 / H5N1

10 2) Tools MRS: a kind of bioGoogle; BLAST: to find homologs; SwissProt: protein sequences; PDB: macromolecular structures; EMBL: nucleotide sequences; OMIM: genetic disorders; ProSite: motifs (e.g. {P} [ST] {P} N)
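A ProSite motif is a compact pattern that can be translated into a regular expression. The sketch below is a minimal illustration, assuming the slide's example refers to the N-glycosylation pattern written in full ProSite syntax as N-{P}-[ST]-{P}. The function names and the test sequence are hypothetical, and only the most common syntax elements are handled.

```python
import re

# One ProSite element: optional '<' anchor, a core (x, a residue, [allowed], {forbidden}),
# an optional repeat such as (2) or (2,4), and an optional '>' anchor.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)

def prosite_to_regex(pattern):
    """Translate a ProSite-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported pattern element: " + element)
        start, core, low, high, end = match.groups()
        if core == "x":                      # x means any residue
            core = "."
        elif core.startswith("{"):           # {P} means anything except P
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)

def find_motif(sequence, pattern):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [m.start() for m in regex.finditer(sequence)]

# Hypothetical test sequence: the first N is followed by G, S, A; the second by P.
print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_motif("AANGSAANPSA", "N-{P}-[ST]-{P}"))
```

The lookahead wrapper is a design choice that lets overlapping motif occurrences be reported; a plain search would skip them.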

11 ©CMBI 2010 Biological databases (1) Primary databases contain biomolecular sequences or structures (experimental data!) and associated annotation information. Sequences: nucleic acid sequences (EMBL, GenBank, DDBJ); protein sequences (SwissProt, TrEMBL, UniProt). Structures: protein structures (PDB); structures of small compounds (CSD). Genomes: Ensembl, UCSC

12 ©CMBI 2015 Databases Data must be in a certain format for software to recognize it. Every database can have its own format, but some data elements are essential for every database: 1. Unique identifier, or accession code 2. Name of depositor 3. Literature references 4. Deposition date 5. The real data Nomenclature: Database entry or database record; Database fields
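The five essential data elements can be pictured as the fields of one database record. The sketch below is only an illustration: the class name, field names, and example values are hypothetical and do not correspond to the format of any real database.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class DatabaseEntry:
    """One database entry (record); each attribute is a database field."""
    accession: str                  # 1. unique identifier, or accession code
    depositor: str                  # 2. name of depositor
    deposition_date: date           # 4. deposition date
    data: str                       # 5. the real data (here: a sequence)
    references: List[str] = field(default_factory=list)  # 3. literature references

def missing_fields(entry):
    """Return the names of essential fields that are empty."""
    return [name for name, value in vars(entry).items() if not value]

# Hypothetical entry, invented for illustration only.
entry = DatabaseEntry(
    accession="X00001",
    depositor="A. Depositor",
    deposition_date=date(2015, 1, 1),
    data="CWEALALLAELALAAMKGSTPNGS",
)
print(missing_fields(entry))
```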

13 ©CMBI 2015 SwissProt database Database of protein sequences; >500,000 sequence entries. SwissProt is manually annotated and reviewed, thus of high quality, but never complete; it contains many feature descriptions and many hyperlinks to other databases; a bioinformatician always looks in SwissProt first… Obligatory deposit of sequences in SwissProt before publication. SwissProt is part of UniProt. The other main part of UniProt is TrEMBL (translated EMBL). TrEMBL is automatically annotated and is not reviewed.

14 ©CMBI 2009 Part III: Sequence Retrieval with MRS Google: the best generic search and retrieval system; Google searches everywhere for everything. MRS: Maarten’s Retrieval System (http://mrs.cmbi.ru.nl); MRS searches in selected data environments. MRS is the Google of the biological database world. Search engine (like Google): Input/Query = word(s); Output = entry/entries from database. Other programs exist: Entrez, SRS, ...

15 Transfer of information to corresponding residues BLAST finds two database hits that are annotated to have a phosphorylated serine.
DRT-GHNIPLMSTRK-TYHIHIENASEERTIKLLMN
DRR-GTTINLMTTKR-TYADELENASEDRTLLLNMN
AEPIYYHL---LTKRETYHIHIENASEEKIIKIVVN
“This serine is phosphorylated in a known protein from the database, so in my protein the corresponding serine is likely to be phosphorylated too.”
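Transferring an annotation amounts to mapping a residue number in the annotated sequence to an alignment column, and that column back to a residue number in the query. The sketch below works through this for the three aligned sequences on the slide. Two things are assumptions made for illustration: that the third sequence is the query, and that the annotated phosphoserine is the conserved serine (residue 24 in the first hit). The function names are hypothetical.

```python
# Aligned sequences from the slide; '-' marks a gap.
HIT_1 = "DRT-GHNIPLMSTRK-TYHIHIENASEERTIKLLMN"
HIT_2 = "DRR-GTTINLMTTKR-TYADELENASEDRTLLLNMN"
QUERY = "AEPIYYHL---LTKRETYHIHIENASEEKIIKIVVN"   # assumed to be the query

def residue_to_column(aligned, residue_number):
    """Map a 1-based residue number to a 0-based alignment column."""
    seen = 0
    for column, char in enumerate(aligned):
        if char != "-":
            seen += 1
            if seen == residue_number:
                return column
    raise ValueError("residue number beyond end of sequence")

def column_to_residue(aligned, column):
    """Map a 0-based alignment column to a 1-based residue number (None at a gap)."""
    if aligned[column] == "-":
        return None
    return sum(1 for char in aligned[: column + 1] if char != "-")

def transfer_site(annotated, query, residue_number):
    """Transfer an annotated site to the query if the same residue type is aligned."""
    column = residue_to_column(annotated, residue_number)
    if query[column] != annotated[column]:
        return None   # different residue or gap: do not transfer
    return column_to_residue(query, column)

# Assumed annotation: serine 24 of the first hit is phosphorylated.
print(transfer_site(HIT_1, QUERY, 24))
```

Requiring an identical residue in the query column is a deliberately strict choice; a real analysis would also weigh how well the surrounding region is conserved.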

16 PAM250 Matrix (Dayhoff Matrix) Symmetric Many matrices exist Question determines method

17 Amino Acid substitutions, some thoughts Not all 20x20 possible mutations occur equally often Residues mutate more easily to similar ones (e.g. Leucine and Isoleucine) Residues at surface mutate more easily Aromatics mutate preferably into aromatics Core tends to be hydrophobic; Cysteines are dangerous at the surface Cysteines in sulfur bridges (S-S) seldom mutate Some amino acids have similar codons (for example TTT & TTC for Phe, TTA & TTG for Leu) Etc etc
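The remark about similar codons can be made concrete by counting nucleotide differences between codons. This minimal sketch uses only the four codons named on the slide (leucine has more codons than the two listed here); the function names are hypothetical.

```python
# Codons quoted on the slide; this is not the complete codon table.
CODONS = {
    "Phe": ["TTT", "TTC"],
    "Leu": ["TTA", "TTG"],
}

def nucleotide_differences(codon_a, codon_b):
    """Count the positions at which two codons differ."""
    return sum(1 for a, b in zip(codon_a, codon_b) if a != b)

def minimum_mutations(residue_a, residue_b):
    """Smallest number of point mutations that turns one residue into the other."""
    return min(
        nucleotide_differences(a, b)
        for a in CODONS[residue_a]
        for b in CODONS[residue_b]
    )

# A single change at the third codon position converts Phe into Leu.
print(minimum_mutations("Phe", "Leu"))
```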

18 BLAST Output A high score indicates a likely relationship. A low E-value indicates that a match is unlikely to have arisen by chance. Click here to go to the corresponding SwissProt entry. Click here to study the alignment in detail; look here first!!
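The link between score and E-value can be sketched with the standard relation between a bit score S' and the expected number of chance hits, E = m * n * 2^(-S'), where m is the query length and n the database length. The sketch below is a simplification: it ignores the length corrections that BLAST applies, and the example lengths are hypothetical.

```python
def expected_hits(bit_score, query_length, database_length):
    """Expected number of chance hits with at least this bit score (E-value)."""
    return query_length * database_length * 2.0 ** (-bit_score)

# Hypothetical search: a 300-residue query against 100 million database residues.
for score in (30, 40, 50, 60):
    print(score, expected_hits(score, 300, 100_000_000))
```

Every extra bit of score halves the E-value, which is why a high score and a low E-value go together.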

19 Low complexity motifs visible

20 3) Amino acids Hydrophobicity Entropy of water Amino acids have characteristics that determine their behaviour, and what they are being used for (Gly, Cys, His, Ser, Asp, etc).

21 Amino acids – Hydrophobicity Hydrophobicity is the most important property It drives the folding of a protein The sticky amino acids glue together The non-sticky amino acids point into the water The waters must be ‘happy’
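Hydrophobicity can be put into numbers with a hydropathy scale. The sketch below uses the Kyte-Doolittle scale, which is a choice made here and is not named on the slide, to compute the mean hydropathy of a sequence and a sliding-window profile. The function names and the window size are assumptions.

```python
# Kyte-Doolittle hydropathy values; positive means hydrophobic ("sticky").
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

def mean_hydropathy(sequence):
    """Average hydropathy over all residues of the sequence."""
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)

def hydropathy_profile(sequence, window=7):
    """Mean hydropathy of every window of the given length along the sequence."""
    return [
        mean_hydropathy(sequence[start : start + window])
        for start in range(len(sequence) - window + 1)
    ]

# Hydrophobic stretches (candidate core segments) show up as high values.
print(hydropathy_profile("CWEALALLAELALAAMKGSTPNGS"))
```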

22 Amino acids - Hydrophobicity ( Not to scale )

23 Amino acids – Properties Amino acids are not easily put into boxes according to their properties Every amino acid belongs to several categories Every amino acid is unique Hydrophobicity Size Secondary structure preference Charge Special characteristics

24 4) (Secondary) structure Structure data often is not available. Sequences don’t exist; structures exist. Residues at corresponding positions in structures have corresponding functions. Sequence alignment is the poor man’s solution to structure alignment. Knowledge of the structure (even if only predicted) can help improve the alignment.

25 Secondary structure – α-helix N-terminus C-terminus Three things: AMELK residues; hydrophobic-hydrophilic pattern...; helix dipole

26 Secondary structure – β-strand A β-sheet consists of at least two β-strands that interact with each other. Anti-parallel Parallel Two things: VITWYF residues; hydrophobic-hydrophilic pattern...

27 Secondary structure – Turn Turns connect the secondary structure elements. Turns are between two things… Beta-turns hold PSDNG.

28 Secondary structure - Loop A loop is everything that has no regular secondary structure; none of the above.

29 Amino acids – Secondary structure preference Residues that are good for a helix: Ala, Met, Glu, Leu, Lys (AMELK). Residues that are good for strands: Val, Ile, Thr, Trp, Tyr, Phe (VITWYF). Residues that are good for turns: Pro, Ser, Asp, Asn, Gly (PSDNG).
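The three mnemonics translate directly into a per-residue lookup. The sketch below labels every residue with its preferred element and is meant only as a teaching aid, not as a secondary structure prediction method. The one-letter labels and the function name are assumptions.

```python
HELIX_FORMERS = set("AMELK")     # Ala, Met, Glu, Leu, Lys
STRAND_FORMERS = set("VITWYF")   # Val, Ile, Thr, Trp, Tyr, Phe
TURN_FORMERS = set("PSDNG")      # Pro, Ser, Asp, Asn, Gly

def preference_string(sequence):
    """Label each residue: h (helix), s (strand), t (turn), or - (no listed preference)."""
    labels = []
    for residue in sequence:
        if residue in HELIX_FORMERS:
            labels.append("h")
        elif residue in STRAND_FORMERS:
            labels.append("s")
        elif residue in TURN_FORMERS:
            labels.append("t")
        else:
            labels.append("-")
    return "".join(labels)

sequence = "CWEALALLAELALAAMKGSTPNGS"
print(sequence)
print(preference_string(sequence))
```

A long run of h labels followed by t labels suggests a helix that ends in a turn, which is the kind of reasoning the mnemonics are meant to support.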

30 3) Align CWEALALLAELALAAMKGSTPNGS with CWEALALLLEALMRGTTPNGG
CWEALALLAELALAAMKGSTPNGS
??hhhhhhhhhhhhhhh-------
CWEALALLLEALMR---GTTPNGG
??hhhhhhhhhhhh----------
CW obviously on top of CW. Predict and align the two helices. Gap at end of helix.
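The rule "gap at end of helix" can be written down as a small helper that inserts the gap directly after the last helix residue of the shorter sequence. The sketch below reproduces the slide's alignment under that rule; the function names are hypothetical, and the helix length of 12 residues after the two unassigned positions is read off the slide's prediction string.

```python
LONG_SEQUENCE = "CWEALALLAELALAAMKGSTPNGS"
SHORT_SEQUENCE = "CWEALALLLEALMRGTTPNGG"

def insert_gap_after(sequence, position, gap_length):
    """Insert gap_length gap characters after the first `position` residues."""
    return sequence[:position] + "-" * gap_length + sequence[position:]

def count_identities(aligned_a, aligned_b):
    """Count columns in which both sequences carry the same residue."""
    return sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")

# The predicted helix of the short sequence ends after residue 14 (2 + 12).
helix_end = 14
gap_length = len(LONG_SEQUENCE) - len(SHORT_SEQUENCE)
aligned_short = insert_gap_after(SHORT_SEQUENCE, helix_end, gap_length)

print(LONG_SEQUENCE)
print(aligned_short)
print(count_identities(LONG_SEQUENCE, aligned_short))
```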

31 5) Structure and alignment

32 Aquaporin

33 Alignment excerpt, one row per residue position (112-128):
112 143 A G H > S+ 0 0 48 597 70 AAAGDSHDTSASTGGNGASTTAAAGSSAKTNSSTSSGSAGSgggrKKKGKRKKNSGGSKADDSSGKDKGA
113 144 A P H > S+ 0 0 32 599 91 YYYFDYQDYYYPPPPPLYYYIYFYPYYYQYPFYYFYRQFPQQQQVQQQPRPQQQYLLLPFAAEFLQAQFF
114 145 A Y H <>S+ 0 0 4 599 4 FFYYYYFYYYYYYYYYYFYYYYYYYYYYYYYYYYYFYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
115 146 A N H ><5S+ 0 0 65 599 77 NDVATEDNNNVQQQQQQNTNQNVHQIVVQNQNTNNVEQVMEGGGEQQQHQQQQQTAEDEVQQEMAQQQAV
116 147 A Q H 3<5S+ 0 0 147 600 79 RRRRKVATRRRTTRRTTRRRVRRRRKRRTRTRRRRRVVRALRRRSAAAAASAAARRTRRRAAMDRAAARR
117 148 A F T 3<5S- 0 0 72 600 72 YYYYHYFHYYYLLLLLNYYYQYYYLYYYLYLYYYYYLHYLHLLLLLLLNLNLLLYYTYLYNNDHHLNLYY
118 149 A G T X 5 - 0 0 29 600 13 GGGGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGKNGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
119 150 A G T 3 < - 0 0 0 599 0 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
120 151 A G T 3 S+ 0 0 5 601 0 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
121 152 A A < - 0 0 22 601 9 AAAAAATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
122 153 A N + 0 0 1 601 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
123 154 A S - 0 0 29 601 83 TFEETEVTVQSTTVTTVTEMEEEMVSSSGMTMEVTFTREFMAAAVTTTLTLTTTGEVEVEMMSAETMTEE
124 155 A V - 0 0 4 601 25 LLLVVLVVLLLVVVVVVLLLVLLLVLLLVLVVLLVLVVLVVVVVVVVVVVVVVVLVVLVLVVVVVVVVVL
125 156 A A > - 0 0 37 602 47 AHSSASAAAAAAAQAAAAASASSSNSAAASAAAASQNQSNSAAAAAAAAAAAAAASAASSAAAAGAAASS
126 157 A L T 3 S+ 0 0 180 602 75 ADDAVPLVDEDPPPHHHADDHEADPDDDDDHHDDDPHNSHAHHHSPPPHPHHPPHAPADSSSHPAPSPAD
127 158 A G T 3 S+ 0 0 82 601 3 GGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
128 159 A Y < - 0 0 95 601 2 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYFYYYYYYYYYYYYYYYYYYYYFFFYYYYFYYY
MSAs contain conserved residues, correlated mutations, and variable residues.
SFTDALKNMKPYESSFTRIVN
SFTASLKNLKPYCSSFTRVIG
SFTDALKLIVPYESSFTDVIH
SWTAVLKLMVPYLSSFTDILR
SYTDALKNVKPYESSFTRVVN
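The three classes of alignment columns can be told apart with a simple column scan of the five-sequence example. The sketch below is a crude illustration under a stated assumption: two columns are called correlated when their residues split the sequences into exactly the same groups. This is pattern matching, not a statistical correlated-mutation analysis, and the function names are hypothetical.

```python
from collections import defaultdict

MSA = [
    "SFTDALKNMKPYESSFTRIVN",
    "SFTASLKNLKPYCSSFTRVIG",
    "SFTDALKLIVPYESSFTDVIH",
    "SWTAVLKLMVPYLSSFTDILR",
    "SYTDALKNVKPYESSFTRVVN",
]

def column_pattern(column):
    """Encode which sequences share a residue, e.g. 'NNLLN' -> (0, 0, 1, 1, 0)."""
    first_seen = {}
    return tuple(first_seen.setdefault(residue, len(first_seen)) for residue in column)

def classify_columns(msa):
    """Return (conserved column indices, groups of columns sharing a pattern)."""
    conserved = []
    by_pattern = defaultdict(list)
    for index, column in enumerate(zip(*msa)):
        if len(set(column)) == 1:
            conserved.append(index)
        else:
            by_pattern[column_pattern(column)].append(index)
    correlated = [columns for columns in by_pattern.values() if len(columns) > 1]
    return conserved, correlated

conserved, correlated = classify_columns(MSA)
print("conserved columns (0-based):", conserved)
print("columns that change together:", correlated)
```

Columns that are neither conserved nor part of a shared pattern are the variable ones.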

34 The amino acids in their natural habitat Topics: Hydrogen bonds; Secondary structure (alpha helix, beta strands & beta sheets, turns, loop); Tertiary & quaternary structure; Protein domains

35 6) Transfer of information GPNANGPALLEILSLIAEAAQALAGGNQDDEA Can be phosphorylated at exactly one spot by kinase X. GGLEAAKLASSAASAAELLAGDNKKKW too.

36 Transfer of information GPNANGPALLEILSLIAEAAQALAGGNQDDEA Can be phosphorylated at exactly one spot by kinase X. GGLEAAKLASSAASAAELLAGDNKKKW too. GPNANGPALLEILSLIAEAAQALAGGNQDDEA GGLEAAKLASSAASAAELLAGDNKKKW

