Rationale for offering bioinformatics
1. Need to understand how popular bioinformatics algorithms operate (Clustal W, BLAST, PSIPRED).
2. A programming assignment gives a taste of what it is like to be a developer.
Definition of Bioinformatics
Use of computers to catalog and organize biological information into meaningful entities.
Learning Outcomes
1) Retrieve gene sequence information from GenBank.
2) Use BLAST to conduct gene similarity searches.
3) Align multiple sequences with Clustal W software.
4) Predict secondary structures with PSIPRED.
5) Display and compare protein structures.
6) Write software programs that query a database with a protein sequence.
7) Understand the theory that led to the development of scoring methods commonly used to measure sequence similarities.
How is Bioinformatics Used?
Experimental proof is still the “Gold Standard”.
Bioinformatics isn’t going to replace lab work anytime soon.
Bioinformatics is used to help “focus” the experiments of the benchtop scientist.
Useful textbooks on the subject
Beginning Python: From Novice to Professional, Apress 2008, ISBN: 1-59059-982-9
Bioinformatics – Why to Do It
Richard Karp’s motivation: "Find genetic basis of complex diseases so that we can develop more effective modes of treatment."
Bioinformatics – How to Do It
"… solving biological problems requires far more than clever algorithms: it involves a creative partnership between biologists and mathematical scientists to arrive at an appropriate mathematical model, the acquisition and use of diverse sources of data, and statistical methods to show that the biological patterns and regularities that we discover could not be due to chance." -- Richard Karp
Who is Richard Karp?
UC Berkeley Professor
Recipient of:
  Turing Award (1985)
  The Benjamin Franklin Medal in Computer and Cognitive Science (2004)
  The Kyoto Prize (2008)
Turing Award citation: "For his continuing contributions to the theory of algorithms … most notably, contributions to the theory of NP-completeness. Karp introduced the now standard methodology for proving problems to be NP-complete which has led to the identification of many theoretical and practical problems as being computationally difficult."
Recent work: transcriptional regulation of genes, discovering conserved regulatory pathways, analyzing genetic variations in humans.
Basis of molecular life sciences
Hierarchy of relationships (some exceptions):
Genome
  Gene 1      Gene 2      Gene 3      Gene X
  Protein 1   Protein 2   Protein 3   Protein X
  Function 1  Function 2  Function 3  Function X
Table 1.1. Single-letter abbreviations used for DNA nucleotide sequences

One-letter abbreviation   Nucleotide name           Base name   Category
A                         Adenosine monophosphate   Adenine     Purine
C                         Cytidine monophosphate    Cytosine    Pyrimidine
G                         Guanosine monophosphate   Guanine     Purine
T                         Thymidine monophosphate   Thymine     Pyrimidine
N                         Any nucleotide            Any base    NA
R                         A or G                                Purine
Y                         C or T                                Pyrimidine
- or *                    ---                       ---         Gap

human       GCTGTCCCTCACTGTTGAATTTTCTCTAACTTCAAGGCCCATATCTGTGAAATGCT
drosophila  GCTATTAGT--ATCTTAAGTTTGTATTA--------GTCCTTGTTCGTAAGGCGTT
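The codes in Table 1.1 and the human/drosophila alignment can be explored with a few lines of Python. The following is only a minimal sketch: the names `NUCLEOTIDE_CODES`, `column_matches`, and `alignment_summary` are hypothetical and do not come from the course material, and the rule that two symbols "match" when their sets of possible bases overlap is one simple assumption, not the scoring scheme used by BLAST or Clustal W.

```python
# Single-letter DNA codes from Table 1.1, mapped to the bases each may denote.
NUCLEOTIDE_CODES = {
    "A": {"A"},
    "C": {"C"},
    "G": {"G"},
    "T": {"T"},
    "R": {"A", "G"},            # any purine
    "Y": {"C", "T"},            # any pyrimidine
    "N": {"A", "C", "G", "T"},  # any nucleotide
}
GAP_SYMBOLS = {"-", "*"}


def column_matches(x, y):
    """Return True if two aligned symbols could denote the same base.

    Assumption for this sketch: a gap never matches anything.
    """
    if x in GAP_SYMBOLS or y in GAP_SYMBOLS:
        return False
    return bool(NUCLEOTIDE_CODES[x] & NUCLEOTIDE_CODES[y])


def alignment_summary(seq1, seq2):
    """Count matching, mismatching, and gapped columns of a pairwise alignment."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    gaps = sum(a in GAP_SYMBOLS or b in GAP_SYMBOLS for a, b in zip(seq1, seq2))
    matches = sum(column_matches(a, b) for a, b in zip(seq1, seq2))
    return {
        "columns": len(seq1),
        "matches": matches,
        "gaps": gaps,
        "mismatches": len(seq1) - matches - gaps,
    }


human = "GCTGTCCCTCACTGTTGAATTTTCTCTAACTTCAAGGCCCATATCTGTGAAATGCT"
drosophila = "GCTATTAGT--ATCTTAAGTTTGTATTA--------GTCCTTGTTCGTAAGGCGTT"
summary = alignment_summary(human, drosophila)
print(summary)
```

Dividing `summary["matches"]` by `summary["columns"]` gives a crude percent identity; whether gapped columns belong in the denominator is a design choice that different tools make differently.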
Table 1.2. Abbreviations used for ambiguous and rare amino acids

1-letter abbreviation   3-letter abbreviation   Meaning
B                       Asn or Asp              Asparagine or aspartic acid
J                       Xle                     Isoleucine or leucine
O                       Pyl                     Pyrrolysine
U                       Sec                     Selenocysteine
Z                       Gln or Glu              Glutamine or glutamic acid
X                       Xaa                     Any amino acid
- or *                  ---                     No corresponding residue (gap)
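One way to see what the ambiguous codes in Table 1.2 mean in practice is to expand a peptide that contains them into every concrete sequence it could stand for. The sketch below is illustrative only: `AMBIGUOUS_RESIDUES` and `expand_ambiguous` are hypothetical names, and the code assumes that X (any amino acid), the rare residues O and U, and gap symbols are passed through unchanged rather than expanded.

```python
from itertools import product

# Ambiguous one-letter codes from Table 1.2, mapped to the standard
# one-letter codes of the residues they may represent.
AMBIGUOUS_RESIDUES = {
    "B": ("N", "D"),  # asparagine or aspartic acid
    "J": ("I", "L"),  # isoleucine or leucine
    "Z": ("Q", "E"),  # glutamine or glutamic acid
}


def expand_ambiguous(peptide):
    """List every concrete peptide consistent with the B, J, and Z codes.

    Assumption for this sketch: all other symbols, including X, O, U,
    and gaps, are kept as they are.
    """
    options = [AMBIGUOUS_RESIDUES.get(residue, (residue,))
               for residue in peptide.upper()]
    return ["".join(choice) for choice in product(*options)]


print(expand_ambiguous("MBK"))
```

The number of expansions doubles with each ambiguous position, so this approach is only sensible for short peptides with a handful of B, J, or Z codes.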