Computers and Programming for Biologists. What is Bioinformatics? The use of information technology to collect, analyze, and interpret biological data.

Presentation transcript:

Computers and Programming for Biologists

What is Bioinformatics?
- The use of information technology to collect, analyze, and interpret biological data.
- An ad hoc collection of computing tools that are used by molecular biologists to manage research data:
  - Computational algorithms
  - Database schemas
  - Statistical methods
  - Data visualization tools

The Human Genome Project

A Genome Revolution in Biology and Medicine
- We are in the midst of a "Golden Era" of biology.
- The Human Genome Project has produced a huge storehouse of data that will be used to change every aspect of biological research and medicine.
- The revolution is about treating biology as an information science, not about specific biochemical technologies.

The job of the biologist is changing
As more biological information becomes available and laboratory equipment becomes more automated...
- The biologist will spend more time using computers and working on experimental design and data analysis (and less time doing tedious lab biochemistry).
- Biology will become a more quantitative science (think how the periodic table affected chemistry).

What are the Tools?
- Alignment
- Similarity = string matching
  - Pattern search
  - Hash tables and substitution matrices
- Clustering
- Genome assembly and annotation
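To make the "hash tables" item concrete, here is a minimal Python sketch of a word (k-mer) index, the kind of lookup table that fast similarity searches use to find short exact matches before doing any slower alignment. The function names, the word size, and the example sequences are illustrative assumptions, not part of the original slides.

```python
from collections import defaultdict


def build_kmer_index(sequence, k):
    """Map every k-letter word in `sequence` to the positions where it starts."""
    index = defaultdict(list)
    for start in range(len(sequence) - k + 1):
        index[sequence[start:start + k]].append(start)
    return index


def find_seeds(query, index, k):
    """Return (query_pos, subject_pos) pairs where a query word occurs in the indexed sequence."""
    seeds = []
    for q_pos in range(len(query) - k + 1):
        word = query[q_pos:q_pos + k]
        # .get() avoids adding empty entries to the defaultdict for unseen words
        for s_pos in index.get(word, []):
            seeds.append((q_pos, s_pos))
    return seeds


# Hypothetical example: index one sequence, then look up a short query.
subject = "GATGCCATAGAGCTGTAGTCGTACCCT"
index = build_kmer_index(subject, 4)
print(find_seeds("GTAGTC", index, 4))
```

Building the index costs one pass over the sequence. After that, each query word is a single dictionary lookup rather than a scan of the whole sequence.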

Align by hand

GATGCCATAGAGCTGTAGTCGTACCCT
<-->
CTAGAGAGC-GTAGTCAGAGTGTCTTTGAGTTCC

Somebody should make a computer program for this kind of thing...
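One such "computer program" can be sketched in a few lines of Python. The example below uses the classic Needleman-Wunsch dynamic programming method for a global alignment. The scoring values (+1 match, -1 mismatch, -1 gap) and the short test sequences are assumptions chosen for illustration. Real tools use substitution matrices and more elaborate gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment. Returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs one gap per letter.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: each cell is the best of diagonal (pair), up (gap in b), left (gap in a).
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


# Made-up short sequences, not the ones from the slide.
total, top, bottom = global_align("GATTACA", "GATACA")
print(total)
print(top)
print(bottom)
```

The table has one cell per pair of positions, so time and memory grow with the product of the two sequence lengths. That is fine for two genes but too slow for searching a whole database, which is why heuristic tools exist.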

Global vs. Local Alignments
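The difference between the two kinds of alignment can be shown with a small Python sketch that computes only the score. A global alignment must account for every letter of both sequences. A local alignment (the Smith-Waterman variant) may start and stop anywhere, which is done by never letting a cell drop below zero and reporting the best cell in the table. The scoring values and test sequences here are assumptions for illustration.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Best alignment score of a and b: global by default, local if local=True."""
    # Row 0: a global alignment pays for leading gaps, a local one starts free.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(prev[j - 1] + pair, prev[j] + gap, curr[j - 1] + gap)
            if local:
                cell = max(cell, 0)      # a local alignment can restart at any cell
                best = max(best, cell)   # and can end at any cell
            curr.append(cell)
        prev = curr
    return best if local else prev[-1]


# A shared core (GATTACA) surrounded by unrelated flanking letters.
x = "TTTTGATTACATTTT"
y = "CCCGATTACACCC"
print(alignment_score(x, y))               # global: flanks are penalized
print(alignment_score(x, y, local=True))   # local: only the shared core counts
```

Local alignment is the usual choice when searching for a conserved domain inside otherwise unrelated sequences. Global alignment suits sequences expected to be similar along their whole length.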

BLAST Algorithm

>ZFISH9:GNL-TI fi72b02.y1
          Length = 724

 Score = 307 bits (786), Expect = 8e-82
 Identities = 145/200 (72%), Positives = 166/200 (82%), Gaps = 1/200 (0%)
 Frame = +3

Query: 45  VLLKEYRVILPVSVDEYQVGQLYSVAEASKNXXXXXXXXXXXXXXPYEK-DGEKGQYTHK 103
           +L+KE+R++LPVSV+EYQVGQLYSVAEASKN              PYEK DGEKGQYTHK
Sbjct: 123 MLIKEFRIVLPVSVEEYQVGQLYSVAEASKNETGGGDGVEVLKNEPYEKEDGEKGQYTHK 302

Query: 104 IYHLQSKVPTFVRMLAPEGALNIHEKAWNAYPYCRTVITNEYMKEDFLIKIETWHKPDLG 163
           IY LQSKVP+FVR+LAP  AL IHEKAWNAYPYCRTV+TNEYMK++FLI IETWHKPDLG
Sbjct: 303 IYRLQSKVPSFVRLLAPSSALIIHEKAWNAYPYCRTVLTNEYMKDNFLIMIETWHKPDLG 482

Query: 164 TQENVHKLEPEAWKHVEAVYIDIADRSQVLSKDYKAEEDPAKFKSIKTGRGPLGPNWKQE 223
            QENVH L+ E WK VE ++IDIADRSQV +KDYK +EDPA FKS KTGRGPLGP+WK+E
Sbjct: 483 EQENVHNLDSERWKQVEVIHIDIADRSQVDTKDYKPDEDPATFKSQKTGRGPLGPDWKKE 662

Query: 224 LVNQKDCPYMCAYKLVTVKF 243
           L  ++DCP+MCAYK VTV F
Sbjct: 663 LPQKRDCPHMCAYKXVTVNF 722
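Reports like the one above are easy to read one at a time, but tedious to read by the thousand. As a rough sketch, the Python below pulls the Expect value and the Identities/Positives/Gaps counts out of the summary lines with regular expressions. The function name and the patterns are assumptions that fit this particular text layout only. A production script would use a maintained parser or a tabular output format instead.

```python
import re

# Patterns written for the summary lines shown in the report above (an assumption,
# not a general parser for every BLAST output format).
COUNT_RE = re.compile(r"(Identities|Positives|Gaps) = (\d+)/(\d+)")
EXPECT_RE = re.compile(r"Expect = ([0-9.]+(?:e[+-]?\d+)?)")


def parse_hit_summary(text):
    """Return the Expect value and the (count, alignment_length) pairs from a hit summary."""
    counts = {name: (int(hits), int(length))
              for name, hits, length in COUNT_RE.findall(text)}
    expect = EXPECT_RE.search(text)
    return {
        "expect": float(expect.group(1)) if expect else None,
        "counts": counts,
    }


summary = (
    "Score = 307 bits (786), Expect = 8e-82\n"
    "Identities = 145/200 (72%), Positives = 166/200 (82%), Gaps = 1/200 (0%)\n"
)
result = parse_hit_summary(summary)
print(result["expect"])
print(result["counts"]["Identities"])
```

Working from the raw counts rather than the printed percentages lets a script apply its own cutoffs, for example keeping only hits above a chosen identity fraction.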

Clustering (Phylogenetics)

Genome Assembly

Raw Genome Data:

UCSC

The Challenge of New Data Types
- Gene expression microarrays
  - thousands of genes, imprecise measurements
  - huge images, private file formats
- Proteomics
  - high-throughput mass spec
  - protein chips: protein-protein interactions
- Genotyping
  - thousands of alleles, thousands of individuals

cDNA spotted microarrays

High-Throughput Genotyping

Bioinformatics: Beyond Using Websites
- You can do a lot of sophisticated bioinformatics using public websites.
- But at some point you may be faced with a LOT of data: thousands of searches, annotations, etc.
- The only solution is to have your own bioinformatics computer, database, and custom programs.
- This needs more processor power and more hard drive space than a typical desktop personal computer provides.

Bioinformatics Requires Powerful Computers
- One definition of bioinformatics is "the use of computers to analyze biological problems."
- As biological data sets have grown larger and biological problems have become more complex, the requirements for computing power have also grown.
- Computers that can provide this power generally use the Unix operating system, so you must learn Unix to be a computational biologist.

Stable and Efficient
- Unix is very stable: computers running Unix almost never crash.
- Unix is very efficient:
  - it gets maximum number-crunching power out of your processor (and multiple processors)
  - it can smoothly manage very large amounts of data
  - it can give new life to otherwise obsolete Macs and PCs
- Most new bioinformatics software is created for Unix; it's easy for the programmers.

Open Source Bioinformatics
- Almost all of the bioinformatics software that you need to do complex analyses is free for Unix computers.
- The Open Source software ethic is very strong among biologists:
  - Bioinformatics.org
  - Bioperl.org
  - Open-bio.org
- New algorithms generally appear first as free software (a publication requirement).

Free Software
- Linux operating system, MySQL database
- Perl - programming language
- BLAST and FASTA - similarity search
- Clustal - multiple alignment
- Phylip - phylogenetics
- Phred/Phrap/Consed - sequence assembly and SNP detection
- EMBOSS - a complete sequence analysis package created by the EMBL (like GCG)

Computer Hardware is not Free
- However, you can build a powerful Linux cluster for $20-50K (depending on how much power you need).
- The real cost is for a person to manage the machines, install the software, and train scientists to use it.
- Small schools can join together or affiliate with a larger neighbor.

Do Biologists have to become Programmers?
- No, but it can give you a big advantage.
- More and more of biology is becoming computer-aided design of experiments, automated equipment, and computational analysis of the results.
- “I just want to say one word to you... Databases”

Why teach bioinformatics in undergraduate education?
- Demand for trained graduates from the biomedical industry
- Bioinformatics is essential to understand current developments in all fields of biology
- We need to educate an entire new generation of scientists, health care workers, etc.
- Use bioinformatics to enhance the teaching of other subjects: genetics, evolution, biochemistry

Genomics in Medical Education
“The explosion of information about the new genetics will create a huge problem in health education. Most physicians in practice have had not a single hour of education in genetics and are going to be severely challenged to pick up this new technology and run with it.”
Francis Collins

Becoming a Unix Power User
- Learn more Unix commands
- Use the shell to execute simple programs
- Write scripts - automate repetitive tasks
- Download and install the latest bioinformatics software
- Drive your system manager crazy... or get your own Unix machine (Linux on an Intel machine or Mac OS X)
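As an illustration of "automate repetitive tasks", the short Python sketch below reads sequences in FASTA format and reports the GC content of each one, a job that is dull by hand and trivial for a script. The function names and the sample records are made up for this example. The same loop would work unchanged on an open file containing thousands of sequences.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(sequence):
    """Fraction of letters that are G or C (0.0 for an empty sequence)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Hypothetical records; in practice pass an open file object instead of a list.
sample = [">seq1 example", "GGCC", "AATT", ">seq2 example", "ATATAT"]
for name, seq in read_fasta(sample):
    print(name, len(seq), round(gc_content(seq), 2))
```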

BioPerl
- Why re-invent the wheel?
- Lots of common bioinformatics tasks have already been programmed as “modules” in Perl.
  - Grab sequences from GenBank, extract E-values and annotation from BLAST results, etc.
- Download from

Resources
- Notes for Lincoln Stein’s course on “Genome Informatics”
- BioPerl.org
- PERL for biologists (Kurt Stüber)
- “Why Biologists Want to Program Computers” by James Tisdall:

Resources for Bio-Computing

Stuart M. Brown, Ph.D.
- Bioinformatics: A Biologist's Guide to Biocomputing and the Internet
- Essentials of Medical Genomics