Research Computing, NYU School of Medicine

Presentation transcript:

Teaching Bioinformatics to Undergraduates http://www.med.nyu.edu/rcr/ASM Stuart M. Brown Research Computing, NYU School of Medicine

I. What is Bioinformatics? II. Challenges of teaching bioinformatics to undergraduates III. Common bioinformatics tools that you can use for teaching IV. The limits of knowledge V. Resources for the teacher

I. What is Bioinformatics? The use of information technology to collect, analyze, and interpret biological data. The use of software tools that deal with biological sequences, genome analysis, molecular structures, gene expression, regulatory and metabolic modeling. Computational biology: the design of new algorithms and software to support biology research. The routine use of computers in all phases of biology and medicine.

The Human Genome Project

A Genome Revolution in Biology and Medicine We are in the midst of a "Golden Era" of biology The Human Genome Project has produced a huge storehouse of data that will be used to change every aspect of biological research and medicine The revolution is about treating biology as an information science, not about specific biochemical technologies.

The job of the biologist is changing As more biological information becomes available …and laboratory equipment becomes more automated ... The biologist will spend more time using computers The biologist will spend more time on experimental design and data analysis (and less time doing tedious lab biochemistry) Biology will become a more quantitative science (think how the periodic table affected chemistry)

II. Why teach bioinformatics in undergraduate education? Demand for trained graduates from the biomedical industry Bioinformatics is essential to understand current developments in all fields of biology We need to educate an entire new generation of scientists, health care workers, etc. Use bioinformatics to enhance the teaching of other subjects: genetics, evolution, biochemistry

Biochemistry & Protein Structures "Hands-on graphics is a powerful enhancement to learning, particularly individualized learning. There is powerful synergy in learning about proteins and learning simultaneously about how to represent and manipulate them with computer graphics. When students learn to use graphics they see proteins and other complex biomolecules in a new and vivid way, and discover personal solutions to the problem of "seeing" new structural concepts." Molecular Graphics Manifesto Gale Rhodes Chemistry Department, Univ. of Southern Maine

Challenges of presenting bioinformatics to undergraduates Requires a deep understanding of molecular biology - lots of prerequisites Training users or makers of these tools? A good bioinformatics program will require substantially more math and statistics than most existing molecular biology and computer science curricula. Who will teach?

Different Programs, Different Goals Integrate into existing biology courses: genetics, molecular biology, microbiology Make one or a few cross-disciplinary courses jointly taught by biology and computing faculty open to both biology and computing students Create a curriculum for a true bioinformatics major (is this a double major?) Are you training for employment or providing the fundamentals for advanced training?

Shallow End This workshop will focus on faculty skills needed at the shallow end of the continuum (a few lectures or a short course). Use bioinformatics to teach biological concepts: evolution, genetics, protein structure and function.

How Much Computing Skill? Bioinformatics can be seen as a tool that the biologist needs to use, like PCR. Or should biologists be able to write their own programs and build databases? It is a big advantage to be able to design exactly the tool that you want, and this may be the wave of the future. Is your school going to train "bioinformatics professionals" or biologists with informatics skills?

Designing a Curriculum To really master bioinformatics, students need to learn a lot of molecular biology and genetics as well as become competent programmers. Then they need to learn specific bioinformatics skills - dealing with sequence databases, similarity algorithms, etc. How can students learn this much material and still manage a well rounded education? Graduates of these programs will become scientists and managers. Writing and presentation skills are essential components of their education.

Different Schools have Different Biases There are still only a handful of bioinformatics undergraduate programs [Many more schools offer a single course or a "specialized track" similar to a biotechnology major] You can generally predict the bias according to what school/department hosts the program Computer Science vs. biology Biomedical engineering Medical informatics (library science)

Teaching the Teachers There are more graduate level bioinformatics programs, but they are all very new. Graduates of these programs will have many opportunities as more schools gear up to offer bioinformatics training The reality is that most schools will draft existing faculty - often jointly from Bio and CompSci departments We need to train an entire generation of existing faculty in a new discipline

Teaching Tips Strike a balance between theory and practical experience: early bioinformatics training should be about what you can do with the tools; deeper training can focus on how they work. Balance the "click here" tutorials against letting students figure it out for themselves: the screens will be different when they look next time, and real bioinformatics work involves finding ways to overcome frustrations with balky computer systems.

Training "computer savvy" scientists Know the right tool for the job Get the job done with tools available Network connection is the lifeline of the scientist Jobs change, computers change, projects change, scientists need to be adaptable

III. Bioinformatics Tools You Can Use GenBank - genes, proteins, genomes Similarity Search tools: BLAST Alignment: CLUSTAL Protein families: Pfam, ProDom Protein Structures: PDB, RasMol Whole Genomes: UCSC, Entrez Genomes Human Mutations: OMIM Biochemical Pathways: KEGG Integrated tools: Biology Workbench, BCM SearchLauncher

Large Databases Once upon a time, GenBank sent out sequence updates on CD-ROM disks a few times per year. Now GenBank is over 40 Gigabytes (11 billion bases). Most biocomputing sites update their copy of GenBank every day over the internet. Scientists access GenBank directly over the Web.

Finding Genes in GenBank These billions of G, A, T, and C letters would be almost useless without descriptions of what genes they contain, the organisms they come from, etc. All of this information is contained in the "annotation" part of each sequence record.
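To make the "annotation" idea concrete, here is a minimal Python sketch that pulls the header fields out of a GenBank-style flat-file record. The record is hand-assembled from details on the BLAST hit slide later in this talk (accession, version, length, definition, organism); it is abbreviated and is not a real download. The function name `parse_annotation` is hypothetical, and the parser assumes only that keywords occupy the first 12 columns of each header line.

```python
# Abbreviated, hand-assembled GenBank-style record (illustration only,
# not a real download). Keywords sit in the first 12 columns.
RECORD = """\
LOCUS       BE588357                 369 bp
DEFINITION  194087 BARC 5BOV Bos taurus cDNA 5'.
ACCESSION   BE588357
VERSION     BE588357.1
SOURCE      Bos taurus
  ORGANISM  Bos taurus
ORIGIN
        1 aggatccaac gtcgctgcgg
//
"""


def parse_annotation(record):
    """Return the header (annotation) fields that precede the sequence."""
    fields = {}
    key = None
    for line in record.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("ORIGIN") or line.startswith("//"):
            break  # the sequence block starts here; annotation is done
        keyword = line[:12].strip()
        value = line[12:].strip()
        if keyword:
            key = keyword
            fields[key] = value
        elif key is not None:
            # A line with no keyword continues the previous field.
            fields[key] += " " + value
    return fields


info = parse_annotation(RECORD)
print(info["ACCESSION"], "|", info["ORGANISM"], "|", info["DEFINITION"])
```

Students can see from this that the organism and description live in the annotation, while the letters after ORIGIN carry no such labels on their own.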

Entrez is a Tool for Finding Sequences GenBank is managed by the NCBI (National Center for Biotechnology Information), which is a part of the US National Library of Medicine. NCBI has created a Web-based tool called Entrez for finding sequences in GenBank. http://www.ncbi.nlm.nih.gov Each sequence in GenBank has a unique “accession number”. Entrez can also search for keywords such as gene names, protein names, and the names of organisms or biological functions.
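The slides describe the Entrez web page. For students who want to script the same kind of search, NCBI also offers a programmatic interface (E-utilities), which is not covered in this talk. The sketch below only builds a search URL and does not contact the server. The function name `esearch_url`, the choice of the `nucleotide` database, and the `retmax` default are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities search service (an assumption here;
# the talk itself only shows the Entrez web interface).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="nucleotide", retmax=20):
    """Build (but do not send) a search URL for an accession or keyword."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


# Search by accession number, then by keyword.
print(esearch_url("BE588357"))
print(esearch_url("Bos taurus lysozyme"))
```

Pasting either printed URL into a browser should return the matching record identifiers, which makes a useful comparison with the results page students see in Entrez.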

Entrez Databases contain more than just DNA & protein sequences

Type in a Query term Enter your search words in the query box and hit the “Go” button

Refine the Query Often a search finds too many (or too few) sequences, so you can go back and try again with more (or fewer) keywords in your query. The “History” feature allows you to combine any of your past queries. The “Limits” feature allows you to limit a query to specific organisms, sequences submitted during a specific period of time, etc. [Many other features are designed to search for literature in MEDLINE]
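Narrowing a search can also be typed directly into the query box with Boolean operators and field tags. The sketch below assembles such a query string. The helper names `quote_phrase` and `build_query` are hypothetical, and the `[Organism]` tag is used on the assumption that it limits the match to the organism field, as the "Limits" feature does.

```python
def quote_phrase(text):
    """Wrap multi-word terms in double quotes so they stay together."""
    return f'"{text}"' if " " in text else text


def build_query(keywords, organism=None, exclude=()):
    """Combine keywords with AND, add an organism limit, drop unwanted terms."""
    terms = [quote_phrase(k) for k in keywords]
    if organism:
        terms.append(f"{quote_phrase(organism)}[Organism]")
    query = " AND ".join(terms)
    for word in exclude:
        query += f" NOT {quote_phrase(word)}"
    return query


# Broad first, then progressively narrower.
print(build_query(["lysozyme"]))
print(build_query(["lysozyme"], organism="Bos taurus"))
print(build_query(["lysozyme", "mRNA"], organism="Bos taurus", exclude=["partial"]))
```

Running the three queries in turn and comparing the hit counts is a quick classroom exercise in how each added term shrinks the result set.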

Related Items You can search for a text term in sequence annotations or in MEDLINE abstracts, and find all articles, DNA, and protein sequences that mention that term. Then from any article or sequence, you can move to "related articles" or "related sequences". Relationships between sequences are computed with BLAST. Relationships between articles are computed with "MeSH" terms (shared keywords). Relationships between DNA and protein sequences rely on accession numbers. Relationships between sequences and MEDLINE articles rely on both shared keywords and the mention of accession numbers in the articles.

Database Search Strategies General search principles, not limited to sequence (or to biology): use accession numbers whenever possible; start with broad keywords and narrow the search using more specific terms; try variants of spelling, numbers, etc.; search all relevant databases. Be persistent!!
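The "try variants of spelling, numbers" tip can be automated for simple cases. This is a rough sketch with hypothetical helper names (`name_variants`, `or_query`): it splits a name into letter and digit chunks and rejoins them with no separator, a space, and a hyphen. It ignores Greek letters, Roman numerals, and synonyms, so it is a starting point rather than a complete answer.

```python
import re


def name_variants(name):
    """Return common spellings of a gene or protein name, e.g. IL-2 / IL2 / IL 2."""
    # Keep only runs of ASCII letters or digits; hyphens and spaces are dropped.
    chunks = re.findall(r"[A-Za-z]+|[0-9]+", name)
    variants = {name, "".join(chunks), " ".join(chunks), "-".join(chunks)}
    return sorted(variants)


def or_query(name):
    """Join every variant with OR so that one search covers all spellings."""
    return "(" + " OR ".join(f'"{v}"' for v in name_variants(name)) + ")"


print(name_variants("IL-2"))
print(or_query("IL-2"))
```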

>gb|BE588357.1|BE588357 194087 BARC 5BOV Bos taurus cDNA 5'.
Length = 369
Score = 272 bits (137), Expect = 4e-71
Identities = 258/297 (86%), Gaps = 1/297 (0%)
Strand = Plus / Plus

Query: 17  aggatccaacgtcgctccagctgctcttgacgactccacagataccccgaagccatggca 76
           |||||||||||||||| | ||| | ||| || ||| | ||||  ||||| |||||||||
Sbjct: 1   aggatccaacgtcgctgcggctacccttaaccact-cgcagaccccccgcagccatggcc 59

Query: 77  agcaagggcttgcaggacctgaagcaacaggtggaggggaccgcccaggaagccgtgtca 136
           |||||||||||||||||||||||| | || ||||||||| | ||||||||||| ||| ||
Sbjct: 60  agcaagggcttgcaggacctgaagaagcaagtggagggggcggcccaggaagcggtgaca 119

Query: 137 gcggccggagcggcagctcagcaagtggtggaccaggccacagaggcggggcagaaagcc 196
            |||||||| | || | ||||||||||||||| ||||||||||| || ||||||||||||
Sbjct: 120 tcggccggaacagcggttcagcaagtggtggatcaggccacagaagcagggcagaaagcc 179

Query: 197 atggaccagctggccaagaccacccaggaaaccatcgacaagactgctaaccaggcctct 256
           ||||||||| | |||||||| |||||||||||||||||| ||||||||||||||||||||
Sbjct: 180 atggaccaggttgccaagactacccaggaaaccatcgaccagactgctaaccaggcctct 239

Query: 257 gacaccttctctgggattgggaaaaaattcggcctcctgaaatgacagcagggagac 313
           || || ||||| ||  ||||||||||| | |||||||||||||||||| ||||||||
Sbjct: 240 gagactttctcgggttttgggaaaaaacttggcctcctgaaatgacagaagggagac 296
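The "Identities" and "Gaps" figures in a BLAST report are simple column counts, and students can verify them. The sketch below recounts the first 60 columns of the alignment above (Query 17-76 against Sbjct 1-59). The function names `alignment_stats` and `match_line` are hypothetical; this is a classroom check, not how BLAST itself computes its report.

```python
# First 60 alignment columns copied from the BLAST hit above.
QUERY = "aggatccaacgtcgctccagctgctcttgacgactccacagataccccgaagccatggca"
SBJCT = "aggatccaacgtcgctgcggctacccttaaccact-cgcagaccccccgcagccatggcc"


def alignment_stats(query, sbjct):
    """Return (identities, gaps, columns) for two aligned, equal-length strings."""
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(query, sbjct))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    return identities, gaps, len(pairs)


def match_line(query, sbjct):
    """Rebuild the row of '|' marks printed between Query and Sbjct."""
    return "".join("|" if q == s and q != "-" else " " for q, s in zip(query, sbjct))


identities, gaps, columns = alignment_stats(QUERY, SBJCT)
print(f"Identities = {identities}/{columns}, Gaps = {gaps}/{columns}")
print(QUERY)
print(match_line(QUERY, SBJCT))
print(SBJCT)
```

Repeating the count over all five blocks gives the report's totals. The 86% shown in the report agrees with 258 * 100 // 297, i.e. the percentage with the fraction dropped.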

Sample Multiple Alignment

Protein domains (from ProDom database)

Limits on best-match annotation inheritance result from many things, including multi-domain proteins and transitivity. Diagram labels: new sequence; closest annotated database entry; original studied protein from which the annotation was inherited.

Protein Structure It is not really possible to predict protein structure from just amino acid sequence. PDB is a database of known protein structures (determined by X-ray crystallography and NMR). There are also very handy structure viewers such as RasMol that are free for any computer.

Genome Browsers Scientists need to work with a lot of layers of information about the genome: coding sequence of known genes and cDNAs; computer-predicted genes; genetic maps (known mutations and markers); gene expression; cross-species homology.

UCSC

Ensembl at EBI/EMBL

Human Alleles The OMIM (Online Mendelian Inheritance in Man) database at the NCBI tracks all human mutations with known phenotypes. It contains a total of about 2,000 genetic diseases [and another ~11,000 genetic loci with known phenotypes - but not necessarily known gene sequences]. It is designed for use by physicians: it can be searched by disease name and contains summaries from clinical studies.

KEGG: Kyoto Encyclopedia of Genes and Genomes Enzymatic and regulatory pathways, mapped out by EC number and cross-referenced to genes in all known organisms (wherever sequence information exists). Parallel maps of regulatory pathways.

Integrated Online Tools National Database/Sequence Analysis Servers: NCBI, EMBL/EBI, DDBJ Tools for specific types of data or problems Expasy (Protein, Mass Spec, 2-D PAGE) 3-D Protein Structures: PDB, Predict Protein Server Education oriented tools Biology Workbench Collections of links to other servers BCM SearchLauncher

The Limits of our Knowledge Bioinformatics is a very dynamic discipline. Teachers can't know everything in the field. The databases are clumsily built. Biology is vastly more complex than our software. Lots of our current bioinformatics programs don't work well. We don't have even theoretical solutions for gene prediction, alternative splicing, protein structure & function prediction, regulatory networks.

What is a Gene? For every 2 biologists, you get 3 definitions. “A DNA sequence that encodes a heritable trait.” The unit of heredity. Is it an abstract concept, or something you can isolate in a tube or print on your screen? “Classic” vs. “modern” understanding of molecular biology.

Genome Confusion The sequence of a gene in the genome includes: protein coding sequence; introns and exons; 5' and 3' untranslated regions on the mRNA; promoter and 5' transcription factor binding sites; enhancers?? What about alternative splicing? Multiple cDNAs with different sequences (that produce different proteins) can be transcribed from the same genomic locus.

V. Teaching Resources The Biology Student WorkBench http://peptide.ncsa.uiuc.edu/bioswb.html RasMol/Chime/Protein Explorer http://www.umass.edu/microbio/rasmol/ http://www.umass.edu/microbio/chime/ http://www.umass.edu/microbio/chime/explorer/ Bioinformatics.org Other courses - It ain't cheating to learn from your peers

Terri Attwood's Web Biocomputing tutorials http://www.biochem.ucl.ac.uk/bsm/dbbrowser/c32/index.html http://www.biochem.ucl.ac.uk/bsm/dbbrowser/jj/prefacefrm.html Sequence Analysis on the Web Christian Büschking and Chris Schleiermacher http://bibiserv.techfak.uni-bielefeld.de/sadr/ Online Lectures on Bioinformatics Hannes Luz, Max Planck Institute for Molecular Genetics http://lectures.molgen.mpg.de/ Using Computers in Molecular Biology Stuart Brown, NYU School of Medicine http://www.med.nyu.edu/rcr/rcr/course/index.html Teach Yourself Bioinformatics on the Web http://www.med.nyu.edu/rcr/rcr/btr/index.html

Long Term Implications A "periodic table for biology" will lead to an explosion of research and discoveries - we will finally have the tools to start making systematic analyses of biological processes (quantitative biology). Understanding the genome will lead to the ability to change it - to modify the characteristics of organisms and people in a wide variety of ways

Genomics in Medical Education “The explosion of information about the new genetics will create a huge problem in health education. Most physicians in practice have had not a single hour of education in genetics and are going to be severely challenged to pick up this new technology and run with it." Francis Collins

Stuart M. Brown, Ph.D. stuart.brown@med.nyu.edu www.med.nyu/rcr Bioinformatics: A Biologist's Guide to Biocomputing and the Internet