1 Teaching Bioinformatics to Undergraduates. Stuart M. Brown, Research Computing, NYU School of Medicine
2 I. What is Bioinformatics? II. Challenges of teaching bioinformatics to undergraduates. III. Common bioinformatics tools that you can use for teaching. IV. The limits of knowledge. V. Resources for the teacher.
3 I. What is Bioinformatics? The use of information technology to collect, analyze, and interpret biological data. The use of software tools that deal with biological sequences, genome analysis, molecular structures, gene expression, and regulatory and metabolic modeling. Computational biology: the design of new algorithms and software to support biology research. The routine use of computers in all phases of biology and medicine.
5 A Genome Revolution in Biology and Medicine: We are in the midst of a "Golden Era" of biology. The Human Genome Project has produced a huge storehouse of data that will be used to change every aspect of biological research and medicine. The revolution is about treating biology as an information science, not about specific biochemical technologies.
6 The job of the biologist is changing: As more biological information becomes available and laboratory equipment becomes more automated, the biologist will spend more time using computers. The biologist will spend more time on experimental design and data analysis (and less time doing tedious lab biochemistry). Biology will become a more quantitative science (think how the periodic table affected chemistry).
7 II. Why teach bioinformatics in undergraduate education? Demand for trained graduates from the biomedical industry. Bioinformatics is essential to understand current developments in all fields of biology. We need to educate an entire new generation of scientists, health care workers, etc. Use bioinformatics to enhance the teaching of other subjects: genetics, evolution, biochemistry.
8 Biochemistry & Protein Structures: "Hands-on graphics is a powerful enhancement to learning, particularly individualized learning. There is powerful synergy in learning about proteins and learning simultaneously about how to represent and manipulate them with computer graphics. When students learn to use graphics they see proteins and other complex biomolecules in a new and vivid way, and discover personal solutions to the problem of 'seeing' new structural concepts." Molecular Graphics Manifesto, Gale Rhodes, Chemistry Department, Univ. of Southern Maine
9 Challenges of presenting bioinformatics to undergraduates: Requires a deep understanding of molecular biology, with lots of prerequisites. Training users or makers of these tools? A good bioinformatics program will require substantially more math and statistics than most existing molecular biology and computer science curricula. Who will teach?
10 Different Programs, Different Goals: Integrate into existing biology courses: genetics, molecular biology, microbiology. Make one or a few cross-disciplinary courses, jointly taught by biology and computing faculty and open to both biology and computing students. Create a curriculum for a true bioinformatics major (is this a double major?). Are you training for employment or providing the fundamentals for advanced training?
11 Shallow End: This workshop will focus on faculty skills needed at the shallow end of the continuum (a few lectures or a short course). Use bioinformatics to teach biological concepts: evolution, genetics, protein structure and function.
12 How much computing skill? Bioinformatics can be seen as a tool that the biologist needs to use, like PCR. Or should biologists be able to write their own programs and build databases? It is a big advantage to be able to design exactly the tool that you want, and this may be the wave of the future. Is your school going to train "bioinformatics professionals" or biologists with informatics skills?
13 Designing a Curriculum: To really master bioinformatics, students need to learn a lot of molecular biology and genetics as well as become competent programmers. Then they need to learn specific bioinformatics skills: dealing with sequence databases, similarity algorithms, etc. How can students learn this much material and still manage a well-rounded education? Graduates of these programs will become scientists and managers. Writing and presentation skills are essential components of their education.
14 Different Schools have Different Biases: There are still only a handful of bioinformatics undergraduate programs. [Many more schools offer a single course or a "specialized track" similar to a biotechnology major.] You can generally predict the bias according to what school/department hosts the program: computer science vs. biology; biomedical engineering; medical informatics (library science).
15 Teaching the Teachers: There are more graduate-level bioinformatics programs, but they are all very new. Graduates of these programs will have many opportunities as more schools gear up to offer bioinformatics training. The reality is that most schools will draft existing faculty, often jointly from Bio and CompSci departments. We need to train an entire generation of existing faculty in a new discipline.
16 Teaching Tips: Strike a balance between theory and practical experience: early bioinformatics training should be about what you can do with the tools; deeper training can focus on how they work. Balance the "click here" tutorials against letting students figure it out for themselves: it will be different when they look at it next time, and real bioinformatics work involves finding ways to overcome frustrations with balky computer systems.
17 Training "computer savvy" scientists: Know the right tool for the job. Get the job done with the tools available. The network connection is the lifeline of the scientist. Jobs change, computers change, projects change; scientists need to be adaptable.
18 III. Bioinformatics Tools You Can Use: GenBank (genes, proteins, genomes). Similarity search tools: BLAST. Alignment: CLUSTAL. Protein families: Pfam, ProDom. Protein structures: PDB, RasMol. Whole genomes: UCSC, Entrez Genomes. Human mutations: OMIM. Biochemical pathways: KEGG. Integrated tools: Biology Workbench, BCM SearchLauncher.
19 Large Databases: Once upon a time, GenBank sent out sequence updates on CD-ROM disks a few times per year. Now GenBank is over 40 gigabytes (11 billion bases). Most biocomputing sites update their copy of GenBank every day over the Internet. Scientists access GenBank directly over the Web.
20 Finding Genes in GenBank: These billions of G, A, T, and C letters would be almost useless without descriptions of what genes they contain, the organisms they come from, etc. All of this information is contained in the "annotation" part of each sequence record.
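To make the split between annotation and raw sequence concrete for students, here is a minimal Python sketch. It assumes a heavily simplified GenBank-style flat-file layout; the record itself (TOY00001) is invented for illustration and is not a real entry, and the `parse_record` helper is hypothetical. Real records have multi-line fields and a feature table that this sketch ignores.

```python
# Toy GenBank-style record, invented for illustration (not a real entry).
RECORD = """\
LOCUS       TOY00001                  24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical example gene, partial sequence.
ACCESSION   TOY00001
SOURCE      synthetic construct
  ORGANISM  synthetic construct
ORIGIN
        1 gattacagat tacagattac agat
//
"""


def parse_record(text):
    """Split a simplified flat-file record into annotation fields and sequence."""
    annotation = {}
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if line.startswith("ORIGIN"):  # everything after this is sequence
            in_sequence = True
            continue
        if in_sequence:
            # Sequence lines: a position number followed by blocks of bases.
            sequence_parts.extend(line.split()[1:])
        else:
            parts = line.split(None, 1)
            # Keep only "KEYWORD value" lines; keep the first value per keyword.
            if len(parts) == 2 and parts[0].isupper():
                annotation.setdefault(parts[0], parts[1].strip())
    return annotation, "".join(sequence_parts).upper()


annotation, sequence = parse_record(RECORD)
print(annotation["ACCESSION"], annotation["ORGANISM"], len(sequence))
```

The point for a class is that the sequence alone carries no labels: the accession number, organism, and description all come from the annotation fields.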
22 Entrez is a Tool for Finding Sequences: GenBank is managed by the NCBI (National Center for Biotechnology Information), which is a part of the US National Library of Medicine. NCBI has created a Web-based tool called Entrez for finding sequences in GenBank. Each sequence in GenBank has a unique “accession number”. Entrez can also search for keywords such as gene names, protein names, and the names of organisms or biological functions.
27 Refine the Query: Often a search finds too many (or too few) sequences, so you can go back and try again with more (or fewer) keywords in your query. The “History” feature allows you to combine any of your past queries. The “Limits” feature allows you to limit a query to specific organisms, sequences submitted during a specific period of time, etc. [Many other features are designed to search for literature in MEDLINE.]
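The broad-then-narrow refinement described above can also be shown outside the Web form. The Python sketch below only builds query strings and URLs and sends nothing over the network. It assumes NCBI's programmatic E-utilities `esearch` endpoint with its `db`, `term`, and `retmax` parameters and the `[Organism]` field tag; the slides themselves describe only the Web interface, so this is a swapped-in illustration. The helper names `build_query` and `esearch_url` are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed endpoint: NCBI E-utilities ESearch (not mentioned in the slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(keywords, organism=None):
    """Join keywords with AND, optionally limiting the query to one organism."""
    # Quote multi-word keywords so they are searched as phrases.
    terms = [f'"{k}"' if " " in k else k for k in keywords]
    if organism:
        terms.append(f'"{organism}"[Organism]')
    return " AND ".join(terms)


def esearch_url(term, db="nucleotide", retmax=20):
    """Return a search URL for the given query term (no request is made)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


# Start broad, then add keywords and an organism limit to narrow the search.
broad = build_query(["insulin"])
narrow = build_query(["insulin", "receptor"], organism="Homo sapiens")
print(esearch_url(broad))
print(esearch_url(narrow))
```

Students can paste the printed URLs into a browser to compare how many records each query finds, which mirrors the "too many, so add keywords" loop on the slide.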
28 Related Items: You can search for a text term in sequence annotations or in MEDLINE abstracts, and find all articles, DNA, and protein sequences that mention that term. Then from any article or sequence, you can move to "related articles" or "related sequences". Relationships between sequences are computed with BLAST. Relationships between articles are computed with MeSH terms (shared keywords). Relationships between DNA and protein sequences rely on accession numbers. Relationships between sequences and MEDLINE articles rely on both shared keywords and the mention of accession numbers in the articles.
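To give students a feel for what "related sequences" means, the sketch below ranks a few toy sequences by similarity to a query. It is not BLAST: BLAST is a fast heuristic, while this sketch uses the exact Smith-Waterman local alignment score with a simple linear gap penalty. The scoring values (match 2, mismatch -1, gap -2) and the toy sequences are arbitrary assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between strings a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for ch_a in a:
        curr = [0]  # first column is always 0 in local alignment
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # A cell never drops below 0, so an alignment can start anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Toy data, invented for illustration.
query = "GATTACA"
candidates = {
    "seq_a": "CCGATTACACC",  # contains the query exactly
    "seq_b": "GATTTACA",     # query with one extra base
    "seq_c": "TTTTTTTT",     # shares only a short run of T's
}
ranked = sorted(
    candidates,
    key=lambda name: local_alignment_score(query, candidates[name]),
    reverse=True,
)
print(ranked)
```

Only two rows of the scoring matrix are kept in memory at a time, which is enough when the score is needed but the alignment itself is not.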
30 Database Search Strategies: General search principles, not limited to sequence (or to biology). Use accession numbers whenever possible. Start with broad keywords and narrow the search using more specific terms. Try variants of spelling, numbers, etc. Search all relevant databases. Be persistent!
35 Limits on Best-Match Annotation Inheritance: The limits result from many things, including multi-domain proteins and transitivity. Diagram labels: new sequence; closest annotated database entry; originally studied protein from which the annotation was inherited.
36 Protein Structure: It is not really possible to predict protein structure from just amino acid sequence. PDB is a database of known protein structures (determined by X-ray crystallography and NMR). There are also very handy structure viewers such as RasMol that are free for any computer.
39 Genome Browsers: Scientists need to work with a lot of layers of information about the genome: coding sequence of known genes and cDNAs; computer-predicted genes; genetic maps (known mutations and markers); gene expression; cross-species homology.
43 Human Alleles: The OMIM (Online Mendelian Inheritance in Man) database at the NCBI tracks all human mutations with known phenotypes. It contains a total of about 2,000 genetic diseases [and another ~11,000 genetic loci with known phenotypes, but not necessarily known gene sequences]. It is designed for use by physicians: it can be searched by disease name and contains summaries from clinical studies.
45 KEGG: Kyoto Encyclopedia of Genes and Genomes: Enzymatic and regulatory pathways. Mapped out by EC number and cross-referenced to genes in all known organisms (wherever sequence information exists). Parallel maps of regulatory pathways.
48 Integrated Online Tools: National database/sequence analysis servers: NCBI, EMBL/EBI, DDBJ. Tools for specific types of data or problems: Expasy (protein, mass spec, 2-D PAGE); 3-D protein structures: PDB, Predict Protein Server. Education-oriented tools: Biology Workbench. Collections of links to other servers: BCM SearchLauncher.
54 The Limits of our Knowledge: Bioinformatics is a very dynamic discipline. Teachers can't know everything in the field. The databases are clumsily built. Biology is vastly more complex than our software. Lots of our current bioinformatics programs don't work well. We don't have even theoretical solutions for gene prediction, alternative splicing, protein structure and function prediction, or regulatory networks.
55 What is a Gene? For every 2 biologists, you get 3 definitions: “A DNA sequence that encodes a heritable trait.” The unit of heredity. Is it an abstract concept, or something you can isolate in a tube or print on your screen? “Classic” vs. “modern” understanding of molecular biology.
56 Genome Confusion: The sequence of a gene in the genome includes: protein coding sequence; introns and exons; 5' and 3' untranslated regions on the mRNA; promoter and 5' transcription factor binding sites; enhancers?? What about alternative splicing? Multiple cDNAs with different sequences (that produce different proteins) can be transcribed from the same genomic locus.
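The last point, that one genomic locus can yield several different cDNAs, is easy to demonstrate with a toy example. In the Python sketch below the genomic sequence, exon names, and exon coordinates are all invented for illustration (exons in upper case, introns in lower case), and `splice` is a hypothetical helper. Real gene models come from annotation, not from letter case.

```python
# Toy genomic locus, invented for illustration: exons upper case, introns lower.
GENOMIC = "ATGGCC" + "gtaagt" + "AAATTT" + "gtcag" + "GGGTAA"

# Exon coordinates as 0-based, half-open (start, end) intervals on GENOMIC.
EXONS = {
    "exon1": (0, 6),
    "exon2": (12, 18),
    "exon3": (23, 29),
}


def splice(genomic, exon_names):
    """Concatenate the named exons, in the given order, into one transcript."""
    return "".join(
        genomic[start:end] for start, end in (EXONS[name] for name in exon_names)
    )


# Two transcripts from the same locus: all exons, and one that skips exon2.
full_length = splice(GENOMIC, ["exon1", "exon2", "exon3"])
exon_skipped = splice(GENOMIC, ["exon1", "exon3"])
print(full_length)
print(exon_skipped)
```

Both transcripts come from the same stretch of genomic DNA, yet they differ in sequence and length, which is why "the sequence of the gene" has no single answer.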
57 V. Teaching Resources: The Biology Student WorkBench. RasMol/Chime/Protein Explorer. Bioinformatics.org. Other courses - it ain't cheating to learn from your peers.
58 Terri Attwood's Web Biocomputing tutorials. Sequence Analysis on the Web: Christian Büschking and Chris Schleiermacher. Online Lectures on Bioinformatics: Hannes Luz, Max Planck Institute for Molecular Genetics. Using Computers in Molecular Biology: Stuart Brown, NYU School of Medicine. Teach Yourself Bioinformatics on the Web.
59 Long Term Implications: A "periodic table for biology" will lead to an explosion of research and discoveries - we will finally have the tools to start making systematic analyses of biological processes (quantitative biology). Understanding the genome will lead to the ability to change it - to modify the characteristics of organisms and people in a wide variety of ways.
60 Genomics in Medical Education: “The explosion of information about the new genetics will create a huge problem in health education. Most physicians in practice have had not a single hour of education in genetics and are going to be severely challenged to pick up this new technology and run with it.” Francis Collins
61 Stuart M. Brown, Ph.D. firstname.lastname@example.org www.med.nyu/rcr Bioinformatics: A Biologist's Guide to Biocomputing and the Internet, by Stuart M. Brown, Ph.D.