Research In the Post-Genomics Era Martina McGloughlin, Biotechnology Program and Life Sciences Informatics Program UC Davis.

Presentation transcript:


2 “Biology in the 21st century will increasingly become an information science” Leroy Hood, Jan 11, 1999
“Any cell has in it a billion years of experimentation by its ancestors” Max Delbruck, 1949
UC Davis Biotechnology Program, UC Systemwide Life Sciences Informatics Program

3 Genomics
• The massive interest and commitment of resources in both the public and private sectors flow from the generally held perception that genomics will be the single most fruitful approach to the acquisition of new information in basic and applied biology in the next several decades.
• If genomics were only to be a tool for the basic biologist, the benefits of this approach would be staggering, yielding new insights into fundamental processes such as cell division, differentiation, transformation, the development and reproduction of organisms and the diversity of populations.
• The rewards in applied biology, however, have clearly attracted the private sector and public interest. These include the promise of facile new approaches for drug discovery, new understanding of metabolic processes and new approaches to determining qualitative and quantitative traits in plants and animals for breeding and genetic engineering.

4 One Human Sequence
Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted, it could be stored on one CD-ROM. Biologically encoded, it fits easily within a single cell.
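
The comparisons on this slide can be checked with a little arithmetic. The sketch below is only a back-of-envelope estimate: the genome size of 3.2 billion bases and the 2-bits-per-base packing are assumptions, not figures from the slides.

```python
# Back-of-envelope check of the slide's comparisons.
# All constants are assumptions, not figures taken from the slides.
GENOME_BASES = 3.2e9       # assumed haploid human genome size, in bases
CHARS_PER_INCH = 10        # 10-pitch type prints 10 characters per inch
INCHES_PER_MILE = 63_360
BITS_PER_BASE = 2          # A, C, G and T can each be packed into 2 bits


def printed_length_miles(bases, chars_per_inch=CHARS_PER_INCH):
    """Length of the sequence typed on a single line, in miles."""
    return bases / chars_per_inch / INCHES_PER_MILE


def packed_size_megabytes(bases, bits_per_base=BITS_PER_BASE):
    """Size of the sequence when packed at a fixed number of bits per base."""
    return bases * bits_per_base / 8 / 1e6


miles = printed_length_miles(GENOME_BASES)
megabytes = packed_size_megabytes(GENOME_BASES)
print(f"{miles:,.0f} miles, {megabytes:,.0f} MB")
```

Under these assumptions the printed length comes out just over 5,000 miles. The packed size is about 800 MB, slightly more than the 650 to 700 MB of a typical CD-ROM, so the "one CD-ROM" claim holds only with a smaller genome estimate or some compression.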

5 Columns: Organism; # of genes; % genes with inferred function; completion date for genome sequencing
E. coli 4,
Yeast 6,
C. elegans 19,
Drosophila 12,000-14,
Arabidopsis 25,
Mouse 26,000-40,
Human 26,383-39,

6 Paradigm Shift in Biology The new paradigm, now emerging, is that all the ‘genes’ will be known (in the sense of being resident in databases available electronically), and that the starting point of a biological investigation will be theoretical. An individual scientist will begin with a theoretical conjecture, only then turning to experiment to follow or test that hypothesis. Walter Gilbert Towards a paradigm shift in biology. Nature, 349:99.

7 Paradigm Shift in Biology To use [the] flood of knowledge, which will pour across the computer networks of the world, biologists not only must become computer literate, but also change their approach to the problem of understanding life. Walter Gilbert Towards a paradigm shift in biology. Nature, 349:99.

8 What’s Really Next The post-genome era in biological research will take for granted ready access to huge amounts of genomic data. The challenge will be understanding those data and using the understanding to solve real-world problems...

9 Fundamental Dogma
DNA, RNA, Proteins, Circuits, Phenotypes, Populations
GenBank, EMBL, DDBJ, Map Databases, SwissPROT, PIR, PDB
Gene Expression? Clinical Data? Regulatory Pathways? Metabolism? Biodiversity? Neuroanatomy? Development? Molecular Epidemiology? Comparative Genomics?
Although a few databases already exist to distribute molecular information, the post-genomic era will need many more to collect, manage, and publish the coming flood of new findings.
If this extension covers functional genomics, then “functional genomics” is equivalent to biology.

10 There are Problems with the HGP...
The actual sequence data makes up only 16% of the content; the other 84% is annotation. This leads to a number of issues:
–How to structure databases for mining
–How to establish controlled vocabularies to ensure the integrity of searches
–What new algorithms are needed to facilitate processing and correlating the petabytes (10^15 bytes) of information
–How protein function can be extracted for the purposes of diagnostics, drug discovery and therapeutics
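
The vocabulary issue on this slide can be illustrated with a small sketch: free-text annotation terms are mapped to one preferred term so that a search for any synonym finds the same records. The vocabulary entries, function names and sample annotations below are hypothetical and are not taken from the presentation.

```python
# Minimal sketch of a controlled vocabulary for annotation terms.
# The vocabulary content and all names here are illustrative assumptions.
def build_lookup(vocabulary):
    """Map every lower-cased synonym, and each preferred term, to its preferred term."""
    lookup = {}
    for preferred, synonyms in vocabulary.items():
        for term in {preferred, *synonyms}:
            key = term.strip().lower()
            if key in lookup and lookup[key] != preferred:
                raise ValueError(f"ambiguous synonym: {term!r}")
            lookup[key] = preferred
    return lookup


def normalize(term, lookup):
    """Return the preferred term, or None when the term is not in the vocabulary."""
    return lookup.get(term.strip().lower())


VOCABULARY = {
    "TP53": {"p53", "tumor protein p53"},
    "liver": {"hepatic tissue"},
}
LOOKUP = build_lookup(VOCABULARY)

annotations = ["p53", "TP53 ", "Hepatic tissue", "unknown gene"]
normalized = [normalize(a, LOOKUP) for a in annotations]
print(normalized)
```

Terms outside the vocabulary come back as None rather than being guessed, which keeps unrecognized annotations visible for curation instead of silently polluting search results.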

Bioinformatics - Two Views
USERS: of Information, of Tools, of Instrumentation, In-Silico Modeling
INTERPRETERS: of Information
DEVELOPERS*: of Information, of Tools, of Instrumentation, of Architecture/Storage, Algorithms, Modeling Strategies, Visualization
* These people are in highest demand
Per Pete Smietana, VP Lumicyte

12 Typical Bioinformatics Multi-Disciplinary Training
Scientists
–Biology, Molecular Genetics, Clinical Biochemistry, Protein Structure Chemistry
Mathematicians
–Statistics, Algorithms, Image processing
Computer Scientists
–Database, User Interface/Visualizations, Networking (Internets/Intranets), Instrument Control

13 Typical Bioinformatics Multi-Disciplinary Functions
Scientists
–Experimental Design & Interpretation
–Laboratory Protocols & Standards/Controls
Mathematicians
–Analysis & Correlation of Data
–Validation methodologies
Computer Scientists
–Information Storage / Control Vocabulary
–Data Mining

Bioinformatics Functional Organization
Infrastructure Support: Computer operations, Database Admin. Skillset: Computer, Network, Database
Applications Support: Help Desk, Training. Skillset: Program knowledge, Communication, Teaching
Research Support: Scientific support, Gene discovery, Data Smelting. Skillset: Molecular Biology, Computer, Communications
Research: Bioinformatics research, Algorithm development, New Technologies. Skillset: Computational Biology, Bioinformatics, Programming
Systems Development: Program development, System integration, Database design. Skillset: Systems analysis, Database development, Programming
Gene Discovery, Genomics, Sequencing, Molecular Biology, High Throughput Screening
Database Support: Administration, Curation. Skillset: Molecular Biology, Computer, Communications

15 TGT AAT AGT TAT ATT TTC ATT ATA AAT TGT GTT TGT AGA CAT CAT AAA TTT AAA ACA TGG CTT TTT AAC CTG ATA AAT CCT ACG AAT ATT TGT AAT AGT TAT GTT ATT GCA GTA AGT ACC GTT TGT ATT ATA AAT TGT GTT CTG
Which genes are turned off then on?
Courtesy of Dr. Young Moo Lee

16 Growth in GenBank is exponential. Recently more data were added in ten weeks than were added in the first ten years of the project.
[Figure: Base Pairs in GenBank, plotted against GenBank Release Numbers]

17 Rhetorical Question
Which is likely to be more complex:
–identifying, documenting, and tracking the whereabouts of all parcels in transit in the US at one time
–identifying, documenting, and analyzing the structure and function of all individual genes in all economically significant organisms; then analyzing all significant gene-gene and gene-environment interactions in those organisms and their environments

18 Business Factoids
United Parcel Service:
–uses two redundant 3 Terabyte (yes, 3000 GB) databases to track all packages in transit
–has 4,000 full-time employees dedicated to IT
–spends one billion dollars per year on IT
–has an income of 1.1 billion dollars, against revenues of 22.4 billion dollars

19 Examples of Biotech/IT Fusion Technologies
• Genomics, proteomics and bioinformatics
• Combinatorial chemistry
• Peptide libraries: tea bags, beads
• Combinatorial biology
• Directed evolution
• DNA Shuffling, Molecular Breeding
• High throughput analysis
• Nucleic acid based: Sequencing, Microarrays, Photolithography, Mirrors, Spotted Chips, Semi-conductor
• Protein based: 2-D, electrospray/nanospray MS: MALDI-TOF, LC/MS/MS, SELDI
• Imaging/optical biology
• Biosensors, Bioelectronics and Bionetworks (Nanotechnology)

20 Genomics, Proteomics and Bioinformatics
• Genomics is operationally defined as investigations into the structure and function of very large numbers of genes undertaken in a simultaneous fashion.
• Structural genomics includes the genetic mapping, physical mapping and sequencing of entire genomes.
• Comparative genomics means information gained in one organism can have application in other, even distantly related, organisms. This enables the application of information gained from facile model systems to agricultural and medical problems. The nature and significance of differences between genomes also provide a powerful tool for determining the relationship between genotype and phenotype through comparative genomics and morphological and physiological studies.
• Functional genomics: Phenotype is logically the subject of functional genomics. Genome sequencing for most organisms of interest will be complete within the near future, ushering in the so-called "post-genome era." Walter Gilbert directly speculated on the nature of biology in the "post-genome era": "The new paradigm, now emerging, is that all genes will be known (in the sense of being resident in databases available electronically), and that the starting point of a biological investigation will be theoretical."

21 Genomics, Proteomics and Bioinformatics
• Proteomics: At the molecular level, phenotype includes all temporal and spatial aspects of gene expression as well as related aspects of the expression, structure, function and spatial localization of proteins. The proteome is the set of all expressed proteins for a given organism.
• "Physiomics": The next hierarchical level of phenotype considers how the proteome within and among cells cooperates to produce the biochemistry and physiology of individual cells and organisms. "Physiomics" is a descriptor for this approach.
• "Phenomics": The final hierarchical levels of phenotype include anatomy and function for cells and whole organisms.
• Bioinformatics: Computational or algorithmic approaches to the production of information from large amounts of biological data, including prediction of protein structure, dynamic modeling of complex physiological systems, or the statistical treatment of quantitative traits in populations in order to determine the genetic basis for these traits.
• Unquestionably, bioinformatics will be an essential component of all research activities utilizing structural and functional genomics approaches.

Medical Bioinformatics: What is it?
Laboratory/Clinical Experiments; Biological Interpretation; Informatics
–Hi-throughput Screening Data: genotype sequencing, functional assays, DNA library
–Patient Clinical Data: cancer phenotype; outcomes, treatments, age
–Patient Samples: Tissues, Tumors
–Model Systems: Rats, cultured tissues
–Published Literature
–Scientific/Medical Experts
–Sample/Experiment Tracking
–Data Processing, Quality Control
–Statistical Analyses
–Sequence Matching/Annotation
–Functional Significance
–User Access to Results

What’s in a name? Bio Informatics: Sequence Analysis; Database Homology Searching; Multiple Sequence Alignment; Homology Modeling; Docking; Protein Analysis; Proteomics; 3D Modeling; Sample Registration & Tracking; Integrated Data Repositories; Common Visual Interfaces; Intellectual Property Auditing; Genome Mapping

Gene Discovery Informatics
Microdissection; Create DNA Libraries; Signature Hybridization; Clustering by Signature; Expression Profiles; Differential Expression; DNA Sequencing; Gene Assignments; Functional Predictions; Micro Arrays; Functional Assays; Small Molecule Drugs; Tissues & Cell Lines; In situ Hybridization
Clones Database; DNA Libraries Database; Annotated Sequence Database; Assays & Validation Database; Clustering Database; Tissue & Cell Lines Database; Small Molecule Database; Micro Array Database; In Situ Hybridization

IM Intensive Research Groups Pharmacology Screening Assay Analysis Animal Data Robotics Sample Management Chemical Informatics Chemistry SMDD Bioinformatics Genomics Gene Expression Target Discovery Computational Biology Modeling Structure Desktop Research Experimental Data Information Management Infrastructure Support

Cancer Gene Discovery Knowledgebase User's Web Browser DNA Sequence Proprietary Relational Database Homology searches Functional Profiles Cancer Tissue Inventory Patient Data Pathology Tissue Info Functional/Validation Microarrays In situ Hybridization Functional validation Anti-sense RNA Knockouts Cancer Differential Expression Data Tissue cDNA libraries Gene Expression Patterns

Bioinformatics Architecture
External Public Databases; External Proprietary Databases; Unix servers & Specialized Hardware; Users Workstation; Java & Desktop Programs; Web Browser; Active Server; Livewire; CGI; NT servers; Proprietary Internal Databases; Web Server; Shared Access Databases; MS Access

Challenges in High Throughput Biotechnology R&D
• Volume of Data is Growing Rapidly
• Technology is Evolving Rapidly
• Instrumentation, Informatics
• Biological Definitions are Constantly Updated
• New Interactions and Functions Discovered Daily
• New Genes
• Sometimes Homologous to Known Genes -> re-evaluate old data
• Full Value is Realized by Integrating Multiple High-Throughput Platforms
• Sequencing, Functional Screens, Small Molecule Activity
• Up-Front Design of Data and Quality Control Databases is Crucial to Success
• High Data Quality is Essential
• Financially Impossible to Repeat Experiments
• Requires Informatics Specialist who Understands Laboratory Techniques

Recent Informatics Job Description
Title: Information Specialist
DUTIES: The role of this position is to provide scientific bioinformatics support for gene discovery research utilizing DNA microarray technology at Chiron. The successful candidate will participate in research projects within a bioinformatics team that will provide analysis and data management resources necessary to optimize research activities.
REQUIREMENTS: A BS/MS in molecular biology, biochemistry and/or a field related to bioinformatics. A minimum of 1 year of biotechnology research experience working with projects and scientists in the field and 1 year of experience utilizing data analysis tools in biotechnology research, specifically in the field of microarrays. Proficiency with bioinformatics programs and algorithms on both Unix and desktop systems. Proficiency in data analysis programs such as Excel and statistical analysis packages used in the analysis of microarray data. Proficiency in SQL and working with relational database systems. Strong communication and teaching skills for working with colleagues and project researchers.
Scientist II, Research Microarrays
DUTIES: Develop and apply data analysis methods for interpreting high-throughput microarray experiments. Disseminate research results in presentations and writing. Should be flexible and work well in a team environment.
REQUIREMENTS: Ph.D. in Physical Sciences, Computing Sciences or Statistics. Previous experience analyzing large data sets, modeling laboratory experiments, developing quantitative assessments of data reliability, and associated computer programming tasks.

30 Funding UC-Industry research collaborations in:
–bioinformatics
–food and agricultural informatics
–environmental informatics
–medical informatics
–computational aspects of imaging & modeling
LSI Research Proposals: January 26, 2001; May 22, 2001; October 2, 2001
Opportunity Awards: Year round
Learn more about it…

31 “The two technologies that will shape the next century are biotechnology and information technology” Bill Gates
“The two technologies that will have the greatest impact on each other in the new millennium are biotechnology and information technology” Martina McGloughlin

32 Millennium’s Genomics Strategies
Millennium Pharmaceuticals, Inc., the parent company, consists of a technology division and a pharmaceutical division. The technology division has two tasks: 1) developing and acquiring technologies, and 2) moving those technologies into production mode. The company’s philosophy is to industrialize discovery and development, moving as many discovery and development operations as possible into a production mode.
Technology Division: Informatics
Pharmaceutical Division: Small Molecule Drugs
Millennium BioTherapeutics, Inc (Mbio): Proteins, Antibodies, Vaccines, Gene Therapy, Antisense
Cereon Genomics, LLC (Monsanto Subsidiary): Plant Genomics
Millennium Predictive Medicine, Inc (MPM): Diagnostics, Pharmacogenomics, Patient Management


34 Expression Technologies
There are currently four commonly used approaches to high throughput, comprehensive analysis of relative transcript expression levels: the enumeration of expressed sequence tags (ESTs), Serial Analysis of Gene Expression (SAGE), Differential Display Approaches, and Array-based hybridization.
• The enumeration of expressed sequence tags (ESTs) from representative cDNA libraries. A method of approximating the relative representation of the gene transcript within the starting cell population.
• GeneTrace Systems, HHMI, IMAGE Consortium, Incyte, The Institute for Genomic Research
• Serial Analysis of Gene Expression. The enumeration of serially concatenated 9-11 base tags from specially prepared cDNA libraries. The frequency of particular transcripts within the starting cell population is reflected by the number of times the associated sequence tag is encountered within the sequence population.
• Genzyme Molecular Oncology, Johns Hopkins University
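
The counting idea behind both EST enumeration and SAGE can be sketched in a few lines: tally how often each tag is seen, then divide by the total. This is a simplified illustration, not the published SAGE protocol. The fixed 10-base tag length is an assumption within the 9-11 base range on the slide, the function names are hypothetical, and real concatemers also contain anchoring sites that this sketch ignores.

```python
from collections import Counter


def split_concatemer(sequence, tag_length=10):
    """Cut a concatenated read into fixed-length tags, dropping any short remainder."""
    usable = len(sequence) - len(sequence) % tag_length
    return [sequence[i:i + tag_length] for i in range(0, usable, tag_length)]


def relative_abundance(tags):
    """Fraction of all observed tags contributed by each distinct tag."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


# Made-up read: tag A twice, tags C and G once each, plus 3 leftover bases.
read = "A" * 10 + "C" * 10 + "A" * 10 + "G" * 10 + "ACG"
tags = split_concatemer(read)
abundance = relative_abundance(tags)
for tag, fraction in sorted(abundance.items()):
    print(tag, fraction)
```

The resulting fractions are only estimates of relative transcript representation; their reliability depends on how many tags are sampled from the library.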

35 Expression Technologies
• Differential Display Approaches: Fragments defined by specific sequence delimiters can be used as unique identifiers of genes, when coupled with information about fragment length or fragment location within the expressed gene. The relative representation of an expressed gene within a cell can then be estimated based on the relative representation of the fragment associated with that gene within the pool of all possible fragments. A number of different approaches have been developed to exploit this hypothesis for comprehensive expression analysis.
• Curagen Corporation - Quantitative Expression Analysis (QEA)
• Digital Gene Technologies, Inc. - Total Gene expression Analysis (TOGA)
• Display Systems Biotech - Restriction Fragment Differential Display-PCR (RFDD-PCR)
• Genaissance
• GeneLogic - Restriction Enzyme Analysis of Differentially-expressed Sequences (READS)
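
As a rough illustration of the fragment-signature idea above, the sketch below identifies a gene by the pair of delimiter sequences bounding a fragment together with the fragment's length, then tallies observed fragment lengths per gene. It does not reproduce any of the listed commercial methods. The transcripts, gene names and observed lengths are invented, and the two delimiters are simply example recognition sequences.

```python
# Sketch only: transcripts, gene names and observed lengths are made up.
def fragment_signature(transcript, start_site, end_site):
    """Return (start_site, end_site, length) for the first fragment bounded
    by the two delimiters (both included in the length), or None if absent."""
    start = transcript.find(start_site)
    if start == -1:
        return None
    end = transcript.find(end_site, start + len(start_site))
    if end == -1:
        return None
    return (start_site, end_site, end + len(end_site) - start)


GENES = {
    "gene_x": "TTGAATTCAAACCCGGATCCTT",
    "gene_y": "GAATTCAGGATCCA",
}
START, END = "GAATTC", "GGATCC"

# Index each gene by the signature of its delimited fragment.
signature_to_gene = {
    fragment_signature(seq, START, END): name for name, seq in GENES.items()
}

# Fragment lengths as they might be read off a gel for this delimiter pair.
observed_lengths = [18, 13, 18, 18]
counts = {}
for length in observed_lengths:
    gene = signature_to_gene.get((START, END, length))
    if gene is not None:
        counts[gene] = counts.get(gene, 0) + 1
print(counts)
```

Two genes that happen to share a signature would collide in this index, which is one reason the slide pairs delimiters with fragment length or location rather than relying on delimiters alone.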

36 Expression Technologies Array-based hybridization Based on the exquisite specificity of nucleotide interactions, oligonucleotides or cDNA can be used to selectively identify or capture DNA or RNA of specific sequence composition. The primary approaches include array-based technologies that can identify specific expressed gene products on high density formats, including filters, microscope slides, or microchips, and solution-based technologies relying on spectroscopic analyses, such as mass spectrometry. Affymetrix; Axon Instruments, Inc.; BioDiscovery Inc.; BioRobotics; Cartesian Technologies; Clontech; General Scanning Inc.; GeneMachines; Genetic MicroSystems Inc.; GeneTrace Systems; Genome Systems; Genometrix; Genomic Solution; Hyseq, Inc.; Hyseq/Applied Biosystems Division of Perkin Elmer; Incyte; Intelligent Automation Systems/Intelligent Bio-Instruments; Molecular Dynamics; NHGRI Laboratory of Cancer Genetics; NEN Life Science Products; Protogene; Radius BioSciences; Research Genetics, Inc.; Stanford University (Dr. Pat Brown); Synteni; TeleChem International; Rosetta Inpharmatics (Lee Hood)

37 Expression Technologies - Proteomics Most processes manifest themselves at the level of protein activity, but until recently, high-throughput analysis of proteins was not possible. Several technologies now make it feasible to perform mass screening of proteins:  2-D Gel Electrophoresis - LifeProt Protein Expression Database provides a bioinformatics platform for investigating 2D gel images and sequence data/annotation. (Incyte, Oxford Bioscience) Immobiline, ImageMaster software: Amersham/Pharmacia/Molecular Dynamics, BioRad  LC-MS/MS, MALDI-TOF mass spectrometer offers fast and reliable protein identification for high-throughput proteomic studies. MD, Perkin Elmer  PerSeptive Biosystems, PE Biosystems venture, integrates robotics, mass spec, and data searching technologies into one system for high-throughput identification of proteins and peptides  SELDI (Surface-Enhanced Laser Desorption/Ionization) ProteinChip technology: rapid separation, detection and analysis of proteins at the femtomole level directly from biological samples - Ciphergen  Variants on the yeast two-hybrid system, which is widely used for analyzing protein–protein interactions in vivo  Phage Display

38 Genomics Center  Some of the 25 new genomics faculty will belong to the UC Davis Genome Center, the first new product of the Genomics Initiative. ($20m set aside for faculty)  Designed to establish the campus as an international leader in functional and comparative genomics, the center will include scientists specializing in gene studies from a multitude of disciplines, including human and animal medicine, engineering, agriculture, mathematics and the biological and physical sciences.  The Genome Center will also include a revitalized pharmacology and toxicology department in the School of Medicine and a group of bioinformatics faculty members who will provide the computational biology and informatics research needed to analyze the enormous amounts of data generated by the genomics research.

39 Proteomics Companies (Company; Location; Business Approach; Collaborators)
Ciphergen Biosystems Inc.; Palo Alto, CA; Protein arrays; N/A
Genomic Solutions Inc.; Ann Arbor, MI; Automated 2-D gel/MS platform; N/A
Hybrigenics SA; Paris, France; Protein-protein interaction mapping and databases; Pasteur Institute, Small Molecule Therapeutics Inc.
Large Scale Biology Corp.; Rockville, MD and Vacaville, CA; Biological assay; Biosource Technologies Inc. (parent)
Oxford GlycoSciences plc; Oxford, England; Biological assay, protein databases; Incyte Pharma, Pfizer Inc.
Lumicyte; CA; Protein arrays; not given
Proteome Inc.; Beverly, MA; Protein databases; N/A
Proteome Systems Ltd.; Sydney, Australia; Biological assay, protein databases; Dow AgroSciences
Myriad Genetics Inc.; Salt Lake City, Utah; Protein databases, biological assays; not given
CuraGen Corp.; not given; not given; Biogen, Genentech, COR Therapeutics, Glaxo Wellcome, Roche, Pioneer Hi-Bred/Dupont

40 Companies Dependent on Informatics: Affymetrix, Agilent, Alpha Gene, Alpha Innotech, Amersham Pharmacia Biotech, Axon Instruments, BioDiscovery, BioRobotics, Biospace Mesures, Cartesian Technologies, Cellomics, Ciphergen, Clinical Micro Sensors, CLONTECH, CuraGen, Display Systems Biotech, DoubleTwist, GeneData, GeneFocus, GeneMachines, Genetic MicroSystems, Genometrix, Genomic Solutions, GSI Lumonics, Imaging Research, Iris BioTechnologies, Incyte Pharmaceuticals, LION AG, Lumicyte, Micronics, Nanogen, NEN Life Science Products, PHASE 1 Molecular Toxicology, Phoretix International, Proteome, Protogene Laboratories, R&D Systems, Radius Biosciences, Research Genetics, Scanalytics, Sigma-Genosys, Silicon Genetics, TeleChem International, Universal Imaging, V&P Scientific, Virtek, Vysis

41 IP Challenges  June 5, Human Genome Sciences (HGS) applies for a patent on a gene that produces a "receptor" protein that is later called CCR5. HGS has no idea that CCR5 is an HIV receptor.  December U.S. researcher Robert Gallo, the co-discoverer of HIV, and colleagues find three chemicals that inhibit the AIDS virus. They do not know how the chemicals work.  February Edward Berger at the NIH discovers that Gallo's inhibitors work in late-stage AIDS by blocking a receptor on the surface of T-cells.  June In a period of just 10 days, five groups of scientists publish papers saying CCR5 is the receptor for virtually all strains of HIV.  January Schering-Plough researchers tell a San Francisco AIDS conference they have discovered new inhibitors. Merck researchers are known to have made similar discoveries.  Feb. 15, The U.S. Patent and Trademark Office grants HGS a patent on the gene that makes CCR5 and on techniques for producing CCR5 artificially. The decision sends HGS stock flying and dismays researchers.  HGS: identified in whole or in part 95% of the 100,000 or so human genes; 100 human gene patents, 7,500 pending.

42 The Shape of the Wave – 1999 » JGI releases 150 Mbases draft » Celera releases the sequence of Drosophila (140 Mb) » Public “draft” effort reaches halfway point (1,500 Mb) » 20 more Microbial genomes completed (80 Mb but 60,000 genes) » First release of Celera “shotgun” (9,000 Mb) – 2000 » Public “draft” completed (1,500 Mb) » Mouse “draft” begins (500 Mb - comparisons with human) » Two more Celera shotgun releases (18,000 Mb) » 40 more Microbial genomes sequenced (160 Mb, 120,000 genes)

43 Projected Base Pairs [Figure: projected base pairs by year; axis ticks at 5,000, 50,000 and 500,000] Projected size of the sequence database, indicated as the number of base pairs per individual medical record in the US. The amount of digital data necessary to store bases of DNA is only a fraction of the data necessary to describe the world’s microbial biodiversity at one square meter resolution...
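
To give a feel for how little storage raw sequence needs, here is a minimal Python sketch that converts a base count into bytes. It assumes a plain 2-bits-per-base encoding (four possible bases, no ambiguity codes, no compression or annotation), which is an assumption of this sketch rather than something stated on the slide; the example input is the 1,500 Mb figure from the 1999 timeline on slide 42.

```python
def bytes_for_bases(n_bases, bits_per_base=2):
    """Bytes needed to store n_bases at a fixed number of bits per base."""
    total_bits = n_bases * bits_per_base
    return (total_bits + 7) // 8  # round up to whole bytes


# 1,500 Mb, the halfway point of the public draft listed for 1999.
draft_bases = 1_500 * 10**6
draft_bytes = bytes_for_bases(draft_bases)
print(f"{draft_bytes / 10**6:.0f} MB")  # prints "375 MB"
```

Under that assumption, the sequence itself is small by the standards of the data volumes discussed on the following slides; real databases are larger because they also hold annotation, quality scores, and redundancy.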

44 How much information is there in the World? Library of Congress: –3 Petabytes (3,000 TB) »6 billion book pages (1 PB) »13 million photographs (13 TB) »maps, movies, audio tapes Cinema Images –520 Petabytes (520,000 TB) »52 billion photographs / year / 10KB Broadcasting Sound Telephony (Michael Lesk, Bellcore, 1997)
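
The per-item sizes implied by these totals can be checked with a few lines of Python. The sketch assumes decimal units (1 KB = 10^3 bytes, 1 PB = 10^15 bytes); the variable names are illustrative.

```python
KB, MB, TB, PB = 10**3, 10**6, 10**12, 10**15  # decimal units (assumed)

# Library of Congress: 6 billion book pages in 1 PB.
bytes_per_page = (1 * PB) // 6_000_000_000      # about 167 KB per page

# Library of Congress: 13 million photographs in 13 TB.
bytes_per_photo = (13 * TB) // 13_000_000       # 1 MB per photograph

# Images: 52 billion photographs per year at 10 KB each.
images_per_year = 52_000_000_000 * 10 * KB      # total bytes per year

# Image size that a 520 PB yearly total would imply instead.
implied_image_size = (520 * PB) // 52_000_000_000

print(bytes_per_page, bytes_per_photo)
print(images_per_year // TB, "TB per year at 10 KB per image")
print(implied_image_size // MB, "MB per image implied by 520 PB")
```

With decimal units, 52 billion images at 10 KB each come to about 520 TB per year, while a total of 520 PB would correspond to roughly 10 MB per image; which of the two figures the slide intended is not clear from the text.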

45 Business Comparisons
Company                   Revenues           IT Budget     Pct
Bristol-Myers Squibb      15,065,000,000     440,000,      %
Pfizer                    11,306,000,000     300,000,      %
Pacific Gas & Electric    10,000,000,000     250,000,      %
K-Mart                    31,437,000,000     130,000,      %
Wal-Mart                  104,859,000,000    550,000,      %
Sprint                    14,235,000,000     873,000,      %
MCI                       18,500,000,000     1,000,000,    %
United Parcel             22,400,000,000     1,000,000,    %
AMR Corporation           17,753,000,000     1,368,000,    %
IBM                       75,947,000,000     4,400,000,    %
Microsoft                 11,360,000,000     510,000,      %
Chase-Manhattan           16,431,000,000     1,800,000,    %
Nation’s Bank             17,509,000,000     1,130,000,    %