Overview of Biomedical Informatics

Vipin Kumar, University of Minnesota
Team Members: Michael Steinbach, Rohit Gupta, Gowtham Atluri, Gang Fang, Gaurav Pandey, Sanjoy Dey, Vanja Paunic
Collaborators: Brian Van Ness, Bill Oetting, Gary L. Nelsestuen, Christine Wendt, Piet C. de Groen, Michael Wilson
Research supported by NSF, IBM, BICB-UMR, Pfizer
Nov 12th, Understanding Biotechnology – The Science of the ‘Omics’

Biomedical Informatics
Recent technological advances are helping to generate large amounts of biomedical data:
- Data from high-throughput experimental techniques
  - Gene expression data
  - Biological networks
  - Proteomics and metabolomics data
  - Single Nucleotide Polymorphism (SNP) data
- Electronic medical records
  - The IBM-Mayo Clinic partnership has created a database of 5 million patients
Analysis of these large-scale data sets offers great potential benefits:
- Automated analysis of patient histories for customized treatment
- Discovery of biomarkers for complex diseases and other phenotypes
- Cheminformatics and drug discovery

Large-scale Data is Everywhere!
There has been enormous data growth in both commercial and scientific databases due to advances in data generation and collection technologies.
New mantra: gather whatever data you can, whenever and wherever possible.
Expectation: gathered data will have value either for the purpose collected or for a purpose not envisioned.
Examples: homeland security, business data, geo-spatial data, computational simulations, sensor networks, scientific data.

Data Mining
Automated techniques for analyzing large data sets. Data mining draws ideas from machine learning/AI, pattern recognition, statistics, and database systems.
Tasks: predictive modeling, clustering, association rules, anomaly detection.

Predictive Modeling: Classification
Find a model for the class attribute as a function of the values of the other attributes.
Example: a model for predicting credit worthiness.
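As a minimal sketch of classification, the example below learns a one-rule model (a decision stump) that predicts a class label from a single attribute. The records, the income attribute, and the "good"/"bad" labels are hypothetical illustrations, not data from the slides, and a stump is only one of many possible model families.

```python
from collections import Counter

# Hypothetical training records: (annual income in thousands, class label).
# The label plays the role of the class attribute to be predicted.
records = [
    (125, "good"), (100, "good"), (70, "bad"), (120, "good"),
    (95, "bad"), (60, "bad"), (220, "good"), (85, "bad"),
]

def majority(labels):
    """Return the most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows):
    """Pick the income threshold that misclassifies the fewest training rows."""
    best = None
    for t in sorted({income for income, _ in rows}):
        left = [lab for income, lab in rows if income < t]
        right = [lab for income, lab in rows if income >= t]
        if not left or not right:
            continue  # a split must put records on both sides
        l_lab, r_lab = majority(left), majority(right)
        errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
        if best is None or errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    return best[1:]  # (threshold, label below threshold, label at or above)

def predict(model, income):
    threshold, below, at_or_above = model
    return below if income < threshold else at_or_above

model = fit_stump(records)
```

Calling `predict(model, 90)` then applies the learned rule to a new, unlabeled record.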

Discovering biomarkers
Gene expression data
- Given: n labeled subjects, each with expression levels of p genes
- Objective: build a predictive model to identify cancer subtypes
- Classical study of cancer subtypes: Golub et al. (1999), identification of diagnostic genes
SNP data
- Given: n labeled subjects, each with genotypes of p SNPs
- Objective: build a model that uses genotypes to predict labels

            SNP 1   SNP 2   SNP 3   ...   Class
Patient 1   AC      GT      AA      ...   1
Patient 2   GG      ...     ...     ...   ...
...
Patient n   CC      AG      ...     ...   ...
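One way to build a model from genotypes to labels can be sketched with a categorical naive Bayes classifier. This is an illustrative choice, not the method used in the talk; the six subjects, their genotypes, and the labels below are hypothetical, and `alpha` is an assumed Laplace smoothing constant.

```python
import math
from collections import Counter

# Hypothetical toy data: each row holds one subject's genotypes at p = 3 SNPs.
genotypes = [
    ("AC", "GT", "AA"),
    ("AA", "GT", "AA"),
    ("AC", "GG", "AT"),
    ("CC", "TT", "TT"),
    ("CC", "GT", "TT"),
    ("AC", "TT", "TT"),
]
labels = [1, 1, 1, 0, 0, 0]

def fit(X, y, alpha=1.0):
    """Count genotype frequencies per class and per SNP."""
    class_counts = Counter(y)
    p = len(X[0])
    # counts[c][j][g] = number of class-c subjects with genotype g at SNP j
    counts = {c: [Counter() for _ in range(p)] for c in class_counts}
    values = [set() for _ in range(p)]  # genotypes observed at each SNP
    for row, c in zip(X, y):
        for j, g in enumerate(row):
            counts[c][j][g] += 1
            values[j].add(g)
    return class_counts, counts, values, alpha

def predict(model, row):
    """Return the class with the highest smoothed log-posterior score."""
    class_counts, counts, values, alpha = model
    n = sum(class_counts.values())
    scores = {}
    for c, nc in class_counts.items():
        score = math.log(nc / n)  # log prior
        for j, g in enumerate(row):
            k = len(values[j] | {g})  # number of genotype categories at SNP j
            score += math.log((counts[c][j][g] + alpha) / (nc + alpha * k))
        scores[c] = score
    return max(scores, key=scores.get)

model = fit(genotypes, labels)
```

With p in the thousands and n in the low hundreds, as in the myeloma study that follows, any such model would need feature selection and cross-validation; the sketch leaves both out.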

Predicting short-term vs. long-term survivors among myeloma subjects
3404 SNPs, selected according to potential relevance to myeloma
Cases: 70 patients who survived less than 1 year
Controls: 73 patients who survived longer than 3 years
(Figure: SNP data for cases and controls)
Brian Van Ness et al., Genomic Variation in Myeloma: Design, content and initial application of the Bank On A Cure SNP Panel to detect associations with progression free survival, BMC Medicine, Volume 6, pp 26, 2008.

Clustering
Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups.
Applications:
- Finding groups of similar genes or proteins based on their expression profiles
- Clustering patients based on phenotypic and genotypic factors for efficient disease diagnosis
- Market segmentation
- Document clustering
Figure courtesy: Michael Eisen (Michael Eisen et al., 1999)
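Grouping genes by expression profile can be sketched with average-linkage agglomerative clustering on a correlation-based distance. The gene names and expression values are hypothetical, and the choice of linkage and of 1 minus Pearson correlation as the distance is an assumption made for illustration.

```python
import math

# Hypothetical expression profiles: gene -> expression level in four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.1, 6.0, 3.9, 2.0],
}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def distance(x, y):
    """0 for perfectly correlated profiles, 2 for perfectly anti-correlated."""
    return 1.0 - pearson(x, y)

def average_linkage(c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(distance(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

def cluster(names, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[name] for name in names]
    while len(clusters) > k:
        index_pairs = [(i, j) for i in range(len(clusters))
                       for j in range(i + 1, len(clusters))]
        i, j = min(index_pairs,
                   key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

groups = cluster(list(profiles), 2)
```

Genes whose levels rise together across samples end up in one group and genes whose levels fall together in the other, even though their absolute scales differ.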

Association Pattern Discovery
Given a set of records, each of which contains some number of items from a given collection, produce dependency rules that predict the occurrence of an item based on occurrences of other items.
Biological applications:
- Identifying functional modules in protein interaction networks
- Identifying transcription modules in gene expression data
- Identifying biological entities associated with disease phenotypes
- Biomarker discovery from genomic data, e.g. gene expression, single-nucleotide polymorphism (SNP), and metabolite data
Rules discovered: {Milk} --> {Coke}, {Diaper, Milk} --> {Beer}
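The two rules can be scored with support and confidence, the standard measures for association rules. The slide does not name these measures, and its transaction table did not survive in the transcript, so the five market-basket records below are an assumed illustration.

```python
# Illustrative market-basket transactions (an assumption; the original
# table is not preserved in the transcript).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)

milk_to_coke = confidence({"Milk"}, {"Coke"})
diaper_milk_to_beer = confidence({"Diaper", "Milk"}, {"Beer"})
```

In the biological applications listed above, a record would be a sample or a protein and the items would be, for example, genes that are highly expressed in it.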

Discovery of Discriminative Patterns from Lung Cancer Gene Expression Data
67 normal samples, 102 cancer patients, 8787 genes [Stearman et al. 2005], [Su et al. 2007], [Bhattacharjee et al. 2001]
(Figure: visualization of a size-10 pattern found using a new discriminative pattern finding technique)
Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar, Subspace Differential Coexpression Analysis: Problem Definition and A General Approach, In the Proceedings of the 15th Pacific Symposium on Biocomputing (PSB), pp , 2010.
The pattern is enriched with the TNF/NFkB signaling pathway, which is well known to be related to lung cancer. P-value: 1.4 × 10^-5 (6/10 overlap with the pathway).
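A pathway-enrichment P-value for an overlap such as 6/10 is commonly computed as a hypergeometric tail probability. The slide does not say which test or gene universe was used, so the sketch below is an assumption and does not attempt to reproduce the reported value; in particular, `PATHWAY_SIZE` is a hypothetical placeholder.

```python
from math import comb

def hypergeom_tail(N, K, n, k):
    """P(overlap >= k) when n genes are drawn without replacement from a
    universe of N genes, K of which belong to the pathway."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total

# Assumed inputs: the 8787 genes as the universe, a size-10 pattern, and
# an overlap of 6. PATHWAY_SIZE is hypothetical, not taken from the slide.
PATHWAY_SIZE = 200
p_value = hypergeom_tail(N=8787, K=PATHWAY_SIZE, n=10, k=6)
```

The result depends strongly on the pathway size and on the choice of universe, which is why both are flagged as assumptions.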

Discriminative Metabolite Patterns from Liver Cirrhosis Data
41 alcoholic liver cirrhosis cases (rows 1-41), 19 controls (rows 42-60), 3610 metabolites. Data from Gary Nelsestuen et al.
(Figure: a sample group of five metabolites having very similar, in relative terms, intensity values in cases, but mostly absent in controls. Panel (a): the rank values, where black is 10 and white is 0. Panel (b): the original intensity values.)
Gaurav Pandey, Gowtham Atluri, Michael Steinbach, Chad L. Myers and Vipin Kumar, An Association Analysis Approach to Biclustering, Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), , 2009.

Summary
Data mining techniques hold great promise for data-driven hypothesis generation in the biomedical domain.
Ample scope exists for the development and application of novel techniques for the analysis of different types of biomedical data.

For further information…
Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Introduction to Data Mining, Addison-Wesley, 2005.
