Prof. Carolina Ruiz, Computer Science Department, Bioinformatics and Computational Biology Program, WPI. WELCOME TO BCB4003/CS4803 BCB503/CS583: BIOLOGICAL AND BIOMEDICAL DATABASE MINING


Prof. Carolina Ruiz Computer Science Department Bioinformatics and Computational Biology Program WPI WELCOME TO BCB4003/CS4803 BCB503/CS583 BIOLOGICAL AND BIOMEDICAL DATABASE MINING

WHY THIS COURSE?

Biological and Biomedical Research Problems

Central dogma: DNA → (transcription) → RNA → (translation) → Protein

- Genome (1980's-1990's): sequencing, sequence analysis, …
- Proteome (1990's-2000's): protein structure, protein-protein interactions, protein pathways
- Transcriptome (mid 1990's-2000's): gene expression, DNA/RNA microarrays
- Biological Function (2000's)
- Applications (2000's): organism-organism interactions, organism-environment interactions, genome-wide association studies, cancer therapies, drug development
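The central dogma named on this slide can be illustrated with a minimal Python sketch. It is not part of the original slides: the function names, the example sequence, and the deliberately partial codon table are assumptions chosen for illustration, and a real analysis would use the full genetic code from an established library.

```python
# Illustrative sketch only: a deliberately partial codon table
# (standard genetic code entries; "*" marks a stop codon).
CODON_TABLE = {
    "AUG": "M", "UUU": "F", "UUC": "F", "GGC": "G", "GCU": "A",
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def transcribe(dna_coding_strand: str) -> str:
    """Transcription: coding-strand DNA to mRNA (T is replaced by U)."""
    return dna_coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translation: read codons in frame 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # "X": codon not in the toy table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGTTTGGCTAA")   # hypothetical example sequence
protein = translate(mrna)
print(mrna, protein)
```

The sketch treats the input as the coding strand, so transcription reduces to a T-to-U substitution; working from the template strand would need a reverse complement first.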

THIS ALL HAS GENERATED …

- Data: massive datasets and databases of sequence, gene, gene expression, protein, biological function, clinical information, …
- Text: annotations in data sources, abstracts (e.g., Medline), research articles, medical literature (e.g., PubMed, NCBI Bookshelf, Google Scholar), patient records, …
- Ontologies: descriptions of terms and their relationships (e.g., Gene Ontology)

CURRENT CHALLENGES

To make sense of and put to use all this information.

How? Computational tools and techniques are needed to help humans integrate, summarize, understand, and take advantage of the accumulated information:

- Data mining
- Text mining
- Data and text mining together

WHAT IS DATA [TEXT] MINING? OR MORE GENERALLY, KNOWLEDGE DISCOVERY IN DATABASES (KDD)

“Non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [text]” (Fayyad et al., 1996)

Raw Data [Text] → Data [Text] Mining → Patterns

- Analytical Patterns (rules, decision trees)
- Statistical Patterns (data distribution)
- Visual Patterns

Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. "From Data Mining to Knowledge Discovery in Databases." AI Magazine, Fall 1996.

DATA MINING METHODS IN BIOINFORMATICS

- Clustering
- Sequence Mining
- Bayesian Methods
- Expectation Maximization (EM)
- Gibbs Sampling
- Hidden Markov Models
- Kernel methods
- Support Vector Machines

TEXT MINING IN BIOINFORMATICS

- Document indexing
- Information retrieval
- Lexical analysis (sentence tokenization, word tokenization, stemming, stop word removal)
- Semantic analysis
- Query processing
- Text classification
- Text clustering
- Text summarization
- (Semi-)automatic curation of literature repositories
- Knowledge discovery from text, hypothesis generation
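The lexical analysis steps listed above (sentence tokenization, word tokenization, stop word removal, stemming) can be sketched with the Python standard library alone. This is an illustration, not a method from the slides: the stop word list, the suffix list, and the sample text are assumptions, and the suffix stripper is a crude stand-in for a real stemmer such as Porter's.

```python
import re

# Assumed, deliberately tiny stop word and suffix lists for illustration.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "are", "to", "by"}
SUFFIXES = ("ing", "ed", "es", "s")


def sentence_tokenize(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_tokenize(sentence):
    """Lowercase and keep alphanumeric tokens, allowing internal hyphens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())


def naive_stem(word):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lexical_analysis(text):
    """Return one list of stemmed, non-stop-word tokens per sentence."""
    return [
        [naive_stem(w) for w in word_tokenize(s) if w not in STOP_WORDS]
        for s in sentence_tokenize(text)
    ]


text = "The p53 protein regulates the cell cycle. Mutations are linked to tumors."
tokens = lexical_analysis(text)
print(tokens)
```

The regex-based sentence splitter will break on abbreviations such as "e.g." or "et al.", which are common in biomedical text, so a production pipeline would need a more careful tokenizer.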

DATA/TEXT MINING PROCESS (KDD)

[Process diagram; recoverable stages and labels:]

- information sources → data management (databases, data warehouses)
- data → data "pre"-processing (noisy/missing data, feature selection) → cleaned data
- cleaned data → data analysis / data mining (analytical, statistical, visual models) → model/patterns
- model/patterns → model/pattern evaluation (quantitative, qualitative) → "good" model
- "good" model + new data → deployment (prediction, decision support)
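The stages of the KDD process above can be walked through end to end on toy data. The following Python sketch is an assumption-laden illustration, not the course's method: the records, the train/held-out split, and the one-feature threshold "model" are all hypothetical and stand in for real preprocessing, mining, evaluation, and deployment steps.

```python
from statistics import mean

# Hypothetical toy records: (expression_level, label); None marks a missing value.
raw = [(2.1, "low"), (None, "low"), (7.9, "high"),
       (8.4, "high"), (1.7, "low"), (6.8, "high")]


def preprocess(records):
    """Data "pre"-processing: drop records with missing values."""
    return [(x, y) for x, y in records if x is not None]


def mine(records):
    """Data mining: learn a threshold halfway between the two class means."""
    low = mean(x for x, y in records if y == "low")
    high = mean(x for x, y in records if y == "high")
    return (low + high) / 2


def predict(threshold, x):
    """Deployment: label a new observation with the learned threshold."""
    return "high" if x > threshold else "low"


def evaluate(threshold, records):
    """Quantitative evaluation: fraction of records labeled correctly."""
    return sum(predict(threshold, x) == y for x, y in records) / len(records)


cleaned = preprocess(raw)
train, held_out = cleaned[:3], cleaned[3:]   # assumed split, for illustration only
threshold = mine(train)
accuracy = evaluate(threshold, held_out)
prediction = predict(threshold, 7.2)         # "new data" arriving after deployment
print(threshold, accuracy, prediction)
```

Evaluating on held-out records rather than the training records mirrors the separation between the mining and evaluation stages; with a dataset this small the accuracy figure carries no statistical weight.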

PUTTING ALL TOGETHER …

- Data / Text / Information Integration
- Mining over data and text combined
- Visualization
- Other real-world issues
- Developing tools and techniques that are efficient, scalable, and user friendly

INTERDISCIPLINARY: TECHNIQUES COME FROM MULTIPLE FIELDS

- Biology and Biomedicine: contributes domain knowledge
- Machine Learning (AI): contributes (semi-)automatic induction of empirical laws from observations and experimentation
- Statistics: contributes language, framework, and techniques
- Pattern Recognition: contributes pattern extraction and pattern matching techniques
- Natural Language Processing (AI), Computational Linguistics: contributes text analysis techniques
- Databases: contributes efficient data storage, data cleansing, and data access techniques
- Data Visualization: contributes visual data displays and data exploration
- High Performance Computing: contributes techniques to efficiently handle complexity
- Signal Processing
- Image Processing
- …

QUESTIONS?

* Images in this presentation were downloaded from Google Images.