Prof. Carolina Ruiz Computer Science Department Bioinformatics and Computational Biology Program WPI WELCOME TO BCB4003/CS4803 BCB503/CS583 BIOLOGICAL.


1 WELCOME TO BCB4003/CS4803, BCB503/CS583: BIOLOGICAL AND BIOMEDICAL DATABASE MINING
Prof. Carolina Ruiz
Computer Science Department
Bioinformatics and Computational Biology Program
WPI

2 WHY THIS COURSE?
Biological and Biomedical Research Problems
- Genome (1980’s-1990’s): sequencing, sequence analysis, …
- Proteome (1990’s-2000’s): protein structure, protein-protein interactions, protein pathways
- Central dogma: DNA → (transcription) → RNA → (translation) → Protein
- Transcriptome (mid 1990’s-2000’s): gene expression, DNA/RNA microarrays
- Biological Function (2000’s)
- Applications (2000’s): organism-organism interactions, organism-environment interactions, genome-wide association studies, cancer therapies, drug development
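The central dogma on this slide can be illustrated with a minimal Python sketch. It is not part of the original slides. The function names, the example sequence, and the deliberately partial codon table are assumptions chosen for illustration, and the sketch assumes the input is the coding strand, so transcription reduces to replacing T with U.

```python
# Partial codon table (standard genetic code); only the codons used below.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "GCC": "A",  # alanine
    "UUU": "F",  # phenylalanine
    "GGA": "G",  # glycine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(dna_coding_strand: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna_coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        # "X" marks a codon that is missing from this partial table.
        amino_acid = CODON_TABLE.get(codon, "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCTTTGGATAA"       # hypothetical example sequence
mrna = transcribe(dna)        # DNA -> RNA
protein = translate(mrna)     # RNA -> protein
print(mrna, protein)
```

A real analysis would use a complete codon table and handle reading frames and the template strand.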

3 THIS ALL HAS GENERATED …
- Data: massive datasets and databases of sequence, gene, gene expression, protein, biological function, clinical information, …
- Text: annotations in data sources, abstracts (e.g., Medline), research articles, medical literature (e.g., PubMed, NCBI Bookshelf, Google Scholar), patient records, …
- Ontologies: descriptions of terms and their relationships (e.g., Gene Ontology)

4 CURRENT CHALLENGES
To make sense of and put to use all this information. How?
Computational tools and techniques are needed to help humans in integrating, summarizing, understanding, and taking advantage of accumulated information:
- Data mining
- Text mining
- Data and text mining together

5 WHAT IS DATA [TEXT] MINING? OR MORE GENERALLY, KNOWLEDGE DISCOVERY IN DATABASES (KDD)
“Non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [text]” (Fayyad et al., 1996)
Raw Data [Text] → Data [Text] Mining → Patterns
- Analytical Patterns (rules, decision trees)
- Statistical Patterns (data distribution)
- Visual Patterns
Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. "From Data Mining to Knowledge Discovery in Databases." AI Magazine, pp. 37-54, Fall 1996.

6 DATA MINING METHODS IN BIOINFORMATICS
- Clustering
- Sequence Mining
- Bayesian Methods
- Expectation Maximization (EM)
- Gibbs Sampling
- Hidden Markov Models
- Kernel methods
- Support Vector Machines
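As one concrete instance of the first method listed, here is a minimal k-means clustering sketch in plain Python. It is not from the slides. The gene expression profiles are made-up values, the initial centroids are fixed by hand for reproducibility, and the helper names are assumptions for illustration.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid-update steps a fixed number of times."""
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        centroids = updated
    return labels, centroids


# Hypothetical expression profiles of four genes across three conditions.
profiles = [(1.0, 1.2, 0.9), (0.9, 1.1, 1.0), (5.0, 5.2, 4.8), (5.1, 4.9, 5.0)]
labels, centroids = kmeans(profiles, centroids=[profiles[0], profiles[2]])
print(labels)
```

Production code would normally rely on a library implementation with randomized initialization and a convergence check instead of a fixed iteration count.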

7 TEXT MINING IN BIOINFORMATICS
- Document indexing
- Information retrieval
- Lexical analysis (sentence tokenization, word tokenization, stemming, stop word removal)
- Semantic analysis
- Query processing
- Text classification
- Text clustering
- Text summarization
- (Semi-)automatic curation of literature repositories
- Knowledge discovery from text, hypothesis generation
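The lexical analysis steps named above (sentence tokenization, word tokenization, stop word removal, stemming) can be sketched with the Python standard library alone. This is an illustration, not the course's tooling. The stop word list is a tiny hand-picked sample, and `naive_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

# Tiny illustrative stop word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "are", "and", "to", "by"}


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and extract alphanumeric tokens (hyphens allowed inside)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())


def naive_stem(word):
    """Strip one common suffix; a crude stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Return one list of stemmed, stop-word-free tokens per sentence."""
    return [
        [naive_stem(w) for w in tokenize(sentence) if w not in STOP_WORDS]
        for sentence in split_sentences(text)
    ]


text = "BRCA1 is mutated in breast cancers. The protein binds DNA."
print(preprocess(text))
```

The naive stemmer truncates "mutated" to "mutat", which is typical of stemming: the result is an index key, not necessarily a dictionary word.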

8 DATA/TEXT MINING PROCESS (KDD)
- Information sources
- Data management: databases, data warehouses
- Data “pre”-processing: noisy/missing data, feature selection → cleaned data
- Data analysis / data mining: analytical, statistical, visual models → model/patterns
- Model/pattern evaluation: quantitative, qualitative → “good” model
- Deployment: prediction, decision support on new data
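The stages of the process can be walked through on toy data. The sketch below is an assumption-laden illustration, not the course's method: the records are invented, the "model" is a single threshold placed midway between class means, and it assumes that higher expression indicates disease.

```python
# Hypothetical records: one expression measurement and a class label each.
records = [
    {"expr": 2.1, "label": "healthy"},
    {"expr": 1.8, "label": "healthy"},
    {"expr": None, "label": "healthy"},  # missing value, removed in preprocessing
    {"expr": 6.3, "label": "disease"},
    {"expr": 5.9, "label": "disease"},
    {"expr": 2.4, "label": "healthy"},
    {"expr": 6.8, "label": "disease"},
]


def preprocess(rows):
    """Pre-processing: drop records with a missing measurement."""
    return [r for r in rows if r["expr"] is not None]


def mine_threshold(rows):
    """Mining: place a threshold midway between the two class means."""
    means = {}
    for label in ("healthy", "disease"):
        values = [r["expr"] for r in rows if r["label"] == label]
        means[label] = sum(values) / len(values)
    return (means["healthy"] + means["disease"]) / 2


def predict(threshold, expr):
    """Deployment: classify a new measurement (assumes higher means disease)."""
    return "disease" if expr > threshold else "healthy"


def evaluate(threshold, rows):
    """Evaluation: fraction of held-out records classified correctly."""
    correct = sum(predict(threshold, r["expr"]) == r["label"] for r in rows)
    return correct / len(rows)


cleaned = preprocess(records)
train, held_out = cleaned[:4], cleaned[4:]
threshold = mine_threshold(train)
accuracy = evaluate(threshold, held_out)
print(threshold, accuracy, predict(threshold, 5.5))
```

With only two held-out records the accuracy figure is meaningless as a quality estimate. The point is the order of the stages, each consuming the output of the previous one.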

9 PUTTING IT ALL TOGETHER …
- Data / text / information integration
- Mining over data and text combined
- Visualization
- Other real-world issues
- Developing tools and techniques that are efficient, scalable, and user friendly

10 INTERDISCIPLINARY: TECHNIQUES COME FROM MULTIPLE FIELDS
- Biology and Biomedicine: contributes domain knowledge
- Machine Learning (AI): contributes (semi-)automatic induction of empirical laws from observations and experimentation
- Statistics: contributes language, framework, and techniques
- Pattern Recognition: contributes pattern extraction and pattern matching techniques
- Natural Language Processing (AI), Computational Linguistics: contributes text analysis techniques
- Databases: contributes efficient data storage, data cleansing, and data access techniques
- Data Visualization: contributes visual data displays and data exploration
- High Performance Computing: contributes techniques for efficiently handling complexity
- Signal Processing, Image Processing, …

11 QUESTIONS?
* Images in this presentation were downloaded from Google Images.

