Kansas State University Department of Computing and Information Sciences, Kansas State University KDD Lab (www.kddresearch.org)


Collaborative Filtering
Intelligent Information Retrieval and the Grid

Friday 11 October 2002
William H. Hsu
Laboratory for Knowledge Discovery in Databases
Department of Computing and Information Sciences
Kansas State University
This presentation is:

Acknowledgements

Kansas State University Lab for Knowledge Discovery in Databases
  – Graduate research assistants: Haipeng Guo, Roby Joehanes
  – Other grad students: Prashanth Boddhireddy, Siddharth Chandak, Ben B. Perry, Rengakrishnan Subramanian
  – Undergraduate programmers: James W. Plummer, Julie A. Thornton
Joint Work with
  – KSU Bioinformatics and Medical Informatics (BMI) group: Sanjoy Das (EECE), Judith L. Roe (Biology), Stephen M. Welch (Agronomy)
  – KSU Microarray group: Scot Hulbert (Plant Pathology), J. Clare Nelson (Plant Pathology), Jan Leach (Plant Pathology)
  – Kansas Geological Survey, Kansas Biological Survey, KU EECS
Other Research Partners
  – NCSA Automated Learning Group (Michael Welge, Tom Redman)
  – University of Manchester (Carole Goble, Robert Stevens)
  – The Institute for Genomic Research (John Quackenbush, Alex Saeed)
  – International Rice Research Institute (Richard Bruskiewich)

Overview

Filtering
  – Collaborative filtering (CF) and relatives
  – Application to intelligent information retrieval (IR)
Computational Grids
  – High-Performance Computing (HPC) services
      – Scientific data, metadata (ontologies, specifications), documentation
      – Software tools (source codes, application servers)
      – Experimental results
  – Grid initiatives: TeraGrid (USA), eScience (UK, EBI)
Challenge: Personalization of Services
Application: Bioinformatics
Methodology: Learning Relational Probabilistic Models
  – User modeling and collaborative filtering (CF)
  – DESCRIBER system: integrative CF for computational genomics
Current Research and Open Problems

Collaborative Filtering in Action: Amazon.com [1]

[Figure: © 2002 Amazon.com, Inc. Callouts: Cross-Selling (based upon Market Basket Analysis); Collaborative Recommendation]

Collaborative Filtering in Action: Amazon.com [2]

[Figure: © 2002 Amazon.com, Inc. Callouts: Classification and Regression based upon Historical Customer Data; Explanation from Recommender (Decision Support) System]

Filtering and Recommendation Approaches

Collaborative
  – Collect: recorded decisions (actions) of user(s)
  – Infer: preferences of user(s)
  – Model: associational relationships among entities (e.g., purchases)
  – Use to: recommend similar decisions to users in similar context
Structural
  – Collect: recorded decisions (actions) of user(s)
  – Infer: preferences of user(s)
  – Model: causal relationships among entities (e.g., use cases)
  – Use to: make recommendation and explain
Content-Based: Driven by Key Word / Phrase
Collective: Driven by Consensus, Stochastic Mixture Model (e.g., “Swarm Intelligence”, Ant Colony Optimization)
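The collaborative pipeline above (collect, infer, model, use) can be illustrated with a minimal sketch. This is not the method used in the talk: it assumes a simple user-to-user neighborhood scheme with Jaccard similarity, and the function names, user names, and item names are all hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two sets of recorded decisions."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, histories, top_n=3):
    """Rank items the target user has not chosen yet.

    Each candidate item is scored by the summed similarity of the
    other users who chose it (an assumed, simple scoring rule).
    """
    seen = histories[target]
    scores = Counter()
    for user, items in histories.items():
        if user == target:
            continue
        sim = jaccard(seen, items)
        if sim == 0.0:
            continue  # users with no overlap contribute nothing
        for item in items - seen:
            scores[item] += sim
    return [item for item, _ in scores.most_common(top_n)]


# Collect: recorded decisions (hypothetical purchase histories).
histories = {
    "alice": {"book_a", "book_b"},
    "bob": {"book_a", "book_b", "book_c"},
    "carol": {"book_d"},
}

print(recommend("alice", histories))
```

Here "bob" overlaps with "alice" and "carol" does not, so only items from bob's history are candidates. A structural approach would replace the similarity score with a causal model that can also explain the recommendation.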

A Filtering Problem: Text Mining for Information Retrieval (IR)

[Figure: ThemeScapes, © 1999 SPIRIX software. News stories from the WWW in 1997]

Another Filtering Application: Commercial Fraud Monitoring

Stages of Data Mining and Knowledge Discovery in Databases

Adapted from Fayyad, Piatetsky-Shapiro, and Smyth (1996)

NCSA D2K: Visual Programming System for Rapid Application Development in KDD

Data to Knowledge (D2K) © 2002 NCSA

NCSA D2K Workflow: Decision Support in Insurance Pricing

Hsu, Welge, Redman, Clutter (2002) Data Mining and Knowledge Discovery, 6(4):

Computational Grids [1]: High-Performance Distributed Computing

What is The Grid?
  – Infrastructure: Distributed Processing, Networks, Software
  – Paradigm for Very Large-Scale Scientific Computing
End Users of The Grid – Adapted from Goble (2002)
  – Providers
      – Tool builders
      – Systems/network administrators, service providers, etc.
  – Researchers
      – Scientific discipline – e.g., Biology
      – Computational Science and Engineering (CSE) – e.g., Bioinformatics
      – Patent Intelligence!
  – “End users”
      – Developers: e.g., pharmaceutical
      – Medical doctors, patients

Computational Grids [2]: Personalization of Services

What Services?
  – High-Performance Computing (HPC) facilities
      – Compute clusters (Beowulf, NT, etc.)
      – Massively distributed networks
  – Software
  – Scientific data servers
Metadata
  – Ontologies: Definitional Data Models (cf. Semantic Web)
  – Service Type Directory
Dynamic Design of Workflows – myGrid, Goble et al. (2002)
Challenge: Personalization
  – Intelligent Filtering Approach: User Modeling
  – “Users Who Used (Your) Specified Resources Also Used…”
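The "Users Who Used (Your) Specified Resources Also Used…" idea can be sketched as a plain co-occurrence count over usage sessions. This is an illustrative assumption about how such a filter might work, not a description of myGrid or any Grid service; the resource names and the `also_used` helper are hypothetical.

```python
from collections import Counter


def also_used(resource, sessions, top_n=3):
    """Count which other resources appear in sessions that used `resource`."""
    counts = Counter()
    for used in sessions:
        if resource in used:
            counts.update(used - {resource})
    return counts.most_common(top_n)


# Hypothetical usage log: one set of Grid resources per user session.
sessions = [
    {"blast_server", "yeast_microarray_data", "clustering_tool"},
    {"blast_server", "yeast_microarray_data"},
    {"clustering_tool", "rice_genome_data"},
]

print(also_used("blast_server", sessions))
```

Raw counts favor popular resources; a practical filter would likely normalize by how often each resource is used overall.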

DESCRIBER: An Experimental Intelligent Filter

[Figure: system diagram. Labels: Domain-Specific Repositories (Experimental Data; Source Codes and Specifications; Data Models; Ontologies; Models); Data Entity and Source Code Repository Index for Bioinformatics Experimental Research; Personalized Interface; Domain-Specific Collaborative Filtering; New Queries; Learning and Inference Components; Historical Use Case & Query Data; Decision Support Models; Users of Scientific Document Repository; Interface(s) to Distributed Repository]

Example Queries:
  – What experiments have found cell cycle-regulated metabolic pathways in Saccharomyces?
  – What codes and microarray data were used, and why?

DESCRIBER [1]: Overview

[Figure: architecture diagram. Labels, in extraction order: Module 2: Learning & Validation of Bayesian Network Models for Use Cases; Module 4: Learning & Validation of Bayesian Network Models for MAGE Data & Codes; Relational Models of MAGE Data; Module 1: Intelligent Collaborative Filtering Front-End; Data; Historical Use Case & Query Data; Personalized Interface; Module 5; MAGE Data Model; User Estimation of Constraint Parameters; Graphical Models of Use Cases; Module 3; Constrained Models of Use Cases; New Queries]

DESCRIBER [2]: Collaborative Filtering Module

[Figure: Module 1 diagram. Labels: Intelligent Collaborative Filtering Front-End; Personalized Interface; Relational Models of (Domain-Specific) Data; Constrained Models of Use Cases; Relational Probabilistic Model; Constraint Selector; Integrated Reasoning Component: XML Validator and Constraint Checker; Constraints on Repository Content; Response to User; New Query from User]

Computational Genomics and Microarray Data Mining

[Figure: microarray experiment. Labels: Treatment 1 (Control); Treatment 2 (Pathogen); Messenger RNA (mRNA) Extract 1; Messenger RNA (mRNA) Extract 2; cDNA; DNA Hybridization Microarray (under LASER)]

Data Components of A Microarray Experiment: Hybridization

[Figure: data model diagram. Labels: Publication (e.g., PubMed); Source (e.g., Taxonomy); Gene (e.g., GenBank); Experiment; Sample; Hybridization; Array; Normalization/Discretization]

Components of A Microarray Experiment: Computational Gene Expression Modeling

[Figure: workflow diagram. Labels: Computational Workflows (e.g., myGrid); Experimental Services & Metadata (Mage-ML XML); Gene Expression Model; Pathway & Network Learning Specification; Data Preprocessing Specification; Parameter Learning Specification; Model Analysis Specification; Discretization Use Case; Data Mining Use Case; Feature Selection Specification; Validation (e.g., Bootstrap) Use Case]

Graphical Models of Probability for Collaborative Filtering (CF)

Goal: Estimate
  – Filtering: r = t
      – Intuition: infer current state from observations
      – Applications: signal identification
      – Variation: Viterbi algorithm
  – Prediction: r < t
      – Intuition: infer future state
      – Applications: prognostics
  – Smoothing: r > t
      – Intuition: infer past hidden state
      – Applications: signal enhancement
CF Tasks
  – Plan recognition by smoothing
  – Prediction cf. WebCANVAS – Cadez et al. (2000)
Murphy (2002)
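The filtering case (r = t) can be sketched with the forward algorithm for a discrete hidden Markov model. The sketch assumes the quantity being estimated is P(X_t | y_1..y_r), which is inferred from the three cases listed on the slide and is not stated in the transcript. The two-state model, its probabilities, and the function names are hypothetical.

```python
def normalize(dist):
    """Scale non-negative weights so they sum to 1."""
    total = sum(dist)
    return [p / total for p in dist]


def predict(belief, transition):
    """One-step prediction: push the current belief through the transition model."""
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]


def forward_filter(prior, transition, emission, observations):
    """Return the filtered belief P(X_t | y_1..y_t) after each observation."""
    belief = prior
    history = []
    for y in observations:
        predicted = predict(belief, transition)
        # Weight each predicted state by the likelihood of the observation.
        belief = normalize(
            [predicted[j] * emission[j][y] for j in range(len(predicted))]
        )
        history.append(belief)
    return history


# Hypothetical two-state model: state 0 = "interested", state 1 = "not interested".
prior = [0.5, 0.5]
transition = [[0.7, 0.3], [0.3, 0.7]]  # transition[i][j] = P(next = j | current = i)
emission = [[0.9, 0.1], [0.2, 0.8]]    # emission[j][y] = P(observation = y | state = j)

beliefs = forward_filter(prior, transition, emission, observations=[0, 0])
print(beliefs[-1])
print(predict(beliefs[-1], transition))  # belief about the next, unobserved step
```

Calling `predict` on the last filtered belief gives the one-step prediction case; smoothing would additionally require a backward pass over later observations, which this sketch omits.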

Tools for Building Graphical Models

Commercial Tools: Ergo, Netica, TETRAD, Hugin
Bayes Net Toolbox (BNT) – Murphy (1997-present)
  – Distribution page
  – Development group
Bayesian Network tools in Java (BNJ) – Hsu et al. (1999-present)
  – Distribution page
  – Development group
  – Current (re)implementation projects for KSU KDD Lab
      – Continuous state: Minka (2002) – Hsu, Guo, Perry, Boddhireddy
      – Formats: XML BNIF (MSBN), Netica – Guo, Hsu
      – Space-efficient DBN inference – Joehanes
      – Bounded cutset conditioning – Chandak

[Figure: learning workflow diagram. Labels: Learning Environment; Specification Fitness (Inferential Loss); [A] Structure Learning; [B] Parameter Estimation; G = (V, E): Graph Component of BN; D: Data (User, Microarray); B = (V, E, Θ): BN with Probabilities Θ; D_val (Model Validation by Inference); candidate graphs G_1, G_2, G_3, G_4, G_5; Experimenters’ Workbench]
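Step [B], parameter estimation, can be illustrated for one node of a Bayesian network whose graph G is already fixed. The sketch below estimates a conditional probability table from complete data using smoothed relative frequencies. It does not cover step [A] (structure learning) or validation on D_val, and it is not the procedure used in BNJ. The variable names, the add-alpha smoothing, and the toy records are all assumptions.

```python
from collections import Counter, defaultdict


def estimate_cpt(records, child, parents, child_values, alpha=1.0):
    """Estimate P(child | parents) from complete records.

    Uses add-alpha (Laplace) smoothing; returns a dict mapping each
    observed parent configuration to a distribution over child_values.
    """
    counts = defaultdict(Counter)
    for rec in records:
        key = tuple(rec[p] for p in parents)
        counts[key][rec[child]] += 1

    cpt = {}
    for key, ctr in counts.items():
        total = sum(ctr.values()) + alpha * len(child_values)
        cpt[key] = {v: (ctr[v] + alpha) / total for v in child_values}
    return cpt


# Hypothetical discretized microarray data: does a gene appear
# up-regulated under pathogen treatment?
records = (
    [{"pathogen": True, "gene_up": True}] * 3
    + [{"pathogen": True, "gene_up": False}] * 1
    + [{"pathogen": False, "gene_up": False}] * 4
)

cpt = estimate_cpt(records, child="gene_up", parents=["pathogen"],
                   child_values=[True, False])
print(cpt[(True,)])
print(cpt[(False,)])
```

Parent configurations that never occur in the data get no entry here; a fuller implementation would enumerate all configurations and fall back to the smoothed prior.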

References [1]: Intelligent Filtering, IR, and KDD

Intelligent Filtering
  – Taxonomy of Filtering Approaches: Rocha (2001)
  – Microsoft Research: Cadez et al. (1999), Heckerman and Meek (2002), Kadie (2002)
  – Technical report: survey, Hsu (2002)
  – NCSA Automated Learning Group
Machine Learning, Data Mining, and Knowledge Discovery
  – K-State KDD Lab: literature survey and resource catalog (2002)
  – Bayesian Network tools in Java (BNJ): Hsu, Guo, Joehanes, Perry, Thornton (2002)
  – Machine Learning in Java (MLJ): Hsu, Louis, Plummer (2002)

References [2]: The Grid and Bioinformatics

The Grid
  – United Kingdom eScience Initiative: Taylor et al. (2002)
  – Access Grid: Foster and Kesselman (1999), Foster (2002)
  – NSF NPACI lecture: Reed (10 Apr 2002)
Bioinformatics
  – European Bioinformatics Institute Tutorial: Brazma et al. (2001)
  – Hebrew University: Friedman, Pe’er, et al. (1999, 2000, 2002)
  – K-State BMI Group: literature survey and resource catalog (2002)
Kohavi (1998): “Crossing the Chasm”