

2 Kansas State University Department of Computing and Information Sciences
Kansas State University KDD Lab (www.kddresearch.org)
Collaborative Filtering
Intelligent Information Retrieval and the Grid
Friday 11 October 2002
William H. Hsu
Laboratory for Knowledge Discovery in Databases
Department of Computing and Information Sciences
Kansas State University
http://www.kddresearch.org
This presentation is: http://www.kddresearch.org/KSU/CIS/KU-20021010.ppt

3 Acknowledgements
Kansas State University Lab for Knowledge Discovery in Databases
  – Graduate research assistants: Haipeng Guo (hpguo@cis.ksu.edu), Roby Joehanes (hpguo@cis.ksu.edu)
  – Other grad students: Prashanth Boddhireddy, Siddharth Chandak, Ben B. Perry, Rengakrishnan Subramanian
  – Undergraduate programmers: James W. Plummer, Julie A. Thornton
Joint Work with
  – KSU Bioinformatics and Medical Informatics (BMI) group: Sanjoy Das (EECE), Judith L. Roe (Biology), Stephen M. Welch (Agronomy)
  – KSU Microarray group: Scot Hulbert (Plant Pathology), J. Clare Nelson (Plant Pathology), Jan Leach (Plant Pathology)
  – Kansas Geological Survey, Kansas Biological Survey, KU EECS
Other Research Partners
  – NCSA Automated Learning Group (Michael Welge, Tom Redman)
  – University of Manchester (Carole Goble, Robert Stevens)
  – The Institute for Genomic Research (John Quackenbush, Alex Saeed)
  – International Rice Research Institute (Richard Bruskiewich)

4 Overview
Filtering
  – Collaborative filtering (CF) and relatives
  – Application to intelligent information retrieval (IR)
Computational Grids
  – High-Performance Computing (HPC) services
      Scientific data, metadata (ontologies, specifications), documentation
      Software tools (source codes, application servers)
      Experimental results
  – Grid initiatives: TeraGrid (USA), eScience (UK, EBI)
Challenge: Personalization of Services
Application: Bioinformatics
Methodology: Learning Relational Probabilistic Models
  – User modeling and collaborative filtering (CF)
  – DESCRIBER system: integrative CF for computational genomics
Current Research and Open Problems

5 Collaborative Filtering in Action: Amazon.com [1]
Cross-Selling (based upon Market Basket Analysis)
Collaborative Recommendation
© 2002 Amazon.com, Inc.

6 Collaborative Filtering in Action: Amazon.com [2]
Classification and Regression based upon Historical Customer Data
Explanation from Recommender (Decision Support) System
© 2002 Amazon.com, Inc.

7 Filtering and Recommendation Approaches
Collaborative
  – Collect: recorded decisions (actions) of user(s)
  – Infer: preferences of user(s)
  – Model: associational relationships among entities (e.g., purchases)
  – Use to: recommend similar decisions to users in similar context
Structural
  – Collect: recorded decisions (actions) of user(s)
  – Infer: preferences of user(s)
  – Model: causal relationships among entities (e.g., use cases)
  – Use to: make recommendation and explain
Content-Based: Driven by Key Word / Phrase
Collective: Driven by Consensus, Stochastic Mixture Model (e.g., “Swarm Intelligence”, Ant Colony Optimization)
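The four collaborative steps on this slide (collect, infer, model, use) can be illustrated with a minimal sketch. It assumes a user-based nearest-neighbour scheme with cosine similarity over sets of purchases, which is one common way to realize the idea and is not taken from the slides. The purchase log and the names `decisions`, `similarity`, and `recommend` are hypothetical.

```python
from math import sqrt

# Collect: hypothetical recorded decisions (purchases) per user.
decisions = {
    "ann": {"book_a", "book_b", "book_c"},
    "bob": {"book_a", "book_b", "book_d"},
    "cat": {"book_e"},
}


def similarity(a, b):
    """Model: cosine similarity between two sets of recorded decisions."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(user, decisions, k=3):
    """Use: rank items the user has not chosen yet by the summed
    similarity of the other users who did choose them."""
    mine = decisions[user]
    scores = {}
    for other, theirs in decisions.items():
        if other == user:
            continue
        weight = similarity(mine, theirs)
        if weight == 0.0:
            continue  # users with no overlap carry no evidence
        for item in theirs - mine:
            # Infer: the inferred preference for an item is this score.
            scores[item] = scores.get(item, 0.0) + weight
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


ann_recommendations = recommend("ann", decisions)
cat_recommendations = recommend("cat", decisions)
```

With this toy log, "ann" overlaps only with "bob", so the only candidate is the one item "bob" bought that "ann" did not. A user with no overlapping neighbours, such as "cat", receives no recommendation, which is the usual cold-start limitation of purely associational models.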

8 A Filtering Problem: Text Mining for Information Retrieval (IR)
ThemeScapes: 6500 news stories from the WWW in 1997
© 1999 SPIRIX software http://www.cartia.com

9 Another Filtering Application: Commercial Fraud Monitoring

10 Stages of Data Mining and Knowledge Discovery in Databases
Adapted from Fayyad, Piatetsky-Shapiro, and Smyth (1996)

11 NCSA D2K: Visual Programming System for Rapid Application Development in KDD
Data to Knowledge (D2K) © 2002 NCSA http://archive.ncsa.uiuc.edu/STI/ALG/d2k/

12 NCSA D2K Workflow: Decision Support in Insurance Pricing
Hsu, Welge, Redman, Clutter (2002) Data Mining and Knowledge Discovery, 6(4):361-391

13 Computational Grids [1]: High-Performance Distributed Computing
What is The Grid?
  – Infrastructure: Distributed Processing, Networks, Software
  – Paradigm for Very Large-Scale Scientific Computing
End Users of The Grid – Adapted from Goble (2002)
  – Providers
      Tool builders
      Systems/network administrators, service providers, etc.
  – Researchers
      Scientific discipline – e.g., Biology
      Computational Science and Engineering (CSE) – e.g., Bioinformatics
      Patent Intelligence!
  – “End users”
      Developers: e.g., pharmaceutical
      Medical doctors, patients

14 Computational Grids [2]: Personalization of Services
What Services?
  – High-Performance Computing (HPC) facilities
      Compute clusters (Beowulf, NT, etc.)
      Massively distributed networks
  – Software
  – Scientific data servers
Metadata
  – Ontologies: Definitional Data Models (cf. Semantic Web)
  – Service Type Directory
Dynamic Design of Workflows – myGrid, Goble et al. (2002) http://www.ebi.ac.uk/mygrid
Challenge: Personalization
  – Intelligent Filtering Approach: User Modeling
  – “Users Who Used (Your) Specified Resources Also Used…”
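The "Users Who Used (Your) Specified Resources Also Used…" idea can be sketched as a simple co-occurrence count over a usage log. This is only an illustration under stated assumptions: the log format (one set of resources per session), the resource names, and the function `also_used` are all hypothetical, and a plain conditional frequency stands in for whatever user model a real Grid service would learn.

```python
from collections import Counter

# Hypothetical usage log: the set of Grid resources touched in each session.
usage_log = [
    {"blast_service", "genbank_mirror", "cluster_a"},
    {"blast_service", "genbank_mirror"},
    {"blast_service", "microarray_db"},
    {"cluster_a", "microarray_db"},
]


def also_used(specified, usage_log, top_n=3):
    """Return (resource, fraction) pairs: among sessions that used every
    specified resource, the fraction that also used each other resource."""
    specified = set(specified)
    counts = Counter()
    matching = 0
    for resources in usage_log:
        if specified <= resources:  # session used all specified resources
            matching += 1
            counts.update(resources - specified)
    if matching == 0:
        return []
    # Most frequent first; ties broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, n / matching) for name, n in ranked[:top_n]]


suggestions = also_used({"blast_service"}, usage_log)
```

In this toy log three sessions used `blast_service`, and two of those also used `genbank_mirror`, so it ranks first with a fraction of 2/3. Raw co-occurrence favours popular resources; that bias is one reason the later slides move toward learned probabilistic models.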

15 DESCRIBER: An Experimental Intelligent Filter
Diagram labels: Domain-Specific Repositories Experimental Data Source Codes and Specifications Data Models Ontologies Models Data Entity and Source Code Repository Index for Bioinformatics Experimental Research Personalized Interface Domain-Specific Collaborative Filtering New Queries Learning and Inference Components Historical Use Case & Query Data Decision Support Models Users of Scientific Document Repository Interface(s) to Distributed Repository
Example Queries:
  – What experiments have found cell cycle-regulated metabolic pathways in Saccharomyces?
  – What codes and microarray data were used, and why?

16 DESCRIBER [1]: Overview
Diagram labels: Module 2 Learning & Validation of Bayesian Network Models for Use Cases Module 4 Learning & Validation of Bayesian Network Models for MAGE Data & Codes Relational Models of MAGE Data Module 1 Intelligent Collaborative Filtering Front-End Data Historical Use Case & Query Data Personalized Interface Module 5 MAGE Data Model User Estimation of Constraint Parameters Graphical Models of Use Cases Module 3 Constrained Models of Use Cases New Queries

17 DESCRIBER [2]: Collaborative Filtering Module
Diagram labels: Module 1 Intelligent Collaborative Filtering Front-End Personalized Interface Relational Models of (Domain-Specific) Data Constrained Models of Use Cases Relational Probabilistic Model Constraint Selector Integrated Reasoning Component: XML Validator and Constraint Checker Constraints on Repository Content Response to User New Query from User

18 Computational Genomics and Microarray Data Mining
Diagram labels: Treatment 1 (Control); Treatment 2 (Pathogen); Messenger RNA (mRNA) Extract 1; Messenger RNA (mRNA) Extract 2; cDNA; DNA Hybridization Microarray (under LASER)

19 Components of A Microarray Experiment: Hybridization
Diagram labels: Publication (e.g., PubMed); Source (e.g., Taxonomy); Gene (e.g., GenBank); Experiment; Sample; Hybridization; Array; Normalization/Discretization; Data

20 Components of A Microarray Experiment: Computational Gene Expression Modeling
Diagram labels: Computational Workflows (e.g., myGrid) Experimental Services & Metadata (Mage-ML XML) Gene Expression Model Pathway & Network Learning Specification Data Preprocessing Specification Parameter Learning Specification Model Analysis Specification Discretization Use Case Data Mining Use Case Feature Selection Specification Validation (e.g., Bootstrap) Use Case

21 Graphical Models of Probability for Collaborative Filtering (CF)
Goal: Estimate
Filtering: r = t
  – Intuition: infer current state from observations
  – Applications: signal identification
  – Variation: Viterbi algorithm
Prediction: r < t
  – Intuition: infer future state
  – Applications: prognostics
Smoothing: r > t
  – Intuition: infer past hidden state
  – Applications: signal enhancement
CF Tasks
  – Plan recognition by smoothing
  – Prediction cf. WebCANVAS – Cadez et al. (2000)
Murphy (2002)
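The target quantity after "Goal: Estimate" did not survive in the transcript. The sketch below assumes the usual state-space reading: r is the time of the most recent observation, t is the time of the hidden state of interest, and the goal is the posterior P(X_t | y_1..y_r). Under that assumption, r = t is filtering and r < t is prediction. The code shows both for a two-state hidden Markov model using the standard forward recursion; smoothing (r > t) would additionally need a backward pass and is left out. All names and numbers are hypothetical.

```python
def normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def predict(belief, transition):
    """Prediction: push the belief one step forward through the
    transition model, with no new observation."""
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n))
            for j in range(n)]


def filter_step(belief, transition, emission, obs):
    """Filtering: predict one step, then weight by the likelihood of
    the new observation and renormalize."""
    predicted = predict(belief, transition)
    return normalize([predicted[j] * emission[j][obs]
                      for j in range(len(predicted))])


def forward_filter(prior, transition, emission, observations):
    """Return the filtered belief after each observation in turn."""
    belief = prior
    history = []
    for obs in observations:
        belief = filter_step(belief, transition, emission, obs)
        history.append(belief)
    return history


# Hypothetical two-state model (state 0 and state 1, observation 0 or 1).
prior = [0.5, 0.5]
transition = [[0.7, 0.3],   # transition[i][j] = P(next state j | state i)
              [0.3, 0.7]]
emission = [[0.9, 0.1],     # emission[j][o] = P(observation o | state j)
            [0.2, 0.8]]

history = forward_filter(prior, transition, emission, [0, 0])
one_step_ahead = predict(history[-1], transition)
```

After two observations of symbol 0, the filtered probability of state 0 rises from about 0.818 to about 0.883. The one-step prediction then drifts back toward the stationary distribution, because no new evidence is applied.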

22 Tools for Building Graphical Models
Commercial Tools: Ergo, Netica, TETRAD, Hugin
Bayes Net Toolbox (BNT) – Murphy (1997-present)
  – Distribution page http://http.cs.berkeley.edu/~murphyk/Bayes/bnt.html
  – Development group http://groups.yahoo.com/group/BayesNetToolbox
Bayesian Network tools in Java (BNJ) – Hsu et al. (1999-present)
  – Distribution page http://bndev.sourceforge.net
  – Development group http://groups.yahoo.com/group/bndev
  – Current (re)implementation projects for KSU KDD Lab
      Continuous state: Minka (2002) – Hsu, Guo, Perry, Boddhireddy
      Formats: XML BNIF (MSBN), Netica – Guo, Hsu
      Space-efficient DBN inference – Joehanes
      Bounded cutset conditioning – Chandak

23 Diagram labels: Learning Environment Specification Fitness (Inferential Loss); [B] Parameter Estimation; [A] Structure Learning; G = (V, E): Graph Component of BN; D: Data (User, Microarray); B = (V, E, Θ): BN with Probabilities Θ; D_val (Model Validation by Inference); G1, G2, G3, G4, G5; Experimenters’ Workbench
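Step [B] on this slide takes a graph G = (V, E) and data D and produces the probabilities of the Bayesian network B. A minimal sketch of that step follows, assuming fully observed discrete data and plain maximum-likelihood estimation by counting. That estimator is a standard textbook choice used here for illustration; the slides do not say which estimator the workbench uses. The variable names, the toy records, and `estimate_cpts` are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_cpts(structure, records):
    """Maximum-likelihood conditional probability tables.

    structure: dict mapping each node to a tuple of its parent nodes.
    records:   list of dicts mapping node name to observed value.
    Returns cpts[node][parent_values][value] = relative frequency.
    """
    cpts = {}
    for node, parents in structure.items():
        counts = defaultdict(Counter)
        for row in records:
            parent_values = tuple(row[p] for p in parents)
            counts[parent_values][row[node]] += 1
        cpts[node] = {
            parent_values: {value: n / sum(counter.values())
                            for value, n in counter.items()}
            for parent_values, counter in counts.items()
        }
    return cpts


# Hypothetical graph: treatment with a pathogen influences gene expression.
structure = {
    "pathogen": (),
    "gene_up": ("pathogen",),
}

# Hypothetical fully observed records (for example, discretized microarray runs).
records = [
    {"pathogen": True, "gene_up": True},
    {"pathogen": True, "gene_up": True},
    {"pathogen": True, "gene_up": False},
    {"pathogen": False, "gene_up": False},
]

cpts = estimate_cpts(structure, records)
```

Here `cpts["gene_up"][(True,)][True]` is 2/3, the fraction of pathogen-treated records in which the gene is up-regulated. Unseen parent configurations get no entry at all, so a practical version would add smoothing or a prior before using the tables for inference on validation data.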

24 References [1]: Intelligent Filtering, IR, and KDD
Intelligent Filtering
  – Taxonomy of Filtering Approaches: Rocha (2001) http://www.c3.lanl.gov/~rocha/GB0/adapweb_GB0.html
  – Microsoft Research: Cadez et al. (1999), Heckerman and Meek (2002), Kadie (2002)
  – Technical report: survey, Hsu (2002) http://www.kddresearch.org/Publications/Techreports/BMI-2001.pdf
  – NCSA Automated Learning Group http://www.ncsa.uiuc.edu/STI/ALG
Machine Learning, Data Mining, and Knowledge Discovery
  – K-State KDD Lab: literature survey and resource catalog (2002) http://www.kddresearch.org/Resources
  – Bayesian Network tools in Java (BNJ): Hsu, Guo, Joehanes, Perry, Thornton (2002) http://bndev.sourceforge.net
  – Machine Learning in Java (MLJ): Hsu, Louis, Plummer (2002) http://mldev.sourceforge.net

25 References [2]: The Grid and Bioinformatics
The Grid
  – United Kingdom eScience Initiative: Taylor et al. (2002) http://www.research-councils.ac.uk/escience
  – Access Grid: Foster and Kesselman (1999), Foster (2002) http://www-fp.mcs.anl.gov/fl/accessgrid
  – NSF NPACI lecture: Reed (10 Apr 2002) http://www.interact.nsf.gov/cise/conferences.nsf/cise_lectures
Bioinformatics
  – European Bioinformatics Institute Tutorial: Brazma et al. (2001) http://www.ebi.ac.uk/microarray/biology_intro.htm
  – Hebrew University: Friedman, Pe’er, et al. (1999, 2000, 2002) http://www.cs.huji.ac.il/labs/compbio/
  – K-State BMI Group: literature survey and resource catalog (2002) http://www.kddresearch.org/Groups/Bioinformatics
Kohavi (1998): “Crossing the Chasm” http://robotics.stanford.edu/~ronnyk/chasm.pdf

