
1 Relational Graphical Models for Collaborative Filtering and Recommendation
Kansas State University Department of Computing and Information Sciences
Kansas State University KDD Lab (www.kddresearch.org)
William H. Hsu, Department of Computing and Information Sciences, Kansas State University
http://www.kddresearch.org
Sunday, 31 July 2005
IJCAI-2005 Workshop W20, Multi-Agent Information Retrieval
This presentation is: http://www.kddresearch.org/KSU/CIS/IJCAI-20050731.ppt
Joint work with: Jeffrey M. Barber, Haipeng Guo, Andrew L. King, Julie A. Thornton
Relational Representation
Collaborative Recommendation, Information Retrieval & Extraction
Multi-Agent Learning from Portal User Data

2 Outline
Application: Workflow Modeling in Bioinformatics
–Collaborative recommendation (CR)
–Shallow CR: market basket analysis for cross-selling
–Domain: gene expression modeling, proteomics, metabolomics
Methodology: Relational Graphical Models (RGMs)
–Workflow basics
–DESCRIBER project: using RGMs for CR and info retrieval (IR)
–Input, desired output, application, methodology, criteria
Link Analysis Applications
–Finding dynamic relational attributes
–Identity uncertainty in spatial data cleaning
Software for Building Graphical Models: BNJ Infrastructure and Preliminary Experiments

3 “Classical” Collaborative Recommendation: Clickstream Mining
Explanation from Recommender (Decision Support) System
Classification and Regression based upon Historical Customer Data (Market Basket Analysis)
© 2003 Amazon.com, Inc.

4 Shallow Collaborative Recommendation: Market Basket Analysis for Cross-Selling
Cross-Selling based upon Market Basket Analysis – Apriori (Agrawal, 1993)
Basis for Collaborative Recommendation
© 2002 Amazon.com, Inc.
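
The market basket analysis named on this slide can be illustrated with a minimal Apriori-style sketch. This is not code from the presentation: the function names `apriori` and `rules`, the absolute support threshold, and the restriction to single-item consequents are all assumptions made for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is an absolute transaction count (an assumption here).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates are the individual items.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count the transactions that contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its k-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

def rules(frequent, min_confidence):
    """List (antecedent, consequent, confidence) rules with one-item consequents."""
    found = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            # Every subset of a frequent itemset is frequent, so this lookup succeeds.
            confidence = count / frequent[antecedent]
            if confidence >= min_confidence:
                found.append((antecedent, frozenset([item]), confidence))
    return found
```

A cross-selling recommender in this style would suggest the consequent of a high-confidence rule when a customer's basket contains the antecedent.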

5 Application to Computational Grid Portal: DESCRIBER Design
Domain-Specific Workflow Repositories
Workflows: Transactional, Objective Views
Workflow Components: Data Sources, Transformations; Other Services
Data Entity, Service, and Component Repository Index for Bioinformatics Experimental Research
Learning over Workflow Instances and Use Cases (Historical User Requirements)
Use Case & Query/Evaluation Data
Personalized Interface
Domain-Specific Collaborative Recommendation
User Queries & Evaluations
Decision Support Models
Users of Information Grid & Scientific Workflow Repository
Interface(s) to Distributed Repository
Example Queries:
–What experiments have found cell cycle-regulated metabolic pathways in Saccharomyces?
–What codes and microarray data were used? How and why?

6 Computational Genomics and Microarray Gene Expression Modeling
Treatment 1 (Control), Treatment 2 (Pathogen)
Messenger RNA (mRNA) Extract 1, Messenger RNA (mRNA) Extract 2
cDNA
DNA Hybridization Microarray (under LASER)
Adapted from Friedman et al. (2000) http://www.cs.huji.ac.il/labs/compbio/
Learning Environment
G = (V, E)
Specification Fitness (Inferential Loss)
B = (V, E, Θ)
[B] Parameter Estimation: G1, G2, G3, G4, G5
[A] Structure Learning: G1, G2, G3, G4, G5
D_val (Model Validation by Inference)
D: Data (User, Microarray)
Nir’s Invited Talk at IJCAI: Wednesday, 0900 GMT 03 Aug 2005
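
As a rough illustration of the parameter estimation step on this slide, the sketch below estimates conditional probability tables for a fixed structure by maximum-likelihood counting over discrete data. The function name `estimate_cpts`, the dictionary encoding of the structure, and the choice of plain counting with no smoothing are assumptions, not the method used in the presentation.

```python
from collections import Counter

def estimate_cpts(structure, data):
    """Maximum-likelihood CPTs for a fixed network structure.

    structure: {node: tuple of parent nodes} (hypothetical encoding of the edges)
    data:      list of dicts mapping every node to an observed discrete value
    returns:   {node: {parent_values: {value: probability}}}
    """
    cpts = {}
    for node, parents in structure.items():
        joint = Counter()     # counts of (parent configuration, node value)
        marginal = Counter()  # counts of each parent configuration
        for row in data:
            key = tuple(row[p] for p in parents)
            joint[(key, row[node])] += 1
            marginal[key] += 1
        table = {}
        for (key, value), n in joint.items():
            table.setdefault(key, {})[value] = n / marginal[key]
        cpts[node] = table
    return cpts
```

Structure learning would search over candidate edge sets and score each one; this sketch covers only the estimation step that follows once a structure is chosen.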

7 Bioinformatics: Data Mining from DNA Hybridization Microarrays
How do we get from microarray data (and other expression data) to a linked network?
© G. Simpson (1999) Used with permission


9 Finding Dynamic Relational Attributes: From Workflows to Class Diagrams
Transactional View (cf. UML Sequence Diagram)
Objective View (cf. UML Class Diagram)
TAVERNA Workbench, myGrid Project © 2003 Oinn et al.
DESCRIBER example schema © 2003 Hsu
Schema labels: cDNA Microarray-Experiment, Gene, Protein, protein-product, role, pathway, functional-description, canonical-name, accession-number, protein-ID, cDNA-sequence, treatment, hybridization, normalization, data, regulation, DNA-sequence, Pathway, pathway-descriptor, pathway-name, pathway-ID, pathway
Legend: Relational Link (Reference Key); Probabilistic Dependency

10 DESCRIBER: Preliminary Overview of System
Module 1: Collaborative Recommendation Front-End
Module 2: Learning & Validation of Relational Graphical Models (RGMs) for Experimental Workflows and Components
Module 3: Estimation of RGM Parameters from Workflow and Component Database
Module 4: Learning & Validation of RGMs for User Requirements
Module 5: RGM Parameters from User Query Data
Diagram labels:
–Workflow Logs, Instances, Templates, Components (Services, Data Sources)
–Training Data; Structure & Data
–RGMs of Workflows; Complete RGMs of Workflows (Data-Oriented)
–RGMs of Queries; Complete RGMs of User Queries
–User Queries
–Recommendations/Evaluations (Before and After Use)
–Personalized Interface

11 Workflow Management [1]: Input and Representation
Input: Implemented Workflows
–Workflow: operational aspect of work procedure
  Data sources: relational databases, object stores (what)
  Structure of tasks (what/how)
  Operations: structured queries, data transformations (how)
  Agents to perform tasks: web services/enactment history (who/where)
–Examples
  Desktop: TIGR TM4 (gene expression data analysis suite)
  Intranet: groupware (e.g., business process management, ORACLE Workflow, IBM WebSphere MQ Workflow)
  Online: Computational science (grid) portals
Representation
–SCUFL (Stevens, 2002): language (DAML+OIL, now OWL)
–TAVERNA (Oinn, 2003): editor

12 Workflow Management [2]: Problem Specification: Output, Criteria
Output
–Relational abstraction over workflow classes
–Underlying graphical models representing workflow instances
Goals
–Personalize UI
–Assist in retrieval, development and repurposing
  Workflows and components
  Decrease time, maintain quality
Criteria
–The hard part!
–Classical evaluation measures: accuracy, precision vs. recall, likelihood – “just a start” (Langley, 2000)
–Utility measures: user ratings, performance
–User modeling: usability, accessibility of grid portal
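
The classical precision and recall measures listed under Criteria can be made concrete with a small sketch. It treats one recommendation list as a set and compares it with a set of items the user judged relevant. The function name and the set-based framing are assumptions for illustration only.

```python
def precision_recall(recommended, relevant):
    """Precision and recall of one recommendation list against a relevant set."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    # Precision: fraction of recommended items that were relevant.
    precision = hits / len(recommended) if recommended else 0.0
    # Recall: fraction of relevant items that were recommended.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Recommending more components tends to raise recall and lower precision, which is the trade-off the slide points to with "precision vs. recall".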

13 Methodology [1]: from Collaborative Recommendation to IR
Applications to Information Retrieval
–Development of new workflows
–Repurposing of prefabricated workflows
–Personalization of interfaces
What is Collaborative?
–Filtering of workflow components by usage
–Recommendation via ratings: EachMovie (McJones et al., 1997), Jester (Goldberg et al., 2001), MovieLens (Miller et al., 2003)
Multi-Agent Aspects
–Brokered services (W3C’s Simple Object Access Protocol v1.2) http://www.w3.org/TR/soap/
–Modeling context of data transformations, services, clients
–Heterogeneous data at multiple levels of abstraction
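
Recommendation via ratings, as in the data sets cited on this slide, is often introduced with a user-based neighborhood method. The sketch below is one such baseline, assuming cosine similarity over co-rated items and a similarity-weighted average. It is not the relational approach of the presentation, and the names `cosine` and `predict` are hypothetical.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity restricted to the items both users rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(ratings, user, item):
    """Similarity-weighted mean of other users' ratings for `item`.

    ratings: {user: {item: numeric rating}}
    Returns None when no other user with nonzero similarity rated the item.
    """
    numerator = denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        weight = cosine(ratings[user], their_ratings)
        numerator += weight * their_ratings[item]
        denominator += abs(weight)
    return numerator / denominator if denominator else None
```

In a workflow portal the "items" could be workflow components and the ratings could come from the user evaluations collected before and after use.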

14 Methodology [2]: Relational Models for Multi-Agent IR
Probabilistic Inference and Representation
–Probabilistic Relational Models (Friedman et al., 1999)
–Single instances extracted from TAVERNA editor
–Workflow abstractions: dropping enactment information
–Schemata: relational skeletons, link/reference slot uncertainty
Applied Machine Learning
–General problem: knowledge acquisition and capture
–Schemata: designed with grid portal builder
–Distributions learned from data: link, reference slot
–Clusters: workflows, components, users
–Relations from clusters to one another

15 Emergent Relational Structure
“Google Approach”
–Hubs/authorities (Brin & Page 1998, Kleinberg 1998)
–Using existing structure: Netscape Open Directory Project (ODP)
–Minimal annotation: meta tags (keywords, description)
“CiteSeer/ResearchIndex Approach”
–Citation indexing (Lawrence et al., 1998, Giles et al., 2002)
–Web of influence (Koller, 2001)
Where is The Relational Structure?
–“Does inherent relational structure exist?” (Russell, SRL-2003)
–Sources of rich info: “link structure”
–Richer sources? Procedural context and beyond!
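
The hubs/authorities idea mentioned under the “Google Approach” can be sketched as the mutually reinforcing iteration usually associated with Kleinberg's work. The adjacency-dictionary input, the fixed iteration count, and the function name `hits` are assumptions made for this illustration.

```python
def hits(links, iterations=50):
    """Hub and authority scores by repeated mutual reinforcement.

    links: {page: [pages it links to]}
    returns: (hub scores, authority scores), each L2-normalized
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking to the node.
        auth = {n: 0.0 for n in nodes}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub score: sum of authority scores of the pages the node links to.
        hub = {n: sum(auth[t] for t in links.get(n, [])) for n in nodes}
        # Normalize both score vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth
```

The same iteration applies to any link structure, for example workflows that reference shared components, which is one way to read the slide's question about where relational structure comes from.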


17 Identity Uncertainty
How to Tell When Two Descriptors Refer to Same Entity?
Problem
–Coalesced databases
–Multiple sources
Errors and Inconsistencies
–Spatial, temporal error
–Inconsistent descriptors
Clues
–Proximity in space, time
–Similarities in values of key variables (attributes, features)
Applications
–Fraud detection and information security (intrusion detection)
–Data cleaning
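
The two clues on this slide, proximity and attribute similarity, can be combined in a simple rule-based matcher. The sketch below is a deliberately naive heuristic, not the probabilistic treatment the presentation is concerned with. The record layout (`x`, `y` plus arbitrary attributes), the thresholds, and the function name are all hypothetical.

```python
from math import hypot

def same_entity(a, b, max_distance=50.0, min_agreement=0.5):
    """Guess whether two records describe the same entity.

    a, b: dicts with numeric "x" and "y" coordinates plus other attributes.
    The records match when they are close in space AND a sufficient fraction
    of their shared non-spatial attributes agree exactly.
    """
    distance = hypot(a["x"] - b["x"], a["y"] - b["y"])
    shared = (set(a) & set(b)) - {"x", "y"}
    if not shared:
        return False  # no descriptors to compare
    agreement = sum(1 for key in shared if a[key] == b[key]) / len(shared)
    return distance <= max_distance and agreement >= min_agreement
```

Hard thresholds like these are brittle under the spatial error and inconsistent descriptors the slide lists, which is the motivation for modeling identity as an uncertain quantity instead.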

18 Spatial Data Cleaning: STARWARD
Groundwater irrigation lifetime estimates in the Ogallala region of the Kansas High Plains aquifer. [Wilson et al. 2002] http://snurl.com/39kz
Darkest: already depleted
Next darkest: 25-50 years
Problems: water well location (identity uncertainty in coalesced spatial databases), descriptive statistics (paraconsistency), spatial outlier detection


20 BNJ Graphical User Interface [1]: Editor
ALARM Network
© 2005 KSU Bayesian Network tools in Java (BNJ) Development Team

21 BNJ Graphical User Interface [2]: Graph Visualization and Algorithm Animation
CPCS-54 Network
© 2004 KSU Bayesian Network tools in Java (BNJ) Development Team

22 Genetic Algorithm for BN Structure Learning Results: ALARM-13 (Hsu, Guo, Perry & Stilson, 2002)

23 Software Packages for Building Graphical Models: BNJ, etc.
Commercial Tools: Ergo, Netica, TETRAD, Hugin
Open Source Tools: BNT (Murphy, 2001), gR (Lauritzen et al., 2002)
Bayesian Network tools in Java (BNJ) – Hsu et al. (2002-present)
–Distribution page http://bnj.sourceforge.net
–Development group http://groups.yahoo.com/group/bndev
–Current (re)implementation projects for KSU KDD Lab
  Structure learning and parameter estimation – Hsu, Barber
  Fast Adaptive Importance Sampling, other sampling – King, Guo
  Statistical Machine Translation / Information Extraction (IE) toolkit – Al-Jandal, Meyer, Pydimarri
  Continuous time representations – Barber, Hsu
  Formats: XML BNIF (MSBN), Netica – Guo, Barber, Hsu
  Space-efficient DBN inference – Hsu, Barber

24 Acknowledgements
Kansas State University Lab for Knowledge Discovery in Databases
–Alumni: Guo (HKUST), Perry (Delaware), Thornton (Kansas State)
–Graduate students: Ph.D. – Al-Jandal, Li; M.S. – Barber (Math), Meyer, Pydimarri
–Undergraduate programmers: King (CIS); Bell, Figueroa (2005 summer interns)
Joint Work with
–KSU Bioinformatics Group (EECE: Das; Agronomy: Welch, Roe; Weather: Knapp)
–NSF FIBR (Brown: Schmitt; NCSU: Purugganan; Wisconsin: Amasino) www.egad.ksu.edu
Thanks to Collaborators and Other Research Groups
–IJCAI-2001, AAAI/UAI/KDD-2002, IJCAI-2003 (UMBC: Kargupta, ASU: Liu; Iowa: Street; MSR: Horvitz; UConn: Santos; HKUST: Guo) www.kddresearch.org/Workshops
–BNJ/CSR (CMU: Glymour, Scheines; IA State: Honavar, Margaritis, Tian)
–myGrid/TAVERNA (Manchester: Goble, Stevens; EBI: Oinn; Southampton: Addis)
–The Institute for Genomic Research (Quackenbush, Saeed)
–Kansas Geological Survey (Bohling), Kansas Biological Survey, KU EECS
–NSF ITR (KSU Physics: Rahman, Kara; KSU CIS: Wallentine) http://www.phys.ksu.edu/~a0kara01/ITR/

