Kansas State University Department of Computing and Information Sciences Kansas State University KDD Lab (www.kddresearch.org) William.

Presentation transcript:

Kansas State University Department of Computing and Information Sciences Kansas State University KDD Lab (www.kddresearch.org)
Relational Graphical Models for Collaborative Filtering and Recommendation
William H. Hsu
Department of Computing and Information Sciences, Kansas State University
Sunday, 31 July 2005
IJCAI-2005 Workshop W20, Multi-Agent Information Retrieval
This presentation is:
Joint work with: Jeffrey M. Barber, Haipeng Guo, Andrew L. King, Julie A. Thornton
Relational Representation
Collaborative Recommendation, Information Retrieval & Extraction
Multi-Agent Learning from Portal User Data

Outline
Application: Workflow Modeling in Bioinformatics
– Collaborative recommendation (CR)
– Shallow CR: market basket analysis for cross-selling
– Domain: gene expression modeling, proteomics, metabolomics
Methodology: Relational Graphical Models (RGMs)
– Workflow basics
– DESCRIBER project: using RGMs for CR and info retrieval (IR)
– Input, desired output, application, methodology, criteria
Link Analysis Applications
– Finding dynamic relational attributes
– Identity uncertainty in spatial data cleaning
Software for Building Graphical Models: BNJ
Infrastructure and Preliminary Experiments

“Classical” Collaborative Recommendation: Clickstream Mining
Explanation from Recommender (Decision Support) System
Classification and Regression based upon Historical Customer Data (Market Basket Analysis)
© 2003 Amazon.com, Inc.

Shallow Collaborative Recommendation: Market Basket Analysis for Cross-Selling
Cross-Selling based upon Market Basket Analysis – Apriori (Agrawal, 1993)
Basis for Collaborative Recommendation
© 2002 Amazon.com, Inc.
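
As an illustration of the market basket analysis named on this slide, the following is a minimal sketch of an Apriori-style level-wise search for frequent itemsets, followed by rule extraction for cross-selling. The function names, the basket format, and the support and confidence thresholds are assumptions made for this example; they are not taken from the presentation or from any recommender system shown in it.

```python
from itertools import combinations


def frequent_itemsets(baskets, min_support):
    """Return {itemset: count} for every itemset bought together at least min_support times."""
    baskets = [frozenset(b) for b in baskets]
    # Level 1: count single items.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into size-k candidates and prune any
        # candidate that has an infrequent (k-1)-subset (downward closure).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        counts = {c: sum(1 for basket in baskets if c <= basket) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def cross_sell_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) rules drawn from the frequent itemsets."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                # confidence(A -> C) = support(A and C together) / support(A)
                confidence = support / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules


# Hypothetical purchase histories, one set of items per customer basket.
baskets = [{"book", "dvd"}, {"book", "dvd", "cd"}, {"book", "cd"}, {"dvd"}]
itemsets = frequent_itemsets(baskets, min_support=2)
rules = cross_sell_rules(itemsets, min_confidence=0.9)
```

A rule such as "customers who bought the cd also bought the book" is the kind of shallow recommendation the slide refers to: it uses co-occurrence counts only and no model of the user or the items.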

Application to Computational Grid Portal: DESCRIBER Design
Diagram components:
– Domain-Specific Workflow Repositories
– Workflows: Transactional, Objective Views
– Workflow Components: Data Sources, Transformations; Other Services
– Data Entity, Service, and Component Repository Index for Bioinformatics Experimental Research
– Learning over Workflow Instances and Use Cases (Historical User Requirements)
– Use Case & Query/Evaluation Data
– Personalized Interface
– Domain-Specific Collaborative Recommendation
– User Queries & Evaluations
– Decision Support Models
– Users of Information Grid & Scientific Workflow Repository
– Interface(s) to Distributed Repository
Example Queries:
– What experiments have found cell cycle-regulated metabolic pathways in Saccharomyces?
– What codes and microarray data were used? How and why?

Computational Genomics and Microarray Gene Expression Modeling
Microarray experiment: Treatment 1 (Control), Treatment 2 (Pathogen); Messenger RNA (mRNA) Extract 1, Messenger RNA (mRNA) Extract 2; cDNA; DNA Hybridization Microarray (under LASER)
Adapted from Friedman et al. (2000)
Learning Environment
– D: Data (User, Microarray)
– Specification: G = (V, E)
– Fitness (Inferential Loss)
– [A] Structure Learning: G1, G2, G3, G4, G5
– [B] Parameter Estimation: B = (V, E, Θ); G1, G2, G3, G4, G5
– D_val (Model Validation by Inference)
Nir’s Invited Talk at IJCAI: Wednesday, 0900 GMT 03 Aug 2005
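
To make the parameter estimation stage on this slide concrete, here is a minimal sketch that estimates one conditional probability table by maximum likelihood, assuming the graph structure is already fixed and the data are complete and discrete. The variable names ("treatment", "geneA") and the record format are hypothetical and chosen only for illustration; the slide does not specify which estimator was used.

```python
from collections import Counter


def estimate_cpt(records, child, parents):
    """Maximum-likelihood estimate of P(child | parents) from complete discrete records."""
    joint = Counter()          # counts of (parent configuration, child value)
    parent_totals = Counter()  # counts of each parent configuration
    for row in records:
        key = tuple(row[p] for p in parents)
        joint[(key, row[child])] += 1
        parent_totals[key] += 1
    # Relative frequency of each child value within its parent configuration.
    return {
        (key, value): n / parent_totals[key]
        for (key, value), n in joint.items()
    }


# Hypothetical discretized expression data: one record per hybridization.
records = [
    {"treatment": "pathogen", "geneA": "up"},
    {"treatment": "pathogen", "geneA": "up"},
    {"treatment": "pathogen", "geneA": "down"},
    {"treatment": "control", "geneA": "down"},
]
cpt = estimate_cpt(records, child="geneA", parents=["treatment"])
```

With so few records the estimates are unsmoothed relative frequencies; a practical system would likely add a prior (for example, pseudo-counts) before using them for inference.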

Bioinformatics: Data Mining from DNA Hybridization Microarrays
How do we get from microarray data (and other expression data) to a linked network?
© G. Simpson (1999). Used with permission.

Finding Dynamic Relational Attributes: From Workflows to Class Diagrams
Transactional View (cf. UML Sequence Diagram): TAVERNA Workbench, myGrid Project © 2003 Oinn et al.
Objective View (cf. UML Class Diagram): DESCRIBER example schema © 2003 Hsu
Schema labels: cDNA Microarray-Experiment, Gene, Protein, protein-product, role, pathway, functional-description, canonical-name, accession-number, protein-ID, cDNA-sequence, treatment, hybridization, normalization, data, regulation, DNA-sequence, Pathway, pathway-descriptor, pathway-name, pathway-ID, pathway
Legend: Relational Link (Reference Key); Probabilistic Dependency

DESCRIBER: Preliminary Overview of System
Modules:
– Module 1: Collaborative Recommendation Front-End
– Module 2: Learning & Validation of Relational Graphical Models (RGMs) for Experimental Workflows and Components
– Module 3: Estimation of RGM Parameters from Workflow and Component Database
– Module 4: Learning & Validation of RGMs for User Requirements
– Module 5: RGM Parameters from User Query Data
Data flow labels:
– Workflow Logs, Instances, Templates, Components (Services, Data Sources)
– Training Data; Structure & Data
– RGMs of Workflows; Complete RGMs of Workflows (Data-Oriented)
– RGMs of Queries; Complete RGMs of User Queries
– User Queries
– Personalized Interface
– Recommendations/Evaluations (Before and After Use)

Workflow Management [1]: Input and Representation
Input: Implemented Workflows
– Workflow: operational aspect of work procedure
  Data sources: relational databases, object stores (what)
  Structure of tasks (what/how)
  Operations: structured queries, data transformations (how)
  Agents to perform tasks: web services/enactment history (who/where)
– Examples
  Desktop: TIGR TM4 (gene expression data analysis suite)
  Intranet: groupware (e.g., business process management, ORACLE Workflow, IBM WebSphere MQ Workflow)
  Online: Computational science (grid) portals
Representation
– SCUFL (Stevens, 2002): language (DAML+OIL, now OWL)
– TAVERNA (Oinn, 2003): editor
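
The four aspects of a workflow listed on this slide (data sources, task structure, operations, agents) can be sketched as a small data structure. The example below is an illustration only: the dictionary layout, the task names, and the enactment_order helper are assumptions for this sketch and do not reflect the SCUFL language or the TAVERNA editor.

```python
def enactment_order(tasks):
    """Order task names so that each task runs after the tasks it depends on.

    tasks maps a task name to a dict with "operation", "agent", and "inputs".
    Inputs that are not task names are treated as external data sources.
    """
    pending = {
        name: {i for i in spec["inputs"] if i in tasks}
        for name, spec in tasks.items()
    }
    order = []
    while pending:
        # A task is ready once all of its upstream tasks have been scheduled.
        ready = sorted(name for name, deps in pending.items() if not deps)
        if not ready:
            raise ValueError("workflow contains a cycle")
        order.extend(ready)
        for name in ready:
            del pending[name]
        for deps in pending.values():
            deps.difference_update(ready)
    return order


# Hypothetical gene expression workflow: what (inputs), how (operation), who/where (agent).
workflow = {
    "normalize": {"operation": "data transformation", "agent": "analysis service", "inputs": ["load"]},
    "load": {"operation": "structured query", "agent": "database service", "inputs": ["microarray_db"]},
    "cluster": {"operation": "data transformation", "agent": "analysis service", "inputs": ["normalize"]},
}
order = enactment_order(workflow)
```

Keeping the agent separate from the operation matters for the later slides: dropping the agent and enactment details is one way to move from a single workflow instance to a more abstract workflow class.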

Workflow Management [2]: Problem Specification: Output, Criteria
Output
– Relational abstraction over workflow classes
– Underlying graphical models representing workflow instances
Goals
– Personalize UI
– Assist in retrieval, development and repurposing
  Workflows and components
  Decrease time, maintain quality
Criteria
– The hard part!
– Classical evaluation measures: accuracy, precision vs. recall, likelihood – “just a start” (Langley, 2000)
– Utility measures: user ratings, performance
– User modeling: usability, accessibility of grid portal
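
Of the classical evaluation measures listed under Criteria, precision and recall are the simplest to state exactly. The sketch below computes both for one set of recommended workflows against the set a user actually found relevant; the function name and the workflow identifiers are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for a set of recommendations against the relevant set."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    # Precision: fraction of recommended items that were relevant.
    precision = hits / len(recommended) if recommended else 0.0
    # Recall: fraction of relevant items that were recommended.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical workflow identifiers.
precision, recall = precision_recall(
    recommended=["w1", "w2", "w3", "w4"],
    relevant=["w2", "w4", "w5"],
)
```

Here two of the four recommendations are relevant (precision 0.5) and two of the three relevant workflows are recovered (recall 2/3), which shows why the slide lists the two measures as a trade-off rather than a single score.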

Methodology [1]: from Collaborative Recommendation to IR
Applications to Information Retrieval
– Development of new workflows
– Repurposing of prefabricated workflows
– Personalization of interfaces
What is Collaborative?
– Filtering of workflow components by usage
– Recommendation via ratings: EachMovie (McJones et al., 1997), Jester (Goldberg et al., 2001), MovieLens (Miller et al., 2003)
Multi-Agent Aspects
– Brokered services (W3C’s Simple Object Access Protocol v1.2)
– Modeling context of data transformations, services, clients
– Heterogeneous data at multiple levels of abstraction
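
Recommendation via ratings, as in the rating data sets cited on this slide, is often introduced with a user-based neighborhood method. The following is a minimal sketch of that idea, assuming ratings are stored as nested dictionaries and that similarity is cosine similarity over co-rated items; the user names, component names, and both function names are hypothetical, and this is not presented as the method used in DESCRIBER.

```python
import math


def cosine(u, v):
    """Cosine similarity between two users, computed over the items both have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(ratings, user, item):
    """Predict a rating as the similarity-weighted average of other users' ratings."""
    numerator = denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        weight = cosine(ratings[user], their_ratings)
        numerator += weight * their_ratings[item]
        denominator += abs(weight)
    # None signals that no other user has rated the item.
    return numerator / denominator if denominator else None


# Hypothetical ratings of workflow components on a 1-5 scale.
ratings = {
    "ann": {"blast": 5, "clustal": 1},
    "bob": {"blast": 5, "clustal": 1, "hmmer": 4},
    "cat": {"blast": 1, "clustal": 5, "hmmer": 2},
}
prediction = predict(ratings, "ann", "hmmer")
```

The prediction for ann lands closer to bob's rating than to cat's because ann's past ratings agree with bob's. Filtering workflow components by usage follows the same pattern with usage counts in place of explicit ratings.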

Methodology [2]: Relational Models for Multi-Agent IR
Probabilistic Inference and Representation
– Probabilistic Relational Models (Friedman et al., 1999)
– Single instances extracted from TAVERNA editor
– Workflow abstractions: dropping enactment information
– Schemata: relational skeletons, link/reference slot uncertainty
Applied Machine Learning
– General problem: knowledge acquisition and capture
– Schemata: designed with grid portal builder
– Distributions learned from data: link, reference slot
– Clusters: workflows, components, users
– Relations from clusters to one another

Emergent Relational Structure
“Google Approach”
– Hubs/authorities (Brin & Page 1998, Kleinberg 1998)
– Using existing structure: Netscape Open Directory Project (ODP)
– Minimal annotation: meta tags (keywords, description)
“CiteSeer/ResearchIndex Approach”
– Citation indexing (Lawrence et al., 1998, Giles et al., 2002)
– Web of influence (Koller, 2001)
Where is The Relational Structure?
– “Does inherent relational structure exist?” (Russell, SRL-2003)
– Sources of rich info: “link structure”
– Richer sources? Procedural context and beyond!
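
The hubs/authorities idea mentioned on this slide can be sketched as a short mutual-reinforcement iteration in the style of Kleinberg's HITS algorithm: a page is a good authority if good hubs link to it, and a good hub if it links to good authorities. The graph format, the fixed iteration count, and the node names below are assumptions made for this example.

```python
import math


def hits(links, iterations=50):
    """Return (hub, authority) score dicts for a graph given as {node: [linked nodes]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of all nodes linking in.
        auth = {n: 0.0 for n in nodes}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub update: sum the authority scores of all nodes linked to.
        hub = {n: sum(auth[t] for t in links.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth


# Hypothetical link structure: pages a and b point at resources c and d.
hub, auth = hits({"a": ["c", "d"], "b": ["c"]})
top_authority = max(auth, key=auth.get)
top_hub = max(hub, key=hub.get)
```

The scores depend only on link structure, which is the point the slide makes before asking whether richer sources, such as the procedural context of a workflow, are available.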

Identity Uncertainty
How to Tell When Two Descriptors Refer to Same Entity?
Problem
– Coalesced databases
– Multiple sources
Errors and Inconsistencies
– Spatial, temporal error
– Inconsistent descriptors
Clues
– Proximity in space, time
– Similarities in values of key variables (attributes, features)
Applications
– Fraud detection and information security (intrusion detection)
– Data cleaning
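
The two clues listed on this slide, spatial proximity and similarity of key attribute values, can be combined into a simple matching score. The sketch below is one illustrative heuristic, not the probabilistic treatment the presentation works toward: the record fields, the equal weighting, the distance cutoff, and the decision threshold are all assumptions.

```python
import math
from difflib import SequenceMatcher


def match_score(rec_a, rec_b, max_distance=100.0):
    """Combine spatial proximity and name similarity into a score between 0 and 1."""
    distance = math.hypot(rec_a["x"] - rec_b["x"], rec_a["y"] - rec_b["y"])
    # Proximity falls linearly from 1 (same point) to 0 (max_distance or farther).
    proximity = max(0.0, 1.0 - distance / max_distance)
    name_similarity = SequenceMatcher(
        None, rec_a["name"].lower(), rec_b["name"].lower()
    ).ratio()
    # Equal weighting is an arbitrary choice for this sketch.
    return 0.5 * proximity + 0.5 * name_similarity


def likely_same_entity(rec_a, rec_b, threshold=0.8):
    """Flag two descriptors as probably referring to the same entity."""
    return match_score(rec_a, rec_b) >= threshold


# Hypothetical well records from two coalesced databases.
well_a = {"name": "Well 12-A", "x": 10.0, "y": 20.0}
well_b = {"name": "well 12A", "x": 12.0, "y": 21.0}
pump = {"name": "Pump station", "x": 900.0, "y": 900.0}
same = likely_same_entity(well_a, well_b)
different = likely_same_entity(well_a, pump)
```

A hard threshold like this cannot express how uncertain a match is, which is one motivation for modeling identity uncertainty with graphical models instead.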

Spatial Data Cleaning: STARWARD
Groundwater irrigation lifetime estimates in the Ogallala region of the Kansas High Plains aquifer [Wilson et al. 2002]
– Darkest: already depleted
– Next darkest: years
Problems: water well location (identity uncertainty in coalesced spatial databases), descriptive statistics (paraconsistency), spatial outlier detection

BNJ Graphical User Interface [1]: Editor
ALARM Network
© 2005 KSU Bayesian Network tools in Java (BNJ) Development Team

BNJ Graphical User Interface [2]: Graph Visualization and Algorithm Animation
CPCS-54 Network
© 2004 KSU Bayesian Network tools in Java (BNJ) Development Team

Genetic Algorithm for BN Structure Learning
Results: ALARM-13 (Hsu, Guo, Perry & Stilson, 2002)

Software Packages for Building Graphical Models: BNJ, etc.
Commercial Tools: Ergo, Netica, TETRAD, Hugin
Open Source Tools: BNT (Murphy, 2001), gR (Lauritzen et al., 2002)
Bayesian Network tools in Java (BNJ) – Hsu et al. (2002-present)
– Distribution page
– Development group
– Current (re)implementation projects for KSU KDD Lab
  Structure learning and parameter estimation – Hsu, Barber
  Fast Adaptive Importance Sampling, other sampling – King, Guo
  Statistical Machine Translation / Information Extraction (IE) toolkit – Al-Jandal, Meyer, Pydimarri
  Continuous time representations – Barber, Hsu
  Formats: XML BNIF (MSBN), Netica – Guo, Barber, Hsu
  Space-efficient DBN inference – Hsu, Barber

Acknowledgements
Kansas State University Lab for Knowledge Discovery in Databases
– Alumni: Guo (HKUST), Perry (Delaware), Thornton (Kansas State)
– Graduate students: Ph.D. – Al-Jandal, Li; M.S. – Barber (Math), Meyer, Pydimarri
– Undergraduate programmers: King (CIS); Bell, Figueroa (2005 summer interns)
Joint Work with
– KSU Bioinformatics Group (EECE: Das; Agronomy: Welch, Roe; Weather: Knapp)
– NSF FIBR (Brown: Schmitt; NCSU: Purugganan; Wisconsin: Amasino)
Thanks to Collaborators and Other Research Groups
– IJCAI-2001, AAAI/UAI/KDD-2002, IJCAI-2003 (UMBC: Kargupta, ASU: Liu; Iowa: Street; MSR: Horvitz; UConn: Santos; HKUST: Guo)
– BNJ/CSR (CMU: Glymour, Scheines; IA State: Honavar, Margaritis, Tian)
– myGrid/TAVERNA (Manchester: Goble, Stevens; EBI: Oinn; Southampton: Addis)
– The Institute for Genomic Research (Quackenbush, Saeed)
– Kansas Geological Survey (Bohling), Kansas Biological Survey, KU EECS
– NSF ITR (KSU Physics: Rahman, Kara; KSU CIS: Wallentine)