Some Interesting Problems
Rakesh Agrawal
IBM Almaden Research Center



Foundations
- What is data mining?
  - A collection of techniques?
  - A set of composable operations (à la relational algebra)?
- Hints:
  - Inductive Databases (Mannila)
  - Relational Calculus + Statistical Quantifiers (Imielinski)

Privacy Implications
- Can we build accurate data models while preserving the privacy of individual records?
- Hints:
  - Randomization (Agrawal & Srikant): replace x by x + y, where y is drawn from a known distribution
  - Anonymization (crypto literature)
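The randomization hint can be illustrated with a minimal sketch. It assumes Gaussian noise and recovers only the mean and variance of the original values by subtracting the known noise variance. This is a simplified moment-based stand-in, not the distribution-reconstruction procedure of Agrawal & Srikant, and the function names and the sample "ages" data are hypothetical.

```python
import random
import statistics


def randomize(values, noise_std, rng):
    """Replace each x by x + y, with y drawn from a known N(0, noise_std^2)."""
    return [x + rng.gauss(0.0, noise_std) for x in values]


def estimate_moments(perturbed, noise_std):
    """Estimate the mean and variance of the original values.

    The noise is independent of x and has mean 0, so
    E[x + y] = E[x] and Var[x + y] = Var[x] + noise_std^2.
    """
    mean = statistics.fmean(perturbed)
    variance = max(statistics.pvariance(perturbed) - noise_std ** 2, 0.0)
    return mean, variance


rng = random.Random(0)  # fixed seed so the sketch is repeatable
ages = [rng.uniform(20, 60) for _ in range(20000)]  # hypothetical private values
perturbed = randomize(ages, noise_std=15.0, rng=rng)

est_mean, est_var = estimate_moments(perturbed, noise_std=15.0)
print(f"true mean {statistics.fmean(ages):.2f}, estimated {est_mean:.2f}")
print(f"true variance {statistics.pvariance(ages):.2f}, estimated {est_var:.2f}")
```

Each perturbed record reveals little about its original value, yet the aggregate statistics remain recoverable because the noise distribution is known.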

Web Mining: Beyond Click Streams
- Mining knowledge bases from the web
  - Completeness
  - Accuracy
  - Malicious spam
- Hints: Brin’s book experiment, etc.

Web Mining: Beyond hrefs
- What other social behaviors exist on the web, and how can we make use of them?
- Hints: the viral marketing paper in this conference, etc.

Actionable Patterns
- Principled use of domain knowledge for
  - discarding uninteresting patterns
  - performance
- Hints: papers in the recent KDD conferences

Simultaneous mining over multiple data types
- Not just
  - Relational tables
  - Time series
  - Textual documents
- But patterns across all of them

Some more problems
- Online, incremental algorithms over data streams
- When to retire the past data
- Long sequential patterns
- Discovering richer patterns (trees and DAGs)
- Automatic, data-dependent selection of algorithm parameters
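As a rough illustration of the first two problems, the sketch below keeps item counts over a stream of transactions and retires a transaction once it falls outside a fixed-size window. The fixed window is only one simple retirement policy, assumed here for illustration; the class name and the sample stream are hypothetical.

```python
from collections import Counter, deque


class SlidingWindowCounts:
    """Incrementally maintained item counts over the last `window_size` transactions."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()    # transactions currently in the window
        self.counts = Counter()  # item -> number of window transactions containing it

    def add(self, transaction):
        items = frozenset(transaction)
        self.window.append(items)
        self.counts.update(items)
        if len(self.window) > self.window_size:
            # Retire the oldest transaction and undo its contribution.
            expired = self.window.popleft()
            self.counts.subtract(expired)
            for item in expired:
                if self.counts[item] <= 0:
                    del self.counts[item]

    def support(self, item):
        """Fraction of transactions in the current window that contain `item`."""
        if not self.window:
            return 0.0
        return self.counts[item] / len(self.window)


stream = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "c"}]
swc = SlidingWindowCounts(window_size=3)
for transaction in stream:
    swc.add(transaction)

print({item: swc.support(item) for item in sorted(swc.counts)})
```

Each arrival costs time proportional to the size of the arriving and expiring transactions, so the counts never need to be recomputed from scratch.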

What not to work on?
- The field is too young! Let every flower bloom!
- Too early to say we don’t need new algorithms
  - Impressive results of the PVSM algorithm
- Emphasize evaluation and benchmarks
  - Interesting research issues

Applications most likely to benefit from data mining
- Web applications (I think)
- Bioinformatics (I hope!)

Inhibitors
- Insufficient skill base (education)
- Usability

"The true delight is in the finding out, rather than in the knowing."
Isaac Asimov