Data Mining: Potentials and Challenges Rakesh Agrawal & Jeff Ullman.


1 Data Mining: Potentials and Challenges Rakesh Agrawal & Jeff Ullman

2 Observations
Transfer of data mining research into deployed applications and commercial products
– Greater success in vertical applications
– Horizontal tools. Examples:
  SAS Enterprise Miner: sophisticated statisticians segment
  DB2 Intelligent Miner: database applications requiring mining
Emergence of the application of data mining in non-conventional domains
– Combination of structured and unstructured data
New challenges due to security/privacy concerns
DARPA initiative to fund data mining research

3 Identifying Social Links Using Association Rules Input: Crawl of about 1 million pages
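The slide names the technique but not the procedure, so the following is only a minimal sketch of how association rules could surface links between people. It assumes each crawled page has already been reduced to the list of person names it mentions; the sample pages, the names, the function `pair_rules`, and both thresholds are hypothetical and do not come from the talk.

```python
from collections import Counter
from itertools import combinations

def pair_rules(pages, min_support=2, min_confidence=0.5):
    """Return rules (lhs, rhs, support, confidence) meaning 'lhs => rhs'.

    support    = number of pages that mention both names
    confidence = support / number of pages that mention lhs
    """
    item_counts = Counter()
    pair_counts = Counter()
    for names in pages:
        unique = sorted(set(names))          # ignore repeats within a page
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), support in pair_counts.items():
        if support < min_support:
            continue
        # A pair yields two candidate rules, one per direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical pages, each reduced to the names found on it.
pages = [
    ["alice", "bob"],
    ["alice", "bob", "carol"],
    ["alice", "dave"],
    ["bob", "alice"],
    ["carol"],
]
rules = pair_rules(pages)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} => {rhs}  support={support}  confidence={confidence:.2f}")
```

A crawl of about 1 million pages would need a frequent-itemset algorithm that avoids counting every pair, but the support and confidence definitions stay the same.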

4 Website Profiling using Classification Input: Example pages for each category during training

5 Discovering Trends Using Sequential Patterns & Shape Queries Input: i) patent database ii) shape of interest

6 Discovering Micro-communities Frequently co-cited pages are related. Pages with large bibliographic overlap are related.
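The two relatedness signals on this slide can be sketched on a toy link graph. This is an illustration under stated assumptions, not the method used in the talk: the graph, the function names, and the choice of Jaccard similarity as the measure of "bibliographic overlap" are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def cocitation_counts(links):
    """Count, for each pair of cited pages, how many pages cite both.

    links maps a page to the set of pages it links to.
    """
    counts = defaultdict(int)
    for cited in links.values():
        for a, b in combinations(sorted(cited), 2):
            counts[(a, b)] += 1
    return dict(counts)

def bibliographic_overlap(links):
    """Jaccard overlap of the out-link sets for each pair of citing pages."""
    overlap = {}
    for a, b in combinations(sorted(links), 2):
        union = links[a] | links[b]
        if union:                            # skip pairs with no out-links
            overlap[(a, b)] = len(links[a] & links[b]) / len(union)
    return overlap

# Hypothetical link graph: p1..p3 are citing pages, x..z are cited pages.
links = {
    "p1": {"x", "y"},
    "p2": {"x", "y", "z"},
    "p3": {"z"},
}
cocited = cocitation_counts(links)
overlap = bibliographic_overlap(links)
print(cocited)   # pages x and y are cited together most often
print(overlap)   # p1 and p2 share most of their references
```

Pairs that score highly on either measure are candidates for the same micro-community; how the scores are thresholded or clustered is left open here.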

7 New Challenges
Privacy-preserving data mining
Data mining over compartmentalized databases

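8 Inducing Classifiers over Privacy Preserved Numeric Data
[Figure: original records (30 | 25K | … and 50 | 40K | …, labeled Alice's age, Alice's salary, John's age) each pass through a Randomizer, giving 65 | 50K | … and 35 | 60K | …; for example, 30 becomes 65 (30+35). The randomized records feed Reconstruct Age Distribution and Reconstruct Salary Distribution, whose outputs go to the Decision Tree Algorithm, which produces the Model.]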
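The randomize-then-reconstruct pipeline in the figure can be sketched as follows. This is a minimal illustration, not the exact procedure from the talk: it assumes additive Gaussian noise with a publicly known standard deviation, fixed-width bins, and an iterative Bayesian update for the bin probabilities. The function names, the noise level, the bin layout, and the synthetic ages are all hypothetical.

```python
import math
import random

def randomize(values, sigma, rng):
    """Add independent Gaussian noise to each value (the 'Randomizer' step)."""
    return [v + rng.gauss(0.0, sigma) for v in values]

def reconstruct(noisy, midpoints, sigma, iterations=50):
    """Estimate the probability of each bin of the ORIGINAL values.

    Starts from a uniform guess and repeatedly re-weights every bin by how
    well its midpoint explains each noisy observation (Bayes' rule).
    """
    def noise_density(d):
        # Unnormalized Gaussian density; the constant cancels in the ratio.
        return math.exp(-d * d / (2.0 * sigma * sigma))

    k = len(midpoints)
    probs = [1.0 / k] * k
    like = [[noise_density(w - m) for m in midpoints] for w in noisy]
    for _ in range(iterations):
        new = [0.0] * k
        for row in like:
            weights = [l * p for l, p in zip(row, probs)]
            total = sum(weights)
            for j in range(k):
                new[j] += weights[j] / total   # posterior of bin j given w
        probs = [x / len(noisy) for x in new]
    return probs

rng = random.Random(0)
ages = [rng.uniform(20, 40) for _ in range(2000)]   # synthetic true ages
noisy_ages = randomize(ages, sigma=25.0, rng=rng)   # what the miner sees
midpoints = [5 + 10 * i for i in range(10)]         # bins 0-10, ..., 90-100
estimate = reconstruct(noisy_ages, midpoints, sigma=25.0)
for m, p in zip(midpoints, estimate):
    print(f"bin centered at {m:2d}: {p:.3f}")
```

Only the noisy values and the noise distribution are needed for reconstruction, which is the point of the approach: the decision tree is then trained against the estimated distributions rather than the individual true records.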

9 Other recent work
Cryptographic approach to privacy-preserving data mining
– Lindell & Pinkas, Crypto 2000
Privacy-preserving discovery of association rules
– Vaidya & Clifton, KDD 2002
– Evfimievski et al., KDD 2002
– Rizvi & Haritsa, VLDB 2002

10 Computation over Compartmentalized Databases

11 Some Hard Problems
Past may be a poor predictor of future
– Abrupt changes
– Wrong training examples
Actionable patterns (principled use of domain knowledge?)
Over-fitting vs. not missing the rare nuggets
Richer patterns
Simultaneous mining over multiple data types
When to use which algorithm?
Automatic, data-dependent selection of algorithm parameters

12 Discussion
Should data mining be viewed as rich querying and deeply integrated with database systems?
– Most current work makes little use of database functionality
Should analytics be an integral concern of database systems?
Issues in data mining over heterogeneous data repositories (relationship to the heterogeneous systems discussion)

13 Summary
Data mining has shown promise but needs much more research
"We stand on the brink of great new answers, but even more, of great new questions" -- Matt Ridley

