
1 Practical Lessons of Data Mining at Yahoo!
Intelligent Database Systems Lab, N.Y.U.S.T. I. M.
National Yunlin University of Science and Technology (國立雲林科技大學)
Presenter: Jun-Yi Wu
Authors: Ye Chen, Dmitry Pavlov, Pavel Berkhin, Aparna Seetharaman, Albert Meltzer
Venue: CIKM 2009

2 Outline
- Motivation
- Objective
- Experience
- Conclusion
- Comments

3 Motivation
- The usage of data in many commercial applications has grown at an unprecedented pace over the last decade.
- While successful data mining efforts lead to major business advances, numerous less publicized efforts have failed for one reason or another.
Figure: raw data turned into information, illustrated by an item purchase frequency table.
Item(s)        Purchase frequency
A              45%
B              42.5%
C              40%
A and B        25%
A and C        20%
B and C        15%
A, B, and C    5%
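The step from raw data to information in the Motivation slide's purchase frequency table can be illustrated with a minimal sketch. It assumes the raw data is a list of purchase baskets; the function name `itemset_frequencies` and the sample baskets are hypothetical, and the sample does not reproduce the percentages on the slide.

```python
from collections import Counter
from itertools import combinations


def itemset_frequencies(baskets):
    """Return, for each item combination, the fraction of baskets containing it."""
    total = len(baskets)
    if total == 0:
        return {}
    counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {combo: n / total for combo, n in counts.items()}


# Hypothetical raw data: one list of purchased items per transaction.
baskets = [
    ["A", "B"],
    ["A", "C"],
    ["B"],
    ["A", "B", "C"],
]

freq = itemset_frequencies(baskets)
for combo in sorted(freq, key=lambda c: (len(c), c)):
    print(" and ".join(combo), f"{freq[combo]:.0%}")
```

Enumerating every combination is only practical for small baskets; it is used here to keep the sketch short, not as a recommendation for Web-scale data.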

4 Objective
- To discuss practical lessons based on the authors' years of data mining experience at Yahoo!
- To offer insights into how to drive a data mining effort to success in a business environment.
- To reflect on four success factors: methodology, data, infrastructure, and people.

5 Success Factors
Methodology
Data
- A Data-driven Perspective
- Data Preprocessing
- Data Size and Sampling
- Data Distribution
- Data Understanding
- Modeling Goals and Evaluation

6 Success Factors
Infrastructure
- An Infrastructure for Web-scale Data
- Gridification
- The Scalability Dilemma
People
- Engaging the Wider Community

7 Success Factors: Methodology
- Many companies fail to take full advantage of their data because they do not apply data mining techniques to study, manage, and learn from it.

8 Success Factors: Data: A Data-driven Perspective
- Companies habitually rely on "gut feelings" rather than on data to drive decision-making.
- That said, one should not underestimate the importance of domain knowledge.
- The authors argue that domain knowledge should guide empirical investigation, especially at the exploratory stage.

9 Success Factors: Data: Data Preprocessing
- The data mining process starts with data preprocessing, the so-called ETL (extract, transform, and load), during which raw user data logs go through a series of perturbations and are loaded into a data warehouse (DW).
- ETL may introduce biases in downstream data.
- Timestamps may not be consistently normalized.
- Data consistency is a big challenge.
- Data integration is a big architectural challenge.
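The timestamp point on the Data Preprocessing slide can be made concrete with a minimal sketch. It assumes log records carry ISO 8601 timestamps with an explicit UTC offset; the function name `to_utc` and the sample timestamps are hypothetical and do not come from the paper.

```python
from datetime import datetime, timezone


def to_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with an explicit offset and convert it to UTC."""
    parsed = datetime.fromisoformat(ts)
    if parsed.tzinfo is None:
        # A timestamp without an offset is ambiguous; refuse to guess.
        raise ValueError(f"timestamp has no UTC offset: {ts!r}")
    return parsed.astimezone(timezone.utc)


# Hypothetical records for the same instant, logged in two different time zones.
west_coast = to_utc("2009-11-02T08:30:00-08:00")
taipei = to_utc("2009-11-03T00:30:00+08:00")

print(west_coast.isoformat())
print(west_coast == taipei)
```

Rejecting offset-free timestamps, rather than assuming a default zone, is one way to keep an ETL step from silently introducing the kind of bias the slide warns about.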

10 Success Factors: Data: Data Distribution

11 Success Factors: Data
- A Data-driven Perspective
- Data Preprocessing
- Data Size and Sampling
- Data Distribution
- Data Understanding
- Modeling Goals and Evaluation

12 Conclusion

13 Comments
Advantage
Drawback
- …
Application
- Information Search and Retrieval

