Data Mining in Practice: Techniques and Practical Applications

1 Data Mining in Practice: Techniques and Practical Applications
Junling Hu, May 14, 2013

2 What is data mining?
Mining patterns from data
Is it statistics? Functional form? Computation speed concern? Data size, variable size
Is it machine learning? Big data issue. New methods: network mining
E.g. stroke prediction

3 Examples of data mining
Frequently bought together Movie recommendation

4 More examples of data mining
Keyword suggestions Genome & disease mining Heart monitoring

5 Overview of data mining
Frequent pattern mining
Machine learning: supervised, unsupervised
Stream mining
Recommender system
Graph mining
Unstructured data: text, audio, image and video
Big data technology

6 Frequent Pattern Mining
Diaper and Beer Product assortment Click behavior Machine breakdown ? Product display, assortment, re-stocking

7 The case of Amazon
User  Items
1     {Princess dress, crown, gloves, t-shirt}
2     {Princess dress, crown, gloves, pink dress, t-shirt}
3     {Princess dress, crown, gloves, pink dress, jeans}
4     {Princess dress, crown, gloves, pink dress}
5     {crown, gloves}
Count frequency of co-occurrence
Efficient algorithm
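The co-occurrence counting on this slide can be sketched in a few lines. This is a brute-force illustration using the five baskets from the slide, not the "efficient algorithm" the slide alludes to (the deck does not name one); the function name `count_pairs` is made up for this example.

```python
from collections import Counter
from itertools import combinations

# The five baskets from the slide, one item set per user.
transactions = [
    {"Princess dress", "crown", "gloves", "t-shirt"},
    {"Princess dress", "crown", "gloves", "pink dress", "t-shirt"},
    {"Princess dress", "crown", "gloves", "pink dress", "jeans"},
    {"Princess dress", "crown", "gloves", "pink dress"},
    {"crown", "gloves"},
]

def count_pairs(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair a single canonical key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

pair_counts = count_pairs(transactions)
for pair, n in pair_counts.most_common(3):
    print(pair, n)
```

Enumerating every pair in every basket grows quadratically with basket size, which is why large catalogs need pruning strategies rather than this direct count.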

8 Machine Learning Process

9 Machine Learning
Supervised. Examples: churn, click, yes/no
Unsupervised (clustering): discussion topics (Twitter), customer feedback, …

10 Binary classification
Input features: Checking, Duration (years), Savings ($k), Current Loans, Loan Purpose
Output class: Risky?
[Table: each row is one data point; surviving cell values: Yes 1 10 TV 2 4 No 5 75 Car 66 Repair 83 11 99]
Millions of data points, hundreds of thousands of rows

11 Classification (1) Decision tree

12 Classification (2): Neural network
Perceptron
Multi-layer neural network
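As a rough illustration of the perceptron named on this slide, here is a minimal sketch of the classic perceptron learning rule. The training data (the logical AND function), the learning rate, and the function names are assumptions chosen for the example; the slide itself gives no details.

```python
def predict(weights, bias, x):
    """Return 1 if the weighted sum is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Perceptron rule: nudge the weights whenever a sample is misclassified."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # -1, 0, or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Hypothetical, linearly separable data: the logical AND of two inputs.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(predictions)
```

A single perceptron can only separate classes with a straight line (or hyperplane), which is the usual motivation for stacking units into a multi-layer network.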

13 Head pose detection

14 Support Vector Machine (SVM)
Search for a separating hyperplane Maximize margin

15 Perceived advantage of SVM
Transform data into higher dimension

16 Applications of SVM: Spam Filter
Input features:
Transmission: IP address, sender URL
Header: From, To ("undisclosed"), cc
Body: # of paragraphs, # of words, structure, # of attachments, # of links

17 Logistic regression
Advantages: simple functional form, can be parallelized, works at large scale
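To make the "simple functional form" concrete: logistic regression passes a linear score through the sigmoid function. Below is a minimal sketch with one input feature, trained by batch gradient descent. The data, learning rate, and epoch count are invented for illustration. Each gradient is a sum over data points, which is one reason the method lends itself to parallel computation.

```python
import math

def sigmoid(z):
    """The logistic function: maps any real score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the average log-loss; each is a sum over data points.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical one-feature data: label 0 for small x, label 1 for large x.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]

w, b = train(xs, ys)
print(sigmoid(w * 0 + b), sigmoid(w * 5 + b))
```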

18 Applications of logistic regression
Click prediction
Search ranking (web pages, products)
Online advertising
Recommendation
The model
Output: click/no click
Input features: page content, search keyword, user information

19 Regression
Linear regression, non-linear regression
Output: a numeric value
Applications: stock price prediction, credit scoring, employment forecast
Non-linear regression is used by machine learning

20 History of Supervised learning

21 Semi-supervised learning
Application: Speech dialog system

22 Unsupervised learning: Clustering
No labeled data
Methods: K-means
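A minimal sketch of K-means (Lloyd's iteration) may help here. The 2-D points are invented, and starting from the first k points is a simplification made for reproducibility; practical implementations usually pick starting centroids at random or with a smarter seeding scheme.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = [tuple(p) for p in points[:k]]  # simplistic initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, labels

# Hypothetical unlabeled data with two visible groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

No labels are supplied at any point: the grouping comes only from distances between the points.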

23 Categories of machine learning

24 Applications of Clustering
Malware detection Document clustering: Topic detection

25 Graphs in our life
Social network: friend recommendation
Molecular compound: drug discovery

26 Graph and its matrix representation
Adjacency matrix (rows and columns indexed by nodes 1 to 6)
[Figure: graph with nodes 1 to 6]
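The edges of the slide's six-node graph did not survive in the transcript, so the sketch below uses a hypothetical edge list to show how an undirected graph maps to its adjacency matrix: entry (i, j) is 1 when nodes i and j are connected, and 0 otherwise.

```python
def adjacency_matrix(n, edges):
    """Build the n x n adjacency matrix of an undirected graph.

    Nodes are numbered 1..n, as in the slide; the matrix is 0-indexed.
    """
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u - 1][v - 1] = 1
        matrix[v - 1][u - 1] = 1  # undirected: mirror the entry
    return matrix

# Hypothetical edges; the actual graph on the slide is not recoverable.
edges = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
A = adjacency_matrix(6, edges)
for row in A:
    print(row)
```

For an undirected graph the matrix is symmetric, and each row sum is the degree of the corresponding node.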

27 The web graph
[Figure: Page 1, Page 2, and Page 3 connected by hyperlinks with anchor text]
As data become large, unsupervised learning becomes popular

28 PageRank as a steady state
Transition matrix P (entries lost in the transcript; surviving values: 0.33, 0.5, 0.25)
PageRank is a probability vector π such that π = πP
[Figure: graph with nodes 1 to 6]
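The steady-state idea can be sketched with power iteration: start from a uniform probability vector and multiply by the transition matrix until the vector stops changing. The 3-page matrix below is hypothetical (the slide's own matrix is not recoverable), and the sketch assumes a row-stochastic matrix, so the steady state satisfies rank = rank · P. Practical PageRank typically also adds a damping factor, which is omitted here.

```python
def pagerank_steady_state(P, tol=1e-12, max_iter=1000):
    """Power iteration: repeat rank <- rank * P until it stops changing.

    P is row-stochastic: P[i][j] is the probability of moving from page i
    to page j, and each row sums to 1.
    """
    n = len(P)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new_rank = [sum(rank[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_rank, rank)) < tol:
            return new_rank
        rank = new_rank
    return rank

# Hypothetical 3-page web: page 1 links to pages 2 and 3,
# page 2 links to pages 1 and 3, page 3 links only to page 1.
P = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [1.0, 0.0, 0.0],
]

rank = pagerank_steady_state(P)
print(rank)
```

For this particular matrix the steady state can be checked by hand: solving rank = rank · P with the entries summing to 1 gives (4/9, 2/9, 3/9).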

29 Discover influencers on Twitter
The Twitter graph: nodes and "following" links
A PageRank approach: TwitterRank
[Figure: graph with nodes 1 to 5]

30 Facebook graph search Entity graph Natural language search
“Restaurants liked by my friends”

31 Recommending a game

32 Recommendation in Travel site

33 Prediction Problems
Rating prediction: given how a user rated other items, predict the user's rating for a given item
Top-N recommendation: given the list of items liked by a user, recommend new items that the user might like

34 Explicit vs. Implicit Feedback Data
Explicit feedback: ratings and reviews
Implicit feedback (user behavior):
Purchase behavior: recency, frequency, …
Browsing behavior: # of visits, time of visit, time of staying, clicks

35 Collaborative Filtering
Hypotheses
User/item similarities: similar users purchase similar items; similar items are purchased by similar users
Matching characteristics: a match exists between the user's and the item's characteristics

36 User-User similarity
User's movie rating
[Ratings table: rows John, Adam, Laura; columns Out of Africa, Star Wars, Air Force One, Liar, Liar; surviving cell values: 4, 5, 1, 2, ?]

37 Item-item similarity
[Ratings table: rows John, Adam, Laura; columns Out of Africa, Star Wars, Air Force One, Liar, Liar; surviving cell values: 4, 5, 1, 2, ?]
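One common way to realize item-item similarity is cosine similarity between item rating vectors, followed by a similarity-weighted average to fill in a missing rating. The sketch below uses the names from the slide, but the rating values are hypothetical because most of the original table did not survive. Plain cosine similarity is an assumption here; the deck does not say which measure it uses.

```python
from math import sqrt

# Hypothetical ratings (1 to 5); Laura has not rated "Liar, Liar".
ratings = {
    "John": {"Out of Africa": 4, "Star Wars": 4, "Air Force One": 5, "Liar, Liar": 1},
    "Adam": {"Out of Africa": 1, "Star Wars": 1, "Air Force One": 2, "Liar, Liar": 5},
    "Laura": {"Out of Africa": 5, "Star Wars": 4, "Air Force One": 5},
}

def co_ratings(item_a, item_b):
    """Rating pairs from users who rated both items."""
    return [(r[item_a], r[item_b]) for r in ratings.values()
            if item_a in r and item_b in r]

def cosine_similarity(item_a, item_b):
    """Cosine of the angle between the two items' co-rated vectors."""
    pairs = co_ratings(item_a, item_b)
    if not pairs:
        return 0.0
    dot = sum(a * b for a, b in pairs)
    norm_a = sqrt(sum(a * a for a, _ in pairs))
    norm_b = sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_a * norm_b)

def predict_rating(user, target):
    """Similarity-weighted average of the user's ratings on other items."""
    numerator = denominator = 0.0
    for item, rating in ratings[user].items():
        if item == target:
            continue
        sim = cosine_similarity(target, item)
        numerator += sim * rating
        denominator += abs(sim)
    return numerator / denominator if denominator else None

print(cosine_similarity("Out of Africa", "Star Wars"))
print(predict_rating("Laura", "Liar, Liar"))
```

Because raw ratings are all positive, plain cosine similarity is never negative, so dissimilar items still pull the prediction toward the user's average. Subtracting each user's mean rating first (adjusted cosine) is a common refinement.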

38 Application of item-item similarity

39 SVD (Singular Value Decomposition)

40 Latent factors

41 Application of Latent Factor Model

42 Ranking-based recommendation

43 Application in LinkedIn
Ranking-based model

44 Thanks and Contact Co-author: Patricia Hoffman Contact:
