Data Mining in Practice: Techniques and Practical Applications Junling Hu May 14, 2013.


1 Data Mining in Practice: Techniques and Practical Applications Junling Hu May 14, 2013

2 What is data mining?
- Mining patterns from data
- Is it statistics?
- Functional form?
- Computation speed concern?
- Data size
- Variable size
- Is it machine learning?
- Big data issue
- New methods: network mining

3 Examples of data mining
- Frequently bought together
- Movie recommendation

4 More examples of data mining
- Keyword suggestions
- Genome & disease mining
- Heart monitoring

5 Overview of data mining
- Frequent pattern mining
- Machine Learning
  - Supervised
  - Unsupervised
- Stream mining
- Recommender system
- Graph mining
- Unstructured data
  - Text
  - Audio
  - Image and Video
- Big data technology

6 Frequent Pattern Mining
- Diaper and Beer
- Product assortment
- Click behavior
- Machine breakdown?

7 The case of Amazon

User  Items
1     {Princess dress, crown, gloves, t-shirt}
2     {Princess dress, crown, gloves, pink dress, t-shirt}
3     {Princess dress, crown, gloves, pink dress, jeans}
4     {Princess dress, crown, gloves, pink dress}
5     {crown, gloves}

- Count frequency of co-occurrence
- Efficient algorithm
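The "count frequency of co-occurrence" step can be sketched on the five baskets from the table. This is a brute-force illustration, not the efficient algorithm the slide alludes to; the function name `pair_counts` is made up for this example.

```python
from collections import Counter
from itertools import combinations

# Baskets copied from the slide's table (user -> set of items).
baskets = {
    1: {"Princess dress", "crown", "gloves", "t-shirt"},
    2: {"Princess dress", "crown", "gloves", "pink dress", "t-shirt"},
    3: {"Princess dress", "crown", "gloves", "pink dress", "jeans"},
    4: {"Princess dress", "crown", "gloves", "pink dress"},
    5: {"crown", "gloves"},
}

def pair_counts(baskets):
    """Count in how many baskets each unordered item pair appears together."""
    counts = Counter()
    for items in baskets.values():
        # Sorting gives every pair one canonical (a, b) order.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
for pair, n in counts.most_common(3):
    print(pair, n)
```

Enumerating every pair in every basket grows quadratically with basket size, which is why frequent-pattern mining relies on pruning strategies for large catalogs.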

8 Machine Learning Process

9 Machine Learning
- Supervised
- Unsupervised (clustering)

10 Binary classification

Checking  Duration (years)  Savings ($k)  Current Loans  Loan Purpose  Risky?
Yes       1                 10            Yes            TV            0
Yes       2                 4             No             TV            1
No        5                 75            No             Car           0
Yes       10                66            No             Repair        1
Yes       5                 83            Yes            Car           0
Yes       1                 11            No             TV            0
Yes       4                 99            Yes            Car           0

Figure labels: Input features, Output class, Data point

11 Classification (1)
- Decision tree

12 Classification (2): Neural network
- Perceptron
- Multi-layer neural network
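A perceptron can be illustrated with the classic update rule on a toy problem. The training set below (logical AND) and the names `predict` and `train_perceptron` are assumptions chosen for this sketch; they do not come from the slides.

```python
# Toy training set (an assumption, not from the slides): logical AND.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def predict(weights, bias, x):
    """Threshold unit: output 1 when the weighted sum is positive."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Perceptron rule: nudge the weights by the prediction error on each sample."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for x, label in samples:
            error = label - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

weights, bias = train_perceptron(samples)
for x, label in samples:
    print(x, "predicted", predict(weights, bias, x), "expected", label)
```

A single perceptron only separates classes with a straight line (or hyperplane), which is the usual motivation for the multi-layer networks named on the slide.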

13 Head pose detection

14 Support Vector Machine (SVM)
- Search for a separating hyperplane
- Maximize margin
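"Maximize margin" is commonly written as the hard-margin optimization problem below. This is the textbook formulation, assuming linearly separable data with labels $y_i \in \{-1, +1\}$; the symbols $w$, $b$, $x_i$ are not taken from the slides.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i\,(w^{\top} x_i + b) \ge 1, \qquad i = 1, \dots, n
```

The separating hyperplane is $w^{\top}x + b = 0$ and the margin width is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the margin.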

15 Perceived advantage of SVM
- Transform data into higher dimension

16 Applications of SVM: Spam Filter
Input features:
- Transmission
  - IP address: 167.12.24.555
  - Sender URL: one-spam.com
- Email header
  - From: "admin@one-spam.cpm"
  - To: "undisclosed"
  - cc
- Email body
  - # of paragraphs
  - # of words
- Email structure
  - # of attachments
  - # of links

17 Logistic regression
- Advantage: Simple functional form
- Can be parallelized
- Large scale

18 Applications of logistic regression
- Click prediction
- Search ranking (web pages, products)
- Online advertising
- Recommendation
- The model
  - Output: click/no click
  - Input features: page content, search keyword, user information
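The click model can be sketched as a weighted sum of input features passed through the logistic (sigmoid) function. The feature names, weights, and bias below are hypothetical placeholders, not values from the talk; in practice they would be learned from logged click data.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical learned parameters (assumptions for illustration only).
weights = {
    "keyword_in_title": 1.2,      # search keyword matches the page content
    "page_quality": 0.8,          # some page-content score in [0, 1]
    "user_clicked_before": 1.5,   # user information
}
bias = -2.0

def click_probability(features):
    """P(click | features) under a logistic regression model."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return sigmoid(z)

p = click_probability(
    {"keyword_in_title": 1, "page_quality": 0.5, "user_clicked_before": 0}
)
print("click" if p >= 0.5 else "no click", round(p, 3))
```

Because the prediction is one dot product followed by a sigmoid, scoring is cheap and easy to distribute, which fits the "simple functional form" and "large scale" points above.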

19 Regression
- Linear regression
- Non-linear regression
Application:
- Stock price prediction
- Credit scoring
- Employment forecast

20 History of Supervised learning

21 Semi-supervised learning
- Application:
  - Speech dialog system

22 Unsupervised learning: Clustering
- No labeled data
- Methods
  - K-means
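K-means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below uses made-up 2-D points and a deterministic initialization (the first k points), both of which are assumptions for illustration; real implementations usually randomize the start.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, iterations=10):
    # Deterministic start (assumption): use the first k points as centroids.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [mean_point(cluster) if cluster else centroids[i]
                     for i, cluster in enumerate(clusters)]
    return centroids, clusters

# Made-up data: two visually separate groups, no labels.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```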

23 Categories of machine learning

24 Applications of Clustering
- Malware detection
- Document clustering: Topic detection

25 Graphs in our life
- Social network: friend recommendation
- Molecular compound: drug discovery

26 Graph and its matrix representation

Adjacency matrix (figure: graph with nodes 1 to 6)

    1 2 3 4 5 6
1   0 1 0 0 0 1
2   1 0 1 1 0 0
3   0 1 0 1 1 0
4   0 1 1 0 1 0
5   0 0 1 1 0 1
6   1 0 0 0 1 0
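The matrix can be rebuilt from an edge list. The edges below are read off the nonzero entries of the slide's matrix, treating the graph as undirected (the matrix is symmetric); the function name `adjacency_matrix` is made up for this sketch.

```python
# Undirected edges read off the adjacency matrix on the slide.
edges = [(1, 2), (1, 6), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5), (5, 6)]

def adjacency_matrix(n, edges):
    """Build an n x n 0/1 matrix; node labels are 1-based as on the slide."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u - 1][v - 1] = 1
        matrix[v - 1][u - 1] = 1  # undirected: mirror the entry
    return matrix

A = adjacency_matrix(6, edges)
for row in A:
    print(" ".join(str(x) for x in row))
```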

27 The web graph
Figure labels: Anchor text, Hyperlink, Page 1, Page 2, Page 3

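28 PageRank as a steady state
- Transition matrix P = (rows and columns 1 to 6; "…" marks entries missing in the transcript)

  1:  0 0.33 … 0 0
  2:  0.5 0 … 0 0 0
  3:  0.25 … 0 … 0
  4:  0 1 0 0 0 0
  5:  0 0 0.33 … 0
  6:  0.5 0 0 0 … 0

- PageRank is a probability vector such that (equation not preserved in the transcript)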
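The steady-state idea can be sketched with power iteration. Because the slide's transition matrix is incomplete in this transcript, the example substitutes a simple random walk on the six-node graph from slide 26 (an assumption, not the slide's P) and omits the damping factor used in full PageRank. For a row-stochastic P, the usual stationary condition is that the probability vector pi satisfies pi = pi P.

```python
# Adjacency matrix of the six-node graph from slide 26 (stand-in example).
A = [
    [0, 1, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],
    [1, 0, 0, 0, 1, 0],
]

def transition_matrix(adjacency):
    """Row-normalize: from each node, move to a neighbor uniformly at random."""
    return [[a / sum(row) for a in row] for row in adjacency]

def steady_state(P, iterations=100):
    """Power iteration: repeatedly apply pi <- pi P starting from uniform."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

P = transition_matrix(A)
pi = steady_state(P)
print([round(x, 4) for x in pi])
```

For a random walk on a connected, non-bipartite undirected graph, the steady state is proportional to node degree, so the better-connected nodes 2 to 5 end up with more probability mass than nodes 1 and 6.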

29 Discover influencers on Twitter
- The Twitter graph
  - Node
  - Link
- A PageRank approach: TwitterRank
Figure: graph of nodes 1 to 5 connected by "Following" links

30 Facebook graph search
- Entity graph
- Natural language search
  - "Restaurants liked by my friends"

31 Recommending a game

32 Recommendation in Travel site

33 Prediction Problems
- Rating Prediction
  - Given how a user rated other items, predict the user's rating for a given item
- Top-N Recommendation
  - Given the list of items liked by a user, recommend new items that the user might like

34 Explicit vs. Implicit Feedback Data
- Explicit feedback
  - Ratings and reviews
- Implicit feedback (user behavior)
  - Purchase behavior: recency, frequency, …
  - Browsing behavior: # of visits, time of visit, time of staying, clicks

35 Collaborative Filtering
- Hypotheses
  - User/Item Similarities
    - Similar users purchase similar items
    - Similar items are purchased by similar users
  - Matching characteristics
    - Match exists between user's and item's characteristics

36 User-User similarity
- User's movie rating

        Out of Africa  Star Wars  Air Force One  Liar, Liar
John    4              4          5              1
Adam    1              1          2              5
Laura   ?              4          5              2
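One way to fill in Laura's missing rating with user-user similarity is sketched below. Cosine similarity over co-rated movies and a similarity-weighted average are assumptions for this example; the slides do not say which similarity measure or weighting was used.

```python
import math

# Ratings from the slide's table; Laura has not rated "Out of Africa".
ratings = {
    "John":  {"Out of Africa": 4, "Star Wars": 4, "Air Force One": 5, "Liar, Liar": 1},
    "Adam":  {"Out of Africa": 1, "Star Wars": 1, "Air Force One": 2, "Liar, Liar": 5},
    "Laura": {"Star Wars": 4, "Air Force One": 5, "Liar, Liar": 2},
}

def user_similarity(a, b):
    """Cosine similarity between two users over the movies both have rated."""
    common = set(ratings[a]) & set(ratings[b])
    dot = sum(ratings[a][m] * ratings[b][m] for m in common)
    norm_a = math.sqrt(sum(ratings[a][m] ** 2 for m in common))
    norm_b = math.sqrt(sum(ratings[b][m] ** 2 for m in common))
    return dot / (norm_a * norm_b)

def predict_rating(user, movie):
    """Similarity-weighted average of other users' ratings for the movie."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, their_ratings in ratings.items():
        if other == user or movie not in their_ratings:
            continue
        weight = user_similarity(user, other)
        weighted_sum += weight * their_ratings[movie]
        total_weight += weight
    return weighted_sum / total_weight

print("Laura~John:", round(user_similarity("Laura", "John"), 3))
print("Laura~Adam:", round(user_similarity("Laura", "Adam"), 3))
print("Predicted:", round(predict_rating("Laura", "Out of Africa"), 2))
```

Laura's tastes track John's more closely than Adam's, so John's rating gets the larger weight. Raw cosine on positive ratings separates users only weakly; subtracting each user's mean rating first (Pearson correlation) is a common refinement.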

37 Item-item similarity

        Out of Africa  Star Wars  Air Force One  Liar, Liar
John    4              4          5              1
Adam    1              1          2              5
Laura   ?              4          5              2
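The same table can be read column-wise: compare movies by how the same users rated them, then predict Laura's rating for "Out of Africa" from her own ratings of similar movies. As before, cosine similarity and the weighted average are assumptions for this sketch rather than the method used in the talk.

```python
import math

# Ratings from the slide's table; Laura has not rated "Out of Africa".
ratings = {
    "John":  {"Out of Africa": 4, "Star Wars": 4, "Air Force One": 5, "Liar, Liar": 1},
    "Adam":  {"Out of Africa": 1, "Star Wars": 1, "Air Force One": 2, "Liar, Liar": 5},
    "Laura": {"Star Wars": 4, "Air Force One": 5, "Liar, Liar": 2},
}

def item_similarity(item_a, item_b):
    """Cosine similarity between two movies over users who rated both."""
    users = [u for u, r in ratings.items() if item_a in r and item_b in r]
    dot = sum(ratings[u][item_a] * ratings[u][item_b] for u in users)
    norm_a = math.sqrt(sum(ratings[u][item_a] ** 2 for u in users))
    norm_b = math.sqrt(sum(ratings[u][item_b] ** 2 for u in users))
    return dot / (norm_a * norm_b)

def predict_rating(user, target):
    """Weight the user's own ratings by each movie's similarity to the target."""
    weighted_sum = 0.0
    total_weight = 0.0
    for item, rating in ratings[user].items():
        weight = item_similarity(target, item)
        weighted_sum += weight * rating
        total_weight += weight
    return weighted_sum / total_weight

for movie in ratings["Laura"]:
    print(movie, round(item_similarity("Out of Africa", movie), 3))
print("Predicted:", round(predict_rating("Laura", "Out of Africa"), 2))
```

John and Adam rated "Out of Africa" and "Star Wars" identically, so those two movies come out as most similar and Laura's rating for "Star Wars" dominates the prediction.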

38 Application of item-item similarity
- Amazon

39 SVD (Singular Value Decomposition)

40 Latent factors

41 Application of Latent Factor Model
- GetJar

42 Ranking-based recommendation

43 Application in LinkedIn
- Ranking-based model

44 Thanks and Contact
- Co-author: Patricia Hoffman
Contact:
- junlinghu@gmail.com
- Twitter: @junling_tech

