Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, Slides for Chapter 1: Information Retrieval an Web Search 1 Part II: Web Content Mining Chapter 3: Clustering Introduction Hierarchical Agglomerative Clustering K-Means Clustering Probability-Based Clustering Collaborative Filtering
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, Slides for Chapter 1: Information Retrieval an Web Search 2 Two Class Mixture A0 B0.961 A0 A0 B0.780 A0 B0 A0 B0.980 A0 B0.135 A0.490 B0.928 B0 B0.658 A0 A0 A0.387 A0.570 B0 ClassMeanStandard deviationProbability of sampling A B
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, Slides for Chapter 1: Information Retrieval an Web Search 3 Finite Mixture Problem Given the labeled data, for each class C compute: mean standard deviation probability of sampling Generative document model
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, Slides for Chapter 1: Information Retrieval an Web Search 4 Finite Mixture Problem (2) A0 B0.961 A0 A0 B0.780 A0 B0 A0 B0.980 A0 B0.135 A0.490 B0.928 B0 B0.658 A0 A0 A0.387 A0.570 B0
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, Slides for Chapter 1: Information Retrieval an Web Search 5 Classification Problem Given for class A and B compute P(A|x) and P(B|x). Use if x is a discrete variable if x is a continuous variable probability density function