SEG 4630 E-Commerce Data Mining — Final Review —


Hong Cheng
SEEM, Chinese University of Hong Kong
www.se.cuhk.edu.hk/~hcheng
November 21, 2018

Final Time/Location
- Time: 9:30-11:30 am, Tuesday, Dec. 15
- Location: 103 John Fulton Center
- Coverage: Chaps 2, 4-8
- You can bring two A4-size, double-sided cheat sheets
- A calculator IS needed

Chapter 2 (1)
- Calculate data distribution
  - Mean, median, variance and standard deviation
- Calculate distance between data objects
  - Minkowski distance
  - Distance between binary variables: symmetric and asymmetric
  - Cosine similarity
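The Chapter 2 calculations can be practiced with a short Python sketch. It is not taken from the course material: the function names and the sample values are made up for illustration, and it assumes the population (divide by n) form of variance, so check which form the course uses. The binary measures use the usual contingency counts q (1/1), r (1/0), s (0/1) and t (0/0).

```python
import math
import statistics


def minkowski(x, y, p):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def binary_dissimilarity(x, y, symmetric=True):
    """Dissimilarity between two 0/1 vectors.

    symmetric:  (r + s) / (q + r + s + t)
    asymmetric: (r + s) / (q + r + s), i.e. 0/0 matches (t) are ignored
    """
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    t = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = q + r + s + t if symmetric else q + r + s
    return (r + s) / denom if denom else 0.0


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Hypothetical sample used only to exercise the summary statistics.
data = [2, 4, 4, 4, 5, 5, 7, 9]
summary = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "variance": statistics.pvariance(data),  # population variance (assumption)
    "std": statistics.pstdev(data),          # population standard deviation
}
print(summary)
print(minkowski((0, 0), (3, 4), 2))
print(binary_dissimilarity([1, 0, 1, 0], [1, 1, 0, 0], symmetric=False))
```

Swapping `pvariance`/`pstdev` for `statistics.variance`/`statistics.stdev` gives the sample (divide by n-1) versions.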

Chapter 2 (2)
- Data normalization
  - Min-max normalization
  - Z-score normalization
  - Decimal scaling
- Data reduction
  - Dimensionality reduction methods
  - Sampling
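The three normalization methods listed above can be sketched as follows. This is an illustrative sketch, not course code: the function names are hypothetical, the z-score uses the population standard deviation (an assumption), and the inputs are assumed to contain at least two distinct values so that no division by zero occurs.

```python
import statistics


def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so that min -> new_min and max -> new_max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer such that max(|v|) / 10**j < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


print(min_max([10, 20, 30]))
print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))
print(decimal_scaling([-986, 917]))
```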

Chapters 4-5 (1)
- Decision tree
  - Calculate information gain, Gini index, gain ratio
- Bayes' theorem and Naïve Bayesian classification
  - Calculate probabilities from training datasets
- Lazy classifier and k-nearest neighbor
  - Calculate based on different k values and different distance measures
  - Differences between eager and lazy classifiers
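For the decision-tree calculations, a minimal sketch of the split measures is given below. The names and the toy attribute/label lists are hypothetical. It assumes a multiway split on a categorical attribute (one branch per distinct value); a binary-split tree such as CART would group values differently.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels: 1 - sum of squared class frequencies."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(values, labels):
    """Group the class labels by the attribute value of each record."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())


def information_gain(values, labels):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    after = sum(len(g) / n * entropy(g) for g in partition(values, labels))
    return entropy(labels) - after


def gain_ratio(values, labels):
    """Information gain divided by the split information of the attribute."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info else 0.0


def gini_split(values, labels):
    """Weighted Gini index of the partitions produced by the attribute."""
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in partition(values, labels))


# Hypothetical toy data: the attribute separates the classes perfectly.
attribute = ["a", "a", "b", "b"]
classes = ["yes", "yes", "no", "no"]
print(information_gain(attribute, classes), gain_ratio(attribute, classes))
print(gini(classes), gini_split(attribute, classes))
```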

Chapters 4-5 (2)
- Accuracy and error measures
  - Training error vs. validation error
  - Confusion matrix
- ROC curve
  - True positive rate (TPR) and false positive rate (FPR)
  - Area under curve (AUC)
- Evaluation methods
  - Hold out
  - Cross validation
- Ensemble, bagging: know the principle
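The confusion-matrix and ROC quantities above can be computed by hand with a sketch like the following. The function names, labels and scores are illustrative assumptions. The ROC points come from sweeping a threshold over the distinct scores, and the AUC uses the trapezoidal rule; the sketch assumes both classes are present in `y_true`.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels encoded as 1 (positive) and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_points(y_true, scores):
    """List of (FPR, TPR) points, from (0, 0) to (1, 1), one per distinct threshold."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        tp, fp, tn, fn = confusion_counts(y_true, y_pred)
        tpr = tp / (tp + fn)  # true positive rate (recall)
        fpr = fp / (fp + tn)  # false positive rate
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical classifier scores for four test records.
y_true = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6]
curve = roc_points(y_true, scores)
print(curve)
print(auc(curve))
```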

Chapters 6-7 (1)
- Frequent patterns and association rules
  - Support, confidence
  - Generate association rules from frequent itemsets
- Apriori algorithm
  - Candidate generation and test
    - Self joining
    - Pruning
  - Database scan
- FP-growth algorithm
  - Build FP-tree
  - Extract conditional DB
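The Apriori steps listed above (join, prune, scan) and rule generation can be sketched as below. This is a simplified illustration under stated assumptions: the join takes the union of any two frequent (k-1)-itemsets that yields a k-itemset, rather than the prefix-based join usually shown in textbooks, and the transactions are a made-up toy database. Support is kept as an absolute count.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset (one DB scan)."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_count):
    """Return {frequent itemset: support count} using level-wise candidate generation."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    level = {}
    for i in items:
        count = support_count(frozenset([i]), transactions)
        if count >= min_count:
            level[frozenset([i])] = count
    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        prev = list(level)
        # Self-join (simplified): unions of two frequent (k-1)-itemsets of size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # Test: scan the database to count the surviving candidates.
        level = {}
        for c in candidates:
            count = support_count(c, transactions)
            if count >= min_count:
                level[c] = count
    return frequent


def association_rules(frequent, min_conf):
    """Return (lhs, rhs, confidence) for every rule meeting the confidence threshold."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(itemset, size):
                lhs = frozenset(lhs)
                conf = count / frequent[lhs]  # support(lhs U rhs) / support(lhs)
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


# Hypothetical toy database of four transactions.
db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
freq = apriori(db, min_count=2)
strong = association_rules(freq, min_conf=1.0)
print(len(freq), len(strong))
```

Every subset of a frequent itemset is itself frequent, which is why `frequent[lhs]` is always available when computing confidence.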

Chapters 6-7 (2)
- Closed itemsets and maximal itemsets
- Lift/Interest measure
- Constraints
  - Monotonic
  - Anti-monotonic
  - Convertible constraints
- Sequential pattern mining: know the principle
  - Max-gap
  - Min-gap
  - Max-span
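Closed itemsets, maximal itemsets and lift can be checked with a small sketch. The support counts below are hypothetical toy values (a four-transaction database), and the function names are made up for illustration. An itemset is treated as closed when no proper superset has the same support, and as maximal when no proper superset is frequent.

```python
def closed_itemsets(frequent):
    """Frequent itemsets with no proper superset of equal support."""
    return {s for s, count in frequent.items()
            if not any(s < t and frequent[t] == count for t in frequent)}


def maximal_itemsets(frequent):
    """Frequent itemsets with no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}


def lift(lhs, rhs, frequent, n):
    """lift(lhs -> rhs) = P(lhs and rhs) / (P(lhs) * P(rhs)), with n transactions."""
    p_both = frequent[lhs | rhs] / n
    return p_both / ((frequent[lhs] / n) * (frequent[rhs] / n))


# Hypothetical support counts of all frequent itemsets (minimum count 2, n = 4).
n_transactions = 4
counts = {
    frozenset("A"): 2, frozenset("B"): 3, frozenset("C"): 3, frozenset("E"): 3,
    frozenset("AC"): 2, frozenset("BC"): 2, frozenset("BE"): 3, frozenset("CE"): 2,
    frozenset("BCE"): 2,
}
print(sorted("".join(sorted(s)) for s in closed_itemsets(counts)))
print(sorted("".join(sorted(s)) for s in maximal_itemsets(counts)))
print(lift(frozenset("B"), frozenset("E"), counts, n_transactions))
```

A lift above 1 suggests positive correlation between the two sides of the rule, a lift below 1 negative correlation, and a lift of 1 independence.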

Chapter 8
- K-means clustering
  - Algorithm and calculation
  - Advantages and disadvantages
- Hierarchical clustering: MIN, MAX, Group average
  - Step-wise calculation
  - Update distance matrix
- Density-based clustering
  - Know the principle
- Evaluating clustering quality
  - SSE, silhouette, entropy, purity
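The K-means calculation and the SSE measure can be traced with the sketch below. It is an illustration rather than course code: the names and points are hypothetical, the initial centroids are supplied by the caller (instead of being chosen at random) so the steps are reproducible, and an empty cluster simply keeps its previous centroid.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points]


def update(points, labels, centroids):
    """Update step: each centroid becomes the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(old)  # empty cluster: keep the old centroid
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update until the centroids stop changing."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


def sse(points, labels, centroids):
    """Sum of squared errors: squared distance of each point to its own centroid."""
    return sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))


# Hypothetical 2-D points and initial centroids.
pts = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
final_centroids, final_labels = kmeans(pts, [(0.0, 0.0), (10.0, 10.0)])
print(final_centroids, final_labels)
print(sse(pts, final_labels, final_centroids))
```

Rerunning with different initial centroids or a different k and comparing the SSE values is one way to see the sensitivity to initialization that is usually listed among the disadvantages of K-means.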

Questions?