SEG 4630 E-Commerce Data Mining — Final Review —

1 SEG 4630 E-Commerce Data Mining — Final Review —
Hong Cheng SEEM Chinese University of Hong Kong November 21, 2018 E-Commerce Data Mining

2 Final Time/Location
Time: 9:30-11:30 am, Dec. 15, Tuesday
Location: 103 John Fulton Center
Coverage: Chaps 2, 4-8
You may bring two A4-size, double-sided cheat sheets.
A calculator IS needed.

3 Chapter 2 (1)
Calculate data distribution
  Mean, median, variance and standard deviation
Calculate distance between data objects
  Minkowski distance
  Distance between binary variables: symmetric and asymmetric
  Cosine similarity
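
As a practice aid for the Chapter 2 calculations, here is a minimal Python sketch of the listed measures. The function names and the sample data are hypothetical, not taken from the course material. The binary dissimilarity assumes 0/1 vectors with at least one 1 between them, and the variance shown is the population version (use the sample version if that is what the course uses).

```python
import math
import statistics

def minkowski(x, y, p):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def binary_dissimilarity(x, y, symmetric=True):
    """Dissimilarity between two 0/1 vectors from their contingency counts."""
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # both 1
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)  # only x is 1
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)  # only y is 1
    t = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)  # both 0
    if symmetric:
        return (r + s) / (q + r + s + t)
    # Asymmetric case: 0/0 matches carry no information and are ignored.
    return (r + s) / (q + r + s)

# Hypothetical sample data for the distribution statistics.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.mean(data), statistics.median(data))
print(statistics.pvariance(data), statistics.pstdev(data))  # population versions
print(minkowski((0, 0), (3, 4), 2))
```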

4 Chapter 2 (2)
Data normalization
  Min-max normalization
  Z-score normalization
  Decimal scaling
Data reduction
  Dimensionality reduction methods
  Sampling
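
The three normalization methods can be sketched as below. This is an illustrative sketch with hypothetical function names. It assumes the values are not all identical (otherwise min-max and z-score divide by zero) and uses the population standard deviation for the z-score.

```python
import math

def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map [min, max] of the data onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]

def z_score(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

def decimal_scaling(values):
    """Divide by 10^j, where j is the smallest integer with max(|v'|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

print(min_max([10, 20, 30]))
print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))
print(decimal_scaling([-986, 917]))
```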

5 Chapters 4-5 (1)
Decision tree
  Calculate information gain, Gini index, gain ratio
Bayes' theorem and Naïve Bayesian classification
  Calculate probabilities from training datasets
Lazy classifier and k-nearest neighbor
  Calculate based on different k values and different distance measures
Differences between eager and lazy classifiers
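
For the decision-tree calculations, the following sketch computes entropy, Gini index, information gain and gain ratio for one categorical attribute. The function names and the toy labels are hypothetical. Gain ratio is taken here as information gain divided by the entropy of the attribute's value distribution (split information), and the attribute is assumed to take at least two distinct values.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    partitions = {}
    for value, label in zip(attribute_values, labels):
        partitions.setdefault(value, []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def gain_ratio(labels, attribute_values):
    """Information gain normalized by the split information."""
    split_info = entropy(attribute_values)
    return information_gain(labels, attribute_values) / split_info

# Toy example: the attribute separates the two classes perfectly.
labels = ["yes", "yes", "no", "no"]
attribute = ["a", "a", "b", "b"]
print(entropy(labels), gini(labels))
print(information_gain(labels, attribute), gain_ratio(labels, attribute))
```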

6 Chapters 4-5 (2)
Accuracy and error measures
  Training error vs. validation error
  Confusion matrix
  ROC curve
    True positive rate (TPR) and false positive rate (FPR)
    Area under curve (AUC)
Evaluation methods
  Hold out
  Cross validation
Ensemble, bagging: know the principle
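
The confusion-matrix quantities can be checked with a short sketch like the one below. The function names and the example labels are hypothetical. It assumes a binary problem in which both classes occur in the actual labels, so neither rate divides by zero. Each (FPR, TPR) pair is one point on the ROC curve.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, TN, FN) for a binary classification result."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def rates(tp, fp, tn, fn):
    """Accuracy, true positive rate and false positive rate."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "tpr": tp / (tp + fn),  # fraction of actual positives predicted positive
        "fpr": fp / (fp + tn),  # fraction of actual negatives predicted positive
    }

# Hypothetical labels for illustration.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
print(rates(*confusion_counts(actual, predicted)))
```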

7 Chapters 6-7 (1)
Frequent patterns and association rules
  Support, confidence
  Generate association rules from frequent itemsets
Apriori algorithm
  Candidate generation and test
    Self joining
    Pruning
  Database scan
FP-growth algorithm
  Build FP-tree
  Extract conditional DB
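
Support, confidence and the Apriori candidate step can be sketched as follows. The names and the toy transactions are hypothetical. For brevity the join here unions any two frequent k-itemsets that differ in one item rather than joining on a sorted prefix, and after pruning it keeps exactly the (k+1)-itemsets whose k-subsets are all frequent. The input to `apriori_gen` is assumed to be a non-empty collection of frozensets of equal size.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(A and B together) / support(A) for the rule A -> B."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def apriori_gen(frequent_k):
    """Generate (k+1)-candidates: join step, then subset-based pruning."""
    frequent_k = set(frequent_k)
    k = len(next(iter(frequent_k)))
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) != k + 1:
                continue  # the pair does not join into a (k+1)-itemset
            # Prune: every k-subset of a candidate must itself be frequent.
            if all(frozenset(s) in frequent_k for s in combinations(union, k)):
                candidates.add(union)
    return candidates

# Hypothetical market-basket data.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diaper", "beer", "eggs"}),
    frozenset({"milk", "diaper", "beer", "cola"}),
    frozenset({"bread", "milk", "diaper", "beer"}),
    frozenset({"bread", "milk", "diaper", "cola"}),
]
print(support({"milk", "diaper"}, transactions))
print(confidence({"milk", "diaper"}, {"beer"}, transactions))
```

Each surviving candidate still needs a database scan to count its actual support.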

8 Chapters 6-7 (2)
Closed itemsets and maximal itemsets
Lift/Interest measure
Constraints
  Monotonic
  Antimonotonic
  Convertible constraints
Sequential pattern mining: know the principle
  Max-gap
  Min-gap
  Max-span
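
To make the closed/maximal distinction and the lift measure concrete, here is a small sketch. The function names and the support counts are hypothetical. It assumes the input dictionary contains all frequent itemsets with their support counts, so checking frequent supersets is enough to decide closedness.

```python
def lift(support_a, support_b, support_ab):
    """Lift of A -> B: above 1 suggests positive correlation, below 1 negative."""
    return support_ab / (support_a * support_b)

def closed_and_maximal(frequent):
    """Split frequent itemsets (frozenset -> support count) into closed and maximal.

    Closed: no proper superset has the same support.
    Maximal: no proper superset is frequent.
    """
    closed, maximal = set(), set()
    for itemset, count in frequent.items():
        supersets = [s for s in frequent if itemset < s]
        if not supersets:
            maximal.add(itemset)
        if all(frequent[s] < count for s in supersets):
            closed.add(itemset)
    return closed, maximal

# Hypothetical frequent itemsets with support counts (minimum support 2).
frequent = {
    frozenset("A"): 4, frozenset("B"): 5, frozenset("C"): 3,
    frozenset("AB"): 4, frozenset("AC"): 2, frozenset("BC"): 3,
    frozenset("ABC"): 2,
}
closed, maximal = closed_and_maximal(frequent)
print(sorted("".join(sorted(s)) for s in closed))
print(sorted("".join(sorted(s)) for s in maximal))
```

Every maximal itemset is also closed, which the example output reflects.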

9 Chapter 8
K-means clustering
  Algorithm and calculation
  Advantages and disadvantages
Hierarchical clustering: MIN, MAX, Group average
  Step-wise calculation
  Update distance matrix
Density-based clustering
  Know the principle
Evaluating clustering quality
  SSE, silhouette, entropy, purity
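
The k-means loop and the SSE measure can be traced by hand against a sketch like this one. The function names, the sample points and the choice of initial centroids are hypothetical. It assumes Euclidean distance, caller-supplied initial centroids, `max_iter` of at least 1, and that an empty cluster simply keeps its old centroid.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]

def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            new_centroids.append(old)  # assumption: empty cluster keeps its centroid
        else:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

def sse(points, labels, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))

points = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (5.0, 6.0)]
centroids, labels = k_means(points, [(1.0, 1.0), (5.0, 5.0)])
print(centroids, labels, sse(points, labels, centroids))
```

Different initial centroids can lead to a different final clustering, which is one of the disadvantages usually listed for k-means.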

10 Questions?

