CSc288 Term Project: Data Mining to Predict the Voice-over-IP Phone Market, by Huaqin Xu


1 CSc288 Term Project: Data Mining to Predict the Voice-over-IP Phone Market, by Huaqin Xu

2 Agenda
- Abstract
- Introduction
- Methodology
- Result
- Conclusion
- Learning Experience
- References

3 Abstract
This project is based on VoIP survey data sets. Classifiers from the Weka Explorer are used as the data mining tool to build models that predict potential customers of VoIP phones and identify the most important features and services of two VoIP phone models.

4 Introduction
Background
- VoIP phones have a market opportunity given the wide use of Internet service.
- Two VoIP phone models: Basic and Deluxe
Data mining scope
- Customer
- Product features and services

5 Methodology
Data mining tools considered
- C4.5/C5.0, Cubist
- Weka
- Microsoft SQL Server
- SPSS
Chosen: Weka Explorer
Why? Free, easy to use, good interface, more choices

6 Methodology
Explorer vs. KnowledgeFlow

7 Methodology
Datasets: 94 instances in total

8 Methodology
Preprocessing
- Split the survey table
  - Customer: 17 attributes
  - Basic-model: 14 attributes
  - Deluxe-model: 10 attributes
- Handle missing data
  - Delete
  - Replace with "?"
- Convert the data format: SPSS -> Excel -> Weka
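
The missing-data step can be illustrated with a minimal sketch that writes rows in Weka's ARFF text format, where "?" marks a missing value. The attribute names, the `Would_Not_Buy` label, and the two rows below are hypothetical placeholders rather than the survey data, and `to_arff` is an illustrative helper, not part of Weka.

```python
def to_arff(relation, attributes, rows):
    """Render nominal data as ARFF text; None or empty cells become '?'."""
    lines = [f"@relation {relation}", ""]
    for name, values in attributes:
        # Nominal attributes list their allowed values in braces.
        lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines += ["", "@data"]
    for row in rows:
        cells = ["?" if cell in (None, "") else cell for cell in row]
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


# Hypothetical attributes and rows, for illustration only.
attributes = [
    ("gender", ["Male", "Female"]),
    ("marital_status", ["Single", "Married"]),
    ("class", ["Would_Buy", "Would_Not_Buy"]),
]
rows = [
    ["Male", "Single", "Would_Buy"],
    ["Female", None, "Would_Not_Buy"],  # marital status was not answered
]

arff_text = to_arff("voip_customer", attributes, rows)
print(arff_text)
```

The alternative listed on the slide, deleting incomplete records, would correspond to filtering out any row containing `None` before calling the helper.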

9 Methodology
Algorithm selection
- Classification
- Clustering
- Association
Chosen: NNge
Why?
- High accuracy rate
- Simple, clear rules
Algorithm         Correct instances (%)
NaiveBayes        63.82
DecisionStump     65.95
Id3               84.04
J48               75.53
NBTree            79.78
ConjunctiveRule   69.14
DecisionTable     80.85
NNge              87.23
OneR              71.27
PART              72.34
Prism             88.29
Ridor             71.27
JRip              74.46
ZeroR             63.83
AdaBoostM1        65.95
BayesNet          60.63
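
One way to read the comparison table is to rank the classifiers and measure each against ZeroR, which always predicts the majority class and so serves as a baseline. The sketch below uses only the percentages listed on the slide; the variable names are illustrative.

```python
# Correctly classified instances (%) as listed on the slide.
accuracy = {
    "NaiveBayes": 63.82, "DecisionStump": 65.95, "Id3": 84.04, "J48": 75.53,
    "NBTree": 79.78, "ConjunctiveRule": 69.14, "DecisionTable": 80.85,
    "NNge": 87.23, "OneR": 71.27, "PART": 72.34, "Prism": 88.29,
    "Ridor": 71.27, "JRip": 74.46, "ZeroR": 63.83, "AdaBoostM1": 65.95,
    "BayesNet": 60.63,
}

baseline = accuracy["ZeroR"]  # majority-class predictor
ranked = sorted(accuracy.items(), key=lambda item: item[1], reverse=True)
below_baseline = [name for name, pct in accuracy.items() if pct < baseline]

for name, pct in ranked:
    print(f"{name:<16}{pct:6.2f}  {pct - baseline:+6.2f} vs ZeroR")
print("Below baseline:", below_baseline)
```

On these figures Prism and NNge lead the ranking, about 24 percentage points above the baseline, while NaiveBayes and BayesNet fall below it.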

10 Methodology
NNge classifier
- A nearest-neighbor-like algorithm using non-nested generalized exemplars.
- A rule-based classifier.
- Builds a model of generalized exemplars ("hyperrectangles") that can be read as if-then rules.
- Shows promise as an ML method that performs well on a wide range of datasets.

11 Result

12

13 Rules
One of the customer rules:
class Would_Buy IF :
  cost in {10-20}
  ^ phone in {yes}
  ^ email in {yes}
  ^ fax in {no}
  ^ chat in {yes,no}
  ^ other in {no}
  ^ service type in {Phone_cards_only}
  ^ price in {Somewhat_Dissatisfied, Somewhat_Satisfied}
  ^ voice_quality in {Somewhat_Dissatisfied, Somewhat_Satisfied}
  ^ service in {Somewhat_Dissatisfied}
  ^ convenience in {Somewhat_Satisfied}
  ^ promotion in {Somewhat_Dissatisfied}
  ^ Know VoIP in {yes,no}
  ^ marital status in {Single}
  ^ gender in {Male}
  (11)
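
The rule above is a conjunction of set-membership tests, which can be sketched as a dictionary mapping each attribute to its allowed values. Attribute names containing spaces are normalized with underscores here (an assumption for illustration), the `instance` is hypothetical, and `covers` only checks whether the rule covers an instance; it does not reproduce NNge's nearest-exemplar fallback for instances that no rule covers.

```python
# The Would_Buy rule from the slide, one allowed-value set per attribute.
rule = {
    "cost": {"10-20"},
    "phone": {"yes"},
    "email": {"yes"},
    "fax": {"no"},
    "chat": {"yes", "no"},
    "other": {"no"},
    "service_type": {"Phone_cards_only"},
    "price": {"Somewhat_Dissatisfied", "Somewhat_Satisfied"},
    "voice_quality": {"Somewhat_Dissatisfied", "Somewhat_Satisfied"},
    "service": {"Somewhat_Dissatisfied"},
    "convenience": {"Somewhat_Satisfied"},
    "promotion": {"Somewhat_Dissatisfied"},
    "know_voip": {"yes", "no"},
    "marital_status": {"Single"},
    "gender": {"Male"},
}


def covers(rule, instance):
    """Return True when every attribute value falls in the rule's allowed set."""
    return all(instance.get(attr) in allowed for attr, allowed in rule.items())


# Hypothetical respondent, for illustration only.
instance = {
    "cost": "10-20", "phone": "yes", "email": "yes", "fax": "no",
    "chat": "no", "other": "no", "service_type": "Phone_cards_only",
    "price": "Somewhat_Satisfied", "voice_quality": "Somewhat_Dissatisfied",
    "service": "Somewhat_Dissatisfied", "convenience": "Somewhat_Satisfied",
    "promotion": "Somewhat_Dissatisfied", "know_voip": "yes",
    "marital_status": "Single", "gender": "Male",
}

matches = covers(rule, instance)
print("Would_Buy" if matches else "not covered by this rule")
```

Conditions such as `chat in {yes,no}` accept every value of a yes/no attribute, so they do not restrict the rule in practice.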

14 Result
Statistics:
- Class allocation
- Feature weights

15 Result
Basic-model & Deluxe-model
- Scheme: meta.AttributeSelectedClassifier
- Sub-scheme: rules.NNge
- Selected attributes: 3,6,8,10,11,12 (6 in total)
- Why? To avoid overfitting

16 Result
Evaluation: ten-fold cross-validation
- Summary: correctly classified instances > 85%
- Detailed accuracy by class: TP rate, FP rate, precision, recall, F-measure
- Confusion matrix: 12 of 94 instances misclassified
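
The evaluation figures can be tied together with a short sketch: how 94 instances divide into ten folds, and how accuracy and the per-class measures follow from a confusion matrix. The slide does not reproduce the matrix itself, so the counts below are hypothetical, chosen only so that they total 94 with 12 misclassified; the `Would_Not_Buy` label is likewise assumed. The fold split is a plain, non-stratified one and may differ from how Weka assigns instances.

```python
def fold_sizes(n_instances, n_folds=10):
    """Sizes of the test folds when instances are split as evenly as possible."""
    base, extra = divmod(n_instances, n_folds)
    return [base + 1 if i < extra else base for i in range(n_folds)]


def per_class_metrics(matrix, labels):
    """matrix[i][j] counts instances of actual class i predicted as class j."""
    total = sum(sum(row) for row in matrix)
    metrics = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp
        fp = sum(row[k] for row in matrix) - tp
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0  # same as the TP rate
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        fp_rate = fp / (fp + tn) if fp + tn else 0.0
        metrics[label] = {
            "tp_rate": recall, "fp_rate": fp_rate,
            "precision": precision, "recall": recall, "f_measure": f_measure,
        }
    return metrics


labels = ["Would_Buy", "Would_Not_Buy"]
matrix = [[30, 5],   # hypothetical counts: actual Would_Buy
          [7, 52]]   # hypothetical counts: actual Would_Not_Buy

total = sum(sum(row) for row in matrix)
correct = sum(matrix[k][k] for k in range(len(labels)))
accuracy_pct = 100 * correct / total
metrics = per_class_metrics(matrix, labels)

print(fold_sizes(total))
print(f"{accuracy_pct:.2f}% correct, {total - correct} misclassified")
for label, values in metrics.items():
    print(label, {name: round(value, 3) for name, value in values.items()})
```

With 12 of 94 instances misclassified, accuracy works out to 82/94, or about 87.23%, which matches the NNge figure in the algorithm comparison.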

17 Result

18 Conclusion
Limitations
- Small datasets
- Incomplete data source
Models
- High accuracy rate
- Help further market analysis
- Help product design

19 Learning Experience
Worked through a real data mining problem
Know classification algorithms better
- Numeric vs. nominal attributes
- Missing data
- Overfitting
Know evaluation methods better
- How to compare algorithms
- Evaluation factors

20 Learning Experience
Learned how to use Weka
- Future work: learn how to modify the source to perform better data mining
Learned from classmates

21 References
- Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann, 2001.
- Ian H. Witten and Eibe Frank, "Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations", Morgan Kaufmann, 2000.
- Machine Learning, Weka Home Page: http://www.cs.waikato.ac.nz/~ml/index.html
- David A. Aaker, V. Kumar and George S. Day, "Marketing Research", eighth edition, Wiley, 2004.

22 Thank you

