Group 5 Abhishek Das, Bharat Jangir.

Project Overview
We received a total of 119 responses, divided as follows:
- 73 from surveymonkey.com
- 26 from Facebook and Reddit
- 20 from friends and paper responses
We split the collection work by source:
- Bharat Jangir: Surveymonkey.com
- Abhishek Das: Facebook and Reddit
- Kevin Talmagde: friends and paper responses

Surveymonkey.com Benefits
- Easy to use
- Secure
- Trusted website
- Results can be formatted in several different ways
- Free, with an easy registration process

Survey Overview
We kept the survey short to increase the number of responses, since people tend to lose interest in long surveys. We wanted questions that anyone could answer. We then trimmed our question set from 18 questions to 6 by giving each question a score (1-5) based on how easy it was to answer and how reliably it would get a response.
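The trimming step above can be sketched in a few lines of Python. This is only an illustration: the slide does not say how the two criteria were combined, so the sketch assumes each criterion gets its own 1-5 score and the two are simply added. The candidate list and all scores are hypothetical.

```python
# Hypothetical candidate questions: (text, ease score, expected-response score).
# Each score is on the 1-5 scale; these values are illustrative only and are
# not the scores the team actually assigned.
CANDIDATES = [
    ("Age", 5, 5),
    ("Gender", 5, 5),
    ("Annual income", 2, 1),
    ("Highest level of education", 4, 5),
    ("Home address", 1, 1),
]


def shortlist(questions, keep):
    """Return the `keep` highest-scoring question texts.

    Assumption: the combined score is ease + expected response.
    sorted() is stable, so tied questions keep their input order.
    """
    ranked = sorted(questions, key=lambda q: q[1] + q[2], reverse=True)
    return [text for text, _ease, _response in ranked[:keep]]


print(shortlist(CANDIDATES, 3))
```

With a real pool of 18 questions, the same call with keep=6 would produce the final question set.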

Questionnaire
There were only 6 questions in total:
- Age
- Gender
- Highest level of education
- Do you use antivirus? If so, which one?
- Reuse of username
- Reuse of password

Team Organization
Abhishek Das
1) Chose and set up the tools used for analysis.
2) Carried out the analysis.
3) Devised and applied strategies to find associations between attributes.
4) Documented the report.
Bharat Jangir
1) Analysis of data.
2) Documentation.
3) Algorithm and graph interpretation.
Kevin Talmagde
1) Analysis of data.
2) Documentation.
3) Algorithm understanding and rule application.

Outcome
- We received a total of 119 responses in about 7 weeks.
- The majority of the responses came in during the initial weeks.
- Fewer females than males were open to taking the survey.
Validity?
- Completely anonymous
- Website is secure
- Plenty of time to publicize the survey

Analysis - By Gender

Analysis - By Education

Analysis - By username reuse

Analysis - By password reuse

More about algorithms and implementation
● Password reuse and use of antivirus: An initial pie chart of the data set showed that people do reuse passwords across more than one website. We built a decision tree to support this finding.
● The C4.5 / J48 algorithm was used to generate this decision tree.
● This algorithm selects the attributes to split on based on entropy.
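As a rough illustration of the entropy criterion mentioned above, the Python sketch below computes the entropy of a set of class labels and the information gain of splitting on one attribute. The rows, attribute names, and values are hypothetical and are not the survey data. C4.5 normally goes one step further and normalizes information gain into a gain ratio; the sketch shows only plain information gain.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of `target` minus the weighted entropy after splitting on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy rows, for illustration only.
ROWS = [
    {"education": "Bachelor", "gender": "M", "antivirus": "yes"},
    {"education": "Bachelor", "gender": "F", "antivirus": "yes"},
    {"education": "High school", "gender": "M", "antivirus": "no"},
    {"education": "High school", "gender": "F", "antivirus": "no"},
]

for attribute in ("education", "gender"):
    print(attribute, information_gain(ROWS, attribute, "antivirus"))
```

In this toy data "education" separates the classes perfectly while "gender" tells us nothing, so a tree builder using this criterion would split on "education" first.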

At each node the algorithm chooses the attribute that most effectively splits the data, measured by information gain. The higher an attribute's information gain, the closer it lies to the root node. Nodes are then split and re-split until the information gain is 0 or no splitting attributes remain.

Education level affecting the use of antivirus: The table below shows how education level affects whether or not a user uses antivirus for protection.

More about algorithms and implementation
The Apriori algorithm was used for association rule learning. It identifies frequent individual items and extends them to larger item sets as long as those item sets still appear frequently in the data. The frequent item sets found this way are then used to determine association rules. Example: market basket analysis.
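The level-by-level idea described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the WEKA implementation the team used: the transactions and item names are hypothetical, and support is measured as a raw count rather than a fraction.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset seen in at least
    `min_support` transactions, growing itemsets one item at a time."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates, then drop any
        # candidate that has an infrequent (k-1)-item subset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent (both must be frequent)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical survey-style transactions, for illustration only.
TRANSACTIONS = [
    {"reuses_password", "reuses_username", "no_antivirus"},
    {"reuses_password", "reuses_username"},
    {"reuses_password", "uses_antivirus"},
    {"reuses_username", "uses_antivirus"},
]

FREQUENT = apriori(TRANSACTIONS, min_support=2)
print(confidence(FREQUENT, {"reuses_password"}, {"reuses_username"}))
```

Each frequent item set can then be turned into candidate rules, keeping only those whose confidence passes a chosen threshold.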

Tool used - WEKA

J48 classification tree algorithm used to classify the population by antivirus use versus education level.

Jitter plot showing the population classification


Any questions?