DATA MINING FINAL REPORT Vipin Saini M964011062 許博淞 M964020009 陳昀志 M964020043.


1 DATA MINING FINAL REPORT Vipin Saini M964011062 許博淞 M964020009 陳昀志 M964020043

2 Outline
- Introduction
- DM Methodology (Steps 1-3)
- DM Methodology (Steps 4-8)
- DM Methodology (Steps 9-10)
- Conclusion

3 Introduction
- Direct marketing
- Response rate
- Telecommunications company
- Publicly available business data
- Addition of random companies

4 Step 2: Records
Some characteristics of each prospect:
- Number of employees at a particular office
- Number of employees for the entire company
- Annual sales (in thousands) at a particular office
- Annual sales (in thousands) for the entire company
- Whether or not the company does business outside the United States
- Annual advertising expense
- Whether the company has moved recently or is a new business
- The type of ownership
- Specific industry code
- General industry code
- Age of the company (in years)

5 Step 3: Data Type
Correcting the data types:
- Make sure "Buyer" is of type Yes/No.
- Change "Age" to integer.
- Make sure "International" is of type string or Boolean.
- Change "Local Employees" to integer.
- Change "Local Sales" to integer.
- Change "Industry Type" to categorical.
- Change "Total Employees" to integer.
- Change "Total Sales" to integer.
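
The type corrections on this slide can be sketched in plain Python. This is only an illustration: the column names follow the slide, but the sample rows, the `to_bool` helper, and the `CONVERTERS` table are hypothetical and are not taken from the report or from PolyAnalyst.

```python
import csv
import io

# Hypothetical sample rows; the values are invented for illustration only.
SAMPLE = """Buyer,Age,International,Local Employees,Local Sales,Industry Type,Total Employees,Total Sales
Yes,3,No,12,450,C,120,9800
No,15,Yes,40,2100,E,40,2100
"""

def to_bool(text):
    """Map Yes/No style strings to a boolean."""
    return text.strip().lower() in {"yes", "y", "true", "1"}

# One converter per column, mirroring the type corrections listed on the slide.
CONVERTERS = {
    "Buyer": to_bool,
    "Age": int,
    "International": to_bool,
    "Local Employees": int,
    "Local Sales": int,
    "Industry Type": str,  # kept as a categorical label
    "Total Employees": int,
    "Total Sales": int,
}

def load_typed(handle):
    """Read CSV rows and convert every column to its corrected type."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({name: convert(raw[name]) for name, convert in CONVERTERS.items()})
    return rows

records = load_typed(io.StringIO(SAMPLE))
print(records[0])
```

Keeping the converters in a single table makes it easy to check that every column named on the slide has been given a type.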

6 Step 4: Create a Model Set
- Employee counts and sales figures differ with the size of the company. Taken together, these characteristics give a picture of company size.
- Derived variables: Employee Ratio, Sales Ratio, Productivity Ratio
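
The slide names the three ratios but does not define them. The sketch below assumes Employee Ratio = local employees / total employees, Sales Ratio = local sales / total sales, and Productivity Ratio = total sales / total employees. These definitions are assumptions, as are the helper names `safe_ratio` and `add_size_ratios`.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when a value is missing or the denominator is zero."""
    if numerator is None or not denominator:
        return None
    return numerator / denominator

def add_size_ratios(record):
    """Return a copy of the record with three derived ratios (assumed definitions)."""
    out = dict(record)
    # Assumption: share of the company's staff working at this office.
    out["Employee Ratio"] = safe_ratio(record["Local Employees"], record["Total Employees"])
    # Assumption: share of the company's sales made at this office.
    out["Sales Ratio"] = safe_ratio(record["Local Sales"], record["Total Sales"])
    # Assumption: company-wide sales per employee.
    out["Productivity Ratio"] = safe_ratio(record["Total Sales"], record["Total Employees"])
    return out

# Hypothetical prospect, invented for illustration.
prospect = {"Local Employees": 12, "Total Employees": 120, "Local Sales": 450, "Total Sales": 9800}
print(add_size_ratios(prospect))
```

Returning None for an undefined ratio leaves room for a separate "N/A" branch, like the Sales Ratio = N/A node that appears in the decision tree.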

7 Step4: Create a Model Set  With our newly applied rules, the World dataset now has redundant columns.

8 Step5: Fix Problems with the Data  Categorical variables with too many values
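
The slide does not say how the high-cardinality variables were handled. One common approach, shown here purely as an assumption, is to fold infrequent values into a single catch-all category. The function name `collapse_rare`, the threshold, and the sample codes are hypothetical.

```python
from collections import Counter

def collapse_rare(values, min_count, other="Other"):
    """Replace values seen fewer than min_count times with a catch-all label."""
    counts = Counter(values)
    return [value if counts[value] >= min_count else other for value in values]

# Hypothetical industry codes, invented for illustration.
codes = ["A", "A", "A", "B", "B", "C", "D"]
print(collapse_rare(codes, min_count=2))
```

Mapping a specific industry code to its general industry code, both of which appear among the record fields, would be another way to reduce the number of values.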

9 Step 6: Transform the Data
- Create a training set and a testing set
- Total records: 13117
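
A random half-and-half split of the 13117 records, matching the 50% testing set described under Step 9, could look like the sketch below. The function name `split_half` and the fixed seed are assumptions; the report does not describe how PolyAnalyst drew the sample.

```python
import random

def split_half(records, seed=0):
    """Shuffle a copy of the records and cut it into two halves."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = len(shuffled) // 2
    return shuffled[:cut], shuffled[cut:]

# Record indices stand in for the real rows here.
train, test = split_half(range(13117))
print(len(train), len(test))
```

With an odd record count one half is a single record larger than the other.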

10 Step 7: Build Model
- We used PolyAnalyst 5.0 to mine the data.

11 Step 7: Build Model
- We used our edited MarketData.CSV file as the source. After the software filtered out the missing values, it produced the decision tree.

12 The Decision Tree
[Figure: the full decision tree. Node labels recoverable from the diagram: Root; Local Employee < 23 (Age < 3; Age < 2; Sales Ratio < 0.0027; Sales Ratio >= 0.0027; Sales Ratio = N/A; Age >= 2; Age >= 3; Local Employee < 10; Local Employee >= 10); Local Employee >= 23 (Industry Category = C, H, F, E, D, A, B, G, I; Employee Ratio < 0.214).]

13 The Decision Tree
We built a decision tree with:
- Number of non-terminal nodes: 41
- Number of leaves: 91
- Depth of the tree: 8

14 Step 8: Assess Model
Results of the decision tree on the training set:
- Total classification error: 14.04%
- Classification accuracy: 85.96%
- Classification error for class No: 14.89%
- Classification error for class Yes: 13.01%

Real \ Predicted   No     Yes    Undefined
No                 3018   528    49
Yes                379    2535   49
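
The reported error rates can be recomputed from the confusion matrix. Reading the training counts as No: 3018 / 528 / 49 and Yes: 379 / 2535 / 49, the percentages on the slide are reproduced when the "undefined" predictions are left out of the denominators. The function name `error_rates` is an assumption; the counts are those on the slide.

```python
def error_rates(matrix):
    """matrix[actual][predicted] holds counts; returns (overall error, per-class error)."""
    classes = list(matrix)
    total = sum(matrix[a][p] for a in classes for p in classes)
    wrong = sum(matrix[a][p] for a in classes for p in classes if a != p)
    per_class = {}
    for actual in classes:
        row_total = sum(matrix[actual][p] for p in classes)
        per_class[actual] = (row_total - matrix[actual][actual]) / row_total
    return wrong / total, per_class

# Training-set counts from the slide, with the "undefined" column excluded.
training = {
    "No": {"No": 3018, "Yes": 528},
    "Yes": {"No": 379, "Yes": 2535},
}

overall, per_class = error_rates(training)
print(f"total error: {overall:.2%}")
print(f"accuracy:    {1 - overall:.2%}")
for label, rate in per_class.items():
    print(f"error for class {label}: {rate:.2%}")
```

The same function applies unchanged to the testing-set matrix.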

15 Step 8: Assess Model
- If we take the top 40% of the data as ranked by this model, we capture 80% of the actual responders.
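
A figure of this kind is usually read off a cumulative gains calculation: rank the prospects by model score, take the top fraction, and count the share of all responders that fall inside it. The sketch below assumes that interpretation. The function name `captured_response` and the toy scores and labels are invented for illustration and are not the report's data.

```python
def captured_response(scores, labels, top_fraction):
    """Share of all responders (label 1) found in the top-scored fraction of prospects."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = round(len(ranked) * top_fraction)
    hits = sum(label for _, label in ranked[:cutoff])
    total = sum(labels)
    return hits / total if total else 0.0

# Toy data: ten prospects, five of them responders.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
print(captured_response(scores, labels, 0.4))
```

In this toy example the top four of ten prospects contain four of the five responders.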

16 Step 9: Deploy Models
- The testing set is a randomly selected 50% of the records from the whole dataset.
- Total classification error: 15.54%
- Classification accuracy: 84.46%
- Classification error for class No: 16.56%
- Classification error for class Yes: 14.19%

Real \ Predicted   No     Yes    Undefined
No                 3074   610    45
Yes                396    2395   39

17 Step 10: Assess Result
[Figure: first split of the decision tree. The Root splits into Local Employee < 23 and Local Employee >= 23; leaf labels: Yes, No.]

18 Step 10: Assess Result
- Companies with 23 or more local employees are more likely to respond (class label Yes, ratio 75.5%).
- In other words, bigger companies with more employees tend to respond.
- Companies with fewer than 23 local employees are unlikely to respond (class label No, ratio 72.9%).
- In other words, small companies tend not to respond.

19 Step 10: Assess Result
[Figure: the Local Employee >= 23 subtree. It splits on Industry Category (C, H, F, E, D, A, B, G, I); Industry Category = E splits further into Employee Ratio < 0.214 and Employee Ratio >= 0.214; leaf labels: Yes, No.]

20 Step 10: Assess Result
- If the Employee Ratio is smaller than 0.214, the response ratio is low (class label No, ratio 85.7%).
- If the Employee Ratio is 0.214 or larger, the response ratio is high (class label Yes, ratio 66.2%).
- For bigger companies in Industry Category E, the response therefore depends on the Employee Ratio.
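
The rules read off the tree so far can be written as a small function. This is a deliberately simplified sketch covering only the first split (Local Employee at 23) and the Employee Ratio split under Industry Category E. The other 41 non-terminal nodes are ignored, and the function name `predict_response` and its parameter names are hypothetical.

```python
def predict_response(local_employees, industry_category=None, employee_ratio=None):
    """Return 'Yes' or 'No' using only the two splits discussed on the slides."""
    if local_employees < 23:
        # Small offices: class label No (ratio 72.9% on the slide).
        return "No"
    if industry_category == "E" and employee_ratio is not None:
        # Within category E the Employee Ratio decides the label.
        return "Yes" if employee_ratio >= 0.214 else "No"
    # Larger offices otherwise: class label Yes (ratio 75.5% on the slide).
    return "Yes"

print(predict_response(10))
print(predict_response(50, "E", 0.10))
print(predict_response(50, "E", 0.50))
print(predict_response(50, "C"))
```

A real deployment would score prospects with the full tree exported from the mining tool rather than with hand-copied rules.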

21 Step 10: Assess Result
[Figure: the Local Employee < 23 subtree. Node labels: Age < 3; Age < 2; Sales Ratio < 0.0027; Sales Ratio >= 0.0027; Sales Ratio = N/A; Age >= 2; Age >= 3; Local Employee < 10; Local Employee >= 10; highlighted leaf label: Yes.]

22 Step 10: Assess Result
- If the Sales Ratio is 0.0027 (0.27%) or more, the response ratio is high (class label Yes, ratio 98.2%).
- In other words, a newly started company with a good sales ratio tends to respond.

23 Conclusion
- We used a decision tree to approach target marketing.
- Knowing a company's industry category lets us get more information out of the mining results.

24 Thank you for listening!

