Understanding Data Mining Craig A. Stevens, PMP, CC

Presentation on theme: "Understanding Data Mining Craig A. Stevens, PMP, CC"— Presentation transcript:

1 Understanding Data Mining Craig A. Stevens, PMP, CC

2 Examples of Classical Statistical Methods

3 Latitude 36.19N and Longitude 86.78W Nashville, TN, USA

4 $Y_i = a + b x_i + e$
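The slide gives only the model equation. As a minimal sketch of how the intercept a and slope b could be estimated, the block below uses ordinary least squares, which is an assumption here since the transcript does not say which fitting method the presenter used. The function name and the sample data are hypothetical.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Return (a, b) minimizing the sum of squared residuals of y = a + b*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_simple_regression(x, y)
# The residuals play the role of the error term e in the slide's equation.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(f"a = {a:.3f}, b = {b:.3f}")
```

With an intercept in the model, the least-squares residuals sum to zero up to rounding, which is a quick sanity check on any fit.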

5 Multiple Regression

6 Multiple Regression

7 Multiple Regression

8 Multiple Regression

9 Multiple Regression

10 Data Mining

11 egorized/livejournal.png


13 What is Data Mining? The process of identifying hidden patterns, trends, and relationships in large quantities of data.
Why Do Data Mining? To discover useful information for making decisions. Too many variables for classical statistical methods to work.
– Large Number of Records: 10^8 – 10^12, Gigabyte – Terabyte
– High Dimensional Data: Lots of Variables (10 – 10^4 attributes)

14 The Huber-Wegman Taxonomy of Data Set Sizes

Descriptor      Data Set Size in Bytes   Storage Mode
Tiny            10^2                     Piece of Paper
Small           10^4                     A few Pieces of Paper
Medium          10^6                     A Floppy Disk
Large           10^8                     Hard Disk
Huge            10^10                    Multiple Hard Disks
Massive         10^12                    Robotic Magnetic Tape Storage Silos
Super Massive   10^15                    Distributed Data Archives
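The taxonomy can be read as a lookup from a byte count to a descriptor. The sketch below encodes the seven rows from the slide; the binning rule (pick the largest tier whose listed size does not exceed the byte count) is an assumption, since the slide lists representative sizes rather than exact boundaries. The function name is hypothetical.

```python
# (descriptor, representative size in bytes, storage mode), as listed on the slide.
TAXONOMY = [
    ("Tiny", 10**2, "Piece of Paper"),
    ("Small", 10**4, "A few Pieces of Paper"),
    ("Medium", 10**6, "A Floppy Disk"),
    ("Large", 10**8, "Hard Disk"),
    ("Huge", 10**10, "Multiple Hard Disks"),
    ("Massive", 10**12, "Robotic Magnetic Tape Storage Silos"),
    ("Super Massive", 10**15, "Distributed Data Archives"),
]


def describe_size(n_bytes):
    """Return (descriptor, storage mode) for a data set of n_bytes.

    Assumed rule: use the largest tier whose listed size is <= n_bytes;
    anything below the first tier is still reported as "Tiny".
    """
    if n_bytes < 0:
        raise ValueError("n_bytes must be non-negative")
    label, mode = TAXONOMY[0][0], TAXONOMY[0][2]
    for name, size, storage in TAXONOMY:
        if n_bytes >= size:
            label, mode = name, storage
    return label, mode


print(describe_size(1_440_000))   # a 1.44 MB file
print(describe_size(5 * 10**9))   # a 5 GB file
```

Under this rule a 1.44 MB file lands in the "Medium" tier, matching the slide's floppy-disk storage mode.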

15
Name      Model Role   Measurement Level   Description
BAD       Target       Binary              1=client defaulted on loan, 0=loan repaid
CLAGE     Input        Interval            Age of oldest trade line in months
CLNO      Input        Interval            Number of trade lines
DEBTINC   Input        Interval            Debt-to-income ratio
DELINQ    Input        Interval            Number of delinquent trade lines
DEROG     Input        Interval            Number of major derogatory reports
JOB       Input        Nominal             Six occupational categories
LOAN      Input        Interval            Amount of the loan request
MORTDUE   Input        Interval            Amount due on existing mortgage
NINQ      Input        Interval            Number of recent credit inquiries
REASON    Input        Binary              DebtCon=debt consolidation, HomeImp=home improvement
VALUE     Input        Interval            Value of current property
YOJ       Input        Interval            Years at present job
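Before modeling, variable metadata like this is typically used to separate the target from the inputs and to group inputs by measurement level. The sketch below shows one way that could look in plain Python; the names, roles, and levels are copied from the slide, while the `Variable` type and the grouping logic are illustrative assumptions and not how SAS Enterprise Miner stores this information.

```python
from collections import namedtuple

# Hypothetical container for one row of the slide's variable table.
Variable = namedtuple("Variable", ["name", "role", "level"])

VARIABLES = [
    Variable("BAD", "Target", "Binary"),
    Variable("CLAGE", "Input", "Interval"),
    Variable("CLNO", "Input", "Interval"),
    Variable("DEBTINC", "Input", "Interval"),
    Variable("DELINQ", "Input", "Interval"),
    Variable("DEROG", "Input", "Interval"),
    Variable("JOB", "Input", "Nominal"),
    Variable("LOAN", "Input", "Interval"),
    Variable("MORTDUE", "Input", "Interval"),
    Variable("NINQ", "Input", "Interval"),
    Variable("REASON", "Input", "Binary"),
    Variable("VALUE", "Input", "Interval"),
    Variable("YOJ", "Input", "Interval"),
]

# The variable the model should predict.
targets = [v.name for v in VARIABLES if v.role == "Target"]

# Inputs grouped by measurement level, e.g. to decide which ones need encoding.
inputs_by_level = {}
for v in VARIABLES:
    if v.role == "Input":
        inputs_by_level.setdefault(v.level, []).append(v.name)

print("Target:", targets)
for level, names in inputs_by_level.items():
    print(f"{level} inputs ({len(names)}):", ", ".join(names))
```

Grouping by level makes it clear that only JOB and REASON would need categorical handling; the remaining inputs are numeric.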

16 SAS Enterprise Miner Objects


18 Shows the Cutoff Point Is 6 Variables

19 Small Number of Useful Variables


21 Comparing Methods and Profit vs Marketing Cost



24 Decision Trees for Predictive Modeling, Padraic G. Neville, SAS Institute Inc., 4 August 1999

25 Clustering As in Different Brands


27 Data Mining Art found at

28 Data Mining Art found at


30 National Energy Research Scientific Computing Center

31 SurfStat: A Matlab toolbox for the statistical analysis of univariate and multivariate surface and volumetric data using linear mixed effects models and random field theory. Keith J. Worsley

32 Latitude 36.19N and Longitude 86.78W Nashville, TN, USA

33 Genealogical Tree on YouTube
