Some slide material taken from: Groth, Han and Kamber, SAS Institute Data Mining A special presentation for BCIS 4660 Spring 2012 Dr. Nick Evangelopoulos,



Some slide material taken from: Groth, Han and Kamber, SAS Institute Data Mining A special presentation for BCIS 4660 Spring 2012 Dr. Nick Evangelopoulos, ITDS Dept.

Overview of this Presentation
– Introduction to Data Mining
– Examples of Data Mining applications
– The SEMMA Methodology
– SAS EM Demo: The Home Equity Loan Case
– Logistic Regression
– Decision Trees
– Neural Networks

Introduction to DM “It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.” Sir Arthur Conan Doyle: Sherlock Holmes, "A Scandal in Bohemia"

What Is Data Mining?
– Data mining (knowledge discovery in databases): a process of identifying hidden patterns and relationships within data (Groth)
– Data mining: extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases

Multidisciplinary
[Diagram relating Data Mining to: Databases, Statistics, Pattern Recognition, KDD, Machine Learning, AI, Neurocomputing]

Architecture of a Typical Data Mining System
[Diagram components: Graphical user interface; Pattern evaluation; Data mining engine; Knowledge-base; Database or data warehouse server; Data cleaning & data integration; Filtering; Databases; Data Warehouse]

A Data Mining example: The Home Equity Loan Case
The analytic goal is to determine who should be approved for a home equity loan. The target variable is a binary variable that indicates whether an applicant eventually defaulted on the loan. The input variables include the amount of the loan, the amount due on the existing mortgage, the value of the property, and the number of recent credit inquiries.

HMEQ case overview
– The consumer credit department of a bank wants to automate the decision-making process for approval of home equity lines of credit. To do this, they will follow the recommendations of the Equal Credit Opportunity Act to create an empirically derived and statistically sound credit scoring model. The model will be based on data collected from recent applicants granted credit through the current process of loan underwriting. The model will be built from predictive modeling tools, but the created model must be sufficiently interpretable so as to provide a reason for any adverse actions (rejections).
– The HMEQ data set contains baseline and loan performance information for 5,960 recent home equity loans. The target (BAD) is a binary variable that indicates whether an applicant eventually defaulted or was seriously delinquent. This adverse outcome occurred in 1,189 cases (20%). For each applicant, 12 input variables were recorded.
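The 20% figure quoted above can be checked directly from the two counts on the slide. The following is only a small arithmetic sketch; the function name `base_rate` is illustrative and not part of the original deck.

```python
# Counts quoted on the slide for the HMEQ data set.
TOTAL_LOANS = 5960
BAD_LOANS = 1189


def base_rate(bad: int, total: int) -> float:
    """Return the fraction of cases with the adverse outcome (BAD = 1)."""
    return bad / total


rate = base_rate(BAD_LOANS, TOTAL_LOANS)
print(f"Share of BAD=1 cases: {rate:.1%}")
```

The result is just under 20%, which also means that a trivial rule approving everyone would be wrong on roughly one loan in five. Any candidate model has to beat that baseline.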

The HMEQ Loan process
1. An applicant comes forward with a specific property and a reason for the loan (Home-Improvement, Debt-Consolidation)
2. Background info related to job and credit history is collected
3. The loan gets approved or rejected
4. Upon approval, the Applicant becomes a Customer
5. Information related to how the loan is serviced is maintained, including the Status of the loan (Current, Delinquent, Defaulted, Paid-Off)

The HMEQ Loan Transactional Database
Entity Relationship Diagram (ERD), Logical Design:
[Diagram. Entities: APPLICANT, CUSTOMER, PROPERTY, OFFICER, ACCOUNT, HISTORY. Relationship labels: "becomes"; "Applies for HMEQ Loan on… using…" (attributes: Reason, Loan, Approval, Date); "has". Other attributes shown: Balance, Status, MonthlyPayment.]

HMEQ Transactional database: the relations
Entity Relationship Diagram (ERD), Physical Design:
Applicant (APPLICANTID, NAME, JOB, DEBTINC, YOJ, DEROG, CLNO, DELINQ, CLAGE, NINQ)
Property (PROPERTYID, ADDRESS, VALUE, MORTDUE)
HMEQLoanApplication (OFFICERID, APPLICANTID, PROPERTYID, LOAN, REASON, DATE, APPROVAL)
Customer (CUSTOMERID, APPLICANTID, NAME, ADDRESS)
Account (ACCOUNTID, CUSTOMERID, PROPERTYID, ADDRESS, BALANCE, MONTHLYPAYMENT, STATUS)
Officer (OFFICERID, OFFICERNAME, PHONE, FAX)
History (HISTORYID, ACCOUNTID, PAYMENT, DATE)

The HMEQ Loan Data Warehouse Design
We have some slowly changing attributes:
– HMEQLoanApplication: Loan, Reason, Date
– Applicant: Job and Credit Score related attributes
– Property: Value, Mortgage, Balance
An applicant may reapply for a loan, and by then some of these attributes may have changed.
– Need to introduce “Key” attributes and make them primary keys
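One way to picture the “Key” attributes is as surrogate keys: each time an applicant's attributes change, the warehouse stores a new row with a fresh key, while the business identifier stays the same. The sketch below is only an illustration of that idea. The class names, the choice of columns, and the sample values are all hypothetical and do not come from the deck.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class ApplicantRow:
    applicant_key: int  # surrogate key: primary key in the warehouse
    applicant_id: int   # business identifier from the transactional database
    job: str
    debtinc: float


class ApplicantDimension:
    """Keeps one row per version of an applicant (hypothetical helper)."""

    def __init__(self) -> None:
        self._next_key = count(1)
        self.rows: list[ApplicantRow] = []

    def add_version(self, applicant_id: int, job: str, debtinc: float) -> int:
        """Store a new version and return its surrogate key."""
        row = ApplicantRow(next(self._next_key), applicant_id, job, debtinc)
        self.rows.append(row)
        return row.applicant_key


dim = ApplicantDimension()
first_key = dim.add_version(101, "Office", 34.5)  # first application
second_key = dim.add_version(101, "Mgr", 29.0)    # reapplies after a job change
versions = [r for r in dim.rows if r.applicant_id == 101]
print(first_key, second_key, len(versions))
```

Because fact rows point at the surrogate key rather than at APPLICANTID, each loan application stays linked to the applicant attributes that were valid when it was filed.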

The HMEQ Loan Data Warehouse Design
STAR 1 – Loan Application facts
– Fact Table: HMEQApplicationFact
– Dimensions: Applicant, Property, Officer, Time
STAR 2 – Loan Payment facts
– Fact Table: HMEQPaymentFact
– Dimensions: Customer, Property, Account, Time

Two Star Schemas for HMEQ Loans
Fact tables:
HMEQApplicationFact (APPLICANTKEY, PROPERTYKEY, OFFICERKEY, TIMEKEY, LOAN, REASON, APPROVAL)
HMEQPaymentFact (CUSTOMERKEY, PROPERTYKEY, ACCOUNTKEY, TIMEKEY, BALANCE, PAYMENT, STATUS)
Dimension tables:
Applicant (APPLICANTKEY, APPLICANTID, NAME, JOB, DEBTINC, YOJ, DEROG, CLNO, DELINQ, CLAGE, NINQ)
Property (PROPERTYKEY, PROPERTYID, ADDRESS, VALUE, MORTDUE)
Customer (CUSTOMERKEY, CUSTOMERID, APPLICANTID, NAME, ADDRESS)
Time (TIMEKEY, DATE, MONTH, YEAR)
Account (ACCOUNTKEY, LOAN, MATURITYDATE, MONTHLYPAYMENT)
Officer (OFFICERKEY, OFFICERID, OFFICERNAME, PHONE, FAX)
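As a rough sketch, the first star (loan application facts) could be declared as follows, using Python's built-in `sqlite3` module and an in-memory database. The table and column names are taken from the slide. The column types, the foreign-key clauses, and the use of SQLite are assumptions made for illustration.

```python
import sqlite3

# Star 1: one fact table surrounded by four dimension tables.
# Column types are assumed; the slide lists names only.
DDL = """
CREATE TABLE Applicant (
    APPLICANTKEY INTEGER PRIMARY KEY,
    APPLICANTID  INTEGER NOT NULL,
    NAME TEXT, JOB TEXT,
    DEBTINC REAL, YOJ REAL, DEROG INTEGER, CLNO INTEGER,
    DELINQ INTEGER, CLAGE REAL, NINQ INTEGER
);
CREATE TABLE Property (
    PROPERTYKEY INTEGER PRIMARY KEY,
    PROPERTYID  INTEGER NOT NULL,
    ADDRESS TEXT, VALUE REAL, MORTDUE REAL
);
CREATE TABLE Officer (
    OFFICERKEY INTEGER PRIMARY KEY,
    OFFICERID  INTEGER NOT NULL,
    OFFICERNAME TEXT, PHONE TEXT, FAX TEXT
);
CREATE TABLE "Time" (
    TIMEKEY INTEGER PRIMARY KEY,
    "DATE" TEXT, MONTH INTEGER, YEAR INTEGER
);
CREATE TABLE HMEQApplicationFact (
    APPLICANTKEY INTEGER REFERENCES Applicant(APPLICANTKEY),
    PROPERTYKEY  INTEGER REFERENCES Property(PROPERTYKEY),
    OFFICERKEY   INTEGER REFERENCES Officer(OFFICERKEY),
    TIMEKEY      INTEGER REFERENCES "Time"(TIMEKEY),
    LOAN REAL, REASON TEXT, APPROVAL INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# List the tables that were created.
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

The second star (HMEQPaymentFact with Customer, Property, Account, and Time) would follow the same pattern and would share the Property and Time dimensions.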

The HMEQ Loan DW: Questions asked by management
– How many applications were filed each month during the last year? What percentage of them were approved each month?
– How has the monthly average loan amount been fluctuating during the last year? Is there a trend?
– Which customers were delinquent in their loan payment during the month of September?
– How many loans have defaulted each month during the last year? Is there an increasing or decreasing trend?
– How many defaulting loans were approved last year by each loan officer? Who are the officers with the largest number of defaulting loans?
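Questions of this kind map onto aggregate queries over a fact table joined to the Time dimension. The sketch below answers the first two questions (applications per month, percentage approved, average loan amount) on a handful of made-up rows. The sample data, the 1/0 encoding of APPROVAL, and the simplified Time table with one row per month are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Time" (TIMEKEY INTEGER PRIMARY KEY, MONTH INTEGER, YEAR INTEGER);
CREATE TABLE HMEQApplicationFact (TIMEKEY INTEGER, LOAN REAL, APPROVAL INTEGER);
""")

# Hypothetical sample rows: two months, six applications.
conn.executemany('INSERT INTO "Time" VALUES (?, ?, ?)', [(1, 1, 2011), (2, 2, 2011)])
conn.executemany(
    "INSERT INTO HMEQApplicationFact VALUES (?, ?, ?)",
    [
        (1, 10000, 1), (1, 20000, 0), (1, 15000, 1), (1, 5000, 1),
        (2, 30000, 1), (2, 10000, 0),
    ],
)

# Count applications, approval percentage, and average loan per month.
QUERY = """
SELECT t.YEAR, t.MONTH,
       COUNT(*)                           AS applications,
       100.0 * SUM(f.APPROVAL) / COUNT(*) AS pct_approved,
       AVG(f.LOAN)                        AS avg_loan
FROM HMEQApplicationFact AS f
JOIN "Time" AS t ON t.TIMEKEY = f.TIMEKEY
GROUP BY t.YEAR, t.MONTH
ORDER BY t.YEAR, t.MONTH
"""
monthly = conn.execute(QUERY).fetchall()
for row in monthly:
    print(row)
```

The questions about delinquent customers and defaulting loans would be asked of HMEQPaymentFact in the same way, filtering on STATUS.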

The HMEQ Loan DW: Some more involved questions
– Are there any patterns suggesting which applicants are more likely to default on their loan after it is approved?
– Can we relate loan defaults to applicant job and credit history?
– Can we estimate probabilities to default based on applicant attributes at the time of application? Are there applicant segments with higher probability?
– Can we look at relevant data and build a predictive model that will estimate such probability to default on the HMEQ loan?
– If we make such a model part of our business policy, can we decrease the percentage of loans that eventually default by applying more stringent loan approval criteria?
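A first, very rough look at whether some applicant segments default more often is to compute the observed default rate per segment. The sketch below does this for a JOB-like attribute on a few made-up records. The segment labels and outcomes are hypothetical, and an observed rate on past loans is only a crude stand-in for the model-based probabilities discussed later in the deck.

```python
from collections import defaultdict

# Hypothetical (segment, BAD) pairs; BAD = 1 means the loan defaulted.
records = [
    ("Sales", 1), ("Sales", 0),
    ("Office", 0), ("Office", 0), ("Office", 1), ("Office", 0),
]


def default_rate_by_segment(rows):
    """Return {segment: share of loans with BAD = 1}."""
    totals = defaultdict(int)
    bads = defaultdict(int)
    for segment, bad in rows:
        totals[segment] += 1
        bads[segment] += bad
    return {segment: bads[segment] / totals[segment] for segment in totals}


rates = default_rate_by_segment(records)
for segment, rate in sorted(rates.items()):
    print(f"{segment}: {rate:.0%}")
```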

Selecting Task-relevant attributes
[Diagram: the two star schemas, from which the task-relevant attributes are selected]
Fact tables:
HMEQApplicationFact (APPLICANTKEY, PROPERTYKEY, OFFICERKEY, TIMEKEY, LOAN, REASON, APPROVAL)
HMEQPaymentFact (CUSTOMERKEY, PROPERTYKEY, ACCOUNTKEY, TIMEKEY, BALANCE, PAYMENT, STATUS)
Dimension tables:
Customer (CUSTOMERKEY, CUSTOMERID, APPLICANTID, NAME, ADDRESS)
Time (TIMEKEY, DATE, MONTH, YEAR)
Account (ACCOUNTKEY, LOAN, MATURITYDATE, MONTHLYPAYMENT)
Applicant (APPLICANTKEY, APPLICANTID, NAME, JOB, DEBTINC, YOJ, DEROG, CLNO, DELINQ, CLAGE, NINQ)
Officer (OFFICERKEY, OFFICERID, OFFICERNAME, PHONE, FAX)
Property (PROPERTYKEY, PROPERTYID, ADDRESS, VALUE, MORTDUE)

HMEQ final task-relevant data file

Name      Model Role   Measurement Level   Description
BAD       Target       Binary              1=defaulted on loan, 0=paid back loan
REASON    Input        Binary              HomeImp=home improvement, DebtCon=debt consolidation
JOB       Input        Nominal             Six occupational categories
LOAN      Input        Interval            Amount of loan request
MORTDUE   Input        Interval            Amount due on existing mortgage
VALUE     Input        Interval            Value of current property
DEBTINC   Input        Interval            Debt-to-income ratio
YOJ       Input        Interval            Years at present job
DEROG     Input        Interval            Number of major derogatory reports
CLNO      Input        Interval            Number of trade lines
DELINQ    Input        Interval            Number of delinquent trade lines
CLAGE     Input        Interval            Age of oldest trade line in months
NINQ      Input        Interval            Number of recent credit inquiries

HMEQ: Modeling Goal
– The credit scoring model should compute the probability that a given loan applicant will default on loan repayment. A threshold is to be selected such that all applicants whose probability of default exceeds the threshold are recommended for rejection.
– Using the HMEQ task-relevant data file, three competing models will be built: a Logistic Regression model, a Decision Tree, and a Neural Network
– Model assessment will allow us to select the best of the three alternative models
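The scoring-plus-threshold rule can be sketched in a few lines. The example below uses a logistic function to turn a linear score into a probability, then compares it with a threshold. The intercept, the coefficients, the two sample applicants, and the 0.5 threshold are all hypothetical values chosen for illustration. None of them are fitted to the HMEQ data or taken from the deck.

```python
import math

# Hypothetical coefficients, for illustration only (not fitted to HMEQ).
INTERCEPT = -3.0
COEFFS = {"DELINQ": 0.7, "DEBTINC": 0.05}


def probability_of_default(applicant: dict) -> float:
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum of coeff * input)))."""
    score = INTERCEPT + sum(COEFFS[name] * applicant[name] for name in COEFFS)
    return 1.0 / (1.0 + math.exp(-score))


def recommend(prob: float, threshold: float) -> str:
    """Recommend rejection when the probability exceeds the threshold."""
    return "reject" if prob > threshold else "approve"


applicants = [
    {"DELINQ": 0, "DEBTINC": 30.0},
    {"DELINQ": 3, "DEBTINC": 45.0},
]
THRESHOLD = 0.5  # assumed; in practice chosen during model assessment
probs = [probability_of_default(a) for a in applicants]
decisions = [recommend(p, THRESHOLD) for p in probs]
for p, d in zip(probs, decisions):
    print(f"p(default) = {p:.3f} -> {d}")
```

Lowering the threshold rejects more applicants and should reduce defaults at the cost of turning away some good customers. That trade-off is what the model assessment step has to weigh.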

Predictive Modeling
[Diagram: data grid labeled Inputs, Cases, Target]

Introducing SAS Enterprise Miner (EM)

The SEMMA Methodology
– Introduced by SAS Institute
– Implemented in SAS Enterprise Miner (EM)
– Organizes a DM effort into 5 activity groups: Sample, Explore, Modify, Model, Assess

Sample
Nodes: Input Data Source, Sampling, Data Partition
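The idea behind a data partition step is to split the cases at random into separate sets for fitting, tuning, and final assessment of a model. The sketch below shows one simple way to do that in plain Python. It is not a description of how Enterprise Miner implements its node, and the 40/30/30 fractions and the seed are assumptions made for illustration.

```python
import random


def partition(cases, fractions=(0.4, 0.3, 0.3), seed=12345):
    """Randomly split cases into training, validation, and test sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_train = round(len(shuffled) * fractions[0])
    n_valid = round(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains
    return train, valid, test


# Case numbers stand in for the 5,960 HMEQ loans.
train, valid, test = partition(range(5960))
print(len(train), len(valid), len(test))
```

This is a simple random split. With a target as unbalanced as BAD (about 20% positives), a split stratified on the target is a common refinement.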

Explore
Nodes: Distribution Explorer, Multiplot, Insight, Association, Variable Selection, Link Analysis

Modify
Nodes: Data Set Attributes, Transform Variables, Filter Outliers, Replacement, Clustering, Self-Organized Maps / Kohonen Networks, Time Series

Model
Nodes: Regression, Tree, Neural Network, Princomp/Dmneural, User Defined Model, Ensemble, Memory Based Reasoning, Two-Stage Model

Assess
Nodes: Assessment, Reporter

SAS EM Demo: HMEQ Case