
Data Mining Solutions (Westphal & Blaxton, 1998) Dr. K. Palaniappan Dept. of Computer Engineering & Computer Science, UMC.




1 Data Mining Solutions (Westphal & Blaxton, 1998) Dr. K. Palaniappan Dept. of Computer Engineering & Computer Science, UMC

2 What is Data Mining? (Sep. 30, 1999, Dr. K. Palaniappan)
  “Something old, something new”
  Data mining vs applied statistics
  Data mining vs pattern recognition
  Data mining vs machine intelligence
  Unique characteristics
  Large information databases
  Exploratory analysis vs predefined hypothesis

3 What is Data Mining?
  Unique characteristics (cont’d)
  Qualitative vs quantitative tools
  Visualization vs numeric tests
  Data massaging - cleaning, warehousing
  Discover “interesting” patterns and trends that have business relevance (revenue & profits)

4 Data Mining Activities
  Marketing - predict response to direct mail or telephone solicitation using historical data
  Advertising - set ad rates based on number of Internet viewers, patterns of ad viewing
  Production - relate product quality to manufacturing and process variables
  Financial - identify anomalous patterns in transactions related to fraud, criminal activity

5 Data Mining Activities
  Insurance - compare property claims vs estimated damage from natural disasters, e.g. Hurricane Floyd, earthquakes
  E-Commerce - product clusters, zShops
  Information - WWW, electronic journals
  Medical - identifying disease (e.g. cancer) with a causative agent
  Analysis vs monitoring
  Offline vs online
  “good” vs “bad” transaction or condition

6 Data Mining Tasks
  Classification or identification - automatically label input records
  Estimation or regression - predict magnitude of response or other missing field given input records
  Segmentation or clustering - group the input records into meaningful sub-populations
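
The three tasks on this slide can be illustrated with a minimal sketch on toy data. Everything here is an assumption made for illustration: the records, the function names (`classify`, `fit_line`, `two_means`), and the choice of nearest-class-mean, least-squares, and two-group k-means as stand-ins. The slides do not name specific algorithms.

```python
from statistics import mean

# Invented toy records: (monthly_spend, responded_to_mailing).
records = [(10.0, "no"), (12.0, "no"), (15.0, "no"),
           (40.0, "yes"), (45.0, "yes"), (50.0, "yes")]

def classify(spend, labelled):
    """Classification: label a new record by the nearest class mean."""
    centroids = {}
    for label in {lab for _, lab in labelled}:
        centroids[label] = mean(x for x, lab in labelled if lab == label)
    return min(centroids, key=lambda lab: abs(spend - centroids[lab]))

def fit_line(xs, ys):
    """Estimation: ordinary least-squares fit of y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def two_means(values, iterations=10):
    """Segmentation: split 1-D values into two groups (k-means, k=2)."""
    lo, hi = min(values), max(values)
    low_group, high_group = list(values), []
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high_group = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high_group:  # all values identical: nothing to split
            break
        lo, hi = mean(low_group), mean(high_group)
    return low_group, high_group

label = classify(14.0, records)                      # label for an unseen record
intercept, slope = fit_line([1, 2, 3], [3, 5, 7])    # fitted line parameters
segments = two_means([x for x, _ in records])        # two spend segments
```

Classification and estimation both learn from records whose answer is already known, while segmentation only sees the inputs and has to propose the groups itself.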

7 Data Mining Tasks
  Description or visualization - looking for gems and diamonds among pebbles
  Exploit the power of human (visual) perception for detecting interesting patterns in data vs scrolling through textual tables

8 Predictive and Descriptive Goals
  Predictive - produce models for classification or estimation
  Descriptive - uncovering patterns and relationships

9 Structured and Unstructured Data
  Structured - fixed length, fixed format records with numeric values, character codes, strings, etc.
  “Unstructured” - images (e.g. aerial or satellite photos of damage for insurance claims), video (e.g. shopping pattern behavior)
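
As a rough illustration of what "fixed length, fixed format" means in practice, the sketch below slices one record into named fields by column offset. The layout (`customer_id`, `state`, `amount`), the offsets, and the sample record are hypothetical and not taken from the slides.

```python
# Hypothetical fixed-width layout: (field name, start offset, end offset).
LAYOUT = [("customer_id", 0, 6), ("state", 6, 8), ("amount", 8, 16)]

def parse_record(line):
    """Slice one fixed-length record into named, typed fields."""
    fields = {name: line[start:end].strip() for name, start, end in LAYOUT}
    fields["amount"] = float(fields["amount"])  # numeric value
    return fields

raw = "000123MO  199.50"   # 6 + 2 + 8 = 16 characters
record = parse_record(raw)
```

Because every field sits at a known offset, no delimiter or parser state is needed. Images and video offer no such layout, which is why the slide calls them "unstructured".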

10 Data Modeling
  Object modeling
  Object attributes - value for each attribute as extracted from data record
  Attribute assignment - e.g. notebook, cabinet, can, cup, case
  Size, shape, material, purpose
  State-based analysis

11 Data Modeling
  Flexibility in attribute modeling
  e.g. franchise (object class), city (attribute)
  OR city (object class), franchise (attribute)
  Analysis based on stores vs analysis based on location
  Class relationship model - links describe relationships modeled as objects with attributes
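
The franchise/city example can be sketched as one set of records modeled two ways. The store data, the `weekly_sales` field, and the helper `model_by` are invented for illustration.

```python
from collections import defaultdict

# Invented store records: (franchise, city, weekly_sales).
stores = [
    ("BurgerCo", "Columbia", 120),
    ("BurgerCo", "St. Louis", 200),
    ("TacoHut", "Columbia", 90),
    ("TacoHut", "Kansas City", 150),
]

def model_by(records, object_field):
    """Treat one field as the object class and the other as its attribute."""
    other = "city" if object_field == "franchise" else "franchise"
    objects = defaultdict(lambda: {other: [], "sales": 0})
    for franchise, city, sales in records:
        row = {"franchise": franchise, "city": city}
        obj = objects[row[object_field]]
        obj[other].append(row[other])   # attribute values seen for this object
        obj["sales"] += sales           # aggregate per object
    return dict(objects)

by_franchise = model_by(stores, "franchise")  # analysis based on stores
by_city = model_by(stores, "city")            # analysis based on location
```

The underlying records never change. Only the choice of which field plays the object role changes, and with it the questions the model can answer directly.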

12 Data Modeling
  Composite representations
  Combining objects with similar user-selected characteristics
  Data abstraction
  Metadata - data within data, data about data
  Metadata from dates, numbers, address
  Seasonality, warranty related parts failure
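
One way to read "metadata from dates" is that a single date field already contains several derived attributes. The sketch below assumes a one-year warranty and uses hypothetical helper names (`date_metadata`, `failed_in_warranty`). Neither comes from the slides.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def date_metadata(d):
    """Derive seasonality-style attributes hidden inside a single date."""
    return {
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,
        "weekday": WEEKDAYS[d.weekday()],
        "is_weekend": d.weekday() >= 5,
    }

def failed_in_warranty(purchased, failed, warranty_days=365):
    """Flag a part failure that falls inside the (assumed) warranty window."""
    return (failed - purchased).days <= warranty_days

meta = date_metadata(date(1999, 9, 30))
in_warranty = failed_in_warranty(date(1998, 9, 30), date(1999, 9, 30))
```

Grouping sales by `quarter` or `weekday` exposes seasonality, and the elapsed-days calculation is the kind of derived field that would support a warranty-related failure analysis.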

13 Data Modeling
  Descriptive vs transactional model
  Telephone calling patterns
  Intra- and inter-domain patterns
  Horizontal vs vertical
  Communication, transportation, inventory
  Combining data sources
  Spatial, temporal, structure-based (categorical clusters), value-based (discrete ranges)
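
The value-based (discrete ranges) grouping can be sketched by mapping a continuous value onto labelled ranges. Call duration is used here only because the slide mentions telephone calling patterns. The boundaries in `EDGES`, the names in `LABELS`, and the sample durations are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical range boundaries for call duration, in seconds.
EDGES = [60, 300, 900]
LABELS = ["short", "medium", "long", "very long"]

def to_range(seconds):
    """Map a continuous value to the label of the discrete range containing it."""
    return LABELS[bisect_right(EDGES, seconds)]

call_durations = [12, 45, 60, 250, 299, 1200]   # invented sample
histogram = Counter(to_range(s) for s in call_durations)
```

With `bisect_right`, a value equal to a boundary falls into the upper range, so a 60-second call counts as "medium" under these assumed edges.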

14 Data Modeling
  e.g. Tax compliance - tax return, real-property assets, motor vehicle records, bank transfers
  e.g. Medicare filings, pharmacy product pricing

15 Problem Definition
  Knowledge representation using hierarchical frameworks
  Objects --> Relationships --> Networks --> Applications --> Systems
  Procedural vs declarative knowledge
  Episodic data tagged with temporal and spatial information
  Semantic data more commonly analyzed

16 Data Preparation & Analysis
  Define data mining goals
  Accessing and preparing data
  Capitalization, concatenation, representation format, augmentation, abstraction, unit conversion, exclusion
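
Several of the preparation steps listed above can be shown in one small cleaning function. This is only a sketch: the field names (`first_name`, `last_name`, `phone`, `distance_miles`) and the rule for excluding a record are assumptions, not part of the slides.

```python
MILES_TO_KM = 1.609344

def prepare(record):
    """Return a cleaned copy of one raw record, or None to exclude it."""
    last = (record.get("last_name") or "").strip()
    if not last:                                   # exclusion: unusable row
        return None
    first = (record.get("first_name") or "").strip().title()  # capitalization
    phone_digits = "".join(ch for ch in record.get("phone", "") if ch.isdigit())
    return {
        "full_name": f"{first} {last.title()}".strip(),       # concatenation
        "phone": phone_digits,                                 # representation format
        "distance_km": round(record["distance_miles"] * MILES_TO_KM, 2),  # unit conversion
    }

raw_rows = [
    {"first_name": " jANE ", "last_name": "doe",
     "phone": "(573) 555-0100", "distance_miles": 10},
    {"first_name": "nobody", "last_name": "  ",
     "phone": "", "distance_miles": 3},
]
cleaned = [row for row in map(prepare, raw_rows) if row is not None]
```

Normalizing case and phone format before analysis keeps the same person from appearing as several distinct objects later in the pipeline.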

