
Managing Data for DSS II

Managing Data for DSS: Data Warehouse
Common characteristics:
–Database designed for analytical tasks, comprising data from multiple applications
–Small number of users with intense and long interactions
–Read-intensive usage
–Periodic updates to the contents
–Consists of current as well as historical data
–Relatively few but large tables
–Queries return large result sets, involving full table scans and joins spanning several tables
–Aggregation, vector operations and summarization are common
–The data frequently resides in external heterogeneous sources

Introduction- Terminology
–Current Detail Data- Data acquired directly from operational databases, often representing the entire enterprise.
–Old Detail Data- Aged current detail data; historical data organized by subject, which helps in trend analysis.
–Data Marts- A large data store for informational needs whose scope is limited to a department, SBU, etc. In a phased implementation, data marts are a way to build a warehouse.
–Summarized Data- Data aggregated along the lines required for executive reporting, trend analysis and decision support.
–Metadata- Data about the data: description of contents, location, structure, end-user views, identification of authoritative data, history of updates, security authorizations.

Introduction- Architecture
[Figure: architecture diagram. Component labels: Extract, Cleanup & Load; External; Current; Repository; Metadata; Realized or Virtual; MDDB; Management; Information Delivery System; Report, Query & EIS; OLAP Tools; Data Mining Tools]

The Data Warehouse is an integrated, subject-oriented, time-variant, non-volatile database that provides support for decision making.
–Integrated: The Data Warehouse is a centralized, consolidated database that integrates data retrieved from the entire organization.
–Subject-Oriented: The Data Warehouse data is arranged and optimized to provide answers to questions coming from diverse functional areas within a company.

–Time-Variant: The Warehouse data represent the flow of data through time. It can even contain projected data.
–Non-Volatile: Once data enter the Data Warehouse, they are never removed. The Data Warehouse is always growing.

Major Tasks in Data Preparation
–Data cleaning: fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies
–Data integration: integration of multiple databases, data cubes, or files
–Data transformation: normalization and aggregation
–Data reduction: obtains a representation that is reduced in volume but produces the same or similar analytical results
–Data discretization: part of data reduction, but of particular importance, especially for numerical data

Extraction, Cleanup, Integration
Data Cleaning
–Missing Values
»Ignore the tuple
»Fill in the value manually
»Use a global constant to fill in the value
»Use the attribute mean as the missing value, e.g., the average monthly income of all customers
»Use the attribute mean of all samples belonging to the same class, e.g., the average income of customers in the same class (credit_risk, emp_status)
»Use the most probable value, predicted by regression, Bayesian classifiers, or decision tree induction
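The two mean-based strategies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer records, the income figures, and the helper names fill_with_global_mean and fill_with_class_mean are hypothetical and do not come from the slides.

```python
from statistics import mean

# Hypothetical customer records; None marks a missing income value.
customers = [
    {"credit_risk": "low", "income": 5000},
    {"credit_risk": "low", "income": 7000},
    {"credit_risk": "low", "income": None},
    {"credit_risk": "high", "income": 2000},
    {"credit_risk": "high", "income": None},
    {"credit_risk": "high", "income": 4000},
]


def fill_with_global_mean(rows, attr):
    """Replace missing values of attr with the mean over all known values."""
    fill = mean(r[attr] for r in rows if r[attr] is not None)
    return [{**r, attr: fill if r[attr] is None else r[attr]} for r in rows]


def fill_with_class_mean(rows, attr, class_attr):
    """Replace missing values of attr with the mean of the row's own class.

    Assumes every class has at least one known value of attr.
    """
    by_class = {}
    for r in rows:
        if r[attr] is not None:
            by_class.setdefault(r[class_attr], []).append(r[attr])
    class_means = {c: mean(vals) for c, vals in by_class.items()}
    return [
        {**r, attr: class_means[r[class_attr]] if r[attr] is None else r[attr]}
        for r in rows
    ]


global_filled = fill_with_global_mean(customers, "income")
class_filled = fill_with_class_mean(customers, "income", "credit_risk")
```

With this sample data the global mean fills both gaps with the same value, while the class mean gives the low-risk and high-risk customers different fill values, which is the point of conditioning on the class.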

Extraction, Cleanup, Integration
Data Cleaning
–Noisy Data- A random error or variance in a measured value. Given a price attribute, how can we smooth out the data to remove noise?
»Binning: smooth the sorted data by consulting the neighbours. Given the sorted values, partition them into bins (Bin 1, Bin 2, Bin 3), then replace each bin's values by the bin mean or the bin boundaries.
»Clustering
»Regression: smooth the data by fitting it to a function.
–Inconsistent Data- Corrected manually or through a rule base
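Binning can be sketched as follows. The price values are illustrative and are not taken from the slide, whose own numbers did not survive; the function names are hypothetical, and the sketch assumes equal-depth bins with a value count that divides evenly by the number of bins.

```python
def equal_depth_bins(sorted_values, n_bins):
    """Split already-sorted values into n_bins bins of equal size.

    Assumes len(sorted_values) is divisible by n_bins.
    """
    depth = len(sorted_values) // n_bins
    return [sorted_values[i * depth:(i + 1) * depth] for i in range(n_bins)]


def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value by the closer of the bin's min and max."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins are sorted, so these are the boundaries
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


# Illustrative prices (an assumption, not the slide's data).
prices = sorted([4, 8, 15, 21, 21, 24, 25, 28, 34])
bins = equal_depth_bins(prices, 3)
by_means = smooth_by_means(bins)
by_boundaries = smooth_by_boundaries(bins)
```

Smoothing by means flattens each bin to a single value, while smoothing by boundaries keeps the extremes of each bin and snaps the interior values to the nearer one.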

Data Transformation
–Smoothing: remove noise from data
–Aggregation: summarization, data cube construction
–Generalization: concept hierarchy climbing
–Normalization: values scaled to fall within a small, specified range
»min-max normalization
»z-score normalization
»normalization by decimal scaling

Data Transformation: Normalization
min-max normalization
Suppose that the minimum and maximum values for the attribute income are £12,000 and £98,000, respectively. We map income to the range [0.0, 1.0]. By min-max normalization, a value of £73,600 for income is transformed to $\frac{73{,}600 - 12{,}000}{98{,}000 - 12{,}000} \times (1.0 - 0.0) + 0 = 0.716$.

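Data Transformation: Normalization
–min-max normalization: $v' = \frac{v - \min_A}{\max_A - \min_A} \times (new\_max_A - new\_min_A) + new\_min_A$
–z-score normalization: $v' = \frac{v - \bar{A}}{\sigma_A}$, where $\bar{A}$ and $\sigma_A$ are the mean and standard deviation of attribute $A$
–normalization by decimal scaling: $v' = \frac{v}{10^j}$, where $j$ is the smallest integer such that $\max(|v'|) < 1$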
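The three normalization methods can be sketched in Python as below. The function names are hypothetical, and the z-score version assumes the population standard deviation (statistics.pstdev); using the sample standard deviation instead is an equally reasonable choice.

```python
from statistics import mean, pstdev


def min_max(v, old_min, old_max, new_min=0.0, new_max=1.0):
    """Map v linearly from [old_min, old_max] to [new_min, new_max]."""
    return (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


def z_score(v, values):
    """Express v as standard deviations from the mean of values.

    Assumption: population standard deviation is used.
    """
    return (v - mean(values)) / pstdev(values)


def decimal_scaling(values):
    """Divide by 10**j for the smallest j that makes max(|v'|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


# The income example from the slides: 73,600 on a 12,000..98,000 scale.
income_scaled = round(min_max(73600, 12000, 98000), 3)

# Hypothetical values for the other two methods.
z = z_score(9, [2, 4, 4, 4, 5, 5, 7, 9])
scaled, j = decimal_scaling([-986, 917])
```

Here income_scaled reproduces the 0.716 of the worked example, and decimal_scaling returns both the scaled values and the exponent j so the transformation can be reversed.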

Star Schema
The star schema is a data-modeling technique used to map multidimensional decision support data into a relational database. Star schemas yield an easily implemented model for multidimensional data analysis while still preserving the relational structure of the operational database.
Four components:
–Facts
–Dimensions
–Attributes
–Attribute hierarchies

A Simple Star Schema

Star Schema
Facts
–Facts are numeric measurements (values) that represent a specific business aspect or activity.
–The fact table contains facts that are linked through their dimensions.
–Facts can be computed or derived at run-time (metrics).
Dimensions
–Dimensions are qualifying characteristics that provide additional perspectives to a given fact.
–Dimensions are stored in dimension tables.

Star Schema
Attributes
–Each dimension table contains attributes. Attributes are often used to search, filter, or classify facts.
–Dimensions provide descriptive characteristics about the facts through their attributes.
Possible Attributes For Sales Dimensions

Three Dimensional View Of Sales

Slice And Dice View Of Sales

Star Schema
Attribute Hierarchies
–Attributes within dimensions can be ordered in a well-defined attribute hierarchy.
–The attribute hierarchy provides a top-down data organization that is used for two main purposes:
»Aggregation
»Drill-down/roll-up data analysis
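Roll-up and drill-down along an attribute hierarchy amount to grouping the same facts at a coarser or finer level. The sketch below assumes a hypothetical two-level location hierarchy (country above city) and made-up sales figures; the function name roll_up is likewise an illustration.

```python
from collections import defaultdict

# Hypothetical sales facts, each tagged with its location attributes.
sales = [
    {"country": "Canada", "city": "Toronto", "units_sold": 10},
    {"country": "Canada", "city": "Toronto", "units_sold": 5},
    {"country": "Canada", "city": "Vancouver", "units_sold": 7},
    {"country": "USA", "city": "Chicago", "units_sold": 3},
]


def roll_up(facts, level_attrs, measure):
    """Sum a measure after grouping facts by the given hierarchy levels."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[a] for a in level_attrs)
        totals[key] += fact[measure]
    return dict(totals)


# Fine-grained view: one total per (country, city).
by_city = roll_up(sales, ["country", "city"], "units_sold")

# Rolled up one level: one total per country.
by_country = roll_up(sales, ["country"], "units_sold")
```

Moving from by_country to by_city is a drill-down, and the reverse is a roll-up; both use the same facts and differ only in which hierarchy levels form the grouping key.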

A Location Attribute Hierarchy

Attribute Hierarchies In Multidimensional Analysis

Example of Star Schema
Sales Fact Table: time_key, item_key, branch_key, location_key; Measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_street, country
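The schema above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not a production design: the table and column names follow the slide, but the column types, the name sales for the fact table, the sample rows, and the query are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "time" (time_key INTEGER PRIMARY KEY, day INTEGER,
    day_of_the_week TEXT, month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT,
    brand TEXT, type TEXT, supplier_type TEXT);
CREATE TABLE branch (branch_key INTEGER PRIMARY KEY, branch_name TEXT,
    branch_type TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
    city TEXT, province_or_street TEXT, country TEXT);
-- Fact table: one foreign key per dimension, plus the measures.
CREATE TABLE sales (
    time_key INTEGER REFERENCES "time"(time_key),
    item_key INTEGER REFERENCES item(item_key),
    branch_key INTEGER REFERENCES branch(branch_key),
    location_key INTEGER REFERENCES location(location_key),
    units_sold INTEGER, dollars_sold REAL, avg_sales REAL);
""")

# Hypothetical sample rows.
conn.executemany('INSERT INTO "time" VALUES (?, ?, ?, ?, ?, ?)', [
    (1, 15, "Monday", 1, "Q1", 2024),
    (2, 20, "Saturday", 4, "Q2", 2024),
])
conn.execute("INSERT INTO item VALUES (1, 'Laptop', 'BrandA', 'computer', 'wholesale')")
conn.execute("INSERT INTO branch VALUES (1, 'Downtown', 'retail')")
conn.executemany("INSERT INTO location VALUES (?, ?, ?, ?, ?)", [
    (1, "1 Main St", "Toronto", "Ontario", "Canada"),
    (2, "2 Lake Ave", "Chicago", "Illinois", "USA"),
])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, 2, 2000.0, 1000.0),
    (1, 1, 1, 2, 1, 1000.0, 1000.0),
    (2, 1, 1, 1, 3, 3000.0, 1000.0),
])

# Join the fact table to two dimensions and aggregate a measure.
rows = conn.execute("""
    SELECT t.quarter, l.country, SUM(s.dollars_sold)
    FROM sales AS s
    JOIN "time" AS t ON s.time_key = t.time_key
    JOIN location AS l ON s.location_key = l.location_key
    GROUP BY t.quarter, l.country
    ORDER BY t.quarter, l.country
""").fetchall()
```

The query shows the typical star-schema access pattern: the fact table sits in the middle, each join pulls in one dimension, and the GROUP BY picks the attribute levels (quarter, country) at which the measure is summarized.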