1 On-Line Application Processing Warehousing Data Cubes Data Mining.

Slides:



Advertisements
Similar presentations
OLAP Tuning. Outline OLAP 101 – Data warehouse architecture – ROLAP, MOLAP and HOLAP Data Cube – Star Schema and operations – The CUBE operator – Tuning.
Advertisements

Data Mining of Very Large Data
Data Analysis. Overview Traditional database systems are tuned to many, small, simple queries. Some applications use fewer, more time-consuming, analytic.
Data Warehousing CPS216 Notes 13 Shivnath Babu. 2 Warehousing l Growing industry: $8 billion way back in 1998 l Range from desktop to huge: u Walmart:
1 Introduction to SQL Multirelation Queries Subqueries Slides are reused by the approval of Jeffrey Ullman’s.
Data Warehousing M R BRAHMAM.
Jennifer Widom On-Line Analytical Processing (OLAP) Introduction.
Data Mining, Frequent-Itemset Mining
2/10/05Salman Azhar: Database Systems1 On-Line Analytical Processing Salman Azhar Warehousing Data Cubes Data Mining These slides use some figures, definitions,
OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries.
Lecture 1: Data Warehousing Based on the slides by Jeffrey D. Ullman and Hector Garcia-Molina at Stanford University 1.
1 More SQL Defining a Database Schema Views. 2 Defining a Database Schema uA database schema comprises declarations for the relations (“tables”) of the.
SLIDE 1IS 257 – Fall 2011 Data Mining and the Weka Toolkit University of California, Berkeley School of Information IS 257: Database Management.
Asssociation Rules Prof. Sin-Min Lee Department of Computer Science.
Data Mining, Frequent-Itemset Mining. Data Mining Some mining problems Find frequent itemsets in "market-basket" data – "50% of the people who buy hot.
Advanced Querying OLAP Part 2. Context OLAP systems for supporting decision making. Components: –Dimensions with hierarchies, –Measures, –Aggregation.
COMP 578 Data Warehousing And OLAP Technology Keith C.C. Chan Department of Computing The Hong Kong Polytechnic University.
CSE6011 Warehouse Models & Operators  Data Models  relations  stars & snowflakes  cubes  Operators  slice & dice  roll-up, drill down  pivoting.
Chapter 13 The Data Warehouse
SLIDE 1IS 257 – Fall 2010 Data Mining and the Weka Toolkit University of California, Berkeley School of Information IS 257: Database Management.
Tanvi Madgavkar CSE 7330 FALL Ralph Kimball states that : A data warehouse is a copy of transaction data specifically structured for query and analysis.
On-Line Application Processing Warehousing Data Cubes Data Mining 1.
Online Analytical Processing (OLAP) Hweichao Lu CS157B-02 Spring 2007.
Chetan Bhirud Raza Mohammad Abinash Sahoo Online Marketing Giant.
© 2014 Zvi M. Kedem 1 Unit 11 Online Analytical Processing (OLAP) Basic Concepts.
Business Intelligence. Topics Chart Online Analytical Process, OLAP – Excel’s Pivot table – Data visualization with dashboard Data warehousing Data Mining.
Week 6 Lecture The Data Warehouse Samuel Conn, Asst. Professor
1 Transactions, Views, Indexes Controlling Concurrent Behavior Virtual and Materialized Views Speeding Accesses to Data This slides are from J. Ullman’s.
On-Line Analytic Processing Chetan Meshram Class Id:221.
SLIDE 1IS 257 – Fall 2014 Data Mining and Introduction to Big Data University of California, Berkeley School of Information IS 257: Database.
DW-1: Introduction to Data Warehousing. Overview What is Database What Is Data Warehousing Data Marts and Data Warehouses The Data Warehousing Process.
Databases 2 Storage optimization: functional dependencies, normal forms, data ware houses.
OnLine Analytical Processing (OLAP)
1 On-Line Application Processing Warehousing Data Cubes Data Mining.
Database Design Part of the design process is deciding how data will be stored in the system –Conventional files (sequential, indexed,..) –Databases (database.
DIMENSIONAL MODELLING. Overview Clearly understand how the requirements definition determines data design Introduce dimensional modeling and contrast.
1 Data Warehouses BUAD/American University Data Warehouses.
BUSINESS ANALYTICS AND DATA VISUALIZATION
Winter 2006Winter 2002 Keller, Ullman, CushingJudy Cushing 19–1 Warehousing The most common form of information integration: copy sources into a single.
Ayyat IT Group Murad Faridi Roll NO#2492 Muhammad Waqas Roll NO#2803 Salman Raza Roll NO#2473 Junaid Pervaiz Roll NO#2468 Instructor :- “ Madam Sana Saeed”
Fox MIS Spring 2011 Data Warehouse Week 8 Introduction of Data Warehouse Multidimensional Analysis: OLAP.
1 On-Line Analytic Processing Warehousing Data Cubes.
On-Line Application Processing Warehousing Data Cubes (Data Mining) (slides borrowed from Stanford)
ADVANCED TOPICS IN RELATIONAL DATABASES Spring 2011 Instructor: Hassan Khosravi.
Business Intelligence. Topics Chart Online Analytical Process, OLAP – Excel’s Pivot table – Data visualization with dashboard Scenario Management Data.
Business Intelligence Transparencies 1. ©Pearson Education 2009 Objectives What business intelligence (BI) represents. The technologies associated with.
OLAP On Line Analytic Processing. OLTP On Line Transaction Processing –support for ‘real-time’ processing of orders, bookings, sales –typically access.
1 Views, Indexes Virtual and Materialized Views Speeding Accesses to Data.
CSE 5331/7331 F'071 CSE 5331/7331 Fall 2007 Dimensional Modeling Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist University.
SLIDE 1IS 257 – Fall 2015 Data Mining and Introduction to Big Data University of California, Berkeley School of Information IS 257: Database.
1 Introduction to Database Systems, CS420 SQL Views and Indexes.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support Chapter 25.
Introduction to OLAP and Data Warehouse Assoc. Professor Bela Stantic September 2014 Database Systems.
Data Warehouses and OLAP 1.  Review Questions ◦ Question 1: OLAP ◦ Question 2: Data Warehouses ◦ Question 3: Various Terms and Definitions ◦ Question.
Databases 2 On-Line Application Processing: Warehousing, Data Cubes, Data Mining.
Data Warehousing and OLAP Outline u Models & operations u Implementing a warehouse u Future directions.
On-Line Application Processing
Operation Data Analysis Hints and Guidelines
Data warehouse.
Data Warehousing CIS 4301 Lecture Notes 4/20/2006.
Data Warehouse.
On-Line Analytic Processing
Data warehouse and OLAP
Chapter 13 The Data Warehouse
CS 440 Database Management Systems
On-Line Analytical Processing (OLAP)
On-Line Application Processing
Instructor: Zhe He Department of Computer Science
Presentation transcript:

1 On-Line Application Processing Warehousing Data Cubes Data Mining

2 Overview uTraditional database systems are tuned to many, small, simple queries. uSome new applications use fewer, more time-consuming, analytic queries. uNew architectures have been developed to handle analytic queries efficiently.

3 The Data Warehouse uThe most common form of data integration. wCopy sources into a single DB (warehouse) and try to keep it up-to-date. wUsual method: periodic reconstruction of the warehouse, perhaps overnight. wFrequently essential for analytic queries.

4 OLTP uMost database operations involve On- Line Transaction Processing (OTLP). wShort, simple, frequent queries and/or modifications, each involving a small number of tuples. wExamples: Answering queries from a Web interface, sales at cash registers, selling airline tickets.

5 OLAP uOn-Line Application Processing (OLAP, or “analytic”) queries are, typically: wFew, but complex queries --- may run for hours. wQueries do not depend on having an absolutely up-to-date database.

6 OLAP Examples 1.Amazon analyzes purchases by its customers to come up with an individual screen with products of likely interest to the customer. 2.Analysts at Wal-Mart look for items with increasing sales in some region. wUse empty trucks to move merchandise between stores.

7 Common Architecture uDatabases at store branches handle OLTP. uLocal store databases copied to a central warehouse overnight. uAnalysts use the warehouse for OLAP.

8 Star Schemas uA star schema is a common organization for data at a warehouse. It consists of: 1.Fact table : a very large accumulation of facts such as sales. wOften “insert-only.” 2.Dimension tables : smaller, generally static information about the entities involved in the facts.

9 Example: Star Schema uSuppose we want to record in a warehouse information about every beer sale: the bar, the brand of beer, the drinker who bought the beer, the day, the time, and the price charged. uThe fact table is a relation: Sales(bar, beer, drinker, day, time, price)

10 Example -- Continued uThe dimension tables include information about the bar, beer, and drinker “dimensions”: Bars(bar, addr, license) Beers(beer, manf) Drinkers(drinker, addr, phone)

11 Visualization – Star Schema Dimension Table (Beers)Dimension Table (etc.) Dimension Table (Drinkers)Dimension Table (Bars) Fact Table - Sales Dimension Attrs.Dependent Attrs.

12 Dimensions and Dependent Attributes uTwo classes of fact-table attributes: 1.Dimension attributes : the key of a dimension table. 2.Dependent attributes : a value determined by the dimension attributes of the tuple.

13 Example: Dependent Attribute uprice is the dependent attribute of our example Sales relation. uIt is determined by the combination of dimension attributes: bar, beer, drinker, and the time (combination of day and time-of-day attributes).

14 Approaches to Building Warehouses 1.ROLAP = “relational OLAP”: Tune a relational DBMS to support star schemas. 2.MOLAP = “multidimensional OLAP”: Use a specialized DBMS with a model such as the “data cube.”

15 MOLAP and Data Cubes uKeys of dimension tables are the dimensions of a hypercube. wExample: for the Sales data, the four dimensions are bar, beer, drinker, and time. uDependent attributes (e.g., price) appear at the points of the cube.

16 Visualization -- Data Cubes price bar beer drinker

17 Marginals uThe data cube also includes aggregation (typically SUM) along the margins of the cube. uThe marginals include aggregations over one dimension, two dimensions,…

18 Visualization --- Data Cube w/Aggregation price bar beer drinker SUM over all Drinkers

19 Example: Marginals uOur 4-dimensional Sales cube includes the sum of price over each bar, each beer, each drinker, and each time unit (perhaps days). uIt would also have the sum of price over all bar-beer pairs, all bar-drinker- day triples,…

20 Structure of the Cube uThink of each dimension as having an additional value *. uA point with one or more *’s in its coordinates aggregates over the dimensions with the *’s. uExample: Sales(”Joe’s Bar”, ”Bud”, *, *) holds the sum, over all drinkers and all time, of the Bud consumed at Joe’s.

21 Drill-Down uDrill-down = “de-aggregate” = break an aggregate into its constituents. uExample: having determined that Joe’s Bar sells very few Anheuser-Busch beers, break down his sales by particular A.-B. beer.

22 Roll-Up uRoll-up = aggregate along one or more dimensions. uExample: given a table of how much Bud each drinker consumes at each bar, roll it up into a table giving total amount of Bud consumed by each drinker.

23 Example: Roll Up and Drill Down JimBobMary Joe’s Bar Nut- House Blue Chalk $ of Anheuser-Busch by drinker/bar MaryBobJim $ of A-B / drinker Roll up by Bar Bud Light M’lob Bud MaryBobJim $ of A-B Beers / drinker Drill down by Beer

24 Data Mining uData mining is a popular term for queries that summarize big data sets in useful ways. uExamples: 1.Clustering all Web pages by topic. 2.Finding characteristics of fraudulent credit-card use.

25 Course Plug uWinter : Anand Rajaraman and Jeff Ullman are offering CS345A Data Mining. wMW 4:15-5:30, Herrin, T185.

26 Market-Basket Data uAn important form of mining from relational data involves market baskets = sets of “items” that are purchased together as a customer leaves a store. uSummary of basket data is frequent itemsets = sets of items that often appear together in baskets.

27 Example: Market Baskets uIf people often buy hamburger and ketchup together, the store can: 1.Put hamburger and ketchup near each other and put potato chips between. 2.Run a sale on hamburger and raise the price of ketchup.

28 Finding Frequent Pairs uThe simplest case is when we only want to find “frequent pairs” of items. uAssume data is in a relation Baskets(basket, item). uThe support threshold s is the minimum number of baskets in which a pair appears before we are interested.

29 Frequent Pairs in SQL SELECT b1.item, b2.item FROM Baskets b1, Baskets b2 WHERE b1.basket = b2.basket AND b1.item < b2.item GROUP BY b1.item, b2.item HAVING COUNT(*) >= s; Look for two Basket tuples with the same basket and different items. First item must precede second, so we don’t count the same pair twice. Create a group for each pair of items that appears in at least one basket. Throw away pairs of items that do not appear at least s times.

30 A-Priori Trick – (1) uStraightforward implementation involves a join of a huge Baskets relation with itself. uThe a-priori algorithm speeds the query by recognizing that a pair of items {i, j } cannot have support s unless both {i } and {j } do.

31 A-Priori Trick – (2) uUse a materialized view to hold only information about frequent items. INSERT INTO Baskets1(basket, item) SELECT * FROM Baskets WHERE item IN ( SELECT item FROM Baskets GROUP BY item HAVING COUNT(*) >= s ); Items that appear in at least s baskets.

32 A-Priori Algorithm 1.Materialize the view Baskets1. 2.Run the obvious query, but on Baskets1 instead of Baskets. uComputing Baskets1 is cheap, since it doesn’t involve a join. uBaskets1 probably has many fewer tuples than Baskets. wRunning time shrinks with the square of the number of tuples involved in the join.

33 Example: A-Priori uSuppose: 1.A supermarket sells 10,000 items. 2.The average basket has 10 items. 3.The support threshold is 1% of the baskets. uAt most 1/10 of the items can be frequent. uProbably, the minority of items in one basket are frequent -> factor 4 speedup.