Nguyen Ngoc Tuan – 50702771 Le Nguyen Duy Vu - 50703018 11/24/2010 1.

Presentation transcript:


1. Introduction to Data Warehouses and OLAP systems. 2. Security problem description and related work. 3. Classify Security Threats & Identify Security Requirements. 4. Solution: Three-tier Security Architecture. 5. Conclusion.

Introduction to Data Warehouses and OLAP systems

A data warehouse (DW) is a decision support database that is maintained separately from the organization's operational database. W. H. Inmon defines it as "a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process."

Subject-oriented: the DW is organized around major subjects such as customer, supplier, product and sales. Integrated: the DW is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files and on-line transaction records. Time-variant: data is stored to provide information from a historical perspective (e.g., the past 5-10 years). Nonvolatile: the DW is always a physically separate store of data transformed from the operational environment.

Information processing: supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts, or graphs. Analytical processing: supports basic OLAP operations, including slice-and-dice, drill-down, roll-up, and pivoting. Data mining.

Data cubes (also known as hypercubes or OLAP cubes) are multidimensional matrices. They are the data model of data warehouses and OLAP, and they support analysis by letting users query data from different perspectives. Example of a 3-dimensional data cube model: the three dimensions are Product, Location and Year.
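
As a rough illustration of the cube idea, the sketch below stores a tiny cube over the slide's three dimensions as a Python dictionary and aggregates it along chosen dimensions. The product names, cities, amounts and the `aggregate` helper are all made up for the example; they do not come from the slides.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, location, year) -> amount.
cells = {
    ("TV", "Hanoi", 2009): 120,
    ("TV", "Hue", 2009): 80,
    ("Phone", "Hanoi", 2009): 200,
    ("Phone", "Hanoi", 2010): 260,
}

DIMENSIONS = ("product", "location", "year")


def aggregate(cells, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    idx = [DIMENSIONS.index(name) for name in keep]
    out = defaultdict(int)
    for coords, amount in cells.items():
        out[tuple(coords[i] for i in idx)] += amount
    return dict(out)


by_product = aggregate(cells, ["product"])
by_location_year = aggregate(cells, ["location", "year"])
grand_total = aggregate(cells, [])
```

Each choice of `keep` corresponds to one view of the same data, which is what "querying from different perspectives" amounts to in this toy model.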

Two types of tables: dimension tables and fact tables. Three types of schema: Star schema. Snowflake schema: dimension tables from the star schema are organized into a hierarchy by normalization. Fact constellation: a set of fact tables that share some dimension tables.
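
A minimal star-schema sketch, using Python's built-in `sqlite3` module, may make the fact/dimension split concrete. The table names, columns and rows are assumptions chosen for illustration, not taken from the slides.

```python
import sqlite3

# In-memory database holding a hypothetical star schema:
# one fact table that references two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    location_id INTEGER REFERENCES dim_location(location_id),
    year INTEGER,
    amount INTEGER
);
INSERT INTO dim_product VALUES (1, 'TV', 'Electronics'), (2, 'Phone', 'Electronics');
INSERT INTO dim_location VALUES (1, 'Hanoi', 'Vietnam'), (2, 'Hue', 'Vietnam');
INSERT INTO fact_sales VALUES (1, 1, 2009, 120), (1, 2, 2009, 80), (2, 1, 2010, 260);
""")

# Join the fact table to one dimension and aggregate the measure.
rows = conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales AS f JOIN dim_product AS p USING (product_id)
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
```

A snowflake variant would, for instance, move `country` out of `dim_location` into its own table referenced by key.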

Star schema

Snowflake schema

Fact constellation

OLAP (On-Line Analytical Processing): a decision support system that enables analysts to construct a mental image of the underlying data (collected from the data warehouse) by exploring it from different perspectives, at different levels of generalization, and in an interactive manner.

OLAP provides a user-friendly environment for interactive data analysis. Roll-up (also called drill-up): performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction. Drill-down (the reverse of roll-up): navigates from less detailed data to more detailed data. Slice and dice: slice performs a selection on one dimension of the given cube, resulting in a subcube; dice selects on two or more dimensions. Pivot (rotate): a visualization operation that rotates the data axes to provide an alternative presentation of the data.
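
The operations above can be sketched on a small dictionary-based cube. This is only an illustration under assumed data: the quarter/employee cube, its values and the function names are hypothetical, and roll-up is shown only in its dimension-reduction form.

```python
from collections import defaultdict

# Hypothetical cube: (quarter, employee) -> commission.
cube = {
    ("Q1", "Alice"): 100, ("Q1", "Bob"): 150,
    ("Q2", "Alice"): 120, ("Q2", "Bob"): 130,
}


def roll_up(cube, axis):
    """Dimension reduction: sum out the dimension at position `axis`."""
    out = defaultdict(int)
    for coords, value in cube.items():
        out[coords[:axis] + coords[axis + 1:]] += value
    return dict(out)


def slice_cube(cube, axis, member):
    """Slice: keep the cells with a single member on one dimension."""
    return {c: v for c, v in cube.items() if c[axis] == member}


def dice(cube, allowed):
    """Dice: select on several dimensions; `allowed` maps axis -> set of members."""
    return {c: v for c, v in cube.items()
            if all(c[axis] in members for axis, members in allowed.items())}


def pivot(cube):
    """Pivot a 2-d cube by swapping its two axes."""
    return {(b, a): v for (a, b), v in cube.items()}


per_quarter = roll_up(cube, 1)    # employees summed out
per_employee = roll_up(cube, 0)   # quarters summed out
```

Drill-down is the reverse navigation: going from `per_quarter` back to the more detailed `cube`.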

Security problem description and related work

The threat comes from insiders who have legitimate access to data through OLAP queries. Access control techniques are not directly applicable because of the difference in data models. Protected data can be disclosed through indirect inferences. Inference control is absent in most commercial OLAP systems.

Restriction-based methods. Cell suppression: hide cells that contain small COUNT values, then detect possible inferences related to these cells and remove them using linear programming. Partitioning: define a partition on the sensitive data and restrict queries to aggregate only complete blocks of the partition. Micro-aggregation: replace clusters of sensitive data with their averages. Perturbation-based techniques: add random noise to the data.
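
The following sketch shows the simplest core of three of these techniques. It makes several assumptions: the function names and thresholds are invented, clusters for micro-aggregation are formed by sorting and chunking, the noise is uniform, and the linear-programming step of cell suppression is not shown.

```python
import random


def suppress_small_counts(counts, threshold):
    """Cell suppression (first step only): hide cells whose COUNT is below threshold."""
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}


def micro_aggregate(values, cluster_size):
    """Replace each cluster of sorted values with the cluster average."""
    ordered = sorted(values)
    out = []
    for i in range(0, len(ordered), cluster_size):
        cluster = ordered[i:i + cluster_size]
        out.extend([sum(cluster) / len(cluster)] * len(cluster))
    return out


def perturb(values, scale, rng):
    """Perturbation: add zero-mean uniform noise drawn from [-scale, scale]."""
    return [v + rng.uniform(-scale, scale) for v in values]


suppressed = suppress_small_counts({"cell_a": 1, "cell_b": 5}, threshold=3)
averaged = micro_aggregate([1, 3, 10, 20], cluster_size=2)
noisy = perturb([10, 20], scale=1.0, rng=random.Random(0))
```

Micro-aggregation preserves the total of each cluster, while perturbation only preserves aggregates approximately.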

Classify Security Threats & Identify Security Requirements

In OLAP systems, sensitive data can be inferred from answers to legitimate queries. There are two kinds of inference: one-dimensional inference (1-d inference) and multi-dimensional inference (m-d inference). In an m-d inference, a cell is inferred using two or more of its descendants, none of which causes a 1-d inference on its own. Examples follow.

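1-d inference. The adversary is prohibited from accessing a cuboid but is allowed to access one of its descendants. Suppose the adversary knows which cells are empty and knows that Bob and Alice took the same amount of commission in Q3. The adversary can then infer each of their Q3 commissions as 5500, half of the corresponding aggregate.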
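
The arithmetic of this 1-d inference is a single division. In the sketch below, the aggregate 11000 is an assumed value, chosen only so that each share comes out as the 5500 mentioned on the slide; the helper name is hypothetical.

```python
def infer_equal_cells(aggregate_sum, contributors):
    """If every non-empty contributor is known to hold the same value,
    each value is the aggregate divided by the number of contributors."""
    share = aggregate_sum / len(contributors)
    return {name: share for name in contributors}


# Assumed Q3 aggregate (not from the slides) with two known equal contributors.
inferred = infer_equal_cells(11000, ["Bob", "Alice"])
```

Only one descendant aggregate is used, which is what makes this a 1-d inference.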

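M-d inference with SUM. The adversary is prohibited from accessing a cuboid but is allowed to access its descendants, and is assumed to know which cells are empty. A protected cell can then be inferred as the sum of two aggregates from one descendant minus the sum of two aggregates from another.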
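
A hypothetical table can show how such a combination isolates one cell. Every name and number below is invented for illustration; `None` marks an empty cell that the adversary is assumed to know about, and the adversary sees only per-quarter and per-employee SUMs.

```python
# Hypothetical protected core cuboid: quarter -> employee -> commission.
table = {
    "Q1": {"Bob": 1500, "Alice": 2000, "Jim": 1000, "Mallory": None},
    "Q2": {"Bob": 700, "Alice": None, "Jim": None, "Mallory": 900},
    "Q3": {"Bob": None, "Alice": 1200, "Jim": 800, "Mallory": None},
    "Q4": {"Bob": 600, "Alice": None, "Jim": None, "Mallory": 400},
}


def quarter_sum(quarter):
    """SUM over all employees for one quarter (an allowed aggregate)."""
    return sum(v for v in table[quarter].values() if v is not None)


def employee_sum(employee):
    """SUM over all quarters for one employee (an allowed aggregate)."""
    return sum(row[employee] for row in table.values() if row[employee] is not None)


def non_empty_in_quarter(quarter):
    return sum(v is not None for v in table[quarter].values())


def non_empty_for_employee(employee):
    return sum(row[employee] is not None for row in table.values())


# (Q1 + Q3) - (Alice + Jim): every cell except (Q1, Bob) cancels out.
inferred_q1_bob = (quarter_sum("Q1") + quarter_sum("Q3")) - (
    employee_sum("Alice") + employee_sum("Jim"))
```

Each of the four aggregates covers at least two non-empty cells, so none of them reveals a cell by itself; only their combination does.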

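M-d inference with MAX. The adversary is prohibited from accessing a cuboid but is allowed to access its descendants, and knows two MAX answers over different sets of cells, 6400 and 6000. No cell covered by the answer 6000 can equal 6400, and other cells are ruled out similarly, which leaves a single cell whose value the adversary can conclude.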

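M-d inference with SUM, MAX and MIN. The assumptions are as in the examples above, and the adversary can ask queries using SUM, MAX and MIN. Having obtained one cell equal to 6400, the adversary combines a MAX answer of 6400, a MIN answer of 6000 and a SUM answer over the same cells to conclude that the three remaining values are {6000, 6000, 0}. Continuing with MAX, MIN and SUM on further cells yields another protected value.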

A security solution for OLAP systems should combine access control and inference control, and achieve a balance among the following objectives. Security: protection from both unauthorized access and malicious inferences. Applicability: coverage of a wide range of scenarios without the need for significant modifications. Efficiency. Availability. Practicality.

Solution: Three-tier Security Architecture

Statistical databases use a two-tier architecture (sensitive data and aggregation queries). Applying this architecture to OLAP has some drawbacks: it causes unacceptable delays in query processing, and inference control methods cannot take advantage of the special characteristics of an OLAP application.

Three tiers: query tier, aggregation tier and data tier.

The aggregation tier must satisfy three properties. First, the aggregation tier is secure with respect to the data tier, which is enforced by inference control. Second, its size must be comparable to that of the data tier. Third, the problem of inference control can be partitioned into blocks in the data tier and the aggregation tier, so that security only needs to be ensured between each corresponding pair of blocks in the two tiers.

The three-tier architecture reduces the performance overhead of inference control. The aggregation tier can be pre-computed, so the computation-intensive part of inference control is shifted to offline processing. The inputs to inference control algorithms become smaller, which reduces complexity. Inference control tasks are localized to each block of data, so a failure in one block does not affect other blocks.

Cardinality-based method: detect inferences based on the number of answered queries. We consider a one-level hierarchy, in which each dimension can have only two attributes; the example is a core cuboid and its three descendants.

Cardinality-based method

Cardinality-based method: existence of 1-d inferences and the number of empty cells. Here k is the number of dimensions and $d_{max}$ is the greatest domain size over all dimensions. The scale of the number of empty cells is marked at 0 and at $2^{k-1} \cdot d_{max}$, with regions labelled "free of 1-d inference" and "always have 1-d inference".

Cardinality-based method: existence of m-d inferences and the number of empty cells. A cuboid with no empty cells is free of m-d inferences. Theorem: let $C_c$ be the core cuboid and $C_{all}$ the collection of all aggregation cuboids. The $i$-th attribute of $C_c$ has $d_i$ values, $d_u$ and $d_v$ are the two smallest among the $d_i$, and $w$ is the number of empty cells in $C_c$. Then $C_c$ is free of m-d inference if $w < 2(d_u - 4) + 2(d_v - 4) - 1$, where $d_i \ge 4$ for all $1 \le i \le k$. $C_c$ has m-d inference if $w \ge 2(d_u - 4) + 2(d_v - 4) - 1$.
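
The bound in the theorem is easy to evaluate. The sketch below computes the threshold from the domain sizes and applies the "free of m-d inference" condition; the function names and the example sizes are assumptions, and the code checks only the cardinality condition, not the cube contents.

```python
def md_inference_threshold(domain_sizes):
    """Return 2(d_u - 4) + 2(d_v - 4) - 1, where d_u and d_v are the two
    smallest domain sizes. The theorem requires every d_i to be at least 4."""
    if len(domain_sizes) < 2:
        raise ValueError("need at least two dimensions")
    if min(domain_sizes) < 4:
        raise ValueError("the theorem assumes d_i >= 4 for every dimension")
    d_u, d_v = sorted(domain_sizes)[:2]
    return 2 * (d_u - 4) + 2 * (d_v - 4) - 1


def free_of_md_inference(domain_sizes, empty_cells):
    """Sufficient condition from the theorem: w is below the threshold."""
    return empty_cells < md_inference_threshold(domain_sizes)


# Hypothetical cube with domain sizes 6, 8 and 10: threshold is 2*2 + 2*4 - 1 = 11.
threshold = md_inference_threshold([6, 8, 10])
```

Because the test only counts empty cells, it can run without scanning the answered queries themselves.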

Parity-based method: based on the simple fact that even numbers are closed under addition and subtraction. The nature of an m-d inference is to keep adding (or subtracting) sets of cells until the result yields a single cell. We consider multi-dimensional range (MDR) queries; combining MDR queries is an operation of addition (or subtraction).

We write MDR queries as q*( , ), for example q*( , ) = x1 + x2 + x3 + x4 + x5 + x6 and q*( , ) = x1 + x2 … Restricting MDR queries to include only an even number of cells makes inferences hard, though perhaps not impossible, to obtain.

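Parity-based method, inference example with even MDR queries: q*( , ) = x1 + x2, q*( , ) = x4 + x5, q*( , ) = x5 + x6, q*( , ) = x3 + x5, and q*( , ) = x1 + x2 + x3 + x4 + x5 + x6 = 6500. Adding the first four answers and subtracting the last leaves x5 + x5 = 1000, so x5 = 500.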
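
The cancellation can be checked mechanically. In the sketch below the individual cell values are assumptions, chosen only to agree with the two numbers that survive on the slide (a total of 6500 and x5 + x5 = 1000); the query sets follow the slide.

```python
# Hypothetical cell values x1..x6, consistent with total 6500 and x5 = 500.
values = {1: 1000, 2: 2000, 3: 1500, 4: 700, 5: 500, 6: 800}


def answer(cells):
    """SUM answer to a query over the given set of cell indices."""
    return sum(values[i] for i in cells)


# Four even queries from the slide plus the query over all six cells.
even_queries = [{1, 2}, {4, 5}, {5, 6}, {3, 5}]
full_query = {1, 2, 3, 4, 5, 6}

# Coefficient of each cell in (sum of the four answers) - (full answer).
coefficients = {
    i: sum(i in q for q in even_queries) - (i in full_query) for i in values
}

combined = sum(answer(q) for q in even_queries) - answer(full_query)
inferred_x5 = combined / 2
```

Every query covers an even number of cells, yet the combination leaves the coefficient 2 on x5 alone, so restricting queries to even sizes does not by itself rule out inference.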

Parity-based method, derivability: if a set of queries Q1 is derivable from another set Q2, then the answers to Q1 can be computed from the answers to Q2, and Q1 is free of inferences if Q2 is. Find another collection of even MDR queries $Q_p$ that is equivalent to Q* and whose inferences are easier to detect. Then represent $Q_p$ as an undirected simple graph $G(C_c, Q_p)$ and check whether G is bipartite, that is, whether it contains no cycle composed of an odd number of edges.
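
The bipartiteness check itself is a standard breadth-first two-colouring. The sketch below assumes that each query in $Q_p$ covers exactly two cells, so that it can be drawn as an edge between those cells; how $Q_p$ is derived from Q* is not shown, and the example graphs are hypothetical.

```python
from collections import deque


def is_bipartite(vertices, edges):
    """Return True if the undirected graph has no cycle of odd length."""
    adjacency = {v: [] for v in vertices}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)

    colour = {}
    for start in vertices:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in colour:
                    # Give the neighbour the opposite colour.
                    colour[neighbour] = 1 - colour[node]
                    queue.append(neighbour)
                elif colour[neighbour] == colour[node]:
                    # Same colour on both ends of an edge: odd cycle found.
                    return False
    return True


# A path is bipartite; a triangle is an odd cycle and is not.
path_ok = is_bipartite([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
triangle_ok = is_bipartite([1, 2, 3], [(1, 2), (2, 3), (1, 3)])
```

The triangle case matches the arithmetic intuition: (x1 + x2) + (x1 + x3) - (x2 + x3) = 2·x1, so an odd cycle of pair queries exposes a single cell.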

Approach: detecting inferences caused by queries that involve both MAX and SUM is intractable, so the method does not detect inferences directly; instead it first prevents m-d inferences and then removes 1-d inferences. Access control: define two functions. Below() partitions the data cube along the dependency lattice, and Slice() partitions the data cube along dimensions. An object is the intersection of the two partitions.

Access control example: an employee's yearly or more detailed commission is sensitive, and this requirement applies only to first-year data. It is specified as Object(L, S), where L is a set of cuboids and S includes all cells in the first four quarters.

Lattice-based inference control. Given two sets of cells S and T, for any cell c in S we say that c is redundant with respect to T if S includes both c and c's ancestors, and that c is non-comparable to T if T contains no cell c' such that c is an ancestor or a descendant of c'. For such a cell the inference is reducible: it suffices to check whether S - {c} causes any inferences to T. Example: we want to protect Object(L, S), where S is the complete cuboid (meaning "no slice") and L = { , }.

Lattice-based inference control

Lattice-based inference control. More generally, as long as a cuboid $c_r$ has all of its ancestors included in T (under the LOWER curve), the descendant closure of $c_r$ is the maximal result for preventing m-d inferences. After m-d inferences are prevented, remove 1-d inferences, then control m-d inferences with respect to the resulting new object. Repeat these two steps until all 1-d inferences are removed. The final result is a set of cells that is guaranteed to be free of inferences to the object.

Implementing the lattice-based inference control method in the three-tier architecture: the authorization object computed through the iterative process above comprises the data tier. The complement of the object is the aggregation tier, since it does not cause any inferences to the data tier.

Conclusion

The most challenging security threat in data warehouse and OLAP systems is that data stored in the data warehouse may be disclosed through seemingly innocent OLAP queries. Two main inference threats should be considered: 1-d inference and m-d inference. We presented three methods to prevent or remove inferences: the cardinality-based method, the parity-based method, and lattice-based inference control. All of these methods are applicable to the three-tier inference control architecture, which especially suits OLAP systems.

Lingyu Wang and Sushil Jajodia. Security in Data Warehouses and OLAP Systems.