Presentation is loading. Please wait.

Presentation is loading. Please wait.

Data Warehousing.

Similar presentations


Presentation on theme: "Data Warehousing."— Presentation transcript:

1 Data Warehousing

2 On-Line Analytical Processing (OLAP) Tools
The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques Relational OLAP (ROLAP) Traditional relational representation Multidimensional OLAP (MOLAP) Cube structure OLAP Operations Cube slicing–come up with 2-D view of data Drill-down–going from summary to more detailed views Roll-up – the opposite direction of drill-down Reaggregation – rearrange the order of dimensions

3 Slicing a data cube

4 Example of drill-down Summary report
Starting with summary data, users can obtain details for particular cells Drill-down with color added

5 Excel’s Pivot Table Data/Pivot Table Drilldown, rollup, reaggregation

6 Access Pivot Form Drill Down

7 Data Warehouse A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes Subject-oriented: e.g. customers, employees, locations, products, time periods, etc. Dimensions for data analysis Integrated: Consistent naming conventions, formats, encoding structures; from multiple data sources Time-variant: Can study trends and changes Nonupdatable: Read-only, periodically refreshed Data Mart: A data warehouse that is limited in scope

8 Need for Data Warehousing
Integrated, company-wide view of high-quality information (from disparate databases) Separation of operational and informational systems and data (for improved performance)

9 Generic two-level data warehousing architecture
One, company-wide warehouse T E Periodic extraction  data is not completely current in warehouse

10 The ETL Process Capture/Extract Scrub or data cleansing Transform
Load and Index ETL = Extract, transform, and load

11 Capture/Extract…obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse Incremental extract = capturing changes that have occurred since the last static extract Static extract = capturing a snapshot of the source data at a point in time

12 Data Warehouse Design - Star Schema -
Also called “dimensional model” Fact table contain detailed business data Dimension tables contain descriptions about the subjects of the business such as customers, employees, locations, products, time periods, etc. A dimension is a term used to describe any category used in analyzing data, such as time, geography, and product line.

13 Star schema example Fact table provides statistics for sales broken down by product, period and store dimensions Dimension tables contain descriptions about the subjects of the business

14 Star schema with sample data

15 Example: Order Processing System
City OID ODate CID Cname Rating SalesPerson Has M Order Customer 1 M Qty Has M Product Price PID Pname

16 Star Schema Location CustomerRating Dimension Dimension LocationCode
State City CustomerRating Dimension Rating Description FactTable LocationCode PeriodCode Rating PID Qty Amount Can group by State, City Period Dimension PeriodCode Year Quarter Product Category CategoryID Description Product Dimension PID Pname CategoryID (Snowflake model)

17 From SalesDB to MyDataWarehouse
Extract data from SalesDB: Create query to get the data Download to MyDataWareHouse File/Import/Save as Table Data scrub/cleasing,and transform: Transform City to Location Transform Odate to Period Load data to FactTable

18 Bitmap saves on space requirements Figure 6-8
Rows - possible values of the attribute Columns - table rows Bit indicates whether the attribute of a row has the values Figure 6-8 Bitmap index index organization

19 Figure 6-9 Join Indexes–speeds up join operations

20 Data Mining and Visualization
Knowledge discovery using a blend of statistical, AI, and computer graphics techniques Goals: Explain observed events or conditions Confirm hypotheses Explore data for new or unexpected relationships Techniques Statistical regression Decision tree induction Clustering and signal processing Affinity Sequence association Case-based reasoning Rule discovery Neural nets Fractals Data visualization–representing data in graphical/multimedia formats for analysis

21 SQL GROUPING SETS GROUPING SETS
SELECT CITY,RATING,COUNT(CID) FROM HCUSTOMERS GROUP BY GROUPING SETS(CITY,RATING,(CITY,RATING),()) ORDER BY CITY; Note: () indicates that an overall total is desired.

22 SQL CUBE Perform aggregations for all possible combinations of columns indicated. SELECT CITY,RATING,COUNT(CID) FROM HCUSTOMERS GROUP BY CUBE(CITY,RATING) ORDER BY CITY, RATING;

23 SQL ROLLUP The ROLLUP extension causes cumulative subtotals to be calculated for the columns indicated. If multiple columns are indicated, subtotals are performed for each of the columns except the far-right column. SELECT CITY,RATING,COUNT(CID) FROM HCUSTOMERS GROUP BY ROLLUP(CITY,RATING) ORDER BY CITY, RATING


Download ppt "Data Warehousing."

Similar presentations


Ads by Google