Data Warehousing.


1 Data Warehousing

2 Definition
Data Warehouse: A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
Subject-oriented: e.g. customers, patients, students, products
Integrated: Consistent naming conventions, formats, and encoding structures; drawn from multiple data sources
Time-variant: Contains a time dimension so that it may be used to study trends and changes
Non-updatable: Read-only, periodically refreshed
Data Mart: A data warehouse that is limited in scope

3 Need for Data Warehousing
Integrated, company-wide view of high-quality information (from disparate databases)
Separation of operational and informational (decision support) systems and data (for improved performance)

4 Data Warehouse Architectures
Generic Two-Level Architecture
Independent Data Mart
All involve some form of extraction, transformation and loading (ETL)

5 Figure 11-2: Generic two-level data warehousing architecture
One company-wide warehouse
Periodic extraction, so data is not completely current in the warehouse

6 Figure 11-3 Independent data mart data warehousing architecture
Data marts: Mini-warehouses, limited in scope
Separate ETL for each independent data mart
Data access complexity due to multiple data marts

7 The ETL Process
ETL = Extract, transform, and load
Capture/Extract
Scrub (data cleansing)
Transform: Convert data from the format of the source to the format of the data warehouse
Load and Index
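The four steps can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions: the source rows, the `customer_dim` table, the cleansing rules, and the date formats are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical operational rows; names and formats are assumptions for this sketch.
SOURCE_ROWS = [
    {"cust_id": "101", "name": " alice ", "city": "dallas", "signup": "03/15/2021"},
    {"cust_id": "102", "name": "Bob", "city": "", "signup": "04/01/2021"},
    {"cust_id": "101", "name": " alice ", "city": "dallas", "signup": "03/15/2021"},
]

def extract(rows):
    """Capture/Extract: take a snapshot copy of the source data."""
    return [dict(r) for r in rows]

def scrub(rows):
    """Scrub: drop duplicate keys and rows missing a required field."""
    seen, clean = set(), []
    for r in rows:
        if r["cust_id"] in seen or not r["city"]:
            continue
        seen.add(r["cust_id"])
        clean.append(r)
    return clean

def transform(rows):
    """Transform: convert source formats (MM/DD/YYYY, free-text case) to warehouse formats."""
    out = []
    for r in rows:
        month, day, year = r["signup"].split("/")
        out.append((int(r["cust_id"]), r["name"].strip().title(),
                    r["city"].upper(), f"{year}-{month}-{day}"))
    return out

def load(conn, rows):
    """Load and Index: insert the transformed rows, then create an index."""
    conn.execute("CREATE TABLE customer_dim "
                 "(cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT, signup_date TEXT)")
    conn.executemany("INSERT INTO customer_dim VALUES (?, ?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_customer_city ON customer_dim (city)")
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(scrub(extract(SOURCE_ROWS))))
warehouse_rows = conn.execute("SELECT * FROM customer_dim").fetchall()
print(warehouse_rows)
```

Of the three source rows, only one survives scrubbing: the duplicate key and the row with an empty city are dropped before transformation.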

8 Figure 11-10: Steps in data reconciliation
Load/Index = place transformed data into the warehouse and create indexes
Refresh mode: bulk rewriting of target data at periodic intervals
Update mode: only changes in source data are written to data warehouse
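The difference between the two load modes can be shown with plain dictionaries standing in for the source and the warehouse table. The function names and data are hypothetical, and this sketch of update mode only handles inserts and changed values, not deletions.

```python
def refresh(target, source):
    """Refresh mode: bulk-rewrite the whole target from the source."""
    target.clear()
    target.update(source)
    return len(source)  # number of rows written

def update(target, source):
    """Update mode: write only rows that are new or changed in the source."""
    changed = {k: v for k, v in source.items() if target.get(k) != v}
    target.update(changed)
    return len(changed)  # number of rows written

# Hypothetical customer-id -> city data.
source = {1: "Dallas", 2: "Austin", 3: "Houston"}
warehouse = {}

written_refresh = refresh(warehouse, source)   # every row is rewritten

source[2] = "El Paso"                          # one changed row
source[4] = "Waco"                             # one new row
written_update = update(warehouse, source)     # only the two differences are written

print(written_refresh, written_update, warehouse)
```

Refresh mode writes every row on every run, while update mode writes only the differences, at the cost of having to detect them.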

9 Index
Bitmap index
Join index

10 Figure 6-8 Bitmap index organization
Bitmap saves on space requirements
Rows: possible values of the attribute
Columns: table rows
Each bit indicates whether the attribute of a table row has that value
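A minimal sketch of this organization, using one Python integer per attribute value as the bitmap (bit i corresponds to table row i). The product data and function names are hypothetical.

```python
def build_bitmap_index(rows, attribute):
    """Return {value: bitmap}; bit i is set when rows[i][attribute] == value."""
    index = {}
    for i, row in enumerate(rows):
        value = row[attribute]
        index[value] = index.get(value, 0) | (1 << i)
    return index

def matching_rows(bitmap):
    """Positions of the table rows whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical table; list position plays the role of the row number.
products = [
    {"id": 1, "color": "red", "size": "S"},
    {"id": 2, "color": "blue", "size": "M"},
    {"id": 3, "color": "red", "size": "M"},
    {"id": 4, "color": "green", "size": "S"},
]

color_idx = build_bitmap_index(products, "color")
size_idx = build_bitmap_index(products, "size")

# A two-condition query becomes a bitwise AND of two bitmaps.
red_and_medium = matching_rows(color_idx["red"] & size_idx["M"])
print(red_and_medium)
```

Each distinct value costs one bit per table row, which is why bitmaps suit attributes with few distinct values.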

11 Figure 6-9 Join indexes: speed up join operations
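One way to picture a join index is as a precomputed list of matching row positions from the two tables, so that a later join only has to follow the pairs. The sketch below assumes hypothetical `stores` and `sales` tables joined on `store_id`; it is not the layout of any particular DBMS.

```python
from collections import defaultdict

def build_join_index(left, right, left_key, right_key):
    """Precompute (left_pos, right_pos) pairs for rows whose key values match."""
    positions = defaultdict(list)
    for j, rrow in enumerate(right):
        positions[rrow[right_key]].append(j)
    return [(i, j)
            for i, lrow in enumerate(left)
            for j in positions.get(lrow[left_key], [])]

# Hypothetical tables; list position plays the role of the row ID.
stores = [{"store_id": 10, "city": "Dallas"}, {"store_id": 20, "city": "Austin"}]
sales = [
    {"store_id": 20, "amount": 5.0},
    {"store_id": 10, "amount": 7.5},
    {"store_id": 20, "amount": 2.0},
]

join_index = build_join_index(stores, sales, "store_id", "store_id")

# The join itself is now a lookup over the stored pairs, with no key comparison.
joined = [(stores[i]["city"], sales[j]["amount"]) for i, j in join_index]
print(join_index, joined)
```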

12 Star Schema for Data Warehouse
Objectives: ease of use for decision support applications; fast response to predefined user queries; customized data for particular target audiences
Also called “dimensional model”
Dimension: any category used in analyzing data, such as time, geography, and product line

13 Figure 11-13 Components of a star schema
Fact tables contain factual or quantitative data
Dimension tables contain descriptions about the subjects of the business
1:N relationship between dimension tables and fact tables
Dimension tables are denormalized to maximize performance
Excellent for ad-hoc queries, but bad for online transaction processing

14 Figure 11-14 Star schema example
Fact table provides statistics for sales broken down by product, period and store dimensions
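A small star schema along these lines can be tried out with Python's built-in `sqlite3` module. The table names, column names, and rows below are assumptions made for the sketch and are not taken from Figure 11-14.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one row per product/period/store.
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE period_dim  (period_key  INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, city TEXT);

-- Fact table: quantitative measures plus one foreign key per dimension (1:N).
CREATE TABLE sales_fact (
    product_key  INTEGER REFERENCES product_dim(product_key),
    period_key   INTEGER REFERENCES period_dim(period_key),
    store_key    INTEGER REFERENCES store_dim(store_key),
    units_sold   INTEGER,
    dollars_sold REAL
);

INSERT INTO product_dim VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO period_dim  VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO store_dim   VALUES (1, 'Dallas'), (2, 'Austin');
INSERT INTO sales_fact  VALUES
    (1, 1, 1, 10, 100.0),
    (1, 2, 1,  5,  50.0),
    (2, 1, 2,  3,  90.0),
    (1, 1, 2,  4,  40.0);
""")

# A typical decision-support query: join the fact table to the dimensions
# it needs and aggregate a measure.
units_by_product_city = conn.execute("""
    SELECT p.description, s.city, SUM(f.units_sold)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    JOIN store_dim   s ON s.store_key   = f.store_key
    GROUP BY p.description, s.city
    ORDER BY p.description, s.city
""").fetchall()
print(units_by_product_city)
```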

15 Figure 11-15 Star schema with sample data

16 On-Line Analytical Processing (OLAP) Tools
The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques
Relational OLAP (ROLAP): traditional relational representation
Multidimensional OLAP (MOLAP): cube structure
OLAP operations:
Cube slicing: come up with a 2-D view of data
Drill-down: going from summary to more detailed views
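The two operations can be sketched on a tiny three-dimensional cube held in a Python dictionary. The dimension names, cell values, and helper functions are hypothetical; real OLAP tools store and index cubes quite differently.

```python
from collections import defaultdict

DIMS = ("product", "quarter", "city")

# Hypothetical cube: (product, quarter, city) -> units sold.
cube = {
    ("Widget", "Q1", "Dallas"): 10,
    ("Widget", "Q1", "Austin"): 4,
    ("Widget", "Q2", "Dallas"): 5,
    ("Gadget", "Q1", "Austin"): 3,
    ("Gadget", "Q2", "Austin"): 6,
}

def slice_cube(cube, dim, value):
    """Slicing: fix one dimension to a value, leaving a 2-D view of the rest."""
    axis = DIMS.index(dim)
    return {k[:axis] + k[axis + 1:]: v for k, v in cube.items() if k[axis] == value}

def summarize(cube, dims):
    """Total the measure over the listed dimensions (fewer dimensions = more summarized)."""
    axes = [DIMS.index(d) for d in dims]
    totals = defaultdict(int)
    for k, v in cube.items():
        totals[tuple(k[a] for a in axes)] += v
    return dict(totals)

q1_slice = slice_cube(cube, "quarter", "Q1")             # 2-D view: product x city
by_product = summarize(cube, ["product"])                # summary level
by_product_city = summarize(cube, ["product", "city"])   # drill-down adds a dimension
print(q1_slice, by_product, by_product_city)
```

Drilling down from `by_product` to `by_product_city` breaks each summary total into its per-city details; the detail cells for a product always add back up to its summary cell.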

17 Figure 11-23 Slicing a data cube

18 Figure 11-24 Example of drill-down
Summary report
Starting with summary data, users can obtain details for particular cells
Drill-down with color added

19 Data Mining and Visualization
Knowledge discovery using a blend of statistical, AI, and computer graphics techniques
Goals: explain observed events or conditions; confirm hypotheses; explore data for new or unexpected relationships
Techniques: statistical regression, decision tree induction, clustering and signal processing, affinity, sequence association, case-based reasoning, rule discovery, neural nets, fractals
Data visualization: representing data in graphical/multimedia formats for analysis

20 Pivot Table
Excel: Drill Down, Roll Up
Access: CrossTab query

21 SQL GROUPING SETS
SELECT CITY, RATING, COUNT(CID)
FROM HCUSTOMERS
GROUP BY GROUPING SETS (CITY, RATING, (CITY, RATING), ())
ORDER BY CITY;
Note: () indicates that an overall total is desired.
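Conceptually, each grouping set is its own GROUP BY, and the results are stacked with NULL in the columns that were aggregated away. The sketch below spells that out as a UNION ALL, run through Python's `sqlite3` module only because it ships with Python; SQLite is assumed here not to accept GROUPING SETS directly. The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE HCUSTOMERS (CID INTEGER PRIMARY KEY, CITY TEXT, RATING TEXT);
INSERT INTO HCUSTOMERS VALUES
    (1, 'Austin', 'A'), (2, 'Austin', 'B'), (3, 'Dallas', 'A'), (4, 'Dallas', 'A');
""")

# One SELECT per grouping set: CITY, RATING, (CITY, RATING), and ().
grouping_sets_rows = conn.execute("""
    SELECT CITY, NULL AS RATING, COUNT(CID) FROM HCUSTOMERS GROUP BY CITY
    UNION ALL
    SELECT NULL, RATING, COUNT(CID) FROM HCUSTOMERS GROUP BY RATING
    UNION ALL
    SELECT CITY, RATING, COUNT(CID) FROM HCUSTOMERS GROUP BY CITY, RATING
    UNION ALL
    SELECT NULL, NULL, COUNT(CID) FROM HCUSTOMERS
""").fetchall()

for row in grouping_sets_rows:
    print(row)
```

With this sample data the result has eight rows: two per-city counts, two per-rating counts, three city and rating combinations, and one overall total.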

22 SQL CUBE
Perform aggregations for all possible combinations of columns indicated.
SELECT CITY, RATING, COUNT(CID)
FROM HCUSTOMERS
GROUP BY CUBE (CITY, RATING)
ORDER BY CITY, RATING;
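"All possible combinations" means every subset of the listed columns, including the empty set for the overall total. A pure-Python sketch of that expansion follows; the sample rows and helper names are hypothetical, and `None` stands in for the SQL NULL that marks an aggregated-away column.

```python
from collections import Counter
from itertools import combinations

def cube_grouping_sets(columns):
    """Every subset of the columns, from the full set down to the empty set ()."""
    return [combo
            for r in range(len(columns), -1, -1)
            for combo in combinations(columns, r)]

def count_by_grouping_sets(rows, columns, grouping_sets):
    """COUNT(*) per grouping set; columns outside the set are reported as None."""
    counts = Counter()
    for gset in grouping_sets:
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in columns)
            counts[key] += 1
    return dict(counts)

# Hypothetical HCUSTOMERS rows.
customers = [
    {"CITY": "Austin", "RATING": "A"},
    {"CITY": "Austin", "RATING": "B"},
    {"CITY": "Dallas", "RATING": "A"},
    {"CITY": "Dallas", "RATING": "A"},
]

columns = ("CITY", "RATING")
cube_sets = cube_grouping_sets(columns)
cube_counts = count_by_grouping_sets(customers, columns, cube_sets)
print(cube_sets)
print(cube_counts)
```

For two columns, CUBE expands to the grouping sets (CITY, RATING), (CITY), (RATING), and (), so n columns yield 2^n grouping sets.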

23 SQL ROLLUP
The ROLLUP extension causes cumulative subtotals to be calculated for the columns indicated. If multiple columns are indicated, subtotals are performed for each of the columns except the far-right column.
SELECT CITY, RATING, COUNT(CID)
FROM HCUSTOMERS
GROUP BY ROLLUP (CITY, RATING)
ORDER BY CITY, RATING;
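ROLLUP can be read as grouping by successively shorter prefixes of the column list. The following pure-Python sketch illustrates that reading; the sample rows and the helper name are hypothetical, and `None` stands in for SQL NULL.

```python
from collections import Counter

def rollup_grouping_sets(columns):
    """Prefixes of the column list, longest first: (a, b), (a,), ()."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]

columns = ("CITY", "RATING")

# Hypothetical HCUSTOMERS rows as (CITY, RATING) tuples.
customers = [("Austin", "A"), ("Austin", "B"), ("Dallas", "A"), ("Dallas", "A")]

rollup_sets = rollup_grouping_sets(columns)

rollup_counts = Counter()
for gset in rollup_sets:
    for row in customers:
        # Keep a column's value only if that column is in the current grouping set.
        key = tuple(v if c in gset else None for c, v in zip(columns, row))
        rollup_counts[key] += 1

print(rollup_sets)
print(dict(rollup_counts))
```

There is a subtotal per CITY and a grand total, but no subtotal per RATING alone, which matches the "except the far-right column" wording above.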

