Sachin Goel (68) Manav Mudgal (69) Piyush Samsukha (76) Rachit Singhal (82) Richa Somvanshi (85) Sahar ( )

2 Outline
- Data warehousing
- Warehouse architecture: its components and data flows
- Data marts
- Benefits of data warehousing
- Disadvantages of data warehousing
- Case study

3 What is data warehousing?
- A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process.
- In essence, data warehousing is data management and data analysis.
- A data webhouse is a distributed data warehouse that is implemented over the Web, with no central data repository.
- Goal: to integrate enterprise-wide corporate data into a single repository from which users can easily run queries.

4 What is data warehousing?
- Subject-oriented: the warehouse is organized around the major subjects of the enterprise (such as customers, products, and sales) rather than the major application areas. This is reflected in the need to store decision-support data rather than application-oriented data.
- Integrated: the source data come together from different enterprise-wide application systems. The source data are often inconsistent, so the integrated data source must be made consistent to present a unified view of the data to users.
- Time-variant: data in the warehouse are accurate and valid only at some point in time or over some time interval. The time-variance of the warehouse is also shown in the extended time for which the data are held, the implicit or explicit association of time with all data, and the fact that the data represent a series of snapshots.
- Non-volatile: data are not updated in real time but are refreshed from the operational systems on a regular basis. New data are always added as a supplement to the database rather than as a replacement. The database continually absorbs this new data, incrementally integrating it with previous data.
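The non-volatile and time-variant properties can be illustrated with a minimal sketch, assuming a hypothetical store that keeps one snapshot per periodic refresh. The class and method names (`TimeVariantStore`, `refresh`, `value_at`) are invented for illustration: refreshes are appended and never overwrite earlier data, and every value is read "as of" a date.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """One periodic refresh from the operational systems (hypothetical structure)."""
    as_of: date
    values: dict  # subject key (e.g. a customer id) -> value valid at as_of


@dataclass
class TimeVariantStore:
    """Append-only store: new refreshes supplement old ones (non-volatile)."""
    snapshots: list = field(default_factory=list)

    def refresh(self, as_of: date, values: dict) -> None:
        # Refreshes arrive in time order; earlier snapshots are kept as history.
        if self.snapshots and as_of <= self.snapshots[-1].as_of:
            raise ValueError("refresh dates must increase")
        self.snapshots.append(Snapshot(as_of, dict(values)))

    def value_at(self, key, when: date):
        # Time-variant: a value is only valid as of a particular snapshot,
        # so return the latest snapshot taken on or before `when`.
        for snap in reversed(self.snapshots):
            if snap.as_of <= when:
                return snap.values.get(key)
        return None
```

For example, after refreshes on 31 January (`{"C1": 100}`) and 29 February (`{"C1": 150}`), asking for `C1` as of 15 February returns 100, while the February value is simply added alongside it rather than replacing it.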

5 Operational data source1  The architecture Query Manager Warehouse Manager DBMS Operational data source 2 Meta-data High summarized data Detailed data Lightly summarized data Operational data store (ods) Operational data source n Archive/backup data Load Manager Data mining OLAP(online analytical processing) tools Reporting, query, application development, and EIS(executive information system) tools End-user access tools Typical architecture of a data warehouse Operational data store (ODS)

6 The main components
- Operational data sources: the data for the warehouse are supplied from:
  - mainframe systems, in the traditional network and hierarchical formats;
  - relational DBMSs such as Oracle and Informix;
  - in addition to these internal data, external data obtained from commercial databases and from databases associated with suppliers and customers.
- Operational data store (ODS): a repository of current and integrated operational data used for analysis. It is often structured and supplied with data in the same way as the data warehouse, but may in fact simply act as a staging area for data to be moved into the warehouse.

7 The main components
- Load manager: also called the front-end component, it performs all the operations associated with the extraction and loading of data into the warehouse. These operations include simple transformations of the data to prepare it for entry into the warehouse.
- Warehouse manager: performs all the operations associated with the management of the data in the warehouse. These operations include:
  - analysis of data to ensure consistency;
  - transformation and merging of the source data from temporary storage into data warehouse tables;
  - creation of indexes and views on the base tables;
  - generation of aggregations;
  - backing up and archiving of data.
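The division of labour between the load manager and the warehouse manager can be sketched as follows, assuming an in-memory SQLite database as the warehouse DBMS. The table names (`staging_sales`, `sales`, `sales_by_region`) and the consistency rule are hypothetical; backup and archiving are omitted to keep the sketch short.

```python
import sqlite3


def load_manager(conn, source_rows):
    """Extract rows, apply simple transformations, load them into a staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_sales (region TEXT, item TEXT, amount REAL)"
    )
    # Simple transformations: trim whitespace, normalize region codes, parse amounts.
    cleaned = [
        (region.strip().upper(), item.strip(), float(amount))
        for region, item, amount in source_rows
    ]
    conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", cleaned)
    return len(cleaned)


def warehouse_manager(conn):
    """Check consistency, merge staged rows into the base table, index it, aggregate."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(region TEXT NOT NULL, item TEXT NOT NULL, amount REAL NOT NULL)"
    )
    # Consistency check (hypothetical rule): merge only rows with an item and amount >= 0.
    conn.execute(
        "INSERT INTO sales SELECT region, item, amount FROM staging_sales "
        "WHERE item <> '' AND amount >= 0"
    )
    conn.execute("DELETE FROM staging_sales")  # temporary storage is emptied
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_region ON sales(region)")
    # Generation of aggregations: a summarized table rebuilt on each run.
    conn.execute("DROP TABLE IF EXISTS sales_by_region")
    conn.execute(
        "CREATE TABLE sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    )
    conn.commit()
```

Loading `(" north ", "pen", "2.5")`, `("south", "ink", "4")`, and `("north", "", "1")` stages three rows; the warehouse manager then rejects the row with an empty item and produces region totals of 2.5 for `NORTH` and 4.0 for `SOUTH`.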

8 The main components
- Query manager: also called the back-end component, it performs all the operations associated with the management of user queries. These operations include directing queries to the appropriate tables and scheduling the execution of queries.
- Detailed, lightly summarized, and highly summarized data; archive/backup data
- Meta-data
- End-user access tools: can be categorized into five main groups: data reporting and query tools, application development tools, executive information system (EIS) tools, online analytical processing (OLAP) tools, and data mining tools.
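The two query-manager duties, directing queries to the appropriate tables and scheduling their execution, can be sketched as below. The `AGGREGATES` catalog is a hypothetical stand-in for the warehouse meta-data, and the routing policy (use the most highly summarized table that covers the requested grouping columns, else fall back to detailed data) is one simple assumption, not a prescribed algorithm.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical meta-data: summary tables and the grouping columns each one covers.
AGGREGATES = {
    "sales_by_region": {"region"},
    "sales_by_region_item": {"region", "item"},
}
DETAIL_TABLE = "sales"


def route(group_by):
    """Direct a query to the smallest summary table able to answer it."""
    wanted = set(group_by)
    candidates = [name for name, cols in AGGREGATES.items() if wanted <= cols]
    if candidates:
        # Fewer grouping columns means a more highly summarized, smaller table.
        return min(candidates, key=lambda name: len(AGGREGATES[name]))
    return DETAIL_TABLE  # no summary covers the request: use detailed data


@dataclass(order=True)
class QueryJob:
    priority: int
    sql: str = field(compare=False)  # ordering uses priority only


class QueryScheduler:
    """Releases queued queries in priority order (lower number runs sooner)."""

    def __init__(self):
        self._queue = []

    def submit(self, priority, sql):
        heapq.heappush(self._queue, QueryJob(priority, sql))

    def next_query(self):
        return heapq.heappop(self._queue).sql if self._queue else None
```

With this catalog, a query grouped by `region` goes to `sales_by_region`, one grouped by `item` needs `sales_by_region_item`, and one grouped by `customer` falls back to the detailed `sales` table.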

9 Data flows
- Inflow: the processes associated with the extraction, cleansing, and loading of data from the source systems into the data warehouse.
- Upflow: the processes associated with adding value to the data in the warehouse through summarizing, packaging, and distributing the data.
- Downflow: the processes associated with archiving and backing up data in the warehouse.
- Outflow: the processes associated with making the data available to end users.
- Meta-flow: the processes associated with the management of the meta-data.
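The five flows can be sketched as functions over a toy in-memory warehouse, assuming a hypothetical dictionary with `detailed`, `summarized`, `archive`, and `meta` entries and rows carrying `region`, `amount`, and `day` fields. Meta-flow is modelled simply as a log of what each other flow did.

```python
from datetime import date


def inflow(warehouse, source_rows):
    # Extract, cleanse (drop rows with no amount), and load into detailed data.
    clean = [r for r in source_rows if r.get("amount") is not None]
    warehouse["detailed"].extend(clean)
    return len(clean)


def upflow(warehouse):
    # Add value by summarizing the detailed data per region.
    summary = {}
    for r in warehouse["detailed"]:
        summary[r["region"]] = summary.get(r["region"], 0) + r["amount"]
    warehouse["summarized"] = summary
    return len(summary)


def downflow(warehouse, cutoff: date):
    # Archive detailed rows older than the cutoff date.
    old = [r for r in warehouse["detailed"] if r["day"] < cutoff]
    warehouse["archive"].extend(old)
    warehouse["detailed"] = [r for r in warehouse["detailed"] if r["day"] >= cutoff]
    return len(old)


def outflow(warehouse, region):
    # Make the summarized data available to end users.
    return warehouse["summarized"].get(region, 0)


def metaflow(warehouse, step, count):
    # Record meta-data about each process that ran.
    warehouse["meta"].append((step, count))
```

Running inflow, then upflow, then downflow keeps the summary intact even after old detailed rows move to the archive, which mirrors how summarized data can outlive the detail it was built from.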

10 Figure: Information flows of a data warehouse
[Components shown: operational data sources 1, ..., n; operational data store (ODS); load manager; warehouse manager; query manager; DBMS holding meta-data, detailed data, lightly summarized data, and highly summarized data; archive/backup data; end-user access tools (reporting, query, application development, and EIS (executive information system) tools; OLAP (online analytical processing) tools; data mining tools). Flows labelled: inflow, upflow, downflow, outflow, and meta-flow.]

11 Data mart
- Data mart: a subset of a data warehouse that supports the requirements of a particular department or business function.
- The characteristics that differentiate data marts from data warehouses include:
  - a data mart focuses only on the requirements of users associated with one department or business function;
  - data marts do not normally contain detailed operational data, unlike data warehouses;
  - as data marts contain less data than data warehouses, they are more easily understood and navigated.
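A minimal sketch of deriving a data mart from warehouse rows, assuming hypothetical row fields (`department`, `region`, `month`, `amount`): it keeps only one department's data and stores it pre-summarized, so the mart holds no detailed operational rows and is smaller than its source.

```python
from collections import defaultdict


def build_data_mart(warehouse_rows, department):
    """Return one department's data, summarized by (region, month)."""
    totals = defaultdict(float)
    for row in warehouse_rows:
        if row["department"] == department:
            # Keep only the summary this department queries, not the detailed rows.
            totals[(row["region"], row["month"])] += row["amount"]
    return dict(totals)
```

Given two sales rows for region N in January, one sales row for region S in February, and one finance row, the sales mart contains just two summary entries.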

12 Figure: Typical data warehouse and data mart architecture
[Components shown: operational data sources 1, 2, ..., n; operational data store (ODS); load manager; warehouse manager; query manager; DBMS holding meta-data, detailed data, lightly summarized data, and highly summarized data; archive/backup data; end-user access tools (reporting, query, application development, and EIS (executive information system) tools; OLAP (online analytical processing) tools; data mining tools); data marts holding summarized data in a relational database and summarized data in a multi-dimensional database. Labels: first tier, second tier, third tier.]

13 Reasons for creating a data mart
- To give users access to the data they need to analyze most often.
- To provide data in a form that matches the collective view of the data held by a group of users in a department or business function.
- To improve end-user response time by reducing the volume of data to be accessed.
- To provide appropriately structured data, as dictated by the requirements of end-user access tools.
- Data marts normally use less data, so tasks such as data cleansing, loading, transformation, and integration are far easier; hence implementing and setting up a data mart is simpler than establishing a corporate data warehouse.

14
- The cost of implementing data marts is normally less than that required to establish a data warehouse.
- The potential users of a data mart are more clearly defined and can be more easily targeted to obtain support for a data mart project than for a corporate data warehouse project.

15 The benefits of data warehousing
- The potential benefits of data warehousing are:
  - high returns on investment;
  - substantial competitive advantage;
  - increased productivity of corporate decision-makers.
- Data warehouses facilitate decision-support applications such as trend reports (e.g., the items with the most sales in a particular area within the last two years), exception reports, and reports that show actual performance versus goals.
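The trend report given as an example above ("the items with the most sales in a particular area within the last two years") might look like the following sketch. It assumes a hypothetical SQLite `sales` table with ISO-formatted `sale_date` strings, and approximates "two years" as 730 days.

```python
import sqlite3
from datetime import date, timedelta


def top_items(conn: sqlite3.Connection, region: str, today: date, limit: int = 3):
    """Trend report: best-selling items in one region over roughly the last two years."""
    since = (today - timedelta(days=730)).isoformat()  # ISO strings compare by date order
    return conn.execute(
        """
        SELECT item, SUM(amount) AS total
        FROM sales
        WHERE region = ? AND sale_date >= ?
        GROUP BY item
        ORDER BY total DESC
        LIMIT ?
        """,
        (region, since, limit),
    ).fetchall()
```

Storing dates as ISO strings keeps the date filter a plain string comparison in SQLite; a real warehouse would more likely join to a date dimension.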

16 Disadvantages of data warehousing
- Data warehouses are not the optimal environment for unstructured data.
- Because data must be extracted, transformed, and loaded into the warehouse, there is an element of latency in data warehouse data.
- Over their life, data warehouses can have high costs, and maintenance costs are high.
- Data warehouses can become outdated relatively quickly, and there is a cost to delivering suboptimal information to the organization.
- There is often a fine line between data warehouses and operational systems. Duplicate, expensive functionality may be developed; or functionality may be developed in the data warehouse that, in retrospect, should have been developed in the operational systems, and vice versa.

17 TOSHIBA Case study
