Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Data Warehousing Architecture A data warewhouse is an architectural construct of an information system that provides users with current and historical.

Similar presentations


Presentation on theme: "1 Data Warehousing Architecture A data warewhouse is an architectural construct of an information system that provides users with current and historical."— Presentation transcript:

1 1 Data Warehousing Architecture A data warewhouse is an architectural construct of an information system that provides users with current and historical decision support information that is hard to access or present in traditional operational data stores. It comprises a number of components illustrated in figure 1

2 2 Data Warehousing Architecture Figure 1 : A datawarehouse architecture

3 3 Data Warehousing Architecture Data Warehouse database –This is a cornerstone of the data warehouse (item 2 in figure 1) –Almost always implemented on a relational database management system –However very large databases, ad hoc processing and the need for flexible user views e.g. aggregates and drill downs are calling for different technological approaches: Parallel relational database designs requiring the use of symmetric multiprocessors, Massively parallel processors and clusters

4 4 Data Warehousing Architecture Speeding up traditional RDBMS The use of Multidimensional databases (MDDBs) which are tightly coupled to on-line analytical processing

5 5 Data Warehousing Architecture Sourcing, acquisation, cleanup and transformation of data –Implementing data warehouses involves extracting data from operational systems including legacy systems and putting it into a suitable format. –The various tools are illustrated as item 1 in figure 1 –These tools perform all the conversions, summarisations, key changes, structural changes, and condensations needed to transform disparate data into information can be used by decision support tools –

6 6 Data Warehousing Architecture –It produces programs and control statements required to move data into the data warehouse form multiple operational systems –Maintains the metadata –Remove unwanted data –Converts to common data names and definitions –Calculates summaries –Establish defaults for missing data –Keep track of source data definition changes

7 7 Data Warehousing Architecture The tools have to deal with: –Database hetergeneity: DBMS can be very different in data models, data access language etc. –Data hetergeneity: the difference in the way data is defined and used e.g. synonyms and different attributes for the same entity etc.

8 8 Data Warehousing Architecture Metadata: data about data that describes the datawarehouse –Technical metadata: contains information for warehouse designers and administrators Information about data sources Transformation descriptions Rules used to perform data clean up Access authorisation, information delivery history, data acquisation history, data access etc –Business metadata: information that gives users an understanding of the information stored in the data warehouse. Queries, reports images Data warehouse operational information e.g. data history and ownership

9 9 Data Warehousing Architecture Access Tools –The principal purpose of the data warehouse is to provide information for strategic decision making. –The main tools used to achieve this objective are: Data query and reporting tools Executive information system tools On-line analytical processing tools Data mining tools

10 10 Information delivery system The IDS is used to enable the process of subscribing for data warehouse information and having it delivered to one or more destinations of choice according to some user-specified scheduling algorithm IDS may be based on time of day or on completion of an external event. IDS can be achieved by Client/Server architecture and now by Interner/intranet and World Wide Web.

11 11 Top 10 “Don’t’s” of Data Warehousing

12 12 10. Pre-selecting Your Technical Environment This is a very common trap in which many organizations find themselves. It is traditional to select the hardware, software and other physical, technical components of a system as one of the earliest activities. However, a data warehouse is an unstable environment from a sizing perspective. How do you know the hardware/RDBMS/end user tool is appropriate for your data warehouse before conducting even the first round of analysis? If at all possible, wait to select your technical environment until after you have analyzed the business requirements for information, data, and potential systems of record.

13 13 9. Allowing Access Tool to Determine Data Architecture This is an extension of #10, but is important enough to list by itself. If you select an end user tool before developing your data architecture, it is very likely that that architecture will suffer at the hand of design requirements delivered by the tool. If you have to sacrifice design requirements in order to meet functional requirements of a tool, it is probably time to put that tool aside and select another one.

14 14 8. Unarchitected Data Marts OK. Data marts are good; they are an essential part of the data warehouse architecture. But to build only a data mart and to ignore the rest of the data warehouse (specifically the atomic level data and centralized meta data) will lead you down a path that will be more expensive and deliver less quality of data than the alternative. The alternative is to architect and build the data warehouse incrementally, iteratively. Include data marts as departmental instances of the architecture, and populate them from the atomic level data. This will ensure accuracy across the architecture, and reduce costs by eliminating unnecessary population of stand-alone data marts.

15 15 7. Boiling the Ocean It is more efficient to implement the data warehouse in small, achievable and palatable chunks than to try to implement it all at once. When I say “boil the ocean”, I mean trying to do too many things for too many people all at the same time. There is an old adage: “You can have everything; where would you put it all?” The same holds true for a data warehouse. If you try to design, develop and implement a data warehouse that is all-encompassing as your first iteration, how will the users be able to use all that you delivered? And in the mean time, while you’ve been trying to meet all of their needs, you have failed to meet any needs. And users won’t forget that for a long time.

16 16 6. “If you build it they will come” If you design, develop and implement an operational system, such as an order processing system, that new system is typically going to replace an existing system. In other words, the old system goes away and the users have no choice but to use the new one. Not so with the d/w. “If you build it…” implies an analysis that includes only bottom-up activities. It is crucial to the success of a data warehouse that a top-down analysis of user requirements for information be conducted. After that, users must be tutored, mentored and otherwise have their hands held as part of the implementation of the data warehouse. Existence does not guarantee utilization and, therefore, value.

17 17 5. Putting ROI before RFI (Requirements for Information) It is very difficult to quantify the intangible benefits that a data warehouse can provide to an organization. How can you put a price on increased customer loyalty. Somewhere, sometime, someone has probably made this calculation. In most cases, however, the determination of how beneficial the data warehouse will be is based on criteria that was developed for operational systems. Just as you cannot use operational data to meet your strategic informational requirements, it is difficult to calculate the return on investment (ROI) of a data warehouse. In terms of benefits to the organization, it is more appropriate to concentrate on how well the data warehouse addresses the target users’ requirements for information.

18 18 4. No Committed User Involvement Write this down: The success of any data warehouse is directly proportional to the amount of end user participation! A data warehouse cannot be successful without active participate on the part of the target users. Period. If you do not have user participation, you will find yourself in a situation where you will build it and hope that they will come. If there is no serious user participation in a data warehouse project, you have to seriously question whether or not the organization truly needs a data warehouse.

19 19 3. No Dedicated DBA In many situations the lack of a dedicated database administrator (DBA) has prevented a data warehouse project to be complete 1) on time, or 2) successfully. “Borrowing” a DBA from the operational “pool” will only result in questions about the nature of the data warehouse data models and database design. It’s too flat, not normalized properly, too much redundancy, and other criticisms are well suited for an operational system’s database design, but not a data warehouse. Considering that “data” is the first word in “data warehouse”, be sure you have a dedicated database administration resource committed to this important project.

20 20 2. No Meta Data Meta data is like documentation and training: Everyone knows it is necessary, but it usually gets dropped somewhere along the route to implementation. For the data warehouse, meta data is more important than just your typical documentation. Remember, in order to turn data into information you have to have the data, know that you have it, be able to access it, and trust it. Meta data is the means by which the users will be able to understand and trust the data. A time-variant record of where data came from, what happened to it along the way, where it is in the data warehouse, and what vehicles exist to access it will spell the difference between success and frustration.

21 21 1. Analysis Paralysis The inability to proceed past a sticking question. Wanting to “boil the ocean” and model/design everything before proceeding with development. Having to resolve political issues surrounding a “standard” or “common” definition. All of these things (and more!) will result in analysis paralysis. The 80/20 rule is very applicable to the development of a data warehouse. Execute 20% effort to get 80% of the total outcome, then move on to the next set of challenges and opportunities. Many data warehouse failures started when the development team stopped. Get your hands around an idea, understand what the users’ requirements for information are, and build something that produces something that can be evaluated. Don’t just stand there…do something!

22 22 Parallel Data Management A topic that’s closely linked to Data Warehousing is that of Parallel Data Management. The argument goes: –if your main problem is that your queries run too slowly, use more than one machine at a time to make them run faster (Parallel Processing). –Oracle uses this strategy in its warehousing products. There are two types of parallel processing - Symmetric Multiprocessing (SMP), and Massively Parallel Processing (MPP)

23 23 Parallel Data Management (cont’d) SMP - means the O.S. runs and schedules tasks on more than one processor without distinction. –in other words, all processors are treated equally in an effort to get the list of jobs done. MPP - more varied in its design, but essentially consists of multiple processors, each running their own program. –the problem with MPP is to harness all these processors to solve a single problem.

24 24 Parallel Data Management (cont’d) Regardless of the architecture used, there are still alternatives regarding the use of the parallel processing capabilities. –In normal Transaction processing, each transaction runs on a separate processor, because transactions are small units of work that run in a reasonable time-span. –However, the type of analysis carried out in data warehouse applications isn’t like that. Typically you want to run a query that looks at all the data in a set of tables. The problem is splitting that into chunks that can be assigned to the multiple processors.

25 25 Parallel Data Management (cont’d) There are two possible solutions to this problem: Static and Dynamic Partitioning. –In Static Partitioning you break up the data into a number of sections. Each section is placed on a different processor with its own data storage and memory. The query is then run on each of the processors, and the results combined at the end to give the entire picture. –This is like joining a queue in a supermarket. You stay with it until you reach the check-out.

26 26 Parallel Data Management (cont’d) –The main problem with Static Partitioning is that you can’t tell how much processing the various sections need. If most of the relevant data is processed by one processor you could end up waiting almost as long as if you didn’t use parallel processing at all. –In Dynamic Partitioning the data is stored in one place, and the data server takes care of splitting the query into multiple tasks, which are allocated to processors as they become available. –This is like the single queue in a bank. As a counter position becomes free the person at the head of the queue takes that position

27 27 Parallel Data Management (cont’d) With Dynamic Partitioning the performance improvement can be dramatic, but the partitioning is out of the users hands.


Download ppt "1 Data Warehousing Architecture A data warewhouse is an architectural construct of an information system that provides users with current and historical."

Similar presentations


Ads by Google