Presentation on theme: "Data Warehousing components. Overall architecture."— Presentation transcript:
Data Warehousing components
Figure 4-1 Data Warehouse Architecture
Data warehouse database
The central data warehouse database is a cornerstone of the data warehousing environment. Several technological approaches are used to implement it, including the following:
–Parallel relational database designs that require a parallel computing platform
–An innovative approach to speed up a traditional RDBMS by using new index structures to bypass relational table scans
–Multidimensional databases (MDDBs) that are based on proprietary database technology or implemented using an already familiar RDBMS. Multidimensional databases are designed to overcome any limitations placed on the warehouse by the nature of the relational data model
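The slide does not name a specific index structure. As one illustration of how an index can answer a query without scanning the table, here is a minimal sketch of a bitmap index. The `sales` rows and the helper names `build_bitmap_index` and `rows_matching` are hypothetical, invented for this example.

```python
from collections import defaultdict


def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int whose set bits mark matching row positions."""
    index = defaultdict(int)
    for position, row in enumerate(rows):
        index[row[column]] |= 1 << position
    return dict(index)


def rows_matching(bitmap):
    """Yield the row positions whose bit is set in `bitmap`."""
    position = 0
    while bitmap:
        if bitmap & 1:
            yield position
        bitmap >>= 1
        position += 1


# Hypothetical fact rows, invented for illustration only.
sales = [
    {"region": "east", "product": "tv"},
    {"region": "west", "product": "tv"},
    {"region": "east", "product": "radio"},
    {"region": "east", "product": "tv"},
]

region_index = build_bitmap_index(sales, "region")
product_index = build_bitmap_index(sales, "product")

# AND the two bitmaps to answer "region = east AND product = tv"
# from the indexes alone, without re-reading the rows.
east_tv = list(rows_matching(region_index["east"] & product_index["tv"]))
print(east_tv)
```

Combining predicates becomes a bitwise operation on the indexes, which is why structures of this kind suit read-mostly warehouse queries over low-cardinality columns.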
Sourcing, Acquisition, Cleanup and Transformation Tools
A significant portion of the data warehouse implementation effort is spent extracting data from operational systems and putting it in a format suitable for the informational applications that will run off the data warehouse. The data sourcing, cleanup, transformation, and migration tools perform all of the conversions, summarizations, key changes, structural changes, and condensations needed to transform disparate data into information that can be used by the decision support tools.
Sourcing, Acquisition, Cleanup and Transformation Tools (cont’d)
The functionality includes:
–Removing unwanted data from operational databases
–Converting to common data names and definitions
–Calculating summaries and derived data
–Establishing defaults for missing data
–Accommodating source data definition changes
The data sourcing, cleanup, extract, transformation, and migration tools have to deal with some significant issues, as follows:
–Database heterogeneity
–Data heterogeneity
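The functions listed above can be sketched as a small transformation step. This is a minimal illustration, not any particular tool's behavior. The field names, the alias table, and the default values are all assumptions made up for the example.

```python
# Hypothetical mapping from source-specific column names to common names.
COLUMN_ALIASES = {
    "cust_nm": "customer_name",
    "CustName": "customer_name",
    "amt": "amount",
    "Amount": "amount",
}
# Hypothetical defaults for missing data.
DEFAULTS = {"region": "UNKNOWN"}
# Fields the warehouse keeps; everything else is treated as unwanted data.
WANTED = {"customer_name", "amount", "region"}


def transform(record):
    """Rename fields to common names, drop unwanted fields, fill defaults."""
    renamed = {COLUMN_ALIASES.get(key, key): value for key, value in record.items()}
    kept = {key: value for key, value in renamed.items() if key in WANTED}
    for field_name, default in DEFAULTS.items():
        if kept.get(field_name) in (None, ""):
            kept[field_name] = default
    return kept


def summarize(records):
    """Calculate a derived summary: total amount per region."""
    totals = {}
    for record in records:
        totals[record["region"]] = totals.get(record["region"], 0) + record["amount"]
    return totals


# Two hypothetical operational sources with different naming conventions.
source_a = [{"cust_nm": "Ann", "amt": 10, "region": "east", "session_id": "x1"}]
source_b = [{"CustName": "Bob", "Amount": 5, "region": None}]

cleaned = [transform(record) for record in source_a + source_b]
totals = summarize(cleaned)
print(cleaned)
print(totals)
```

Keeping the alias and default tables as data means that a change in a source definition usually requires only a new table entry, leaving the code untouched.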
Metadata
Metadata is data about data that describes the data warehouse. It is used for building, maintaining, managing, and using the data warehouse. Metadata can be classified into the following:
–Technical metadata
–Business metadata
–Data warehouse operational information such as data history (snapshots, versions), ownership, extract audit trail, usage data
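One way to picture the three classes of metadata is a single catalog entry that carries all of them. The sketch below is an assumption about how such an entry might look. The class name `TableMetadata`, its fields, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    # Technical metadata: where the data comes from and how it is typed.
    table_name: str
    source_system: str
    column_types: dict
    # Business metadata: what the data means to its users.
    business_name: str
    description: str
    # Operational information: ownership and extract audit trail.
    owner: str
    extract_audit_trail: list = field(default_factory=list)

    def record_extract(self, run_date, row_count):
        """Append one extract run to the audit trail."""
        self.extract_audit_trail.append({"run_date": run_date, "rows": row_count})


# Hypothetical entry, invented for illustration.
entry = TableMetadata(
    table_name="fact_sales",
    source_system="order_entry",
    column_types={"region": "TEXT", "amount": "REAL"},
    business_name="Sales",
    description="One row per completed sale",
    owner="finance",
)
entry.record_extract("2024-01-31", 1200)
print(entry.extract_audit_trail)
```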
Access Tools
The principal purpose of a data warehouse is to provide information to business users for strategic decision making. These users interact with the data warehouse using front-end tools. Many of these tools require an information specialist, a domain expert who can analyze the information and interact with the data warehousing environment in order to reach meaningful conclusions. This is especially true for data mining tools when defining the problem, configuring the tool, and analyzing the results.
Tool Taxonomy
The end-user tools area spans a number of components. For example, all end-user tools use metadata definitions to obtain access to data stored in the warehouse, and some of these tools may employ additional or intermediary data stores. These tools can be divided into five main groups:
–Data query and reporting tools
–Application development tools
–Executive information system (EIS) tools
–Online analytical processing tools
–Data mining tools
Data Mining tools
Most organizations engage in data mining to do the following:
–Discovering knowledge: segmentation, classification, association, and preferencing
–Visualizing data
–Correcting data
The strategic value of data mining is time-sensitive, especially in the retail, marketing, and finance sectors of the industry. Using data mining to build predictive models in decision making has several benefits:
–A model should explain why a particular decision was made.
–Adjusting a model based on feedback from future decisions will lead to experience accumulation and true organizational learning.
–Finally, a predictive model can be used to automate a decision step in a larger process.
Data Marts
The concept of the data mart is causing a lot of excitement and attracting much attention in the data warehouse industry. In general, data marts are being presented as an inexpensive alternative to a data warehouse, taking significantly less time and money to build. The data mart is directed at a partition of data (often called a subject area) that is created for the use of a dedicated group of users. Unfortunately, the misleading statements about the simplicity and low cost of data marts sometimes result in organizations or vendors incorrectly positioning them as an alternative to the data warehouse. In summary, data marts present two problems: the problem of scalability in situations where an initial small data mart grows quickly in multiple dimensions, and the problem of data integration.
Data Warehouse administration and management
In summary, managing data warehouses includes the following:
–Security and priority management
–Monitoring updates from multiple sources
–Data quality checks
–Managing and updating metadata
–Auditing and reporting data warehouse usage and status
–Purging data
–Replicating, subsetting, distributing data
–Backup and recovery
–Data warehouse storage management (for example, capacity planning; hierarchical storage management, or HSM; purging of aged data)
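As one concrete instance of the tasks above, purging of aged data can be sketched as a retention check on a load date. The `load_date` field, the 30-day retention period, and the sample rows are assumptions chosen for the example. A real warehouse would typically do this in the database, not in application code.

```python
from datetime import date, timedelta


def purge_aged(rows, today, retention_days):
    """Return the rows still within the retention window and the number purged."""
    cutoff = today - timedelta(days=retention_days)
    kept = [row for row in rows if row["load_date"] >= cutoff]
    return kept, len(rows) - len(kept)


# Hypothetical warehouse rows, invented for illustration.
rows = [
    {"id": 1, "load_date": date(2024, 3, 15)},
    {"id": 2, "load_date": date(2024, 1, 10)},
    {"id": 3, "load_date": date(2024, 3, 1)},
]

kept, purged_count = purge_aged(rows, today=date(2024, 3, 31), retention_days=30)
print([row["id"] for row in kept], purged_count)
```

Returning the purge count alongside the surviving rows gives the auditing and reporting task something to record.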
Impact of the web
Even a surface analysis of the information technology industry indicates that the two most pervasive themes in computing have been the Internet and data warehousing. From a marketing perspective, a marriage of these two giant technologies is a natural and unavoidable event. The reason for this trend is simple: the compelling advantages of using the Web for access are magnified even further in a data warehouse.
Impact of the web (cont’d)
The intranet movement has resulted in a drastic decrease in the capital intensity and the project expense of creating and deploying applications on the web. Today, corporations can set up an RDBMS server, a DSS server, and a Web server in a single location; build a decision support application using standard tools; and then immediately deploy it to hundreds or even thousands of users anywhere on the corporate intranet. Application maintenance, code upgrades, and security privileges are now administered centrally. An example is the Sabre computer reservation system.
Approaches to using the web
Figure 4-2 Web-enabled Information delivery
Design Options and Issues
Web access offers some clear advantages over existing architectures, but there are some very clear issues and concerns. These issues include the following:
–Security
–Performance
–Statelessness
–Functionality
–Presentation
Therefore, we can offer the following suggestions:
–Design your data warehouse very carefully
–Minimize the number and size of data transmissions per access
–Use more server-based processing, including stored procedures and server-side functions
–Ensure that the server is extensible and highly available, and that its workload is balanced
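The suggestions to minimize data transmission and to push processing to the server can be illustrated by comparing two ways of producing the same report. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a warehouse server. The `sales` table and its contents are hypothetical.

```python
import sqlite3

# In-memory database standing in for the warehouse server (an assumption
# made so the example is self-contained).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 15.0), ("west", 7.5)],
)

# Client-side approach: every detail row is transmitted, then summed locally.
all_rows = conn.execute("SELECT region, amount FROM sales").fetchall()
client_totals = {}
for region, amount in all_rows:
    client_totals[region] = client_totals.get(region, 0.0) + amount

# Server-side approach: the database aggregates and returns one row per region.
summary_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(len(all_rows), "rows transmitted without server-side aggregation")
print(len(summary_rows), "rows transmitted with server-side aggregation")
```

Both approaches yield the same totals, but the second sends only the summary across the network, and the difference grows with the number of detail rows.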
XML
XML stands for eXtensible Markup Language. XML should be:
–Easy to use over the Internet
–Compatible with SGML
–Capable of being processed by easy-to-write programs
–Legible and reasonably clear to users
In addition to the XML standard, several auxiliary standards are needed to complete the functionality of XML. For example, XSL, XLink, and XPointer are among the proposed standards that provide XML support for style sheets, hyperlinks, and other features.
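The goal that XML be processable by easy-to-write programs can be seen in a short sketch using Python's standard `xml.etree.ElementTree` module. The `<sales>` document and its element and attribute names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document, legible to a person as well as to a program.
document = """<sales>
  <sale region="east"><product>tv</product><amount>10.5</amount></sale>
  <sale region="west"><product>radio</product><amount>4.0</amount></sale>
</sales>"""

root = ET.fromstring(document)

# Turn each <sale> element into a plain dictionary.
records = [
    {
        "region": sale.get("region"),
        "product": sale.findtext("product"),
        "amount": float(sale.findtext("amount")),
    }
    for sale in root.findall("sale")
]
print(records)
```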