Presentation on theme: "1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001."— Presentation transcript:
1 TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001
2 Outline Overview of what is Data Warehousing The 5 and 10 week courses The Indiana University of Pennsylvania course
3 ORIGINS Data warehouses are the results of two software solutions needing and finding one another: Data base firms developed data warehouses and were looking for applications EIS and DSS software developers and vendors needed to deal with ever-increasing data bases About 10 years ago, the two groups started interacting with the results described here.
4 ORIGINS Database developers long understood that their software was required for both transactional and analytic processing However, their principal developments were directed to ever-larger transactional data bases. This process occurred even through operational and analytic data are separate with different requirements and different user communities.
5 ORIGINS Once these differences were understood, new data bases were created specifically for analysis use. Today, data warehouses have 3 major applications On-line analytic processing for business intelligence Data Mining Customer Relationship Management
6 WHAT IS DATA WAREHOUSING A data warehouse is typically a dedicated data base system for decision making that is separate from the production data base(s) used operationally. It differs from production system in that: it covers a much longer time horizon than transaction systems it includes multiple data bases that have been processed so that the warehouses data are defined uniformly (i.e., clean data) it is optimized for answering complex queries from managers and analysts.
7 WHAT IS DATA WAREHOUSING? In the last 5 years, data warehousing has become a major industry within computing which has brought together the ideas of data bases and decision support. It has also been the foundation for efforts in data mining and in CRM Data mining refers to finding answers about an organization from the information in the data warehouse that the executive or the analyst had not thought to ask. Data mining is made possible by the very presence of large databases in the data warehouse. It provides techniques that allow managers to obtain managerial information from their legacy systems. Its objective is to identify valid, novel, potentially useful, and understandable patterns in data.
8 WHAT IS DATA WAREHOUSING? The objective of a data warehouse is to create a single truth Data warehousing is a major new application area. It rates extremely high salaries (up to $100,000 for specialists, $300,000 for consultants).
9 DEFINITION A data warehouse is a: Subject oriented Integrated Time-variant Non-volatile Collection of data in support of management decision processes
10 NOTE: Data warehouse is physically separated from operational systems and operational data bases Data warehouses hold both aggregated and detailed data for management separate from the databases used for On-Line Transaction Processing (OLTP)
11 CHARACTERISTICS Subject orientedData are organized by how users refer to it IntegratedInconsistencies are removed in both nomenclature and conflicting information; (i.e. data are clean) Non-volatileRead-only data. Data do not change over time. Time seriesData are time series, not current status
12 CHARACTERISTICS SummarizedOperational data are mapped into decision usable form LargerTime series implies much more data is retained Non normalizedData can be redundant Metadata=Data about data InputUnintegrated, operational en- vironment (legacy systems)
13 SUBJECT ORIENTATION Data is organized around major subjects of the enterprise Example: OPERATIONALDATA WAREHOUSE LoansCustomer SavingsVendor Bank cardProduct TrustActivity An application A subject orientation
14 USING THE WAREHOUSE The higher the level of summarization, the more the data is used The more summarized the data, the quicker it is to retrieve However, the higher the level of summarization, the lower the level of detail
15 DATA MARTS Data Mart: A scaled-down version of the data warehouse A data mart is a small warehouse designed for the SBU or department level. It is often a way to gain entry and provide an opportunity to learn Major problem: if they differ from department to department, they can be difficult to integrate enterprise-wide
16 COST Data warehouses are not cheap Median cost to create (does not include operating cost) = $2.2M Multimillion dollar costs are common Their design and implementation is still an art and they require considerable time to create
17 SIZE Being designed for the enterprise so that everyone has a common data set, they are large and increase in size with time. Typical storage sizes run from 50 Gigabytes to several Terabytes
18 SIZE OF INDUSTRY Data warehouses are a major industry within information systems. 6B$/year Estimates vary but it is clear that many more than 90% of Fortune 1000 have data warehouse projects Major players include: Oracle IBM+Informix Sybase NCR + BI companies (Brio, Cognos, Pilot,….)
19 MARKET COMPONENTS High end business intelligence (OLAP) Low end query tools Data cleansing Data marts Data mining Customer relationship management (CRM)
20 APPLICATION – OLAP/BI OLAP = On Line analytic processing Basic idea of OLAP: managers should be able to manipulate enterprise data models across many dimensions to understand changes that are occurring Vendors claim they are OLAP compliant even if they are not
21 APPLICATION - DATA MINING Also known as Knowledge Data Discovery (KDD) Mining terminology refers to finding answers about a business from the data warehouse that the executive or analyst had not thought to ask
22 APPLICATION - DATA MINING KDD applies techniques mostly from artificial intelligence and statistics to discover new information. It is designed to find information that queries and reports dont reveal effectively KDD uses AI and statistics to find pattern in data and to infer rules.
23 APPLICATION - DATA MINING Some successes: People who buy scuba gear take Australian vacations Fraud detection, consumer loan analysis Optimizing production lines IBMs SCOUT (apocryphal?) Men who buy diapers buy beer
24 APPLICATION - DATA MINING AssociationsThings done together (buy groceries) SequencesEvents over time (house.refrigerator) ClassificationsPattern recognition (rules) ClustersDefine new groups ForecastingPredictions from time series
25 CUSTOMER RELATIONSHIP MANAGEMENT (CRM) Successor to data base marketing Implies marketing to customers on a 1:1 basis Requires data granularity at the level of the individual customer large amounts of data Data warehouse is only a part of the CRM concept.
26 History of Course I First given as PhD Seminar on Management of Information Systems 1996 Overview lecture by instructor Students present topics each week – typically 2 or 3 Great source of material!
27 History of Course II Attend Data Warehouse Institute conference in San Diego. Agree to write book with Watson. January 1997 Complete book in late Desk-top published. Used sabbatical at UCI to do the writing. Move course to regular MS course in Spring 1997.
28 Two Versions: 14 week for Claremont Graduate University 5 Week for UC Irvine Taught 1/year for last 5 years WHY DO STUDENTS TAKE COURSE? High salaries (100K specialist, $300K for consultant)
29 Both Versions Text: Decision Support in the Data Warehouse by P. Gray and H.J. Watson Prentice Hall 1998 PLUS readings PLUS hands-on exercises PLUS term paper
30 5 WEEK VERSION WkPart 1 Part 2 1 Overview of DW OLAP, applications 2 Demo Startracker Strategic use, software Framework 3 Data sources, Planning and cleansing,metadata operating the DW 4 Data marts,ODS Industry, economics 5 Mining, BI CRM, Developments
31 Date 14-WEEK COURSE 9-9Introduction to DW 9-16Architecture, Data Input 9-23Metadata, Multiple Dimensions 9-30Interface, Strategic Uses 10-7Design Methods; Data Marts 10-14Progress Report; ODS; Maintaining the DW 10-21Midterm 10-28Economics of DW; Personnel 11-7Querying, Data Mining 11-14Progress Report; CRM 11-21Business Intelligence ; Future Developments 11-28Student Reports 12-2Student Reports 12-9Final Exam
32 Using SoftwareStar Tracker Simple DW The Data Warehouse Toolkit : Practical Techniques for Building Dimensional Data Warehouses by Ralph Kimball (Wiley 1996) Commercial version Synchrony from If… no longer available
33 Using Software – Commercial Red Brick (now part of Informix which is now owned by IBM) Gift from Red Brick Mostly used with student projects
34 Data Sources Journal of Data Warehousing DM Review On-line (dmreview.com) Hard copy Lots and lots on Internet e.g., Pwp.starnetinc.com/larryg
35 Assignments Find 10 articles on Internet teaches search techniques, nomenclature, rapid course immersion Exercise with software The DW industry
36 Term Projects for DW Next chart shows 22 of the 30 suggested topics. Last topic allows student to choose topic subject to instructor approval.
37 DW Architecture Data sources, loading, cleaning, summarizing, granularity Metadata in a DW Data Modeling and the DW Multiple Dimensions in DWs via ROLAP and MOLAP Indexing in DWs DW Interfaces –conventional, window, browser Design principles for DWs Strategic use of the DW Justification of the DW including C/B analysis DW Security Data Marts Operational Data Stores Maintaining the DW Economics of DWs Querying and reporting in a DW OLAP and the DW Decision Support Systems and the DW Database marketing and the DW Data Mining and the DW Personnel considerations for building the DW and for maintaining the DW Organizing the DW Project (Project management etc.)
38 Elizabeth Pierces Course at Indiana University of Penn. Paper published in Communications of AIS (CAIS) Sept (Vol. 2 Article 16) Developing and Delivering a Data Warehousing and Mining Course Introduces students to the strategies, technologies, and techniques
39 Elizabeth Pierces Course at Indiana University of Penn. Students learn what is involved in planning, designing, building, using, and managing a data warehouse. Students also learn about how a data warehouse must fit into an over- all corporate data architecture that may include legacy systems, operational data stores, enterprise data warehouses, and data marts. In addition, students are exposed to the different data mining techniques used by organizations to derive information from the data warehouse for strategic and long-term business decision making. (abstract)
40 U. Of Indiana Course 14 weeks –7 weeks on DW Challenges: Evolving nature of the subject Hands on experience for students Textbook selectionmostly tradebooks
41 Useful Tables in Pierce List of Internet sources Available software demos Some on-line data warehouses with public access Schools that offered data mining courses in 1999
42 Conclusions Students come because it is a job source A good 2 nd course for data base (can be taken w/o database course) Allows covering the new topics of the 90s and 00sOLAP,CRM,Mining, in more depth than Intro to IS course. Its a fun topic and you can even do research on it.