Presentation on theme: "Building a Data Warehouse...Bring in the Sheaves January 13, 2004 EDUCAUSE Mid-Atlantic Conference Baltimore, Maryland Ella Smith U.S. Department of Agriculture."— Presentation transcript:
Building a Data Warehouse...Bring in the Sheaves January 13, 2004 EDUCAUSE Mid-Atlantic Conference Baltimore, Maryland Ella Smith U.S. Department of Agriculture Alan Harmon U.S. Naval Academy January 13, 2004 EDUCAUSE Mid-Atlantic Conference Baltimore, Maryland Ella Smith U.S. Department of Agriculture Alan Harmon U.S. Naval Academy
Copyright Ella Smith and Alan Harmon, This work is the intellectual property of the authors. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the authors. To disseminate otherwise or to republish requires written permission from the authors.
Agenda for this Session Definition Overview of the Initial Process (Proof-of-Concept) Overview of the Initial Process (Proof-of-Concept) Organizational Ownership Organizational Ownership Data Warehouse Architecture Data Warehouse Architecture Project Team Composition Project Team Composition The Process The Process Wrap Up Wrap Up Questions Questions Definition Overview of the Initial Process (Proof-of-Concept) Overview of the Initial Process (Proof-of-Concept) Organizational Ownership Organizational Ownership Data Warehouse Architecture Data Warehouse Architecture Project Team Composition Project Team Composition The Process The Process Wrap Up Wrap Up Questions Questions
What is a Data Warehouse? u Definition: a repository of data derived from operational systems or external source; NOT an archive u Purpose: collect and report data in a consistent, centralized manner; mechanism for conducting longitudinal analysis u Strategy: Target key applications (Admissions, Registrar, Frozen Files), clean data and load.
Benefits of a Data Warehouse u Cost Savings by reducing the amount of manual time and effort required to compile, organize, and report the data. u Data Consistency among the different areas since the data will be synchronized upon entry into the data warehouse. u Access to the information will be faster since the process will be automated and available online (versus paper reports).
Loading and Cleaning Data Opportunity to Integrate, Correct, and Validate Data Loading and Cleaning Data Opportunity to Integrate, Correct, and Validate Data DataWarehouseDataWarehouse Data Extraction and Cleaning (can be very complex) u Integrate multiple data sources u Correct data problems (cleanse) u Validate Data u Summarize and roll-up data u Update Metadata Flat File DataSources Data in Databases Data Live Data Sources ApplicationsSAPPeopleSoft Oracle Apps ApplicationsSAPPeopleSoft
Online Analytical Processing Fast and Selective Access to Summarized Data REGISTRAR View FINANCIAL View Ad Hoc View PROD MARKETMARKET TIME ADMISSIONS View MAJORS CLASS YEAR TIME STUDENTS
DW Development Strategy u Think & Plan Big –Build In Small Steps; Don’t build a BARN! Not an archive system u Identify your audience u Use DW to address new areas, add new capabilities, and fix existing problems u Retain existing transactional systems u Iterative development approach –Address key needs –Rapidly deliver capability to users –Lower risk
Strategy (continued) u Evolve system in manageable phases –Identify questions you need to answer OR –Look at data to determine questions you can answer u Strategy –Develop an overall plan –Develop common metadata standards –Implement needed pieces mindful of integration and expansion
Initial Considerations u Vision u Proof-of-Concept / Phased Approach u Benefits u Strategy u Timeline u Cost u Issues –Data –Political constraints –Organizational Factors
1 st Step: Proof-of-Concept u Develop a Stand Alone Proof-of-Concept u Develop model to demonstrate use of new tools to end users. u Provide benchmarks for future planning. u Low cost way to “test the waters” u Exposes YOUR data and ability to deal with it u Define number of tasks and deliverables.
Proof-of-Concept Timeline u 6-8 Weeks for each increment –Requirements: gather and document –Data: identify source, construct model, extract data, cleanse data, transport data to database –Data Access: user interface, security, training, documentation
ADMISSION DATA DATA SSN CLASS YEAR DEMOGRAPHIC DATA DATA SSN CLASS YEAR ETHNICITY GENDER HIGH SCHOOL REGION STUDENT_FACT SATVHI SATMHI H.S. RANK H.S. CLASS SIZE TIME #ACYEAR CLASS YEAR REGISTRAR DATA DATA SSN CLASS YEAR GPA MAJOR Proof-of-Concept Logical Data Model SSN ACYR
ADMISSION DATA DATA #ADMISSION_SSN SCORE_CLASS DEMOGRAPHIC DATA DATA # DEMO_SSN DEMO_CLASS ETHNICITY GENDER HIGH SCHOOL REGION STUDENT_FACT #ADMISSION_SSN #DEMO_SSN #REGISTRAR_SSN #ACYEAR SATVHI SATMHI H.S. RANK H.S. CLASS SIZE TIME #ACYEAR CLASS_YEAR REGISTRAR DATA DATA #REGISTRAR_SSN CLASS GPA MAJOR Proof-of-Concept Physical Data Model SSN ACYR
Post-PoC: DW Architecture u Many types of architecture –Star schema, Snowflake, Hybrid u Depends on: –Types of queries –Size of database –Capability of hardware and software u Basic Components: –Logical Model –Physical Model
Physical Data Warehouse Topology Admissions Academic Affairs Dean of Students Finance Office HR President Instit Research WebServer Database Server Server General Public for remote connectivity Remote Laptop1Remote Laptop2 Public Affairs
MetaData Definition: Information about your data MetaData Definition: Information about your data u Centralized description of business rules –Describes data and transformations within DW –Captures changes in business rules over time to provide a level playing field for comparing data u Audit trail for data authentication u Bottom line –Increased trust in DW-based analysis results results
Project Team Composition u Types of Personnel and Level of Skill – Analysis & Design (HIGH) – Implementation (MED) – Test & Quality Assurance (LOW) u Skill = $$$ u Vary Skill by Task to control cost
The Project Model “Roles and Responsibilities” Steering Committee Project Manager Quality Assurance PrgmrPrgmrModelerModelerDBADBA Tool Prgmrs EndUserLiaisonEndUserLiaisonDocumentationDocumentation Planning, Reporting, Certification Joint Client and Consultant Test and Map to Requirements
The Project Model “Roles and Responsibilities” PrgmrPrgmrModelerModelerDBADBA Tool Prgmrs EndUserLiaisonEndUserLiaisonDocumentationDocumentation Scoping Scoping Infrastructure InfrastructureScoping Modeling Cleaning Capacity Planning PrototypingModeling ETL Implementation ImplementationBuildingBuilding QAQAQAQA QA / Training Scoping Modeling Documentation Training Analysis Phase Architecture Phase Implementation Phase Transition Phase
The Harvest! u Review requirements and results periodically –At end of each phase –Annually, taken as a whole u Optimize data warehouse –Response based on queries and load –Bring in-line with operational systems u Review and Adjust the DW mission as institutional mandates change
Cost Control u Start small and develop in phases u Bring in skill sets as needed remember: $$$ = (Skills) x (period of time) remember: $$$ = (Skills) x (period of time) u Institutional staff should know the data u Organizational issues need to be resolved by the Project Manager and Steering Committee
Accountability u MUST show results (standard or adhoc reports) u Ensure complete documentation to maintain responsibility and association of data to departments u Establish a Return-on-Investment (ROI) whether tangible (number of reports) or intangible (executive support/decision making)
Issues u Security u Performance u Managing the metadata u Managing the data warehouse u Hardware/software configuration u Resources u Staying in the loop!