1 IUP's Production Data Warehouse – Daniel J. Kuta, Indiana University of Pennsylvania

2 Agenda
Introduction
Overview of the hardware/software environment
Overview of the data warehouse items that have been implemented
Items that have worked well
Items that have been a challenge

3 Agenda (continued)
Future directions
Useful references

4 Introduction Database administrator at IUP. Graduate of IUP. As of March 2004, 19 years at IUP. Worked primarily with the Office of the Registrar, the Graduate School, Undergraduate Admissions and some Financial Aid.

5 About IUP Approx. 13,800 students; 1,800 employees. Largest member of the SSHE (Pennsylvania's State System of Higher Education). 3 campuses; 1 center; 1 academy. More than 100 undergraduate programs, close to 50 master's degree programs and 8 programs leading to a doctoral degree. Clock-hour programs.

6 Banner at IUP Implemented five baseline modules and three Web For products, 1998–2000. Banner 5.x (soon to be Banner 6). Oracle 9i, OAS (soon to be 9iAS). Sun Solaris.

7 Post Implementation SSD, FAMIS, Workflow, TouchNet, Resource25, IDWorks, CSI. Web For Admissions and Web For Alumni. Quest Central for Oracle. Dozens of custom-written programs and web applications. Large data warehousing initiative.

8 Banner IT Support at IUP
Application Development Group: 1 Coordinator, 1 Senior Systems Analyst/DBA, 2 Senior DBAs, 7 Developers
Miscellaneous Entities: User Services, Tech. Services, Acad. Support Reps., Power Users

9 Hardware/software environment – Development environment
Dell PowerEdge 2500 server
1.266 GHz CPU
1 GB RAM
72 GB disk storage
Windows 2000 Server operating system
Oracle 9.2.0.4 Enterprise Edition

10 Hardware/software environment – Production environment
Sun SPARC Ultra-4 server
4 x 296 MHz CPUs
2 GB RAM
208 GB total disk storage
Sun Solaris 5.6 operating system
Oracle 8.1.6.3 Enterprise Edition

11 Initial End User Base Staff within the office of the Vice Provost for Administration and Technology. Staff within the University Planning and Analysis and Institutional Research area.

12 Current projects and their impetus Replacement of an Institutional Research database. Migration of routines from MS Access queries to packaged PL/SQL procedures. Want the ability to prove or justify the data.

13 Current Meeting Structure Meet with representatives from the Associate Provost and the Planning and Analysis areas approximately every three weeks. Set agenda of topics to discuss Email/phone call follow-ups as needed between meetings.

14 Overview of Data Warehouse items that have been implemented User starting point Warehouse web site Intro page lists the data sets or subject areas available with a brief description. Last extract/freeze date recorded. Next extract/freeze date identified.
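A minimal sketch of what could sit behind that intro page – a status table with one row per data set, holding the brief description and the last/next freeze dates the page displays. The table and column names here are illustrative assumptions, not IUP's actual schema.

-- Hypothetical status table behind the warehouse intro page.
-- Names are illustrative; the real site may be maintained differently.
CREATE TABLE dw_dataset_status (
  dataset_name     VARCHAR2(60)  PRIMARY KEY,  -- e.g. 'Student Grades'
  description      VARCHAR2(400),              -- brief description shown on the page
  last_freeze_date DATE,                       -- last extract/freeze recorded
  next_freeze_date DATE                        -- next extract/freeze identified
);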

15 Overview of Data Warehouse items that have been implemented Student Grades Provides data that allows for the analysis of grades and program review. Course Master Allows further analysis of the above, plus enrollment and credit hours generated by student level within a course.

16 Overview of Data Warehouse items that have been implemented Course Schedule Provides data that allows for the monitoring of enrollment levels in courses at determined intervals. Allows for attrition/migration analysis. Quarterly Financial Summarizations A rollup of financial data within Fund, Organization, Program and Account Code.

17 Overview of Data Warehouse items that have been implemented Payroll Source is the bi-weekly payroll Focus feeds received from the SSHE payroll system. Data regarding earnings, benefits, deductions, etc. is recorded. Data is linked back to Banner position numbers, giving a tie back to the FOAPAL strings responsible for the expense.

18 Items that have worked well Extract all data needed to a staging database and build/rebuild from there. All columns are traced back to their source. Once a table is touched for a column or columns, the entire row is extracted to the staging database – all columns are pulled.

19 Items that have worked well Extract all data needed to a staging database and build/rebuild from there. Staging tables mimic the layout of their source tables. They include an additional column that identifies their freeze id. All rows required from the source tables are extracted and tagged in the staging database with an indicator to tie them together – the freeze id column is populated.
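As a rough sketch of that approach: the staging copy of a source table carries the same columns plus a freeze id, and each extract stamps its rows with that id. The table, column and database-link names below are illustrative assumptions only.

-- Hypothetical staging copy of a Banner source table (structure only):
-- same layout as the source, plus FREEZE_ID to tie one extract together.
CREATE TABLE stage_spriden AS
  SELECT CAST(NULL AS NUMBER) AS freeze_id, s.*
  FROM   spriden@banner_prod s        -- banner_prod is an assumed db link
  WHERE  1 = 0;                       -- copy the structure, no rows

-- Tag every extracted row with the freeze id for this snapshot.
INSERT INTO stage_spriden
  SELECT 200403, s.*                  -- assumed freeze id for this extract
  FROM   spriden@banner_prod s;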

20 Items that have worked well Extract all data needed to a staging database and build/rebuild from there. The builds of the data sets are now based on the staged data. Any subsequent rebuilds all run from the same staged data.

21 Items that have worked well Extract all data needed to a staging database and build/rebuild from there. Benefit: Consistent builds/rebuilds. Not hitting a moving target with data from the Banner production database. Benefit: Able to prove and justify the builds of the data sets.

22 Items that have worked well Model-based construction for the extracts from Banner production to staging. Parameter/profile tables were created to identify the source tables required to build the data sets for a subject area. The tables also identified any special SQL FROM or WHERE clause logic that was needed to extract the data.

23 Items that have worked well Model-based construction for the extracts from Banner production to staging. If a table was not listed as requiring any special SQL extract logic, the entire table was pulled. This was used to pull copies of required Banner validation tables, usually needed for some transformations or code descriptions.

24 Items that have worked well Model-based construction for the extracts from Banner production to staging. Parameter/profile tables assisted with... The generation of the scripts that created the tables in the staging database. The generation of the SQL extract scripts to pull data from the Banner production database to the staging database.

25 Items that have worked well Model-based construction for the extracts from Banner production to staging. Parameter/profile tables assisted with... Scripts to provide record counts of data pulled to staging. Scripts to delete data from test runs of the extract scripts in the staging database.
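A minimal sketch of the parameter/profile idea described on the last few slides, under assumed names: one row per source table for a subject area, with optional special WHERE logic; a NULL clause means pull the whole table (the validation-table case).

-- Hypothetical profile table driving the model-based extracts.
CREATE TABLE dw_extract_profile (
  subject_area   VARCHAR2(30)   NOT NULL,  -- e.g. 'STUDENT_GRADES'
  source_table   VARCHAR2(30)   NOT NULL,  -- Banner table to pull
  extract_where  VARCHAR2(4000)            -- special WHERE logic; NULL = pull entire table
);

-- Each generated extract script then takes the general form:
--   INSERT INTO stage_<source_table>
--   SELECT <freeze_id>, t.*
--   FROM   <source_table>@banner_prod t
--   WHERE  <extract_where, or 1 = 1 when none is listed>;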

26 Items that have worked well Initially started with Java programs generating the scripts... CREATE TABLE scripts for the staging database The extract scripts The record count scripts The delete scripts

27 Items that have worked well Initially started with Java programs generating the scripts... Running the extract scripts in this manner worked well for high volume, low frequency extracts – 3 per year. It was a manageable process. However...

28 Items that have worked well PL/SQL packaged procedures were created to dynamically create and execute the SQL extract scripts. Need dictated by low volume, high frequency, off-hours extracts. Additional tables were created to record run-time parameters and the job results.

29 Items that have worked well PL/SQL packaged procedures were created to dynamically create and execute the SQL extract scripts. Procedures run unattended, logging their results.
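A hedged sketch of how such a procedure might look, assuming the profile table from the earlier sketch plus a hypothetical log table: it builds each INSERT...SELECT dynamically, runs it, and records the row counts so unattended runs can be reviewed afterwards.

-- Hypothetical dynamic extract driver; all object names are assumptions.
CREATE OR REPLACE PROCEDURE dw_run_extract (
  p_subject_area IN VARCHAR2,
  p_freeze_id    IN NUMBER
) AS
  v_sql  VARCHAR2(4000);
  v_rows NUMBER;
BEGIN
  FOR r IN (SELECT source_table, extract_where
            FROM   dw_extract_profile
            WHERE  subject_area = p_subject_area)
  LOOP
    v_sql := 'INSERT INTO stage_' || r.source_table ||
             ' SELECT :fid, t.* FROM ' || r.source_table ||
             '@banner_prod t WHERE ' || NVL(r.extract_where, '1 = 1');
    EXECUTE IMMEDIATE v_sql USING p_freeze_id;
    v_rows := SQL%ROWCOUNT;

    -- record the result so unattended runs can be audited later
    INSERT INTO dw_extract_log
      (subject_area, source_table, freeze_id, rows_extracted, run_date)
    VALUES
      (p_subject_area, r.source_table, p_freeze_id, v_rows, SYSDATE);
    COMMIT;
  END LOOP;
END dw_run_extract;
/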

30 Items that have worked well Builds of the data sets are done by PL/SQL packaged procedures. Call to execute a build procedure with passed parameters that identify the freeze data to use. Vast majority of the transformations and description lookups coded as PL/SQL functions. Benefit: reusability
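As an illustration only (the data-set, staging table and function names below are assumptions, not IUP's actual objects), a build procedure driven by a freeze id might look like this, with the description lookup factored into a reusable PL/SQL function:

-- Hypothetical build of a Student Grades data set from staged data.
CREATE OR REPLACE PROCEDURE build_student_grades (p_freeze_id IN NUMBER) AS
BEGIN
  -- make the build rerunnable for the same freeze
  DELETE FROM ds_student_grades WHERE freeze_id = p_freeze_id;

  INSERT INTO ds_student_grades
    (freeze_id, pidm, term_code, crn, grade_code, grade_desc)
  SELECT p_freeze_id,
         r.sfrstcr_pidm,
         r.sfrstcr_term_code,
         r.sfrstcr_crn,
         r.sfrstcr_grde_code,
         f_grade_description(r.sfrstcr_grde_code)  -- reusable lookup function
  FROM   stage_sfrstcr r
  WHERE  r.freeze_id = p_freeze_id;

  COMMIT;
END build_student_grades;
/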

31 Items that have worked well The completed data sets are built in the staging database. This allows for an analysis of the builds by validation procedures.

32 Items that have worked well After validation of the new data sets in the staging database, the new data sets are then copied into the data warehouse. Separate procedures are used to perform the migration of the data from staging to the data warehouse.
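A minimal sketch of that separation, assuming the warehouse is reached over a database link named dw_prod and the staged data set is ds_student_grades (both names hypothetical):

-- Hypothetical migration of a validated build from staging to the warehouse.
CREATE OR REPLACE PROCEDURE migrate_student_grades (p_freeze_id IN NUMBER) AS
BEGIN
  -- replace any prior copy of this freeze in the warehouse
  DELETE FROM dw_student_grades@dw_prod WHERE freeze_id = p_freeze_id;

  INSERT INTO dw_student_grades@dw_prod
  SELECT * FROM ds_student_grades WHERE freeze_id = p_freeze_id;

  COMMIT;
END migrate_student_grades;
/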

33 Items that have worked well Once the updated data sets have been migrated into the data warehouse… The data warehouse web site is updated to reflect the status of the data sets available. Keeping the web site updated and current is necessary to gain user buy-in to use it. Otherwise, expect phone calls asking for the status of...

34 Items that have been a challenge User dictated design – Replacement of the existing IR database. Too many databases and queries were already written and dependent on the existing structure.

35 Items that have been a challenge Discovery of all existing transformations Transformations hidden in a vast array of MS Access queries. Special fix routines coded in SQL scripts run through SQL*Plus.

36 Items that have been a challenge Missing data Analysis of the builds sheds light on data missing from the Banner production database. Resolution: Identify critical data. Verify it is available prior to performing the extracts.
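One hedged example of such a pre-extract check (the term code and columns are illustrative; the actual critical-data rules would come from the subject-matter experts): confirm that final grades are present before freezing a grades extract.

-- Hypothetical pre-extract verification: count registrations still missing
-- a final grade for the term about to be frozen.
SELECT COUNT(*) AS ungraded_registrations
FROM   sfrstcr@banner_prod
WHERE  sfrstcr_term_code = '200410'
  AND  sfrstcr_grde_code IS NULL;
-- A non-zero count suggests the extract should wait until grading is complete.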

37 Items that have been a challenge User availability Subject matter experts must be available to provide needed information and feedback in a timely manner.

38 Items that have been a challenge Parallel builds of the data Difficulty in coordinating parallel builds of the data sets within both systems in order to perform validation of the new procedures. User testing Parallel builds performed – Yeah! User participation in the validation of the builds was lacking.

39 Future directions/plans Complete the deactivation of the old IR database. SSHE-related semester freezes. Add additional functionality to the job execution environment. Currently logs start time, end time and duration of the entire job.

40 Future directions/plans Add additional functionality to the job execution environment. Will have it log each job step or extract it is performing. Record the start time, end time and duration of the step. Metadata on the target table: initial storage requirements, its needs after the extract and the change in those requirements.

41 Future directions/plans Add additional functionality to the job execution environment. Keep the build and migration procedures, but add procedure calls to perform the logging of the jobs' metadata.
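A sketch, under assumed table and column names, of what that per-step logging might record, with the target table's storage read from USER_SEGMENTS before and after the step:

-- Hypothetical per-step job logging, including storage change on the target table.
CREATE OR REPLACE PROCEDURE dw_log_step (
  p_job_id       IN NUMBER,
  p_step_name    IN VARCHAR2,
  p_target_table IN VARCHAR2,
  p_started      IN DATE,
  p_bytes_before IN NUMBER     -- captured from USER_SEGMENTS before the step
) AS
  v_bytes_after NUMBER;
BEGIN
  SELECT NVL(SUM(bytes), 0)
  INTO   v_bytes_after
  FROM   user_segments
  WHERE  segment_name = UPPER(p_target_table);

  INSERT INTO dw_job_step_log
    (job_id, step_name, target_table, started, ended, duration_secs,
     bytes_before, bytes_after, bytes_change)
  VALUES
    (p_job_id, p_step_name, p_target_table, p_started, SYSDATE,
     ROUND((SYSDATE - p_started) * 86400),
     p_bytes_before, v_bytes_after, v_bytes_after - p_bytes_before);
  COMMIT;
END dw_log_step;
/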

42 Future directions/plans Existing project in the queue for financial reporting. Desire is to have flexible, responsive, rollup reporting. Detail data must be available for drilldown. Look to model budgets, commitments, payments, revenue, etc.

43 Future directions/plans Existing project in the queue for financial reporting. Challenges: No intimate knowledge of Banner Finance. First truly dimensional model. Some Ragged Hierarchies. Implementation of change data capture procedures.
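As a hedged first-cut illustration of where that model might head (all names are assumptions), a summarized finance fact keyed to FOAPAL-style dimensions, with a self-referencing organization dimension to handle the ragged hierarchy and detail rows retained elsewhere for drilldown:

-- Hypothetical dimensional sketch for the finance subject area.
CREATE TABLE dim_organization (
  org_key        NUMBER       PRIMARY KEY,
  org_code       VARCHAR2(6),
  org_title      VARCHAR2(60),
  parent_org_key NUMBER       -- self-reference; depth varies, giving a ragged hierarchy
);

CREATE TABLE fact_financial_activity (
  fiscal_period  VARCHAR2(6),  -- e.g. fiscal year + quarter
  fund_key       NUMBER,
  org_key        NUMBER,
  program_key    NUMBER,
  account_key    NUMBER,
  budget_amt     NUMBER(15,2),
  commitment_amt NUMBER(15,2),
  payment_amt    NUMBER(15,2),
  revenue_amt    NUMBER(15,2)
);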

44 Future directions/plans Change the focus of the data warehousing projects. Currently, too heavy on mandated state reporting. Its focus is on reporting the past, or what has happened.

45 Future directions/plans Change the focus of the data warehousing projects. Need to direct attention to the detection of trends and our reaction to them. And yes, you do need historical data to do that. But it must be in the proper format to easily answer the questions that are asked.

46 Future directions/plans As a simple example, running a University (or any business) is a lot like driving a car... Can you successfully get to where you want to be by constantly looking in the rear view mirror? You must look out the front windshield and focus on what you see. Like it or not, there's stuff coming at you!

47 Future directions/plans As a simple example, running a University (or any business) is a lot like driving a car... You must navigate around any obstacles you encounter. But this is only short-term success, a nice leisurely drive. You need direction, a destination, and a road map to get there.

48 Future directions/plans As a simple example, running a University (or any business) is a lot like driving a car... The strategic plan of the university defines its goals – its destination. If so, what does our plan or road map for reaching that destination look like? Have we aligned our data warehouse initiatives with that plan?

49 Future directions/plans As a simple example, running a University (or any business) is a lot like driving a car... Are we collecting and analyzing the data needed to measure our progress at reaching that destination? What triggers a change, a detour or alternate route in the journey?

50 Conclusion Satisfied with the environment set up to perform the extracts, builds and migrations of the data sets. Users are satisfied with what they are receiving.

51 Conclusion Yes, I feel a level of frustration that the initiatives have focused on mandated reporting – the "What happened?" reporting. Need to implement structures to capture and provide more metadata on the data sets and the procedures and functions that build them.

52 Useful references Books
Building the Data Warehouse – W. H. Inmon, © 1996, John Wiley & Sons
The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses – Ralph Kimball, © 1996, John Wiley & Sons

53 Useful references Books
The Data Warehouse Toolkit, Second Edition: The Complete Guide to Dimensional Modeling – Ralph Kimball, Margy Ross, © 2002, Wiley Computer Publishing
Data Warehouse Design Solutions – Christopher Adamson, Michael Venerable, © 1998, Wiley Computer Publishing

54 Useful references Books
Designing a Data Warehouse: Supporting Customer Relationship Management – Chris Todman, Hewlett-Packard Professional Books, © 2001, Prentice Hall Publishing

55 Useful references Books
Mastering Data Warehouse Design: Relational and Dimensional Techniques – Claudia Imhoff, Nicholas Galemmo, Jonathan G. Geiger, © 2003, Wiley Publishing

56 Useful references
The Data Warehousing Institute – www.dw-institute.com
Intelligent Enterprise – www.intelligententerprise.com
DM Review – www.dmreview.com

57 Useful references
Bill Inmon's web sites – www.inmoncif.com, www.inmongif.com
Ralph Kimball's web site – www.ralphkimball.com
Oracle 9.2 documentation set

58 Questions? Comments? Dan Kuta djkuta@iup.edu (724) 357-2887

