Data Warehousing: A Proven Solution to Sustaining a Vibrant Business
Jerry Hammons, Art Brooks
EDUCAUSE 2006, Dallas, Texas, Poster Session
Copyright Jerry Hammons 2006. This work is the intellectual property of the author. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise or to republish requires written permission from the author.
The Evolving Data Warehouse
UMR started its data warehouse adventure in 1986.
Over 18 years the data warehouse and reporting have transitioned:
–from local mainframe SQL/DS to local client/server with Informix
–from Informix to Oracle
–from native SQL for report creation to InfoMaker
–through four strategic methodological changes
–through three version upgrades in the Advance system
–through implementation of PeopleSoft
–through one PeopleSoft version upgrade.
Reporting:
Is user oriented.
Is an entity separate from the transactional system.
Must be insulated from transactional system changes.
Must include a transitional bridge.
Can begin BEFORE a system conversion.
Strategic Statement
If data can be converted from legacy to PeopleSoft, then data can be translated from PeopleSoft to legacy. (We NEVER stated ALL of the data could be translated.)
Terms:
–Converted – data in a legacy format, modified and loaded in PeopleSoft tables
–Translated – data in a PeopleSoft format, modified and loaded in legacy defined tables.
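The idea of translation can be illustrated with a minimal sketch. It assumes translation is a value-by-value lookup from PeopleSoft values to legacy codes. The value pairs (Male/M, Single/S, Japan/Jpn) follow the BIO table example in this poster; the dictionary layout, the function name and the sample Emplid are hypothetical and are not the authors' translation scripts.

```python
# Hypothetical lookup tables: PeopleSoft value -> legacy code.
# The value pairs come from the BIO table example in this poster;
# the structure and names are illustrative assumptions.
PS_TO_LEGACY = {
    "gender": {"Male": "M"},
    "mar_stat": {"Single": "S"},
    "citizenship": {"Japan": "Jpn"},
}


def translate_row(ps_row):
    """Translate one PeopleSoft-format row into legacy-format codes.

    A value with no legacy equivalent becomes None, reflecting that
    not ALL data can be translated.
    """
    legacy_row = {}
    for column, value in ps_row.items():
        mapping = PS_TO_LEGACY.get(column)
        if mapping is None:
            # No code mapping defined for this column: pass it through.
            legacy_row[column] = value
        else:
            legacy_row[column] = mapping.get(value)
    return legacy_row


ps_row = {"emplid": "E001", "gender": "Male", "mar_stat": "Single", "citizenship": "Japan"}
legacy = translate_row(ps_row)
untranslatable = translate_row({"gender": "Unknown"})
```

Here `legacy` carries the legacy codes while `untranslatable` carries `None` for the value that has no legacy counterpart.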
The UMR Approach
Free-standing Ambiguously Related Entities
In contrast to such conventional approaches as star schemas, fact tables and dimension tables.
The Reporting Foundation
Event oriented, functional tables: relational tables designed for a specific reporting need that draw data from multiple data warehouse tables, focusing on the needs of the user and not the technical staff.
Goal – reduce technical requirements to formatting the output page.
SIMPLICITY!!
Functional Table Concept was developed to:
–1. Empower the users
–2. Simplify the data structure
–3. Reduce report development time
–4. Reduce processing time for the server (quicker response)
–5. Improve programmer efficiency
–6. Provide another tool for reporting
YOU can do this!
Concept
–Simplicity
–Zero table joins
–Zero Where statements
–Event oriented
–Report design centric
–User takes less than five minutes to develop a query
–User concentrates on report appearance
–Estimated elevenfold reduction in code to create a report
Functional Tables
Subsets of existing application(s)
Single table can contain data from disparate applications
Do not HAVE to be homogeneous
Created from pre-selection routines
Create logic continuity
May be a view or a physical table
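A minimal sketch of the functional table idea, assuming SQLite via Python's standard library in place of the Oracle warehouse. The source tables `src_person` and `src_application` and their contents are hypothetical. The names `admfresh`, `admajor` and `adterm` are borrowed from the report comparison in this poster, but the table definition here is an illustration, not the UMR one. The joins and filters run once in the pre-selection step, so the report query needs no joins, and the same logic works as a physical table or as a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical transactional-style source tables.
    create table src_person (emplid text primary key, name text);
    create table src_application (emplid text, admit_term text, major text, status text);

    insert into src_person values ('1', 'Ann'), ('2', 'Bob'), ('3', 'Cy');
    insert into src_application values
        ('1', 'FS2001', 'EE', 'AD'),
        ('2', 'FS2001', 'EE', 'AD'),
        ('3', 'FS2001', 'ME', 'WD');

    -- Pre-selection routine: joins and filters are done once, at load time.
    create table admfresh as
        select p.emplid as stuno, p.name, a.major as admajor, a.admit_term as adterm
        from src_person p
        join src_application a on a.emplid = p.emplid
        where a.status = 'AD';

    -- The same logic exposed as a view instead of a physical table.
    create view admfresh_vw as
        select p.emplid as stuno, p.name, a.major as admajor, a.admit_term as adterm
        from src_person p
        join src_application a on a.emplid = p.emplid
        where a.status = 'AD';
""")

# The user's report query touches one table and needs no joins.
report_sql = (
    "select admajor, count(*) from {} "
    "where adterm = 'FS2001' group by admajor order by admajor"
)
from_table = conn.execute(report_sql.format("admfresh")).fetchall()
from_view = conn.execute(report_sql.format("admfresh_vw")).fetchall()
```

The physical table trades storage and a load step for faster reads, while the view always reflects the current source rows. Which one fits depends on the report.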
Functional Table -- Proven Approach
Concept developed in 1998
First put into use in 1999 with conversion of University Advancement system from mainframe to client/server
Over 500 new reports created for that department using this technique
Utilized in 2000 to retain orientation system when Admissions implemented PeopleSoft
An integral part of the reporting solution at UMR prior to the 2004 Registrar's PeopleSoft implementation
Process
–User identifies report needed (presents sample or attempts to sketch idea)
–User and professional staff member meet to discuss new report
–Programmer identifies transactional data needed to create report
–Programmer creates and loads table
–User creates report.
UMR Download Process
[Diagram labels: Legacy; PeopleSoft; Campus systems; Download scripts; Functional table scripts; UMR DW.]
Approximately 300 tables
Approaching 5 gigabytes
Report prepared direct from PeopleSoft
Report: Count number of Freshmen for specified term
[Diagram: one query with five sub queries (Sub Query 1 through Sub Query 5) drawing on seven PeopleSoft tables (PS table 1 through PS table 7) and 12 process tables.]
Report prepared from UMR functional table
Report: Count number of Freshmen for specified term
[Diagram: one query drawing on one UMR functional table.]
The Challenge
With modules being implemented over a six-year period, applications and reports had to continue to function when some data was in the mainframe in a legacy format and other data was in PeopleSoft in a client/server relational environment.
This required applications and reports to draw data from tables that had diverse sources.
An inventory indicated that over 3,000 reports and 25 applications built over a 10+ year period would cease to function.
Report Creation Comparisons
Each of these queries counts the number of freshmen admitted for a specified term. The results are the same.

UMR Functional Table Format (1 query, 1 table, 0 joins, 3 'where' statements, 6 lines):
select a.admajor, count(*)
from admfresh a
where a.adterm='FS2001'
and a.fraction='A'
and a.fractiondate<=sysdate
group by a.admajor;

PeopleSoft Format (1 query, 7 distinct tables, 12 process tables, 6 joins, 5 sub queries, 50 'where' statements, 70 lines):
select f.acad_plan, count(*)
from ps_pers_data_effdt a,
ps_pers_dtef_sa_vw a1,
ps_stdnt_career b,
ps_adm_appl_data c,
ps_adm_app_car_seq d,
ps_adm_appl_prog e,
ps_adm_appl_plan f
where a.emplid=a1.emplid
and a.effdt=a1.effdt
and (a.effdt=(select max(a_ed.effdt)
from ps_pers_data_effdt a_ed
where a.emplid=a_ed.emplid
and a_ed.effdt<=sysdate)
and a.emplid=b.emplid
and b.acad_career=c.acad_career
and c.emplid=d.emplid
and c.acad_career=d.acad_career
and c.stdnt_car_nbr=d.stdnt_car_nbr
and c.adm_appl_nbr=d.adm_appl_nbr
and d.emplid=e.emplid
and d.acad_career=e.acad_career
and d.stdnt_car_nbr=e.stdnt_car_nbr
and d.adm_appl_nbr=e.adm_appl_nbr
and d.appl_prog_nbr=e.appl_prog_nbr
and e.effdt=(select max(e_ed.effdt)
from ps_adm_appl_prog e_ed
where e.emplid=e_ed.emplid
and e.acad_career=e_ed.acad_career
and e.stdnt_car_nbr=e_ed.stdnt_car_nbr
and e.adm_appl_nbr=e_ed.adm_appl_nbr
and e.appl_prog_nbr=e_ed.appl_prog_nbr
and e_ed.effdt<=sysdate)
and e.effseq=(select max(e_es.effseq)
from ps_adm_appl_prog e_es
where e.emplid=e_es.emplid
and e.acad_career=e_es.acad_career
and e.stdnt_car_nbr=e_es.stdnt_car_nbr
and e.adm_appl_nbr=e_es.adm_appl_nbr
and e.appl_prog_nbr=e_es.appl_prog_nbr
and e.effdt=e_es.effdt
and e.prog_status=e_es.prog_status)
and e.admit_term='FS2001'
and e.prog_status in ('AC', 'AD')
and e.action_dt<=sysdate
and c.admit_type='FTC'
and d.acad_career='UGRD'
and e.emplid=f.emplid
and e.acad_career=f.acad_career
and e.stdnt_car_nbr=f.stdnt_car_nbr
and e.adm_appl_nbr=f.adm_appl_nbr
and e.appl_prog_nbr=f.appl_prog_nbr
and f.effdt=(select max(f_ed.effdt)
from ps_adm_appl_plan f_ed
where f.emplid=f_ed.emplid
and f.acad_career=f_ed.acad_career
and f.stdnt_car_nbr=f_ed.stdnt_car_nbr
and f.adm_appl_nbr=f_ed.adm_appl_nbr
and f.appl_prog_nbr=f_ed.appl_prog_nbr
and f_ed.effdt<=e.effdt)
and f.effseq=(select max(f_es.effseq)
from ps_adm_appl_plan f_es
where f.emplid=f_es.emplid
and f.acad_career=f_es.acad_career
and f.stdnt_car_nbr=f_es.stdnt_car_nbr
and f.adm_appl_nbr=f_es.adm_appl_nbr
and f.appl_prog_nbr=f_es.appl_prog_nbr
and f.effdt=f_es.effdt))
group by f.acad_plan;
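Much of the length of the PeopleSoft-format query comes from its effective-dated lookups: each `max(effdt)` sub query picks the most recent row that is not dated in the future. The sketch below isolates that pattern on a toy table, assuming SQLite via Python's standard library. The table `prog_history`, its rows, and the status codes other than 'AD' are hypothetical, and a bound `as_of` parameter stands in for Oracle's `sysdate`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical effective-dated table: several rows per person over time.
    create table prog_history (emplid text, effdt text, prog_status text);
    insert into prog_history values
        ('1', '2001-01-15', 'AP'),
        ('1', '2001-03-01', 'AD'),
        ('1', '2099-01-01', 'CN'),   -- future-dated row, must be ignored
        ('2', '2001-02-10', 'AP');
""")

# For each person, keep only the latest row effective on or before as_of.
current_rows = conn.execute(
    """
    select e.emplid, e.effdt, e.prog_status
    from prog_history e
    where e.effdt = (select max(e_ed.effdt)
                     from prog_history e_ed
                     where e_ed.emplid = e.emplid
                       and e_ed.effdt <= :as_of)
    order by e.emplid
    """,
    {"as_of": "2001-08-20"},
).fetchall()
```

A functional table can apply this selection once at load time, which is one reason the report query against it needs no sub queries.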
Comparison of Relational Approaches
[Diagram: a fact and dimension structure with star schema (Fact HR table with Appt data, Ed. data, Benefit data, Address data, Dept data and Bio data) shown beside the UMR approach (a functional HR table with the same data).]
Hybrid tables
–After further experience and discussion it was realized the functional tables could be hybridized to satisfy specific reporting needs and to provide a transitional bridge to the future.
–Definition – a hybrid functional table is one that has data derived from disparate systems (normally legacy and PeopleSoft).
–Hybrid tables can become transitional tables.
–With time, hybrid tables can become normal functional tables. (When the legacy data is no longer required, the columns cease to be filled or are removed.)
Evolution of a Hybrid Table
Original UMR Data Warehouse BIO table (table name/structure and column names):
Stuno | Gender | Vet_code | Citizenship | Fin_Aid_Int | Mar_Stat | Eth_Org | DOB
Studnt # | M/F | 1 | Jpn | Y/N | S | 6 | MM/DD/YYYY
First Degree of Hybridization, BIO table:
Stuno | Old Stuno | Gender | Vet_code | PS_Mil_St | Citizenship | Fin_Aid_Int | Mar_Stat | PS_Mar_St | Eth_Org | DOB | Sp Interests
Emplid | Studnt # | M | 1 | Vet | Jpn | | S | Single | 6 | | Band
Second Degree of Hybridization, BIO table:
Stuno | Old Stuno | Gender | PS_Gender | Vet_code | PS_Mil_St | Citizenship | PS_Citzn | Mar_Stat | PS_Mar_St | Eth_Org | PS_Eth_Org | DOB | Sp Interests
Emplid | Studnt # | M | Male | 1 | Vet | Jpn | Japan | S | Single | 6 | Caucasian | | Band
Third Degree of Hybridization, BIO table:
Stuno | Gender | PS_Mil_St | PS_Citzn | PS_Mar_St | PS_Eth_Org | DOB | Sp Interests | Ambassador | Collection_ID
Emplid | Male | Vet | Japan | Single | Caucasian | YYYY/MM/DD | Band | YES | 12345
Legend in the original slide: PeopleSoft value; translated values, PS to UMR DW; new PeopleSoft data item; not needed or found at this level.
* the BIO table draws its data from 11 PeopleSoft tables
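A minimal sketch of why a hybrid table can act as a transitional bridge, assuming SQLite via Python's standard library. The column names follow the BIO table example, lower-cased. The rows, the Emplid and student number values, and the Female/F and Married/M pairs are hypothetical. Because the legacy code and the PeopleSoft value sit side by side in one row, an existing report written against the legacy column and a new report written against the PeopleSoft column both run unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical hybrid BIO table: legacy codes next to PeopleSoft values.
    create table bio (
        stuno      text,  -- now holds the PeopleSoft Emplid
        old_stuno  text,  -- legacy student number
        gender     text,  -- legacy code, translated from PeopleSoft
        ps_gender  text,  -- PeopleSoft value
        mar_stat   text,  -- legacy code, translated from PeopleSoft
        ps_mar_st  text   -- PeopleSoft value
    );
    insert into bio values
        ('E1', 'S1', 'M', 'Male',   'S', 'Single'),
        ('E2', 'S2', 'F', 'Female', 'M', 'Married');
""")

# An existing report keeps using the legacy column and codes.
legacy_report = conn.execute(
    "select count(*) from bio where gender = 'M'"
).fetchone()[0]

# A new report uses the PeopleSoft column against the same table.
new_report = conn.execute(
    "select count(*) from bio where ps_gender = 'Male'"
).fetchone()[0]
```

Once no report depends on the legacy columns, they can stop being filled or be removed, which corresponds to the third degree of hybridization.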
Approach Comparisons
Fact and dimension:
1. Less complex downloads.
2. Better potential for documentation.
3. Better potential to electronically trace schema.
4. Tables and column names more likely to be familiar to power users.
5. More potential for system-wide assistance.
6. Better understood by trained technicians.
Functional tables:
1. Simpler presentation.
2. Faster ramp up.
3. Less impact from transactional changes.
4. Faster execution.
5. Extended user potential.
6. Continues longitudinal studies.
7. Tables and column names more likely to be understandable by casual users.
UMR Startup Tactic Approach
Users did not see the difference.
[Diagram labels: PeopleSoft System; Legacy Tables; Bridge tables; Hybrid tables; UMR DW; Existing Reports & Applications; New Reports.]
Results
Translated PeopleSoft admissions data for 3 years without interrupting production.
Integrated PeopleSoft HR with student legacy for 2 years without interrupting production.
Integrated legacy and PeopleSoft grant data for a fiscal year report.
Amended HR data without affecting applications.
Had 400 reports and all applications needing student data in production by the end of the 4th week of classes.
Integrated admissions PeopleSoft data with legacy student data until the student system went live with PeopleSoft.
Conceptual Approach to Providing Data to FAS
[Diagram labels: FAS; Legacy System; Conversion to PeopleSoft; PeopleSoft System; Translation scripts; Appropriate Data Warehouse Tables; Table names could be legacy UMDW names; Table names could be new PS names.]
Campus conversions to PeopleSoft and PeopleSoft upgrades become transparent to FAS.
Names of tables or columns are irrelevant in developing new applications/reports.
If existing applications/reports are involved, then legacy names could be used and the tables become hybrid tables and transitional bridges.
What's Next at UMR
The Next Generation UMR Data Warehouse
Current UMR Data Warehouse Process
[Diagram labels: PeopleSoft prod; PeopleSoft rpt; Columbia; Rolla; UMR server; Download scripts; Formatting scripts; Physical tables; Logical tables; UMR DW; Reports; Applications / Feeds; Extract, Load, Transform. Legend: points of failure; infrequent; inconsistent completion.]
6-7 hour process
Nighttime primarily
Data can be out of date
Near daily load failures
Numerous points of failure
Nighttime connection not as stable
PS process unpredictable
Labor intensive
Lengthy process time
Local Virtual Data Warehouse Nightly Load Concept
[Diagram labels: Vendor prod server; Vendor rpt server; Central; Local; firewall; Logical table scripts; Logical tables; Existing local scripts; Enterprise DW; Local data; Spec appl; Other data; primary source; potential sources. Legend: points of failure; infrequent; inconsistent completion.]
Hour or less process
Nightly process
Load process strictly at Central site
Allows transparent and phased transition
Very limited physical data space
Data same state as vendor rpt
Fewer points of failure
More flexible ODS
Shorter processing time
Virtual Local Data Warehouse Reporting Concept
[Diagram labels: Vendor rpt server; Central; Local; Logical tables; Reports; Applications/Feeds; Enterprise DW; Local data; Spec appl; Other data; primary data source; potential data sources. Legend: points of failure; infrequent.]
Daytime primarily
Allows transparent and phased transition
Data same state as vendor rpt
More flexible ODS
Response time could be slightly longer
Quicker response to connectivity issues
Solution Comparisons
Traditional Solution:
–Labor intensive
–Highly vulnerable to process disruptions
–Longer refresh window
–Failures not found until start of bus. day
–Protected against telecommunication issues
–Faster response time
Virtual Solution:
–Slower response time
–Vulnerable to telecommunication issues
–Fewer points of failure
–Greatly shortened refresh window
–Data at same point as vendor source
–More flexible
–Requires less space
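The trade-off behind the virtual solution can be sketched with a logical table implemented as a view. This is an illustrative assumption about what "logical tables" could look like, not the UMR design. SQLite via Python's standard library stands in for both sides: an attached in-memory database named `vendor` plays the vendor reporting server, and the table `ps_enrollment` and view `enrollment_by_term` are hypothetical.

```python
import sqlite3

# The main database stands in for the local warehouse; the attached
# "vendor" database stands in for the vendor reporting server.
conn = sqlite3.connect(":memory:")
conn.execute("attach database ':memory:' as vendor")

# Hypothetical table on the vendor reporting side.
conn.execute("create table vendor.ps_enrollment (term text, emplid text)")
conn.executemany(
    "insert into vendor.ps_enrollment values (?, ?)",
    [("FS2001", "1"), ("FS2001", "2")],
)

# Logical table: a view that stores no rows of its own. In SQLite, only a
# TEMP view may reference a table in another attached database.
conn.execute("""
    create temp view enrollment_by_term as
    select term, count(*) as students
    from vendor.ps_enrollment
    group by term
""")

before = conn.execute("select students from enrollment_by_term").fetchone()[0]

# A change on the vendor side is visible on the next read, with no reload.
conn.execute("insert into vendor.ps_enrollment values ('FS2001', '3')")
after = conn.execute("select students from enrollment_by_term").fetchone()[0]
```

The view needs no local storage and always matches the vendor data, but every read depends on the connection to the source, which mirrors the slower response time and telecommunication exposure listed for the virtual solution.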
The Transitional Bridge and legacy removal
[Diagram labels: Legacy mainframe system; PeopleSoft – Std Adm/ver 7.6, client/server; UMR Data Warehouse; ~3,000 reports; 23 applications.]
Converted from IMS legacy system Jan, 2004
ALL required reports and applications operating by second day of classes
400 reports and ALL applications functioning by end of fourth week of classes
The Transitional Bridge and version upgrades
[Diagram labels: PeopleSoft – Std Adm/ver 7.6, client/server; UMR Data Warehouse; ~3,000 reports; 23 applications.]
Upgraded to ver 8.0 Jul, 2004
Upgraded to ver 8.9 Jul, 2005
No impact observed with ANY report or application for either upgrade
Contact
Jerry Hammons, Supervisor, Data Warehousing/Reporting, Information Technology, University of Missouri – Rolla. E-mail: Jerryh@umr.edu
Art Brooks, Adjunct Professor/IT Appl Dir (retired), University of Missouri – Rolla. E-mail: firstname.lastname@example.org
SORRY. We're out of handouts. If you would like a copy of our handout, please leave your business card with your email address and we will send you a copy.