Presentation is loading. Please wait.

Presentation is loading. Please wait.

FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 1 Cornell University’s.

Similar presentations


Presentation on theme: "FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 1 Cornell University’s."— Presentation transcript:

1 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 1 Cornell University’s Data Warehousing Infrastructure Presented by: Jeff Christen Data Warehousing DBA – Team Lead jrc42@cornell.edu

2 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 2 Jeff Christen DBA Lead - Data Warehousing –Ten Years experience as a DBA (Oracle & Informix) –Last Four Years Focused on Data Warehouse DBA Support Team Interface to other IT groups Team Interface to University

3 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 3 Responsibilities of a DW DBA Production Support – 24x7 (Database & Load) Performance Monitoring & Tuning Database Backup & Recovery Security Implementation Object & Code Migrations (Dev / Test / Prod) Infrastructure Development & Maintenance Enforcement of University Policies & Practices Integration of Models & ETL into Warehouse Environment Assist in Data Modeling & ETL Development

4 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 4 Cornell’s Warehousing Challenges Twelve production data marts Twenty-five unique loads Variety of sources(mainframe, PeopleSoft) Varied load frequencies(daily, weekly, monthly) Varied load requirements(full, partial, append) Rapidly shrinking load windows Multiple server & O.S. environment

5 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 5 DMTools as a Solution DMTools is a Data Warehousing infrastructure management tool developed and in use by Cornell University. Allows high data availability- 24x7 access Repository driven Manages loads Toolbox written in Oracle PL/SQL (O.S. independent) GUI console to manage load related metadata

6 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 6 High Data Availability Table Renaming Process Two copies of each data mart table are maintained A current table and a backup or work table Data mart users only “see” the current table New data is loaded into “work” table Rename tables –rename person to person_b –rename person_w to person

7 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 7 High Data Availability Table Renaming Benefits Instant access to new data Rename does not interrupt users “Backup” table for previous load’s data Ability to instantly roll back load

8 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 8 High Data Availability Table Renaming Example

9 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 9 High Data Availability Partition Exchange Similar to table rename May exchange table partitions without disrupting users Maintain full partitioned copy of table & small non- partitioned copy Data loaded into non-partitioned table, then exchanged with appropriate partition

10 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 10 High Data Availability Partition Exchange Example

11 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 11 High Data Availability Additional Considerations Move security between table copies Remove / add policy to table during exchange Constraint management Manage index names between table copies (A or B suffix to guarantee unique names) Step verification & error handling

12 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 12 Production AIX Server Transformation & Staging Current Warehousing Environment Infrastructure Production AIX Server Consolidated DataMarts Production AIX Server DMTools Repository Replication Database (PeopleSoft) Production Windows Server H/R & Payroll Production Windows Server Contributor Relations Production Windows Server Student Administration Logging & Metadata Load (pull) from Replication Database Load (pull) from Replication Database Load (push) from Staging Database

13 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 13 DMTools Repository Data Model DM_RUN_DETAILS DM_INDEXES DM_NAMES DM_TAB_PART_COUNTS DM_GRANTS DM_ROW_COUNTS DM_RUNS DM_CONSTRAINTS DM_TABLES DM_CONS_COLUMNS DM_POLICIES DM_IND_PARTITIONS DM_IND_COLUMNS DM_ADMIN_USERS DM_EMAIL_CONTENT DM_EMAIL_SUBSCRIPTIONS

14 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 14 DMTools Repository Data Model (System Catalog)

15 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 15 DMTools Repository Data Model (Logging Metrics)

16 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 16 DMTools Repository Resides in a centralized database Contains metadata needed for load process Contains logging data related to load process & load metrics Holds notification information Scalable(may use for multiple data marts)

17 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 17 Load Logging High level logging –Start & end times for pre load, load, post load –Load status( completed, running, failed ) –Snapshot date –Estimated completion time –Record counts(by table & partition) Low level logging –Start /end time & description for every action –Captures Oracle errors & SQL executing

18 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 18 Load Status Monitoring Predefined views –Status views(high level load info for given DM) –Detail views(used by DBA for trouble shooting) –Load metrics views (various object counts related to loads) –Flexibility to build additional views on logging tables as needed

19 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 19 DMTools Toolbox Set of Oracle stored procedures used to perform common load tasks Resides on each database in the DMTools environment Procedures bundled in packages related to their function –dmrunlog –dmidxpack …(partial list only)…

20 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 20 Main Procedure Unique to each data mart Not owned by DMTools(data mart schema) Templates for easy starting point Modify template to accommodate Preload, load, post load

21 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 21 DMTools Console GUI interface to the repository Metadata maintenance –Adding/removing tables, indexes, grants, etc. –Migration management Email/pager subscription & notification maintenance Warnings & reports –Detect potential load problems prior to load

22 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 22 DMTools Console - GUI Interface (main)

23 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 23 DMTools Console - GUI Interface (add Table)

24 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 24 DMTools Advantages High data availability (24x7 access) Instant rollback to pre-load state Centralized logging and notification Metadata driven O.S. independent (Oracle RDBMS only) Works with various ETL tools Simple setup and maintenance Very scaleable

25 FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 25 Cornell University’s Data Warehousing Infrastructure INTEREST ? // QUESTIONS ? Presented by: Jeff Christen Data Warehousing DBA – Team Lead jrc42@cornell.edu


Download ppt "FORUM II Best Practices in Data Warehousing in Higher Education: A Framework for Higher Education Reporting April 18, 2005 Slide 1 Cornell University’s."

Similar presentations


Ads by Google