A Warehouse and Reporting Architecture for Healthcare using Oracle Technologies [CON3926] Nicholas Collins Clinical Analytics and Informatics 24 September.

Presentation transcript:

A Warehouse and Reporting Architecture for Healthcare using Oracle Technologies [CON3926] Nicholas Collins Clinical Analytics and Informatics 24 September 2013

Topics About MD Anderson (and me!) The Future of Cancer Treatment and Research Oracle at MD Anderson Our Warehouse and Reporting Architecture Implementation Conclusions 2

1 About MD Anderson 3

About MD Anderson Non-profit cancer hospital and research institution, founded in 1941 as part of The University of Texas System Named after Monroe Dunaway Anderson (a banker and cotton trader, not an MD) “Making Cancer History” – our mission is to eradicate cancer Consistently ranked as the #1 hospital for cancer care 4

About MD Anderson 5

About MD Anderson Over 19,000 employees, majority in the Houston area Occupying over 20 buildings in the Texas Medical Center The Texas Medical Center has over 50 member institutions, together over 100,000 employees 6

About Me Began working at MD Anderson while an undergraduate at Rice University (across the street) Currently in the Clinical Analytics and Informatics (CAI) Department, but before that was in HR Information Management (HRIM), working with PeopleSoft and other HR apps While in HRIM, built a custom HR data warehouse and reporting system using a combination of Microsoft and Oracle technologies After hours, a professional stage actor/director in Houston 7

2 The Future of Cancer Treatment and Research 8

MD Anderson Moon Shots Program “The Time is Now. Together we will end cancer.” Target six cancers: Breast/Ovarian, Leukemia (AML/MDS & CLL), Melanoma, Lung, Prostate Clear focus on the concept that the answer to curing cancer lies in both clinical and genomic data 9

MD Anderson Moon Shots Program http://www.cancermoonshots.org 10

It’s in the Data! 11

MD Anderson Moon Shots Platforms Massive Data Analytics – An infrastructure for complex analytics and clinical decision support using integrated patient information, including clinical and research data Big Data – An Information Technology infrastructure/environment that enables centralization, integration and secured access of patient and research data and analytical results 12

It’s in the Genes! 13

MD Anderson Moon Shots Platforms Clinical Genomics – Clinical gene sequencing infrastructure, including centralized bio-specimen repository and processing Omics – Bioinformatics – A high-throughput infrastructure for generation and standardization of large-scale “omic” data, including genomics, proteomics and immune profiling Adaptive Learning in Genomic Medicine – A framework for bringing clinical medicine and genomic research together to enable rapid learning to improve patient management using Clinical Genomics, Omics-Bioinformatics and Massive Data Analytics platforms within the Big Data environment 14

Genomics in the News 15

3 Oracle at MD Anderson 16

Oracle Health Sciences Products at MD Anderson Oracle Healthcare Data Warehouse Foundation (HDWF) Oracle Healthcare Analytics Data Integration (OHADI) Oracle TRC (Translational Research Center) Cohort Explorer Oracle TRC Omics Data Bank (ODB) 17

Oracle Technology at MD Anderson Oracle Database 11gR2 Oracle Exadata (x3) Oracle Business Intelligence (OBIEE) Oracle GoldenGate* *Oracle GoldenGate was used to demonstrate replication capabilities in a significant POC, but has not been purchased or put into production. Informatica is commonly used at MD Anderson for data integration; ODI is not currently in use at the institution. 18

Oracle Healthcare Data Warehouse Foundation (HDWF) HDM HDI 19

Oracle Healthcare Analytics Data Integration (OHADI) HDM HDI Integration code that maps from the interface tables (HDI) to the warehouse tables (HDM) Available as either Informatica or ODI mappings 20

Oracle Cohort Explorer CDM 21

Oracle Cohort Explorer 22

Oracle Omics Data Bank (ODB) 23

Review of Oracle Health Sciences Products Cohort Explorer HDI HDM CDM OHADI ODB 25

4 Our Warehouse and Reporting Architecture 26

The MD Anderson FIRE Program FIRE - Federated Institutional Reporting Environment A program-level initiative, with many projects and products involved, to provide a unified BI/Reporting solution for all of MD Anderson Managed by the Clinical Analytics and Informatics (CAI) Department, part of Oracle SDP (Strategic Development Partner) Program 27

FIRE Program Team Structure (Early Proposal) 28

FIRE Program Team Structure (Current) 29

First FIRE Release – Pharmacy Dashboard Custom-built OBIEE Dashboard for orders data, pulling data from HDM, with HDI populated from the GE Centricity source Dimensional model with orders as the core fact Included a smaller pre-release of HR data for staff details and testing the FIRE Architecture Purchased HDWF and OHADI in May 2012, aiming for a fall go-live for our first FIRE release 30

Pharmacy Dashboard (OBIEE) 31

Unifying the Solution Oracle Health Sciences GBU provides products that are a part of an overall solution, the rest is organization specific We needed a way to effectively get data into HDWF and deliver it to custom Data Marts Having a pre-built warehouse model helps with speed of delivery, but not necessarily source system mapping and integration 32

Architectural Concept Bring all the data processing together on a single Oracle instance for performance benefits of local movement and transformation Abstract across all commonalities and patterns to the largest extent possible, avoiding needless one-off solutions, use code generation and automation Ideal candidate for a later “forklift” to Exadata 33

The FIRE Architecture 34

Source Systems at MD Anderson Currently no centralized EHR solution in place, a best-of-breed model with many disparate source systems Data currently brought together for patient-care clinical use in a single UI by a SOA-based custom .NET app called ClinicStation In July 2013, the institution announced its intention to migrate to Epic’s EHR solution 35

The FIRE Architecture HDI/HDM SR SI UI UD 36

SR (Staging Replica) Layer Stores replicated data from source systems in the consolidated warehouse environment, provides buffer from sources and their technology, also allows custom indexing and partitioning if needed Replication can be accomplished in a variety of ways, using a “bag of dirty tricks” to get the data into relational form in Oracle. GoldenGate proposed as standard tool to replicate in from relational sources, transparent gateway and Informatica/ODI are other options Replication done at table level for consistency and ease of change 37

SI (Staging Interface) Layer Pull data directly from the corresponding SR schema (e.g. the SI_CENTRICITY schema pulls data from the SR_CENTRICITY schema) Contains views that match the target HDI tables, one-to-one, same column names and data types Accomplishes selection of the appropriate data, and any necessary pre-transformation In complex cases, can have materialized pre-processing data in tables or materialized views, in practice this meets the 80/20 rule 38
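
The SI pattern described on this slide can be sketched in a few lines. This is only an illustration, not the FIRE implementation: it uses Python with an in-memory SQLite database in place of Oracle, simulates schemas with name prefixes, and every table, view, and column name (sr_centricity_orders, si_centricity_hdi_order, hdi_order, and so on) is hypothetical rather than taken from HDWF.

```python
import sqlite3


def build_si_demo():
    """Load a hypothetical HDI-style table through an SI-style view."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # SR layer: replicated source table, kept in the source system's own shape.
    cur.execute(
        "CREATE TABLE sr_centricity_orders "
        "(ord_no INTEGER, drug TEXT, ord_ts TEXT, status TEXT)"
    )
    cur.executemany(
        "INSERT INTO sr_centricity_orders VALUES (?, ?, ?, ?)",
        [
            (1, "cisplatin", "2012-10-01", "A"),
            (2, "ondansetron", "2012-10-02", "X"),  # filtered out by the view
        ],
    )

    # Interface table standing in for an HDI table (hypothetical columns).
    cur.execute(
        "CREATE TABLE hdi_order (order_id TEXT, item_name TEXT, order_dt TEXT)"
    )

    # SI layer: a view with the same column names and types as the target.
    # Row selection and light pre-transformation happen here.
    cur.execute(
        """
        CREATE VIEW si_centricity_hdi_order AS
        SELECT CAST(ord_no AS TEXT) AS order_id,
               drug                 AS item_name,
               ord_ts               AS order_dt
        FROM sr_centricity_orders
        WHERE status = 'A'
        """
    )

    # Because the view matches the target one-to-one, the movement step is a
    # generic insert-select that needs no per-table logic.
    cur.execute("INSERT INTO hdi_order SELECT * FROM si_centricity_hdi_order")
    conn.commit()
    return cur.execute(
        "SELECT order_id, item_name, order_dt FROM hdi_order"
    ).fetchall()
```

The point of the design is that all source-specific logic lives in the view, so the load step itself can be identical for every table.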

HDI Layer Oracle-defined HDWF interface tables A “landing zone” schema, can be used as a Persistent Staging Area (PSA) Conceptually, where you place unrefined and unvalidated data to be processed by OHADI before it goes into the HDM (warehouse tables) Tables are designed to be insert-only (source-change dated) 39

HDM Layer Oracle-defined HDWF warehouse tables HDM stands for Healthcare Data Model Keys and Referential Integrity (RI) in place for this layer, but RI is disabled by default Conceptually, where all your data is persistently stored, though reloads from HDI are possible Can be configured for effective-dating or only current state 40

UI (User Interface) Layer Similar in concept to the SI Layer, but pulling data from HDM for use in the UD layer Contains views that match the target UD tables, one-to-one, same column names and data types Accomplishes selection of the appropriate data, and any necessary pre-transformation (like SI) In complex cases, can have materialized pre-processing data in tables or materialized views, in practice this meets the 80/20 rule (like SI) 41

UD (User Data Mart) Layer Pull data directly from the corresponding UI schema (e.g. the UD_RX schema pulls data from the views in the UI_RX schema) Contains the user-layer target tables that are used in the dimensional (star schema) models Can be used directly by OBIEE or schemas can be replicated to a separate user database for isolated/off-loaded dashboard processing Data-wise, it’s the end result of the warehouse pipeline 42

Data Movement 43

Data Movement (Planned) There was a desire from our integration team to use Informatica for ETL because of experience base on the team, not much PL/SQL or ODI knowledge Architecture proposed use of abstracted code generation via Informatica APIs, jointly used with the push-down optimization option for all non-OHADI internal data movement (i.e. SI to HDI, UI to UD) 44

Data Movement (Actual) Our integration team initially indicated that code generation with Informatica (or other tools) could not be done on account of complexity, and that the push-down optimization option was too expensive To demonstrate the feasibility, I programmed a PL/SQL-based version of the code generation as proposed in the FIRE Architecture documentation; we used this code in the first release 45

Data Movement Code Generation 46

PL/SQL Procedures for Code Generation Procedure iv_tv_ip_gen(name_of_sv_view) for SI layer, generates objects for change detection and movement from SR to HDI Procedure iv_uv_dv_ip_gen(name_of_sv_view) for UI layer, generates objects for change detection and movement from HDM to UD All that is needed for generation is the SV view, which conforms to the HDI-based structure, data in certain standard HDI columns determine action A benefit of the generated views is the ability to see what will happen during the next run, without actually running anything 47
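
The general idea of generating change-detection and movement objects from a single view can be sketched as follows. This is not the PL/SQL shown in the talk, whose internals are not given in the slides; it is a hedged Python illustration running against in-memory SQLite. The function names, the `_delta` suffix, and all table and column names are assumptions, and the null-safe `IS` comparison is SQLite syntax that would need a different form in Oracle.

```python
import sqlite3


def generate_delta_view(sv_view, target_table, columns):
    """Return DDL for a view of rows in sv_view not yet present in the target.

    A row counts as pending when no target row matches it on every column,
    which suits an insert-only target where a changed row is a new row.
    """
    select_cols = ", ".join(f"s.{c}" for c in columns)
    # "IS" is SQLite's null-safe equality operator.
    match = " AND ".join(f"t.{c} IS s.{c}" for c in columns)
    return (
        f"CREATE VIEW {sv_view}_delta AS "
        f"SELECT {select_cols} FROM {sv_view} s "
        f"WHERE NOT EXISTS (SELECT 1 FROM {target_table} t WHERE {match})"
    )


def generate_move_sql(sv_view, target_table, columns):
    """Return the statement that moves pending rows into the target."""
    col_list = ", ".join(columns)
    return (
        f"INSERT INTO {target_table} ({col_list}) "
        f"SELECT {col_list} FROM {sv_view}_delta"
    )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sr_orders (order_id TEXT, item_name TEXT)")
    conn.execute("CREATE TABLE hdi_order (order_id TEXT, item_name TEXT)")
    conn.execute(
        "CREATE VIEW sv_order AS SELECT order_id, item_name FROM sr_orders"
    )
    cols = ["order_id", "item_name"]

    # Generate the objects once from the view name and its column list.
    conn.execute(generate_delta_view("sv_order", "hdi_order", cols))
    move_sql = generate_move_sql("sv_order", "hdi_order", cols)

    conn.executemany(
        "INSERT INTO sr_orders VALUES (?, ?)",
        [("1", "cisplatin"), ("2", "ondansetron")],
    )

    # Querying the delta view previews the next run without moving anything.
    preview = conn.execute(
        "SELECT * FROM sv_order_delta ORDER BY order_id"
    ).fetchall()
    conn.execute(move_sql)
    remaining = conn.execute("SELECT * FROM sv_order_delta").fetchall()
    return preview, remaining
```

Calling `demo()` returns the rows that were pending before the move and the (empty) list pending afterwards, which mirrors the preview benefit mentioned on the slide.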

Results/Next Steps Had approximately three months to implement, process was difficult, but in the end everything worked and we went to production with the first FIRE release in November 2012 OHADI was slower than expected but got the job done, Informatica version used, but might be faster with ODI? Integration team wanted a second chance to get code generation going for Informatica, and wanted more Informatica and less SQL and PL/SQL Committee voted to try Informatica alternatives for the next release 48

Second FIRE Release – Moon Shot Analytics and Pharmacy Dashboard In this release, there were four total projects under the FIRE Program: (1) Second Pharmacy Release, (2) Cohort Explorer and ODB (Moon Shot Analytics), (3) Exadata Implementation for Cohort Explorer and ODB, and (4) OBIEE Infrastructure Upgrade. Beginning to use Omics Data Bank, data volumes required Exadata. License restrictions and no budget yet to put HDWF on Exadata 49

Second Release Cohort Explorer CDM ODB 50

Architectural Changes 51

Architectural Changes Informatica Code Generation using Java to generate Informatica objects, so not using the PL/SQL code generation with SI and UI views for this release Integration team wanted an instantiated SI Layer and UI Layer for Informatica-based code generation instead of views in the SI and UI Layer 52

Informatica Code Generation with Java 53

Results Successfully implemented all projects within six months, Cohort Explorer and ODB now live, along with the new Pharmacy Dashboard Informatica code generation worked successfully, but performance issues surfaced, particularly with the new SI and UI instantiated data The old Pharmacy code (views) was easy to change/update and the integration team did not want to convert old code to the new Informatica-based methodology, hence we have a hybrid model in place currently Some of the Informatica code had to be abandoned close to go-live, replaced by quickly created SQL in views (materialized views) 54

More Results Had to tinker a little bit with the Exadata install of ODB and CDM, ended up dropping all indexes and implementing some materialized views for improved performance, now being incorporated into the tool via our SDP partnership We noticed that Oracle was now using a view-based methodology for the CDM ETL, Informatica mappings use views off of HDM, differs from OHADI, more similar to our original architecture Indexing, partitioning, and SQL tuning were especially necessary in working with our conventional HDWF environment, Exadata can help in the future Aggregation was important to getting OBIEE to perform well with ROLAP 55
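
To illustrate the aggregation point, the sketch below pre-computes a monthly summary table so that a dashboard query reads a few summary rows instead of scanning detail rows. It is an assumption-laden example in Python with in-memory SQLite, not the FIRE data mart: the order_fact and order_month_agg tables and their columns are invented for illustration.

```python
import sqlite3


def build_aggregate():
    """Build a monthly aggregate table from a small detail fact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_fact (order_date TEXT, drug TEXT, qty INTEGER)"
    )
    conn.executemany(
        "INSERT INTO order_fact VALUES (?, ?, ?)",
        [
            ("2013-01-05", "cisplatin", 2),
            ("2013-01-20", "cisplatin", 3),
            ("2013-02-02", "cisplatin", 1),
        ],
    )
    # One row per month and drug; a ROLAP tool can be pointed at this table
    # for month-level questions and at order_fact only for drill-down.
    conn.execute(
        """
        CREATE TABLE order_month_agg AS
        SELECT substr(order_date, 1, 7) AS order_month,
               drug,
               COUNT(*)                 AS order_count,
               SUM(qty)                 AS total_qty
        FROM order_fact
        GROUP BY substr(order_date, 1, 7), drug
        """
    )
    return conn.execute(
        "SELECT order_month, drug, order_count, total_qty "
        "FROM order_month_agg ORDER BY order_month"
    ).fetchall()
```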

Exadata Implementation for Cohort Explorer and Omics Data Bank (Moon Shot Analytics) CDM ODB 56

Exadata Architecture Database Clients Administrator Database Servers Infiniband Switch Storage Servers 57

Rack Configurations (x3-2) Full Rack – 8 DB Servers, 14 Storage Servers Half Rack – 4 DB Servers, 7 Storage Servers Quarter Rack – 2 DB Servers, 3 Storage Servers Eighth Rack – 2 DB Servers*, 3 Storage Servers** * The eighth rack’s DB servers each have one processor disabled via software. ** The eighth rack’s storage servers have half the drives and half the flash cards. 58

Exadata x3-2 Capabilities 59

Exadata Database Features Smart Scans (Query Offloading) Storage Indexes Hybrid Columnar Compression Exadata Smart Flash Cache These are software-based “Exadata-only” features. 60
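
As a rough intuition for the storage index feature listed above, the sketch below keeps a minimum and maximum value per region of data and skips any region whose range cannot contain the value being searched for. It is a conceptual toy in Python, not Oracle's implementation; the function names and the tiny region size are assumptions chosen for readability.

```python
def build_region_index(values, region_size):
    """Record (start offset, min, max) for each fixed-size region of values."""
    index = []
    for start in range(0, len(values), region_size):
        chunk = values[start:start + region_size]
        index.append((start, min(chunk), max(chunk)))
    return index


def scan_equal(values, index, region_size, target):
    """Find rows equal to target, reading only regions that may contain it."""
    hits = []
    regions_read = 0
    for start, low, high in index:
        if target < low or target > high:
            continue  # the region cannot contain the value, so it is skipped
        regions_read += 1
        hits.extend(
            v for v in values[start:start + region_size] if v == target
        )
    return hits, regions_read
```

For example, with twelve values split into three regions of four, a search for a value that falls inside only one region's range reads that one region and skips the other two.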

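Standard Database I/O vs. Exadata I/O (diagram): in the standard path, the client's select goes to the database server, which sends an I/O request to conventional storage; in the Exadata path, the client's select goes to the Exadata DB server, which sends a Smart Scan request to the Exadata storage servers 61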

CDM/ODB Implementation - Exadata Equipment Purchased Development/Test Environment Production Environment 62

Exadata Pre-installation Activities Oracle Exadata Readiness Checklist Prepare Network and Power Connections Shipping needs to align with Sun technician arrival SFP Modules – be sure to order if needed Training from Enkitec 63

MD Anderson CAI “War Room” Exadata Implementation Team January 2013 64 Photo shown courtesy of Mr. Robert Jeffries, Project Manager

Exadata Validation Scripts Scripts to verify functionality Eighth Rack scripts expected Quarter Rack Most documentation still for x2 equipment, most likely updated by now 66

Exadata Storage Allocation Crucial to plan storage allocation in advance Recommended DATA/RECO disk groups: OLTP – 60%/40%, DW 80%/20%, we did a 70%/30% split DB instances share these disk groups per rack Implemented DBFS (Oracle Database File System) for use with the ODB Loaders, omics data files are large, particularly genomic reference data 67
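
The DATA/RECO percentages on this slide translate into capacity with simple arithmetic, sketched below in Python. The 100 TB figure in the usage note is a made-up round number for illustration, not the capacity of the racks described in the talk, and the function name is hypothetical.

```python
def split_disk_groups(usable_tb, data_pct):
    """Split usable capacity into (DATA, RECO) sizes for a given DATA percentage."""
    if not 0 <= data_pct <= 100:
        raise ValueError("data_pct must be between 0 and 100")
    data_tb = usable_tb * data_pct / 100
    return data_tb, usable_tb - data_tb
```

With a hypothetical 100 TB of usable space, the 70%/30% split chosen here gives 70 TB for DATA and 30 TB for RECO, which sits between the recommended OLTP split (60 TB and 40 TB) and the recommended DW split (80 TB and 20 TB).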

Exadata Instance Creation Performed after storage allocation We installed two separate instances on production rack, four separate instances on dev/test rack All instances are clustered via RAC, ODB and CDM are in the same database instance Instance creation went fairly quickly for us, but would not have been the case for a consolidation project 68

Additional Components Backup – IBM Tivoli (non-Oracle) Anti-virus Software (non-Oracle) Oracle Enterprise Manager 12c Oracle Platinum Gateway 69

Oracle Enterprise Manager 12c 70

Exadata Implementation Lessons Learned Single points of contact are ideal (Oracle, Enkitec, Internal Departments) Early engagement project planning important “War Room” concept very effective Some documentation for new products hard to find (i.e. x3, OEM 12c) Enkitec training was fantastic, but probably should have happened earlier than the delivery, we were on an accelerated implementation schedule 72

GoldenGate POC at MD Anderson – July 2012 Assess GoldenGate’s viability toward MD Anderson Use Cases requiring Heterogeneous Data Replication, Flexible Data Deployment and Continuous Availability Determine ease-of-configuration, deployment, manageability and reliability of GoldenGate as implemented within the use cases “Should GoldenGate handle these use cases convincingly, the POC will be considered as successful” 73

GoldenGate POC Use Cases Use Case #1 - Heterogeneous Data Replication from multiple Source Databases, namely, PICIS (CareSuite), OR Manager (Security, Surgery, Interface DBs), and Sybase (RADDATA) to an Oracle 11g R2 Target Database (staging area for HCM Data Warehouse) Use Case #2 - Heterogeneous Data Replication from multiple Source Databases, namely PICIS (CareSuite), OR Manager (Security, Surgery, Interface DBs), and Sybase (RADDATA) to a SQL Server 2008 R2 Target Database (Operations Reporting) Use Case #3 - Flexible Data Deployment topology to merge activity data from multiple Source Tables (Sybase) to a Single SQL Server Target Table (Operations Reporting Database) Use Case #4 - Architecture capable of handling schema differences across platforms (Audit Reporting Columns added at Oracle Target Database) 74
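
Use Cases #3 and #4 can be pictured independently of GoldenGate itself: rows from several source tables land in one target table, and the target carries extra audit columns that the sources do not have. The sketch below shows only that data shape, in Python with in-memory SQLite; it contains no GoldenGate configuration, and the table names, the source_table column, and the audit_loaded_at column are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def merge_sources(loaded_at=None):
    """Merge two source tables into one target table with audit columns."""
    loaded_at = loaded_at or datetime.now(timezone.utc).isoformat()
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activity_a (id INTEGER, event TEXT)")
    conn.execute("CREATE TABLE activity_b (id INTEGER, event TEXT)")
    conn.execute("INSERT INTO activity_a VALUES (1, 'scan started')")
    conn.execute("INSERT INTO activity_b VALUES (1, 'scan finished')")

    # The target has two columns the sources lack: where the row came from
    # and when it was loaded.
    conn.execute(
        "CREATE TABLE activity_target "
        "(source_table TEXT, id INTEGER, event TEXT, audit_loaded_at TEXT)"
    )
    for source in ("activity_a", "activity_b"):
        conn.execute(
            f"INSERT INTO activity_target SELECT ?, id, event, ? FROM {source}",
            (source, loaded_at),
        )
    return conn.execute(
        "SELECT source_table, id, event, audit_loaded_at "
        "FROM activity_target ORDER BY source_table"
    ).fetchall()
```

Keeping the source table name on each merged row avoids key collisions, since both sources here reuse id 1.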

GoldenGate POC Architecture 75

GoldenGate POC Results 76

5 Implementation Conclusions 77

Implementation Conclusions The HDWF, OHADI, Cohort Explorer, and Omics Data Bank products helped us deliver a lot of functionality very quickly, in only one year – our president was very impressed with Cohort Explorer when coupled with ODB, calling it a “game changer” Having a warehouse model in place helps you avoid a lot of the headaches and time that could be lost to developing your own intermediary models and related enforcement of standards OHADI gives you a good starting point for the HDI to HDM ETL so you can effectively generate your RI relationships and manage exceptions Taking some time to think about our overall architecture and try out some new techniques paid off in the long run, still important to deliver functionality of course 78

Implementation Conclusions The HDM tables cover a large swath of clinical concepts, but sometimes things might not fit perfectly, working with Oracle as an SDP has its advantages, as does understanding how to customize the model OHADI is supposed to be the step in the process where data is cleansed and validated, but a lot of customization would be required to do so thoroughly, Oracle will be working to improve this portion of the product, until then you can cleanse pre-HDI if OOTB functionality is not there yet The Informatica version of OHADI had some performance issues on our environments, but I would predict the ODI (Oracle Data Integrator) version would run faster Having a good vocabulary/terminology approach in place will help tremendously with your implementation of code systems in the model, Oracle is beginning to integrate OHADI with HLI 79

Implementation Conclusions The Cohort Explorer application aims to deliver the exact functionality that will be needed in cancer research – merging the clinical and genomic data, but the application still needs some more maturity and better use of HDWF structures, we do not want the tail to wag the dog Omics Data Bank helps to centralize a variety of genomics data using set standards from the academic and scientific community, this concept seems to work well in practice The complexity of the queries on ODB and CDM was sometimes challenging from a performance perspective – it is clear that Exadata helps tremendously in this area and we would not have been nearly as successful without it Be sure you have a knowledgeable SME resource for these Oracle technologies to assist you through implementation, they would be difficult to implement on your own 80

MD Anderson Future Steps Get HDWF on Exadata, already ordered two new eighth racks Upgrade Cohort Explorer and ODB to newer versions, when released Focus on architectural refinements/changes for the FIRE Architecture Need to get better code system infrastructure in place, will take work, but worth it in the end – it is unclear how use of Epic/HLI will affect us Beginning NLP pipeline for unstructured data, currently prototyping using IBM ICA platform, but possibly investigate use of Big Data Appliance and/or Hadoop? Semantic technologies? Prepare for Epic as source, hopefully acquire GoldenGate eventually too 81

Collins Axioms – Parting Maxims Abstraction over commonalities is key in a world driven increasingly by “Big Data” High-end performance is crucial for BI data delivery, every bit counts Always mind the “Mythical Man-Month,” the more knowledge and capability a single individual has, the better - streamline your teams for effective agility and delivery, understand the business, beware of “design by committee” You will never get it right the first time, expect changes and have flexibility to adapt quickly Virtualization, in-memory database objects, and real-time data are gaining momentum in the realm of BI and warehousing, learn to embrace them 82

Questions? ncollins@mdanderson.org www.mdanderson.org 83