2005 Epocrates, Inc. All rights reserved.

Integrating XML with legacy relational data for publishing on handheld devices
David A. Lee, Senior member of the technical staff

Slide | 2
Importance of correct Clinical Information

Slide | 3
Introduction and Agenda
- Introduction
- Background: “The Problem Space”
- False Starts
- “Common Object Model”
- Final Design
- Lessons Learned
- Conclusion

Slide | 4
Introduction
- Who is Epocrates?
- Common Terminology
- Core Application

Slide | 5
Introduction: Who is Epocrates?
Epocrates is the industry leader in providing clinical references on handheld devices.
- 475,000 active subscribers
- Subscription-based clinical publishing

Slide | 6
Introduction: Common Terminology
- PDA: "Personal Digital Assistant".
- Monograph: information describing a single drug, disease, lab test, preparation or other clinical entity.
- Syncing: the process of synchronizing a server's database with a PDA.
- PDB: "Palm Database". A very simple variable-length record format with a single 16-bit key index.

Slide | 7
Introduction: Core Application
“Essentials” Handheld Clinical Reference

Slide | 8
Background: “The Problem Space”
- XML vs “Legacy” data
- Characteristics of the application
- Characteristics of "legacy" data
- Characteristics of "new" XML data
- Publishing Workflow
- Workflow Requirements

Slide | 9
Background: XML vs “Legacy” data
[Figure: XML compared with “Legacy” data]

Slide | 10
Background: Characteristics of the Application
- Runs on handheld devices
- Limited memory capability (8MB typical)
- Simple database
- Small display
- Synchronization speed critical
- Linking of related content

Slide | 11
Background: Characteristics of “legacy” data
- Stored in Oracle SQL database
- Highly structured and referential
- Hard to change schema
- Data constantly changing (manual editing)
- Specifically designed schemas and content for presentation on PDAs
- Difficult to change workflow, representation or tools
- Part of a complex workflow for publishing data to large subscriber base

Slide | 12
Background: Characteristics of “new” XML data
- One large XML Document containing many “monographs”
- MB common.
- Periodic updates with unknown amount of change
- Schema likely to change unexpectedly
- No control over content
- Referential data within and across monographs
- Both structural and “markup” type elements

Slide | 13
Background: Publishing workflow

Slide | 14
Background: Workflow Requirements
- Integrates into current workflow with minimum changes to existing process and data structures
- Reliable change detection
- Support for deferred detection of dependencies
- Accurately manage changes when data is updated
- Resilient to XML schema changes
- Extensible design

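One way to picture "reliable change detection" is to fingerprint each monograph and compare fingerprints between two deliveries of the XML. The sketch below is only an illustration, not the method used in the talk: the element name `monograph`, the `id` attribute, and the sample documents are assumptions.

```python
import hashlib
import xml.etree.ElementTree as ET


def fingerprint(monograph: ET.Element) -> str:
    """Return a SHA-256 digest of the monograph's serialized XML."""
    return hashlib.sha256(ET.tostring(monograph, encoding="utf-8")).hexdigest()


def detect_changes(old_xml: str, new_xml: str) -> dict:
    """Classify monographs as added, removed, or changed between two deliveries."""
    def index(xml_text: str) -> dict:
        root = ET.fromstring(xml_text)
        # Assumed layout: <monograph id="..."> elements anywhere under the root.
        return {m.get("id"): fingerprint(m) for m in root.iter("monograph")}

    old, new = index(old_xml), index(new_xml)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical sample deliveries.
OLD = ('<monographs>'
       '<monograph id="a"><dose>5 mg</dose></monograph>'
       '<monograph id="b"><dose>1 mg</dose></monograph>'
       '</monographs>')
NEW = ('<monographs>'
       '<monograph id="a"><dose>10 mg</dose></monograph>'
       '<monograph id="c"><dose>2 mg</dose></monograph>'
       '</monographs>')

result = detect_changes(OLD, NEW)
```

Because the digest is taken over the serialized text, whitespace-only edits also count as changes; a production version would probably normalize the fragment first.
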
Slide | 15
False Starts
- One Big BLOB
- Full Normalization
- One BLOB per monograph
- XML Database

Slide | 16
False Starts: One Big BLOB
Store entire XML as a single “BLOB”.
Pros
- Very simple and easy
Cons
- Deferred almost all processing to sync server
- Impossible to detect changes
- Solves no significant problem over using the filesystem
- No relational representation
- Difficult to search with SQL
- No structure or indexing at SQL level

Slide | 17
False Starts: Full Normalization
Fully normalize XML schema into separate DB tables for every element.
Pros
- “Ideal” relational representation, referential integrity
- Fine granularity of modification detection
Cons
- Very large number of tables (> 150)
- Difficult to implement
- Bad performance

Slide | 18
False Starts: One BLOB per Monograph
Split each monograph into an XML document fragment and store as a BLOB.
Pros
- Fairly simple to implement
- Granularity maps well to device DB structure
Cons
- Difficult to search via SQL
- Referential data not exposed at SQL level
- Significant processing deferred to sync server

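To make the trade-off concrete, here is a minimal sketch of the "one BLOB per monograph" approach. It uses Python's built-in `sqlite3` as a stand-in for the Oracle database, and the element names (`monograph`, `interaction`), the `id` and `ref` attributes, and the table layout are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical source document with one cross-monograph reference.
SOURCE = ('<monographs>'
          '<monograph id="m1"><name>Drug A</name><interaction ref="m2"/></monograph>'
          '<monograph id="m2"><name>Drug B</name></monograph>'
          '</monographs>')

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monograph (id TEXT PRIMARY KEY, xml BLOB NOT NULL)")

# Store each monograph as its own serialized XML fragment.
for m in ET.fromstring(SOURCE).iter("monograph"):
    conn.execute("INSERT INTO monograph VALUES (?, ?)", (m.get("id"), ET.tostring(m)))


def referencing(target_id: str) -> list:
    """Find monographs that refer to target_id; every BLOB has to be parsed."""
    hits = []
    for mono_id, blob in conn.execute("SELECT id, xml FROM monograph ORDER BY id"):
        fragment = ET.fromstring(blob)
        if any(e.get("ref") == target_id for e in fragment.iter("interaction")):
            hits.append(mono_id)
    return hits


who_refers_to_m2 = referencing("m2")
```

Storing the fragments is easy, but the lookup shows the cost listed under Cons: SQL sees only opaque bytes, so finding references means fetching and parsing every row in application code.
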
Slide | 19
False Starts: XML Database
Use a native XML Database.
Pros
- Efficient and architecturally clean XML storage
Cons
- No in-house experience
- Difficult to integrate with existing tools
- “Locked In” to DB provider
- XML largely processed in-memory

Slide | 20
Common Object Model
Abstract design pattern for modeling content with mappings to concrete representations.
- Document Structure
- XML Mapping
- Database Mapping
- Application Mapping

Slide | 21
Common Object Model: Document Structure

Slide | 22
Common Object Model: XML Mapping

Slide | 23
Common Object Model: Database Mapping

Slide | 24
Common Object Model: Application Mapping

Slide | 25
Final Design
The final design is a model that uses the Common Object Model as its design pattern.
- One BLOB per Monograph
  - XML Document Fragment
- Normalized referential data
  - Key fields as table fields
- Separate “compiled” BLOB per monograph

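The three parts of the final design can be sketched as a small schema. This is an illustrative guess at the shape, not the actual Epocrates schema: `sqlite3` stands in for Oracle, the table and column names are hypothetical, and the "compile" step is a placeholder (compressed plain text) for whatever device-ready format the real pipeline produced.

```python
import sqlite3
import xml.etree.ElementTree as ET
import zlib

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE monograph (
    id       TEXT PRIMARY KEY,  -- key field promoted to a column
    name     TEXT NOT NULL,     -- key field promoted to a column
    xml      BLOB NOT NULL,     -- source XML document fragment
    compiled BLOB NOT NULL      -- device-ready form, built before sync
);
CREATE TABLE monograph_ref (    -- normalized referential data
    from_id TEXT NOT NULL REFERENCES monograph(id),
    to_id   TEXT NOT NULL       -- target may not be loaded yet
);
""")


def compile_for_device(m: ET.Element) -> bytes:
    """Placeholder 'compile' step: compressed plain text of the monograph."""
    return zlib.compress(" ".join(m.itertext()).encode("utf-8"))


def store(m: ET.Element) -> None:
    """Store the fragment, its key fields, its references, and its compiled form."""
    conn.execute(
        "INSERT INTO monograph VALUES (?, ?, ?, ?)",
        (m.get("id"), m.findtext("name"), ET.tostring(m), compile_for_device(m)),
    )
    conn.executemany(
        "INSERT INTO monograph_ref VALUES (?, ?)",
        [(m.get("id"), e.get("ref")) for e in m.iter("interaction")],
    )


# Hypothetical source document.
SOURCE = ('<monographs>'
          '<monograph id="m1"><name>Drug A</name><interaction ref="m2"/></monograph>'
          '<monograph id="m2"><name>Drug B</name></monograph>'
          '</monographs>')
for monograph in ET.fromstring(SOURCE).iter("monograph"):
    store(monograph)

# References are now answerable in plain SQL, without parsing any BLOB.
referrers = [row[0] for row in conn.execute(
    "SELECT from_id FROM monograph_ref WHERE to_id = ?", ("m2",))]
```

Compared with the plain BLOB-per-monograph attempt, the reference lookup becomes an ordinary SQL query, and the compiled BLOB moves work off the sync server. Leaving `to_id` unconstrained is one possible way to allow references to monographs that have not been loaded yet.
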
Slide | 26
Final Solution: Modified Workflow Processes

Slide | 27
Lessons Learned
- Split up large XML files
- Don’t assume “All or Nothing”
- Process XML with a programming language
- Look for the distinction between “Structure” and “Markup”
- The ‘Real World’ is a compromise

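The first and third lessons combine naturally: a short program can stream a large XML file and emit one fragment per monograph without holding the whole document in memory. The sketch below uses the standard library's `xml.etree.ElementTree.iterparse`; the element name `monograph`, the `id` attribute, and the sample input are assumptions.

```python
import io
import xml.etree.ElementTree as ET


def iter_monographs(source, tag="monograph"):
    """Yield (id, fragment bytes) one monograph at a time from an XML stream."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            elem.tail = None  # keep inter-element whitespace out of the fragment
            yield elem.get("id"), ET.tostring(elem)
            elem.clear()  # discard the element's contents once it is emitted


# Hypothetical input; in practice `source` would be a path or an open file.
data = io.BytesIO(
    b'<monographs>\n'
    b'<monograph id="a"><name>A</name></monograph>\n'
    b'<monograph id="b"><name>B</name></monograph>\n'
    b'</monographs>'
)
fragments = list(iter_monographs(data))
```

Each yielded fragment can then be stored, fingerprinted, or compiled independently, which is what makes per-monograph processing possible further down the workflow.
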
Slide | 28
Conclusion