2005 Epocrates, Inc. All rights reserved. Integrating XML with legacy relational data for publishing on handheld devices David A. Lee Senior member of the technical staff

Slide | 2 Importance of correct Clinical Information

Slide | 3 Introduction and Agenda
- Introduction
- Background: “The Problem Space”
- False Starts
- “Common Object Model”
- Final Design
- Lessons Learned
- Conclusion

Slide | 4 Introduction
- Who is Epocrates?
- Common Terminology
- Core Application

Slide | 5 Introduction: Who is Epocrates?
- Epocrates is the industry leader in providing clinical references on handheld devices.
- 475,000 active subscribers
- Subscription-based clinical publishing

Slide | 6 Introduction: Common Terminology
- PDA: "Personal Digital Assistant".
- Monograph: information describing a single drug, disease, lab test, preparation, or other clinical entity.
- Syncing: the process of synchronizing a server's database with a PDA.
- PDB: "Palm Database". A very simple variable-length record format with a single 16-bit key index.

Slide | 7 Introduction: Core Application
“Essentials” Handheld Clinical Reference

Slide | 8 Background: “The Problem Space”
- XML vs “Legacy” data
- Characteristics of the application
- Characteristics of "legacy" data
- Characteristics of "new" XML data
- Publishing Workflow
- Workflow Requirements

Slide | 9 Background: XML vs “Legacy” data
[Figure with panels: XML, “Legacy” data]

Slide | 10 Background: Characteristics of the Application
- Runs on handheld devices
- Limited memory capability (8 MB typical)
- Simple database
- Small display
- Synchronization speed critical
- Linking of related content

Slide | 11 Background: Characteristics of “legacy” data
- Stored in Oracle SQL database
- Highly structured and referential
- Hard to change schema
- Data constantly changing (manual editing)
- Specifically designed schemas and content for presentation on PDAs
- Difficult to change workflow, representation, or tools
- Part of a complex workflow for publishing data to a large subscriber base

Slide | 12 Background: Characteristics of “new” XML data
- One large XML document containing many “monographs”
- MB common.
- Periodic updates with unknown amount of change
- Schema likely to change unexpectedly
- No control over content
- Referential data within and across monographs
- Both structural and “markup” type elements

Slide | 13 Background: Publishing workflow

Slide | 14 Background: Workflow Requirements
- Integrates into current workflow with minimum changes to existing process and data structures
- Reliable change detection
- Support for deferred detection of dependencies
- Accurately manage changes when data is updated
- Resilient to XML schema changes
- Extensible design
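
The "reliable change detection" requirement could be met in several ways; the slides do not say which one was used. One possible approach, sketched here purely as an illustration, is to fingerprint each monograph's canonical XML and compare fingerprints between deliveries. The `monograph` element name, the `id` attribute, and both function names are assumptions, not taken from the presentation.

```python
import hashlib
import xml.etree.ElementTree as ET


def fingerprint(fragment: ET.Element) -> str:
    """Hash a canonical (C14N) serialization so formatting-only edits are ignored."""
    canonical = ET.canonicalize(
        ET.tostring(fragment, encoding="unicode"), strip_text=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_changes(old: dict[str, str], document: ET.Element) -> dict[str, list[str]]:
    """Compare stored fingerprints (id -> hash) with a newly delivered document."""
    # Hypothetical layout: <monograph id="..."> elements somewhere under the root.
    new = {m.get("id"): fingerprint(m) for m in document.iter("monograph")}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in new.keys() & old.keys() if new[k] != old[k]),
    }
```

Because the comparison is per monograph, an update with an "unknown amount of change" reduces to a list of monograph ids that need republishing.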

Slide | 15 False Starts
- One Big BLOB
- Full Normalization
- One BLOB per monograph
- XML Database

Slide | 16 False Starts: One Big BLOB
Store entire XML as a single “BLOB”.
Pros:
- Very simple and easy
Cons:
- Deferred almost all processing to sync server
- Impossible to detect changes
- Solves no significant problem over using the filesystem
- No relational representation
- Difficult to search with SQL
- No structure or indexing at SQL level

Slide | 17 False Starts: Full Normalization
Fully normalize XML schema into separate DB tables for every element.
Pros:
- “Ideal” relational representation, referential integrity
- Fine granularity of modification detection
Cons:
- Very large number of tables (> 150)
- Difficult to implement
- Bad performance

Slide | 18 False Starts: One BLOB per Monograph
Split each monograph into an XML document fragment and store as a BLOB.
Pros:
- Fairly simple to implement
- Granularity maps well to device DB structure
Cons:
- Difficult to search via SQL
- Referential data not exposed at SQL level
- Significant processing deferred to sync server
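
As a rough illustration of the splitting step, the sketch below cuts a document into one serialized fragment per monograph. It assumes a hypothetical layout in which each monograph is a `<monograph id="...">` element; the real schema is not shown in the slides, and `split_monographs` is an invented name.

```python
import xml.etree.ElementTree as ET


def split_monographs(xml_text: str) -> dict[str, bytes]:
    """Return {monograph id: serialized XML fragment} for each monograph."""
    root = ET.fromstring(xml_text)
    blobs = {}
    for monograph in root.iter("monograph"):  # element name is an assumption
        monograph.tail = None  # keep whitespace between siblings out of the fragment
        blobs[monograph.get("id")] = ET.tostring(monograph, encoding="utf-8")
    return blobs
```

Each value can be stored as one BLOB, which matches the per-record granularity of the device database but, as the cons above note, leaves the content opaque to SQL.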

Slide | 19 False Starts: XML Database
Use a native XML database.
Pros:
- Efficient and architecturally clean XML storage
Cons:
- No in-house experience
- Difficult to integrate with existing tools
- “Locked in” to DB provider
- XML largely processed in-memory

Slide | 20 Common Object Model
Abstract design pattern for modeling content with mappings to concrete representations.
- Document Structure
- XML Mapping
- Database Mapping
- Application Mapping

Slide | 21 Common Object Model: Document Structure

Slide | 22 Common Object Model: XML Mapping

Slide | 23 Common Object Model: Database Mapping

Slide | 24 Common Object Model: Application Mapping

Slide | 25 Final Design
The final design is a model that uses the Common Object Model as its design pattern.
- One BLOB per monograph
  - XML document fragment
- Normalized referential data
  - Key fields as table fields
- Separate “compiled” BLOB per monograph
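
The hybrid layout described on this slide might look roughly like the following. This is a minimal sketch only: it uses SQLite in place of the Oracle database named earlier so that it runs self-contained, and every table, column, element, and attribute name (`monograph`, `monograph_ref`, `name`, `ref`, `target`) is a hypothetical stand-in.

```python
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = """
CREATE TABLE monograph (
    id        TEXT PRIMARY KEY,  -- key field promoted out of the XML
    name      TEXT NOT NULL,     -- key field promoted out of the XML
    xml_blob  BLOB NOT NULL,     -- the monograph's XML document fragment
    compiled  BLOB               -- device-ready form, filled in by a later step
);
CREATE TABLE monograph_ref (     -- normalized cross-monograph references
    from_id TEXT NOT NULL REFERENCES monograph(id),
    to_id   TEXT NOT NULL,       -- not a foreign key: the target may arrive later
    PRIMARY KEY (from_id, to_id)
);
"""


def load(conn: sqlite3.Connection, xml_text: str) -> None:
    """Store each monograph as a BLOB and expose its references as rows."""
    conn.executescript(SCHEMA)
    for m in ET.fromstring(xml_text).iter("monograph"):
        m.tail = None
        conn.execute(
            "INSERT INTO monograph (id, name, xml_blob) VALUES (?, ?, ?)",
            (m.get("id"), m.findtext("name"), ET.tostring(m, encoding="utf-8")),
        )
        conn.executemany(
            "INSERT OR IGNORE INTO monograph_ref VALUES (?, ?)",
            [(m.get("id"), ref.get("target")) for ref in m.iter("ref")],
        )
    conn.commit()
```

With the references in their own table, a plain SQL join can answer "which monographs does this one link to" without parsing any BLOB, which is the gap the pure one-BLOB-per-monograph attempt left open.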

Slide | 26 Final Solution: Modified Workflow Processes

Slide | 27 Lessons Learned
- Split up large XML files
- Don’t assume “All or Nothing”
- Process XML with a programming language
- Look for the distinction between “Structure” and “Markup”
- The ‘Real World’ is a compromise.
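
One way to read the "Structure" versus "Markup" lesson, offered as an interpretation rather than the presenter's stated method: structural elements become named fields, while inline markup inside a field is flattened to plain text when a searchable value is needed. The element names in the example are invented.

```python
import xml.etree.ElementTree as ET


def structural_fields(monograph: ET.Element) -> dict[str, str]:
    """Treat direct children as structure; flatten any inline markup inside them."""
    fields = {}
    for child in monograph:
        # itertext() walks the child's text and all nested markup elements in order.
        fields[child.tag] = "".join(child.itertext()).strip()
    return fields
```

For a fragment such as `<dosing>Take <b>325 mg</b> daily</dosing>`, the field value is the plain sentence, while the original markup stays available in the stored fragment for rendering.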

Slide | 28 Conclusion