Software and Enterprise Architectures


1 Software and Enterprise Architectures
Prof. Steven A. Demurjian, Sr. Computer Science & Engineering Department The University of Connecticut 371 Fairfield Road, Box U-255 Storrs, CT (860) Copyright © 2008 by S. Demurjian, Storrs, CT.

2 Software Architectures
Emerging Discipline in Mid-1990s Software as Collection of Interacting Components What are Local Interactions (within Component)? What are Global Interactions (between Components)? Advantages of SW Architectural Design Understand Communication/Synchronization Definition of Database Requirements Identification of Performance/Scaling Issues Detailing of Security Needs and Constraints Towards Large-Scale Software Development For Biomedical Informatics: What are Architectures for Data Sharing? How is Interoperability Facilitated?

3 Concepts of Software Architectures
Exceed Traditional Algorithm/Data Structure Perspective Emphasize Componentwise Organization and System Functionality Focus on Global and Local Interactions Identify Communication/Synchronization Requirements Define Database Needs and Dependencies Consider Performance/Scaling Issues Understand Potential Evolution Dimensions

4 The HTSS Software Architecture
Figure: HTSS architecture diagram. Clients: IL (Item Locator), CR (Cash Register), IC (Inventory Control), DO (Deli Orderer for Shopper/Employee: SDO/EDO), and Non-Local Client Interfaces. Components: Item, Payment, Order, Inventory Control. Servers and Databases: Local Server with ItemDB; Global Server with ItemDB, OrderDB, SupplierDB; CreditCardDB; ATM-BankDB.

5 Multiple Backend Database System (MBDS)
Figure: MBDS configuration with a Host/User, a Database Controller, and multiple Backend Database Processors.

6 The MBDS Processes
Database Controller: Request Preparation, Post Processing, Put Msg., Get Msg. Backend Database Processor: Directory Management, Record Processing, Concurrency Control, Disk I/O, Get Msg., Put Msg.

7 Multiple Processes in MBDS
No.  Type                                  SRC   DST
1    New Request                           Host  ReqP
2    Results of Request                    PoPr  Host
3    Number of Reqs in Transaction         ReqP  PoPr
4    Aggregate Operators (Sum, etc.)       ReqP  PoPr
6    Parsed Request to Backends            ReqP  DM
12   Backend Aggregate Operator Results    RecP  PoPr
15   Ids for Accessing Database Indexes    DM    DMs
16   Request and Disk Addresses            DM    RecP
21   Ids for Accessing Database Records    DM    CC
22   Locks Obtained: Okay to Execute       CC    RecP
23   Request ID of Finished Request        RecP  CC

8 Message Passing in MBDS
Figure: message flow among the MBDS processes (Request Preparation, Post Processing, Directory Management, Record Processing, Concurrency Control, Disk I/O) via Put Msg. and Get Msg. Arrows carry letter-plus-message-number labels: C4, D6, E15 (To Backend(s)), F15 (From Other Backend), G21, H22, I16, J23, K12.

9 Software Design Levels
Architecturally: Modules Interconnections Among Modules Decomposition into Subsystems Code: Algorithms/Data Structures Tasking/Control Threads Executable: Memory Management Runtime Environment Is this a Realistic/Accurate View? Yes for a Single “Application” What about Application of Applications? System of Systems?

10 Software Engineering - an Oxymoron?
Is there any Engineering? Is there any Science? Collection of Disparate Techniques: Data-Flow Diagrams E-R Diagrams Finite State Machines Petri Nets UML Class, Object, Sequence, Etc. Design Patterns Model Driven Architectures What is being “Engineered”? How do we Know we are Done? E.g. Does Artifact Match Specification?

11 What's Available for Engineering Software?
Specification (Abstract Models, Algebraic Semantics) Software Structure (Bundling Representation with Algorithms) Language Issues (Models, Scope, User-Defined Types) Information Hiding (Protect Integrity of Information) Integrity Constraints (Invariants of Data Structures) Is this up to date? What else can be Added to List? Design Patterns Model Driven Architectures XML – Data Modeling and Dependencies Others?

12 Engineering Success in Computing
Compilers Have Had Great Success Originally by Hand Then Compiler Compilers Parser Generators - Lex/Yacc Solid Science Behind Compilers Regular, Context Free, Context Sensitive Languages FSAs, PDAs, CFGs, etc. Science has Provided Engineering Success re. Ease and Accuracy of Modern Compiler Writing

13 History of Programming
C - Still Remains Industry Workhorse Separate Compilation Decomposition of System into Subsystems, etc. Shared Declarations ADTs in C, But Compiler won't Enforce Them Modula-2 and Ada 83 Had Information Hiding Public/Private Paradigm Module/Package Concepts Import/Export Paradigm Rigor Enforced by Compiler – but Can’t Bind/Group Modules into Subsystems Precisely Specify Interconnections and Interactions Among Subsystems and Components

14 ‘Recent-Past’ Generation?
C++ and Ada95 Considered “Legacy” Languages - Old Java, C# - Are they Headed Toward Legacy? How do they Rate? What Do they Offer that Hasn't been Offered Before? What are Unique Benefits and Potential of Java? What about new Web Technologies? JavaScript, Perl, PHP, Python, Ruby XML and SOAP How do all of these fit into this process? Particularly in Regards to C/S Solutions!

15 What's Next Step? Architectural Description Languages
Provide Tools to Describe Architectures Definition and Communication Codification of Architectural Expertise Frameworks for Specific Domains DB vs. GUI vs. Embedded vs. C/S Formal Underpinning for Engineering Rigor What has Appeared for Each of these? Struts for GUI Open Source Frameworks (mediawiki) Wide-Ranging Standards (XML) Model-Driven Architectures What Else???

16 Architectural Styles What are Popular Architectural Styles?
How are they Characterized? Example in Practice Explore a Taxonomy of Styles Focus on “Micro-Architectures” Components Flow Among Components Represents “Single” Application Forms Basis for “Macro-Architectures” System of Systems Application of Applications Significantly Scaling Up

17 Taxonomy of Architectural Styles
Data Centered Systems DBS Hypertext Blackboards Independent Components Communicating Processes/Event Systems Client/Server Two-Tier Multi-Tier Data Flow Systems Batch Sequential Pipes and Filters Call & Return Systems Main/Subroutines (C, Pascal) Object Oriented Implicit Invocation Hierarchical Systems Virtual Machines Interpreters Rule Based Systems

18 Taxonomy of Architectural Styles
Establish Framework of … Components Building Blocks for Constructing Systems A Major Unit of Functionality Examples Include: Client, Server, Filter, Layer, DB Connectors Defining the Ways that Components Interact What are the Protocols that Mandate the Allowable Interactions Among Components? How are Protocols Enforced at Run/Design Time? Examples Include: Procedure Call, Event Broadcast, DB Protocol, Pipe

19 Overall Framework What Is the Design Vocabulary?
Connectors and Components What Are Allowable Structural Patterns? Constraints on Combining Components & Connectors What Is the Underlying Conceptual Model? Von Neumann, Parallel, Agent, Message-Passing… Are there New Emerging Models? Collaborative Environments/Shareware? What Are Essential Invariants of a Style? Limits on Allowable Components & Connectors Common Examples of Usage Advantages and Disadvantages of a Style Common Specializations of a Style

20 Pipes and Filters
Components are Independent Entities. No Shared State! Filters: Components with Input and Output (e.g., Sort, Sort, Merge). Pipes: Connectors for Flow; Streams of I/O. Invariant: Filters are Unaware of Up- and Downstream Behavior. Streamed Behavior: Output Could Go From One Filter to the Next One, Allowing Multiple Filters to Run in Parallel.

21 Pipes and Filters Possible Specializations:
Pipelines - Linear Sequence Bounded - Limits on Data Amounts Typed Pipes - Known Data Format What is a Classic Example? Other Examples: Compilers Sequential Processes Parallel Processes
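
A minimal sketch of the pipes-and-filters style, assuming Python generators stand in for filters and iterator hand-offs stand in for pipes. The names sort_filter, merge_filter, dedupe_filter, and run_pipeline are illustrative and do not come from the slides; the Sort, Sort, Merge arrangement follows the filter names shown on the Pipes and Filters slide.

```python
import heapq
from typing import Iterable, Iterator, List


def sort_filter(stream: Iterable[int]) -> Iterator[int]:
    """Filter: consumes its whole input, then emits it in sorted order."""
    yield from sorted(stream)


def merge_filter(left: Iterable[int], right: Iterable[int]) -> Iterator[int]:
    """Filter: lazily merges two already-sorted streams into one sorted stream."""
    yield from heapq.merge(left, right)


def dedupe_filter(stream: Iterable[int]) -> Iterator[int]:
    """Filter: drops adjacent duplicates, handling one item at a time."""
    previous = object()  # sentinel that compares unequal to every real item
    for item in stream:
        if item != previous:
            yield item
        previous = item


def run_pipeline(a: Iterable[int], b: Iterable[int]) -> List[int]:
    """Wire the filters together; each filter sees only its input stream."""
    return list(dedupe_filter(merge_filter(sort_filter(a), sort_filter(b))))


if __name__ == "__main__":
    print(run_pipeline([3, 1, 2], [2, 5, 4]))
```

No filter holds a reference to its neighbors, which is the invariant the style asks for. Note that a sort filter cannot stream: it must read all of its input before emitting anything, so only the merge and dedupe stages pass items along incrementally.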

22 Pipes and Filters - Another Example
Text Information Retrieval Systems Scanning Newspapers for Key Words, Etc. Also, Boolean Search Expressions Where is Such an Architecture Utilized Today? What is Potential Usage in BMI? User Search Controller DB Query Resolver Term Comparator Disk Commands Programming Control Result Data

23 ADTs and OO Architectures
Widespread Usage in the 1990’s Advantages Are Well Known Disadvantages: Interaction Required Object Identity If Identity Changes, It Is Difficult to Track All Affected Objects. obj Manager (ADT) Procedure Call op Connectors Components

24 Implicit Invocation
Similar to OO in the Sense that Components Can Call Services on Other Components How Does this Work? Components Have List of Events they can Raise and List of Procedures to Handle Events When Event is Raised, it is Broadcast All Components that Have Procedure to Handle Broadcast Event will Act Upon it The Component That Raised the Event has no Knowledge of Which Component(s) will Handle Event What are Some Examples?

25 Implicit Invocation
Advantages: No Need to Know the Targeted Components; Single Event can Impact Multiple Components; New Event Handlers can Easily be Added; New Events Can then be Raised. Disadvantages: No Control Over the Order of Processing When an Event is Raised; No Control Over “Who” and “How Many” Process Events; Very Non-Deterministic System Behavior.
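
A minimal sketch of implicit invocation, assuming a small in-process event bus. EventBus, register, and raise_event are hypothetical names chosen for illustration; the slides do not prescribe an API.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict], None]


class EventBus:
    """Broadcasts each raised event to every handler registered for it."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def register(self, event: str, handler: Handler) -> None:
        """A component announces a procedure that handles the named event."""
        self._handlers[event].append(handler)

    def raise_event(self, event: str, payload: Dict) -> int:
        """Broadcast the event; the raiser learns only how many handlers ran."""
        handlers = list(self._handlers.get(event, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


if __name__ == "__main__":
    bus = EventBus()
    bus.register("item_scanned", lambda p: print("inventory saw", p["item"]))
    bus.register("item_scanned", lambda p: print("receipt saw", p["item"]))
    bus.raise_event("item_scanned", {"item": "milk"})
```

The raising code never names a target component, and a new handler can be added without touching it. This sketch happens to call handlers in registration order; that is a choice of the sketch, and the style itself promises no ordering, which is the non-determinism listed among the disadvantages.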

26 What has OO Evolved Into?
What has Classic OO Solution Evolved into Today? Client (Browser + Struts) Server (Many Variants of OO Languages) Database Server (typically Relational) Different Style (e.g., Design Pattern) Does Pattern Capture All Aspects of Style? Do we Need to Couple Technology with Pattern? Dr. D, Jan 01, 08 Fever, Flu, Bed Rest No Scripts No Tests Item(Phy_Name*, Date*, Visit_Flag, Symptom, Diagnosis, Treatment, Presc_Flag, Pre_No, Pharm_Name, Medication, Test_Flag, Test_Code, Spec_No, Status, Tech)

27 Layered Systems
Components - Virtual Machine at Each Layer. Connectors - Protocols That Specify How Layers Interact (Usually Procedure Calls). Layer: Composite of Various Elements. Interaction Is Restricted to Adjacent Layers. Example Layers (Innermost to Outermost): Core Level, Base Utility, Useful Systems, Users.

28 Layered Systems Advantages: Increasing Levels of Abstraction
Support Enhancement - New Layers Support for Reuse Drawbacks: Not Feasible for All Systems Performance Issues With Multiple Layers Defining Abstractions Is Difficult.
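
A minimal sketch of the adjacent-layers restriction, assuming each layer holds a reference only to the layer directly beneath it. Layer and build_stack are hypothetical names; the layer names reuse the labels from the Layered Systems figure.

```python
from typing import List, Optional


class Layer:
    """A virtual machine that can call only the layer directly below it."""

    def __init__(self, name: str, below: Optional["Layer"] = None) -> None:
        self.name = name
        self._below = below  # the single permitted connector: adjacent layer

    def handle(self, request: str) -> str:
        wrapped = f"{self.name}({request})"
        if self._below is None:
            return wrapped  # bottom layer: nothing further to delegate to
        return self._below.handle(wrapped)


def build_stack(names_top_to_bottom: List[str]) -> Layer:
    """Build the stack bottom-up and return the topmost layer."""
    below: Optional[Layer] = None
    for name in reversed(names_top_to_bottom):
        below = Layer(name, below)
    if below is None:
        raise ValueError("at least one layer is required")
    return below


if __name__ == "__main__":
    top = build_stack(["Users", "Useful Systems", "Base Utility", "Core"])
    print(top.handle("read"))
```

Because a request passes through every layer on its way down, the sketch also makes the performance drawback visible: each added layer adds one more call per request.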

29 Layered Systems in BMI
One Approach to Constructing Access to Patient Data for Clinical Research and Clinical Practice Construct Layered Data Repositories as Below Each Layer Targets Different User Group Need to Fine Tune Access Even within Layers. Layers and User Groups: Patient Data (Provider), De-identified (Cl. Researchers), Aggregated (Public Health Researchers)

30 ISO as Layered Architecture
ISO Open Systems Interconnect (OSI) Model Now Widely Used as a Reference Architecture 7-layer Model Provides Framework for Specific Protocols (Such as IP, TCP, FTP, RPC, UDP, RSVP, …) Application Presentation Session Transport Network Data Link Physical

31 ISO OSI Model
Layers: Application, Presentation, Session, Transport, Network, Data Link, Physical. Physical (Hardware)/Data Link Layer Networks: Ethernet, Token Ring, ATM. Network Layer Net: The Internet. Transport Layer Net: TCP-based Network. Presentation/Session Layer Net: HTTP/HTML, RPC, PVM, MPI. Applications, e.g., WWW, Window System, Algorithm.

32 Repositories Blackboard (shared data)
Memory ks8 ks1 Blackboard (shared data) ks7 ks2 ks3 ks6 ks4 ks5 Computation Direct Access Knowledge Sources Interact With the Blackboard. Blackboard Contains the Problem Solving State Data. Control Is Driven by the State of the Blackboard. DB Systems Are a Form of Repository With a Layer Between the BB and the KSs - Supports Concurrent Access, Security, Integrity, Recovery
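
A minimal sketch of a blackboard, assuming a plain dictionary holds the problem-solving state and a simple loop plays the role of control. KnowledgeSource, run_blackboard, and make_sources are hypothetical names, and the two toy knowledge sources (tokenize, count) are invented for illustration.

```python
from typing import Callable, Dict, List


class KnowledgeSource:
    """Reads and writes the blackboard; knows nothing about other sources."""

    def __init__(self, name: str,
                 can_act: Callable[[Dict], bool],
                 act: Callable[[Dict], None]) -> None:
        self.name = name
        self.can_act = can_act  # precondition on the blackboard state
        self.act = act          # contribution written back to the blackboard


def run_blackboard(blackboard: Dict, sources: List[KnowledgeSource],
                   max_steps: int = 100) -> List[str]:
    """Control loop: the blackboard state decides which source runs next."""
    trace: List[str] = []
    for _ in range(max_steps):
        ready = [ks for ks in sources if ks.can_act(blackboard)]
        if not ready:
            break  # no source can contribute: stop
        ready[0].act(blackboard)
        trace.append(ready[0].name)
    return trace


def make_sources() -> List[KnowledgeSource]:
    tokenize = KnowledgeSource(
        "tokenize",
        lambda bb: "raw" in bb and "tokens" not in bb,
        lambda bb: bb.update(tokens=bb["raw"].lower().split()),
    )
    count = KnowledgeSource(
        "count",
        lambda bb: "tokens" in bb and "count" not in bb,
        lambda bb: bb.update(count=len(bb["tokens"])),
    )
    # Listed "out of order" on purpose: state, not list order, drives control.
    return [count, tokenize]
```

Running run_blackboard({"raw": "Shared Data Drives Control"}, make_sources()) applies tokenize before count even though count is listed first, because count's precondition is not met until the tokens appear on the blackboard.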

33 Database System as a Repository
DB Management System c8 c1 Database (shared data) c7 c2 c3 c6 c4 c5 Computation Direct Access Clients Interact With the DBMS Database Contains the Problem Solving State Data Control is Driven by the State of the Database Concurrent Access, Security, Integrity, Recovery Single Layer System: Clients have Direct Access Control of Access to Information must be Carefully Defined within DB Security/Integrity

34 Team Project as a Repository
Web Portal Shared c7 c2 c3 c6 c4 c5 Clients are Providers, Patients, Clinical Researchers Database Underlies Web Portal Simply a Portion of Architecture Interactions with PHR (Patients) Interactions with EMR (Providers) Interactions with Database/Warehouse (Researchers)

35 Interpreters What Are Components and Connectors?
Memory Program being interpreted Inputs Data (program state) Simulated interpretation engine Selected instruction Internal interpreter state Outputs Selected data Computation (State Machine) Data Access (fetch/store) What Are Components and Connectors? Where Have Interpreters Been Used in CS&E? LISP, ML, Java, Other Languages, OS Command Line
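
A minimal sketch of the interpreter structure, assuming a tiny stack machine. The instruction set (PUSH, READ, ADD, MUL, PRINT, HALT) and the function name interpret are invented for illustration; the comments map variables to the labels in the figure.

```python
from typing import Iterable, List, Tuple

Instruction = Tuple  # e.g. ("PUSH", 2) or ("ADD",)


def interpret(program: List[Instruction], inputs: Iterable[int]) -> List[int]:
    """Simulated interpretation engine for a small stack language."""
    pending = list(inputs)     # inputs
    stack: List[int] = []      # data (program state)
    outputs: List[int] = []    # outputs
    pc = 0                     # internal interpreter state
    while pc < len(program):
        op, *args = program[pc]  # selected instruction (fetch)
        pc += 1
        if op == "PUSH":
            stack.append(args[0])
        elif op == "READ":
            stack.append(pending.pop(0))
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == "PRINT":
            outputs.append(stack.pop())
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
    return outputs


if __name__ == "__main__":
    double_sum = [("READ",), ("READ",), ("ADD",), ("PUSH", 2),
                  ("MUL",), ("PRINT",), ("HALT",)]
    print(interpret(double_sum, [3, 4]))
```

In the vocabulary of the slide, the engine and the memory areas are the components, and the fetch/store accesses between them are the connectors.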

36 Java as Interpreter

37 Process Control Paradigms
Figure: two process control loops, With Feedback and Without Feedback; each shows Input Variables, Set Point, Controller, Δs to Manipulated Variables, Process, and Controlled Variable. Also: Open vs. Closed Loop Systems Well Defined Control and Computational Characteristics Heavily Used in Engineering Fields.
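
A minimal sketch contrasting the two loops, assuming a one-variable process, a proportional controller for the closed loop, and a constant per-step disturbance. The function simulate and its parameters (gain, disturbance) are assumptions made for illustration, not part of the slides.

```python
def simulate(set_point: float, steps: int, feedback: bool,
             gain: float = 0.5, disturbance: float = 0.0) -> float:
    """Return the controlled variable after the given number of steps."""
    value = 0.0  # controlled variable
    for _ in range(steps):
        if feedback:
            # Closed loop: the change depends on the measured error.
            delta = gain * (set_point - value)
        else:
            # Open loop: a fixed schedule that never looks at the output.
            delta = set_point / steps
        value += delta + disturbance
    return value


if __name__ == "__main__":
    closed = simulate(10.0, 50, feedback=True, disturbance=-0.1)
    opened = simulate(10.0, 50, feedback=False, disturbance=-0.1)
    print(f"closed loop: {closed:.3f}, open loop: {opened:.3f}")
```

With these assumed numbers the closed loop settles near 9.8 (set point plus disturbance divided by gain), while the open loop drifts to about 5.0 because nothing corrects for the disturbance.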

38 Process Architecture: Statechart Diagram?

39 Process Architecture: Activity Diagram?
Clear Applicability to Medical Processes that have Underlying BMI – Low Level Processes Breath Waiting for Resp. Signal Resp Signal timeout Trigger Local Alarm Remote Heartbeat Heart Signal irregular beat Alarm Reset

40 Design Patterns as Software Architectures
Emerged as the Recognition that in Object-Oriented Systems Repetitions in Design Occurred Gained Prominence in 1995 with Publication of “Design Patterns: Elements of Reusable Object-Oriented Software”, Addison-Wesley “… descriptions of communicating objects and classes that are customized to solve a general design problem in a particular context…” Akin to Complicated Generic Usage of Patterns Requires Consistent Format and Abstraction Common Vocabulary and Descriptions Simple to Complex Patterns – Wide Range

41 The Observer Pattern
Utilized to Define a One-to-Many Relationship Between Objects When Object Changes State – all Dependents are Notified and Automatically Updated Loosely Coupled Objects When one Object (Subject – an Active Object) Changes State then Multiple Objects (Observers – Passive Objects) are Notified Observer Object Implements Interface to Specify the Way that Changes are to Occur Two Interfaces and Two Concrete Classes
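
A minimal sketch of the "two interfaces and two concrete classes" arrangement, assuming Python abstract base classes stand in for interfaces. The class and method names (Subject, Observer, ConcreteSubject, ConcreteObserver, attach, detach, notify, update) follow common usage for this pattern and are not taken from the slides.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class Observer(ABC):
    """Interface: how a dependent is told about a change."""

    @abstractmethod
    def update(self, state: Any) -> None: ...


class Subject(ABC):
    """Interface: how dependents subscribe and get notified."""

    @abstractmethod
    def attach(self, observer: Observer) -> None: ...

    @abstractmethod
    def detach(self, observer: Observer) -> None: ...

    @abstractmethod
    def notify(self) -> None: ...


class ConcreteSubject(Subject):
    """Active object: owns the state and broadcasts every change."""

    def __init__(self) -> None:
        self._observers: List[Observer] = []
        self._state: Any = None

    def attach(self, observer: Observer) -> None:
        if observer not in self._observers:
            self._observers.append(observer)

    def detach(self, observer: Observer) -> None:
        self._observers.remove(observer)

    def notify(self) -> None:
        for observer in list(self._observers):
            observer.update(self._state)

    def set_state(self, state: Any) -> None:
        self._state = state
        self.notify()


class ConcreteObserver(Observer):
    """Passive object: records each state it is notified about."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.seen: List[Any] = []

    def update(self, state: Any) -> None:
        self.seen.append(state)
```

The subject depends only on the Observer interface, so new observer classes can be added without changing ConcreteSubject, which is the loose coupling the slide refers to.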

42 The Observer Pattern

43 Model View Controller

44 Model View Controller Three Parts of the Pattern: Model View
Enterprise Data and Business Rules for Accessing and Updating Data View Renders the Contents (or Portion) of Model Deals with Presentation of Stored Data Pull or Push Model Possible Controller Translates Interactions with View into Actions on Model Actions could be Button Clicks (GUI), Get/Post http (Web), etc.
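
A minimal sketch of the three roles, assuming the push variant in which the model notifies the view after every update. The classes and the inventory-style example (quantities per item, "add" and "remove" actions) are hypothetical and chosen only to keep the sketch short.

```python
from typing import Callable, Dict, List


class Model:
    """Holds the data and enforces the business rule for updating it."""

    def __init__(self) -> None:
        self._quantities: Dict[str, int] = {}
        self._listeners: List[Callable[[Dict[str, int]], None]] = []

    def subscribe(self, listener: Callable[[Dict[str, int]], None]) -> None:
        self._listeners.append(listener)

    def quantity(self, item: str) -> int:
        return self._quantities.get(item, 0)

    def set_quantity(self, item: str, quantity: int) -> None:
        if quantity < 0:  # business rule: stock can never go negative
            raise ValueError("quantity cannot be negative")
        self._quantities[item] = quantity
        for listener in self._listeners:  # push the new contents to views
            listener(dict(self._quantities))


class View:
    """Renders the model contents; holds no business logic."""

    def __init__(self, model: Model) -> None:
        self.rendered = ""
        model.subscribe(self.render)

    def render(self, quantities: Dict[str, int]) -> None:
        self.rendered = ", ".join(
            f"{item}: {count}" for item, count in sorted(quantities.items()))


class Controller:
    """Translates user interactions into actions on the model."""

    def __init__(self, model: Model) -> None:
        self._model = model

    def handle(self, action: str, item: str) -> None:
        current = self._model.quantity(item)
        if action == "add":
            self._model.set_quantity(item, current + 1)
        elif action == "remove":
            self._model.set_quantity(item, current - 1)
        else:
            raise ValueError(f"unknown action: {action}")
```

In a web setting the same Controller.handle call would sit behind an HTTP GET or POST handler rather than a button click; the model and view would not need to change.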

45 Model View Controller

46 UML for System Modeling
UML is a Language for Specifying, Visualizing, Constructing, and Documenting Software Artifacts What Does a Modeling Language Provide? Model Elements: Concepts and Semantics Notation: Visual Rendering of Model Elements Guidelines: Hints and Suggestions for Using Elements in Notation References and Resources Web: Is UML Sufficient for Complexity of BMI? Able to Model Information Needs for BMI? Able to Represent Required Architectures?

47 UML Diagrammatic Representations
Component Diagram: Captures the Physical Structure of the Implementation Deployment Diagram: Captures the Topology of a System’s Hardware Collaboration Diagram: Captures Dynamic Behavior (Message-Oriented) What About Other Diagrams? State Chart Diagram: Captures Dynamic Behavior (Event-Oriented) Activity Diagram: Captures Dynamic Behavior (Activity-Oriented) These and Others Seem too Low Level … What is Role of UML for BMI? Yet Another Design Artifact Can it be More?

48 Component Diagram Captures the Physical Structure of the Implementation

49 Deployment Diagram Captures the Topology of a System’s Hardware

50 Collaboration Diagram

51 Single and Multi-Tier Architectures
Widespread use in Practice for All Types of Distributed Systems and Applications Two Kinds of Components Servers: Provide Services - May be Unaware of Clients Web Servers (unaware?) Database Servers and Functional Servers (aware?) Clients: Request Services from Servers Must Identify Servers May Need to Identify Self A Server Can be Client of Another Server Expanding from Micro-Architectures (Single Computer/One Application) to Macro-Architecture

52 Single and Multi-Tier Architectures
Normally, Clients and Servers are Independent Processes Running in Parallel Connectors Provide Means for Service Requests and Answers to be Passed Among Clients/Servers Connectors May be RPC, RMI, etc. Advantages Parallelism, Independence Separation of Concerns, Abstraction Others? Disadvantages Complex Implementation Mechanisms Scalability, Correctness, Real-Time Limits

53 Example: Software Architectural Structure
Initial Data Entry Operator (Scanning & Posting) Advanced Data Entry Operators Analyst Manager Database Server Running Oracle 10-100MB Network RMI Registry Document Server Stored Images/CD RMI Act. Obj/Server RMI Act. Obj/Server Functional Server

54 Business Process Model
DB DB Historical Records Completed Applications DB Licensing Scanner Supervisor Review DB Licensing Division Scanning Operator Stored Images Licensing Division Data Entry Operator Printer DB Basic Information Entered New Licenses New Appointments FOI Letters (Request Information, etc.)

55 Two-Tier Architecture
Small Manufacturer Previously on C++ New Order Entry, Inventory, and Invoicing Applications in Java Programming Language Existing Customer and Order Database Most of Business Logic in Stored Procedures Tool-generated GUI Forms for Java Objects

56 Three-Tier Architecture
Passenger Check-in for Regional Airline Local Database for Seating on Today's Flights Clients Invoke EJBs at Local Site Through RMI EJBs Update Database and Queue Updates JMS Queues Updates to Legacy System JDBC API Used to Access Local Database

57 Four-Tier Architecture
Web Access to Brokerage Accounts Only HTML Browser Required on Front End "Brokerbean" EJB Provides Business Logic Login, Query, Trade Servlets Call Brokerbean Use JNDI to Find EJBs, RMI to Invoke Them

58 Architecture Comparisons
Two-tier Through JDBC API is Simplest Multi-tier: Separate Business Logic, Protect Database Integrity, More Scalable JMS Queues vs. Synchronous (RMI or IDL): Availability, Response Time, Decoupling JMS Publish & Subscribe: Off-line Notification RMI-IIOP vs. JRMP vs. Java IDL: Standard Cross-language Calls or Full Java Functionality JTS: Distributed Integrity, Lockstep Actions

59 Comments on Architectural Styles
Architectural Styles Provide Patterns Suppose Designing a New System During Requirements Discovery, Behavior and Structure of System Will Emerge Attempt to Match to Architectural Style Modify, Extend Style as Needed By Choosing Existing Architectural Style Know Advantages and Disadvantages Ability to Focus in on Problem Areas and Bottlenecks Can Adjust Architecture Accordingly Architectures Range from Large Scale to Small Scale in their Applicability We’ll see Examples for BMI Shortly …

60 Other Issues in Software Architectures
Consider a Set of Applications New Software Legacy, COTS, Databases, etc. A Distributed Application is a Set of Applications Deployed Over a Network that Communicate Relationship Between Applications Different Implementations of “Same” Application on Different Hardware Platforms Configuration of Various Hardware Nodes Different Node Types in the Network Issue: What is the ‘Best’ Way to Deploy Applications Across the Network of Available Resources?

61 Distributed Application & Hardware Nodes
Computers & Connections May have Different Characteristics that Affect their Usage Speed Storage Bandwidth

62 Objective: ‘Best’ Deployment
A Distributed System is Optimally Deployed if it Yields the Best Performance Performance: Efficient Use of Resources via Throughput, Response Time, or Number of Messages What are Implications in BMI? Need to Bring Together Multiple Assets Work Efficiently Across Network Unifying Clinical Research Repositories

63 Distr. Systems: Combo of Requirements
interaction patterns hardware elements software elements Specification connections protocols interfaces

64 Deployment Influenced by Many Factors
replication degree algorithms usage patterns software architecture Performance middleware deployment underlying network processing nodes

65 Framework for Design and Deployment
SOFTWARE HARDWARE Dependencies Deployment PERFORMANCE

66 What is I5?
Five Definition Languages: Interface, Implementation, Integration, Instantiation, Installation Five Formal Integrated Graphical Languages Based on UML’s Implementation Diagrams The Application, Network, Dependencies and the Deployment are Part of an Integrated Framework

67 The Five Levels of I5
Interface (I1) - Types of Components, Nodes and Connectors
Implementation (I2) - Classes of Components, Nodes and Connectors
Integration (I3) - Dependencies Between Component and Node Classes
Instantiation (I4) - Instances of Each Class Definition
Installation (I5) - Deployment of Each Instance (Requirements and Complete Deployment)
Levels Run from Abstraction (I1) to Detail (I5)

68 Levels of Specification in I5
Types - Generic Definition of Components, Nodes, and Connectors According to Their Role Defined in I1 Used in I2 to Define Classes Classes - Different Implementations of the Types Defined in I2 Used in I3 to Associate Software Components and Hardware Artifacts and I4 to Define Instances Instances - Identical Copies of the Different Classes Defined in I4 Used in I5 to Deploy Instances Across Nodes

69 UML UML is a Set of Graphical Specification Languages (OMG’s Standard Design Language Since November, 1997) Implementation Diagrams Component Diagrams: Show the Physical Structure of the Code in Terms of Code Components and Their Dependencies Deployment Diagrams: Show the Physical Architecture of the Hardware and Software in the System. They Have a Type and an Instance Version.

70 UML When to Use Deployment Diagrams
“… In practice, I haven’t seen this kind of diagram used much. Most people do draw diagrams to show this kind of information but they are informal cartoons. On the whole, I don’t have a problem with that since each system has its own physical characteristics that you want to emphasize. As we wrestle more and more with distributed systems, however, I’m sure we will require more formality as we understand better which issues need to be highlighted in deployment diagrams.” From “UML Distilled. Applying the Standard Object Modeling Language”, by Martin Fowler. Addison-Wesley, Object Technology Series, 7th. Reprint June, 1998.

71 Pros and Cons of Graphical Modeling
Advantages: Clear to Show Structure Excellent Communication Vehicle Addresses Different Aspects of Modeling in an Integrated Fashion Disadvantages: Shows Little (or No) Details There is a Big Gap Between Specification and Implementation Limited by Screen Size & Printable Page Solution: Associate a Complete Textual Specification to Graphical Model that Contains the Necessary Details for Each Element

72 Design Concepts Interface Interaction With the Outer World Signature + Requested Services Type: Abstract Entity - Interface + Semantics Subtype: Inherits the Supertype Definition Class: Implementation of a Type Realization: Relation Between a Type and a Class That Implements It Subclass: Inherits the Superclass Implementation Instance: Element of a Class

73 The I5 Framework An Integrated Specification Framework for Distributed Systems Support for the Architectural Specification of OO and Component Based Distributed Systems Heterogeneous Network - Platforms A Five Level Framework for Defining Software and Hardware (Platforms) With a Uniform Notation and With Different Levels of Abstraction Specified Textually in Z or Graphically in UML Emphasis on Implementation Diagrams Please See

74 Dependencies Between Levels
Component Types Node Types INTERFACE Component Classes Node Classes IMPLEMENTATION Implementation Dependencies INTEGRATION Inst. Components Inst. Nodes System Instantiation INSTANTIATION Installation Req. (together,separated) (fix location) INSTALLATION Complete Installation

75 Interface - Software: I1S
Components Types Type Supertypes Associated Interfaces Calls Properties Types are Unique Supertypes Must Be Part of I1S Calls Must Be Satisfied in I1S

76 Interface - Software: I1S
Client response <<call>> <<call>> request receive FrontEnd <<call>> <<call>> receive gossip Replica <<call>>

77 Interface - Hardware: I1H
Node Types Connector Types Connections Properties All Node Types Must Be Connected Only Node and Connector Types Defined Take Part in the Connections SUN Intel Pentium MPI Sockets

78 Implementation - Software: I2S
Component Classes Component Type Class Superclasses Calls to Classes Interfaces Properties: Only Types in I1S are Allowed Superclasses Are Realizations of the Supertypes Calls & Inheritance are Satisfied Within I2S

79 Implementation - Software: I2S
PCCtrCl response XCtrCl response <<call>> <<call>> XFrontEnd request receive <<call>> receive gossip Counter <<call>>

80 Implementation - Hardware: I2H
Node Classes Node Type Class Connector Classes Type Connections Between Node Classes Properties Node and Connector Classes Refine the Types in I1H Connections are With Connector Classes That Refine Connector Types in I1H

81 Implementation - Hardware: I2H
SUN Intel Pentium MPI Sockets <<realizes>> <<realizes>> MPI_Impl SUN OS 4.1.4 CSockets Win95

82 Software and Hardware Integration: I3
Relation <<supports>> Instances of the Component Class May Run on Instances of the Node Class Important Step Since it Constrains Deployment Options Properties Only Node and Component Classes Defined in I2 Can Participate in the <<supports>> Relation

83 Software and Hardware Integration: I3
XCtrCl response PCCtrCl response <<supports>> <<supports>> MPI_Impl request Win95 XFrontEnd SUN OS 4.1.4 CSockets <<supports>> receive <<supports>> receive gossip Counter

84 Instantiation - Software: I4S
Component Instances Class Identification Calls Properties Instance Calls Refine Class Calls Only Classes in I2S May Be Instantiated

85 Instantiation - Software: I4S
request receive receive ct1:Counter gossip receive response ct2:Counter c1:PCCtrCl gossip fe1:XFrontEnd receive response ct3:Counter c2:PCCtrCl gossip receive response ct4:Counter request receive gossip c3:PCCtrCl fe2:XFrontEnd receive ct5:Counter response gossip c4:XCtrCl ct6:Counter receive gossip

86 Instantiation - Hardware: I4H
Node Instances Class Identification Connector Instances Set of Connected Nodes Properties There are Only Instances of the Node & Connector Classes Defined in I2H Connectors Refine I2H Connections

87 Instantiation - Hardware: I4H
sun6: SunOS4.1.4 sun7: sun8: sun9: sun10: sun1: sun2: sun3: sun4: sun5: pc1:Win95 pc2:Win95 pc3:Win95 pc4:Win95 sock1 sock2 sock3 sock4 mpi1

88 Installation Requirements
A Set of Component Instances Must Be Deployed Together or Separated Fix the Location of Some Component Instances All Installation Requirements Must Be Consistent With the Requirements Imposed by All the Previous Specification Levels Requirements Together Separated Fix

89 Installation - Requirements: Ifix, Iseparated
receive receive fe2:XFrontEnd fe1:XFrontEnd request request sun2:SunOS4.1.4 sun3:SunOS4.1.4 separated = {ct1:Counter, ct2:Counter, ct3:Counter, ct4:Counter, ct5:Counter, ct6:Counter}
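
A minimal sketch of checking a candidate deployment against the <<supports>> relation (I3) and the fix and separated installation requirements. The function check_deployment is hypothetical, and the particular supports pairs and fix locations used in the usage note are assumptions: the instance and class names come from the slides, but the diagrams that pin down the exact relations did not survive in this transcript.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def check_deployment(deployment: Dict[str, str],
                     component_class: Dict[str, str],
                     node_class: Dict[str, str],
                     supports: Set[Tuple[str, str]],
                     separated: Iterable[Set[str]] = (),
                     fixed: Optional[Dict[str, str]] = None) -> List[str]:
    """Return a list of violated requirements (empty if the deployment is valid)."""
    problems: List[str] = []
    # I3: a component instance may run only on a node whose class supports it.
    for component, node in deployment.items():
        pair = (component_class[component], node_class[node])
        if pair not in supports:
            problems.append(
                f"{component} on {node}: {pair[0]} is not supported by {pair[1]}")
    # fix: some instances are pinned to a specific node.
    for component, node in (fixed or {}).items():
        if deployment.get(component) != node:
            problems.append(f"{component} must be fixed on {node}")
    # separated: no two instances of a group may share a node.
    for group in separated:
        used: Dict[str, str] = {}
        for component in sorted(group):
            node = deployment.get(component)
            if node is None:
                continue  # not deployed yet, nothing to compare
            if node in used:
                problems.append(
                    f"{component} and {used[node]} must be separated "
                    f"but share {node}")
            else:
                used[node] = component
    return problems
```

For example, assuming Counter is supported on both SunOS4.1.4 and Win95, XFrontEnd only on SunOS4.1.4, fe1 fixed on sun2, and ct1 and ct2 separated, the deployment {ct1: sun1, ct2: pc1, fe1: sun2} yields no problems, while {ct1: sun1, ct2: sun1, fe1: pc1} yields three.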

90 Mapping Applications to Hardware
Applications (Left) and Hardware (Right) Instances Restrictions on Which Applications can be Deployed on Which Hardware? Which Applications Deployed Together? Which Applications Must be Separate?

91 Objective: ‘Best’ Optimal Deployment

92 Using I5 for BMI Focus at Architectural Level
Multiple Assets to Bring Together Hospital EMRs, Provider EMRs, Other Systems Multiple and Disparate Hardware Different Contexts and Needs Clinical Practice – (Near) Real-Time Integration/Access Clinical Research – De-Identified Integrated Repository Performance will be Key Issue Clinical Practice – Time of Access Clinical Research – Volume of Information Some Genomic Data Requires Terabytes of Data! Information overload Possible

93 The Next Big Challenge Macro-Architectures System of Systems
Application of Applications Involves Two Key Issues Interoperability Heterogeneous Distributed Databases Heterogeneous Distributed Systems Autonomous Applications Scalability Rapid and Continuous Growth Amount of Data Variety of Data Types Different Privacy Levels or Ownerships of Data

94 Interoperability: A Classic View
Simple Federation Multiple Nested Federation FDB Global Schema FDB Global Schema 4 Federated Integration Federated Integration Local Schema Local Schema Local Schema FDB 1 Local Schema FDB3 Federation Federation

95 What is CORBA? Differs from Typical Programming Languages
Objects can be … Located Throughout Network Interoperate with Objects on other Platforms Written in Any PLs for which there is a Mapping from IDL to that Language

96 What is CORBA? Allow Interactions from Client to Server CORBA
Installed on All Participating Machines

97 CORBA-Based Development
IDL file Object Implementation Client Application IDL Compiler IDL Compiler Stub Skeleton ORB/IIOP ORB/IIOP

98 Database Interoperability in the Internet
Technology Web/HTTP, JDBC/ODBC, CORBA (ORBs + IIOP), XML Architecture Information Broker Mediator-Based Systems Agent-Based Systems

99 ORB Integration:Java Client + Legacy Application
Wrapper Object Request Broker (ORB) CORBA is the Medium of Info. Exchange Requires Java/CORBA Capabilities

100 Java Client with Wrapper to Legacy Application
Interactions Between Java Client and Legacy Appl. via C and RPC C is the Medium of Info. Exchange Java Client with C++/C Wrapper Java Application Code WRAPPER Mapping Classes JAVA LAYER NATIVE LAYER Native Functions (C++) RPC Client Stubs (C) Legacy Application Network

101 COTS and Legacy Appls. to Java Clients
COTS Application Legacy Application Java Application Code Java Application Code Native Functions that Map to COTS Appl Native Functions that Map to Legacy Appl NATIVE LAYER NATIVE LAYER JAVA LAYER JAVA LAYER Mapping Classes Mapping Classes JAVA NETWORK WRAPPER JAVA NETWORK WRAPPER Network Java Client Java Client Java is Medium of Info. Exchange - C/C++ Appls with Java Wrappers

102 Java Client to Legacy App via RDBS
Transformed Legacy Data Java Client Relational Database System(RDS) Updated Data Extract and Generate Data Transform and Store Data Legacy Application

103 JDBC JDBC API Provides DB Access Protocols for Open, Query, Close, etc. Different Drivers for Different DB Platforms JDBC API Java Application Driver Manager Driver Driver Driver Driver Driver Oracle Access Sybase

104 Connecting a DB to the Web
Web Servers are Stateless DB Interactions Tend to be Stateful Invoking a CGI Script on Each DB Interaction is Very Expensive, Mainly Due to the Cost of DB Open. Figure: Internet Browser, Web Server, CGI Script Invocation or JDBC Invocation, DBMS.

105 Connecting More Efficiently
To Avoid Cost of Opening Database, One can Use Helper Processes that Always Keep Database Open and Outlive Web Connection Newly Invoked CGI Scripts Connect to a Preexisting Helper Process System is Still Stateless DBMS Helper Processes CGI Script or JDBC Invocation Web Server Internet Browser
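
A minimal sketch of the helper-process idea, assuming an in-process object stands in for the long-lived helper and Python's built-in sqlite3 module stands in for the DBMS. DbHelper, handle_request, and the items table are hypothetical names used only for illustration.

```python
import sqlite3
from typing import List, Optional, Sequence, Tuple


class DbHelper:
    """Long-lived helper: opens the database once and keeps it open."""

    def __init__(self, path: str = ":memory:") -> None:
        self.open_count = 0  # how many times the costly open was paid
        self._conn = self._open(path)

    def _open(self, path: str) -> sqlite3.Connection:
        self.open_count += 1
        return sqlite3.connect(path)

    def query(self, sql: str, params: Sequence = ()) -> List[Tuple]:
        return self._conn.execute(sql, params).fetchall()

    def close(self) -> None:
        self._conn.close()


def handle_request(helper: DbHelper, item_id: int) -> Optional[str]:
    """Short-lived, stateless request: borrows the helper's open connection."""
    rows = helper.query("SELECT name FROM items WHERE id = ?", (item_id,))
    return rows[0][0] if rows else None


if __name__ == "__main__":
    helper = DbHelper()
    helper.query("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    helper.query("INSERT INTO items VALUES (?, ?)", (1, "milk"))
    for _ in range(3):
        print(handle_request(helper, 1))
    print("database opens:", helper.open_count)
    helper.close()
```

Each request stays stateless, yet the database is opened only once for the helper's lifetime. In a real deployment the helper would be a separate process that newly invoked scripts connect to; that inter-process step is omitted here.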

106 DB-Internet Architecture
Figure labels: WWW Client (Netscape), WWW Client (Info. Explore), WWW Client (HotJava), Internet, HTTP Server, DBWeb Gateways, DBWeb Dispatcher. Web Server Passes URL to a DBWeb Gateway. DBWeb Gateway Asks DBWeb Dispatcher What DBWeb Server to Connect to. Servers Keep Database Open All the Time. Gateway Connects to Assigned Server. DBWeb Server Submits Query to DB, Accepts Results, Translates them to HTML (Generic Methods or User-Overridden), and Sends to Client.

107 Biomedical Architectures
Transcend Normal Two, Three, and Four Tier Solutions – Macro-Architecture An Architecture of Architectures! Need to Integrate Systems that are Themselves Multi-Tier and Distributed Need to Resolve Data Ownership Issues State of Connecticut Agencies Don’t Share Competing Hospitals Seek to Protect Market Share T1, T2, and Clinical Research Requires Interoperating Genomic Databases/Supercomputers Integration of De-identified Patient Data from Multiple Sources to Allow Sufficient Study Samples De-identified Data Repositories or Data Marts Dealing with Ownership Issues (DNA Research)

108 Consider Team Project Architecture
Patients Providers EMR PHR Web-Based Portal(XML + HL7) Open Source DB (XML or MySQL) Feedback Repository Education Materials Clinical Researchers

109 Internet and the Web A Major Opportunity for Business
A Global Marketplace Business Across State and Country Boundaries A Way of Extending Services Online Payment vs. VISA, Mastercard A Medium for Creation of New Services Publishers, Travel Agents, Teller, Virtual Yellow Pages, Online Auctions … A Boon for Academia Research Interactions and Collaborations Free Software for Classroom/Research Usage Opportunities for Exploration of Technologies in Student Projects What are Implications for BMI? Where is the Adv?

110 WWW: Three Market Segments
Server Corporate Network Business to Business Information sharing Ordering info./status Targeted electronic commerce Server Intranet Decision support Mfg.. System monitoring corporate repositories Workgroups Internet Corporate Network Server Internet Sales Marketing Information Services Server Provider Network Provider Network Exposure to Outside

111 Information Delivery Problems on the Net
Everyone can Publish Information on the Web Independently at Any Time Consequently, there is an Information Explosion Identifying Information Content More Difficult There are too Many Search Engines but too Few Capable of Returning High Quality Data Most Search Engines are Useful for Ad-hoc Searches but Awkward for Tracking Changes What are Information Delivery Issues for BMI? Publishing of Patient Education Materials Publishing of Provider Education Materials How Can Patients/Providers Find what they Need? How do they Know if it’s Relevant? Reputable? First Resource or Last Resource

112 Example Web Applications
Scenario 1: World Wide Wait A Major Event is Underway and the Latest, Up-to-the-Minute Results are Being Posted on the Web You Want to Monitor the Results for this Important Event, so you Fire up your Trusty Web Browser, Pointing at the Result Posting Site, and Wait, and Wait, and Wait … What is the Problem? The Scalability Problems are the Result of a Mismatch Between the Data Access Characteristics of the Application and the Technology Used to Implement the Application May not be Relevant to BMI: Hard to Apply Scenario
Events: national election or race, or Olympics. What is the problem? It could be a number of technical glitches: a congested network, an overloaded server, or even a crashed server. In a larger sense, the problem is one of scalability: the system cannot keep up with the heavy load caused by the transient surge in activity that occurs in such situations. Why? We argue that the scalability problems are the result of a mismatch between the data access characteristics of the application and the technology (in this case, HTTP) used to implement the application.

113 Example Web Applications
Scenario 2: Many Applications Today have the Need for Tracking Changes in Local and Remote Data Sources and Notifying Changes If Some Condition Over the Data Source(s) is Met To Monitor Changes on Web, You Need to Fire Your Trusty Web Browser from Time to Time, Cache the Most Recent Result, and Difference Manually Each Time You Poll the Data Source(s) Issue: Pure Pull is Not the Answer to All Problems BMI: If a Patient Enters Data that Sets off a Chain Reaction, how Can Provider be Notified and in Turn the Provider Notify the Patient (Bad Health Event)

114 What is the Problem? Applications are Asymmetric but the Web is Not
Computation Centric vs. Information Flow Centric Type of Asymmetry Network Asymmetry Satellite, CATV, Mobile Clients, Etc. Client to Server Ratio Too Many Clients can Swamp Servers Data Volume Mouse and Key Click vs. Content Delivery Update and Information Creation Clients Need to be Informed or Must Poll Clearly, for BMI, Simple Web Environment/Browser is Not Sufficient – No Auto-Notification

115 What are Information Delivery Styles?
Pull-Based System Transfer of Data from Server to Client is Initiated by a Client Pull Clients Determine when to Get Information Potential for Information to be Old Unless Client Periodically Pulls Push-Based System Transfer of Data from Server to Client is Initiated by a Server Push Clients may get Overloaded if Push is Too Frequent Hybrid Pull and Push Combined Pull First and then Push Continually
Pull-based system: the transfer of data from server to client is initiated by a client pull; the standard way of doing business. Push-based system: assumes the server will know what the client will want; the server could learn access patterns, or clients could send profiles to servers.

116 Publish/Subscribe Semantics: Servers Publish/Clients Subscribe
Servers Publish Information Online Clients Subscribe to the Information of Interest (Subscription-based Information Delivery) Data Flow is Initiated by the Data Sources (Servers) and is Aperiodic Danger: Subscriptions can Lead to Other Unwanted Subscriptions Applications Unicast: Database Triggers and Active Databases 1-to-n: Online News Groups May work for Clinical Researcher to Provider Push
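A minimal sketch of publish/subscribe delivery, assuming a single in-process hub. The `Broker` class, the topic names, and the message text are hypothetical and only illustrate how the data flow is started by the publisher rather than by a client pull.

```python
from collections import defaultdict


class Broker:
    """Minimal in-process publish/subscribe hub (hypothetical, for illustration)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Server-initiated (push) delivery to every subscriber of the topic.
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._subscribers[topic])


provider_inbox = []
broker = Broker()
broker.subscribe("study-results", provider_inbox.append)

delivered = broker.publish("study-results", "New guideline available")
ignored = broker.publish("unrelated-topic", "No one is listening")
```

A provider who subscribed to a topic receives the researcher's message without polling, and messages on other topics never reach that inbox.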

117 Design Options for Nodes
Three Types of Nodes: Data Sources Provide Base Data which is to be Disseminated Clients Who are the Net Consumers of the Information Information Brokers Acquire Information from Other Data Sources, Add Value to that Information and then Distribute this Information to Other Consumers By Creating a Hierarchy of Brokers, Information Delivery can be Tailored to the Need of Many Users Brokers may be Ideal Intermediaries for BMI! Act on Behalf of Patients, Providers Incorporate Secure Access

118 Research Challenges Ubiquitous/Pervasive Inherent Complexity:
Coping with Latency (Sometimes Unpredictable) Failure Detection and Recovery (Partial Failure) Concurrency, Load Balancing, Availability, Scale Service Partitioning Ordering of Distributed Events “Accidental” Complexity: Heterogeneity: Beyond the Local Case: Platform, Protocol, Plus All Local Heterogeneity in Spades. Autonomy: Change and Evolve Autonomously Tool Deficiencies: Language Support (Sockets,rpc), Debugging, Etc. Ubiquitous/Pervasive Many computers and information appliances everywhere, networked together Heterogeneous Different Platforms Different APIs Different Access Capabilities …... Autonomous Change and evolve independently A big question: how to build a system that scales up w.r.t. performance functionality coverage of services number of users

119 Infosphere Clean, Reliable, Timely Information, Anywhere
Problem: too many sources,too much information Internet: Information Jungle Clean, Reliable, Timely Information, Anywhere Digital Earth Sensors Personalized Filtering & Info. Delivery Infopipes Resource Adaptation Property Mgmt Information Quality Continual Queries Microfeedback specialization Problem statement: too much information from too many heterogeneous sources, insufficient analysis, inadequate resource allocation for processing, transmission, filtering, and presentation. And the situation will get much, much worse due to the technology push. Technology push assumptions (what is already happening): (1) Plenty of affordable CPU, memory, storage, and network bandwidth. (2) Massively parallel generation of information content on the Internet and from new generation of sensors (e.g., Digital Earth). (3) Small, handy devices to access information (information appliances, or appliances with information access - called Infotaps in the proposal). Expedition Goal statement: clean, reliable, timely information, anywhere. (The right information to the right person at the right time.) Hypothesis: the missing link from assumptions to the goal is the proper information flow provided by Infopipes. Approach: construction of Infopipes and experimental demonstration with focus on the timely delivery of high quality fresh information.

120 Current State-of-Art Web Server Mainframe Database Server Thin Client

121 Infosphere Scenario – for BMI
Variety of Servers Infotaps & Fat Clients Sensors Many sources Database Server

122 Heterogeneity and Autonomy
How Much can we Really Integrate? Syntactic Integration Different Formats and Models Web/SQL Query Languages Semantic Interoperability Basic Research on Ontology, Etc. Autonomy No Central DBA on the Net Independent Evolution of Schema and Content Interoperation is Voluntary Interface Technology (Support for ISVs) DCOM: Microsoft Standard CORBA, Etc...

123 Security and Data Quality
System Security in the Broad Sense Attacks: Penetrations, Denial of Service System (and Information) Survivability Security Fault Tolerance Replication for Performance, Availability, and Survivability Data Quality Web Data Quality Problems Local Updates with Global Effects Unchecked Redundancy (Mutual Copying) Registration of Unchecked Information Spam on the Rise

124 Legacy Data Challenge Legacy Applications and Data
Definition: Important and Difficult to Replace Typically, Mainframe Mission Critical Code Most are OLTP and Database Applications Evolution of Legacy Databases Client-server Architectures Wrappers Expensive and Gradual in Any Case

125 Potential Value Added/Jumping on Bandwagon
Sophisticated Query Capability Combining SQL with Keyword Queries Consistent Updates Atomic Transactions and Beyond But Everything has to be in a Database! Only If we Stick with Classic DB Assumptions Relaxing DB Assumptions Interoperable Query Processing Extended Transaction Updates Commodities DB Software A Little Help is Still Good If it is Cheap Internet Facilitates Software Distribution Databases as Middleware

126 Data Warehousing and Data Mining
Provide Access to Data for Complex Analysis, Knowledge Discovery, and Decision Making Underlying Infrastructure in Support of Mining Provides Means to Interact with Multiple DBs OLAP (On-Line Analytical Processing) vs. OLTP Data Mining Discovery of Information in Vast Data Sets Search for Patterns and Common Features Discover Information not Previously Known Medical Records Accessible Nationwide Research/Discover Cures for Rare Diseases Relies on Knowledge Discovery in DBs (KDD)

127 Data Warehousing and OLAP
A Data Warehouse Database is Maintained Separately from an Operational Database “A Subject-Oriented, Integrated, Time-Variant, and Non-Volatile Collection of Data in Support of Management’s Decision Making Process” [W.H. Inmon] OLAP (On-Line Analytical Processing) Analysis of Complex Data in the Warehouse Attempt to Attain “Value” through Analysis Relies on Trained, Skilled Knowledge Workers who Discover Information Data Mart Organized Data for a Subset of an Organization Establish De-Identified Marts for BMI Research

128 Building a Data Warehouse
Option 1 Leverage Existing Repositories Collate and Collect May Not Capture All Relevant Data Option 2 Start from Scratch Utilize Underlying Corporate Data Corporate data warehouse Option 1: Consolidate Data Marts Option 2: Build from scratch Data warehousing is a process of constructing and using data warehouses Data Mart Data Mart Data Mart Data Mart ... Corporate data

129 BMI – Partition/Excerpt Data Warehouse
Clinical and Epidemiological Research (and for T2 and T1) Each Study Submitted to Institutional Review Board (IRB) For Human Subjects (Assess Risks, Protect Privacy) See: To Satisfy IRB (and Privacy, Security, etc.), Reverse Process to Create a Data Mart for each Approved Study Export/Excerpt Study Data from Warehouse May be Single or Multiple Sources BMI data warehouse Data warehousing is a process of constructing and using data warehouses Data Mart Data Mart Data Mart Data Mart ...

130 Data Warehouse Characteristics
Utilizes a “Multi-Dimensional” Data Model Warehouse Comprised of Store of Integrated Data from Multiple Sources Processed into Multi-Dimensional Model Warehouse Supports Time Series and Trend Analysis “Super-Excel” Integrated with DB Technologies Data is Less Volatile than Regular DB Doesn’t Dramatically Change Over Time Updates at Regular Intervals Specific Refresh Policy Regarding Some Data

131 Three Tier Architecture
monitor OLAP Server integrator External data sources Summarization report Extract Transform Load Refresh Operational databases Data Warehouse serve Query report Data mining metadata Data marts

132 Data Warehouse Design Most Data Warehouses use a Star Schema to Represent Multi-Dimensional Data Model Each Dimension is Represented by a Dimension Table that Provides its Multidimensional Coordinates and Stores Measures for those Coordinates A Fact Table Connects All Dimension Tables with a Multiple Join Each Tuple in Fact Table Represents the Content of One Dimension Each Tuple in the Fact Table Consists of a Pointer to Each of the Dimensional Tables Links Between the Fact Table and the Dimensional Tables Form a Shape Like a Star

133 What is a Multi-Dimensional Data Cube?
Representation of Information in Two or More Dimensions Typical Two-Dimensional - Spreadsheet In Practice, to Track Trends or Conduct Analysis, Three or More Dimensions are Useful For BMI – Axes for Diagnosis, Drug, Subject Age

134 Multi-Dimensional Schemas
Supporting Multi-Dimensional Schemas Requires Two Types of Tables: Dimension Table: Tuples of Attributes for Each Dimension Fact Table: Measured/Observed Variables with Pointers into Dimension Table Star Schema Characterizes Data Cubes by having a Single Fact Table and a Single Table for Each Dimension Snowflake Schema Dimension Tables from Star Schema are Organized into Hierarchy via Normalization Both Represent Storage Structures for Cubes
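A minimal sketch of a star schema, assuming SQLite as the storage engine. The table names, column names, and rows are hypothetical: one fact table holds the measures plus a key into each dimension table, and a join with aggregation answers a question along one dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_no INTEGER PRIMARY KEY, prod_name TEXT, category TEXT);
CREATE TABLE store   (store_id   INTEGER PRIMARY KEY, city TEXT, region TEXT);
-- Fact table: one row per observation, with a key into each dimension table.
CREATE TABLE sale_fact (
    product_no   INTEGER REFERENCES product(product_no),
    store_id     INTEGER REFERENCES store(store_id),
    unit_sales   INTEGER,
    dollar_sales REAL
);
INSERT INTO product VALUES (1, 'Pants', 'Clothing'), (2, 'Diapers', 'Baby');
INSERT INTO store   VALUES (10, 'Rolla', 'Central'), (20, 'New York', 'East');
INSERT INTO sale_fact VALUES (1, 10, 3, 60.0), (2, 10, 5, 50.0), (1, 20, 2, 40.0);
""")

# Join the fact table to a dimension and aggregate a measure by region.
sales_by_region = conn.execute("""
    SELECT s.region, SUM(f.dollar_sales)
    FROM sale_fact AS f JOIN store AS s ON f.store_id = s.store_id
    GROUP BY s.region ORDER BY s.region
""").fetchall()
```

A snowflake variant would normalize a dimension further, for example by moving `region` out of `store` into its own table.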

135 Example of Star Schema
Sale Fact Table: Date, Product, Store, Customer, Unit_Sales, Dollar_Sales
Product: ProductNo, ProdName, ProdDesc, Category
Customer: CustID, CustName, CustCity, CustCountry
Date: Date, Month, Year
Store: StoreID, City, State, Country, Region

136 Example of Star Schema for BMI
Patient Fact Table: Visit Date, Vitals, Symptoms, Patient, Medications, Etc.
Vitals: BP, Temp, Resp, HR (Pulse)
Patient: PatientID, PatientName, PatientCity, PatientCountry
Date: Date, Month, Year
Symptoms: Pulmonary, Heart, Mus-Skel, Skin, Digestive
Medications: Reference another Star Schema for all Meds

137 A Second Example of Star Schema …

138 and Corresponding Snowflake Schema

139 Data Warehouse Issues Data Acquisition
Extraction from Heterogeneous Sources Reformatted into Warehouse Context - Names, Meanings, Data Domains Must be Consistent Data Cleaning for Validity and Quality is the Data as Expected w.r.t. Content? Value? Transition of Data into Data Model of Warehouse Loading of Data into the Warehouse Other Issues Include: How Current is the Data? Frequency of Update? Availability of Warehouse? Dependencies of Data? Distribution, Replication, and Partitioning Needs? Loading Time (Clean, Format, Copy, Transmit, Index Creation, etc.)? For CTSA – Data Ownership (Competing Hosps).

140 Knowledge Discovery Data Warehousing Requires Knowledge Discovery to Organize/Extract Information Meaningfully Knowledge Discovery Technology to Extract Interesting Knowledge (Rules, Patterns, Regularities, Constraints) from a Vast Data Set Process of Non-trivial Extraction of Implicit, Previously Unknown, and Potentially Useful Information from Large Collection of Data Data Mining A Critical Step in the Knowledge Discovery Process Extracts Implicit Information from Large Data Set
The main steps in the KDD process: gathering data; cleansing the data and fitting it together; selecting the necessary data; crunching and squeezing the data to extract its essence; evaluating the output and using it

141 Steps in a KDD Process Learning the Application Domain (goals)
Gathering and Integrating Data Data Cleaning Data Integration Data Transformation/Consolidation Data Mining Choosing the Mining Method(s) and Algorithm(s) Mining: Search for Patterns or Rules of Interest Analysis and Evaluation of the Mining Results Use of Discovered Knowledge in Decision Making Important Caveats This is Not an Automated Process! Requires Significant Human Interaction!
Learning the application domain (goals); gathering and integrating data; cleaning and preprocessing data (often 60% effort); reducing and projecting data (finding useful features, dimensions/variable reduction …); choosing the mining algorithm(s); mining: search for patterns or rules of interest; analysis and evaluation of the mining results (visualization, alteration, removing redundant or uninteresting patterns/rules); use of discovered knowledge in decision making

142 OLAP Strategies OLAP Strategies Roll-Up: Summarization of Data
Drill-Down: from the General to Specific (Details) Pivot: Cross Tabulate the Data Cubes Slice and Dice: Projection Operations Across Dimensions Sorting: Ordering Result Sets Selection: Access by Value or Value Range Implementation Issues Persistent with Infrequent Updates (Loading) Optimization for Performance on Queries is More Complex - Across Multi-Dimensional Cubes Recovery Less Critical - Mostly Read Only Temporal Aspects of Data (Versions) Important
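A minimal sketch of roll-up, slice, dice, and drill-down over a tiny in-memory cube. The fact rows and the helper names `roll_up` and `dice` are hypothetical; a real OLAP server would run these operations against precomputed aggregates rather than a Python list.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, city, month, sales).
facts = [
    ("Electronics", "Atlanta", "Jan", 100),
    ("Electronics", "Atlanta", "Feb", 150),
    ("Electronics", "Savannah", "Jan", 80),
    ("Clothing", "Atlanta", "Jan", 60),
    ("Clothing", "Savannah", "Feb", 40),
]
DIMENSIONS = {"product": 0, "city": 1, "month": 2}


def roll_up(rows, keep):
    """Summarize sales over every dimension except those named in `keep`."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMENSIONS[d]] for d in keep)
        totals[key] += row[3]
    return dict(totals)


def dice(rows, **fixed):
    """Select rows whose dimensions equal the given values (a slice fixes one dimension)."""
    return [r for r in rows
            if all(r[DIMENSIONS[d]] == v for d, v in fixed.items())]


by_city = roll_up(facts, keep=["city"])                    # roll-up to city totals
atlanta_slice = dice(facts, city="Atlanta")                # slice on one dimension
atlanta_electronics = dice(facts, city="Atlanta", product="Electronics")  # dice
drill_down = roll_up(atlanta_electronics, keep=["month"])  # finer detail by month
```

Here drill-down is expressed as re-aggregating a diced subset at a finer grain, which is one simple way to read the operation.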

143 On-Line Analytical Processing
Data Cube A Multidimensional Array Each Attribute is a Dimension In Example Below, the Data Must be Interpreted so that it Can be Aggregated by Region/Product/Date
Product | Store | Date | Sale
acron | Rolla,MO | 7/3/ |
budwiser | LA,CA | /22/ |
large pants | NY,NY | /12/ |
3’ diaper | Cuba,MO | 7/30/ |
Cube Axes: Product (Pants, Diapers, Beer, Nuts) Region (West, East, Central, Mountain, South) Date (Jan, Feb, March, April)
Operations in OLAP: Slicing/Dicing, also called ranging; Pivoting; Drill-down/Roll-up; Add/drop a dimension, climb up/down the concept hierarchy. Further studies: Variables vs. dimensions; Sparse cubes; Implementation: relational tables vs. multidimensional cube; Efficient generation of cubes

144 On-Line Analytical Processing
For BMI – Imagine a Data Table with Patient Data Define Axis Summarize Data Create Perspective to Match Research Goal Essentially De-identified Data Mart
Patient | Med | BirthDat | Dosage
Steve | Lipitor | /1/ | mg
John | Zocor | /2/ | mg
Harry | Crestor | /3/ | mg
Lois | Lipitor | /4/ | mg
Charles | Crestor | /1/ | mg
Cube Axes: Medication (Lescol, Crestor, Zocor, Lipitor) Dosage (5, 10, 20, 40, 80) Decade (1940s, 1950s, 1960s, 1970s)

145 Examples of Data Mining
The Slicing Action A Vertical or Horizontal Slice Across Entire Cube Months Cities Products Sales Months Cities Products Sales Slice on city Atlanta N-dimensional data cube, each dimension denotes an attribute in a relation. A relation of N attributes will have an N-dimensional data cube. Multi-Dimensional Data Cube

146 Examples of Data Mining
The Dicing Action A Slice First Identifies One Dimension A Selection of Any Cube within the Slice which Essentially Constrains All Three Dimensions Months Cities Products Sales Products Sales Months Atlanta March 2000 Atlanta Electronics Dice on Electronics and Atlanta

147 Examples of Data Mining
Drill Down - Takes a Facet (e.g., Q1) and Decomposes into Finer Detail Jan Feb March Cities Products Sales Q1 Q2 Q3 Q4 Location (city, GA) Products Sales Drill down on Q1 Columbus Atlanta Gainesville Savannah California Arizona Georgia Iowa Q1 Q2 Q3 Q4 Roll Up on Location (State, USA) Products Sales Roll Up: Combines Multiple Dimensions From Individual Cities to State

148 Mining Other Types of Data
Analysis and Access Dramatically More Complicated! Time Series Data for Glucose, BP, Peak Flow, etc. Spatial databases Multimedia databases World Wide Web Time series data Geographical and Satellite Data

149 Advantages/Objectives of Data Mining
Descriptive Mining Discover and Describe General Properties 60% of People who Buy Beer on Friday also have Bought Nuts or Chips in the Past Three Months Predictive Mining Infer Interesting Properties based on Available Data People who Buy Beer on Friday usually also Buy Nuts or Chips Result of Mining Order from Chaos Mining Large Data Sets in Multiple Dimensions Allows Businesses, Individuals, etc. to Learn about Trends, Behavior, etc. Impact on Marketing Strategy

150 Data Mining Methods (1) Association
Discover the Frequency of Items Occurring Together in a Transaction or an Event Example 80% of Customers who Buy Milk also Buy Bread Hence - Bread and Milk Adjacent in Supermarket 50% of Customers Forget to Buy Milk/Soda/Drinks Hence - Available at Register Prediction Predicts Some Unknown or Missing Information based on Available Data Forecast Sale Value of Electronic Products for Next Quarter via Available Data from Past Three Quarters

151 Association Rules Motivated by Market Analysis Rules of the Form
Item1 ^ Item2 ^ … ^ Itemk → Itemk+1 ^ … ^ Itemn Example “Beer ^ Soft Drink → Pop Corn” Problem: Discovering All Interesting Association Rules in a Large Database is Difficult! Issues Interestingness Completeness Efficiency Basic Measurement for Association Rules Support of the Rule Confidence of the Rule
A market basket is a collection of items purchased by a customer in a single transaction. Retailers want to identify sets of items that are purchased together; this information can be used to improve the layout of goods in a store or the layout of catalogue pages. A → B: A and B are sets of items. If every item in A is purchased in a transaction, then it is likely that the items in B will also be purchased.
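A minimal sketch of the two basic measurements, using the standard definitions: the support of a rule A → B is the fraction of transactions that contain every item of A and B, and its confidence is that support divided by the support of A. The four transactions are hypothetical.

```python
transactions = [
    {"beer", "soft drink", "popcorn"},
    {"beer", "soft drink", "popcorn", "chips"},
    {"beer", "soft drink"},
    {"milk", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also hold the consequent."""
    return (support(set(antecedent) | set(consequent), baskets)
            / support(antecedent, baskets))


# Rule: Beer ^ Soft Drink -> Pop Corn
rule_support = support({"beer", "soft drink", "popcorn"}, transactions)
rule_confidence = confidence({"beer", "soft drink"}, {"popcorn"}, transactions)
```

Two of the four baskets contain all three items, so the support is 0.5. Three baskets contain beer and a soft drink, so the confidence is 2/3.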

152 Data Mining Methods (2) Classification
Determine the Class or Category of an Object based on its Properties Example Classify Companies based on the Final Sale Results in the Past Quarter Clustering Organize a Set of Multi-dimensional Data Objects in Groups to Minimize Inter-group Similarity and Maximize Intra-group Similarity Group Crime Locations to Find Distribution Patterns
Distance/Similarity - Object to object distance: Euclidean, Manhattan - Object to cluster distance: centroid, closest, furthest, average - Cluster to cluster distance: centroid to centroid, closest, furthest, average Cluster Number K - fixed vs. dynamic Cluster Validity Classification of clustering methods - Exclusive vs. Non-exclusive - Exclusive: Hierarchical vs. Partitional - Hierarchical: Agglomerative vs. Divisive
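A minimal sketch of the distance measures named in the clustering notes, assuming points are plain tuples of numbers. The two clusters are hypothetical, and only the centroid-to-centroid and closest-pair cluster distances are shown.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def closest_distance(c1, c2, dist=euclidean):
    """Cluster-to-cluster distance using the closest pair of points."""
    return min(dist(p, q) for p in c1 for q in c2)


cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 4.0), (5.0, 4.0)]

d_euclid = euclidean((0.0, 0.0), (3.0, 4.0))
d_manhattan = manhattan((0.0, 0.0), (3.0, 4.0))
d_centroids = euclidean(centroid(cluster_a), centroid(cluster_b))
d_closest = closest_distance(cluster_a, cluster_b)
```

The choice of measure changes which groups look similar: the same pair of points is 5 apart under Euclidean distance and 7 apart under Manhattan distance.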

153 Classification Two Stages
Learning Stage: Construction of a Classification Function or Model Classification Stage: Prediction of Classes of Objects Using the Function or Model Tools for Classification Decision Tree Bayesian Network Neural Network Regression Problem Given a Set of Objects whose Classes are Known (Training Set), Derive a Classification Model which can Correctly Classify Future Objects

154 An Example Attributes Class Attribute - Play/Don’t Play the Game
Training Set Values that Set the Condition for the Classification What are the Patterns Below?
Attribute | Possible Values
outlook | sunny, overcast, rain
temperature | continuous
humidity | continuous
windy | true, false
Outlook | Temperature | Humidity | Windy | Play
sunny | | | false | No
overcast | | | false | Yes
sunny | | | true | No
sunny | | | false | No
sunny | | | false | Yes
… | … | … | … | …
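A minimal sketch of the learning and classification stages, assuming a very simple model that predicts the majority class for each value of a single attribute. This is a stand-in for illustration, not a decision tree or any of the other tools listed. The training rows are hypothetical and use only the outlook and windy attributes.

```python
from collections import Counter, defaultdict

# Hypothetical training set: ((outlook, windy), play).
training_set = [
    (("sunny", False), "No"),
    (("sunny", True), "No"),
    (("overcast", False), "Yes"),
    (("overcast", True), "Yes"),
    (("rain", False), "Yes"),
    (("rain", False), "Yes"),
    (("rain", True), "No"),
]


def learn(examples, attribute_index):
    """Learning stage: map each attribute value to its majority class."""
    votes = defaultdict(Counter)
    for attributes, label in examples:
        votes[attributes[attribute_index]][label] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in votes.items()}


def classify(model, attributes, attribute_index, default="Yes"):
    """Classification stage: look up the class predicted for a new object."""
    return model.get(attributes[attribute_index], default)


model = learn(training_set, attribute_index=0)  # rule on "outlook"
prediction = classify(model, ("sunny", False), attribute_index=0)
```

The learned model is just a lookup table from outlook to class. A decision tree would extend the same idea by splitting on further attributes where a single one is not decisive.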

155 Data Mining Methods (3) Summarization
Characterization (Summarization) of General Features of Objects in the Target Class Example Characterize People’s Buying Patterns on the Weekend Potential Impact on “Sale Items” & “When Sales Start” Department Stores with Bonus Coupons Discrimination Comparison of General Features of Objects Between a Target Class and a Contrasting Class Comparing Students in Engineering and in Art Attempt to Arrive at Commonalities/Differences
Summarization - overview or general views of underlying data - Problem: Given a relational table, find aggregation (summary) of subsets. Two techniques: Attribute-oriented induction; On-line analytical processing. Generalization using Concept hierarchy (taxonomy)

156 Summarization Technique
Attribute-Oriented Induction Generalization using Concept hierarchy (Taxonomy) barcode category brand content size milk Dairyland Skim L food mechanical MotorCraft valve 23a 12in … … … … Milk … bread Skim milk … 2% milk White whole bread … wheat User control Subset selection (e.g., using SQL) Attribute threshold the max number of values Attribute level Availability of concept hierarchy Provided by users, domain experts, knowledge engineers Extract from database schema Generate automatically Operation Generalization - hill climbing Specialization Category Content Count Lucern … Dairyland milk skim milk % … … Wonder … Safeway
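A minimal sketch of attribute-oriented induction, assuming a one-level concept hierarchy on the content attribute. The hierarchy, the product rows, and the `generalize` helper are hypothetical: each specific value climbs to its parent concept, the brand attribute is dropped, and identical generalized tuples are merged with a count.

```python
from collections import Counter

# Hypothetical concept hierarchy: specific content -> more general concept.
concept_hierarchy = {
    "skim milk": "milk",
    "2% milk": "milk",
    "white bread": "bread",
    "wheat bread": "bread",
}

# Hypothetical (category, brand, content) tuples from a product table.
rows = [
    ("food", "Dairyland", "skim milk"),
    ("food", "Safeway", "2% milk"),
    ("food", "Wonder", "white bread"),
    ("food", "Wonder", "wheat bread"),
    ("food", "Dairyland", "2% milk"),
]


def generalize(rows, hierarchy):
    """Climb one level of the hierarchy on `content`, drop `brand`, and count."""
    counts = Counter()
    for category, _brand, content in rows:
        counts[(category, hierarchy.get(content, content))] += 1
    return dict(counts)


summary = generalize(rows, concept_hierarchy)
```

An attribute threshold would decide when to stop climbing: generalization continues until the number of distinct values for the attribute falls to the allowed maximum.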

157 Why is Data Mining Popular?
Technology Push Technology for Collecting Large Quantity of Data Bar Code, Scanners, Satellites, Cameras Technology for Storing Large Collection of Data Databases, Data Warehouses Variety of Data Repositories, such as Virtual Worlds, Digital Media, World Wide Web Corporations want to Improve Direct Marketing and Promotions - Driving Technology Advances Targeted Marketing by Age, Region, Income, etc. Exploiting User Preferences/Customized Shopping What is Potential for BMI? How do you see Data Mining Utilized? What are Key Issues to Worry About?

158 Requirements & Challenges in Data Mining
Security and Social What Information is Available to Mine? Preferences via Store Cards/Web Purchases What is Your Comfort Level with Trends? User Interfaces and Visualization What Tools Must be Provided for End Users of Data Mining Systems? How are Results for Multi-Dimensional Data Displayed? Performance Guarantees Range from Real-Time for Some Queries to Long-Term for Other Queries Data Sources of Complex Data Types or Unstructured Data - Ability to Format, Clean, and Load Data Sets

159 Robert H. Aseltine, Jr., Ph.D. Cal Collins January 16, 2008
An Initiative of the University of Connecticut Center for Public Health and Health Policy

160 What is CHIN? State of Connecticut Agencies Collect and Maintain Data in Separate Databases such as: Vital Statistics: Birth, Death (DPH) Surveillance data: Lead Screening and Immunization Registries (DPH) Administrative services: LINK system (DCF), CAMRIS (DMR) Benefit programs: WIC (DPH), Medicaid (DSS) Educational achievement: (PSIS) Such Data is Un-Integrated Impossible to Track and Assess Target Populations Difficult to Develop Evidence-Based Practices Limits Meaningful Interactions Among State Agencies

161 What Do We Mean by “Integration?”
UCONN Health Center Low Birth Weight Infant Registry
Last Name | First Name | DOB | SSN | Birth Wt. (kg)
Appel | April | 01/01/1999 | | 2.8
Berry | John | 02/02/1997 | | 2.9
Carat | Colleen | 03/03/1993 | | 1.9
Ernst | Max | 04/04/1994 | | 2.7
Gomez | Gloria | 05/05/1995 | | 2.6
Hurst | William | 06/06/1996 | | 3.1
Keller | Helene | 07/07/1997 | | 2.5
Martinez | Pedro | 08/08/1998 | | 3.0
Rodriguez | Felix | 09/09/1999 | |
Smith | Peggy | 10/10/2000 | |

Dept. of Mental Retardation Birth to Three System
Last Name | First Name | DOB | Street | Town
Allen | Gwen | 01/01/1999 | Apple | Enfie
Buck | Jerome | 07/01/1999 | Burbank | West
Cleary | Jane | 03/03/1993 | Cedar | Tolla
Dory | Daniel | | Dogfish | Hartf
Ernst | Max | 04/04/1994 | Elm |
Friday | Joe | 11/03/1999 | Fruit | Wind
Glenn | Valerie | 03/23/1998 | Glen | Branf
Martinez | Pedro | 08/08/1998 | High |
Riley | Lily | 03/03/1996 | Ipswich | Bridg
Sanchez | Ramon | | Juniper | New

CT Dept. of Education PSIS System
Last Name | First Name | CMT Math | Polio Vac Date | Days in Attendance
Appel | April | 134 | 01/05/1999 | 179
Carat | Colleen | 256 | 05/01/1998 | 122
Cleary | Jane | 268 | 01/28/2000 | 178
Ernst | Max | 152 | 01/09/ | 145
Gomez | Gloria | 289 | 01/01/ | 168
Friday | Joe | 265 | 10/01/ | 170
Keller | Helene | 309 | 11/01/2001 | 180
Martinez | Pedro | 248 | 12/01/2003 |
Riley | Lily | 201 | |
Sanchez | Ramon | 249 | | 159

Integrated Records
Last Name | First Name | DOB | SSN | Birth Wt. | Street | Town | CMT Math Grade 3 | Polio Vaccination Date | Days in Attendance
Ernst | Max | 04/04/1994 | | 2.7 | Elm | Enfield | 152 | 01/09/1999 | 145
Martinez | Pedro | 08/08/1998 | | 3.0 | High | Hartford | 248 | 12/01/2003 | 180
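A minimal sketch of what integration means here, assuming records are matched on exact last and first name. That matching rule is a simplification, and the `integrate` helper is hypothetical. The dictionaries hold a few rows excerpted from the three agency tables; only people present in every source appear in the merged result.

```python
# Excerpts of the three agency tables, keyed by (last name, first name).
registry = {
    ("Appel", "April"): {"dob": "01/01/1999", "birth_wt_kg": 2.8},
    ("Ernst", "Max"): {"dob": "04/04/1994", "birth_wt_kg": 2.7},
    ("Martinez", "Pedro"): {"dob": "08/08/1998", "birth_wt_kg": 3.0},
}
birth_to_three = {
    ("Allen", "Gwen"): {"street": "Apple"},
    ("Ernst", "Max"): {"street": "Elm"},
    ("Martinez", "Pedro"): {"street": "High"},
}
psis = {
    ("Appel", "April"): {"cmt_math": 134},
    ("Ernst", "Max"): {"cmt_math": 152},
    ("Martinez", "Pedro"): {"cmt_math": 248},
}


def integrate(*sources):
    """Inner join: keep only people present in every source and merge their fields."""
    shared = set(sources[0]).intersection(*sources[1:])
    merged = {}
    for key in sorted(shared):
        record = {}
        for source in sources:
            record.update(source[key])
        merged[key] = record
    return merged


integrated = integrate(registry, birth_to_three, psis)
```

Exact name matching breaks down on misspellings and shared names, which is why unique identification of individuals is listed among the key challenges.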

162 Key Challenges to Integrating Data
Security and Privacy HIPAA FERPA WIC, Social Security (Medicaid/Medicare) regulations State statutes Alteration/disruption of business practices Unique identification of individuals/cases Accuracy and reliability of data Disparate hardware/software platforms

164 The Solution: CHIN Connecticut Health Information Network
A Federated Network That: Allows Shared Access to “Health”-related Data From Heterogeneous Databases Allows Agencies to Retain Complete Control Over Access to Data Has Minimal Impact on Business Practices Complies with Security and Privacy Statutes Incorporates Cutting-edge Approaches to Case Matching Partnership of: Early Partners: DPH, DCF, DDS, DoE, DOIT, UConn, Akaza Research

165 CHIN Processes and Components
Define data elements in CHIN Map data elements to source database Publish “metadata” to CHIN with security and privacy rules CHIN Metadata Registry and CHIN Trusted Broker CHIN Metadata Registry CHIN Contributor Query Execution: Identifier Matching and Data Merge Review Committee Approval Build Query CHIN Enterprise Administration CHIN Metadata Registry and CHIN Query Builder CHIN GRID and Trusted Broker De-identify Data Integrated, De-identified Data CHIN Trusted Broker and De-Identification Engine

166 Original CHIN Architecture

167 Second CHIN Architecture: User Side
& Contributor Contributor

168 Second CHIN Architecture: Contributor Side
Front End Trusted Broker A &

169 Current CHIN Architecture

170 CHIN Architecture: Standards-based
All data is mapped to Health Level Seven’s Clinical Document Architecture (CDA) in XML. Health Level Seven (HL7) is an ANSI-approved Standards Developing Organization. HL7 has its own XML Special Interest Group, responsible for developing XML implementations of its standards. HL7 is also an active participant in W3C, the organization responsible for the development of XML. CDA was approved as an ANSI standard in November of 2000. Component Architecture communicates via Web Services and OGSA Grid standards.

171 CHIN Arch.: Proven, Open Components
Components are based on open-source libraries. The grid-based servers Mako and Virtual Mako are part of the Mobius Project from Ohio State University’s Dept. of BioInformatics. The translation tools to get data into XML are provided by the XQuare and XBridge projects, hosted on the ObjectWeb website, an open source middleware community. The algorithm and code for identity management is FEBRL, Freely Extensible Biomedical Record Linkage, which was developed at Australian National University. NuSOAP Web Services Engine for component integration.

172 FEBRL Identifier matching in FEBRL proceeds in four steps:
Data cleansing and standardization: Removes, to the degree possible, string discrepancies based on common misspellings, extra white space, or misplaced name or address components. Indexing: Reduces the number of record comparisons that must be performed, for scalability; blocking, sorting, and bigram indexing methods are all supported. Record comparison: Conducted using an arbitrary composition of exact or inexact string comparison methods over any combination of fields. Classification: Follows the Fellegi-Sunter [34] model, with record pairs assigned a weight based on a palette of probabilities and matches determined based on the record pair weights.
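The four steps can be illustrated with a small, self-contained sketch. This is not FEBRL's API: every function name, the Dice-style bigram score, the 0.8 agreement threshold, and the m/u probabilities are assumptions chosen for illustration only.

```python
import math
from collections import defaultdict


def clean(text):
    """Step 1 (cleansing): collapse extra white space and lower-case."""
    return " ".join(text.split()).lower()


def bigrams(text):
    """Character bigrams of a cleaned string."""
    cleaned = clean(text)
    return [cleaned[i:i + 2] for i in range(len(cleaned) - 1)]


def blocking_pairs(records_a, records_b, block_key):
    """Step 2 (indexing): only compare records that share a blocking key."""
    index = defaultdict(list)
    for rec in records_b:
        index[block_key(rec)].append(rec)
    for rec in records_a:
        for candidate in index.get(block_key(rec), []):
            yield rec, candidate


def bigram_similarity(a, b):
    """Step 3 (comparison): Dice coefficient over bigrams, in [0, 1]."""
    grams_a, grams_b = bigrams(a), bigrams(b)
    if not grams_a or not grams_b:
        # Strings shorter than two characters: fall back to exact equality.
        return 1.0 if clean(a) == clean(b) else 0.0
    unmatched = list(grams_b)
    common = 0
    for gram in grams_a:
        if gram in unmatched:
            unmatched.remove(gram)
            common += 1
    return 2.0 * common / (len(grams_a) + len(grams_b))


def field_weight(agrees, m, u):
    """Fellegi-Sunter log2 weight for one field.

    m = P(field agrees | true match), u = P(field agrees | non-match).
    """
    if agrees:
        return math.log2(m / u)
    return math.log2((1.0 - m) / (1.0 - u))


def classify_pair(rec_a, rec_b, params, upper, lower, threshold=0.8):
    """Step 4 (classification): sum field weights, compare to two cut-offs."""
    total = 0.0
    for field, (m, u) in params.items():
        agrees = bigram_similarity(rec_a[field], rec_b[field]) >= threshold
        total += field_weight(agrees, m, u)
    if total >= upper:
        return "match", total
    if total <= lower:
        return "non-match", total
    return "possible match", total
```

Pairs whose total weight falls between the two cut-offs are left as "possible match", the band that the Fellegi-Sunter model reserves for manual review.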

173 FEBRL The current prototype uses FEBRL to implement a simplistic method of linkage whereby record pairs are declared a match if the first and last name are exactly equal. Next Steps: Evaluate the accuracy of linking records over a rubric of five data fields: first name, last name, date of birth, social security number, and gender. Exact and inexact matching (i.e., misspellings and slight discrepancies), including experimental variations of the service based on the blinded bigram matching algorithm. Assess false positives and false negatives produced by each palette of field comparison algorithms. Evaluate the accuracy of linking records using fabricated data sets with characteristics similar to real datasets. Experiment with variations of the canopy cluster matching algorithm.
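A minimal sketch of the prototype's exact-name rule, plus the false-positive and false-negative counts named in the next steps. The record layout (`id`, `first_name`, `last_name`) and all function names are hypothetical and are not taken from the CHIN prototype or from FEBRL.

```python
def exact_name_match(rec_a, rec_b):
    """Declare a match only if first and last name are exactly equal."""
    return (rec_a["first_name"] == rec_b["first_name"]
            and rec_a["last_name"] == rec_b["last_name"])


def link_records(source_a, source_b):
    """Compare every pair across two sources; return matched id pairs."""
    return {
        (a["id"], b["id"])
        for a in source_a
        for b in source_b
        if exact_name_match(a, b)
    }


def error_counts(predicted, truth):
    """Count linkage errors against a known set of true id pairs."""
    return {
        "false_positives": len(predicted - truth),
        "false_negatives": len(truth - predicted),
    }


source_a = [
    {"id": "a1", "first_name": "Ann", "last_name": "Smith"},
    {"id": "a2", "first_name": "Jon", "last_name": "Doe"},
]
source_b = [
    {"id": "b1", "first_name": "Ann", "last_name": "Smith"},
    {"id": "b2", "first_name": "John", "last_name": "Doe"},
]
predicted = link_records(source_a, source_b)
truth = {("a1", "b1"), ("a2", "b2")}
counts = error_counts(predicted, truth)
```

In this fabricated data the "Jon"/"John" pair is a true match that the exact rule misses, which is the kind of false negative that inexact matching is meant to reduce.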

174 Other CHIN Issues Why Choose an Open Architecture?
Increased Accountability Plenty of Documentation and Research Greater Transparency Ease of Installation, Maintenance, Dissemination How is Data Ported into CHIN? CHIN is based on a Grid, with each organization supporting its own data through a Contributor server Agency staff has complete control over access to data on CHIN by other users Only one server faces the outside network

175 Creating a Contributor Server
Contributor Server Contains: XML generated files; Mako service; Java files; *.xqy files; XML files to generate CDA-compliant files. [Diagram labels: Datasource, Generate XML, Firewall, SSL, Data Elements, Published to MDR, External IP Address, Connection to CHIN Trusted Broker]

176 Connecting to rest of Network
Metadata Registry takes information: About data elements; About data security; Datasource information. Contributor profile is registered with CHIN Network Admin. Contributor Server Contains: XML generated files; Mako service; Java files; *.xqy files; XML files to generate CDA-compliant files. [Diagram labels: Datasource, Generate XML, Firewall, SSL, Data Elements, Published to MDR, Access to CHIN, External IP Address, Connection to CHIN Trusted Broker]

177 How do we get data out? The Trusted Broker component:
Pulls XML from the Virtual Mako which reaches out to all Contributors Compares records from different Contributors using FEBRL De-identifies data sets to generate a final data set for Investigators The Front End component: Provides a central place for users to connect to the system Connects to the Metadata Registry and the Trusted Broker via Web Services calls Allows different users of the system to perform different actions
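The Trusted Broker's final step, de-identifying linked records, might look like the sketch below. This is not the CHIN De-Identification Engine: the list of direct identifiers, the field names, and the keyed-hash pseudonym scheme are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical list of fields treated as direct identifiers.
DIRECT_IDENTIFIERS = {"first_name", "last_name", "ssn", "date_of_birth"}


def pseudonym(linked_id, secret_key):
    """Derive a stable study pseudonym from a linked identity with HMAC-SHA256."""
    digest = hmac.new(secret_key, linked_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record, linked_id, secret_key):
    """Drop direct identifiers and attach a pseudonym in their place."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["subject"] = pseudonym(linked_id, secret_key)
    return cleaned


key = b"example-key-held-only-by-the-broker"
rec_from_agency_1 = {"first_name": "Ann", "last_name": "Smith",
                     "ssn": "000-00-0000", "diagnosis": "CHF"}
rec_from_agency_2 = {"first_name": "Ann", "last_name": "Smith",
                     "date_of_birth": "1970-01-01", "program": "WIC"}

# Both records were linked to the same person, so they share a linked id.
out_1 = deidentify(rec_from_agency_1, "person-17", key)
out_2 = deidentify(rec_from_agency_2, "person-17", key)
```

Because the pseudonym depends only on the linked identity and the broker's key, records from different Contributors that were matched to the same person stay joinable in the final data set without exposing names or numbers.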

178 Getting Data from CHIN

179 Getting Data From CHIN
CHIN also contains: A Front-end server to take queries A Trusted Broker to compare data, perform record linkage, and de-identify results [Diagram labels: XML Files, FEBRL, Result Set, Deidentify, Final Result Set]

180 Progress to Date Needs assessment completed
Technical and functional specifications identified MOUs with state agencies Expanding list of partners Prototype developed Funding for Model Network Development/Deployment/Evaluation 2008

181 Demo

182 EMR Architectures Provider-Based Systems have Two Variants
All Data In House Larger Providers (Clinics) Control All Own Data Sizeable IT Staff for 24/7 Operations Control of Own Backups Limited In House – Off Site Storage (Larger, Multi-Site Practices) Smaller Providers – Limited IT Staff Desire Out-of-Box Solution Local Data for Ease of Access Remote Storage – Promotes Off-Hours Access Even 1st Variant – Service for “Backups”

183 EMR for Large Providers - Allscripts

184 EMR for Smaller Providers
Provider’s Office Vendor’s Location Server/Data Farm Local EMR Local EMR Patient Data Remote EMR Remote Access

185 Integrating Clinical Repositories
Provider/Hospital Relationship Provider has Privileges at Hospital Provider Chooses Office-Based EMR More Easily Integrated with Hospital EMR Emerging at Community Hospital Level Example: Milford Hospital, MA All Area Providers with Privileges Linked in Ability to See Patient Records, Tests, at Hospital Unclear on Uploads from Providers to Hospital However, No Link to UMass Medical Center (with which Milford Hospital is Affiliated)

186 Integrating Clinical Repositories
CTSA – Region-Wide Clinical/Translational Research Target Area Hospitals: St. Francis, Hartford, Hosp. of Central CT, CCMC Each Hospital has Own Clinical Repository (EMR) For Wider-Scoped T1, T2, and Clinical Research Need to Integrate these Repositories at Some Level What is Most Practical? Setting up Centralized De-Identified Repository? Creating Data Marts as you go? What are Pros and Cons of Each? Researcher Seeking CHF Patient Data Needs to have De-Identified Data Mart

187 Integrating Clinical Repositories

188 Integrating Clinical Repositories

189 Integrating Clinical Repositories

190 Integrating Clinical Repositories
NHIN Prototype Phase I

191 Integrating Clinical Repositories
NHIN Prototype Phase II

192

193 Personal Health Record Integration

194 Concluding Remarks Only Scratched Surface on Architectures
Micro Architectures Macro Architectures Super-Macro Architectures (We’ll see …) What are Key Facets in the Discussion? Role and Impact of Standards Open Solutions Architectural Variants – Reuse “Architecture” Can we Reuse CHIN for Clinical Practice? Are All Contributors Simply Each Hospital and EHR? How do we Connect all of the Pieces? What are Next Steps? Let’s Review Some other Work Source: Wide Range of Presentations on Web

