Virginia’s Longitudinal Data System


1 Virginia’s Longitudinal Data System
A Federated Approach to Longitudinal Data April 4th, 2011

2 Agenda
The Challenge
Virginia’s Approach
Best Practice and SME Findings
Design Considerations
Proposed Solution
Summary

3 The Challenge
To develop a Statewide Longitudinal Data System (SLDS) that, without violating privacy policies or law, lets users query, link, download and create reports from record-level or aggregate data across one or more agencies.
Because of existing Commonwealth law, the SLDS could not be based on an underlying data warehouse.
De-identified data may be merged when a viable reason exists. However, the use of persistent, de-identified, linked (merged) data was determined to be highly inefficient and raised political issues which could have endangered the project.
Also contains a BI capability for DOE reports. December 13, 2010

4 Virginia’s Approach
Virginia undertook a comprehensive investigation of best practices, and consulted subject matter experts, to determine the feasibility of a federated data model.
Between October and December 2010, the Center for Innovative Technology (CIT), the Virginia Information Technologies Agency (VITA) and the Department of Education (DOE) interviewed six best practice organizations and ten subject matter experts.
Those findings led to an SLDS Technical Architecture which fulfilled the objective of the grant while adhering to the Commonwealth’s privacy constraints.

5 Significant Findings
From Best Practice Interviews and Subject Matter Expert Interviews:
Stakeholder Management
Federated Systems Perform Poorly
Data Governance
Use of Commercial Solutions
Leveraging Existing Systems
Use of Multiple Hash Keys
Requirements Drive System Architecture
Clearly Defined Security Policies

6 Important Design Considerations
User friendly
Maximize use of existing technologies/solutions
Minimize sustainment costs
Record-level data queries were not time sensitive
Strong central security model

7 The Solution
A federated data model and technical architecture comprising a web-based user interface (UI), a query/linking engine, a multi-level security module, a rich business intelligence (BI) capability, a Lexicon and an integrated workflow.
Components: Data, Security, SLDS Portal, Reporting, Workflow, Lexicon, Shaker

9 Conceptual Portal
SLDS Portal: Security, Reporting, Workflow, Lexicon

10 Portal Components
Shaker
  Distributed Query Engine (DQE)
  For use by Agency employees and named users
Reports
  Public Facing: Aggregated Data
  Named Users: Query Building Tool (QBT)
Lexicon
Workflow
  Account request
  Data request

11 Portal Features (Public Facing)
Aggregated Data Reports
Lexicon
Links to Agency reports
Help Files
FAQs
Request for Named User Account

12 Portal Features (Named Users)
Help / Training
Reports
  Non-suppressed aggregated data
  Query Building Tool (QBT)
Lexicon
Workflow
  Account and Data request
  Data retrieval
  File Attachment for uploading NDAs, etc.
  Ability to check status, modify or cancel account and/or data request
Password reset

14 Security Overview
User types: Anonymous, Named, Schools, Researchers, Agency Employees, System Admin
Access areas: Aggregated Data (Suppressed), Aggregated Data (Non-Suppressed), Unit Record Level Data, Account Management, Portal Components

15 Security
Authentication
Authorization: Role Based, Permission
Viewing: Suppressed Data, Non-Suppressed Data
Viewing, Editing
Database, Table, Column

17 Workflow

19 Reporting: Record Level Linked Data
Diagram components: Lexicon; Shell Database; Ad Hoc Metadata; Report Creation (Ad Hoc interface); Approval; Shaker; Source Data (DOE, SCHEV, VEC); Results; Query Results
Shell Database: Instantiates the information contained in the Lexicon. Contains dummy data.
Report Creation: The report link will display the report with dummy data. The report will have a button that allows submission of the report to workflow.
Shaker: The distributed query engine generates queries to each of the source data systems and joins the result sets. The engine will interact with the Lexicon.
Query Results: Options for report display include a Logi Analysis Grid (depending on the number of records returned) or a link to download a file. Access may be provided through the Ad Hoc report portal.

20 Reporting: Aggregate Linked Data
Diagram components: Source Data (DOE, SCHEV, VEC); ETL; Aggregate Linked Data; Record Level Linked Data; Prebuilt Reports; Public Reports; SLDS Portal; User. Connections: HTTP, Direct DB Connection
Prebuilt Reports: There will be prebuilt reports for linked data from the different sources (e.g., DOE to SCHEV, SCHEV to VEC). The prebuilt reports may provide the user with some capabilities to perform analysis on the data (e.g., crosstabbing, grouping, filtering, etc.).
ETL: The ETL process will periodically pull source data and load aggregate data tables. The tool used for the ETL process may be SSIS or LogiETL.
Aggregate Linked Data: Data access is through Stored Procedures, which will handle data suppression.
Portal: Prebuilt Reports will be displayed within iFrames in the Portal.
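
The slide says stored procedures will handle data suppression but does not state the rule. As a minimal sketch, assuming a simple small-cell rule in which counts below a threshold are masked, the idea could be expressed in Python as follows. The threshold of 10, the function names and the sample rows are hypothetical and not taken from the presentation.

```python
from typing import List, Optional, Tuple

# Hypothetical threshold: the presentation does not state the real rule.
MIN_CELL_SIZE = 10


def suppress(count: int, min_cell_size: int = MIN_CELL_SIZE) -> Optional[int]:
    """Return the count, or None when it is too small to publish."""
    return count if count >= min_cell_size else None


def suppress_rows(
    rows: List[Tuple[str, int]], min_cell_size: int = MIN_CELL_SIZE
) -> List[Tuple[str, Optional[int]]]:
    """Apply small-cell suppression to (group, count) rows."""
    return [(group, suppress(count, min_cell_size)) for group, count in rows]


# Made-up aggregate rows, for illustration only.
rows = [("Region A", 125), ("Region B", 4), ("Region C", 10)]
suppressed = suppress_rows(rows)
```

Public reports would read the suppressed rows, while named users with access to non-suppressed aggregates would read the original counts.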

22 Lexicon Defined
For Our Purposes: The Lexicon is an inventory of every available data field in every available data source, the structure of their storage, the possible values and meanings of the information stored, all possible transformations of each set of field values to another set of field values, methods of data source access, and matching algorithms and how they are to be used in conjunction with possible field value transformations.
Transformations & Matching Algorithms
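
To make the definition concrete, here is a minimal sketch of how one Lexicon entry might be modeled. It is an assumption for illustration, not the project's schema: the class names, the `DS1` source, the `last_name` field and the `normalized` transform are all hypothetical, and access methods and matching algorithms are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class LexiconField:
    """One inventoried field: where it lives, how it is stored, what it means."""
    source: str                      # data source name (hypothetical)
    name: str                        # field name within that source
    storage_type: str                # structure of storage, e.g. "VARCHAR(40)"
    values: Dict[str, str] = field(default_factory=dict)  # code -> meaning
    # Named transformations from this field's values to another value set.
    transforms: Dict[str, Callable[[str], str]] = field(default_factory=dict)


@dataclass
class Lexicon:
    """Inventory of fields keyed by (source, field name)."""
    fields: Dict[Tuple[str, str], LexiconField] = field(default_factory=dict)

    def register(self, entry: LexiconField) -> None:
        self.fields[(entry.source, entry.name)] = entry

    def transform(self, source: str, name: str, transform: str, value: str) -> str:
        """Apply a named transformation registered for a field."""
        return self.fields[(source, name)].transforms[transform](value)


lexicon = Lexicon()
lexicon.register(LexiconField(
    source="DS1",
    name="last_name",
    storage_type="VARCHAR(40)",
    transforms={"normalized": lambda v: v.strip().upper()},
))
normalized = lexicon.transform("DS1", "last_name", "normalized", "  Smith ")
```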

23 Lexicon Maintenance
To maintain accuracy and manage extensibility, the linking module will process all data sources periodically at a predetermined time/interval looking for:
Changes in data ranges (a new code was added for race/ethnicity)
New fields (more data, more data, more data!)
Anything else that would disrupt the probabilistic matching or provide more ways to slice and dice the data
Anomalies found by the linking module will prompt an alert for a system administrator to modify the matching algorithm or add query choices.
For new sources, or those with known common fields/links, this would be the method of entry.
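
The periodic check described on this slide can be sketched as a comparison between what the Lexicon already knows and a fresh scan of a data source. This is a hedged illustration only: the function name, the snapshot format (field name mapped to the set of codes seen) and the sample fields are assumptions, not details from the presentation.

```python
from typing import Dict, Set


def diff_snapshot(known: Dict[str, Set[str]], observed: Dict[str, Set[str]]) -> dict:
    """Compare known field/value sets with a fresh scan of a data source.

    Both arguments map a field name to the set of codes seen in that field.
    """
    # Fields present in the scan but missing from the Lexicon.
    new_fields = sorted(set(observed) - set(known))
    # Codes that appeared in fields the Lexicon already tracks.
    new_codes = {
        name: sorted(observed[name] - known[name])
        for name in observed
        if name in known and observed[name] - known[name]
    }
    return {"new_fields": new_fields, "new_codes": new_codes}


known = {"race_ethnicity": {"1", "2", "3"}, "gender": {"F", "M"}}
observed = {
    "race_ethnicity": {"1", "2", "3", "7"},  # a new code appeared
    "gender": {"F", "M"},
    "graduation_year": {"2010", "2011"},     # a new field appeared
}
alerts = diff_snapshot(known, observed)
```

A non-empty result would be the trigger for the administrator alert the slide mentions.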

24 Shaker

25 Lexicon – Shaker Process
Diagram components: User Interface / Portal / LogiXML; Lexicon; Shell Database (field names and metadata, with sample data); Query Building Process (Pre-Authorization); Workflow Manager; Linking Control; Data Access Control; Sub-Query Optimization; Hashed ID Matrix; Authorized Query; DS 1, DS 2, DS 3; Query Results
Lexicon: Common IDs [deterministic] or Common Elements with appropriate Transforms, Matching Algorithms and Thresholds [probabilistic].
Linking Control: A linking engine process will update the Lexicon periodically to allow query building on known available matched data fields.
Query Building Process: No data is used in this process. Queries are built on the relationships between data fields in the Lexicon.
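
The two linking modes named on this slide, deterministic (common IDs) and probabilistic (common elements with transforms, a matching algorithm and a threshold), can be contrasted in a short sketch. Everything specific here is an assumption: `difflib` string similarity stands in for whatever matching algorithm the Lexicon would specify, and the field names, the `normalize` transform and the 0.9 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def normalize(value: str) -> str:
    """Example transform: collapse whitespace and lowercase."""
    return " ".join(value.split()).lower()


def deterministic_match(a: Dict[str, str], b: Dict[str, str], id_field: str) -> bool:
    """Two records match when they share the same non-empty common ID."""
    return a.get(id_field) is not None and a.get(id_field) == b.get(id_field)


def probabilistic_match(
    a: Dict[str, str], b: Dict[str, str], fields: List[str], threshold: float
) -> Tuple[bool, float]:
    """Average the per-field similarity and compare it with a threshold."""
    scores = [
        SequenceMatcher(None, normalize(a[f]), normalize(b[f])).ratio()
        for f in fields
    ]
    score = sum(scores) / len(scores)
    return score >= threshold, score


rec_a = {"name": "Jon Smith", "dob": "1990-01-02"}
rec_b = {"name": "John  SMITH", "dob": "1990-01-02"}
rec_c = {"name": "Mary Jones", "dob": "1985-07-30"}

is_match, score = probabilistic_match(rec_a, rec_b, ["name", "dob"], threshold=0.9)
is_other_match, _ = probabilistic_match(rec_a, rec_c, ["name", "dob"], threshold=0.9)
```

In this sketch the near-identical spelling clears the threshold and the unrelated record does not; a production system would tune the algorithm and threshold per field pair.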

26 Joining Sub-Queries on Hashed-IDs
Additional Data Sources:
Possible connection using a Web Service: creates a Web Services Data Source (Oracle), which enables application and data integration by turning an external web service into an SQL data source, making external web services appear as regular SQL tables. This table function represents the output of calling external web services and can be used in an SQL query.
Possible connection using a homogeneous link between Oracle DBs: establish synonyms for global names of remote objects in the distributed system so that the Shaker can access them with the same syntax as local objects.
Possible connection using a heterogeneous link, via an available Transparent Gateway or Generic ODBC/OLE.
Sub-query processing priority will be determined for each query to minimize unnecessary data transfer (e.g. not downloading unmatched records unless specifically requested) and to optimize join performance; see Sub-Query Process Optimization.
Matched Hash ID Values: The SLDS server will match records from different agencies using the Hash ID. After records are matched, the SLDS server will delete the Hash ID values and replace them with randomly generated unique IDs.
November 10, 2018
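
The final step on this slide, matching on the Hash ID and then replacing it with a randomly generated unique ID, can be sketched as below. This is an illustrative assumption rather than the Shaker's implementation: the row layout, the `hash_id` and `record_id` field names and the use of `uuid4` are all hypothetical.

```python
import uuid
from typing import Dict, List


def link_and_reidentify(records_a: List[Dict], records_b: List[Dict]) -> List[Dict]:
    """Inner-join two agencies' rows on 'hash_id', then swap the hash for a random ID."""
    by_hash = {row["hash_id"]: row for row in records_b}
    linked = []
    for row in records_a:
        other = by_hash.get(row["hash_id"])
        if other is None:
            continue  # unmatched records are not carried forward
        merged = {**row, **other}
        del merged["hash_id"]                    # delete the Hash ID value
        merged["record_id"] = uuid.uuid4().hex   # randomly generated unique ID
        linked.append(merged)
    return linked


# Made-up rows from two hypothetical agencies.
agency_a = [{"hash_id": "ab12", "grade": 12}, {"hash_id": "cd34", "grade": 11}]
agency_b = [{"hash_id": "ab12", "enrolled": True}]
linked = link_and_reidentify(agency_a, agency_b)
```

Because the random ID has no relationship to the hash, the linked output cannot be re-joined to either agency's data by that key.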

27 Sub-Query Process Optimization
Process steps shown: Lexicon Query; Parse Sub-Queries; Get COUNTS from each DS Web Service for each set of limiting criteria; Create Hash-Key; Run 1st Sub-Query; Run 2nd Sub-Query; Join Sub-Queries on Hashed ID; Query Results
Agency creates Hash-IDs (DS 1, DS 2, DS 3).
1st DS to query is the DS with the least count using the specified criteria.
Query 1st DS using today’s key. Returns a set with hashed IDs.
2nd DS to query is the DS with the next least count using the specified criteria (if Inner Join).
Query 2nd DS using today’s key AND the hashed-ID list from the 1st DS.
Derive JOIN criteria from the Lexicon: Common IDs [deterministic] or Common Elements with appropriate Transforms, Matching Algorithm and Thresholds [probabilistic].
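
The ordering described on this slide (get counts, query the smallest source first, then pass its hashed-ID list to the next source) can be sketched with in-memory stand-ins for the agency data sources. This is a sketch under stated assumptions, not the actual engine: HMAC-SHA256 is assumed as the keyed hash behind "today’s key", and the `DataSource` class, `inner_join` function, field names and sample rows are hypothetical.

```python
import hashlib
import hmac
from typing import Callable, Dict, List, Optional, Set, Tuple


def hash_id(key: bytes, identifier: str) -> str:
    """Keyed hash of an identifier (HMAC-SHA256 is an assumption)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


class DataSource:
    """In-memory stand-in for an agency data source behind a web service."""

    def __init__(self, name: str, rows: List[Dict]):
        self.name = name
        self.rows = rows  # each row carries a raw "person" identifier

    def count(self, criteria: Callable[[Dict], bool]) -> int:
        return sum(1 for row in self.rows if criteria(row))

    def query(self, key: bytes, criteria: Callable[[Dict], bool],
              hashed_ids: Optional[Set[str]] = None) -> Dict[str, Dict]:
        """Return matching rows keyed by hashed ID; the raw identifier stays behind."""
        result = {}
        for row in self.rows:
            if not criteria(row):
                continue
            hid = hash_id(key, row["person"])
            if hashed_ids is not None and hid not in hashed_ids:
                continue  # skip records that cannot match the earlier sub-query
            result[hid] = {k: v for k, v in row.items() if k != "person"}
        return result


def inner_join(sub_queries: List[Tuple[DataSource, Callable[[Dict], bool]]],
               daily_key: bytes) -> Dict[str, Dict]:
    """Run sub-queries from smallest to largest count, joining on hashed ID."""
    ordered = sorted(sub_queries, key=lambda sq: sq[0].count(sq[1]))
    joined: Optional[Dict[str, Dict]] = None
    for source, criteria in ordered:
        ids = None if joined is None else set(joined)
        part = source.query(daily_key, criteria, ids)
        if joined is None:
            joined = part
        else:
            joined = {hid: {**joined[hid], **part[hid]} for hid in part}
    return joined or {}


ds1 = DataSource("DS 1", [
    {"person": "p1", "grade": 12},
    {"person": "p2", "grade": 12},
    {"person": "p3", "grade": 11},
])
ds2 = DataSource("DS 2", [{"person": "p2", "enrolled": True}])

result = inner_join(
    [(ds1, lambda r: r["grade"] == 12), (ds2, lambda r: r["enrolled"])],
    daily_key=b"example-key-for-today",
)
```

Here DS 2 has the smaller count, so it is queried first and only its single hashed ID is sent to DS 1, which keeps unmatched DS 1 records from being transferred.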

29 Data Architecture
Diagram components: DS 1, DS 2, DS 3; ETL (1); VITA (CESC); Metadata and Security (1); Workflow; Lexicon; Shell DB; Shaker / De-identified Record Level Data (2); Aggregate Linked Data SPs (3); Workflow; Lexicon UI / Admin; Record Level Query / Reports; Aggregate Linked Reports; SLDS Portal
Notes:
1. Contains DBs for Shaker, Ad Hoc metadata, logging, auditing, etc.
2. Database for the Shaker process that temporarily stores linked record-level data. The temporary tables will be dropped after a set period of time.
3. For canned reports, Stored Procedures will be used for data querying and suppression.

30 Physical Infrastructure

31 Physical Infrastructure Shaker – Production Env. (CESC)

32 SLDS Components Matrix
Component | Custom / COTS | Suggested Product
Portal | Custom |
Security: Authentication | | COV AUTH
Security: Authorization | Mixed |
Workflow | COTS | MS Dynamics
Reports: Public Facing | | Logi Info
Reports: Query Building | | Logi Ad-Hoc
Lexicon | |
Shaker: Extract, Transform & Load | | Logi ETL, SSIS or Informatica
Shaker: Distributed Query Engine (DQE) | Custom or COTS | Syncsort, Informatica or Custom

33 Questions?

34 Back-Up Slides

35 Security
Authentication: COV AUTH
Authorization: Role Based
  Roles: Anonymous User, Named User, System Administrator, Agency Employee, Researcher
  Permissions: Workflow; Reports (Suppressed and Non-Suppressed); Query Building Tool; Lexicon; Data elements; User Account Management
Data security enforced by/at:
  Portal
  Lexicon: Viewing, Editing
  Reports: Suppressed Data, Non-Suppressed Data
  Workflow
  Data: Database, Table, Column

