Demo, May 2005 Privacy Preserving Database Application Testing Xintao Wu, Yongge Wang, Yuliang Zheng, UNC Charlotte.


Demo2 Overview
Milestones
- Initial investigation: May 2002 to Dec 2002
- Official start: Sept 2003, supported by NSF CCR (200K, Sept 2003 to August 2005)
- The prototype system was finished in April
- Developed in C++ with Oracle, 22K lines of source code
- Demos at several banks, May 2005
- …
Personnel
- Faculty: Xintao Wu, Yongge Wang, Yuliang Zheng
- Current graduate students: Songtao Guo, Ying Wu, Chintan Sanghvi, Guodong Jiao
- Previous graduate students: Jing Jin, Amol Kedar
- Several senior undergraduate students
More info

Demo3 Motivation
- To generate synthetic data for database application testing, especially performance testing.
- Many applications involve large-scale databases containing sensitive information.
- Complete testing is essential for database applications to function correctly and to provide acceptable performance.

Demo4 Our Approach
- Generate synthetic databases based on a-priori knowledge about the current production databases.
- The needed a-priori knowledge is generally available from the ER model, DDL, and data dictionary: the schema, data integrity rules, and basic statistical information.
- Detailed statistical information can be extracted if the original data, or samples from the production database, are available.
- The synthetic data can be generated in realistic volumes or in any desired volume.
- Better controllability, observability, and privacy.
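To make the approach concrete, here is a minimal Python sketch (not the demo's C++/Oracle implementation) of generating rows from the kind of a-priori knowledge listed above: column domains, category frequencies, and integrity rules. The `Column` class, `generate_rows`, the example schema, and the business rule are all hypothetical. Integer columns are drawn uniformly within their bounds, and rules are enforced by rejection sampling.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]
Rule = Callable[[Row], bool]


@dataclass
class Column:
    """Hypothetical column descriptor assembled from DDL / data-dictionary info."""
    name: str
    kind: str                      # "int" or "category"
    low: int = 0                   # domain bounds (used when kind == "int")
    high: int = 0
    categories: Dict[str, float] = field(default_factory=dict)  # value -> relative frequency


def draw_value(col: Column, rng: random.Random) -> object:
    """Draw one value from the column's domain (uniform ints, weighted categories)."""
    if col.kind == "int":
        return rng.randint(col.low, col.high)
    values = list(col.categories)
    weights = list(col.categories.values())
    return rng.choices(values, weights=weights)[0]


def generate_rows(columns: Sequence[Column], n: int, rules: Sequence[Rule] = (),
                  seed: int = 0, max_tries: int = 1000) -> List[Row]:
    """Rejection sampling: redraw a row until it satisfies every integrity rule."""
    rng = random.Random(seed)
    rows: List[Row] = []
    for _ in range(n):
        for _ in range(max_tries):
            row = {c.name: draw_value(c, rng) for c in columns}
            if all(rule(row) for rule in rules):
                rows.append(row)
                break
        else:
            raise RuntimeError("integrity rules could not be satisfied")
    return rows


# Hypothetical schema knowledge: domain bounds, category frequencies, one business rule.
columns = [
    Column("age", "int", low=18, high=90),
    Column("status", "category", categories={"single": 0.4, "married": 0.6}),
]
rules = [lambda r: not (r["status"] == "married" and r["age"] < 21)]
rows = generate_rows(columns, 5, rules)
```

Because `n` is a free parameter, the same knowledge can produce either a realistic volume or any other volume.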

Demo5 Three Characteristics of Synthetic Data
- Valid: the synthetic data need to satisfy all the same constraints and business rules as the live data. This is necessary for functional testing.
- Privacy preserving: no disclosure of any confidential information that needs to be protected.
- Resembling real data: the synthetic data need to have statistical distributions or patterns similar to those of the live data. This is necessary for performance testing, because the statistical nature of the data determines query performance. We will show that if the data distributions are not similar, the execution time of the same workload may be very different.
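Two of these three properties can be checked mechanically, and the sketch below shows one way to do it. Validity is checked by testing every row against the constraints. Resemblance is checked by comparing a column's value distribution in live and synthetic data. Using total variation distance for that comparison is an assumption made for this example, not necessarily the measure the tool uses. Privacy needs a separate disclosure analysis. All data and names here are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def violations(rows, rules):
    """Validity: return the rows that break at least one constraint or business rule."""
    return [r for r in rows if not all(rule(r) for rule in rules)]


def frequencies(values: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Relative frequency of each distinct value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


def total_variation(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """Resemblance: 0.0 means identical distributions, 1.0 means disjoint support."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical check: quantities must be non-negative.
synthetic_rows = [{"qty": 3}, {"qty": -1}]
bad = violations(synthetic_rows, [lambda r: r["qty"] >= 0])

# Hypothetical categorical column in live vs. synthetic data.
live = ["A", "A", "B", "C"]
synthetic = ["A", "B", "B", "C"]
tvd = total_variation(frequencies(live), frequencies(synthetic))  # 0.25
```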

Demo6 Architecture
[Figure: system architecture diagram. Labeled components: ER, Data, DDL, Catalog, RNRS, Schema & Domain Filter, Schema′, Domain′, General Location Model, Data Generator, Disclosure Assessment, Performance Assessment, Synthetic database.]
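The architecture names a General Location Model (GLM). In the standard formulation, the categorical attributes define the cells of a contingency table with multinomial cell probabilities. Given a cell, the continuous attributes are multivariate normal, with a cell-specific mean vector and a covariance matrix shared by all cells. The following is a minimal Python sketch of sampling from such a model, and it makes no claim about how the demo's Data Generator is implemented. The cell probabilities, means, and covariance are made-up example values, and the Cholesky helper is written out to keep the example dependency-free.

```python
import math
import random
from typing import Dict, Hashable, List, Tuple


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L @ L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def sample_glm(cell_probs: Dict[Hashable, float],
               cell_means: Dict[Hashable, List[float]],
               cov: List[List[float]], n: int,
               seed: int = 0) -> List[Tuple[Hashable, List[float]]]:
    """Draw a categorical cell, then continuous values from N(cell mean, shared cov)."""
    rng = random.Random(seed)
    L = cholesky(cov)
    cells = list(cell_probs)
    weights = [cell_probs[c] for c in cells]
    d = len(cov)
    out = []
    for _ in range(n):
        cell = rng.choices(cells, weights=weights)[0]
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # x = mu_cell + L z gives covariance L L^T = cov
        x = [cell_means[cell][i] + sum(L[i][k] * z[k] for k in range(i + 1))
             for i in range(d)]
        out.append((cell, x))
    return out


# Made-up model: cells are (gender, status); continuous attributes are (income, age).
cell_probs = {("F", "single"): 0.3, ("F", "married"): 0.2,
              ("M", "single"): 0.25, ("M", "married"): 0.25}
cell_means = {("F", "single"): [40.0, 30.0], ("F", "married"): [55.0, 42.0],
              ("M", "single"): [45.0, 31.0], ("M", "married"): [60.0, 44.0]}
cov = [[100.0, 20.0], [20.0, 25.0]]  # shared across all cells
samples = sample_glm(cell_probs, cell_means, cov, 1000)
```

Sharing one covariance across cells keeps the number of parameters small. Cell-specific means still let the continuous attributes depend on the categorical ones.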

Demo7 Building a Project

Demo8 Data Dictionary Information

Demo9 Statistical Information Extraction: Basic

Demo10 Statistical Information Extraction: Advanced
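The slides do not list what the "basic" and "advanced" extraction steps compute. The sketch below assumes that basic means per-column summaries (counts, NULLs, range, mean, standard deviation, value frequencies) and that advanced means joint statistics such as a two-way frequency table, which is the input a GLM's cell probabilities would need. The function names and sample rows are hypothetical.

```python
import statistics
from collections import Counter


def profile_numeric(values):
    """Assumed 'basic' statistics for a numeric column; None stands for SQL NULL.
    Assumes at least one non-NULL value."""
    vals = [v for v in values if v is not None]
    return {
        "count": len(vals),
        "nulls": len(values) - len(vals),
        "min": min(vals),
        "max": max(vals),
        "mean": statistics.mean(vals),
        "stdev": statistics.stdev(vals) if len(vals) > 1 else 0.0,
    }


def profile_categorical(values):
    """Assumed 'basic' statistics for a categorical column."""
    vals = [v for v in values if v is not None]
    return {"nulls": len(values) - len(vals),
            "distinct": len(set(vals)),
            "freq": dict(Counter(vals))}


def cross_tab(rows, a, b):
    """Assumed 'advanced' statistic: joint counts of two categorical columns."""
    return dict(Counter((r[a], r[b]) for r in rows))


# Hypothetical sample drawn from a production table.
sample = [
    {"gender": "F", "status": "single", "age": 30},
    {"gender": "M", "status": "married", "age": 45},
    {"gender": "F", "status": "single", "age": None},
]
age_stats = profile_numeric([r["age"] for r in sample])
status_stats = profile_categorical([r["status"] for r in sample])
joint = cross_tab(sample, "gender", "status")
```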

Demo11 Generating Meta & Data File

Demo12 Generating Confidential File

Demo13 Disclosure Analysis - Categorical
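The slide does not say which criterion the categorical disclosure analysis applies. One common heuristic in statistical disclosure control is a minimum cell-count (threshold) rule. Under this rule, any combination of categorical values shared by fewer than k records is flagged, because such small cells can single out individuals. The sketch below implements that rule. The threshold k = 3, the function name, and the records are assumptions for illustration only.

```python
from collections import Counter


def risky_cells(rows, quasi_identifiers, k=3):
    """Return value combinations of the given columns shared by fewer than k records."""
    counts = Counter(tuple(r[c] for c in quasi_identifiers) for r in rows)
    return {cell: n for cell, n in counts.items() if n < k}


# Hypothetical records: the ("south", "F") combination occurs only once.
records = (
    [{"region": "north", "gender": "F"}] * 3
    + [{"region": "north", "gender": "M"}] * 3
    + [{"region": "south", "gender": "F"}]
)
flagged = risky_cells(records, ["region", "gender"], k=3)  # {("south", "F"): 1}
```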

Demo14 Numerical Disclosure Basic Batch Mode

Demo15 Numerical Disclosure Basic Single Mode
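The numerical disclosure slides distinguish a batch mode and a single mode but do not define the disclosure criterion. The following is a hypothetical sketch of one plausible reading. An attacker who knows a record's categorical cell estimates its confidential numeric value by the cell mean, and the value counts as disclosed if the estimate falls within a relative tolerance of the true value. Single mode checks one record, and batch mode reports the fraction of disclosed records. The tolerance, function names, and data are all assumptions.

```python
from collections import defaultdict


def cell_means(rows, cell_cols, target):
    """Mean of the confidential attribute within each categorical cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        key = tuple(r[c] for c in cell_cols)
        sums[key] += r[target]
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}


def is_disclosed(row, means, cell_cols, target, tol=0.05):
    """Single mode: is the cell-mean estimate within tol (relative) of the true value?"""
    est = means[tuple(row[c] for c in cell_cols)]
    return abs(est - row[target]) <= tol * abs(row[target])


def batch_disclosure_rate(rows, cell_cols, target, tol=0.05):
    """Batch mode: fraction of records whose confidential value would be disclosed."""
    means = cell_means(rows, cell_cols, target)
    hits = sum(is_disclosed(r, means, cell_cols, target, tol) for r in rows)
    return hits / len(rows)


# Hypothetical data: salaries in dept A are tightly clustered, so they leak.
rows = [
    {"dept": "A", "salary": 100.0}, {"dept": "A", "salary": 102.0},
    {"dept": "B", "salary": 50.0}, {"dept": "B", "salary": 90.0},
]
rate = batch_disclosure_rate(rows, ["dept"], "salary")  # 0.5
```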

Demo16 Creating Final Categorical File

Demo17 Creating Final Rule File (GLM Format)

Demo18 Generating Data