Data Warehousing, Data Mining, Privacy

Reading
Bhavani Thuraisingham, Murat Kantarcioglu, and Srinivasan Iyer. Extended RBAC-design and implementation for a secure data warehouse. Int. J. Bus. Intell. Data Min. 2, 4 (December 2007). Technical-Reports/UTDCS pdf
Sweeney L, Abu A, and Winn J. Identifying Participants in the Personal Genome Project by Name. Harvard University. Data Privacy Lab. White Paper, April 24.
Farkas CSCE Spring

Data Warehousing
Repository of data providing organized and cleaned enterprise-wide data (obtained from a variety of sources) in a standardized format
  – Data mart (single subject area)
  – Enterprise data warehouse (integrated data marts)
  – Metadata

OLAP Analysis
Aggregation functions
Factual data access
Complex criteria
Visualization

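To make "aggregation functions" and "complex criteria" concrete, here is a minimal Python sketch of an OLAP-style roll-up: group a fact table by one or more dimensions, sum a measure, and keep only rows that satisfy an arbitrary predicate. The SALES table and the roll_up helper are hypothetical illustrations, not part of any OLAP product; a real warehouse would run the equivalent SQL GROUP BY query against its fact tables.

```python
from collections import defaultdict
from typing import Callable, Iterable

# Hypothetical fact table: each row is one sale (illustrative data only).
SALES = [
    {"region": "East", "year": 2023, "product": "A", "amount": 120.0},
    {"region": "East", "year": 2024, "product": "B", "amount": 80.0},
    {"region": "West", "year": 2024, "product": "A", "amount": 200.0},
    {"region": "West", "year": 2024, "product": "B", "amount": 50.0},
]

def roll_up(rows: Iterable[dict], dims: tuple, measure: str,
            where: Callable[[dict], bool] = lambda r: True) -> dict:
    """Group rows by the given dimensions and SUM the measure,
    keeping only rows that satisfy the `where` criterion."""
    totals = defaultdict(float)
    for row in rows:
        if where(row):
            key = tuple(row[d] for d in dims)
            totals[key] += row[measure]
    return dict(totals)

# Total sales per region, restricted to 2024 (an example criterion).
print(roll_up(SALES, ("region",), "amount", where=lambda r: r["year"] == 2024))
```

Passing more dimensions, e.g. ("region", "product"), drills down to a finer grain over the same fact rows.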
Warehouse Evaluation
Enterprise-wide support
Consistency and integration across diverse domains
Security support
Support for operational users
Flexible access for decision makers

Data Integration
Data access
Data federation
Change capture
Need ETL (extraction, transformation, load)

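The ETL step can be sketched in a few lines. This is a minimal illustration, assuming two hypothetical sources (SOURCE_HR and SOURCE_PAYROLL) that disagree on ID case, name capitalization, and date format. It extracts from both, transforms the records into one standardized format, and loads them into an in-memory SQLite table standing in for the warehouse. Change capture and federation are not shown.

```python
import sqlite3

# Two hypothetical operational sources with inconsistent formats.
SOURCE_HR = [("E1", "Alice Smith", "2021-03-01")]
SOURCE_PAYROLL = [{"emp": "e2", "name": "BOB JONES", "hired": "03/15/2022"}]

def extract():
    """Extract: pull raw records from each source, tagged with their origin."""
    for emp_id, name, hired in SOURCE_HR:
        yield "hr", {"id": emp_id, "name": name, "hired": hired}
    for rec in SOURCE_PAYROLL:
        yield "payroll", {"id": rec["emp"], "name": rec["name"], "hired": rec["hired"]}

def transform(origin, rec):
    """Transform: normalize IDs, names, and dates into one standard format."""
    hired = rec["hired"]
    if "/" in hired:  # MM/DD/YYYY -> YYYY-MM-DD
        m, d, y = hired.split("/")
        hired = f"{y}-{m}-{d}"
    return (rec["id"].upper(), rec["name"].title(), hired, origin)

def load(conn, rows):
    """Load: insert the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS employee "
                 "(id TEXT PRIMARY KEY, name TEXT, hired TEXT, source TEXT)")
    conn.executemany("INSERT OR REPLACE INTO employee VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, [transform(origin, rec) for origin, rec in extract()])
print(conn.execute("SELECT * FROM employee ORDER BY id").fetchall())
```

Keeping the source column in each loaded row is one simple way to retain provenance for later integrity checks.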
Data Warehouse Users
Internal users
  – Employees
  – Managers
External users
  – Reporting and auditing
  – Research

Data Mining
Databases to be mined
Knowledge to be mined
Techniques used
Applications supported

Data Mining Tasks
DM: mostly automated
Prediction tasks
  – Use some variables to predict unknown or future values of other variables
Description tasks
  – Find human-interpretable patterns that describe the data

Common Tasks
Classification [Predictive]
Clustering [Descriptive]
Association Rule Mining [Descriptive]
Regression [Predictive]
Deviation Detection [Predictive]

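Association rule mining is usually explained through support and confidence. A minimal sketch, assuming a small hypothetical basket dataset: support(X) is the fraction of transactions containing X, and confidence(X -> Y) = support(X ∪ Y) / support(X). The pair_rules helper enumerates item pairs by brute force; it is not the Apriori algorithm, which prunes candidates and scales to larger itemsets.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Brute-force rules X -> Y over single-item pairs (fine only for tiny data)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_sup = support({a, b}, transactions)
        if pair_sup < min_support:
            continue  # the pair is not frequent enough to form any rule
        for lhs, rhs in ((a, b), (b, a)):
            conf = pair_sup / support({lhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, pair_sup, conf))
    return rules

for lhs, rhs, sup, conf in pair_rules(TRANSACTIONS):
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

Note that confidence is asymmetric: in this data, beer -> diapers holds in every basket containing beer, while diapers -> beer holds in only three of four.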
Security for Data Warehousing
Establish the organization's security policies and procedures
Implement logical access control
Restrict physical access
Establish internal control and auditing

Data Warehousing Issues: Integrity
Poor-quality data: inaccurate, incomplete, missing metadata
Loss of traditional consistency guarantees, e.g., keys
Source data quality vs. derived data quality
  – Can the result of the analysis be trusted?

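The integrity issues above (incomplete data, keys that stop being unique once sources are merged) can be screened for mechanically before analysis. A minimal sketch, assuming hypothetical customer records from two sources and a hypothetical quality_report helper:

```python
from collections import Counter

REQUIRED = ("customer_id", "name", "zip")

# Hypothetical records merged from two sources; note the duplicate key
# and the missing ZIP code.
RECORDS = [
    {"customer_id": "C1", "name": "Ann", "zip": "29208", "source": "crm"},
    {"customer_id": "C2", "name": "Raj", "zip": None, "source": "billing"},
    {"customer_id": "C1", "name": "Anne", "zip": "29201", "source": "billing"},
]

def quality_report(records, key="customer_id", required=REQUIRED):
    """Flag incomplete records and keys that are no longer unique after integration."""
    incomplete = [
        (r[key], [f for f in required if not r.get(f)])  # which fields are empty
        for r in records
        if any(not r.get(f) for f in required)
    ]
    counts = Counter(r[key] for r in records)
    duplicate_keys = sorted(k for k, n in counts.items() if n > 1)
    return {"incomplete": incomplete, "duplicate_keys": duplicate_keys}

print(quality_report(RECORDS))
```

Such checks only detect symptoms; deciding which of the two conflicting C1 records to trust still depends on the quality of each source.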
Big Data Security and Privacy
Amount of data being considered
Privacy-preserving analytics
Granular access control
  – Flat, two-dimensional tables
Transaction logs and auditing
Real-time monitoring

Big Data Integrity
Data accuracy
Source provenance
End-point filtering and validation

Access Control
Layered defense:
  – Access to processes that extract operational data
  – Access to data and processes that transform operational data
  – Access to data and metadata in the warehouse

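One way to picture the layered defense is a default-deny check keyed by role and layer, so that access to warehouse data does not imply access to the extraction or transformation processes, and metadata is protected separately. This is a hypothetical sketch (the Layer enum, POLICY table, and is_allowed function are illustrative names), loosely in the spirit of role-based access control rather than any specific RBAC model.

```python
from enum import Enum

class Layer(Enum):
    EXTRACT = "processes that extract operational data"
    TRANSFORM = "data and processes that transform operational data"
    WAREHOUSE = "data and metadata in the warehouse"

# Hypothetical role-based policy: actions each role may perform at each layer.
POLICY = {
    "etl_operator": {Layer.EXTRACT: {"run"}, Layer.TRANSFORM: {"run", "read"}},
    "analyst": {Layer.WAREHOUSE: {"read"}},
    "dw_admin": {Layer.WAREHOUSE: {"read", "write", "read_metadata"}},
}

def is_allowed(role: str, layer: Layer, action: str) -> bool:
    """Default deny: permit only actions explicitly granted to the role at that layer."""
    return action in POLICY.get(role, {}).get(layer, set())

print(is_allowed("analyst", Layer.WAREHOUSE, "read"))
print(is_allowed("analyst", Layer.EXTRACT, "run"))
```

Each layer is checked independently, so compromising one role's access to the warehouse does not open the ETL processes that feed it.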
Access Control Issues
Mapping from local to warehouse policies
How to handle “new” data
Scalability
Identity management

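The first two issues can be illustrated with a hypothetical role-mapping table: each (source system, local role) pair is translated into a warehouse role, and any pair without a mapping, such as a role attached to newly integrated data, is reported for a policy decision instead of being granted access implicitly. ROLE_MAP, map_role, and unmapped are illustrative names, not from the source.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical mapping of (source system, local role) -> warehouse role.
ROLE_MAP = {
    ("hr_db", "hr_manager"): "dw_hr_analyst",
    ("hr_db", "clerk"): "dw_reader",
    ("sales_db", "sales_rep"): "dw_reader",
}

def map_role(source: str, local_role: str) -> Optional[str]:
    """Translate a local role into a warehouse role; None means no mapping exists."""
    return ROLE_MAP.get((source, local_role))

def unmapped(assignments: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the distinct (source, role) pairs with no warehouse equivalent.
    Callers should treat these as deny-by-default until a policy is decided."""
    return sorted({(s, r) for s, r in assignments if map_role(s, r) is None})

print(unmapped([("hr_db", "clerk"), ("sales_db", "sales_manager"), ("crm", "agent")]))
```

A flat lookup table like this grows with every source and role, which is one concrete face of the scalability issue listed on the slide.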
Inference Problem
Data mining discovers “new knowledge”: how can its security risks be evaluated?
Example security risks:
  – Prediction of sensitive information
  – Misuse of information
Assurance of “discovery”

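A small example of how aggregation can still leak an individual value: if only SUM queries are answered, two overlapping queries whose result sets differ by exactly one person reveal that person's value. The salary table and the sum_salary interface below are hypothetical; the pattern shown is a simple differencing inference of the kind studied in statistical database security.

```python
# Hypothetical salary table; individual rows are meant to stay confidential,
# and only aggregate (SUM) queries are answered.
EMPLOYEES = [
    {"name": "Ann", "dept": "R&D", "salary": 95_000},
    {"name": "Bob", "dept": "R&D", "salary": 88_000},
    {"name": "Cho", "dept": "R&D", "salary": 102_000},
    {"name": "Dev", "dept": "Sales", "salary": 70_000},
]

def sum_salary(predicate):
    """Aggregate-only interface: SUM of salaries for rows matching the predicate."""
    return sum(e["salary"] for e in EMPLOYEES if predicate(e))

# Each query alone aggregates over several people...
q1 = sum_salary(lambda e: e["dept"] == "R&D")
q2 = sum_salary(lambda e: e["dept"] == "R&D" and e["name"] != "Cho")
# ...but their difference discloses one individual's salary.
inferred = q1 - q2
print(inferred)  # → 102000
```

Neither query touches an individual row directly, which is why access control on the base table alone does not prevent this disclosure.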
Privacy and Sensitivity
Large volume of private (personal) data
Need:
  – Proper acquisition, maintenance, usage, and retention policies
  – Integrity verification
  – Control of analysis methods (aggregation may reveal sensitive data)

Privacy
What is the difference between confidentiality and privacy?
Identity, location, activity, etc.
Anonymity vs. accountability

Legislation
Privacy Act of 1974, U.S. Department of Justice
Family Educational Rights and Privacy Act (FERPA), U.S. Department of Education (dex.html)
Health Insurance Portability and Accountability Act of 1996 (HIPAA) (http://en.wikipedia.org/wiki/Health_Insurance_Portability_and_Accountability_Act)
Telecommunications Consumer Privacy Act (communications-privacy-act)

Online Social Networks
Social relationships
Communication context changes social relationships
Social relationships maintained through different media grow at different rates and to different depths
No clear consensus on which medium is best

Internet and Social Relationships
The Internet:
Bridges distance at a low cost
New participants tend to “like” each other more
Less stressful than face-to-face meetings
People focus on communicating their “selves” (except for a few malicious users)

Social Network
Description of the social structure between actors
Connections: various levels of social familiarity, e.g., from casual acquaintance to close familial bonds
Support online interaction and content sharing

Social Network Analysis
The mapping and measuring of relationships and flows between people, groups, organizations, computers, or other information-processing entities
Behavioral profiling
Note: social network signatures
  – User names may change; family and friends are more difficult to change

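The note on social network signatures can be illustrated by matching accounts on friend-set overlap instead of user names. This sketch assumes hypothetical friend lists and uses Jaccard similarity with an arbitrary 0.5 threshold; both the measure and the threshold are illustrative choices, not taken from the slide.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two friend sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical data: an old account and candidate accounts under new user names.
OLD_ACCOUNT_FRIENDS = {"mom", "dad", "sis", "jo", "lee", "sam"}
CANDIDATES = {
    "new_handle_1": {"mom", "dad", "sis", "jo", "kim"},
    "new_handle_2": {"lee", "pat", "max", "ray"},
}

def best_match(target: set, candidates: dict, threshold: float = 0.5):
    """Return (name, score) for the candidate whose friend set best overlaps
    the target's, or (None, score) if even the best score is below threshold."""
    name, friends = max(candidates.items(), key=lambda kv: jaccard(target, kv[1]))
    score = jaccard(target, friends)
    return (name, score) if score >= threshold else (None, score)

print(best_match(OLD_ACCOUNT_FRIENDS, CANDIDATES))
```

Here new_handle_1 shares four of seven distinct friends with the old account, so it is linked despite the changed user name.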
Interesting Read
M. Chew, D. Balfanz, B. Laurie, (Under)mining Privacy in Social Networks, oc/summary?doi=

Next
Web application insecurity: risk to databases