Efficient Multi-User Indexing for Secure Keyword Search


Efficient Multi-User Indexing for Secure Keyword Search PAIS ‘14, March 2014 Eirini Micheli, Giorgos Margaritis, Stergios V. Anastasiadis Dept. of Computer Science & Engineering University of Ioannina, Greece

Keyword Search. Accumulated documents: enormous amount. Search is indispensable for: personal content archives; online social networks; cloud facilities. E. Micheli, PAIS '14

Shared Storage Environment. Co-located users: access control to protect documents; different access permissions per document; search for documents with keyword queries. Search engine: organizes documents into indices; preprocesses documents into posting lists; provides term occurrences across documents; returns matching documents.
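
As a rough illustration of how a search engine preprocesses documents into posting lists, here is a minimal Python sketch. The function names, the whitespace tokenizer, and the sample documents are assumptions made for this example, not part of the presented system.

```python
from collections import defaultdict

def build_posting_lists(documents):
    """Map each term to a sorted list of (doc_id, term_frequency) postings."""
    postings = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in text.lower().split():  # naive whitespace tokenizer (assumption)
            postings[term][doc_id] = postings[term].get(doc_id, 0) + 1
    return {term: sorted(freqs.items()) for term, freqs in postings.items()}

def search(posting_lists, query):
    """Return ids of documents that contain every query term."""
    doc_sets = [
        {doc_id for doc_id, _ in posting_lists.get(term, [])}
        for term in query.lower().split()
    ]
    return sorted(set.intersection(*doc_sets)) if doc_sets else []

# Hypothetical three-document collection
docs = {1: "secure keyword search", 2: "keyword index", 3: "secure index search"}
index = build_posting_lists(docs)
matches = search(index, "secure search")
```

Note that this toy index has no notion of access permissions: every query sees the term statistics of every document, which is the situation the following slides examine.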

Confidentiality. Important aspect of search process: ensure search results do not leak confidential information; provide end-to-end confidentiality; conceal search activity and document content. Prevent information leakage to unauthorized users: return only accessible documents. Not enough to ensure confidentiality.

Confidentiality Attack. Users can infer information about inaccessible documents. Insecure search environment: single shared index; vector space model; TF/IDF scoring function. Estimate total number of documents containing given term, including inaccessible documents: create specific documents; issue customized queries; leverage search result relevance ranking. (Büttcher et al., USENIX Conference on File and Storage Technologies)
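
The following toy simulation sketches why relevance ranking over a single shared index can leak document frequencies. It is a simplified illustration under stated assumptions, not the exact procedure of Büttcher et al.: the `SharedIndex` class, the pure IDF score `log(N / df)`, the made-up probe term, and the binary search are all hypothetical choices for this example.

```python
import math

class SharedIndex:
    """Toy single shared index whose statistics cover ALL documents."""

    def __init__(self, doc_freq, n_docs):
        self.doc_freq = dict(doc_freq)  # term -> number of documents containing it
        self.n_docs = n_docs

    def add_docs(self, term, count):
        """Insert (or, for a negative count, delete) documents containing term."""
        self.doc_freq[term] = self.doc_freq.get(term, 0) + count
        self.n_docs += count

    def idf(self, term):
        return math.log(self.n_docs / self.doc_freq[term])

    def outranks(self, term_a, term_b):
        """True if a one-occurrence match on term_a scores above one on term_b."""
        return self.idf(term_a) > self.idf(term_b)

def infer_hidden_df(index, target, probe="zz-probe-term", max_df=1024):
    """Estimate how many documents the attacker cannot see contain `target`."""
    index.add_docs(target, 1)      # attacker's own document containing the target
    lo, hi = 1, max_df + 2         # search for the smallest k with df(target) < k
    current = 0                    # probe documents currently in the index
    while lo < hi:
        k = (lo + hi) // 2
        index.add_docs(probe, k - current)  # create or delete probe documents
        current = k
        if index.outranks(target, probe):   # holds exactly when df(target) < k
            hi = k
        else:
            lo = k + 1
    return lo - 2  # subtract the tie-breaking step and the attacker's own document

# Hypothetical corpus: 37 inaccessible documents mention "merger"
shared = SharedIndex({"merger": 37}, n_docs=5000)
estimate = infer_hidden_df(shared, "merger")
```

The attacker only observes which of its own documents ranks higher, yet recovers a count that includes documents it has no permission to read.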

Prior Work. One index per user: secure, but consumes disk space (single document indexed multiple times). Single shared index: storage efficiency (each document indexed once), but security comes at extra list processing overhead.

Prior Work. Partition documents by search permissions: documents searchable by same user set in single index; query handled over indices with searchable documents. Secure; avoids result filtering of single shared index; but variations in search sets may increase number of indices. Reduce number of indices: take advantage of user commonality among searcher sets.

Goals of Indexing Organization. Security: ensure confidentiality. Search efficiency: low search latency. Low indexing cost: document insertion I/O activity; indexing storage space.

Lethe Approach. Document grouping: group documents with identical search sets. Clustering: cluster together document groups with similar search sets. Index creation: shared indices for common subset of searchers.

Group Documents into Families. Definitions: searcher: user authorized to search document by keywords; family: group of documents with identical search set. Create families: crawl the Path and Allowed Searchers of all documents; group together documents with identical search sets.
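
Grouping documents into families amounts to keying a hash table by the searcher set. A minimal Python sketch follows; the function name, the input layout, and the sample paths and users are assumptions for illustration.

```python
from collections import defaultdict

def group_into_families(allowed_searchers):
    """Group document paths by identical searcher set.

    allowed_searchers: dict mapping document path -> iterable of user names.
    Returns a dict mapping frozenset of searchers -> sorted list of paths.
    """
    families = defaultdict(list)
    for path, searchers in allowed_searchers.items():
        # frozenset ignores order, so ["alice", "bob"] equals ["bob", "alice"]
        families[frozenset(searchers)].append(path)
    return {searchers: sorted(paths) for searchers, paths in families.items()}

# Hypothetical crawler output
crawled = {
    "/docs/a.txt": ["alice", "bob"],
    "/docs/b.txt": ["bob", "alice"],
    "/docs/c.txt": ["carol"],
}
families = group_into_families(crawled)
```

Here the first two documents fall into one family because their searcher sets are identical, and the third forms a family of its own.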

Cluster Families by Searcher Set. Apply clustering algorithm over searcher sets of families. Searcher similarity Ls ∈ [0, 1]: configurable parameter; adjusts number of common searchers across cluster families. Clusters end up with one or more families.
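
To make the role of Ls concrete, the sketch below clusters searcher sets with a simple single-link threshold rule. Both the Jaccard similarity measure and the single-link rule are assumptions chosen for brevity; the slides do not name the similarity measure, and this is not the clustering algorithm of the prototype.

```python
def jaccard(a, b):
    """Fraction of searchers shared by two non-empty searcher sets."""
    return len(a & b) / len(a | b)

def cluster_families(searcher_sets, ls):
    """Link two families when their similarity is at least ls (single link)."""
    sets = list(searcher_sets)
    parent = list(range(len(sets)))

    def find(i):
        # Union-find lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if jaccard(sets[i], sets[j]) >= ls:
                parent[find(i)] = find(j)

    clusters = {}
    for i, s in enumerate(sets):
        clusters.setdefault(find(i), []).append(s)
    return list(clusters.values())

# Hypothetical families, identified by their searcher sets
family_sets = [frozenset("ab"), frozenset("abc"), frozenset("xy")]
loose = cluster_families(family_sets, 0.0)
medium = cluster_families(family_sets, 0.6)
strict = cluster_families(family_sets, 1.0)
```

Under these assumptions, a threshold of 0 merges all families into one cluster and a threshold of 1 leaves one family per cluster, since distinct families never have identical searcher sets.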

Intersections and Differences. Cluster intersection: subset of common searchers among cluster families; documents of all families in cluster. Family difference: searchers of family outside cluster intersection; documents of corresponding family.
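
The two definitions translate directly into set operations. The following Python sketch uses a hypothetical `split_cluster` function and a made-up two-family cluster; the dictionary layout is an assumption for illustration.

```python
def split_cluster(cluster):
    """Compute the cluster intersection and the family differences.

    cluster: dict mapping a family's searcher set (frozenset) -> list of documents.
    """
    common = frozenset.intersection(*cluster.keys())
    intersection = {
        "searchers": common,
        # the intersection covers the documents of ALL families in the cluster
        "documents": sorted(d for docs in cluster.values() for d in docs),
    }
    differences = [
        {"searchers": searchers - common, "documents": sorted(docs)}
        for searchers, docs in cluster.items()
        if searchers - common  # families with no extra searchers need no difference
    ]
    return intersection, differences

# Hypothetical cluster with two families
cluster = {
    frozenset({"alice", "bob"}): ["d1", "d2"],
    frozenset({"alice", "bob", "carol"}): ["d3"],
}
intersection, differences = split_cluster(cluster)
```

In this example alice and bob are served by the intersection for all three documents, while carol appears only in the difference of the second family and may search only `d3`.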

Map Families to Indices. Compute cluster intersection and family differences. Dedicate single index to cluster intersection. What about family differences? Extreme approaches: Approach 1: difference documents in private indices (large number of duplicates); Approach 2: dedicate single difference index (many indices serve small number of documents and searchers).

Separate Index per Difference? Approximate document duplication per difference: introduce duplication product R, where R = |difference documents| × |difference searchers|. Duplication threshold Td ∈ [0, ∞): configurable parameter; adjusts number of duplicates per difference. Check whether R < Td: yes: copy to private indices; no: single difference index.
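
The threshold check can be written in a few lines. In this Python sketch the function name, the return labels, and the sample sizes are assumptions for illustration.

```python
import math

def place_difference(documents, searchers, td):
    """Decide how to index one family difference.

    R = |difference documents| x |difference searchers| approximates the number
    of document copies that private indices would hold for this difference.
    """
    r = len(documents) * len(searchers)
    if r < td:
        return "private indices", r
    return "single difference index", r

# Hypothetical differences
small = place_difference(["d3"], {"carol"}, td=1500)
large = place_difference([f"d{i}" for i in range(100)],
                         {f"u{i}" for i in range(20)}, td=1500)
always_shared = place_difference(["d3"], {"carol"}, td=0)
always_private = place_difference([f"d{i}" for i in range(100)],
                                  {f"u{i}" for i in range(20)}, td=math.inf)
```

With Td = 0 no difference is ever copied to private indices, and with Td = ∞ every difference is, which are the two extreme approaches from the mapping stage.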

Index Documents. Index documents into appropriate indices: use index specifications from mapping stage. Manage forthcoming documents: look for existing indices with identical searcher set. Manage document deletions or modifications: change index contents; reorganize mapping of indices to documents. Optimize search: periodically repeat clustering and mapping stages.

Prototype Implementation: Crawler. Gathers information about document searchers; assigns a unique identifier to each document.

Prototype Implementation: Clusterer. Creates families over a hash table; clusters families with DBSCAN.

Prototype Implementation: Mapper. Identifies cluster intersections and family differences; dedicates a single index to each cluster intersection; dedicates indices to family differences based on Td.

Prototype Implementation: Indexer. Receives index specifications from mapper; splits documents of each index into batches; communicates with search engine to index documents.

Experimental Evaluation. Experimental setup: Linux v2.6.32; quad-core x86 2.33GHz processor; 4GB RAM; 7.2KRPM SATA disks. Synthetic document collection: searcher lists based on existing dataset (Docushare [1]); 200 users, 131 groups, 50000 documents. Configurable parameters: searcher similarity Ls ∈ [0, 1]; duplication threshold Td ∈ [0, ∞). [1] Smetters et al., Symposium on Usable Privacy and Security.

Effect of Searcher Similarity. Noteworthy cases: Ls = 0%: 1 cluster containing all families; Ls = 100%: 1 family per cluster; Ls = 60%: 929 clusters with 53.82 documents each on average. Lower similarity: fewer and dissimilar clusters.

Sensitivity of Search Cost. Metric: indices accessed per searcher's query. Goal: minimize indices per searcher. Td = 0, 500 or 1500: shared intersection indices at Ls = 60%. Td = ∞: documents in private indices at Ls = 0% or 100%.

Sensitivity of Update Cost. Metric: indices updated per document. Goal: minimize indices per document. Td = 1500 or ∞: intersections limit document duplication at Ls = 60%. Flat curves for Td = 0 or 500 across different Ls.

Search-Update Tradeoff. Closer to (1, 1) is better. Opposite effects of Td: decreases indices per searcher; increases indices per document. Td = 1500 seems a reasonable choice: leads to both low search and update cost.

Conclusions & Future Work. Core idea: cluster documents by searcher sets; shared indices for common searchers. Performance analysis: avoided result filtering; reduced indices per searcher; prevented excessive document duplication. Secure and efficient keyword search: tunable parameters achieve low search and update cost. Future work: experiment with broad collection of datasets (e.g., OSN); integrate Lethe into distributed search engine.