Presentation is loading. Please wait.

Presentation is loading. Please wait.

SMIS: Secured and Mobile Information Systems

Similar presentations


Presentation on theme: "SMIS: Secured and Mobile Information Systems"— Presentation transcript:

1 SMIS: Secured and Mobile Information Systems
INRIA Paris-Rocquencourt Joint team with CNRS & University of Versailles (UVSQ) Junior Seminar INRIA may 19th 2015

2 Background and research fields
A Core Database Culture … Storage and indexing models, query execution and optimization Transaction protocols (atomicity, isolation, durability) Database security (access and usage control, encryption) Distributed DB architectures SMIS Project Team 12/7/2012

3 Current composition of the team
Permanent members Nicolas Anciaux CR INRIA Luc Bouganim, DR INRIA Benjamin Nguyen, MC UVSQ, since 2010 Philippe Pucheral, PR UVSQ Iulian Sandu Popa, MC UVSQ, since 2012 Engineers Quentin Lefebvre Aydogan Ersoz PhD students Saliha Lallali: Document Indexing for Embedded Personal Databases Athanasia Katsouraki: Access and Usage Control for Personal Data in Trusted Cells Cuong To: Secure Global Computations on Personal Data Servers Paul Tran-Van: Sharing file in Embedded Personal Databases SMIS Project Team 3

4 A Scalable Search Engine for Mass Storage Smart Objects
SMIS project (INRIA, Prism, Univ. Versailles) Saliha LALLALI Nicolas ANCIAUX Iulian SANDU POPA Philippe PUCHERAL

5 Motivation – Advent of Smart Objects
Secure devices on which a GB flash chip is superposed USB MicroSD reader Contactless + USB 8GB Flash Secure MicroSD 4GB Flash Personal & secure devices Personal memory devices in which a secure chip is implanted Sim Card Application domains Personal Data Server Personal Cloud / Personal Web Securely store, query and share personal user’s files and their metadata Documents, photos, s, links, profiles, preferences … Base required functionality: full text search (similar to an embedded Google desktop or Spotlight)

6 Motivation – Smart Metering and Internet of Things
Google glass Smart meters and IoT Camera sensor Smart sensor context Smart meters/objects (Linky, GPS tracker, set-up box, …) Smart sensors recording events in theirs surroundings (camera sensor, Google glass)

7 Smart Objects and Data Management
Why transposing traditional data management functionalities directly into the smart objects? Managing large collections of data locally in smart objects exhibits very good properties in terms of: Privacy & security Data distribution Transfer only the results and not the data Energy saving Avoiding to transmit all the data to a central server Transferring few data (the results) Bandwidth savings Several works consider the problem of data management in SOs: Basic filtering and SQL query support Facial recognition Full text search (documents, images: tags/visterms, any tagged data objects)

8 Full-Text Search Requirements (1)
Inverted index A search structure or (a dictionary): stores for each term t appearing in the documents the number Ft of documents containing t and a pointer to the inverted list of t A set of inverted lists: where each list stores for a term t the list of (d, fd,t) pairs where d is a document identifier that contains t and fd,t is the weight of the term t in the document d Dictionary organized as a B-tree ti , Fti tj , Ftj (d1, fti,d1), (d3, fti,d3), (d4, fti,d4) Inverted list for term ti (d5, ftj,d5)

9 Full-text search requirements (1)
Term Ft Inverted list and 1 (6,2) big 2 (2,2) (3,1) dark (6,1) did (4,1) gown (2,1) had (3,1) house (2,1) (3,1) in 5 (1,1) (2,2) (3,1) (5,1) (6,2) keep 3 (1,1) (3,1) (5,1) keeper (1,1) (4,1) (5,1) keeps (1,1) (5,1) (6,1) light never night (1,1) (4,1) (5,2) old 4 (1,1) (2,2) (3,1) (4,1) sleep sleeps the 6 (1,3) (2,2) (3,3) (4,1) (5,3) (6,2) town (1,1) (3,1) where The old night keeper keeps the keep in the town. In the big old house in the big old gown. The house in the town had the big old keep. Where the old night keeper never did sleep. The night keeper keeps the keep in the night. And keeps in the dark and sleeps in the light. « Keeper » Inverted Index « Keeper » Document Set Dictionary Organized as a B-Tree ti , Fti tj , Ftj (d1, fti,d1), (d3, fti,d3), (d4, fti,d4) Inverted list for term ti (d5, ftj,d5)

10 TF-IDF(doc) =  (fd,ti* Log( N / Fti))
Full-Text Search Requirements (2) Answer full-text search queries For a set of query keywords, produce the k most relevant documents (according to a weight function like TF-IDF) To evaluate the query: Access the inverted index search structure, retrieve for each query term t the inverted lists elements Allocate in RAM one container for each document identifier in these lists Compute the score of each of these documents using the TF-IDF formula Rank the documents according to their score and produce the k documents with the highest score TF-IDF(doc) =  (fd,ti* Log( N / Fti)) {ti} query keywords too much!

11 How do existing techniques deal with these constraints ?
Smart Object HW Architecture Smart objects share a common architecture (Secure) Microcontroller Low cost But small RAM ( 5KB ~ 128KB) NAND Flash Dense, robust, low cost But high cost of random writes Pages must be erased before being rewritten Erase by block vs. write by page Tiny RAM and NAND Flash introduce conflicting constraints for data indexing NAND FLASH MCU BUS How do existing techniques deal with these constraints ?

12 Insert-optimized family Query-optimized family
Problem Statement Challenge: execute queries with a very small RAM on large volumes of data indexed in NAND Flash Query time Insert-optimized family Sequentially write the index in Flash Small indexed structure (hash function with a small number of buckets indexed in RAM) Updates not supported! Query-optimized family Update the index in place Insertion time Objectives of the proposed solution: Bounded RAM (a few KB) & Full Scalability (both for updates and queries) Design principles Write-once partitioning (update scalability) Linear pipelining (query evaluation under a Bound RAM) Background merging (query/update scalability)

13 Principle1: Write-Once Partitioning
Split the inverted index structure in successive partitions such that a partition is flushed only once in Flash and is never updated. I1 I2 I3 Ip ………… ………… ………… FLASH RAM RAM _Bound

14 Principle2: Linear Pipelining
For a Q = {t1, t2,….,tn} : 𝑡𝑖∈𝑄 𝑊( 𝑓 𝑑,𝑡 ∗ log ) a global metadata N N Topk Fti Fti I1 I2 I3 Ip ti,fti ti,fti ti,fti ti,fti ………… ti,ftj FLASH RAM page page insert (d, s) = fti + fti + fti + … + fti s > min Ft1 , Ft2 , Ft3 ,… ,Ftn top-k merge on d sscore(d) min

15 Principle3: Background Linear Merging
I b L0 I 1 L0 I 2 L0 L0 ……… ……… merge merge merge merge I1,1 I1,b I1,1 I1,2 L1 inverted lists for term ti merge I2,1 order of the scan when querying the index L2 Active partition Reclaimed partition SSF (Scalable and Sequential Flash structure)

16 Document Deletions (1) Implementing the delete operation is challenging : Index updating  Random updates in the index State of the art embedded search indexes do not support/consider document deletions/updates The alternative to updating in-place is compensation: Store the Deleted Document Identifier (DDIs) as a sorted list in Flash Intersect DDIs lists at query execution time with the inverted lists of the query terms Compensation problems: Random documents deletion  maintaining a sorted list of DDIs in Flash violate the Write-Once Partitioning principle The Ft computation need an additional merge operation to subtract the sorted list of DDIs from the inverted lists for each term in the query the full DDI list has to be scanned for each query regardless of the query selectivity violate the Linear Pipelining principle

17 Document Deletions (2) Retained deletion method:
Compensate the index structure itself: A pair of (term, d, - fd,t) is inserted in Ii for each term in the deleted document d ft of each term t in d is decremented by 1 The objective is threefold: Preclude random writes Write-Once Partitioning principle Query selectivity Linear Pipelining principle Absorb the deleted documents in Background Merging

18 Document Deletions (3) Query: ………… FLASH RAM … … … dstock page page
𝑡𝑖∈𝑄 𝑊( 𝑓 𝑑,𝑡 𝑥 log ) N N a global metadata Topk Fti Fti I1 I2 I3 Ip ti,fti ti,fti ti,fti ti,fti ………… ti,ftj FLASH RAM dstock page page no yes insert (d, s) d< max insert (d, s) no yes check flash stock = fti + fti + fti + … + fti fd,t< 0 max Ft1 , Ft2 , Ft3 ,… , Ftn d min s > min s purge s< min merge on d sscore(d) top-k Ghost

19 Experimental Evaluation
HW platform: development board ST3221G-EVAL MCU STM32-F217IG and microSD card Storage on two SD cards (Silicon Power SDHC Class 10 4GB & Kingston microSDHC Class 10 4GB) Index RAM bound = 5KB SSF branching factors: b=8 and b’=3 Datasets and query sets PDS/Personal Cloud use-case: “rich” documents very large vocabulary (500k terms) and documents (more than 1000 terms per doc on average) ENRON dataset: 500k s (946MB of raw text) Pseudo-desktop dataset (CIKM’09): 27k documents, i.e., , html, pdf, doc and ppt (252 MB of raw text) Smart sensor use-case: “poor” documents moderate vocabulary (10k terms) and documents (100 terms per doc on average) Synthetic dataset: 100k documents (129MB of raw text)

20 Comparison with the State-of-the-Art Search Engine Methods
a. Average document insertion times of Microsearch, SSF and the Inverted Index with Silicon Power storage. b. Query execution times with the Inverted Index, SSF and Microsearch with Silicon Power storage

21 Overall performance comparison
Comparison with the State-of-the-Art Search Engine Methods Overall performance comparison

22 Conclusion & Future Work
We presented an embedded search engine for smart objects equipped with extremely low RAM and large Flash storage Our proposal is founded on three design principles, to produce an embedded search engine reconciling high insert/delete rate and query scalability Our inverted index supports document deletions, while the state-of- the-art embedded search indexes do not consider document deletions. Future work : Efficient tag-based access control using the embedded search engine Apply our 3 designs principle (Write-Once Partitioning, Linear Pipelining, Background Merging) to other indexing structures for smart objects

23 Questions/suggestions ?
Merci ! Questions/suggestions ?


Download ppt "SMIS: Secured and Mobile Information Systems"

Similar presentations


Ads by Google