RFID Data Management Kamlesh Laddhad (05329014) Karthik B.(05329021) Guide: Prof. Bernard Menezes.


1 RFID Data Management Kamlesh Laddhad (05329014) Karthik B.(05329021) Guide: Prof. Bernard Menezes

2 Outline
Introduction to RFID Technology.
Issues with RFID Technology.
RFID Data Characteristics.
Data Warehousing.
  – Expressive Temporal Model: Dynamic Relationship ER Model.
  – RFID-Cuboids.
  – Use of Bitmap Datatype.
Data Cleaning.
  – Extensible Sensor stream Processing (ESP).
  – Statistical sMoothing for Unreliable RFid data (SMURF).
Future Plans.

3 Introduction
Radio Frequency Identification:
  – It is an Automatic Identification and Data Capture technology.
  – Fast.
  – Needs no contact or line of sight.
  – Uses radio-frequency waves to transfer data.
Components:
  – Tag: a small, low-cost device that can hold a limited amount of data.
      Associated with objects such as pallets, cases, and even individual items.
  – Reader: recognizes the presence of a tag and reads the information stored on it.
      A unique electronic product code (EPC) is associated with each tag.
By placing RFID tag readers at various locations, one can track the movement of objects through supply chain networks.

4 Applications and Adoptions
Supply chain management: real-time inventory tracking.
  – US Department of Defense: shipments to armed forces.
Retail: active shelves monitor product availability.
  – Wal-Mart, Albertsons: major retail stores.
Access control: toll collection, transportation.
  – Airline luggage management:
      British Airways: 20 million bags a year.
      Implemented to reduce lost or misplaced luggage.
Anti-counterfeiting and security:
  – Food and Drug Administration: to reduce counterfeiting in the pharmaceutical supply chain.

5 Prospects for RFID Research
The physics of building tags and readers:
  – Tags have few gates: beyond basic operation they have very little computing power.
  – Radio-frequency signals have trouble operating in certain physical media.
Privacy and safety issues:
  – Complex encryption schemes are not possible on RFID tags.
  – Counterfeiting by means of illegitimate readers or spoofed tags is possible.
  – Reader-tag communication is wireless: third parties can eavesdrop on signals.
Software architecture to collect, filter, organize, and answer online queries:
  – The number of tags is proportional to the number of items being serviced or tracked.
  – The number of readers is proportional to the number of strategic locations or areas to be traced.
      Each reader picks up tag signals continuously.
      The data generated by RFID systems is enormous:
      e.g. Wal-Mart is expected to generate 7 terabytes of RFID data per day.
Our focus: the third stream.

6 Data Warehousing Techniques

7 Data Management Challenges
Data explosion: example
  – A retailer with 3,000 stores, each selling 10,000 items a day.
  – Each item moves 10 times on average before being sold.
      Each movement is recorded as (EPC, location, second).
  – Data volume: 300 million tuples per day.
  – Example OLAP query: "Average time for items to move from the warehouse to the checkout counter in March 2006?"
      Costly to answer if there are a billion tuples for March 2006.
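A quick check of the arithmetic on this slide, as a minimal Python sketch. The variable names are illustrative and do not come from the presentation.

```python
# Back-of-the-envelope data volume for the retailer example.
stores = 3_000
items_sold_per_store_per_day = 10_000
moves_per_item = 10  # average number of recorded movements before sale

# One (EPC, location, second) tuple is recorded per movement.
tuples_per_day = stores * items_sold_per_store_per_day * moves_per_item
tuples_in_march = tuples_per_day * 31  # March has 31 days

print(f"{tuples_per_day:,} tuples per day")
print(f"{tuples_in_march:,} tuples in March")
```

At this rate a single month already holds billions of tuples, which is why scanning the raw data for each OLAP query is costly.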

8 Data Characteristics
Temporal and history oriented:
  – Applications dynamically generate observations (readings).
  – Object locations and the containment relationships among objects change over time.
  – Need: an expressive data model.
Inaccurate data and implicit semantics:
  – False positive: a non-existent tag is incorrectly read.
  – False negative: the reader misses a tag that was in its vicinity.
  – Noisy data and duplicate readings (redundancy): the same tag is read more than once.
  – Need: automated data filtering and transformation.
Streaming and large volume:
  – Objects stay in place for long durations and readers record them periodically, so data keeps accumulating.
  – This data must be preserved for tracking and monitoring.
  – Need: a scalable storage scheme and compression techniques to reduce the data.
Data granularity:
  – The granularity of data collection needs to be decided.
  – It differs across applications.

9 Warehousing Helps!!
Lossless compression:
  – Remove redundancy: (r1, l1, t1), (r1, l1, t2), ..., (r1, l1, t10) => (r1, l1, t1, t10).
  – Group objects that move and stay together.
Data cleaning: multi-reading, missed reading, erroneous reading, bulky movement.
Data mining: find trends, outliers, and frequent, sequential, and flow patterns.
Multi-dimensional summary: product, location, time, …
  – Store manager: checks item movements from the backroom to the different shelves in the store.
  – Region manager: collapses intra-store movements and looks at distribution centers, warehouses, and stores.
Query processing:
  – Support for OLAP: roll-up, drill-down, slice, and dice.
  – Path queries: new to RFID warehouses, they ask about the structure of paths.
      Which products that go through quality control have shorter paths?
      Which locations are common to the paths of a set of defective auto parts?
      Identify containers at a port that have deviated from their historic paths.
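The redundancy-removal step can be sketched in a few lines of Python. This is only an illustration under simple assumptions (readings are tuples of the form (epc, location, time), and a stay ends as soon as the location changes); the function name is hypothetical.

```python
from itertools import groupby

def compress_readings(readings):
    """Collapse repeated (epc, location, time) readings into
    (epc, location, time_in, time_out) stay records."""
    stays = []
    # Process each tag's readings in time order.
    ordered = sorted(readings, key=lambda r: (r[0], r[2]))
    for epc, group in groupby(ordered, key=lambda r: r[0]):
        current = None
        for _, location, t in group:
            if current is not None and current[1] == location:
                current[3] = t  # same place: extend the current stay
            else:
                if current is not None:
                    stays.append(tuple(current))
                current = [epc, location, t, t]  # new stay begins
        stays.append(tuple(current))
    return stays

readings = [("r1", "l1", t) for t in range(1, 11)]
readings += [("r1", "l2", 11), ("r1", "l2", 12)]
print(compress_readings(readings))
```

Ten readings at l1 become a single record, and no information about where the tag was or when is lost as long as the reader polls at a fixed rate.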

10 Dynamic Relationship ER Model
Proposed by Wang and Liu from Siemens.
RFID entities are static and are not altered.
RFID relationships are dynamic and change all the time.
Two types of dynamic relationship are added:
  – Event-based dynamic relationship: a timestamp attribute records when the event occurred.
  – State-based dynamic relationship: tstart and tend attributes record the lifespan of a state.

11 Static entity tables:
  – OBJECT(object_epc, name, description)
  – LOCATION(location_id, name, owner)
  – SENSOR(sensor_epc, name, description)
  – TRANSACTION(transaction_id, transaction_type)
Dynamic relationship tables:
  – OBSERVATION(sensor_epc, value, timestamp)
  – OBJECTLOCATION(epc, location_id, tstart, tend)
  – TRANSACTIONITEM(transaction_id, epc, timestamp)
  – CONTAINMENT(epc, parent_epc, tstart, tend)
  – SENSORLOCATION(sensor_epc, location_id, position, tstart, tend)

12 Monitoring
Missing RFID object detection:
  – Find when and where the object with EPC = 'MEPC' was lost.
      select location_id, tstart, tend
      from objectlocation
      where epc = 'MEPC'
        and tstart = (select max(o.tstart) from objectlocation o where o.epc = 'MEPC')
  – Check whether any objects are missing at the current location C, knowing that all objects were present at the previous location L at time T.
      select l.epc
      from objectlocation l
      where l.location_id = 'L' and l.tstart = 'T'
        and l.epc not in (select c.epc from objectlocation c where c.location_id = 'C')
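To try the two monitoring queries, here is a small sketch that runs them against an in-memory SQLite database from Python. The table follows the OBJECTLOCATION schema from the slides; the sample rows, the integer timestamps and the helper names (make_db, last_seen, missing_at) are assumptions made for illustration.

```python
import sqlite3

def make_db(rows):
    """Build an in-memory OBJECTLOCATION table from
    (epc, location_id, tstart, tend) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objectlocation "
        "(epc TEXT, location_id TEXT, tstart INTEGER, tend INTEGER)"
    )
    conn.executemany("INSERT INTO objectlocation VALUES (?, ?, ?, ?)", rows)
    return conn

def last_seen(conn, epc):
    """Return (location_id, tstart, tend) of the most recent stay of an EPC."""
    return conn.execute(
        "SELECT location_id, tstart, tend FROM objectlocation "
        "WHERE epc = ? AND tstart = "
        "(SELECT MAX(o.tstart) FROM objectlocation o WHERE o.epc = ?)",
        (epc, epc),
    ).fetchone()

def missing_at(conn, prev_loc, t, cur_loc):
    """EPCs that were at prev_loc at time t but never showed up at cur_loc."""
    rows = conn.execute(
        "SELECT l.epc FROM objectlocation l "
        "WHERE l.location_id = ? AND l.tstart = ? AND l.epc NOT IN "
        "(SELECT c.epc FROM objectlocation c WHERE c.location_id = ?) "
        "ORDER BY l.epc",
        (prev_loc, t, cur_loc),
    ).fetchall()
    return [r[0] for r in rows]

# Three objects leave location L; only A and C arrive at location C.
conn = make_db([
    ("A", "L", 10, 15), ("B", "L", 10, 15), ("C", "L", 10, 15),
    ("A", "C", 20, None), ("C", "C", 20, None),
])
print(missing_at(conn, "L", 10, "C"))
print(last_seen(conn, "B"))
```

The second call reports where the missing object B was last recorded, which is the "when and where was it lost" question from the slide.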

13 Tracking
RFID object moving time inquiry:
  – How long does it take to supply 'OEPC' from location S to location E?
      select (e.tstart - s.tstart) as supplying_time
      from objectlocation e, objectlocation s
      where e.epc = 'OEPC' and s.epc = 'OEPC'
        and s.location_id = 'S' and e.location_id = 'E'
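The moving-time query can be exercised the same way. The sketch below uses Python's sqlite3 module with made-up rows and integer timestamps; the function name supplying_time mirrors the column alias in the query and is otherwise hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objectlocation "
    "(epc TEXT, location_id TEXT, tstart INTEGER, tend INTEGER)"
)
conn.executemany(
    "INSERT INTO objectlocation VALUES (?, ?, ?, ?)",
    [
        ("OEPC", "S", 100, 160),   # leaves the source
        ("OEPC", "M", 170, 220),   # intermediate stop
        ("OEPC", "E", 250, None),  # arrives at the destination
    ],
)

def supplying_time(conn, epc, src, dst):
    """Time between arriving at src and arriving at dst, via a self-join."""
    row = conn.execute(
        "SELECT e.tstart - s.tstart AS supplying_time "
        "FROM objectlocation e, objectlocation s "
        "WHERE e.epc = ? AND s.epc = ? "
        "AND s.location_id = ? AND e.location_id = ?",
        (epc, epc, src, dst),
    ).fetchone()
    return row[0] if row else None

print(supplying_time(conn, "OEPC", "S", "E"))
```

If an object visits the same location more than once the self-join returns several rows, so a production query would need to pick which pair of stays it means.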

14 Compression Idea
Bulky object movements:
  – Objects often move and stay together through the supply chain.
  – If 1000 packs of product P stay together at the distribution center, register a single record:
  – (GID, distribution center, time_in, time_out).
  – GID is a generalized identifier that represents the 1000 packs that stayed together at the distribution center.
Analysis usually takes place at a much higher level of abstraction than the one present in raw RFID data.
Figure: Factory → Dist. Center 1, Dist. Center 2, … [10 pallets (1000 cases)] → store 1, store 2, … [20 cases (1000 packs)] → shelf 1, shelf 2, … [10 packs (12 sodas)].
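A minimal Python sketch of the grouping idea: items that share the same location and the same time_in and time_out are replaced by one record carrying a group identifier. The exact-match grouping rule and the "g1, g2, …" naming are simplifying assumptions for illustration, not the method of the cited paper.

```python
from collections import defaultdict

def group_stays(item_stays):
    """Collapse item-level stays into group-level stay records.

    item_stays: iterable of (epc, location, time_in, time_out).
    Returns (stay_table, members) where stay_table holds
    (gid, location, time_in, time_out) and members maps gid -> EPC list.
    """
    groups = defaultdict(list)
    for epc, location, time_in, time_out in item_stays:
        groups[(location, time_in, time_out)].append(epc)

    stay_table, members = [], {}
    for n, key in enumerate(sorted(groups), start=1):
        gid = f"g{n}"  # hypothetical GID naming scheme
        stay_table.append((gid, *key))
        members[gid] = sorted(groups[key])
    return stay_table, members

# 1000 packs that stayed together, plus one pack that arrived later.
stays = [(f"pack{i}", "dist_center", 100, 250) for i in range(1000)]
stays.append(("pack_late", "dist_center", 120, 250))
stay_table, members = group_stays(stays)
print(stay_table)
```

The 1001 item-level tuples shrink to two stay records; the EPC membership is kept separately so the compression stays lossless.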

15 RFID Cuboids
Fact table: (EPC, location, time_in, time_out).
In a supply chain, items travel through a series of locations.
Query: what is the average time that product P stays at the store in location A?
Traditional cubes miss the path structure of the data.
Stay table: (GIDs, location, time_in, time_out: measures).
  – Records information on items that stay together at a given location.
  – Recording transitions instead would make queries difficult to answer, because many intersections are needed.
Map table: (GID, ).
  – Links together stages that belong to the same path, providing additional compression and query processing efficiency.
  – A high-level GID points to lower-level GIDs.
  – Saving complete EPC lists instead would incur high I/O costs to retrieve long lists and make query processing costly.
Information table: (EPC list, attribute 1, ..., attribute n).
  – Records path-independent attributes of the items, e.g. color, manufacturer, price.

16 EPC Overview
Electronic product code:
  – Standard naming scheme, proposed by the Auto-ID Center.
  – An EPC uniquely identifies an item.
  – Format:
      Header: identifies the length, type, structure, version and generation of the EPC.
      Manager Number: identifies an organizational entity.
      Object Class: identifies a "class", or type of thing.
      Serial Number: the specific instance of the Object Class being tagged.
  – We will refer to the Header, Manager Number and Object Class together as the prefix, and to the Serial Number as the suffix.

17 Use of Bitmap Datatype
Observation: items move together.
  – Groups of items in the same proximity, e.g. on a shelf or in a shipment.
  – Groups of items with the same property, e.g. the same product.
Use a bitmap type to model a collection of EPCs that can occur in item tracking applications.
  – Instead of storing a tuple per item, store one tuple for all the items that have the same prefix.
  – New extra fields instead of epc:

18 Example: Product Inventory
With EPC collections:
  Store_id  Prod_id  Time  Item_collection
  s1        p1       t1    epc11, epc12, epc13, …
  s1        p2       t2    epc21, epc22, epc23, …
  …         …        …     …
With epc_bitmaps:
  Store_id  Prod_id  Time  Item_bmap
  s1        p1       t1    bmap1
  s1        p2       t2    bmap2
  …         …        …     …

19 Use of Bitmap Datatype
EPC layout:
  Header  EPC_Manager  Object_Class  Serial_Number
  2 bits  21 bits      17 bits       24 bits
EPCs in the collection:
  0x4AA890001F62C160
  …
  0x4AA890001FA0B38E
Bitmap representation:
  Len  Suff_len  Prefix        Suff_start  Suff_end  bitmap
  64   24        0x4AA890001F  0x62C160    0xA0B38E  101001…00010
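The encoding on this slide can be sketched in Python using an arbitrary-precision integer as the bitmap. The 24-bit suffix matches the slide; the function names and the bit order (bit k of the integer stands for suffix Suff_start + k) are assumptions for illustration and need not match the layout used in the cited Oracle work.

```python
SUFFIX_BITS = 24  # Serial_Number width from the slide

def split_epc(epc):
    """Split a 64-bit EPC into (prefix, suffix)."""
    return epc >> SUFFIX_BITS, epc & ((1 << SUFFIX_BITS) - 1)

def epcs_to_bitmap(epcs):
    """Encode EPCs sharing one prefix as (prefix, suff_start, suff_end, bitmap)."""
    parts = [split_epc(e) for e in epcs]
    prefixes = {p for p, _ in parts}
    if len(prefixes) != 1:
        raise ValueError("all EPCs must share the same prefix")
    suffixes = [s for _, s in parts]
    start, end = min(suffixes), max(suffixes)
    bitmap = 0
    for s in suffixes:
        bitmap |= 1 << (s - start)  # assumed order: bit k <-> suffix start + k
    return prefixes.pop(), start, end, bitmap

def bitmap_to_epcs(prefix, suff_start, bitmap):
    """Decode the bitmap back into a sorted list of EPCs."""
    epcs = []
    while bitmap:
        low = bitmap & -bitmap            # isolate the lowest set bit
        offset = low.bit_length() - 1
        epcs.append((prefix << SUFFIX_BITS) | (suff_start + offset))
        bitmap ^= low                     # clear that bit
    return epcs

epcs = [0x4AA890001F62C160, 0x4AA890001F62C162, 0x4AA890001FA0B38E]
prefix, start, end, bitmap = epcs_to_bitmap(epcs)
print(hex(prefix), hex(start), hex(end), bin(bitmap).count("1"))
```

The decoder only visits set bits, so a sparse bitmap covering millions of suffix values still decodes quickly.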

20 Bitmap Operations
To use this datatype in SQL, we need operations on such bitmaps.
Conversion and counting operations: epc2Bmap, bmap2Epc and bmap2Count.
Pairwise logical operations: bmapAnd, bmapOr, bmapMinus, and bmapXor.
Maintenance operations: bmapInsert and bmapDelete.
Membership testing operation: bmapExists.
Comparison operation: bmapEqual.
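To make the semantics of these operations concrete, here is a sketch that models a bitmap as a Python integer. The snake_case functions are hypothetical stand-ins named after the operators on the slide, not the actual SQL operators, and they assume both operands share the same prefix and starting suffix.

```python
def bmap_and(a, b):
    return a & b            # items present in both collections

def bmap_or(a, b):
    return a | b            # items present in either collection

def bmap_minus(a, b):
    return a & ~b           # items in a that are not in b

def bmap_xor(a, b):
    return a ^ b            # items in exactly one collection

def bmap_count(a):
    return bin(a).count("1")

def bmap_exists(a, offset):
    return (a >> offset) & 1 == 1

def bmap_insert(a, offset):
    return a | (1 << offset)

def bmap_delete(a, offset):
    return a & ~(1 << offset)

def bmap_equal(a, b):
    return a == b

# Shelf contents at two points in time (bit k = item with suffix offset k).
shelf_t1 = 0b0111
shelf_t2 = 0b1101
added = bmap_minus(shelf_t2, shelf_t1)
removed = bmap_minus(shelf_t1, shelf_t2)
print(bin(added), bin(removed), bmap_count(bmap_and(shelf_t1, shelf_t2)))
```

Each set operation is a single bitwise instruction over the whole collection, which is the appeal of the bitmap representation over item-level rows.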

21 Use of these operations in SQL
Items added to a given shelf between time t1 and t2:
  – SELECT bmap2Epc(bmapMinus(s2.item_bmap, s1.item_bmap))
    FROM Shelf_Inventory s1, Shelf_Inventory s2
    WHERE s1.shelf_id = <shelf_id> AND s1.shelf_id = s2.shelf_id
      AND s1.time = <t1> AND s2.time = <t2>;
A book store classifies its books into various categories.
  – The following query finds the shelves that currently hold books with both the 'Adventure' and 'Romance' properties.
  – SELECT s.shelf_id
    FROM Shelf_Inventory s
    WHERE bmap2Count(bmapAnd(s.item_bmap,
            (SELECT bmapAnd(p.Adventure, p.Romance) FROM Property_Inventory p))) > 0
      AND s.time = <current_time>;
Here <shelf_id>, <t1>, <t2> and <current_time> are placeholders for the query parameters.

22 Road Ahead
Extension to the bitmap proposal:
  – The bitmap datatype is best suited to initial bulk loads and batch updates.
  – It performs badly for incremental updates.
  – A 'hybrid scheme' for incremental updates:
      Maintain periodic inventory checkpoints using bitmaps.
      For changes occurring between checkpoints, maintain a traditional item-level table.
      Answer queries by merging the latest checkpoint bitmap with the item-level data for the period since that checkpoint.
The epc_suffix values in a collection may not be contiguous:
  – The bitmap will then be sparse, with a lot of zeros.
  – Compress it using some encoding scheme.
      Good for initial bulk loading and batch updates.
      May reduce the efficiency of bitmap operations.
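One way the hybrid scheme could answer a query is sketched below in Python: start from the last checkpoint bitmap and replay the item-level changes logged since then. The change-log format, a list of ("insert" or "delete", suffix offset) pairs, is an assumption made for this illustration.

```python
def current_inventory(checkpoint_bitmap, changes):
    """Merge a checkpoint bitmap with item-level changes logged after it.

    changes: ordered list of (operation, suffix_offset) pairs, where
    operation is "insert" or "delete" (hypothetical log format).
    """
    bitmap = checkpoint_bitmap
    for op, offset in changes:
        if op == "insert":
            bitmap |= 1 << offset
        elif op == "delete":
            bitmap &= ~(1 << offset)
        else:
            raise ValueError(f"unknown operation: {op}")
    return bitmap

checkpoint = 0b0101                                   # items 0 and 2 on the shelf
delta = [("insert", 1), ("delete", 0), ("insert", 3)]  # changes since checkpoint
print(bin(current_inventory(checkpoint, delta)))
```

The item-level table stays small because it is emptied at every checkpoint, so the merge cost is bounded by the number of changes in one checkpoint interval.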

23 Open Problems
Efficient methods for data mining problems:
  – Trend analysis
  – Outlier detection
  – Path clustering
We will try exploring data mining applications to RFID data.

24 RFID Data Cleaning

25 Issues in Data Cleaning
Lack of completeness:
  – RFID readers capture only 60-70% of the tags in their vicinity.
  – Data is smoothed to compensate for the lost intermediate readings.
Temporal nature of data, or tag dynamics:
  – RFID tags are in motion, which makes them more difficult to handle.
  – The motion of a tag causes readings to be dropped.
RFID data streams are fast and numerous:
  – Hence filtering is important before sending them to the database.

26 Current Strategies
Temporal granule:
  – Based on the fact that tag data does not differ much over a small time period.
  – Data can be grouped over a small time frame.
Spatial granule:
  – Similarly, data from physically close readers is also homogeneous.

27 Stages of ESP
Point: operates over a single value in a sensor stream, filtered by a predicate in the WHERE clause.
Smooth: corrects for missed readings temporally (over one input only), at a granularity defined by the application; uses an aggregate function over the input.
Merge: corrects for missed readings spatially, at a granularity specified by the application; grouped by the specified spatial granule.

28 Stages of ESP (contd.)
Arbitrate: deals with conflicts between different spatial granules; readings are grouped by spatial granule first, and a HAVING construct then identifies the conflicts.
Virtualize: combines data streams from different sources, which may also be different kinds of device; a join construct combines the streams, and the result is filtered using some predicate.
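ESP expresses these stages as declarative queries; the Python sketch below only imitates the logic of two of them, Smooth and Arbitrate, so the idea is easier to follow. The fixed window, the "most readings wins" rule and the function names are simplifying assumptions, not the ESP implementation.

```python
from collections import Counter, defaultdict

def smooth(seen_epochs, window, horizon):
    """Smooth: report a tag as present at epoch t if it was read in
    any of the last `window` epochs (t - window + 1 .. t)."""
    return [
        any((t - k) in seen_epochs for k in range(window))
        for t in range(horizon)
    ]

def arbitrate(readings):
    """Arbitrate: when a tag is read in several spatial granules,
    assign it to the granule that produced the most readings."""
    counts = defaultdict(Counter)
    for tag, granule in readings:
        counts[tag][granule] += 1
    return {tag: c.most_common(1)[0][0] for tag, c in counts.items()}

# The tag was missed at epoch 2 and left after epoch 3.
print(smooth({0, 1, 3}, window=2, horizon=6))

# Tag t1 is picked up by two neighbouring shelves.
readings = [("t1", "shelfA"), ("t1", "shelfA"), ("t1", "shelfB"), ("t2", "shelfB")]
print(arbitrate(readings))
```

Smoothing fills the dropped reading at epoch 2 but also keeps the tag "present" for one epoch after it leaves, which is the window-size trade-off that adaptive cleaning tries to manage.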

29 Smooth stage
False positives (erroneous readings): reporting objects that are not actually present.
False negatives (missed readings): not reporting objects that actually are present.
Figure: False positives and false negatives [Jeff06]

30 Tag List
The reader has an internal table called the Tag List.
An epoch is the smallest unit of interaction between the reader and the middleware.
Every epoch consists of a certain number of interrogation cycles.
An interrogation cycle is one run of the reader protocol to determine all tags.
At every epoch the reader sends the tag list to the middleware.
  Tag ID    Responses  Timestamp
  12341234  6          t1
  12347890  1          t2

31 SMURF – Per-tag Cleaning
SMURF uses statistical methods to reduce the false negatives and false positives in the RFID stream.
The goal is twofold: first, to determine the statistical window size, and second, to ensure that tag transitions are detected.
To determine the window size, we fit a probability distribution to the sample size.
To detect the transition of a tag out of the reader's vicinity, we define a 98% confidence interval within that probability distribution on the sample size |S_i|.

32 SMURF – Per-tag Cleaning (contd.)
Using the tag list, the per-epoch sampling probability p_i,t is determined: p_i,t = (number of times the tag was read in an epoch) / (interrogation cycles per epoch).
We average this over the sample S_i to get the average read rate p_i^avg for tag i.
If the same probability p_i is assumed for each epoch throughout the window, then each epoch is a Bernoulli trial in which observing the tag counts as a success.
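The two quantities on this slide can be computed directly from the tag list. The sketch below assumes the response counts for one tag are available as a list with one entry per epoch in which the tag was observed; the function names are illustrative.

```python
import math

def per_epoch_probability(responses, cycles_per_epoch):
    """p_i,t: fraction of interrogation cycles in one epoch that read the tag."""
    return responses / cycles_per_epoch

def average_read_rate(responses_per_epoch, cycles_per_epoch):
    """p_i^avg: mean of p_i,t over the epochs in which the tag was observed."""
    probs = [per_epoch_probability(r, cycles_per_epoch) for r in responses_per_epoch]
    return sum(probs) / len(probs)

# A tag answered 6, 1 and 5 times in three epochs of 10 cycles each.
p_avg = average_read_rate([6, 1, 5], cycles_per_epoch=10)
print(round(p_avg, 3))
```

A tag that responds in most interrogation cycles gets a p_i^avg close to 1, which in turn allows a small smoothing window.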

33 SMURF – Per-tag Cleaning (contd.)
So |S_i| is a binomial random variable for a sample S_i, with mean w_i · p_i^avg and variance w_i · p_i^avg · (1 − p_i^avg).
Using this, we can express the required window size as a bound.
If the current window size is less than the calculated one, the window size is adjusted accordingly.
Similarly, using the central limit theorem, a transition is detected when ||S_i| − μ| > 2σ.
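The window bound itself did not survive in the transcript. The sketch below uses the completeness bound as it is stated in the cited SMURF paper, w_i ≥ ⌈ln(1/δ) / p_i^avg⌉, where δ is the tolerated probability of missing the tag entirely; treat both the formula and the default δ = 0.05 as assumptions to be checked against that paper. The transition test is the 2σ rule from the slide, with μ and σ taken from the binomial model.

```python
import math

def required_window(p_avg, delta=0.05):
    """Smallest window (in epochs) so that a tag with read rate p_avg is
    seen at least once with probability > 1 - delta (assumed bound)."""
    return math.ceil(math.log(1 / delta) / p_avg)

def transition_detected(num_readings, window, p_avg):
    """Flag a transition when |S_i| deviates from its binomial mean
    w * p by more than two standard deviations."""
    mean = window * p_avg
    sigma = math.sqrt(window * p_avg * (1 - p_avg))
    return abs(num_readings - mean) > 2 * sigma

print(required_window(0.5), required_window(0.1))
print(transition_detected(4, window=20, p_avg=0.5))
print(transition_detected(8, window=20, p_avg=0.5))
```

A poorly read tag (p = 0.1) needs a window five times larger than a well read one (p = 0.5), which is why a single fixed window cannot suit every tag.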

34 Normal Sliding Window
Epoch-based mid-point sliding window.
Emits a reading with an epoch value corresponding to the middle of the window.

35 Ensuring Completeness
In the first window, p_i^avg demands a larger window.
Thus the window size is increased.

36 Transition Detection
In the first window the number of readings decreases by a statistically significant amount.
Thus a transition is likely to have occurred, so the window is halved. [Franklin06]

37 SMURF – Multi-tag Aggregate Cleaning
Similar to per-tag cleaning, the window size for multi-tag cleaning is determined by a bound in p^avg, the average per-epoch sampling probability over all observed tags.
To detect a transition in the population count, we estimate the population count over two windows, [t − w_i, t] and [t − w_i/2, t], with true populations N_w and N_w'.
A transition is signalled when the difference between the two estimates exceeds the limit 2(σ_w + σ_w').

38 SMURF – Multi-tag Aggregate Cleaning
To estimate the population count, we use π-estimators.
Similarly, using π-estimators and assuming independence across different tags, we estimate the variance of that estimate.
Here π_i is the probability of reading tag i at least once during the whole window, given by 1 − (1 − p_i^avg)^w.
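The estimator formulas are missing from the transcript. The sketch below uses the standard π-estimator (Horvitz-Thompson) forms, N̂ = Σ 1/π_i and V̂ = Σ (1 − π_i)/π_i², summed over the observed tags; these are assumed here to be what the slide showed. The transition test compares the estimates for the full and half windows against 2(σ_w + σ_w').

```python
import math

def pi_estimate(read_rates, window):
    """Estimate the tag population and the variance of that estimate.

    read_rates: p_i^avg for each observed tag (each must be > 0).
    window: window size w in epochs.
    """
    # pi_i: probability of reading tag i at least once within the window.
    pis = [1 - (1 - p) ** window for p in read_rates]
    n_hat = sum(1 / pi for pi in pis)
    var_hat = sum((1 - pi) / pi ** 2 for pi in pis)
    return n_hat, var_hat

def population_changed(full_window, half_window):
    """Each argument is an (estimate, variance) pair from pi_estimate."""
    (n_w, var_w), (n_h, var_h) = full_window, half_window
    return abs(n_w - n_h) > 2 * (math.sqrt(var_w) + math.sqrt(var_h))

# Two observed tags, each read with probability 0.5 in a one-epoch window.
print(pi_estimate([0.5, 0.5], window=1))
print(population_changed((10.0, 1.0), (3.0, 1.0)))
```

Each observed tag stands in for 1/π_i tags, so tags that are hard to read count for more than one and compensate for similar tags that were missed altogether.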

39 The Road Ahead…
RFID applications do not tolerate delays in data delivery.
Data resides either in the cache or in the database; querying the database increases processing time, while the cache does not support SQL-like queries.
Anomaly detection is also an important part of object tracking.
Issues such as untraceability, forward security, and database desynchronization are still not completely resolved.
Counterfeiting is another serious problem with RFID.
In the next stage we expect to look into some of these issues.

40 ????

41 Thank You.

42 References
Hector Gonzalez, Jiawei Han, Xiaolei Li and Diego Klabjan. Warehousing and analyzing massive RFID data sets. ICDE, 2006.
Fusheng Wang and Peiya Liu. Temporal management of RFID data. VLDB, 2005.
Ying Hu, Seema Sundara, Timothy Chorma and Jagannathan Srinivasan. Supporting RFID-based item tracking applications in Oracle DBMS using a bitmap datatype. VLDB, 2005.

43 References
Shawn R. Jeffery, Minos Garofalakis and Michael J. Franklin. Adaptive cleaning for RFID data streams. VLDB, 2006.
Shawn R. Jeffery, Gustavo Alonso, Michael J. Franklin, Wei Hong and Jennifer Widom. Declarative support for sensor data cleaning. Pervasive, 2006.
Sudarshan S. Chawathe, Venkat Krishnamurthy, Sridhar Ramachandran and Sanjay E. Sarma. Managing RFID data. VLDB, 2004.

