Course Project Ideas Yanlei Diao University of Massachusetts Amherst.


1 Course Project Ideas Yanlei Diao University of Massachusetts Amherst

2 6/11/2015 Yanlei Diao, University of Massachusetts Amherst
New Directions for DB Research
– Sensor data: new architecture
– XML: new data model
– Streams: new execution model
– Data quality and lineage: new services
– …

3 Querying in Sensor Networks
Store data locally at sensors and push queries into the sensor network
– Energy efficiency of flash memory
– Limited capabilities of sensor platforms
[Figure: acoustic and image streams are stored in flash memory at the sensors; an Internet gateway pushes queries to the sensors]

4 Optimize for Flash and Limited RAM
Flash memory constraints
– Data cannot be overwritten, only erased
– Pages can often only be erased in blocks (16–64 KB)
– Unlike magnetic disks, flash cannot be modified in place
Challenges
– Energy: organize data on flash to minimize read/write/erase operations
– Memory: minimize the memory used by the flash database
[Figure: updating data in place means (1) loading the erase block into memory, (2) modifying it in memory, and (3) erasing the block and saving it back; an erase block is ~16–64 KB, while sensor RAM is ~4–10 KB]
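To make the energy argument concrete, here is a toy model that counts flash operations. It compares the read-modify-erase-write cycle from the figure with a log-structured alternative that appends new versions instead of overwriting. The block size, the page-level write-once rule, and the names (`FlashBlock`, `update_in_place`, `AppendLog`) are simplifying assumptions for illustration, not StonesDB's design.

```python
# Toy flash model (hypothetical parameters): a page can be programmed once,
# and only a whole block can be erased. Real devices differ in sizes and rules.
PAGES_PER_BLOCK = 8


class FlashBlock:
    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK  # None means "erased, writable"
        self.writes = 0
        self.erases = 0

    def write(self, i, data):
        if self.pages[i] is not None:
            raise ValueError("page already programmed; erase the block first")
        self.pages[i] = data
        self.writes += 1

    def erase(self):
        self.pages = [None] * PAGES_PER_BLOCK
        self.erases += 1


def update_in_place(block, i, data):
    """Disk-style update: copy the block to RAM, modify it, erase, write back."""
    buffer = list(block.pages)  # 1. load block into memory (block-sized RAM)
    buffer[i] = data            # 2. modify in memory
    block.erase()               # 3. erase the block ...
    for j, page in enumerate(buffer):
        if page is not None:
            block.write(j, page)  # ... and save every live page back


class AppendLog:
    """Log-structured alternative: write each new version to the next free page.
    (Garbage collection of stale versions when the block fills is omitted.)"""

    def __init__(self):
        self.block = FlashBlock()
        self.next_free = 0

    def append(self, data):
        self.block.write(self.next_free, data)
        self.next_free += 1


def compare(initial_pages=4, updates=3):
    """Write `initial_pages` pages, then update page 0 `updates` times both ways."""
    in_place, log = FlashBlock(), AppendLog()
    for i in range(initial_pages):
        in_place.write(i, f"page{i}-v0")
        log.append(f"page{i}-v0")
    for v in range(1, updates + 1):
        update_in_place(in_place, 0, f"page0-v{v}")
        log.append(f"page0-v{v}")
    return (in_place.writes, in_place.erases), (log.block.writes, log.block.erases)


print(compare())  # ((16, 3), (7, 0))
```

Appending avoids erases entirely until space must be reclaimed. That is one reason log-structured layouts are a common starting point for flash-resident indexes.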

5 StonesDB: System Operation
Image retrieval: "Return images taken last month with at least two birds, one of which is a bird of type A."
Proxy (cache of image summaries):
– Identify the “best” sensors to forward the query to.
– Provide hints to reduce search complexity at the sensor.

6 StonesDB: System Operation
Image retrieval: "Return images taken last month with at least two birds, one of which is a bird of type A."
[Figure: at each sensor, a query engine runs over partitioned access methods]

7 Research Issues in StonesDB
Local database layer
– Reduce updates for indexing and aging.
– New cost models for self-tuning sensor databases.
– Energy-optimized query processing.
– Query processing over aged data.
Distributed database layer
– What summaries are relevant to queries?
– What remainder queries should be sent to sensors?
– At what resolution should summaries be cached?

8 XML (Extensible Markup Language)
[Example: a tagged XML record for the book "Foundations…" by Abiteboul, Hull, and Vianu (Addison Wesley, 1995)]
XML: a tagging mechanism to describe content.

9 XML Data Model (Graph)
– Main structure: an ordered, labeled tree
– References between nodes turn the tree into a graph
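The tree-plus-references model can be illustrated with Python's standard `xml.etree.ElementTree`. In this minimal sketch, parent-child edges give the ordered tree, and extra edges come from reference attributes. The `id`/`ref` attribute convention and the sample document are assumptions chosen for the example, not part of the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "id" attributes name nodes, "ref" attributes point at them.
DOC = """<bib>
  <paper id="p1"><title>Streams</title></paper>
  <paper id="p2"><title>Sensors</title><cites ref="p1"/></paper>
</bib>"""


def tree_edges(root):
    """Parent-child edges of the ordered, labeled tree (document order)."""
    return [(parent.tag, child.tag) for parent in root.iter() for child in parent]


def reference_edges(root):
    """Extra edges created by references; together with the tree they form a graph."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    ids = {e.get("id") for e in root.iter() if e.get("id") is not None}
    edges = []
    for e in root.iter():
        target = e.get("ref")
        if target in ids:
            source = e
            # Attribute the reference to the nearest ancestor-or-self with an id.
            while source is not None and source.get("id") is None:
                source = parent_of.get(source)
            edges.append((None if source is None else source.get("id"), target))
    return edges


root = ET.fromstring(DOC)
print(tree_edges(root))
print(reference_edges(root))  # [('p2', 'p1')]
```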

10 XQuery: XML Query Language
A declarative language for querying XML data
XPath: path expressions
– Patterns to be matched against an XML graph
– /bib/paper[author/lastname='Croft']/title
FLWOR (for, let, where, order by, return) expressions
– Combining matching and restructuring of XML data
– for $p in distinct-values(doc("bib.xml")//publisher)
  let $b := doc("bib.xml")//book[publisher = $p]
  where count($b) > 100
  order by $p
  return $p
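To show what the two queries compute, here is a hand evaluation of both in Python with `xml.etree.ElementTree`. ElementTree's built-in XPath subset cannot express the nested `author/lastname` predicate, so the filter is written as a loop. The sample `bib.xml` content and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml contents; element names follow the slide's examples.
BIB = """<bib>
  <paper><author><lastname>Croft</lastname></author><title>Paper A</title></paper>
  <paper><author><lastname>Smith</lastname></author><title>Paper B</title></paper>
  <book><publisher>P1</publisher></book>
  <book><publisher>P1</publisher></book>
  <book><publisher>P2</publisher></book>
</bib>"""


def croft_titles(root):
    """Hand evaluation of /bib/paper[author/lastname='Croft']/title.
    Assumes <lastname> holds plain text (no child elements)."""
    if root.tag != "bib":
        return []
    return [title.text
            for paper in root.findall("paper")
            if any(ln.text == "Croft" for ln in paper.findall("author/lastname"))
            for title in paper.findall("title")]


def big_publishers(root, min_books):
    """Same shape as the FLWOR example: distinct publishers with more than
    `min_books` books, in sorted order."""
    names = {p.text for p in root.iter("publisher")}      # for $p in distinct-values(...)
    books = list(root.iter("book"))
    result = []
    for name in names:
        matching = [b for b in books                       # let $b := //book[publisher = $p]
                    if any(p.text == name for p in b.findall("publisher"))]
        if len(matching) > min_books:                      # where count($b) > min_books
            result.append(name)
    return sorted(result)                                  # order by $p, return $p


root = ET.fromstring(BIB)
print(croft_titles(root))       # ['Paper A']
print(big_publishers(root, 1))  # ['P1']
```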

11 Metadata Management using XML
File systems for large-scale scientific simulations
– File systems: petabytes of data or even more
– Directory tree (metadata): large, cannot fit in memory
– Links between files: steps in a simulation, data derivation
File searches
– All the files generated on Oct 1, 2005
– All the files whose name is like '*simu*.txt'
– All the files that were generated from the file 'basic-measures.txt'
⇒ Build an XML store to manage directory trees!
– XML data model
– XML query language
– XML indices
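The three example searches map naturally onto an XML encoding of the directory tree. This sketch uses in-memory ElementTree rather than a real XML store with indices. The `created` and `derived_from` attributes are a hypothetical encoding of metadata and lineage links. For brevity, file names are assumed to be unique across the tree, and "generated from" is followed transitively.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of a directory tree. Each <file> has a "created"
# date and an optional "derived_from" attribute recording its lineage.
TREE = """<dir name="/">
  <dir name="run1">
    <file name="basic-measures.txt" created="2005-09-30"/>
    <file name="simu-step1.txt" created="2005-10-01" derived_from="basic-measures.txt"/>
    <file name="simu-step2.txt" created="2005-10-02" derived_from="simu-step1.txt"/>
  </dir>
  <file name="notes.txt" created="2005-10-01"/>
</dir>"""


def created_on(root, day):
    return [f.get("name") for f in root.iter("file") if f.get("created") == day]


def name_like(root, pattern):
    return [f.get("name") for f in root.iter("file")
            if fnmatch.fnmatchcase(f.get("name"), pattern)]


def derived_from(root, source):
    """Files generated (directly or transitively) from `source`."""
    children = {}
    for f in root.iter("file"):
        parent = f.get("derived_from")
        if parent is not None:
            children.setdefault(parent, []).append(f.get("name"))
    result, frontier = [], [source]
    while frontier:
        for child in children.get(frontier.pop(), []):
            if child not in result:
                result.append(child)
                frontier.append(child)
    return result


root = ET.fromstring(TREE)
print(created_on(root, "2005-10-01"))            # ['simu-step1.txt', 'notes.txt']
print(name_like(root, "*simu*.txt"))             # ['simu-step1.txt', 'simu-step2.txt']
print(derived_from(root, "basic-measures.txt"))  # ['simu-step1.txt', 'simu-step2.txt']
```

A real store would replace these full scans with XML indices on `created`, `name`, and the lineage links.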

12 XML Document Processing
Multi-hierarchical XML markup of text documents
– Multiple hierarchies: part-of-speech, page-line
– Features in different hierarchies overlap in scope
– Need a query language and querying mechanism
– References [Nakov et al., 2005; Iacob & Dekhtyar, 2005]
Querying and ranking of XML data
– XML fragments returned as results
– Fuzzy matches
– Ranking of matches
– References [Amer-Yahia et al., 2005; Luo et al., 2003]
Well-defined problems ⇒ identify your contributions!
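Overlapping features are why multiple hierarchies cannot live in one XML tree: a word that spans two printed lines is neither inside one line element nor enclosing it. One common workaround, assumed here rather than taken from the cited papers, is standoff markup. Each hierarchy stores character-offset spans, and "overlap without nesting" becomes a simple query over those spans.

```python
from typing import NamedTuple


class Span(NamedTuple):
    hierarchy: str  # e.g. "pos" (part of speech) or "line" (page-line)
    label: str
    start: int      # character offset into the text, inclusive
    end: int        # exclusive


def overlaps_without_nesting(a, b):
    """True when two spans share text but neither contains the other,
    i.e. they cannot both be elements of one XML tree."""
    intersect = a.start < b.end and b.start < a.end
    nested = (a.start <= b.start and b.end <= a.end) or (b.start <= a.start and a.end <= b.end)
    return intersect and not nested


def crossing(spans, h1, h2):
    """Pairs (feature in h1, feature in h2) whose scopes overlap improperly."""
    return [(a.label, b.label)
            for a in spans if a.hierarchy == h1
            for b in spans if b.hierarchy == h2 and overlaps_without_nesting(a, b)]


# Hypothetical example: the word "manuscript" is split across two printed lines.
spans = [
    Span("line", "line1", 0, 12),
    Span("line", "line2", 12, 24),
    Span("pos", "ADJ:old", 3, 6),
    Span("pos", "NOUN:manuscript", 7, 17),
]
print(crossing(spans, "pos", "line"))
# [('NOUN:manuscript', 'line1'), ('NOUN:manuscript', 'line2')]
```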

13 Data Stream Management
Traditional database
– Data at rest
– One-shot or periodic queries
– Query-driven execution
Data stream processor
– Data in motion, unending
– Continuous, long-running queries (queries, rules, event specs, subscriptions)
– Data-driven execution
[Figure: queries issued against stored tables (Attr1, Attr2, Attr3) versus data streaming through standing queries to produce results]
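The contrast between query-driven and data-driven execution fits in two functions. The one-shot query runs once over stored data. The continuous query is a generator that stays alive and emits a new answer whenever a tuple arrives. The sliding-window average is just an illustrative choice of standing query.

```python
from collections import deque


def one_shot_avg(table):
    """Traditional DB: data at rest, query runs once when issued (query-driven)."""
    return sum(table) / len(table)


def continuous_avg(stream, window):
    """Stream processor: a long-running query that produces a new result each
    time a tuple arrives (data-driven), over a count-based sliding window."""
    recent = deque(maxlen=window)
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)


print(one_shot_avg([2, 4, 6, 8]))                          # 5.0
print(list(continuous_avg(iter([2, 4, 6, 8]), window=2)))  # [2.0, 3.0, 5.0, 7.0]
```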

14 In-Network XML Processing
XML is becoming the wire format for data
In-network XML processing
– Authentication
– Authorization
– Routing
– Transformation
– Pattern matching
XPath is widely used for in-network XML processing
– Applied directly to streaming XML data
– Line-speed performance
Goals: expedite traffic, enhance security, real-time monitoring and diagnosis

15 Research Issues
Gigabit-rate XPath processing
– Take one look, process the XPath, and buffer data for future use only if necessary
– Processing needs to run at gigabit rate
– Memory usage needs to be minimized
Time/space complexity of XPath stream processing
– Theoretical analysis for common features of XPath
Minimizing memory usage of YFilter technology
– YFilter: state of the art for multi-XPath processing
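"Take one look" processing can be sketched in the spirit of YFilter. Many path expressions are merged into one shared NFA, and the NFA runs over streaming start/end events with a stack of active state sets. This is a simplified illustration, not YFilter's implementation. It supports only `/` and `//` steps with element names or `*`, and it has no predicates. `State`, `parse`, and `SharedPathNFA` are hypothetical names, and the events come from `xml.etree.ElementTree.XMLPullParser` fed in chunks.

```python
import xml.etree.ElementTree as ET


class State:
    def __init__(self, self_loop=False):
        self.children = {}          # element name or "*" -> next State
        self.descendant = None      # epsilon-target used by a "//" step
        self.self_loop = self_loop  # "//" states stay active on any element
        self.accepts = []           # ids of queries matched on reaching this state


def parse(path):
    """'/a//b/c' -> [('/', 'a'), ('//', 'b'), ('/', 'c')]; no predicates."""
    steps, i = [], 0
    while i < len(path):
        axis = "//" if path.startswith("//", i) else "/"
        i += len(axis)
        j = path.find("/", i)
        j = len(path) if j == -1 else j
        steps.append((axis, path[i:j]))
        i = j
    return steps


class SharedPathNFA:
    """All queries share one NFA, so common prefixes are matched only once."""

    def __init__(self, queries):
        self.root = State()
        for qid, path in enumerate(queries):
            state = self.root
            for axis, name in parse(path):
                if axis == "//":
                    if state.descendant is None:
                        state.descendant = State(self_loop=True)
                    state = state.descendant
                state = state.children.setdefault(name, State())
            state.accepts.append(qid)

    @staticmethod
    def closure(states):
        return states | {s.descendant for s in states if s.descendant is not None}

    def run(self, chunks):
        """Single pass over XML text chunks; returns (query id, element) matches.
        Memory: a stack of active-state sets, one per open element."""
        parser = ET.XMLPullParser(events=("start", "end"))
        stack = [self.closure({self.root})]
        matches = []

        def handle(events):
            for event, elem in events:
                if event == "start":
                    active = set()
                    for s in stack[-1]:
                        for key in (elem.tag, "*"):
                            if key in s.children:
                                active.add(s.children[key])
                        if s.self_loop:
                            active.add(s)
                    active = self.closure(active)
                    qids = sorted(q for s in active for q in s.accepts)
                    matches.extend((qid, elem.tag) for qid in qids)
                    stack.append(active)
                else:
                    stack.pop()
                    elem.clear()  # discard the finished element's content

        for chunk in chunks:
            parser.feed(chunk)
            handle(parser.read_events())
        parser.close()
        handle(parser.read_events())
        return matches


nfa = SharedPathNFA(["/a/b", "//c", "/a//c", "/a/*/c"])
print(nfa.run(["<a><b><c/>", "</b><c/></a>"]))
# [(0, 'b'), (1, 'c'), (2, 'c'), (3, 'c'), (1, 'c'), (2, 'c')]
```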

16 RFID Technology
[Figure: RFID tags (e.g. 01.01298.6EF.0A, 01.01267.60D.01, 04.0768E.001.F0) are read by RFID readers, producing readings of the form (reader_id, tag_id, timestamp)]

17 RFID Stream Processing
– Out of stock: the number of items of product X on the shelf is ≤ 3.
– Shoplifting: an item was taken out of the store without being checked out.
[Figure: RFID tag 01.01298.6EF.0A is read by the RFID reader at shelf 2 (time 00129038) and later by the reader at exit1 (time 02183947)]
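Both conditions can be evaluated incrementally over the (reader_id, tag_id, timestamp) stream. In this sketch, the reader names, the explicit `checkout` call, and the tag-to-product rule are assumptions. The tag-to-product rule treats the IDs as EPC-style "header.manager.object_class.serial". For simplicity, an item counts as having left the shelf as soon as any other reader sees it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    reader_id: str
    tag_id: str
    timestamp: int


def product_of(tag_id):
    # Assumption: EPC-style IDs "header.manager.object_class.serial", so the
    # product is identified by the manager and object-class fields.
    return ".".join(tag_id.split(".")[1:3])


class StoreMonitor:
    def __init__(self, shelf_reader, exit_reader, threshold=3):
        self.shelf_reader, self.exit_reader = shelf_reader, exit_reader
        self.threshold = threshold
        self.on_shelf = {}        # product -> tags currently on the shelf
        self.checked_out = set()  # tags paid for at the register
        self.alerts = []

    def checkout(self, tag_id):
        self.checked_out.add(tag_id)

    def process(self, r):
        tags = self.on_shelf.setdefault(product_of(r.tag_id), set())
        if r.reader_id == self.shelf_reader:
            tags.add(r.tag_id)  # repeated reads of the same tag are harmless
            return
        if r.tag_id in tags:    # seen elsewhere: the item has left the shelf
            tags.discard(r.tag_id)
            if len(tags) <= self.threshold:
                self.alerts.append(("out_of_stock", product_of(r.tag_id), r.timestamp))
        if r.reader_id == self.exit_reader and r.tag_id not in self.checked_out:
            self.alerts.append(("shoplifting", r.tag_id, r.timestamp))


monitor = StoreMonitor("shelf2", "exit1")
for serial in "ABCDE":
    monitor.process(Reading("shelf2", f"01.01298.6EF.0{serial}", 1))
monitor.process(Reading("exit1", "01.01298.6EF.0A", 10))  # never checked out
monitor.checkout("01.01298.6EF.0B")
monitor.process(Reading("exit1", "01.01298.6EF.0B", 20))  # paid, but shelf drops to 3
print(monitor.alerts)
# [('shoplifting', '01.01298.6EF.0A', 10), ('out_of_stock', '01298.6EF', 20)]
```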

18 RFID Processing: Global Tracking
[Figure: bottle 01.001298.6EF.0A is tracked from manufacturer X Ltd. through the distribution network to retailer CVS, with temperature measurements recorded along the way (90, 95, 80, 85)]
– Counterfeit drugs: a bottle is accepted at the retailer if it came from a legal manufacturer and followed all necessary steps in the distribution network.
– Expired/spoiled drugs: a bottle is accepted at the retailer if it went through the distribution network in less than 3 months and was never exposed to a temperature > 96 °F.
– Missing pallets, expected cases, illegally cloned tags…
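The two acceptance rules can be checked over a bottle's recorded trace, once the raw readings have been turned into per-location steps. This is a sketch under several assumptions. The whitelist of legal manufacturers and the list of required steps are hypothetical reference data. "Less than 3 months" is approximated as 90 days. Each step is assumed to record the highest temperature observed there.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Step:
    location: str
    arrived: date
    max_temp_f: float  # highest temperature recorded at this step


# Hypothetical reference data for illustration only.
LEGAL_MANUFACTURERS = {"X Ltd."}
REQUIRED_STEPS = ["X Ltd.", "Distributor", "CVS"]


def follows(locations, required):
    """True if `required` appears in order (not necessarily contiguously)."""
    it = iter(locations)
    return all(step in it for step in required)


def violations(trace, max_transit=timedelta(days=90), max_temp_f=96.0):
    """Return the rules a bottle's trace breaks; an empty list means accept."""
    problems = []
    if trace[0].location not in LEGAL_MANUFACTURERS:
        problems.append("unknown_manufacturer")  # counterfeit check
    if not follows([s.location for s in trace], REQUIRED_STEPS):
        problems.append("missing_steps")         # counterfeit check
    if trace[-1].arrived - trace[0].arrived >= max_transit:
        problems.append("too_slow")              # expired check
    if any(s.max_temp_f > max_temp_f for s in trace):
        problems.append("too_hot")               # spoiled check
    return problems


trace = [Step("X Ltd.", date(2005, 1, 1), 90), Step("Distributor", date(2005, 1, 20), 95),
         Step("Warehouse", date(2005, 2, 1), 80), Step("CVS", date(2005, 2, 15), 85)]
print(violations(trace))  # []
```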

19 Challenges in RFID Management
Data-information mismatch
– RFID raw data: (tag id, reader id, timestamp)
– Meaningful information: shoplifting, misplaced inventory, out-of-stocks; expired drugs, spoiled drugs…
Incomplete, inaccurate data
– Readers miss tags
– Readers can pick up tags from overlapping areas
High-volume data
– Readers read constantly, from all tags in range, without line of sight
– Can create up to millions of terabytes of data in a single day
Low-latency processing
– Up-to-the-second information, time-critical actions

20 Research Issues
Real-time event stream processing
– Handling duplicate readings/results
– Data cleaning
– Data compression
Handling incomplete readings
– Inferences in event databases
– Inferences over event streams
Distributed processing
– Real-time anomaly detection
– Distributed inferences
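Duplicate and missed readings can both be handled by a very simple cleaning step. Readings of the same tag at the same reader are merged into presence intervals whenever consecutive reads are at most a fixed window apart. Duplicates collapse into one interval, and short gaps from missed reads are bridged. The fixed window is an assumed simplification. Adaptive schemes choose the window per tag.

```python
from collections import defaultdict


def smooth(readings, window):
    """readings: iterable of (reader_id, tag_id, timestamp).
    Returns {(reader_id, tag_id): [(start, end), ...]} presence intervals.
    Consecutive reads at most `window` apart are merged, which drops duplicate
    reads and bridges short gaps caused by missed reads."""
    times = defaultdict(list)
    for reader, tag, t in readings:
        times[(reader, tag)].append(t)
    intervals = {}
    for key, ts in times.items():
        ts.sort()
        runs = [[ts[0], ts[0]]]
        for t in ts[1:]:
            if t - runs[-1][1] <= window:
                runs[-1][1] = t          # extend the current presence interval
            else:
                runs.append([t, t])      # gap too long: the tag left and came back
        intervals[key] = [tuple(r) for r in runs]
    return intervals


raw = [("r1", "T1", t) for t in (0, 1, 1, 2, 4, 5, 20)]
print(smooth(raw, window=3))  # {('r1', 'T1'): [(0, 5), (20, 20)]}
```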

21 Adaptive Sensing of Atmosphere
Environmental monitoring: real-time processing of huge-volume meteorological data
Challenges
– Large volume but limited bandwidth
– Real-time processing
– Uncertain data
– Data archiving and querying the history
[Figure: pipeline Sense → Send → Merge → Detection → Prediction]

22 Managing Uncertain Data
Sources of data uncertainty
1) Sensing noise and partial scanning
2) Data compression
3) Lossy wireless links
4) Incomplete merging
Managing uncertain data
– Model sources of data uncertainty
– Develop an uncertainty calculus to combine the effects of these sources
– Augment results with confidence values
[Figure: the sensing pipeline annotated with uncertainty sources (1)–(4) up to the merge step, followed by tornado detection and prediction (confidence?)]
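As a starting point for an uncertainty calculus, here are two textbook combination rules. Both assume the sources act independently, which a real calculus would need to justify or relax. The first gives the probability that a reading survives every stage intact as the product of per-stage reliabilities. The second gives the total error variance of independent additive errors as the sum of their variances; uniform quantization with step Δ contributes Δ²/12. The per-stage reliability numbers are hypothetical.

```python
import math


def combined_confidence(stage_reliabilities):
    """Probability that a reading passes every stage intact, assuming the
    stages (sensing, compression, wireless link, merging) fail independently."""
    return math.prod(stage_reliabilities)


def combined_error_std(noise_std, quantization_step):
    """Std. dev. of the total error when independent sensing noise and
    uniform quantization error (variance step**2 / 12) add together."""
    return math.sqrt(noise_std ** 2 + quantization_step ** 2 / 12)


# Hypothetical per-stage reliabilities for sources (1)-(4) on the slide.
stages = {"sensing": 0.95, "compression": 0.99, "wireless": 0.90, "merging": 0.97}
print(round(combined_confidence(stages.values()), 3))  # 0.821
```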

23 Managing Uncertain Data
Sources of data uncertainty
1) Sensing noise and partial scanning
2) Data compression
3) Lossy wireless links
4) Incomplete merging
Self-diagnosis and tuning
– Compare the prediction at t with the observation at t+1 (no ground truth?!)
– Diagnose the system when the confidence value is low
– Automatically tune the system
[Figure: the sensing pipeline annotated with uncertainty sources (1)–(4) up to the merge step, followed by tornado detection and prediction (confidence?)]
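The "compare the prediction at t with the observation at t+1" idea can be sketched as a small monitor. The next observation stands in for the missing ground truth. Confidence is defined here as the fraction of recent predictions that landed within a tolerance, which is an assumed metric, and the monitor flags the system for diagnosis when that fraction falls below a threshold. The class name and parameters are hypothetical.

```python
from collections import deque


class SelfDiagnosis:
    """Hypothetical monitor: treats the next observation as a stand-in for
    ground truth and flags the system when recent predictions miss too often."""

    def __init__(self, window=5, tolerance=1.0, min_confidence=0.5):
        self.errors = deque(maxlen=window)  # |prediction - later observation|
        self.tolerance = tolerance
        self.min_confidence = min_confidence
        self.pending = None                 # prediction made at t for time t+1

    def step(self, observation, next_prediction):
        """Feed the observation for this time step and the prediction for the
        next one; return (confidence, needs_diagnosis)."""
        if self.pending is not None:
            self.errors.append(abs(self.pending - observation))
        self.pending = next_prediction
        if not self.errors:
            return 1.0, False
        hits = sum(e <= self.tolerance for e in self.errors)
        confidence = hits / len(self.errors)
        return confidence, confidence < self.min_confidence


monitor = SelfDiagnosis(window=5, tolerance=1.0, min_confidence=0.5)
for obs, pred in [(10.0, 11.0), (11.5, 12.0), (15.0, 16.0), (20.0, 21.0)]:
    print(monitor.step(obs, pred))
```

The last step reports a confidence of 1/3 and requests diagnosis, since two of the three checked predictions missed by more than the tolerance.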

24 Questions

25 Outline
– An outside look: DB application
– An inside look: anatomy of a DBMS
– Project ideas: DB application
– Project ideas: DBMS internals

26 Application: UMass CS Pub DB
UMass Computer Science Publication Database
– All papers on professors' web pages and in their DBLP records
– All technical reports
Search
– Catalog search (author, title, year, conference, etc.)
– Text search (using SQL "LIKE")
Navigation
– Overview of the structure of the document collection
– Area-based "drill down" and "roll up" with statistics
Add document
Top hits
Example: http://dbpubs.stanford.edu:8090/aux/index-en.html
Deliverables: useful software, user-friendly interface
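A minimal backend for the search and navigation features could look like this, using Python's built-in `sqlite3`. The single-table schema is a deliberate simplification: it has one author string per paper and a two-level area/subarea hierarchy, whereas a real design would normalize authors. The function names and sample rows are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE paper (
    id      INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    author  TEXT NOT NULL,  -- simplification: one author string per paper
    year    INTEGER,
    venue   TEXT,
    area    TEXT,           -- top level of the drill-down hierarchy
    subarea TEXT            -- second level
);
"""
CATALOG_FIELDS = {"title", "author", "year", "venue", "area"}


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, title, author, year, venue, area, subarea):
    conn.execute("INSERT INTO paper (title, author, year, venue, area, subarea) "
                 "VALUES (?, ?, ?, ?, ?, ?)", (title, author, year, venue, area, subarea))


def catalog_search(conn, **fields):
    """Exact-match catalog search, e.g. catalog_search(conn, year=2006)."""
    unknown = set(fields) - CATALOG_FIELDS
    if unknown:
        raise ValueError(f"not a catalog field: {sorted(unknown)}")
    # Column names come from the whitelist above; values are bound parameters.
    where = " AND ".join(f"{name} = ?" for name in fields) or "1 = 1"
    rows = conn.execute(f"SELECT title FROM paper WHERE {where} ORDER BY title",
                        tuple(fields.values()))
    return [title for (title,) in rows]


def text_search(conn, word):
    """Substring search on titles with LIKE (ASCII case-insensitive in SQLite);
    '%' and '_' inside `word` are not escaped in this sketch."""
    rows = conn.execute("SELECT title FROM paper WHERE title LIKE ? ORDER BY title",
                        (f"%{word}%",))
    return [title for (title,) in rows]


def area_counts(conn, area=None):
    """Roll up (area=None): paper counts per area.
    Drill down (area given): counts per subarea within that area."""
    if area is None:
        return conn.execute("SELECT area, COUNT(*) FROM paper "
                            "GROUP BY area ORDER BY area").fetchall()
    return conn.execute("SELECT subarea, COUNT(*) FROM paper WHERE area = ? "
                        "GROUP BY subarea ORDER BY subarea", (area,)).fetchall()


conn = open_db()
add_document(conn, "XML filtering", "A", 2003, "ConfA", "Databases", "XML")
add_document(conn, "RFID stream processing", "B", 2006, "ConfB", "Databases", "Streams")
add_document(conn, "Flash storage for sensors", "A", 2006, "ConfC", "Systems", "Storage")
print(area_counts(conn))               # [('Databases', 2), ('Systems', 1)]
print(area_counts(conn, "Databases"))  # [('Streams', 1), ('XML', 1)]
```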

27 Application: RFID Database
RFID technology
RFID supply chain
– Locations
– Objects
[Figure: supply chain Manufacturer → Supplier DC → Retail DC → Retail Store; objects: cases on pallets in trucks]

28 Application: RFID Database
RFID technology
RFID supply chain
Database propagation
– Streams of (reader_id, tag_id, time)
– Semantics: reader_id → location, tag_id → object
– Containment: location-based; items in a case, cases on a pallet, pallets in a truck…
– Duration of containment
– History of movement: (object, location, time_in, time_out)
– Data compression for duplicate readings
– Integration with sensors: temperature, location…
Track and trace queries
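The propagation steps can be sketched end to end in three parts. Raw readings are mapped through reader_id → location and tag_id → object. Repeated readings are compressed into (object, location, time_in, time_out) stays. A track-and-trace query then falls back to the containing case or pallet when an item itself was never read. The mapping tables, the `gap` parameter, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stay:
    obj: str
    location: str
    time_in: int
    time_out: int


def to_stays(readings, reader_location, tag_object, gap):
    """Compress raw (reader_id, tag_id, time) readings into movement history.
    Consecutive readings of an object at the same location at most `gap`
    apart extend one Stay instead of being stored individually."""
    stays = []
    last = {}  # object -> index of its most recent Stay
    for reader, tag, t in sorted(readings, key=lambda r: r[2]):
        obj, loc = tag_object[tag], reader_location[reader]
        i = last.get(obj)
        if i is not None and stays[i].location == loc and t - stays[i].time_out <= gap:
            stays[i] = Stay(obj, loc, stays[i].time_in, t)
        else:
            last[obj] = len(stays)
            stays.append(Stay(obj, loc, t, t))
    return stays


def location_at(obj, t, stays, container_of):
    """Track-and-trace: where was `obj` at time t? If the object itself was not
    read, fall back to its container (item -> case -> pallet -> truck)."""
    while obj is not None:
        for s in stays:
            if s.obj == obj and s.time_in <= t <= s.time_out:
                return s.location
        obj = container_of.get(obj)
    return None


readers = {"r1": "Supplier DC", "r2": "Retail DC"}
tags = {"T-case": "case1", "T-pallet": "pallet1"}
container_of = {"item1": "case1", "case1": "pallet1"}
raw = [("r1", "T-case", 0), ("r1", "T-case", 1), ("r1", "T-case", 2),
       ("r2", "T-case", 10), ("r2", "T-pallet", 10), ("r2", "T-case", 11)]
stays = to_stays(raw, readers, tags, gap=3)
print(len(stays))                                     # 3 stays from 6 readings
print(location_at("item1", 1, stays, container_of))   # Supplier DC
print(location_at("item1", 10, stays, container_of))  # Retail DC
```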

29 Data Quality
Closed-world assumption: not any more!
Various sources of data loss
1) Sensing noise
2) Data compression
3) Lossy wireless links
4) Incomplete merging
Probabilistic query processing
– Model sources of data loss
– Quantify the effect on queries: max(), avg(), percentile…
– Output query results with a confidence level
[Figure: the sensing pipeline annotated with data-loss sources (1)–(4) up to the merge step]
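One simple way to quantify how data loss affects an aggregate is Monte Carlo simulation. This is an illustrative technique chosen here, not necessarily the approach the slide has in mind. Each reading is dropped independently with some probability, as on a lossy link, and otherwise perturbed by Gaussian noise. The aggregate is recomputed over many simulated trials and reported with an empirical interval. The noise model, the parameters, and the sample readings are assumptions.

```python
import random
import statistics


def estimate(true_values, agg, noise_std, drop_prob, trials=2000, seed=0):
    """Monte Carlo estimate of agg() under two modeled loss sources:
    each reading is lost with probability drop_prob (e.g. a lossy link) and
    otherwise perturbed by Gaussian noise with std. dev. noise_std.
    Returns the mean answer and an empirical 90% interval."""
    rng = random.Random(seed)
    answers = []
    for _ in range(trials):
        observed = [v + rng.gauss(0.0, noise_std)
                    for v in true_values if rng.random() >= drop_prob]
        if observed:  # an aggregate over zero readings is undefined; skip the trial
            answers.append(agg(observed))
    if not answers:
        raise ValueError("every simulated trial lost all readings")
    answers.sort()
    lo = answers[int(0.05 * (len(answers) - 1))]
    hi = answers[int(0.95 * (len(answers) - 1))]
    return statistics.mean(answers), (lo, hi)


readings = [61.0, 64.0, 70.0, 75.0]  # hypothetical temperatures
print(estimate(readings, max, noise_std=1.0, drop_prob=0.2))
print(estimate(readings, statistics.mean, noise_std=1.0, drop_prob=0.2))
```

With losses, max() is biased downward because the largest reading is sometimes missing. This is the kind of effect the slide asks to quantify per aggregate.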

30 Some ideas on INFOD/data dissemination

