CS 69995 & CS 79995 ST: Probabilistic Data Management

CS 69995 & CS 79995 ST: Probabilistic Data Management, Fall 2016, Xiang Lian, Kent State University, Kent, OH
Presentation transcript:

CS 69995 & CS 79995 ST: Probabilistic Data Management
Xiang Lian
Department of Computer Science, Kent State University
Email: xlian@kent.edu
Homepage: http://www.cs.kent.edu/~xlian/

Probabilistic Data Management
  An Overview of Probabilistic Data Management
  Data Uncertainty Model
  Probabilistic Query Answering Over Probabilistic and Uncertain Databases
  Probabilistic Graph Databases
  Data Quality in Probabilistic Databases

Background Needed
  Probability & statistics (math)
  Database techniques (e.g., indexes)
  Programming (e.g., C++, Java, or Python)
  You need to be able to look up how to get things done (for example, by reading papers and surveys from online resources such as digital libraries, Google, and Wikipedia)

Skills
  This is a seminar course, in which you will learn how to do research:
    Lecture
    Literature review (survey)
    Project report
    Presentations & demonstrations
    Research collaborations

Study Group
  Please form a team of 2-3 members
  The workload should be distributed evenly among the team members
  Each team needs to complete 1 survey + 1 project report + 1 presentation + 1 bonus presentation (optional):
    A survey on a selected research topic
    A project report (including introduction, problem definition, related work, the proposed approaches, experimental evaluation, and conclusions)
    A presentation & demonstration of your research paper
    An optional presentation on 1-2 existing research papers in your selected research direction (20-25 minutes)

Survey & Research Project
  I will post a reading list of papers
    It does not include all related work, only a few typical papers in different research directions
    You need to search digital libraries (e.g., ACM Digital Library, IEEE Xplore) and the Web to find more related work in each direction
  You need to decide which topics/problems you want to study
  Please make an appointment with me to discuss your team's research direction (within the first 3 weeks of the semester; on or before Sept. 14)

Scoring and Grading
  5% - Attendance & Questions
  50% - 5 Homeworks (10 points each)
  15% - 1 Survey on papers for the selected research topics in recent database conferences/journals
  20% - Research Project Report
    Code and report for the research project in paper format
  10% - Presentations & Demonstration
    Presentation and demonstration for the proposed research project
  5% - Bonus Points, rated by other team members
  10% - (Optional) Presentation for 1-2 related works in the selected research direction

Scoring and Grading (cont'd)
  B = 80 - 89
  C = 70 - 79
  D = 60 - 69
  F = below 60
  The maximum score you can get is 115!
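The arithmetic behind the 115-point maximum can be checked with a short Python sketch. The component names and the `total_score` and `letter_grade` helpers are hypothetical, introduced only for illustration. The slides do not list the cutoff for an A, so the threshold of 90 used below is an assumption, as is the treatment of fractional scores (each letter applies from its lower bound upward).

```python
# Point values per component, taken from the "Scoring and Grading" slide.
# The dictionary keys are illustrative names, not official course terms.
WEIGHTS = {
    "attendance": 5,
    "homework": 50,               # 5 homeworks, 10 points each
    "survey": 15,
    "project_report": 20,
    "presentation": 10,
    "bonus": 5,                   # bonus points
    "optional_presentation": 10,  # optional presentation of related works
}

MAX_SCORE = sum(WEIGHTS.values())  # 100 regular points + 15 extra points


def total_score(fractions):
    """Sum the points earned, given the fraction (0.0 to 1.0) earned per component."""
    return sum(WEIGHTS[name] * fractions.get(name, 0.0) for name in WEIGHTS)


def letter_grade(score):
    """Map a numeric score to a letter grade."""
    if score >= 90:   # assumption: the A cutoff is not shown in the slides
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


full_marks = total_score({name: 1.0 for name in WEIGHTS})
print(MAX_SCORE, full_marks, letter_grade(85))
```

The regular components add up to 100 points; the bonus points and the optional presentation account for the remaining 15.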

Use of the Textbook
  No textbooks!
  Reference books
    Charu C. Aggarwal. Managing and Mining Uncertain Data. Springer Publishing Company, 2009. ISBN: 978-0-387-09689-6 (Print), 978-0-387-09690-2 (Online), https://link.springer.com/book/10.1007%2F978-0-387-09690-2
    Lei Chen and Xiang Lian. Query Processing over Uncertain Databases. In Synthesis Lectures on Data Management, Vol. 4, No. 6, pages 1-101, Morgan & Claypool Publishers, 2012. ISBN: 9781608458929, http://www.morganclaypool.com/doi/abs/10.2200/S00465ED1V01Y201212DTM033
    Dan Suciu, Dan Olteanu, Christopher Re, and Christoph Koch. Probabilistic Databases. In Synthesis Lectures on Data Management, Morgan & Claypool Publishers, 2011. ISBN-13: 978-1608456802, ISBN-10: 1608456803, http://www.morganclaypool.com/doi/abs/10.2200/S00362ED1V01Y201105DTM016

Online Resources
  The only resources are papers!
    ACM Digital Library: http://dl.acm.org/
    IEEE Xplore Digital Library: http://ieeexplore.ieee.org/Xplore/home.jsp
    DBLP: http://dblp.uni-trier.de/
  Database Conferences
    SIGMOD, PVLDB, ICDE, EDBT, CIKM
  Database Journals
    TODS, VLDBJ, TKDE

The Schedule for the Class
  For the first 2 months (September & October), I expect to give lectures introducing the concepts and techniques of probabilistic data management
  Then, each team will submit a survey of related work in the literature (October)
  Finally, each team will identify research problems and find solutions. You need to write a project report in paper format, run experiments (comparing with existing approaches), and present/demonstrate your paper in class (November & December)

Advice & Suggestions
  Editor tools for the survey and project report: LaTeX vs. MS Word
  Survey
    Check the "Related Work" sections of the most recent papers to find more related papers
    Read the abstracts and introductions of papers, and classify the papers into categories (this will help you later to identify problems that have not been solved before)
  Project Report
    Even if you are not familiar with some topics, try to read as many related works as possible to understand the general problems and solutions in these topics (you can skip parts that are too hard to understand)
    Stick to the problem you want to solve, and use any resource you can find to solve it (note: DO NOT simply apply previous techniques to your problem, since that does not count as your contribution!)

Advice & Suggestions (cont'd)
  Project Report
    Introduction
    Related work
    Problem definition
    Solutions
    Experiments
    Conclusions
  In the project, please add a module to visualize your experimental results

Advice & Suggestions (cont'd)
  Do not copy from any source (even for the survey)
  Any form of academic dishonesty is strictly forbidden and will be penalized to the maximum extent
  Allowing another student to copy your work is also an act of academic dishonesty and carries the same penalty as copying

Advice & Suggestions (cont'd)
  If the resulting surveys and papers are novel and of high quality, I strongly encourage you to submit them to database conferences or journals
  After this class, self-motivated, hardworking, and creative students who perform well on their surveys/papers may have the chance to join my lab (Big Data Science Research Lab)!

Examples of Probabilistic Data (1)

  Witnessed Person   location   t.p
  PID1               A          0.9
  PID2               B          0.2
  PID3                          0.1

  Person ID   Zip code   Disease
  PID1        44224      (pneumonia, 0.3), (flu, 0.7)
  PID2        44242      (AIDS, 0.9)
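The two tables above can be read as two kinds of uncertainty, which the following Python sketch makes concrete. It rests on assumptions the slide does not state: that t.p is the probability that a tuple exists, that tuples exist independently of each other, and that the pairs in the Disease column form a (possibly incomplete) distribution over values. The names `possible_worlds` and `prob_has_disease` are hypothetical, and PID3's location is left as None because the slide does not show it.

```python
from itertools import product

# Tuple-level uncertainty: (person, location, t.p).
# Assumption: t.p is the tuple's existence probability.
sightings = [("PID1", "A", 0.9), ("PID2", "B", 0.2), ("PID3", None, 0.1)]

# Attribute-level uncertainty: each person maps to candidate diseases
# with probabilities (zip codes omitted for brevity).
diseases = {
    "PID1": {"pneumonia": 0.3, "flu": 0.7},
    "PID2": {"AIDS": 0.9},  # the remaining 0.1 is unspecified in the slide
}


def possible_worlds(tuples):
    """Yield (set of present person IDs, probability) for every possible world.

    Assumes that tuples exist independently of one another.
    """
    for mask in product([True, False], repeat=len(tuples)):
        prob = 1.0
        present = []
        for exists, (pid, _location, p) in zip(mask, tuples):
            prob *= p if exists else 1.0 - p
            if exists:
                present.append(pid)
        yield tuple(present), prob


def prob_has_disease(pid, disease):
    """Probability that a person has a disease under the attribute-level model."""
    return diseases.get(pid, {}).get(disease, 0.0)


worlds = dict(possible_worlds(sightings))
total = sum(worlds.values())  # probabilities over all worlds sum to 1
only_pid1 = worlds[("PID1",)]  # 0.9 * (1 - 0.2) * (1 - 0.1)
print(len(worlds), round(total, 6), round(only_pid1, 6))
```

With three uncertain tuples there are 2^3 = 8 possible worlds. This exponential growth is one reason query processing over uncertain data usually avoids enumerating worlds explicitly.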

Examples of Probabilistic Data (2)
  GPS samples
  Location data are imprecise

Examples of Probabilistic Data (3)
  Inaccuracy of data integration
  Unreliability of data sources
  Data inconsistency
  …

Queries
  k-nearest neighbor query
  Range query
  Top-k query
  Skyline query
  …

Probabilistic Query Processing on Uncertain Data
  How to efficiently answer probabilistic queries over large-scale uncertain data?
  How to retrieve accurate query answers with confidence guarantees?
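To make "query answers with confidence guarantees" concrete, here is a minimal Python sketch of one common formulation, a probabilistic range query with a probability threshold. It is an illustration under stated assumptions and not the method taught in the course: each uncertain object is modeled as a small set of weighted location samples (as with GPS readings), and an object is returned only if the probability that it lies in the query rectangle is at least alpha. The class `UncertainObject`, the function names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UncertainObject:
    """An object whose location is a discrete distribution over samples."""
    oid: str
    samples: list  # list of ((x, y), probability); probabilities sum to 1


def in_range_probability(obj, xmin, ymin, xmax, ymax):
    """Probability that the object's true location falls inside the rectangle."""
    return sum(
        p for (x, y), p in obj.samples
        if xmin <= x <= xmax and ymin <= y <= ymax
    )


def probabilistic_range_query(objects, rect, alpha):
    """Return (oid, probability) for objects inside rect with probability >= alpha."""
    results = []
    for obj in objects:
        p = in_range_probability(obj, *rect)
        if p >= alpha:
            results.append((obj.oid, p))
    return results


# Made-up sample data for illustration only.
objects = [
    UncertainObject("o1", [((1.0, 1.0), 0.5), ((2.0, 2.0), 0.5)]),
    UncertainObject("o2", [((5.0, 5.0), 0.25), ((1.5, 1.5), 0.75)]),
    UncertainObject("o3", [((9.0, 9.0), 1.0)]),
]

answers = probabilistic_range_query(objects, (0.0, 0.0, 3.0, 3.0), alpha=0.6)
print(answers)
```

This linear scan checks every sample of every object. The efficiency question raised on the slide is about avoiding exactly that, for example by pruning objects with indexes before computing probabilities.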