CSCI5570 Large Scale Data Processing Systems

1 CSCI5570 Large Scale Data Processing Systems
Course Overview Instructor: Prof. James Cheng

2 Course Webpage Check course webpage regularly
Remark: I prefer to put the course webpage under my own directory to make it easier for off-campus access.

3 Topics Overview
Topic: Tentative Schedule
Introduction and Course Project: Week 1
Prerequisite: Relational Database Systems & Distributed Database Systems: Weeks 2-4 (self reading)
Distributed Data Analytics Systems: Weeks 2-5
NoSQL: Weeks 5-8
NewSQL: Weeks 9-10
Distributed Graph Processing Systems: Weeks 11-12
Distributed Stream Processing Systems: Weeks 12-13
Other Large Scale Data Processing Systems: ???
Distributed Machine Learning Systems: Course Project

4 Prerequisites
Fundamental concepts of distributed database systems, a prerequisite to NoSQL and NewSQL as well as other distributed data processing systems
Parallel query processing
Distributed query processing

5 Distributed Data Analytics Systems
Focus on state-of-the-art big data platforms, widely adopted by industry (e.g., Hadoop, Spark) or best in research (e.g., Naiad, Husky)
Fundamental concepts of big data analytics systems
Applications (too ad hoc to teach them all, but you can try them out with the systems taught in class):
Data collection, data extraction, data cleaning …
Machine learning (e.g., classification, clustering, recommendation, feature selection, dimensionality reduction …)
OLAP, data cube
Data mining
Graph analytics (including social network analysis)
Similarity search (e.g., scalable locality sensitive hashing)

6 NoSQL/NewSQL
Relational databases are the foundation of western civilization, but now is the era of NoSQL databases
NoSQL databases, such as MongoDB, Cassandra, and CouchDB, are rapidly taking large shares of the market from traditional vendors such as Oracle
A must-learn for big data analytics
NewSQL databases try to combine the pros of both traditional DBMS and NoSQL

7 Distributed Graph Processing Systems
Graph data: web graphs, online social networks, mobile communication networks, financial networks, biological networks, neural networks …
Distributed systems that make the analysis of these large scale graphs/networks possible
Key techniques and algorithms for large scale graph data processing

8 Distributed Stream Processing Systems
Streaming data have become common today, e.g., tweets, news feeds, …
How to analyze such massive high-speed data in real time?
Key techniques and applications

9 Distributed Data Storage Systems
How to store massive volumes of different types of data, retrieve them, and update them efficiently?
How to handle consistency issues?
How to handle availability issues?

10 Reading List A list of papers for each topic (except for the older topics such as Relational Database Systems and Distributed Database Systems) will be released weekly

11 Reference Database Systems – The Complete Book
Second edition (Prentice Hall)
Hector Garcia-Molina, Jeffrey Ullman, Jennifer Widom

12 Reference Database Management Systems Third edition
Raghu Ramakrishnan, Johannes Gehrke

13 Assessment Criteria Survey paper: 30 marks
Select one of the following topics: (1) Distributed Data Analytics Systems, (2) NoSQL, (3) NewSQL, (4) Distributed Graph Processing Systems, (5) Distributed Stream Processing Systems, or (6) any other related topic (please seek the approval of the course instructor first)
Write a survey paper on this topic
The survey paper must cover most of the seminal works and the state-of-the-art works related to this topic, including a clear introduction to each of these works, a description of the problems they solved and their main ideas, a comparative analysis highlighting the strengths and limitations of these works, and your own conclusions and comments on this topic and its future development, etc.
Deadline: Nov 30, 2017 HK time (submit a pdf file with filename “Lastname Firstname” to with title “5570 survey Lastname Firstname”)

14 Assessment Criteria Course Project: 70 marks
See details in the project specification

15 Assessment Criteria
You will receive an F grade for the course if
your score for the survey paper is less than 10 marks, OR
your score for the course project is less than 30 marks
You will receive at least a B- if
your score for the survey paper is at least 20 marks, AND
your score for the course project is at least 40 marks
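
The two rules above can be sketched as a small check. This is only an illustration: the function name grade_bound and the returned strings are hypothetical, the 30/70 maximum marks come from the assessment slides, and scores that trigger neither rule are reported as undetermined because the slides do not say how they are graded.

```python
def grade_bound(survey: float, project: float) -> str:
    """Return what the stated rules guarantee for a pair of scores.

    survey  -- survey paper score, out of 30 marks
    project -- course project score, out of 70 marks
    """
    if not (0 <= survey <= 30 and 0 <= project <= 70):
        raise ValueError("survey must be in [0, 30] and project in [0, 70]")

    # Rule 1: failing either component alone is enough for an F.
    if survey < 10 or project < 30:
        return "F"

    # Rule 2: both thresholds must be met to guarantee at least a B-.
    if survey >= 20 and project >= 40:
        return "at least B-"

    # The slides give no rule for the remaining combinations.
    return "not determined by these rules"


if __name__ == "__main__":
    for scores in [(9, 70), (25, 29), (20, 40), (15, 50)]:
        print(scores, "->", grade_bound(*scores))
```

Note that a high project score cannot offset a survey score below 10, since the F rule uses OR while the B- rule uses AND.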

16 Academic Honesty Plagiarism, cheating, misconduct in test/exam should be reported to the Faculty Disciplinary Committee for handling. University Guidelines to Academic Honesty:

17 Student/Faculty Expectations
Let’s join hands to create a positive, respectful, and engaged academic environment inside and outside the classroom. Full version of Student/Faculty Expectations on Teaching and Learning: xpectations.pdf

