So, what was this course about?


1 So, what was this course about?

2 Ingredients
Data Analytics
  Analyzing & extracting value from data
Humans
  As analysts extracting value
  As workers helping the analysis

3 Course Objectives
Reading and Comprehension Skills
  You read ~25 papers
Critical Thinking and Discussion Skills
  Actively engaging in critically analyzing papers: flaws and insights
Research Skills
  Semester-long meaty project
Presentation Skills
  Present the key ideas of a database-style paper

4 Optimization Objectives
Accuracy
  Better, more complete results
Power
  Enabler of more interesting analyses
Speed
  Want results quickly
Ease of use
  For both novice and expert users
Cost
  Crowds, resources

5 Topics Covered
Dealing with Unstructured/Noisy Data
  Crowd-Powered Data Analytics
  Data Cleaning Tools
Dealing with More Data
  Scalable Data Analytics
  Approximate Data Analytics
Dealing with New Scenarios
  ML and Graph Processing
  Collaborative Query Processing
Dealing with Novice Analysts
  Visual Analytics Systems
  New Interfaces & Usability
For each, we covered A) a system or an algorithm + B) connections to other (sometimes old) database topics

6 Topics Covered
A. New forms of data
1) Crowd-Powered Data Analytics
CrowdScreen: Filtering data with humans
  cost/latency/accuracy; probabilistic reasoning
So Who Won: Max
  Graph-based maximum-likelihood reasoning
Sorts and Joins: Sorting and joins with humans
  New types of interfaces (hybrid), batching
Enumeration: Gathering all entities on a topic
  Open world assumption, species estimation literature
TurKit: Programming toolkit for the crowd
CrowdDB: DB + Crowds
  Data model (CNULL), query constructs, query processing
Deco: DB + Crowds
  A more complete language
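The "probabilistic reasoning" behind crowd-powered filtering can be illustrated with a small Bayesian update. This is a toy sketch, not the CrowdScreen algorithm itself: the function names, the single shared worker `error_rate`, the `prior`, and the `threshold` are all assumptions made for illustration. It shows how each extra (paid) answer trades cost and latency for accuracy.

```python
def posterior_pass(answers, error_rate=0.2, prior=0.5):
    """Posterior probability that an item truly passes the filter.

    answers: list of booleans, True meaning a worker voted "pass".
    error_rate: assumed probability that a single worker answers wrongly.
    prior: assumed probability that an item passes before any votes.
    """
    like_pass = prior
    like_fail = 1.0 - prior
    for vote in answers:
        # A "pass" vote is likely if the item passes, unlikely otherwise.
        like_pass *= (1.0 - error_rate) if vote else error_rate
        like_fail *= error_rate if vote else (1.0 - error_rate)
    return like_pass / (like_pass + like_fail)


def decide(answers, threshold=0.9, **kwargs):
    """Stop asking once we are confident enough; otherwise pay for one more answer."""
    p = posterior_pass(answers, **kwargs)
    if p >= threshold:
        return "pass"
    if p <= 1.0 - threshold:
        return "fail"
    return "ask another worker"
```

With these assumed defaults, one "pass" vote leaves the posterior at 0.8, so the item goes to another worker; two agreeing votes are enough to stop.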

7 Topics Covered
A. New forms of data
2) Data Cleaning
Potter’s Wheel: programmatic cleaning
  precursor of data cleaning systems; user-defined cleaning
Wrangler: interactive cleaning
  autosuggested cleaning alternatives
Profiler: cleaning + anomaly discovery
  appropriate binning for discovering anomalies

8 Topics Covered
B. Dealing with more data
1) Scalable Data Analytics
Spark: In-memory query processing
  noSQL system, datasets as objects, persist (unlike MapReduce)
Dremel: Google’s parallel column-store system
  distributed query processing, column stores
SparkSQL: DB layer on Spark
  Translation from SQL to Spark queries, ..

9 Topics Covered
B. Dealing with more data
2) Approximate Analytics: tradeoff between cost/latency/accuracy (c/l/a)
BlinkDB: Approximate Query Answering System
  stratified samples help! Query column sets
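Why stratified samples help can be seen in a few lines. The sketch below is not BlinkDB's implementation; `stratified_sample`, `approx_count_and_mean`, and the per-group `cap` are hypothetical names chosen for illustration. It caps the number of rows kept per distinct value of a column, so rare groups survive in full while large groups are down-sampled and re-weighted.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, cap, seed=0):
    """Keep at most `cap` rows per distinct value of `key`.

    Returns {value: (kept_rows, weight)} where weight is the number of
    original rows each kept row stands for.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = {}
    for value, group in strata.items():
        kept = group if len(group) <= cap else rng.sample(group, cap)
        sample[value] = (kept, len(group) / len(kept))
    return sample


def approx_count_and_mean(sample, column):
    """Answer a GROUP BY count/average query from the sample alone."""
    result = {}
    for value, (kept, weight) in sample.items():
        est_count = len(kept) * weight
        est_mean = sum(r[column] for r in kept) / len(kept)
        result[value] = (est_count, est_mean)
    return result
```

A uniform sample of the same total size could easily miss a group with only a handful of rows; here such a group is answered exactly, and only the large groups carry sampling error.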

10 Topics Covered
C. Dealing with novice analysts
1) Visual Analytics Systems
Polaris: Basis for Tableau
  Idea of a data cube, visualizations = cube aggregates!
Trust me, I’m partially right: approximate vis
  online aggregation
SeeDB: visualization recommendations
  scalable grouped query execution techniques
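Online aggregation, the idea behind approximate visualization, reports a running estimate with an error bar that can be refined as more rows are read. A minimal sketch follows, assuming rows arrive in random order and using a normal-approximation interval; `online_mean`, `report_every`, and `z` are illustrative names, not taken from any of the papers above.

```python
import math


def online_mean(values, report_every, z=1.96):
    """Yield (rows_seen, running_mean, half_width) every `report_every` rows.

    half_width is z * sqrt(variance / n), an approximate confidence
    interval that is only meaningful if rows arrive in random order.
    """
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in values:
        n += 1
        total += x
        total_sq += x * x
        if n % report_every == 0:
            mean = total / n
            # Clamp at zero to guard against tiny negative rounding errors.
            variance = max(total_sq / n - mean * mean, 0.0)
            yield n, mean, z * math.sqrt(variance / n)
```

A visualization front end could redraw a bar after each yielded estimate and let the analyst stop as soon as the error bar is narrow enough for the decision at hand.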

11 Topics Covered
C. Dealing with novice analysts
2) New Interfaces and Usability
DBTouch
  touch-based querying of data: pinch+zoom
Gestural Query Specification
  completeness of operators; user study!
Making Database Systems Usable
  natural language interface types: forms, keyword search, QBE

12 Topics Covered
D. Dealing with new settings
1) Machine Learning and Graph Processing
MADSkills: Wrapper on traditional database
  kinds of ML-based analyses of interest
GraphLab: Distributed Graph Analytics tools
  graph analytics systems, “thinking like a vertex”
MLBase: Wrapper on ML algorithms
  parameter tuning for ML is a pain
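"Thinking like a vertex" means writing a small program that sees only one vertex and its neighbors, and letting a scheduler run it wherever something changed. The sketch below is a single-machine toy, not the GraphLab API: `min_label_update` and `run_vertex_programs` are hypothetical names. It finds connected components by having every vertex adopt the smallest label in its neighborhood.

```python
from collections import deque


def min_label_update(vertex, labels, neighbors):
    """Vertex program: adopt the smallest label among self and neighbors.

    Returns True if this vertex's label changed.
    """
    best = min([labels[vertex]] + [labels[n] for n in neighbors[vertex]])
    changed = best < labels[vertex]
    labels[vertex] = best
    return changed


def run_vertex_programs(neighbors):
    """Run the vertex program until no label changes.

    neighbors: {vertex: [adjacent vertices]} for an undirected graph.
    """
    labels = {v: v for v in neighbors}
    queue = deque(neighbors)
    scheduled = set(neighbors)
    while queue:
        v = queue.popleft()
        scheduled.discard(v)
        if min_label_update(v, labels, neighbors):
            # Only neighbors of a changed vertex need to be re-run.
            for n in neighbors[v]:
                if n not in scheduled:
                    scheduled.add(n)
                    queue.append(n)
    return labels
```

The global answer (which vertices belong together) emerges from purely local updates, which is what makes this style easy to distribute across machines.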

13 Topics Covered
D. Dealing with new settings
2) Collaborative Analyses
Fusion Tables: Google’s public data analytics / viz tool
  data cleaning problem; data integration

14 “Historical” Takeaways
Examples:
Storage layer: column stores, data compression, data sampling
Processing layer: noSQL, adaptive QP, parallel QP
  c-l-a tradeoff in crowdsourcing, interfaces, batching
Usability layer: forms, keyword search, QBE
  data integration, data cleaning
Visualization layer: binning, aggregation, data cubes, online aggregation
Applications layer: graph processing
  machine learning primitives

15 Mix of Papers: Vision vs. Details
Visionary, examples
  Database usability
  DBTouch/GestureDB
  MLBase
Detail-Oriented, examples
  CrowdScreen
  GraphLab
  Dremel, SparkSQL

16 Mix of Papers: Algorithmic vs. Systems
Algorithmic: probably 30-35%
Systems-oriented: the majority
  Not surprising given that this is a database systems course ...

17 (Hopefully) Lessons Learned
Don’t solve non-problems!
Importance of thinking about users
  Interface
  Language
Careful systems architecture
  Generalizable
  Efficient / Powerful
  Tailored to use-cases
Data analytics involves:
  Usability
  Careful, scalable system architecture (Systems)
  Principled algorithm design (Algorithms)

