András Benczúr Head, “Big Data – Momentum” Research Group Big Data Analytics Institute for Computer.

1 András Benczúr benczur@sztaki.mta.hu Head, “Big Data – Momentum” Research Group Big Data Analytics http://datamining.sztaki.hu/ Institute for Computer Science and Control, Hungarian Academy of Sciences in collaboration with Volker Markl TU Berlin

2 Data Management vs. Data Analytics
Data Management: traditional gathering, access, and search
Analytics (Machine Learning): insights and predictions

3 Deep Analytics needs

4 Twitter Example: Meryl Streep – Oscar, 2012

5

6 Image: http://mirror.co.uk Twitter Example: Meryl Streep – Oscar, 2012

7

8 Image: http://bbc.com Twitter Example: Meryl Streep – Oscar, 2012

9

10 What was The Analytics Challenge?
One-year collection of 1 billion tweets, 100 GB
Ad hoc queries (e.g., Meryl Streep) may have 100,000+ hits
Fast response needed to support the analyst
Solutions
o In-memory databases (SAP HANA, …) – cost and physical limitations
o Customized approximate data structures (Bloom filters, MinHash fingerprints)
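The two approximate data structures named on this slide can be sketched in a few lines. This is an illustrative sketch only, not the group's implementation: the class and function names (`BloomFilter`, `minhash_signature`, `estimated_jaccard`), the bit-array size, and the number of hash functions are all assumptions chosen for readability.

```python
import hashlib


def _hash(item: str, seed: int) -> int:
    """Deterministic 64-bit hash of `item`; `seed` selects the hash function."""
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


class BloomFilter:
    """Set membership in fixed memory: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits        # assumed size, tune to the data volume
        self.num_hashes = num_hashes    # assumed number of hash functions
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        return [_hash(item, seed) % self.num_bits for seed in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def minhash_signature(tokens, num_hashes: int = 64):
    """Fingerprint of a non-empty token set: the minimum hash under each seed."""
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b) -> float:
    """Fraction of matching positions, which estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    seen = BloomFilter()
    for word in "meryl streep wins oscar".split():
        seen.add(word)
    print("streep" in seen)
    tweet_a = minhash_signature({"meryl", "streep", "wins", "oscar"})
    tweet_b = minhash_signature({"meryl", "streep", "oscar", "speech"})
    print(estimated_jaccard(tweet_a, tweet_b))
```

Both structures trade exactness for a small, fixed memory footprint, which is the property the slide contrasts with the cost of keeping the full collection in memory.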

11 Need for Networked Analytics

12 Information in interconnectivity
Number, influence, and impressibility of followers and tweets
Statistical properties of temporal dynamics and of the number of users reached by messages

13 Predictive Claims Processing
Hungarian insurance company cases
Rule generation (incl. social media)
Feature engineering
Machine learning & alert generation
[Figure labels: Days since contract; Known fraud; Normal sample]

14 A distance of 3-4 transactions raises the flag
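One possible reading of this slide is that an account is flagged when it lies within a few transaction hops of a known fraud case. The slide does not say how the distance is computed; the sketch below assumes an undirected transaction graph stored as an adjacency dictionary and uses a multi-source breadth-first search. The function names and the `max_hops` parameter are hypothetical.

```python
from collections import deque


def distance_to_known_fraud(graph, fraud_nodes, max_hops=4):
    """Return {node: hops} for every node within `max_hops` of a known-fraud node.

    `graph` maps each node to an iterable of its transaction partners
    (an assumed representation, not taken from the slides).
    """
    dist = {node: 0 for node in fraud_nodes}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def flagged(graph, fraud_nodes, max_hops=4):
    """Nodes that are not known fraud themselves but are close to one."""
    dist = distance_to_known_fraud(graph, fraud_nodes, max_hops)
    return {node for node, hops in dist.items() if 0 < hops <= max_hops}


if __name__ == "__main__":
    # Toy chain of accounts A - B - C - D - E - F - G, with A a known fraud case.
    chain = {
        "A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"],
        "E": ["D", "F"], "F": ["E", "G"], "G": ["F"],
    }
    print(sorted(flagged(chain, {"A"}, max_hops=4)))
```

Starting the search from all known-fraud nodes at once keeps the cost linear in the part of the graph that is actually reached.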

15 Need for Real Time Analytics

16 Software AND Human Latencies

17 Deep Analysis of Big Data is Key to Competitiveness

18 Data Science: Deep Analytics + Big Data

19 Data Scientist magic triangle
[Figure: triangle labelled Data Science, with corners Application; Scalable Data Management (DM); Machine Learning, Statistics, Data Analysis (ML)]
Keywords: Control Flow, Iterative Algorithms, Error Estimation, Active Sampling, Sketches, Curse of Dimensionality, Decoupling, Convergence, Monte Carlo, Mathematical Programming, Linear Algebra, Stochastic Gradient Descent, Regression, Statistics, Hashing, Parallelization, Query Optimization, Fault Tolerance, Relational Algebra / SQL, Scalability, Data Analysis Language, Compiler, Memory Management, Memory Hierarchy, Data Flow, Hardware Adaptation, Indexing, Resource Management, NF²/XQuery, Data Warehouse/OLAP, Real-Time
Domain Expertise (e.g., Industry 4.0, Medicine, Physics, Engineering, Energy, Logistics)

20 Apache Flink: the emerging European tool
TU Berlin / DFKI (DE), SICS (SE), SZTAKI (HU)

21 Data Scientist Supply Chain

22

23 Registered teams - affiliation

24 Extra Slide: Fully Distributed Modeling
Needs no central service – suitable for:
o Ad hoc networks
o Privacy requirements
Model delta updates are sent to peers
Results for applicability in:
o Classification
o Recommender Systems
[Figure labels: $R \approx P_1 Q_1$; $R \approx P_2 (Q_2 + \Delta Q)$; Measurement; $\Delta Q$]
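The idea of sending model delta updates to peers, with no central service, can be illustrated with a deliberately simplified sketch. It swaps the recommender setting for a small linear regression model so that it fits in a few lines; the `Peer` class, the `gossip_round` function, the learning rate, and the random peer selection are all assumptions and not the method behind the slide. Raw data never leaves a peer, only weight deltas do.

```python
import random


class Peer:
    """A node with private samples and its own local copy of the model."""

    def __init__(self, samples, dim):
        self.samples = samples        # private (features, target) pairs
        self.weights = [0.0] * dim    # local linear model

    def local_delta(self, rng, lr):
        """One SGD step on a random private sample, returned as a weight delta."""
        x, y = rng.choice(self.samples)
        error = sum(w * xi for w, xi in zip(self.weights, x)) - y
        return [-lr * error * xi for xi in x]

    def apply(self, delta):
        self.weights = [w + d for w, d in zip(self.weights, delta)]


def gossip_round(peers, rng, lr=0.05):
    """Each peer learns locally, then sends its delta to one other random peer."""
    for peer in peers:
        delta = peer.local_delta(rng, lr)
        peer.apply(delta)
        neighbor = rng.choice([p for p in peers if p is not peer])
        neighbor.apply(delta)         # only the delta crosses the network


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data generated by the target model y = 2*x0 - 1*x1.
    data = [
        [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0)],
        [((1.0, 1.0), 1.0), ((1.0, -1.0), 3.0)],
        [((0.5, 1.0), 0.0), ((1.0, 0.5), 1.5)],
        [((-1.0, 1.0), -3.0), ((0.0, 1.0), -1.0)],
    ]
    peers = [Peer(samples, dim=2) for samples in data]
    for _ in range(2000):
        gossip_round(peers, rng)
    for peer in peers:
        print([round(w, 3) for w in peer.weights])
```

Because every peer both learns from its own samples and absorbs deltas computed elsewhere, the local models drift toward a shared solution without any node seeing the full data set.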

25 Conclusions
Software Latency
o For data streaming solutions, we have to combine
   - Batch pre-computed models updated in real time (lambda architecture)
   - Very low memory data approximation
   - Carefully selected database operations to optimize communication
o Machine learning, prediction, classification made
   - Highly time sensitive, streaming
   - Fully distributed: each element learns by passing model error to peers
Human Latency
o Shortage of Data Scientists worldwide
o Needs training AND systems with reduced learning curve
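The lambda architecture mentioned in the conclusions combines a periodically recomputed batch view with a real-time layer that covers events since the last batch run. The sketch below shows that combination on the simplest possible "model", a per-key count. The class name `LambdaCounter` and its methods are hypothetical, and a production system would run the batch job asynchronously rather than inline.

```python
from collections import Counter


class LambdaCounter:
    """Toy lambda architecture: batch view + speed view, merged at query time."""

    def __init__(self):
        self.master_log = []          # append-only record of every event
        self.batch_view = Counter()   # precomputed by the slow batch job
        self.speed_view = Counter()   # incremental counts since the last batch run

    def ingest(self, key):
        """Real-time path: record the event and update the speed view."""
        self.master_log.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        """Batch path: recompute the view from the full log, then reset the speed view."""
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, key):
        """Serving path: combine the precomputed and the real-time parts."""
        return self.batch_view[key] + self.speed_view[key]


if __name__ == "__main__":
    counts = LambdaCounter()
    for _ in range(3):
        counts.ingest("streep")
    counts.run_batch()
    counts.ingest("streep")
    print(counts.query("streep"))
```

Queries stay fast because they only add two precomputed numbers, while correctness over the long run comes from the batch recomputation over the full log.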

