Henri Bal Vrije Universiteit Amsterdam High Performance Distributed Computing.

1 High Performance Distributed Computing
Henri Bal, Vrije Universiteit Amsterdam

2 Outline
1. Development of the field
2. Highlights of the VU-HPDC group
3. Links to the data science cycle
4. Conclusions

3 Developments
Multiple types of data explosions:
– Big data: huge processing and transport demands
– Complex, heterogeneous data
LOFAR: ~15 PB/year
SKA: >300 PB/year, exascale processing
[Figure: complex data]
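The yearly volumes above translate directly into sustained bandwidth requirements. As a back-of-the-envelope check, assuming decimal units (1 PB = 10^15 bytes) and a 365.25-day year (neither is stated on the slide), the average rate is the yearly volume divided by the number of seconds in a year. The function name `sustained_rate_gb_per_s` is hypothetical.

```python
# Average sustained data rate implied by an annual data volume.
# Assumptions (not from the slides): decimal units, 1 PB = 10**15 bytes,
# 1 GB = 10**9 bytes, and a Julian year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31,557,600 s


def sustained_rate_gb_per_s(petabytes_per_year: float) -> float:
    """Convert a yearly volume in PB into an average rate in GB/s."""
    bytes_per_year = petabytes_per_year * 10**15
    return bytes_per_year / SECONDS_PER_YEAR / 10**9


for name, pb in [("LOFAR", 15), ("SKA (lower bound)", 300)]:
    print(f"{name}: {pb} PB/year ~ {sustained_rate_gb_per_s(pb):.2f} GB/s")
```

This gives roughly 0.48 GB/s for LOFAR and about 9.5 GB/s as a lower bound for SKA. These are yearly averages only; peak rates and any replication or reprocessing traffic are not captured by this estimate.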

4 Developments
Infrastructure explosion:
– High complexity: heterogeneous systems with a diversity of processors, systems, and networks

5 VU HPDC group
Bridge the gap between demanding applications and complex infrastructure.
Distributed programming systems for:
– Clusters, grids, clouds
– Accelerators (GPUs)
– Heterogeneous systems (“Jungles”)
– Clouds & mobile devices
Applications: multimedia, semantic web, model checking, games, astronomy, astrophysics, climate modeling, ...

6 Highlights VU-HPDC group
– 1st Prize: SCALE 2008, AAAI-VC 2007, DACH 2008 - BS, DACH 2008 - FT
– 3rd Prize: ISWC 2008
– 1st Prize: SCALE 2010
– EYR 2011 Sustainability award
– Solved Awari (2002)

7 Links to data science cycle
[Figure: data science cycle with three stages: Understand and decide; Analyze and model; Store and process. Topics shown: reasoning, knowledge representation, multimedia retrieval, modeling and simulation, machine learning, information retrieval, decision theory, perception, cognition, visual analytics, distributed processing, large-scale databases, software engineering, system/network engineering. Highlighted VU-HPDC links: distributed reasoning, jungle computing, MapReduce.]

8 Reasoning – Semantic Web
Make the Web smarter by injecting meaning so that machines can “understand” it.
– Initial idea by Tim Berners-Lee in 2001
– Now attracting the interest of big IT companies

9 Google Example

10

11 Distributed Reasoning
– WebPIE: web-scale distributed reasoner doing full materialization
– QueryPIE: distributed reasoning with backward chaining plus pre-materialization of schema triples
– DynamiTE: maintains the materialization after updates (additions and removals)
Challenge: real-time incremental reasoning at web scale, combining new (streaming) data with existing historic data
With: Jacopo Urbani, Alessandro Margara, Frank van Harmelen
COMMIT/
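All three systems revolve around materialization: deriving every triple that the ontology rules imply. A minimal single-machine sketch of full materialization (forward chaining until a fixpoint) is shown below. It applies only two RDFS rules to a tiny, hypothetical graph (the `ex:` names are invented for illustration), and it is not WebPIE's distributed algorithm, which the slide does not describe.

```python
# Minimal sketch of full materialization (forward chaining to a fixpoint)
# over a tiny, hypothetical RDF graph. Only two RDFS rules are applied:
#   rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
#   rdfs9:  (x type A),       (A subClassOf B) => (x type B)
# Single-machine illustration only; not WebPIE's implementation.
Triple = tuple[str, str, str]

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialize(triples: set[Triple]) -> set[Triple]:
    closure = set(triples)
    while True:
        subclass = [(s, o) for s, p, o in closure if p == SUBCLASS]
        new: set[Triple] = set()
        # rdfs11: transitivity of subClassOf
        for a, b in subclass:
            for b2, c in subclass:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        # rdfs9: instances inherit the superclasses of their class
        for x, p, a in closure:
            if p == TYPE:
                for a2, b in subclass:
                    if a == a2:
                        new.add((x, TYPE, b))
        if new <= closure:  # fixpoint: no rule produced anything new
            return closure
        closure |= new


graph = {
    ("ex:Telescope", SUBCLASS, "ex:Instrument"),
    ("ex:Instrument", SUBCLASS, "ex:Device"),
    ("ex:LOFAR", TYPE, "ex:Telescope"),
}
for t in sorted(materialize(graph) - graph):
    print(t)
```

Here materialization adds three triples: `ex:Telescope` becomes a subclass of `ex:Device`, and `ex:LOFAR` gains the types `ex:Instrument` and `ex:Device`. A backward-chaining reasoner would instead derive such facts only when a query asks for them.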

12 Glasswing: MapReduce on Accelerators
– Use accelerators as a mainstream feature
– Massive out-of-core data sets
– Scale vertically and horizontally
– Code portability using OpenCL
– Maintain the MapReduce abstraction
With: Ismail El Helw, Rutger Hofman
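Keeping the MapReduce abstraction means the programmer writes only a map and a reduce function, while the framework handles grouping values by key (the shuffle) and scheduling. The sequential sketch below illustrates that contract with a word count. The names `map_reduce`, `wc_map`, and `wc_reduce` are hypothetical; this is not Glasswing's API and shows nothing of its OpenCL or multi-node execution.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable


def map_reduce(
    records: Iterable,
    map_fn: Callable[[object], Iterable[tuple[Hashable, object]]],
    reduce_fn: Callable[[Hashable, list], object],
) -> dict:
    """Sequential stand-in for a MapReduce runtime: map, shuffle, reduce."""
    groups: dict = defaultdict(list)
    for record in records:
        # Map phase: each record may emit any number of (key, value) pairs.
        for key, value in map_fn(record):
            # Shuffle: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: one call per distinct key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def wc_map(line: str):
    for word in line.split():
        yield word.lower(), 1


def wc_reduce(word: str, counts: list) -> int:
    return sum(counts)


print(map_reduce(["GPU cluster", "GPU node"], wc_map, wc_reduce))
```

Because user code only touches `map_fn` and `reduce_fn`, a runtime is free to execute those functions on CPUs, GPUs, or many nodes without changing the program. That separation is what makes "maintain the abstraction" compatible with "use accelerators".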

13 Glasswing Pipeline
– Overlaps computation, communication, and disk access
– Supports multiple buffering levels
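Overlapping computation, communication, and disk access is the classic staged pipeline: each stage runs concurrently and passes chunks to the next through a bounded number of buffers, so chunk i+1 can be read while chunk i is being computed. The sketch below assumes threads and bounded queues as the mechanism, with in-memory lists standing in for disk and network I/O; `buffers_per_stage` is a hypothetical knob (2 corresponds to double buffering). The slide does not describe how Glasswing implements this.

```python
import queue
import threading

SENTINEL = None  # marks the end of the chunk stream


def reader(chunks, out_q):
    # Stand-in for the disk-read stage.
    for chunk in chunks:
        out_q.put(chunk)  # blocks when all downstream buffers are full
    out_q.put(SENTINEL)


def computer(in_q, out_q):
    # Stand-in for the compute stage (here: square every element).
    while (chunk := in_q.get()) is not SENTINEL:
        out_q.put([x * x for x in chunk])
    out_q.put(SENTINEL)


def writer(in_q, results):
    # Stand-in for the write/communication stage.
    while (chunk := in_q.get()) is not SENTINEL:
        results.extend(chunk)


def run_pipeline(chunks, buffers_per_stage=2):
    q1 = queue.Queue(maxsize=buffers_per_stage)
    q2 = queue.Queue(maxsize=buffers_per_stage)
    results = []
    threads = [
        threading.Thread(target=reader, args=(chunks, q1)),
        threading.Thread(target=computer, args=(q1, q2)),
        threading.Thread(target=writer, args=(q2, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_pipeline([[1, 2], [3, 4], [5]]))
```

With one thread per stage and FIFO queues, the output order matches the input order. In CPython the threads overlap I/O waits but not pure-Python computation, so this sketch shows the structure of the pipeline rather than its performance.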

14 Evaluation of Glasswing
– Glasswing uses CPU, memory, and disk resources more efficiently than Hadoop
– Compute-bound applications benefit dramatically from GPUs
– Better scalability than Hadoop
– Runs on a variety of accelerators
E.g., k-means clustering:
– 8.5× (1 node) vs. 15.5× (64 nodes) vs. 107× (GPU node)
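The k-means figures make more sense once the job is expressed in MapReduce form. In one common formulation (assumed here, not taken from Glasswing), the map step assigns each point to its nearest centroid and emits `(cluster id, (point, 1))`, and the reduce step averages the points per cluster. The map step compares every point with every centroid, which makes the job compute-bound and therefore a good fit for GPUs. All names and the sample points are hypothetical.

```python
import math
from collections import defaultdict

Point = tuple[float, float]


def nearest(point: Point, centroids: list[Point]) -> int:
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans_map(point: Point, centroids: list[Point]):
    # Emit (cluster id, (point, count)) for one input point.
    yield nearest(point, centroids), (point, 1)


def kmeans_reduce(values) -> Point:
    # Average all points assigned to one cluster.
    sx = sy = 0.0
    n = 0
    for (x, y), count in values:
        sx += x
        sy += y
        n += count
    return (sx / n, sy / n)


def kmeans_iteration(points: list[Point], centroids: list[Point]) -> list[Point]:
    groups = defaultdict(list)
    for p in points:
        for cid, value in kmeans_map(p, centroids):
            groups[cid].append(value)
    new_centroids = list(centroids)  # an empty cluster keeps its old centroid
    for cid, values in groups.items():
        new_centroids[cid] = kmeans_reduce(values)
    return new_centroids


def kmeans(points: list[Point], centroids: list[Point], iterations: int = 10) -> list[Point]:
    for _ in range(iterations):
        centroids = kmeans_iteration(points, centroids)
    return centroids


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
print(kmeans(points, [(0.0, 0.0), (10.0, 10.0)]))
```

A distributed version would typically add a combiner that sums `(point, count)` pairs locally before the shuffle, so that each node sends one partial sum per cluster instead of one record per point.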

