Henri Bal Vrije Universiteit Amsterdam High Performance Distributed Computing.


Outline
1. Development of the field
2. Highlights VU-HPDC group
3. Links to data science cycle
4. Conclusions

Developments
Multiple types of data explosions:
– Big data: huge processing/transportation demands
– Complex heterogeneous data
LOFAR: ~15 PB/year
SKA: >300 PB/year, exascale processing

Developments
Infrastructure explosion
– High complexity: heterogeneous systems with a diversity of processors, systems, networks

VU HPDC group
Bridge the gap between demanding applications and complex infrastructure
Distributed programming systems for
– Clusters, grids, clouds
– Accelerators (GPUs)
– Heterogeneous systems ("Jungles")
– Clouds & mobile devices
Applications: multimedia, semantic web, model checking, games, astronomy, astrophysics, climate modeling, ...

Highlights VU-HPDC group
– 1st Prize: SCALE 2008
– AAAI-VC 2007
– DACH BS
– DACH FT
– 3rd Prize: ISWC 2008
– 1st Prize: SCALE 2010
– EYR 2011 Sustainability award
– Solved Awari 2002

Links to data science cycle
Cycle stages: Understand and decide; Analyze and model; Store and process
Disciplines: Reasoning, Knowledge representation, Multimedia Retrieval, Modeling and simulation, Machine Learning, Information Retrieval, Decision Theory, Perception, Cognition, Visual Analytics, Distributed Processing, Large Scale Databases, Software Eng., System / Network Eng.
VU-HPDC links: Distributed reasoning, Jungle computing, MapReduce

Reasoning – Semantic Web
Make the Web smarter by injecting meaning so that machines can "understand" it.
o Initial idea by Tim Berners-Lee in 2001
It has now attracted the interest of big IT companies.

Google Example

Distributed Reasoning
– WebPIE: web-scale distributed reasoner doing full materialization
– QueryPIE: distributed reasoning with backward-chaining + pre-materialization of schema triples
– DynamiTE: maintains materialization after updates (additions & removals)
Challenge: real-time incremental reasoning on web scale, combining new (streaming) data & existing historic data
With: Jacopo Urbani, Alessandro Margara, Frank van Harmelen (COMMIT/)
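To make "full materialization" concrete, here is a minimal single-machine sketch that derives every triple implied by two RDFS rules (subclass transitivity and type inheritance) until nothing new appears. It is an illustration only, not WebPIE's distributed algorithm; the names `materialize`, `SUBCLASS` and `TYPE` and the choice of rules are assumptions made for this example.

```python
# Hypothetical predicate names standing in for rdfs:subClassOf and rdf:type.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialize(triples):
    """Naive forward chaining: apply two RDFS rules until a fixpoint is reached."""
    closure = set(triples)
    while True:
        derived = set()
        subclass_pairs = [(s, o) for (s, p, o) in closure if p == SUBCLASS]
        for (a, b) in subclass_pairs:
            # Rule 1: (a subClassOf b), (b subClassOf d) => (a subClassOf d)
            for (c, d) in subclass_pairs:
                if b == c:
                    derived.add((a, SUBCLASS, d))
            # Rule 2: (s type a), (a subClassOf b) => (s type b)
            for (s, p, o) in closure:
                if p == TYPE and o == a:
                    derived.add((s, TYPE, b))
        new = derived - closure
        if not new:
            return closure
        closure |= new


facts = {
    ("Cat", SUBCLASS, "Mammal"),
    ("Mammal", SUBCLASS, "Animal"),
    ("tom", TYPE, "Cat"),
}
closure = materialize(facts)
```

After materialization a query such as "is tom an Animal?" becomes a plain lookup in `closure`. Backward chaining, as in QueryPIE, would instead derive that triple only when the query asks for it.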

Glasswing: MapReduce on Accelerators
– Use accelerators as a mainstream feature
– Massive out-of-core data sets
– Scale vertically & horizontally
– Code portability using OpenCL
– Maintain MapReduce abstraction
With: Ismail El Helw, Rutger Hofman

Glasswing Pipeline
– Overlaps computation, communication & disk access
– Supports multiple buffering levels
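The overlap idea can be sketched with one thread per stage and bounded queues between stages, where the queue depth plays the role of the buffering level. This is a generic illustration under assumed names (`run_pipeline`, `depth`), not Glasswing's implementation, which targets OpenCL devices. In CPython, threads mainly overlap I/O waits rather than CPU-bound work, and error handling is omitted.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def _stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(fn(item))


def run_pipeline(chunks, stages, depth=2):
    """Run the stage functions concurrently; depth=2 gives double buffering."""
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]

    def feed():
        for chunk in chunks:
            queues[0].put(chunk)  # blocks while the first buffer is full
        queues[0].put(_DONE)

    workers.append(threading.Thread(target=feed))
    for worker in workers:
        worker.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for worker in workers:
        worker.join()
    return results


# Placeholder stages standing in for "read from disk", "compute", "send".
out = run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 2])
```

While one chunk is in the second stage, the next chunk can already be in the first, so a slow stage stalls its neighbours only once the buffers between them fill up.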

Evaluation of Glasswing
– Glasswing uses CPU, memory & disk resources more efficiently than Hadoop
– Compute-bound applications benefit dramatically from GPUs
– Better scalability than Hadoop
– Runs on a variety of accelerators
E.g. k-means clustering:
– 8.5× (1 node) vs. × (64 nodes) vs. 107× (GPU node)
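For readers unfamiliar with how k-means fits the MapReduce abstraction, one iteration can be sketched as a map step that assigns each point to its nearest centroid and a reduce step that averages the points per centroid. This is plain sequential Python with assumed function names; it does not use Glasswing's or Hadoop's API and says nothing about the measured speedups.

```python
from collections import defaultdict


def kmeans_map(point, centroids):
    """Map: emit (index of nearest centroid, point)."""
    nearest = min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )
    return nearest, point


def kmeans_reduce(points):
    """Reduce: average all points assigned to one centroid."""
    n = len(points)
    return tuple(sum(dim) / n for dim in zip(*points))


def kmeans_iteration(points, centroids):
    """One k-means step: map, group by key (the shuffle), then reduce."""
    groups = defaultdict(list)
    for point in points:
        key, value = kmeans_map(point, centroids)
        groups[key].append(value)
    # A centroid that received no points keeps its old position.
    return [
        kmeans_reduce(groups[i]) if i in groups else tuple(centroids[i])
        for i in range(len(centroids))
    ]


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
new_centroids = kmeans_iteration(points, [(0.0, 0.0), (9.0, 9.0)])
```

The map step is a distance computation per point with no dependencies between points, which is the kind of compute-bound, data-parallel work that maps well onto a GPU.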