Big Data: Big Challenges for Computer Science. Henri Bal, Vrije Universiteit Amsterdam.



Multiple types of data explosions ● High-volume data ● x global internet traffic per year (by 2018) ● Complex data

Graphics Processing Units (GPUs)

Differences between CPUs and GPUs ● CPU: minimize latency of 1 activity (thread) ● Must be good at everything ● Big on-chip caches ● Sophisticated control logic ● GPU: maximize throughput of all threads using large-scale parallelism [Figure labels: Control, ALU, Cache]

Example: NVIDIA Maxwell ● 16 independent streaming multiprocessors ● 2048 compute cores
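
As a quick check of the two figures on this slide, dividing the core count by the number of multiprocessors gives the cores per streaming multiprocessor. This is only a sketch: the variable names are illustrative, and it assumes the cores are spread evenly over the multiprocessors.

```python
# Figures taken from the slide: 16 streaming multiprocessors, 2048 compute cores.
num_multiprocessors = 16
num_cores = 2048

# Assumption: cores are divided evenly over the multiprocessors.
cores_per_multiprocessor = num_cores // num_multiprocessors
print(cores_per_multiprocessor)  # 128
```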

Ongoing GPU work at VU ● Applications ● Multimedia data ● Digital forensics data ● Climate modelling ● Radio astronomy data ● Methodologies ● Hadoop on accelerators ● Programming methods for accelerators ● Teaching GPUs (with UvA) ● National ICT research infrastructure COMMIT/

Complex data ● Still smaller in volume than astronomy etc. ● Much more complicated, semantically rich data ● Growing fast ….

Semantic web ● Make the Web smarter by injecting meaning so that machines can reason about it ● Initial idea by Tim Berners-Lee in 2001 ● Has now attracted the interest of big IT companies

WebPIE: a Web-scale Parallel Inference Engine ● Web-scale parallel reasoner doing full materialization ● Orders of magnitude faster than previous work by using smart parallel algorithms ● Jacopo Urbani + Frank van Harmelen (VU) ● Christiaan Huygens nomination for Urbani's PhD thesis
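
To make "full materialization" concrete, here is a minimal single-machine sketch. It is not WebPIE's algorithm, which is parallel and far more sophisticated. It repeatedly applies two RDFS-style rules (subclass transitivity and type inheritance) until no new triples appear. The function name `materialize`, the plain-string predicates, and the sample facts are all assumptions made for illustration.

```python
def materialize(triples):
    """Naive fixpoint: apply two RDFS-style rules until nothing new is derived."""
    closure = set(triples)
    while True:
        # All (subclass, superclass) pairs known so far.
        sub = {(s, o) for s, p, o in closure if p == "subClassOf"}
        derived = set()
        # Rule 1: (a subClassOf b), (b subClassOf c) => (a subClassOf c)
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    derived.add((a, "subClassOf", d))
        # Rule 2: (x type a), (a subClassOf b) => (x type b)
        for s, p, o in closure:
            if p == "type":
                for c, d in sub:
                    if o == c:
                        derived.add((s, "type", d))
        new = derived - closure
        if not new:
            return closure
        closure |= new


# Hypothetical input data.
facts = {
    ("Student", "subClassOf", "Person"),
    ("Person", "subClassOf", "Agent"),
    ("alice", "type", "Student"),
}
closure = materialize(facts)
print(sorted(closure - facts))
```

The derived triples are stored explicitly alongside the input, which is what makes later queries cheap and what makes recomputation expensive when the input changes.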

Reasoning on changing data ● WebPIE must recompute everything if data changes ● Takes on the order of 1 day on a 64-node compute cluster ● Challenge: real-time incremental reasoning, combining new (streaming) data & historic data ● Nanopublications ( ● Handling 2 million news articles per day (Piek Vossen, VU) ● Data streams from (health) sensors & smart phones ● Exploit massive parallel computing and GPUs
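
The idea of incremental reasoning can be sketched as follows. This is an illustrative approach under stated assumptions, not the method used at VU. Given a closure that is already fully materialized, only joins involving at least one newly added triple are evaluated, so historic data is not re-derived. The names `join` and `add_incrementally`, the two RDFS-style rules, and the sample data are hypothetical. Deletions are not handled.

```python
def join(left, right):
    """Both rules have one shape: (s, p, o) + (o, subClassOf, c) => (s, p, c)."""
    s, p, o = left
    s2, p2, o2 = right
    if p in ("type", "subClassOf") and p2 == "subClassOf" and o == s2:
        return (s, p, o2)
    return None


def add_incrementally(closure, new_triples):
    """Extend an already-materialized closure with newly arrived triples."""
    closure = set(closure)
    delta = set(new_triples) - closure
    while delta:
        closure |= delta
        derived = set()
        for d in delta:
            for t in closure:
                # A new triple may sit on either side of the join.
                for result in (join(d, t), join(t, d)):
                    if result is not None:
                        derived.add(result)
        # Only triples not seen before drive the next round.
        delta = derived - closure
    return closure


# Hypothetical historic data, assumed to be fully materialized already.
historic = {
    ("Student", "subClassOf", "Person"),
    ("alice", "type", "Student"),
    ("alice", "type", "Person"),
}
updated = add_incrementally(historic, {("Person", "subClassOf", "Agent")})
print(sorted(updated - historic))
```

Each round does work proportional to the size of the delta times the size of the closure, rather than re-running all joins over the full data set.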

Other work on complex data ● Use semantic web to describe and reason about computer infrastructure (Cees de Laat, UvA) ● Machine learning using GPUs (Hadoop) ● Joint work with Max Welling (UvA) ● Business applications ● With Frans Feldberg (VU, Economy)

Discussion ● We can process peta-scale (10^15, LHC) simple data with cluster and grid technology ● Exascale (10^18, SKA) may be feasible with GPUs, but requires new parallel programming methodologies ● Processing complex data is vastly more complicated, even at smaller scales ● Complex data is also escalating in size ● Dynamic (streaming) data will be next ● Processing exa-scale dynamic complex data?