Big Data: Big Challenges for Computer Science Henri Bal Vrije Universiteit Amsterdam
Multiple types of data explosions ● High-volume data: x global internet traffic per year (by 2018) ● Complex data
Graphics Processing Units (GPUs)
Differences between CPUs and GPUs ● CPU: minimize the latency of one activity (thread) ● Must be good at everything ● Big on-chip caches ● Sophisticated control logic ● GPU: maximize the throughput of all threads using large-scale parallelism
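The latency-vs-throughput distinction above can be illustrated with a minimal Python sketch (a hypothetical example, not from the slides): the "CPU style" runs one sophisticated thread through the items in sequence, while the "GPU style" spreads the same simple operation across many workers.

```python
from multiprocessing.dummy import Pool  # thread-based pool

# CPU style: one fast thread, optimized for the latency of each item
def cpu_style(items, f):
    return [f(x) for x in items]

# GPU style: many simple workers, each applying the same operation
# to a different element (data parallelism, throughput-oriented)
def gpu_style(items, f, workers=8):
    with Pool(workers) as pool:
        return pool.map(f, items)

square = lambda x: x * x
data = list(range(10))
assert cpu_style(data, square) == gpu_style(data, square)
```

Both produce the same result; the difference is only in how the work is scheduled, which is exactly the architectural trade-off the slide describes.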
Ongoing GPU work at VU ● Applications ● Multimedia data ● Digital forensics data ● Climate modelling ● Radio astronomy data ● Methodologies ● Hadoop on accelerators ● Programming methods for accelerators ● Teaching GPUs (with UvA) ● National ICT research infrastructure COMMIT/
Complex data ● Still smaller in volume than astronomy data, etc. ● Much more complicated, semantically rich data ● Growing fast …
Semantic web ● Make the Web smarter by injecting meaning, so that machines can reason about it ● Initial idea by Tim Berners-Lee in 2001 ● Has now attracted the interest of big IT companies
WebPIE: a Web-scale Parallel Inference Engine ● Web-scale parallel reasoner that performs full materialization ● Orders of magnitude faster than previous work, thanks to smart parallel algorithms ● Jacopo Urbani + Frank van Harmelen (VU) ● Urbani's PhD thesis was nominated for the Christiaan Huygens prize
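Full materialization means applying inference rules repeatedly until no new facts appear. A minimal single-machine sketch of this idea, for one RDFS rule (if x has type C1 and C1 is a subclass of C2, then x has type C2), might look as follows; the triples are hypothetical, and WebPIE itself runs such rules at web scale with MapReduce rather than this naive loop.

```python
# One RDFS rule: (x rdf:type C1) + (C1 rdfs:subClassOf C2) => (x rdf:type C2)
TYPE, SUB = "rdf:type", "rdfs:subClassOf"

def materialize(triples):
    """Compute the full closure: apply the rule until a fixpoint."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in closed:
            if p != TYPE:
                continue
            for s2, p2, o2 in closed:
                if p2 == SUB and s2 == o:      # o is a subclass of o2
                    new.add((s, TYPE, o2))
        if not new <= closed:                  # any genuinely new facts?
            closed |= new
            changed = True
    return closed

triples = {
    ("alice", TYPE, "Student"),
    ("Student", SUB, "Person"),
    ("Person", SUB, "Agent"),
}
facts = materialize(triples)
assert ("alice", TYPE, "Agent") in facts  # derived via two rule applications
```

The fixpoint loop is what makes this "full" materialization: every derivable fact is computed up front, which is also why any change to the input forces expensive recomputation.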
Reasoning on changing data ● WebPIE must recompute everything if data changes ● Takes on the order of 1 day on a 64-node compute cluster ● Challenge: real-time incremental reasoning, combining new (streaming) data & historic data ● Nanopublications (http://nanopub.org) ● Handling 2 million news articles per day (Piek Vossen, VU) ● Data streams from (health) sensors & smart phones ● Exploit massive parallel computing and GPUs
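The incremental alternative described above can be sketched as follows: when a new triple arrives, derive only its consequences (a semi-naive evaluation style) instead of recomputing the whole closure. This is a hedged illustration with hypothetical names, not WebPIE's actual algorithm.

```python
TYPE, SUB = "rdf:type", "rdfs:subClassOf"

def add_incremental(closed, triple):
    """Insert `triple` plus only the new facts it triggers."""
    frontier = {triple}
    while frontier:
        closed |= frontier
        nxt = set()
        for s, p, o in frontier:
            if p == TYPE:
                # a new type fact joins with existing subclass facts
                nxt |= {(s, TYPE, o2) for s2, p2, o2 in closed
                        if p2 == SUB and s2 == o}
        frontier = nxt - closed   # keep only genuinely new derivations
    return closed

closed = {("Student", SUB, "Person"), ("Person", SUB, "Agent")}
add_incremental(closed, ("bob", TYPE, "Student"))
assert ("bob", TYPE, "Agent") in closed
```

Because only the frontier of new facts is joined against the existing closure, the cost scales with the size of the update rather than the size of the whole dataset, which is the point of real-time incremental reasoning over streaming data.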
Other work on complex data ● Use semantic web to describe and reason about computer infrastructure (Cees de Laat, UvA) ● Machine learning using GPUs (Hadoop) ● Joint work with Max Welling (UvA) ● Business applications ● With Frans Feldberg (VU, Economy)
Discussion ● We can process peta-scale (10^15, LHC) simple data with cluster and grid technology ● Exascale (10^18, SKA) may be feasible with GPUs, but requires new parallel programming methodologies ● Processing complex data is vastly more complicated, even at smaller scales ● Complex data is also escalating in size ● Dynamic (streaming) data will be next ● Processing exascale dynamic complex data?