Scaling Charts with Design and GPUs Leo Meyerovich CEO of Graphistry.com | UC Berkeley 1.

Presentation transcript:

Scaling Charts with Design and GPUs Leo Meyerovich CEO of Graphistry.com | UC Berkeley 1

Visibility 2

Visibility through design + speed 3

Histogram of Voter Turnout by Town (x-axis: voter turnout, 0% to 100%; y-axis: number of towns). Most towns had ~40% turnout. Ballot box stuffing? 4

5

Each tiny square shows town size (area) and vote (color: opposition vs. incumbent) 6

Filter for towns w/ high turnout 7

Tag suspicious with black 8

9

For visibility, speed → design 10

Problem: Plot 10+ Time Series Signals 11

Design → 200 Time Series Signals (time axis: 0 s to 100 s) 12

Speed → Pan/Zoom Interactions (zoomed time axis: 37 s to 38 s) 13

CPU Bottlenecks: naïve and offline. Pipeline stages: Transform, Parse, Layout, Render (timeline from 0 ms to 1600 ms; real-time is 30 ms) 14

Optimize: Binary Data, Multicore Layout, GPU Render. Pipeline stages: Prep, Layout, Render (timeline from 0 ms to 1600 ms). Real-time interaction; stream from server at 12 MB+/s 15

Graphs: Placing Nodes and Edges 16

Direct Feedback on Settings 17

Uber: Trip Start to End 18

Direct Edge Placement: Overplotting 19

Speed → Design → Edge Bundling 20

21

→ web 22

Bare Metal in the Browser Sequential Multicore GPU 5 X 4+ cores 1024 lanes SIMD 4 lanes 23

Superconductor: Parallel JS Viz Engine. Browser pipeline: webpage (HTML data, CSS styling, JS script), Parser, Selectors, Layout, Renderer, JavaScript VM, Pixels. Superconductor.js pipeline: data viz (data, styling, widgets), Compiler, Parser.js, Selectors.CL, Layout.CL, Renderer.GL, GPU 24

Layout as Parallel Tree Traversals (diagram: traversals over the tree down to its leaves compute w,h, then x,y, …, with logical spawns and logical joins). 1. Works for all data sets 2. Compiler: CSS → Schedule. Parallelism in each traversal! 25
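To make the two-traversal idea concrete, here is a minimal sketch, not Superconductor's actual code. It assumes a hypothetical layout rule (children stack vertically) and hypothetical helper names (`node`, `computeSizes`, `computePositions`). A bottom-up traversal computes w,h and a top-down traversal computes x,y; within each traversal, sibling subtrees are independent, which is where the parallelism would come from.

```javascript
// Hypothetical layout rule for illustration only: children stack vertically.
function node(w, h, children = []) {
  return { w, h, x: 0, y: 0, children };
}

// Traversal 1 (bottom-up): sizes flow from the leaves toward the root.
function computeSizes(n) {
  if (n.children.length === 0) return; // a leaf keeps its intrinsic size
  // Logical spawn: each child subtree could be processed in parallel.
  n.children.forEach(computeSizes);
  // Logical join: the parent reads its children only after they finish.
  n.w = Math.max(...n.children.map((c) => c.w));
  n.h = n.children.reduce((sum, c) => sum + c.h, 0);
}

// Traversal 2 (top-down): positions flow from the root toward the leaves.
function computePositions(n) {
  let offsetY = n.y;
  for (const c of n.children) {
    c.x = n.x;
    c.y = offsetY;
    offsetY += c.h;
  }
  // Logical spawn again: subtrees no longer depend on each other.
  n.children.forEach(computePositions);
}

const root = node(0, 0, [
  node(10, 5),
  node(0, 0, [node(4, 2), node(6, 3)]),
  node(8, 1),
]);
computeSizes(root);
computePositions(root);
```

The code runs sequentially here; the point is only that each traversal has a fixed direction, so a scheduler is free to run independent subtrees concurrently.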

GPU Traversals: Flat & Level-Synchronous. Level-synchronous: a parallel for loop over each tree level (level 1 … level n). Flat: nodes in arrays, one array per attribute (w, h, x, y). Compiler handles transform of code & data 26
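A flat, level-synchronous traversal can be sketched as follows. This is an illustration under stated assumptions, not the compiler's generated code. The tree is stored breadth-first with one typed array per attribute, and the names `levelStart`, `firstChild`, `childCount`, and `forEachInLevel` are hypothetical, as is the vertical-stacking layout rule. The inner loop over one level stands in for the GPU's parallel for loop: each iteration writes only its own node (bottom-up) or its own children (top-down), so iterations within a level do not conflict.

```javascript
// Six nodes in breadth-first order: node 0 is the root, nodes 1-3 are its
// children, and nodes 4-5 are the children of node 2.
const levelStart = [0, 1, 4, 6]; // level L spans [levelStart[L], levelStart[L + 1])
const firstChild = Int32Array.from([1, -1, 4, -1, -1, -1]);
const childCount = Int32Array.from([3, 0, 2, 0, 0, 0]);

// One flat array per attribute (structure of arrays).
const w = Float32Array.from([0, 10, 0, 8, 4, 6]);
const h = Float32Array.from([0, 5, 0, 1, 2, 3]);
const x = new Float32Array(6);
const y = new Float32Array(6);

const numLevels = levelStart.length - 1;

// Stand-in for a GPU kernel launch over one level (sequential here).
function forEachInLevel(level, body) {
  for (let i = levelStart[level]; i < levelStart[level + 1]; i++) body(i);
}

// Bottom-up pass: deepest level first, each node reads its finished children.
for (let level = numLevels - 1; level >= 0; level--) {
  forEachInLevel(level, (i) => {
    if (childCount[i] === 0) return; // a leaf keeps its intrinsic size
    let maxW = 0;
    let sumH = 0;
    for (let c = firstChild[i]; c < firstChild[i] + childCount[i]; c++) {
      maxW = Math.max(maxW, w[c]);
      sumH += h[c];
    }
    w[i] = maxW;
    h[i] = sumH;
  });
}

// Top-down pass: root level first, each node places its own children.
for (let level = 0; level < numLevels; level++) {
  forEachInLevel(level, (i) => {
    let offsetY = y[i];
    for (let c = firstChild[i]; c < firstChild[i] + childCount[i]; c++) {
      x[c] = x[i];
      y[c] = offsetY;
      offsetY += h[c];
    }
  });
}
```

Storing children contiguously lets a node address them with just a start index and a count, which keeps memory accesses regular.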

More Scalable Designs: Immens (Stanford), Nanocubes (AT&T), MapD (MIT), Abstract Rendering (Continuum), Synerscope 27

28

Achieve data visibility through hardware-accelerated designs (and deploy on the web) 29

Visualize Magnitudes More Data in the Browser Leo Meyerovich CEO of Graphistry.com | UC Berkeley 30

Today’s Supercomputer-in-a-Pocket. Phone: 16-lane CPU (cores with 4-way SIMD, prefetch engine, L1d: 32KB, L2: 1MB, RAM: 2GB) plus a 1024-lane SIMT GPGPU. Challenge: Parallelize Data Visualization 33

Problem: Dynamic Memory Allocation on GPU?
Draw calls: circ(…), square(…), rect(…); … line(…); … rect(…); … oval(…)
function circ(x, y, r) {
  // dynamic allocation: the buffer size depends on the runtime value of r
  const buffer = new Array(r * 10);
  for (let i = 0; i < r * 10; i++) buffer[i] = Math.cos(i);
}

Dynamic Allocation as SIMD Traversals. Sizing traversal: allocCirc(…) → 4, allocRect(…) → 6, allocLine(…) → 6, allocRect(…) → 7. Fill traversal: fillCirc(…), fillRect(…), fillLine(…), fillRect(…). 1. Prefix sum for needed space 2. Allocate buffers 3. Distribute offsets & compute 4. Give OpenGL buffer pointer
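Steps 1 to 3 can be sketched in a few lines. This is a sequential illustration, assuming the per-shape sizes shown on the slide (4, 6, 6, 7). The helper name `exclusiveScan` and the `Math.cos` fill are placeholders rather than the talk's code, and a GPU version would use a parallel scan instead of the loop here.

```javascript
// Sizes reported by the sizing traversal (values taken from the slide).
const sizes = [4, 6, 6, 7];

// Step 1: an exclusive prefix sum turns per-shape sizes into start offsets
// and yields the total space needed. (Sequential stand-in for a parallel scan.)
function exclusiveScan(counts) {
  const offsets = new Uint32Array(counts.length);
  let total = 0;
  for (let i = 0; i < counts.length; i++) {
    offsets[i] = total;
    total += counts[i];
  }
  return { offsets, total };
}

const { offsets, total } = exclusiveScan(sizes);

// Step 2: allocate one shared buffer up front instead of one per shape.
const buffer = new Float32Array(total);

// Step 3: each shape fills only its own slice, so the fills are independent
// and could run as one parallel traversal.
sizes.forEach((size, shape) => {
  const slice = buffer.subarray(offsets[shape], offsets[shape] + size);
  for (let i = 0; i < size; i++) slice[i] = Math.cos(i); // placeholder vertex data
});
```

Because every shape knows its offset before any data is written, no shape has to allocate memory during the fill traversal. Step 4 would then hand the single buffer to the renderer, for example by uploading it as a WebGL vertex buffer.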

CPU vs. GPU for Election Treemap: 5 traversals over 100K nodes. WebCL: 30X, WebCL: 70X, combined: 54X! 36