Data Management Techniques Sung-Eui Yoon KAIST URL:

Data Avalanche (or Data Explosion) There is too much data out there! cmsc838b/Project/Parija_Spacco/images/

Geometric Data Avalanche ● Massive geometric data ● Due to advances in modeling, simulation, and data capture techniques ● Time-varying data (4D data sets)

CAD Model: Double Eagle Oil Tanker, 82 million triangles (4 GB)

CAD Model: Boeing 777 Ray tracing the Boeing 777, 470 million triangles. Excerpted from the SIGGRAPH course notes on massive model rendering

Scanned Model: St. Matthew Model, 372 million triangles (10 GB)

Possible Solutions? ● Will hardware improvements address the data avalanche? ● Moore's law: the number of transistors roughly doubles every 18 months

Current Architecture Trends [Figure: accumulated growth rate during 1999~2009 (log scale); surviving series labels: "access speed", "disk access speed"] Data access time becomes the major computational bottleneck!

Four Orthogonal Approaches ● Cache-coherent layouts ● Random-accessible compressed meshes ● Cache-oblivious ray reordering ● Hybrid parallel continuous collision detection

Overview ● Cache-coherent layouts ● Random-accessible compressed meshes ● Cache-oblivious ray reordering ● Hybrid parallel continuous collision detection

Cache-Coherent Layouts of Meshes ● One dimensional data layout of a mesh ● Reduce the number of cache misses ● Cache-aware or cache-oblivious layouts ● Minimize the number of cache misses for a specific or various cache parameters (e.g., cache block size) [Yoon et al. SIG05, VIS06, Euro06] [Figure: mesh vertices v_a, v_b, v_c, v_d and their one dimensional layout]
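To make the goal of reducing cache misses concrete, here is a minimal sketch that counts misses for two one dimensional layouts of the same data. It is not the layout algorithm from the cited papers: the 16 x 16 vertex grid, the column-by-column traversal, the block size, the LRU cache, and all function names are assumptions chosen for illustration.

```python
from collections import OrderedDict

def count_block_misses(access_order, position, block_size, cache_blocks):
    """Count misses of an LRU cache holding `cache_blocks` blocks.

    `position[v]` is the index of vertex v in the one dimensional layout;
    the vertices in the same block share `position[v] // block_size`.
    """
    cache = OrderedDict()  # block id -> True, ordered from least to most recent
    misses = 0
    for v in access_order:
        block = position[v] // block_size
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return misses

n = 16  # hypothetical n x n grid of vertices
vertices = [(r, c) for r in range(n) for c in range(n)]

# Hypothetical runtime access pattern: walk the grid column by column.
access_order = [(r, c) for c in range(n) for r in range(n)]

# Layout that ignores the access pattern (row-major order).
layout_mismatched = {(r, c): r * n + c for (r, c) in vertices}
# Layout that follows the access pattern (column-major order).
layout_matched = {(r, c): c * n + r for (r, c) in vertices}

misses_mismatched = count_block_misses(access_order, layout_mismatched, 16, 4)
misses_matched = count_block_misses(access_order, layout_matched, 16, 4)
print(misses_mismatched, misses_matched)
```

With these assumed parameters, the mismatched layout misses on every access, while the matched layout misses once per block. The runtime traversal is identical in both cases; only the layout changes.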

Block-based I/O Model [Aggarwal and Vitter 88] [Figure: CPU or GPU; fast memory or cache; slow memory; disk; data moves between levels by block transfer. The access-time labels are only partly legible: "1 sec", "10^-6 sec"]
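In this model the cost of an algorithm is the number of block transfers between slow and fast memory. As a short worked comparison, let N be the number of data items and B the number of items per block (these symbols are the usual ones for this model and do not appear on the slide). Reading N items stored contiguously costs far fewer transfers than N accesses scattered over the layout:

```latex
\[
\underbrace{\left\lceil \frac{N}{B} \right\rceil}_{\text{sequential scan}}
\quad\text{block transfers}
\qquad\text{vs.}\qquad
\underbrace{\text{up to } N}_{\text{scattered accesses}}
\quad\text{block transfers}
\]
```

For example, with N = 10^6 items and B = 10^3 items per block, a scan needs 10^3 transfers, while scattered accesses may need up to 10^6.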

Applications ● View-dependent meshes ● View-dependent rendering ● Triangle meshes ● Isocontour extractions ● Hierarchies ● Ray tracing ● Collision detection

View-Dependent Rendering using LODs Improving GPU vertex cache Utilization GeForce 6800 (January 2005)

Applications ● View-dependent meshes ● View-dependent rendering ● Triangle meshes ● Isocontour extractions ● Hierarchies ● Ray tracing ● Collision detection Puget Sound, 134 M triangles, isocontour z(x,y) = 500 m. Achieves up to a 20X improvement in isocontour extraction

Applications ● View-dependent meshes ● View-dependent rendering ● Triangle meshes ● Isocontour extractions ● Hierarchies ● Ray tracing ● Collision detection Achieve 30% ~ 300% performance improvement

Advantages ● General ● Works well for various applications ● Cache-oblivious ● Can benefit all levels of the memory hierarchy (e.g., CPU/GPU caches, memory, and disk) ● No modification of runtime applications ● Only layout computation. Source code is available as a library called OpenCCL

Overview ● Cache-coherent layouts ● Random-accessible compressed meshes ● Cache-oblivious ray reordering ● Hybrid parallel continuous collision detection

Random-Accessible Compressed Data ● Compression methods for meshes and hierarchies ● Reduce the memory requirements ● Support random access on meshes and hierarchies ● Can be useful to many different applications [Kim et al. Tech. Report 09; Kim et al., TVCG 09; Yoon and Lindstrom, VIS 07]

Hierarchical-Culling oriented Compact Meshes (HCCMeshes) ● Consists of two parts: ● i-HCCMeshes (in-core representation) ● o-HCCMeshes (out-of-core representation)

Data Access Framework [Figure labels: main memory, user, request, data, data pool]

Data Access Framework - Out-of-Core Technique [Figure labels: main memory, user, request, data, cached data, external drive, data pool holding clusters c_0, c_1, ..., c_n, cluster ID, cluster]

HCCMeshes [Figure labels: main memory, user, request, data, cached data, external drive, data pool, cluster ID, decompression, cluster, compressed cluster, compressed data holding clusters c_0, c_1, ..., c_m; o-HCCMesh, i-HCCMesh] Support hierarchical random access!
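The data access path in these figures (request a cluster by ID, fetch its compressed form, decompress it, keep it cached in main memory) can be sketched as below. This is only an illustration under stated assumptions: `zlib` stands in for the HCCMesh codec, a Python dict stands in for the external drive, the LRU eviction policy is a guess, and the class and attribute names are hypothetical.

```python
import zlib
from collections import OrderedDict

class ClusterCache:
    """Serve clusters by ID, decompressing on demand and caching in memory."""

    def __init__(self, compressed_clusters, capacity):
        self.store = compressed_clusters  # stand-in for the external drive
        self.capacity = capacity          # max decompressed clusters in memory
        self.cache = OrderedDict()        # cluster ID -> decompressed bytes
        self.loads = 0                    # number of fetch + decompress operations

    def get(self, cluster_id):
        if cluster_id in self.cache:
            self.cache.move_to_end(cluster_id)  # cache hit: mark as recently used
            return self.cache[cluster_id]
        data = zlib.decompress(self.store[cluster_id])  # cache miss: load cluster
        self.loads += 1
        self.cache[cluster_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)      # evict least recently used cluster
        return data

# Hypothetical data pool: 8 clusters of 1 KB each, compressed individually so
# that any cluster can be decoded without touching the others.
clusters = {i: zlib.compress(bytes([i]) * 1024) for i in range(8)}
pool = ClusterCache(clusters, capacity=2)

first = pool.get(3)   # miss: decompressed from the store
again = pool.get(3)   # hit: served from the in-memory cache
other = pool.get(5)   # miss
print(pool.loads)
```

Compressing each cluster independently is what makes random access possible in this sketch: a request touches only the cluster it needs.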

Main Benefits ● Use less memory and a smaller working set ● o-HCCMeshes have 20:1 compression ratios ● i-HCCMeshes have 6:1 compression ratios ● Improve runtime performance

Applications ● Whitted-style ray tracing ● LOD-based ray tracing ● Collision detection ● Photon mapping ● Non-photorealistic rendering. Source code is available as OpenRACM

Results

Overview ● Cache-coherent layouts ● Random-accessible compressed meshes ● Cache-oblivious ray reordering ● Hybrid parallel continuous collision detection

Challenges ● Secondary rays show low ray coherence ● This results in low cache utilization ● When ray tracing massive models, expensive cache misses occur (e.g., L1/L2, main memory) Landscape (>1000 M) St. Matthew (372 M)

Goal ● Design an efficient algorithm for reordering incoherent secondary rays into coherent ones ● Achieve high cache coherence for these rays ● Improve the performance of ray tracing

Ray Reordering Framework [Figure labels: camera information, ray generation, ray reordering, ray buffer, hit points and material information, ray processing, scene information; memory hierarchy: L1 caches, main memory, disk] [Moon et al., under review]
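One way to picture the ray reordering stage is to sort buffered rays along a space-filling curve so that rays touching nearby geometry are processed together. The sketch below does this with a Morton (Z-order) code of one key point per ray. The slides do not state which ordering the method uses, so the Morton code, the choice of key point (for example an approximate hit point), the scene bounds, and all names here are assumptions.

```python
def morton3(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three integers into one Morton code."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code

def quantize(point, lo, hi, bits=10):
    """Map a 3D point inside the box [lo, hi] to integer grid coordinates."""
    cells = (1 << bits) - 1
    coords = []
    for c, l, h in zip(point, lo, hi):
        t = (c - l) / ((h - l) or 1.0)  # guard against a degenerate axis
        coords.append(min(cells, max(0, int(t * cells))))
    return tuple(coords)

def reorder_rays(rays, lo, hi, bits=10):
    """Sort (ray_id, key_point) pairs by the Morton code of their key point."""
    return sorted(rays, key=lambda ray: morton3(*quantize(ray[1], lo, hi, bits), bits))

# Hypothetical ray buffer: IDs with one key point each, in incoherent order.
ray_buffer = [
    ("A", (0.90, 0.90, 0.90)),
    ("B", (0.10, 0.10, 0.10)),
    ("C", (0.95, 0.90, 0.85)),
    ("D", (0.12, 0.10, 0.08)),
]
ordered = reorder_rays(ray_buffer, lo=(0.0, 0.0, 0.0), hi=(1.0, 1.0, 1.0))
ordered_ids = [ray_id for ray_id, _ in ordered]
print(ordered_ids)
```

After sorting, the two rays near the low corner of the scene are adjacent in the buffer, as are the two near the high corner, so consecutive rays are more likely to reuse the same cached scene data.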

Applications ● Path tracing ● Photon mapping

Result – Path Tracing (Video) ● 104 M triangles (12.8 GB) ● 512 x 512 resolution ● 100 paths ● 8 area lights

Result – Photon Mapping ● 128 M triangles (15.7 GB) ● Cache 19% of all the data ● 4 area lights ● 13X speedup

Overview ● Cache-coherent layouts ● Random-accessible compressed meshes ● Cache-oblivious ray reordering ● Hybrid parallel continuous collision detection

Collision Detection ● Collision detection is used in various fields ● Games, movies, scientific simulation, and robotics

Discrete vs. Continuous: Discrete collision detection (DCD) [Figure: time step (i-1), time step (i)]

Discrete vs. Continuous: Continuous collision detection (CCD) [Figure: time step (i-1), time step (i)]

Discrete vs. Continuous: Discrete collision detection (DCD) [Figure: time step (i-1), time step (i), with "?" between the two steps]

Discrete vs. Continuous
                   Continuous CD   Discrete CD
Accuracy           Accurate        May miss collisions
Computation time   Expensive       Very fast
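The difference in accuracy can be shown with a minimal one dimensional sketch: a point moves linearly during one time step and passes through a thin wall. The scenario, the numbers, and the function names are assumptions for illustration, not the tests used in the work presented here.

```python
def discrete_hit(x_curr, wall_lo, wall_hi):
    """DCD: test only the sampled position at the current time step."""
    return wall_lo <= x_curr <= wall_hi

def continuous_hit(x_prev, x_curr, wall_lo, wall_hi):
    """CCD for linear motion: test the whole interval swept during the step."""
    swept_lo, swept_hi = min(x_prev, x_curr), max(x_prev, x_curr)
    return swept_lo <= wall_hi and wall_lo <= swept_hi

# Hypothetical fast-moving point that tunnels through a thin wall at [0.9, 1.1].
x_prev, x_curr = 0.0, 2.0            # positions at time steps (i-1) and (i)
wall_lo, wall_hi = 0.9, 1.1

missed_by_dcd = not discrete_hit(x_curr, wall_lo, wall_hi)
found_by_ccd = continuous_hit(x_prev, x_curr, wall_lo, wall_hi)
print(missed_by_dcd, found_by_ccd)
```

The discrete test sees the point only on either side of the wall and reports no collision, while the continuous test checks the motion between the two time steps and finds it.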

Motivation ● Continuous collision detection ● Accurate, but slow for complex models ● Hardware trend ● CPUs and GPUs are increasing the number of cores ● Heterogeneous architectures ● Intel Larrabee architecture ● Previous approaches ● Utilize either multi-core CPUs or GPUs ● Not enough performance for interactive applications

Hybrid Parallel CCD [Kim et al. PG 09] ● Takes advantage of both: ● Multi-core CPU architectures ● GPU architectures ● Achieves interactive performance for various deforming models consisting of tens or hundreds of thousands of triangles [Figure labels: CCD, multi-core CPU (x4), GPU, ...]

Results ● Performance of HPCCD utilizing both CPUs and GPUs. Source code is available as a library called OpenCCD

Results

Conclusions ● Data explosion and the slower growth rate of data access speed ● Discussed four different techniques as data management methods ● Cache-coherent layouts ● Random-accessible compressed data ● Cache-oblivious ray reordering ● Hybrid continuous collision detection ● Applied to rendering and collision detection ● Observed meaningful performance improvements

Acknowledgements ● Research collaborators ● TaeJoon Kim, DukSu Kim, Pio Claudio, BooChang Moon, YongYoung Byun, JaePil Heo, SeungYong Lee, YongJin Kim, JaeHyuk Heo, John Kim, Peter Lindstrom, Valerio Pascucci, Dinesh Manocha ● Funding sources ● Microsoft Research Asia ● KAIST seed grant ● Ministry of Knowledge Economy ● Samsung ● Korea Research Foundation