Signature-Based Workload Estimation for Mobile 3D Graphics

Presentation transcript:

Signature-Based Workload Estimation for Mobile 3D Graphics. Bren Mochocki*†, Kanishka Lahiri*, Srihari Cadambi*, Xiaobo Sharon Hu†. *NEC Laboratories America, †University of Notre Dame. DAC 2006.

Mobile Graphics Technology. [Figure: graphics technology over time, 1997 to 2007, progressing from 2D color through video clips and basic 3D to advanced 3D. The resource load increases along the way, straining both performance (speed) and lifetime (energy).]

Meeting Performance/Lifetime Requirements. Classes of approach: hardware solutions, system-level optimizations, graphics algorithms. Cited work: Woo, 04 (low-power 3D ASIC); Kameyama, 03 (low-power 3D ASIC); Gu, Chakraborty, Ooi, 06 (Games are up for DVFS); Akenine-Moller, 03 (texture compression for mobile terminals); Mochocki, Lahiri, Cadambi, 06 (DVFS for mobile 3D graphics); Tack, 04 (LoD control for mobile terminals). Accurate workload prediction is critical. [Presenter notes: keep short; stick to title; some of these can use workload prediction.]

Mobile 3D Workload Estimation. Why? To adapt architectural parameters, to adapt application parameters, and to enable better on-line resource management. Desirable properties: speed (estimation must be performed on-line), accuracy, and compactness.

Workload-Estimation Spectrum. [Figure: a spectrum running from general-purpose, simple predictors (history-based) to application-specific, accurate predictors (analytical).] General-purpose history-based predictors provide poor prediction accuracy for rapidly changing workloads. Highly accurate analytical schemes are too complex for use at run time.
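
To make the weakness of history-based prediction concrete, the following is a minimal sketch of one common general-purpose scheme, a moving average over the last few frames. The talk does not say which history-based predictor it compares against, so the class name `MovingAveragePredictor`, the window length, and the sample trace are all assumptions chosen for illustration.

```python
from collections import deque


class MovingAveragePredictor:
    """Predict the next frame's workload as the mean of the last `window` frames.

    Hypothetical illustration of a general-purpose history-based predictor;
    not the specific scheme evaluated in the talk.
    """

    def __init__(self, window=4):
        self.history = deque(maxlen=window)

    def predict(self, default=0.0):
        # With no history yet, fall back to a caller-supplied default.
        if not self.history:
            return default
        return sum(self.history) / len(self.history)

    def observe(self, cycles):
        # Record the measured workload of the frame that just finished.
        self.history.append(cycles)


def prediction_errors(workloads, window=4):
    """Return |predicted - actual| for each frame of a workload trace."""
    predictor = MovingAveragePredictor(window)
    errors = []
    for actual in workloads:
        errors.append(abs(predictor.predict(default=actual) - actual))
        predictor.observe(actual)
    return errors


# Illustrative trace: the workload jumps from 1.0 to 5.0 (arbitrary units).
print(prediction_errors([1.0] * 4 + [5.0] * 4, window=4))
```

On this trace the error is zero while the workload is steady, then takes several frames to decay after the jump, because the average still contains frames from before the change. That lag is the behaviour the slide describes for rapidly changing workloads.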

Workload-Estimation Spectrum. [Figure: the same spectrum, from general-purpose and simple to application-specific and accurate, now with the signature-based predictor placed on it.] Signature-based predictor: uses a combination of history and application-specific parameters (the signature) to predict future workload. It strikes a balance between simplicity and accuracy, preserves both cause AND effect, and preserves substantial history.

Outline. Introduction and Motivation; Background (3D-pipeline basics, challenges in workload estimation); Signature-Based Workload Prediction; Experimental Results; Conclusions.

3D Pipeline Basics. 3D representation → 2D image. [Figure: high-level simplified view of the 3D graphics pipeline with geometry, setup, and rendering stages, taking the scene from world view to camera view to raster view and finally to the frame buffer. Operations shown: transformations, lighting, clipping, scan-line conversion, pixel rendering, texturing.] [Presenter note: practice describing the rendering stage quickly (hidden surface removal).]

Workload Across Applications. [Figure: execution cycles (ARM, ×10^7) per benchmark, with benchmarks including TexCube and RoomRev.] Workload varies significantly between applications, so the prediction scheme must be flexible. [Presenter note: motivate the source of the pipeline imbalance.]

Workload Within an Application. Workload can change rapidly between frames. [Figure: execution cycles (ARM, ×10^7) per frame for the Race benchmark, frames 1 to 196, broken down into geometry, setup, and render.] [Presenter notes: start the animation right away; this motivates DVFS, but first, as a transition: what impacts the workload variation?]

Outline. Introduction and Motivation; Background; Signature-Based Workload Prediction (signature generation, method overview, pipeline modifications); Experimental Results; Conclusions.

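Example. Signature: <vertex count, avg. area>. [Figure: the application feeds the 3D pipeline, which writes the frame buffer. A signature is extracted from the pipeline, here <6, 2.5>, and the workload is measured at the end of the frame, here 1.0e4. The signature table maps signatures to workloads and produces the workload prediction; at this point it holds only a default entry.] [Presenter notes: the workload we want to predict is the last box; the default prediction comes from static analysis; the success slide comes after this one.]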

Example. Signature: <vertex count, avg. area>. [Figure: same setup. The extracted signature is <6, 2.5> and the measured workload is 1.0e4. The signature table now holds the entry <6, 2.5> → 1.0e4, so the workload prediction is 1.0e4.] 1. More parameters implies fewer collisions. 2. An exact match is not necessary; distance measurements can be used. 3. An initial signature table can be included with the application.

Example. Signature: <vertex count, avg. area>. [Figure: same setup, but the triangles do not overlap, so all pixels are rendered. The extracted signature is again <6, 2.5>, yet the measured workload is 1.2e4. The signature table still holds <6, 2.5> → 1.0e4, so the workload prediction is 1.0e4.] 1. More parameters implies fewer collisions. 2. An exact match is not necessary; distance measurements can be used. 3. An initial signature table can be included with the application.
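
The three example slides above can be sketched as a small lookup table. This is only an illustration under stated assumptions: the talk says that distance measurements can be used but does not name a metric, so Euclidean distance, the `max_distance` threshold, the replace-on-update policy, the class name `SignatureTable`, and the default workload of 8.0e3 are all hypothetical choices.

```python
import math


class SignatureTable:
    """Map frame signatures to the workload last measured for them.

    Sketch only: the distance metric, acceptance threshold, and
    replace-on-update policy are assumptions, not details from the talk.
    """

    def __init__(self, default_workload, max_distance=0.0):
        self.default_workload = default_workload
        self.max_distance = max_distance
        self.entries = {}  # signature tuple -> last measured workload

    def predict(self, signature):
        # Find the stored signature closest to the query (Euclidean distance).
        best, best_dist = None, math.inf
        for stored in self.entries:
            dist = math.dist(stored, signature)
            if dist < best_dist:
                best, best_dist = stored, dist
        # Accept the match only if it is close enough; otherwise use the default.
        if best is not None and best_dist <= self.max_distance:
            return self.entries[best]
        return self.default_workload

    def update(self, signature, measured_workload):
        # At end of frame, store the measured workload for this signature.
        self.entries[tuple(signature)] = measured_workload


# Signature is <vertex count, avg. area>, as in the example slides.
table = SignatureTable(default_workload=8.0e3, max_distance=0.5)
first = table.predict((6, 2.5))    # empty table: falls back to the default
table.update((6, 2.5), 1.0e4)      # workload measured at end of frame
second = table.predict((6, 2.5))   # exact match on the stored signature
near = table.predict((6, 2.8))     # close match within max_distance
print(first, second, near)
```

A two-element signature cannot distinguish the overlapping and non-overlapping frames in the last slide, since both produce <6, 2.5>; that is the collision that adding more parameters is meant to reduce. In practice the signature elements would also need scaling before a distance is computed, because vertex counts and areas are in different units.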

Partitioning the 3D Pipeline. Original pipeline: application, then transform, lighting, clipping, scan-line conversion, and per-pixel operations (the geometry, setup, and render stages), then display. Partitioned pipeline: application, then transform + clipping (pre-buffer), then a buffer, then lighting, scan-line conversion, and per-pixel operations (post-buffer), then display. The pre-buffer portion is generally a small workload and provides the necessary signature elements. The post-buffer portion is the bulk of the 3D workload.

Pipeline Workload. Pre-buffer workload is less than 10% of the total workload, and its variation is small. Post-buffer workload is large with significant variation. [Figure: pre-buffer and post-buffer workload; red lines show the min-max range.]

Signature Composition. Can vary by application. May include: average triangle area, average triangle height, total vertex count, lit vertex count, number of lights, or any other measurable parameter. Larger signatures → more accurate. Smaller signatures → less time & space.
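
As a rough sketch of how some of the elements listed above could be computed from the triangles held in the buffer, the code below builds a four-element signature. The talk does not define the elements precisely, so several choices here are assumptions: triangles are taken to be screen-space (x, y) vertex triples, "height" is taken to be the vertical extent of a triangle, the vertex count is taken to be the number of distinct vertices, and the function names are hypothetical.

```python
def triangle_area(tri):
    """Area of a 2D triangle given as three (x, y) vertices."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2.0


def triangle_height(tri):
    """Vertical extent of the triangle (assumed meaning of 'height')."""
    ys = [y for _, y in tri]
    return max(ys) - min(ys)


def extract_signature(triangles):
    """Return <triangle count, avg. area, avg. height, vertex count>."""
    count = len(triangles)
    if count == 0:
        return (0, 0.0, 0.0, 0)
    avg_area = sum(triangle_area(t) for t in triangles) / count
    avg_height = sum(triangle_height(t) for t in triangles) / count
    # Count each distinct vertex once, even if triangles share it.
    vertex_count = len({v for t in triangles for v in t})
    return (count, avg_area, avg_height, vertex_count)


# Two triangles forming a 2 x 2 square (illustrative data).
square = [((0, 0), (2, 0), (0, 2)), ((2, 0), (2, 2), (0, 2))]
print(extract_signature(square))
```

Dropping elements from the end of the tuple gives the shorter signatures, which is the size-versus-accuracy trade-off stated on the slide.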

Outline. Introduction & Background; Signature-Based Workload Prediction; Experimental Framework; Experimental Results (evaluation framework, signature length vs. accuracy, frame rate, energy); Conclusions.

Architectural View. [Figure: components shown are an applications processor, a programmable 3D graphics engine (pre-buffer portion, signature extraction, buffer, post-buffer portion), a performance counter used to measure workload, a signature table, a programmable voltage regulator and programmable PLL supplying V and F, memory, the frame buffer, and the system-level communication architecture that connects them.] [Presenter notes: animate; voltage regulation (for example, we could…).]

Evaluation Framework. OpenGL/ES library: Vincent (Hans-Martin Will), instrumented with pipeline-stage triggers. 3D application: written against OpenGL/ES 1.0 and built with an ARM g++ cross compiler. Simulation: Simit-ARM (W. Qin), a fast, cycle-accurate simulator, produces triangle, instruction, and trigger traces. Trace simulator of the mobile 3D pipeline: combines the traces with the workload prediction scheme, an architecture model, and a processor energy model. Output: 3D pipeline performance and power.

Workload Accuracy. [Figure: average error (normalized) versus signature complexity, with annotations "> 2 fps error at peaks" and "peaks < 1 fps".] Signature sizes: <a> 2 bytes; <a,b> 6 bytes; <a,b,c> 10 bytes; <a,b,c,d> 14 bytes, where <a> is triangle count, <b> is avg. area, <c> is avg. height, and <d> is vertex count. Even simple signatures can give a good prediction (3 or 4 well-chosen elements), but the signature must be chosen well.

Frame Rate. [Figure: frame rate over time relative to the target.] High peaks result in wasted energy. Low valleys result in poor visual quality. [Presenter note: describe why peaks (above target) and valleys (below target) are both bad.]

Workload Prediction for DVFS. [Figure: before DVFS versus DVFS using signature-based workload prediction.] Result: 32% energy reduction. [Presenter note: this is one application of workload prediction.]
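
One way a predicted workload could drive DVFS is sketched below: given the predicted cycle count for the next frame and a target frame rate, pick the lowest available clock frequency that still finishes the frame on time. The talk does not describe its frequency-selection policy, so this policy, the function name `select_frequency`, and the three frequency levels are assumptions for illustration only.

```python
def select_frequency(predicted_cycles, target_fps, levels_hz):
    """Pick the lowest frequency that completes the frame within 1/target_fps.

    Hypothetical policy; the available levels are supplied by the caller.
    """
    # Cycles per frame times frames per second gives cycles per second.
    required_hz = predicted_cycles * target_fps
    for level in sorted(levels_hz):
        if level >= required_hz:
            return level
    # The workload exceeds what the fastest level can deliver; run flat out.
    return max(levels_hz)


# Illustrative frequency levels, not taken from the talk.
levels = [100e6, 200e6, 400e6]
print(select_frequency(1.0e7, 15, levels))  # needs 150 MHz
print(select_frequency(3.0e7, 15, levels))  # needs 450 MHz, above the top level
```

An over-prediction selects a level that is higher than needed and wastes energy, while an under-prediction selects one that is too low and misses the target frame rate, which is why prediction accuracy matters for this use.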

Outline. Introduction & Background; Signature-Based Workload Prediction; Experimental Framework; Experimental Results; Conclusions.

Conclusions. Accurate 3D workload prediction is critical for mobile platforms. The proposed signature-based method outperforms conventional history-based methods, allows accuracy to be traded for time & space, and can be used to meet real-time constraints and conserve energy.

Future Work. Automatic selection of signature elements; more sophisticated data structures for signature storage; faster comparison and replacement algorithms.

Questions?