

Efficient Run-Time Dispatching in Generic Programming with Minimal Code Bloat
Lubomir Bourdev, Advanced Technology Labs, Adobe Systems
Jaakko Järvi, Computer Science Department, Texas A&M University

Agenda
– Context & problem statement
– Background – previous approaches
– Our approach to code bloat reduction
– Code bloat reduction in run-time dispatch
– Results & conclusion


Context: Image Manipulation
Images vary in many different ways, so writing generic and efficient image processing algorithms is challenging.

Image Representations
[Figure: a 4x3 image in which the second pixel is highlighted, shown in interleaved form and in planar form.]
Representations vary in:
– planar vs. interleaved layout
– channel depth (8-bit, 16-bit, …)
– channel order (RGB vs. BGR)
– color space (RGB, CMYK, …)
– optional padding at the end of rows
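To make the layout dimensions concrete, here is a minimal sketch that computes where a given channel of a given pixel lives in memory, for interleaved and planar layouts with optional row padding. The `layout` struct and `channel_offset` function are hypothetical illustrations, not part of GIL; the sketch assumes one byte per channel and that in planar form every plane is padded the same way.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of an 8-bit image layout (not GIL's API).
struct layout {
    std::size_t width;        // pixels per row
    std::size_t height;       // number of rows
    std::size_t channels;     // e.g. 3 for RGB, 4 for CMYK
    std::size_t row_padding;  // extra bytes at the end of each row
    bool planar;              // true: one separate plane per channel
};

// Byte offset of channel c of the pixel at column x, row y.
// Assumes one byte per channel.
std::size_t channel_offset(const layout& lay, std::size_t x, std::size_t y, std::size_t c) {
    if (lay.planar) {
        // Planar: all values of one channel are stored together (RRRR...GGGG...BBBB...).
        std::size_t row_bytes = lay.width + lay.row_padding;
        std::size_t plane_bytes = row_bytes * lay.height;
        return c * plane_bytes + y * row_bytes + x;
    }
    // Interleaved: the channels of one pixel are adjacent (RGBRGB...).
    std::size_t row_bytes = lay.width * lay.channels + lay.row_padding;
    return y * row_bytes + x * lay.channels + c;
}
```

For the 4x3 RGB image of the slide with no padding, the second pixel (x = 1, y = 0) has its three channel values at offsets 3, 4, 5 in interleaved form and at 1, 13, 25 in planar form. Channel order (RGB vs. BGR) only changes which semantic channel a given `c` refers to.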

Generic Image Library (GIL)
– Adobe’s open source image library
– Abstracts image representations from algorithms on images
– Allows writing an algorithm once and having it work on images of any representation, without loss of performance

Problem Statement
How do we write image processing algorithms that are:
– Generic
– Efficient
– Compact
– Run-Time Flexible

Agenda (2/5): Background – previous approaches

Image algorithms via inheritance & polymorphism
    struct image { virtual void invert() = 0; };
    struct rgb_image : public image {
        virtual void invert() { for (size_t i = 0; i < img.size(); ++i) … }
    };
    struct cmyk_image : public image {
        virtual void invert() { for (size_t i = 0; i < img.size(); ++i) … }
    };
Generic XX  Efficient √  Compact √  Run-Time Flexible √
(Not generic: the algorithm must be rewritten for every image type.)

Image algorithms via inheritance & polymorphism
    struct pixel { virtual void invert() = 0; };
    struct rgb_pixel : public pixel { virtual void invert(); };
    struct gray_pixel : public pixel { virtual void invert(); };
    struct image {
        pixel* operator[](size_t i);
        size_t size() const;
    };
    void invert(image* img) {
        for (size_t i = 0; i < img->size(); ++i) (*img)[i]->invert();
    }
Generic X  Efficient X  Compact √  Run-Time Flexible √
Performance problem: a dynamic dispatch (virtual call) for every pixel.

Image Algorithms via Generic Programming
    struct rgb_pixel {…};
    struct gray_pixel {…};
    void invert_pixel(rgb_pixel&) {…}
    void invert_pixel(gray_pixel&) {…}
    template <typename Pixel>
    struct image {
        Pixel& operator[](size_t i);
        size_t size() const;
    };
    template <typename Image>
    void invert(Image& img) {
        for (size_t i = 0; i < img.size(); ++i) invert_pixel(img[i]);
    }
Generic √  Efficient √  Compact √  Run-Time Flexible X
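A minimal runnable version of the generic approach, assuming 8-bit channels and a `std::vector`-backed image; the member names (`r`, `g`, `b`, `v`, `pixels`) are illustrative, since the slide elides the class bodies. The `invert_pixel` overload is picked at compile time, so there is no per-pixel dynamic dispatch, but the pixel type must be known when the code is compiled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct rgb_pixel  { std::uint8_t r, g, b; };
struct gray_pixel { std::uint8_t v; };

// Overloads resolved at compile time: no virtual call per pixel.
inline void invert_pixel(rgb_pixel& p) {
    p.r = static_cast<std::uint8_t>(255 - p.r);
    p.g = static_cast<std::uint8_t>(255 - p.g);
    p.b = static_cast<std::uint8_t>(255 - p.b);
}
inline void invert_pixel(gray_pixel& p) { p.v = static_cast<std::uint8_t>(255 - p.v); }

template <typename Pixel>
struct image {
    std::vector<Pixel> pixels;
    Pixel& operator[](std::size_t i) { return pixels[i]; }
    std::size_t size() const { return pixels.size(); }
};

// Written once; instantiated separately for each image type it is used with.
template <typename Image>
void invert(Image& img) {
    for (std::size_t i = 0; i < img.size(); ++i) invert_pixel(img[i]);
}
```

For example, `image<rgb_pixel> im; im.pixels.push_back({10, 20, 30}); invert(im);` leaves `im[0]` as {245, 235, 225}. Each distinct `Image` yields its own copy of `invert`, which is where code bloat comes from once many image types are in play.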

Generic Code Lacks Flexibility
We need run-time flexibility:
    typedef boost::mpl::vector<…> images;
    gil::any_image<images> runtime_image;
    gil::jpeg_read_image(runtime_image, "test.jpg");
    invert(runtime_image);
How can we do that without loss of performance?
– Variant construct (see boost::variant)
– runtime_image holds:
    index: the index of the type of the image it currently holds
    bits: a buffer containing the currently instantiated image
– To invoke an algorithm, go through a switch statement and cast
– Efficient: dynamic dispatch is invoked only once per algorithm call, not once per pixel

Variant invocation
    void invert_image(void* bits, int index) {
        switch (index) {
            case kLAB: invert(*(image<…>*)(bits)); break;
            case kRGB: invert(*(image<…>*)(bits)); break;
        }
    }
Generic version:
    template <typename Op>
    void apply_operation(void* bits, int index, Op op) {
        switch (index) {
            case kLAB: op(*(image<…>*)(bits)); break;
            case kRGB: op(*(image<…>*)(bits)); break;
        }
    }
Generic √  Efficient √  Compact x  Run-Time Flexible √
(Not compact: every case instantiates the algorithm for a different image type, even when the generated code is identical.)
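A compilable sketch of the hand-rolled variant dispatch, with the `break` each case needs. The image types, the `image_kind` enum and the index-plus-pointer `any_image` are illustrative assumptions (a real `any_image` also owns, copies and destroys its storage). Note that `op` is instantiated once per `case`, which is exactly why this scheme is not compact.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical image types: interleaved 8-bit RGB and 8-bit grayscale.
struct rgb8_image  { std::vector<std::uint8_t> bytes; };  // 3 bytes per pixel
struct gray8_image { std::vector<std::uint8_t> bytes; };  // 1 byte per pixel

enum image_kind { kRGB = 0, kGRAY = 1 };

// Run-time image: an index naming the held type plus an untyped pointer to it.
struct any_image {
    int index;
    void* bits;
};

// One dynamic dispatch per algorithm call; each case instantiates Op for one type.
template <typename Op>
void apply_operation(void* bits, int index, Op op) {
    switch (index) {
        case kRGB:  op(*static_cast<rgb8_image*>(bits));  break;
        case kGRAY: op(*static_cast<gray8_image*>(bits)); break;
    }
}

struct invert_op {
    template <typename Image>
    void operator()(Image& img) const {
        for (std::size_t i = 0; i < img.bytes.size(); ++i)
            img.bytes[i] = static_cast<std::uint8_t>(255 - img.bytes[i]);
    }
};

inline void invert(any_image& img) { apply_operation(img.bits, img.index, invert_op()); }
```

Here `invert_op::operator()` is compiled twice, once per image type, even though both copies are the same loop over bytes.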

Solution: Template Hoisting
Define a class hierarchy:
    template <…> class k_channel_image {…};
    class rgb_image : public k_channel_image<…> {};
    class lab_image : public k_channel_image<…> {};
Define the algorithm at the appropriate level of the hierarchy:
    template <…> void invert(k_channel_image<…>&) {…}
Drawbacks:
– enforces a specific hierarchy
– different algorithms may need different hierarchies
– switch statement overhead remains
– does not help when the function is inlined
Generic x  Efficient √  Compact  Run-Time Flexible √
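A small runnable illustration of template hoisting, assuming 8-bit interleaved channels and taking the channel count as the base template's only parameter; the constructor and `bytes()` accessor are illustrative choices, since the slide elides the class bodies and template parameters. `invert` is written against the base template, so `rgb_image` and `lab_image` share the single instantiation `invert<3>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Base class parameterized only by what invert cares about: the channel count.
template <int K>
class k_channel_image {
public:
    explicit k_channel_image(std::size_t num_pixels) : bytes_(num_pixels * K) {}
    std::vector<std::uint8_t>& bytes() { return bytes_; }
private:
    std::vector<std::uint8_t> bytes_;  // K interleaved 8-bit channels per pixel
};

class rgb_image : public k_channel_image<3> {
public:
    using k_channel_image<3>::k_channel_image;
};
class lab_image : public k_channel_image<3> {
public:
    using k_channel_image<3>::k_channel_image;
};

// Deduction sees through the derived-to-base relationship: both rgb_image and
// lab_image call invert<3>, so only one copy of this code is generated.
template <int K>
void invert(k_channel_image<K>& img) {
    for (std::uint8_t& c : img.bytes()) c = static_cast<std::uint8_t>(255 - c);
}
```

The drawbacks listed on the slide show up directly: the hierarchy is fixed in advance, and an algorithm that depends on, say, color space rather than channel count would need a different base.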

Agenda (3/5): Our approach to code bloat reduction

Our method: Algorithm-centric approach to code bloat
– Define dimensions of variability of the type
– Specify, for a given algorithm, the set of dimensions that matter (example: copy_pixels(source_image, dst_image);)
– Reduce the type along the dimensions that don’t matter
    Image property        Source of “copy_pixels”
    Color space           Not important
    Channel type          Important
    Number of channels    Important
    Channel ordering      Important
    Mutability            Not important

Type Reduction
– Every algorithm partitions the space of its argument types into a set of equivalence classes
– Members of an equivalence class result in the same assembly when instantiated
– The algorithm is instantiated only with one representative from each equivalence class

Type Reduction Implementation
Metafunction to define the partition:
    template <typename Op, typename T>
    struct reduce { typedef T type; };
Generic algorithm invocation:
    template <typename T, typename Op>
    inline void apply_operation(const T& argument, Op op) {
        typedef typename reduce<Op, T>::type base_t;
        op(reinterpret_cast<const base_t&>(argument));
    }

Example: The invert algorithm
Define the algorithm as a function object:
    struct invert_op {
        template <typename Image> void operator()(Image&) {…}
    };
Provide a function overload to invoke it:
    template <typename Image>
    inline void invert(Image& image) { apply_operation(image, invert_op()); }
Inverting RGB and LAB images is assembly-level identical:
    template <> struct reduce<invert_op, …> { typedef rgb8_image_t type; };
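A runnable sketch of the `reduce` / `apply_operation` pair, with two stated deviations from the slides. First, `view` is a hypothetical non-owning 8-bit view whose color space is only a phantom tag, and the reduction is a partial specialization (any 3-channel view reduces to the RGB view) rather than a single LAB-to-RGB specialization. Second, instead of `reinterpret_cast`, the sketch rebinds the same buffer as the representative type, which avoids relying on aliasing between distinct types. The `instantiations()` registry exists only to make the effect observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <typeinfo>

// Hypothetical color-space tags (illustrative only).
struct rgb_t {}; struct lab_t {}; struct cmyk_t {};

// Minimal non-owning 8-bit interleaved view; ColorSpace does not affect layout.
template <typename ColorSpace, int NumChannels>
struct view {
    static constexpr int channels = NumChannels;
    std::uint8_t* data;
    std::size_t num_pixels;
};

// Primary template: every type is its own equivalence class.
template <typename Op, typename T> struct reduce { typedef T type; };

struct invert_op;

// For invert_op only the channel count matters: any 3-channel view reduces to RGB.
template <typename CS> struct reduce<invert_op, view<CS, 3> > { typedef view<rgb_t, 3> type; };

// Records which instantiation of invert_op::operator() actually runs.
std::set<std::string>& instantiations() { static std::set<std::string> s; return s; }

struct invert_op {
    template <typename View>
    void operator()(const View& v) const {
        instantiations().insert(typeid(View).name());
        for (std::size_t i = 0; i < v.num_pixels * View::channels; ++i)
            v.data[i] = static_cast<std::uint8_t>(255 - v.data[i]);
    }
};

template <typename T, typename Op>
inline void apply_operation(const T& argument, Op op) {
    typedef typename reduce<Op, T>::type base_t;
    // Rebind the same buffer as the representative type (the slides reinterpret_cast).
    base_t base = {argument.data, argument.num_pixels};
    op(base);
}

template <typename View>
inline void invert(const View& v) { apply_operation(v, invert_op()); }
```

Inverting an RGB view and a LAB view runs the same `invert_op::operator()` instantiation, so `instantiations()` holds one entry; a 4-channel CMYK view is outside the reduction and adds a second.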

The technique generalizes to multiple dimensions
    template <typename T1, typename T2, typename Op>
    void apply_operation(T1& arg1, T2& arg2, Op op) {
        typedef typename reduce<Op, T1>::type base1_t;
        typedef typename reduce<Op, T2>::type base2_t;
        typedef std::pair<base1_t*, base2_t*> pair_t;
        typedef typename reduce<Op, pair_t>::type base_pair_t;
        std::pair<T1*, T2*> p(&arg1, &arg2);
        op(reinterpret_cast<base_pair_t&>(p));
    }
    template <> struct reduce<…> {…};
    template <> struct reduce<copy_pixels_op, std::pair<…> > {…};
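A runnable sketch of binary reduction: reduce each argument, then reduce the pair. The tags, `view`, and `copy_op` are hypothetical; unlike the slide, the sketch pairs view values rather than pointers and rebinds buffers instead of using `reinterpret_cast`. It assumes that copying between two views with the same channel order is a plain byte copy, and that the only other pairs used are RGB to BGR and back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <type_traits>
#include <typeinfo>
#include <utility>

// Hypothetical color-space tags for 3-channel 8-bit interleaved views.
struct rgb_t {}; struct bgr_t {}; struct lab_t {};

// Minimal non-owning view; the color-space tag does not affect the memory layout.
template <typename CS>
struct view {
    std::uint8_t* data;
    std::size_t num_pixels;
};

template <typename Op, typename T> struct reduce { typedef T type; };

struct copy_op;

// Unary reduction: for a byte copy, LAB is laid out like RGB.
template <> struct reduce<copy_op, view<lab_t> > { typedef view<rgb_t> type; };

// Binary reduction: same channel order on both sides is a plain byte copy,
// so every such pair shares the (rgb, rgb) instantiation.
template <typename CS>
struct reduce<copy_op, std::pair<view<CS>, view<CS> > > {
    typedef std::pair<view<rgb_t>, view<rgb_t> > type;
};

// Records which (source, destination) instantiation actually runs.
std::set<std::string>& instantiations() { static std::set<std::string> s; return s; }

struct copy_op {
    template <typename Src, typename Dst>
    void operator()(const Src& src, const Dst& dst) const {
        instantiations().insert(std::string(typeid(Src).name()) + "->" + typeid(Dst).name());
        // After reduction, Src == Dst means "same channel order"; the only other
        // pairs in this sketch are RGB<->BGR, which reverse the three channels.
        const bool same = std::is_same<Src, Dst>::value;
        for (std::size_t i = 0; i < src.num_pixels; ++i)
            for (std::size_t c = 0; c < 3; ++c)
                dst.data[3 * i + c] = src.data[3 * i + (same ? c : 2 - c)];
    }
};

template <typename T1, typename T2, typename Op>
void apply_operation(const T1& arg1, const T2& arg2, Op op) {
    typedef typename reduce<Op, T1>::type base1_t;  // reduce each argument...
    typedef typename reduce<Op, T2>::type base2_t;
    typedef typename reduce<Op, std::pair<base1_t, base2_t> >::type base_pair_t;  // ...then the pair
    typename base_pair_t::first_type src = {arg1.data, arg1.num_pixels};
    typename base_pair_t::second_type dst = {arg2.data, arg2.num_pixels};
    op(src, dst);
}

template <typename Src, typename Dst>
void copy_pixels(const Src& src, const Dst& dst) { apply_operation(src, dst, copy_op()); }
```

LAB to LAB, BGR to BGR and RGB to RGB all run one instantiation; RGB to BGR needs a second one because the bytes really move differently.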

Defining Reduce Specializations
Reduce dimensions separately, then combine:
    template <…> struct reduce<…> {
        typedef typename reduce_cs<…>::type cs;
        typedef typename reduce_ch<…>::type channel;
        typedef typename image_type<…>::type type;
    };
Reuse structures via metafunction forwarding (one specialization inherits another's nested type):
    template <…> struct reduce<…> : public reduce<…> {};
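A compile-time sketch of per-dimension reduction and metafunction forwarding. The `image` template, the color-space tags, `copy_op` and the hypothetical `move_op` are illustrative assumptions; the sketch assumes that, for a byte copy, LAB and HSB behave like RGB and signed and unsigned 8-bit channels are interchangeable.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical tags and image template (illustrative, not GIL's types).
struct rgb_t {}; struct lab_t {}; struct hsb_t {};
template <typename ColorSpace, typename Channel> struct image {};

struct copy_op {};
struct move_op {};  // hypothetical op with the same requirements as copy_op

template <typename Op, typename T> struct reduce { typedef T type; };

// Per-dimension reductions; by default a dimension is kept as-is.
template <typename Op, typename CS> struct reduce_cs { typedef CS type; };
template <typename Op, typename Ch> struct reduce_ch { typedef Ch type; };

// For a byte copy, these 3-channel color spaces behave like RGB...
template <> struct reduce_cs<copy_op, lab_t> { typedef rgb_t type; };
template <> struct reduce_cs<copy_op, hsb_t> { typedef rgb_t type; };
// ...and signed and unsigned 8-bit channels are copied identically.
template <> struct reduce_ch<copy_op, std::int8_t> { typedef std::uint8_t type; };

// Rebuilds an image type from (reduced) dimensions.
template <typename CS, typename Ch> struct image_type { typedef image<CS, Ch> type; };

// Reduce each dimension separately, then combine.
template <typename CS, typename Ch>
struct reduce<copy_op, image<CS, Ch> > {
    typedef typename reduce_cs<copy_op, CS>::type cs;
    typedef typename reduce_ch<copy_op, Ch>::type channel;
    typedef typename image_type<cs, channel>::type type;
};

// Metafunction forwarding: move_op inherits copy_op's nested ::type.
template <typename CS, typename Ch>
struct reduce<move_op, image<CS, Ch> > : public reduce<copy_op, image<CS, Ch> > {};

static_assert(std::is_same<reduce<copy_op, image<lab_t, std::int8_t> >::type,
                           image<rgb_t, std::uint8_t> >::value,
              "LAB/int8 reduces to RGB/uint8 for copy_op");
```

Each dimension's rules are written once and combined, so adding a new color space only touches `reduce_cs`, and any operation with the same requirements can forward to an existing reduction.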

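Example: binary color space reduction
We identified eight such common color space equivalence classes.
[Figure: color space pairs built from the channel orderings ABGR, BGRA, ARGB and RGBA, and the representative pair they reduce to (not recoverable).]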
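One way to see why different color space pairs can share code, assuming (as "based on mapping" in the copy_pixels example suggests) that what matters for copying is the channel permutation from source to destination. The helpers below are hypothetical illustrations, not the authors' implementation: for each destination channel they compute which source channel it is read from.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// For each destination channel, the index of the source channel it copies from.
// Orderings are strings of channel letters, e.g. "ABGR" and "BGRA".
// Assumes dst is a permutation of src (same set of channels).
std::vector<std::size_t> channel_permutation(const std::string& src, const std::string& dst) {
    std::vector<std::size_t> perm;
    for (char ch : dst) perm.push_back(src.find(ch));
    return perm;
}

// Two (source, destination) pairs can share one copy instantiation
// when they move bytes identically.
bool same_copy_class(const std::string& s1, const std::string& d1,
                     const std::string& s2, const std::string& d2) {
    return channel_permutation(s1, d1) == channel_permutation(s2, d2);
}
```

Copying ABGR to BGRA and copying ARGB to RGBA both read source channels 1, 2, 3, 0, so they fall into one class. This sketch does not attempt to reproduce the eight classes the authors identified.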

Agenda (4/5): Code bloat reduction in run-time dispatch

Reduction in variants
Input: a variant of
    input_types: [rgb8_image, lab8_image, cmyk16_image, rgba16_image]
    input_index: 2
Step 1: Reduce each member of the vector:
    reduced_t: [rgb8_image, rgb8_image, rgba16_image, rgba16_image]
Step 2: Remove duplicates:
    output_types_t: [rgb8_image, rgba16_image]
Step 3: Create an index vector mapping reduced_t to output_types_t:
    indices_t: [0, 0, 1, 1]
Step 4: Use indices_t to map the input index to an output index:
    output_index = indices_t[input_index] = indices_t[2] = 1
Invoke the algorithm on a variant of:
    output_types_t: [rgb8_image, rgba16_image]
    output_index: 1
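The four steps can be modeled at run time to check the index bookkeeping. In this hypothetical sketch, type names stand in for types and a `std::map` stands in for the `reduce` metafunction; in the compile-time version Steps 1 to 3 operate on types, and only the Step 4 lookup depends on the run-time index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct reduced_variant {
    std::vector<std::string> output_types;  // output_types_t
    std::vector<std::size_t> indices;       // indices_t
    std::size_t output_index;
};

// Position of t in v (returns v.size() if absent).
std::size_t position_of(const std::vector<std::string>& v, const std::string& t) {
    return static_cast<std::size_t>(std::find(v.begin(), v.end(), t) - v.begin());
}

reduced_variant reduce_variant(const std::vector<std::string>& input_types,
                               std::size_t input_index,
                               const std::map<std::string, std::string>& reduce) {
    reduced_variant out;
    // Step 1: reduce each member; types without an entry are their own representative.
    std::vector<std::string> reduced;
    for (const std::string& t : input_types) {
        std::map<std::string, std::string>::const_iterator it = reduce.find(t);
        reduced.push_back(it == reduce.end() ? t : it->second);
    }
    // Step 2: remove duplicates, keeping first occurrences in order.
    for (const std::string& t : reduced)
        if (position_of(out.output_types, t) == out.output_types.size())
            out.output_types.push_back(t);
    // Step 3: index vector from reduced_t to output_types_t.
    for (const std::string& t : reduced)
        out.indices.push_back(position_of(out.output_types, t));
    // Step 4: map the run-time index.
    out.output_index = out.indices[input_index];
    return out;
}
```

With the slide's input (LAB reducing to RGB and CMYK16 to RGBA16), this yields `[rgb8_image, rgba16_image]`, indices `[0, 0, 1, 1]` and output index 1.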

Binary reduction in variants
Step 1: Perform unary pre-reduction on each argument:
    [A1, A2, A3, A4] with index 2 -> [A1, A3] with out_index1 = 1
    [B1, B2, B3] with index 3 -> [B1, B2] with out_index2 = 0
Step 2: Compute a vector of the cross-products of types (the first argument varies slowest):
    [(A1,B1), (A1,B2), (A3,B1), (A3,B2)]
Step 3: Apply unary reduction to it:
    output_types_t = [(A1,B1), (A1,B2), (A3,B2)]
Step 4: Compute the index into the cross-product vector, where Vec2 is the pre-reduced type vector of the second argument, then map it through the Step 3 reduction (as in Step 4 of the unary case) to get the output index:
    out_index = out_index1 * size(Vec2) + out_index2
Invoke the algorithm on a single variant of:
    output_types_t = [(A1,B1), (A1,B2), (A3,B2)]
    out_index
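A runtime model of Steps 2 and 4, again using type names as stand-ins for types (the helper names are hypothetical). It shows why a row-major cross product makes the stride the size of the second vector; Step 3 would then apply the unary reduction from the previous slide to this pair list and map the index through its index vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> type_pair;

// Step 2: all (a, b) combinations in row-major order; the first argument varies slowest.
std::vector<type_pair> cross_product(const std::vector<std::string>& vec1,
                                     const std::vector<std::string>& vec2) {
    std::vector<type_pair> out;
    for (const std::string& a : vec1)
        for (const std::string& b : vec2) out.push_back(type_pair(a, b));
    return out;
}

// Step 4: position of (vec1[i1], vec2[i2]) in that row-major list.
// The stride is the size of the second vector.
std::size_t pair_index(std::size_t i1, std::size_t i2, std::size_t size2) {
    return i1 * size2 + i2;
}
```

With the slide's example (out_index1 = 1, out_index2 = 0, two B types) the pair index is 2, i.e. (A3, B1). That pair is not in output_types_t, which is why the final index must go through the Step 3 mapping.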

Hypothetical Reduction Example: copy_pixels
Start with 3*9*2*2 = 108 image types:
– channel type (8 / 16 / 32 bit)
– color space (rgb, bgr, lab, hsb, rgba, argb, bgra, abgr, cmyk)
– planar / interleaved pixel ordering
– mutable / immutable type
Unary pre-reduction (3*6*2*1 = 36 equivalence classes):
– reduce lab and hsb to rgb, and cmyk to rgba
– mutable-only
Binary reduction:
– reduce color space pairs to 8 equivalence classes based on mapping
– reduce incompatible combinations
End result: 1 switch statement with 96 cases (down from 109 switch statements with 108² = 11664 cases!)
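The counts on this slide, tallied step by step. The 1 + 108 and 36² = 1296 figures are derived here from the slide's own numbers (an outer switch over the first argument plus one inner switch per case, and pairs of unary classes before binary reduction); 96 is the slide's reported result.

```latex
\begin{aligned}
\text{image types} &= 3 \cdot 9 \cdot 2 \cdot 2 = 108 \\
\text{switch statements without reduction} &= 1 + 108 = 109 \\
\text{cases without reduction} &= 108^2 = 11664 \\
\text{classes per argument after unary pre-reduction} &= 3 \cdot 6 \cdot 2 \cdot 1 = 36 \\
\text{pairs before binary reduction} &= 36^2 = 1296 \\
\text{cases after binary reduction} &= 96
\end{aligned}
```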

Agenda (5/5): Results & conclusion

Tests
Test sets:
– Set A: 90 types (10 color spaces, 3 channel types, other variations)
– Set B: 10 types (4 color spaces, other)
– Set C: 12 types (3 color spaces, planar/interleaved, step/nonstep)
Tests:
– Test 1: copy_pixels on Set B (inlined binary algorithm)
– Test 2: copy_pixels on Set C (inlined binary algorithm)
– Test 3: resample_pixels on Set B (non-inlined binary algorithm)
– Test 4: resample_pixels on Set C (non-inlined binary algorithm)
– Test 5: invert_pixels on Set A (inlined unary algorithm)

Results
Table: per-test code size for Visual Studio 8 and GCC 4.0, with columns No Reduce, Reduce and Percent reduction for each compiler (the values did not survive extraction).
Per-test percentages that survived extraction:
    Test     VS 8.0   GCC 4.0
    Test 1   106%     116%
    Test 2    78%      97%
    Test 3    87%     118%
    Test 4    75%     103%
    Test 5   194%     307%
Charts: “Reduction in code bloat” and “Effect on compile time” (VS 8.0, GCC 4.0).

Conclusion
Drawbacks:
– Unsafe
– Requires intimate knowledge of the types and the algorithm
– Some compilers can optimize most of the code bloat
Benefits:
– Works even when functions are inlined
– Simplifies code generated by variants (especially double dispatch)
– Does not impose a class hierarchy (essential for generic code!)
– Works when algorithms differ in requirements