CS448f: Image Processing For Photography and Vision Lecture 1.

Slides:



Advertisements
Similar presentations
Chapter 18 Vectors and Arrays
Advertisements

Taking CUDA to Ludicrous Speed Getting Righteous Performance from your GPU 1.
Working with Images and HTML
CS 11 C track: lecture 7 Last week: structs, typedef, linked lists This week: hash tables more on the C preprocessor extern const.
Garbage collection David Walker CS 320. Where are we? Last time: A survey of common garbage collection techniques –Manual memory management –Reference.
Complete Unified Device Architecture A Highly Scalable Parallel Programming Framework Submitted in partial fulfillment of the requirements for the Maryland.
Windows Movie Maker Introduction to Video Editing Mindy McAdams.
Chapter 18 Vectors and Arrays John Keyser’s Modification of Slides by Bjarne Stroustrup
CSCE643: Computer Vision Bayesian Tracking & Particle Filtering Jinxiang Chai Some slides from Stephen Roth.
Templates in C++. Generic Programming Programming/developing algorithms with the abstraction of types The uses of the abstract type define the necessary.
Review What is a virtual function? What can be achieved with virtual functions? How to define a pure virtual function? What is an abstract class? Can a.
1 ITCS 6/8010 CUDA Programming, UNC-Charlotte, B. Wilkinson, Jan 28, 2011 GPUMemories.ppt GPU Memories These notes will introduce: The basic memory hierarchy.
GPU programming: CUDA Acknowledgement: the lecture materials are based on the materials in NVIDIA teaching center CUDA course materials, including materials.
Using Sprites in HTML and CSS AWAR4S May ) Create a Sprite Image A sprite is a specially formatted image which can show variants of various sub-images.
Neal Stublen Transforms  Defined by the transform property Translate Rotate Scale Skew  Apply to any element without affecting.
CS 791v Fall # a simple makefile for building the sample program. # I use multiple versions of gcc, but cuda only supports # gcc 4.4 or lower. The.
Lec 5 Feb 10 Goals: analysis of algorithms (continued) O notation summation formulas maximum subsequence sum problem (Chapter 2) three algorithms image.
CS 240: Data Structures Thursday, July 12 th Lists, Templates, Vector, Algorithms.
CS 106 Introduction to Computer Science I 10 / 15 / 2007 Instructor: Michael Eckmann.
CS 106 Introduction to Computer Science I 10 / 16 / 2006 Instructor: Michael Eckmann.
Graphics and Multimedia In visual Studio. Net (C#)
Image Synthesis Rabie A. Ramadan, PhD 3. 2 Our Problem.
Introduction to CUDA (1 of 2) Patrick Cozzi University of Pennsylvania CIS Spring 2012.
Introduction to CUDA 1 of 2 Patrick Cozzi University of Pennsylvania CIS Fall 2012.
Image Synthesis Rabie A. Ramadan, PhD D Images.
CUDA Performance Considerations (1 of 2) Patrick Cozzi University of Pennsylvania CIS Spring 2012.
Addison Wesley is an imprint of © 2010 Pearson Addison-Wesley. All rights reserved. Chapter 7 The Game Loop and Animation Starting Out with Games & Graphics.
CS162 - Topic #11 Lecture: Recursion –Problem solving with recursion –Work through examples to get used to the recursive process Programming Project –Any.
Today’s lecture 2-Dimensional indexing Color Format Thread Synchronization within for- loops Shared Memory Tiling Review example programs Using Printf.
Finding Body Parts with Vector Processing Cynthia Bruyns Bryan Feldman CS 252.
CUDA Performance Considerations (1 of 2) Patrick Cozzi University of Pennsylvania CIS Spring 2011.
CS448f: Image Processing For Photography and Vision Lecture 2.
Fast Census Transform-based Stereo Algorithm using SSE2
Introduction to CUDA (1 of n*) Patrick Cozzi University of Pennsylvania CIS Spring 2011 * Where n is 2 or 3.
Machine Vision Introduction to Using Cognex DVT Intellect.
1 Computer Science 631 Multimedia Systems Prof. Ramin Zabih Computer Science Department CORNELL UNIVERSITY.
Computer Graphics 3 Lecture 1: Introduction to C/C++ Programming Benjamin Mora 1 University of Wales Swansea Pr. Min Chen Dr. Benjamin Mora.
CUDA All material not from online sources/textbook copyright © Travis Desell, 2012.
CS 106 Introduction to Computer Science I 03 / 02 / 2007 Instructor: Michael Eckmann.
1 Becoming More Effective with C++ … Day Two Stanley B. Lippman
CS/EE 217 GPU Architecture and Parallel Programming Midterm Review
UniMAP Sem2-10/11 DKT121: Fundamental of Computer Programming1 Arrays.
Copyright 2014 – Noah Mendelsohn Code Tuning Noah Mendelsohn Tufts University Web:
CS/EE 217: GPU Architecture and Parallel Programming Convolution, (with a side of Constant Memory and Caching) © David Kirk/NVIDIA and Wen-mei W. Hwu/University.
Introduction to CUDA 1 of 2 Patrick Cozzi University of Pennsylvania CIS Fall 2014.
Programming Fundamentals. Today’s Lecture Array Fundamentals Arrays as Class Member Data Arrays of Objects C-Strings The Standard C++ string Class.
An Introduction to the Cg Shading Language Marco Leon Brandeis University Computer Science Department.
HTML5 and CSS3 Illustrated Unit F: Inserting and Working with Images.
KUKUM-06/07 EKT120: Computer Programming 1 Week 6 Arrays-Part 1.
Chapter 15 - C++ As A "Better C"
CS/EE 217 – GPU Architecture and Parallel Programming
CS 140 Lecture Notes: Virtual Memory
Animation and Flexboxes
David Meredith Aalborg University
Introducing OpenShot Library
Lecture 2: Intro to the simd lifestyle and GPU internals
CS 140 Lecture Notes: Virtual Memory
Outline Image formats and basic operations Image representation
Computer Vision Lecture 5: Binary Image Processing
Introduction to cuBLAS
Mipmapped High Resolution Video for Immersive Virtual 3D Environments
Mean transform , a tutorial
CS/EE 217 – GPU Architecture and Parallel Programming
EKT150 : Computer Programming
Image Processing, Lecture #8
CS 140 Lecture Notes: Virtual Memory
Image Processing, Lecture #8
Convolution Layer Optimization
CS 140 Lecture Notes: Virtual Memory
6- General Purpose GPU Programming
Presentation transcript:

CS448f: Image Processing For Photography and Vision Lecture 1

Today: Introductions Course Format Course Flavor – Low-level basic image processing – High-level algorithms from Siggraph

Introductions and Course Format

Some Background Qs Here are some things I hope everyone is familiar with – Pointer arithmetic – C++ inheritance, virtual methods – Matrix vector multiplication – Variance, mean, median

Some Background Qs Here are some things which I think some people will have seen and some people won’t have: – Fourier Space stuff – Convolution – C++ using templates – Makefiles – Subversion (the version control system)

Background Make use of office hours – Jen and I enjoy explaining things.

What does image processing code look like? Fast Easy To Develop For General

Maximum Ease-of-Development Image im = load(“foo.jpg”); for (int x = 0; x < im.width; x++) { for (int y = 0; y < im.height; y++) { for (int c = 0; c < im.channels; c++) { im(x, y, c) *= 1.5; }

Maximum Speed v0 (Cache Coherency) Image im = load(“foo.jpg”); for (int y = 0; y < im.height; y++) { for (int x = 0; x < im.width; x++) { for (int c = 0; c < im.channels; c++) { im(x, y, c) *= 1.5; }

Maximum Speed v1 (Pointer Math) Image im = load(“foo.jpg”); for (float *imPtr = im->start(); imPtr != im->end(); imPtr++) { *imPtr *= 1.5; }

Maximum Speed v2 (SSE) Image im = load(“foo.jpg”); assert(im.width*im.height*im.channels % 4 == 0); __m128 scale = _mm_set_ps1(1.5); for (float *imPtr = im->start(); imPtr != im->end(); imPtr += 4) { _mm_mul_ps(*((__m128 *)imPtr), scale); }

Maximum Speed v3 (CUDA) (…a bunch of code to initialize the GPU…) Image im = load(“foo.jpg”); (…a bunch of code to copy the image to the GPU…) dim3 blockGrid((im.width-1)/8 + 1, (im.height-1)/8 + 1, 1); dim3 threadBlock(8, 8, 3); scale >>(im->start(), im.width(), im.height()); (…a bunch of code to copy the image back…)

Maximum Speed v3 (CUDA) __global__ scale(float *im, int width, int height, int channels) { const int x = blockIdx.x*8 + threadIdx.x; const int y = blockIdx.y*8 + threadIdx.y; const int c = threadIdx.z; if (x > width || y > height) return; im[(y*width + x)*channels + c] *= 1.5; } Clearly we should have stopped optimizing somewhere, probably before we reached this point.

Maximum Generality Image im = load(“foo.jpg”); for (int x = 0; x < im.width; x++) { for (int y = 0; y < im.height; y++) { for (int c = 0; c < im.channels; c++) { im(x, y, c) *= 1.5; }

Maximum Generality v0 What about video? Image im = load(“foo.avi”); for (int t = 0; t < im.frames; t++) { for (int x = 0; x < im.width; x++) { for (int y = 0; y < im.height; y++) { for (int c = 0; c < im.channels; c++) { im(t, x, y, c) *= 1.5; }

Maximum Generality v1 What about multi-view video? Image im = load(“foo.strangeformat”); for (int view = 0; view < im.views; view++) { for (int t = 0; t < im.frames; t++) { for (int x = 0; x < im.width; x++) { for (int y = 0; y < im.height; y++) { for (int c = 0; c < im.channels; c++){ … }

Maximum Generality v2 Arbitrary-dimensional data Image im = load(“foo.strangeformat”); for (Image::iterator iter = im.start(); iter != im.end(); iter++) { *iter *= 1.5; // you can query the position within // the image using the array iter.position // which is length 2 for a grayscale image // length 3 for a color image // length 4 for a color video, etc }

Maximum Generality v3 Lazy evaluation Image im = load(“foo.strangeformat”); // doesn’t actually do anything im = rotate(im, PI/2); for (Image::iterator iter = im.start(); iter != im.end(); iter++) { // samples the image at rotated locations *iter *= 1.5; }

Maximum Generality v4 Streaming Image im = load(“foo.reallybig”); // foo.reallybig is 1 terabyte of data // doesn’t actually do anything im = rotate(im, PI/2); for (Image::iterator iter = im.start(); iter != im.end(); iter++) { // the iterator class loads rotated chunks // of the image into RAM as necessary *iter *= 1.5; }

Maximum Generality v5 Arbitrary Pixel Data Type Image im = load(“foo.reallybig”); // foo.reallybig is 1 terabyte of data // doesn’t actually do anything im = rotate (im, PI/2); for (Image::iterator iter = im.start(); iter != im.end(); iter++) { // the iterator class loads rotated chunks // of the image into RAM as necessary *iter *= 3; }

Image im = load(“foo.reallybig”); // foo.reallybig is 1 terabyte of data // doesn’t actually do anything im = rotate (im, PI/2); for (Image::iterator iter = im.start(); iter != im.end(); iter++) { // the iterator class loads rotated chunks // of the image into RAM as necessary *iter *= 3; } Maximum Generality v4 Streaming WAY TOO COMPLICATED 99% OF THE TIME

Speed vs Generality Image im = load(“foo.reallybig”); // foo.reallybig is 1 terabyte of data // doesn’t actually do anything im = rotate(im, PI/2); for (Image::iterator iter = im.start(); iter != im.end(); iter++) { // the iterator class loads rotated chunks // of the image into RAM as necessary *iter *= 1.5; } Image im = load(“foo.jpg”); assert(im.width*im.height*im.channels % 4 == 0); __m128 scale = _mm_set_ps1(1.5); for (float *imPtr = im->start(); imPtr != im->end(); imPtr += 4) { _mm_mul_ps(*((__m128 *)imPtr), scale); } Incompatible

For this course: ImageStack Fast Easy To Develop For General

ImageStack Image im = Load::apply(“foo.jpg”); for (int t = 0; t < im.frames; t++) { for (int y = 0; y < im.height; y++) { for (int x = 0; x < im.width; x++) { for (int c = 0; c < im.channels; c++) { im(t, x, y)[c] *= 1.5; }

ImageStack Concessions to Generality Image im = Load::apply(“foo.jpg”); for (int t = 0; t < im.frames; t++) { for (int y = 0; y < im.height; y++) { for (int x = 0; x < im.width; x++) { for (int c = 0; c < im.channels; c++) { im(t, x, y)[c] *= 1.5; } Four dimensions is usually enough

ImageStack Concessions to Generality Image im = Load::apply(“foo.jpg”); for (int t = 0; t < im.frames; t++) { for (int y = 0; y < im.height; y++) { for (int x = 0; x < im.width; x++) { for (int c = 0; c < im.channels; c++) { im(t, x, y)[c] *= 1.5; } Floats are general enough

ImageStack Concessions to Generality Image im = Load::apply(“foo.jpg”); Window left(im, 0, 0, 0, im.frames, im.width/2, im.height); for (int t = 0; t < left.frames; t++) for (int y = 0; y < left.height; y++) for (int x = 0; x < left.width; x++) for (int c = 0; c < left.channels; c++) left(t, x, y)[c] *= 1.5; Cropping can be done lazily, if you just want to process a sub-volume.

ImageStack Concessions to Speed Image im = Load::apply(“foo.jpg”); Window left(im, 0, 0, 0, im.frames, im.width/2, im.height); for (int t = 0; t < left.frames; t++) for (int y = 0; y < left.height; y++) for (int x = 0; x < left.width; x++) for (int c = 0; c < left.channels; c++) left(t, x, y)[c] *= 1.5; Cache-Coherency

ImageStack Concessions to Speed Image im = Load::apply(“foo.jpg”); Window left(im, 0, 0, 0, im.frames, im.width/2, im.height); for (int t = 0; t < left.frames; t++) for (int y = 0; y < left.height; y++) { float *scanline = left(t, 0, y); for (int x = 0; x < left.width; x++) for (int c = 0; c < left.channels; c++) (*scanline++) *= 1.5; } Each scanline guaranteed to be consecutive in memory, so pointer math is OK

Image im = Load::apply(“foo.jpg”); Window left(im, 0, 0, 0, im.frames, im.width/2, im.height); for (int t = 0; t < left.frames; t++) for (int y = 0; y < left.height; y++) for (int x = 0; x < left.width; x++) for (int c = 0; c < left.channels; c++) left(t, x, y)[c] *= 1.5; ImageStack Concessions to Speed This operator is defined in a header, so is inlined and fast, but can’t be virtual (rules out streaming, lazy evaluation, other magic).

Image im = Load::apply(“foo.jpg”); im = Rotate::apply(im, M_PI/2); im = Scale::apply(im, 1.5); Save::apply(im, “out.jpg”); ImageStack Ease of Development Each image operation is a class (not a function)

ImageStack Ease of Development Images are reference-counted pointer classes. You can pass them around efficiently and don’t need to worry about deleting them. Image im = Load::apply(“foo.jpg”); im = Rotate::apply(im, M_PI/2); im = Scale::apply(im, 1.5); Save::apply(im, “out.jpg”);

The following videos were then shown: Edge-Preserving Decompositions: – Animating Pictures: – Seam Carving: – – Face Beautification: –