GPU-ACCELERATED VIDEO ENCODING/DECODING CIS 565 Spring 2012 Philip Peng, 2012/03/19.

Slides:

Advertisements

Similar presentations

Finger Gesture Recognition through Sweep Sensor Pong C Yuen 1, W W Zou 1, S B Zhang 1, Kelvin K F Wong 2 and Hoson H S Lam 2 1 Department of Computer Science.

Advertisements

Video Mosaics CS 294/283 Final Project Steve Martin U.C. Berkeley, Fall 2003.

GPU-Accelerated Beat Detection for Dancing Monkeys Philip Peng, Yanjie Feng UPenn CIS 565 Spring 2012 Final Project – Midpoint Presentation img src:

GPU-Accelerated Beat Detection for Dancing Monkeys Philip Peng, Yanjie Feng UPenn CIS 565 Spring 2012 Final Project – Final Presentation img src:

Data Compression CS 147 Minh Nguyen.

© 2008 Wayne Wolf Overheads for Computers as Components 2 nd ed. Accelerators zExample: video accelerator.

MPEG4 Natural Video Coding Functionalities: –Coding of arbitrary shaped objects –Efficient compression of video and images over wide range of bit rates.

© Fastvideo, Key Points We implemented the fastest JPEG codec Many applications using JPEG can benefit from our codec.

SWE 423: Multimedia Systems

2009/04/07 Yun-Yang Ma.  Overview  What is CUDA ◦ Architecture ◦ Programming Model ◦ Memory Model  H.264 Motion Estimation on CUDA ◦ Method ◦ Experimental.

New Sorting-Based Lossless Motion Estimation Algorithms and a Partial Distortion Elimination Performance Analysis Bartolomeo Montrucchio and Davide Quaglia.

1 Single Reference Frame Multiple Current Macroblocks Scheme for Multiple Reference IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY Tung-Chien.

Video Compression Bee Fong. Lossy Compression  Inter Frame Compression Compression among frames Compression among frames  Intra Frame Compression Compression.

Computing motion between images

Lecture06 Video Compression. Spatial Vs. Temporal Redundancy Image compression techniques exploit spatial redundancy, the phenomenon that picture contents.

Image (and Video) Coding and Processing Lecture: Motion Compensation Wade Trappe Most of these slides are borrowed from Min Wu and KJR Liu of UMD.

1 An Efficient Mode Decision Algorithm for H.264/AVC Encoding Optimization IEEE TRANSACTION ON MULTIMEDIA Hanli Wang, Student Member, IEEE, Sam Kwong,

A Performance and Energy Comparison of FPGAs, GPUs, and Multicores for Sliding-Window Applications From J. Fowers, G. Brown, P. Cooke, and G. Stitt, University.

Volar Video a video encoding project. Our Customer Volar Video in downtown lex Specialize in live event streaming Architect the software to make that.

Adaptive Video Coding to Reduce Energy on General Purpose Processors Daniel Grobe Sachs, Sarita Adve, Douglas L. Jones University of Illinois at Urbana-Champaign.

Department of Electrical Engineering National Cheng Kung University

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit

JPEG C OMPRESSION A LGORITHM I N CUDA Group Members: Pranit Patel Manisha Tatikonda Jeff Wong Jarek Marczewski Date: April 14, 2009.

Predictive Runtime Code Scheduling for Heterogeneous Architectures 1.

Intel® IPP. Fighting for the performance Intel® IPP. Fighting for the performance Novosibirsk, 2008 Boris Sabanin Novosibirsk, 2008 Boris Sabanin.

Computer Graphics Graphics Hardware

Performance Enhancement of Video Compression Algorithms using SIMD Valia, Shamik Jamkar, Saket.

Adaptive Multi-path Prediction for Error Resilient H.264 Coding Xiaosong Zhou, C.-C. Jay Kuo University of Southern California Multimedia Signal Processing.

GPU Architecture and Programming

Parallel Algorithms Patrick Cozzi University of Pennsylvania CIS Spring 2012.

Compression video overview 演講者：林崇元. Outline Introduction Fundamentals of video compression Picture type Signal quality measure Video encoder and decoder.

Aug 25, 2005 page1 Aug 25, 2005 Integration of Advanced Video/Speech Codecs into AccessGrid National Center for High Performance Computing Speaker: Barz.

MOTION ESTIMATION IMPLEMENTATION IN RECONFIGURABLE PLATFORMS

JPEG-GPU: A GPGPU IMPLEMENTATION OF JPEG CORE CODING SYSTEMS Ang Li University of Wisconsin-Madison.

Compression of Real-Time Cardiac MRI Video Sequences EE 368B Final Project December 8, 2000 Neal K. Bangerter and Julie C. Sabataitis.

IIIT Hyderabad Scalable Clustering using Multiple GPUs K Wasif Mohiuddin P J Narayanan Center for Visual Information Technology International Institute.

-BY KUSHAL KUNIGAL UNDER GUIDANCE OF DR. K.R.RAO. SPRING 2011, ELECTRICAL ENGINEERING DEPARTMENT, UNIVERSITY OF TEXAS AT ARLINGTON FPGA Implementation.

Adam Wagner Kevin Forbes. Motivation  Take advantage of GPU architecture for highly parallel data-intensive application  Enhance image segmentation.

QCAdesigner – CUDA HPPS project

Vamsi Krishna Vegunta University of Texas, Arlington

UNDER THE GUIDANCE DR. K. R. RAO SUBMITTED BY SHAHEER AHMED ID : Encoding H.264 by Thread Level Parallelism.

CDVS on mobile GPUs MPEG 112 Warsaw, July Our Challenge CDVS on mobile GPUs  Compute CDVS descriptor from a stream video continuously  Make.

Effect of Saturation Arithmetic on Sum of Absolute Difference (SAD) Computation in H.264 Venkata Suman Sanikommu ECE 734 Project Presentation.

CUDA Basics. Overview What is CUDA? Data Parallelism Host-Device model Thread execution Matrix-multiplication.

Igor Jánoš. Goal of This Project Decode and process a full-HD video clip using only software resources Dimension – 1920 x 1080 pixels.

COMPARATIVE STUDY OF HEVC and H.264 INTRA FRAME CODING AND JPEG2000 BY Under the Guidance of Harshdeep Brahmasury Jain Dr. K. R. RAO ID MS Electrical.

Canny Edge Detection Using an NVIDIA GPU and CUDA Alex Wade CAP6938 Final Project.

Motion Estimation Multimedia Systems and Standards S2 IF Telkom University.

SIMD Implementation of Discrete Wavelet Transform Jake Adriaens Diana Palsetia.

Hierarchical Systolic Array Design for Full-Search Block Matching Motion Estimation Noam Gur Arie,August 2005.

Fast disparity motion estimation in MVC based on range prediction Xiao Zhong Xu, Yun He ICIP 2008.

Multi-Frame Motion Estimation and Mode Decision in H.264 Codec Shauli Rozen Amit Yedidia Supervised by Dr. Shlomo Greenberg Communication Systems Engineering.

CMPT365 Multimedia Systems 1 Media Compression - Video Spring 2015 CMPT 365 Multimedia Systems.

Computer Graphics Graphics Hardware

GPU Architecture and Its Application

NFV Compute Acceleration APIs and Evaluation

JPEG Compression What is JPEG? Motivation

GPUs: Not Just for Graphics Anymore

Last update on June 15, 2010 Doug Young Suh

Data Compression.

Video-in-Video Insertion into a Pre-encoded Bit-stream

Highly Efficient and Flexible Video Encoder on CPU+FPGA Platform

Faster File matching using GPGPU’s Deephan Mohan Professor: Dr

Sum of Absolute Differences Hardware Accelerator

An enhanced estimation: motion and rotation estimation

MPEG4 Natural Video Coding

Computer Graphics Graphics Hardware

Overview of Secure Video Applications

Lecture 2 The Art of Concurrency

Patrick Cozzi University of Pennsylvania CIS Spring 2011

Presentation transcript:

GPU-ACCELERATED VIDEO ENCODING/DECODING CIS 565 Spring 2012 Philip Peng, 2012/03/19

Overview 1. Motivation 2. NVIDIA GPU 3. Compression Techniques 4. NPP & Performance 5. NVIDIA API Demo

MOTIVATION Why use GPU?

Motivation Why video encoding/decoding? Large part of consumer activities Watch YouTube videos everywhere Encode HD videos to iPhone/etc. Img src:

Motivation Why use GPU? Portable devices have limited processing power Very computationally intensive Faster + smaller = better Img src:

NVIDIA GPU Whats currently out there?

NVIDIA GPU Video Encoding Facilities SW H.264 codec designed for CUDA Supports baseline, main, high profiles Interfaces: C library (NVCUVENC) Direct Show API Win 7 MFT (multimedia framework)

NVIDIA GPU Video Decoding Facilities HW GPU acceleration H.264, VC1, MPEG2 SW MPEG2 decoder designed for CUDA Interfaces: C library (NVCUVIV) – HW & SW DXVA & Win7 MFT – HW only VDPAU library – HW only (Linux)

COMPRESSION TECHNIQUES How is it done?

GPU Algorithm Primitives Fundamental building blocks for Parallel Programming Reduce: sum of absolute differences Scan: integral image calculations Compact: index creation CUDA C SDK, CUDPP, Thrust implementations Img src:

Technique: Intra Prediction Img src:

Technique: Intra Prediction Img src:

Technique: Intra Prediction Video processing technique via content prediction Uses data within single video frame Divides frame into blocks Prediction rules Constant: each pixel constant value Vertical: each row uses top predictor Horizontal: each column uses left predictor Plane: gradient diagonal prediction Img src:

Technique: Intra Prediction For each block, calculate SAD between predicted pixels and actual Choose predictor with lowest prediction error Data compression = only store predictors and prediction rules Img src:

Technique: Motion Estimation Img src:

Technique: Motion Estimation Img src:

Technique: Motion Estimation Img src:

Technique: Motion Estimation Img src:

Technique: Motion Estimation Img src: Many different search algorithms for calculating motion vectors: Full search: brute force feature comparison Diamond search: template- based iterative matching 2D/3D-wave: diagonal block- based checking Combination done in parallel on GPU

NPP & PERFORMANCE GPU vs CPU

NVIDIA Performance Primitives What is it? NPP = C library of GPU-accelerated functions/primitives designed for CUDA API same as IPP (Intel Integrated Performance Primitives) No GPU architecture knowledge required! 5-10x faster performance vs CPU-only implementations

NVIDIA Performance Primitives Code Img src:

NVIDIA Performance Primitives Tests Data scalability: Img src:

NVIDIA Performance Primitives Tests Number of cores scalability: Img src:

NVIDIA Performance Primitives Tests Overall vs CPU Img src:

Encoding Performance Img src:

Decoding Performance Img src:

NVIDIA API DEMO See the numbers for yourself!

NVIDIA CUDA Video Encode API

NVIDIA CUDA Video Decoder GL API