OpenCL С. OpenCL и CUDA CUDAOpenCL Kernel Host program ThreadWork item BlockWork group GridNDRange WarpWavefront.

Slides:



Advertisements
Similar presentations
CUDA More on Blocks/Threads. 2 Debugging Using the Device Emulation Mode An executable compiled in device emulation mode ( nvcc -deviceemu ) runs completely.
Advertisements

COP 2800 Lake Sumter State College Mark Wilson, Instructor.
Functions. COMP104 Functions / Slide 2 Introduction to Functions * A complex problem is often easier to solve by dividing it into several smaller parts,
Programming for Artists ART 315 Dr. J. R. Parker Art/Digital Media Lab Lec 13 Fall 2010.
Return values.
Fraction class class fraction { int num, den; fraction (int a, int b){ int common = gcd(a,b); num=a/common; den=b/common; } fraction (int a){ num=a; den=1;
Pemrograman Dasar - Data Types1 OPERATOR. Pemrograman Dasar - Data Types2 Arithmetic operator  + - * /  / operator denotes integer division if both.
More on threads, shared memory, synchronization
Evaluating Sine & Cosine and and Tangent (Section 7.4)
Язык JavaScript Скриптовый язык для выполнения на html-страницах.
Concepts when Python retrieves a variable’s value Namespaces – Namespaces store information about identifiers and the values to which they are bound –
FORTRAN.  Fortran or FORmula TRANslation was developed in the 1950's by IBM as an alternative to Assembly Language. First successfull high level language.
Find the period of the function y = 4 sin x
Introduction to Matlab. Entering Commands Constants and Functions >> pi ans = >> eps ans = e-016 >> sin(pi/2) ans = 1 >> log(1000) ans =
EGR 105 Foundations of Engineering I Session 3 Excel – Basics through Graphing Fall 2008.
1 Python Chapter 2 © Samuel Marateck, After you install the compiler, an icon labeled IDLE (Python GUI) will appear on the screen. If you click.
GPU Programming EPCC The University of Edinburgh.
10.4A Polar Equations Rectangular: P (x, y) Polar: P (r,  )  r = radius (distance from origin)   = angle (radians)
1 TAC2000/ Protocol Engineering and Application Research Laboratory (PEARL) MATH Functions in C Language.
Computer Science 111 Fundamentals of Programming I Basic Program Elements.
Modules and Decomposition UW CSE 190p Summer 2012 download examples from the calendar.
13.2 – Define General Angles and Use Radian Measure.
4.2 Day 1 Trigonometric Functions on the Unit Circle Pg. 472 # 6-10 evens, evens, 46, 54, 56, 60 For each question (except the 0 o, 90 o, 180 o,
TOP 10 Missed Mid-Unit Quiz Questions. Use the given function values and trigonometric identities to find the indicated trig functions. Cot and Cos 1.Csc.
Warm-Up Find the following. 1.) sin 30 ◦ 2.) cos 270 ◦ 3.) cos 135 ◦
© David Kirk/NVIDIA and Wen-mei W. Hwu Taiwan, June 30-July 2, Taiwan 2008 CUDA Course Programming Massively Parallel Processors: the CUDA experience.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE498AL, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors CUDA Threads.
© David Kirk/NVIDIA and Wen-mei W. Hwu Urbana, Illinois, August 10-14, VSCSE Summer School 2009 Many-core processors for Science and Engineering.
1 ECE 8823A GPU Architectures Module 3: CUDA Execution Model © David Kirk/NVIDIA and Wen-mei Hwu, ECE408/CS483/ECE498al, University of Illinois,
1 Math Expressions and Operators. 2 Some C++ Operators Precedence OperatorDescription Higher ( )Function call +Positive - Negative *Multiplication / Division.
Today’s lecture 2-Dimensional indexing Color Format Thread Synchronization within for- loops Shared Memory Tiling Review example programs Using Printf.
CSC 107 – Programming For Science. Announcements  Lectures may not cover all material from book  Material that is most difficult or challenging is focus.
CS/EE 217 GPU Architecture and Parallel Programming Lecture 3: Kernel-Based Data Parallel Execution Model © David Kirk/NVIDIA and Wen-mei Hwu,
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE498AL, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Lecture.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE498AL, University of Illinois, Urbana-Champaign 1 ECE498AL Lecture 3: A Simple Example, Tools, and.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE498AL, University of Illinois, Urbana-Champaign 1 ECE498AL Lecture 4: CUDA Threads – Part 2.
CUDA Parallel Execution Model with Fermi Updates © David Kirk/NVIDIA and Wen-mei Hwu, ECE408/CS483/ECE498al, University of Illinois, Urbana-Champaign.
ECE408/CS483 Applied Parallel Programming Lecture 4: Kernel-Based Data Parallel Execution Model © David Kirk/NVIDIA and Wen-mei Hwu, ECE408/CS483/ECE498al,
OpenCL Programming James Perry EPCC The University of Edinburgh.
Evaluating Inverse Trigonometric Functions
Introduction As programmers, we don’t want to have to implement functions for every possible task we encounter. The Standard C library contains functions.
CSCE 121: Introduction to Program Design and Concepts, Honors J. Michael Moore Spring 2015 Set 14: Plotting Functions and Data 1.
White Master Replace with a graphic 5.5” Tall & 4.3” Wide ® Copyright 2009 Adobe Systems Incorporated. All rights reserved. Adobe confidential.1 Pixel.
Trigonometry Exact Value Memory Quiz A Trigonometry Exact Value Memory Quiz A.
OpenCL Joseph Kider University of Pennsylvania CIS Fall 2011.
Parallel Programming Basics  Things we need to consider:  Control  Synchronization  Communication  Parallel programming languages offer different.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE498AL, University of Illinois, Urbana-Champaign 1 CUDA Threads.
ECE408/CS483 Applied Parallel Programming Lecture 4: Kernel-Based Data Parallel Execution Model © David Kirk/NVIDIA and Wen-mei Hwu, , SSL 2014.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Lecture.
Sudeshna Sarkar, IIT Kharagpur 1 I/O in C + Misc Lecture –
© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007 ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Lecture 4: The CUDA Memory Model (Cont.)
Heterogeneous Computing with OpenCL Dr. Sergey Axyonov.
Programming with CUDA WS 08/09 Lecture 2 Tue, 28 Oct, 2008.
Mathematical Manipulation Data types Operator precedence Standard mathematical functions Worked examples.
CS 179: GPU Programming Lecture 7. Week 3 Goals: – More involved GPU-accelerable algorithms Relevant hardware quirks – CUDA libraries.
CSE 110: Programming Language I Afroza Sultana UB 1230.
OpenCL. Sources Patrick Cozzi Spring 2011 NVIDIA CUDA Programming Guide CUDA by Example Programming Massively Parallel Processors.
Lecture 15 Introduction to OpenCL
Chapter 6 - Functions modular programming general function format
Patrick Cozzi University of Pennsylvania CIS Spring 2011
CUDA and OpenCL Kernels
GPU Programming using OpenCL
CS 179: GPU Programming Lecture 7.
CUDA Parallelism Model
© David Kirk/NVIDIA and Wen-mei W. Hwu,
Find the numerical value of the expression. sinh ( ln 6 )
ECE498AL Spring 2010 Lecture 4: CUDA Threads – Part 2
Terminal-Based Programs
© David Kirk/NVIDIA and Wen-mei W. Hwu,
© David Kirk/NVIDIA and Wen-mei W. Hwu,
Presentation transcript:

OpenCL С

OpenCL и CUDA CUDAOpenCL Kernel Host program ThreadWork item BlockWork group GridNDRange WarpWavefront

OpenCL и CUDA CUDAOpenCL threadIdx.xget_local_id(0) blockIdx.x*blockDim.x + threadIdx.x get_global_id(0) gridDim.xget_num_groups(0) blockIdx.xget_group_id(0) blockDim.xget_local_size(0) gridDim.x * blockDim.xget_global_size(0)

OpenCL и CUDA Модель исполнения CUDA:

get_ local_ size(1) OpenCL и CUDA OpenCL: Index Space get_global_size(0) get_ global_ size(1) Work Group (0, 0) Work Group (1, 0) Work Group (2, 0) Work Group (0, 1) Work Group (1, 1) Work Group (2, 1) get_local_size(0) Work Item (0, 0) Work Group (0,0) Work Item (1, 0) Work Item (2, 0) Work Item (3, 0) Work Item (4, 0) Work Item (0, 1) Work Item (1, 1) Work Item (2, 1) Work Item (3, 1) Work Item (4, 1) Work Item (0, 2) Work Item (1, 2) Work Item (2, 2) Work Item (3, 2) Work Item (4, 2)

OpenCL и CUDA Модель памяти CUDA

OpenCL и CUDA OpenCL:

OpenCL и CUDA CUDAOpenCL Global memory Constant memory Shared memoryLocal memory Private memory

OpenCL и CUDA CUDAOpenCL __syncthreads()__barrier()

Image from:

OpenCL и CUDA Kernel – функции в CUDA: __global__ void sum(float *a, float *b, float *c) { int i = blockIdx.x*blockDim.x + threadIdx.x; c[i] = a[i] + b[i]; }

OpenCL и CUDA OpenCL: __kernel void vecAdd(__global float *a, __global float *b, __global float *c) { int i = get_global_id(0); c[i] = a[i] + b[i]; }

Slide from:

Еще несколько примеров float4 c; c.xyzw = (float4)(1.0f, 2.0f, 3.0f, 4.0f); float4 pos = (float4)(1.0f, 2.0f, 3.0f, 4.0f); float4 swiz= pos.wzyx; float4 dup = pos.xxyy; pos.xw = (float2)(5.0f, 6.0f); float4 a, b, c, d; float16 x; x = (float16)(a.xxxx, b.xyz, c.xyz, d.xyz, a.yzw);

Математические функции sin, cos, tan, sinh, cosh, tanh, обратные к перечисленным (вначале добавить букву a) gentype sincos (gentype x, gentype *cosval) – вычисляет сразу оба atan2(y,x) = arctg(y/x), atanpi = arctg/pi и др arc sqrt – квадратный корень, cbrt – кубический корень, rootn(x,y) = x^(1/y) erf(z) – интеграл ошибок exp, exp2,exp10 – e^x, 2^x, 10^x, expm1 = (e^x)-1, log, log2, log10 ceil, floor, round – округление fabs, fmax(x,y), fmin(x,y) – поэлементно! hypot(x,y) – sqrt(x^2+y^2) mad(a,b,c) = a*b+с, pow(x,y), pown(x,n) remainder(x,y) – остаток от деления x на y clamp (x, minval, maxval) = min(max(x, minval), maxval) degrees (radians) radians (degrees) gentype step (gentype edge, gentype x) – функция Хевисайда smoothstep (edge0, edge1, x) – сглаженный Хевисайд gentype sign (gentype x)

Константы M_E_F = e M_LOG2E_F = log2e M_LOG10E_F = log10e M_LN2_F = loge2 M_LN10_F = loge10 M_PI_F = π M_PI_2_F = π / 2 M_PI_4_F = π / 4 M_1_PI_F = 1 / π M_2_PI_F = 2 / π M_2_SQRTPI_F = 2 / √π M_SQRT2_F = √2 M_SQRT1_2_F = 1 / √2

Геометрические функции float3 cross (float3 p0, float3 p1) - векторное произведение float dot (floatn p0, floatn p1) – скалярное произведение float distance (floatn p0, floatn p1) – расстояние float length (floatn p) – норма (евклидова) floatn normalize (floatn p) - нормирование вектора

Функции сравнения intn isequal (floatn x, floatn y) – возвращает вектор 0 – false и -1 – true intn isnotequal (floatn x, floatn y) intn isgreater (floatn x, floatn y) intn isgreaterequal (floatn x, floatn y) intn isless (floatn x, floatn y) intn islessequal (floatn x, floatn y) int any (igentype x) – проверяет самый старший бит у каждого элемента вектора (если элемент = 1 - > это false) int all (igentype x)