NIRICT RECONNAISSANCE TOPIC: PERFORMANCE AND CORRECTNESS OF GPGPU
Marieke Huisman, Alexandru Iosup, Ana Lucia Varbanescu, Anton Wijs


GOAL OF THE DAY
- Get to know each other: quick introduction round
- Get inspired
- Discuss what opportunities we see
- Plans and organisation for the reconnaissance topic

3/12/15 Verification of OpenCL kernels

10:00 opening & welcome - goal of the day and the reconnaissance theme - Marieke Huisman
10:05 Marieke Huisman - verification of OpenCL programs
10:20 Alexandru Iosup
10:40 Andrei Jalba - GPU-accelerated simulations for Computer Graphics
11:00 Jan Kuper - GPU-related work in the CAES group
11:20 Kees Vuik
11:40 Gert-Jan van den Braak
12:00 lunch
14:00 Ana Lucia Varbanescu
14:20 Anton Wijs - GPU Accelerated Verification and Verified Model-Based Development of Multi-Threaded Software
14:40 coffee break
15: :00 discussion

DISCUSSION TOPICS
- What possibilities do we see for collaboration?
- What are interesting topics that you would like to know more about?
- What are the most important open problems in the area of performance and correctness of GPGPU?
- How often should we meet?
- Should we invite other people?
- How should the meetings be organised?

VERIFICATION OF OPENCL PROGRAMS
Marieke Huisman, University of Twente, Formal Methods and Tools group

OVERVIEW
- Basics for program verification
  - Hoare logic
  - Separation logic
  - Permissions
- Permission-based separation logic for OpenCL
- Verification of compiler directives
- Future challenges

HOARE TRIPLES
- {P} S {Q}
- Due to Tony Hoare (1969)
- Meaning: if P holds in initial state s, and execution of S in s terminates in state s', then Q holds in s'

Seq:
$$\frac{\{P\}\ S_1\ \{Q\} \qquad \{Q\}\ S_2\ \{R\}}{\{P\}\ S_1; S_2\ \{R\}}$$

If:
$$\frac{\{P \land b\}\ S_1\ \{Q\} \qquad \{P \land \lnot b\}\ S_2\ \{Q\}}{\{P\}\ \mathtt{if}\ (b)\ S_1\ \mathtt{else}\ S_2\ \{Q\}}$$
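As a rough illustration of how the Seq and If rules are read, the sketch below writes the assertions of two small triples as runtime `assert` calls in C. The functions `abs_checked` and `seq_checked` are hypothetical names that do not come from the slides, and a runtime assertion only checks the runs that are actually executed, whereas a Hoare-logic proof covers all of them.

```c
#include <assert.h>
#include <limits.h>

/* If rule. Triple: {x > INT_MIN} body {r >= 0 && (r == x || r == -x)}.
   The precondition excludes INT_MIN so that -x cannot overflow. */
int abs_checked(int x)
{
    int r;
    assert(x > INT_MIN);                     /* precondition P */
    if (x < 0) {
        r = -x;                              /* premise {P && b} S1 {Q} */
    } else {
        r = x;                               /* premise {P && !b} S2 {Q} */
    }
    assert(r >= 0 && (r == x || r == -x));   /* postcondition Q */
    return r;
}

/* Seq rule. {x >= 0} y = x + 1; {y > 0} and {y > 0} z = 2 * y; {z > 1}
   compose to {x >= 0} y = x + 1; z = 2 * y; {z > 1}.
   Assumes x is small enough that the arithmetic does not overflow. */
int seq_checked(int x)
{
    assert(x >= 0);      /* P */
    int y = x + 1;
    assert(y > 0);       /* Q, the intermediate assertion shared by both triples */
    int z = 2 * y;
    assert(z > 1);       /* R */
    return z;
}
```

The intermediate assertion in `seq_checked` plays the role of Q in the Seq rule: it is the postcondition of the first statement and the precondition of the second.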

INTUITIONISTIC SEPARATION LOGIC
Developed for reasoning about programs with pointers.

Syntax extension of predicate logic:
    φ ::= e.f ↦ e' | φ ★ φ | ...
where e is an expression and f a field.

Meaning:
- e.f ↦ e': the heap contains the location pointed to by e.f, containing the value given by the meaning of e'
- φ1 ★ φ2: the heap can be split into disjoint parts, satisfying φ1 and φ2, respectively
- Monotone w.r.t. extensions of the heap

Two interpretations of e.f ↦ v:
- Field e.f contains value v
- Permission to access field e.f

Consequences:
- Disjointness of the heap allows reasoning about (absence of) aliasing
- Local reasoning

JOHN REYNOLDS'S 70TH BIRTHDAY PRESENT

$$\frac{\{P_1\}\ S_1\ \{Q_1\} \quad \cdots \quad \{P_n\}\ S_n\ \{Q_n\}}{\{P_1 \star \cdots \star P_n\}\ S_1 \parallel \cdots \parallel S_n\ \{Q_1 \star \cdots \star Q_n\}}$$

where no variable free in P_i or Q_i is changed in S_j (if i ≠ j)

PERMISSIONS (John Boyland)
- Permission to access a variable is a fraction between 0 and 1
- Full permission 1 allows changing the variable
- A fractional permission in (0, 1) allows inspecting the variable
- Permissions can be split and combined:
    Perm(x, 1) ★−★ Perm(x, ½) ★ Perm(x, ½)
- Global invariant: for each variable, the sum of all the permissions in the system is never more than 1
- Thus: simultaneous reads are allowed, but no read-write or write-write conflicts (data races)
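The splitting and combining of permissions can be sketched with a small rational-number model in C. Everything here (`perm_t`, `perm_full`, `perm_split`, `perm_combine`, and the read/write predicates) is a hypothetical illustration and not the API of any verification tool; overflow of the numerators and denominators is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a permission is the fraction num/den with 0 <= num <= den. */
typedef struct {
    unsigned num;
    unsigned den;
} perm_t;

/* Full permission 1. */
perm_t perm_full(void)
{
    perm_t p = {1u, 1u};
    return p;
}

/* Any non-zero fraction allows reading; only the full permission allows writing. */
bool perm_can_read(perm_t p)  { return p.num > 0u; }
bool perm_can_write(perm_t p) { return p.num > 0u && p.num == p.den; }

/* Split p into two equal halves: Perm(x, p) becomes Perm(x, p/2) and Perm(x, p/2). */
void perm_split(perm_t p, perm_t *left, perm_t *right)
{
    left->num = p.num;
    left->den = 2u * p.den;
    *right = *left;
}

/* Combine two permissions on the same variable. */
perm_t perm_combine(perm_t a, perm_t b)
{
    perm_t sum = { a.num * b.den + b.num * a.den, a.den * b.den };
    assert(sum.num <= sum.den);  /* global invariant: total permission is at most 1 */
    return sum;
}
```

Splitting the full permission yields two halves that each allow reading but not writing; combining them restores write access, which is the two-way exchange stated on the slide.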

VERCORS (VERIFICATION OF CONCURRENT PROGRAMS)
- Basis for reasoning: permission-based separation logic
  - Java-like programs (thread creation, thread joining, reentrant locks)
  - OpenCL-like programs
- Permissions:
  - Write permission: exclusive access
  - Read permission: shared access
  - Read and write permissions can be exchanged
- Permission specifications combined with functional properties

A LOGIC FOR GPU KERNELS
- Kernel specification
  - All permissions that a kernel needs for its execution
  - Separated into permissions for
    - Global memory: given up by host code
    - Shared memory: local to the GPU
- Thread specification
  - Permissions needed by a single thread
  - Should be a subset of the kernel permissions
- Barrier specification
  - Each barrier allows redistribution of permissions

Actually: a group specification sits in between the kernel and thread specifications.
Plus: functional specifications (pre- and postconditions).

22/06/2015 Verification of Concurrent Software

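EXAMPLE KERNEL

    /* wg_size is presumably the work-group size; the slide does not show its definition. */
    __kernel void bla(__global float* input, __global float* output)
    {
        int i = get_global_id(0);
        /* Before the barrier: each thread writes its own entry. */
        output[i] = input[i] * input[i];
        barrier(CLK_GLOBAL_MEM_FENCE);
        /* After the barrier: each thread updates its neighbour's entry. */
        output[(i+1) % wg_size] = output[(i+1) % wg_size] * input[i];
    }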
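To see what the kernel computes, the two phases around the barrier can be modelled as two sequential loops in plain C. This is only a sketch: `bla_model` is a hypothetical name, it assumes a single work group of `n` threads (so `n` stands in for `wg_size`), and running the threads one after another is just one of the schedules a GPU could choose.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential model of the example kernel for one work group of n threads. */
void bla_model(const float *input, float *output, size_t n)
{
    assert(n > 0);  /* the index (i + 1) % n needs a non-empty work group */

    /* Phase 1 (before the barrier): thread i writes output[i]. */
    for (size_t i = 0; i < n; i++)
        output[i] = input[i] * input[i];

    /* The barrier sits here: every phase-1 write is complete and visible. */

    /* Phase 2 (after the barrier): thread i updates its neighbour's entry. */
    for (size_t i = 0; i < n; i++) {
        size_t j = (i + 1) % n;
        output[j] = output[j] * input[i];
    }
}
```

After the call, each entry `output[(i+1) % n]` holds `input[i] * input[(i+1) % n]^2`. In phase 2 every thread touches a different entry, so the result does not depend on the order in which the loop visits the threads.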

EXAMPLE SPECIFICATION

Kernel specification (resources provided by the host):
  Global memory resources:
  - Write permission on all entries of output
  - Read permission on all entries of input
  Shared memory resources: -

Thread specification:
  Resources:
  - WritePerm(output[tid])
  - ReadPerm(input[tid])
  Precondition: -
  Postcondition: output[(tid + 1) % wg_size] = input[tid] * input[(tid + 1) % wg_size]^2

Global proof obligation: all threads together use no more resources than are available in the kernel.

EXAMPLE BARRIER SPECIFICATION

Barrier specification:
  Resources:
  - Exchange write permission on output[tid] for write permission on output[(tid + 1) % wg_size]
  - Keep read permission on input[tid]
  - Obtain read permission on input[(tid + 1) % wg_size]
  Precondition: output[tid] = input[tid] * input[tid]
  Postcondition: output[(tid + 1) % wg_size] = input[(tid + 1) % wg_size]^2

Global proof obligations:
- All permissions are available in the kernel
- Barriers correctly transfer knowledge about state
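The redistribution at the barrier is only sound if, both before and after it, no two threads hold write permission on the same entry. The sketch below checks that property for one concrete thread count by brute force. All names (`index_fn`, `before_barrier`, `after_barrier`, `clashing`, `write_perms_disjoint`) are hypothetical, and a verifier would establish this symbolically for every thread count rather than by enumeration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Maps a thread id to the index of the output entry it may write. */
typedef size_t (*index_fn)(size_t tid, size_t n);

/* Before the barrier, thread tid holds write permission on output[tid]. */
size_t before_barrier(size_t tid, size_t n) { (void)n; return tid; }

/* After the barrier, it holds write permission on output[(tid + 1) % n]. */
size_t after_barrier(size_t tid, size_t n) { return (tid + 1) % n; }

/* A deliberately wrong distribution: every thread claims output[0]. */
size_t clashing(size_t tid, size_t n) { (void)tid; (void)n; return 0; }

/* Returns true if no two of the n threads claim the same in-range entry. */
bool write_perms_disjoint(index_fn f, size_t n)
{
    assert(n > 0);
    bool *claimed = calloc(n, sizeof *claimed);
    if (claimed == NULL)
        return false;               /* treat allocation failure as "not shown" */
    bool ok = true;
    for (size_t tid = 0; tid < n && ok; tid++) {
        size_t k = f(tid, n);
        if (k >= n || claimed[k])
            ok = false;             /* out of range, or two writers on one entry */
        else
            claimed[k] = true;
    }
    free(claimed);
    return ok;
}
```

Both `before_barrier` and `after_barrier` are permutations of the indices, so the check succeeds for them and fails for `clashing`.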

VERIFICATION OF COMPILER DIRECTIVES

GPU programming: high-level sequential programs compiled with a parallelising compiler.

    // independent
    for (int i = 0; i < N; i++)
    /* requires Perm(a[i],1) ★ Perm(b[i],1/2);
       ensures  Perm(a[i],1) ★ ... */
    { a[i] = 2 * b[i]; }

- Iteration contract: resources needed and returned by each iteration
- Compilation to kernel: the iteration contract becomes the thread specification
- Can be extended with functional properties
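One informal way to read the iteration contract is that each iteration touches only `a[i]` and `b[i]`, so the iterations can run in any order. The sketch below compares two orders in plain C; `scale_forward` and `scale_backward` are hypothetical names, and the reverse order merely stands in for an arbitrary parallel schedule.

```c
#include <assert.h>
#include <stddef.h>

/* Iterations in increasing order, as in the source loop. */
void scale_forward(int *a, const int *b, size_t n)
{
    /* Full permission on a[i] plus a half on b[i] cannot refer to one location;
       only the base pointers are compared here. */
    assert(a != b);
    for (size_t i = 0; i < n; i++)
        a[i] = 2 * b[i];
}

/* The same iterations in decreasing order. */
void scale_backward(int *a, const int *b, size_t n)
{
    assert(a != b);
    for (size_t i = n; i-- > 0; )
        a[i] = 2 * b[i];
}
```

Both functions leave the same contents in `a`, which is what the `independent` directive claims for every schedule.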

LOOP WITH FORWARD DEPENDENCE

    for (int i = 0; i < N; i++) {
        a[i] = b[i] + 1;
        if (i > 0) c[i] = a[i-1] + 2;
    }

Iteration contract:

    /* requires Perm(a[i],1) ★ Perm(b[i],1/2) ★ Perm(c[i],1);
       ensures  Perm(b[i],1/2) ★ Perm(a[i],1/2) ★ Perm(c[i],1);
       ensures  i>0 ==> Perm(a[i-1],1/2);
       ensures  i==N-1 ==> ... */

Send/receive:

    L1: if (i < N-1) send Perm(a[i],1/2) to L2,1;
    L2: if (i > 0) recv Perm(a[i-1],1/2) from L1,1;

- Transfers permissions to a different iteration
- Needed for verification of this iteration contract

Compilation to kernel: the iteration contract becomes the thread specification.

CORRESPONDING SPECIFIED KERNEL

    kernel Ref {
      global int[tcount] a, b, c;

      requires Perm(a[tid],1) ★ Perm(b[tid],1/2) ★ Perm(c[tid],1);
      ensures  Perm(a[tid],1/2) ★ Perm(b[tid],1/2) ★ Perm(c[tid],1);
      ensures  tid>0 ==> Perm(a[tid-1],1/2);
      ensures  tid==tcount-1 ==> Perm(a[tid],1/2);
      void main() {
        a[tid] = b[tid] + 1;
        barrier(global) {
          requires tid [...] Perm(a[tid],1/2);
          ensures  tid>0 ==> Perm(a[tid-1],1/2);
        }
        if (tid > 0) c[tid] = a[tid-1] + 2;
      }
    }

Permission transfer between iterations becomes the barrier specification. The annotated source loop:

    for (int i = 0; i < N; i++)
    /* requires Perm(a[i],1) ★ Perm(b[i],1/2) ★ Perm(c[i],1);
       ensures  Perm(b[i],1/2) ★ Perm(a[i],1/2) ★ Perm(c[i],1);
       ensures  i>0 ==> Perm(a[i-1],1/2);
       ensures  i==N-1 ==> ... */
    {
      a[i] = b[i] + 1;
      L1: if (i < N-1) send Perm(a[i],1/2) to L2,1;
      L2: if (i > 0) recv Perm(a[i-1],1/2) from L1,1;
      if (i > 0) c[i] = a[i-1] + 2;
    }
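The correspondence between the loop and the kernel can be sketched in plain C by running both on the same input. `loop_sequential` and `kernel_model` are hypothetical names; the kernel is modelled as two sequential passes separated by the barrier, which is an assumption about scheduling and not how a GPU actually executes it.

```c
#include <assert.h>
#include <stddef.h>

/* The original loop: iteration i reads a[i-1], written by iteration i-1. */
void loop_sequential(int *a, const int *b, int *c, size_t n)
{
    assert(a != b && a != c);   /* the contract treats a, b and c as separate arrays */
    for (size_t i = 0; i < n; i++) {
        a[i] = b[i] + 1;
        if (i > 0)
            c[i] = a[i - 1] + 2;
    }
}

/* Model of the kernel: one loop pass per side of the barrier. */
void kernel_model(int *a, const int *b, int *c, size_t tcount)
{
    assert(a != b && a != c);

    /* Before the barrier: thread tid holds full permission on a[tid]. */
    for (size_t tid = 0; tid < tcount; tid++)
        a[tid] = b[tid] + 1;

    /* barrier(global): roughly, thread tid gives up half of its permission on
       a[tid], and thread tid > 0 obtains a half permission on a[tid-1]. */

    /* After the barrier: thread tid > 0 reads its neighbour's entry. */
    for (size_t tid = 0; tid < tcount; tid++)
        if (tid > 0)
            c[tid] = a[tid - 1] + 2;
}
```

Both functions produce the same `a` and `c`. The barrier is what makes the read of `a[tid-1]` safe: by the time any thread reads it, the neighbouring thread has finished writing and holds only a read share of it.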

CHALLENGES
- Scaling of the approach
- Annotation generation
- Ongoing work: OpenMP compiler directives
- Correctness of compiler optimisations and other program transformations