Parallel Image Processing: Programming and Architecture
IST PhD Lunch Seminar, October 26, 2006
Wouter Caarls, Quantitative Imaging Group

Why parallel?
- Processing time: smaller timesteps, more scales, faster response times
- Memory: larger images, more dimensions
- Energy consumption: more applications, smaller devices

Data parallelism
- Many image processing operations have locality of reference (segmentation, filtering, distance transforms, etc.) → data parallelism
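
For instance, a local filter reads only a small neighbourhood around each output pixel, so rows can be processed independently. A minimal sketch using OpenMP (illustrative only, not the skeleton library discussed later; the image layout and names are assumptions):

    #include <cstddef>
    #include <vector>

    // 3x1 horizontal average: a local, data-parallel operation. Each output
    // row depends only on nearby input pixels, so rows can be distributed
    // over cores without synchronization.
    void box_filter(const std::vector<float>& in, std::vector<float>& out,
                    std::size_t w, std::size_t h)
    {
        #pragma omp parallel for
        for (long y = 0; y < (long)h; ++y)
            for (std::size_t x = 1; x + 1 < w; ++x)
                out[y*w + x] = (in[y*w + x - 1] + in[y*w + x]
                                + in[y*w + x + 1]) / 3.0f;
    }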

Task farm parallelism
- An application consists of many different operations
- Some of these operations are independent (scale spaces, parameter sweeps, noise realizations, etc.) → task farm parallelism
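
A scale space is the classic case: each scale is computed independently, so the calls can be farmed out to a pool of workers. A sketch using C++ futures (the Image type and gaussian() are stand-ins, not the author's API):

    #include <future>
    #include <vector>

    struct Image { /* pixel data omitted */ };

    // Stand-in for a real Gaussian filter.
    Image gaussian(const Image& img, float sigma) { return img; }

    // The tasks share no state, so they can run concurrently as a task farm.
    std::vector<Image> scale_space(const Image& img,
                                   const std::vector<float>& sigmas)
    {
        std::vector<std::future<Image>> jobs;
        for (float s : sigmas)
            jobs.emplace_back(std::async(std::launch::async,
                                         [&img, s] { return gaussian(img, s); }));
        std::vector<Image> out;
        for (auto& j : jobs)
            out.push_back(j.get());
        return out;
    }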

Pipeline parallelism
- An image processing algorithm consists of consecutive stages
- If multiple objects are to be processed, they may be in different stages at the same time → pipeline parallelism
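
One way to overlap consecutive stages is to connect them with a queue: while stage two handles object i, stage one already works on object i+1. A minimal two-stage sketch (the Channel class is an illustration, not part of any framework):

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <thread>

    // Single-producer/single-consumer channel between two pipeline stages.
    template <typename T>
    class Channel {
        std::queue<T> q;
        std::mutex m;
        std::condition_variable cv;
    public:
        void put(T v) {
            { std::lock_guard<std::mutex> l(m); q.push(std::move(v)); }
            cv.notify_one();
        }
        T get() {
            std::unique_lock<std::mutex> l(m);
            cv.wait(l, [&] { return !q.empty(); });
            T v = std::move(q.front()); q.pop();
            return v;
        }
    };

    int main() {
        Channel<int> ch;                 // ints stand in for images
        std::thread stage1([&] {         // e.g. capture + filter
            for (int frame = 0; frame < 10; ++frame) ch.put(frame);
        });
        std::thread stage2([&] {         // e.g. display; overlaps with stage1
            for (int i = 0; i < 10; ++i) (void)ch.get();
        });
        stage1.join(); stage2.join();
    }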

Parallel hardware architectures: fine grained
- Irregular
  - Superscalar (most modern microprocessors)
  - VLIW (DSPs)
- Regular
  - Vector (supercomputers, MMX)
  - SIMD (graphics processors)
- Custom
  - FPGA

Parallel hardware architectures: coarse grained
- Homogeneous
  - Multi-core, SMP
  - Cluster
- Heterogeneous
  - Embedded systems
  - Grid

Obstacles
- Programming
  - Synchronization, bookkeeping
  - Different systems, languages, optimization strategies
- Choosing an architecture
  - Requires analyzing the program before it is written
  - Additional requirements or unexpected performance may require a rewrite

Architecture-independent parallel programming
- Data parallelism
  - Differentiate between the synchronization pattern and the computation
  - The library provides the pattern, the user provides the computation
- Task farm & pipeline parallelism
  - Operations do not work on images, but on streams
  - Sequences of operation calls do not imply an order, but a stream graph
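
The pattern/computation split can be pictured as a higher-order function: the skeleton owns the traversal and the (potential) parallel decomposition, the user supplies only the per-pixel computation. A sketch of the idea (pixel_skeleton and threshold are illustrative names, not the actual library API):

    #include <cstddef>

    // The "pattern": the skeleton decides how the image is traversed and how
    // the work may be split over processors; callers never write that code.
    template <typename F>
    void pixel_skeleton(const float* in, float* out, std::size_t n, F op)
    {
        // Sequential backend shown; a SIMD, multicore or DSP backend could
        // be substituted here without changing user code.
        for (std::size_t i = 0; i < n; ++i)
            out[i] = op(in[i]);
    }

    // The "computation": supplied by the user, free of parallelism concerns.
    float threshold(float v) { return v > 0.5f ? 1.0f : 0.0f; }

    // Usage: pixel_skeleton(src, dst, width * height, threshold);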

Algorithmic skeletons
[figure]

Example skeletons
- Pixel
- Neighbourhood
- Recursive neighbourhood
- Stack filter
- Associative reduction
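
As an example of the last pattern: an associative reduction lets the skeleton combine partial results in any order, so the image can be split across processors. A sketch with illustrative names:

    #include <cstddef>

    // Because op is associative, chunks can be reduced independently and the
    // partial results merged afterwards; the skeleton is free to parallelize.
    template <typename T, typename Op>
    T reduce_skeleton(const T* data, std::size_t n, T init, Op op)
    {
        T acc = init;                    // sequential backend shown here
        for (std::size_t i = 0; i < n; ++i)
            acc = op(acc, data[i]);
        return acc;
    }

    // Usage: maximum gray value of an image:
    //   float m = reduce_skeleton(img, w * h, 0.0f,
    //                             [](float a, float b) { return a > b ? a : b; });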

Constructing stream graphs
By program (dynamic):

    capture(orig);
    normalize(orig, norm);
    dx(orig, x_der, 1.0);
    dy(orig, y_der, 1.0);
    direction(x_der, y_der, dir);
    display(dir);

Visually (static): [figure: the same operations drawn as a stream graph, with capture feeding normalize, dx and dy, and dx and dy feeding direction, which feeds display]
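
The dynamic form works because each call only records a node and its data dependencies; the execution order is derived from the resulting graph, not from the call sequence. A toy sketch of such deferred construction (not the actual framework):

    #include <string>
    #include <vector>

    struct Node { std::string op; std::vector<int> inputs; };
    std::vector<Node> graph;   // global for brevity

    // "Calling" an operation appends a node and returns a handle to its
    // output; nothing is computed yet.
    int call(const std::string& op, std::vector<int> inputs = {})
    {
        graph.push_back({op, std::move(inputs)});
        return (int)graph.size() - 1;
    }

    int main()
    {
        int orig  = call("capture");
        int norm  = call("normalize", {orig});
        int x_der = call("dx", {orig});
        int y_der = call("dy", {orig});
        int dir   = call("direction", {x_der, y_der});
        call("display", {dir});
        (void)norm;
        // A scheduler would now map this graph onto the available processors.
    }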

Mapping stream graphs to processors
[figure: the stream graph partitioned over processor 1 and processor 2]

Dealing with heterogeneous tasks
[figure: processor 1 and processor 2]

Dealing with interconnect
[figure: processor 1, processor 2, and the interconnect between them]

Dealing with dependencies
[figure: processor 1, processor 2 and the interconnect, annotated with task and transfer timings]

Choosing an architecture automatically
- An architecture-independent program allows automatic analysis after it is written, but before an architecture is chosen
- Based on certain constraints, an architecture can be chosen automatically to optimize some cost function
- The tradeoff between cost, power and performance must still be made by the designer

Design Space Exploration
[figure: a loop in which the program and a candidate architecture are analyzed to produce metrics, which drive the exploration of the next candidate architecture]
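
The loop in the figure can be written down directly: analyze the program on each candidate architecture, score it, and keep the best. A heavily simplified sketch (the Architecture type, the toy runtime model, and the scoring are all assumptions):

    #include <vector>

    struct Architecture { int cores; double clock_ghz; double price; };

    // Hypothetical analysis: predicted runtime of the program on arch a.
    double predicted_runtime(const Architecture& a)
    {
        return 1.0 / (a.cores * a.clock_ghz);   // toy model
    }

    // Constrained single-objective search (cf. the first strategy below):
    // the cheapest architecture that still meets the performance constraint.
    const Architecture* explore(const std::vector<Architecture>& candidates,
                                double max_runtime)
    {
        const Architecture* best = nullptr;
        for (const auto& a : candidates)
            if (predicted_runtime(a) <= max_runtime &&
                (!best || a.price < best->price))
                best = &a;
        return best;   // nullptr if no candidate meets the constraint
    }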

Search strategy: constrained single objective
[figure: cost versus performance, minimizing cost subject to a minimum performance constraint]

Search strategy: multiobjective tradeoff
[figure: cost versus performance, iterating towards the tradeoff curve]

Search strategy: strength Pareto
[figure: cost versus performance, approximating the Pareto front]
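
Pareto-based strategies keep every candidate that is not dominated, i.e. for which no other candidate is at least as good in both objectives and strictly better in one. A sketch of the dominance test and the resulting front (a simplification: strength-Pareto methods such as SPEA additionally assign strength-based fitness values):

    #include <vector>

    struct Candidate { double cost; double runtime; };  // lower is better

    // a dominates b if it is no worse in both objectives and better in one.
    bool dominates(const Candidate& a, const Candidate& b)
    {
        return a.cost <= b.cost && a.runtime <= b.runtime &&
               (a.cost < b.cost || a.runtime < b.runtime);
    }

    // Keep the non-dominated set: the current approximation of the front.
    std::vector<Candidate> pareto_front(const std::vector<Candidate>& pop)
    {
        std::vector<Candidate> front;
        for (const auto& c : pop) {
            bool dominated = false;
            for (const auto& d : pop)
                if (dominates(d, c)) { dominated = true; break; }
            if (!dominated)
                front.push_back(c);
        }
        return front;
    }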

Conclusions
Architecture-independent programming allows:
- Parallel programming without bookkeeping
- Targeting heterogeneous systems
- Choosing the most appropriate architecture automatically

Overview
- Parallelism in image processing
- Parallel hardware architectures
- Architecture-independent parallel programming
  - Algorithmic skeletons
  - Stream programming
- Choosing an appropriate architecture
  - Design Space Exploration

Exploiting parallelism: fine grained, irregular
- Superscalar
  - Dataflow dispatch & reorder
  - Most modern microprocessors
  - Automatic, by the processor
- Very Long Instruction Word (VLIW)
  - Multiple instructions per word
  - DSPs, Itanium
  - “Automatic”, by the compiler
[figure: instructions dispatched to multiple execution units]

Exploiting parallelism: fine grained, regular
- Vector instructions
  - Supercomputers, MMX/SSEx
  - Special instructions/datatypes
- Single Instruction Multiple Data (SIMD)
  - Graphics processors
  - Special languages
[figure: one instruction operating on multiple data elements]
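
On PCs this is exposed as compiler intrinsics: a single instruction processes four packed floats. A sketch using SSE (x86 only; assumes SSE is available and that n is a multiple of 4):

    #include <xmmintrin.h>   // SSE intrinsics

    // Add two float arrays four elements at a time: one instruction (addps)
    // operates on four data elements, the "regular" fine-grained case.
    void add_arrays(const float* a, const float* b, float* out, int n)
    {
        for (int i = 0; i < n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);   // load 4 floats (unaligned)
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
        }
    }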

Exploiting parallelism: coarse grained
- Multiprocessing
  - Multiple processors/cores sharing a memory
  - Shared-memory threading libraries (pthread, OpenMP)
- Clusters
  - Relatively loosely coupled systems connected by a network
  - Message-passing libraries (MPI)
- Heterogeneous systems
  - Exploit differences in algorithmic requirements
  - Multiple paradigms in a single application
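
At the cluster level the same kind of work division is expressed with message passing instead of shared memory. A minimal MPI sketch (the per-node work is a stand-in for processing a share of the image):

    #include <mpi.h>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        (void)size;   // 'size' nodes take part

        // Each node would process its own share of the image; a single
        // number stands in for that result here.
        double local = (double)rank, total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        MPI_Finalize();   // rank 0 now holds the combined result in 'total'
        return 0;
    }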