A simple parallel algorithm: Adding n numbers in parallel


1 A simple parallel algorithm: Adding n numbers in parallel

2 A simple parallel algorithm Example for 8 numbers: We start with 4 processors and each of them adds 2 items in the first step. The number of items is halved at every subsequent step. Hence log n steps are required for adding n numbers. The processor requirement is O(n). How do we allocate tasks to processors? Where is the input stored? How do the processors access the input as well as intermediate results? We do not ask these questions while designing sequential algorithms.
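
As a concrete illustration (not part of the original slides), here is a minimal Python sketch that simulates the synchronous rounds of this halving scheme sequentially; the function name and structure are illustrative assumptions. In each round, virtual processor i adds one surviving pair, so the number of values halves until a single sum remains.

    def parallel_tree_sum(a):
        # Simulated tree sum; assumes len(a) is a power of two,
        # matching the slide's 8-number example.
        values = list(a)
        rounds = 0
        while len(values) > 1:
            # One synchronous step: virtual processor i adds the pair
            # (values[2*i], values[2*i + 1]); the number of values halves.
            values = [values[2*i] + values[2*i + 1]
                      for i in range(len(values) // 2)]
            rounds += 1
        return values[0], rounds

    total, steps = parallel_tree_sum(range(8))
    print(total, steps)  # 28 3 -- log2(8) = 3 steps, with 4 processors in the first step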

3 How do we analyze a parallel algorithm? A parallel algorithm is analyzed mainly in terms of its time, processor, and work complexities. Time complexity T(n): how many time steps are needed? Processor complexity P(n): how many processors are used? Work complexity W(n): what is the total work done by all the processors? For our example: T(n) = O(log n), P(n) = O(n), and W(n) = O(n log n).
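
A quick sanity check on these figures (worked out here, not in the original slides): keeping P(n) = O(n) processors for T(n) = O(log n) steps gives a processor-time product of O(n log n), even though only n − 1 additions are ever performed (7 additions for n = 8). This gap between the work charged and the useful operations is exactly what the work-optimal algorithm of slide 8 eliminates.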

4 How do we judge efficiency? Consider two parallel algorithms A1 and A2 for the same problem: A1 does W1(n) work in T1(n) time, and A2 does W2(n) work in T2(n) time. We say A1 is more efficient than A2 if W1(n) = o(W2(n)), regardless of their time complexities; for example, W1(n) = O(n) and W2(n) = O(n log n). If W1(n) and W2(n) are asymptotically the same, then A1 is more efficient than A2 if T1(n) = o(T2(n)); for example, W1(n) = W2(n) = O(n), but T1(n) = O(log n) and T2(n) = O(log² n).
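
To make both comparisons concrete (numbers chosen here for illustration): for n = 1024, the first case pits on the order of 1,024 operations (W1(n) = O(n)) against roughly 10,240 (W2(n) = O(n log n)); in the second case, T1(n) = O(log n) is about 10 steps against about 100 for T2(n) = O(log² n).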

5 How do we judge efficiency? It is difficult to give a more formal definition of efficiency. Consider the following situation: for A1, W1(n) = O(n log n) and T1(n) = O(n); for A2, W2(n) = O(n log² n) and T2(n) = O(log n). It is difficult to say which one is the better algorithm. Though A1 is more efficient in terms of work, A2 runs much faster. Both algorithms are interesting, and one may be better than the other depending on the specific parallel machine.

6 Optimal parallel algorithms Consider a problem, and let T(n) be the worst-case upper bound on the time of a serial algorithm for an input of length n. Assume also that T(n) is a lower bound for solving the problem, so we cannot have a better serial upper bound. Consider a parallel algorithm for the same problem that does W(n) work in Tpar(n) time. The parallel algorithm is work-optimal if W(n) = O(T(n)). For adding n numbers, T(n) = Θ(n), so a work-optimal parallel adder may do only O(n) work; the O(n log n)-work algorithm above is therefore not work-optimal.

7 A simple parallel algorithm: Adding n numbers in parallel

8 A work-optimal algorithm for adding n numbers Step 1. Use only n/log n processors and assign log n numbers to each processor. Each processor adds its log n numbers sequentially in O(log n) time. Step 2. We are left with only n/log n partial sums. We now execute our original algorithm on these n/log n numbers. Now T(n) = O(log n) and W(n) = O((n/log n) × log n) = O(n).
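
The following Python sketch simulates the two steps (an illustrative sequential simulation; the helper names are assumptions, not from the slides). Step 1 gives each of the roughly n/log n virtual processors a block of about log n numbers to add serially; step 2 runs the tree algorithm on the partial sums.

    import math

    def work_optimal_sum(a):
        n = len(a)
        group = max(1, math.ceil(math.log2(n)))  # ~log n numbers per processor
        # Step 1: each virtual processor adds its own block serially --
        # O(log n) time, O(n) total work across all blocks.
        partial = [sum(a[i:i + group]) for i in range(0, n, group)]
        # Step 2: tree-sum the ~n/log n partial sums -- O(log n) more time.
        while len(partial) > 1:
            if len(partial) % 2:  # pad with a neutral 0 so the pairs line up
                partial.append(0)
            partial = [partial[2*i] + partial[2*i + 1]
                       for i in range(len(partial) // 2)]
        return partial[0]

    print(work_optimal_sum(list(range(1024))))  # 523776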

9 Why is parallel computing important? We can justify the importance of parallel computing for two reasons: very large application domains, and the physical limitations of VLSI circuits. Though computers are getting faster and faster, user demand for solving very large problems is growing at a still faster rate. Examples include weather forecasting, simulation of protein folding, and computational physics.

10 Physical limitations of VLSI circuits The Pentium III processor uses 180 nanometer (nm) technology, i.e., a circuit element like a transistor can be etched within 180 × 10⁻⁹ m. The Pentium IV processor uses 130 nm technology. Intel has recently trialed processors built with 65 nm technology.

11 How many transistors can we pack? The Pentium III has about 42 million transistors and the Pentium IV about 55 million. The number of transistors on a chip doubles approximately every 18 months (Moore's Law).

12 Other Problems The most difficult problem is controlling power dissipation. 75 watts is considered the maximum power output of a processor. As we pack in more transistors, the power output goes up and better cooling is necessary. Intel cooled its 8 GHz demo processor using liquid nitrogen!

13 The advantages of parallel computing Parallel computing offers the possibility of overcoming such physical limits by solving problems in parallel. In principle, thousands, even millions, of processors can be used to solve a problem in parallel, and today's fastest parallel computers have already reached teraflop speeds.

14 Today's microprocessors already use several parallel processing techniques, such as instruction-level parallelism and pipelined instruction fetching. Intel uses hyper-threading in the Pentium IV mainly because the processor is clocked at 3 GHz while the memory bus operates at only a few hundred MHz.

15 Problems in parallel computing The sequential or uni-processor computing model is based on von Neumann's stored-program model. A program is written, compiled, and stored in memory, and it is executed by bringing one instruction at a time to the CPU.

16 Problems in parallel computing Programs are written keeping this model in mind. Hence, there is a close match between the software and the hardware on which it runs. The theoretical RAM model captures these concepts nicely. There are many different models of parallel computing, and each model is programmed in a different way. Hence, an algorithm designer has to keep a specific model in mind when designing an algorithm. Most parallel machines are suitable only for specific types of problems. Designing operating systems for them is also a major issue.

17 The PRAM model n processors are connected to a shared memory.

18 The PRAM model Each processor should be able to access any memory location in each clock cycle. Hence, there may be conflicts in memory access, and the memory management hardware needs to be very complex. We need some kind of hardware to connect the processors to individual locations in the shared memory. The SB-PRAM, developed at Saarland University by Prof. Wolfgang Paul's group, is one such machine.
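
As a toy illustration of the lockstep shared-memory discipline (a sketch under simplifying assumptions, not the SB-PRAM's actual interface): in each synchronous step, all processors first read shared memory and only then are all writes applied, so every read sees the previous step's contents. The step below is exclusive-read, exclusive-write (EREW), since no memory cell is touched by two processors in the same step.

    def pram_sum_step(mem, n_active):
        # Processor i reads mem[2*i] and mem[2*i + 1] (exclusive reads)...
        reads = [(mem[2*i], mem[2*i + 1]) for i in range(n_active)]
        # ...and only then writes the sum to mem[i] (exclusive writes).
        for i, (x, y) in enumerate(reads):
            mem[i] = x + y

    mem = list(range(8))   # shared memory holding the 8 inputs
    active = len(mem) // 2
    while active >= 1:
        pram_sum_step(mem, active)
        active //= 2
    print(mem[0])          # 28: the tree sum again, now in PRAM style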

19 The PRAM model A more realistic PRAM model
