GC16/3011 Functional Programming
Lecture 21: Parallel Graph Reduction

Contents
- Sequential evaluator
- Parallel graph reduction
- Lazy/Strict/Non-strict evaluation
- Strictness analysis
- Elements of parallel graph reduction
- Issues

Sequential evaluator
- A sequential evaluator uses normal order ("lazy") reduction
- Expressions are represented with an algebraic type, i.e. as a tree: tree reduction
- Sharing turns the tree into a graph: graph reduction
- Parallel GR: many sequential evaluators all working on one graph, with additional support for synchronisation and communication (sketched below)
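To make the starting point concrete, here is a minimal Haskell sketch of a tree-based normal-order evaluator that reduces to weak head normal form. The type and function names (Expr, subst, step, whnf) are illustrative, not taken from the lecture:

```haskell
-- Expressions as a tree (sharing would turn this into a graph).
data Expr = Var String
          | Lam String Expr
          | App Expr Expr
  deriving Show

-- Substitution; once everything is a supercombinator (no free
-- variables), this naive version cannot capture variables.
subst :: String -> Expr -> Expr -> Expr
subst x s (Var y)   = if x == y then s else Var y
subst x s (Lam y b) = if x == y then Lam y b else Lam y (subst x s b)
subst x s (App f a) = App (subst x s f) (subst x s a)

-- One normal-order step: always reduce the leftmost-outermost redex.
step :: Expr -> Maybe Expr
step (App (Lam x b) a) = Just (subst x a b)    -- beta reduction
step (App f a)         = (`App` a) <$> step f  -- reduce the function part first
step _                 = Nothing               -- Var/Lam: already in WHNF

-- Drive reduction to weak head normal form.
whnf :: Expr -> Expr
whnf e = maybe e whnf (step e)
```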

Making things easier
- Turn all functions into combinators: no free variables!
- "Lift" all lambdas to the top level, so named functions have no embedded lambdas
- Hence no name clashes and no free-variable capture: easy beta reduction!
- This is "supercombinator reduction" (see the example below)
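A small, hypothetical example of lambda lifting: the inner lambda below has a free variable n, so lifting it to the top level turns n into an extra parameter of a named supercombinator:

```haskell
-- Before lifting: the lambda has a free variable n.
addAll :: Int -> [Int] -> [Int]
addAll n xs = map (\x -> x + n) xs

-- After lifting: the free variable becomes an extra parameter,
-- and the lambda is now a named top-level supercombinator.
addN :: Int -> Int -> Int
addN n x = x + n

addAll' :: Int -> [Int] -> [Int]
addAll' n xs = map (addN n) xs
```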

Lazy/Strict/Non-strict evaluation
- Normal order reduction ("lazy evaluation") is SEQUENTIAL
- The alternative, applicative order ("strict evaluation"), evaluates arguments eagerly
- Parallel GR CANNOT be sequential!
- Does PGR therefore lose the benefits of lazy evaluation?
- No! As long as it only evaluates those parts of the graph that normal order reduction would have evaluated
  - This is "non-strict" evaluation (example below)
  - Use "strictness analysis" to find those parts
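A tiny example, in plain Haskell, of why parallel evaluation must stay non-strict: constK never needs its second argument, so an evaluator that eagerly evaluated both arguments in parallel would waste a processor (or diverge) on it:

```haskell
-- constK never needs its second argument.
constK :: Int -> Int -> Int
constK x _ = x

loop :: Int
loop = loop            -- a diverging computation ("endless")

main :: IO ()
main = print (constK 42 loop)   -- lazy evaluation prints 42
-- A strict parallel evaluator would start work on 'loop' here;
-- non-strict PGR must only evaluate arguments that are provably needed.
```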

Strictness analysis
- Done at compile time
- For (f x y z): which of x, y and/or z does f always evaluate under normal order reduction?
- If (f endless y z) = endless for every y and z, then f MUST evaluate x
- Doesn't this run into the halting problem? Yes, so we use an approximation technique:
- Abstract interpretation (sketched below)
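A common formulation interprets functions over a two-point domain, with 0 meaning "definitely diverges" and 1 meaning "may terminate". The sketch below is a hand-worked illustration of that idea; absPlus, absIf and absF are my names, not the lecture's:

```haskell
-- Two-point abstract domain: 0 = definitely diverges, 1 = may terminate.
type Abs = Int

-- (+) is strict in both arguments: the sum can terminate only
-- if both arguments can.
absPlus :: Abs -> Abs -> Abs
absPlus a b = min a b

-- A conditional needs its condition, and then one of its branches.
absIf :: Abs -> Abs -> Abs -> Abs
absIf c t e = min c (max t e)

-- Abstract version of: f x y = if x == 0 then y else f (x - 1) y
-- (the recursive call is resolved by fixpoint iteration; its final
-- abstract value, min x y, is used directly here).
absF :: Abs -> Abs -> Abs
absF x y = absIf x y (min x y)

-- f is strict in x: feeding bottom (0) for x yields bottom.
strictInX, strictInY :: Bool
strictInX = absF 0 1 == 0   -- True
strictInY = absF 1 0 == 0   -- True: both branches demand y
```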

Elements of parallel GR
- Many processors running sequential evaluators
- The graph lives in shared heap(s)

[Diagram: three processors all connected to a shared heap holding the expression graph (+ 3 x)]

Elements of parallel GR
- Processors
  - Each evaluates a subgraph to (weak head) normal form
  - Slightly modified for inter-process communication and synchronisation
  - Values are communicated via the graph
- Two processors might evaluate the same subgraph
  - Just wasted effort; it cannot affect correctness (see below)
  - But we prefer not to waste the effort!
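Purity is what makes duplicated evaluation harmless: two tasks reducing the same pure subexpression must produce equal results, so the worst case is wasted work, never a wrong answer. A small demonstration, where expensive stands for any shared subgraph:

```haskell
import Control.Concurrent

-- Any pure, costly expression will do for the demonstration.
expensive :: Int
expensive = sum [1 .. 10000000]

main :: IO ()
main = do
  r1 <- newEmptyMVar
  r2 <- newEmptyMVar
  _ <- forkIO (putMVar r1 $! expensive)  -- two tasks race to
  _ <- forkIO (putMVar r2 $! expensive)  -- evaluate the same thing
  v1 <- takeMVar r1
  v2 <- takeMVar r2
  print (v1 == v2)  -- always True: duplication wastes effort,
                    -- but purity means it cannot change the answer
```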

Elements of parallel GR
- A processor evaluates one or more tasks
- Tasks/threads "to be done" wait in shared task pool(s)
- Synchronisation is required whenever a task needs the value of a subgraph that is currently being evaluated by another task
  - The first task blocks, waiting for the second task to finish
  - The processor need not block (it can start another task)
- A sketch of this blocking protocol follows below
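One way to realise block-and-resume on a shared heap node can be sketched with Haskell MVars. Node and demand are hypothetical names for this sketch; production systems such as GHC's runtime use a cheaper "blackholing" mechanism, but the protocol has the same shape:

```haskell
import Control.Concurrent.MVar

-- A heap node: an unevaluated thunk, a node some task is already
-- evaluating (blocked tasks wait on the inner MVar), or a value.
data Node a = Thunk (IO a) | Busy (MVar a) | Value a

-- Demand the value of a node, blocking if another task owns it.
demand :: MVar (Node a) -> IO a
demand ref = do
  node <- takeMVar ref
  case node of
    Value v   -> do putMVar ref (Value v)
                    return v
    Busy done -> do putMVar ref (Busy done)
                    readMVar done              -- block until the owner finishes
    Thunk io  -> do done <- newEmptyMVar
                    putMVar ref (Busy done)    -- claim ("blackhole") the node
                    v <- io                    -- evaluate the subgraph
                    modifyMVar_ ref (\_ -> return (Value v))
                    putMVar done v             -- wake all blocked tasks
                    return v
```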

[Diagram: five processors sharing a heap that holds the expression graph (+ 3 x), plus a shared task pool]

Sparking
- Strictness analysis tells us all the subgraphs that definitely need to be evaluated
- The graph is annotated accordingly
- At the start, just one task descriptor is placed in the task pool
- One evaluator will start evaluating it
- Evaluators check the annotations and place task descriptors ("sparks") into the task pool
- Other evaluators will start work on the sparked tasks
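This is essentially how GHC's par (from the parallel package) behaves: it records a spark for its first argument in a pool from which idle processors can steal work, then continues. A standard example, given here as a sketch rather than the lecture's own code:

```haskell
import Control.Parallel (par, pseq)

-- 'par a b' sparks a and returns b; 'pseq' orders the current
-- task's own evaluation so it does not steal its own spark.
parFib :: Int -> Integer
parFib n
  | n < 2     = fromIntegral n
  | otherwise = a `par` (b `pseq` (a + b))
  where
    a = parFib (n - 1)
    b = parFib (n - 2)

main :: IO ()
main = print (parFib 30)
-- Build and run (requires the 'parallel' package):
--   ghc -threaded ParFib.hs && ./ParFib +RTS -N4
```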

Issues
- How things are represented in the heap
- How the evaluators work
- Shared memory (including virtual shared memory) or distributed memory
- Synchronising tasks (e.g. block and resume)
- Scheduling, task distribution and load balancing

Summary
- Sequential evaluator
- Lazy/Strict/Non-strict evaluation
- Strictness analysis
- Elements of parallel graph reduction
- Issues