
1 Design and Implementation of the CCC Parallel Programming Language
Nai-Wei Lin
Department of Computer Science and Information Engineering
National Chung Cheng University

2 ICS2004 Outline
- Introduction
- The CCC programming language
- The CCC compiler
- Performance evaluation
- Conclusions

3 Motivations
- Parallelism is the future trend
- Programming in parallel is much more difficult than programming in serial
- Parallel architectures are very diverse
- Parallel programming models are very diverse

4 Motivations
- Design a parallel programming language that uniformly integrates various parallel programming models
- Implement a retargetable compiler for this parallel programming language on various parallel architectures

5 Approaches to Parallelism
- Library approach
  - MPI (Message Passing Interface), Pthreads
- Compiler approach
  - HPF (High Performance Fortran), HPC++
- Language approach
  - Occam, Linda, CCC (Chung Cheng C)

6 Models of Parallel Architectures
- Control model
  - SIMD: Single Instruction Multiple Data
  - MIMD: Multiple Instruction Multiple Data
- Data model
  - Shared-memory
  - Distributed-memory

7 Models of Parallel Programming
- Concurrency
  - Control parallelism: simultaneously execute multiple threads of control
  - Data parallelism: simultaneously execute the same operations on multiple data
- Synchronization and communication
  - Shared variables
  - Message passing

8 Granularity of Parallelism
- Procedure-level parallelism
  - Concurrent execution of procedures on multiple processors
- Loop-level parallelism
  - Concurrent execution of iterations of loops on multiple processors
- Instruction-level parallelism
  - Concurrent execution of instructions on a single processor with multiple functional units

9 The CCC Programming Language
- CCC is a simple extension of C that supports both control and data parallelism
- A CCC program consists of a set of concurrent and cooperative tasks
- Control-parallel tasks run in MIMD mode and communicate via shared variables and/or message passing
- Data-parallel tasks run in SIMD mode and communicate via shared variables

10 Tasks in CCC Programs
(figure: control-parallel and data-parallel tasks)

11 Control Parallelism
- Concurrency
  - task
  - par and parfor
- Synchronization and communication
  - shared variables: monitors
  - message passing: channels

12 Monitors
- The monitor construct is a modular and efficient construct for synchronizing access to shared variables among concurrent tasks
- It provides data abstraction, mutual exclusion, and condition synchronization

13 An Example - Barber Shop
(figure: barber, chair, and customer)

14 An Example - Barber Shop

task::main( )
{
    monitor Barber_Shop bs;
    int i;

    par {
        barber( bs );
        parfor (i = 0; i < 10; i++)
            customer( bs );
    }
}

15 An Example - Barber Shop

task::barber(monitor Barber_Shop in bs)
{
    while ( 1 ) {
        bs.get_next_customer( );
        bs.finished_cut( );
    }
}

task::customer(monitor Barber_Shop in bs)
{
    bs.get_haircut( );
}

16 An Example - Barber Shop

monitor Barber_Shop {
    int barber, chair, open;
    cond barber_available, chair_occupied;
    cond door_open, customer_left;

    Barber_Shop( );
    void get_haircut( );
    void get_next_customer( );
    void finished_cut( );
};

17 An Example - Barber Shop

Barber_Shop( )
{
    barber = 0; chair = 0; open = 0;
}

void get_haircut( )
{
    while (barber == 0) wait(barber_available);
    barber -= 1;
    chair += 1;
    signal(chair_occupied);
    while (open == 0) wait(door_open);
    open -= 1;
    signal(customer_left);
}

18 An Example - Barber Shop

void get_next_customer( )
{
    barber += 1;
    signal(barber_available);
    while (chair == 0) wait(chair_occupied);
    chair -= 1;
}

void finished_cut( )
{
    open += 1;
    signal(door_open);
    while (open > 0) wait(customer_left);
}

19 Channels
- The channel construct is a modular and efficient construct for message passing among concurrent tasks
- Pipe: one to one
- Merger: many to one
- Spliter: one to many
- Multiplexer: many to many

20 Channels
With channels as a language construct:
- Communication structures among parallel tasks are expressed more comprehensively
- The specification of communication structures is easier
- The implementation of communication structures is more efficient
- The static analysis of communication structures is more effective

21 An Example - Consumer-Producer
(figure: a producer feeding consumers through a spliter channel)

22 An Example - Consumer-Producer

task::main( )
{
    spliter int chan;
    int i;

    par {
        producer( chan );
        parfor (i = 0; i < 10; i++)
            consumer( chan );
    }
}

23 An Example - Consumer-Producer

task::producer(spliter in int chan)
{
    int i;

    for (i = 0; i < 100; i++)
        put(chan, i);
    for (i = 0; i < 10; i++)
        put(chan, END);
}

24 An Example - Consumer-Producer

task::consumer(spliter in int chan)
{
    int data;

    while ((data = get(chan)) != END)
        process(data);
}

25 Data Parallelism
- Concurrency
  - domain: an aggregate of synchronous tasks
- Synchronization and communication
  - domain: variables in a global name space

26 An Example – Matrix Multiplication
(figure: A × B = C)

27 An Example – Matrix Multiplication

domain matrix_op[16] {
    int a[16], b[16], c[16];
    multiply(distribute in  int [16:block][16],
             distribute in  int [16][16:block],
             distribute out int [16:block][16]);
};

28 An Example – Matrix Multiplication

task::main( )
{
    int A[16][16], B[16][16], C[16][16];
    domain matrix_op m;

    read_array(A);
    read_array(B);
    m.multiply(A, B, C);
    print_array(C);
}

29 An Example – Matrix Multiplication

matrix_op::multiply(A, B, C)
    distribute in  int [16:block][16] A;
    distribute in  int [16][16:block] B;
    distribute out int [16:block][16] C;
{
    int i, j;

    a := A;
    b := B;
    for (i = 0; i < 16; i++)
        for (c[i] = 0, j = 0; j < 16; j++)
            c[i] += a[j] * matrix_op[i].b[j];
    C := c;
}

30 Platforms for the CCC Compiler
- PCs and SMPs
  - Pthread: shared memory + dynamic thread creation
- PC clusters and SMP clusters
  - Millipede: distributed shared memory + dynamic remote thread creation
- The similarities between these two classes of machines enable a retargetable compiler implementation for CCC

31 Organization of the CCC Programming System
(layer diagram, top to bottom: CCC applications; CCC compiler and CCC runtime library; virtual shared memory machine interface; Pthread or Millipede; SMP or SMP cluster)

32 The CCC Compiler
- Tasks → threads
- Monitors → mutex locks, read-write locks, and condition variables
- Channels → mutex locks and condition variables
- Domains → sets of synchronous threads
- Synchronous execution → barriers

33 Virtual Shared Memory Machine Interface
- Processor management
- Thread management
- Shared memory allocation
- Mutex locks
- Read-write locks
- Condition variables
- Barriers

34 The CCC Runtime Library
- The CCC runtime library contains a collection of functions that implement the salient abstractions of CCC on top of the virtual shared memory machine interface

35 Performance Evaluation
- SMPs
  - Hardware: an SMP machine with four Intel Pentium II Xeon 450 MHz CPUs, each with 512 KB cache
  - Software: Solaris 5.7 with the Pthreads library 1.26
- SMP clusters
  - Hardware: four SMP machines, each with two Intel Pentium III 500 MHz CPUs, each with 512 KB cache
  - Software: Windows 2000 with Millipede 4.0
  - Network: Fast Ethernet, 100 Mbps

36 Benchmarks
- Matrix multiplication (1024 x 1024)
- Warshall's transitive closure (1024 x 1024)
- Airshed simulation (5)

37 Matrix Multiplication (SMPs)
Entries are (speedup, efficiency).

                 | 1 thread/cpu | 2 threads/cpu | 4 threads/cpu | 8 threads/cpu
CCC (1 cpu)      | (0.97, 0.97) | (1.08, 1.08)  | (1.14, 1.14)  | (1.04, 1.04)
Pthread (1 cpu)  | (0.98, 0.98) | (1.12, 1.12)  | (1.17, 1.17)  | (1.08, 1.08)
CCC (2 cpu)      | (1.89, 0.94) | (2.60, 1.30)  | (2.93, 1.46)  | (2.31, 1.16)
Pthread (2 cpu)  | (1.91, 0.96) | (2.72, 1.36)  | (3.07, 1.53)  | (2.41, 1.20)
CCC (4 cpu)      | (3.76, 0.94) | (4.14, 1.03)  | (3.90, 0.98)  | (4.46, 1.11)
Pthread (4 cpu)  | (3.85, 0.96) | (4.39, 1.09)  | (4.11, 1.02)  | (4.83, 1.20)

38 Matrix Multiplication (SMP clusters)
Entries are (speedup, efficiency).

                           | 1 thread/cpu | 2 threads/cpu | 4 threads/cpu | 8 threads/cpu
CCC (1 mach x 2 cpu)       | (1.85, 0.93) | (2.33, 1.16)  | (2.97, 1.48)  | (2.00, 1.00)
Millipede (1 mach x 2 cpu) | (1.89, 0.95) | (2.39, 1.19)  | (3.05, 1.53)  | (2.09, 1.05)
CCC (2 mach x 2 cpu)       | (3.45, 0.86) | (4.60, 1.15)  | (4.89, 1.22)  | (3.17, 0.79)
Millipede (2 mach x 2 cpu) | (3.63, 0.91) | (4.87, 1.22)  | (5.14, 1.27)  | (3.31, 0.82)
CCC (4 mach x 2 cpu)       | (5.39, 0.67) | (7.54, 0.94)  | (5.45, 0.73)  | (4.67, 0.58)
Millipede (4 mach x 2 cpu) | (6.00, 0.75) | (8.56, 1.07)  | (5.57, 0.75)  | (4.87, 0.61)

39 Warshall's Transitive Closure (SMPs)
Entries are (speedup, efficiency).

                 | 1 thread/cpu | 2 threads/cpu | 4 threads/cpu | 8 threads/cpu
CCC (1 cpu)      | (0.98, 0.98) | (1.08, 1.08)  | (1.05, 1.05)  | (0.97, 0.97)
Pthread (1 cpu)  | (0.99, 0.99) | (1.11, 1.11)  | (1.07, 1.07)  | (0.99, 0.99)
CCC (2 cpu)      | (1.80, 0.90) | (2.16, 1.08)  | (1.91, 0.96)  | (1.53, 0.77)
Pthread (2 cpu)  | (1.90, 0.95) | (2.25, 1.12)  | (2.02, 1.01)  | (1.60, 0.80)
CCC (4 cpu)      | (3.04, 0.76) | (3.48, 0.87)  | (2.57, 0.64)  | (1.94, 0.49)
Pthread (4 cpu)  | (3.40, 0.85) | (3.68, 0.91)  | (2.72, 0.68)  | (2.02, 0.51)

40 Warshall's Transitive Closure (SMP clusters)
Entries are (speedup, efficiency).

                           | 1 thread/cpu | 2 threads/cpu | 4 threads/cpu | 8 threads/cpu
CCC (1 mach x 2 cpu)       | (1.91, 0.96) | (2.29, 1.14)  | (2.98, 1.49)  | (1.98, 0.99)
Millipede (1 mach x 2 cpu) | (1.96, 0.98) | (2.42, 1.21)  | (3.20, 1.59)  | (2.11, 1.06)
CCC (2 mach x 2 cpu)       | (3.05, 0.76) | (3.70, 0.92)  | (2.04, 0.52)  | (1.50, 0.38)
Millipede (2 mach x 2 cpu) | (3.45, 0.86) | (4.02, 1.00)  | (2.17, 0.54)  | (1.61, 0.41)
CCC (4 mach x 2 cpu)       | (5.08, 0.64) | (5.59, 0.70)  | (3.40, 0.43)  | (2.20, 0.27)
Millipede (4 mach x 2 cpu) | (5.65, 0.71) | (6.42, 0.80)  | (3.75, 0.46)  | (2.36, 0.30)

41 Airshed Simulation (SMPs)
Columns are thread configurations; entries are (speedup, efficiency), with surviving absolute times (sec) shown where present.

              | 5\5\5      | 1\5\5           | 5\1\5           | 5\5\1      | 1\1\5           | 1\5\1           | 5\1\1
CCC (2 cpu)     | (1.6, 0.8) | 8.84 (1.6, 0.8) | (1.3, 0.6)      | (1.1, 0.5) | (1.3, 0.6)      | 13.2 (1.1, 0.5) | (0.9, 0.4)
Pthread (2 cpu) | (1.6, 0.8) | 8.82 (1.6, 0.8) | (1.3, 0.6)      | (1.1, 0.5) | (1.3, 0.6)      | (1.1, 0.5)      | (0.9, 0.4)
CCC (4 cpu)     | (2.1, 0.5) | 6.84 (2.1, 0.5) | 9.03 (1.5, 0.3) | (1.1, 0.2) | 9.41 (1.5, 0.3) | (1.1, 0.2)      | (0.9, 0.2)
Pthread (4 cpu) | (2.2, 0.5) | 6.81 (2.1, 0.5) | 9.02 (1.5, 0.3) | (1.1, 0.2) | 9.38 (1.5, 0.3) | (1.1, 0.2)      | (0.9, 0.2)

42 Airshed Simulation (SMP clusters)
Columns are thread configurations; entries are (speedup, efficiency).

                     | 5\5\5      | 1\5\5      | 5\1\5      | 5\5\1      | 1\1\5      | 1\5\1      | 5\1\1
CCC (1 m x 2 p)       | (1.9, 0.9) | (1.8, 0.9) | (1.6, 0.8) | (1.1, 0.5) | (1.5, 0.7) | (1.1, 0.5) | (1.1, 0.5)
Millipede (1 m x 2 p) | (2.4, 1.2) | (2.3, 1.1) | (1.9, 0.9) | (1.6, 0.8) | (1.8, 0.9) | (1.5, 0.7) | (1.3, 0.6)
CCC (2 m x 2 p)       | (1.8, 0.4) | (1.8, 0.4) | (0.9, 0.2) | (0.8, 0.2) | (0.9, 0.2) | (0.8, 0.2) | (0.5, 0.1)
Millipede (2 m x 2 p) | (2.4, 0.6) | (2.2, 0.5) | (1.5, 0.4) | (1.2, 0.3) | (1.6, 0.4) | (1.1, 0.2) | (1.3, 0.3)
CCC (4 m x 2 p)       | (2.1, 0.2) | (1.9, 0.2) | (1.0, 0.1) | (0.8, 0.1) | (0.9, 0.1) | (0.8, 0.1) | (0.5, 0.1)
Millipede (4 m x 2 p) | (2.9, 0.3) | (2.8, 0.3) | (1.4, 0.2) | (1.2, 0.1) | (1.4, 0.2) | (1.2, 0.1) | (1.3, 0.1)

43 Conclusions
- A high-level parallel programming language that uniformly integrates
  - both control and data parallelism
  - both shared variables and message passing
- A modular parallel programming language
- A retargetable compiler

