CISC 879 : Software Support for Multicore Architectures John Cavazos Dept of Computer & Information Sciences University of Delaware www.cis.udel.edu/~cavazos/cisc879.


1 CISC 879 : Software Support for Multicore Architectures
  John Cavazos
  Dept of Computer & Information Sciences
  University of Delaware
  www.cis.udel.edu/~cavazos/cisc879
  Lecture 10: Patterns for Parallel Programming III

2 Lecture 10: Overview
  Cell B.E. Clarification
  Design Patterns for Parallel Programs
    Finding Concurrency
    Algorithmic Structure
      Organize by Tasks
      Organize by Data
    Supporting Structures

3 LS-LS DMA transfer (PPU)

int main()
{
    pthread_t pts[N];
    spe_context_ptr_t spe[N];
    struct thread_args t_args[N];
    int i;
    spe_program_handle_t *program;

    /* Load the SPU program and start one thread per SPE context. */
    program = spe_image_open("../spu/hello");
    for (i = 0; i < N; i++) {
        spe[i] = spe_context_create(0, NULL);
        spe_program_load(spe[i], program);
        t_args[i].spe = spe[i];
        t_args[i].spuid = i;
        pthread_create(&pts[i], NULL, &my_spe_thread, &t_args[i]);
    }

    /* Address at which the local store of SPE 1 is mapped. */
    void *ls = spe_ls_area_get(spe[1]);
    unsigned int mbox_data = (unsigned int)ls;
    printf("mbox_data %x\n", mbox_data);

    int rc;
    /* Send that address to SPE 0, then block until SPE 0 reports back. */
    rc = spe_in_mbox_write(spe[0], &mbox_data, 1, SPE_MBOX_ALL_BLOCKING);
    rc = spe_out_intr_mbox_read(spe[0], &mbox_data, 1, SPE_MBOX_ALL_BLOCKING);

    /* Release every SPE so that each one prints its final value. */
    for (i = 0; i < N; i++) {
        rc = spe_in_mbox_write(spe[i], &mbox_data, 1, SPE_MBOX_ALL_BLOCKING);
    }

    for (i = 0; i < N; i++) {
        pthread_join(pts[i], NULL);
    }
    spe_image_close(program);
    for (i = 0; i < N; i++) {
        spe_context_destroy(spe[i]);
    }
    return 0;
}


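6 LS-LS DMA transfer (SPU)

/* tv and spuid are declared elsewhere; the slide does not show them. */
int main()
{
    gettimeofday(&tv, NULL);
    printf("spu %lld; t.tv_usec = %ld\n", spuid, tv.tv_usec);

    if (spuid == 0) {
        unsigned int ea;
        unsigned int tag = 0;
        unsigned int mask = 1;

        /* Receive the address sent by the PPU. */
        ea = spu_read_in_mbox();
        printf("ea = %p\n", (void*)ea);

        /* DMA this SPU's tv to the same offset at that address,
           then wait for the tagged transfer to complete. */
        mfc_put(&tv, ea + (unsigned int)&tv, sizeof(tv), tag, 1, 0);
        mfc_write_tag_mask(mask);
        mfc_read_tag_status_all();

        /* Tell the PPU that the transfer is done. */
        spu_write_out_intr_mbox(0);
    }

    /* Wait for the go-ahead from the PPU before printing. */
    spu_read_in_mbox();
    printf("spu %lld; tv.tv_usec = %ld\n", spuid, tv.tv_usec);
    return 0;
}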

7 LS-LS Output

-bash-3.2$ ./a.out
spu 0; t.tv_usec = 875360
spu 1; t.tv_usec = 876446
spu 2; t.tv_usec = 877443
spu 3; t.tv_usec = 878459
mbox_data f7764000
ea = 0xf7764000
spu 0; tv.tv_usec = 875360
spu 1; tv.tv_usec = 875360
spu 2; tv.tv_usec = 877443
spu 3; tv.tv_usec = 878459

8 Organize by Data
  Operations on core data structure
    Geometric Decomposition
    Recursive Data

9 Geometric Decomposition
  Arrays and other linear structures
  Divide into contiguous substructures
  Example: Matrix multiply
  A data-centric algorithm on a linear data structure (array) implies geometric decomposition
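A minimal sketch of the matrix multiply example, assuming square row-major matrices and a split of the result into contiguous row blocks. The names matmul_rows and matmul_geometric are hypothetical, and the blocks are processed one after another here; in a parallel program each block would be handed to a different thread or SPE.

```c
#include <assert.h>
#include <stddef.h>

/* Compute rows [row_begin, row_end) of C = A * B for n x n row-major
   matrices. Calls on disjoint row ranges write disjoint parts of C,
   so they are independent of each other. */
void matmul_rows(const double *a, const double *b, double *c,
                 size_t n, size_t row_begin, size_t row_end)
{
    assert(row_begin <= row_end && row_end <= n);
    for (size_t i = row_begin; i < row_end; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Geometric decomposition: split the n rows of C into `parts`
   contiguous chunks. The chunks run sequentially in this sketch. */
void matmul_geometric(const double *a, const double *b, double *c,
                      size_t n, size_t parts)
{
    assert(parts > 0);
    for (size_t p = 0; p < parts; p++) {
        size_t begin = p * n / parts;
        size_t end = (p + 1) * n / parts;
        matmul_rows(a, b, c, n, begin, end);
    }
}
```

Splitting by rows keeps each chunk of C contiguous in memory; other splits (columns, 2D tiles) are equally valid instances of the pattern.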

10 Recursive Data
  Lists, trees, and graphs
  Structures where you would use divide-and-conquer
  It may seem that you can only move sequentially through the data structure
  But there are ways to expose concurrency

11 Recursive Data Example
  Find the Root: given a forest of directed trees, find the root of the tree containing each node
  Parallel approach:
    For each node, find its successor's successor
    Repeat until no changes
  O(log n) parallel steps vs O(n) sequential steps
  Slide Source: Dr. Rabbah, IBM, MIT Course 6.189 IAP 2007
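A sketch of the successor's-successor idea, assuming the forest is stored as an array in which succ[i] is the parent of node i and every root points to itself. The function name find_roots and this representation are assumptions. The inner loop runs sequentially here, but each of its iterations only reads succ and writes its own slot of next, so all nodes could be updated at the same time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* succ[i] is the parent of node i; a root points to itself.
   On return, succ[i] holds the root of the tree containing i.
   Returns the number of rounds executed, or -1 if allocation fails. */
int find_roots(int *succ, size_t n)
{
    if (n == 0)
        return 0;
    int *next = malloc(n * sizeof *next);
    if (next == NULL)
        return -1;

    int rounds = 0;
    int changed = 1;
    while (changed) {
        changed = 0;
        /* Independent per-node step: jump to the successor's successor. */
        for (size_t i = 0; i < n; i++) {
            next[i] = succ[succ[i]];
            if (next[i] != succ[i])
                changed = 1;
        }
        memcpy(succ, next, n * sizeof *succ);
        rounds++;
    }
    free(next);
    return rounds;
}
```

Each round halves the remaining distance from a node to its root, which is where the logarithmic number of rounds comes from; the last round only detects that nothing changed.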

12 Organize by Flow of Data
  Regular: Pipeline
  Irregular: Event-Based Coordination

13 Organize by Flow of Data
  Computation can be viewed as a flow of data going through a sequence of stages
  Pipeline: one-way, predictable communication
  Event-based Coordination: unrestricted, unpredictable communication

14 Pipeline performance
  Concurrency is limited by pipeline depth
  Balance computation and communication (architecture dependent)
  Stages should be equally computationally intensive
    Slowest stage creates a bottleneck
    Combine lightly loaded stages or decompose heavily loaded stages
  Time to fill and drain the pipe should be small
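The points above can be illustrated with an idealized timing model, assuming fixed per-stage times, identical items, and no communication cost: the first item needs the sum of all stage times to pass through (fill), and after that one item completes every slowest-stage interval. The function names and the model itself are assumptions for illustration, not measurements.

```c
#include <assert.h>
#include <stddef.h>

/* Idealized time to push `items` identical items through a pipeline
   whose stage s takes stage_time[s] per item. */
double pipeline_time(const double *stage_time, size_t stages, size_t items)
{
    assert(stages > 0);
    if (items == 0)
        return 0.0;
    double fill = 0.0;     /* time for the first item to pass all stages */
    double slowest = 0.0;  /* the bottleneck stage sets the output rate */
    for (size_t s = 0; s < stages; s++) {
        fill += stage_time[s];
        if (stage_time[s] > slowest)
            slowest = stage_time[s];
    }
    return fill + (double)(items - 1) * slowest;
}

/* Speedup over running every stage of every item back to back. */
double pipeline_speedup(const double *stage_time, size_t stages, size_t items)
{
    assert(items > 0);
    double per_item = 0.0;
    for (size_t s = 0; s < stages; s++)
        per_item += stage_time[s];
    return (double)items * per_item / pipeline_time(stage_time, stages, items);
}
```

Under this model, four balanced stages approach but never reach a speedup of 4 (the pipeline depth), and moving work into one stage lowers the speedup even though the total work per item is unchanged.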

15 Supporting Structures
  Single Program Multiple Data (SPMD)
  Loop Parallelism
  Master/Worker
  Fork/Join

16 SPMD Pattern
  Create a single program that runs on each processor
    Initialize
    Obtain a unique identifier
    Run the same program on each processor
      Identifier and input data can differentiate behavior
    Distribute data (if any)
    Finalize
  Slide Source: Dr. Rabbah, IBM, MIT Course 6.189 IAP 2007
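A sketch of these steps using POSIX threads in place of separate processors, which is an assumption made to keep the example short. Every thread runs the same function spmd_body; only its identifier decides which slice of the input it sums. NUM_WORKERS, struct spmd_args, and spmd_sum are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4   /* hypothetical number of processing elements */

struct spmd_args {
    int id;             /* unique identifier of this instance */
    const int *data;    /* shared input */
    size_t n;
    long partial;       /* result written by this instance */
};

/* The single program: the same code runs in every thread, and the
   identifier selects the block of data to work on. */
static void *spmd_body(void *p)
{
    struct spmd_args *a = p;
    size_t begin = (size_t)a->id * a->n / NUM_WORKERS;
    size_t end = ((size_t)a->id + 1) * a->n / NUM_WORKERS;
    long sum = 0;
    for (size_t i = begin; i < end; i++)
        sum += a->data[i];
    a->partial = sum;
    return NULL;
}

long spmd_sum(const int *data, size_t n)
{
    pthread_t threads[NUM_WORKERS];
    struct spmd_args args[NUM_WORKERS];

    /* Initialize: hand each instance its identifier and the data. */
    for (int id = 0; id < NUM_WORKERS; id++) {
        args[id] = (struct spmd_args){ id, data, n, 0 };
        int rc = pthread_create(&threads[id], NULL, spmd_body, &args[id]);
        assert(rc == 0);
        (void)rc;
    }

    /* Finalize: wait for every instance and combine the results. */
    long total = 0;
    for (int id = 0; id < NUM_WORKERS; id++) {
        pthread_join(threads[id], NULL);
        total += args[id].partial;
    }
    return total;
}
```

The block boundaries are computed from the identifier alone, so no instance needs to know what the others are doing until the final combine step.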

17 SPMD Challenges
  Split data correctly
  Correctly combine results
  Achieve even work distribution
  If programs require dynamic load balancing, another pattern may be more suitable (Job Queue)
  Slide Source: Dr. Rabbah, IBM, MIT Course 6.189 IAP 2007

18 Loop Parallelism Pattern
  Many programs are expressed as iterative constructs
  Programming models like OpenMP provide pragmas to automatically assign loop iterations to processors
  Slide Source: Dr. Rabbah, IBM, MIT Course 6.189 IAP 2007
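A small example of such a pragma, assuming an OpenMP-capable compiler (for example gcc with -fopenmp). The function name dot is hypothetical. Without OpenMP enabled the pragma is ignored and the loop simply runs sequentially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two vectors of length n. The pragma asks OpenMP to
   divide the iterations among threads; reduction(+:sum) gives each
   thread a private partial sum and adds them together at the end. */
double dot(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

The loop qualifies because its iterations are independent apart from the accumulation into sum, which the reduction clause handles.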

19 Master/Worker Pattern
  Slide Source: Dr. Rabbah, IBM, MIT Course 6.189 IAP 2007

20 Master/Worker Pattern
  Relevant where tasks have no dependencies
    Embarrassingly parallel
  The problem is determining when the entire problem is complete
  Slide Source: Dr. Rabbah, IBM, MIT Course 6.189 IAP 2007
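One possible shape for this pattern, sketched with POSIX threads: the master fills a bag of independent tasks, workers repeatedly claim the next unclaimed task, and the master detects completion by joining every worker. The names task_bag, do_task, and run_master, the squaring task, and the worker count are all assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4   /* hypothetical worker count */

struct task_bag {
    pthread_mutex_t lock;
    size_t next;        /* index of the next unclaimed task */
    size_t count;       /* total number of tasks */
    const int *input;
    long *output;
};

/* Hypothetical independent task: square one input value. */
static long do_task(int x) { return (long)x * x; }

/* Worker: claim tasks one at a time until the bag is empty. */
static void *worker(void *p)
{
    struct task_bag *bag = p;
    for (;;) {
        pthread_mutex_lock(&bag->lock);
        size_t i = bag->next;
        if (i < bag->count)
            bag->next++;
        pthread_mutex_unlock(&bag->lock);
        if (i >= bag->count)
            break;
        bag->output[i] = do_task(bag->input[i]);
    }
    return NULL;
}

/* Master: set up the bag, start the workers, wait for all of them. */
void run_master(const int *input, long *output, size_t count)
{
    struct task_bag bag;
    pthread_mutex_init(&bag.lock, NULL);
    bag.next = 0;
    bag.count = count;
    bag.input = input;
    bag.output = output;

    pthread_t workers[NUM_WORKERS];
    int started = 0;
    for (int w = 0; w < NUM_WORKERS; w++)
        if (pthread_create(&workers[started], NULL, worker, &bag) == 0)
            started++;
    if (started == 0)
        worker(&bag);   /* no thread could start: do the work here */

    /* Every worker exits only when no task is left to claim, so once
       all joins return, the entire problem is complete. */
    for (int w = 0; w < started; w++)
        pthread_join(workers[w], NULL);
    pthread_mutex_destroy(&bag.lock);
}
```

Because workers pull tasks on demand, a slow task delays only the worker that claimed it, which gives the dynamic load balancing that a fixed SPMD split lacks.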

21 Fork/Join Pattern
  Parent creates new tasks (fork), then waits until they complete (join)
  Tasks created dynamically
    Tasks can create more tasks
  Tasks managed according to relationships
  Slide Source: Dr. Rabbah, IBM, MIT Course 6.189 IAP 2007
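A sketch of fork and join with POSIX threads, using a recursive array sum as the example problem (an assumption, as are the names struct range, sum_task, and fork_join_sum). Each task forks a child for the left half, handles the right half itself, and joins the child before combining. A depth limit keeps the number of threads small.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct range {
    const int *data;
    size_t n;
    int depth;   /* remaining levels at which this task may fork */
    long sum;    /* result, valid once the task has finished */
};

static void *sum_task(void *p)
{
    struct range *r = p;
    if (r->depth == 0 || r->n < 2) {
        /* Leaf task: sum the range directly. */
        long s = 0;
        for (size_t i = 0; i < r->n; i++)
            s += r->data[i];
        r->sum = s;
        return NULL;
    }

    size_t half = r->n / 2;
    struct range left = { r->data, half, r->depth - 1, 0 };
    struct range right = { r->data + half, r->n - half, r->depth - 1, 0 };

    /* Fork: the child may fork further tasks of its own. */
    pthread_t child;
    int forked = (pthread_create(&child, NULL, sum_task, &left) == 0);
    if (!forked)
        sum_task(&left);    /* could not fork: run the task inline */
    sum_task(&right);

    /* Join: the parent waits for its child before combining. */
    if (forked)
        pthread_join(child, NULL);
    r->sum = left.sum + right.sum;
    return NULL;
}

long fork_join_sum(const int *data, size_t n)
{
    /* Depth 3 allows at most 7 forked threads; a hypothetical limit. */
    struct range root = { data, n, 3, 0 };
    sum_task(&root);
    return root.sum;
}
```

The parent always joins before returning, so the child's argument can safely live on the parent's stack.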

