
1 SSU 1
Dr. A. Srinivas
PES Institute of Technology, Bangalore, India
a.srinivas@pes.edu
9 – 20 July 2012

2 Schedule
Day 1: Module 1: Multicore Architecture; Module 2: Parallel Programming
Days 2–5: Parallel Programming with OpenMP
Assignment: JPEG compression and decompression using parallel programming with OpenMP directives

3 Memory Hierarchy of early computers: 3 levels
- CPU registers
- DRAM main memory
- Disk storage

4 CACHE MEMORY
The principle of locality made it possible to speed up main-memory access by introducing small, fast memories known as cache memories, which hold blocks of the most recently referenced instructions and data items.
A cache is a small, fast storage device that holds the operands and instructions most likely to be used by the CPU.

5 Because of the growing speed gap between the CPU and main memory, a small SRAM memory called the L1 cache is inserted between them. The L1 cache can be accessed almost as fast as the registers, typically in 1 or 2 clock cycles.
As the gap widened further, an additional cache, the L2 cache, was inserted between the L1 cache and main memory. It takes more clock cycles to access than the L1 cache, but far fewer than main memory.

6 The L2 cache is attached either to the memory bus or to its own cache bus. Some high-performance systems also include an additional L3 cache, which sits between the L2 cache and main memory. The arrangement differs, but the principle is the same: the cache is placed both physically and logically closer to the CPU than the main memory.

7 CACHE LINES / BLOCKS
Cache memory is subdivided into cache lines.
Cache line (block): the smallest unit of memory that can be transferred between main memory and the cache.
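A minimal C sketch (not from the slides) of why cache lines matter, assuming a typical 64-byte line: the row-major loop reuses every byte of each line it fetches, while the column-major loop touches a new line on almost every access and is usually much slower.

    #include <stdio.h>

    #define N 1024

    static double a[N][N];

    /* Row-major traversal: consecutive accesses fall in the same
     * cache line, so each line fetched from memory is fully reused. */
    double sum_row_major(void) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }

    /* Column-major traversal: successive accesses are N * 8 bytes
     * apart, so nearly every access touches a different cache line. */
    double sum_col_major(void) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }

    int main(void) {
        printf("%f %f\n", sum_row_major(), sum_col_major());
        return 0;
    }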

10 Core vs. Processor
- A multicore processor contains more than one CPU (core) inside a single package.
- A quad-core processor running at 3 GHz has four cores in the CPU, each running at 3 GHz and each with its own cache.
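A small sketch (assuming OpenMP is available and the code is compiled with an OpenMP flag such as -fopenmp) showing how a program can ask how many processors the runtime sees on a multicore machine.

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        /* Number of processors (cores or logical CPUs) available
         * to the OpenMP runtime. */
        printf("Processors available: %d\n", omp_get_num_procs());

        /* Number of threads a parallel region would use by default,
         * which is often one per core. */
        printf("Default thread count: %d\n", omp_get_max_threads());
        return 0;
    }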

13 Amdahl's Law
In terms of the number of cores:
Speedup = 1 / (S + (1 - S) / n)
where S is the fraction of execution time spent in the serialized portion of the parallelized version, and n is the number of cores.
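A worked example of the formula, assuming a hypothetical serial fraction S = 0.1: with n = 4 cores the speedup is 1 / (0.1 + 0.9/4) ≈ 3.08, and no matter how many cores are added it can never exceed 1/S = 10. The C sketch below simply evaluates the formula for a few core counts.

    #include <stdio.h>

    /* Amdahl's law: speedup = 1 / (S + (1 - S) / n),
     * where S is the serial fraction and n the number of cores. */
    static double amdahl_speedup(double serial_fraction, int cores) {
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores);
    }

    int main(void) {
        double S = 0.1;                 /* assumed serial fraction */
        int cores[] = { 1, 2, 4, 8, 16 };
        for (int i = 0; i < 5; i++)
            printf("n = %2d  speedup = %.2f\n",
                   cores[i], amdahl_speedup(S, cores[i]));
        return 0;
    }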

15 Multicore Philosophy
- Two or more cores within a single die
- Each core executes its own instruction stream and has its own architectural resources
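A minimal OpenMP sketch of the multicore idea (compile with -fopenmp; actual thread-to-core placement is up to the OS and runtime): the parallel region creates a team of threads, typically one per core, and each thread runs its own instruction stream while sharing memory with the others.

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        /* Each thread in the team is typically scheduled on its own core. */
        #pragma omp parallel
        {
            printf("Thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }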

31 Hyper-Threading
- Parts of a single processor core are shared between two hardware threads
- The execution engine is shared
- Switching between the hardware threads does not require an OS task switch
- The processor is kept as busy as possible
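A small sketch of the visible effect (Linux/glibc assumed for sysconf): on a hyper-threaded CPU the OS reports more logical processors than physical cores, because each core exposes two hardware thread contexts that share one execution engine.

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* Logical processors seen by the OS; on a hyper-threaded CPU
         * this is typically twice the number of physical cores. */
        long logical = sysconf(_SC_NPROCESSORS_ONLN);
        printf("Logical processors online: %ld\n", logical);
        return 0;
    }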

42 Branch Target Buffer, Translation Lookaside Buffer

81 References
[1] Stephen Blair-Chappell, “Intel Core 2 Architecture – Implications for Software Developers”, Intel Compiler Labs.
[2] Ruud van der Pas, “An Overview of OpenMP”, IWOMP 2010, CCS, University of Tsukuba, Tsukuba, Japan, June 14–16, 2010.
[3] Jernej Barbic, “Multi-core Architectures”, 15-213, Spring 2007, May 3, 2007.
[4] Ankit Kurana, “Intel’s Multicore Processors”, Intel Corporation.
[5] OpenMP Application Program Interface, Version 3.1, July 2011.
[6] Ruud van der Pas, “An Introduction Into OpenMP”, IWOMP 2005, University of Oregon, Eugene, Oregon, USA, June 1–4, 2005.

