Multicore Designs Presented By: Mahendra B Salunke Asst. Professor, Dept of Comp Engg., SITS, Narhe, Pune. URL: www.microsig.webs.com.

Slides:



Advertisements
Similar presentations
Larrabee Eric Jogerst Cortlandt Schoonover Francis Tan.
Advertisements

Vectors, SIMD Extensions and GPUs COMP 4611 Tutorial 11 Nov. 26,
EZ-COURSEWARE State-of-the-Art Teaching Tools From AMS Teaching Tomorrow’s Technology Today.
Slides Prepared from the CI-Tutor Courses at NCSA By S. Masoud Sadjadi School of Computing and Information Sciences Florida.
CPE 731 Advanced Computer Architecture ILP: Part V – Multiple Issue Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University of.
AMD OPTERON ARCHITECTURE Omar Aragon Abdel Salam Sayyad This presentation is missing the references used.
A Complete GPU Compute Architecture by NVIDIA Tamal Saha, Abhishek Rawat, Minh Le {ts4rq, ar8eb,
Contents Even and odd memory banks of 8086 Minimum mode operation
Fall EE 333 Lillevik 333f06-l20 University of Portland School of Engineering Computer Organization Lecture 20 Pipelining: “bucket brigade” MIPS.
Computers Organization & Assembly Language Chapter 1 THE 80x86 MICROPROCESSOR.
Mobile Pentium 4 Architecture Supporting Hyper-ThreadingTechnology Hakan Burak Duygulu CmpE
1 Microprocessor-based Systems Course 4 - Microprocessors.
Processor history / DX/SX SX/DX Pentium 1997 Pentium MMX
Vacuum tubes Transistor 1948 ICs 1960s Microprocessors 1970s.
Chapter 12 Three System Examples The Architecture of Computer Hardware and Systems Software: An Information Technology Approach 3rd Edition, Irv Englander.
EECS 470 Superscalar Architectures and the Pentium 4 Lecture 12.
The Pentium 4 CPSC 321 Andreas Klappenecker. Today’s Menu Advanced Pipelining Brief overview of the Pentium 4.
Chapter 12 CPU Structure and Function. Example Register Organizations.
NYU DARPA DIS kick-off September 24, Comparing IA-64 and HPL-PD NYU.
Intel IA-64 Architecture Chehun Kim Glenn Ramos. Contents *Pipelining - Stages of pipelining *Microprogramming *Interconnection Structures.
COMP381 by M. Hamdi 1 Commercial Superscalar and VLIW Processors.
CS854 Pentium III group1 Instruction Set General Purpose Instruction X87 FPU Instruction SIMD Instruction MMX Instruction SSE Instruction System Instruction.
7-Aug-15 (1) CSC Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32.
Intel Pentium 4 Processor Presented by Presented by Steve Kelley Steve Kelley Zhijian Lu Zhijian Lu.
Lect 13-1 Lect 13: and Pentium. Lect Microprocessor Family  Microprocessor  Introduced in 1989  High Integration  On-chip 8K.
Intel Architecture. Changes in architecture Software architecture: –Front end (Feature changes such as adding more graphics, changing the background colors,
Intel
Assembly Language for Intel-Based Computers, 4 th Edition Chapter 2: IA-32 Processor Architecture (c) Pearson Education, All rights reserved. You.
The Intel Architecture and Windows Internals
Simultaneous Multithreading: Maximizing On-Chip Parallelism Presented By: Daron Shrode Shey Liggett.
The Pentium Processor.
The Pentium Processor Chapter 3 S. Dandamudi To be used with S. Dandamudi, “Introduction to Assembly Language Programming,” Second Edition, Springer,
The Pentium Processor Chapter 3 S. Dandamudi.
The Arrival of the 64bit CPUs - Itanium1 นายชนินท์วงษ์ใหญ่รหัส นายสุนัยสุขเอนกรหัส
Is Out-Of-Order Out Of Date ? IA-64’s parallel architecture will improve processor performance William S. Worley Jr., HP Labs Jerry Huck, IA-64 Architecture.
Fall 2012 Chapter 2: x86 Processor Architecture. Irvine, Kip R. Assembly Language for x86 Processors 6/e, Chapter Overview General Concepts IA-32.
History of Microprocessor MPIntroductionData BusAddress Bus
Hyper-Threading Technology Architecture and Micro-Architecture.
AMD Opteron Overview Michael Trotter (mjt5v) Tim Kang (tjk2n) Jeff Barbieri (jjb3v)
ARM for Wireless Applications ARM11 Microarchitecture On the ARMv6 Connie Wang.
Introducing The IA-64 Architecture - Kalyan Gopavarapu - Kalyan Gopavarapu.
The original MIPS I CPU ISA has been extended forward three times The practical result is that a processor implementing MIPS IV is also able to run MIPS.
Different Microprocessors Tamanna Haque Nipa Lecturer Dept. of Computer Science Stamford University Bangladesh.
Processor Architecture
Next Generation ISA Itanium / IA-64. Operating Environments IA-32 Protected Mode/Real Mode/Virtual Mode - if supported by the OS IA-64 Instruction Set.
The life of an instruction in EV6 pipeline Constantinos Kourouyiannis.
Intel Multimedia Extensions and Hyper-Threading Michele Co CS451.
Different Microprocessors Tamanna Haque Nipa Lecturer Dept. of Computer Science Stamford University Bangladesh.
Introduction to Intel IA-32 and IA-64 Instruction Set Architectures.
Advanced Pipelining 7.1 – 7.5. Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike.
Chapter Overview General Concepts IA-32 Processor Architecture
Protection in Virtual Mode
Microarchitecture.
CPE 731 Advanced Computer Architecture ILP: Part V – Multiple Issue
Multi-core processors
/ Computer Architecture and Design
Introduction to Pentium Processor
Superscalar Processors & VLIW Processors
IA-64 Microarchitecture --- Itanium Processor
NVIDIA Fermi Architecture
Comparison of Two Processors
Henk Corporaal TUEindhoven 2011
CC423: Advanced Computer Architecture ILP: Part V – Multiple Issue
* From AMD 1996 Publication #18522 Revision E
Lecture 3 (Microprocessor)
Presentation transcript:

Multicore Designs Presented By: Mahendra B Salunke Asst. Professor, Dept of Comp Engg., SITS, Narhe, Pune. URL:

Contents Intel 64bit Architecture: ◦ Block Diagram, ◦ Basic Execution Environment, ◦ Data Types, ◦ Specific advances: Instruction set, Intel Micro-architecture code name Nehalem, SIMD Instructions, Hyper threading Technology, Virtualization Technology (Refer TB3) Systems Programming, Multiple Processor Management (Refer RB1)

Brief History 16-bit Processors and Segmentation (1978) The Intel® 286 Processor (1982) The Intel386™ Processor (1985): First IA-32 The Intel486™ Processor (1989) The Intel® Pentium® Processor (1993) The P6 Family of Processors ( ) ◦ Intel Pentium Pro processor ◦ Intel Pentium II processor ◦ Pentium II Xeon processor ◦ Intel Celeron processor ◦ Intel Pentium III processor ◦ Pentium III Xeon processor

Brief History Continued… The Intel® Pentium® 4 Processor Family ( ) The Intel® Xeon® Processor ( ) The Intel® Pentium® M Processor (2003-Current) The Intel® Pentium® Processor Extreme Edition ( ): First Intel 64bit The Intel® Core™ Duo and Intel® Core™ Solo Processors ( ) The Intel® Xeon® Processor 5100, 5300 Series and Intel® Core™2 Processor Family (2006-Current) The Intel® Xeon® Processor 5200, 5400, 7400 Series and Intel® Core™2 Processor Family (2007-Current) The Intel® Atom™ Processor Family (2008-Current)

Brief History Continued… The Intel® Core™i7 Processor Family (2008-Current) The Intel® Xeon® Processor 7500 Series (2010) 2010 Intel® Core™ Processor Family (2010) The Intel® Xeon® Processor 5600 Series (2010) Second Generation Intel® Core™ Processor Family (2011)

Block Diagram

Architecture Overview The processor instruction set architecture is designed to achieve a large degree of instruction level parallelism by using speculation and predication. There are two types of speculation: ◦ Control speculation is the execution of an operation before the condition governing its execution. ◦ Data speculation refers to the execution of a memory load prior to a store that might change its value.

Continued…. Predication is the conditional execution of instructions that removes branches used for conditional execution eliminating associated miss-predict penalties. Predication and speculation work together to give the compiler a significantly increased scheduling scope across which performance optimizations can be performed more effectively.

Continued…. The hardware resources are exposed to the compiler in this explicitly parallel instruction computing (EPIC) machine. Each memory load and store has a 2-bit cache hint field in which the compiler encodes its prediction of the spatial and temporal locality of the memory area being accessed. The processor uses this information to determine the placement of cache lines in the cache hierarchy.

Continued…. The branch prediction table (BPT) contains 512 entries and uses a two-level adaptive algorithm. Each entry tracks the four most recent occurrences of that branch. The processor hardware is organized into a ten-stage core pipeline, shown in figure below, that can execute up to six IA-64 instructions in parallel each clock.

Continued…. Ten-stage core pipeline

Continued…. The first three pipeline stages perform the instruction fetch and deliver the instructions into a decoupling buffer that enables the front-end of the machine to operate independently from the back-end, by allowing the front end to speculatively fetch ahead and by hiding the instruction cache and branch prediction latencies.

Continued…. Dispersal and register renaming are performed in the next two stages. Operand delivery is accomplished across the WLD and REG stages, where the register file is accessed and data is delivered through the bypass network after processing the predicate control. Finally, the last three stages perform the wide parallel execution followed by exception management and retirement.

Continued…. The processor has 15 execution units: four integer, four multimedia, two floating point, three branch and two load/store units. The processor implements 128 integer and 128 floating point registers, 64 one- bit predicate registers and eight branch registers.

Continued…. The processor includes three levels of cache organized in a hierarchical manner. The L1 and L2 caches are integrated on die. At the first level, instruction and data caches are split, each 16kB in size, four-way set- associative and with a 32-byte line size. The second level of cache is unified, 96 kB in size, six-way set-associative and with a 64- byte line size.

Continued…. The L3 cache contains up to 4 MB of custom designed on-cartridge memory and is accessed through a dedicated 128- bit BSB source-synchronous interface running at the processor’s core frequency. The L3 cache is four-way set-associative with a line size of 64 bytes.

Continued…. Both L2 and L3 caches are completely error correction code (ECC) protected. To enable software-guided optimal utilization of the cache hierarchy, the processor uses a set of memory locality hints that indicate the temporal locality of each access at each level of hierarchy. The processor uses these hints to determine the allocation and replacement strategies for each level of cache. In addition, the processor uses a bias hint to indicate that the software intends to modify the data of a given cache line.

Continued…. The floating-point unit delivers up to 6.4 Gflops and provides full support for single, double, extended, and mixed-mode precision computations. Parallel IA-64 FP instructions which operate on pairs of 32-bit single precision numbers increase the single-precision FP computation throughput. The chip also supports multimedia instructions that treat the general registers as vectors of eight 8-bit, four 16-bit, or two 32-bit elements.

Continued…. The floating-point execution units support simultaneous multiply-add to provide performance for scientific computation.

Continued…. The processor includes special hardware support to ensure full binary compatibility with the IA-32 instruction set. The dynamic out-of-order scheduler optimizes performance on legacy binaries. Shared caches and execution resources increase the area efficiency. The processor can run a mix of IA-32 and IA-64 applications on an IA-64 operating system, as well as IA-32 applications on an IA-32 operating system, in both uni-processor and multi- processor configurations.

Continued…. The processor implements a machine check architecture (MCA) that provides the ability to Continue, Recover, or Contain detected errors. All significant structures on the chip are protected by parity or ECC.

Basic Execution Environment Describes how the processor executes instructions and how it stores and manipulates data. The execution environment described here includes memory (the address space), general-purpose data registers, segment registers, the flag register, and the instruction pointer register.

MODES OF OPERATION The operating mode determines which instructions and architectural features are accessible The IA-32 architecture supports three basic operating modes: ◦ protected mode, ◦ real-address mode, and ◦ system management mode.

Protected mode This mode is the native state of the processor. Among the capabilities of protected mode is the ability to directly execute “real-address mode” 8086 software in a protected, multi- tasking environment. This feature is called virtual-8086 mode, although it is not actually a processor mode. Virtual-8086 mode is actually a protected mode attribute that can be enabled for any task.

Real-address mode This mode implements the programming environment of the Intel 8086 processor with extensions (such as the ability to switch to protected or system management mode). The processor is placed in real-address mode following power-up or a reset.

System management mode (SMM) This mode provides an operating system or executive with a transparent mechanism for implementing platform- specific functions such as power management and system security. The processor enters SMM when the external SMM interrupt pin (SMI#) is activated or an SMI is received from the advanced programmable interrupt controller (APIC).

SMM Continued…. In SMM, the processor switches to a separate address space while saving the basic context of the currently running program or task. SMM-specific code may then be executed transparently. Upon returning from SMM, the processor is placed back into its state prior to the system management interrupt. SMM was introduced with the Intel386™ SL and Intel486™ SL processors and became a standard IA-32 feature with the Pentium processor family.

Intel® 64 Architecture (2.2.10) Intel 64 architecture increases the linear address space for software to 64 bits and supports physical address space up to 40 bits. The technology also introduces a new operating mode referred to as IA-32e mode. IA-32e mode operates in one of two sub-modes: ◦ Compatibility mode enables a 64-bit operating system to run most legacy 32-bit software unmodified, ◦ 64-bit mode enables a 64-bit operating system to run applications written to access 64-bit address space.

In the 64-bit mode, Applications may access: 64-bit flat linear addressing 8 additional general-purpose registers (GPRs) 8 additional registers for streaming SIMD extensions (SSE, SSE2, SSE3 and SSSE3) 64-bit-wide GPRs and instruction pointers uniform byte-register addressing fast interrupt-prioritization mechanism a new instruction-pointer relative-addressing mode

Data Types

Reference Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 1: Basic Architecture pdf