Niagara: a 32-Way Multithreaded SPARC Processor


Niagara: a 32-Way Multithreaded SPARC Processor
P. Kongetira, K. Aingaran, K. Olukotun
Sun Microsystems
Presented by Bogdan Romanescu

Goal
Commercial server applications:
- High thread-level parallelism (TLP)
  - Large numbers of parallel client requests
- Low instruction-level parallelism (ILP)
  - High cache miss rates
  - Many unpredictable branches
  - Frequent load-load dependencies
Power, cooling, and space are major concerns for data centers

Sun’s Solution
UltraSPARC T1 processor: “the highest-throughput and most eco-responsible processor ever created”
- Multicore
- Fine-grain multithreading within each core
- Simple pipelines
- Small L1 caches
- Shared L2
Metric: performance/Watt

Architecture

SPARC pipe
- UltraSPARC II style
- Single issue
- 6 stages: F, S, D, E, M, W (fetch, thread select, decode, execute, memory, write back)
- Shared units: L1 $, TLB, X units, pipe registers
- Hazards: data, structural

Integer register file
- One register file per thread
- SPARC window: in, out, local registers
- Highly integrated cell structure to support 4 threads:
  - 8 windows of 32 locations per thread
  - 3 read ports + 2 write ports
  - Read/write: single-cycle latency
  - 1 active window cell (copy of the window held in the architectural set)
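The window overlap behind the "in, out, local" registers can be sketched in a few lines. This is a minimal model of generic SPARC register windowing, not of Niagara's actual register-file cell: the function name `physical_index` and the flat slot layout are illustrative assumptions, and the sketch assumes that moving to the next window increments the window pointer.

```python
NWINDOWS = 8  # windows per thread, as on the slide


def physical_index(cwp, reg):
    """Map architectural register r0..r31 in window `cwp` to a storage slot.

    Hypothetical layout for illustration only:
      r0-r7   globals (not windowed)
      r8-r15  outs    (shared with the ins of the next window)
      r16-r23 locals  (private to the window)
      r24-r31 ins     (shared with the outs of the previous window)
    """
    if not 0 <= reg < 32:
        raise ValueError("register number must be in 0..31")
    if not 0 <= cwp < NWINDOWS:
        raise ValueError("window pointer out of range")
    if reg < 8:
        return ("global", reg)
    if reg < 16:
        # Outs live in the 'ins' slots of the next window (wrapping around).
        return ("windowed", ((cwp + 1) % NWINDOWS) * 16 + (reg - 8))
    if reg < 24:
        return ("windowed", cwp * 16 + 8 + (reg - 16))
    return ("windowed", cwp * 16 + (reg - 24))


# The caller's first out register (r8) and the callee's first in register (r24)
# refer to the same slot, which is how arguments are passed without copying.
caller_out = physical_index(0, 8)
callee_in = physical_index(1, 24)
print(caller_out == callee_in)
```

In this layout each window contributes 16 private slots (8 ins plus 8 locals), so the 8 windows wrap around in a ring and the last window's outs alias the first window's ins.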

Thread scheduling
- Thread selection based on:
  - Previous long-latency instruction in the pipe
  - Instruction type
  - LRU status
- Select & Fetch stages coupled
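A rough sketch of how such a selection policy might behave: each cycle, pick the least recently used thread among those not waiting on a long-latency operation. The class `ThreadSelect`, its methods, and the cycle-based stall bookkeeping are assumptions made for illustration; the slide does not give the hardware's exact priority rules, and the instruction-type criterion is left out.

```python
from collections import deque


class ThreadSelect:
    """Toy thread-select stage: LRU choice among threads that are ready."""

    def __init__(self, n_threads=4):
        # Leftmost entry is the least recently selected thread.
        self.lru = deque(range(n_threads))
        # Cycle at which each thread becomes ready again (0 = ready now).
        self.stalled_until = [0] * n_threads

    def stall(self, tid, until_cycle):
        """Mark a thread as waiting on a long-latency operation (e.g. a miss)."""
        self.stalled_until[tid] = until_cycle

    def select(self, cycle):
        """Return the thread to issue this cycle, or None if all are stalled."""
        for tid in list(self.lru):
            if self.stalled_until[tid] <= cycle:
                # Move the chosen thread to the most-recently-used position.
                self.lru.remove(tid)
                self.lru.append(tid)
                return tid
        return None


sched = ThreadSelect(4)
first_round = [sched.select(c) for c in range(4)]  # round-robin when all ready
sched.stall(0, until_cycle=10)                     # thread 0 misses in the cache
next_pick = sched.select(4)                        # thread 0 is skipped
print(first_round, next_pick)
```

When every thread is ready this degenerates to round-robin, and a stalled thread simply drops out of the rotation until its stall expires, which is the basic idea of hiding miss latency with fine-grain multithreading.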

Memory
- L1 I$: 16 KB, 4-way set associative, per core
- L1 D$: 8 KB, 4-way set associative, per core
- L2 $: 3 MB, 12-way set associative, shared
  - 4 x 750 KB independent banks
  - 2-cycle throughput, 8-cycle latency
  - Direct link to DRAM & JBus
  - Manages cache coherence for the 8 cores
  - CAM-based directory
- Write-through; allocate on LD, no-allocate on ST
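The write policy listed on this slide (write-through, allocate on load, no-allocate on store) can be illustrated with a toy cache model. Several things here are assumptions rather than facts from the slide: that the policy applies to the 8 KB L1 D$, the 16-byte line size, and the LRU replacement used for simplicity. The class name `WriteThroughCache` is hypothetical.

```python
class WriteThroughCache:
    """Toy set-associative cache: write-through, allocate on load only."""

    def __init__(self, size_bytes=8 * 1024, ways=4, line_bytes=16):
        # line_bytes=16 is an assumed value; the slide does not state it.
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (ways * line_bytes)
        # Each set is a list of tags, most recently used last (assumed LRU).
        self.sets = [[] for _ in range(self.num_sets)]
        self.next_level_writes = 0

    def _locate(self, addr):
        line = addr // self.line_bytes
        return line % self.num_sets, line // self.num_sets

    def load(self, addr):
        """Return True on a hit; on a miss, allocate the line."""
        index, tag = self._locate(addr)
        tags = self.sets[index]
        if tag in tags:
            tags.remove(tag)
            tags.append(tag)
            return True
        if len(tags) == self.ways:
            tags.pop(0)  # evict the least recently used line
        tags.append(tag)
        return False

    def store(self, addr):
        """Return True on a hit; a miss does NOT allocate a line."""
        index, tag = self._locate(addr)
        self.next_level_writes += 1  # write-through: every store goes to L2
        return tag in self.sets[index]


cache = WriteThroughCache()
store_miss = cache.store(0x1000)   # miss, and no line is allocated
load_after = cache.load(0x1000)    # still a miss, now the line is allocated
load_again = cache.load(0x1000)    # hit
print(cache.num_sets, store_miss, load_after, load_again)
```

Because stores never allocate and always propagate to the next level, the L1 never holds dirty data, which keeps the shared L2 the single point where coherence has to be managed.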

Performance

Test \ Architecture                                        Sun Fire T2000   IBM p5-550 (2 dual-core Power5 chips)   Dell PowerEdge
SPECjbb2005 (Java server software), business ops/sec       63,378           61,789                                  24,208 (SC1425 with dual single-core Xeon)
SPECweb2005 (Web server performance)                       14,001           7,881                                   4,850 (2850 with two dual-core Xeon processors)
NotesBench (Lotus Notes performance)                       16,061           14,740                                  -
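To make the comparison easier to read, the scores can be expressed as ratios relative to the Sun Fire T2000. The short sketch below uses only the numbers from the slide; the dictionary keys and the helper name `t2000_advantage` are illustrative, and no power figures are included because the slide gives none.

```python
# Benchmark scores copied from the slide (higher is better).
RESULTS = {
    "SPECjbb2005": {"T2000": 63378, "p5-550": 61789, "PowerEdge": 24208},
    "SPECweb2005": {"T2000": 14001, "p5-550": 7881, "PowerEdge": 4850},
    "NotesBench": {"T2000": 16061, "p5-550": 14740},
}


def t2000_advantage(results):
    """Return T2000 score divided by each competitor's score, per benchmark."""
    ratios = {}
    for bench, scores in results.items():
        base = scores["T2000"]
        ratios[bench] = {
            system: base / score
            for system, score in scores.items()
            if system != "T2000"
        }
    return ratios


ratios = t2000_advantage(RESULTS)
for bench, per_system in ratios.items():
    for system, ratio in per_system.items():
        print(f"{bench}: T2000 is {ratio:.2f}x the {system} score")
```

These are raw throughput ratios only; the deck's stated metric is performance per Watt, which would additionally require the power draw of each system.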

“Home run”?
- Relatively slow single-thread performance
- Poor floating-point performance
- Lack of software support (Sun Fire T2000 does not support Linux or Windows)
- Price
- Concurrency counterattack
- No place as a general-purpose computer running databases
- Small low-end market segment?
Niagara II & the “Rock”: multiprocessor & enhanced single-thread support

References
[1] P. Kongetira, et al., “Niagara: A 32-Way Multithreaded SPARC Processor,” IEEE Micro, vol. 25, pp. 21-29, Mar. 2005.
[2] A. S. Leon, et al., “A Power-Efficient High-Throughput 32-Thread SPARC Processor,” ISSCC 2006, Session 5, Processors.
[3] S. Chaudhry, S. Yip, P. Caprioli, and M. Tremblay, “High Performance Throughput Computing,” IEEE Micro, vol. 25, issue 3, 2005.
[4] http://opensparc.sunsource.net/nonav/opensparct1.html
[5] http://www.sun.com/processors/UltraSPARC-T1/features.xml
[6] http://www.sun.com/servers/coolthreads/t1000/benchmarks.jsp
[7] http://news.com.com/Sun+begins+Sparc+phase+of+server+overhaul/2163-1010_3-5983365.html
[8] http://h71028.www7.hp.com/ERC/cache/280124-0-0-0-121.html