© Michel Dubois, Murali Annavaram, Per Stenström. All rights reserved. Embedded Computer Architecture 5SAI0, Simulation - chapter 9 - Luc Waeijen 16 Nov.

Embedded Computer Architecture 5SAI0
Simulation - chapter 9 -
Luc Waeijen
16 Nov 2015

How to study a computer system
Methodologies
➢ Construct a hardware prototype
➢ Mathematical modeling
➢ Simulation

Construct a hardware prototype
Advantages
➢ Runs fast
Disadvantages
➢ Takes a long time to build
- RPM (Rapid Prototyping engine for Multiprocessors) at USC took a few graduate students several years
➢ Expensive
➢ Not flexible

Mathematically model the system
Use analytical modeling
➢ Probabilistic
➢ Queuing
➢ Markov
➢ Petri net
Advantages
➢ Very flexible
➢ Very quick to develop
➢ Runs quickly
Disadvantages
➢ Cannot capture the effects of system details
➢ Computer architects are skeptical of models

Simulation
Write a program that mimics system behavior
Advantages
➢ Very flexible
➢ Relatively quick to develop
Disadvantages
➢ Runs slowly (e.g., 30,000 times slower than hardware)

Most popular research method
Simulation is chosen by MOST research projects
Why?
➢ Mathematical models are NOT accurate enough
➢ Building a prototype is too time-consuming and too expensive for academic researchers

Simulation Bottleneck
➢ 1 GHz = 1 billion cycles per second
➢ Simulating one second of execution on a future machine = simulating 1B cycles!!
➢ Simulating 1 target cycle = 30,000 cycles on the host
➢ 1 second of target execution = 30,000 seconds on the host = 8.3 hours
➢ CPU2K (SPEC CPU2000) benchmarks run for a few hours natively
➢ Speed is much worse when simulating CMP (chip multiprocessor) targets!!

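The arithmetic on the Simulation Bottleneck slide can be reproduced in a few lines. This is a minimal sketch: the function name `host_seconds` and its parameters are illustrative, and it assumes the host also runs at 1 GHz, which the slide implies by equating 30,000 host cycles per target cycle with 30,000 host seconds per target second.

```python
def host_seconds(target_seconds, target_hz, slowdown, host_hz):
    """Host wall-clock seconds needed to simulate `target_seconds` of target time.

    slowdown: host cycles spent per simulated target cycle (30,000 on the slide).
    host_hz:  host clock frequency (assumed here, not stated on the slide).
    """
    target_cycles = target_seconds * target_hz   # cycles the target would execute
    host_cycles = target_cycles * slowdown       # cycles the host must spend
    return host_cycles / host_hz


secs = host_seconds(target_seconds=1, target_hz=1e9, slowdown=30_000, host_hz=1e9)
print(f"{secs:,.0f} s = {secs / 3600:.1f} h")   # prints "30,000 s = 8.3 h"
```

Under the same assumptions, a benchmark that runs natively for a few hours would take a few times 30,000 hours to simulate in full detail, which is why the bottleneck matters.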


How to overcome the simulation bottleneck
Trade accuracy for simulation speed. Abstraction levels, from most detail (slowest) to least detail (fastest):
➢ Gate level (RTL)
➢ Cycle accurate
➢ Functional level (ISA)
➢ Model-based approximation
This trade-off has resulted in a plethora of simulators

Tool classification: OS code execution
➢ System-level (complete system)
- Does simulate the behavior of an entire computer system, including OS and user code
- Examples:
– Simics
– SimOS
➢ User-level
- Does NOT simulate OS code
- Does emulate system calls
- Examples:
– SimpleScalar
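The user-level idea (simulate user instructions, but let the host service system calls) can be sketched with a toy interpreter. Everything here is hypothetical: the three-instruction ISA, the `write_int` and `exit` call names, and the function `run_user_level` are invented for illustration and do not correspond to SimpleScalar's actual interface.

```python
import io

# Toy program: compute 2 + 40, print it through a system call, then exit.
PROGRAM = [
    ("li", "a0", 2),
    ("li", "a1", 40),
    ("add", "a0", "a0", "a1"),
    ("syscall", "write_int"),
    ("syscall", "exit"),
]


def run_user_level(program, out):
    """Interpret a toy program; 'syscall' is serviced by the host, not simulated."""
    regs = {"a0": 0, "a1": 0}
    pc = 0
    simulated = 0  # user instructions whose execution was actually simulated
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "li":                      # load immediate
            regs[args[0]] = args[1]
            simulated += 1
        elif op == "add":                   # register-register add
            regs[args[0]] = regs[args[1]] + regs[args[2]]
            simulated += 1
        elif op == "syscall":
            # Emulation: no OS code is simulated. The simulator performs the
            # effect of the call directly on the host and returns to user code.
            if args[0] == "write_int":
                out.write(f"{regs['a0']}\n")
            elif args[0] == "exit":
                break
            else:
                raise ValueError(f"unknown system call {args[0]!r}")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs, simulated


buffer = io.StringIO()
_, count = run_user_level(PROGRAM, buffer)
print(buffer.getvalue().strip(), count)     # prints "42 3"
```

A system-level simulator would instead execute the kernel's own instructions for the write, so any time spent in OS code would show up in the results.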

Tool classification: Simulation detail
➢ Instruction set
- Does simulate the function of instructions
- Does NOT model detailed micro-architectural timing
- Examples:
– Simics
➢ Micro-architecture
- Does clock-cycle-level simulation
- Does speculative, out-of-order multiprocessor timing simulation
- May NOT implement the functionality of the full instruction set or of any devices
- Examples:
– SimpleScalar
➢ RTL
- Does logic gate-level simulation
- Examples:
– Synopsys

Tool classification: Simulation input
➢ Trace-driven
- Simulator reads a “trace” of instructions captured during a previous execution by software/hardware
- Easy to implement, no functional component needed
- Large trace size; no branch prediction (the trace holds only the path that actually executed, so wrong-path speculative execution cannot be modeled)
➢ Execution-driven
- Simulator “runs” the program, generating a trace on-the-fly
- More difficult to implement, but has many advantages
- Interpreter, direct-execution
- Examples:
– Simics, SimpleScalar…
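The difference between the two input styles can be sketched with one timing model fed in two ways. This is an illustrative toy, not the structure of any real simulator: the ISA, the per-opcode latencies in `LATENCY`, and the names `execute` and `timing` are all assumptions.

```python
# Toy loop: r1 = 3; do { r1 -= 1 } while (r1 != 0)
PROGRAM = [
    ("li", "r1", 3),
    ("addi", "r1", "r1", -1),
    ("bnez", "r1", 1),          # branch back to index 1 while r1 != 0
]
LATENCY = {"li": 1, "addi": 1, "bnez": 2}   # assumed cycles per opcode


def execute(program):
    """Functional model: run the program, yielding one record per executed instruction."""
    regs = {}
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        next_pc = pc + 1
        if op == "li":
            regs[args[0]] = args[1]
        elif op == "addi":
            regs[args[0]] = regs[args[1]] + args[2]
        elif op == "bnez":
            if regs[args[0]] != 0:
                next_pc = args[1]
        else:
            raise ValueError(f"unknown opcode {op!r}")
        yield (pc, op)
        pc = next_pc


def timing(trace, latency):
    """Timing model: needs only the instruction stream, not register values."""
    return sum(latency[op] for _pc, op in trace)


# Trace-driven: the trace was captured earlier and stored (here, in a list).
stored_trace = list(execute(PROGRAM))
print(len(stored_trace), timing(stored_trace, LATENCY))   # prints "7 10"

# Execution-driven: records are generated on the fly and never stored.
print(timing(execute(PROGRAM), LATENCY))                   # prints "10"
```

Even this three-instruction program produces seven trace records, which hints at why stored traces grow large. The stored trace also contains only the instructions that actually executed, so a timing model reading it has nothing to fetch down a mispredicted path.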


Interval Simulation


Multi-Core Simulation
Sequential simulation
➢ All target cores are simulated in one thread (on one host core)
➢ Unified memory hierarchy models simulate resource contention
Parallel simulation
➢ Each target core is simulated in a separate thread
The number of target cores is independent of the number of cores on the host! The host core count affects only simulation speed.
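The two schemes above can be sketched as follows. This is a deliberately simplified illustration: `TargetCore`, `SharedMemoryModel`, and the rule that a core touches memory every fourth cycle are all invented for the example. The parallel version also omits the synchronization of simulated time between threads that a real parallel simulator needs.

```python
import threading


class SharedMemoryModel:
    """Single memory model shared by all target cores (counts accesses only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.accesses = 0

    def access(self):
        with self._lock:            # needed once several host threads call in
            self.accesses += 1


class TargetCore:
    """Toy target core: issues one memory access every fourth cycle."""

    def __init__(self, memory):
        self.memory = memory
        self.cycles = 0

    def step(self):
        self.cycles += 1
        if self.cycles % 4 == 0:
            self.memory.access()


def simulate_sequential(n_cores, n_cycles):
    """One host thread advances every target core, cycle by cycle."""
    memory = SharedMemoryModel()
    cores = [TargetCore(memory) for _ in range(n_cores)]
    for _ in range(n_cycles):
        for core in cores:
            core.step()
    return memory.accesses


def simulate_parallel(n_cores, n_cycles):
    """One host thread per target core; the host OS schedules the threads."""
    memory = SharedMemoryModel()
    cores = [TargetCore(memory) for _ in range(n_cores)]

    def run(core):
        for _ in range(n_cycles):
            core.step()

    threads = [threading.Thread(target=run, args=(core,)) for core in cores]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return memory.accesses


# 16 target cores are simulated regardless of how many cores the host has.
print(simulate_sequential(16, 100), simulate_parallel(16, 100))   # prints "400 400"
```

Both calls simulate the same 16-core target. Only the number of host threads differs, so the host's core count can change how long the run takes but not which target is modeled.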