© 2004 Wayne Wolf

Memory system optimizations
- Strictly software:
  - Effectively using the cache and partitioned memory.
- Hardware + software:
  - Scratch-pad memories.
  - Custom memory hierarchies.

Taxonomy of memory optimizations (Wolf/Kandemir)
- Data vs. code.
- Array/buffer vs. non-array.
- Cache/scratch pad vs. main memory.
- Code size vs. data size.
- Program vs. process.
- Languages.

Software performance analysis
- Worst-case execution time (WCET) analysis (Li/Malik):
  - Find the longest path through the CDFG.
  - Can use annotations of branch probabilities.
  - Can be mapped onto cache lines.
  - Difficult in practice: must analyze optimized code.
- Trace-driven analysis:
  - Well understood.
  - Requires code and input vectors.
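The core step of the path-based WCET analysis above can be sketched as a longest-path computation over an acyclic CDFG. This is a minimal illustration, not the cited Li/Malik method: block names and cycle costs are invented, and a real analysis must also bound loops and model the cache and pipeline.

```python
# Longest path through an acyclic control/data flow graph (CDFG),
# computed in topological order. Costs and block names are invented.
from graphlib import TopologicalSorter

def wcet_longest_path(cost, succ):
    """cost: basic block -> worst-case cycles; succ: block -> successors."""
    preds = {n: [] for n in cost}
    for u, vs in succ.items():
        for v in vs:
            preds[v].append(u)
    longest = {}
    # TopologicalSorter takes a node -> predecessors mapping and yields
    # each node only after all of its predecessors.
    for n in TopologicalSorter(preds).static_order():
        longest[n] = cost[n] + max((longest[p] for p in preds[n]), default=0)
    return max(longest.values())

# Toy CDFG: a single if/else -- entry -> {then, else} -> exit.
cost = {"entry": 2, "then": 10, "else": 4, "exit": 1}
succ = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
print(wcet_longest_path(cost, succ))  # 13: entry + then + exit
```

Branch-probability annotations would prune paths that can never execute together; this sketch conservatively takes the maximum over all structural paths.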

Software energy/power analysis
- Analytical models of cache (Su/Despain, Kamble/Ghose, etc.):
  - Decoding, memory core, I/O path, etc.
- System-level models (Li/Henkel).
- Power simulators (Vijaykrishnan et al., Brooks et al.).
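The shape of such an analytical model can be sketched as per-access decode and array (core) energy plus off-chip I/O energy on misses. The coefficients below are illustrative placeholders, not values from any of the cited models.

```python
# Toy analytical cache-energy model: per-access decode + array energy,
# plus off-chip energy paid only on misses. All coefficients invented.
def cache_energy(accesses, miss_rate,
                 e_decode_nj=0.1, e_array_nj=0.4, e_offchip_nj=10.0):
    on_chip = accesses * (e_decode_nj + e_array_nj)   # every access
    offchip = accesses * miss_rate * e_offchip_nj     # misses only
    return on_chip + offchip                          # total energy, nJ

# A larger cache raises per-access array energy but cuts the miss rate:
small = cache_energy(1_000_000, 0.10, e_array_nj=0.4)
large = cache_energy(1_000_000, 0.02, e_array_nj=0.6)
print(small, large)  # the larger cache wins overall here
```

This also illustrates the trade-off noted on the next slide: the larger cache spends more energy per access but less in the memory system overall.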

Power-optimizing transformations
- Kandemir et al.:
  - Most energy is consumed by the memory system, not the CPU core.
  - Performance-oriented optimizations reduce memory system energy but increase datapath energy consumption.
  - Larger caches increase cache energy consumption but reduce overall memory system energy.

Caching for real-time systems
- Kirk and Strosnider: SMART (Strategic Memory Allocation for Real-Time) cache design.
  - The cache is divided into segments.
  - Critical processes get their own cache segments.
  - A hardware flag selects a private cache segment or the pooled cache.
  - A heuristic algorithm groups tasks into cache segments.
- Wolfe: software cache partitioning.
  - Map routines at link time to addresses that remove conflicts for critical routines.
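The link-time idea can be sketched as a set-overlap check: in a direct-mapped cache, code at two addresses conflicts exactly when the addresses map to the same cache sets. The cache geometry and routine layouts below are hypothetical, and this is an illustration of the principle rather than Wolfe's algorithm.

```python
# Link-time conflict check for critical routines in a direct-mapped
# instruction cache. Geometry and layouts are invented for illustration.
LINE = 32    # bytes per cache line
SETS = 256   # so cache size = LINE * SETS = 8 KB

def sets_used(base, size):
    """Cache sets touched by code occupying [base, base + size)."""
    first = base // LINE
    last = (base + size - 1) // LINE
    return {i % SETS for i in range(first, last + 1)}

def conflict_free(placement):
    """placement: routine -> (base, size).
    True if no two routines share a cache set."""
    seen = set()
    for base, size in placement.values():
        s = sets_used(base, size)
        if s & seen:
            return False
        seen |= s
    return True

# Two critical routines laid out back-to-back do not conflict...
ok = {"isr": (0x0000, 1024), "filter": (0x0400, 2048)}
# ...but a routine placed one cache-size (0x2000) away aliases the
# same sets as the first and would evict it.
bad = {"isr": (0x0000, 1024), "filter": (0x2000, 1024)}
print(conflict_free(ok), conflict_free(bad))  # True False
```

A linker using this check can slide non-critical code around until the critical routines' set ranges are disjoint.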

Scratch pad memories
- Explicitly managed local memory.
- Panda et al. used a static management scheme:
  - Data structures are assigned to off-chip memory or scratch pad at compile time.
  - Put scalars in the scratch pad, arrays in main memory.
- May want to manage the scratch pad at run time.
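A static scheme in this spirit can be sketched as a two-pass allocator: scalars go to the scratch pad first, then any remaining space is filled greedily with the arrays that have the most accesses per byte. Variable names, sizes, and access counts are invented; the cited work uses a more detailed cost model.

```python
# Static scratch-pad allocation sketch: scalars first, then arrays by
# access density, everything else off-chip. All inputs are invented.
def allocate(variables, spm_bytes):
    """variables: list of (name, size_bytes, access_count, is_scalar).
    Returns (scratchpad_vars, offchip_vars)."""
    spm, offchip, free = set(), set(), spm_bytes
    # Pass 1: every scalar that fits goes to the scratch pad.
    for name, size, _, is_scalar in variables:
        if is_scalar and size <= free:
            spm.add(name)
            free -= size
    # Pass 2: arrays in decreasing accesses-per-byte, while space lasts.
    arrays = [v for v in variables if not v[3]]
    for name, size, acc, _ in sorted(arrays, key=lambda v: v[2] / v[1],
                                     reverse=True):
        if size <= free:
            spm.add(name)
            free -= size
        else:
            offchip.add(name)
    return spm, offchip

vars_ = [("i", 4, 5000, True), ("sum", 4, 5000, True),
         ("coef", 64, 40000, False), ("frame", 4096, 20000, False)]
spm, off = allocate(vars_, 256)
print(spm, off)  # scalars plus the small, hot 'coef' array fit in 256 B
```

Because the assignment is fixed at compile time, no run-time management hardware or software is needed; the cost is that a working set larger than the scratch pad cannot be time-multiplexed into it.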

Reconfigurable caches
- Use the compiler to determine the best cache configuration for various program regions.
  - Must be able to quickly reconfigure the cache.
  - Must be able to identify where program behavior changes.

Software methods for cache placement
- McFarling analyzed inter-function dependencies.
- Tomiyama and Yasuura used ILP.
- Li and Wolf used a process-level model.
- Kirovski et al. use profiling information plus a graph model.
- Dwyer and Fernando use bit vectors to construct bounds on instruction caches.
- Parameswaran and Henkel use heuristics.
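These approaches differ in machinery (ILP, graph models, heuristics), but share one idea: use profile information to keep functions that alternate frequently from mapping onto the same cache sets. A minimal greedy sketch of that shared idea (not any one cited algorithm; the profile numbers are invented):

```python
# Greedy profile-driven placement: assign each function a cache region
# different from the regions of functions it conflicts with, when one
# is available. Conflict counts come from a (here invented) profile.
def place(conflicts, n_regions):
    """conflicts: (f, g) -> profiled transition count between f and g.
    Returns function -> cache region index."""
    funcs = sorted({f for pair in conflicts for f in pair})
    region = {}
    for f in funcs:
        # Regions already taken by functions f conflicts with.
        taken = {region[g] for g in region
                 if conflicts.get((f, g), 0) + conflicts.get((g, f), 0) > 0}
        avail = [r for r in range(n_regions) if r not in taken]
        region[f] = avail[0] if avail else 0  # fall back when regions run out
    return region

# 'fft' and 'windowing' alternate constantly; keep them apart.
profile = {("fft", "windowing"): 900, ("fft", "log"): 3}
print(place(profile, 2))
```

With more functions than regions this heuristic degrades gracefully by doubling up the lightest conflicts, which is where the ILP and graph-coloring formulations earn their keep.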

Addressing optimizations
- Addressing can be expensive:
  - 55% of DSP56000 instructions performed addressing operations in MediaBench.
- Utilize specialized addressing registers, pre/post-increment/decrement, etc.
  - Place variables in the proper order in memory so that simpler operations can be used to calculate the next address from the previous one.
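The effect of variable ordering can be sketched by counting explicit address computations: if consecutive accesses land on adjacent memory words, a post-increment or post-decrement updates the address register for free; any larger jump costs an extra address operation. The layouts and access sequence below are invented for illustration.

```python
# Count accesses that need an explicit address load, assuming the
# address register can only step by +/-1 word for free (post-inc/dec).
def extra_addr_ops(layout, accesses):
    """layout: variable order in memory; accesses: access sequence."""
    addr = {v: i for i, v in enumerate(layout)}
    return sum(1 for a, b in zip(accesses, accesses[1:])
               if abs(addr[a] - addr[b]) > 1)

seq = ["a", "b", "c", "b", "a"]
print(extra_addr_ops(["a", "b", "c"], seq))  # 0: every step is +/-1
print(extra_addr_ops(["a", "c", "b"], seq))  # 2: a->b and b->a jump by 2
```

Choosing the layout that minimizes this count over the program's access sequences is the offset-assignment problem that the compiler techniques above target.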