University of Maryland: Profile-Driven Selective Program Loading. Tugrul Ince, Jeff Hollingsworth. Department of Computer Science, University of Maryland.


Profile-Driven Selective Program Loading
Tugrul Ince, Jeff Hollingsworth
Department of Computer Science, University of Maryland, College Park, MD 20742

Motivation
Programs are getting larger!
  – Many frameworks and libraries
Many supercomputers lack demand paging
  – Example: Cray XT and BlueGene series
  – Available memory is scarce
Observation: Most programs do not use every available function!
  – Frameworks and libraries are too general
  – Code that handles errors or special cases
Why not remove functions that are not used in the common case?

Aim
Reduce memory footprint by selectively loading parts of shared libraries

Target Platforms and Applications
Unix/Linux systems that support ELF
  – Modifies ELF program headers
Applications with many libraries
  – Most current reasonable applications
Parallel programs running on multiple nodes
  – MPI etc.
Platforms without demand paging
  – Cray XT and BlueGene series

Architecture Overview
The application is profiled.
It is rewritten with
  – Modified shared libraries
  – A signal handler
The application is executed as usual.

Profiler
Need a list of never-called functions in each shared library
  – Profile the application several times
  – May not be perfect
DynInst-based profiler
  – Write small program (~70 LOC)
  – Rewrite shared libraries
  – Profile as many times as necessary
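Combining several profiling runs into a list of never-called functions can be sketched as follows. This is a minimal illustration, assuming each run yields the set of function names that were called; `never_called` and the sample function names are hypothetical and not part of the DynInst-based profiler.

```python
def never_called(all_functions, runs):
    """Return the functions of one library that no profiling run ever called."""
    called = set()
    for run in runs:
        called |= set(run)  # a function counts as used if any run called it
    return sorted(set(all_functions) - called)


# Hypothetical data for one shared library and three profiling runs.
all_functions = ["solve", "setup", "print_help", "handle_error"]
runs = [{"solve", "setup"}, {"solve"}, {"solve", "print_help"}]
unused = never_called(all_functions, runs)
print(unused)
```

More runs can only shrink the unused list, which matches the slide's caveat that the profile may not be perfect.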

Rewriting
Do not load unused functions
  – Modify ELF program headers
  – Example: libpetsc.so

Program Headers (original):
  Type       Offset    VirtAddr  PhysAddr  FileSiz  MemSiz   Flg  Align
  LOAD       …         …         …         …        …        R E  0x1000
  LOAD       …         …         …         …013f8   0x0a434  RW   0x1000
  DYNAMIC    0x12459c  0x…c      0x…c      …        …00130   RW   0x4
  GNU_STACK  …         …         …         …        …00000   RW   0x4
First Loadable Section: .text, .init, .fini, .plt
Second Loadable Section: .dynamic, .got, .got.plt, .data, .bss

Program Headers (rewritten, shown in place of the first LOAD entry):
  Type       Offset    VirtAddr  PhysAddr  FileSiz  MemSiz   Flg  Align
  LOAD       …         …         …         …        …        R E  0x1000
  LOAD       …         …         …         …        …        R E  0x1000
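The effect of the header change can be sketched as splitting one executable LOAD entry into two that skip a range of unused code. This is an illustration under assumptions, not the authors' rewriter: the `Load` record and `split_load` are hypothetical names, the segment is assumed to start on a page boundary, and file size and memory size are assumed equal (as is typical for a text segment).

```python
from dataclasses import dataclass

PAGE = 0x1000  # alignment of the LOAD entries shown on the slide


@dataclass
class Load:
    offset: int  # file offset of the segment
    vaddr: int   # virtual address it is mapped at
    filesz: int  # number of bytes taken from the file
    flags: str


def split_load(seg, hole_start, hole_end):
    """Replace one LOAD entry by entries that skip file range [hole_start, hole_end)."""
    lo = -(-hole_start // PAGE) * PAGE  # round the hole start up to a page boundary
    hi = hole_end // PAGE * PAGE        # round the hole end down
    end = seg.offset + seg.filesz
    lo, hi = max(lo, seg.offset), min(hi, end)
    if lo >= hi:                        # hole smaller than a page: nothing to drop
        return [seg]
    pieces = [
        Load(seg.offset, seg.vaddr, lo - seg.offset, seg.flags),
        Load(hi, seg.vaddr + (hi - seg.offset), end - hi, seg.flags),
    ]
    return [p for p in pieces if p.filesz > 0]


# Hypothetical text segment with unused code between 0x2800 and 0x9400.
text = Load(offset=0x0, vaddr=0x0, filesz=0x10000, flags="R E")
parts = split_load(text, 0x2800, 0x9400)
for p in parts:
    print(hex(p.offset), hex(p.vaddr), hex(p.filesz), p.flags)
```

Only whole pages can be left out, so the hole is shrunk to page boundaries before the entry is split.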

Rewriting
Rewriter based on DynInst
Profile data is used to create lists of Used and Unused functions
Access / modify symbols
Defragment functions to maximize space savings
  – Requires moving functions inside shared libraries

Function Defragmentation
[Figure: functions in a shared library marked Used or Unused]
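A toy model of defragmentation, assuming functions can be moved freely and ignoring any alignment requirements: used functions are packed at the start of the text region so that they touch as few pages as possible. The function names, sizes, and the helpers `pages` and `defragment` are all hypothetical.

```python
PAGE = 0x1000


def pages(ranges):
    """Set of page numbers touched by (start, size) ranges."""
    touched = set()
    for start, size in ranges:
        touched.update(range(start // PAGE, (start + size - 1) // PAGE + 1))
    return touched


def defragment(functions, used):
    """functions: list of (name, start, size) in address order.
    Returns name -> new start, with used functions packed first."""
    cursor = functions[0][1]
    layout = {}
    for want_used in (True, False):  # used functions first, then unused ones
        for name, _start, size in functions:
            if (name in used) == want_used:
                layout[name] = cursor
                cursor += size
    return layout


funcs = [("a", 0x0000, 0x400), ("b", 0x0400, 0x1000), ("c", 0x1400, 0x400),
         ("d", 0x1800, 0x1000), ("e", 0x2800, 0x400)]
used = {"a", "c", "e"}
layout = defragment(funcs, used)
before = pages((s, n) for name, s, n in funcs if name in used)
after = pages((layout[name], n) for name, _s, n in funcs if name in used)
print(len(before), len(after))
```

In this made-up layout the used functions are spread over three pages before the move and fit in one page afterwards.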

Challenges: Relative Calls
Common way of calling functions in PIC.
If either the callee or the caller is moved, their relative positioning changes.
Offsets in such relative call instructions need to be updated.
[Figure: a call to foo with offset d before the move and offset d' after it]
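The offset update can be illustrated for the x86 near call (opcode 0xE8 followed by a signed 32-bit displacement measured from the end of the 5-byte instruction). The helper names `call_target` and `retarget_call` are hypothetical, and the addresses are made up; this is a sketch of the arithmetic, not the rewriter's code.

```python
import struct

CALL_LEN = 5  # x86 "call rel32": one opcode byte plus a 4-byte displacement


def call_target(call_addr, code):
    """Decode the absolute target of a call rel32 located at call_addr."""
    assert code[0] == 0xE8
    (disp,) = struct.unpack("<i", code[1:5])
    return call_addr + CALL_LEN + disp


def retarget_call(new_call_addr, new_target):
    """Encode a call rel32 at new_call_addr that reaches new_target."""
    disp = new_target - (new_call_addr + CALL_LEN)
    return b"\xe8" + struct.pack("<i", disp)


# Caller stays at 0x1000; callee foo moves from 0x2000 to 0x1800.
old_call = retarget_call(0x1000, 0x2000)  # offset d
new_call = retarget_call(0x1000, 0x1800)  # offset d'
print(old_call.hex(), new_call.hex())
```

The same function covers a moved caller: pass the caller's new address and the callee's (possibly unchanged) address.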

Challenges: Symbols
Runtime linker uses symbols to resolve cross-library calls.
  – Uses procedure linkage tables (PLT)
If a function is moved, its associated symbol has to be updated.
[Figure: "call foo" resolving to foo at 0xdeadbeef before the move and at 0xbeefdead after it]
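As a rough sketch, the symbol update amounts to rewriting the value of every symbol whose function was moved. The symbol table is modeled here as a list of dictionaries, which is a simplification of the real ELF symbol table; `update_symbols` is a hypothetical helper, and the addresses are the ones from the slide.

```python
def update_symbols(symtab, moved):
    """Return a copy of the symbol table with values of moved functions rewritten.

    symtab: list of dicts with 'name' and 'value' keys.
    moved: mapping from old function address to new function address.
    """
    return [dict(sym, value=moved.get(sym["value"], sym["value"])) for sym in symtab]


symtab = [{"name": "foo", "value": 0xDEADBEEF}, {"name": "bar", "value": 0x5000}]
new_symtab = update_symbols(symtab, {0xDEADBEEF: 0xBEEFDEAD})
print([(s["name"], hex(s["value"])) for s in new_symtab])
```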

Challenges: Jump Tables
Used to represent n-way branches at machine level
Targets are read from the jump table
  – Entries are offsets of targets from the GOT address
Entries become invalid if the function referenced in a jump table is moved
DynInst reads jump tables to generate CFGs
We update entries so that they point to the new location of targets
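The entry update can be sketched as decoding each entry into an absolute target, shifting it by the displacement of the function that contains it, and re-encoding it relative to the GOT. The sketch assumes the GOT itself does not move; `relocate`, `fix_jump_table`, and all addresses are hypothetical.

```python
def relocate(addr, moves):
    """Map an address inside a moved function to its new location."""
    for old_start, size, new_start in moves:
        if old_start <= addr < old_start + size:
            return addr - old_start + new_start
    return addr  # not inside any moved function


def fix_jump_table(entries, got, moves):
    """entries: offsets of branch targets from the GOT address.
    moves: list of (old_start, size, new_start) for moved functions."""
    fixed = []
    for entry in entries:
        target = got + entry                # address the entry currently encodes
        target = relocate(target, moves)    # follow the function to its new place
        fixed.append(target - got)          # re-encode relative to the GOT
    return fixed


got = 0x20000
moves = [(0x5000, 0x200, 0x1000)]  # function at 0x5000 moved to 0x1000
table = [0x5010 - got, 0x5040 - got, 0x7000 - got]
print(fix_jump_table(table, got, moves))
```

Entries whose targets lie in functions that were not moved come back unchanged.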

Unexpectedly Called Function
Execution is not always predictable
  – Unexpected function calls
Rewrite original executable with a signal handler
Load the function upon an unexpected call
  – Signal handler picks up page faults (SIGSEGV)
  – Loads requested page on demand
  – Execution resumes
User-level: No OS modifications
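The bookkeeping behind on-demand loading can be simulated in a few lines. This is only a toy model: the real mechanism is a SIGSEGV handler installed in the rewritten executable, whereas here a dictionary lookup stands in for the fault. The class `LazyText` and its fields are hypothetical.

```python
PAGE = 0x1000


class LazyText:
    """Toy model of a text segment whose pages are loaded on first use."""

    def __init__(self, image, preloaded):
        self.image = image  # full library contents (bytes)
        self.pages = {}     # page number -> bytes currently in memory
        self.faults = 0
        for page in preloaded:
            self._load(page)

    def _load(self, page):
        self.pages[page] = self.image[page * PAGE:(page + 1) * PAGE]

    def fetch(self, addr):
        page = addr // PAGE
        if page not in self.pages:  # stands in for SIGSEGV reaching the handler
            self.faults += 1
            self._load(page)        # handler loads the requested page
        return self.pages[page][addr % PAGE]  # execution resumes


# Four-page image in which every byte holds its own page number.
image = bytes(p for p in range(4) for _ in range(PAGE))
text = LazyText(image, preloaded=[0])
first = text.fetch(0x0010)   # page 0 is already loaded
second = text.fetch(0x2004)  # page 2 is loaded on demand
third = text.fetch(0x2008)   # page 2 is now resident
print(first, second, third, text.faults)
```

Each missing page costs one fault the first time it is touched and nothing afterwards.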

Experiments
Tested on
  – PETSc ex5 in snes package
  – PETSc ex2 in ksp package
  – GS2
Compiled with debug flag and no optimization
Used Open MPI
Tested on 64-node cluster at UMD
  – Dual-core x86 processors
  – Unmodified Linux kernel
Space savings of about 82% on average

PETSc – snes (ex5)
  Library Name   Text Pages (Original)   Text Pages (Modified)   Reduction %
  petsc          …                       …                       …
  petscdm        …                       …                       …
  petscksp       …                       …                       …
  petscmat       …                       …                       …
  petscvec       …                       …                       …
  petscsnes      20 0 …
  mpi_cxx        10                      5                       50
  mpi            …                       …                       …
  open-pal       …                       …                       …
  open-rte       …                       …                       …
  m              …                       …                       …
  X              …                       …                       …
  lapack         …                       …                       …
  blas           …                       …                       …
  stdc           …                       …                       …
  gcc_s          …                       …                       …
  Xau            2                       2                       0
  Xdcm           3                       3                       0
  gfortran       …                       …                       …
  dl             2                       2                       0
  nsl            …                       …                       …
  util           2                       2                       0
  OVERALL        …                       …                       …

PETSc – ksp (ex2)
  Library Name   Text Pages (Original)   Text Pages (Modified)   Reduction %
  petsc          …                       …                       …
  petscdm        …                       …                       …
  petscksp       …                       …                       …
  petscmat       …                       …                       …
  petscvec       …                       …                       …
  mpi_cxx        10                      5                       50
  mpi            …                       …                       …
  open-pal       …                       …                       …
  open-rte       …                       …                       …
  OVERALL        …                       …                       …

GS2
  Library Name   Text Pages (Original)   Text Pages (Modified)   Reduction %
  MdsLib         …                       …                       …
  MdsShr         …                       …                       …
  TdiShr         …                       …                       …
  TreeShr        …                       …                       …
  fftw           …                       …                       …
  rfftw          …                       …                       …
  mpi_f          …                       …                       …
  mpi            …                       …                       …
  open-pal       …                       …                       …
  open-rte       …                       …                       …
  OVERALL        …                       …                       …

Running Times
GS2 takes 5 seconds less on average
  – (36m 38s vs. 36m 33s)
Overhead on PETSc examples
  – ex2 runs for 2.7 secs, ex5 runs for 1.05 secs

Running Times
Results suggest no overhead for reasonably long-running programs
  – Initial cost for signal handler registration
  – Better instruction cache and TLB performance

Summary
Our tool reduces the memory footprint of shared libraries
Rewrite shared libraries with holes
  – Defragment functions to maximize space savings
On-demand page loading if a not-yet-loaded function is called
About 82% memory space savings for shared libraries
Might improve instruction cache and TLB performance