1 Allegro CL 9.0 Internals Duane Rettig ILC 2012.

Presentation transcript:

1 Allegro CL 9.0 Internals Duane Rettig ILC 2012

2 Overview
- SMP Design Internals
- Fasl reader rewrite in Lisp
- Debugger Enhancements

3 SMP Design Goals
- Low overhead for scalar operation
- Source compatible
- Warnings for unsafe usage
- Tools for SMP-enhanced programming

4 SMP: Goal Realization
Low Overhead
- 9.0 (non-SMP) is virtually the same as 8.2
- SMP has <10% hit over non-SMP (sometimes faster)
- Other Lisps have 15%-100% hit over 8.2 (YMMV)

5 SMP: Goal Realization
Source Compatible: Success
- Exceptions:
  - hash-tables
  - threads
- 4 new SMP runtime-lisp files (compare 28 for a new port)

6 SMP: Goal Realization
Warnings for unsafe usage
- smp-macros module (8.1 patch, integral in 8.2)
- not aggressive in looking for problems:
  - setf races
  - potential deadlocks
  - alists (but plists/getf are protected by convention)

7 SMP: Goal Realization
Tools for enhanced SMP programming
- synchronizing operators and locks
- separate GC

8 SMP: Implementation
- Deprecation of SMP-unsafe macros
- New macros
- Bindability and settability of specials
- Synchronization operators
- GC

9 SMP: Implementation
Deprecated SMP-unsafe macros
- without-interrupts, without-scheduling
- excl::fast / excl::atomically
- excl::*warn-smp-usage*

10 SMP: Implementation
New Macros
- fast-and-clean: replaces (excl::fast (excl::atomically...))
- with-pinned-objects: replaces (excl::fast (excl::atomically...))
- with-delayed-interrupts: replaces without-{interrupts,scheduling}
- defvar-nonbindable

11 SMP: Implementation
Bindability and Settability
- bindable, settable: defvar/defparameter
- not bindable, settable: defvar-nonbindable
- bindable, not settable: excl::defvar-nonsettable
- not bindable, not settable: defconstant

12 SMP: Implementation
Old (8.1) wide binding
[Figure: memory layout diagram. Labels that survive, in order of appearance: header, size, value, waste; symbol; header, size, global value, thread 1 locative, thread 2 locative, thread 3 locative, thread 4 locative; symbol locatives; header, size, value, waste; sv-vector; tag=type=#xb; tag=2/type=#x70; header, value, hash, function, name, plist.]

13 SMP: Implementation
New wide binding
[Figure: memory layout diagram. Labels that survive, in order of appearance: header, value, hash, function, name, plist, sv-vector, lock-index; header, value, symbol, function; symbol; header, size, symbol, thread 1 locative, thread 2 locative, thread 3 locative, thread 4 locative; symbol locatives; header, value, symbol, function; sv-vector; tag=type=#xb; tag=#xb/type=#x8b; tag=2/type=#x85.]

14 SMP: Implementation
Synchronization operators
- push-atomic/pop-atomic
- incf-atomic/decf-atomic
- update-atomic
- atomic-conditional-setf (implementor for above)
- atomic-conditional-setq (special operator)
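The operators on this slide are described as being built on a conditional store. As a rough analogy only, the following C11 sketch shows how a push and an increment can be expressed as compare-and-swap retry loops. The names `cell`, `push_atomic`, and `incf_atomic` are hypothetical and chosen to echo the slide; this is not Allegro CL's implementation or object layout.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical cons-like cell; not Allegro CL's object layout. */
struct cell {
    int value;
    struct cell *next;
};

/* Push `node` onto the list at `head`.
   The loop retries until the compare-and-swap installs `node`;
   a failed CAS refreshes `old` with the current head. */
void push_atomic(struct cell *_Atomic *head, struct cell *node)
{
    struct cell *old = atomic_load(head);
    do {
        node->next = old;
    } while (!atomic_compare_exchange_weak(head, &old, node));
}

/* Add `delta` to `*place` and return the new value.
   Same retry pattern: recompute from the refreshed `old` on failure. */
int incf_atomic(_Atomic int *place, int delta)
{
    int old = atomic_load(place);
    while (!atomic_compare_exchange_weak(place, &old, old + delta))
        ;
    return old + delta;
}
```

A matching pop is deliberately left out: in C, a CAS-based pop without garbage collection or another reclamation scheme is exposed to the ABA problem, which is outside what this slide covers.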

15 SMP: Implementation
Lower-level SMP operators
- get-atomic-modify-expansion
- excl::atomic-modify-form (may change!)
- excl::*force-csw-opcodes* (may change!)
- excl::defsetf-conditional (may change!)

16 SMP: Implementation
excl::defsetf-conditional built-in operator conversions
- excl::.inv-structure-ref
- excl::.inv-svref
- excl::.inv-car, excl::.inv-cdr
- excl::.inv-symbol-plist
- excl::.inv-global-symbol-value

17 SMP: Implementation
Lowest-level operators
- gc-setf-protect-atomic
- ll :cas low-level instruction form

18 SMP: Implementation
with-locked-object
- good on any lockable object
- lighter weight than process-lock
- use with care to avoid deadlocks
- special versions for structs and streams

19 SMP: Implementation
Other higher-level synchronizing tools
- sharable-locks
- barriers
- queues (uses process-lock; with-locked-object is lighter weight)
- condition-variables

20 SMP: Implementation: GC
- Runs in a separate (non-Lisp) thread
- Able to provide per-thread object allocation
  - Currently implemented:
    - conses: everywhere
    - floats: on x86-64
- Synchronizes with all threads
- Still written in C

21 GC states
- 0: Lisping: thread is running Lisp code; GC can't happen
- 1: Foreign: thread is running foreign code; GC can happen
- 2: Blocking GC: thread is running Lisp code; GC wants it to pause
- 3: Blocked by GC: thread is trying to get from foreign to Lisp; GC is running
- 4: Beside GC: thread is running foreign code and GC is happening

22 GC state diagram
[Figure: state diagram over the five GC states: Lisping (0), Foreign (1), Blocking GC (2), Blocked by GC (3), Beside GC (4). Transition labels, in order of appearance: thread goes lisping; thread goes foreign; GC Done; GC Starting; GC Starting (signal thread: GC waits); thread goes foreign; thread goes lisping (thread waits); GC Done (post thread); thread invokes GC; thread invokes GC (thread waits); (post gc).]
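The diagram's edges do not survive in this transcript, so the following C sketch is only one plausible reading, reconstructed from the state descriptions on slide 21 and the transition labels above. Which label connects which pair of states is an assumption, the "thread invokes GC" transitions are not modeled at all, and the names `gc_state`, `gc_event`, and `gc_next_state` are hypothetical.

```c
/* State numbers follow slide 21. */
enum gc_state {
    LISPING = 0,
    FOREIGN = 1,
    BLOCKING_GC = 2,
    BLOCKED_BY_GC = 3,
    BESIDE_GC = 4
};

/* Events named after the diagram's labels ("thread invokes GC" omitted). */
enum gc_event {
    THREAD_GOES_FOREIGN,
    THREAD_GOES_LISPING,
    GC_STARTING,
    GC_DONE
};

/* Return the next state, or -1 for a transition this sketch does not model.
   Every edge below is an assumed reading of the slide, not a confirmed one. */
int gc_next_state(enum gc_state s, enum gc_event e)
{
    switch (s) {
    case LISPING:
        if (e == THREAD_GOES_FOREIGN) return FOREIGN;
        if (e == GC_STARTING) return BLOCKING_GC;   /* GC waits for the thread */
        break;
    case FOREIGN:
        if (e == THREAD_GOES_LISPING) return LISPING;
        if (e == GC_STARTING) return BESIDE_GC;     /* GC may run alongside */
        break;
    case BLOCKING_GC:
        if (e == THREAD_GOES_FOREIGN) return BESIDE_GC;
        break;
    case BLOCKED_BY_GC:
        if (e == GC_DONE) return LISPING;           /* thread is released */
        break;
    case BESIDE_GC:
        if (e == THREAD_GOES_LISPING) return BLOCKED_BY_GC; /* thread waits */
        if (e == GC_DONE) return FOREIGN;
        break;
    }
    return -1;
}
```

Under this reading, a thread in foreign code never delays the collector, while a thread in Lisp code must reach a point where it can step aside before the GC proceeds.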

23 Fasl Reader
- Rewritten in Lisp runtime code 50% faster than C version
- fewer transitions to/from C
- Started in 8.2; used only to load source debug info
- Used exclusively in 9.0

24 Fasl coding example

C:

    case ff_complex: /* tos = imag, tos-1 = real */
      if (Building_BOTH) {
        LispVal comp = new_lisp_obj(TYPEcomplex, 0, 0); /* complex object is new */
        *(nat *)((nat)comp + c_imag_adj/PtrScale) = (nat)f_pop();
        *(nat *)((nat)comp + c_real_adj/PtrScale) = (nat)f_pop();
        f_push(comp);
      }
      break;

Lisp:

    (#.ff_complex
     (when (building-both)
       (let* ((imag (fasl-pop thread))
              (real (fasl-pop thread))
              (complex (q-qint-call sys::make-complex real imag)))
         (fasl-push complex thread))))
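Both versions of the `ff_complex` case rely on the same stack discipline: the imaginary part is on top of the stack and the real part is just below it, so the imaginary part is popped first. The following self-contained C sketch isolates that ordering. The types and names (`fasl_stack`, `fs_push`, `fs_pop`, `op_complex`) are hypothetical, and unlike the original it returns the new object rather than pushing it back, because this toy stack holds only doubles.

```c
#include <assert.h>
#include <stddef.h>

enum { STACK_MAX = 16 };

/* Hypothetical stand-in for a Lisp complex object. */
struct complex_obj {
    double real;
    double imag;
};

/* Hypothetical operand stack; the real fasl reader stacks Lisp values. */
struct fasl_stack {
    double slots[STACK_MAX];
    size_t depth;
};

void fs_push(struct fasl_stack *s, double v)
{
    assert(s->depth < STACK_MAX);
    s->slots[s->depth++] = v;
}

double fs_pop(struct fasl_stack *s)
{
    assert(s->depth > 0);
    return s->slots[--s->depth];
}

/* tos = imag, tos-1 = real: pop the imaginary part first, then the real part. */
struct complex_obj op_complex(struct fasl_stack *s)
{
    struct complex_obj c;
    c.imag = fs_pop(s);
    c.real = fs_pop(s);
    return c;
}
```

Pushing the real part and then the imaginary part before calling `op_complex` yields an object with the two fields in the expected places and leaves the stack empty.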

25 Debugger Enhancements
- frame descriptor caching and validation
- source-level debugger

26 Debugger: Reimplementation of API
- db:next-newer-frame is hard to implement
  - Always moving!
  - either cached frames become invalid or have to start over each time
- :zoom on an overflowed stack takes much too long

27 Frame descriptors
- Frame descriptors are now larger
  - newer, older, chain slots
- validated via interlock with their real frames
- frame descriptor has argcount shadow slot

28 Frames
[Figure: a stack frame and its frame descriptor.]
- Stack: (link), return address, function, argcount, ...
- Frame descriptor: header, class, fp, newer, older, chain, (argcount), others...
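The slides say only that descriptors are "validated via interlock with their real frames" and that the descriptor carries an argcount shadow slot. One way such an interlock could work is sketched below in C; it is a guess, not the documented mechanism. The assumption is that while a descriptor exists, the frame's argcount slot holds the descriptor's address and the true argcount lives in the shadow slot, so a descriptor is stale as soon as its frame's slot no longer points back at it. All struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct frame_descriptor;

/* Hypothetical stack frame; slot names follow slide 28. */
struct stack_frame {
    struct stack_frame *link;
    void *return_address;
    void *function;
    uintptr_t argcount;   /* ASSUMPTION: holds the descriptor address while interlocked */
};

/* Hypothetical descriptor; header, class, and other slots omitted. */
struct frame_descriptor {
    struct stack_frame *fp;
    struct frame_descriptor *newer;
    struct frame_descriptor *older;
    struct frame_descriptor *chain;
    uintptr_t argcount_shadow;   /* true argcount saved from the frame */
};

/* Tie descriptor and frame together: save the argcount, then point the
   frame's slot back at the descriptor. */
void interlock(struct frame_descriptor *d, struct stack_frame *f)
{
    d->fp = f;
    d->argcount_shadow = f->argcount;
    f->argcount = (uintptr_t)d;
}

/* A descriptor is valid only while its frame still points back at it. */
bool descriptor_valid(const struct frame_descriptor *d)
{
    return d->fp != NULL && d->fp->argcount == (uintptr_t)d;
}
```

With a scheme like this, a cached descriptor can be checked cheaply before use: if the stack slot has been reused by a newer call, the back pointer is gone and the cache entry is discarded instead of being trusted.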

29 Frame descriptors db:next-newer-frame is no longer programming suicide!

30 Source-level debugger
- Demo