Speculative DMA for Architecturally Visible Storage in Instruction Set Extensions. Theo Kluter (EPFL), Philip Brisk (EPFL), Paolo Ienne (EPFL), Edoardo Charbon (EPFL).

Presentation transcript:

2 Motivation

8 DMA in, DMA out

9-13 Motivation

14 Contents: Motivation; Ensuring coherence; Speculative DMA; Opportunistic Speculative DMA; Conclusion; Questions

15-24 Ensuring coherence

26-30 Speculative DMA

32-36 Opportunistic Speculative DMA

38 Conclusion: Coherence is a serious concern; use existing hardware coherence protocols; limited hardware overhead; single- and multi-processor solution; tangible speed-up possible by profiling

39 Contents: Motivation; Ensuring coherence; Speculative DMA; Opportunistic Speculative DMA; Conclusion; Questions?