EE 4xx: Computer Architecture and Performance Programming


Problem

Abstractions:
  PRAM
  Programming Models: Shared memory, Message Passing, …

Optimize Code:
  Vectorization
  Synchronization & Coordination
  Scheduling
  Blocking/Tiling
  Data Alignment
  Data Locality

Architecture (e.g. Nehalem):
  Pipelining
  Shared Memory
  SIMD
  Cache Hierarchy
  Coherence
  Hyper Threading
  Memory Consistency
  Multicore
  Accelerators
  …
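As an illustration of the Blocking/Tiling and Data Locality items, here is a minimal sketch in C of a blocked matrix multiplication. It is not taken from the course material: the function names, the row-major layout, and the tile size `BLOCK` are assumptions chosen for the example, and a real tile size would be tuned to the cache of the target machine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile size (an assumption, not a course-specified value).
   In practice it is tuned so the working set of one tile step fits in
   the targeted cache level. */
#define BLOCK 32

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Reference version: C += A * B for row-major n x n matrices. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    assert(A && B && C);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = C[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Blocked version: same arithmetic, but the iteration space is split
   into BLOCK x BLOCK tiles so that the pieces of A, B and C touched by
   the inner loops are reused while they are still likely to be cached. */
void matmul_blocked(size_t n, const double *A, const double *B, double *C)
{
    assert(A && B && C);
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                /* Clamp tile bounds so n need not be a multiple of BLOCK. */
                size_t i_end = min_size(ii + BLOCK, n);
                size_t k_end = min_size(kk + BLOCK, n);
                size_t j_end = min_size(jj + BLOCK, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        /* Unit-stride walk over one row of B and C. */
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

Both functions compute the same result; only the order in which memory is visited differs. Timing them against each other for a matrix larger than the last-level cache is one way to observe the effect of tiling, though the size of the gain depends on the machine and compiler flags.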

EE 4xx: Course Topics

Architectural Concepts:
  Computer organization
  Fixed and floating point representation
  Pipeline processor implementation
  Data level parallelism / SIMD
  ILP, out of order execution
  Cache coherence and consistency
  Shared variables, atomic operations
  Locks and barriers
  Pipeline, Data hazards
  Speculative execution
  Benchmarks
  Multicore processors

Abstractions, Software & Optimizations:
  Expressing parallelism using PRAM
  Parallel programming paradigms
  Pthreads programming model
  Data layout and prefetch
  Simple performance models
  Optimizing cache performance, e.g., matrix multiplication, graph problems
  Multicore implementations
  Minimizing communication time
  …
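
To make the Pthreads programming model, shared variables, and locks topics concrete, the following is a small sketch in C of a parallel array sum. It is an illustrative assumption rather than course code: `parallel_sum`, `sum_worker`, the two structs, and the thread count `NUM_THREADS` are hypothetical names and values. Each thread accumulates into a private variable and takes the lock only once, which keeps contention on the shared total low.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4 /* hypothetical thread count for the sketch */

/* State shared by all worker threads. */
struct sum_shared {
    const int *data;
    long total;           /* shared variable, protected by lock */
    pthread_mutex_t lock;
};

/* Per-thread work description: a half-open index range [begin, end). */
struct sum_task {
    struct sum_shared *shared;
    size_t begin, end;
};

static void *sum_worker(void *arg)
{
    struct sum_task *task = arg;
    long local = 0;

    /* Accumulate privately so the loop needs no synchronization. */
    for (size_t i = task->begin; i < task->end; i++)
        local += task->shared->data[i];

    /* Critical section: one locked update per thread. */
    pthread_mutex_lock(&task->shared->lock);
    task->shared->total += local;
    pthread_mutex_unlock(&task->shared->lock);
    return NULL;
}

long parallel_sum(const int *data, size_t n)
{
    struct sum_shared shared = { .data = data, .total = 0 };
    struct sum_task tasks[NUM_THREADS];
    pthread_t threads[NUM_THREADS];
    size_t chunk = (n + NUM_THREADS - 1) / NUM_THREADS;

    pthread_mutex_init(&shared.lock, NULL);
    for (int t = 0; t < NUM_THREADS; t++) {
        size_t begin = (size_t)t * chunk;
        size_t end = begin + chunk;
        if (begin > n) begin = n; /* trailing threads may get no work */
        if (end > n) end = n;
        tasks[t] = (struct sum_task){ &shared, begin, end };
        int rc = pthread_create(&threads[t], NULL, sum_worker, &tasks[t]);
        assert(rc == 0); /* a sketch; real code would handle the error */
        (void)rc;
    }
    /* Joining every thread guarantees all updates to total are done. */
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);
    pthread_mutex_destroy(&shared.lock);
    return shared.total;
}
```

On most toolchains this needs to be compiled with the `-pthread` option. Removing the lock around the update of `total` would turn the final addition into a data race, which is the kind of problem the atomic operations and locks topics address.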