STUDY AND IMPLEMENTATION

MMX TECHNOLOGY: STUDY AND IMPLEMENTATION

Introduction
MMX: Multi-Media eXtension.
Designed to accelerate multimedia and communication applications.
Exploits the parallelism inherent in many multimedia and communications algorithms.

Highlights
Single Instruction, Multiple Data (SIMD) technique.
57 new instructions.
Eight 64-bit wide MMX technology registers.
Four new data types.
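The four data types are packed byte (eight 8-bit values), packed word (four 16-bit values), packed doubleword (two 32-bit values) and quadword (one 64-bit value), each filling one 64-bit register. As a rough illustration of the SIMD idea, the sketch below models a register as a plain uint64_t holding four signed 16-bit lanes and adds all lanes in one call. It is a portable model in standard C, not MMX code: the function names (get_word, pack_words, packed_add_wrap, packed_add_sat) are hypothetical, and the two adds only imitate the lane-wise behaviour of the PADDW (wraparound) and PADDSW (signed saturation) instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Read signed 16-bit lane `lane` (0 = least significant) of a 64-bit value.
   The uint16_t -> int16_t conversion is implementation-defined before C23,
   but yields the two's complement value on common compilers. */
int16_t get_word(uint64_t reg, int lane)
{
    assert(lane >= 0 && lane < 4);
    return (int16_t)(uint16_t)(reg >> (16 * lane));
}

/* Pack four signed 16-bit values into one 64-bit value, w0 in the low lane. */
uint64_t pack_words(int16_t w0, int16_t w1, int16_t w2, int16_t w3)
{
    return  (uint64_t)(uint16_t)w0
         | ((uint64_t)(uint16_t)w1 << 16)
         | ((uint64_t)(uint16_t)w2 << 32)
         | ((uint64_t)(uint16_t)w3 << 48);
}

/* Lane-wise add with wraparound: each lane overflows independently and
   no carry crosses into the neighbouring lane (models PADDW). */
uint64_t packed_add_wrap(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t x = (uint16_t)(a >> (16 * lane));
        uint16_t y = (uint16_t)(b >> (16 * lane));
        uint16_t sum = (uint16_t)(x + y);   /* keep only the low 16 bits */
        r |= (uint64_t)sum << (16 * lane);
    }
    return r;
}

/* Clamp a 32-bit intermediate to the signed 16-bit range. */
static int16_t saturate_i16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Lane-wise add with signed saturation (models PADDSW). */
uint64_t packed_add_sat(uint64_t a, uint64_t b)
{
    int16_t out[4];
    for (int lane = 0; lane < 4; lane++) {
        int32_t wide = (int32_t)get_word(a, lane) + get_word(b, lane);
        out[lane] = saturate_i16(wide);
    }
    return pack_words(out[0], out[1], out[2], out[3]);
}
```

Saturation matters for multimedia data: a pixel or audio sample that overflows should clamp at the maximum rather than wrap around to a value of the opposite sign. On real hardware each of these loops is a single instruction operating on all four lanes at once.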

MOTIVATION
Study the Pentium processor and its pipeline structure.
Use the MMX instructions.
Implement matrix operations using these instructions.
Analyze the instructions for latency and speedup.

IMPLEMENTATION
Matrix operations: matrix multiply and transform.
Solution of a linear system of equations.
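The slides do not show the matrix code itself, so the following is only a sketch of how a matrix multiply can be organised around a packed multiply-add. It assumes 16-bit integer elements, a square size n that is a multiple of 4, and a second operand supplied already transposed (bt) so that both inner-loop operands are contiguous in memory. The helper madd_words is a hypothetical portable model of what the PMADDWD instruction computes (four signed 16-bit products, with adjacent pairs summed into two 32-bit results). It is not an MMX intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of a packed multiply-add on four 16-bit lanes:
   out[0] = a0*b0 + a1*b1, out[1] = a2*b2 + a3*b3.
   Assumes the inputs are small enough that the 32-bit sums do not overflow. */
static void madd_words(const int16_t a[4], const int16_t b[4], int32_t out[2])
{
    out[0] = (int32_t)a[0] * b[0] + (int32_t)a[1] * b[1];
    out[1] = (int32_t)a[2] * b[2] + (int32_t)a[3] * b[3];
}

/* c = a * b for n x n row-major matrices, where bt holds b transposed
   (bt[j*n + k] == b[k*n + j]). Each inner step consumes four elements
   of a row of a and four elements of a row of bt. */
void matmul_i16(const int16_t *a, const int16_t *bt, int32_t *c, int n)
{
    assert(n > 0 && n % 4 == 0);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int32_t acc = 0;
            for (int k = 0; k < n; k += 4) {
                int32_t pair[2];
                madd_words(&a[i * n + k], &bt[j * n + k], pair);
                acc += pair[0] + pair[1];
            }
            c[i * n + j] = acc;
        }
    }
}
```

Transposing the second matrix up front is a design choice made for this sketch: it turns the column walk of a textbook matrix multiply into a row walk, which is the access pattern a 64-bit packed load needs and is also friendlier to the data cache.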

Goal
Identify the parts of the code that can use MMX instructions efficiently.
Determine which data types are supported and the speedup achieved.
Use the data cache efficiently.
Apply loop unrolling and other code optimization techniques.
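Loop unrolling, the last item above, can be sketched on a 16-bit dot product. The code is a generic illustration in portable C, not the presenters' implementation, and the names dot_rolled and dot_unrolled4 are hypothetical. The unrolled version handles four elements per iteration with four independent accumulators, followed by a short cleanup loop for lengths that are not a multiple of four.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference version: one multiply-accumulate per loop iteration. */
int64_t dot_rolled(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (int32_t)a[i] * b[i];
    return sum;
}

/* Unrolled by four. The four accumulators do not depend on each other,
   so a pipelined processor can overlap the multiply-accumulate steps,
   and the loop counter is tested once per four elements. */
int64_t dot_unrolled4(const int16_t *a, const int16_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    int64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += (int32_t)a[i]     * b[i];
        s1 += (int32_t)a[i + 1] * b[i + 1];
        s2 += (int32_t)a[i + 2] * b[i + 2];
        s3 += (int32_t)a[i + 3] * b[i + 3];
    }
    /* Cleanup loop for the remaining 0 to 3 elements. */
    for (; i < n; i++)
        s0 += (int32_t)a[i] * b[i];

    return s0 + s1 + s2 + s3;
}
```

An unroll factor of four is chosen here because four 16-bit elements fill one 64-bit register, so the unrolled body has the shape that a packed multiply-add would replace. Whether unrolling gives a measurable speedup depends on the compiler and processor, so it should be timed rather than assumed.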