Haney - 1 HPEC 9/28/2004 MIT Lincoln Laboratory pMatlab Takes the HPCchallenge Ryan Haney, Hahn Kim, Andrew Funk, Jeremy Kepner, Charles Rader, Albert.

Slides:



Advertisements
Similar presentations
The HPEC Challenge Benchmark Suite
Advertisements

Copyright 2011, Data Mining Research Laboratory Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining Xintian Yang, Srinivasan.
Philips Research ICS 252 class, February 3, The Trimedia CPU64 VLIW Media Processor Kees Vissers Philips Research Visiting Industrial Fellow
The Study of Cache Oblivious Algorithms Prepared by Jia Guo.
Andrew Prout, William Arcand, David Bestor, Chansup Byun, Bill Bergeron, Matthew Hubbell, Jeremy Kepner, Peter Michaleas, Julie Mullen, Albert Reuther,
Chansup Byun, William Arcand, David Bestor, Bill Bergeron, Matthew Hubbell, Jeremy Kepner, Andrew McCabe, Peter Michaleas, Julie Mullen, David O’Gwynn,
K-means clustering –An unsupervised and iterative clustering algorithm –Clusters N observations into K clusters –Observations assigned to cluster with.
HPEC_GPU_DECODE-1 ADC 8/6/2015 MIT Lincoln Laboratory GPU Accelerated Decoding of High Performance Error Correcting Codes Andrew D. Copeland, Nicholas.
On Chip Nonlinear Digital Compensation for RF Receiver Karen M. G. V. Gettings, Andrew K. Bolstad, Show-Yah Stuart Chen, Benjamin A. Miller and Michael.
© 2010 The MITRE Corporation. All rights reserved. Session 2: Many Core Sharon Sacco / The MITRE Corporation HPEC 2010 Approved for Public Release:
PVTOL: A High-Level Signal Processing Library for Multicore Processors
Samuel Williams, John Shalf, Leonid Oliker, Shoaib Kamil, Parry Husbands, Katherine Yelick Lawrence Berkeley National Laboratory ACM International Conference.
Energy Profiling And Analysis Of The HPC Challenge Benchmarks Scalable Performance Laboratory Department of Computer Science Virginia Tech Shuaiwen Song,
Sparse Coding for Specification Mining and Error Localization Runtime Verification September 26, 2012 Wenchao Li, Sanjit A. Seshia University of California.
Implementation of Parallel Processing Techniques on Graphical Processing Units Brad Baker, Wayne Haney, Dr. Charles Choi.
XYZ 10/2/2015 MIT Lincoln Laboratory Sidecar Network Adapters: Open Architecture for ISR Data Sources Craig McNally, Larry Barbieri, Charles Yee.
HPEC-1 MIT Lincoln Laboratory Session 3: Cloud Computing Albert Reuther/ MIT Lincoln Laboratory HPEC Conference 16 September 2010.
Semantic Information Fusion Shashi Phoha, PI Head, Information Science and Technology Division Applied Research Laboratory The Pennsylvania State.
A RISC ARCHITECTURE EXTENDED BY AN EFFICIENT TIGHTLY COUPLED RECONFIGURABLE UNIT Nikolaos Vassiliadis N. Kavvadias, G. Theodoridis, S. Nikolaidis Section.
MIT Lincoln Laboratory hch-1 HCH 5/26/2016 Achieving Portable Task and Data Parallelism on Parallel Signal Processing Architectures Hank Hoffmann.
HPEC_NDP-1 MAH 10/11/2015 MIT Lincoln Laboratory Efficient Multidimensional Polynomial Filtering for Nonlinear Digital Predistortion Matthew Herman, Benjamin.
Slide 1 MIT Lincoln Laboratory Toward Mega-Scale Computing with pMatlab Chansup Byun and Jeremy Kepner MIT Lincoln Laboratory Vipin Sachdeva and Kirk E.
Radar Pulse Compression Using the NVIDIA CUDA SDK
J. Greg Nash ICNC 2014 High-Throughput Programmable Systolic Array FFT Architecture and FPGA Implementations J. Greg.
HPEC SMHS 9/24/2008 MIT Lincoln Laboratory Large Multicore FFTs: Approaches to Optimization Sharon Sacco and James Geraci 24 September 2008 This.
Slide-1 LACSI Extreme Computing MITRE ISIMIT Lincoln Laboratory This work is sponsored by the Department of Defense under Army Contract W15P7T-05-C-D001.
Slide-1 Parallel Matlab MIT Lincoln Laboratory Parallel Matlab: The Next Generation Dr. Jeremy Kepner /MIT Lincoln Laboratory Ms. Nadya Travinin / MIT.
Cousins HPEC 2002 Session 4: Emerging High Performance Software David Cousins Division Scientist High Performance Computing Dept. Newport,
Association Mining via Co-clustering of Sparse Matrices Brian Thompson *, Linda Ness †, David Shallcross †, Devasis Bassu † *†
HPEC2002_Session1 1 DRM 11/11/2015 MIT Lincoln Laboratory Session 1: Novel Hardware Architectures David R. Martinez 24 September 2002 This work is sponsored.
2007/11/2 First French-Japanese PAAP Workshop 1 The FFTE Library and the HPC Challenge (HPCC) Benchmark Suite Daisuke Takahashi Center for Computational.
Slide-1 HPCchallenge Benchmarks MITRE ICL/UTK HPCS HPCchallenge Benchmark Suite David Koester, Ph.D. (MITRE) Jack Dongarra (UTK) Piotr Luszczek (ICL/UTK)
CBELU - 1 James 11/23/2015 MIT Lincoln Laboratory High Performance Simulations of Electrochemical Models on the Cell Broadband Engine James Geraci HPEC.
“Friends/Partners in Aviation Weather” Forum 1 November 2012 Michael D. McPartland Wind Data Quality Factors & Metrics* *This work was sponsored by the.
3-D Graph Processor William S. Song, Jeremy Kepner, Huy T. Nguyen, Joshua I. Kramer, Vitaliy Gleyzer, James R. Mann, Albert H. Horst, Larry L. Retherford,
MIT Lincoln Laboratory HPEC JML 28 Sep 2004 Mapping Signal Processing Kernels to Tiled Architectures Henry Hoffmann James Lebak [Presenter] Massachusetts.
Slide-1 Multicore Theory MIT Lincoln Laboratory Theory of Multicore Algorithms Jeremy Kepner and Nadya Bliss MIT Lincoln Laboratory HPEC 2008 This work.
Linear Algebra Libraries: BLAS, LAPACK, ScaLAPACK, PLASMA, MAGMA
MPEG-4 Structured Audio Mihir Anandpara EE 382C – Embedded Software Systems.
Slide-1 Parallel MATLAB MIT Lincoln Laboratory Multicore Programming in pMatlab using Distributed Arrays Jeremy Kepner MIT Lincoln Laboratory This work.
Slide-1 Multicore Productivity MIT Lincoln Laboratory Evaluating the Productivity of a Multicore Architecture Jeremy Kepner and Nadya Bliss MIT Lincoln.
HPEC HN 9/23/2009 MIT Lincoln Laboratory RAPID: A Rapid Prototyping Methodology for Embedded Systems Huy Nguyen, Thomas Anderson, Stuart Chen, Ford.
TX-2500 AIR - Slide-1 MIT Lincoln Laboratory TX-2500 An Interactive, On-Demand Rapid-Prototyping HPC System Albert Reuther, William Arcand, Tim Currie,
edit type on title master Fortran ISV Release I to L LINPACK TOP500 Technical Systems Division * Scalable Computing Lab 2 Hsin-Ying Lin
A New Class of High Performance FFTs Dr. J. Greg Nash Centar ( High Performance Embedded Computing (HPEC) Workshop.
© 2003 Mercury Computer Systems, Inc. Data Reorganization Interface (DRI) Kenneth Cain Jr. Mercury Computer Systems, Inc. High Performance Embedded Computing.
1 VSIPL++: Parallel Performance HPEC 2004 CodeSourcery, LLC September 30, 2004.
PTCN-HPEC-02-1 AIR 25Sept02 MIT Lincoln Laboratory Resource Management for Digital Signal Processing via Distributed Parallel Computing Albert I. Reuther.
The objective of Nonlinear Equalization (NLEQ) is to extend the dynamic range of RF receivers by reducing in-band nonlinear distortions. This enables detection.
MIT Lincoln Laboratory 1 of 4 MAA 3/8/2016 Development of a Real-Time Parallel UHF SAR Image Processor Matthew Alexander, Michael Vai, Thomas Emberley,
Linear Algebra Libraries: BLAS, LAPACK, ScaLAPACK, PLASMA, MAGMA Shirley Moore CPS5401 Fall 2013 svmoore.pbworks.com November 12, 2012.
KERRY BARNES WILLIAM LUNDGREN JAMES STEED
Do Now: Perform the indicated operation. 1.). Algebra II Elements 11.1: Matrix Operations HW: HW: p.590 (16-36 even, 37, 44, 46)
Matrix Multiplication in Hadoop
Item Based Recommender System SUPERVISED BY: DR. MANISH KUMAR BAJPAI TARUN BHATIA ( ) VAIBHAV JAISWAL( )
400328_M_1Y.ppt Thompson MIT Lincoln Laboratory Analysis of Terminal Separation Standards and Radar Performance Dr. Steven D. Thompson Dr. Steven.
HPEC-1 SMHS 7/7/2016 MIT Lincoln Laboratory Focus 3: Cell Sharon Sacco / MIT Lincoln Laboratory HPEC Workshop 19 September 2007 This work is sponsored.
12-1 Organizing Data Using Matrices
Parallel Matlab programming using Distributed Arrays
Parallel processing is not easy
Albert I. Reuther & Joel Goodman HPEC Sept 2003
M. Richards1 ,D. Campbell1 (presenter), R. Judd2, J. Lebak3, and R
Section 7: Memory and Caches
Presented by: Tim Olson, Architect
Parallel Processing in ROSA II
Matthew Lyon James Montgomery Lucas Scotta 13 October 2010
Case Study Memory Controller مرتضي صاحب الزماني.
Multicore Programming in pMatlab using Distributed Arrays
VSIPL++: Parallel Performance HPEC 2004
Deployment of SAR and GMTI Signal Processing on a Boeing 707 Aircraft using pMatlab and a Bladed Linux Cluster Jeremy Kepner, Tim Currie, Hahn Kim, Bipin.
Presentation transcript:

Haney - 1 HPEC 9/28/2004 MIT Lincoln Laboratory pMatlab Takes the HPCchallenge Ryan Haney, Hahn Kim, Andrew Funk, Jeremy Kepner, Charles Rader, Albert Reuther and Nadya Travinin HPEC 2004 This work is sponsored by the Defense Advanced Research Projects Administration under Air Force Contract F C Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government. * This work is sponsored by Defense Advanced Research Projects Administration, under Air Force Contract F C Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.

MIT Lincoln Laboratory Haney - 2 HPEC 9/28/2004 Motivation and Goals Motivation –The DARPA HPCS program has created the HPCchallenge benchmark suite in an effort to redefine how we measure productivity in the HPC domain –Implementing the HPCchallenge benchmarks using pMatlab allows a unique opportunity to explore the merits of pMatlab with respect to HPEC Goals –Compare traditional C/MPI with pMatlab. Measurements of productivity include: Maximum problem size: Largest problem that can be solved or fit into memory Execution performance: Run-time performance of the benchmark Code size: Software lines of code (SLOC) required to implement the benchmark

MIT Lincoln Laboratory Haney - 3 HPEC 9/28/2004 HPCchallenge Relevance to HPEC HPCchallenge benchmarks encompass key embedded signal processing operations –FFT: Distributed corner turn and FFTs important in multi- sensor signal processing –RandomAccess: Random data accesses typical of “post detection” operations –Top500: Matrix-matrix multiplies typical of multi- element beamforming –STREAM: Distributed vector operations common to signal processing (Vector Operations) Top500 (Matrix Multiply) FFT (Corner-turn) STREAM RandomAccess (Detection) High Low HPCS Spatial Locality Temporal Locality

MIT Lincoln Laboratory Haney - 4 HPEC 9/28/2004 FFT Results Algorithm Software Code Size Maximum Problem Size in Memory (GB) Execution Performance (GFLOPS) CPUs C/MPIpMatlabRatio SLOC pMatlab memory scalability comparable to C/MPI (128x on 128 CPUs) pMatlab execution performance comparable to C/MPI (55x on 128 CPUs) pMatlab code size is 35x smaller than C/MPI p0p0 p0 p1 : pN pNpN FFT rows FFT columns corner turn.. p1p1 32-bit limitation 128x 55x