Group Discussion
Hong Man, 07/21/2010

UMD DIF with GNU Radio (from Will Plishker's presentation)

[Diagram: GRC and The DIF Package (TDP) targeting platforms such as GPUs, multiprocessors, Cell, and FPGA. Components shown include the GNU Radio Engine (Python/C++), the XML flowgraph (.grc), the Python flowgraph (.py), the DIF specification (.dif), the architecture specification (.arch?) describing processors, memories, and interconnect, the schedule (.dif, .sched), a platform-retargetable library, uniprocessor scheduling, and DIF Lite. The legend distinguishes existing or completed items from proposed ones.]

Workflow steps:
1) Convert or generate the .dif file (complete).
2) Execute static schedules from DIF (complete).
3a) Perform online scheduling.
3b) Architecture specification (.arch?).
4) Architecture-aware MP scheduling (assignment, ordering, invocation).
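For reference, step 1 starts from the Python flowgraph that GRC generates from the .grc XML. Below is a minimal sketch of such a flowgraph, assuming recent GNU Radio block namespaces (gnuradio.analog, gnuradio.filter, gnuradio.blocks); exact module and block names vary between GNU Radio releases. Each connected block would correspond to an actor, and each connection to an edge, in the extracted .dif graph.

```python
# Minimal sketch of the kind of Python flowgraph (.py) that step 1 converts
# into a .dif specification.  Block namespaces follow recent GNU Radio
# releases and may differ from the version used in this work.
from gnuradio import gr, analog, blocks, filter

class SimpleFlowgraph(gr.top_block):
    def __init__(self, samp_rate=32000):
        gr.top_block.__init__(self, "simple_flowgraph")
        taps = filter.firdes.low_pass(1.0, samp_rate, 4000, 1000)
        self.src  = analog.sig_source_c(samp_rate, analog.GR_COS_WAVE, 1000, 1.0)
        self.fir  = filter.fir_filter_ccf(1, taps)           # decimation = 1
        self.sink = blocks.null_sink(gr.sizeof_gr_complex)
        # Each block becomes an actor and each connection an edge in the
        # extracted dataflow graph.
        self.connect(self.src, self.fir, self.sink)

if __name__ == "__main__":
    tb = SimpleFlowgraph()
    tb.start()
    tb.stop()
    tb.wait()
```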

SSP Interface with DIF
- Currently DIF extracts the dataflow model from the GRC of GNU Radio.
  - GRC is at the waveform level (component block diagram).
- To interact with DIF, we need to construct CL models at the waveform level.
  - Our current work is mostly at the radio primitive level.
  - We need to start waveform-level CL modeling.
  - Open questions:
    - Mapping "things" and "paths" in CL models to "actors" in dataflow models.
    - Representing "data rates" ("tokens") in CL models.
    - "Processing delay" is missing in both models.
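One way to make the first two open questions concrete is the small data-structure sketch below: CL "things" become dataflow actors, CL "paths" become edges carrying per-firing token rates, and a processing-delay field is added as the attribute missing from both models. The field names are purely illustrative and do not correspond to an existing SSP or DIF API.

```python
# Hypothetical mapping sketch: CL "things" -> dataflow "actors",
# CL "paths" -> edges carrying per-firing token rates ("data rates").
# Field names are illustrative only; this is not an existing SSP or DIF API.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Actor:                       # derived from a CL "thing"
    name: str
    processing_delay: Optional[float] = None   # attribute missing from both models

@dataclass
class Edge:                        # derived from a CL "path"
    src: Actor
    dst: Actor
    produce_rate: int = 1          # tokens produced per source firing
    consume_rate: int = 1          # tokens consumed per destination firing

@dataclass
class WaveformGraph:               # waveform-level dataflow view of a CL model
    actors: List[Actor] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

# Example: a two-block waveform fragment.
fir = Actor("fir_filter")
demod = Actor("demod")
graph = WaveformGraph(actors=[fir, demod], edges=[Edge(fir, demod)])
```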

Scheduling with Dataflow Models
- Scheduling based on dataflow models may achieve performance improvements for multi-rate processes (example from Will Plishker's presentation).
- SDR processing at the physical and MAC layers consists mostly of single-rate processes, and may not see significant performance improvement from dataflow-based scheduling.
- Multicore scheduling is an interesting topic.
  - Currently the assignment of "actors" to processors is done manually.
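To illustrate why multi-rate graphs are where dataflow-based scheduling pays off, the toy sketch below (not the UMD/DIF scheduler) solves the synchronous-dataflow balance equations for a small graph: in a multi-rate graph the actors fire different numbers of times per schedule period, which a dataflow scheduler can exploit, whereas in a single-rate graph every actor fires once and there is little to optimize.

```python
# Toy illustration (not the UMD/DIF scheduler): solve the SDF balance
# equations q[p] * produced = q[c] * consumed to get each actor's
# repetition count in one static-schedule period.
from fractions import Fraction
from math import gcd

# edges: (producer, tokens produced per firing, consumer, tokens consumed per firing)
edges = [("src", 1, "interp", 1), ("interp", 3, "sink", 2)]

def repetition_vector(edges):
    reps = {edges[0][0]: Fraction(1)}        # seed one actor, then propagate
    changed = True
    while changed:
        changed = False
        for p, prod, c, cons in edges:
            if p in reps and c not in reps:
                reps[c] = reps[p] * prod / cons
                changed = True
            elif c in reps and p not in reps:
                reps[p] = reps[c] * cons / prod
                changed = True
    scale = 1                                # scale to smallest integer solution
    for f in reps.values():
        scale = scale * f.denominator // gcd(scale, f.denominator)
    return {a: int(f * scale) for a, f in reps.items()}

print(repetition_vector(edges))
# {'src': 2, 'interp': 2, 'sink': 3}: multi-rate, so actors fire unequal counts.
# With all rates equal to 1 (single-rate SDR blocks) every count would be 1.
```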

GPU and Multicore
- Our findings on CUDA:
  - Many specialized library functions are optimized for GPUs.
  - Parallelization has to be implemented manually.
  - UMD's CUDA work (FIR and Turbo decoding) has not been connected to their dataflow work yet.
- Some considerations:
  - Extend our investigation to OpenCL.
  - Focus on CL modeling for multicore systems.
  - Automatically parallelize certain common DSP operations (e.g., FIR, FFT) from CL models.
    - Operation recognition and rule-based mapping (see the sketch after this list).
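A very rough sketch of the "operation recognition and rule-based mapping" idea: once an operation such as an FIR or FFT has been recognized in a CL model, a rule table selects an optimized backend for the target platform. The library names in the table (cuFFT, clFFT, FFTW) are real libraries used here only as illustrative mapping targets; nothing below is an existing SSP interface.

```python
# Hypothetical rule table: (recognized DSP operation, target platform)
# -> optimized backend.  Backend names are illustrative mapping targets only.
RULES = {
    ("fft", "cuda"):   "cuFFT plan",
    ("fft", "opencl"): "clFFT plan",
    ("fft", "cpu"):    "FFTW plan",
    ("fir", "cuda"):   "hand-written CUDA FIR kernel",
    ("fir", "cpu"):    "SSE-intrinsic FIR routine",
}

def map_operation(op, target):
    """Pick a backend for a recognized operation, or keep the original code."""
    return RULES.get((op, target), "original C/C++ code segment (no rule)")

print(map_operation("fft", "cuda"))    # -> cuFFT plan
print(map_operation("fir", "opencl"))  # -> original C/C++ code segment (no rule)
```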

Next Step
- Beyond rehosting: optimal code generation.
  - C/C++ → (CL model) → SPIRAL.
  - C/C++ → (CL model) → CUDA or OpenCL (GPU and multicore).
  - C/C++ → (CL model) → C/C++ using SSE intrinsics.
- CL modeling tasks (see the AST recognition sketch below):
  - At both the primitive level and the waveform level.
  - CL modeling from AST.
  - DSP operation (or primitive) recognition.
  - Code segment extraction, validation, and transformation.
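As a concrete, purely illustrative sketch of "CL modeling from AST" and primitive recognition, the snippet below parses a C code segment with pycparser (a third-party parser assumed here for illustration, not part of the SSP toolchain) and flags the multiply-accumulate loop pattern that typically indicates an FIR primitive. A real recognizer would of course work over richer CL-model structures and handle many more patterns.

```python
# Illustrative only: recognize a multiply-accumulate ("+=" of a "*") loop
# pattern in a C AST, the kind of pattern an FIR primitive exhibits.
# pycparser is an assumed third-party tool, not part of the SSP toolchain.
from pycparser import c_parser, c_ast

C_SRC = """
void fir(float *x, float *h, float *y, int n, int taps) {
    for (int i = 0; i < n; i++) {
        y[i] = 0.0f;
        for (int k = 0; k < taps; k++)
            y[i] += h[k] * x[i + k];
    }
}
"""

class MacFinder(c_ast.NodeVisitor):
    """Count assignments of the form lhs += a * b."""
    def __init__(self):
        self.mac_count = 0

    def visit_Assignment(self, node):
        if node.op == "+=" and isinstance(node.rvalue, c_ast.BinaryOp) \
                and node.rvalue.op == "*":
            self.mac_count += 1
        self.generic_visit(node)

ast = c_parser.CParser().parse(C_SRC)
finder = MacFinder()
finder.visit(ast)
if finder.mac_count:
    print("multiply-accumulate loop found: candidate FIR primitive")
```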