Los Alamos National Laboratory Streams-C Maya Gokhale Los Alamos National Laboratory September, 1999.

Slides:



Advertisements
Similar presentations
INPUT-OUTPUT ORGANIZATION
Advertisements

A Programmable Adaptive Router for a GALS Parallel System Jian Wu APT Group University of Manchester May 2009.
Sumitha Ajith Saicharan Bandarupalli Mahesh Borgaonkar.
Programmable Interval Timer
COMP3221: Microprocessors and Embedded Systems Lecture 17: Computer Buses and Parallel Input/Output (I) Lecturer: Hui.
1/1/ /e/e eindhoven university of technology Microprocessor Design Course 5Z008 Dr.ir. A.C. (Ad) Verschueren Eindhoven University of Technology Section.
Processor System Architecture
Overview: Chapter 7  Sensor node platforms must contend with many issues  Energy consumption  Sensing environment  Networking  Real-time constraints.
INTEGRATING TIMING AND FREE-RUNNING ACTORS WITH DATAFLOW Tim Hayles Principal Engineer National Instruments September 9, 2008
Team Morphing Architecture Reconfigurable Computational Platform for Space.
Term Project Overview Yong Wang. Introduction Goal –familiarize with the design and implementation of a simple pipelined RISC processor What to do –Build.
TCSS 372A Computer Architecture. Getting Started Get acquainted (take pictures) Discuss purpose, scope, and expectations of the course Discuss personal.
Multiprocessors ELEC 6200: Computer Architecture and Design Instructor : Agrawal Name: Nam.
Mahapatra-Texas A&M-Fall'001 cosynthesis Introduction to cosynthesis Rabi Mahapatra CPSC498.
1 Computer System Overview OS-1 Course AA
Basic Input/Output Operations
University College Cork IRELAND Hardware Concepts An understanding of computer hardware is a vital prerequisite for the study of operating systems.
Recap – Our First Computer WR System Bus 8 ALU Carry output A B S C OUT F 8 8 To registers’ input/output and clock inputs Sequence of control signal combinations.
TECH CH03 System Buses Computer Components Computer Function
1 Evgeny Bolotin – ICECS 2004 Automatic Hardware-Efficient SoC Integration by QoS Network on Chip Electrical Engineering Department, Technion, Haifa, Israel.
Orion: A Power-Performance Simulator for Interconnection Networks Presented by: Ilya Tabakh RC Reading Group4/19/2006.
1 RAMP Infrastructure Krste Asanovic UC Berkeley RAMP Tutorial, ISCA/FCRC, San Diego June 10, 2007.
Lecture 12: High-Level Compilation October 17, 2013 ECE 636 Reconfigurable Computing Lecture 12 High-Level Compilation.
From Concept to Silicon How an idea becomes a part of a new chip at ATI Richard Huddy ATI Research.
INPUT-OUTPUT ORGANIZATION
Lecture 12 Today’s topics –CPU basics Registers ALU Control Unit –The bus –Clocks –Input/output subsystem 1.
1 Presenter: Ming-Shiun Yang Sah, A., Balakrishnan, M., Panda, P.R. Design, Automation & Test in Europe Conference & Exhibition, DATE ‘09. A Generic.
LSU 10/22/2004Serial I/O1 Programming Unit, Lecture 5.
SLAAC Hardware Status Brian Schott Provo, UT September 1999.
CHAPTER 3 TOP LEVEL VIEW OF COMPUTER FUNCTION AND INTERCONNECTION
High-Level Interconnect Architectures for FPGAs An investigation into network-based interconnect systems for existing and future FPGA architectures Nick.
Top Level View of Computer Function and Interconnection.
High-Level Interconnect Architectures for FPGAs Nick Barrow-Williams.
Data and Computer Communications Circuit Switching and Packet Switching.
designKilla: The 32-bit pipelined processor Brought to you by: Victoria Farthing Dat Huynh Jerry Felker Tony Chen Supervisor: Young Cho.
© 2004 Mercury Computer Systems, Inc. FPGAs & Software Components Graham Bardouleau & Jim Kulp Mercury Computer Systems, Inc. High Performance Embedded.
Los Alamos National Lab Streams-C Maya Gokhale, Janette Frigo, Christine Ahrens, Marc Popkin- Paine Los Alamos National Laboratory Janice M. Stone Stone.
Lecture 16: Reconfigurable Computing Applications November 3, 2004 ECE 697F Reconfigurable Computing Lecture 16 Reconfigurable Computing Applications.
I/O Computer Organization II 1 Interconnecting Components Need interconnections between – CPU, memory, I/O controllers Bus: shared communication channel.
EEE440 Computer Architecture
Lecture 12: Reconfigurable Systems II October 20, 2004 ECE 697F Reconfigurable Computing Lecture 12 Reconfigurable Systems II: Exploring Programmable Systems.
Evaluating and Improving an OpenMP-based Circuit Design Tool Tim Beatty, Dr. Ken Kent, Dr. Eric Aubanel Faculty of Computer Science University of New Brunswick.
Introduction to VHDL Simulation … Synthesis …. The digital design process… Initial specification Block diagram Final product Circuit equations Logic design.
Computer Organization CDA 3103 Dr. Hassan Foroosh Dept. of Computer Science UCF © Copyright Hassan Foroosh 2002.
Graphical Design Environment for a Reconfigurable Processor IAmE Abstract The Field Programmable Processor Array (FPPA) is a new reconfigurable architecture.
Register Transfer Languages (RTL)
Different Microprocessors Tamanna Haque Nipa Lecturer Dept. of Computer Science Stamford University Bangladesh.
DDRIII BASED GENERAL PURPOSE FIFO ON VIRTEX-6 FPGA ML605 BOARD PART B PRESENTATION STUDENTS: OLEG KORENEV EUGENE REZNIK SUPERVISOR: ROLF HILGENDORF 1 Semester:
Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles Zhiyi Yu, Bevan Baas VLSI Computation Lab, ECE Department University of California,
Chapter 3 System Buses.  Hardwired systems are inflexible  General purpose hardware can do different tasks, given correct control signals  Instead.
CoDeveloper Overview Updated February 19, Introducing CoDeveloper™  Targeting hardware/software programmable platforms  Target platforms feature.
Chapter Goals Describe the application development process and the role of methodologies, models, and tools Compare and contrast programming language generations.
PROGRAMMABLE LOGIC CONTROLLERS SINGLE CHIP COMPUTER
SOFTWARE DESIGN AND ARCHITECTURE
How does an SIMD computer work?
Laxmi Narayan Bhuyan SIMD Architectures Laxmi Narayan Bhuyan
Design Flow System Level
Introduction to cosynthesis Rabi Mahapatra CSCE617
THE ECE 554 XILINX DESIGN PROCESS
Chapter 13: I/O Systems.
THE ECE 554 XILINX DESIGN PROCESS
Presentation transcript:

Los Alamos National Laboratory Streams-C Maya Gokhale Los Alamos National Laboratory September, 1999

Los Alamos National Laboratory LANL/SLAAC Activities Challenge Problem Formulation Integration of SLAAC Technology –Wideband RF –Hyperspectral Imaging Design Tool Validation Streams-C

Los Alamos National Laboratory Background 1997 ACS start, 3 year program Goal: To develop –programming model –language constructs –compiler technology –runtime library modules –reference programming examples for stream-oriented ACS applications.

Los Alamos National Laboratory Background (con’t.) Year 1 - applications study and simulator development Year 2 - compiler development Present status: From C input we can generate Wildforce Streams-C applications –pipelined processing –stream communication –on-board sequencer –pipelined memory access

Los Alamos National Laboratory Programming Model Collection of processes, separate address spaces High bandwidth synchronous data streams communicated among modules Low bandwidth asynchronous signals for phase coordination and flow control Flow may be a graph rather than simple pipeline Covers virtually all ACS applications - high data rate front end processing F5 F0 F1 F2 F3F4 F6

Los Alamos National Laboratory Streams-C vs. MPI Similarities Each process has a separate address space Processes communicate through messages Differences- unlike MPI Streams-C supports highly synchronous communication Streams-C data size is typically small Streams-C does not define aggregate communication patterns

Los Alamos National Laboratory Programming Model Process –parameters are input and output streams, signals that the process will use –body is a C subroutine Stream –defined by size of payload, flavor of stream (valid tag, buffered, …), and processes being interconnected –operations are open, close, read, write, eos, [eoframe] Signal –optional payload parameter –operations are post, wait Topology –describes interconnection of processes, streams, signals –describes mapping of processes, streams, signals to hardware board

Los Alamos National Laboratory Physical Mapping Information Process –which chip Stream –which physical I/O pins on chip Signal –mapping to physical resources determined by our library Chip 1 Chip 2 Chip 3 F5 F0 F1 F2 F3F4 F6 Left-toRightFIFO

Los Alamos National Laboratory A Single Environment for Functional Simulation and Synthesis

Los Alamos National Laboratory Process Declaration Stream Declaration Stream Operations

Los Alamos National Laboratory Process/Stream Mapping

Los Alamos National Laboratory Streams-C Compiler Builds on Malleable Architecture Generator (MARGE) synthesis technology Builds on NAPA C SUIF-based pragma infrastructure and analysis Unique extensions to streams processing –process/stream directives –mapping directives

Los Alamos National Laboratory Streams C Compiler Structure Process Run Methods Process Run Methods Front End Sequence info + Datapath Operations Module Generator Database Module Generator Database Runtime Software, Streams Library Runtime Software, Streams Library Synthesis Runtime Hardware, Streams Library Runtime Hardware, Streams Library Place and Route Place and Route Host Processes MARGE RTL for Processing Element RTL for Processing Element RTL for Processing Element RTL for Processing Element RTL for Processing Element RTL for Processing Element Configuration Bit Stream Processing Element Configuration Bit Stream Processing Element Configuration Bit Stream Processing Element Configuration Bit Stream Processing Element Configuration Bit Stream Processing Element Configuration Bit Stream

Los Alamos National Laboratory Processing Element Structure Datapath Pipeline Control Instruction Decode Instruction Decode Datapath Module Instruction Sequencer Instruction Sequencer Process 1 Stream Module Memory Interface Memory Interface External Memory External Memory Signal Controller Signal Controller Datapath Pipeline Control Instruction Decode Instruction Decode Datapath Module Instruction Sequencer Instruction Sequencer Process 2 Stream Module Stream Module Processing Element

Stream Hardware Components l High bandwidth, synchronous communication l Multiple protocols: “Valid” tag, buffered handshake l Parameterized synthesizable modules l Multiple Wildforce channel mappings: l Intra-FPGA, Nearest neighbor, Crossbar, Host FIFO Stream Writer Module Data Enable Ready Data Enable Ready Stream Reader Module Channel Producer Process Consumer Process

Los Alamos National Laboratory Stream Modules Stream Module Selection –Stream Type Valid Tag Buffered Full –Parameters Element Width Buffer Depth –Context Intra-process Nearest Neighbor Crossbar Host FIFO Stream Operations –Open –Close –Read –Write –End of Stream –Error

Los Alamos National Laboratory Stream Module Example

Los Alamos National Laboratory Hardware Control Structures Instruction Sequencing Instruction Decoding Loop pipeline control –Definite iteration –Indefinite iteration Memory access

Los Alamos National Laboratory Pipeline Control Modules Controls data path pipeline register enables Started by decode of “pipeline instruction” Automatic stall –Handles full/empty stream conditions Programmable: –Number of pipeline stages –Initiation Interval –Number of flush cycles –Number of iterations (definite iteration pipelines) Pipeline Controller Datapath Instruction Decoder

Los Alamos National Laboratory Pipeline Control Structure PipeEnable shift register PipeMaster shift register IntervalFlush Iterations IntervalCnt FlushCntIterationCnt Control Logic Start Stall Terminate Shift Bit Load Clear To Pipelined Data Path From Decoded Instruction

Los Alamos National Laboratory Software Runtime Library Compiler synthesizes control program Automatically inserts calls to runtime library Runtime system controls: –Board initialization –memory initialization –PE and Crossbar configuration

Los Alamos National Laboratory Current Status Compiler-generated benchmarks and kernels run on Wildforce board We are transitioning hardware and software environment to LANL Jan Stone will continue on compiler effort Jeff Arnold no longer available, replaced by Dominique Lavenier, sabbatical visitor from IRISA, France (DEC Pamette experience)

Los Alamos National Laboratory Plans for FY 2000 (I) Gain Quantitative experience with Streams-C compiler –measure size/speed of compiler-generated vs. hand- generated designs –improve compiler performance –improve compiler reliability Goal: significant LANL applications coded, simulated, and run on hardware with Streams-C

Los Alamos National Laboratory Applications (Sarnoff) Contrast Enhancement (Honeywell) Wavelet (LANL) Hyperspectral Image Processing –Pixel Purity Index –K-Means

Los Alamos National Laboratory Plans for FY2000 (II) Integrate Streams-C into SLAAC environment –conform to SLAAC API for multi-board applications –re-target hardware libraries to SLAAC platform