University of Veszprém Department of Image Processing and Neurocomputing Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs.

Presentation transcript:

University of Veszprém, Department of Image Processing and Neurocomputing
Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs
Zoltán Nagy, Péter Szolgay

Nagy 2 MAPLD 2005/153
- Introduction
- Cellular Neural/Nonlinear Networks Universal Machine (CNN-UM)
- Ocean modeling
- Results
- Conclusions

Cellular Neural/Nonlinear Networks (CNN)
- 2- or N-dimensional grid
- Locally connected
- Analog processing elements
- State value is continuous in time

Structure of a CNN cell
- u_ij: input
- x_ij: state
- y_ij: output
- z_ij: constant bias
- A_ij,kl: feedback template
- B_ij,kl: feed-forward template
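The cell diagram on this slide did not survive in the transcript. For orientation, the symbols listed above fit the standard Chua-Yang CNN state equation, given here as a reference form under the assumption that the slide uses the usual cell model. N_r(ij) denotes the radius-r neighbourhood of cell ij, a symbol that does not appear in the transcript itself.

```latex
\begin{align}
\dot{x}_{ij}(t) &= -x_{ij}(t)
  + \sum_{kl \in N_r(ij)} A_{ij,kl}\, y_{kl}(t)
  + \sum_{kl \in N_r(ij)} B_{ij,kl}\, u_{kl}(t)
  + z_{ij} \\
y_{ij}(t) &= \tfrac{1}{2}\left( \lvert x_{ij}(t) + 1 \rvert - \lvert x_{ij}(t) - 1 \rvert \right)
\end{align}
```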

CNN-UM implementations
- Software simulation
  - Easy to implement
  - Slow, even when using processor-specific instructions
- Emulated digital VLSI
  - Specialized digital architecture
  - Selectable computing precision (Castle architecture: 1, 6, 12 bit)
  - Orders of magnitude faster than software simulation
  - Long design time
- Analog VLSI
  - Huge computing power (~TeraOP/s)
  - Low accuracy (7-8 bit)
  - Noise and temperature sensitivity

Structure of the Falcon emulated digital CNN-UM
- Mixer
  - Contains cell values for the next updates
- Memory unit
  - Contains a belt of the cell array
- Template memory
- Arithmetic unit
- Processors can be connected on a grid
  - Linear speedup

Structure of the arithmetic unit
- Cell update in row-wise order
- Cycle time depends on template size
- Fully pipelined
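As a software illustration of a row-wise cell update, the sketch below applies one forward-Euler step of the standard CNN cell model with 3x3 templates. It is not the Falcon hardware datapath: the function names, the zero-valued boundary cells and the choice of forward Euler are all assumptions made for this example.

```python
def saturate(x):
    """Standard CNN output nonlinearity: piecewise-linear clamp to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def cnn_euler_step(x, u, A, B, z, h):
    """One forward-Euler update of every cell, visited row by row.

    x, u : 2-D lists holding the state and input arrays
    A, B : 3x3 feedback and feed-forward templates
    z    : constant bias, h : timestep
    Cells outside the array are treated as zero (assumed boundary condition).
    """
    rows, cols = len(x), len(x[0])
    y = [[saturate(v) for v in row] for row in x]  # outputs of the old state
    new_x = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):          # row-wise order
        for j in range(cols):
            acc = z
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    k, l = i + di, j + dj
                    if 0 <= k < rows and 0 <= l < cols:
                        acc += A[di + 1][dj + 1] * y[k][l]
                        acc += B[di + 1][dj + 1] * u[k][l]
            new_x[i][j] = x[i][j] + h * (-x[i][j] + acc)
    return new_x
```

The inner double loop runs once per template element, which is one way to see why the cycle time of a hardware unit would depend on the template size.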

Configurable parameters
- State, template and constant width from 2 to 64 bits
- Number of templates
- Size of the templates
- Width of the cell array slice
- Number of layers
- Number and arrangement of the processor cores

Example: Solution of a simple PDE on CNN
- The wave equation
- Spatial discretization
- 2-layer CNN
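The equations on this slide were images and are missing from the transcript. A minimal sketch of the idea, assuming the usual reduction of the wave equation u_tt = c^2 ∇²u to two first-order layers (du/dt = v and dv/dt = c^2 ∇²u), is shown below. The Laplacian is the 5-point stencil, which corresponds to the template [[0, 1, 0], [1, -4, 1], [0, 1, 0]] scaled by c^2/dx^2. The function name, the fixed zero boundary and the update order (v first, then u with the new v) are assumptions of this example, not details taken from the presentation.

```python
def wave_step(u, v, c, dx, dt):
    """Advance the two layers (u: displacement, v: its time derivative) by dt.

    Boundary cells are held fixed; only interior cells are updated.
    """
    n, m = len(u), len(u[0])
    coef = (c / dx) ** 2
    new_v = [row[:] for row in v]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # 5-point Laplacian template applied to layer u
            lap = (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                   - 4.0 * u[i][j])
            new_v[i][j] = v[i][j] + dt * coef * lap
    new_u = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # layer u integrates the freshly updated layer v
            new_u[i][j] = u[i][j] + dt * new_v[i][j]
    return new_u, new_v


# Usage: a single bump in the middle of a 5x5 grid starts to spread.
u0 = [[0.0] * 5 for _ in range(5)]
v0 = [[0.0] * 5 for _ in range(5)]
u0[2][2] = 1.0
u1, v1 = wave_step(u0, v0, c=1.0, dx=1.0, dt=0.1)
```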

Ocean models
- Barotropic model
- Baroclinic models
  - z-coordinate model
  - σ-coordinate model
  - isopycnal
- Fine resolution models
  - Real-time forecast
  - Fishing industry
  - Search and rescue
- Coarse resolution models
  - Long term predictions
  - Climate modeling

The Princeton Ocean Model (POM)
- Sigma coordinate model
  - Vertical coordinate is scaled on the water column depth
- Second moment turbulence closure sub-model
  - Provides vertical mixing coefficients
- Solution technique: Mode splitting
  - Internal mode (3D)
    - Vertical structure equations
    - Implicit solution
  - External mode (2D)
    - Vertically integrated equations
    - Explicit solution (Leapfrog method)
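Two of the ideas above can be sketched in a few lines. The first function uses the usual sigma-coordinate definition, sigma = (z - eta) / (H + eta), which is 0 at the free surface and -1 at the bottom. The second is a generic leapfrog step, x(n+1) = x(n-1) + 2 dt f(x(n)). Both function names and the harmonic-oscillator right-hand side in the usage lines are illustrative assumptions and are not taken from the POM code.

```python
def sigma(z, eta, H):
    """Sigma coordinate: depth z scaled by the local water column height."""
    return (z - eta) / (H + eta)


def leapfrog_step(prev, curr, f, dt):
    """Leapfrog update: the new state is built from the state two steps
    back plus twice the timestep times the tendency at the current state."""
    tendency = f(curr)
    return [p + 2.0 * dt * d for p, d in zip(prev, tendency)]


# Usage on a toy system (harmonic oscillator: dx/dt = v, dv/dt = -x).
def oscillator(state):
    return [state[1], -state[0]]

nxt = leapfrog_step([1.0, 0.0], [1.0, -0.1], oscillator, dt=0.1)
```

Leapfrog needs two previous time levels, so a real model has to start the integration with a different single-step scheme.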

Governing equations of the external (2D) mode
- u_x, u_y: mass transport
- η: free surface elevation
- Ω: angular rotation of the Earth
- Θ: latitude
- H: depth of the ocean
- g: gravitational acceleration
- τ_w, τ_b: wind and bottom stress
- A: lateral viscosity
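The equations themselves were images and cannot be recovered from the transcript. For orientation only, a common textbook form of the vertically integrated (shallow-water) equations in mass-transport variables, written with the symbols listed above, is given below. This is an assumed reference form, not a reconstruction of the slide. The component subscripts on the stresses (τ_wx, τ_bx and so on) are notation introduced here, and the scaling of the stress terms may differ in the original.

```latex
\begin{align}
\frac{\partial \eta}{\partial t} &= -\frac{\partial u_x}{\partial x} - \frac{\partial u_y}{\partial y} \\
\frac{\partial u_x}{\partial t} &= -gH\frac{\partial \eta}{\partial x}
  + 2\Omega \sin\Theta \, u_y
  - \frac{\partial}{\partial x}\!\left(\frac{u_x^2}{H}\right)
  - \frac{\partial}{\partial y}\!\left(\frac{u_x u_y}{H}\right)
  + \tau_{wx} - \tau_{bx} + A \nabla^2 u_x \\
\frac{\partial u_y}{\partial t} &= -gH\frac{\partial \eta}{\partial y}
  - 2\Omega \sin\Theta \, u_x
  - \frac{\partial}{\partial x}\!\left(\frac{u_x u_y}{H}\right)
  - \frac{\partial}{\partial y}\!\left(\frac{u_y^2}{H}\right)
  + \tau_{wy} - \tau_{by} + A \nabla^2 u_y
\end{align}
```

In this form the quadratic terms such as u_x^2 / H are the advection terms, which are nonlinear in the state variables.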

Solution on CNN
- Spatial discretization on a uniform grid
- 3-layer CNN structure
- Non-linear template required for advection term
- Cannot be solved on analog VLSI CNN chips
- Solvable on the modified Falcon architecture
  - Support of non-linearity
  - Specialized cell model

The modified arithmetic unit of the Falcon architecture

Implementation on FPGA
- Complicated arithmetic unit
- Fixed-point number representation
- Configurable precision
- High level hardware description language required (e.g. Handel-C)
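To make "fixed-point with configurable precision" concrete, the sketch below models a signed fixed-point format in software. It is only an illustration in Python, not the Handel-C design: the helper names, the saturating overflow behaviour and the split of 24 bits into 8 integer and 16 fractional bits in the usage lines are assumptions.

```python
def to_fixed(value, total_bits, frac_bits):
    """Quantize a real number to a signed fixed-point integer, saturating
    at the limits of a total_bits-wide two's-complement word."""
    scaled = round(value * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def from_fixed(raw, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return raw / (1 << frac_bits)


def fixed_mul(a, b, total_bits, frac_bits):
    """Multiply two fixed-point values; the double-width product is shifted
    back down by frac_bits and saturated to the word width."""
    product = (a * b) >> frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, product))


# Usage with an assumed 24-bit word holding 16 fractional bits.
half = to_fixed(0.5, 24, 16)
quarter = fixed_mul(half, half, 24, 16)
```

Changing `total_bits` and `frac_bits` trades range and resolution against word width, which is the knob a configurable-precision design exposes.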

Performance

The Seamount problem

Results after 72 hours
- Circulation pattern
- Elevation

Error of the solution

Error of the solution

Memory requirements of the internal (3D) equations
- Extended memory hierarchy
  - New level stores 3 cross sectional slices from the 3D array
    - Large memory required (e.g. 512x512x64 sized grid, 3x512x64 elements per state variable)
    - Cannot be stored on-chip
    - Off-chip storage requires huge I/O bandwidth
- Processor array should be used
  - The 3D array is divided between the processors
  - Optimal data set for on chip storage: 2048 elements per cross sectional slice (512x32x64 sized grid per processor)
  - Each processor located on a separate FPGA
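The figures on this slide can be checked with a little arithmetic. The sketch below assumes a cross sectional slice spans the second horizontal dimension and the depth, and that the grid is split along that second horizontal dimension. The resulting processor count is an inference from the quoted sizes and is not stated in the slides.

```python
def slice_buffer_elements(ny, nz, slices=3):
    """Elements needed to hold `slices` cross sections of size ny x nz."""
    return slices * ny * nz


# Whole 512x512x64 grid on one processor: 3 slices of 512x64 per state variable.
full_grid_buffer = slice_buffer_elements(512, 64)

# Per-processor 512x32x64 sub-grid: one cross sectional slice is 32x64.
per_processor_slice = slice_buffer_elements(32, 64, slices=1)

# Inferred number of processors when the 512-wide dimension is cut into 32-wide strips.
processors = 512 // 32

print(full_grid_buffer, per_processor_slice, processors)
```

This reproduces the 3x512x64 = 98304 elements per state variable and the 2048 elements per slice quoted above, and suggests 16 sub-grids under the stated assumption.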

Solution of the internal (3D) equations
- Implicit solution
  - Fixed-point solution
    - Requires large precision to avoid rounding errors
    - Seems to be impractical
  - Floating-point solution
    - Requires large area (especially add/sub)
- Explicit solution
  - Smaller timestep
  - Simpler arithmetic unit

Conclusions
- Ocean modeling using emulated digital CNN is very promising
- Moderate precision is required in 2D mode
  - 1% accuracy using 24 bits
- Expected speedup (compared to an Athlon64 2GHz microprocessor)
  - 80 times on our RC200 prototyping board
  - 3700 times on the largest available FPGA