A Distributed Algorithm for 3D Radar Imaging PATRICK LI SIMON SCOTT CS 252 MAY 2012.

eWallpaper. Thousands of embedded, low-power RISC-V processors, connected in a 2D mesh network within the wallpaper. One radio and antenna per processor.

Applications and Challenges. Application: use the radio transceivers to image the room. Algorithm: each radio transmits pulses and records echoes; the echoes are combined using SAR techniques to form an image. Challenges: the response is distributed amongst the processors; the 2D mesh topology is restrictive; local memory per processor is limited (100KB).

How it Works

The Row-wise Transpose. [Figure: before transpose, after transpose.] Each processor sends its local data to all other processors in the row. Each node extracts its own data and forwards the rest after each hop. Requires N-1 hops to perform a full transpose.

The Column-wise Transpose. [Figure: before transpose, after transpose.] Each processor sends its local data to all other processors in the column. Each node extracts its own data and forwards the rest after each hop. Requires N-1 hops to perform a full transpose.
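The hop-by-hop exchange described for the row-wise and column-wise transposes can be sketched as a small simulation. This is an illustrative model only, not the authors' code: the function name `line_transpose`, the packet layout, and the assumption that every link carries one packet per direction per hop are all hypothetical.

```python
def line_transpose(data):
    """Simulate an all-to-all exchange along one row (or column) of the mesh.

    data[i][j] is the item that processor i holds for processor j.
    Returns (result, hops), where result[j][i] == data[i][j].
    """
    n = len(data)
    result = [[None] * n for _ in range(n)]
    for i in range(n):
        result[i][i] = data[i][i]  # each node keeps the item addressed to itself

    # A packet is a list of (source, destination, item) tuples.
    # right[i] / left[i] is what node i is about to send to node i+1 / i-1.
    right = [[(i, j, data[i][j]) for j in range(i + 1, n)] for i in range(n)]
    left = [[(i, j, data[i][j]) for j in range(i)] for i in range(n)]

    hops = 0
    while any(right) or any(left):
        new_right = [[] for _ in range(n)]
        new_left = [[] for _ in range(n)]
        for i in range(n):
            for src, dst, item in right[i]:  # arrives at node i + 1
                if dst == i + 1:
                    result[dst][src] = item  # extract
                else:
                    new_right[i + 1].append((src, dst, item))  # forward
            for src, dst, item in left[i]:  # arrives at node i - 1
                if dst == i - 1:
                    result[dst][src] = item  # extract
                else:
                    new_left[i - 1].append((src, dst, item))  # forward
        right, left = new_right, new_left
        hops += 1
    return result, hops
```

In this model the item travelling from one end of the line to the other is the last to arrive, which is where the N-1 hop count comes from when N is the number of processors in the line.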

The 3D Imaging Algorithm. The algorithm that runs on each processor, also known as the Fully Distributed pattern. [Figure key: communication in grey, computation in yellow. Computation steps: 2D FFT; backward propagation and Stolt; 3D IFFT.]
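The slides do not spell out how the 2D FFT is split across processors. One standard approach, assumed here, is the row-column decomposition: transform along rows, transpose, transform along rows again, and transpose back, which would explain why transposes appear as the communication steps. The sketch below checks that identity on a tiny grid. It uses a naive O(n^2) DFT in place of a real FFT to stay dependency-free, and all function names are hypothetical.

```python
import cmath


def dft(xs):
    """Naive 1D discrete Fourier transform (stand-in for an FFT)."""
    n = len(xs)
    return [
        sum(xs[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
        for f in range(n)
    ]


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def dft2_by_rows(grid):
    """2D DFT built only from row-wise 1D DFTs and transposes."""
    step1 = [dft(row) for row in grid]              # transform along rows
    step2 = [dft(row) for row in transpose(step1)]  # transform along columns
    return transpose(step2)                         # restore orientation


def dft2_direct(grid):
    """2D DFT computed straight from the definition, for comparison."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            sum(
                grid[r][c]
                * cmath.exp(-2j * cmath.pi * (u * r / rows + v * c / cols))
                for r in range(rows)
                for c in range(cols)
            )
            for v in range(cols)
        ]
        for u in range(rows)
    ]
```

The same reasoning extends to the 3D IFFT, with one more axis to transform.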

The Functional Simulator. For fast prototyping and debugging of eWallpaper applications. Applications are written in SPMD style. One program instance is launched per CPU. Each eWallpaper CPU is simulated in its own thread.

The Functional Simulator: Mesh Network API. Minimal communication layer: send_message(direction, message, message_size); receive_message(direction, message, message_size); set_receive_buffer(direction, buffer). Within a single MPI node, network functions are simulated using mutexes. Across MPI node boundaries, network functions are simulated using MPI commands. MPI node boundaries are invisible to the eWallpaper application.
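A rough feel for the single-node case (one thread per simulated CPU, neighbours exchanging messages through lock-protected channels) can be had from the sketch below. It is not the simulator's implementation: the `MeshNode` class, the direction names, and the `connect` helper are hypothetical, `queue.Queue` stands in for the mutex-based mechanism, and the `message_size` argument and `set_receive_buffer` call are dropped because Python objects carry their own length.

```python
import queue
import threading

OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}


class MeshNode:
    """One simulated CPU with an inbox per mesh direction."""

    def __init__(self, name):
        self.name = name
        self.inbox = {d: queue.Queue() for d in OPPOSITE}
        self.neighbors = {}

    def connect(self, direction, other):
        """Link this node to a neighbour in both directions."""
        self.neighbors[direction] = other
        other.neighbors[OPPOSITE[direction]] = self

    def send_message(self, direction, message):
        # The neighbour sees the message arriving from the opposite side.
        self.neighbors[direction].inbox[OPPOSITE[direction]].put(message)

    def receive_message(self, direction):
        # Blocks until a message arrives (timeout guards against deadlock).
        return self.inbox[direction].get(timeout=5)


def program(node, results):
    """The same program runs on every node (SPMD): greet the only neighbour."""
    direction = "east" if "east" in node.neighbors else "west"
    node.send_message(direction, f"hello from {node.name}")
    results[node.name] = node.receive_message(direction)


def run_pair():
    """Run two connected nodes, each in its own thread."""
    a, b = MeshNode("a"), MeshNode("b")
    a.connect("east", b)
    results = {}
    threads = [threading.Thread(target=program, args=(n, results)) for n in (a, b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the application only ever names a direction, the transport behind `send_message` could be swapped for MPI calls without the program noticing, which is the property the slide highlights.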

Imaging Results: 3 Points. [Figure: original scene, recovered scene.]

Imaging Results: Sphere. [Figure: original scene, recovered scene.]

Imaging Results: Human Skull. [Figure: recovered scene.]

Timing and Memory Model. The timing model was developed from analysis of application code running on the functional simulator. The processor spends more than 90% of its time communicating. [Figure: memory requirements.]

Network Simulator. A Python-based discrete-event simulator accurately simulates network traffic on eWallpaper. Simulated inter-processor communication events: 1. Packet transmission. 2. Arrival of packet head. 3. Arrival of packet tail. 4. Acknowledgement of packet reception. 5. Network buffer full/empty. Timing of events is based on the projected link bandwidth and latency of the eWallpaper network. This allows the performance of different communication patterns to be predicted.
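As a minimal illustration of the discrete-event idea, the sketch below follows a single packet across one link using a time-ordered event queue. It is not the authors' simulator: the function name, the event chain, and the 100 ns latency are assumptions, and the buffer full/empty events are omitted. The default bandwidth of 1 bit per nanosecond corresponds to the 1Gbps link figure quoted elsewhere in the slides.

```python
import heapq


def simulate_packet(size_bits, bandwidth_bits_per_ns=1, latency_ns=100):
    """Return the time-ordered (time_ns, event) log for one packet on one link."""
    serialization_ns = size_bits // bandwidth_bits_per_ns

    # Each event schedules its successor after a delay (all times in ns).
    follow_up = {
        "packet transmission": ("arrival of packet head", latency_ns),
        "arrival of packet head": ("arrival of packet tail", serialization_ns),
        "arrival of packet tail": ("acknowledgement received", latency_ns),
    }

    pending = [(0, "packet transmission")]  # min-heap ordered by time
    log = []
    while pending:
        now, event = heapq.heappop(pending)  # always process the earliest event
        log.append((now, event))
        if event in follow_up:
            next_event, delay = follow_up[event]
            heapq.heappush(pending, (now + delay, next_event))
    return log
```

With many packets and links sharing one heap, the same loop interleaves their events in time order, which is what lets competing communication patterns be compared.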

Communication Patterns (our algorithm)

Communication Patterns: Speed. Only the Fully Distributed and 16x16 Cluster patterns are fast enough to deliver realtime video framerates.

Communication Patterns: Memory. All patterns except Fully Distributed and 16x16 Cluster exceed the available memory per node (100KB).

Framerate vs. Resolution. At the planned resolution of 128 x 128 antennas, a framerate of 75 fps is achieved.

Speedup vs. Resolution. At a resolution of 128 x 128, our algorithm (fully distributed pattern) is 600 times faster than a serial implementation (single node pattern).
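A quick back-of-envelope calculation puts the framerate and speedup figures side by side. It uses only numbers quoted in the slides (75 fps, 600x speedup, 128 x 128 antennas, one processor per antenna); the derived quantities are not stated in the source, and the efficiency line uses the usual definition of speedup divided by processor count.

```python
FPS = 75             # framerate at 128 x 128 (from the slides)
SPEEDUP = 600        # fully distributed vs. single node (from the slides)
NODES = 128 * 128    # one processor per antenna -> 16384 processors

frame_ms = 1000 / FPS         # time per frame in the distributed case, about 13.3 ms
serial_s = SPEEDUP / FPS      # implied time per frame on a single node, 8.0 s
efficiency = SPEEDUP / NODES  # parallel efficiency, roughly 3.7%

print(f"{frame_ms:.1f} ms per frame, {serial_s:.1f} s serial, "
      f"{efficiency:.1%} parallel efficiency")
```

A low parallel efficiency is what one would expect from a workload that spends most of its time communicating.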

CPU Time Breakdown vs. Resolution

Effect of Changing Bandwidth. At the proposed link bandwidth of 1Gbps, the achieved framerate of 75 fps results in a CPU utilization of 0.03.

Effect of Precomputation. Higher framerates can be achieved if the FFT, Stolt and backward propagation coefficients are precomputed, but at the expense of memory.

Conclusions. Developed a functional simulator for eWallpaper simulations. The timing model and network simulator allow the performance of applications to be predicted. Our parallel imaging algorithm achieves realtime video framerates with feasible memory and bandwidth requirements.

Future Work