Asymmetric FPGA-loaded hardware accelerators for FPGA-enhanced CPU systems with Linux
Performed by: Avi Werner, William Backshi
Instructor: Evgeny Fiksman

Presentation transcript:

Asymmetric FPGA-loaded hardware accelerators for FPGA-enhanced CPU systems with Linux
Performed by: Avi Werner, William Backshi
Instructor: Evgeny Fiksman
Duration: 1 year (2 semesters)
Final project presentation, 30/03/2009

RMI Processor

RMI – SW Programming Model

Agenda
- Project description
- Design considerations and schematics
- System diagram and functionality
- System flow
- User & Kernel drivers API
- Issues
- Future progress

Project definition
- An FPGA-based system.
- An asymmetric multiprocessor system, with a Master CPU and several slave Accelerators (modified soft-core CPUs with RAM), which may share the same opcode set or use different ones.
- The Master CPU runs a single-processor Linux OS, with the Accelerators' functionality exposed to applications in the OS through a driver API.

The Platform
- Platform: ML310 with PPC405.
- Accelerators: based on uBlaze (MicroBlaze) soft-core microprocessors.
- Controllers: an IRQ controller for each core.
"Accelerator" refers to microprocessor + IRQ generator + RAM.

HW Design considerations
- Scalability – the design is CPU-independent.
- The Accelerator works with interrupts – no polling.
- Modularity – stubs need only the Interrupt Generator.
- The OS does not work with interrupts – generic HW compatibility.
- Separate register space.
- Single-cycle transaction for accessing accelerator status.
- Data mover stub init includes the chunk size.
- Partial bus separation.

SW Design considerations
- Scalability – the design is CPU-independent.
- Works with kernel 2.6 & glibc (libraries).
- Modularity – SW built in layers, with APIs for communication.
- The stub doesn't know on which slave core it runs.
- The kernel image is loaded into memory using a CPIO FS.
- The kernel driver is polling (single check).
- The User-Land driver gives the user an easy & intuitive API for launching tasks on Accelerator cores.
- Separating the driver into User and Kernel parts enables flexibility, since feature changes can be made solely in the user-land driver and don't require knowledge of kernel internals.
- The stub code enables the target code to work with interrupts, supporting interrupt-handling applications.
- Data Mover stub init includes the chunk size – no character recognition needed.

Accelerator Schematics – block diagram: CPU (uBlaze) with separate instruction and data buses, dual-port Data & Instr. RAM behind MEM controllers, IRQ Generator, General Purpose Registers, and PLB v.4.6 slave/master interfaces (IRQ line toward the Master).

HW Design Schematics – block diagram: PPC with DDR MEM and MMU, two PLB v.4.6 buses joined by a PLB-to-PLB bridge, and the Accelerators (Data & Instr MEM) on the second bus.
1. The PLB-to-PLB bridge is needed for communication between the PPC and the IRQ Generators.
2. The MMU has 2 PLB buses in order to present the main memory to the Accelerator cores at a non-zero address.

SW/HW layers – Accelerated Software platform:
- HW (FPGA): PPC 405, Accelerators (Instr MEM & Data MEM), DDR MEM, MMU.
- Low-level SW: Linux (kernel 2.6), Protocol communication layer, Software Stub (data mover & executer).
- User SW: Virtual communication layer.
- Kernel Land: kernel driver, driver-allocated memory.
- User Land: user-land driver, demo application.

Partial SW Flow

Partial SW Flow – continued

Accelerator Stub API
When the target code finishes, it needs to pass control back to the stub using the return_control_to_stub function:
void return_control_to_stub(int * ret_val, int count)
The function takes 2 parameters: a pointer to the first int and the number of ints to copy back (the data doesn't have to be ints; it is simply passed through as a predefined number of ints – in other words, in 32-bit data blocks).
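For illustration, a minimal sketch of target code returning its results through the stub; the buffer, its size and the computation are hypothetical, and only return_control_to_stub() comes from the API above.

extern void return_control_to_stub(int *ret_val, int count);

#define RESULT_WORDS 4

static int results[RESULT_WORDS];

void target_main(void)
{
    /* Hypothetical workload: fill the result buffer. */
    for (int i = 0; i < RESULT_WORDS; i++)
        results[i] = i * i;

    /* Hand RESULT_WORDS 32-bit words back and return control to the stub. */
    return_control_to_stub(results, RESULT_WORDS);
}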

User driver API
- void FUD_open_device()
- void FUD_close_device()
- void FUD_allocate_and_load_to_memory(char *file_name, u_core_dsc *ret_st)
- void FUD_run(int core_id)
- void FUD_get_ret_values(int core_id, void* ret_ptr)

typedef struct _userland_core_dsc {
    int valid;
    int core_id;
    int ret_size;
} u_core_dsc;
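A minimal usage sketch, assuming a hypothetical header name and target binary name, that ret_size counts 32-bit words (matching the stub's count parameter), and that FUD_get_ret_values() returns valid data only once the core has finished; the FUD_* calls and u_core_dsc themselves are taken from the list above.

#include <stdio.h>
#include <stdlib.h>
#include "fpga_user_driver.h"   /* hypothetical header declaring the FUD_* API */

int main(void)
{
    u_core_dsc core;   /* descriptor filled in by the user-land driver */

    FUD_open_device();

    /* Load the (hypothetical) target binary into driver-allocated memory. */
    FUD_allocate_and_load_to_memory("target_code.bin", &core);
    if (!core.valid) {
        fprintf(stderr, "failed to load target code\n");
        FUD_close_device();
        return 1;
    }

    /* Launch the task on the chosen accelerator core. */
    FUD_run(core.core_id);

    /* Collect the results (assuming ret_size is a count of 32-bit words). */
    int *ret_buf = malloc(core.ret_size * sizeof(int));
    FUD_get_ret_values(core.core_id, ret_buf);

    free(ret_buf);
    FUD_close_device();
    return 0;
}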

Kernel driver API
- int FPGA_drv_open(struct inode *inode, struct file *filp)
- int FPGA_drv_release(struct inode *inode, struct file *filp)
- static int FPGA_drv_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
- IOCTL (I/O Control) commands:
  GET_CONTINUOUS_MEMORY
  FREE_CONTINUOUS_MEMORY
  READ_FROM_CORE
  WRITE_TO_CORE
  CHECK_IF_CORE_FINISHED
As mentioned before, the kernel driver doesn't perform the actual polling of the slave cores (this would be bad practice, since the driver runs in ring 0 and might block user applications).
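As a rough sketch of this split, here is how the user-land driver might issue the single-check poll from user space. The device-node path and the numeric ioctl encoding are assumptions (the real values live in the driver's headers); only the command name and the ring-0 rationale come from the slide.

#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define FPGA_DEV "/dev/fpga_acc"          /* hypothetical device node */
#define CHECK_IF_CORE_FINISHED 0x05       /* hypothetical command value */

/* Single, non-blocking check of one slave core: the kernel driver reads the
 * status once, and the user-land driver decides when (and whether) to ask again. */
static int core_finished(int fd, int core_id)
{
    return ioctl(fd, CHECK_IF_CORE_FINISHED, core_id);
}

int poll_core_once(int core_id)
{
    int fd = open(FPGA_DEV, O_RDWR);      /* -> FPGA_drv_open() in the kernel */
    if (fd < 0)
        return -1;

    int done = core_finished(fd, core_id);

    close(fd);                            /* -> FPGA_drv_release() */
    return done;
}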

Issues along the way
- Linux on the ML310 isn't easy to bring up.
- Xilinx SystemACE (CF controller) issues.
- Xilinx's memory controller is unstable.
- The debug tools are poor.
- PPC emulators don't simulate the PPC405 well.

Future progress
Future projects:
- Building a compiler for the system, to simplify and automate preparation of the target code.
- Supporting data coherency in caches (perhaps using the platform as a HW base for a cache-coherency project).
- Improving the platform – switching to ALTERA (hopefully with a better memory controller).

Conclusions
- Code migration works.
- Simulators are a good tool for SW debug.
- Choosing the right simulator is important.
- Don't work with old/deprecated equipment.
- The Xilinx environment necessitates constant maintenance.
- Embedded debug tools are very useful.
- Visual step-by-step debugging solved most of the issues.
- Working in Xilinx EDK with predefined IPs speeds up generating a complex design, but imposes severe difficulties when trying to customize it.

System Flow – how to
- The HW is loaded onto the FPGA, Linux runs on the central PPC core, and the accelerators are preloaded with the client software stub.
- Linux finishes loading; log into busybox (using "--login").
- The SW driver is loaded into memory (using the famous ./1 script).
- Play a bit with the marvelous shell to test liveness (using "ls" or "cat 1").
- The demo application is launched (using ./demo_app).
- Select the desired functionality and smile.
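Put together as a console session (a sketch only: ./1, ./demo_app, "ls", "cat 1" and the --login flag appear on the slide; the comments are context, and no output is shown):

# Linux has booted on the PPC core; busybox shell was started with --login
./1            # load the SW (kernel) driver
ls             # quick liveness check of the shell and root FS
cat 1
./demo_app     # launch the demo application, then select the desired functionality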