F. Gharsalli, S. Meftali, F. Rousseau, A.A. Jerraya TIMA laboratory 46 avenue Felix Viallet 38031 Grenoble Cedex - France Embedded Memory Wrapper Generation.

Presentation transcript:

Embedded Memory Wrapper Generation for Multi-processor SoC Design
F. Gharsalli, S. Meftali, F. Rousseau, A.A. Jerraya
TIMA laboratory, 46 avenue Felix Viallet, 38031 Grenoble Cedex, France

Memory for SoC
• SoC: a single chip
  - Heterogeneous components (CPU, IP, …)
  - Application-specific architecture
• Integration of standard memory IP
  - Adaptation of memory protocols to the specific network (N processors)
[Figure: DSP, CPU and IP blocks on a communication network; SRAM and FLASH memories attached through GLUE logic]

Memory for SoC (continued)
[Figure: same system; a Wrapper now sits between the communication network and the SRAM and FLASH memories, where the previous slide showed GLUE]

Outline
• Introduction
  - Memory IP based design
  - Memory integration issues
• Architectural Models and Basic Concepts
• Memory Wrapper
  - Generic architecture
  - Automatic generation
• Experiments
• Conclusion

Memory IP based design
• Steadily increasing capacity
• Memory reuse based design
  - To close the gap between capacity and productivity
Memory interface design is a dominant problem.

Memory integration issues
• Complex system design
  - Heterogeneous components
    * Several logical ports and specific communication protocols
  - Standard memory components
    * Limited physical ports and standard access protocols
• Large memory design space exploration
  - Different memory characteristics (type, size, consumption)
• Multi-master SoC
  - Parallel accesses to the global memory

Memory integration issues: consequences
• Complex system design
  - Port adaptation is needed
• Large memory design space exploration
  - Wrapper flexibility is required
• Multi-master SoC
  - Sophisticated synchronization mechanisms are required

Related Work
• Port adaptation
  - CoWare
  - Polis
  - Cadence (VCC)
• Wrapper flexibility
  - Marie Curie
  - COSY
• Synchronization mechanisms
  - Fixed priority (PalmChip)
  - TDMA and Round-Robin (Sonics)
None of the existing strategies has fully addressed the memory IP integration problems described above.
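The three synchronization policies named on this slide can be contrasted with a small behavioral sketch. It is not taken from the PalmChip or Sonics products; the function names, the request representation (one boolean per master) and the slot table are assumptions made for illustration only.

```python
from typing import List, Optional


def fixed_priority(requests: List[bool]) -> Optional[int]:
    """Grant the lowest-numbered requesting master (index 0 = highest priority)."""
    for master, wants in enumerate(requests):
        if wants:
            return master
    return None


def round_robin(requests: List[bool], last_granted: int) -> Optional[int]:
    """Scan the masters starting just after the previous winner."""
    n = len(requests)
    for offset in range(1, n + 1):
        master = (last_granted + offset) % n
        if requests[master]:
            return master
    return None


def tdma(requests: List[bool], cycle: int, slot_table: List[int]) -> Optional[int]:
    """Grant the owner of the current time slot, and only if it is requesting."""
    owner = slot_table[cycle % len(slot_table)]
    return owner if requests[owner] else None
```

Fixed priority can starve low-priority masters, round-robin rotates the grant among requesters, and TDMA leaves a slot idle when its owner has nothing to send.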

Our Contributions
• Generic memory wrapper architecture
  - Port adaptation
  - Memory flexibility
  - Arbitration between parallel memory accesses
• Automatic generation of the memory wrapper by assembling library components

Architectural models
• Virtual architecture model
  - Abstract modules (virtual modules)
  - Abstract channels
  - Implicit communication procedures
  - Wrapper specification but no implementation
• Micro-architecture model
  - Module implementations
  - Physical communication network
  - Explicit communication procedures
  - HW wrapper implementation and synthesis
[Figure: virtual architecture (modules M1 and M2 connected to MEMORY through channels) beside the micro-architecture (M1 module implementation with OS and wrapper, physical communication network, MEMORY)]

Basic concepts: virtual module
• Separation between behavior and communication interface
  - Memory access must be independent of the memory type
• Hiding the abstraction level of the memory description
  - Memory integration must be independent of these abstraction levels
    * Logical and physical accesses
  - To adapt these accesses, we use a wrapper
[Figure: memory IP inside a wrapper; internal port (physical memory port), external port (logic port), virtual port, Channel 1, Channel 2]

Memory wrapper architecture
• Generic wrapper architecture
  - Memory-dependent part
    * Memory port adapter (MPA)
  - Communication-dependent part
    * Channel adapter (CA)
  - Internal bus (IB)
    * Address, data and control
  - Arbiter
[Figure: memory wrapper between the communication network and the memory IP; the channels feed CA1, CA2 and CA3, which share the IB with the arbiter and the MPA; the MPA drives the memory bus]
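How the four wrapper parts cooperate can be pictured with a minimal behavioral model. This is only a sketch under stated assumptions, not the authors' SystemC implementation: the class names, the `Request` record, the list used as a stand-in for the memory IP, and the round-robin arbitration are all hypothetical choices made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Request:
    write: bool
    address: int
    data: int = 0


class MemoryPortAdapter:
    """Memory-dependent part: turns internal-bus requests into memory accesses."""

    def __init__(self, size_words: int) -> None:
        self.cells = [0] * size_words  # stand-in for the memory IP (assumption)

    def access(self, req: Request) -> Optional[int]:
        if req.write:
            self.cells[req.address] = req.data
            return None
        return self.cells[req.address]


class ChannelAdapter:
    """Communication-dependent part: buffers the requests of one channel."""

    def __init__(self) -> None:
        self.pending: Deque[Request] = deque()
        self.responses: List[int] = []

    def send(self, req: Request) -> None:
        self.pending.append(req)


class MemoryWrapper:
    def __init__(self, n_channels: int, size_words: int) -> None:
        self.cas = [ChannelAdapter() for _ in range(n_channels)]
        self.mpa = MemoryPortAdapter(size_words)
        self._last = n_channels - 1  # round-robin pointer (assumed policy)

    def step(self) -> Optional[int]:
        """One internal-bus transfer: the arbiter picks a CA, the MPA serves it."""
        n = len(self.cas)
        for offset in range(1, n + 1):
            idx = (self._last + offset) % n
            ca = self.cas[idx]
            if ca.pending:
                result = self.mpa.access(ca.pending.popleft())
                if result is not None:
                    ca.responses.append(result)
                self._last = idx
                return idx
        return None
```

Swapping the memory only touches `MemoryPortAdapter` in this model, while a different communication protocol only touches `ChannelAdapter`, which mirrors the split between the memory-dependent and communication-dependent parts.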

Flexibility of the memory architecture
• Flexible memory wrapper architecture for a large design space exploration
• Flexibility is ensured by generic and modular models
  - CA: customized with communication-network-specific parameters
  - MPA: customized with memory-specific parameters
  - Only the Memory Port Adapter part is changed
[Figure: two wrappers with the same CA1, CA2, CA3, arbiter and IB on the communication network side; one uses a single MPA and one memory bus for a single-port memory IP, the other uses MPA1 and MPA2 and two memory busses for a dual-port memory IP]

Memory wrapper generation flow
• Wrapper generation
  - Input:
    * Memory IP library
    * Wrapper components library (CA, MPA)
    * Architectural parameters
      - Number of ports, channels, protocols
  - Action:
    * Customizing the generic CA and MPA from the library using the architectural parameters
    * Instantiation of the customized CA and MPA
    * Interconnection to the rest of the system
  - Output:
    * Micro-architecture
[Figure: flow from the virtual architecture annotated with parameters, the memory IP library and the CA/MPA library, through wrapper generation, to the micro-architecture]
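The flow above (customize, instantiate, interconnect) can be sketched as a function from architectural parameters to a list of wrapper components. Everything here is hypothetical: the `WrapperParams` fields, the `MPA_LIBRARY` table and the component names are illustrative, and the rule for when an arbiter and a shared internal bus are needed is an assumption inferred from the diagrams in this deck, not a rule stated by the authors.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WrapperParams:
    memory_type: str     # e.g. "SRAM" or "SDRAM"
    port_number: int     # physical memory ports
    port_width: int      # memory bus width in bits
    access_mode: str     # e.g. "burst" or "R/W"
    channel_number: int  # logical channels reaching the memory


# Hypothetical wrapper-component library: memory type -> generic MPA template.
MPA_LIBRARY: Dict[str, str] = {"SRAM": "SRAM_MPA", "SDRAM": "SDRAM_MPA"}


def generate_wrapper(p: WrapperParams) -> Dict[str, List[str]]:
    """Pick generic components, customize them, and list the instances to connect."""
    template = MPA_LIBRARY[p.memory_type]  # KeyError if the memory is not in the library
    mpas = [f"{template}{i + 1}" for i in range(p.port_number)]
    cas = [f"CA{i + 1}" for i in range(p.channel_number)]
    if p.channel_number == p.port_number:
        # Assumed rule: one private internal bus per channel, no arbitration needed.
        buses = [f"IB{i + 1}" for i in range(p.channel_number)]
        arbiters: List[str] = []
    else:
        # Assumed rule: channels share one internal bus, so an arbiter is required.
        buses = ["IB"]
        arbiters = ["arbiter"]
    return {"MPA": mpas, "CA": cas, "IB": buses, "arbiter": arbiters}
```

With the parameters of the two experiments later in the deck, this sketch yields two SRAM MPAs with two internal buses and no arbiter for the dual-port SRAM, and one SDRAM MPA with a shared internal bus and an arbiter for the single-port SDRAM.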

Image Filtering Process
[Figure: input image and output image]

Experiments
• Low-level image processing for a digital camera
  - The initial specification is
    * Memory rich (2 Mbytes Flash, 2 Mbytes ROM, 256 Kbytes SRAM)
    * Processor poor (only one 8-bit RISC processor)
• Acceleration by adding another processor
  - We use 2 ARM7 processors
  - 1 global memory
  - Point-to-point communication network
• 2 experiments to prove the memory flexibility ensured by the wrapper
  - Experiment 1: using a dual-port SRAM
  - Experiment 2: using a single-port SDRAM

Experiment 1: Dual-port memory
[Figure: virtual architecture with module M1 (T1, T2) and module M2 (T3, T4) connected through logical channels to a dual-port SRAM]
Extracted parameters:
  Port number     2
  Port type       sc_lv
  Port width      32
  Access mode     Burst
  Channel number  2
  …
[Figure, built up over several slides: Module 1 and Module 2 implementations (each an ARM7 ISS with a CPU wrapper) connected to the dual-port SRAM through the memory wrapper. The wrapper is refined step by step: first two SRAM MPAs on the memory busses (32), then CA1 and CA2 (each AFIFO + BUFFER), then the internal buses IB1(32) and IB2(32)]

Experiment 1: Dual-port memory
• MPA services
  - Test
  - Address decoding
  - Access mode
    * Burst mode
      - Burst sequence (4 words)
  - Bank control
[Figure: Module 1 and Module 2 implementations (each an ARM7 ISS with a CPU wrapper), CA1 and CA2 (each AFIFO + BUFFER), IB1(32) and IB2(32), two SRAM MPAs, memory busses (32), dual-port SRAM]
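Two of the MPA services listed here, address decoding and the 4-word burst, can be illustrated together. The slide only gives the burst length; the sequential (incrementing) address order, word addressing and the bank size used below are assumptions for illustration.

```python
from typing import List, Tuple

BURST_LENGTH = 4  # "burst sequence (4 words)" from the slide


def decode(address: int, words_per_bank: int) -> Tuple[int, int]:
    """Split a word address into (bank index, offset inside the bank)."""
    return divmod(address, words_per_bank)


def sequential_burst(start: int, words_per_bank: int) -> List[Tuple[int, int]]:
    """Decoded addresses of one burst, assuming consecutive word addresses."""
    return [decode(start + i, words_per_bank) for i in range(BURST_LENGTH)]
```

For example, with hypothetical 1024-word banks, a burst starting at word 1022 crosses a bank boundary, which is where a bank control service would have to switch banks in the middle of the burst.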

Experiment 2: Single-port memory
[Figure: virtual architecture with module M1 (T1, T2) and module M2 (T3, T4) connected through logical channels to a single-port SDRAM]
Extracted parameters:
  Port number     1
  Port type       sc_lv
  Port width      16
  Access mode     R/W
  Channel number  2
  …
[Figure, built up over several slides: Module 1 and Module 2 implementations (each an ARM7 ISS with a CPU wrapper) connected to the single-port SDRAM through the memory wrapper. The wrapper is refined step by step: first CA1 and CA2 (each AFIFO + BUFFER), then one SDRAM MPA on the memory bus (16), then the shared internal bus IB(32) with its arbiter]

Experiment 2: Single-port memory
• MPA services
  - Test
  - Address decoding
  - Access mode
    * Classic R/W mode
  - Bank control
  - Initialization
  - Refresh
  - Bit conversion
[Figure: Module 1 and Module 2 implementations (each an ARM7 ISS with a CPU wrapper), CA1 and CA2 (each AFIFO + BUFFER), IB(32) with arbiter, SDRAM MPA, memory bus (16), single-port SDRAM]
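The diagram shows a 32-bit internal bus in front of a 16-bit memory bus, so the bit conversion service presumably has to split and reassemble words. A minimal sketch of such a conversion follows; the function names and the least-significant-half-first ordering are assumptions, since the slides do not specify them.

```python
from typing import List


def split_word(word: int, bus_width: int = 32, mem_width: int = 16) -> List[int]:
    """Split one internal-bus word into memory-width beats, least significant first."""
    if bus_width % mem_width != 0:
        raise ValueError("bus width must be a multiple of the memory width")
    mask = (1 << mem_width) - 1
    return [(word >> shift) & mask for shift in range(0, bus_width, mem_width)]


def join_beats(beats: List[int], mem_width: int = 16) -> int:
    """Reassemble memory-width beats (least significant first) into one bus word."""
    mask = (1 << mem_width) - 1
    word = 0
    for i, beat in enumerate(beats):
        word |= (beat & mask) << (i * mem_width)
    return word
```

Under these assumptions each 32-bit transfer on the internal bus costs two accesses on the 16-bit memory bus, which is one plausible contributor to the higher cycle count reported for this experiment.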

Results
• SystemC code size for the memory wrapper
  - Experiment 1: 1438 lines
  - Experiment 2: 1335 lines
• Latency (without memory latency)
  - Write: 3 CPU cycles
  - Read: 7 CPU cycles (send/receive)
• Simulation results for an image of 387 x 222:
  - Experiment 1: 2.05 million CPU cycles
  - Experiment 2: 2.97 million CPU cycles
• Fast design exploration with different memories thanks to automatic memory wrapper generation
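To put the simulation figures in perspective, the cycle counts can be normalized by the image size. The short calculation below uses only the numbers on this slide; the per-pixel normalization itself is an added reading of the results, not a metric reported by the authors.

```python
WIDTH, HEIGHT = 387, 222
TOTAL_CYCLES = {
    "exp1": 2.05e6,  # Experiment 1: dual-port SRAM
    "exp2": 2.97e6,  # Experiment 2: single-port SDRAM
}

pixels = WIDTH * HEIGHT
cycles_per_pixel = {name: total / pixels for name, total in TOTAL_CYCLES.items()}
ratio = TOTAL_CYCLES["exp2"] / TOTAL_CYCLES["exp1"]

for name, value in cycles_per_pixel.items():
    print(f"{name}: {value:.1f} CPU cycles per pixel")
print(f"exp2 / exp1 = {ratio:.2f}")
```

This works out to roughly 24 CPU cycles per pixel for the dual-port SRAM and roughly 35 for the single-port SDRAM, so the single-port configuration needs about 45% more cycles on this image.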

Conclusion
• Systematic method to integrate memory IP in multi-processor SoC architectures at system level
• Generic memory wrapper architecture
  - Port adaptation
  - Flexibility of the memory architecture
  - Arbitration of parallel accesses
• Automatic memory wrapper generation is done by assembling library components
• Fast memory design exploration
• Application to low-level image processing

Perspectives
• Generalization of the IP wrapper architecture based on the generic wrapper model
• Using a more sophisticated communication network, such as an AMBA bus or a packet-switched communication network
• Configurable memory test bench

THANK YOU