Hardware-based CIL-machine Nizhniy Novgorod State University, Russia Laboratory of Physical Fundamentals and Technologies of Wireless Communications reporter:

1 Hardware-based CIL-machine. Nizhniy Novgorod State University, Russia. Laboratory of Physical Fundamentals and Technologies of Wireless Communications. Reporter: Maxim Shuralev. Head of the project: Dr. Alexey Umnov.

2 Slide 2 Hardware CIL processor project team. Hardware: Maxim Shuralev, Maxim Sokolov, Dmitry Mordvinov (NNSU, Wireless Lab). Software, workloads and tools: Andrey Eltsov (NNSU, Wireless Lab), Roman Mitin, Sergey Lyalin, Sergey Galkin, Ilia Golubev (NNSU, IT Lab). Support: Dmitry Golovachev, Svetlana Surova, Elena Pankratova (NNSU, Wireless Lab). Consultants: Aliaksei Chapyzhenka (Intel), Dmitry Ragozin (Intel), Sergey Chernyshov (Nizhniy Novgorod State Technology University). Head of Wireless Lab: Alexey Umnov.

3 Slide 3 Agenda: introduction; architecture of the CIL processor; description of the DSP core; description of the CIL core; speed-up features of the CIL core (a metainformation cache, a hardware stack, a hardware type control engine); garbage collector implementation; example of a DSP workload for the processor; development board for the processor implementation; HW implementation results; software support and libraries; conclusion and comparison.

4 Slide 4 Introduction: porting the .NET engine to an energy-efficient, low-power mobile platform. Advantages and disadvantages of a stack-based CIL engine: the maximum execution speed cannot exceed one CIL instruction per clock; the stack engine is the simplest way to execute machine code, since instruction decoding and the processor structure are very simple; limited ability to execute instructions in parallel; low complexity and low power consumption.
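To make the "one instruction per clock" limit and the simple decoding concrete, here is a minimal Python sketch of a software stack engine for a handful of CIL instructions. The opcode values are the single-byte encodings from ECMA-335 Partition III; the `run` function, the tiny opcode subset and the absence of any verification are assumptions of this sketch, not the project's design.

```python
# Opcode values: single-byte encodings from ECMA-335 Partition III.
LDC_I4_0, LDC_I4_8, LDC_I4_S = 0x16, 0x1E, 0x1F
DUP, POP, RET = 0x25, 0x26, 0x2A
ADD, SUB, MUL = 0x58, 0x59, 0x5A


def to_int32(value):
    """Wrap an integer to signed 32 bits, as CIL int32 arithmetic does."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def run(code):
    """Execute CIL bytes; return (value left by ret, instructions executed).

    One loop iteration = fetch + decode + execute of exactly one
    instruction, so this engine never retires more than one per step.
    """
    stack, pc, executed = [], 0, 0
    while True:
        op = code[pc]
        pc += 1
        executed += 1
        if LDC_I4_0 <= op <= LDC_I4_8:        # ldc.i4.0 .. ldc.i4.8
            stack.append(op - LDC_I4_0)
        elif op == LDC_I4_S:                  # ldc.i4.s <int8>
            operand = code[pc]
            pc += 1
            stack.append(operand - 256 if operand > 127 else operand)
        elif op == DUP:
            stack.append(stack[-1])
        elif op == POP:
            stack.pop()
        elif op in (ADD, SUB, MUL):
            b = stack.pop()                   # top of stack is the 2nd operand
            a = stack.pop()
            if op == ADD:
                result = a + b
            elif op == SUB:
                result = a - b
            else:
                result = a * b
            stack.append(to_int32(result))
        elif op == RET:
            return (stack.pop() if stack else None), executed
        else:
            raise NotImplementedError(f"opcode 0x{op:02X} not modelled")
```

For example, `run(bytes([0x18, 0x19, 0x58, 0x1D, 0x5A, 0x2A]))` evaluates (2 + 3) * 7 and returns `(35, 6)`: six instructions, six loop iterations.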

5 Slide 5 Introduction: application target and target market. .NET is intended for various Web-oriented services, distributed business databases, online transactions, CRM system support, etc. The CIL processor is not meant to compete with desktop processors and PDAs in performance, but it is great for the mobile market and the digital home! The target is specialized end-user devices oriented toward: mobile devices; Web terminals, Web browsers and interactive TV; house control systems.

6 Slide 6 Introduction: requirements for the CIL processor. Execute .NET (CIL) code directly: .NET is the native code. Consume little power from the power supply: mobile low-power devices. Handle DSP tasks effectively: a new generation of interactive multimedia mobile devices.

7 Slide 7 Architecture: high-level structure of the CIL processor implementation (programmer's model).

8 Slide 8 Architecture: high-level hardware structure of the CIL processor (figure: hardware structure).

9 Slide 9 Architecture: why DSP-based? Is it a waste of development time or a necessity for the digital home, given that the CIL processor is an excellent solution for the digital home? Pro: a firmware layer for executing very complex CIL instructions; 5-10 times higher performance in multimedia applications. Contra: increased development time; strictly, only the standard CIL instruction set needs to be implemented, not a DSP.

10 Slide 10 Architecture: why DSP-based? Hardware implementation. Pro: efficient, low-power computational kernel; good mapping of CIL instructions to DSP instructions; low power consumption in multimedia tasks; technology similar to the existing and efficient ARM Jazelle technology for Java. Contra: only serial instruction execution (as we have a stack-based CIL instruction set and do not want to use superscalar techniques).
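The "good mapping" point can be pictured as a lookup from simple CIL opcodes to single DSP-core operations. In this hypothetical Python sketch the CIL opcode values come from ECMA-335, but the DSP mnemonics (`ADD`, `MUL`, ...) and the `TRAP` marker for instructions that fall back to firmware are invented for illustration; the slides do not list the DSP instruction set.

```python
# CIL opcode (ECMA-335 single-byte encoding) -> hypothetical DSP mnemonic.
CIL_TO_DSP = {
    0x58: "ADD",  # add
    0x59: "SUB",  # sub
    0x5A: "MUL",  # mul
    0x5F: "AND",  # and
    0x60: "OR",   # or
    0x61: "XOR",  # xor
    0x62: "SHL",  # shl
    0x63: "SHR",  # shr (arithmetic)
    0x65: "NEG",  # neg
    0x66: "NOT",  # not
}


def translate(opcodes):
    """Map each CIL opcode to one DSP operation.

    Opcodes without a one-to-one DSP equivalent are emitted as a
    (hypothetical) TRAP so that a firmware routine can handle them.
    """
    return [CIL_TO_DSP.get(op, f"TRAP 0x{op:02X}") for op in opcodes]
```

For instance, `translate([0x58, 0x5A, 0x75])` yields `["ADD", "MUL", "TRAP 0x75"]`; 0x75 is `isinst`, a type check with no single-instruction DSP equivalent.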

11 Slide 11 Architecture: why DSP-based? 2-in-1: two native instruction sets on board. Complex CIL instructions (e.g. type hierarchy checks and safety checks) are simply implemented in firmware as DSP instructions. 5x-10x speed improvement for DSP workloads. Low overhead in terms of extra transistors on chip.
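Type hierarchy checks are a good example of an instruction that is awkward to hard-wire but easy to express as a short firmware loop. The Python sketch below models CIL `isinst` by walking a base-type chain; the `TYPE_BASE` table, the dict-based object model and the example type names are hypothetical, and interfaces, arrays and generics are ignored.

```python
# Hypothetical type table: each type maps to its direct base type.
TYPE_BASE = {
    "System.Object": None,
    "Shape": "System.Object",
    "Circle": "Shape",
    "Square": "Shape",
}


def is_subtype(type_name, target):
    """Follow base-type links from type_name until target or the root is found."""
    while type_name is not None:
        if type_name == target:
            return True
        type_name = TYPE_BASE[type_name]
    return False


def isinst(obj, target):
    """Model of CIL isinst: return obj if it is an instance of target, else None.

    Objects are modelled as dicts with a "type" key; None stands for null.
    """
    if obj is not None and is_subtype(obj["type"], target):
        return obj
    return None
```

The loop runs once per level of inheritance, so its cost depends on the depth of the hierarchy, which is one reason such checks suit a firmware routine better than a fixed hardware path.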

12 Slide 12 Description of the DSP core: units ALU, AGU-1, AGU-2.

13 Slide 13 Description of the CIL core: in CIL execution mode, the programmer sees an exact implementation of the ECMA-335 standard CIL engine.

14 Slide 14 Speed-up features of the CIL core: metainformation cache (constant table, string table, method table, class field table, type table, smart array table).
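A metainformation cache avoids re-reading metadata tables every time an instruction refers to a token. The Python sketch below is a direct-mapped cache keyed by ECMA-335 metadata tokens, whose top byte selects the metadata table and whose low 24 bits give the row number. The cache size, the indexing scheme and the `resolve` callback that stands in for the slow table walk are assumptions, not the project's hardware design.

```python
def split_token(token):
    """Split an ECMA-335 metadata token into (table number, row index)."""
    return token >> 24, token & 0x00FFFFFF


class MetaCache:
    """Direct-mapped cache of resolved metadata entries (illustrative only)."""

    def __init__(self, num_lines, resolve):
        self.num_lines = num_lines
        self.tags = [None] * num_lines    # full token stored as the tag
        self.data = [None] * num_lines
        self.resolve = resolve            # slow path: look the token up in memory
        self.hits = 0
        self.misses = 0

    def lookup(self, token):
        line = token % self.num_lines     # low bits of the token pick the line
        if self.tags[line] == token:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[line] = token
            self.data[line] = self.resolve(token)
        return self.data[line]
```

A direct-mapped organization is the cheapest to build in hardware; two tokens that share the same low bits evict each other, which a set-associative variant would avoid at the cost of more comparators.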

15 Slide 15 Speed-up features of the CIL core: hardware typed stack.
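A typed stack stores a type tag with each value, so operand types can be checked by comparing tags instead of consulting metadata. The Python sketch below uses the ECMA-335 evaluation-stack type names `int32`, `int64` and `F`; it only accepts same-type arithmetic (ECMA-335 also permits some mixed `int32`/`native int` combinations), ignores overflow wrapping, and the `InvalidProgramError` name is hypothetical.

```python
class InvalidProgramError(Exception):
    """Raised when stack operand types are not valid for an instruction."""


class TypedStack:
    NUMERIC = {"int32", "int64", "F"}

    def __init__(self):
        self.slots = []                   # list of (type_tag, value) pairs

    def push(self, tag, value):
        self.slots.append((tag, value))

    def pop(self):
        return self.slots.pop()

    def binary(self, name, fn):
        """Pop two operands, check their tags, push fn(a, b) with the same tag."""
        tag_b, b = self.pop()
        tag_a, a = self.pop()
        if tag_a != tag_b or tag_a not in self.NUMERIC:
            raise InvalidProgramError(f"{name}: {tag_a}, {tag_b}")
        self.push(tag_a, fn(a, b))

    def add(self):
        self.binary("add", lambda a, b: a + b)
```

In hardware the tag comparison can run in parallel with the arithmetic itself, which is what makes a tagged stack cheaper than a software type check.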

16 Slide 16 Garbage collector: automatic memory management. Objects are divided into big and small ones. A generational garbage collector with two generations handles small objects, and big objects are kept in a separate memory area (figure: generation 1, generation 0, large heap). A special coprocessor based on a reduced DSP kernel may be used to process garbage-collector tasks.
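The split between small and big objects, and the two generations for small objects, can be sketched in a few lines of Python. The size threshold, the object representation and the collection policy below are assumptions for illustration. For brevity the minor collection marks from the roots through every generation, whereas a production generational collector would typically track old-to-young references with a remembered set or card table.

```python
LARGE_OBJECT_BYTES = 1024  # hypothetical threshold between "small" and "big"


class Heap:
    def __init__(self):
        self.gen0 = []    # newly allocated small objects
        self.gen1 = []    # small objects that survived a collection
        self.large = []   # big objects, kept in a separate area

    def allocate(self, size):
        obj = {"size": size, "refs": []}
        target = self.large if size >= LARGE_OBJECT_BYTES else self.gen0
        target.append(obj)
        return obj

    @staticmethod
    def _mark(roots):
        """Return ids of all objects reachable from roots."""
        live, work = set(), list(roots)
        while work:
            obj = work.pop()
            if id(obj) not in live:
                live.add(id(obj))
                work.extend(obj["refs"])
        return live

    def collect_gen0(self, roots):
        """Minor collection: promote live generation-0 objects to generation 1."""
        live = self._mark(roots)
        self.gen1.extend(o for o in self.gen0 if id(o) in live)
        self.gen0 = []

    def collect_full(self, roots):
        """Full collection: also drop dead generation-1 and big objects."""
        self.collect_gen0(roots)
        live = self._mark(roots)
        self.gen1 = [o for o in self.gen1 if id(o) in live]
        self.large = [o for o in self.large if id(o) in live]
```

Frequent minor collections touch only the young generation's survivors, while big objects are never copied between generations, which is the usual motivation for giving them their own area.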

17 Slide 17 Example of a DSP workload: our CIL processor is an excellent target for multimedia applications.
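A typical multimedia kernel for such a DSP core is a FIR filter built from multiply-accumulate (MAC) operations. The Python sketch below models a Q15 fixed-point FIR; the Q15 format, the floor rounding and the saturation policy are assumptions, and the comment about two MACs per cycle reflects the dual-MAC, two-memory-space core listed in the comparison slide rather than a measured schedule.

```python
def to_q15(x):
    """Convert a float in [-1, 1] to a Q15 integer, saturating at the ends."""
    return max(-32768, min(32767, round(x * 32768)))


def fir_q15(samples, coeffs):
    """Direct-form FIR filter in Q15 fixed point.

    The inner loop is one multiply-accumulate per tap. On a DSP with two
    MAC units and two memory spaces, a sample and a coefficient can be
    fetched in parallel, so two taps could retire per cycle (assumption).
    """
    out = []
    history = [0] * len(coeffs)           # delay line, newest sample first
    for x in samples:
        history = [x] + history[:-1]
        acc = 0                           # wide accumulator (Q30 products)
        for h, c in zip(history, coeffs):
            acc += h * c                  # MAC
        y = acc >> 15                     # back to Q15 (floor rounding)
        out.append(max(-32768, min(32767, y)))
    return out
```

With two taps of 0.5 (`to_q15(0.5) == 16384`), the input `[16384, 16384, 0]` produces `[8192, 16384, 8192]`, a simple two-point moving average.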

18 Slide 18 Development board: Virtex-4 FPGA chip; 64 MB DDR SDRAM; 100 MHz clock oscillator; expansion bus with up to 32 I/O lines; stereo AC97 audio codec; RS-232 serial port; LCD display for debugging messages; VGA output (50 MHz 24-bit video DAC); PS/2 mouse and PS/2 keyboard connectors; System ACE configuration controller for access to external flash cards; 10/100/1000 Mbit Ethernet transceiver for networking; USB interface chip; Xilinx XC95144XL CPLD for FPGA configuration; Xilinx XCF32P Platform Flash configuration; JTAG configuration port for design loading or remote debugging from a PC. Only 495 USD.

19 Slide 19 Development board: testing process for the processor cores. The C++ model is a full-scale analog of the Verilog HDL model and is used as the reference model.
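When a C++ model serves as the reference for a Verilog model, a common way to use it is to run both on the same program and compare their state cycle by cycle. The slides do not describe the comparison procedure, so the Python sketch below assumes each model dumps one dictionary of register values per cycle; the field names and the `first_mismatch` helper are hypothetical.

```python
def first_mismatch(reference, rtl):
    """Return (cycle, field, reference value, RTL value) for the first
    difference between two per-cycle traces, or None if they agree.

    Only fields present in the reference trace are compared.
    """
    for cycle, (ref_state, rtl_state) in enumerate(zip(reference, rtl)):
        for field in sorted(ref_state):
            if ref_state[field] != rtl_state.get(field):
                return cycle, field, ref_state[field], rtl_state.get(field)
    if len(reference) != len(rtl):
        # One simulation stopped early: report where the shorter trace ends.
        return min(len(reference), len(rtl)), "<end of trace>", len(reference), len(rtl)
    return None
```

Reporting only the first divergence usually points straight at the cycle where the Verilog and C++ models part ways; later differences are typically consequences of it.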

20 Slide 20 Implementation results: the ALU consumes most of the FPGA resources. The DSP core uses only a small part of the Virtex-4 LX25, and the CIL processor implementation takes only up to 5500 cells (~35%) of our Virtex-4 FPGA (without optimizations).
Unit      Spartan-3                                    Virtex-4
          Slices  Slice FFs  4-input LUTs  Fmax, MHz   Slices  Slice FFs  4-input LUTs  Fmax, MHz
AGU-1     331     220        548           N/A         300     200        560           228
AGU-2     385     320        543           N/A         300     200        560           228
ALU       4368    587        7917          N/A         4216    593        8056          55.4
Decoder   1227    60         2139          N/A         1319    40         2303          971
DSP       5365    628        9508          46.9        4981    628        9191          77.8

21 Slide 21 Implementation results (figures): main ALU unit structure; Bit Manipulation Unit (part of the ALU); whole DSP kernel.

22 Slide 22 Implementation results: moderate-detail structure of the implemented CIL processor.

23 Slide 23 Software support: exception microcode (complex CIL instructions implemented in DSP code); a class library that may be ported from the PC; supporting system libraries (I/O, memory management); multimedia libraries for the DSP core; user applications; a just-in-time compiler for CIL code, if necessary; a compiler (we use a retargeted GCC version); an assembler/disassembler (retargetable utilities used with the compiler and specially tuned for the CIL core); a linker; a hardware and software co-design suite (compiler, assembler, disassembler, Verilog instruction decoder generator).

24 Slide 24 Conclusion & comparison: comparison with an ARM-based software .NET engine for embedded systems.
Hardware-based CIL machine | ARM-based .NET execution engine
80-100 MHz (FPGA implementation) | 27 MHz
1-2 clock cycles per CIL operation (40-50 million CIL operations per second); hardware execution of basic CIL operations; hardware-assisted stack implementation | 450,000 CIL operations per second; interpreted execution of CIL operations
50x faster than interpreted execution | 50x slower than hardware execution of basic operations
Hardware type control | Software type control
Garbage collector may be implemented as a hardware coprocessor or intelligent memory | Software garbage collector
Hardware meta-information cache | Software meta-information processing
DSP core with two memory spaces | ARM core
2 multiply-accumulate and 2 ALU operations per cycle (up to 4 instructions per cycle) | 1 ALU operation per cycle
DSP core consumes 3-4x less power than the ARM core | ARM core consumes 3-4x more power than the DSP core
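The throughput figures in the comparison can be cross-checked by dividing the clock rate by the number of CIL operations per second, which gives the average number of clock cycles spent per CIL operation. The Python sketch below uses only the numbers quoted on the slide; `cycles_per_op` is a hypothetical helper name.

```python
def cycles_per_op(clock_hz, ops_per_second):
    """Average clock cycles spent per operation."""
    return clock_hz / ops_per_second


# Figures quoted on the comparison slide.
hardware_low = cycles_per_op(80e6, 40e6)         # 80 MHz, 40 million CIL ops/s
hardware_high = cycles_per_op(100e6, 50e6)       # 100 MHz, 50 million CIL ops/s
arm_interpreter = cycles_per_op(27e6, 450_000)   # 27 MHz, 450,000 CIL ops/s

print(f"hardware, 80 MHz: {hardware_low:.1f} cycles per CIL op")
print(f"hardware, 100 MHz: {hardware_high:.1f} cycles per CIL op")
print(f"ARM interpreter, 27 MHz: {arm_interpreter:.1f} cycles per CIL op")
```

This gives 2 cycles per CIL operation at both ends of the hardware machine's range and 60 cycles per operation for the interpreted ARM engine.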

25 Slide 25 Conclusion & comparison: 1. The CIL processor is not only a software concept: it can be successfully implemented in hardware. 2. Our dual architecture, a CIL processor based on a DSP core, enables low-power multimedia applications, so the CIL processor can be successfully used for the digital home and digital entertainment. 3. CIL typed engines are implemented in hardware, which greatly reduces the run-time overhead of type checking. 4. The hardware CIL implementation greatly outperforms non-optimized software implementations (in both performance and power consumption).

26 Slide 26 Project participants

27 Slide 27 Acknowledgements: we express our gratitude to Microsoft Corporation for the grant that allowed us to bring together people from different faculties of Nizhny Novgorod State University into one team and develop our hardware solution, and to the Laboratory of Physical Foundations and Technologies of Wireless Communications, Nizhny Novgorod State University, supported by Intel Corporation, for its help during our research activities. Special thanks to Aliaksei Chapyzhenka, Intel Corp., for spending his time advising us on hardware architectures.

28 Slide 28
