VLSI DESIGN 1998 TUTORIAL Part 1. Core Building Blocks and Building Systems using Cores Rajesh K. Gupta University of California, Irvine. What are cores?

1 VLSI DESIGN 1998 TUTORIAL Part 1. Core Building Blocks and Building Systems using Cores Rajesh K. Gupta University of California, Irvine. What are cores? What are cores? Building systems using cores Challenges in using cores

2 © 1998 R. Gupta 2 Available Core Building Blocks ARM810 PPC401 68030

3 © 1998 R. Gupta 3 What Is A Core Cell? Working definition: at least 5K gates, pre-designed, pre-verified, re-usable. Examples: –Processor: LSI Logic CW4001/4010/4100, ARM 7TDMI, ARM 810, NEC 85x, Motorola 680x0, IBM PPC –DSP cores: TI TMS320C54X, Pine, Oak –Encryption: PKuP, DES –Controllers: USB, PCI, UART –Multimedia: JPEG comp., MPEG decoder, DAC –Networking: ATM SAR, Ethernet

4 © 1998 R. Gupta 4 Core Types Soft cores (code) –HDL description –flexible, i.e., can be changed to suit an application –technology independent: may be resynthesized across processes –significant IP protection risks Firm cores (code+structure) –gate-level netlist to be placed and routed –technology sampled Hard cores (physical) –ready for drop in –include layout and timing (technology dependent) –IP is easily protected –mostly processors and memory –functional test vectors or ATPG vectors available.

5 © 1998 R. Gupta 5 Core Types and Their Use Soft system specification Behavioral HDL Bus Functional ISA model system design RTL HDL Synthesizable RTL RTL Functional scheduling, binding Timing models Power models logic design Gate Netlist Firm Gate Functional control generation, FSM synthesis physical design Mask Data Hard Fault Coverage floorplanning, placement, routing Technology: ASIC or FPGA

6 © 1998 R. Gupta 6 Core Portability Determined by technology independence and data format. –Technology independence based on the type of core –both open and proprietary data formats are currently in use. Data formats: DEF = Design Exchange Format (Cadence) SPEF = Standard Parasitic Extended Format (Cadence) GDSII = Layout format (Cadence) ITL = Interpolated Table Lookup cell-level timing model (Mentor) LEF = Library Exchange Format (Cadence) MMF = Motive Modeling Format (Viewlogic) NLDM = Non-linear Delay Model (Synopsys) TLF = Table Lookup Format (Cadence) VCD = Value Change Dump (Cadence) WGL = Waveform Graphical Language (TSSI)

7 © 1998 R. Gupta 7 Timing Information in Firm and Hard Cores Timing behavior can be generated from SPICE inputs However, it is not always possible for big cores –static timing information is necessary Basic delay model –propagation delay model from inputs to outputs –slew model (as a function of load and input slew) –input/output capacitances –setup and hold constraints on inputs.
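
The slew-and-load dependence in the basic delay model above is typically captured as a lookup table. The following is a minimal sketch of such a table lookup with bilinear interpolation, not the format of any particular vendor. The function name `interpolate_delay` and all table values are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right

def interpolate_delay(table, slews, loads, slew, load):
    """Bilinear interpolation in a 2-D delay table indexed by input slew and output load."""
    def bracket(axis, x):
        # Lower index of the table segment containing x, clamped to the table range.
        i = bisect_right(axis, x) - 1
        return max(0, min(i, len(axis) - 2))

    i, j = bracket(slews, slew), bracket(loads, load)
    # Fractional position inside the segment (extrapolates linearly outside the table).
    u = (slew - slews[i]) / (slews[i + 1] - slews[i])
    v = (load - loads[j]) / (loads[j + 1] - loads[j])
    return ((1 - u) * (1 - v) * table[i][j] + (1 - u) * v * table[i][j + 1]
            + u * (1 - v) * table[i + 1][j] + u * v * table[i + 1][j + 1])

# Hypothetical characterization data: rows = input slew (ns), columns = output load (pF).
SLEWS = [0.1, 0.5, 1.0]
LOADS = [0.01, 0.05, 0.10]
DELAY_NS = [
    [0.10, 0.20, 0.30],
    [0.15, 0.25, 0.35],
    [0.20, 0.30, 0.40],
]

delay = interpolate_delay(DELAY_NS, SLEWS, LOADS, slew=0.3, load=0.03)
print(f"propagation delay = {delay:.3f} ns")
```

A static timing tool would hold one such table per timing arc, plus a second table of the same shape for the output slew.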

8 © 1998 R. Gupta 8 What are cores? What are cores? Building systems using cores Building systems using cores Challenges in using cores

9 © 1998 R. Gupta 9 MEMORYCache/SRAM or even DRAM Processor Core DSP Processor Core GraphicsVideo VRAM Motion Encryption/Decryption SCSI EISA Interface GlueGlue PCI Interface I/O Interface LAN Interface Hub Architecture Commodity Software: - encryption/decryption - device drivers - legacy code - operating/runtime system CommodityHardware:-compression-encryption-modem -signal proc. -image proc. SOC is a SM of LSI Logic Corporation. Building Systems-On-A-Chip Using Cores

10 © 1998 R. Gupta 10 S-O-C Application Classes Time-constrained computing systems. Set-top VOD+ Games HQ Graphics Video Conferencing MPEG1 encoding Audio & Video Bridging MPEG2 encoding High-end Set-top PDA Derivatives

11 © 1998 R. Gupta 11 Systems-On-A-Chip (SOCs) Two Types: Technology-Driven –Developed In-House, maximum leverage of technology crown-jewels –Close cooperation between module developers and system designers –or wide-ranging cross-licensing agreements between partners Component-Driven –Core cells as IP carriers »IP encapsulated into usable products »design reuse is critical to IP products

12 © 1998 R. Gupta 12 Component-Driven SOC Core supplier different from core user –Third party IP providers Significant technology packaging without importing it –The IP provider wants to sell a product and not the technology behind the product Enormous technical, and legal challenges –can it be done successfully? –who guarantees if a SOC works as required –who is liable in case the end product does not perform?

13 © 1998 R. Gupta 13 ASIC Cores Availability 3Soft: uC, DSP, LAN, SCSI, PI ARM: uC, uP Plessey: per. controllers, DSP Scenix: uC, PCI, DMA Western Digital Center: uC TI: DSP; NEC: DSP, uC Symbios: ARM7 TC VAutomation: uP, controllers CAST: 2910A, IDT49C410, DMAc Butterfly DSP: DSP, FFT, DFT, ADSL, OFDM Int. Sil. Systems: ADPCM, FIR Analog Devices: DSP DSP Group: Pine, Oak Digital Design & Dev: MIDI Hitachi: MPEG, PCI, SCSI, uC Palmchip: MPEG, UART, ECC Silicon Engg.: micro VGA Eureka: PCI; Virtual Chips: PCI, USB Logic Innovations: PCI, ATM OKI: PCI, PCMCIA, DMA, UART Sand: USB, PCI Sierra: ATM SAR, Ether, R3000 LogicVision: BIST, JTAG ROHM: UART, SIO, PIO, FIFOc, Add, Mpy, ALU Synopsys: DesignWare, ISA, Intel uC Chip Express: FIFO, RAM, ROM VLSI Libraries: Memory, Mpy Focus Semi: PLL, VCXO VLSI Cores: Encryption, DES ASIC Intl: DES NOT EXHAUSTIVE. One-Stop Shops: LSI Logic CoreWare, IBM Microelectronics, Motorola FlexWare, Lucent

14 © 1998 R. Gupta 14 FPGA/CPLD Cores Availability Capacity constrained cores –do not include wide/high performance PCI, ATM SAR, or Microprocessors Altera –8-bit 6502 –DMAC 8237 Xilinx –PCI Actel –System Programmable Gate Array (SPGA) »combine FPGA with customer ASIC »ASIC examples: PCI, Router, DMA controller.

15 © 1998 R. Gupta 15 Current Core Market Models Three ways: Licensable, Foundry, Captive. Foundry-captive cores do not have to reveal the internal design and layout of the core; the foundry provides a bounding box. 1. A design house licenses design and tools –DSP Group (Pine and Oak Cores), 3Soft, ARM (RISC) –offering includes HDL simulation model, tool and/or an emulator –customer does the design, fab. 2. Core vendor designs and fabs ICs –TI, Motorola, Lucent –VLSI, SSI, Cirrus, Adaptec 3. Core vendor sells cores, takes customer designs and fabs ICs –LSI Logic, TI, Lucent

16 © 1998 R. Gupta 16 Core Trends: 1997 Survey of Designers 74% hardware designers. 26% plan to purchase core for next design: –40% hard, 68% soft, 32% firm [Figure: months to completion] Source: Integrated System Design

17 © 1998 R. Gupta 17 MEMORYPROCESSORSINTERFACE etc.ANALOG GENERICS Source: Integrated System Design Application Needs

18 © 1998 R. Gupta 18 Using Cores: PCI Class of interface cores such as –USB, UART, SCSI, PCI, 1394 etc. Identify target technology –ASIC, FPGA PCI (Peripheral Component Interconnect) –processor independent CPU interface to peripherals –multi-master, peer-to-peer protocol –synchronous: 8-33 MHz (132 MB/s) –arbitration: central, access oriented, hidden –variable length bursting on reads and writes –(I/O, Mem) x (Read, Write) and IACK commands [Figure labels: CPU, PCI controller, Host Bus, Primary PCI Bus, PCI/IDE/ISA, ISA Bus, IDE, ASIC]

19 © 1998 R. Gupta 19 PCI Cores VHDL/Verilog synthesizable cores with options: –PCI-Host, PCI-Satellite –32-bit (33 MHz) or 64-bit (66 MHz) –FIFO or register data storage –Synchronous or Asynchronous host interface Core components –Master/Target Read/Write FIFOs, –Master/Target State Machines –Configuration registers Timing requirements –input setup time = 7ns; clock to output delay = 11ns DC Specs: input pin caps: 10 pF, clk pin 12 pF, ID Sel 8pF
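
The bandwidth and timing figures quoted on the two PCI slides can be checked with a little arithmetic. The sketch below assumes one transfer per clock during a burst and a nominal 33 MHz clock; the helper names are illustrative and do not come from any PCI core deliverable.

```python
def peak_bandwidth_mb_s(bus_width_bits, clock_mhz):
    """Peak burst bandwidth, assuming one bus-wide transfer per clock cycle."""
    return bus_width_bits // 8 * clock_mhz

def timing_slack_ns(clock_mhz, setup_ns, clk_to_out_ns):
    """Part of the clock period left over after clock-to-output delay and input setup."""
    period_ns = 1000.0 / clock_mhz
    return period_ns - setup_ns - clk_to_out_ns

bw_32 = peak_bandwidth_mb_s(32, 33)   # 32-bit bus at 33 MHz
bw_64 = peak_bandwidth_mb_s(64, 66)   # 64-bit bus at 66 MHz
slack = timing_slack_ns(33, setup_ns=7, clk_to_out_ns=11)

print(f"32-bit/33 MHz peak: {bw_32} MB/s")
print(f"64-bit/66 MHz peak: {bw_64} MB/s")
print(f"budget left for bus propagation and clock skew: {slack:.1f} ns")
```

The 7 ns setup and 11 ns clock-to-output requirements consume 18 ns of the roughly 30 ns period, which is why a synthesized PCI core must meet them at its pins rather than somewhere inside the user logic.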

20 © 1998 R. Gupta 20 User Experience Hughes Network Systems: –DirecPC ASIC in a satellite receiver card –80K-gate device on Chip Express process DirecPC consists of –IDT R3041 RISC controller –Memory, Demodulator, Error-check, PCI core PCI core from Virtual Chips –17K gates including asynchronous FIFOs –Guesstimate: 4K extra gates due to the core (5%) Comments: "Their test vectors assume you have direct access to the internal interface of the core. I looked through their test vectors and tried to do the same things using my back end. They were kind of giving us a reference documentation. It wasn't turnkey." Source: EE Times

21 © 1998 R. Gupta 21 Using Cores: DSPs 16-bit fixed point processors are most commonly used. DSPs –simple: Clarkspur Design CD2450 (variable data width) –compatible: DSPGroup, TI, SGS-T: 320C5x –clone: Options –memory, mem controller, interrupt controller, host port, serial port Criticals –power consumption as most DSP applications go into portable products

22 © 1998 R. Gupta 22 Design using DSP Cores Core vendors often supply a development chip or core version of the COTS processor –board-level prototyping fairly common –followed by single-chip solution To avoid board-level prototyping, a full-functional simulation model is a must, particularly for foundry captive cores. Software tools provided –assembler, linker, instruction set simulator, debugger, (high-level language compiler?)

23 © 1998 R. Gupta 23 Not exhaustive, only a representative sample. DSP Sample Points TI TEC320C52 –16-bit fixed-point TMS320C52 »1Kx16 data RAM, 4Kx16 program RAM »2 serial ports, 1 16-bit timer –and 0.8 micron 15,000-gate gate array Motorola 7-Day CSIC –8-16 MHz HC08, DMA, MMU,.. SGS-Thomson ST18932, ST18950 –16-bit fixed-point DSPs, 0.5 u, 3.3 volt CMOS, 80MHz –has no off-the-shelf DSP IC –used in PC sound cards, 950 has a better assembly

24 © 1998 R. Gupta 24 Third Party DSP Cores DSPGroup Pine –16-bit fixed-point, 0.8u CMOS, 5.0/3.3 V, 40 MHz – 36-bit ALU, 16-bit MPY, 2Kx16 RAM/ROM, (prog mem is outside core) –used in pagers and answering machines DSPGroup Oak –same as Pine, plus includes a bit manipulation unit –Viterbi decoding support instructions (min, max) –used in digital cellular telephony Clarkspur CD2400, CD2450 –16-bit fixed-point –24-bit ALU, MPY, Acc, 2x 256x16 data RAM/450 makes it 48 bits –used in fax-modem

25 © 1998 R. Gupta 25 One-Stop Shops: LSI Logic CoreWare Cores for building ASIC for most embedded applications: –laser printer, ATM, PDA, Set-top, Router, Graphics accelerators, etc. CPU cores: miniRISC CW4K, Oak DSP –miniRISC compatible with MIPS R4000 –0.5u CMOS, 2mW/MHz, 60MHz, 3-stage pipeline –32-bit address/data bus –full scan: 99% fault coverage, gate-level timing model Interface: PCI, Fibre Channel, SerialLink Networking: Ethernet, ATM (SAR), Viterbi, RS Compression etc: MPEG, JPEG, DAC/ADC.

26 © 1998 R. Gupta 26 Core Examples Only a representative sample of cores. Not exhaustive or even comparative. Processor cores –LSI Logic CW4001, CW4010 –ARM (7) processors –Motorola FlexCore Memory cores –16M/18M Rambus DRAM Multimedia cores –CompCore CD2 Networking –Media Access Controller (MAC) Encryption cores –VLSI cores, ASIC international.

27 LSI Logic: CW4001 Core Behavioral Verilog/VHDL model Gate-level timing accurate model Specifications –60 MHz, 60 MIPS (45 MIPS average), 3 stage pipeline –0.5 micron CMOS process, 4 sq. mm., 2mW/MHz –Full-scan with 99% fault coverage. Interfaces: –CBUS, Computational Bolt-On (CBO), Co-processor, MMU Customizability: –BIU, cache controller, MDU, MMU, DRAM/SRAM controllers, timers, caches (<16K), RAM/ROM, DMAc –Up to 3 Co-processors (FPU, Graphics, Compression, Network Protocol), MPY/DIV unit, CRC, direct access to CPU GPRs [Figure labels: CP0, ALU, Shifter, Register File, FlexLink, CBus] Courtesy: S. Dey, ICCAD96, LSI Logic.

28 © 1998 R. Gupta 28 Using CW4001 Co-processor has its own instruction set, including: read data bus for instructions, read/write to external memory, read/write to CPU registers, stall and interrupt CPU. CW delivers the [0:5] and [26:31] opcode fields to the co-processor instruction decoder. Co-processor executes in lockstep with CPU pipeline stages. [Figure labels: CW4001, Co-proc Interface, FlexLink Interface, CPUBus Interface, CU, Cache, MMU, RAM/ROM, CPUBus, Mult/Div coprocessor, BIU, Cache Controller, Write Buffer, DRAM Controller, Timer, DMA Controller, Extended BIU (XC), BBus, XBus] Courtesy: S. Dey, ICCAD96, LSI Logic.

29 © 1998 R. Gupta 29 CW4010 CPU Core Verilog/VHDL model with gate-level timing 80MHz, 160 MIPS (110 MIPS average), 6 stage pipeline 0.5 micron CMOS, 9 sq. mm., 5 mW/MHz Integrated cache controllers with separate I and D caches –cache size from 2-16 KB 64-bit memory and cache interface Up to 3 co-processors Full-scan with 99% fault coverage.
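
The CW4001 and CW4010 specifications can be compared by deriving average cycles per instruction and power at the rated clock from the quoted figures. This is only a back-of-the-envelope sketch: it assumes the mW/MHz figure scales linearly up to the rated clock, and the dictionary layout and helper names are illustrative.

```python
def average_cpi(clock_mhz, average_mips):
    """Average cycles per instruction implied by clock rate and average MIPS."""
    return clock_mhz / average_mips

def power_mw(mw_per_mhz, clock_mhz):
    """Core power at the given clock, assuming linear scaling with frequency."""
    return mw_per_mhz * clock_mhz

# Figures as quoted on the CW4001 and CW4010 slides.
cores = {
    "CW4001": {"clock_mhz": 60, "avg_mips": 45, "mw_per_mhz": 2},
    "CW4010": {"clock_mhz": 80, "avg_mips": 110, "mw_per_mhz": 5},
}

summary = {
    name: (round(average_cpi(c["clock_mhz"], c["avg_mips"]), 2),
           power_mw(c["mw_per_mhz"], c["clock_mhz"]))
    for name, c in cores.items()
}

for name, (cpi, mw) in summary.items():
    print(f"{name}: average CPI = {cpi}, power = {mw} mW")
```

An average CPI below 1 for the CW4010 is consistent with its quoted peak of 160 MIPS at 80 MHz, that is, two instructions per cycle at best.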

30 Advanced RISC Machines (ARM ) A family of 32-bit RISC processor cores ARM6, ARM7: MPU with Cache, MMU, Write Buffer and JTAG ARM7TDMI :ARM7 with Thumb ISA, ICE, Debug & MPY ARM8 : cached, low power, 5-stage pipe (vs 3 in others) StrongARM1, StrongARM2: available as Digital SA-110 (21285) Piccolo: DSP co-processor for ARM, shares system bus (AMBA) –support for Viterbi, bit manipulation operations –four nestable zero-overhead hardware loop constructs –splittable ALU, 1 cycle dual 16-bit operations –saturation arithmetic –1024 point in place complex radix 2 FFT in 33,331 cycles Manufacturing partnerships and/or licensing with –Cirrus logic, GEC Plessey, Sharp, TI and VLSI Tech.

31 © 1998 R. Gupta 31 ARM Processor Cores Enhancements: ARM7D, ARM7DM, ARM7DMI M = 64-bit result hardware multiplier running at 8bits/cycle D = 2 boundary scan chains for basic debug I = Embedded ICE debug –Thumb instruction set Source: ARM Inc.

32 © 1998 R. Gupta 32 ARM Enhancements: Embedded ICE The EmbeddedICE core cell allows debugging of an ARM core embedded within an ASIC: –real time address and data-dependent breakpoints –full access and control of the CPU –can be reduced for size savings once the part goes into production. [Figure labels: ASIC, ARM Core, EmbeddedICE Cell (creates to core), ICE, Debug Host running ARMsd, 40KB/s software download, uses boundary scan pins] Source: ARM Inc.

33 ARM Enhancements: Thumb ISA 8- or 16-bit external, 32-bit internal Thumb instruction set is a subset of 32-bit ARM instruction set –16-bit instructions –expanded into 32-bit ARM instructions at run time without any penalty Up to 65-70% smaller code size compared to ARM 130% of ARM performance with 8/16 bit memory 85% of ARM performance with 32-bit memory [Figure: a 16-bit Thumb instruction, ADD Rd, #constant, with major opcode, minor opcode, destination and constant fields, mapped onto the equivalent 32-bit ARM instruction with condition field 1110 (always).]
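
The expansion of a Thumb instruction into its ARM equivalent can be sketched for the single case shown on the slide, ADD Rd, #constant. The code below is an illustrative software model, not ARM's decompressor logic. The function name is hypothetical, and the field layout follows the standard ARM and Thumb encodings for this instruction.

```python
def expand_thumb_add_imm(thumb):
    """Expand a 16-bit Thumb 'ADD Rd, #imm8' into the equivalent 32-bit ARM word."""
    # Thumb layout: 001 (major opcode) | 10 (minor opcode = ADD) | Rd (3 bits) | imm8
    if (thumb >> 11) != 0b00110:
        raise ValueError("not a Thumb ADD Rd, #imm8 instruction")
    rd = (thumb >> 8) & 0x7
    imm8 = thumb & 0xFF
    # ARM layout: cond=1110 (always) | 001 (immediate operand) | opcode ADD, S=1
    #             | Rn | Rd | rotate=0000 | imm8, with the 3-bit Rd zero-extended
    #             and used as both source (Rn) and destination (Rd).
    return 0xE2900000 | (rd << 16) | (rd << 12) | imm8

thumb_word = 0x3105                      # ADD r1, #5 in Thumb
arm_word = expand_thumb_add_imm(thumb_word)
print(f"Thumb {thumb_word:#06x} -> ARM {arm_word:#010x}")
```

Because the mapping is a fixed rearrangement of bit fields, it can be done in hardware in the decode stage, which is what allows the expansion to happen without a run-time penalty.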

34 © 1998 R. Gupta 34 Courtesy: S. Dey, ICCAD96 ARM Applications Widely used in a variety of applications –low cost 16-bit applications »mobile phones, modems, fax machines, pagers »hard disk and CD drive controllers »engine management –low cost 32-bit applications »smart cards »ATM and ethernet network interfaces »low power, on-chip application code –high performance 32-bit applications »digital cameras »set top boxes, network switches, laser printers »external memory system (RAM, ROMs)

35 © 1998 R. Gupta 35 Motorola FlexCore CPU cores based on 680x0 family –EC000, EC020, EC030 –all with static operation, 5/3.3 volt supplies –performance: »EC000: 2.7 MIPS @16.67MHz, 33 mW »EC020: 7.4 MIPS @25 MHz, 150 mW »EC030: 11.8 MIPS @33 MHz, 258 mW Serial I/O cores: 68681 UART, MBus, SPI RT clock, Dual timer cores SCSI, Parallel I/O, 8051 interfaces DRAM, Interrupt, JTAG controllers PLA, PLL, oscillators, power management cells.

36 © 1998 R. Gupta 36 Memory Core Example Virtual Chips 16M/18M bit Rambus DRAM Verilog/VHDL simulation model Organization –two banks, 512 pages per bank, 72x256 per page –dual internal banks, 2K byte cache per bank Programmable ack, write, read delays through control registers Synchronous protocol for fast block oriented xfrs. Modes of operation –reset, stand-by, power-down, active Deliverable: VHDL, Verilog source, test bench, test vectors, documentations. Others: Sand DRAM, VRAM verilog models.

37 © 1998 R. Gupta 37 Multimedia Cores MPEG input 1Mx16 SDRAM Audio Decoder Video Decoder microc. interface phy. mem. controller synchronization virtual mem. controller SRAM JPEG compression, MPEG decoding, Video DAC, etc. IBM Microelectronics, LSI logic, PalmChip, Silicon Engineering, Mentor Graphics, CompCore, Intrinsix VGA Example: MPEG-2 decoder from CompCore –70K-80K gates –18K bits of internal SRAM –16Mbit SDRAM (external) »bitstream buffering, frames –54MHz, 16-bit external mem. bus Source: CompCore CD2 Decoder audio streamvideo str.

38 © 1998 R. Gupta 38 Other Core Categories Networking: Protocol choices: –switched Ether, s. TR, ATM155, ATM25 Example: SYM1000 from Symbios –HDL code, 3.3 V, 0.5u –CSMA/CD ethernet –programmable inter-packet gap. –Optional CRC insertion, and check –MII interface to physical layer device –Host bus interface LSI Logic: ATMizer Encryption: VLSI Cores –PKuP encryption core »implements modular exponentiation »synthesizable HDL core –DES core as a synthesizable Verilog model »two models: 8 bytes/8 cycle, 8 bytes/16 cycles ASIC International –DES cores –Exponentiator Engine –Hash function cores

39 © 1998 R. Gupta 39 What are cores? What are cores? Building systems using cores Challenges in using cores Challenges in using cores

40 © 1998 R. Gupta 40 Challenges in Using Cores A core cell is not a single product –a PCI cell consists of 25 separate Verilog files »plus as many synthesis scripts –immature interface abstraction »e.g., there is no direct access to the core from the end product. Access must be created. A core is not an end product –a core cell is design + know-how to use it for a particular process, tools and even application Testability and testing is a challenge –as opposed to design, testing is not a hierarchical problem »using 90% testable cores does not give 90% system testability »tests are core-specific, not applicable from primary IO What is an efficient design methodology using cores?

41 © 1998 R. Gupta 41 Interface Processor ASIC Memory Interface Analog I/O DMA 2. HDL Modeling Architectural synthesis Logic synthesis Physical synthesis 3. Software synthesis, Optimization, Retargetable code gen., Debugging & Programming environ. 1. Design environment, co-simulation constraint analysis. 4. Test Issues, Test access, Isolation, ATPG Processor cores introduce software part of system design. SOC Design Problem Components

42 © 1998 R. Gupta 42 9 Co-Design Components Specification, Modeling and Analysis –How to capture designer intent efficiently in a design language? »HDL optimizations »Constraint modeling and analysis System Validation –How to use description in building a (computational) prototype capable of running actual applications? »Co-simulation, Formal Verification System Design and Synthesis –Delayed partitioning of hardware and software –Software synthesis and optimizations –Interface design and optimizations.

43 © 1998 R. Gupta 43 System Specification: Goals & Characteristics Main purpose: provide clear and unambiguous description of the system function, and to provide a –documentation of the initial design process Support –diverse models of computation –allow the application of computer-aided design tools for »design space exploration »partitioning »software-hardware synthesis »validation (verification, simulation) »testing Should not constrain the implementation options. –diverse implementation technologies.

44 © 1998 R. Gupta 44 Embedded System Modeling Reactive and time-constrained interactions Consist of structural and behavioral components. Hierarchically organized components. Synchronous and asynchronous communications. Locally or globally clocked. Idealized as Synchronous Reactive Systems.

45 © 1998 R. Gupta 45 Synchronous Reactive Modeling Zero computation time System outputs produced in synchrony with inputs Instantaneous broadcast communications Deterministic behavior: –a given sequence of inputs always produces the same output sequence. Example languages using this model –ESTEREL, LUSTRE. –More later.

46 © 1998 R. Gupta 46 Example: Esterel Reactive and atomicity of reactions –watching implements a generalized watchdog –Time as discrete instants –Easily translated into a transducer (FSM generation) –Perfect synchrony hypothesis Instantaneous broadcast –Implicit communication architecture. –Using signals which are present or absent and may carry a value. –Pure signals do not carry a value.
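
The synchronous reactive idea (time as discrete instants, signals that are present or absent, a generalized watchdog, deterministic behavior) can be imitated in ordinary Python. The sketch below is not Esterel and does not use an Esterel compiler. The signal names `TICK`, `KICK` and `ALARM` and the function `watchdog` are hypothetical, chosen only to illustrate the model.

```python
def watchdog(limit):
    """Build a reaction function; each call models one instant of the system."""
    state = {"count": 0}

    def react(present):
        # 'present' is the set of pure input signals present in this instant.
        if "KICK" in present:
            # Preemption: a kick restarts the countdown within the same instant.
            state["count"] = 0
            return set()
        if "TICK" in present:
            state["count"] += 1
            if state["count"] >= limit:
                state["count"] = 0
                return {"ALARM"}      # output emitted in synchrony with the input
        return set()

    return react

def run(react, input_trace):
    """Feed one set of input signals per instant and collect the emitted outputs."""
    return [react(inputs) for inputs in input_trace]

trace = [{"TICK"}, {"TICK"}, {"TICK", "KICK"}, {"TICK"}, {"TICK"}, {"TICK"}]
outputs = run(watchdog(3), trace)
print(outputs)
```

The reaction function is a small finite-state machine over the counter value, which mirrors how a synchronous program can be translated into a transducer.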

47 © 1998 R. Gupta 47 Constraint and Interface Modeling Source of timing constraints –Time-constrained interactions between system components and environment –Specified using statement tags on HDL descriptions. Types of constraints –Delay and interval constraints (latency-type) –Rate constraints (throughput-type) Constraint satisfiability –Are constraints satisfied for a given implementation? –Given an implementation, resynthesize to satisfy a given set of constraints.

48 © 1998 R. Gupta 48 RUNTIME SYSTEM DISPLAY INFO CALIBRATION GET INFO CLOCKSTATE VEHICLE CRUISE CONTROLLER CurFuel RotClk brake gear valve speed ave_speed consumption maintenance ROUTINE InstVel AveVel SecClk SecPulse 1/sec <= 1ms 1000/sec DATA-RATE OP-DELAY Derived from events at system interfaces. Example

49 © 1998 R. Gupta 49 Interface Modeling using Constraints Interface described using events. Events are instances of actions. Most common interface action is a signal transition on a wire. Temporal relationship between events: –Propagation delays: –Bounds on event separation intervals: min, max, linear –Absolute versus relative rate constraints.

50 © 1998 R. Gupta 50 LINEAR i j k MAX i j k MIN i j k ij k max min Binary Delay Constraints

51 © 1998 R. Gupta 51 Interface Delay Timing Constraints Three types: (McMillan & Dill) –Given events $i$ and $j$ with time stamps $t_i$ and $t_j$ respectively, and $d_{ij}$ as the delay from event $i$ to event $j$, such that $l_{ij} \le d_{ij} \le u_{ij}$: »min constraints: $t_j = \min_{i<j}(t_i + d_{ij})$ »max constraints: $t_j = \max_{i<j}(t_i + d_{ij})$ »linear constraints: $t_j - t_i \le s_{ij}$, where $s_{ij}$ is the maximum achievable separation between $i$ and $j$. Constraint graph: –nodes are events; edges are constraints. Synthesis: find maximum achievable separation between pairs of events (minimum separation depends upon operation delays.) Rate constraint analysis and debugging.
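
The min and max constraint types can be evaluated over a small constraint graph once each delay is fixed to one value inside its bounds. The following is a minimal sketch under that assumption. It does not implement the McMillan and Dill separation algorithm, and the graph, delays and function names are made up for illustration.

```python
def event_times(num_events, edges, kinds, t0=0):
    """Compute time stamps t_j for events numbered in topological order.

    edges[j] lists (i, d_ij) pairs with i < j; kinds[j] is 'max' or 'min'.
    """
    t = [None] * num_events
    t[0] = t0
    for j in range(1, num_events):
        candidates = [t[i] + d for i, d in edges[j]]
        # max: event j waits for all predecessors; min: it fires on the first one.
        t[j] = max(candidates) if kinds[j] == "max" else min(candidates)
    return t

def satisfies_linear(t, i, j, s_ij):
    """Check a linear constraint t_j - t_i <= s_ij."""
    return t[j] - t[i] <= s_ij

# Hypothetical four-event interface with one fixed delay per edge.
edges = {1: [(0, 2)], 2: [(0, 5), (1, 1)], 3: [(1, 4), (2, 2)]}
kinds = {1: "max", 2: "min", 3: "max"}

times = event_times(4, edges, kinds)
print(times)
print(satisfies_linear(times, 1, 3, s_ij=5))
```

Finding the maximum achievable separation would require exploring every delay choice within the bounds instead of a single fixed value per edge.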

52 © 1998 R. Gupta 52 Hardware Modeling As A Programming Activity Programming languages are often used for constructing system models Core based designs assume that all new designs originate as an HDL model Hardware –concurrency in operations –I/O ports and interconnection of blocks –exact event timing is important: open computation Software –typically sequential execution –structural information is less important –exact event timing is not important: closed computation.

53 © 1998 R. Gupta 53 HDL Semantic Necessities Abstraction –provide a mechanism for building larger systems by composing smaller ones Reactive programming –provide mechanisms to model non-terminating interaction with other components –watching (signal) and waiting (condition) »must be separate (else one is an implementation of the other) –exception handling Determinism –provide a predictable simulation behavior Simultaneity –model hardware parallelism, multiple clocks

54 © 1998 R. Gupta 54 HDL Pragmatics Data types –simple (bit/Boolean): HardwareC, Verilog –complex (records): VHDL Interface abstraction –provide an external view independent of implementation »Classes (packages) in C++, VHDL »Entity interfaces or Tasks: VHDL, ADA

55 © 1998 R. Gupta 55 Pragmatics (contd.) Communication –shared variables using explicit communication architectures –synchronous handshaking using implicit communications (ADA task entry call) –instantaneous broadcast (Esterel) –asynchronous message passing using explicit communication architectures Time –global, multiple clocks, logics.

56 © 1998 R. Gupta 56 (Restricted) HLL Description Add reactivity, clock(s), waiting & watching Refine data types - bit true, fixed point - saturation arithmetic HDL Description CONTROLDATA Going from HLL to HDL

57 © 1998 R. Gupta 57 HLL Restrictions Classes for synthesis target do not use –unions, floating, pointers (only interface with lib) –type casts –virtual functions (restricted to only library classes) –policy of use on shared variables Suggestions: –explicit initialization blocks –use defines instead of conditional process enables for statically determined conditions

58 © 1998 R. Gupta 58 Adding Reactivity Reactivity can be added in one of three ways: 1. use annotations, comments »commonly used in home-grown C-based HDLs »sometimes use semantic overloading, that is, associating an alternative interpretation with an existing construct. 2. use library assists »additional library elements that can be used by the programmer in modeling hardware. »example: additional classes in C++ 3. use additional language constructs »new constructs require a specific language front-end, new debugging tools. »example: divide operations across cycles using next()

59 © 1998 R. Gupta 59 Adding Data Types Identify signals –storage elements, structured memory blocks Type variables : signed, unsigned, std_logic Size state variables on instantiation
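
Refining a high-level-language model to bit-true data types usually means replacing unbounded integers and floats with sized, saturating fixed-point values. The sketch below assumes a 16-bit signed Q15 format purely for illustration. The helper names are hypothetical and do not belong to any HDL library.

```python
def saturate(value, bits=16):
    """Clamp an integer to the range of a signed two's-complement word."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))

def to_q15(x):
    """Convert a real number in [-1, 1) to 16-bit Q15, saturating out-of-range input."""
    return saturate(int(round(x * 32768)))

def sat_add(a, b, bits=16):
    """Saturating addition: overflow sticks at the limit instead of wrapping around."""
    return saturate(a + b, bits)

def q15_mul(a, b):
    """Q15 multiply: full-precision product, rescaled by 2**15, then saturated."""
    return saturate((a * b) >> 15)

half = to_q15(0.5)
print(sat_add(30000, 10000))       # clamps at the positive limit
print(sat_add(-30000, -10000))     # clamps at the negative limit
print(q15_mul(half, half))         # 0.5 * 0.5 in Q15
```

Running the same test vectors through this model and through the eventual HDL description is one way to confirm that the refinement stayed bit-true.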

60 © 1998 R. Gupta 60 Verilog, VHDL: compiler produces inputs to run a DES simulator. Esterel: compiler produces a single deterministic FSM. Scenic: compiler produces (synthesizable) processes and a simulator. Language Comparisons

61 © 1998 R. Gupta 61 From HDL to Circuit/System: Compilation & Synthesis Compilation spans programming language theory, architecture and algorithms Synthesis spans concurrency, finite automata, switching theory and algorithms In practice, the two tasks are inter-related. Compilation and synthesis tasks are done in three steps: –front-end, intermediate optimizations, back-end.

62 © 1998 R. Gupta 62 Compilation Program compilation for software target –Front-end parsing into intermediate form –Optimization over the intermediate form –Back-end code-generation for a given processor HDL compilation for hardware target –Front-end parsing into intermediate form –Optimization over the intermediate form –Back-end architecture, logic and physical synthesis.

63 © 1998 R. Gupta 63 Synthesis and Optimization Substantial growth in last twenty years Industry-standard tools in –Logic synthesis –Physical synthesis Behavioral synthesis just becoming commercial. Substantial room for growth when considered together with software compilation.

64 © 1998 R. Gupta 64 Behavioral to RTL Basic transformations needed –1. Operation scheduling –2. Resource binding –3. Control generation: central or distributed.. Evolutionary growth to synthesis tools –Designer expertise today lies in the RTL coding –Synthesis tools are strongly dependent upon design methodology. Generate a structure suitable for synchronous and single-phase circuits –resource performance in terms of execution delay –in number of clock cycles Design space: –area, cycle time, latency, throughput

65 © 1998 R. Gupta 65 Synthesis Tasks Operation scheduling, resource binding, control generation Scheduling determines operation start times –minimize latency Resource binding: resource selection, allocation –minimize area (maximize sharing) Control synthesis: –data-path = connectivity synthesis »detailed resource connections »steering logic »connection to the interface –control synthesis »synthesize controller that provides operations/resource enables, operation synchronization, resource arbitration
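
Operation scheduling can be illustrated with the simplest policy, as-soon-as-possible (ASAP) scheduling, which starts every operation once all of its predecessors have finished and therefore minimizes latency when resources are unlimited. This is a minimal sketch. The operation names, delays and the example expression are assumptions made for illustration.

```python
def asap_schedule(ops, deps, delay):
    """Return the earliest start cycle of each operation in a dependency graph.

    deps maps an operation to its predecessors; delay maps it to its cycle count.
    """
    start = {}

    def visit(op):
        if op not in start:
            # An operation starts when its slowest predecessor has finished.
            start[op] = max((visit(p) + delay[p] for p in deps.get(op, [])), default=0)
        return start[op]

    for op in ops:
        visit(op)
    return start

# Hypothetical data flow for y = (a * b) + (c * d) - e
ops = ["mul1", "mul2", "add", "sub"]
deps = {"add": ["mul1", "mul2"], "sub": ["add"]}
delay = {"mul1": 2, "mul2": 2, "add": 1, "sub": 1}

schedule = asap_schedule(ops, deps, delay)
latency = max(schedule[op] + delay[op] for op in ops)
print(schedule, "latency =", latency)
```

Here both multiplications start in cycle 0, so two multipliers are needed. Resource binding trades this against area, for example by sharing one multiplier and accepting a longer latency.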

66 © 1998 R. Gupta 66 A CAD Methodology for SW Automated software synthesis from specs. –Synthesis tools generate implementation –Global optimization of the program. Optimization used to achieve design goals. Analysis and verification tools for feedback. Compilation for embeddable software Software Optimizations –Code compression –Optimization for power –Instruction-set generation –Static memory allocation

67 © 1998 R. Gupta 67 Compression Block-based compression –Program compressed in small blocks to preserve random- access properties (e.g., cache line blocks) Transparent code compression –ISA unchanged. Compression uses compiler output. –Decompression performed by cache refill engine. –Processor sees only uncompressed code. –Techniques: Huffman coding. Key issue: code location in memory after compression?
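
The key issue raised above, locating code in memory after compression, is commonly handled with a table that maps each block number to its offset in the compressed image. The sketch below illustrates that idea only. It uses `zlib` from the Python standard library as a stand-in for the Huffman coder named on the slide, and the 32-byte block size and all names are assumptions.

```python
import zlib

def compress_blocks(code, block_size=32):
    """Compress code block by block and record where each compressed block starts."""
    table, image = [], bytearray()
    for i in range(0, len(code), block_size):
        table.append(len(image))              # offset of this block in the image
        image += zlib.compress(code[i:i + block_size])
    return bytes(image), table

def fetch_block(image, table, index):
    """Model a cache refill: look up the block offset, then decompress that block only."""
    start = table[index]
    end = table[index + 1] if index + 1 < len(table) else len(image)
    return zlib.decompress(image[start:end])

program = bytes(range(64)) * 4                # 256 bytes standing in for machine code
image, address_table = compress_blocks(program)

block3 = fetch_block(image, address_table, 3)
print(len(address_table), "blocks; block 3 restored:", block3 == program[96:128])
```

With blocks this small `zlib` adds more header bytes than it saves, so the example shows only the addressing scheme, not a realistic compression ratio.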

68 © 1998 R. Gupta 68 Compilation: What is New? Machine description –in terms of architecture -> programming –in terms of organization -> hardware Retargetable code generation has traditionally addressed the problem of compilation for an architecture. SOCs also need input about machine organization in order to perform timing analysis on generated code –Two approaches: »describe detailed machine »extract ISA from machine organization

69 © 1998 R. Gupta 69 Hardware Design & Synthesis C code Assembly Compiler Application Development Machine Definition Compiler Generator Code generator Algorithm(s) EDA Co-Design Framework

70 © 1998 R. Gupta 70 Test Strategy for Firm/Hard Cores System-level test strategy –build test sets for cores »generate functional vectors »fault grade for interconnects –prepare cores for test application from primary inputs through access/isolation, Scan/DFT –if BIST, schedule BIST application and signature analysis. System-level DFT –goal is to reduce testing cost –increase accessibility of the internal nodes »controllability: ability to establish a specific signal value at each node from primary inputs (PIs) »observability: determine signal value by controlling PIs and observing primary outputs »tradeoffs: area, I/O pins, performance, yield, TTM

71 © 1998 R. Gupta 71 DFT Techniques Commonly used approach is to modify a sequential circuit into a combinational one during test. –Automatic test generation is much easier for combinational circuits Current monitoring techniques. For sequential circuits, scan techniques are often used –link memory elements into a shift register –serially load and read out –boundary scan is commonly used to test board-level devices Built-In Self Test –minimal external support, high fault coverage, easy access requirements, protect IP
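
The scan technique described above (link the memory elements into a shift register, load serially, capture, then read out serially) can be modeled in a few lines. This is a behavioral sketch only. The class `ScanChain` and the inverter standing in for combinational logic are assumptions for illustration.

```python
class ScanChain:
    """Flip-flops linked into one shift register for test access."""

    def __init__(self, length):
        self.flops = [0] * length

    def shift(self, scan_in):
        """One scan clock: every bit moves one flop down the chain."""
        scan_out = self.flops[-1]
        self.flops = [scan_in] + self.flops[:-1]
        return scan_out

    def load(self, pattern):
        """Serially load a test pattern so that pattern[k] ends up in flop k."""
        for bit in reversed(pattern):
            self.shift(bit)

    def capture(self, logic):
        """One functional clock: flops capture the combinational logic response."""
        self.flops = logic(self.flops)

    def unload(self):
        """Serially read out the captured response, returned in flop order."""
        out = [self.shift(0) for _ in range(len(self.flops))]
        return out[::-1]

chain = ScanChain(3)
chain.load([1, 1, 0])
loaded = list(chain.flops)
chain.capture(lambda state: [1 - b for b in state])   # stand-in logic: invert each bit
response = chain.unload()
print(loaded, "->", response)
```

During test the sequential circuit thus behaves like a combinational one: the state is set directly through the chain, and only the logic between flops has to be exercised by each pattern.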

72 © 1998 R. Gupta 72 Test Access for Cores Peripheral access techniques –parallel access, serial access or functional access Parallel access –add MUXs to connect core IOs, high routing overhead, pin limitations may prevent parallel access Serial access –most common is ring approach, during test core I/Os are connected via a scan chain, low overhead, delay penalty, easy to test user-defined logic, long test application time Functional access –sensitize path through cores, low hardware cost, parallel test pattern translation possible. Also need isolation mechanisms for cores.

73 © 1998 R. Gupta 73 Summary of Part I Core cells present a new market opportunity –core cells are breathing life into many old designs (6502) –a new class of third-party vendors who bridge the gap between design houses and EDA vendors. Productization of cores faces many challenges –portability of cores versus design reuse –socketing standards (portability and reuse) –IP protection: encryption, product versus technology –design and test methodologies Research outlook is aligned with industry expectations –all new designs start with HDL description –immediate focus on validation, testability issues –long term focus on software optimization, complexity management.
