Using System-on-a-Chip as a Vehicle for VLSI Design Education
Andrew Laffely and Wayne Burleson, Electrical and Computer Engineering, University of Massachusetts Amherst

Presentation transcript:

Slide 1 (Burleson, UMASS): Using System-on-a-Chip as a Vehicle for VLSI Design Education
Andrew Laffely and Wayne Burleson
Electrical and Computer Engineering, University of Massachusetts Amherst
This material is based upon work supported by the National Science Foundation under Grant No and SRC Tasks 766 and 1075.

Slide 2: Challenges in VLSI Education
- Advancing processing technology
- Higher-level design tools
- Realistic yet tractable design projects
- Preparation for jobs in semiconductor and other sectors
- Making best use of faculty/student time and university resources

Slide 3: ECE 559/659: VLSI Design Project (10 grads, 20 seniors)
Course objectives:
- Learn the design process for a complex VLSI design in deep sub-micron CMOS
- Learn VLSI design skills and tools, including working in teams
- Learn about a particular application component and its VLSI implementation
- Learn to present formal design reviews using oral, written, graphical and web-based techniques

Slide 4: Key Aspects of the Course
- aSoC (home-grown SoC platform)
  - Provides a unifying framework to the class
  - Allows for subdivision but inter-relation of projects
  - Interesting cutting-edge architecture based on NSF- and SRC-funded research at UMASS and elsewhere
  - Covers many aspects of VLSI design
  - Realistic constraints on area, timing, power and I/O
- Graduate and undergraduate teamwork
  - Graduate students provide leadership, motivation and experience
- Commercial tools and design flow
- Review-based evaluation
  - Oral and web-based reports for 4 different reviews: proposal, feasibility, implementation, integration

Slide 5: Adaptive System-on-a-Chip (aSoC)
- Tiled architecture with mesh interconnect
  - Point-to-point communication pipeline
- Allows for heterogeneous cores
  - Differing sizes, clock rates, voltages
- Low-overhead core interface
- On-chip bus substitute for streaming applications
- Based on static scheduling
  - Fast and predictable
[Figure: aSoC tile array. Tile labels: Proc, Multiplier, FPGA, ctrl. Each tile contains a Core and a Communication Interface with North, South, East and West links.]

Slide 6: Communication Interface
- Custom design to maximize speed and reduce power
- Core-ports
- Crossbar
- Controller
- Instruction memory
- Local frequency and voltage supply
[Figure: communication interface block diagram. Labels: Core, Core-ports, Decoder, Local Frequency & Voltage, Instruction Memory (example entry: North to South & East), PC, Controller, Local Config., Crossbar, with North/South/East/West inputs and outputs.]
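The slides describe an interface in which an instruction memory and a program counter drive a crossbar according to a static schedule. The following is a minimal sketch of that idea, not the aSoC implementation: the function name `run_schedule`, the port names, and the dictionary encoding of each instruction are all assumptions made for illustration.

```python
# Hypothetical software model of a statically scheduled crossbar.
# Each "instruction" maps output ports to the input port that feeds them.

def run_schedule(schedule, inputs_per_cycle):
    """Apply a repeating static schedule to per-cycle input values.

    schedule: list of dicts {output_port: input_port}, one per
              instruction-memory entry (encoding is illustrative).
    inputs_per_cycle: list of dicts {input_port: value}, one per cycle.
    Returns a list of dicts {output_port: value}, one per cycle.
    """
    outputs = []
    pc = 0  # program counter into the instruction memory
    for inputs in inputs_per_cycle:
        config = schedule[pc]
        # The crossbar simply forwards whatever the schedule selects;
        # there is no run-time arbitration.
        outputs.append({out: inputs.get(src) for out, src in config.items()})
        pc = (pc + 1) % len(schedule)  # the schedule repeats
    return outputs


schedule = [
    {"south": "north", "east": "north"},  # north fans out to south and east
    {"core": "west"},                     # west delivers a word to the core
]
inputs = [
    {"north": 10, "west": 20},
    {"north": 11, "west": 21},
    {"north": 12, "west": 22},
]
result = run_schedule(schedule, inputs)
```

Because the routing decision for every cycle is fixed ahead of time, the transfer latency is known in advance, which is one way to read the "fast and predictable" claim on the previous slide.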

Slide 7: Class Projects
SoC infrastructure (1,3)
- Communication interface
- Interconnect (3)
- Power distribution
- Clock system
- Power management
Cores
- Motion estimation for video encoding (2,3)
- AES cryptography (3)
- Cache (2,3)
- Huffman coding
- 3D graphics (1,2,3)
- Discrete cosine transform (2,3)
- Smart card (2,3)
Notes: (1) used in PhD dissertation; (2) used in Masters thesis; (3) used in publications.

Slide 8: Design Flow
Architecture to layout:
- Architecture: block diagram of system and behavioral description
- Logic: gate-level or schematic description
- Circuit: transistor sizing
- Layout: floorplanning, clock and power distribution
Tools:
- Verilog-XL: behavioral representation
- VTVT: standard cell library
- Synopsys: standard cell gate-level netlist generation
- Silicon Ensemble: standard cell netlist to layout
- Cadence LayoutPlus: schematic and layout design
- NCSU CDK: design and extraction rules
- Cadence Layout vs. Schematic: layout verification
- HSPICE: circuit simulator

Slide 9: aSoC Implementation and Integration
- TSMC technology
- Full custom

Slide 10: Advanced Signaling Techniques (building on SRC-funded work)
- Differential current sensing
- Booster insertion
- Multi-level current signaling
- Phase coding

Slide 11: Circuit Level Simulation (HSPICE)
- Evaluating subsystems with realistic models
  - Capacitance, resistance and inductance
  - Process variations
  - Process generations

Slide 12: Interconnect Characterization
- Comparing delay and power of signaling techniques for different tile sizes at 250nm, 180nm, 130nm, 100nm

Slide 13: Voltage Scaling Approach
Core-ports
- Single buffer for each stream to cross the clock/voltage barrier between core and interface
- Reading/writing success rates indicate core utilization
  - Input blocked: core too slow
  - Output blocked: core too fast
Controller
- Interprets core-port success rates to adjust local clock and voltage
[Figure: processing pipeline. Labels: Interconnect, Buffer, Input Core-port, Output Core-port, Core, Clock and Supply Controller, Local Vdd, Local Clock, Blocked.]
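The control rule on this slide (input blocked means the core is too slow, output blocked means it is too fast) can be sketched as a small step function. This is an illustrative sketch only: the function name `adjust_speed`, the blocked-rate threshold of 0.25, and the use of four discrete speed levels are assumptions, not details given in the slides.

```python
def adjust_speed(level, input_blocked_rate, output_blocked_rate,
                 threshold=0.25, num_levels=4):
    """Return the next speed level for a core.

    level: current level, 0 = fastest clock/highest Vdd,
           num_levels - 1 = slowest clock/lowest Vdd (assumed encoding).
    input_blocked_rate: fraction of attempted writes into the input
                        core-port that were blocked.
    output_blocked_rate: fraction of attempted writes into the output
                         core-port that were blocked.
    """
    if input_blocked_rate > threshold and level > 0:
        # Data is piling up in front of the core: it is too slow.
        return level - 1
    if output_blocked_rate > threshold and level < num_levels - 1:
        # The core produces faster than the network drains: too fast.
        return level + 1
    return level  # utilization looks balanced; keep the current setting


next_level = adjust_speed(level=2, input_blocked_rate=0.5,
                          output_blocked_rate=0.0)
```

A real controller would also need hysteresis or averaging so that the level does not oscillate between two settings; that policy is not specified in the slides.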

Slide 14: Vdd Selection Criteria
- As Vdd decreases, delay increases exponentially
- Use the curve to match available clock frequencies to voltages
- The voltage and frequency change reduces power by 79%, 96%, and 98.7%
- $P = \alpha C V_{dd}^{2} f$
[Figure: Normalized Core Critical Path Delay vs. Vdd. Axes: Voltage, Normalized Delay. Marked voltages: 0.73, 1.16. Marked operating points: Max Speed, 1/2 Speed, 1/4 Speed, 1/8 Speed.]
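The quoted power reductions can be checked against the dynamic power relation on this slide. The sketch below assumes that the 1/2, 1/4 and 1/8 speed points use the supply levels 1.16 V, 0.73 V and 0.6 V listed on the Power Distribution slide, with 1.8 V at full speed; that pairing is an assumption, and the activity factor and capacitance are taken to be unchanged so that they cancel in the ratio.

```python
def relative_dynamic_power(vdd, freq_fraction, vdd_max=1.8):
    """Dynamic power relative to full speed, from P proportional to Vdd^2 * f."""
    return (vdd / vdd_max) ** 2 * freq_fraction


# (supply voltage in volts, clock frequency as a fraction of maximum)
operating_points = [(1.16, 1 / 2), (0.73, 1 / 4), (0.6, 1 / 8)]

reductions = [
    round(100 * (1 - relative_dynamic_power(v, f)), 1)
    for v, f in operating_points
]
```

Under these assumptions the three values in `reductions` land within about a percentage point of the 79%, 96% and 98.7% figures on the slide.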

Slide 15: Clock Distribution
- Tiled architecture extends life of globally synchronous systems
- Precise H-tree implementation
  - Load is small and equal at each branch
- Skew can be reduced by 70% with advanced deskew circuits [1]

64-tile aSoC    70nm        100nm       130nm       180nm
Chip area       (9.24mm)^2  (13.3mm)^2  (17.2mm)^2  (23.8mm)^2
Frequency       5 GHz       2 GHz       1 GHz       0.5 GHz
Power           126 mW      240 mW      445 mW      784 mW
Mean skew       41 ps       50 ps       92 ps       70.6 ps
Percent skew    21%         10%         9%          4%

[1] S. Tan et al., "Clock Generation and Distribution for the First IA-64 Microprocessor," IEEE JSSC, Nov. 2000.
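The "Percent skew" row appears to be the mean skew expressed as a fraction of the clock period. That reading is an assumption, not something the slide states, but the short check below reproduces the table values to within rounding.

```python
def percent_skew(freq_ghz, skew_ps):
    """Mean skew as a percentage of the clock period."""
    period_ps = 1000.0 / freq_ghz  # 1 GHz corresponds to a 1000 ps period
    return 100.0 * skew_ps / period_ps


# node: (frequency in GHz, mean skew in ps, percent skew quoted on the slide)
clock_table = {
    "70nm": (5.0, 41.0, 21),
    "100nm": (2.0, 50.0, 10),
    "130nm": (1.0, 92.0, 9),
    "180nm": (0.5, 70.6, 4),
}

computed = {
    node: percent_skew(freq, skew)
    for node, (freq, skew, _quoted) in clock_table.items()
}
```

The trend is the point of the slide: absolute skew stays in the tens of picoseconds, while the shrinking period makes it a much larger share of the cycle at 70nm than at 180nm.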

Slide 16: Power Distribution
- Heterogeneous cores may require multiple power supply voltages
- Tile structure enables uniform interwoven grid
- Larger grid for higher current demands
  - Reduced resistance
  - Higher capacitance

64-tile aSoC        Vh       Vmh      Vml      Vl
Voltage             1.8V     1.16V    0.73V    0.6V
Current per core    110mA    25mA     13mA     7mA
Total power         12.1 W   1.86 W   607 mW   269 mW

[Figure: interwoven power grid. Labels: Gnd, Vh, Vmh, Vml, Vl.]
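The "Total power" row is consistent with voltage times per-core current times 64 tiles. That interpretation is an assumption; the sketch below recomputes the row under it.

```python
NUM_TILES = 64

# supply name: (voltage in V, current per core in A, total power on slide in W)
supplies = {
    "Vh": (1.8, 0.110, 12.1),
    "Vmh": (1.16, 0.025, 1.86),
    "Vml": (0.73, 0.013, 0.607),
    "Vl": (0.6, 0.007, 0.269),
}


def total_power(voltage, current_per_core, tiles=NUM_TILES):
    """Total chip power if every tile draws the same current at one supply."""
    return voltage * current_per_core * tiles


computed_power = {
    name: total_power(v, i) for name, (v, i, _quoted) in supplies.items()
}
```

The three lower supplies match the slide to within rounding. The Vh entry comes out roughly 5% above the quoted 12.1 W, which may simply reflect rounding of the 110mA figure.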

Slide 17: Architecture Evaluation (Motion Estimation)
- Array-based architecture
- Pipelined ME
- Parameterized search window size
- Full search
- Choose 16x16 or 8x8 windows
- Reduce power
[Figure: motion estimation architecture. Labels: Address Generation Unit, Processing Element Array, Memory, FIFOs.]
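For readers unfamiliar with full-search motion estimation, here is a minimal software sketch with a parameterized block size and search range. The slides do not name the matching cost; the sum of absolute differences (SAD) used here is an assumption, as are the function names and the test frames.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between two n x n blocks."""
    total = 0
    for dy in range(n):
        for dx in range(n):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def full_search(cur, ref, cx, cy, n=8, search=2):
    """Try every displacement in [-search, search] and keep the best.

    Returns (cost, mx, my) for the block whose top-left corner in the
    current frame is (cx, cy); n is the block size (e.g. 8 or 16).
    """
    height, width = len(ref), len(ref[0])
    best = None
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            rx, ry = cx + mx, cy + my
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, mx, my)
    return best


# Synthetic frames: the current frame is the reference shifted by (2, 1).
size = 16
ref = [[x * x + 3 * y * y for x in range(size)] for y in range(size)]
cur = [[ref[min(y + 1, size - 1)][min(x + 2, size - 1)]
        for x in range(size)] for y in range(size)]
best = full_search(cur, ref, cx=4, cy=4, n=8, search=2)
```

On these synthetic frames the search recovers the displacement (2, 1) with zero cost. The work grows with both the block size and the square of the search range, which is why making those two parameters adjustable is a natural power knob.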

Slide 18: Modify Existing Designs
- Take existing Verilog code or hardware and improve or change functionality (e.g. add motion estimation algorithms, provide AES key-length flexibility)
- Evaluate changes in performance and overhead
[Figures: old PE layout; new PE layout.]

Slide 19: Conclusions
- Advancing process technology
  - Target 0.18 µm for affordable fab but also do scaling studies
- Higher-level design tools
  - Combine synthesis and custom techniques
- Realistic yet tractable design projects
  - Re-use existing projects and provide unifying themes
- Preparation for jobs in semiconductor and other sectors
  - Focus on system design and appropriate levels of abstraction
  - Teach how to learn new tools
- Making best use of faculty/student time and university resources
  - Leverage research
  - Combine grad and undergrad
  - Re-use materials, tools