Logistic Regression and Perceptron Prediction of Instruction Branches


Logistic Regression and Perceptron Prediction of Instruction Branches
Joshua Ferguson

Overview
Motivation
Branch Prediction background
Machine Learning background
Methodology
Results

Motivation
CPUs account for around 30% of server power usage while idle, and that percentage scales up with utilization.*
Instruction branch misprediction causes unnecessary instruction execution on the CPU.
A simple experiment on an Intel M 1.6 GHz CPU found that approximately 8% of branch instructions were mispredicted, even while idle.
*Luiz André Barroso and Urs Hölzle, "The Case for Energy-Proportional Computing", IEEE, 2007

Branch Prediction
[Figure: diagram labeled Workload, L3, L2, L1, Registers, and Results]

Branch Prediction cont…
The pipeline runs through the instruction fetch, decode, execution, memory access, and writeback cycles.
Branching (for example, an if-then statement) throws this process off by causing dependencies.
By default, the CPU will execute whichever branch it predicts will be executed.
Common techniques involve a simple buffer of recent history. Others use limited pattern matchers.
Example history: T N T N T T (T = branch taken, N = branch not taken)

Machine Learning
The CPU is trying to learn patterns, so why not use modern machine learning techniques?
Most scale poorly, especially under the resource constraints that CPUs have.
Nonetheless, I wanted to try a few out.

Machine Learning cont…
Logistic Regression
Perceptron
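
For reference, the standard textbook forms of the two models can be written as below. The feature vector x, weights w, bias b, and learning rate η are generic symbols assumed here; they are not notation taken from the slides, and the slides do not say which exact variants were used.

```latex
% Logistic regression: estimated probability that the branch is taken
P(\text{taken} \mid x) = \sigma(w^\top x + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

% Perceptron: hard decision, with labels y \in \{-1, +1\}
\hat{y} = \operatorname{sign}(w^\top x + b)

% Perceptron update, applied only when \hat{y} \neq y
w \leftarrow w + \eta\, y\, x, \qquad b \leftarrow b + \eta\, y
```

Thresholding the logistic probability at 0.5 gives the same kind of linear decision boundary as the perceptron; the two differ mainly in how the weights are fitted.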

Methodology
Generate workload
Trace CPU metrics
Analyze and rank ML algorithms

Methodology cont…: Generate Workload
Jakart: Java-based HTTP request suite.
Runs scripts of HTTP requests.
Scripts aren't very customizable, and would make patterns painfully obvious.

Methodology cont…: Generate Workload
SPECPower: perfect solution.
Provides interesting variation in CPU workload.

Methodology cont…: Trace CPU metrics
Intel VTune: only provides graphs and summary data, no trace for research.
Performance Profiling for Machine Learning: abandoned project, only runs on Pentium 4s.
AMD CodeAnalyst: only provides summary data, no trace.

Methodology: Trace CPU metrics
Performance API (PAPI), University of Tennessee, Knoxville.
Library of calls to model-specific registers that store information like:
Number of branch instructions encountered
Branches mispredicted
L1/L2 cache miss/hit/access

Methodology: Trace CPU metrics
Unfortunately, limited to the resolution of the hardware's sleep counter. Hundreds of branches would pass between each measurement.
Capabilities for any specific CPU can vary.
Main.c:
    pthread_t BRCN;
    struct thread_args BRCN_args;
    *BRCN_args.metric_type = PAPI_BR_CN;
    pthread_create(&BRCN, NULL, papi_thread, (void *)&BRCN_args);
PAPI_thread.c:
    PAPI_read_counters();

Methodology: Trace CPU Metrics
The Journal of Instruction-Level Parallelism hosts public traces with data values and memory addresses.
Traces come from integer and floating-point workloads, as well as a web server workload.

Analysis: Prepare data
Bit-shifted instruction addresses, so only high-level info remains (unsigned int).
Whether each instruction is a branch, call, or return (Booleans).
If it branches, the bit-shifted target address (Boolean and unsigned int).
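
The record layout described on this slide could be sketched roughly as follows. The struct and function names, the 32-bit raw address width, and the 16-bit shift are all assumptions made for illustration; the slides only state that addresses are bit-shifted and stored as unsigned integers next to Boolean flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shift: keep only the upper 16 bits of a 32-bit address. */
#define ADDR_SHIFT 16

/* One prepared trace record (field names are illustrative). */
struct trace_record {
    uint16_t addr_hi;    /* bit-shifted instruction address */
    bool     is_branch;
    bool     is_call;
    bool     is_return;
    bool     taken;      /* whether the instruction actually branched */
    uint16_t target_hi;  /* bit-shifted target address, 0 if not taken */
};

/* Drop the low-order bits so that only coarse location information remains. */
uint16_t shift_address(uint32_t addr)
{
    return (uint16_t)(addr >> ADDR_SHIFT);
}

/* Build a prepared record from the raw fields of one trace entry. */
struct trace_record make_record(uint32_t addr, bool is_branch, bool is_call,
                                bool is_return, bool taken, uint32_t target)
{
    /* Assumed invariant: one instruction is not both a call and a return. */
    assert(!(is_call && is_return));

    struct trace_record r;
    r.addr_hi   = shift_address(addr);
    r.is_branch = is_branch;
    r.is_call   = is_call;
    r.is_return = is_return;
    r.taken     = taken;
    r.target_hi = taken ? shift_address(target) : 0;
    return r;
}
```

With these assumptions, two instructions that share the same upper 16 address bits become indistinguishable by address, which is one way to read "only high-level info remains".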

Analysis cont…
Train each algorithm on a subset of the data, and then test for error rate on the main data file.
Logistic regression must train offline. Trained on 10,000 samples, tested on 40,000.
Perceptron can train online.
Keeps a running buffer of the past 100 values.
Requires a buffer size of (4*Boolean + 2*uint16)*100 = 3,600 bits (about 3.6k), counting each Boolean as one bit.
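
A minimal sketch of online perceptron training in the Rosenblatt style is given below: predict first, then adjust the weights only after a misprediction. The feature count, the integer weights, and the +1/-1 label encoding are assumptions for illustration; the slides do not specify how the buffered values were turned into features.

```c
#include <assert.h>
#include <stddef.h>

#define N_FEATURES 4  /* hypothetical number of input features */

struct perceptron {
    int w[N_FEATURES];  /* one weight per feature */
    int bias;
};

/* Weighted sum of the features plus the bias. */
int perceptron_score(const struct perceptron *p, const int x[N_FEATURES])
{
    int s = p->bias;
    for (size_t i = 0; i < N_FEATURES; i++)
        s += p->w[i] * x[i];
    return s;
}

/* Predict +1 (branch taken) or -1 (not taken). */
int perceptron_predict(const struct perceptron *p, const int x[N_FEATURES])
{
    return perceptron_score(p, x) >= 0 ? 1 : -1;
}

/* One online step: predict, then correct the weights only on a
 * misprediction. Returns 1 if the prediction was wrong, else 0. */
int perceptron_update(struct perceptron *p, const int x[N_FEATURES], int y)
{
    assert(y == 1 || y == -1);

    if (perceptron_predict(p, x) == y)
        return 0;

    for (size_t i = 0; i < N_FEATURES; i++)
        p->w[i] += y * x[i];
    p->bias += y;
    return 1;
}
```

Summing the return values of perceptron_update over a trace and dividing by the number of branches gives an online error rate of the kind reported in the results.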

Analysis cont…: Baselines
Running history buffer: choose the statistically likely outcome. If 25%, 50%, or 75% of the history took the branch, then branch.
Previous outcome: if the last branch was taken, then take; otherwise pass.
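
The two baselines might look roughly like this in C. The sketch reads the 25%, 50%, and 75% figures as alternative thresholds on the share of taken branches in the window, which is an interpretation of the slide; the names, the buffer capacity, and the "not taken" default for an empty history are likewise assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HISTORY 128  /* hypothetical upper bound on the window size */

struct history {
    bool   outcomes[MAX_HISTORY];  /* circular buffer of recent outcomes */
    size_t length;                 /* configured window size */
    size_t count;                  /* outcomes stored so far (<= length) */
    size_t next;                   /* slot that will be written next */
    size_t taken;                  /* taken outcomes currently in the window */
};

void history_init(struct history *h, size_t length)
{
    assert(length > 0 && length <= MAX_HISTORY);
    h->length = length;
    h->count = 0;
    h->next = 0;
    h->taken = 0;
}

/* Record one resolved branch, evicting the oldest outcome when full. */
void history_record(struct history *h, bool was_taken)
{
    if (h->count == h->length) {
        if (h->outcomes[h->next])  /* the oldest entry sits at `next` */
            h->taken--;
    } else {
        h->count++;
    }
    h->outcomes[h->next] = was_taken;
    if (was_taken)
        h->taken++;
    h->next = (h->next + 1) % h->length;
}

/* Baseline 1: predict taken when at least threshold_percent of the
 * window was taken (for example 25, 50, or 75). */
bool predict_from_history(const struct history *h, unsigned threshold_percent)
{
    if (h->count == 0)
        return false;  /* no evidence yet: arbitrary default */
    return h->taken * 100 >= (size_t)threshold_percent * h->count;
}

/* Baseline 2: repeat whatever the most recent branch did. */
bool predict_last_outcome(const struct history *h)
{
    if (h->count == 0)
        return false;
    return h->outcomes[(h->next + h->length - 1) % h->length];
}
```

Keeping a running count of taken outcomes means each prediction costs a single comparison, regardless of the history length being swept.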

Results: Baseline
[Figure: error % versus buffer history length, for the floating-point workload and the integer workload. Example history: T N T N T T]

Results: Logistic Regression
[Figure: error % versus epsilon value, for the integer workload trace and the floating-point workload trace. A higher epsilon means a more accurate match with the training data.]

Results: Perceptron
Flat 33.9% error rate using the inventor's algorithm (Rosenblatt).
A disappointing result, especially for an online algorithm.
No capability to really change how accurately it fits the training data, thus causing the model to lose generality.

Final Thoughts
Obtaining solid CPU traces is commonly done in the literature using AIX, an IBM proprietary OS. For research in this area, this OS seems a necessity.
Implementing logistic regression in a low enough level language to execute effectively is a challenge.
SPECPower can be combined with PAPI to test higher-level workload learners, possibly existing at the OS level and controlling ACPI states, rather than just branch prediction in the register.
Thanks!