Logistic Regression and Perceptron Prediction of Instruction Branches

Joshua Ferguson

Overview
Motivation
Branch Prediction background
Machine Learning background
Methodology
Results

Motivation
CPUs account for around 30% of server power usage while idle, and that percentage scales up with utilization*
Instruction branch misprediction causes unnecessary instruction execution on the CPU
A simple experiment on an Intel M 1.6 GHz CPU found that approximately 8% of branch instructions were mispredicted, even while idle
*Luiz André Barroso and Urs Hölzle, "The Case for Energy-Proportional Computing," IEEE 2007

Branch Prediction
[Figure: diagram with the labels Workload, L3, L2, L1, Registers, and Results]

Branch Prediction Cont…
A pipelined CPU works in stages: instruction fetch, decode, execute, memory access, writeback
Branching throws this process off by causing dependencies: with an If-Then statement, the next instruction is unknown until the branch resolves
By default, the CPU will execute whichever branch it predicts will be taken
Common techniques involve a simple buffer of recent branch history
Others use limited pattern matchers
[Figure: example branch outcome sequence T N T N T T, where T = branch taken and N = branch not taken]

Machine Learning
The CPU is trying to learn patterns, so why not use modern machine learning techniques?
Most scale poorly, especially given the constrained resources that CPUs have
Nonetheless, I wanted to try a few out

Machine Learning cont…
Logistic Regression
Perceptron
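For reference, the two models named on this slide are commonly written as follows. This is the standard textbook formulation, not necessarily the exact variant used in this project. The symbols x (feature vector), w (weights), b (bias), y (actual outcome, 1 for taken and 0 for not taken) and η (learning rate) are introduced here for illustration only.

```latex
% Logistic regression: estimated probability that the branch is taken
P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}

% Perceptron: hard-threshold prediction and mistake-driven update
\hat{y} =
\begin{cases}
  1 & \text{if } \mathbf{w}^\top \mathbf{x} + b \ge 0 \\
  0 & \text{otherwise}
\end{cases}
\qquad
\mathbf{w} \leftarrow \mathbf{w} + \eta\,(y - \hat{y})\,\mathbf{x},
\quad
b \leftarrow b + \eta\,(y - \hat{y})
```

Both models score a branch with the same linear term; logistic regression turns it into a probability, while the perceptron only changes its weights when a prediction is wrong.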

Methodology
Generate workload
Trace CPU metrics
Analyze and Rank ML algorithms

Methodology cont… Generate Workload
Jakart – Java-based HTTP request suite
  Runs scripts of HTTP requests
  Scripts aren’t very customizable, and would make patterns painfully obvious

Methodology cont… Generate Workload
SpecPower – Perfect solution
  Provides interesting variation in CPU workload

Methodology cont… Trace CPU metrics
Intel – VTune
  Only provides graphs and summary data, no trace for research
Performance Profiling for Machine Learning
  Abandoned project, only runs on Pentium 4s
AMD – Code Analyst
  Only provides summary data, no trace

Methodology Trace CPU metrics
Performance API (PAPI)
  University of Tennessee, Knoxville
  Library of calls to model-specific registers that store information like:
    # of branch instructions encountered
    Branches mispredicted
    L1/L2 cache miss/hit/access

Methodology Trace CPU metrics
Unfortunately, limited to the resolution of the hardware’s sleep counter
  Hundreds of branches would pass between each measurement
Capabilities for any specific CPU can vary
Main.c
    /* One sampling thread per metric; here, conditional branch instructions */
    pthread_t BRCN;
    struct thread_args BRCN_args;
    *BRCN_args.metric_type = PAPI_BR_CN;
    pthread_create(&BRCN, NULL, papi_thread, (void *)&BRCN_args);
PAPI_thread.c
    /* Arguments not shown on the slide */
    PAPI_read_counters();
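Because hundreds of branches pass between readings, counter sampling of this kind can only yield an aggregate misprediction rate per interval, not a per-branch trace. A minimal sketch of that calculation follows. The `counter_sample_t` struct and the function name are hypothetical, and the samples are assumed to hold cumulative counts; none of this is taken from the project's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of two cumulative hardware counters. */
typedef struct {
    long long branches;     /* conditional branches seen so far */
    long long mispredicted; /* mispredicted branches seen so far */
} counter_sample_t;

/*
 * Misprediction rate over the interval between two snapshots.
 * Returns a value in [0, 1], or -1.0 if no branches occurred.
 */
double interval_mispredict_rate(const counter_sample_t *prev,
                                const counter_sample_t *cur)
{
    assert(prev != NULL && cur != NULL);

    long long d_branches = cur->branches - prev->branches;
    long long d_missed = cur->mispredicted - prev->mispredicted;

    if (d_branches <= 0) {
        return -1.0; /* nothing to measure in this interval */
    }
    return (double)d_missed / (double)d_branches;
}
```

With snapshots of 1,000 and 1,500 branches and 80 and 120 mispredictions, the interval rate is 40/500, or 8%. Which individual branches were mispredicted is not recoverable from such data.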

Methodology Trace CPU Metrics
Journal of Instruction-Level Parallelism hosts public traces with data values and memory addresses
Traces from integer and floating-point operations, as well as a web server workload

Analysis
Prepare data
  Bit-shifted instruction addresses, so only high-level info remains (unsigned int)
  Whether each instruction is a branch, call, or return (Booleans)
  If it branches, the bit-shifted target address (Boolean and unsigned int)
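The prepared sample described above could be laid out roughly as follows. This is a sketch under assumptions: the shift amount `ADDR_SHIFT`, the 32-bit input addresses, and all type and function names are hypothetical, since the slide does not give them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shift amount: the slide does not say how many bits were dropped. */
#define ADDR_SHIFT 16

/* One prepared trace sample: four Booleans and two shifted addresses. */
typedef struct {
    uint16_t pc_hi;     /* high bits of the instruction address */
    bool is_branch;
    bool is_call;
    bool is_return;
    bool taken;         /* whether the instruction branched */
    uint16_t target_hi; /* high bits of the target address, 0 if not taken */
} trace_record_t;

/* Keep only the high-order bits of an address (assumed 32-bit here). */
uint16_t shift_address(uint32_t addr)
{
    /* The shift must leave at most 16 bits so the result fits in uint16_t. */
    assert(ADDR_SHIFT >= 16 && ADDR_SHIFT < 32);
    return (uint16_t)(addr >> ADDR_SHIFT);
}

/* Build one sample from raw trace fields. */
trace_record_t make_record(uint32_t pc, bool is_branch, bool is_call,
                           bool is_return, bool taken, uint32_t target)
{
    trace_record_t r;
    r.pc_hi = shift_address(pc);
    r.is_branch = is_branch;
    r.is_call = is_call;
    r.is_return = is_return;
    r.taken = taken;
    r.target_hi = taken ? shift_address(target) : 0;
    return r;
}
```

Dropping the low bits means nearby instructions share one address value, which is one way to keep only coarse location information.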

Analysis cont…
Train each algorithm on a subset of the data, then test for error rate on the main data file
Logistic Regression must train offline
  Trained on 10,000 samples, tested on 40,000
Perceptron can train online
  Keeps a running buffer of the past 100 values
  Requires a buffer of (4 × Boolean + 2 × uint16) × 100 = 3,600 bits (3.6 kbit), counting each Boolean as one bit
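Online training means the perceptron predicts each branch and then corrects itself as soon as the real outcome is known. The sketch below shows the classic mistake-driven update in that setting. The feature count `N_FEATURES`, the encoding of features as doubles, and all names are assumptions made for illustration, not the project's implementation.

```c
#include <assert.h>
#include <stddef.h>

#define N_FEATURES 4 /* hypothetical feature count */

typedef struct {
    double w[N_FEATURES]; /* one weight per feature */
    double bias;
    double lr;            /* learning rate */
} perceptron_t;

/* Start with all weights at zero. */
void perceptron_init(perceptron_t *p, double lr)
{
    assert(p != NULL);
    for (size_t i = 0; i < N_FEATURES; i++) {
        p->w[i] = 0.0;
    }
    p->bias = 0.0;
    p->lr = lr;
}

/* Returns 1 (predict taken) or 0 (predict not taken). */
int perceptron_predict(const perceptron_t *p, const double x[N_FEATURES])
{
    double z = p->bias;
    for (size_t i = 0; i < N_FEATURES; i++) {
        z += p->w[i] * x[i];
    }
    return z >= 0.0 ? 1 : 0;
}

/*
 * Predict first, then adjust the weights only if the prediction was wrong.
 * Returns the prediction that was made before the update.
 */
int perceptron_update(perceptron_t *p, const double x[N_FEATURES], int taken)
{
    int predicted = perceptron_predict(p, x);
    int error = (taken ? 1 : 0) - predicted; /* -1, 0, or +1 */

    if (error != 0) {
        for (size_t i = 0; i < N_FEATURES; i++) {
            p->w[i] += p->lr * error * x[i];
        }
        p->bias += p->lr * error;
    }
    return predicted;
}
```

Calling `perceptron_update` once per branch in trace order and counting how often its return value differs from the actual outcome gives an online error rate.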

Analysis cont… Baselines
Running history buffer
  Choose the statistically likely outcome
  If 25%, 50% or 75% of the history took the branch, then predict taken
Previous outcome
  If the last branch was taken, then predict taken; otherwise predict not taken
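The two baselines can be sketched with a small ring buffer of recent outcomes. This is an illustrative version only: the cap `MAX_HISTORY`, the "at least the threshold" reading of the rule, the default of not taken for an empty history, and all names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HISTORY 128 /* hypothetical upper bound on the history length */

typedef struct {
    unsigned char outcomes[MAX_HISTORY]; /* ring buffer: 1 = taken, 0 = not taken */
    size_t len;   /* configured history length */
    size_t count; /* outcomes stored so far, at most len */
    size_t next;  /* index of the next write */
    size_t taken; /* number of taken outcomes currently stored */
} history_t;

void history_init(history_t *h, size_t len)
{
    assert(h != NULL && len > 0 && len <= MAX_HISTORY);
    h->len = len;
    h->count = 0;
    h->next = 0;
    h->taken = 0;
}

/* Record an outcome, evicting the oldest one once the buffer is full. */
void history_push(history_t *h, int taken)
{
    if (h->count == h->len) {
        h->taken -= h->outcomes[h->next];
    } else {
        h->count++;
    }
    h->outcomes[h->next] = taken ? 1 : 0;
    h->taken += h->outcomes[h->next];
    h->next = (h->next + 1) % h->len;
}

/* Baseline 1: predict taken if at least `threshold` of the history was taken. */
int predict_threshold(const history_t *h, double threshold)
{
    if (h->count == 0) {
        return 0;
    }
    return (double)h->taken >= threshold * (double)h->count;
}

/* Baseline 2: repeat the previous outcome. */
int predict_last(const history_t *h)
{
    if (h->count == 0) {
        return 0;
    }
    return h->outcomes[(h->next + h->len - 1) % h->len];
}
```

Sweeping the history length and the threshold (0.25, 0.5, 0.75) over a trace gives the kind of error-versus-history-length comparison that the baseline results report.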

Results Baseline
[Figure: error % versus buffer history length for the floating-point workload and the integer workload]

Results Logistic Regression
[Figure: error % versus epsilon value for the integer workload trace and the floating-point workload trace; a higher epsilon means a more accurate match with the training data]

Results Perceptron
Flat 33.9% error rate using Rosenblatt’s original perceptron algorithm
A disappointing result, especially for an online algorithm
There is no real way to tune how closely it fits the training data, which causes the model to lose generality

Final Thoughts
Obtaining solid CPU traces is commonly done in the literature using AIX, an IBM proprietary OS. For research in this area, this OS seems a necessity.
Implementing logistic regression in a language low-level enough to execute efficiently is a challenge.
SPECPower can be combined with PAPI to test higher-level workload learners, possibly existing at the OS level and controlling ACPI states, rather than just branch prediction in the register.
Thanks!
