
Self-Learning, Adaptive Computer Systems
Intel Collaborative Research Institute: Computational Intelligence
Yoav Etsion (Technion CS & EE), Dan Tsafrir (Technion CS), Shie Mannor (Technion EE), Assaf Schuster (Technion CS)




1 Self-Learning, Adaptive Computer Systems
Intel Collaborative Research Institute: Computational Intelligence
Yoav Etsion, Technion CS & EE
Dan Tsafrir, Technion CS
Shie Mannor, Technion EE
Assaf Schuster, Technion CS

2 Adaptive Computer Systems
 The complexity of computer systems keeps growing
 We are moving towards heterogeneous hardware
 Workloads are getting more diverse
 Process variability affects the performance/power of different parts of the system
 Human programmers and administrators cannot handle this complexity
 The goal: adapt to workload and hardware variability

3 Predicting System Behavior
 When a human observes the workload, she can typically identify cause and effect
 The workload carries inherent semantics; the problem is extracting them automatically
 Key issues with machine learning:
 Huge datasets (performance counters; execution traces)
 Extremely fast response times needed (in most cases)
 Rigid space constraints for ML algorithms

4 Memory + Machine Learning
 Current state of the art:
 Architectures are tuned for structured data
 Managed using simple heuristics: spatial and temporal locality; frequency and recency (ARC); block and stride prefetchers
 Real data is not well structured:
 The programmer must transform the data
 Unrealistic for program-agnostic management (swapping, prefetching)

5 Memory + Machine Learning
 Multiple learning opportunities:
 Identify patterns using machine learning
 Bring data to the right place at the right time
 The memory hierarchy forms a pyramid: caches / DRAM, PCM / SSD, HDD
 Different levels require different learning strategies
 Top: smaller, faster, costlier [prefetching to caches]
 Bottom: bigger, slower, cheaper [fetching from disk]
 Both hardware and software support is needed

6 Research track: Predicting Latent Faults in Data Centers
Moshe Gabel, Assaf Schuster

7 Latent Fault Detection
 Failures and misconfiguration happen in large datacenters
 Do they cause performance anomalies?
 A sound statistical framework to detect latent faults
 Practical: non-intrusive, unsupervised, no domain knowledge required
 Adaptive: no parameter tuning, robust to system/workload changes

8 Latent Fault Detection
 Applied to a real-world production service of 4.5K machines
 Over 20% of machine/software failures were preceded by latent faults: slow response times; network errors; long disk access times
 Predicts failures 14 days in advance, with 70% precision at 2% FPR
 Latent Fault Detection in Large Scale Services, DSN 2012

9 Research track: Task Differentials: Dynamic, inter-thread predictions using memory access footsteps
Adi Fuchs, Yoav Etsion, Shie Mannor, Uri Weiser

10 Motivation
 We are in the age of parallel computing
 Programming paradigms are shifting towards task-level parallelism
 Tasks are supported by libraries such as TBB and OpenMP
 Implicit forms of task-level parallelism include GPU kernels and parallel loops
 Task behavior tends to be highly regular: a target for learning and adaptation

```cpp
...
GridLauncher &id = *new (tbb::task::allocate_root()) GridLauncher(NUM_TBB_GRIDS);
tbb::task::spawn_root_and_wait(id);
GridLauncher &cd = *new (tbb::task::allocate_root()) GridLauncher(NUM_TBB_GRIDS);
tbb::task::spawn_root_and_wait(cd);
...
```

Taken from: PARSEC.fluidanimate TBB implementation

11 How do things currently work?
 The programmer codes a parallel loop
 Software maps multiple tasks to one thread
 Hardware sees a sequence of instructions
 Hardware prefetchers try to identify patterns between consecutive memory accesses
 There is no notion of program semantics, i.e. that the execution consists of a sequence of tasks, not instructions

12 Task Address Set
 Given the memory trace of task instance A, the task address set T_A is the unique set of addresses, ordered by access time:

Trace:
START TASK INSTANCE(A)
R 0x7f27bd6df8
R 0x61e630
R 0x6949cc
R 0x7f77b02010
R 0x6949cc
R 0x61e6d0
R 0x61e6e0
W 0x7f77b02010
STOP TASK INSTANCE(A)

T_A:
0x7f27bd6df8
0x61e630
0x6949cc
0x7f77b02010
0x61e6d0
0x61e6e0

13 Address Differentials
 Motivation: task instance address sets are, by themselves, usually meaningless
 The differences between corresponding addresses, however, tend to be compact and regular, and can thus represent state transitions; the same offsets recur from T_A to T_B and from T_B to T_C:

T_A          T_B          T_C          difference
7F27BD6DF8   7F27BD6DF8   7F27BD6DF8   +0
61E630       DBFA10       1560DF0      +8000480
6949CC       6A1D0C       6AF04C       +54080
7F77B02010   7F7835F23A   7F78BBC464   +8770090
61E6D0       61E898       61EA60       +456
61E6E0       61DFD0       61D8C0       -1808

14 Address Differentials
 Given instances A and B, the differential vector is the element-wise difference of their address sets: Δ(A,B)[i] = T_B[i] − T_A[i]
 Example:

T_A:    10000  60000  8000000  7F00000  FE000
T_B:    10020  60060  8000008  7F00040  FE060
Δ(A,B):   +20    +60       +8      +40    +60   (hex)

15 Differentials Behavior: Mathematical Intuition
 Differentials are beneficial in cases of high redundancy
 An application's distribution functions provide intuition about vector repetitions
 A non-uniform CDF implies highly regular patterns
 A uniform CDF implies noisy patterns (differential behavior cannot be exploited)

16 Differentials Behavior: Mathematical Intuition
 Given N distinct differential vectors, a straightforward dictionary index takes R = log2(N) bits per vector
 The entropy H = −Σ p_i · log2(p_i) is a theoretical lower bound on representation size, based on the value distribution
 Example: assuming 1000 vector instances drawn from 4 possible values, R = log2(4) = 2

Differential value         #instances   p
(20, 8000, 720, 100050)    700          0.70
(16, 8040, -96, 50)        150          0.15
(0, 0, 14420, 100)         50           0.05
(0, 0, 720, 100050)        100          0.10

 The Differential Entropy Compression Ratio, DECR = (R − H) / R, is used as the repetition criterion:

Benchmark                Suite     Impl.    Repr. (bits)  Entropy  DECR (%)
FFT.128M                 BOTS      OpenMP   19.4          14.4     25.5
NQUEENS.N=12             BOTS      OpenMP   11.8          8.4      28.7
SORT.8M                  BOTS      OpenMP   16.4          16.3     0.1
SGEFA.500x500            LINPACK   OpenMP   14.1          0.9      93.6
FLUIDANIMATE.SIMSMALL    PARSEC    TBB      16.4          8.0      51.3
SWAPTIONS.SIMSMALL       PARSEC    TBB      17.9          13.1     26.6
STREAMCLUSTER.SIMSMALL   PARSEC    TBB      19.6          8.9      54.4

17 Possible Differential Application: Cache Line Prefetching
 Given the differentials observed between previous instances, predict the remaining addresses of a new instance from its first few accesses:

T_A: 7F27BD6DF8  61E630   6949CC    7F77B02010   61E6D0   61E6E0
T_B: 7F27BD6DF8  DBFA10   6A1D0C    7F7835F23A   61E898   61DFD0
T_C: 7F27BD6DF8  1560DF0  6AF04C?   7F78BBC464?  61EA60?  61D8C0?

18 Possible Differential Application: Cache Line Prefetching
 Second attempt: a PHT (pattern history table) predictor: based on the last X differentials, predict the next differential

19 Possible Differential Application: Cache Line Prefetching
 Prefix policy: the differential DB is a prefix tree; prediction is performed once the differential prefix is unique
 PHT policy: the differential DB holds the history table; prediction is performed upon task start, based on the history pattern

20 Possible Differential Application: Cache Line Prefetching
 The predictors were compared against two models: Base (no prefetching) and Ideal (a theoretical predictor that accurately predicts every repeating differential)

Cache Miss Elimination (%)
                 Prefix   PHT    Ideal
NQUEENS.N=12     19.4     11.4   62.1
SWAPTIONS        18.3     0.1    49.2
FLUIDANIMATE     14.9     26.0   46.0
SGEFA.500        0.0      97.6   99.9
STREAMCLUSTER    21.7     36.5   82.3
FFT.128M         45.0            87.9
SORT.8M          3.3      0.0    0.1

21 Future Work
 Hybrid policies: which policy to use when? (PHT is better for complete vector repetitions; prefix is better for partial vector repetitions, i.e. differing suffixes)
 A regular-expression-based policy (for pattern matching beyond the "ideal" model)
 Predict other functional features using differentials (e.g. branch prediction, PTE prefetching)

22 Conclusions (so far…)
 When we look at the data, patterns emerge…
 There is quite a large headroom for optimizing computer systems
 Existing predictions are based on heuristics: a machine that does not respond within 1s is considered dead; memory prefetchers look only for block and strided accesses
 Goal: use ML, not heuristics, to uncover behavioral semantics




