Carnegie Mellon School of Computer Science Forecasting with Cyber-physical Interactions in Data Centers Lei Li PDL Seminar 9/28/2011
Outline
Overview of time series mining
–Time series examples
–What problems do we solve
Motivation
Experimental setup
ThermoCast: the forecasting model
Results
Other time series models and algorithms
(c) Lei Li 2012
What are co-evolving time series?
Correlated multidimensional time sequences with joint temporal dynamics
Motion Capture
Goal: generate natural human motion
–Game ($57B)
–Movie industry
Challenge:
–Missing values
–“naturalness”
[Figure: right hand and left hand sequences of a walking motion [Li et al 2008a]]
Environmental Monitoring
Problem: early detection of leakage & pollution
Challenge: noise & large data
[Figure: chlorine level in drinking water systems [Li et al 2009]]
Network Security
Challenge: anomaly detection in computer networks & online activity
[Figure panels: BGP # updates on backbone; Webclick for news from NTT; Webclick for TV]
Time Series Mining Problems
Forecasting
Imputation (missing values)
Compression
Segmentation, change/anomaly detection
Clustering
Similarity queries
Scalable/Parallel/Distributed algorithms
See my thesis for algorithms covering these problems
Datacenter Monitoring & Management
Goal: save energy in data centers
–US alone, $7.4B power consumption (2011)
Challenge:
–Huge data (1TB per day)
–Complex cyber physical systems
[Figure: temperature in datacenter]
Typical Data Center Energy Consumption
[Figure panels: LBL data center; Google data center. Sources: [Barroso 09], [LBNL/PUB-945]]
Towards Thermal Aware DC Management
Data centers are often over-provisioned, with ≈40% of energy spent on cooling (total=$7.4B)
How can we improve energy efficiency in modern multi-megawatt data centers?
[Figure: JHU data center with Genomote]
Air cycle in DC
Possible Ways for Saving Cooling and Computing Cost
Challenges:
–airflow interaction, spatial placement, SLA, …
Possible direction:
–Shut down unused machines according to workload
[Figure: example MSN workload]
Towards Data Driven AC control and server management
Reactive energy saving:
–slow down cooling fan in CRAC
–raise AC temperature set points
Proactive data center management:
–predicting temperature distribution and thermal aware placement of workload
Constraints: supply air temperature < threshold; max(active inlet air temperature) < threshold
Big Picture: Predictive AC Control and Server Management
[Diagram components: Temperature prediction; Sensor measuring; Server/workload management; Cooling energy model; Computing energy model; CRAC control]
Experimental setup
Tested in JHU data center with 171 1U servers, instrumented with a network of 80 sensors
Sample measurements
Observations
Temperature difference cycle (max/min temp. on the same rack) is in anti-phase with air velocity cycle.
Middle and bottom sections are coldest; top is hottest.
Shutting down under-utilized servers could reduce energy consumption.
What happens when shutting down servers?
[Figure annotation: Shut down]
ThermoCast [Li et al, KDD 2011]
Given: intake temperatures, outtake temperatures, workload for each server, and floor air speed
Goal: forecasting temperature distribution and thermal aware placement of workload
Approach: a zonal forecasting model
–divide the machine room into zones, and each rack into sections.
Assumptions
A0: incompressible air
A1: environmental temperature is constant
A2: supply air temperature is constant within a period
A3: constant server fan speed
A4: vertical air flow at the outtake is negligible
A5: vertical air flow at the intake is linear in height
Sensor measurements & Air interactions
ThermoCast
ThermoCast Model
[Model equations; labeled terms: floor air speed, inlet temp, outlet temp]
Derived from fluid dynamics and thermodynamics together with the assumptions [Li et al, KDD 2011]
Parameter Learning
[Equation: constrained optimization problem (objective s.t. constraints)]
ThermoCast Results
Q1: How accurately can a server learn its local thermal dynamics for prediction?
2x better using 90 minutes of training data, predicting 5 minutes ahead
[Figure: AR vs. ThermoCast; labels: 75%, 100%, shutdown]
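To make the AR baseline in this comparison concrete, here is a minimal sketch of a first-order autoregressive forecaster trained on 90 one-minute readings and rolled forward 5 minutes. It is an illustration only, not the ThermoCast model or the exact baseline from the talk: the AR order, the one-minute sampling interval, the synthetic temperature trace, and the names `fit_ar1` and `forecast` are all assumptions.

```python
from statistics import mean

def fit_ar1(series):
    """Least-squares fit of x[t] = a * x[t-1] + b (assumed AR order 1)."""
    prev, curr = series[:-1], series[1:]
    mp, mc = mean(prev), mean(curr)
    var = sum((p - mp) ** 2 for p in prev)
    if var == 0:
        return 0.0, mc  # constant history: predict its level
    cov = sum((p - mp) * (c - mc) for p, c in zip(prev, curr))
    a = cov / var
    return a, mc - a * mp

def forecast(series, steps, a, b):
    """Roll the fitted recurrence forward `steps` samples from the last reading."""
    x = series[-1]
    for _ in range(steps):
        x = a * x + b
    return x

# Hypothetical inlet temperatures (deg C), one reading per minute for 90 minutes.
history = [18.0 + 4.0 * (1 - 0.95 ** t) for t in range(90)]
a, b = fit_ar1(history)
five_min_ahead = forecast(history, 5, a, b)
print(f"a={a:.3f}, b={b:.3f}, forecast={five_min_ahead:.2f} C")
```

A purely autoregressive model like this only sees past temperatures; it has no access to workload or floor air speed, which is the information ThermoCast adds.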
ThermoCast Results
Q2: How long ahead can ThermoCast forecast thermal alarms?
2x faster

        Baseline   ThermoCast
Recall  62.8%      71.4%
FAR     45%        43.1%
MAT     2.3 min    4.2 min

FAR = false alarm rate
MAT = mean look-ahead time
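The slide names the three alarm metrics but does not define how they are computed. The sketch below shows one plausible way to compute them, under assumptions that are mine rather than the talk's: a warning raised at time w "covers" a true alarm at time t when w <= t <= w + horizon, FAR is the fraction of warnings that cover no true alarm, and MAT is the mean gap between the earliest covering warning and the alarm. The function name `alarm_metrics` and the sample times are hypothetical.

```python
def alarm_metrics(true_alarms, warnings, horizon):
    """Return (recall, false alarm rate, mean look-ahead time).

    true_alarms: times (minutes) when the temperature actually crossed the threshold.
    warnings:    times (minutes) when the forecaster raised a warning.
    horizon:     how far ahead a warning is assumed to remain valid.
    """
    def covers(w, t):
        return w <= t <= w + horizon

    lead_times = []
    for t in true_alarms:
        covering = [w for w in warnings if covers(w, t)]
        if covering:
            # Look-ahead time counted from the earliest covering warning.
            lead_times.append(t - min(covering))

    false_warnings = [w for w in warnings
                      if not any(covers(w, t) for t in true_alarms)]

    recall = len(lead_times) / len(true_alarms) if true_alarms else 0.0
    far = len(false_warnings) / len(warnings) if warnings else 0.0
    mat = sum(lead_times) / len(lead_times) if lead_times else 0.0
    return recall, far, mat

# Hypothetical alarm and warning times, in minutes.
recall, far, mat = alarm_metrics(true_alarms=[10, 30, 50],
                                 warnings=[7, 20, 46, 48],
                                 horizon=5)
print(recall, far, mat)
```

Other definitions of FAR (for example, false warnings per unit time) would give different numbers, so the figures in the table should be read against the definitions in [Li et al, KDD 2011].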
Implication on Capacity Gain
Preliminary results comparing workload placement strategies:
–5 minutes forecast length
–With the same cooling:
Inlet temp with ThermoCast: C
Inlet temp with Static profiling: 16.5 C
Assuming the servers consume 200W on average (Dell PowerEdge 1950), we gain an extra 26% computing power with the same cooling
Contributions and Impact
Predictability: a hybrid approach to integrate the thermodynamics and sensor data
Scalable learning/training thanks to the zonal thermal model
Real data and instrumentation in a data center with practical workload
Projected impact: can handle extra 26% workload (e.g. PUE 1.5 → PUE 1.4)
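The PUE example can be checked with a short calculation. PUE is total facility power divided by IT power. The sketch below assumes that the non-IT overhead (cooling and other facility power) stays fixed while IT load grows by 26%; that assumption is an interpretation of "with the same cooling", and the function name `pue_after_capacity_gain` is hypothetical.

```python
def pue_after_capacity_gain(pue_before, it_gain):
    """PUE after IT load grows by `it_gain`, assuming fixed non-IT overhead."""
    overhead = pue_before - 1.0   # non-IT power per unit of original IT power
    it_after = 1.0 + it_gain      # IT power relative to the original
    return (it_after + overhead) / it_after

pue_after = pue_after_capacity_gain(1.5, 0.26)
print(round(pue_after, 2))
```

Under this assumption, a PUE of 1.5 becomes (1.26 + 0.5) / 1.26, which rounds to 1.4 and matches the example on the slide.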
DynaMMo: imputation/forecasting
Goal: recover the missing values
[Figure: sensor 1, sensor 2, …, sensor m over time, with a blackout period]
Details in [Li et al, KDD 2009]
DynaMMo result
[Figure: reconstruction error vs. average missing length; labels: Ideal, Our DynaMMo, MSVD [Srebro’03], Linear Interpolation, Spline; annotations: better, harder]
Dataset: CMU Mocap #16, mocap.cs.cmu.edu
More results in [Li et al, KDD 2009]
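For reference, the linear interpolation baseline named in the figure can be sketched in a few lines. This is the simple per-sequence baseline, not DynaMMo itself, which models the sequences jointly. Representing missing readings as `None`, copying the nearest observed value at the edges, and the name `interpolate_gaps` are assumptions made for illustration.

```python
def interpolate_gaps(values):
    """Fill None entries by linear interpolation between observed neighbors.

    Gaps at either end copy the nearest observed value (an assumed convention).
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("need at least one observed value")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled

# Hypothetical sensor readings with a two-sample blackout and a missing tail.
readings = [20.0, None, None, 23.0, None]
filled = interpolate_gaps(readings)
print(filled)
```

Because each sequence is filled independently, this baseline cannot use correlated sensors to recover a long blackout, which is the setting the figure labels as harder.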
PLiF and CLDS for clustering
BGP data: hierarchical clustering + PLiF features
Details in [Li et al, VLDB 2010] and [Li & Prakash, ICML 2011]
CLDS Clustering Mocap Data
[Figure panels: PCA top 2 components; CLDS two features. Reported accuracies: 93.9% and 51.0%. Classes: walking motion, running motion]
WindMine
Goal: find patterns and anomalies from user-click streams
Discoveries by WindMine
[Figure panels: Job website; Job website; weather; kids; health]
Conclusion
Time series mining has many applications
Energy consumption in data centers is large, and cooling accounts for much of the cost
Sensor networks find use in data center monitoring
ThermoCast: the forecasting model
Other time series models and algorithms
–DynaMMo for imputation
–PLiF & CLDS for clustering
–WindMine for web clicks
References
Lei Li, et al. ThermoCast: A Cyber-Physical Forecasting Model for Data Centers. KDD 2011.
Lei Li, et al. Time Series Clustering: Complex is Simpler. ICML 2011.
Yasushi Sakurai, Lei Li, et al. WindMine: Fast and Effective Mining of Web-click Sequences. SDM.
Lei Li, et al. Parsimonious Linear Fingerprinting for Time Series. VLDB.
Lei Li, et al. DynaMMo: Mining and Summarization of Coevolving Sequences with Missing Values. ACM KDD.
Thanks! contact: Lei Li papers, software, datasets on