Predictive Parallelization: Taming Tail Latencies in Web Search


Predictive Parallelization: Taming Tail Latencies in Web Search Myeongjae Jeon, Saehoon Kim, Seung-won Hwang, Yuxiong He, Sameh Elnikety, Alan L. Cox, Scott Rixner Microsoft Research, POSTECH, Rice University

Performance of Web Search. 1) Query response time: answer users quickly (e.g., within 300 ms). 2) Response quality (relevance): provide highly relevant web pages; quality improves with the resources and time consumed. Focus: improving response time without compromising quality.

Background: Query Processing Stages. A query is answered in three stages under, for example, a 300 ms latency SLA: (1) doc index search returns 100s to 1000s of good matching docs; (2) 2nd-phase ranking narrows them to 10s of the best matching docs; (3) the snippet generator produces a few sentences for each doc in the response. Focus: stage 1.

Goal. Speed up index search (stage 1) without compromising result quality. The time saved can improve the user experience, allow serving a larger index, or fund a more sophisticated 2nd phase.

How Index Search Works. All web pages are partitioned across index servers (massively parallel), and query processing is distributed across those servers (embarrassingly parallel); an aggregator merges each server's top-k relevant pages into the global top-k. Problem: a slow server makes the entire cluster slow.
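To make the aggregation step concrete, here is a minimal sketch of merging per-server top-k lists; the (score, doc_id) representation and the nlargest-based merge are illustrative, not Bing's implementation.

```python
import heapq
from itertools import chain

def aggregate_top_k(per_server_results, k):
    """Merge each index server's local top-k into the global top-k.

    per_server_results: one list of (score, doc_id) pairs per server.
    nlargest compares tuples element-wise, so pairs sort by score.
    """
    return heapq.nlargest(k, chain.from_iterable(per_server_results))

# Example: three index partitions, global top-2.
servers = [
    [(0.91, "doc-a"), (0.40, "doc-b")],
    [(0.87, "doc-c"), (0.55, "doc-d")],
    [(0.95, "doc-e"), (0.10, "doc-f")],
]
print(aggregate_top_k(servers, 2))  # [(0.95, 'doc-e'), (0.91, 'doc-a')]
```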

Observation. A query is processed on every server, so response time is determined by the slowest one; we need to reduce each server's tail latencies.
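The fan-out is why the slowest server dominates: if each of the N index servers independently meets the deadline with probability p, the aggregated query meets it only with probability p^N. The numbers below are illustrative, not figures from the talk.

```latex
\Pr[\text{query meets deadline}]
  = \prod_{i=1}^{N} \Pr[\text{server } i \text{ meets deadline}]
  = p^{N},
\qquad \text{e.g. } p = 0.99,\; N = 40
  \;\Rightarrow\; 0.99^{40} \approx 0.67 .
```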

Examples. A single long query (an outlier) on one index server turns a fast response into a slow one at the aggregator. Terminating the long query in the middle of processing restores the fast response, but quality drops.

Query Parallelism for Tail Reduction. Opportunity: CPU utilization is around 30%, leaving idle cores, and long queries are few. Challenge: the workload is computationally intensive and the tails are very long.

Latency breakdown at the 99%tile (CPU dominates):
  Network    4.26 ms
  Queueing   0.15 ms
  I/O        4.70 ms
  CPU      194.95 ms

Latency distribution in a Bing index server (a 99%tile latency of 204.06 ms means 99% of requests have latency ≤ 204.06 ms):
  50%tile    7.83 ms  (x1)
  75%tile   12.51 ms  (x1.6)
  95%tile   57.15 ms  (x7.3)
  99%tile  204.06 ms  (x26.1)
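As a reading aid for the table above, a nearest-rank percentile over measured latencies; a sketch, not the measurement pipeline used in the talk.

```python
import math

def percentile(latencies_ms, q):
    """Nearest-rank q-th percentile: the smallest latency l such that
    at least q% of requests have latency <= l."""
    s = sorted(latencies_ms)
    rank = max(1, math.ceil(q / 100 * len(s)))
    return s[rank - 1]

# 99% of these requests finish at or below the returned value.
print(percentile([7, 8, 12, 9, 57, 204, 10, 11, 8, 9], 99))  # 204
```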

Predictive Parallelism for Tail Reduction. Short queries are many but get almost no speedup from parallelization; long queries are few and get good speedup.

Predictive Parallelization Workflow. A query arriving at an index server first passes through the execution time predictor, which predicts the query's sequential execution time with high accuracy.

Using the predicted time, the resource manager then selectively parallelizes: long queries are parallelized, short queries run sequentially, as in the sketch below.
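The resource manager's decision reduces to a threshold test. A minimal sketch, where predictor, run_sequential, run_parallel, and the degree of parallelism are hypothetical stand-ins rather than the paper's API.

```python
LONG_QUERY_MS = 80.0  # the talk's long-query threshold

def dispatch(query, predictor, run_sequential, run_parallel, degree=4):
    """Selectively parallelize: only predicted-long queries pay the
    overhead of parallel execution. All callables are stand-ins."""
    if predictor(query) > LONG_QUERY_MS:
        return run_parallel(query, degree)   # few queries, long tails
    return run_sequential(query)             # the common, short case
```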

Focus of Today’s Talk. Predictor: identifying long queries through machine learning. Parallelization: executing long queries with high efficiency.

Brief Overview of the Predictor. Accuracy: high recall is needed to guarantee the 99%tile reduction; in our workload 4% of queries take more than 80 ms, so at least 3% must be identified (75% recall). Cost: low prediction overhead and misprediction cost; the prediction overhead must be 0.75 ms or less, with high precision. Existing approaches have lower accuracy and higher cost.

Accuracy: Predicting Early Termination. Only a limited portion of the index contributes to the top-k relevant results, and which portion depends on the keyword (more precisely, on its score distribution). [Figure: web documents in the inverted index for “SIGIR”, sorted by static rank from highest to lowest; processing covers only a prefix, and the remaining documents are not evaluated.]
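A minimal sketch of rank-ordered early termination, matching the figure: documents are visited in static-rank order, and traversal stops once an upper bound shows no unseen document can enter the top-k. The score and max_remaining_score callables are hypothetical stand-ins, not Bing's scoring.

```python
import heapq

def top_k_with_early_termination(postings, k, score, max_remaining_score):
    """postings: doc ids sorted by static rank, best first.
    score(doc): full relevance score of a document.
    max_remaining_score(i): upper bound on any score at position >= i."""
    top = []  # min-heap holding the best k (score, doc) pairs so far
    for i, doc in enumerate(postings):
        if len(top) == k and max_remaining_score(i) <= top[0][0]:
            break  # remaining docs cannot displace the current top-k
        entry = (score(doc), doc)
        if len(top) < k:
            heapq.heappush(top, entry)
        elif entry > top[0]:
            heapq.heapreplace(top, entry)
    return sorted(top, reverse=True)
```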

Space of Features. Term features (14) [Macdonald et al., SIGIR 12]: IDF (inverse document frequency), NumPostings, and score statistics (arithmetic, geometric, and harmonic means, max, variance, gradient). Query features (6): NumTerms (before and after rewriting), Relaxed, Language; these capture query complexity and query rewriting.

New Features: Query. Queries in modern search engines carry rich clues. Fields related to the query execution plan: rank=BM25F enablefresh=1 partialmatch=1 language=en location=us .... Fields related to the search keywords: SIGIR (Queensland or QLD).

Space of Features. Term features (14): AMeanScore, GMeanScore, HMeanScore, MaxScore, EMaxScore, VarScore, NumPostings, GAvgMaxima, MaxNumPostings, In5%Max, NumThres, ProK, IDF. Query features (6): English, NumAugTerm, Complexity, RelaxCount, NumBefore, NumAfter. All features are cached to ensure responsiveness (avoiding disk access); term features require a 4.47 GB memory footprint (for 100M terms).
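For illustration, a sketch computing a few of the listed features from a term's posting-list scores; the exact Bing feature definitions are not spelled out in the talk, so the formulas here are standard textbook versions.

```python
import math
from statistics import mean, pvariance

def term_features(scores, num_postings, num_docs):
    """A subset of the slide's 14 term features; formulas are
    illustrative. `scores` must be positive for the geometric and
    harmonic means."""
    return {
        "IDF": math.log(num_docs / (1 + num_postings)),
        "NumPostings": num_postings,
        "AMeanScore": mean(scores),
        "GMeanScore": math.exp(mean(math.log(s) for s in scores)),
        "HMeanScore": len(scores) / sum(1 / s for s in scores),
        "MaxScore": max(scores),
        "VarScore": pvariance(scores),
    }

def query_features(terms_before, terms_after):
    """Two of the slide's 6 query features."""
    return {"NumBefore": len(terms_before), "NumAfter": len(terms_after)}
```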

Feature Analysis and Selection. The accuracy gain per feature from the boosted regression tree suggests a cheaper subset: surprisingly, cheap features are enough to make the prediction.

Efficiency: Is a Cheaper Subset Possible? The query features (6) capture query complexity and query rewriting; among the term features (14), IDF (inverse document frequency) is cheap to obtain.

Prediction Performance (80 ms threshold; A = actual long queries, P = predicted long queries):
  Features          Precision (|A∩P|/|P|)  Recall (|A∩P|/|A|)  Cost
  Keyword features  0.76                   0.64                High
  All features      0.89                   0.84                High
  Cheap features    0.86                   0.80                Low
Query features are important. Using the cheap features (IDF from the keyword features plus the query features) is advantageous: much smaller overhead (90+% less) with accuracy similar to using all features.
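Checking the table's definitions directly: with A the actually-long queries and P the predicted-long ones, precision is |A∩P|/|P| and recall is |A∩P|/|A|. A small sketch over per-query latencies in ms.

```python
def precision_recall(actual_ms, predicted_ms, threshold_ms=80.0):
    """A = actually-long queries, P = predicted-long queries."""
    A = {i for i, t in enumerate(actual_ms) if t > threshold_ms}
    P = {i for i, t in enumerate(predicted_ms) if t > threshold_ms}
    hit = len(A & P)
    precision = hit / len(P) if P else 1.0
    recall = hit / len(A) if A else 1.0
    return precision, recall
```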

Algorithms. Classification vs. regression: comparable accuracy, but regression is more flexible (the long-query threshold can be changed without retraining). Algorithms evaluated: linear regression, Gaussian process regression, boosted regression tree.
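One way to reproduce the regression setup, sketched with scikit-learn's GradientBoostingRegressor on synthetic data; the talk does not name a library, and the features, latency model, and hyperparameters here are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.random((1000, 6))                        # stand-in cheap features
y = 200 * X[:, 0] * X[:, 1] + rng.random(1000)   # stand-in latency (ms)

model = GradientBoostingRegressor(n_estimators=100, max_depth=4)
model.fit(X[:800], y[:800])

# Regression's flexibility: the long-query cutoff can be changed at
# serving time without retraining, unlike a fixed binary classifier.
is_long = model.predict(X[800:]) > 80.0
```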

Accuracy of Algorithms. Summary: 80% of long queries (> 80 ms) are identified; 0.6% of short queries are mispredicted; prediction takes 0.55 ms with low memory overhead.

Predictive Parallelism. Key idea: parallelize only long queries, using a threshold on the predicted execution time. Evaluation: compare Predictive against the Sequential, Fixed, and Adaptive baselines, sketched below.
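The four policies reduce to different rules for choosing a query's degree of parallelism. A caricature for orientation only: the adaptive rule's load cutoff is invented here, and the real baselines are more involved.

```python
def choose_degree(policy, predicted_ms, system_load,
                  threshold_ms=80.0, dop=4):
    """Dispatch rules for the compared policies; `system_load` in
    [0, 1] and the 0.5 cutoff are illustrative assumptions."""
    if policy == "sequential":
        return 1                                  # never parallelize
    if policy == "fixed":
        return dop                                # parallelize every query
    if policy == "adaptive":
        return dop if system_load < 0.5 else 1    # load only, query-blind
    if policy == "predictive":
        return dop if predicted_ms > threshold_ms else 1
    raise ValueError(f"unknown policy: {policy}")
```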

99%tile Response Time. Predictive parallelization outperforms "parallelize all," with a 50% throughput increase.

Performance: Response Time. [Chart.]

Related Work. Search query parallelism: fixed parallelization [Frachtenberg, WWWJ 09]; adaptive parallelization using system load only [Raman et al., PLDI 11] → high overhead due to parallelizing all queries. Execution time prediction: keyword-specific features only [Macdonald et al., SIGIR 12] → lower accuracy and high memory overhead for our target problem.

Future Work. Misprediction: dynamic adaptation and prediction confidence. Diverse workloads: analytics, graph processing, ...

Thank You! Your query to Bing is now parallelized if predicted as long: query → execution time predictor → resource manager → long (parallelized) / short (sequential).