Predictive Parallelization: Taming Tail Latencies in Web Search

Name: Predictive Parallelization: Taming Tail Latencies in
Uploaded: 2017-12-23T22:27:51+00:00
Duration: PTM21S21
Channel: Ireland Boucher
Description: Predictive Parallelization: Taming Tail Latencies in

Predictive Parallelization: Taming Tail Latencies in Web Search
Myeongjae Jeon, Saehoon Kim, Seung-won Hwang, Yuxiong He, Sameh Elnikety, Alan L. Cox, Scott Rixner
Microsoft Research, POSTECH, Rice University

Performance of Web Search
1) Query response time: answer quickly to users (e.g., in 300 ms)
2) Response quality (relevance): provide highly relevant web pages; improves with resources and time consumed
Focus: improving response time without compromising quality

Background: Query Processing Stages
Pipeline: Query → Doc. index search (100s to 1000s of good matching docs) → 2nd phase ranking (10s of the best matching docs) → Snippet generator (few sentences for each doc) → Response
For example: 300 ms latency SLA
Focus: Stage 1 (doc. index search)

Goal: speeding up index search (stage 1) without compromising result quality
Improve user experience; larger index serving; sophisticated 2nd phase
Pipeline: Query → Doc. index search → 2nd phase ranking → Snippet generator → Response
For example: 300 ms latency SLA

A slow server makes the entire cluster slow
How Index Search Works
Partition all web pages across index servers (massively parallel)
Distribute query processing (embarrassingly parallel)
Aggregate top-k relevant pages
Diagram: Query → Aggregator → Index servers (each holding a partition of all web pages) → Top-k pages back to the Aggregator
Problem: A slow server makes the entire cluster slow
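
The partition, distribute, and aggregate steps above can be sketched as a small scatter-gather routine. This is a minimal illustration, not the production design: the function names, the in-memory partitions, and the toy relevance score (count of matching query terms) are all assumptions made for the example.

```python
import heapq

def search_partition(partition, query_terms, k):
    """Score every document in one partition and return its local top-k."""
    scored = []
    for doc_id, terms in partition.items():
        # Toy relevance score (an assumption): number of query terms in the doc.
        score = sum(1 for t in query_terms if t in terms)
        if score > 0:
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)

def aggregate(partial_results, k):
    """Merge the per-server top-k lists into one global top-k list."""
    merged = [hit for hits in partial_results for hit in hits]
    return heapq.nlargest(k, merged)

# Two hypothetical index servers, each holding a slice of the pages.
partitions = [
    {"d1": {"tail", "latency"}, "d2": {"web", "search"}},
    {"d3": {"tail", "latency", "search"}, "d4": {"index"}},
]
query = ["tail", "latency", "search"]
partials = [search_partition(p, query, k=2) for p in partitions]
top = aggregate(partials, k=2)
```

The aggregator cannot finish until every partition has answered, which is why one slow server delays the whole response.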

We need to reduce its tail latencies
Observation: query processing runs on every server, so response time is determined by the slowest one. We need to reduce its tail latencies.
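
A short calculation shows why the slowest server matters so much. The sketch below assumes servers are slow independently of one another, and the cluster size of 100 is a hypothetical figure chosen only for illustration.

```python
def cluster_response_ms(server_latencies_ms):
    """The aggregator waits for every server, so the slowest one sets the time."""
    return max(server_latencies_ms)

def prob_any_slow(num_servers, p_slow):
    """Chance that at least one server is slow, assuming independence."""
    return 1.0 - (1.0 - p_slow) ** num_servers

# If each server exceeds its own 99th percentile 1% of the time,
# a query fanned out to 100 servers usually hits at least one slow server.
p = prob_any_slow(num_servers=100, p_slow=0.01)
```

Under these assumptions `p` is about 0.63, so a per-server tail that affects 1% of requests affects most cluster-level queries.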

Examples Terminate long query in the middle of processing
Two cases (each diagram: Aggregator over index servers): fast response vs. slow response caused by a long query (outlier)
Terminate long query in the middle of processing → fast response, but quality drop

Parallelism for Tail Reduction
Opportunity / Challenge: available idle cores; CPU-intensive workloads; tails are few; tails are very long
Latency breakdown for the 99%tile:
Breakdown    Latency
Network      4.26 ms
Queueing     0.15 ms
I/O          4.70 ms
CPU          ms
Latency distribution:
Percentile   Latency    Scale
50%tile      7.83 ms    x1
75%tile      12.51 ms   x1.6
95%tile      57.15 ms   x7.3
99%tile      ms         x26.1

Query Parallelism for Tail Reduction
Opportunity: 30% CPU utilization; available idle cores; few long queries; computationally-intensive workload
Table. Latency breakdown for the 99%tile.
Breakdown    Latency
Network      4.26 ms
Queueing     0.15 ms
I/O          4.70 ms
CPU          ms
Table. Latency distribution in Bing index server.
Percentile   Latency    Scale
50%tile      7.83 ms    x1
75%tile      12.51 ms   x1.6
95%tile      57.15 ms   x7.3
99%tile      ms         x26.1
99%tile latency of ms = 99% of requests have latency ≤ ms
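
The Scale column is each percentile's latency divided by the median. The sketch below recomputes the scales that can be checked from the values shown, and includes a nearest-rank percentile helper for raw latency samples. The helper names and the nearest-rank definition are assumptions; the talk does not say how its percentiles were computed.

```python
import math

def percentile(sorted_values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]

def tail_scale(latencies_ms, pcts=(50, 75, 95, 99)):
    """Map each percentile to (latency, latency / median)."""
    ordered = sorted(latencies_ms)
    median = percentile(ordered, 50)
    return {p: (percentile(ordered, p), percentile(ordered, p) / median) for p in pcts}

# Recompute the scale column from the latencies listed on the slide.
slide_ms = {50: 7.83, 75: 12.51, 95: 57.15}
scales = {p: round(v / slide_ms[50], 1) for p, v in slide_ms.items()}
```

The recomputed scales match the x1, x1.6, and x7.3 entries in the table.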

Predictive Parallelism for Tail Reduction
Short queries: many, almost no speedup
Long queries: few, good speedup

Predictive Parallelization Workflow
Index server: query → Execution time predictor
Predict (sequential) execution time of the query with high accuracy

Predictive Parallelization Workflow
Index server: query → Execution time predictor → Resource manager (long / short)
Using predicted time, selectively parallelize long queries

Predictive Parallelization
Focus of today's talk:
Predictor: prediction of long queries through machine learning
Parallelization: parallelization of long queries with high efficiency

Brief Overview of Predictor
Accuracy: high recall for guaranteeing 99%tile reduction. In our workload, 4% of queries take > 80 ms; at least 3% must be identified (75% recall).
Cost: low prediction overhead and misprediction cost. Prediction overhead of 0.75 ms or less and high precision.
Existing approaches: lower accuracy and higher cost

Accuracy: Predicting Early Termination
Only a limited portion of the index contributes to the top-k relevant results.
That portion depends on the keyword (more exactly, on its score distribution).
Diagram: inverted index for “SIGIR” over web documents Doc 1 ... Doc N, sorted by static rank from highest to lowest; processing proceeds from the top, and the remaining docs are not evaluated.
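
One way to picture early termination over a static-rank-sorted posting list is the sketch below. It is not the search engine's actual algorithm. It assumes a document's final score is its static rank plus a query-dependent part bounded by `max_dynamic`, and every name in it is hypothetical.

```python
import heapq

def search_with_early_termination(postings, dynamic_score, max_dynamic, k):
    """postings: (doc_id, static_rank) pairs sorted by static_rank, highest first."""
    top = []        # min-heap holding the best k (score, doc_id) pairs so far
    evaluated = 0
    for doc_id, static_rank in postings:
        # Stop once no remaining doc can beat the current k-th best score.
        if len(top) == k and static_rank + max_dynamic <= top[0][0]:
            break
        evaluated += 1
        score = static_rank + dynamic_score(doc_id)
        if len(top) < k:
            heapq.heappush(top, (score, doc_id))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, doc_id))
    return sorted(top, reverse=True), evaluated

postings = [("d1", 10.0), ("d2", 9.0), ("d3", 8.0), ("d4", 3.0), ("d5", 2.0), ("d6", 1.0)]
dynamic = {"d1": 1.0, "d2": 4.0, "d3": 0.5, "d4": 5.0, "d5": 0.0, "d6": 2.0}
results, evaluated = search_with_early_termination(
    postings, lambda d: dynamic[d], max_dynamic=5.0, k=2
)
```

In this example only three of the six documents are evaluated. How far the scan goes depends on the score distribution, which is why execution time varies by keyword and is worth predicting.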

Space of Features Query features
Term features (14) [Macdonald et al., SIGIR 12]: IDF, NumPostings, Score (arithmetic, geometric, and harmonic means; max; var; gradient)
Query features (6): NumTerms (before and after rewriting), Relaxed, Language
Query features capture query complexity (query rewriting). IDF: inverse document frequency.
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

New Features: Query Rich clues from queries in modern search engines
<Fields related to query execution plan>
rank=BM25F enablefresh=1 partialmatch=1 language=en location=us ….
<Fields related to search keywords>
SIGIR (Queensland or QLD)

Space of Features
Category            Feature
Term feature (14)   AMeanScore, GMeanScore, HMeanScore, MaxScore, EMaxScore, VarScore, NumPostings, GAvgMaxima, MaxNumPostings, In5%Max, NumThres, ProK, IDF
Query feature (6)   English, NumAugTerm, Complexity, RelaxCount, NumBefore, NumAfter
All features cached to ensure responsiveness (avoiding disk access)
Term features require 4.47 GB memory footprint (for 100M terms)
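
A few of the term features can be computed from a term's per-document scores with the standard library, as sketched below. The exact definitions used in the talk are not given, so the population variance and the `log(N / df)` form of IDF are assumptions, and the function name is hypothetical.

```python
import math
import statistics

def term_score_features(scores, num_postings, num_docs):
    """Summarize one term's score distribution as prediction features."""
    return {
        "AMeanScore": statistics.fmean(scores),           # arithmetic mean
        "GMeanScore": statistics.geometric_mean(scores),  # geometric mean
        "HMeanScore": statistics.harmonic_mean(scores),   # harmonic mean
        "MaxScore": max(scores),
        "VarScore": statistics.pvariance(scores),         # population variance (assumed)
        "NumPostings": num_postings,
        "IDF": math.log(num_docs / num_postings),         # one common IDF form (assumed)
    }

features = term_score_features([1.0, 2.0, 4.0], num_postings=10, num_docs=1000)
```

Features like these would be precomputed per term and held in memory, in line with the caching note above, so that prediction needs only a lookup at query time.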

Feature Analysis and Selection
Accuracy gain from boosted regression tree, suggesting a cheaper subset. What a surprise: cheap features are enough to make the prediction.

Efficiency: Cheaper subset possible?
Query features (6): capture query complexity. Query rewriting
Term features (14): IDF: inverse document frequency

Prediction Performance
80 ms Thresh.      Precision (|A∩P|/|P|)   Recall (|A∩P|/|A|)   Cost
Keyword features   0.76                    0.64                 High
All features       0.89                    0.84
Cheap features     0.86                    0.80                 Low
A = actual long queries, P = predicted long queries
Query features are important
Using cheap features is advantageous: IDF from keyword features + query features; much smaller overhead (90+% less); similarly high accuracy as using all features
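
The precision and recall definitions in the table translate directly into set operations. The sketch below uses made-up query IDs and times purely to show the calculation; only the 80 ms threshold comes from the slide.

```python
def label_long(times_ms, threshold_ms=80.0):
    """Return the IDs of queries whose time exceeds the threshold."""
    return {qid for qid, t in times_ms.items() if t > threshold_ms}

def precision_recall(actual_long, predicted_long):
    """Precision = |A∩P|/|P|, recall = |A∩P|/|A|."""
    hits = len(actual_long & predicted_long)
    precision = hits / len(predicted_long) if predicted_long else 0.0
    recall = hits / len(actual_long) if actual_long else 0.0
    return precision, recall

# Hypothetical measured and predicted execution times in milliseconds.
actual_ms = {"q1": 120, "q2": 95, "q3": 10, "q4": 85, "q5": 30}
predicted_ms = {"q1": 110, "q2": 70, "q3": 90, "q4": 100, "q5": 20}

A = label_long(actual_ms)
P = label_long(predicted_ms)
precision, recall = precision_recall(A, P)
```

Here `q2` is a missed long query, which lowers recall, and `q3` is a short query predicted as long, which lowers precision.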

Algorithms Classification vs. Regression Comparable accuracy
Classification vs. regression: comparable accuracy; flexibility of regression
Algorithms: linear regression, Gaussian process regression, boosted regression tree

Accuracy of Algorithms
Summary:
80% of long queries (> 80 ms) identified
0.6% of short queries mispredicted
0.55 ms prediction time with low memory overhead

Predictive Parallelism
Key idea: parallelize only long queries, using a threshold on predicted execution time
Evaluation: compare Predictive to other baselines: Sequential, Fixed, Adaptive
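
The four policies can be contrasted by the parallelism degree each assigns to a query. This is a rough sketch under stated assumptions: the fixed degree of 3, the cap of 4 threads, and the sample workload are hypothetical, and the Adaptive policy is modeled as looking at idle cores only.

```python
LONG_THRESHOLD_MS = 80.0  # threshold on predicted execution time

def sequential(predicted_ms, idle_cores):
    return 1  # never parallelize

def fixed(predicted_ms, idle_cores, degree=3):
    return degree  # same degree for every query (degree is an assumption)

def adaptive(predicted_ms, idle_cores, max_degree=4):
    return max(1, min(max_degree, idle_cores))  # looks at system load only

def predictive(predicted_ms, idle_cores, max_degree=4):
    if predicted_ms <= LONG_THRESHOLD_MS:
        return 1  # short query: run sequentially
    return max(1, min(max_degree, idle_cores))  # long query: use idle cores

def threads_used(policy, predicted_times_ms, idle_cores):
    """Total threads a policy would request for a batch of queries."""
    return sum(policy(t, idle_cores) for t in predicted_times_ms)

workload_ms = [5.0, 8.0, 12.0, 7.0, 150.0]  # mostly short, one long query
usage = {
    p.__name__: threads_used(p, workload_ms, idle_cores=4)
    for p in (sequential, fixed, adaptive, predictive)
}
```

In this toy workload the predictive policy requests extra threads only for the single long query, while the fixed and adaptive policies spend threads on short queries that gain almost nothing from them.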

99%tile Response Time Outperforms “Parallelize all”
50% throughput increase Outperforms “Parallelize all”

Performance: Response Time

Related Work Search query parallelism Execution time prediction
Search query parallelism
Fixed parallelization [Frachtenberg, WWWJ 09]
Adaptive parallelization using system load only [Raman et al., PLDI 11]
→ High overhead due to parallelizing all queries
Execution time prediction
Keyword-specific features only [Macdonald et al., SIGIR 12]
→ Lower accuracy and high memory overhead for our target problem

Future Work Misprediction Diverse workloads Dynamic adaptation
Prediction confidence Diverse workloads Analytics, graph processing,

Thank You! Your query to Bing is now parallelized if predicted as long.
query → Execution time predictor → Resource manager (long / short)
