Machine Learning In Search Quality At Yandex. Painting by G.Troshkov.

Presentation transcript:

Russian Search Market Source: LiveInternet.ru,

A Yandex Overview
>1997: Yandex.ru was launched
>№7 search engine in the world* (# of queries)
>150 mln search queries a day
>Offices: Moscow; 4 offices in Russia; 3 offices in Ukraine; Palo Alto (CA, USA)
* Source: Comscore 2009

Variety of Markets
>countries with Cyrillic alphabet
>regions in Russia
Source: Wikipedia

Variety of Markets
>Different culture, standard of living, average income: for example, Moscow, Magadan, Saratov
>Large semi-autonomous ethnic groups: Tatar, Chechen, Bashkir
>Neighboring bilingual markets: Ukraine, Kazakhstan, Belarus

Geo-specific queries
Relevant result sets vary across all regions and countries
[wedding cake] [gas prices] [mobile phone repair] [пицца]
Guess what it is? ([пицца] is Russian for "pizza".)

pFound A Probabilistic Measure of User Satisfaction

Probability of User Satisfaction
Optimization goal at Yandex since 2007
>pFound – Probability of an answer to be FOUND
>pBreak – Probability of abandonment at each position (BREAK)
>pRel – Probability of user satisfaction at a given position (RELevance)
Similar to ERR: Chapelle, 2009, Expected Reciprocal Rank for Graded Relevance
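The three probabilities above can be combined in a cascade, much as ERR does. The slides do not give the formula, so the following is only a minimal sketch under an assumed form: the user reads results top-down, is satisfied at a position with probability pRel, and otherwise abandons with probability pBreak before looking at the next result. The function name `pfound` and the default `p_break=0.15` are illustrative assumptions, not values taken from this deck.

```python
from typing import Sequence


def pfound(p_rel: Sequence[float], p_break: float = 0.15) -> float:
    """Probability that the user finds an answer in a ranked list.

    p_rel[i] is the assumed probability that the result at position i
    satisfies the user; p_break is the assumed probability of abandoning
    the list after an unsatisfying result.
    """
    p_look = 1.0   # probability that the user looks at the current position
    total = 0.0
    for rel in p_rel:
        total += p_look * rel                      # satisfied here
        p_look *= (1.0 - rel) * (1.0 - p_break)    # not satisfied, did not abandon
    return total


# Swapping a strong result into first place raises the score.
good_first = pfound([0.9, 0.1, 0.1])
good_last = pfound([0.1, 0.1, 0.9])
```

Under these assumptions the metric rewards putting the most relevant document first, because every lower position is discounted by both dissatisfaction and abandonment.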

Geo-Specific Ranking

An initial approach: query → query + user’s region
Ranking feature, e.g.: “user’s region and document region coincide”
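A feature of this kind could be computed as below. This is a hypothetical sketch: the function name `region_match` and the representation of regions as plain strings are assumptions for illustration, not Yandex's actual feature definition.

```python
from typing import Optional


def region_match(user_region: str, doc_region: Optional[str]) -> float:
    """Return 1.0 when the document's region coincides with the user's, else 0.0.

    doc_region is None for pages that carry no geo-code at all.
    """
    if doc_region is None:
        return 0.0
    same = user_region.strip().lower() == doc_region.strip().lower()
    return 1.0 if same else 0.0
```

The value would then be fed to the ranking formula alongside the other features for each (query, document) pair.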

An initial approach: query → query + user’s region
Problems
Hard to perfect single ranking
>Very poor local sites in some regions
>Some features (e.g. links) missing
>Countries (high-level regions) are very specific
Cache hit degradation
>Twice as many queries

Alternatives In Regionalization
>Separated local indices VS Unified index with geo-coded pages
>Two queries: original and modified (e.g. +city name) VS One query
>Results-based local intent detection VS Query-based local intent detection
>Co-ranking and re-ranking of local results VS Single ranking function
>Train many formulas on local pools VS Train one formula on a single pool

Why use MLR? Machine Learning as a Conveyer
>Each region requires its own ranking: very labor-intensive to construct
>Lots of ranking features are deployed monthly: MLR allows faster updates
>Some query classes require specific ranking: music, shopping, etc.

MatrixNet A Learning to Rank Method

MatrixNet: A Learning Method
>boosting based on decision trees; we use oblivious trees (i.e. “matrices”)
>optimize for pFound
>solve regression tasks
>train classifiers
Based on Friedman’s Gradient Boosting Machine (Friedman, 2000)
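To illustrate why oblivious trees behave like "matrices", here is a minimal evaluation sketch. An oblivious tree applies the same (feature, threshold) test to every node at a given depth, so the outcomes of the tests form a bit string that indexes directly into a table of leaf values. The class and function names are hypothetical, and this is not MatrixNet's implementation; training is omitted entirely.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ObliviousTree:
    features: List[int]        # feature index tested at each depth
    thresholds: List[float]    # threshold tested at each depth
    leaf_values: List[float]   # 2 ** depth values, indexed by the test outcomes

    def predict(self, x: Sequence[float]) -> float:
        index = 0
        for feature, threshold in zip(self.features, self.thresholds):
            bit = 1 if x[feature] > threshold else 0
            index = (index << 1) | bit   # append this level's outcome
        return self.leaf_values[index]


def ensemble_score(trees: Sequence[ObliviousTree], x: Sequence[float]) -> float:
    """Additive (boosted) score: the sum of all tree outputs."""
    return sum(tree.predict(x) for tree in trees)


tree = ObliviousTree(features=[0, 1], thresholds=[0.5, 0.5],
                     leaf_values=[0.0, 1.0, 2.0, 3.0])
```

Because every level shares one test, evaluation needs no branching over tree structure, only a few comparisons and a table lookup.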

MLR: complication of ranking formulas
[Chart of ranking formula sizes; surviving labels: MatrixNet, 1 kb, 14 kb, 220 kb]

MLR: complication of ranking formulas
A Sequence of More and More Complex Rankers
>pruning with the Static Rank (static features)
>use of simple dynamic features (such as BM25, etc.)
>complex formula that uses all the features available
>potentially up to a million matrices/trees for the very top documents
See also Cambazoglu, 2010, Early Exit Optimizations for Additive Machine Learned Ranking Systems
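The sequence of rankers above can be sketched as a cascade in which each stage scores the surviving candidates with a costlier function and keeps only the top few. This is an illustrative sketch only: the stage sizes, the field names `static`, `bm25` and `full`, and the function `cascade_rank` are assumptions, not the production pipeline.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Doc = Dict[str, float]
Stage = Tuple[Callable[[Doc], float], int]   # (scoring function, how many to keep)


def cascade_rank(docs: Sequence[Doc], stages: Sequence[Stage]) -> List[Doc]:
    """Apply stages from cheapest to costliest, pruning after each one."""
    candidates = list(docs)
    for score, keep in stages:
        candidates = sorted(candidates, key=score, reverse=True)[:keep]
    return candidates


docs = [
    {"id": 1, "static": 0.9, "bm25": 0.2, "full": 0.1},
    {"id": 2, "static": 0.8, "bm25": 0.7, "full": 0.9},
    {"id": 3, "static": 0.7, "bm25": 0.9, "full": 0.5},
    {"id": 4, "static": 0.1, "bm25": 0.95, "full": 0.99},
]
stages = [
    (lambda d: d["static"], 3),   # prune with static rank
    (lambda d: d["bm25"], 2),     # simple dynamic feature
    (lambda d: d["full"], 2),     # stand-in for the full formula
]
top = cascade_rank(docs, stages)
```

In this toy data, document 4 has the best full score but is pruned by the cheap static stage, which shows the trade-off such cascades make between cost and recall.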

Geo-Dependent Queries: pFound
[Chart; surviving legend label: Other search engines]
Source: Yandex internal data

Geo-Dependent Queries: Number of Local Results (%)
[Chart; surviving legend labels: Player #2, Other search engines]
Source: AnalyzeThis.ru

Conclusion
MLR is the only key to regional search: it lets us tune many geo-specific models at the same time

Lessons

Challenges
>Complexity of the models is increasing rapidly: they don’t fit into memory!
>MLR in its current setting is not well suited to time-specific queries: features of fresh content are very sparse and temporal
>Opacity of the results of MLR: the back side of Machine Learning
>The number of features grows faster than the number of judgments: hard to train ranking
>Training a formula on clicks and user behavior is hard: tens of Gb of data per day!

Conclusion
Yandex and IR: Participation and Support

Yandex MLR at IR Contests
>№1: MatrixNet at Yahoo Challenge: #1, 3, 10 (Track 2), also BagBoo, AG

Support of Russian IR
Schools and Conferences
>RuSSIR since 2007 – Russian Summer School for Information Retrieval
>ROMIP since 2003 – Russian Information Retrieval Evaluation Workshop: 7 teams, 2 tracks in 2003; 20 teams, 11 tracks in 2009
>Yandex School of Data Analysis since 2007 – 2-year university master’s program
Grants and Online Contests
>IMAT (Internet Mathematics) 2005, 2007 – Yandex Research Grants; 9 data sets
>IMAT 2009 – Learning To Rank (in a modern setup: test set is queries and ~ judgments, no raw data)
>IMAT 2010 – Road Traffic Prediction

We are hiring!