1 Rank Aggregation Methods II Experiments CS728 Lecture 12

2 Recall the Rank Aggregation Problem
m candidates (a.k.a. “alternatives”)
–M = {1,…,m}: set of candidates
n voters (a.k.a. “agents” or “judges”)
–N = {1,…,n}: set of voters
Each voter i has a ranking πi on M
–πi(a) < πi(b) means voter i prefers a to b
–A ranking may be a total or partial order
The rank aggregation problem: combine π1,…,πn into a single ranking π on M, which represents the “social choice” of the voters.
–Rank aggregation function: f(π1,…,πn) = π
–π may be a total or partial order
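To make this concrete, here is a minimal sketch (my own illustration, not from the lecture) of the setup: each voter ranking πi is represented as a dict from candidate to position, and an aggregation function f maps the n rankings to one ranking. The mean-position rule below is only a placeholder aggregation, not one the lecture endorses.

# A minimal sketch (not from the lecture) of the problem setup: each voter's ranking
# pi_i is a dict candidate -> position, and f maps the n rankings to one ranking.

def aggregate_by_mean_position(rankings):
    """Placeholder aggregation f: order candidates by their average position."""
    candidates = rankings[0].keys()
    mean_pos = {c: sum(r[c] for r in rankings) / len(rankings) for c in candidates}
    ordered = sorted(candidates, key=lambda c: mean_pos[c])
    return {c: i + 1 for i, c in enumerate(ordered)}   # the social-choice ranking pi

# Three voters ranking candidates 1..3 (pi_i(a) < pi_i(b) means a is preferred to b).
voters = [
    {1: 1, 2: 2, 3: 3},
    {1: 2, 2: 1, 3: 3},
    {1: 1, 2: 3, 3: 2},
]
print(aggregate_by_mean_position(voters))   # {1: 1, 2: 2, 3: 3}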

3 Experiments: Distance Measures
Goal: quantitatively compare different rank aggregation methods.
Performance measures:
(1) Spearman footrule distance: the sum of pointwise distances, F(s, t) = Σi |s(i) − t(i)|. It is normalized by dividing by the maximum possible value (1/2)|S|², giving a value between 0 and 1.
(2) Kendall tau distance: the number of pairwise disagreements. Dividing by the maximum possible value (1/2)|S|(|S| − 1) gives a normalized version, again between 0 and 1.
(3) Induced footrule distance: project the full list s onto the elements of each partial list and take the footrule distance there. The induced Kendall tau distance is defined in the same manner.
(4) Scaled footrule distance: weights the contribution of each element by the length of the list it appears in. If s is a full list and t is a partial list, then SF(s, t) = Σi in t | s(i)/|s| − t(i)/|t| |. Normalize SF by dividing by |t|/2.
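These measures translate directly into a few lines of code. A minimal sketch (my own, not from the slides), assuming full and partial lists are represented as dicts mapping each item to its 1-based position:

# Sketch of the distance measures above; rankings are dicts item -> position (1-based).
from itertools import combinations

def footrule(s, t):
    """Normalized Spearman footrule between two full lists over the same items."""
    total = sum(abs(s[i] - t[i]) for i in s.keys())
    max_val = 0.5 * len(s) ** 2          # maximum possible footrule value
    return total / max_val

def kendall_tau(s, t):
    """Normalized Kendall tau: fraction of item pairs the two lists order differently."""
    items = list(s.keys())
    disagreements = sum(
        1 for a, b in combinations(items, 2)
        if (s[a] - s[b]) * (t[a] - t[b]) < 0
    )
    max_val = 0.5 * len(items) * (len(items) - 1)
    return disagreements / max_val

def scaled_footrule(s, t):
    """Scaled footrule from a full list s to a partial list t (only items in t count)."""
    total = sum(abs(s[i] / len(s) - t[i] / len(t)) for i in t)
    return total / (len(t) / 2)          # normalize by |t|/2

# Example: positions of items 'a'..'d' in two full lists and one partial list.
s = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
t = {'a': 2, 'b': 1, 'c': 4, 'd': 3}
partial = {'c': 1, 'a': 2}
print(footrule(s, t), kendall_tau(s, t), scaled_footrule(s, partial))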

4 Experiments: Distance Measures
So for each aggregation method and each distance measure we get a vector of values, one component per voter: the distance from the aggregation to that voter's list.
Simplest is to take the average (or 1-norm)
Other norms are interesting
–Mean square distance (2-norm)
–Max distance (∞-norm)
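A small illustration (hypothetical numbers, not from the experiments) of summarizing one method's per-voter distance vector with these norms:

# Summarize the per-voter distance vector of one aggregation method with different norms.
import math

distances = [0.10, 0.25, 0.05, 0.40]   # hypothetical distances to 4 voter lists

avg = sum(distances) / len(distances)                             # average (1-norm / n)
rms = math.sqrt(sum(d * d for d in distances) / len(distances))   # mean square (2-norm style)
mx  = max(distances)                                              # max (infinity-norm, worst voter)

print(avg, rms, mx)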

5 Experiments: Minimizing Average
Search engines: Altavista (AV), Alltheweb (AW), Excite (EX), Google (GG), Hotbot (HB), Lycos (LY), and Northernlight (NL)
[Table of results, one column per engine: Altavista, Alltheweb, Excite, Google, Hotbot, Lycos, Northernlight; the numeric entries appear only in the slide image]
Legend: K = Kendall distance, SF = scaled footrule distance, IF = induced footrule distance, LK = local Kemenization

6 Experiments in Spam Filtering
Define spam as web pages that are ranked low by majority opinion (machine and human – a simplifying assumption), although they may be highly ranked by some search engines.
Intuition: if a page spams most search engines for a particular query, then no combination of these search engines can filter the spam: garbage in, garbage out.
Spam pages are the Condorcet losers, and will occupy the bottom of any ranking that satisfies the extended Condorcet criterion.
Similarly, good pages will be among the Condorcet winners, and will rank above the losers.

7 Condorcet Criteria
Condorcet criterion:
–A candidate in M that beats every other candidate in pairwise simple majority voting should be ranked first.
Extended Condorcet criterion (XCC):
–Version 1: If a majority of voters prefers candidate a to candidate b (i.e., the number of i s.t. πi(a) < πi(b) is at least n/2), then π should also prefer a to b (i.e., π(a) < π(b)).
–Version 2: If there is a partition (W, L) of M such that for every x in W and y in L the majority prefers x to y, then every such x must be ranked above every such y. W is the set of Condorcet winners and L the set of Condorcet losers.
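A minimal sketch (my own, not from the slides) of the pairwise-majority test underlying both criteria, including a check of an XCC version 2 partition; rankings are again dicts from candidate to position, and the voter profile is hypothetical:

# Pairwise-majority test behind the Condorcet criteria. Rankings: dicts item -> position.

def majority_prefers(rankings, a, b):
    """True if at least half of the voters rank a above b (the slide's n/2 threshold)."""
    count = sum(1 for r in rankings if r[a] < r[b])
    return count >= len(rankings) / 2

def is_condorcet_partition(rankings, winners, losers):
    """XCC version 2: every winner must beat every loser by pairwise majority."""
    return all(majority_prefers(rankings, w, l) for w in winners for l in losers)

# Hypothetical profile: 3 voters over candidates a, b, c (c is the clear loser).
rankings = [
    {'a': 1, 'b': 2, 'c': 3},
    {'b': 1, 'a': 2, 'c': 3},
    {'a': 1, 'c': 2, 'b': 3},
]
print(is_condorcet_partition(rankings, winners={'a', 'b'}, losers={'c'}))  # True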

8 XCC(2) and SPAM Filtering
Note that XCC(1) => XCC(2), so version 1 is stronger.
But XCC(1) is not always realizable.
As we will see, XCC(2) is always realizable via local Kemenization.
Hence using rank aggregation with XCC(2) should assist in spam filtering, since Condorcet losers will be ranked lowest.
Let us look at where spam pages (human-determined) are ranked by good aggregation methods.

9 Experiments: Filtering SPAM

10 Experiment: Word Association
Different search engines and portals have different (default) semantics for handling a multi-word query. Some use OR semantics (documents must contain at least one of the query terms) while Google uses AND semantics (all the query words must appear). Both are inconvenient in many situations.
Consider searching for the job of a software engineer in an on-line job database. The user lists a number of skills and a number of potential keywords from the job description, for example, "Silicon Valley C++ Java CORBA TCP-IP algorithms start-up pre-IPO stock options". It is clear that the "AND" rule might produce no documents or only spam, and the "OR" rule is equally disastrous.
Experiment with rank aggregation using multiple queries based on small subsets of the terms.
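One way to set up such an experiment (a sketch under my own assumptions; the lecture does not specify the subset size) is to issue an AND-query for every small subset of the terms and then rank-aggregate the resulting result lists:

# Build sub-queries from small subsets of the user's terms; their result lists
# would then be rank-aggregated. Subset size 3 is an arbitrary choice here.
from itertools import combinations

terms = ["Silicon Valley", "C++", "Java", "CORBA", "TCP-IP",
         "algorithms", "start-up", "pre-IPO", "stock options"]

def subset_queries(terms, size=3):
    """All AND-queries over `size` terms; each is sent to the engine separately."""
    return [" ".join(subset) for subset in combinations(terms, size)]

queries = subset_queries(terms, size=3)
print(len(queries), queries[0])   # 84 queries, e.g. 'Silicon Valley C++ Java'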

11 Results for query: madras madurai coimbatore vellore (cities in the state of Tamil Nadu, India)
Google:
www.mssrf.org/Fris9809/location-tamilnadu.html
www.indiaplus.com/Info/schools.html
www.focustamilnadu.com/tamilnadu/Policy%20Note...Forests.html
www.tn.gov.in/policy/environ.htm
www.indiacolleges.com/Tamil_Nadu.htm
SFO with LK:
www.madurai.com
www.ozemail.com.au/clday/locations.htm
www.utoledo.edu/homepages/speelam/coimbatore.html
www.ozemail.com.au/clday/madras.htm
www.madurai.com/around.htm
www.indiatraveltimes.com/tamilnadu/tamil1.html
MC4 with LK:
www.madurai.com
www.surfindia.com/omsakthi/tourism.htm
www.indiatraveltimes.com/tamilnadu/tamil1.html
www.indiatraveltimes.com/tamilnadu/tamil2.html
www.indiatravels.com/forts/vellore_fort.htm
www.india-tourism.de/english/south/tamil_nadu.html

12 Locally Kemeny Optimal Aggregation and XCC(2)
Many existing aggregation methods do not satisfy XCC(1) or XCC(2).
It is possible to use your favorite aggregation method to obtain a full list, and then apply local Kemenization to realize XCC(2), which filters Condorcet losers to the bottom.

13 Locally Kemeny Optimal
Recall that computing a Kemeny optimal aggregation is NP-hard.
Definition of locally optimal: a permutation p is a locally Kemeny optimal aggregation of partial lists t1, t2,..., tk if there is no permutation p' that can be obtained from p by a single transposition of an adjacent pair of elements and for which the total Kendall distance K(p', t1, t2,..., tk) < K(p, t1, t2,..., tk).
In other words, it is impossible to reduce the total distance to the t's by flipping an adjacent pair.
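The definition suggests a direct test. A sketch (my own, not from the slides), with the aggregation as a list of items and each ti as a (possibly partial) list over a subset of the items:

# Test for local Kemeny optimality by trying every adjacent transposition.

def kendall_to_lists(p, partial_lists):
    """Total number of pairwise disagreements between full list p and the partial lists."""
    pos = {x: i for i, x in enumerate(p)}
    total = 0
    for t in partial_lists:
        for i in range(len(t)):
            for j in range(i + 1, len(t)):
                if pos[t[i]] > pos[t[j]]:   # t ranks t[i] above t[j], p disagrees
                    total += 1
    return total

def is_locally_kemeny_optimal(p, partial_lists):
    """True if no single adjacent transposition of p lowers the total Kendall distance."""
    base = kendall_to_lists(p, partial_lists)
    for i in range(len(p) - 1):
        q = list(p)
        q[i], q[i + 1] = q[i + 1], q[i]
        if kendall_to_lists(q, partial_lists) < base:
            return False
    return True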

14 Example of LKO but not KO
Example: t1 = (1,2), t2 = (2,3), t3 = t4 = t5 = (3,1), and p = (1,2,3).
p satisfies the definition of locally Kemeny optimal, with K(p, t1,..., t5) = 3, but transposing 1 and 3 (a non-adjacent swap, giving (3,2,1)) decreases the sum to 2, so p is not Kemeny optimal.
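The arithmetic of the example can be checked with a few lines (a self-contained sketch using the same disagreement count as above):

# Check the totals in the example on slide 14.
def kendall_to_lists(p, lists):
    pos = {x: i for i, x in enumerate(p)}
    return sum(1 for t in lists
                 for i in range(len(t)) for j in range(i + 1, len(t))
                 if pos[t[i]] > pos[t[j]])

lists = [(1, 2), (2, 3), (3, 1), (3, 1), (3, 1)]
print(kendall_to_lists((1, 2, 3), lists))   # 3
print(kendall_to_lists((3, 2, 1), lists))   # 2, but (3,2,1) is not one adjacent swap away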

15 LKO Satisfies XCC(2)
Proof by contradiction. If the claim is false, then there exist partial lists t1, t2,..., tk, a locally Kemeny optimal aggregation p, and a partition (W, L) that violates XCC(2); that is, there is some pair c in W and d in L such that p(d) < p(c).
Let (c, d) be such a pair with c and d closest in p. Consider the immediate successor of d in p, call it e.
If e = c, then c is adjacent to d in p, and transposing this adjacent pair of alternatives produces a p' such that K(p', t1, t2,..., tk) < K(p, t1, t2,..., tk), contradicting the local Kemeny optimality of p.
If e is not equal to c, then either e is in W, in which case (e, d) is a closer violating pair in p than (c, d), or e is in L, in which case (c, e) is a closer violating pair than (c, d). Both cases contradict the choice of (c, d).

16 Local Kemenization Procedure
A local Kemenization of a full list μ with respect to the preference lists computes a locally Kemeny optimal aggregation that is maximally consistent with the original μ. This approach:
(1) preserves the strengths of the initial aggregation,
(2) ranks non-spam above spam,
(3) gives a result that disagrees with the original on a pair (i, j) only if a majority of the preference lists endorses this disagreement, and
(4) for every d, 1 ≤ d ≤ |μ|, the restriction of the output to the top d elements of μ is itself a local Kemenization of those elements.

17 A simple inductive construction.
Assume inductively that we have constructed p, a local Kemenization of the projection of the t's onto the elements 1,..., l−1 (the first l−1 elements of μ).
Insert the next element x into the lowest-ranked "permissible" position in p: just below the lowest-ranked element y in p such that
–(a) no majority among the (original) t's prefers x to y, and
–(b) for every successor z of y in p there is a majority that prefers x to z.
In other words, we try to insert x at the end (bottom) of the list p, and then bubble it up toward the top of the list as long as a majority of the t's insists that we do. A sketch of this procedure in code follows below.
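Here is a sketch of the bubble-up construction (my own rendering; the slide leaves some details open, e.g. how lists that do not rank both elements are counted, so "majority" below means strictly more than half of the lists that rank both x and y):

# Bubble-up local Kemenization of an initial full list mu against partial lists ts.

def prefers(t, x, y):
    """True if partial list t ranks x above y (both must appear in t)."""
    return x in t and y in t and t.index(x) < t.index(y)

def majority_prefers(ts, x, y):
    """Strict majority among the lists that rank both x and y (an assumption, see above)."""
    both = [t for t in ts if x in t and y in t]
    return len(both) > 0 and sum(prefers(t, x, y) for t in both) > len(both) / 2

def local_kemenize(mu, ts):
    """Locally Kemeny optimal list maximally consistent with the initial aggregation mu."""
    p = []
    for x in mu:                       # insert elements in the order given by mu
        p.append(x)                    # start at the bottom of the current list
        i = len(p) - 1
        while i > 0 and majority_prefers(ts, x, p[i - 1]):
            p[i], p[i - 1] = p[i - 1], p[i]   # bubble x up past its predecessor
            i -= 1
    return p

ts = [(1, 2), (2, 3), (3, 1), (3, 1), (3, 1)]
print(local_kemenize([1, 2, 3], ts))

On the lists from the example on slide 14 with μ = (1, 2, 3), this returns [1, 2, 3], which is the locally Kemeny optimal aggregation found there.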

18 Local Kemenization Example
Input lists: ABFECD, BCAEFD, ACFDEB, BFDCAE, CABFED, BADCEF
Insertion trace shown on the slide: B, AB, ABD, ABDC, ABCD, ABCFED
Majority tallies used for the swaps: A>B: 3, A<B: 2; B>D: 4, B<D: 1

19 RA and Searching the Workplace Web
Axiom 1: Intranet documents are not spam.
Axiom 2: Queries usually have unique answers (not broad topic-based).
Axiom 3: Intranet docs are not search-engine friendly (docs are accessed through portals and database queries).
Rank aggregation allows us to combine a number of heuristic alternatives: static and dynamic, query-dependent and query-independent.

