TRANCO: A Research-Oriented Top Sites Ranking Hardened Against Manipulation Victor Le Pochat, Tom Van Goethem, Samaneh Tajalizadehkhoob, Maciej Korczyński.


1 TRANCO: A Research-Oriented Top Sites Ranking Hardened Against Manipulation
Victor Le Pochat, Tom Van Goethem, Samaneh Tajalizadehkhoob, Maciej Korczyński and Wouter Joosen

2 Introduction
Website popularity rankings
Commonly used in research to evaluate security and privacy practices
Issues with validity and representativeness
Four main ranking lists: Alexa, Cisco Umbrella, Majestic and Quantcast
What are the problems with these lists?
How can they be manipulated?
New ranking list to solve these issues: Tranco

Website popularity rankings are commonly used in research to evaluate things like the prevalence of security and privacy practices. Despite their common use, there are many issues with the validity and representativeness of these lists. This paper aims to demonstrate the problems in the four main ranking lists and show how easily they can be manipulated. It then proposes solutions to these problems and implements them in a new ranking list: Tranco.

3 Alexa
Based on page visits reported by a user panel and by a tracking script
Most often used in recent research
Very volatile list; changes daily

4 Cisco Umbrella
Based on DNS traffic
Most recently created list
Volatile list
Contains invalid domains and domains not visited through a browser
Contains subdomains

5 Majestic
Based on links back to a domain
Very stable list; sites may remain ranked even after their popularity fades

6 Quantcast
Based on page visits reported by a tracking script and on ISP traffic
United States traffic only
Very stable list, due to infrequent updates of estimates
Some domains are hidden
Domains may have an equal rank

7 Problems
This study identified five main problems with the current ranking lists:
Similarity
Stability
Representativeness
Responsiveness
Benignness
Their susceptibility to manipulation is also a big concern.

8 Problem – Similarity
Combined, the four lists contain 2.82 million sites, but they agree on only 70,000 of them.
For research, this means completely different results could be obtained depending on which list is used.
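
As a rough illustration of what "agree on" means here, the sketch below counts the domains present on every list against the domains present on at least one list. The function name and the toy lists are made up for illustration; this is not the paper's measurement code. With the slide's figures, 70,000 out of 2.82 million is roughly 2.5%.

```python
def list_agreement(lists):
    """Return (common, total): domains on every list vs. on at least one list."""
    if not lists:
        raise ValueError("need at least one list")
    sets = [set(domains) for domains in lists]
    common = set.intersection(*sets)  # present on all lists
    total = set.union(*sets)          # present on any list
    return len(common), len(total)


# Toy data, not real rankings.
alexa = ["example.com", "example.org", "example.net"]
umbrella = ["example.com", "cdn.example.io", "example.org"]
majestic = ["example.com", "example.edu"]

common, total = list_agreement([alexa, umbrella, majestic])
print(f"{common} of {total} domains appear on every list ({common / total:.1%})")
```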

9 Problem – Stability
A very stable list provides a reusable set of domains, but sudden increases or decreases in popularity may not be represented.
A volatile list may introduce large variations in the results of longitudinal studies.
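
One simple way to put a number on stability is the share of today's list that was not on yesterday's list. The sketch below assumes that definition; the paper's own metric may differ, and the function name and sample data are hypothetical.

```python
def daily_change(yesterday, today):
    """Fraction of today's list that was absent from yesterday's list."""
    if not today:
        raise ValueError("today's list is empty")
    previous = set(yesterday)
    new_entries = [d for d in today if d not in previous]
    return len(new_entries) / len(today)


# Toy data: one domain out of four is replaced overnight.
day1 = ["a.example", "b.example", "c.example", "d.example"]
day2 = ["a.example", "c.example", "d.example", "e.example"]
print(f"daily change: {daily_change(day1, day2):.0%}")
```

A value near 0 points to a stable list (Majestic, Quantcast); a high value points to a volatile one (Alexa, Umbrella).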

10 Problem – Representativeness
The top 10 top-level domains make up over 73% of each list.
The list providers do not have access to complete Internet usage data.
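
The TLD concentration figure can be approximated with a short count, sketched below. It assumes the TLD is simply the last dot-separated label, which is a simplification (a fuller analysis might use the Public Suffix List), and the function name and sample domains are hypothetical.

```python
from collections import Counter


def top_tld_share(domains, k=10):
    """Share of domains whose TLD is among the k most common TLDs."""
    if not domains:
        raise ValueError("domain list is empty")
    # Naive TLD extraction: take the last label of each domain name.
    tlds = Counter(d.rsplit(".", 1)[-1].lower() for d in domains)
    covered = sum(count for _, count in tlds.most_common(k))
    return covered / len(domains)


# Toy data, not a real ranking.
sample = ["a.com", "b.com", "c.org", "d.net"]
print(f"top-1 TLD share: {top_tld_share(sample, k=1):.0%}")
```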

11 Problem – Responsiveness
Some sites on the lists are not reachable.
Of the reachable websites, some are so small that they are likely empty or contain no useful information, despite apparently being visited regularly by users.

12 Problem – Benignness
All of the ranking lists investigated contained some malicious sites, with Majestic having the most at 0.22%.
This is particularly concerning because popular sites are often assumed to be trustworthy. For example, Quad9’s DNS-based blocking service whitelists all domains on Majestic’s list.

13 Why does this matter?
133 top-tier studies based their experiments and conclusions on these rankings.
The data collection and processing methods these lists use to determine their rankings are unknown, so it is hard to know how they might affect results.
In general, the studies did not question the validity of these rankings.
This research is used to inform decisions made by governments and companies on security issues.

14 Manipulation
An attacker can push malicious sites into the rankings to get them included in whitelists.
An attacker can also manipulate the ranks of other sites to push a domain out of the rankings, hiding it from the lists.
The authors of this paper successfully manipulated the rankings of all four lists they investigated.

15 Solution
To deter this manipulation, ranking lists need to employ tactics that increase the effort and resources required:
Alexa: make it harder to create fake profiles
Filter data based on IP address (particularly useful for the Umbrella list)
Check that the domains on the list are real and work
However, there is no way to make the ranking providers adopt these recommendations.

16 Tranco
Combines the existing lists
Aimed at resolving the deficiencies of those lists
Highly configurable
Keeps a history of all lists
Full transparency of data processing methods

17 Tranco – Configuration
Which lists to combine
The number of days to get data for
The time period to select the list from
The combination method: Borda or Dowdall
The size of the input lists and the output list
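
To make the two combination methods concrete, here is a minimal sketch of the textbook Borda and Dowdall rules applied to ranked lists. It assumes that, for a list of length N, rank r earns N - r + 1 points under Borda and 1/r points under Dowdall, and that a domain missing from a list earns 0 points from it. The function name is hypothetical and this is not Tranco's actual implementation.

```python
from collections import defaultdict
from fractions import Fraction


def combine_rankings(rankings, method="dowdall"):
    """Merge ranked lists (best first) into one ranking by summed score."""
    if method not in ("borda", "dowdall"):
        raise ValueError(f"unknown method: {method}")
    scores = defaultdict(Fraction)
    for ranking in rankings:
        n = len(ranking)
        for rank, domain in enumerate(ranking, start=1):
            if method == "borda":
                scores[domain] += n - rank + 1      # N, N-1, ..., 1 points
            else:
                scores[domain] += Fraction(1, rank)  # 1, 1/2, 1/3, ... points
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy data: three input lists of three domains each.
lists = [
    ["a.example", "b.example", "c.example"],
    ["b.example", "a.example", "d.example"],
    ["b.example", "c.example", "a.example"],
]
print(combine_rankings(lists, "borda"))
print(combine_rankings(lists, "dowdall"))
```

The same function could also be given one list per provider per day, which is one way to picture combining data over a configurable number of days.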

18 Tranco – Filtration
Domains can be filtered by:
How often they appear
Whether they appear on only some of the lists
Whether they are flagged as dangerous by Google Safe Browsing
Whether they are pay-level domains or subdomains
Whether they are in Google’s Chrome User Experience Report
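
Two of these filters can be sketched in a few lines: keeping only domains that appear on a minimum number of source lists, and dropping domains found in a blocklist. The blocklist here is a plain set standing in for a service such as Google Safe Browsing; no real API is called, and the function and parameter names are hypothetical.

```python
def filter_domains(ranked, source_lists, min_lists=2, blocklist=frozenset()):
    """Keep ranked domains seen on at least min_lists sources and not blocked."""
    source_sets = [set(s) for s in source_lists]
    kept = []
    for domain in ranked:
        appearances = sum(domain in s for s in source_sets)
        if appearances >= min_lists and domain not in blocklist:
            kept.append(domain)  # order of the input ranking is preserved
    return kept


# Toy data, not real rankings.
ranked = ["a.example", "b.example", "c.example", "d.example"]
sources = [
    ["a.example", "b.example", "c.example"],
    ["a.example", "c.example"],
    ["a.example", "d.example"],
]
print(filter_domains(ranked, sources, min_lists=2, blocklist={"c.example"}))
```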

19 Tranco – How is it Better?
Similarity: The similarity between each existing list and the combined list is around 35-45%, so no list has a disproportionate influence.
Stability: When all lists are combined and averaged over 30 days, the daily change is only 0.6%.
Reproducibility: A history of all lists made with Tranco is kept, along with their configuration details.
Manipulation: Tranco is still susceptible to manipulation, but this requires four times the effort, as all four lists would have to be manipulated to achieve a high ranking on the Tranco list.

20 Criticisms
The authors don’t discuss the limitations of Tranco.
They don’t detail how much representativeness, responsiveness and benignness are improved.
No long-term manipulation experiments: without them, it is hard to quantify how much effort it would take to manipulate Tranco in the long term.
No long-term evaluation of Tranco: it is hard to know how it will perform, and how resilient to manipulation it will be, in the long term.

21 Questions?

