TRANCO: A Research-Oriented Top Sites Ranking Hardened Against Manipulation By Prudhvi raju G id: 401689301.

1 TRANCO: A Research-Oriented Top Sites Ranking Hardened Against Manipulation
By Prudhvi raju G id:

2 OBJECTIVE
Provide a reliable and reproducible ranking list for research.
Tranco: A Research-Oriented Top Sites Ranking Hardened Against Manipulation. A stable, reproducible ranking list for research that stays similar to the existing lists and resists manipulation.

3 Background
Multiple commercial providers publish rankings of popular domains that they compose using various methods. Around 133 top-tier studies over the past 4 years based their experiments and conclusions on the data from these providers. The providers included in the research are Alexa, Majestic, Cisco Umbrella, and Quantcast.

4 Background
Alexa: Alexa is an American web traffic analysis company and a subsidiary of Amazon. The ranks calculated by Alexa are based on traffic data from a global data panel.
Cisco Umbrella: The ranks calculated by Cisco Umbrella are based on DNS traffic to its two DNS resolvers (marketed as OpenDNS), claimed to amount to over 100 billion daily requests from 65 million users.

5 Background
Majestic: Majestic publishes the daily updated 'Majestic Million' list consisting of one million websites since October. The ranks calculated by Majestic are based on backlinks to websites. Sites are ranked on the number of class C (IPv4 /24) subnets that refer to the site at least once.
Quantcast: The Quantcast list covers sites where Quantcast directly measures traffic through a tracking script, as well as sites where Quantcast estimates traffic based on data from 'ISPs and toolbar providers', including the number of users. The list also includes 'hidden profiles', where sites are ranked but the domain is hidden.
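The Majestic criterion described above (rank by the number of distinct IPv4 /24 subnets that link to a site) can be illustrated with a minimal sketch. It is not Majestic's actual implementation: the function name `rank_by_referring_subnets`, the input shape (pairs of referring IP and target domain), and the example addresses are all assumptions made for illustration.

```python
import ipaddress
from collections import defaultdict


def rank_by_referring_subnets(backlinks):
    """Rank domains by how many distinct /24 subnets link to them.

    backlinks: iterable of (referring_ip, target_domain) pairs
    (hypothetical input format, assumed for this sketch).
    """
    subnets = defaultdict(set)
    for referring_ip, domain in backlinks:
        # Collapse the referring address to its enclosing /24 network.
        network = ipaddress.ip_network(f"{referring_ip}/24", strict=False)
        subnets[domain].add(network)
    # More distinct subnets means a better rank; ties break alphabetically.
    return sorted(subnets, key=lambda d: (-len(subnets[d]), d))


backlinks = [
    ("192.0.2.1", "a.example"),
    ("192.0.2.200", "a.example"),    # same /24 as the line above
    ("198.51.100.5", "b.example"),
    ("203.0.113.9", "b.example"),    # a second, different /24
]
ranking = rank_by_referring_subnets(backlinks)
```

In this example `a.example` has two backlinks but only one referring subnet, so `b.example`, with two subnets, ranks first. Counting subnets rather than links is what makes many links from a single host ineffective.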

6 Background
Ideally, domain rankings would perfectly reflect the popularity of websites, free from any biases. The properties to consider when using these lists for security research are: similarity, stability, representativeness, responsiveness, and benignness.

7 Problem - Classification of list usage
Security studies often rely on lists of top-ranked sites. Around 133 research papers use these lists for various purposes in their research. These papers are classified according to four purposes of list usage:
Prevalence: the proportion of sites affected by an issue.
Evaluation: the list serves to test an attack or defense.
Whitelist: a source of benign websites.
Ranking: the exact ranks of sites are mentioned.

8 Problem - Influence on security studies
Most studies do not state when the list was downloaded, which hampers their reproducibility.
Influence on security studies:
Incentives: because these studies inform policy makers and governments, malicious users have an incentive to influence them.
Case study: a long tail of fingerprinting scripts is largely unblocked by current privacy tools.

9 Problem - Large-scale manipulation
Manipulating rankings becomes a prime vector for influencing security research. These manipulations can be carried out on the lists with minimal effort and at low cost. A few such manipulation techniques were applied to each list.

10 Alexa - manipulation
Alexa ranks domains based on traffic data from two sources:
Traffic Rank: a browser extension that reports all page visits.
Certify: an analytics service that uses a tracking script to count all visits on subscribing websites.
Extension: Installed the extension in a Chrome browser instance to include a domain in the ranking list. Achieved a rank as high as with 12 requests.
Certify: Requires a subscription to use the service. Achieved up to a rank of within 52 days.

11 Cisco Umbrella - manipulation
Ranks websites on the number of unique client IPs issuing DNS requests.
Cloud providers: use the pool of IP addresses available for service instances (AWS). Achieved rank of
Alternatives: Tor, IP spoofing.
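To make the unique-client criterion concrete, here is a minimal sketch that ranks domains by the number of distinct client IPs that queried them. It is not Cisco Umbrella's actual algorithm: the function name `rank_by_unique_clients`, the log format, and the example addresses are assumptions for illustration only.

```python
from collections import defaultdict


def rank_by_unique_clients(dns_log):
    """Rank domains by the number of distinct client IPs querying them.

    dns_log: iterable of (client_ip, queried_domain) pairs
    (hypothetical log format, assumed for this sketch).
    """
    clients = defaultdict(set)
    for client_ip, domain in dns_log:
        clients[domain].add(client_ip)
    # More unique clients means a better rank; ties break alphabetically.
    return sorted(clients, key=lambda d: (-len(clients[d]), d))


dns_log = [
    ("198.51.100.1", "busy.example"),
    ("198.51.100.1", "busy.example"),   # repeat from the same client
    ("198.51.100.1", "busy.example"),
    ("203.0.113.10", "spread.example"),
    ("203.0.113.11", "spread.example"),  # a second distinct client
]
umbrella_like_ranking = rank_by_unique_clients(dns_log)
```

Under this criterion, repeated queries from one address count once, which is why an attacker benefits from a large pool of source addresses (for example cloud instances) rather than from a high request volume.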

12 Majestic - manipulation
Ranks are based on the number of subnets hosting a website that links to the ranked domain.
Backlinks: can be obtained to reach a higher rank position, as in SEO.
Reflected URLs: pages that turn a URL supplied in a GET parameter into a link.
Alternatives: hosting own sites, pingbacks.

13 Quantcast - manipulation
Ranks are based on traffic data obtained from a tracking script that webmasters install on their websites.
Quantified: Quantcast mainly obtains traffic data through this tracking script.
Alternatives: possibly interacting with the ISPs and toolbar providers that supply data to Quantcast.

14 Solution
Defend existing rankings against manipulation.
An improved, efficient, stable ranking list: TRANCO, a ranking list suitable for research and hardened against manipulation.
Combination options: the Borda count and the Dowdall rule.
Filters can be added to create a list that represents a certain desired subset of popular domains. Multiple filter options are provided, such as status code, domain length, and content length.
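As an illustration of the two combination options named above, the following minimal sketch scores domains across several lists. It assumes the usual definitions: under the Borda count, a domain at position p in a list of N entries receives N - p + 1 points, and under the Dowdall rule it receives 1/p points; a domain missing from a list receives nothing from that list. The function name `combine_rankings` and the example domains are hypothetical and not taken from the presentation.

```python
from collections import defaultdict


def combine_rankings(rankings, rule="dowdall"):
    """Combine several ranked lists into one using Borda or Dowdall scoring.

    rankings: list of lists, each ordered from most to least popular.
    rule: "borda" (N - p + 1 points) or "dowdall" (1/p points).
    """
    if rule not in ("borda", "dowdall"):
        raise ValueError(f"unknown rule: {rule!r}")
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, domain in enumerate(ranking, start=1):
            if rule == "borda":
                scores[domain] += n - position + 1
            else:
                scores[domain] += 1 / position
    # Highest total score first; ties break alphabetically.
    return sorted(scores, key=lambda d: (-scores[d], d))


lists = [
    ["a.example", "b.example", "c.example"],
    ["b.example", "a.example", "d.example"],
    ["b.example", "c.example", "a.example"],
]
borda = combine_rankings(lists, rule="borda")
dowdall = combine_rankings(lists, rule="dowdall")
```

In this example `d.example` appears in only one list and ends up last under both rules, which reflects the idea that a domain must rank well in several source lists to rank well in the combined one.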

15 Solution
Malicious domains: remove domains on the Google Safe Browsing list from the generated list.
Evaluation: validate the characteristics relevant to security studies.
Similarity: the final result is unbiased and is almost equally similar to each of the source lists.
Stability: averaging the rankings over 30 days provides a more stable list.
Reproducibility: a citation template and short link are generated for every list.
Manipulation: the manipulation effort needs to be quadrupled to insert a website into the list.
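The stability and malicious-domain steps can be sketched together. This is only an illustration under stated assumptions: it averages a Dowdall-style score (1/position) over a window of daily lists and then drops any domain in a caller-supplied set of flagged domains. The function name `stable_ranking`, the scoring choice, and the example data are hypothetical; no Safe Browsing API is called here.

```python
from collections import defaultdict


def stable_ranking(daily_rankings, flagged=()):
    """Average per-day scores over a window, then drop flagged domains.

    daily_rankings: one ranked list per day (most popular first).
    flagged: domains to exclude, e.g. taken from a blocklist
             (hypothetical input, supplied by the caller).
    """
    days = len(daily_rankings)
    if days == 0:
        return []
    scores = defaultdict(float)
    for ranking in daily_rankings:
        for position, domain in enumerate(ranking, start=1):
            # A domain absent on a given day contributes 0 for that day.
            scores[domain] += (1 / position) / days
    blocked = set(flagged)
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return [d for d in ordered if d not in blocked]


daily = [
    ["a.example", "b.example", "c.example"],
    ["a.example", "b.example", "c.example"],
    ["spike.example", "a.example", "b.example"],  # one-day outlier at rank 1
]
averaged = stable_ranking(daily)
cleaned = stable_ranking(daily, flagged={"b.example"})
```

The domain that tops the list on a single day does not top the averaged list, which is the intuition behind using a multi-day window; a real window would span 30 days rather than the 3 shown here.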

16 Consideration
The generated list is a byproduct of the existing lists.
It limits the effect of manipulation on the final list: the effort needs to be quadrupled to obtain a ranking in the new list.
All of the underlying rankings need to be manipulated proportionally for a domain to appear in the combined list.
The responsiveness of the lists is not addressed.

17 Criticism
It is a byproduct of existing lists.
An automated machine learning algorithm using the Amber loom domain analyzer could analyze and scan new websites.
Grouping websites by category and keeping them under observation for a period, for various validations, before entering them into the candidate list can help keep malicious domains out of the list.
Add-on functionality such as Valbot.com could be included; it provides domain name valuations and reports globally on site value, traffic, PageRank, malware, whois data, SEO, and social media presence.
