An Empirical Reexamination of Global DNS Behavior Hongyu Gao Northwestern Phil Porras SRI Yan Chen Northwestern Shalini Ghosh SRI Jian Jiang Tsinghua Haixin.

Slides:



Advertisements
Similar presentations
1 Senn, Information Technology, 3 rd Edition © 2004 Pearson Prentice Hall James A. Senns Information Technology, 3 rd Edition Chapter 7 Enterprise Databases.
Advertisements

1
& dding ubtracting ractions.
© 2008 Pearson Addison Wesley. All rights reserved Chapter Seven Costs.
Copyright © 2003 Pearson Education, Inc. Slide 1 Computer Systems Organization & Architecture Chapters 8-12 John D. Carpinelli.
Chapter 1 The Study of Body Function Image PowerPoint
1 Copyright © 2013 Elsevier Inc. All rights reserved. Appendix 01.
Effective Change Detection Using Sampling Junghoo John Cho Alexandros Ntoulas UCLA.
Properties Use, share, or modify this drill on mathematic properties. There is too much material for a single class, so you’ll have to select for your.
ARIN Public Policy Meeting
1 OpenFlow + : Extension for OpenFlow and its Implementation Hongyu Hu, Jun Bi, Tao Feng, You Wang, Pingping Lin Tsinghua University
1 Network-Level Spam Detection Nick Feamster Georgia Tech.
We need a common denominator to add these fractions.
Exploring Traversal Strategy for Web Forum Crawling Yida Wang, Jiang-Ming Yang, Wei Lai, Rui Cai, Lei Zhang and Wei-Ying Ma Chinese Academy of Sciences.
1 RA I Sub-Regional Training Seminar on CLIMAT&CLIMAT TEMP Reporting Casablanca, Morocco, 20 – 22 December 2005 Status of observing programmes in RA I.
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Process a Customer Chapter 2. Process a Customer 2-2 Objectives Understand what defines a Customer Learn how to check for an existing Customer Learn how.
CALENDAR.
Multiplying binomials You will have 20 seconds to answer each of the following multiplication problems. If you get hung up, go to the next problem when.
DIVIDING INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.
FACTORING ax2 + bx + c Think “unfoil” Work down, Show all steps.
Year 6 mental test 5 second questions
Year 6 mental test 10 second questions
ZMQS ZMQS
© Tally Solutions Pvt. Ltd. All Rights Reserved Shoper 9 License Management December 09.
INTERNET PROTOCOLS Class 9 CSCI 6433 David C. Roberts Entire contents copyright 2011, David C. Roberts, all rights reserved.
Solve Multi-step Equations
REVIEW: Arthropod ID. 1. Name the subphylum. 2. Name the subphylum. 3. Name the order.
Break Time Remaining 10:00.
PP Test Review Sections 6-1 to 6-6
EU Market Situation for Eggs and Poultry Management Committee 21 June 2012.
1 Undirected Breadth First Search F A BCG DE H 2 F A BCG DE H Queue: A get Undiscovered Fringe Finished Active 0 distance from A visit(A)
IP Multicast Information management 2 Groep T Leuven – Information department 2/14 Agenda •Why IP Multicast ? •Multicast fundamentals •Intradomain.
VOORBLAD.
15. Oktober Oktober Oktober 2012.
Copyright © 2012, Elsevier Inc. All rights Reserved. 1 Chapter 7 Modeling Structure with Blocks.
1 Developing a Predictive Model for Internet Video Quality-of-Experience Athula Balachandran, Vyas Sekar, Aditya Akella, Srinivasan Seshan, Ion Stoica,
Factor P 16 8(8-5ab) 4(d² + 4) 3rs(2r – s) 15cd(1 + 2cd) 8(4a² + 3b²)
Basel-ICU-Journal Challenge18/20/ Basel-ICU-Journal Challenge8/20/2014.
1..
© 2007 Cisco Systems, Inc. All rights reserved.Cisco Public 1 EN0129 PC AND NETWORK TECHNOLOGY I IP ADDRESSING AND SUBNETS Derived From CCNA Network Fundamentals.
© 2012 National Heart Foundation of Australia. Slide 2.
Understanding Generalist Practice, 5e, Kirst-Ashman/Hull
1 Joseph Ghafari Artificial Neural Networks Botnet detection for Stéphane Sénécal, Emmanuel Herbert.
New Look, New Features, Broader Coverage
Before Between After.
25 seconds left…...
Subtraction: Adding UP
: 3 00.
5 minutes.
Januar MDMDFSSMDMDFSSS
We will resume in: 25 Minutes.
© 2007 Cisco Systems, Inc. All rights reserved.Cisco Public 1 Addressing the Network – IPv4 Network Fundamentals – Chapter 6.
©Brooks/Cole, 2001 Chapter 12 Derived Types-- Enumerated, Structure and Union.
Clock will move after 1 minute
Intracellular Compartments and Transport
PSSA Preparation.
VPN AND REMOTE ACCESS Mohammad S. Hasan 1 VPN and Remote Access.
Essential Cell Biology
Select a time to count down from the clock above
CO-AUTHOR RELATIONSHIP PREDICTION IN HETEROGENEOUS BIBLIOGRAPHIC NETWORKS Yizhou Sun, Rick Barber, Manish Gupta, Charu C. Aggarwal, Jiawei Han 1.
People Counting and Human Detection in a Challenging Situation Ya-Li Hou and Grantham K. H. Pang IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART.
Detecting Malicious Flux Service Networks through Passive Analysis of Recursive DNS Traces Roberto Perdisci, Igino Corona, David Dagon, Wenke Lee ACSAC.
Presentation transcript:

An Empirical Reexamination of Global DNS Behavior Hongyu Gao Northwestern Phil Porras SRI Yan Chen Northwestern Shalini Ghosh SRI Jian Jiang Tsinghua Haixin Duan Tsinghua Vinod Yegneswaran SRI

An Empirical Reexamination of Global DNS Behavior Hongyu Gao Northwestern Phil Porras SRI Yan Chen Northwestern Shalini Ghosh SRI Jian Jiang Tsinghua Haixin Duan Tsinghua Vinod Yegneswaran SRI

Motivation Evolution of DNS

Motivation DNS has been the subject of numerous measurement studies 1992: Danzig et al. An analysis of wide-area name server traffic: a study of the Internet Domain System 2001: Brownlee et al. DNS measurements at a root server 2002: Jung et al. DNS performance and effectiveness of caching 2008: Castro et al. A day at the root of the Internet Secure Information Exchange publicly available data source from hundreds of operational resolvers primary driving force: security analysis

Background – DNS Protocol 5 Popular DNS query types: A: domain names -> IPv4 addresses AAAA: domain names -> IPv6 addresses PTR: IP addresses -> domain names …

Background – SIE Dataset 6 Resolver population: Commercial ISPs Universities Public DNS services Geographic location: North America Europe

Background – SIE Dataset 7 Data collection period: 12/09/2012 – 12/22/2012 Data size: 26 Billion DNS queries and responses 2 TB raw data / day # of contributing resolvers: 628

Background – SIE Dataset 8

9

Goals 10 Empirically reexamine findings from prior DNS measurement papers from the vantage point of the SIE datastream Differences due to perspective Differences due to the evolution of DNS use patterns Evaluate feasibility of extracting malicious domain groups from the global DNS data stream

11 Empirical Reexamination Query type distribution Traffic validity TTL distribution Malicious Domain Group Detection Detection using temporal correlation Domain group analysis Conclusions 11 Roadmap

12 Unanswered queries and unsolicited responses

Query Type Distribution 13 Distribution of DNS query types from the local perspective (%) * Dec 2000 (/ 2001) numbers are quoted from [Jung TON02]

Query Type Distribution (resolver to root perspective) 14 Distribution of DNS query types from the root perspective (%) * Oct 2002 numbers are quoted from [Wessels PAM03] ** Mar 2008 numbers are quoted from [Castro SIGCOMM CCR08]

15 Successful / Failed / Unanswered Jung et al. [2000] SIE [2012] Successful64.3%66.9% Negative Answer 11.1%18% Unanswered23.5%15.1%

Traffic Validity 16 Four established types of invalid traffic: The query name has an invalid TLD A-for-A Queries: The query “name” is already an IP address A PTR query for an IP address from private space Non-printable characters in the query name *2001: 20.0% *2002: 19.53% *2008: 22.0% 2012: 53.5% *2001: 12.0% *2002: 7.03% *2008: 2.7% 2012: 0.4% *2001: 7.0% *2002: 1.61% 2012: 0.1% *2002: 1.94% *2008: 0.1% 2012: 3.2% * These numbers are quoted from multiple prior papers. Please check our paper for more details.

TTL Distribution 17  ‘A’ and ‘AAAA’ records have similar TTL  ‘NS’ records have longer TTLs  ‘A’ records TTL becomes shorter  ‘NS’ records from root is highly regular

18 Repeated Queries

Repeated Queries (2) CNAME chain sanitization (~40% of repeated queries) Concurrent overlapping queries Premature retransmissions e.g., Unbound has a short retransmission timer Resolver quirks In certain cases BIND resolves expired NS names twice before replying to client queries Cache evictions

Key findings Demise of A-for-A queries Significant rise in AAAA queries > 50% of queries sent to root servers do not produce a successful answer Invalid TLD queries responsible for 99% of invalid queries sent to root servers TTLs of A records have become much smaller 2001: 20% of A records have TTLs > 1 day 2012: 90% of A records have TTLs 1 day

21 Empirical Reexamination Query type distribution Traffic validity TTL distribution Malicious Domain Group Detection Detection using temporal correlation Domain group analysis Conclusions 21 Roadmap

Malware Domain Group Detection 22 Key intuition: DNS queries are not isolated instances. Detection method: Advantages: Detect malicious domain groups in general (scam, DGA, etc.) Do not need comprehensive labeled training set Anchor Malicious Domain Temporal Correlation Detected Malicious Domain

Challenge 23 Ideally: In reality: Anchor Malicious Domain Malicious Domain 1 Malicious Domain 2 Detected! Anchor Malicious Domain Malicious Domain 1 Malicious Domain 2 Benign Domain 1 Benign Domain 2 Benign Domain 3 DNS caching effect!

Practical Solution 24 A 3-step approach to identify the correlated domain group, given an anchor malicious domain Identify the coarse related domain group using a TF-IDF heuristic Cluster the coarse domain group Refine the domain group according to the clustering result

Related Domain Identification 25 The concept of domain segments : All domain queries in close temporal proximity with an anchor domain query. An anchor domain queried for n times corresponds to n domain segments. Determining whether the candidate domain c is related with the anchor domain s: metric tf : how many times c appears in the domain segments? metric idf : metric tf / |c| Need both metrics to surpass predefined thresholds.

26 Obtaining anchor domains: Record all domains blacklisted on Dec. 16 th from three external blacklists. MalwareDomainBlockList, MalwareDomainList, Phishtank Validating detected domains: Blacklist matching with 5 external blacklists McAfee SiteAdvisor and MyWot IP address comparison 26 Experimental Evaluation DNS DataSizeAnchor Domain # Dec 16, B queries129

27 A good threshold combination: T_freq = 40, T_size = 20 TP = 6890, FP = 258 precision = 96.4%, expansion rate = Detection Accuracy

28 Sample anchor domain pairs deriving highly overlapping groups 28 Domain Group Analysis surprise-mnvq.tksurprise-mnvr.tk vural-electronic.comvfventura.sites.uol.com voyeurpornweb.comvkont.bos.ru

29 Domain Group Analysis pill-erectionmeds.rupillcheap-med.ruonlinerxpillhere.ru medspill-erection.rurxpill-medstore.rumedpillbuy-online.ru A pharmaceutical domain group, size = 295 uggsbootss.comniceuggsforsale.comlouisvuittonwhite.net uggsclassic.orgofficialuggsretails.comnicelouisvuittonbag.com A counterfeit product domain group, size = 17 lq8p.ruol4k.rus3po.ru n5di.rup9ha.run4gf.ru A suspected DGA domain group, size = 71

30 Empirical Reexamination Query type distribution Traffic validity TTL distribution Malicious Domain Group Detection Detection using temporal correlation Domain group analysis Conclusions 30 Roadmap

Conclusions 31 We conducted a comprehensive measurement study with more than 26 billion DNS query-response pairs collected from 600+ global DNS resolvers. While DNS characteristics vary significantly across networks, networks within an org exhibit similar behavior Web servers should take IPv6 support into account Client-side implementations could be more aggressive in suppressing invalid queries We proposed a novel approach that isolates malicious domain groups using temporal correlation in DNS queries, given a few known malicious domains as anchors. 96.4% detection precision, 56 X expansion rate

Acknowledgements 32 Paul Vixie, SIE Contributors, David Dagon, Sonia Fahmy, Yunsheng Cao Thanks!