JOHN P. JOHN FANG YU YINGLIAN XIE MARTÍN ABADI ARVIND KRISHNAMURTHY PRESENTATION BY SAM KLOCK Searching the Searchers with SearchAudit.


1 Searching the Searchers with SearchAudit
John P. John, Fang Yu, Yinglian Xie, Martín Abadi, Arvind Krishnamurthy
Presentation by Sam Klock

2 Motivation
We can find this via a Google search

3 Motivation (cont’d)
Search engines open opportunities for attackers
  - Construct clever queries
  - Find vulnerable sites
  - Plant malware; spam (e.g., MyDoom)
  - Do so stealthily and cheaply
Mitigation strategy: identify malicious queries
  - May be able to deny results to user
  - Identify attackers (probably bots)
  - Interpret strategy, then anticipate and prevent
The question: how to do so

4 Proposed Approach
SearchAudit
  - Framework for identifying malicious queries
Input:
  - Seed set of known malicious queries
  - Search logs
Output:
  - Large set of suspicious queries
  - Regular expressions matching queries
[Diagram labels: Seed set, Search logs, SearchAudit, Expanded set, Regular expressions]
Example queries shown in the diagram:
  inurl:gotoURL.asp?url=
  filetype:asp inurl:"shopdisplayproducts.asp"
  ext:pl inurl:cgi intitle:"FormMail *" -"*Referrer" -"* Denied" -sourceforge -error -cvs -input
  filetype:cgi inurl:tseekdir.cgi
  ...
Example regular expressions shown in the diagram:
  "/includes/joomla\.php" site:\.[a-zA-Z]{2,3}
  "/includes/class_item\.php" site:[^?=#+@;&:]{2,4}
  "php-nuke" site:[^?=#+@;&:]{2,4}
  "modules\.php\?op=modload" site:\.[a-zA-Z0-9]{2,6}
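To make the role of the output regular expressions concrete, here is a minimal sketch of matching log queries against two of the patterns shown on this slide. The sample queries and the `is_suspicious` helper are made up for illustration, and requiring a full match is an assumption; the slides do not say how SearchAudit applies its expressions.

```python
import re

# Two patterns copied from the slide. Everything else here is hypothetical.
PATTERNS = [
    r'"/includes/joomla\.php" site:\.[a-zA-Z]{2,3}',
    r'"modules\.php\?op=modload" site:\.[a-zA-Z0-9]{2,6}',
]
COMPILED = [re.compile(p) for p in PATTERNS]


def is_suspicious(query: str) -> bool:
    """Return True if the whole query matches any known pattern."""
    return any(rx.fullmatch(query) for rx in COMPILED)


# Made-up log queries: two scripted variants and one ordinary search.
sample_log = [
    '"/includes/joomla.php" site:.de',
    '"modules.php?op=modload" site:.info',
    'joomla install guide',
]
flagged = [q for q in sample_log if is_suspicious(q)]
```

Because the `site:` suffix is expressed as a character class, one expression covers every per-domain variant a script might generate, which an exact-match seed query would miss.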

5 Proposed Approach (cont’d)
Needed to implement:
  - Seed set: milw0rm.com
  - Search logs: Microsoft Research, Bing
  - Way to expand seed set into more queries
  - Way to infer regular expressions
Intended benefits:
  - Harvesting lots of information
      - Three months: ~1.2 TB of logs
  - Interpret relationship between queries and attacks
  - Use queries to find potential victims
  - Stop attacks

6 SearchAudit
  - Query identification
  - Query analysis

7 Query Identification: Expansion
Basic idea: bootstrap on seed set
  - Search logs for exact matches to seed queries
  - Record IPs of hosts making seed queries
  - Add other queries from those IPs to set
  - Intuition: a host that makes one malicious query will probably make more
Account for DHCP
[Diagram labels: Seed queries, IP addresses, Queries made by IPs, Log search, Queries made on same day]
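The expansion steps on this slide can be sketched as two passes over the log. This is an illustration under stated assumptions, not the authors' DryadLINQ implementation: the `LogEntry` fields are hypothetical, and restricting expansion to queries from the same IP on the same day is one reading of "Account for DHCP" and the "Queries made on same day" diagram label.

```python
from typing import Iterable, NamedTuple, Set


class LogEntry(NamedTuple):
    # Hypothetical log schema; the real Bing log format is not given.
    day: str    # e.g. "2009-02-01"
    ip: str
    query: str


def expand_seed(seeds: Set[str], log: Iterable[LogEntry]) -> Set[str]:
    """Expand a seed set with other queries from the same IP and day."""
    entries = list(log)
    # Pass 1: (ip, day) pairs that issued an exact seed query.
    suspects = {(e.ip, e.day) for e in entries if e.query in seeds}
    # Pass 2: every query issued by a suspect IP on that same day.
    return {e.query for e in entries if (e.ip, e.day) in suspects}


# Made-up log: one attacker IP, later reassigned, and one ordinary user.
log = [
    LogEntry("2009-02-01", "10.0.0.1", "inurl:gotoURL.asp?url="),
    LogEntry("2009-02-01", "10.0.0.1", "filetype:cgi inurl:tseekdir.cgi"),
    LogEntry("2009-02-02", "10.0.0.1", "weather seattle"),
    LogEntry("2009-02-01", "10.0.0.2", "cheap flights"),
]
expanded = expand_seed({"inurl:gotoURL.asp?url="}, log)
```

Keying on the (IP, day) pair rather than the IP alone keeps the next day's query from `10.0.0.1` out of the expanded set, since the address may have been reassigned to a different host by then.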

8 Query Identification: Regular Expressions
Goals:
  - Account for variation in queries
  - Take advantage of scripting
See paper for generation algorithm
Compute score for generated expressions
  - Lower score: more specific
  - Goal: discard overly general expressions (score > 0.6)
Consolidate to avoid overlap
Avoid proxies, public NAT for performance
Loopback for more queries

9 Query Identification: Results
Data from Bing and milw0rm
  - 500 queries
  - Logs for Feb. 2009, Dec. 2009, Jan. 2010
  - ~2 billion views per month
System implemented on Dryad/DryadLINQ
Initial observations:
  - Using specificity scores < 0.6 seems to be effective
      - Based on cookie heuristic
  - Proxy elimination does not limit results

10 Query Identification: Results (cont’d)
Query expansion:
  - 122 of 500 queries matched in logs: 174 unique IPs
  - Expanded to 800 unique queries, 264 IPs
  - Regular expressions matched 3,560 queries, 1,001 IPs
Incomplete seeds
  - Tried with subsets of original set
  - Coverage still good

11 Query Identification: Results (cont’d)
Loopback:
  - Multiple loopbacks got more results
  - One iteration is good enough
Overall statistics
  - Tens of thousands of IPs each month
  - Hundreds of thousands of unique queries each month
  - Dec. 2009: a set of unusual attacker IPs causes a spike

12 Query Identification: Verification
Want to show queries are malicious
  - Sometimes easy: 73% of queries associated with security/hacker sites
  - What about others? No ground truth exists
So: look for bot-like features
  - Individual level (one IP)
  - Group level (multiple IPs)
Individual bots
  - New cookie
  - Whether a link was clicked
Groups of bots
  - Data often fixed by botnets
      - User agent string
      - Metadata for requests
  - Tendencies dictated by scripts
      - Pages viewed per query
      - Time between queries
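As a rough illustration of the individual-level features named on this slide, the sketch below computes, per IP, the fraction of queries that arrived with a new cookie and the fraction that led to no click. The `QueryRecord` schema and the feature names are hypothetical, and the slides give no thresholds for calling an IP a bot, so none are applied here.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class QueryRecord(NamedTuple):
    # Hypothetical per-query fields derived from a search log.
    ip: str
    new_cookie: bool  # request carried no previously seen cookie
    clicked: bool     # at least one result link was clicked


def per_ip_features(records: Iterable[QueryRecord]) -> Dict[str, Dict[str, float]]:
    """Return new-cookie and no-click rates for each IP."""
    counts = defaultdict(lambda: [0, 0, 0])  # total, new-cookie, no-click
    for r in records:
        c = counts[r.ip]
        c[0] += 1
        c[1] += r.new_cookie
        c[2] += not r.clicked
    return {
        ip: {"new_cookie_rate": new / total, "no_click_rate": noclick / total}
        for ip, (total, new, noclick) in counts.items()
    }


# Made-up records: a script-like host and an ordinary user.
records = [
    QueryRecord("10.0.0.1", True, False),
    QueryRecord("10.0.0.1", True, False),
    QueryRecord("10.0.0.1", True, False),
    QueryRecord("10.0.0.1", True, False),
    QueryRecord("10.0.0.2", True, True),
    QueryRecord("10.0.0.2", False, True),
]
features = per_ip_features(records)
```

A host that never stores cookies and never clicks a result scores 1.0 on both rates, which is the kind of contrast with ordinary users that the verification step looks for.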

13 Query Identification: Verification (cont’d)
Host behavior varies substantially between normal queries and suspicious queries

14 Observations on Stage One
Regular expressions can become obsolete
  - Just need fresh logs and a new seed to get new ones
Attacker awareness of technique yields adaptation
  - Example: mix in normal user queries
      - Goal: trick SearchAudit into identifying the attacker as a proxy
      - Hard to do: needs to be appropriate to time and place
      - Anyway: proxy elimination is optimization only
  - Injecting randomness also possible, but makes querying less productive
  - Could obviate cookie heuristic, but it is replaceable
All attackers need to be careful to succeed

15 Query Analysis

16
42,000 IPs gave suspicious queries globally
  - U.S., Russia, China contribute almost 50%
  - 10% of IPs gave 90% of queries
Found 200 regular expressions
Reveal three kinds of attack-related queries:
  - Vulnerable web sites
  - Forum spamming
  - Phishing on Windows Live Messenger

17 Queries for Vulnerable Websites
Queries look for exploitable server vulnerabilities
  - GET variables embedded in URL (for SQL injection)
  - Server software with known vulnerabilities (e.g., status pages)
SearchAudit as a defense:
  - Pull suspicious queries for vulnerabilities
  - Run queries; gather results
  - Inspect results for vulnerabilities
  - Notify sites of vulnerabilities
Example query:
  inurl:index.php?content=X
Example URL:
  http://www.example.com/index.php?content=X'%20OR%20'1'%20OR%20'1=1'

18 Queries for Vulnerable Websites (cont’d)
With identified queries:
  - Sampled 5,000 queries
  - Obtained 80,490 URLs from 39,475 sites
Compared to malware/phishing lists:
  - 3-4% on anti-phishing lists
  - 1.5% on anti-malware lists
SQL injection vulnerability:
  - Add a single-quote to variable in URL
  - Look for SQL error
  - 12% of examined URLs showed an error
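The single-quote check described on this slide can be sketched in two small pieces: one that builds the probe URL and one that inspects a response body. Both function names and the list of error markers are assumptions made for illustration; the slides do not say which error strings were checked. The sketch performs no network requests, so fetching the page is left to the caller.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical substrings that often appear in database error pages.
SQL_ERROR_MARKERS = (
    "you have an error in your sql syntax",
    "unclosed quotation mark",
    "sql syntax",
)


def add_quote_to_params(url: str) -> str:
    """Append a single quote to every query-string value in the URL."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    probed = [(key, value + "'") for key, value in pairs]
    # urlencode percent-encodes the quote as %27.
    return urlunsplit(parts._replace(query=urlencode(probed)))


def looks_like_sql_error(body: str) -> bool:
    """Return True if the response body contains a known error marker."""
    lowered = body.lower()
    return any(marker in lowered for marker in SQL_ERROR_MARKERS)


probe = add_quote_to_params("http://www.example.com/index.php?content=X")
```

A page that answers the probe URL with a database error suggests the variable reaches a SQL statement unescaped, which is the signal counted in the 12% figure above.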

19 Queries for Forum Spamming
Query motivation:
  - Find scriptable forums
  - Good for spam, PageRank
Found 46 applicable regular expressions
Most IPs show transient behavior: probably bots
  - All regular expression groups show at least one group similarity feature
IPs got less aggressive over time: more stealthy

20 Queries for Forum Spamming (cont’d)
Validation
  - Project Honey Pot
      - Dynamically generate e-mail address for each visiting IP
      - E-mail received: must be spam
  - 12% of all IPs listed (vs. 0.5% for normal IPs)
Applications
  - Use queries to find and clean targeted pages
  - Deny results to malicious queries

21 Phishing via Windows Live Messenger
Queries triggered by normal users
  - Victim receives message from a contact
  - Follow link for party photos
  - Taken to fake WLM login
  - After giving credentials, redirected to Bing search for “party”
Bing search to avoid costs of hosting

22 Phishing via WLM (cont’d)
Detect via query referral field (source page)
  - Found two regular expressions for referrals
  - Both expressions: victim username embedded in URL
Over 180 phishing domains for 12 IPs detected
Compromised accounts show different login behaviors

23 Conclusion
Presented framework for finding suspicious queries
  - Input: search logs, small set of seed queries
  - Output: regular expressions, millions of suspicious queries
Analyzed suspicious queries
  - Identified possible attacks
  - Suggested means of prevention
Generally: attempted to demonstrate relationship between suspicious queries and the possibility of attack

