Applications of Machine Learning in Cisco Web Security
Richard Wheeldon PhD BSc
© 2011 Cisco Systems, Inc. All rights reserved.

2 Cisco Web Security
Cisco, Ironport and ScanSafe. Request-time filtering: categorization and classification; reputation. Response-time filtering: malware types and attack vectors; malware detection; dynamic classification. Other challenges.

3 The Ubiquitous Speaker Slide
Richard Wheeldon: UCL graduate in 1999; PhD from Birkbeck in 2003; joined Cisco in December. Acknowledgements: Steve Poulson, Bryan Feeney.

4 Cisco, Ironport and ScanSafe
Cisco: the world's leading network company. Ironport: leader in anti-spam; provides Web Security Appliances. ScanSafe: world leader in Security as a Service; scans 1.8 billion web requests a day and blocks 32 million of them.

5 We're local

6 Previous MSc projects
Tree Kernels for CFG similarity (Guangyan Song, 2010). Fast computation of the Kernel of a Tree and applications to Semi-Supervised Learning (Malcolm Reynolds, 2009). Comparing N-gram features for web page classification (Noureen Tejani, 2007).

7 We're hiring
Positions: software developers; QA, operations, research. Locations: ScanSafe UK, Bedfont Lakes, Reading, Staines, Edinburgh, Galway, EMEA, US, worldwide. Graduate recruitment.

8 ScanSafe's SaaS
1. Availability: the time our service is available to scan traffic (…% guaranteed availability).
2. Latency: additional load time attributable to the service, evaluated by third-party analysis.
3. False positives: pages that were blocked but should not have been.
4. False negatives: pages that were not blocked but should have been.
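As a rough illustration of how the false positive and false negative measures above might be computed from a manually reviewed sample of filtering decisions, here is a minimal Python sketch. The `Decision` record and its fields are hypothetical, and it assumes a ground-truth "should block" label is available for each URL.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    # Hypothetical record of one filtering decision plus its reviewed label.
    url: str
    blocked: bool        # what the service did
    should_block: bool   # ground truth from manual review (assumed available)

def error_rates(decisions):
    """Return (false_positive_rate, false_negative_rate) over a labelled sample."""
    negatives = [d for d in decisions if not d.should_block]
    positives = [d for d in decisions if d.should_block]
    # False positive: blocked although it should not have been.
    fp = sum(d.blocked for d in negatives)
    # False negative: not blocked although it should have been.
    fn = sum(not d.blocked for d in positives)
    fpr = fp / len(negatives) if negatives else 0.0
    fnr = fn / len(positives) if positives else 0.0
    return fpr, fnr

sample = [
    Decision("news.example", blocked=False, should_block=False),
    Decision("shop.example", blocked=True, should_block=False),   # false positive
    Decision("mal.example", blocked=True, should_block=True),
    Decision("phish.example", blocked=False, should_block=True),  # false negative
]
print(error_rates(sample))  # (0.5, 0.5)
```

Rates are normalised separately over good and bad pages because, as the next slides note, most web traffic is good: a raw error count would be dominated by the benign majority.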

9 Risks of Unfiltered Content
Software threats: malware, phishing, botnets. Business threats: productivity loss, bandwidth congestion, legal liability, data leaks.

10 The Web vs. Email
Web                                   | Email
Most web traffic is good              | Most email is bad
Easy to find safe sites               | Easy to get spam
Harder to get dangerous URLs          | Harder to get examples of good mail
Blocking web sites is visible         | Blocking is invisible
Performance gain from white-listing   | Performance gain from blocking
Very real-time (<2s)                  | Not real-time (…

11 Request-time filtering
Motivation: quicker blocks save bandwidth and processing time, and once the request is made, the damage may already be done. Techniques: databases, reputation, rules, trained systems.

12 Category-based filtering
Responsible for most blocks. High-risk and high-traffic sites. Manual categorizers: 10 million URLs, 97% of traffic, 2 million porn sites.
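At request time, category-based filtering amounts to a database lookup on the URL. A minimal sketch, assuming a hypothetical in-memory table keyed by domain and domain-plus-path prefix (a production database is far larger): the most specific key wins, and parent domains are tried as fallbacks.

```python
from urllib.parse import urlsplit

# Hypothetical category table; entries and categories are illustrative only.
CATEGORY_DB = {
    "example-news.com": "News",
    "example-portal.com": "Portals",
    "example-portal.com/games": "Games",
}

def candidate_keys(url):
    """Yield lookup keys from most to least specific."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    segments = [s for s in parts.path.split("/") if s]
    labels = host.split(".")
    # Try each parent domain (www.example.com, then example.com), never the bare TLD.
    for i in range(len(labels) - 1):
        domain = ".".join(labels[i:])
        # Longest path prefix first, then the domain on its own.
        for n in range(len(segments), 0, -1):
            yield domain + "/" + "/".join(segments[:n])
        yield domain

def categorize(url, db=CATEGORY_DB):
    for key in candidate_keys(url):
        if key in db:
            return db[key]
    return "Uncategorized"

print(categorize("http://www.example-portal.com/games/chess"))  # Games
print(categorize("example-portal.com/mail"))                    # Portals
```

Falling back to parent domains lets one manual categorization cover a whole site, which is one way a few million entries can cover most traffic.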

13 Web Reputation
Feeds: phishing sites, malware sites. Heuristics: in spam but not in ham; age of domain registration; high traffic (e.g. Alexa 1000); scanned but never blocked.
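The feeds and heuristics above could be folded into a single reputation score. The sketch below is a hypothetical weighted sum with made-up weights, thresholds and a [-10, 10] range; the slide gives no formula, so this only illustrates how such evidence might be combined.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DomainEvidence:
    # All fields are hypothetical inputs for illustration.
    on_phishing_feed: bool = False
    on_malware_feed: bool = False
    spam_hits: int = 0                  # times seen in spam
    ham_hits: int = 0                   # times seen in legitimate mail
    domain_age_days: int = 0            # days since domain registration
    traffic_rank: Optional[int] = None  # e.g. an Alexa-style rank; None if unranked
    scans: int = 0                      # times content from the domain was scanned
    blocks: int = 0                     # times it was blocked

def reputation(e: DomainEvidence) -> float:
    """Hypothetical score in [-10, 10]: negative is suspicious, positive is trusted."""
    score = 0.0
    if e.on_phishing_feed or e.on_malware_feed:
        score -= 8.0   # feed listings are strong evidence
    if e.spam_hits > 0 and e.ham_hits == 0:
        score -= 3.0   # "in spam but not in ham"
    if e.domain_age_days < 30:
        score -= 2.0   # newly registered domain (threshold assumed)
    if e.traffic_rank is not None and e.traffic_rank <= 1000:
        score += 4.0   # high traffic, e.g. in the Alexa top 1000
    if e.scans >= 1000 and e.blocks == 0:
        score += 3.0   # "scanned but never blocked" (threshold assumed)
    return max(-10.0, min(10.0, score))

print(reputation(DomainEvidence(spam_hits=5, domain_age_days=3)))  # -5.0
```

In practice the weights would be learned from labelled domains rather than set by hand.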

14 Web Reputation in the WSA

16 Keyword-based URL filtering
Keyword rules: Fitness -> Health; Basketball -> Sport; Pizzeria -> Food; Restaurant -> Food; Whore -> Porn. Strange URLs: whorepresents.com, therapistfinder.com, speedofart.com, expertsexchange.com, penisland.com, powergenitalia.it.
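The keyword rules above can be sketched as naive substring matching, which also shows why the "strange URLs" cause trouble. The rule table is taken from the slide; treating every rule as a plain substring test is an assumption about the simplest possible implementation.

```python
# Keyword -> category rules from the slide; a real rule set would be much larger.
KEYWORD_RULES = {
    "fitness": "Health",
    "basketball": "Sport",
    "pizzeria": "Food",
    "restaurant": "Food",
    "whore": "Porn",
}

def keyword_categories(url):
    """Return the set of categories whose keyword appears anywhere in the URL."""
    text = url.lower()
    return {category for keyword, category in KEYWORD_RULES.items() if keyword in text}

print(keyword_categories("bobs-pizzeria.com"))  # {'Food'}
print(keyword_categories("whorepresents.com"))  # {'Porn'}: "who represents" misread
```

The same failure applies to the other examples: therapistfinder.com, expertsexchange.com and penisland.com embed "rapist", "sex" and "penis", which similar rules would also match. Splitting the hostname into words before applying rules avoids this.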

17 Recognizing Porn URLs
An example of the segmentation problem, comparing candidate splits of penisland:
P('peni') × P('sland')
P('penis') × P('land')
P('pen') × P('island')
This extends to classification:
P('penis') × P('land') × P(porn|'penis') × P(porn|'land')
P('pen') × P('island') × P(not_porn|'pen') × P(not_porn|'island')
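The segmentation half of this slide can be sketched as dynamic programming over a unigram language model: choose the split that maximises the product of word probabilities, computed as a sum of logs. The word counts below are hypothetical stand-ins for corpus statistics, and the unknown-word penalty is an assumption. The classification extension would add log P(class|word) terms to the same score.

```python
import math
from functools import lru_cache

# Hypothetical unigram counts; a real model would use a large web corpus.
COUNTS = {"pen": 500, "island": 400, "penis": 50, "land": 600, "expert": 300,
          "experts": 200, "exchange": 300, "sex": 100, "change": 400}
TOTAL = sum(COUNTS.values())
MAX_WORD_LEN = 20

def word_logprob(word):
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    # Unknown strings get a penalty that grows with length (assumed smoothing),
    # so splits into known words are preferred.
    return math.log(10.0 / (TOTAL * 10 ** len(word)))

def segment(text):
    """Most probable split of text into words under the unigram model."""
    @lru_cache(maxsize=None)
    def best(i):
        # Returns (log probability, words) for the best segmentation of text[i:].
        if i == len(text):
            return 0.0, ()
        candidates = []
        for j in range(i + 1, min(len(text), i + MAX_WORD_LEN) + 1):
            rest_lp, rest_words = best(j)
            candidates.append((word_logprob(text[i:j]) + rest_lp,
                               (text[i:j],) + rest_words))
        return max(candidates)
    return list(best(0)[1])

print(segment("penisland"))        # ['pen', 'island']
print(segment("expertsexchange"))  # ['experts', 'exchange']
```

With these counts, P('pen') × P('island') beats P('penis') × P('land'); the outcome depends entirely on the corpus statistics, which is why the slide moves on to include class probabilities.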

18 Phishing and Malware Examples
Phishing examples. Malicious examples: www1.scan-projectrf.cz.cc, www1.scan-projectsi.cz.cc, www1.scan-projectst.cz.cc, www1.scan-projectte.cz.cc, www1.scan-projectti.cz.cc.
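Hostnames like the malicious examples above share lexical traits (numbered "www1" prefixes, hyphenated tokens, unusual second-level domains) that a trained URL classifier can use as features, in the spirit of the "Identifying Suspicious URLs" work listed in the further reading. A minimal feature-extraction sketch; the particular features chosen are illustrative assumptions.

```python
import re

def hostname_features(host):
    """Extract simple lexical features from a hostname (illustrative selection)."""
    host = host.lower().rstrip(".")
    labels = host.split(".")
    return {
        "length": len(host),
        "num_labels": len(labels),
        "num_hyphens": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in host),
        "tld": labels[-1],
        "starts_with_www_digit": bool(re.match(r"www\d", host)),
        "longest_label": max(len(label) for label in labels),
        # Bag of tokens split on dots and hyphens, for a linear model.
        "tokens": set(re.split(r"[.\-]", host)),
    }

features = hostname_features("www1.scan-projectrf.cz.cc")
print(features["length"], features["num_labels"], features["tld"])  # 25 4 cc
```

Token features such as "projectrf" or the "cz.cc" suffix are what let an online learner generalise from a few confirmed bad hosts to their generated siblings.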

19 Searchahead
If we can identify bad URLs, we can warn users before they click. Over 90% of new sites are visited as the result of an Internet search. Result ratings: Acceptable, Uncategorized, Prohibited, Malicious.

20 Response-time scanning
Trusted sites are targets. Strength in depth: a combination of commercial scanners and in-house technology. Attack vectors: graphics, webmail, new web pages, blogs, ad links, links, comments, banner ads. Malware types: backdoors, rootkits, trojan horses, keyloggers, worms.

21 Exploited sites in recent years
Facebook, Times of India, Miami Dolphins, Samsung.

22 Nothing is safe, not even Twitter!

23 Signature Databases
From 2006 to 2008, the F-Secure signature database grew from … entries to 1.5 million. The rate at which new virus variants appear is growing rapidly. No vendor can rely exclusively on signatures.
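Why variants defeat signatures can be shown with the simplest signature type, a cryptographic hash of the whole file. A minimal sketch with a hypothetical one-entry hash database; real engines such as ClamAV also support byte-pattern signatures, which tolerate some changes but still need an update for each new family.

```python
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical signature database: hashes of known-bad samples.
known_sample = b"MZ\x90\x00...payload-v1..."
SIGNATURES = {sha256(known_sample)}

def is_known_malware(data: bytes) -> bool:
    return sha256(data) in SIGNATURES

# A trivially modified variant: one byte changed ("1" -> "2").
variant = known_sample.replace(b"v1", b"v2")
print(is_known_malware(known_sample))  # True
print(is_known_malware(variant))       # False: the variant evades the signature
```

Every such variant needs its own database entry, which is the growth the slide describes.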

24 Zero-hour protection
Vendors take time to release signature updates (example: the Win32.IstBar.jl trojan). Outbreak Intelligence (OI) provides proactive threat detection, leveraging a huge data set of traffic.

25 How does OI use Machine Learning?
Approaches: malware detection, anomaly detection, dynamic categorization. Techniques employed: supervised learning, unsupervised learning, sandboxing.

26 Dynamic Classification
Document classification across 80 categories increases coverage. Language identification. Identifies inappropriate content: porn is relatively easy; phishing is harder (but not impossible?); hate speech is harder still.
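Document classification of page text can be sketched with multinomial naive Bayes, a common baseline for this task. The slide does not say which model is used, so the model choice, the two toy categories and the training snippets below are assumptions.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(doc):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

train_docs = [
    "fresh pizza pasta and salad menu",
    "book a table at our restaurant menu",
    "league scores basketball match report",
    "football team wins the match",
]
train_labels = ["Food", "Food", "Sport", "Sport"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("tonight's menu: pizza"))     # Food
print(model.predict("basketball match tonight"))  # Sport
```

A production classifier over 80 categories would need far more training data, and, as the slide notes, categories like phishing and hate speech are much harder than topical ones.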

27 Dynamic classification (DC) for identifying malicious sites
Automated tools generate malicious sites: fake escrow, fake pharmacy, mule recruitment. Example searches from Richard Clayton's 2010 FOSDEM talk:
http://www.google.com/search?q=%22before+that+was+a+commercial+manager+of+a+large+corporation+engaged+in+electronics+production%22
http://www.google.com/search?q=%22as+the+most+trusted+escrow+service+on+the+internet%22
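Because these sites are stamped out by automated tools, they reuse boilerplate sentences, which is why exact-phrase searches like the ones above find them. A minimal sketch of the same idea as a classifier feature: the fraction of a known template's word shingles that appear in a page (containment). The template text comes from the escrow phrase above; the shingle size k=4 is an assumption.

```python
import re

def shingles(text, k=4):
    """Set of k-word shingles (runs of k consecutive words) in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def containment(template, page):
    """Fraction of the template's shingles that also occur in the page."""
    return len(template & page) / len(template) if template else 0.0

# Template phrase from the fake-escrow example; k=4 is an assumed parameter.
TEMPLATE = shingles("as the most trusted escrow service on the internet")

page = "Welcome! We operate as the most trusted escrow service on the internet."
other = "Our bakery has served the best bread in town since 1950."
print(containment(TEMPLATE, shingles(page)))   # 1.0
print(containment(TEMPLATE, shingles(other)))  # 0.0
```

Shingles rather than whole-sentence matches keep the feature useful when the generator varies punctuation or surrounding text.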

28 Malicious Executable Files
The final stage of an attack is frequently the download of an executable. These are traditionally blocked using signatures; we use a combination of signature-based scanners and machine learning.

29 Drive-by attacks
Almost no one opens executables from odd sources any more, so attackers use drive-by attacks instead. A normal file (e.g. Flash, PDF, JavaScript, image file) is crafted to exploit a vulnerability in a viewer or library and execute code embedded within the file.

30 Flash
"Symantec recently highlighted Flash for having one of the worst security records in …. We also know first hand that Flash is the number one reason Macs crash. We have been working with Adobe to fix these problems, but they have persisted for several years now. We don't want to reduce the reliability and security of our iPhones, iPods and iPads by adding Flash." (Steve Jobs, April 2010)

31 The growing threat of Java
Almost as common as Flash: 90% of PCs have Java; … JDK downloads per month; 3.48 million JRE downloads per month. Growth in known vulnerabilities: 29 patched in a single update (Oct 2010). Growth in exploits reported by Sophos, Symantec, Microsoft and Cisco. Signatures + trained Scanlet.

32 Detecting Malicious JavaScript
Sandboxing (behavioural checking): a good way to beat obfuscation techniques, but difficult to constrain. Trained classification: analyse features.

33 JavaScript Features
Excerpt of obfuscated code (truncated at both ends):

    … v46f658f5e2260(v46f658f5e3226) {
        function v46f658f5e4207() { return 16; }
        return (parseInt(v46f658f5e3226, v46f658f5e4207()));
    }
    function v46f658f5e61f4(v46f658f5e7174) {
        function v46f658f5ea0cd() { return 2; }
        var v46f658f5e813e = '';
        for (v46f658f5e9105 = 0; v46f658f5e9105 …
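Identifiers like `v46f658f5e4207` in the excerpt above are typical of machine-generated obfuscation, and features of this kind can feed the trained classification mentioned on the previous slide. A minimal sketch of lexical feature extraction; the regexes, keyword list and feature names are illustrative assumptions, not the actual Scanlet feature set.

```python
import math
import re
from collections import Counter

IDENT = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
JS_KEYWORDS = {"function", "return", "var", "for", "if", "else", "while", "new", "this"}

def shannon_entropy(s):
    """Bits per character of the string's character distribution."""
    n = len(s)
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())

def js_features(source):
    idents = [t for t in IDENT.findall(source) if t not in JS_KEYWORDS]
    # Long hex-looking names such as v46f658f5e4207.
    hexlike = [t for t in idents if re.fullmatch(r"[a-z]?[0-9a-f]{8,}", t)]
    return {
        "length": len(source),
        "eval_calls": len(re.findall(r"\beval\s*\(", source)),
        "parseint_calls": len(re.findall(r"\bparseInt\s*\(", source)),
        "plus_count": source.count("+"),  # crude proxy for string concatenation
        "mean_ident_len": sum(map(len, idents)) / len(idents) if idents else 0.0,
        "hexlike_ident_ratio": len(hexlike) / len(idents) if idents else 0.0,
        "char_entropy": shannon_entropy(source),
    }

sample = ("function v46f658f5e4207 () {return 16;} "
          "return(parseInt(v46f658f5e3226,v46f658f5e4207()));")
print(js_features(sample)["hexlike_ident_ratio"])  # 0.75
```

Because legitimate minifiers also produce odd identifiers (see the next slide), no single feature here is decisive; a classifier would weigh them together.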

34 Obfuscation
Attackers use obfuscation, but so do legitimate vendors (e.g. Google) and large Web 2.0 libraries. Techniques include: name changes; string concatenation (eval); dynamically loaded, generated or decrypted code (eval); splitting functionality across files.

35 Malicious Non-Executable Files
There are a lot of file formats out there: documents, pictures, videos. For zero-day attacks, we have no data to compare against, so this is essentially anomaly detection.
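One simple way to frame this as anomaly detection is to model the byte-value distribution of known-good files of a given type and flag files whose distribution lies far from it. This byte-histogram distance is an illustrative technique chosen for the sketch, not a description of the production detector; the Euclidean distance and the 1.5x threshold margin are assumptions.

```python
import math

def byte_histogram(data: bytes):
    """Normalized frequency of each of the 256 byte values."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    n = len(data) or 1
    return [c / n for c in counts]

def centroid(histograms):
    return [sum(col) / len(histograms) for col in zip(*histograms)]

def distance(p, q):
    """Euclidean distance between two histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

class ByteAnomalyDetector:
    def fit(self, good_files):
        hists = [byte_histogram(f) for f in good_files]
        self.center = centroid(hists)
        # Threshold: largest training distance times an assumed margin.
        self.threshold = max(distance(h, self.center) for h in hists) * 1.5
        return self

    def is_anomalous(self, data: bytes) -> bool:
        return distance(byte_histogram(data), self.center) > self.threshold

good = [b"hello world, plain text document " * 10,
        b"another ordinary text file here " * 10,
        b"more simple ascii content lines " * 10]
detector = ByteAnomalyDetector().fit(good)
print(detector.is_anomalous(good[0]))        # False
print(detector.is_anomalous(b"\x90" * 300))  # True: a long run of 0x90 bytes
```

Whole-file histograms are coarse; a real detector would model structure per file format and per region, but the train-on-good, flag-the-unusual shape is the same.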

36 Development Constraints
Low false positive rate. Robust: tolerant of malformed data. Language-agnostic. Scalable: 1.8 billion requests per day on 1000 servers. Low latency.
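The scalability figure above implies the following average load, assuming requests were spread evenly over the day and across servers; peak load is necessarily at least this high:

$$
\frac{1.8 \times 10^{9}\ \text{requests/day}}{1000\ \text{servers} \times 86\,400\ \text{s/day}} \approx 20.8\ \text{requests/s per server}
$$

That is roughly 20,800 requests per second across the whole service, each of which must be scanned within the latency budget.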

37 Back-end processing
If a technique is too slow for real-time scanning, that doesn't make it useless. Back-end processing can generate lists of good and bad files and help evaluate new techniques.

38 Want to know more?
Cisco 2Q10 Global Threat Report: …sco_threat_072610_959.pdf
Richard Clayton: Evil on the Internet, …Internet)-FOSDEM-Talk-video.aspx
Kaspersky Lab Security News Service
A Plan for Spam

39 Still want to know more?
Identifying Suspicious URLs: An Application of Large-Scale Online Learning
Peter Norvig (Google): Statistical Learning as the Ultimate Agile Development Tool
Writing ClamAV Signatures, Alain Zidouemba (ppt)

40 Take Home Messages
Web security: a challenging and interesting domain, with many applications for machine learning. ScanSafe and Cisco: many opportunities for collaboration, including several opportunities for student projects.

41 Any Questions?

