Identifying Malicious Web Requests through Changes in Locality and Temporal Sequence
  DIMACS Workshop on Security of Web Services and E-Commerce
  Li-Chiou Chen, lchen@pace.edu
  School of Computer Science and Information Systems, Pace University
  May 4th, 2005
Need for anomaly detection in distributed network traces
  Fast-spreading Internet worms and malicious programs interrupt web services
  Early detection and response is vital
  These attacks are usually launched from distributed locations
  Network traces left at distributed locations are invaluable when searching for clues to potential future attacks
  E.g., DShield, the Honeynet Project
  © Li-Chiou Chen, 5/6/2005
Types of IDS
  Based on data
    Network-based IDS: monitors and inspects network traffic
    Host-based IDS: runs on a single host
  Based on detection techniques
    Signature-based IDS: uses pattern matching to identify known attacks
    Anomaly-based IDS: uses statistical, data mining, or other techniques to distinguish normal from abnormal activities
Outline
  Toolkits for inferring anomalous patterns from distributed network traces
  Previous work
  Changes of locality over time
  Markov chain analysis
  Preliminary results
  Summary
  Future work
  Callouts: focusing on anomaly detection in distributed network traces; TIAP; malicious web requests
TIAP: Toolkits for inferring anomalous patterns in distributed network traces
  Network traces (web log, tcpdump, etc.)
  Data conversion
  Alerts from other IDS or TIAP peers (using IDMEF)
  Locality pattern analysis
  Sequence pattern analysis
  Response module
  Alerts to other IDS or TIAP peers (using IDMEF)
  Alerts to administrators
Web-level IDS
  Anomaly detection
    Structure of an HTTP request (Kruegel and Vigna 03)
    Normality on streams of data access patterns (Sion et al. 03)
  Misuse detection
    State transition analysis of HTTP requests (Vigna et al. 03)
    Look for attack signatures (Almgren et al. 01)
Changes in locality patterns and temporal sequence patterns
  Locality
    Where the web request is sent from, such as the source IP address
    Which web server is requested, such as the destination IP address
  Temporal sequence
    The order of requested objects during a given period of time
Locality pattern analysis in distributed network traces
  [Figure: example patterns ABAA, ABCD, KIKL, ABPO, tracked over time steps t1: AB, t2: ..., t3: ..., t4: ...]
An example: web traces in common log format from 6 web servers
  tstamp, ip, server, doc_tpe, user_agent
  62978, 38.0.69.1, 1, 2, 3
  62979, 38.0.69.1, 1, 2, 3
  62979, 38.0.69.1, 2, 2, 3
  63001, 38.0.69.1, 1, 2, 3
  ...
  [Callout on the rows above: a session]
Data profiles
  6 web servers (2 of them link to each other, 4 are independent)
  One day of web traces
  One session: a distinct IP, 10-minute interval
  193,070 HTTP requests, 11,177 sessions
  HTTP requests from outside the organization
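The session rule above can be sketched in a few lines. This is a minimal illustration, not the toolkit's actual code: it assumes that `tstamp` is in seconds and that "10-minute interval" means an idle gap of more than 10 minutes starts a new session (the slides do not say whether the interval is an idle timeout or a fixed window). The function name `sessionize` and the last sample row are hypothetical.

```python
SESSION_GAP = 600  # assumed: 10 minutes, with tstamp measured in seconds


def sessionize(records, gap=SESSION_GAP):
    """Group (tstamp, ip, server) records into per-IP sessions.

    Assumed rule: a new session starts the first time an IP is seen,
    or when that IP has been idle for more than `gap` seconds.
    """
    sessions = []
    open_session = {}  # ip -> index in `sessions` of that IP's open session
    last_seen = {}     # ip -> timestamp of that IP's most recent request
    for tstamp, ip, server in sorted(records):
        if ip not in last_seen or tstamp - last_seen[ip] > gap:
            open_session[ip] = len(sessions)
            sessions.append({"ip": ip, "start": tstamp, "servers": []})
        sessions[open_session[ip]]["servers"].append(server)
        last_seen[ip] = tstamp
    return sessions


records = [
    (62978, "38.0.69.1", 1),
    (62979, "38.0.69.1", 1),
    (62979, "38.0.69.1", 2),
    (63001, "38.0.69.1", 1),
    (64000, "38.0.69.1", 3),  # hypothetical row, added to show a session break
]
sessions = sessionize(records)
for s in sessions:
    print(s["ip"], s["start"], s["servers"])
```

Each session keeps the ordered list of servers it touched, which is the kind of per-session locality information the locality pattern analysis could work from.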
Locality pattern analysis
  86 sessions by only two web bots
Markov chain analysis
  [Figure: a sequence of states observed at time steps t1, t2, ..., t16, ..., divided into sampling windows (sampling window 1, sampling window 2)]
  Example state sequence: N S N S S O S N S O S N S S N S S ...
  States: N, S, O
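As a rough illustration of how transition probabilities could be estimated from such a state sequence, the sketch below counts adjacent state pairs and normalizes each row. It assumes a first-order Markov chain and non-overlapping sampling windows; the window size of 8 and the function names are chosen only for this example and are not taken from the slides.

```python
from collections import Counter

STATES = ("N", "S", "O")


def transition_probabilities(sequence):
    """Estimate first-order transition probabilities from a state sequence."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    matrix = {}
    for src in STATES:
        row_total = sum(pair_counts[(src, dst)] for dst in STATES)
        # A state that never occurs as a source gets an all-zero row.
        matrix[src] = {
            dst: (pair_counts[(src, dst)] / row_total if row_total else 0.0)
            for dst in STATES
        }
    return matrix


def windows(sequence, size):
    """Split a sequence into non-overlapping windows (an assumption)."""
    return [sequence[i:i + size]
            for i in range(0, len(sequence) - size + 1, size)]


# The example state sequence from the slide.
sequence = "N S N S S O S N S O S N S S N S S".split()
overall = transition_probabilities(sequence)
per_window = [transition_probabilities(w) for w in windows(sequence, 8)]

for src in STATES:
    print(src, {dst: round(p, 2) for dst, p in overall[src].items()})
```

Comparing the per-window matrices against each other, or against a baseline, is one way a change in the temporal sequence could be surfaced.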
Data profiles
  1 web server
  One week of web traces
  Window size: 30
  Reference list: 30
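The slides do not define how a request is assigned to a state or how the reference list is maintained. The sketch below shows one plausible reading, stated here as an assumption: a request is Same (S) if its source IP equals that of the preceding request, Old (O) if the IP differs but is among the most recently seen distinct IPs held in the reference list, and New (N) otherwise. The function name `label_requests` and the sample addresses are hypothetical.

```python
from collections import deque


def label_requests(ips, reference_size=30):
    """Label each source IP as 'N' (new), 'S' (same), or 'O' (old).

    Assumed semantics (not stated on the slides):
      S: same IP as the immediately preceding request
      O: differs from the preceding IP but is in the reference list
      N: not in the reference list
    """
    reference = deque(maxlen=reference_size)  # recently seen distinct IPs
    labels = []
    previous = None
    for ip in ips:
        if ip == previous:
            labels.append("S")
        elif ip in reference:
            labels.append("O")
        else:
            labels.append("N")
        # Move the IP to the most-recent end; the oldest entry is evicted
        # automatically once the list holds `reference_size` addresses.
        if ip in reference:
            reference.remove(ip)
        reference.append(ip)
        previous = ip
    return labels


ips = ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.3"]
labels = label_requests(ips)
print(" ".join(labels))
```

Under this reading, traffic from randomly chosen source addresses would produce an unusually high share of N labels, since few addresses would ever reappear in the reference list.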
Change of distinct IP over time: browsers
Change of distinct IP over time: web bots
Markov chain results
  [Figure: transition probabilities among the states Old (O), New (N), and Same (S), labeled 0.43 (0.14), 0.42 (0.21), 0.43 (0.17), 0.13 (0.10), 0.13 (0.08), 0.40 (0.22), 0.06 (0.04), 0.83 (0.10), 0.18 (0.16)]
Illustration of the state transition probability
Summary
  The preliminary locality pattern analysis works well for identifying distinct web bot access patterns
  The Markov chain analysis provides a way to infer attacks that use random IP addresses
  A combination of the two approaches is needed
Ongoing work
  Incorporate the analytical results into malware or intrusion detection
  A distributed framework for data collection and information sharing, to infer malware or intrusion attempts across servers, platforms, and geographical locations
  Collection of attack logs for analysis
  Use of the Intrusion Detection Message Exchange Format (IDMEF) for message exchange among servers
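To make the IDMEF item concrete, here is a minimal sketch of how an alert exchanged between servers might be built. The element and attribute names follow the IDMEF data model as later published in RFC 4765; the output is simplified and has not been validated against the IDMEF DTD. The function name `build_alert`, the analyzer ID, the message ID, and the addresses are all hypothetical.

```python
import time
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

IDMEF_NS = "http://iana.org/idmef"
NTP_UNIX_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_alert(analyzer_id, message_id, source_ip, target_ip, text, when=None):
    """Build a simplified IDMEF alert and return it as an XML string."""
    when = time.time() if when is None else when
    ET.register_namespace("idmef", IDMEF_NS)

    def q(tag):
        # Qualify a tag name with the IDMEF namespace.
        return f"{{{IDMEF_NS}}}{tag}"

    message = ET.Element(q("IDMEF-Message"), {"version": "1.0"})
    alert = ET.SubElement(message, q("Alert"), {"messageid": message_id})
    ET.SubElement(alert, q("Analyzer"), {"analyzerid": analyzer_id})

    # CreateTime carries an NTP timestamp attribute and a readable time.
    seconds = int(when)
    fraction = int((when - seconds) * 2**32)
    ntpstamp = f"0x{seconds + NTP_UNIX_OFFSET:08x}.0x{fraction:08x}"
    create = ET.SubElement(alert, q("CreateTime"), {"ntpstamp": ntpstamp})
    create.text = datetime.fromtimestamp(when, timezone.utc).isoformat()

    for role, ip in (("Source", source_ip), ("Target", target_ip)):
        node = ET.SubElement(ET.SubElement(alert, q(role)), q("Node"))
        address = ET.SubElement(node, q("Address"), {"category": "ipv4-addr"})
        ET.SubElement(address, q("address")).text = ip

    ET.SubElement(alert, q("Classification"), {"text": text})
    return ET.tostring(message, encoding="unicode")


# Hypothetical identifiers and addresses, for illustration only.
print(build_alert("tiap-01", "msg-0001", "38.0.69.1", "192.0.2.10",
                  "Possible web bot session"))
```

A real deployment would also need a transport between peers and agreement on classification names, neither of which the slides specify.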