Crime Scene Investigation: SMS Spam Data Analysis Ilona Murynets AT&T Security Research Center New York, NY Roger Piqueras Jover AT&T Security.

1 Crime Scene Investigation: SMS Spam Data Analysis Ilona Murynets AT&T Security Research Center New York, NY ilona@att.com Roger Piqueras Jover AT&T Security Research Center New York, NY roger.jover@att.com IMC’12, November 14–16, 2012, Boston, Massachusetts, USA.

2

3

4 SMS spam:
– Consumes network resources that would otherwise be available to legitimate services.
– Costs users who pay on a per-received-message basis.
– Exposes smartphone users to viruses.
– Enables fraudulent messaging activities such as phishing, identity theft and fraud.
This paper: the analysis is used for an SMS spam detection engine.

5 Outline
– Three data sets for analysis
– Data analysis
  – Account information
  – Messaging abuse: response ratio; message timing and time series
  – The scene of the crime: location & targets; mobility
  – Hardware choice
  – Voice and IP traffic

6 Three data sets: SMS, cell and M2M
– Source: a tier-1 cellular operator.
– Call Detail Records (CDRs) of 9,000 SMS spammers and 17,000 legitimate accounts (cell and M2M).
– Mobile Originated (MO): the transmitting party. Mobile Terminated (MT): the receiver.
– The spammers had been identified and disconnected from the network.
– SMS: prepaid; cell: postpaid; M2M: TAC.

7 three data sets for analysis

9 Notes: In all figures throughout the paper, legitimate cell-phone users, M2M systems and spammers (SMS) are shown in green, blue and red, respectively.

10 Account information
– 99.64% of spammers use pre-paid accounts with unlimited messaging plans.
– SIM cards are constantly switched to circumvent detection schemes: once an account is canceled, the spammer discards the card and works with a new one.
– The average age of a spammer account is 7 to 11 days; a legitimate account is several months to a couple of years old.

12 Messaging Abuse

13 Spammers generate a large load of messages. Spammers not only send but also receive more messages than legitimate customers do:
– opt-out replies
– replies from recipients who were tricked into responding

14 Messaging Abuse
Spam messages often attempt to trick the recipient into replying. Although only a small percentage of users reply, the large number of accounts targeted in a spam campaign results in many responses.

15 Messaging Abuse

16 Legitimate accounts have a small set of recipients (7 on average), while spammers hit a couple of thousand victims. Legitimate users send multiple messages to a small set of destinations; spammers send one message to each victim.

18 Response ratio

19 For legitimate users, messages are sent in response to a previous message in a sequential way, so the response ratio is close to 1. For spammers, the number of MT SMSs is very small in proportion to the number of transmitted messages, so the response ratio is close to 0.
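
The slides do not give the exact formula for the response ratio. A minimal sketch, assuming it is the number of received (MT) messages divided by the number of sent (MO) messages; the function name and the sample counts are hypothetical and not taken from the data sets:

```python
def response_ratio(mo_count: int, mt_count: int) -> float:
    """Received (MT) messages per sent (MO) message (assumed definition)."""
    if mo_count == 0:
        # No sent messages: this sketch returns 0 by convention.
        return 0.0
    return mt_count / mo_count


# Hypothetical counts for illustration only.
legit = response_ratio(mo_count=120, mt_count=115)    # conversational traffic
spammer = response_ratio(mo_count=5000, mt_count=40)  # few replies per message sent
```

Under this assumed definition, the conversational account lands near 1 and the bulk sender near 0, which matches the pattern the slide describes.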

21 Message timing and time series

22

23 Inter-SMS intervals for spammers are short and less random (low entropy). Intervals between legitimate messages are longer and more random (higher entropy). Messaging activities of certain M2M devices are prescheduled.
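
One way to quantify how random the inter-SMS intervals are is sketched below. It assumes Shannon entropy over binned gaps between consecutive messages; the bin width, the function name and the timestamps are hypothetical, and the slides do not say how the paper computes this entropy.

```python
import math
from collections import Counter


def interval_entropy(timestamps, bin_seconds=10):
    """Shannon entropy (in bits) of the binned gaps between consecutive messages."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if not gaps:
        return 0.0
    # Group gaps into fixed-width bins so that similar intervals count as equal.
    bins = Counter(int(gap // bin_seconds) for gap in gaps)
    total = len(gaps)
    return sum((n / total) * math.log2(total / n) for n in bins.values())


# Hypothetical send times in seconds.
spammer_times = [0, 2, 4, 6, 8, 10, 12, 14]            # steady, automated pace
legit_times = [0, 35, 50, 400, 410, 3000, 3600, 9000]  # irregular, human pace
```

With these made-up inputs every spammer gap falls into the same bin, so the entropy is zero, while the irregular gaps spread over several bins and give a higher value.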

24 Message timing and time series

26 Location & targets

27 Spammer locations: California (Sacramento, Orange County and Los Angeles), New York/New Jersey/Long Island, Miami Beach, Illinois, Michigan, North Carolina and Texas.

28 Location & targets

29 Recipients of legitimate messages are concentrated in a local area (i.e. the area around the subscriber’s home or areas where the subscriber works, used to live or where friends and relatives reside). Spam recipients are distributed uniformly over the US population.

30 Location & targets

31 Spammers message a large number of area codes, always more than cell-phone users and M2M systems do.

32 Location & targets

33 Entropy of the contacted area codes:
– Low entropy (legitimate cell): the user repeatedly contacts the same area codes.
– High entropy (SMS): the spammer sends messages to a more random set of area codes.
– Lowest entropy (M2M): network-enabled appliances message a predefined set of cell phones.
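
For reference, assuming the entropy on this slide is the Shannon entropy of the area-code distribution (the slide does not state the formula), with p_i the fraction of an account's messages sent to area code i out of k contacted area codes:

```latex
H = -\sum_{i=1}^{k} p_i \log_2 p_i, \qquad 0 \le H \le \log_2 k
```

H is 0 when every message goes to a single area code, and it reaches its maximum of log2 k when messages are spread evenly over the k area codes.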

34 Location & targets

35 SMS spammers show a linear relation. Both M2M systems and cell-phone users cluster around the bottom-left area of the graph. Open question: do M2M systems send up to 20,000 messages to a single destination?

36 Location & targets

37 Cell-phone users have a low destinations-to-messages ratio and contact a small set of area codes. A great majority of spammers exhibit the opposite behavior. Spammers in the bottom-right corner target very specific geographical regions: their ratio is one destination per message, but the number of targeted area codes is limited.
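
The two quantities discussed on this slide can be sketched as follows. The code assumes each sent message is recorded as a 10-digit US recipient number whose first three digits are the area code; the function name and all phone numbers are made up for illustration.

```python
def destination_features(recipients):
    """Summarize one account's recipients (one entry per sent message)."""
    messages = len(recipients)
    destinations = len(set(recipients))
    # Assumption: 10-digit US numbers, area code in the first three digits.
    area_codes = len({number[:3] for number in recipients})
    ratio = destinations / messages if messages else 0.0
    return {
        "messages": messages,
        "destinations": destinations,
        "ratio": ratio,
        "area_codes": area_codes,
    }


# Hypothetical legitimate user: repeated messages to a few contacts.
legit = destination_features(
    ["2125550101"] * 5 + ["2125550102"] * 3 + ["9175550103"] * 2
)

# Hypothetical regional spammer: one message per victim, two area codes.
spammer = destination_features(
    [f"{code}555{n:04d}" for code in ("305", "786") for n in range(50)]
)
```

Here the made-up spammer has a ratio of one destination per message yet only two area codes, which is the bottom-right-corner behavior the slide describes, while the legitimate user has a ratio well below one.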

39 mobility

40

42 Hardware choice
1. USB Modem/Aircard A1
2. Feature mobile-phone M1
3. Feature mobile-phone M2
4. USB Modem/Aircard A2
5. USB Modem/Aircard A3

44 Voice call

45

46 IP traffic

47 Voice call

48 IP traffic

49 STOPPING THE CRIME
– An advanced SMS spam detection algorithm is proposed, based on an ensemble of decision trees.
– Over 40 specific features are extracted from messaging patterns and processed through a combination of decision trees.
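
The slides do not describe the 40+ features or how the trees are built and combined. Purely to illustrate the idea of an ensemble vote, the toy sketch below uses three hand-written one-split trees (decision stumps) and a simple majority. The feature names, thresholds and voting rule are all hypothetical and are not the authors' algorithm.

```python
# Each stump is a one-split decision tree:
# (feature name, threshold, spam vote when value <= threshold).
# Features and thresholds are hypothetical, loosely motivated by the slides.
STUMPS = [
    ("account_age_days", 11, True),    # very young accounts look spam-like
    ("response_ratio", 0.1, True),     # few replies per sent message
    ("dest_per_message", 0.9, False),  # repeated contacts look legitimate
]


def stump_vote(account, feature, threshold, vote_if_low):
    """Return True if this stump votes 'spammer' for the account."""
    is_low = account[feature] <= threshold
    return vote_if_low if is_low else not vote_if_low


def is_spammer(account):
    """Majority vote over all stumps."""
    votes = [stump_vote(account, *stump) for stump in STUMPS]
    return sum(votes) > len(votes) / 2


# Hypothetical accounts for illustration only.
suspect = {"account_age_days": 8, "response_ratio": 0.01, "dest_per_message": 1.0}
regular = {"account_age_days": 400, "response_ratio": 0.95, "dest_per_message": 0.3}
```

A real ensemble would learn its splits from labeled accounts rather than use fixed thresholds; the sketch only shows how several weak per-feature decisions can be combined into one verdict.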

50 CONCLUSIONS
– Spammers use pre-paid accounts with an average age of 7 to 11 days.
– They send a large number of messages to a wide set of targets (and also receive a large number).
– They use five different models of hardware.
– They make a large number of phone calls of very short duration.
– Main geographical sources in the US: Sacramento, Los Angeles-Orange County and Miami Beach.
– Certain networked appliances have messaging behavior close to that of a spammer.

51

