2 Malware 1 Malware

3 Malware 2 Malicious Software
• Malware is not new…
• Fred Cohen’s initial virus work in the 1980s
  o Used viruses to break MLS systems
• Types of malware (lots of overlap)
  o Virus: passive propagation
  o Worm: active propagation
  o Trojan horse: unexpected functionality
  o Trapdoor/backdoor: unauthorized access
  o Rabbit: exhausts system resources

4 Malware 3 Where do Viruses Live?
• Just about anywhere…
• Boot sector
  o Take control before anything else
• Memory resident
  o Stays in memory
• Applications, macros, data, etc.
• Library routines
• Compilers, debuggers, virus checker, etc.
  o These would be particularly nasty!

5 Malware 4 Malware Timeline
• Preliminary work by Cohen (early 80’s)
• Brain virus (1986)
• Morris worm (1988)
• Code Red (2001)
• SQL Slammer (2003)
• Future of malware?

6 Malware 5 Brain
• First appeared in 1986
• More annoying than harmful
• A prototype for later viruses
• Not much reaction by users
• What it did
  1. Placed itself in boot sector (and other places)
  2. Screened disk calls to avoid detection
  3. On each disk read, checked the boot sector to see whether it was infected; if not, goto 1
• Brain did nothing malicious

7 Malware 6 Morris Worm
• First appeared in 1988
• What it tried to do
  o Determine where it could spread
  o Spread its infection
  o Remain undiscovered
• Morris claimed his worm had a bug…
• Morris worm tried to re-infect systems
  o Led to resource exhaustion
  o Adverse effect was like a so-called rabbit

8 Malware 7 Morris Worm
• How to spread its infection?
• Tried to obtain access to machine by…
  o User account password guessing
  o Exploiting a buffer overflow in fingerd
  o Exploiting a trapdoor in sendmail
• Flaws in fingerd and sendmail were well-known at the time, but not widely patched

9 Malware 8 Morris Worm
• Once access had been obtained to machine…
• “Bootstrap loader” sent to victim
  o Consisted of 99 lines of C code
• Victim machine compiled and executed code
• Bootstrap loader fetched the rest of worm
• Victim even authenticated the sender!
  o Trudy doesn’t want user to get a bad worm…

10 Malware 9 Morris Worm
• How to remain undetected?
• If transmission of the worm was interrupted, all code was deleted
• Code encrypted when downloaded
• Code deleted after decrypting and compiling
• When running, the worm regularly changed its name and process identifier (PID)

11 Malware 10 Result of Morris Worm
• Shocked the Internet community of 1988
  o Internet of 1988 much different than today
• Internet designed to withstand nuclear war
  o Yet it was brought down by a graduate student!
  o At the time, Morris’ father worked at NSA…
  o …which added a conspiratorial overtone
• Could have been much worse: the worm was not malicious
• As a result, CERT was established and security awareness increased
  o But limited actions to improve security

12 Malware 11 Code Red Worm
• Appeared in July 2001
• Infected more than 250,000 systems in about 15 hours
• Eventually infected about 750,000 out of about 6,000,000 susceptible systems
• To gain access, exploited buffer overflow in Microsoft IIS server software
  o Then monitored traffic on port 80, looking for other susceptible servers

13 Malware 12 Code Red Worm
• What it did
  o Day 1 to 19 of month: tried to spread infection
  o Day 20 to 27: distributed denial of service attack on www.whitehouse.gov
• Later versions (several variants)
  o Included trapdoor for remote access
  o Rebooted to flush worm, leaving only trapdoor
• Some claimed Code Red was “beta test for information warfare”

14 Malware 13 SQL Slammer
• Infected 250,000 systems in 10 minutes!
• Code Red took 15 hours to do what Slammer did in 10 minutes
• At its peak, Slammer infections doubled every 8.5 seconds
• Slammer spread “too fast”…
• …and “burned out” available bandwidth
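
As a rough back-of-the-envelope check on the doubling figure (an illustration, not a model of the actual outbreak), the sketch below computes how long sustained doubling every 8.5 seconds would take to grow from one infected host to 250,000. The single starting host and the assumption that the peak rate holds throughout are simplifications introduced here, not claims from the slides.

```python
import math

def time_to_reach(target, doubling_seconds, initial=1):
    """Seconds needed to grow from `initial` to `target` infections,
    assuming (unrealistically) that the doubling time never changes."""
    doublings = math.log2(target / initial)
    return doublings * doubling_seconds

seconds = time_to_reach(250_000, 8.5)
print(f"{math.log2(250_000):.1f} doublings, about {seconds:.0f} seconds")
```

Under those assumptions the answer is roughly two and a half minutes, well under the 10 minutes quoted on the slide. That gap is at least consistent with the slide's point that the peak rate could not be sustained once bandwidth was saturated.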

15 Malware 14 SQL Slammer
• Why was Slammer so successful?
  o Worm fit in one 376-byte UDP packet
  o Firewalls often let a small packet through, assuming it could do no harm by itself
    • Then firewall monitors the “connection”
  o Expectation was that much more data would be required for an attack
  o Slammer defied assumptions of “experts”

16 Malware 15 Malware Detection
• Three common methods
  o Signature detection
  o Change detection
  o Anomaly detection
• We briefly discuss each of these
  o And consider advantages and disadvantages of each

17 Malware 16 Signature Detection
• A signature is a string of bits found in software (or could be a hash value)
• Suppose that a virus has signature 0xd7e5ce3d47f2a5d1d83946141326ed83
  o That is, this string of bits appears in virus
• We can search for this signature in all files
• If we find signature, have we found virus?
  o No, same signature could appear in innocent files
  o But at random, chance is $1/2^{128}$
  o Software is not random, so probability is higher
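
The search described on this slide can be sketched in a few lines. This is a minimal illustration using the slide's example signature; the function names `contains_signature` and `scan_tree` are hypothetical, and a real scanner would check many signatures at once and avoid reading whole files into memory.

```python
from pathlib import Path

# Signature from the slide, as raw bytes (32 hex digits = 16 bytes = 128 bits).
SIGNATURE = bytes.fromhex("d7e5ce3d47f2a5d1d83946141326ed83")

def contains_signature(data, signature=SIGNATURE):
    """Return True if the byte string contains the signature anywhere."""
    return signature in data

def scan_tree(root, signature=SIGNATURE):
    """Return the files under `root` whose contents include the signature.

    Each file is read fully into memory, which is fine for a sketch
    but not for very large files.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            if contains_signature(path.read_bytes(), signature):
                hits.append(path)
        except OSError:
            continue  # unreadable file: skip it
    return hits
```

A hit only says that the byte string is present; as the slide notes, the same bytes could appear in an innocent file.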

18 Malware 17 Signature Detection
• Advantages
  o Effective on “traditional” malware
  o Minimal burden for users/administrators
• Disadvantages
  o Signature file can be large (tens of thousands of signatures)…
  o …making scanning slow
  o Signature files must be kept up to date
  o Cannot detect unknown viruses
  o Cannot detect some types of malware
• By far the most popular detection method

19 Malware 18 Change Detection
• Viruses must live somewhere on system
• If we detect that a file has changed, it may have been infected
• How to detect changes?
  o Hash files and (securely) store hash values
  o Recompute hashes and compare
  o If hash value changes, it might be infected
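
The hash, store, and compare loop above might look like the following sketch. SHA-256 is an assumption (the slides do not name a hash function), the helper names are hypothetical, and the "securely store" step is out of scope here: the baseline is just an in-memory dictionary.

```python
import hashlib
from pathlib import Path

def hash_file(path):
    """SHA-256 of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def snapshot(root):
    """Map each file under `root` (relative path as str) to its hash."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hash_file(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }

def changed_files(baseline, current):
    """Paths whose hash differs from the baseline, plus new and missing files."""
    paths = set(baseline) | set(current)
    return sorted(p for p in paths if baseline.get(p) != current.get(p))
```

Taking a `snapshot` once, storing it somewhere an attacker cannot modify, and later calling `changed_files(old, snapshot(root))` gives the list of files to investigate. Every legitimate update shows up in that list too, which is the false-positive burden discussed on the next slide.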

20 Malware 19 Change Detection
• Advantages
  o Virtually no false negatives
  o Can even detect previously unknown malware
• Disadvantages
  o Many files change, and often
  o Many false alarms (false positives)
  o Heavy burden on users/administrators
  o If suspicious change detected, then what?
  o Might fall back to signature-based system

21 Malware 20 Anomaly Detection
• Monitor system for anything “unusual” or “virus-like” or potentially malicious or ???
• What is unusual?
  o Files change in some unusual way
  o System misbehaves in some way
  o Unusual network activity
  o Unusual file access, etc., etc., etc.
• But must first define “normal”
  o Normal can (and must) change over time!

22 Malware 21 Anomaly Detection
• Advantages
  o Chance of detecting unknown malware
• Disadvantages
  o No proven track record
  o Trudy can make abnormal look normal (go slow)
  o Must be combined with another method (usually, signature detection)
• Also popular in intrusion detection (IDS)
• A difficult unsolved (unsolvable?) problem
  o An AI problem?

23 Malware 22 Future of Malware
• Trends
  o Encrypted, polymorphic, metamorphic malware
  o Fast replication/Warhol worms
  o Flash worms, slow worms, etc.
• Future is bright for malware
  o Good news for the bad guys…
  o …bad news for the good guys
• Future of malware detection?

24 Malware 23 Encrypted Viruses
• Virus writers know that signature detection is king
• So, how to evade signature detection?
• Encrypting the virus is a good idea
  o Looks like random bits
  o Different key, different “random” bits
  o Different copies have different signatures
• Encryption is often used today in viruses

25 Malware 24 Encrypted Viruses
• How to detect encrypted viruses?
• Search for the decryptor code
  o Standard signature detection problem
• Why not encrypt the decryptor code?
  o Then encrypt the decryptor of the decryptor code (and so on…)
• Encryption is of limited value
  o Makes signature detection a bit more difficult

26 Malware 25 Polymorphic Malware
• Polymorphic worm
  o Body of worm is encrypted
  o Decryptor is “mutated”
  o Goal is no common signature
  o Like an encrypted worm on steroids…
• Q: How to detect?
• A: Emulation
  o Slow, but effective

27 Malware 26 Metamorphic Malware
• A metamorphic worm “mutates” when infecting a new system
  o Sometimes called “body polymorphic”
• Such a worm can, in principle, avoid signature-based detection systems
• Mutated worm must function the same
  o And be “different enough” to avoid detection
• Detection is a current research problem

28 Malware 27 Metamorphic Malware
• Metamorphic generator
  o Standalone app that generates metamorphic code
  o Source of endless “new” malware
• Metamorphic virus that “carries its own generator”
  o Much more difficult to construct

29 Malware 28 Metamorphic Worm
• One approach to metamorphic replication…
  o Disassemble the worm
  o Worm stripped to a base form
  o Random variations inserted into code (permute the code, insert dead code, etc., etc.)
  o Assemble the resulting code
• Goal is worm with same functionality as original, but different signature

30 Malware 29 Warhol Worm
• “In the future everybody will be world-famous for 15 minutes” (Andy Warhol)
• A Warhol Worm is designed to infect the entire Internet in 15 minutes
• Slammer infected 250,000 in 10 minutes
  o “Burned out” bandwidth
  o Slammer could not have infected all of Internet in 15 minutes: too bandwidth intensive
• Can a worm do “better” than Slammer?

31 Malware 30 A Possible Warhol Worm
• Seed worm with an initial hit list containing a set of vulnerable IP addresses
  o List depends on the particular exploit…
  o Tools exist for identifying vulnerable systems
• Each successful initial infection would attack selected part of IP address space
• No worm this sophisticated has yet been seen in the wild (as of 2008)
  o Even Slammer generated random IP addresses
• Could infect entire Internet in 15 minutes!

32 Malware 31 Flash Worm
• Possible to do “better” than Warhol worm?
• Infect entire Internet in less than 15 minutes?
• Searching for vulnerable IP addresses is the slow part of any worm attack
• Searching might be bandwidth limited
  o Like Slammer
• Flash worm designed to infect entire Internet almost instantly

33 Malware 32 Flash Worm
• Predetermine all vulnerable IP addresses
  o Depends on details of the particular attack
• Embed all known vulnerable addresses in worm(s)
• Results in huge worm(s) (perhaps 400KB)
• Whenever the worm replicates, it splits
• Virtually no wasted time or bandwidth!
[Figure: original worm(s), 1st generation, 2nd generation]

34 Malware 33 Flash Worm
• Estimated that ideal flash worm could infect the entire Internet in 15 seconds!
• Much faster than humans could respond
• A conjectured defense against flash worms
  o Deploy many “personal IDSs”
  o Master IDS watches over the personal IDSs
  o When master IDS detects unusual activity, lets it proceed on a few nodes, blocks it elsewhere
  o If sacrificial nodes adversely affected, attack is prevented almost everywhere

35 Malware 34 Botnets
• Today, “botnets” are often portrayed as biggest malware threat
  o Many compromised machines (zombies) under control of botmaster (bot-herder)
• Why botnets?
  o Spamming
  o Distributed DoS attacks
  o Other “anonymous” malicious attacks

36 Malware 35 Botnets
• Usually, controlled via IRC
  o But this is a possible weakness
  o Shut down IRC server
• Today, much interest in P2P botnets
  o More robust, harder to shut down
  o But, much harder to design and control
• A good (but difficult) research topic

37 Malware 36 Whatever Happened to…
• Since Slammer (2003), it appears that there are few “fast” worms
• Few new metamorphics since early 2000s
• So, whatever happened to flash worms, metamorphic worms, etc.?
  o Difficult to develop?
  o Better detection?
  o Botnets?
• Maybe just a lull before the storm?

38 Malware 37 Metamorphic Viruses

39 Malware 38 Metamorphic Viruses
• Some interesting questions…
Q: How metamorphic are existing “metamorphic” generators?
Q: How to detect metamorphic viruses?
Q: How to build a “better” metamorphic generator?

40 Malware 39 Hunting for Metamorphic Generators…
• First, how to compare X.exe and Y.exe?
• Disassemble and extract opcodes
  o $x_1, x_2, \dots, x_n$ from X and $y_1, y_2, \dots, y_m$ from Y
  o Compare all subsequences of length 3
  o They match if opcodes match (in any order)
  o If $(x_i, x_{i+1}, x_{i+2})$ matches $(y_j, y_{j+1}, y_{j+2})$ then plot a point in the x,y-plane at $(i,j)$
• Reduce “noise” in resulting picture by requiring 5 consecutive matches

41 Malware 40 Comparing Executables
• The process…

42 Malware 41 Comparing Executables
• Compute a score based on picture as follows
• Increment count for each opcode that is “covered” by a line segment
  o Do this for both x axis and y axis
• Divide total count by (n + m)
  o Identical programs yield solid line on diagonal and (symmetric) noise, with score of 1.0
  o Similar code has line segments parallel to diagonal and often scores greater than 0.5
  o Unrelated programs have some random matches
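
One way to read the comparison and scoring steps from the last few slides is sketched below. Two interpretations are assumptions made here, not details confirmed by the slides: "5 consecutive matches" is taken to mean a run of at least five trigram matches along a diagonal, and a run of k trigram matches is taken to "cover" k + 2 opcodes on each axis. Opcodes are plain strings, and the function names are hypothetical.

```python
from collections import defaultdict

def trigram_matches(x, y):
    """All (i, j) where x[i:i+3] and y[j:j+3] hold the same opcodes in any order."""
    index = defaultdict(list)
    for j in range(len(y) - 2):
        index[tuple(sorted(y[j:j + 3]))].append(j)
    return {(i, j)
            for i in range(len(x) - 2)
            for j in index.get(tuple(sorted(x[i:i + 3])), [])}

def similarity_score(x, y, min_run=5):
    """Fraction of opcodes (over both files) covered by long diagonal runs."""
    matches = trigram_matches(x, y)
    covered_x, covered_y = set(), set()
    for (i, j) in matches:
        if (i - 1, j - 1) in matches:
            continue  # not the start of a diagonal run
        run = 1
        while (i + run, j + run) in matches:
            run += 1
        if run >= min_run:
            # a run of `run` trigram matches spans run + 2 opcodes
            covered_x.update(range(i, i + run + 2))
            covered_y.update(range(j, j + run + 2))
    return (len(covered_x) + len(covered_y)) / (len(x) + len(y))
```

Comparing a sequence with itself covers every opcode on both axes and gives 1.0, in line with the slide; two sequences with no shared trigrams give 0.0. Both inputs are assumed non-empty.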

43 Malware 42 Comparing Code to Itself
• Note: “noise” not removed from this example
• Here, score is 1.0

44 Malware 43 Comparing Metamorphic Code
• Two files from “VCL32” generator
• Score 0.60

45 Malware 44 Comparing Metamorphic Code
• Files from “MPCGEN” generator
• Score 0.57

46 Malware 45 Comparing Metamorphic Code
• Files from “G2” generator
• Score 0.75

47 Malware 46 Comparing Metamorphic Code
• Files from “NGVCK” generator
• Score 0.12

48 Malware 47 Comparing Normal Code
• Randomly selected “normal” files
  o Cygwin utilities
• Score 0.35

49 Malware 48 Metamorphic Generators
• Metamorphic generators & normal files

50 Malware 49 Conclusion?
• With 1 exception, metamorphic generators tested are not good
  o Only NGVCK is better than “normal”
Q: Why so few good generators?
A: Generating metamorphic code is a lot harder than it seems…

51 Malware 50 Detecting Metamorphic
• We use Hidden Markov Models (HMMs)
  o A type of “machine learning”
  o Like neural nets, but not as sexy…
  o …but, arguably, easier and more informative
• Assume there is some Markov process which is hidden
• We are only able to observe some (indirect) effect of the Markov process

52 Malware 51 HMM
• Markov process “behind the scenes”
  o Here, $X_0 \to X_1 \to X_2 \to \cdots$ (matrix A)
• We only get to see observations, $O_i$
  o The $O_i$ are related to $X_i$ via matrix B

53 Malware 52 HMM Example
• Suppose tree growth ring sizes are related to average annual temperature
  o We cannot go back in time and measure temperature
  o But we can measure tree growth rings
• With HMM, can obtain info about (hidden) temp, based on observed tree ring sizes

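54 Malware 53 HMM Example
• We assume temperature determined by a (hidden) Markov process…
  o …and we can observe tree growth rings
• Suppose year-to-year temp (hot or cold), determined by:
  [matrix of hot/cold transition probabilities, shown as an image on the slide]
• And temperature related to growth rings according to:
  [matrix of ring-size probabilities for hot and cold years, shown as an image on the slide]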

55 Malware 54 HMM Example
• Then we can define HMM as $(A, B, \pi)$ where
• A is matrix for the (hidden) Markov process
• B relates hidden state to observations
• $\pi$ gives initial probabilities

56 Malware 55 HMM Example
• Suppose for some 4-year period we observe tree ring sizes (S,M,S,L)
  o Where S,M,L are small, medium, large, respectively
Q: What were “most likely” temps?
A: Depends on what you mean by “most likely”
• Dynamic programming (DP) finds best “path”
• HMM maximizes expected number of correct states (expectation maximization)

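57 Malware 56 HMM Example
• Let’s use 0,1,2 for S,M,L, respectively
• Then what is most likely state sequence given observation (0,1,0,2)?
  o Where “most likely” is in the HMM sense
• Notation: $A = \{a_{ij}\}$ where $a_{ij} = P(\text{state } q_j \text{ at } t+1 \mid \text{state } q_i \text{ at } t)$
• Notation: $B = \{b_j(k)\}$ where $b_j(k) = P(\text{observation } k \text{ at } t \mid \text{state } q_j \text{ at } t)$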

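58 Malware 57 HMM Example
• Let $X = (x_0, x_1, x_2, x_3)$ be state sequence
  o In our example, each $x_i$ is either H or C
• And, for any such X, the probability of X together with the observations $O = (O_0, O_1, O_2, O_3)$ is
  $P(X, O) = \pi_{x_0} b_{x_0}(O_0)\, a_{x_0,x_1} b_{x_1}(O_1)\, a_{x_1,x_2} b_{x_2}(O_2)\, a_{x_2,x_3} b_{x_3}(O_3)$
• For example, given observation sequence $(O_0, O_1, O_2, O_3) = (0,1,0,2)$,
  [worked probability for one state sequence, shown as an image on the slide]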

59 Malware 58 HMM Example
• For observation sequence (0,1,0,2) we find
  [table of probabilities for each state sequence, shown as an image on the slide]
• So, most likely state sequence is…
• CCCH
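
With only 16 possible state sequences, the example can be checked by brute force. The sketch below does that under an important assumption: the matrices on the slides did not survive in this transcript, so the values of A, B, and π used here are the ones from the tree-ring example in Stamp's "A revealing introduction to HMMs" (cited on a later slide) and may differ from what the slides showed.

```python
from itertools import product

STATES = "HC"
# ASSUMED values (from Stamp's tutorial example, not from this transcript).
A = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}   # state transitions
B = {"H": [0.1, 0.4, 0.5], "C": [0.7, 0.2, 0.1]}             # columns: S, M, L
PI = {"H": 0.6, "C": 0.4}                                     # initial distribution
OBS = (0, 1, 0, 2)                                            # S, M, S, L

def path_probability(path, obs=OBS):
    """pi_{x0} b_{x0}(O0) times the product over t of a_{x(t-1),x(t)} b_{xt}(Ot)."""
    p = PI[path[0]] * B[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
    return p

probs = {"".join(x): path_probability(x) for x in product(STATES, repeat=len(OBS))}

# DP sense: the single most probable path.
best_path = max(probs, key=probs.get)

# HMM sense: at each position, the state with the larger total probability.
def best_state_at(t):
    return max(STATES, key=lambda s: sum(p for x, p in probs.items() if x[t] == s))

hmm_path = "".join(best_state_at(t) for t in range(len(OBS)))
print(best_path, hmm_path)
```

Under these assumed values the single most probable path is CCCH, matching the slide, while choosing the most probable state position by position gives CHCH. That difference is the distinction between the DP sense and the HMM sense of "most likely" drawn a few slides earlier.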

60 Malware 59 HMMs
• Real strength of HMMs due to existence of efficient algorithms
• Efficient HMM algorithms exist for
  1. Given a model, score an observation sequence
  2. Find “most likely” hidden states
  3. Generate a model from “training” data
• Note: generating a model (number 3) is the sense in which HMM is “machine learning”
  o Only specify N, number of hidden states

61 Malware 60 Uses for HMMs
• Speech recognition
  o Train a model based on features (observations) extracted from speech
  o When someone speaks, extract same observations and score against the model
  o If score is high, it’s probably original speaker
• DNA sequencing/protein modeling
• Martian studying English text
• Metamorphic virus detection…

62 Malware 61 Martians and English Text?
• Martian knows nothing about English…
• …but gets a lot of English text
• Of course, decides to use HMMs to analyze the text
  o Remove all punctuation, make letters lower-case
  o Then 27 different symbols can be observed
  o Start with 2 hidden states…
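
The preprocessing step can be sketched as follows. The 27 symbols are taken to be the 26 lower-case letters plus word space; collapsing runs of whitespace into a single space and dropping digits are assumptions made here, and `to_observations` is a hypothetical name.

```python
import re

ALPHABET = "abcdefghijklmnopqrstuvwxyz "   # 26 letters plus word space = 27 symbols

def to_observations(text):
    """Lower-case the text, drop everything but letters and spaces,
    and map each remaining character to an integer in 0..26."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", "", text)       # strip punctuation, digits, etc.
    text = re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
    return [ALPHABET.index(ch) for ch in text]
```

The resulting list of integers is the observation sequence that would be fed to HMM training.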

63 Malware 62 HMM and English Text
• Choose N = 2
• Then A matrix is 2 x 2
• And B matrix is 2 x 27
• Train on about 50,000 letters of text, gives B matrix on next slide…
• What happens for N = 3, N = 4, … ?

64 Malware 63

65 Malware 64 For More Info on HMMs
• A revealing introduction to HMMs, Stamp
  o Of course, this is the best source…
• A tutorial on HMMs and selected applications in speech recognition, Rabiner
  o The standard reference

66 Malware 65 HMM-Based Detection
• Assuming we have many metamorphic viruses from same generator
• Extract opcodes and append
  o Yields one long opcode sequence
• Train HMM model using opcode sequence
• Then given an unknown file…
  o Extract its opcode sequence
  o Score its opcode sequence against the model
  o High score, then likely virus from same “family”
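
The scoring step can be illustrated with the standard forward algorithm. This is a sketch, not the authors' implementation: it assumes the model (A, B, π) has already been trained, that opcodes have been mapped to integer symbols, and that scores are normalised by sequence length so files of different sizes can be compared. The length normalisation and the function names are assumptions introduced here.

```python
import math

def log_likelihood(obs, A, B, pi):
    """log P(obs | model) via the forward algorithm with per-step scaling.

    A[i][j]: P(state j at t+1 | state i at t)
    B[i][k]: P(observing symbol k | state i)
    pi[i]:   P(initial state is i)
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")   # sequence is impossible under this model
        alpha = [a / scale for a in alpha]   # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob

def score_per_symbol(obs, A, B, pi):
    """Length-normalised log-likelihood of a non-empty observation sequence."""
    return log_likelihood(obs, A, B, pi) / len(obs)
```

A file would then be flagged as belonging to the family when its score exceeds a threshold chosen from the scores of known family members and known normal files; how that threshold is set is not covered by this sketch.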

67 Malware 66 Training, Testing, Scoring
• With 200 NGVCK files…

68 Malware 67 Detection Results
• NGVCK vs normal files

69 Malware 68 More Detection Results
• NGVCK, normal, and VCL32

70 Malware 69 HMM-Based Detection
• Highly effective
  o And HMM only requires 2 or 3 hidden states
• In fact, so effective, it should be patented
  o But it’s not (long story…)
• Did I mention that it’s effective?
• However, this method is not (yet) practical
  o Need to extract opcodes from “scanned” files
• Ongoing student project will change that…

71 Malware 70 More Info
• For more info on HMM-based detection of metamorphic malware, see Hunting for metamorphic engines, Wing Wong and Mark Stamp, Journal in Computer Virology, December 2006
  o Complete, thorough, readable, etc.

72 Malware 71 Profile HMM
• “Profile” HMMs widely used in bioinformatics
• Standard HMM does not take into account positional information
  o Markov process does not “know” (or care) where it is within observation sequence
• In bioinformatics, position within the sequence is often critical
  o Profile HMMs developed for such problems

73 Malware 72 PHMMs
• “Usual” picture of PHMM
• Includes “insert” and “delete” states
  o To allow for gaps and incorrect symbols

74 Malware 73 PHMM vs Standard HMM
• For PHMMs…
• Different B matrix for each step, that is, positional dependence
  o New B for each step: $B_t$
• Insertions and deletions allowed
• Algorithms much more complex
• Initial “alignment” of sequences is a separate problem from PHMM training

75 Malware 74 PHMMs for Metamorphic Detection?
• Might PHMM be better than HMM?
  o Possibly, stronger model if positional info taken into account
  o Analogy to biology: metamorphics “mutate”
• Might PHMM be worse than HMM?
  o More complex
  o Positional info not useful

76 Malware 75 PHMM-Based Detection
• Two students worked on this problem
  o One developed initial alignments
  o Other developed PHMM for this problem
• Both did excellent work
  o PHMM had the bigger “wow” factor
  o Student who did initial alignment caught some undeserved grief

77 Malware 76 PHMM Detection Results
• VCL32
• Good results

78 Malware 77 PHMM Detection Results
• NGVCK
• Good, but not as impressive as HMM

79 Malware 78 PHMM Bottom Line
• Interesting idea…
• Very effective on certain types of metamorphism…
• …not so effective against others
  o What morphing is hard for PHMM?
• Overall, not as effective as HMM

80 Malware 79 More Info
• Standard reference on PHMM
  o Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Durbin, et al.
• Paper on PHMM-based detection
  o Profile hidden Markov models and metamorphic virus detection, Attaluri, McGhee, Stamp, Journal in Computer Virology, May 2009

81 Malware 80 Undetectable Metamorphic?
• Goal is to create metamorphic generator that will
  o Evade signature detection
  o And evade HMM-based detection
• How to accomplish this?
  o Code must be highly metamorphic (to evade signature detection)
  o Code must look “normal” (to evade HMM/statistical/heuristic-based detection)

82 Malware 81 Metamorphic Generator I
• First attempt…
• Generator is only moderately metamorphic
• So, we iterated it several times
  o After 9 iterations, code is very metamorphic
  o But, code grows a lot due to junk insertion
• What about detection?
• See next slides…

83 Malware 82 Metamorphic Generator I
• Trained HMM on 9th generation files
• Scores vs normal files

84 Malware 83 Metamorphic Generator I
• Graph of 9th gen. scores vs normal files

85 Malware 84 Metamorphic Generator II
• Second attempt…
• Appears to be much more successful
• Better metamorphic generator
• Junk code is taken from normal files
  o Entire subroutine
  o Or a few lines of code with jumps
• See next slide…

86 Malware 85 Metamorphic Generator II
• Not sure which generator…

87 Malware 86 More Info
• Coming soon…

88 Malware 87 Ongoing Related Projects
• Use HMM to detect “provably undetectable” viruses
• Practical HMM-based detection
  o “Approximate disassembly”
• Virus with built-in buffer overflow
  o Sneaky way to reach “dead” code

89 Malware 88 Backdoor.Hacarmy.D
• Analysis of botnet code

90 Malware 89 Unpacking

91 Malware 90 Unpacking

92 Malware 91 Unpacking

93 Malware 92 Unpacking  Blah

94 Malware 93 Dumpbin  Blah

95 Malware 94 Dumpbin  Blah

96 Malware 95 Dumpbin  Blah

97 Malware 96 Dumpbin  Blah

98 Malware 97 Initial Impressions

99 Malware 98 Installation

100 Malware 99 Installation  Blah

101 Malware 100 Installation  Blah

102 Malware 101 Installation  Blah

103 Malware 102 Installation  Blah

104 Malware 103 Installation  Blah

105 Malware 104 Initializing Communication

106 Malware 105 Network Connection  Blah

107 Malware 106 Connect to Server

108 Malware 107 Connect to Server  Blah

109 Malware 108 Connect to Server  Blah

110 Malware 109 Connect to Server  Blah

111 Malware 110 Connect to Server  Blah

112 Malware 111 Joining the Channel

113 Malware 112 Joining the Channel  Blah

114 Malware 113 Communicate with Backdoor

115 Malware 114 Communication  Blah

116 Malware 115 Communication  Blah

117 Malware 116 Communication  Blah

118 Malware 117 Communication  Blah

119 Malware 118 Communication

120 Malware 119 Running SOCKS4 Server

121 Malware 120 Clearing Crime Scene

122 Malware 121 Clearing Crime Scene  Blah

123 Malware 122 Hacarmy Commands

124 Malware 123 Hacarmy Commands

125 Malware 124 Hacarmy Commands

126 Malware 125 Conclusions

