Future Issues

Bots
Likely increasingly relevant compared to worms with the commercialization of malware, since bots are finer-grained and lower-profile
Blended version: use worm to get bots (or use bots to flash worm)
Today, detectable via monitoring IRC
–near term: need IRC-on-random-port detectors (easy)
–longer term: arms race
Would like to understand the protocols and the intent behind their uses, to not only monitor but actively participate...
–... modulo pesky issue of legality
–may be able to fake up lame bots in the near term
–likely labor-intensive
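As a rough illustration of the "IRC-on-random-port detector" idea, the sketch below ignores the port number and instead checks whether the first lines of a TCP payload parse as IRC client commands. The command list, the `min_hits` threshold, the treatment of 6667 as the only "standard" port, and the `(dst_port, payload)` flow representation are all assumptions made for illustration, not part of the original slides.

```python
import re

# Commands a client (or bot) typically sends early in an IRC session.
# This particular set is an illustrative assumption.
IRC_COMMANDS = {b"PASS", b"NICK", b"USER", b"JOIN",
                b"PRIVMSG", b"PING", b"PONG", b"MODE"}

# Optional ":prefix ", then an alphabetic command word, then optional parameters.
LINE_RE = re.compile(rb"^(?::\S+ )?([A-Za-z]+)(?: .*)?$")


def looks_like_irc(payload: bytes, min_hits: int = 2) -> bool:
    """Return True if at least min_hits of the first 10 lines look like IRC commands."""
    hits = 0
    for line in payload.split(b"\n")[:10]:
        match = LINE_RE.match(line.rstrip(b"\r"))
        if match and match.group(1).upper() in IRC_COMMANDS:
            hits += 1
    return hits >= min_hits


def flag_flows(flows, standard_ports=frozenset({6667})):
    """Keep flows that speak IRC on a port outside standard_ports.

    flows is an iterable of (dst_port, payload) pairs (hypothetical format).
    """
    return [(port, payload) for port, payload in flows
            if port not in standard_ports and looks_like_irc(payload)]
```

A payload-based check like this is exactly what the "longer term: arms race" bullet anticipates: it is easy to evade by encrypting or reshaping the command channel.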

Camouflage
If we're discovered, attackers may
–avoid us, diminishing our analysis
–mislead us, skewing our analysis
–bombard us, complicating our lives
How much time do we have before attackers care about us?
–depends on the publicity we engender
–also, they may detect us incidentally, since they care about thwarting other efforts based on similar technology

Camouflage, cont'd
Hiding the telescope feeds
–would like highly diverse set of endpoints
–in fact, want this even w/o camouflage problem
–end-user clients? trust & privacy?
–dynamic endpoints (prevents long-term address-based fingerprinting)
  - exploit DHCP?
    client takes leases, tunnels them back
    modified dhcpd that tunnels unallocated addresses
    issues WRT outbound response?
Hiding the faux interaction
–how do we prevent fingerprinting based on front-end filtering?
–fingerprinting based on VM detection?
–Fundamental: fingerprinting based on containment?

Attracting traffic
How do we plug our honeyfarm capabilities into application-level topologies?
–join P2P networks, IM buddy lists, Web links, IRC?
How do we get indexed in directories?
–DNS zone files, email address books, host content so Google finds us?
Opportunistic use of hotspots?
–e.g., 126.96.36.199/23; acme.com
Any principles we can pursue rather than one-offs?
Can we leverage failures?
–e.g., SMTP servers when asked for non-existent name might occasionally tunnel to us
–e.g., Web servers on 404 might send to us
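The "leverage failures" idea can be pictured as a small routing decision made by a cooperating server: most failed requests get the normal error, and a small random fraction is handed to the honeyfarm. The sketch below is only a toy under assumed names. `route_response`, the `"honeyfarm"`/`"normal"` labels, and the 5% default are hypothetical, and the actual tunneling mechanism is left out.

```python
import random


def route_response(status_code: int, rng: random.Random,
                   handoff_probability: float = 0.05) -> str:
    """Decide where a request goes once the server has computed its status.

    Returns "honeyfarm" for a random fraction of 404s, otherwise "normal".
    Both labels and the default probability are illustrative assumptions.
    """
    if status_code == 404 and rng.random() < handoff_probability:
        return "honeyfarm"
    return "normal"
```

Keeping the handoff probabilistic, as the "occasionally" in the slide suggests, limits how much the cooperating server's visible behavior changes, which bears on the camouflage concerns raised earlier.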

Attracting traffic, cont'd
Going out and looking for trouble
–partnering with MSR Strider Honeymonkeys?
–Eliza technology for agitating eleet IRC channels?
These all add to camouflage issues
Long shot: sniff live traffic and replay it looking for trouble

Worm growth prediction
How can we predict early in the onset just how a worm will evolve?
Relates to "situational awareness" issues
–Would like to have a quantitative basis for worm "threat level"
Basic idea: take analytic model for growth and invert it
–fit measured probe behavior and solve for growth rate and size of susceptible population
Challenges
–noisy telescope data leads to big swings in prediction
–doable for pure random worms; more complex target selection tougher
–possible to incorporate service density information to improve prediction quality early in life cycle?
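One way to picture "take analytic model for growth and invert it" is with the simple epidemic model for a uniformly random-scanning worm, which is an assumption here since the slides do not name a model. With per-host scan rate s, address space size Ω and susceptible population N, early growth is roughly exponential with rate β = sN/Ω. A telescope covering T addresses sees each infected host at ρ = sT/Ω probes per second, so N ≈ βT/ρ. The function names below are hypothetical, and the fit is plain least squares on log counts.

```python
import math


def fit_exponential_rate(times, counts):
    """Least-squares slope of ln(count) versus time (early-phase growth rate beta)."""
    points = [(t, math.log(c)) for t, c in zip(times, counts) if c > 0]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    return sxy / sxx


def estimate_susceptible_population(beta, per_source_probe_rate, telescope_size):
    """Invert beta = s*N/Omega using rho = s*T/Omega, giving N = beta*T/rho."""
    return beta * telescope_size / per_source_probe_rate


if __name__ == "__main__":
    # Synthetic, noise-free example with made-up parameters.
    omega, scan_rate, population, telescope = 2 ** 32, 4000.0, 75000, 2 ** 16
    beta = scan_rate * population / omega
    times = list(range(60))
    counts = [10 * math.exp(beta * t) for t in times]
    beta_hat = fit_exponential_rate(times, counts)
    rho = scan_rate * telescope / omega
    print(beta_hat, estimate_susceptible_population(beta_hat, rho, telescope))
```

The estimate is only meaningful while infections are far below N, and noise in the counts feeds directly into β and hence N, which is the "big swings in prediction" challenge noted above.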

VM-based replay/introspection
Speculative execution potentially a great boon to containment
Introspection is a key technology for detecting malware that does not manifest in network activity
–monitor and manipulate guest OS from safety of VMM
Utility of replay
–can try malcode against "panel" of OS/application versions
–can "undo" effects (at least local effects) of infection (possible defense strategy)
–can crudely evaluate whether host infected _after_ infection, then replay with high level of instrumentation turned on
–ReVirt for Xen available soon (UMich)

Host-level Analysis
How to understand what a piece of malcode does
–Doing this in real time and in generality appears intractable
  at least, given our resources
  one could picture having technology for matching malware components against a library of programming idioms
  presumably commercial AV is way ahead of us here
–Possibly doable in empirical fashion, given replay + introspection, for understanding behavior at a high level...
  ... if code not written to evade
Extracting abstract malcode descriptions
–Take known instance of exploit (provided by honeyfarm)
–Replay to generate execution/state transition stream
–Attempt to generalize to variants exploiting the same flaw
–Distill broad host-based signature
–Honeyfarm as oracle for finding bad things, facilitates replay
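A toy version of the "generalize to variants, distill signature" steps: treat each replay as a list of event names and keep the events common to all variants, in order. The sketch folds a pairwise longest common subsequence over the traces, which is a heuristic chosen for illustration and not a method stated in the slides. The event names and the `distill_signature`/`matches` helpers are hypothetical.

```python
from functools import reduce


def lcs(a, b):
    """Longest common subsequence of two event lists (dynamic programming)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table to recover one longest common subsequence.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def distill_signature(traces):
    """Fold pairwise LCS over all traces (a heuristic, not a true multi-sequence LCS)."""
    return reduce(lcs, traces)


def matches(signature, trace):
    """True if the signature events occur in trace in order, gaps allowed."""
    remaining = iter(trace)
    return all(event in remaining for event in signature)
```

For example, traces `["recv", "open", "write", "exec", "connect"]` and `["recv", "mmap", "write", "exec", "sleep", "connect"]` share the ordered events recv, write, exec, connect. A signature this loose would need checking against benign traces, and it fails against code written to evade, as the slide already cautions.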

Legal issues
Liability issues WRT our use of honeypots
Evidentiary issues about how well some of our detection/identification technologies would serve in court

TAB Questions Revisited
Are we considering the right threats?
–what about mobile phone malware, spyware, bots, etc.?
Are there technical approaches we should be considering?
Are we missing any important partnership opportunities?
–for data, technology, or expertise?
Are we missing any key capabilities on our team?
What education/training is necessary/missing for practitioners in the field?
What should we be thinking about incorporating into our outreach workshop in this regard?