Outline
- Introduction
- Literature & Technical Review
  - Forums, IRC, Honeypots
- Project Status
  - Forums, IRC, Honeypots
- Research Projects
- Conclusion
Introduction
- As computers become more ubiquitous throughout society, the security of networks and information systems is a growing concern.
  - An increasing amount of critical infrastructure relies on computers and information technologies.
- Advancing technologies have made it much easier for hackers to commit cybercrime than in the past.
  - At the same time, the technologies and methods needed to commit cybercrime have become more accessible (Radianti & Gonzalez, 2009).
- As a result, more researchers have become interested in the cyber domain.
Introduction
- Traditional cybersecurity research has focused on technological challenges and improvements to mitigate cyberattacks (Geer, 2005).
- Overall, little work has investigated cybercriminal communities and the human element behind cybercrime (Hopper et al, 2009; Holt & Kilger, 2012).
  - Little is understood about hacker social behaviors, the cybercriminal supply chain, etc.
- Recently, security researchers have begun exploring hacker communities alongside the technological elements of security.
  - Hacker communities contain useful information about cybercriminal black markets, emerging threats, attack trends, tutorials, malware samples, etc. (Radianti & Gonzalez, 2009; Motoyama et al, 2011; Benjamin & Chen, 2012).
  - Many unique research questions can be investigated using data collected from hacker communities, providing new insights for security researchers and practitioners.
- Here we review hacker community research relevant to our project goals.
Literature Review
- Existing literature details the various facets of cybersecurity research.
- To conduct our own research, we borrow insights and methodologies commonly found in the reviewed literature, covering:
  - The human element behind cybercrime, including explorations of hacker forums, IRC channels, and other hacker social media.
  - More traditional security research, such as malware analysis, honeypot research, botnet research, and research using network logs.
  - Information on identifying data sources, data collection methodologies, analytical methods, and existing research gaps.
Forums - Identification
- Public sources
  - Researchers look to third parties for information on identifying hacker forums.
  - Radianti et al, 2007, found hacker forums cited in news and other media.
  - Others have used the Google Safe Browsing API to acquire data on malicious or cybercrime-related websites (Cova et al, 2010).
- Keyword searches
  - Another common method is to conduct a series of keyword searches.
  - For example, Holt & Lampke, 2010, crafted the keyword search “carding dump purchase sale cvv” to identify hacker black markets where stolen credit card information is sold.
  - Keyword searches are common in many similar studies of hacker forums (Fallmann et al, 2010).
- Link identification
  - Lastly, it is common practice to scrutinize known forums for links to other hacker forums and communities.
  - Many studies found that hacker forum participants often cite or refer to other hacker communities (Radianti et al, 2009; Fallmann et al, 2010; Holt et al, 2012).
  - Thus, a snowball approach that uses one forum to identify many others could be promising.
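The snowball approach amounts to a breadth-first traversal of the "forum A mentions community B" graph. The sketch below assumes a hypothetical `get_links` callable that returns the community addresses referenced on a site (in practice, links scraped from its posts); the `.example` domains are placeholders, not real forums.

```python
from collections import deque

def snowball(seeds, get_links, max_sites=100):
    """Breadth-first snowball sampling of hacker communities.

    seeds     -- forums already known to be of interest
    get_links -- callable returning the community addresses referenced by a
                 site (hypothetical; in practice, links scraped from its posts)
    """
    discovered = list(dict.fromkeys(seeds))  # keep order, drop duplicate seeds
    visited = set(discovered)
    queue = deque(discovered)
    while queue and len(discovered) < max_sites:
        site = queue.popleft()
        for linked in get_links(site):
            if linked in visited:
                continue  # already known; do not queue it twice
            visited.add(linked)
            discovered.append(linked)
            queue.append(linked)
            if len(discovered) >= max_sites:
                break
    return discovered

# Hypothetical link graph standing in for scraped forum posts.
LINKS = {
    "forum-a.example": ["forum-b.example", "irc.example.net"],
    "forum-b.example": ["forum-c.example", "forum-a.example"],
}

if __name__ == "__main__":
    print(snowball(["forum-a.example"], lambda site: LINKS.get(site, [])))
```

Starting from one seed, this prints `['forum-a.example', 'forum-b.example', 'irc.example.net', 'forum-c.example']`. The `max_sites` cap keeps the snowball from growing without bound, and every discovered site would still need the manual review described in the Project Status slides.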
Forums - Collection
- Manual collection
  - Most of the reviewed literature resorted to manual collection or observation of data.
  - Some researchers simply observe live forums without attempting any sort of collection (Holt, 2010; Yip, 2011).
  - However, more active approaches, such as registering forum accounts, are at times useful for gaining access to restricted content (Holt, 2010).
    - Such restrictions are lifted only after a participant has registered on a forum or has been part of the community for a certain length of time.
  - Other researchers move beyond observation and manually download threads (Radianti et al, 2009; Holt & Lampke, 2010; Motoyama et al, 2011).
    - It is important to store data intended for research offline, as hacker forums may spontaneously disappear or reduce their visibility (Radianti, 2010).
  - Manually collected content is often also manually coded (Radianti et al, 2009; Holt et al, 2012).
Forums - Collection
- Automated collection
  - Other researchers use more automated data collection methods.
  - For example, Benjamin & Chen, 2012, used a web crawler to automatically collect all publicly available content from American and Chinese hacker forums.
  - However, hacker forums sometimes put anti-crawling measures in place (Spencer, 2008; Fallmann et al, 2010).
    - Heavy anti-crawling measures make automated collection difficult and slow.
    - In some cases, it may be necessary to use proxy servers and other identity obfuscation techniques to avoid detection of crawling activities (Goel, 2011).
  - Anti-crawling measures seem to be a major reason why most research to date has used manual collection methods.
Forums – Anti-Crawling Measures
- Bandwidth monitoring
  - Can be circumvented by building crawlers that request pages at more human-like rates.
  - Can also be circumvented by building a “distributed crawler”:
    - One computer acts as a master and distributes hyperlinks to different computers for crawling; newly discovered hyperlinks are reported back to the master.
    - This way, it appears that different users are accessing the community, when in fact several computers are sharing the spidering work on one forum.
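The two circumventions above can be sketched together: a master that owns the URL frontier and hands pages to workers in rotation, and a jittered delay that keeps each worker's request rate irregular. This is a single-process simulation under stated assumptions; in a real deployment each worker would be a separate machine (with its own IP address) talking to the master over the network. The site map, worker names, and delay parameters are hypothetical.

```python
import random
from collections import deque

def human_like_delay(base=5.0, jitter=3.0, rng=random):
    """Pause length in seconds: a fixed base plus random jitter, so requests
    do not arrive at a perfectly regular, bot-like rate."""
    return base + rng.uniform(0, jitter)

class CrawlMaster:
    """Owns the shared frontier and hands URLs to worker machines in turn."""

    def __init__(self, seeds, worker_ids):
        self.frontier = deque(seeds)
        self.seen = set(seeds)
        self.workers = list(worker_ids)
        self.assignments = {w: [] for w in self.workers}
        self._turn = 0

    def next_job(self):
        """Return (worker, url) for the next page, or None when the frontier is empty."""
        if not self.frontier:
            return None
        url = self.frontier.popleft()
        worker = self.workers[self._turn % len(self.workers)]
        self._turn += 1
        self.assignments[worker].append(url)
        return worker, url

    def report(self, links):
        """Workers report hyperlinks found on a page; only new ones are queued."""
        for link in links:
            if link not in self.seen:
                self.seen.add(link)
                self.frontier.append(link)

def crawl(master, fetch_links, sleep=lambda seconds: None):
    """Simulate the master/worker loop; pass sleep=time.sleep for real pacing."""
    while (job := master.next_job()) is not None:
        worker, url = job
        sleep(human_like_delay())  # each worker paces its own requests
        master.report(fetch_links(worker, url))
    return master.assignments
```

With a toy forum where `/index` links to `/t/1` and `/t/2`, both of which link to `/t/3`, two workers split the work as `{"pc1": ["/index", "/t/2"], "pc2": ["/t/1", "/t/3"]}`, and `/t/3` is fetched only once because the master deduplicates reported links.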
Forums – Anti-Crawling Measures
- CAPTCHA images (verification codes)
  - Require human input to correctly enter verification codes. When a correct CAPTCHA is submitted, the client obtains a session cookie used to maintain an authenticated session with the server.
  - A crawler therefore needs to reuse such session cookies so that the verification code does not have to be entered again. If the spider cannot present the authenticated session cookie, the server treats it as a new client and requests verification through a CAPTCHA image once again.
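The session-cookie point can be demonstrated with Python's standard library: an opener built with `http.cookiejar.CookieJar` stores the `Set-Cookie` value from the CAPTCHA step and replays it on later requests, while a plain opener does not. This is a minimal sketch assuming a hypothetical local `ForumStub` server whose `/solve-captcha` path stands in for a human solving the CAPTCHA once; a real forum's paths and cookie names will differ.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class ForumStub(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a forum: /solve-captcha issues a session cookie;
    any other page shows a CAPTCHA unless that cookie is sent back."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/solve-captcha":
            self.send_header("Set-Cookie", "session=verified; Path=/")
            body = b"captcha accepted"
        elif "session=verified" in self.headers.get("Cookie", ""):
            body = b"thread contents"
        else:
            body = b"captcha required"
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

def make_opener(keep_cookies):
    # ProxyHandler({}) ignores proxy environment variables for this local demo.
    handlers = [urllib.request.ProxyHandler({})]
    if keep_cookies:
        jar = http.cookiejar.CookieJar()  # stores Set-Cookie values and replays them
        handlers.append(urllib.request.HTTPCookieProcessor(jar))
    return urllib.request.build_opener(*handlers)

def fetch_after_captcha(base_url, keep_cookies):
    opener = make_opener(keep_cookies)
    with opener.open(base_url + "/solve-captcha") as resp:  # the one-time human step
        resp.read()
    with opener.open(base_url + "/thread/1") as resp:
        return resp.read().decode()

if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), ForumStub)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    print(fetch_after_captcha(base, keep_cookies=False))  # captcha required
    print(fetch_after_captcha(base, keep_cookies=True))   # thread contents
    server.shutdown()
```

Against a real forum, a person would solve the CAPTCHA in a browser and the resulting cookie would be loaded into the crawler's jar; the crawler must then keep using that same jar for every request.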
Forums – Anti-Crawling Measures
- Paywalls
  - Some communities require a fee to browse and view content. I do not know whether these communities are legitimate or scams; I suggest we simply avoid them, as more open sources of data are available.
- Waiting periods
  - Some forums require newly registered users to wait a certain length of time before they can access all forum content.
- Closed registration/invitation-only
  - Some forums close their registration or are invitation-only. There is little we can do about this unless someone provides us with an existing account.
  - We are registering accounts on the forums we are already crawling in case they later close registration or become invitation-only.
Forums – Identity Obfuscation
- To avoid some anti-crawling measures, we must practice identity obfuscation.
  - We may need to reduce bot-like behaviors during collection.
  - We may also want to mask our true identity.
- Reducing the crawling rate helps circumvent anti-crawling measures that monitor bandwidth usage or page views.
- To mask our identity, we can route traffic through proxy servers or peer-to-peer networks.
  - This even lets us regain access to forums that ban us by IP address.
  - Stand-alone web proxies and peer-to-peer networks such as Tor are both useful for identity obfuscation.
Forums – Identity Obfuscation
Figure: Traditional proxy server configuration
Forums – Identity Obfuscation
Comparison of proxy servers and the Tor network:
- Requirements
  - Proxy servers: none.
  - Tor network: the Tor network client (~9 MB).
- Protocol
  - Proxy servers: typically HTTP or SOCKS.
  - Tor network: SOCKS only.
- Usage
  - Proxy servers: send local network traffic to the proxy server, which re-routes it to the destination server.
  - Tor network: tunnel local network traffic to the local Tor client; the Tor client automatically handles peer-to-peer networking and routes traffic to the destination server.
- What does the destination server see?
  - Proxy servers: the proxy server's IP address.
  - Tor network: the IP address of the last Tor relay used to route your message to the destination server.
- Assuming a new identity?
  - Proxy servers: a new proxy server must replace the current proxy.
  - Tor network: the Tor client can automatically select new relay nodes when a new identity is needed.
- Finding new servers?
  - Proxy servers: lists of public proxy servers exist across various websites and can be found through keyword searches (e.g. “public proxies”).
  - Tor network: the Tor client automatically finds new relays. Selection parameters can restrict relays to, or exclude relays from, specific countries.
Forums – Identity Obfuscation
- There are many stand-alone public proxy servers. However:
  - They are usually overused and thus slow.
  - They are unstable; many are short-lived.
  - New servers must constantly be found.
- It may be better to use a peer-to-peer anonymization network, such as Tor or I2P.
  - Established peer-to-peer networks are more stable than stand-alone proxies.
  - P2P network protocols often support automated server discovery.
  - Tor is perhaps the most popular such network; it requires connecting through a special Tor network client (Ling et al, 2011; Tschorsch & Scheuermann, 2011; Akhoondi et al, 2012).
  - Other, less popular networks exist, such as Freenet and I2P (Leavitt, 2009; Fu et al, 2010).
Forums – Identity Obfuscation
Figure: Screenshots of the graphical Tor controller Vidalia. Left: a map showing the locations of all published Tor relay nodes. Middle: a real-time log of Tor network events for monitoring Tor activity. Right: a basic interface that lets Tor users quickly assume a new identity by routing traffic through a new circuit.
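A crawler can request the same "new identity" that Vidalia's button provides by talking to the Tor client's control port and sending the `SIGNAL NEWNYM` command, after which Tor uses new circuits for subsequent connections (the crawler itself would send its HTTP traffic through Tor's SOCKS port, 9050 by default for the standalone client). This minimal sketch assumes `ControlPort 9051` and a `HashedControlPassword` are configured in `torrc`; it does no escaping, so the password must not contain quotes or backslashes. Tor may also rate-limit repeated NEWNYM signals, so rotating on every request is not realistic.

```python
import socket

def _control_command(stream, line):
    """Send one Tor control-protocol command and require a 250 (OK) reply."""
    stream.write(line.encode() + b"\r\n")
    stream.flush()
    reply = stream.readline().decode().strip()
    if not reply.startswith("250"):
        raise RuntimeError(f"{line.split()[0]} rejected: {reply}")
    return reply

def new_tor_identity(password, host="127.0.0.1", port=9051):
    """Programmatic equivalent of Vidalia's 'new identity' button.

    Assumes torrc enables ControlPort 9051 with HashedControlPassword set.
    """
    with socket.create_connection((host, port), timeout=10) as sock, \
            sock.makefile("rwb") as stream:
        _control_command(stream, f'AUTHENTICATE "{password}"')
        return _control_command(stream, "SIGNAL NEWNYM")  # ask Tor for fresh circuits
```

Typical use would be to call `new_tor_identity(...)` whenever a forum starts refusing requests from the current exit relay, then resume crawling through the SOCKS port.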
Forums – Analytical Methods
- After hacker forum content is collected, it can be analyzed using content and network analyses.
- Content analysis is useful for understanding the discussions and information inside hacker social media.
  - In the literature we reviewed, these studies tend to employ manual collection and analytical methods.
  - They generally perform simple counting and statistics on coded content (Holt & Lampke, 2010; Radianti, 2010; Imperva, 2012).
- Network analyses often aim to observe the relationships between forum participants (Motoyama et al, 2011; Holt et al, 2012).
  - Both manual observation and automated techniques have been used.
  - They help to better understand community social structures and hacker interaction behaviors.
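As a starting point for the network analyses above, a "who replies to whom" graph can be built directly from collected posts. The sketch assumes posts are available as `(thread_id, author)` pairs in posting order and makes a simplifying assumption: every reply targets the thread starter (real forums also have quote and reply-to links, which a richer parser would use). The function names and sample data are hypothetical.

```python
from collections import Counter, defaultdict

def reply_network(posts):
    """Directed, weighted 'who replies to whom' edges.

    posts -- (thread_id, author) pairs in posting order. Simplifying
             assumption: every reply in a thread targets the thread starter.
    """
    starters = {}
    edges = Counter()
    for thread_id, author in posts:
        starter = starters.setdefault(thread_id, author)  # first poster starts the thread
        if author != starter:
            edges[(author, starter)] += 1
    return edges

def distinct_repliers(edges):
    """In-degree: how many different members replied to each author."""
    repliers = defaultdict(set)
    for replier, target in edges:
        repliers[target].add(replier)
    return {author: len(people) for author, people in repliers.items()}

def post_counts(posts):
    """Simple counting of the kind used in content analysis: posts per author."""
    return Counter(author for _, author in posts)
```

For posts `[(1, "alice"), (1, "bob"), (1, "carol"), (1, "bob"), (2, "bob"), (2, "alice")]`, alice receives replies from two distinct members and bob from one, a first hint at which participants attract attention in the community.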
IRC - Identification
- Cybersecurity research on IRC channels often focuses on questions relating to both hacker communities and botnets.
  - Hacker community IRC research is similar to forum studies: researchers attempt to locate hacker discussions and cybercriminal black markets.
  - Botnet-related research focuses more on identifying botnet command & control (C&C) channels.
    - C&C channels are chat rooms that cybercriminals often use to control large groups of malware-infected “zombie” computers.
- In either case, finding relevant IRC channels to collect data from is a challenge.
IRC - Identification
Figure: An example of a hacker IRC channel. A list of users, their messages, and timestamps for each message can be seen. The participants are discussing sqlmap, a tool for automated SQL injection and database hijacking, as well as programming concepts. The top header also includes links to other IRC channels affiliated with this one.
IRC - Identification
- As stated earlier, hacker forum participants often cite and provide URLs of other hacker resources.
  - This includes IRC channels (Radianti et al, 2009; Radianti, 2010).
  - Often, a hacker forum has an associated IRC channel, or forum participants simply mention other private channels.
- Some researchers collect content from IRC channels at random and perform content analysis to determine whether a channel is hacker-related (Fallmann et al, 2010).
  - Automated bots can be used to log IRC chat data.
  - A machine learning classifier can then be used to check the content.
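To illustrate the classifier step, here is a from-scratch multinomial naive Bayes that labels logged chat text as hacker-related or not. Naive Bayes is swapped in as a simple, common text classifier; the reviewed studies do not necessarily use it, and the training snippets below are invented for illustration. A usable model would need a large, manually labeled sample of IRC logs.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.labels}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, doc):
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.labels:
            counts = self.word_counts[c]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[c] / n_docs)        # log prior
            for word in tokenize(doc):
                score += math.log((counts[word] + 1) / denom)    # smoothed log likelihood
            if score > best_score:
                best_label, best_score = c, score
        return best_label

# Invented training snippets, for illustration only.
TRAIN = [
    ("anyone selling fresh cc dumps with cvv", "hacker"),
    ("new sqlmap tamper script bypasses the waf", "hacker"),
    ("crypter fud rat for sale pm me", "hacker"),
    ("what time is the football match tonight", "other"),
    ("anyone know a good pizza place downtown", "other"),
    ("new album from the band is great", "other"),
]
```

After `model = NaiveBayes().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])`, a line such as "selling cvv dumps" is labeled `hacker`, and a channel could be flagged when most of its logged lines are.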
IRC - Identification
- A different research focus for some security researchers is to identify botnet command and control (C&C) channels.
  - These channels are used by cybercriminal “botmasters” to give commands to collections of malware-infected computers that covertly join the IRC channel and wait for instruction.
IRC - Identification
- C&C identification techniques have generally used honeypots.
  - Honeypots are systems configured to simulate computer systems with software vulnerabilities.
  - They let malware in the wild intentionally exploit honeypot vulnerabilities; malware behaviors can then be captured and studied in a sandboxed environment (Rajab et al, 2006; Lu et al, 2009).
  - All code execution, system changes, and network traffic are tracked and logged within a honeypot (Mielke & Chen, 2008; Zhu et al, 2008).
- By observing outbound network traffic generated by malware, researchers may reveal botnet C&C channels and other hacker-related web addresses.
IRC - Collection
- There are two common techniques for collecting IRC chat data; both involve logging real-time chat:
  - Logging IRC chat in real time, manually or with automated bots (Fallmann et al, 2010).
  - Scraping IRC packet contents from a honeypot's local network traffic (Lu et al, 2009).
- Several strategies help bots ensure comprehensive data collection (Fallmann et al, 2010):
  - Swap strategy: some IRC channels automatically disconnect users who appear idle, so occasionally rotating bots into different IRC channels for logging avoids some problems with idling.
  - Using multiple bots in the same channel helps ensure comprehensive collection in case some bots get disconnected.
- Packet scraping requires network traffic analyzer software.
  - Wireshark is a popular, freely available tool for capturing network packets.
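A chat-logging bot needs only a small part of the IRC protocol: register with `NICK` and `USER`, join the channel once the server sends the `001` welcome reply, answer `PING` with `PONG` so it is not dropped, and record `PRIVMSG` lines. The sketch below is a minimal single-channel logger under those assumptions; the swap strategy and multi-bot redundancy described above are not implemented, and the server, nick, and channel names are hypothetical.

```python
import socket
import time

def parse_irc(line):
    """Split a raw IRC line into (prefix, command, params, trailing)."""
    prefix = ""
    if line.startswith(":"):
        prefix, _, line = line[1:].partition(" ")
    trailing = None
    if " :" in line:
        line, trailing = line.split(" :", 1)
    parts = line.split()
    if not parts:
        return prefix, "", [], trailing
    return prefix, parts[0].upper(), parts[1:], trailing

def handle_line(line, log, channel):
    """Return a reply for the server (or None); append channel messages to log."""
    prefix, command, params, trailing = parse_irc(line)
    if command == "PING":  # answer keep-alives or be disconnected
        return f"PONG :{trailing if trailing is not None else ' '.join(params)}"
    if command == "001":   # registration accepted: now join the channel
        return f"JOIN {channel}"
    if command == "PRIVMSG" and params and params[0].lower() == channel.lower():
        nick = prefix.split("!", 1)[0]
        log.append((time.time(), params[0], nick, trailing))
    return None

def run_logger(server, port, nick, channel, log):
    """Connect, register, and log one channel until the server closes the link."""
    with socket.create_connection((server, port), timeout=600) as sock:
        def send(msg):
            sock.sendall((msg + "\r\n").encode())
        send(f"NICK {nick}")
        send(f"USER {nick} 0 * :{nick}")
        buffer = b""
        while chunk := sock.recv(4096):
            buffer += chunk
            *lines, buffer = buffer.split(b"\n")  # keep any partial line for later
            for raw in lines:
                line = raw.decode("utf-8", "replace").rstrip("\r")
                if line:
                    reply = handle_line(line, log, channel)
                    if reply:
                        send(reply)
```

Running one `run_logger` per channel (or several per channel for redundancy) in separate threads or processes would approximate the multi-bot setup described by Fallmann et al.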
IRC – Analytical Methods
- Different forms of analysis should be used depending on research goals and data. For example, the goals and methods of analysis differ between:
  - Botnet research with data from command & control channels.
  - Research on IRC channels affiliated with hacker forums or acting as social hubs.
- As with hacker forums, the simplest method of analysis is to manually sift through data (Franklin et al, 2007; Fallmann et al, 2010; Motoyama et al, 2011).
- Automated content and network analyses could be extended to IRC datasets when studying hacker IRC channels.
  - They can reveal emerging threats, popular tools, and methods.
  - They may help with attack attribution.
IRC – Analytical Methods
- For botnet C&C channels, there are two common themes of analysis:
  - Characterizing botmaster activity
    - Paxton et al, 2011, investigate the different operational styles of botmasters by computing usage statistics per botmaster.
    - Mielke & Chen, 2008, use clustering to identify potential collaboration between botmasters based on their participation across different known C&C channels.
  - Identifying botnets based on network traffic
    - Much research analyzes honeypot captures and network logs to develop new techniques to combat evolving botnets (Lu et al, 2009; Choi & Lee, 2012).
    - Botnets are becoming increasingly sophisticated at evading detection.
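Per-botmaster usage statistics of the kind mentioned above can be computed from parsed C&C logs with a few counters. This sketch is in the spirit of that idea, not Paxton et al's actual metrics: it assumes log records as `(timestamp, botmaster, channel, command_text)` tuples and treats the first token of each command as its name. The `!scan`/`!ddos` command syntax and the sample records are hypothetical, since command syntax varies by botnet.

```python
from collections import Counter, defaultdict
from datetime import datetime

def botmaster_profiles(commands):
    """Per-botmaster usage statistics from C&C logs.

    commands -- (timestamp, botmaster, channel, text) tuples, where text is a
                command issued to the bots (command syntax varies by botnet).
    """
    raw = defaultdict(lambda: {"n": 0, "channels": set(), "hours": set(), "verbs": Counter()})
    for ts, master, channel, text in commands:
        s = raw[master]
        s["n"] += 1
        s["channels"].add(channel)
        s["hours"].add(ts.hour)                   # when is this botmaster active?
        s["verbs"][text.split()[0].lower()] += 1  # first token = command name
    return {
        master: {
            "commands": s["n"],
            "channels": len(s["channels"]),
            "active_hours": sorted(s["hours"]),
            "top_command": s["verbs"].most_common(1)[0][0],
        }
        for master, s in raw.items()
    }

# Hypothetical parsed log records.
SAMPLE = [
    (datetime(2012, 5, 1, 2, 15), "m1", "#cc1", "!scan 10.0.0.0/8"),
    (datetime(2012, 5, 1, 2, 40), "m1", "#cc2", "!scan 192.168.0.0/16"),
    (datetime(2012, 5, 1, 14, 5), "m1", "#cc1", "!download http://example.com/a.exe"),
    (datetime(2012, 5, 2, 9, 0), "m2", "#cc3", "!ddos example.org"),
]

if __name__ == "__main__":
    for master, profile in botmaster_profiles(SAMPLE).items():
        print(master, profile)
```

Profiles like these (volume, spread across channels, active hours, favorite command) give simple features for comparing operational styles across botmasters.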
Honeypots
- Honeypots are computers or clients set up to attract and log cyberattacks in real time.
  - They often emulate, or are exposed to, live security vulnerabilities in order to capture and monitor both malware and cyber-attackers.
  - They can be used to monitor attacks on various protocols, applications, or operating systems.
- As mentioned in the botnet literature, honeypot log files can be useful for identifying new botnets and observing malware.
  - Infected honeypots may transmit data to or receive data from botnet C&C channels; analyzing this network data could reveal new botnets.
  - Malware execution behaviors can be logged and studied for malware research.
Honeypots
- Two types of honeypots exist (Zhuge et al, 2008; Cova et al, 2010):
  - Low-interaction honeypots
    - Assess malicious threats only at a shallow level. They record occurrences of attacks and some associated metadata, but the data captured by a low-interaction honeypot is typically limited to recording that a malware or cybercriminal incident occurred.
    - Easy to set up. For example, a low-interaction honeypot could be a web crawler that randomly surfs the Internet to find websites that attempt drive-by attacks (web pages that try to remotely execute code through browser or other application vulnerabilities). However, because it is low-interaction, it does not allow the malware to execute; data capture is limited to logging the initial event.
  - High-interaction honeypots
    - Typically provide a much more comprehensive behavioral analysis of malware or hacker behavior. They may record all system changes, registry hooks, library calls, etc. made by malware or occurring during a cyberattack.
    - More difficult to set up, as they require significantly more overhead (essentially an entire operating system instance must be dedicated to acting as a high-interaction honeypot).
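To make "low-interaction" concrete, the sketch below is about the smallest possible server-side honeypot: it shows an SSH-style banner, records who connected and which client version they announced, and hangs up. Nothing the visitor sends is executed, which is exactly why its data capture stops at "an incident occurred". This is a hypothetical illustration, not how Kippo works internally; the decoy banner string and the port in the usage note are assumptions.

```python
import socketserver
from datetime import datetime, timezone

EVENTS = []  # a real deployment would append to a log file or database

class FakeSSHBanner(socketserver.StreamRequestHandler):
    """Low-interaction 'honeypot': show an SSH banner, record who connected
    and which client version they announced, then hang up. No shell or
    login is emulated, so nothing the visitor sends is ever executed."""

    banner = b"SSH-2.0-OpenSSH_5.1p1 Debian-5\r\n"  # hypothetical decoy version string
    timeout = 10  # socket timeout in seconds, applied by StreamRequestHandler.setup()

    def handle(self):
        self.wfile.write(self.banner)
        try:
            client_version = self.rfile.readline(256).decode("ascii", "replace").strip()
        except OSError:  # timeout or reset: the contact itself is still recorded
            client_version = ""
        EVENTS.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "src_ip": self.client_address[0],
            "client_version": client_version,
        })
```

It could be started with `socketserver.ThreadingTCPServer(("0.0.0.0", 2222), FakeSSHBanner).serve_forever()` on a dedicated, isolated host; the port 2222 is an arbitrary example.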
Honeypots – Analytical Methods
- As mentioned in the IRC literature review, monitoring honeypot network traffic logs (or network logs in general) can reveal the addresses of botnet C&C channels.
- Honeypots can also provide other log data useful for analysis:
  - Identifying the class of unknown malware by analyzing malware execution behavior logs with machine learning classification and clustering (Rieck et al, 2011).
  - Automated identification of advanced persistent threats (APTs), an increasingly important area of security research (Binde et al, 2011; Hutchins et al, 2011).
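One way to classify unknown malware from behavior logs is to embed each report (a sequence of observed API calls) as counts of short call sequences (q-grams) and compare reports by cosine similarity. The sketch below is a deliberately simplified 1-nearest-neighbor version of that idea; it is not Rieck et al's exact method, and the API-call sequences and family labels are hypothetical.

```python
import math
from collections import Counter

def qgrams(calls, q=2):
    """Embed a behavior report (sequence of API calls) as counts of q-grams."""
    return Counter(tuple(calls[i:i + q]) for i in range(len(calls) - q + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(report, labeled_reports, q=2):
    """1-nearest neighbor: give the unknown sample the family of its most similar labeled report."""
    vec = qgrams(report, q)
    return max(labeled_reports, key=lambda item: cosine(vec, qgrams(item[0], q)))[1]

# Hypothetical labeled behavior reports from a honeypot sandbox.
LABELED = [
    (["RegOpenKeyExA", "RegSetValueExA", "CreateFileA", "WriteFile"], "dropper"),
    (["socket", "connect", "send", "recv", "send"], "irc-bot"),
]
```

An unknown sample whose log reads `socket, connect, send, recv` shares three call pairs with the `irc-bot` report and none with the `dropper` report, so it is labeled `irc-bot`. Clustering the same vectors instead of nearest-neighbor labeling would group samples without needing labels.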
Project Status
- After reviewing the literature, we planned to collect data through:
  - Hacker forums
  - Hacker IRC channels
  - Honeypots
- We are interested in data relating to:
  - Hacker social media
  - Botnet C&C channels
  - Honeypot log data
Project Status - Forums
- We identified 20 hacker forums from 4 geopolitical regions that seemed of research interest.
  - 5 forums each from China, the Middle East, Russia, and the U.S.
  - Languages: Mandarin, Arabic, Farsi, Russian, English
- Forums were manually explored to review activity levels, depth of discussions, black market activity, social mechanisms such as ‘friending’ or ‘liking’, and other interesting aspects.
- Relevant forums were chosen for collection using an automated crawler.
  - However, unlike traditional crawling, we must account for anti-crawling measures.
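Project Status - Forums
Forum | Region | Status | Anti-Crawling Measures?
bbs.51cto.com | China | Parsed | N
cnhonkerarmy.com | China | |
linuxprobe.com | China | |
bbs.hackdark.com | China | |
heishou.org | China | |
v4-team.com | Middle-East/Iran (Arabic) | | Y
ashiyane.org/forums | Middle-East/Iran (Persian) | |
forums.mihandownload.com | Middle-East/Iran (Persian) | |
shabgard.org/forums | Middle-East/Iran (Persian) | |
arhack.net/vb | Middle-East/Iran (Persian) | |
antichat.ru | Russia | |
exploit.in/forum | Russia | Spidering |
zloy.bz | Russia | |
forum.xeka.ru | Russia | |
forum.xakepok.net | Russia | |
ic0de.org | United States | |
anon-hackers.com | United States | |
vctools.net | United States | |
elitehackforums.com | United States | |
hackhound.org | United States | |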
Project Status - Forums
Forums with anti-crawling measures:
- v4-team.com (Middle-East/Iran, Arabic)
  - Anti-crawling measure: bans IP addresses associated with non-human browsing activity.
  - Work-around: slowed the crawler down to only a couple of pages every few seconds. The community has been collected.
- antichat.ru and exploit.in/forum (Russia)
  - Anti-crawling measure: the forum blocks US-based traffic.
  - Work-around: use Tor to route traffic through other countries before it reaches the destination server.
- anon-hackers.com (United States)
  - Anti-crawling measure: some content is restricted from new or young accounts.
  - Work-around: wait for the account to age on the forum.
Project Status - Forums
Forum Name | # of Members | # of Threads | # of Posts | Language
51cto | 298,856 | 239,828 | 2,641,548 | Chinese
Cnhonkerarmy | 94,199 | 65,213 | 1,117,135 | Chinese
Hackdark | 14,515 | 11,499 | 233,108 | Chinese
Heishou | 10,589 | 29,215 | 190,465 | Chinese
Linuxprobe | 17,174 | 22,358 | 702,986 | Chinese
Xeksec.ru | 14,732 | 39,240 | 50,594 | Russian
forum.xakepok.net | 3,839 | 13,665 | 46,076 | Russian
zloy.bz | 14,302 | 69,385 | 485,231 | Russian
antichat.ru | 25,500 | 19,515 | 199,929 | Russian
Project Status - Forums
Forum Name | # of Members | # of Threads | # of Posts | Language
Arhack.net | 19,649 | 46,285 | 429,507 | Persian
Shabgard.org | 2,922 | 11,895 | 77,903 | Persian
Mihandownload | 5,292 | 181,460 | 169,279 | Persian
ashiyane.org | 16,404 | 10,605 | 172,478 | Persian
v4-team.com | 12,934 | 33,999 | 102,757 | Arabic
ic0de.org | 771 | 3,166 | 12,347 | English
hackhound.org | 678 | 990 | 7,067 | English
anon-hackers.com | 1,109 | 883 | 3,949 | English
vctools.net | 6,563 | 10,383 | 42,935 | English
elitehackforums | 1,676 | 3,923 | 14,267 | English
Deeper summarization of each forum (popular topics, top users, etc.) is forthcoming.
Project Status - Forums
Figure: Black market discussions on a Russian hacker forum (antichat.ru). Most discussions concern the sale of stolen and pirated software. In one popular thread, various stolen security products are offered for sale.
Project Status - Forums
Figure: An Egyptian hacker shares a personal project on an American hacking forum (elitehackforums.com).
This is an interesting case of hackers sharing tools with hackers outside their own geopolitical region. The poster also appears to be affiliated with a group called the “Egyptian Shell Team.” It would be interesting to analyze occurrences of this hacker, software, and team name in the hacker social media of other geopolitical regions. Searches for the “Egyptian Shell Team” may also reveal a group-run forum or IRC channel, perhaps used for coordination.
Project Status - IRC
- Hacker IRC channels can be detected by searching through hacker forums.
  - Some hacker forums host an official IRC channel.
  - Forum members often share IRC channels they are affiliated with.
  - IRC channels can be searched for automatically: the typical IRC server address pattern is irc.server.com:port_number, while the typical channel name is #channelname. (Hashtags were used on IRC before they appeared on Twitter and other social media.)
- There are two ways to collect data, as described in the literature:
  - Leaving chat-logging bots inside IRC channels to collect data in real time.
  - Collecting IRC-related network packets using a honeypot.
  - Data collection rates depend on the popularity of IRC channels, which varies widely between channels.
- Potential research direction:
  - There are few, if any, studies comparing how hackers use different communication media.
  - Check for correlation between IRC discussions and forum discussions, and describe any differences.
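The address patterns above translate directly into regular expressions for scanning collected forum posts. This is a minimal sketch assuming servers follow the `irc.<domain>[:port]` convention and channels begin with `#`; servers with other hostnames, or `&`-prefixed channels, would be missed, and every hit still needs manual review. The sample post text is invented.

```python
import re

# irc.<domain>[:port]; the port is optional in forum posts
SERVER_RE = re.compile(r"\birc(?:\.[a-z0-9-]+)+\.[a-z]{2,}(?::\d{2,5})?\b", re.IGNORECASE)
# '#' not preceded by a word character (skips URL fragments like page.html#top)
CHANNEL_RE = re.compile(r"(?<![\w#&])#[^\s,:#]{2,50}")

def find_irc_refs(text):
    """Return (servers, channels) mentioned in a forum post."""
    servers = [m.group(0).lower() for m in SERVER_RE.finditer(text)]
    channels = [c.rstrip(".!?);") for c in CHANNEL_RE.findall(text)]  # drop sentence punctuation
    return servers, channels

if __name__ == "__main__":
    post = "Join us on irc.example.net:6667 in #sec-talk or #exploits. Also see page.html#top"
    print(find_irc_refs(post))
```

For the sample post this yields the server `irc.example.net:6667` and the channels `#sec-talk` and `#exploits`, while the URL fragment `#top` is ignored. Running it over an entire forum crawl would produce a candidate list for the chat-logging bots.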
Project Status - IRC
Figure: A hacker on the American community VCTools.net posts information for other forum participants to join a hacking-related IRC channel.
Project Status - IRC
Figure: The front end of a web-based IRC client that connects users to Antichat.ru’s IRC server.
Project Status - IRC
- Aside from hacker social rooms, IRC servers also often host botnet C&C channels.
  - The literature above describes using honeypots to intentionally execute malware and monitor network traffic for packets to/from a potential botnet C&C channel.
  - This requires selecting and deploying honeypot clients.
- We currently have 2 honeypots running the Kippo honeypot software (hosted on DigitalOcean.com).
  - Low-interaction honeypot
  - Captures SSH protocol activity
  - Selected for its simple setup and the popularity of the SSH port as an attack target
  - Collected ~12,000 events so far in two months (mostly brute-force attacks)
Project Status - Honeypots
Brute force example:
:39: [HoneyPotTransport,491, ] starting service ssh-userauth
:39: [SSHService ssh-userauth on HoneyPotTransport,491, ] paul trying auth password
:39: [SSHService ssh-userauth on HoneyPotTransport,491, ] login attempt [paul/cacutza] failed
:39: [-] paul failed auth password
:39: [-] unauthorized login:
:39: [SSHService ssh-userauth on HoneyPotTransport,491, ] paul trying auth password
:39: [SSHService ssh-userauth on HoneyPotTransport,491, ] login attempt [paul/paul] failed
:39: [-] paul failed auth password
:39: [-] unauthorized login:
:39: [HoneyPotTransport,491, ] connection lost
:50: [kippo.core.honeypot.HoneyPotSSHFactory] New connection: :39126 ( :22) [session: 492]
:50: [HoneyPotTransport,492, ] Remote SSH version: SSH-2.0-libssh2_1.4.1
:50: [HoneyPotTransport,492, ] kex alg, key alg: diffie-hellman-group1-sha1 ssh-rsa
:50: [HoneyPotTransport,492, ] outgoing: aes128-ctr hmac-sha1 none
:50: [HoneyPotTransport,492, ] incoming: aes128-ctr hmac-sha1 none
:50: [HoneyPotTransport,492, ] NEW KEYS
:50: [HoneyPotTransport,492, ] starting service ssh-userauth
:50: [SSHService ssh-userauth on HoneyPotTransport,492, ] office trying auth password
:50: [SSHService ssh-userauth on HoneyPotTransport,492, ] login attempt [office/cacutza] failed
:50: [-] office failed auth password
:50: [-] unauthorized login:
:50: [SSHService ssh-userauth on HoneyPotTransport,492, ] office trying auth password
:50: [SSHService ssh-userauth on HoneyPotTransport,492, ] login attempt [office/test123] failed
:50: [-] office failed auth password
:50: [-] unauthorized login:
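The brute-force pattern in the Kippo excerpt can be summarized automatically by matching the `login attempt [user/password] failed` lines. This sketch assumes that line format, plus a `succeeded` variant for successful logins; it treats everything after the first `/` as the password, so usernames containing `/` or passwords containing `]` would be mis-split.

```python
import re
from collections import Counter

# Matches e.g. "login attempt [paul/cacutza] failed"
ATTEMPT_RE = re.compile(
    r"login attempt \[(?P<user>[^/\]]*)/(?P<password>[^\]]*)\] (?P<result>succeeded|failed)"
)

def summarize_attempts(lines):
    """Count usernames, passwords, and outcomes across Kippo log lines."""
    users, passwords, results = Counter(), Counter(), Counter()
    for line in lines:
        m = ATTEMPT_RE.search(line)
        if m:
            users[m["user"]] += 1
            passwords[m["password"]] += 1
            results[m["result"]] += 1
    return users, passwords, results

# Lines taken from the excerpt above.
SAMPLE = [
    "[SSHService ssh-userauth on HoneyPotTransport,491, ] login attempt [paul/cacutza] failed",
    "[SSHService ssh-userauth on HoneyPotTransport,491, ] login attempt [paul/paul] failed",
    "[HoneyPotTransport,492, ] NEW KEYS",
    "[SSHService ssh-userauth on HoneyPotTransport,492, ] login attempt [office/cacutza] failed",
    "[SSHService ssh-userauth on HoneyPotTransport,492, ] login attempt [office/test123] failed",
]
```

On these lines, `cacutza` is tried against two different accounts, a small example of the shared password lists that brute-force tools cycle through. Run over the ~12,000 collected events, the same counters would give the most-targeted usernames and most-used passwords.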
Project Status - Honeypots
- The next honeypot to be implemented is HoneyC, a tool by the Honeynet Project, whose work is often used by security researchers.
  - Low-interaction honeypot, more advanced and customizable than Kippo.
- After HoneyC, another honeypot worth examining is Capture-HPC.
  - Another Honeynet Project tool, but a high-interaction honeypot client.
  - Requires more time and expertise to set up properly.
  - Can provide data useful for malware analysis and IRC channel identification.
Research Project – Reputation Study
- As previously mentioned, many cybercriminal assets are freely available within hacker forums (Radianti & Gonzalez, 2009; Motoyama et al, 2011).
  - Content is often shared so that others can learn new techniques or help improve a shared technique.
  - Even legitimate tools, such as search engines, are discussed for use in cybercriminal attacks.
- Because cybercriminal assets are shared openly, successfully committing a cybercriminal act is much more accessible and easier than in the past (Zhuge et al, 2008; Moore & Clayton, 2009).
  - This has predictably led to an increased incidence of cybercrime, a growing problem for society.
- However, the increased accessibility of hacking software and techniques has also led to increased competition among cybercriminals.
  - For example, computer worms written by rival cybercriminals have been observed to seek out and uninstall one another from victim computers (Crandall et al, 2011).
- It seems counter-intuitive that cybercriminals would provide free assistance and resources to eventual competitors.
Research Project – Reputation Study
Figure annotations: hacking tool interface; description of code functionality; hacker’s reputation score; embedded sample of code; attached hacking tool.
Figure: Left: A cybercriminal on hackhound.org publishes the latest version of his hacking tool, meant to help others steal cached passwords on victims’ computers. Right: A hacker of the Chinese community Unpack.cn posts sample code demonstrating how to reverse engineer software written in the Microsoft .NET framework.
Research Project – Reputation Study
- Perspectives from social psychology may help explain this phenomenon.
  - Social exchange theory states that all human relationships are the result of cost-benefit analysis and consideration of opportunity costs (Emerson, 1976).
  - The theory suggests that hackers perform a cost-benefit analysis of aiding other hackers, and that those who choose to help have judged the benefits to outweigh the costs.
- What may be the underlying motive for hackers to help one another?
  - Past hacker literature reveals that peer approval and reputation are central to hacker social circles.
  - The reputation of sellers and buyers is a large factor in facilitating successful cybercriminal black market transactions (Radianti, 2010; Yip et al, 2013).
  - Some hackers organize into groups to launch sophisticated, financially or politically motivated cyberattacks; such groups often seek to recruit members known to be technically skilled, educated, and able to use sophisticated cyberattack methods (Choo & Smith, 2008).
Research Project – Reputation Study
- Hackers may be sharing knowledge to gain reputation.
  - Previous social science research has found that individuals who contribute to the cognitive advancement of their community experience increases in reputation, often leading to leadership positions (Muller, 2006).
  - In the forum context, contributing to the cognitive advancement of the community depends on crafting quality, insightful posts, since posts are the primary way to communicate with other forum participants.
  - Thus, forum messages should be evaluated for their relevance, quality, and contribution.
- Since forum posts may play a large role in building reputation, a systematic method to analyze forum posts must be created.
  - Media synchronicity theory (MST) helps explain aspects of effective communication performance across various forms of media (Fuller et al, 2008).
  - It is especially useful for describing features of quality communication performance in new forms of media, including digital content.
  - I am currently reviewing other papers to find features used in previous research that I can borrow (HTML features, network features, content features, etc.).
Research Project – Reputation Study
- Preliminary study presented at IEEE Intelligence and Security Informatics, 2012
  - We collected two hacker communities, from the United States and China, to examine the mechanisms by which key actors arise.
  - Identified several features from the literature that may contribute to hacker reputation.
  - Ran a regression to identify which features contributed most to hacker reputation.
  - Found that hackers who participated frequently and also contributed the most to the cognitive advancement of their community had the highest reputation.
    - Those who posted the most tools, source code, and other helpful content saw the biggest gains in reputation.
    - Others who participated regularly and were active in their community also appear to become reputable.
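The regression step can be sketched as ordinary least squares of a reputation score on per-hacker features. The study's actual features and model specification are not given here, so this is a generic sketch assuming numpy and hypothetical feature columns (for example, posts per month and number of tools or code samples shared). Standardizing the features first makes coefficient magnitudes comparable, which is one way to ask which feature "contributed most".

```python
import numpy as np

def fit_reputation_model(features, reputation, standardize=False):
    """OLS fit of reputation ~ b0 + b1*x1 + ... + bk*xk.

    features   -- n x k matrix, one row per hacker (hypothetical features)
    reputation -- length-n vector of reputation scores
    standardize=True z-scores each feature so coefficient sizes are comparable
    (assumes no feature is constant).
    """
    X = np.asarray(features, dtype=float)
    if standardize:
        X = (X - X.mean(axis=0)) / X.std(axis=0)
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, np.asarray(reputation, dtype=float), rcond=None)
    return coef[0], coef[1:]
```

With made-up data generated as `reputation = 2 + 0.5 * posts_per_month + 3 * tools_shared`, the fit recovers an intercept of 2 and coefficients of 0.5 and 3; with `standardize=True`, the tools-shared coefficient is the larger one, mirroring the kind of comparison the study draws. A real analysis would also report standard errors and fit statistics.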
Research Project – Malware Analysis
Figure annotations: program loads library for network communications; attack payload; low-level instructions to access the vulnerable application’s memory space.
Figure: An example of a Perl exploit that attempts a remote buffer overflow attack on popular enterprise mail server software for Windows and Unix. Malicious code such as this can be difficult for researchers to interpret in their explorations. Automated static analysis tools can help in such scenarios.
Research Project – Malware Analysis
- Machine learning for attack vector identification in malware source code
  - Preliminary study presented at IEEE Intelligence and Security Informatics, 2012
- Source code files for various malicious programs and scripts are among the resources shared.
  - However, research on malicious source code is limited, largely due to both technical and language limitations.
  - Documentation may not be provided with source code, or may be written in a foreign language.
  - Automated tools that help security professionals and researchers overcome such limitations would be a great asset.
- We collected nearly 4,000 malicious source code files written in three different programming languages, using four distinct attack vectors (local memory attacks, remote code execution attacks, web application exploits, and denial-of-service scripts).
- Prior research notes that feature selection for malware analysis is difficult, so we use a genetic algorithm to select the optimal feature set for classifying exploits by attack vector.
- Used SVM and C4.5 decision tree algorithms for classification; accuracy varied between 80% and 95%.
  - Accuracy could be improved by including program control-flow features, opcodes, etc.
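Genetic-algorithm feature selection evolves bit masks (1 = keep a feature) toward higher classifier accuracy. This is a generic sketch, not the study's implementation: the selection, crossover, and mutation choices (truncation selection, one-point crossover, bit-flip mutation, elitism) and all parameter values are assumptions. The `fitness` callable is where an SVM or C4.5-style decision tree would be trained and cross-validated on the kept features; that part is not shown.

```python
import random

def ga_select(n_features, fitness, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Genetic-algorithm feature selection over bit masks (1 = keep feature).

    fitness(mask) should score a mask, e.g. cross-validated accuracy of a
    classifier trained only on the kept features (not shown here).
    Requires n_features >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]       # truncation selection
        children = [ranked[0][:]]               # elitism: best mask survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
        best = max([best] + pop, key=fitness)
    return best, fitness(best)
```

Because the best mask is always carried forward, the returned score can never be worse than the best mask in the initial random population. Since each fitness call may mean training a classifier, caching scores per mask is a practical addition for real feature sets.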
Research Project – Botnet C&C Channels
- Identifying social networks in botnet C&C channels
  - A botnet monitoring group, the ShadowServer Foundation, provided the AI Lab with logs from multiple botnet C&C channels.
  - Text mining techniques were used to differentiate bot masters from connected zombie computers.
  - Bot master names were tracked across all channels; several names appeared frequently across the data set.
  - By clustering bot masters according to their channel participation, potential collaboration between bot masters can be identified.
  - The roles of individuals within each group, and the overall operational style of each group, can be identified by further analyzing C&C logs.
  - Additionally, logs could be used to identify C&C activity patterns; this could help automatically identify future C&C channels.
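Clustering bot masters by channel participation can be sketched as follows: measure the overlap of two bot masters' channel sets with Jaccard similarity, link pairs above a threshold, and report the connected groups (single-linkage clustering). The similarity measure, threshold, and clustering method are assumptions for illustration; the clustering actually used in the lab's work may differ. Names and channels are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two channel sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_botmasters(participation, threshold=0.5):
    """participation: {botmaster: set of C&C channels}. Link two botmasters when
    their channel overlap is >= threshold; return the connected groups."""
    parent = {m: m for m in participation}

    def find(m):  # union-find root lookup with path halving
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in combinations(participation, 2):
        if jaccard(participation[a], participation[b]) >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for m in participation:
        groups.setdefault(find(m), set()).add(m)
    return sorted(groups.values(), key=sorted)

# Hypothetical channel participation extracted from C&C logs.
PARTICIPATION = {
    "m1": {"#a", "#b"},
    "m2": {"#a", "#b", "#c"},
    "m3": {"#x"},
    "m4": {"#x", "#y"},
}
```

Here m1 and m2 share two of three channels and m3 and m4 share one of two, so the result is two candidate collaborating groups, `{m1, m2}` and `{m3, m4}`. Lowering the threshold merges groups more aggressively, so the cut-off should be checked against manually reviewed cases.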
Conclusion
- We are continuing our data collection efforts:
  - Finishing forum collection
  - Starting to identify and collect IRC channels
  - Implementing more complex honeypots
  - The collection team will provide summaries of the deeper content found
- Progress will continue on research projects:
  - Expanding and publishing the hacker reputation study
  - Revisiting other projects and continuing forum research
References
Hall, A. T., Blass, F. R., Ferris, G. R., & Massengale, R. (2004). Leader Reputation and Accountability in Organizations: Implications for Dysfunctional Leader Behavior. The Leadership Quarterly, 15(4).
Holt, T. J. (2010). Exploring Strategies for Qualitative Criminological and Criminal Justice Inquiry Using OnLine Data. Journal of Criminal Justice Education, 21(4), 466–487.
Holt, T. J., & Kilger, M. (2012). Know Your Enemy: The Social Dynamics of Hacking. The Honeynet Project, 1–17.
Holt, T. J., & Lampke, E. (2010). Exploring stolen data markets online: products and market forces. Criminal Justice Studies: A Critical Journal of Crime, Law, and Society, 23(1), 33–50.
Holt, T. J., Strumsky, D., Smirnova, O., & Kilger, M. (2012). Examining the Social Networks of Malware Writers and Hackers. International Journal of Cyber Criminology, 6(1), 891–903.
Hopper, L., Hopper, R., & Womble, P. (2009). Identifying network attacks from a social perspective. IEEE Conference on Technologies for Homeland Security, 511–515.
Hutchins, E. M., Cloppert, M. J., & Amin, R. M. (2011). Intelligence-Driven Computer Network Defense Informed by Analysis of Adversary Campaigns and Intrusion Kill Chains. Lockheed Martin Corporation.
II, C. J. M., & Chen, H. (2008). Botnets, and the CyberCriminal Underground. IEEE International Conference on Intelligence and Security Informatics 2008, 206–211.
Imperva. (2012). Imperva Hacker Intelligence Initiative. Monthly Trend Report #13.
Jang, D., Kim, M., Jung, H., & Noh, B. (2009). Analysis of HTTP2P Botnet: Case Study Waledac. IEEE 9th Malaysia International Conference on Communications, 15–17.
Kshetri, N. (2006). The Simple Economics of Cybercrimes. IEEE Security & Privacy, Jan–Feb, 33–39.
Lampe, K. von, & Johansen, P. O. Organized Crime and Trust: On the Conceptualization and Empirical Relevance of Trust in the Context of Criminal Networks. Global Crime, 6.
Leavitt, N. (2009). Anonymization Technology Takes a High Profile. IEEE Computer Society, November, 15–18.
Ling, Z., Luo, J., Yu, W., & Fu, X. (2011). Equal-Sized Cells Mean Equal-Sized Packets in Tor? 2011 IEEE International Conference on Communications (ICC), 1–6.
Lu, W., & Ghorbani, A. A. (2008). Botnets Detection Based on IRC-Community. IEEE Global Telecommunications Conference (GLOBECOM), 1–5.
Lu, W., Tavallaee, M., & Ghorbani, A. A. (2009). Automatic discovery of botnet communities on large-scale communication networks. Proceedings of the 4th International Symposium on Information, Computer, and Communications Security (ASIACCS ’09), 1.
McCusker, R. (2006). Transnational organised cyber crime: distinguishing threat from reality. Crime, Law and Social Change, 46(4–5).
Motoyama, M., McCoy, D., Levchenko, K., Savage, S., & Voelker, G. M. (2011). An Analysis of Underground Forums. Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement (IMC ’11), 71.
Moore, T., & Clayton, R. (2009). Evil Searching: Compromise and Recompromise of Internet Hosts for Phishing. Financial Cryptography and Data Security, 256–272.
Muller, P. (2006). Reputation, Trust and the Dynamics of Leadership in Communities of Practice. Journal of Management and Governance, 10(4).
Radianti, J. (2010). A Study of a Social Behavior inside the Online Black Markets. Fourth International Conference on Emerging Security Information, Systems and Technologies, 88–92.
Radianti, J., Rich, E., & Gonzalez, J. J. (2007). Using a Mixed Data Collection Strategy to Uncover Vulnerability Black Markets. Workshop for Information Security and Privacy.
Radianti, J., Rich, E., & Gonzalez, J. J. (2009). Vulnerability Black Markets: Empirical Evidence and Scenario Simulation. 42nd Hawaii International Conference on, 1–10.
Rieck, K., Trinius, P., Willems, C., & Holz, T. (2011). Automatic Analysis of Malware Behavior using Machine Learning. Journal of Computer Security, 1–30.
Spencer, J. F. (2008). Using XML to map relationships in hacker forums. Proceedings of the 46th Annual Southeast Regional Conference on XX (ACM-SE 46), 487.
Tschorsch, F., & Scheuermann, B. (2011). Tor is unfair – And what to do about it. IEEE 36th Conference on Local Computer Networks, 432–440.
Turrini, E. (2010). Cybercrimes: A Multidisciplinary Analysis. Springer Publishing.
Yadav, S., Reddy, A. K. K., & Reddy, A. L. N. (2010). Detecting Algorithmically Generated Malicious Domain Names. Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement.
Yip, M. (2011). An Investigation into Chinese Cybercrime and the Applicability of Social Network Analysis. ACM Web Science Conference.
Yip, M., Shadbolt, N., & Webber, C. (2013). Why Forums? An Empirical Analysis into the Facilitating Factors of Carding Forums. ACM Web Science, May.
Zhang, L., Yu, S., Wu, D., & Watters, P. (2011). A Survey on Latest Botnet Attack and Defense. 2011 IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications, 53–60.
Zhu, Z., Lu, G., Chen, Y., Fu, Z. J., Roberts, P., & Han, K. (2008). Botnet Research Survey. Annual IEEE International Computer Software and Applications Conference, 967–972.
Zhuge, J., Holz, T., Song, C., Guo, J., & Han, X. (2008). Studying Malicious Websites and the Underground Economy on the Chinese Web. Workshop on the Economics of Information Security, 225–244.