1 Competitive Intelligence and the Web. Presented at AMCIS 2003, Tampa, Florida, by Dr. Robert J. Boncella, Washburn University.
2 Competitive Intelligence “the process of ethically collecting, analyzing and disseminating accurate, relevant, specific, timely, foresighted and actionable intelligence regarding the implications of the business environment, competitors and the organization itself”
3 Competitive Intelligence Process
- Planning and direction: working with decision makers to discover and hone their intelligence needs
- Collection: activities conducted legally and ethically
- Analysis: interpreting data and compiling recommended actions
- Dissemination: presenting findings to decision makers
- Feedback: taking into account the response of decision makers and their needs for continued intelligence
4 CI and the Web
- A business Web site will contain a variety of useful information: company history, corporate overviews, business visions, product overviews, financial data, sales figures, annual reports, press releases, biographies of top executives, locations of offices, and hiring ads.
- The cost of this information is, for the most part, free.
- Access to open sources does not require proprietary software, as a number of commercial databases do.
5 The Web: Structure and Information Retrieval
- HTTP protocol and the use of Uniform Resource Locators (URLs)
- Mathematically, a network of nodes and arcs
- Information retrieval (IR) follows the links (arcs) from document to document (node to node)
- Documents are retrieved so their content can be evaluated and a new set of URLs becomes available to follow
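The node-and-arc traversal the slide describes can be sketched as a breadth-first walk over a toy in-memory link graph. The `LINKS` dictionary below stands in for real HTTP fetches and link extraction; the page names are invented for illustration.

```python
from collections import deque

# Toy link graph: each "URL" maps to the URLs found on that page.
# A real crawler would fetch each page over HTTP and parse out its links.
LINKS = {
    "a.com": ["a.com/about", "b.com"],
    "a.com/about": ["a.com"],
    "b.com": ["a.com", "b.com/press"],
    "b.com/press": [],
}

def crawl(seed):
    """Follow arcs node to node, returning pages in the order visited."""
    seen, order = {seed}, []
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        order.append(url)          # here the page content would be evaluated
        for nxt in LINKS.get(url, []):
            if nxt not in seen:    # newly discovered URLs become available to follow
                seen.add(nxt)
                queue.append(nxt)
    return order
```

Breadth-first order is one common choice; a focused spider (discussed later in the deck) would instead prioritize links whose pages look most relevant.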
6 Issues Associated with CI and the Web
- Information gathering
- Information analysis
- Information verification
- Information security
8 General Web Search Engine Architecture
- Web crawlers (Web spiders) collect Web pages using graph-searching techniques
- An indexing method indexes the collected Web pages and stores the indices in a database
- Retrieval and ranking methods retrieve search results from the database and present ranked results to users
- A user interface allows users to query the database and customize their searches
9 Domain-Specific Web Search Engines
- Northern Light: a search engine for commercial publications in the domains of business and general interest
- EDGAR: the United States Securities and Exchange Commission's clearinghouse of publicly available company information and filings
- Westlaw: a search engine for legal materials
- OVID Technologies: provides a user interface that unifies searching across many subfields and databases of medical information
10 Meta-Search Engines
- Upon receipt of a query, connects to several general search engines
- Returns integrated results of the searches
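The "integrated results" step can be sketched as rank merging. The engine names and result lists below are invented; the merge uses summed reciprocal rank, one simple way (among many) to combine several engines' rankings.

```python
# Hypothetical ranked result lists returned by three general-purpose engines.
ENGINE_RESULTS = {
    "engine_a": ["x.com", "y.com", "z.com"],
    "engine_b": ["y.com", "x.com"],
    "engine_c": ["y.com", "w.com"],
}

def merge(results_by_engine):
    """Integrate rankings: score each URL by its summed reciprocal rank
    across engines, so URLs ranked highly by several engines rise."""
    scores = {}
    for ranking in results_by_engine.values():
        for rank, url in enumerate(ranking, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    return sorted(scores, key=lambda u: (-scores[u], u))
```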
11 Difficulties with Information Gathering
- Time to carry out a search
- Number of pages returned
- Currency of information
- Accessible pages: the Web contains billions of pages, with a growth rate of 7.3 million per day
- "Surface Web" vs. "Deep Web": surface Web pages are freely available to the public; the deep Web consists of dynamic pages, intranets, and proprietary databases
- The surface Web contains about 2.5 billion pages; the deep Web about 550 billion (200 times more)
- Charges for Web retrieval
13 Web Page Content: Focused Spiders (Online)
- Return an appropriate set of pages
- Intelligent agent
- User interface
- CI Spider by Chau & Chen, University of Arizona
- Answers On-line by Answer Chase
14 Search Result Mining: Text Mining (Offline)
- Automates the task of organizing and summarizing numerous pages
- Requires automated analysis of natural-language texts
- Commercially available text-mining applications, e.g., TextAnalyst by Megaputer
- ANN solution: SITEX by Fukuda et al.
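At its crudest, the summarization task the slide describes reduces to extracting the most frequent content words from a page's text. This sketch is far simpler than the commercial tools named above (it ignores phrases, morphology, and semantics), and the stopword list is a small illustrative sample.

```python
from collections import Counter

# Minimal stopword list, for illustration only.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}

def key_terms(text, k=3):
    """Return the k most frequent content words -- a bare-bones stand-in
    for the automated organizing/summarizing that text mining performs."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]
```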
15 Web Structure: PageRank and HITS
- PageRank
  - Utilized in keyword searching of the Web
  - A measure of the number of "back links" to a page: a page's importance is determined by the number of links to it, and its priority is determined by this measure
  - Implemented in the Google search engine
- Hyperlink-Induced Topic Search (HITS)
  - Associates hub and authority measures with each page
  - Hub: a page that contains links to authoritative pages
  - Authority: one of the best pages (sources) for the requested information
  - Starts with a keyword search that returns a set of pages, then identifies hubs and authorities within it
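The mutually reinforcing hub/authority idea can be shown concretely. This is a minimal sketch of the HITS iteration on an invented link graph (real HITS, per Kleinberg, starts from a keyword-derived root set and uses L2 normalization; L1 normalization is used here for simplicity).

```python
# Toy link graph: page -> pages it links to. Names are made up.
LINKS = {
    "hub1": ["auth1", "auth2"],
    "hub2": ["auth1", "auth2", "auth3"],
    "auth1": [],
    "auth2": ["auth1"],
    "auth3": [],
}

def hits(links, iterations=20):
    """Iterate: authority(p) = sum of hub scores of pages linking to p;
    hub(p) = sum of authority scores of pages p links to; then normalize."""
    pages = list(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
        hub = {p: sum(auth[q] for q in links[p]) for p in pages}
        for scores in (auth, hub):
            norm = sum(scores.values()) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

hub, auth = hits(LINKS)
```

In this graph, the page linked to by the most (and best) hubs ends up with the top authority score, and the page linking to the most authorities ends up the top hub.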
16 Web Usage: Data Mining on Web Logs
- Web logs contain "clickstream" data
- Server side: information about pages provided
- Client side: information about pages requested
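A first step in mining server-side clickstream data is parsing the log itself. This sketch assumes Apache Common Log Format; the sample entries are invented for illustration.

```python
import re

# Invented server-log lines in Apache Common Log Format:
# host ident authuser [timestamp] "method path protocol" status bytes
LOG_LINES = [
    '10.0.0.1 - - [25/Aug/2000:10:00:00 -0500] "GET /products.html HTTP/1.0" 200 1043',
    '10.0.0.2 - - [25/Aug/2000:10:00:05 -0500] "GET /pricing.html HTTP/1.0" 200 2311',
    '10.0.0.1 - - [25/Aug/2000:10:00:09 -0500] "GET /pricing.html HTTP/1.0" 200 2311',
]

PATTERN = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+|-)')

def pages_requested(lines):
    """Count how often each page was requested -- basic usage mining."""
    counts = {}
    for line in lines:
        m = PATTERN.match(line)
        if m:
            path = m.group(3)
            counts[path] = counts.get(path, 0) + 1
    return counts
```

From counts like these, per-visitor paths through the site ("clickstreams") can be reconstructed by grouping on the host field and sorting by timestamp.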
18 Techniques to Verify Accuracy of Information
- Deep Web sources are more reliable than surface Web sources
- Confirm with a non-Web source
- Answer the following: Who is the author? Who maintains the Web site? How current is the Web page?
- Observe the top-level domain (TLD) of the URL
- A "~" within a URL denotes a personal Web page
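Two of these heuristics, inspecting the TLD and flagging "~" paths, are mechanical enough to automate. A minimal sketch using only the standard library (the generic-TLD set mirrors the "original TLDs" slide that follows):

```python
from urllib.parse import urlparse

# The "original" generic TLDs listed later in this deck.
GENERIC_TLDS = {"com", "edu", "gov", "net", "org"}

def quick_checks(url):
    """Apply two verification heuristics from the slide: report the TLD
    and flag '~' in the path, which conventionally marks a personal page."""
    parsed = urlparse(url)
    tld = parsed.hostname.rsplit(".", 1)[-1] if parsed.hostname else ""
    return {
        "tld": tld,
        "known_generic_tld": tld in GENERIC_TLDS,
        "personal_page": "~" in parsed.path,
    }
```

These checks only triage a URL; the who-wrote-it and how-current-is-it questions still require human judgment.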
19 Domain Names
- Original TLDs: .com, .edu, .gov, .net, .org
- New TLDs: .aero (air-transport industry), .biz (businesses), .coop (cooperatives), .info (all uses), .museum (museums), .name (individuals), .pro (professions)
21 Information Security Issues
- Assuring the privacy and integrity of private information: managed with the usual computer and network security methods
- Assuring the accuracy of a firm's public information: defend against Web hijacking, Web defacing, cognitive hacking (semantic attack), and negative information (reference: Cybenko, Giani, & Thompson)
- Avoiding unintentionally revealing information that ought to be private
22 Web Hijacking
- Allegedly this hoax was started by a researcher who sent a spoofed CNN story to three users of AOL's Instant Messenger chat software.
- Due to a bug in CNN's software, when people at the spoofed site clicked on the " This" link, the real CNN system distributed a real CNN message to recipients with a link to the spoofed page.
- With each click at the bogus site, the real site's tally of most popular stories was incremented for the bogus story.
- Within 12 hours more than 150,000 people had viewed the spoofed page.
23 Web Defacing
- In February 2001 the New York Times Web site was defaced by a hacker identified as "splurge" from a group called "Sm0ked Crew", which a few days previously had defaced sites belonging to Hewlett-Packard, Compaq, and Intel.
- The defacement read: "THE-REV | SPLURGE. Sm0ked crew is back and better than ever! Well, admin I'm sorry to say by you have just got sm0ked by splurge. Don't be scared though, everything will be all right, first fire your current security advisor . . ."
24 Cognitive Hacking
- Cognitive hacking is the manipulation of perception.
- Causes: disgruntled customers/employees, competition, random acts of vandalism
25 Two Types of Cognitive Hacking
- Single-source cognitive hacking: a reader reads information without knowing who posted it and with no way of verifying it or contacting its author.
- Multiple-source cognitive hacking: there are several sources for a topic; this becomes a concern when the information is not accurate.
26 Categories of Cognitive Attacks
- Overt: no attempt is made to conceal the attack, e.g., Web site defacements.
- Covert: provision of misinformation, i.e., the intentional distribution or insertion of false or misleading information intended to influence readers' decisions and/or activities.
27 Emulex & Mark Jakob
- On 8/25/2000 a press release distributed by financial news services stated that Emulex had revised its per-share gain to a per-share loss.
- Emulex's share price fell to $43.00 within 16 minutes.
- The press release was false, fabricated by Mark Jakob, who was at the time on the wrong side of a short sale of the stock.
- Jakob launched the press release via Internet Wire, an LA-based firm that distributes press releases.
28 The Jonathan Lebed Case
According to the US Securities and Exchange Commission, 15-year-old Jonathan Lebed earned between $12,000 and $74,000 daily over six months, for a total gain of $800,000. Lebed would buy a block of FTEC stock and then, using only AOL accounts with fictitious names, post a message like the one below. Doing this a number of times, he increased the daily trading volume of FTEC from 60,000 shares to more than one million.

DATE: 2/03/00 3:43pm Pacific Standard Time
FROM: LebedTG1
"FTEC is starting to break out! Next week, this thing will EXPLODE . . . Currently FTEC is trading for just $2 1/2. I am expecting to see FTEC at $20 VERY SOON . . . Let me explain why . . . Revenues for the year should very conservatively be around $20 million. The average company in the industry trades with a price/sales ratio of . . . With 1.57 million shares outstanding, this will value FTEC at $44. It is very possible that FTEC will see $44, but since I would like to remain very conservative my short term price target on FTEC is still $20! The FTEC offices are extremely busy. I am hearing that a number of HUGE deals are being worked on. Once we get some news from FTEC and the word gets out about the company it will take-off to MUCH HIGHER LEVELS! I see little risk when purchasing FTEC at these DIRT-CHEAP PRICES. FTEC is making TREMENDOUS PROFITS and is trading UNDER BOOK VALUE!!! This is the #1 INDUSTRY you can POSSIBLY be in RIGHT NOW. There are thousands of schools nationwide who need FTEC to install security systems. You can't find a better positioned company than FTEC! These prices are GROUND-FLOOR! My prediction is that this will be the #1 performing stock on the NASDAQ in . . . I am loading up with all of the shares of FTEC I possibly can before it makes a run to $20. Be sure to take the time to do your research on FTEC! You will probably never come across an opportunity this HUGE ever again in your entire life."
29 Possible Countermeasures
- Single source: authentication of source; information "trajectory" modeling; Ulam games
- Multiple sources: source reliability via collaborative filtering and reliability reporting; detection of collusion by information sources; Byzantine generals models
30 Countermeasures: Single Source
- Authentication of source: due diligence; implied verification via PKI (digital signatures)
- Information trajectory: detect variations on a theme, e.g., the Lebed case as a variation of the "pump & dump" scheme
- Ulam games: a model that assumes false information; how fast can it be detected through questions to, and answers from, the source?
31 Countermeasures: Multiple Sources
- Collaborative filtering and reliability reporting: a site keeps records and uses those records to verify future claims by those with access to publishing on the site.
- Detection of collusion by information sources: linguistic analysis to determine whether different sources are by the same author.
- Byzantine generals model: a message-communicating system has two types of processes, reliable and unreliable; given a number of processes from this system, determine the type of each process.
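The reliability-reporting idea can be sketched as a track-record score: the site records whether each poster's past claims checked out, then scores future posts by that history. The poster names and verification records below are invented, and the smoothed-fraction scoring is an assumption, not a method from the slides.

```python
# Invented verification history: did each past claim by the poster check out?
HISTORY = {
    "poster_a": [True, True, True, False],
    "poster_b": [False, False, True],
}

def reliability(history, poster):
    """Fraction of a poster's past claims that were verified, with
    Laplace smoothing so unknown posters get a neutral 0.5 prior."""
    record = history.get(poster, [])
    return (sum(record) + 1) / (len(record) + 2)
```

A consumer of multi-source information could weight each source's claims by such a score before combining them.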
32 Countermeasures: Negative Information
- Monitor Web sites (5,360 URLs contained the phrase "Microsoft sucks")
- Use an intelligent agent (IA) to monitor
- Text mining to classify the type of negative information
- Respond accordingly
33 Countermeasures: Unintentional Disclosure Carry out a CI project against yourself
34 Conclusions
- Reconcile the "deep Web" vs. the "surface Web"
- Determine when all pages are needed vs. the "right" set of pages
- Automate "authoritative page selection": a "Consumer Reports"-type process, e.g., posting a Web page in the early '90s (Yahoo)
- Automate detection of false information, inaccurate information, and negative information
36 References
- Aaron, R. D. and Naylor, E. "Tools for Searching the 'Deep Web'", Competitive Intelligence Magazine, (4:4). Online (date of access April 18, 2003).
- Calishain, T. and Dornfest, R. (2003) Google Hacks: 100 Industrial-Strength Tips & Tools, Sebastopol, CA: O'Reilly & Associates.
- Chakrabarti, S. (2003) Mining the Web: Discovering Knowledge from Hypertext Data, San Francisco, CA: Morgan Kaufmann.
- Chen, H., Chau, M., and Zeng, D. (2002) "CI Spider: A Tool for Competitive Intelligence on the Web", Decision Support Systems, (34:1).
- Cybenko, G., Giani, A., and Thompson, P. (2002) "Cognitive Hacking: A Battle for the Mind", IEEE Computer, (35:8), August, pp. 50–56.
- Dunham, M. H. (2003) Data Mining: Introductory and Advanced Topics, Upper Saddle River, NJ: Prentice Hall.
- Fleisher, C. S. and Bensoussan, B. E. (2003) Strategic and Competitive Analysis, Upper Saddle River, NJ: Prentice Hall.
- Fuld, L. (1995) The New Competitor Intelligence, New York: Wiley.
- Herring, J. P. (1998) "What Is Intelligence Analysis?", Competitive Intelligence Magazine, (1:2).
37 References
- Kleinberg, J. M. (1999) "Authoritative Sources in a Hyperlinked Environment", Journal of the ACM, (46:5), September.
- Krasnow, J. D. (2000) "The Competitive Intelligence and National Security Threat from Website Job Listings" (date of access April 18, 2003).
- Lyman, P. and Varian, H. R. (2000) "Internet Summary", How Much Information Project, University of California, Berkeley (date of access April 18, 2003).
- Murray, M. and Narayanaswamy, R. (2003) "The Development of a Taxonomy of Pricing Structures to Support the Emerging E-business Model of 'Some Free, Some Fee'", Proceedings of SAIS 2003.
- Page, L. and Brin, S. (1998) "The Anatomy of a Large-Scale Hypertextual Web Search Engine" (date of access April 22, 2003).
- Schneier, B. (2000) "Semantic Attacks: The Third Wave of Network Attacks", Crypto-Gram Newsletter, October 15, 2000 (date of access April 18, 2003).
- SCIP (Society of Competitive Intelligence Professionals) (date of access April 18, 2003).