Almaden Research Center © 2006 IBM Corporation IOP 06 Open Source Intelligence Lesson Learned.


1 Almaden Research Center © 2006 IBM Corporation IOP 06 Open Source Intelligence Lesson Learned

2 Issues in using open source for intelligence
- Growth and complexity of heterogeneous content
- Not all open source data is equal: quantitative vs. qualitative
- Requirements of ecoinformatics architectures

3 Digital content is growing at a dramatic rate
Chart annotation (partly lost in extraction): "[...] Years = 1 trillion terabytes of data, which is equivalent to all the information consumed visually by all humans in a year"
Source: IBM 2005 GTO

4 The scale of open source data and its heterogeneous form increase the complexity of extracting intelligence
Chart labels: Storage online, Medical data stored, Personal multimedia, Surveillance bytes, Photos multimedia; Scalable, Heterogeneity, Intelligence; Structured data, Free-form text
Source: IBM 2005 GTO

5 Open source intelligence from the periphery requires an understanding of its topology, including the strengths and weaknesses of sources in the periphery
Source types shown: Industry Publication, Company Internal Content, Company Publication, Industry Journals, Conference Proceedings, NGO Publications, Website affiliated with an organization, User Groups / Forums, Newsletters, Content Aggregators, News & Press Releases, Legal Filings, Government Publications, Blogs / Weblogs, Non-affiliated Websites
Axis labels: Qualitative, Quantitative
Three tiers of sources:
- Authoritative sources: the data is trusted and is defended
- Credentialed opinions: the source is known and can be weighted
- Open opinion: it is impossible to verify the authority of the source
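The idea that a known source "can be weighted" can be illustrated with a minimal sketch. The tier names follow the slide, but the numeric weights, the `Mention` type, and the `weighted_support` function are assumptions made for illustration; the slide gives no numbers and no method.

```python
from dataclasses import dataclass

# Hypothetical weights: the slide names three tiers but assigns no values.
TIER_WEIGHTS = {
    "authoritative": 1.0,  # trusted and defended data
    "credentialed": 0.6,   # known source, opinion can be weighted
    "open": 0.2,           # authority of the source cannot be verified
}


@dataclass
class Mention:
    claim: str  # the statement being supported
    tier: str   # one of the keys of TIER_WEIGHTS


def weighted_support(mentions):
    """Sum the tier weight of every mention, grouped by claim."""
    totals = {}
    for mention in mentions:
        weight = TIER_WEIGHTS[mention.tier]
        totals[mention.claim] = totals.get(mention.claim, 0.0) + weight
    return totals
```

With weights like these, a claim repeated on two unverifiable sites would score lower than one backed by a single authoritative source, which is one way to read "not all open source data is equal".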

6 Ecoinformatics architectures need to be multi-layered
Diagram labels (duplicates merged; the grouping below is approximate because the layout was lost in extraction):
- Data sources: World Wide Web, Blogs, Newspapers, Licensed Feeds, Databases, Intranet Data, Taxonomies, Commercial Databases
- Data acquisition: Structured Data, Unstructured Data, Parsing/Tokenizing, Index, Store
- Per-page annotators: Auto Entity Spotters, Auto Geography Spotter, Porn & Dup Detection, Customer Taxonomy Spotter, Date Spotters, Language Spotters, Source Spotters
- Cross-page annotators: Classification, Clustering, Communities, Ranking
- Applications: Network Associations, Search, Topic Tracking, Buzz Analysis
- Analysis functions: Annotation, Searching, Natural Clustering, Affinity Analysis, Snippet Analysis, Trending
- Customer applications: Performance Management, Drug Research, Business Insights Workbench
- Platforms: WebFountain, Business Insights Workbench, WS OmniFind II
- Scale labels: 10s, 100s, 1000s (pages/second); Relevancy, Volume
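The split between per-page and cross-page annotators can be sketched roughly as follows. This is not WebFountain code: the lexicon, the function names, and the year pattern are all assumptions chosen to keep the example small. Per-page annotators look at one page in isolation, and the cross-page step only runs once those annotations exist.

```python
import re
from collections import Counter

# Hypothetical lexicon; a real entity spotter would be far richer.
KNOWN_ENTITIES = ("IBM", "WebFountain")


def spot_years(text):
    """Per-page annotator: four-digit years found in the text."""
    return re.findall(r"\b(?:19|20)\d{2}\b", text)


def spot_entities(text):
    """Per-page annotator: known entity names appearing in the text."""
    return [name for name in KNOWN_ENTITIES if name in text]


# Layer 1: each annotator sees a single page and nothing else.
PER_PAGE_ANNOTATORS = {"years": spot_years, "entities": spot_entities}


def annotate(page_text):
    """Run every per-page annotator over one page."""
    return {key: fn(page_text) for key, fn in PER_PAGE_ANNOTATORS.items()}


def entity_page_counts(annotated_pages):
    """Layer 2 (cross-page): number of pages mentioning each entity."""
    counts = Counter()
    for annotations in annotated_pages:
        counts.update(set(annotations["entities"]))
    return counts
```

Because the first layer has no cross-page dependencies, it can be run in parallel at high page rates, while the second layer works on the much smaller annotation records.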

7 Open Source Trend on Web
Finding intelligence can require a different view of the same information
Chart: % of OSI web documents (axis from 0.0% to 4.5%) for Congressman Rob Simmons, Douglas Rushkoff, Eliot Jardines, Major General Patrick Cammaert, Mr Arno Reuser, Robert Steele
Chart annotations: "Some event happened in August"; "One dominant voice"
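A measure like "% of OSI web documents" can be approximated as the share of a document set that mentions each name. The sketch below assumes plain case-insensitive substring matching, which is a simplification; the function name and matching rule are illustrative and are not taken from the slides.

```python
def mention_share(documents, names):
    """Percentage of documents that mention each name at least once."""
    total = len(documents)
    if total == 0:
        return {name: 0.0 for name in names}
    shares = {}
    for name in names:
        needle = name.lower()
        hits = sum(1 for doc in documents if needle in doc.lower())
        shares[name] = 100.0 * hits / total
    return shares
```

Computing this per month rather than over the whole set would give the trend view, where a spike such as the August event or a single dominant voice becomes visible.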

8 Context network of conference attendees to auto-spotted companies and universities
In this network view we don't care about association with Open Source Intelligence, but with companies and universities
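One simple way to build such a network is to count how often a person and an organization are spotted in the same document. The sketch assumes the names have already been spotted and uses exact substring matching; `cooccurrence_edges` and its inputs are hypothetical and do not describe the tool used for the slide.

```python
from collections import Counter
from itertools import product


def cooccurrence_edges(documents, people, organizations):
    """Count documents in which a person and an organization both appear.

    Returns a Counter keyed by (person, organization) pairs; each count
    can serve as the edge weight in a person-to-organization network.
    """
    edges = Counter()
    for doc in documents:
        present_people = [p for p in people if p in doc]
        present_orgs = [o for o in organizations if o in doc]
        for pair in product(present_people, present_orgs):
            edges[pair] += 1
    return edges
```

Swapping the second list for a list of topics instead of organizations gives a different view of the same documents, which is the point the slide makes.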

9 Conclusions on Open Source Intelligence
- Computers don't create intelligence, people do; computers enable smart people
- Not all open source content is equal: know the sources
- Not everything you see is right: it's all about the CONTEXT
- Ecoinformation architecture supports:
  - Large-scale analytics of open source content
  - Integration of content other than open source
  - Powerful text analytic tools to support analysis of on-topic stores

