Published by Gabriel Hicks
Modified over 4 years ago
Open Source Intelligence: Lessons Learned. IOP 06. Almaden Research Center © 2006 IBM Corporation
Slide 2: Issues in using open source for intelligence
- Growth and complexity of heterogeneous content
- Not all open source data is equal: quantitative vs. qualitative
- Requirements of Ecoinformatics architectures
Slide 3: Digital content is growing at a dramatic rate
- 10^24 bytes = 1 trillion terabytes of data, which is equivalent to all the information consumed visually by all humans in a year
[Chart: growth of digital content over time; x-axis: years. Source: IBM 2005 GTO]
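The unit conversion behind the slide's headline figure can be checked in one line. This assumes the quantity is measured in bytes and that a terabyte is the decimal 10^12 bytes; the slide states neither explicitly.

```latex
10^{24}\ \text{bytes}
  = 10^{12} \times 10^{12}\ \text{bytes}
  = 10^{12}\ \text{TB}
  = 1\ \text{trillion terabytes}
```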
Slide 4: The scale of open source data and its heterogeneous form increase the complexity of extracting intelligence
[Chart: data categories (storage online, medical data stored, personal multimedia, surveillance bytes, photos multimedia) plotted by scale (10^9, 10^12, 10^15, 10^21, 10^24, 10^27) and heterogeneity (structured data to free-form text); chart labels: Scalable, Heterogeneity, Intelligence. Source: IBM 2005 GTO]
Slide 5: Open Source Intelligence from the periphery requires an understanding of its topology, including the strengths and weaknesses of sources in the periphery
Three tiers of sources:
- Authoritative sources, where data is trusted and defended
- Credentialed opinions, where the source is known and can be weighted
- Open opinion, where it is impossible to verify the authority of the source
[Diagram: source types arranged along a qualitative-to-quantitative axis: industry publications, company internal content, company publications, industry journals, conference proceedings, NGO publications, websites affiliated with an organization, user groups / forums, newsletters, content aggregators, news & press releases, legal filings, government publications, blogs / weblogs, non-affiliated websites]
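One way to read "the source is known and can be weighted" is as a weighted average over the three source tiers. The sketch below is illustrative only: the numeric weights, the `Mention` type, the `signal` field, and the assignment of any source type to a tier are assumptions, since the slide gives no numbers or mapping.

```python
from dataclasses import dataclass

# Hypothetical weights for the three tiers named on the slide.
# The slide gives no numbers; these values are assumptions.
TIER_WEIGHTS = {
    "authoritative": 1.0,
    "credentialed_opinion": 0.6,
    "open_opinion": 0.2,
}


@dataclass
class Mention:
    source_type: str  # e.g. "Government Publications"
    tier: str         # one of the keys in TIER_WEIGHTS
    signal: float     # analyst-assigned value, e.g. in [-1, 1]


def weighted_signal(mentions):
    """Average the signals, weighting each mention by its source tier."""
    total_weight = sum(TIER_WEIGHTS[m.tier] for m in mentions)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(TIER_WEIGHTS[m.tier] * m.signal for m in mentions)
    return weighted_sum / total_weight


if __name__ == "__main__":
    sample = [
        Mention("Government Publications", "authoritative", 1.0),
        Mention("Blogs / Weblogs", "open_opinion", -1.0),
    ]
    print(round(weighted_signal(sample), 3))
```

With weights like these, a single authoritative source outweighs several unverifiable ones, which is the design intent; in practice the weights would be set by the analyst who knows the sources.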
Slide 6: Ecoinformatics architectures need to be multi-layered
[Architecture diagram, labels as extracted:]
- Data acquisition sources: World Wide Web, blogs, newspapers, licensed feeds, databases, intranet data, taxonomies, commercial databases
- Data types: unstructured data, structured data
- Processing stages: parsing/tokenizing, annotation, searching
- Stores: index store
- Per-page annotators: auto entity spotters, auto geography spotter, porn & dup detection, customer taxonomy spotter, date spotters, language spotters, source spotters
- Cross-page annotators: classification, clustering, communities, ranking
- Applications: network associations, search, topic tracking, buzz analysis
- Other analysis components: natural clustering, affinity analysis, snippet analysis, trending
- Customer applications: performance management, drug research, Business Insights Workbench
- Systems: WebFountain, Business Insights Workbench, WS OmniFind II
- Throughput scale: 10s, 100s, 1000s (pages/second)
- Axes: relevancy, volume
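The split between per-page and cross-page annotators can be sketched as two passes: the first looks at one page at a time, the second works over the whole annotated collection. This is a minimal sketch, not the WebFountain implementation; the entity list, the regular expression, and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical fixed entity list; a real entity spotter would be far richer.
KNOWN_ENTITIES = ("IBM", "WebFountain")


def spot_entities(text):
    """Per-page annotator: known entities that appear in the text."""
    return sorted(e for e in KNOWN_ENTITIES if e in text)


def spot_years(text):
    """Per-page annotator: four-digit years (1900-2099) found in the text."""
    return re.findall(r"\b(?:19|20)\d{2}\b", text)


PER_PAGE_ANNOTATORS = {"entities": spot_entities, "years": spot_years}


def annotate_page(page_id, text):
    """Run every per-page annotator on a single page, independently of others."""
    annotations = {name: fn(text) for name, fn in PER_PAGE_ANNOTATORS.items()}
    return {"id": page_id, **annotations}


def group_by_entity(annotated_pages):
    """Cross-page annotator: group page ids by the entities they share."""
    groups = defaultdict(list)
    for page in annotated_pages:
        for entity in page["entities"]:
            groups[entity].append(page["id"])
    return dict(groups)


if __name__ == "__main__":
    pages = [
        annotate_page("a", "IBM mentioned WebFountain in 2006"),
        annotate_page("b", "An IBM report"),
    ]
    print(group_by_entity(pages))
```

Because per-page annotators need no shared state, they can run in parallel at high page rates, while cross-page steps such as clustering or ranking have to wait for the annotated collection.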
Slide 7: Finding intelligence can require a different view of the same information
[Chart: "Open Source Trend on Web"; y-axis: % of OSI web documents, 0.0% to 4.5%; series: Congressman Rob Simmons, Douglas Rushkoff, Eliot Jardines, Major General Patrick Cammaert, Mr Arno Reuser, Robert Steele; annotations: "Some event happened in August", "One dominant voice"]
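A trend line of the kind shown on this slide (the percentage of documents that mention a name, per time period) could be computed roughly as follows. This is a sketch under simple assumptions: documents arrive as hypothetical `(month, text)` pairs and a mention is a case-insensitive substring match, which is cruder than real entity spotting.

```python
from collections import defaultdict


def monthly_share(documents, name):
    """Return {month: percent of that month's documents mentioning name}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    needle = name.lower()
    for month, text in documents:
        totals[month] += 1
        if needle in text.lower():
            hits[month] += 1
    return {m: 100.0 * hits[m] / totals[m] for m in sorted(totals)}


if __name__ == "__main__":
    # Made-up documents for illustration only.
    docs = [
        ("2006-07", "Robert Steele spoke at a conference"),
        ("2006-07", "An unrelated document"),
        ("2006-08", "robert steele again"),
        ("2006-08", "A post quoting Robert Steele"),
    ]
    print(monthly_share(docs, "Robert Steele"))
```

Normalizing by the month's total document count, rather than plotting raw counts, keeps a spike in overall publishing volume from looking like a spike in one person's prominence.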
Slide 8: Context network of conference attendees to auto-spotted companies and universities
- In this network view we don't care about association with Open Source Intelligence, but about association with companies and universities
Slide 9: Conclusions on Open Source Intelligence
- Computers don't create intelligence, people do; computers enable smart people
- Not all open source content is equal: know the sources
- Not everything you see is right: it's all about the CONTEXT
- Ecoinformatics architecture supports:
  - Large-scale analytics of open source content
  - Integration of content other than open source
  - Powerful text analytic tools to support analysis of on-topic stores