Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited.


1 Applying Existing Technology to Exploitation of Multiple Sources of Information Mike Brenton Sterling Software Memex Technology Limited

2 Problem Statement First there was data overload. Now there is an overabundance of tool power.

3 Information Types
The Migration Defense Intelligence Threat Data System (MDITDS) is a Department of Defense Intelligence Information Systems (DODIIS) designated migration system tasked to provide the automated production system for the DODIIS Indications and Warnings (I&W), Counterintelligence (CI), Anti-terrorism (AT), Counterterrorism (CT), Information Warfare (IW), Arms Proliferation (AP), and Defense Industry (DI) communities.
Sterling Software Announces 2-For-1 Stock Split
DALLAS, Texas (March 11, 1998) - Sterling Software, Inc. (SSW-NYSE) today announced that its Board of Directors has approved a 2-for-1 split of the company's common stock. Stockholders will receive one additional common share for every share held on the record date of March 20, 1998. The additional shares will be issued on April 3, 1998.
Sterling Software currently has approximately 38.8 million shares of common stock outstanding. This number will double to approximately 77.6 million shares by reason of the stock split.
Sterling L. Williams, president and chief executive officer of Sterling Software, commented, "Sterling Software's stock price increased 30% during 1997 and 28% so far this year, based on consistently excellent performance by the company. We decided to split our stock to improve its trading liquidity and to help ensure that it trades in a price range that is accessible to a broad base of investors."
Sterling Software is a leading provider of software and services for the applications management, systems management and federal systems markets. Sterling Software, with its headquarters in Dallas, has a worldwide installed base of more than 20,000 customer sites and has 3,100 employees in 85 offices worldwide. For more information on Sterling Software, visit the company's Web site at http://www.sterling.com.
Contact: Julie Kupp, Vice President, Investor Relations, Sterling Software, Inc. (214) 981-1000 Julie_Kupp@sterling.com
©Copyright Sterling Software, 1998 All rights reserved

4 Open Source Materials
Electronic Information
–Library Services
–On-line Newspapers
–On-line Reports
–Information Brokers
–CD-ROM Products
–Wire Services
Agents
–Services - People
–Services - Push and Punch
–Spiders, Crawlers, and Profilers

5 Tools
–Data Warehouses
–Concept Analysis and Summarization
–Vectors, Clustering, Histograms
–Data Mining
–OLAP
–Statistical Analysis
–Visualization
–Information Extraction
–Temporal Analysis
–Link Analysis

6 Data Warehouses
Data warehousing is an emerging technology that supports non-operational application areas like management information systems, decision support, and data mining. A data warehouse is a database that provides efficient and integrated access to relevant analytical data.
Department of Information Science - The Aarhus School of Business

7 Memex Information Engine and Client Applications
DIA EUCOM JICPAC SOUTHCOM CENTCOM STRATCOM SPACECOM TRANSCOM
Network
Products
–Country Profiles
–Group (Unit) Profiles
–Individual Profiles
–Incidents (Events)
–Misc. Assessments
–All
Domains
–Counter Intelligence
–Counter Terrorism
–Force Protection
–Arms Proliferation
–Defense Industries
–Indications & Warning
–All
File Edit View Insert Memex Network Query Tool
Field Search:
Name: ________
Incident Type: ________
Organization Type: ________
Equipment: ________
Start Date: ________ Stop Date: ________
Text Search:
File Edit View Insert Memex Network Query Tool
1 -- 90% -- Air Power Over Bosnia
2 -- 85% -- UK Air Power and NATO
3 -- 70% -- Air Power Assessment
4 -- 65% -- Munitions on Tactical Fighters
5 -- 50% -- Tactical Fighters and LSB
6 -- 50% -- Smart Munitions
7 -- 45% -- Air Dropped Land Mines
File Edit View Insert Memex Network Query Tool
4 -- 65% -- Munitions on Tactical Fighters
6 -- 50% -- Smart Munitions
7 -- 45% -- Air Dropped Land Mines
2 -- 85% -- UK Air Power and NATO
1 -- 90% -- Air Power Over Bosnia
3 -- 70% -- Air Power Assessment
5 -- 50% -- Tactical Fighters and LSB

8 Concept Analysis and Summarization Concept analysis is the process of matching keywords in the text to hierarchical topic trees in order to determine the major theme(s) in the document, paragraph, or sentence. Some systems use this information and predetermined templates to build summaries of a document. The concepts and summaries are then used to route documents to analysts.
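The keyword-to-topic-tree matching described on this slide can be sketched roughly as follows. This is only an illustration: the topic tree, its keywords, and the function names are made up for the example and do not come from any product named in the presentation.

```python
import re
from collections import Counter

# Hypothetical two-level topic tree: leaf topic -> (parent topic, keywords).
TOPIC_TREE = {
    "air power": ("military", {"fighter", "fighters", "munitions", "sortie"}),
    "land mines": ("military", {"mine", "mines", "minefield"}),
    "stock market": ("finance", {"stock", "shares", "split", "investors"}),
}

def concept_scores(text):
    """Count keyword hits per leaf topic and roll them up to the parent topic."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = Counter()
    for leaf, (parent, keywords) in TOPIC_TREE.items():
        hits = sum(1 for word in words if word in keywords)
        if hits:
            scores[leaf] += hits
            scores[parent] += hits
    return scores

def major_themes(text, top_n=2):
    """Return the highest-scoring topics; these could drive routing to analysts."""
    return [topic for topic, _ in concept_scores(text).most_common(top_n)]
```

A router could then map each returned theme to an analyst queue; real systems would also need stemming and phrase matching, which this sketch leaves out.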

9 Vectors, Clustering, and Histograms
Document clustering is a technique for automatically discovering the subtopics in a set of documents and grouping the documents by those subtopics. Organizing documents by subtopic can help you get a sense of the major subject areas covered in the document set…
Verity, Inc.
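As a rough illustration of how vectors and clustering fit together, the sketch below turns each document into a word-count vector and groups documents whose cosine similarity to a cluster's first document exceeds a threshold. It is a deliberately simple greedy scheme chosen for the example, not the algorithm used by Verity or any other vendor; the function names and the threshold value are assumptions.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words vector: word -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering against each cluster's seed document."""
    clusters = []  # list of (seed_vector, [document indices])
    for index, doc in enumerate(docs):
        vec = vectorize(doc)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(index)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append((vec, [index]))
    return [members for _, members in clusters]
```

Applied to titles such as "Air Power Over Bosnia", "UK Air Power and NATO", "Smart Munitions", and "Munitions on Tactical Fighters", this groups the two air power titles together and the two munitions titles together.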

10 Data Mining
Data mining is the analysis of data for relationships that have not previously been discovered. For example, the sales records for a particular brand of tennis racket might, if sufficiently analyzed and related to other market data, reveal a seasonal correlation with the purchase by the same parties of golf equipment.
whatis.com Inc.

11 OLAP
OLAP (online analytical processing) enables a user to easily and selectively extract and view data from different points-of-view. For example, display a spreadsheet showing all of a company's beach ball products sold in Florida in the month of July, 1997, then compare revenue figures with those for the same products in July, 1996, and then etc.
whatis.com Inc.
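The beach ball example amounts to filtering a set of sales facts on some dimensions and totaling revenue along another. A minimal sketch, assuming a tiny in-memory fact table with made-up revenue figures and a hypothetical `slice_and_rollup` helper, might look like this:

```python
from collections import defaultdict

# Made-up fact rows: (product, state, year, month, revenue).
SALES = [
    ("beach ball", "Florida", 1996, 7, 1200.0),
    ("beach ball", "Florida", 1997, 7, 1500.0),
    ("beach ball", "Texas", 1997, 7, 800.0),
    ("beach ball", "Florida", 1997, 8, 900.0),
]
DIMENSIONS = ("product", "state", "year", "month")

def slice_and_rollup(rows, group_by, **filters):
    """Keep rows matching the filters, then total revenue per group_by key."""
    totals = defaultdict(float)
    for row in rows:
        record = dict(zip(DIMENSIONS, row[:4]))
        if all(record[name] == value for name, value in filters.items()):
            key = tuple(record[dim] for dim in group_by)
            totals[key] += row[4]
    return dict(totals)

# Compare July revenue in Florida across years.
july_by_year = slice_and_rollup(
    SALES, ("year",), product="beach ball", state="Florida", month=7
)
```

Changing `group_by` to `("state",)` or dropping a filter gives a different point of view on the same facts, which is the interaction an OLAP tool automates over a much larger warehouse.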

12 Statistical Analysis The collection, classification, and interpretation of numerical data. Elements of statistics are present in most OLAP tool sets. Functions include: Frequency Distribution, Average, Mean, Standard Deviation, etc. Functions found in most spreadsheet applications.
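The functions listed here are also available in general-purpose languages. As a small sketch using Python's standard library (the sample values and the `describe` helper are invented for the example):

```python
import statistics
from collections import Counter

def describe(values):
    """Frequency distribution, mean, and population standard deviation."""
    return {
        "frequency": dict(Counter(values)),
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),
    }

summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
```

For these values the mean is 5 and the population standard deviation is 2.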

13 Visualization
Visualization is the process of representing abstract business or scientific data as images that can aid in understanding the meaning of the data. Visual computing is computing that lets you interact with and control work through visualization.
whatis.com Inc.

14 Information Extraction
Automated information extraction involves the identification and extraction of information about specified classes of events and the filling of templates for each instance of such an event. Operates against pure text. Also known as NLU or NLP.
Naval Research and Development group (NRaD) of NOSC

15 Temporal Analysis Temporal analysis is the process of evaluating information, events and activities in light of models which encompass the concept of time or sequence and time. Model sequences incorporate a timeframe constraint on the identified events.
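One way to read "model sequences with a timeframe constraint" is as a pattern of event types that must occur in order within a fixed window. The sketch below checks that; the function name and event labels are illustrative, and the dates are borrowed from the stock split release on slide 3.

```python
from datetime import date, timedelta

def matches_sequence(events, pattern, window):
    """Return True if the event types in `pattern` occur in order,
    with the whole match falling within `window` of its first event.

    events: list of (date, event_type) tuples, in any order.
    """
    ordered = sorted(events)
    for i, (start, kind) in enumerate(ordered):
        if kind != pattern[0]:
            continue
        step = 1  # index of the next pattern element to find
        for when, later_kind in ordered[i + 1:]:
            if when - start > window:
                break  # outside the timeframe constraint
            if step < len(pattern) and later_kind == pattern[step]:
                step += 1
        if step == len(pattern):
            return True
    return False

events = [
    (date(1998, 3, 11), "announcement"),
    (date(1998, 3, 20), "record date"),
    (date(1998, 4, 3), "shares issued"),
]
pattern = ["announcement", "record date", "shares issued"]
```

With a 30-day window the pattern matches; with a 14-day window it does not, because the shares are issued 23 days after the announcement.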

16 Link Analysis Link analysis provides the ability to investigate relationships between people, places, events, and things. Ideally, it is a mechanism to walk through a data warehouse following those links which have meaning relevant to the immediate problem.
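Walking links between people, places, events, and things can be modeled as a search over a graph. The following sketch uses a breadth-first search to find the shortest chain of links between two entities; the entity names and helper functions are hypothetical placeholders, not data or APIs from the presentation.

```python
from collections import defaultdict, deque

def build_graph(links):
    """Build an undirected graph from (entity, entity) link pairs."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def find_path(graph, start, goal):
    """Breadth-first search: shortest chain of links from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # the two entities are not connected

links = [
    ("Person A", "Organization X"),
    ("Organization X", "Vehicle 1"),
    ("Vehicle 1", "Person B"),
    ("Person C", "Place Y"),
]
graph = build_graph(links)
```

Here `find_path(graph, "Person A", "Person B")` follows the organization and the vehicle to connect the two people. Restricting the walk to links that are meaningful for the immediate problem would mean filtering `links` by type before building the graph.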

17 Tools are nice but...
There has to be a reason:
–Analysis of operational data
–Analysis of associated data
–Discovering new relationships
–Discovering new trends
–Gaining new insights into your business
Competitive Edge

18 Different Tools for Different Kinds of Discovery

19 Information Extraction Translating text reports (prose) into tagged data Evaluating the tagged data to extract information Commonly referred to as Natural Language Understanding or Processing

20 A Focus on the Analysis of Textual Information
Typical process flow
–Receipt
–Auto-analysis: Classification, Extraction
–Archive
–Visualization
[Diagram: Wire Service Government Traffic Receipt Ten-Plus-Year Repository Analyst Queues Review Process Analyze Think Update Assessment Update Queue Profiles Ignore]

21 Making the Information Usable
Sterling Software Announces 2-For-1 Stock Split
DALLAS, Texas (March 11, 1998) - Mr. Sterling Williams of Sterling Software, Inc. (SSW-NYSE) today announced that the company's Board of Directors has approved a 2-for-1 split of the company's common stock. Stockholders will receive one additional common share for every share held on the record date of March 20, 1998. The additional shares will be issued on April 3, 1998.
Sterling Software currently has approximately 38.8 million shares of common stock outstanding. This number will double to approximately 77.6 million shares by reason of the stock split.

22 Information Extraction is not Information Retrieval
Information extraction gets facts out of documents -- you analyze the facts
Information retrieval gets sets of relevant documents -- you analyze the documents
Natural Language Processing Group, The University of Sheffield

23 Why is Information Extraction Difficult
There are many ways of expressing the same fact:
–BNC Holdings Inc named Ms G Torretta as its new chairman.
–Nicholas Andrews was succeeded by Gina Torretta as chairman of BNC Holdings Inc.
–Ms. Gina Torretta took the helm at BNC Holdings Inc.
Information may need to be combined across several sentences:
–After a long boardroom struggle, Mr Andrews stepped down as chairman of BNC Holdings Inc. He was succeeded by Ms Torretta.
Natural Language Processing Group, The University of Sheffield
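To make the difficulty concrete, here is a toy template filler that needs a separate hand-written pattern for each phrasing of the succession fact. The patterns, the `extract_succession` name, and the template fields are assumptions made for this sketch; practical extraction systems rely on much richer linguistic analysis than regular expressions.

```python
import re

TITLE = r"(?:Mr\.? |Ms\.? )?"  # optional courtesy title before a name

# One pattern per phrasing; each fills the same template fields.
PATTERNS = [
    re.compile(
        r"(?P<org>.+?) named " + TITLE + r"(?P<person>.+?) as its new (?P<post>\w+)\."
    ),
    re.compile(
        r".+? was succeeded by " + TITLE
        + r"(?P<person>.+?) as (?P<post>\w+) of (?P<org>.+)\.$"
    ),
    re.compile(TITLE + r"(?P<person>.+?) took the helm at (?P<org>.+)\.$"),
]

def extract_succession(sentence):
    """Fill an {org, person, post} template, or return None if no pattern fits."""
    for pattern in PATTERNS:
        match = pattern.match(sentence)
        if match:
            groups = match.groupdict()
            return {
                "org": groups["org"],
                "person": groups["person"],
                "post": groups.get("post"),  # not stated in every phrasing
            }
    return None
```

All three single-sentence phrasings yield the organization "BNC Holdings Inc", but the person comes back as "G Torretta" in one case and "Gina Torretta" in the others, and "He was succeeded by Ms Torretta." matches nothing because the organization and post appear only in the preceding sentence. Resolving those gaps is the coreference and cross-sentence problem the slide points to.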

24 Information Extraction (Document) Natural Language Understanding Lexical Analysis Article Reduction Simple Relations Common Events Coreference Domain Events Records

25 Correlation of Extracted Information (Other Documents)
The events in a single document are relevant to routing the document, but a single meeting (event) put in the context of other meetings (events) becomes much more useful.
Manual vs. Automated Process
User interest profiles, e.g.,
–Membership
–Meeting (Communication) Events
–Relocation (Movement) Events

26 Using Correlated Data (Mining Text (or other) Databases)
What would the user do if they knew how to use the visualization tools?
Automate the process:
–Use names of people and organizations for data mining.
–Use temporal analysis to align (chronologically) the events.
–Use link analysis to establish networks of people and things, e.g., vehicles.
Present the user with organized information.
Monitor the success of the process and feed back the results into the system.

27 Summary
Still faced with a tremendous amount of data.
Tools are available for acquiring information relevant to your business.
Tools to perform data mining over a substantial data warehouse require a commitment to:
–Money
–Time
–Training
–Personnel
The results are:

28 Thank you Mike Brenton Sterling Software www.sterling.com mbrenton@mclean.sterling.com ------------- Memex Technology Limited www.memex.co.uk ------------ Jim Basara Memex, Inc. jim@memex.com

