Semantics Rule, Keywords Drool
J. Brooke Aker, CEO, Expert System USA
February 2010
Corporate background
- Most accurate, largest, fastest growing semantics company worldwide
- 100+ customers, including large corporations and government, in:
  - business intelligence
  - enterprise search & data extensibility
  - market sentiment
  - customer care
- 100+ dedicated engineers focused on core semantic technology, applications, tools and services:
  - 200 man-years in the development of COGITO over the last 10 years
- 20 years old, private & profitable:
  - FY2008: $13.5M, 110+ employees, 30% growth in each of the last 3 years
  - Offices in Connecticut, California, UK, Italy & Germany
Why Do Keywords Drool?
3 Problems with Search Technology:
1. Same Word, Different Meanings: Jaguar (animal) / Jaguar (car)
2. Different Words, Same Meaning: Disability Legislation / Equal Opportunity Law
3. Different Words, Related Meaning: Organization / Company, Organization / Charity, Organization / Trade Union
Results in Declining Productivity
[Chart: Productivity of Search plotted against Amount of Information.
Eras: Desktop (PC Era), World Wide Web (Web 1.0), Social Web (Web 2.0), Semantic Web (Web 3.0).
Technologies shown: Databases, Files & Folders, Directories, Keyword Search (Google), Tagging, Natural Language Search, Semantic Search.]
Information Tasks In Business
[Diagram: a grid with two axes.
Query: Well Formed / Not Well Formed.
Sources: Known / Not Known.
Tasks placed on the grid: Search, Discovery, Analysis, Exploration.]
Information Measures In Business
1. Precision: Retrieving a high level of accurate results relevant to your search query (a measure of exactness)
2. Recall: Retrieving a high percentage of relevant documents (a measure of completeness)
[Chart: Recall (low to high) against Precision (low to high), positioning PowerSet, Keywords, Statistics and Semantics.]
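The two measures on this slide can be made concrete with a small calculation. The sketch below is only an illustration: the function name `precision_recall` and the document ids are hypothetical, and the "jaguar" query borrows the ambiguity example from the earlier slide.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids the search engine returned
    relevant:  document ids a human judged relevant (ground truth)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query "jaguar" where the user means the animal.
relevant = {"zoo_report", "big_cats", "rainforest"}
keyword_hits = {"zoo_report", "big_cats", "car_review", "dealer_ad"}

p, r = precision_recall(keyword_hits, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Here the keyword search returns two car pages (hurting precision) and misses one animal page (hurting recall), which is the trade-off the chart describes.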
What Business Wants IT to Provide
Semantics plays a role in all of these except perhaps the last two.
Source: AMR Research
So What Then is the Semantic Web?
[Diagram: producers and consumers in each generation of the web.]
- Web 1.0: One Producer, Many Consumers
- Web 2.0: Everyone Produces, Everyone Consumes
- Web 3.0: Everyone Produces, Pinpoint Consumption (semantics)
COGITO®: deep analysis
4 Approaches (approach: definition; example)
1. Morphological Analysis: understand word forms. Example: dog, dogs, and dog-catcher are closely related.
2. Grammatical Analysis: understand the parts of speech. Example: "There are 40 rows in the table" uses rows as a noun, vs. "She rows 5 times a week" uses rows as a verb.
3. Logical Analysis: understand how words relate to other words. Example: "Jeffrey Skilling, represented by Attorney Daniel Petrocelli, is married to Rebecca Carter". Rebecca is married to Jeffrey, not Daniel.
4. Semantic Analysis (disambiguation): understand the context of key words. Example: "I used beef broth for my soup stock" uses stock in the context of food, vs. "The company keeps lots of stock on hand" uses stock in the context of inventory.
Technology that understands the real meaning of the words, based on theories of human comprehension
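The "stock" example can be sketched as a toy disambiguator that picks the sense whose cue words overlap most with the sentence. This is a simplified overlap heuristic for illustration only, not a description of how COGITO works; the sense inventory `SENSES` and the function `disambiguate` are hypothetical.

```python
# Hypothetical sense inventory: each sense of "stock" lists a few cue words.
SENSES = {
    "stock (food)": {"broth", "soup", "beef", "simmer", "cook"},
    "stock (inventory)": {"company", "warehouse", "hand", "supply", "keeps"},
}


def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose cue words overlap most with the sentence.

    On a tie, the sense listed first in the inventory wins.
    """
    words = {w.strip('.,"').lower() for w in sentence.split()}
    return max(senses, key=lambda sense: len(senses[sense] & words))


print(disambiguate("I used beef broth for my soup stock"))      # stock (food)
print(disambiguate("The company keeps lots of stock on hand"))  # stock (inventory)
```

A keyword index sees the same token "stock" in both sentences; only the surrounding context separates the two meanings.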
The solution is Semantics
Using human comprehension for machine understanding of text.
Machine understanding of text needs:
- A semantic network
- A parser to trace each text back to its basic elements
- A linguistic engine to query the semantic network
- A system to eliminate ambiguity
Steps to establish meaning
[Diagram: three numbered steps, with the labels Semantic Network, Parse, Eliminate Ambiguity, Order & Priority, and Linguistic Query Engine.]
COGITO® is generic and horizontal, and can transform unstructured information into structured data that can be managed with standard databases.
What is a Semantic Network?
The heart of semantic technology:
- Quality of results is derived from the complexity and richness of the network.
- Includes all definitions of all words.
- Includes relationships among all words.
COGITO® English Semantic Network:
- 350,000 words
- 2.8 million relationships
Semantic Networks
Traditional technologies can only guess the meaning, using keywords, shallow linguistics & statistics.
Semantic networks instead identify:
- Connections
- Concepts
- Terms
- Abbreviations
- Phrases
- Meanings
- Domains
Examples: San Jose is an American city. San Jose is a geographic part of California.
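The two San Jose statements can be pictured as labeled edges in a graph. The sketch below is a minimal illustration of that idea; the class `SemanticNetwork` and the relation names `is_a` and `part_of` are assumptions made for the example, not COGITO's actual data model.

```python
from collections import defaultdict


class SemanticNetwork:
    """Tiny labeled graph: (concept, relation) -> set of related concepts."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, source, relation, target):
        self._edges[(source, relation)].add(target)

    def related(self, source, relation):
        # .get avoids creating empty entries for unknown lookups
        return sorted(self._edges.get((source, relation), set()))


net = SemanticNetwork()
net.add("San Jose", "is_a", "American city")
net.add("San Jose", "part_of", "California")

print(net.related("San Jose", "is_a"))     # ['American city']
print(net.related("San Jose", "part_of"))  # ['California']
```

Because each edge carries a relation type, a query can ask what San Jose is separately from where it is, which a flat keyword index cannot distinguish.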
Technology Stack
[Diagram: three layers.]
- Semantic Network: English, Arabic, Italian, German, Other Middle Eastern
- Linguistic Query Engine: 1. Morphology 2. Grammatical 3. Logic 4. Disambiguation
- Development Studio: Develop & Add Custom Rules
Precision figures shown on the diagram: 80% Precision, 90%+ Precision
Superior Performance
- Semantic text analysis processing speed (one CPU): 60 KB/sec
- Typical time of access to a concept in the semantic net: < 10^-6 sec
- Scalability in number of CPUs: Virtually unlimited
- Software memory footprint (semantic net and engine): 50 MB
- Number of concepts in English semantic net: 350,000
- Number of relations in semantic net (English): 2,800,000
- Also listed, with values garbled in the source: hyponyms and hypernyms; hypernyms and troponyms; average # of attributes for each concept
Expert System Unique Feature #1
Expanded Definition Sets: captures all possible ways of expressing a concept, beyond the use of a single word:
- Compound word: like blackbird or cookbook
- Collocation: like overhead projector or landing field
- Idiomatic expression: like to fly off the handle or to weigh anchor
- Locutions: groups of words that express simple concepts that cannot be expressed by a single word
- Verbal lemmas: such as a verb in the infinitive form, e.g. to write, or verbal collocations, e.g. to sneak away
Keyword / Statistical and Shallow Semantic Tech Fails Here: it treats the words in "to fly off the handle" as separate words, not as a single concept.
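The difference between word-level and concept-level tokens can be sketched with a greedy longest-match lookup against a list of multi-word expressions. This is an illustrative technique chosen for the example, not a claim about COGITO's implementation; `EXPRESSIONS` and `tokenize_concepts` are hypothetical names, and the entries come from the slide's examples.

```python
# Hypothetical lexicon of multi-word expressions, taken from the slide's examples.
EXPRESSIONS = {
    ("fly", "off", "the", "handle"),
    ("overhead", "projector"),
    ("landing", "field"),
    ("sneak", "away"),
}
MAX_LEN = max(len(e) for e in EXPRESSIONS)


def tokenize_concepts(text):
    """Split text into tokens, merging known multi-word expressions (longest match first)."""
    words = text.lower().split()
    tokens, i = [], 0
    while i < len(words):
        # Try the longest candidate span first, down to two words.
        for n in range(min(MAX_LEN, len(words) - i), 1, -1):
            if tuple(words[i:i + n]) in EXPRESSIONS:
                tokens.append(" ".join(words[i:i + n]))
                i += n
                break
        else:
            # No expression starts here: keep the single word.
            tokens.append(words[i])
            i += 1
    return tokens


print(tokenize_concepts("He will fly off the handle near the overhead projector"))
# ['he', 'will', 'fly off the handle', 'near', 'the', 'overhead projector']
```

A plain whitespace tokenizer would yield ten unrelated words for the same sentence; here the idiom and the collocation each survive as one unit that can be looked up as a concept.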
Expert System Unique Feature #2
Expanded Semantic Relations: an expanded set (65) of relations between concepts, found by looking at their use within the text. Answers questions like "Who did what to whom?", often called a triple or a subject-action-object. WordNet, for example, contains only 5 relation types.
Relation types include:
- Verb / Subject
- Verb / Direct Object
- Adjective / Class
- Syncon / Class
- Syncon / Corpus
- Syncon / Geography
- Fine Grain / Coarse Grain
- Supernomen / Subnomen
- Omninomen / Parsnomen
Keyword / Statistical and Shallow Semantic Tech Fails Here: it treats "RIM sued Verizon" as the same thing as "Verizon sued RIM".
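The RIM and Verizon example can be checked directly: a bag-of-words view cannot tell the two sentences apart, while a subject-action-object triple can. The sketch below is deliberately naive; `bag_of_words` and `naive_triple` are hypothetical helpers, and the triple extraction assumes a three-word "SUBJECT VERB OBJECT" sentence rather than real parsing.

```python
from collections import Counter


def bag_of_words(sentence):
    """Keyword view: word counts only, word order discarded."""
    return Counter(sentence.lower().split())


def naive_triple(sentence):
    """Very naive subject-action-object split; assumes exactly 'SUBJECT VERB OBJECT'."""
    subject, action, obj = sentence.split()
    return (subject, action, obj)


a, b = "RIM sued Verizon", "Verizon sued RIM"

print(bag_of_words(a) == bag_of_words(b))  # True: keywords cannot tell them apart
print(naive_triple(a) == naive_triple(b))  # False: the triples differ
```

A real system would need grammatical and logical analysis to find the subject and object in arbitrary sentences; the point here is only that keeping the roles preserves who did what to whom.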
Expert System Unique Feature #3
Categories of Attributes: every concept in the semantic network also contains attributes, which are organized into a hierarchy of categories. The attributes and categories are assigned to maximize similarities and differences between concepts as an aid in disambiguation.
Categories: object, animals, plants, people, concepts, places, time, natural phenomena, states, quantity, groups
Keyword / Statistical and Shallow Semantic Tech Fails Here: it cannot tell you which portions of a document are related to a category; as a first cut it only points to words, not to sections within a long document.