Semantics Rule, Keywords Drool. J. Brooke Aker, CEO, Expert System USA. February 2010.

1 Semantics Rule, Keywords Drool. J. Brooke Aker, CEO, Expert System USA. February 2010

2 Corporate background
- Most accurate, largest, fastest-growing semantics company worldwide
- 100+ customers, including large corporations and government, in: business intelligence, enterprise search & data extensibility, market sentiment, customer care
- 100+ dedicated engineers focused on core semantic technology, applications, tools and services: 200 man-years in the development of COGITO over the last 10 years
- 20 years old, private & profitable: FY2008: $13.5M, 110+ employees, 30% growth in each of the last 3 years
- Offices in Connecticut, California, UK, Italy, & Germany

3 Why Do Keywords Drool? 3 Problems with Search Technology:
1. Same word, different meanings: Jaguar (animal) vs. Jaguar (car)
2. Different words, same meaning: Disability Legislation and Equal Opportunity Law
3. Different words, related meaning: Organization and Company, Organization and Charity, Organization and Trade Union
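The first two problems on this slide can be reproduced with a few lines of exact-match keyword search. This is only a minimal sketch: the three documents and the `keyword_search` helper are invented for illustration and do not come from the presentation.

```python
# Toy corpus; the document texts are hypothetical examples.
DOCS = {
    "d1": "the jaguar is a big cat native to the americas",
    "d2": "the new jaguar sedan has a supercharged engine",
    "d3": "equal opportunity law protects workers with impairments",
}


def keyword_search(query, docs=DOCS):
    """Return ids of documents that contain every query word exactly."""
    terms = query.lower().split()
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if all(term in text.split() for term in terms)
    )


# Problem 1: one word, two meanings. Both the animal and the car come back.
same_word = keyword_search("jaguar")

# Problem 2: two phrasings, one meaning. d3 is about the topic but shares
# no words with the query, so nothing is returned.
different_words = keyword_search("disability legislation")

print(same_word, different_words)
```

The matcher has no notion of meaning, so it cannot separate the two jaguars or connect the two legal phrasings; both would need knowledge beyond the surface strings.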

4 Results in Declining Productivity
[Chart: productivity of search versus amount of information. Technologies plotted: Databases, Files & Folders, Directories, Keyword Search (Google), Tagging, Natural Language Search, Semantic Search. Eras: Desktop (PC Era), World Wide Web (Web 1.0), Social Web (Web 2.0), Semantic Web (Web 3.0).]

5 Information Tasks In Business
[Matrix: Query Well Formed vs. Query Not Well Formed, against Sources Known vs. Sources Not Known. Quadrant labels: Search, Discovery, Analysis, Exploration.]

6 Information Measures In Business
1. Precision: retrieving a high level of accurate results relevant to your search query (a measure of exactness)
2. Recall: retrieving a high percentage of relevant documents (a measure of completeness)
[Chart: recall versus precision, each running from low to high, positioning PowerSet, Keywords, Statistics, and Semantics.]
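The two measures have standard set-based definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. A minimal sketch follows; the document ids are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query from two id collections."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Guard against empty sets to avoid division by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 3 documents truly relevant,
# 2 of the returned ones are among the relevant ones.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d5"],
)
print(f"precision={p:.2f} recall={r:.2f}")
```

Returning more documents tends to raise recall and lower precision, which is why the slide treats them as two separate axes.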

7 What Business Wants IT to Provide
Semantics plays a role in all of these except perhaps the last 2. Source: AMR Research

8 So What Then is the Semantic Web?
- Web 1.0: One Producer, Many Consumers
- Web 2.0: Everyone Produces, Everyone Consumes
- Web 3.0: Everyone Produces, Pinpoint Consumption (semantics)

9 COGITO®: deep analysis
Technology that understands the real meaning of the words – based on theories of human comprehension. 4 approaches (approach, definition, example):
- Morphological Analysis: understand word forms. Example: dog, dogs, and dog-catcher are closely related.
- Grammatical Analysis: understand the parts of speech. Example: "There are 40 rows in the table" uses rows as a noun, vs. "She rows 5 times a week" uses rows as a verb.
- Logical Analysis: understand how words relate to other words. Example: "Jeffrey Skilling, represented by Attorney Daniel Petrocelli, is married to Rebecca Carter". Rebecca is married to Jeffrey, not Daniel.
- Semantic Analysis (disambiguation): understand the context of key words. Example: "I used beef broth for my soup stock" uses stock in the context of food, vs. "The company keeps lots of stock on hand" uses stock in the context of inventory.
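The disambiguation example for "stock" can be sketched with a simple context-overlap heuristic, in the spirit of the simplified Lesk algorithm. This is not COGITO's method, which the slides do not describe in detail; the cue-word lists below are hypothetical and chosen only to fit the two example sentences.

```python
# Hypothetical cue words per sense; a real system would draw these from
# a semantic network rather than a hand-written table.
SENSES = {
    "stock": {
        "food": {"broth", "soup", "beef", "simmer"},
        "inventory": {"company", "warehouse", "hand", "keeps", "supply"},
    }
}


def disambiguate(word, sentence, senses=SENSES):
    """Pick the sense whose cue words overlap most with the sentence."""
    context = set(sentence.lower().replace(".", "").split())
    scores = {
        sense: len(cues & context)
        for sense, cues in senses[word].items()
    }
    return max(scores, key=scores.get)


food_sense = disambiguate("stock", "I used beef broth for my soup stock")
inventory_sense = disambiguate(
    "stock", "The company keeps lots of stock on hand"
)
print(food_sense, inventory_sense)
```

The same surface word resolves to two different concepts once the surrounding words are taken into account, which is the point of the fourth row on the slide.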

10 The solution is Semantics
Using human comprehension for machine understanding of text. Machine understanding of text needs:
- A semantic network
- A parser to trace each text back to its basic elements
- A linguistic engine to query the semantic network
- A system to eliminate ambiguity
Steps to establish meaning: 1. Parse, 2. Eliminate Ambiguity, 3. Order & Priority
[Diagram: the three steps shown with the Semantic Network and the Linguistic Query Engine.]

11 COGITO® is generic and horizontal, and can transform unstructured information into structured data that can be managed with standard databases

12 What is a Semantic Network? The heart of semantic technology:
- Quality of results derives from the complexity and richness of the network.
- Includes all definitions of all words.
- Includes relationships among all words.
COGITO® English Semantic Network:
- 350,000 words
- 2.8M relationships

13 Semantic Networks
Traditional technologies can only guess the meaning using keywords, shallow linguistics, & statistics. Semantic Networks instead identify: Connections, Concepts, Terms, Abbrev., Phrases, Meanings, Domains.
Examples: San Jose is an American city. San Jose is a geographic part of California.
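A semantic network of this kind can be pictured as a labeled graph: concepts are nodes and typed relations are edges. The sketch below stores the two San Jose facts from the slide. The class, the relation names `is_a` and `part_of`, and the extra "American city is a city" fact are assumptions made for illustration, not COGITO's actual data model.

```python
from collections import defaultdict


class SemanticNetwork:
    """Tiny labeled graph: (concept, relation) -> set of target concepts."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, source, relation, target):
        self._edges[(source, relation)].add(target)

    def related(self, source, relation):
        """Direct targets of one relation from one concept."""
        return sorted(self._edges.get((source, relation), set()))

    def ancestors(self, concept, relation="is_a"):
        """Follow a relation transitively and collect every concept reached."""
        found, frontier = set(), [concept]
        while frontier:
            current = frontier.pop()
            for parent in self._edges.get((current, relation), ()):
                if parent not in found:
                    found.add(parent)
                    frontier.append(parent)
        return sorted(found)


net = SemanticNetwork()
net.add("San Jose", "is_a", "American city")        # from the slide
net.add("San Jose", "part_of", "California")        # from the slide
net.add("American city", "is_a", "city")            # assumed extra fact

kinds = net.ancestors("San Jose")
location = net.related("San Jose", "part_of")
print(kinds, location)
```

Keeping the relation type on each edge is what lets a query distinguish "what kind of thing is San Jose" from "where is San Jose", a distinction a keyword index does not carry.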

14 Technology Stack
[Diagram: a Semantic Network per language (English, Arabic, Italian, German, Other Middle Eastern); a Linguistic Query Engine (1. Morphology, 2. Grammatical, 3. Logic, 4. Disambiguation); a Development Studio (Develop & Add Custom Rules). Precision labels: 80% Precision, 90%+ Precision.]

15 100% Semantic Technology
[Diagram labels: Semantic Intelligence, Linguistic rules, Sentence analysis, Semantic Network, Shallow text analytics, Statistics, Heuristic rules, Morphological recognition, Keyword-based technologies. Capabilities: Disambiguation, Entity extraction, Categorization, Natural lang. UI, Semantic Search, Discovery, Sentiment.]

16 Superior Performance
- Semantic text analysis processing speed (one CPU): 60 KB/sec
- Scalability in number of CPUs: virtually unlimited
- Typical time of access to a concept in the semantic net: <10^-6 sec
- Number of concepts in English semantic net: 350,000
- Hyponyms and hypernyms: 400,000+
- Hypernyms and troponyms: 55,000
- Average # of attributes for each concept: 20
- Number of relations in semantic net (English): 2,800,000
- Software memory footprint (semantic net and engine): 50 MB

17 Expert System Unique Feature #1
Expanded Definition Sets: captures all possible ways of expressing a concept, beyond the use of a single word:
- Compound word: like blackbird or cookbook
- Collocation: like overhead projector or landing field
- Idiomatic expression: like to fly off the handle or to weigh anchor
- Locutions: groups of words that express simple concepts that cannot be expressed by a single word
- Verbal lemmas: such as a verb in the infinitive form, e.g. to write, or verbal collocations, e.g. to sneak away
Keyword / Statistical and Shallow Semantic Tech Fails Here: treats the words of "to fly off the handle" as separate words, not as a concept.
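One common way to treat a multiword expression as a single unit is greedy longest-match lookup against a list of known expressions. The sketch below assumes a tiny hand-written lexicon built from the slide's examples. It does no lemmatization, so an inflected form such as "flies off the handle" would not match, and it is not a description of how COGITO itself works.

```python
# Hypothetical lexicon of multiword expressions, stored as word tuples.
EXPRESSIONS = {
    ("fly", "off", "the", "handle"),
    ("overhead", "projector"),
    ("landing", "field"),
    ("sneak", "away"),
}
MAX_LEN = max(len(expression) for expression in EXPRESSIONS)


def chunk(sentence):
    """Split a sentence into units, preferring the longest known expression."""
    words = sentence.lower().split()
    units, i = [], 0
    while i < len(words):
        # Try the longest window first and shrink down to a single word.
        for size in range(min(MAX_LEN, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + size])
            if size == 1 or candidate in EXPRESSIONS:
                units.append(" ".join(candidate))
                i += size
                break
    return units


units = chunk("managers fly off the handle near the overhead projector")
print(units)
```

A plain whitespace tokenizer would yield nine separate words here; the lookup yields five units, two of which are whole expressions that can then be mapped to a single concept.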

18 Expert System Unique Feature #2
Expanded Semantic Relations: an expanded set (65) of relations between concepts, found by looking at their use within the text. Answers questions like "Who did what to whom?", often called a triple or a subject-action-object. WordNet, for example, contains only 5 relation types.
Relation types shown: Verb / Subject, Verb / Direct Object, Adjective / Class, Syncon / Class, Syncon / Corpus, Syncon / Geography, Fine Grain / Coarse Grain, Supernomen / Subnomen, Omninomen / Parsnomen
Keyword / Statistical and Shallow Semantic Tech Fails Here: treats "RIM sued Verizon" as the same thing as "Verizon sued RIM".
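The RIM and Verizon example is easy to check: the two sentences have identical bags of words but different subject-action-object triples. The sketch below uses a deliberately naive extractor that assumes a three-word "Subject verb Object" sentence; real triple extraction requires parsing, and both helper functions are illustrative only.

```python
from collections import Counter


def bag_of_words(sentence):
    """Order-free representation used by keyword and statistical methods."""
    return Counter(sentence.lower().split())


def svo_triple(sentence):
    """Naive extraction: assumes exactly 'Subject verb Object'."""
    subject, action, obj = sentence.split()
    return (subject, action, obj)


a, b = "RIM sued Verizon", "Verizon sued RIM"

same_bag = bag_of_words(a) == bag_of_words(b)
same_triple = svo_triple(a) == svo_triple(b)
print(same_bag, same_triple)
```

Because the bag of words discards order, it cannot say who sued whom; the triple keeps the roles and so keeps the two claims apart.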

19 Expert System Unique Feature #3
Categories of Attributes: every concept in the semantic network also contains attributes, which are organized into a hierarchy of categories. The attributes and categories are assigned to maximize similarities and differences between concepts as an aid in disambiguation.
Categories shown: object, animals, plants, people, concepts, places, time, natural phenomena, states, quantity, groups
Keyword / Statistical and Shallow Semantic Tech Fails Here: can't tell you what portions of a document are related to categorically … e.g. it only points to words, not sections within a long document, as a first cut.

20 Thank you. Brooke Aker, CEO of Expert System US, +1 860-614-2411, baker@expertsystem.net, www.expertsystem.net

