An Introduction to Semantic Technology for SharePoint Administrators
2 Who We Are Leader in the development of semantic software, used by organizations to make information management more efficient and to gain strategic knowledge through the automatic comprehension of text.
3 Our Customers
4 Why are we here? Greater demand for information in the decision-making process. Ever-increasing volumes of data to be considered every day (documents, emails, web pages, social media and “Big Data”). Traditional technologies increasingly expected to manage and process information. But most organizations are not taking advantage of all of their data.
5 Ultimately, we are here to create value from information: INCREASED SALES, REDUCED COSTS. Increase customer satisfaction; increase competitive advantage through the monitoring of markets and innovations; enhance brand value with targeted social media analysis; simplify the organization and recovery of data; improve internal knowledge sharing; more timely and effective customer interactions; reduce the time and costs of traditional customer assistance; increase sales and customers.
6 So much data, so little time For organizations that use SharePoint, it is their primary means of collaboration. These organizations have invested a tremendous amount of time and money to connect their employees and their data to improve communication and workflow. Despite this investment, users still spend massive amounts of time searching for relevant data and content. Search is AN ABSOLUTE FAILURE for large SharePoint deployments. The bigger the farm, the less useful and less relevant search becomes.
7 WHY? The majority of enterprise content is unstructured, in the form of electronic documents, emails, forms, etc. Searching through the textual portion of unstructured content can be a daunting task, as the search is highly likely to return a large number of possible results. Further, people are generally searching for content inside content and for inter-relationships between content, which complicates search even more. Most companies are guilty of one or more of the following: underutilization of features; lack of clear requirements or vision; not using metrics to gauge feature usage and adoption; too few staff to properly support the platform.
8 The problems with unstructured data Extraction and categorization are used to structure unstructured data and make the retrieval and management of information more effective. Taxonomy and text mining rules are often dependent on specific business needs and influenced by market sector and project objectives. Organizations need flexible solutions that are easily integrated and customizable, and capable of responding to specific requirements for extraction and categorization.
SharePoint technologies for managing information Technology is a key factor in managing unstructured information. There are different approaches for managing unstructured information: keyword-based plus statistical elements; shallow linguistics; semantic technology.
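Keyword Based Text is divided into single words that are inserted in an alphabetical index, with no understanding of content: Az IBM szokásosan nagy hangsúlyt helyez a továbbképzésre, így munkatársai évente számos szakmai tanfolyamon vesznek részt. Az elmúlt években a csoport több tagja is részt vett több hónapos, egyesült államokbeli, angliai illetve németországi projekt munkákban, melyek során nemzetközi csoportban végeztek fejlesztői tevékenységet. [Sample text in Hungarian. In English: "IBM customarily places great emphasis on further training, so its employees take part in numerous professional courses every year. In recent years, several members of the group have also taken part in multi-month project work in the United States, England and Germany, during which they carried out development work in an international team."]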
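To make the keyword-based approach concrete, here is a minimal sketch of such an index in Python. The function name `build_keyword_index` and the two sample documents are illustrative assumptions, not part of any SharePoint or vendor API. The index only records which words occur where; it has no notion of what they mean.

```python
import re
from collections import defaultdict


def build_keyword_index(documents):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


# Hypothetical sample documents, invented for this illustration.
documents = {
    "doc1": "The jaguar is a big cat found in the Americas.",
    "doc2": "The new Jaguar has a powerful engine.",
}
index = build_keyword_index(documents)

# List the terms alphabetically, as in a classic keyword index.
for term in sorted(index):
    print(term, sorted(index[term]))

# Both documents match "jaguar": the index cannot tell the animal from the car.
print(sorted(index["jaguar"]))
```

A query for "jaguar" returns both documents, which is exactly the ambiguity the later slides address.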
Shallow Linguistics Words in the text are recognized as either belonging to the dictionary, or not. Recognized words are linked to their basic headword and assigned a grammatical type. Some logical groupings are made. Indexes contain headwords and keywords.
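The dictionary lookup described on this slide can be sketched as follows. This is a toy illustration under stated assumptions: the `DICTIONARY` contents and the `shallow_analyze` function are hypothetical, and a real system would use a full lexicon.

```python
# Hypothetical miniature dictionary: word form -> (headword, grammatical type).
DICTIONARY = {
    "there": ("there", "adverb"),
    "are": ("be", "verb"),
    "rows": ("row", "noun or verb"),
    "in": ("in", "preposition"),
    "the": ("the", "article"),
    "table": ("table", "noun"),
    "tables": ("table", "noun"),
}


def shallow_analyze(text):
    """Return (token, headword, type) triples; unknown words get headword None."""
    results = []
    for raw in text.lower().split():
        token = raw.strip(".,;:!?\"'")
        if not token:
            continue
        if token in DICTIONARY:
            headword, word_type = DICTIONARY[token]
            results.append((token, headword, word_type))
        else:
            # The word is not in the dictionary, so nothing more is known about it.
            results.append((token, None, "unknown"))
    return results


analysis = shallow_analyze("There are 40 rows in the table.")
for entry in analysis:
    print(entry)
```

Note that "rows" stays tagged as "noun or verb": a lookup alone cannot decide between the two, which is where deeper grammatical analysis comes in.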
Semantics Simulate a human’s process of text analysis. Morphological analysis, parsing, sentence and semantic analysis allow the extraction of large amounts of information and work from a conceptual point of view (thanks to the semantic network). Document indexing creates a set of words, headwords, concepts, relationships, subjects and structures (cognitive/conceptual map).
Why Does Semantic Technology Excel? The answer is in the primary measures of Information Management. #1 – Precision (a measure of exactness): the share of retrieved results that are relevant to your search. #2 – Recall (a measure of completeness): the share of relevant documents that are retrieved. Locating what applies. Keyword and statistics (math based) technology can achieve one, but not both.
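The two measures have standard definitions: precision is relevant results retrieved divided by all results retrieved, and recall is relevant results retrieved divided by all relevant documents. A short sketch, with document ids invented for illustration:

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query from two collections of ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 5 documents actually relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d5", "d6", "d7"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# prints "precision = 0.50, recall = 0.40"
```

Returning more documents tends to raise recall and lower precision, and returning fewer does the opposite, which is the trade-off the slide refers to.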
Why is Semantics Important? ...because language is too ambiguous. Same word – different meanings: Jaguar = car; jaguar = animal; Jaguars = football team. Different words – same meaning: Disability Legislation = Equal Opportunity Law. Different words – related meanings: Organization = company; Organization = trade union; Organization = charity.
What Does Properly Analyzed Mean? 4 Requirements
Requirement | Definition | Example
Morphological Analysis | Understand word forms | "dog", "dog-catcher" and "doggy-bag" are closely related.
Grammatical Analysis | Understand the parts of speech | "There are 40 rows in the table" uses "rows" as a noun; "She rows 5 times a week" uses "rows" as a verb.
Logical Analysis | Understand how words relate to other words | "Davey Jones, represented by attorney Daniel Stanley, is married to Rebecca Carter": Rebecca is married to Davey, not Daniel.
Semantic Analysis (disambiguation) | Understand the context of key words | "I used chicken broth for my soup stock" uses "stock" in the context of food; "The company keeps lots of stock on hand" uses "stock" in the context of inventory.
16 Using Semantic Technology [Diagram labels: morphological analysis; parsing; sentence analysis; semantic network; concepts, domain ontologies, places, companies, products, people]
Adding value to information with syncons and links A syncon coincides with a node of the semantic net; each is connected to other syncons by specific semantic relations (links) that form a hierarchical structure with inheritance. This structure allows every node to inherit characteristics from the nodes it is linked to, thus enriching itself with information. Information inherently contains different kinds of links: hypernymy link (is a/type of); meronymy link (has a/part of); geographical link; linguistic relations link (subject/verb, verb/object).
Ordering principles Links, which identify the semantic relationships between syncons, are the ordering principles. Syncons may contain: single headwords ('set', 'vacation', 'work', 'quick', 'more'); compounds ('non-stop', 'abat-jour', 'policeman'); collocations ('credit card', 'university degree', 'go forward'). A syncon has the following main elements: word class (noun, verb, adjective, adverb); semantic relations (links); gloss (explanation of meaning); domain, register and frequency.
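The elements listed on this slide map naturally onto a small record type. The sketch below is an assumption about how one might model a syncon for illustration; the `Syncon` class, its field names and the sample values are hypothetical and do not describe the vendor's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Syncon:
    """One node of the semantic network: a set of terms sharing one meaning."""
    terms: list            # headwords, compounds or collocations
    word_class: str        # noun, verb, adjective or adverb
    gloss: str             # explanation of meaning
    domain: str = ""
    register: str = ""
    frequency: str = ""
    # Link type -> list of linked concepts (names used here as simple keys).
    links: dict = field(default_factory=dict)


# Hypothetical example built from a collocation named on the slide.
credit_card = Syncon(
    terms=["credit card"],
    word_class="noun",
    gloss="a card that lets the holder buy goods and services on credit",
    domain="finance",
    links={"supernomen": ["card"]},
)

print(credit_card.terms, credit_card.word_class, credit_card.links)
```

Using `field(default_factory=dict)` gives every syncon its own `links` dictionary instead of one shared between instances.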
Links The supernomen/subnomen link concerns the relationship between a specific concept and a more general one. A supernomen is the more general term: a word whose meaning is general compared with the words that specify that meaning. EXAMPLES: Dog – hunting dog – Irish terrier; Habitation – flat – two-roomed flat; Computer – portable computer – palmtop.
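The chains in these examples can be walked programmatically. The following sketch stores each concept's supernomen in a plain dictionary; the names `SUPERNOMEN`, `supernomen_chain` and `is_a` are illustrative assumptions, and a real semantic network would allow more than one parent per node.

```python
# Each concept points to its more general term (its supernomen).
# The entries come from the slide's examples.
SUPERNOMEN = {
    "Irish terrier": "hunting dog",
    "hunting dog": "dog",
    "two-roomed flat": "flat",
    "flat": "habitation",
    "palmtop": "portable computer",
    "portable computer": "computer",
}


def supernomen_chain(concept):
    """Return the increasingly general terms above a concept, nearest first."""
    chain = []
    while concept in SUPERNOMEN:
        concept = SUPERNOMEN[concept]
        chain.append(concept)
    return chain


def is_a(concept, general):
    """True if `general` appears anywhere above `concept` in the hierarchy."""
    return general in supernomen_chain(concept)


print(supernomen_chain("Irish terrier"))  # ['hunting dog', 'dog']
print(is_a("palmtop", "computer"))        # True
print(is_a("dog", "Irish terrier"))       # False
```

With such a hierarchy, a search for "dog" could also return a document that only mentions "Irish terrier", which helps recall without resorting to loose keyword matching.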
Links The superverbum/subverbum link is one of the semantic relationships that link verb syncons together. It is to verbs what the supernomen/subnomen link is to nouns. EXAMPLES: Eat – nibble at, eat listlessly; Sleep – doze, snooze; Walk – limp.
Links The omninomen/parsnomen link is a “part/whole” semantic relationship. A parsnomen is a term that indicates a part of something (the omninomen). EXAMPLES: Limb – hand – finger; House – bathroom – washbasin; Tree – trunk – bark.
Parsing Parsing is a complete morphological, grammatical and syntactical analysis of a sentence, quickly applying many thousands of rules. Parsing identifies every element of a text, assigning each to the appropriate logical and grammatical function.
Disambiguating Text For a human, the meaning of a word is clear because the surrounding elements help them understand the sense in which the word is used. Software needs an unambiguous interpretation of each word, represented by a reference system that is equivalent to human experience of the world. If correctly trained in human common sense, the computer can achieve a logical comprehension of the world and combine it with its own memory and computing power.
Disambiguating Disambiguation analyzes single sentences or whole documents and finds the correct meaning for each element by removing every ambiguity. “Reasoning” takes place, which identifies the different meanings of all elements of a text and the reference context.
Examples of Disambiguation Let’s look at some sentences using the term “bomb”: The disambiguator intercepts the first possible meaning of “bomb”: it is a sports noun that means a long, high forward pass. In the second sentence, “bomb” is still a noun, but in this case it means a commercial or artistic failure.
Examples of Disambiguation If “bomb” appears in a sentence with the term volcano, it is interpreted as a lump of lava. Finally, in the following sentence, the disambiguator interprets the term “bomb” as an explosive device.
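The idea that surrounding words select the sense can be illustrated with a deliberately simple cue-word overlap heuristic. This is not the semantic-network disambiguation the slides describe: the four senses come from the slides, but the cue words, the `disambiguate` function and the test sentences are assumptions invented for this sketch.

```python
# The four senses of "bomb" named on the slides, each with hypothetical cue words.
SENSES = {
    "long high forward pass": {"quarterback", "threw", "touchdown", "receiver"},
    "commercial or artistic failure": {"movie", "film", "critics", "flop"},
    "lump of lava": {"volcano", "lava", "eruption", "crater"},
    "explosive device": {"exploded", "police", "defused", "detonated"},
}


def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose cue words overlap most with the sentence, or None."""
    words = {w.strip(".,;:!?\"'").lower() for w in sentence.split()}
    scores = {sense: len(words & cues) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    # With no overlapping cue word there is no evidence for any sense.
    return best if scores[best] > 0 else None


print(disambiguate("The volcano hurled a bomb into the air."))
print(disambiguate("The quarterback threw a bomb for a touchdown."))
print(disambiguate("Police defused the bomb before it exploded."))
```

A heuristic like this fails as soon as the context uses words outside the cue lists, which is the gap a semantic network with links between concepts is meant to close.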
Disambiguator: Text Map Using semantics allows for the creation of a cognitive knowledge map, a graphic view of the text elements analyzed. We will use this internet biography of Edgar Allan Poe as an example:
Disambiguator: Classification Classification recognizes the main categories in the text (literature, military, publishing, etc.), each with its corresponding percentage, and identifies the main concepts for each semantic domain: “allegory”, “book review”, “character” for literature; “Tamerlane”, “United States Army”, “West Point” for military; “book” for publishing; “epic poem”, “Baudelaire” for poetry, and so on.
Disambiguator: Main Concepts The main concepts included in the text are listed; the colored bar indicates each concept's frequency in the document analyzed.
Semantic Analysis Means Disambiguation Disambiguation is made possible by: a semantic network that contains the representation of concepts and the relationships between them; a disambiguation engine that, based on knowledge from the semantic network, is able to associate every textual element with the meaning it represents. [Diagram labels: morphological analysis; parsing; sentence analysis; semantic network; concepts, domain ontologies, places, companies, products, people]
What is a Semantic Network? A lexical database structured by a conceptual framework. This means that words are structured in groups of synonyms, that is, words that are identical or similar in expressed meaning (concept). A concept in the language is called a syncon (synonymous congressus), which is a set of synonyms representing the same lexical concept.