From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology
Mission Statement: HOLTRAN (Higher Order Logic Translation) Technology fills the gap between two fundamental methods of representing information: unstructured texts and structured data. HOLTRAN Technology aims to serve as a universal international standard for the representation, storage and exchange of information. Our mission is to become an industry-leading provider of Natural Language Processing (NLP) solutions for consumers and companies. HOLTRAN Technology Ltd.
What do we do? We build a next-generation knowledge base engine that: extracts information from multilingual natural language texts; stores the information in a structured form; keeps semantic and textual information on an equal footing; enables knowledge base queries in natural languages.
Why? "The challenge … is to find effective solutions to unlock the value from unstructured information sources, and to leverage it in Business Intelligence deployments" (Butler Group). "Poor classification costs a 10,000 user organization $10M annually" (usability expert Jakob Nielsen). "Unstructured information doubles in quantity every three months" (Gartner Group). "In modern enterprise, 85% of data is unstructured" (Butler Group).
NEED: a common interface framework for diverse multilingual and multiform sources of information. OBSTACLE: unstructured information is human-friendly BUT meaningless to computers and heavily dependent on manual processes.
The general need for a common interface able to bind and organize diverse multilingual and multiform sources of information has also recently been recognized by leading European companies and by the ITEA board. In its Technology Roadmap on Software Intensive Systems, the board formulated the main challenge for semantic data as the possibility for applications to understand the meaning of each other's data, and it invited HOLTRAN Technology Ltd. to the annual ITEA meeting in Amsterdam (January 2003). As a result, the company was invited to join the 6th-call EUREKA ITEA consortium DigiNews (News and Information for mobile e-paper terminals; leader: Philips Technology). HOLTRAN Technology addresses this need by providing a framework for building a new generation of semantic, knowledge-based information systems that can extract semantic (structured) information from multilingual textual (unstructured) form and, vice versa, express the stored structured information in textual form.
Novelty: employment of a knowledge model based on our own improved multiple-sorted type theory. A HOLTRAN-based system is able to: store relationships of arbitrary order between semantic entities (including incomplete and contradictory information); store extendable language definitions of any type and complexity; effectively evaluate full responses to any queries related to the stored information.
Novelty in Question Answering: Unlike search engines, the HOLTRAN Question Answering system aims to supply users with the essence of "just the right information," instead of merely providing a list of hits. Current question answering systems are based on sentence level information retrieval. The rate of correct answers in such systems is about 70%. The HOLTRAN Question Answering system provides a revolutionary leap from sentence level retrieval to sense level information retrieval and answer formation.
Why are we better than other QA systems? State-of-the-art question answering systems use a variety of linguistic resources to understand users' queries and match document sections. The most common linguistic resources include: –part-of-speech tagging –parsing –named entity extraction –semantic relations –dictionaries, WordNet, etc. HOLTRAN Technology provides a unique framework that embraces these resources inside the system and thus covers complex lexical, syntactic and semantic relationships between question and answer strings.
The social impact of HOLTRAN in 2020 will cover the following situations: Two or more people talk over the "phone", each speaking and receiving answers in his native language. While traveling in a car, one can ask "it" any relevant question and receive the system response in his native language. One can receive ANY "newspaper" in his native language (the first step is to be made through our participation in the DigiNews ITEA project). Fully automated call centers: each user can receive the answer in his native language. Translation from one artificial language to another, providing full database compatibility without additional software development (a solution to the PDM-ERP compatibility problem). No manual transaction treatment: all e-mails are read and treated automatically (spam as well).
Typical modern information system architecture. [Diagram components: Relational DBMS, Unstructured documents, System Server, System Client, Local documents, User. Data flow labels: Meta data (SQL), Documents (text stream), Meta data (XML), Documents (text stream), Meta data + Documents (GUI).] The system repository is a hybrid of a relational database of some related meta information and a vault of textual documents stored as unstructured "black boxes". The System Server executes queries against the relational DB, provides access to the textual documents, and communicates with the system clients, which usually present a GUI to the end users.
The basic problems of the classical architecture: redundancy and inconsistency between the contents of primary documents and the meta information on these documents; limitations on inter-version and inter-application compatibility. HOLTRAN technology solves these fundamental problems through its ability to: extract the semantic content from unstructured documents; extract more semantic information from the same documents when the language definitions are extended, without new software development; express the content in textual form.
HOLTRAN based information system architecture. [Diagram components: HOLTRAN KBMS, HOLTRAN Interpreter, Local documents, User, Browser. Data flow labels: Content + languages definitions (HOLTRAN native), Meta data + Documents (text stream), Meta data + Documents (user native).] The HOLTRAN KB stores on an equal footing both application information and definitions of various external languages, i.e. any artificial or natural languages used to exchange information with applications and users. The HOLTRAN interpreter translates information coming from users and documents in external languages into the internal KBMS language and vice versa. It allows users to communicate with the system in their native languages.
How does it work? Objects are assigned the entity type e; t stands for the truth type. Example sentence: John plays table tennis well. [Figure: interior axiom representation]
How does it work? (2) Querying in HOLTRAN: Who plays table tennis well ? The inference procedure consists of finding all consistent substitutions of the free variables (xs) in the tested formula under which the formula is provable. [Figure: interior axiom representation]
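The substitution search described on this slide can be illustrated with a minimal sketch. It is not HOLTRAN's actual inference procedure: it assumes facts are stored as flat tuples (the layout predicate, subject, object, manner is hypothetical), that free variables are strings starting with "?", and that "provable" simply means "matches a stored fact".

```python
from typing import Dict, List, Optional, Tuple

Fact = Tuple[str, ...]

# Hypothetical fact store; the tuple layout is an assumption for illustration.
FACTS: List[Fact] = [
    ("plays", "John", "table tennis", "well"),
    ("plays", "Mary", "table tennis", "badly"),
    ("plays", "Cathy", "chess", "well"),
]


def is_variable(term: str) -> bool:
    """Free variables are written with a leading '?' (an assumed convention)."""
    return term.startswith("?")


def match(pattern: Fact, fact: Fact) -> Optional[Dict[str, str]]:
    """Return the substitution that makes pattern equal to fact, or None."""
    if len(pattern) != len(fact):
        return None
    binding: Dict[str, str] = {}
    for p, f in zip(pattern, fact):
        if is_variable(p):
            # A variable must be bound consistently everywhere it occurs.
            if binding.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return binding


def query(pattern: Fact, facts: List[Fact] = FACTS) -> List[Dict[str, str]]:
    """Collect every consistent substitution under which the pattern holds."""
    results = []
    for fact in facts:
        binding = match(pattern, fact)
        if binding is not None:
            results.append(binding)
    return results


# "Who plays table tennis well ?" becomes a pattern with one free variable.
answers = query(("plays", "?x", "table tennis", "well"))
```

Each returned dictionary is one consistent substitution; an empty list means the tested formula is not provable from the stored facts.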
What is higher order logic? First-order logic example: Mary and Cathy play tennis. First-order logic expressions are the ones that can be written in SQL and that contain constants and variables only of simple types e, t, … [Figure labels: Mary or Cathy; Not Mary and Cathy; First order part]
What is higher order logic? (2) Higher-order logic expressions are the ones containing constants and variables of types of variable order: ee, et, eet, … First-order query (queries a variable of the first order): Who plays table tennis well ? Second-order query (queries a variable of the second order): What does John do? Fourth-order query: How does Mary play tennis ?
How do we do it in the HOLTRAN Native programming language? The HOLTRAN Native programming language is specially designed to express any notions and ideas that humans can perceive, including definitions of natural languages. This is what a piece of HOLTRAN Native code looks like for programming questions of the sort "How does Mary play tennis ?":
(("How does"=) ##& NounPhrase ##> VerbPhrase ##& ("?"=) =>> \np:e\vp:et((x:(et)et vp:et) np:e))
This is how the Interpreter translates from English to HOLTRAN Native and back to English:
< How does Mary play tennis ?
> Test (x:(et)et (_3:eet (COM _5:et)) (ID _2:e));
> Assert (_1:(et)et (_3:eet (COM _5:et)) (ID _2:e));
> Mary plays tennis well.
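The type annotations in the HOLTRAN Native fragment can be checked mechanically. The sketch below is not part of HOLTRAN. It assumes the annotations follow the usual type-theoretic shorthand, in which "et" is a function from e to t, juxtaposition associates to the right (so "eet" reads e → (e → t)), and parentheses group an argument type (so "(et)et" reads (e → t) → (e → t)). The names `Base`, `Fn`, `parse_type` and `apply` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Base:
    name: str  # "e" (entity) or "t" (truth)


@dataclass(frozen=True)
class Fn:
    arg: "Type"
    res: "Type"


Type = Union[Base, Fn]


def _parse_seq(s: str) -> Tuple[Type, str]:
    """Parse atoms up to ')' or end of input and fold them to the right."""
    parts: List[Type] = []
    while s and s[0] != ")":
        if s[0] == "(":
            inner, s = _parse_seq(s[1:])
            if not s or s[0] != ")":
                raise ValueError("missing ')'")
            s = s[1:]
            parts.append(inner)
        elif s[0] in "et":
            parts.append(Base(s[0]))
            s = s[1:]
        else:
            raise ValueError(f"unexpected character {s[0]!r}")
    if not parts:
        raise ValueError("empty type")
    ty = parts[-1]
    for p in reversed(parts[:-1]):
        ty = Fn(p, ty)  # a b c  ==>  a -> (b -> c)
    return ty, s


def parse_type(s: str) -> Type:
    """Parse shorthand such as 'e', 'et', 'eet' or '(et)et'."""
    ty, rest = _parse_seq(s)
    if rest:
        raise ValueError(f"unexpected trailing input {rest!r}")
    return ty


def apply(fn: Type, arg: Type) -> Type:
    """Result type of applying fn to arg; raises TypeError on a mismatch."""
    if not isinstance(fn, Fn) or fn.arg != arg:
        raise TypeError("ill-typed application")
    return fn.res


# ((x:(et)et vp:et) np:e): the queried modifier x applied to the verb phrase,
# then to the noun phrase, yields a truth value under the assumed reading.
result = apply(apply(parse_type("(et)et"), parse_type("et")), parse_type("e"))
```

Under this reading the body of the rule, `((x:(et)et vp:et) np:e)`, has type t, which is what a testable or assertable formula needs.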
Product Overview: Our core product, the HOLTRAN Semantic Platform, is a suite of software components and tools that serves as middleware for building customizable and extensible applications for semantic processing of textual information in multiple artificial and natural languages.
HOLTRAN Semantic Platform comprises: HOLTRAN Engine, the heart of the product, which reads input information, translates it into internal instructions and executes them by directly accessing HOLTRAN Knowledge Bases (the HOLTRAN Engine includes a built-in Interpreter of the HOLTRAN Native Language); a set of HOLTRAN Knowledge Bases; HOLTRAN Knowledge Connector, a package of software components forming a C++ and Java API (SDK) that provides local or network access to HOLTRAN Knowledge Bases.
HOLTRAN Semantic Platform also comprises a configurable set of HOLTRAN Standard Applications, including: HOLTRAN Desktop Assistant, a standard extension to MS Windows Explorer; HOLTRAN Network Assistant, providing instant information search, categorization and summarization across an enterprise network; HOLTRAN Web Server, providing access to HOLTRAN Knowledge Bases via the HTTP protocol and publishing documents to the linked HOLTRAN Knowledge Bases.
HOLTRAN Question & Answering: Current Status. Consider the following short input: Mark is a doctor. He has a clinic. Mark works well. Bill and Cathy love Mark. John is a programmer. He is a good programmer. John builds a new program. He works well. Alex is a driver. Alex builds a new house. He works hard. Olga helps him, she works hard too.
HOLTRAN Question & Answering: Current Status. Now the following dialogue takes place between the user (<) and the system (>). Note the dialogue management capabilities: < Who is Mark ? > He is a doctor. < Who is he ? > He is a doctor. < Does Mark have a clinic ? > Yes, he has it. < How does Mark work ? > He works well. < How does Olga work ? > She works hard. < Who loves him ? > Bill and Cathy love him. < Who works hard ? > Alex and Olga work hard. < Who is Alex ? > He is a driver. < What does he do? > He works and builds a new house. < Who helps him ? > Olga helps him. < What does John do ? > He works and builds a new program.
HOLTRAN Question & Answering: Current Status. Self-learning capabilities: note that "carnivore", "lion" and "lab" are absent from the system dictionary and are introduced in the dialogue for the first time: < Alex is a carnivore. < A lion is a carnivore. < Who is Alex ? > He is a driver and he is a carnivore. < Who is a lion ? > It is a carnivore. < Mark builds a new lab. < What does Mark do ? > He works, builds a new lab and has a clinic.
HOLTRAN Question & Answering Test Cases - 1. We have no doubt that no existing product could pass even half of these tests in the foreseeable future. We expect to pass at least the first 7 cases within a year (the last one might require some additional effort). 1. Negation accounting I: The wolf huffed and puffed but he could not blow down that brick house. Q: Could the wolf blow down the brick house ? 2. Syllogisms I: Every human is mortal. Socrates is a human. Q: Is Socrates mortal ?
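Test case 2 (syllogisms) can be sketched with a few lines of forward chaining. This is only an illustration of the expected behaviour, not HOLTRAN's inference engine: it assumes the input has already been translated into unary facts such as ("human", "Socrates") and into rules of the hypothetical form (premise, conclusion), meaning "every premise is a conclusion".

```python
from typing import Iterable, Set, Tuple

Fact = Tuple[str, str]   # (category, individual), e.g. ("human", "Socrates")
Rule = Tuple[str, str]   # (premise, conclusion): every premise is a conclusion

# Assumed translations of "Socrates is a human." and "Every human is mortal."
FACTS: Set[Fact] = {("human", "Socrates")}
RULES = [("human", "mortal")]


def entails(facts: Iterable[Fact], rules: Iterable[Rule], goal: Fact) -> bool:
    """Apply the rules until no new fact appears, then look up the goal."""
    known = set(facts)
    rules = list(rules)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            # Iterate over a snapshot because 'known' grows inside the loop.
            for category, individual in list(known):
                derived = (conclusion, individual)
                if category == premise and derived not in known:
                    known.add(derived)
                    changed = True
    return goal in known


# "Is Socrates mortal ?"
answer = entails(FACTS, RULES, ("mortal", "Socrates"))
```

The loop terminates because each pass either adds a new fact from a finite set of (category, individual) pairs or stops.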
HOLTRAN Question & Answering Test Cases - 2. 3. "Wh" questions I: After its final passage by both houses, the bill is sent to the president. Q: To whom is a bill sent after its final passage by both houses ? Q: When is a bill sent to the president ? 4. Reference resolution I: When a senator or a representative introduces a bill, he or she sends it to the clerk of his house, who gives it a number and title. Q: Who sends a bill to the clerk of his house ? Q: Who gives a bill a number and title ? 5. Synonyms/antonyms accounting I: Diesel engines are heavier than gasoline engines. Q: Which type of internal-combustion engine is lighter ?
HOLTRAN Question & Answering Test Cases - 3. 6. Semantic categories accounting I: The heart employs a separate vascular system to obtain blood for its own nourishment. Two major coronary arteries regulate this blood supply. Q: What is the function of coronary arteries ? 7. Ontology accounting I: Joseph Kennedy devoted the rest of his life to advancing the political careers of his sons, John, Robert and Edward. Q: Is Robert Kennedy a brother of John Kennedy ?
HOLTRAN Question & Answering Test Cases - 4. 8. Merging distributed info I: IBM and Philips announced a joint initiative to collaborate on radio frequency identification (RFID) technology for companies using supply-chain software. I: Eastman Kodak Co. and IBM on Tuesday announced a joint effort to offer healthcare facilities products that combine Kodak's medical imaging technology and services with IBM's storage devices. I: GiveMePower Corporation today announced it has partnered with IBM Corporation as one of four business solutions to be showcased in Intel Corporation's "Inside Your Digital Life: Intel" exhibit at CeBIT 2004. Q: Which companies does IBM Corporation have joint projects or partnership with ?
Context dependency and contradiction resolution. Absolute truth (currently implemented dialogue): < John works hard. > Yes, John works hard. < John does not work hard. > No, he does work hard. < How does John work ? > He works hard. Relative truth (future dialogue): < Bryan says that John works hard. < Bill says that John does not work hard. < How does John work ? > According to Bryan he works hard and according to Bill he does not.
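The difference between absolute and relative truth can be illustrated by attaching a source to every stored statement. The sketch below is an assumption about one possible representation, not a description of how HOLTRAN stores context: `Claim`, `views_on`, `is_contested` and `describe` are hypothetical names, and the output wording is simplified.

```python
from typing import Dict, List, Tuple

Statement = Tuple[str, str, str]      # (subject, verb, manner)
Claim = Tuple[str, Statement, bool]   # (source, statement, True = asserted, False = denied)

# Assumed encoding of "Bryan says that John works hard." and
# "Bill says that John does not work hard."
CLAIMS: List[Claim] = [
    ("Bryan", ("John", "works", "hard"), True),
    ("Bill", ("John", "works", "hard"), False),
]


def views_on(claims: List[Claim], statement: Statement) -> Dict[str, bool]:
    """Map each source to its most recent position on the statement."""
    return {source: polarity for source, stmt, polarity in claims if stmt == statement}


def is_contested(claims: List[Claim], statement: Statement) -> bool:
    """True when the sources disagree about the statement."""
    return len(set(views_on(claims, statement).values())) > 1


def describe(claims: List[Claim], statement: Statement) -> str:
    """Report every source's position instead of rejecting the contradiction."""
    subject, verb, manner = statement
    parts = []
    for source, polarity in views_on(claims, statement).items():
        verdict = f"{verb} {manner}" if polarity else f"does not {verb} {manner}"
        parts.append(f"according to {source}: {subject} {verdict}")
    return "; ".join(parts)


summary = describe(CLAIMS, ("John", "works", "hard"))
```

Because every claim is keyed by its source, the two statements no longer contradict each other in the store, and the contradiction only surfaces when an answer is composed.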
Key Persons: Dr. Alexander Brenner, President and CEO. Ph.D. in Mathematics from the Technion - Israel Institute of Technology and M.Sci. from Moscow State University. Previously: Image Processing and Algorithms department leader at Imaginarix Ltd. and lecturer at the Technion - Israel Institute of Technology. Professional experience: pure and applied mathematics, image and signal processing, software engineering (object oriented design, testing and maintenance), management of R&D teams. Applications: image processing, call centres, artificial intelligence (pattern recognition, natural language processing), scientific programming, industrial applications, mathematical and statistical modelling.
Key Persons: Dr. Victor Gluzberg, VP R&D. Ph.D. and M.Sci. in physics and applied mathematics from Novosibirsk State University. Previously: Software Manager at Parametric Technology, Israel. Professional experience: applied mathematics and computer science, physics, software engineering (requirements analysis, program specification and design, testing and maintenance), management of R&D teams. Applications: data processing, system programming, CAD/CAM, artificial intelligence (pattern recognition, inference, natural language processing), scientific programming, industrial applications, mathematical and statistical modelling.