Semantic Insights: Semantic Technology in Action
Chuck Rehberg, Chief Scientist
Semantic Insights, a Division of Trigent Software, Inc.
Topics Covered
PriArt, a powerful new way to do research
What PriArt does
Investigation Examples
–EPA
–Fish & Wildlife
–USPTO – Potential Patent Infringement
–Internet Investigation
Web 2.0, Web 3.0, and Cloud Computing
PriArt, a powerful new way to do research
Can your keyword search do that?
Version 1: Joe gave Bob the ball.
Version 2: The ball was given by Joe to Bob.
Synonyms: Robert said, "Joseph gave me the ball."
Instance: Cubs pitcher Joe DiMaggio handed the ball to Bob Kelly.
Generalize: The first guy gave the ball to the second guy.
Specialize: DiMaggio gave little Bobby Fischer an autographed baseball.
Pronouns: Joe's son is Bob. He gave his son a baseball.
Identity: On February 17, the murder weapon, a baseball, was shown to Joe DiMaggio, who subsequently transferred it to Bobby Fischer.
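To make the slide's question concrete, here is a minimal sketch of a naive keyword matcher run against some of the example sentences. It is an illustration of the problem only, not PriArt's method; the function names and the query string are assumptions made for this example.

```python
# Illustrative only: a naive "all keywords present" matcher.
# This is NOT how PriArt works; it shows what plain keyword search misses.
SENTENCES = {
    "Version 1": "Joe gave Bob the ball.",
    "Version 2": "The ball was given by Joe to Bob.",
    "Synonyms": 'Robert said, "Joseph gave me the ball."',
    "Instance": "Cubs pitcher Joe DiMaggio handed the ball to Bob Kelly.",
    "Generalize": "The first guy gave the ball to the second guy.",
    "Specialize": "DiMaggio gave little Bobby Fischer an autographed baseball.",
}


def tokens(text):
    """Lowercase the text and split it into bare words."""
    return [w.strip('.,"').lower() for w in text.split()]


def keyword_match(query, sentence):
    """True only if every query word appears verbatim in the sentence."""
    return set(tokens(query)) <= set(tokens(sentence))


QUERY = "Joe gave Bob ball"  # hypothetical keyword query
hits = [label for label, s in SENTENCES.items() if keyword_match(QUERY, s)]
print(hits)  # ['Version 1']
```

Only the literal wording matches. The passive form, the synonyms, and the more specific or more general restatements all say the same thing and are all missed.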
What PriArt does
1. PriArt starts with a plain-English statement of what you are interested in (we call it your investigation).
2. PriArt gathers information from a potentially large corpus of documents (by reading them) and generates a structured report containing only the information relevant to your investigation.
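The two steps above can be pictured as a small pipeline: take an investigation statement, read each document sentence by sentence, and keep only what looks relevant. The sketch below uses simple word overlap as a crude stand-in for the semantic matching; the corpus, the function names, and the `min_shared` threshold are all hypothetical and say nothing about PriArt's internals.

```python
import re


def words(text):
    """Lowercased words longer than three letters (a rough content-word filter)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}


def build_report(investigation, corpus, min_shared=2):
    """Collect sentences sharing at least `min_shared` words with the investigation.

    Word overlap is only a placeholder for real semantic matching.
    """
    target = words(investigation)
    report = []
    for doc_name, text in corpus.items():
        # Split each document into sentences at ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            shared = sorted(target & words(sentence))
            if len(shared) >= min_shared:
                report.append(
                    {"source": doc_name, "sentence": sentence, "shared": shared}
                )
    return report


corpus = {  # hypothetical documents
    "doc_a.txt": "Mercury exposure can harm human health. The weather was mild.",
    "doc_b.txt": "Budget figures were released on Tuesday.",
}
report = build_report("The effects of mercury on human health", corpus)
for row in report:
    print(row["source"], "|", row["sentence"], "|", row["shared"])
```

The report keeps the one sentence about mercury and health, records which document it came from, and discards everything else.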
An EPA Example
The Goal: Suppose you were interested in finding out what a recent EPA report says about the environmental effects of mercury. You have a quote from an existing source and you want to know what the new EPA publication says about it.
EPA Example: Information Source
Information Source: U.S. Environmental Protection Agency (EPA). (2008) EPA's 2008 Report on the Environment.
–National Center for Environmental Assessment, Washington, DC; EPA/600/R-07/045F. Available from the National Technical Information Service, Springfield, VA, and online.
The specific URL of interest is located at:
–http://oaspub.epa.gov/eims/eimscomm.getfile?p_download_id=485027
EPA Example: Investigation
Your investigation statement might look something like:
The effects of mercury on human health are diverse and depend on the forms of mercury encountered. Fetuses and children may be more susceptible to mercury and to neurological health effects. Prenatal exposures interfere with the growth.
Fish & Wildlife Example: The Goal
Suppose you have specific topics you are interested in and wish to know what the Fish & Wildlife reports say about them.
1. Migratory bird numbers are shrinking.
2. Wetlands destruction probably will contribute to shrinking.
3. Biomass energy crops will reduce available habitats.
4. Birds reduce insect populations in temperate forests.
5. Declines in migratory birds pose a threat to the health of our forests and farmlands.
Fish & Wildlife Example: Sources Information Sources: U.S. Fish & Wildlife Service, Division of Migratory Bird Management Reports Located at:
About the Quality of Results
The quality of the results is primarily influenced by how semantically close the information documents are to your investigation. The more PriArt understands about your investigation, the better the results. There are two ways to improve the results:
1. Do the Knowledge Engineering to add Ontology, Logic, and Processing that automatically expand the meaning of the investigation, and/or
2. Better yet, let the machine do the work: add more information to your investigation.
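The second option, adding more information to the investigation, can be illustrated with a toy measure. The sketch below counts the distinct words shared between an investigation and a sentence; it is a rough proxy for "semantic closeness", not PriArt's measure, and the sample sentence is hypothetical.

```python
import re


def words(text):
    """Set of lowercased words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap(investigation, sentence):
    """Number of distinct words shared by the investigation and a sentence."""
    return len(words(investigation) & words(sentence))


# Hypothetical sentence from a source document.
sentence = "Declining wetlands leave fewer habitats for migratory birds."

short_inv = "Migratory bird numbers are shrinking."
longer_inv = short_inv + " Wetlands destruction probably will contribute to shrinking."

print(overlap(short_inv, sentence))   # 1
print(overlap(longer_inv, sentence))  # 2
```

Even with this crude proxy, the longer investigation gives the matcher more to work with. Note that "bird" and "birds" still fail to match, which is the kind of gap that dictionaries and ontologies are meant to close.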
Other things that affect the Results
The bottleneck is bandwidth
Cloud Computing will help (and is necessary to scale in general). However, with scaled readers the bottleneck becomes the server hosting the sources to read.
PDFs require special handling
PDFs present challenges in identifying the content of lists and tables. We have heuristics to handle that. Charts are another matter: they require much more work. But is it worth the effort?
PriArt Reads Documents in Real-Time
USPTO Example: Potential Patent Infringement
Suppose you were interested in finding potential infringements of a specific patent in the US Patent Office database.
Investigation of a specific Patent: United States Patent #7,433,858, Rehberg, et al., October 7, 2008, Rule selection engine
Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PTXT&s1=rehberg.INNM.&OS=IN/rehberg&RS=IN/rehberg
USPTO Example: Claim Text What is claimed is: 1. A computer-implemented method for processing rules, the method comprising: providing a static data structure (125) and a dynamic data structure (135) for processing rules, wherein the static data structure represents rules in a rules base and the dynamic data structure includes storage locations for working data produced by processing external facts according to the rules represented in the static data structure, wherein each rule in the rules base is specified according to a set of condition elements, and for each rule the static data structure includes a data vector (256) for said rule such that each element (257) of said vector is associated with a different one of the condition elements according to which the rule is specified,…
USPTO Example: Converting Claim Text to Plain Text PriArt supports English Converters. PriArt includes a Patent Claim language to Plain English language converter. For example, here is a segment of the conversion of Claim 1. … A static data structure and a dynamic data structure for processing rules are provided. The static data structure represents rules in a rules base and the dynamic data structure includes storage locations for working data produced by processing external facts according to the rules represented in the static data structure. Each rule in the rules base is specified according to a set of condition elements. The dynamic data structure includes a corresponding vector of storage locations for said rule such that each storage location of said vector corresponds to a different one of the elements of the data vector for said rule in the static data structure and is associated with a different one of the condition elements according to which the rule is specified. Facts are processed. …
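As a rough picture of what such a conversion involves, the sketch below applies two of the visible transformations to a fragment of Claim 1: it drops the parenthesised reference numerals and turns each "wherein" clause into its own sentence. This is a toy, regex-based assumption for illustration; the function name is hypothetical and the actual PriArt converter is not described in these slides.

```python
import re


def simplify_claim(fragment):
    """Drop reference numerals like (125) and split 'wherein' clauses into sentences."""
    no_numerals = re.sub(r"\s*\(\d+\)", "", fragment)
    clauses = re.split(r",\s*wherein\s+", no_numerals)
    sentences = []
    for clause in clauses:
        clause = clause.strip().rstrip(",;.")
        if clause:
            # Capitalise the first letter and close the sentence with a period.
            sentences.append(clause[0].upper() + clause[1:] + ".")
    return sentences


fragment = (
    "a static data structure (125) and a dynamic data structure (135) "
    "for processing rules, wherein the static data structure represents "
    "rules in a rules base"
)
for s in simplify_claim(fragment):
    print(s)
# A static data structure and a dynamic data structure for processing rules.
# The static data structure represents rules in a rules base.
```

A real converter also has to restore verbs ("are provided"), resolve "said" references, and handle nested clauses, none of which this sketch attempts.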
USPTO Example: Potential Infringement Reports
PriArt Infringement Reports state the claims that pose a potential infringement and identify the specific aspects of the original claim that may be infringed.
Example Internet Investigation
Suppose you were interested in finding what is known about certain aspects of Autism. You notice a paragraph on a webpage:
–http://www.autism-society.org/site/PageServer?pagename=about_home
The paragraph reads:
Autism is a complex developmental disability that typically appears during the first three years of life and affects a person's ability to communicate and interact with others. Autism is defined by a certain set of behaviors and is a "spectrum disorder" that affects individuals differently and to varying degrees. There is no known single cause for autism, but increased awareness and funding can help families today.
PriArt: Web 2.0, Web 3.0, and Cloud Computing
Web 2.0
–Collaborative Investigations can improve the quality and accuracy of the investigations
Web 3.0
–Natural Language Processing
  Improves itself through Experience
  On-going Training by our top PhD Linguists
–Common and Domain-Specific Dictionaries
  Improves itself through Experience
  Tools to semi-automate curation of Dictionaries
–Common and Domain-Specific Ontologies
  Improves itself through Experience
  Tools to semi-automate curation of Ontologies
Cloud Computing
–Essential for mass scalability. Work in progress…
Contact Us
We are seeking pilot projects and early Beta sites now. Contact us to arrange a demo.
For more information, please contact me:
Chuck Rehberg, Chief Scientist/CTO
Semantic Insights, A Division of Trigent Software, Inc.
2 Willow St, Suite 201, Southborough, MA