Search options and content tagging NKOS 2008 Aarhus Denmark September 19 Marjorie M. K. Hlava Access Innovations, Inc – Data Harmony.

Presentation transcript:


In the olden days…
- Online from the 70’s: Dialog, Data Star, many others
- Secondary publishers: Mead – Lexis, CAS, NASA & DOE, and many others

Online search
- Worked very well: focused, controlled, specialized
- Content analysis: database design (context), extensive markup, proprietary formats (Dialog format b)

Back at the lab
- Computer science: full text, isolated, content without context
- Developing shortcuts became critical: relevance, weighting, probabilities

Natural Language Processing
- Since the early 1970's
- Replicate human intelligent processes
- HUGE body of research
- Extract information
- Increasing mountains of textual information
- Holy Grail

Natural Language Processing
- Artificial intelligence
- Computational linguistics
- Problems of automated generation and understanding of natural human languages
- Convert samples of human language into more formal representations that are easier for computer programs to manipulate
- Nine major areas (or so…)

Natural Language Processing
- Linguistic: study of language
- Semantic: study of meaning in communication (literal and connotation; lexical, applied, structural)
- Syntactic: principles and rules for constructing sentences
- Morphological: structure and content of word forms
- Phraseological: peculiar form of words
- Grammatical: the rules governing the use of any given natural language
- Stemming (lemmatization): reducing an inflected word to its stem, base or root
- Synonyms: semantically equivalent
- Pragmatics: common sense, indexicality, use and effects of language

Other Techniques - Sample
- Vector calculus (vector analysis): quaternion analysis, Latent Semantic
- Statistical: multivariate analysis
- Bayesian probability: uses probability as 'a measure of a state of knowledge'; Objectivist school, Subjectivist school
- Neural networks (connectionism): statistical learning theory
- SMART (System for the Mechanical Analysis and Retrieval of Text) Information Retrieval System: Cornell University, Gerard Salton; vector space model, relevance feedback, rule based

They don’t work well
- Search is broken
- Google stole the show
- Precision and recall went out the window
- Relevance became the buzzword

The Potential
- To access content directly
- Find it
- Tag it
- Know what the user will ask for, and the next user, and the next user
- Not all people search the same way
- Persistent clustering: find it again!

Use term control - applied
- At the input end
- On the search (query) side as well
- Accommodate all learning styles
………..
- High relevance
- Total recall
- Excellent precision
- Happy users

Look at The Weather Channel: 15 synonyms for “rain”
Rain, Shower, Sprinkles, Downpour, Cloudburst, Gully washer, Monsoon, Deluge, Thunderstorm, Thundershower, Drizzle, Mist, Liquid precip., Torrent, Virga
vir·ga \ˈvərgə\ n -s: Precipitation (usually rain or snow) that evaporates before it reaches the ground, often seen as gray streaks in the sky near the base of the cloud.

Hammered, Hit, Slammed, Buffeted, Slapped, Sprayed, Pushed, Pummeled, Drenched, Buried, Blasted, Blown, Abused, or otherwise manhandled by the elements.

Adding the taxonomy terms to the content
- Time of creation
- Adding to the corpus in the system: Content Management, Digital Asset Management System, Repository; attach to the record or information object

Automatic term suggestion – VERY rich in synonyms for search
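
The term suggestion idea can be sketched in a few lines. This is only an illustration, not the MAI implementation: it assumes a hypothetical synonym ring in which each of the 15 Weather Channel synonyms points to the single preferred term "Rain", and it matches whole words or phrases only. The names `RAIN_SYNONYMS`, `PREFERRED` and `suggest_terms` are invented for the example.

```python
import re

# Hypothetical synonym ring built from the 15 "rain" terms on the slide.
RAIN_SYNONYMS = [
    "rain", "shower", "sprinkles", "downpour", "cloudburst", "gully washer",
    "monsoon", "deluge", "thunderstorm", "thundershower", "drizzle", "mist",
    "liquid precip", "torrent", "virga",
]

# Every synonym maps to one preferred taxonomy term (an assumption).
PREFERRED = {synonym: "Rain" for synonym in RAIN_SYNONYMS}


def suggest_terms(text):
    """Return the preferred terms whose synonyms occur in the text."""
    lowered = text.lower()
    found = set()
    for synonym, preferred in PREFERRED.items():
        # Word boundaries keep "mist" from matching inside "chemistry".
        if re.search(r"\b" + re.escape(synonym) + r"\b", lowered):
            found.add(preferred)
    return sorted(found)
```

Tagging a story that mentions a "gully washer" with "Rain" at input time means a searcher who types any of the other synonyms can still reach it through the preferred term.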

“Using the MAI has cut our search time by 50%” Jay Tellock, Weather Channel

Then pull it out again
- Search software
- Inverted index: fast look up
- Display records: show the user
- Accommodate different learning styles:
  - Browse (taxonomy)
  - Search (the box)
  - Advanced search (faceted navigation)
  - Follow a thread (ontology)
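
As a rough sketch of why an inverted index gives fast look up: instead of scanning every record, the search software keeps a map from each term to the set of records tagged with it. The record layout (an id mapped to a list of subject terms) and the function names below are assumptions made for the example, not part of any particular search product.

```python
from collections import defaultdict


def build_inverted_index(records):
    """Map each subject term to the set of record ids that carry it."""
    index = defaultdict(set)
    for record_id, terms in records.items():
        for term in terms:
            index[term.lower()].add(record_id)
    return index


def lookup(index, *terms):
    """Return ids of records tagged with all of the given terms."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical tagged records.
records = {
    "doc1": ["Rain", "Wind"],
    "doc2": ["Rain"],
    "doc3": ["Snow", "Wind"],
}
index = build_inverted_index(records)
```

A single-term search is one dictionary access, and a multi-term search is an intersection of the matching sets.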


Rules of thumb - general
- Index to the most specific level
- Roll up the terms for presentation
- Add lots of synonyms
- Review the search logs
- Add candidate terms
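
The first two rules of thumb can be illustrated together: records are indexed with the most specific term, and those terms are rolled up to broader ones only when results are presented. The sketch below assumes a small made-up hierarchy stored as a child-to-parent map; `PARENT`, `roll_up` and `facet_counts` are hypothetical names.

```python
from collections import Counter

# Hypothetical hierarchy: narrower term -> broader term.
PARENT = {
    "Rain": "Precipitation",
    "Snow": "Precipitation",
    "Gust": "Wind",
    "Precipitation": "Weather",
    "Wind": "Weather",
}


def path_to_root(term, parent):
    """Return the term followed by each of its broader terms."""
    path = [term]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def roll_up(term, display_terms, parent):
    """Return the nearest term on the path to the root that is displayed."""
    for candidate in path_to_root(term, parent):
        if candidate in display_terms:
            return candidate
    return term  # no displayed ancestor: show the term itself


def facet_counts(indexed_terms, display_terms, parent):
    """Count specific index terms under their displayed broader terms."""
    return Counter(roll_up(t, display_terms, parent) for t in indexed_terms)
```

Because the specific term stays on the record, the display level can be changed later without re-indexing.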

Rules of thumb - metrics
- Hit, Miss and Noise: 85% accuracy to launch
- 4 hours per month to maintain, with candidate term feeds and with search log data
- 5 minutes per term: rule and record
- 1 hour per training term
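
One way to count hits, misses and noise is to compare the terms the system suggests with the terms a human indexer approves. The slide does not give the formula behind the 85% figure, so the sketch below is an assumption: it treats a hit as a suggested term the indexer approved, a miss as an approved term the system did not suggest, and noise as a suggested term the indexer rejected, and it reports the standard precision and recall ratios built from those counts.

```python
def hit_miss_noise(suggested, approved):
    """Split terms into hits, misses and noise (definitions assumed)."""
    suggested, approved = set(suggested), set(approved)
    hits = suggested & approved      # suggested and approved
    misses = approved - suggested    # approved but not suggested
    noise = suggested - approved     # suggested but not approved
    return hits, misses, noise


def precision_recall(suggested, approved):
    """Return (precision, recall) computed from hit, miss and noise counts."""
    hits, misses, noise = hit_miss_noise(suggested, approved)
    found = len(hits) + len(noise)
    wanted = len(hits) + len(misses)
    precision = len(hits) / found if found else 0.0
    recall = len(hits) / wanted if wanted else 0.0
    return precision, recall
```

Running this over a sample of indexed records before launch gives a repeatable number to hold against whatever accuracy threshold has been agreed.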

Justification – the ROI: The Pain of Search
Copyright © 2007 Access Innovations, Inc.
Columns: Percent | Number of Employees | Search & Use Time Per Week | Time Searching Per Week | Time Analysing Per Week | Average Loaded Salary | Annual Cost of Looking | Search Time Reduction | Difference
Values given in the header row: Mission critical; 1000; Hours; $ Per Hour; 10%

Level  | Annual Cost of Looking | Search Time Reduction | Difference
High   | ,736,000               | 7,862,400             | 873,600
Medium | ,928,000               | 40,435,200            | 4,492,800
Low    | ,120,000               | 2,808,000             | 312,000
Total  | $56,784,000            | $51,105,600           | $5,678,400
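
The arithmetic behind the ROI slide can be checked from the figures that survive in the transcript. The leading digits of the per-level "Annual Cost of Looking" values are cut off, so the sketch below does not use them. It assumes instead that the Difference column is the saving, which makes the cost before the reduction equal to the reduced cost plus the difference. The dictionary and function names are invented for the example.

```python
# Surviving figures from the slide, in dollars per year.
AFTER_REDUCTION = {"High": 7_862_400, "Medium": 40_435_200, "Low": 2_808_000}
DIFFERENCE = {"High": 873_600, "Medium": 4_492_800, "Low": 312_000}


def annual_cost(level):
    """Cost before the reduction, assuming cost = reduced cost + saving."""
    return AFTER_REDUCTION[level] + DIFFERENCE[level]


def reduction_rate(level):
    """Saving as a fraction of the cost before the reduction."""
    return DIFFERENCE[level] / annual_cost(level)


total_before = sum(annual_cost(level) for level in AFTER_REDUCTION)
total_after = sum(AFTER_REDUCTION.values())
total_saving = sum(DIFFERENCE.values())

for level in AFTER_REDUCTION:
    print(f"{level}: {annual_cost(level):,} before, {reduction_rate(level):.0%} saved")
print(f"Total: {total_before:,} before, {total_after:,} after, {total_saving:,} saved")
```

Under that assumption the three totals come out to the slide's bottom row ($56,784,000, $51,105,600 and $5,678,400), and each level shows the 10% search time reduction stated in the header.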

Concept Extraction System Full text, HTML, PDF, etc. data feeds from DRMS / Valet /etc. MAI Rule Base Database or search system TTE Server Taxonomies DH Concept Extraction System Novelty Detection Novelty Detection suggests new terms Taxonomy terms & Authority Name Metadata DC Metadata Produces DC Abstract Auto Sum Load to Shared TTE Server Entity Extractor MAI Concept Extractor Unified Bibliographic Citation

DH M.A.I.  Process Formulation Query formulations Use NLP to parse query Expand query term to all factors in rule base Data Harmony MAI Concept Extractor Module User Taxonomy Term Suggestions Pass text through rule bases Concept Extraction Provide suggested term list Term Selection Categorization of results by frequency Convert frequency to weights Present results Subject term indexing
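
The steps "pass text through rule bases", "categorization of results by frequency" and "convert frequency to weights" can be sketched as follows. This is a simplified illustration, not the Data Harmony rule language: it assumes a hypothetical rule base in which each rule is a plain trigger phrase mapped to a taxonomy term, and it assumes a weight is simply a term's share of all matches.

```python
import re
from collections import Counter

# Hypothetical rule base: trigger phrase -> taxonomy term.
RULE_BASE = {
    "rain": "Rain",
    "downpour": "Rain",
    "drizzle": "Rain",
    "gust": "Wind",
    "gale": "Wind",
}


def extract_concepts(text, rule_base=RULE_BASE):
    """Return (term, weight) pairs, heaviest first, for matched rules."""
    lowered = text.lower()
    counts = Counter()
    for phrase, term in rule_base.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[term] += len(re.findall(pattern, lowered))
    total = sum(counts.values())
    if total == 0:
        return []
    # Convert raw frequencies to weights that sum to 1.
    weighted = [(term, n / total) for term, n in counts.items() if n]
    return sorted(weighted, key=lambda pair: (-pair[1], pair[0]))
```

The sorted list plays the role of the suggested term list: an indexer (or an automatic cut-off on the weight) then selects the subject terms to attach to the record.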

DH MAI Query Process Formulation Query formulations Use NLP to parse query Expand query term to all factors in rule base Data Harmony MAI Query Module User Query Query revolver Pass query to Search Concept Extraction Analyze reply Reporting Categorization of results by frequency Group results Present results Query Results
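
The query-side step "expand query term to all factors in rule base" can be sketched in the same spirit. The example assumes a hypothetical rule base that lists the phrases ("factors") behind each taxonomy term, splits the query on whitespace rather than using real NLP parsing, and emits a generic Boolean string whose syntax is not tied to any particular search engine.

```python
# Hypothetical rule base: taxonomy term -> phrases that stand for it.
FACTORS = {
    "Rain": ["rain", "downpour", "drizzle"],
    "Wind": ["wind", "gust", "gale"],
}


def expand_query(query, factors=FACTORS):
    """Replace each query word with its full group of equivalent phrases."""
    expanded = []
    for word in query.lower().split():
        group = next(
            (phrases for phrases in factors.values() if word in phrases),
            [word],  # unknown words are passed through unchanged
        )
        expanded.append(list(group))
    return expanded


def to_boolean(expanded):
    """Render the expanded query as an AND of OR-groups."""
    return " AND ".join("(" + " OR ".join(group) + ")" for group in expanded)
```

For example, `to_boolean(expand_query("drizzle forecast"))` yields `(rain OR downpour OR drizzle) AND (forecast)`, so a user who typed one synonym retrieves records that use any of them.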

DH API Data Harmony Data Harmony Architecture Web Content Files, Documents Databases Taxonomies WEB Server I GRAPHICAL USER INTERFACE Novelty Detection MAI Rule Bases MAI Concept Extractor Auto Summarization Entity Extractor DH CONCEPT EXTRACTION SYSTEM , Groupware, etc. Alerts Data Harmony Administrative Module Thesaurus Dublin Core METADATA Rules for Concept Extractor SUBJECT TERMS ABSTRACT Bibliographic citation with abstract Catalog Search Server Portals DBMS

DH API Data Harmony Data Harmony Architecture Web Content Files, Documents Databases Taxonomies WEB Server I GRAPHICAL USER INTERFACE Novelty Detection MAI Rule Bases MAI Concept Extractor Auto Summarization Entity Extractor DH CONCEPT EXTRACTION SYSTEM , Groupware, etc. Alerts Data Harmony Administrative Module TTE Dublin Core METADATA Rules for Concept Extractor SUBJECT TERMS ABSTRACT Bibliographic citation with abstract Catalog Search Server Portals DBMS

DH API Data Harmony Data Harmony Architecture Web Content Files, Documents Databases Taxonomies WEB Server I GRAPHICAL USER INTERFACE Novelty Detection MAI Rule Bases MAI Concept Extractor Auto Summarization Entity Extractor DH CONCEPT EXTRACTION SYSTEM , Groupware, etc. Alerts Data Harmony Administrative Module Thesaurus Dublin Core METADATA Rules for Concept Extractor SUBJECT TERMS ABSTRACT Bibliographic citation with abstract Catalog Search Server Portals DBMS Formulation Query formulations Use NLP to parse query Expand query term to all factors in rule base Data Harmony MAI Query Module User Query Query revolver Pass query to Search Concept Extraction Analyze reply Reporting Categorization of results by frequency Group results Present results Query Results

Thank you for your attention! Marjorie M. K. Hlava Access Innovations / Data Harmony