MCNC/CNIDR & A/WWW Enterprises
Introduction to CNIDR’s Isite
Jim Fullton - MCNC/CNIDR
Archie Warnock - A/WWW Enterprises



What is Isite?
- A freely available implementation of the Z39.50 search/retrieval protocol
- It includes a Unix-based server, a WWW gateway, a command-line client and a sophisticated text search engine
- ftp://ftp.cnidr.org/pub/NIDR.tools/Isite

What is Isearch?
- Isearch is the successor to freeWAIS
- Isearch is a sophisticated full-text search and retrieval system
- Isearch is a component of Isite, an implementation of the NISO standard protocol Z39.50 for information search and retrieval
- ftp://ftp.cnidr.org/pub/NIDR.tools/Isearch

System Components - I
- Iindex, the text indexer - builds a searchable version of the document collection
  - Implements fast word-based searching
  - Document parser - recognizes the start/end of individual documents
  - Field parser - recognizes the start/end of fields within individual documents
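The indexing steps above can be sketched in a few lines of Python. This is only an illustration of the idea (document parser, field parser, word-based index), not Iindex's actual code or file format. The `%%` record separator, the `name: value` field syntax and all function names are assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical record separator and field syntax, chosen only for this sketch.
DOC_SEPARATOR = "\n%%\n"
FIELD_PATTERN = re.compile(r"^(\w+):\s*(.*)$")


def parse_documents(raw):
    """Document parser: split a raw file into individual documents."""
    return [chunk.strip() for chunk in raw.split(DOC_SEPARATOR) if chunk.strip()]


def parse_fields(document):
    """Field parser: map each 'name: value' line to a field."""
    fields = {}
    for line in document.splitlines():
        match = FIELD_PATTERN.match(line)
        if match:
            fields[match.group(1).lower()] = match.group(2)
    return fields


def build_index(raw):
    """Build a word-based inverted index: (field, word) -> set of document ids."""
    index = defaultdict(set)
    for doc_id, document in enumerate(parse_documents(raw)):
        for field, text in parse_fields(document).items():
            for word in re.findall(r"\w+", text.lower()):
                index[(field, word)].add(doc_id)
    return index


SAMPLE = "title: Sparse matrix methods\nauthor: Smith\n%%\ntitle: Patent search\nauthor: Jones"
index = build_index(SAMPLE)
print(sorted(index[("title", "matrix")]))  # [0]
```

Keying the index on (field, word) pairs is one simple way to make fielded search a plain dictionary lookup.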

System Components - II
- Isearch, the search engine - searches a document collection based on a user-supplied query
  - Command line search
    - Primarily used for testing
  - WWW gateway (using CGI)
    - End-user interface using forms
  - Z39.50 gateway

Isearch Capabilities
- Fast full-text search
  - US AIDS Patent Collection - can search ~250,000 patents in < 1 second
- Fielded search
  - Can restrict searches to title, author, abstract, other fields
- Relevance ranking
  - Search “hits” are assigned scores & sorted
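Relevance ranking can be illustrated with a toy scorer. The slides do not say how Isearch computes its scores, so this sketch simply counts query-term occurrences; the documents and function names are made up for the example.

```python
import re
from collections import Counter

# Made-up documents, for illustration only.
DOCS = {
    "doc1": "patent search engine for patent collections",
    "doc2": "full text search",
    "doc3": "relevance ranking",
}


def score(query, text):
    """Score a document as the total number of query-term occurrences."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    return sum(counts[word] for word in query.lower().split())


def ranked_search(query, docs):
    """Return (score, doc_id) hits, highest score first, dropping zero scores."""
    hits = [(score(query, text), doc_id) for doc_id, text in docs.items()]
    hits = [hit for hit in hits if hit[0] > 0]
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))


print(ranked_search("patent search", DOCS))  # [(3, 'doc1'), (1, 'doc2')]
```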

Isearch Capabilities
- Word truncation
  - Search for “matri*” matches “matrix” and “matrices”
- Boolean functions
  - AND, OR and ANDNOT combinations of different fields
- Customized presentation of results
- Phrase searching (coming soon)
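Truncation and the Boolean operators map naturally onto prefix matching and set operations. The following is a minimal sketch over a made-up index, not Isearch's query syntax or implementation; the function names are hypothetical.

```python
# Toy inverted index (word -> set of document ids); the contents are made up.
INDEX = {
    "matrix": {1, 2},
    "matrices": {2, 3},
    "patent": {3, 4},
    "matter": {5},
}


def lookup(term):
    """Return matching document ids; a trailing '*' requests right truncation."""
    if term.endswith("*"):
        prefix = term[:-1]
        hits = set()
        for word, doc_ids in INDEX.items():
            if word.startswith(prefix):
                hits |= doc_ids
        return hits
    return set(INDEX.get(term, set()))


def boolean_and(left, right):
    return left & right


def boolean_or(left, right):
    return left | right


def boolean_andnot(left, right):
    return left - right


print(sorted(lookup("matri*")))                                    # [1, 2, 3]
print(sorted(boolean_andnot(lookup("matri*"), lookup("patent"))))  # [1, 2]
```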

Isearch Customization
- What’s needed to customize Isearch?
  - Isearch is written in C++
  - Documents are C++ objects - data & procedures
    - Already have SGML & HTML, among others
  - Object technology allows code reusability, customizing only where differences from existing objects occur

Isearch Customization
- What’s needed to make arbitrary documents searchable?
  - Code to parse documents
  - Code to parse fields
  - Code to build brief and full result records
  - Yes, it requires programming
  - But, many of these are derived from existing procedures
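The "override only what differs" idea can be sketched as a small class hierarchy. Isearch itself is written in C++; this Python sketch only mirrors the pattern, and every class and method name here is hypothetical rather than part of the Isearch API.

```python
class DocType:
    """Base document type with default parsing and presentation (hypothetical)."""

    def parse_documents(self, raw):
        # Default: the whole file is one document.
        return [raw]

    def parse_fields(self, document):
        # Default: no fields, full text only.
        return {}

    def brief_record(self, document):
        # Default brief record: the first non-empty line.
        for line in document.splitlines():
            if line.strip():
                return line.strip()
        return ""

    def full_record(self, document):
        # Default full record: the document as stored.
        return document


class TaggedDocType(DocType):
    """Overrides only what differs: blank-line record breaks, 'Name: value' fields."""

    def parse_documents(self, raw):
        return [part.strip() for part in raw.split("\n\n") if part.strip()]

    def parse_fields(self, document):
        fields = {}
        for line in document.splitlines():
            name, sep, value = line.partition(":")
            if sep:
                fields[name.strip().lower()] = value.strip()
        return fields

    def brief_record(self, document):
        # Use the title field when present, else fall back to the base behavior.
        return self.parse_fields(document).get("title", super().brief_record(document))


doctype = TaggedDocType()
docs = doctype.parse_documents("Title: First report\nAuthor: A. Author\n\nTitle: Second report")
print([doctype.brief_record(d) for d in docs])  # ['First report', 'Second report']
```

Here `full_record` is inherited unchanged, which is the reuse the slide describes.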

Introduction to Z39.50
- Developed for search and retrieval
- Networked, client/server environment
- Tested by working information scientists (Z39.50 Implementors Group)
- Commercial & public domain support (Isite from CNIDR)

Attribute Sets
- Attributes define how the query is specified
  - Use: field names
  - Relation: comparisons
  - Position: location in field
  - Structure: word/phrase/key/etc.
  - Truncation: left/right/none/etc.
  - Completeness: subfield/field
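As a rough illustration of how these six attribute types qualify a search term, the sketch below numbers them as the Bib-1 attribute set does and prints a term in the `@attr type=value` prefix notation used by some Z39.50 toolkits. The `format_term` helper is hypothetical, and the notation is shown for illustration only, not as something Isite itself accepts or emits.

```python
from enum import IntEnum


class AttributeType(IntEnum):
    """The six Bib-1 attribute types and their type numbers."""
    USE = 1
    RELATION = 2
    POSITION = 3
    STRUCTURE = 4
    TRUNCATION = 5
    COMPLETENESS = 6


def format_term(term, attributes):
    """Render a term and its attributes as '@attr type=value ... term'.

    `attributes` maps AttributeType -> numeric attribute value.
    """
    parts = [f"@attr {int(kind)}={value}" for kind, value in sorted(attributes.items())]
    return " ".join(parts + [term])


# Bib-1 values: Use 4 = title, Structure 2 = word, Truncation 1 = right truncation.
query = format_term("matri", {
    AttributeType.USE: 4,
    AttributeType.STRUCTURE: 2,
    AttributeType.TRUNCATION: 1,
})
print(query)  # @attr 1=4 @attr 4=2 @attr 5=1 matri
```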

Attributes & Element Sets
- Supported Attribute Sets
  - BIB-1
  - GILS
  - GEO
  - STAS
- Element Sets define retrievable sets of use attributes
  - Brief record
  - Full record
  - Summary record (GEO)

Record Syntaxes
- Z39.50 allows specification of a “Preferred Record Syntax” for results
  - SUTRS (unstructured text)
  - HTML
  - USMARC
  - GRS-1 (tagged, generalized syntax)

Profiles - GEO and Otherwise
- Profiles define allowed attributes and element sets
- Usually domain specific - ATS-1, GILS, WAIS, GEO, Digital Collections, Museum Collections
- Supported by external agreement between client & server (currently)
  - i.e., a GEO client talks to a GEO server

FGDC Enhancements
- Search Engine (Iindex/Isearch)
  - Field types (text, numeric, date, others)
  - Search in nested fields
  - Search in numeric fields
  - Date & date range searching
  - Spatial searching

FGDC Enhancements
- Z39.50 Implementation (ZDist)
  - Support for GEO attributes & element sets
  - GRS-1 record syntax
  - Support for additional (non-Isearch) search engines
  - Syntax to support nested query

Outstanding Issues
- User Interface
  - What fields are searchable and how does the user indicate them?
  - How complex can the geographic queries be? Bounding box only? Complex regions?
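The simplest case raised above, a bounding-box query, reduces to a rectangle-overlap test. The sketch below assumes boxes are given as west/east/south/north coordinates in decimal degrees and ignores boxes that cross the dateline; the class name and the sample coordinates are made up for the example and do not come from the GEO profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    west: float
    east: float
    south: float
    north: float

    def overlaps(self, other):
        """True if the two boxes share any area or edge (no dateline handling)."""
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)


# Illustrative coordinates only.
record_box = BoundingBox(west=-84.3, east=-75.5, south=33.8, north=36.6)
query_box = BoundingBox(west=-80.0, east=-70.0, south=35.0, north=40.0)
far_box = BoundingBox(west=10.0, east=20.0, south=40.0, north=50.0)

print(record_box.overlaps(query_box))  # True
print(record_box.overlaps(far_box))    # False
```

Complex regions would need real polygon geometry, which is presumably why the slide lists them as an open question.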