Classification & Your Intranet: From Chaos to Control Susan Stearns Inmagic, Inc. E-Libraries E204 May, 2003.

Look familiar?
User: ?#$%&* Why can't I find anything on the Intranet?
Content Manager: How do we manage all the information we want to publish on the Intranet?

The issue: We know that we have documents in-house that contain hugely valuable information.
The problem: How do users find the right information at the right time?
The answer: Automated spider and meta-classification software that allows an enterprise to automatically build and maintain a completely searchable database of critical content.

Automated Spidering Software
A spider that crawls specified in-house servers and Web sites
Extracts content from most popular file types and formats – HTML, Text, MSOffice, PDF – even
Content can be loaded into a database

Key features to look for in a spider
Document types: Microsoft Office, PDF, other formats (IFilter compatible)
Zip files and folders can be crawled
Remote administration
Can be scheduled to run multiple times a day
Web crawling can be set up to n levels deep
Easy to create an XML transform to your database design
Integrates with automated classification software

How a spider works (diagram labels)
File system crawl
Database
Web crawl
The Spider
Native document cache
Extracted text cache
XML load files
Content Manager can add value (e.g. add additional meta-tags)
Users can search and access gathered content via a Web browser
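
The "n levels deep" crawl setting can be pictured as a breadth-first traversal with a depth cap. The sketch below is illustrative only: it walks a hypothetical in-memory link map (`LINKS`) instead of real servers, and the function name `crawl` is an assumption, not part of any product described in these slides.

```python
from collections import deque

# Hypothetical link map standing in for an intranet: page -> pages it links to.
LINKS = {
    "/home": ["/hr", "/eng"],
    "/hr": ["/hr/policies"],
    "/eng": ["/eng/specs", "/home"],
    "/hr/policies": ["/hr/policies/travel.pdf"],
    "/eng/specs": [],
    "/hr/policies/travel.pdf": [],
}

def crawl(start, max_depth, links=LINKS):
    """Return pages reachable from `start` within `max_depth` link hops, in visit order."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        order.append(page)
        if depth == max_depth:
            continue  # depth cap reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in seen:  # skip pages already queued or visited
                seen.add(target)
                queue.append((target, depth + 1))
    return order
```

Calling `crawl("/home", 2)` visits the start page, the pages it links to, and the pages those link to, then stops. A real spider would fetch each page and extract its text at the point where this sketch only records the page name.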

We Love Search: We Hate Search
Search is ubiquitous but insufficient
Only one slice into content
Missing relationships across information
Few are skilled at searching

The search engine paradox:
"Regardless of the product or a user's ability to use it, effective searches require the user to know the terms they need to use before they type them into the search engine."
The Delphi Group

The Solution: Meta-Classification
Enrich the content with meta-data
Leverage XML and integrate content from multiple sources
Extract other useful concepts
Give users browse-able directories in addition to a search box

What is Meta-Classification?
Automated meta-data extraction
Meta-data includes subject information as well as names of people, company names, acronyms, key noun phrases
Auto-classification of documents using a predefined taxonomy
This meta-data can be mapped to a database along with the full text of the document or a URL link
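
One way to picture meta-data being mapped to a database alongside the full text or a URL is as an XML load record. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the element names (`record`, `url`, `fulltext`) and the function `to_load_record` are hypothetical and do not reflect any particular product's load format.

```python
import xml.etree.ElementTree as ET

def to_load_record(url, full_text, metadata):
    """Build one XML record holding a document's URL, text, and extracted meta-data.

    `metadata` maps a field name (e.g. "subject", "company") to a list of values.
    """
    record = ET.Element("record")
    ET.SubElement(record, "url").text = url
    ET.SubElement(record, "fulltext").text = full_text
    for field, values in metadata.items():
        for value in values:
            # One element per value, so repeated fields stay searchable individually.
            ET.SubElement(record, field).text = value
    return ET.tostring(record, encoding="unicode")

# Example call with made-up values.
xml_record = to_load_record(
    "http://intranet.example/doc1",
    "Sample document text",
    {"subject": ["Software Companies"], "company": ["Microsoft"], "acronym": ["XML"]},
)
```

Keeping each extracted value in its own element makes it straightforward to transform the record to match a given database design.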

Why Meta-Classification?
Creates structured information from unstructured data
Allows local terminology to be reflected in searching
Provides a browse-able directory
Greatly enhances search through controlled vocabulary

How does it work?
Spider/crawl the documents to create a corpus
Automated software
  Identifies key words and phrases
  Maps them to known topics in taxonomy
  Scores the topics and derives a central theme
  Repeat for the sub-themes

Step 1: Identify words and phrases in the text
Microsoft NASDAQ:MSFT, which won a round in its antitrust fight against the government today, launched its Microsoft.Net initiative that could someday replace computer hardware with software. Via XML (extensible markup language), Microsoft.net will enable use of much larger computers accessible on the Internet for storage of programs, word processing files and other data.

Step 2: Map them to topics within the taxonomy
Government: government
Computer Science: Internet, XML
Hardware: computers
Storage: data, storage
Application Files: programs, language
Word Processing Files: word processing files
Software Companies: Microsoft
Microsoft: Microsoft.Net

Step 3: Determine themes
End results of classification of this story are:
  Central Theme: Microsoft
  Sub-theme 1: Word Processing Files
  Sub-theme 2: Software Companies
Microsoft is a good match for central theme
Microsoft.Net would have been the best classification
Original taxonomy didn't know this topic
Will be added to taxonomy
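
Steps 1 to 3 can be sketched as a term lookup followed by a per-topic count. The slides do not describe how the actual software scores topics, so the plain occurrence count below is an assumption, and the names `score_topics` and `themes` are hypothetical. Because the count is naive (for instance, "Microsoft" is also counted inside "Microsoft.Net"), it should not be expected to reproduce the ranking shown on the Step 3 slide.

```python
import re
from collections import Counter

# Topic -> trigger terms, copied from the Step 2 slide.
TAXONOMY = {
    "Government": ["government"],
    "Computer Science": ["Internet", "XML"],
    "Hardware": ["computers"],
    "Storage": ["data", "storage"],
    "Application Files": ["programs", "language"],
    "Word Processing Files": ["word processing files"],
    "Software Companies": ["Microsoft"],
    "Microsoft": ["Microsoft.Net"],
}

def score_topics(text, taxonomy=TAXONOMY):
    """Count, per topic, how often its terms occur in `text` (case-insensitive)."""
    scores = Counter()
    for topic, terms in taxonomy.items():
        for term in terms:
            # Match the term only when it is not embedded in a longer word.
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            scores[topic] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return scores

def themes(text, n=3):
    """Return up to n topics with a non-zero score, highest first.

    The first entry plays the role of the central theme, the rest the sub-themes.
    """
    ranked = [topic for topic, score in score_topics(text).most_common() if score > 0]
    return ranked[:n]
```

A production classifier would likely weight terms, prefer the longest matching phrase, and use the taxonomy's hierarchy; this sketch only shows the identify, map, and score sequence.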

Customize the Taxonomy
1 million node taxonomy often too large
Develop a custom taxonomy
  A subset of the large taxonomy
  Selected nodes to match business needs
  A set of rules to aggregate from the low-level topics in the large taxonomy to the custom taxonomy
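
The aggregation rules can be imagined as rolling each low-level topic up to its nearest ancestor that was kept in the custom taxonomy. The sketch below assumes a simple parent-link structure; the hierarchy in `PARENT`, the selection in `CUSTOM_NODES`, and the function `roll_up` are all hypothetical examples, not taken from the slides.

```python
# Hypothetical parent links in the large taxonomy: child topic -> parent topic.
PARENT = {
    "Word Processing Files": "Application Files",
    "Application Files": "Software",
    "Microsoft": "Software Companies",
    "Software Companies": "Software",
    "Storage": "Hardware",
    "Hardware": "Computing",
    "Software": "Computing",
}

# Hypothetical custom taxonomy: the nodes selected to match business needs.
CUSTOM_NODES = {"Software", "Hardware"}

def roll_up(topic, parent=PARENT, keep=CUSTOM_NODES):
    """Walk up from `topic` to the nearest node (itself included) in the custom taxonomy."""
    seen = set()
    while topic is not None and topic not in seen:
        if topic in keep:
            return topic
        seen.add(topic)  # guards against accidental cycles in the parent links
        topic = parent.get(topic)
    return None  # no custom node covers this topic
```

With these example links, a document classified under "Word Processing Files" would be filed under "Software" in the custom taxonomy, while a topic with no selected ancestor is left unmapped.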

The Result
Very large corpus of content can be classified in automated fashion
Meta-data is used to create browse-able directories
Meta-data is used for searching
End user is given clues for finding the right information
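
As a rough illustration of how classification meta-data could drive a browse-able directory, the sketch below groups document titles under each assigned topic. The input shape (a list of title and topic-list pairs) and the function name `build_directory` are assumptions made for this example.

```python
from collections import defaultdict

def build_directory(documents):
    """Group document titles under each topic they were classified into.

    `documents` is a list of (title, [topics]) pairs; the result maps each
    topic to an alphabetically sorted list of titles, with topics sorted too.
    """
    directory = defaultdict(list)
    for title, topics in documents:
        for topic in topics:
            directory[topic].append(title)
    return {topic: sorted(titles) for topic, titles in sorted(directory.items())}

# Example with made-up documents.
example_directory = build_directory([
    ("Travel policy", ["Human Resources"]),
    ("Dot-net briefing", ["Software Companies", "Computer Science"]),
    ("XML primer", ["Computer Science"]),
])
```

The same topic assignments can also be indexed as search fields, so one classification pass serves both browsing and searching.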

Other features to consider
Document summaries/abstracts
Including external content
  Spidered from Web sites
  Integrated from licensed content sources
User submissions
User ratings/reviews

From Chaos to Control (diagram labels): Web Content; Meta Data: Captured centrally; User Interaction; Word Doc; Document Properties, Classification; Search and Browse; Content Collection (spider); Context (entity extraction/auto-classification); Corporate Intranet

Thank You.