Automatic Subject Classification and Topic Specific Search Engines -- Research at KnowLib Anders Ardö and Koraljka Golub DELOS Workshop, Lund, 23 June 2004.

Presentation transcript:

Automatic Subject Classification and Topic Specific Search Engines -- Research at KnowLib
Anders Ardö and Koraljka Golub
DELOS Workshop, Lund, 23 June 2004
Digital Information Systems Group, Department of Information Technology, Lund University, Sweden

KnowLib: Knowledge Discovery and Digital Library Research Group
Goals
  information systems
  digital library services
  knowledge discovery
  distributed knowledge organization technologies
    –usability of knowledge organization systems (thesauri, classifications, subject headings systems, ontologies…)
    –user interfaces

KnowLib Members
  Anders Ardö, Associate Professor, Department of Information Technology, Lund University
  Koraljka Golub, PhD Student, Department of Information Technology, Lund University
  Traugott Koch, Digital Library Scientist, Knowledge Technologies Group, NetLab, Lund University Libraries
  Michael Ovnell, Chief Scientist, Bibliotekstjänst AB

KnowLib Projects: Log Analysis -- Renardus
  overall purpose: improve Renardus
  browsing and searching behaviour of users
  why log analysis?
    –catch unsupervised usage
    –evaluate the potential of thorough log analysis
  own software developed

Main Goals
  detailed usage patterns
  balance between browsing and searching and mixed activities
  hierarchical classification browsing behavior
    –usage degree of browsing support features

Renardus Home Page:

Main Navigation Features
  simple search
  advanced search
  subject browsing: DDC
  intellectual mapping of classification systems used by the distributed subject gateways

Subject Browsing Support Features
  graphical fish-eye presentation of the classification hierarchy (Graph. Browse)
    –text version (Text Browse)
  search entry into the browsing structure (Search Browse)
  merging of results from individual subject gateways (Merge Browse)

Preparing the Log Files
  Entries removed   Reason
  …                 images or style sheets
  …                 robots
  …                 HTTP code 301 (redirections)
  …                 malicious attacks
  9 000             local IP-numbers
  4 690             MS favicon.ico
  408               HTTP code 408 and other errors
  appx. … entries boiled down to … entries (appx. … sessions)
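The removal reasons listed on this slide can be sketched as a simple filter. This is only an illustration, not the software KnowLib developed: the entry fields (`ip`, `path`, `status`, `agent`), the regular expressions, and the address ranges in `LOCAL_NETS` are all assumptions, and detection of malicious attacks is left out because the slides do not say how it was done.

```python
import re
from collections import Counter
from ipaddress import ip_address, ip_network

# Hypothetical patterns and networks; the slides do not give the real ones.
STATIC_RE = re.compile(r"\.(gif|jpe?g|png|css)$", re.IGNORECASE)
ROBOT_RE = re.compile(r"bot|crawler|spider", re.IGNORECASE)
LOCAL_NETS = [ip_network("10.0.0.0/8"), ip_network("192.168.0.0/16")]


def drop_reason(entry):
    """Return the reason an entry is removed, or None if it is kept."""
    path = entry["path"].split("?", 1)[0]  # ignore any query string
    if path.endswith("/favicon.ico"):
        return "favicon"
    if STATIC_RE.search(path):
        return "image or style sheet"
    if ROBOT_RE.search(entry["agent"]):
        return "robot"
    if entry["status"] == 301:
        return "redirection"
    if entry["status"] >= 400:
        return "error"
    if any(ip_address(entry["ip"]) in net for net in LOCAL_NETS):
        return "local IP"
    return None


def clean(entries):
    """Split entries into the kept list and a count of removals per reason."""
    kept, removed = [], Counter()
    for entry in entries:
        reason = drop_reason(entry)
        if reason is None:
            kept.append(entry)
        else:
            removed[reason] += 1
    return kept, dict(removed)
```

Counting removals per reason gives the same kind of breakdown as the table on the slide, so the effect of each filter can be checked separately.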

Absolute Transitions (up to 1%)
Circle sizes reflect each activity's share of all activities; arrow sizes reflect each transition's share of all transitions.
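The two quantities behind such a diagram can be computed from session data roughly as follows. This is a minimal sketch, assuming each session is already available as a time-ordered list of activity labels; the labels and the function name `transition_shares` are hypothetical and do not come from the Renardus software.

```python
from collections import Counter


def transition_shares(sessions):
    """Return (activity_share, transition_share) over all sessions.

    sessions: list of sessions, each a time-ordered list of activity labels.
    """
    activity_counts = Counter()
    transition_counts = Counter()
    for session in sessions:
        activity_counts.update(session)
        # Consecutive pairs within one session are the transitions.
        transition_counts.update(zip(session, session[1:]))

    total_activities = sum(activity_counts.values())
    total_transitions = sum(transition_counts.values())
    activity_share = {
        a: n / total_activities for a, n in activity_counts.items()
    }
    transition_share = {
        t: n / total_transitions for t, n in transition_counts.items()
    }
    return activity_share, transition_share
```

The activity shares would correspond to circle sizes and the transition shares to arrow sizes; pairs are never formed across session boundaries.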

Dominance of Browsing Activities
  more than 80% of sessions are dominated by browsing
  among users starting at the home page (21%), 57% still browse and only 12.5% search
  possible reasons:
    –indexing of browse pages by search engines: 71% start using Renardus at browsing pages
    –homepage design strongly “invites” browsing
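The slides do not define when a session counts as "dominated" by an activity. The sketch below assumes one plausible reading: a session is dominated by an activity group if that group accounts for a strict majority of its actions, and is otherwise labelled "mixed". The function names and labels are hypothetical.

```python
from collections import Counter


def dominant_activity(session):
    """Label a non-empty session by its majority activity, or 'mixed'."""
    top, count = Counter(session).most_common(1)[0]
    # Assumption: "dominated" means more than half of the session's actions.
    return top if count > len(session) / 2 else "mixed"


def dominance_shares(sessions):
    """Share of sessions per dominance label; empty sessions are skipped."""
    labels = Counter(dominant_activity(s) for s in sessions if s)
    total = sum(labels.values())
    return {label: n / total for label, n in labels.items()}
```

Under a different threshold (for example a plurality rather than a majority) the shares would change, so the definition should be fixed before comparing against the figures reported here.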

Major Conclusions
  clear dominance of browsing activities
  tendency to stay in the same group of activities
  good usage of the browsing support features, esp. graphical fish-eye browsing
  surprisingly low share of search activities needs to be further investigated
  log analysis can provide valuable insights

KnowLib Projects: KLIC-DDL…
  KLIC-DDL: KnowLib’s Intelligent Components of a Distributed Digital Library
    –architecture for a distributed digital library
    –implementation of information services using intelligent components
      automated subject classification, text categorization
      semi-intelligent information search agent with Web harvesting
      subject specific search engines
      etc.

KLIC-DDL: Automated Subject Classification…
  full-text Web-based documents
  established controlled vocabularies
    –browsing: DDC, FAST, Ei
  home-produced vocabularies: Materials Science, Carnivorous Plants
  machine learning: text categorization (TC)
  information retrieval: document clustering

…KLIC-DDL: Automated Subject Classification
  explore heuristics
    –e.g. importance of metadata vs. title vs. anchor text
  compare results of “All Engineering” with a TC algorithm
  compare browsing controlled vocabulary versus automatically clustered vocabulary
    –advantages and disadvantages of each approach
  explore SOMs as a browsing interface
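One way to experiment with the relative importance of metadata, title and anchor text is to match controlled-vocabulary terms against each part of a document and weight the hits by where they occur. The sketch below illustrates that idea only; it is not the KLIC-DDL classifier. The field names, the weights in `FIELD_WEIGHTS`, and the toy vocabulary in the usage note are assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical weights; finding good values is the open question on the slide.
FIELD_WEIGHTS = {"title": 3.0, "metadata": 2.0, "anchor": 1.5, "body": 1.0}


def classify(doc, vocabulary, weights=FIELD_WEIGHTS, top_n=3):
    """Rank vocabulary classes by weighted term matches in a document.

    doc: mapping of field name to text.
    vocabulary: mapping of class identifier to a list of terms.
    """
    scores = defaultdict(float)
    for field, text in doc.items():
        weight = weights.get(field, 0.0)  # unknown fields contribute nothing
        text = text.lower()
        for class_id, terms in vocabulary.items():
            for term in terms:
                # Whole-word (or whole-phrase) matches only.
                pattern = r"\b" + re.escape(term.lower()) + r"\b"
                scores[class_id] += weight * len(re.findall(pattern, text))
    ranked = sorted(
        ((c, s) for c, s in scores.items() if s > 0),
        key=lambda item: (-item[1], item[0]),
    )
    return ranked[:top_n]
```

For example, with the toy vocabulary `{"materials": ["alloy", "polymer"], "plants": ["sundew"]}`, a page whose title contains "polymer" scores higher for "materials" than a page that mentions the term only in its body. Changing the weights and comparing the resulting rankings is one way to study the heuristic.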

KLIC-DDL: Demonstrators
  Automatic subject classification of Web pages
  Multi-search demonstrator
    –the system analyses the query and dynamically generates indications based on which the user can modify his/her query
  Subject browsing of a harvest database
  Materials.dk