High-Performance Digital Library Classification Systems: From Information Retrieval to Knowledge Management. PI: Hsinchun Chen, The University of Arizona.


High-Performance Digital Library Classification Systems: From Information Retrieval to Knowledge Management
PI: Hsinchun Chen, The University of Arizona
DLI-2 All-Projects Meeting, Cornell, October 18-19, 1999

Research Plan: Research Goals
  Automatic generation of large-scale classification systems (CL)
  Integration of system and human-generated classification systems
  High-performance simulation and visualization of Object Oriented Hierarchical Automatic Yellowpage (OOHAY)

Research Plan: Testbed
  Geoscience: Georef and Petroleum Abstracts (800K) and Georef thesaurus (26K terms)
  Medicine: CancerLit (1M) and UMLS (250K concepts)
  The Web: Indexable pages (10M) and Yahoo directory (250K nodes)

Research Plan: Partners
  Computing:
  Collections: Georef, PA
  User Evaluation: Arizona Cancer Center, Arizona Science and Engineering Library, Arizona Health Science Library

The Field: Knowledge Management/Knowledge Networking: Definition
“The Knowledge Networking (KN) initiative focuses on the integration of knowledge from different sources and domains across space and time... KN research aims to move beyond connectivity to achieve new levels of interactivity, increasing the semantic bandwidth, knowledge bandwidth, activity bandwidth, and cultural bandwidth among people, organizations, and communities.”

The Field: Knowledge Management Functionality (Source: GartnerGroup, 1998)
  Retrieved Knowledge
  Concept “Yellow Pages”
  Value “Recommendation”
  Semantic: clustering and categorization (“table of contents”), semantic networks (“index”), dictionaries, thesauri, linguistic analysis, data extraction
  Collaboration: collaborative filters, communities, trusted advisor, expert identification

Techniques: Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1
Illinois DLI-1 project: “Federated Search of Scientific Literature”
Research goal: Semantic interoperability across subject domains
Technologies: Semantic retrieval and analysis technologies
  Natural Language Processing: text tokenization, part-of-speech tagging, noun phrase generation

Techniques: Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1
Natural Language Processing
  Text tokenization
  Part-of-speech tagging
  Noun phrase generation
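The three steps named on this slide can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not the project's noun phraser: the lexicon `TOY_LEXICON`, the tag names, and the chunking rule (a run of adjectives and nouns that ends in a noun) are all hypothetical choices made for illustration.

```python
import re

# Hypothetical toy lexicon (assumption); a real tagger would rely on a
# trained model or a large lexicon.
TOY_LEXICON = {
    "the": "DET", "a": "DET", "of": "PREP", "in": "PREP",
    "digital": "ADJ", "automatic": "ADJ", "large": "ADJ",
    "library": "NOUN", "classification": "NOUN", "systems": "NOUN",
    "generation": "NOUN", "supports": "VERB",
}


def tokenize(text):
    """Text tokenization: split text into lowercase word tokens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def tag(tokens):
    """Part-of-speech tagging by lexicon lookup; unknown words default to NOUN."""
    return [(tok, TOY_LEXICON.get(tok, "NOUN")) for tok in tokens]


def noun_phrases(tagged):
    """Noun phrase generation: maximal ADJ/NOUN runs that end in a NOUN."""
    phrases, run = [], []
    # The sentinel entry flushes the final run.
    for tok, pos in tagged + [("", "END")]:
        if pos in ("ADJ", "NOUN"):
            run.append((tok, pos))
            continue
        # Trim trailing adjectives so every phrase ends in a noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if run:
            phrases.append(" ".join(t for t, _ in run))
        run = []
    return phrases
```

For example, `noun_phrases(tag(tokenize(text)))` chains the three steps on one sentence.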

Techniques: Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1
Illinois DLI project: “Federated Search of Scientific Literature”
Research goal: Semantic interoperability across subject domains
Technologies: Semantic retrieval and analysis technologies
  Natural Language Processing
  Co-occurrence analysis: heuristic term weighting, weighted co-occurrence analysis

Techniques: Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1
Co-occurrence analysis
  Heuristic term weighting
  Weighted co-occurrence analysis
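The slide does not give the weighting formulas, so the following is only a sketch of one plausible asymmetric formulation. It assumes each document is a list of terms, weights a term by its frequency times a smoothed inverse document frequency, and scores the link from term j to term k as the summed pair weight divided by the summed weight of j. The function name `cooccurrence_weights` and the smoothing `log(N/df + 1)` are assumptions, not taken from the slides.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_weights(docs):
    """Return {(j, k): weight} for ordered term pairs sharing a document."""
    n = len(docs)
    tfs = [Counter(doc) for doc in docs]

    # Document frequency of single terms and of ordered term pairs.
    df = Counter(term for tf in tfs for term in tf)
    pair_df = Counter(
        (a, b) for tf in tfs for a in tf for b in tf if a != b
    )

    def idf(count):
        # Smoothed inverse document frequency (assumed form).
        return math.log(n / count + 1)

    # Denominator: total weight of term j across all documents.
    term_total = defaultdict(float)
    for tf in tfs:
        for term, freq in tf.items():
            term_total[term] += freq * idf(df[term])

    # Numerator: total weight of the pair across documents containing both.
    pair_total = defaultdict(float)
    for tf in tfs:
        for a in tf:
            for b in tf:
                if a != b:
                    pair_total[(a, b)] += min(tf[a], tf[b]) * idf(pair_df[(a, b)])

    return {pair: total / term_total[pair[0]] for pair, total in pair_total.items()}
```

Because the denominator depends on the source term, the weight from a narrow term to a broad term generally differs from the weight in the opposite direction, which is what makes the resulting links usable as directed term associations.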

Techniques: Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1
Illinois DLI project: “Federated Search of Scientific Literature”
Research goal: Semantic interoperability across subject domains
Technologies: Semantic retrieval and analysis technologies
  Natural Language Processing
  Co-occurrence analysis
  Neural Network Analysis: document clustering, category labeling, optimization and parallelization

Techniques: Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1
Neural Network Analysis
  Document clustering
  Category labeling
  Optimization and parallelization
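Document clustering and category labeling with a neural network can be sketched with a small self-organizing map (SOM), the map type mentioned later in the deck. This is an illustrative sketch only: the learning-rate and radius schedules, the function names, and the labeling heuristic (label a node with the term that has the largest weight) are assumptions, and no optimization or parallelization is attempted.

```python
import math
import random


def best_unit(grid, vec):
    """Return the grid node whose weight vector is closest to vec."""
    return min(
        grid,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(grid[node], vec)),
    )


def train_som(vectors, rows, cols, epochs=50, seed=0):
    """Train a rows x cols map on document vectors (lists of floats in [0, 1])."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1.0 - frac)  # learning rate shrinks over time
        radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5  # so does the neighborhood
        for vec in vectors:
            win = best_unit(grid, vec)
            for node, weights in grid.items():
                dist = math.hypot(node[0] - win[0], node[1] - win[1])
                if dist <= radius:
                    influence = math.exp(-(dist ** 2) / (2 * radius ** 2))
                    # Pull the node's weights toward the input vector.
                    for i in range(dim):
                        weights[i] += rate * influence * (vec[i] - weights[i])
    return grid


def label_units(grid, terms):
    """Label each node with the term of its largest weight (assumed heuristic)."""
    return {
        node: terms[max(range(len(terms)), key=weights.__getitem__)]
        for node, weights in grid.items()
    }
```

After training, `best_unit(grid, doc_vector)` assigns a document to a map region, and `label_units` gives each region a candidate category label.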

Techniques: Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1
Illinois DLI project: “Federated Search of Scientific Literature”
Research goal: Semantic interoperability across subject domains
Technologies: Semantic retrieval and analysis technologies
  Natural Language Processing
  Co-occurrence analysis
  Neural Network Analysis
  Advanced Visualization
    1D: alphabetic listing of categories
    2D: semantic map listing of categories
    3D: interactive, helicopter fly-through using VRML

Techniques: Automatic Generation of CL
Advanced Visualization: 1D, 2D, 3D

Techniques: Automatic Generation of CL (Continued)
  Entity extraction and co-reference based on TREC and MUC
  Text segmentation and summarization based on TextTiling and Wavelets
  Visualization techniques based on Fisheye, Fractal, and Spotlight

Techniques: Integration of CL
  Lexicon-enhanced indexing (e.g., UMLS Specialist Lexicon)
  Ontology-enhanced semantic tagging (e.g., UMLS Semantic Nets)
  Ontology-enhanced query expansion (e.g., WordNet, UMLS Metathesaurus)
  Spreading-activation based term suggestion (e.g., Hopfield net)
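Spreading-activation term suggestion can be sketched as follows. This is a plain spreading-activation pass over a weighted term graph, not a Hopfield network, and it is not taken from the slides: the graph format `links`, the function name `suggest_terms`, and the `decay` and `rounds` parameters are all assumptions for illustration.

```python
def suggest_terms(links, seeds, decay=0.5, rounds=3, top_n=5):
    """Suggest related terms by spreading activation from the seed terms.

    links: {term: {neighbor: weight}} with weights in [0, 1] (assumed format).
    seeds: terms entered by the user; each starts with activation 1.0.
    """
    activation = {term: 1.0 for term in seeds}
    frontier = dict(activation)
    for _ in range(rounds):
        next_frontier = {}
        for term, level in frontier.items():
            for neighbor, weight in links.get(term, {}).items():
                # Activation weakens with each hop and with weak links.
                gain = level * weight * decay
                if gain > activation.get(neighbor, 0.0):
                    activation[neighbor] = gain
                    next_frontier[neighbor] = gain
        frontier = next_frontier
    # Rank non-seed terms by activation, breaking ties alphabetically.
    ranked = sorted(
        (t for t in activation if t not in seeds),
        key=lambda t: (-activation[t], t),
    )
    return ranked[:top_n]
```

The link weights could come from an automatically generated source such as a co-occurrence table or from a human-built thesaurus, which is where the integration of the two kinds of classification system would enter.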

Techniques: High-performance Simulation and Visualization
  Algorithmic optimization and parallelization on NCSA supercomputers (time machine)
  Advanced, interactive 2D/3D visualization via Java, VRML, and OpenGL

Research Status: From YAHOO! To OOHAY?
OOHAY: Object Oriented Hierarchical Automatic Yellowpage

Research Status: OOHAY: Visualizing the Web
Arizona DLI-2 project: “From Interspace to OOHAY?”
Research goal: automatic and dynamic categorization and visualization of ALL the web pages in US (and the world, later)
Technologies: OOHAY techniques
  Multi-threaded spiders for web page collection
  High-precision web page noun phrasing and entity identification
  Multi-layered, parallel, automatic web page topic directory/hierarchy generation
  Dynamic web search result summarization and visualization
  Adaptive, 3D web-based visualization
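The idea of a multi-threaded spider that collects pages matching key phrases can be sketched as below. This is a hedged illustration, not the OOHAY spiders: the `crawl` function, its parameters, and the injected `fetch(url)` callable (which is assumed to return a page's text and its outgoing links) are hypothetical, so the sketch can run against an in-memory page table rather than the live web.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(start_urls, fetch, phrases, max_depth=2, workers=4):
    """Breadth-first crawl; return URLs whose text contains any key phrase.

    fetch(url) -> (text, links) is supplied by the caller (assumption).
    """
    seen = set(start_urls)
    frontier = list(start_urls)
    hits = []
    wanted = [p.lower() for p in phrases]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_depth + 1):
            if not frontier:
                break
            # Fetch every page of the current level in parallel threads.
            results = list(pool.map(fetch, frontier))
            next_frontier = []
            for url, (text, links) in zip(frontier, results):
                if any(p in text.lower() for p in wanted):
                    hits.append(url)
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return hits
```

Passing `fetch` in as a parameter keeps the threading logic separate from network access, so politeness rules, timeouts, and HTML parsing can be added in that one function.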

Research Status: OOHAY: Visualizing the Web
[Figure: visualization screenshot; visible labels include MUSIC and ROCK]

Research Status: OOHAY: CI Spider, Meta Spider, Med Spider
1. Enter starting URLs and key phrases to be searched.
2. Search results from spiders are displayed dynamically.
For project information and free download:

Research Status: OOHAY: CI Spider, Meta Spider, Med Spider
3. Noun phrases are extracted from the web pages, and the user can select preferred phrases for further summarization.
4. A SOM is generated based on the phrases selected. Steps 3 and 4 can be repeated to refine the results.
For project information and free download:

Research Status: Digital Library Research in the New York Times (cover article, Sep 30, 1999)

Research Status: DL Special Issues and Activities
  IEEE Computer, May 1996 (Schatz/Chen)
  IEEE Computer, February 1999 (Schatz/Chen)
  JASIS, 2000, forthcoming (Chen)
  Second Asia DL Workshop, November 8-9, 1999, Taipei, Taiwan
  Berkeley (Wilensky), UCSB (Hill/Smith), Maryland (Greene/Shneiderman), Xerox PARC (Baldonado), IBM (Liu), Texas A&M (Shipman/Furuta), NASA (Kaplan)