High-Performance Digital Library Classification Systems: From Information Retrieval to Knowledge Management. PI: Hsinchun Chen, The University of Arizona.


1 High-Performance Digital Library Classification Systems: From Information Retrieval to Knowledge Management. PI: Hsinchun Chen, The University of Arizona. DLI-2 All-Projects Meeting, Cornell, October 18-19, 1999.

2 Research Plan. Research Goals: automatic generation of large-scale classification systems (CL); integration of system and human-generated classification systems; high-performance simulation and visualization of Object Oriented Hierarchical Automatic Yellowpage (OOHAY).

3 Research Plan. Testbed: Geoscience: Georef and Petroleum Abstracts (800K) and Georef thesaurus (26K terms). Medicine: CancerLit (1M) and UMLS (250K concepts). The Web: indexable pages (10M) and Yahoo directory (250K nodes).

4 Research Plan. Partners: Computing: Collections: Georef, PA. User Evaluation: Arizona Cancer Center, Arizona Science and Engineering Library, Arizona Health Science Library.

5 The Field. Knowledge Management/Knowledge Networking: Definition. “The Knowledge Networking (KN) initiative focuses on the integration of knowledge from different sources and domains across space and time... KN research aims to move beyond connectivity to achieve new levels of interactivity, increasing the semantic bandwidth, knowledge bandwidth, activity bandwidth, and cultural bandwidth among people, organizations, and communities.”

6 The Field. Knowledge Management Functionality (Source: GartnerGroup, 1998). Layers: Retrieved Knowledge; Concept “Yellow Pages”; Semantic; Collaboration; Value “Recommendation”. Functions: clustering and categorization (“table of contents”); semantic networks (“index”); dictionaries; thesauri; linguistic analysis; data extraction; collaborative filters; communities; trusted advisor; expert identification.

7 Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1. Illinois DLI-1 project: “Federated Search of Scientific Literature”. Research goal: semantic interoperability across subject domains. Technologies: semantic retrieval and analysis technologies. Techniques: Natural Language Processing (text tokenization, part-of-speech tagging, noun phrase generation).

8 Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1. Techniques: Natural Language Processing (text tokenization, part-of-speech tagging, noun phrase generation).

9 Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1. Illinois DLI project: “Federated Search of Scientific Literature”. Research goal: semantic interoperability across subject domains. Technologies: semantic retrieval and analysis technologies. Techniques: Natural Language Processing; Co-occurrence analysis (heuristic term weighting, weighted co-occurrence analysis).

10 Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1. Techniques: Co-occurrence analysis (heuristic term weighting, weighted co-occurrence analysis).
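
The co-occurrence step named on this slide can be illustrated with a minimal sketch. The slides do not give the weighting formula, so the asymmetric weight used here, (documents containing both terms) / (documents containing the source term), is an assumption chosen for illustration, and the function name and sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(docs):
    """Return asymmetric weights w[(a, b)] = df(a and b) / df(a).

    `docs` is a list of term lists (for example, noun phrases per document).
    The weighting is an illustrative assumption, not the project's formula.
    """
    df = Counter()        # number of documents containing each term
    pair_df = Counter()   # number of documents containing each term pair
    for terms in docs:
        unique = sorted(set(terms))
        df.update(unique)
        pair_df.update(combinations(unique, 2))

    weights = {}
    for (a, b), n in pair_df.items():
        weights[(a, b)] = n / df[a]   # how strongly a suggests b
        weights[(b, a)] = n / df[b]   # how strongly b suggests a
    return weights


# Hypothetical toy collection.
docs = [
    ["seismic", "fault", "basin"],
    ["seismic", "fault"],
    ["basin", "sediment"],
]
weights = cooccurrence_weights(docs)
```

Because the weight is normalized by the source term's document frequency, a rare term can point strongly at a common one while the reverse link stays weak, which is why the table is kept asymmetric.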

11 Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1. Illinois DLI project: “Federated Search of Scientific Literature”. Research goal: semantic interoperability across subject domains. Technologies: semantic retrieval and analysis technologies. Techniques: Natural Language Processing; Co-occurrence analysis; Neural Network Analysis (document clustering, category labeling, optimization and parallelization).

12 Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1. Techniques: Neural Network Analysis (document clustering, category labeling, optimization and parallelization).

13 Automatic Generation of CL: Foundation from NSF/DARPA/NASA Digital Library Initiative-1. Illinois DLI project: “Federated Search of Scientific Literature”. Research goal: semantic interoperability across subject domains. Technologies: semantic retrieval and analysis technologies. Techniques: Natural Language Processing; Co-occurrence analysis; Neural Network Analysis; Advanced Visualization (1D: alphabetic listing of categories; 2D: semantic map listing of categories; 3D: interactive, helicopter fly-through using VRML).

14 Automatic Generation of CL. Techniques: Advanced Visualization (1D, 2D, 3D).

15 Automatic Generation of CL (Continued). Techniques: entity extraction and co-reference based on TREC and MUC; text segmentation and summarization based on TextTiling and Wavelets; visualization techniques based on Fisheye, Fractal, and Spotlight.

16 Integration of CL. Techniques: lexicon-enhanced indexing (e.g., UMLS Specialist Lexicon); ontology-enhanced semantic tagging (e.g., UMLS Semantic Nets); ontology-enhanced query expansion (e.g., WordNet, UMLS Metathesaurus); spreading-activation based term suggestion (e.g., Hopfield net).
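
Spreading-activation term suggestion can be sketched as follows. This is a simplified illustration, not a full Hopfield net and not the project's implementation: activation flows from seed terms along weighted links, and each term keeps the strongest signal it receives. The function name, the `rounds` and `threshold` parameters, and the sample weights are all hypothetical.

```python
def suggest_terms(weights, seeds, rounds=3, threshold=0.2):
    """Suggest related terms by spreading activation over weighted links.

    `weights` maps (source, target) pairs to link strengths in [0, 1].
    Signals weaker than `threshold` are dropped so activation dies out.
    """
    activation = {term: 1.0 for term in seeds}
    for _ in range(rounds):
        updated = dict(activation)
        for (src, dst), w in weights.items():
            if src in activation:
                signal = activation[src] * w
                if signal >= threshold:
                    # Keep the strongest signal reaching each term.
                    updated[dst] = max(updated.get(dst, 0.0), signal)
        if updated == activation:   # converged: nothing new was activated
            break
        activation = updated

    ranked = sorted(
        (t for t in activation if t not in seeds),
        key=lambda t: (-activation[t], t),
    )
    return [(t, activation[t]) for t in ranked]


# Hypothetical link weights, e.g. taken from a co-occurrence table or a thesaurus.
links = {
    ("cancer", "tumor"): 0.8,
    ("tumor", "oncogene"): 0.5,
    ("cancer", "diet"): 0.1,
}
suggestions = suggest_terms(links, ["cancer"])
```

In this toy run "tumor" is activated directly from the seed, "oncogene" is reached in a second round through "tumor", and "diet" is dropped because its signal falls below the threshold.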

17 High-performance Simulation and Visualization. Techniques: algorithmic optimization and parallelization on NCSA supercomputers (time machine); advanced, interactive 2D/3D visualization via Java, VRML, and OpenGL.

18 Research Status. From YAHOO! to OOHAY? OOHAY: Object Oriented Hierarchical Automatic Yellowpage.

19 Research Status. OOHAY: Visualizing the Web. Arizona DLI-2 project: “From Interspace to OOHAY?” Research goal: automatic and dynamic categorization and visualization of ALL the web pages in the US (and the world, later). Technologies (OOHAY techniques): multi-threaded spiders for web page collection; high-precision web page noun phrasing and entity identification; multi-layered, parallel, automatic web page topic directory/hierarchy generation; dynamic web search result summarization and visualization; adaptive, 3D web-based visualization.

20 Research Status. OOHAY: Visualizing the Web. [Figure: category map; visible labels include MUSIC and ROCK]

21 Research Status. OOHAY: CI Spider, Meta Spider, Med Spider. 1. Enter starting URLs and key phrases to be searched. 2. Search results from spiders are displayed dynamically. For project information and free download: http://ai.bpa.arizona.edu
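
The collection step behind a multi-threaded spider can be sketched with a thread pool. This is only an illustration of the general idea, assuming a caller-supplied `fetch_links(url)` function; it does not reflect how CI Spider, Meta Spider, or Med Spider are implemented, and key-phrase filtering is omitted. The in-memory `FAKE_WEB` stands in for real HTTP fetches.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(start_urls, fetch_links, max_pages=50, workers=4):
    """Breadth-first crawl from `start_urls`, fetching each level in parallel.

    `fetch_links(url)` must return an iterable of URLs linked from `url`.
    Returns the set of URLs discovered, capped at `max_pages`.
    """
    seen = set(start_urls)
    frontier = list(dict.fromkeys(start_urls))  # de-duplicate, keep order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            batch, frontier = frontier, []
            # Worker threads fetch the whole level concurrently; results are
            # merged in the main thread, so `seen` needs no locking.
            for links in pool.map(fetch_links, batch):
                for link in links:
                    if link not in seen and len(seen) < max_pages:
                        seen.add(link)
                        frontier.append(link)
    return seen


# Hypothetical in-memory "web" used instead of network access.
FAKE_WEB = {"a": ["b", "c"], "b": ["c", "d"], "c": [], "d": ["a"]}
pages = crawl(["a"], lambda url: FAKE_WEB.get(url, []))
```

Swapping the lambda for a function that downloads a page and extracts its links would turn the sketch into a real crawler; politeness delays and error handling would also be needed in practice.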

22 Research Status. OOHAY: CI Spider, Meta Spider, Med Spider. 3. Noun phrases are extracted from the web pages, and users can select preferred phrases for further summarization. 4. A SOM is generated based on the phrases selected. Steps 3 and 4 can be repeated to refine the results. For project information and free download: http://ai.bpa.arizona.edu
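
Step 4 relies on a self-organizing map (SOM). The following is a minimal pure-Python sketch of the standard SOM training loop, assuming each page is represented as a vector over the selected phrases; the grid size, learning-rate schedule, and sample vectors are illustrative assumptions rather than the settings used by the spiders.

```python
import math
import random


def best_cell(grid, vec):
    """Return the grid cell whose weight vector is closest to `vec`."""
    return min(grid, key=lambda cell: math.dist(grid[cell], vec))


def train_som(vectors, rows=2, cols=2, epochs=50, seed=0):
    """Train a small SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)          # fixed seed for repeatable runs
    dim = len(vectors[0])
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs    # decays towards 0 over training
        rate = 0.5 * frac              # learning rate
        radius = max(rows, cols) * frac  # neighborhood radius on the grid
        for vec in vectors:
            win = best_cell(grid, vec)
            for cell, w in grid.items():
                dist = math.dist(cell, win)
                if dist <= radius:
                    influence = math.exp(-dist * dist / (2 * radius * radius))
                    for i in range(dim):
                        # Pull the cell's weights towards the input vector.
                        w[i] += rate * influence * (vec[i] - w[i])
    return grid


# Hypothetical page vectors: one column per selected noun phrase (1 = present).
docs = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
grid = train_som(docs)
cells = [best_cell(grid, d) for d in docs]
```

Pages that land in the same or neighboring cells share phrases, so each cell can be labeled with its dominant phrases to form a region of the map.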

23 Research Status. Digital library research in the New York Times: cover article, Sep 30, 1999.

24 Research Status. DL Special Issues and Activities: IEEE Computer, May 1996 (Schatz/Chen); IEEE Computer, February 1999 (Schatz/Chen); JASIS, 2000, forthcoming (Chen). Second Asia DL Workshop, November 8-9, 1999, Taipei, Taiwan. Berkeley (Wilensky), UCSB (Hill/Smith), Maryland (Greene/Shneiderman), Xerox PARC (Baldonado), IBM (Liu), Texas A&M (Shipman/Furuta), NASA (Kaplan).

