Finding Hierarchy in Facets. The Great Chain of Being.

Presentation transcript:

Finding Hierarchy in Facets

The Great Chain of Being

Linnaeus chose a different facet

Why do we need facets in search? Search result sets are bigger. More metadata is associated with each result. Our brains can't efficiently manage large lists of data.

Two search paradigms: choose your facets beforehand…

…or not

The simple keyword search box has become the tool of choice

Possible Facets: Format, Subject, Language, Author, Place, Era, Publication Date, Genre, Collection

The FAST Model: Several facets are peeled away from LCSH… Form (Genre), Chronological, Geographical tag, Personal Names, Corporate Names …but a Hard Nut Remains: Topical Subject Headings

Browsable Hierarchy on a Human Scale - HILCC

Flat Tag Sets

Building Structure in the UI to Make Tags More Focused

Structured Patron Tags

Clustering Tags 101
Inputs: {User, Tag, Bib} triples.
Start with a similarity measure between tags.
The first tag forms the initial cluster.
For each remaining tag, if the similarity between the tag and a cluster exceeds a threshold, add the tag to that cluster; otherwise create a new cluster.
Complications: similarity measures, cluster normalization, multiple cluster membership, etc.
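The single-pass procedure on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `jaccard` is a stand-in similarity (overlap of bib sets), tag-to-cluster similarity is taken as the best match against any member, each tag joins the best-scoring cluster rather than the first one over the threshold, and the tags "evolution" and "cookery" and all bib ids are made up.

```python
def jaccard(a, b):
    """Stand-in similarity: overlap of the bib sets two tags were applied to."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def cluster_tags(tag_bibs, threshold, sim=jaccard):
    """Single-pass clustering: a tag joins the most similar existing cluster
    if that similarity exceeds `threshold`, otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of tag names
    for tag, bibs in tag_bibs.items():
        best, best_score = None, threshold
        for cluster in clusters:
            # Tag-to-cluster similarity = best match against any member (assumption).
            score = max(sim(bibs, tag_bibs[member]) for member in cluster)
            if score > best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append([tag])  # the first tag always lands here
        else:
            best.append(tag)
    return clusters


# Hypothetical data: tag -> set of bib ids the tag was applied to.
tag_bibs = {
    "darwinism": {1, 2, 3},
    "evolution": {1, 2, 3, 4},
    "knights templar": {7, 8},
    "opus dei": {7, 8, 9},
    "cookery": {20},
}
print(cluster_tags(tag_bibs, threshold=0.5))
# [['darwinism', 'evolution'], ['knights templar', 'opus dei'], ['cookery']]
```

The result depends on the order in which tags are visited, which is one of the complications the slide alludes to.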

Vector Cosine Similarity
Model each tag as a vector V of weighted features.
Features are bib ids. Each weight is the number of times users assigned the tag to that bib.
cos(V1, V2) = (V1 · V2) / (|V1| * |V2|), which yields a value in [0, 1] for non-negative weights, where 0 is no similarity and 1 is maximal similarity.
Trigonometric interpretation: the cosine of the angle between the two vectors.
Example vectors: V{1, 3} and V{3, 1}.
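A minimal sketch of this measure, assuming each tag vector is stored sparsely as a dict from bib id to count. The helper names and the bib ids `bib_a` and `bib_b` are hypothetical; the two vectors are the slide's {1, 3} and {3, 1}.

```python
import math
from collections import Counter


def tag_vector(assignments, tag):
    """Sparse vector for `tag`: bib id -> number of times users applied the tag."""
    return Counter(bib for (_user, t, bib) in assignments if t == tag)


def cosine(v1, v2):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v2.get(k, 0) for k, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm1 * norm2)


# The slide's example vectors over two hypothetical bib ids.
v1 = {"bib_a": 1, "bib_b": 3}
v2 = {"bib_a": 3, "bib_b": 1}
print(round(cosine(v1, v2), 3))  # 0.6
```

Here the dot product is 1*3 + 3*1 = 6 and each vector has length sqrt(10), so the similarity is 6 / 10 = 0.6.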

An Example of a Cluster (leonardo da vinci, bible stories, intelligent design, christianity, darwinism, opus dei, atheism, family tree of jesus christ, christian ethics, esoteric religion, morality tales, knights templar)

What Clusters Together?
Unifications -- different user vocabularies (a.k.a. synonyms, misspellings, abbreviations).
Abstraction -- different levels of generality (a.k.a. vertical relationships, IS-A, subsumption, hypernym).
– Abstraction navigation.
– Hierarchical roll-up for faceting.
Semantic relationships -- various associations that link terms semantically (a.k.a. horizontal relationships, HAS-A, semantic co-occurrences).
– "See also" navigation.
And yes, spurious associations (a.k.a. noise, crap).

Structuring Clusters (Intrinsic Methods)
Lexical subsumption -- book -> picture book -> children's picture book.
Operational subsumption -- T1 subsumes T2 if the set of bibs tagged by T1 is a superset of those tagged by T2 (~80%).
Association rules -- characterize association strength between tags (with support and confidence metrics) and infer relationships.
Social network theory -- analyze the similarity graph.
– Compute closeness centrality for tags in the similarity graph.
– Order tags by maximal centrality.
– Add each tag to the taxonomy tree at the most similar node, or at the root if the similarity threshold is not met.
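Operational subsumption and the association-rule metrics can be illustrated together. The sketch below reads the slide's "~80%" as a relaxed superset test (T1 must cover at least 80% of T2's bibs), which is an interpretation rather than something the slide states. The extra asymmetry check and all function names and bib ids are assumptions made for the example.

```python
def confidence(tag_bibs, t1, t2):
    """Confidence of the rule t2 -> t1: share of t2's bibs that also carry t1."""
    bibs2 = tag_bibs[t2]
    return len(tag_bibs[t1] & bibs2) / len(bibs2) if bibs2 else 0.0


def support(tag_bibs, t1, t2, n_bibs):
    """Support: share of all bibs that carry both tags."""
    return len(tag_bibs[t1] & tag_bibs[t2]) / n_bibs


def subsumes(tag_bibs, t1, t2, min_conf=0.8):
    """t1 subsumes t2 if t1 covers at least `min_conf` of t2's bibs
    and the reverse does not hold (asymmetry check is an assumption)."""
    return (confidence(tag_bibs, t1, t2) >= min_conf
            and confidence(tag_bibs, t2, t1) < min_conf)


# Hypothetical data: "book" is applied to bibs 1..10, "picture book" to five bibs.
tag_bibs = {
    "book": set(range(1, 11)),
    "picture book": {1, 2, 3, 4, 11},
}
print(subsumes(tag_bibs, "book", "picture book"))  # True
print(subsumes(tag_bibs, "picture book", "book"))  # False
```

In this toy data "book" covers 4 of the 5 bibs tagged "picture book" (confidence 0.8), while "picture book" covers only 4 of the 10 bibs tagged "book" (0.4), so the broader tag is placed above the narrower one.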

Using [Heymann and Garcia-Molina, 2006]
Tree nodes: christianity, family tree of jesus christ, opus dei, leonardo da vinci, esoteric religion, knights templar, atheism, intelligent design, darwinism, christian ethics, bible stories, morality tales
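A rough sketch of the centrality-ordered tree building credited to [Heymann and Garcia-Molina, 2006], following only the three steps listed under social network theory. It is not the authors' implementation: the unweighted similarity graph, both thresholds, the simple closeness formula, and every similarity value below are assumptions chosen for illustration.

```python
from collections import deque


def closeness(graph, node):
    """Closeness centrality in an unweighted graph:
    (number of reachable nodes) / (sum of shortest-path distances to them)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for nxt in graph[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def build_taxonomy(sim, tags, edge_threshold, attach_threshold):
    """Return {tag: parent}; parent is "ROOT" when no placed tag is similar enough."""
    # Similarity graph: connect two tags when their similarity reaches edge_threshold.
    graph = {t: [u for u in tags if u != t and sim(t, u) >= edge_threshold] for t in tags}
    # Most central tags are placed first, so they end up near the top of the tree.
    ordered = sorted(tags, key=lambda t: closeness(graph, t), reverse=True)
    parent = {}
    for tag in ordered:
        best = max(parent, key=lambda p: sim(tag, p), default=None)
        if best is not None and sim(tag, best) >= attach_threshold:
            parent[tag] = best
        else:
            parent[tag] = "ROOT"
    return parent


# Hypothetical pairwise similarities; unlisted pairs default to 0.1.
pair_sims = {
    frozenset(("christianity", "bible stories")): 0.6,
    frozenset(("christianity", "atheism")): 0.5,
    frozenset(("atheism", "darwinism")): 0.6,
}


def sim(a, b):
    return pair_sims.get(frozenset((a, b)), 0.1)


tags = ["christianity", "bible stories", "atheism", "darwinism"]
print(build_taxonomy(sim, tags, edge_threshold=0.4, attach_threshold=0.4))
# {'christianity': 'ROOT', 'atheism': 'christianity',
#  'bible stories': 'christianity', 'darwinism': 'atheism'}
```

Ties in centrality are broken by input order here, so a real system would need a deliberate tie-breaking rule.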

Structuring Clusters (Extrinsic Methods)
WordNet ([Stoica, Hearst, Richardson, 2007])
– Synsets to recognize synonyms and polysemy.
– IS-A links (hypernyms) to recognize abstraction; can also provide labels for hierarchical facets.
LC Classifications / Subject Headings
Specialized ontologies
– Gazetteers for geospatial tags (e.g., GNS, GNIS, Alexandria Digital Library, Getty Thesaurus of Geographic Names).
– Affect taxonomies (Sentiment AI).
Introduces a classification task to map tags into ontologies.
Danger! Ontology structure may introduce noisy structure, causing more problems than benefits.

Widening the Similarity Net
User / community modeling
– Tag profiles for users.
– Tag taxonomies for specific user communities.
Bib modeling
– Similar titles based on tag features.
– Best-of lists for user communities.
Folding in other metadata during clustering
– Pseudotag generation -- automated tag creation from metadata (e.g., LCSH), ontologies, or free-text analysis (mining significant terms).
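One way pseudotag generation from LCSH might look, as a sketch only: split each heading on its "--" subdivision separator and inject the parts as {User, Tag, Bib} triples from a synthetic user. The splitting strategy, the `__lcsh__` pseudo-user, and the sample heading are assumptions, not something the slides specify.

```python
def pseudotags_from_lcsh(heading):
    """Split an LCSH-style heading on "--" and lowercase each part."""
    parts = [p.strip().lower() for p in heading.split("--")]
    return [p for p in parts if p]


def add_pseudotags(assignments, bib_headings, pseudo_user="__lcsh__"):
    """Return the assignments plus (pseudo_user, tag, bib) triples
    derived from each bib's subject headings."""
    extra = [
        (pseudo_user, tag, bib)
        for bib, headings in bib_headings.items()
        for heading in headings
        for tag in pseudotags_from_lcsh(heading)
    ]
    return assignments + extra


# Hypothetical patron tag and subject heading for one bib record.
assignments = [("user1", "knights templar", "bib_7")]
bib_headings = {"bib_7": ["Templars -- History"]}
for triple in add_pseudotags(assignments, bib_headings):
    print(triple)
# ('user1', 'knights templar', 'bib_7')
# ('__lcsh__', 'templars', 'bib_7')
# ('__lcsh__', 'history', 'bib_7')
```

Because the pseudotags share the triple format, they can be fed to the same similarity and clustering steps as patron tags.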

Full General-Purpose Automation?
Techniques are exquisitely sensitive to features that are computationally accessible.
– People use background knowledge and context.
Absolutely useful for solving particular tasks.
Human curation is probably a necessary component.
– Bootstrap structure through automated techniques.
– Incentivize curation.
– Manage human time via active learning techniques.

Bibliography