Knowledge Organization Research in the last two decades: 1988-2008. Fidelia Ibekwe-SanJuan, Eric SanJuan.



Outline
Previous work
Goal
Data collection
Analysis methodology
Results
Discussion

Previous works
Trend surveys in KO:
– McIlwaine & Williamson (1999)
– McIlwaine (2003)
– Hjorland & Albrechtsen (1999)
– Lopez-Huertas (2008)
– Saumure & Shiri (2008)
– Smiraglia (2009)

Previous works
Personal readings of journals & ISKO proceedings
Query: was a query constructed and submitted to a database in order to retrieve records?
Publications: reading / perusing of full texts?
Records: bibliographic records (titles & abstracts)

Previous works
Major findings: McIlwaine & Williamson (1999); McIlwaine (2003)
Classification schemes (UDC, DDC, LCSH,...)
Bias in classification (gender, culture)
Interoperability of KO vocabularies
Rise of Internet technology and search engines, and their impact on KO
Resource discovery
Emerging trends in expert systems (NLP, ontologies, automatic indexing...)
Terminology management problems
Thesauri design
Information visualisation in an online context

Previous works
Major findings, ? – ?: Lopez-Huertas (2008)
Mainstream research in KO consists of reformulations of old problems (classification, thesauri)
Recasting them in the web era gives them a new life!
Especially since KO is more & more entwined with sister fields
2 major driving forces of research in KO:
– demand for quality & interoperability in a multilingual, multicultural world
– managing emergent knowledge in KOS in the semantic web era
Both are reformulations of the multidimensionality of knowledge
Necessitating an inter- and multi-disciplinary effort, etc.

Previous works
Major findings (40 yrs!), pre & post-web era: Saumure & Shiri (2008)
Organizing corporate or business information
Machine-assisted knowledge organization
Information professionals
Interoperability
Cataloging and classification
Classifying the web
Digital preservation and digital libraries
Metadata applications and uses
Cognition
Education
Indexing and abstracting
Thesauri initiatives

Previous works
Major findings (40 yrs!), pre & post-web era: Saumure & Shiri (2008)
Trends b/w the pre-web era (<1993, date of the 1st browser, Mosaic) and the post-web era
KO research focused throughout on mainstream topics: cataloguing, classification
Pre-web era: more focused on indexing and cataloguing
Post-web era: metadata generation & harvesting, interoperability, thus a more technological thrust

Previous works
Summary
– Despite methodological differences in data collection and analysis methods
– Important overlaps in findings
– Mainstream research is still driving KO (classification research, cataloguing, thesauri, bias,...)
– Reformulations in the web era (interoperability, metadata creation & harvesting, assisted indexing & retrieval, terminology issues...)

Goal
Trends survey of research on KO issues over the past 2 decades (1988-2008), 21 yrs.
What can we get from automatic data analysis methods?
Can they provide any useful insight?

Goal
Epistemology:
– Empiricism (how): methodology - observation of evidence from data
– Pragmatism (why): is it useful and for whom?
Some connection with bibliometrics, but the focus is not on mapping authors but on mapping contents
Methodological difference with mainstream data analysis techniques: symbolic (linguistic & terminology) vs bag-of-words approach

Data collection (1)
Issue: ISKO proceedings are not indexed in a machine-processable format (database)
No problem for peer-reviewed journals... but ambiguity of the KO concept!
At the end of the day... a manual selection of KO & LIS-related journals
Records downloaded from Web-of-Science (WoS)

Data collection (2)
List of 31 selected journals
931 records, out of which 838 came from KO & its ancestor (International Classification)
words in titles & abstracts
Research trends will portray mostly publications from the KO journal.
Not the entire realm of publications on KO, but we had to be content with that...

Sample record from ISI-WoS
PT J
AU RADA, R
   ROSSIMORI, A
   PATON, R
   RECTOR, A
   MAGLIANI, F
   ROBBE, PD
TI THE GALEN DREAM
SO INTERNATIONAL CLASSIFICATION
AB Outlines the origin, needs and principles of GALEN, the Generalized Architecture for Languages, Encyclopedias, and Nomenclatures as applicable to Medicine. Short-term and long-term plans of GALEN have been elaborated to cope with possible developments. ''Milestones'' are given indicating what should be reached when and how much funding will be required for each milestone. In two ''vision'' pictures the situation before and after the introduction of GALEN is shown and the responsibilities at 4 different levels are listed.
SN
PY 1992
VL 19
IS 4
BP 188
EP 191
UT ISI:A1992KH
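
A tagged record like the one above can be read into a simple structure before any terminology analysis. The following is only a minimal sketch: it assumes each field starts on its own line with a two-character tag and that continuation lines are indented, and the names `parse_wos_record` and `text_for_analysis` are hypothetical helpers, not part of TermWatch or of any WoS tool.

```python
import re

# Assumption: one tagged field per line, continuation lines are indented.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])(?: (.*))?$")


def parse_wos_record(text):
    """Return {tag: [values]} for a single tagged record (hypothetical helper)."""
    record = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = TAG_LINE.match(line)
        if match:
            # A new field starts: remember its tag, store the value if present.
            current = match.group(1)
            record.setdefault(current, [])
            if match.group(2):
                record[current].append(match.group(2).strip())
        elif current is not None:
            # Indented continuation line: belongs to the current field.
            record[current].append(line.strip())
    return record


def text_for_analysis(record):
    """Join title (TI) and abstract (AB), the two fields the study analyses."""
    return " ".join(record.get("TI", []) + record.get("AB", []))


SAMPLE = """PT J
AU RADA, R
   ROSSIMORI, A
TI THE GALEN DREAM
SO INTERNATIONAL CLASSIFICATION
SN
PY 1992
"""

record = parse_wos_record(SAMPLE)
print(record["AU"])
print(text_for_analysis(record))
```

Keeping every field as a list makes multi-valued fields such as AU and empty fields such as SN behave the same way.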

Analysis methodology (1)
Empirical observations of how terminology depicts knowledge artefacts (titles & abstracts)
– Terminology engineering
Descriptive text data analysis (automatically propose a partition of the data)
Hierarchical agglomerative clustering
– Mapping & visualisation
– Multidimensional view of domain structure: symbolic & numerical information
TermWatch system (SanJuan & Ibekwe-SanJuan 2006)

Analysis methodology (2)
- Corpus split in 2 periods
- Terminology modeling
  * Automatic extraction of terms
  * Term variant search
- Clustering by semantic relations
- Linking clusters by co-occurrence
- Mapping & visualization

Analysis methodology (3)
- Terminology modeling
  * Automatic extraction of terms
    * surface morpho-syntactic properties of terms
    * rule implementation
    * extraction of likely candidates
    * filtering: statistical measures or manual
    * Problem: statistical measures only work well on massive data
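
The extract-then-filter pipeline listed above can be illustrated with a deliberately crude stand-in. This sketch does not implement morpho-syntactic rules: it swaps in a stopword-delimited chunking heuristic (no part-of-speech tagging) for candidate extraction and a plain frequency threshold for filtering. The stopword list, function names and thresholds are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not a linguistic resource.
STOPWORDS = {
    "the", "of", "and", "a", "an", "in", "to", "for", "is", "are", "as",
    "on", "by", "with", "that", "this", "be", "has", "have", "been", "at",
    "from", "or", "its", "it",
}


def candidate_terms(text, min_words=2, max_words=4):
    """Extract multi-word chunks delimited by stopwords and punctuation."""
    candidates = []

    def flush(chunk):
        # Keep only chunks whose length looks like a multi-word term.
        if min_words <= len(chunk) <= max_words:
            candidates.append(" ".join(chunk))

    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        chunk = []
        for word in fragment.split():
            if word in STOPWORDS or word.isdigit():
                flush(chunk)
                chunk = []
            else:
                chunk.append(word)
        flush(chunk)
    return candidates


def filter_terms(candidates, min_freq=1):
    """Statistical filtering reduced to a raw frequency threshold."""
    counts = Counter(candidates)
    return {term: n for term, n in counts.items() if n >= min_freq}


text = (
    "Interoperability of knowledge organization systems in the digital library. "
    "The knowledge organization systems are evaluated."
)
cands = candidate_terms(text)
print(cands)
print(filter_terms(cands, min_freq=2))
```

On a corpus of under a thousand records most candidates occur once or twice, which is the problem the slide raises about statistical filtering on small data.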

Analysis methodology (4)
- Terminology modeling
  * Term variant search
    * surface morpho-syntactic operations b/w terms
    * spelling variants (WordNet)
    * synonyms (USE/UF) (WordNet)
    * likely BT/NT candidates: syntactic information
    * likely RT: lexico-syntactic information
    * some errors and noise
    * but in automation you make a trade-off!

Analysis methodology (5)
Some term variants acquired
Paradigmatic organization (BT/NT):
– classification scheme
– universal classification scheme
– generic classification scheme
– knowledge classification scheme
Library of Congress – LC (USE/UF)
knowledge organisation scheme, knowledge organization tool (RT)
The system does not tag these relations as such
They are assumed to be implied by the variations
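
The kinds of surface variation behind these examples can be sketched with plain word-level comparisons. This is a simplified illustration under stated assumptions, not the TermWatch rule set: the function names, the variation labels and the crude "-isation" to "-ization" normalisation are all hypothetical, and real variant search would rely on morpho-syntactic analysis and WordNet.

```python
def normalize(term):
    """Lowercase, split into words, and unify one British/American spelling pattern."""
    return [w.replace("isation", "ization") for w in term.lower().split()]


def variation(a, b):
    """Name the surface operation that turns term a into term b, or None."""
    wa, wb = normalize(a), normalize(b)
    if wa == wb:
        return "spelling"  # same term after normalisation
    if len(wb) > len(wa) and wb[-len(wa):] == wa:
        return "left_expansion"  # modifier added: b is a likely narrower term
    if len(wa) == len(wb) and len(wa) > 1:
        diffs = [i for i in range(len(wa)) if wa[i] != wb[i]]
        if diffs == [len(wa) - 1]:
            return "head_substitution"  # same modifiers, different head: likely RT
        if len(diffs) == 1:
            return "modifier_substitution"  # same head, one modifier changed
    return None


pairs = [
    ("classification scheme", "universal classification scheme"),
    ("universal classification scheme", "generic classification scheme"),
    ("knowledge organisation scheme", "knowledge organization tool"),
    ("knowledge organisation", "knowledge organization"),
]
for a, b in pairs:
    print(a, "->", b, ":", variation(a, b))
```

As on the slide, the relation types in the comments are only implied by the variation; nothing here tags a pair as BT/NT or RT with certainty.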

Analysis methodology (6)
Assumptions behind terminology modeling
Consensus from studies on terminology/lexicography: new terms (denominations of concepts) are mostly created from existing terms
Rare creation of terms ex nihilo
Surface linguistic operations reveal semantic (conceptual?) relations between domain concepts
By studying these operations and visualising how they relate terms, we reveal the conceptual structure of a domain

Analysis methodology (7)
Clustering: 3-tier process
– 1st: group terms by close semantic relations
– 2nd: hierarchical clustering by lesser semantic relations (many iterations)
– 3rd: link cluster labels by co-occurrence of labels or that of their variants
Visualisation
– Thematic maps (Pajek)
– Navigation interface (browser)
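
The first and third tiers can be sketched in a few lines. This is an illustrative approximation only: tier 1 is reduced to connected components over variant links (via union-find), the iterative hierarchical tier 2 is omitted, and tier 3 counts how often two clusters have terms in the same record. All names and the toy data are assumptions, not the TermWatch implementation.

```python
from collections import defaultdict
from itertools import combinations


def connected_components(terms, links):
    """Tier 1 (simplified): group terms joined by close variation links."""
    parent = {t: t for t in terms}

    def find(t):
        # Follow parents to the root, compressing the path on the way.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in links:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for t in terms:
        groups[find(t)].add(t)
    return list(groups.values())


def cluster_cooccurrence(clusters, documents):
    """Tier 3 (simplified): count records in which two clusters both appear."""
    cluster_of = {t: i for i, c in enumerate(clusters) for t in c}
    counts = defaultdict(int)
    for doc_terms in documents:
        present = sorted({cluster_of[t] for t in doc_terms if t in cluster_of})
        for i, j in combinations(present, 2):
            counts[(i, j)] += 1
    return dict(counts)


# Toy data (hypothetical): terms, variant links, and terms found per record.
terms = [
    "classification scheme", "universal classification scheme",
    "thesaurus", "thesaurus construction", "metadata",
]
links = [
    ("classification scheme", "universal classification scheme"),
    ("thesaurus", "thesaurus construction"),
]
documents = [
    {"classification scheme", "thesaurus"},
    {"universal classification scheme", "thesaurus construction"},
    {"metadata"},
]

clusters = connected_components(terms, links)
edges = cluster_cooccurrence(clusters, documents)
print(clusters)
print(edges)
```

The resulting weighted edges between clusters are the kind of network that a tool such as Pajek can then lay out as a thematic map.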

Results (1)

Results (2)
Main topics for period 1
– Global structure: typical « core - peripheral » layout
– Knowledge is the structuring pole
– Subjects gravitating around the Knowledge pole: classification, analysis, online vocabulary control, standardization, bibliographic information system, indexing (automatic & manual), thesaurus construction and usage, information documentation system, translation

Results (3)
In the last decade:
Research network is much more intertwined
No one center but several « core » issues connected to one another
Major topics are intertwined: KO issues, classification, information theoretic, indexing language, user evaluation
Newer topics: web issues, metadata, knowledge discovery, computer algorithm,...

Results (4)
Equal divide b/w:
– theoretical research: information science, concept, classification theory, epistemological foundation,...
– user-oriented studies: user librarian, user-defined descriptor, user evaluation
– mainstream KO issues: classification, thesaurus, KO, term selection
– technology-oriented handling of KO issues: knowledge, system, transfer, knowledge representation, knowledge engineering, knowledge discovery, information processing, computer algorithm...; web, web designer, web document; information retrieval, terminology structuring, metadata, metadata quality

Discussion
Evaluation of clusters: information-theoretic problem. No solution. No gold standard
Goal of the method: precisely to propose a partition amongst the data
Is it the best one?
Reliance on external criteria: human (expert) evaluation
So a response from the community is needed!

Thank you for listening