How (Not) to Use a Semi-automated Clustering Tool
Kat Hagedorn, University of Michigan
April 11, 2006


Update on UM’s efforts
- Built three research portals
  - DLF
  - MODS
  - Aquifer
- Improvements for search / display
  - Integration of MODS format records
  - Simple vs. advanced searching
  - Inclusion of thumbnails

The need to cluster
- Want to offer more than search within a generic, large corpus of data
- How to partition the data?
- Emory’s MetaCombine tool promising as a topical clustering agent
- (Also interested in clustering by format, access restriction, OAI software used, etc.)

Clustering vs. classification
- Clustering is main focus
  - Huge amount of data
  - Needed a tool to “find the topic”
  - Preferably a disjunctive tool (placing files under more than one topic)
- Classification is secondary focus
  - Have potential classification (UM’s browse)
  - Marrying to current system nigh on impossible

Results: duration
- First tried with small repository of ~5500 records (amnh)
  - Took around 25 minutes
- Multiple tries with larger repository of ~270K records (dlps)
  - Took around 12 hours
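The two timings above can be turned into a rough throughput figure. The sketch below only does that arithmetic; the record counts and durations are the approximate ones from the slide, and the dictionary layout and function name are illustrative assumptions.

```python
# Throughput implied by the two runs reported above (all figures approximate).
runs = {
    "amnh": {"records": 5_500, "minutes": 25},
    "dlps": {"records": 270_000, "minutes": 12 * 60},
}


def records_per_minute(records: int, minutes: float) -> float:
    """Average number of records processed per minute of wall-clock time."""
    return records / minutes


rates = {name: records_per_minute(**run) for name, run in runs.items()}

for name, rate in rates.items():
    print(f"{name}: ~{rate:.0f} records/minute")
```

On these rough numbers the larger run did not process records more slowly than the small one, which suggests the 12 hours comes mostly from corpus size.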

Results: cluster names
- Examples of set names from clustering UM’s metadata
  - Good: “europe”, “mechanical”, “architecture”
  - Not so good: “general”, “michigan”, “build”
  - Favorite: “southern literari literature fine messenger”
- Granted…
  - Only asked for 20 clusters
  - Didn’t cluster hierarchically

Caveats
- Metadata will always be difficult to cluster
- Using a tool developed as a Web service, with obvious benefits
- Expect necessity of mapping set names to real topical cluster names
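The last caveat, mapping generated set names to real topical names, could be handled with a hand-curated lookup table. This is a minimal sketch under that assumption: the `LABELS` table, its right-hand labels, and `display_name` are hypothetical and are not UM's actual mapping.

```python
from typing import Optional

# Hypothetical lookup: generated set name -> curated topical label.
# The labels on the right are illustrative guesses, not UM's real names.
LABELS = {
    "europe": "Europe",
    "mechanical": "Mechanical Engineering",
    "architecture": "Architecture",
    "southern literari literature fine messenger": "Southern Literary Messenger",
}


def display_name(set_name: str) -> Optional[str]:
    """Return the curated label, or None if the set still needs human review."""
    return LABELS.get(set_name.strip().lower())


# Set names with no curated label are queued for a person to look at.
generated = ["europe", "general", "build", "architecture"]
needs_review = [name for name in generated if display_name(name) is None]
```

Vague names such as “general” or “build” stay unmapped here on purpose, so they surface for review rather than being shown to users.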

What we need
- Running the tool locally, with a local WSDL instance, would save lots (and lots) of time
- Better set names…does this mean a better algorithm?
- Ability to cluster by any criteria, not just topic, i.e., a post-processing module
- Disjunctive clustering by filename rather than by file, so as not to hog storage
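The last two wishes, clustering by any criterion and disjunctive clustering of filenames rather than files, can be illustrated together. This is a sketch only: the sample records, field names, and `cluster_filenames` are invented for illustration and say nothing about how MetaCombine works internally.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set

Record = Dict[str, object]

# Hypothetical records: a filename plus fields a post-processing step might group on.
records: List[Record] = [
    {"filename": "rec001.xml", "topics": ["europe", "architecture"], "format": "image"},
    {"filename": "rec002.xml", "topics": ["mechanical"], "format": "text"},
    {"filename": "rec003.xml", "topics": ["architecture"], "format": "text"},
]


def cluster_filenames(
    records: Iterable[Record],
    keys_for: Callable[[Record], Iterable[str]],
) -> Dict[str, Set[str]]:
    """Group filenames under every key a record maps to.

    Only the filename is stored per cluster, so a record that belongs to
    several clusters (disjunctive clustering) is never copied.
    """
    clusters: Dict[str, Set[str]] = defaultdict(set)
    for rec in records:
        for key in keys_for(rec):
            clusters[key].add(str(rec["filename"]))
    return dict(clusters)


# The same function clusters by topic or by any other criterion.
by_topic = cluster_filenames(records, lambda r: r["topics"])
by_format = cluster_filenames(records, lambda r: [r["format"]])
```

Here `rec001.xml` appears under both “europe” and “architecture”, while the record itself is stored once.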