
Presentation transcript:

Art Images Online: Leveraging Social Tagging and Language for Browsing
Judith Klavans 1, Jen Golbeck 1, Susan Chun 2, Rob Stein 3, Ed Bachta 3, Irene Eleta 1, Raul Guerra 1, Rebecca LaPlante 1
1 University of Maryland
2 Independent Museum Consultant
3 Indianapolis Museum of Art

High Level Goals
Images in Museums and Libraries
Words… words… words
– Traditional cataloging
– Handbook and other descriptive text
– Social tagging

Record from American Institute for College Teaching
Minimal metadata for image, no descriptive terms.

Nefertiti
Gardner (v. 11, pl. 3-33)
The famous painted limestone bust of Akhenaton’s queen, Nefertiti (fig. 3-33), exhibits a similar expression of entranced musing and an almost mannered sensitivity and delicacy of curving contour. The piece was found in the workshop of the queen’s official sculptor, Thutmose, and is a deliberately unfinished model very likely by the master’s own hand. The left eye socket still lacks the inlaid eyeball, making the portrait a kind of before-and-after demonstration piece. With this elegant bust, Thutmose may have been alluding to a heavy flower on its slender stalk by exaggerating the weight of the crowned head and the length of the almost serpentine neck…

Excerpt of descriptive text from Gardner (v. 11, pl ), suggested CLiMB terms highlighted in yellow
The famous painted limestone bust of Akhenaton’s queen, Nefertiti (fig. 3-33), exhibits a similar expression of entranced musing and an almost mannered sensitivity and delicacy of curving contour. The piece was found in the workshop of the queen’s official sculptor, Thutmose, and is a deliberately unfinished model very likely by the master’s own hand. The left eye socket still lacks the inlaid eyeball, making the portrait a kind of before-and-after demonstration piece. With this elegant bust, Thutmose may have been alluding to a heavy flower on its slender stalk by exaggerating the weight of the crowned head and the length of the almost serpentine neck…

User Tags
– Woman
– Hat
– One blind eye
– Beautiful features
– Long neck
– Beautiful woman
– Blue
– Yellow
– Blue hat
– Graceful eyes
– Ceramic statue of an elegant woman
– Half an ear
– I’d like this in my living room

Many Kinds of Words…
Terms informed by art historical criteria:
– deliberately unfinished model
Ability to find related images
– elongated neck
– bust
Potential for using thesaural resources
– painted limestone bust

CLiMB: Computational Linguistics for Metadata Building Columbia University University of Maryland – UMIACS and Computational Linguistics and Information Processing (CLIP) Lab


STEVE.MUSEUM
18 museum partners
Over 90,000 tags
Nearly 1800 images tagged
But tags are “unruly” and “chaotic”

T3: Tags, Terms, and Trust
What computational linguistic techniques can be used to put all these words to use?
What is the nature of the tags?
What tools can be helpful to the museum and library user communities?
How does tagging in different languages compare?

T3 Research Group
Judith Klavans, Jen Golbeck, Irene Eleta, Raul Guerra, Rebecca LaPlante
Museum Partners: Rob Stein, Ed Bachta, Susan Chun … and 18 museums

Funding
Mellon Foundation
– CLiMB-1 (Columbia Univ)
– CLiMB-2 (Univ of Maryland, UMIACS-CLIP Lab)
IMLS
– Steve.museum (Indianapolis Museum of Art)
– T3 – IMA and University of Maryland
National Science Foundation

Major Contribution – Creating Order over Chaos
Developed techniques for processing social tags
– What tags are related to other tags?
– How are tags related (or not)?
– What is the impact of cultural and language differences in tagging?
Completed user analysis of tagging behavior
Explored the value of tags compared with descriptive text

Highlight Two Specific Areas
The value and use of
– Computational linguistic analysis for tags
– Multilingual social tags

Computational Processing Pipeline
Input tags: Woman, Hat, One blind eye, Beautiful features, Long neck, Beautiful woman, Blue, Yellow, Blue hat, Graceful eyes, Ceramic statue of an elegant woman, Half an ear
Analysis
– Graceful – [Adjective] {graceful}
– Eyes – [Noun-Plural] {eye}
– One – [Number]
– Blind – [Adjective] {blind}
– Eye – [Noun-Singular] {eye}
Word Count
– Woman – 3
– Eye – 2
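
The pipeline on this slide (tokenize each tag, label each word, reduce it to a lemma, then count lemmas) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `LEXICON` table, `analyze`, and `lemma_counts` are hypothetical names invented here, and the hand-written lexicon stands in for whatever tagger and lemmatizer the actual pipeline used.

```python
from collections import Counter

# Hypothetical mini-lexicon: surface form -> (part of speech, lemma).
# A real pipeline would use a trained tagger and a lemmatizer instead.
LEXICON = {
    "graceful": ("Adjective", "graceful"),
    "eyes": ("Noun-Plural", "eye"),
    "one": ("Number", "one"),
    "blind": ("Adjective", "blind"),
    "eye": ("Noun-Singular", "eye"),
    "woman": ("Noun-Singular", "woman"),
}

def analyze(tag):
    """Split a tag on whitespace and attach (token, pos, lemma) to each token."""
    result = []
    for token in tag.lower().split():
        # Words missing from the lexicon keep their surface form as the lemma.
        pos, lemma = LEXICON.get(token, ("Unknown", token))
        result.append((token, pos, lemma))
    return result

def lemma_counts(tags):
    """Count how often each lemma occurs across all tags."""
    counts = Counter()
    for tag in tags:
        for _, _, lemma in analyze(tag):
            counts[lemma] += 1
    return counts

TAGS = [
    "Woman", "Hat", "One blind eye", "Beautiful features", "Long neck",
    "Beautiful woman", "Blue", "Yellow", "Blue hat", "Graceful eyes",
    "Ceramic statue of an elegant woman", "Half an ear",
]

counts = lemma_counts(TAGS)
print(counts["woman"], counts["eye"])
```

Counting lemmas rather than surface forms is what lets "eye" and "eyes" contribute to the same total, which matches the word counts shown on the slide.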

What is a tag?
Important to treat tags related to the same topic as related, e.g. Line, line, lines, lining
How many “tags”? Four, three, or one?
Stemmer
– Leaves > leav
– Puppies > puppi
– Babies > babi
Lemmatizer
– Leaves > leave
– Puppies > puppy
– Babies > baby

What is a tag? Comparing Stemmers and Lemmatizers
Stemmers: reduce words to roots.
– Advantage: fast with big data
– Disadvantage: stems are not necessarily words
Lemmatizers: reduce words to ‘dictionary’ form.
– Advantage: lemmas are more useful
– Disadvantage: slower because they rely on external resources like WordNet
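
The contrast between the two approaches can be illustrated with a deliberately tiny sketch. Neither function below is the stemmer or lemmatizer used in the project: `toy_stem` is a crude suffix stripper, `toy_lemmatize` is a rule-based stand-in, and the `EXCEPTIONS` dictionary is a hypothetical placeholder for the external resource (such as WordNet) that a real lemmatizer would consult.

```python
# Hypothetical exception list standing in for an external dictionary resource.
EXCEPTIONS = {"women": "woman", "feet": "foot"}

def toy_stem(word):
    """Crude suffix stripping; the result is not necessarily a real word."""
    w = word.lower()
    if w.endswith("ies"):
        return w[:-3] + "i"
    if w.endswith("es"):
        return w[:-2]
    if w.endswith("s"):
        return w[:-1]
    return w

def toy_lemmatize(word):
    """Map a word to a dictionary-like form using lookups plus simple rules."""
    w = word.lower()
    if w in EXCEPTIONS:  # irregular forms need a dictionary lookup
        return EXCEPTIONS[w]
    if w.endswith("ies"):
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w

for word in ["Leaves", "Puppies", "Babies"]:
    print(word, toy_stem(word), toy_lemmatize(word))
```

On the three slide examples this toy stemmer yields non-words (leav, puppi, babi) while the toy lemmatizer yields dictionary forms (leave, puppy, baby). The dictionary lookup is the part that makes real lemmatizers slower than stemmers.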

How well do we do?
Lemmatizer performance checked against a Lemma Gold Standard of 850 user tags:
Default configuration of the pipeline
– 64% accuracy on all tags
– 68% accuracy on all correctly spelled tags
Fine-grained configuration of the pipeline
– 76.47% accuracy on all tags
– 81.35% accuracy on all correctly spelled tags
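
Accuracy against a gold standard is simply the share of tags whose predicted lemma equals the gold lemma. The sketch below shows that calculation on four invented items; the `accuracy` function and the sample data are illustrative assumptions, not the project's evaluation code or data.

```python
def accuracy(predicted, gold):
    """Fraction of positions where the predicted lemma equals the gold lemma."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("gold standard must not be empty")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

# Invented example: one of four lemmas disagrees with the gold standard.
gold = ["eye", "woman", "leaf", "baby"]
predicted = ["eye", "woman", "leave", "baby"]

score = accuracy(predicted, gold)
print(f"{score:.2%}")
```

As an arithmetic check only: 650 correct lemmas out of 850 would give 76.47%, and 544 out of 850 would give 64%. The slides report percentages, not raw counts, so these counts are an inference.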

Part-of-Speech Labeling
“graceful eyes” – [[Adjective] [Noun]]
“blue” – [Adjective]
“face” – [Noun], [Verb]

WORD LEVEL

Part-of-Speech Labeling
Labeling tags is harder than annotating running text because tags lack context.
The Stanford Tagger is 97.28% accurate on WSJ text (90.46% on unknown words).
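
Why missing context makes labeling harder can be seen in a small sketch: a tag such as "face" has more than one possible part of speech, and a one-word tag offers no neighbors to disambiguate it. The `POS_LEXICON` table and both functions are hypothetical and cover only the slide examples; they are not the Stanford Tagger or the project's code.

```python
# Hypothetical lexicon: word -> set of parts of speech it can take.
POS_LEXICON = {
    "graceful": {"Adjective"},
    "eyes": {"Noun"},
    "blue": {"Adjective"},
    "face": {"Noun", "Verb"},
}

def possible_pos(tag):
    """Return the candidate parts of speech for each word in a tag."""
    return [sorted(POS_LEXICON.get(tok, {"Unknown"})) for tok in tag.lower().split()]

def is_ambiguous(tag):
    """A tag is ambiguous if any of its words has more than one candidate label."""
    return any(len(candidates) > 1 for candidates in possible_pos(tag))

for tag in ["graceful eyes", "blue", "face"]:
    print(tag, possible_pos(tag), is_ambiguous(tag))
```

In running text a tagger can use the surrounding words to choose between Noun and Verb for "face"; with an isolated tag it has to fall back on priors or report both labels.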


Handbook Text Chunking
As a result of the way users tag images,
– FEW images have LOTS of tags
– LOTS of images have FEW tags
Can we add tag-like phrases from text?
– Two types of phrases
Names of things (Named Entities)
Nouns and all the words that go with them (Noun Phrases)
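
A noun-phrase chunker of the kind described here can be approximated by collecting runs of adjectives and nouns that end in a noun. The sketch below assumes the words have already been part-of-speech tagged; the simplified tag names (`DET`, `ADJ`, `NOUN`, `PREP`), the hand-tagged input, and the `noun_phrases` function are all illustrative assumptions rather than the chunker used in the project.

```python
def noun_phrases(tagged):
    """Collect maximal runs of ADJ/NOUN words, trimmed so each ends in a NOUN.

    `tagged` is a list of (word, tag) pairs using simplified tags.
    """
    phrases = []
    current = []

    def flush():
        # Drop trailing adjectives so the phrase ends with its head noun.
        while current and current[-1][1] != "NOUN":
            current.pop()
        if current:
            phrases.append(" ".join(word for word, _ in current))
        current.clear()

    for word, tag in tagged:
        if tag in ("ADJ", "NOUN"):
            current.append((word, tag))
        else:
            flush()  # any other tag closes the current run
    flush()
    return phrases

# Hand-tagged fragment of the handbook text (tags assigned manually here).
sentence = [
    ("the", "DET"), ("famous", "ADJ"), ("painted", "ADJ"),
    ("limestone", "NOUN"), ("bust", "NOUN"), ("of", "PREP"),
    ("the", "DET"), ("queen", "NOUN"),
]

print(noun_phrases(sentence))
```

Phrases extracted this way, such as "famous painted limestone bust", are the tag-like candidates that could be attached to images with few user tags.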

Results
Precision for Named Entities is 46.39% for full matches and 69.24% for partial matches.
– Columbia University in the City of New York
– Columbia University
Precision for Noun Phrases is 79.95% for full matches and 93.03% for partial matches.
– Famous painted limestone bust
– Limestone bust
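
The difference between full-match and partial-match precision can be made concrete with a sketch. The slides do not define "partial match", so the definition used here (one phrase contains the other as a whole-word sequence) is an assumption, as are the `match_precision` function and the invented sample phrases.

```python
def match_precision(extracted, reference):
    """Return (full_precision, partial_precision) for extracted phrases.

    Full match: the phrase equals a reference phrase (case-insensitive).
    Partial match (assumed definition): the phrase contains, or is contained
    in, a reference phrase as a whole-word sequence. Full matches also count
    as partial matches.
    """
    if not extracted:
        raise ValueError("no extracted phrases to score")
    ref = [" ".join(r.lower().split()) for r in reference]
    full = partial = 0
    for phrase in extracted:
        p = " ".join(phrase.lower().split())
        if p in ref:
            full += 1
            partial += 1
        elif any(f" {p} " in f" {r} " or f" {r} " in f" {p} " for r in ref):
            partial += 1
    return full / len(extracted), partial / len(extracted)

# Invented example data, loosely based on the slide's phrases.
reference = [
    "Columbia University in the City of New York",
    "famous painted limestone bust",
    "slender stalk",
]
extracted = ["Columbia University", "limestone bust", "slender stalk", "the left"]

full_p, partial_p = match_precision(extracted, reference)
print(full_p, partial_p)
```

Under this definition partial precision can never be lower than full precision, which agrees with the direction of the figures reported on the slide.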

Conclusions
Stemmers and lemmatizers are the first step in removing variation from the tags.
Part-of-speech tagging of social tags is useful for understanding the tags better.
Handbook text analysis helps to add tags for poorly annotated images.
The pipeline adds information to the user tags that is useful for applications using them.

Multilingual Social Tagging of Art Images
Cultural Bridges and Diversity
Image: Ia Orana Maria, by Gauguin (Metropolitan Museum of Art)
Example tags: Polynesian women baby jungle Polinesia Virgen roja Jesús

Questions
How similar are the tags provided by different language communities when tagging art images?
Does it depend on the type of painting?
Do tags reflect cultural differences in the interpretations of the paintings?

Contributions of this Study
Expands T3 to include Spanish
Identifies tags that could be used for multilingual search
Identifies tags that could be used for cross-cultural discovery and understanding
Proposes the separation of tagging environments by language

Data Collection
In numbers…
– 33 paintings
– 773 English tags
– 566 Spanish tags
– Spanish participants and American participants
– 5 questions
– 10 images
Image: Genesis First Version by Lorser Feitelson (San Francisco MOMA)

Processing and Comparing Tags
– Tag tokenization and normalization
– Creation of lexical correspondences
Image: The Cotton Pickers by Winslow Homer (Los Angeles County Museum of Art)
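
The two steps named on this slide, normalizing tags and pairing them across languages, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `normalize` and `translation_pairs` functions, the small `ES_TO_EN` dictionary, and the sample tags are invented, and the study's actual lexical correspondences may have been built differently.

```python
import unicodedata

def normalize(tag):
    """Unicode-normalize, lowercase, trim, and collapse internal whitespace."""
    return " ".join(unicodedata.normalize("NFC", tag).lower().split())

# Hypothetical Spanish -> English lexical correspondences.
ES_TO_EN = {
    "mujeres": "women",
    "bebé": "baby",
    "selva": "jungle",
    "virgen": "virgin",
    "roja": "red",
}

def translation_pairs(english_tags, spanish_tags):
    """Return sorted (spanish, english) pairs where both sides were actually used."""
    english = {normalize(t) for t in english_tags}
    pairs = set()
    for tag in spanish_tags:
        es = normalize(tag)
        target = ES_TO_EN.get(es)
        if target in english:
            pairs.add((es, target))
    return sorted(pairs)

# Invented tag lists for one painting.
english_tags = ["Women", "baby ", "Jungle", "red"]
spanish_tags = ["mujeres", "Bebé", "Virgen", "roja"]

pairs = translation_pairs(english_tags, spanish_tags)
print(pairs)
```

Tags that find a counterpart (here "roja" and "red") are candidates for multilingual search, while unpaired tags such as "Virgen" are the ones that may point to a different cultural reading of the painting.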

Semantic Analysis
LaPlante, R., Klavans, J., Golbeck, J. (under review). Subject Matter Categorization of Tags Applied to Digital Images from Art Museums.

Results
Many exact translations
Image: The Cotton Pickers by Winslow Homer (LACMA)

Translation Pairs by Type of Painting

Differences
Image: The Cotton Pickers by Winslow Homer (LACMA)

Translation pairs > multilingual search
– “general person or thing” in realistic paintings
– “visual elements” in abstract paintings
Different perspectives > design for discovery
– “emotions and abstract ideas”

Publications
Computational techniques to examine tags and phrases of importance for browsing art image collections.
Klavans, Judith, Guerra, Raul, LaPlante, Rebecca, Stein, Rob & Bachta, Ed (2011). Beyond Flickr: Not all image tagging is created equal. In AAAI 2011 Workshop: Language-Action Tools for Cognitive Artificial Agents, San Francisco, CA.
Klavans, Judith, Stein, Rob, Chun, Susan, & Guerra, Raul (2011). Computational linguistics in museums: Applications for cultural datasets. Museums and the Web 2011, Philadelphia, PA.

Publications
Comparing social tagging patterns in two languages to inform design for multilingual access to art images.
Eleta, Irene & Golbeck, Jennifer (2012, to appear). A Study of Multilingual Social Tagging of Art Images: Cultural Bridges and Diversity. ACM Conference on Computer Supported Cooperative Work (CSCW 2012), Seattle, Washington.

Thank you!
Publications:
Tools available through Steve in Action: