
1 SEASR Applications National Center for Supercomputing Applications University of Illinois at Urbana-Champaign

2 Outline –Audio Analysis with NEMA –Text Analysis with MONK –Emotion Tracking –Hands-On

3 Defining Music Information Retrieval Music Information Retrieval (MIR) is the process of searching for, and finding, music objects, or parts of music objects, via a query framed musically and/or in musical terms. Music objects: scores, parts, recordings (WAV, MP3, etc.) Musically framed query: singing, humming, keyboard, notation, MIDI file, sound file, etc. Musical terms: genre, style, tempo, etc.

4 NEMA Networked Environment for Music Analysis –UIUC, McGill (CA), Goldsmiths (UK), Queen Mary (UK), Southampton (UK), Waikato (NZ) –Multiple geographically distributed locations with access to different audio collections –Distributed computation to extract a set of features and/or build and apply models

5 SEASR @ Work – NEMA Executes a SEASR flow for each run –Loads audio data –Extracts features from every 10-second moving window of audio –Loads models –Applies the models –Sends results back to the WebUI
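As a rough illustration of the windowing step, a minimal Python sketch, assuming 16 kHz mono audio already loaded into a NumPy array; the real flow runs SEASR components with much richer audio features, so the RMS energy below is only a stand-in.

import numpy as np

def windowed_features(samples, sr=16000, window_s=10, hop_s=5):
    """Slide a 10-second window over the audio and compute a
    placeholder feature (RMS energy) per window."""
    win, hop = window_s * sr, hop_s * sr
    feats = []
    for start in range(0, len(samples) - win + 1, hop):
        frame = samples[start:start + win]
        feats.append(np.sqrt(np.mean(frame ** 2)))  # RMS energy
    return np.array(feats)

# Example: one minute of synthetic audio -> 11 overlapping windows
audio = np.random.randn(60 * 16000).astype(np.float32)
print(windowed_features(audio).shape)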

6 NEMA Flow – Blinkie

7 NEMA Vision Researchers at Lab A can easily build a virtual collection from Library B and Lab C, acquire the necessary ground truth from Lab D, incorporate a feature extractor from Lab E, combine the extracted features with those provided by Lab F, build a set of models based on a pair of classifiers from Labs G and H, and validate the results against another virtual collection taken from Lab I and Library J. Once completed, the results and newly created feature sets would, in turn, be made available for others to build upon.

8 Do It Yourself (DIY) 1

9 DIY Options

10 DIY Job List

11 DIY Job View

12 Nester: Cardinal Annotation Audio tagging environment. Green boxes indicate a tag by a researcher. Given these tags, automated pattern-learning approaches are applied to find untagged occurrences of the pattern.

13 Nester: Cardinal Catalog View

14 Examining Audio Collection Tagged a set of examples as Male and Female

15 SEASR @ Work: MONK MONK: a case study –Texts as data –Texts from multiple sources –Texts reprocessed into a new representation –Different tools using the same data Slides from John Unsworth, "Tools for Textual Data", May 20, 2009

16 MONK Project MONK provides: –1400 works of literature in English from the 16th to 19th centuries = 108 million words, POS-tagged, TEI-tagged, in a MySQL database –Several different open-source interfaces for working with this data –A public API to the datastore –SEASR under the hood, for analytics Slides from John Unsworth, "Tools for Textual Data", May 20, 2009

17 MONK "A word token is the spelling or surface form of a word. MONK performs a variety of operations that supply each token with additional 'metadata'. –Take something like 'hee louyd hir depely'. –This comes to exist in the MONK textbase as something like hee_pns31_he louyd_vvd_love hir_pno31_she depely_av-j_deep –Because the textbase 'knows' that the surface 'louyd' is the past tense of the verb 'love', the individual token can be seen as an instance of several types: the spelling, the part of speech, and the lemma or dictionary entry form of a word." (Martin Mueller) Slides from John Unsworth, "Tools for Textual Data", May 20, 2009
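A small sketch of reading adorned tokens of the spelling_pos_lemma form shown above; the field names come from the slide's example, not from the actual MONK textbase schema.

def parse_adorned(text):
    """Split MONK-style adorned tokens into spelling, POS, lemma."""
    tokens = []
    for item in text.split():
        spelling, pos, lemma = item.split("_")
        tokens.append({"spelling": spelling, "pos": pos, "lemma": lemma})
    return tokens

adorned = "hee_pns31_he louyd_vvd_love hir_pno31_she depely_av-j_deep"
for tok in parse_adorned(adorned):
    print(tok)
# {'spelling': 'hee', 'pos': 'pns31', 'lemma': 'he'} ...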

18 Text Data Texts represent language, which changes over time (spellings). Comparison of texts as data requires some normalization (lemma). Counting as a means of comparison requires units to count (tokens). Treating texts as data will usually entail a new representation of those texts, to make them comparable and to make their features countable. Slides from John Unsworth, "Tools for Textual Data", May 20, 2009

19 Text from Multiple Sources Five aphorisms about textual data (causing tool-builders to weep): –Scholars are interested in texts first, data second –Tools are only useful if they can be applied to texts that are of interest –No single collection has all texts –No two collections will be identical in format –No one collection will be internally consistent in format Slides from John Unsworth, "Tools for Textual Data", May 20, 2009

20 Public MONK Texts Documenting the American South from UNC-Chapel Hill –(1.5 Gb, 8.5 M words) Early American Fiction from the University of Virginia –(930 Mb, 5.2 M words) Wright American Fiction from Indiana University –(4 Gb, 23 M words) Shakespeare from Northwestern University –(170 Mb, 850 K words) About 7 Gigabytes, 38 M words Slides from John Unsworth, "Tools for Textual Data", May 20, 2009

21 Restricted MONK Texts Eighteenth-Century Collections Online (ECCO) from the Text Creation Partnership –(6 Gb, 34 M words) Early English Books Online (EEBO) from the Text Creation Partnership –(7 Gb, 39 M words) Nineteenth-Century Fiction (NCF) from Chadwyck-Healey –(7 Gb, 39 M words) About 20 Gb, 112 M words Slides from John Unsworth, "Tools for Textual Data", May 20, 2009

22 MONK Ingest Process Texts reprocessed into a new representation. TEI source files (from various collections, with various idiosyncrasies) go through Abbot, a series of XSL routines that transform the input format into TEI-Analytics (TEI-A for short), with some curatorial interaction. "Unadorned" TEI-A files go through Morphadorner, a trainable part-of-speech tagger that tokenizes the texts into sentences, words, and punctuation, assigns ids to the words and punctuation marks, and adorns the words with morphological tagging data (lemma, part of speech, and standard spelling). Slides from John Unsworth, "Tools for Textual Data", May 20, 2009

23 MONK Ingest Process Adorned TEI-A files go through Acolyte, a script that adds curator-prepared bibliographic data. Bibadorned files are processed by Prior, using a pair of files defining the parts of speech and word classes, to produce tab-delimited text files in MySQL import format, one file for each table in the MySQL database. cdb.csh creates a MONK MySQL database and imports the tab-delimited text files. Slides from John Unsworth, "Tools for Textual Data", May 20, 2009
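To make the last step concrete, a minimal sketch of what Prior-style output might look like: adorned tokens flattened into a tab-delimited file ready for MySQL import. The column names and layout are illustrative assumptions, not the actual MONK schema.

import csv, io

adorned = [("hee", "pns31", "he"), ("louyd", "vvd", "love"),
           ("hir", "pno31", "she"), ("depely", "av-j", "deep")]

out = io.StringIO()
writer = csv.writer(out, delimiter="\t")
writer.writerow(["token_id", "spelling", "pos", "lemma"])
for i, (spelling, pos, lemma) in enumerate(adorned, start=1):
    writer.writerow([i, spelling, pos, lemma])
print(out.getvalue())
# A file like this could then be pulled in with MySQL's LOAD DATA INFILE.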

24 MONK Tools MONK Datastore, Flamenco Faceted Browsing, MONK extension for Zotero, TeksTale Clustering and Word Clouds, FeatureLens, SEASR, The MONK Workbench (Public), The MONK Workbench (Restricted) Slides from John Unsworth, "Tools for Textual Data", May 20, 2009

25 SEASR @ Work – MONK Workbench Executes flows for each analysis requested –Predictive modeling using Naïve Bayes –Predictive modeling using Support Vector Machines (SVM) –Feature comparison (Dunning log-likelihood)

26 Feature Lens "The discussion of the children introduces each of the short internal narratives. This champions the view that her method of repetition was patterned: controlled, intended, and a measured means to an end. It would have been impossible to discern through traditional reading."

27 Dunning Log-likelihood Tag Cloud Words that are under-represented in writings by Victorian women as compared to Victorian men. Results are loaded into Wordle for the tag cloud. —Sara Steger
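The comparison behind this tag cloud relies on Dunning's log-likelihood statistic. A minimal sketch of the G2 computation with made-up counts; the actual MONK flow computes this inside SEASR components.

import math

def dunning_g2(a, b, n1, n2):
    """G2 for a word with counts a, b in corpora of sizes n1, n2."""
    e1 = n1 * (a + b) / (n1 + n2)   # expected count in corpus 1
    e2 = n2 * (a + b) / (n1 + n2)   # expected count in corpus 2
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2

# Toy example: a word appearing 30 times in 1M words by women
# and 120 times in 1M words by men scores as strongly skewed.
print(round(dunning_g2(30, 120, 1_000_000, 1_000_000), 2))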

28 SEASR @ Work – Emotion Tracking The goal is to have this type of visualization to track emotions across a text document (leveraging flare.prefuse.org)

29 UIMA Structured data Two SEASR examples using UIMA POS data –Frequent patterns (association rules) of nouns (FP-Growth) –Sentiment analysis of adjectives

30 UIMA Unstructured Information Management Applications

31 UIMA + P.O.S. tagging Analysis engines analyze a document to record part-of-speech information: OpenNLP Tokenizer, OpenNLP PosTagger, OpenNLP SentenceDetector, POSWriter (serialization of the UIMA CAS)
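The slide's pipeline is UIMA analysis engines wrapping OpenNLP in Java. As a rough, swapped-in Python analogue (not the actual UIMA API), a sketch using NLTK's tokenizer and tagger; resource names may vary across NLTK versions.

import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

text = "Tom appeared on the sidewalk with a bucket of whitewash."
tokens = nltk.word_tokenize(text)   # roughly the Tokenizer step
tagged = nltk.pos_tag(tokens)       # roughly the PosTagger step
adjectives = [w for w, pos in tagged if pos.startswith("JJ")]
print(tagged)
print(adjectives)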

32 UIMA to SEASR: Experiment I Finding patterns

33 SEASR + UIMA: Frequent Patterns Frequent pattern analysis on nouns Goal: –Discover a cast of characters within the text –Discover nouns that frequently occur together (character relationships); see the sketch below
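A minimal sketch of the co-occurrence idea: the real flow uses FP-Growth, but plain pair counting over a paragraph window is shown here for brevity, with toy data standing in for UIMA POS output.

from collections import Counter
from itertools import combinations

# nouns per paragraph (toy data)
paragraphs = [["tom", "aunt", "fence"], ["tom", "huck"],
              ["huck", "river"], ["tom", "huck", "river"]]

window = 2          # the slide's analysis uses a 10-paragraph window
min_support = 0.5   # the slide's analysis uses 10%

pair_counts = Counter()
n_windows = len(paragraphs) - window + 1
for i in range(n_windows):
    nouns = set().union(*paragraphs[i:i + window])
    pair_counts.update(combinations(sorted(nouns), 2))

# report noun pairs whose support clears the threshold
for pair, count in pair_counts.items():
    if count / n_windows >= min_support:
        print(pair, count)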

34 Frequent Patterns: Visualization Analysis of Tom Sawyer, 10-paragraph window, support set to 10%

35 UIMA to SEASR: Experiment II Sentiment Analysis

36 UIMA + SEASR: Sentiment Analysis Classifying text based on its sentiment –Determining the attitude of a speaker or a writer –Determining whether a review is positive/negative Ask: What emotion is being conveyed within a body of text? –Look at only adjectives (UIMA POS) lots of issues and challenges Need to Answer: –What emotions to track? –How to measure/classify an adjective to one of the selected emotions? –How to visualize the results?

37 Sentiment Analysis: Emotion Selection Which emotions? –http://en.wikipedia.org/wiki/List_of_emotions –http://changingminds.org/explanations/emotions/basic%20emotions.htm –http://www.emotionalcompetency.com/recognizing.htm Parrott's classification (2001) –six core emotions –Love, Joy, Surprise, Anger, Sadness, Fear

38 Sentiment Analysis: Emotions

39 Sentiment Analysis: Using Adjectives How to classify adjectives: –Lots of metrics we could use… –Lists of adjectives already classified: http://www.derose.net/steve/resources/emotionwords/ewords.html –Need a "nearness" metric for missing adjectives –How about the thesaurus game?

40 SEASR: Sentiment Analysis Using only a thesaurus, find a path between two words –no antonyms –no colloquialisms or slang

41 SEASR: Sentiment Analysis For example, how would you get from delightful to rainy? (answer coming soon, unless you find it first)

42 SEASR: Sentiment Analysis How to get from delightful to rainy? ['delightful', 'fair', 'balmy', 'moist', 'rainy'] And sexy to joyless? ['sexy', 'provocative', 'blue', 'joyless'] Bitter to lovable? ['bitter', 'acerbic', 'tangy', 'sweet', 'lovable']

43 SEASR: Sentiment Analysis Use this game as a metric for measuring how close a given adjective is to each of the six emotions. Assume the longer the path, the "farther away" the two words are.

44 SEASR: Sentiment Analysis Introducing SynNet: a traversable graph of synonyms (adjectives)

45 Thesaurus Network (SynNet) Using thesaurus.com, create a link between every term and its synonyms, yielding a large network. Then determine a metric for assigning each adjective to one of our selected terms –Is there a path? –How to evaluate the best paths?
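A minimal sketch of the "thesaurus game" on a SynNet-style graph, assuming a hand-built adjacency list rather than real thesaurus.com data; breadth-first search finds the shortest synonym path.

from collections import deque

synnet = {
    "delightful": ["fair", "pleasant"],
    "fair":       ["delightful", "balmy"],
    "balmy":      ["fair", "moist"],
    "moist":      ["balmy", "rainy"],
    "rainy":      ["moist"],
    "pleasant":   ["delightful"],
}

def shortest_path(graph, start, goal):
    """BFS over synonym links; returns the word path or None."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_path(synnet, "delightful", "rainy"))
# ['delightful', 'fair', 'balmy', 'moist', 'rainy']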

46 SynNet: rainy to pleasant

47 SynNet Metrics Path length Number of paths Common nodes Symmetric: a → b and b → a Unique nodes in all paths

48 SynNet Metrics: Path Length Rainy to Pleasant –Shortest path length is 4 (blue): Rainy, Moist, Watery, Bland, Pleasant –The green path has length 3 but is not reachable via symmetry –Blue nodes are two hops away

49 SynNet Metrics: Common Nodes Common Nodes –depth of common nodes Example –Top shows happy –Bottom shows delightful –Common nodes shown in center cluster

50 SynNet Metrics: Symmetry Symmetry of path in common nodes

51 SynNet: Sentiment Analysis Step 1: list your sentiments/concepts –joy, sad, anger, surprise, love, fear Step 2: for each concept, list adjectives –joy: joyful, happy, hopeful –surprise: surprising, amazing, wonderful, unbelievable Step 3: for each adjective in the text, calculate all the paths to each adjective in Step 2 Step 4: pick the best adjective (using the metrics); see the sketch below
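A sketch of Steps 3 and 4 under the same toy-graph assumption as above: score an adjective against each concept by its shortest synonym path to any of that concept's seed adjectives, then pick the concept with the shortest path.

from collections import deque

def path_len(graph, start, goal):
    """BFS shortest-path length (number of nodes), or None."""
    queue, seen = deque([(start, 1)]), {start}
    while queue:
        word, n = queue.popleft()
        if word == goal:
            return n
        for nxt in graph.get(word, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, n + 1))
    return None

concepts = {
    "joy":      ["joyful", "happy", "hopeful"],
    "surprise": ["surprising", "amazing", "wonderful", "unbelievable"],
}

def classify(graph, adjective):
    scores = {}
    for concept, seeds in concepts.items():
        lengths = [path_len(graph, adjective, s) for s in seeds]
        lengths = [n for n in lengths if n is not None]
        if lengths:
            scores[concept] = min(lengths)  # shortest path wins
    return min(scores, key=scores.get) if scores else None

toy = {"incredible": ["amazing"], "amazing": ["incredible", "wonderful"],
       "wonderful": ["amazing"]}
print(classify(toy, "incredible"))  # -> 'surprise'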

52 SynNet: Sentiment Analysis Example: the adjective to score is incredible

53 SynNet: Sentiment Analysis Incredible to loving (concept: love) Blue paths are symmetric paths

54 SynNet: Sentiment Analysis Incredible to surprising (concept: surprise) Blue paths are symmetric paths

55 SynNet: Sentiment Analysis Incredible to joyful (concept: joy)

56 SynNet: Sentiment Analysis Incredible to joyless (concept: sad)

57 SynNet: Sentiment Analysis Incredible to fearful (concept: fear) Winner!

58 SynNet: Sentiment Analysis Try it yourself: http://services.seasr.org/synnet –/synnet/path/white/afraid –/synnet/path/white/afraid?format=xml –/synnet/path/white/afraid?format=json –/synnet/path/white/afraid?format=flash –Database contains only adjectives –More API endpoints and visualizations coming soon
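A hedged sketch of calling the service from Python; the endpoint and format parameter come straight from the slide, but the service may no longer be online, so treat this as illustrative.

import requests

resp = requests.get("http://services.seasr.org/synnet/path/white/afraid",
                    params={"format": "json"}, timeout=10)
if resp.ok:
    print(resp.json())  # the xml and flash formats work the same way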

59 Sentiment Analysis: Issues Not a perfect solution –still need context to get quality results Vain –['vain', 'insignificant', 'contemptible', 'hateful'] –['vain', 'misleading', 'puzzling', 'surprising'] Animal –['animal', 'sensual', 'pleasing', 'joyful'] –['animal', 'bestial', 'vile', 'hateful'] –['animal', 'gross', 'shocking', 'fearful'] –['animal', 'gross', 'grievous', 'sorrowful'] Negation –"My mother was not a hateful person."

60 Sentiment Analysis: Process Process Overview –Extract the adjectives (SEASR, POS analysis) –Read in adjectives (SEASR) –Label each adjective (SEASR, SynNet) –Summarize windows of adjectives (lots of experimentation here); see the sketch below –Visualize the windows
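A sketch of the "summarize windows of adjectives" step, assuming each adjective has already been labeled with an emotion by SynNet; the window size and toy labels are illustrative, and as the slide notes, this step took real experimentation.

from collections import Counter

# (adjective, emotion label) pairs in document order (toy data)
labeled = [("happy", "joy"), ("dark", "fear"), ("gloomy", "sad"),
           ("amazing", "surprise"), ("joyful", "joy"), ("vile", "anger")]

window = 3  # adjectives per window
for i in range(0, len(labeled), window):
    counts = Counter(label for _, label in labeled[i:i + window])
    print(f"window {i // window}: {dict(counts)}")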

61 Sentiment Analysis: Visualization New SEASR visualization component –Based on Flash using the flare ActionScript library: http://flare.prefuse.org/ –Still in development: http://demo.seasr.org/public/resources/data/viewer/emotions.html

62 Demonstration –Son of Blinkie from the NEMA Project –MONK –Emotion Tracking

63 Learning Exercises

64 Discussion Questions What part of these applications can be useful to your research?

