Analyzing Text at the Middle Distance between the Close Read and Culturomics. Marti A. Hearst, U.C. Berkeley. Joint work with Aditi Muralidharan.

1 Analyzing Text at the Middle Distance between the Close Read and Culturomics. Marti A. Hearst, U.C. Berkeley. Joint work with Aditi Muralidharan.

3 Foreground: The Close Read. Middle Distance: Sensemaking. Background: Culturomics (Text Mining).

4 Definition: “Close Read”. “Close reading describes, in literary criticism, the careful, sustained interpretation of a brief passage of text. Such a reading places great emphasis on the particular over the general, paying close attention to individual words, syntax, and the order in which sentences and ideas unfold as they are read.” (English Wikipedia, 6/4/2012)

5 “Power and Passion in Shakespeare’s Pronouns: Interrogating ‘you’ and ‘thou’”, Penelope Freedman, 2007, MPG Books, 280 pp. Image: scene from “As You Like It” by Daniel Maclise.

6 Conclusions (“Power and Passion in Shakespeare’s Pronouns”): “The subtleties of the use of ‘you’ and ‘thou’ that have emerged … can seem, at worst, random or, at best, unfathomable. … A set of oppositions has been revealed here: … These oppositions are complex and slippery: they may operate in parallel, may converge or diverge. Each pronoun choice has to be seen in a highly specific context.”

7 Definition: “Culturomics”. Narrower than “digital humanities” and broader than “corpus linguistics”. (Loose interpretation of definitions at culturomics.org)

11 “Culturomics” example: middle distance vs. middle ground

12 As an NLP Researcher, where do your ideas come from? Can HCI improve your work?

13 Sensemaking: Start with a vague information need. Iteratively refine it by searching, reading, and analyzing. Reach understanding. (Pirolli and Card 2005; Pirolli and Russell 2011)

14 Sensemaking for Literature Study

15 Typifying and interpreting works, events, or characters. Exploring the prevalence of themes and the language use around them.

16 WordSeer (version 1): The North American Pre-Civil-War Slave Narratives

17 The North American Slave Narratives: Stories of the lives of former slaves. Published by white abolitionist sponsors. About 3000 narratives survive; ~300 in the prototype. Do the North American slave narratives all conform to the same stereotypes?

18 A “Master Plan” for the slave narratives: “... conventions so early and firmly established that one can imagine a sort of master outline drawn from the great narratives and guiding the lesser ones” (Olney, J., “I was born: Slave Narratives and their Status as Autobiography”, Callaloo, 1984)

21 Our approach. Phase 1: Support searching for instances of conventions. Phase 2: Support visualizing their occurrence in the collection.

22 Searching for stereotypes. Keyword search is not enough. Search words: “cruel”, “harsh”, “overseer”, “master”, “mistress”. Instead: “overseer”, “master”, “mistress” described as “cruel”, “harsh”. Also want the entire picture, for comparison: “overseer”, “master”, “mistress” described as ____?____; ____?____ described as “cruel”.

23 Natural language processing. Example: “The cruel overseer beat us severely.” Automatically extracted structure: subject (“overseer”), object (“us”), modifier (“cruel”).
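
The kind of query described on the two slides above can be sketched as a filter over grammatical relations. This is a minimal illustration, not WordSeer's implementation: the `Triple` type, the relation names, and the functions `grammatical_search`, `described_as`, and `things_described_as` are all hypothetical, and the triples are written by hand to stand in for parser output (the second sentence is invented for the example).

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class Triple(NamedTuple):
    """One grammatical relation: `dependent` attaches to `head` via `relation`."""
    relation: str
    head: str
    dependent: str


# Hand-written stand-in for parser output (an assumption, not real tool output).
# Sentence 1: "The cruel overseer beat us severely."
# Sentence 2 (invented): "The harsh master sold him."
TRIPLES = [
    Triple("modifier", "overseer", "cruel"),
    Triple("subject", "beat", "overseer"),
    Triple("object", "beat", "us"),
    Triple("modifier", "beat", "severely"),
    Triple("modifier", "master", "harsh"),
    Triple("subject", "sold", "master"),
    Triple("object", "sold", "him"),
]


def grammatical_search(
    triples: Iterable[Triple],
    relation: str,
    heads: Optional[Set[str]] = None,
    dependents: Optional[Set[str]] = None,
) -> List[Triple]:
    """Return triples with the given relation; None acts as a wildcard slot."""
    return [
        t
        for t in triples
        if t.relation == relation
        and (heads is None or t.head in heads)
        and (dependents is None or t.dependent in dependents)
    ]


def described_as(triples: Iterable[Triple], heads: Set[str]) -> List[str]:
    """Answer: "<heads> described as ____?" """
    return [t.dependent for t in grammatical_search(triples, "modifier", heads=heads)]


def things_described_as(triples: Iterable[Triple], modifiers: Set[str]) -> List[str]:
    """Answer: "____? described as <modifiers>" """
    return [t.head for t in grammatical_search(triples, "modifier", dependents=modifiers)]


if __name__ == "__main__":
    print(described_as(TRIPLES, {"overseer", "master", "mistress"}))
    print(things_described_as(TRIPLES, {"cruel"}))
```

Leaving one slot open is what separates this from keyword search: the query fixes the relation and one side of it, and the results show every word that fills the other side.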

24 Grammatical search

27 Part 2: visualizing stereotypes. Prevalence; position of occurrence within a document; across the entire collection.
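
The data behind such a visualization can be sketched as follows. This is an illustrative assumption about how prevalence and position might be computed, not the method used in WordSeer: the function names are hypothetical, matching is plain case-insensitive substring search, and position is measured as character offset divided by document length.

```python
import re
from typing import List


def relative_positions(text: str, phrase: str) -> List[float]:
    """Start offset of each occurrence, scaled to the range [0, 1)."""
    if not text:
        return []
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return [m.start() / len(text) for m in pattern.finditer(text)]


def prevalence(texts: List[str], phrase: str) -> float:
    """Fraction of documents that contain the phrase at least once."""
    if not texts:
        return 0.0
    return sum(1 for t in texts if relative_positions(t, phrase)) / len(texts)


def position_histogram(texts: List[str], phrase: str, bins: int = 10) -> List[int]:
    """Count occurrences across the collection by relative position in each document."""
    counts = [0] * bins
    for text in texts:
        for pos in relative_positions(text, phrase):
            counts[min(int(pos * bins), bins - 1)] += 1
    return counts


if __name__ == "__main__":
    # Invented toy documents, not excerpts from the narratives.
    docs = [
        "I was born in Maryland. Later I escaped.",
        "My mother died. I was born a slave.",
    ]
    print(prevalence(docs, "I was born"))
    print(position_histogram(docs, "I was born", bins=4))
```

A histogram that peaks in the first bin would suggest a convention that usually opens a narrative, while a flat histogram would suggest no fixed position.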

28 “I was born”

29 Results (presented at MLA 2012). Prevalent stereotypes: “I was born”, separation from parents, cruel treatment, escape. A ‘missed’ stereotype: parents’ death. Not as strictly ordered as implied by Olney’s master plan.

30 Problems. Vocabulary: the same concept is expressed with many different wordings; users needed to see synonyms, nearby words, and suggestions on searches. Comparison and curation: users couldn’t isolate and compare results on sub-collections of documents.

31 WordSeer (version 1.5) wordseer.berkeley.edu

32 The complete works of Shakespeare: 42 documents (plays and sonnet collections). Analyze Hamlet. How does the portrayal of men and women in Shakespeare change in different circumstances? (CHI ’12 works in progress.) English 203: Hamlet in the Humanities Lab, Spring 2012, University of Calgary.

33 The Vocabulary Problem Which words embody the concept of female beauty? 261 results

36 Collection and Comparison Does the treatment of love vary between the comedies and tragedies?

38 Collection and Comparison Step 2. Compare word usage

39 Comedies vs. tragedies

40 “in love”: comedies vs. tragedies
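
One way to compare word usage between two sub-collections is sketched below. It is a hedged illustration only: the function names, the fixed context window, and the difference-of-relative-frequencies score are assumptions made for this example, and the sample sentences are invented rather than quoted from the plays.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def neighbors(texts: List[str], target: str, window: int = 3) -> Counter:
    """Count words appearing within `window` tokens of each `target` occurrence."""
    counts: Counter = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == target:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts


def compare_usage(
    texts_a: List[str], texts_b: List[str], target: str, window: int = 3
) -> List[Tuple[str, float]]:
    """Rank neighbor words by (share in A) minus (share in B), highest first."""
    a = neighbors(texts_a, target, window)
    b = neighbors(texts_b, target, window)
    total_a = sum(a.values()) or 1  # avoid division by zero on empty results
    total_b = sum(b.values()) or 1
    diff = {w: a[w] / total_a - b[w] / total_b for w in set(a) | set(b)}
    return sorted(diff.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    comedies = ["sweet love is merry"]   # invented sample text
    tragedies = ["love is cruel death"]  # invented sample text
    for word, score in compare_usage(comedies, tragedies, "love", window=2):
        print(f"{word}: {score:+.2f}")
```

Words with strongly positive scores are more typical of the first sub-collection and strongly negative ones of the second. Scores near zero mark vocabulary the two share, which is often uninformative function words.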

41 Results. WordSeer 1.5 is being used successfully (so far) in the Hamlet class. Questions explored: How does the relationship between Hamlet and his mother change over the course of the play? How does Act 1 portray the character of Horatio? Investigated changing language use around men and women. Unknowingly replicated and extended previous findings by other Shakespeare scholars.

42 How Does This Apply to Social Media Language?

43 As an NLP Researcher, where do your ideas come from? Can HCI improve your work?

44 Sentiment Analysis? Sarcasm?

55 Summary. We suggest enhancing NLP research with sensemaking tools that sit midway between reading the text and blind statistics. Such tools help with hypothesis formulation, verification, and refinement. This is clearly useful for literature analysis; it remains to be seen whether it can help with social media analysis.

56 Thank you!

