
Gaiku Generating Haiku with Word Associations Norms Yael Netzer, David Gabay, Yoav Goldberg and Michael Elhadad Department of Computer Science Ben Gurion.


1 Gaiku: Generating Haiku with Word Associations Norms. Yael Netzer, David Gabay, Yoav Goldberg and Michael Elhadad, Department of Computer Science, Ben Gurion University of the Negev, Israel. CALC’09, May 35th 2009

2 Creativity: “the forming of associative elements into new combinations which either meet specified requirements or are in some way useful…” [Mednick 1969]
Three main paths to a creative solution:
- serendipity
- similarity
- mediation

3 Poetry Computational Creativity WAN Generating Haiku!

4 Haiku

5 Form of poetry that originated in Japan in the 16th century
- Three lines of 5, 7, 5 phonetic units (mora)
- Uses the present tense and no judgmental words
- Adopted in Western languages in the 20th century: 5, 7, 5 → 3 short lines
- Traditionally refers to nature and seasons, but modern Haiku are not restricted
Basho Haiku: 古池や蛙飛込む水の音 (old pond... a frog leaps in water’s sound)

6 Example Haiku:
- iced over pond I skip a rock the entire width
- blossomless but not unloved the old magnolia
- fishing guides boat in the background a new trip
- a holy cow a carton of milk seeking a church
- blind snakes on the wet grass tombstoned terror
- first date — the little pile of anchovies

7 Poetry Generation

8 Body, Soul

9 Body: Structure

10 Body: Structure. Soul: Content

11 Body: 3 lines, grammatical, Haiku-like. Soul: inspiring, interesting, intriguing, joyful, …

12 Previous works: Manurung [2003], Manurung et al. [2000], Gervas [2001]. Emphasis on structure, less on content

13 Body / Structure: Haiku Corpus
– ~3,500 Haiku in English
– Various sources:
  - amateur sites
  - children’s writings
  - translations of classic Japanese Haiku by Basho and others
  - ‘official’ sites of Haiku associations (e.g., Haiku Path - Haiku Society of America)

14 Body / Structure
POS-tag the Haiku corpus: NN IN_of NNP DT_a NN IN_of NNS NN NN NNS CC NNS IN_on DT_a NN NN …
Count line patterns:
- Line 1 Patterns: 280 JJ NN, 276 NN NN, ...
- Line 2 Patterns: 64 DT_the JJ NN, …
- Line 3 Patterns: …
Count pattern transitions: P(line2==DT_the NN | line1==JJ NN) = ... …
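A minimal sketch of the counting step on this slide, assuming each Haiku has already been reduced to three POS-pattern strings. The toy corpus and the function names are hypothetical, and the relative-frequency estimate is an assumption about how the transition probabilities could be computed, not a description of the authors' code.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each Haiku is three lines, each line a POS pattern.
tagged_haiku = [
    ("JJ NN", "DT_the JJ NN", "IN_on DT_a NN"),
    ("JJ NN", "DT_the NN", "NN NN"),
    ("NN NN", "DT_the JJ NN", "NNS CC NNS"),
    ("JJ NN", "DT_the NN", "IN_on DT_a NN"),
]

def count_line_patterns(corpus):
    """Return one Counter of patterns per line position (line 1, 2, 3)."""
    counts = [Counter(), Counter(), Counter()]
    for haiku in corpus:
        for position, pattern in enumerate(haiku):
            counts[position][pattern] += 1
    return counts

def transition_probabilities(corpus, first=0, second=1):
    """Estimate P(line[second] == q | line[first] == p) by relative frequency."""
    pair_counts = defaultdict(Counter)
    for haiku in corpus:
        pair_counts[haiku[first]][haiku[second]] += 1
    return {
        p: {q: n / sum(followers.values()) for q, n in followers.items()}
        for p, followers in pair_counts.items()
    }

line_counts = count_line_patterns(tagged_haiku)
p_line2_given_line1 = transition_probabilities(tagged_haiku)
print(line_counts[0].most_common())
print(p_line2_given_line1["JJ NN"])
```

Sampling a line-1 pattern from the counts and then a line-2 pattern from the conditional table gives a three-line skeleton whose shape follows the corpus.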

15 Body / Structure
- Line 1 Patterns: 280 JJ NN, 276 NN NN, ...
- Line 2 Patterns: 64 DT_the JJ NN, …
- Line 3 Patterns: …
Pattern Transitions: P(line2==DT_the NN | line1==JJ NN) = ... …
Match patterns against POS-tagged Google 1T-Web / Project Gutenberg

16 Body / Structure
- Line 1 Patterns: 280 JJ NN, 276 NN NN, ...
- Line 2 Patterns: 64 DT_the JJ NN, …
- Line 3 Patterns: …
Pattern Transitions: P(line2==DT_the NN | line1==JJ NN) = ... …
Match patterns against POS-tagged Google 1T-Web / Project Gutenberg
Pattern: JJ NNS / DT_a JJ NN / IN_of NN

17 Body / Structure
- Line 1 Patterns: 280 JJ NN, 276 NN NN, ...
- Line 2 Patterns: 64 DT_the JJ NN, …
- Line 3 Patterns: …
Pattern Transitions: P(line2==DT_the NN | line1==JJ NN) = ... …
Match patterns against POS-tagged Google 1T-Web / Project Gutenberg
Pattern: JJ NNS / DT_a JJ NN / IN_of NN
Match: pouring cats / a pilot care / of fighter

18 Body / Structure
- Line 1 Patterns: AA BB CC / 12, BB CC DD / 10, …
- Line 2 Patterns: CC DD EE / 20, …
- Line 3 Patterns: …
Pattern Transitions: P(Line2=AA BB | Line1=XX YY) …
Match patterns against POS-tagged Google 1T-Web / Project Gutenberg
Pattern: JJ NNS / DT_a JJ NN / IN_of NN
Match: pouring cats / a pilot care / of fighter
Result: grammatical output that preserves Haiku “texture”
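The matching step can be sketched roughly as follows. This assumes a pattern item is either a bare tag (JJ) or a tag fused with a fixed word (DT_a), which is how the slide notation reads; the tagged n-gram list and the function names are hypothetical stand-ins for the POS-tagged Google 1T-Web / Project Gutenberg data.

```python
# Hypothetical POS-tagged n-grams: each is a list of (word, tag) pairs.
tagged_ngrams = [
    [("pouring", "JJ"), ("cats", "NNS")],
    [("a", "DT"), ("pilot", "JJ"), ("care", "NN")],
    [("of", "IN"), ("fighter", "NN")],
    [("the", "DT"), ("old", "JJ"), ("pond", "NN")],
]

def matches(pattern, ngram):
    """True if every (word, tag) pair agrees with the corresponding pattern item.

    Assumption: an item is TAG or TAG_word, e.g. "NN" or "DT_a".
    """
    items = pattern.split()
    if len(items) != len(ngram):
        return False
    for item, (word, tag) in zip(items, ngram):
        want_tag, _, want_word = item.partition("_")
        if tag != want_tag:
            return False
        if want_word and word.lower() != want_word:
            return False
    return True

def candidates(pattern, ngrams):
    """Return the surface strings of all n-grams that fit the pattern."""
    return [" ".join(word for word, _ in ng) for ng in ngrams if matches(pattern, ng)]

for pattern in ("JJ NNS", "DT_a JJ NN", "IN_of NN"):
    print(pattern, "->", candidates(pattern, tagged_ngrams))
```

Because every candidate is an attested n-gram with the right tag sequence, each line is locally grammatical even though the three lines are chosen independently.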

19 Soul? Requirements: good “story” –cohesive –surprising –provoke feelings/emotions –metaphorical –“Should leave the reader wondering…” … Creative!

20 Soul? An idea: capture the “story” seed as a sequence of concepts
- butterfly, spring, flower
- thief, steal, jail
- mosquito, blood, vampire
But not any seed will do:
- cat, feline, claw → too cohesive
- computer, coat, queen → too divergent

21 Soul? Is WordNet a good soul? Not really: it may give cohesiveness, but bad stories

22 Soul? Is WordNet a good soul? Not really. We actually measured it in the Haiku Corpus

23

24 Butterfly, Spring, Flower: the connection between these words can be reconstructed by a human, but it is not available in WordNet. Where can we find such relations?

25 Word Association Norms

26 Word Association Norms (WAN)
- A collection of cue words, each with a set of free associations (targets) and quantitative, statistical measures (mouse → CAT 0.5, RAT 0.08, CHEESE 0.07, HOLE 0.05…)
- Given a cue, collect the immediate response: the first word that comes to mind.
- The largest WAN we know of for English is the University of South Florida Free Association Norms (Nelson et al., 1998), http://w3.usf.edu/FreeAssociation/: 5,019 cue words and 10,469 additional targets, collected from more than 6,000 participants since 1973.
- WAN as a weighted directed graph: nodes are stemmed words.
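One plausible in-memory form of this weighted directed graph is a dictionary of dictionaries, cue → target → strength. In the sketch below the "mouse" entry uses the numbers from the slide; the "cat" entry and the helper names are hypothetical, and stemming is left out.

```python
# Weighted directed graph: cue -> {target: association strength}.
wan = {
    "mouse": {"cat": 0.5, "rat": 0.08, "cheese": 0.07, "hole": 0.05},  # from the slide
    "cat": {"dog": 0.6, "mouse": 0.1},  # hypothetical values for illustration
}

def strength(graph, cue, target):
    """Strength of the directed edge cue -> target, or 0.0 if there is none."""
    return graph.get(cue, {}).get(target, 0.0)

def strongest_targets(graph, cue, k=3):
    """The k targets most often given as a response to the cue."""
    responses = graph.get(cue, {})
    return sorted(responses, key=responses.get, reverse=True)[:k]

print(strongest_targets(wan, "mouse", k=2))
print(strength(wan, "mouse", "cat"), strength(wan, "cat", "mouse"))
```

The two directions of an edge can carry different weights, which is why the graph is directed.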

27 [Figure: association graph with nodes water, flower, butterfly, spring, fall, green, bloom]

28 Why Word Associations
- Added value of WAN: insight into language that is not found in WordNet and is hard to acquire from corpora [Sinopalnikova & Smrz 2004]
- Associative thinking takes part in the process of writing and reading poetry.
- Because Haiku are so short, they rely on lexical associations for concept progression.
- Hypothesis: word associations are good catalysts for creativity and can be used as a building block in the creative process of Haiku generation.

29 We first test this hypothesis by analyzing a corpus of existing Haiku poems. Can the creativity of text as reflected in word associations be quantified? Are Haiku poems indeed more associative than newswire text or prose?

30 Definitions
- Two nodes are connected iff one of them is a cue for the other.
- Associative distance: number of edges in the shortest path between the words in the associations graph.
- WordNet distance: number of edges in the shortest path between any synset of one word and any synset of the other word.
- Associativity of a text: the number of associated word pairs in the text, normalized by the number of word pairs in the text of which both words are in the WAN.
- WordNet-relations level: the number of WordNet-related word pairs in the text.
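The associative distance and the associativity measure can be sketched with a breadth-first search. Several things here are assumptions rather than facts from the slides: the toy graph, the function names, the choice to count pairs over distinct words, and the reading of "associated" as distance below 3 (taken from the "(<3)" column header on the next slide).

```python
from collections import deque
from itertools import combinations

# Hypothetical toy WAN: cue -> {target: strength}.
wan = {
    "spring": {"flower": 0.3, "fall": 0.2},
    "flower": {"bloom": 0.2},
    "butterfly": {"flower": 0.1},
    "computer": {"keyboard": 0.4},
}

def undirected_neighbors(graph):
    """Connect two words iff one of them is a cue for the other."""
    nbrs = {}
    for cue, responses in graph.items():
        nbrs.setdefault(cue, set())
        for target in responses:
            nbrs[cue].add(target)
            nbrs.setdefault(target, set()).add(cue)
    return nbrs

def associative_distance(nbrs, a, b):
    """Edges on the shortest path between a and b, or None if unreachable."""
    if a not in nbrs or b not in nbrs:
        return None
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in nbrs[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def associativity(words, nbrs, max_distance=2):
    """Associated pairs divided by all pairs whose words are both in the WAN."""
    known = sorted({w for w in words if w in nbrs})
    pairs = list(combinations(known, 2))
    if not pairs:
        return 0.0
    associated = 0
    for a, b in pairs:
        dist = associative_distance(nbrs, a, b)
        if dist is not None and dist <= max_distance:
            associated += 1
    return associated / len(pairs)

nbrs = undirected_neighbors(wan)
print(associative_distance(nbrs, "butterfly", "spring"))
print(associativity(["butterfly", "spring", "flower"], nbrs))
```

Words missing from the WAN are dropped before pairing, so a text full of out-of-vocabulary words is not penalized for them.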

31 Average Associativity
We measure the associativity and WordNet-relations levels of 200 of the Haiku in our Haiku Corpus, as well as of random 12-word sequences from Project Gutenberg and from the NANC newswire corpus.
Source   Avg. Assoc. Relations (<3)   Avg. WordNet Relations (<4)
News     0.26                         2.02
Prose    0.22                         1.4
Haiku    0.32                         1.38

32 Filling body with soul: Theme Selection
Generating the seed of the story:
– Start with a word → random walk on a word graph
Many variants are possible. We currently:
- start at the node of the seed word
- do several short random walks
- keep the resulting word set
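A rough sketch of the theme-selection step. The slide does not say how the next node is chosen or how long the walks are, so the weighting by association strength, the walk counts, the toy graph and the function name are all assumptions made for illustration.

```python
import random

# Hypothetical toy WAN: cue -> {target: strength}.
wan = {
    "spring": {"flower": 0.3, "fall": 0.2, "green": 0.1},
    "flower": {"bloom": 0.2, "butterfly": 0.1},
    "butterfly": {"flower": 0.2},
    "fall": {"water": 0.1},
}

def random_walk_theme(graph, seed, walks=5, steps=2, rng=None):
    """Collect the words visited by several short random walks from the seed.

    Assumption: the next node is drawn in proportion to association strength,
    following directed edges cue -> target.
    """
    rng = rng or random.Random()
    theme = {seed}
    for _ in range(walks):
        node = seed
        for _ in range(steps):
            responses = graph.get(node)
            if not responses:
                break  # dead end: this word is never used as a cue
            targets = list(responses)
            weights = [responses[t] for t in targets]
            node = rng.choices(targets, weights=weights, k=1)[0]
            theme.add(node)
    return theme

theme = random_walk_theme(wan, "spring", rng=random.Random(0))
print(sorted(theme))
```

Short walks keep the theme near the seed, while repeating them pulls in words from several directions, which fits the earlier point that a seed should be neither too cohesive nor too divergent.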

33 [Figure: association graph with nodes water, flower, butterfly, spring, fall, green, bloom] Spring → {flower, butterfly…}

34 Filling body with soul
For a given structure:
– Choose a first line containing the seed word
– Choose other lines containing a word from the set
This is adequate, but the relations might be too straightforward.
Searching for a better soul:
– Generate several poems for the pattern
– Rerank them based on the associativity measure
Reranking catches further “residual” relations
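The select-then-rerank idea might look like the sketch below. The candidate lines, the set of associated pairs and the scoring function are hypothetical; in particular, the score simply counts directly associated word pairs as a stand-in for the associativity measure defined earlier.

```python
from itertools import combinations, product

# Hypothetical candidate lines for a three-line pattern.
line_options = [
    ["early spring", "pear tree", "cold spring"],
    ["a flower of tears", "a kind of boots", "a butterfly of gold"],
    ["in the bloom", "in the fall"],
]
seed = "spring"
theme = {"flower", "butterfly", "bloom", "fall"}

# Hypothetical stand-in for the WAN: unordered pairs treated as associated.
ASSOCIATED = {
    frozenset(pair)
    for pair in [("spring", "flower"), ("flower", "bloom"), ("spring", "fall"),
                 ("butterfly", "flower"), ("cold", "fall")]
}

def generate_candidates(options, seed_word, theme_words):
    """First line must contain the seed; other lines must contain a theme word."""
    first = [line for line in options[0] if seed_word in line.split()]
    rest = [[line for line in opts if theme_words & set(line.split())]
            for opts in options[1:]]
    return [list(poem) for poem in product(first, *rest)]

def pair_score(poem):
    """Count associated pairs among the distinct words of the whole poem."""
    words = sorted({w for line in poem for w in line.split()})
    return sum(1 for a, b in combinations(words, 2)
               if frozenset((a, b)) in ASSOCIATED)

def rerank(poems, score):
    """Highest-scoring poems first."""
    return sorted(poems, key=score, reverse=True)

poems = generate_candidates(line_options, seed, theme)
ranked = rerank(poems, pair_score)
print(ranked[0], pair_score(ranked[0]))
```

Scoring the whole poem rather than each line is what lets reranking pick up relations that were not part of the seed or theme, such as the link between "cold" and "fall" in this toy data.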

35 Pattern: NN DET_a NN of NNS PP_in DET_the NN
6   alligator pear / a handful of whites / in the spring
8   avocado pear / a kind of boots / in the fall
10  pear salad / a season of tears / in the summer
10  pear tree / a seasoning of spices / in the fall
10  alligator pear / a spring of tears / in the blackness

36 Evaluation Method
‘Turing test’:
– Was this Haiku written by a human or by a computer?
– How would you grade it from 1 to 5?
Settings:
– AUTO set: 15 Haiku created by Gaiku without any manual selection, 10 random human Haiku on the same subjects
– SEL set: 17 Haiku created by Gaiku, selected manually out of several runs, 9 award-winning human Haiku
52 subjects

37 Results: AUTO set

38 Results: SEL set

39 The Best of Gaiku
- Best in SEL: classified as human 77.2%, average grade 3.09
- Best in AUTO: classified as human 72.2%, average grade 2.75
Poems shown:
- early dew the water contains teaspoons of honey
- cherry tree poisonous flowers lie blooming

40 Conclusions
Word Association Norms have good potential in creative content generation.
Future work: lots!
– Haiku: improve theme selection
– Additional forms of creative texts
Test WAN in general NLP tasks:
– Use WAN for (non-creative) generation
– Word sense disambiguation
– Lexical chains
– ‘Guess the word’ given associations (for people with SLI)

41 Example Haiku:
- iced over pond I skip a rock the entire width
- blossomless but not unloved the old magnolia
- first date — the little pile of anchovies
- fishing guides boat in the background a new trip
- a holy cow a carton of milk seeking a church
- blind snakes on the wet grass tombstoned terror

42 Example Haiku:
- iced over pond I skip a rock the entire width
- blossomless but not unloved the old magnolia
- fishing guides boat in the background a new trip
- a holy cow a carton of milk seeking a church
- blind snakes on the wet grass tombstoned terror
- first date — the little pile of anchovies

