Natural Language Processing: Verbatim Text Coding and Data Mining Report Generation
Josef S.W. Leung, Ching-Long Yeh


1 Natural Language Processing: Verbatim Text Coding and Data Mining Report Generation
Josef S.W. Leung (j.leung@ieee.org), Ching-Long Yeh (chingyeh@cse.ttit.edu.tw)
NLP: One of the Top Priority Funding Items in Computer Science Research -- National Natural Science Foundation, China

2 Language
– Listen (Understand)
– Speak (Generate)

3 Natural Language Processing
[Diagram: Natural Language → Analysis/Understanding → Internal Representations → Generation → Natural Language]

4 Outline of Presentation
– NLP Introduction
  – Natural Language Analysis/Understanding
  – Natural Language Generation
– Case 1: Verbatim Text Coding
  – May need NL analysis techniques
– Case 2: Data Mining Report Generation
  – May need NL generation techniques

5 Modules of NL Understanding
Input sentence → Pre-processing → Tokens → Parsing → Syntactic structure → Semantic Interpretation → Semantic representation → Contextual Interpretation → Knowledge representation

6 Parsing for Syntactic Analysis
Grammar Rules:
  S → NP + VP
  NP → ART + N
  VP → V + NP
Lexicon:
  dog: N
  cat: N
  chased: V
  the: ART

7 Syntactic Structure
[S [NP [ART the] [N dog]] [VP [V chased] [NP [ART the] [N cat]]]]
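The grammar rules and lexicon on slide 6 are small enough to run by hand. The following is a minimal sketch of a top-down (recursive-descent) parser for them, written in Python as an illustration; the function names `parse` and `parse_sentence` and the tuple-based tree format are assumptions, not something taken from the presentation.

```python
# Toy grammar and lexicon copied from slide 6.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["ART", "N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {"the": "ART", "dog": "N", "cat": "N", "chased": "V"}


def parse(symbol, tokens, pos):
    """Try to parse `symbol` starting at tokens[pos].

    Returns (tree, next_pos) on success and None on failure.
    """
    if symbol not in GRAMMAR:
        # Lexical category: the next word must carry this category.
        if pos < len(tokens) and LEXICON.get(tokens[pos]) == symbol:
            return (symbol, tokens[pos]), pos + 1
        return None
    for rhs in GRAMMAR[symbol]:
        children, cur = [], pos
        for child in rhs:
            result = parse(child, tokens, cur)
            if result is None:
                break  # this expansion failed, try the next one
            subtree, cur = result
            children.append(subtree)
        else:
            return (symbol, *children), cur
    return None


def parse_sentence(sentence):
    """Parse a whole sentence as S; return the tree or None."""
    tokens = sentence.lower().split()
    result = parse("S", tokens, 0)
    if result is None or result[1] != len(tokens):
        return None
    return result[0]


print(parse_sentence("the dog chased the cat"))
```

The nested tuple that comes back mirrors the syntactic structure on slide 7. This simple strategy only works because the toy grammar has no left-recursive or ambiguous rules.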

8 Structural Ambiguity
Time flies like an arrow.
– The passage of time is as quick as an arrow.
– A species of flies called ‘time flies’ enjoys an arrow.

9 Structural Ambiguity
The man saw the girl with the telescope.
– The man saw the girl who possessed the telescope.
– The man saw the girl with the aid of the telescope.
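To make the telescope example concrete, here is a rough sketch that enumerates every parse of a sentence under a small grammar with prepositional phrases. The grammar extension (the `NP → NP PP`, `VP → VP PP`, and `PP → P NP` rules), the lexicon entries, and the function name `all_parses` are assumptions added for illustration; they do not appear in the slides.

```python
from functools import lru_cache

# Binary rules: (parent, left child, right child). The PP rules are an
# assumed extension of the toy grammar to allow both attachments.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "ART", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "the": "ART", "man": "N", "girl": "N", "telescope": "N",
    "saw": "V", "with": "P",
}


def all_parses(sentence):
    """Return every bracketed parse of `sentence` rooted in S."""
    tokens = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def trees(symbol, start, end):
        found = []
        # A single word can be covered by its lexical category.
        if end - start == 1 and LEXICON.get(tokens[start]) == symbol:
            found.append(f"[{symbol} {tokens[start]}]")
        # Otherwise split the span in two and combine sub-parses.
        for parent, left, right in BINARY_RULES:
            if parent != symbol:
                continue
            for mid in range(start + 1, end):
                for l_tree in trees(left, start, mid):
                    for r_tree in trees(right, mid, end):
                        found.append(f"[{symbol} {l_tree} {r_tree}]")
        return tuple(found)

    return list(trees("S", 0, len(tokens)))


for tree in all_parses("the man saw the girl with the telescope"):
    print(tree)
```

Under these assumed rules the sentence receives two parses: one where the prepositional phrase sits inside the object noun phrase (the girl has the telescope) and one where it modifies the verb phrase (the telescope is the instrument of seeing).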

10 Natural Language Generation
[Diagram: User’s Goal → Strategic Component (Text Planning) → Tactical Component (Linguistic Realization) → Surface Sentences. Knowledge sources shown: Domain KB, Planning Operators, User Model, Discourse Model, Linguistic Rules & Lexicon.]

11 Unification Grammar
Example sentence: the man sees a sheep
Rules:
  S [numb=X, tense=T] → NP [numb=X] VP [numb=X, tense=T]
  VP [numb=N, tense=M] → V [numb=N, tense=M] NP
  NP [numb=Y] → det [numb=Y] noun [numb=Y]
Lexicon:
  man: noun [numb=sing]
  a: det [numb=sing]
  the: det
  sheep: noun
  sees: [tense=pres, numb=sing]
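The agreement checks expressed by the shared variables X, Y, N, and M can be sketched with flat feature dictionaries. The code below is an illustrative simplification, not the formalism used by the authors: `unify`, `noun_phrase`, and `analyse` are hypothetical names, the verb category label "V" for "sees" is assumed, and the plural noun "men" is an invented entry added only to show a failed unification.

```python
def unify(a, b):
    """Merge two flat feature dicts; return None if any feature conflicts."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None
        merged[key] = value
    return merged


LEXICON = {
    "the": ("det", {}),                       # unspecified number
    "a": ("det", {"numb": "sing"}),
    "man": ("noun", {"numb": "sing"}),
    "sheep": ("noun", {}),                    # unspecified number
    "sees": ("V", {"tense": "pres", "numb": "sing"}),
    "men": ("noun", {"numb": "plur"}),        # hypothetical, not on the slide
}


def features(word, category):
    """Return the word's features if it has the given category, else None."""
    cat, feats = LEXICON[word]
    return feats if cat == category else None


def noun_phrase(det, noun):
    """NP [numb=Y] -> det [numb=Y] noun [numb=Y]."""
    det_feats, noun_feats = features(det, "det"), features(noun, "noun")
    if det_feats is None or noun_feats is None:
        return None
    return unify(det_feats, noun_feats)


def analyse(det1, noun1, verb, det2, noun2):
    """S -> NP VP with subject-verb agreement; VP -> V NP."""
    subject = noun_phrase(det1, noun1)
    obj = noun_phrase(det2, noun2)  # object number does not constrain the VP
    verb_feats = features(verb, "V")
    if subject is None or obj is None or verb_feats is None:
        return None
    return unify(subject, verb_feats)  # numb=X shared by NP and VP


print(analyse("the", "man", "sees", "a", "sheep"))
print(analyse("the", "men", "sees", "a", "sheep"))
```

The first call succeeds and yields the sentence-level features; the second fails because the subject is plural while "sees" is singular. Full unification grammars use nested feature structures and variable sharing rather than the flat dictionaries used here.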

12 Migraine abortive treatment is used to abort migraine.
((cat clause)
 (process ((lex "use") (type material)))
 (partic ((affected ((cat proper) (lex "migraine abortive treatment")))
          (agent none)))
 (circum ((purpose ((cat clause)
                    (keep-in-order no)
                    (keep-for no)
                    (position end)
                    (process ((lex "abort") (effect-type creative) (type material)))
                    (partic ((created ((lex "migraine") (countable no) (cat common)))))))))))

13 Verbatim Text Coding
– A text content classification problem.
– Group semantically similar answer items.
– Develop a code list/tree to represent the answer item groups.
– Simple NL analysis techniques may help.
– Details will be given in the first example of NLP application.

14 Data Mining Report Generation
– Data mining results are usually in rule or tree formats with obscure notations.
– NL generation techniques may help translate the data mining results into plain natural language.
– Details will be given in the second example of NLP application.

15 Codia for Verbatim Text Coding
[Screenshot panels: Answer Items, Code Tree]
– Small screen/window/text
– Long list of answer items
– Difficult to browse/view
– Worse than paper form

16 Codia for Verbatim Text Coding Key Terms

17 Ranking Answers by Similarity Items with similar meaning

18 Text Similarity Measures
[Diagram: String, Semantics, Coverage → Text Similarity Score]
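The slides name three components of the similarity score (string, semantics, coverage) but give no formulas. The sketch below shows one plausible way to combine them and rank answer items; every concrete choice is an assumption made for illustration: the `difflib` ratio for the string measure, category overlap against a tiny hypothetical thesaurus for the semantic measure, shared-word ratio for coverage, and equal weights.

```python
from difflib import SequenceMatcher

# Hypothetical toy thesaurus: term -> semantic category.
THESAURUS = {
    "cheap": "PRICE", "inexpensive": "PRICE", "price": "PRICE",
    "tasty": "TASTE", "delicious": "TASTE", "flavour": "TASTE",
}


def tokens(text):
    return text.lower().split()


def string_similarity(a, b):
    """Character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def categories(text):
    return {THESAURUS[t] for t in tokens(text) if t in THESAURUS}


def semantic_similarity(a, b):
    """Overlap of thesaurus categories (Jaccard) in [0, 1]."""
    cat_a, cat_b = categories(a), categories(b)
    if not cat_a or not cat_b:
        return 0.0
    return len(cat_a & cat_b) / len(cat_a | cat_b)


def coverage(a, b):
    """Share of the shorter text's distinct words found in the other text."""
    words_a, words_b = set(tokens(a)), set(tokens(b))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / min(len(words_a), len(words_b))


def similarity_score(a, b, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three measures (weights are an assumption)."""
    w_string, w_semantic, w_coverage = weights
    return (w_string * string_similarity(a, b)
            + w_semantic * semantic_similarity(a, b)
            + w_coverage * coverage(a, b))


def rank_by_similarity(query, answers):
    """Sort answer items from most to least similar to the query."""
    return sorted(answers, key=lambda ans: similarity_score(query, ans),
                  reverse=True)


print(rank_by_similarity("it is cheap",
                         ["tastes delicious", "very inexpensive"]))
```

In this toy run "very inexpensive" ranks first even though it shares no words with the query, because both texts map to the same thesaurus category. That is the kind of grouping of items with similar meaning that slide 17 describes.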

19 Codia for Verbatim Text Coding
– A user interface for classifying answer items by drag-and-drop actions.
– NLP reduces time and effort in searching, browsing, and selecting multiple answer items for classification.
– There are still limitations, and the process is not fully automated.

20 Technical Issues of Codia
– Improve user interface.
– Use only simple NLP techniques.
– Ambiguity resolution by humans.
– Limited by thesaurus.
– Still cannot handle negation (e.g. ‘not’).
– Knowledge engineering is tedious.

21 Limitations and Future Improvements
Limitations:
– Thesaurus has only 60,000 terms classified into 3,900 semantic categories.
– Manual operation (ambiguity resolution relies on humans).
– Similarity measures are too mechanical.
Future improvements:
– Need to update and incorporate frequently used terms/categories.
– Move towards automation by using more AI techniques such as NLP, GA, and NN.
– Become more adaptive through rule-based or case-based reasoning.

22 Data Mining and Knowledge Discovery
[Diagram: Data → Data Mining → Patterns → Interpretation → Knowledge; the overall process is Knowledge Discovery]

23 If q12 = 4 and q31 = 6 and q35 = 3 then q38 = 3

24 If h/h_income = 4 and city = 6 and car_owner = 3 then user = 3
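Slides 23 and 24 show the same rule before and after the question codes are replaced by variable labels. A minimal sketch of that relabelling step follows; the codebook entries are read off the two slides, while the function names `parse_rule` and `relabel` and the assumed rule syntax ("If a = 1 and b = 2 then c = 3") are illustrative assumptions.

```python
import re

# Question code -> variable label, as implied by slides 23 and 24.
CODEBOOK = {
    "q12": "h/h_income",
    "q31": "city",
    "q35": "car_owner",
    "q38": "user",
}


def parse_rule(text):
    """Split 'If a = 1 and b = 2 then c = 3' into conditions and conclusion."""
    match = re.fullmatch(r"\s*If\s+(.+?)\s+then\s+(.+?)\s*", text,
                         flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a rule: {text!r}")

    def pair(clause):
        name, value = (part.strip() for part in clause.split("="))
        return name, value

    conditions = [pair(c) for c in re.split(r"\s+and\s+", match.group(1))]
    conclusion = pair(match.group(2))
    return conditions, conclusion


def relabel(text, codebook=CODEBOOK):
    """Rewrite a rule using variable labels instead of question codes."""
    conditions, conclusion = parse_rule(text)

    def show(item):
        name, value = item
        return f"{codebook.get(name, name)} = {value}"

    body = " and ".join(show(c) for c in conditions)
    return f"If {body} then {show(conclusion)}"


print(relabel("If q12 = 4 and q31 = 6 and q35 = 3 then q38 = 3"))
```

Codes that are missing from the codebook are left unchanged, so a partially documented questionnaire still produces a readable rule.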

25 say(feature,[r1]).

26 say(feature, [r1]).
Generated text for rule r1: The segment of respondents who are product X users is characterized by residence in Shanghai, consumption of brand Y cigarettes, overseas travel in the past twelve months, ownership of imported cars, and high monthly household income.

27 say(general,[r1]).
say(likely,[r1]).
say(reason,[r1]).

28 say(general, [r1]).
Generated text for rule r1: Basically, the respondents who are product X users have residence in Shanghai, consumption of brand Y cigarettes, overseas travel in the past twelve months, ownership of imported cars, and high monthly household income.

29 say(reason, [r1]).
Generated text for rule r1: The respondents are product X users because they have residence in Shanghai, consumption of brand Y cigarettes, overseas travel in the past twelve months, ownership of imported cars, and high monthly household income.

30 say(likely, [r1]).
Generated text for rule r1: It is likely that the people who have residence in Shanghai, consumption of brand Y cigarettes, overseas travel in the past twelve months, ownership of imported cars, and high monthly household income are product X users.
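The four say(...) queries produce one sentence each from the same rule, which suggests a template per style. The slides do not show how the generator is implemented, so the following is only a template-filling sketch in Python: the `RULES` structure, the `TEMPLATES` wording (modelled on the sample sentences), and the `join_phrases` helper are assumptions, and the phrases for each condition are taken as already written by hand.

```python
# Rule r1 with each condition already expressed as a noun phrase
# (this hand-written mapping is an assumption).
RULES = {
    "r1": {
        "target": "product X users",
        "features": [
            "residence in Shanghai",
            "consumption of brand Y cigarettes",
            "overseas travel in the past twelve months",
            "ownership of imported cars",
            "high monthly household income",
        ],
    },
}

# One sentence template per presentation style.
TEMPLATES = {
    "feature": "The segment of respondents who are {target} "
               "is characterized by {features}.",
    "general": "Basically, the respondents who are {target} have {features}.",
    "reason": "The respondents are {target} because they have {features}.",
    "likely": "It is likely that the people who have {features} are {target}.",
}


def join_phrases(phrases):
    """Join phrases as 'a', 'a and b', or 'a, b, and c'."""
    if not phrases:
        return ""
    if len(phrases) <= 2:
        return " and ".join(phrases)
    return ", ".join(phrases[:-1]) + ", and " + phrases[-1]


def say(style, rule_ids):
    """Render each rule as one sentence in the requested style."""
    sentences = []
    for rule_id in rule_ids:
        rule = RULES[rule_id]
        sentences.append(TEMPLATES[style].format(
            target=rule["target"],
            features=join_phrases(rule["features"]),
        ))
    return " ".join(sentences)


print(say("likely", ["r1"]))
```

Because each rule becomes exactly one sentence and there is no planning across rules, this sketch has the same single-sentence, no-text-planning shape that the limitations slide attributes to the system.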

31 Limitations and Future Improvements
Limitations:
– Pre-defined syntactic category of code labels.
– Single sentence for each rule.
– Lack of visualization.
– Almost no text planning.
– English only.
– Lack of knowledge for explanation.
Future improvements:
– Automatic recognition of the syntax.
– Describe rule relationships in multiple coherent sentences.
– Text + graphics or even multimedia generation.
– Implement text planning.
– Multilingual.
– Implement NL techniques for explanation.

32 Concluding Remarks
– NLP techniques are found useful in:
  – Verbatim text coding and
  – Data mining report generation.
– Group similar answer items.
– Write simple natural language text.
– A pricey technology because few tools are available.

33 Natural Language Processing Josef Siu-Wai Leung (j.leung@ieee.org) Ching-Long Yeh (chingyeh@cse.ttit.edu.tw)

