Introduction NLP Applications


2 Introduction NLP Applications

3 Tweets
140 characters
Also contain images, 😊 emoji, and links
Grammatically ambiguous
Customer service requests through social media

4 Present Research
A method developed for extracting keywords from Tweets.
Essential keywords are obtained by imitating human question-answering logic.

5 In answering a question, humans focus on the keywords
Question: What is your name?
Keywords: your name

6 NLP - Current Tools
Stanford CoreNLP [1]
OpenNLP [2]
NLP4J [3]
Highest token accuracy in POS tagging: NLP4J, 97.64% [4]

7 Tweets affect the token accuracy of POS taggers.
Models for POS tagging Tweets:
TwitIE [5]
TweetNLP [6]
Twitter-POS tagger for Stanford CoreNLP [7]
Tweet text is noisy, with linguistic errors and idiosyncratic style.
Token accuracy of Stanford CoreNLP is 97.32% [4]
The Twitter-POS tagger for Stanford CoreNLP recorded an accuracy of 90.5% [7]
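
Token accuracy, as quoted above, is the share of tokens whose predicted POS tag equals the reference tag. A minimal Java sketch of that calculation follows; the class name, method name, and the tag sequences are hypothetical and only illustrate the metric, they are not taken from any of the cited evaluations.

```java
final class TokenAccuracySketch {

    /** Percentage of positions where the predicted tag equals the reference tag. */
    static double tokenAccuracy(String[] reference, String[] predicted) {
        if (reference.length == 0 || reference.length != predicted.length) {
            throw new IllegalArgumentException("tag sequences must be non-empty and equally long");
        }
        int correct = 0;
        for (int i = 0; i < reference.length; i++) {
            if (reference[i].equals(predicted[i])) {
                correct++;
            }
        }
        return 100.0 * correct / reference.length;
    }

    public static void main(String[] args) {
        // Hypothetical reference and predicted tags for a five-token Tweet.
        String[] reference = {"VB", "VB", "JJ", "NN", "NN"};
        String[] predicted = {"VB", "JJ", "JJ", "NN", "NN"};
        // Four of the five tags agree, so this prints 80.0
        System.out.println(tokenAccuracy(reference, predicted));
    }
}
```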

8 Methodology
1. Data Collection
2. Keyword Extraction
3. Implementation

9 Methodology : Data Collection
Tweets from February and March 2016 were used
Source: Dialog Axiata Twitter profile
Keyword Corpus (258 words)
Keywords - essential for the meaning of the sentence
Rejected Words Corpus (64 words)
Rejected - Domain specific nouns, verbs, interjections and auxiliary verbs

10 2. Keyword Extraction Methodology
Parser 1 - Stanford CoreNLP POS Tagging with Twitter Model
Parser 2 - Keyword Matching
Parser 3 - Rejected Words Matching

11 Stanford CoreNLP POS Tagging with Twitter Model
Parser 1
Each Tweet is divided into a Subject (Noun Phrase, NP) and a Predicate (Verb Phrase, VP)
NP - Numbers (CD), Nouns (NN - all forms), Adjectives (JJ - all forms)
VP - Verbs (VB - all forms)
NP & VP - essence of the meaning
NP - Usernames, Emoji, Hashtags, Pronouns
VP - Adverbs, Wh-adverbs, Auxiliary Verbs
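
The tag grouping for Parser 1 can be sketched in Java as follows. This is an illustrative sketch, not the authors' code: it assumes the Tweet has already been POS tagged (the Stanford CoreNLP call itself is left out), and the class and method names are hypothetical. It groups tokens by the tag prefixes named on the slide: CD, NN and JJ for NP, VB for VP.

```java
import java.util.ArrayList;
import java.util.List;

final class Parser1Sketch {

    /** Returns "NP", "VP", or "OTHER" for a Penn Treebank style POS tag. */
    static String classify(String tag) {
        // NP: numbers (CD), nouns (NN, NNS, NNP, NNPS), adjectives (JJ, JJR, JJS)
        if (tag.equals("CD") || tag.startsWith("NN") || tag.startsWith("JJ")) {
            return "NP";
        }
        // VP: verbs in all forms (VB, VBD, VBG, VBN, VBP, VBZ)
        if (tag.startsWith("VB")) {
            return "VP";
        }
        return "OTHER";
    }

    /** Keeps the words of {word, tag} pairs whose tag falls in the requested group. */
    static List<String> select(String[][] taggedTokens, String group) {
        List<String> words = new ArrayList<>();
        for (String[] token : taggedTokens) {
            if (classify(token[1]).equals(group)) {
                words.add(token[0]);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Hand-tagged input; the tags are copied from the worked example in the slides.
        String[][] tagged = {
            {"Please", "VB"}, {"unsubscribe", "JJ"}, {"cool", "JJ"},
            {"club", "NN"}, {"service", "NN"}, {"my", "PRP$"}, {"number", "NN"}
        };
        System.out.println("NP: " + select(tagged, "NP"));
        System.out.println("VP: " + select(tagged, "VP"));
        System.out.println("OTHER: " + select(tagged, "OTHER"));
    }
}
```

Because the sketch follows the slide's tag list literally, the adjectives land in the NP group. The Parser 1 result shown in Fig. 2 lists them under Other, so the actual system may treat JJ differently at this stage.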

12 Fig.1 POS Tagged Tweet (Tregex Notation)
Tweet: Please unsubscribe cool club service .my number
Nouns - Club(NN), service(NN), number(NN), (CD)
Verbs - please(VB)
Other - unsubscribe(JJ), cool(JJ), my (PRP$)
Fig.2 Results from Parser 1

13 Keyword Matching
Parser 2
The Tweet is matched against a Domain Specific Keywords Corpus
The words not classified as NPs and VPs
The NPs and VPs identified from Parser 1

14 Tweet - @dialoglk Please unsubscribe cool club service .my number
Nouns - Club(NN), service(NN), number(NN), (CD)
Verbs - please(VB)
Adjectives - unsubscribe(JJ), cool(JJ)
Other - unsubscribe(JJ), cool(JJ), my (PRP$)
Fig. 3 Result from Parser 2
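
A minimal Java sketch of the keyword-matching step: words that Parser 1 left outside the NPs and VPs are looked up in the domain-specific keyword corpus, and the ones found are kept as keywords. The class name, the method name, the case-insensitive comparison, and the three-word sample corpus are all assumptions for illustration; the real corpus holds 258 words that the slides do not list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class Parser2Sketch {

    /** Returns the candidate words that occur in the keyword corpus (case-insensitive). */
    static List<String> matchKeywords(List<String> candidates, Set<String> keywordCorpus) {
        List<String> matched = new ArrayList<>();
        for (String word : candidates) {
            // The corpus is assumed to be stored in lower case.
            if (keywordCorpus.contains(word.toLowerCase(Locale.ROOT))) {
                matched.add(word);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Hypothetical excerpt of the domain-specific keyword corpus.
        Set<String> corpus = new HashSet<>(Arrays.asList("unsubscribe", "cool", "club"));
        // Words listed under "Other" after Parser 1 in the worked example.
        List<String> other = Arrays.asList("unsubscribe", "cool", "my");
        // "unsubscribe" and "cool" are recovered as keywords; "my" is not in the corpus.
        System.out.println(matchKeywords(other, corpus));
    }
}
```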

15 Rejected Words Matching
Parser 3
Removes the noise from the keywords resulting from Parser 2
Noise: the keywords which have a Levenshtein Distance of 0 with the Rejected Words Corpus
Tweet: Please unsubscribe cool club service .my number
Nouns - Club(NN), service(NN), number(NN), (CD)
Verbs - please(VB)
Adjectives - unsubscribe(JJ), cool(JJ)
Other - my (PRP$)
Fig. 4 Noise Removed by Parser 3
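
The noise-removal step can be sketched in Java as below: a standard dynamic-programming Levenshtein distance, plus a filter that drops every keyword lying within a given distance of a rejected word. The class and method names, the lower-casing before comparison, and the one-word sample of the rejected corpus are assumptions; the slides only state that a distance of 0 counts as a match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class Parser3Sketch {

    /** Levenshtein edit distance (insertions, deletions, substitutions), two-row DP. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Drops keywords within maxDistance of any rejected word (0 means exact match). */
    static List<String> removeNoise(List<String> keywords, Set<String> rejected, int maxDistance) {
        List<String> kept = new ArrayList<>();
        for (String keyword : keywords) {
            boolean isNoise = false;
            for (String word : rejected) {
                if (levenshtein(keyword.toLowerCase(Locale.ROOT), word) <= maxDistance) {
                    isNoise = true;
                    break;
                }
            }
            if (!isNoise) {
                kept.add(keyword);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical excerpt of the rejected words corpus, stored in lower case.
        Set<String> rejected = new HashSet<>(Arrays.asList("please"));
        List<String> keywords =
            Arrays.asList("Please", "unsubscribe", "cool", "club", "service", "number");
        System.out.println(removeNoise(keywords, rejected, 0));
    }
}
```

With a threshold of 0 the filter behaves like an exact lookup; keeping the distance as a parameter would let a variant of the system tolerate small misspellings, although the slides do not describe such a setting.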

16 Final Keywords List
Tweet: @dialoglk Please unsubscribe cool club service .my number
Final Keywords List = unsubscribe (JJ), cool (JJ), club (NN), service (NN), number (NN), (CD)

17 3. Implementation
Implemented using Java.
Fig. 5 GUI of the Program

18 Evaluation Methodology
Evaluated using the Turing Test [8].
“The machine to be linguistically indistinguishable from humans” [9]

19 Evaluation Methodology : Design
14 new Tweets
Keyword sets were generated by:
Humans (6 categories from different fields)
Non-modified System (Sys.A)
Modified System (Sys.B)
Human supervisors evaluated the responses

20 Calculation of the test results
n : Total number of Tweets
x : Number of Tweets where the Machine and Human answers were identical
y : Number of Tweets where the Supervisor detected the answer generated by the Machine
z : Number of Tweets where the Supervisor could not detect the answer generated by the Machine
T : Total instances where the system was successful
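
The formula itself did not survive on this slide, so the following Java sketch rests on an assumption: the system counts as successful when its answer was identical to the human one or went undetected, giving T = x + z, reported as a percentage of n. That reading reproduces the Sys.B row for Computer Science Graduates in the results (4, 9, 1 with n = 14 gives 35.71%). The class and method names are hypothetical.

```java
import java.util.Locale;

final class TuringTestScoreSketch {

    /** Assumed definition: successes are identical answers plus undetected answers. */
    static int successes(int x, int z) {
        return x + z;
    }

    /** Success rate T as a percentage of the n Tweets in the test case. */
    static double successRate(int x, int z, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return 100.0 * successes(x, z) / n;
    }

    public static void main(String[] args) {
        // Sys.B, Computer Science Graduates: x = 4, y = 9, z = 1, n = 14.
        System.out.println(String.format(Locale.ROOT, "%.2f%%", successRate(4, 1, 14))); // prints 35.71%
    }
}
```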

21 Summary of Turing Test results for Sys.A
Columns: Test Case Criteria, x, y, z, T
Academics: 14, 0.00%
English Language Experts: 2, 12, 85.71%
Undergraduates: 3, 9, 35.71%
Graduates: 8, 42.86%
Computer Science Graduates: 4, 7, 71.43%
General Public: 1, 92.86%
TABLE II Summary of Turing Test results for Sys.B
Columns: Test Case Criteria, x, y, z, T
Academics: 3, 11, 78.57%
English Language Experts: 2, 12, 85.71%
Undergraduates: 5, 7, 50.00%
Graduates: 6, 57.14%
Computer Science Graduates: 4, 9, 1, 35.71%
General Public:
Legend: Test Case Failed, Test Case Passed

22 Summary and Conclusions
The research modifies the Stanford CoreNLP with Twitter POS Tagger Model using a mix of parsers and corpora
The modified system had keyword sets identical to humans
The enhancements increase the overall Turing Test result from 50% to 83.33%
TABLE III Summary of Turing Test Results
System Tested | Test cases that passed | Test cases that failed | Success rate of the System
System without modifications | 3 | 3 | 50.00%
System with modifications | 5 | 1 | 83.33%

23 Limitations and Future Work
Limitations
The system could be evaluated with a larger population for nuanced results
Language supported is English
Future Work
Use a complete domain specific corpus to increase accuracy
Present approach could be applied to other NLP tools

24 References
[1] C. D. Manning, J. Bauer, J. Finkel, S. J. Bethard, M. Surdeanu, and D. McClosky, “The Stanford CoreNLP Natural Language Processing Toolkit,” Proc. 52nd Annu. Meet. Assoc. Comput. Linguist. Syst. Demonstr., pp. 55–60, 2014.
[2] “Welcome to Apache OpenNLP,” [Online]. Available:
[3] “emorynlp/nlp4j: NLP tools developed by Emory University,” [Online]. Available:
[4] “POS Tagging (State of the art),” [Online]. Available: [Accessed: 22-Aug-2016]
[5] K. Bontcheva, L. Derczynski, A. Funk, M. A. Greenwood, D. Maynard, and N. Aswani, “TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text,” 2013.

25 References
[6] O. Owoputi, B. O'Connor, C. Dyer, K. Gimpel, N. Schneider, and N. A. Smith, “Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters,” Proc. NAACL, 2013.
[7] L. Derczynski, A. Ritter, S. Clark, and K. Bontcheva, “Twitter part-of-speech tagging for all: Overcoming sparse and noisy data,” Proc. Recent Adv. Nat. Lang. Process., no. September, pp. 198–206, 2013.
[8] A. M. Turing, “Computing Machinery and Intelligence,” Mind, vol. 59, pp. 433–460, 1950.
[9] K. LaCurts, “Criticisms of the Turing Test and Why You Should Ignore (Most of) Them,” Official Blog of MIT’s Course: Philosophy and Theoretical Computer Science, [Online]. Available: people.csail.mit.edu/katrina/papers/6893.pdf. [Accessed: 23-Jun-2016].
*Images obtained from online sources.
