Introduction NLP Applications

Tweets
Limited to 140 characters.
Also contain images, 😊 emoji, and links.
Grammatically ambiguous.
Customer service requests are made through social media.

Present Research
A method was developed for extracting keywords from Tweets.
Essential keywords are obtained by imitating human question-answering logic.

In answering a question, humans focus on the keywords.
Example: in "What is your name?", the keywords are "your name".

NLP - Current Tools
Stanford CoreNLP [1]
OpenNLP [2]
NLP4J [3]
Highest token accuracy for POS tagging: NLP4J, 97.64% [4]

Tweets affect the token accuracy of POS taggers.
Tweet text is noisy, with linguistic errors and idiosyncratic style.
Models for POS tagging of Tweets: TwitIE [5], TweetNLP [6], Twitter POS tagger for Stanford CoreNLP [7].
Token accuracy of Stanford CoreNLP is 97.32% [4]; the Twitter POS tagger for Stanford CoreNLP recorded an accuracy of 90.5% [7].

Methodology
1. Data Collection
2. Keyword Extraction
3. Implementation

Methodology : Data Collection
Tweets from February and March 2016 were used, taken from the Dialog Axiata Twitter profile.
Keywords: words essential for the meaning of the sentence. Keyword Corpus (258 words).
Rejected: domain-specific nouns, verbs, interjections and auxiliary verbs. Rejected Words Corpus (64 words).

2. Keyword Extraction Methodology
Parser 1: Stanford CoreNLP POS Tagging with Twitter Model
Parser 2: Keyword Matching
Parser 3: Rejected Words Matching

Parser 1: Stanford CoreNLP POS Tagging with Twitter Model
The Tweet is divided into a Subject (Noun Phrase, NP) and a Predicate (Verb Phrase, VP).
NP: Numbers (CD), Nouns (NN, all forms), Adjectives (JJ, all forms)
VP: Verbs (VB, all forms)
These NP and VP elements carry the essence of the meaning.
Remaining elements:
NP: Usernames, Emoji, Hashtags, Pronouns
VP: Adverbs, Wh-adverbs, Auxiliary Verbs
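The tag grouping described for Parser 1 can be sketched as follows. This is a minimal Python illustration, not the presenters' Java implementation. It assumes the Tweet has already been tagged with Penn Treebank style tags; the function name split_by_phrase and the sample input are hypothetical, and the tagger itself is not called.

```python
# Hypothetical sketch: group already-tagged tokens by the tag families
# named on the slide. Tagged input is assumed; no tagger is invoked here.

NP_PREFIXES = ("CD", "NN", "JJ")  # numbers, nouns, adjectives (all forms)
VP_PREFIXES = ("VB",)             # verbs (all forms)


def split_by_phrase(tagged_tokens):
    """Return (np_words, vp_words, other_words) for (word, tag) pairs."""
    np_words, vp_words, other_words = [], [], []
    for word, tag in tagged_tokens:
        if tag.startswith(NP_PREFIXES):
            np_words.append(word)
        elif tag.startswith(VP_PREFIXES):
            vp_words.append(word)
        else:
            other_words.append(word)
    return np_words, vp_words, other_words


# Sample pairs chosen for illustration only.
sample = [("please", "VB"), ("club", "NN"), ("service", "NN"),
          ("number", "NN"), ("my", "PRP$")]
np_words, vp_words, other_words = split_by_phrase(sample)
```

Matching on tag prefixes is one simple way to cover "all forms" of a tag family (for example NNS or VBD) without listing every tag.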

Fig. 1 POS Tagged Tweet (Tregex Notation)
Tweet: Please unsubscribe cool club service .my number
Nouns: Club (NN), service (NN), number (NN), (CD)
Verbs: please (VB)
Other: unsubscribe (JJ), cool (JJ), my (PRP$)
Fig. 2 Results from Parser 1

Parser 2: Keyword Matching
The Tweet is matched against a Domain Specific Keywords Corpus.
Input: the words not classified as NPs and VPs, together with the NPs and VPs identified by Parser 1.
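A minimal Python sketch of the keyword matching step, under stated assumptions: matching is treated as a case-insensitive exact lookup, the function name match_keywords is hypothetical, and the corpus entries below are placeholders rather than the actual 258-word Keyword Corpus.

```python
def match_keywords(unclassified_words, keyword_corpus):
    """Return the words that appear in the domain keyword corpus.

    Assumption: a case-insensitive exact match; the slides do not
    specify the matching rule used for this parser.
    """
    corpus = {entry.lower() for entry in keyword_corpus}
    return [word for word in unclassified_words if word.lower() in corpus]


# Placeholder corpus entries, for illustration only.
keyword_corpus = {"unsubscribe", "cool", "activate"}
recovered = match_keywords(["unsubscribe", "cool", "my"], keyword_corpus)
```

With these placeholder entries, the words "unsubscribe" and "cool" are recovered as keywords while "my" is left unmatched.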

Tweet: @dialoglk Please unsubscribe cool club service .my number
Nouns: Club (NN), service (NN), number (NN), (CD)
Verbs: please (VB)
Adjectives: unsubscribe (JJ), cool (JJ)
Other: my (PRP$)
Fig. 3 Result from Parser 2

Parser 3: Rejected Words Matching
Removes the noise from the keywords resulting from Parser 2.
Noise: the keywords which have a Levenshtein distance of 0 from an entry in the Rejected Words Corpus.
Tweet: Please unsubscribe cool club service .my number
Nouns: Club (NN), service (NN), number (NN), (CD)
Verbs: please (VB)
Adjectives: unsubscribe (JJ), cool (JJ)
Other: my (PRP$)
Fig. 4 Noise Removed by Parser 3
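The rejected-word filter can be sketched in Python as below. This is an illustration under assumptions, not the presenters' Java code: the function names are hypothetical, comparison is assumed to be case-insensitive, and the single corpus entry is a placeholder rather than the actual 64-word Rejected Words Corpus.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def remove_rejected(keywords, rejected_corpus, max_distance=0):
    """Drop keywords within max_distance of any rejected-corpus entry."""
    return [k for k in keywords
            if all(levenshtein(k.lower(), r.lower()) > max_distance
                   for r in rejected_corpus)]


rejected_corpus = ["please"]  # placeholder entry, assumed for illustration
kept = remove_rejected(
    ["please", "unsubscribe", "cool", "club", "service", "number"],
    rejected_corpus)
```

A distance of 0 is the same as an exact string match; keeping the distance as a parameter would allow near-misses such as misspellings to be rejected as well, which is a design option and not something the slides claim.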

Tweet: @dialoglk Please unsubscribe cool club service .my number
Final Keywords List = unsubscribe (JJ), cool (JJ), club (NN), service (NN), number (NN), (CD)

3. Implementation
Implemented using Java.
Fig. 5 GUI of the Program

Evaluation Methodology
Evaluated using the Turing Test [8].
The test requires "the machine to be linguistically indistinguishable from humans" [9].

Evaluation Methodology : Design
14 new Tweets were used.
Keyword sets were generated by:
Humans (6 categories from different fields)
Non-modified System (Sys.A)
Modified System (Sys.B)
Human supervisors evaluated the responses.

Calculation of the test results
n: Total number of Tweets
x: Machine and Human answers were identical
y: Supervisor detected the answer generated by the machine
z: Supervisor could not detect the answer generated by the machine
T: Total instances where the system was successful
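A minimal Python sketch of how T could be computed from these counts. It assumes that a Tweet counts as a success when the answers were identical (x) or the supervisor could not detect the machine (z), and that T is reported as a percentage of n = x + y + z. The slides do not state the formula; this reading is consistent with the Computer Science Graduates row for Sys.B in the results (4, 9, 1 and 35.71%). The function name is hypothetical.

```python
def success_percentage(x, y, z):
    """Percentage of Tweets on which the system was successful.

    Assumption: successes are x (identical answers) plus z (machine
    not detected), out of n = x + y + z Tweets.
    """
    n = x + y + z
    if n == 0:
        raise ValueError("at least one Tweet is required")
    return 100.0 * (x + z) / n


# Counts taken from the Computer Science Graduates row for Sys.B.
t = success_percentage(4, 9, 1)
print(f"{t:.2f}%")  # prints 35.71%
```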

Summary of Turing Test results for Sys.A
Test Case Criteria; x, y, z; T
Academics; 14; 0.00%
English Language Experts; 2, 12; 85.71%
Undergraduates; 3, 9; 35.71%
Graduates; 8; 42.86%
Computer Science Graduates; 4, 7; 71.43%
General Public; 1; 92.86%

TABLE II Summary of Turing Test results for Sys.B
Test Case Criteria; x, y, z; T
Academics; 3, 11; 78.57%
English Language Experts; 2, 12; 85.71%
Undergraduates; 5, 7; 50.00%
Graduates; 6; 57.14%
Computer Science Graduates; 4, 9, 1; 35.71%
General Public

Legend: Test Case Failed, Test Case Passed

Summary and Conclusions
The research modifies Stanford CoreNLP with the Twitter POS Tagger Model using a mix of parsers and corpora.
The modified system had keyword sets identical to humans.
The enhancements increase the overall Turing Test result from 50% to 83.33%.

TABLE III Summary of Turing Test Results
System Tested; Test cases that passed; Test cases that failed; Success rate of the System
System without modifications; 3; 3; 50.00%
System with modifications; 5; 1; 83.33%
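The overall success rates in Table III follow from the pass and fail counts over the 6 test cases. A small Python check, assuming the rate is simply passed test cases divided by all test cases (the function name is hypothetical):

```python
def system_success_rate(passed, failed):
    """Share of test cases passed, as a percentage."""
    return 100.0 * passed / (passed + failed)


print(f"{system_success_rate(3, 3):.2f}%")  # prints 50.00%
print(f"{system_success_rate(5, 1):.2f}%")  # prints 83.33%
```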

Limitations
The system could be evaluated with a larger population for nuanced results.
Language supported is English.

Future Work
Use a complete domain specific corpus to increase accuracy.
Present approach could be applied to other NLP tools.

References
[1] C. D. Manning, J. Bauer, J. Finkel, S. J. Bethard, M. Surdeanu, and D. McClosky, “The Stanford CoreNLP Natural Language Processing Toolkit,” Proc. 52nd Annu. Meet. Assoc. Comput. Linguist. Syst. Demonstr., pp. 55–60, 2014.
[2] “Welcome to Apache OpenNLP,” [Online]. Available:
[3] “emorynlp/nlp4j: NLP tools developed by Emory University,” [Online]. Available:
[4] “POS Tagging (State of the art),” [Online]. Available: [Accessed: 22-Aug-2016]
[5] K. Bontcheva, L. Derczynski, A. Funk, M. A. Greenwood, D. Maynard, and N. Aswani, “TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text,” 2013.
[6] O. Owoputi, B. O'Connor, C. Dyer, K. Gimpel, N. Schneider, and N. A. Smith, “Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters,” Proc. NAACL, 2013.
[7] L. Derczynski, A. Ritter, S. Clark, and K. Bontcheva, “Twitter part-of-speech tagging for all: Overcoming sparse and noisy data,” Proc. Recent Adv. Nat. Lang. Process., no. September, pp. 198–206, 2013.
[8] A. M. Turing, “Computing Machinery and Intelligence,” Mind, vol. 59, pp. 433–460, 1950.
[9] K. Lacurts, “Criticisms of the Turing Test and Why You Should Ignore (Most of) Them,” Official Blog of MIT’s Course: Philosophy and Theoretical Computer Science, [Online]. Available: people.csail.mit.edu/katrina/papers/6893.pdf. [Accessed: 23-Jun-2016].
*Images obtained from online sources.
