Lie Detection using NLP Techniques

Slides:



Advertisements
Similar presentations
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Advertisements

Deema Abdal Hafeth MSc student by research School of Computer Science, University of Lincoln Dr Amr Ahmed Supervisor Dr David Cobham supervisor.
Original Figures for "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring"
Psychometric Aspects of Linking Tests to the CEF Norman Verhelst National Institute for Educational Measurement (Cito) Arnhem – The Netherlands.
Farag Saad i-KNOW 2014 Graz- Austria,
Sentiment Analysis An Overview of Concepts and Selected Techniques.
A Survey on Text Categorization with Machine Learning Chikayama lab. Dai Saito.
A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts 04 10, 2014 Hyun Geun Soo Bo Pang and Lillian Lee (2004)
Software Quality Ranking: Bringing Order to Software Modules in Testing Fei Xing Michael R. Lyu Ping Guo.
1. Abstract 2 Introduction Related Work Conclusion References.
Assuming normally distributed data! Naïve Bayes Classifier.
Confidence Estimation for Machine Translation J. Blatz et.al, Coling 04 SSLI MTRG 11/17/2004 Takahiro Shinozaki.
Data Mining: A Closer Look Chapter Data Mining Strategies (p35) Moh!
Defect prediction using social network analysis on issue repositories Reporter: Dandan Wang Date: 04/18/2011.
Linguistic Credibility Assessment. Emma – general comments on language Matt – tools for linguistic analysis Mary – case study.
Facial Feature Detection
Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews K. Dave et al, WWW 2003, citations Presented by Sarah.
Identifying Computer Graphics Using HSV Model And Statistical Moments Of Characteristic Functions Xiao Cai, Yuewen Wang.
Masquerade Detection Mark Stamp 1Masquerade Detection.
1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute.
Short Introduction to Machine Learning Instructor: Rada Mihalcea.
Improving Utterance Verification Using a Smoothed Na ï ve Bayes Model Reporter : CHEN, TZAN HWEI Author :Alberto Sanchis, Alfons Juan and Enrique Vidal.
Automatically Identifying Localizable Queries Center for E-Business Technology Seoul National University Seoul, Korea Nam, Kwang-hyun Intelligent Database.
Presented by Tienwei Tsai July, 2005
2007. Software Engineering Laboratory, School of Computer Science S E Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying.
Marcin Marszałek, Ivan Laptev, Cordelia Schmid Computer Vision and Pattern Recognition, CVPR Actions in Context.
Transcription of Text by Incremental Support Vector machine Anurag Sahajpal and Terje Kristensen.
Bayesian Extension to the Language Model for Ad Hoc Information Retrieval Hugo Zaragoza, Djoerd Hiemstra, Michael Tipping Presented by Chen Yi-Ting.
GA-Based Feature Selection and Parameter Optimization for Support Vector Machine Cheng-Lung Huang, Chieh-Jen Wang Expert Systems with Applications, Volume.
One-class Training for Masquerade Detection Ke Wang, Sal Stolfo Columbia University Computer Science IDS Lab.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Instance Filtering for Entity Recognition Advisor : Dr.
1 Pattern Recognition Pattern recognition is: 1. A research area in which patterns in data are found, recognized, discovered, …whatever. 2. A catchall.
PIER Research Methods Protocol Analysis Module Hua Ai Language Technologies Institute/ PSLC.
Prediction of Influencers from Word Use Chan Shing Hei.
CISC Machine Learning for Solving Systems Problems Presented by: Ashwani Rao Dept of Computer & Information Sciences University of Delaware Learning.
TEXT ANALYTICS - LABS Maha Althobaiti Udo Kruschwitz Massimo Poesio.
Automatic Identification of Pro and Con Reasons in Online Reviews Soo-Min Kim and Eduard Hovy USC Information Sciences Institute Proceedings of the COLING/ACL.
Visual Categorization With Bags of Keypoints Original Authors: G. Csurka, C.R. Dance, L. Fan, J. Willamowski, C. Bray ECCV Workshop on Statistical Learning.
Recognizing Stances in Ideological Online Debates.
USE RECIPE INGREDIENTS TO PREDICT THE CATEGORY OF CUISINE Group 7 – MEI, Yan & HUANG, Chenyu.
Date: 2015/11/19 Author: Reza Zafarani, Huan Liu Source: CIKM '15
Creating Subjective and Objective Sentence Classifier from Unannotated Texts Janyce Wiebe and Ellen Riloff Department of Computer Science University of.
Exploring in the Weblog Space by Detecting Informative and Affective Articles Xiaochuan Ni, Gui-Rong Xue, Xiao Ling, Yong Yu Shanghai Jiao-Tong University.
A DYNAMIC APPROACH TO THE SELECTION OF HIGH ORDER N-GRAMS IN PHONOTACTIC LANGUAGE RECOGNITION Mikel Penagarikano, Amparo Varona, Luis Javier Rodriguez-
Hello, Who is Calling? Can Words Reveal the Social Nature of Conversations?
Comparative Experiments on Sentiment Classification for Online Product Reviews Hang Cui, Vibhu Mittal, and Mayur Datar AAAI 2006.
GENDER AND AGE RECOGNITION FOR VIDEO ANALYTICS SOLUTION PRESENTED BY: SUBHASH REDDY JOLAPURAM.
Jen-Tzung Chien, Meng-Sung Wu Minimum Rank Error Language Modeling.
Application of latent semantic analysis to protein remote homology detection Wu Dongyin 4/13/2015.
Using Linguistic Analysis and Classification Techniques to Identify Ingroup and Outgroup Messages in the Enron Corpus.
Divided Pretreatment to Targets and Intentions for Query Recommendation Reporter: Yangyang Kang /23.
Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine 朱林娇 14S
Object Recognition as Ranking Holistic Figure-Ground Hypotheses Fuxin Li and Joao Carreira and Cristian Sminchisescu 1.
Lexical Affect Sensing: Are Affect Dictionaries Necessary to Analyze Affect? Alexander Osherenko, Elisabeth André University of Augsburg.
Virtual Examples for Text Classification with Support Vector Machines Manabu Sassano Proceedings of the 2003 Conference on Emprical Methods in Natural.
KAIST TS & IS Lab. CS710 Know your Neighbors: Web Spam Detection using the Web Topology SIGIR 2007, Carlos Castillo et al., Yahoo! 이 승 민.
Combining Evolutionary Information Extracted From Frequency Profiles With Sequence-based Kernels For Protein Remote Homology Detection Name: ZhuFangzhi.
A Kernel Approach for Learning From Almost Orthogonal Pattern * CIS 525 Class Presentation Professor: Slobodan Vucetic Presenter: Yilian Qin * B. Scholkopf.
 Hailey Maurer and Liya Zalaltdinova Lying Words: Predicting Deception From Linguistic Styles by Matthew L. Newman, James W. Pennebaker, Diane S. Berry.
Describing a Score’s Position within a Distribution Lesson 5.
Introduction to Classification & Clustering Villanova University Machine Learning Lab Module 4.
Multi-Class Sentiment Analysis with Clustering and Score Representation Yan Zhu.
An Effective Statistical Approach to Blog Post Opinion Retrieval Ben He, Craig Macdonald, Jiyin He, Iadh Ounis (CIKM 2008)
Detection of Misinformation on Online Social Networking
A Simple Approach for Author Profiling in MapReduce
Queensland University of Technology
Introduction to Classification & Clustering
iSRD Spam Review Detection with Imbalanced Data Distributions
Presentation transcript:

Lie Detection using NLP Techniques Swapnil Ghuge (10305907) Jyotesh Choudhari (113059001)

This seminar is mainly based on the short paper “The Lie Detector: Explorations in the Automatic Recognition of Deceptive Language” by Rada Mihalcea and Carlo Strapparava, published in ACL-2009 Lie Detection using NLP Techniques

Outline Introduction Approach Corpus Experiments and Results Identifying Salient Features of Deceptive Text Conclusion Lie Detection using NLP Techniques

Introduction Discrimination between truth and falsehood in psychology, philosophy and sociology From psychological aspect, human behavior, gesture, repetition of words, stammering, eye contact are important cues to detect lies Can we apply computational approaches to recognize deceptive language in written text? Lie Detection using NLP Techniques

Introduction Objective is to answer following questions: Are truthful and lying texts separable? Does this property hold for different datasets? Can computational approaches be used to separate truthful and lying texts? What are the distinct features of deceptive text? Lie Detection using NLP Techniques

Approach Statistical approach Classes – Truth and Falsehood Classifier – Naïve Bayes and SVM Features Corpus Lie Detection using NLP Techniques

Corpus 3 Datasets True and lying opinions on Abortion Death Penalty Best Friends 100 true and 100 false statements in every dataset. Lie Detection using NLP Techniques

Experiments and Results Used two classifiers Naïve Bayes SVM Minimal preprocessing – tokenization and stemming Topic Naïve Bayes SVM ABORTION 70% 67.5% DEATH PENALTY 67.4% 65.9% BEST FRIEND 75.0% 77.0% AVERAGE 70.8% 70.1% Table 1: Ten-fold cross validation classification using Naïve Bayes(NB) or Support Vector Machine(SVM) Lie Detection using NLP Techniques

Experiments and Results Figure 1: Classification Learning Curves Lie Detection using NLP Techniques

Experiments and Results Training Test Naïve Bayes SVM DEATH PENALTY + BEST FRIENDS ABORTION 62.0% 61.0% ABORTION + BEST FRIENDS DEATH PENALTY 58.7% DEATH PENALTY + ABORTION BEST FRIENDS 53.6% AVERAGE 59.8% 57.8% Table 2: Cross-topic classification results Lie Detection using NLP Techniques

Have we achieved it yet? Objective is to answer following questions: Are truthful and lying text separable? - YES Does this property hold for different datasets? - YES Can computational approaches be used to separate truthful and lying texts? - YES What are the distinct features of deceptive text? Lie Detection using NLP Techniques

Identifying Salient Features To understand the characteristics of the deceptive text, calculate a score for a given class of words Class coverage in the deceptive corpus D: Class coverage in the truthful corpus T: Lie Detection using NLP Techniques

Identifying Salient Features Dominance of word class C in deceptive corpus D: Dominance higher than 1 indicates word class C is dominant in deceptive texts. Dominance less than 1 states that word class C is more likely to appear in truthful texts. Dominance score close to 1 shows similar distribution of word class C in both the deceptive and truthful corpus. Lie Detection using NLP Techniques

Identifying Salient Features Word classes obtained from Linguistic Inquiry and Word Count (LIWC) by (Pennebaker and Francis, 1999). Text Class Score Sample Words Deceptive METAPH 1.71 God, mercy, sin, hell OTHER 1.47 She, him, themselves CERTAIN 1.24 Always, truly, completely Truthful OPTIM 0.57 Best, determined, accept I 0.59 I, myself, mine INSIGHT 0.65 Believe, understand, admit Table 3: Dominant Word Classes along with sample words Lie Detection using NLP Techniques

Conclusion Explored computational approach to recognize deceptive language in written text. Truthful and false texts are separable and this property holds for different datasets. Salient features show interesting pattern of uses of words in deceptive texts including detachment from self and emphasizing certainty. Lie Detection using NLP Techniques

References Rada Mihalcea, Carlo Strapparava. The Lie Detector: Explorations in the Automatic Recognition of Deceptive Language. In Proceedings of ACL-IJCNLP Conference Short Papers 2009. J. Pennebaker, M. Francis. 1999. Linguistic Inquiry and Word Count: LIWC. Erlbaum Publishers. Lie Detection using NLP Techniques

Ask no questions and you will hear no lies. -James Joyce, Ullysses Lie Detection using NLP Techniques