Lexical Differences in Autobiographical Narratives from Schizophrenic Patients and Healthy Controls
Ani Nenkova 1, Kai Hong 1, Amber A. Parker 2, Mary March 3, Christian G. Kohler 3

Introduction & Purpose of Research
- Lexical differences have not been investigated in detail.
- Evaluate differences in how persons with schizophrenia (SZ) describe autobiographical experiences of emotional events in a standardized setting, compared with healthy controls (CO).

Data: Vignettes from 39 Subjects
- 39 subjects, matched for age, gender and ethnicity
- SZ: n=23, 115 stories; CO: n=16, 79 stories
- Five emotions: HAPPY, SAD, ANGER, FEAR and DISGUST
- Subjects were asked to produce a narrative lasting 30-90 sec.
- Narratives were transcribed and checked by 2 transcribers.

Demographic information
Gender: Male (19), Female (20)
Age: 33.52 (Schizophrenia 32.18, Control 32.19)
Race: Black (23), White (12), Asian (1), Hybrid (3)

Classification Approach
- SVM-light as the classifier
- Leave-one-subject-out evaluation (39 times)
- Voting across stories for the subject-level decision

Basic Features and Lexical Features
Basic features: general properties of language without focus on specific words.
Lexical features:
- Type 1: analysis of specific words
- Type 2: word analysis plus repetition of words
Repetition of a word: the word has already appeared within a window of 5 before it.

Significance levels: ** [0, 0.001], * (0.001, 0.01], ^ (0.01, 0.05]

Basic significant features (p-value <= 0.05)
SZ: Sentences/Document^, Repetition of words**
CO: Letters/Word**, Words/Sentence**, Repetition of punctuation**

Type 1 significant features (p-value <= 0.05)
SZ: I**, couldn’t**, extremely**, mildly**, money**, feeling*, moderately*, my*, took*, way*, mom^, friends^, dog^, trouble^, god^, loved^, son^, guy^, before^, alone^, mental^, hearing^, met^, passed^, hand^
CO: “,”**, sorry*, very*, really*, basically^, relationship^, She’s^, actually^

Type 2 significant features (p-value <= 0.05)
SZ: Rep-and**, Rep-um**, Rep-I**, Rep-a^, Rep-was^
CO: Rep-“,”**, Rep-very*

Linguistic Inquiry and Word Count & Diction
Linguistic Inquiry & Word Count (LIWC) (Pennebaker, 2007) calculates the degree to which people use different categories of words. Each word or word stem can be in one or more word categories or sub-dictionaries.

LIWC significant features (p-value < 0.05)
SZ: Self (I)**, personal pronoun^, insight^
CO: Adverbs**, Exclusive**, Inhibition^, Complex-words^

LIWC significant categories (number of words included and examples)
- insight (195): think, know, consider.
- exclusive words (17): but, without, exclusive.
- inhibition (111): block, constrain, stop.
- complex-words: defined as words of more than 6 letters (sixltr).

Diction (Hart, 2005) analyzes the lexical characteristics of the transcripts. Similar to LIWC, Diction scores are computed with reference to manually compiled dictionaries.

Diction significant features (p-value < 0.05)
SZ: Self**, Cognition^, Past, Insistence^, Satisfaction^
CO: Realism**, Diversity**, Complexity^, Familiarity^, Cooperation^, Certainty^

Some significant Diction categories and definitions
- Cognition: words referring to cerebral processes.
- Satisfaction: terms associated with positive affective states.
- Insistence: a measure of code-restriction, indicating a preference for a limited, ordered world.
- Diversity: words describing individuals or groups of individuals differing from the norm.
- Familiarity: the most common words in English.
- Certainty: words indicating resoluteness and inflexibility.

Number of Features for Best Performance
Performance changing with # features:
- Best performance: 25 features
- Accuracy dropped with more than 50 features
- Less confident features add noise

                Schizophrenia          Control                General
           P(%)   R(%)   F(%)     P(%)   R(%)   F(%)     Accuracy   MacroF
Story      68.7   75.0   71.7     57.1   49.4   52.9     64.7       62.3
Subject    75.0   91.3   82.4     81.8   56.3   66.7     76.9       74.6

P = Precision, R = Recall, F = 2*P*R / (P+R) = F-measure

Best performance analysis: Sensitivity (SZ correctly identified): 91.3%, Specificity (CO correctly identified): 56.3%, Accuracy: 76.9% (30 of 39)
Incorrect classification: SZ 2/23, CO 7/16

Status Prediction by Emotion Classification and Feature Selection
Figure: # features at different p-value thresholds for the 5 emotions.
Feature selection using Signal-to-Noise (S2N): S2N is about 2% better than p-value selection at best performance.

Threshold         Happy   Disgust   Anger   Fear   Sad    Story   Patient
S2N (25)          66.7    63.4      61.0    60.0   72.5   64.7    76.9
P-value (0.05)    59.0    61.0      70.7    55.0   60.0   62.9    64.1
P-value (0.001)   71.8    51.2      70.7    67.5   -      65.7    74.4

Distinguishing power: Anger > Sad > Happy > Disgust > Fear

Different Emotions
- HAPPY: SZ talk more about family and show a higher tendency toward ambivalence.
- DISGUST: SZ express more disgust about dogs and health. CO have a higher communication score, indicating better social interaction.
- ANGER: SZ show more aggression and cognition.
- FEAR: SZ talk about money. CO use more inhibition words.
- SAD: SZ show more satisfaction and insistence. CO include more working experiences.

Conclusions and future work
- 25 features are enough to get top performance (65%, 77%)
  - p-value feature selection
  - signal-to-noise feature selection
- Analyze the power of different features
  - Basic features, lexical features, repetitions
  - LIWC, Diction
- Different emotions have different distinguishing power
  - sad, anger > happy > fear, disgust

References
- Michael A. Covington et al. 2005. Schizophrenia and the structure of language: The linguist’s view.
- Roderick Hart. 2005. Diction 6.0.
- J. Pennebaker. 2007. Linguistic Inquiry and Word Count 2007.
- Keyur Gabani et al. 2009. A corpus-based approach for the prediction of language impairment in monolingual English and Spanish-English bilingual children. In HLT-NAACL.
- Thamar Solorio et al. 2011a. Analyzing language samples of Spanish-English bilingual children for the automated prediction of language dominance. Natural Language Engineering.
- Peter A. Heeman et al. 2010. Autism and interactional aspects of dialogue. In Raquel Fernández, Yasuhiro Katagiri, Kazunori Komatani, Oliver Lemon, and Mikio Nakano, editors, SIGDIAL Conference.
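
The subject-level scores reported above can be checked with a few lines of arithmetic. The sketch below assumes that 21 of 23 SZ subjects and 9 of 16 CO subjects were classified correctly; these counts are not stated directly on the poster but are inferred from the reported 91.3%, 56.3% and "30 of 39" figures. The function name `prf` is ours.

```python
def prf(tp, fp, fn):
    """Return precision, recall and F = 2*P*R / (P+R) for one class."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

# Assumed counts, inferred from the reported percentages (see lead-in).
sz_correct, sz_total = 21, 23
co_correct, co_total = 9, 16
sz_wrong = sz_total - sz_correct   # SZ subjects labelled CO
co_wrong = co_total - co_correct   # CO subjects labelled SZ

sz_prf = prf(tp=sz_correct, fp=co_wrong, fn=sz_wrong)
co_prf = prf(tp=co_correct, fp=sz_wrong, fn=co_wrong)
accuracy = (sz_correct + co_correct) / (sz_total + co_total)
macro_f = (sz_prf[2] + co_prf[2]) / 2

for name, (p, r, f) in (("SZ", sz_prf), ("CO", co_prf)):
    print(f"{name}: P={100 * p:.2f} R={100 * r:.2f} F={100 * f:.2f}")
print(f"accuracy={100 * accuracy:.2f} macroF={100 * macro_f:.2f}")
```

Under these assumed counts the precision, recall, F and accuracy values agree with the Subject row to within rounding. The macro F comes out slightly below the reported 74.6 when computed from unrounded F values; averaging the already rounded 82.4 and 66.7 gives 74.55, which is consistent with the reported figure.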
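
The poster names signal-to-noise (S2N) feature selection but does not give its formula. A minimal sketch follows, assuming the commonly used definition (difference of class means divided by the sum of class standard deviations) and ranking by absolute score. The function names, the toy data and this exact formula are assumptions, not the authors' implementation.

```python
from statistics import mean, stdev

def s2n_scores(rows, labels):
    """Score each feature column as (mean_SZ - mean_CO) / (sd_SZ + sd_CO).

    rows: list of equal-length feature vectors; labels: "SZ" or "CO" per row.
    The formula is an assumed, commonly used S2N definition.
    """
    scores = []
    for j in range(len(rows[0])):
        sz = [r[j] for r, y in zip(rows, labels) if y == "SZ"]
        co = [r[j] for r, y in zip(rows, labels) if y == "CO"]
        denom = stdev(sz) + stdev(co)
        # A constant feature carries no signal; score it 0 to avoid dividing by 0.
        scores.append((mean(sz) - mean(co)) / denom if denom > 0 else 0.0)
    return scores

def top_k_features(rows, labels, k=25):
    """Return indices of the k features with the largest absolute S2N score."""
    scores = s2n_scores(rows, labels)
    order = sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)
    return order[:k]

# Toy data (hypothetical): three features, two documents per group.
rows = [[5.0, 1.0, 0.2], [6.0, 2.0, 0.1], [1.0, 1.5, 0.3], [2.0, 2.5, 0.2]]
labels = ["SZ", "SZ", "CO", "CO"]
print(top_k_features(rows, labels, k=2))
```

The default `k=25` mirrors the feature count the poster reports as giving the best performance. In a leave-one-subject-out setting the ranking would normally be recomputed on each training fold so that the held-out subject does not influence selection; the poster does not say whether this was done.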
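
The evaluation protocol (leave-one-subject-out with voting) can be sketched as follows. The poster uses SVM-light; here a nearest-centroid classifier stands in for it purely so the example is self-contained. All function names and the toy data are hypothetical, and resolving tied votes by first-seen label is a simplification of ours.

```python
from collections import Counter

def train_centroid(train_set):
    """Stand-in for the SVM: compute one mean feature vector per label."""
    sums, counts = {}, Counter()
    for x, y in train_set:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_centroid(model, x):
    """Predict the label whose centroid is closest in squared distance."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))

def leave_one_subject_out(stories, train, predict):
    """stories: list of (subject_id, features, label).

    For each subject, train on all other subjects' stories, classify the
    held-out subject's stories, and take the majority vote as that
    subject's predicted status.
    """
    predictions = {}
    for held_out in sorted({s for s, _, _ in stories}):
        model = train([(x, y) for s, x, y in stories if s != held_out])
        votes = Counter(predict(model, x) for s, x, _ in stories if s == held_out)
        predictions[held_out] = votes.most_common(1)[0][0]
    return predictions

# Toy data (hypothetical): one feature per story, three stories per subject.
stories = [
    ("s1", [5.0], "SZ"), ("s1", [6.0], "SZ"), ("s1", [5.5], "SZ"),
    ("s2", [6.0], "SZ"), ("s2", [5.0], "SZ"), ("s2", [2.0], "SZ"),
    ("c1", [1.0], "CO"), ("c1", [2.0], "CO"), ("c1", [1.5], "CO"),
    ("c2", [2.0], "CO"), ("c2", [1.0], "CO"), ("c2", [1.2], "CO"),
]
result = leave_one_subject_out(stories, train_centroid, predict_centroid)
print(result)
```

In the toy data one of subject `s2`'s stories falls closer to the control centroid, yet the majority vote still assigns the subject to SZ. This illustrates how subject-level accuracy can exceed story-level accuracy, as in the reported results.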
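
The repetition feature is defined above as a word that has already appeared within a window of 5 before it. One possible reading of that definition is sketched below, assuming the window counts the 5 preceding tokens and that tokens are already lowercased and split. The function name and the sample sentence are ours.

```python
def count_repetitions(tokens, window=5):
    """Count tokens that also occur among the `window` tokens just before them."""
    count = 0
    for i, tok in enumerate(tokens):
        if tok in tokens[max(0, i - window):i]:
            count += 1
    return count

# Hypothetical transcript fragment, already tokenized.
tokens = "i um i went and and um then i".split()
print(count_repetitions(tokens))
```

Restricting the count to a single token, for example only `and` or `um`, would give per-word features in the spirit of the Rep-and and Rep-um entries listed among the Type 2 features.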