Author Age Prediction from Text using Linear Regression. Dong Nguyen, Noah A. Smith, Carolyn P. Rose


Introduction
– Frame author age prediction from text as a regression problem.
– Use a multi-corpus approach: blogs, telephone conversations, and online forum posts.
– Investigate age prediction with age modeled as a continuous variable.

Data description
Fisher telephone corpus
Blog corpus
Breast cancer forum
– Information such as gender and age was indicated.
– Every document consists of all posts from a particular user.

Experiment Linear regression
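
As a rough illustration of treating age as a continuous target, the sketch below fits an ordinary least-squares linear regression with NumPy. The feature matrix, the ages, and the helper names `fit_linear_regression` and `predict_age` are hypothetical and chosen only for illustration. The slides do not say how the model was fitted or regularized, so this is not a reproduction of the authors' setup.

```python
import numpy as np

# Hypothetical toy data (not from the slides): each row is one author,
# each column one feature value; ages are in years.
X = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [2.0, 1.0]])
ages = np.array([20.0, 30.0, 35.0, 50.0])


def fit_linear_regression(features, targets):
    """Ordinary least squares with an intercept; returns (weights, bias)."""
    # Append a column of ones so the last coefficient acts as the intercept.
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef[:-1], coef[-1]


def predict_age(features, weights, bias):
    """Predicted age is a weighted sum of the features plus the intercept."""
    return features @ weights + bias


weights, bias = fit_linear_regression(X, ages)
predictions = predict_age(X, weights, bias)

# Mean absolute error in years, one common way to score a regression model.
mae = float(np.mean(np.abs(predictions - ages)))
```

Because the output is a real number rather than an age bracket, errors can be reported directly in years.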

Experiment JOINT Model:

Experiment Overview of the different models
– INDIV: Models trained on the three corpora individually.
– JOINT: Model trained on all three corpora with features represented.
– JOINT-Global: Uses the learned JOINT model but keeps only the global features.
– JOINT-Global-Retrained: Uses the global features discovered by the JOINT model, but retrained on each specific dataset.
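
The slide does not spell out how features are represented in the JOINT model. One plausible reading, assumed here rather than confirmed by the slides, is a feature-augmentation scheme in which every feature gets a global copy shared across corpora and a corpus-specific copy. The sketch below shows that idea. The corpus labels and the helper names `augment_features` and `keep_global` are hypothetical.

```python
def augment_features(features, corpus, corpora=("blogs", "fisher", "forum")):
    """Copy each feature into a global version and a corpus-specific version.

    Assumption: this mirrors a standard feature-augmentation approach to
    domain adaptation; the slides do not confirm the exact representation.
    """
    if corpus not in corpora:
        raise ValueError(f"unknown corpus: {corpus}")
    augmented = {}
    for name, value in features.items():
        augmented[f"global:{name}"] = value      # shared across all corpora
        augmented[f"{corpus}:{name}"] = value    # active only for this corpus
    return augmented


def keep_global(weights):
    """Drop corpus-specific weights, keeping only the global ones."""
    return {name: w for name, w in weights.items() if name.startswith("global:")}


# A blog document with two word counts becomes four augmented features.
doc = augment_features({"the": 3, "lol": 1}, corpus="blogs")

# Filtering a (hypothetical) learned weight vector down to its global part.
global_only = keep_global({"global:the": 0.5, "blogs:the": -0.2})
```

Under this reading, JOINT-Global corresponds to applying `keep_global` to the learned weights, and JOINT-Global-Retrained to refitting on one corpus using only the features that survived that filter.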

Experiment Features
– Gender: binary feature (Male=1, Female=0)
– Textual features:
  Unigrams
  POS unigrams and bigrams
  LIWC (Linguistic Inquiry and Word Count): a word-counting program that captures word classes such as inclusion words (LIWC-incl: "with," "and," "include," etc.), causation words (LIWC-cause: "because," "hence," etc.), and stylistic characteristics such as the percentage of words longer than 6 letters (LIWC-Sixltr).
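
To make the feature list concrete, the sketch below builds a small feature dictionary for one document: the binary gender feature, unigram counts, and a rough stand-in for LIWC-Sixltr computed as the percentage of tokens longer than six letters. It does not use the actual LIWC program or its dictionaries, it omits the POS features (which would need a tagger), and the tokenizer is deliberately crude. The function name `extract_features` and the feature keys are hypothetical.

```python
import re
from collections import Counter


def extract_features(text, gender):
    """Return a feature dictionary for one document (illustrative only)."""
    # Crude tokenizer: lowercase runs of ASCII letters.
    tokens = re.findall(r"[a-z]+", text.lower())

    # Unigram counts, prefixed to keep them apart from other feature types.
    features = Counter(f"unigram:{tok}" for tok in tokens)

    # Binary gender feature, following the slide's coding (Male=1, Female=0).
    features["gender"] = 1 if gender == "male" else 0

    # Approximation of LIWC-Sixltr: percentage of words longer than 6 letters.
    long_words = sum(1 for tok in tokens if len(tok) > 6)
    features["sixltr_pct"] = 100.0 * long_words / len(tokens) if tokens else 0.0

    return dict(features)


feats = extract_features(
    "Because the weather was wonderful, we went with them.", "female"
)
```

A dictionary like this can be turned into a row of a feature matrix once a fixed vocabulary of feature names has been chosen.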

Results and discussion

Reference
Author Age Prediction from Text using Linear Regression. Dong Nguyen, Noah A. Smith, Carolyn P. Rose.