Author Age Prediction from Text using Linear Regression
Dong Nguyen, Noah A. Smith, Carolyn P. Rose
Introduction
– Frames author age prediction from text as a regression problem.
– Uses a multi-corpus approach: blogs, telephone conversations, and online forum posts.
– Investigates age prediction with age modeled as a continuous variable.
Data description
– Fisher telephone corpus
– Blog corpus
– Breast cancer forum
  – Information such as gender and age was indicated.
  – Every document consists of all posts from a particular user.
Experiment Linear regression
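The slide names only the model family. As a minimal sketch of what linear regression means here, the predicted age is a bias term plus a weighted sum of the document's feature values, with the weights chosen to minimize squared error on the training documents. The function names, the gradient-descent fitting procedure, the hyperparameters, and the absence of regularization are all assumptions made for illustration; the slides do not say how the model was actually fit.

```python
def predict(weights, bias, features):
    """Predicted age: bias plus the weighted sum of the feature values."""
    return bias + sum(weights.get(name, 0.0) * value
                      for name, value in features.items())


def fit_linear_regression(docs, ages, epochs=2000, learning_rate=0.05):
    """Fit weights by batch gradient descent on the average squared error.

    Hypothetical helper: each document is a dict mapping feature names to
    values. The constant factor 2 of the gradient is folded into the
    learning rate.
    """
    weights, bias = {}, 0.0
    n = len(docs)
    for _ in range(epochs):
        grad_w, grad_b = {}, 0.0
        for features, age in zip(docs, ages):
            error = predict(weights, bias, features) - age
            grad_b += error
            for name, value in features.items():
                grad_w[name] = grad_w.get(name, 0.0) + error * value
        bias -= learning_rate * grad_b / n
        for name, grad in grad_w.items():
            weights[name] = weights.get(name, 0.0) - learning_rate * grad / n
    return weights, bias


# Toy data (invented): age = 20 + 10 * x for a single feature "x".
toy_docs = [{"x": 0.0}, {"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
toy_ages = [20.0, 30.0, 40.0, 50.0]
toy_weights, toy_bias = fit_linear_regression(toy_docs, toy_ages)
```

Because age is treated as a continuous target, the output of `predict` is a real number rather than an age-group label.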
Experiment JOINT Model:
Experiment
Overview of the different models
– INDIV: models trained on each of the three corpora individually.
– JOINT: a single model trained on all three corpora, with features represented in global and corpus-specific versions.
– JOINT-Global: the learned JOINT model, keeping only the global features.
– JOINT-Global-Retrained: the global features discovered by the JOINT model, with weights retrained on each specific dataset.
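One way to picture the difference between these variants is the sketch below. It assumes that the JOINT representation gives every feature a global copy shared by all corpora plus a copy specific to the document's corpus; the slides do not spell this out, so treat the scheme, the `global:` prefix, and the function names as hypothetical.

```python
def augment(features, corpus):
    """Return a global copy and a corpus-specific copy of every feature.

    Assumes no corpus is itself labelled "global", which would make the
    two copies collide.
    """
    augmented = {}
    for name, value in features.items():
        augmented["global:" + name] = value
        augmented[corpus + ":" + name] = value
    return augmented


def keep_global(weights):
    """Keep only the weights of global features (the JOINT-Global idea)."""
    return {name: w for name, w in weights.items()
            if name.startswith("global:")}


# Invented example: one unigram feature from a blog document.
joint_features = augment({"lol": 2.0}, "blog")
# Invented weights, as a JOINT model might learn them.
global_only = keep_global({"global:lol": -1.5, "blog:lol": -0.5})
```

Under this reading, JOINT-Global-Retrained would take the feature names that survive `keep_global` and fit fresh weights for them on one corpus at a time.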
Experiment
Features
– Gender: binary feature (Male=1, Female=0)
– Textual features
  – Unigrams
  – POS unigrams and bigrams
  – LIWC (Linguistic Inquiry and Word Count): a word-counting program that captures word classes such as inclusion words (LIWC-incl: "with," "and," "include," etc.) and causation words (LIWC-cause: "because," "hence," etc.), as well as stylistic characteristics such as the percentage of words longer than six letters (LIWC-Sixltr).
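A rough sketch of how some of these features could be computed for one document follows. It covers gender, unigrams, and the LIWC-style counts; POS features are omitted because they need a tagger. The word lists contain only the example words quoted on the slide, not the real LIWC lexicon, and the tokenizer, the use of relative frequencies, and all names are assumptions.

```python
import re
from collections import Counter

# Example words taken from the slide; the real LIWC lexicon is far larger.
EXAMPLE_CLASSES = {
    "LIWC-incl": {"with", "and", "include"},
    "LIWC-cause": {"because", "hence"},
}


def extract_features(text, gender):
    """Build a feature dict from a document and the author's gender."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid division by zero on empty text
    features = {"gender": 1.0 if gender == "male" else 0.0}
    # Unigram features: relative frequency of each word (an assumption;
    # the slide does not say whether counts or frequencies were used).
    for word, count in Counter(tokens).items():
        features["uni:" + word] = count / total
    # Word-class features: percentage of tokens in each class.
    for label, words in EXAMPLE_CLASSES.items():
        features[label] = 100.0 * sum(t in words for t in tokens) / total
    # LIWC-Sixltr: percentage of words longer than six letters.
    features["LIWC-Sixltr"] = 100.0 * sum(len(t) > 6 for t in tokens) / total
    return features


# Invented sentence with ten tokens, four of them longer than six letters.
example = extract_features(
    "I stayed home because the weather and traffic were terrible", "female")
```

The resulting dict maps feature names to numeric values, which is the form a linear model needs in order to take a weighted sum over them.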
Results and discussion
Reference
Dong Nguyen, Noah A. Smith, Carolyn P. Rose. Author Age Prediction from Text using Linear Regression.