Stock Volatility Prediction using Earnings Calls Transcripts and their Summaries
Naveed Ahmad, Aram Zinzalian

Setup – SVM Text Regression
Output: future log-return volatility, where the log return is ln(P(t+1) / P(t))
Baseline: historical volatility, i.e. the realized volatility of the previous quarter
Predictors (transcript + summary features): n-gram TF/IDF scores, POS tags, target-word frequency
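
A minimal sketch (in Python, not the authors' code) of how the regression target and the historical-volatility baseline can be computed from daily closing prices around the call date; the price series and window lengths below are illustrative.

import numpy as np

def log_return_volatility(prices):
    # Standard deviation of daily log returns ln(P(t+1) / P(t)).
    prices = np.asarray(prices, dtype=float)
    log_returns = np.log(prices[1:] / prices[:-1])
    return log_returns.std()

# Hypothetical daily closes in the look-back and look-forward windows around a call.
prices_before = [101.2, 100.8, 102.5, 101.9, 103.0]
prices_after = [103.5, 104.1, 102.9, 105.2, 104.8]

historical_vol = log_return_volatility(prices_before)  # baseline prediction
future_vol = log_return_volatility(prices_after)       # value the SVM regressor tries to predict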

Setup – Features
TF/IDF scores: give weight to transcript-specific words; analyst names need to be ignored
POS tags: is a higher per-sentence adjective frequency indicative of risk/uncertainty?
Target words: hand-picked words we give extra weight to, e.g. risk, uncertain, variable, decline
Summaries: do we really need the whole text? Why not extract only the most informative sentences?
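
A hedged sketch of these three feature families using scikit-learn and NLTK; the transcript snippets and target-word list are illustrative, and a custom stop list (not shown) could be used to drop analyst names.

# Requires: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer

TARGET_WORDS = {"risk", "uncertain", "variable", "decline"}

def adjective_frequency(text):
    # Fraction of tokens tagged as adjectives (JJ, JJR, JJS).
    tags = [tag for _, tag in nltk.pos_tag(nltk.word_tokenize(text))]
    return sum(tag.startswith("JJ") for tag in tags) / max(len(tags), 1)

def target_word_frequency(text):
    tokens = [t.lower() for t in nltk.word_tokenize(text)]
    return sum(t in TARGET_WORDS for t in tokens) / max(len(tokens), 1)

transcripts = ["Revenue declined and the outlook remains uncertain ...",
               "Strong quarter with stable margins and limited risk ..."]

# Unigram + bigram TF/IDF features over the transcripts.
tfidf = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
X_tfidf = tfidf.fit_transform(transcripts)
X_extra = [[adjective_frequency(t), target_word_frequency(t)] for t in transcripts]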

Experiments
We vary:
Prediction time frame, i.e. look-back and look-forward periods of 20, 40, and 60 days
Training-set size, from 25 to 386 transcripts
NLP models: unigram with TF/IDF; bigram and unigram with TF/IDF; unigram and bigram with sentence summaries
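
A toy version of this experiment grid, assuming an SVR regressor and synthetic stand-in data; in the real setup X would hold the TF/IDF (or summary) features and y the future log-return volatility per transcript.

import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

def run_experiment(X_train, y_train, X_test, y_test, baseline_test):
    # Fit a linear-kernel SVR and report MSE for the model and the historical baseline.
    model = SVR(kernel="linear")
    model.fit(X_train, y_train)
    mse_model = mean_squared_error(y_test, model.predict(X_test))
    mse_baseline = mean_squared_error(y_test, baseline_test)
    return mse_model, mse_baseline

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))                    # stand-in for transcript features
y = rng.normal(size=60)                         # stand-in for future volatility
baseline = y + rng.normal(scale=0.1, size=60)   # stand-in for previous-quarter volatility

for n_train in (25, 40, 50):                    # training-set size (25 to 386 in the slides)
    mse_m, mse_b = run_experiment(X[:n_train], y[:n_train], X[50:], y[50:], baseline[50:])
    print(n_train, mse_m, mse_b)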

Results
Bigram and unigram models improve over the historical-volatility baseline.
POS tags (JJ) and target words also slightly lower the MSE.
Regression on summarized text does not help, but the generated summaries were informative.

Future Work
More data!
Different regression models (CART?)
Q/A-based summarization
Feature selection
A better heuristic for choosing bigrams
SO-PMI scores with target words (a sketch of the idea follows)
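
Since SO-PMI is only proposed as future work, the following is a hedged sketch of the Turney-style semantic-orientation idea: score a candidate phrase by how much more often it co-occurs with risk-oriented seed words than with safe-oriented ones. The seed sets, co-occurrence counts, and corpus size below are all made up.

import math

RISK_SEEDS = {"risk", "uncertain", "decline"}
SAFE_SEEDS = {"growth", "stable", "confident"}

def so_pmi(phrase, cooccur, counts, total, eps=1e-6):
    # SO-PMI(phrase) = sum over risk seeds of PMI(phrase, seed)
    #                  minus the same sum over safe seeds.
    def pmi(a, b):
        joint = cooccur.get((a, b), 0) / total
        marginal = (counts.get(a, 0) / total) * (counts.get(b, 0) / total)
        return math.log((joint + eps) / (marginal + eps))
    return (sum(pmi(phrase, s) for s in RISK_SEEDS)
            - sum(pmi(phrase, s) for s in SAFE_SEEDS))

# Toy co-occurrence statistics for one candidate phrase.
counts = {"headwinds": 40, "risk": 120, "uncertain": 80, "decline": 60,
          "growth": 150, "stable": 90, "confident": 70}
cooccur = {("headwinds", "risk"): 25, ("headwinds", "decline"): 15,
           ("headwinds", "growth"): 3}
print(so_pmi("headwinds", cooccur, counts, total=10000))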