Automatic Detection of Tags for Political Blogs. Khairun-nisa Hassanali and Vasileios Hatzivassiloglou, The University of Texas at Dallas.

Presentation transcript:

Automatic Detection of Tags for Political Blogs
Khairun-nisa Hassanali, Vasileios Hatzivassiloglou
The University of Texas at Dallas

1. Summary

• More than 22.6 million Americans maintain web sites with regularly updated commentary (blogs), of which at least 38,500 are specifically dedicated to politics.
• A tool for automatic tagging of political blog posts was introduced.
• Political blogs differ from other blogs in that they often revolve around named entities (politicians, organizations and places). Tagging of political blog posts therefore benefits from basic named entity recognition.
• Tag identification using a hybrid approach (statistical and grammatical) yields better results.
• Sood et al. report a precision/recall of 13.11%/22.83%, whereas Wang and Davidson report a precision/recall of 45.25%/23.24%. Our recall is higher, perhaps because of the domain.

2. Political Blogs

3. The Larger Problem

• Given multiple texts from two or more blogs/political sources, answer the following questions:
    • On which subjects do the texts, as a whole across each source, agree or disagree?
    • How similar are the sources' positions?
    • What makes them agree or disagree?
• It is difficult to associate an attitude with a specific topic/subject.
• Many clues are implicit and appear to require deep semantic analysis.

4. Why are Tags Needed?

• Tags can serve as a basis for bringing together posts about the same topic.
• Compiling a profile for each political entity: what it talks about and what its position is.
• Organizing groups of sources according to perspective.

5. Tag Detection using Support Vector Machines

• Many blogs tag their posts.
• Tags are representative of the topics discussed.
• Training data was collected from "Daily Kos" and "Red State":
    • 100,000 posts from Daily Kos
    • 70,000 posts from Red State
    • A total of 787,780 tags
• Used Joachims' SVMlight.

Fig. 1: Tag Detection using Support Vector Machines

Training of SVM classifiers
    Input: Blog URLs
    Steps:
    • Collect data from several blogs that tag data
    • Preprocess data: parse HTML and rectify errors
    • Divide data into posts and index them by their tags
    • Train the SVMs on the training data
    Output: One classifier for each tag

Detection of Tags
    Input: Blog URL
    Steps:
    • Collect data from the blog
    • Preprocess data: parse HTML and rectify errors
    • Divide data into posts
    • Run all the classifiers on each post
    Output: Top five tags associated with each post

6. Tag Detection using Grammatical Techniques

• Use the same SVM-based approach with new features based on grammatical knowledge.
• Proper nouns are frequently topics.
• Place a higher weight on proper and common nouns.
• Identify entities referred to by different names: Barack Obama, Obama and Barack Hussein Obama refer to the same person.

Fig. 2: Tag Detection using Grammatical Techniques

Extraction of Tag Nouns
    Input: Blog URL
    Steps:
    • Fetch data from blog
    • Preprocess data and segment into posts
    • Perform shallow parsing
    • Extract noun phrases
    Output: Top scoring nouns

Extraction of Tag Entities using Named Entity Recognition and Co-reference Resolution
    Input: Blog URL
    Steps:
    • Fetch data from blog
    • Preprocess data and segment into posts
    • Perform co-reference resolution
    • Extract entities
    Output: Top scoring entities

7. Experimental Results

• Test set: 2681 posts from Daily Kos and 571 posts from Red State.
• Compared detected tags to the original tags of each blog post.
• Manually evaluated the relevance of tags on a small portion of the test set.

Fig. 3: Results on Daily Kos

                    Precision   Recall    F-Score
Single Word SVM     27.30%      60.30%    37.60%
+ Stemming          26.10%      59.50%    36.30%
+ Proper Nouns      36.50%      56.80%    44.40%
Named Entities      48.40%      49.10%    48.70%
All Combined        21.10%      65%       31.90%
Manual Scoring      67.00%      75%       70.80%

Fig. 4: Results on Red State

                    Precision   Recall    F-Score
Single Word SVM     19.00%      30.00%    23.30%
+ Stemming          22.00%      30.20%    25.50%
+ Proper Nouns      46.30%      54.00%    49.90%
Named Entities      60.10%      41.50%    49.10%
All Combined        20.30%      65.70%    31.00%
Manual Scoring      47.00%      62.00%    53.50%

8. Conclusion

• Tags for political blogs are automatically detected.
• Tags are representative of topics.
• Significant topics are automatically identified using SVM and other NLP techniques.

9. Future Work

• A political profile is a summary of a political entity's (politician, political group) stance on different issues.
• Extract the top scoring topics along with the "entities' sentiments" (attitudes towards the topic) and select representative sentences that voice sentiments towards these topics.
• Aggregate information across texts according to specific criteria (poster, source, time), quantitatively compare signatures, and identify which topics are responsible for the differences.
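The poster does not state how the F-Score column of the results on Daily Kos and Red State was computed. The reported values appear consistent with the balanced F-measure, that is, the harmonic mean of precision and recall. The sketch below recomputes the column under that assumption. The function name `f_score` and the `RESULTS` dictionary are illustrative names, not part of the original work, and the numbers are copied from the two results tables.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score), all in percent,
# copied from the Daily Kos and Red State results tables.
RESULTS = {
    "Daily Kos": {
        "Single Word SVM": (27.3, 60.3, 37.6),
        "+ Stemming": (26.1, 59.5, 36.3),
        "+ Proper Nouns": (36.5, 56.8, 44.4),
        "Named Entities": (48.4, 49.1, 48.7),
        "All Combined": (21.1, 65.0, 31.9),
        "Manual Scoring": (67.0, 75.0, 70.8),
    },
    "Red State": {
        "Single Word SVM": (19.0, 30.0, 23.3),
        "+ Stemming": (22.0, 30.2, 25.5),
        "+ Proper Nouns": (46.3, 54.0, 49.9),
        "Named Entities": (60.1, 41.5, 49.1),
        "All Combined": (20.3, 65.7, 31.0),
        "Manual Scoring": (47.0, 62.0, 53.5),
    },
}

# Print the recomputed value next to the reported one for every row.
for blog, rows in RESULTS.items():
    print(blog)
    for method, (p, r, reported) in rows.items():
        computed = f_score(p, r)
        print(f"  {method:<16} computed {computed:6.2f}  reported {reported:6.2f}")
```

Under this assumption, every recomputed value agrees with the reported one to within 0.1 percentage points, which is what rounding to one decimal place would produce.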