AND Working Groups, July 24, 2008. Second Workshop on Analytics for Noisy Unstructured Text Data. Group 1 task: data sets, benchmarks, evaluation.


Slide 1: Group 1
Task: data sets, benchmarks, and evaluation techniques for the analysis of noisy texts.
Participants: Maarten de Rijke, Amaresh Pandey, Donna Harman, Venu Govindaraju, Aixin Sun, and Venkat Subramaniam.

Slide 2: Datasets
- It is important to list the datasets that are already out there.
- A list of publicly available datasets can be added to the proceedings, with descriptions and comments.
- Create a table: dataset name and source; application; usability; tools for creating and analyzing the datasets (a sketch follows this slide).
- Take references from AND '07.
- List what is still missing from the datasets.
- Datasets can be for speech, text, OCR, etc.
- LDC and ELDA can be sources for speech data.
- NIST can be a source for OCR data.
- List tools and sources that provide data for academic and industry research.
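
As a sketch of what such a table could look like, here are three well-known noisy-text collections; these rows are our illustration, not from the slides, and the usability and tools columns would still need to be filled in:

    Dataset (source)                      Application                  Type of noise
    TREC-5 Confusion Track (NIST)         Retrieval over OCR output    OCR errors
    Switchboard (LDC)                     Speech recognition research  Spontaneous telephone speech
    Blog06 collection (TREC Blog track)   Opinion retrieval            Informal language, spam blogs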

Slide 3: Benchmarks
- Identify popular tasks and organize competitions to create benchmarks.
- List past evaluations and benchmarks, say from TREC, and list what can be done next: blogs, speech, OCR (as in TREC-5), legal, spam, cross-language text, historical texts, etc.
- Create a table: popular tasks; what benchmarks exist; what new benchmarks are needed (a scoring sketch follows this slide).
- Give emphasis to certain types of datasets, such as blogs and OCR.
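
Not on the slides, but as a concrete illustration of how TREC-style benchmarks are scored: a minimal Python sketch that reads relevance judgments (qrels, in the standard four-column TREC format) and computes precision at 10 for one ranked result list. The file name and the run dictionary are hypothetical.

    from collections import defaultdict

    def load_qrels(path):
        """Read TREC qrels lines: 'topic iteration docid judgment'."""
        relevant = defaultdict(set)
        with open(path) as f:
            for line in f:
                topic, _, docid, judgment = line.split()
                if int(judgment) > 0:
                    relevant[topic].add(docid)
        return relevant

    def precision_at_k(ranked_docids, relevant_docids, k=10):
        """Fraction of the top-k retrieved documents that were judged relevant."""
        return sum(d in relevant_docids for d in ranked_docids[:k]) / k

    # Hypothetical usage on a blog-retrieval run:
    # qrels = load_qrels("qrels.blog.txt")
    # print(precision_at_k(run["851"], qrels["851"], k=10))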

Slide 4: Evaluation
- Evaluation should be cascaded: measure the noise itself, the effect of the noise, and the effects of the different stages of processing.
- Evaluation requires truth data, and creating labeled truth is costly; running a common task on a given dataset is one way to get truth data generated as a by-product.
- List evaluation techniques and metrics for common tasks (a sketch of two standard noise metrics follows this slide).
- Create a table containing: task, evaluation technique, source, and references.
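
The slide leaves the metrics unspecified; as one concrete example, here is a minimal Python sketch of character and word error rate, the standard measures of the noise level itself in OCR output and speech transcripts. The function names are ours.

    def levenshtein(ref, hyp):
        """Edit distance (insertions, deletions, substitutions) between two sequences."""
        prev = list(range(len(hyp) + 1))  # row for the empty reference prefix
        for i, r in enumerate(ref, 1):
            curr = [i]
            for j, h in enumerate(hyp, 1):
                curr.append(min(prev[j] + 1,              # delete r
                                curr[j - 1] + 1,          # insert h
                                prev[j - 1] + (r != h)))  # substitute r with h
            prev = curr
        return prev[-1]

    def cer(truth, noisy):
        """Character error rate: character edits per reference character."""
        return levenshtein(truth, noisy) / max(len(truth), 1)

    def wer(truth, noisy):
        """Word error rate: word edits per reference word."""
        return levenshtein(truth.split(), noisy.split()) / max(len(truth.split()), 1)

    print(wer("noisy unstructured text data", "noisy unstruetured text dala"))  # 0.5

Measuring CER/WER first, then task accuracy at each later stage, is one way to realize the cascaded evaluation the slide calls for.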

Slide 5: Datasets, Benchmarks, Evaluation Techniques
- What datasets, benchmarks, and evaluation techniques are needed for the analysis of noisy texts?
- Datasets today comprise mostly newswire data. Blogs, SMS, email, voice, and other spontaneous-communication datasets are needed. TREC tracks have recently started including such datasets.
- Are benchmarks and evaluations dependent on the task?
  - QA over blogs (blogs are not factual)
  - Business intelligence over customer calls and emails
  - Opinion and sentiment mining from emails and blogs
- On such datasets, agreement between human annotators is also very low (see the kappa sketch below).
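
A standard way to quantify that low agreement is Cohen's kappa, which corrects raw agreement for chance. A minimal Python sketch for two annotators follows; the sentiment labels are invented for illustration.

    from collections import Counter

    def cohens_kappa(labels_a, labels_b):
        """Chance-corrected agreement between two annotators on the same items."""
        n = len(labels_a)
        observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        counts_a, counts_b = Counter(labels_a), Counter(labels_b)
        expected = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
        return (observed - expected) / (1 - expected)

    # Invented labels from two annotators on six blog posts:
    a = ["pos", "neg", "neu", "pos", "neg", "pos"]
    b = ["pos", "neu", "neu", "neg", "neg", "pos"]
    print(round(cohens_kappa(a, b), 2))  # 0.5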