Noisy Text Analytics: An Exercise in Futility? Hwee Tou Ng, Department of Computer Science, National University of Singapore, 8 Jan 2007

1 Noisy Text Analytics: An Exercise in Futility?
Hwee Tou Ng
Department of Computer Science
National University of Singapore
8 Jan 2007

2 Noisy Text Analytics: An Exercise in Futility?

3 Sources of Noisy Text
Traditional sources
– Automatically transcribed text from speech
– Automatically OCRed text from image

4 Sources of Noisy Text
More recent sources from the Web
– Blogs, wikis, message boards, online chats, SMS, etc.
– User generated content

5 Sources of Noisy Text
More recent sources from the Web
– Blogs, wikis, message boards, online chats, SMS, etc.
– User generated content
– Informal text
  » Acronyms, abbreviations, specialized vocabulary
  » Sublanguage, sub-community

6 Importance
The rise of social media (Web 2.0)
– Commercial, economic interest

7 Importance
ACL SIGWAC (Special Interest Group on the Web as Corpus, Association for Computational Linguistics)
– CLEANEVAL (shared task and competition for web corpus cleaning)

8 Noisy Text Analytics: An Exercise in Futility?

9 An Exercise in Futility?
Necessity is the mother of invention!

10 Noisy Text Analytics: An Exercise in Futility?

11 What is Analytics?
American Heritage Dictionary
– The branch of logic dealing with analysis
Merriam-Webster Online Dictionary
– The method of logical analysis

12 Analytics
Approach #1
– Eliminate the noise in noisy text (text normalization), then process the normalized text as usual
  » Noise: misspelled words, wrongly cased words, wrong sentence and paragraph boundaries
– Examples:
  » Table recognition
    Learning to Recognize Tables in Free Text, H T Ng, C Y Lim, J L T Koo, ACL 1999
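To make the normalization step of Approach #1 concrete, here is a minimal sketch that repairs two of the noise types listed on the slide: misspelled words and wrongly cased words. It is an illustration only and not the method of the cited paper. The toy vocabulary `VOCAB`, the function names, and the similarity cutoff of 0.8 are all assumptions; a real system would use a large lexicon and context.

```python
import difflib

# Hypothetical toy vocabulary with canonical casing (an assumption for
# illustration; a real normalizer would load a large lexicon).
VOCAB = ["Singapore", "text", "noisy", "analytics", "the", "is", "in"]
LOWER_TO_CANON = {w.lower(): w for w in VOCAB}


def normalize_token(token, cutoff=0.8):
    """Map a token to its canonical vocabulary form, or return it unchanged."""
    key = token.lower()
    # Wrongly cased word: restore the canonical casing from the vocabulary.
    if key in LOWER_TO_CANON:
        return LOWER_TO_CANON[key]
    # Misspelled word: pick the closest vocabulary entry, if it is close enough.
    matches = difflib.get_close_matches(key, list(LOWER_TO_CANON), n=1, cutoff=cutoff)
    return LOWER_TO_CANON[matches[0]] if matches else token


def normalize(text):
    """Normalize whitespace-separated tokens one at a time."""
    return " ".join(normalize_token(t) for t in text.split())


print(normalize("NOISY text analitics in SINGAPOER"))
```

After this step the text can be passed to a standard pipeline trained on clean text. Tokens with no sufficiently close vocabulary entry are left as they are, which keeps the sketch from "correcting" unknown names.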

13 Table Recognition

14 Table Recognition

15 Table Recognition

16 Analytics
Approach #2
– Process the noisy text directly, as is
– Examples:
  » Upper case text (e.g., speech recognizer output)
    Teaching a Weaker Classifier: Named Entity Recognition on Upper Case Text, H L Chieu, H T Ng, ACL 2002
  » Semi-structured text (e.g., seminar announcements, job advertisements)
    A Maximum Entropy Approach to Information Extraction from Semi-Structured and Free Text, H L Chieu, H T Ng, AAAI 2002
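As a rough illustration of Approach #2, the sketch below extracts features for a token in all-upper-case text using only cues that do not depend on capitalization, so a classifier can be trained on the noisy text directly. This is not the method of the cited papers; the function name `token_features` and the particular feature set are assumptions chosen for illustration.

```python
def token_features(tokens, i):
    """Case-independent features for tokens[i], usable on all-caps input."""
    word = tokens[i].lower()
    return {
        "word": word,                                 # lowercased identity
        "suffix3": word[-3:],                         # last three characters
        "has_digit": any(c.isdigit() for c in word),  # e.g. dates, amounts
        # Neighbouring words stand in for the missing capitalization cue.
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }


tokens = "MR SMITH VISITED SINGAPORE".split()
print(token_features(tokens, 1))
```

Here the title "MR" before "SMITH" is the kind of contextual evidence that can replace the capital letter a mixed-case tagger would normally rely on.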

