Web IR/NLP Group NUS Min-Yen Kan School of Computing National University of Singapore


1 Web IR/NLP Group (WING) @ NUS
Min-Yen Kan
School of Computing, National University of Singapore
http://wing.comp.nus.edu.sg/

2 Web IR/NLP Group @ NUS
Min-Yen Kan, MSRA Web-Scale NLP Workshop (Daedeok, Korea)
PI: Min-Yen KAN (NLP and IR/DL)
Postdoc: Su Nam KIM (Multiword Expressions)
PhDs:
– Hendra SETIAWAN (Stat MT)
– Long QIU (Scenario Templates)
– Yee Fan TAN (Web Record Linkage)
– Jin ZHAO (Math IR)
– Jesse PRABAWA (UI/HCI for DLs)
– Ziheng LIN (Summarization)
Support staff (undergraduate):
– System administrators
– System programmers
Undergraduate Projects: 4 this year (ask me about topics)
One of many groups doing this type of research at NUS
Will go over NLP, then DL, today

3 Information Extraction
Keyphrase Extraction
– Idea: Use section information as evidence (ICADL 07)
Scenario Template Generation (Long Qiu)
– Aim: to generate database rows from similar news events
– Example news sentences:
  "Charley landed further south on the Gulf Coast than predicted, …"
  "The hurricane … was weakened and is moving over South Carolina"
  "At least 21 missing after the storm hit …"
  "But Tokage had weakened by the time it passed over Tokyo, where it had left little damage before moving out to sea."
– Model context and cluster to convergence using EM (EMNLP 06)

4 Using less data
URL Classification (WWW 04)
– Example URLs:
  http://www.usatoday.com/stories/080502/ent/hilton.html
  http://www.cancersupportgroup.org/forum/230.html
– Classifies 1000s of URLs per minute, with two-thirds of the accuracy of full-text classification
– Useful for focused crawling and web mining applications
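
The URL classification idea above works from the URL string alone, without fetching the page. The following is a minimal sketch of how a URL could be split into token features for such a classifier. It assumes a simple split on punctuation and on letter/digit boundaries; the function name `url_tokens` is hypothetical, and this is an illustration rather than the feature set used in the WWW 04 work.

```python
import re
from urllib.parse import urlparse


def url_tokens(url):
    """Split a URL into lowercase tokens (hypothetical feature extractor)."""
    parsed = urlparse(url)
    # Use only the host and path; the scheme carries little topical signal.
    text = parsed.netloc + " " + parsed.path
    tokens = []
    # First split on any run of non-alphanumeric characters ...
    for piece in re.split(r"[^A-Za-z0-9]+", text):
        # ... then split each piece at letter/digit boundaries.
        tokens.extend(re.findall(r"[A-Za-z]+|[0-9]+", piece))
    return [t.lower() for t in tokens]


if __name__ == "__main__":
    print(url_tokens("http://www.usatoday.com/stories/080502/ent/hilton.html"))
    print(url_tokens("http://www.cancersupportgroup.org/forum/230.html"))
```

Tokens such as "stories", "ent" or "forum" could then be fed to any standard text classifier as bag-of-words features.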

5 Question-Answering (Hang Cui)
Our Approaches to QA
– Use of external resources from Web & WordNet (SIGIR 04)
– Employ dependency & SRL for answer extraction (SIGIR 05, 06)
– Soft pattern analysis of definitional patterns (WWW 05)
– Explore temporal relationships and events
– Extend techniques to precise passage retrieval
– Came 2nd (in 2003, 2004 & 2005) in TREC QA Task
– Licensed technology to company in legal search
Current focus
– Relation-based IE & QA: continue focus on linguistic knowledge
– Ontology-based Interactive QA: leverage domain knowledge
– Searching for answers and mining terminology from the Web

6 Summarization (Ziheng Lin)
Document Concept Lattice Model (IPM 07)
– Aim to find the list of sentences that results in minimal information loss
– Extract key concept terms, and build concept lattice
– Perform sentence extraction that covers max concept terms
– Participated in DUC, came in 1st (2005) and 2nd (2006)
Pioneered iterative construction model for graph-based summarization (DUC 07)
[Figure: three panels over doc1, doc2, doc3, with sentences s1, then s1 s2, then s1 s2 s3]
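
The "sentence extraction that covers max concept terms" step above can be illustrated with a greedy max-coverage heuristic. This is only a sketch under stated assumptions: it swaps in a plain greedy set-cover selection in place of the concept lattice, and the function name `greedy_cover`, the `budget` parameter, and the toy concept sets are all hypothetical.

```python
def greedy_cover(sentence_concepts, budget):
    """Pick up to `budget` sentences, each time taking the one that adds
    the most not-yet-covered concept terms (greedy max-coverage heuristic).

    sentence_concepts: dict mapping a sentence id to a set of concept terms.
    Returns (chosen sentence ids in pick order, set of covered terms).
    """
    covered = set()
    chosen = []
    remaining = dict(sentence_concepts)
    while remaining and len(chosen) < budget:
        # Sentence with the largest number of new concept terms.
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        if not remaining[best] - covered:
            break  # nothing new can be covered; stop early
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


if __name__ == "__main__":
    # Toy concept sets (made up for illustration).
    sentences = {
        "s1": {"hurricane", "landfall", "gulf"},
        "s2": {"hurricane", "gulf"},
        "s3": {"damage", "missing"},
    }
    print(greedy_cover(sentences, budget=2))
```

Here `s2` is skipped because everything it mentions is already covered by `s1`, which is the behaviour a coverage-driven extractor is after.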

7 Statistical Machine Translation (Hendra Setiawan)
Function Word Based Reordering (ACL 07)
[Figure: reordering example]
Source: 表单 是 网页 上 的 数据 输 域 的 集合
Word-by-word gloss: a form | is | a page | on | data entry fields | of | a coll.
Reordered source: 表单 是 集合 的 数据 输 域 的 上 网页
Translation: a form is a coll. of data entry fields on a page
Phrases built up step by step:
– 上 网页: on a page
– 数据 输 域 的 上 网页: data entry fields on a page
– 集合 的 数据 输 域 的 上 网页: a coll. of data entry fields on a page

8 Commercial record linkage (Yee Fan Tan)
Addresses
– Dongwon Lee, 110 E. Foster Ave. #410, State College, PA, 16802
– LEE Dong, 110 East Foster Avenue Apartment 410, Univ. Park, PA 16802-2343
Products
– Honda Fit vs. Honda Jazz
– Apple iPod Nano 4GB vs. 4GB iPod nano 4GB
Idea: use web as additional context for disambiguation and clustering (JCDL 06, WIDM 07)
Placed 3rd in Web People Search Task (WEPS 2007)

9 Multi(ple) Extensions
Multimodal Alignment
– Lyrics with Audio (ACM MM 04)
– Slides with Paper (JCDL 07)
Current and future work:
– Extracted Terminology with User Tagging
– Text in Focus / Slide in Focus

10 Focusing on the User
Understanding user searches better
– Known item search (JCDL 2005)
– Faceted classification of web queries (WebQ 2007)
Building better user interfaces (Jesse Prabawa)
– Revisiting library catalog interfaces to better support searching (JCDL 2007)

11 Putting it all together
We’re building a niche academic research repository
– e.g., MS Libra, CiteSeer, DBLP, Google Scholar
What? Another one? What’s the catch?
– User interaction and community involvement are central
– Overcome faults of imperfect machine learning
– Platform for researching how web-scale NLP actively involves user feedback and mechanisms for channeling this
What about Web NLP / IR?
– My group emphasizes practical outcomes and deliverables
– Find research within industry and practical problems
– Multilingual, multimedia, web-as-data angles likely to continue

12 Other pointers (NUS-wide)
Text Processing Seminar (with archived slides)
http://wing.comp.nus.edu.sg/chimetext
Machine Learning (Graphical Models) Reading Group
http://groups.google.com/group/mlnus/
NLP Reading Group
http://wing.comp.nus.edu.sg/NLPReading/index.php/Main_Page
Shameless plug for my group: http://wing.comp.nus.edu.sg
Thanks for listening!

