1 Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology. Validating Transliteration Hypotheses Using the Web: Web Counts vs. Web Mining. Presenter: You Lin Chen. Authors: Hikaridai, Seika-cho, Soraku-gun, Kyoto. 2007.WI.7

2 Outline: Motivation, Objective, Methodology, Experiments, Conclusion, Comments

3 Motivation: Web counts (hit counts) only approximate Web frequency, which causes two problems. First, some Web search engines disregard punctuation and capitalization when matching a search term. Second, it is not easy to consider the contexts of transliteration hypotheses with Web counts.

4 Objectives: To address these problems, we propose a novel method for validating transliteration hypotheses based on Web mining.

5 Methodology: [Figure: system overview. A machine transliteration system generates transliteration hypotheses for English terms (e.g., Clinton → 克林頓). Each pair (Clinton, 克林頓) is sent as a Web query to build a data set of Web pages. Contextual information is extracted from the pages as features, and a trained SVM or a trained MEM ranks the transliteration hypotheses.]

6 Methodology: freq(tc_i, W_l): the number of occurrences of tc_i in W_l, e.g., freq(tc_i, W_1) = 6. freq_d(SW, tc_i, W_l, d): the number of co-occurrences of SW and tc_i within distance d, e.g., freq_d(SW, tc_i, W_1, d=10) = 5. freq_p(SW, tc_i, W_l, d): the number of co-occurrences of SW and tc_i as parenthetical expressions within distance d, e.g., freq_p(SW, tc_i, W_1, d=10) = 5.
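The three counts defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: W_l is taken to be a list of page texts, distance d is measured in characters between the two terms, and a "parenthetical expression" is taken to mean that the later of the two terms is directly enclosed in parentheses. The helper names `find_all` and `_pairs` are hypothetical.

```python
import re

OPEN, CLOSE = "((", "))"  # ASCII and full-width parentheses

def find_all(text, term):
    """Start offsets of every non-overlapping exact match of term in text."""
    return [m.start() for m in re.finditer(re.escape(term), text)]

def _pairs(page, sw, tc):
    """Yield (gap, later_start, later_end) for each non-overlapping SW/tc pair."""
    for s in find_all(page, sw):
        for t in find_all(page, tc):
            first, later = sorted([(s, s + len(sw)), (t, t + len(tc))])
            gap = later[0] - first[1]  # characters between the two terms
            if gap >= 0:
                yield gap, later[0], later[1]

def freq(tc, pages):
    """Number of occurrences of tc in the page set."""
    return sum(len(find_all(page, tc)) for page in pages)

def freq_d(sw, tc, pages, d):
    """Co-occurrences of SW and tc at most d characters apart (assumed unit)."""
    return sum(1 for page in pages
               for gap, _, _ in _pairs(page, sw, tc) if gap <= d)

def freq_p(sw, tc, pages, d):
    """Like freq_d, but the later term must sit directly inside parentheses."""
    count = 0
    for page in pages:
        for gap, start, end in _pairs(page, sw, tc):
            in_parens = (start > 0 and page[start - 1] in OPEN
                         and end < len(page) and page[end] in CLOSE)
            if gap <= d and in_parens:
                count += 1
    return count

# Made-up page texts, for illustration only.
pages = ["克林頓(Clinton)到訪。克林頓發言。", "Clinton met reporters."]
features = [freq("克林頓", pages),
            freq_d("Clinton", "克林頓", pages, 10),
            freq_p("Clinton", "克林頓", pages, 10)]
```

Matching here is exact and case-sensitive, which is one way a mining approach can avoid the punctuation and capitalization issue noted in the motivation.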

7 Methodology: Let x_i ∈ X be the feature vector of tc_i ∈ TC. g_SVM(x_i) = w · x_i + b, where x_cor is a positive sample and the others are negative samples.
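The linear score g_SVM(x_i) = w · x_i + b can be used to rank hypotheses once w and b are known. The sketch below assumes w and b have already been learned by SVM training; the weights, bias, feature values, and hypothesis labels are invented for illustration and do not come from the slides.

```python
def g_svm(w, x, b):
    """Linear decision value w . x + b for one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def rank_hypotheses(features, w, b):
    """Sort hypotheses by decreasing decision value.

    features maps each hypothesis label to its feature vector x_i.
    """
    scored = [(tc, g_svm(w, x, b)) for tc, x in features.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical inputs: feature order is [freq, freq_d, freq_p].
w = [0.1, 0.2, 0.5]   # assumed learned weights
b = -1.0              # assumed learned bias
features = {"tc_other": [2, 1, 0], "tc_cor": [6, 5, 5]}
ranking = rank_hypotheses(features, w, b)
```

With these invented numbers the hypothesis with the larger counts receives the higher score and is ranked first.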

8 Methodology: g_MEM(x_i) = Pr(tc_cor | x_i). The maximum entropy model (MEM) is a widely used probability model that can incorporate heterogeneous information effectively. An event (ev) is usually composed of a target event (te) and a history event (he), say ev = <te, he>.
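A conditional maximum entropy model generally has the form Pr(te | he) = exp(Σ_j λ_j f_j(te, he)) / Z(he), where Z(he) normalizes over all target events. The sketch below assumes two target events, "correct" and "incorrect", and uses the feature vector x_i as the history event; the weights are hypothetical and would normally be estimated from training data. It is not necessarily the parameterization used by the authors.

```python
import math

def maxent_prob(weights, x, target):
    """Pr(target | x) under a conditional maximum entropy model.

    weights maps each target event to one lambda per feature in x.
    """
    scores = {te: sum(l * xi for l, xi in zip(lams, x))
              for te, lams in weights.items()}
    top = max(scores.values())                  # subtract max for stability
    exp_scores = {te: math.exp(s - top) for te, s in scores.items()}
    z = sum(exp_scores.values())                # normalizer Z(he)
    return exp_scores[target] / z

def g_mem(weights, x):
    """Score of a hypothesis: probability that it is the correct one."""
    return maxent_prob(weights, x, "correct")

# Hypothetical weights over two features.
weights = {"correct": [0.5, 0.5], "incorrect": [0.0, 0.0]}
p_low = g_mem(weights, [0, 0])
p_high = g_mem(weights, [2, 2])
```

Hypotheses can then be ranked by g_MEM in the same way as by the SVM decision value.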

9 Experiments

10 Conclusion: Experiments showed that our Web mining-based transliteration validation method was consistently better than systems based on Web counts.

11 Comments: Advantage: … Drawback: … Application: …