
1 Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology. Finding Terminology Translations From Hyperlinks On the Internet. Advisor: Dr. Hsu. Graduate: Kuo-min Wang. Authors: Shuang-Qing, Yuan, Fang Li, Huan-Ye Sheng. 2002 IEEE.

2 Outline: Motivation, Objective, Introduction, Methodology, Evaluation and Results, Conclusions, Personal Opinion.

3 Motivation. Because people worldwide access information through the Internet, many websites offer parallel or unparallel homepages in more than one language. Bilingual lexicon extraction from parallel corpora and homepages has been widely investigated.

4 Objective. We describe a novel method to find terminology translations from hyperlinks between bilingual homepages on the Internet.

5 Introduction. Parallel homepages: information is presented in two or more natural languages, and the versions are translations of each other. Unparallel homepages: information is also presented in different languages, but the versions do not always correspond to each other.

6 Methodology. A web page consists of 4 components: the title of the web page, the text of the web page, markups, and hyperlinks. Hyperlinks provide entrances into other web pages and consist of paths to the source documents.
Examples: 圖書館 / Libraries; 新會員註冊 / New member sign up; 企業動態 / Company dynamic.
From these we know that 圖書館 and "Libraries", 新會員註冊 and "New member sign up", and 企業動態 and "Company dynamic" are translations of each other.
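The link text and link path that the method relies on can be pulled out of a page with Python's standard `html.parser` module. This is only a minimal sketch: the class name `LinkExtractor`, the helper `extract_links`, and the sample paths are hypothetical and do not come from the slides.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (anchor text, href) pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, href) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Return all (anchor text, href) pairs found in the HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical snippets standing in for a Chinese and an English homepage.
chinese = extract_links('<a href="/library/index.html">圖書館</a>')
english = extract_links('<a href="/eng/library/index.html">Libraries</a>')
```

Pairing the two lists by comparing their paths is what the later slides describe; this sketch only covers the extraction step.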

7 Hyperlink vector & Dice coefficient. A hyperlink may consist of several levels of directories, with each level separated by a slash (/). Suppose there are two hyperlinks consisting of $n$ terms each, for example $H_e = (T_1, T_2, \ldots, T_n)$ and $H_c = (M_1, M_2, \ldots, M_n)$. The similarity of the two hyperlinks is calculated with the Dice coefficient.
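As a rough illustration, a hyperlink path can be split into a term vector and two vectors compared with the standard set-based Dice coefficient, 2|A ∩ B| / (|A| + |B|). The transcript does not preserve the exact formula used on the slide, so the set-based form here is an assumption, and the function names `hyperlink_to_vector` and `dice` are hypothetical.

```python
def hyperlink_to_vector(path):
    """Split a hyperlink path on '/' into its levels, dropping empty parts."""
    return [level for level in path.split("/") if level]


def dice(a, b):
    """Standard set-based Dice coefficient (assumed form, see lead-in)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty vectors
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


h_c = hyperlink_to_vector("sitemap/index.html")
h_e = hyperlink_to_vector("eng/sitemap/index.html")
similarity = dice(h_c, h_e)  # 2 * 2 / (2 + 3) = 0.8
```

A position-wise comparison of the terms would be an equally plausible reading of the slide; the set-based version is used only because it is the textbook definition.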

8 The algorithm. First step: find the anchor point of the two hyperlink vectors and the displacement DIS. For example, for the links 網站地圖 and sitemap, the two corresponding vectors are $H_c$ = (sitemap, index.html) and $H_e$ = (eng, sitemap, index.html), and the anchor point is "sitemap".
In general, suppose $H_c = (T_1, T_2, \ldots, T_n)$ and $H_e = (M_1, M_2, \ldots, M_m)$. If $T_1$ corresponds to $M_i$ in $H_e$, the displacement (DIS) is $i - 1$. In the example above, DIS = 1, and $H_e$ becomes $H_e$ = (sitemap, index.html).
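The first step can be sketched in a few lines. The sketch assumes that "corresponds to" means exact string equality between terms, which the slide does not state; the function name `align_on_anchor` is hypothetical.

```python
def align_on_anchor(h_c, h_e):
    """Find where the first term of h_c occurs in h_e.

    Returns (DIS, h_e shifted to start at the anchor point), or None when
    the first term of h_c does not occur in h_e (assumed behaviour).
    """
    if not h_c or h_c[0] not in h_e:
        return None
    dis = h_e.index(h_c[0])  # zero-based index, i.e. i - 1 for 1-based M_i
    return dis, h_e[dis:]


result = align_on_anchor(
    ["sitemap", "index.html"],
    ["eng", "sitemap", "index.html"],
)
# result == (1, ["sitemap", "index.html"]), matching DIS = 1 on the slide
```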

9 The algorithm (cont.). Second step: calculate REC. Suppose there are two vectors $H_{cc} = (T_1, T_2, \ldots, T_n)$ and $H_{ee} = (M_1, M_2, \ldots, M_m)$. If $n < m$, we consider only $n$ terms in $H_{ee}$. We define a variable REC to describe this situation.
Example: $H_{cc}$ = (sitemap, index.html) and $H_{ee}$ = (eng, sitemap, index.html) give $\mathrm{REC} = 2 \times 2 / (2 + 3) = 4/5$.
Third step: calculate the similarity of the two vectors. Fourth step: calculate the final result.
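The worked example is consistent with REC being twice the number of terms considered, divided by the total number of terms in both vectors. The general form below, 2·min(n, m) / (n + m), is inferred from that single example and is an assumption rather than the formula from the slide; the function name `rec` is hypothetical.

```python
def rec(h_cc, h_ee):
    """REC as inferred from the slide's example: 2 * min(n, m) / (n + m)."""
    n, m = len(h_cc), len(h_ee)
    if n + m == 0:
        return 0.0  # convention chosen here for two empty vectors
    return 2 * min(n, m) / (n + m)


value = rec(["sitemap", "index.html"], ["eng", "sitemap", "index.html"])
# value is 2 * 2 / (2 + 3) = 0.8, i.e. the 4/5 of the worked example
```

Under this reading, REC equals 1 when both hyperlinks have the same number of levels and shrinks as their lengths diverge.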

10 Evaluation and Results

11 Conclusions. According to our results, the method is reasonable and correct. It can be applied to any language pair and any domain. In the future, we will extend the experiment to English and German to obtain multilingual translation pairs for multilingual information retrieval and extraction.

