Cross Lingual Patent Retrieval Issues in Korean Language Minah Kim Korea Institute of Patent Information.


1 Cross Lingual Patent Retrieval Issues in Korean Language Minah Kim Korea Institute of Patent Information

2 Agenda
I. Why cross lingual patent retrieval system in Asian language?
II. Introduction of Korea's cross lingual retrieval system : K2E-PAT
III. How did we come up with cross lingual retrieval system?
IV. Strength of the cross lingual patent retrieval system
V. What needs to be improved
VI. Future Plan

3 Ⅰ. Why cross lingual patent retrieval system in Asian language?
1. Patent Deal
“By its nature, a patent can be seen as socio-economic contract between an inventor and society. Upon voluntary request by an inventor, who fulfills certain requirements, society grants patent rights to the inventor, who in turn ‘pays’ for the rights by disclosing the invention to the public.”
- Edward Elgar (1999), ‘The Economics and Management of Intellectual Property’, p. 71

4 Ⅰ. Why cross lingual patent retrieval system in Asian language?
2. Exploding patent documents
Chart 1. The number of patent applications filed worldwide by year (data label shown on the chart: 1,660,000)

5 Ⅰ. Why cross lingual patent retrieval system in Asian language?
3. Increasing volume of patent documents, especially from Asia
Chart 2. Top 20 Patent Offices in 2005

6 Ⅰ. Why cross lingual patent retrieval system in Asian language?
3. Increasing volume of patent documents, especially from Asia

7 Ⅰ. Why cross lingual patent retrieval system in Asian language?
- To guarantee an accurate prior art search, a patent retrieval system must return accurate search results.
- To make Asian patent documents readable, a patent retrieval system must be cross lingual.
=> We need a high quality cross lingual patent retrieval system.

8 Ⅱ. Introduction of Korea's cross lingual retrieval system : K2E-PAT
1. System Workflow
Diagram labels:
- IP users
- Internet (eng.kipris.or.kr)
- K2E on KIPRIS
- Translation Engine
- Dictionary (3 million technical terms, names, addresses, translation patterns)
- Dictionary Updates
- Translation Memory
- Distributed Translation

9 Ⅱ. Introduction of Korea's cross lingual retrieval system : K2E-PAT
1. System Workflow
Diagram labels:
- IP users
- Query in English
- Translation Server
- Korean
- Korean Patent Full Doc
- Search Result in English

10 Ⅱ. Introduction of Korea's cross lingual retrieval system : K2E-PAT 2. Sample Result

11 Ⅱ. Introduction of Korea's cross lingual retrieval system : K2E-PAT
2. Sample Result
※ PCT - Claims
※ K2E - Claims

12 Ⅱ. Introduction of Korea's cross lingual retrieval system : K2E-PAT
3. Service Coverage
Document Kind: Coverage
- Patent, Unexamined Publication (A): 1983 onwards
- Patent, Examined Publication (B1): 1979 onwards
- Utility Model, Unexamined Publication (U): 1998 onwards
- Utility Model, Examined Publication (Y1): 1979 onwards
※ All publications are updated daily.

13 3. Specifics
- Case Based Grammar for MT : 300,000 sentence patterns were extracted from patent documents and accumulated (patternization of the sentence forms found in patent documents).
  ex) 본발명 + 은 NP1+ 를 제공하 + 는 것 + 를 과제 + 로 하 + 다 (The object of the present invention is to provide NP1)
  ex) 본발명 + 은 NP1+ 에 관한 것 + 으로 NP2+ 로 구성되 + 다 (The present invention relates to NP1 and consists of NP2)
- Indexing Method : Word indexing

14 Ⅱ. Introduction of Korea's cross lingual retrieval system : K2E-PAT
4. Problems when parsing Korean language patent documents
- Word order
  ex) (English) S + V + O => (Korean Language) S + O + V
  Original KR : 본 발명은 회로에 관한 것으로, 특히 반도체 회로에 관한 것이다
  Pattern : 본 발명은 NP1에 관한 것으로, 특히 NP2에 관한 것이다
  MT : The present invention relates to NP1, particularly, to NP2
- A variety of expressions
  ex) Red => 빨갛다, 뻘겋다, 붉다, 붉그스름하다, 붉그레하다 > 빨갛다
- Problem with spacing rule
  ex) 동기화방안 (synchronization plan) can also be segmented as
  동+기화+방안 (east evaporation plan),
  동기+화+방안 (same period fire plan),
  동기+화방+안 (same period studio plan)
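The word order example can be illustrated with a minimal sketch of pattern-based translation. It is not the K2E-PAT engine: the names `PATTERNS`, `NP_DICT`, and `translate` are hypothetical, the regular expression is just one way to encode the slide's pattern, and the two noun-phrase dictionary entries are assumed for illustration.

```python
import re

# Hypothetical pattern table: (Korean sentence pattern, English template).
# The single entry encodes the pattern shown on the slide; NP1 and NP2
# are noun-phrase slots.
PATTERNS = [
    (
        re.compile(
            r"^본 발명은 (?P<NP1>.+?)에 관한 것으로, "
            r"특히 (?P<NP2>.+?)에 관한 것이다\.?$"
        ),
        "The present invention relates to {NP1}, particularly, to {NP2}",
    ),
]

# Toy noun-phrase dictionary (assumed entries, for illustration only).
NP_DICT = {
    "회로": "a circuit",
    "반도체 회로": "a semiconductor circuit",
}


def translate(sentence):
    """Return the English rendering of the first matching pattern, else None."""
    for pattern, template in PATTERNS:
        match = pattern.match(sentence.strip())
        if match:
            # Translate each slot; fall back to the Korean text if unknown.
            slots = {name: NP_DICT.get(text, text)
                     for name, text in match.groupdict().items()}
            return template.format(**slots)
    return None


print(translate("본 발명은 회로에 관한 것으로, 특히 반도체 회로에 관한 것이다"))
```

Because the English template fixes the S + V + O order, the reordering from Korean S + O + V happens in the pattern itself rather than in a general parser.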

15 Ⅱ. Introduction of Korea's cross lingual retrieval system : K2E-PAT
5. Methodology handling vocabulary problem
- Synonymy
  ex) Computer : query => (컴퓨터+콤퓨터+전자계산기+컴푸터)
- Acronym
  ex) (O -> Oxygen atom, CC -> Carbon-carbon bond)
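A rough sketch of how the synonym and acronym handling could look, using only the examples from the slide. The table names and both functions are hypothetical, and the "+" separator simply reproduces the notation on the slide; how K2E-PAT actually combines the variants is not stated.

```python
# Hypothetical lookup tables; the entries are the examples from the slide.
SYNONYMS = {
    "computer": ["컴퓨터", "콤퓨터", "전자계산기", "컴푸터"],
}
ACRONYMS = {
    "O": "Oxygen atom",
    "CC": "Carbon-carbon bond",
}


def expand_query(english_term):
    """Expand an English query term into its Korean spelling variants."""
    variants = SYNONYMS.get(english_term.lower())
    if not variants:
        return english_term  # unknown term: leave the query unchanged
    return "(" + "+".join(variants) + ")"


def expand_acronym(token):
    """Replace a known acronym with its full form, else return it as-is."""
    return ACRONYMS.get(token, token)


print(expand_query("Computer"))
print(expand_acronym("CC"))
```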

16 Ⅱ. Introduction of Korea's cross lingual retrieval system : K2E-PAT
5. Methodology handling vocabulary problem
배 : Pear [Food], Ship [Vehicle], Abdomen [Body]
Homonym Pattern List
- [Food]+를 + eat, have, ...
- [Vehicle]+를 + ride, get ...
- [Body]+가 + hungry, empty ...
Example sentence : 내 동생이 배를 타고 영국으로 가고 있다.
Matched pattern : 배+를 타+고
Possible Translation : My younger brother is on a boat to England.
K2E-PAT MT Result : My younger brother gets on the ship and the ship goes to the United Kingdom.

17 Ⅱ. Introduction of Korea's cross lingual retrieval system : K2E-PAT
5. Methodology handling vocabulary problem : Dictionary Classified by IPC & Frequency
- 3 million terms are classified by IPC and frequency.

Korean   English   Attribute   IPC         Frequency
배       Times     Units                   67%
배       Boat      Vehicle     B Section   12%
배       Ship      Vehicle     B Section   10%
배       Abdomen   Body        A Section   8%
배       Pear      Food        A Section   3%

Example : Fisherman’s boat
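The homonym handling for 배 can be sketched as a two-step choice: narrow the senses by the attribute that the neighbouring verb calls for, then take the most frequent remaining sense. This is only an illustration built from the table values; `SENSES`, `VERB_ATTRIBUTE`, and `pick_sense` are hypothetical names, the verb stems 먹 (eat) and 고프 (hungry) are assumed additions to the slide's 타 (ride) example, and the actual K2E-PAT selection logic is not described in the slides.

```python
# Senses of 배 from the dictionary table: (English, attribute, IPC, frequency %).
SENSES = [
    ("Times", "Units", None, 67),
    ("Boat", "Vehicle", "B", 12),
    ("Ship", "Vehicle", "B", 10),
    ("Abdomen", "Body", "A", 8),
    ("Pear", "Food", "A", 3),
]

# Hypothetical mapping from a verb stem to the attribute its object takes.
VERB_ATTRIBUTE = {"타": "Vehicle", "먹": "Food", "고프": "Body"}


def pick_sense(verb_stem=None, ipc_section=None):
    """Pick an English sense for 배 from context, falling back to frequency."""
    candidates = SENSES
    attribute = VERB_ATTRIBUTE.get(verb_stem)
    if attribute:
        narrowed = [s for s in candidates if s[1] == attribute]
        candidates = narrowed or candidates
    if ipc_section:
        narrowed = [s for s in candidates if s[2] == ipc_section]
        candidates = narrowed or candidates
    # Among the remaining candidates, the most frequent sense wins.
    return max(candidates, key=lambda s: s[3])[0]


print(pick_sense(verb_stem="타"))     # pattern 배+를 타+고
print(pick_sense(ipc_section="A"))   # document classified in IPC section A
print(pick_sense())                  # no context at all
```

With no context the sketch returns the most frequent sense overall, which is why the pattern list and the IPC classification both matter for the example sentence.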

18 Ⅲ. How did we come up with cross lingual retrieval system?
1. Two options for foreign users to access an English version of KR patents
Case 1. English Language Data : KR patent retrieval through K2E MT DB
  Components : English Keyword, Web Server, Search Server, DB Server (K2E MT Data)
Case 2. Korean Language Data : KR patent retrieval through CLIR system
  Components : English Keyword, Web Server, MT Server (MT to Korean), Search Server, DB Server (Korean Data)

19 Ⅲ. How did we come up with cross lingual retrieval system?
2. Weakness of KR patent retrieval through K2E MT database : it needs lots of resources
- System cost : an increasing number of servers is needed to load the exploding volume of patent documents.
- Staff & time cost : staff must be hired and time spent to build an English machine-translated database of Korean patent documents.
- Maintenance cost : the whole MT database has to be regenerated whenever the MT engine or the dictionary is upgraded, so maintenance cost rises with every database upgrade.

20 Ⅳ. Strength of the cross lingual patent retrieval system
1. Less maintenance cost (as discussed in the previous chapter)
2. Consistent data quality improvement through consistent dictionary upgrades
- Quarterly upgrade
- How the dictionary is upgraded :
  Words that the MT engine failed to translate are automatically extracted.
  New words are extracted from new KR patent publications (yearly).
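One way to picture the automatic extraction step is to scan MT output for text that is still in Hangul, on the assumption that untranslated words are passed through unchanged. That assumption, and the names `HANGUL_RUN` and `untranslated_terms`, are illustrative only; the slides do not say how K2E-PAT detects failed translations.

```python
import re
from collections import Counter

# Runs of precomposed Hangul syllables (Unicode range U+AC00 to U+D7A3).
HANGUL_RUN = re.compile(r"[가-힣]+")


def untranslated_terms(mt_outputs):
    """Count Korean words left untranslated across a list of MT outputs."""
    counts = Counter()
    for text in mt_outputs:
        counts.update(HANGUL_RUN.findall(text))
    return counts


# Made-up MT outputs used only to exercise the function.
sample = [
    "The 동기화 plan relates to 회로.",
    "A 회로 is provided.",
]
for term, count in untranslated_terms(sample).most_common():
    print(term, count)
```

Sorting candidates by count gives a natural priority order for the next dictionary upgrade.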

21 Ⅴ. What needs to be improved
1. Efforts to improve quality : semantically based query expansion
- Plan to expand each word in the original query by its nearest neighbors in the other language.
  ex) if the query is boat, it can be extended to ship, vessel, craft, water, etc.
2. Efforts to improve speed
- Service response time : 0.5 sec
- Average time for one document retrieval : 10 sec
- Large volume document retrieval takes longer and, in the worst case, leads to a time-out error.
- Speed will get worse if the query expansion technique is adopted.
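The planned query expansion can be sketched as a nearest-neighbor lookup over word vectors using cosine similarity. The vectors below are made-up toy values, not data from KIPI, and `VECTORS`, `cosine`, and `expand` are hypothetical names; the slides do not specify which similarity measure or resource the system would use.

```python
import math

# Toy 2-dimensional word vectors (invented values, for illustration only).
VECTORS = {
    "boat": (1.0, 0.1),
    "ship": (0.9, 0.2),
    "vessel": (0.8, 0.3),
    "pear": (0.1, 1.0),
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def expand(term, k=2):
    """Return the term followed by its k nearest neighbors."""
    if term not in VECTORS:
        return [term]
    others = [w for w in VECTORS if w != term]
    others.sort(key=lambda w: cosine(VECTORS[term], VECTORS[w]), reverse=True)
    return [term] + others[:k]


print(expand("boat", k=2))
```

The parameter `k` is the knob that trades recall against the speed concern on this slide: every added neighbor is another term the search server has to evaluate.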

22 Ⅵ. Future Plan
1. Short term goal
- Consistent dictionary upgrade
- Establishment of query expansion (2007. 12)
2. Long term goal
- Cross lingual retrieval system for Chinese, Japanese and Korean patent documents
- Combination of ontology with the existing system

23 Korea Institute of Patent Information Thank You T. 82-2-3452-8144 F. 82-2-3453-2967 E. minah76@kipi.or.kr © 2007 KIPI. All rights reserved. This presentation is for informational purposes only. KIPI MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.

