Cross Lingual Patent Retrieval Issues in Korean Language
Minah Kim
Korea Institute of Patent Information


Agenda
I. Why cross lingual patent retrieval system in Asian language?
II. Introduction of Korea's cross lingual retrieval system: K2E-PAT
III. How did we come up with cross lingual retrieval system?
IV. Strength of the cross lingual patent retrieval system
V. What needs to be improved
VI. Future Plan

Ⅰ. Why cross lingual patent retrieval system in Asian language?

1. Patent Deal
“By its nature, a patent can be seen as socio-economic contract between an inventor and society. Upon voluntary request by an inventor, who fulfills certain requirements, society grants patent rights to the inventor, who in turn ‘pays’ for the rights by disclosing the invention to the public.”
- Edward Elgar (1999), ‘The Economics and Management of Intellectual Property’, p. 71

2. Exploding patent documents
Chart 1. The number of patent applications filed worldwide by year (chart not reproduced; surviving data label: 1,660,000)

3. Increasing volume of patent documents, especially from Asia
Chart 2. Top 20 Patent Offices in 2005 (chart not reproduced)


- To guarantee an accurate prior art search, a patent retrieval system must return accurate search results.
- To read Asian patent documents, a patent retrieval system must be cross lingual.
=> We need a high quality cross lingual patent retrieval system.

Ⅱ. Introduction of Korea's cross lingual retrieval system: K2E-PAT

1. System Workflow
Diagram labels (figure not reproduced):
- IP users, reaching K2E on KIPRIS over the Internet (eng.kipris.or.kr)
- Translation Engine
- Dictionary (3 million technical terms, names, addresses, translation patterns)
- Dictionary Updates
- Translation Memory
- Distributed Translation

1. System Workflow (continued)
Diagram labels (figure not reproduced): IP users; query in English; Korean; translation server; Korean patent full document; search result in English.

2. Sample Result
(Screenshots not reproduced.)
※ PCT - Claims
※ K2E - Claims

3. Service Coverage

Document Kind                                 Coverage
Patent         Unexamined Publication (A)     1983 onwards
Patent         Examined Publication (B1)      1979 onwards
Utility Model  Unexamined Publication (U)     1998 onwards
Utility Model  Examined Publication (Y1)      1979 onwards

※ All publications are updated daily.

3. Specifics
- Case Based Grammar for MT: 300,000 patterns accumulated
  300,000 sentence patterns were extracted from patent documents, that is, the sentence types found in patent documents were turned into patterns.
  ex) 본발명 + 은 NP1 + 를 제공하 + 는 것 + 를 과제 + 로 하 + 다 (The present invention has as its object to provide NP1)
      본발명 + 은 NP1 + 에 관한 것 + 으로 NP2 + 로 구성되 + 다 (The present invention relates to NP1 and is composed of NP2)
- Indexing Method: Word indexing

4. Problems when parsing Korean language patent documents
- Word order
  ex) (English) S + V + O => (Korean) S + O + V
  Original KR: 본 발명은 회로에 관한 것으로, 특히 반도체 회로에 관한 것이다
  Pattern: 본 발명은 NP1에 관한 것으로, 특히 NP2에 관한 것이다
  MT: The present invention relates to NP1, particularly, to NP2
- A variety of expressions
  ex) Red => 빨갛다, 뻘겋다, 붉다, 불그스름하다, 불그레하다 > 빨갛다
- Problem with spacing rule
  ex) 동기화방안 (synchronization plan) can also be segmented as 동+기화+방안 (east evaporation plan), 동기+화+방안 (same period fire plan), or 동기+화방+안 (same period studio plan)
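The two examples above (pattern-based translation and ambiguous spacing) can be sketched in a few lines of Python. This is a minimal illustration, not the K2E-PAT engine: the regex encoding of the pattern, the toy lexicons, and the function names `translate_with_pattern` and `segmentations` are all assumptions made for the sketch.

```python
import re

# Hypothetical toy noun-phrase lexicon (an assumption, not the K2E-PAT dictionary).
NP_LEXICON = {
    "회로": "a circuit",
    "반도체 회로": "a semiconductor circuit",
}

# The slide's sentence pattern, encoded as a regex with two NP slots.
PATTERN = re.compile(
    r"^본 발명은 (?P<np1>.+?)에 관한 것으로, 특히 (?P<np2>.+?)에 관한 것이다\.?$"
)
TEMPLATE = "The present invention relates to {np1}, particularly, to {np2}"


def translate_with_pattern(sentence):
    """Return the English template filled with translated NPs, or None if no match."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    # Untranslatable noun phrases are passed through unchanged.
    slots = {name: NP_LEXICON.get(np, np) for name, np in match.groupdict().items()}
    return TEMPLATE.format(**slots)


# Hypothetical segmentation lexicon built from the glosses on the slide.
SEGMENT_LEXICON = {"동", "동기", "동기화", "기화", "화", "화방", "방안", "안"}


def segmentations(text, lexicon=SEGMENT_LEXICON):
    """Enumerate every way to split unspaced text into lexicon words."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            for rest in segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results


if __name__ == "__main__":
    print(translate_with_pattern("본 발명은 회로에 관한 것으로, 특히 반도체 회로에 관한 것이다"))
    for split in segmentations("동기화방안"):
        print("+".join(split))
```

With this toy lexicon the unspaced compound yields four candidate segmentations, which is why a real system needs some way (frequency, domain, patterns) to prefer 동기화+방안.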

5. Methodology handling vocabulary problem
- Synonymy
  ex) Computer: query => (컴퓨터+콤퓨터+전자계산기+컴푸터)
- Acronym
  ex) O -> Oxygen atom, CC -> Carbon-carbon bond
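A minimal sketch of the synonym and acronym handling above, assuming simple lookup tables. The tables hold only the entries shown on the slide; the function names and the reading of "+" as an OR between spelling variants are assumptions.

```python
# Toy tables holding only the slide's examples (the real dictionary is not shown).
SYNONYMS = {
    "computer": ["컴퓨터", "콤퓨터", "전자계산기", "컴푸터"],
}
ACRONYMS = {
    "O": "Oxygen atom",
    "CC": "Carbon-carbon bond",
}


def expand_query(term):
    """Expand an English query term into its Korean variants, joined with '+'."""
    variants = SYNONYMS.get(term.lower(), [term])
    return "(" + "+".join(variants) + ")"


def expand_acronym(token):
    """Replace a known acronym with its full form; leave other tokens as they are."""
    return ACRONYMS.get(token, token)


if __name__ == "__main__":
    print(expand_query("Computer"))
    print(expand_acronym("CC"))
```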

5. Methodology handling vocabulary problem: homonym pattern list
Homonym: 배 => Pear [Food], Ship [Vehicle], Abdomen [Body]
Homonym Pattern List:
  [Food] + 를 + eat, have, ...
  [Vehicle] + 를 + ride, get, ...
  [Body] + 가 + hungry, empty, ...
Example sentence: 내 동생이 배를 타고 영국으로 가고 있다.
Matched pattern: 배 + 를 타 + 고
K2E-PAT MT Result: My younger brother gets on the ship and the ship goes to the United Kingdom.
Possible Translation: My younger brother is on a boat to England.
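The homonym pattern list can be read as a lookup from (particle, verb) to a semantic attribute. The sketch below assumes the sentence has already been split into morphemes by some analyzer, and uses the Korean stems 먹 (eat), 타 (ride) and 고프 (hungry) as stand-ins for the English verbs on the slide; those stems and the function name are assumptions.

```python
# Senses of the homonym from the slide, keyed by semantic attribute.
SENSES = {
    "배": {"Food": "pear", "Vehicle": "ship", "Body": "abdomen"},
}

# (particle, verb stem, attribute): hypothetical Korean stems standing in for
# the slide's "eat", "ride" and "hungry".
PATTERNS = [
    ("를", "먹", "Food"),
    ("를", "타", "Vehicle"),
    ("가", "고프", "Body"),
]


def disambiguate(noun, particle, verb_stem):
    """Pick the sense whose attribute matches the first pattern that fires."""
    for pattern_particle, pattern_stem, attribute in PATTERNS:
        if particle == pattern_particle and verb_stem.startswith(pattern_stem):
            return SENSES.get(noun, {}).get(attribute)
    return None


if __name__ == "__main__":
    # 배 + 를 타 + 고, as in the example sentence on the slide.
    print(disambiguate("배", "를", "타"))
```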

5. Methodology handling vocabulary problem: Dictionary classified by IPC & frequency
- 3 million terminologies are classified by IPC and frequency

Korean  English  Attribute  IPC        Frequency
배      Times    Units      -          67%
        Boat     Vehicle    B Section  12%
        Ship     Vehicle    B Section  10%
        Abdomen  Body       A Section  8%
        Pear     Food       A Section  3%

ex) Fisherman's boat
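One way such a dictionary could be used is sketched below. The entries are the five rows of the table; the selection rule (prefer candidates in the document's IPC section, otherwise fall back to the most frequent candidate overall) is an assumption for illustration, since the slide does not state how K2E-PAT combines the two signals.

```python
# (English, attribute, IPC section or None, relative frequency) from the slide's table.
ENTRIES = {
    "배": [
        ("times", "Units", None, 0.67),
        ("boat", "Vehicle", "B", 0.12),
        ("ship", "Vehicle", "B", 0.10),
        ("abdomen", "Body", "A", 0.08),
        ("pear", "Food", "A", 0.03),
    ],
}


def pick_translation(word, ipc_section=None):
    """Assumed rule: most frequent candidate in the IPC section, else most frequent overall."""
    candidates = ENTRIES.get(word, [])
    if not candidates:
        return None
    in_section = [c for c in candidates if ipc_section and c[2] == ipc_section]
    pool = in_section or candidates
    return max(pool, key=lambda c: c[3])[0]


if __name__ == "__main__":
    print(pick_translation("배", "B"))
    print(pick_translation("배", "A"))
    print(pick_translation("배"))
```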

Ⅲ. How did we come up with cross lingual retrieval system?

1. Two options for foreign users to access English versions of KR patents
Case 1. English language data: KR patent retrieval through K2E MT DB
  Components: English keyword; web server; search server; DB server (K2E MT data)
Case 2. Korean language data: KR patent retrieval through CLIR system
  Components: English keyword; web server; MT server (MT of the keyword into Korean); search server; DB server (Korean data)
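Case 2 can be sketched as "translate the query, then search the Korean data". The keyword dictionary, the document identifiers, and the linear substring scan standing in for the search server are all hypothetical; the point is only that the stored documents stay in Korean and translation happens at query time.

```python
# Hypothetical keyword dictionary standing in for the MT server.
KEYWORD_DICT = {
    "circuit": ["회로"],
    "semiconductor": ["반도체"],
}

# Hypothetical Korean documents standing in for the DB server (Korean data).
DOCS = {
    "KR-1": "본 발명은 반도체 회로에 관한 것이다",
    "KR-2": "본 발명은 선박에 관한 것이다",
}


def translate_keyword(keyword):
    """MT step: map an English keyword to its Korean equivalents (empty if unknown)."""
    return KEYWORD_DICT.get(keyword.lower(), [])


def search(keyword, docs=DOCS):
    """Search step: return ids of Korean documents containing any translated term."""
    terms = translate_keyword(keyword)
    return sorted(
        doc_id for doc_id, text in docs.items()
        if any(term in text for term in terms)
    )


if __name__ == "__main__":
    print(search("circuit"))
```

Substring matching is used here because Korean particles attach to nouns (회로에), so plain whitespace tokens would not match the bare dictionary form.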

2. Weakness of KR patent retrieval through K2E MT database: it needs lots of resources
- System cost: an increasing number of servers is needed to hold the rapidly growing volume of patent documents.
- Staff & time cost: staff must be hired and time spent to build the English machine-translated database of Korean patent documents.
- Maintenance cost: the whole MT database has to be upgraded every time the MT engine or dictionary is upgraded, so maintenance cost rises with each upgrade.

Ⅳ. Strength of the cross lingual patent retrieval system

1. Less Maintenance Fee (as discussed in the previous chapter)
2. Consistent Data Quality Improvement
Consistent Dictionary Upgrade
- Quarterly upgrade
- How to upgrade the dictionary:
  - Words that the MT engine failed to translate are automatically extracted
  - New words are extracted from new KR patent publications (yearly)
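The automatic extraction of words the MT engine failed to translate could look roughly like the sketch below. It assumes that untranslated words are left in Hangul in the English output, which the slides do not confirm; the function name and sample strings are hypothetical.

```python
import re
from collections import Counter

# Runs of precomposed Hangul syllables (Unicode block U+AC00 to U+D7A3).
HANGUL_RUN = re.compile(r"[\uac00-\ud7a3]+")


def untranslated_terms(mt_outputs):
    """Count Hangul runs left over in MT output, most frequent first."""
    counts = Counter()
    for text in mt_outputs:
        counts.update(HANGUL_RUN.findall(text))
    return counts.most_common()


if __name__ == "__main__":
    outputs = [
        "The 동기화 plan uses a circuit",
        "A 동기화 unit and a 콤퓨터",
    ]
    for term, count in untranslated_terms(outputs):
        print(term, count)
```

Sorting by frequency gives dictionary maintainers a natural priority order for each quarterly upgrade.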

Ⅴ. What needs to be improved

1. Efforts to improve quality: semantically based query expansion
- Plan to expand each word in the original query by its nearest neighbors in the other language
  ex) if the query is "boat", it can be expanded to ship, vessel, craft, water, etc.
2. Efforts to improve speed
- Service response time: 0.5 sec
- Average time for one document retrieval: 10 sec
- Large volume document retrieval takes longer and, in the worst case, leads to a time-out error.
- Speed will get worse if the query expansion technique is adopted.
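Nearest-neighbor query expansion can be sketched with cosine similarity over word vectors. The three-dimensional vectors below are invented toy values, and the example stays within English for readability; a cross lingual version would need vectors for both languages in one shared space, which is an assumption beyond what the slides describe.

```python
import math

# Invented toy vectors; real vectors would come from a trained embedding model.
VECTORS = {
    "boat": (0.90, 0.80, 0.10),
    "ship": (0.85, 0.90, 0.10),
    "vessel": (0.80, 0.85, 0.20),
    "craft": (0.70, 0.60, 0.30),
    "circuit": (0.10, 0.10, 0.90),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def expand(term, k=3):
    """Return the term followed by its k nearest neighbors by cosine similarity."""
    if term not in VECTORS:
        return [term]
    query = VECTORS[term]
    ranked = sorted(
        (word for word in VECTORS if word != term),
        key=lambda word: cosine(query, VECTORS[word]),
        reverse=True,
    )
    return [term] + ranked[:k]


if __name__ == "__main__":
    print(expand("boat", k=3))
```

The parameter `k` makes the speed concern above concrete: every added neighbor is one more term the search has to evaluate.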

Ⅵ. Future Plan

1. Short term goal
- Consistent dictionary upgrade
- Establishment of query expansion ( )
2. Long term goal
- Cross lingual retrieval system for Chinese, Japanese and Korean patent documents
- Combination of ontology with the existing system

Thank You
Korea Institute of Patent Information
© 2007 KIPI. All rights reserved. This presentation is for informational purposes only. KIPI MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.