Patent CLEF
John Tait, Chief Scientific Officer, IRF
The IRF Mission
To bridge the gap between information retrieval research and the world of professional search, especially in patents and intellectual property.
To promote open research on very large scale information retrieval.
To make available a facility that enables large scale information retrieval and in-depth processing of patent and other complex data.
Patents - General
Across the world there are about 60 million patents, and the number is growing rapidly.
Patent documents form the most important shared information pool for:
Knowledge and research
Innovative capacity and commercial strength
Legal information
80% of the world's technical and scientific knowledge can be found in patent documents; in some branches of industry the share is higher still.

Patents - Commercial Importance
Intellectual Property (IP):
Innovation improves competitiveness, creates jobs, promotes growth and secures prosperity.
Patents are the only valid and binding instrument to protect innovation.
Intangible Assets:
A patent is an important commercial asset: a monopoly on the use of an invention.
Licensing has become a significant revenue source for many companies.
Distinctive Patent Search Characteristics
High Recall: a single missed document can invalidate a patent.
Session Based: a single search may involve days of cycles of results review and query reformulation.
Defensible: the process and results may need to be defended in court.
Matrixware
Established in 2005
Headquarters in Vienna
Over 70 employees: an expert team of software developers, technicians, mathematicians, language experts and other specialists
Field of activity: Information Retrieval in the segment of Intellectual Property
Products: innovative solutions for searching and categorising patent data
Committed to providing:
1. A sample from the Alexandria patent database
2. Leonardo, an Eclipse-based open IR development platform, populated with various tools (general IR, NLP, MT, UI), but not necessarily in time for CLEF 2009
Patent Retrieval Distinctive Problems
Patent Process
Agent: Submit to Patent Office → Grant Patent → Defend Patent
Types of Patent Search
1. Patentability
2. Validity
3. Clearance (Freedom to Operate)
4. Infringement
5. State of the Art
6. Patent Landscape
Types 1-3 depend on prior art search.
Very High Recall
Any prior publication can invalidate a patent:
Other patents, including lapsed ones
Scientific publications
Even comics!
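Recall is the standard IR measure behind this slide: the fraction of all relevant documents that a search actually returns. The minimal Python sketch below uses hypothetical document IDs and a hypothetical `recall` helper; it shows that missing a single piece of prior art (here a Russian document) leaves recall short of 1.0, which in patent search can be the difference between a valid and an invalid patent. In real searches the full relevant set is unknown, so this is only an illustration of the measure.

```python
def recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of the relevant documents that appear in the retrieved list."""
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    found = relevant.intersection(retrieved)
    return len(found) / len(relevant)


# Hypothetical prior-art judgments for one patent application.
relevant = {"EP-A", "US-B", "RU-C", "JP-D"}
retrieved = ["EP-A", "US-B", "JP-D", "DE-X", "FR-Y"]

print(recall(retrieved, relevant))  # 0.75: one relevant document was missed
print(relevant - set(retrieved))    # the missed document: {'RU-C'}
```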
Session Based
Patent professionals searching:
Often spend 2 or more days on one search
May review more than 1000 results
Work with other professionals (lawyers, chemical engineers, chemists, marketing, etc.)
Have to record and defend the search process to clients and courts
Classification
All patents are classified (IPC)
Automatic classification is possible
People search for gaps
Multilingual
1. A Russian patent can invalidate a British patent
2. Patterns of filing language are complex and changing
3. Patents come in families: the same idea in different jurisdictions and languages
4. MT is already widely used
Filing Languages
English continues to be the dominant language.
Chinese is the most rapidly growing language and may surpass English shortly (China is now bigger than the US).
Activity in India is growing rapidly but looks set to be English-dominated.
Cyrillic-script languages, especially Russian, are also rising rapidly.
Japanese and Korean are very important.
German and French are important but declining relatively.
Spanish is underrepresented relative to its number of speakers worldwide.
"Minor" European languages are declining rapidly.
PAIR 08 CIKM Workshop
Includes a proposed TREC Chemistry Track and a proposal from Erik Graf and Leif Azzopardi (Glasgow) on automatic test collection creation
Breakout Session
Meeting Room 1: through the tunnel at the end of corridor 118 (Mødelokale 1.1)
Areas for Discussion:
Test Collection Creation
Task(s)
Evaluation Methodologies
Organizational Issues
Future Developments
Thank you for your attention
Any questions?
Mailing List Subscription: