ST22 revision proposal June-2006 WIPO-SDWG meeting Geneva.

1 ST22 revision proposal June-2006 WIPO-SDWG meeting Geneva

2 Agenda Reasons for the revision of the ST22 –Age of current standard –Expected benefits –PCT International Bureau experience –Examples of pages difficult to OCR –Conclusion Discussion / Questions

3 Age of current standard Inadequate title: “Recommendation for the presentation of patent applications typed in optical character recognition (OCR) format”. Contains valid recommendations, but they are expressed in old-fashioned terminology (ribbons, typewriter, …). Some recommendations need to be made more precise. A few new recommendations should be added to take into account the progress in OCR technology over the last 10 years. Not followed widely enough by agents/applicants: some promotion is required.

4 Expected benefits Experience shows that if documents follow simple layout rules, automatic OCR procedures are effective enough to yield a satisfactory result for full-text search purposes (i.e. an average accuracy above 98.5%). An updated standard ST22 would lead to: –Significant reductions in the cost of the OCR procedures performed by the regional/national IP offices and the IB. –Better quality of the full-text published documents built from OCR procedures. –More efficient and precise search procedures for the IP community.
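The slides do not say how the 98.5% accuracy figure is measured. As a minimal sketch, assuming it is a character-level accuracy computed against a hand-corrected reference text, it could be estimated as one minus the edit distance divided by the reference length. The function names `levenshtein` and `character_accuracy` are hypothetical and are not part of the PCT system.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Assumed metric: 1 - (edit distance / reference length), floored at 0."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = levenshtein(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


# Hypothetical threshold taken from the figure quoted on the slide.
TARGET_ACCURACY = 0.985


def meets_target(reference: str, ocr_output: str) -> bool:
    return character_accuracy(reference, ocr_output) > TARGET_ACCURACY
```

Under this assumed definition, a page of 2,000 characters would need fewer than 30 character errors to exceed the 98.5% threshold.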

5 PCT International Bureau Experience An internal automatic OCR system and a Quality Checking system have been developed by the PCT. The system was tested for 6 months and then put into production. It has been in operation since January 1, 2006 and OCRs the pamphlets published weekly by the PCT.

6 Internal OCR: key points Use an off-the-shelf commercial product and adapt it to the PCT's needs. Build a generic and scalable service so that the OCR function can be used from different applications (online or batch) and can meet the PCT's future needs. Operate the service in house to reduce costs and gain flexibility in the publication process (discontinue the outsourcing contract).

7 Internal OCR: key points OCR the description and claims sections of the published PCT pamphlets each week (circa 50’000 pages to OCR weekly). Provide the results as ST36 XML files that are used to feed the indexing engine of the Patentscope site and the espacenet site (see http://www.wipo.int/pctdb/en/browse.jsp ). Enrich the PCT electronic products with the results of the OCR (searchable PDFs added to the Rule 87 DVD).

8 Internal OCR: some figures With our hardware configuration, the OCR of a complete publication week takes around 16 hours (it runs during weekends). Five staff members perform part-time Quality Checking every Monday (around 3 to 4 man-days are spent each week on quality checking) in order to correct the worst cases.
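The figures quoted in the slides (about 50,000 pages per week and around 16 hours of OCR time) imply a rough throughput, which the short calculation below works out. Both inputs are approximate, so the results are only orders of magnitude; the function names are illustrative.

```python
def pages_per_hour(pages: int, hours: float) -> float:
    """Average number of pages processed per hour of OCR run time."""
    return pages / hours


def seconds_per_page(pages: int, hours: float) -> float:
    """Average wall-clock seconds spent per page."""
    return hours * 3600 / pages


# Approximate values quoted in the slides.
PAGES_PER_WEEK = 50_000
OCR_HOURS = 16

print(pages_per_hour(PAGES_PER_WEEK, OCR_HOURS))               # 3125.0
print(round(seconds_per_page(PAGES_PER_WEEK, OCR_HOURS), 2))   # 1.15
```

That is roughly 3,100 pages per hour, or a little over one second per page on average.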

9 Quality Checking system

10

11 Some examples of difficult pages, submitted on paper or in image form, that the revised ST22 standard should discourage...

12 Narrow fonts, justified paragraphs

13 Underline, italic, bold text

14 Subscripts too small

15 Mathematical formulae embedded in text

16 Handwritten text or cursive fonts

17 Gray or coloured backgrounds

18 Conclusion We invite the SDWG: (a) to consider the proposal to revise WIPO Standard ST.22; and (b) to consider establishing a task for the revision of WIPO Standard ST.22 and setting up a Task Force to handle such revision.

19 Agenda Reasons for the review of the ST22 –Age of current standard –Expected benefits –PCT International Bureau experience –Examples of applications difficult to OCR –Conclusion Discussion / Questions

