Presentation on theme: "Quality of Classification". Presentation transcript:
What to achieve?
Optimum: all documents pertaining to a specific technical area (concept) are found by a classification search:
$$\text{Recall} = \frac{\#\ \text{retrieved relevant documents}}{\#\ \text{existing relevant documents}} = 1$$
Priority 1: for concepts defined in the IPC, documents have all appropriate symbols.
Priority 2 (efficiency): documents have no inappropriate symbols.
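A minimal sketch of the two targets above, assuming the full set of relevant documents for a concept were known (in practice it is not). Reading the "efficiency" goal as what is usually called precision is an interpretation, not something the slide states. The function names and document identifiers are hypothetical.

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Share of the existing relevant documents that the search retrieved."""
    if not relevant:
        raise ValueError("no relevant documents defined")
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Share of the retrieved documents that are actually relevant."""
    if not retrieved:
        raise ValueError("nothing retrieved")
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical document identifiers, for illustration only.
relevant = {"D1", "D2", "D3", "D4"}
# D4 lacks the appropriate symbol (missed); D9 carries an inappropriate one (noise).
retrieved = {"D1", "D2", "D3", "D9"}

print(recall(retrieved, relevant))     # 0.75
print(precision(retrieved, relevant))  # 0.75
```

A missing symbol lowers recall, an inappropriate symbol lowers precision; the slide ranks the first problem above the second.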
Phenomenology of quality issues
- document is unclassified
- has wrong / inappropriate classification
- has outdated / invalid classification
- non-exhaustive / incomplete classification
  > appropriate symbols are missing
  > given symbols are not specific enough
- varying classifications of family members
- excessive classification
Different aspects
individual document / publication
- classification by publishing IPO
- and by other IPOs, e.g. EPO > ECLA, DPMA > "ICP", JPO, … ?
  > examiners create their own search files
different publication levels:
- unexamined (unsearched) applications
- granted patents
families: in MCD reclassification at family level
data in different databases
Unclassified documents
Published before 1.1.2006: many documents in MCD still unclassified / not reclassified:
- 92% of all documents in MCD*
- 87% of all documents of EPO members
Published after 1.1.2006:
- 97% of all documents in MCD
- 91% of all WO
Each week 6 - 8% of WO publications are not classified at all.
*cf. IPC/CE/40/4
Unclassified WO documents
Publication week 50 (13.12.2007): 260 of 3272 (7.9%)
By ISA: EP 218 (84%), KR 27 (10%), AU 5, US 5, RU 2, SE 2, CA 1
By Receiving Office: US 177, IB 31, EP 26, GB 9, KR 3, DE 2, FR 2, IL 2, …
Lesson: there are still many documents without any valid classification
> Top priority: all documents should have at least one valid classification
Wrong classification: A61N 1/00 (Electrotherapy; Circuits therefor). Courtesy of M. Meier (Audi).
Wrong classification: B60K (Arrangement or mounting of propulsion units or of transmissions in vehicles). Lesson: completely wrong classifications do occur. Courtesy of M. Meier (Audi).
Wrong classification
Example: WO2007126503
- ISR: G01L 19/02
- Espacenet: G10L 19/02
Lesson: typos may occur; flaws of concordance tables
Wrong classifications are difficult to investigate because they are difficult to find; feedback by users is needed.
Outdated / invalid classification
Business methods: G06F 17/60 → G06Q [2006.01]
- in Espacenet: 0 WO docs with G06F 17/60
- in Patentscope: 1506 WO docs with G06F 17/60
- e.g. WO2007004271: reclassified in Espacenet only to ECLA
Lesson: reclassification following revision is still incomplete
Lesson: classification data may be different in different databases
In Espacenet:
- many documents outside the PCT minimum documentation are not reclassified, e.g. CZ, UY, NZ, AR
- not all of the PCT minimum documentation is reclassified, e.g. only 678 of 14543 KR docs are reclassified in ECLA/IPC
Outdated / invalid classification
Traditional medicine: A61K 35/78 → A61K 36/.. [2006.01]
In Espacenet: 10413 docs still have 35/78 as ECLA; only 7412 thereof have 36/..
Lesson: reclassification to valid IPC incomplete
Further example: WO1998039019
- Espacenet: A61K 36/02 as IPC-AL, A61K 35/80 as ECLA
- Patentscope: A61K 35/80 as IPC
Lesson: classification data may be different in different databases
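The cross-database differences in the WO1998039019 example can be made visible with a simple set comparison. This is only a sketch: the symbol values are those quoted on the slide, while the field labels and the function name `compare_sources` are ad hoc assumptions, and no real database interface is implied.

```python
def compare_sources(records: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each source, return the symbols that are not shared by all sources."""
    if not records:
        return {}
    common = set.intersection(*records.values())
    return {source: symbols - common for source, symbols in records.items()}


# Symbols for WO1998039019 as quoted on the slide; the keys are ad hoc labels.
wo1998039019 = {
    "Espacenet IPC-AL": {"A61K 36/02"},
    "Espacenet ECLA": {"A61K 35/80"},
    "Patentscope IPC": {"A61K 35/80"},
}

diff = compare_sources(wo1998039019)
for source, not_shared in diff.items():
    print(source, sorted(not_shared))
```

An empty set for every source would mean the databases agree; here no symbol is common to all three fields, so each source reports its own symbol as a discrepancy.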
Varying classifications in family
Example: Aircraft cargo loading logistics system
- US 2005246132 A1 (3.11.2005)
- US 7100827 B2 (5.9.2006)
- DE 102005019194 A1 (24.11.2005)
- FR 2871269 A1 (9.12.2005)
Classification data on front page:
US A1 | US B2 | DE A1 | FR A1
B64C 1/22 | G06F 19/00 | G06F 17/60 | G06F 19/00
Further symbols: G06K 15/00, G07C 11/00, G06F 17/60
Lesson: classification of granted patents may be very different
Lesson: assessment of main classification varies
Varying classifications in family
Sources compared: US A1, US B2, DE A1, FR A1, Espace IPC, Espace ECLA, Depatis, PatFT (one X per source carrying the symbol)
B64C 1/20: XXX
B64C 1/22: XXX
B64D 9/00: XXX
B64D 9/00A: X
G06K 15/00: XX
G06Q 10/00:
G06Q 10/00D: X
G06F 17/60: XXX
G06F 19/00: XXXXX
G07C 11/00: XXX
Lesson: classification data from subsequent publications may not be in MCD
Lesson: some reclassification data may not be in MCD; exist as ECLA only
Varying classifications of single document
Example: WO2007126503
- ECLA: G01L 19/00B (roll up to IPC: G01L 19/00)
- IPC: G01L 19/02
Lesson: different views of different classifiers
US7258017 B1 (granted family member)
- IPC: G01L 19/04
Lesson: classification of granted patents may be different
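The "roll up to IPC" step in the example can be sketched as follows. The sketch assumes that an ECLA refinement is written as a suffix starting with a capital letter appended to an IPC group symbol, as in G01L 19/00B; that is an assumption drawn from the examples on these slides, and a real roll-up would have to consult the ECLA scheme itself. The function name `roll_up` is hypothetical.

```python
import re

# IPC-shaped prefix: section letter, two-digit class, subclass letter,
# then main group, "/", and the digits of the subgroup.
_IPC_PREFIX = re.compile(r"^([A-H]\d{2}[A-Z])\s*(\d+)/(\d+)")


def roll_up(ecla_symbol: str) -> str:
    """Strip an assumed letter-led ECLA suffix, leaving the IPC-shaped group."""
    match = _IPC_PREFIX.match(ecla_symbol.strip())
    if match is None:
        raise ValueError(f"not an IPC-shaped symbol: {ecla_symbol!r}")
    subclass, main_group, subgroup = match.groups()
    return f"{subclass} {main_group}/{subgroup}"


print(roll_up("G01L 19/00B"))  # G01L 19/00
print(roll_up("G01L 19/02"))   # G01L 19/02 (no suffix, returned unchanged)
```

For WO2007126503 the rolled-up ECLA symbol (G01L 19/00), the IPC symbol (G01L 19/02) and the IPC symbol of the granted family member (G01L 19/04) are three different groups, which is the point of the slide.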
Current problems in classification (I): IPC consistency
Family members claiming priority KR20050060661:
- KR20070005367 A, "Multifocal lens and manufacture method thereof": IPC (AL) G02B3/10
- JP2007017937 A, "Multifocal lens and method for manufacturing the same": IPC (AL) G02F1/13; G02B3/14; G02F1/1334
- US2007008599 A, "Multifocal lens and method for manufacturing the same": IPC (AL) G02B5/32
- CN1892258 A, "Multifocal lens and method for manufacturing the same": IPC (AL) G02B3/10
- EP1742100 A1, "Multifocal lens and method for manufacturing the same": IPC (AL) G02F1/1334
Lesson: classifiers may have different views of the subject matter to be classified, or interpret IPC groups differently
By courtesy of H. Wongel
Non-exhaustive classification Example: Secondary scheme A01P [2006.01] "Biocidal, pest repellant,… activity of chemical compounds" Espacenet: not in ECLA ! A01PEPA01NEP total433611054 (2%) 9999423330 (24% ) 20072104114 (5% ) 103281040 (10% ) Lesson : incompatibility of IPC and ECLA may cause non-exhaustive classification
Non-exhaustive classification
Example: EP1881839
- ECLA: A61K 36/487
- IPC: A61K 36/00
Lesson: classifications could be more specific
Lesson: relevant classifications may not be given / available as IPC
Example: A61K 36/..
- ECLA: 22440 documents
- IPC: only 17847 thereof have A61K 36/..
Example: C12Q 1/68
- Espacenet: > 100,000 docs
- ECLA: > 40 subgroups
- IPC: 0 subgroups
Causes/sources for deficiencies
"wrong" or varying intellectual classification:
- rules too complicated
- drawbacks of classification scheme (too much overlap)
- interpretation of subject matter
- differing national practice
- lack of expertise or diligence; time pressure
granted claims may differ
incompatibility ECLA - IPC; USPC concordance tables
lack or delay of reclassification:
- insufficient resources for intellectual reclassification
data exchange / management problems
data input (typos)
Options for improvement
on IPO level:
- allocate resources
- adapt / harmonize classification practice / training
- develop classification assistance tools
on user level:
- knowing deficiencies > adapt search strategies
on IPC level:
- improve user-friendliness (e.g. definitions)
- simplify IPC scheme, rules
More liberal approach when classifying? One more symbol better than one symbol missing?
Do we need to be worried about varying classifications?
Options for improvement
on MCD / database level:
- crosscheck content of databases
- pooling / compiling of classification data (in one searchable field / on family level?) of
  > classification data of family members
  > subsequent publications
  > other sources (DE: ICP, …)
- processing such compilations of classifications of different origin, e.g.: compare classifications of subsequent publications (A, B, ..) > create "trusted" classifications (e.g. class (A) = class (B))?
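The pooling idea and the "class (A) = class (B)" test could look like the sketch below. It is one possible reading of the proposal, not a method described in the presentation: a symbol counts as "trusted" when a chosen share of the family's publications carry it. The publication numbers and the assignment of symbols to them are invented for illustration, and the names `pool_family`, `trusted` and `min_share` are hypothetical.

```python
from collections import Counter


def pool_family(publications: dict[str, set[str]]) -> Counter:
    """Count, for each symbol, how many publications of the family carry it."""
    return Counter(
        symbol for symbols in publications.values() for symbol in symbols
    )


def trusted(publications: dict[str, set[str]], min_share: float = 1.0) -> set[str]:
    """Symbols carried by at least `min_share` of the family's publications."""
    counts = pool_family(publications)
    needed = min_share * len(publications)
    return {symbol for symbol, n in counts.items() if n >= needed}


# Invented family: an A publication and the subsequent B publication.
family = {
    "XX 0000001 A1": {"B64C 1/22", "G06F 19/00"},
    "XX 0000001 B2": {"G06F 19/00", "G07C 11/00"},
}

print(sorted(pool_family(family)))  # ['B64C 1/22', 'G06F 19/00', 'G07C 11/00']
print(sorted(trusted(family)))      # ['G06F 19/00']
```

The pooled list serves recall (one searchable field at family level), while the trusted subset serves precision; lowering `min_share` trades one for the other.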
Learn from / go WEB 2.0 ? "Folksonomy", "social tagging", "cooperative, collaborative classification" > include broader user community ? e.g. any searcher ? > implement feedback channels ?
Are you satisfied with classification in A61N 1/00 ? Yes / No Would you like to suggest further classifications:.............................................. Submit Click opens
Learn from / go WEB 2.0 ?
"Folksonomy", "social tagging", "cooperative, collaborative classification"
> include broader user community
> compile varying views, i.e. classifications
process such data; create "trusted" classifications
broader participation in scheme development, in particular definitions?
Tagging of IPC entries?
Thank you
More liberal approach when classifying ? One more symbol better than one symbol missing ? Do we need to be worried about varying classifications ? Include broader user community ? e.g. any searcher ? Implement feedback channels ? Create "trusted" classifications (e.g. class (A) = class (B)) ? Top priority: all documents should have at least one valid classification Priority 1: documents have all appropriate symbols Priority 2: documents have no inappropriate symbols