© Magister Ltd 2004, 2005
Database evaluation: Part 2.

Two case studies
1. How many pamphlets (WO-A) did WIPO publish in 2000?
–Fundamental question of database content
–Quantitative?
2. How many patents/applications from 2002 refer to uses of elemental gold or its compounds?
–Database searchability
–Qualitative?

Sources for question 1
WIPO data
–Press release, paper PCT Gazette
–IPDL PCT Gazette
esp@cenet
ESPACE-ACCESS CD-ROM
Questel-Orbit
–WOPATENT, PCTFULL, PlusPat
STN
–PCTFULL

“The PCT in 2000”
Source: “The Patent Cooperation Treaty in 2000”, Geneva: WIPO, 2001

Basic test no. 1 - Definition
What does this figure represent?
–Quote: “The number of international applications published in 2000 in each of the languages of publication was as follows:...”
–Quote: “In 2000, the Gazette included entries relating to the 79,947 international applications which were published in 2000 in the form of PCT pamphlets…”

Variations on a theme
No explicit mention of
–reprinted or correction documents,
–delayed search reports,
–cases withdrawn after allocation of publication number.
The use of the term “PCT pamphlets” may mean that only complete specifications are being counted, not WO-A3 or similar.

Paper PCT Gazette
Lowest entry = 00/00001
Highest entry = 00/79858
–Implication: 79,858 cases were published.
–WIPO says: 79,947 cases were published.
–Where are the missing 89?
–NOTE: if the difference is due to withdrawal, we would expect the highest Gazette entry to be higher than actual publications - but it’s lower.

IPDL PCT Gazette
Difficult to locate year truncation feature:
–DP/*/*/2000
Help refers to (*) specifically as RIGHT-hand truncation operator
Result = 98,644 records (deviation = 18,697)
–this includes WO-A3 and other correction documents
–DP/*/*/2000 AND ( KI/A1 OR KI/A2 )
Result = 85,873 records (deviation = 5,926)
No way of expanding the KI field to locate other possibilities.

Other comments on IPDL
Relevance score = 0 ?

esp@cenet ® / ESPACE-ACCESS
Worldwide file:
–Publication no. = WO and Publication date = 2000 (no truncation)
Result = 79,858 (exactly the same as the paper Gazette range, deviation from WIPO = -89)
ESPACE-ACCESS
–Publication no. = WO2000*
Result = 79,850
–Publication no. = WO2000* & (KI=A1, A2)
Result = 68,635!
Does this imply that KD codes are not accurately applied to all records?

STN/MicroPatent file PCTFULL
Strategy: 2000/PY
Comment: Very close to the 79,858 suggested by the paper Gazette

STN/MicroPatent file PCTFULL
Strategy: 2000/PY & CC/LA

STN/MicroPatent file PCTFULL
Strategy: 2000/PY & CC/LA & FT/FA

STN/Univentio file PCTFULL
Strategy: 2000/PY

STN/Univentio file PCTFULL
Strategy: 2000/PY & CC/LA
Note: In both MicroPatent and Univentio, the sum of languages yields a different total to the publication year: could imply either that the /LA field is not being accurately filled, or that documents are missing, or both.

STN/Univentio file PCTFULL
Strategy: 2000/PY & CC/LA & DETD/FA
Note: In both MicroPatent and Univentio, the availability of bibliographic data in the appropriate language is still no guarantee of availability of full text.

Questel WOTEXT file
EPO file, now withdrawn - replaced by PCTFULL file from Univentio
Drawn up using different selection criteria
–up to mid-2000, preference given to an English-language representative document, e.g. a fast-publishing US-B replaced an equivalent WO-A
–after mid-2000, WIPO XML full-text for all English, French and German cases

Questel/WOTEXT

Questel WOPATENT
Bibliographic file only, data supplied by WIPO
–Publication date and kind can be searched in PN field:
/PN 2000 AND (A1/PN OR A2/PN)
Result = 79,857 (1 different from the Gazette total)
–of which 68,024 were WO-A1 + 11,833 were WO-A2
–Total = 79,857!
–Not possible to analyse by publication language? - PNL field not on summary sheet

Questel WOPATENT
?..IND /APL A
Beginning of the index.
 1     5114  CN
 2       14  CRO
 3       34  CS
 4       13  CZE
 5     2686  DA
 6   133141  DE
 7        1  DK
 8   624721  EN
 9     5493  FI
10    43696  FR
11       41  HR
12       79  HU
13       21  HUN
14     1594  IT
15      368  ITA
…….
 1       26  TR
 2        2  TUR
 3        1  US
End of the index.
The APL field is the only obvious field for language, and does not correspond to the language of PUBLICATION.
INID codes 25 (filing language) and 26 (publication language) exist for this purpose - why are they not being used?

Questel PCTFULL
Full text file, data supplied by Univentio (same as STN version)
–Publication date and kind can be searched as for WOPATENT
/PN 2000 AND (A1/PN OR A2/PN)
Result = 79,857 (1 different from the Gazette total and identical to WOPATENT)
Analysis by Kind Code also the same as WOPATENT
–Full text available = 76,750

Questel PCTFULL
Strategy: 2000/PN & CC/LA & DESC=YES
Notes: Same data supplier, different numbers of texts available. Questel total with language code = 10 more than strategy without language code

Questel PlusPat
Bibliographic - file producer = Questel
–Publication date and kind can be searched as for WOPATENT: two (apparently) equivalent command strings -
/PN 2000 AND WO = 83,229
–presumably including WOA3 documents
/PN WO AND PD=2000 = 79,858
–of which WOA1 = 68,022, WOA2 = 11,836
–Language analysis available using 3-letter codes
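The kind-code breakdowns quoted for WOPATENT and PlusPat can be cross-checked with simple arithmetic. The sketch below is illustrative only: the dictionary layout and the function name `kind_code_gap` are assumptions, and the figures are the ones reported on the slides.

```python
# Totals and A1/A2 breakdowns as quoted on the WOPATENT and PlusPat slides.
reported = {
    "WOPATENT": {"total": 79_857, "A1": 68_024, "A2": 11_833},
    "PlusPat": {"total": 79_858, "A1": 68_022, "A2": 11_836},
}


def kind_code_gap(entry):
    """Return the total minus the sum of the A1 and A2 counts."""
    return entry["total"] - (entry["A1"] + entry["A2"])


# A gap of zero means the A1 and A2 counts fully account for the total.
gaps = {name: kind_code_gap(entry) for name, entry in reported.items()}

# Differences between the two files, per kind code (PlusPat minus WOPATENT).
shift = {
    code: reported["PlusPat"][code] - reported["WOPATENT"][code]
    for code in ("A1", "A2")
}

for name, gap in gaps.items():
    print(f"{name}: records not explained by A1 + A2 = {gap}")
print(f"PlusPat minus WOPATENT by kind code: {shift}")
```

Both breakdowns are internally consistent, yet the two files disagree on how many records carry each kind code, which fits the suspicion that kind codes are not applied identically everywhere.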

Questel PlusPat
Strategy: (WOA1/PN OR WOA2/PN) & PD=2000 & CCC/LA
Notes: Closest match yet to the official WIPO totals - but language code is still apparently causing data loss

Summary
A simple search on publication year and kind code yields substantial variation
–but the exact explanation requires more research
It appears that the language code is not being applied correctly
–same possibly applies to the Kind Codes
In a real search situation, missing full texts would cause significant data loss
“The truth is rarely pure - and never simple” Oscar Wilde
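To see the variation at a glance, the counts reported in case study 1 can be compared with the official WIPO figure of 79,947. This is a minimal sketch: the labels are shorthand invented here, and only results that the slides state explicitly are included.

```python
WIPO_OFFICIAL = 79_947  # figure quoted from "The PCT in 2000"

# (shorthand label, reported count) pairs taken from the slides.
results = [
    ("Paper PCT Gazette (highest entry number)", 79_858),
    ("IPDL, DP/*/*/2000", 98_644),
    ("IPDL, A1 or A2 only", 85_873),
    ("esp@cenet Worldwide", 79_858),
    ("ESPACE-ACCESS, WO2000*", 79_850),
    ("ESPACE-ACCESS, A1 or A2 only", 68_635),
    ("Questel WOPATENT, A1 or A2", 79_857),
    ("Questel PCTFULL, A1 or A2", 79_857),
    ("Questel PlusPat, /PN 2000 AND WO", 83_229),
    ("Questel PlusPat, /PN WO AND PD=2000", 79_858),
]

# Signed deviation of each source from the official figure.
deviations = {label: count - WIPO_OFFICIAL for label, count in results}

# List the sources from closest to furthest.
for label, dev in sorted(deviations.items(), key=lambda item: abs(item[1])):
    print(f"{label:<40} {dev:+,}")
```

Sorting by absolute deviation makes it easy to separate the sources that sit within about a hundred records of the official figure from those that are off by thousands.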

How many English WO’s in 2000?

Case study 2
How many patents/applications from 2002 refer to uses of elemental gold or its compounds?
–Sub-question: what proportion are US publications?
Evaluation factors:
–Database searchability
–Qualitative?

Sources for question 2
WIPO IPDL
Chemical Abstracts
World Patent Index
IFI Claims ®
(esp@cenet ®)

Sample search:
USPTO granted patents:
–‘Quick Search’ option: Term 1 = 1/1/2002->12/31/2002 (Issue Date) AND Term 2 = gold (All Fields) = 8,854 patents
–Range from US 6334244-B granted Jan 1, 2002 to US 6502221-B granted Dec 31, 2002
–Initial impression - large number of electronics cases (films, connectors etc.)
–High recall, low precision

Second search
USPTO published applications:
–‘Quick Search’ option: PD/1/1/2002->12/31/2002 AND gold (all fields) = 0 applications (?!)
–Re-run in ‘Advanced Search’: PD/1/1/2002->12/31/2002 and SPEC/gold = 11,140 applications
–highest number = US 2002/0199221-A (unable to browse to other hit-lists)
–Maybe it was having an ‘off-day’?

Refine the search
False drops:
–US 6499593-B (Golf Bag)
Refs. cited: “Article, New Gold Accessories for 2000, Golf Illustrated--by: Laurie Lee Dovey, Equipment Editor (No date).”
–US PP 13443-P2 (Nectarine tree named `Burnectfive`)
Specification: “Floral nectaries.-- Color. -- A dull orange-gold (RHS Greyed Red Group 178 B).”

Refine the search
False drops:
–US 6404519-B (Method of advertising on a motor vehicle)
Inventor City: Gold Hill, NC
–US 6465189-B (Systematic evolution of ligands by exponential enrichment: blended selex)
Inventor: Larry Gold
–US D464586-S (Sock sculpture)
Attorney: Gold & Rizvi, P.A.

“Experience is what you get when you don’t get what you want” Dan Stanford (1850-94)

Initial conclusion
Many of the false drops are due to the definition of “all fields”
–literally includes all text fields, plus all front page bibliographic data fields as well.
Lesson
–always be clear about what is included in the ‘basic index’ of your database
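The effect of an over-broad “all fields” index can be reproduced on a toy record set. The sketch below is illustrative only: the field names (`inventor_city`, `inventor`, `attorney`, `claims`) and the `search` function are assumptions, not the USPTO system's real schema, and the last record is invented to give the claims search something to find. The first three records carry only the data quoted on the false-drop slides; their claims text is not reproduced there, so it is simply absent here.

```python
records = [
    {"pn": "US 6404519-B",
     "title": "Method of advertising on a motor vehicle",
     "inventor_city": "Gold Hill, NC"},
    {"pn": "US 6465189-B",
     "title": "Systematic evolution of ligands by exponential enrichment: blended selex",
     "inventor": "Larry Gold"},
    {"pn": "US D464586-S",
     "title": "Sock sculpture",
     "attorney": "Gold & Rizvi, P.A."},
    # Hypothetical record, invented for illustration only.
    {"pn": "HYPOTHETICAL-1",
     "title": "Supported metal catalyst",
     "claims": "A catalyst comprising gold particles on an alumina support."},
]


def search(records, term, fields=None):
    """Naive case-insensitive substring search.

    fields=None searches every field except the publication number,
    mimicking an 'all fields' index; otherwise only the named fields.
    """
    term = term.lower()
    hits = []
    for rec in records:
        names = fields if fields is not None else [k for k in rec if k != "pn"]
        if any(term in rec.get(name, "").lower() for name in names):
            hits.append(rec["pn"])
    return hits


all_fields_hits = search(records, "gold")
claims_hits = search(records, "gold", fields=["claims"])
print("all fields:", all_fields_hits)
print("claims only:", claims_hits)
```

In this toy set the all-fields search returns every record, three of them purely because of names and addresses, while the claims-only search returns just the record that actually discusses gold.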

Limiting field of search
Granted patents file:
–ISD/20020101->20021231 and ACLM/gold = 1,321 patents
By using the ‘claims’ field, we achieve two improvements:
–substantial increase in precision
–substantially eliminates Design Patents.
smaller ACLM field: typically only one claim in the form “The ornamental design for […], as shown and described”

Alternative document types
Re-issue Patents:
–ISD/20020101->20021231 and gold (all fields) and APT/2 = 26 patents
Not linked to their original issue patent in this file
Unconventional topics:
–US 6422036 (Jewelry clasp)
unlikely to be covered by Chemical Abstracts?
Lesson:
–don’t assume that your database includes all candidate answers of all types

WIPO IPDL full-text
Lesson: clarify timeliness criteria of file before starting to evaluate:
–File ‘help’ notes some delay in release of text (typically 2-3 weeks)
Lesson: clarify multilingual search capability in a multilingual file...
–DP/20020101->20021231 and (ET/gold or ABE/gold or DEE/gold or CLE/gold)
German and Spanish field labels not available
319 hits, including reprinted documents (A3)

Abbreviation searching
DP/20020101->20021231 and (ET/gold or ABE/gold or DEE/gold or CLE/gold or DEE/Au)
–121,584 hits
Field DEE includes front page data, and retrieves every designation of Australia!
Lesson:
–consider ambiguity of search terms, especially in light of field contents.
Question: How many other chemical element symbols correspond to ST.3 country codes?
Answer coming up at the end of this session...

Further search terms
Up to now, based on a very crude strategy
–basic words, abbreviations, synonyms
Database evaluation with respect to subject-based searching should always include strategies optimised for each file:
–CAS - RN’s
–IFI - Uniterms, linking and role indicators
–WPI - Manual Codes, subscriber abstracts etc.

Chemical Abstracts
Registry file:
–Au/ELS = 39,006 records (L1)
–HELP RNYEAR shows 2002 registrations = 380148-72-1 to 477930-11-3
S L1 RAN=(380148-72-1,477930-11-3) = 1,964 compounds registered during 2002 (L2)
N.B. not the same as ‘compounds registered from documents published in 2002’
–safer to use L1 RAN=(380148-72-1,) = 2,420 (L3)
–Answers will include isotopes, compounds and alloys
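The difference between the closed and open-ended Registry Number ranges can be illustrated outside STN. The sketch below assumes that a range test amounts to comparing the digits that precede the check digit, which is a simplification and not a description of how STN evaluates RAN; the function names are hypothetical. The check-digit rule used (weighted digit sum, modulo 10) is the standard CAS one, and the sample numbers are those quoted in this deck's example hits.

```python
LOW_2002 = "380148-72-1"   # first RN registered in 2002, per HELP RNYEAR
HIGH_2002 = "477930-11-3"  # last RN registered in 2002, per HELP RNYEAR


def sequence_number(rn):
    """Return the digits before the check digit as a single integer."""
    head, middle, _check = rn.split("-")
    return int(head + middle)


def check_digit_ok(rn):
    """Validate the last digit: weight digits 1, 2, 3... from the right, sum, mod 10."""
    head, middle, check = rn.split("-")
    digits = head + middle
    total = sum(w * int(d) for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(check)


def in_range(rn, low, high=None):
    """Closed range test; high=None gives the open-ended form RAN=(low,)."""
    n = sequence_number(rn)
    if n < sequence_number(low):
        return False
    return high is None or n <= sequence_number(high)


# Registry Numbers quoted on the example-hit slides.
examples = ["496877-84-0", "433295-32-0", "461667-30-1", "12244-57-4"]
for rn in examples:
    print(rn,
          "valid" if check_digit_ok(rn) else "INVALID",
          "in 2002 range" if in_range(rn, LOW_2002, HIGH_2002) else "outside 2002 range",
          "in open range" if in_range(rn, LOW_2002) else "outside open range")
```

The alloy 496877-84-0, cited by a patent published in January 2002, falls above the 2002 range and is caught only by the open-ended form, while 12244-57-4 predates the range entirely and can only be reached through the full element set (L1).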

Chemical Abstracts
CAPlus file
–Cross L3 from Registry (L4); & P/DT & 2002/PY.B: 132 documents, each citing one or more Au compounds registered >=2002
–Compare P/DT & 2002/PY.B & (GOLD/TI OR GOLD/AB) = 519
–Compare L1 & P/DT & 2002/PY.B NOT (GOLD/TI OR GOLD/AB) = 2,735
both include new uses of older compounds

Example hits
PL 182430-B1, pub. 20020131
–“Method of making ohmic contacts in III-V semiconductor radiation sources.”
–496877-84-0 : 95% Au, 4.5% Zn alloy
JP 2002-161327-A2, pub. 20020604
–“Sintered electric contact material, its manufacture, and circuit breaker.”
–433295-32-0 : 70% W, 30% Au, 0.1% Sb alloy.

Example hits
RU 2188430-C2, pub. 20020827
–“Method for predicting arterial hypertension development during anti-inflammatory therapy in rheumatoid arthritis patients.”
cites 12244-57-4, Tauredon
JP 2002274841-A2, pub. 20020925
–“Superconductor materials”
cites 461667-30-1, Gold magnesium boride ((Au,Mg)B2)

Derwent WPI
Available fields:
Text fields
–Basic index, Titles, Extension Abstracts
Manual Codes
–CPI subscriber only, EPI open to everyone
Fragment Codes
–subscriber only
Lesson:
–WPI includes a range of search options - not all open to all users
evaluate with the customer’s access in mind

Extension Abstracts
=> S (GOLD OR AU) AND 2002/PY.B
L1 4313 (GOLD OR AU) AND 2002/PY.B
=> S (GOLD OR AU)/BI,ABEX AND 2002/PY.B
L2 4530 (GOLD OR AU)/BI,ABEX AND 2002/PY.B
=> S L2 NOT L1
L3 217 L2 NOT L1
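The three answer-set sizes in the Extension Abstracts session can be checked against each other. The sketch below rests on one assumption, that every record retrieved from the default index (L1) is also retrieved when the extension abstract is added (L2); the variable names are invented for illustration.

```python
l1_count = 4313  # (GOLD OR AU) AND 2002/PY.B, default index
l2_count = 4530  # (GOLD OR AU)/BI,ABEX AND 2002/PY.B
l3_count = 217   # L2 NOT L1

# If L1 is a subset of L2, then |L2 NOT L1| must equal |L2| - |L1|.
consistent_with_subset = (l2_count - l1_count) == l3_count

# Share of the broader answer set reachable only through the extension abstract.
share_only_in_abex = l3_count / l2_count

print(f"subset assumption holds: {consistent_with_subset}")
print(f"found only via ABEX: {share_only_in_abex:.1%}")
```

Roughly one record in twenty of the broader set would be missed by a searcher without access to the extension abstracts.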

No record in Basic Abstract
AN 2003-271291 [27] WPIX
TI Catalyst for carboxylate-ester synthesis contains metal ultrafine particle having preset average particle diameter supported on inorganic oxide support.
PI JP2002361086 A 20021217 (200327)* 11p B01J-023-52
AB JP2002361086 A UPAB: 20030429
NOVELTY - A catalyst for carboxylate-ester synthesis…..
DETAILED DESCRIPTION - An INDEPENDENT CLAIM is included for manufacture...
USE - For synthesis of carboxylate ester…
ADVANTAGE - The catalyst has excellent catalytic activity….
TECHNOLOGY FOCUS - INORGANIC CHEMISTRY - Preferred Support: The inorganic oxide support...

Extension Abstract
ABEX JP 2002361086 A UPTX: 20030429
EXAMPLE - 10 mmol/L chloroauric-acid aqueous solution (500 ml) was maintained at 65-70degreesC and pH was adjusted to 7 using 0.5N sodium hydroxide aqueous solution. gamma-alumina AC-12R (40 g) was added to the aqueous solution with stirring…... The metal fixation material obtained by filtration was dried at 100degreesC for 10 hours, then bake-processed at 300degreesC in air for 3 hours and a metal support (gold/gamma-alumina) having metal supported on the alumina support, was obtained. The amount of metal on the support was 4.6 weight% with respect to the support…….

Effective use of Manual Codes
Many of the CPI Manual Codes are too wide in scope to give precise retrieval for this search
However, they can be used in combination with other search terms (e.g. text, IPC) to set a context for retrieval
–e.g. N02-E04/MC AND GOLD/TI, AB limits retrieval specifically to gold in the context of catalysis.

Fragment Codes
Applied only to certain chemical patents, primarily to aid retrieval of compounds disclosed only in generic form.
A679 is the code for gold
=> S A679/M0,M1,M2,M3,M4,M5,M6 AND 2002/PY.B
L6 531 A679/M0,M1,M2,M3,M4,M5,M6 AND 2002/PY.B
=> S L6 NOT (L1 OR L2)
L7 81 L6 NOT (L1 OR L2)

Example answer
AN 2003-209212 [20] WPIX
TI Emulsions useful as solid dosage forms comprises a mixture of a drug-containing emulsion and a solid particle adsorbent.
PI US2002160049 A1 20021031 (200320)* 15p A61K-009-00
AB US2002160049 A UPAB: 20030324
NOVELTY - An emulsion composition (I) in….
M2 *02* A679 A960 A970 B415 B720 B743 B770 B815 B831 C710 H4 H401 H481 H8 J0 J014 J2 J273 J4 J471 J490 J9 K0 L8 L814 L821 L831 M210 M211 M212 M250 M262 M283 M315 M321 M332 M344 M349 M381 M391 M411 M431 M510 M520 M530 M540 M620 M782 M904 M905 N103
DCN: R09330-K; R09330-M; R11043-K; R11043-M

IFI CLAIMS ®
US Patents only, from 1950
Uniterm Chemical Indexing
–UN=55769 (UT=GOLD)
–UN=34216 (UT=GOLD, INORGANIC)
–UN=34217 (UT=GOLD, ORGANIC)
Can be linked to specific ‘Roles’ in CDB only
–Present
–Reactant
–Product

IFI CLAIMS ®
Compound Terms
–Specific registered compounds
–UN=71665 (UT=Gold Chloride, AuClH2)
–UN=98063 (UT=Gold Chloride, AuCl3)
etc. (at least 37 others)

IFI CLAIMS ®
General Term vocabulary
–Descriptive names for classes of chemicals
–UN=20582 Gold compounds and salts /STO/
replaced by alternate indexing; only appropriate for a full retrospective search.
–General Term Thesaurus also suggests UN=07328 Compounds, Group 1B

Basic strategy
18845 UN=55769 (UT=GOLD)
741 UN=34216 (UT=GOLD, INORGANIC)
940 UN=34217 (UT=GOLD, ORGANIC)
S1 722 PY=2002 AND UN=(55769 OR 34216 OR 34217)
S2 317 S1 NOT GOLD
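The value of controlled indexing can be expressed as the share of indexed hits that a text search would not have found. The sketch below uses the IFI counts from the basic strategy (S1, S2) and, for comparison, the WPI Fragment Code counts (L6, L7) reported earlier in the deck; the function name `index_only_share` is an assumption introduced for illustration.

```python
def index_only_share(indexed_hits, hits_without_text_term):
    """Fraction of index-retrieved records that lack the plain-text search term."""
    return hits_without_text_term / indexed_hits


# IFI CLAIMS: S1 = 722 records indexed for gold in 2002, S2 = 317 of them without GOLD.
ifi_share = index_only_share(722, 317)

# WPI Fragment Codes: L6 = 531 records coded A679, L7 = 81 not found by the text searches.
wpi_fragment_share = index_only_share(531, 81)

print(f"IFI Uniterms, index-only share: {ifi_share:.1%}")
print(f"WPI Fragment Codes, index-only share: {wpi_fragment_share:.1%}")
```

The two shares are not directly comparable, because the text fields searched differ between the files, but both show that a text-only strategy leaves a sizeable part of the indexed answer set unretrieved.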

Sample result
US 2002/0081459-A1, pub. 20020627
–“Magneto-optical recording medium and method of reproducing the same”
–no text term present (title, abstract, claim)
–Claim 3: “A magneto-optical recording medium...wherein the nonmagnetic layer is...selected from the group consisting of SiN, SiO2, AlN, ….Pt, Au, Si and Ge.”
Indexed UN 55769, role 10 (present as starting material)

Sample result
US 2002/0081397-A1, pub. 20020627
–“Fabrication of conductive/non-conductive nanocomposites by laser evaporation”
–no text term present (title, abstract, claim)
–Claim 7: “The structure of claim 1 wherein the particles of electrically conducting material are metallic nano-particles.”
Indexed UN 55769, role 10 (present as starting material); the only definition of ‘metallic’ (which includes gold) is found in the body of the specification - NOT in the claims.

Misleading text
US 2002/0081397 uses the word ‘gold’ in two contexts:
–as part of the invention
“The electrically conducting particles are preferably made of carbonaceous material...or metallic material (such as, for example, gold)…”
–as part of the equipment
“For demonstration purposes, a glass substrate was prepared with 2 gold contact pads….”
If we saw this first, we might be tempted to discard this record as irrelevant
Lesson: Learn to trust your indexing - until proven otherwise

False drops we avoided...
US 2002/0045493-A1 “Golf ball, and golf ball printing ink.”
–Claim 1: “A golf ball comprising a ball body, and a gold-colored mark printed on a surface of the ball body…”
US 2002/0024168-A1 “Decorative candle”
–Claim 12: “The glitter candle composition of claim 1, wherein the glitter material comprises … Timiron Super Gold ™…”

Summary lessons
Understand field contents
–especially what is included in the Basic Index
Don’t assume database coverage
–check at the KD level as well as the country
Understand timeliness
–may be an important criterion for evaluation
Understand multi-lingual aspects
–are they present? / how are they handled?
Try to anticipate ambiguity in sample searches
Optimise each strategy to each database
Evaluate with the customer’s access limitations in mind
Get to know and trust the indexing

Final summary:
Strategy = US/PC & 2002/PY & GOLD/TI,AB

Quiz answer
How many elements have chemical symbols corresponding to ST.3 country codes?
Symbol  Element    Country
AG      Silver     Antigua & Barbuda
AL      Aluminium  Albania
AM      Americium  Armenia
AR      Argon      Argentina
AT      Astatine   Austria
AU      Gold       Australia
and that’s only the A’s……
My estimate = 56 if you include EU (Europium) for the Community Patent
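The overlap can be computed mechanically once both lists are to hand. The sketch below is illustrative only: the element list is the present-day set of two-letter symbols, which is longer than the one available when this talk was given, and `SAMPLE_COUNTRY_CODES` is a deliberately tiny sample (the six codes from the table above plus US, JP and DE), not the full ST.3 list. It therefore demonstrates the method and does not reproduce the estimate of 56.

```python
# Two-letter chemical element symbols (present-day periodic table).
ELEMENT_SYMBOLS = """
He Li Be Ne Na Mg Al Si Cl Ar Ca Sc Ti Cr Mn Fe Co Ni Cu Zn
Ga Ge As Se Br Kr Rb Sr Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb
Te Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf
Ta Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa Np Pu
Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl
Mc Lv Ts Og
""".split()

# Small sample of two-letter country codes; NOT the complete ST.3 list.
SAMPLE_COUNTRY_CODES = {"AG", "AL", "AM", "AR", "AT", "AU", "US", "JP", "DE"}


def ambiguous_codes(symbols, country_codes):
    """Return codes that are also element symbols when case is ignored."""
    upper_symbols = {symbol.upper() for symbol in symbols}
    return sorted(upper_symbols & set(country_codes))


clashes = ambiguous_codes(ELEMENT_SYMBOLS, SAMPLE_COUNTRY_CODES)
print(clashes)
```

Substituting a complete, current list of ST.3 codes for the sample set would give an up-to-date count of clashes for a case-insensitive search system.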

Conclusion
Thank you very much for listening.
Contact details:
–Magister Ltd:
–New telephone: (+44) (0)118 966 6520
–New facsimile: (+44) (0)118 966 6620
MAGISTER ® is a registered trade mark of Magister Ltd in the United Kingdom, for consultancy services in patent information.

