Almaden Services Research © 2009 IBM Corporation
COA: Finding Novel Patents through Text Analysis
Mohammad Hasan, Scott Spangler, Tom Griffin, Alfredo Alba
Scott Spangler, IBM Almaden Services Research
The BlackBerry Patents
Five patents on the subject of RF communication with mobile processors.
A judge threatened an injunction that would have forced RIM (BlackBerry) to shut down its service.
On the surface, they appear to read very directly on RIM's business.
But are these patents really what they appear to be?
Problem Addressed
How can the value of patent claims be evaluated automatically?
Most existing approaches approximate it from the field of invention plus citation analysis.
Our approach analyzes the claim text itself to discover indicators of patent worth.
Intuition
The most valuable patents are those that are among the first to claim an important technology.
Challenge: how do we discover which parts of a patent's claims are most “original”?
Method
Focus on the patent claims section.
Find all terms occurring in the claims section.
For the technical area of the patent (its patent class), discover when each of these terms first occurred in patent claims.
Term originality is then defined as a small difference between the patent date and the term's first-use date.
Create a score that ranks highly those patents with “original” terms in their claims.
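The first-use index and term-originality measure described above can be sketched as follows. This is a minimal illustration, not the COA implementation: the record layout `(patent_class, issue_date, claims_text)`, the helper names `claim_terms`, `build_first_use_index`, and `term_ages`, and the plain lowercase tokenizer with no stopword filtering are all hypothetical choices made for the example.

```python
import re
from datetime import date

def claim_terms(claims_text, max_n=3):
    """Return the set of 1- to max_n-grams in a claims section.

    Assumption: a plain lowercase word tokenizer with no stopword removal.
    """
    words = re.findall(r"[a-z0-9]+", claims_text.lower())
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }

def build_first_use_index(patents):
    """Map (patent_class, term) to the earliest date the term appears in claims of that class.

    `patents` is an iterable of hypothetical (patent_class, issue_date, claims_text) records.
    """
    first_use = {}
    for patent_class, issue_date, claims_text in patents:
        for term in claim_terms(claims_text):
            key = (patent_class, term)
            if key not in first_use or issue_date < first_use[key]:
                first_use[key] = issue_date
    return first_use

def term_ages(patent_class, patent_date, claims_text, first_use):
    """Days between each claim term's first use in the class and this patent's date.

    A small age marks an "original" term; terms missing from the index get age 0.
    """
    return {
        term: (patent_date - first_use.get((patent_class, term), patent_date)).days
        for term in claim_terms(claims_text)
    }

# Toy corpus for one hypothetical patent class (not real patent data).
corpus = [
    ("CLASS_A", date(1995, 1, 10), "A mobile processor transmitting electronic mail"),
    ("CLASS_A", date(2001, 6, 1), "electronic mail gateway switch"),
]
index = build_first_use_index(corpus)
ages = term_ages("CLASS_A", date(2001, 6, 1), "electronic mail gateway switch", index)
print(ages["electronic mail"])  # 2334: first used in the 1995 patent
print(ages["gateway switch"])   # 0: first used in this patent
```

Keying the index by patent class keeps originality relative to one technical area, as the method requires: a term that is old in one class can still be new in another.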
Description
Build an index of patent claim words, each associated with the time of its first occurrence in patent claims.
For each patent evaluated:
– Analyze each 1-, 2-, and 3-gram in the patent claims to see whether it is an original or “early” usage of those words in the claims section of that technology “area”.
– Look for subsequent usage of that term in more recent patents to calculate its “support”.
The value of a patent is based on the number of early* terms with significant** support. It is scored in one of two ways:
– Sum of support (number of patents) divided by age (number of days)
– Count of terms with support > 2 and age < 7 years
*early = within 7 years of first occurrence
**significant = at least 3 patents use the term
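The support count and the two scoring rules described above can be sketched like this. It is a hedged reading of the slide, not the production scorer: the names `support` and `coa_scores` and the `(issue_date, set_of_claim_terms)` record layout are hypothetical, and so are two interpretation choices. The ratio score is assumed to sum support/age per term over the same early, well-supported terms, and one day is added to each age so a term first used in the patent itself (age 0) does not divide by zero.

```python
from datetime import date

# Thresholds taken from the slide; the day count ignores leap days (an assumption).
EARLY_MAX_DAYS = 7 * 365   # "early": within 7 years of first occurrence
MIN_SUPPORT = 3            # "significant": at least 3 patents use the term

def support(term, patent_date, later_patents):
    """Count patents dated after `patent_date` whose claim terms include `term`.

    `later_patents` is a hypothetical list of (issue_date, set_of_claim_terms)
    records from the same patent class.
    """
    return sum(1 for issue_date, terms in later_patents
               if issue_date > patent_date and term in terms)

def coa_scores(term_stats):
    """Return (ratio_score, count_score) from {term: (age_days, support)}.

    Assumption: both scores use only early, well-supported terms, and one day
    is added to the age so that age 0 does not cause a division by zero.
    """
    qualifying = [(age, sup) for age, sup in term_stats.values()
                  if age < EARLY_MAX_DAYS and sup >= MIN_SUPPORT]
    ratio_score = sum(sup / (age + 1) for age, sup in qualifying)
    count_score = len(qualifying)
    return ratio_score, count_score

# Toy records and statistics (illustrative values only).
later = [
    (date(2002, 1, 1), {"fresh phrase"}),
    (date(2000, 1, 1), {"fresh phrase"}),  # predates the patent, so not counted
    (date(2003, 5, 5), {"fresh phrase", "old phrase"}),
]
print(support("fresh phrase", date(2001, 6, 1), later))  # 2

stats = {
    "fresh phrase": (0, 9),      # brand new, well supported: counts
    "young phrase": (2334, 40),  # under 7 years old, well supported: counts
    "rare phrase": (100, 1),     # too little support: excluded
    "old phrase": (4000, 12),    # older than 7 years: excluded
}
ratio, count = coa_scores(stats)
print(count)  # 2
```

The ratio score rewards terms that are both young and widely reused later, while the count score is a coarser tally that ignores how young or how widely reused each qualifying term is.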
How we validated this approach
Three easily identifiable metrics should correlate with patent value:
– Citations
– Lapsed fees
– Internal IBM attorney rating
None of these is perfect, but all three should roughly correlate with the intrinsic value of the patents.
Results
Citations are roughly correlated with COA scores.
Lapsed patents have lower COA scores on average than other patents.
Patents rated 1 by IBM attorneys have, on average, significantly better COA scores than those rated 3.
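The slides do not say which statistics were computed for these checks. One plausible way to run them is sketched below, assuming Spearman rank correlation between COA scores and citation counts, and simple per-group means for lapsed status and attorney rating. The helper names `average_ranks`, `spearman`, and `group_means` are hypothetical, and every score, citation count, and label is a toy value, not data from the study.

```python
from statistics import mean

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the average ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

def group_means(scores, labels):
    """Mean COA score per label (e.g. lapsed vs. active, or attorney rating)."""
    groups = {}
    for score, label in zip(scores, labels):
        groups.setdefault(label, []).append(score)
    return {label: mean(vals) for label, vals in groups.items()}

# Toy inputs for five hypothetical patents.
coa = [9.0, 4.5, 3.0, 1.2, 0.4]
citations = [30, 12, 15, 2, 0]
lapsed = [False, False, False, True, True]
rating = [1, 1, 2, 3, 3]

print(spearman(coa, citations))  # 0.9
by_lapsed = group_means(coa, lapsed)
by_rating = group_means(coa, rating)
print(by_lapsed[True] < by_lapsed[False], by_rating[1] > by_rating[3])  # True True
```

A rank correlation is a reasonable choice here because citation counts are heavily skewed; a significance test on the group differences would be the natural next step but is not shown.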
Claims Originality of BlackBerry Patents
All five patents have very lengthy, extensive claim language around electronic mail devices.
Very little text in these claims is original.
Taking context into consideration, the technical merit of these patents is questionable.
Is $120M per licensed patent an appropriate valuation?
Term | First Occurred | Difference in Days | Support
application programs stored | 7/25/ | |
information added | 8/20/ | |
interface stores | 4/29/ | |
network storing | 10/15/ | |
information network | 12/25/ | |
network information | 12/25/ | |
mail systems | 5/23/ | |
destination transmits | 7/25/ | |
processors occurs | 4/29/ | |
information accessible | 11/6/ | |
electronic mail | 7/11/ | |
interface switch | 7/25/ | |
network switch | 7/30/ | |
gateway switch | 7/25/ | |
transmitting originated | 1/1/ | |
stored originated | 7/25/ | |
interface receiving | 9/28/ | |
SIMPLE Implementation
Usage: 572 invocations of COA as of 6/15/2009
Success stories from SIMPLE to date:
VoIP analysis:
– Started from 13 original patents and grew to more than 20 that were eventually licensed.
– This drove nearly $8M in licensing revenue.
Videoconferencing analysis:
– Found 2 additional patents, each of which was sold.
– This drove upwards of $5M in licensing revenue.
SIMPLE has over 280 active users (both internal and external).
We continue to develop and grow its capabilities.
If you want to try this out yourself
Go to:
Username: sb_test8
Password: hello2You
Click Analyze / Claims Originality.
Enter one or more patent numbers.
Click the Analyze button.
Tell us what you think! (
Potential Future Application: Tracing the Source in Web Content
Credibility Scoring (“net cred”)
Conclusions and Future Work
We have demonstrated how text analysis in the patent space can provide context far more effectively than manual methods.
We believe these methods generalize to other types of unstructured information.
The ability to provide better information context and validation will be important to individuals and organizations in a world where an ever-smaller percentage of information comes from “authoritative” sources.