Quality Taxonomies. Jim Nisbet, Senior Vice President of Technology, Semio Corporation. Knowledge Technologies 2001, March 5th, 2001.

1 Quality Taxonomies. Jim Nisbet, Senior Vice President of Technology, Semio Corporation. Knowledge Technologies 2001, March 5th, 2001

2 Ontology / Taxonomy Root Ontology Taxonomy Generation Static Discovery Dynamic Discovery

3 What is Quality?  “Best value for the money”  By this definition, a costly product is expected to deliver high performance, and a low-cost product or service is expected to deliver less. For example, a rough demo delivery is both predictable and acceptable, since its quality is low conformance at low cost.

4 What is Quality?  “Good Quality is Nominal Conformance”  Taxonomy quality is defined as conformance of the taxonomy to: valid requirements; explicitly documented development standards; and implicit characteristics expected of all professionally developed taxonomies, such as good maintainability.

5 Standards  ISO 2788-1986 International Organization for Standardization. Documentation—Guidelines for the Establishment and Development of Monolingual Thesauri. 2nd ed. n.p.: ISO, 1986. (ISO 2788-1986(E)). (Available in the U.S. from American National Standards Institute)  ISO 5964-1985 International Organization for Standardization. Documentation—Guidelines for the Establishment and Development of Multilingual Thesauri. n.p.: ISO, 1985. (ISO 5964-1985(E)). (Available in the U.S. from American National Standards Institute)  ANSI/NISO Z39.19-1993 National Information Standards Organization. Guidelines for the Construction, Format, and Management of Monolingual Thesauri. Bethesda, MD: NISO Press, 1994. 69p. (ANSI/NISO Z39.19-1993)  SEMIO Quality Plan v1 2000  ISO/IEC 13250 Topic Maps  RDF: please refer to RDF at http://www.w3.org/RDF and XML at http://www.w3.org/XML

6 Project Plan 1. Kick-off 2. Requirements Review 3. Lexicon Review 4. Taxonomy Review 5. Tags Review 6. Final Review

7 1. Kick-off  Objectives Purpose Scope Scale Users Conditions of receipt  Roles Supplier Customer –Admin –KE –Experts –Users  Planning  Training and Transfer

8 2. Requirements Review  Sources  Lexicon  Ontology  Install

9 Sources  Dispersion (Multiplicity, Size, Homogeneity)  Refresh  Access

10 Typical Patterns  Disparity  Adjust sources  Adjust crawl strategy  Isolate communities / taxonomies

11 Lexicon  Vocabularies, etc.  Substitutions: Acronyms, Synonyms, etc.  Preferred Keywords: Brand Names, etc.  Banned Keywords

12 Typical Patterns  Lack of requirements  Use Librarian Resources

13 Ontology  Thesaurus?  Is the information domain analysis complete, consistent, and accurate?  Is the partitioning of the problem complete?

14 Typical Patterns  Directory versus Taxonomy  Isolate “directory” branches  Thesaurus versus Taxonomy  Put an ontology on top of the thesaurus  Check as early as possible that the thesaurus generics match the extracted lexicon  Very high-level design for top-category requirements  Plan to work bottom-up  See also Taxonomy (functions, combinations, etc.)

15 Install  Implementation / Integration: Are external and internal interfaces properly defined? Are all requirements traceable to the system level? Has prototyping been conducted for the user/customer? Is performance achievable within the constraints imposed by other system elements? Are requirements consistent with schedule, resources, and budget?

16 Typical Patterns  Scale  Security  Missing Documents

17 3. Lexicon Review  Coverage: Extracted words / Words (Extracted Index / Index)  Sources benchmarking: Coverage, Extraction quality, Topic distribution  Structure: Most Frequent Phrases, Most Productive Generics  Substitutions  Exceptions
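The coverage ratio on this slide (extracted words over words) can be sketched as follows. This is a minimal illustration, assuming coverage is measured over distinct, case-insensitive terms; the function name `lexicon_coverage` and the sample term lists are hypothetical and do not come from the presentation.

```python
def lexicon_coverage(extracted_terms, corpus_terms):
    """Share of distinct corpus terms that also appear in the extracted lexicon."""
    corpus = {term.lower() for term in corpus_terms}
    if not corpus:
        return 0.0  # nothing to cover
    extracted = {term.lower() for term in extracted_terms}
    return len(extracted & corpus) / len(corpus)


# Hypothetical sample data, for illustration only.
corpus_terms = ["loan", "term loan", "liability", "tax", "federal funds"]
extracted_terms = ["Loan", "liability", "tax"]

coverage = lexicon_coverage(extracted_terms, corpus_terms)
print(f"Lexicon coverage: {coverage:.2f}")
```

The same ratio could be computed per source to support the source benchmarking item, by calling the function once per source with that source's terms.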

18 Typical Patterns  Low level of frequency / quality for the most meaningful content  Increase size of value corpus  Filter and re-import lexicon

19 4. Taxonomy Review  Taxonomy Operation Correctness Reliability Usability Integrity Efficiency  Taxonomy Revision Maintainability Flexibility Testability  Taxonomy Transition Portability Reusability Interoperability

20 Folk Taxonomies Design  Example hierarchy: Tax, Liability, Loan, Term loan, Short-term loan  Ranks: Unique Beginner, Life Form, Generic, Specific, Varietal  The Berlin and Kay model: Taxonomy = Nomenclature + Terminology

21 Correctness  Accuracy  Completeness  Consistency

22 Accuracy  Precision  Recall
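Precision and recall can be computed for a single taxonomy category by comparing the documents tagged into it with the documents a reviewer judges relevant. The sketch below uses the standard definitions; the function name and the document identifiers are hypothetical.

```python
def precision_recall(assigned, relevant):
    """Return (precision, recall) for one category.

    assigned: documents the system tagged into the category
    relevant: documents a reviewer judged to belong in the category
    """
    assigned, relevant = set(assigned), set(relevant)
    true_positives = len(assigned & relevant)
    precision = true_positives / len(assigned) if assigned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical review of one category.
assigned = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc2", "doc3", "doc5"}

precision, recall = precision_recall(assigned, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```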

23 Completeness Taxonomy Maps Lexicon Collection

24 Concentration Works Against Quality Lexicon Document Collection Maps Taxonomy Tagging  Tagging Coverage  Ontology Coverage  Hook Coverage  Map Coverage  Lexical Coverage  Collection Coverage

25 Consistency: Typical Patterns  Objectivization  Hyperonymy  Speciation  Necessity

26 Objectivization Employment Firing Hiring Salaries  Avoid functional categories  Don’t mix functions / objects  Exhaust scripts  Match idiomatic phrases

27 Genericity Parts Air Conditioning Belts and Hoses Body Brake System Chassis Engine Exhaust System Fuel System Glass Ignition  Avoid meronymy  Don’t mix meronymy / hyperonymy  Exhaust prototypes

28 Speciation Person Unwelcome person Unpleasant person Selfish person Opportunist Backscratcher  Avoid “strings” of categories  Avoid (non-idioms) properties for categories (WordNet)

29 Necessity  Avoid non-productive categories  Avoid combinations of categories

30 Nomenclature (Design Structure) Quality Index  Depth  Width  Balance
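The slide names depth, width, and balance without defining them. The sketch below is one possible reading, under stated assumptions: depth is the number of levels below the root, width is the largest number of categories on any one level, and balance is the ratio of the shallowest leaf depth to the deepest leaf depth (1.0 means all leaves sit at the same level). The function name and the "Deduction" category are hypothetical, and the taxonomy is assumed to be a tree without cycles.

```python
from collections import deque


def structure_metrics(children, root):
    """Return (depth, width, balance) for a tree given as {parent: [children]}."""
    leaf_depths = []
    level_sizes = {}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        level_sizes[depth] = level_sizes.get(depth, 0) + 1
        kids = children.get(node, [])
        if not kids:
            leaf_depths.append(depth)  # a category with no children is a leaf
        for kid in kids:
            queue.append((kid, depth + 1))
    max_depth = max(leaf_depths)
    width = max(level_sizes.values())
    # Assumed definition of balance: shallowest leaf depth / deepest leaf depth.
    balance = min(leaf_depths) / max_depth if max_depth else 1.0
    return max_depth, width, balance


# Hypothetical taxonomy; "Deduction" is invented for the example.
taxonomy = {
    "Tax": ["Liability", "Deduction"],
    "Liability": ["Loan"],
    "Loan": ["Term loan"],
    "Term loan": ["Short-term loan"],
}

depth, width, balance = structure_metrics(taxonomy, "Tax")
print(depth, width, balance)
```

A low balance value flags a taxonomy where one branch is developed far more deeply than its siblings.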

31 Complexity Index  Cyclomatic complexity increases with the number of cross references within the taxonomy, giving an indication of complexity and difficulty of testing.  Taxonomy Complexity Index combines: autonomy, closure, similarity, typicality, commonality, redundancy, stability
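One way to see why cross references raise complexity is McCabe's cyclomatic number, V(G) = E - N + 2P, where E is the number of edges, N the number of nodes, and P the number of connected components. A pure tree has E = N - 1 and so V(G) = 1; each cross reference adds one edge and raises the value by one. Applying the formula to a taxonomy graph this way is an assumption about what the slide intends; the function name and the sample links are hypothetical.

```python
def cyclomatic_complexity(parent_child_links, cross_references):
    """McCabe's V(G) = E - N + 2P, assuming a single connected graph (P = 1)."""
    edges = set(parent_child_links) | set(cross_references)
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2


# Hypothetical taxonomy links (parent, child).
links = [
    ("Tax", "Liability"),
    ("Tax", "Deduction"),
    ("Liability", "Loan"),
    ("Loan", "Term loan"),
]
# Hypothetical cross references between categories.
cross_refs = [("Deduction", "Loan"), ("Term loan", "Liability")]

tree_only = cyclomatic_complexity(links, [])
with_cross_refs = cyclomatic_complexity(links, cross_refs)
print(tree_only, with_cross_refs)
```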

32 Maturity Index  The IEEE standard 982.1-1988 suggests a software maturity index; applied to a taxonomy, it provides an indication of the stability of the taxonomy.  Maturity Index combines: the number of modules in the current ontology / taxonomy; the number of modules in the current ontology / taxonomy that have been changed; the number of modules added to the current ontology / taxonomy; the number of modules deleted from the previous version of the ontology / taxonomy.
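The slide lists the four inputs but not the formula. The software maturity index from IEEE 982.1-1988 is commonly stated as SMI = (M_T - (F_a + F_c + F_d)) / M_T, where M_T is the number of modules in the current release and F_a, F_c, F_d are the modules added, changed, and deleted. The sketch below applies that formula to taxonomy modules; treating a taxonomy branch as a "module" is an assumption, and the function name and figures are hypothetical.

```python
def maturity_index(current_modules, changed, added, deleted):
    """Software maturity index: (M_T - (F_a + F_c + F_d)) / M_T.

    Values close to 1.0 indicate that the taxonomy is stabilizing.
    """
    if current_modules <= 0:
        raise ValueError("current_modules must be positive")
    return (current_modules - (added + changed + deleted)) / current_modules


# Hypothetical release: 200 modules, of which 12 changed, 6 added, 2 deleted.
smi = maturity_index(current_modules=200, changed=12, added=6, deleted=2)
print(f"Maturity index: {smi:.2f}")
```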

33 5. Tags Review  Document coverage  Concepts coverage  Example: http://www.TaxSource.com: Liability 1.289, Federal Funds 0.746
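The two coverage measures on this slide can be sketched as ratios, assuming that document coverage means the share of documents carrying at least one tag and that concept coverage means the share of taxonomy concepts used by at least one document. Those definitions, the function name, and every document other than the slide's own example are assumptions; the scores are read here as per-tag weights.

```python
def tag_coverage(doc_tags, concepts):
    """Return (document_coverage, concept_coverage).

    doc_tags: {document: {concept: score}}
    concepts: all concepts defined in the taxonomy
    """
    tagged_docs = [doc for doc, tags in doc_tags.items() if tags]
    used_concepts = {concept for tags in doc_tags.values() for concept in tags}
    document_coverage = len(tagged_docs) / len(doc_tags) if doc_tags else 0.0
    concept_set = set(concepts)
    concept_coverage = (
        len(used_concepts & concept_set) / len(concept_set) if concept_set else 0.0
    )
    return document_coverage, concept_coverage


doc_tags = {
    "http://www.TaxSource.com": {"Liability": 1.289, "Federal Funds": 0.746},
    "hypothetical-doc-b": {"Liability": 0.9},  # invented for the example
    "hypothetical-doc-c": {},                  # an untagged document
}
concepts = ["Liability", "Federal Funds", "Loan", "Term loan"]

document_coverage, concept_coverage = tag_coverage(doc_tags, concepts)
print(f"documents={document_coverage:.2f} concepts={concept_coverage:.2f}")
```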

34 6. Final Review  Receipt  Maintenance

35 Quality Taxonomies Jim Nisbet niz@semio.com Knowledge Technologies 2001

