1
Quality Taxonomies
Jim Nisbet, Senior Vice President of Technology, Semio Corporation
Knowledge Technologies 2001, March 5th, 2001
2
Ontology / Taxonomy
- Root Ontology
- Taxonomy Generation
- Static Discovery
- Dynamic Discovery
3
What is Quality?
“Best value for the money”
According to this definition, a costly product entitles you to high performance; likewise, a low-cost product or service is expected to deliver poorly. For example, a loosely built demo is both predictable and acceptable, since its quality is low conformance at low cost.
4
What is Quality?
“Good Quality is Nominal Conformance”
Taxonomy quality is defined as conformance to:
- valid requirements;
- explicitly documented development standards; and
- implicit characteristics that are expected of all professionally developed taxonomies, such as good maintainability.
5
Standards
- ISO 2788-1986. International Organization for Standardization. Documentation: Guidelines for the Establishment and Development of Monolingual Thesauri. 2nd ed. n.p.: ISO, 1986. (ISO 2788-1986(E)). Available in the U.S. from the American National Standards Institute.
- ISO 5964-1985. International Organization for Standardization. Documentation: Guidelines for the Establishment and Development of Multilingual Thesauri. n.p.: ISO, 1985. (ISO 5964-1985(E)). Available in the U.S. from the American National Standards Institute.
- ANSI/NISO Z39.19-1993. National Information Standards Organization. Guidelines for the Construction, Format, and Management of Monolingual Thesauri. Bethesda, MD: NISO Press, 1994. 69 p.
- SEMIO Quality Plan v1, 2000.
- ISO/IEC 13250, Topic Maps.
- RDF: http://www.w3.org/RDF
- XML: http://www.w3.org/XML
6
Project Plan
1. Kick-off
2. Requirements Review
3. Lexicon Review
4. Taxonomy Review
5. Tags Review
6. Final Review
7
1. Kick-off
- Objectives: purpose, scope, scale, users, conditions of receipt
- Roles
  - Supplier
  - Customer: Admin, KE, Experts, Users
- Planning
- Training and Transfer
8
2. Requirements Review
- Sources
- Lexicon
- Ontology
- Install
9
Sources
- Dispersion (multiplicity, size, homogeneity)
- Refresh
- Access
10
Typical Patterns
- Disparity: adjust sources; adjust the crawl strategy; isolate communities / taxonomies
11
Lexicon
- Vocabularies, etc.
- Substitutions: acronyms, synonyms, etc.
- Preferred keywords: brand names, etc.
- Banned keywords
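The lexicon rules above (substitutions, preferred keywords, banned keywords) can be pictured as a normalization pass over raw extracted terms. This is a minimal sketch only: the rule tables, the function name `normalize_terms`, the `min_count` threshold, and the reading of "preferred" as "kept even when rare" are all hypothetical assumptions, not Semio's implementation.

```python
# Minimal sketch: applying lexicon rules to raw extracted terms.
# All rule tables and names here are hypothetical examples.

SUBSTITUTIONS = {  # variants (acronyms, synonyms) -> canonical term
    "irs": "internal revenue service",
    "tax obligation": "tax liability",
}
PREFERRED = {"semio"}  # preferred keywords: kept even when rare (assumption)
BANNED = {"click here", "home page"}  # banned keywords: always dropped


def normalize_terms(terms, min_count=2):
    """Return {canonical term: count} after substitution, banning, and filtering."""
    counts = {}
    for raw in terms:
        term = raw.strip().lower()
        term = SUBSTITUTIONS.get(term, term)  # acronym / synonym substitution
        if term in BANNED:
            continue  # banned keywords never reach the lexicon
        counts[term] = counts.get(term, 0) + 1
    # Keep frequent terms, plus any preferred keyword regardless of frequency.
    return {t: c for t, c in counts.items() if c >= min_count or t in PREFERRED}


print(normalize_terms(["IRS", "Internal Revenue Service", "Semio", "click here", "tax obligation"]))
# {'internal revenue service': 2, 'semio': 1}
```

Substitution runs before the frequency count so that variants pool their occurrences into one canonical entry.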
12
Typical Patterns
- Lack of requirements: use librarian resources
13
Ontology
- Thesaurus?
- Is the information domain analysis complete, consistent, and accurate?
- Is the partitioning of the problem complete?
14
Typical Patterns
- Directory versus taxonomy: isolate “directory” branches.
- Thesaurus versus taxonomy: put an ontology on top of the thesaurus; check as soon as possible that the thesaurus generics match the extracted lexicon.
- Very high-level design for the top-category requirements: plan to work bottom-up.
- See also Taxonomy (functions, combinations, etc.).
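The early check of thesaurus generics against the extracted lexicon can be as simple as a match rate plus a list of unmatched generics. A minimal sketch, assuming both are available as plain term lists and compared case-insensitively; the function name and sample terms are hypothetical.

```python
# Minimal sketch: how many thesaurus generics already appear in the extracted lexicon?
# Function name and sample terms are hypothetical.

def thesaurus_match(generics, lexicon):
    """Return (match rate, sorted list of generics missing from the lexicon)."""
    lex = {term.lower() for term in lexicon}  # case-insensitive comparison
    missing = sorted(g for g in generics if g.lower() not in lex)
    rate = (len(generics) - len(missing)) / len(generics) if generics else 0.0
    return rate, missing


rate, missing = thesaurus_match(
    ["Loan", "Liability", "Mortgage", "Pension"],
    ["loan", "liability", "term loan"],
)
print(rate, missing)  # 0.5 ['Mortgage', 'Pension']
```

A low rate early on suggests the thesaurus and the collection describe different vocabularies, which is the situation the slide warns about.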
15
Install
Implementation / Integration:
- Are external and internal interfaces properly defined?
- Are all requirements traceable to the system level?
- Has prototyping been conducted for the user/customer?
- Is performance achievable within the constraints imposed by other system elements?
- Are requirements consistent with schedule, resources, and budget?
16
Typical Patterns
- Scale
- Security
- Missing documents
17
3. Lexicon Review
- Coverage: extracted words / words (extracted index / index)
- Sources benchmarking: coverage, extraction quality, topic distribution
- Structure: most frequent phrases, most productive generics, substitutions, exceptions
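The coverage ratio above (extracted words / words, and extracted index / index) can be computed directly once the collection is tokenized. A minimal sketch, assuming whitespace tokenization and a single-word extracted lexicon given as a Python set; both are simplifications, and the function names are hypothetical.

```python
# Minimal sketch of lexicon coverage; whitespace tokenization is a simplification.

from collections import Counter


def lexicon_coverage(documents, extracted_lexicon):
    """Return (word coverage, index coverage) of a single-word extracted lexicon.

    word coverage  = covered word occurrences / all word occurrences
    index coverage = covered distinct words / all distinct words (the "index")
    extracted_lexicon must be a set of lowercase words.
    """
    tokens = [w for doc in documents for w in doc.lower().split()]
    if not tokens:
        return 0.0, 0.0
    covered = sum(1 for w in tokens if w in extracted_lexicon)
    index = set(tokens)
    return covered / len(tokens), len(index & extracted_lexicon) / len(index)


def most_frequent_phrases(phrases, n=3):
    """Most frequent extracted phrases, for the 'Structure' part of the review."""
    return Counter(phrases).most_common(n)


print(lexicon_coverage(["Tax liability loan", "loan term"], {"loan", "liability"}))
# (0.6, 0.5)
```

The two ratios differ because frequent words weigh more in word coverage, while every distinct word counts once in index coverage.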
18
Typical Patterns
- Low frequency / quality for the most meaningful content: increase the size of the value corpus; filter and re-import the lexicon
19
4. Taxonomy Review
- Taxonomy Operation: correctness, reliability, usability, integrity, efficiency
- Taxonomy Revision: maintainability, flexibility, testability
- Taxonomy Transition: portability, reusability, interoperability
20
Folk Taxonomies Design
- Ranks: Unique Beginner, Life Form, Generic, Specific, Varietal
- Example: Tax Liability, Loan, Term loan, Short-term loan
The Berlin and Kay model: Taxonomy = Nomenclature + Terminology
21
Correctness
- Accuracy
- Completeness
- Consistency
22
Accuracy
- Precision
- Recall
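Precision and recall can be measured per category by comparing the documents the taxonomy placed in a category with the documents a reviewer judges to belong there. A minimal sketch, assuming both are available as sets of document ids; the function name and ids are hypothetical.

```python
# Minimal sketch: precision and recall of one category's document assignments.

def precision_recall(assigned, relevant):
    """assigned: ids the taxonomy put in the category; relevant: ids a reviewer expects."""
    hits = len(assigned & relevant)  # correctly assigned documents
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


p, r = precision_recall({1, 2, 3, 4}, {2, 3, 5})
print(p, round(r, 3))  # 0.5 0.667
```

Precision penalizes documents wrongly placed in the category; recall penalizes relevant documents the category missed.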
23
Completeness
Taxonomy / Maps / Lexicon / Collection
24
Concentration Works Against Quality
- Stages: Document Collection, Lexicon, Maps, Taxonomy, Tagging
- Coverage measures: Collection Coverage, Lexical Coverage, Map Coverage, Hook Coverage, Ontology Coverage, Tagging Coverage
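One reading of this slide is that each stage keeps only part of what the previous stage offered, so end-to-end coverage is the product of the stage coverages. A minimal sketch under that assumption, measuring every stage in documents still represented; the stage names follow the slide, but the counts and the function name are hypothetical.

```python
# Minimal sketch: coverage retained at each stage of the pipeline.
# Counts are hypothetical; every stage is measured in documents still represented.

def coverage_chain(stage_counts):
    """Return (per-stage coverage relative to the previous stage, overall coverage)."""
    names = list(stage_counts)  # dicts keep insertion order (Python 3.7+)
    per_stage = {
        cur: stage_counts[cur] / stage_counts[prev]
        for prev, cur in zip(names, names[1:])
    }
    overall = stage_counts[names[-1]] / stage_counts[names[0]]
    return per_stage, overall


counts = {"collection": 1000, "lexicon": 900, "maps": 720, "taxonomy": 576, "tagging": 432}
per_stage, overall = coverage_chain(counts)
print(overall)  # 0.432, the product of 0.9, 0.8, 0.8 and 0.75
```

Even with every stage above 75%, less than half of the collection survives to tagging, which is why losses at each stage compound.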
25
Consistency: Typical Patterns
- Objectivization
- Hyperonymy
- Speciation
- Necessity
26
Objectivization
Example: Employment > Firing, Hiring, Salaries
- Avoid functional categories
- Don’t mix functions / objects
- Exhaust scripts
- Match idiomatic phrases
27
Genericity
Example: Parts > Air Conditioning, Belts and Hoses, Body, Brake System, Chassis, Engine, Exhaust System, Fuel System, Glass, Ignition
- Avoid meronymy
- Don’t mix meronymy / hyperonymy
- Exhaust prototypes
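Mixed meronymy (part-of) and hyperonymy (is-a) can be flagged mechanically if a reviewer labels each parent/child link with its relation. A minimal sketch, assuming such labels exist; the labels `"is_a"` / `"part_of"`, the function name, and the sample links are hypothetical.

```python
# Minimal sketch: flag meronymic parents and parents that mix relation types.
# Relation labels "is_a" / "part_of" are a hypothetical reviewer annotation.

from collections import defaultdict


def relation_report(edges):
    """edges: (parent, child, relation) triples. Return (meronymic, mixed) parents."""
    kinds = defaultdict(set)
    for parent, _child, relation in edges:
        kinds[parent].add(relation)
    meronymic = sorted(p for p, rels in kinds.items() if "part_of" in rels)
    mixed = sorted(p for p, rels in kinds.items() if len(rels) > 1)
    return meronymic, mixed


edges = [
    ("Parts", "Engine", "part_of"),
    ("Parts", "Brake System", "part_of"),
    ("Vehicle", "Car", "is_a"),
    ("Vehicle", "Engine", "part_of"),
]
print(relation_report(edges))  # (['Parts', 'Vehicle'], ['Vehicle'])
```

"Parts" is flagged for meronymy alone, while "Vehicle" is flagged twice because it mixes part-of and is-a children.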
28
Speciation
Example (WordNet): Person > Unwelcome person > Unpleasant person > Selfish person > Opportunist > Backscratcher
- Avoid “strings” of categories
- Avoid categories defined by non-idiomatic properties
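A "string" of categories can be detected as a run in which every category except the last has exactly one child. A minimal sketch, assuming the taxonomy is a tree given as a parent-to-children mapping; the function name and the `min_length` threshold are hypothetical.

```python
# Minimal sketch: find "strings" (runs of single-child categories) in a category tree.

def category_strings(children, root, min_length=3):
    """Return runs where every category but the last has exactly one child."""
    strings = []

    def walk(node, chain):
        kids = children.get(node, [])
        if len(kids) == 1:
            walk(kids[0], chain + [node])  # the run continues downward
            return
        run = chain + [node]  # the run ends at a leaf or a branching category
        if len(run) >= min_length:
            strings.append(run)
        for kid in kids:
            walk(kid, [])  # each branch starts a fresh run

    walk(root, [])
    return strings


person = {
    "Person": ["Unwelcome person"],
    "Unwelcome person": ["Unpleasant person"],
    "Unpleasant person": ["Selfish person"],
    "Selfish person": ["Opportunist"],
    "Opportunist": ["Backscratcher"],
}
print(category_strings(person, "Person"))  # one run of six categories
```

The WordNet example on the slide comes back as a single run of six categories, a candidate for flattening.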
29
Necessity
- Avoid non-productive categories
- Avoid combinations of categories
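Non-productive categories (the first guideline) can be found by counting how many documents each category actually receives. A minimal sketch, assuming tagging results are available as a mapping from document id to category names; the function name, `min_docs` threshold, and sample data are hypothetical.

```python
# Minimal sketch: list categories that receive fewer than `min_docs` documents.

from collections import Counter


def non_productive(categories, doc_tags, min_docs=1):
    """categories: all category names; doc_tags: {doc id: [category names]}."""
    counts = Counter(tag for tags in doc_tags.values() for tag in tags)
    return sorted(c for c in categories if counts[c] < min_docs)  # Counter gives 0 if absent


doc_tags = {"d1": ["Loan"], "d2": ["Loan", "Term loan"]}
print(non_productive(["Loan", "Term loan", "Short-term loan"], doc_tags))
# ['Short-term loan']
```

Raising `min_docs` turns the same check into a threshold for categories that are too thin to justify their place.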
30
Nomenclature (Design Structure) Quality Index
- Depth
- Width
- Balance
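Depth, width, and balance can be read off a breadth-first walk of the category tree. The slide does not define the measures, so this is a minimal sketch under assumed definitions: depth is the number of levels, width is the size of the widest level, and balance is the shallowest leaf depth divided by the deepest leaf depth (1.0 means all leaves sit at the same level). Names are hypothetical.

```python
# Minimal sketch: depth, width and balance of a category tree (no cross-references).
# The three definitions are assumptions; the slide does not specify them.

def structure_metrics(children, root):
    level_sizes = []  # number of categories at each level
    leaf_depths = []  # level index (root = 0) of every leaf category
    frontier, depth = [root], 0
    while frontier:
        level_sizes.append(len(frontier))
        next_level = []
        for node in frontier:
            kids = children.get(node, [])
            if not kids:
                leaf_depths.append(depth)
            next_level.extend(kids)
        frontier, depth = next_level, depth + 1
    deepest = max(leaf_depths)
    return {
        "depth": len(level_sizes),  # number of levels, root included
        "width": max(level_sizes),  # widest level
        "balance": min(leaf_depths) / deepest if deepest else 1.0,
    }


print(structure_metrics({"Root": ["A", "B"], "A": ["C"]}, "Root"))
# {'depth': 3, 'width': 2, 'balance': 0.5}
```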
31
Complexity Index
Cyclomatic complexity increases with the number of cross-references within the taxonomy, so it indicates how complex the taxonomy is and how difficult it will be to test.
The Taxonomy Complexity Index combines:
- autonomy
- closure
- similarity
- typicality
- commonality
- redundancy
- stability
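McCabe's cyclomatic complexity, V(G) = E - N + 2P (edges, nodes, connected components), was defined for program control-flow graphs; applying it to a taxonomy graph is an adaptation. For a pure tree, E = N - 1 and P = 1, so V(G) = 1, and each cross-reference adds exactly 1. A minimal sketch treating links as undirected; the function name and sample categories are hypothetical.

```python
# Minimal sketch: McCabe's V(G) = E - N + 2P applied to a taxonomy graph.

def cyclomatic_complexity(nodes, edges):
    parent = {n: n for n in nodes}  # union-find to count connected components

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    components = len({find(n) for n in nodes})
    return len(edges) - len(nodes) + 2 * components


nodes = ["Tax", "Liability", "Loan", "Term loan", "Short-term loan"]
tree = [("Tax", "Liability"), ("Tax", "Loan"), ("Loan", "Term loan"), ("Term loan", "Short-term loan")]
print(cyclomatic_complexity(nodes, tree))  # 1
print(cyclomatic_complexity(nodes, tree + [("Liability", "Term loan")]))  # 2: one cross-reference
```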
32
Maturity Index
The IEEE standard 982.1-1988 suggests a software maturity index; adapted to taxonomies, it indicates the stability of the taxonomy. The Maturity Index combines:
- the number of modules in the current ontology / taxonomy;
- the number of modules in the current ontology / taxonomy that have been changed;
- the number of modules added to the current ontology / taxonomy;
- the number of modules deleted from the previous version of the ontology / taxonomy.
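In the software form of the index, the four counts combine as SMI = (M_T - (F_a + F_c + F_d)) / M_T, where M_T is the number of modules in the current version and F_a, F_c, F_d are the modules added, changed, and deleted. Values approach 1.0 as the structure stabilizes. A minimal sketch, treating each category (or branch) as a "module", which is an assumption; the function name and counts are hypothetical.

```python
# Minimal sketch: maturity index, treating taxonomy categories as "modules".

def maturity_index(total, added, changed, deleted):
    """SMI = (M_T - (F_a + F_c + F_d)) / M_T; closer to 1.0 means more stable."""
    if total <= 0:
        raise ValueError("the current version must contain at least one module")
    return (total - (added + changed + deleted)) / total


print(maturity_index(total=200, added=10, changed=20, deleted=10))  # 0.8
```

Tracking the index across successive releases shows whether revisions are settling down or still reshaping the taxonomy.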
33
5. Tags Review
- Document coverage
- Concepts coverage
Example tags for http://www.TaxSource.com: Liability 1.289, Federal Funds 0.746
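Document coverage and concept coverage can be computed from weighted tags like those in the example above. A minimal sketch, assuming document coverage is the share of documents with at least one tag above a weight threshold, and concept coverage is the share of taxonomy concepts assigned to at least one document. The second document URL, the concept list, and the function name are hypothetical; the TaxSource weights are copied from the slide without interpreting their scale.

```python
# Minimal sketch: document coverage and concept coverage from weighted tags.

def tag_coverage(doc_tags, concepts, min_weight=0.0):
    """doc_tags: {doc: {concept: weight}}. Return (document coverage, concept coverage)."""
    tagged_docs, used = 0, set()
    for tags in doc_tags.values():
        kept = {c for c, w in tags.items() if w > min_weight}
        if kept:
            tagged_docs += 1
        used |= kept
    doc_cov = tagged_docs / len(doc_tags) if doc_tags else 0.0
    concept_cov = len(used & set(concepts)) / len(concepts) if concepts else 0.0
    return doc_cov, concept_cov


doc_tags = {
    "http://www.TaxSource.com": {"Liability": 1.289, "Federal Funds": 0.746},
    "http://example.com/untagged": {},  # hypothetical document with no tags
}
print(tag_coverage(doc_tags, ["Liability", "Federal Funds", "Loan", "Term loan"]))
# (0.5, 0.5)
```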
34
6. Final Review
- Receipt
- Maintenance
35
Quality Taxonomies
Jim Nisbet, niz@semio.com
Knowledge Technologies 2001