Presentation is loading. Please wait.

Presentation is loading. Please wait.

6 Classification Schemes. Prof. Dr. Knut Hinkelmann 2 6 Classification Schemes Organizing Information Objective: finding a way through the overwhelming.

Similar presentations


Presentation on theme: "6 Classification Schemes. Prof. Dr. Knut Hinkelmann 2 6 Classification Schemes Organizing Information Objective: finding a way through the overwhelming."— Presentation transcript:

1 6 Classification Schemes

2 Prof. Dr. Knut Hinkelmann 2 6 Classification Schemes Organizing Information Objective: finding a way through the overwhelming volume of material. Approach: organize information into patterns with related items brought together Information collections used by many people are organized in a way that correspond to the needs of most users, e.g.  the navigation on the intranet or on a homepage  the arrangement of books in libraries or bookshops support users in searching Information collections for individual use are also organized  the order of paper files reflects the way you normally use them  file management on a computer groups items according to their shared characteristics, e.g. the nature of the item (software, document, database etc) the project reference

3 Prof. Dr. Knut Hinkelmann 3 6 Classification Schemes Classification Classification is an organization means arranging information items into classes - dividing the universe of information into manageable and logical portions. A class or category is a group of concepts that have something in common. This shared property gives the class its identity. Classifications may be designed for various purposes like  scientific classification  classification for information indexing and retrieval A class may be further divided into smaller classes (or subclasses), and so on, until no further subdivision is feasible. So classification is likely to be hierarchic. Source: UDConline (http://www.udconline.net/)

4 Prof. Dr. Knut Hinkelmann 4 6 Classification Schemes Use of classification schemes Classification schemes can be used to physically group items, e.g.  books in a library or  retail goods in a supermarket  papers in a file cabinet logically organize references to information objects - in other words: metadata -, e.g.  directory on a computer  internet directory  yellow pages system file system onlin e physical

5 Prof. Dr. Knut Hinkelmann 5 6 Classification Schemes Types of classification systems Flat Organisation: no structure between categories Taxonomy: categories arranged in a hierarchical structure related things are grouped together

6 Prof. Dr. Knut Hinkelmann 6 6 Classification Schemes Classification Schemes The classification scheme can be  decided locally  represent a consensus The greater the quantity or complexity of items, the more helpful it is to follow a ready-made classification scheme, which represents a consensus as to a helpful order of classes Classification schemes may be either:  special, i.e. limited to a specific domain of interest; or  general, i.e. aiming to cover all subjects equally ('the universe of information'). Three most widely used general classification schemes are :  Dewey Decimal Classification (DDC)  Universal Decimal Classification (UDC)  Library of Congress Classification (LCC)

7 Prof. Dr. Knut Hinkelmann 7 6 Classification Schemes Example: Universal Decimal Classification (UDC) 0GENERALITIES 1PHILOSOPHY. PSYCHOLOGY 2RELIGION. THEOLOGY 3SOCIAL SCIENCES 4VACANT 5NATURAL SCIENCES 6TECHNOLOGY 7THE ARTS 8LANGUAGE. LINGUISTICS. LITERATURE 9GEOGRAPHY. BIOGRAPHY. HISTORY 0GENERALITIES 1PHILOSOPHY. PSYCHOLOGY 2RELIGION. THEOLOGY 3SOCIAL SCIENCES 4VACANT 5NATURAL SCIENCES 6TECHNOLOGY 7THE ARTS 8LANGUAGE. LINGUISTICS. LITERATURE 9GEOGRAPHY. BIOGRAPHY. HISTORY 004Computer science and technology 004.8Artificial intelligence 004.89Artificial intelligence application systems 004.891Expert systems 004.891.2Consultation expert systems 004Computer science and technology 004.8Artificial intelligence 004.89Artificial intelligence application systems 004.891Expert systems 004.891.2Consultation expert systems The Universal Decimal Classification (UDC) is a classification scheme for all fields of knowledge and knowledge representation. UDC was originally created to organize a universal bibliography UDC was created in 1895, it has been translated into over thirty languages The scheme is updated annually, its standard version - known as the Master Reference File (MRF) - is available electronically in English language, The UDC is structured in a hierarchical manner, based on ten main classes The classes are further divided decimally. The notation is basically arabic numerals: http://www.udcc.org/

8 Prof. Dr. Knut Hinkelmann 8 6 Classification Schemes Applications of UDC libraries  shelf arrangement  information retrieval (classified catalogues)  collection management (acquisition, circulation statistics, weeding) museums and archives  collection management  objects indexing and retrieval  collection display bibliographies and bibliographic databases  subject information navigation  information retrieval information services  selective dissemination of information (user's profile description) Internet  subject gateways (information presentation and navigation)  metadata (information discovery) As a source for building knowledge domain maps (ontologies), other indexing languages (thesauri) and various kinds of taxonomies and special classifications

9 Prof. Dr. Knut Hinkelmann 9 6 Classification Schemes Notation Most classification schemes, including UDC, have a notation - a code that symbolizes the subject of each class and its place in the sequence. A simple list of named classes, which would file alphabetically, would not fulfil the purpose of keeping related things together, and separated from unrelated things. This can be done by using a notation which has an inherent order, such as numerals, alphabetic notation or a mixture (alphanumeric). Notation with variable length can also express the position in the hierarchy, with each extra character representing a lower level; this is called expressive notation. Arabic numerals arranged as decimal fractions are ideal for this purpose. Decimal fractions also have the advantage of being infinitely extensible, so it is always possible to introduce further subdivisions without altering the ordinal value of the rest of the sequence. Such notation is said to be hospitable.

10 Prof. Dr. Knut Hinkelmann 10 6 Classification Schemes Example 2: Computing Classification Scheme of the Association for Computing Machinery ACM Developed to classify articles of the ACM Computing Reviews journal In the meanwhile it is used by many other computer science journals, the ACM Digital Library and the MEDOC database The full ACM classification scheme involves the following concepts :  classification codes: tree structure containing three coded levels  subject descriptors: an uncoded fourth level of the tree  general terms: predefined set of terms that apply to any elements of the tree algorithms languagessecurity design legal aspects standardization documentation management theory economics measurement verification experimantation performance human factors reliability  implicit subject descriptors (also called "Proper Noun Subject Descriptors"): names of products, systems, languages, and prominent people in the computing field, along with the category code under which they are classified www.acm.org/class/1998

11 Prof. Dr. Knut Hinkelmann 11 6 Classification Schemes The first three levels of the ACM Classification Scheme H.3 Information Storage and Retrieval H.3.0General H.3.1Content Analysis and Indexing H.3.2Information Storage H.3.3Information Search and Retrieval H.3.4System and Software H.3.5Online Information Systems H.3.6Library Automation H.3.7Digital Libraries H.3.mMiscellaneous H.3 Information Storage and Retrieval H.3.0General H.3.1Content Analysis and Indexing H.3.2Information Storage H.3.3Information Search and Retrieval H.3.4System and Software H.3.5Online Information Systems H.3.6Library Automation H.3.7Digital Libraries H.3.mMiscellaneous H. Information Systems H.0General H.1Models and Principles H.2Database Management H.3Information Storage and Retrieval H.4Information Systems Applications H.5Information Interfaces & Presentation H.mMiscellaneous H. Information Systems H.0General H.1Models and Principles H.2Database Management H.3Information Storage and Retrieval H.4Information Systems Applications H.5Information Interfaces & Presentation H.mMiscellaneous A.General Literature B.Hardware C.Computer Systems Organisation D.Software E.Data F.Theory of Computation G.Mathematics of Computing H.Information Systems I.Computer Methodologies J.Computer Applications K.Computing Milieux A.General Literature B.Hardware C.Computer Systems Organisation D.Software E.Data F.Theory of Computation G.Mathematics of Computing H.Information Systems I.Computer Methodologies J.Computer Applications K.Computing Milieux H.Information Systems Classification codes

12 Prof. Dr. Knut Hinkelmann 12 6 Classification Schemes Examples for Subject Descriptors of the ACM Classification Schemes (Level 4 - uncoded) H.3.3Information Storage and Retrieval  Clustering  Information filtering  Query formulation  Relevance feedback  Retrieval models  Search processes  Selection processes H.3.3Information Storage and Retrieval  Clustering  Information filtering  Query formulation  Relevance feedback  Retrieval models  Search processes  Selection processes H.3.1Content Analysis and Indexing  Abstracting methods  Dictionaries  Indexing methods  Linguistic processing  Thesauruses H.3.1Content Analysis and Indexing  Abstracting methods  Dictionaries  Indexing methods  Linguistic processing  Thesauruses I.2.8 Problem Solving, Control Methods, and Search  Backtracking  Control Theory  Dynamic Programming  Graph and tree search strategies  Heuristic methods  Plan execution, formation, and generation  Scheduling I.2.8 Problem Solving, Control Methods, and Search  Backtracking  Control Theory  Dynamic Programming  Graph and tree search strategies  Heuristic methods  Plan execution, formation, and generation  Scheduling

13 Prof. Dr. Knut Hinkelmann 13 6 Classification Schemes Example Classification according to ACM Classification Scheme A Consequence-finding approach for feature recognition in CAPP Knut Hinkelmann Proceedings of the seventh international conference on Industrial and Engineering applications of artificial intelligence and expert systems May 1994 (from ACM Digital Library) The document has multiple classifications Primary Classification Additional Classifications

14 Prof. Dr. Knut Hinkelmann 14 6 Classification Schemes Problems with Classification Schemes Classification Schemes must be revised and adapted to new developments  Example: The ACM classification scheme Classification Schemes often are developed for a specific domain of interest. Documents outside the scope cannot be classified. Classification scheme must be comprehensible for all users Many documents cannot be classified eindeutig. To deal with this problem, classification systems can offer different solutions  Select exactly one classification  Example: Physically organising books in a library  Assign multiple classifications  Assign one primary and optional additional classifications  Example: In a library the primary classification corresponds to the physical location while the additional classifications can be used for searching in the library catalogue

15 Prof. Dr. Knut Hinkelmann 15 6 Classification Schemes Monodimensional vs. Polydimensional Classification Monodimensional  Classification according to one aspect Polydimensional  Classification according to different (independent) aspects  Documents may have different classes  Each dimension corresponds to one aspect/attribute Dokument [Aspect: Type] report correspondence invoice presentation [Aspect: kind] text image video [Aspect: Format] MS Word ASCII XML txt Dokument [Aspect: Type] report correspondence invoice presentation [Aspect: kind] text image video [Aspect: Format] MS Word ASCII XML txt Example „Foliensatz Information Retrieval“

16 Prof. Dr. Knut Hinkelmann 16 6 Classification Schemes Example: Polydimensional Classification Europe  Switzerland  EU  Eastern Europe North America  USA  Canada South America Asia  China  India  Japan Africa Europe  Switzerland  EU  Eastern Europe North America  USA  Canada South America Asia  China  India  Japan Africa health insurance  in-patient  out-patient  dental  short-term disability life insurance  endowment  term  accident property insurance  hausehold  liability  car  private health insurance  in-patient  out-patient  dental  short-term disability life insurance  endowment  term  accident property insurance  hausehold  liability  car  private Market: Products: Example: Document management in a re-insurance company Documents are classified in two dimension  Product type  Market in which the product is offered

17 Prof. Dr. Knut Hinkelmann 17 6 Classification Schemes Each classification dimension corresponds to a metadata attribute Document type:report Document format:MS Word Product:Life - annuity Market:Europa - Switzerland Document type:report Document format:MS Word Product:Life - annuity Market:Europa - Switzerland Index:

18 Prof. Dr. Knut Hinkelmann 18 6 Classification Schemes Aspect classification The widely used general classification schemes are aspect classifications. A simple concept may have several places in the scheme, each representing a different aspect of it. For example, the simple concept 'horse' has aspects which are allowed for under Zoology, Animal husbandry, Transport, Sport and Recreation, among others.

19 Prof. Dr. Knut Hinkelmann 19 6 Classification Schemes Advantages of Classification Schemes A classification scheme is an indexing and retrieval language. It groups related items into classes, and arranges such groups in a hierarchy so that users can then trace topics in their context and scan subject field from general to specific or vice versa. : Subject organization in classification is not language-dependent, as the subject is symbolized by a class number, which allows for cross language and cross collection information discovery. Documents on a certain subject will be collocated under the same class number irrespective the language they are written in or the language of cataloguing centre assigning the UDC number, For instance, documents on nanotechnology in e.g. Chinese, German, French or British library will all be classed as 620.3 irrespective the language of the publication. lIt overcomes the ambiguities of natural language; for instance, the word 'paraffin' has both a technical sense (a series of saturated aliphatic hydrocarbons of the general formula CnH2n+2) and a popular one (=kerosine, a petroleum fraction with a particular boiling range), so a verbal search would retrieve many irrelevant results - but a class number is unambiguous. In UDC, paraffin production is at 665.637.2 and kerosine production is at 665.634. It can also help to overcome problems of unfamiliar terminology, allowing non- specialists to find information through subject browsing An internationally widely used scheme, such as UDC, can facilitate the exchange of information between systems, and provide a basic standard from which more specialized information retrieval tools may be developed.


Download ppt "6 Classification Schemes. Prof. Dr. Knut Hinkelmann 2 6 Classification Schemes Organizing Information Objective: finding a way through the overwhelming."

Similar presentations


Ads by Google