CONTI’2008, 5-6 June 2008, TIMISOARA 1 Towards a digital content management system Gheorghe Sebestyen-Pal, Tünde Bálint, Bogdan Moscaliuc, Agnes Sebestyen-Pal.

1 Towards a digital content management system
  Gheorghe Sebestyen-Pal, Tünde Bálint, Bogdan Moscaliuc, Agnes Sebestyen-Pal
  Technical University of Cluj-Napoca, Department of Computer Science
  CONTI’2008, 5-6 June 2008, TIMISOARA

2 Content
  - Introduction
  - Ontological approach towards digital library (DL) design
  - Requirements for DLs
  - A DL model for scientific and technical purposes
  - Information retrieval in DLs
  - Conclusions

3 Digital Content Management Systems and Digital Libraries
  - Historical perspective
    - Information gathering and preservation – an important attribute of any civilization
    - A measure of the civilization level
  - Digital libraries
    - Not only a digitized form of classical libraries
    - A cooperation and communication environment
  - Digital content management systems
    - Systems responsible for creation, storage and access to relevant information
    - They serve a community and/or a purpose (a project, a company, a virtual organization, etc.)
  - The main goal of a DL (as outlined in the DELOS project)
    - “to allow any users transparent access to all the digital content anytime from anywhere in an efficient, effective and consistent way”

4 Ontology for technical and scientific purposes
  - Ontology:
    - Concepts and relations
    - Intelligent reasoning and constraints
  - Organizing a DL on an ontology basis:
    - For interoperability and flexible data exchange
    - For higher quality in information retrieval
  - Concepts:
    - Digital library
      - a collection of digital content dedicated to a well-defined purpose, with which a number of users (actors) and specific functionalities are associated
      - dynamically created, modified and deleted in accordance with a given goal or purpose
      - serves a given community of users organized in virtual organizations

5 Concepts
  - Digital object
    - Association of content (essence) and metadata (data about content)
    - The elementary data preservation entity
    - It may contain information in different formats (text, image, video, etc.)
  - Collection
    - Association of digital objects based on a given criterion or purpose (e.g. project, conference, course)
    - It may also contain other collections
    - Note: a digital object may be part of a number of collections
  - Virtual organization
    - A community of users associated with a digital library
    - Users that have a common goal and share common resources in order to fulfill the goal
    - Users have different roles and access rights (create, read, modify, delete digital objects)
  - Metadata
    - Define different aspects of digital content:
      - Descriptive metadata (keywords, topics, ID)
      - Structural metadata (internal organization of the data)
      - Administrative metadata (access rights, quality control)
    - Used for efficient data search, indexing and retrieval
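
The concepts on this slide can be read as a small data model. The following is only a sketch under assumptions: the class and field names (`Metadata`, `DigitalObject`, `Collection`, `all_objects`) are hypothetical and do not come from the presentation. It illustrates that a digital object pairs content with metadata, that collections may nest, and that one object may belong to several collections.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Three metadata families named on the slide (field names are hypothetical)."""
    descriptive: dict = field(default_factory=dict)     # keywords, topics, ID
    structural: dict = field(default_factory=dict)      # internal organization of the data
    administrative: dict = field(default_factory=dict)  # access rights, quality control


@dataclass
class DigitalObject:
    """Association of content (essence) and metadata."""
    content: bytes
    metadata: Metadata


@dataclass
class Collection:
    """Digital objects grouped by a criterion; may contain other collections."""
    name: str
    objects: list = field(default_factory=list)
    subcollections: list = field(default_factory=list)

    def all_objects(self):
        """Yield the objects of this collection and of all nested collections."""
        yield from self.objects
        for sub in self.subcollections:
            yield from sub.all_objects()


# One article shared by three collections (illustrative data only).
article = DigitalObject(b"...", Metadata(descriptive={"id": "obj-1", "keywords": ["grid"]}))
project = Collection("project", objects=[article])
proceedings = Collection("conference", objects=[article])
course = Collection("course", subcollections=[Collection("references", objects=[article])])
```

Holding references rather than copies means the same object can appear in the project, the proceedings and the course without being duplicated in storage.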

6 Concepts and relations for the technical and scientific domain
  - Project:
    - A collection of digital objects:
      - Documents needed as support for the project (reference documents: books, articles, standards, etc.)
      - Documents dynamically created during the project (technical or scientific documents)
    - A set of users (team members) grouped in a virtual organization
    - A common goal
  - Course:
    - A collection of teaching materials (electronic books, presentations, exercises and laboratory works)
    - Teaching staff (course responsible, assistants, PhD students, etc.) and students, with different access rights
    - Automated services for document upload and publication
  - Events: conference, workshop, seminar
    - A collection of articles
    - A set of presentation and administrative materials (organizing committees, web portal, accommodation and travel information, etc.)
    - A set of participants
  - A digital object may be part of several structured entities: e.g. an article may be the result of a project, be included in the proceedings of a conference, and serve as reference material for a course
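
As an illustration of "different access rights" for the members of a virtual organization, the sketch below maps roles to the four rights named earlier (create, read, modify, delete). The role names and the role-to-rights table are assumptions made for the example; the presentation does not specify them.

```python
from enum import Flag, auto


class Right(Flag):
    CREATE = auto()
    READ = auto()
    MODIFY = auto()
    DELETE = auto()


# Hypothetical role table for a course; not taken from the presentation.
ROLE_RIGHTS = {
    "teaching_staff": Right.CREATE | Right.READ | Right.MODIFY | Right.DELETE,
    "student": Right.READ,
}


class VirtualOrganization:
    """A community of users, each holding one role."""

    def __init__(self, name):
        self.name = name
        self.members = {}  # user name -> role name

    def add_member(self, user, role):
        if role not in ROLE_RIGHTS:
            raise ValueError(f"unknown role: {role}")
        self.members[user] = role

    def is_allowed(self, user, right):
        """True if the user's role grants the requested right."""
        role = self.members.get(user)
        if role is None:
            return False  # non-members get no access
        return right in ROLE_RIGHTS[role]


course_vo = VirtualOrganization("course")
course_vo.add_member("assistant", "teaching_staff")
course_vo.add_member("alice", "student")
```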

7 Relations
  [figure not reproduced in the transcript]

8 Standards and communication protocols
  http://mapageweb.umontreal.ca/turner/meta/english/metamap.html

9 Standards and communication protocols
  - MARC (MAchine Readable Cataloging)
    - Promoted by the Library of Congress
    - Used to exchange bibliographic information between libraries
  - Dublin Core metadata
    - Standard for simplified metadata exchange
  - Z39.50
    - Defines a protocol for client-server based information retrieval
  - The Open Archives Initiative (OAI)
    - A technical framework with client-driven interaction
    - The protocol supports interaction between a data provider and a service provider
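
To make the client-driven interaction concrete: in the OAI Protocol for Metadata Harvesting (OAI-PMH), a service provider sends plain HTTP requests to a data provider, each carrying a `verb` argument, and `oai_dc` names the Dublin Core format that data providers must support. The sketch below only builds such a request URL. The base URL and the helper name `build_oai_request` are hypothetical, and nothing here is taken from the authors' implementation.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_PMH_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_oai_request(base_url, verb, **arguments):
    """Return the URL a harvester would fetch for the given verb and arguments."""
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    query = {"verb": verb, **arguments}
    return f"{base_url}?{urlencode(query)}"


# Hypothetical data-provider endpoint, used for illustration only.
BASE_URL = "https://dl.example.org/oai"
harvest_url = build_oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
```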

10 Requirements for Digital Content Management systems
  - Functional requirements:
    - Content submission (upload)
    - Content storage: distributed, replicated
    - Indexing and cataloging (based on metadata)
    - Content search and retrieval
      - Based on metadata
      - Based on full-text search
    - User management
    - Access control and authorization
    - Content annotation and classification
    - Data processing services
  - Architectural requirements:
    - Distribution of resources, services and users
    - Transparent access to remote content (including other DL resources)
    - Management of QoS

11 A digital library model for scientific and technical purposes
  [architecture diagram; only its labels survive in the transcript]
  - Layers: Presentation Layer, Business Logic Layer, Storage and communication Layer
  - Components: User Interfaces, OAI Data Provider (content harvesting), Metadata Management, Content Management, User & Virtual Organization Management, Search Engine, Security Management, Query Processor, History Recorder, Ontology, Metadata (SQL), GRID infrastructure SE & SRM, Repository

12 Information search and retrieval
  - Content search and retrieval:
    - Based on metadata – DB techniques
    - Based on full-text analysis
  - Full-text search:
    - Keyword search
    - Semantic Information Retrieval (e.g. documents with semantic annotations, semantic graphs, etc.)
    - Non-semantic Information Retrieval (e.g. probabilistic matching)
  - Processing sequence:
    - Format conversion (DOC, PDF into TXT)
    - Document parsing – rule-based keyword extraction
    - Heuristics for relevance processing (probabilistic, distance, semantic graphs, etc.)
  - “Query by example”
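
The processing sequence above can be sketched in a few lines, assuming the format conversion has already produced plain text. The extraction rules (a small stop-word list, a minimum word length, ranking by frequency) and the overlap-based relevance score are assumptions chosen for illustration, not the rules or heuristics used by the authors.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much longer one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "on"}


def extract_keywords(text, top_n=5):
    """Rule-based extraction: lowercase, drop stop words and short words, rank by frequency."""
    words = re.findall(r"[a-z]+", text.lower())
    candidates = [w for w in words if w not in STOPWORDS and len(w) > 2]
    return [word for word, _ in Counter(candidates).most_common(top_n)]


def relevance(query_keywords, doc_keywords):
    """Jaccard overlap of two keyword sets, used here as a simple distance-style heuristic."""
    q, d = set(query_keywords), set(doc_keywords)
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)


# "Query by example": the query is itself a document.
example_doc = "The library library library uses grid grid and metadata"
query_keywords = extract_keywords(example_doc, top_n=2)
```

Because the query goes through the same extraction step as the stored documents, "query by example" needs no separate query language in this sketch.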

13 Non-semantic Information Retrieval
  - Naive Bayes algorithm
    - Allows classification of new (unlabeled) documents based on labeled training document sets
    - The algorithm determines the probability of words being related to a given topic
    - Problems:
      - Does not treat the problem of similar words
      - Words are considered independent of their context (hence “naïve Bayes”)
  - Topic-Based Vector Space Model algorithm
    - Treats the problem of similar words (synonyms are replaced)
    - Word stems are considered
    - The algorithm associates a vector with every relevant word
    - The similarity between two words is computed as the scalar product of their associated vectors
    - A document vector is computed as a weighted sum of the vectors of the words it contains
    - We proposed an automatic weight computation based on the relevance of a word to a given topic: the weight of a word vector is computed as a function of the word’s frequency of appearance in the processed documents
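
Both approaches can be illustrated with a compact sketch. The first part is a multinomial naive Bayes classifier; add-one smoothing is an assumption added so that unseen words do not zero out a topic. The second part follows the Topic-Based Vector Space Model description: each word has a vector in a topic space, word similarity is a scalar product, and a document vector is a weighted sum of word vectors. The term vectors in `TERM_VECTORS` are invented for the example, and raw in-document frequency stands in for the authors' weight function, which the slide does not specify.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count documents per topic and word occurrences per topic."""
    topic_docs = Counter()
    topic_words = defaultdict(Counter)
    vocabulary = set()
    for words, topic in labeled_docs:
        topic_docs[topic] += 1
        topic_words[topic].update(words)
        vocabulary.update(words)
    return topic_docs, topic_words, vocabulary


def classify(words, model):
    """Return the topic with the highest log-probability for the given words."""
    topic_docs, topic_words, vocabulary = model
    total_docs = sum(topic_docs.values())
    best_topic, best_score = None, -math.inf
    for topic, n_docs in topic_docs.items():
        n_words = sum(topic_words[topic].values())
        score = math.log(n_docs / total_docs)  # topic prior
        for word in words:
            # Add-one smoothing (an assumption) avoids log(0) for unseen words.
            score += math.log((topic_words[topic][word] + 1) / (n_words + len(vocabulary)))
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


def dot(u, v):
    """Scalar product; applied to two word vectors it gives their similarity."""
    return sum(a * b for a, b in zip(u, v))


def document_vector(words, term_vectors):
    """Weighted sum of word vectors; the weight is the word's frequency (assumption)."""
    counts = Counter(w for w in words if w in term_vectors)
    dims = len(next(iter(term_vectors.values())))
    vec = [0.0] * dims
    for word, freq in counts.items():
        for i, x in enumerate(term_vectors[word]):
            vec[i] += freq * x
    return vec


def cosine(u, v):
    """Similarity of two document vectors after length normalization."""
    norm = math.sqrt(dot(u, u) * dot(v, v))
    return dot(u, v) / norm if norm else 0.0


# Illustrative data only: two topics and two topic dimensions.
model = train_naive_bayes([
    ("grid storage node".split(), "grid"),
    ("ontology concept relation".split(), "semantics"),
])
predicted = classify(["storage", "grid"], model)

TERM_VECTORS = {"grid": (1.0, 0.0), "cluster": (0.9, 0.1), "ontology": (0.0, 1.0)}
doc_grid = document_vector(["grid", "grid"], TERM_VECTORS)
doc_cluster = document_vector(["cluster"], TERM_VECTORS)
doc_ontology = document_vector(["ontology"], TERM_VECTORS)
```

In this toy data, "grid" and "cluster" never co-occur, so a pure word-count model sees no relation between them, while their topic vectors still yield a high similarity. This is the "similar words" problem the slide attributes to naive Bayes.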

14 Conclusions
  - The paper presents a new vision on the design and implementation of digital content management systems.
  - The proposed ontology-based DL allows better content organization and retrieval.
  - The model was implemented on a GRID infrastructure.
  - For search and information retrieval, two algorithms were implemented and tested:
    - The naïve Bayes algorithm is faster, but it is not context-aware.
    - The Topic-Based Vector Space Model algorithm requires more processing time and more interaction from the user, but the quality of the results is higher.

