CONTI’2008, 5-6 June 2008, TIMISOARA 1 Towards a digital content management system Gheorghe Sebestyen-Pal, Tünde Bálint, Bogdan Moscaliuc, Agnes Sebestyen-Pal.

Presentation transcript:

Slide 1: Towards a digital content management system
Gheorghe Sebestyen-Pal, Tünde Bálint, Bogdan Moscaliuc, Agnes Sebestyen-Pal
Technical University of Cluj-Napoca, Department of Computer Science

Slide 2: Content
- Introduction
- Ontological approach towards digital library (DL) design
- Requirements for DLs
- A DL model for scientific and technical purposes
- Information retrieval in DLs
- Conclusions

Slide 3: Digital Content Management Systems and Digital Libraries
- Historical perspective
  - Information gathering and preservation: an important attribute of any civilization
  - A measure of the civilization level
- Digital libraries
  - Not only a digitized form of classical libraries
  - A cooperation and communication environment
- Digital content management systems
  - Systems responsible for the creation and storage of, and access to, relevant information
  - They serve a community and/or a purpose (a project, a company, a virtual organization, etc.)
- The main goal of a DL (as outlined in the DELOS project)
  - “to allow any users transparent access to all the digital content anytime from anywhere in an efficient, effective and consistent way”

Slide 4: Ontology for technical and scientific purposes
- Ontology
  - Concepts and relations
  - Intelligent reasoning and constraints
- Organizing a DL on an ontology basis
  - For interoperability and flexible data exchange
  - For higher quality in information retrieval
- Concepts
  - Digital library
    - A collection of digital content dedicated to a well-defined purpose, with which a number of users (actors) and specific functionalities are associated
    - Dynamically created, modified and deleted in accordance with a given goal or purpose
    - Serves a given community of users organized in virtual organizations

Slide 5: Concepts
- Digital object
  - Association of content (essence) and metadata (data about content)
  - The elementary data preservation entity
  - It may contain information in different formats (text, image, video, etc.)
- Collection
  - Association of digital objects based on a given criterion or purpose (e.g. project, conference, course)
  - It may also contain other collections
  - Note: a digital object may be part of a number of collections
- Virtual organization
  - A community of users associated with a digital library
  - Users that have a common goal and share common resources in order to fulfill the goal
  - Users have different roles and access rights (create, read, modify, delete digital objects)
- Metadata
  - Defines different aspects of digital content:
    - Descriptive metadata (keywords, topics, ID)
    - Structural metadata (internal organization of the data)
    - Administrative metadata (access rights, quality control)
  - Used for efficient data search, indexing and retrieval
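
The concepts on this slide can be read as a small data model. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class names, field names and the sample identifiers (`obj-1`, `alice`, `bob`) are hypothetical and chosen only to mirror the slide's vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Content (essence) plus metadata: the elementary preservation entity."""
    object_id: str
    content: bytes
    descriptive: dict = field(default_factory=dict)     # keywords, topics, ID
    structural: dict = field(default_factory=dict)      # internal organization
    administrative: dict = field(default_factory=dict)  # access rights, quality control


@dataclass
class Collection:
    """Groups digital objects by a criterion; may contain other collections."""
    name: str
    objects: list = field(default_factory=list)
    subcollections: list = field(default_factory=list)

    def all_objects(self):
        """Map object_id -> object for this collection and all nested ones."""
        found = {obj.object_id: obj for obj in self.objects}
        for sub in self.subcollections:
            found.update(sub.all_objects())
        return found


@dataclass
class VirtualOrganization:
    """A community of users with per-user access rights (hypothetical layout)."""
    name: str
    rights: dict = field(default_factory=dict)  # user name -> set of rights

    def can(self, user, right):
        """Return True if the user holds the given right."""
        return right in self.rights.get(user, set())


# The same digital object belongs to two collections at once.
article = DigitalObject("obj-1", b"article text", descriptive={"keywords": ["grid"]})
project = Collection("project", objects=[article])
proceedings = Collection("conference proceedings", objects=[article])
library = Collection("library", subcollections=[project, proceedings])

team = VirtualOrganization("project team", rights={
    "alice": {"create", "read", "modify", "delete"},
    "bob": {"read"},
})
```

Keying `all_objects` by identifier means an object that appears in several collections is reported once, which matches the note that a digital object may be part of a number of collections.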

Slide 6: Concepts and relations for the technical and scientific domain
- Project
  - A collection of digital objects:
    - Documents needed as support for the project (reference documents: books, articles, standards, etc.)
    - Documents dynamically created during the project (technical or scientific documents)
  - A set of users (team members) grouped in a virtual organization
  - A common goal
- Course
  - A collection of teaching materials (electronic books, presentations, exercises and laboratory works)
  - Teaching staff (course responsible, assistants, PhD students, etc.) and students, with different access rights
  - Automated services for document upload and publication
- Events: conference, workshop, seminar
  - A collection of articles
  - A set of presentation and administrative materials (organizing committees, web portal, accommodation and travel information, etc.)
  - A set of participants
- A digital object may be part of a number of structured entities, e.g. an article may be the result of a project, be included in the proceedings of a conference, and serve as reference material for a course

Slide 7: Relations

Slide 8: Standards and communication protocols

Slide 9: Standards and communication protocols
- MARC (MAchine Readable Cataloging)
  - Promoted by the Library of Congress
  - Used to exchange bibliographic information between libraries
- Dublin Core metadata
  - Standard for simplified metadata exchange
- Z39.50
  - Defines a protocol for client-server based information retrieval
- The Open Archives Initiative (OAI)
  - A technical framework with client-driven interaction. The protocol supports interaction between a data provider and a service provider
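
To illustrate the client-driven interaction, the sketch below builds the request a service provider (harvester) would send to a data provider. It assumes the protocol meant here is OAI-PMH 2.0, in which requests are HTTP queries carrying a `verb` argument and `oai_dc` is the metadata prefix for unqualified Dublin Core. The base URL and the function name are hypothetical.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "GetRecord", "Identify", "ListIdentifiers",
    "ListMetadataFormats", "ListRecords", "ListSets",
}


def oai_request_url(base_url, verb, **arguments):
    """Build the URL a harvester sends to a data provider."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb, **arguments}
    return f"{base_url}?{urlencode(query)}"


# Hypothetical data provider endpoint; harvest all records as Dublin Core.
url = oai_request_url(
    "https://dl.example.org/oai", "ListRecords", metadataPrefix="oai_dc"
)
```

The data provider answers with an XML document; parsing it is left out to keep the sketch short.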

Slide 10: Requirements for Digital Content Management systems
- Functional requirements
  - Content submission (upload)
  - Content storage: distributed, replicated
  - Indexing and cataloging (based on metadata)
  - Content search and retrieval
    - Based on metadata
    - Based on full-text search
  - User management
  - Access control and authorization
  - Content annotation and classification
  - Data processing services
- Architectural requirements
  - Distribution of resources, services and users
  - Transparent access to remote content (including other DL resources)
  - Management of QoS
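
Metadata-based indexing and search can be sketched with a relational table. This is only an illustration using Python's built-in `sqlite3` module with an in-memory database; the table layout, column names and sample rows are assumptions, not the schema used by the authors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata (object_id TEXT, element TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [
        ("obj-1", "keyword", "grid"),
        ("obj-1", "topic", "digital libraries"),
        ("obj-2", "keyword", "ontology"),
        ("obj-3", "keyword", "grid"),
    ],
)
# Indexing step: speeds up lookups by metadata element and value.
conn.execute("CREATE INDEX idx_element_value ON metadata (element, value)")


def find_objects(element, value):
    """Return the ids of objects whose metadata matches element = value."""
    rows = conn.execute(
        "SELECT DISTINCT object_id FROM metadata "
        "WHERE element = ? AND value = ? ORDER BY object_id",
        (element, value),
    )
    return [row[0] for row in rows]


grid_objects = find_objects("keyword", "grid")
```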

Slide 11: A digital library model for scientific and technical purposes
[Architecture diagram. Layers: Presentation Layer; Business Logic Layer; Storage and communication Layer. Components: User Interfaces; OAI Data Provider (content harvesting); Metadata Management; Content Management; User & Virtual Organization Management; Search Engine; Security Management; Query Processor; History Recorder; Ontology; Metadata (SQL); GRID infrastructure; SE & SRM; Repository.]

Slide 12: Information search and retrieval
- Content search and retrieval
  - Based on metadata: DB techniques
  - Based on full-text analysis
- Full-text search
  - Keyword search
  - Semantic Information Retrieval (e.g. documents with semantic annotations, semantic graphs, etc.)
  - Non-semantic Information Retrieval (e.g. probabilistic matching)
- Processing sequence
  - Format conversion (DOC, PDF into TXT)
  - Document parsing: rule-based keyword extraction
  - Heuristics for relevance processing (probabilistic, distance, semantic graphs, etc.)
- “Query by example”
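
The document parsing step of the processing sequence can be sketched as follows, assuming the document has already been converted to plain text. The rules used here (lowercasing, a small stop-word list, a minimum word length, ranking by frequency) are hypothetical stand-ins; the slides do not list the actual extraction rules.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def extract_keywords(text, top_k=3):
    """Return the top_k most frequent candidate keywords in plain text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Rule 1: drop stop words. Rule 2: drop very short tokens.
    candidates = [t for t in tokens if t not in STOP_WORDS and len(t) > 2]
    return [word for word, _ in Counter(candidates).most_common(top_k)]


text = "The grid stores the digital library. The digital library indexes metadata."
keywords = extract_keywords(text, top_k=2)
```

The extracted keywords would then feed the relevance heuristics named in the last step of the sequence.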

Slide 13: Non-semantic Information Retrieval
- Naive Bayes Algorithm
  - Allows classification of new (unlabeled) documents based on learning (labeled) document sets
  - The algorithm determines the probability of words being related to a given topic
  - Problems:
    - Does not treat the problem of similar words
    - Words are considered independent of their context (“naïve Bayes”)
- Topic-Based Vector Space Model Algorithm
  - Treats the problem of similar words (synonyms are replaced)
  - The stems of words are considered
  - The algorithm associates a vector with every relevant word
  - The similarity between two words is computed as the scalar product of the two associated vectors
  - A document vector is computed as a weighted sum of the vectors of the words it contains
  - We proposed an automatic weight computation based on the relevance of a word to a given topic:
    - According to the proposed method, the weight of a vector is computed as a function of its appearance frequency in the processed documents
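
Both algorithms can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the implementation evaluated by the authors: the Naive Bayes part uses add-one (Laplace) smoothing, the word vectors in `TERM_VECTORS` and their two topic axes are invented, and the weight of a word is taken to be its raw frequency in the document, since the slides do not give the exact weight function.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count topics and per-topic words from (words, topic) training pairs."""
    topic_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, topic in labeled_docs:
        topic_counts[topic] += 1
        word_counts[topic].update(words)
        vocabulary.update(words)
    return topic_counts, word_counts, vocabulary


def classify(words, model):
    """Return the topic with the highest log-probability for the words."""
    topic_counts, word_counts, vocabulary = model
    total_docs = sum(topic_counts.values())
    best_topic, best_score = None, -math.inf
    for topic, doc_count in topic_counts.items():
        total_words = sum(word_counts[topic].values())
        score = math.log(doc_count / total_docs)
        for word in words:
            # Add-one smoothing (an assumption); words are treated as independent.
            likelihood = (word_counts[topic][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


# Hypothetical word vectors over two topic axes: (grid computing, semantics).
TERM_VECTORS = {
    "grid": (1.0, 0.0),
    "cluster": (0.9, 0.1),
    "ontology": (0.0, 1.0),
}


def dot(u, v):
    """Scalar product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def document_vector(words):
    """Frequency-weighted sum of word vectors, normalized to unit length."""
    vector = [0.0, 0.0]
    for word, count in Counter(w for w in words if w in TERM_VECTORS).items():
        for i, component in enumerate(TERM_VECTORS[word]):
            vector[i] += count * component
    norm = math.sqrt(dot(vector, vector))
    return [c / norm for c in vector] if norm else vector


def similarity(words_a, words_b):
    """Document similarity as the scalar product of the document vectors."""
    return dot(document_vector(words_a), document_vector(words_b))


TRAINING = [
    (["grid", "storage", "replica"], "grid"),
    (["ontology", "concept", "relation"], "semantics"),
]
model = train_naive_bayes(TRAINING)
predicted = classify(["grid", "replica"], model)
```

In this toy setup, "grid" and "cluster" have nearly parallel vectors, so documents using either word come out as similar, which is the effect the slide describes as treating the problem of similar words.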

Slide 14: Conclusions
- The paper presents a new vision on the design and implementation of a digital content management system.
- The proposed ontology-based DL allows better content organization and retrieval
- The model was implemented on a GRID infrastructure
- Two algorithms were implemented and tested for search and information retrieval.
  - The naïve Bayes algorithm is faster but it is not context aware
  - The Topic-Based Vector Space Model Algorithm requires more processing time and more interaction from the user, but the quality of the results is higher