Published by Gary Blankenship. Modified over 9 years ago.
1
Chapter 6 Text and Multimedia Languages and Properties
2
Introduction
Document has given syntax and structure
also has semantics
may have presentation style associated with it
Figure 6.1 depicts all these relationships
document can also have information about itself, called metadata
3
Syntax of document can express different elements such as structure, presentation style, semantics
one or more of these elements may be given together
structural element (e.g. section) can have fixed formatting style
4
Syntax of document can be
implicit in its content
expressed in a declarative language or a programming language
current trend is to use languages that provide information on document structure, format and semantics
and that are readable by humans and computers
SGML is one such language
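As a rough illustration of a syntax that is readable by both humans and computers, the sketch below recovers an outline from a small marked-up fragment. It uses HTML, an SGML-derived markup, and Python's standard `html.parser`; the fragment and the `OutlineParser` class are assumptions made for this example, not part of the slides.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collects (tag, text) pairs for heading elements (hypothetical helper)."""

    HEADINGS = ("h1", "h2")

    def __init__(self):
        super().__init__()
        self.outline = []
        self._current = None  # heading tag we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._current = tag

    def handle_data(self, data):
        if self._current and data.strip():
            self.outline.append((self._current, data.strip()))

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None


# Illustrative fragment: the tags carry the structure, the text carries the content.
markup = (
    "<h1>Chapter 6</h1><p>Intro text.</p>"
    "<h2>Metadata</h2><p>Data about data.</p>"
)

parser = OutlineParser()
parser.feed(markup)
parser.close()
outline = parser.outline
```

The same markup a person can read directly is enough for a program to extract the document structure without guessing from layout.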
5
Metadata
Metadata is data about data
metadata associated with text includes: author, date of publication, source of publication, document length (in pages, words, bytes), document genre (book, article, memo)
Machine Readable Cataloging Record (MARC) is the most used format for library records
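The descriptive fields listed above can be pictured as a simple record kept alongside the text. The sketch below is a minimal illustration, not MARC: the `TextMetadata` class, its field names, and the sample values are all assumptions invented for this example.

```python
from dataclasses import dataclass, asdict


@dataclass
class TextMetadata:
    """Descriptive metadata for one text document (hypothetical layout)."""
    author: str
    publication_date: str  # ISO 8601 date string
    source: str
    length_words: int
    genre: str             # e.g. "book", "article", "memo"


def matches(meta: TextMetadata, **criteria) -> bool:
    """Return True if every named field equals the requested value."""
    return all(getattr(meta, name) == value for name, value in criteria.items())


# Made-up sample record, for illustration only.
record = TextMetadata(
    author="A. Author",
    publication_date="1999-05-15",
    source="Example Press",
    length_words=120000,
    genre="book",
)

is_book = matches(record, genre="book")
as_dict = asdict(record)  # plain dict, convenient for storage or indexing
```

Keeping metadata separate from the content lets a system filter or catalog documents without parsing the text itself.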
6
In Web, metadata used for many purposes
cataloging
content rating (e.g. to protect children from reading some types of document)
intellectual property rights
digital signatures (for authentication)
privacy levels (who should/should not have access to document)
applications to electronic commerce (EC), etc.
7
New standard for Web metadata is Resource Description Framework (RDF)
RDF allows description of Web resources
consists of description of nodes and attached attribute/value pairs
nodes can be any Web resource, i.e. anything identified by a URI (which includes URLs)
attributes are properties of nodes, and their values are text strings or other nodes
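The node and attribute/value model can be sketched with plain Python data structures. This is only an illustration of the idea, not the RDF syntax or any RDF library: the `Node` and `Statement` types, the attribute names, and the `example.org` URIs are assumptions made for this example.

```python
from typing import List, NamedTuple, Union


class Node(NamedTuple):
    """A Web resource, identified by its URI."""
    uri: str


# A value is either a text string or another node.
Value = Union[str, Node]


class Statement(NamedTuple):
    """One attribute/value pair attached to a node."""
    subject: Node
    attribute: str
    value: Value


# Hypothetical resources used only for illustration.
doc = Node("http://example.org/docs/chapter6")
author = Node("http://example.org/people/author1")

statements: List[Statement] = [
    Statement(doc, "title", "Text and Multimedia Languages and Properties"),
    Statement(doc, "creator", author),    # value is another node
    Statement(author, "name", "A. Author"),  # value is a text string
]


def properties(node: Node, stmts: List[Statement]) -> dict:
    """Collect the attribute/value pairs attached to one node."""
    return {s.attribute: s.value for s in stmts if s.subject == node}


doc_props = properties(doc, statements)
author_props = properties(author, statements)
```

Because a value may itself be a node, following `creator` from the document leads to a second resource with its own properties, which is what turns the description into a graph.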
8
Text
with the advent of computers, it became necessary to code text in binary digits
first coding schemes were EBCDIC and ASCII
for internationalization, e.g. oriental languages like Chinese or Japanese Kanji, Unicode (ISO/IEC 10646), originally a 16-bit code, exists
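A short sketch of what coding text in binary digits looks like in practice, using Python's built-in codecs. The choice of `cp037` as a representative EBCDIC code page and of UTF-16 as the Unicode encoding form are assumptions made for this example; the `to_bits` helper is hypothetical.

```python
def to_bits(data: bytes) -> str:
    """Render bytes as space-separated groups of 8 binary digits."""
    return " ".join(format(byte, "08b") for byte in data)


# The same letter gets different codes in ASCII and EBCDIC.
ascii_a = "A".encode("ascii")   # b"\x41"
ebcdic_a = "A".encode("cp037")  # b"\xc1" (cp037 is one EBCDIC code page)

ascii_bits = to_bits(ascii_a)
ebcdic_bits = to_bits(ebcdic_a)

# Two Kanji characters (U+6F22, U+5B57), written as escapes to keep
# this source file pure ASCII.
kanji = "\u6f22\u5b57"
kanji_utf16 = kanji.encode("utf-16-be")  # 2 bytes per character here

# ASCII has no codes for these characters, so encoding fails.
try:
    kanji.encode("ascii")
    ascii_can_encode_kanji = True
except UnicodeEncodeError:
    ascii_can_encode_kanji = False
```

The failure in the last step is the reason a larger code was needed: a 7-bit or 8-bit scheme has far too few code points for writing systems with thousands of symbols.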
9
Text Formats
No single format for text document
in the past, IR systems would convert document to internal format
drawback: content of document cannot then be changed
current IR systems have filters to handle most popular document formats, in particular Word, WordPerfect or FrameMaker
10
Other text formats for document interchange
Rich Text Format (RTF) used by word processors and has ASCII syntax
Portable Document Format (PDF) developed for displaying and printing documents
Multipurpose Internet Mail Extensions (MIME) used to encode electronic mail
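To show how MIME packages different kinds of content in one mail message, the sketch below builds a message with Python's standard `email` package. The addresses, subject, filename, and the placeholder attachment bytes (not a valid PDF) are assumptions made for this example.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.org"
msg["To"] = "reader@example.org"
msg["Subject"] = "Chapter 6 notes"

# Plain-text body: becomes a text/plain part.
msg.set_content("Plain-text body of the message.")

# Binary attachment: the bytes are placeholders, labeled as a PDF only
# to illustrate how MIME tags each part with a content type.
msg.add_attachment(
    b"%PDF-1.4 placeholder bytes",
    maintype="application",
    subtype="pdf",
    filename="notes.pdf",
)

top_level_type = msg.get_content_type()
parts = list(msg.iter_parts())
part_types = [part.get_content_type() for part in parts]
attachment_encoding = parts[1]["Content-Transfer-Encoding"]

# The serialized form is ordinary text that can travel through mail systems.
wire_text = msg.as_string()
```

Each part carries its own content type, and binary data is transfer-encoded as text, which is how one message can mix a readable body with arbitrary file formats.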