Digital Library Technologies ● Text: formats and storage ● Searching text ● Images ● Speech ● Multimedia ● Networking.

Text formats
● ASCII: simple, no formatting, accessible
● HTML: simple, moderate formatting, accessible
● Word processors: formatting, access limited
● PDF: formatted, complex, access limited
● TEI: formatted, open, very complex
Oph. Pray let's have no words of this, but when they ask you what it means, say you this: Song. "To-morrow is Saint Valentine's day, All in the morning betime, And I a maid at your window,

Control of the format
● ASCII: user has complete control of display
● HTML: user has considerable control of display
● PDF: publisher has all the control
● Authors and readers disagree on who should decide things like column layout, type size, etc.
● Over time, more and more Web documents have the format nailed down.

Text compression
● Basic strategies: statistics or dictionaries
● Statistics: as in Morse code, the more frequent letters get shorter codes. Huffman coding is the traditional method here, but enlarging the alphabet (coding letter groups or words rather than single letters) will give better results.
● Dictionaries: Lempel-Ziv or LZW. Find repeated strings and replace them with references to a dictionary, either listed at the beginning or built up as the text is read.
● Questions: Is the code instantaneously decodable? Is a factor of 2 worth the trouble?
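
The statistical strategy above can be sketched in a few lines. This is a minimal, illustrative Huffman coder over single characters, not a production compressor; the function names `huffman_code`, `encode` and `decode` are made up for this example, and the input is assumed to be non-empty.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Return a dict mapping each symbol of a non-empty text to a bit string."""
    freq = Counter(text)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, code):
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    """Decode bit by bit; works because no code is a prefix of another."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

Because no code word is a prefix of another, the decoder can emit a symbol as soon as its last bit arrives, which is what "instantaneously decodable" asks for.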

Searching text files
● Linear scan (grep): not for very big collections, but no update problem
● Inverted files: tries, or just divide by blocks. May wish to compress occurrence lists, index by both ends, allow fielded searching, and keep frequency information
● Signature files: electronic edge-notched cards, trading space for false drops
● Bitmaps: best for very common words; add to inverted files
● Clustering: for complex searching, summarizing results
● Case folding, suffixing, stop lists
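
As a rough illustration of an inverted file with case folding and a stop list, here is a small in-memory sketch. The stop list, the tokenizer and the function names are assumptions chosen for the example; suffixing, compression of occurrence lists and fielded searching are left out.

```python
import re
from collections import defaultdict

# Illustrative stop list only; real systems use longer, tuned lists.
STOP_WORDS = {"the", "a", "and", "of", "in", "is", "to"}


def tokenize(text):
    """Case-fold, split into alphanumeric words, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOP_WORDS]


def build_index(docs):
    """Map each term to {doc_id: [positions]}.

    Positions count tokens after stop-word removal; the length of a
    position list doubles as the term frequency in that document.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def search(index, query):
    """Return sorted ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [set(index.get(t, {})) for t in terms]
    return sorted(set.intersection(*postings))
```

A query is answered from the occurrence lists alone, without touching the documents, which is why inversion beats a linear scan on large collections but has to be updated whenever a document changes.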

Grab – an example compromise
● Grab was an attempt to balance the speed of inversion against the compactness of linear search.
● Bitmap vectors on hashed words, compressed from 10 bits to 4 bits.
● Go back later and cast out false drops.
● For 5% extra space, get a 90% speedup over linear search.
● Never caught on. Space is too cheap today, and files are too big. Might as well use full inversion.
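
The general idea of hashed bitmaps followed by a pass that casts out false drops can be sketched as below. This is not Grab's actual design: the 64-bit signature width, the CRC-based hash and all function names are assumptions for illustration, and the bitmap compression step is omitted.

```python
import re
import zlib

BITS = 64  # hypothetical signature width per block, not Grab's real parameter


def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bit_for(word):
    """Hash a word to a single bit position in the signature."""
    return 1 << (zlib.crc32(word.encode("utf-8")) % BITS)


def signature(text):
    """OR together the bits of every word in a block of text."""
    sig = 0
    for w in words(text):
        sig |= bit_for(w)
    return sig


def search(blocks, signatures, word):
    """Return indexes of blocks that really contain the word."""
    word = word.lower()
    mask = bit_for(word)
    # First pass: the bitmaps rule out most blocks without reading them.
    candidates = [i for i, sig in enumerate(signatures) if sig & mask]
    # Second pass: scan only the candidates and cast out false drops,
    # i.e. blocks whose bit was set by some other word with the same hash.
    return [i for i in candidates if word in words(blocks[i])]
```

The bitmaps can never miss a block that contains the word; they can only admit extra candidates, which the second pass removes at the cost of a short linear scan.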

Why not a DBMS?
Why don't text retrieval systems use a DBMS underneath?
● Few numerical entries, and vast numbers of items
● Special needs, such as index browsing and truncation searching
● Input is not neatly structured into records, and items of variable length may have to be retrieved
● Not much updating
● Parallel searching: just coming into vogue

What do the search engines do?
● Very large inverted files and parallel search engines on a great many machines (thousands).
● Big caches. They may search only in the cache and avoid all disk delays.
● They are willing to give different results depending on what data is in the cache.

Collaborative ranking and filtering
● Google is the best-known search engine; it derives from “BackRub” at the Stanford digital library project. See: “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, Sergey Brin and Lawrence Page.
● Simply: pages that a lot of other people link to are probably better.
● Other work from Jon Kleinberg at Cornell has looked at links in both directions, and this is all related to “collaborative filtering”.
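
A toy version of link-based ranking, in the spirit of the PageRank idea from the Brin and Page paper, might look like this. It is a sketch under simplifying assumptions: the graph fits in memory as a dict, the damping value 0.85 and the fixed iteration count are illustrative defaults, and `rank_pages` is a name invented here.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Iteratively score pages so that links from well-ranked pages count more.

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank
```

With `links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}`, page `c` ends up with the highest score because three pages point to it, and `a` outranks `b` and `d` because its one incoming link comes from the well-ranked `c`.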

Image formats
● There are a great many image formats. The best known are GIF and JPG. Why are there so many?
● Images are bulky. The best compression is “lossy”, and one can choose what kinds of things to lose.
● GIF loses color space (it is limited to a 256-color palette): it is perfect on b&w. JPG is more general.
● You can do non-lossy compression, e.g. TIFF G4: just run ordinary Huffman-like compression on the signal.
● Wavelet and fractal compression are coming along. JPEG2000 may replace JPEG someday.
● DjVu is particularly interesting: oriented for text, it divides the page into background and foreground and does wavelets on the background and dictionary compression on the foreground.

Sound formats
Some technology, but mostly commerce.
● Digitization rates: you can do speech at 8 kHz, but for music you ought to do better: CD music is 44.1 kHz.
● Compression: you can get speech to 2400 bits/sec or so, and music by a factor of 10 (MP3 is the current favorite).
● Commerce: Real vs. Microsoft (WMP). Digital rights management.
● Unlike text, few people can write sound manipulation software, and so everyone is dependent on one or another vendor.
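
The rates above can be turned into rough bit-rate arithmetic. The sample sizes and channel counts are assumptions not stated on the slide: 16-bit stereo for CD audio, and 8-bit mono for uncompressed telephone-quality speech. The 2400 figure is read as bits per second.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# CD audio: 44.1 kHz, assuming 16-bit samples and 2 channels.
cd_rate = pcm_bit_rate(44_100, 16, 2)      # 1,411,200 bit/s

# Speech: 8 kHz, assuming 8-bit samples and 1 channel.
speech_rate = pcm_bit_rate(8_000, 8, 1)    # 64,000 bit/s

# Music compressed by a factor of 10, as quoted for MP3.
compressed_music_rate = cd_rate / 10       # about 141 kbit/s

# Speech coded down to 2400 bit/s.
speech_ratio = speech_rate / 2400          # roughly 27 to 1
```

Under these assumptions, speech tolerates a much harsher compression ratio than music, which fits the slide's point that music needs both a higher sampling rate and gentler coding.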

Video formats
● Video is extremely bulky. With 24 frames/second (movies) or 30 (TV), an hour of video is easily a gigabyte even with minimal resolution on each image.
● But there is enormous scene-to-scene redundancy.
● MPEG sequence: key frames and then differentially coded frames; JPEG-like coding on individual frames; prediction of moving objects.
● MPEG-1: 1.5 Mbit/sec
● MPEG-2: 4-9 Mbit/sec
● MPEG-4: mixing synthetic (animation) with camera video
● MPEG-7: metadata
● The next real improvement is going to have to be longer-term storage and segmentation, e.g. separating the background from a scene and keeping it for many frames.
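
The key-frame-plus-differences idea can be shown with a toy coder. This is only an illustration of temporal redundancy, not MPEG: frames are assumed to be equal-length lists of pixel values, there is no JPEG-like transform or motion prediction, and the names `encode_sequence`, `decode_sequence` and `key_interval` are invented here.

```python
def encode_sequence(frames, key_interval=4):
    """Store a full key frame every key_interval frames, deltas otherwise."""
    encoded = []
    previous = None
    for i, frame in enumerate(frames):
        if previous is None or i % key_interval == 0:
            encoded.append(("key", list(frame)))
        else:
            # Record only the pixels that differ from the previous frame.
            changes = [(j, new) for j, (old, new)
                       in enumerate(zip(previous, frame)) if old != new]
            encoded.append(("delta", changes))
        previous = frame
    return encoded


def decode_sequence(encoded):
    """Rebuild every frame by applying deltas to the last decoded frame."""
    frames = []
    current = None
    for kind, payload in encoded:
        if kind == "key":
            current = list(payload)
        else:
            current = list(current)
            for j, value in payload:
                current[j] = value
        frames.append(current)
    return frames
```

When little changes between frames the delta lists stay short, and the periodic key frames let a player start decoding partway through the sequence.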

Image searching
● QBIC: color, texture, some shape
● Color histogram is easiest: beware any demo of sunsets
● Current work at Berkeley is better at segmentation & labeling
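
A color-histogram comparison can be sketched as follows. This is not QBIC's actual method; the coarse 4-levels-per-channel binning, the histogram-intersection score and the function names are assumptions for illustration, and images are taken to be plain lists of (r, g, b) tuples with values from 0 to 255.

```python
def color_histogram(pixels, levels=4):
    """Return a normalized histogram with levels**3 coarse RGB bins."""
    hist = [0.0] * (levels ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin index in 0..levels-1.
        ri, gi, bi = r * levels // 256, g * levels // 256, b * levels // 256
        hist[(ri * levels + gi) * levels + bi] += 1
        count += 1
    return [h / count for h in hist] if count else hist


def similarity(h1, h2):
    """Histogram intersection: 1.0 for identical color mixes, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

The score looks only at how much of each color is present, not where it sits or what it depicts, so a sunset and any other mostly-orange picture match well. That is one reason to be wary of sunset demos.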

Image labeling
David Forsyth & Jitendra Malik, Berkeley

Sound searching
● Speech: speech recognition; speaker identification
● Music: we now have “hum & search” software. See Bill Birmingham, U. of Michigan; Donald Byrd, U. Mass.

Video searching
See Informedia, Howard Wactlar, CMU. Combination of:
● closed-captioning
● speech recognition
● face recognition
● OCR of on-screen text
● some image searching
Also a great deal of work on presentation.

Summary
● There are lots of things in digital libraries today.
● And there are more to come: 3-D objects, scientific data, software, …
● All of this will have to be stored, organized and searched.