Management Information Systems Lecture 06 Archiving information CLARK UNIVERSITY College of Professional and Continuing Education (COPACE)
Plan Coding of numeric information Coding of textual information Coding of graphical information Archiving of information Shannon-Fano coding Huffman coding
Basic terms Coding is the conversion of a message into a code, that is, into a set of symbols transmitted over a communication channel.
Coding of numeric information Binary encoding, used in computing, represents data as a sequence of two characters: 0 and 1. These characters are called binary digits, or bits for short.
Coding of numeric information One bit can represent two values: 0 or 1 (yes or no, true or false, etc.). If the number of bits is increased to two, we can represent four different values: 00 01 10 11. Three bits can encode eight different values: 000 001 010 011 100 101 110 111.
Coding binary data The general formula is: N = 2^i, where N is the number of independent coded values and i is the number of bits in the binary code.
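As a quick check of the formula, the following Python sketch tabulates N for a few code lengths. The function name `code_count` is chosen here for illustration and does not come from the lecture.

```python
def code_count(bits: int) -> int:
    """Return N = 2**i, the number of distinct values an i-bit code can represent."""
    return 2 ** bits


# Tabulate N for the code lengths mentioned in the lecture.
for i in (1, 2, 3, 8, 16, 24):
    print(f"{i:2d} bits -> {code_count(i)} values")
```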
Coding of binary integers Principle: the integer is repeatedly divided by two until the quotient is zero, and the remainder (0 or 1) of each division is recorded. The remainders, written from right to left starting with the first one, form the binary equivalent of the decimal number.
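The repeated-division principle can be sketched in Python as follows. This is a minimal illustration that handles only non-negative integers; the name `to_binary` is an assumption of this sketch.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to a binary string by repeated division by 2."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 2)
        digits.append(str(remainder))  # remainders arrive least significant first
    return "".join(reversed(digits))   # so they are written out in reverse order


print(to_binary(19))  # 10011
```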
Coding of binary integers 8 bits are enough to encode the integers from 0 to 255. 16-bit coding is used for integers from 0 to 65535. 24 bits cover more than 16.7 million values (16777216).
Coding of textual information If each letter of the alphabet is matched to a certain integer, then we can use binary code to encode textual information. Eight bits are sufficient to encode 256 different characters.
Coding of textual information The American National Standards Institute (ANSI) introduced the ASCII encoding system (American Standard Code for Information Interchange).
Coding of textual information ASCII is commonly presented as two encoding tables: a basic one (symbols with numbers 0 - 127) and an extended one (128 - 255).
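To make the letter-to-integer mapping concrete, here is a small Python sketch that turns text into 8-bit codes. It assumes the text uses only basic ASCII characters (0 - 127); the function name `text_to_bits` is hypothetical.

```python
def text_to_bits(text: str) -> list[str]:
    """Encode each character as the 8-bit binary form of its ASCII code."""
    data = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII text
    return [format(byte, "08b") for byte in data]


print(text_to_bits("Hi"))  # ['01001000', '01101001']
```

Here "H" has code 72 and "i" has code 105, each padded to eight bits.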
Coding of textual information Multiple concurrent encodings arose because of the limited set of codes (256). A character set based on 16-bit character encoding is called universal: UNICODE. In that form it contains unique codes for 65536 different characters (later versions of the standard extend beyond 16 bits). For a long time the transition to this system was limited by insufficient computing resources.
Coding of graphical information A graphic image is made up of tiny dots (pixels) which form a grid called a raster.
Coding of graphical information Pixels with only two possible colors (black and white) can be encoded by two values, 0 or 1, so only 1 bit per pixel is needed. For black-and-white illustrations, coding with 256 shades of gray is generally accepted. How many bits do we need then?
When encoding color images, any color is decomposed into its basic components. Such a coding system is called RGB (red, green, blue). If 256 values (8 bits) are used to encode each of the three main color components, the system provides 16777216 different colors.
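The arithmetic behind the grayscale and RGB figures can be checked with a short Python sketch. The helper `bits_needed` is an illustrative name, and the sketch assumes 256 levels per component.

```python
def bits_needed(values: int) -> int:
    """Smallest number of bits whose codes can distinguish `values` different values."""
    if values < 2:
        raise ValueError("need at least two values to distinguish")
    return (values - 1).bit_length()


gray_bits = bits_needed(256)      # bits per pixel for 256 shades of gray
rgb_bits = 3 * bits_needed(256)   # three components, 256 levels each
print(gray_bits, rgb_bits, 2 ** rgb_bits)  # 8 24 16777216
```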
Archiving of information Data archiving is the process of converting the information stored in a file into a form that reduces redundancy in its representation and thus requires less storage space.
Archiving of information Archiving (packing) is the movement of source files into an archive file in a compressed format. Decompression (unpacking) is the process of recovering files from the archive in the exact form they had before archiving.
Archiving of information The aims: more compact storage on disk; reduced time (or cost) of transmitting information through communication channels; simpler transfer of files from one computer to another; protection from unauthorised access.
Archiving of information One of the first archiving methods was proposed in 1844 by Samuel Morse in his Morse code system: frequent characters are coded with shorter sequences.
Archiving of information In the 1940s Shannon, the founder of modern information theory, and independently Fano developed a general algorithm for constructing efficient variable-length codes. A related algorithm, proposed later by Huffman, always yields an optimal prefix code. The principle of both algorithms is to encode frequently occurring characters with shorter sequences of bits.
Archiving of information In the 1970s Lempel and Ziv proposed the LZ77 and LZ78 algorithms; LZW, a variant published by Welch in 1984, builds on LZ78. These algorithms find repeated sequences and replace them with numbers that refer to a dynamically generated dictionary. Most modern archivers (WinRAR, WinZip) are based on variations of the Lempel-Ziv algorithm.
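The idea of a dynamically generated dictionary can be illustrated with a minimal LZW sketch in Python. It is a teaching example, not the code used by any real archiver: it assumes every input character has a code in 0..255 and emits a plain list of dictionary indices rather than packed bits.

```python
def lzw_compress(text: str) -> list[int]:
    """Return LZW dictionary indices for `text` (characters assumed to be in 0..255)."""
    dictionary = {chr(i): i for i in range(256)}  # start with all single characters
    current = ""
    output = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                      # keep extending a known sequence
        else:
            output.append(dictionary[current])       # emit the longest known sequence
            dictionary[candidate] = len(dictionary)  # learn the new, longer sequence
            current = ch
    if current:
        output.append(dictionary[current])
    return output


def lzw_decompress(codes: list[int]) -> str:
    """Rebuild the text by regenerating the same dictionary while decoding."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    previous = dictionary[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = previous + previous[0]  # code refers to the entry being created
        else:
            raise ValueError("invalid LZW code")
        pieces.append(entry)
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(pieces)


sample = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(sample)
print(len(sample), "characters ->", len(codes), "codes")
```

Repeated fragments such as "TOBEOR" are emitted as single dictionary indices once they have been seen, which is why the code list is shorter than the input.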
Archiving of information K_c = (V_c / V_r) × 100%, where K_c is the compression coefficient, V_c is the volume of the compressed file, and V_r is the volume of the source file. The degree of compression depends on the archiving program, the method, and the type of source file.
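A rough way to measure the coefficient is sketched below in Python. It assumes K_c is the compressed volume expressed as a percentage of the source volume (an interpretation consistent with the percentage ranges quoted in the lecture), and it uses the standard-library `zlib` module as a stand-in for an archiver. The sample data is made up for illustration.

```python
import zlib


def compression_coefficient(data: bytes) -> float:
    """Return K_c = V_c / V_r * 100: compressed size as a percentage of the source size."""
    if not data:
        raise ValueError("empty input has no defined coefficient")
    compressed = zlib.compress(data)  # DEFLATE: an LZ77 variant plus Huffman coding
    return len(compressed) / len(data) * 100


text = b"archiving reduces redundancy " * 200  # highly repetitive sample data
print(f"K_c = {compression_coefficient(text):.1f}%")
```

Under this definition a smaller K_c means stronger compression, and data with little redundancy gives values close to 100%.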
Archiving of information The degree of compression for graphical, text and data files is 5-40%. For executable files it is 60-90%. For files that are already archived it is 90-100%.
Archiving of information A self-extracting archive is an executable module that can unpack the files it contains without a separate archiver. Large archive files can be split into several volumes.
Shannon-Fano coding 1. Develop a list of probabilities or frequency counts for the symbols. 2. Sort the list of symbols according to frequency. 3. Divide the list into two parts, with the total frequency count of the left part being as close to the total of the right as possible. 4. Assign the binary digit 0 to the left part of the list and the digit 1 to the right part. 5. Recursively apply steps 3 and 4 to each of the two halves, subdividing groups and adding bits to the codes until each symbol has a code.
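The five steps above can be sketched in Python roughly as follows. The function name `shannon_fano` and the sample frequency counts are assumptions made for illustration; ties are broken by keeping the first best split.

```python
def shannon_fano(freqs: dict[str, float]) -> dict[str, str]:
    """Build Shannon-Fano codes from a symbol -> frequency mapping (step 1)."""
    symbols = sorted(freqs, key=freqs.get, reverse=True)  # step 2: most frequent first
    codes = {s: "" for s in symbols}

    def split(group: list[str]) -> None:
        if len(group) < 2:
            return  # a single symbol needs no further bits
        total = sum(freqs[s] for s in group)
        running, best_cut, best_diff = 0, 1, float("inf")
        for i in range(1, len(group)):            # step 3: find the most balanced cut
            running += freqs[group[i - 1]]
            diff = abs(total - 2 * running)       # |left total - right total|
            if diff < best_diff:
                best_cut, best_diff = i, diff
        left, right = group[:best_cut], group[best_cut:]
        for s in left:
            codes[s] += "0"                       # step 4: left part gets 0
        for s in right:
            codes[s] += "1"                       # step 4: right part gets 1
        split(left)                               # step 5: recurse on both halves
        split(right)

    split(symbols)
    return codes


# Hypothetical frequency counts, chosen only to exercise the algorithm.
example = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
print(shannon_fano(example))
# {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
```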
Huffman coding A source generates 4 different symbols, each with its own probability. A binary tree is built from left to right by taking the two least probable symbols and combining them into an equivalent symbol whose probability equals the sum of the two. The process is repeated until just one symbol remains. The tree can then be read backwards, from right to left, assigning different bits to different branches.
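The merging procedure can be sketched in Python with a priority queue. The four probabilities below are invented for illustration and are not the values from the slide; the name `huffman_codes` and the choice of which branch gets 0 or 1 are likewise assumptions of this sketch.

```python
import heapq
from itertools import count


def huffman_codes(probs: dict[str, float]) -> dict[str, str]:
    """Build Huffman codes by repeatedly merging the two least probable symbols."""
    if not probs:
        return {}
    if len(probs) == 1:
        return {sym: "0" for sym in probs}  # a lone symbol still needs one bit
    tie = count()  # tie-breaker so the heap never has to compare the dict payloads
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, low = heapq.heappop(heap)    # least probable group
        p2, _, high = heapq.heappop(heap)   # second least probable group
        # Prepend a bit to every code in each group, then merge the groups.
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]


# Hypothetical probabilities for a 4-symbol source.
print(huffman_codes({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```

Prepending bits during each merge corresponds to reading the finished tree backwards, from the root to the leaves.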