Lecture #20: Image and Data Compression

Data Compression

How big?
– Image: 1024 x 1024 x 3 = 3 million bytes (3 MB)
– Audio: 48,000 x 10 min x 60 sec/min x 2 = 58 million bytes (58 MB)
– Video: 640 x 480 x 10 minutes
  – 307,200 pixels/frame x 600 sec x 30 fps = 5.5 billion pixels
  – At 3 bytes/pixel: 16.6 billion bytes (17 GB)
Compression (reduce the size)
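The sizes above can be checked with a few lines of arithmetic. This is a minimal sketch; it assumes 3 bytes per pixel (RGB) and reads the audio factor of 2 as 2 bytes per sample, which the slide does not state explicitly.

```python
# Raw (uncompressed) sizes for the three examples.
# Assumptions: 3 bytes per pixel, 2 bytes per audio sample, one audio channel.
image_bytes = 1024 * 1024 * 3
audio_bytes = 48_000 * 10 * 60 * 2
video_pixels = 640 * 480 * 10 * 60 * 30   # pixels per frame x seconds x frames per second
video_bytes = video_pixels * 3

print(image_bytes)   # 3145728
print(audio_bytes)   # 57600000
print(video_pixels)  # 5529600000
print(video_bytes)   # 16588800000
```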

Problem
Reduce the size of a data object
– Text
– Image
– Audio
– Video
How to do it
– Cheat in ways that the user can't see
– Coherence

Ways to cheat
Text generally uses fewer than 128 possible characters.
– Use 7 bits instead of 8 (saves 12.5%)
For text, some characters are more common than others.
– Use fewer bits for common characters, more bits for infrequently used characters
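One well-known way to give common characters fewer bits is a Huffman code. The lecture does not name a specific method, so the sketch below is an illustration only. It computes the code length per character and compares the total against fixed 8-bit and 7-bit storage. The function name `huffman_code_lengths` is a hypothetical helper.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Return {character: code length in bits} for a Huffman code built from text."""
    counts = Counter(text)
    if len(counts) == 1:
        return {ch: 1 for ch in counts}
    # Heap entries: (total count, unique tie-breaker, {char: depth so far}).
    heap = [(n, i, {ch: 0}) for i, (ch, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every character in them one level deeper.
        merged = {ch: depth + 1 for ch, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, next_id, merged))
        next_id += 1
    return heap[0][2]

text = "four score and seven years ago, our fathers brought forth on this continent"
counts = Counter(text)
lengths = huffman_code_lengths(text)

fixed_8 = 8 * len(text)
fixed_7 = 7 * len(text)
variable_bits = sum(counts[ch] * lengths[ch] for ch in counts)
print(fixed_8, fixed_7, variable_bits)
```

The variable-length total excludes the cost of storing the code table itself, which matters for short texts like this one.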

Ways to cheat
People can't see more than 64 levels of gray.
– Use 6 bits instead of 8 (saves 25%)
People don't see color as well as B/W.
– Use 6 bits for B/W and much less for color
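Dropping from 8 bits to 6 bits per gray value can be sketched as a simple shift. The function names are hypothetical, and the bit-replication step used when expanding back to 8 bits is one common choice, not something the lecture specifies.

```python
def to_6_bits(gray8):
    """Drop the two least significant bits of an 8-bit gray value (0..255 -> 0..63)."""
    return gray8 >> 2

def back_to_8_bits(gray6):
    """Expand 0..63 back to 0..255 by shifting and copying the top two bits into the low two."""
    return (gray6 << 2) | (gray6 >> 4)

print(to_6_bits(200), back_to_8_bits(to_6_bits(200)))  # 50 203

# The round trip is lossy, but the error never exceeds 3 gray levels out of 256.
worst = max(abs(back_to_8_bits(to_6_bits(v)) - v) for v in range(256))
print(worst)  # 3
```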

Coherence
If we know the previous value of something, then we generally have a good idea what the next value will be.
3 Techniques
– Run length encoding
– Reuse of subsequences
– Prediction and error

Run length encoding
Values are frequently repeated.
– Instead of storing each value, store a single value with a count of how many times to repeat it
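A minimal sketch of run length encoding and decoding follows. The cap of 255 per run is an assumption made here so that a count would fit in one byte; the names `rle_encode` and `rle_decode` are illustrative.

```python
def rle_encode(values, max_count=255):
    """Collapse repeated neighbors into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v and runs[-1][1] < max_count:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

row = ["W"] * 5 + ["R"] * 3 + ["W"] * 4
print(rle_encode(row))  # [('W', 5), ('R', 3), ('W', 4)]
```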

Uncompressed
– 12 x 10 = 120 pixels
– 120 pixels x 3 bytes/pixel = 360 bytes

Run encoded
– RGB: 3 bytes
– Count: 1 byte
– Entries: 23
– Space: 4 x 23 = 92 bytes
– Compression: (360 - 92)/360 = 74%

Run encoded, with indexed color
– 4 colors: 12 bytes
– Index: 2 bits
– Count: 6 bits
– Entries: 23
– Space: 12 + 1 x 23 = 35 bytes
– Compression: (360 - 35)/360 = 90%
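The space figures for both run-encoded layouts can be reproduced as follows. The packing helpers are a sketch under one assumption not stated on the slide: the 6-bit field stores the run length minus one, so a single byte covers runs of 1 to 64 pixels.

```python
raw_bytes = 12 * 10 * 3                  # 120 pixels, 3 bytes each
entries = 23                             # number of runs, taken from the slide

rgb_run_bytes = entries * (3 + 1)        # 3-byte color + 1-byte count per run
indexed_run_bytes = 4 * 3 + entries * 1  # 12-byte color table + 1 packed byte per run

def pack_run(index, count):
    """Pack a 2-bit color index and a run length of 1..64 into one byte (assumed layout)."""
    assert 0 <= index < 4 and 1 <= count <= 64
    return (index << 6) | (count - 1)

def unpack_run(byte):
    """Inverse of pack_run: return (index, count)."""
    return byte >> 6, (byte & 0x3F) + 1

def saving(compressed_bytes):
    """Fraction of the raw size that compression removes."""
    return (raw_bytes - compressed_bytes) / raw_bytes

print(rgb_run_bytes, round(saving(rgb_run_bytes) * 100))          # 92 74
print(indexed_run_bytes, round(saving(indexed_run_bytes) * 100))  # 35 90
```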

Run encoding: HELLO. Works well.

Run encoding: works well.

Run encoding: not good. Too much variation in the rose.

Run encoding: text
"four score and seven years ago, our fathers brought forth on this continent"
Not good: no repetition
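A quick check of why run encoding fails on this sentence: no character is immediately repeated, so every run has length 1. The sketch assumes one byte per character and one byte per count.

```python
from itertools import groupby

text = "four score and seven years ago, our fathers brought forth on this continent"

# groupby collapses adjacent equal characters into a single group.
runs = [(ch, len(list(group))) for ch, group in groupby(text)]

raw_bytes = len(text)           # 1 byte per character
encoded_bytes = 2 * len(runs)   # 1 byte for the character + 1 byte for the count

print(len(runs), raw_bytes, encoded_bytes)
```

With every run of length 1, the encoded form is twice the size of the original.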

Run encoding: audio. Not good, no repetition.

Reuse common sequences

Works really well
Used in GIF format
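GIF uses LZW, which builds a table of sequences it has already seen and emits one code per known sequence. The following is a simplified sketch of that idea. It emits plain integer codes and omits the variable-width codes and clear codes that the real GIF format uses.

```python
def lzw_compress(data: bytes):
    """Return a list of integer codes; repeated sequences share a single code."""
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate               # keep growing a known sequence
        else:
            codes.append(table[current])      # emit the longest known sequence
            table[candidate] = len(table)     # remember the new, longer sequence
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

def lzw_decompress(codes):
    """Rebuild the bytes by reconstructing the same table while reading codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):              # code defined by this very step
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)

data = b"ABABABABABABABAB"
codes = lzw_compress(data)
print(len(data), len(codes))
```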

Reuse common sequences
Works fair
– Blacks are good
– Rose has some similarities

Reuse common sequences

Works really well

Reuse common sequences
Works poorly

Reuse common sequences: video
– Works really well
– Copy pieces from last frame into this frame
– One technique in MPEG

Reuse common sequences: text
– Reuses words and phrases
– Works fairly well
– Most common text compression technique
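Python's standard `zlib` module gives a quick way to see reuse at work on text. Its DEFLATE format combines back-references to earlier data with variable-length codes, so it illustrates more than reuse alone. The sample sentence is made up for this sketch.

```python
import os
import zlib

# Text that repeats words and phrases, versus bytes with no structure at all.
repetitive = b"the quick brown fox jumps over the lazy dog. " * 40
random_bytes = os.urandom(len(repetitive))

for name, data in [("repetitive text", repetitive), ("random bytes", random_bytes)]:
    packed = zlib.compress(data)
    assert zlib.decompress(packed) == data   # lossless round trip
    print(name, len(data), "->", len(packed))
```

The repetitive text shrinks dramatically, while the random bytes do not shrink at all, which matches the "know when each technique will or will not work" theme.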

Prediction + error
Given previous values, predict what the next value will be.
When it is not quite right, store the error.
The error almost always takes fewer bits than the value.
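The simplest predictor is "the next value equals the previous one". The sketch below stores only the prediction errors and shows that the original values can be rebuilt exactly. The sample values are made up, and the function names are illustrative.

```python
def encode_errors(values):
    """Predict each value as the previous one and keep only the error."""
    errors = []
    previous = 0                     # nothing is known before the first value
    for v in values:
        errors.append(v - previous)
        previous = v
    return errors

def decode_errors(errors):
    """Rebuild the values by adding each error to the running prediction."""
    values = []
    previous = 0
    for e in errors:
        previous += e
        values.append(previous)
    return values

samples = [100, 102, 104, 105, 105, 104, 102, 101]
errors = encode_errors(samples)
print(errors)  # [100, 2, 2, 1, 0, -1, -2, -1]
```

After the first entry, every error fits in a few bits, while the raw samples need about 7 bits each.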

Linear prediction: line through previous values predicts the next. Little error.

Linear prediction: line through previous values predicts the next. More error.

Linear prediction: line through previous values predicts the next. Still more error.

Linear prediction: line through previous values predicts the next. Less error.

Linear prediction: line through previous values predicts the next. Less error.

Linear prediction: line through previous values predicts the next. Little error.
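The "line through previous values" idea can be sketched by extending the line through the last two values. Using exactly two previous points is an assumption of this sketch; the slides do not say how many points the line is fitted to. Both sample sequences are made up.

```python
def predict(history):
    """Extend the line through the last two values; fall back when history is short."""
    if len(history) >= 2:
        return 2 * history[-1] - history[-2]
    return history[-1] if history else 0

def encode(values):
    """Store the difference between each value and its linear prediction."""
    return [v - predict(values[:i]) for i, v in enumerate(values)]

def decode(errors):
    """Rebuild the values by repeating the same predictions and adding the errors."""
    values = []
    for e in errors:
        values.append(predict(values) + e)
    return values

smooth = [10, 12, 14, 16, 18, 20]
jagged = [10, 30, 5, 40, 0, 25]
print(encode(smooth))  # [10, 2, 0, 0, 0, 0]
print(encode(jagged))  # [10, 20, -45, 60, -75, 65]
```

On the smooth ramp the errors vanish. On the jagged data the errors are larger than the values themselves, so prediction makes things worse.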

Linear Prediction

Look closer: little error; more error.

Linear Prediction
Prediction + error
– Shades of black
– Follows shade of rose
– Rose detail is error off shade
Prediction + error + cheating = JPEG

JPEG Comparisons

Video
– Copy from previous frame
– Store error for small details
– MPEG
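A heavily simplified sketch of "copy from previous frame, store the differences" follows. Frames are flat lists of pixel values here, and only changed pixels are stored. Real MPEG works on blocks, with motion compensation and transform-coded residuals, none of which this sketch attempts.

```python
def frame_delta(previous, current):
    """List (pixel index, new value) for every pixel that changed since the last frame."""
    return [(i, c) for i, (p, c) in enumerate(zip(previous, current)) if p != c]

def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the stored changes."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame

frame1 = [0] * 16            # a made-up 4 x 4 frame, all black
frame2 = list(frame1)
frame2[5] = 9                # one pixel changes between frames

delta = frame_delta(frame1, frame2)
print(delta)  # [(5, 9)]
```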

Text
N-Grams
– Use the last N letters to predict the next letter
– Store errors
– English is quite regular
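An N-gram predictor can be sketched as a table from the last N letters to the letter that most often followed them. The helper names and sample text are made up, and the sketch trains and tests on the same text, which overstates how well it would do on new text.

```python
from collections import Counter, defaultdict

def train(text, n):
    """Count which letter follows each run of n letters."""
    table = defaultdict(Counter)
    for i in range(n, len(text)):
        table[text[i - n:i]][text[i]] += 1
    return table

def count_hits(text, table, n):
    """Count positions where the most frequent follower matches the actual letter."""
    hits = 0
    for i in range(n, len(text)):
        context = text[i - n:i]
        if context in table and table[context].most_common(1)[0][0] == text[i]:
            hits += 1
    return hits

n = 2
text = "the cat and the hat and the bat sat on the mat and the rat sat on the cat "
table = train(text, n)
total = len(text) - n
hits = count_hits(text, table, n)
print(hits, "of", total, "letters predicted; only the misses need extra bits")
```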

Review
Cheat
– Exploit weakness in what people can perceive
Coherence
– Run encoding (count repetitions)
– Reuse (reference pieces from previous data)
– Predict + error
Know when each technique will or will not work