NATO Consultation, Command and Control Agency


1 NATO Consultation, Command and Control Agency
COMMUNICATIONS & INFORMATION SYSTEMS: Decreasing “Bit Pollution” through “Sequence Reduction”. Dr. Davras Yavuz. NATO UNCLASSIFIED

2 You will find this presentation and the accompanying paper at
from where both can be viewed and/or downloaded (the four other NC3A presentations can also be found at the above URL)

3 Terminology. “Sequence Reduction”: originates with Peribit (~2000); the founder’s Ph.D. on genome mapping (Biomedical Informatics, Stanford University) uses the term “Molecular Sequence Reduction” (MSR). “Bit Pollution”: link/network pollution through repetition of redundant digital sequences over transmission media (especially significant for mobile/deployed networks/links). Other related terms: WAN optimizer, Application Accelerator/Optimizer or Application Controller-Optimizer, Performance Enhancement Proxies (PEP), WAN Expanders, latency (= delay) removers/compensators/mitigators, etc. This is a new and dynamic field: many terms will continue to appear and coalesce; some will catch on, others will disappear.

4 Terminology Application Accelerator/Optimizer/Controller-Optimizer
“Next Generation Compression”, “Bit Pollution Reduction”, “Sequence Reduction” (the latter from Peribit/Dr. Amit Singh); WAN Expander (WX), WAN Optimizer, WAN Optimization Controller (WOC) (Juniper/Peribit); Application Accelerator/Optimizer/Controller-Optimizer; Latency Remover/Optimizer (replace “Latency” by “Delay”), especially for networks with SATCOM links. In general: use of a-priori knowledge of the data comms protocols required by the application to optimize the data input/output. Combinations of the above. Unfortunately all present implementations are “proprietary”; it is unrealistic to expect “standards” soon, as the technology is too new and lucrative.

5 Why “Bit Pollution” ? 1) Application & protocol overheads
Most of us deal daily with various electronic files/information. Taking MS Office as an example: Word, PPT, Excel, Project, HTML, Access, … files, and/or many other electronic files, data-bases, forms, etc. On many occasions we make small changes and send them back and/or forward to others. Repetitive traffic over communication links can, in general, be classified broadly into 3 categories: 1) Application & protocol overheads; 2) Commonly used words, phrases, strings, objects (logos, images, audio clips, etc.); 3) Process flows (data-base updates/views, forms, templates, etc. going back & forth).

6 SEQUENCE REDUCTION Next Generation Compression - Examples
256 Kbps satellite link. 20 Mbytes PPT file (48 slides) sent 1st time: ~12 minutes (700 secs). 6 of the slides modified, file size change <0.5 Mbytes. Modified file sent 6 hours later: ~8 secs. Same modified file sent 24 hours later: ~18 secs; sent 7 days later: ~24 secs. Original file sent 7 days later: ~14 secs. Similar results for Word, Excel files and web pages. Less but still significant improvement for PDF files. Smallest improvement for zipped files (reduction by ~ to 3). The amount of “new” files in between repetitions and the SR RAM/HD capacities have a strong effect on the duration of repeat transmissions (dynamic library updates). Above results based on Peribit SRs: German MOD, Syracuse University “Real World” Labs (Network Computing, Nov 2004) and NC3A. GE MOD results are based on operational traffic, the others on test traffic. Ref [6] of paper: “Record for throughput was ~60Mbps through a T1. It came about when copying 1.5GB file twice!”
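As a rough check on the figures above, the following Python sketch computes the ideal first-transfer time and the apparent throughput of the repeat transfer. It assumes 1 Mbyte = 10^6 bytes and ignores protocol overhead and satellite latency, which is presumably why the measured first transfer (~700 secs) is somewhat longer than the ideal value. The function and constant names are illustrative only.

```python
def transfer_seconds(size_bytes: float, link_bps: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and latency."""
    return size_bytes * 8 / link_bps


def effective_bps(size_bytes: float, seconds: float) -> float:
    """Throughput as seen by the application for a measured duration."""
    return size_bytes * 8 / seconds


FILE_BYTES = 20e6   # 20 Mbytes, taken as 20 * 10**6 bytes (assumption)
LINK_BPS = 256e3    # 256 Kbps satellite link

first_send = transfer_seconds(FILE_BYTES, LINK_BPS)   # ideal first transfer
repeat_rate = effective_bps(FILE_BYTES, 8)            # repeat took ~8 secs
gain = repeat_rate / LINK_BPS                         # apparent capacity gain

print(f"ideal first transfer: {first_send:.0f} s")
print(f"apparent repeat throughput: {repeat_rate / 1e6:.0f} Mbps")
print(f"apparent gain over raw link: {gain:.1f}x")
```

Under these assumptions the ideal first transfer is 625 s, and the ~8 s repeat corresponds to an apparent throughput far above the physical link rate, which is only possible because most of the file is never retransmitted.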

7 Mobile/Tactical Comms Divergence
Fixed communications: WANs with all users/nodes fixed. Fiber-optic/photonic revolution: essentially unlimited capacity is now possible/available if/when a cable can be installed. Mobile comms: networks with mobile/deployable users. No technological revolution similar to the photonic one is foreseen; radio propagation will be the limiting factor. The mainstay will be radio: tactical LOS tens/hundreds of Kbps, BLOS (rough terrain, long distances) a few Kbps. Star-wars scenarios: moving laser beams??? LEO satellites will provide some 100s of Kbps at a cost. The divergence will continue. Another factor: input into the five senses is ~100 Shannon/entropy bps; for transmission redundancy: x 10 = 1 Kbps. This is a basic issue for mobile/deployed communications, e.g. when at least one end of a communications link is moving, and/or some users/nodes of a communications network are moving. Deployed: move and then set up communications. On-the-move: communicate while moving. Therefore: we must treat mobile/tactical comms differently.

8 Deployable, Mobile, On-the-Move Communications
At least one end of a link moving/deployed; networks which have nodes/users moving/deployed. Such links/networks are essential for survivability and rapid reaction, and will be taking on increasingly critical tasks. Present approach: use applications developed for fixed links/networks for deployed/mobile units. We must consider the very different characteristics of such networks when choosing applications. Can we measure “information”, so that we can determine the performance of links/networks in terms of “information” transported, not just bits/bytes?

9 Can we measure “information” ? Yes we can !
Shannon defined the concept of “Entropy”, a logarithmic measure, in the 1940s (while working on cryptography); it has stood the test of time. The first suggestion of a log measure was by Hartley (1928, base 10), but Shannon used the idea to develop a complete, fully fledged and structured “theory of information & communication”. Shannon preferred log2 and called the unit “bits”; base e is also sometimes used (Napiers/nats), as is base 10 (Hartleys/dits/decs). The smaller the probability of occurrence of an event, the higher the “information delivered” when it occurs. There is another “info theory” that you might hear about, Kolmogorov-Chaitin (K-C) theory, that some mathematicians still refer to (not in the mainstream).

10 [Figure: discrete, countable symbol sets {Si} and {Rj}; C. E. Shannon (BSTJ 1948)]

11 in the case of two possibilities/events/symbols
Entropy (H) in the case of two possibilities/events/symbols: probability of one = p, of the other q = 1 - p; H = -(p log p + q log q). [Figure: H plotted versus p]
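The two-symbol entropy formula above can be evaluated directly. The sketch below is a minimal Python version using base-2 logarithms (so the result is in bits); the function name `binary_entropy` is an illustrative choice, and the convention 0 log 0 = 0 is assumed at the end points.

```python
import math


def binary_entropy(p: float) -> float:
    """H = -(p log2 p + q log2 q) with q = 1 - p, in bits per symbol."""
    if p in (0.0, 1.0):
        # A certain outcome carries no information (0 log 0 taken as 0).
        return 0.0
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))


# Tabulate a few points of the H-versus-p curve.
for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p = {p:4.2f}  H = {binary_entropy(p):.3f} bits")
```

The curve is symmetric about p = 0.5, where it reaches its maximum of 1 bit, and falls to zero as either outcome becomes certain.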

12 Let us take a “Natural Language” English as an example
English has 26 letters (characters) plus space as a delimiter: TOTAL 27 characters (symbols). One could include punctuation, special characters, etc.; for example we could use the full ASCII symbol set; the methodology is the same. Extension to other natural languages is readily made. Extension to images is also possible (same methodology).

13 Structure of a “Natural Language” - English
Defined by many characteristics: grammar, semantics, etymology, usage, …, historical developments, …. Until the early 70s there was substantial belief that “Natural Languages” and “computer programming languages” (finite automata instructions) had similarities; Noam Chomsky’s work (Professor at MIT) completely destroyed those expectations. Natural languages can be studied through probabilistic (Markov) models: Shannon’s approach (1940s, no computers; Bell Labs staff flipped through many pages of books to get the probabilities). He was actually working on cryptography and made important contributions in that area also.

14 Various Markov model examples, skipped here for continuity, may be found at the end

15 Zipf’s Law “Principle of Least Effort”
George Kingsley Zipf, Professor of Linguistics, Harvard (1902 – 1950). If the “words” in a language are ordered (“ranked”) from the most frequently used down, the probability Pn of the nth word in this list is Pn ≈ 0.1 / n. This implies a maximum vocabulary size of … words, since Σ 1/n is not finite when summed to ∞. Words could be roots, lexical, types; … is sufficiently large to model all languages (Shannon had …, which is wrong; however, the correction of the error makes his results even more meaningful). For details of the above see DY, IEEE Transactions on Information Theory, September 1974. There are many other applications of “Zipf’s Law”, e.g. populations of cities in a country, company sizes, …; if interested just make a Google/Internet search.
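The vocabulary limit follows from requiring the probabilities Pn = 0.1 / n to sum to at most 1: because the harmonic series diverges, the list must be cut off at some finite rank. The Python sketch below finds that cut-off numerically. The function name and the parameter `c` are illustrative, and the result is a property of this idealized model rather than a measured vocabulary size.

```python
def zipf_vocabulary_limit(c: float = 0.1) -> int:
    """Largest N such that sum_{n=1..N} c / n does not exceed 1."""
    total = 0.0
    n = 0
    while True:
        n += 1
        candidate = total + c / n
        if candidate > 1.0:
            # Adding rank n would push total probability above 1.
            return n - 1
        total = candidate


limit = zipf_vocabulary_limit()
print(f"Pn = 0.1/n can hold for at most {limit} ranked words")
```

With c = 0.1 the cut-off lands on the order of twelve thousand words; a smaller constant would permit a much larger vocabulary, since the partial sums grow only logarithmically.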

16 “Symbols, Signals & Noise” J. R. Pierce
Zipf’s Law (Principle of Least Effort). [Figure from “Symbols, Signals & Noise”, J. R. Pierce: ~ million words, various texts] Many such analyses have been made (all issues of TIME Magazine, New York Times, …, Shakespeare's works, etc.) and they all give similar results. Just search Google for “Zipf’s Law”.

17 Entropy bits/character - English
Amazingly it turns out to be about the same for most “Natural Languages” for which the analysis has been done (Arabic, French, German, Hebrew, Latin, Spanish, Turkish, …). These languages also follow Zipf’s Law.

18 Entropy of Natural Languages
Between 1 & 2 bits per letter/character; 1.5 bits per letter is commonly used. English has ~4.5 letters per word on average: 4.5 x 1.5 = 6.75, or ~7 bits per word on average. Normal speech: … words per second. Hence information per second: ~5 bits.

19 (*) “equally likely” assumption clearly not realistic
Extension to Images. Same concept and definitions: letters are replaced by pixels/groups of pixels, etc.; words could be analogous to sets of pixels, objects. The numbers are much larger. E.g. a … x 600 pixel image, with each pixel capable of taking on one of 16 brightness levels, gives … possible images. Assume all these images are equally likely (*): the probability of one of these images is 1/(number of possible images), and the information provided by that image is (number of pixels) x log2 16 = … bits. A real image contains much less “information”: adjacent/nearby pixels are not independent of each other. Movies: frame to frame only small/incremental changes. (*) The “equally likely” assumption is clearly not realistic.
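Under the “equally likely” assumption the information in an image is simply the number of pixels times log2 of the number of brightness levels. The Python sketch below shows this for a deliberately tiny, hypothetical 4 x 4 image with 16 levels, since the slide's own image dimensions are not fully recoverable from this transcript.

```python
import math


def uniform_image_bits(width: int, height: int, levels: int) -> float:
    """Bits per image if every pixel pattern is equally likely."""
    return width * height * math.log2(levels)


def possible_images(width: int, height: int, levels: int) -> int:
    """Number of distinct images: levels ** (number of pixels)."""
    return levels ** (width * height)


# Hypothetical example values (not the slide's image size).
W, H, LEVELS = 4, 4, 16
bits = uniform_image_bits(W, H, LEVELS)
count = possible_images(W, H, LEVELS)

print(f"{W}x{H} image, {LEVELS} levels: {bits:.0f} bits, {count} images")
```

The two quantities are consistent by construction: the number of possible images equals 2 raised to the number of bits, so picking one image out of that many equally likely ones delivers exactly that much information.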

20 Speech Coding: ~5 b/s is the irreducible information content; multiply by 10 to introduce redundancy, therefore we should be able to communicate speech “information” at ~50 bps. Examples of speech coding we use: 64000 bps, … bps PC, … bps CVSD, … bps LPC, MELP 1200, … bps MELP. All of the above are “waveform” codecs; they will also convey “non-measurable” (intangible) information. Speech codecs based on recognition at the transmitter and synthesis at the receiver could conceivably go lower than … bps but would not contain the intangible component!

21 A QUICK REFRESHER ON CONVENTIONAL COMPRESSION
May be found at the end

22 SEQUENCE REDUCTION Next Generation Compression
Dictionary based: implements a learning algorithm. Dynamically learns the “language” of the communications traffic and translates it into “short-hand”. Continuously updates/improves its “knowledge” of the link “language”: frequent patterns move up in the dictionary, infrequent patterns move down and eventually can age out. No fixed packet or window boundaries, unlike e.g. LZ which generally uses byte window. Once a pattern is learned and put in the dictionary it will be compressed wherever it appears. Data compression is based on previously seen data. Performance improves with time as “learning” increases: very quickly at first (10 – 20 minutes) and then slowly. When a new application comes in, SR adapts to its “language”.
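The general idea of a learned, persistent dictionary shared by both ends of a link can be illustrated with a toy Python sketch. This is not the proprietary SR algorithm: it assumes fixed 16-byte chunks for brevity (whereas SR is described above as having no fixed boundaries), a simple least-recently-used aging rule, and hypothetical names such as `ToyReducer`. Sender and receiver apply the same updates in the same order, so their dictionaries stay synchronized without being transmitted.

```python
from collections import OrderedDict

CHUNK = 16  # hypothetical fixed chunk size, for illustration only


class ToyReducer:
    """Toy dictionary-based reducer; one instance per link end."""

    def __init__(self, capacity: int = 1024):
        self.capacity = capacity
        self.by_chunk = OrderedDict()  # chunk -> token id, in LRU order
        self.by_token = {}             # token id -> chunk
        self.next_id = 0

    def _learn(self, chunk: bytes) -> None:
        if len(self.by_chunk) >= self.capacity:
            # Age out the least recently used pattern.
            _, old_id = self.by_chunk.popitem(last=False)
            del self.by_token[old_id]
        self.by_chunk[chunk] = self.next_id
        self.by_token[self.next_id] = chunk
        self.next_id += 1

    def encode(self, data: bytes) -> list:
        out = []
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            if chunk in self.by_chunk:
                self.by_chunk.move_to_end(chunk)   # frequent patterns stay
                out.append(("ref", self.by_chunk[chunk]))
            else:
                self._learn(chunk)
                out.append(("raw", chunk))
        return out

    def decode(self, items: list) -> bytes:
        parts = []
        for kind, value in items:
            if kind == "ref":
                chunk = self.by_token[value]
                self.by_chunk.move_to_end(chunk)
            else:
                chunk = value
                self._learn(chunk)
            parts.append(chunk)
        return b"".join(parts)


sender, receiver = ToyReducer(), ToyReducer()
message = bytes(range(256)) * 4
first = sender.encode(message)    # learns 16 chunks, then reuses them
second = sender.encode(message)   # later resend: references only
restored_first = receiver.decode(first)
restored_second = receiver.decode(second)
print(sum(k == "raw" for k, _ in first), "raw chunks on first send")
print(sum(k == "raw" for k, _ in second), "raw chunks on resend")
```

Because the dictionary outlives any single packet or file, a resend hours later can still be encoded as references, which is the behaviour a window-limited compressor cannot reproduce.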

23 MOLECULAR SEQUENCE REDUCTION
Relative positioning of statistical and substitutional compression algorithms (from Peribit, A. P. Singh)

24 “Molecular Sequence reduction”

25 Origins in DNA pattern matching
MSR – Technology: real time, high speed, low latency; continuously learns and updates dictionary; transparently operates on all traffic (optimized for IP); eliminates patterns of any size, anywhere in stream; patent-pending technology; origins in DNA pattern matching. 3 or 4 conflicting goals: High speed → Cisco works well at <256K links, but as bandwidth increases, performance decreases. Patterns spread across large distances → key for data reduction. Latency → looking at a broad range of data can create latency in the process of compression. IP layer → benefits all applications, rather than working at the app layer and making compression work for only one app. Dictionary → auto-populates and doesn’t age.

26 MSR – Molecular Sequence Reduction
“Next-gen dictionary-based compression”

27 Government/Military use examples
Many thousands of units in use in USA (mostly corporate but also government agencies). GE MOD using Peribit SRs (for ~2 years): INMARSAT; German Navy WAN (encrypted); links to GE Navy ships in/around South Africa; satellite links to GE units in Afghanistan; plans for some 64 Kbps landlines. GE MOD total: … units. Also other nations, some with initial trials.

28 Reduction rates observed
(reduced by % amount given) GE Armed Forces results by traffic type, for Version 3.0, V 4.02 and V 5.0 (values in the order given on the slide; not every version is listed for every traffic type): HTTP 30 %, 40 %, 46 %; MAIL 61 %, 67 %; NetBios 59 %, 62 %; CIFS 92 %; FTP 69 %, 73 %; TELNET 65 %, 93 %. CIFS: Common Internet File System, “Microsoft's way of doing network file sharing”. All MS operating systems have had some form of CIFS networking available or built in, and there are implementations of CIFS for most major non-MS operating systems, as CIFS allows the sharing of directories, files, printers, and other cool computer stuff across a network.

29 CIFS: Common Internet File System "Microsoft's way of doing network file sharing“
From German MOD

30 Startup behavior example
From German MOD

31 From German MOD

32 From German MOD

33 From Peribit.com (not GE MOD data)

34 Peribit (screen capture)
NC3A – WAN (NL – BE). Real-life test results, with typical IP traffic between NC3A-NL and NC3A-BE: impact of lossless data compression on TCP/IP traffic across a 2048 kbps terrestrial link. [Figure panels: “NO DATA COMPRESSION & NO REDUCTION” versus “WITH DATA COMPRESSION & REDUCTION”] Effective WAN capacity increased by 2.80; data reduction by … %.

35 Real-life test results, with a typical IP traffic between NC3A-NL and NC3A-BE
Impact of lossless data compression on TCP/IP traffic across a 2048 kbps terrestrial link.

36 Peribit Sequence Reducers

37 NC3A TEST RESULT SUMMARY Expand Model 4800 “WAN Link Accelerators”
512 kbps satellite link, multiplexed TCP/IP. [Figure curves: link with SCPS-TP acceleration; link with application accelerator & IP data compressor; un-accelerated link]

38 NC3A TEST RESULT SUMMARY
512 kbps satellite link, multiplexed TCP/IP. [Figure curves: link with SCPS-TP acceleration; link with application accelerator & IP data compressor; un-accelerated link]

39 512 Kbps satellite link 10 multiplexed TCP/IP sessions
[Figure curves: link with SCPS-TP acceleration; link with application accelerator & IP data compressor; un-accelerated link]

40 Packeteer

41 Industry New area but many & increasing number of companies
Peribit.com (now Juniper Networks), Expand.com (Expand Networks), Packeteer.com, Riverbed.com, Silver-peak.com, … National authorities (e.g. USA & GE) are also working with industry to incorporate SR/WX technology into national crypto devices.

42 SEQUENCE REDUCTION Next Generation Compression Summary (1)
WANs will form the backbone of Network Enabled Operation; this technology provides significant improvements in capacity. Dictionary based: implements a learning algorithm. Dynamically learns the “language” of the communications traffic and translates it into “short-hand”. Continuously updates/improves its “knowledge” of the link “language”: frequent patterns move up in the dictionary, infrequent patterns move down and eventually can age out. No fixed packet or window boundaries, unlike conventional compression which operates over 1-2 Kbytes. Once a pattern is learned and put in the dictionary it will be compressed wherever it appears. Data compression is based on previously seen data. Performance improves with time as “learning” increases: very quickly at first (10 – 20 minutes) and then slowly. When a new application comes in, SR adapts to its “language”.

43 SEQUENCE REDUCTION Next Generation Compression Summary (1)
Significant advantages for WANs where capacity is an issue (i.e. deployed/mobile/tactical). Removes redundant/repetitive transmissions. Packet-flow acceleration (latency removal) can be easily added. Quality of Service & Policy Based Multipath can also be implemented. Does not impact security implementations (cryptos between SRs). However, it is presently available from only a few sources, each with its “proprietary” technology. Useful for NNEC and GIG implementations in the coming years. The proprietary nature of the product is an issue that needs to be considered.

44 Conclusions: Shannon Information Theory provides tools for measuring “information” as “Entropy”. It has formed the basis for most of the coding and data transmission/detection results since the 1950s. The DNA/genome mapping process has also apparently benefited from it: in the 90s the estimate for the human genome was … years; it took … years with the computational developments of the late 90s. A new form of compression, “Sequence Reduction”, provides significant reductions by reducing redundancies in transmitted data. It will provide important advantages for mobile/deployable/moving WAN link applications.

45 This presentation & associated paper can be found at
Questions, comments. This presentation & associated paper can be found at

46 NC3A NC3A Brussels NC3A The Hague Visiting address:
Bâtiment Z Avenue du Bourget 140 B-1110 Brussels Telephone +32 (0) Fax +32 (0) Postal address: NATO C3 Agency Boulevard Leopold III B-1110 Brussels - Belgium NC3A The Hague Oude Waalsdorperweg AK The Hague Telephone +31 (0) Fax +31 (0) Postal address: NATO C3 Agency P.O. Box CD The Hague The Netherlands

47 Markov model examples

48 = log 27 = 4.75 bits / letter (or symbol)
Zeroth approximation to English (zero memory) [Zero order Markov: equally likely letters, 27 numbers] AZEWRTZYNSADXESYJRQY_WGECIJJ_OB _KRBQPOZB_YMBUAWVLBTQCNIKFMP_KMVUUGBSAXHLHSIE_MAULEXJ_NATSKI All logs base 2. Entropy = Σ pi log (1/pi) for i = 1 to 27 = log 27 = 4.75 bits / letter (or symbol)
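The zeroth approximation is easy to reproduce: with 27 equally likely symbols every term in the sum is (1/27) log2 27, so the entropy is log2 27. The Python sketch below computes that value and generates a sample string of the same kind; the underscore stands for the space, as on the slide, and the function names and seed are illustrative.

```python
import math
import random
import string

ALPHABET = string.ascii_uppercase + "_"   # 26 letters plus "_" for space


def zeroth_order_entropy(alphabet_size: int) -> float:
    """Entropy in bits/symbol when all symbols are equally likely."""
    return math.log2(alphabet_size)


def zeroth_order_sample(length: int, seed: int = 0) -> str:
    """Random text with every symbol drawn independently and uniformly."""
    rng = random.Random(seed)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


entropy = zeroth_order_entropy(len(ALPHABET))
print(f"{entropy:.2f} bits / letter")
print(zeroth_order_sample(60))
```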

49 Entropy = Σ pi log (1/pi) for i = 1 to 27
First approximation to English (zero memory) [Zero order Markov: letter probabilities, 27 numbers] AI_NGAE__ITF__NR_ASAEV_OIE_BAINTHHHYROO_POER_SETRYGAIETRWCO__ EHDUARU_ EU_C_FT_NSREM_DIY_EESE_ F_O_SRIS_R __UNNASHOR_CIE_AT_XEOIT_UTKLOOUL_E Entropy = Σ pi log (1/pi) for i = 1 to 27 = ~ 4 bits / letter
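The first approximation replaces equal probabilities with measured letter frequencies. The Python sketch below estimates Σ pi log2 (1/pi) from the relative frequencies in a sample string; the function name is illustrative, and a meaningful estimate for English would need a much larger corpus than the short strings used here.

```python
import math
from collections import Counter


def letter_entropy(text: str) -> float:
    """Zero-memory entropy in bits/symbol from observed frequencies."""
    counts = Counter(text)
    total = sum(counts.values())
    # p_i * log2(1 / p_i), with p_i = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


sample = "THE_QUICK_BROWN_FOX_JUMPS_OVER_THE_LAZY_DOG"
print(f"{letter_entropy(sample):.2f} bits / letter in the sample")
```

Any departure from equal frequencies lowers the result below log2 of the alphabet size, which is why the first approximation comes out below the zeroth one.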

50 Entropy = Σ p(i,k) log (1/p(i|k)) over the 729 (= 27 x 27) letter pairs
Second approximation to English (memory) [First order Markov: e.g. prob(a|a), prob(b|a), prob(c|a), …; 27 x 27 = 729 numbers, some zero] URTESHETHING_AD_E AT_FOULE_ ITHALIORT_WACT_D_STE_MINTSAN_OLINS__TWID_OULY_TE_THIGHE_CO_YS_TH_HR_ UPAVIDE_PAD_CTAVED_QUES_E Entropy = Σ p(i,k) log (1/p(i|k)) over the 729 (= 27 x 27) letter pairs = ~ … bits / letter
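For the first-order Markov model the entropy is a conditional entropy: each pair is weighted by its joint probability, while the logarithm uses the probability of the letter given its predecessor. The Python sketch below estimates both from adjacent-pair counts in a sample string; the function name is illustrative and the tiny inputs are only there to make the arithmetic checkable.

```python
import math
from collections import Counter


def conditional_entropy(text: str) -> float:
    """Estimate sum over pairs of p(prev, cur) * log2(1 / p(cur | prev))."""
    pair_counts = Counter(zip(text, text[1:]))   # adjacent letter pairs
    prev_counts = Counter(text[:-1])             # how often each context occurs
    total_pairs = sum(pair_counts.values())
    h = 0.0
    for (prev, _cur), n in pair_counts.items():
        p_joint = n / total_pairs
        p_cond = n / prev_counts[prev]
        h += p_joint * math.log2(1 / p_cond)
    return h


# A strictly alternating source is fully predictable from one letter of memory.
print(conditional_entropy("ABABABAB"))
# "AABA": after A the next letter is A or B with equal probability.
print(conditional_entropy("AABA"))
```

Memory can only lower the estimate relative to the zero-memory figure, which matches the downward trend across the successive approximations.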

51 Entropy: ~ 3 bits / letter
Third approximation to English (memory) [Second order Markov: e.g. prob(a|aa), prob(a|ab), prob(a|ac), …, prob(z|zy), prob(z|zz); 27 x 27 x 27 = 19683 numbers, ~ 75% zero] (Shannon calls these “di-gram” probabilities) IANKS _CAN_OU_ANG_RLER_THATTED _OF_TO_SHOR_OF_TO_HAVEMEM_A_I_MAND_AND_BUT_WHISSITABLY_THERVEREER_EIGHTS_TAKILLIS_TA_KIND_AL Entropy: ~ 3 bits / letter

52 N. Abramson “Information Theory & Coding”
Third approximation to French JOU_MOUPLAS_DE_MONNERNAISSAINS_DEME_US_VREH_BRETU_DE_TOUCHEUR_DIMMERE_LLES_MAR_ELAME_RE_A_VER_IL_DOUVENTS_SO_FUITE N. Abramson “Information Theory & Coding”

53 N. Abramson “Information Theory & Coding”
Third approximation to ???? ET_LIGERCUM_SITECI_LIBEMUS_ACERELEN_TE_VICAESCERUM_PE_NON_SUM_MINUS_UTERNE_UT_IN_ARION_POPOMIN_SE_INQUENEQUE_IRA N. Abramson “Information Theory & Coding”

54 WE COULD CONTINUE THIS WITH CONDITIONAL PROBABILITIES GIVEN TRIPLETS (tri-grams), QUADRUPLETS (tetra-grams), … n-grams, etc. (i.e. mth ORDER MARKOV SOURCES, m ≥ 3). HOWEVER, THIS BECOMES IMPRACTICAL AS THE NUMBER OF JOINT PROBABILITIES BECOMES TOO LARGE, SO SHANNON JUMPED TO MARKOV SOURCES WITH WORDS AS SYMBOLS: the symbol set is no longer 27 characters, but thousands of words. However, an m = 1, 2 Markov model gives much better results than n-gram analysis as “n” is increased.

55 Fourth approximation to English
[Zero order Markov with words: e.g. probability of words, zero memory] REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE … Entropy = ~ 2.2 bits / letter (using Zipf’s Law) (Shannon 1948)

56 Fifth approximation to English (memory)
[First order Markov with words: e.g. Probability (word i | word j)] THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN… (Shannon 1948)

57 Fifth approximation to Turkish (memory)
[First order Markov with words: e.g. Probability (word i | word j)] BIR ANLATTIKLARINA GÜLMECE YAZDI YAPITLARININ ŞARAP BİÇİMLERİ BELA GÖRÜNÜMÜ GİBİ AMA BİR ETMEK YOK TUTULDU GELEN GİDEN YER KALMADI ...

58 A QUICK REFRESHER ON CONVENTIONAL COMPRESSION

59 Conventional Compression
Lossy compression: the output is not necessarily a copy of the input; most audio, image, video compression algorithms are “lossy” (our ears and eyes have resolution thresholds). Loss-less compression: data integrity is essential in digital data communications, so network compression must be “loss-less”. Two basic approaches: statistical compression algorithms and substitutional compression algorithms.

60 Statistical compression: probabilities of characters in the input data are calculated (or given); frequently occurring characters are encoded into fewer bits [e.g. Huffman code, Morse code]. For example e, t, a are frequently occurring characters in English whereas x, z are very infrequent; however ASCII encodes all with 8 bits, while a statistical coding technique would, for example, encode “e” with 3 bits and “z” with 10 bits, etc. Static coding: once the coding is determined in accordance with the probabilities of occurrence it does not change. Dynamic coding: coding changes with “context”; for example, the occurrence of “q” in English increases the probability of occurrence of “u” to 1, similarly the occurrence of “th” significantly increases the probability of occurrence of “e”, etc. As the amount of “historical context” information increases, “dynamic coding” techniques can approach the “Shannon limit”; however, computational requirements increase exponentially, making them impractical for real-time/on-line applications.
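Static statistical coding can be illustrated with a small Huffman coder in Python. This is a minimal textbook-style sketch, not any product's implementation: it builds the code from character counts in a sample string, and the function name `huffman_code` and the sample text are illustrative.

```python
import heapq
from collections import Counter


def huffman_code(text: str) -> dict:
    """Build a static Huffman code (symbol -> bit string) from counts."""
    freq = Counter(text)
    if len(freq) == 1:
        # Degenerate case: a single symbol still needs one bit.
        return {sym: "0" for sym in freq}
    # Heap entries: (weight, tie-breaker, partial code table).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


sample = "eeeeeeeettttaaz"   # e x8, t x4, a x2, z x1
code = huffman_code(sample)
encoded_bits = sum(len(code[ch]) for ch in sample)
for sym in sorted(code, key=lambda s: len(code[s])):
    print(sym, code[sym])
print(encoded_bits, "bits versus", 8 * len(sample), "bits at 8 bits/char")
```

The frequent “e” receives the shortest code word and the rare “z” one of the longest, which is the effect described above; the code is static because it is fixed once the counts are known.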

61 Substitutional compression: identifies repeated strings of characters (the longer the better) and replaces them with reference identifiers or tokens (the shorter the better); at the receiver the tokens are de-referenced and the reverse substitution is performed. Essentially a form of “pattern recognition” and classification. Pattern detection/recognition is generally much faster than the computations needed for dynamic coding algorithms. Most network compression techniques in use today use substitutional compression. Compression techniques can also be combined, for example substitution-based compression followed by static coding, etc.

62 “Substitution” based compression is the basis of almost all network compression implementations
Principle of all: replace repeated patterns with shorter tokens; different techniques exist for detecting/encoding repeated patterns. Two basic approaches: (1) Lempel-Ziv (LZ) “stateless” window compression, e.g. v.42bis, fax compression, LZS (STAC); (2) Predictor compression, which tries to predict the next input byte: the matching algorithm looks for the most recent match of any pattern rather than the best and longest match; this gives higher speed but misses many significant pattern repetitions, and therefore lower data reduction (not much used).

63 Lempel-Ziv (LZ) “stateless” window compression
Published in 1977 (hence LZ77). Basis of ~all loss-less data compression implementations today. Repeated “strings” are replaced by “pointers” to the previous location where the string had occurred. A buffer or “window” is required for the “historical” information to be available for reference: typically … – … bytes (mostly … bytes). All previous data outside the buffer/window is lost or “forgotten”, hence the name “stateless” or memory-less. Can find and compress only patterns that are repeated within the window; repetitions separated by more than the window size are ignored. Poor scalability: for compression efficiency a large window size is required, but this increases pattern search computation significantly. Good for “file compression” type applications.
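The window limitation can be demonstrated with a toy LZ77-style coder in Python. This is a deliberately simple brute-force sketch with illustrative parameter names and a very small window, not an implementation of any deployed compressor: literals are emitted as byte values and repeats as (offset, length) pointers into the preceding window.

```python
def lz77_encode(data: bytes, window: int = 32, min_match: int = 3) -> list:
    """Emit literal byte values or (offset, length) back-references."""
    out = []
    i = 0
    while i < len(data):
        start = max(0, i - window)          # only the window is searched
        best_len, best_off = 0, 0
        for j in range(start, i):
            k = 0
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= min_match:
            out.append((best_off, best_len))
            i += best_len
        else:
            out.append(data[i])
            i += 1
    return out


def lz77_decode(tokens: list) -> bytes:
    buf = bytearray()
    for t in tokens:
        if isinstance(t, tuple):
            off, length = t
            for _ in range(length):         # byte-wise copy allows overlap
                buf.append(buf[-off])
        else:
            buf.append(t)
    return bytes(buf)


near = b"abcabcabcabc"
# Two copies of a pattern separated by 40 unrelated bytes.
far = b"HELLO_WORLD" + bytes(range(100, 140)) + b"HELLO_WORLD"

small = lz77_encode(far, window=32)   # second copy lies outside the window
large = lz77_encode(far, window=64)   # second copy is found and replaced
print(lz77_encode(near))
print(len(small), "tokens with window=32;", len(large), "with window=64")
```

With the 32-byte window the second copy of the pattern is sent again in full, while the 64-byte window replaces it with a single pointer, at the cost of a longer search on every position. This is the scalability trade-off noted above.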


65 Nov 1978, University of Pennsylvania, Museum Hall, Banquet in honor of Claude E. Shannon receiving H. Pender award (Prof. F. Haber & DY)

