# Kolmogorov complexity and its applications. Paul Vitanyi, CWI & University of Amsterdam. Microsoft Intractability Workshop, 5-7 July 2010


Kolmogorov complexity and its applications. Paul Vitanyi, CWI & University of Amsterdam (http://www.cwi.nl/~paulv/). Microsoft Intractability Workshop, 5-7 July 2010.

1. Intuition & history. What is the information content of an individual string? Examples: $11\ldots1$ ($n$ 1s); $\pi = 3.1415926\ldots$; $n = 2^{1024}$; Champernowne's number $0.1234567891011121314\ldots$, which is normal in scale 10 (every block of a given length occurs with the same frequency). All these numbers share one commonality: there are small programs to generate them. Shannon's information theory does not help here.
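A rough way to see the point about small programs is to generate two of these strings with a few lines of Python and compare them with random bytes under a general-purpose compressor. This is only a sketch: the compressed length is a computable upper-bound proxy for description length, not $C(x)$ itself, and the choice of `zlib`, the length `n = 10_000`, and the helper names are assumptions made here for illustration.

```python
import os
import zlib


def champernowne_digits(n: int) -> str:
    """First n decimal digits of Champernowne's number 0.123456789101112..."""
    pieces = []
    total = 0
    k = 1
    while total < n:
        s = str(k)
        pieces.append(s)
        total += len(s)
        k += 1
    return "".join(pieces)[:n]


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (an upper-bound proxy only)."""
    return len(zlib.compress(data, 9))


n = 10_000
ones = b"1" * n                          # the string 11...1 (n ones)
champ = champernowne_digits(n).encode()  # generated by the short program above
noise = os.urandom(n)                    # random bytes: no short description expected

for name, data in [("ones", ones), ("champernowne", champ), ("random", noise)]:
    print(f"{name:13s} length={len(data)} compressed={compressed_len(data)}")
```

A general-purpose compressor is not guaranteed to find the short generating program for Champernowne's digits, which is one reason compressed length can only bound description length from above.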

Andrey Nikolaevich Kolmogorov (born 1903 in Tambov, Russia; died 1987 in Moscow). Measure theory, probability, analysis, intuitionistic logic, cohomology, dynamical systems, hydrodynamics, Kolmogorov complexity.

Example: Randomness. Bob proposes to flip a coin with Alice: Alice wins a dollar if Heads; Bob wins a dollar if Tails. Result: TTTTTT…, 100 Tails in a row. Alice lost \$100. She feels cheated.

Alice goes to court. Alice complains: $T^{100}$ is not random. Bob asks Alice to produce a random coin flip sequence. Alice flips her coin and gets THTTHHTHTHHHTTTTH… But Bob claims that Alice's sequence has probability $2^{-100}$, and so does his. How do we define randomness?

2. Roots of Kolmogorov complexity and preliminaries. (1) Foundations of probability. P. Laplace: … a sequence is extraordinary (nonrandom) because it contains regularity (which is rare). 1919: von Mises' notion of a random sequence $S$: $\lim_{n\to\infty} \#\{1\text{s in the } n\text{-prefix of } S\}/n = p$, $0 < p < 1$ …

Roots … (2) Information theory. Shannon's theory is about an ensemble. But what is information in an individual object? (3) Inductive inference: Bayesian approach using a universal prior distribution. (4) Shannon's state × symbol (Turing machine) complexity.

Preliminaries and notation. Strings: $x, y, z$, usually binary. $x = x_1 x_2 \ldots$ is an infinite binary sequence, and $x_{i:j} = x_i x_{i+1} \ldots x_j$. $|x|$ is the number of bits in $x$. Sets: $A, B, C, \ldots$; $|A|$ is the number of elements in set $A$. I assume you know Turing machines, universal TMs, basic facts...

3. Mathematical theory. Solomonoff (1960), Kolmogorov (1965), Chaitin (1969): The amount of information in a string is the size of the smallest program of an optimal universal TM $U$ generating that string: $C_U(x) = \min_p \{|p| : U(p) = x\}$. Invariance Theorem: Up to an additive constant, it does not matter which optimal universal Turing machine $U$ we choose. I.e. all universal encoding methods are ok.

Proof of the Invariance Theorem. Fix an effective enumeration of all Turing machines (TMs): $T_1, T_2, \ldots$ Define $C_T(x) = \min_p \{|p| : T(p) = x\}$. $U$ is an optimal universal TM such that $U(1^n 0 p) = T_n(p)$ ($p$ produces $x$). Then for all $x$: $C_U(x) \le C_{T_n}(x) + n + 1$, and for two such optimal machines $U$ and $U'$, $|C_U(x) - C_{U'}(x)| \le c$. Fixing $U$, we write $C(x)$ instead of $C_U(x)$. QED. Formal statement of the Invariance Theorem: There exists a computable function $S_0$ such that for all computable functions $S$, there is a constant $c_S$ such that for all strings $x \in \{0,1\}^*$: $C_{S_0}(x) \le C_S(x) + c_S$.
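The encoding $U(1^n 0 p) = T_n(p)$ can be tried out on a toy scale. The sketch below is not a real universal Turing machine: the two "machines" `t_identity` and `t_doubler`, the list `MACHINES`, and the brute-force `complexity` search are all hypothetical stand-ins chosen so that the bound $C_U(x) \le C_{T_n}(x) + n + 1$ can be checked by exhaustive search.

```python
from itertools import product
from typing import Callable, List, Optional


def t_identity(p: str) -> str:
    """Toy machine T_1: outputs its program verbatim."""
    return p


def t_doubler(p: str) -> str:
    """Toy machine T_2: outputs its program twice."""
    return p + p


MACHINES: List[Callable[[str], str]] = [t_identity, t_doubler]  # T_1, T_2


def universal(q: str) -> Optional[str]:
    """Toy U with U(1^n 0 p) = T_n(p); returns None if q is not of that form."""
    n = 0
    while n < len(q) and q[n] == "1":
        n += 1
    # Need at least one leading 1, a known machine index, and the separating 0.
    if n == 0 or n > len(MACHINES) or n >= len(q):
        return None
    return MACHINES[n - 1](q[n + 1:])


def complexity(machine: Callable[[str], Optional[str]], x: str,
               max_len: int = 12) -> Optional[int]:
    """Length of the shortest program (up to max_len bits) on which machine outputs x."""
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            if machine("".join(bits)) == x:
                return length
    return None


x = "0101"
c_u = complexity(universal, x)
for n, machine in enumerate(MACHINES, start=1):
    c_t = complexity(machine, x)
    print(f"C_T{n}(x) = {c_t}, bound C_T{n}(x) + n + 1 = {c_t + n + 1}, C_U(x) = {c_u}")
```

The prefix $1^n 0$ costs $n + 1$ bits and is self-delimiting, which is exactly where the additive constant in the theorem comes from.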

It has many applications. Mathematics: probability theory, logic, statistics. Physics: chaos, thermodynamics. Computer science: average case analysis, inductive inference and learning, shared information between documents, data mining and clustering, and the incompressibility method, with examples including the prime number theorem, Gödel's incompleteness, Shellsort average case, Heapsort average case, circuit complexity, and lower bounds on combinatorics, graphs, Turing machine computations, formal languages, communication complexity, and routing. Philosophy, biology, cognition, etc.: randomness, inference, learning, complex systems, sequence similarity. Information theory: information in individual objects, information distance. Classifying objects: documents, genomes. Query answering systems.

Mathematical theory, cont. Intuitively: $C(x)$ = length of the shortest effective description of $x$. Define conditional Kolmogorov complexity similarly, with $C(x|y)$ = length of the shortest description of $x$ given $y$. Examples: $C(xx) = C(x) + O(1)$; $C(xy) \le C(x) + C(y) + O(\log(\min\{C(x), C(y)\}))$; $C(1^n) \le O(\log n)$; $C(\pi_{1:n}) \le O(\log n)$; $C(\pi_{1:n}|n) \le O(1)$. For all $x$: $C(x) \le |x| + O(1)$; $C(x|x) = O(1)$; $C(x|\varepsilon) = C(x)$; $C(\varepsilon|x) = O(1)$.

3.1 Basics. Incompressibility: For constant $c > 0$, a string $x \in \{0,1\}^*$ is $c$-incompressible if $C(x) \ge |x| - c$. For constant $c$, we often simply say that $x$ is incompressible. (We will call incompressible strings random strings.) Lemma. There are at least $2^n - 2^{n-c} + 1$ $c$-incompressible strings of length $n$. Proof. There are only $\sum_{k=0}^{n-c-1} 2^k = 2^{n-c} - 1$ programs with length less than $n - c$. Hence only that many strings (out of the total of $2^n$ strings of length $n$) can have shorter programs (descriptions) than $n - c$. QED.
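The counting in the lemma is easy to check numerically. The following is a small sketch; the function names and the sample values of $n$ and $c$ are chosen here for illustration only.

```python
def programs_shorter_than(length: int) -> int:
    """Number of binary programs of length 0, 1, ..., length - 1."""
    return sum(2 ** k for k in range(length))  # equals 2**length - 1


def min_incompressible(n: int, c: int) -> int:
    """Lower bound on the number of c-incompressible strings of length n (needs n >= c)."""
    return 2 ** n - programs_shorter_than(n - c)


n = 20
for c in (1, 2, 8):
    count = min_incompressible(n, c)
    print(f"c={c}: at least {count} of {2 ** n} strings, fraction {count / 2 ** n:.6f}")
```

The fraction printed is always above $1 - 2^{-c}$, so all but a small share of strings of any length are incompressible up to a few bits.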

Facts. If $x = uvw$ is incompressible, then $C(v) \ge |v| - O(\log |x|)$. Proof. $C(uvw) = |uvw| \le |uw| + C(v) + O(\log |u|) + O(\log C(v))$. If $p$ is the shortest program for $x$, then $C(p) \ge |p| - O(1)$ and $C(x|p) = O(1)$, but $C(p|x) \le C(|p|) + O(1)$ (optimal because of the Halting Problem!). If a subset $A$ of $\{0,1\}^*$ is recursively enumerable (r.e.) (the elements of $A$ can be listed by a Turing machine), and $A$ is sparse ($|A^{=n}| \le p(n)$ for some polynomial $p$), then for all $x$ in $A$ with $|x| = n$: $C(x) \le O(\log p(n)) + O(C(n)) + O(|A|) = O(\log n)$.

3.2 Asymptotics. Enumeration of binary strings: 0, 1, 00, 01, 10, …, mapping to natural numbers 0, 1, 2, 3, … $C(x) \to \infty$ as $x \to \infty$. Define $m(x)$ to be the monotonic lower bound of the $C(x)$ curve (as natural number $x \to \infty$). Then $m(x) \to \infty$ as $x \to \infty$, and $m(x) < Q(x)$ for all unbounded computable $Q$. Nonmonotonicity: $x = yz$ does not imply $C(y) \le C(x) + O(1)$.

Graph of C(x) for integer x. Function m(x) is greatest monotonic non-decreasing lower bound.

Graph of C(x|l(x)). Function m(x) is greatest monotonic non-decreasing lower bound.

The Incompressibility Method: Shellsort. Using $p$ increments $h_1, \ldots, h_p$, with $h_p = 1$. At the $k$-th pass, the array is divided into $h_k$ separate sublists of length $n/h_k$ (taking every $h_k$-th element). Each sublist is sorted by insertion/bubble sort. Application: sorting networks, with $n \log^2 n$ comparators; easy to program, competitive for medium-size lists to be sorted.
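The passes described above can be sketched in a few lines of Python. This is a minimal illustration that uses insertion sort within each sublist and counts element moves; the function name `shellsort`, the move counter, and the sample increment sequences are assumptions of this sketch, not taken from the talk.

```python
import random
from typing import List, Sequence, Tuple


def shellsort(a: Sequence[int], increments: Sequence[int]) -> Tuple[List[int], int]:
    """Sort a copy of `a` using one pass per increment; return (sorted list, moves).

    `moves` counts how many positions (in steps of the current increment)
    elements are shifted, summed over all passes.
    """
    if not increments or increments[-1] != 1:
        raise ValueError("the last increment must be 1")
    data = list(a)
    moves = 0
    for h in increments:
        # Insertion sort on each of the h interleaved sublists.
        for i in range(h, len(data)):
            x = data[i]
            j = i
            while j >= h and data[j - h] > x:
                data[j] = data[j - h]
                j -= h
                moves += 1
            data[j] = x
    return data, moves


rng = random.Random(0)
perm = list(range(2000))
rng.shuffle(perm)

for incs in ([1], [40, 1], [100, 10, 1]):
    result, moves = shellsort(perm, incs)
    print(f"increments={incs}: sorted={result == sorted(perm)}, moves={moves}")
```

With the single increment 1 this is plain insertion sort, and the move count equals the number of inversions in the input.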

Shellsort history. Invented by D.L. Shell [1959], using $p_k = n/2^k$ for step $k$. It is a $\Theta(n^2)$ time algorithm. Papernov and Stasevich [1965]: $O(n^{3/2})$ time by destroying regularity in Shell's geometric sequence. Pratt [1972]: all quasi-geometric sequences use $O(n^{3/2})$ time; $\Theta(n \log^2 n)$ time for $p = (\log n)^2$ with increments $2^i 3^j$. Incerpi-Sedgewick, Chazelle, Plaxton, Poonen, Suel (1980s): best worst case, roughly $\Theta(n \log^2 n / (\log \log n)^2)$. Average case: Knuth [1970s]: $\Theta(n^{5/3})$ for $p = 2$. Yao [1980]: $p = 3$ characterization, no running time. Janson-Knuth [1997]: $O(n^{23/15})$ for $p = 3$. Jiang-Li-Vitanyi [J.ACM, 2000]: $\Omega(p n^{1+1/p})$ for every $p$.

Shellsort average case lower bound. Theorem. $p$-pass Shellsort has average case $T(n) \ge p n^{1+1/p}$. Proof. Fix a random permutation $\Pi$ with Kolmogorov complexity $n \log n$, i.e. $C(\Pi) \ge n \log n$. Use $\Pi$ as input. (We ignore the self-delimiting coding of the subparts below. The real proof uses better coding.) For pass $i$, let $m_{i,k}$ be the number of steps the $k$th element moves. Then $T(n) = \sum_{i,k} m_{i,k}$. From these $m_{i,k}$'s, one can reconstruct the input $\Pi$, hence $\sum_{i,k} \log m_{i,k} \ge C(\Pi) \ge n \log n$. Maximizing the left-hand side, all $m_{i,k}$ must be the same (maintaining the same sum). Call it $m$. So $\sum m = pnm = \sum_{i,k} m_{i,k}$. Then $\sum \log m = pn \log m \ge \sum_{i,k} \log m_{i,k} \ge n \log n$, which gives $m^p \ge n$. So $T(n) = pnm \ge p n^{1+1/p}$. Corollary: $p = 1$: Bubblesort $\Omega(n^2)$ average case lower bound. $p = 2$: $n^{3/2}$ lower bound. $p = 3$: $n^{4/3}$ lower bound ($4/3 = 20/15$); and only $p = \Theta(\log n)$ can give average time $O(n \log n)$.
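Two steps of this argument can be checked numerically: the claim that, for a fixed sum, the sum of logarithms is largest when all terms are equal (concavity of the logarithm), and the exponents in the corollary. The sketch below uses made-up move counts and a helper `lower_bound` that simply evaluates $p n^{1+1/p}$ with constants ignored; both are assumptions for illustration.

```python
import math
from fractions import Fraction
from typing import Sequence


def sum_log2(values: Sequence[float]) -> float:
    """Sum of base-2 logarithms of the values."""
    return sum(math.log2(v) for v in values)


def lower_bound(n: int, p: int) -> float:
    """The quantity p * n^(1 + 1/p) from the theorem (constant factors ignored)."""
    return p * n ** (1 + 1 / p)


# Hypothetical move counts m_{i,k}; only their sum matters for T(n).
moves = [1, 2, 4, 9, 16, 28]
common = sum(moves) / len(moves)          # the common value m with the same sum
unequal = sum_log2(moves)
equal = len(moves) * math.log2(common)
print(f"sum of logs: unequal={unequal:.2f}, all equal={equal:.2f}")

# Exponents 1 + 1/p from the corollary, as exact fractions.
for p in (1, 2, 3):
    print(f"p={p}: exponent {Fraction(1) + Fraction(1, p)}")

# With p = log2(n) passes the bound is within a constant factor of n log n.
n = 1024
print(lower_bound(n, 10), 2 * n * math.log2(n))
```

For $p = 3$ the exponent $4/3 = 20/15$ sits below the Janson-Knuth upper bound exponent $23/15$ quoted in the history slide, so the two results are consistent.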

Similarity of strings: the problem. Given: literal objects (binary files). Determine: similarity distance matrix (distances between every pair). Applications: clustering, classification, evolutionary trees of Internet documents, computer programs, chain letters, genomes, languages, texts, music pieces, OCR, …

Normalized Information Distance. Definition. We define the normalized information distance: $d(x,y) = \max\{C(x|y), C(y|x)\} / \max\{C(x), C(y)\}$. The new measure has the following properties: it satisfies the triangle inequality; it is symmetric; $d(x,y) > 0$ for $x \ne y$ and $d(x,x) = 0$. Hence it is a metric! But it is not computable.

Practical concerns. $d(x,y)$ is not computable, hence we replace $C(x)$ by $Z(x)$, the length of the compressed version of $x$ using compressor $Z$ (gzip, bzip2, PPMZ). The equivalent formula becomes: $d(x,y) = \dfrac{Z(xy) - \min\{Z(x), Z(y)\}}{\max\{Z(x), Z(y)\}}$. This is a parameter-free, feature-free, alignment-free similarity method, usable for data mining, phylogeny when features are unknown, and so on. Note: $\max\{C(x|y), C(y|x)\} = \max\{C(xy) - C(y), C(xy) - C(x)\} = C(xy) - \min\{C(x), C(y)\}$.
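The compression-based formula can be tried directly with compressors from the Python standard library. This is a minimal sketch, assuming `zlib` and `bz2` as stand-ins for the compressor $Z$ (PPMZ is not in the standard library); the function names and the sample inputs are chosen here for illustration.

```python
import bz2
import random
import zlib
from typing import Callable


def zlib_len(data: bytes) -> int:
    """Z(x) with zlib (the DEFLATE method also used by gzip)."""
    return len(zlib.compress(data, 9))


def bz2_len(data: bytes) -> int:
    """Z(x) with bzip2."""
    return len(bz2.compress(data, 9))


def ncd(x: bytes, y: bytes, z: Callable[[bytes], int] = zlib_len) -> float:
    """d(x, y) = (Z(xy) - min{Z(x), Z(y)}) / max{Z(x), Z(y)}."""
    zx, zy, zxy = z(x), z(y), z(x + y)
    return (zxy - min(zx, zy)) / max(zx, zy)


rng = random.Random(0)
a = b"the quick brown fox jumps over the lazy dog " * 50
b = a.replace(b"lazy", b"idle")                          # a close variant of a
c = bytes(rng.randrange(256) for _ in range(len(a)))     # unrelated pseudo-random bytes

for name, z in [("zlib", zlib_len), ("bz2", bz2_len)]:
    print(name, f"d(a,a)={ncd(a, a, z):.3f}",
          f"d(a,b)={ncd(a, b, z):.3f}", f"d(a,c)={ncd(a, c, z):.3f}")
```

With a real compressor the value for identical inputs is small but usually not exactly 0, and values slightly above 1 can occur, so the metric properties of the ideal distance hold only approximately.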

Example: Eutherian orders. It has been a disputed issue which two of the three groups of placental mammals are closer: primates, ferungulates, rodents. In mtDNA, 6 proteins say primates are closer to ferungulates; 6 proteins say primates are closer to rodents. Hasegawa's group concatenated 12 mtDNA proteins from: rat, house mouse, grey seal, harbor seal, cat, white rhino, horse, finback whale, blue whale, cow, gibbon, gorilla, human, chimpanzee, pygmy chimpanzee, orangutan, Sumatran orangutan, with opossum, wallaroo, platypus as outgroup, 1998, using the maximum likelihood method in MOLPHY.

Evolutionary Tree of Mammals:

3.3 Properties. Theorem (Kolmogorov). (i) $C(x)$ is not partially recursive. That is, there is no Turing machine $M$ such that $M$ accepts $(x,k)$ if $C(x) \ge k$ and is undefined otherwise. (ii) However, there is $H(t,x)$ such that $H(t+1,x) \le H(t,x)$ and $\lim_{t\to\infty} H(t,x) = C(x)$, where $H(t,x)$ is total recursive. Proof. (i) If such $M$ exists, then design $M'$ as follows. $M'$ simulates $M$ on input $(x,n)$, for all $|x| = n$ in parallel (one step each), and outputs the first $x$ such that $M$ says "yes". Choose $n \gg |M'|$. Thus we have a contradiction: $C(x) \ge n$ by $M$, but $M'$ outputs $x$, hence $|x| = n \gg |M'| \ge C(x) \ge n$. (ii) A TM with program for $x$ running for $t$ steps defines $H(t,x)$. QED
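Part (ii) says that $C(x)$ can be approximated from above by running programs with a step budget. The sketch below shows the idea on a deliberately tiny, made-up machine; `run`, its two instruction forms, and their step costs are assumptions of this illustration and have nothing to do with a real universal TM. Only the shape of $H(t,x)$, total and non-increasing in $t$, is the point.

```python
from itertools import product
from typing import Optional


def run(program: str, max_steps: int) -> Optional[str]:
    """Toy machine (hypothetical, for illustration only).

    '0' + w      : outputs w and halts after 1 step.
    '1' + bin(k) : outputs '0' * k and halts after k steps.
    Returns the output if the program halts within max_steps, else None.
    """
    if not program:
        return None
    head, rest = program[0], program[1:]
    if head == "0":
        return rest if max_steps >= 1 else None
    if not rest:
        return None
    k = int(rest, 2)
    return "0" * k if max_steps >= k else None


def H(t: int, x: str) -> int:
    """Shortest known description length of x after t steps per program.

    Starts from the trivial bound |x| + 1 (the literal program '0' + x) and
    improves it whenever a shorter program outputs x within t steps.
    """
    best = len(x) + 1
    for length in range(1, best):
        for bits in product("01", repeat=length):
            if run("".join(bits), t) == x:
                return length
    return best


x = "0" * 8
print([H(t, x) for t in (0, 1, 7, 8, 100)])
```

No finite $t$ tells us that the current value is final, which matches part (i): the limit exists but cannot be computed.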

3.4 Gödel's Theorem. Theorem. The statement "$x$ is random (= incompressible)" is undecidable for all but finitely many $x$. […] $C(x) > C + O(\log |x|)$, and output (first) random such $x$. Then (2) $C(x) > C + O(\log |x|)$. QED

3.5 Barzdin's Lemma. A characteristic sequence of set $A$ is an infinite binary sequence $\chi = \chi_1 \chi_2 \ldots$, with $\chi_i = 1$ iff $i \in A$. Theorem. (i) The characteristic sequence $\chi$ of an r.e. set $A$ satisfies $C(\chi_{1:n}|n) \le \log n + c_A$ for all $n$. (ii) There is an r.e. set such that $C(\chi_{1:n}) \ge \log n$ for all $n$. Proof. (i) Use the number $m$ of 1s in the prefix $\chi_{1:n}$ as termination condition [$C(m) \le \log n + O(1)$]. (ii) By diagonalization. Let $U$ be the universal TM. Define $\chi = \chi_1 \chi_2 \ldots$ by $\chi_i = 1$ if the $i$-th bit output by $U(i) < \infty$ equals 0, otherwise $\chi_i = 0$. $\chi$ defines an r.e. set. Suppose, for some $n$, we have $C(\chi_{1:n})$ …
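The termination condition in part (i) can be made concrete: given $n$ and the number $m$ of elements of $A$ that are at most $n$, one runs the enumeration of $A$ until $m$ such elements have appeared, and at that point the prefix $\chi_{1:n}$ is known. In the sketch below the r.e. set is a made-up example (the perfect squares, listed out of order), and the names `enumerate_A` and `prefix_from_count` are hypothetical.

```python
from itertools import count
from typing import Callable, Iterator


def enumerate_A() -> Iterator[int]:
    """Hypothetical enumeration of an r.e. set: the squares, not in increasing order."""
    for k in count(1, 2):
        yield (k + 1) ** 2   # 4, 16, 36, ...
        yield k ** 2         # 1, 9, 25, ...


def prefix_from_count(enumerator: Callable[[], Iterator[int]], n: int, m: int) -> str:
    """Recover chi_{1:n} from n and m = number of elements of A in {1, ..., n}.

    The count m is the termination condition: once m elements <= n have been
    listed, no further element <= n can appear, so enumeration may stop.
    """
    seen = set()
    gen = enumerator()
    while len(seen) < m:
        i = next(gen)
        if 1 <= i <= n:
            seen.add(i)
    return "".join("1" if i in seen else "0" for i in range(1, n + 1))


print(prefix_from_count(enumerate_A, 10, 3))
```

Since $m \le n$, the number $m$ can be written in about $\log n$ bits, and the enumerator itself contributes the constant $c_A$.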

Selected Bibliography

- T. Jiang, J. Seiferas, and P.M.B. Vitanyi, Two heads are better than two tapes, J. Assoc. Comput. Mach., 44:2 (1997), 237-256.
- E. Keogh, S. Lonardi, and C.A. Ratanamahatana, Toward parameter-free data mining, In: Proc. 10th ACM SIGKDD Intn'l Conf. Knowledge Discovery and Data Mining, Seattle, Washington, USA, August 22-25, 2004, 206-215.
- M. Li, J.H. Badger, X. Chen, S. Kwong, P. Kearney, and H. Zhang, An information-based sequence distance and its application to whole mitochondrial genome phylogeny, Bioinformatics, 17:2 (2001), 149-154.
- M. Li, X. Chen, X. Li, B. Ma, P.M.B. Vitanyi, The similarity metric, IEEE Trans. Inform. Th., 50:12 (2004), 3250-3264.
- M. Li and P.M.B. Vitanyi, Reversibility and adiabatic computation: trading time and space for energy, Proc. Royal Society of London, Series A, 452 (1996), 769-789.
- M. Li and P.M.B. Vitanyi, Algorithmic Complexity, pp. 376-382 in: International Encyclopedia of the Social & Behavioral Sciences, N.J. Smelser and P.B. Baltes, Eds., Pergamon, Oxford, 2001/2002.
- M. Li and P.M.B. Vitanyi, An Introduction to Kolmogorov Complexity and its Applications, Springer-Verlag, New York, 2nd Edition, 1997.
- M. Li and P.M.B. Vitanyi, An Introduction to Kolmogorov Complexity and its Applications, Springer-Verlag, New York, 3rd Edition, 2008.
- R. Cilibrasi, P.M.B. Vitanyi, R. de Wolf, Algorithmic clustering of music based on string compression, Computer Music Journal, 28:4 (2004), 49-67.
- R. Cilibrasi, P.M.B. Vitanyi, Clustering by compression, IEEE Trans. Inform. Th., 51:4 (2005), 1523-1545.
- R. Cilibrasi, P.M.B. Vitanyi, The Google similarity distance, IEEE Trans. Knowledge and Data Engineering, 19:3 (2007), 370-383. Preprint: http://xxx.lanl.gov/abs/cs.CL/0412098 (2004).
- A. Londei, V. Loreto, M.O. Belardinelli, Music style and authorship categorization by informative compressors, Proc. 5th Triannual Conference of the European Society for the Cognitive Sciences of Music (ESCOM), September 8-13, 2003, Hannover, Germany, pp. 200-203.
- S. Wehner, Analyzing network traffic and worms using compression, Manuscript, CWI, 2004. Partially available at http://homepages.cwi.nl/~wehner/worms/

The End Thank You

