Is MDL Really So Bad?
or
On the Convergence Speed of MDL Predictions for Bernoulli Sequences

Jan Poland and Marcus Hutter
IDSIA, Lugano, Switzerland

2 Big Picture

[Diagram: MDL; Bayes; other methods, e.g. PAC-Bayes]

3 Bernoulli Classes

• Set of parameters $\Theta = \{\vartheta_1, \vartheta_2, \ldots\} \subset [0,1]$
• Weights $w_\vartheta$ for each $\vartheta \in \Theta$
• Weights correspond to codes: $w_\vartheta = 2^{-\ell(\mathrm{Code}_\vartheta)}$

$\mathrm{Code} = \underbrace{111}_{1 + \#\text{bits}}\ \underbrace{0}_{\text{stop}}\ \underbrace{10}_{\text{data}}$

ϑ      w       Code
0      1/4     00
1      1/4     01
1/2    1/4     10
1/4    1/16    1100
3/4    1/16    1101
1/8    1/64    111000
3/8    1/64    111001
5/8    1/64    111010
7/8    1/64    111011
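The coding scheme on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function names `code_for` and `weight` are hypothetical, and extending the pattern to dyadic parameters beyond the nine listed is an assumption read off the table and the "1 + #bits / stop / data" annotation.

```python
from fractions import Fraction


def code_for(theta: Fraction) -> str:
    """Return the binary code word of a dyadic parameter theta in [0, 1]."""
    # The two endpoints get the short codes shown in the table.
    if theta == 0:
        return "00"
    if theta == 1:
        return "01"
    if not 0 < theta < 1:
        raise ValueError("theta must lie in [0, 1]")
    denom = theta.denominator
    if denom & (denom - 1):
        raise ValueError("theta must be dyadic (denominator a power of two)")
    depth = denom.bit_length() - 1  # theta = k / 2**depth with k odd
    # Data bits: binary expansion of theta without its final 1 (assumed rule).
    data = format(theta.numerator >> 1, "b").zfill(depth - 1) if depth > 1 else ""
    # Unary length prefix (1 + number of data bits), a stop bit, then the data.
    return "1" * depth + "0" + data


def weight(theta: Fraction) -> Fraction:
    """Weight w = 2^(-code length), as in the third bullet of the slide."""
    return Fraction(1, 2 ** len(code_for(theta)))


if __name__ == "__main__":
    thetas = [Fraction(0), Fraction(1), Fraction(1, 2), Fraction(1, 4),
              Fraction(3, 4), Fraction(1, 8), Fraction(3, 8), Fraction(5, 8),
              Fraction(7, 8)]
    for t in thetas:
        print(t, weight(t), code_for(t))
```

Under this rule the code is prefix-free, so the weights satisfy the Kraft inequality; the nine weights in the table sum to 15/16.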

4 Estimators

5 An Example Process

True parameter: $\vartheta_0 = \frac{5}{16} = 0.3125$

Table columns: Sequence x, Bayes mixture, ML estimate, MAP (MDL)

* 0.5 0 0.21 0 1 0.5 0 0.45 0.34 0.5 0000011 0.4 5/16 0.5 ...(32)... 0.27 0.25 ...(640)... 0.3 5/16
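To make the three estimators in this example concrete, here is a minimal Python sketch. It assumes the standard definitions: the Bayes mixture predicts the posterior-weighted mean of ϑ, the ML estimate is taken as the relative frequency of ones, and the MAP (MDL) estimate maximizes weight times likelihood. The class `PARAM_WEIGHTS` is only the nine-parameter table from the Bernoulli Classes slide, and all function names are hypothetical, so the outputs need not match every value of the example.

```python
from fractions import Fraction as F

# Assumed class: the nine weighted parameters listed on slide 3.
PARAM_WEIGHTS = {
    F(0): F(1, 4), F(1): F(1, 4), F(1, 2): F(1, 4),
    F(1, 4): F(1, 16), F(3, 4): F(1, 16),
    F(1, 8): F(1, 64), F(3, 8): F(1, 64), F(5, 8): F(1, 64), F(7, 8): F(1, 64),
}


def likelihood(theta, x):
    """Probability of the binary string x under Bernoulli(theta)."""
    ones, zeros = x.count("1"), x.count("0")
    return theta ** ones * (1 - theta) ** zeros


def scores(x):
    """Unnormalized posterior: weight times likelihood for each parameter."""
    return {t: w * likelihood(t, x) for t, w in PARAM_WEIGHTS.items()}


def bayes_mixture(x):
    """Mixture probability that the next symbol is 1."""
    s = scores(x)
    return sum(t * v for t, v in s.items()) / sum(s.values())


def ml_estimate(x):
    """Relative frequency of ones (assumed ML variant); None if x is empty."""
    return F(x.count("1"), len(x)) if x else None


def map_estimate(x):
    """Parameter maximizing weight times likelihood (ties: first in the dict)."""
    s = scores(x)
    return max(s, key=s.get)


if __name__ == "__main__":
    for x in ["0", "01", "0000011"]:
        print(x, float(bayes_mixture(x)), ml_estimate(x), map_estimate(x))
```

For the sequence "0", this sketch gives a Bayes mixture prediction of 33/160 (about 0.21) and a MAP estimate of 0. Exact rational arithmetic via `fractions` avoids rounding effects on short sequences.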

6 What We Know

7 Is MDL Really So Bad?

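8 MDL Is Really So Bad!

$\sum_t E(\vartheta^* - \vartheta_0)^2 = O(w_0^{-1})$ in the following example:

$N$ parameters, $w_\vartheta = \frac{1}{N}$ for all $\vartheta$, $\vartheta_0 = \frac{1}{2}$

[Figure: number line with marks at $\frac12$, $\ldots$, $\frac12 + \frac1{16}$, $\frac12 + \frac18$, $\frac12 + \frac14$; braces over the intervals carry the bounds below.]

$\sum_t E(\vartheta^* - \vartheta_0)^2 \, \mathbb{1}_{\vartheta^* \in [\frac12 + \frac18,\ \frac12 + \frac14]} = O(1)$

$\sum_t E(\vartheta^* - \vartheta_0)^2 \, \mathbb{1}_{\vartheta^* \in [\frac12 + \frac1{16},\ \frac12 + \frac18]} = O(1)$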

9 MDL Is Not That Bad!

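10 Prepare Sharper Upper Bound

[Figure: the unit interval with marks at $0, \frac18, \frac38, \frac12, \frac58, \frac34, \frac78, 1$ and the true parameter $\vartheta_0 = \frac14$. Braces mark $J_0 = [0, \frac12)$ and $I_0 = [\frac12, 1]$; three further braces are labeled $I_1$, $J_1$, $I_1$.]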

11 Sharper Upper Bound

12 The Universal Case

13 Conclusions

