Average Value of Sum of Exponents of Runs in Strings Kazuhiko Kusano, Wataru Matsubara, Akira Ishino, Ayumi Shinohara Graduate School of Information Sciences.


1 Average Value of Sum of Exponents of Runs in Strings. Kazuhiko Kusano, Wataru Matsubara, Akira Ishino, Ayumi Shinohara. Graduate School of Information Sciences, Tohoku University, Japan

2 Background

3 Run (Maximal Repetition). A run is a substring w that has a period p with |w| ≧ 2p, that cannot be extended to the left or to the right without breaking that period, and that is counted once, with its minimal period.
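The definition above can be tried out with a small brute-force sketch. It is only an illustration, not the authors' method: the function names `min_period`, `runs` and `sum_of_exponents` are hypothetical, and the exponent of a run is taken to be its length divided by its minimal period.

```python
from fractions import Fraction

def min_period(s):
    """Smallest q >= 1 with s[k] == s[k+q] for every valid k."""
    for q in range(1, len(s) + 1):
        if all(s[k] == s[k + q] for k in range(len(s) - q)):
            return q
    return 0  # only reached for the empty string

def runs(w):
    """All runs of w as (start, end, period), with end inclusive."""
    n, found = len(w), []
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if w[i] != w[i + p]:
                i += 1
                continue
            # Extend the maximal block of positions k with w[k] == w[k+p].
            j = i
            while j + p < n and w[j] == w[j + p]:
                j += 1
            # w[i .. j-1+p] has period p and cannot be extended with period p.
            # Keep it only if it is at least 2p long and p is its minimal period.
            if j - i >= p and min_period(w[i:j + p]) == p:
                found.append((i, j - 1 + p, p))
            i = j
    return sorted(found)

def sum_of_exponents(w):
    """Sum of length / period over all runs of w, as an exact fraction."""
    return sum(Fraction(end - start + 1, p) for start, end, p in runs(w))
```

For example, `runs("aabaab")` reports the two occurrences of `aa` (period 1) and the whole string (period 3), so the number of runs is 3 and the sum of exponents is 6.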

4 The Number & The Sum of Exponents. The number of runs and the sum of exponents (repetition counts) of runs are interesting quantities. Example (figure, exponents of individual runs: 2.5 2 3 2 2.2): Number: 6, Sum of exponents: 14.2

5 Maximum. The maximum number of runs and the maximum value of the sum of exponents of runs are still unknown.
Number of runs:
  ≦ cn (Kolpakov and Kucherov, 1999)
  ≦ 5n (Rytter, 2006)
  ≦ 3.48n (Puglisi et al., 2007)
  ≦ 3.44n (Rytter, 2007)
  ≦ 1.048n (Crochemore and Ilie, 2008)
  = n ? (Conjecture)
  ≧ 0.945n (Matsubara et al., 2008)
  ≧ 0.927n (Franek et al., 2003)
Sum of exponents:
  ≦ cn (Kolpakov and Kucherov, 1999)
  ≦ 25n (Rytter, 2006)
  ≦ 2.9n (Crochemore and Ilie, 2007)
  = 2n ? (Conjecture)
  ≧ 1.889n (Matsubara et al., 2008)
  ≧ 1.854n (Franek et al., 2003)

6 Average. The average number of runs has already been presented (Puglisi & Simpson, Australasian Journal of Combinatorics, to appear, 2008). We show the average value of the sum of exponents of runs (our result). Notation: σ: alphabet size; μ(d): Möbius function.

8 The average value of the sum of exponents of runs in strings of length n is represented as follows (formulas on the slide: number of runs [Puglisi & Simpson, 2008]; sum of exponents). σ: alphabet size; L(p): number of Lyndon words of length p.

9 Detail

10 Runs in all strings of length n. Complicated!

11 d(w,p). A string d(w,p) of length |w|−p is defined as follows (figure: w, w>>2 and d(w,2)). w[i..j+p] is a run with period p if and only if d(w,p)[i..j] is a 0-segment (maximal block of 0's) of length l ≧ p.
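The defining formula of d(w,p) is not preserved in the transcript. The sketch below assumes d(w,p)[i] = (w[i+p] − w[i]) mod σ, with symbols encoded as integers 0..σ−1. That assumption agrees with the later remark that d(w,p) and w[0..p−1] determine w[p..n−1], but it is a reconstruction, and the function names are hypothetical.

```python
def d(w, p, sigma):
    """Assumed definition: symbol-wise difference of w and w shifted by p."""
    return [(w[i + p] - w[i]) % sigma for i in range(len(w) - p)]

def zero_segments(s):
    """Maximal blocks of 0 in s, as (start, end) pairs with end inclusive."""
    segments, i = [], 0
    while i < len(s):
        if s[i] != 0:
            i += 1
            continue
        j = i
        while j + 1 < len(s) and s[j + 1] == 0:
            j += 1
        segments.append((i, j))
        i = j + 1
    return segments

def repetitions_with_period(w, p, sigma):
    """Substrings w[i..j+p] coming from 0-segments d(w,p)[i..j] of length >= p."""
    return [(i, j + p)
            for i, j in zero_segments(d(w, p, sigma))
            if j - i + 1 >= p]
```

With w = 0 1 0 1 0 2 over σ = 3, `d(w, 2, 3)` is `[0, 0, 0, 1]`; its 0-segment of length 3 maps back to w[0..4] = 01010, a repetition with period 2 and exponent 5/2.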

12 Runs are classified according to their period (figure: w, d(w,1), d(w,2), d(w,3)).

13 0-segments are classified according to their length (figure: d(w,2), d(w,3); l=2, l=3, l=4).

14 c(n,p): the number of 0-segments of length p in Σ^n. Example: σ = 2, n = 5, p = 2

17 c(n,p): the number of 0-segments of length p in Σ^n. Instead of 0-segments, the pairs of strings that are separated by a 0-segment of length p are counted: (σ−1)^2 choices for the two symbols bordering the 0-segment, which must be non-zero; σ^(n−p−2) choices for the remaining symbols; (n−p−1) choices for the position of an interior 0-segment.
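The closed form of c(n,p) is not preserved in the transcript. The sketch below uses one that follows from the counting argument and checks it against exhaustive enumeration. Of the n−p+1 possible start positions of a 0-segment of length p, n−p−1 are interior (a non-zero symbol on each side) and two touch an end of the string (one non-zero neighbour). The closed form and the names `c_closed` and `c_bruteforce` are assumptions made for illustration.

```python
from itertools import product

def c_closed(n, p, sigma):
    """Number of 0-segments of length exactly p over all strings in Sigma^n."""
    if n < p:
        return 0
    if n == p:
        return 1  # only the all-zero string
    # Segment touching the left or right end: one non-zero neighbour.
    at_an_end = 2 * (sigma - 1) * sigma ** (n - p - 1)
    # Interior segment: two non-zero neighbours, n-p-1 possible positions.
    interior = 0
    if n - p >= 2:
        interior = (n - p - 1) * (sigma - 1) ** 2 * sigma ** (n - p - 2)
    return at_an_end + interior

def c_bruteforce(n, p, sigma):
    """Same quantity, counted by enumerating every string of length n."""
    count = 0
    for s in product(range(sigma), repeat=n):
        i = 0
        while i < n:
            if s[i] != 0:
                i += 1
                continue
            j = i
            while j < n and s[j] == 0:
                j += 1
            if j - i == p:
                count += 1
            i = j
    return count
```

For the slide's example σ = 2, n = 5, p = 2, both functions give 12: eight 0-segments touching an end and four interior ones.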

18 C(n,p). 0-segments of length l in d(w,p) correspond to runs of period p in w. The length of the run is l+p and its exponent is (l+p)/p. We denote by C(n,p) the sum of (l+p)/p over all 0-segments of length p or longer, as follows (figure: w and d(w,p) with p=2, l=3).
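The formula for C(n,p) is not preserved in the transcript. Assuming d(w,p) ranges over all strings of length n−p, it can be read as the sum over l = p..n−p of (l+p)/p · c(n−p, l). The sketch below evaluates that quantity by plain enumeration; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def C_bruteforce(n, p, sigma):
    """Sum of (l+p)/p over every 0-segment of length l >= p
    in every string of length n-p (assumes 1 <= p <= n)."""
    total = Fraction(0)
    for s in product(range(sigma), repeat=n - p):
        i = 0
        while i < len(s):
            if s[i] != 0:
                i += 1
                continue
            j = i
            while j < len(s) and s[j] == 0:
                j += 1
            l = j - i  # length of this maximal block of zeros
            if l >= p:
                total += Fraction(l + p, p)
            i = j
    return total
```

For σ = 2, n = 5, p = 2 the strings of length 3 that contribute are 001 and 100 (l = 2, exponent 2 each) and 000 (l = 3, exponent 5/2), so the sum is 13/2.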

19 C(n,p). Example: σ = 2, n = 5, p = 2

21 0-segments and runs. A 0-segment of length l ≧ p in d(w,p) corresponds to σ^p runs having period p in w, because d(w,p) and w[0..p−1] determine w[p..n−1]. In the roots, all strings of length p appear once. Example (figure: d(w,2) and w): 00000, 11111 and 22222 are runs not of period 2 but of period 1.
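The correspondence can be made concrete with a short sketch. It assumes the difference convention d(w,p)[i] = (w[i+p] − w[i]) mod σ, which is a reconstruction rather than a definition taken from the slide, and the function names are hypothetical.

```python
from itertools import product

def reconstruct(prefix, diff, sigma):
    """Rebuild w from w[0..p-1] (prefix) and d(w,p) (diff)."""
    w = list(prefix)
    for i, x in enumerate(diff):
        # w[i+p] = w[i] + d(w,p)[i]  (mod sigma)
        w.append((w[i] + x) % sigma)
    return w

def strings_with_difference(diff, p, sigma):
    """All sigma^p strings w whose d(w,p) equals diff."""
    return [reconstruct(root, diff, sigma)
            for root in product(range(sigma), repeat=p)]
```

With σ = 3, p = 2 and d(w,2) = 000 this yields nine strings of length 5, one per root of length 2. Three of them (roots 00, 11, 22) are the constant strings 00000, 11111 and 22222, whose minimal period is 1.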

22 Counting a run once. To avoid counting a run more than once, a run which has a shorter period should be ignored. A run has no shorter period ⇔ the root of the run is primitive. The number of primitive strings of length p is pL(p), where L(p) is the number of Lyndon words of length p.
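The count pL(p) can be checked numerically. The sketch below computes L(p) with the standard Möbius-function formula L(p) = (1/p) Σ_{d|p} μ(p/d) σ^d and compares p·L(p) with a direct enumeration of primitive strings. The function names are chosen here for illustration.

```python
from itertools import product

def mobius(n):
    """Moebius function via trial division."""
    result, q = 1, 2
    while q * q <= n:
        if n % q == 0:
            n //= q
            if n % q == 0:
                return 0  # n has a squared prime factor
            result = -result
        q += 1
    return -result if n > 1 else result

def lyndon_count(p, sigma):
    """Number of Lyndon words of length p over an alphabet of size sigma."""
    total = sum(mobius(p // d) * sigma ** d
                for d in range(1, p + 1) if p % d == 0)
    return total // p

def is_primitive(s):
    """True if s is not a power of a strictly shorter string."""
    n = len(s)
    return not any(n % q == 0 and s == s[:q] * (n // q)
                   for q in range(1, n))

def primitive_count(p, sigma):
    """Number of primitive strings of length p, by enumeration."""
    return sum(is_primitive(s) for s in product(range(sigma), repeat=p))
```

For σ = 2 the values L(1..6) are 2, 1, 2, 3, 6, 9, and for instance there are 4·L(4) = 12 primitive binary strings of length 4.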

26 Average value of sum of exponents. The sum of exponents of runs in Σ^n and the average value of the sum of exponents of runs in strings of length n are as follows.
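The formulas themselves are not preserved in the transcript. Combining the previous steps suggests the total Σ_{p=1..⌊n/2⌋} p·L(p)·C(n,p) for the sum of exponents over Σ^n, and e(n) equal to that total divided by σ^n. The sketch below implements this reconstruction. The closed form used for c(n,p) and all function names are assumptions, not the authors' notation.

```python
from fractions import Fraction

def mobius(n):
    result, q = 1, 2
    while q * q <= n:
        if n % q == 0:
            n //= q
            if n % q == 0:
                return 0
            result = -result
        q += 1
    return -result if n > 1 else result

def lyndon_count(p, sigma):
    return sum(mobius(p // d) * sigma ** d
               for d in range(1, p + 1) if p % d == 0) // p

def c(n, p, sigma):
    """0-segments of length exactly p over all strings of length n."""
    if n < p:
        return 0
    if n == p:
        return 1
    ends = 2 * (sigma - 1) * sigma ** (n - p - 1)
    inner = 0
    if n - p >= 2:
        inner = (n - p - 1) * (sigma - 1) ** 2 * sigma ** (n - p - 2)
    return ends + inner

def C(n, p, sigma):
    """Sum of (l+p)/p over 0-segments of length l >= p in strings of length n-p."""
    return sum(Fraction(l + p, p) * c(n - p, l, sigma)
               for l in range(p, n - p + 1))

def total_exponents(n, sigma):
    """Sum of exponents of runs over all strings of length n."""
    return sum(p * lyndon_count(p, sigma) * C(n, p, sigma)
               for p in range(1, n // 2 + 1))

def e(n, sigma):
    """Average sum of exponents of runs in a string of length n."""
    return total_exponents(n, sigma) / Fraction(sigma) ** n
```

As a hand check for σ = 2: the eight binary strings of length 3 have exponent sums 3, 2, 0, 2, 2, 0, 2, 3, which add up to 14, and `total_exponents(3, 2)` returns 14. For n = 4 the total is 44, so e(4) = 11/4.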

27 Limit of e(n). The average value e(n) grows almost linearly as n increases.

28 Limit of e(n). The limit of e(n)/n and the actual values are as follows (μ(d): Möbius function).
σ    lim e(n)/n
2    1.131
3    0.738
4    0.545
5    0.430
6    0.355
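The limit formula is not preserved in the transcript. Keeping only the terms of the counting argument that grow linearly in n gives, as a derivation made here rather than a quotation from the slide, lim e(n)/n = Σ_{p≥1} L(p)·(2p(σ−1)+1)/σ^(2p+1). The sketch below sums the first terms of that series; the function names and the cut-off of 60 terms are arbitrary choices.

```python
def mobius(n):
    result, q = 1, 2
    while q * q <= n:
        if n % q == 0:
            n //= q
            if n % q == 0:
                return 0
            result = -result
        q += 1
    return -result if n > 1 else result

def lyndon_count(p, sigma):
    """L(p) = (1/p) * sum over d | p of mu(p/d) * sigma^d."""
    return sum(mobius(p // d) * sigma ** d
               for d in range(1, p + 1) if p % d == 0) // p

def limit_ratio(sigma, terms=60):
    """Partial sum of the series for lim e(n)/n (terms decay geometrically)."""
    return sum(lyndon_count(p, sigma) * (2 * p * (sigma - 1) + 1)
               / sigma ** (2 * p + 1)
               for p in range(1, terms + 1))
```

Rounded to three decimals, the partial sums for σ = 2, ..., 6 agree with the values 1.131, 0.738, 0.545, 0.430 and 0.355 listed on the slide.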

29 Summary

30 Summary. The number of 0-segments of length p in Σ^n. The sum of (l+p)/p over all 0-segments of length p or longer. The average value of the sum of exponents of runs in strings of length n. Thank you for your attention.

32 Period: NG
