CS-791/891--Preservation of Digital Objects and Collections


1 CS-791/891--Preservation of Digital Objects and Collections
Estimating Frequency of Change. Written by Junghoo Cho and Hector Garcia-Molina. Presented by Suman Kumar Narsing.

2 Topics covered:
INTRODUCTION
TAXONOMY OF ISSUES
PRELIMINARIES
ESTIMATION OF FREQUENCY: EXISTENCE OF CHANGE
ESTIMATION OF FREQUENCY: LAST DATE OF CHANGE
EXPERIMENTS
CONCLUSION

3 1. INTRODUCTION:
Many data sources are now available online, for example CNN, the NY Times, and online stores. These sources are autonomous and are updated independently, so clients do not know exactly when or how often they change.

4 APPLICATIONS THAT AN ESTIMATED CHANGE FREQUENCY CAN IMPROVE:
Improving a Web crawler. Improving the update policy of a data warehouse. Improving Web caching. Data mining.

5 CHALLENGES IN ESTIMATING THE FREQUENCY OF CHANGE:
Incomplete change history. Irregular access interval. Difference in available information.

6 EXAMPLE 1: A web crawler accessed a page daily for 10 days and detected 6 changes. From this data, the estimated change frequency is 6/10 = 0.6 changes per day.
EXAMPLE 2: In a web cache, a user accessed a web page 4 times, on day 1, day 2, day 7, and day 10. The page was found to have changed on day 2 and on day 7. What does this imply? Does the page change every 10/2 = 5 days on average?
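The arithmetic in the two examples can be sketched as follows. This is only an illustration of the intuitive "detected changes divided by elapsed time" calculation; the function name `naive_change_frequency` is made up for this sketch and does not come from the slides.

```python
def naive_change_frequency(detected_changes: int, elapsed_days: float) -> float:
    """Intuitive estimate: detected changes divided by the monitoring period."""
    if elapsed_days <= 0:
        raise ValueError("elapsed_days must be positive")
    return detected_changes / elapsed_days


# Example 1: daily crawler visits for 10 days, 6 changes detected.
example1 = naive_change_frequency(6, 10)

# Example 2: cache accesses on days 1, 2, 7 and 10, 2 changes detected.
example2 = naive_change_frequency(2, 10)

print(example1)  # changes per day
print(example2)  # changes per day, i.e. one change every 1/example2 days
```

In Example 2 the accesses are irregular and several changes may have happened between two visits, so this figure is better read as a lower bound than as the true rate.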

7 2. TAXONOMY OF ISSUES: What do we mean by “ Change of an Element”?
What does “element” mean? What does “change” mean? Here an element is a Web page, and a change is any modification to the page.

8 Developing Taxonomy: How do we trace the history of an element?
Passive monitoring.
Active monitoring.
Regular interval.
Random interval.
What information do we have?
Complete history of changes.
Last date of change.
Existence of change.

9 Developing Taxonomy: (Contd..)
How do we use estimated frequency? Estimation of frequency. Categorization of frequency

10 3. PRELIMINARIES: Poisson process
The Poisson process is the model for the changes of an element. X(t) is the number of changes in the interval (0, t], and λ is the rate (frequency) of the process. For s ≥ 0 and t > 0, the random variable X(s+t) - X(s) has the Poisson probability distribution
$\Pr\{X(s+t) - X(s) = k\} = \frac{(\lambda t)^k e^{-\lambda t}}{k!}$ for $k = 0, 1, \ldots$
The number of events expected to occur in a unit interval is
$E[X(t+1) - X(t)] = \sum_{k=0}^{\infty} k \Pr\{X(t+1) - X(t) = k\} = \sum_{k=0}^{\infty} k \frac{\lambda^k e^{-\lambda}}{k!} = \lambda$
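As a quick numerical check of the Poisson model above, the sketch below evaluates the probability of k changes in an interval of length t and sums k times that probability to recover the expected count. The helper names and the truncation point `max_k` are assumptions made for this illustration only.

```python
import math


def poisson_pmf(k: int, lam: float, t: float = 1.0) -> float:
    """Probability of exactly k changes in an interval of length t at rate lam."""
    mean = lam * t
    return mean ** k * math.exp(-mean) / math.factorial(k)


def expected_changes(lam: float, t: float = 1.0, max_k: int = 100) -> float:
    """Truncated sum of k * Pr{k changes}; approaches lam * t for moderate rates."""
    return sum(k * poisson_pmf(k, lam, t) for k in range(max_k + 1))


unit_interval_mean = expected_changes(0.6)       # unit interval, rate 0.6
longer_interval_mean = expected_changes(2.0, 3)  # interval of length 3, rate 2
print(unit_interval_mean, longer_interval_mean)
```

The truncation at `max_k` is adequate only when the mean λt is well below `max_k`; for larger means the tail would need more terms.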

11 Graphs explaining the importance of λ:

12 Estimator: $\hat{\lambda} = X/T$. The distribution of $\hat{\lambda}$ determines how effective the estimator is: Bias. Efficiency. Consistency.

13 4. ESTIMATION OF FREQUENCY: EXISTENCE OF CHANGE:
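The element is accessed n times at a regular interval I = 1/f, so the total time elapsed is T = nI = n/f. From here on we estimate the frequency ratio r = λ/f, using the estimator $\hat{r} = \hat{\lambda}/f = \frac{1}{f}\cdot\frac{X}{T} = X/n$.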

14 Measuring X by repeated accesses to the element:
Is the estimator $\hat{r}$ biased? Theorem 4.1: The expected value of the estimator $\hat{r}$ is $E[\hat{r}] = 1 - e^{-r}$. Because $1 - e^{-r} < r$ for every r > 0, the estimator underestimates r. Is the estimator $\hat{r}$ consistent? How efficient is the estimator? Corollary 4.2 gives the standard deviation of the estimator $\hat{r} = X/n$.
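The bias stated in Theorem 4.1 can be made concrete with a short sketch. The standard deviation below assumes that each of the n accesses detects a change independently with probability 1 - e^{-r}, so that X is binomial; that is a reading of the Poisson model, not a formula copied from the slides. The last function simply inverts Theorem 4.1 and is not necessarily the estimator the authors propose. All names are hypothetical.

```python
import math


def expected_naive_ratio(r: float) -> float:
    """E[X/n] from Theorem 4.1: 1 - exp(-r)."""
    return 1.0 - math.exp(-r)


def naive_ratio_std(r: float, n: int) -> float:
    """Std. dev. of X/n, assuming X ~ Binomial(n, 1 - exp(-r))."""
    p = expected_naive_ratio(r)
    return math.sqrt(p * (1.0 - p) / n)


def inverted_ratio(x: float, n: int) -> float:
    """Solve 1 - exp(-r) = x/n for r (an illustrative inversion only)."""
    if not 0 <= x < n:
        raise ValueError("need 0 <= x < n; x == n has no finite solution")
    return -math.log(1.0 - x / n)


bias_at_one = expected_naive_ratio(1.0)   # well below the true r = 1
crawler_ratio = inverted_ratio(6, 10)     # 6 changes seen in 10 accesses
print(bias_at_one, naive_ratio_std(1.0, 10), crawler_ratio)
```

For the crawler example, the naive ratio is 6/10 = 0.6, while the inversion gives roughly 0.92, which shows how much the missed changes between visits can matter.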

15 5. ESTIMATION OF FREQUENCY: LAST DATE OF CHANGE
Let T be the time to the previous event in a Poisson process with rate λ. Then the expected value of T is E[T] = 1/λ. The new estimator consists of three functions: Init(), Update(), and Estimate().
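The value E[T] = 1/λ can be checked with a one-line integral. The sketch assumes that the time back to the previous event is exponentially distributed with rate λ, which holds for a Poisson process that has been running long enough; the density below is that assumption written out.

```latex
E[T] = \int_{0}^{\infty} t \,\lambda e^{-\lambda t}\, dt
     = \Big[ -t e^{-\lambda t} \Big]_{0}^{\infty} + \int_{0}^{\infty} e^{-\lambda t}\, dt
     = 0 + \frac{1}{\lambda}
     = \frac{1}{\lambda}
```

So a page that changes more often (larger λ) is expected to show a more recent last-modified date at each access.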

16 The estimator using last-modified dates:
/* Ti: time since the last change, as seen at the i-th access; Ii: interval since the previous access */
Init()  /* initialize variables */
    N = 0;  /* total number of accesses */
    X = 0;  /* number of detected changes */
    T = 0;  /* sum of the times from changes */
Update(Ti, Ii)  /* update variables */
    N = N + 1;
    /* Has the element changed? */
    If (Ti < Ii) then
        /* The element has changed. */
        X = X + 1;
        T = T + Ti;
    else
        /* The element has not changed. */
        T = T + Ii;
Estimate()  /* return the estimated lambda */
    return X/T;
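The Init/Update/Estimate pseudocode translates almost line for line into a small class. This is a minimal sketch: the class and argument names are chosen for this illustration, and raising an error when no time has accumulated is a choice made here rather than something the slides specify.

```python
class LastModifiedEstimator:
    """Estimate the change rate from last-modified dates seen at each access."""

    def __init__(self) -> None:
        self.n = 0    # total number of accesses
        self.x = 0    # number of detected changes
        self.t = 0.0  # sum of the times from changes

    def update(self, time_since_change: float, access_interval: float) -> None:
        """Record one access: Ti = time_since_change, Ii = access_interval."""
        self.n += 1
        if time_since_change < access_interval:
            # The element changed since the previous access.
            self.x += 1
            self.t += time_since_change
        else:
            # No change since the previous access.
            self.t += access_interval

    def estimate(self) -> float:
        """Return the estimated lambda, X / T."""
        if self.t == 0:
            raise ValueError("no elapsed time recorded yet")
        return self.x / self.t


estimator = LastModifiedEstimator()
estimator.update(0.5, 1.0)   # changed half a day before this access
estimator.update(3.0, 1.0)   # last change predates the previous access
estimator.update(0.25, 1.0)  # changed a quarter day before this access
rate = estimator.estimate()
print(rate)
```

Here two changes are detected over an accumulated time of 0.5 + 1.0 + 0.25 = 1.75 days, so the estimate is 2/1.75 changes per day.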

17 6. EXPERIMENTS:
Non-Poisson model. Improvement from last modification date. Effectiveness of estimators for real Web data.

18 COMPARISON OF NAÏVE ESTIMATOR AND OURS

19 Application to a Web crawler:
Uniform policy. Naïve policy. Our policy.

20 7. CONCLUSION:
Future work: Adaptive scheme. Changing λ.

21 REFERENCES: Junghoo Cho, Hector Garcia-Molina. "Estimating frequency of change." ACM Transactions on Internet Technology, 3(3): August

22 THANK YOU

