Imprecise Probability and Network Quality of Service Martin Tunnicliffe
Two Kinds of Probability Aleatory Probability: The probability of chance. Example: “When throwing an unweighted die, the probability of obtaining a 6 is 1/6”. Epistemic Probability: The probability of belief. Example: “The defendant on trial is probably guilty”.
Probability and Betting Odds A “fair bet” is a gamble which, if repeated a large number of times, returns the same amount of money in winnings as the amount of money staked. Example: if there is a 1-in-10 chance of winning a game, then the “odds” for a fair gamble would be 10:1 (odds are quoted here as the total return per unit staked, i.e. $1/p$ to 1 for a win probability $p$). Problems arise when we do not know exactly what the chance of winning is. Under such circumstances, how can we know what constitutes a fair gamble? Behavioural interpretation of probability (Bruno de Finetti, 1906-1985): “Probability” in such cases refers to what people will consider or believe a fair bet to be. Belief stems from experience, i.e. inductive learning.
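The odds convention used on these slides (total return per unit staked, so a probability $p$ corresponds to odds of $1/p$ to 1) can be sketched in a few lines. The function names are illustrative and not part of the original presentation.

```python
from fractions import Fraction


def fair_odds(p):
    """Fair odds on an event of probability p, quoted as total return per unit stake (1/p)."""
    p = Fraction(p)
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return 1 / p


def fair_odds_against(p):
    """Fair odds on the complementary event, i.e. 1 / (1 - p)."""
    return fair_odds(1 - Fraction(p))


print(fair_odds(Fraction(1, 10)))         # 10
print(fair_odds_against(Fraction(1, 4)))  # 4/3
```

Exact fractions are used so that values such as 4/3 are not blurred by floating-point rounding.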
Inductive Learning Induction is the opposite of deduction, which infers the specific from the general. Example: “All dogs have four legs. Patch is a dog. Therefore Patch has four legs.” Induction is the opposite: It infers the general from the specific. Example: “Patch is a dog. Patch has four legs. Therefore all dogs have four legs.”
Inductive Learning The last statement has little empirical support. However, consider a larger body of evidence:
Dog | Number of Legs
Patch | 4
Lucky | 4
Pongo | 4
Perdita | 4
Freckles | 4
The statement “all dogs have four legs” now has significant plausibility or epistemic probability. However, it remains uncertain: even with a hundred dogs, there is no categorical proof that the hundred-and-first Dalmatian will not have five legs!
Approaches to Inductive Learning. Frequentist statistics disallows the concept of epistemic probability (We cannot talk about the “probability of a five-legged Dalmatian”). Thus it offers very little framework for inductive learning. The Objective Bayesian approach allows epistemic probability, which it represents as a single probability distribution. (This is the Bayesian Dogma of Precision). The Imprecise Probability approach uses two distributions representing “upper probability” and “lower probability”.
Marble Problem Example (shamelessly “ripped off” from P. Walley, J. R. Stat. Soc. B, 58(1), pp. 3-57, 1996): Marbles are drawn blindly from a bag of coloured marbles. The event $\omega$ constitutes the drawing of a red marble. The composition of the bag is unknown. For all we know, it could contain no red marbles. Alternatively, every marble in the bag may be red. Nevertheless, we are asked to compute the probability associated with a “fair gamble” on $\omega$, both a priori (before any marble is drawn) and after $n$ marbles are drawn, $j$ of which are red. (Marbles are replaced before the next draw.)
Binomial Distribution Let $\theta$ be the true (unknown) chance of drawing a red marble. The probability of drawing $j$ reds in $n$ draws is:
$$P(j \mid n, \theta) = \binom{n}{j}\,\theta^{j}(1-\theta)^{n-j}$$
This is proportional to the “likelihood” of $\theta$ given that $j$ red marbles have been drawn. Walley actually considers a more complex “multinomial” situation, where three or more outcomes are possible. However, I am only going to consider two possibilities: $\omega$ = red marble and $\sim\omega$ = any other coloured marble.
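As a minimal sketch, the binomial probability of $j$ reds in $n$ draws can be evaluated directly. The function name and the example value of $\theta$ are illustrative assumptions, not values from the presentation.

```python
from math import comb


def binomial_pmf(j, n, theta):
    """Probability of exactly j reds in n draws when each draw is red with chance theta."""
    if not 0 <= j <= n:
        raise ValueError("j must satisfy 0 <= j <= n")
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return comb(n, j) * theta**j * (1 - theta) ** (n - j)


# Hypothetical example: chance of red 0.25, ten draws, two reds.
print(round(binomial_pmf(2, 10, 0.25), 4))  # 0.2816
```

Viewed as a function of `theta` with `j` and `n` fixed, the same expression is the likelihood referred to on the slide.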
Bayes’ Theorem Bayes’ Theorem provides a relationship between likelihood and epistemic probability. Since $\theta$ is a continuous variable, its probability must be described by a “probability density function” or pdf, which we can denote $f(\theta)$. Let $f(\theta)$ be the “prior pdf” (representing our pre-existing beliefs about $\theta$) and $f(\theta \mid n, j)$ the “posterior pdf” (representing our modified beliefs given that $n$ trials have yielded $j$ red marbles). Bayes’ Theorem tells us that:
$$f(\theta \mid n, j) \propto P(j \mid n, \theta)\, f(\theta)$$
where $P(j \mid n, \theta)$ is the binomial probability of $j$ reds in $n$ draws.
Beta Model We need a formula for $f(\theta)$. Let us assume that it follows a beta distribution:
$$f(\theta) \propto \theta^{st-1}(1-\theta)^{s(1-t)-1}$$
Here $t$ is the first moment (or expectation) of the distribution, representing our prior belief. The “hyper-parameter” $s$ is the “prior strength”, the influence this prior belief has upon the posterior probability. Now from the binomial distribution we know that:
$$P(j \mid n, \theta) \propto \theta^{j}(1-\theta)^{n-j}$$
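Beta Model: Posterior Distributions Multiplying the beta prior by the binomial likelihood gives the posterior:
$$f(\theta \mid n, j) \propto \theta^{st+j-1}(1-\theta)^{s(1-t)+n-j-1}$$
Thus the beta prior generates a beta posterior (it is the “conjugate prior” for the binomial distribution).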
Posterior Expectation The expectation of the posterior distribution can now be calculated:
$$E(\theta \mid j, n) = \frac{st + j}{s + n}$$
Under the behavioural interpretation, this is viewed as the posterior probability $P(\omega \mid j, n)$ of a red. Example: Suppose we are initially willing to bet 2:1 on a red ($t = 1/2$). However, the next ten draws only produce 2 reds. Assuming $s = 2$ gives:
$$P(\omega \mid j, n) = \frac{2 \times \tfrac{1}{2} + 2}{2 + 10} = \frac{3}{12} = \frac{1}{4}$$
Thus in the light of the new information, a fair gamble now requires odds of 4:1 on red, and 4:3 against red.
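The worked example can be checked with a short sketch of the posterior mean $(st + j)/(s + n)$. The function name is illustrative; the numbers are those of the example ($t = 1/2$, $s = 2$, 2 reds in 10 draws).

```python
from fractions import Fraction


def posterior_expectation(j, n, s, t):
    """Posterior mean (s*t + j) / (s + n) of the beta model after j reds in n draws."""
    if n < 0 or not 0 <= j <= n:
        raise ValueError("need 0 <= j <= n")
    return (Fraction(s) * Fraction(t) + j) / (Fraction(s) + n)


p_red = posterior_expectation(j=2, n=10, s=2, t=Fraction(1, 2))
print(p_red)            # 1/4
print(1 / p_red)        # 4   (odds on red, quoted as 1/p)
print(1 / (1 - p_red))  # 4/3 (odds against red)
```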
Dirichlet Distribution Walley’s paper uses the generalised Dirichlet distribution. The beta distribution is the special case of the Dirichlet for which the number of possible outcomes is 2. (Sample set has cardinality 2.) This leads to the “Imprecise Dirichlet Model” or IDM. The simpler Beta-function model may be called the “Imprecise Beta Model” (IBM).
Objective Bayesian Approach We need an initial value for $t$, to represent our belief that $\omega$ will occur when we have no data available ($j = n = 0$). This is called a “non-informative prior”. Under “Bayes’ Postulate” (in the absence of any information, all possibilities are equally likely), $t = 0.5$. Under this assumption:
$$P(\omega \mid j, n) = \frac{s/2 + j}{s + n}$$
However, a value for $s$ is still needed.
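Non-Informative Priors Bayesians favour setting $s$ to the cardinality of the sample space (in this case 2) to give a “uniform” prior. With $s = 2$ and $t = 1/2$ the prior density is constant on $[0, 1]$, and the posterior probability becomes $(1 + j)/(2 + n)$.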
Problems with Bayesian Approach Problem: The Bayesian formula assigns non-zero probabilities to events which have never been known to happen, and might (for all we know) be physically impossible. Even after 10 failures to draw a red, the model (with $s = 2$, $t = 1/2$) still gives red a probability of $1/12$, and so still supports betting on red at 12:1!
Problems with Bayesian Approach Strict application of Bayes’ Postulate yields prior (and hence posterior) probabilities which depend on the choice of sample space (which should be arbitrary):
Two possibilities, one “successful”: $t = 1/2$
Three possibilities, one “successful”: $t = 1/3$
Four possibilities, two “successful”: $t = 1/2$
The experiment is identical in all three cases: only its representation is altered. Thus the Representation Invariance Principle (RIP) is violated.
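The dependence on representation can be illustrated numerically. This sketch assumes the posterior mean $(st + j)/(s + n)$ with $s$ set to the cardinality of the sample space, as on the “Non-Informative Priors” slide. The data (2 successes in 10 trials) and all names are hypothetical.

```python
from fractions import Fraction


def posterior_probability(j, n, s, t):
    """Posterior mean (s*t + j) / (s + n) of the beta model."""
    return (Fraction(s) * Fraction(t) + j) / (Fraction(s) + n)


# (description, cardinality of sample space, number of "successful" outcomes)
representations = [
    ("two possibilities, one successful", 2, 1),
    ("three possibilities, one successful", 3, 1),
    ("four possibilities, two successful", 4, 2),
]

j, n = 2, 10  # hypothetical data: 2 successes in 10 trials
for label, cardinality, successes in representations:
    t = Fraction(successes, cardinality)  # Bayes' Postulate: equal weight per outcome
    print(label, posterior_probability(j, n, s=cardinality, t=t))
```

The same data yield 1/4, 3/13 and 2/7 respectively, so the answer changes with the way the experiment is described.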
A Quote from Walley “The problem is not that Bayesians have yet to discover the truly noninformative priors, but rather that no precise probability distribution can adequately represent ignorance.” (Statistical Reasoning with Imprecise Probabilities, 1991) What does Walley mean by “precise probability”?
The “Dogma of Precision” The Bayesian approach rests upon de Finetti’s “Dogma of Precision”. Walley (1991): “... for each event of interest, there is some betting rate which you regard as fair, in the sense that you are willing to accept either side of a bet on the event at that rate.” Example: If there is a 1-in-4 chance of an event $\omega$, I am equally prepared to bet 4:1 on $\omega$ and 4:3 against $\omega$.
The Imprecise Probability Approach The “Imprecise Probability” approach solves the problem by removing the dogma of precision, and thus the requirement for a noninformative prior. It does this by eliminating the need for a single probability associated with $\omega$, replacing it with an upper probability and a lower probability.
Upper and Lower Probabilities When no data is available, $\theta$ might take any value between 0 and 1. Thus the prior lower and upper probabilities are respectively:
$$\underline{P}(\omega) = 0, \qquad \overline{P}(\omega) = 1$$
Walley: Before any marbles are drawn, “... I do not have any information at all about the chance of drawing a red marble, so I do not see why I should bet on or against red at any odds. This is not a very exciting answer, but I believe that it is the correct one.”
Upper and Lower Probabilities
Imprecise Probability | Possibility Theory | Dempster-Shafer Theory
Upper Probability | Possibility | Plausibility
Lower Probability | Necessity | Belief
Lower Probability: The degree to which we are confident that the next marble will definitely be red. Upper Probability: The degree to which we are worried that the next marble might be red.
Posterior Upper and Lower Probabilities However, the arrival of new information ($j$ observed reds in $n$ trials) allows these two probabilities to be modified. The prior lower and upper probabilities (0 and 1) can be substituted for $t$ in the Bayesian formula for posterior mean probability, $(st + j)/(s + n)$. Thus we obtain the posterior lower and upper probabilities:
$$\underline{P}(\omega \mid j, n) = \frac{j}{s + n}, \qquad \overline{P}(\omega \mid j, n) = \frac{s + j}{s + n}$$
Properties of Upper and Lower Probabilities The amount of imprecision is the difference between the upper and lower probabilities, i.e.
$$\overline{P}(\omega \mid j, n) - \underline{P}(\omega \mid j, n) = \frac{s}{s + n}$$
This does not depend on the number of “successes” (occurrences of $\omega$). As $n \to \infty$, the imprecision tends to zero and the lower and upper probabilities converge towards $j/n$, the observed success ratio. As $s \to \infty$, the prior dominates: the imprecision becomes 1, and the lower and upper probabilities return to 0 and 1 respectively. As $s \to 0$, the new data dominates the prior and the lower and upper probabilities again converge to $j/n$ (Haldane’s model).
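A minimal sketch of the posterior lower and upper probabilities, obtained by substituting $t = 0$ and $t = 1$ into $(st + j)/(s + n)$, together with the imprecision $s/(s + n)$. The function name and the example counts are illustrative assumptions.

```python
from fractions import Fraction


def imprecise_beta(j, n, s):
    """Return (lower, upper) posterior probabilities of red after j reds in n draws."""
    if s <= 0:
        raise ValueError("s must be positive")
    if n < 0 or not 0 <= j <= n:
        raise ValueError("need 0 <= j <= n")
    s = Fraction(s)
    lower = Fraction(j) / (s + n)  # t = 0
    upper = (s + j) / (s + n)      # t = 1
    return lower, upper


lower, upper = imprecise_beta(j=2, n=10, s=2)
print(lower, upper, upper - lower)  # 1/6 1/3 1/6

# With no data the interval is vacuous: (0, 1).
print(imprecise_beta(j=0, n=0, s=2))

# The imprecision s / (s + n) shrinks as more marbles are drawn, whatever j is.
for n in (10, 100, 1000):
    lo_p, up_p = imprecise_beta(j=n // 5, n=n, s=2)
    print(n, float(up_p - lo_p))
```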
Interpretation of Upper and Lower Probabilities How do we interpret these upper and lower probabilities? Which do we take as “the probability of red”? It depends on whether you are betting for or against red. If you are betting for red then you take the lower probability, since this represents the most cautious expectation of the probability of red. However, if you are betting against red, you take the upper probability, since this is associated with the lower probability of not-red. Proof: not-red has occurred $n - j$ times in $n$ trials, so
$$\underline{P}(\sim\omega \mid j, n) = \frac{n - j}{s + n} = 1 - \frac{s + j}{s + n} = 1 - \overline{P}(\omega \mid j, n)$$
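Interpretation of Upper and Lower Probabilities [Figure: probability scale marked at 0.1 and 0.7.] If the lower probability of $\omega$ is 0.1, a “fair bet” would be 1/0.1 = 10:1 in favour of the event $\omega$. If the lower probability of $\sim\omega$ is 0.7, a “fair bet” would be 1/0.7 = 1.429:1 against the event $\omega$. (For consistency, we continue to assume that $s = 2$.)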
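The two betting rates can be reproduced with a small sketch. The counts $j = 1$ and $n = 8$ are an assumption: they are not stated on the slide, but with $s = 2$ they give the quoted values 0.1 (lower probability of red) and 0.7 (lower probability of not-red). Function names are illustrative.

```python
from fractions import Fraction


def lower_probability(successes, n, s):
    """Lower posterior probability successes / (s + n) for an event seen `successes` times."""
    return Fraction(successes) / (Fraction(s) + n)


j, n, s = 1, 8, 2  # assumed counts, chosen to match the slide's 0.1 and 0.7

lower_red = lower_probability(j, n, s)          # used when betting on red
lower_not_red = lower_probability(n - j, n, s)  # used when betting against red

print(lower_red, 1 / lower_red)                 # 1/10 10
print(lower_not_red, float(1 / lower_not_red))  # 7/10 1.4285714285714286
```

Each bettor uses the lower probability of the outcome being backed, which is the cautious choice in both directions.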
Analogy with Possibility Theory Thus upper probability is analogous to possibility and lower probability to necessity. Consider the axiom of possibility theory:
$$N(X) = 1 - \Pi(\sim X)$$
i.e. the “necessity” of event $X$ occurring is one minus the “possibility” of $X$ not occurring. Similarly the expressions for upper and lower probability show us that:
$$\underline{P}(\omega \mid j, n) = 1 - \overline{P}(\sim\omega \mid j, n)$$
Model | Value of s | Remarks
Bayes-Laplace | Size of sample set (2 for Beta Model) | Intuitively reasonable results. Violates the RIP.
Jeffreys | Half the size of sample set (1 for Beta Model) | Intuitively reasonable results. Violates the RIP.
Haldane | 0 | $\underline{P}(\omega \mid j, n) = \overline{P}(\omega \mid j, n) = j/n$. Loss of dichotomy between upper and lower probabilities. Unreasonable results for small $n$.
Perks | 1 | Reasonable results. Confidence limits agree with their frequentist values.
Confidence Intervals for $\theta$ You might be tempted to think that the upper and lower probabilities represent some kind of “confidence interval” for the true value of $\theta$. This is not the case. Upper and lower probabilities are the mean values of belief functions for $\theta$, relevant to people with different agendas (betting for and against $\omega$).
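Confidence Intervals for $\theta$ Suppose we want to determine a “credible interval” $(\theta^{-}(\gamma), \theta^{+}(\gamma))$ such that we are at least $100\gamma$ per cent “sure” that $\theta^{-}(\gamma) < \theta < \theta^{+}(\gamma)$. Example: $\gamma = 0.95$ (95% confidence). [Figure: probability density with the limits $\theta^{-}(\gamma)$ and $\theta^{+}(\gamma)$ marked; 95% of the probability lies within this range, with a 2.5% tail marked.]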
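Calculating the Confidence Interval Integrating the two probability distributions (the beta posteriors for $t = 0$ and $t = 1$), we find that we can compute the confidence intervals by solving the equations:
$$I_{\theta^{-}(\gamma)}(j,\; s + n - j) = \frac{1 - \gamma}{2}, \qquad I_{\theta^{+}(\gamma)}(s + j,\; n - j) = \frac{1 + \gamma}{2}$$
$I$ indicates the (regularised) “Incomplete Beta Function”. No analytic solution exists, but numerical iteration using the partition method is quite straightforward.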
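The numerical solution can be sketched as follows, under several stated assumptions. The limits are taken to solve $I_{\theta^-}(j, s+n-j) = (1-\gamma)/2$ and $I_{\theta^+}(s+j, n-j) = (1+\gamma)/2$. The “partition method” is read here as interval bisection. And $s$ is restricted to a positive integer so that the incomplete beta function can be evaluated through its binomial-tail identity using only the standard library. All function names are illustrative.

```python
from math import comb


def reg_inc_beta(x, a, b):
    """Regularised incomplete beta function I_x(a, b) for positive integers a and b.

    Uses the identity I_x(a, b) = P(X >= a) for X ~ Binomial(a + b - 1, x).
    """
    m = a + b - 1
    return sum(comb(m, k) * x**k * (1 - x) ** (m - k) for k in range(a, m + 1))


def solve_increasing(func, target, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection: find x in [lo, hi] with func(x) = target, for an increasing func."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def credible_interval(j, n, s=2, gamma=0.95):
    """Return (theta_minus, theta_plus) after j reds in n draws, for integer s >= 1."""
    if not 0 <= j <= n:
        raise ValueError("need 0 <= j <= n")
    # Lower limit: t = 0 posterior, Beta(j, s + n - j); it is 0 when no red was seen.
    theta_minus = 0.0 if j == 0 else solve_increasing(
        lambda x: reg_inc_beta(x, j, s + n - j), (1 - gamma) / 2)
    # Upper limit: t = 1 posterior, Beta(s + j, n - j); it is 1 when every draw was red.
    theta_plus = 1.0 if j == n else solve_increasing(
        lambda x: reg_inc_beta(x, s + j, n - j), (1 + gamma) / 2)
    return theta_minus, theta_plus


print(credible_interval(j=2, n=10, s=2))
```

For non-integer $s$ a library routine for the incomplete beta function would be needed instead of the binomial-tail identity.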
Frequentist Confidence Limits [Figure: two binomial distributions, one for $\theta = \theta^{-}(\gamma)$ and one for $\theta = \theta^{+}(\gamma)$.]
Comparison – Frequentist vs. Imprecise Probability When $s = 1$ (Perks), Imprecise Probability agrees exactly with Frequentism on the upper and lower confidence limits. [Figure: panels labelled “Frequentist” and “Imprecise Probability”.]
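The agreement at $s = 1$ can be made explicit with a short derivation. It assumes that the credible limits solve $I_{\theta^-}(j, s+n-j) = (1-\gamma)/2$ and $I_{\theta^+}(s+j, n-j) = (1+\gamma)/2$, and that the frequentist limits are the exact binomial-tail (Clopper-Pearson) limits. Both are assumptions, since the equations on the original slides are not preserved in this transcript. The first line is the standard binomial-tail identity for the regularised incomplete beta function with positive integer arguments.

```latex
\begin{align*}
I_x(a, b) &= \sum_{k=a}^{a+b-1} \binom{a+b-1}{k}\, x^{k} (1-x)^{a+b-1-k}
  && (a, b \text{ positive integers}) \\[4pt]
s = 1:\quad I_{\theta^-}(j,\, n-j+1)
  &= \sum_{k=j}^{n} \binom{n}{k} (\theta^-)^{k} (1-\theta^-)^{n-k}
   = \frac{1-\gamma}{2} \\[4pt]
I_{\theta^+}(j+1,\, n-j)
  &= \sum_{k=j+1}^{n} \binom{n}{k} (\theta^+)^{k} (1-\theta^+)^{n-k}
   = \frac{1+\gamma}{2} \\[4pt]
\Longleftrightarrow\quad
  \sum_{k=0}^{j} \binom{n}{k} (\theta^+)^{k} (1-\theta^+)^{n-k}
  &= \frac{1-\gamma}{2}
\end{align*}
```

The second line says that, at $\theta = \theta^-$, the binomial probability of observing $j$ or more reds is $(1-\gamma)/2$. The last line says that, at $\theta = \theta^+$, the probability of observing $j$ or fewer reds is $(1-\gamma)/2$. These are the defining equations of the exact frequentist limits.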
Applications in Networking Network Management and Control often requires decisions to be made based upon limited information. This could be viewed as gambling on imprecise probabilities. Applications: Monitoring Network Quality-of-Service; Congestion Window Control in Wired-cum-Wireless Networks.
Quality of Service (QoS) [Figure: hosts/end systems connected through a network.] Different types of applications have different QoS requirements. FTP and HTTP can tolerate delay, but not errors/losses (transmitted and received messages must be exactly identical). Real-time services (voice/video) can tolerate some data losses, but are sensitive to variations in delay.
QoS Metrics Loss: Percentage of transmitted packets which never reach their intended destination (either due to noise corruption or overflow at a queuing buffer.) Latency: A posh word for “delay”; the time a packet takes to travel between end-points. Jitter: Loosely defined as the amount by which latency varies during a transmission. (Its precise definition is problematic.) Most important in real-time applications. Throughput: The throughput is the rate at which data can be usefully carried.
Quality of Service (QoS) Monitoring [Figure: user data and monitor data streams passing through the network.] Of $n$ packets in total, $(n - j)$ packets are “successful” and $j$ packets “failed”; these counts are used to estimate the failure probability.
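Simulation Data: Heavily Loaded Network (Average utilisation: 97%)
Metric | Monitor Stream | Data Stream 1 | Data Stream 2 | Data Stream 3
Packet Size | 54 bytes | 53 bytes | 100 bytes | 200 bytes
Packet Separation | 10 s | 10 ms | 10 ms | 10 ms
Loss Rate | 36.74% | 38.94% | 69.71% | 95.19%
95% Interval (column alignment lost in the transcript): N/A, 36.04-42.09%
Mean Latency (column alignment lost in the transcript): 0.800 s, 0.801 s, 0.804 s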
Simulation Data: Lightly Loaded Network (Average utilisation: 46%)
Metric | Monitor Stream | Data Stream 1 | Data Stream 2 | Data Stream 3
Packet Size | 54 bytes | 53 bytes | 100 bytes | 200 bytes
Packet Separation | 10 s | 10 ms | 10 ms | 10 ms
Loss Rate | 0% (single value given)
Mean Latency | 0.0025 s | 0.0026 s | 0.0034 s | 0.0055 s
Jitter Definitions Two definitions: “Simple” Jitter: the difference between successive latencies. “Smoothed” Jitter (RFC 3550): each value inherits 15/16 of the previous value.
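Both definitions can be sketched in a few lines. The smoothed form follows the RFC 3550 estimator $J_i = J_{i-1} + (|D_i| - J_{i-1})/16$, where $D_i$ is the difference between successive latencies. Taking the absolute value for the “simple” jitter as well, and starting the smoothed estimate at zero, are assumptions of this sketch. Function names and sample latencies are illustrative.

```python
def simple_jitter(latencies):
    """Absolute difference between each pair of successive latencies."""
    return [abs(later - earlier) for earlier, later in zip(latencies, latencies[1:])]


def smoothed_jitter(latencies, initial=0.0):
    """RFC 3550 style running estimate: J += (|D| - J) / 16 for each new difference."""
    jitter = initial
    profile = []
    for difference in simple_jitter(latencies):
        # Equivalent to keeping 15/16 of the old value and adding 1/16 of the new one.
        jitter += (difference - jitter) / 16
        profile.append(jitter)
    return profile


latencies = [1.0, 1.5, 1.25]       # hypothetical latencies in seconds
print(simple_jitter(latencies))    # [0.5, 0.25]
print(smoothed_jitter(latencies))  # [0.03125, 0.044921875]
```

The 1/16 gain makes the smoothed profile respond slowly to isolated latency spikes, which is why the two definitions can give very different pictures of the same stream.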
Jitter Profiles [Figure: jitter profiles for the monitor stream (packet separation 10 s) and a data stream (packet separation 10 ms).]
Wired-Cum-Wireless Networks Wireless Network: congestion plus random noise. Wired Network: congestion only.
WTCP: Identifying the Cause of Packet Loss using Interarrival Time [Figure: a packet stream containing a block of lost packets. Packets $i, i+1, i+2, \ldots, j-2, j-1, j$ arrive at times $t_i, t_{i+1}, t_{i+2}, \ldots, t_{j-2}, t_{j-1}, t_j$; the interarrival times are $t_{i+1} - t_i$, $t_{i+2} - t_{i+1}$, $\ldots$, $t_{j-1} - t_{j-2}$, $t_j - t_{j-1}$.]
WTCP: Identifying the Cause of Packet Loss using Interarrival Time Assume we already know the mean $M$ and standard deviation $\sigma$ of the interarrival time when the network is uncongested. If $M - K\sigma < \Delta_{i,j} < M + K\sigma$ (where $K$ is a constant), then the losses are assumed to be random. The sending rate is not altered. Otherwise, we infer that queue sizes are varying: an indication that congestion is occurring. The sending rate is reduced to alleviate the problem. Much work remains to be done on optimising this mechanism to maximise throughput.
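The decision rule can be sketched as below. This is an illustration only, not the WTCP implementation: the function names, the default $K = 2$, the use of the population standard deviation, and the sample interarrival times are all assumptions. The value `delta` stands for the observed interarrival measure $\Delta_{i,j}$, whose exact definition is not preserved in this transcript.

```python
from statistics import mean, pstdev


def uncongested_baseline(interarrival_times):
    """Mean M and standard deviation sigma of interarrival times on an uncongested network."""
    return mean(interarrival_times), pstdev(interarrival_times)


def classify_loss(delta, m, sigma, k=2.0):
    """Return 'random' if M - K*sigma < delta < M + K*sigma, otherwise 'congestion'."""
    if m - k * sigma < delta < m + k * sigma:
        return "random"      # sending rate left unchanged
    return "congestion"      # sending rate should be reduced


# Hypothetical interarrival times (seconds) measured while the network is uncongested.
m, sigma = uncongested_baseline([0.010, 0.012, 0.008, 0.010])

print(classify_loss(0.011, m, sigma))  # random
print(classify_loss(0.020, m, sigma))  # congestion
```

A larger `k` makes the rule more willing to attribute losses to random noise, which preserves throughput on a noisy wireless link at the cost of reacting later to genuine congestion.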