
1 Concept of Probability AS3105 Astrophysical Processes 1 Dhani Herdiwijaya

2 Probability in Everyday Life Rainfall, traffic jams, crossing the street, a catastrophic meteoroid, airplane travel: is it safe to fly? Laplace (1819): "Probability theory is nothing but common sense reduced to calculation." Maxwell (1850): "The true logic of this world is the calculus of probabilities." That is, probability is a natural language for describing real-world phenomena. A mathematical formulation of games of chance began in the middle of the 17th century. Some of the important contributors over the following 150 years include Pascal, Fermat, Descartes, Leibniz, Newton, Bernoulli, and Laplace.

3 Development It is remarkable that the theory of probability took so long to develop. An understanding of probability is elusive in part because probability depends on the status of the information that we have (a fact well known to poker players). Although the rules of probability are defined by simple mathematical rules, an understanding of probability is greatly aided by experience with real data and concrete problems.

4 Probability To calculate the probability of a particular outcome, count all possible results, then count the number that give the desired outcome. The probability of the desired outcome is the number of ways it can occur divided by the total number of outcomes. Hence, the probability of any one face of a die is 1/6.

5 Rules of Probability In 1933 the Russian mathematician A. N. Kolmogorov formulated a complete set of axioms for the mathematical definition of probability. For each event i we assign a probability P(i) that satisfies the conditions: P(i) ≥ 0; P(i) = 0 means that the event cannot occur; P(i) = 1 means that the event must occur.

6 The normalization condition says that the sum of the probabilities of all possible mutually exclusive outcomes is unity: ∑ P(i) = 1. Example. Let x be the number of points on the face of a die. What is the sample space of x? Solution. The sample space, or set of possible events, is x_i = {1, 2, 3, 4, 5, 6}. These six outcomes are mutually exclusive. There are many different interpretations of probability, because any interpretation that satisfies the rules of probability may be regarded as a kind of probability. An interpretation of probability that is relatively easy to understand is based on symmetry.

7 Addition rule For an actual die, we can estimate the probability a posteriori, that is, by observing the outcome of many throws. Suppose that we know that the probability of rolling any face of a die in one throw is 1/6, and we want to find the probability of rolling face 3 or face 6 in one throw. For two distinct outcomes i and j, P(i or j) = P(i) + P(j) (addition rule). This relation generalizes to more than two events. An important consequence is that if P(i) is the probability of event i, then the probability of event i not occurring is 1 − P(i).

8 Combining Probabilities If a given outcome can be reached in two (or more) mutually exclusive ways whose probabilities are p A and p B, then the probability of that outcome is: p A + p B. This is the probability of having either A or B.

9 Example Paint two faces of a die red. When the die is thrown, what is the probability of a red face coming up?

10 Example: What is the probability of throwing a three or a six with one throw of a die? Solution. The probability that the face exhibits either 3 or 6 is 1/6 + 1/6 = 1/3. Example: What is the probability of not throwing a six with one throw of a die? Solution. The answer is the probability of "1 or 2 or 3 or 4 or 5." The addition rule gives P(not six) = P(1) + P(2) + P(3) + P(4) + P(5) = 1 − P(6) = 5/6. The probabilities of all outcomes sum to unity; it is very useful to take advantage of this property when solving many probability problems.

11 Multiplication rule Another simple rule is for the probability of the joint occurrence of independent events. These events might be the probability of throwing a 3 on one die and the probability of throwing a 4 on a second die. If two events are independent, then the probability of both events occurring is the product of their probabilities P (i and j ) = P (i) P (j ) (multiplication rule) Events are independent if the occurrence of one event does not change the probability for the occurrence of the other.

12 Combining Probabilities If a given outcome represents the combination of two independent events, whose individual probabilities are p A and p B, then the probability of that outcome is: p A × p B. This is the probability of having both A and B.

13 Example Throw two normal dice. What is the probability of two sixes coming up?

14 Example: Consider the probability that a person chosen at random is female and was born on September 6. We can reasonably assume equal likelihood of birthdays for all days of the year, and it is correct to conclude that this probability is ½ x 1/365 Being a woman and being born on September 6 are independent events.

15 Example. What is the probability of throwing an even number with one throw of a die? Solution. We can use the addition rule to find that P(even) = P(2) + P(4) + P(6) = 1/6 + 1/6 + 1/6 = 1/2. Example. What is the probability of the same face appearing on two successive throws of a die? Solution. We know that the probability of any specific combination of outcomes, for example (1,1), (2,2), ..., (6,6), is 1/6 × 1/6 = 1/36. Hence P(same face) = P(1,1) + P(2,2) + ... + P(6,6) = 6 × 1/36 = 1/6.

16 Example. What is the probability that in two throws of a die at least one six appears? Solution. We have already established that P (6) = 1/6 and P (not 6) = 5/6. In two throws, there are four possible outcomes (6, 6), (6, not 6), (not 6, 6), (not 6, not 6) with the probabilities P (6, 6) = 1/6 x 1/6 = 1/36 P (6, not 6) = P (not 6, 6) = 1/6 x 5/6 = 5/36 P (not 6, not 6) = 5/6 x 5/6 = 25/36 All outcomes except the last have at least one six. Hence, the probability of obtaining at least one six is P (at least one 6) = P (6, 6) + P (6, not 6) + P (not 6, 6) = 1/36 + 5/36 + 5/36 = 11/36 A more direct way of obtaining this result is to use the normalization condition. That is, P (at least one six) = 1 − P (not 6, not 6) = 1 − (5/6) 2 = 1 - 25/36 = 11/36 ~ 0.305…

17 Example. What is the probability of obtaining at least one six in four throws of a die? Solution. We know that in one throw of a die there are two outcomes of interest, with P(6) = 1/6 and P(not 6) = 5/6. Hence, in four throws of a die there are 2⁴ = 16 such outcomes, only one of which has no six; each of the other fifteen mutually exclusive outcomes contains at least one six. We can use the multiplication rule to find that P(not 6, not 6, not 6, not 6) = P(not 6)⁴ = (5/6)⁴, and hence P(at least one six) = 1 − P(not 6, not 6, not 6, not 6) = 1 − (5/6)⁴ = 671/1296 ≈ 0.517.
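The original slides rely on hand calculation here; the following is a minimal Python sketch (not part of the deck, trial count and seed are arbitrary) that checks the "at least one six in four throws" result by Monte Carlo:

```python
import random

# Monte Carlo check of P(at least one six in four throws) = 671/1296 ~ 0.517
random.seed(0)
trials = 200_000
hits = 0
for _ in range(trials):
    throws = [random.randint(1, 6) for _ in range(4)]
    if 6 in throws:
        hits += 1
print(hits / trials)        # typically prints a value close to 0.517
print(1 - (5 / 6) ** 4)     # exact value 671/1296 ~ 0.5177
```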

18 Complications p is the probability of success. (1/6 for one die) q is the probability of failure. (5/6 for one die) p + q = 1, or q = 1 – p When two dice are thrown, what is the probability of getting only one six?

19 Complications The probability of a six on the first die and not on the second is pq = (1/6)(5/6) = 5/36. The probability of a six on the second die and not on the first is the same, so the probability of exactly one six is p(1) = 2pq = 2 × 5/36 = 5/18.

20 Simplification The probability of no sixes coming up is p(0) = q² = (5/6)² = 25/36. The sum of all three probabilities is p(2) + p(1) + p(0) = 1.

21 Simplification p(2) + p(1) + p(0) = 1 p² + 2pq + q² =1 (p + q)² = 1 The exponent is the number of dice (or tries). Is this general?

22 Three Dice (p + q)³ = 1 p³ + 3p²q + 3pq² + q³ = 1 p(3) + p(2) + p(1) + p(0) = 1 It works! It must be general! (p + q) N = 1

23 Renormalization Suppose we know that P(i) is proportional to f(i), where f(i) is a known function. To obtain the normalized probabilities, we divide each f(i) by the sum of all the unnormalized probabilities. That is, if P(i) ∝ f(i) and Z = ∑ f(i), then P(i) = f(i)/Z. This procedure is called normalization.

24 Example. Suppose that in a given class it is three times as likely to receive a C as an A, twice as likely to obtain a B as an A, one-fourth as likely to be assigned a D as an A, and nobody fails the class. What are the probabilities of getting each grade? Solution. We first assign the unnormalized probability of receiving an A as f (A) = 1. Then f (B ) = 2, f (C ) = 3, and f (D) = 0.25. Then Z = ∑ f (i) = 1 + 2 + 3 + 0.25 = 6.25. Hence, P (A) = f (A)/Z = 1/6.25 = 0.16, P (B ) = 2/6.25 = 0.32, P (C ) = 3/6.25 = 0.48, and P (D) = 0.25/6.25 = 0.04.
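A short Python sketch of the same normalization step (not from the original slides; the dictionary of unnormalized weights simply mirrors the grade example above):

```python
# Unnormalized weights f(i) for the grade example; Z is the normalization sum.
f = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 0.25}
Z = sum(f.values())
P = {grade: w / Z for grade, w in f.items()}
print(P)   # {'A': 0.16, 'B': 0.32, 'C': 0.48, 'D': 0.04}
```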

25 Meaning of Probability How can we assign the probabilities of the various events? If we say that event E1 is more probable than event E2 (P (E1 ) > P (E2 )), we mean that E1 is more likely to occur than E2. This statement of our intuitive understanding of probability illustrates that probability is a way of classifying the plausibility of events under conditions of uncertainty. Probability is related to our degree of belief in the occurrence of an event. Probability assessments depend on who does the evaluation and the status of the information the evaluator has at the moment of the assessment. We always evaluate the conditional probability, that is, the probability of an event E given the information I, P (E | I ). Consequently, several people can have simultaneously different degrees of belief about the same event, as is well known to investors in the stock market.

26 IHSG (Jakarta Composite Index chart)

27 If rational people have access to the same information, they should come to the same conclusion about the probability of an event. The idea of a coherent bet forces us to make probability assessments that correspond to our belief in the occurrence of an event. Probability assessments should be kept separate from decision issues. Decisions depend not only on the probability of the event, but also on the subjective importance of, say, a given amount of money.

28 Probability and Knowledge Probability as a measure of the degree of belief in the occurrence of an outcome implies that probability depends on our prior knowledge, because belief depends on prior knowledge. Probability depends on what knowledge we bring to the problem. If we have no knowledge other than the possible outcomes, then the best estimate is to assume equal probability for all events. However, this assumption is not a definition, but an example of belief. As an example of the importance of prior knowledge, consider the following problem.

29 Large numbers We can estimate probabilities empirically by sampling, that is, by making repeated measurements of the outcome of independent events. Intuitively we believe that if we perform more and more measurements, the calculated average will approach the exact mean of the quantity of interest. This idea is called the law of large numbers. A computer can be used to generate random numbers and simulate, for example, multiple tosses of a single coin, as in the sketch below.
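A minimal Python sketch of the law of large numbers (not part of the original slides; the sample sizes and seed are arbitrary choices):

```python
import random

# Running estimate of P(heads) from n tosses of a fair coin; the estimate
# approaches 1/2 as n grows (law of large numbers).
random.seed(1)
for n in (10, 100, 1000, 10_000, 100_000):
    heads = sum(random.randint(0, 1) for _ in range(n))
    print(n, heads / n)
```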

30 Mean Value Consider the probability distribution P(1), P(2), ..., P(n) for the n possible values of the variable x. In many cases it is more convenient to describe the distribution of the possible values of x in a less detailed way. The most familiar way is to specify the average or mean value of x, which we will denote as ⟨x⟩. The definition of the mean value of x is ⟨x⟩ ≡ x₁P(1) + x₂P(2) + ... + xₙP(n) = ∑ xᵢ P(i), where P(i) is the probability of xᵢ. If f(x) is a function of x, then the mean value of f(x) is defined by ⟨f(x)⟩ ≡ ∑ f(xᵢ) P(i).

31 Example: You are offered either a certain $50, or $100 if you flip a coin and get a head and $0 if you get a tail. The mean value for the second choice is mean value = ∑ P(i) × (value of i), where the sum is over the possible outcomes and P(i) is the probability of outcome i. In this case the mean value is 1/2 × $100 + 1/2 × $0 = $50. We see that the two choices have the same mean value. (Most people prefer the first choice because the outcome is "certain.")

32 If f(x) and g(x) are any two functions of x, then ⟨f(x) + g(x)⟩ = ∑ [f(xᵢ) + g(xᵢ)] P(i) = ∑ f(xᵢ) P(i) + ∑ g(xᵢ) P(i), or ⟨f(x) + g(x)⟩ = ⟨f(x)⟩ + ⟨g(x)⟩. If c is a constant, then ⟨c f(x)⟩ = c ⟨f(x)⟩. In general, we can define the mth moment of the probability distribution P as ⟨xᵐ⟩ ≡ ∑ xᵢᵐ P(i), where we have let f(x) = xᵐ. The mean of x is the first moment of the probability distribution.

33 The mean value of x is a measure of the central value of x about which the various values of xᵢ are distributed. If we measure x from its mean, we have Δx ≡ x − ⟨x⟩ and ⟨Δx⟩ = ⟨x − ⟨x⟩⟩ = ⟨x⟩ − ⟨x⟩ = 0. That is, the average value of the deviation of x from its mean vanishes. If only one outcome j were possible, we would have P(i) = 1 for i = j and zero otherwise, that is, the probability distribution would have zero width. In general, there is more than one outcome, and a possible measure of the width of the probability distribution is given by ⟨(Δx)²⟩ ≡ ⟨(x − ⟨x⟩)²⟩. The quantity ⟨(Δx)²⟩ is known as the dispersion or variance, and its square root is called the standard deviation. It is easy to see that the larger the spread of values of x about ⟨x⟩, the larger the variance.

34 The use of the square of x − ⟨x⟩ ensures that the contributions of x values that are smaller and larger than ⟨x⟩ enter with the same sign. A useful form for the variance can be found by noting that ⟨(x − ⟨x⟩)²⟩ = ⟨x² − 2x⟨x⟩ + ⟨x⟩²⟩ = ⟨x²⟩ − 2⟨x⟩² + ⟨x⟩² = ⟨x²⟩ − ⟨x⟩². Because ⟨(Δx)²⟩ is always nonnegative, it follows that ⟨x²⟩ ≥ ⟨x⟩². It is useful to interpret the width of the probability distribution in terms of the standard deviation σ, which is defined as the square root of the variance. The standard deviation of the probability distribution P(x) is given by σ_x = √⟨(Δx)²⟩ = √(⟨x²⟩ − ⟨x⟩²).

35 Example: Find the mean value ⟨x⟩, the variance ⟨(Δx)²⟩, and the standard deviation σ_x for the value of a single throw of a die. Solution. Because P(i) = 1/6 for i = 1, ..., 6, we have ⟨x⟩ = 1/6 (1+2+3+4+5+6) = 7/2, ⟨x²⟩ = 1/6 (1+4+9+16+25+36) = 91/6, ⟨(Δx)²⟩ = ⟨x²⟩ − ⟨x⟩² = 91/6 − 49/4 = 35/12 ≈ 2.92, and σ_x = √2.92 ≈ 1.71.
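A quick Python check of these numbers (not part of the original slides; it simply evaluates the sums above):

```python
# Exact mean, variance, and standard deviation for one throw of a fair die.
faces = range(1, 7)
P = 1 / 6
mean = sum(x * P for x in faces)                # 7/2
mean_sq = sum(x**2 * P for x in faces)          # 91/6
variance = mean_sq - mean**2                    # 35/12 ~ 2.92
print(mean, mean_sq, variance, variance**0.5)   # 3.5  15.17  2.92  1.71
```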

36 Homework Consider a one-dimensional lattice with lattice constant a, as shown in Fig. 1. An atom hops from a site to a nearest-neighbor site every τ seconds. The probabilities of hopping to the right and to the left are p and q = 1 − p, respectively. (a) Calculate the average position ⟨x⟩ of the atom at the time t = Nτ, where N >> 1. (b) Calculate the mean square deviation ⟨(x − ⟨x⟩)²⟩ at the time t.

37 Ensemble Another way of estimating the probability is to perform a single measurement on many copies or replicas of the system of interest. For example, instead of flipping a single coin 100 times in succession, we collect 100 coins and flip all of them at the same time. The fraction of coins that show heads is an estimate of the probability of that event. The collection of identically prepared systems is called an ensemble and the probability of occurrence of a single event is estimated with respect to this ensemble. The ensemble consists of a large number M of identical systems, that is, systems that satisfy the same known conditions.

38 Information and Uncertainty Let us define the uncertainty function S(P₁, P₂, ..., Pᵢ, ...), where Pᵢ is the probability of event i. Consider first the case where all the probabilities Pᵢ are equal. Then P₁ = P₂ = ... = Pᵢ = 1/Ω, where Ω is the total number of outcomes. In this case we have S = S(1/Ω, 1/Ω, ...), or simply S(Ω). For only one outcome, Ω = 1 and there is no uncertainty, S(Ω = 1) = 0, and S(Ω₁) > S(Ω₂) if Ω₁ > Ω₂. That is, S(Ω) is an increasing function of Ω.

39 We next consider multiple events. For example, suppose that we throw a die with Ω₁ outcomes and flip a coin with Ω₂ equally probable outcomes. The total number of outcomes is Ω = Ω₁Ω₂. If the result of the die is known, the uncertainty associated with the die is reduced to zero, but there still is uncertainty associated with the toss of the coin. Similarly, we can reduce the uncertainty in the reverse order, but the total uncertainty is still nonzero. These considerations suggest that S(Ω₁Ω₂) = S(Ω₁) + S(Ω₂), or S(xy) = S(x) + S(y). This generalization is consistent with S(Ω) being an increasing function of Ω.

40 First we take the partial derivative of S(xy) with respect to x and then with respect to y. We let z = xy and, from S(xy) = S(x) + S(y), obtain ∂S(z)/∂x = y dS(z)/dz = dS(x)/dx and ∂S(z)/∂y = x dS(z)/dz = dS(y)/dy.

41 By comparing the right-hand sides: if we multiply the first equation by x and the second by y, we obtain x dS(x)/dx = xy dS(z)/dz = y dS(y)/dy. The first term depends only on x and the last term depends only on y. Because x and y are independent variables, the three terms must be equal to a constant. Hence we have the desired condition x dS(x)/dx = y dS(y)/dy = A, where A is a constant.

42 The condition dS(Ω)/dΩ = A/Ω can be integrated to give S(Ω) = A ln Ω + B. The integration constant B must be equal to zero to satisfy the condition S(Ω = 1) = 0. The constant A is arbitrary, so we choose A = 1. Hence, for equal probabilities we have S(Ω) = ln Ω. What about the case where the probabilities for the various events are unequal? The general form of the uncertainty S is S = −∑ Pᵢ ln Pᵢ.

43 Note that if all the probabilities are equal, then Pᵢ = 1/Ω for all i. In this case S = −∑ (1/Ω) ln(1/Ω) = Ω (1/Ω) ln Ω = ln Ω, as before. We also see that if outcome j is certain, Pⱼ = 1 and Pᵢ = 0 for i ≠ j, so S = −1 ln 1 = 0. That is, if the outcome is certain, the uncertainty is zero and there is no missing information. We have shown that if the Pᵢ are known, then the uncertainty or missing information S can be calculated.

44 Usually the problem is to determine the probabilities. Suppose we flip a perfect coin, for which there are two possibilities. We know intuitively that P₁ (heads) = P₂ (tails) = 1/2. That is, we would not assign a different probability to each outcome unless we had information to justify it. Intuitively we have adopted the principle of least bias or maximum uncertainty. Let's reconsider the toss of a coin. In this case S is given by S = −[P₁ ln P₁ + (1 − P₁) ln(1 − P₁)], where we have used the fact that P₁ + P₂ = 1. To maximize S we take the derivative with respect to P₁, using d(ln x)/dx = 1/x.

45 The solution satisfies dS/dP₁ = −ln P₁ + ln(1 − P₁) = ln[(1 − P₁)/P₁] = 0, which is satisfied by P₁ = 1/2. We can check that this solution is a maximum by calculating the second derivative: d²S/dP₁² = −1/P₁ − 1/(1 − P₁) = −4 at P₁ = 1/2, which is less than zero as expected for a maximum.

46 Example. The toss of a three-sided die yields events E₁, E₂, and E₃ with a face of one, two, and three points. As a result of tossing many dice, we learn that the mean number of points is f = 1.9, but we do not know the individual probabilities. What are the values of P₁, P₂, and P₃ that maximize the uncertainty? Solution. We have S = −[P₁ ln P₁ + P₂ ln P₂ + P₃ ln P₃]. We also know that f = 1P₁ + 2P₂ + 3P₃ and P₁ + P₂ + P₃ = 1. We use the latter condition to eliminate P₃ using P₃ = 1 − P₁ − P₂, and rewrite the constraint as f = P₁ + 2P₂ + 3(1 − P₁ − P₂) = 3 − 2P₁ − P₂. We then use this to eliminate P₂ and P₃ using P₂ = 3 − f − 2P₁ and P₃ = f − 2 + P₁, so that S = −[P₁ ln P₁ + (3 − f − 2P₁) ln(3 − f − 2P₁) + (f − 2 + P₁) ln(f − 2 + P₁)]. Because S depends on only P₁, we can differentiate S with respect to P₁ to find its maximum value: dS/dP₁ = −[ln P₁ − 2 ln(3 − f − 2P₁) + ln(f − 2 + P₁)] = 0, which gives P₁(f − 2 + P₁) = (3 − f − 2P₁)². For f = 1.9 this yields P₁ ≈ 0.38, P₂ ≈ 0.33, and P₃ ≈ 0.28 (see the numerical sketch below).
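A small Python sketch (not from the original slides) that maximizes S(P₁) numerically for f = 1.9 by a simple grid search; the grid size is an arbitrary choice:

```python
import math

# Maximize S(P1) = -[P1 ln P1 + P2 ln P2 + P3 ln P3] with the constraints
# P2 = 3 - f - 2*P1 and P3 = f - 2 + P1 (so that <points> = f and sum = 1).
f = 1.9

def S(P1):
    P2 = 3 - f - 2 * P1
    P3 = f - 2 + P1
    if min(P1, P2, P3) <= 0:          # outside the allowed range
        return float("-inf")
    return -sum(p * math.log(p) for p in (P1, P2, P3))

P1 = max((i / 100_000 for i in range(1, 100_000)), key=S)
print(P1, 3 - f - 2 * P1, f - 2 + P1)   # roughly 0.38, 0.33, 0.28
```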

47 Microstates and Macrostates Each possible outcome is called a “microstate”. The combination of all microstates that give the same number of spots is called a “macrostate”. The macrostate that contains the most microstates is the most probable to occur.

48 Microstates and Macrostates Macrostate: the state of a macro system specified by its macroscopic parameters. Two systems with the same values of the macroscopic parameters are thermodynamically indistinguishable. A macrostate tells us nothing about the state of an individual particle. For a given set of constraints (conservation laws), a system can be in many macrostates. Microstate: the state of a system specified by describing the quantum state of each molecule in the system. For a classical particle – 6 parameters (xᵢ, yᵢ, zᵢ, p_xi, p_yi, p_zi); for a macro system – 6N parameters. The statistical approach: to connect the macroscopic observables (averages) to the probability for a certain microstate to appear along the system's trajectory in configuration space. The evolution of a system can be represented by a trajectory in the multidimensional (configuration, phase) space of micro-parameters. Each point in this space represents a microstate. During its evolution, the system will only pass through accessible microstates – the ones that do not violate the conservation laws: e.g., for an isolated system, the total internal energy must be conserved.

49 The Phase Space vs. the Space of Macroparameters [Figure: in the space of macroparameters (P, V, T), a macrostate is a point on the surface defined by an equation of state; in the multi-dimensional configuration (phase) space, numerous microstates correspond to the same macrostate.]

50 Examples: Two-Dimensional Configuration Space Motion of a particle in a one-dimensional box: the phase space is spanned by the position x ∈ (−L, L) and the momentum p_x. "Macrostates" are characterized by a single parameter, the kinetic energy K₀. Each "macrostate" corresponds to a continuum of microstates, which are characterized by specifying the position and momentum along the curve K = K₀. Another example: the one-dimensional harmonic oscillator with potential U(x), where the microstates with a given energy lie on the curve K + U = const in the (x, p_x) plane.

51 The Fundamental Assumption of Statistical Mechanics The ergodic hypothesis: an isolated system in an equilibrium state, evolving in time, will pass through all the accessible microstates at the same recurrence rate, i.e. all accessible microstates are equally probable. The average over long times will equal the average over the ensemble of all equi-energetic microstates: if we take a snapshot of a system with N microstates, we will find the system in any of these microstates with the same probability. Probability for a stationary system: many identical measurements on a single system are equivalent to a single measurement on many copies of the system. The ensemble of all equi-energetic states is called a microcanonical ensemble. Note that the assumption that a system is isolated is important. If a system is coupled to a heat reservoir and is able to exchange energy, then in order to replace the system's trajectory by an ensemble, we must determine the relative occurrence of states with different energies. For example, an ensemble whose states' recurrence rate is given by their Boltzmann factor e^(−E/k_B T) is called a canonical ensemble.

52 Probability of a Macrostate, Multiplicity The probability of a certain macrostate is determined by how many microstates correspond to this macrostate – the multiplicity Ω of the given macrostate. This approach will help us to understand why some macrostates are more probable than others, and, eventually, by considering interacting systems, we will understand the irreversibility of processes in macroscopic systems.

53 Probability An event (very loosely defined) – any possible outcome of some measurement. An event is a statistical (random) quantity if the probability of its occurrence, P, in the process of measurement is < 1. The "sum" of two events: in the process of measurement, we observe either one of the events. Addition rule for mutually exclusive events: P(i or j) = P(i) + P(j). The "product" of two events: in the process of measurement, we observe both events. Multiplication rule for independent events: P(i and j) = P(i) × P(j) (independent events – one event does not change the probability for the occurrence of the other). "Probability theory is nothing but common sense reduced to calculations" – Laplace (1819). Example: What is the probability of the same face appearing on two successive throws of a die? The probability of any specific combination, e.g., (1,1), is 1/6 × 1/6 = 1/36 (multiplication rule). Hence, by the addition rule, P(same face) = P(1,1) + P(2,2) + ... + P(6,6) = 6 × 1/36 = 1/6. A macroscopic observable A is obtained by averaging the corresponding microscopic quantity over all accessible microstates.

54 Two Interacting Einstein Solids, Macropartitions Suppose that we bring two Einstein solids A and B (two sub-systems with N_A, U_A and N_B, U_B) into thermal contact, to form a larger isolated system. What happens to these solids (macroscopically) after they have been brought into contact? The combined system: N = N_A + N_B, U = U_A + U_B. As time passes, the system of two solids will randomly shift between different microstates consistent with the constraint that U = const. Macropartition: a given pair of macrostates for sub-systems A and B that are consistent with conservation of the total energy U = U_A + U_B. Different macropartitions amount to different ways that the energy can be macroscopically divided between the sub-systems. Example: the pair of macrostates where U_A = 2ε and U_B = 4ε is one possible macropartition of the combined system with U = 6ε. Question: what would be the most probable macropartition for given N_A, N_B, and U?

55 Problem: Consider the system consisting of two Einstein solids in thermal contact. A certain macropartition has a multiplicity of 6 × 10^1024, while the total number of microstates available to the system in all macropartitions is 3 × 10^1034. What is the probability of finding the system in this macropartition? Imagine that the system is initially in the macropartition with a multiplicity of 6 × 10^1024. Consider another macropartition of the same system with a multiplicity of 6 × 10^1026. If we look at the system a short time later, how many times more likely is it to have moved to the second macropartition than to have stayed with the first?

56 The Multiplicity of Two Sub-Systems Combined Example: two one-atom "solids" brought into thermal contact, with the total U = 6ε. Possible macropartitions for N_A = N_B = 3, U = q_A + q_B = 6ε:

Macropartition (q_A : q_B) | U_A | U_B | Ω_A | Ω_B | Ω_AB
0 : 6  |  0   |  6ε  |  1  | 28 |  28
1 : 5  |  1ε  |  5ε  |  3  | 21 |  63
2 : 4  |  2ε  |  4ε  |  6  | 15 |  90
3 : 3  |  3ε  |  3ε  | 10  | 10 | 100
4 : 2  |  4ε  |  2ε  | 15  |  6 |  90
5 : 1  |  5ε  |  1ε  | 21  |  3 |  63
6 : 0  |  6ε  |  0   | 28  |  1 |  28

Grand total number of microstates: 462. The probability of a macropartition is proportional to its multiplicity: P(macropartition) = Ω_AB / Ω_total, where Ω_AB = Ω_A Ω_B is the multiplicity of the combined system and Ω_A, Ω_B are the multiplicities of sub-systems A and B.
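A Python sketch (not part of the original slides) that reproduces this table using the Einstein-solid multiplicity Ω(N, q) = C(q + N − 1, q):

```python
from math import comb

# Multiplicity of an Einstein solid with N oscillators and q energy units.
def omega(N, q):
    return comb(q + N - 1, q)

N_A = N_B = 3
U = 6
total = 0
for q_A in range(U + 1):
    q_B = U - q_A
    w = omega(N_A, q_A) * omega(N_B, q_B)
    total += w
    print(q_A, q_B, omega(N_A, q_A), omega(N_B, q_B), w)
print("grand total:", total)   # 462
```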

57 Where is the Maximum? The Average Energy per Atom Let's explore how the macropartition multiplicity for two sub-systems A and B (N_A, N_B, ε_A = ε_B = ε) in thermal contact depends on the energy of one of the sub-systems. In the high-T limit (q >> N), for two identical sub-systems (N_A = N_B), Ω_AB(U_A) is peaked at U_A = U_B = ½U. For two systems in thermal contact, the equilibrium (most probable) macropartition of the combined system is the one where the average energy per atom in each system is the same (the basis for introducing the temperature). At home: find the position of the maximum of Ω_AB(U_A) for N_A = 200, N_B = 100, U = 180ε.

58 Sharpness of the Multiplicity Function How sharp is the peak? Let's consider small deviations from the maximum for two identical sub-systems: U_A = (U/2)(1 + x), U_B = (U/2)(1 − x), with x << 1. Example: N = 100,000 and x = 0.01 gives a relative multiplicity of order (0.9999)^100,000 ≈ 4.5·10⁻⁵ << 1. More rigorously (p. 65), near the maximum Ω_AB(U_A) has the form of a Gaussian function whose relative width decreases as 1/√N. When the system becomes large, the probability as a function of U_A (macropartition) becomes very sharply peaked.

59 Problem: Consider the system consisting of two Einstein solids P and Q in thermal equilibrium. Assume that we know the number of atoms in each solid and ε. What do we know if we also know (a) the quantum state of each atom in each solid? (b) the total energy of each of the two solids? (c) the total energy of the combined system? Answers: (a) the system's microstate, and hence also its macropartition and macrostate; (b) the system's macropartition and macrostate, but not its microstate; (c) only the system's macrostate – the macropartition is known only up to fluctuations about the most probable one, and the microstate is not known.

60 Implications? Irreversibility! When two macroscopic solids are in thermal equilibrium with each other, completely random and reversible microscopic processes (leading to random shuffling between microstates) tend at the macroscopic level to push the solids inevitably toward an equilibrium macropartition (an irreversible macro behavior). Any random fluctuations away from the most likely macropartition are extremely small! The vast majority of microstates are in macropartitions close to the most probable one (in other words, because of the "narrowness" of the macropartition probability graph). Thus, (a) if the system is not in the most probable macropartition, it will rapidly and inevitably move toward that macropartition. The reason for this "directionality" (irreversibility): there are far more microstates in that direction than away. This is why energy flows from "hot" to "cold" and not vice versa. (b) It will subsequently stay at that macropartition (or very near to it), in spite of the random shuffling of energy back and forth between the two solids.

61 Problem: Imagine that you discover a strange substance whose multiplicity is always 1, no matter how much energy you put into it. If you put an object made of this substance (sub-system A) into thermal contact with an Einstein solid having the same number of atoms but much more energy (sub-system B), what will happen to the energies of these sub-systems? A. Energy flows from B to A until they have the same energy. B. Energy flows from A to B until A has no energy. C. No energy will flow from B to A at all.

62 Two model systems with fixed positions of particles and discrete energy levels. The models are attractive because they can be described in terms of discrete microstates which can be easily counted (for a continuum of microstates, as in the example with a freely moving particle, we still need to learn how to do this). This simplifies the calculation of Ω. On the other hand, the results will be applicable to many other, more complicated models. Despite the simplicity of the models, they describe a number of experimental systems in a surprisingly precise manner. – the two-state paramagnet ("limited" energy spectrum) – the Einstein model of a solid ("unlimited" energy spectrum)

63 The Two-State Paramagnet A system of non-interacting magnetic dipoles in an external magnetic field B; each dipole can have only two possible orientations along the field, either parallel or anti-parallel to this axis (e.g., a particle with spin ½). There are no "quadratic" degrees of freedom (unlike in an ideal gas, where the kinetic energies of molecules are unlimited); the energy spectrum of the particles is confined within a finite interval of E (just two allowed energy levels). N↑ – the number of "up" spins, N↓ – the number of "down" spins, µ – the magnetic moment of an individual dipole (spin). The energy of a single dipole in the external magnetic field is E₁ = −µB for µ parallel to B and E₂ = +µB for µ anti-parallel to B (an arbitrary choice of zero energy). A particular microstate (↑↓↓↑↑...) is specified if the directions of all spins are specified. A macrostate is specified by the total number of dipoles that point "up", N↑ (the number of dipoles that point "down" is N↓ = N − N↑). The total magnetic moment (a macroscopic observable) is M = µ(N↑ − N↓), and the energy of a macrostate is U = µB(N↓ − N↑) = −µB(N↑ − N↓).

64 Example Consider two spins. There are four possible microstate configurations: ↑↑, ↑↓, ↓↑, ↓↓, with total moments M = 2µ, 0, 0, −2µ. In zero field, all these microstates have the same energy (degeneracy). Note that the two microstates with M = 0 have the same energy even when B ≠ 0: they belong to the same macrostate, which has multiplicity Ω = 2. The macrostates can be classified by their moment M and multiplicity Ω: M = 2µ, 0, −2µ with Ω = 1, 2, 1. For three spins the eight microstates have moments M = 3µ, µ, µ, µ, −µ, −µ, −µ, −3µ, and the macrostates are M = 3µ, µ, −µ, −3µ with Ω = 1, 3, 3, 1.

65 The Multiplicity of a Two-State Paramagnet Each of the microstates is characterized by N numbers; the number of equally probable microstates is 2^N, and the probability of being in a particular microstate is 1/2^N. (n! ≡ n factorial = 1·2·...·n; 0! ≡ 1, since there is exactly one way to arrange zero objects.) For a two-state paramagnet in zero field, the energy of all macrostates is the same (0). A macrostate is specified by (N, N↑). Its multiplicity – the number of ways of choosing N↑ objects out of N – is Ω(N, N↑) = N! / (N↑! N↓!) = N! / (N↑! (N − N↑)!).

66 The Probability of Macrostates of a Two-State Paramagnet (B = 0) (http://stat-www.berkeley.edu/~stark/Java/Html/) As the system becomes larger, the P(N, N↑) graph becomes more sharply peaked: for N = 1, Ω(1, N↑) = 1, 2^N = 2, and P(1, N↑) = 0.5. [Figures: P(1, N↑), P(15, N↑), and P(10²³, N↑) versus N↑.] Random orientation of spins in B = 0 is overwhelmingly more probable.

67 Bernoulli Processes and the Binomial Distribution Because most physicists spend little time gambling, we will have to develop our intuitive understanding of probability in other ways. Our strategy will be to first consider some physical systems, e.g., magnetic moments or spins, for which we can calculate the probability distribution by analytical methods. Then we will use the computer to generate more data to analyze.

68 Consider a system of N noninteracting magnetic dipoles each having a magnetic moment µ and associated spin in an external magnetic field B. The field B is in the up (+z) direction. According to quantum mechanics the component of the magnetic dipole moment along a given axis is limited to certain discrete values. Spin 1/2 implies that a spin can either point up (parallel to B ) or down (antiparallel to B ). The energy of interaction of each spin with the magnetic field is E = −µB if the spin is up and +µB if the spin is down. This model is a simplification of more realistic magnetic systems.

69 Take p to be the probability that the spin (magnetic moment) is up and q the probability that the spin is down. Because there are no other possible outcomes, we have p + q = 1, or q = 1 − p. If B = 0, there is no preferred spatial direction and p = q = 1/2. For B ≠ 0 we do not yet know how to calculate p, and for now we will assume that p is a known parameter. We associate with each spin a random variable sᵢ which has the values ±1 with probability p and q, respectively. One of the quantities of interest is the magnetization M, which is the net magnetic moment of the system. For a system of N spins the magnetization is given by M = µ(s₁ + s₂ + ... + s_N) = µ ∑ sᵢ; in the following we take µ = 1.

70 We first calculate the mean value of M, then its variance, and finally the probability distribution P(M) that the system has magnetization M. To compute the mean value of M, we need to take the mean values of both sides: ⟨M⟩ = ⟨∑ sᵢ⟩ = ∑ ⟨sᵢ⟩ (with µ = 1).

71 Because the probability that any spin has the value ±1 is the same for each spin, the mean value of each spin is the same, that is, ⟨s₁⟩ = ⟨s₂⟩ = ... = ⟨s_N⟩ ≡ ⟨s⟩. Therefore the sum consists of N equal terms and can be written as ⟨M⟩ = N⟨s⟩. The meaning of this equation is that the mean magnetization is N times the mean magnetization of a single spin. Because ⟨s⟩ = (1 × p) + (−1 × q) = p − q, we have ⟨M⟩ = N(p − q). Let us calculate the variance of M, that is, ⟨(ΔM)²⟩. We write ΔM = M − ⟨M⟩ = ∑ Δsᵢ, where Δsᵢ ≡ sᵢ − ⟨s⟩.

72 Example: let us calculate ⟨(ΔM)²⟩ for N = 3 spins. Solution: (ΔM)² = (Δs₁ + Δs₂ + Δs₃)(Δs₁ + Δs₂ + Δs₃) = [(Δs₁)² + (Δs₂)² + (Δs₃)²] + 2[Δs₁Δs₂ + Δs₁Δs₃ + Δs₂Δs₃]. We take the mean value, interchange the order of the sums and averages, and write ⟨(ΔM)²⟩ = [⟨(Δs₁)²⟩ + ⟨(Δs₂)²⟩ + ⟨(Δs₃)²⟩] + 2[⟨Δs₁Δs₂⟩ + ⟨Δs₁Δs₃⟩ + ⟨Δs₂Δs₃⟩]. Because different spins are statistically independent (the spins do not interact), each cross term vanishes on the average: ⟨ΔsᵢΔsⱼ⟩ = ⟨Δsᵢ⟩⟨Δsⱼ⟩ = 0 for i ≠ j, because ⟨Δsᵢ⟩ = 0. Then ⟨(ΔM)²⟩ = ⟨(Δs₁)²⟩ + ⟨(Δs₂)²⟩ + ⟨(Δs₃)²⟩. Because each spin is equivalent on the average, each term is equal. Hence, we obtain the desired result ⟨(ΔM)²⟩ = 3⟨(Δs)²⟩. The variance of M is 3 times the variance of a single spin, that is, the variance is additive.

73 We can evaluate ⟨(ΔM)²⟩ further by finding an explicit expression for ⟨(Δs)²⟩. We have ⟨s²⟩ = [1² × p] + [(−1)² × q] = p + q = 1. Hence ⟨(Δs)²⟩ = ⟨s²⟩ − ⟨s⟩² = 1 − (p − q)² = 1 − (2p − 1)² = 1 − 4p² + 4p − 1 = 4p(1 − p) = 4pq, so ⟨(ΔM)²⟩ = 3(4pq) for three spins, and for N non-interacting spins ⟨(ΔM)²⟩ = N(4pq).
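A short Python simulation sketch (not from the original slides; N, p, sample size, and seed are arbitrary choices) that checks ⟨M⟩ = N(p − q) and ⟨(ΔM)²⟩ = 4Npq:

```python
import random

# Sample the magnetization M = sum(s_i) of N independent spins with
# P(s = +1) = p and P(s = -1) = q = 1 - p, and compare with the exact
# results <M> = N(p - q) and <(dM)^2> = 4Npq (mu = 1).
random.seed(2)
N, p, samples = 100, 0.6, 20_000
q = 1 - p
Ms = []
for _ in range(samples):
    M = sum(1 if random.random() < p else -1 for _ in range(N))
    Ms.append(M)
mean = sum(Ms) / samples
var = sum((M - mean) ** 2 for M in Ms) / samples
print(mean, N * (p - q))    # ~20 vs 20
print(var, 4 * N * p * q)   # ~96 vs 96
```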

74 Because of the simplicity of a system of non-interacting spins, we can calculate the probability distribution itself. Let us consider the statistical properties of a system of N = 3 non-interacting spins. Because each spin can be in one of two states, there are 2^N = 2³ = 8 distinct outcomes. Because each spin is independent of the other spins, we can use the multiplication rule to calculate the probabilities of each outcome. Although each outcome is distinct, several of the configurations have the same number of up spins. One quantity of interest is the probability P_N(n) that n spins are up out of a total of N spins. For example, there are three states with n = 2, each with probability p²q, so the probability that two spins are up is 3p²q.

75 For N = 3 we see from the figure that P₃(n = 3) = p³, P₃(n = 2) = 3p²q, P₃(n = 1) = 3pq², P₃(n = 0) = q³. The same counting can be carried out for N = 4, N = 6, and general N.

76 Example: Find the first two moments of P₃(n). Solution. The first moment ⟨n⟩ of the distribution is given by ⟨n⟩ = 0 × q³ + 1 × 3pq² + 2 × 3p²q + 3 × p³ = 3p(q² + 2pq + p²) = 3p(q + p)² = 3p. Similarly, the second moment of the distribution is given by ⟨n²⟩ = 0 × q³ + 1² × 3pq² + 2² × 3p²q + 3² × p³ = 3p(q² + 4pq + 3p²) = 3p(q + 3p)(q + p) = 3p(q + 3p) = (3p)² + 3pq. Hence ⟨(Δn)²⟩ = ⟨n²⟩ − ⟨n⟩² = 3pq.

77 First, in each trial there are only two outcomes, for example, up or down, heads or tails, and right or left. Second, the result of each trial is independent of all previous trials, for example, the drunken sailor has no memory of his or her previous steps. This type of process is called a Bernoulli process (after the mathematician Jacob Bernoulli, 1654- 1705)

78 Because of the importance of magnetic systems, we will cast our discussion of Bernoulli processes in terms of the non-interacting magnetic moments of spin ½. The main quantity of interest is the probability P_N(n), which we now calculate for arbitrary N and n. We know that a particular outcome with n up spins and n′ down spins occurs with probability pⁿ q^n′. We write the probability P_N(n) as P_N(n) = W_N(n, n′) pⁿ q^n′, where n′ = N − n and W_N(n, n′) is the number of distinct configurations of N spins with n up spins and n′ down spins. From our discussion of N = 3 non-interacting spins, we already know the first several values of W_N(n, n′).

79 We can determine the general form of W_N(n, n′) by obtaining a recursion relation between W_N and W_{N−1}. A total of n up spins and n′ down spins out of N total spins can be found by adding one spin to N − 1 spins. The additional spin is either (a) up, if there are (n − 1) up spins and n′ down spins, or (b) down, if there are n up spins and (n′ − 1) down spins. Because there are W_{N−1}(n − 1, n′) ways of reaching the first case and W_{N−1}(n, n′ − 1) ways of reaching the second, we obtain the recursion relation W_N(n, n′) = W_{N−1}(n − 1, n′) + W_{N−1}(n, n′ − 1). If we begin with the known values W₀(0, 0) = 1, W₁(1, 0) = W₁(0, 1) = 1, we can use the recursion relation to construct W_N(n, n′) for any desired N. For example, W₂(2, 0) = W₁(1, 0) + W₁(2, −1) = 1 + 0 = 1, W₂(1, 1) = W₁(0, 1) + W₁(1, 0) = 1 + 1 = 2, W₂(0, 2) = W₁(−1, 2) + W₁(0, 1) = 0 + 1 = 1. This shows that W_N(n, n′) forms a (Pascal) triangle.
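A Python sketch (not part of the original slides) that builds the W_N(n, n′) coefficients from exactly this recursion; the list W stores W_N(n, N − n) for n = 0, ..., N:

```python
# Build W_N(n, n') = W_{N-1}(n-1, n') + W_{N-1}(n, n'-1) row by row
# (Pascal's triangle); W[n] stores W_N(n, N - n).
N_max = 6
W = [1]                       # N = 0: W_0(0, 0) = 1
for N in range(1, N_max + 1):
    W = [1] + [W[n - 1] + W[n] for n in range(1, N)] + [1]
    print(N, W)               # e.g. N = 4 -> [1, 4, 6, 4, 1]
```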

80 The values of the first few coefficients W N (n, n′ ). Each number is the sum of the two numbers to the left and right above it. This construction is called a Pascal triangle.

81 It is straightforward to show by induction that W_N(n, n′) is given by the binomial coefficient W_N(n, n′) = N! / (n! n′!) = N! / (n! (N − n)!). Note the convention 0! = 1. This gives the binomial distribution P_N(n) = [N! / (n! (N − n)!)] pⁿ q^(N−n).

82 Binomial Distribution The probability of n successes in N attempts is P_N(n) = [N! / (n! (N − n)!)] pⁿ q^(N−n), where q = 1 − p; summing over n gives (p + q)^N = 1.

83 Note that for p = q = 1/2, P_N(n) reduces to P_N(n) = [N! / (n! (N − n)!)] / 2^N. [Figure: the binomial distribution P₁₆(n) for p = q = 1/2 and N = 16.]

84 Thermodynamic Probability The factor with all the factorials in the previous equation, w_n = N! / (n! (N − n)!), is the number of microstates that lead to the particular macrostate. It is called the "thermodynamic probability", w_n.

85 Microstates The total number of microstates is the sum over all macrostates, ∑_n w_n = 2^N. For a very large number of particles this total is enormous, and (for p = q = 1/2) the microstates are concentrated in macrostates near n = N/2.

86 Mean of Binomial Distribution The mean value of n is ⟨n⟩ = ∑_n n P_N(n) = pN.

87

88 Standard Deviation (σ) The variance of the binomial distribution is ⟨(Δn)²⟩ = ⟨n²⟩ − ⟨n⟩² = Npq, so the standard deviation is σ = √(Npq).

89 Standard Deviation

90

91 For a Binomial Distribution ⟨n⟩ = pN and σ = √(Npq), so the relative width σ/⟨n⟩ = √(q/p)/√N decreases as N^(−1/2).

92 Coins Toss 6 coins. The probability of n heads is P(n) = [6! / (n! (6 − n)!)] (1/2)⁶.
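A small Python sketch (not from the original slides) that evaluates these six-coin probabilities, which underlie the "For Six Coins" plot referenced below:

```python
from math import comb

# P(n heads in 6 tosses of a fair coin) = C(6, n) / 2^6
N = 6
for n in range(N + 1):
    print(n, comb(N, n) / 2**N)
# 0 0.015625, 1 0.09375, 2 0.234375, 3 0.3125, 4 0.234375, ...
```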

93 For Six Coins

94 For 100 Coins

95 For 1000 Coins

96 Math required to bridge the gap between 1 and 10²³ Typically, N is huge for macroscopic systems, and the multiplicity is unmanageably large – for an Einstein solid with 10²³ atoms it is far too large to handle directly. One of the ways to deal with such numbers is to take their logarithm [in fact, the entropy]; thus, we need to learn how to deal with logarithms of huge numbers.

97 Stirling's Approximation for N! (N >> 1) The multiplicity depends on N!, and we need an approximation for ln(N!): ln N! = ∑ ln n ≈ ∫₁^N ln x dx ≈ N ln N − N. More accurately, N! ≈ N^N e^(−N) √(2πN), so ln N! ≈ N ln N − N + ½ ln(2πN) ≈ N ln N − N, because ln N << N for large N.
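A quick Python check of the two Stirling estimates (not part of the original slides; it uses math.lgamma(N + 1) to get the exact ln N!):

```python
import math

# Compare ln(N!) with the Stirling estimates N ln N - N and
# N ln N - N + 0.5*ln(2*pi*N).
for N in (10, 100, 1000):
    exact = math.lgamma(N + 1)            # ln(N!)
    simple = N * math.log(N) - N
    better = simple + 0.5 * math.log(2 * math.pi * N)
    print(N, exact, simple, better)
```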

98 Stirling’s Approximation Multiple outcomes for large N

99 Number Expected Toss 6 coins N times. The probability of n heads in one toss of the 6 coins is P(n) = [6! / (n! (6 − n)!)] (1/2)⁶, so the number of times that n heads is expected to occur in N tosses is N P(n).

100 The Gaussian Distribution as a Limit of the Binomial Distribution For N >> 1, P_N(n) is a rapidly varying function of n near n = pN, and for this reason we do not want to approximate P_N(n) directly. Because the logarithm of P_N(n) is a slowly varying function, we expect that the power series expansion of ln P_N(n) will converge. Hence, we expand ln P_N(n) in a Taylor series about the value n = ñ at which ln P_N(n) reaches its maximum value. We will write p(n) instead of P_N(n) because we will treat n as a continuous variable, and hence p(n) is a probability density. We find ln p(n) = ln p(ñ) + (n − ñ) [d ln p(n)/dn]_{n=ñ} + ½ (n − ñ)² [d² ln p(n)/dn²]_{n=ñ} + ...

101 Because we have assumed that the expansion is about the maximum n = ñ, the first derivative [d ln p(n)/dn]_{n=ñ} must be zero. For the same reason the second derivative must be negative. We assume that the higher terms can be neglected and adopt the notation ln A = ln p(n = ñ) and B = −[d² ln p(n)/dn²]_{n=ñ}. Then ln p(n) ≈ ln A − ½ B (n − ñ)², that is, p(n) ≈ A e^(−B(n−ñ)²/2). We next use Stirling's approximation to evaluate the first two derivatives of ln p(n) and the value of ln p(n) at its maximum, to find the parameters A, B, and ñ.

102 From the binomial distribution we take the logarithm to obtain ln p(n) = ln N! − ln n! − ln(N − n)! + n ln p + (N − n) ln q. Using the relation d(ln n!)/dn ≈ ln n (from Stirling's approximation), we have d(ln p(n))/dn = −ln n + ln(N − n) + ln p − ln q. The most probable value of n is found from the condition d ln p/dn = 0. We find ln[(N − n)p/(nq)] = 0, or (N − ñ)p = ñq. If we use the relation p + q = 1, we obtain ñ = pN. Note that ñ = ⟨n⟩, that is, the value of n for which p(n) is a maximum is also the mean value of n.

103 From d(ln p(n))/dn = −ln n + ln(N − n) + ln p − ln q, the second derivative is d²(ln p(n))/dn² = −1/n − 1/(N − n). At n = ñ = pN this gives B = 1/(pN) + 1/(qN) = 1/(Npq). We then use the normalization condition ∫ p(n) dn = 1 to find A = 1/√(2πσ²), with B = 1/σ², where σ² = Npq is the variance of n. Hence p(n) = (1/√(2πσ²)) e^(−(n−ñ)²/(2σ²)) (Gaussian probability density).

104 The Gaussian probability density is valid for large values of N and for values of n near ⟨n⟩. Even for relatively small values of N, the Gaussian approximation is a good approximation for most values of n. The most important feature of the Gaussian probability distribution is that its relative width, σ_n/⟨n⟩, decreases as N^(−1/2).
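A Python sketch (not from the original slides; N = 100 and the sample points are arbitrary choices) comparing the binomial distribution with its Gaussian approximation:

```python
from math import comb, exp, pi, sqrt

# Compare the binomial P_N(n) with its Gaussian approximation
# (1 / sqrt(2*pi*sigma^2)) * exp(-(n - pN)^2 / (2*sigma^2)), sigma^2 = Npq.
N, p = 100, 0.5
q = 1 - p
sigma2 = N * p * q
for n in (40, 45, 50, 55, 60):
    binom = comb(N, n) * p**n * q**(N - n)
    gauss = exp(-(n - p * N) ** 2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)
    print(n, round(binom, 5), round(gauss, 5))
```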

105 The Poisson Distribution, or Should You Fly in Airplanes? We now return to the question of whether or not it is safe to fly. If the probability of a plane crashing is p = 10⁻⁵, then 1 − p is the probability of surviving a single flight. The probability of surviving N flights is then P_N = (1 − p)^N. For N = 400, P_N ≈ 0.996, and for N = 10⁵, P_N ≈ 0.368. Thus, our intuition is verified: if we took 400 flights, we would have only a small chance of crashing.

106 This type of reasoning is typical when the probability of an individual event is small but there are very many attempts. Suppose we are interested in the probability of the occurrence of n events out of N attempts, given that the probability p of the event for each attempt is very small. The resulting probability is called the Poisson distribution, a distribution that is important in the analysis of experimental data. To derive the Poisson distribution, we begin with the binomial distribution P_N(n) = [N!/(n!(N − n)!)] pⁿ (1 − p)^(N−n).

107 We first use Stirling's approximation to write N!/(N − n)! ≈ N^n for n << N, so that P_N(n) ≈ (N^n/n!) pⁿ (1 − p)^(N−n) = ((pN)ⁿ/n!) (1 − p)^(N−n).

108 For p << 1, we have ln(1 − p) ≈ −p, e^(ln(1−p)) = 1 − p ≈ e^(−p), and (1 − p)^(N−n) ≈ e^(−p(N−n)) ≈ e^(−pN). If we use the above approximations, we find P_N(n) ≈ (⟨n⟩ⁿ/n!) e^(−⟨n⟩), with ⟨n⟩ = pN (Poisson distribution).
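A Python sketch (not part of the original slides; the flight-like numbers N = 10⁵ and p = 10⁻⁵ are chosen to echo the airplane example) showing how the binomial approaches the Poisson distribution when p is small and N is large:

```python
from math import comb, exp, factorial

# For small p and large N the binomial P_N(n) approaches the Poisson
# distribution <n>^n exp(-<n>) / n! with <n> = pN.
N, p = 100_000, 1e-5
mean = p * N
for n in range(4):
    binom = comb(N, n) * p**n * (1 - p) ** (N - n)
    poisson = mean**n * exp(-mean) / factorial(n)
    print(n, binom, poisson)
```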

109 Let us apply the Poisson distribution to the airplane survival problem. We want to know the probability of never crashing, that is, P(n = 0). The mean ⟨n⟩ = pN equals 10⁻⁵ × 400 = 0.004 for N = 400 flights and ⟨n⟩ = 1 for N = 10⁵ flights. Thus, the survival probability is P(0) = e^(−⟨n⟩) ≈ 0.996 for N = 400 and P(0) ≈ 0.368 for N = 10⁵, as we calculated previously. We see that if we fly 100,000 times, we have a much larger probability of dying in a plane crash.

110 Traffic Flow and the Exponential Distribution The Poisson distribution is closely related to the exponential distribution, as we will see in the following. Consider a sequence of similar random events and let t₁, t₂, ... be the times at which each successive event occurs. Examples of such sequences are the successive times when a phone call is received and the times when a Geiger counter registers the decay of a radioactive nucleus. Suppose that we determine the sequence over a very long time T that is much greater than any of the intervals tᵢ − tᵢ₋₁. We also suppose that the average number of events is λ per unit time, so that in a time interval t the mean number of events is λt. Assume that the events occur at random and are independent of each other. Given λ, the mean number of events per unit time, we wish to find the probability distribution w(t) of the interval t between events. We know that if an event occurred at time t = 0, the probability that another event occurs within the interval [0, t] is ∫₀ᵗ w(t′) dt′,

111 and the probability that no event occurs in the interval t is 1 − ∫₀ᵗ w(t′) dt′. Thus the probability that the duration of the interval between two events is between t and t + Δt is given by w(t)Δt = (probability that no event occurs in the interval [0, t]) × (probability that an event occurs in the interval [t, t + Δt]) = [1 − ∫₀ᵗ w(t′) dt′] λΔt. If we cancel Δt from each side and differentiate both sides with respect to t, we find dw/dt = −λw, so that w(t) = A e^(−λt).

112 The constant of integration A is determined from the normalization condition ∫₀^∞ w(t) dt = A/λ = 1, so A = λ. Hence, w(t) is the exponential function w(t) = λ e^(−λt). These results for the exponential distribution lead naturally to the Poisson distribution. Let us divide a long time interval T into n smaller intervals t = T/n. What is the probability that 0, 1, 2, 3, ... events occur in the time interval t, given λ, the mean number of events per unit time? We will show that the probability that n events occur in the time interval t is given by the Poisson distribution P_n(t) = ((λt)ⁿ/n!) e^(−λt).

113 We first consider the case n = 0. If n = 0, the probability that no event occurs in the interval t is P₀(t) = e^(−λt). For the case n = 1, there is exactly one event in the time interval t. This event must occur at some time t′, which may occur with equal probability anywhere in the interval [0, t]. Because no event can occur in the interval [t′, t], we have P₁(t) = ∫₀ᵗ w(t′) P₀(t − t′) dt′,

114 with P₀ evaluated at t → (t − t′). Hence, P₁(t) = ∫₀ᵗ λ e^(−λt′) e^(−λ(t−t′)) dt′ = λt e^(−λt). In general, if n events are to occur in the interval [0, t], the first must occur at some time t′ and exactly (n − 1) must occur in the time (t − t′). Hence, P_n(t) = ∫₀ᵗ λ e^(−λt′) P_{n−1}(t − t′) dt′. This equation is a recurrence formula; iterating it gives the Poisson distribution P_n(t) = ((λt)ⁿ/n!) e^(−λt).
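A Python simulation sketch (not from the original slides; the rate λ, window length, and number of windows are arbitrary choices) that generates events with exponential waiting times and checks that the counts per window behave like a Poisson distribution (mean ≈ variance ≈ λT):

```python
import math
import random

# Events occur at rate lam per unit time: waiting times are exponential,
# t = -ln(u)/lam, and the number of events in a window of length T_window
# should follow the Poisson distribution with mean lam * T_window.
random.seed(3)
lam, T_window, windows = 2.0, 1.0, 50_000
counts = []
for _ in range(windows):
    t, n = 0.0, 0
    while True:
        t += -math.log(1.0 - random.random()) / lam   # exponential interval
        if t > T_window:
            break
        n += 1
    counts.append(n)
mean = sum(counts) / windows
var = sum((c - mean) ** 2 for c in counts) / windows
print(mean, var, lam * T_window)   # mean ~ variance ~ lam*T = 2.0
```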

115 Simulations
1. Approach to equilibrium (http://stp.clarku.edu/simulations/approachtoequilibrium/index.html)
2. Sensitivity to initial conditions (http://stp.clarku.edu/simulations/sensitive/index.html)
3. Random walks (http://stp.clarku.edu/simulations/randomwalks/index.html)
4. Multiple coin toss (http://stp.clarku.edu/simulations/cointoss/index.html)
5. The binomial distribution (http://stp.clarku.edu/simulations/binomial/index.html)
6. Monte Carlo estimation (http://stp.clarku.edu/simulations/estimate/index.html)
7. Random multiplicative processes (http://stp.clarku.edu/simulations/multiplicativeprocess/index.html)

116 Probability Quiz, 31 August 2010, 30 minutes 1. What is the probability of obtaining a 3 or a 6 in one throw of a die? 2. What is the probability of not obtaining a 6 in one throw of a die? 3. What is the probability of obtaining an even number in one throw of a die? 4. What is the probability of obtaining at least one 6 in two throws of a die?

117 Next … Boltzmann distribution

