Computing Fundamentals 2 Lecture 7 Statistics

Presentation transcript:

Computing Fundamentals 2 Lecture 7 Statistics Lecturer: Patrick Browne http://www.comp.dit.ie/pbrowne/ See: http://www.stats.gla.ac.uk/steps/glossary/probability_distributions.html

Statistics Raw data are just lists of facts and numbers. The branch of mathematics that organizes, analyzes and interprets raw data is called statistics.

Recall: Permutations, Combinations P(n,r) = n! / (n-r)! The number of permutations of a, b, and c taken 2 at a time is 3!/1! = 6 (sequences): <ab>,<ba>,<ac>,<ca>,<bc>,<cb>. C(n,r) = n! / (r! (n-r)!) The number of combinations of a, b, and c taken 2 at a time is 3!/(2! × 1!) = 3 (sets): {ab},{ac},{bc}. {ab} is the same combination as {ba}, but <ab> and <ba> are distinct permutations.
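The counts above can be checked with a short Python sketch. It is only an illustration: the variable names are chosen here and are not part of the lecture, and it relies on the standard library functions math.perm, math.comb, and itertools.

```python
import math
from itertools import combinations, permutations

items = ["a", "b", "c"]

# P(3,2) = 3!/(3-2)! counts ordered selections (sequences).
num_permutations = math.perm(3, 2)
# C(3,2) = 3!/(2! * (3-2)!) counts unordered selections (sets).
num_combinations = math.comb(3, 2)

# Enumerate the selections themselves to see why the counts differ.
perms = ["".join(p) for p in permutations(items, 2)]
combs = ["".join(c) for c in combinations(items, 2)]

print(num_permutations, perms)
print(num_combinations, combs)
```

Each combination such as {ab} appears twice among the permutations (as ab and ba), which is where the division by r! comes from.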

Recall Probability Calculations Calculation of union (sum rule): P(A ∪ B) = P(A) + P(B) – P(A ∩ B). Calculation of intersection (product rule): P(A ∩ B) = P(A) × P(B|A). Conditional probability of A given E: P(A|E) = P(A ∩ E)/P(E). Test for independence: P(A ∩ B) = P(A) × P(B).

Frequency Table One way of organizing raw data is to use a frequency table (or frequency distribution), which shows the number of times that an individual item occurs or the number of items that fall within a given range or interval.

Frequency Distribution Suppose that a sample consists of the heights of 100 male students at XYZ University. We arrange the data into classes or categories and determine the number of individuals belonging to each class, called the class frequency. The resulting table is called a frequency distribution or frequency table

Frequency Distribution The first class or category, for example, consists of heights from 60 to 62 inches, indicated by 60–62, which is called a class interval. Since 5 students have heights belonging to this class, the corresponding class frequency is 5. Since a height that is recorded as 60 inches is actually between 59.5 and 60.5 inches, while one recorded as 62 inches is actually between 61.5 and 62.5 inches, we could just as well have recorded the class interval as 59.5 – 62.5. In the class interval 59.5 – 62.5, the numbers 59.5 and 62.5 are often called class boundaries.

Frequency Distribution The midpoint of the class interval, which can be taken as representative of the class, is called the class mark. A graph for the frequency distribution can be supplied by a histogram.

Frequency table & class interval [Two example frequency tables: frequency by number of tenants, and frequency by temperature range. The table values are not recoverable from the transcript.]

Probability Assume that all simple events are equally likely. We define the classical probability that an event A will occur as P(A) = (number of simple events in A) / (number of simple events in S), where S is the sample space. So P(A) is the number of ways in which A can occur, divided by the number of possible individual outcomes, assuming all are equally likely.

Example Tossing a coin twice: S = {HH, HT, TH, TT}, with probability 1/4 for each simple event. A = {Exactly One Head} = {HT,TH}. Then P(A) = 2/4 = 1/2. Does this tell us how often A would occur if we repeated the experiment (“toss a coin twice”) many times?

Relative frequency The probability of an event is its long-run frequency of occurrence. To estimate P(A) using the frequency approach, repeat the experiment n times (with n large) and compute x/n, where x = number of times A occurred in the n trials. As n grows, x/n tends to get closer to P(A).
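As an illustration of the frequency approach, the following Python sketch simulates the "toss a coin twice" experiment and estimates P(exactly one head). The function name and the fixed seed are assumptions made for this example; the estimates fluctuate from run to run if the seed is changed, and they approach 1/2 only in the long run, not monotonically.

```python
import random

def estimate_exactly_one_head(n, seed=0):
    """Estimate P(exactly one head in two tosses) as x/n over n simulated trials."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    x = 0  # number of trials in which event A occurred
    for _ in range(n):
        first = rng.choice("HT")
        second = rng.choice("HT")
        # Exactly one head means the two tosses differ.
        if first != second:
            x += 1
    return x / n

for n in (100, 10_000, 100_000):
    print(n, estimate_exactly_one_head(n))
```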

Relative frequency If there have been 126 launches of the Space Shuttle, and two of these resulted in a catastrophic failure, we can estimate the probability that the next launch will fail to be 2/126 ≈ 0.016. The relative frequency approach allows us to estimate the probability from actual data. It is more widely applicable than the classical approach, since it doesn't require us to specify a sample space consisting of equally likely simple events.

Relationships between probability and frequency Frequencies are relevant when modelling repeated trials, or repeated sampling from a population.

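Mean The arithmetic mean is the sum of the values in a data collection divided by the number of elements in that data collection: $\bar{x} = \frac{\sum x_i}{n}$. For a frequency distribution, $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$, where $f_i$ denotes the frequency of the value $x_i$.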
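A minimal Python sketch of the two mean formulas follows. The data values are hypothetical and chosen only so that the raw list and the frequency table describe the same collection; the function names are likewise assumptions for this example.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)

def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

# Hypothetical data: the value 2 occurs twice, 3 three times, 5 once.
raw = [2, 2, 3, 3, 3, 5]
print(mean(raw))
print(frequency_mean([2, 3, 5], [2, 3, 1]))
```

Both calls describe the same data, so they should print the same mean.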

Range The range measures dispersion. It is the difference between the lowest and highest values in the data. For example: the highest CA = 48 and the lowest = 27, giving a range of 21. The highest exam = 45 and the lowest = 12, giving a range of 33. There was wider variation in the students’ performance in the exam than in the CA.

Variance & Standard Deviation List A: 12,10,9,9,10 List B: 7,10,14,11,8 The mean ($\bar{x}$) of both A and B is 10, but the values in A are more closely clustered around the mean than those in B (there is greater dispersion, or spread, in B). We use the standard deviation to measure this spread (SD(A) ≈ 1.1, SD(B) ≈ 2.4).

Standard Deviation The standard deviation measures the spread of the data about the mean value. It is useful in comparing data which may have the same mean but a different range. The range is another measure of dispersion: it is the difference between the lowest and highest values in the data.

Variance & Standard Deviation The variance is never negative and is zero only when all values are equal. variance = $\frac{\sum (x_i - \bar{x})^2}{n}$, standard deviation = $\sqrt{\text{variance}}$. Alternatively, variance = $\frac{\sum x_i^2}{n} - \bar{x}^2$. The standard deviation measures the spread of the data about the mean value. It is useful in comparing sets of data which may have the same mean but a different range.
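The following Python sketch applies the variance and standard deviation definitions (dividing by n, the population form) to Lists A and B from the earlier slide. The function names are assumptions made for this example.

```python
import math

def variance(values):
    """Population variance: mean squared deviation from the mean (divides by n)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def std_dev(values):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(values))

list_a = [12, 10, 9, 9, 10]
list_b = [7, 10, 14, 11, 8]

for name, data in (("A", list_a), ("B", list_b)):
    print(name, variance(data), round(std_dev(data), 1))
```

Note that Python's statistics.variance divides by n - 1 (the sample form); statistics.pvariance matches the formula used here.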

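Variance of a frequency distribution variance = $\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}$, where $f_i$ is the frequency of the value $x_i$.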

Median The median is the middle value. If the n elements are sorted, the median is:
For odd n: Median = valueAt[(n+1)/2]
For even n: Median = average(valueAt[n/2], valueAt[n/2+1])
Example {1,2,3,4,5}, Median = 3
Example {1,2,3,4,5,6}, Median = 3.5

Mode The mode is the class or class value which occurs most frequently. mode([1, 2, 2, 3, 4, 7, 9]) = 2. We can have bimodal or multimodal collections of data (see http://en.wikipedia.org/wiki/Multimodal_distribution). In a histogram, the height of each bar is the number of cases in the category.
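The median and mode rules can be sketched in Python as follows. The median function is written by hand to mirror the odd/even rule (Python lists are 0-based, so the indices shift by one); the mode uses the standard library's statistics.multimode, which returns every most-frequent value. The bimodal list is a hypothetical example.

```python
import statistics

def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # 1-based position (n+1)/2
    return (s[mid - 1] + s[mid]) / 2   # 1-based positions n/2 and n/2 + 1

print(median([1, 2, 3, 4, 5]))
print(median([1, 2, 3, 4, 5, 6]))

# multimode lists all values tied for the highest frequency.
print(statistics.multimode([1, 2, 2, 3, 4, 7, 9]))
print(statistics.multimode([1, 1, 2, 3, 3]))  # hypothetical bimodal data
```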

Bernoulli Trials Independent repeated trials with two outcomes are called Bernoulli trials. The probability of k successes in a binomial experiment is $P(k) = \binom{n}{k} p^k q^{n-k}$, where n is the number of trials, (n-k) is the number of failures, p is the probability of success, and q = 1 - p is the probability of failure.

Bernoulli Trials: Example Probability John hits target: p=1/4. Probability John does not hit target: q=3/4. John fires 6 times, n=6. What is the probability that John hits the target 2 times out of 6? $P(2) = \binom{6}{2} (1/4)^2 (3/4)^4 \approx 0.297$. In EXCEL =((5*6)/2)*((1/4)^2)*((3/4)^4)

Bernoulli Trials: Example Probability John hits target: p=1/4, John fires 6 times, n=6. What is the probability John hits the target at least once? First find the probability of no successes (0 hits, all failures): $P(0) = \binom{6}{0} (1/4)^0 (3/4)^6 = (3/4)^6 \approx 0.178$. There is only 1 way to pick 0 from 6, and anything other than 0 to the power of 0 is 1 (0 to the power 0 is undefined). This is the probability that John does not hit the target. The probability that John hits the target at least once is therefore $1 - (3/4)^6 \approx 0.822$. In EXCEL =1-((3/4)^6)

Bernoulli Trials: Example Probability that Mary hits target: p=1/4, Mary fires 6 times, n=6. What is the probability Mary hits the target more than 4 times? More than 4 means 5 or 6 hits: $P(5) + P(6) = \binom{6}{5} (1/4)^5 (3/4)^1 + (1/4)^6 \approx 0.0046$. In EXCEL =(6)*((1/4)^5)*((3/4)^1)+(1/4)^6
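The three target-shooting examples can be reproduced with a small Python sketch of the binomial formula. The function name binomial_pmf is an assumption for this example; exact fractions are used so that the results can be compared with the EXCEL expressions without rounding.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p = Fraction(1, 4)  # probability of hitting the target
n = 6               # number of shots

exactly_two = binomial_pmf(2, n, p)
at_least_one = 1 - binomial_pmf(0, n, p)              # complement of "no hits"
more_than_four = binomial_pmf(5, n, p) + binomial_pmf(6, n, p)

for label, value in (("exactly 2", exactly_two),
                     ("at least 1", at_least_one),
                     ("more than 4", more_than_four)):
    print(label, value, float(value))
```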

Random variables and probability distributions. Suppose you toss a coin two times. There are four possible outcomes: HH, HT, TH, and TT. Let the variable X represent the number of heads that result from this experiment. The variable X can take on the values 0, 1, or 2. In this example, X is a random variable, because its value is determined by the outcome of a statistical experiment.

Random variables and probability distributions. A probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence. The table below associates each outcome (the number of heads) with its probability; it is an example of a probability distribution. S={HH,HT,TH,TT}, A = number of heads {0,1,2}
Number of heads   0     1     2
Probability       0.25  0.50  0.25

Random Variable A random variable X on a finite sample space S is a function (or mapping) that assigns to each outcome in S a real number in S’. Let S be the sample space of outcomes from tossing two coins. Then mapping a is: S={HH,HT,TH,TT} (assume HT≠TH), Xa(HH)=1, Xa(HT)=2, Xa(TH)=3, Xa(TT)=4. The range (or image) of the function Xa is S’={1,2,3,4}. Here S is the sample space (the list of outcomes) and n is the size of the space (how many outcomes).
From http://www.stats.gla.ac.uk/steps/glossary/probability_distributions.html: A discrete random variable is one which may take on only a countable number of distinct values such as 0, 1, 2, 3, 4, ... Discrete random variables are usually (but not necessarily) counts. If a random variable can take only a finite number of distinct values, then it must be discrete. Examples of discrete random variables include the number of children in a family, the Friday night attendance at a cinema, the number of patients in a doctor's surgery, the number of defective light bulbs in a box of ten. A coin is tossed ten times. The random variable X is the number of tails that are noted. X can only take the values 0, 1, ..., 10, so X is a discrete random variable.

Random Variable Let S be sample space of outcomes from tossing two coins, where we are interested in the number of heads. Mapping b is: S={HH,HT,TH,TT} Xb(HH)=2, Xb(HT)=1, Xb(TH)=1, Xb(TT)=0 The range (image) of Xb is: S’’={0,1,2}

Random Variable A random variable is a function that maps a finite sample space to numeric values. The numeric values form a finite probability space of real numbers, where probabilities are assigned to the new space according to the following rule: pi = P(xi) = sum of the probabilities of the points in S whose image is xi. Recall function F : Domain -> Range (Image). More formally, a random variable is a function from a sample space to the measurable space of possible values of the variable.

Random Variable The function assigning pi to xi can be given as a table called the distribution of the random variable. In an equiprobable space, pi = P(xi) = (number of points in S whose image is xi) / (number of points in S), for i = 1,2,3...n, gives the distribution of X.

Random Variable The equiprobable space generated by tossing a pair of fair dice consists of 36 ordered pairs: S={<1,1>,<1,2>,<1,3>...<6,6>}. Let X be the random variable which assigns to each element of S the sum of the two dice: 2,3,4,5,6,7,8,9,10,11,12. Note: in a set of ordered pairs <2,2> appears only once, whereas <1,3> and <3,1> are considered distinct. These three pairs all sum to 4, showing that there is not a 1:1 mapping between the sample space and the values of the random variable: here three elements of the sample space map to one value.

Random Variable Continuing with the sum of the two dice. There is only one point whose image is 2, giving P(2)=1/36. There are two points whose image is 3, giving P(3)=2/36 (<1,2>≠<2,1>, but their sums are equal). Below is the distribution of X.
xi   2     3     4     5     6     7     8     9     10    11    12
pi   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
The probabilities sum to 36/36.
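The distribution of the sum of two dice can be built by enumerating the 36 ordered pairs, as in this Python sketch. The variable names are assumptions for this example. Note that Fraction reduces automatically, so 6/36 is printed as 1/6.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 ordered pairs <a,b> of the equiprobable sample space.
sample_space = list(product(range(1, 7), repeat=2))

# Count how many points of S have each sum as their image.
counts = Counter(a + b for a, b in sample_space)

# p_i = (points whose image is x_i) / (points in S).
distribution = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
print("total", sum(distribution.values()))
```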

Example: Random Variable A box contains 9 good items and 3 defective items (total 12 items). Three items are selected at random from the box. Let X be the random variable that counts the number of defective items in a sample. X has a range space Rx = {0,1,2,3}. The sample space consists of 12-choose-3 = 220 different samples of size 3.
There are 9-choose-3 = 84 samples of size 3 with 0 defective items.
There are 3 * 9-choose-2 = 108 samples of size 3 with 1 defective item.
There are 3-choose-2 * 9 = 27 samples of size 3 with 2 defective items.
There is 3-choose-3 = 1 sample of size 3 with 3 defective items.
Total: 84 + 108 + 27 + 1 = 220.
Here n-choose-r means the number of combinations (sets). In EXCEL: =COMBIN(12,3). In general, =COMBIN(6,2) counts combinations and =PERMUT(6,2) counts permutations.
In CALC:
reduce in CALC : combCalc(12,3) . (220):NzNat
red combCalc(9,3) . (84):NzNat
red combCalc(9,2) . (36):NzNat
red combCalc(3,2) . (3):NzNat
red 3 * 36 . (108):NzNat

Example: Random Variable A box contains 9 good items and 3 defective items (total 12 items). Three items are selected at random from the box. Let X be the random variable that counts the number of defective items in a sample. X can have values 0-3. Below is the distribution of X.
xi   0       1        2       3
pi   84/220  108/220  27/220  1/220
The probabilities sum to 220/220.
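The counts 84, 108, 27 and 1 can be reproduced with math.comb, as in the Python sketch below. The variable names are assumptions for this example; the counting rule is "choose k of the defective items and the rest from the good items".

```python
from fractions import Fraction
from math import comb

good, defective, sample_size = 9, 3, 3

# Number of possible samples of size 3 from 12 items.
total = comb(good + defective, sample_size)

# Samples with exactly k defective items: C(3,k) * C(9,3-k).
counts = {k: comb(defective, k) * comb(good, sample_size - k)
          for k in range(sample_size + 1)}

distribution = {k: Fraction(c, total) for k, c in counts.items()}

print(total, counts)
for k, p in distribution.items():
    print(k, p)
```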

Functions of a Random Variable If X is a random variable then so is Y=f(X). P(yk) = sum of the probabilities of those xi such that yk=f(xi).

Expectation and variance of a random variable Let X be a discrete random variable over sample space S. X takes values x1,x2,x3,... xt with respective probabilities p1,p2,p3,... pt. An experiment which generates S is repeated n times and the numbers x1,x2,x3,... xt occur with frequencies f1,f2,f3,... ft (where $\sum f_i = n$). If n is large then one expects $f_i / n \approx p_i$.

Expectation of a random variable So the sample mean $\bar{x} = \frac{\sum f_i x_i}{n} = \sum x_i \frac{f_i}{n}$ becomes $\sum x_i p_i$. This final formula is the population mean, expectation, or expected value of X, and is denoted as μ or E(X). μ (mu) represents the population mean, which is the true mean of all of the many individuals in the population. $\bar{x}$ (x-bar) is the sample mean, which is the mean of the few selected individuals you have observed.

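Variance of a random variable The variance of X is denoted as $\sigma^2$ or Var(X): $\sigma^2 = \sum (x_i - \mu)^2 p_i = E((X - \mu)^2)$, or equivalently $\sigma^2 = \sum x_i^2 p_i - \mu^2 = E(X^2) - \mu^2$. The standard deviation is $\sigma = \sqrt{\mathrm{Var}(X)}$.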

Expected value, Variance, Standard Deviation $E(X) = \mu = \mu_X = \sum x_i p_i$; $\mathrm{Var}(X) = \sigma^2 = \sigma_X^2 = \sum (x_i - \mu)^2 p_i$; $SD(X) = \sigma_X = \sqrt{\mathrm{Var}(X)}$.

Example : Random Variable & Expected Value A box contains 9 good items and 3 defective items. Three items are selected at random from the box. Let X be the random variable that counts the number of defective items in a sample. X can have values 0-3. “12 choose 3” = 1320/6 = 220. Below is the distribution of X.
xi   0       1        2       3
pi   84/220  108/220  27/220  1/220

Example : Random Variable & Expected Value
xi   0       1        2       3
pi   84/220  108/220  27/220  1/220
μ is the expected number of defective items in a sample of size 3.
$\mu = E(X) = 0(84/220) + 1(108/220) + 2(27/220) + 3(1/220) = 165/220 = 0.75$
$\mathrm{Var}(X) = 0^2(84/220) + 1^2(108/220) + 2^2(27/220) + 3^2(1/220) - \mu^2 = 225/220 - 0.5625 \approx 0.46$
$SD(X) = \sqrt{\mathrm{Var}(X)} \approx 0.68$
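As a check on the arithmetic for the defective-items example, the sums can be recomputed term by term in Python. This is a sketch under the assumption that X takes the values 0 to 3 with probabilities 84/220, 108/220, 27/220 and 1/220; the function names are chosen for this example.

```python
from fractions import Fraction
from math import sqrt

# Distribution of X = number of defective items in a sample of 3.
distribution = {
    0: Fraction(84, 220),
    1: Fraction(108, 220),
    2: Fraction(27, 220),
    3: Fraction(1, 220),
}

def expectation(dist):
    """E(X) = sum of x_i * p_i."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E(X^2) - mu^2."""
    mu = expectation(dist)
    return sum(x * x * p for x, p in dist.items()) - mu**2

mu = expectation(distribution)
var = variance(distribution)
sd = sqrt(var)

print("E(X)  =", mu, float(mu))
print("Var(X)=", var, float(var))
print("SD(X) =", sd)
```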

Fair Game1? If a prime number appears on a fair die the player wins that value. If a non-prime appears the player loses that value. Is the game fair (E(X)=0)? S={1,2,3,4,5,6}. Note: 1 is not prime.
xi   2    3    5    -1   -4   -6
pi   1/6  1/6  1/6  1/6  1/6  1/6
E(X) = 2(1/6)+3(1/6)+5(1/6)+(-1)(1/6)+(-4)(1/6)+(-6)(1/6) = -1/6
A game is fair if E(X) = 0. If E(X) > 0 the game is favourable to the player; if E(X) < 0 it is unfavourable to the player. Here E(X) = -1/6 < 0, so the game is unfavourable to the player.

Fair Game2? A player gambles on the toss of two fair coins. If 2 heads occur the player wins 2 Euro. If 1 head occurs he wins 1 Euro. If no heads occur he loses 3 Euro. Is the game fair (E(X)=0)? S={HH,HT,TH,TT}, X(HH) = 2, X(HT)=X(TH)=1, X(TT)=-3. E(X) = 2(1/4)+1(2/4)-3(1/4) = 0.25. Since E(X) > 0, the game is not fair; it is favourable to the player.
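Both fair-game checks follow the same pattern: compute E(X) and compare it with 0. A minimal Python sketch, with function names and the verdict wording chosen for this example only:

```python
from fractions import Fraction

def expected_value(payoffs):
    """E(X) for a list of (payoff, probability) pairs."""
    return sum(x * p for x, p in payoffs)

def verdict(e):
    """Classify a game by the sign of its expected value."""
    if e == 0:
        return "fair"
    return "favourable to player" if e > 0 else "unfavourable to player"

sixth = Fraction(1, 6)
# Game 1: win the face value on a prime (2, 3, 5), lose it otherwise (1, 4, 6).
game1 = [(2, sixth), (3, sixth), (5, sixth), (-1, sixth), (-4, sixth), (-6, sixth)]
# Game 2: two coins; win 2 on HH, win 1 on one head, lose 3 on TT.
game2 = [(2, Fraction(1, 4)), (1, Fraction(2, 4)), (-3, Fraction(1, 4))]

for name, game in (("Game 1", game1), ("Game 2", game2)):
    e = expected_value(game)
    print(name, e, verdict(e))
```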

Mean (μ), Variance ($\sigma^2$), Standard Deviation (σ)
xi   2    3    11
pi   1/3  1/2  1/6
$\mu = \sum x_i p_i = 2(1/3) + 3(1/2) + 11(1/6) = 4$
$E(X^2) = \sum x_i^2 p_i = 2^2(1/3) + 3^2(1/2) + 11^2(1/6) = 26$
$\sigma^2 = \mathrm{Var}(X) = E(X^2) - \mu^2 = 26 - 4^2 = 10$
$\sigma = \sqrt{\mathrm{Var}(X)} = \sqrt{10} \approx 3.2$

Distribution Example(1) Five cards are numbered 1 to 5. Two cards are drawn at random. Let X denote the sum of the numbers drawn. Find (a) the distribution of X and (b) the mean, variance, and standard deviation. There are C(5,2) = 10 ways of drawing two cards at random.

Distribution Example(2) The ten equiprobable sample points with their corresponding X-values are:
points   1,2  1,3  1,4  1,5  2,3  2,4  2,5  3,4  3,5  4,5
xi       3    4    5    6    5    6    7    7    8    9

Distribution Example(3) The distribution is:
xi   3    4    5    6    7    8    9
pi   0.1  0.1  0.2  0.2  0.2  0.1  0.1

Distribution Example(4) The distribution is:
xi   3    4    5    6    7    8    9
pi   0.1  0.1  0.2  0.2  0.2  0.1  0.1
The mean is $3(0.1) + \dots + 9(0.1) = 6$.
$E(X^2)$ is $3^2(0.1) + \dots + 9^2(0.1) = 39$.
The variance is $39 - 6^2 = 3$.
The SD is $\sqrt{3} \approx 1.7$.
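The card example can be worked end to end in Python by enumerating the C(5,2) = 10 unordered draws. This is an illustrative sketch; the variable names are assumptions, and exact fractions are used so that the mean and variance come out as whole numbers.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import sqrt

# The ten equiprobable ways of drawing two of the cards 1..5.
pairs = list(combinations(range(1, 6), 2))

# X is the sum of the two numbers drawn.
counts = Counter(a + b for a, b in pairs)
distribution = {x: Fraction(c, len(pairs)) for x, c in sorted(counts.items())}

mean = sum(x * p for x, p in distribution.items())
second_moment = sum(x * x * p for x, p in distribution.items())  # E(X^2)
variance = second_moment - mean**2
sd = sqrt(variance)

print(distribution)
print(mean, second_moment, variance, round(sd, 1))
```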

Examples Two fair dice are thrown. If the sum of the faces is 4, what is the probability that one of the dice shows a 3?

Examples A fair coin is thrown three times. Consider the following events A={first toss is a head} B={second toss is head} C={exactly 2 heads tossed in a row} Are the following events independent? A and C B and C

Examples What is meant by repeated trials? If a fair coin is tossed 6 times, what is the probability of exactly two heads occurring?

Examples The probabilities of three runners A, B or C winning a race are: P(a) = 1/2, P(b) = 1/3, P(c) = 1/6. If two races are run, what is the probability of C winning the first race and A winning the second race?

Examples A player tosses two fair coins. The player wins €2 if two heads occur, and wins €1 if one head occurs. The player loses €3 if no heads occur. Find the expected value of the game. How would you test whether or not the game is fair? Is the game fair?

Examples Five cards are numbered 1 to 5. Two cards are drawn at random. Let X denote the sum of the numbers drawn. Find (a) the distribution of X and (b) the mean, variance, and standard deviation of X.