PROBABILITY & STATISTICAL INFERENCE LECTURE 3 MSc in Computing (Data Analytics)

Lecture Outline
• A quick recap
• Continuous distributions
• Question time

A Quick Recap

Probability & Statistics
• We want to make decisions based on evidence from a sample, i.e. extrapolate from sample evidence to a general population.
• To make such decisions we need to be able to quantify our (un)certainty about how good or bad our sample information is.
[Diagram labels: Population, Representative Sample, Sample Statistic, Describe, Make Inference]

Some Definitions
• An experiment that can result in different outcomes, even though it is repeated in the same manner every time, is called a random experiment.
• The set of all possible outcomes of a random experiment is called the sample space of the experiment and is denoted by S.
• A sample space is discrete if it consists of a finite or countably infinite set of outcomes.
• A sample space is continuous if it contains an interval of real numbers.
• An event is a subset of the sample space of a random experiment.

Probability
• Whenever a sample space consists of n possible outcomes that are equally likely, the probability of each outcome is 1/n.
• For a discrete sample space, the probability of an event E, denoted by P(E), equals the sum of the probabilities of the outcomes in E.
• Some rules for probabilities, for a given sample space containing n simple events $E_1, E_2, \dots, E_n$:
  1. All simple event probabilities must lie between 0 and 1: $0 \le P(E_i) \le 1$ for $i = 1, 2, \dots, n$.
  2. The sum of the probabilities of all the simple events within a sample space must be equal to 1: $\sum_{i=1}^{n} P(E_i) = 1$.
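As a small illustration of these rules, the sketch below uses one roll of a fair six-sided die. The die and the names `sample_space`, `prob` and `event_probability` are assumptions chosen for the example, not part of the lecture.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
sample_space = [1, 2, 3, 4, 5, 6]

# Equally likely outcomes, so each one has probability 1/n.
prob = {outcome: Fraction(1, len(sample_space)) for outcome in sample_space}


def event_probability(event):
    """P(E): the sum of the probabilities of the outcomes in E."""
    return sum(prob[outcome] for outcome in event)


p_even = event_probability({2, 4, 6})     # Fraction(1, 2)
total = event_probability(sample_space)   # sums to 1 over the whole sample space
```

Exact fractions are used here so that the "sum to 1" rule can be checked without floating-point rounding.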

Discrete Random Variable
• A Random Variable (RV) is obtained by assigning a numerical value to each outcome of a particular experiment.
• Probability Distribution: a table or formula that specifies the probability of each possible value of the Discrete Random Variable (DRV).
• DRV: an RV that takes whole-number values only.

Summary Continued…
• For a discrete RV we often have a mathematical formula which is used to calculate probabilities, i.e. P(x) = some formula.
• This formula is called the Probability Mass Function (PMF).
• Given the PMF you can calculate the mean and variance by $\mu = \sum_x x\,P(x)$ and $\sigma^2 = \sum_x (x - \mu)^2 P(x)$, where the summation is over all possible values of x.
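A minimal sketch of the mean and variance calculation, using the standard definitions $\mu = \sum_x x\,P(x)$ and $\sigma^2 = \sum_x (x - \mu)^2 P(x)$. The PMF shown (number of heads in two fair coin tosses) is an assumed example, not one from the lecture.

```python
# Assumed example PMF: number of heads in two tosses of a fair coin.
pmf = {0: 0.25, 1: 0.50, 2: 0.25}


def pmf_mean(pmf):
    """Mean: sum of x * P(x) over all possible values x."""
    return sum(x * p for x, p in pmf.items())


def pmf_variance(pmf):
    """Variance: sum of (x - mean)^2 * P(x) over all possible values x."""
    mu = pmf_mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


mean = pmf_mean(pmf)          # 1.0
variance = pmf_variance(pmf)  # 0.5
```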

Binomial Distribution – General Formula
• This all leads to a very general rule for calculating binomial probabilities. In general, for Binomial(n, p): $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$, where n = no. of trials, p = probability of a success, x = value of the RV (no. of successes).
• P(X = x) is read as the probability of seeing x successes.

Binomial Distribution
• If X is a binomial random variable with parameters p and n, then $\mu = E(X) = np$ and $\sigma^2 = V(X) = np(1 - p)$.
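The binomial rule can be sketched in a few lines using the standard formula $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$. The values n = 10 and p = 0.3 are assumptions picked for illustration; the sketch also checks numerically that the mean and variance agree with the standard results np and np(1 − p).

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Assumed illustrative parameters.
n, p = 10, 0.3

probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * px for x, px in enumerate(probs))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(probs))
```

Here `mean` should come out very close to n × p = 3 and `variance` close to n × p × (1 − p) = 2.1, up to floating-point rounding.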

Poisson Probability Distribution
• Probability distribution for the Poisson: $P(X = x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$
• Here λ is the known mean, x is the value of the RV with possible values 0, 1, 2, 3, … and e is an irrational constant (like π) with value 2.71828…
• The standard deviation, σ, is given by the simple relationship $\sigma = \sqrt{\lambda}$.
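A short sketch of the Poisson calculation, using the standard form $P(X = x) = \lambda^x e^{-\lambda} / x!$ with standard deviation $\sqrt{\lambda}$. The symbol name `lam` and the mean of 4 events are assumptions made for the example.

```python
from math import exp, factorial, sqrt


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson RV with known mean lam."""
    return lam ** x * exp(-lam) / factorial(x)


# Assumed illustrative mean: 4 events per interval.
lam = 4.0

p_two = poisson_pmf(2, lam)   # probability of seeing exactly 2 events
sd = sqrt(lam)                # standard deviation: 2.0
```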

Continuous Probability Distributions

• Experiments can lead to continuous responses, i.e. values that do not have to be whole numbers. For example, height could be 1.54 meters.
• In such cases the sample space is best viewed as a histogram of responses.
• The shape of the histogram of such responses tells us which continuous distribution is appropriate; there are many.

Normal Distribution (AKA Gaussian)
• The histogram below is symmetric and 'bell shaped'.
• This is characteristic of the Normal distribution.
• We can model the shape of such a distribution (i.e. the histogram) by a curve.
[Figure: symmetric, bell-shaped histogram]

Normal Distribution
• The curve may not fit the histogram 'perfectly', but should be very close.
• The Normal distribution has two parameters: µ = mean, σ = standard deviation.
• The mathematical formula that gives a bell-shaped symmetric curve is f(x) = height of curve at x: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / (2\sigma^2)}$
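The curve height can be computed directly from the standard normal density $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$. The sketch below is one way to write it; the function name `normal_pdf` and the evaluation points are assumptions, and Python's standard-library `statistics.NormalDist` is used only as a cross-check.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Height of the Normal curve at x for mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


# Peak of the standard normal curve (mu = 0, sigma = 1), about 0.3989.
peak = normal_pdf(0.0, 0.0, 1.0)

# Cross-check against the standard library at an arbitrary point.
check = NormalDist(mu=0.0, sigma=1.0).pdf(1.5)
ours = normal_pdf(1.5, 0.0, 1.0)
```

Note that f(x) is a height, not a probability; probabilities come from areas under the curve.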

Normal Distribution
• Why not P(x) as before? Because the response is continuous.
• What is the probability that a person sampled at random is 6 foot?
• Equivalent question: what proportion of people are 6 foot?
• We really mean what proportion are 'around 6 foot' (as good as the measurement device allows), so not really one value but many values close together.

• Example: What proportion of graduates earn €35,000?
• Would we exclude someone earning just off that figure, e.g. €34,999.99?
• Round to the nearest €, €10, €100, €1000?
• Continuous measure => more useful to get the proportion from €35,000 to €40,000.
• Some mathematical jargon: the formula for the normal distribution is formally called the normal probability density function (pdf).

• We can visualise this using the histogram of salaries.
• The shaded portion of the histogram is the proportion of interest.
[Figure: histogram of salaries with the region of interest shaded]

• Since the histogram of salaries is symmetric and bell shaped, we model it in statistics with a Normal distribution curve.
• Proportion = the proportion of the area under the curve that is shaded.
• So proportion = proportional area under the curve = a probability of interest.
• We need: to know μ and σ; to be able to find the area under the curve.

• Area under a curve is found using integration in mathematics.
• In this case we would need a technique called numerical integration.
• The total area under the curve is 1.
• However, the values we need are in Normal probability tables.
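To make "numerical integration" concrete, here is a rough sketch using the trapezoid rule on the standard normal curve. The trapezoid rule is one assumed choice of technique (the lecture does not name a specific method), and the limits of ±8 stand in for ±∞ because the curve is negligibly small beyond them.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the Normal curve at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


def trapezoid_area(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


total_area = trapezoid_area(normal_pdf, -8.0, 8.0)    # approximately 1
left_of_zero = trapezoid_area(normal_pdf, -8.0, 0.0)  # approximately 0.5, by symmetry
```

In practice the tables (or a library routine) replace this calculation, but the areas they list are the same quantity.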

• The tables are for a Normal distribution with μ = 0 and σ = 1; this is called the Standard Normal.
• We can 'convert' a value from any normal to the standard normal using standard scores (Z scores).
[Diagram: Value from any Normal distribution → Standardize → Corresponding value from the Standard Normal (μ = 0, σ = 1)]

Z-Score Example
• Z scores are a unit-less quantity, measuring how far above/below μ a certain score (x) is, in standard deviation units.
• Example: a score of 35, from a normal distribution with μ = 25 and σ = 5: $Z = (35 - 25)/5 = 10/5 = 2$. So 35 is 2 standard deviation units above the mean.
• What about a score of 20? $Z = (20 - 25)/5 = -5/5 = -1$. So 20 is 1 standard deviation unit below the mean.

Z-Score Example
• Positive Z score => score is above the mean.
• Negative Z score => score is below the mean.
• By subtracting μ and dividing by σ we convert any normal to μ = 0, σ = 1, so we only need one set of tables!
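The standardisation step is a one-line calculation, Z = (x − μ) / σ. The sketch below reuses the scores 35 and 20 from a normal distribution with mean 25 and standard deviation 5; the function name `z_score` is an assumption.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


z_35 = z_score(35, mu=25, sigma=5)  # 2.0: two standard deviations above the mean
z_20 = z_score(20, mu=25, sigma=5)  # -1.0: one standard deviation below the mean
```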

Example:
• From looking at the histogram of people's weekly receipts, a supermarket knows that the amount people spend on shopping per week is normally distributed with μ = €58 and σ = €15.

• What is the probability that a customer sampled at random will spend less than €83.50?
• $Z = (x - \mu)/\sigma = (83.50 - 58)/15 = 1.7$
• The area from Z = 1.7 to the left can be read in tables.
• From tables, the area less than Z = 1.7 is 0.9554.
• So the probability is 0.9554, or 95.54%.

• What is the probability that a customer sampled at random will spend more than €83.50?
• $Z = (x - \mu)/\sigma = (83.50 - 58)/15 = 1.7$
• From tables, the area greater than Z = 1.7 is 1 − 0.9554 = 0.0446.
• So the probability is 0.0446, or 4.46%.
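The supermarket example (mean €58, standard deviation €15, threshold €83.50) can be checked without tables. This sketch assumes Python's standard-library `statistics.NormalDist`, whose `cdf` method gives the area to the left of a value; the variable names are illustrative.

```python
from statistics import NormalDist

# Weekly spend modelled as Normal with mean 58 and standard deviation 15 (euro).
spend = NormalDist(mu=58, sigma=15)

z = (83.50 - spend.mean) / spend.stdev  # Z score for 83.50, i.e. 1.7
p_less = spend.cdf(83.50)               # area to the left, about 0.9554
p_more = 1 - p_less                     # area to the right, about 0.0446
```

The two areas add to 1 because the total area under the curve is 1.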

Exercise
• Find the proportion of people who spend more than €76.75.
• Find the proportion of people who spend less than €63.50.
• Note: the tables can also be used to find other areas (less than a particular value, or the area between two points).
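For the "area between two points" case mentioned in the note, one approach is to standardise both end points and subtract the two left-hand areas. The sketch below assumes the supermarket parameters (mean €58, standard deviation €15); the bounds €43 and €73 are illustrative values chosen here, not part of the exercise, and the error-function formula for the standard normal area is a standard identity used in place of tables.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_between(a, b, mu, sigma):
    """P(a < X < b) for a Normal RV, found by converting both bounds to Z scores."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return standard_normal_cdf(z_b) - standard_normal_cdf(z_a)


# Illustrative bounds: one standard deviation either side of the mean.
p_between = prob_between(43, 73, mu=58, sigma=15)  # about 0.6827
```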

Characteristics of Normal Distributions
• Standard deviation has particular relevance to the Normal distribution.
• Normal distribution => Empirical Rule:

Between Z (lower, upper)    % Area
(−1, 1)                     68%
(−2, 2)                     95%
(−3, 3)                     99.7%
(−∞, +∞)                    100%
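The Empirical Rule percentages (68%, 95%, 99.7%) are rounded areas under the standard normal curve. A quick sketch, assuming Python's standard-library `statistics.NormalDist`, recovers them; the helper name `area_within` is an assumption.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def area_within(k):
    """Area under the standard normal curve between Z = -k and Z = +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


within_1 = area_within(1)  # about 0.6827
within_2 = area_within(2)  # about 0.9545
within_3 = area_within(3)  # about 0.9973
```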

• The normal distribution is just one of the known continuous probability distributions.
• Each has its own probability density function, giving differently shaped curves.
• In each case, we find probabilities by calculating areas under these curves using integration.
• However, the Normal is the most important, as it plays a major role in Sampling Theory.

Other important continuous probability distributions include:
• Exponential distribution: especially positively skewed lifetime data.
• Uniform distribution.
• Weibull: especially for 'time to event' analysis.
• Gamma distribution: waiting times between Poisson events in time, etc.
• Many others…

Summary – Random Variables
• There are two types: discrete RVs and continuous RVs.
• For both cases we can calculate a mean (μ) and standard deviation (σ).
• μ can be interpreted as the average value of the RV.
• σ can be interpreted as the typical spread of the RV's values around μ.

Summary Continued…
• For a discrete RV we often have a mathematical formula which is used to calculate probabilities, i.e. P(x) = some formula.
• This formula is called the Probability Mass Function (PMF).
• Given the PMF you can calculate the mean and variance by $\mu = \sum_x x\,P(x)$ and $\sigma^2 = \sum_x (x - \mu)^2 P(x)$, where the summation is over all possible values of x.

Summary Continued…
• For continuous RVs, we use a Probability Density Function (PDF) to define a curve over the histogram of the values of the random variable.
• We integrate this PDF to find areas, which are equal to probabilities of interest.
• Given the PDF you can calculate the mean and variance by $\mu = \int_{-\infty}^{\infty} x\,f(x)\,dx$ and $\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$, where f(x) is the usual mathematical notation for the PDF.
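As a rough sketch of the continuous case, the standard definitions $\mu = \int x\,f(x)\,dx$ and $\sigma^2 = \int (x - \mu)^2 f(x)\,dx$ can be approximated numerically. The Uniform distribution on [0, 10] and the midpoint rule are assumptions chosen to keep the example simple; the exact answers for this case are a mean of 5 and a variance of 100/12.

```python
def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation to the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


# Assumed example: Uniform distribution on [a, b] = [0, 10].
a, b = 0.0, 10.0


def uniform_pdf(x):
    """PDF of the Uniform(a, b) distribution: constant height 1 / (b - a)."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


area = integrate(uniform_pdf, a, b)                                   # total area, about 1
mean = integrate(lambda x: x * uniform_pdf(x), a, b)                  # about 5
variance = integrate(lambda x: (x - mean) ** 2 * uniform_pdf(x), a, b)  # about 8.33
```

The integrals only need to cover [a, b] here because the PDF is zero everywhere else.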

Question Time

Next Week
• Next week we will start with the practical part of the course. We will move to Lab 1005 in Aungier Street.