Lecture 3 A Brief Review of Some Important Statistical Concepts.


The Meaning of a Variable

A variable refers to any quantity that may take on more than one value.
- Population is a variable because it is not fixed or constant; it changes over time.
- The unemployment rate is a variable because it may take on any value from 0 to 100%.

A random variable can be thought of as an unknown value that may change every time it is inspected. A random variable may be either discrete or continuous.
- A variable is discrete if its possible values have jumps or breaks. Population is discrete: it is measured in integers or whole units (1, 2, 3, …).
- A variable is continuous if there are no jumps or breaks. The unemployment rate is continuous: it need not be measured in whole units (1.77, …, 8.99, …).

Descriptive Statistics

Descriptive statistics are used to describe the main features of a collection of data in quantitative terms; they aim to summarize a data set quantitatively. Some statistical summaries are especially common in descriptive analyses, for example:
- Frequency distribution
- Central tendency
- Dispersion
- Association

Frequency Distribution

Every set of data can be described in terms of how frequently certain values occur. In statistics, a frequency distribution is a tabulation of the values that one or more variables take in a sample. Consider the hypothetical prices of Dec CME Live Cattle futures:

Month       Price (cents/lb)
May         67.05
June        66.89
July        67.45
August      68.39
September   67.45
October     70.10
November    68.39

Frequency Distribution

Univariate frequency distributions are often presented as lists ordered by value, showing the number of times each value appears. A frequency distribution may be grouped or ungrouped:
- For a small number of observations, an ungrouped frequency distribution is used.
- For a large number of observations, a grouped frequency distribution is used.

Ungrouped frequency distribution of the price data:

Price (X)   Frequency
66.89       1
67.05       1
67.45       2
68.39       2
70.10       1
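An ungrouped frequency distribution like the one above can be tabulated with a few lines of code; here is a minimal Python sketch using only the standard library (the prices are the hypothetical figures from the earlier table):

```python
from collections import Counter

# Hypothetical Dec CME Live Cattle futures prices (cents/lb) from the slides
prices = [67.05, 66.89, 67.45, 68.39, 67.45, 70.10, 68.39]

# Counter tabulates how many times each value appears
freq = Counter(prices)

# Print an ungrouped frequency distribution, ordered by price
for price, count in sorted(freq.items()):
    print(f"{price:6.2f}  {count}")
```

Each distinct price appears once except 67.45 and 68.39, which each appear twice.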

Central Tendency

In statistics, the term central tendency relates to the way in which quantitative data tend to cluster around a "central value". A measure of central tendency is any of a number of ways of specifying this "central value". There are three important descriptive statistics that give measures of the central tendency of a variable:
- The mean
- The median
- The mode

The Mean

The arithmetic mean is the most commonly used type of average and is often referred to simply as the average. In mathematics and statistics, the arithmetic mean (or simply the mean) of a list of numbers is the sum of all numbers in the list divided by the number of items in the list. If the list is a statistical population, the mean of that population is called a population mean; if the list is a statistical sample, we call the resulting statistic a sample mean. If we denote a set of data by X = (x₁, x₂, ..., xₙ), then the sample mean is typically denoted with a horizontal bar over the variable (x̄, pronounced "x bar"). The Greek letter μ is used to denote the arithmetic mean of an entire population.

The Sample Mean

In mathematical notation, the sample mean of a set of data X = (x₁, x₂, ..., xₙ) is given by

x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ

To calculate the mean, all of the observations (values) of X are added and the result is divided by the number of observations (n). In the previous example, the mean price of the Dec CME Live Cattle futures contract is

x̄ = (67.05 + 66.89 + 67.45 + 68.39 + 67.45 + 70.10 + 68.39) / 7 = 475.72 / 7 = 67.96 cents/lb

The Median

In statistics, the median is the numeric value separating the higher half of a sample or population from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest to highest value and picking the middle one. If there is an even number of observations, there is no single middle value, so one often takes the mean of the two middle values. Arranging the price data from the previous example in ascending order gives

66.89, 67.05, 67.45, 67.45, 68.39, 68.39, 70.10

The median of this price series is 67.45.

The Mode

In statistics, the mode is the value that occurs most frequently in a data set. The mode is not necessarily unique, since the same maximum frequency may be attained at different values. Arranging the price data from the previous example in ascending order gives

66.89, 67.05, 67.45, 67.45, 68.39, 68.39, 70.10

There are two modes in the given price data, 67.45 and 68.39, so the mode of this sample is not unique; the sample price data may be said to be bimodal. A population or sample may be unimodal, bimodal, or multimodal.
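The three measures of central tendency for the price data can be computed with Python's standard `statistics` module; a small sketch (the prices are the hypothetical figures from the earlier slide):

```python
import statistics

# Hypothetical Dec CME Live Cattle futures prices (cents/lb)
prices = [67.05, 66.89, 67.45, 68.39, 67.45, 70.10, 68.39]

mean = statistics.mean(prices)        # sum of the values divided by n
median = statistics.median(prices)    # middle value after sorting
modes = statistics.multimode(prices)  # every value with the maximum frequency

print(round(mean, 2))   # 67.96
print(median)           # 67.45
print(modes)            # [67.45, 68.39] -- the data are bimodal
```

Note that `statistics.mode` would raise an error on some older Python versions for multimodal data; `multimode` (Python 3.8+) returns all modes.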

Statistical Dispersion

In statistics, statistical dispersion (also called statistical variability or variation) is the variability or spread in a variable or probability distribution. In particular, a measure of dispersion is a statistic (formula) that indicates how dispersed (i.e., spread out) the values of a given variable are. Common measures of statistical dispersion are:
- The variance
- The standard deviation

Dispersion is contrasted with location or central tendency, and together they are the most used properties of distributions.

The Variance

In statistics, the variance of a random variable or distribution is the expected (mean) value of the square of the deviation of that variable from its expected value or mean. Thus the variance is a measure of the amount of variation within the values of that variable, taking account of all possible values and their probabilities. If a random variable X has the expected (mean) value E[X] = μ, then the variance of X is

Var(X) = E[(X − μ)²]

The Variance

The above definition of variance encompasses random variables that are discrete or continuous. It can be expanded as follows:

Var(X) = E[(X − μ)²]
       = E[X² − 2μX + μ²]
       = E[X²] − 2μE[X] + μ²
       = E[X²] − 2μ² + μ²
       = E[X²] − μ²

The Variance: Properties

- Variance is non-negative because the squares are positive or zero.
- The variance of a constant a is zero, Var(a) = 0, and the variance of a variable in a data set is 0 if and only if all entries have the same value.
- Variance is invariant with respect to changes in a location parameter: if a constant is added to all values of the variable, the variance is unchanged, Var(X + a) = Var(X).
- If all values are scaled by a constant, the variance is scaled by the square of that constant: Var(aX) = a² Var(X).
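The location and scale properties above can be checked numerically; the sketch below uses a small population-variance helper and an illustrative data set (the helper name and values are mine, not from the slides):

```python
def pvariance(xs):
    # Population variance: mean of the squared deviations from the mean
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

data = [2.0, 4.0, 6.0, 8.0]       # illustrative values
shifted = [x + 10 for x in data]  # adding a constant leaves the variance unchanged
scaled = [3 * x for x in data]    # scaling by 3 multiplies the variance by 9

print(pvariance(data))     # 5.0
print(pvariance(shifted))  # 5.0
print(pvariance(scaled))   # 45.0
```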

The Sample Variance

If we have a series of n measurements of a random variable X as Xᵢ, where i = 1, 2, ..., n, then the sample variance, s², can be used to estimate the population variance of X = (x₁, x₂, ..., xₙ). The sample variance is calculated as

s² = (1/(n − 1)) Σᵢ₌₁ⁿ (xᵢ − x̄)²

The Sample Variance

The denominator, (n − 1), is known as the degrees of freedom in calculating s²:
- Intuitively, once x̄ is known, only n − 1 observation values are free to vary; one is predetermined by x̄.
- When n = 1, the variance of a single sample is obviously zero regardless of the true variance. This bias needs to be corrected for when n is small.

The Sample Variance

For the hypothetical price data for the Dec CME Live Cattle futures contract, 67.05, 66.89, 67.45, 67.45, 68.39, 68.39, 70.10, with sample mean x̄ = 67.96, the sample variance can be calculated as

s² = [(67.05 − 67.96)² + (66.89 − 67.96)² + 2·(67.45 − 67.96)² + 2·(68.39 − 67.96)² + (70.10 − 67.96)²] / (7 − 1) ≈ 1.24
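The same sample variance, along with the sample standard deviation discussed on the following slides, can be obtained from Python's `statistics` module, which also divides by n − 1:

```python
import statistics

# Hypothetical Dec CME Live Cattle futures prices (cents/lb)
prices = [67.05, 66.89, 67.45, 67.45, 68.39, 68.39, 70.10]

s2 = statistics.variance(prices)  # sample variance, denominator n - 1
s = statistics.stdev(prices)      # sample standard deviation, sqrt of s2

print(round(s2, 2))  # 1.24
print(round(s, 2))   # 1.11
```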

The Standard Deviation

In statistics, the standard deviation of a random variable or distribution is the square root of its variance. If a random variable X has the expected value (mean) E[X] = μ, then the standard deviation of X is

σ = √Var(X) = √E[(X − μ)²]

That is, the standard deviation σ (sigma) is the square root of the average value of (X − μ)².

The Standard Deviation

If we have a series of n measurements of a random variable X as Xᵢ, where i = 1, 2, ..., n, then the sample standard deviation, s, can be used to estimate the population standard deviation of X = (x₁, x₂, ..., xₙ). The sample standard deviation is calculated as

s = √[ (1/(n − 1)) Σᵢ₌₁ⁿ (xᵢ − x̄)² ]

The Mean Absolute Deviation

The mean or average deviation of X from its mean is always zero: the positive and negative deviations cancel out in the summation, which makes it a useless measure of dispersion. The mean absolute deviation (MAD), calculated as

MAD = (1/n) Σᵢ₌₁ⁿ |xᵢ − x̄|

solves the "canceling out" problem.

The MSD and RMSD

An alternative way to address the canceling-out problem is to square the deviations from the mean, which gives the mean squared deviation (MSD):

MSD = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)²

Squaring, however, leaves the measure in squared units; this can be solved by taking the square root of the MSD to obtain the root mean squared deviation (RMSD):

RMSD = √MSD = √[ (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)² ]
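The MAD, MSD, and RMSD of the price data can be computed directly from their definitions; a minimal sketch:

```python
from math import sqrt

# Hypothetical Dec CME Live Cattle futures prices (cents/lb)
prices = [67.05, 66.89, 67.45, 67.45, 68.39, 68.39, 70.10]
n = len(prices)
xbar = sum(prices) / n  # sample mean, 67.96

mad = sum(abs(x - xbar) for x in prices) / n    # mean absolute deviation
msd = sum((x - xbar) ** 2 for x in prices) / n  # mean squared deviation
rmsd = sqrt(msd)                                # root mean squared deviation

print(round(mad, 4))   # 0.8571
print(round(msd, 4))   # 1.0632
print(round(rmsd, 4))  # 1.0311
```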

RMSD vs. Standard Deviation

When calculating the RMSD, the squaring of the deviations gives greater weight to the deviations that are larger in absolute value, which may or may not be desirable. For statistical reasons, it turns out that a slight variation of the RMSD, known as the standard deviation (S_X), is more desirable as a measure of dispersion.

Variance vs. MSD, Standard Deviation vs. RMSD

Given the definitions above, the sample variance and the MSD differ only in the denominator (n − 1 versus n), so

s² = (n / (n − 1)) · MSD

and, taking square roots,

s = √(n / (n − 1)) · RMSD
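Since the sample variance and the MSD differ only in their denominators (n − 1 versus n), the identity s² = (n/(n − 1)) · MSD can be checked numerically on the price data:

```python
import statistics

# Hypothetical Dec CME Live Cattle futures prices (cents/lb)
prices = [67.05, 66.89, 67.45, 67.45, 68.39, 68.39, 70.10]
n = len(prices)
xbar = statistics.mean(prices)

msd = sum((x - xbar) ** 2 for x in prices) / n  # divides by n
s2 = statistics.variance(prices)                # divides by n - 1

# The two agree up to floating-point rounding
print(abs(s2 - (n / (n - 1)) * msd) < 1e-9)  # True
```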

Association

Bivariate statistics can be used to examine the degree to which two variables are related or associated, without implying that one causes the other. Multivariate statistics can be used to examine the degree to which multiple variables are related or associated, without implying that one causes any or some of the others. Two common measures of bivariate and multivariate association are:
- Covariance
- Correlation coefficient

Association: Bivariate Statistics

In Figure 3.3(a), Y and X are positively but weakly correlated, while in Figure 3.3(b) they are negatively and strongly correlated.

The Covariance

The covariance between two real-valued random variables X and Y, with means (expected values) μ_X and μ_Y, is

Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E[XY] − μ_X μ_Y

Cov(X, Y) can be negative, zero, or positive. Random variables whose covariance is zero are called uncorrelated.

Covariance

If X and Y are independent, then their covariance is zero. This follows because under independence,

E[XY] = E[X] E[Y]

Recalling the expanded form of the covariance, Cov(X, Y) = E[XY] − E[X]E[Y], and substituting, we get

Cov(X, Y) = E[X]E[Y] − E[X]E[Y] = 0

The converse, however, is generally not true: some pairs of random variables have covariance zero although they are not independent.
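The failure of the converse can be seen in a tiny discrete example: let X take the values −1, 0, 1 with equal probability and set Y = X², so Y is completely determined by X (the variables here are illustrative, not from the slides):

```python
# X takes each value in xs with probability 1/3; Y = X^2 is a function of X
xs = [-1, 0, 1]
ys = [x ** 2 for x in xs]

ex = sum(xs) / 3                              # E[X] = 0
ey = sum(ys) / 3                              # E[Y] = 2/3
exy = sum(x * y for x, y in zip(xs, ys)) / 3  # E[XY] = 0

cov = exy - ex * ey  # Cov(X, Y) = E[XY] - E[X]E[Y]
print(cov)  # 0.0 -- zero covariance, yet X and Y are clearly dependent
```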

The Covariance: Properties

If X and Y are real-valued random variables and a and b are constants ("constant" in this context means non-random), then the following facts are a consequence of the definition of covariance:

Cov(X, a) = 0
Cov(X, X) = Var(X)
Cov(X, Y) = Cov(Y, X)
Cov(aX, bY) = ab Cov(X, Y)
Cov(X + a, Y + b) = Cov(X, Y)

Variance of the Sum of Correlated Random Variables

If X and Y are real-valued random variables and a and b are constants ("constant" in this context means non-random), then the following facts are a consequence of the definitions of variance and covariance:

Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y)
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)

The variance of a finite sum of uncorrelated random variables is equal to the sum of their variances. This is because, if X and Y are uncorrelated, their covariance is 0, so

Var(X + Y) = Var(X) + Var(Y)

The Sample Covariance

The covariance is one measure of how closely the values taken by two variables X and Y vary together. If we have a series of n measurements of X and Y written as Xᵢ and Yᵢ, where i = 1, 2, ..., n, then the sample covariance can be used to estimate the population covariance between X = (X₁, X₂, …, Xₙ) and Y = (Y₁, Y₂, …, Yₙ). The sample covariance is calculated as

Cov(X, Y) = (1/(n − 1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ)

Correlation Coefficient

A disadvantage of the covariance statistic is that its magnitude cannot be easily interpreted, since it depends on the units in which X and Y are measured. The related, and more widely used, correlation coefficient remedies this disadvantage by standardizing the deviations from the mean:

ρ_X,Y = Cov(X, Y) / (σ_X σ_Y)

The correlation coefficient is symmetric, that is, ρ_X,Y = ρ_Y,X.

Correlation Coefficient

If we have a series of n measurements of X and Y written as Xᵢ and Yᵢ, where i = 1, 2, ..., n, then the sample correlation coefficient, r_x,y, can be used to estimate the population correlation coefficient between X and Y. The sample correlation coefficient is calculated as

r_x,y = Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ) / [(n − 1) s_X s_Y]

Correlation Coefficient

The value of the correlation coefficient falls between −1 and 1:
- r_x,y = 0 ⇒ X and Y are uncorrelated
- r_x,y = 1 ⇒ X and Y are perfectly positively correlated
- r_x,y = −1 ⇒ X and Y are perfectly negatively correlated
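The sample covariance and sample correlation coefficient can be computed directly from the formulas above; a self-contained sketch (the data set is illustrative, not from the slides):

```python
from math import sqrt

def sample_cov(xs, ys):
    # Sample covariance: divides by n - 1, like the sample variance
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    # Sample correlation: covariance standardized by both standard deviations
    sx = sqrt(sample_cov(xs, xs))  # Cov(X, X) = Var(X)
    sy = sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (sx * sy)

# Illustrative data: y tends to rise with x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 6.0]

print(round(sample_corr(x, y), 2))  # 0.85: strong positive correlation
print(round(sample_corr(x, x), 2))  # 1.0: perfect positive correlation
print(round(sample_corr(x, [-v for v in x]), 2))  # -1.0: perfect negative
```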