CIS 2033 A Modern Introduction to Probability and Statistics Understanding Why and How Chapter 17: Basic Statistical Models Slides by Dan Varano Modified by Longin Jan Latecki
17.1 Random Samples and Statistical Models
Random sample: A random sample is a collection of random variables X_1, X_2, …, X_n that have the same probability distribution and are mutually independent. If F is the distribution function of each random variable X_i in a random sample, we speak of a random sample from F. Similarly, we speak of a random sample from a density f, a random sample from an N(µ, σ²) distribution, etc.
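As an illustration (not from the slides), a minimal Python/NumPy sketch of simulating a random sample from an N(µ, σ²) distribution; the values of µ, σ, and n below are arbitrary choices.

import numpy as np

# Draw a random sample X_1, ..., X_n from an N(mu, sigma^2) distribution.
# mu, sigma, and n are arbitrary illustration values, not from the slides.
rng = np.random.default_rng(seed=0)
mu, sigma, n = 5.0, 2.0, 1000
x = rng.normal(loc=mu, scale=sigma, size=n)  # i.i.d. draws: same distribution, mutually independent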
17.1 continued
Statistical model for repeated measurements: A dataset consisting of values x_1, x_2, …, x_n of repeated measurements of the same quantity is modeled as the realization of a random sample X_1, X_2, …, X_n. The model may include a partial specification of the probability distribution of each X_i.
17.2 Distribution features and sample statistics
Empirical distribution function: F_n(a) = (number of X_i in (−∞, a]) / n
Law of large numbers: lim_{n→∞} P(|F_n(a) − F(a)| > ε) = 0 for every ε > 0
This implies that for most realizations F_n(a) ≈ F(a).
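A small Python sketch of this idea (the N(0, 1) sample and the evaluation points a are illustration choices, not from the slides): for large n the empirical distribution function should be close to the true F.

import numpy as np

def ecdf(sample, a):
    # F_n(a) = (number of X_i in (-inf, a]) / n
    return np.mean(sample <= a)

rng = np.random.default_rng(1)
x = rng.standard_normal(500)
for a in (-1.0, 0.0, 1.0):
    print(a, ecdf(x, a))  # close to F(a) = Phi(a): about 0.159, 0.500, 0.841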
17.2 cont. The histogram and kernel density estimate
Height of histogram on (x − h, x + h] ≈ f(x)
Kernel density estimate: f_{n,h}(x) ≈ f(x)
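A sketch of both approximations (the N(0, 1) sample, the point x, the bandwidth h, and the triangular kernel are illustration choices, not from the slides):

import numpy as np

rng = np.random.default_rng(2)
sample = rng.standard_normal(1000)
x, h, n = 0.0, 0.25, len(sample)

# Height of the histogram on (x - h, x + h]: count in the interval divided by n times the width.
hist_height = np.sum((sample > x - h) & (sample <= x + h)) / (n * 2 * h)

def kde(sample, x, h):
    # f_{n,h}(x) = (1 / (n h)) * sum K((x - X_i) / h), with triangular kernel K(u) = 1 - |u| on [-1, 1]
    u = (x - sample) / h
    k = np.where(np.abs(u) <= 1, 1 - np.abs(u), 0.0)
    return k.mean() / h

print(hist_height, kde(sample, x, h))  # both close to f(0) ≈ 0.3989 for the N(0, 1) density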
17.2 cont. The sample mean, sample median, and empirical quantiles
Sample mean: X̄_n ≈ µ
Sample median: Med(x_1, x_2, …, x_n) ≈ q_0.5 = F^inv(0.5)
Empirical quantile: q_n(p) ≈ F^inv(p) = q_p
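A quick numerical check (the Exp(1) sample and p = 0.9 are illustration choices, not from the slides):

import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=2000)

print(x.mean())             # sample mean X-bar_n, close to mu = 1
print(np.median(x))         # sample median, close to F^inv(0.5) = ln 2 ≈ 0.693
print(np.quantile(x, 0.9))  # empirical quantile q_n(0.9), close to F^inv(0.9) = ln 10 ≈ 2.303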
17.2 cont. The sample variance and standard deviation, and the MAD
Sample variance and standard deviation: S_n² ≈ σ² and S_n ≈ σ
MAD(X_1, X_2, …, X_n) ≈ F^inv(0.75) − F^inv(0.5) (for F with a symmetric density)
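A sketch of these three sample statistics (the N(0, 1) sample is an illustration choice, not from the slides):

import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(2000)

s2 = x.var(ddof=1)                         # sample variance S_n^2, close to sigma^2 = 1
s = np.sqrt(s2)                            # sample standard deviation S_n, close to sigma = 1
mad = np.median(np.abs(x - np.median(x)))  # MAD, close to F^inv(0.75) - F^inv(0.5) ≈ 0.674 for N(0, 1)
print(s2, s, mad)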
17.2 cont. Relative frequencies
For a random sample X_1, X_2, …, X_n from a discrete distribution with probability mass function p, one has that
(number of X_i equal to a) / n ≈ p(a)
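For example (the Bin(10, 0.3) sample and the value a = 3 are illustration choices, not from the slides):

import numpy as np

rng = np.random.default_rng(5)
x = rng.binomial(n=10, p=0.3, size=5000)
a = 3

rel_freq = np.mean(x == a)  # (number of X_i equal to a) / n
print(rel_freq)             # close to p(3) = C(10,3) * 0.3^3 * 0.7^7 ≈ 0.267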
17.4 The linear regression model
Simple linear regression model: In a simple linear regression model for a bivariate dataset (x_1, y_1), (x_2, y_2), …, (x_n, y_n), we assume that x_1, x_2, …, x_n are nonrandom and that y_1, y_2, …, y_n are realizations of random variables Y_1, Y_2, …, Y_n satisfying
Y_i = α + βx_i + U_i for i = 1, 2, …, n,
where U_1, …, U_n are independent random variables with E[U_i] = 0 and Var(U_i) = σ².
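A minimal simulation of data from this model (α, β, σ, and the grid of x_i values are illustration choices, not from the slides):

import numpy as np

rng = np.random.default_rng(6)
alpha, beta, sigma = 2.0, 0.5, 1.0
x = np.linspace(0, 10, 50)                           # nonrandom x_1, ..., x_n
u = rng.normal(loc=0.0, scale=sigma, size=x.size)    # independent errors with E[U_i] = 0, Var(U_i) = sigma^2
y = alpha + beta * x + u                             # realizations of Y_1, ..., Y_n
print(list(zip(x[:3], y[:3])))                       # first few (x_i, y_i) pairs of the simulated dataset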
17.4 cont.
Y_1, Y_2, …, Y_n do not form a random sample: the Y_i have different distributions, because every Y_i has a different expectation
E[Y_i] = E[α + βx_i + U_i] = α + βx_i + E[U_i] = α + βx_i.
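A Monte Carlo check of this expectation for one fixed x_i (α, β, σ, and x_i are illustration values, not from the slides):

import numpy as np

rng = np.random.default_rng(7)
alpha, beta, sigma, x_i = 2.0, 0.5, 1.0, 4.0

y_i = alpha + beta * x_i + rng.normal(0.0, sigma, size=100_000)
print(y_i.mean(), alpha + beta * x_i)  # the average of simulated Y_i approaches alpha + beta * x_i = 4.0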