1 Machine Learning Week 4

2 Basic Probability
Envision an experiment whose result is unknown. The collection of all possible outcomes is called the sample space. A set of outcomes, i.e. a subset of the sample space, is called an event. A probability space is a triple (Ω, ℱ, Pr), where Ω is a sample space, ℱ is a collection of events from the sample space, and Pr is a probability law that assigns a number to each event in ℱ. For any events A and B, Pr must satisfy:
Pr(Ω) = 1
Pr(A) ≥ 0
Pr(Aᶜ) = 1 − Pr(A)
Pr(A ∪ B) = Pr(A) + Pr(B), if A ∩ B = ∅
If A and B are events in ℱ with Pr(B) ≠ 0, the conditional probability of A given B is Pr(A | B) = Pr(A ∩ B) / Pr(B)

3 Random Variables
A random variable is "a number that you don't know… yet"
Discrete vs. continuous
Cumulative distribution function
Density function
Probability distribution (mass) function
Joint distributions
Conditional distributions
Functions of random variables
Moments of random variables
Transforms and generating functions

4 Conditioning
Frequently, the conditional distribution of Y given X is easier to find than the distribution of Y alone. If so, evaluate probabilities about Y using the conditional distribution along with the marginal distribution of X:
Pr(Y ∈ A) = Σₓ Pr(Y ∈ A | X = x) · Pr(X = x)
Example: draw 2 balls, one after the other and without replacement, from a jar containing four balls numbered 1, 2, 3 and 4. X = number on the first ball, Y = number on the second ball, Z = X·Y. What is Pr(Z > 5)?
Key: Z may be easier to evaluate when X is known, as in the sketch below.
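A minimal sketch of the calculation (plain Python; the enumeration and the conditioning loop are illustrative, not from the original slides):

```python
# Enumerate all ordered draws without replacement to check Pr(Z > 5) directly,
# then recompute it by conditioning on X (law of total probability).
from itertools import permutations

balls = [1, 2, 3, 4]
draws = list(permutations(balls, 2))              # all (X, Y) pairs, 12 equally likely
favourable = [(x, y) for x, y in draws if x * y > 5]
print(len(favourable) / len(draws))               # 0.5

p = 0.0
for x in balls:
    others = [y for y in balls if y != x]
    p_given_x = sum(1 for y in others if x * y > 5) / len(others)  # Pr(Z > 5 | X = x)
    p += p_given_x / len(balls)                   # weight by Pr(X = x) = 1/4
print(p)                                          # 0.5, same answer
```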

5 Moments of Random Variables
Expectation = "average": E[X]
Variance = "volatility": Var(X) = E[(X − E[X])²]
Standard deviation: σ = √Var(X)
Coefficient of variation: σ / E[X]

6 Linear Functions of Random Variables
Covariance: Cov(X, Y) = E[XY] − E[X]·E[Y]
Correlation: ρ = Cov(X, Y) / (σ_X σ_Y)
If X and Y are independent then Cov(X, Y) = 0 and Var(X + Y) = Var(X) + Var(Y), as checked numerically below.
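A quick numerical check of these facts (a sketch assuming NumPy is available; variable names are illustrative):

```python
# Sample covariance/correlation of two independent normals, and the
# additivity of variance under independence.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)                 # generated independently of x

print(np.cov(x, y)[0, 1])                    # ≈ 0: Cov(X, Y) vanishes
print(np.corrcoef(x, y)[0, 1])               # ≈ 0: so does the correlation
print(np.var(x + y), np.var(x) + np.var(y))  # ≈ equal: Var(X+Y) = Var(X) + Var(Y)
```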

7 Bernoulli Distribution
"Single coin flip": p = Pr(success); N = 1 if success, 0 otherwise
Pr(N = 1) = p, Pr(N = 0) = 1 − p; E[N] = p, Var(N) = p(1 − p)

8 Binomial Distribution
"n independent coin flips": p = Pr(success); N = # of successes
Pr(N = k) = C(n, k) pᵏ(1 − p)ⁿ⁻ᵏ, k = 0, 1, …, n; E[N] = np, Var(N) = np(1 − p)

9 Geometric Distribution
"Independent coin flips": p = Pr(success); N = # of flips until (and including) the first success
Pr(N = k) = (1 − p)ᵏ⁻¹p, k = 1, 2, …; E[N] = 1/p

10 Poisson Distribution
"Occurrence of rare events": λ = average rate of occurrence per period; N = # of events in an arbitrary period
Pr(N = k) = e^(−λ)λᵏ/k!, k = 0, 1, 2, …; E[N] = Var(N) = λ
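The four discrete distributions above can be evaluated with scipy.stats (a sketch; the parameter values are arbitrary examples):

```python
# pmf values and moments for the Bernoulli, binomial, geometric and Poisson laws.
from scipy import stats

p, n, lam = 0.3, 10, 2.0

print(stats.bernoulli.pmf(1, p))     # Pr(N = 1) = p
print(stats.binom.pmf(4, n, p))      # Pr(4 successes in 10 flips)
print(stats.geom.pmf(3, p))          # Pr(first success on flip 3) = (1-p)^2 * p
print(stats.poisson.pmf(0, lam))     # Pr(no events in a period) = e^{-λ}
print(stats.binom.mean(n, p), stats.binom.var(n, p))  # np and np(1-p)
```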

11 Uniform Distribution
X is equally likely to fall anywhere within the interval (a, b)
Density: f(x) = 1/(b − a) for a < x < b; E[X] = (a + b)/2

12 Exponential Distribution
X is nonnegative and most likely to fall near 0
Density: f(x) = λe^(−λx) for x ≥ 0; E[X] = 1/λ
Also memoryless; more on this later…

13 Normal Distribution
X follows a "bell-shaped" density function:
f(x) = (1/(σ√(2π))) · e^(−(x − μ)²/(2σ²))
By the central limit theorem, the distribution of a sum of independent and identically distributed random variables, suitably standardized, approaches a normal distribution as the number of summands goes to infinity.
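A small simulation illustrating the theorem (a sketch assuming NumPy; the choice of Uniform(0, 1) summands and n = 30 is arbitrary):

```python
# Standardized sums of i.i.d. uniforms behave like a standard normal.
import numpy as np

rng = np.random.default_rng(42)
n = 30                                            # number of summed variables
sums = rng.uniform(0, 1, size=(100_000, n)).sum(axis=1)

# Each Uniform(0,1) has mean 1/2 and variance 1/12, so standardize accordingly.
z = (sums - n * 0.5) / np.sqrt(n / 12)
print(z.mean(), z.std())                          # ≈ 0 and ≈ 1
print(np.mean(np.abs(z) < 1.96))                  # ≈ 0.95, as for N(0, 1)
```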

14 Stochastic Processes
A stochastic process is a random variable that changes over time, or a sequence of numbers that you don't know yet.
Poisson process
Continuous-time Markov chains

15 Time Series Autoregressive (AR) and Moving Average (MA) models

16 Time Series
A time series is a sequence of numerical data in which each item is associated with a particular instant in time.
With current computer technology, we have daily series on interest rates, the hourly Telerate interest rate index, and stock prices by the minute (or even second).

17 An analysis of a single sequence of data is called univariate time-series analysis.
An analysis of several sets of data for the same sequence of time periods is called multivariate time-series analysis or, more simply, multiple time-series analysis.

18 Stochastic processes
Time series are an example of a stochastic or random process.
A stochastic process is 'a statistical phenomenon that evolves in time according to probabilistic laws'.
Mathematically, a stochastic process is an indexed collection of random variables {X_t : t ∈ T}.

19 Stochastic processes
We are concerned only with processes indexed by time: either discrete-time processes such as {X_t : t = 0, 1, 2, …} or continuous-time processes such as {X(t) : t ≥ 0}.

20 Continuous vs. Discrete
We usually base our inference on a single observation or realization of the process over some period of time, say [0, T] (a continuous interval of time), or at a sequence of time points {0, 1, 2, …, T}.

21 Specification of a process
A complete specification would give all the joint distributions of the X_t. A simpler approach is to specify only the moments—this is sufficient if all the joint distributions are normal.
The mean and variance functions are given by μ(t) = E(X_t) and σ²(t) = Var(X_t).

22 Autocovariance
Because the random variables comprising the process are not independent, we must also specify their covariance, the autocovariance function:
γ(t₁, t₂) = Cov(X_{t₁}, X_{t₂}) = E[(X_{t₁} − μ(t₁))(X_{t₂} − μ(t₂))]

23 Autocorrelation
It is useful to standardize the autocovariance function (acvf). Considering the stationary case only, where γ depends only on the lag k, the autocorrelation function (acf) is ρ(k) = γ(k)/γ(0).
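A direct implementation of the sample acf (a sketch; `sample_acf` is a hypothetical helper, not from the slides):

```python
# Sample autocorrelation r(k) = Σ (x_t - x̄)(x_{t+k} - x̄) / Σ (x_t - x̄)²,
# estimated from a single realization of a stationary series.
import numpy as np

def sample_acf(x, max_lag):
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)                       # n times the lag-0 autocovariance
    return np.array([np.sum(xc[:len(x) - k] * xc[k:]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(1)
print(sample_acf(rng.normal(size=1000), 5))       # ≈ [1, 0, 0, 0, 0, 0] for white noise
```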

24 Stationarity
Inference is easiest when a process is stationary—its distribution does not change over time. This is strict stationarity.
A process is weakly stationary if its mean and autocovariance functions do not change over time.

25 Weak stationarity
The mean is constant, E(X_t) = μ, and the autocovariance depends only on the time difference or lag between the two time points involved: γ(t, t + k) = γ(k).

26 White noise
This is a purely random process: a sequence of independent and identically distributed random variables.
It has constant mean and variance. Also γ(k) = 0 for k ≠ 0, so ρ(0) = 1 and ρ(k) = 0 for k ≠ 0.

27 Several Models for Time Series
(1) a purely random process, (2) a random walk, (3) a moving average (MA) process, (4) an autoregressive (AR) process, (5) an autoregressive moving average (ARMA) process, and (6) an autoregressive integrated moving average (ARIMA) process.

28 Purely Random Process
Autocovariance function: γ(k) = σ² if k = 0, and γ(k) = 0 otherwise
Autocorrelation function: ρ(0) = 1, and ρ(k) = 0 for k ≠ 0

29 Random Walk
X_t = X_{t−1} + Z_t, where {Z_t} is a purely random process. With X_0 = 0, X_t = Z_1 + Z_2 + … + Z_t, so E(X_t) = tμ_Z and Var(X_t) = tσ_Z², and the process is non-stationary.
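A simulation of the growing variance (a sketch assuming NumPy):

```python
# A random walk is the cumulative sum of white noise; Var(X_t) = t·σ_Z² grows
# with t, which is why the process is not stationary.
import numpy as np

rng = np.random.default_rng(2)
z = rng.normal(size=(1000, 500))       # 1000 independent walks, 500 steps each
x = z.cumsum(axis=1)                   # X_t = Z_1 + ... + Z_t

print(x[:, 9].var(), x[:, 499].var())  # ≈ 10 and ≈ 500, matching Var(X_t) = t·σ_Z²
```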

30 Moving average processes
Start with {Z_t} being white noise or purely random, with mean zero and standard deviation σ_Z. {X_t} is a moving average process of order q (written MA(q)) if, for some constants β₀, β₁, …, β_q, we have
X_t = β₀Z_t + β₁Z_{t−1} + … + β_qZ_{t−q}
Usually β₀ = 1

31 Moving average processes
The mean and variance are given by E(X_t) = 0 and Var(X_t) = σ_Z² · Σᵢ βᵢ²
The process is weakly stationary because the mean is constant and the covariance does not depend on t

32 Moving average processes
If the Z_t's are normal then so is the process, and it is then strictly stationary.
The autocorrelation is ρ(k) = Σᵢ₌₀^{q−k} βᵢβᵢ₊ₖ / Σᵢ₌₀^q βᵢ² for k = 0, 1, …, q, and ρ(k) = 0 for k > q

33 Moving average processes
Note the autocorrelation cuts off at lag q.
For the MA(1) process with β₀ = 1: ρ(0) = 1, ρ(1) = β₁/(1 + β₁²), and ρ(k) = 0 for k ≥ 2
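A simulation of the cutoff (a sketch; β₁ = 0.6 is an arbitrary example):

```python
# Simulate MA(1): X_t = Z_t + β1·Z_{t-1}. The sample acf should be
# ≈ β1/(1 + β1²) at lag 1 and ≈ 0 at every lag beyond q = 1.
import numpy as np

beta1 = 0.6
rng = np.random.default_rng(3)
z = rng.normal(size=200_001)
x = z[1:] + beta1 * z[:-1]

for k in (1, 2, 3):
    print(k, np.corrcoef(x[:-k], x[k:])[0, 1])   # ≈ 0.441, ≈ 0, ≈ 0
```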

34 Moving average processes
In order to ensure there is a unique MA process for a given acf, we impose the condition of invertibility.
This ensures that when the process is written in series form, the series converges.
For the MA(1) process X_t = Z_t + βZ_{t−1}, the condition is |β| < 1

35 Moving average processes
For general processes introduce the backward shift operator B, defined by BʲX_t = X_{t−j}
Then the MA(q) process is given by X_t = (β₀ + β₁B + β₂B² + … + β_qB^q)Z_t = θ(B)Z_t

36 Moving average processes
The general condition for invertibility is that all the roots of the equation θ(B) = 0 lie outside the unit circle (have modulus greater than one)
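This check is easy to automate (a sketch; `is_invertible` is a hypothetical helper):

```python
# Invertibility of MA(q): all roots of θ(B) = 1 + β1·B + ... + βq·B^q must
# lie outside the unit circle.
import numpy as np

def is_invertible(betas):
    # np.roots expects coefficients from the highest power down: [βq, ..., β1, 1]
    roots = np.roots(list(reversed([1.0] + list(betas))))
    return bool(np.all(np.abs(roots) > 1))

print(is_invertible([0.5]))   # True:  X_t = Z_t + 0.5·Z_{t-1}, root at B = -2
print(is_invertible([2.0]))   # False: X_t = Z_t + 2·Z_{t-1}, root at B = -0.5
```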

37 Autoregressive processes
Assume {Z_t} is purely random with mean zero and standard deviation σ_Z.
Then the autoregressive process of order p, or AR(p) process, is X_t = α₁X_{t−1} + α₂X_{t−2} + … + α_pX_{t−p} + Z_t

38 Autoregressive processes
The first-order autoregression is X_t = αX_{t−1} + Z_t
Provided |α| < 1 it may be written as an infinite-order MA process
Using the backshift operator we have (1 − αB)X_t = Z_t

39 Autoregressive processes
From the previous equation we have X_t = (1 − αB)⁻¹Z_t = (1 + αB + α²B² + …)Z_t = Z_t + αZ_{t−1} + α²Z_{t−2} + …

40 Autoregressive processes
Then E(X_t) = 0, and if |α| < 1, Var(X_t) = σ_Z²/(1 − α²); the autocovariance is γ(k) = αᵏσ_Z²/(1 − α²), so ρ(k) = αᵏ
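These formulas are easy to confirm by simulation (a sketch; α = 0.7 is an arbitrary stationary choice):

```python
# Simulate AR(1), X_t = α·X_{t-1} + Z_t with σ_Z = 1, and compare the sample
# variance and lag-1 autocorrelation with σ_Z²/(1 - α²) and α.
import numpy as np

alpha, n = 0.7, 200_000
rng = np.random.default_rng(4)
z = rng.normal(size=n)

x = np.empty(n)
x[0] = z[0]
for t in range(1, n):
    x[t] = alpha * x[t - 1] + z[t]

print(x.var(), 1 / (1 - alpha ** 2))             # both ≈ 1.96
print(np.corrcoef(x[:-1], x[1:])[0, 1])          # ≈ α = 0.7
```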

41 Autoregressive processes
The AR(p) process can be written as (1 − α₁B − α₂B² − … − α_pB^p)X_t = Z_t, or φ(B)X_t = Z_t

42 Autoregressive processes
This is X_t = f(B)Z_t with f(B) = (1 − α₁B − α₂B² − … − α_pB^p)⁻¹ = 1 + β₁B + β₂B² + … for some β₁, β₂, …
This gives X_t as an infinite MA process, so it has mean zero

43 Autoregressive processes
Conditions are needed to ensure that various series converge, and hence that the variance exists and the autocovariance can be defined.
Essentially these are requirements that the βᵢ become small quickly enough for large i

44 Autoregressive processes
The βᵢ may not be easy to find, however. The alternative is to work with the αᵢ.
The acf is expressible in terms of the roots πᵢ, i = 1, 2, …, p of the auxiliary equation y^p − α₁y^{p−1} − … − α_p = 0

45 Autoregressive processes
Then a necessary and sufficient condition for stationarity is that for every i, |πᵢ| < 1
An equivalent way of expressing this is that the roots of the equation φ(B) = 1 − α₁B − α₂B² − … − α_pB^p = 0 must lie outside the unit circle
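The unit-circle form of the condition can be checked numerically, mirroring the invertibility check for MA processes (a sketch; `is_stationary` is a hypothetical helper):

```python
# Stationarity of AR(p): all roots of φ(B) = 1 - α1·B - ... - αp·B^p must
# lie outside the unit circle.
import numpy as np

def is_stationary(alphas):
    # Coefficients of φ(B) from the highest power of B down to the constant 1.
    roots = np.roots(list(reversed([1.0] + [-a for a in alphas])))
    return bool(np.all(np.abs(roots) > 1))

print(is_stationary([0.7]))        # True:  AR(1) with |α| < 1
print(is_stationary([1.1]))        # False: explosive AR(1)
print(is_stationary([0.5, 0.3]))   # True:  a stationary AR(2)
```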

46 ARMA processes
Combine AR and MA processes.
An ARMA process of order (p, q) is given by X_t = α₁X_{t−1} + … + α_pX_{t−p} + Z_t + β₁Z_{t−1} + … + β_qZ_{t−q}
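A minimal sketch using statsmodels, assuming it is available (ArmaProcess takes the lag polynomials φ(B) and θ(B) directly, so the AR coefficients enter with a minus sign):

```python
# Simulate an ARMA(1, 1) with α1 = 0.7 and β1 = 0.4 and query its properties.
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

proc = ArmaProcess(ar=np.array([1, -0.7]), ma=np.array([1, 0.4]))
print(proc.isstationary, proc.isinvertible)   # True True

x = proc.generate_sample(nsample=500)         # one realization of length 500
print(x.shape)                                # (500,)
```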

47 ARMA processes
Alternative expressions are possible using the backshift operator: φ(B)X_t = θ(B)Z_t

48 ARMA processes
An ARMA process can be written in pure MA or pure AR form, the operators being possibly of infinite order.
Usually the mixed form requires fewer parameters.

49 ARIMA processes
General autoregressive integrated moving average processes are called ARIMA processes.
When differenced, say, d times, the process is an ARMA process.
Call the differenced process W_t = ∇ᵈX_t = (1 − B)ᵈX_t. Then W_t is an ARMA process and φ(B)W_t = θ(B)Z_t

50 ARIMA processes
Alternatively specify the process as φ(B)(1 − B)ᵈX_t = θ(B)Z_t
This is an ARIMA process of order (p, d, q)

51 ARIMA processes
The model for X_t is non-stationary because the AR operator on the left-hand side, φ(B)(1 − B)ᵈ, has d roots on the unit circle.
d is often 1.
A random walk is ARIMA(0, 1, 0).
Can include seasonal terms—see later
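Differencing in action (a sketch assuming NumPy):

```python
# Differencing a random walk, an ARIMA(0,1,0) process, once recovers white
# noise: W_t = (1 - B)X_t = Z_t.
import numpy as np

rng = np.random.default_rng(5)
z = rng.normal(size=10_000)
x = z.cumsum()                            # random walk (non-stationary)

w = np.diff(x)                            # W_t = X_t - X_{t-1}
print(np.allclose(w, z[1:]))              # True: the differences are the noise
print(np.corrcoef(w[:-1], w[1:])[0, 1])   # ≈ 0: no autocorrelation remains
```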

52 Non-zero mean
We have assumed that the mean is zero in the ARIMA models. There are two alternatives:
mean-correct all the W_t terms in the model, or
incorporate a constant term in the model

