1 Machine Learning Week 4

2 Basic Probability
Envision an experiment whose result is unknown. The collection of all possible outcomes is called the sample space. A set of outcomes, i.e. a subset of the sample space, is called an event. A probability space is a triple (Ω, ℱ, Pr), where Ω is a sample space, ℱ is a collection of events from the sample space, and Pr is a probability law that assigns a number to each event in ℱ. For any events A and B, Pr must satisfy:
Pr(Ω) = 1
Pr(A) ≥ 0
Pr(Aᶜ) = 1 − Pr(A)
Pr(A ∪ B) = Pr(A) + Pr(B), if A ∩ B = ∅
If A and B are events in ℱ with Pr(B) ≠ 0, the conditional probability of A given B is Pr(A | B) = Pr(A ∩ B) / Pr(B)

3 Random Variables
A random variable is "a number that you don't know… yet"
Discrete vs. continuous
Cumulative distribution function
Density function
Probability distribution (mass) function
Joint distributions
Conditional distributions
Functions of random variables
Moments of random variables
Transforms and generating functions

4 Conditioning
Frequently, the conditional distribution of Y given X is easier to find than the distribution of Y alone. If so, evaluate probabilities about Y using the conditional distribution along with the marginal distribution of X:
Pr(Y ∈ A) = Σₓ Pr(Y ∈ A | X = x) · Pr(X = x)
Example: draw 2 balls, one after the other and without replacement, from a jar containing four balls numbered 1, 2, 3 and 4. X = number on the first ball, Y = number on the second ball, Z = X·Y. What is Pr(Z > 5)?
Key: Z may be easier to evaluate when X is known, as in the sketch below.
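A minimal sketch of the calculation (plain Python; the enumeration and the conditioning loop are illustrative, not from the original slides):

```python
# Enumerate all ordered draws without replacement to check Pr(Z > 5) directly,
# then recompute it by conditioning on X (law of total probability).
from itertools import permutations

balls = [1, 2, 3, 4]
draws = list(permutations(balls, 2))              # all (X, Y) pairs, 12 equally likely
favourable = [(x, y) for x, y in draws if x * y > 5]
print(len(favourable) / len(draws))               # 0.5

p = 0.0
for x in balls:
    others = [y for y in balls if y != x]
    p_given_x = sum(1 for y in others if x * y > 5) / len(others)  # Pr(Z > 5 | X = x)
    p += p_given_x / len(balls)                   # weight by Pr(X = x) = 1/4
print(p)                                          # 0.5, same answer
```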

5 Moments of Random Variables
Expectation = "average": E[X]
Variance = "volatility": Var(X) = E[(X − E[X])²]
Standard deviation: σ = √Var(X)
Coefficient of variation: σ / E[X]

6 Linear Functions of Random Variables
Covariance: Cov(X, Y) = E[XY] − E[X]·E[Y]
Correlation: ρ = Cov(X, Y) / (σ_X σ_Y)
If X and Y are independent then Cov(X, Y) = 0 and Var(X + Y) = Var(X) + Var(Y), as checked numerically below.
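A quick numerical check of these facts (a sketch assuming NumPy is available; variable names are illustrative):

```python
# Sample covariance/correlation of two independent normals, and the
# additivity of variance under independence.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)                 # generated independently of x

print(np.cov(x, y)[0, 1])                    # ≈ 0: Cov(X, Y) vanishes
print(np.corrcoef(x, y)[0, 1])               # ≈ 0: so does the correlation
print(np.var(x + y), np.var(x) + np.var(y))  # ≈ equal: Var(X+Y) = Var(X) + Var(Y)
```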

7 Bernoulli Distribution
"Single coin flip": p = Pr(success); N = 1 if success, 0 otherwise
Pr(N = 1) = p, Pr(N = 0) = 1 − p; E[N] = p, Var(N) = p(1 − p)

8 Binomial Distribution
"n independent coin flips": p = Pr(success); N = # of successes
Pr(N = k) = C(n, k) pᵏ(1 − p)ⁿ⁻ᵏ, k = 0, 1, …, n; E[N] = np, Var(N) = np(1 − p)

9 Geometric Distribution
"Independent coin flips": p = Pr(success); N = # of flips until (and including) the first success
Pr(N = k) = (1 − p)ᵏ⁻¹p, k = 1, 2, …; E[N] = 1/p

10 Poisson Distribution
"Occurrence of rare events": λ = average rate of occurrence per period; N = # of events in an arbitrary period
Pr(N = k) = e^(−λ)λᵏ/k!, k = 0, 1, 2, …; E[N] = Var(N) = λ
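The four discrete distributions above can be evaluated with scipy.stats (a sketch; the parameter values are arbitrary examples):

```python
# pmf values and moments for the Bernoulli, binomial, geometric and Poisson laws.
from scipy import stats

p, n, lam = 0.3, 10, 2.0

print(stats.bernoulli.pmf(1, p))     # Pr(N = 1) = p
print(stats.binom.pmf(4, n, p))      # Pr(4 successes in 10 flips)
print(stats.geom.pmf(3, p))          # Pr(first success on flip 3) = (1-p)^2 * p
print(stats.poisson.pmf(0, lam))     # Pr(no events in a period) = e^{-λ}
print(stats.binom.mean(n, p), stats.binom.var(n, p))  # np and np(1-p)
```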

11 Uniform Distribution
X is equally likely to fall anywhere within the interval (a, b)
Density: f(x) = 1/(b − a) for a < x < b; E[X] = (a + b)/2

12 Exponential Distribution
X is nonnegative and most likely to fall near 0
Density: f(x) = λe^(−λx) for x ≥ 0; E[X] = 1/λ
Also memoryless; more on this later…

13 Normal Distribution
X follows a "bell-shaped" density function:
f(x) = (1/(σ√(2π))) · e^(−(x − μ)²/(2σ²))
By the central limit theorem, the distribution of a sum of independent and identically distributed random variables, suitably standardized, approaches a normal distribution as the number of summands goes to infinity.
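A small simulation illustrating the theorem (a sketch assuming NumPy; the choice of Uniform(0, 1) summands and n = 30 is arbitrary):

```python
# Standardized sums of i.i.d. uniforms behave like a standard normal.
import numpy as np

rng = np.random.default_rng(42)
n = 30                                            # number of summed variables
sums = rng.uniform(0, 1, size=(100_000, n)).sum(axis=1)

# Each Uniform(0,1) has mean 1/2 and variance 1/12, so standardize accordingly.
z = (sums - n * 0.5) / np.sqrt(n / 12)
print(z.mean(), z.std())                          # ≈ 0 and ≈ 1
print(np.mean(np.abs(z) < 1.96))                  # ≈ 0.95, as for N(0, 1)
```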

14 Stochastic Processes
A stochastic process is a random variable that changes over time, or a sequence of numbers that you don't know yet.
Poisson process
Continuous-time Markov chains

15 Time Series Autoregressive (AR) and Moving Average (MA) models

16 Time Series
A time series is a sequence of numerical data in which each item is associated with a particular instant in time.
With current computer technology, we have daily series on interest rates, the hourly Telerate interest rate index, and stock prices by the minute (or even second).

17 An analysis of a single sequence of data is called univariate time-series analysis.
An analysis of several sets of data for the same sequence of time periods is called multivariate time-series analysis or, more simply, multiple time-series analysis.

18 Stochastic processes
Time series are an example of a stochastic or random process.
A stochastic process is 'a statistical phenomenon that evolves in time according to probabilistic laws'.
Mathematically, a stochastic process is an indexed collection of random variables {X_t : t ∈ T}.

19 Stochastic processes
We are concerned only with processes indexed by time: either discrete-time processes such as {X_t : t = 0, 1, 2, …} or continuous-time processes such as {X(t) : t ≥ 0}.

20 Continuous vs. Discrete
We usually base our inference on a single observation or realization of the process over some period of time, say [0, T] (a continuous interval of time), or at a sequence of time points {0, 1, 2, …, T}.

21 Specification of a process
A complete specification would give all the joint distributions of the X_t. A simpler approach is to specify only the moments—this is sufficient if all the joint distributions are normal.
The mean and variance functions are given by μ(t) = E(X_t) and σ²(t) = Var(X_t).

22 Autocovariance
Because the random variables comprising the process are not independent, we must also specify their covariance, the autocovariance function:
γ(t₁, t₂) = Cov(X_{t₁}, X_{t₂}) = E[(X_{t₁} − μ(t₁))(X_{t₂} − μ(t₂))]

23 Autocorrelation
It is useful to standardize the autocovariance function (acvf). Considering the stationary case only, where γ depends only on the lag k, the autocorrelation function (acf) is ρ(k) = γ(k)/γ(0).
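A direct implementation of the sample acf (a sketch; `sample_acf` is a hypothetical helper, not from the slides):

```python
# Sample autocorrelation r(k) = Σ (x_t - x̄)(x_{t+k} - x̄) / Σ (x_t - x̄)²,
# estimated from a single realization of a stationary series.
import numpy as np

def sample_acf(x, max_lag):
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)                       # n times the lag-0 autocovariance
    return np.array([np.sum(xc[:len(x) - k] * xc[k:]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(1)
print(sample_acf(rng.normal(size=1000), 5))       # ≈ [1, 0, 0, 0, 0, 0] for white noise
```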

24 Stationarity
Inference is easiest when a process is stationary—its distribution does not change over time. This is strict stationarity.
A process is weakly stationary if its mean and autocovariance functions do not change over time.

25 Weak stationarity
The mean is constant, E(X_t) = μ, and the autocovariance depends only on the time difference or lag between the two time points involved: γ(t, t + k) = γ(k).

26 White noise
This is a purely random process: a sequence of independent and identically distributed random variables.
It has constant mean and variance. Also γ(k) = 0 for k ≠ 0, so ρ(0) = 1 and ρ(k) = 0 for k ≠ 0.

27 Several Models for Time Series
(1) a purely random process, (2) a random walk, (3) a moving average (MA) process, (4) an autoregressive (AR) process, (5) an autoregressive moving average (ARMA) process, and (6) an autoregressive integrated moving average (ARIMA) process.

28 Purely Random Process
Autocovariance function: γ(k) = σ² if k = 0, and γ(k) = 0 otherwise
Autocorrelation function: ρ(0) = 1, and ρ(k) = 0 for k ≠ 0

29 Random Walk
X_t = X_{t−1} + Z_t, where {Z_t} is a purely random process. With X_0 = 0, X_t = Z_1 + Z_2 + … + Z_t, so E(X_t) = tμ_Z and Var(X_t) = tσ_Z², and the process is non-stationary.
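A simulation of the growing variance (a sketch assuming NumPy):

```python
# A random walk is the cumulative sum of white noise; Var(X_t) = t·σ_Z² grows
# with t, which is why the process is not stationary.
import numpy as np

rng = np.random.default_rng(2)
z = rng.normal(size=(1000, 500))       # 1000 independent walks, 500 steps each
x = z.cumsum(axis=1)                   # X_t = Z_1 + ... + Z_t

print(x[:, 9].var(), x[:, 499].var())  # ≈ 10 and ≈ 500, matching Var(X_t) = t·σ_Z²
```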

30 Moving average processes
Start with {Z_t} being white noise or purely random, with mean zero and standard deviation σ_Z. {X_t} is a moving average process of order q (written MA(q)) if, for some constants β₀, β₁, …, β_q, we have
X_t = β₀Z_t + β₁Z_{t−1} + … + β_qZ_{t−q}
Usually β₀ = 1

31 Moving average processes
The mean and variance are given by E(X_t) = 0 and Var(X_t) = σ_Z² · Σᵢ βᵢ²
The process is weakly stationary because the mean is constant and the covariance does not depend on t

32 Moving average processes
If the Z_t's are normal then so is the process, and it is then strictly stationary.
The autocorrelation is ρ(k) = Σᵢ₌₀^{q−k} βᵢβᵢ₊ₖ / Σᵢ₌₀^q βᵢ² for k = 0, 1, …, q, and ρ(k) = 0 for k > q

33 Moving average processes
Note the autocorrelation cuts off at lag q.
For the MA(1) process with β₀ = 1: ρ(0) = 1, ρ(1) = β₁/(1 + β₁²), and ρ(k) = 0 for k ≥ 2
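A simulation of the cutoff (a sketch; β₁ = 0.6 is an arbitrary example):

```python
# Simulate MA(1): X_t = Z_t + β1·Z_{t-1}. The sample acf should be
# ≈ β1/(1 + β1²) at lag 1 and ≈ 0 at every lag beyond q = 1.
import numpy as np

beta1 = 0.6
rng = np.random.default_rng(3)
z = rng.normal(size=200_001)
x = z[1:] + beta1 * z[:-1]

for k in (1, 2, 3):
    print(k, np.corrcoef(x[:-k], x[k:])[0, 1])   # ≈ 0.441, ≈ 0, ≈ 0
```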

34 Moving average processes
In order to ensure there is a unique MA process for a given acf, we impose the condition of invertibility.
This ensures that when the process is written in series form, the series converges.
For the MA(1) process X_t = Z_t + βZ_{t−1}, the condition is |β| < 1

35 Moving average processes
For general processes introduce the backward shift operator B, defined by BʲX_t = X_{t−j}
Then the MA(q) process is given by X_t = (β₀ + β₁B + β₂B² + … + β_qB^q)Z_t = θ(B)Z_t

36 Moving average processes
The general condition for invertibility is that all the roots of the equation θ(B) = 0 lie outside the unit circle (have modulus greater than one)
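This check is easy to automate (a sketch; `is_invertible` is a hypothetical helper):

```python
# Invertibility of MA(q): all roots of θ(B) = 1 + β1·B + ... + βq·B^q must
# lie outside the unit circle.
import numpy as np

def is_invertible(betas):
    # np.roots expects coefficients from the highest power down: [βq, ..., β1, 1]
    roots = np.roots(list(reversed([1.0] + list(betas))))
    return bool(np.all(np.abs(roots) > 1))

print(is_invertible([0.5]))   # True:  X_t = Z_t + 0.5·Z_{t-1}, root at B = -2
print(is_invertible([2.0]))   # False: X_t = Z_t + 2·Z_{t-1}, root at B = -0.5
```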

37 Autoregressive processes
Assume {Z_t} is purely random with mean zero and standard deviation σ_Z.
Then the autoregressive process of order p, or AR(p) process, is X_t = α₁X_{t−1} + α₂X_{t−2} + … + α_pX_{t−p} + Z_t

38 Autoregressive processes
The first-order autoregression is X_t = αX_{t−1} + Z_t
Provided |α| < 1 it may be written as an infinite-order MA process
Using the backshift operator we have (1 − αB)X_t = Z_t

39 Autoregressive processes
From the previous equation we have X_t = (1 − αB)⁻¹Z_t = (1 + αB + α²B² + …)Z_t = Z_t + αZ_{t−1} + α²Z_{t−2} + …

40 Autoregressive processes
Then E(X_t) = 0, and if |α| < 1, Var(X_t) = σ_Z²/(1 − α²); the autocovariance is γ(k) = αᵏσ_Z²/(1 − α²), so ρ(k) = αᵏ
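These formulas are easy to confirm by simulation (a sketch; α = 0.7 is an arbitrary stationary choice):

```python
# Simulate AR(1), X_t = α·X_{t-1} + Z_t with σ_Z = 1, and compare the sample
# variance and lag-1 autocorrelation with σ_Z²/(1 - α²) and α.
import numpy as np

alpha, n = 0.7, 200_000
rng = np.random.default_rng(4)
z = rng.normal(size=n)

x = np.empty(n)
x[0] = z[0]
for t in range(1, n):
    x[t] = alpha * x[t - 1] + z[t]

print(x.var(), 1 / (1 - alpha ** 2))             # both ≈ 1.96
print(np.corrcoef(x[:-1], x[1:])[0, 1])          # ≈ α = 0.7
```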

41 Autoregressive processes
The AR(p) process can be written as (1 − α₁B − α₂B² − … − α_pB^p)X_t = Z_t, or φ(B)X_t = Z_t

42 Autoregressive processes
This is X_t = f(B)Z_t with f(B) = (1 − α₁B − α₂B² − … − α_pB^p)⁻¹ = 1 + β₁B + β₂B² + … for some β₁, β₂, …
This gives X_t as an infinite MA process, so it has mean zero

43 Autoregressive processes
Conditions are needed to ensure that various series converge, and hence that the variance exists and the autocovariance can be defined.
Essentially these are requirements that the βᵢ become small quickly enough for large i

44 Autoregressive processes
The βᵢ may not be easy to find, however. The alternative is to work with the αᵢ.
The acf is expressible in terms of the roots πᵢ, i = 1, 2, …, p of the auxiliary equation y^p − α₁y^{p−1} − … − α_p = 0

45 Autoregressive processes
Then a necessary and sufficient condition for stationarity is that for every i, |πᵢ| < 1
An equivalent way of expressing this is that the roots of the equation φ(B) = 1 − α₁B − α₂B² − … − α_pB^p = 0 must lie outside the unit circle
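The unit-circle form of the condition can be checked numerically, mirroring the invertibility check for MA processes (a sketch; `is_stationary` is a hypothetical helper):

```python
# Stationarity of AR(p): all roots of φ(B) = 1 - α1·B - ... - αp·B^p must
# lie outside the unit circle.
import numpy as np

def is_stationary(alphas):
    # Coefficients of φ(B) from the highest power of B down to the constant 1.
    roots = np.roots(list(reversed([1.0] + [-a for a in alphas])))
    return bool(np.all(np.abs(roots) > 1))

print(is_stationary([0.7]))        # True:  AR(1) with |α| < 1
print(is_stationary([1.1]))        # False: explosive AR(1)
print(is_stationary([0.5, 0.3]))   # True:  a stationary AR(2)
```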

46 ARMA processes
Combine AR and MA processes.
An ARMA process of order (p, q) is given by X_t = α₁X_{t−1} + … + α_pX_{t−p} + Z_t + β₁Z_{t−1} + … + β_qZ_{t−q}
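A minimal sketch using statsmodels, assuming it is available (ArmaProcess takes the lag polynomials φ(B) and θ(B) directly, so the AR coefficients enter with a minus sign):

```python
# Simulate an ARMA(1, 1) with α1 = 0.7 and β1 = 0.4 and query its properties.
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

proc = ArmaProcess(ar=np.array([1, -0.7]), ma=np.array([1, 0.4]))
print(proc.isstationary, proc.isinvertible)   # True True

x = proc.generate_sample(nsample=500)         # one realization of length 500
print(x.shape)                                # (500,)
```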

47 ARMA processes
Alternative expressions are possible using the backshift operator: φ(B)X_t = θ(B)Z_t

48 ARMA processes
An ARMA process can be written in pure MA or pure AR form, the operators being possibly of infinite order.
Usually the mixed form requires fewer parameters.

49 ARIMA processes
General autoregressive integrated moving average processes are called ARIMA processes.
When differenced, say, d times, the process is an ARMA process.
Call the differenced process W_t = ∇ᵈX_t = (1 − B)ᵈX_t. Then W_t is an ARMA process and φ(B)W_t = θ(B)Z_t

50 ARIMA processes
Alternatively specify the process as φ(B)(1 − B)ᵈX_t = θ(B)Z_t
This is an ARIMA process of order (p, d, q)

51 ARIMA processes
The model for X_t is non-stationary because the AR operator on the left-hand side, φ(B)(1 − B)ᵈ, has d roots on the unit circle.
d is often 1.
A random walk is ARIMA(0, 1, 0).
Can include seasonal terms—see later
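Differencing in action (a sketch assuming NumPy):

```python
# Differencing a random walk, an ARIMA(0,1,0) process, once recovers white
# noise: W_t = (1 - B)X_t = Z_t.
import numpy as np

rng = np.random.default_rng(5)
z = rng.normal(size=10_000)
x = z.cumsum()                            # random walk (non-stationary)

w = np.diff(x)                            # W_t = X_t - X_{t-1}
print(np.allclose(w, z[1:]))              # True: the differences are the noise
print(np.corrcoef(w[:-1], w[1:])[0, 1])   # ≈ 0: no autocorrelation remains
```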

52 Non-zero mean
We have assumed that the mean is zero in the ARIMA models. There are two alternatives:
mean-correct all the W_t terms in the model, or
incorporate a constant term in the model

