Basic Time Series Analyzing variable star data for the amateur astronomer.

What Is Time Series?
- Single variable x that changes over time t
  - can be multiple variables, W W W A T
- Light curve: x = brightness (magnitude)
- Each observation consists of two numbers
- Time t is considered perfectly precise
- Data (observation/measurement/estimate) x is not perfectly precise

Two meanings of “Time Series”
- TS is a process: how the variable changes over time
- TS is an execution of the process, often called a realization of the TS process
- A realization (observed TS) consists of pairs of numbers $(t_n, x_n)$, one such pair for each observation

Goals of TS analysis
- Use the process to define the behavior of its realizations
- Use a realization, i.e. observed data $(t_n, x_n)$, to discover the process
  - This is our main goal

Special Needs
- Astronomical data creates special circumstances for time series analysis
- Mainly because the data are irregularly spaced in time (uneven sampling)
- Sometimes the time spacing (the “sampling”) is even pathological, with big gaps that have periods all their own

Analysis Step 1
- Plot the data and look at the graph! (visual inspection)
- Eye+brain combination is the world’s best pattern-recognition system
- BUT it is also the most easily fooled
  - “pictures in the clouds”
- Use visual inspection to get ideas
- Confirm them with numerical analysis

Data = Signal + Noise
- True brightness is a function of time f(t)
  - it’s probably smooth (or nearly so)
- There’s some measurement error ε
  - it’s random
  - it’s almost certainly not smooth
- Additive model: data $x_n$ at time $t_n$ is the sum of signal $f(t_n)$ and noise $\varepsilon_n$
  $x_n = f(t_n) + \varepsilon_n$
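
The additive model can be tried out on synthetic data. The sketch below is only an illustration: the sinusoidal `true_brightness`, its mean magnitude, semi-amplitude and period, and the 0.2 mag noise level are all assumed values chosen for the example, not anything measured.

```python
import math
import random


def true_brightness(t, mean_mag=8.0, semi_amp=0.5, period=100.0):
    """Hypothetical smooth signal f(t): a sinusoid, in magnitudes."""
    return mean_mag + semi_amp * math.sin(2.0 * math.pi * t / period)


def simulate_light_curve(n_obs=150, span_days=150.0, sigma=0.2, seed=1):
    """Return unevenly spaced times t_n and data x_n = f(t_n) + eps_n."""
    rng = random.Random(seed)
    # Uneven sampling: observation times drawn at random over the span.
    times = sorted(rng.uniform(0.0, span_days) for _ in range(n_obs))
    # i.i.d. Gaussian noise with mean 0 and standard deviation sigma.
    data = [true_brightness(t) + rng.gauss(0.0, sigma) for t in times]
    return times, data


times, data = simulate_light_curve()
# Because the signal is known here, the noise can be recovered exactly.
noise = [x - true_brightness(t) for t, x in zip(times, data)]
print(f"{len(times)} observations, first pair: ({times[0]:.2f}, {data[0]:.2f})")
```

With real observations f(t) is unknown, so the noise can only be approximated by residuals from a model.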

Noise is Random
- That’s its definition!
- Deterministic part = signal
- Random part = noise
- Usually the true brightness is deterministic, therefore it’s the signal
- Usually the noise is measurement error

Achieve the Goal
- Means we have to figure out how the signal behaves and how the noise behaves
- For light curves, we usually just assume how the noise behaves
- But we still should determine its parameters

What Determines Random?
- Probability distribution (pdf or pmf)
- pdf: probability that the value falls in a small range of width dε, centered on ε, is
  Probability = P(ε) dε
- pmf: probability that the value is ε is P(ε)
- pdf/pmf has some mean value μ
- pdf/pmf has some standard deviation σ

Most Common Noise Model
- i.i.d. = “independent identically distributed”
- Each noise value is independent of others
  $P_{12}(x_1, x_2) = P_1(x_1)\,P_2(x_2)$
- They’re all identically distributed
  $P_1(x) = P_2(x)$ for every value $x$

What is the Distribution?
- Most common is Gaussian (a.k.a. Normal)
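
For reference, the standard textbook form of the Gaussian density, written in the P(ε) dε notation used earlier, with mean μ and standard deviation σ:

```latex
P(\varepsilon)\, d\varepsilon
  = \frac{1}{\sigma\sqrt{2\pi}}
    \exp\!\left[ -\frac{(\varepsilon - \mu)^2}{2\sigma^2} \right] d\varepsilon
```

About 68% of values fall within one σ of μ and about 95% within two σ.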

Noise Parameters
- μ = mean = $\langle \varepsilon \rangle$ (the expected value of the noise)
  - Usually assumed zero (i.e., data unbiased)
- σ² = variance = $\langle (\varepsilon - \mu)^2 \rangle$
- σ = √(σ²) = standard deviation
  - Typical value is 0.2 mag. for visual data
  - Smaller for CCD/photoelectric (we hope!)
- Note: don’t disparage visual data; what they lack in individual precision they make up for by the power of sheer numbers

Is the default noise model right?
- No! We know it’s wrong
- Bias: μ values not zero
- NOT identically distributed: different observers have different μ, σ values
- Sometimes not even independent (autocorrelated noise)
- BUT i.i.d. Gaussian is still a useful working hypothesis, so W W W A T

Even if …
- Even if we know the form of the noise …
- We still have to figure out its parameters
- Is it unbiased (i.e. centered at zero, so μ = 0)?
- How big does it tend to be (what’s σ)?

And …
- We still have to separate the signal from the noise
- And of course figure out the form of the signal, i.e.,
- Figure out the process which determines the signal
- Whew!

Simplest Possible Signal
- None at all!
- $f(t) = \text{constant} = \beta_0$
- This is the null hypothesis for many tests
- But we can’t be sure f(t) is constant …
- … that’s only a model of the signal

Separate Signal from Noise
- We already said data = signal + noise
- Therefore data – signal = noise
- Approximate signal by model
- Approximate noise by residuals
- data – model = residuals
  $x_n - y_n = R_n$
- If model is correct, residuals are all noise

Estimate Noise Parameters
- Use residuals $R_n$ to estimate noise parameters
- Estimate mean μ by average
- Estimate standard deviation σ by sample standard deviation
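
A minimal sketch of these two estimates, assuming the simplest model (a constant, fitted as the data average) and synthetic data with an assumed noise level of 0.2 mag. The helper names `residuals` and `estimate_noise` are made up for this example.

```python
import random
import statistics


def residuals(data, model):
    """R_n = x_n - y_n: data minus model, point by point."""
    return [x - y for x, y in zip(data, model)]


def estimate_noise(res):
    """Estimate the noise mean by the average and sigma by the sample
    standard deviation (the N-1 form) of the residuals."""
    return statistics.mean(res), statistics.stdev(res)


# Synthetic constant star: assumed true value 9.3 mag, sigma = 0.2 mag.
rng = random.Random(7)
data = [9.3 + rng.gauss(0.0, 0.2) for _ in range(200)]

# Constant model f(t) = beta_0, with beta_0 fitted as the data average.
beta0 = statistics.mean(data)
model = [beta0] * len(data)

mu_hat, sigma_hat = estimate_noise(residuals(data, model))
print(f"mean of residuals = {mu_hat:.3g}, sample std dev = {sigma_hat:.3f}")
```

Because the constant was fitted as the data average, the residual mean is zero by construction here. The average of residuals only tests for bias when the model comes from somewhere other than the same data.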

Averages
- When we average i.i.d. noise we expect to get the mean
- Standard deviation of the average (usually called the standard error) is less than standard deviation of the data

Confidence Interval
- 95% confidence interval is the range in which we expect the average to lie, 95% of the time
- About 2 standard errors above or below the expected value
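
As a rough illustration, the average, its standard error and an approximate 95% interval can be computed as follows. The ten magnitudes are made-up toy values, and the factor 1.96 assumes Gaussian noise and a reasonably large sample; for only a handful of points a Student t factor would be somewhat larger.

```python
import math
import statistics


def mean_with_ci(values, z=1.96):
    """Return (average, standard error, (low, high)) for an approximate
    95% interval: average +/- z standard errors."""
    n = len(values)
    avg = statistics.mean(values)
    # Standard error = sample standard deviation / sqrt(N).
    std_err = statistics.stdev(values) / math.sqrt(n)
    return avg, std_err, (avg - z * std_err, avg + z * std_err)


# Made-up magnitude estimates, for illustration only.
mags = [9.1, 9.4, 9.2, 9.5, 9.3, 9.0, 9.6, 9.3, 9.2, 9.4]
avg, std_err, (lo, hi) = mean_with_ci(mags)
print(f"average = {avg:.3f} +/- {std_err:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```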

Does average change?
- Divide time into bins
  - Usually of equal time width (often 10 days)
  - Sometimes of equal number of data N
- Compute average and standard deviation within each bin
- IF signal is constant AND noise is consistent, THEN expected value of data average will be constant
- So: do the “bin averages” show more variation than is expected from noise?
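
Binning by equal time width can be sketched as below. The function name `bin_averages`, the choice to label each bin by its start time, and the seven toy observations are all assumptions made for the example.

```python
import math
import statistics
from collections import defaultdict


def bin_averages(times, data, width=10.0):
    """Group observations into time bins of equal width and return one
    row (bin start, count, average, sample std dev) per non-empty bin."""
    bins = defaultdict(list)
    for t, x in zip(times, data):
        bins[math.floor(t / width)].append(x)
    rows = []
    for k in sorted(bins):
        vals = bins[k]
        avg = statistics.mean(vals)
        # A standard deviation needs at least two points in the bin.
        sd = statistics.stdev(vals) if len(vals) > 1 else float("nan")
        rows.append((k * width, len(vals), avg, sd))
    return rows


# Toy data (days, magnitudes), with a gap between day 20 and day 30.
times = [1.0, 3.5, 8.2, 12.1, 14.9, 19.0, 31.4]
data = [9.2, 9.4, 9.3, 9.0, 9.1, 9.2, 9.6]
rows = bin_averages(times, data)
for start, count, avg, sd in rows:
    print(f"bin starting day {start:5.1f}: N={count}, avg={avg:.2f}, sd={sd:.2f}")
```

Empty bins are simply absent from the result, which is how gaps in the sampling show up.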

ANOVA test
- Compare variance of averages to variance of data (ANalysis Of VAriance = ANOVA)
- In other words, compare variance between bins to variance within bins
- “F-test” gives a “p-value,” the probability of getting a result at least that extreme IF the data are just noise
- Low p-value ⇒ probably NOT just noise
  - Either we haven’t found all the signal
  - Or the noise isn’t the simple kind
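
The F statistic behind this comparison can be sketched with the standard one-way ANOVA formulas. The function name `anova_f` and the three toy bins are assumptions for illustration; this computes only the statistic and its degrees of freedom, not the p-value.

```python
import statistics


def anova_f(groups):
    """One-way ANOVA: return (F, df_between, df_within), where F is the
    between-bin mean square divided by the within-bin mean square."""
    groups = [g for g in groups if len(g) > 0]
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the bin averages about the grand average.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the data about their own bin average.
    ss_within = sum(
        sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups
    )
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Three toy bins of magnitudes whose averages clearly differ.
groups = [[9.2, 9.4, 9.3], [9.0, 9.1, 9.2], [9.5, 9.6, 9.7]]
f_stat, df_between, df_within = anova_f(groups)
print(f"F = {f_stat:.3f}, df.between = {df_between}, df.within = {df_within}")
```

The degrees of freedom follow the same pattern as in tabulated results: 150 data in 3 bins give 2 and 147, and 150 data in 15 bins give 14 and 135. The p-value is the upper tail of the F distribution with those degrees of freedom; if SciPy is installed, `scipy.stats.f.sf(f_stat, df_between, df_within)` is one way to get it.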

ANOVA test
50-day averages:
  Fstat      df.between  df.within  p
  0.315563   2           147        0.729871
  NOT significant
10-day averages:
  Fstat      df.between  df.within  p
  0.728138   14          135        0.743133
  NOT significant

ANOVA test
50-day averages:
  Fstat      df.between  df.within  p
  13.25758   2           147        5e-06
  IS significant
10-day averages:
  Fstat      df.between  df.within  p
  2.546476   14          135        0.002879
  IS significant

Averages Rule!
- Excellent way to reduce the noise
  - because σ(ave) = σ(raw) / √N
- Excellent way to measure the noise
- Very little change to signal
  - unless signal changes faster than averaging time
- So in most cases averages smooth the data, i.e., reduce noise but not signal
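
The √N rule follows in one line from the i.i.d. assumption. The sketch below assumes N data values in a bin, each with the same variance σ² and mutually independent, with the signal constant across the bin:

```latex
\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n
\quad\Longrightarrow\quad
\operatorname{Var}(\bar{x})
  = \frac{1}{N^2}\sum_{n=1}^{N}\operatorname{Var}(x_n)
  = \frac{N\sigma^2}{N^2}
  = \frac{\sigma^2}{N},
\qquad
\sigma(\text{ave}) = \frac{\sigma(\text{raw})}{\sqrt{N}}
```

Independence is what allows the variances to add. With autocorrelated noise the average improves more slowly than 1/√N.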

Decompose the Signal
- Additive model: sum of component signals
- Non-periodic part
  - sometimes called trend
  - sometimes called secular variation
- Repeating (periodic) part
  - or almost-periodic (pseudoperiodic) part
  - can be multiple periodic parts (multiperiodic)
- $f(t) = S(t) + P(t)$

Periodic Signal
- Discover that it’s periodic!
- Find the period P
  - Or frequency ν
  - $P\nu = 1$, $\nu = 1/P$, $P = 1/\nu$
- Find amplitude A = size of variation
  - Often use A to denote the semi-amplitude, which is half the full amplitude
- Find waveform (i.e., cycle shape)

Periodogram
- Searches for periodic behavior
- Test many frequencies (i.e., many periods)
- For each frequency, compute a power
- Higher power ⇒ more likely it’s periodic with that frequency (that period)
- Plot of power vs frequency is a periodogram, a.k.a. power spectrum
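
The "test many frequencies, compute a power for each" recipe can be sketched with the classic Lomb-Scargle formula, which is one common choice for unevenly sampled data. Everything specific here is an assumption for the example: the function names, the variance normalization, the synthetic 25-day sinusoid and the frequency grid.

```python
import math
import random


def lomb_scargle_power(times, data, freq):
    """Classic Lomb-Scargle power at one frequency (cycles per day),
    normalized by the sample variance of the data."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    omega = 2.0 * math.pi * freq
    # Time offset tau makes the result independent of the time origin.
    s2 = sum(math.sin(2.0 * omega * t) for t in times)
    c2 = sum(math.cos(2.0 * omega * t) for t in times)
    tau = math.atan2(s2, c2) / (2.0 * omega)
    cs = [math.cos(omega * (t - tau)) for t in times]
    sn = [math.sin(omega * (t - tau)) for t in times]
    yc = sum((x - mean) * c for x, c in zip(data, cs))
    ys = sum((x - mean) * s for x, s in zip(data, sn))
    cc = sum(c * c for c in cs)
    ss = sum(s * s for s in sn)
    return (yc * yc / cc + ys * ys / ss) / (2.0 * var)


def periodogram(times, data, freqs):
    """Power at each trial frequency."""
    return [lomb_scargle_power(times, data, f) for f in freqs]


# Synthetic test: 25-day sinusoid, uneven sampling, 0.2 mag noise.
rng = random.Random(3)
true_period = 25.0
times = sorted(rng.uniform(0.0, 300.0) for _ in range(200))
data = [
    9.0 + 0.5 * math.sin(2.0 * math.pi * t / true_period) + rng.gauss(0.0, 0.2)
    for t in times
]
freqs = [0.002 * k for k in range(1, 101)]  # 0.002 to 0.2 cycles/day
powers = periodogram(times, data, freqs)
best_freq = freqs[powers.index(max(powers))]
print(f"peak at {best_freq:.3f} cycles/day, period {1.0 / best_freq:.1f} days")
```

Plotting `powers` against `freqs` gives the periodogram. The grid spacing matters: peaks have a width of roughly 1/(time span), so a grid much coarser than that can miss the true frequency.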

Periodograms
- Fourier analysis ⇒ Fourier periodogram
  - Don’t use DFT or FFT because of uneven time sampling
  - Use Lomb-Scargle modified periodogram OR
  - DCDFT (date-compensated discrete Fourier transform)
- Folded light curve ⇒ AoV periodogram
- Many more … these are the most common
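
The folded-light-curve idea can be sketched as follows: fold the data at a trial period, bin by phase, and apply the same between-bin versus within-bin F statistic used for time bins. This is a simplified illustration of the approach, not the exact algorithm of any particular AoV program; the function names, the 8 phase bins, the synthetic data and the trial-period grid are all assumptions.

```python
import math
import random


def fold(times, period, epoch=0.0):
    """Phase of each observation, in [0, 1), for a trial period."""
    return [((t - epoch) / period) % 1.0 for t in times]


def aov_statistic(times, data, period, n_bins=8):
    """F statistic comparing variance between phase bins to variance
    within phase bins, for the data folded at the trial period."""
    bins = [[] for _ in range(n_bins)]
    for phase, x in zip(fold(times, period), data):
        bins[min(int(phase * n_bins), n_bins - 1)].append(x)
    bins = [b for b in bins if b]  # drop empty phase bins
    n, k = len(data), len(bins)
    grand_mean = sum(data) / n
    means = [sum(b) / len(b) for b in bins]
    ss_between = sum(len(b) * (m - grand_mean) ** 2 for b, m in zip(bins, means))
    ss_within = sum((x - m) ** 2 for b, m in zip(bins, means) for x in b)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Synthetic test: 25-day sinusoid, uneven sampling, 0.2 mag noise.
rng = random.Random(5)
times = sorted(rng.uniform(0.0, 300.0) for _ in range(200))
data = [
    9.0 + 0.5 * math.sin(2.0 * math.pi * t / 25.0) + rng.gauss(0.0, 0.2)
    for t in times
]
trial_periods = [15.0 + 0.25 * k for k in range(101)]  # 15 to 40 days
stats = [aov_statistic(times, data, p) for p in trial_periods]
best_stat = max(stats)
best_period = trial_periods[stats.index(best_stat)]
print(f"best trial period {best_period:.2f} days, F = {best_stat:.1f}")
```

Unlike a Fourier-based power, this statistic makes no assumption that the cycle shape is sinusoidal, which is why it suits folded light curves with sharp or asymmetric waveforms.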

DCDFT periodogram

AoV periodogram

Lots lots more …
- Non-periodic signals
- Periodic but not perfectly periodic
  - (parameters are changing)
- What if the noise is something “different”?
- Come to the next workshop!

Enjoy observing variables
- See your own data used in real scientific study (AJ, ApJ, MNRAS, A&A, PASP, …)
- Participate in monitoring and observing programs
- Assist in space science and astronomy
- Make your own discoveries!
http://www.aavso.org/
