Presentation on theme: "Searching for Periodic Signals in Time Series Data"— Presentation transcript:
1Searching for Periodic Signals in Time Series Data Least Squares Sine FittingDiscrete Fourier TransformLomb-Scargle PeriodogramPre-whitening of DataOther techniquesPhase Dispersion MinimizationString LengthWavelets
2Period AnalysisHow do you know if you have a periodic signal in your data?What is the period?
6Advantages of Least Sqaures sine fitting: Good for finding periods in relatively sparse dataDisadvantages of Least Sqaures sine fitting:Signal may not always be a sine wave (e.g. eccentric orbits)No assessement of false alarm probability (more later)Don‘t always trust your results
7This is fake data of pure random noise with a s = 30 m/s This is fake data of pure random noise with a s = 30 m/s. Lesson: poorly sampled noise almost always can give you a period, but it it not signficant
8[( ( ] ) ) S S 2. The Discrete Fourier Transform Any function can be fit as a sum of sine and cosinesN0FT(w) = Xj (t) e–iwtRecall eiwt = cos wt + i sinwtj=1X(t) is the time series1N0Power:Px(w) = | FTX(w)|2N0 = number of points1[(2(2S)S)]Px(w) =Xj cos wtjXj sin wtjN0A DFT gives you as a function of frequency the amplitude (power) of each sine wave that is in the data
9eixs = cos(xs) + i sin (xs) The continous form of the Fourier transform:F(s) = f(x) e–ixs dxf(x) = 1/2p F(s) eixs dseixs = cos(xs) + i sin (xs)This is only done on paper, in the real world (computers) you always use a discrete Fourier transform (DFT)
10The Fourier transform tells you the amplitude of sine (cosine) components to a data (time, pixel, x,y, etc) stringGoal: Find what structure (peaks) are real, and what are artifacts of sampling, or due to the presence of noise
11FT Ao P Ao 1/P A pure sine wave is a delta function in Fourier space xnA constant value is a delta function with zero frequency in Fourier space: Always subtract off „dc“ level.
12Useful concept: Convolution f(u)f(x–u)du = f * ff(x):f(x):
13Useful concept: Convolution f(x-u)a1a2a3g(x)a3a2a1Convolution is a smoothing function
14ConvolutionIn Fourier space the convolution is just the product of the two transforms:Normal Space Fourier Spacef*g F G
15DFT of a pure sine wave:widthSide lobesSo why isn‘t it a d-function?
16It would be if we measured the blue line out to infinity: But we measure the red points. Our sampling degrades the delta function and introduces sidelobes
17In time spaceIn Fourier Space×Dt1/P*1/Dt×*==1/P1/Dt
18WindowData16 min period sampled regularly for 3 hours
19The longer the data window, the narrower is the width of the sinc function window:
203ps dn = Error in the period (frequency) of a peak in DFT: 2 N1/2 T A s = error of measurementT = time span of your observationsA = amplitude of your signalN = number of data points
22Alias periods:Undersampled periods appearing as another period
23Alias periods: P –1 P –1 P –1 P –1 (1 day)–1 P –1 P –1 (29.53 d)–1 falseP –1aliasP –1true=+Common Alias Periods:P –1false(1 day)–1+P –1trueday=P –1false(29.53 d)–1+P –1true=monthP –1false=( d)–1+P –1trueyear
24Nyquist Frequency N < fs/2 If T is your sampling rate which corresponds to a frequency of fs, then signals with frequencies up to fs/2 can be unambiguously reconstructed. This is the Nyquist frequency, N:N < fs/2e.g. Suppose you observe a variable star once per night. Then the highest frequency you can determine in your data is 0.5 c/d = 2 days
25When you do a DFT on a sine wave with a period = 10, sampling = 1: n0=0.1, 1/Dt = 1 Nyquist frequency-n0+n0positive nnegative n
26The effects of noise: s =10 m/s s =50 m/s s =100 m/s s =200 m/s Real Peakss =10 m/ss =50 m/ss =100 m/ss =200 m/s2 sine waves ampltudes of 100 and 50 m/s. Noise added at different levels
273. Lomb-Scargle Periodograms [SXj cos w(tj–t)]2jXj cos2 w(tj–t)[SXj sin w(tj–t)]2jXj sin2 w(tj–t)1212Px(w) =+tan(2wt) =(Ssin 2wtj)/(Scos 2wtj)jjPower is a measure of the statistical significance of that frequency (period):Scargle, Astrophysical Journal, 263, 835, 1982
28Power: DFT versus Scargle N=15Scargle PowerAmplitude (m/s)N=40N=100Frequency (c/d)DFTs give you the amplitude of a periodic signal in the data. This does not change with more data. The Lomb-Scargle power gives you the statistical significance of a period. The more data you have the more significant the detection is, thus the higher power with more data
29False Alarm Probability (FAP) The FAP is the probability that random noise will produce a peak with Lomb-Scargle Power the same as your observed peak in a certain frequency rangeUnknown period:Where P = Scargle PowerN = number of independent frequencies in the frequency range of interestFAP ≈ 1 – (1–e–P)NKnown period:In this case you have only one independent frequencyFAP ≈ e–PScargle Power (significance) is increased by lower level of noise and/or more data points
30False Alarm Probability (FAP) The probability that noise can produce the highest peak over a range ≈ 1 – (1–e–P)NFAP ≈ e–PThe probability that noise can produce the this peak exactly at this frequency = e–P
31Example: A transit candidate from BEST Why is the FAP Impotant?Example: A transit candidate from BESTDepth [%] 1.2Duration [h] ?Orbital period [d]Semi mayor axis[AU]Number of detections 1Target field No. 8Host star K...(?)Magnitude(B.E.S.T.)Radius[Rsun] Radius of planet ? [RJup] ?To confirm you need radial velocity measurements, but you do not have a period…DSS1/POSSI
3216 one-hour observations made with the 2m coude echelle
34Published (Dreizler et al Published (Dreizler et al. 2003) Radial Velocity Curve of the transiting planet OGLE 3Wrong Phase (by 180 degrees) for a transiting planet!
35Discovery of a rapidly oscillating Ap star with 16.3 min period FAP ≈ 10–5
36Lesson: Do not believe any FAP < 0.01 My limit: < 0.001 b CrBSmall FAP does not always mean a real signalPeriod = 11.5 minFAP = 0.015Period = 16.3 minFAP = 10–5Lesson: Do not believe any FAP < 0.01My limit: < 0.001Better to miss a real period than to declare a false one
37How do you get the number of independent Frequencies? Determining FAP: To use the Scargle formula you need the number of independent frequencies.How do you get the number of independent Frequencies?First Approximation: Use the number of data points N0Horne & Baliunas (1986, Astrophysical Journal, 302, 757):Ni = – N N02 = number of independent frequencies
38Use Scargle FAP only as an estimate Use Scargle FAP only as an estimate. A more valid determination of the FAP requires Monte Carlo Simulations:Method 1:Create random noise at the same level as your dataSample the random noise in the same manner as your dataCalculate Scargle periodogram of noise and determine highest peak in frequency range of interestRepeat times = NtotalAdd the number of noise periodograms with power greater than your data = NnoiseFAP = Nnoise/NtotalAssumes Gaussian noise. What if your noise is not Gaussian, or has some unknown characteristics?
39Use Scargle FAP only as an estimate Use Scargle FAP only as an estimate. A more valid determination of the FAP requires Monte Carlo Simulations:Method 2:Randomly shuffle the measured values (velocity, light, etc) keeping the times of your observations fixedCalculate Scargle periodogram of random data and determine highest peak in frequency range of interestReshuffle your data times = NtotalAdd the number of „random“ periodograms with power greater than your data = NnoiseFAP = Nnoise/NtotalAdvantage: Uses the actual noise characteristics of your data
40FAP comparisonsScargle formula using N as number of data pointsMonte Carlo SimulationUsing formula and number of data points as your independent frequencies may overestimate FAP, but each case is different.
41Amplitude = level of noise 1246810Amplitude/errorN = 20Number of measurements needed to detect a signal of a certain amplitude. The FAP of the detection is The noise level is s = 5 m/s. Basically, the larger the measurement error the more measurements you need to detect a signal.
42Lomb-Scargle Periodogram of 6 data points of a sine wave: Lots of alias periods and false alarm probability (chance that it is due to noise) is 40%!For small number of data points do not use Scargle, sine fitting is best. But be cautious!!
44Least squares sine fitting: The best fit period (frequency) has the lowest c2 Discrete Fourier Transform: Gives the power of each frequency that is present in the data. Power is in (m/s)2 or (m/s) for amplitudeAmplitude (m/s)Lomb-Scargle Periodogram: Gives the power of each frequency that is present in the data. Power is a measure of statistical signficance
454. Finding Multiple Periods in Data: Pre-whitening What if you have multiple periods in your data? How do you find these and make sure that these are not due to alias effects of your sampling window.Standard procedure: Pre-whitening. Sequentially remove periods from the data until you reach the level of the noise
46Prewhitening flow diagram: Find highest peak in DFT/Scargle (Pi)Save PiFit sine wave to data and subtract fitCalculate new DFTAre more peaks above the level of the noise?noStop, publish all Piyes
47False alarm probability ≈ 10–14 Alias PeaksNoise level
48False alarm probability ≈ 0.24 Raw dataAfter removal of dominant periodFalse alarm probability ≈ 0.24
49Useful program for pre-whitening of time series data: Program picks highest peak, but this may be an aliasPeaks may be due to noise. A FAP analysis will tell you this
505. Other Techniques: Phase Dispersion Minimization AmplitudeWrong PeriodCorrect PeriodChoose a period and phase the data. Divide phased data into M bins and compute the standard deviation in each bin. If s2 is the variance of the time series data and s2 the total variance of the M bin samples, the correct period has a minimum value of Q :Q = s2/s2See Stellingwerf, Astromomical Journal, 224, 953, 1978
51PDM is suited to cases in which only a few observations are made of a limited period of time, especially if the light curves are highly non-sinsuisoidal
52PDMScargleIn most cases PDM gives the same answer as DFT, Scargle periodograms. With enough data it should give the same answer
535. Other Techniques: String Length Method Wrong period longer string lengthPhase the data to a test period and minimize the distance between adjacent pointsLafler & Kinman, Astrophysical Journal Supplement, 11, 216, 1965Dworetsky, Monthly Notices Astronomical Society, 203, 917, 1983
545. Other Techniques: Wavelet Analysis This technique is ideal for finding signals that are aperiodic, noisy, intermittent, or transient.Recent interest has been in transit detection in light curves