The Hidden Message Some useful techniques for data analysis Chihway Chang, Feb 18’ 2009.



1 The Hidden Message Some useful techniques for data analysis Chihway Chang, Feb 18’ 2009

2 A famous example…
- Hubble's law: v = H_0 d
- Expansion of the universe

3 What do we learn?
- Seemingly crappy data can lead to astonishing discoveries
  - Insight + imagination
- Nature's laws are usually simple
  - Most parts of our observable Universe are linear, spherically symmetric, Gaussian or Poisson
- Data analysis should be easy! … theoretically

4 We all know how this happens
- Process of data analysis:
  - Sampling
    - Central Limit Theorem (CLT)
    - Strategy of sampling
  - Model fitting
    - Linear regression
    - Maximum likelihood
    - Chi-square
  - Correlations
- Or … ???@#$
  - Collect lots of data
  - Stare at your data

5 Outline
Useful techniques in data analysis:
- Correlations
  - Linear correlation
  - Cross-correlation
  - Autocorrelation
- Principal Component Analysis (PCA)

6 Correlations
- Linear correlation
  - Data
  - Standard scores
  - Correlation coefficients (Pearson product-moment)
  - Coefficient of determination
  - Variance in common
- Correlation matrix
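
As an illustration of how the quantities on this slide relate, here is a minimal sketch that computes standard scores, the Pearson coefficient as the average product of paired standard scores, and the coefficient of determination as its square. The function names and the toy data are assumptions made for this example; they are not taken from the talk.

```python
import math

def standard_scores(values):
    """Return z-scores using the population standard deviation (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return [(x - mean) / std for x in values]

def pearson_r(xs, ys):
    """Pearson coefficient as the average product of paired standard scores."""
    zx, zy = standard_scores(xs), standard_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Hypothetical toy data, not the Hubble measurements.
d = [1.0, 2.0, 3.0, 4.0, 5.0]
v = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(d, v)
r_squared = r ** 2  # coefficient of determination: fraction of variance in common
print(round(r, 4), round(r_squared, 4))
```

Dividing by N rather than N - 1 is a choice made here so that the coefficient is exactly the mean of the z-score products.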

7 Example – Hubble's law
We have 24 data points; we'd like to know how v and d correlate

8 Standard scores
N    d      d*d       z_d        v      v*v       z_v        z_v*z_d
1    0.032  0.001024  -1.39163   170    28900     -0.5589    0.777778
2    0.034  0.001156  -1.38846   290    84100     -0.22872   0.317567
3    0.214  0.045796  -1.10361   -130   16900     -1.38435   1.527778
4    0.263  0.069169  -1.02606   -70    4900      -1.21926   1.251038
5    0.275  0.075625  -1.00707   -185   34225     -1.53568   1.546545
…
19   1.4    1.96      0.773257   500    250000    0.349097   0.269942
20   1.7    2.89      1.248012   960    921600    1.614788   2.015274
21   2      4         1.722767   500    250000    0.349097   0.601412
22   2      4         1.722767   850    722500    1.312122   2.260481
23   2      4         1.722767   800    640000    1.174547   2.023471
24   2      4         1.722767   1090   1188100   1.972483   3.398128
Average: d = 0.911375, d*d = 1.229908, v = 373.125, v*v = 271309.4
Variance: d = 0.399304, v = 132087.1
Standard deviation (used for the standard scores): d = 0.631905, v = 363.4379
Correlation coefficient (average of z_v*z_d): 0.789639
Coefficient of determination (correlation coefficient squared): 0.623529

9 Example – Hubble's law
We have 24 data points; we'd like to know how v and d correlate

10 Significance and likelihood
- One-tailed table usage
- What is the probability that 24 pairs of random numbers have corr(X,Y) ≥ 0.79 by chance?
- What if we only have 5 samples?
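
The two questions on this slide can also be explored numerically instead of with a lookup table. The sketch below is a Monte Carlo estimate, not the table method the slide refers to: it assumes both variables are independent standard Gaussians and counts how often the sample correlation reaches the threshold. The function names, trial count, and seed are arbitrary choices for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def chance_probability(n_samples, r_threshold, n_trials=20000, seed=0):
    """One-tailed estimate of P(r >= r_threshold) for uncorrelated Gaussian pairs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        if pearson_r(xs, ys) >= r_threshold:
            hits += 1
    return hits / n_trials

p24 = chance_probability(24, 0.79)  # 24 samples, as in the Hubble example
p5 = chance_probability(5, 0.79)    # same threshold with only 5 samples
print(p24, p5)
```

With 24 samples a chance correlation this strong is far rarer than with 5, which is why the same coefficient carries very different significance at different sample sizes.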

11 Limitations
- Only captures linear dependence
- Sensitive to outliers
- Affected by correlated errors
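
The first two limitations are easy to demonstrate on toy data. In this sketch, with made-up numbers and an illustrative helper name, a perfectly deterministic quadratic relation gives a coefficient of zero, and a single outlier flips the sign of an otherwise perfect linear correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Nonlinear dependence: y is fully determined by x, yet r vanishes by symmetry.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]
r_quadratic = pearson_r(xs, ys)

# Outlier sensitivity: one bad point added to a perfect straight line.
line_x = [1, 2, 3, 4, 5, 6]
line_y = [1, 2, 3, 4, 5, 6]
r_clean = pearson_r(line_x, line_y)
r_outlier = pearson_r(line_x + [7], line_y + [-20])

print(r_quadratic, r_clean, r_outlier)
```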

12 Cross-correlation
- Signal processing: search a long data series for a short feature signal
[Figure: signals f(t) and g(t) and their cross-correlation (f*g)(t)]
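
A minimal sketch of the feature search described here, assuming a discrete signal and an unnormalized sliding dot product. The function name and the sample values are invented for the example.

```python
def cross_correlate(signal, feature):
    """Sliding dot product of a short feature against a longer signal."""
    m = len(feature)
    return [
        sum(signal[lag + k] * feature[k] for k in range(m))
        for lag in range(len(signal) - m + 1)
    ]

# Hypothetical short feature and a longer noisy series that contains it.
feature = [1.0, 3.0, 1.0]
signal = [0.1, -0.2, 0.0, 0.2, 1.1, 2.9, 0.8, -0.1, 0.1, 0.0]

scores = cross_correlate(signal, feature)
# The lag with the highest score is where the feature best matches the series.
best_lag = max(range(len(scores)), key=scores.__getitem__)
print(best_lag, scores[best_lag])
```

An unnormalized score favors high-amplitude regions, so real searches often normalize each window first; that step is left out here to keep the sketch short.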

13 Autocorrelation
- Finding repeating patterns
- Identifying fundamental lengths or time scales in a noisy signal
- Cross-correlation of a signal f with itself
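
To illustrate how autocorrelation exposes a time scale hidden in noise, the sketch below builds a noisy sine wave with a known period and recovers that period from the first strong autocorrelation peak away from lag 0. The period, noise level, seed, and lag search window are all arbitrary assumptions for this example.

```python
import math
import random

def autocorrelation(series, max_lag):
    """Normalized autocorrelation of a series for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    total = sum(c * c for c in centered)
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / total
        for lag in range(max_lag + 1)
    ]

# Hypothetical noisy periodic signal with a known period of 25 samples.
rng = random.Random(1)
period = 25
series = [
    math.sin(2 * math.pi * t / period) + rng.gauss(0.0, 0.3)
    for t in range(500)
]

acf = autocorrelation(series, 40)
# Skip small lags, where the trivial peak at lag 0 dominates.
best_lag = max(range(10, 41), key=lambda lag: acf[lag])
print(best_lag)
```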

14 Application
- Correlation coefficient: Well, um … everywhere?
- Auto- & cross-correlation:
  - Optics: laser coherence, spectra measurement, ultra-short laser pulses
  - Signal processing: musical beats
  - Astronomy: pulsar frequency
  - Correlation in space: 2-point (n-point) correlation functions & power spectrum

15 Example: 2-point correlation in weak lensing
- Assumption: galaxy shapes are entirely random
- Correlation of shape parameter "e" → 0
- Shear induces correlation at length scales of ~arcmin
- Atmosphere and systematics induce correlated noise

16
- Typical 2-point correlation plots: no shear, but with noise and systematics
- Shear signal is at the 1% level
- Controlling systematics is the key!
[Figure: 2-point correlation plots, labeled 1 arcmin and 5 arcmin]

17 Principal Component Analysis
- Reveals the internal structure of data in a way that best explains its variance
- Conceptually, it is a coordinate transformation that rotates the data into its eigenspace, where the greatest variance of any projection of the data lies along the first coordinate
- High-dimension analysis

18 Mathematical operation
- Recognize important variance in data: the Principal Components (PCs)
- Reconstruct data using only low orders of PCs, thus compressing the dimension of the data
- Assumptions:
  - Data can be represented by a linear combination of certain basis vectors
  - Data is Gaussian

19 Example: Hubble's Law again
- Get data {(x_i, y_i)}, dimension 24*2
- Subtract mean: {(X_i, Y_i)} = {(x_i - ave(x), y_i - ave(y))}, dimension 24*2
- Calculate covariance matrix (2*2): C = {(X_i, Y_i)}^T {(X_i, Y_i)} / N
- Calculate & normalize the 2 eigenvalues and 2 eigenvectors of C
- The eigenvectors point to the 2 PCs and the eigenvalues indicate their relative weightings
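
The steps on this slide can be sketched for two-dimensional data without any linear-algebra library, because a symmetric 2*2 matrix has closed-form eigenvalues and eigenvectors. The function names and toy points below are assumptions for illustration, not the Hubble data or the author's code.

```python
import math

def covariance_2d(points):
    """Subtract the mean and return (mean, centered points, (cxx, cxy, cyy))."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    cxx = sum(X * X for X, _ in centered) / n
    cyy = sum(Y * Y for _, Y in centered) / n
    cxy = sum(X * Y for X, Y in centered) / n
    return (mean_x, mean_y), centered, (cxx, cxy, cyy)

def eigen_symmetric_2x2(cxx, cxy, cyy):
    """Eigenvalues (largest first) and unit eigenvectors of [[cxx, cxy], [cxy, cyy]]."""
    half_trace = (cxx + cyy) / 2.0
    radius = math.hypot((cxx - cyy) / 2.0, cxy)
    lam1, lam2 = half_trace + radius, half_trace - radius
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)  # angle of the major axis
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    return (lam1, lam2), (v1, v2)

# Hypothetical toy data lying close to a straight line.
points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.9)]
mean, centered, (cxx, cxy, cyy) = covariance_2d(points)
(lam1, lam2), (pc1, pc2) = eigen_symmetric_2x2(cxx, cxy, cyy)
print(lam1, lam2, pc1, pc2)
```

When one eigenvalue dwarfs the other, nearly all of the variance lies along the first principal component.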

20
PC1, eigenvalue = 132087
PC2, eigenvalue = 0.1503

21
- Recognize the important PC and ignore the others to form a new basis of compressed dimension, {V} (2*1)
- Reconstruct data using 1 eigenvector, projecting onto it and rotating back: {(X'_i, Y'_i)} = {(X_i, Y_i)} {V} {V}^T
- Shift data back and get final reconstructed data: {X, Y}_reconstruct = {(X'_i + ave(x), Y'_i + ave(y))}
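
A minimal sketch of this reconstruction for 2D data, assuming each row of the data matrix is one point, so that projecting onto the first principal component V gives one number per point and multiplying by V again maps it back to two coordinates. The function name and the toy points are hypothetical.

```python
import math

def reconstruct_with_first_pc(points):
    """Project mean-subtracted 2D points onto PC1, map back, and restore the mean."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Covariance matrix entries (dividing by N).
    cxx = sum(X * X for X, _ in centered) / n
    cyy = sum(Y * Y for _, Y in centered) / n
    cxy = sum(X * Y for X, Y in centered) / n

    # Unit eigenvector of the larger eigenvalue of a symmetric 2*2 matrix.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    vx, vy = math.cos(theta), math.sin(theta)

    reconstructed = []
    for X, Y in centered:
        score = X * vx + Y * vy  # coordinate along PC1 (the compressed data)
        reconstructed.append((score * vx + mean_x, score * vy + mean_y))
    return reconstructed

# Hypothetical toy data, not the Hubble measurements.
points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.9)]
approx = reconstruct_with_first_pc(points)
print(approx)
```

The reconstructed points all lie on a single line through the mean, which is the sense in which one PC compresses two-dimensional data to one dimension.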

22 Example – characterize shape of CCD chips
- Fit 27 chip shapes using 4th-order polynomials
  - Data matrix of dimension 27*15
  - 15 eigenvalues and 15 eigenvectors
  - Choose 15, 5, 1 PCs to reconstruct shapes

23

24 Applications
- Pattern recognition (http://icg.cityu.edu.hk/private/PowerPoint/PCA.ppt)
- Multi-dimension data analysis
- Noise reduction
- Image analysis

25 Conclusion
- Data are only useful if we know how to interpret them
- Various statistical techniques have been developed
- Analyzing correlations and PCA are the two common techniques I introduced today
"It can aid understanding reality, but it is no substitute for insight, reason, and imagination. It is a flashlight of the mind. It must be turned on and directed by our interests and knowledge; and it can help gratify and illuminate both. But like a flashlight, it can be uselessly turned on in the daytime, used unnecessarily beneath a lamp, employed to search for something in the wrong room, or become a play thing." R.J. Rummel, Department of Political Science, University of Hawaii

26 Reference
- A tutorial on Principal Components Analysis, Lindsay I Smith
- Understanding Correlation, R.J. Rummel
… and yes, I learned everything from Wikipedia

27 FIN Thank you for your attention!

