Three data analysis problems
Andreas Zezas, University of Crete / CfA


Two types of problems: fitting, and source classification.

Fitting: complex datasets

Maragoudakis et al. in prep.

Fitting: complex datasets

Iterative fitting may work, but it is inefficient and the confidence intervals on the parameters are not reliable. How do we fit the two datasets jointly? This is a VERY common problem!
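As a sketch of what a joint fit looks like in practice, the snippet below fits two synthetic datasets of different sizes that share a single power-law slope but have independent normalizations and errors. The model, dataset sizes, and error values are illustrative placeholders, not the actual datasets from the talk.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(42)

# Two synthetic datasets of different sizes sharing the same power-law slope.
x1 = np.linspace(1.0, 10.0, 50)
x2 = np.linspace(1.0, 10.0, 200)
true_slope = -1.5
err1, err2 = 0.2, 0.1
y1 = 10.0 * x1**true_slope + rng.normal(0.0, err1, x1.size)
y2 = 3.0 * x2**true_slope + rng.normal(0.0, err2, x2.size)

def residuals(params):
    """Error-weighted residuals of both datasets, stacked: minimizing their
    sum of squares is a joint chi^2 fit with a shared slope."""
    norm1, norm2, slope = params
    r1 = (y1 - norm1 * x1**slope) / err1
    r2 = (y2 - norm2 * x2**slope) / err2
    return np.concatenate([r1, r2])

fit = least_squares(residuals, x0=[5.0, 5.0, -1.0])
norm1, norm2, slope = fit.x
print(f"fitted shared slope = {slope:.2f}")
```

Because both datasets enter one residual vector, each point is weighted by its own error and the shared parameter gets a single, self-consistent confidence region, instead of the unreliable intervals produced by iterating between separate fits.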

Problem 2 Model selection in 2D fits of images

A primer on galaxy morphology. Three components: a spheroid, an exponential disk, and a nuclear point source (PSF).

Fitting: The method. Use a generalized (Sérsic) model: n=4 corresponds to the spheroid, n=1 to the exponential disk. Add other (or alternative) models as needed, add blurring by the PSF, and do a χ² fit (e.g. Peng et al. 2002).
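A minimal sketch of the ingredients named above: a Sérsic profile (n=4 spheroid plus n=1 disk), blurring by the PSF, and a χ² statistic. The Gaussian PSF, grid size, and approximate b_n coefficient are stand-ins for illustration, not the actual instrument setup.

```python
import numpy as np
from scipy.signal import fftconvolve

def sersic(r, I_e, r_e, n):
    """Sersic surface-brightness profile; b_n uses a common linear
    approximation valid for roughly 0.5 < n < 10."""
    b_n = 1.9992 * n - 0.3271
    return I_e * np.exp(-b_n * ((r / r_e) ** (1.0 / n) - 1.0))

# Evaluate a spheroid (n=4) + exponential disk (n=1) model on a pixel grid.
y, x = np.mgrid[-32:32, -32:32].astype(float)
r = np.hypot(x, y) + 1e-6
model = sersic(r, I_e=1.0, r_e=5.0, n=4.0) + sersic(r, I_e=0.5, r_e=10.0, n=1.0)

# Blur by a Gaussian PSF (illustrative stand-in for the instrument PSF).
py, px = np.mgrid[-8:9, -8:9].astype(float)
psf = np.exp(-(px**2 + py**2) / (2 * 1.5**2))
psf /= psf.sum()
blurred = fftconvolve(model, psf, mode="same")

def chi2(observed, model_img, sigma):
    """chi^2 between an observed image and the PSF-blurred model,
    with per-pixel errors sigma."""
    return np.sum(((observed - model_img) / sigma) ** 2)
```

A fitter would wrap `chi2` in an optimizer over the Sérsic parameters; that machinery (as in GALFIT) is omitted here.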

Fitting: The method. Typical model tree: combinations of Sérsic components (n free, n=4, n=1, n=4 + n=1), each with or without a nuclear PSF component.

Fitting: Discriminating between models. Generally χ² works, BUT combinations of different models may give similar χ². How do we select the best model? The models are not nested, so we cannot use standard methods. Look at the residuals.

Fitting: Discriminating between models

Excess variance: among the lowest-χ² models, the best-fitting model is the one whose residuals have the lowest excess variance.
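The exact excess-variance statistic is not defined on the slide; a common definition, the variance of the residuals beyond what the measurement errors account for, can be sketched as:

```python
import numpy as np

def excess_variance(data, model, sigma):
    """Mean squared residual minus the measurement variance.
    Near zero when the model accounts for everything except noise;
    positive when unmodeled structure remains in the residuals.
    (One common definition; the statistic used in the talk may differ.)"""
    resid = data - model
    return np.mean(resid**2 - sigma**2)
```

Comparing this quantity across the candidate models in the tree picks out the one whose residual image is closest to pure measurement noise.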

Fitting: Examples Bonfini et al. in prep.

Fitting: Problems. However, the method is not ideal: it is not calibrated, it cannot give a significance, and the fitting process is computationally intensive. We require an alternative robust, fast method.

Problem 3 Source Classification (a) Stars

Classifying stars: the relative strength of spectral lines discriminates between different types of stars. This is currently done "by eye" or by cross-correlation analysis.

Classifying stars: we would like to define a quantitative scheme based on the strengths of different lines.

Classifying stars Maravelias et al. in prep.

Classifying stars: not simple...
- Multi-parameter space
- Degeneracies in parts of the parameter space
- Sparse sampling
- Continuous distribution of parameters in the training sample (cannot use clustering)
- Uncertainties and intrinsic variance in the training sample
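One possible illustration of a quantitative scheme that sidesteps clustering: treat spectral type as a continuous quantity and estimate it by distance-weighted nearest neighbors in line-strength space. All features, type encodings, and noise levels below are invented for illustration.

```python
import numpy as np

# Hypothetical setup: each star is described by the strengths of a few
# diagnostic lines, and spectral type is encoded as a continuous number.
rng = np.random.default_rng(0)
n_train = 200
spec_type = rng.uniform(3.0, 19.0, n_train)
# Toy line strengths varying smoothly with type, plus measurement scatter.
features = np.column_stack([
    np.exp(-spec_type / 5.0),   # a line fading toward later types
    spec_type / 20.0,           # a line growing toward later types
]) + rng.normal(0.0, 0.01, (n_train, 2))

def knn_type(target_features, k=5):
    """Distance-weighted k-NN estimate of the continuous spectral type:
    regression rather than clustering, so a continuous training
    distribution is not a problem."""
    d = np.linalg.norm(features - target_features, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-9)
    return np.sum(w * spec_type[idx]) / np.sum(w)
```

Perturbing the target's line strengths within their measurement errors and repeating the estimate would give an uncertainty on the assigned type, addressing the error-propagation point above.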

Problem 3 Source Classification (b) Galaxies

Classifying galaxies Ho et al. 1999

Classifying galaxies Kewley et al. 2006

Classifying galaxies: basically an empirical scheme.
- Multi-dimensional parameter space
- Sparse sampling, but a large training sample is now available
- Uncertainties and intrinsic variance in the training sample
→ Use observations to define the locus of the different classes

Classifying galaxies: uncertainties in the classification arise from measurement errors and from uncertainties in the diagnostic scheme, and different diagnostics do not always give consistent results.
→ Use ALL diagnostics together
→ Obtain a classification with a confidence interval
Maragoudakis et al. in prep.
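As an illustration of obtaining a classification with a confidence estimate, the sketch below Monte-Carlos the measurement errors through a single BPT-style diagnostic, the Kewley et al. (2001) maximum-starburst line on the [NII]/Hα diagram; the multi-diagnostic scheme of the talk would combine several such curves.

```python
import numpy as np

rng = np.random.default_rng(1)

def kewley01_starforming(log_n2_ha, log_o3_hb):
    """True if the point lies below the Kewley et al. (2001)
    maximum-starburst line, i.e. in the star-forming region."""
    return (log_n2_ha < 0.47) & (log_o3_hb < 0.61 / (log_n2_ha - 0.47) + 1.19)

def classify_with_errors(log_n2_ha, log_o3_hb, err_x, err_y, n_draws=5000):
    """Monte Carlo the measurement errors through the diagnostic:
    returns the fraction of draws classified as star-forming, which
    serves as a classification probability."""
    x = rng.normal(log_n2_ha, err_x, n_draws)
    y = rng.normal(log_o3_hb, err_y, n_draws)
    return np.mean(kewley01_starforming(x, y))
```

Running the same draws through every diagnostic and combining the per-diagnostic fractions is one way to turn inconsistent diagnostics into a single classification with a confidence interval.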

Classification: the problem is similar to inverting hardness ratios to spectral parameters (N_H, Γ; Taeyoung Park's thesis), but more difficult: we do not have a well-defined grid, and the grid is not continuous.

Summary
- Model selection in multi-component 2D image fits
- Joint fits of datasets of different sizes
- Classification in multi-parameter space:
  - definition of the locus of different source types based on sparse data with uncertainties
  - characterization of objects given uncertainties in the classification scheme and measurement errors
All are challenging problems related to very common data analysis tasks. Any volunteers?