Overview G. Jogesh Babu.

Probability theory
- Probability, introduced through the flip of a coin
- Conditional probability and Bayes' theorem (the basis of Bayesian analysis)
- Expectation, variance, and standard deviation (unit-free estimates)
- Density of a continuous random variable (as opposed to density as defined in physics)
- Normal (Gaussian) and chi-square distributions (the chi-square distribution, not the chi-square statistic)
- Probability inequalities and the central limit theorem (CLT)
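As a small illustration of two items on this slide, the Python sketch below applies Bayes' theorem to a coin question and checks the CLT by simulation. The function names and the two-coin example are hypothetical choices for illustration: a coin is either fair or two-headed with prior probability 1/2 each, and we ask how likely it is to be two-headed after seeing one head. The CLT check counts how often the standardized mean of many fair-coin flips falls within +/-1.96, which should be roughly 0.95.

```python
import math
import random

def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) via Bayes' theorem for a binary hypothesis A."""
    # Total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / p_b

def clt_coverage(n_flips=400, n_reps=2000, seed=1):
    """Fraction of standardized coin-flip means inside [-1.96, 1.96]."""
    rng = random.Random(seed)
    p = 0.5
    sd = math.sqrt(p * (1 - p) / n_flips)  # standard deviation of the sample mean
    inside = 0
    for _ in range(n_reps):
        heads = sum(rng.random() < p for _ in range(n_flips))
        z = (heads / n_flips - p) / sd
        if abs(z) <= 1.96:
            inside += 1
    return inside / n_reps

# Two-headed coin (A) vs fair coin, one observed head (B)
posterior = bayes_posterior(0.5, 1.0, 0.5)
coverage = clt_coverage()
```

Here the posterior is 0.5 / 0.75 = 2/3, and the simulated coverage should land near 0.95 (the discreteness of the binomial keeps it slightly off).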

R programming environment
- Introduction to the R programming language
- R is an integrated suite of software facilities for data manipulation, calculation, and graphical display.
- Commonly used techniques, such as graphical description, tabular description, and summary statistics, are illustrated through R.
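The tabular and summary descriptions mentioned here correspond roughly to R's `summary()` and `table()`. Since the examples in this overview are written in Python, the sketch below mimics them with the standard library; the function names are hypothetical, and the choice of `method="inclusive"` is an assumption made because it interpolates quantiles the same way as R's default (type 7).

```python
import statistics
from collections import Counter

def r_style_summary(data):
    """Min, 1st Qu., Median, Mean, 3rd Qu., Max, in the spirit of R's summary()."""
    # method="inclusive" interpolates like R's default quantile type 7
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"Min.": min(data), "1st Qu.": q1, "Median": med,
            "Mean": statistics.fmean(data), "3rd Qu.": q3, "Max.": max(data)}

def frequency_table(values):
    """Counts of each distinct value, sorted by value (like R's table())."""
    return dict(sorted(Counter(values).items()))

summary = r_style_summary([1, 2, 3, 4, 10])
table = frequency_table(["a", "b", "a"])
```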

Exploratory data analysis
An approach (or philosophy) of data analysis that employs a variety of techniques, mostly graphical, to:
- maximize insight into a data set
- uncover underlying structure
- extract important variables
- detect outliers and anomalies
- formulate hypotheses worth testing
- develop parsimonious models
- provide a basis for further data collection through surveys or experiments
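One concrete way of detecting outliers, as listed above, is the boxplot (Tukey) rule: flag points lying more than 1.5 interquartile ranges beyond the quartiles. The sketch below is a minimal Python version; the function name and the default k = 1.5 are conventional choices assumed here, not taken from the slides.

```python
import statistics

def tukey_outliers(data, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (the boxplot rule)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

flagged = tukey_outliers([1, 2, 3, 4, 10])
```

With quartiles 2 and 4 the upper fence is 7, so only the value 10 is flagged.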

Statistical inference
While exploratory data analysis provides tools to understand what the data show, statistical inference helps in reaching conclusions that extend beyond the immediate data alone. It helps in judging whether an observed difference between groups is a dependable one or one that might have happened by chance in a study. Topics include:
- Point estimation
- Confidence intervals for unknown parameters
- Principles of testing of hypotheses
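The three topics can be seen together in a short Python sketch: the sample mean as a point estimate, a confidence interval around it, and a test of whether two groups differ. This is a sketch under a stated assumption: it uses the large-sample normal approximation (z = 1.96 for 95%), because Student's t quantiles are not in Python's standard library; for small samples a t interval would be more appropriate. The function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, level=0.95):
    """Large-sample (normal-approximation) confidence interval for a mean."""
    n = len(sample)
    xbar = statistics.fmean(sample)                 # point estimate
    se = statistics.stdev(sample) / math.sqrt(n)    # estimated standard error
    z = NormalDist().inv_cdf(0.5 + level / 2)       # about 1.96 for 95%
    return xbar - z * se, xbar + z * se

def two_sample_z_pvalue(x, y):
    """Two-sided p-value for H0: equal means (large samples, normal approx.)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    z = (statistics.fmean(x) - statistics.fmean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

sample = [4.9, 5.1, 5.0, 4.8, 5.2, 5.0, 4.9, 5.1]
lo, hi = mean_confidence_interval(sample)
p_same = two_sample_z_pvalue(sample, sample)
```

A small p-value suggests the observed difference is unlikely to be due to chance alone; comparing a sample with itself gives p = 1.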

Maximum likelihood estimation
- The notion of likelihood differs from that of probability: probability refers to the occurrence of future events, while likelihood refers to past events with known outcomes.
- MLE is used for fitting a mathematical model to data. Maximizing the likelihood offers a way of tuning the free parameters of the model so that it provides a good fit to real-world data.
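To make "tuning the free parameters" concrete, the sketch below fits a Poisson rate to a set of counts by numerically maximizing the log-likelihood with a golden-section search. The counts are made-up illustrative numbers, and the optimizer is an assumption chosen for simplicity; for the Poisson model the MLE is known in closed form to be the sample mean, which gives a check on the numerical answer.

```python
import math

def poisson_loglik(lam, counts):
    """Log-likelihood of i.i.d. Poisson(lam) counts."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)

def golden_section_max(f, a, b, tol=1e-9):
    """Maximize a unimodal function f on [a, b] by golden-section search."""
    invphi = (math.sqrt(5) - 1) / 2
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b = d          # maximum lies in [a, d]
        else:
            a = c          # maximum lies in [c, b]
        c, d = b - invphi * (b - a), a + invphi * (b - a)
    return (a + b) / 2

counts = [2, 0, 3, 1, 4, 2, 2, 1]   # illustrative data
lam_hat = golden_section_max(lambda lam: poisson_loglik(lam, counts), 1e-6, 10.0)
```

The numerical maximizer should agree with the sample mean, 15/8 = 1.875, up to the search tolerance.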

Regression
- Basic concepts in regression
- Bias-variance tradeoff
- Linear regression
- Nonparametric regression
- Local polynomial regression
- Confidence bands
- Splines
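A minimal example of nonparametric regression is the Nadaraya-Watson kernel estimator, which is local polynomial regression of degree zero: the fit at x0 is a weighted average of the responses, with weights decaying with distance from x0. The Gaussian kernel and the function name below are assumptions for illustration.

```python
import math

def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted local average of ys at x0 (Gaussian kernel).

    Equivalent to a local polynomial fit of degree 0 (local constant).
    """
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total

xs = list(range(11))
fit_linear = nadaraya_watson(5.0, xs, [float(x) for x in xs], bandwidth=1.0)
fit_constant = nadaraya_watson(2.5, xs, [3.0] * len(xs), bandwidth=1.0)
```

The bandwidth is where the bias-variance tradeoff shows up: a small bandwidth follows the data closely (low bias, high variance), while a large one smooths heavily (high bias, low variance).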

Linear regression issues in astronomy
- Compares the different regression lines used in astronomy
- Illustrates them with the Faber-Jackson relation
- Measurement error models are also discussed
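Three of the lines often compared in this setting are OLS(Y|X), OLS(X|Y), and the OLS bisector, which bisects the angle between the first two. The sketch below computes their slopes from the centered sums of squares. It is an illustration only: it ignores measurement errors, and the function name is hypothetical.

```python
import math
import statistics

def regression_slopes(x, y):
    """Slopes of OLS(Y|X), OLS(X|Y) (as a slope of Y on X), and the OLS bisector."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx    # OLS(Y|X): minimizes vertical residuals
    b2 = syy / sxy    # OLS(X|Y): minimizes horizontal residuals
    # Bisector of the angle between the two OLS lines
    b3 = (b1 * b2 - 1 + math.sqrt((1 + b1 ** 2) * (1 + b2 ** 2))) / (b1 + b2)
    return b1, b2, b3

exact = regression_slopes([0, 1, 2, 3], [1, 3, 5, 7])
scattered = regression_slopes([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

For perfectly linear data all three slopes coincide (here 2). With scatter they differ: for the second data set, OLS(Y|X) gives 0.8, OLS(X|Y) gives 1.25, and the bisector lies between them at 1.0, which is why the choice of line matters for relations such as Faber-Jackson.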

Multivariate analysis
Analysis of data on two or more attributes (variables) that may depend on each other:
- Principal component analysis, to reduce the number of variables
- Canonical correlation
- Tests of hypotheses
- Confidence regions
- Multivariate regression
- Discriminant analysis (supervised learning)
Computational aspects are covered in the lab.
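For two variables, principal component analysis reduces to the eigen-decomposition of a 2 x 2 covariance matrix, which has a closed form. The Python sketch below uses that closed form so it needs no linear-algebra library; the function name is hypothetical, and real analyses with more variables would use a numerical eigen-solver instead.

```python
import math
import statistics

def pca_2d(x, y):
    """Principal components of 2-D data from the sample covariance matrix.

    Returns (eigenvalues, unit first principal direction, variance fraction of PC1).
    """
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]]
    mid = (sxx + syy) / 2
    rad = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = mid + rad, mid - rad
    # Eigenvector for lam1 (falls back to an axis when the variables are uncorrelated)
    if sxy != 0:
        vx, vy = sxy, lam1 - sxx
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return (lam1, lam2), (vx / norm, vy / norm), lam1 / (lam1 + lam2)

(eig1, eig2), direction, frac = pca_2d([1, 2, 3, 4], [1, 2, 3, 4])
```

When the points lie exactly on the line y = x, the first component points along (1, 1)/sqrt(2) and explains all of the variance, so one variable suffices.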

Cluster analysis
- Data mining techniques
- Classifying data into clusters
- k-means
- Model-based clustering
- Single linkage (friends of friends)
- Complete linkage clustering algorithm
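The k-means item can be illustrated with Lloyd's algorithm, which alternates between assigning each point to its nearest center and moving each center to the mean of its assigned points. The sketch below takes the initial centers as input for reproducibility; that choice, the function name, and the toy points are assumptions for illustration.

```python
import math

def kmeans(points, centers, n_iter=100):
    """Lloyd's algorithm for k-means, starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster (keep it if empty)
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
              for p in points]
    return centers, labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(pts, [(0, 0), (10, 10)])
```

Because k-means only finds a local optimum, results depend on the starting centers; in practice it is usually run from several starts.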

Nonparametric statistics
These statistical procedures make no assumptions about the probability distribution of the population. The model structure is not specified a priori but is instead determined from the data. As nonparametric methods make fewer assumptions, their applicability is much wider. Procedures described include:
- Sign test
- Mann-Whitney two-sample test
- Kruskal-Wallis test for comparing several samples
- Density estimation
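Two of these procedures are simple enough to write out directly. The sign test uses only the signs of paired differences, which under the null hypothesis of zero median difference follow a Binomial(n, 1/2) distribution. The Mann-Whitney U statistic counts how often a value from one sample exceeds a value from the other. The Python sketch below is illustrative: the function names are hypothetical, and converting U to a p-value (exactly or by normal approximation) is left out.

```python
import math

def sign_test_pvalue(differences):
    """Exact two-sided sign test of H0: median difference = 0 (zeros dropped)."""
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    k = sum(d > 0 for d in nonzero)
    tail = min(k, n - k)
    # Under H0 the number of positive signs is Binomial(n, 1/2)
    p = 2 * sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)

def mann_whitney_u(x, y):
    """U statistic for sample x: pairs with x > y, counting ties as 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

p_all_positive = sign_test_pvalue([1] * 10)
u_low = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

If all 10 differences are positive, the two-sided p-value is 2/1024, about 0.002, with no assumption about the shape of the distribution.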

Bootstrap
- How to get the most out of repeated use of the data
- The bootstrap is similar to the Monte Carlo method, but the "simulation" is carried out from the data itself.
- It is a very general, mostly nonparametric procedure and is widely applicable.
- Applications to regression, cases where the procedure fails, and cases where it outperforms traditional procedures will also be discussed.
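The idea of "simulating from the data itself" can be shown with the nonparametric bootstrap estimate of a standard error: resample the data with replacement, recompute the statistic, and take the spread of the replicates. The median is used below because its standard error has no simple formula. The function name, the number of replicates, and the data are assumptions for illustration.

```python
import random
import statistics

def bootstrap_se(data, stat=statistics.median, n_boot=2000, seed=0):
    """Nonparametric bootstrap standard error of a statistic.

    Resamples the data with replacement, recomputes the statistic each time,
    and returns the standard deviation of the replicates.
    """
    rng = random.Random(seed)
    n = len(data)
    reps = [stat(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(reps)

data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]
se_median = bootstrap_se(data)
```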

Model selection
- Chi-square test
- Wald test
- Rao's score test
- Likelihood ratio test
- AIC, BIC
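Several of these criteria are simple functions of the maximized log-likelihood. The sketch below gives AIC = 2k - 2 log L, BIC = k log n - 2 log L (k parameters, n observations), and the likelihood ratio test for two nested models that differ by one parameter, where the statistic 2(l_alt - l_null) is asymptotically chi-square with 1 degree of freedom. The function names are hypothetical; the 1-df restriction keeps the p-value computable with `math.erfc`.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k log n - 2 log L (smaller is better)."""
    return k * math.log(n) - 2 * loglik

def lrt_pvalue_1df(loglik_null, loglik_alt):
    """Likelihood ratio test for nested models differing by one parameter.

    The statistic s = 2 (l_alt - l_null) is asymptotically chi-square(1),
    and P(chi2_1 > s) = erfc(sqrt(s / 2)).
    """
    stat = 2 * (loglik_alt - loglik_null)
    return stat, math.erfc(math.sqrt(max(stat, 0.0) / 2))

stat, p = lrt_pvalue_1df(-10.0, -10.0 + 3.841458820694124 / 2)
```

A likelihood ratio statistic of about 3.84 corresponds to p = 0.05. BIC penalizes extra parameters more heavily than AIC once n exceeds about 7 (log n > 2).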

Goodness of fit
- Curve (model) fitting, or goodness of fit, using the bootstrap procedure
- Procedures like the Kolmogorov-Smirnov test do not work in the multidimensional case, or when the parameters of the curve are estimated from the data. The bootstrap comes to the rescue.
- Some of these procedures are illustrated using R in a lab session on hypothesis testing and bootstrapping.
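The following Python sketch shows how the bootstrap helps when parameters are estimated. It tests normality with the Kolmogorov-Smirnov distance D, but the mean and standard deviation are fitted from the data, so the standard KS tables no longer apply. Instead, the null distribution of D is approximated with a parametric bootstrap that re-fits the parameters on each simulated sample. This is a one-dimensional sketch under those assumptions; the function names and the number of replicates are illustrative choices, not taken from the lab.

```python
import random
import statistics
from statistics import NormalDist

def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a model CDF."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))

def ks_normal_bootstrap_pvalue(sample, n_boot=500, seed=0):
    """Goodness of fit to a normal model with estimated mean and sd."""
    rng = random.Random(seed)
    n = len(sample)
    fitted = NormalDist(statistics.fmean(sample), statistics.stdev(sample))
    d_obs = ks_statistic(sample, fitted.cdf)
    exceed = 0
    for _ in range(n_boot):
        # Simulate from the fitted model, then re-estimate, as was done for the data
        sim = [rng.gauss(fitted.mean, fitted.stdev) for _ in range(n)]
        refit = NormalDist(statistics.fmean(sim), statistics.stdev(sim))
        if ks_statistic(sim, refit.cdf) >= d_obs:
            exceed += 1
    return d_obs, (exceed + 1) / (n_boot + 1)

gen = random.Random(42)
normal_sample = [gen.gauss(0, 1) for _ in range(50)]
d_norm, p_norm = ks_normal_bootstrap_pvalue(normal_sample)
d_bimodal, p_bimodal = ks_normal_bootstrap_pvalue([0.0] * 20 + [10.0] * 20)
```

A clearly bimodal sample yields a very small bootstrap p-value. Using the ordinary KS tables with estimated parameters would make the test too conservative.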

Bayesian inference
- As evidence accumulates, the degree of belief in a hypothesis ought to change.
- Bayesian inference takes prior knowledge into account.
- The quality of a Bayesian analysis depends on how well one can convert the prior information into a mathematical prior probability.
- Methods for parameter estimation, model assessment, etc.
- Illustrations with examples from astronomy
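The simplest illustration of "belief changing as evidence accumulates" is the conjugate Beta-binomial model. A Beta(a, b) prior on a success probability, combined with k successes in n trials, yields a Beta(a + k, b + n - k) posterior. The class and function names below are hypothetical, and the uniform Beta(1, 1) prior is just one possible choice of prior.

```python
from dataclasses import dataclass

@dataclass
class Beta:
    a: float
    b: float

    def mean(self):
        return self.a / (self.a + self.b)

def update_beta_binomial(prior, successes, trials):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return Beta(prior.a + successes, prior.b + trials - successes)

prior = Beta(1, 1)                       # uniform prior
posterior = update_beta_binomial(prior, 7, 10)
# Updating in two batches gives the same posterior as updating all at once
sequential = update_beta_binomial(update_beta_binomial(prior, 3, 4), 4, 6)
```

With 7 successes in 10 trials the posterior is Beta(8, 4), with mean 2/3, and updating in stages yields the same posterior as updating all at once.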

Markov chain Monte Carlo (MCMC)
- MCMC methods are a collection of techniques that use pseudo-random (computer-simulated) values to estimate solutions to mathematical problems.
- MCMC for Bayesian inference
- Illustration of MCMC for the evaluation of expectations with respect to a distribution
- MCMC for estimation of maxima or minima of functions
- MCMC procedures are successfully used in the search for extrasolar planets.
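Evaluating an expectation with respect to a distribution can be illustrated with a random-walk Metropolis sampler, one of the basic MCMC algorithms. The sketch below samples from a standard normal, given only its unnormalized log density, and estimates E[X^2], whose true value is 1. The step size, chain length, and burn-in are assumptions chosen for this toy target.

```python
import math
import random

def metropolis(log_density, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D (unnormalized) log density."""
    rng = random.Random(seed)
    x, lx = x0, log_density(x0)
    chain = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)          # symmetric proposal
        lp = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x))
        if rng.random() < math.exp(min(0.0, lp - lx)):
            x, lx = proposal, lp
        chain.append(x)
    return chain

chain = metropolis(lambda x: -0.5 * x * x, 0.0, 50_000)
kept = chain[1000:]                                   # discard burn-in
estimate = sum(v * v for v in kept) / len(kept)       # Monte Carlo estimate of E[X^2]
```

Successive draws are correlated, so the estimate is less precise than the same number of independent draws would give.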

Time series
- Time domain procedures
- State space models
- Kernel smoothing
- Poisson processes
- Spectral methods for inference
- A brief discussion of the Kalman filter
- Illustrations with examples from astronomy
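The Kalman filter applies to state space models. In the simplest of these, the local level model, a hidden level follows a random walk and is observed with noise. The sketch below implements the filter for that model only. The noise variances q and r, the diffuse starting variance p0, and the function name are assumptions for illustration.

```python
def local_level_kalman(observations, q, r, x0=0.0, p0=1e6):
    """Kalman filter for a random-walk-plus-noise (local level) model.

    State:       x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation: y_t = x_t + v_t,      v_t ~ N(0, r)
    Returns the filtered state means.
    """
    x, p = x0, p0
    filtered = []
    for y in observations:
        p = p + q               # predict: variance grows by the process noise
        k = p / (p + r)         # Kalman gain
        x = x + k * (y - x)     # update with the innovation y - x
        p = k * r               # equals (1 - k) * p, computed stably
        filtered.append(x)
    return filtered

steady = local_level_kalman([5.0] * 50, q=0.01, r=1.0)
running = local_level_kalman([1.0, 3.0], q=0.0, r=1.0, p0=1e12)
```

With q = 0 and a diffuse start, the filter reduces to the running mean of the observations (here 2.0). A larger q/r ratio makes the filter track changes faster, at the cost of following the noise more closely.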

Spatial statistics
- Spatial point processes
- Gaussian processes (inference and computational aspects)
- Modeling lattice data
- Homogeneous and inhomogeneous Poisson processes
- Estimation of Ripley's K function (useful for point pattern analysis)
- Cox process (doubly stochastic Poisson process)
- Markov point processes
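Ripley's K function measures clustering at a given distance r. It counts pairs of points within distance r and scales the count by area and sample size, so that under complete spatial randomness (a homogeneous Poisson process) K(r) is about pi r^2. The sketch below uses one common naive estimator without edge correction, which is an assumption: practical estimators correct for points near the boundary of the window.

```python
import math
from itertools import permutations

def ripley_k(points, r, area):
    """Naive Ripley's K estimate at distance r (no edge correction).

    K(r) = area / (n (n - 1)) * #{ordered pairs i != j with d_ij <= r}
    """
    n = len(points)
    close = sum(1 for p, q in permutations(points, 2) if math.dist(p, q) <= r)
    return area * close / (n * (n - 1))

corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
k_one = ripley_k(corners, 1.0, area=1.0)
k_all = ripley_k(corners, 1.5, area=1.0)
```

For the four corners of a unit square and r = 1, each point has two neighbors within the distance (the diagonal is sqrt(2) away), so K = 8/12 = 2/3. Values above pi r^2 suggest clustering, and values below suggest regularity.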

Facing the challenge: complex theory and complex data in astrostatistics
- Complex theory: models with "black box" mappings from parameter space
- Complex data: large quantities of high-dimensional data, spectra, and images, with significant observational limitations
- Testing cosmological theories
- Type Ia supernovae analysis
- Role of dimension reduction
- Role of nonparametric methods