1
Continuous Probability Distribution as an Alternative to Binning of Survey Data
David J. Corliss
American Astronomical Society, January 6, 2010

2
A Typical Example of Binned Data
Population of Hot DB White Dwarfs in the Sloan Digital Sky Survey
Figure 1: Population distribution of hot DB white dwarfs described by Eisenstein et al. (temperature bins from < 30,000 K to > 45,000 K)
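For comparison with the kernel approach later in the talk, here is a minimal Python sketch of fixed-width binning. Only the outer limits (< 30,000 K and > 45,000 K) survive from Figure 1; the interior 5,000 K spacing is an assumption for illustration, and the temperatures are hypothetical, not survey data.

```python
from bisect import bisect_right

# Hypothetical bin edges in kelvin: the outer limits come from Figure 1,
# the 5,000 K spacing between them is assumed.
EDGES = [30_000, 35_000, 40_000, 45_000]

def bin_index(temp_k: float) -> int:
    """Return 0 below 30,000 K, len(EDGES) at or above 45,000 K,
    otherwise the index of the interior bin containing temp_k."""
    return bisect_right(EDGES, temp_k)

def histogram(temps_k):
    """Count how many temperatures fall in each of the len(EDGES) + 1 bins."""
    counts = [0] * (len(EDGES) + 1)
    for t in temps_k:
        counts[bin_index(t)] += 1
    return counts

# Hypothetical temperatures (not from Eisenstein et al.)
sample = [28_500, 34_990, 35_010, 41_200, 44_000, 47_300]
print(histogram(sample))  # [1, 1, 1, 2, 1]
```

Note that 34,990 K and 35,010 K, only 20 K apart, land in different bins, while 41,200 K and 44,000 K are counted as identical.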

3
Some Information Is Lost Because All Points in a Given Bin Are Treated the Same
There Is Also Some Uncertainty as to Which Bin a Given Point Belongs
Figure 2A: Population distribution of hot DB white dwarfs described by Eisenstein et al. 2006b, with regions labelled UPPER DB GAP, MIDDLE DB GAP and LOWER DB GAP
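The bin-membership uncertainty on this slide can be made concrete by treating one measurement as a normal distribution (mean = measured temperature, standard deviation = its quoted error) and integrating it over each bin. A minimal Python sketch, assuming hypothetical 5,000 K bins and a hypothetical 34,800 ± 600 K measurement:

```python
from math import erf, inf, sqrt

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ N(mu, sigma**2); handles the infinite bin limits."""
    if x == inf:
        return 1.0
    if x == -inf:
        return 0.0
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def bin_probabilities(mu: float, sigma: float, edges):
    """Probability that a value drawn from N(mu, sigma**2) lies in each bin,
    including the open bins below the first edge and above the last edge."""
    bounds = [-inf, *edges, inf]
    return [normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)
            for lo, hi in zip(bounds[:-1], bounds[1:])]

# Hypothetical 5,000 K bins and a hypothetical 34,800 +/- 600 K measurement
EDGES = [30_000, 35_000, 40_000, 45_000]
LABELS = ["< 30,000 K", "30,000-35,000 K", "35,000-40,000 K",
          "40,000-45,000 K", "> 45,000 K"]
probs = bin_probabilities(34_800, 600, EDGES)
for label, p in zip(LABELS, probs):
    print(f"{label:>16}: {p:.3f}")
```

For this hypothetical point, roughly 63% of the probability lies in the 30,000-35,000 K bin and roughly 37% in the next bin up, yet a histogram counts it entirely in the lower one.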

4
Kernel Density Estimate (KDE) Process: Represent Each Point as a Normal Distribution and Sum
Figure 2B: Population distribution of hot DB white dwarfs described by Eisenstein et al. 2006b
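The process on this slide, one normal per measured point followed by a sum, can be sketched in a few lines of Python. This is a sketch under stated assumptions: each normal is centred on the observed value and uses that measurement's own σ (as described in the summary slide) instead of the single shared bandwidth of textbook KDE; the sum is divided by the number of points so the curve integrates to 1; and the temperatures and errors are hypothetical.

```python
from math import exp, pi, sqrt

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of the normal distribution N(mu, sigma**2) at x."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * pi))

def gaussian_sum_density(x: float, values, sigmas) -> float:
    """Average of one normal per data point: each is centred on the observed
    value (mu) and uses that point's own measurement error as sigma."""
    if len(values) != len(sigmas):
        raise ValueError("need one sigma per value")
    total = sum(normal_pdf(x, mu, s) for mu, s in zip(values, sigmas))
    return total / len(values)  # divide by N so the curve integrates to 1

# Hypothetical temperatures and 1-sigma errors in kelvin (not survey data)
temps = [28_500, 34_800, 35_200, 41_200, 44_000, 47_300]
errors = [900, 600, 400, 1_500, 1_200, 2_500]

# Evaluate on a 100 K grid from 20,000 K to 60,000 K
step = 100
grid = [20_000 + step * i for i in range(401)]
density = [gaussian_sum_density(x, temps, errors) for x in grid]

# Riemann-sum check: the area under the curve should be close to 1
area = sum(density) * step
print(f"area under curve: {area:.4f}")
```

Well-measured points (small σ) contribute tall, narrow peaks, while poorly measured points are spread over a wider temperature range; that is how the measurement uncertainty enters the population distribution.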

5
Hot DB White Dwarfs in Eisenstein et al.: Histogram and KDE Plot
D. Corliss, 10/16/2009

6
Summary and Conclusions: Kernel Density Estimation
- Incorporates the Uncertainty Associated with Measured Values into Population Distributions
- Creates a Continuous Probability Density Distribution by Summing Gaussian Distributions for Each Data Point, Where μ Is the Observed Value and σ Is the Standard Deviation (Measurement Uncertainty) of the Individual Measurement
- Prevents Loss of Information From Relatively Accurate Measurements Being Placed into Larger Bins
- Provides a Viable Alternative to Binning in Developing Population Distributions for Survey and Other Data
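Written out, the Gaussian-sum density described above is the following. The 1/N factor is an assumption added so that the result integrates to 1; x is the measured quantity (here temperature), μ_i the observed value of point i, and σ_i its measurement uncertainty.

```latex
\hat{f}(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\sigma_i \sqrt{2\pi}}
  \exp\!\left[ -\frac{(x - \mu_i)^2}{2\sigma_i^2} \right],
\qquad
\int_{-\infty}^{\infty} \hat{f}(x)\, dx = 1 .
```

For comparison, textbook KDE replaces every σ_i with one common bandwidth h.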

7
References
Babu, G. Jogesh, Summer School in Statistics for Astronomers V, Lecture Notes, Pennsylvania State University, 2009
Barnes, George R., and Cerrito, Patricia B., The Visualization of Continuous Data Using PROC KDE and PROC CAPABILITY, SUGI 26, 2001
Corliss, David J., MS Thesis, Wayne State University, 2008
Eisenstein, D. J., et al., 2006, ApJS, 167, 40 (Eisenstein et al. 2006a)
Eisenstein, D. J., et al., 2006, AJ, 132, 676 (Eisenstein et al. 2006b)
Sall, John, Personal Communication re. the SAS KDE Procedure

8
A Final Thought
“Essentially, all models are wrong, but some are useful.”
George E. P. Box (Box, G. E. P., and Draper, N. R. (1987), Empirical Model-Building and Response Surfaces, p. 424, Wiley)

9
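libname project 'C:\SAS\Conferences';

/* Raw series; the data lines were not shown on the slide.
   Named WORK.CRYER (assumed) to match the SET statement below. */
data work.cryer;
  input month 4.0 day 4.0 year 4.0 volume 8.0;
  cards;
;
run;

/* Keep January records and define the series T (year) and Y (volume) */
DATA WORK.TSERIES;
  SET WORK.CRYER;
  IF MONTH = 1;
  DUMMY = 1;
  ATTRIB T INFORMAT=8.0 FORMAT=8.0;
  T = YEAR;
  ATTRIB Y INFORMAT=8.0 FORMAT=8.1;
  Y = VOLUME;
RUN;

/* Default statistics (N, MIN, MAX, MEAN, STD) of VOLUME */
PROC MEANS DATA=WORK.TSERIES NOPRINT;
  VAR VOLUME;
  OUTPUT OUT=WORK.RANDOM_TERM;
RUN;

%GLOBAL LAMBDA SIGMA;
%MACRO ASSIGNMENT;
  /* %LET cannot read data values; CALL SYMPUTX copies them at run time */
  DATA _NULL_;
    SET WORK.RANDOM_TERM;
    IF _STAT_ = 'MEAN' THEN CALL SYMPUTX('LAMBDA', VOLUME, 'G');
    IF _STAT_ = 'STD' THEN CALL SYMPUTX('SIGMA', VOLUME, 'G');
  RUN;
%MEND ASSIGNMENT;
%ASSIGNMENT;
%PUT LAMBDA = &LAMBDA. SIGMA = &SIGMA.;

DATA WORK.TEST;
  SET WORK.TSERIES;
  LAMBDA = &LAMBDA.;
  SIGMA = &SIGMA.;
RUN;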

10
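%MACRO AC(N);
  PROC SORT DATA=WORK.TSERIES;
    BY DUMMY;
  RUN;

  /* Observation number of the first of the last N records */
  DATA WORK.LAST;
    SET WORK.TSERIES;
    BY DUMMY;
    IF LAST.DUMMY;
    RECENT = _N_ - &N. + 1;
    KEEP DUMMY RECENT;
  RUN;

  /* Keep only the most recent N records */
  DATA WORK.RECENT;
    MERGE WORK.TSERIES WORK.LAST;
    BY DUMMY;
    IF _N_ GE RECENT;
    DROP RECENT;
  RUN;

  /* Linear trend over the recent window */
  PROC REG DATA=WORK.RECENT NOPRINT;
    MODEL Y=T;
    OUTPUT OUT=WORK.TREND PREDICTED=FORECAST RESIDUAL=RESIDUAL;
  RUN;

  /* Carry forward the previous T and forecast plus a random term.
     RAND needs a distribution name; zero-mean normal noise with
     standard deviation SIGMA is assumed here. */
  DATA WORK.TREND;
    SET WORK.TREND;
    OUTPUT;
    T_PREVIOUS = T;
    Y_PREVIOUS = FORECAST + RAND('NORMAL', 0, &SIGMA.);
    RETAIN T_PREVIOUS Y_PREVIOUS;
  RUN;

  /* Extrapolate one step past the last record */
  DATA WORK.NEW;
    SET WORK.TREND;
    BY DUMMY;
    IF LAST.DUMMY;
    DELTA_T = T - T_PREVIOUS;
    T = T + DELTA_T;
    DELTA_Y = Y - Y_PREVIOUS + 1;
    Y = Y + DELTA_Y;
    KEEP T Y DUMMY;
  RUN;

  /* Append the new point to the series */
  DATA WORK.TSERIES;
    SET WORK.TSERIES WORK.NEW;
  RUN;
%MEND AC;
%AC(5);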
