# Continuous Probability Distribution as an Alternative to Binning of Survey Data

American Astronomical Society, January 6, 2010. David J. Corliss.


A Typical Example of Binned Data: Population of Hot DB White Dwarfs in the Sloan Digital Sky Survey

Figure 1 – Population Distribution of hot DB white dwarfs described by Eisenstein et al. 2006. [Histogram; temperature bins: < 30,000 K, 30 - 35,000 K, 35 - 40,000 K, 40 - 45,000 K, > 45,000 K; count axis from 0 to 16.]

Some Amount of Information is Lost, as All Points in a Given Bin Are Treated the Same. There is Also Some Uncertainty as to Which Bin a Given Point Belongs.

Figure 2A – Population Distribution of hot DB white dwarfs described by Eisenstein et al. 2006b. [Figure labels: Upper DB Gap, Middle DB Gap, Lower DB Gap.]
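The bin-membership uncertainty can be made concrete with a small sketch. It is written in Python rather than the SAS used elsewhere in this deck, and it assumes each measurement has a Gaussian error. The bin edges come from Figure 1; the temperature and uncertainty in the example are hypothetical values chosen for illustration, not survey data.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution of a normal with mean mu and std sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def bin_probabilities(mu, sigma, edges):
    """Probability that a Gaussian measurement lies in each bin.

    The first and last bins are open-ended, as in Figure 1.
    """
    bounds = [-math.inf] + list(edges) + [math.inf]
    cdf = [normal_cdf(b, mu, sigma) for b in bounds]
    return [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]

# Bin edges in kelvin, taken from the Figure 1 temperature bins.
EDGES = [30000.0, 35000.0, 40000.0, 45000.0]

# Hypothetical star: measured at 34,000 K with a 2,000 K uncertainty.
probs = bin_probabilities(34000.0, 2000.0, EDGES)
for label, p in zip(["<30k", "30-35k", "35-40k", "40-45k", ">45k"], probs):
    print(f"{label:>7}: {p:.3f}")
```

A point measured close to a bin edge spreads its probability over both neighbouring bins, which a histogram cannot show.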

Kernel Density Estimate (KDE) Process: Represent Each Point as a Normal and Sum

Figure 2B – Population Distribution of hot DB white dwarfs described by Eisenstein et al. 2006b.
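A minimal sketch of this process follows, in Python rather than the SAS KDE procedure cited in the references. Each point contributes one normal curve centered on its observed value, with a width equal to that point's own measurement uncertainty. Dividing the sum by the number of points is an added assumption so that the curve integrates to one. The temperatures and uncertainties are hypothetical illustration values, not the Eisenstein et al. data.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Normal probability density with mean mu and std sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def kde(x, values, sigmas):
    """Sum one Gaussian per data point at x, then divide by the point count."""
    total = sum(gaussian_pdf(x, m, s) for m, s in zip(values, sigmas))
    return total / len(values)

# Hypothetical effective temperatures (K) and their 1-sigma uncertainties.
temps = [28500.0, 31000.0, 33500.0, 38000.0, 46000.0]
errs = [800.0, 1200.0, 1500.0, 2500.0, 4000.0]

# Evaluate the density on a regular grid from 15,000 K to 65,000 K.
step = 100.0
grid = [15000.0 + step * i for i in range(501)]
density = [kde(x, temps, errs) for x in grid]

# Riemann-sum check that the curve integrates to roughly one.
area = sum(density) * step
print(f"area under KDE curve: {area:.4f}")
```

Precisely measured points produce narrow, tall peaks and poorly measured ones produce broad, low peaks, so the curve reflects the quality of each measurement.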

Hot DB White Dwarfs in Eisenstein et al. 2006: Histogram and KDE Plot (D. Corliss, 10/16/2009)

## Summary and Conclusions: Kernel Density Estimation

- Prevents Loss of Information From Relatively Accurate Measurements Being Placed into Larger Bins
- Incorporates the Uncertainty Associated with Measured Values into Population Distributions
- Creates a Continuous Probability Density Distribution by Summing over Gaussian Distributions for Each Data Point, Where μ is the Observed Value and σ is the σ of the Individual Measurement
- Provides a Viable Alternative to Binning in Developing Population Distributions for Survey and Other Data
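Written out, the sum over Gaussians described above takes the following form for $N$ data points, where $\mu_i$ is the observed value of point $i$ and $\sigma_i$ is its measurement uncertainty. The $1/N$ prefactor is an assumption added here so that the result integrates to one; the slides state only that the Gaussians are summed.

$$
\hat{f}(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\sigma_i \sqrt{2\pi}} \exp\left[ -\frac{(x - \mu_i)^2}{2\sigma_i^2} \right]
$$

Because each $\sigma_i$ comes from the individual measurement, no single global bandwidth has to be chosen, unlike in a conventional fixed-bandwidth KDE.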

## References

- Babu, G. Jogesh, Summer School in Statistics for Astronomers V Lecture Notes, Pennsylvania State University, 2009
- Barnes, George R., Cerrito, Patricia B., The Visualization of Continuous Data Using PROC KDE and PROC CAPABILITY, SUGI, 26, 2001
- Corliss, David J., MS Thesis, Wayne State University, 2008
- Eisenstein, D.J., et al., 2006, ApJS, 167, 40 (Eisenstein et al. 2006a)
- Eisenstein, D.J., et al., 2006, AJ, 132, 676 (Eisenstein et al. 2006b)
- Sall, John, Personal Communication re. the SAS KDE Procedure

A Final Thought: "Essentially, all models are wrong, but some are useful." George E. P. Box (Box, George E. P., and Draper, Norman R., 1987, Empirical Model-Building and Response Surfaces, p. 424, Wiley)

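    libname project 'C:\SAS\Conferences';

    /* Sample rows: month, day, year, volume (space-delimited list input). */
    /* The original names this data set work.kde but reads WORK.CRYER in   */
    /* the next step; it is named work.cryer here so the steps connect.    */
    data work.cryer;
        input month day year volume;
        cards;
    1 1 1962 589
    2 1 1962 561
    3 1 1962 640
    4 1 1962 656
    5 1 1962 727
    6 1 1962 697
    7 1 1962 640
    8 1 1962 599
    ;
    run;

    /* Keep the January rows; T is the time axis and Y the series value. */
    DATA WORK.TSERIES;
        SET WORK.CRYER;
        IF MONTH = 1;
        DUMMY = 1;
        ATTRIB T INFORMAT=8.0 FORMAT=8.0;
        T = YEAR;
        ATTRIB Y INFORMAT=8.0 FORMAT=8.1;
        Y = VOLUME;
    RUN;

    /* With no statistics listed, _STAT_ takes the values N, MIN, MAX, MEAN, STD. */
    PROC MEANS DATA=WORK.TSERIES NOPRINT;
        VAR VOLUME;
        OUTPUT OUT=WORK.RANDOM_TERM;
    RUN;

    %GLOBAL LAMBDA SIGMA;

    /* CALL SYMPUTX copies the data value into the macro variable at run time; */
    /* a %LET inside the DATA step would store the literal text VOLUME.        */
    %MACRO ASSIGNMENT;
        DATA _NULL_;
            SET WORK.RANDOM_TERM;
            IF _STAT_ = 'MEAN' THEN CALL SYMPUTX('LAMBDA', VOLUME);
            IF _STAT_ = 'STD' THEN CALL SYMPUTX('SIGMA', VOLUME);
        RUN;
    %MEND ASSIGNMENT;
    %ASSIGNMENT;
    %PUT LAMBDA = &LAMBDA.;

    DATA WORK.TEST;
        SET WORK.TSERIES;
        LAMBDA = &LAMBDA.;
        SIGMA = &SIGMA.;
    RUN;

    /* Fit a linear trend to the N most recent points and append one new point. */
    %MACRO AC(N);
        PROC SORT DATA=WORK.TSERIES;
            BY DUMMY;
        RUN;

        /* Row number at which the recent window starts. */
        DATA WORK.LAST;
            SET WORK.TSERIES;
            BY DUMMY;
            IF LAST.DUMMY;
            RECENT = _N_ - &N. + 1;
            KEEP DUMMY RECENT;
        RUN;

        DATA WORK.RECENT;
            MERGE WORK.TSERIES WORK.LAST;
            BY DUMMY;
            IF _N_ GE RECENT;
            DROP RECENT;
        RUN;

        PROC REG DATA=WORK.RECENT NOPRINT;
            MODEL Y=T;
            OUTPUT OUT=WORK.TREND PREDICTED=FORECAST RESIDUAL=RESIDUAL;
        RUN;

        /* Carry the previous row's T and perturbed forecast onto each row. */
        DATA WORK.TREND;
            SET WORK.TREND;
            OUTPUT;
            T_PREVIOUS = T;
            /* The original reads RAND(SIGMA,LAMBDA), which is not a valid call. */
            /* Assumed intent: a normal draw with mean LAMBDA and std SIGMA.     */
            Y_PREVIOUS = FORECAST + RAND('NORMAL', &LAMBDA., &SIGMA.);
            RETAIN T_PREVIOUS Y_PREVIOUS;
        RUN;

        /* Extrapolate one step beyond the last observation. */
        DATA WORK.NEW;
            SET WORK.TREND;
            BY DUMMY;
            IF LAST.DUMMY;
            DELTA_T = T - T_PREVIOUS;
            T = T + DELTA_T;
            DELTA_Y = Y - Y_PREVIOUS + 1;
            Y = Y + DELTA_Y;
            KEEP T Y DUMMY;
        RUN;

        DATA WORK.TSERIES;
            SET WORK.TSERIES WORK.NEW;
        RUN;
    %MEND AC;

    %AC(5);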
