Codes for astrostatistics: StatCodes & VOStat Eric Feigelson Penn State.


1 Codes for astrostatistics: StatCodes & VOStat Eric Feigelson Penn State

2 Vast range of statistical problems in modern astronomy
- Poisson processes: point processes, time series analysis
- Image analysis: MLE deconvolution, adaptive smoothing, wavelet analyses
- Multivariate analysis & classification (with measurement errors)
- Survival analysis (censoring & truncation with measurement errors)
- Parametric models: model selection, non-linear regression
- Non-parametric methods
- Confidence limits: bootstrap resampling
- Prior knowledge: Bayesian inference (see talk at PhysStat 2003 conference)
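As one illustration of the "Confidence limits: bootstrap resampling" item, here is a minimal percentile-bootstrap sketch in Python. It is not taken from the talk: the function name `bootstrap_ci`, its parameters, and the flux values are hypothetical and chosen only for illustration, and the percentile method is just one of several bootstrap interval constructions.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.median, n_resamples=2000, level=0.90, seed=0):
    """Percentile-bootstrap confidence interval for stat(data).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    estimates = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    # Read the interval off the empirical quantiles of the resampled statistic.
    alpha = (1.0 - level) / 2.0
    lower = estimates[int(alpha * n_resamples)]
    upper = estimates[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return lower, upper


# Made-up sample of source fluxes (arbitrary units), not real data.
fluxes = [1.2, 0.8, 2.5, 1.9, 0.7, 3.1, 1.4, 1.1, 2.2, 0.9]
low, high = bootstrap_ci(fluxes)
print(f"sample median = {statistics.median(fluxes):.2f}, "
      f"90% bootstrap interval = [{low:.2f}, {high:.2f}]")
```

The appeal of this approach for the non-parametric problems listed above is that it needs no assumption about the underlying distribution; the cost is that small samples give coarse, discrete intervals.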

3 The problem
- Astronomers are insufficiently trained in modern applied statistics ...
- ... but even if they knew what to do, they have inadequate access to computer codes.

4
- Astronomers never use large commercial statistical packages like SAS, SPSS, Statistica.
- Some astronomers sometimes use UNIX-based command-line systems like MatLab or S-Plus.
- Astronomers like mini-codes in Numerical Recipes & often write their own codes. Many like IDL, which has simple statistics.
- NASA/NSF observatories produce huge data analysis codes (IRAF, AIPS, CIAO, …), which by policy avoid proprietary codes.
- A few specialized stand-alone astrostat codes were written under NASA funding: ROSTAT, ASURV, SLOPES, StatPy.
- Altogether this is a very bad situation: vast statistical needs with very inadequate codes.

5 The rise of the Virtual Observatory
- Vast collections of calibrated data (images, spectra, time series), extracted catalogs (rows=sources, columns=properties), and source bibliographies emerged during the 1990s.
- NASA Science Archive Centers (MAST, HEASARC, IRSA, LAMDA), bibliographic databases (ADS, SIMBAD, NED), & more are being transformed into a federated (though still distributed & heterogeneous) system.
- XML metadata (VOTable), SOAP protocols, … are used for data mining & extraction.
- But originally there was no plan for visualization & statistical analysis of extracted datasets.

6 StatCodes: A partial solution
- In the late 1990s, the Penn State group created a Web metasite with annotated links to ~200 open source packages & codes of utility to astronomers.
- Quite successful: 50-100 hits/day for 7 years. Multivariate & time series methods are the most popular.
- But the collection of on-line codes was very inhomogeneous and incomplete.

7 R: Finally a broad public-domain statistical software system emerges
- Based on the successful commercial UNIX-based S/S-Plus, R has an interactive command-line feel (like IDL), flexible data I/O, acceptable graphics, integration with C/Fortran/Python/…, and quite a lot of sophisticated statistical methods.
- Core R: 2000-page manual with ~200 functionalities, some very complex & advanced.
- CRAN: 300 add-on packages, dozens useful to astronomers. Some are themselves full systems.

8 VOStat: A Web service
1. Web form interface providing simple statistical R functions with VOTable inputs
2. Same R functions provided through a more sophisticated Java-based grid-computing mode.
[Diagram labels: User; data bases; Dispersed VO; VOStat server; Heavy statistical computation; Requests; Answers; Heavy data]
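To make "simple statistical functions with VOTable inputs" concrete, the following Python sketch reads a tiny VOTable document and computes per-column summary statistics. It is an illustration under stated assumptions, not VOStat's implementation (the slides say VOStat runs R functions on a server): the sample table, the column names, and the helpers `read_votable_columns` and `summarize` are all hypothetical, and the reader handles only the plain TABLEDATA serialization with numeric columns.

```python
import statistics
import xml.etree.ElementTree as ET

# Made-up three-column catalog in VOTable's TABLEDATA layout (not real data).
SAMPLE_VOTABLE = """<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE name="sources">
      <FIELD name="ra" datatype="double"/>
      <FIELD name="dec" datatype="double"/>
      <FIELD name="mag" datatype="float"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>10.5</TD><TD>-3.2</TD><TD>17.1</TD></TR>
          <TR><TD>10.7</TD><TD>-3.1</TD><TD>18.4</TD></TR>
          <TR><TD>11.2</TD><TD>-2.9</TD><TD>16.9</TD></TR>
          <TR><TD>11.9</TD><TD>-2.5</TD><TD>19.0</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def local_name(tag):
    """Drop an XML namespace prefix such as '{uri}TR' -> 'TR'."""
    return tag.rsplit("}", 1)[-1]


def read_votable_columns(xml_text):
    """Return {field name: list of floats}; assumes all columns are numeric."""
    root = ET.fromstring(xml_text)
    names = [el.get("name") for el in root.iter() if local_name(el.tag) == "FIELD"]
    columns = {name: [] for name in names}
    for row in root.iter():
        if local_name(row.tag) != "TR":
            continue
        cells = [cell for cell in row if local_name(cell.tag) == "TD"]
        for name, cell in zip(names, cells):
            columns[name].append(float(cell.text))
    return columns


def summarize(columns):
    """Mean and sample standard deviation for each column."""
    return {
        name: {"mean": statistics.mean(values), "stdev": statistics.stdev(values)}
        for name, values in columns.items()
    }


columns = read_votable_columns(SAMPLE_VOTABLE)
summary = summarize(columns)
for name, stats in summary.items():
    print(f"{name}: mean={stats['mean']:.3f} stdev={stats['stdev']:.3f}")
```

Real VOTable files can also carry binary serializations, null values, and non-numeric fields, so a maintained parser such as `astropy.io.votable` would be a safer choice for actual analysis.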

9 VOStat may be a big improvement but …
- Generic Web-based services are inherently inflexible & limited. VOStat may serve to entice the astronomer to download R & perform the real analysis at home.
- Astronomers need training in advanced methods before using them with R. Penn State has just created a Center for Astrostatistics to develop curriculum, conduct tutorials, provide template R code, etc.
- R/CRAN does not serve huge VO datasets or some special astrostat needs. New methodological/code development is underway (CMU, Cornell, PSU, UCIrv,…).

