Presentation on theme: "A Systematic Exploration of the Time Domain"— Presentation transcript:
1 A Systematic Exploration of the Time Domain S. G. DjorgovskiWith M. Graham, A. Mahabal, A. Drake, C. Donalek, M. Turmon, and many collaborators world-wideHotwiring the Transient Universe III, Santa Fe, Nov. 2013
2 Data-Intensive Science in the 21st Century The exponential growth of data volumes, complexity, and quality has several important consequences:The value shifts from the ownership of data to the ownership of expertise and creativityThere is much more latent science in the data than can be done by any individual or a group (especially in real time)}Open Data PhilosophyYou can do a great science without expensive observational facilities“The computer is the new telescope”Data farming, data mining, and informatics are the key new scientific skills (because the human intelligence and bandwidth do not follow the Moore’s law)And remember - nobody ever over estimated the cost of software
3 From “Morphological Box” to the Observable Parameter Spaces Zwicky’s concept: explore all possible combinations of the relevant parameters in a given problem; these correspond to the individual cells in a “Morphological Box”Fritz ZwickyExample: Zwicky’s discovery of the compact dwarfs
4 Expanding the Observable Parameter Space Technology advances Expanded domain of measurements Discovery of new types of phenomenaM. HarwitAs we open up the time domain, we are bound to discover some new things!
5 Systematic Exploration of the Observable Parameter Space (OPS) Every observation, surveys included, carves out a hypervolume in the OPSIts axes are defined by the observable quantitiesTechnology opens new domains of the OPS New discoveries
6 Measurements Parameter Space Physical Parameter SpaceColors of stars and quasarsFundamental Plane of hot stellar systemsSDSSEdSphGCDimensionality ≤ the number of observed quantitiesBoth are populated by objects or events
8 Parameter Spaces for the Time Domain (in addition to everything else: flux, wavelength, etc.)For surveys:Total exposure per pointingNumber of exposures per pointingHow to characterize the cadence?AWindow function(s)AInevitable biasesFor objects/events ~ light curves:Significance of periodicity, periodsDescriptors of the power spectrum (e.g., power law)Amplitudes and their statistical descriptors… etc. − over 70 parameters defined so far, but which ones are the minimum / optimal set?
9 Characterizing Synoptic Sky Surveys Define a measure of depth (roughly ~ S/N of indiv. exposures):D = [ A texp ]1/2 / FWHMwhere A = the effective collecting area of the telescope in m2texp = typical exposure length= the overall throughput efficiency of the telescope+instrumentFWHM = seeingDefine the Scientific Discovery Potential for a survey:SDP = D tot Nb Navgwhere tot = total survey area coveredNb = number of bandpasses or spec. resolution elementsNavg = average number of exposures per pointingTransient Discovery Rate:TDR = D R Newhere R = d/dt = area coverage rateNe = number of passes per night
10 Towards the Automated Event Classification (because human time/attention does not scale)Data are heterogeneous and sparse: incorporation of the contextual information (archival, and from the data themselves) is essentialAutomated prioritization of follow-up observations, given the available resources and their costA dynamical, iterative systemA very hard problem!
11 Contextual Information is Essential Visual context contains valuable information about the reality and classification of transientsSo does the temporal context, from the archival light curvesAnd the multi-λ contextInitial detection data contain little information about the transient: α, δ, m, Δm, (tc). Almost all of the initial information is archival or contextual; follow-up data trickle in slowly, if at allArtifactSNCV not SNVisibleRadioGamma
12 Harvesting the Human Pattern Recognition (and Domain Expertise) Human-annotated images (via SkyDiscovery.org) Semantic descriptors Machine processing Evolving novel algorithms… and iterateChallenges: Optimizing for different levels of user expertise; optimal input averaging; encoding contextual information; etc.(Lead: M. Graham)
13 A Hierarchical Approach to Classification Different types of classifiers perform better for some event classes than for the othersWe use some astrophysically motivated major features to separate different groups of classesProceeding down the classification hierarchy every node uses those classifiers that work best for that particular task
14 From Light Curves to Feature Vectors We compute ~ 70 parameters and statistical measures for each light curve: amplitudes, moments, periodicity, etc.This turns heterogeneous light curves into homogeneous feature vectors in the parameter spaceApply a variety of automated classification methods
15 Optimizing Feature Selection RR LyraeEclipsing binary (W U Ma)Rank features in the order of classification quality for a given classification problem, e.g., RR Lyrae vs. WUMaIn k-fold cross-validation, the original sample is randomly partitioned into k subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. The k results from the folds then can be averaged (or otherwise combined) to produce a single estimation. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once.(Lead: C. Donalek)
16 Metaclassification: An optimal combining of classifiers Exploring a variety of techniques for an optimal classification fusion:Markov Logic Networks, Diffusion Maps, Multi-Arm Bandit, Sleeping Expert…
17 The Follow-Up CrisisFollow-up observations are essential, especially spectroscopy. We are already limited by the available resources. This is a key bottleneck now, and it will get much worse“Exciting” transients are no longer rare – the era of ToO observations may be ending, we need dedicated follow-up facilities… and most of the existing spectrographs are not suitable for thisA hierarchical elimination of less interesting objects: iterative classification, photometric observations with smaller telescopesCoordinated coverage by multi-wavelength surveys would produce a first order, mutual “follow-up”We will always follow the brightest transients first (caveat LSST)Coordinated observations by surveys with different cadences can probe more of the observable parameter space
18 Real-Time vs. Non-Time-Critical Transients may be overemphasized; there is a lot of good science in the archival studies, and that can only get better in time
19 It Is NOT All About the LSST! (or LIGO, or SKA…) NOW is the golden era of time-domain astronomy
20 ConclusionsTime domain astronomy is here now (CRTS, PTF, PS1, SkyMapper, ASCAP, Kepler, Fermi, …), and it is a vibrant new frontierLots of exciting and diverse science already under way, from the Solar system to cosmology – something for everyone!CRTS data stream is open – use it! (and free ≠ bad)It is astronomy of telescope and computational systems, requiring a strong cyber-infrastructure (VO, astroinformatics)Automated classification is a core problem; it is critical for a proper scientific exploitation of synoptic sky surveysData mining of Petascale data streams both in real time and archival modes is important well beyond astronomySurveys today are science and methodology precursors and testbeds for the LSST, and they are delivering science nowCRTS II consortium now forming – join us!