A Systematic Exploration of the Time Domain

Presentation transcript:

A Systematic Exploration of the Time Domain. S. G. Djorgovski, with M. Graham, A. Mahabal, A. Drake, C. Donalek, M. Turmon, and many collaborators worldwide. Hotwiring the Transient Universe III, Santa Fe, Nov. 2013.

Data-Intensive Science in the 21st Century. The exponential growth of data volumes, complexity, and quality has several important consequences. The value shifts from the ownership of data to the ownership of expertise and creativity. There is much more latent science in the data than any individual or group can extract (especially in real time). Hence an Open Data philosophy. You can do great science without expensive observational facilities: “The computer is the new telescope”. Data farming, data mining, and informatics are the key new scientific skills (because human intelligence and bandwidth do not follow Moore’s law). And remember: nobody ever overestimated the cost of software.

From “Morphological Box” to the Observable Parameter Spaces. Fritz Zwicky’s concept: explore all possible combinations of the relevant parameters in a given problem; these correspond to the individual cells in a “Morphological Box”. Example: Zwicky’s discovery of the compact dwarfs.

Expanding the Observable Parameter Space. Technology advances → expanded domain of measurements → discovery of new types of phenomena (M. Harwit). As we open up the time domain, we are bound to discover some new things!

Systematic Exploration of the Observable Parameter Space (OPS). Every observation, surveys included, carves out a hypervolume in the OPS. Its axes are defined by the observable quantities. Technology opens new domains of the OPS → new discoveries.

Measurements Parameter Space (example: colors of stars and quasars, SDSS) and Physical Parameter Space (example: Fundamental Plane of hot stellar systems; E, dSph, GC). Dimensionality ≤ the number of observed quantities. Both are populated by objects or events.

Measurements Parameter Space (example: color-magnitude diagram) and Physical Parameter Space (example: H-R diagram), connected through theory + other data. Not filled uniformly: clustering indicates different families. Clustering + dimensionality reduction → correlations. High dimensionality poses analysis challenges.

Parameter Spaces for the Time Domain (in addition to everything else: flux, wavelength, etc.). For surveys: total exposure per pointing; number of exposures per pointing; how to characterize the cadence? (window function(s), inevitable biases). For objects/events ~ light curves: significance of periodicity, periods; descriptors of the power spectrum (e.g., power law); amplitudes and their statistical descriptors; etc. Over 70 parameters defined so far, but which ones are the minimum / optimal set?

Characterizing Synoptic Sky Surveys. Define a measure of depth (roughly ~ S/N of individual exposures):
$D = [A \times t_{\rm exp} \times \epsilon]^{1/2} / \mathrm{FWHM}$
where $A$ = the effective collecting area of the telescope in m$^2$, $t_{\rm exp}$ = typical exposure length, $\epsilon$ = the overall throughput efficiency of the telescope+instrument, and FWHM = seeing.
Define the Scientific Discovery Potential for a survey:
$\mathrm{SDP} = D \times \Omega_{\rm tot} \times N_b \times N_{\rm avg}$
where $\Omega_{\rm tot}$ = total survey area covered, $N_b$ = number of bandpasses or spectral resolution elements, and $N_{\rm avg}$ = average number of exposures per pointing.
Transient Discovery Rate:
$\mathrm{TDR} = D \times R \times N_e$
where $R = d\Omega/dt$ = area coverage rate and $N_e$ = number of passes per night.
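
The three figures of merit on this slide are simple products, so they can be evaluated directly. The sketch below is only an illustration: the function names are invented here, the slide does not fix units for exposure time, seeing, or area (seconds, arcseconds, and square degrees are assumed), and the example numbers are made up rather than taken from any real survey.

```python
import math

def survey_depth(area_m2, t_exp, efficiency, fwhm):
    """D = sqrt(A * t_exp * epsilon) / FWHM.

    Units of t_exp (s) and fwhm (arcsec) are assumptions, not from the slide.
    """
    return math.sqrt(area_m2 * t_exp * efficiency) / fwhm

def scientific_discovery_potential(depth, omega_tot, n_bands, n_avg):
    """SDP = D * Omega_tot * N_b * N_avg."""
    return depth * omega_tot * n_bands * n_avg

def transient_discovery_rate(depth, coverage_rate, n_passes):
    """TDR = D * R * N_e, with R = dOmega/dt the area coverage rate."""
    return depth * coverage_rate * n_passes

# Hypothetical survey, for illustration only.
d = survey_depth(area_m2=4.0, t_exp=100.0, efficiency=0.25, fwhm=2.0)
sdp = scientific_discovery_potential(d, omega_tot=1000.0, n_bands=2, n_avg=10)
tdr = transient_discovery_rate(d, coverage_rate=100.0, n_passes=2)
print(d, sdp, tdr)
```

Because all three quantities scale linearly with D, they are most useful for comparing surveys computed with the same unit conventions, not as absolute numbers.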

Towards the Automated Event Classification (because human time/attention does not scale). Data are heterogeneous and sparse: incorporation of the contextual information (archival, and from the data themselves) is essential. Automated prioritization of follow-up observations, given the available resources and their cost. A dynamical, iterative system. A very hard problem!

Contextual Information is Essential. Visual context contains valuable information about the reality and classification of transients. So does the temporal context, from the archival light curves. And the multi-λ context. Initial detection data contain little information about the transient: α, δ, m, Δm, (tc). Almost all of the initial information is archival or contextual; follow-up data trickle in slowly, if at all. (Figure labels: Artifact; SN; CV; not SN. Visible; Radio; Gamma.)

Harvesting the Human Pattern Recognition (and Domain Expertise). Human-annotated images (via SkyDiscovery.org) → semantic descriptors → machine processing → evolving novel algorithms … and iterate. Challenges: optimizing for different levels of user expertise; optimal input averaging; encoding contextual information; etc. (Lead: M. Graham)

A Hierarchical Approach to Classification. Different types of classifiers perform better for some event classes than for others. We use some astrophysically motivated major features to separate different groups of classes. Proceeding down the classification hierarchy, every node uses those classifiers that work best for that particular task.

From Light Curves to Feature Vectors. We compute ~70 parameters and statistical measures for each light curve: amplitudes, moments, periodicity, etc. This turns heterogeneous light curves into homogeneous feature vectors in the parameter space. We then apply a variety of automated classification methods.
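
To make the idea concrete, here is a minimal sketch of how light curves with different numbers of points can be reduced to fixed-length feature vectors. It computes only five simple statistics (amplitude, mean, median, standard deviation, skewness), not the ~70 parameters mentioned on the slide; the function name and the choice of features are assumptions made for illustration, and observation times and periodicity measures are deliberately left out.

```python
import statistics

def light_curve_features(mags):
    """Return [amplitude, mean, median, std, skewness] for a list of magnitudes.

    Illustrative subset only; the real feature set is much larger.
    """
    n = len(mags)
    mean = statistics.fmean(mags)
    std = statistics.pstdev(mags)          # population standard deviation
    amplitude = max(mags) - min(mags)      # peak-to-peak range
    median = statistics.median(mags)
    if std > 0:
        skew = sum((m - mean) ** 3 for m in mags) / (n * std ** 3)
    else:
        skew = 0.0                         # constant light curve
    return [amplitude, mean, median, std, skew]

# Two made-up light curves with different sampling give vectors of equal length.
sparse = light_curve_features([15.2, 15.9, 15.4])
dense = light_curve_features([17.0, 17.1, 16.9, 17.3, 16.2, 17.0, 17.1])
print(len(sparse), len(dense))
```

Once every object is described by a vector of the same length, any standard classifier can be applied regardless of how unevenly the original light curves were sampled.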

Optimizing Feature Selection. (Figure labels: RR Lyrae; eclipsing binary, W UMa.) Rank features in the order of classification quality for a given classification problem, e.g., RR Lyrae vs. W UMa. In k-fold cross-validation, the original sample is randomly partitioned into k subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. The k results from the folds can then be averaged (or otherwise combined) to produce a single estimate. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. (Lead: C. Donalek)
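
The k-fold procedure described above can be sketched in a few lines of plain Python. Everything named here (`k_fold_indices`, `cross_validate`, the threshold classifier, and the toy feature values) is hypothetical and chosen only to keep the example self-contained; it is not the pipeline used by the authors.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (training, validation) index lists, one pair per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # random partition
    folds = [indices[i::k] for i in range(k)]   # k near-equal subsamples
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

def cross_validate(xs, ys, k, fit, score, seed=0):
    """Average the validation score over the k folds."""
    results = []
    for train, val in k_fold_indices(len(xs), k, seed):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        results.append(score(model, [xs[i] for i in val], [ys[i] for i in val]))
    return sum(results) / len(results)

def fit_threshold(xs, ys):
    """Toy one-feature classifier: midpoint between the two class means."""
    class0 = [x for x, y in zip(xs, ys) if y == 0]
    class1 = [x for x, y in zip(xs, ys) if y == 1]
    return (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2

def accuracy(threshold, xs, ys):
    """Fraction of samples on the correct side of the threshold."""
    hits = sum((x > threshold) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)

# Made-up values of a single feature for two classes (0 and 1).
feature = [0.20, 0.30, 0.25, 0.35, 0.30, 0.60, 0.70, 0.65, 0.55, 0.75]
label = [0] * 5 + [1] * 5
mean_accuracy = cross_validate(feature, label, k=5, fit=fit_threshold, score=accuracy)
print(mean_accuracy)
```

To rank features, one would repeat `cross_validate` once per candidate feature (or feature subset) and sort by the averaged score. The toy classifier assumes both classes appear in every training fold, which holds for this balanced example but would need a guard on real data.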

Metaclassification: An Optimal Combining of Classifiers. Exploring a variety of techniques for an optimal classification fusion: Markov Logic Networks, Diffusion Maps, Multi-Arm Bandit, Sleeping Expert…
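
As a point of reference for what "combining classifiers" means in practice, the sketch below shows the simplest possible fusion rule, a weighted vote. It is not one of the techniques named on the slide (Markov Logic Networks, Diffusion Maps, bandits, Sleeping Experts); it is a plain baseline, and the classifier names, labels, and weights are all invented for illustration.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine per-classifier labels into one label by summed weight.

    predictions: dict mapping classifier name -> predicted class label
    weights:     dict mapping classifier name -> non-negative weight
                 (e.g. a validation accuracy; an assumption of this sketch)
    """
    totals = defaultdict(float)
    for name, label in predictions.items():
        totals[label] += weights.get(name, 0.0)
    # Ties are resolved in favor of the label seen first.
    return max(totals, key=totals.get)

# Hypothetical outputs of three classifiers for one transient.
predictions = {"tree": "SN", "knn": "CV", "bayes": "CV"}
weights = {"tree": 0.9, "knn": 0.3, "bayes": 0.4}
print(weighted_vote(predictions, weights))
```

With these made-up weights the single high-weight classifier outvotes the two low-weight ones; with equal weights the majority label would win instead. The more sophisticated methods listed on the slide can be seen as ways of learning such weights, or richer combination rules, from data.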

The Follow-Up Crisis. Follow-up observations are essential, especially spectroscopy. We are already limited by the available resources. This is a key bottleneck now, and it will get much worse. “Exciting” transients are no longer rare: the era of ToO observations may be ending, and we need dedicated follow-up facilities … and most of the existing spectrographs are not suitable for this. A hierarchical elimination of less interesting objects: iterative classification, photometric observations with smaller telescopes. Coordinated coverage by multi-wavelength surveys would produce a first-order, mutual “follow-up”. We will always follow the brightest transients first (caveat LSST). Coordinated observations by surveys with different cadences can probe more of the observable parameter space.

Real-Time vs. Non-Time-Critical. Transients may be overemphasized; there is a lot of good science in archival studies, and that can only get better with time.

It Is NOT All About the LSST! (or LIGO, or SKA…) NOW is the golden era of time-domain astronomy.

Conclusions. Time-domain astronomy is here now (CRTS, PTF, PS1, SkyMapper, ASKAP, Kepler, Fermi, …), and it is a vibrant new frontier. Lots of exciting and diverse science is already under way, from the Solar System to cosmology: something for everyone! The CRTS data stream is open: use it! (and free ≠ bad). It is astronomy of telescope and computational systems, requiring a strong cyber-infrastructure (VO, astroinformatics). Automated classification is a core problem; it is critical for a proper scientific exploitation of synoptic sky surveys. Data mining of petascale data streams, both in real time and in archival modes, is important well beyond astronomy. Surveys today are science and methodology precursors and testbeds for the LSST, and they are delivering science now. The CRTS II consortium is now forming: join us!