Slide 1: Object Classification in the Virtual Observatory: A VO Status Report
Tom McGlynn, NASA/GSFC
The US National Virtual Observatory
NVO Summer School, September 2005

Slide 2: How do we know what we want in the VO?
Pretend the VO exists. What is the science we are doing with it? Now try to do that science and see what gets in the way.

Slide 3: Can we classify ROSAT X-ray sources?
[Figure: all RASS sources (124,730 plotted; ~130,000 total) versus the ~7,000 RASS sources with existing classifications.]

Slide 4: What do we want to do?
– Find counterparts to ROSAT X-ray sources in the optical, IR, and radio.
– Train a classifier to use the multiwavelength information to determine the type of each object.
– Classify all of the objects seen by ROSAT.

Slide 5: What is classification?
– Translation from observables to distinct physical processes.
– Each element is classified independently of the others.
– Is classification different from measurement?
– Classification versus cataloging.
– We usually classify objects, but also…
  – Events: GRBs, solar flares, …
  – Simulated data
  – Pixels/regions in an image: Earth and planetary studies, shocked regions, …

Slide 6: It's not just us
[Figure: a typical plot of objects to be classified?]
There is a lot of information on, and discussion of, classification outside astronomy.

Slide 7: Examples
– Moving versus fixed stars.
– Classes of stellar spectra (ordered by strength of the Balmer lines).
  – A substitute for a measurement.
  – Cf. dwarf versus giant.
– Osterbrock diagram: AGN versus star-forming emission-line galaxies.
– Bautz-Morgan types of clusters of galaxies: dominance of the cluster by its central galaxy.
– Types of X-ray sources: AGN, SNRs, pulsars, XRBs, …

Slide 8: Galaxy Classification

Slide 9: Why do we classify?
– Understand a given field.
– Generate statistical samples.
– Compare different regions/observations.
– Find rare objects.
– Remove unwanted backgrounds.
– Plan subsequent observations.
– …

Slide 10: Do we know what we are looking for?
Yes: We have a good idea of the kinds of objects that are in the field.
– Supervised classification.
– Find out which regions of observable phase space belong to which classes, and use that knowledge to classify new sources.
No: We don't really know what we're looking at.
– Unsupervised classification.
– Is there any structure in the phase-space distribution?

Slide 11: Supervised versus unsupervised classification
[Figure: "Supervised and Unsupervised Land Use Classification", Chris Banman, an5/perry3.html]

Slide 12: Supervised classification
– Often has a training phase, where a priori knowledge is used to tune the classifier algorithm. Training takes most of the time.
  – But not always: the Osterbrock diagram is based on theoretical modeling.
– We specify a list of output classes.
– May give a list of probabilities of membership in more than one class.
– Algorithms: neural networks, nearest neighbor, decision trees. (A sketch of the workflow follows below.)
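By way of illustration, a minimal sketch of the supervised workflow in Python, assuming scikit-learn is available. It uses an ordinary axis-aligned decision tree (not the oblique trees used in ClassX), and the features and labels are synthetic placeholders standing in for real X-ray/optical observables:

```python
# Minimal supervised-classification sketch (scikit-learn assumed available).
# Feature matrix and labels are synthetic placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))         # e.g. hardness ratios, magnitudes
y = rng.integers(0, 3, size=500)      # placeholder labels for 3 classes

# Training phase: fit the tree on labeled examples.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=5).fit(X_train, y_train)

print("accuracy:", clf.score(X_test, y_test))
# Membership probabilities over the specified output classes.
print("class probabilities:", clf.predict_proba(X_test[:1]))
```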

Slide 13: Supervised classifier training
[Figure panels: neural networks; oblique decision trees.]

Slide 14: Unsupervised classification
– Tries to find natural groupings of the data.
– The user often specifies the number of classes to find.
– The classes found are anonymous; it is up to the user to assign them a physical meaning.
– Algorithms: self-organizing maps, K-means, fuzzy C-means, hierarchical clustering, Gaussian mixtures. (A K-means sketch follows below.)
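A minimal K-means sketch, again assuming scikit-learn and using synthetic placeholder observables. Note that the user picks the number of clusters, and the clusters come back as anonymous integer labels:

```python
# Minimal unsupervised-classification sketch with K-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Placeholder observables, e.g. colors or flux ratios: two blobs.
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # centroids in observable space
print(km.labels_[:10])       # anonymous cluster assignments (0 or 1)
```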

Slide 15: Self-organizing maps
[Figure panels: catalogs in VizieR; K-means; fuzzy C-means.]

Slide 16: Some key questions
1. (S) What output classes are we interested in, and what degree of resolution do we want? Star versus galaxy, or A0V versus SBa? (U) How many classes might we expect?
2. What input data sets are we going to use?
3. How are we going to get them?
4. How do we combine them?
5. What observables are available? Which are useful?
6. (S) What training sets are available? (U) How do we understand the output classes?
7. What algorithm are we going to use in classification?
8. How can we test the results so that we believe them?

Slide 17: Specification/count of output classes
We weren't sure how detailed our classifications could be, and had to play with the classifiers to see what might be feasible.
Does the VO help? Not directly. This will often be implicit in the problem. By making other aspects of classification easier, the VO makes playing around with this choice easier.

Slide 18: What input data sets are we going to use?
We knew which datasets we were going to use, but we added one along the way.
Does the VO help? Maybe. VO registries can help find resources, but these will often be implicit in the problem.

Slide 19: How are we going to get the data?
We used custom interfaces to get data from different resources, but VOTables were developed early enough for us to use (the Perl VOTable parser came from the ClassX effort). This took a fair bit of work.
Does the VO help? A lot. There are just a few standard ways to get the data, and nice standard ways of defining them. Limits on some services are still annoying. New libraries can make this part really easy. Large XML files are cumbersome to process in many tools. (A cone-search sketch follows below.)
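For illustration, a minimal cone-search request in Python with astropy. The endpoint URL is hypothetical (real endpoints can be found through a VO registry); RA, DEC, and SR are the standard cone-search query parameters, in degrees:

```python
# Minimal cone-search sketch: fetch a VOTable over HTTP and parse it.
# The service URL below is a hypothetical placeholder.
from urllib.request import urlretrieve
from astropy.io.votable import parse

base = "https://example.org/conesearch?"           # hypothetical endpoint
query = base + "RA=180.0&DEC=2.5&SR=0.1"           # standard parameters (deg)

urlretrieve(query, "result.xml")
votable = parse("result.xml")
table = votable.get_first_table().to_table()        # astropy Table
print(table.colnames)
```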

Slide 20: How do we combine them?
We used custom software. This took a lot of work, and we had to deal with the issue of multiple counterparts to each X-ray source.
Does the VO help? A lot. XMatch does much of what we want, though not everything. Note that the spatial-matching capabilities in TOPCAT allow merging of data from ConeSearch too. (A cross-match sketch follows below.)
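A minimal positional cross-match sketch with astropy, analogous to what XMatch or TOPCAT provide. The coordinates and the 30-arcsecond acceptance radius are placeholders:

```python
# Minimal positional cross-match: nearest neighbor on the sky.
import numpy as np
import astropy.units as u
from astropy.coordinates import SkyCoord

xray = SkyCoord(ra=[10.68, 83.82] * u.deg, dec=[41.27, -5.39] * u.deg)
optical = SkyCoord(ra=np.random.uniform(0, 360, 1000) * u.deg,
                   dec=np.random.uniform(-90, 90, 1000) * u.deg)

# For each X-ray source, find the nearest optical source.
idx, sep2d, _ = xray.match_to_catalog_sky(optical)
matched = sep2d < 30 * u.arcsec      # accept counterparts within 30"
print(idx, sep2d.to(u.arcsec), matched)
```

Note this keeps only the single nearest neighbor; handling multiple plausible counterparts per X-ray source (as ClassX had to) requires a search around each position rather than a one-to-one match.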

Slide 21: What observables are available? Which are useful?
This took a lot of work. Understanding what variables were available, and getting full descriptions of them, was difficult.
Does the VO help? A little. Visualization tools like Mirage are nice for getting a feel for the data, but non-VO tools (e.g., IDL itself) may do this just as well. Documentation in the VO is probably not better than before, but a common framework for getting information to users is available, if providers ever get around to providing adequate documentation.

Slide 22: Classification needs the right information, not all information
The Hughes effect: with a fixed training set, adding more features eventually hurts performance. Cf. "Classification of Multi-Spectral Data by Joint Supervised-Unsupervised Learning" (Shahshahani & Landgrebe). (A synthetic illustration follows below.)
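A synthetic illustration of the Hughes effect, assuming scikit-learn: with a small, fixed training sample, padding the two informative features with uninformative ones eventually degrades cross-validated accuracy. The data and classifier choice are purely illustrative:

```python
# Hughes-effect demo: accuracy vs. number of uninformative features.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
n = 60                                              # small training sample
signal = rng.normal(0, 1, (n, 2))                   # 2 informative features
y = (signal[:, 0] + signal[:, 1] > 0).astype(int)   # true class boundary

for n_noise in (0, 10, 50, 200):
    X = np.hstack([signal, rng.normal(0, 1, (n, n_noise))])
    score = cross_val_score(KNeighborsClassifier(5), X, y, cv=5).mean()
    print(f"{n_noise:4d} noise features -> CV accuracy {score:.2f}")
```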

Slide 23: Training set/ground truth data
We knew most of the training data in advance.
Does the VO help? The VO registry may point out some possibilities, but training or truth data may be implicit in the problem.

Slide 24: What algorithm are we going to use in classification?
We had experience with oblique decision trees.
Does the VO help? A little. VOStat provides a few capabilities for unsupervised classification, but its Web interface is a little flaky. Web-service interfaces to a few standard classifiers might be nice. The VO could do a lot more here.

Slide 25: VOStat
– Statistics routines on-line with a VO interface.
– Downloadable library.
– Fairly minimal Web interface.
– Includes K-means and hierarchical clustering tools.

Slide 26: How can we test the results so that we believe them?
We found a number of independently classified sets of objects and checked for consistency. (A consistency-check sketch follows below.)
Does the VO help? Yes. This is probably where we can most effectively use VO resources we discover in the registry. However, a couple of the samples we used were not yet published.
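A minimal consistency-check sketch, assuming scikit-learn: compare our classifier's output against an independently classified sample with a confusion matrix. The label arrays here are placeholders:

```python
# Consistency check against an independently classified sample.
from sklearn.metrics import confusion_matrix

external = ["AGN", "star", "AGN", "XRB", "star", "AGN"]   # independent labels
ours     = ["AGN", "star", "star", "XRB", "star", "AGN"]  # classifier output

# Rows: external classes; columns: our classes. Off-diagonal = disagreement.
print(confusion_matrix(external, ours, labels=["AGN", "star", "XRB"]))
```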

Slide 27: Testing the results
– Classify independently classified datasets.
– Check faint sources?

Slide 28: Overall…
A lot of progress since we started ClassX, but plenty of issues still remain.

Slide 29: A ClassX phase space slice

Slide 30: Science
– Probabilistic classifications of all ROSAT X-ray sources: McGlynn et al. 2004, ApJ.
– New HMXRBs: Suchkov & Hanisch 2004, ApJ.