1
The VONeural team
Progetto S.Co.P.E. – WP4
The Virtual Observatory and the PON-SCOPE
The VO-Neural Team
– G. Longo (Principal Investigator)
– M. Brescia (Project Manager)
– S. Cavuoti (applications)
– A. Corazza (models and algorithms)
– R. D’Abrusco (applications)
– G. d’Angelo (documentation, GRID)
– N. Deniskina (GRID – VO interface)
– M. Garofalo (applications)
– O. Laurino (system, applications)
– A. Nocella (UML software engineering)
– G. Riccio (applications)
– S. Pardi
External Members
– C. Donalek (Caltech)
– G. Djorgovski (Caltech)
2
Summary
1. What is the Virtual Observatory & its international background
2. Why the V.Obs. is so important for the future of cosmology
3. Applications already ported under SCOPE
Astronomy has become an immensely data rich field
– Detector evolution (plates to digital to mosaics)
– Telescope evolution
– Space instruments
– From 1 MB/night to 1 TB/night
– Heterogeneous Data + Metadata
3
The VLT Survey Telescope
– 2.6 meter
– 16 k x 16 k pixels, 0.21”/pxl
– 100 GB/night
[Diagram labels: Digital libraries; Follow-Up Telescopes and Missions; Data Services (Data Mining and Analysis, Target Selection); Secondary Data Providers; Results; V.O.]
4
The Virtual Observatory
Data Gathering (e.g., from sensor networks, telescopes…)
Data Farming:
– Storage/Archiving
– Indexing, Searchability
– Data Fusion, Interoperability
Data Mining (or Knowledge Discovery in Databases):
– Pattern or correlation search
– Clustering analysis, automated classification
– Outlier / anomaly searches
– Hyperdimensional visualization
Data understanding:
– Computer aided understanding
– KDD
– Etc.
New Knowledge
[Side labels: Database technologies; Key mathematical issues; Ongoing research]
Users: >>1000
Total data: ca. 1 PByte
5
Data Mining algorithms scale very badly:
– Clustering ~ $N \log N \to N^2$, ~ $D^2$
– Correlations ~ $N \log N \to N^2$, ~ $D^k$ ($k \ge 1$)
– Likelihood, Bayesian ~ $N^m$ ($m \ge 3$), ~ $D^k$ ($k \ge 1$)
The scientific exploitation of a multi-band, multi-epoch (K epochs) survey requires searching for patterns, trends, etc. among N points in a D×K dimensional parameter space: $N > 10^9$, $D \gg 100$, $K > 10$
Cf. isophotal, Petrosian, aperture magnitudes, concentration indexes, shape parameters, etc.
[Figure labels: Band 1, Band 2, Band 3, V.S.T.]
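To give a feel for these scalings, the following minimal sketch compares rough operation counts for a small and a large catalogue. It assumes all constant factors are 1 and reads the quoted ranges as running from N log N (best case) to N^2 (worst case). The function name `op_counts` and the sample sizes are illustrative and not taken from the slides.

```python
import math

def op_counts(n, d, k=1, m=3):
    """Rough operation counts, constants dropped (illustrative assumption)."""
    return {
        "clustering_best": n * math.log2(n) * d**2,
        "clustering_worst": n**2 * d**2,
        "correlations_best": n * math.log2(n) * d**k,
        "correlations_worst": n**2 * d**k,
        "likelihood": n**m * d**k,
    }

# Hypothetical small catalogue versus a survey-sized one.
small = op_counts(10**6, 10)
large = op_counts(10**9, 100)
for name in small:
    growth = large[name] / small[name]
    print(f"{name:20s} cost grows by a factor of {growth:.1e}")
```

Even with the most favourable exponents, going from a million to a billion objects multiplies the cost by several orders of magnitude, which is the motivation for running these tools on distributed infrastructure.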
6
Tools in the VONeural
Middleware
– Astrogrid Model (Nocella)
– Interface between Virtual Observatory and GRID computing (GRID-launcher; Deniskina, D’Angelo)
Models
– Multi Layer Perceptron (VONeural_MLP; Donalek, Cavuoti, Skordovski)
– Support Vector Machines (VONeural_SVM; Cavuoti, Russo)
– Probabilistic Principal Surfaces (VONeural_PPS; Garofalo)
Tools
– Segmentation of astronomical images (VONeural_Ext; Laurino)
7
Scientific Applications
Data mining in multiparametric spaces (supervised and unsupervised)
– Photometric redshifts (MLP, SVM)
– Search for candidate quasars and AGN (PPS, NEC)
– Galaxy groups and clusters
CMB simulations of cosmic string signatures (in collaboration with Moscow University)
Extraction of catalogues from astronomical images (INAF + Caltech)
VST pipeline for distant clusters (INAF + Caltech)
8
Application 1 – VONeural_MLP photometric redshifts
Photometric redshifts (phot-z) are an alternative way to derive redshifts (i.e. distances) of extragalactic objects: less accurate than spectroscopic redshifts, but much more convenient in terms of computing power and observing time.
9
Phot Z for SDSS General Galaxy sample (SDSS-DR4/5 – GG)
Training / validation / test set: 60%, 20%, 20%
[Diagram labels: MLP, 1(5), 1(18); 0.01 < z < 0.25 and 0.25 < z < 0.50; 99.6% accuracy; MLP, 1(5), 1(23); MLP, 1(5), 1(24); rob = 0.206; rob = 0.234; interpolation of systematic errors]
– at least 30 experiments (10-12 h each)
– training on 350,000 objects
– 12 features
– results for 32,000,000 objects
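A 60% / 20% / 20% training, validation and test split like the one quoted above can be sketched as follows. This is a minimal illustration assuming a simple random shuffle. The slide does not say how the split was actually drawn, and the function name and seed are hypothetical.

```python
import random

def split_60_20_20(items, seed=0):
    """Shuffle and split into training (60%), validation (20%), test (20%)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    # Integer arithmetic avoids floating-point rounding in the set sizes.
    n_train = len(items) * 60 // 100
    n_valid = len(items) * 20 // 100
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]
    return train, valid, test

# Hypothetical object identifiers standing in for catalogue rows.
train, valid, test = split_60_20_20(range(1000))
print(len(train), len(valid), len(test))
```

The validation set is typically used to decide when to stop training or which network to keep, while the test set is held back for the final accuracy estimate.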
10
Photometric redshifts for 30 million SDSS galaxies
$\sigma_z = 0.02$
11
Two types of compact groups
Spatial clustering in phot_z space shows two types of groups:
– Compact and isolated
– Loose and not embedded in larger structures
95% of SKG have a large fraction of E-type galaxies: $f_{150}(E) \ge 0.5$.
12
Looking for AGN candidates
– Different orientations
– Different parameters become significant
– Different clusters in parameter space
BUT, STILL THE SAME OBJECT!
13
3-D PCA PPS Dimensionality reduction (classification of correlated non linear data)
14
Negative entropy clustering
15
NEC: a matter of Gaussians
Clustering method based on the “neg-entropy” NegE, a measure of the non-Gaussianity of a variable. If A is Gaussian, then NegE(A) = 0.
Given a threshold d: if NegE(A ∪ B) < d, then clusters A and B are replaced by the cluster A ∪ B.
[Figure panels: “Not replaced!” and “Replaced!”]
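The merge rule above can be illustrated with a minimal one-dimensional sketch. The slides do not say how NegE is estimated, so this example swaps in a standard moment-based approximation, skewness²/12 + excess kurtosis²/48 on the standardized sample, which is close to zero for Gaussian data. The function names, the threshold d = 0.01 and the synthetic samples are all assumptions made for illustration.

```python
import random

def negentropy_approx(sample):
    """Moment-based negentropy approximation (an assumption, not the NEC code)."""
    n = len(sample)
    mean = sum(sample) / n
    std = (sum((x - mean) ** 2 for x in sample) / n) ** 0.5
    z = [(x - mean) / std for x in sample]
    skew = sum(v ** 3 for v in z) / n
    excess_kurt = sum(v ** 4 for v in z) / n - 3.0
    return skew ** 2 / 12.0 + excess_kurt ** 2 / 48.0

def should_merge(a, b, d):
    """Replace clusters A and B by their union if the union looks Gaussian."""
    return negentropy_approx(a + b) < d

rng = random.Random(42)
a = [rng.gauss(0.0, 1.0) for _ in range(2000)]
b_same = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # same Gaussian as a
b_far = [rng.gauss(10.0, 1.0) for _ in range(2000)]  # well-separated Gaussian

print("merge a with b_same:", should_merge(a, b_same, d=0.01))
print("merge a with b_far: ", should_merge(a, b_far, d=0.01))
```

Two samples drawn from the same Gaussian give a union that is still Gaussian, so they are merged. Two well-separated samples give a bimodal union with a clearly non-zero value, so they are kept apart.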
16
[Workflow diagram labels: SDSS, UKIDSS, preprocessing, clustering (PPS, NEC), dendrogram, cluster optimization, labeling, BoK, results]
1 experiment: ca. 11 days
17
[Figure label: SpecClass 0 | 1 | 2 | 3 | 4 | 5 | 6]
PPS: we select clusters by associating latent variables on the sphere with sources.
NEC: the number of clusters after the aggregation is determined by “cluster optimization”. This leads to a proper binning of parameter space.
18
Application 2 with SVM
Best result: 81.5%
PON-SCOPE GRID Infrastructure (110 nodes PON NA-CA-CT)
[Plot axes: log2(gamma), log2(C)]
19
SDSS spectroscopic subsample of confirmed QSOs (specclass = 4 & 6)
UKIDSS HO-QSO’s
Colours used for all these experiments were calculated using adjacent bands: u−g, g−r, r−i, i−z for the optical bands, and Y−J, J−H, H−K for the near-infrared ones.
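Computing colours from adjacent bands amounts to differencing neighbouring magnitudes within each band list. The sketch below assumes the magnitudes of one source are available as a dictionary keyed by band name. The function name and the sample magnitudes are hypothetical.

```python
OPTICAL_BANDS = ["u", "g", "r", "i", "z"]
NEAR_IR_BANDS = ["Y", "J", "H", "K"]

def adjacent_colours(mags, bands):
    """Return the colour (magnitude difference) for each adjacent band pair."""
    return {
        f"{b1}-{b2}": mags[b1] - mags[b2]
        for b1, b2 in zip(bands, bands[1:])
    }

# Made-up magnitudes for a single source, for illustration only.
source = {"u": 20.0, "g": 19.0, "r": 18.5, "i": 18.25, "z": 18.0,
          "Y": 17.5, "J": 17.0, "H": 16.5, "K": 16.0}

features = {}
features.update(adjacent_colours(source, OPTICAL_BANDS))
features.update(adjacent_colours(source, NEAR_IR_BANDS))
print(features)
```

Optical and near-infrared bands are differenced separately, so no colour mixes the last optical band with the first infrared one, matching the seven colours listed above.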
20
Application 2 with MLP
The experiments were carried out selecting only the objects in the catalogue of G. Sorrentino et al. (2006) (z between 0.05 and 0.095) that were labelled as Type 1 or Type 2. Only those that are certainly AGN were selected. The dataset consisted of 1570 objects: Type 1 objects were labelled 1 and Type 2 objects 0.
The best result obtained was:
– Total efficiency: e = 99.4%
– Type 1 efficiency: e_type1 = 98.4%
– Type 2 efficiency: e_type2 = 100%
– Type 1 completeness: c_type1 = 100%
– Type 2 completeness: c_type2 = 98.9%
Confusion matrix:
             1 (net)   0 (net)
1 (known)      126         0
0 (known)        2       186
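The quoted percentages can be checked against the confusion matrix with a few lines of arithmetic. This sketch assumes the flattened counts read as 126 and 0 for known Type 1, and 2 and 186 for known Type 0. It also assumes that efficiency is the fraction of objects assigned to a class that truly belong to it, and completeness the fraction of true members of a class that are recovered. Neither definition is spelled out on the slide, but together they reproduce all five quoted values.

```python
def efficiency_and_completeness(n11, n10, n01, n00):
    """nXY = number of objects of known class X that the network assigns to Y."""
    total = n11 + n10 + n01 + n00
    return {
        "e_total": (n11 + n00) / total,   # overall fraction classified correctly
        "e_type1": n11 / (n11 + n01),     # purity of the network's class 1
        "e_type2": n00 / (n00 + n10),     # purity of the network's class 0
        "c_type1": n11 / (n11 + n10),     # known Type 1 objects recovered
        "c_type2": n00 / (n00 + n01),     # known Type 2 objects recovered
    }

metrics = efficiency_and_completeness(n11=126, n10=0, n01=2, n00=186)
for name, value in metrics.items():
    print(f"{name} = {100 * value:.1f}%")
```

The four counts add up to 314 objects, about one fifth of the 1570-object dataset, so the matrix presumably refers to a held-out subset rather than the full sample.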
21
SCoPE Workshop - Status of the project and of the Work Packages
Sala Azzurra - Monte Sant’Angelo university complex
21-2-2008
THE END