1 Hrvatska, June 3-rd, 2003 COST Action n. 283 - progress report, June 2003. Computational and Information Infrastructures in the Astronomical Data GRID. Giuseppe Longo, Chair of Astrophysics, Department of Physical Sciences, University of Napoli Federico II, Italy & INFN (Italian National Institute for Nuclear Physics). Chair: Prof. F. Murtagh – Queen's University Belfast. [Image: Hubble Deep Field]

2 Hrvatska, June 3-rd, 2003 Methodological background, i.e.: is history teaching us something (or isn't it?)… [Figure: Role of technological breakthroughs. Number of discoveries: all discoveries, before 1954, after 1954.]

3 Hrvatska, June 3-rd, 2003 Where is (now) the next breakthrough in Astronomy? Either new channels (better: new information carriers): electromagnetic waves (optical since 1609, others since the 60s); solid samples (70s ->); gravitational waves (2005 ->); neutrinos (early 80s ->). Or leaps in any of: sensitivity, spectral range, spectral resolution, angular resolution, time resolution.

4 Hrvatska, June 3-rd, 2003 The iAstro people believe that: [Diagram labels: Discoveries; Massive data sets; Distributed computing; Massive data mining]. Hardware breakthrough: wide-field imaging with CCD mosaics enables digital surveys. The sky covers 40,000 sq. deg. With 0.6 arcsec sampling: 2 x 10^12 pixels, 8 TB per band (10/100 TB/survey). Ca. 10 PB keeping temporal resolution (ca. h for 1 yr … need for 20 yr).
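The data-volume figures on this slide can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function names are hypothetical, and the 4 bytes per pixel is an assumption chosen because it reproduces the slide's 8 TB per band from its rounded 2 x 10^12 pixels. The unrounded pixel count for 40,000 sq. deg. at 0.6 arcsec comes out somewhat lower than the rounded figure.

```python
ARCSEC_PER_DEG = 3600.0

def survey_pixels(area_sq_deg, sampling_arcsec):
    """Number of pixels needed to tile an area at a given pixel scale."""
    pixels_per_deg = ARCSEC_PER_DEG / sampling_arcsec
    return area_sq_deg * pixels_per_deg ** 2

def volume_tb(n_pixels, bytes_per_pixel=4):
    """Raw volume in terabytes (1 TB = 1e12 bytes).

    bytes_per_pixel=4 is an assumption, not a figure stated on the slide.
    """
    return n_pixels * bytes_per_pixel / 1e12

n_pix = survey_pixels(40_000, 0.6)   # values quoted on the slide
print(f"pixels per band:         {n_pix:.2e}")
print(f"TB per band (unrounded): {volume_tb(n_pix):.2f}")
print(f"TB per band (2e12 pix):  {volume_tb(2e12):.2f}")
```

Multiplying the per-band volume by the number of bands and epochs gives the order of magnitude of a full survey, which is how the slide moves from terabytes to petabytes once temporal resolution is kept.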

5 Hrvatska, June 3-rd, 2003 From Traditional to Survey Science. Traditional: Telescope -> Data -> Analysis -> Results. Survey-based: Survey Telescope -> Archive -> Data Mining -> Target Selection -> Follow-Up Telescope -> Results (another survey/archive?). Highly successful and increasingly prominent, but inherently limited by the information content of individual surveys … What comes next, beyond survey science, is distributed (V.O.) science. Courtesy of G. Djorgovski.

6 Hrvatska, June 3-rd, 2003 A schematic illustration of the new astronomy. [Diagram labels: Primary Data Providers; Surveys, Observatories, Missions; Survey and Mission Archives; Secondary Data Providers; Follow-Up Telescopes and Missions; VO; Data Services; Data Mining and Analysis, Target Selection; Digital libraries; Results.] Courtesy of G. Djorgovski.

7 Hrvatska, June 3-rd, 2003 Panchromatic view of the Universe: search for the unknown. [Image panels: Radio; Far-Infrared; Visible; Visible + X-ray; Dust Map; Density Map.] Offers: different physics, global understanding, comparison with theory, new discoveries. New domains of the parameter space: cf. time. Faint, fast transients (Tyson et al.).

8 Hrvatska, June 3-rd, 2003 RA, Dec, wavelength, time, flux, proper motion, polarization, morphology / surface brightness, non-EM, … High dimensionality (N >> 100). What is the coverage? Where are the gaps? Catalogue space (features; TB) calls for: feature selection, clustering, statistics, KDD, visualization, etc. Pixel space (raw data; TB/PB), with its huge data flow, data fusion and need for recalibrations, calls for: automatic catalogue extraction, spurious feature removal, image parametrization and classification, data compression, multiscale analysis, etc.

9 Hrvatska, June 3-rd, 2003 $T_2$ (Moore) ~ 1.5 years. Sounds beautiful! … BUT: terascale (petascale?) computing and/or better algorithms are required.

10 Hrvatska, June 3-rd, 2003 Data complexity, multidimensionality, discoveries. In modern data sets: $D_D \gg 10$, $D_S \gg 3$. But the bad news is … the computational cost of clustering analysis:
K-means: $K \, N \, I \, D$
Expectation Maximisation: $K \, N \, I \, D^2$
Monte Carlo Cross-Validation: $M \, K_{max}^2 \, N \, I \, D^2$
where N = no. of data vectors, D = no. of data dimensions, K = no. of clusters chosen, $K_{max}$ = max no. of clusters tried, I = no. of iterations, M = no. of Monte Carlo trials/partitions.
Some dimensionality reduction methods do exist (e.g., PCA, class prototypes, hierarchical methods, etc.), but more work is needed. Digital sky surveys call for huge increases in computing power.
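To get a feeling for how quickly these costs grow, the scaling laws above can be evaluated directly. This is a rough sketch, not a benchmark: the function names are hypothetical, the expressions count abstract operations only, N and D follow the DPOSS catalogue size quoted later in the talk (ca. 5 x 10^6 objects, 50 features), and the values of K, I, K_max and M are arbitrary assumptions chosen for illustration.

```python
def kmeans_cost(N, D, K, I):
    """Operation count scaling for K-means: K * N * I * D."""
    return K * N * I * D

def em_cost(N, D, K, I):
    """Operation count scaling for Expectation Maximisation: K * N * I * D^2."""
    return K * N * I * D ** 2

def mccv_cost(N, D, K_max, I, M):
    """Operation count scaling for Monte Carlo Cross-Validation: M * K_max^2 * N * I * D^2."""
    return M * K_max ** 2 * N * I * D ** 2

# N and D as quoted for DPOSS; the remaining values are illustrative assumptions.
N, D = 5_000_000, 50
K, I, K_max, M = 10, 100, 20, 20

for name, cost in [
    ("K-means", kmeans_cost(N, D, K, I)),
    ("EM", em_cost(N, D, K, I)),
    ("MCCV", mccv_cost(N, D, K_max, I, M)),
]:
    print(f"{name:8s} ~ {cost:.1e} operations")
```

With these assumed values the three estimates span more than four orders of magnitude, which is the point the slide makes: the $D^2$ and $K_{max}^2$ factors dominate long before N does.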

11 Hrvatska, June 3-rd, 2003

12 Standard Activities (all meeting reports and proceedings on the web). First and Second MC meetings, Brussels, 11/23/2001 & 2/14-15/2002. Third MC meeting, Edinburgh, 07/21/2002 (at GGF-5, Global Grid Forum 5). Fourth MC meeting & workshop on multispectral data analysis and image metadata, Strasbourg, 11/28-29/2002. Fifth MC meeting & workshop on high/low resolution signal processing, Granada, 02/22-23/2003. Planned: Sixth MC meeting & workshop on Poisson noise models, Nice, Oct. 2003. Planned: Seventh MC meeting & workshop on data mining & image analysis in a distributed environment, Capri, Mar. 2004.

13 Hrvatska, June 3-rd, 2003 Granada, February 2002. Guess who was taking the picture…

14 Hrvatska, June 3-rd, 2003 Major orientation of iAstro in early 2003: FP6. Expressions of Interest filed in summer 2002. Participation in Commission Information Days. Involvement in several NoEs (sensor fusion, information retrieval, e-education and training, the European virtual observatory, and digital signal processing and data mining in medicine). Participation in evaluation panels.

15 Hrvatska, June 3-rd, 2003 COST 283 proposal for the Marie Curie RTN network GridFocus: Data and Information Fusion and Mining in the Context of the DataGrid. Submitted early April 2003. Participants: iAstro partners in BG, CH, D, E, F, GR, H, I, IRL and UK. Additional partner cluster in University of Paris Sud. Themes: multiband and multiple-layer image and signal processing as a basic paradigm for the data Grid; data mining of visual and other streams, including high-performance forensic image data mining; empirical and virtual data interfaces.

16 Hrvatska, June 3-rd, 2003 GridFocus concept based on data dynamics and information thermodynamics

17 Hrvatska, June 3-rd, 2003 SOMETHING ON SCIENCE….

18 Hrvatska, June 3-rd, 2003 [Flowchart labels: open / import (header compliant or non-compliant); header processing; preprocessing; parameter and training options. Supervised branch (labeled data): label preparation, training set preparation, feature selection via unsupervised clustering; MLP, RBF, etc. Unsupervised branch (unlabeled data): parameter options, feature selection via unsupervised clustering; GTM, SOM, fuzzy set, etc. Interpretation.] Code in C++, parallelized on Beowulf. Used (so far) for cosmology, particle physics (ARGO), gravitational waves (VIRGO).

19 Hrvatska, June 3-rd, 2003 A standard clustering example: unsupervised S/G classification. Input data: DPOSS catalogue (ca. 5 x 10^6 objects, 50 features each). SOM (output is a U-Matrix) ~ GTM (output is a PDF).
1. Input data (tables or strings)
2. Feature selection (backward elimination strategy)
3. Compression of input space and re-design of network
4. Classification
5. Labeling (e.g. 500 well classified objects)
6. … freeze & run on real data
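The train, label, freeze and classify part of this workflow can be sketched with a very small self-organizing map in NumPy. This is not the C++ tool described in the talk: the grid size, learning-rate and neighbourhood schedules, the rule that gives each node the label of the nearest labeled object, and the synthetic two-cluster "catalogue" are all assumptions made for illustration, and the function names are hypothetical. Feature selection and U-Matrix output are left out.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Grid index (row, col) of the node whose weight vector is closest to x."""
    d2 = np.sum((weights - x) ** 2, axis=-1)
    return np.unravel_index(np.argmin(d2), d2.shape)

def train_som(data, grid=(5, 5), epochs=5, lr0=0.5, sigma0=2.0, seed=0):
    """Unsupervised training; returns weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.normal(size=(rows, cols, data.shape[1]))
    # Grid coordinates of every node, used by the neighbourhood function.
    coords = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            bmu = np.array(best_matching_unit(weights, x))
            grid_d2 = np.sum((coords - bmu) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

def label_nodes(weights, labeled_x, labeled_y):
    """Assumed labeling rule: each node takes the label of its nearest labeled object."""
    flat = weights.reshape(-1, weights.shape[-1])
    d2 = np.sum((flat[:, None, :] - labeled_x[None, :, :]) ** 2, axis=-1)
    return labeled_y[np.argmin(d2, axis=1)].reshape(weights.shape[:2])

def classify(weights, node_labels, data):
    """Frozen map: every object receives the label of its best matching unit."""
    return np.array([node_labels[best_matching_unit(weights, x)] for x in data])

# Synthetic stand-in for a catalogue: two well-separated classes, 4 features each.
rng = np.random.default_rng(1)
stars = rng.normal(0.0, 0.5, size=(200, 4))
galaxies = rng.normal(5.0, 0.5, size=(200, 4))
data = np.vstack([stars, galaxies])
truth = np.array([0] * 200 + [1] * 200)

som = train_som(data)                # unsupervised training
idx = np.r_[0:10, 200:210]           # small "well classified" labeled subset
node_labels = label_nodes(som, data[idx], truth[idx])
pred = classify(som, node_labels, data)
accuracy = float(np.mean(pred == truth))
print(f"fraction of objects classified correctly: {accuracy:.3f}")
```

Only the labeling step uses class information, which mirrors the slide: the map itself is built without labels, and a few hundred well classified objects are enough to name its regions. Real catalogue features would normally be standardized before training.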

20 Hrvatska, June 3-rd, 2003 Star/Galaxy classification Automatic selection of significant features Unsupervised SOM (DPOSS data)

21 Hrvatska, June 3-rd, 2003 Labeling Localization of a set of 500 faint stars

22 Hrvatska, June 3-rd, 2003 G.T.M. unsupervised clustering; S/G. [Plot panels: stars p.d.f.; galaxies p.d.f.; cumulative p.d.f.]

23 Hrvatska, June 3-rd, 2003 G.T.M. unsupervised clustering; S/G – CDF field, 5 x 10^5 objects. [Plot panels: cumulative p.d.f.; stars p.d.f.; galaxies p.d.f.]

24 Hrvatska, June 3-rd, 2003 Photometric redshifts: a mixed case. Input data set: SDSS-EDR photometric data (galaxies). Training/validation/test set: SDSS-EDR spectroscopic subsample. [Flowchart labels: SDSS-EDR DB; SOM unsup.; set construction; SOM supervised; feature selection; MLP supervised; experiments; SOM unsup.; completeness; reliability map; best MLP model.]

25 Hrvatska, June 3-rd, 2003 Step 3 - experiments to find the optimal architecture. Varying: n. of inputs, n. of hidden units, n. of patterns in the training set, n. of training epochs, n. of Bayesian cycles and inner loops, etc. Convergence computed on validation set. Error derived from test set. Robust error: 0.02176.
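The protocol on this slide (fit on a training set, choose the architecture on a validation set, quote the error from an untouched test set) can be illustrated with a deliberately simple stand-in. The sketch below does not use an MLP or SDSS data: it swaps in polynomial regression on synthetic data, with the polynomial degree playing the role of the architecture parameter, and reports a plain RMS error rather than the slide's robust error, whose definition is not given. All names and numbers are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: a smooth relation plus noise.
x = rng.uniform(-1.0, 1.0, 600)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.05, 600)

# Three disjoint subsets: training, validation, test.
train, valid, test = np.split(rng.permutation(600), [360, 480])

def rms(model, idx):
    """RMS residual of a fitted polynomial on the subset given by idx."""
    return float(np.sqrt(np.mean((np.polyval(model, x[idx]) - y[idx]) ** 2)))

# "Architecture search": one model per candidate degree, fitted on the training set only.
candidates = range(1, 10)
models = {deg: np.polyfit(x[train], y[train], deg) for deg in candidates}

# Model selection uses the validation set ...
best_deg = min(candidates, key=lambda deg: rms(models[deg], valid))

# ... and the quoted error comes from the test set, which played no role so far.
test_error = rms(models[best_deg], test)
print(f"selected degree: {best_deg}, test RMS error: {test_error:.4f}")
```

Keeping the test set out of both fitting and selection is what makes the final number an honest estimate; an error read off the validation set would be biased low because the model was chosen to minimize it.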

26 Hrvatska, June 3-rd, 2003 iAstro strategy: Advance the state of the art through our workshops and visits. Many of these exchanges presented results in a Special Issue of Neural Networks (Ed. Tagliaferri and Longo, vol. 16, no. 3-4, 2003). Status: ongoing. Define our role vis-à-vis large Framework Programme projects on the virtual observatory, grid, computer vision, etc. through an iAstro White Paper. Status: done in early 2003. Spin off specific targeted actions where greater resources are needed. Status: GridFocus Marie Curie RTN network proposal written and submitted in early 2003; local initiatives. Next step: spin-out and commercial exploitation of our work through a STREP or IP proposal?

27 Hrvatska, June 3-rd, 2003 iAstro web pages: To join the iAstro Mailing List: send a message to: Where & how to know more about iAstro: Thanks to: E.U. & to… Prof. Fedi
