
1 The Virtual Observatory and other data issues
computers and astronomy today
background technology
the future : opportunities and problems
the VO vision
VO progress
next steps
OECD workshop, München. Andy Lawrence, Dec 2003

2 computers and astronomy today (1)

3 key IT areas
(1) facility operations
(2) facility output processing
(3) shared supercomputers for theory
(4) science archives
(5) end-user tools
(1-3) : big bucks
(4-5) : smaller bucks, but set requirements for (1-2)

4 astronomical archives
major archives growing at TB/yr
issue is not storage but management (curation)
improving the quality of data access and presentation needs specialist data centres

5 end users
increasing fraction of archive re-use
increasing multi-archive use
most download small files and analyse at home
some users process whole databases
reduction standardised; analysis home-grown

6 needles in a haystack
Hambly et al 2001
- faint moving object is a cool white dwarf
- may be a solution to the dark matter problem
- but hard to find : one in a million
- even harder across multiple archives

7 solar-terrestrial links
coronal mass ejection imaged by a space-based solar observatory
effect detected hours later by satellites and ground radar

8 background technology (2)

9 dogs and fleas
there is a very large dog

10 hardware trends
ops, storage, bw : all 1000x/decade
 - can get 1 TB IDE = $5K
 - backbones and LANs are Gbps
but device bw only 10x/decade
 - real PC disks 10 MB/s; fibre channel SCSI possibly 100 MB/s
and the last mile problem remains
 - end-to-end b/w typically 10 Mbps

11 operations on a TB database
searching at 10 MB/s takes a day (arithmetic below)
 - solved by parallelism
 - but parallel code is hard ==> needs expert people
transfer at 10 Mbps takes a week
 - leave it where it is ==> data centres provide a search service
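These timescales are just arithmetic on the rates quoted above, for a 1 TB database:

\[
t_{\rm search} = \frac{10^{12}\ \mathrm{bytes}}{10^{7}\ \mathrm{bytes/s}} = 10^{5}\ \mathrm{s} \approx 1.2\ \mathrm{days},
\qquad
t_{\rm transfer} = \frac{8 \times 10^{12}\ \mathrm{bits}}{10^{7}\ \mathrm{bits/s}} = 8 \times 10^{5}\ \mathrm{s} \approx 9\ \mathrm{days}.
\]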

12 network development
higher level protocols ==> transparency
  TCP/IP      message exchange
  HTTP        doc sharing (the web)
  grid suite  CPU sharing
  XML/SOAP    data exchange
==> service paradigm

13 on the internet horizon
workflow definition
dynamic semantics (ontology)
software agents

14 the future : opportunities and problems (3)

15 data rich future
heritage :
 - Schmidt, IRAS, Hipparcos
current hits :
 - VLT, SDSS, 2MASS, HST, Chandra, XMM, WMAP
coming up :
 - UKIDSS, VISTA, ALMA, JWST, Planck, Herschel
cross fingers :
 - LSST, ELT, LISA, Darwin, SKA, XEUS, etc.
plus lots more
danger is being lost in the trees

16 data growth
astronomical data is growing fast, but so is computing power, so what's the problem ?
data volume doubling time T2 < 18 months over 1990-2000, i.e. data grew faster than Moore's law
two problem areas : (1) fast facilities (2) end-user delivery

17 data rates : collection and processing
reference : whole sky at 0.1", 1 colour, 16 bits = 100 TB (worked out below)
worst problems are for FAST experiments :
 - SKA peak PB/sec out of the correlator
 - ELT MCAO real-time control : Pflops
 - repeat wide-field imaging, e.g. LSST
 - N**2 processing, e.g. GAIA
achievable but challenging :
 - supercomputer today = 10 Tflops; Pflops ok in 20 years
 - may need dedicated hardwired logic
 - real problem is s/w development and ops logistics
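A quick check of the 100 TB reference figure: the whole sky is about 41,253 square degrees, and at 0.1 arcsec sampling each square degree holds (3600/0.1)^2 pixels, so at 16 bits (2 bytes) per pixel

\[
41253\ \mathrm{deg^2} \times \left(\frac{3600''}{0.1''}\right)^{2} \frac{\mathrm{pixels}}{\mathrm{deg^2}} \times 2\ \mathrm{bytes}
\approx 1.1 \times 10^{14}\ \mathrm{bytes} \approx 100\ \mathrm{TB}.
\]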

18 data rates : archive
VISTA 100 TB/yr by 2007
SKA datacubes 100 PB/yr by 2020
not a technical or financial problem
 - LHC doing 100 PB/yr by 2007
issue is logistic : data management
need professional data centres

19 data rates : user delivery
disk I/O and bandwidth :
 - end-user bottlenecks will get WORSE
 - but links between data centres can be good
move from download to service paradigm (sketched below) :
 - leave the data where it is
 - operations on data (search, cluster analysis, etc) as services
 - shift the results, not the data
 - networks of collaborating data centres (datagrid or VO)
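A minimal sketch of the shift, assuming a hypothetical data-centre service that accepts an SQL-like query over HTTP and returns only the matching rows. The endpoint URL, parameter name, and query dialect here are illustrative placeholders, not any real archive's API:

    import urllib.request
    import urllib.parse

    # Download paradigm: fetch the whole catalogue (TBs), filter at home.
    # At ~10 Mbps end-to-end that is roughly a week per TB.

    # Service paradigm: ship the query to the data centre, get back only results.
    # NOTE: ENDPOINT and the query syntax are hypothetical placeholders.
    ENDPOINT = "http://datacentre.example.org/query"
    query = "SELECT ra, dec, mag FROM survey WHERE mag < 20 AND pm > 0.5"

    url = ENDPOINT + "?" + urllib.parse.urlencode({"sql": query})
    with urllib.request.urlopen(url) as response:
        results = response.read()   # kilobytes of matched rows,
    print(results[:200])            # not terabytes of raw catalogue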

20 user demands
the bar is constantly rising :
 - online ease
 - multi-archive transparency
 - easy data-intensive science
new requirements :
 - automated resource discovery (intelligent Google)
 - cheap I/O and CPU cycles
 - new standards and software infrastructure

21 the VO vision (4)

22 the VO concept
web : all docs in the world inside your PC
VO : all databases in the world inside your PC

23 generic science drivers
data growth
multi-archive science
large database science
can do all this now, but it needs to be fast and easy
empowerment

24 what it's not
not a monolith
not a warehouse

25 VO framework
framework + standards
inter-operable data
inter-operable software modules
no central VO-command

26 VO geometry
not a warehouse
not a hierarchy
not a peer-to-peer system
a small set of service centres and a large population of end users
 - note : the latest hot database lives with its creators / curators

27 yesterday
browser --CGI request--> front end --SQL--> DB engine
browser <--html web page-- front end <--data-- DB engine

28 today
application --SOAP/XML request--> web service --SQL--> DB engine
application <--SOAP/XML data-- web service <--native data-- DB engine
application can be anything; data comes back in standard formats (example below)
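As a concrete illustration of this pattern, a sketch of calling a cone-search-style service over HTTP and parsing the XML table that comes back. The base URL is a placeholder; the RA/DEC/SR parameter names follow the VO cone search convention, and real services each publish their own endpoint:

    import urllib.request
    import urllib.parse
    import xml.etree.ElementTree as ET

    # Hypothetical cone-search endpoint (placeholder URL).
    SERVICE = "http://archive.example.org/conesearch"

    # Ask for sources within 0.05 deg of (RA, Dec) = (182.5, -0.9).
    params = urllib.parse.urlencode({"RA": 182.5, "DEC": -0.9, "SR": 0.05})

    with urllib.request.urlopen(SERVICE + "?" + params) as response:
        votable = ET.parse(response)   # response is an XML (VOTable) document

    # Walk the parsed tree and print each table row.
    for row in votable.iter("TR"):
        print([cell.text for cell in row.iter("TD")])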

29 tomorrow
application (anything) sends a job to a network of collaborating web services and gets back results (sketch below)
infrastructure services : Registry, Workflow, GLUE, Certification, VO Space
services publish WSDL descriptions and share standard semantics
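A hedged sketch of what such a chained workflow might look like from the application side. Every URL and capability name here is invented, standing in for the registry lookup, job submission, and result collection steps the diagram implies; a real implementation would speak SOAP/XML and discover endpoints dynamically:

    import urllib.request
    import urllib.parse

    def call(url, **params):
        """Minimal helper: GET a service URL, return its raw reply."""
        query = urllib.parse.urlencode(params)
        with urllib.request.urlopen(url + "?" + query) as resp:
            return resp.read()

    # 1. Registry : find services advertising a given capability.
    service_list = call("http://registry.example.org/find",
                        capability="crossmatch")

    # 2. Workflow : submit the job to a chosen remote service
    #    rather than running it locally.
    job_ticket = call("http://xmatch.example.org/submit",
                      survey_a="UKIDSS", survey_b="SDSS")

    # 3. VO Space : fetch the results from shared virtual storage when done.
    results = call("http://vospace.example.org/get",
                   ticket=job_ticket.decode())
    print(results[:200])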

30 VO progress (5)

31 parts of the big picture
standards : IVOA
glue writing : main VO projects
implementations : national VO projects
data creation : facilities
data publication : data centres
tools and data mining services : lots of folk

32 International VO Alliance (IVOA)

33 IVOA standards
formal process modelled on W3C
technical working groups and interop workshops
agreed functionality roadmap
key standards so far :
 - VOTable (illustrated below)
 - resource and service metadata definitions
 - provisional semantic dictionary
 - provisional protocols for image and spectra
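To make VOTable concrete, a minimal sketch: a small two-column table in VOTable form, parsed with the same standard-library tooling as the cone-search example above. The values are invented for illustration; real VOTables also carry units, UCDs, and resource-level metadata:

    import xml.etree.ElementTree as ET

    # A minimal, illustrative VOTable document: FIELD elements declare
    # the columns, TABLEDATA carries the rows.
    votable_xml = """
    <VOTABLE version="1.0">
      <RESOURCE>
        <TABLE name="results">
          <FIELD name="ra"  datatype="double"/>
          <FIELD name="dec" datatype="double"/>
          <DATA>
            <TABLEDATA>
              <TR><TD>182.501</TD><TD>-0.899</TD></TR>
              <TR><TD>182.513</TD><TD>-0.905</TD></TR>
            </TABLEDATA>
          </DATA>
        </TABLE>
      </RESOURCE>
    </VOTABLE>
    """

    root = ET.fromstring(votable_xml)
    columns = [f.get("name") for f in root.iter("FIELD")]
    rows = [[td.text for td in tr.iter("TD")] for tr in root.iter("TR")]
    print(columns, rows)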

34 glue accomplishments
two example registries (NVO and AstroGrid)
conesearch service (NVO)
remote analysis (AVO)
virtual space management (AstroGrid)
next few months :
 - workflow, single sign-on, module interface standards

35 implementations
several controlled demos of working s/w
slowly increasing functionality
can do real science now, but still clunky
beta testers using it now; real users by end 2004 ?

36 AVO demo Jan 2003

37 tools and services
smaller projects building compliant tools
 - e.g. VOPlot from India
little work so far on ambitious analysis services

38 next steps (6)

39 next steps
intelligent glue :
 - ontology, agents
analysis services :
 - cluster analysis, multi-D visualisation, etc
theory services :
 - simulated data, models on demand
embedding facilities :
 - VO-ready facilities
 - links to data creation

40 funding
VO projects :
 - funded off the back of e-science/grid so far
 - should be absorbed into the mainstream : agency backing
processing and data centres :
 - big bucks... but don't skimp !
 - finishing the science needs this work
network infrastructure :
 - fat pipes between service centres are crucial
 - last mile investment will give a community an edge

41 castle on the hill or creative anarchy ?
no VO command centre, but it does need co-ordination :
 - standards body
 - exchange of best practice
 - continuing waves of technology development

42 Euro-VO
2005+ plan : three bodies
 - VO Facility Centre (VOFC)
 - Data Centre Alliance (DCA)
 - VO Technology Centre (VOTC)
aim is part national, part EU funding

43 lessons
drivers : bottlenecks, user demand, empowerment
need a network of healthy data centres
need last mile investment
need facilities to be VO-ready
need continuing technology development
need a continuing co-ordinated programme of standards
need national and agency backing

44 FIN

45 publishing metaphor
facilities are authors
data centres are publishers
VO portals are shops
end-users are readers
VO infrastructure is the distribution system

46 collectivisation and democratisation
thirty-year trend towards communal organisation :
 - facility-class (common-user) instruments
 - facility-class data reduction s/w
 - calibrated archives with simple tools
 - information services (Vizier, ADS, NED)
 - large consortium projects (MACHO, 2dF, SDSS, UKIDSS, VISTA...)
next steps :
 - inter-operable archives (joint queries)
 - automated resource discovery (registry)
 - facility-class exploration and analysis tools (data mining)

47 how much information is there ?
soon everything can be recorded and indexed
most data will never be seen by humans
precious resource : human attention
auto-summarization and auto-search are key technologies
www.lesk.com/mlesk/ksg97/ksg.html
[figure : scale ladder from Kilo up through Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, with examples at each rung : a photo, a book, a movie, all LoC books (words), all books multimedia, everything recorded]

48 multi-wavelength views of a supernova remnant
shocks seen in the X-ray
heavy elements seen in the optical
dust seen in the IR
relativistic electrons seen in the radio

