Data-Intensive Research: Actions to make better use of Data for Research. Malcolm Atkinson & David De Roure, 12 January 2010. Report to e-Science Leeds from a fact-finding mission
Mission goal: learn how researchers use data. Acknowledgements: the UK e-Science Directors, the CIR authors, our teams, the EPSRC, the RCUK USA office and all of our hosts in the USA for their good ideas; all the opinions, observations and recommendations are our own.
Draft Report, partially edited
Outline: Cornucopia of data; Yet to learn how to use it well; Hot topic; Research to politics; Concepts; Datascopes, Intellectual Ramps, Going the last mile; Co-*; Alignment; Digital ecosystem; Principles; Recommendations; Actions; Survival in the Digital Revolution
Data-Intensive Research Events: Bermuda agreement 1996, 97 & 98; SDSS Archive DB 1999; Human Genome 2001; DI Comp. Environms 2001; Fort Lauderdale 2003; Hey & Trefethen D.Del. 2003; Digital Curation Cen. 2004; NSF DataNet call 2007; XLDB series starts 2007; SciDB starts 2008; Yahoo DI workshop 2008; Harnessing data 2009; Beyond data del. 2009; Govs use Linked D. 2009; NSF CISE DI call; 4th Paradigm book 2009; JISC Research DM 2009; e-IRG DMTF report 2009; DIEW Japan 2010
Sir Tim Berners-Lee: How will Linked Data benefit researchers?
Datascopes for the naked mind: To reveal evidence in data you could never see before; Data to Information to Knowledge to Wisdom; Changed our place in the universe (image credit: NRAO/AUI/NSF)
Datascopes Summary: Better methods for extracting information from data (better algorithms for discovery, selection, fusion, distillation, aggregation and presentation; algorithms transformed to run incrementally); Better strategies for using the algorithms; Better data/metadata and semantics; Better platforms supporting the strategies; Data centres hosting data and computation; Coping with more complexity, more users & more questions; Knowledge, questions & datascopes co-evolve; Rally cross-disciplinary effort
Intellectual ramps: Easy and low risk to start; Progress to advanced skills; For research data users; No obligation; Go as far as you want; Find a service & relax
Dropbox as a Ramp (ramp 1: exploiting familiar tools): Local folder synchronised and shared via cloud; Condor job submitted by drag and drop (Ian Cottam); Results appear in Dropbox. Slide from David De Roure
Intuitive interfaces (ramp 2: hiding complexity; minimal input): e-Science Research; Engineering economic ramps; Replace Portal Building Cottage Industry. Slide from Jano van Hemert
Ramps Summary: An easy path to use a data analysis method; An opportunity, not an obligation; Engage as far as you want; Use a service for routine tasks; Types of ramp: in browser (now powerful, can reach the GPU), in familiar tools, support from centres and crowd-sourced; Strongly linked with education: removes distracting technical clutter, rescues educators & students; Ramp & education co-evolve; Boost investment here
Technology & Researchers: Co-evolution; Tech. display; Researchers choose? Niches? Fastest at adaptation wins
Actions: 1. Workshops on DIR; 2. DIR education; 3. Ideas factory project launch; 4. Test best practice; 5. Immediate research challenges; 6. DIR facilities pool; 7. Boost reference data services; 8. Foundational research; 9. Green DIR; 10. Coordination
Phase 1: 1. Workshops on DIR; 2. DIR education; 3. Ideas factory project launch; 4. Test best practice; 5. Immediate research challenges. In Edinburgh, March
Phase 1: 1. Workshops on DIR; 2. DIR education; 3. Ideas factory project launch; 4. Test best practice; 5. Immediate research challenges. What will your organisation do?
Phase 1: 1. Workshops on DIR; 2. DIR education; 3. Ideas factory project launch; 4. Test best practice; 5. Immediate research challenges. Immersive + mix => launch projects
Actions Phase 1: 1. Workshops on DIR; 2. DIR education; 3. Ideas factory project launch; 4. Test best practice; 5. Immediate research challenges. Import, Experiment, Engage & Choose existing D-I methods and technology for pressing existing research
Actions Phase 1: 1. Workshops on DIR; 2. DIR education; 3. Ideas factory project launch; 4. Test best practice; 5. Immediate research challenges. Which ones? Cross-cutting challenges, methods, facilities and capabilities
Actions phase 1: Actions A1 to A5 will build capacity for larger and more demanding projects, help researchers build teams with effective mixes of skills and experience, and provide performance information for the selection of strategies, technologies and teams for larger commitments to follow.
Phase 2: 6. DIR facilities pool; 7. Boost reference data services; 8. Foundational research; 9. Green DIR; 10. Coordination. S/W, H/W & support: what services do researchers need? How much? How soon? More software; Less hardware; More bandwidth; Fewer FLOPS
Phase 2: 6. DIR facilities pool; 7. Boost reference data services; 8. Foundational research; 9. Green DIR; 10. Coordination. What can we do to help UK reference-data services? What international agreements are needed?
Phase 2: 6. DIR facilities pool; 7. Boost reference data services; 8. Foundational research; 9. Green DIR; 10. Coordination. Computing science + Mathematical & Information sciences + Field experience + long-term commitment to building foundations of DIR. UKCRC should espouse this cause
Phase 2: 6. DIR facilities pool; 7. Boost reference data services; 8. Foundational research; 9. Green DIR; 10. Coordination. What should the UK or your organisation do? Where should it do it?
Phase 2: 6. DIR facilities pool; 7. Boost reference data services; 8. Foundational research; 9. Green DIR; 10. Coordination. Framework for interdisciplinarity & pooled effort/resources + UK's international presence? This should be set up immediately
Applied science + last mile: "We seek solutions. We don't seek - dare I say this? - just scientific papers anymore." Quoting US Secretary of Energy & Nobel laureate Steven Chu
Survival in the digital-data revolution depends on speed and appropriateness of adaptation
Summary: Much research is data intensive; More of it will be; Exploiting the opportunity is urgent for the UK (you/your org.); This requires changes: in facility provision, in research investment, in research behaviour (incentives), in education; These changes are part of the digital revolution; Understand, engage and ride the wave; Investing in data-intensive research will accelerate research, deliver more applicable research, provide a better return on investment
And Next... Messages from the panel: To EPSRC Research Facilities SAT; To ESRC & BBSRC &...; To Carole's e-Infrastructure group; To e-IRG & ESFRI. Develop ideas & plan campaign: In your university, institute, research council; Feed into Spending Reviews. Meeting at the e-Science Institute, 15 to 19 March
ADMIRE – Framework 7 ICT ? Picture composition by Luke Humphry, based on prior art by Frans Hals