Kenji Takeda Microsoft Research


1 Community Capability Model for Data-Intensive Research: Project Overview
Kenji Takeda (Microsoft Research); Liz Lyon, Monica Duke, Alex Ball, Michael Day, Manjula Patel (UKOLN, University of Bath)

2 A Tidal Wave of Scientific Data

3 “Data sets are becoming the new instruments of science”
Dan Atkins, University of Michigan

4 Douglas Kell, University of Manchester
“One of the greatest challenges for 21st-century science is how we respond to this new era of data-intensive science. This is recognized as a new paradigm beyond experimental and theoretical research and computer simulations of natural phenomena—one that requires new tools, techniques, and ways of working.” — Douglas Kell, University of Manchester

5 All Scientific Data Online
Many disciplines overlap and use data from other sciences. The Internet can unify all literature and data. Go from literature to computation to data and back to literature. Information at your fingertips, for everyone, everywhere. Increase scientific information velocity. Huge increase in science productivity.
Literature; Derived and recombined data; Raw data
(From Jim Gray’s last talk)

6 Data-intensive Science
Acquisition & modelling; Collaboration and visualisation; Analysis & data mining; Dissemination & sharing; Archiving and preserving
Tagging to hide QA as part of publication and sign-off prior to dissemination

7

8 Medical Analogy: Symptoms, Diagnosis, Action
Idea developed with Dave De Roure

9 What are we trying to achieve?
Understand disciplinary and community diversity in data-driven research (consult); Unpack the “readiness” concept: identify and deconstruct “capability” factors (scope); Explore components and metrics for the capability factors / parameters (describe); Develop a Community Capability Model Framework (model, visualise); Produce domain mini case studies and business usage cases (validate)

10 Application, value, benefits
Research stakeholders: PIs, research groups, departments; Higher education institutions; Research funding agencies; Industry, business & innovation partners
Gap analysis and dependencies; Inform planning and assist decision-making; Validate funding allocations; Maximise funder investments; Accelerate knowledge transfer between domains and across sectors

11 Community Workshops
York:
Harvard:
Bristol: 7th International Digital Curation Conference, 5 Dec
Stockholm: 7th IEEE eScience Conference, 5 Dec
Australia: 10 February 2012, Monash University
Washington DC: 2012 (TBC)

12 Definitions
Capability: “power or ability to do something, capacity to be used or developed, a facility”
Maturity: “fully grown, fully-developed”
Behaviours: “mass adoption & shared usage, community consensus & trust, advanced development & exploitation, embedded skills”

13 Defining a Model
View as a Capability Spectrum? Norms? Extremes? Trends? Components? Taxonomy? Visualisations? Indicators, benchmarks, metrics?
Image credit: D-Kuru from Wikimedia Commons

14 Existing Models

15 Metrics and measures (describe)

16 Metrics and measures (describe)

17

18 Metrics and measures (describe)

19 Exemplar

20 Draft Model Framework

21 Community Capability Model Framework
Independent working / Collaborative working; Closed research / Open research; Standardisation; Information and communications technology; Legal and ethical issues; Economic and business issues
Data sharing in the cloud, with annotations to facilitate discovery and reuse; Sample and manipulate extremely large data collections in the cloud; Top data analytics algorithms, through Excel ribbon running on Azure; Invoke models, perform analytics and visualization; Machine learning over large data sets to discover correlations; Publish data collections and visualizations to the cloud to share insights
Academic issues; Skills and training

22 Parameters / factors (scope)

23 Parameters / factors (scope)

24 Parameters / factors (scope)

25 Parameters / factors (scope)
Discussion for Panel Session

26 Parameters / factors (scope)

27 Parameters / factors (scope)

28 Visualisation Synergy with other approaches

29 We need your help! Groupings, gaps and metrics?
How can the model help you?

30 Community Capability Model for Data-Intensive Research Panel Session
Alex Wade Microsoft Research

31 Guest Speakers: Graeme Earl, Dave De Roure

32 Developing a Model
If you were trying to improve the capability for, and amount of, data-intensive research in your group or institution… What would be the most useful things a model could provide?

33 Community Capability Model for Data-Intensive Research: Group Discussion Session

34

35 Medical Analogy: Symptoms, Diagnosis, Action
Idea developed with Dave De Roure

36 What are we trying to achieve?
Understand disciplinary and community diversity in data-driven research (consult); Unpack the “readiness” concept: identify and deconstruct “capability” factors (scope); Explore components and metrics for the capability factors / parameters (describe); Develop a Community Capability Model Framework (model, visualise); Produce domain mini case studies and business usage cases (validate)

37 Visualisation Synergy with other approaches

38 “People want more guidance at the start of their project – there is a black hole in this area.”
“People make do with what they can find – this may not be best practice”
“not very well signposted even in support departments”
“There are skills barriers but not particularly high – researchers are quite trainable.”
“not sure we give them enough technical support to store & organise their data”
“Researchers can hire cloud time but they don’t know what to do with it yet.”

39 Developing a Model
If you were trying to improve the capability for, and amount of, data-intensive research in your group or institution… What would be the most useful things a model could provide?

40 What would be the most useful things a model could provide?
Data-intensive research
Human: Rewards, Incentives, Legal/ethical issues, Collaborative working, Closed/open research, Skills & training
Economic: Startup funding, Operational costs, Transactional costs, Sustainability
Technical: ICT Platforms, Access, Exposure, Standards

41 What is holding research back?
Data-intensive research
Human: Rewards, Incentives, Legal/ethical issues, Collaborative working, Closed/open research, Skills & training
Economic: Startup funding, Operational costs, Transactional costs, Sustainability
Technical: ICT Platforms, Access, Exposure, Standards

42 What would be the most useful things a model could provide?
Rewards/incentives: Good for teaching (e.g. UCI); Define tenure – is data; Metrics of success; Collaborative research drives data sharing – in order to be invited/stay on teams; Modellers vs experimenters; Consumers vs producers; Peer review for data?
Skills and training: Look at what grad students and postdocs are doing?
Legal & ethical: Not sharing copyright material; Send analysis to the data; Medical data
Human: Rewards, Incentives, Legal/ethical issues, Collaborative working, Closed/open research, Skills & training
Impact metric

43 What would be the most useful things a model could provide?
Economic: Startup funding, Operational costs, Transactional costs, Sustainability
How to provide analytics against data stored at an institution, e.g. who pays for the processing; Cost model; Transparent pricing; How do you make RoI

44 What would be the most useful things a model could provide?
Technical: ICT Platforms, Access, Exposure, Standards
Agreed practice NOT standards!; Discipline semantics; Interop and fragmentation; Exposing services to the web, e.g. clusters; Bandwidth issues; Disconnect between secondary use

45 Case Studies
Exemplars of shared data success stories: Archaeology; Hydrology; Chemistry; Music – data for testing algorithms; Million Song Dataset – giving away features; Oceanography; Ecology; Astronomy; Philologists; Group of communities; Hathi Trust; Digging into Data Challenges

46 Parameters / factors (scope)
What metrics/axes could be used to describe capability in each area?
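
One way to make this question concrete is to treat each factor from the Human / Economic / Technical groupings as an axis with a score, and then list the factors that fall below a target level (the "gap analysis" mentioned under applications). The sketch below is illustrative only: the 1 to 5 scale, the example scores and the `gap_analysis` helper are assumptions made for discussion and are not defined by the CCM Framework.

```python
# Illustrative sketch only. The 1-5 scale, the example scores and the
# gap_analysis() helper are hypothetical; the slides do not define metrics.

# Factor groupings as listed on the Human / Economic / Technical slides.
FACTORS = {
    "Human": ["Rewards", "Incentives", "Legal/ethical issues",
              "Collaborative working", "Closed/open research",
              "Skills & training"],
    "Economic": ["Startup funding", "Operational costs",
                 "Transactional costs", "Sustainability"],
    "Technical": ["ICT Platforms", "Access", "Exposure", "Standards"],
}


def gap_analysis(scores, target=3):
    """Return, per grouping, the factors whose score is below `target`.

    Factors missing from `scores` are treated as 0 (not yet assessed),
    so they always show up as gaps.
    """
    gaps = {}
    for group, factors in FACTORS.items():
        low = [f for f in factors if scores.get(f, 0) < target]
        if low:
            gaps[group] = low
    return gaps


# Hypothetical self-assessment for one research group: everything at 3,
# except two weak factors and one factor that was never assessed.
example_scores = {f: 3 for fs in FACTORS.values() for f in fs}
example_scores["Skills & training"] = 2
example_scores["Sustainability"] = 1
del example_scores["Standards"]

if __name__ == "__main__":
    for group, low in gap_analysis(example_scores).items():
        print(group, "->", ", ".join(low))
```

A flat per-factor score is the simplest possible choice of axis; a real profile could equally use ranges, qualitative levels or several metrics per factor.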

47 Methods & tools

48 Data Management

49 Communication & collaboration

50 Community Capability Model for Data-Intensive Research
Kenji Takeda Microsoft Research

51 Summary
Community Capability Model for researchers, institutions and funders
Provide insight into data-intensive research

52 Next steps 2012
Prepare White Paper describing the CCM Framework for consultation; Develop case studies (PIs, institutions, funding agencies) and business case; Australia and Washington DC workshops (validate): test the strawman Framework
Kenji Takeda – Liz Lyon –

