Getting from managed to reused: Making it easier for researchers to do something useful with data
Mingfang Wu, Stefanie Kethers, Andrew Treloar


What is ANDS?  ANDS is supported by the Australian Government  Began in 2009, currently funded to mid 2015  Collaboration between Monash University, CSIRO and the Australian National University  Staff in 6 cities across the country  Funded 200+ projects across 68 institutions ANDS aims to make data more valuable to researchers, research institutions and the nation 2

How Do We Make Data More Valuable?
Goal: researchers can easily publish, discover, access and use research data through the Australian Research Data Commons (ARDC).

ANDS Programs  Underpinning infrastructure for discovery and citation (ARDC Core)  Enable rich metadata about data to be managed and accessible (Metadata Stores)  Make new data and associated metadata available from range of instruments (Data Capture)  Make a selection of existing data and associated metadata available from Australia’s research-producing universities (Seeding the Commons)  Make data and associated metadata available from government departments (Public Sector Data)  Provide the overall policy and practice frameworks to support better data management and re-use (Frameworks and Capabilities)  Demonstrate the value of doing all these (Applications) 4

Tools for Data Reuse
[Figure: research activities (form hypothesis; design and run experiment; analyse data/results; publish paper, data, software) alongside data-reuse tools (discover, look up, extract, transform, integrate, analyse, visualise and register data; workflow; computing), over data collections and metadata]

The ANDS Applications Program
- Funded through the EIF (Education Infrastructure Fund)
- Focus on software infrastructure to enable research
- Goal of the Applications program: "to produce compelling demonstrations of the value of having data available for re-use" (i.e. enabling research across many sources of data in ways that were not previously possible)

Developed software might:
- empower researchers to solve important problems
- build new connections
- enable important problems to be solved
- enable new questions to be answered
- simplify problems
- accelerate problem solving or data analysis

What has been funded under the Applications program?
- 7 projects in bio/characterisation
- 8 projects in climate change adaptation
- 10 others (urban planning, marine research, public health, humanities)
- For a complete list of the Applications projects and their profiles, please visit the ANDS project registry:

What kinds of tools have been developed?
- Data transformation
- Data linkage and integration
- Data service
- Data analysis and modelling
- Data visualisation
- Data manipulation workflow
- …

Example Applications  Climate Model Downscaling Data for Impacts Research  Cancer Genomics Linkage Application  Brain Mapping National Resource  POSITIVE PLACES: Spatial Analysis of Public Open Space 10

Climate Model Downscaling Data for Impacts Research
Regional Climate Model (RCM) data collection:
- Very big! High spatial and temporal resolution; large region; many climate variables; many atmospheric layers; multiple simulations
- Data on an irregular model grid
- Stored in netCDF

[Diagram: Regional Climate Model downscaling data and its users: agricultural impacts researchers, hydrological impacts researchers, health impacts researchers, ecological impacts group]

Climate Change Impact Researchers: "I see some problems!"
- What is a Regional Climate Model?
- I don't have enough disk space for this dataset on my computer.
- I can't find data for the sites I'm interested in.
- My software tools can't handle this irregular grid.
- I can't read this netCDF data format.
- This dataset doesn't contain data for my site.
- This data gives me strange results for the current climate.
- This dataset is great! How can I share my work on it with others?

Data service: Climate Model Downscaling Data for Impact Research (CliMDDIR) (AP04, UNSW)
Provides open-source software to transform RCM data:
- Extract subsets of data (e.g. variables, regions)
- Regrid or interpolate data to sites
- Reformat data (e.g. GIS, ASCII, CSV)
- Calculate derived variables (e.g. pan evaporation)
- Apply statistical corrections (if necessary)
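Three of the transformation steps listed above (extract a regional subset, interpolate to a site, reformat as CSV) can be sketched in miniature. This is a minimal sketch under stated assumptions, not CliMDDIR's code: the data are assumed to be already loaded into Python lists of `(lat, lon, value)` tuples (real RCM output would be read from netCDF with a dedicated library), inverse-distance weighting is a swapped-in interpolation technique chosen for simplicity, and all function names are hypothetical.

```python
import csv
import io
import math


def subset_region(points, lat_min, lat_max, lon_min, lon_max):
    """Keep grid points inside a lat/lon bounding box ('extract subsets').

    Each point is a hypothetical (lat, lon, value) tuple; on an irregular
    grid every cell carries its own coordinates.
    """
    return [p for p in points
            if lat_min <= p[0] <= lat_max and lon_min <= p[1] <= lon_max]


def idw_to_site(points, site_lat, site_lon, power=2.0):
    """Inverse-distance-weighted estimate at one site ('interpolate to sites').

    Uses plain Euclidean distance in degrees, which is only a rough
    approximation; a real service would use a proper geodesic distance.
    """
    num = den = 0.0
    for lat, lon, value in points:
        d = math.hypot(lat - site_lat, lon - site_lon)
        if d == 0.0:
            return value  # site coincides with a grid point
        w = 1.0 / d ** power
        num += w * value
        den += w
    if den == 0.0:
        raise ValueError("no grid points to interpolate from")
    return num / den


def to_csv(rows, header):
    """Reformat (time, value) rows as CSV text ('reformat data')."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Toy values for one variable at two time steps; the third point lies
    # outside the region of interest and is dropped by the subset step.
    series = {
        "t0": [(-34.0, 150.0, 10.0), (-34.0, 152.0, 20.0), (-40.0, 140.0, 99.0)],
        "t1": [(-34.0, 150.0, 12.0), (-34.0, 152.0, 22.0), (-40.0, 140.0, 99.0)],
    }
    site = (-34.0, 151.0)  # hypothetical site, equidistant from the two kept points
    rows = []
    for t, pts in series.items():
        region = subset_region(pts, -35.0, -33.0, 149.0, 153.0)
        rows.append((t, round(idw_to_site(region, *site), 3)))
    print(to_csv(rows, ["time", "value"]), end="")
```

Running it prints a small CSV with one interpolated value per time step (15.0 and 17.0 here). Keeping each step a separate function mirrors the slide's list: a user who only needs a regional subset never pays for interpolation.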

CliMDDIR Service
[Screenshots: collection description and service description at Research Data Australia (RDA)]

CliMDDIR Service Portal
Climate impact researchers can:
- select a region
- select the time coverage
- select variables
- select simulation models
- select an output format
- share (subsetted) data with other researchers
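The portal choices above amount to a small request object. The sketch below is hypothetical throughout: the field names, the allowed formats and the query-string encoding are illustrative assumptions, not the CliMDDIR portal's actual interface. It shows one way to validate a selection and encode it so that the same subset could be re-requested or shared with a colleague.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from urllib.parse import urlencode

# Hypothetical format list, loosely based on the formats named on the slides.
ALLOWED_FORMATS = {"netcdf", "csv", "ascii", "gis"}


@dataclass
class SubsetRequest:
    """Hypothetical model of the choices a researcher makes in the portal."""
    bbox: tuple[float, float, float, float]  # (lat_min, lat_max, lon_min, lon_max)
    start: date
    end: date
    variables: list[str]
    models: list[str]
    output_format: str = "csv"

    def validate(self) -> None:
        """Reject selections that cannot produce a sensible subset."""
        lat_min, lat_max, lon_min, lon_max = self.bbox
        if not (lat_min < lat_max and lon_min < lon_max):
            raise ValueError("bounding box must have min < max")
        if self.start > self.end:
            raise ValueError("start date must not be after end date")
        if not self.variables or not self.models:
            raise ValueError("select at least one variable and one model")
        if self.output_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format: {self.output_format}")

    def to_query(self) -> str:
        """Encode a validated request as a shareable query string."""
        self.validate()
        return urlencode({
            "bbox": ",".join(str(v) for v in self.bbox),
            "start": self.start.isoformat(),
            "end": self.end.isoformat(),
            "variables": ",".join(self.variables),
            "models": ",".join(self.models),
            "format": self.output_format,
        })


if __name__ == "__main__":
    req = SubsetRequest((-35.0, -33.0, 149.0, 153.0), date(2050, 1, 1),
                        date(2050, 12, 31), ["max_temperature"], ["model_a"])
    print(req.to_query())
```

Validating before encoding means a shared link can only describe a subset the service could actually produce; `urlencode` percent-encodes the commas in the bounding box and lists.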

Agricultural Impact Researchers
Use case: assessing how climate change affects wheat cropping in NSW using the APSIM agricultural model.
[Diagram roles: climate modellers, IT specialists]

Workflow - Cancer Genome Linkage Project
Challenges faced by biologists and clinicians:
- The manual process required to integrate their research data with other datasets
- No standardised analytical processes available
- The delay in moving from analysis to publication-ready results
[Figure: raw data (e.g. tttctgaaga ccatggacta tgagacctct) is processed into derived data (i.e. mutation information), which is released through the ICGC Data Portal]
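The step from raw sequence to derived mutation information can be illustrated with a deliberately naive toy. This is not the project's Galaxy variant detection pipeline: real pipelines must also align reads, account for sequencing errors and handle insertions/deletions. The sketch assumes a sample sequence already aligned base-for-base to a reference, and the function name is hypothetical; the reference is the example sequence from the slide.

```python
def point_differences(reference: str, sample: str, offset: int = 0):
    """List (position, ref_base, alt_base) wherever an aligned sample differs.

    Toy only: assumes both sequences are already aligned, of equal length
    and free of insertions/deletions. Positions are 1-based, shifted by
    `offset` (e.g. the start coordinate of the reference segment).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    ref, alt = reference.lower(), sample.lower()
    return [(offset + i + 1, r, a)
            for i, (r, a) in enumerate(zip(ref, alt))
            if r != a]


if __name__ == "__main__":
    reference = "tttctgaaga" "ccatggacta" "tgagacctct"  # sequence shown on the slide
    sample = "tttctgaaga" "ccatagacta" "tgagacctct"     # hypothetical sample, g -> a at position 15
    print(point_differences(reference, sample))
```

The output is a single substitution, `(15, 'g', 'a')`: the kind of derived record (position, reference base, alternative base) that is far smaller and easier to share than the raw data it came from.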

Workflow - Cancer Genome Linkage Project
[Figure: variant detection pipeline in Galaxy]
Provide software/infrastructure to enable integration/transformation of multiple datasets within the GVL environment:
- Software development by QFAB (Queensland Facility for Advanced Bioinformatics, UQ)
- Development aligned with that of the NeCTAR GVL
- Inclusion of the very large raw ICGC Pancreatic Dataset in the NeCTAR GVL
- Development of (reusable) Galaxy workflows for easier mutation searching

Workflow - Cancer Genome Linkage Project
[Screenshots of output data]

Workflow - Cancer Genome Linkage Project

Data Visualisation: Brain Mapping National Resource
- Funded at QCIF and the Centre for Advanced Imaging, UQ
- Developed TissueStack, which can link to specific parts of the data and lets users rapidly view and collaboratively annotate very large 3D datasets via a web browser
- For details, please see Dr. Andrew Janke's presentation on Wednesday, 12:05-12:25, Room P1

POSITIVE PLACES: spatial analysis of public open space (POS)
- Are the current provisions of POS and parks adequate for the projected urban densification and population growth?
- Will there be enough POS? (i.e. will it still meet the 10% land provision?)
- Will the provision of different park types and facilities that encourage use by different population demographics (e.g. small pocket parks with play equipment for young children) or for different uses (e.g. active or passive recreation) be adequate? What more or less will be needed?
- Is there sufficient large open space for active recreation and sporting needs?
- What type of POS can promote increased social connectedness within communities?
Challenge: the lack of a comprehensive and consistent digital dataset of public open space

Data integration and interrogation: the Public Open Space (POS) Tool, developed at UWA
With advanced features, users can:
- define an area of interest directly on screen
- upload a user-defined region as a GIS shapefile
- scenario-test the relationship between changes in population structure for a user-defined area and the provision of POS
POS statistics for a searched suburb or LGA (local government area) can be downloaded as an Excel spreadsheet.
Dataset:
- 7624 areas of POS
- 3813 parks (up to 43 different facilities and amenities per park)
- 820 school grounds/playing fields
- 1860 natural and conservation or bushland areas
- 771 areas of residual green space
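The kind of summary the POS Tool reports, and the 10% provision question from the previous slide, can be sketched as simple arithmetic. This is a minimal sketch, not the UWA tool's method: it assumes, as a simplification, that the 10% provision means POS area divided by the total land area of the selected region, and the class, function names and example areas are hypothetical. A per-capita figure shows how a population scenario changes provision even when the open space itself does not change.

```python
from dataclasses import dataclass


@dataclass
class OpenSpace:
    """One hypothetical POS record for a selected area."""
    name: str
    kind: str        # e.g. "park", "school ground", "bushland"
    area_ha: float   # area in hectares


def pos_summary(spaces, total_area_ha, target_share=0.10):
    """Total POS, its share of the land area, and a check against a target.

    Assumes the provision target is expressed as a share of total land area.
    """
    if total_area_ha <= 0:
        raise ValueError("total area must be positive")
    by_kind = {}
    for s in spaces:
        by_kind[s.kind] = by_kind.get(s.kind, 0.0) + s.area_ha
    pos_area = sum(by_kind.values())
    share = pos_area / total_area_ha
    return {"pos_area_ha": pos_area, "share": share,
            "meets_target": share >= target_share, "by_kind": by_kind}


def pos_per_1000(pos_area_ha, population):
    """Hectares of POS per 1000 residents, for population scenarios."""
    if population <= 0:
        raise ValueError("population must be positive")
    return pos_area_ha / population * 1000


if __name__ == "__main__":
    # Hypothetical suburb of 200 ha with three open spaces.
    spaces = [OpenSpace("Park A", "park", 12.0),
              OpenSpace("School oval", "school ground", 5.0),
              OpenSpace("Reserve", "bushland", 4.0)]
    s = pos_summary(spaces, total_area_ha=200.0)
    print(f"POS {s['pos_area_ha']} ha, {s['share']:.1%} of land, meets 10%: {s['meets_target']}")
    # Same open space, current vs projected (hypothetical) population.
    for pop in (10_000, 14_000):
        print(pop, round(pos_per_1000(s["pos_area_ha"], pop), 2))
```

Here the suburb meets the assumed 10% target (10.5%), yet provision per 1000 residents falls from 2.1 ha to 1.5 ha under the growth scenario, which is the kind of difference the densification questions above are probing.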

Who benefits from the Applications projects?
- Researchers
  - Conduct existing research more efficiently
  - Enable new research
  - Increase research collaboration opportunities
  - Strengthen relationships with government agencies and industry
  - Connect science to the public
- Government agencies, urban planners, infrastructure planners, …
- The public

Prof. Charles Watson, from Curtin University and Neuroscience Research Australia, commented: "The ability to share data from the cloud and access it through TissueStack would make a huge difference to the way we are able to interact: the ability for all participants to access the same dataset, to annotate it and to have a discussion on the way forward."

Max De Antoni Migliorati (PhD candidate, QUT), on Semaphore: Monitoring and Modelling Australian Gas Emissions: "It is much more time-effective; it is much easier to get our results with Semaphore. Now I can run five simulations in a day, whereas with the previous method it took me one day to get one simulation done."

Summary  Substantial data infrastructures have been built to enable data sharing and data reuse  The ANDS application program has demonstrated the value of data sharing and data reuse 26

Information  ANDS project registry:  Project blogs: feed.html feed.html  Demonstrations of value:

Thanks  To Ian Macadam (from UNSW) for providing some slides about CliMDDIR project  To all who have participated in and contributed to the program 28

Questions?