TWC Why Data Science Matters Xiaogang (Marshall) Ma Tetherless World Constellation Rensselaer Polytechnic Institute


Presentation transcript:

Why Data Science Matters
Xiaogang (Marshall) Ma
Tetherless World Constellation, Rensselaer Polytechnic Institute
ICSU-WDS Data Stewardship Award Lecture
SciDataCon 2014, New Delhi, India, Nov

Acknowledgements
Dr. Mustapha Mokrane and Dr. Simon Hodson
Colleagues at TWC/RPI, CODATA-ECDP, ESIP, CGI-IUGS, AGU/ESSI, ICSU-WDS, RDA, ITC, and more
My mentor Prof. Peter Fox
My family
All of you

Outline
Technical trends
– Data management, publication & citation
Methodology
– Interoperability & provenance
Data management is just a start
– Data analysis
– Semantic eScience

Data Management
Cartoon: "data work". Image courtesy Randy Glasbergen

Data Management Plan
– A formal document that outlines what you will do with your data during your research and after you complete it
Resources and tools that help create DMPs:
– NSF Data Management Plan Requirements
– DCC Data Management Plans
– DMPTool
– DCC DMPOnline

Data Publication
Data as first-class products of research
– e.g., NSF bio-sketches can include data publications
Image from j4h.net
See:

“All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science.”
“…authors are required to make materials, data and associated protocols promptly available to readers without undue qualifications.”
“…authors must make materials, data, and associated protocols available to readers.”
“…it is a condition of publication that authors make available the data and research materials supporting the results in the article.”
“…require authors to make all data underlying the findings described in their manuscript fully available without restriction…”
“Earth and space science data should be widely accessible in multiple formats and long-term preservation of data is an integral responsibility of scientists and sponsoring institutions.”
“…support the principle that research data should be made freely available to all researchers…”
“…recommends depositing data that correspond to journal articles in reliable data repositories…”

Ways of data publication
– Data as supplemental material of a paper
– Standalone data
– Data paper: data in a repository + descriptive ‘data paper’
Strasser, GeoData 2014 Workshop Presentation (2014)
Examples:
– Standalone data journals: Nature Scientific Data, Geoscience Data Journal, Ecological Archives, Data in Brief …
– Journals that publish data papers: Earth and Space Science, GigaScience, F1000 Research, Internet Archaeology …

An isolated data island?!
Image from nature.com

Data Citation
Data Citation Index
– Indexes the world's leading data repositories
– Connects datasets to related refereed literature indexed in the Web of Science™
– Efficient access to data across subjects and regions
Image courtesy

Data interoperability
Interoperability: “Data should be discoverable, accessible, decodable, understandable and usable, and data sharing should be legal and ethical for all participants.”
Ma et al., Nature Geoscience (2011)
Original image from:

Provenance of research
Provenance documentation: “Linking a range of observations and model outputs, research activities, people and organizations involved in the production of scientific findings with the supporting data sets and methods used to generate them”
Ma et al., Nature Climate Change (2014)
Image from nature.com

IPython Notebook: A web-based interactive computational environment
Codes, APIs, datasets, text… PDF document
We extended the IPython Notebook environment to enable automatic provenance capture during a scientific workflow
Di Stefano et al., ESIP 2014 Summer Meeting Presentation (2014)
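As a rough illustration of what automatic provenance capture during a workflow can look like, the Python sketch below wraps each workflow step so that its name, inputs, output and start time are logged as the step runs. This is not the IPython Notebook extension mentioned on the slide; `capture_provenance`, `PROVENANCE_LOG`, `load_values` and `mean` are hypothetical names chosen for this example only.

```python
import functools
from datetime import datetime, timezone

# Hypothetical in-memory provenance log; a real system would persist it.
PROVENANCE_LOG = []


def capture_provenance(func):
    """Record the activity name, inputs, output and start time of each call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = datetime.now(timezone.utc).isoformat()
        result = func(*args, **kwargs)
        PROVENANCE_LOG.append(
            {
                "activity": func.__name__,
                "inputs": {"args": args, "kwargs": kwargs},
                "output": result,
                "started_at": started,
            }
        )
        return result

    return wrapper


@capture_provenance
def load_values():
    # Stand-in for reading a dataset (assumed values, for illustration).
    return [1.0, 2.0, 3.0, 4.0]


@capture_provenance
def mean(values):
    # A simple analysis step whose input is the output of load_values.
    return sum(values) / len(values)


average = mean(load_values())
```

After the run, `PROVENANCE_LOG` holds one record per step in execution order, so the computed value can be traced back to the step and inputs that produced it.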


Semantic eScience
Artificial Intelligence accelerates scientific discovery
– Data search, synthesis and hypothesis representation
– Data analysis: reasoning with models of the data
Gil et al., Science (2014)
Image from science.com
A state-of-the-art example: Hanalyzer (high-throughput analyzer)
– Uses natural language processing to automatically extract a semantic network from all PubMed papers relevant to a scientist
– Uses Semantic Web technology to integrate assertions from other biomedical sources
– Reasons about the network to find new correlations that suggest new genes to investigate
Leach et al., PLoS Comput Bio (2009)

Deep Carbon Virtual Observatory
A cyber-enabled platform for linked science
Fox, RDA Fourth Plenary Meeting Presentation (2014)

Summary
Data as first-class products of research
eScience: the digital or electronic facilitation of science
Semantic eScience
– A virtuous circle between science and semantic technologies
– Data driven + Knowledge driven?

More information:
Marshall X Ma
Thank you!