NIST Scientific Data for Data Science United Nations Open Data / Open Government Conference, April 26-28, Abu Dhabi


1 NIST Scientific Data for Data Science
United Nations Open Data / Open Government Conference, April 26-28, Abu Dhabi
http://semanticommunity.info/Data_Science/NIST_Scientific_Data_for_Data_Science
Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community
http://semanticommunity.info/
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup
April 26, 2014

2 Open Data / Open Government Conference
Request:
– Interesting case studies about open government / open data.
– Information on relevant federal apps designed.
– A short bio.
Response:
– AOL Government published about 80 of my 200-odd stories at Semantic Community about open government data and activities.
– Over 250 Spotfire dashboard apps in my cloud library, including most of the major open government dashboards and new data sets.
– Helped Data.gov get started in the US, and helped open government data get started at SEMIC.EU and in Japan.

3 Speaker Bio
Brand Niemann, former Senior Enterprise Architect & Data Scientist with the US EPA, works as a data scientist, produces data science products, and publishes data stories for Semantic Community, AOL Government, & Data Science & Data Visualization DC. With Kate Goodier, he co-organized the Federal Big Data Working Group Meetup, which has Data Science Teams producing big data applications for government and business and provides a free on-line graduate course entitled Practical Data Science for Data Scientists.

4 Broader Context
NIST and other agencies need to support the following Federal Government Initiatives:
– Big Data
– Digital Government Strategy
– Public access mandated for "scientific results" supported by the U.S. government
– Federal agencies have submitted their "initial plans" for public access to scientific data to OSTP
– Digital Object Architecture: One result will be to make the scientific record into a first class scientific object
The author has suggested that all of these can be addressed with agency digital content by following the Data Mining Standard.
– See “Data Science Makes Data More Important Than Code and Ontology”

5 Data Mining Standard
Business Understanding:
– NIST Mission
  Standardize measurement
Data Understanding:
– NIST Digital Archives
  Promised to publish raw data sets
Data Preparation:
– Knowledge Base of the Above
  Need raw data for figures
Modeling:
– Semantic Knowledge Base, Data Papers, and NanoPublications
  See White Paper on “Making Big Data Small” using Data Science and Semantics
Evaluation:
– Searchability, Discovery, and Reasoning
  Relational Queries and Graph Traversal
Deployment:
– Story and Knowledge Base in MindTouch, Excel, NodeXL, Spotfire, and Be Informed
  Data ecosystem

6 NIST
NIST supports its employees and others with the following Information Services:
– Research Library
– Publishing Services
– NIST Museum and Archives
The NIST Digital Archives (NDA) present images of NIST Museum artifacts and full-text NIST publications:
– NBS Bulletins
– Journal of Research of NIST
– NBS-NIST Directors
– NBS-NIST Histories
– NBS Circulars and Reports

7 NIST Home Page
http://www.nist.gov/

8 NIST Virtual Library
http://www.nist.gov/nvl

9 NIST Digital Archive Interface
http://nistdigitalarchives.contentdm.oclc.org/

10 NIST Digital Archive Contents
http://nistdigitalarchives.contentdm.oclc.org/cdm/search/display/200/order/title/ad/asc
My Note: 9602 Items!

11 NIST Digital Archive Example
http://cdm16009.contentdm.oclc.org/cdm/compoundobject/collection/p13011coll6/id/153009/rec/1
My Note: Can Read PDF On-line, but Where Is the Data?

12 PDF-to-MindTouch
Figure 8: The solid circles show the measured absorbance.
Table 1: Properties of 2.0 μm microspheres at 266 nm obtained from the fit of the L-M apparent cross section to the absorbance measurements.
My Note: Need Data for Figure 8 and for Table 1 to be Real Data (it is!)

13 Modeling: Approaches by the Federal Big Data Working Group Meetup
Semantic Medline:
– Semantic MEDLINE Query: mesothelioma and Data Science for VIVO
Data Papers:
– Sepublica 2014: The Semantics for e-science in an intelligent Big Data Context http://sepublica.mywikipaper.org/
Nanopublications:
– The smallest unit of publishable information: an assertion about anything that can be uniquely identified and attributed to its author. http://nanopub.org/wordpress/?page_id=65
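
The nanopublication definition on this slide (an assertion about something uniquely identifiable, attributed to its author) can be sketched as a small data structure. This is only an illustration, not the nanopub.org schema: the class names, field names, example URIs, and author name below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """A single subject-predicate-object statement."""
    subject: str    # identifier of the thing the statement is about
    predicate: str  # label for the relationship
    obj: str        # identifier or literal value


@dataclass(frozen=True)
class Nanopublication:
    """An assertion plus the identification and attribution that make it citable."""
    uri: str        # unique identifier for this nanopublication
    assertion: Assertion
    author: str     # who the assertion is attributed to

    def is_well_formed(self) -> bool:
        # Mirrors the slide's definition: uniquely identified and attributed.
        return bool(self.uri) and bool(self.author) and bool(self.assertion.subject)


# Hypothetical example values, for illustration only.
example = Nanopublication(
    uri="http://example.org/nanopub/1",
    assertion=Assertion(
        subject="http://example.org/sample/microsphere",
        predicate="hasDiameterMicrometers",
        obj="2.0",
    ),
    author="Example Author",
)
```

Making both classes immutable (`frozen=True`) reflects the idea that a published assertion should not change after it has been attributed.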

14 Modeling: Examples
Most Recent: 500 citations, Start Date: 01/01/1900, End Date: 11/30/2013, 3169 predications extracted. Summarized for Substance Interactions.
Dr. Barend Mons: BRAIN
Dr. Tom Rindflesch: Semantic Medline

15 Evaluation and Deployment
The Evaluation and Deployment examples for each approach are as follows:
– Semantic Knowledge Base: Web & PDF
– Selected Data Papers: PDF-to-MindTouch
  Measurement of Scattering and Absorption Cross Sections of Microspheres for Wavelengths between 240 nm and 800 nm
  OMNIDATA and the Computerization of Scientific Data
– Nanopublication: Extracts from the Data Papers-to-Excel
My Note: Still need the NIST raw data sources to re-create the figures in the publications.
– I have been promised that NIST is going to publish their data sets as part of the Open Government Data Initiative.

16 How was the data collected?
http://semanticommunity.info/Data_Science/NIST_Scientific_Data_for_Data_Science
My Note: Unstructured Information to Structured Data, Including the Two PDF Papers, with Well-defined URLs According to the SEMIC.EU Standards.

17 Where is the unstructured and structured data stored?
http://semanticommunity.info/@api/deki/files/28860/NISTDataScience.xlsx
– Web and PDF
– Footnote and References
– Metadata and Data Sources
– Well-defined URLs for Linked Data
– Relational and Graph Ready for NodeXL & Spotfire
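
The "relational and graph ready" idea can be illustrated with a short sketch: the same rows serve as a relational table and, by pairing each section with the URLs it references, as an edge list of the kind graph tools typically import. The column names and sample rows are assumptions made for illustration, not the actual layout of NISTDataScience.xlsx.

```python
import csv
import io
from collections import Counter

# Hypothetical rows: one row per (section, referenced URL) pair.
ROWS = [
    {"section": "Overview", "url": "http://example.org/a"},
    {"section": "Overview", "url": "http://example.org/b"},
    {"section": "Data Papers", "url": "http://example.org/b"},
]


def to_edge_list(rows):
    """Turn relational rows into (source, target) edges, dropping duplicates in order."""
    seen = set()
    edges = []
    for row in rows:
        edge = (row["section"], row["url"])
        if edge not in seen:
            seen.add(edge)
            edges.append(edge)
    return edges


def links_per_section(rows):
    """Relational-style aggregate: number of distinct links per section."""
    return Counter(source for source, _ in to_edge_list(rows))


def edges_to_csv(edges):
    """Serialize edges as CSV text with a generic source/target header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target"])
    writer.writerows(edges)
    return buffer.getvalue()


edges = to_edge_list(ROWS)
print(edges_to_csv(edges))
```

A per-section link count like the one returned by `links_per_section` is one simple way to surface sections with many reference links.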

18 What are the results?: NIST Scientific Data Knowledge Base Visualization
My Note: Sections with Many Reference Links Can be Very Important!

19 What are the results?: NIST Digital Archives Century of Excellence
My Note: The Featured Seminal Data Paper is the 60th out of 106 Which I Found from Doing the Index Below!

20 What are the results?: NIST Digital Archives
My Note: The NIST Digital Archive Can be an Interface to Data Papers with Data Tables and Interactive Visualizations. This Work Can be Used to Prioritize the Additional Work and Reduce Duplication.

21 What are the results?: NIST Library Catalog Search for Data
My Note: This Was a Test for Searching the Catalog for “data” and Converting the Results to a Spreadsheet (20 of 259). There is Also the Need to Search for Data Tables Within the Individual Publications.
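
The catalog test on this slide (search for "data", convert the hits to a spreadsheet) might look like the following sketch. The record fields and sample titles are made up for illustration; the slides do not give the actual export format of the NIST library catalog.

```python
import csv
import io

# Hypothetical catalog records, for illustration only.
CATALOG = [
    {"title": "Example Data Tables for Reference Materials", "year": "1975"},
    {"title": "Example History of Standards", "year": "1966"},
    {"title": "Computerization of Example DATA", "year": "1980"},
]


def search_titles(records, term):
    """Return records whose title contains the term, ignoring case."""
    needle = term.lower()
    return [record for record in records if needle in record["title"].lower()]


def to_csv(records, fieldnames=("title", "year")):
    """Serialize records as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fieldnames), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


hits = search_titles(CATALOG, "data")
print(to_csv(hits))
```

A title search like this only finds publications that mention the term; locating data tables inside the individual publications, as the note says, would need a separate full-text pass.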

22 What is our data story and product?
Need a scientific data publishing environment that:
– Conforms to editorial policies
– Facilitates peer review
– Standardizes dissemination
– Manages references and URLs
– Promotes data publication, validation, and mining
Semantic Community is doing that for NIST:
– More work in progress to be reported at the conference and elsewhere

