Federal Big Data Working Group Meetup Dr. Brand Niemann Director and Senior Data Scientist Semantic Community


1 Federal Big Data Working Group Meetup
Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community
http://semanticommunity.info/
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup
March 18, 2014

2 Mission Statement
– Federal: Supports the Federal Big Data Initiative, but is not endorsed by the Federal Government or its agencies.
– Big Data: Supports the Federal Digital Government Strategy, which is "treating all content as data", so big data = all your content.
– Working Group: Data Science Teams composed of Federal Government and non-Federal Government experts producing big data products (How was the data collected, where is it stored, and what are the results?).
– Meetup: The world's largest network of local groups to revitalize local community and help people around the world self-organize, like the MOOCs (Massive Open Online Courses) being considered by the White House.
Co-organizers: Brand Niemann and Kate Goodier

3 Joint NSF-NIH Biomedical Big Data Research Meetup
http://semanticommunity.info/Data_Science/Euretos_BRAIN#Story
“Thanks again for a wonderful gathering of deep thinkers at the NIH-NSF Big Data event -- that was terrific. Great line up of speakers.”

4 Scientific Data: A View from the US
Dr. George Strawn, Director, NITRD/NCO and co-chair of the Federal Big Data Senior Steering Work Group:
– Public access mandated for "scientific results" supported by the U.S. government
– Federal agencies have submitted their "initial plans" for public access to scientific data to OSTP
– Digital Object Architecture: An "hourglass" for data? (As the Internet was an hourglass for networks: TCP/IP at the narrow point; many applications above, many implementations below)
– One result will be to make the scientific record into a first-class scientific object
http://semanticommunity.info/@api/deki/files/28467/GeorgeStrawn01132014.ppt

5 Activities
White House OSTP - MIT Big Data Privacy Workshop:
– Story and Network Analysis of Tweets: April 1st Meetup with Kate Goodier and Marc Smith
NIST Data Science Symposium:
– Poster and Story: Data Science Team Pilot with Information Services Office
White Paper for NIST and NITRD:
– “Making Big Data Small" using Data Science and Semantics: “Thanks again for your effort in putting this program together!”
Information Visualization MOOC:
– Story and Course Work: Forming Teams to Work with Clients for the Remaining 7 Weeks
DARPA Big Mechanism:
– Story and Pilot: April 15th Meetup with Mike Megginson, Northrop Grumman, and Fredrik Salvesen, YarcData (in planning)

6 Agenda
6:30 p.m. Tutorials (Proposed GMU Course) and Refreshments
– Continue Data Science Tutorial: Class 4, Graph Databases, and Bigdata SYSTAP Literature Survey of Graph Databases
7:00 p.m. Introductions and Announcements (10 seconds per individual, depending on the size of the group)
7:15 p.m. Featured Presentation/Demonstration (Where did you get the data, where did you store the data, and what were your results?)
– Bryan Thompson, Chief Scientist of SYSTAP, LLC, will speak about SYSTAP's open source graph database platform. Highlights will include support for highly available replication clusters as well as their recent work with accelerated graph processing on GPUs at 3 billion traversed edges per second.
– See CSHALS 2014: Tech Talk and Poster in WikiWiki
8:30 p.m. Networking/Individual Demos (talk among yourselves and look at one another's work)
9:00 p.m. Continue Your Conversations Elsewhere (we need to clear out of the space)

7 Next Meetups
Sixth Meetup: April 1, 6:30 p.m.
– Network Analytics and Visualization of Big Data Privacy Workshop Tweets, Dr. Marc A. Smith, Chief Social Scientist, Connected Action Consulting Group, and Remarks by the President on Review of Signals Intelligence, Dr. Kate Goodier, Information Architect, Xcelerate Solutions
Seventh Meetup: April 15, 6:30 p.m.
– DARPA Big Mechanism, Mike Megginson, Northrop Grumman, and Fredrik Salvesen, YarcData (in planning)
Eighth Meetup: May 6, 6:30 p.m.
– Federating Big Data for Big Innovation, Dr. Jeanne Holm, Data.gov Evangelist
Ninth Meetup: May 18, 6:30 p.m.
– The Science Behind Data Science, Ruhollah Farchtchi, Director of Big Data, UNISYS
2nd Cloud, SOA, Semantics and Data Science Conference, June (in planning)

8 Overview
Practical Data Science for Data Scientists:
– 2/11 Specific Data Science Tools and Applications 1
– Chapters 7 & 8
Data Science for VIVO & Information Visualization MOOC (no time to cover):
– 7 Weeks of Course Work with Sci2 Tools
– Forming Teams to Work with Clients for Next 7 Weeks
NodeXL and Sci2 for Data Science (no time to cover):
– NodeXL: A free, open-source template for Microsoft® Excel® that makes it easy to explore network graphs.
– Sci2: A modular tool for science of science research & practice on scholarly datasets.

9 Practical Data Science for Data Scientists
http://semanticommunity.info/Data_Science/Practical_Data_Science_for_Data_Scientists
Class 4: Providing On-Line Class With Private Tutoring

10 Resources
Required Textbook
– Doing Data Science: http://shop.oreilly.com/product/0636920028529.do
Free Sampler:
– http://cdn.oreillystatic.com/oreilly/booksamplers/9781449358655_sampler.pdf (PDF)
Optional Supplemental Reading:
– Data Science Starter Kit: http://shop.oreilly.com/category/get/data-science-kit.do
– DC Data Community: http://datacommunitydc.org/blog/about/
DC Data Community Calendar:
– http://datacommunitydc.org/blog/calendar/
Technology Requirements
– Internet and Free Tools like Spotfire Cloud: https://spotfire.cloud.tibco.com/tsc/#!/compproductrequest
– NodeXL: http://nodexl.codeplex.com/ (My Note: Current Focus)

11 Class 4
2/11 Specific Data Science Tools and Applications 1
– Discuss Reading: Chapters 7 and 8, Present and Discuss Team Homework Exercises, Hands-on Class Exercise, and Team Homework Exercise.
– My Resources:
http://semanticommunity.info/Data_Science/Free_Data_Visualization_and_Analysis_Tools
http://semanticommunity.info/Data_Science/KDD_Cup
http://www.kdnuggets.com/datasets/
Hands-on Class Exercise:
– SAS and SAS Public Data Sets
– See the three Spotfire Web Player and Spotfire File pairs
– Exercise: Build Your Own Recommendation System

12 Discuss Reading
Chapter 7:
– How do companies extract meaning from the data they have? In this chapter we hear from two people with very different approaches to that question, namely William Cukierski from Kaggle and David Huffaker from Google.
Chapter 8:
– This is the most difficult chapter in the book for me to teach, since I do not understand the Python code at the end and have never built a Recommendation Engine myself. I would welcome some help here.

13 Present and Discuss Team Homework Exercise
Get the Data: Go to Yahoo! Finance and download daily data from a stock that has at least eight years of data, making sure it goes from earlier to later. If you don’t know how to do it, Google it.
– Yahoo: http://finance.yahoo.com/q/hp?s=%5EO...t orical+Prices (CSV)
– See Spotfire Web Player and File
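The "earlier to later" requirement in the homework can be checked and enforced in a few lines. The following is a minimal sketch, not part of the original exercise: the column names (`Date`, `Close`), the ISO date format, and the sample rows are all assumptions made up for illustration, so adjust them to match the file you actually download.

```python
import csv
import io
from datetime import datetime

# Hypothetical stand-in for a downloaded file. The column names, the date
# format, and the prices are invented for illustration only.
SAMPLE_CSV = """Date,Close
2014-03-18,101.5
2014-03-14,100.0
2014-03-17,100.8
"""

def parse_date(row, date_column="Date", date_format="%Y-%m-%d"):
    """Turn the row's date string into a datetime so it sorts correctly."""
    return datetime.strptime(row[date_column], date_format)

def load_prices_oldest_first(csv_text):
    """Parse CSV text and return its rows sorted from earliest to latest date."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=parse_date)
    return rows

def is_chronological(rows):
    """Return True when every row's date is on or after the previous row's."""
    dates = [parse_date(row) for row in rows]
    return all(earlier <= later for earlier, later in zip(dates, dates[1:]))

prices = load_prices_oldest_first(SAMPLE_CSV)
print([row["Date"] for row in prices])
```

For a real download, read the file's text with `open(path, newline="").read()` and pass it to `load_prices_oldest_first`.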

14 Chapter 6: Timestamps and Financial Modeling
Web Player

15 Hands-on Class Exercise
SAS and SAS Public Data Sets:
– SAS: Spotfire Web Player and Spotfire File
– SAS Exercises: Spotfire Web Player and Spotfire File
– SAS Public Data Sets: Spotfire Web Player and Spotfire File
Exercise: Build Your Own Recommendation System
– I would welcome some help here.
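As a possible starting point for the recommendation exercise, here is a minimal sketch of user-based collaborative filtering with cosine similarity. It is not the code from the textbook chapter, and the users, items, and ratings are hypothetical values invented for illustration.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: rating}), invented for illustration.
RATINGS = {
    "ann": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 5, "B": 3, "D": 5},
    "cat": {"A": 1, "B": 5, "E": 4},
}

def cosine_similarity(u, v):
    """Cosine similarity over the items both users rated (0.0 if none)."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def recommend(user, ratings, top_n=1):
    """Rank items the user has not rated by similarity-weighted mean rating."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(scores, key=lambda i: scores[i] / weights[i], reverse=True)
    return ranked[:top_n]

print(recommend("ann", RATINGS, top_n=2))
```

One known weakness of this simple similarity is that two users who share only a single item always score 1.0, so a real system would usually require a minimum overlap or center the ratings first.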

16 SAS Public Data Sets-Spotfire Tutorial
Web Player

17 Team Homework Exercise
Read next week's reading: Data Visualization for the Rest of Us
– See my Slides and Web Player.
– Start to create your own Hubway Data Visualization Challenge entry and eventually submit it for your class project and, if you want, for the challenge (now closed but still accepting submissions).
Form Teams (Same or New), Ask Me Questions, and Prepare to Present Next Week

18 Web Player

19 A Data Science Big Mechanism for DARPA
DARPA wants to help the DoD get to the essence of cause and effect for cancer from reading the medical literature. The Federal Big Data Working Group Meetup has also been doing that with Semantic Medline - YarcData and Euretos BRAIN (Bio Relations and Intelligence Network).
– See the video for Cancer Immunotherapy (21 minutes). Science magazine called it the biggest breakthrough of 2013 at the end of that year, and Dr. Tom Rindflesch (the inventor of Semantic Medline) had already identified it from Semantic Medline as a very important breakthrough in early 2013!

20 Data Science Data Mining Process
Business Understanding:
– Broad Agency Announcement (PDF) and Slide Presentation (PPT)
Data Understanding:
– Semantic Medline, Open Catalog, CSHALS* 2014, and “Starter kit” (to be provided)
Data Preparation:
– Knowledge Base of the Above
Modeling:
– Semantic Medline, Data Papers, and NanoPublications
Evaluation:
– Searchability, Discovery, and Reasoning
Deployment:
– Story and Knowledge Base in MindTouch, Excel, Spotfire, and Be Informed
* Conference on Semantics in Healthcare and Life Sciences

21 The Initial Knowledge Base - Data Ecosystem
http://semanticommunity.info/Data_Science/A_Data_Science_Big_Mechanism_for_DARPA

22 Where did we find some structured data?
http://www.darpa.mil/opencatalog/

23 Where did we store the structured data?
http://semanticommunity.info/@api/deki/files/28732/DARPA.xlsx

24 Modeling: Approaches
Semantic Medline
– Semantic MEDLINE Query: mesothelioma and Data Science for VIVO
Data Papers:
– Sepublica 2014: The Semantics for e-science in an intelligent Big Data Context http://sepublica.mywikipaper.org/
Nanopublications:
– The smallest unit of publishable information: an assertion about anything that can be uniquely identified and attributed to its author. http://nanopub.org/wordpress/?page_id=65
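The nanopublication definition above (an assertion that is uniquely identified and attributed to its author) can be pictured as a small immutable record. The sketch below is only an illustration of that idea and does not follow the official nanopublication schema: the field names, the `ex:` placeholder terms, and the choice of a SHA-256 digest as the identifier are all assumptions.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Nanopublication:
    """One assertion (subject, predicate, object) attributed to one author."""
    subject: str
    predicate: str
    obj: str
    author: str

    @property
    def identifier(self) -> str:
        """Content-derived ID (assumption: a truncated SHA-256 of the fields)."""
        payload = "|".join((self.subject, self.predicate, self.obj, self.author))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]

# Placeholder terms, invented for illustration only.
example = Nanopublication("ex:DrugX", "ex:treats", "ex:DiseaseY", "ex:author-1")
print(example.identifier)
```

Deriving the identifier from the content means the same assertion by the same author always gets the same ID, while the same assertion by a different author gets a different one.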

25 How did we store the unstructured data?
http://semanticommunity.info/@api/deki/files/28470/BRAIN.xlsx
Well-defined URLs Knowledge and Glossary Relational and Graph Linked Data Footnote and References Metadata and Data Sources Ready for NodeXL & Spotfire

26 Modeling: Examples
Most Recent: 500 citations, Start Date: 01/01/1900, End Date: 11/30/2013, 3169 predications extracted. Summarized for Substance Interactions
Dr. Barend Mons: BRAIN
Dr. Tom Rindflesch: Semantic Medline

27 What did we find when we analyzed the data?
Web Player

28 What is our data story and product?
Data Ecosystem:
– BRAIN.xlsx
– DARPA.xlsx
Individual Tabs:
– DARPA Open Catalog: Bigdata SYSTAP is Category: Infrastructure and License: GPLv2
– DARPA Big Mechanism Knowledge Base: DARPA Big Mechanism Knowledge Base by Function (21); DARPA Big Mechanism Knowledge Base by Number of References (175)
– BRAIN Knowledge Base and Examples: BRAIN Knowledge Base by Function (References); Data Fairport Conference Dropbox Files by Type (PPTX)
– Data Science for VIVO & IVMOOC: Citations by Publisher (APS); Total Award Amount ($) by Principal Investigator (Geoffrey Fox)

29 Graph Databases
http://semanticommunity.info/Data_Science/Graph_Databases#Story
http://semanticommunity.info/Data_Science/Graph_Databases/Tutorial
Absent: Bigdata SYSTAP, Virtuoso, YarcData, etc.
12 Leading BI Tools and Analytic Platforms I Tested for OMB

30 Bigdata SYSTAP Literature Survey of Graph Databases
http://semanticommunity.info/Data_Science/Bigdata_SYSTAP_Literature_Survey_of_Graph_Databases#Story
Awarded Best Paper in 2004! And 10 Years Later...

