Federal Big Data Working Group Meetup: Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community

Presentation transcript:

1 Federal Big Data Working Group Meetup
Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community
http://semanticommunity.info/
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup
January 7, 2014

2 Mission Statement
– Federal: Supports the Federal Big Data Initiative, but is not endorsed by the Federal Government or its Agencies;
– Big Data: Supports the Federal Digital Government Strategy, which is "treating all content as data", so big data = all your content;
– Working Group: Data Science Teams composed of Federal Government and non-Federal Government experts producing big data products (see Possible Team Presentations below); and
– Meetup: The world's largest network of local groups, which helps revitalize local communities and helps people around the world self-organize, much like the MOOCs (Massive Open Online Courses) being considered by the White House.

3 Co-organizers: Brand Niemann and Kate Goodier
– Host: Kate Goodier, Excelerate Solutions offices in Tysons Corner. Capacity is about 50, with Skype and Wi-Fi available.
– The Silver Line Spring Hill Metro Station (planned to open in March 2014) is across the street, at Route 7 and Spring Hill Road. See the photo below, taken from the Excelerate Solutions office looking south to the station.
– Directions: The building is easy to reach and has open underground parking.
– Logistics: Refreshments, restrooms, etc.

4 Suggested Format
– 6:30 p.m. Tutorials and Refreshments (I will start with the Proposed GMU Course, and hope that others will offer to do tutorials as well)
– 7:00 p.m. Introductions and Announcements (10 seconds per individual, depending on the size of the group). Remarks by Dr. George Strawn, Director, NITRD/NCO, and co-chair of the Federal Big Data Senior Steering Work Group
– 7:15 p.m. Featured Presentation/Demonstration (where did you get the data, where did you store the data, and what were your results?). We start with our Semantic Big Data Science Application: Semantic Medline on the YarcData Graph Appliance for the Federal Big Data Senior Steering Work Group, which our Semantic Data Science Team recently presented well to Lee Watkins Jr., Director of Bioinformatics at the Institute of Genetic Medicine Center for Inherited Disease Research (CIDR).
– 8:30 p.m. Networking/Individual Demos (talk among yourselves and look at one another's work)
– 9:00 p.m. Continue Your Conversations Elsewhere (we need to clear out of the space)

5 Next Meetups
– Second Meetup: Tuesday, February 4, 6:30 p.m. Continue Data Science Tutorial: Practical Data Science for Data Scientists. Topic: What Went Wrong with the Obamacare Web Site, and How Can It Be Fixed? Presentations: Why the First Rollout of HealthCare.gov Crashed, an Architectural Assessment (Eric Kavanagh, Inside Analysis, and Geoffrey Malafsky, PSIKORS Institute); Healthcare.gov Data Science (Brand Niemann, Semantic Community); and Healthcare.gov Prototype Video (Kees van Mansom, Be Informed)
– Third Meetup: Tuesday, February 18, 6:30 p.m. Continue Data Science Tutorial: Modus Operandi Semantic Knowledge Base. Wave All-Source Semantic Fusion Engine: Eric Little, Modus Operandi, and Department of Defense Metadata Engineers
– Fourth Meetup: March 4, 6:30 p.m. Continue Data Science Tutorial: Graph Databases and Bigdata SYSTAP. Literature Survey of Graph Databases: Bigdata SYSTAP, Michael Personick and Bryan Thompson, SYSTAP
– April Workshop: Date and location TBA. 2nd Cloud: SOA, Semantics, Data Science, and Business Concept Computing (16th SOA for eGov Conference)

6 Practical Data Science for Data Scientists
http://semanticommunity.info/Data_Science/Practical_Data_Science_for_Data_Scientists

7 Resources
– Required Textbook: Doing Data Science: http://shop.oreilly.com/product/0636920028529.do
– Free Sampler (PDF): http://cdn.oreillystatic.com/oreilly/booksamplers/9781449358655_sampler.pdf
– Optional Supplemental Reading:
  – Data Science Starter Kit: http://shop.oreilly.com/category/get/data-science-kit.do
  – DC Data Community: http://datacommunitydc.org/blog/about/
– DC Data Community Calendar: http://datacommunitydc.org/blog/calendar/
– Technology Requirements: Internet and free tools like Spotfire Cloud (https://spotfire.cloud.tibco.com/tsc/#!/compproductrequest) and NodeXL (http://nodexl.codeplex.com/)

8 Class 1 (1/21): What is Data Science and the Data Science Process?
– Discuss Reading: Chapters 1 and 2
– My Resources: http://semanticommunity.info/Data_Science and http://semanticommunity.info/Analytics/Predictive_Analytic_World_Government_2013#Story
– Hands-on Class Exercise: Individual and Team Profiles, and Case Study: RealDirect

9 Tutorial Overview: Data Science and the Data Science Process
My Profile: Breaking Government/AOL Government Data Stories and Products
– Select some interesting content and make it structured
– Select a related data set/table
– Explore both and write a story about it: Where did you get the data? Where did you store the data? What were your results? What were the steps?
Assignment: Do something like My Profile

10 Overview: Data Science
http://semanticommunity.info/Data_Science
Key Concepts Extracted: What is Data Science? The future belongs to the companies and people that turn data into products. See Sidebar Topics.

11 Overview: Data Science Process
http://semanticommunity.info/Analytics/Predictive_Analytic_World_Government_2013#Story
So my three overlapping circles are: "Find and Prepare Data Sets", "Store and Query Data Sets", and "Discover Data Stories in the Data Sets". See the mapping between the three Venn diagrams in the table below.
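The three circles can be sketched as three small functions chained into one pipeline. This is a minimal sketch, not the author's workflow: the CSV columns (`file_name`, `site`, `excel_kb`), the sample rows other than the GDELT-Spotfire name, and the function names are hypothetical, and an in-memory SQLite database stands in for whatever store a team actually uses.

```python
import csv
import io
import sqlite3

# Hypothetical sample input standing in for a spreadsheet exported as CSV.
RAW_CSV = """file_name,site,excel_kb
GDELT-Spotfire, BreakingGov ,120
Example-A-Spotfire,BreakingGov,80
Example-B-Spotfire,AOL Gov,150
"""


def find_and_prepare(raw: str) -> list[dict]:
    """Circle 1: parse the raw text, trim stray whitespace, and convert types."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw)):
        rows.append({
            "file_name": row["file_name"].strip(),
            "site": row["site"].strip(),
            "excel_kb": int(row["excel_kb"]),
        })
    return rows


def store_and_query(rows: list[dict]) -> list[tuple]:
    """Circle 2: load the rows into an in-memory SQLite table and aggregate by site."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (file_name TEXT, site TEXT, excel_kb INTEGER)")
    conn.executemany("INSERT INTO files VALUES (:file_name, :site, :excel_kb)", rows)
    result = conn.execute(
        "SELECT site, COUNT(*), SUM(excel_kb) FROM files GROUP BY site ORDER BY site"
    ).fetchall()
    conn.close()
    return result


def discover_story(summary: list[tuple]) -> str:
    """Circle 3: turn the query result into one narrative line per site."""
    return "\n".join(
        f"{site}: {count} file(s), {total_kb} KB in total"
        for site, count, total_kb in summary
    )


if __name__ == "__main__":
    print(discover_story(store_and_query(find_and_prepare(RAW_CSV))))
```

Keeping each circle in its own function makes the overlap explicit: each stage consumes only the previous stage's output, so any one of them can be swapped out (for example, a graph store in place of SQLite) without touching the others.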

12 Select some interesting content
http://breakinggov.com/2012/03/30/defense-department-bets-big-on-big-data/

13 Make it structured
http://semanticommunity.info/@api/deki/files/27612/SpotfireCloud.xlsx

14 Select a related data set/table
http://semanticommunity.info/@api/deki/files/27612/SpotfireCloud.xlsx
My Note: This data set supports these views:
– Categorized (Faceted Search)
– Correlation (Two Numeric Variables)
– Relational (Columns and Rows)
– Linked (URLs)
– Semantic Web (Subject, Predicate, and Object)
– Graph/Network Analytics (Edge and Node Tables)
– Geospatial (Could add Latitude and Longitude)
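Two of the views in the note, Semantic Web triples and graph node/edge tables, can be derived from the same spreadsheet rows. A minimal sketch, assuming hypothetical columns `file`, `page`, and `category`; the function names are illustrative and are not part of the Spotfire, NodeXL, or any RDF library API.

```python
# Hypothetical rows: which Spotfire file is embedded on which page.
ROWS = [
    {"file": "GDELT-Spotfire", "page": "Data_Science", "category": "Events"},
    {"file": "Example-Spotfire", "page": "Data_Science", "category": ""},
]


def to_triples(rows: list[dict[str, str]], key: str) -> list[tuple[str, str, str]]:
    """Semantic Web view: one (subject, predicate, object) triple per non-empty, non-key cell.

    The key column supplies the subject and each other column name becomes the predicate.
    """
    triples = []
    for row in rows:
        subject = row[key]
        for column, value in row.items():
            if column != key and value:
                triples.append((subject, column, value))
    return triples


def to_node_edge_tables(rows: list[dict[str, str]], source_col: str, target_col: str):
    """Graph view: a sorted node table of (id, type) and an edge table of (source, target)."""
    nodes: dict[str, str] = {}
    edges: list[tuple[str, str]] = []
    for row in rows:
        source, target = row[source_col], row[target_col]
        # setdefault keeps the first type seen if an id appears in both columns.
        nodes.setdefault(source, source_col)
        nodes.setdefault(target, target_col)
        edges.append((source, target))
    return sorted(nodes.items()), edges


if __name__ == "__main__":
    for triple in to_triples(ROWS, key="file"):
        print(triple)
    node_table, edge_table = to_node_edge_tables(ROWS, "file", "page")
    print(node_table)
    print(edge_table)
```

The same rows stay the single source of truth; the relational view is the rows themselves, while the triple and node/edge views are derived on demand for semantic or network-analysis tools.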

15 AOL Gov to BreakingGov Migration (Web Player)
Note: The lack of correlation between Excel file size and Spotfire file size is due to the presence of large boundary (shape) files.
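The note's claim can be checked by computing a Pearson correlation over the two size columns, then recomputing it with the suspected shape-file outliers excluded. A sketch under assumptions: the sizes are passed in as parallel lists, which files carry large boundary files must be supplied by hand (`exclude`), and the helper names are hypothetical.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def pearson_excluding(names: list[str], excel_kb: list[float],
                      spotfire_kb: list[float], exclude: set[str]) -> float:
    """Recompute the correlation after dropping the named files (e.g. shape-file outliers)."""
    keep = [i for i, name in enumerate(names) if name not in exclude]
    return pearson([excel_kb[i] for i in keep], [spotfire_kb[i] for i in keep])
```

If the coefficient rises sharply once the shape-file dashboards are excluded, that supports the note's explanation; if it stays low, something else is driving the Spotfire file sizes.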

16 Spotfire Silver to Spotfire Cloud Migration (Web Player)

17 Explore both and write a story about it
– Where did you get the data? The Web and spreadsheets
– Where did you store the data? Spreadsheets
– What were your results? All files were accounted for in the two migrations (data quality), versatile formats were created, and visualizations help me and others build on this data science work
Steps:
– Search MindTouch for the Spotfire file name, like GDELT-Spotfire
– Find where it was used, at one or more locations
– Change the Web Player links in the Spotfire dashboard, story, and slides
– Test to see if the embedded file works
– Repeat the process 283 times!
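The find-and-change steps, plus the "all files accounted for" check, can be partly automated. A minimal sketch, assuming page bodies have already been exported as text keyed by page title and that an old-to-new Web Player URL mapping exists; `migrate_links` is a hypothetical helper, the example.org URLs are placeholders rather than real Spotfire addresses, and nothing here calls MindTouch.

```python
import re


def migrate_links(pages: dict[str, str], old_to_new: dict[str, str]):
    """Rewrite old Web Player URLs to new ones in every page body.

    Returns (updated_pages, usage, unaccounted): usage maps each old URL to the
    page titles it appeared on, and unaccounted lists old URLs found on no page.
    """
    usage: dict[str, list[str]] = {old: [] for old in old_to_new}
    if not old_to_new:
        return dict(pages), usage, []
    # One alternation with the longest URLs first, so a URL that is a prefix of another
    # cannot shadow it; a single pass also never re-replaces text it just inserted.
    pattern = re.compile(
        "|".join(re.escape(url) for url in sorted(old_to_new, key=len, reverse=True))
    )
    updated = {}
    for title, body in pages.items():
        def swap(match: re.Match, title: str = title) -> str:
            old = match.group(0)
            if title not in usage[old]:
                usage[old].append(title)
            return old_to_new[old]

        updated[title] = pattern.sub(swap, body)
    unaccounted = sorted(old for old, where in usage.items() if not where)
    return updated, usage, unaccounted


if __name__ == "__main__":
    pages = {
        "Story": "See https://old.example.org/GDELT-Spotfire for the dashboard.",
        "Slides": "No embedded files here.",
    }
    mapping = {
        "https://old.example.org/GDELT-Spotfire": "https://new.example.org/GDELT-Spotfire",
        "https://old.example.org/Unused-Spotfire": "https://new.example.org/Unused-Spotfire",
    }
    new_pages, usage, unaccounted = migrate_links(pages, mapping)
    print(new_pages["Story"])
    print("Not found on any page:", unaccounted)
```

Testing that each embedded file still renders stays a manual step, as on the slide; the `unaccounted` list is what lets the "all files were accounted for" claim be checked rather than assumed.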

18 Preview of What You Are Going To Hear
The Best Way to Get BIG DATA is By Starting Small:
– BIG DATA
– Subcommittee on Networking and Information Technology Research and Development (NITRD Subcommittee)
These three activities fostered Semantic Medline on the YarcData Graph Appliance for the White House Big Data Initiative.
– Data Science Team Example
– Generic Problems
– Semantic Medline: YarcData Graph Appliance Application for Federal Big Data Senior Steering WG
– Modus Operandi: Mantra, Performance, and Vision
– Knowledge Base: Modus Operandi Web Intelligence in MindTouch
– Big Data in Memory: Innovation Story

