Data Science, Data Infrastructure, & Data Publications for the HHS IDEA Lab Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community

Presentation transcript:

Data Science, Data Infrastructure, & Data Publications for the HHS IDEA Lab Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community December 1,

The Profit and Data Enterprises
Marcus Lemonis (born November 16, 1973) is a Lebanese-born American businessman, investor, television personality and philanthropist. He is currently the chairman and CEO of Camping World and Good Sam Enterprises, and the star of The Profit, a CNBC reality show about saving small businesses through People, Process, and Products.
The Federal Big Data Working Group Meetup is also about helping government agencies develop:
– People: Data Scientists
– Process: Data Infrastructure
– Products: Data Publications
Some examples:
– EPA
– FDA
– NOAA
– HHS
It also provides MOOCs for training and networking. 2

Five MOOCs for Big Data Applications and Analytics
– Practical Data Science for Data Scientists by Niemann, based on the Schutt and O’Neil book
– Data Science for Data Mining by Niemann, based on the North book and the Borne class
– Federal Big Data Working Group Meetups by Niemann and Goodier
– Tackling the Challenges of Big Data, MIT ProfessionalX Online Course, by Niemann, based on the Rus and Madden MOOC
– Data Science for Big Data Application and Analytics MOOC by Niemann, based on the Geoffrey Fox MOOC
See: Top 5 MOOCs for Data Science 3

Agenda
6:30 p.m. Welcome and Introduction
– Report on Recent HHS IDEA Lab Demo Meeting with Bryan Sivak (invited) and Damon Davis (invited)
– HHS Data Science Data Publication Tutorial (Slides, Background)
– Data Science for Tackling the Challenges of Big Data (MIT Online Course) (Slides, Background)
7:00-7:15 p.m. Joe Pringle, Director of Health, Socrata: Slides and Demo Links
7:15 p.m. Brief Member Introductions and Refreshment Break
7:30 p.m. Alex Sherman and Kartik Verma, Deloitte Consulting for HHS NIH and MHS: Slides and Demo Link
– GINAS: Advancing FDA's Ingredient Information System, Noel Southall, National Institutes of Health (with FDA involvement) (invited). FDA has articulated its vision for a next-generation data system that serves as the central clearinghouse for ingredients in medical products. Meanwhile, the National Center for Advancing Translational Sciences at NIH has created its own substance tracking system to facilitate research efforts. Working with FDA, this NIH team will test its software as a solution in the FDA environment.
– Fostering Scientific Insight through Data Federation, Brock Smith, National Institutes of Health (invited). This cross-departmental team, made up of individuals representing NIH, FDA, and CDC, has identified a problem affecting scientists and their research goals. Because of the breadth and variety of resources, NIH researchers have difficulty synthesizing existing public data with their internally produced research findings and can therefore lose valuable scientific insight. The team is testing the value of SEMOSS, a web platform designed to aggregate existing, fragmented health data while leveraging data analytics and visualization tools so that scientists can analyze and synthesize that data intuitively in their research.
8:30 p.m. Open Discussion
8:45 p.m. Networking
9:00 p.m. Depart 4

Calendar
– November 4 to December 16, 2014: Tackling the Challenges of Big Data, MIT ProfessionalX Online Course, $545. (January 5th Meetup)
– November 4: "Diverse Data Analytics Applications", a joint George Mason University and IBM ASC Symposium. (This Meetup) Slides
– November 13: Automated Data Science: Data Science as a Service™ (DSaaS), Virginia Big Data Meetup. (This Meetup)
– November 18-19: Symposium on Predictive Analytics For Defense and Government. (December 15th Meetup)
– January 5: Data Science for NSF Polar Cyberinfrastructure and MIT Big Data Course 5

Data Science for NOAA Chief Data Officer and Big Data Predictive Analytics Meetup
– Excellent presentation. Predictive Analytics in the Era of Big Data by Dave Vennergrund was precise and informative. Thanks for presenting.
– Great presentation on NOAA Big Data by Brand! Thanks!!
– Has anyone come across a list or database of all US businesses (free)?
– Very good quality talks.
– Hi, this is Ram. I am a Java developer with 5 years of experience. I want to get into Big Data development projects. What is the best path for me to start? Sorry if this question has already been asked.
– Great video/sound app!
– The presentations were excellent as always. 6

Semantic Insights Follow-up
Looking for interested individuals who wish to participate in our Natural Language Understanding and Reasoning research. We welcome educational institutions and individual researchers interested in working collaboratively with us.
Accounts are available for beta testing:
– Applying High-speed Pattern Recognition to Generate Queryable Semantics from Big Data: Big Data is filtered and reduced in real time for event and pattern discovery. 7

Data Science for Data Mining: Overview
I suggested "Data Mining for the Masses" by Matthew North. It uses the CRISP-DM (Cross-Industry Standard Process for Data Mining) conceptual model, which is also used in the Data Science for Business book I did the tutorial for. GMU Professor Borne uses the title in his talks and the book in his Data Science class.
– Available at Amazon.com
– Book datasets are available
– Recent book review
– Free PDF download of the book 8

Data Science for Data Mining: Tutorial
I will do a tutorial on this and would welcome anyone else doing and presenting one as well. The steps I followed are:
– I merged the 14 CSV files into one Excel spreadsheet.
– I copied the book's PDF content into MindTouch, first creating the table of contents structure and then copying individual sections of the book to support the exploratory data analysis I did with Spotfire.
– Instead of doing the book's text mining exercises on the four text files in Chapter 12, I text mined the entire publication by building a structured knowledge base in the Excel spreadsheet.
Question: Can we do RapidMiner work with Spotfire? The answer is yes, as shown in the Spotfire screen captures below. 9
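The first tutorial step, merging the book's 14 CSV files into one table, can be sketched in Python. This is a minimal sketch, not the procedure actually used (which produced an Excel spreadsheet); the function name `merge_csv_files` and the added `source_file` column are illustrative assumptions. It takes the union of all column headers, tags each row with the file it came from, and writes one CSV file that Excel or Spotfire can open.

```python
import csv
from pathlib import Path


def merge_csv_files(paths, out_path):
    """Merge several CSV files into one, tagging each row with its source file.

    Columns are the union of all headers, in first-seen order; cells that a
    given file does not have are written as empty strings.
    Returns the number of data rows written.
    """
    rows = []
    columns = ["source_file"]  # hypothetical provenance column
    for path in paths:
        path = Path(path)
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            for name in reader.fieldnames or []:
                if name not in columns:
                    columns.append(name)
            for row in reader:
                row["source_file"] = path.name
                rows.append(row)
    with Path(out_path).open("w", newline="", encoding="utf-8") as f:
        # restval fills missing columns; extrasaction="ignore" drops any
        # overflow cells from ragged rows instead of raising an error.
        writer = csv.DictWriter(f, fieldnames=columns, restval="",
                                extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `merge_csv_files(sorted(Path("datasets").glob("*.csv")), "merged.csv")`, where `datasets` is a hypothetical folder holding the downloaded book data sets. Keeping the source file name as a column lets the merged table still be filtered back to a single chapter's data set.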

GMU CDS 401 Syllabus 10

GMU CDS 401 Reading Assignments 11

Data Science for Data Mining: Knowledge Base for Finding
Data Science for Data Mining (Google Chrome Find: Regression) 12
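Finding a term such as "Regression" in a structured knowledge base can also be done programmatically across all sections at once, rather than page by page with the browser's Find box. A minimal sketch, assuming each book section is stored as one row with a chapter, a title, and its text; the `Section` type and `find_term` function are hypothetical and not part of MindTouch, Excel, or Spotfire.

```python
import re
from dataclasses import dataclass


@dataclass
class Section:
    """One row of a hypothetical knowledge base: a book section."""
    chapter: str
    title: str
    text: str


def find_term(sections, term):
    """Return (chapter, title, hit_count) for each section mentioning term.

    Matching is case-insensitive and on whole words only.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for s in sections:
        count = len(pattern.findall(s.title)) + len(pattern.findall(s.text))
        if count:
            hits.append((s.chapter, s.title, count))
    # Most mentions first; sorted() is stable, so ties keep knowledge-base order.
    return sorted(hits, key=lambda h: -h[2])
```

For example, `find_term(kb, "Regression")` returns every section that mentions the word, ordered by how often it appears. Whole-word matching is a deliberate choice with a cost: it also skips plurals such as "regressions", so a real knowledge base might prefer stemming.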

Data Science for Data Mining: Spreadsheet for Finding
Excel Spreadsheet; also the 14 CSV files merged for Spotfire 13

Data Science for Data Mining: Spotfire Screen Captures
2.1. Cover Page
2.2. Chapter 03 Data Set
2.3. Chapter 04 Data Set
2.4. Chapter 05 Data Set
2.5. Chapter 06 Data Set
2.6. Chapter 07 Data Set Scoring
2.7. Chapter 07 Data Set Training
2.8. Chapter 08 Data Set: MyModel
2.9. Chapter 09 Data Set Scoring
2.10. Chapter 09 Data Set Training
2.11. Chapter 10 Data Set Scoring
2.12. Chapter 10 Data Set Training
2.13. Chapter 11 Data Set Scoring
2.14. Chapter 11 Data Set Training
2.15. Chapter 11 Exercise Training Data 14
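Several of the captured chapter data sets come in Training and Scoring pairs: a model is fit on the training rows and then applied to the scoring rows. A minimal sketch of that pattern, using one-predictor ordinary least-squares regression purely as an illustration (the book does this work in RapidMiner, and regression here is not taken from any particular chapter); `fit_line` and `score` are hypothetical names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x on training rows; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two training rows with matching x and y")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance in the training data")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


def score(model, xs):
    """Apply a fitted (a, b) model to the rows of a scoring data set."""
    a, b = model
    return [a + b * x for x in xs]
```

For instance, fitting training rows that lie on y = 1 + 2x gives (a, b) = (1, 2), and scoring x = 10 then yields 21. Keeping fitting and scoring as separate functions mirrors the separate Training and Scoring data sets: the scoring rows never influence the model.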

Data Science for Data Mining: Spotfire Dashboard
Web Player 15

2014 George Mason and IBM Symposium on Diverse Data Analytics Applications
In the last decade, the data explosion and robust analytics tools have made "Big Data and Analytics" among the most popular terms in the computer engineering and IT industry. In addition, the cloud, social, and mobile environments generate a tremendous amount of personalized, geospatial, and temporal data that is extremely valuable to education, business operations, government services, and the intelligence community. According to the Forum for Innovation, 90 percent of the world's data has been produced in the last two years. Operational need and market demand continue to grow. To create an opportunity to share Big Data and Analytics knowledge and technologies among academic institutions, industry leaders, and government customers, George Mason University and IBM will host the conference "Big Data and Analytics 2014" on November 4th at George Mason University. We have invited leaders and experts from academia, industry, and government to discuss how big data and analytics hold the key to unlocking value. If you are a student, you will learn about topics such as data mining, statistical models, predictive analytics, and data visualization; if you are a researcher, you can compare notes with analytics experts from George Mason University, IBM, and other colleagues; if you are a technology provider, you will get an update on the cutting edge of analytics in industry and research.
Web Site and Slides 16

GMU Data Analytics Engineering, MS 17

Data Science for the HHS IDEA Lab: Knowledge Base
Data Science for the HHS IDEA Lab
The HHS IDEA Lab is cultivating innovation for a more modern and effective government. It is striving to better harness the talent of the HHS workforce and to remove the barriers HHS employees face so they can act. It is doing this through a three-pronged approach:
– Encouraging internal entrepreneurship by investing in HHS employees;
– Recognizing that it doesn't have all the answers inside government, and bringing in external talent to help; and
– Building communities of like-minded people across HHS to take on issues of strategic importance. 18

HHS IDEA Lab
– HHS IDEA Lab Hosts Demo Day for 11 Teams to Pitch Potentially Game-Changing Projects for Continued Support to HHS Senior Leadership: Media Advisory for September 30, 2014, 10:30 AM to 12:30 PM
– What is the HHS IDEA Lab? The approach the IDEA Lab takes is based on four tenets:
Innovation is a direct result of the freedom to experiment.
Design is critical to effectively communicate ideas.
Entrepreneurship allows us to take advantage of underutilized talent.
Action, above all else, is encouraged.
– Data Science for the HHS IDEA Lab and Innovative Design, Development and Linkages of Databases Fellowship: My Tribute to George Thomas (July 2014): still not decided 19

HHS IDEA Lab Hosts Demo Day
"Shark Panelists":
– Dr. Taha Kass-Hout, FDA
– HHS IDEA Lab Director and CTO, Bryan Sivak (invited)
– HHS Health Data Initiative Director, Damon Davis, PAWG 2014 (invited)
– National Institutes of Health, Noel Southall: GINAS: Advancing FDA's Ingredient Information System (invited)
– National Institutes of Health, Brock Smith: Fostering Scientific Insight through Data Federation (SEMOSS) (invited)
– Alex Sherman, Deloitte Consulting LLP (accepted) 20

FDA Analytics with SEMOSS 21
