Big Data Conference: Analytics and Applications for Federal Big Data Dr. Brand Niemann Director and Senior Enterprise Architect – Data Scientist Semantic.


Big Data Conference: Analytics and Applications for Federal Big Data Dr. Brand Niemann Director and Senior Enterprise Architect – Data Scientist Semantic Community AOL Government Blogger March 5-6,

Preface
I have attended and written about many big data conferences. One of the biggest in terms of number of conferences (3 per year for 3 years), attendees (1000s), and Tweets (1000s) is the recent Strata 2013: Making Data Work. I thought it would be valuable to distill that conference in preparation for next week's Government Big Data Symposium and next month's Big Data Analytics and Applications for Defense, Intelligence and Homeland Security Symposium. 2

Strata 2013 Conference: Making Data Work
The Strata 2013 Conference topics were:
– Hadoop in Practice
– Beyond Hadoop
– Connected World
– Data Science
– Design
– Law, Ethics, and Open Data
– Internet of Things
– Data Driven Business Data Design
– Enterprise IT
Someone new to big data would find some unfamiliar terms in the above list, such as Hadoop, which Wikipedia defines as:
– Apache Hadoop is an open-source software framework that supports data-intensive distributed applications. 3

Big Data Symposia
The Government Big Data Symposium topics are:
– The Latest Federal Government Strategies, Plans, Needs and Initiatives
– Technical Challenges and Mission Strategies
– Advanced Tools and Techniques
– Implementation Strategies and Lessons Learned
The Big Data Analytics and Applications for Defense, Intelligence and Homeland Security Symposium topics are:
– Government Needs, Initiatives, Opportunities and Challenges
– Emerging Applications for Defense and Intelligence
– The Latest Tools, Techniques and Technologies
– Data Collection/Discovery, Deep/Predictive Analytics, Cloud, Scalability, Security, etc. 4

The Connection
The connection between these three conferences is that:
– Gartner Says Big Data Makes Organizations Smarter, But Open Data Makes Them Richer, and
– The Gartner Magic Quadrant for Business Intelligence and Analytics Platforms shows how to do that with tools that work well with open data. 5

Strata 2013 Conference: Broad Data
Strata 2013: Making Data Work emphasized the need for smart and adult data, and for fast and agile methods and tools that can solve difficult problems and provide real business value - just data is not a business model. There were 11 sessions of particular interest and relevance to government big data, especially one that I had written about previously, entitled Broad Data: What Happens When the Web of Data Becomes Real?, by James Hendler (RPI), which said:
– Recently we have begun to see the emergence of a new online data challenge - that of the “Broad data” that emerges from millions and millions of raw datasets available on the World Wide Web. For broad data, the new challenges that emerge include Web-scale data search and discovery, rapid and potentially ad hoc integration of datasets, visualization and analysis of only-partially modeled datasets, and issues relating to the policies for data use, reuse and combination. In this talk, we present the broad data challenge and discuss potential starting points for solutions. We illustrate these approaches using data from a “meta-catalog” of over 1,000,000 open datasets that have been collected from about two hundred governments from around the world. 6

Big Data is Going Broad According to Government Internet Guru Jim Hendler 7

World Wide Web Expert Jim Hendler Receives Inaugural Strata "Big Data" Award
“RPI is one of the top places in the world for new data engineering research, and I'm excited to get this award as yet more evidence of the continuing growth in the strength of data science expertise here.” 8

IOGDS: International Open Government Dataset Search
Countries, Catalogs, Agencies, Categories, Data Sets? 9

Critique
I found the RPI International Data Set Catalog difficult to use, and the research paper that explains why linking open data is important difficult to re-create because the raw data was not provided. The first problem with data catalogs is that their format is not standardized - they do not have a standard set of descriptive items as, for example, a library card catalog does. Second, they do not all contain sufficient metadata (data about the data) to allow one to work with the data without a subject matter expert. Third, they do not always contain links to the actual data in a format that can be readily used. And fourth, they may not contain a data dictionary. 10
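The four problems above can be read as a checklist to run against any catalog entry. The following is only a minimal sketch under stated assumptions: the `CatalogEntry` structure, the item names in `CARD_ITEMS` and `METADATA_ITEMS`, and the set of "readily used" formats are all hypothetical and chosen for illustration; they do not come from the RPI catalog or any other catalog discussed here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical "library card" items every entry would carry if catalogs were standardized.
CARD_ITEMS = ("title", "publisher", "date", "license")
# Hypothetical metadata items needed to work with the data without a subject matter expert.
METADATA_ITEMS = ("description", "units")
# Formats assumed, for illustration only, to be readily usable.
READY_FORMATS = {"CSV", "XLS", "XLSX", "JSON"}


@dataclass
class CatalogEntry:
    descriptive: Dict[str, str] = field(default_factory=dict)  # items as published
    data_url: Optional[str] = None             # link to the actual data, if any
    data_format: Optional[str] = None          # e.g. "CSV", "HTML"
    data_dictionary_url: Optional[str] = None  # link to a data dictionary, if any


def audit(entry: CatalogEntry) -> List[str]:
    """Return which of the four catalog problems apply to one entry."""
    problems = []
    if any(not entry.descriptive.get(item) for item in CARD_ITEMS):
        problems.append("format not standardized")
    if any(not entry.descriptive.get(item) for item in METADATA_ITEMS):
        problems.append("insufficient metadata")
    if not entry.data_url or (entry.data_format or "").upper() not in READY_FORMATS:
        problems.append("no usable link to the data")
    if not entry.data_dictionary_url:
        problems.append("no data dictionary")
    return problems


# A made-up entry with only a title: all four problems apply.
print(audit(CatalogEntry(descriptive={"title": "Untitled data set"})))
```

Running the same check over every row of a catalog spreadsheet would give a per-catalog tally of the four problems, which is one way to compare catalogs before attempting any integration.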

A Solution: Data Science
I have found that the best sources of metadata and data for data sets that can be integrated are government statistical agencies, such as those of the United States (Census Bureau Annual Statistical Abstract), Europe (Eurostat Annual Statistical Yearbook), and Japan (Statistical Agency Annual Statistical Yearbook). I have suggested a data science and system of systems approach to this problem and the need for data integration, as follows:
– World Catalog - that helps identify the best Individual Catalogs - that help identify the best Value-added Data Sets.
– This makes it more than just "an IT project that turns the crank on lots of data". 11

A Solution: Spotfire
This is illustrated in a Spotfire dashboard I created using data sets and visualizations for each of these three parts of the system. Spotfire is a leader in the Gartner Magic Quadrant for Business Intelligence and Analytics Platforms for the reasons given in their report. It also shows Data Visualization Design Using Shneiderman’s Mantra: Overview First, Zoom and Filter, Then Details-on-Demand, presented at the Strata 2013 Conference. 12

Gartner Magic Quadrant: Business Intelligence and Analytics Platforms 13

My Process
So here is what I did:
– Started with:
  – Big Data Symposia Content
  – Gartner Article and Magic Quadrant Reports
  – Strata 2013 Conference Highlights
  – Data Catalogs and Research Notes
– Copied it to MindTouch to make it Digital Government Strategy compliant
– Made it Big Data and Machine Readable 14

Results
The results are presented in:
– A MindTouch Knowledge Base
– An Excel Spreadsheet
– A Spotfire Dashboard
– Tutorial slides in PowerPoint 15

Big Data Symposia: Knowledge Base in MindTouch 16

IOGDS: Excel Spreadsheet
Note: This is linked data! 17

DataCatalogs.org: Excel Spreadsheet
Note: This is linked data! 18

DataCatalogs.org: Spotfire
Note: Most have no Metadata License!
Note: Most are European! 19

IOGDS Countries and Catalogs: Spotfire
Note: This does not really tell about the data! 20

IOGDS France: Spotfire
Note: 352,285 rows by 20 columns!
Note: Mostly CSV, HTML, and XLS! 21

US Data.gov Catalog: Spotfire
Note: Mostly US EPA, where I worked! 22

U.S. Census Bureau/Small Area Health Insurance (SAHIE) Program-Spotfire
Note: 204,295 rows by 19 columns!
Note: Percent Uninsured! 23

Conclusions and Recommendations
Jim Hendler’s “meta-catalog” of over 1,000,000 open datasets that have been collected from about two hundred governments from around the world cannot be verified. Specifically, he says US Data.gov has 441,339 data sets, but the catalog has only 5,999!
– This big data analytics application shows four problems with data catalogs: their format is not standardized; they have insufficient metadata (data about the data) to allow one to work with the data without a subject matter expert; they do not always contain links to the actual data in a format that can be readily used; and they may not contain a data dictionary. The previous example of the SAHIE Program data set does!
– Bottom Line: All the work with Data Catalogs does not really help with data integration, as I have been able to show! Recall Slide 11! 24
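The size of the gap between the two US Data.gov counts quoted above can be made explicit with a little arithmetic. This is only a small worked sketch: the two numbers are the ones stated on this slide, and the function name `discrepancy` is a hypothetical helper introduced for illustration.

```python
def discrepancy(claimed: int, found: int):
    """Return (share of claimed data sets actually found, claimed-to-found ratio)."""
    return found / claimed, claimed / found


# Counts quoted on the slide: 441,339 claimed for US Data.gov, 5,999 in the catalog.
share, ratio = discrepancy(claimed=441_339, found=5_999)
print(f"{share:.1%} of the claimed data sets are in the catalog")
print(f"the claimed count is about {ratio:.0f} times the catalog count")
```

On these figures the catalog holds roughly 1.4 percent of the claimed total, a difference of about 74 times.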