Mandates for Data Transparency in 113th Congress: DataCoalition.org Dr. Brand Niemann Director and Senior Enterprise Architect – Data Scientist Semantic.


Mandates for Data Transparency in 113th Congress: DataCoalition.org Dr. Brand Niemann Director and Senior Enterprise Architect – Data Scientist Semantic Community AOL Government Blogger January 4,

Semantic Community
– Our Mantra: Data Science Precedes the Use of SOA, Cloud, and Semantic Technologies! We use data science to help marketing and business development efforts.
– Our Mission is like Google's: Organize the world's information and make it universally accessible and useful.
– Our Method is like Be Informed 4: Architectural Diagrams and Questions and Answers are not enough; you need Dynamic Case Management!
– Our Sound Bite: It is not just where you put your data (cloud), but how you put it there!
– Our Work: Semantically enhancing your data and writing data science stories about it.

Mission Statement
1. Personal:
– Senior Data Scientist at the US EPA: Completed Data Science Academic Training and Many EPA Data Products
– Detail to Data.gov: Built Data.gov in An Information Platform
2. Putting Data To Work:
– Data Journalist for Federal Computer Week and AOL Government: Published Many Data Science Products and Built Own Data Journalism Handbook
– Data as a First Class Citizen: Data Science and Journalism for Analytic Standards and Audit of Open Data Sites: Working with CKAN, DoD, IC, NCOIC, NIST, OASIS, OMG, OSTP, W3C, etc.
3. The Emergence of Data Science:
– Built a Data Science Team for the Government Community: “Killer Semantic Web Application” (Semantic MedLine on the new Cray Graph Computer) for the Federal Big Data Senior Steering Group
– Challenges and Contests Using the Best High Quality Data Sets: Heritage Provider Network Health Prize, Health Data Initiative Forums, TedMed, Department of Commerce App Challenge, etc.

Invitation
Your conversation with Cory Casanave over NIEM and StratML caught my eye. Our Coalition is preparing a legislative agenda on data standardization for the 113th Congress. Chairman Issa of the Oversight Committee - for whom I wrote the DATA Act when I served as counsel to the committee - will be reintroducing the DATA Act and is interested in other standards-related legislative mandates as well. I'd like to get your advice on how Congress could push the executive branch toward publishing more actionable data. Legislative mandates are a powerful but very blunt tool.

My Response
This is what I recommended to Congressional Staff last September:
– BIG_DATA_at_the_Hill#Story
I am working on several briefings in January and February:
– ogy_SIG_Big_Data_Committee/Government_Challenges_With_Big_Data

DTC Response
Thanks very much for sharing these resources. I noticed that your three recommendations to Congress didn't include passing mandates for greater use of nonproprietary standards in publishing spending or regulatory data compilations that are currently one-star or zero stars. I'm eager to talk through whether some of the bills that our Coalition will be pursuing in the next Congress are consistent with your recommendations - and, if not, whether they could be made so.

My Response
You are welcome, and excellent point. Having worked for a regulatory agency (US EPA) for 30+ years, and with the new spending data recently, I thought we had gotten to open standards for publishing the data and moved on to the problem of data quality and completeness. See for example:
– First_Show_Us_All_the_Missing_Data
I am all for helpful legislation that focuses on the value of data analytics to government performance and accountability, and more so for organizational changes that foster “data work” instead of “IT projects” (e.g. Data.gov, NIEM, etc.) and for government use of data scientists and data science, as big data companies do to be successful. See for example:
– linkedin-and-the-use-of-big-da/

DTC Response
Great! You are absolutely right that Recovery.gov is the best use case of standardized federal spending data. But it only covers stimulus spending, and the whole platform and standard are about to be discarded - and not replaced by anything - unless we are able to pass some version of the DATA Act.

My Response
My work with open government data on the platform I use is not going away, so we could audit every agency’s financial data, like my latest example for the Bureau of Public Debt, and rate them for Congress.
I could do this: Data Transparency in Action
– The technology companies joining the Data Transparency Coalition are eager to design software packages and platforms that analyze federal data in new, powerful ways. Our members' products and services will use standardized, freely-available federal data to find waste and abuse in federal spending, illuminate systemic risk in the financial markets, and help government and the private sector make better decisions.
– This page will feature demonstrations of coalition members' data-driven solutions.
Source:
See:

My Example
MY NOTE: This begins to illustrate the use of my 5-Step Method to get to 5 stars with open data, using the DataTransparencyCoalition.org Web site and then the 8 financial data sets.

My 5-Step Method
So what I like to do to illustrate (data science) and explain (data journalism) is the following (like a recipe):
– Put the Best Content into a Knowledge Base (e.g. MindTouch*): The DataTransparencyCoalition.org Web Pages
– Put the Knowledge Base into a Spreadsheet (Excel*): Linked Data to Subparts of the Knowledge Base
– Put the Spreadsheet into a Dashboard (Spotfire*): Data Integration and Interoperability Interface
– Put the Dashboard into a Semantic Model (Excel*): Data Dictionaries and Models
– Put the Semantic Model into Dynamic Case Management (Be Informed*): Structured Process for Updating Data in the Dashboard
* Examples of tools used.
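
The second step of the recipe (knowledge base into a spreadsheet, with links back to subparts of the knowledge base) could be sketched roughly as below. This is only an illustration under assumptions: the base URL, section titles, and anchors are hypothetical placeholders rather than the actual MindTouch pages, and CSV stands in for Excel.

```python
import csv
import io

# Hypothetical knowledge-base location and sections (placeholders, not real pages).
KB_BASE_URL = "https://example.org/kb/DataTransparencyCoalition"
SECTIONS = [
    ("Story", "#Story"),
    ("Spreadsheet", "#Spreadsheet"),
    ("Table of Contents", "#Table_of_Contents"),
]


def kb_to_rows(base_url, sections):
    """Build one spreadsheet row per knowledge-base subpart, each with a link back to it."""
    return [{"section": title, "url": base_url + anchor} for title, anchor in sections]


def rows_to_csv(rows):
    """Serialize the rows as CSV text (a non-proprietary stand-in for the spreadsheet)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["section", "url"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = kb_to_rows(KB_BASE_URL, SECTIONS)
print(rows_to_csv(rows))
```

Each row keeps a URL to the subpart it came from, which is what lets a dashboard in the next step link back to the source content.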

Put the Knowledge Base into a Spreadsheet

To Get to 5-Stars With Open Data
Star: Definition (Example / Tool*)
– 1 star: Make your stuff available on the Web (whatever format) under an open license (This Story / MindTouch)
– 2 stars: Make it available as structured data (e.g., Excel instead of image scan of a table) (Spreadsheet / Excel)
– 3 stars: Use non-proprietary formats (e.g., CSV instead of Excel) (Table / MindTouch and Spotfire)
– 4 stars: Use URIs to identify things, so that people can point at your stuff (Table of Contents / MindTouch and Spotfire)
– 5 stars: Link your data to other data to provide context (Table / MindTouch and Spotfire)
* Examples of tools used.
Source of Star and Definition:
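
The star definitions above are cumulative, so a rating can be sketched as counting criteria in order until one fails. The following is a minimal illustration, not a tool from the presentation; the `Dataset` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published data set (field names are illustrative)."""
    on_web_open_license: bool     # 1 star: on the Web under an open license
    structured: bool              # 2 stars: structured data, not an image scan
    non_proprietary_format: bool  # 3 stars: e.g. CSV instead of Excel
    uses_uris: bool               # 4 stars: URIs identify things
    linked_to_other_data: bool    # 5 stars: linked to other data for context


def star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: each star requires all earlier ones."""
    criteria = [
        ds.on_web_open_license,
        ds.structured,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


scanned_table = Dataset(True, False, False, False, False)
csv_with_uris = Dataset(True, True, True, True, False)
print(star_rating(scanned_table))  # 1
print(star_rating(csv_with_uris))  # 4
```

A rating function like this is one way an audit of open data sites could be made repeatable across agencies.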

System of Systems Architecture
– Semantic Index of Linked Data (e.g. Excel)
– Dynamic Case Management (e.g. Be Informed)
– Data Science Library (e.g. Spotfire)
– Data Science Products (e.g. Spotfire)

Key Concepts
Build a Network (instead of XBRL, NIEM, etc.):
– System of Systems: Federal, Agency, and Program
My DoD SoS: Does It Follow Gall’s Law?
– Big Challenges: Federal Budget, DoD Audit, Recovery.gov, Grants.gov, & Web-Services.gov
The Federal Budget Network Could Be the Killer App for the Federal Government!

My Question and DTC Response
I ask the following:
– What are the 8 databases, and is there a URL to a sample of their data sets and data dictionaries?
– Has anyone done a demo of making those 8 databases interoperable in an interoperability interface?
– If not, that would be my first pilot to document what it takes to do that on a sample of data from each.
DTC Response:
– The 8 data systems are listed below (see next slide). None is comprehensive; some are not publicly accessible; and they are not interoperable.

Federal Financial Information Network
Data Submitted By Agencies:
– 1. Federal Procurement Data System (feeds into USASpending.gov)*
– 2. Federal Awards Assistance Data System (feeds into USASpending.gov)
– 3. OMB's MAX system (not publicly accessible, but the agency budget reports that go into it are accessible, piecemeal, here)*: %20SF%20133%20Report%20on%20Budget%20Execution%20and%20Budgetary%20Resources.html
– 4. Treasury's payments information repository (not publicly accessible)
– 5. Catalog of Federal Domestic Assistance*
– 6. Consolidated Federal Funds Report (recently discontinued)*
Data Submitted By Recipients:
– 7. FFATA Sub-Award Reporting System (not separately accessible but feeds into USASpending.gov)
– 8. Recovery.gov* (will be discontinued on Sept. 30, 2013)
* Done previous data science work.
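
As a starting point for a pilot, the eight systems on this slide can be recorded as a small inventory that is easy to filter. This is a sketch only: the `DataSystem` class and helper names are hypothetical, and the notes and asterisk flags are copied from the slide rather than checked against the systems themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSystem:
    number: int
    name: str
    submitted_by: str  # "agencies" or "recipients", per the slide grouping
    note: str          # accessibility/status note as worded on the slide ("" if none)
    prior_work: bool   # True where the slide marks the system with *


SYSTEMS = [
    DataSystem(1, "Federal Procurement Data System", "agencies", "feeds into USASpending.gov", True),
    DataSystem(2, "Federal Awards Assistance Data System", "agencies", "feeds into USASpending.gov", False),
    DataSystem(3, "OMB's MAX system", "agencies", "not publicly accessible", True),
    DataSystem(4, "Treasury's payments information repository", "agencies", "not publicly accessible", False),
    DataSystem(5, "Catalog of Federal Domestic Assistance", "agencies", "", True),
    DataSystem(6, "Consolidated Federal Funds Report", "agencies", "recently discontinued", True),
    DataSystem(7, "FFATA Sub-Award Reporting System", "recipients", "not separately accessible", False),
    DataSystem(8, "Recovery.gov", "recipients", "will be discontinued on Sept. 30, 2013", True),
]


def with_prior_work(systems):
    """Numbers of the systems marked * (previous data science work)."""
    return [s.number for s in systems if s.prior_work]


def flagged_inaccessible(systems):
    """Numbers of the systems whose slide note begins with 'not ...accessible'."""
    return [s.number for s in systems if s.note.startswith("not ")]


print(with_prior_work(SYSTEMS))       # [1, 3, 5, 6, 8]
print(flagged_inaccessible(SYSTEMS))  # [3, 4, 7]
```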

1. Federal Procurement Data System

FPDS-Spotfire

5. Catalog of Federal Domestic Assistance

CFDA-Spotfire

8. Recovery.gov
MY NOTE: Did not use.
MY NOTE: Used.

Recovery.gov-Spotfire

Data Panel-Spotfire
The Data panel is used to get an overview of the columns in all data tables, in-memory as well as in-database (in-db). When working with in-database data, the Data panel is the starting point for configuring both visualizations and the filters panel, since no filters are created automatically for external data. Depending on the data source, different sections are available for a selected data table, as described below.
– In-Memory or In-Database Relational Data: Data from in-memory data tables, or from in-database data tables based on relational databases, is simply displayed as a list of the available columns in the selected data table. If in-db database tables have been joined with relations in the Data Tables in Connection dialog, they can be treated as a single, virtual data table within Spotfire. If no relations have been defined, each data table in the external connection will be a separate data table within Spotfire.
– In-Database Cube Data: When you are working with cube data, you will see more fields in the Data panel than for the other data tables.
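
The idea of related in-database tables being treated as one virtual table can be illustrated outside Spotfire with a plain SQL view. The sketch below uses Python's built-in sqlite3 module as an analogy only; it does not use or represent any Spotfire API, and the table names and figures are made up.

```python
import sqlite3

# In-memory database with two related tables (made-up example data).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE agencies (agency_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE awards (award_id INTEGER PRIMARY KEY, agency_id INTEGER, amount REAL);
    INSERT INTO agencies VALUES (1, 'Agency A'), (2, 'Agency B');
    INSERT INTO awards VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);

    -- The relation agencies.agency_id = awards.agency_id defines one virtual table.
    CREATE VIEW awards_by_agency AS
        SELECT a.name AS agency, w.award_id, w.amount
        FROM awards AS w
        JOIN agencies AS a ON a.agency_id = w.agency_id;
    """
)

# The view exposes a single list of columns, like one data table.
columns = [row[1] for row in conn.execute("PRAGMA table_info(awards_by_agency)")]

# Queries run against the view without repeating the join.
totals = conn.execute(
    "SELECT agency, SUM(amount) FROM awards_by_agency GROUP BY agency ORDER BY agency"
).fetchall()

print(columns)
print(totals)
```

Without the view, the two tables would have to be queried separately, which mirrors the case where no relations have been defined.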

Data Network-Spotfire

Add Data Connection-Spotfire

Data Source Tree-Spotfire

Some Recommendations
– Standardization of financial data is good, but making data more actionable is better.
– Four of the eight Federal Financial Information data sets are available to pilot a network that will show what can be done and provide data sets to DTC members to demonstrate that.
– In the broader context, DTC could provide an audit and constructive feedback service to government open data and digital government strategy efforts, as Semantic Community does.