Data Science for Data Act Data Harmonization Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community Data Science Data.

Presentation transcript:

1 Data Science for Data Act Data Harmonization. Dr. Brand Niemann, Director and Senior Data Scientist/Data Journalist, Semantic Community. Data Science: Data Science for the DataAct Datathon. August 7, 2015

2 First Meetup: Data Science for the Data Act at Treasury, December 15, 2014, CGI Federal
DATA Act Requirements Thoughts and Open Discussion, Art Nicewick, Executive Consultant, CGI Federal (Slides)
Data Science for the Data Act at Treasury, Brand Niemann (Slides)
Web Sites: http://fedspendingtransparency.github.io/ and http://fedspendingtransparency.github.io/dataelements/
Questions: At this time, we are asking for comments in response to the following questions:
Which data elements are most crucial to your current reporting and/or analysis?
In setting standards, what are industry standards the Treasury and OMB should be considering?
What are some of the considerations that Treasury and OMB should take into account when establishing data standards?
http://www.meetup.com/Virginia-Big-Data-Meetup/events/218682974/

3 Congressional Testimony
Try as it might, the federal government doesn't have the best track record on publicly reporting spending data, Gene Dodaro, comptroller general of the Government Accountability Office, told lawmakers December 3, 2014. USASpending.gov's success thus far could serve as a cautionary tale for the implementation of the Digital Accountability and Transparency Act, or DATA Act, said Dodaro during a hearing of the House Oversight and Government Reform Committee. "Our recent report on USASpending.gov really illustrates the challenge, here," said Dodaro. GAO's recent report found that, five years after USASpending.gov launched, much of the information remains incomplete and inaccurate, with 324 programs not recorded in the database, $619 billion omitted and many of the data elements required for reporting missing, said Dodaro. Source: FierceGov

4 Blogs
In a speech at the 2014 Financial Stability Conference last week in Washington, the Director of the Office of Financial Research at Treasury, Dick Berner, called for universal adoption of Legal Entity Identifiers (LEI) throughout the federal government. Source: Treasury.gov Web Site
OMB’s Mark Reger compared the DATA Act to the Full Employment Act, noting, “there is a ton of work to be done.” Reger said that the input from data transparency consultants, contractors, and data specialists is needed to tell the implementing federal executives what data is most important and help with analysis. Source: Data Coalition Blog

5 Data Transparency Breakfasts
December 8, 2014: Federal Financial Management and the DATA Act. The fourth Data Transparency Breakfast, presented by PwC, will explore the transformation of the U.S. government's spending information from disconnected documents into standardized data, as required by the DATA Act of 2014, from the perspective of federal financial managers. Join the financial officers who will be responsible for applying government-wide DATA Act data standards to make federal financial reports fully searchable, interoperable, and open to all. Our panel will explore the challenges and opportunities of the DATA Act transformation.
My Note: I attended the Data Transparency Breakfast this morning in preparation for our December 15th Meetup. Please see additions to the agenda above, especially the slides, Web Site Links and Questions we will be discussing to provide feedback to Mark Reger, Deputy Controller, OMB, at his request to me at the breakfast. Source: Data Coalition

6 Government Technology & Innovation Incubator for Big Data Analytics II Meetup, March 25th, Eastern Foundry
6:30 p.m. Welcome and Introduction (Preview of Proposed DATA Act Elements, Standardized Formulas, and Agency Implementation Challenges)
6:45 p.m. Brief Member Introductions
7:00 p.m. Chris Garner, Paxata, Inc., Presentation and Demo Slides
7:20 p.m. Steve Hanmer, Gov PATH Solution, Presentation and Demo
7:40 p.m. Open Discussion
8:00 p.m. Government Technology & Innovation Incubator: Eastern Foundry Tour, Geoff Orazem
8:30 p.m. Networking
9:00 p.m. Depart
http://www.meetup.com/Federal-Big-Data-Working-Group/events/221283174/

7 Newly Appointed U.S. CIO Tony Scott Speaks
U.S. Chief Information Officer Tony Scott, in his first day of public appearances after his appointment by President Obama last month, described the President's 2013 Open Data Policy. Though the Open Data Policy is not mandatory for independent regulatory agencies, including most financial regulators, Scott said financial regulators can bring benefits to investors, their own operations, and the financial industry by voluntarily following it. View slideshow presentation here: http://www.datacoalition.org/wp-content/uploads/2015/03/Open-data-and-financial-regulationv2.pdf

8 Financial Regulation Summit Highlights
Over 300 public and private sector open data leaders gathered at Union Market in Washington, D.C. on Tuesday for the Coalition's Financial Regulation Summit, aimed at building a consensus for the transformation of U.S. financial regulatory reporting from disconnected documents into open, standardized data. Participants included Members of Congress; U.S. Chief Information Officer Tony Scott; Treasury Office of Financial Research Director Dick Berner; and representatives of nearly every major financial regulator. The Financial Regulation Summit was presented by RR Donnelley, with additional sponsorship by Workiva, Booz Allen Hamilton, PwC, RDG Filings, and Socrata. In coming weeks, the Coalition will publish video of all Summit presentations and a full analysis of the MADOFF Transparency Act.

9 Parties Interested in the DATA Act 1
You are invited to participate in a webinar hosted by the DATA Act Section 5 Pilot Team to discuss the Digital Accountability and Transparency Act (DATA Act) Section 5 Pilot. This online event is being held on April 1, 2015 from 1:00PM to 2:00PM EDT. The Chief Acquisition Officers Council, General Services Administration, and the Department of Health and Human Services are sponsoring a dialogue and pilot to identify clear recommendations for (1) standardizing grant and contractor awardee reporting, (2) eliminating duplicative and/or unnecessary reporting, and (3) reducing awardee compliance costs. The open dialogue, which will launch in spring of 2015, is iterative and will first ask interested parties to weigh in on these ideas, then we will apply those ideas in a pilot, and finally we will ask participants to again weigh in on the next iteration of ideas. Participation in the dialogue will provide federal contract and grant recipient organizations a unique opportunity to guide the future of the government-wide implementation of the DATA Act.

10 Parties Interested in the DATA Act 2
Attendees will learn the background and goals of the DATA Act Section 5 Pilot, expected outcomes, and participant opportunities and requirements. The event also will address commonly asked questions about the pilot. DATA Act Section 5 Pilot Grants Lead Lora Kutkat and DATA Act Program Management Office Communications Lead Christopher Zeleznik will be leading the discussion, which will include ample time for questions and answers. A recording and documentation from the event will be posted to the Outreach section of http://www.grants.gov following the event. Please send any questions regarding the DATA Act Section 5 Pilot Webinar to Emily Gartland at emily.gartland@gsa.gov

11 National Webcast on Implementation of the Data Act
On March 27th at 3:30pm EDT, please join us for a national webcast about implementation of the Digital Accountability and Transparency Act (DATA Act). Sponsored by a number of national organizations representing a broad cross-section of DATA Act stakeholders, the webcast will feature Federal leaders responsible for the Act's implementation. Hear from OMB Controller Dave Mader and Treasury Fiscal Assistant Secretary David Lebryk about plans for implementing this important legislation, which will have an impact on Federal agencies and all those who receive Federal funds. In particular, learn about the Federal government's approach to setting the required data element definition standards. There is no cost for participating in the webcast.
Source: Postponed. See: GitHub Site for National Dialogue

12 https://actiac.org/project/data-act-transparency-federal-financials-project

13 Art Nicewick, Executive Consultant, CGI Federal
I have been talking with Mike Wood about pulling something together for the Data Act demo day in June. I have some ideas, but no time. I'm still unclear on the goals of the Act. From what I see, it's a five-headed monster with many goals, many of which are divergent. Everybody has a lot of ideas on what it can be, and all the ideas are good. However, partitioning the problem into actionable components, defining the cost benefits of the components, and then setting the priorities is a challenge. I'd love to hear your thoughts.
Art, Thanks and hopefully we could discuss this at the Meetup on Wednesday.

14 http://www.datacoalition.org/events/summits/finreg-2015/

15 So Many Activities About Financial Data, But Not with Financial Data!
But See: Data Science for Financial Data by Dr. Brand Niemann, published by AOL Government in 2011-2013: Recovery.gov: A Good Start But Show Us All the Missing Data, by Brand Niemann, September 08, 2011 at 3:00 PM. http://breakinggov.com/2011/09/08/recovery-gov-a-good-start-but-show-us-all-the-missing-data/
But See: Semantic Community showed A USASpending.gov Dashboard with All the USA Spending Data in 2011. A USASpending.gov Dashboard, December 18, 2013. http://semanticommunity.info/A_USASpending.gov_Dashboard
But See: Semantic Community showed for the 2014 Data Transparency Summit that the Federal Digital Government Strategy accomplishes the Data Act. Hudson Hollister, Executive Director, Data Transparency Coalition, agreed. Data Science for Financial Data Transparency (with Ontologies). http://semanticommunity.info/Data_Science/Data_Transparency_Summit
But See: Data Science for the Data Act at Treasury. Data Act at US Department of Treasury. http://semanticommunity.info/Data_Science/Data_Science_for_the_Data_Act_at_Treasury

16

17 http://semanticommunity.info/A_USASpending.gov_Dashboard

18 http://semanticommunity.info/Data_Science/Data_Transparency_Summit
Data Science uses the Data Mining Ontology (suggested by Dr. Barry Smith) and Data Mining Standard Process (CRISP-DM) to structure the content into a knowledge base using semantic web standards for big data.

19 http://semanticommunity.info/Data_Science/Data_Science_for_the_Data_Act_at_Treasury

20 Data Science for the Data Act at Treasury
My Questions For the Fourth Data Transparency Breakfast Panel:
My EPA Experience: Why not have a Federal Chief Data Officer and Agency Chief Data Officers with Data Scientists Mining Agency Data Assets?
Federal Spending Data Elements: Will they support more than just reporting? Data analysis and even predictive analytics?
Some highlights of the results:
There are 59 data elements in the Data Act and 46 in the USASpending Data Dictionary.
The USASpending data set with 149,110 rows and 46 columns was geocoded by Spotfire using the PlaceofPerformanceCity column. Other columns, such as Congressional District, ZIP Code, and County, were also available.

21 Data Science for the DataAct Datathon
Finally a Data Act Activity with Actual Financial Data Where a Data Scientist Can Actually Get Ready Access to the Data!
Just by happenstance, I discovered the DATA Act Forum Datathon Call for Participants, DATA Act Forum-The Art of the Possible, and the DATA Act Forum Data Zoo Technology Showcase Application on July 27-28, and July 29, respectively.
The three events (July 27-29) will be summarized for our future meetup (Data Science for the Data Act at Treasury?) and this Data Science for the Data Act Datathon will be extended by our Data Act Data Science team to make recommendations to OMB and other agencies.
The next step is to render the data dictionaries and the OMB Standard Data Act Data Elements in spreadsheet form so we can begin the semantic harmonization and mediation process in Spotfire.

22 My Conclusions and Recommendations
The Federal Big Data Working Group Meetup Data Mining – Data Science Process was Applied to the DataAct Datathon Data Sets.
A Data Ecosystem was Built by Downloading 19 Files from the IAC/ACT Datathon Socrata Catalog and Using Spotfire to Inventory Their Characteristics in an Excel Spreadsheet.
There are many duplicate files in the IAC/ACT Datathon Socrata Catalog. The 14 unique files were imported into 3 Spotfire files for analytics and visualizations.
Screen Capture Samples Are Shown to Help the Datathon Participants and in Preparation for Another Federal Big Data Working Group/Virginia Big Data Meetup on the Data Act.
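
The inventory and duplicate check described on this slide was done in Spotfire and Excel. As a rough illustration only, the same idea can be sketched in Python: record each downloaded file's row and column counts, and treat files with identical content hashes as duplicates. The file names and sample contents below are hypothetical stand-ins, not the actual Datathon files.

```python
import csv
import hashlib
import tempfile
from pathlib import Path


def inventory(folder):
    """Return one record per CSV file: name, row count, column count, content hash."""
    records = []
    for path in sorted(Path(folder).glob("*.csv")):
        data = path.read_bytes()
        with path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        records.append({
            "file": path.name,
            "rows": max(len(rows) - 1, 0),          # exclude the header row
            "columns": len(rows[0]) if rows else 0,
            "sha256": hashlib.sha256(data).hexdigest(),
        })
    return records


def unique_files(records):
    """Keep the first file seen for each distinct content hash."""
    seen = {}
    for record in records:
        seen.setdefault(record["sha256"], record)
    return list(seen.values())


def demo():
    """Build three small hypothetical files (two identical) and inventory them."""
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "awards_a.csv").write_text("id,amount\n1,100\n2,250\n", encoding="utf-8")
        Path(folder, "awards_b.csv").write_text("id,amount\n1,100\n2,250\n", encoding="utf-8")
        Path(folder, "grants.csv").write_text("id,agency,city\n7,EPA,Durham\n", encoding="utf-8")
        records = inventory(folder)
        return records, unique_files(records)


if __name__ == "__main__":
    all_records, unique = demo()
    for record in unique:
        print(record["file"], record["rows"], record["columns"])
```

Hashing only catches byte-for-byte duplicates; files that hold the same data under a different export format or column order would still need a manual comparison.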

23 http://semanticommunity.info/Data_Science/Data_Science_for_the_DataAct_Datathon

24 My Suggested Harmonization Process 1
What I am suggesting is the opposite of the usual case, where you have an Access or MySQL database with multiple tables and key fields to join them, and you issue a SQL command to extract the subset of joined data you want to analyze. We have the reverse problem: trying to make 20 or so Datathon data sets, and ultimately multiple tables for every agency with their financial data, into an integrated database so we can run the same kind of queries. I showed this in a recent Meetup for multiple Harmful Algal Bloom data sets that had been purposely designed with key fields.
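
For contrast, the "usual case" mentioned here, where tables were designed with key fields and a SQL command extracts the joined subset, can be sketched with Python's built-in sqlite3 module. This is a minimal illustration: SQLite stands in for Access or MySQL, and the table names, column names, and values are all hypothetical.

```python
import sqlite3


def joined_awards():
    """Join two tables on a designed key field and total the amounts per agency."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE agencies (agency_id INTEGER PRIMARY KEY, agency_name TEXT);
        CREATE TABLE awards (award_id INTEGER PRIMARY KEY, agency_id INTEGER, amount REAL);
        INSERT INTO agencies VALUES (1, 'Agency A'), (2, 'Agency B');
        INSERT INTO awards VALUES (10, 1, 500.0), (11, 1, 250.0), (12, 2, 75.0);
    """)
    rows = conn.execute("""
        SELECT g.agency_name, SUM(w.amount)
        FROM awards AS w
        JOIN agencies AS g ON g.agency_id = w.agency_id   -- the shared key field
        GROUP BY g.agency_name
        ORDER BY g.agency_name
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, total in joined_awards():
        print(name, total)
```

The join works only because both tables carry the same `agency_id` key. The Datathon data sets have no such shared key by design, which is the reverse problem this slide describes.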

25 My Suggested Harmonization Process 2
But what if the data sets have not been purposely designed with well-defined key fields, or it is very difficult to match the “key fields” because of lack of data dictionaries, slightly different wording, etc.? These are what I call semantic interoperability problems. Well, I or a team can do this by hand using data dictionaries and the data sets in Spotfire, and/or get a tool like TAMR that we had demonstrated recently in a Meetup. First you match as many of the data elements as possible to the new OMB standard data elements (57, as I recall from work in our earlier Data Act for Treasury Meetup), and then you implement those matches in Spotfire (Tools, Data Relationships feature) so you can then “query” (without any SQL) a new merged, semantically harmonized table or tables.
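
The matching step can be illustrated outside Spotfire or TAMR with a small Python sketch. It is not either tool's method: it simply normalizes column names, uses the standard library's difflib to propose the closest standard element for each column, and stacks the renamed rows into one merged table. The three standard element names, the 0.8 similarity cutoff, and the sample data sets are assumptions chosen for illustration (only the PlaceofPerformanceCity column name comes from the USASpending data mentioned earlier).

```python
import difflib
import re

# Hypothetical stand-ins for a few standard data elements (not the official OMB list).
STANDARD = ["AwardeeName", "AwardAmount", "PlaceOfPerformanceCity"]


def normalize(name):
    """Lowercase a column name and drop spaces, underscores, and punctuation."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def match_columns(columns, standard=STANDARD, cutoff=0.8):
    """Propose the closest standard element for each column, or None if none is close."""
    lookup = {normalize(element): element for element in standard}
    mapping = {}
    for column in columns:
        hits = difflib.get_close_matches(normalize(column), list(lookup), n=1, cutoff=cutoff)
        mapping[column] = lookup[hits[0]] if hits else None
    return mapping


def harmonize(datasets, standard=STANDARD):
    """Rename matched columns and stack all rows into one merged table."""
    merged = []
    for source, rows in datasets.items():
        if not rows:
            continue
        mapping = match_columns(rows[0].keys(), standard)
        for row in rows:
            record = {element: None for element in standard}
            record["source"] = source               # remember where each row came from
            for column, value in row.items():
                target = mapping.get(column)
                if target is not None:
                    record[target] = value
            merged.append(record)
    return merged


if __name__ == "__main__":
    sample = {
        "usaspending_sample": [
            {"awardee_name": "Acme", "award amount": "100", "PlaceofPerformanceCity": "Reston"},
        ],
        "agency_sample": [
            {"Awardee Name": "Beta", "Award_Amt": "40", "fiscal_year": "2015"},
        ],
    }
    for record in harmonize(sample):
        print(record)
```

Name similarity only suggests candidates. Columns with similar names can mean different things, so the proposed mapping still has to be checked by hand against the data dictionaries, which is the manual step this slide describes.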

26 http://www.tamr.com/tamr-catalog-alpha-download/

