Data Science for EPA & USGS Fracturing & Fracking Data. Dr. Brand Niemann, Director and Senior Data Scientist/Data Journalist, Semantic Community


1 Data Science for EPA & USGS Fracturing & Fracking Data
Dr. Brand Niemann, Director and Senior Data Scientist/Data Journalist, Semantic Community
Data Science: EPA Fracturing Data; Data Science for USGS Produced Waters
October 5, 2015

2 Agenda
Get a Preview of National Data Science Organizers Workshop on November 5-6, 2015, and the Focus on National Data Science Challenges and Hackathons
6:30 p.m. Welcome and Introduction (New Tutorial and Mentoring): Slides; Data Science for USGS Produced Waters. See previous: EPA Fracturing Data and TIBCO Webinar
7:15 p.m. Brief Member Introductions
7:30 p.m. Invited Presentation: Dr. Sophia Liu and USGS Staff. See USGS Hydrologic Fracking and List of Hackers
8:15 p.m. Open Discussion
8:45 p.m. Networking
9:00 p.m. Depart
http://www.meetup.com/Federal-Big-Data-Working-Group/events/223892919/

3 Background
Dr. Sophia B. Liu is currently a Mendenhall Postdoctoral Research Fellow at the U.S. Geological Survey, investigating crowdsourced geographic information around earthquakes.
July 13th Meetup: Data Science for USGS Minerals Big Data (Slides) and USGS Civic Hacking Challenges on Hackpad and Slack.
Brief comments from subject matter experts in the USGS Energy, Minerals, and Environmental Health Programs.
Dr. Liu was at the EPA and the White House for crowdsourcing and citizen science meetings last week, and she will provide a report at our Meetup.
See the next slides for more background and for an EPA crowdsourcing and citizen science example, the EPA Nutrient Indicators Dataset, by Dr. Niemann.
http://profile.usgs.gov/sophialiu

4 https://www.whitehouse.gov/blog/2014/12/02/designing-citizen-science-and-crowdsourcing-toolkit-federal-government
Federal Community of Practice on Crowdsourcing and Citizen Science: Lea Shanley, Presidential Innovation Fellow, NASA, and Jay Benforado, Deputy Chief Innovation Officer, EPA. See Slides

5 http://www2.epa.gov/innovation/federal-community-practice-crowdsourcing-and-citizen-science

6 https://www.whitehouse.gov/blog/2015/09/09/open-science-and-innovation-people-people-people (recording to be posted)

7 http://www2.epa.gov/nutrient-policy-data/nutrient-indicators-dataset

8 http://www2.epa.gov/nutrient-policy-data/estimated-total-nitrogen-and-total-phosphorus-loads-and-yields-generated-within

9 Specific Indicators
Documented Nutrient Pollution:
Nutrient loads and yields: loadsdatatable.xlsx (2 pp, 26 K)
Fertilizer: Fertilizer nitrogen data table (Excel) (2 pp, 19 K) and Fertilizer phosphorus data table (Excel) (2 pp, 28 K)
Manure: manuredata.xlsx (2 pp, 44 K)
Documented Impacts:
Hypoxia: hypoxiadata.xlsx (2 pp, 13 K)
Harmful algal toxins: toxinsdata.xlsx (2 pp, 14 K)
Groundwater nitrate: groundwaterdata.xlsx (2 pp, 15 K)
Assessed and impaired waters: impairedrivers.xlsx (2 pp, 17 K), impairedlakes.xlsx (2 pp, 17 K), and impairedestuaries.xlsx (2 pp, 15 K)
State Actions Underway:
Limiting loads: npdesdata.xlsx (2 pp, 25 K)
Adoption of standards: My Note: The data table was missing. I sent a message, and the problem was corrected: Criteria Progress Nutrient Policy Data US EPA.xlsx

10 http://www2.epa.gov/nutrient-policy-data/forms/contact-us-about-nutrient-policy-and-data
My message: The table on this page is missing: http://www2.epa.gov/nutrient-policy-data/progress-towards-adopting-total-nitrogen-and-total-phosphorus-numeric-water#list
Response: "Our webmaster has addressed the issue hindering the access to full table contents for the link of your interest. I encourage you to re-visit our site."

11 http://cfpub.epa.gov/wqsits/nnc-development/milestonetable.html

12 My Note: These need to be reformatted for Spotfire and merged by state. See next slides.
Nutrient loads and yields: loadsdatatable.xlsx (2 pp, 26 K)
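Merging the indicator tables by state is, in effect, an outer join on a shared State key. A minimal sketch in plain Python, assuming each spreadsheet has already been exported as a list of row dictionaries with a `State` column; the function `merge_by_state`, the column names, and the values are hypothetical and for illustration only (this is not how Spotfire merges tables internally).

```python
from collections import defaultdict


def merge_by_state(tables, key="State"):
    """Outer-join several tables (lists of row dicts) on a shared state key.

    Every state found in any table gets one merged row. Columns a table does
    not supply for a state are simply absent. If two tables use the same
    column name, the later table's value overwrites the earlier one.
    """
    merged = defaultdict(dict)
    for rows in tables:
        for row in rows:
            state = row[key].strip()  # tolerate stray spaces around names
            merged[state][key] = state
            for column, value in row.items():
                if column != key:
                    merged[state][column] = value
    # One row per state, sorted by state name for a stable result.
    return [merged[state] for state in sorted(merged)]


# Hypothetical toy values, for illustration only.
loads = [{"State": "Iowa", "TN_load": 10.0}, {"State": "Ohio", "TN_load": 7.5}]
manure = [{"State": "Iowa ", "Manure_N": 3.2}]

for row in merge_by_state([loads, manure]):
    print(row)
```

An outer join keeps states that appear in only some tables (Ohio above has no manure value), which matters when a few indicator tables cover fewer states than the others.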

13 Spotfire is great for very wide tables, just as it was many years ago when I helped Dr. Ben Shneiderman test it with very wide Toxics Release Inventory tables! EPANutrientIndicatorsDataSet.xlsx

14 EPANutrientIndicatorsDataSet.xlsx Metadata

15 Web Player
My Note: Compare these nutrient data to those from IPNI NuGIS, which I used in my Big Data Science for Precision Farming Business Online Course. I would have to merge these two datasets by state, as I show next.

16 Web Player

17 Semantic Community Data Science: Big Data Science for Precision Farming Business, Week 4 Modeling

18 Web Player

19 http://www.tamr.com/tamr-catalog-2/ Tamr Co-Founder/CTO Stonebraker Wins 2014 Turing Award

20 Tamr Catalog and Tamr Platform
Tamr Catalog: See all your data.
What does Catalog do for you? Data Discovery, Data Organization, Data Understanding. All of your organization's hidden data. All your data, in one place: what you have and where. Catalog everything. Supercharge Catalog with more Tamr.
Tamr Platform: Focuses on solving the core problems associated with integrating many disparate datasets across the enterprise in a rapid and scalable manner: Data Unification, Connect, Consume.
Specifically, Tamr enables users to:
Register any data source, regardless of source format or location
Define the desired schema of the integrated dataset
Cluster or merge records
http://www.tamr.com/tamr-connect/
http://www.tamr.com/tamr-catalog-2/

21 Tamr Catalog ZIP File: Spreadsheets, Executable, and Readme

22 http://localhost:8228/

23 Tamr Catalog Views: Add Sources and Explore Table. Click here to add sources to your Catalog. Each tile represents a source you cataloged. Click here to explore a table of your sources.

24 Tamr and TIBCO Spotfire
TIBCO Spotfire is both a Catalog and a Platform, as well as an Analytics and Visualization Tool.
Originally I thought that Tamr did something more than TIBCO Spotfire, but until they actually had a product to test, I could not be sure. There may still be something in the Tamr Platform that uses an ontological approach to fuzzy matching of columns, which I read about in their early white paper.
I was able to integrate the 11 EPA Nutrient Indicator Datasets readily in a spreadsheet because they all have a common key field, state. I could have imported each one separately into Spotfire and used the Manage Relations function to merge them automatically, but I would first need to reformat the 11 individual spreadsheets to clean up their headers!
Next I want to do everything the Tamr Catalog and Tamr Platform do, and more, for the EPA & USGS Fracturing & Fracking Data: do each dataset individually, then merge them!
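Cleaning up the headers before a merge is largely mechanical. A minimal sketch, assuming the headers of each spreadsheet are available as a list of strings; `clean_headers` is a hypothetical helper (not a Spotfire or Tamr function) that lowercases names, collapses punctuation, line breaks, and spaces into underscores, and keeps the resulting names unique.

```python
import re


def clean_headers(headers):
    """Normalize messy spreadsheet headers into short, unique column names.

    Lowercases each header, replaces every run of characters other than
    a-z and 0-9 with one underscore, trims leading/trailing underscores,
    and adds a numeric suffix (_2, _3, ...) until the name is unique.
    """
    cleaned = []
    used = set()
    for header in headers:
        base = re.sub(r"[^0-9a-z]+", "_", str(header).lower()).strip("_") or "column"
        name, suffix = base, 2
        while name in used:
            name = f"{base}_{suffix}"
            suffix += 1
        used.add(name)
        cleaned.append(name)
    return cleaned


# Hypothetical headers with a line break, units, and a near-duplicate.
print(clean_headers(["State", "Total Nitrogen\n(kg/yr)", "Total  Nitrogen (kg/yr)", ""]))
# ['state', 'total_nitrogen_kg_yr', 'total_nitrogen_kg_yr_2', 'column']
```

Running the same function over every spreadsheet gives them predictable column names, including a common `state` key to relate on.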

25 Linked Data Visualizations: Data Table; Metadata Table; Data Columns Classified by Numbers, Time, Location, and Categories, with Filters; Details-on-Demand; Web Player
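Spotfire makes this column classification itself. As a rough, hypothetical stand-in, the heuristic below guesses a column's role from its name and values. It assumes dates are strings that Python's `date.fromisoformat` accepts (for example YYYY-MM-DD) and that location columns can be recognized from a small, assumed list of names; the example column names are hypothetical.

```python
from datetime import date

# Assumed list of column names treated as locations (case-insensitive).
LOCATION_NAMES = {"state", "county", "city", "zip", "latitude", "longitude"}


def _is_number(value):
    """True if the value can be read as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def _is_iso_date(value):
    """True if datetime.date.fromisoformat accepts the value as a date."""
    try:
        date.fromisoformat(str(value))
        return True
    except ValueError:
        return False


def classify_column(name, values):
    """Rough guess at a column's role: Location, Numbers, Time, or Categories."""
    if name.strip().lower() in LOCATION_NAMES:
        return "Location"
    present = [v for v in values if v not in (None, "")]  # ignore blanks
    if present and all(_is_number(v) for v in present):
        return "Numbers"
    if present and all(_is_iso_date(v) for v in present):
        return "Time"
    return "Categories"


print(classify_column("Latitude", ["32.5", "31.9"]))                     # Location
print(classify_column("TotalBaseWaterVolume", ["1.2e6", 850000, None]))  # Numbers
print(classify_column("JobStartDate", ["2015-03-05", "2014-12-31"]))     # Time
print(classify_column("Purpose", ["Biocide", "Surfactant"]))             # Categories
```

The name check runs first on purpose: latitude and longitude values are numeric, so a values-only test would file them under Numbers.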

26 TIBCO Spotfire Data Table and Data Column Properties: EPA Nutrient Dataset. Data Table Properties; Data Column Properties. See next slide for details.

27 Data Column Properties Exported to Spreadsheet

28 2862 Pairs of Statistical Relationships! N(N-1) where N = 54, counting each pair in both orders (N(N-1)/2 = 1431 unique pairs). Web Player
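The count on this slide can be checked directly. For N = 54 columns, N(N-1)/2 = 1431 is the number of unordered pairs, while 2862 = N(N-1) is the count when each pair is taken in both orders (X against Y and Y against X). A quick check with Python's standard library:

```python
from itertools import combinations, permutations
from math import comb

N = 54  # number of columns, as on the slide

unordered = sum(1 for _ in combinations(range(N), 2))  # {X, Y}: each pair once
ordered = sum(1 for _ in permutations(range(N), 2))    # (X, Y) and (Y, X) both counted

print(unordered, comb(N, 2), N * (N - 1) // 2)  # 1431 1431 1431
print(ordered, N * (N - 1))                     # 2862 2862
```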

29 Web Player: EPA Fracturing Data

30 TIBCO Spotfire Data Table and Data Column Properties: EPA Fracturing Data. Data Table Properties; Data Column Properties. See next slide for details.

31 EPA Fracturing Data: additive_ingredients_final_030515_3

32 USGS Produced Waters: Web Player

33 TIBCO Spotfire Data Table and Data Column Properties: USGS Produced Waters. Data Table Properties; Data Column Properties. See next slide for details.

34 USGS Produced Waters: USGSPWDB_v2.1

35 Add Relation for EPA Fracturing Data (additive_ingredients_final_030515_3) and USGS Produced Waters (USGSPWDB_v2.1): Step 1, Step 2, Step 3 (this is the State relation)

36 TIBCO Spotfire lists every possible column-name match, and these matches can become relations if the columns are semantically the same in all the data sets (left). TIBCO Spotfire can also create calculated columns (right).
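Spotfire's own matching logic may well differ; the sketch below only illustrates the general idea of suggesting relation columns from their names. It assumes nothing more than two lists of column names and combines exact matching after normalization with fuzzy matching from Python's standard `difflib` (a plain string-similarity technique, not the ontological approach mentioned for Tamr). The function names, the 0.7 cutoff, and the column names in the example are hypothetical.

```python
import difflib
import re


def normalize(name):
    """Lowercase a column name and keep only letters a-z and digits."""
    return re.sub(r"[^0-9a-z]", "", name.lower())


def suggest_relations(left_columns, right_columns, cutoff=0.7):
    """Suggest (left, right, kind) column pairs that might serve as relation keys.

    kind is "exact" when the normalized names are equal; otherwise the closest
    right-hand name whose difflib similarity ratio is at least `cutoff` is
    proposed as a "fuzzy" match. Columns with no candidate are skipped.
    """
    right_by_norm = {normalize(c): c for c in right_columns}
    suggestions = []
    for column in left_columns:
        key = normalize(column)
        if key in right_by_norm:
            suggestions.append((column, right_by_norm[key], "exact"))
            continue
        close = difflib.get_close_matches(key, list(right_by_norm), n=1, cutoff=cutoff)
        if close:
            suggestions.append((column, right_by_norm[close[0]], "fuzzy"))
    return suggestions


# Hypothetical column names for the two sources.
epa_columns = ["State", "County_Name", "Latitude", "TotalBaseWaterVolume"]
usgs_columns = ["STATE", "COUNTY", "LATITUDE", "TDS"]
for suggestion in suggest_relations(epa_columns, usgs_columns):
    print(suggestion)
```

Name similarity alone cannot tell whether two columns mean the same thing, so each suggestion still needs a human check before it becomes a relation.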

37 Add USGS Produced Waters (USGSPWDB_v2.1) to EPA Fracturing Data (additive_ingredients_final_030515_3), because EPA Fracturing Data has 25 datasets and USGS Produced Waters is just one dataset.
EPA Fracturing Data: Number of Disclosures by State. USGS Produced Waters: TDS (total dissolved solids) by State. Web Player
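One way to relate many EPA rows to a single USGS table is to reduce each source to one row per state before joining, so the State relation stays one-to-one. A minimal sketch, assuming both sources are available as lists of row dictionaries with a `State` column and a numeric `TDS` column; the function names, the choice of the median to summarize TDS, and the sample rows are hypothetical.

```python
from collections import Counter
from statistics import median


def disclosures_by_state(disclosures, key="State"):
    """Count disclosure rows per state: many rows become one number per state."""
    return Counter(row[key] for row in disclosures)


def median_tds_by_state(samples, key="State", value="TDS"):
    """Median TDS (total dissolved solids) per state, skipping missing values."""
    by_state = {}
    for row in samples:
        if row.get(value) not in (None, ""):
            by_state.setdefault(row[key], []).append(float(row[value]))
    return {state: median(values) for state, values in by_state.items()}


def join_on_state(counts, tds):
    """Inner join of the two per-state summaries: only states present in both."""
    return {
        state: {"disclosures": counts[state], "median_tds": tds[state]}
        for state in sorted(set(counts) & set(tds))
    }


# Hypothetical toy rows, for illustration only.
epa_rows = [{"State": "Texas"}, {"State": "Texas"}, {"State": "Colorado"}]
usgs_rows = [
    {"State": "Texas", "TDS": 100.0},
    {"State": "Texas", "TDS": 300.0},
    {"State": "Ohio", "TDS": None},
]
print(join_on_state(disclosures_by_state(epa_rows), median_tds_by_state(usgs_rows)))
# {'Texas': {'disclosures': 2, 'median_tds': 200.0}}
```

Joining raw rows instead would pair every disclosure in a state with every TDS sample in that state, inflating both counts; summarizing first avoids that.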

38 October 19th Meetup: Sensing Our Air: The Quest for Big Data About Our Air Quality
Get Another Preview of National Data Science Organizers Workshop on November 5-6, 2015, and the Focus on National Data Science Challenges and Hackathons
6:30 p.m. Welcome and Introduction: Slides; Data Science for EPA EnviroAtlas Part II. Also see Earth Insights from Big Data
6:45 p.m. Invited Presentation, EPA Staff: Robin Thottungal (invited)
7:15 p.m. Brief Member Introductions
7:30 p.m. Invited Presentation, EPA Staff (continued): EPA Engineer Dr. Gayle Hagler
8:15 p.m. Open Discussion
8:45 p.m. Networking
9:00 p.m. Depart
http://www.meetup.com/Federal-Big-Data-Working-Group/events/223605766/

39 New EPA Chief Data Scientist
Robin Thottungal (invited) will be joining EPA as division director for the Environmental Analysis Division (EAD) within the Office of Information Analysis and Access, and as chief data scientist. An email from EPA CIO Ann Dunkin, which Federal News Radio obtained, said Thottungal starts later this month after spending most of his career in the private sector. Most recently, Thottungal worked at Deloitte Consulting, where he focused on large-scale analytics projects for public sector and commercial clients. He also led the global big data community of practice for Deloitte, developing analytical frameworks and go-to-market strategy for big data and analytics solutions. Additionally, Thottungal is vice-chairman for the Institute of Electrical and Electronics Engineers (IEEE) Washington, D.C. section, as well as chapter chairman for the IEEE Computational Intelligence Society.

40 Rescheduled from June 29th
Meet EPA Engineer Gayle Hagler, Ph.D.: http://www2.epa.gov/sciencematters/meet-epa-engineer-gayle-hagler-phd
Q & A from EPA Presentation: Sensor Technology State of the Science, July 8, 2014: http://www.epa.gov/heasd/airsensortoolbox/sensortechnology_qa.html
Air Sensors 2014: https://sites.google.com/site/airsensors2014/agenda
Welcome to the Village Green Project: a research effort to discover new ways of measuring air quality and weather conditions in community environments: http://villagegreen.airnowtech.org/welcome

