EPA Big Data Analytics: EnviroAtlas Data Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community

1 EPA Big Data Analytics: EnviroAtlas Data Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community http://semanticommunity.info/ http://www.meetup.com/Federal-Big-Data-Working-Group/ http://www.meetup.com/Virginia-Big-Data-Meetup http://www.meetup.com/Northern-Virginia-Semantic-Web-Meetup/ http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup April 17, 2015 1

2 Overview
EPA EnviroAtlas Data: Web Page; Description; Maps; Scales (National and Community)
Geodatabases-to-Shape Files: FME Workbench Results
Data Science Data Publication: MindTouch Knowledge Bases; Spreadsheet Knowledge Base Indices and Tables
Spotfire Analytics and Visualizations: Cover Page (Knowledge Base Content Analytics); IRM Strategic Plan Tables; EnviroAtlas Inventories; Selected National Metrics

3 EPA EnviroAtlas Data: Web Page http://enviroatlas.epa.gov/enviroatlas/Datadownload/index.html 3

4 EPA EnviroAtlas Data: Description
EnviroAtlas national and community data are available to download below as geodatabases. Due to technical limitations, which we are working to overcome, not all of the EnviroAtlas data (e.g., 1-meter landcover data, supplemental data) are available for download. As of February 2015, EnviroAtlas is transitioning to a more recent version of the 12-digit HUCs; data aggregated to these new boundaries will be available soon. All available EnviroAtlas data for each community, except the landcover, are included in the individual geodatabase files below.
Community metric tables in Esri File Geodatabase format (compressed):
Durham, NC [36 MB]
Fresno, CA [7 MB]
Green Bay, WI [9 MB]
Milwaukee, WI [31 MB]
New Bedford, MA [4 MB]
Phoenix, AZ [74 MB]
Pittsburgh, PA [31 MB]
Portland, ME [22 MB]
Tampa, FL [53 MB]
Woodbine, IA [658 KB]
http://enviroatlas.epa.gov/enviroatlas/Datadownload/index.html

5 EPA EnviroAtlas Data: Maps 5 http://enviroatlas.epa.gov/enviroatlas/Data/scale.html

6 EPA EnviroAtlas Data: National
Maps at the national extent provide wall-to-wall data coverage for the coterminous U.S. These data layers are summarized by 12-digit hydrologic watershed basins (12-digit HUCs) and provide approximately 90,000 similarly sized spatial units. A list of the currently available data is accessible as a .pdf, an .xls file, or a tab-delimited text file (National file). This file shows the benefit categories under which each layer can be found.
Supplemental maps for the nation provide context and additional data for exploring ecosystem services and the built environment. These data are not summarized by a specific spatial unit. Instead, these supplemental maps represent features in the landscape such as rivers and wetlands, as well as other contextual landmarks such as state boundaries. Details on each supplemental map can be found in the data fact sheets.
http://enviroatlas.epa.gov/enviroatlas/Data/scale.html
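The 12-digit HUCs used as the national summary unit are hierarchical: each additional pair of digits narrows the area, from a 2-digit region down to a 12-digit subwatershed. A minimal Python sketch of that nesting follows. It assumes codes are handled as strings; the function name and the sample code are hypothetical and not taken from EnviroAtlas.

```python
# Level names follow the USGS hydrologic unit hierarchy (2 to 12 digits).
HUC_LEVELS = (
    (2, "region"),
    (4, "subregion"),
    (6, "basin"),
    (8, "subbasin"),
    (10, "watershed"),
    (12, "subwatershed"),
)


def huc_parents(huc12):
    """Return the code of every enclosing hydrologic unit for a 12-digit HUC."""
    code = str(huc12).strip()
    if len(code) != 12 or not code.isdigit():
        raise ValueError(f"expected a 12-digit HUC code, got {huc12!r}")
    # Each level is simply a prefix of the full code.
    return {name: code[:digits] for digits, name in HUC_LEVELS}


if __name__ == "__main__":
    # Made-up code, used only to show the prefix structure.
    for level, prefix in huc_parents("030300020605").items():
        print(f"{level:>12}: {prefix}")
```

Because every level is a prefix, metrics summarized by 12-digit HUC can be rolled up to coarser units by grouping on the first 8 or 10 characters.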

7 EPA EnviroAtlas Data: Community
Community-level information in EnviroAtlas draws from fine-scale land cover data, census data, and models to estimate ecosystem services and their benefits within the community area. EnviroAtlas community data are consistent for each available community and are mostly summarized by census block groups. EnviroAtlas is building datasets for 50 communities in the United States; each community area boundary is based on selected block groups within the 2010 US Census Urban Area boundary. See a list of the available and upcoming communities. Learn more in the Community Fact Sheet (pp, 997K) or download a list of all the EnviroAtlas data available for each community as a .pdf, an .xls file, or a tab-delimited text file (Community file). This file shows the benefit categories under which each layer can be found.
Supplemental maps for each community provide context and additional data for exploring ecosystem services and the built environment. These data are not summarized by a specific spatial unit and include the 1-meter resolution land cover data for each community. Details on each supplemental map can be found in the data fact sheets.
http://enviroatlas.epa.gov/enviroatlas/Data/scale.html

8 EPA EnviroAtlas Data: Map of Communities 8 http://enviroatlas.epa.gov/enviroatlas/data/communities.html

9 Geodatabases-to-Shape Files 9 My Note: Sort by Size My Note: 0.5 GB HUC 12 Being Updated

10 FME Workbench: National Metrics Log File
Starting translation...
FME 2015.0 (20150217 - Build 15253 - WIN64)
FME_HOME is 'C:\Program Files\FME\'
FME Database Edition (node locked-crc)
Serial Number: 0
Temporary License: 31 days left.
Machine host name is: BrandNiemann-PC
LOTS MORE DETAILS…..
Total Features Written 2,607,688
Translation was SUCCESSFUL with 8 warning(s) (2607688 feature(s) output)
FME Session Duration: 6 minutes 18.3 seconds. (CPU: 326.0s user, 47.7s system)
END - ProcessID: 6016, peak process memory usage: 57144 kB, current process memory usage: 57092 kB
Translation was SUCCESSFUL
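The summary lines at the end of the log carry the numbers worth tracking across conversions: features written, warnings, and session duration. The following is a rough Python sketch for pulling them out. The patterns are based only on the lines quoted on the slides, so they are an assumption about the log format rather than a documented FME interface, and durations of an hour or more are not handled.

```python
import re

# Summary lines copied from the slide; a full FME log contains many more lines.
SLIDE_LOG = """Total Features Written 2,607,688
Translation was SUCCESSFUL with 8 warning(s) (2607688 feature(s) output)
FME Session Duration: 6 minutes 18.3 seconds. (CPU: 326.0s user, 47.7s system)
"""


def parse_fme_summary(log_text):
    """Extract feature count, warning count, and wall-clock seconds (None if absent)."""
    summary = {"features_written": None, "warnings": None, "duration_seconds": None}

    m = re.search(r"Total Features Written:?\s*([\d,]+)", log_text)
    if m:
        summary["features_written"] = int(m.group(1).replace(",", ""))

    m = re.search(r"SUCCESSFUL with (\d+) warning", log_text)
    if m:
        summary["warnings"] = int(m.group(1))

    # The minutes part is optional so that short runs still parse.
    m = re.search(
        r"FME Session Duration:\s*(?:(\d+) minutes? )?([\d.]+) seconds", log_text
    )
    if m:
        minutes = int(m.group(1) or 0)
        summary["duration_seconds"] = minutes * 60 + float(m.group(2))

    return summary


if __name__ == "__main__":
    s = parse_fme_summary(SLIDE_LOG)
    rate = s["features_written"] / s["duration_seconds"]
    print(s)
    print(f"{rate:.0f} features per second")
```

Dividing features written by duration gives a simple throughput figure for comparing one translation run with another.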

11 FME Workbench: National Metrics GDB-to-SHP http://www.safe.com/
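The conversion in these slides was done with FME Workbench. For readers without an FME license, one open-source alternative (not the method used here) is GDAL's `ogr2ogr` command-line tool, which can read Esri File Geodatabases and write shapefiles. The Python sketch below only builds and runs that command. It assumes GDAL is installed and on the PATH, and the paths in the usage example are hypothetical.

```python
import shutil
import subprocess


def build_gdb_to_shp_command(gdb_path, out_dir):
    """Build an ogr2ogr call that writes each GDB layer as a shapefile in out_dir."""
    # ogr2ogr takes the destination first, then the source.
    return ["ogr2ogr", "-f", "ESRI Shapefile", str(out_dir), str(gdb_path)]


def convert_gdb_to_shp(gdb_path, out_dir):
    """Run the conversion; raises if ogr2ogr is missing or exits with an error."""
    if shutil.which("ogr2ogr") is None:
        raise RuntimeError("ogr2ogr not found; install GDAL first")
    subprocess.run(build_gdb_to_shp_command(gdb_path, out_dir), check=True)


if __name__ == "__main__":
    # Hypothetical paths, shown without running the conversion.
    print(" ".join(build_gdb_to_shp_command("National_metrics.gdb", "shp_out")))
```

One caveat with the shapefile target in any tool: attribute field names are limited to 10 characters, so longer geodatabase column names are truncated on export.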

12 Data Science Data Publication: MindTouch Knowledge Base 12 Data Science for EPA Big Data Analytics My Note: Use Google Chrome Find

13 Data Science Data Publication: Spreadsheet Knowledge Base 13 EPABigDataAnalytics.xlsx

14 EPA EnviroAtlas National & Community Inventory 14 xlscurrentdata.xls

15 Data Science Data Publication: Spotfire Cover Page 15 Content Analytics Web Player

16 Data Science Data Publication: IRM Strategic Plan 16 Content Analytics Web Player

17 Data Science Data Publication: IRM Strategic Plan Tables 17 PDF to Tables Enterprise Data Dictionary Web Player

18 Data Science Data Publication: EnviroAtlas Inventory National 18 National Layer Counts Web Player

19 Data Science Data Publication: EnviroAtlas Inventory Community 19 Community Layer Counts Web Player

20 Data Science Data Publication: EnviroAtlas Inventory NatureServe 20 SHAPE Length Versus SHAPE Area Acres per State SHAPE Area per State Web Player

21 Data Science Data Publication: EnviroAtlas Inventory Land Cover 21 Percent Wetland Versus PAGP Percent Wetland by HUC 12 Web Player

22 Conclusions and Recommendations
The EPA EnviroAtlas data are the most integrated databases EPA has for national and community ecosystems.
The use of the proprietary Esri GDB format limits the reuse of these data in open government data applications.
The Safe Software FME Workbench was used to convert selected national and community files from GDB to SHP format.
A Data Science Data Publication of EPA Big Data Analytics was produced as an example of the new EPA Big Data Analytics Service in the EPA 5-year IRM Strategic Plan.
EnviroAtlas data for 50 communities are coming, and many other EPA geospatial data sets could be used for Big Data Analytics in Data Science Data Publications.

23 Exploratory Data Science on Even Bigger Data
Process:
Unzipped all National Metrics geodatabases and converted them GDB-to-SHP with Safe FME Workbench (70 MB grew to 282 MB in 102 files, of which 34 were SHP).
Imported all 34 SHP files (30 MB) at once into one Spotfire file of 84 MB and did exploratory data analysis on them.
The geometry is missing, but it was not needed initially because the tables carry HUC codes.
Found the current HUC 12 geometry at the USDA Geospatial Data Gateway (700 MB GDB ZIP), unzipped it to 744 MB, and converted it GDB-to-SHP, producing 4.0 GB of SHP. Imported into Spotfire, the file is only 1.8 GB.
Safe FME Workbench log file:
Total Features Written: 100493
Translation was SUCCESSFUL with 0 warning(s) (100493 feature(s) output)
FME Session Duration: 4 minutes 12.1 seconds. (CPU: 230.1s user, 6.3s system)
END - ProcessID: 10120, peak process memory usage: 290572 kB, current process memory usage: 137068 kB
Translation was SUCCESSFUL
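The sizes reported on this slide can be turned into simple growth factors to show how much each format change inflates or shrinks the data. The Python sketch below uses only the figures quoted above. Treating 4.0 GB as 4000 MB and 1.8 GB as 1800 MB is an assumption, and the step labels are descriptive rather than the author's.

```python
def growth_factor(before_mb, after_mb):
    """Ratio of output size to input size; values above 1 mean the data grew."""
    return after_mb / before_mb


# (before, after) in MB, as reported on the slide.
# Assumption: 4.0 GB is taken as 4000 MB and 1.8 GB as 1800 MB.
STEPS = {
    "National metrics, GDB to converted output": (70, 282),
    "34 SHP files to Spotfire file": (30, 84),
    "HUC 12 geometry, unzipped GDB to SHP": (744, 4000),
    "HUC 12 geometry, SHP to Spotfire file": (4000, 1800),
}

if __name__ == "__main__":
    for label, (before, after) in STEPS.items():
        print(f"{label}: x{growth_factor(before, after):.2f}")
```

On these figures the GDB-to-SHP step is where most of the growth happens, while loading the HUC 12 shapefile into Spotfire reduces its size by more than half.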

24 Spotfire Data Tables and Relations 24 My Note: 35 Data Tables with All Their Many Columns of Numbers, Locations and Categories with BioMass (83,029 Rows by 10 Columns) Joined to HUC12 (100,493 Rows by 27 Columns) All in Memory!
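The relation described in this note, a metric table joined to the HUC12 table on the HUC code, can be sketched outside Spotfire as a plain left join. This is a minimal Python illustration, not the Spotfire configuration used here. The column names (`HUC_12`, `biomass`, `STATES`) and the sample rows are hypothetical. Codes are normalized to 12-character strings because HUC codes can begin with a zero, which is lost if a tool reads them as numbers.

```python
def normalize_huc12(value):
    """Return the code as a 12-character string, restoring any dropped leading zero."""
    return str(value).strip().zfill(12)


def left_join_on_huc12(metric_rows, huc_rows, key="HUC_12"):
    """Attach HUC12 attributes to each metric row; unmatched rows keep their own columns."""
    lookup = {normalize_huc12(row[key]): row for row in huc_rows}
    joined = []
    for row in metric_rows:
        code = normalize_huc12(row[key])
        match = lookup.get(code, {})
        # Metric columns win on a name clash; the key is stored in normalized form.
        joined.append({**match, **row, key: code})
    return joined


if __name__ == "__main__":
    # Hypothetical rows; the first code lost its leading zero when read as a number.
    metrics = [{"HUC_12": 10100020101, "biomass": 1.5}]
    hucs = [{"HUC_12": "010100020101", "STATES": "ME"}]
    print(left_join_on_huc12(metrics, hucs))
```

The row counts in the note differ (83,029 versus 100,493), so a join in either direction will leave some rows without a partner; a left join keeps those rows visible instead of silently dropping them.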

25 Exploratory Data Science: BioMass by HUC12 1 25

26 Exploratory Data Science: BioMass by HUC12 2 26

27 Exploratory Data Science: Florida BenMap 27

28 Exploratory Data Science: Florida BG_Pop 28

