Data Science for USDA Big Data

Data Science for USDA Big Data
Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist
Semantic Community
Data Science
Big Data Science for Precision Farming Business
August 19, 2015

The Journey
USDA Data Sources:
  Open Data
  Innovation Challenge
  Data Driven Precision Farming Online Course
Data Audit (See Next Slide for Details): Mine, Science, Questions, Publish
Results:
  Open Data: Problem Reading Catalog Was Fixed and USDA Data Science MOOC Was Created
  Innovation Challenge: Problems with Farm Data Dashboard Data Sets and Went Back to Original Data Sets
  Data Driven Precision Farming Online Course: Problems Understanding Multiple Soils Data Sets Being Sorted Out

Data Mining - Science - Questions - Publication Process
Data Mining Process: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment
Data Science Process: Data Ecosystem, Data Story
Data Science Questions:
  How was the data collected?
  Where is the data stored?
  What are the data results?
  Why should we believe the data results?
Data Science Data Publication:
  Knowledge Base
  Spreadsheet Index
  Web & PDF Tables to Spreadsheet
  Data Browser
  Dynamically Linked Adjacent Visualizations

NIH Data Commons
FAIR Principles: Findable, Accessible, Interoperable, Reusable
Cloud: Data, Software, Results
Federal Science Policy:
  OSTP Public Access to Scientific Data Memo (February 2013)
  New Program: Big-Data-to-Knowledge (2013)
  New Position: Associate Director of Data Science (2014)
  Digital Enterprise (2015): Data Commons, Metadata, Open APIs, Digital Objects, Containers
Federal Big Data Working Group Meetup, August 17, 2015: A NIH – Semantic Medline Data Science Data Publication Commons

OSTP/NSF Data Science Meetup of Meetups
Week of November 2nd:
  NSF Data Science/Big Data Principal Investigators (About 300)
  NSF Data Hubs (4)
  Organizers of Largest Data Science/Big Data Meetups (About 65)
Pipeline for Return on Investment:
  PIs put their data, tools, and research results in the Data Hubs
  Data Hubs provide those data, tools, and research results to the world, but especially to the Data Science/Big Data Meetups
  Data Science/Big Data Meetups collaborate with PIs and Data Hubs to increase usage and feedback

We Already Do This!
Semantic Community:
  Provides a Community Sandbox that is like a GitHub, Data Hub, Data Commons, etc.:
    Metadata (MindTouch)
    Open APIs (MindTouch)
    Digital Objects (MindTouch)
    Containers (Spotfire)
  Organizes the Federal Big Data Working Group Meetup
  Supports Agencies and Programs in Crowdsourcing Their Data Sets
  Mentors Data Scientists (Tutorials and MOOCs) and Entrepreneurs (Eastern Foundry)
Federal Big Data Working Group Meetup:
  Federal: Supports the Federal Big Data Initiative, but is not endorsed by the Federal Government or its Agencies;
  Big Data: Supports the Federal Digital Government Strategy, which is "treating all content as data", so big data = all your content;
  Working Group: Data Science Teams composed of Federal Government and Non-Federal Government experts producing big data products; and
  Meetup: The world's largest network of local groups to revitalize local community and help people around the world self-organize, like MOOCs (Massive Open On-line Classes), now embraced by the White House.

http://semanticommunity.info/Data_Science/Big_Data_Science_for_Precision_Farming_Business#Story_2

USDA Big Data in Spotfire
  USDAMOOC-Spotfire: 1.7 GB (Web Player)
  USDANASS-Spotfire: 730 MB
  FarmDataDashboard1-Spotfire: 521 MB (Web Player)
  FarmDataDashboard2-Spotfire: 1.2 GB (In Process)
  FarmData-Spotfire: 15 MB (Web Player)
  NCSSSoilSurvey-Spotfire: 235 MB
  NCSSSoilCharacterizationDatabase6302015GDB-Spotfire: 144 MB
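As a quick check on how much data these Spotfire files represent together, the sizes listed above can be totaled. This is a small sketch that assumes decimal units (1 GB = 1000 MB), which the slide does not specify; the function names are illustrative.

```python
# Sizes as listed on the slide. The unit convention (1 GB = 1000 MB) is an assumption.
SIZES = {
    "USDAMOOC-Spotfire": "1.7 GB",
    "USDANASS-Spotfire": "730 MB",
    "FarmDataDashboard1-Spotfire": "521 MB",
    "FarmDataDashboard2-Spotfire": "1.2 GB",
    "FarmData-Spotfire": "15 MB",
    "NCSSSoilSurvey-Spotfire": "235 MB",
    "NCSSSoilCharacterizationDatabase6302015GDB-Spotfire": "144 MB",
}

_MB_PER_UNIT = {"MB": 1, "GB": 1000}


def to_megabytes(text):
    """Convert a string such as '1.7 GB' to a whole number of megabytes."""
    value, unit = text.split()
    return round(float(value) * _MB_PER_UNIT[unit])


def total_megabytes(sizes):
    """Sum every listed size in megabytes."""
    return sum(to_megabytes(size) for size in sizes.values())


if __name__ == "__main__":
    print(total_megabytes(SIZES), "MB")
```

Under that assumption the seven files come to 4,545 MB, or roughly 4.5 GB.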

Data Science Data Publication

Web Player 10 MB

Story

Web Player 2.6 MB Web Player

Extra Slides on USDA Soils Data Sets
  File Inventory for Weeks 4 and 5 of Online Course
  Multiple Data Sets in Multiple Formats in Multiple Places
  The Newest Gridded Soil Survey Geographic (gSSURGO) Database Requires Advanced Tool to Convert from GDB to SHP
  32 SHP Files with Attributes But No Location and Many Access Data Sets to Export to Excel
  Spotfire for Data Relationships (Statistical and Linking)
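The GDB-to-SHP conversion mentioned above can be scripted rather than done by hand. The following is a minimal sketch, not the tool used for these slides. It assumes the GDAL command-line program ogr2ogr is installed with a driver that can read ESRI file geodatabases. The paths and the layer name MUPOLYGON are illustrative assumptions. The first function only builds the command, so it can be inspected before anything is run.

```python
import subprocess
from pathlib import Path


def build_gdb_to_shp_command(gdb_path, layer, out_dir):
    """Return the ogr2ogr argument list that exports one layer to a shapefile."""
    out_file = Path(out_dir) / f"{layer}.shp"
    return [
        "ogr2ogr",
        "-f", "ESRI Shapefile",  # output format
        str(out_file),           # destination shapefile
        str(gdb_path),           # source file geodatabase (a .gdb folder)
        layer,                   # layer to export
    ]


def convert(gdb_path, layer, out_dir):
    """Run the export; raises CalledProcessError if ogr2ogr reports a failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_gdb_to_shp_command(gdb_path, layer, out_dir), check=True)


if __name__ == "__main__":
    # Hypothetical paths and layer name, for illustration only.
    print(" ".join(build_gdb_to_shp_command("gSSURGO_NE.gdb", "MUPOLYGON", "shp_out")))
```

Shapefiles truncate attribute names to 10 characters and are limited to about 2 GB per file, so a statewide layer may need to be exported in pieces.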

Weeks 4 and 5 Soil File Inventory
Week 4: Modeling
  NCSS_Soil_Characterization_Database: GDB and Access (NCSSSoilCharacterizationDatabase6302015GDB-Spotfire)
  SoilDataAvailabilityShapefile: Shape (NCSSSoilSurvey-Spotfire)
  Otoe County, Nebraska: PDF, GIF, and PNG (FarmData-Spotfire)
  Master_query 72 MB Excel and NCSS_Site_Location 7 MB Excel (NCSSSoilSurvey-Spotfire) and (FarmData-Spotfire)
Week 5: Evaluation
  wss_SSA_NE131_soildb_NE_2003_[2014-09-02]: 19 MB Shape and 12 MB Access (NCSSSoilCharacterizationDatabase6302015GDB-Spotfire)
  wss_gsmsoil_NE_[2006-07-06]: 8 MB Shape and 11 MB Access (NCSSSoilCharacterizationDatabase6302015GDB-Spotfire)
  nrcs142p2_052440: 28 MB Shape and Excel and PDF Image Map (NCSSSoilSurvey-Spotfire) and (FarmData-Spotfire)
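Two of the Week 5 file names end with a bracketed date, which is easy to lose when the files are renamed or imported. Here is a minimal sketch for splitting such a name into its stem and date. It assumes the bracketed suffix is always an ISO date (YYYY-MM-DD). The slides do not say what the date denotes, and the function name is hypothetical.

```python
import re
from datetime import date

# Assumed pattern: any stem, then "_[YYYY-MM-DD]", then an optional file extension.
_NAME = re.compile(r"^(?P<stem>.+)_\[(?P<stamp>\d{4}-\d{2}-\d{2})\](?:\.\w+)?$")


def split_dated_name(name):
    """Split 'stem_[YYYY-MM-DD]' into (stem, date); raise ValueError otherwise."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"no bracketed date in {name!r}")
    return match.group("stem"), date.fromisoformat(match.group("stamp"))


if __name__ == "__main__":
    for name in (
        "wss_SSA_NE131_soildb_NE_2003_[2014-09-02]",
        "wss_gsmsoil_NE_[2006-07-06]",
    ):
        stem, stamp = split_dated_name(name)
        print(stem, stamp.isoformat())
```

Sorting an inventory by the parsed date shows at a glance which download is the oldest. Names without a bracketed date, such as nrcs142p2_052440, raise ValueError and have to be handled separately.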

Digital Soil Geographic Databases (GIS-ready)
Land Resource Regions (LRR) and Major Land Resource Areas (MLRA):
  http://www.nrcs.usda.gov/wps/portal/nrcs/detail/soils/survey/geo/?cid=nrcs142p2_053624
Common Resource Areas (CRA):
  http://www.nrcs.usda.gov/wps/portal/nrcs/detail/soils/survey/geo/?cid=nrcs142p2_053635
U.S. General Soil Map (STATSGO2):
  http://www.nrcs.usda.gov/wps/portal/nrcs/detail/soils/survey/geo/?cid=nrcs142p2_053629
Soil Survey Geographic (SSURGO) Database:
  http://www.nrcs.usda.gov/wps/portal/nrcs/detail/soils/survey/geo/?cid=nrcs142p2_053627
Gridded Soil Survey Geographic (gSSURGO) Database (See Previous Slides):
  http://www.nrcs.usda.gov/wps/portal/nrcs/detail/soils/survey/geo/?cid=nrcs142p2_053628
National Cooperative Soil Survey Soil Characterization Database (Pedons):
  http://ncsslabdatamart.sc.egov.usda.gov/

Geospatial Data Gateway

Columns: Map Layer; Status; Map/Spotfire; Source; Format; Projection
Major Land Resource Areas by State: Status: Link; Map/Spotfire: Yes; Source: USDA; Format: ESRI Shape, ESRI File GeoDataBase; Projection: Geographic, UTM, State Plane
Common Resource Areas by State: Status: Link; Map/Spotfire: State Only?
Soil Survey Spatial and Tabular Data (SSURGO 2.2): Format: ESRI Shape; Projection: WGS84 Geographic
Raster Soil Survey: Status: Link; Map/Spotfire: GDB-to-SHP, Not Yet; Format: ESRI File GeoDataBase; Projection: AutoUTM to county
U.S. General Soil Map (STATSGO2) by State: Status: Link; Map/Spotfire: Nebraska, Yes
Gridded Soil Survey Geographic (gSSURGO) by State or Conterminous U.S.: Status: Link; Map/Spotfire: GDB-to-SHP, Yes; Projection: Albers
My Note: These Links Do Not Go To Data Download

http://websoilsurvey.sc.egov.usda.gov/DSD/Download/Cache/STATSGO2/wss_gsmsoil_US_[2006-07-06].zip
My Note: Did Not Download

http://www.nrcs.usda.gov/wps/portal/nrcs/site/soils/home/
FY2015 gSSURGO Database Release
The FY2015 Gridded Soil Survey Geographic (gSSURGO) Database was released on February 23, 2015. These data are derived from a December 1, 2014, snapshot of the Soil Data Mart database. These new data are available in both state-wide tiles and the Conterminous U.S. (CONUS).

Description of Gridded Soil Survey Geographic (gSSURGO) Database
Gridded SSURGO (gSSURGO) is similar to the standard USDA-NRCS Soil Survey Geographic (SSURGO) Database product but in the format of an Environmental Systems Research Institute, Inc. (ESRI®) file geodatabase. A file geodatabase has the capacity to store much more data and thus greater spatial extents than the traditional SSURGO product. This makes it possible to offer these data in statewide or even Conterminous United States (CONUS) tiles. gSSURGO contains all of the original soil attribute tables in SSURGO. All spatial data are stored within the geodatabase instead of externally as separate shapefiles. Both SSURGO and gSSURGO are considered products of the National Cooperative Soil Survey (NCSS) partnership.
The gridded SSURGO (gSSURGO) dataset was created for use in national, regional, and statewide resource planning and analysis of soils data. The raster map layer data can be readily combined with other national, regional, and local raster layers, including the National Land Cover Database (NLCD), the National Agricultural Statistics Service (NASS) Crop Data Layer (CDL), and the National Elevation Dataset (NED).
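The point about combining the gSSURGO raster with other raster layers can be illustrated with a toy cross-tabulation. This is a sketch in plain Python, assuming two grids that are already aligned cell for cell (same extent, cell size, and projection). The grid values below are made up, and a real workflow would read the rasters with a GIS library instead of typing them in.

```python
from collections import Counter


def cross_tabulate(soil_grid, cover_grid):
    """Count cells for every (soil value, land-cover value) pair in two aligned grids."""
    if len(soil_grid) != len(cover_grid):
        raise ValueError("grids have different row counts")
    counts = Counter()
    for soil_row, cover_row in zip(soil_grid, cover_grid):
        if len(soil_row) != len(cover_row):
            raise ValueError("grids have different column counts")
        # Pair up the cells that sit at the same position in both rows.
        counts.update(zip(soil_row, cover_row))
    return counts


if __name__ == "__main__":
    # Made-up 3x3 grids: soil map unit keys and land-cover class codes.
    soil = [[101, 101, 102],
            [101, 102, 102],
            [103, 103, 102]]
    cover = [[82, 82, 82],
             [82, 71, 71],
             [41, 41, 71]]
    for (soil_key, cover_class), cells in sorted(cross_tabulate(soil, cover).items()):
        print(soil_key, cover_class, cells)
```

Multiplying each count by the cell area gives the acreage of every soil and land-cover combination, which is the kind of summary a tool such as Spotfire can then chart.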

32 SHP Files with Attributes But No Location Access Database

Web Player