New ways of exploring environmental data or: Letting do the hard work Jon Blower (ESSC and Reading e-Science Centre)

Motivation
The environmental sciences are very data-intensive
  – Satellite data (high resolution, several spectral bands)
  – Numerical model output data
  – Raw data -> analysis -> re-analysis
  – Ensembles
  – Easy to get up to terabytes of data
Data are expensive to produce and are economically valuable
  – Strong real-time requirement in many cases
Need ways to cope with large datasets and make sense of them
Computers get faster and disks get bigger
  – But we can always fill them
But our brains stay the same size!

Technical barriers
Each data provider has its own preferred data format
  – NetCDF, HDF, HDF5, GRIB, PP, GeoTIFF, more
  – and there are many varieties of the above
Data exist on a variety of grids
  – Latitude-longitude
  – Rotated-pole
  – Tri-polar
  – Or might not be on a grid at all (spectral format)
Data providers choose different naming conventions
  – e.g. “temperature”, “temp”, “T”
This makes even simple tasks hard
  – users should not have to care about any of these details
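The naming problem can be made concrete with a small sketch. It assumes a hand-written alias table; the three provider names are the ones quoted on the slide, while the standard name `sea_water_temperature` and the function `to_standard_name` are illustrative assumptions, not part of any library mentioned in the talk.

```python
# Hypothetical alias table. The keys are the provider names quoted on the
# slide; the standard name they map to is an assumption for illustration.
ALIASES = {
    "temperature": "sea_water_temperature",
    "temp": "sea_water_temperature",
    "t": "sea_water_temperature",
}


def to_standard_name(provider_name: str) -> str:
    """Return the standard name for a provider-specific variable name."""
    key = provider_name.strip().lower()
    if key not in ALIASES:
        raise KeyError(f"no standard name known for {provider_name!r}")
    return ALIASES[key]


# All three provider spellings collapse to a single standard name.
standard_names = {to_standard_name(n) for n in ["temperature", "temp", "T"]}
print(standard_names)
```

With a table like this kept in one place, user-facing code only ever sees the standard name.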

Solutions
Expose data using standard interfaces
  – irrespective of how data are ultimately stored
  – Defining these interfaces is a community effort
Provide simple tools for simple tasks
  – e.g. simple Web interface
Use distributed computing to work with very large datasets
  – more of this later…

GADS
Grid Access Data Service
GADS is a software library for accessing gridded data
Hides details of storage from users
  – users don’t have to know internal data formats or naming conventions
Uses standard names
Can make queries about data …
  – e.g. “what variables are there in dataset X?”
… and get data subsets
[Diagram: data and metadata, accessed through the GADS library, serving applications]
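The two kinds of request described above (a metadata query and a data subset) can be sketched with a toy in-memory class. This is not the GADS API: `GridDataset`, `variables` and `subset` are hypothetical names, and the numbers are made-up values for illustration only.

```python
from dataclasses import dataclass


@dataclass
class GridDataset:
    """A toy stand-in for a gridded dataset behind a GADS-like library."""
    lats: list    # latitude of each row, degrees north
    lons: list    # longitude of each column, degrees east
    fields: dict  # standard name -> 2-D list of values indexed [lat][lon]

    def variables(self):
        """Answer the query 'what variables are there in this dataset?'"""
        return sorted(self.fields)

    def subset(self, name, lat_range, lon_range):
        """Return the values of one variable inside a lat/lon box."""
        rows = [i for i, lat in enumerate(self.lats)
                if lat_range[0] <= lat <= lat_range[1]]
        cols = [j for j, lon in enumerate(self.lons)
                if lon_range[0] <= lon <= lon_range[1]]
        grid = self.fields[name]
        return [[grid[i][j] for j in cols] for i in rows]


# Made-up 3x3 grid, purely for illustration.
ds = GridDataset(
    lats=[40.0, 50.0, 60.0],
    lons=[-20.0, -10.0, 0.0],
    fields={"sea_water_temperature": [[15.0, 15.5, 16.0],
                                      [11.0, 11.5, 12.0],
                                      [7.0, 7.5, 8.0]]},
)
print(ds.variables())
print(ds.subset("sea_water_temperature", (45.0, 65.0), (-15.0, 5.0)))
```

The caller names a variable by its standard name and a region by coordinates; nothing about file formats or internal variable names leaks through.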

GODIVA web portal
The GODIVA Web portal provides a graphical interface to data at ESSC
Uses GADS to query and extract data sets
Users can make simple visualisations
  – pictures and movies

GADS as a Web Service
Web Services are a standard way of building distributed systems
“Black box” subroutines that are executed over the Internet
Platform-independent
  – strong interoperability
GADS has a Web Service interface
Means that external applications can use the GADS routines at ESSC
[Diagram: data and metadata, accessed through the GADS library and its Web Service (WS) interface, serving external applications]

GADS application: Search and Rescue
British Maritime Technology produce software (SARIS) to help the Coastguard with Search and Rescue
Predicts drift patterns of people and objects that have fallen overboard
  – This significantly cuts the time to rescue
Have worked with BMT to produce prototype that uses live Met Office data from GADS to improve its predictions
  – Uses forecasts of surface winds and surface currents
Can also be applied to oil spills

Geographical Information Systems (GIS)
Many companies produce GIS software for manipulating and visualizing geographical data
  – e.g. ArcInfo, Maptitude, many more
  – Big business!
Very sophisticated and powerful
  – Spatial statistics, geoprocessing, mapping…
  – e.g. identify high-risk flood zones, assess effectiveness of ambulance centres
Historically very map-oriented (2-d or “2.5d”)
  – Hence not so useful in ocean/atmosphere sciences (need 4-d)
Vendors typically used proprietary formats and interfaces
  – Users “locked in” to a particular vendor, hard to share information
The Open Geospatial Consortium is addressing these issues

OGC Web Services
Web Service | Purpose
Web Map Server (WMS) | Serves map images (cf. Streetmap, Multimap)
Web Feature Server (WFS) | Serves geographical features (roads, rivers, hospital locations etc)
Web Coverage Server (WCS) | Serves multidimensional data (e.g. numerical model output)
Web Processing Server (WPS) | Processes data
Lots more!
(roughly in decreasing order of maturity)
Services can be composed to create a distributed geospatial application
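As an illustration of what calling one of these services looks like, the sketch below builds a WMS 1.1.1 GetMap request as a plain HTTP GET URL. The parameter names are those defined by the WMS standard; the server address `http://example.org/wms`, the layer name and the helper `getmap_url` are placeholders, not services from the talk.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layer, bbox, width, height):
    """Build a WMS 1.1.1 GetMap URL for one layer over a lon/lat box.

    bbox is (min_lon, min_lat, max_lon, max_lat); in WMS 1.1.1 with
    SRS=EPSG:4326 the bounding box is given in that order.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",            # empty string requests the default style
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


# Placeholder server and layer name, for illustration only.
url = getmap_url("http://example.org/wms", "sea_water_temperature",
                 (-180, -90, 180, 90), 512, 256)
print(url)
```

Because the request is just a URL, any client that can fetch an image over HTTP can use the service, which is the interoperability point the slide makes.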

NERC Data Grid (NDG)
NERC e-Science project led by BADC
Will provide software for discovery and delivery of data
Data will be distributed between NDG and other groups (NDG won’t hold everything)
Vast diversity of data types (all NERC data!)
Rigidly standards-based (ISO)
  – Metadata is all-important: enables data discovery
  – Have created CSML (Climate Science Markup Language) – describes 7 feature types
Producing whole array of OGC-compliant Web Services
  – Key task is to add proper security

Some CSML features
ProfileFeature
GridFeature
ProfileSeriesFeature

NDG: data extractor and GeoSPLaT

Other uses of OGC Web Services
DEWS project (Delivering Environmental Web Services)
  – Deliver Met Office data to end users in marine and health sectors
  – Marine applications: Search and rescue
  – Health application: Chronic Obstructive Pulmonary Disease (COPD) prediction
  – Re-engineering GADS to be WCS-compliant
  – Using NDG security layer
  – Will hopefully influence Met Office’s data provision in future
GDEVIL project (Data Assimilation Research Centre)
  – In conjunction with RSI (makers of ENVI and IDL)
  – Made WCS server and client software for extracting and visualizing large datasets

The story so far: summary
We can look forward to much easier access to data
  – Allows more end-users (e.g. industry) to get data in real time and at lower cost
Data providers will work with the same OGC standards
Web Services are a key technology
NERC, Met Office, ECMWF data (and more) will be available to you through the NERC DataGrid
Still lots of work to do
  – e.g. descriptions of community-specific datasets

The next generation…

Google Maps
Web-based “widget” for viewing map data
  – or any images in fact
Like Streetmap, Multimap etc but much slicker
  – draggable map
  – fast response time
Can mark locations

Google Earth
“Mapping for the masses”
  – According to Nature
Desktop application (Windows and Mac) for displaying geographical data
  – Satellite images
  – Earthquake locations
  – Live data!
All on a 3-D spinning globe
Can view data at all scales
Very easy to incorporate new data
  – easy as writing a simple Web page

Example of a KML file
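The file shown on this slide did not survive in the transcript. As a stand-in, here is a minimal sketch that writes a KML document containing a single placemark, using only the Python standard library. The element names (`kml`, `Placemark`, `name`, `description`, `Point`, `coordinates`) come from the KML standard; the function `placemark_kml`, the place name and the approximate coordinates are illustrative assumptions, not the author's example.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def placemark_kml(name, lon, lat, description=""):
    """Return a KML document (as a string) holding one point placemark."""
    ET.register_namespace("", KML_NS)  # serialise KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    ET.SubElement(placemark, f"{{{KML_NS}}}description").text = description
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML coordinates are written as longitude,latitude,altitude.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")


# Approximate, illustrative location.
doc = placemark_kml("Reading", -0.97, 51.45, "Example location")
print(doc)
```

Saving the returned string to a `.kml` file gives something Google Earth can open directly, which is the sense in which adding data is "as easy as writing a simple Web page".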

How it renders

More examples of Google Earth data
Post-Katrina satellite images
Sea ice cover and ice velocity
Locations of ARGO floats
Bird flu outbreaks

Google Maps vs Google Earth
Google Maps | Google Earth
Web-based – works on any modern browser (with JavaScript) | Standalone application – Windows and Mac only
Only two layers of pictures per map (base plus overlay) | As many layers of pictures as you like
Some specialist knowledge required to incorporate your own data | Easy to distribute new data via the web (just write a KML file) or incorporate data from local disk
Relatively feature-poor | Feature-rich
Code has been released to public | Closed-source (black box)
Both load data from servers on the fly
Neither deals with animations very well (if at all)

“GODIVA Two” (currently under development)
Near-instantaneous previews of data
Draggable Google Map for easy navigation
Adjustable scale
Links to Google Earth
Now we really are exploring data!
An AJAX application (all donkey work is still done by GADS)

What can be done with Godiva2?
Search through data very quickly using the Web interface
Pick your own scale range
  – crude identification of isotherms
Having identified data, explore further in Google Earth
  – Incorporate multiple data sources into GE
  – Overlay a lat-lon grid
  – Measure the size of features
  – much more!
Download data into your application of choice (IDL, Matlab)
Future modifications to Godiva2:
  – Other slices through data e.g. xt (Hovmuller)
  – Movies
  – Collaborative GE?
  – Simple data processing e.g. statistical calculations
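Why a user-chosen scale range gives a crude way of picking out isotherms can be seen from how values are mapped to colours. The sketch below is an assumption about the general technique (linear scaling with clamping), not a description of how Godiva2 is implemented; `colour_index` and its parameters are hypothetical names.

```python
def colour_index(value, scale_min, scale_max, n_colours=256):
    """Map a data value to a palette index using a linear colour scale.

    Values outside [scale_min, scale_max] are clamped to the end colours.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    fraction = (value - scale_min) / (scale_max - scale_min)
    fraction = min(max(fraction, 0.0), 1.0)
    return min(int(fraction * n_colours), n_colours - 1)


# A narrow scale centred on 10 degrees: everything colder saturates at one
# end of the palette and everything warmer at the other, so the 10-degree
# contour shows up as the boundary between the two.
print(colour_index(5.0, 9.5, 10.5))
print(colour_index(10.0, 9.5, 10.5))
print(colour_index(20.0, 9.5, 10.5))
```

Narrowing the range concentrates the whole palette around the temperature of interest, so the isotherm appears as a sharp colour transition.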

ESSC Data serving architecture
[Diagram: data and metadata are read by the GADS library, which sits behind a Web Service interface, a Google Maps interface and a Google Earth interface inside a Tomcat application server. SARIS and other apps connect by SOAP messaging; Google Maps and Google Earth connect by HTTP GET.]

Geospatial databases
A lot of the above relies on fast access to data in a multi-user environment
This is the sort of thing that databases do well
But most databases don’t deal well with geospatial data
  – Some exceptions, e.g. PostGIS
  – Gridded data is still a problem for most systems
We have been evaluating software from Barrodale Computing Services
  – Very advanced geospatial database that supports gridded data
  – Versions for PostgreSQL, Informix, Oracle
  – Demos exist at
Results are very promising
  – Faster than our system especially for small data extractions
  – Caches recently-used data for extra speed
But this is commercial software
  – We have an evaluation version, in return for feeding back requirements

“New” methods for data processing

Data processing
Environmental datasets are typically large and distributed
In many cases data processing can be sped up through parallel processing
Can also help with problem of dealing with multiple users on a data-intensive website
  – Website must be responsive
Often tasks can be “trivially parallelized”
  – But even this is often awkward
Let’s look at some tools we can use to make this easy

Condor
Mature technology for scheduling jobs (programs) on ordinary desktop machines
  – “Cycle stealing”
Makes good use of existing resources
Ideal for applications where you need to run the same executable lots of times on different data sets
  – Monte Carlo simulations
  – Parameter sweeps
Can also run MPI jobs
Very popular world-wide
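A parameter sweep of the kind described above is usually expressed as a Condor submit description file that queues many copies of one executable. The sketch below generates such a file as text. The submit commands (`universe`, `executable`, `arguments`, `output`, `error`, `log`, `queue`) and the `$(Process)` macro are standard Condor syntax; the executable name `./run_case.sh`, the file names and the helper `submit_description` are hypothetical.

```python
def submit_description(executable, n_jobs):
    """Return the text of a Condor submit file for an n_jobs-way sweep.

    Each job receives its own index through the $(Process) macro, which
    Condor expands to 0, 1, ..., n_jobs - 1.
    """
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        "arguments  = $(Process)",
        "output     = out.$(Process).txt",
        "error      = err.$(Process).txt",
        "log        = sweep.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical wrapper script that maps a job index to one input dataset.
submit_text = submit_description("./run_case.sh", 100)
print(submit_text)
```

Writing this text to a file and passing it to `condor_submit` would queue 100 independent jobs, each free to run on whichever desktop machine is idle.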

Condor application: TRACK
TRACK identifies and tracks storms in numerical model output
  – Identifies pressure lows and vorticity highs
Use Condor to run TRACK over large numbers of datasets
  – Datasets are downloaded from the Internet on-demand
Then produce statistics and diagnostics using the results
  – Tells us about the predictability of storms
Web interface
Lizzie Froude and Kevin Hodges

BOINC
Berkeley Open Infrastructure for Network Computing
Used by ClimatePrediction.net and
Run code on volunteer computers (i.e. home computers)
  – In background or as a screensaver
  – Windows, Linux, Mac OSX
Each computer downloads a chunk of data to process
  – In CP.net, each computer runs a simulation of evolution of Earth’s climate
Then uploads results
Volunteers join BOINC, then decide which projects they want to be involved in
Have to deal with users dropping out
  – Also some volunteers have been known to tamper with results
Some users use CP.net running speed for bragging about their computers!

ClimatePrediction.net on the BBC

Distributed Parallel Processing Environment for Java (DPPEJ)
Run jobs in parallel by creating a number of Java threads
Each thread runs on a different machine
Easy to get started
  – If you’re a Java programmer
Test case: search through 250 OCCAM ¼ degree ocean data files (5 GB total) looking for files that contain extreme temperatures
  – No point in using more than 4 machines for this task
  – Limited by disk access speed
[Plot: time against number of threads, with 4 threads marked]
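The shape of the test case (one independent check per file, run by a pool of workers) can be sketched with a local thread pool. This is only an analogy: DPPEJ is a Java system whose threads run on different machines, whereas this sketch uses Python threads on one machine. The file names, temperature values and the 30-degree threshold are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def has_extreme(temps, threshold=30.0):
    """Return True if any temperature in one file exceeds the threshold."""
    return max(temps) > threshold


def search(files, n_threads=4):
    """Return the names of files that contain an extreme temperature."""
    names = list(files)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map preserves input order, so flags line up with names.
        flags = list(pool.map(has_extreme, (files[n] for n in names)))
    return [name for name, flag in zip(names, flags) if flag]


# Invented stand-ins for data files: name -> temperatures found in the file.
files = {
    "occam_001": [4.2, 12.8, 28.9],
    "occam_002": [5.1, 31.4, 17.0],
    "occam_003": [2.0, 9.7, 33.3],
}
hits = search(files)
print(hits)
```

In the real test case each check has to read a file from disk, which is why adding workers beyond four gave no further benefit: the disk, not the processors, was the bottleneck.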

MapReduce
Google have written papers on how they do some of their distributed computing
  – All done on clusters of commodity machines
  – Have to take into account machine failures
A key concept is the “Map-Reduce” programming model
  – One routine maps input data to intermediate output
  – Another routine reduces this to a final result
E.g. Map names of data files to locations of storms contained therein
Then plot these data on a single plot (reduce)
Open source implementation of this programming model in Java (Hadoop)
Programmers don’t have to worry about details of parallelization and fault tolerance
  – Just write a Map function and a Reduce function
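The storm example above can be written as a Map function and a Reduce function plus a tiny driver. This is a single-process sketch of the programming model only: it has none of the distribution or fault tolerance that Hadoop provides, the reduce step collects locations into one sorted list as a stand-in for plotting them, and all names and pressure values are invented.

```python
from collections import defaultdict


def map_storms(filename, pressures, threshold=980.0):
    """Map step: emit one ("storms", location) pair per pressure low."""
    for (lat, lon), pressure in pressures.items():
        if pressure < threshold:
            yield "storms", (filename, lat, lon)


def reduce_storms(key, values):
    """Reduce step: merge all locations for a key into one sorted list."""
    return sorted(values)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Apply map_fn to every input, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for name, data in inputs.items():
        for key, value in map_fn(name, data):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


# Invented inputs: file name -> {(lat, lon): surface pressure in hPa}.
inputs = {
    "run_a.nc": {(55.0, -20.0): 965.0, (40.0, -30.0): 1012.0},
    "run_b.nc": {(60.0, -10.0): 972.0},
}
result = run_mapreduce(inputs, map_storms, reduce_storms)
print(result)
```

Only `map_storms` and `reduce_storms` are specific to the problem; in a real framework the driver is supplied for you, which is what keeps development time low.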

Parallel processing tools: summary
Condor
  – uses spare power of desktop machines
  – for running a program lots of times
  – run compiled executables – can write in any language
  – not real-time (jobs might not run immediately)
Many other systems
  – Sun GridEngine, PBS, etc (often installed with clusters)
BOINC (also World Community Grid and others)
  – Potentially lots of computers involved
  – Issue of trust in results
  – Good way to reach general public
DPPEJ, MapReduce
  – Must program in Java, but easy if you know how
  – Idea is to reduce development time
  – MapReduce has fault-tolerance
  – Would probably sit behind a website like Godiva2 – most scientists wouldn’t use these directly

What resources are available?
ESSC Condor pool
Reading Campus Grid
  – Currently a Condor pool in Computer Science Dept
  – Will incorporate other resources in future (e.g. library machines, clusters)
National Grid Service
  – 2000 processors, and over 36TB
  – CPUs heavily used, data capacity under-used
OxGrid (in future)
  – Intend to connect this to RCG
In ideal world all these would be linked
  – You would then submit jobs via a single portal
  – this is Grid computing!

Where do we go from here?

Environmental e-Science toolkit
The Reading e-Science Centre is building a “toolkit” for environmental e-Science
Will incorporate many of the ideas we have seen today
  – Fast web access to data (“Godiva2”)
  – Google Maps and Google Earth interfaces
  – Parallel data processing at back end (for common processing tasks)
  – Perhaps IDL/Matlab/CDAT interfaces to the same back-end
  – Fast searches through data
Easy access to resources such as the National Grid Service, Reading Campus Grid
We will work closely with the NERC DataGrid
Please tell us what you would like!

Stuff that you can do now
Think about exposing your data through Google Earth
  – Easy to do
  – Reaches a wide range of people including the public
  – Great for demos
  – Useful for teaching?
Think about what you could achieve if you had more processing power
  – And easy access to it
If you are a data provider, look at the OGC standards and seriously consider using them
Talk to us
  – I would especially like to hear about real science use cases

Thank you