MATLAB, Big Data, and HDF Server

Presentation transcript:

MATLAB, Big Data, and HDF Server Ellen Johnson MathWorks

Overview
- MATLAB capabilities and domain areas
- Scientific data in MATLAB
  - HDF5 interface
  - NetCDF interface
- Big Data in MATLAB
- MATLAB data analytics workflows
- RESTful web service access
- Demo: Programmatically access HDF5 data served on HDF Server

The MathWorks
Designed for: Embedded system development, Engineering education, Aircraft and missile guidance systems, Control system design, Communications system design, Earth sciences, Engineering research, Robotics, Online trading systems, System optimization, Computational biology
Customers in: Aerospace and defense, Automotive, Biotech and pharmaceutical, Communications, Education, Electronics and semiconductors, Energy production, Financial services, Industrial automation and machinery, Medical devices, Software, Internet

Scientific Data in MATLAB
- Scientific data formats: HDF5, HDF4, HDF-EOS2; NetCDF (with OPeNDAP!); FITS, CDF, BIL, BIP, BSQ
- Image file formats: TIFF, JPEG, HDR, PNG, JPEG2000, and more
- Vector data file formats: ESRI Shapefiles, KML, GPS, and more
- Raster data file formats: GeoTIFF, NITF, USGS and SDTS DEM, NIMA DTED, and more
- Web Map Service (WMS)

HDF5 in MATLAB
High Level Interface (h5read, h5write, h5disp, h5info)
    h5disp('example.h5','/g4/lat');
    data = h5read('example.h5','/g4/lat');
Low Level Interface (wraps HDF5 C APIs)
    fid = H5F.open('example.h5');
    dset_id = H5D.open(fid,'/g4/lat');
    data = H5D.read(dset_id);
    H5D.close(dset_id);
    H5F.close(fid);
Notes: h5disp maps to h5dump. Error handling uses try, catch. You don't have to recompile your code to play with the lower-level interfaces; run code as you type it.
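The two interfaces above can be compared side by side without needing an existing example.h5. The following is a minimal sketch, not the presenter's code: it assumes a scratch file created with tempname and made-up latitude values, writes them with the high-level functions, and reads them back both ways. The try/catch around the low-level calls is one possible way to make sure the file identifier is closed if a call fails.

```matlab
% Scratch file and sample values are illustrative assumptions.
fname = [tempname '.h5'];
lat = linspace(-90, 90, 19)';            % 19x1 column of latitudes

% High-level interface: create the dataset (intermediate group /g4
% is created automatically) and write the values.
h5create(fname, '/g4/lat', size(lat));
h5write(fname, '/g4/lat', lat);
latHigh = h5read(fname, '/g4/lat');

% Low-level interface: same read, with explicit identifier handling.
fid = H5F.open(fname, 'H5F_ACC_RDONLY', 'H5P_DEFAULT');
try
    dset_id = H5D.open(fid, '/g4/lat');
    latLow = H5D.read(dset_id);
    H5D.close(dset_id);
catch err
    H5F.close(fid);                      % release the file before rethrowing
    rethrow(err);
end
H5F.close(fid);

delete(fname);                           % remove the scratch file
```

Both reads should return the same values; the low-level path is mainly worth the extra lines when you need property lists, hyperslab selections, or other features the high-level functions do not expose.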

NetCDF in MATLAB
High Level Interface (ncdisp, ncread, ncwrite, ncinfo)
    url = 'http://oceanwatch.pifsc.noaa.gov/thredds/dodsC/goes-poes/2day';
    ncdisp(url);
    data = ncread(url,'sst');
Low Level Interface (wraps netCDF C APIs)
    ncid = netcdf.open(url);
    varid = netcdf.inqVarID(ncid,'sst');
    data = netcdf.getVar(ncid,varid,'double');
    netcdf.close(ncid);
Note: ncdisp maps to ncdump.
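With remote data it is usually preferable to read a subset rather than a whole variable. The sketch below illustrates subset reads on a small local file so that it runs without network access; the file, the dimension names, and the values are assumptions made for illustration. Note that the high-level ncread uses 1-based start indices while the low-level netcdf.getVar uses 0-based ones.

```matlab
% Local stand-in for a remote dataset; names and values are illustrative.
fname = [tempname '.nc'];
sst = reshape(1:12, 4, 3);               % 4 (lon) x 3 (lat) sample grid

nccreate(fname, 'sst', 'Dimensions', {'lon', 4, 'lat', 3});
ncwrite(fname, 'sst', sst);

% High-level subset read: start is 1-based, count is the block size.
subHigh = ncread(fname, 'sst', [2 1], [2 3]);

% Low-level subset read: start is 0-based.
ncid = netcdf.open(fname, 'NOWRITE');
varid = netcdf.inqVarID(ncid, 'sst');
subLow = netcdf.getVar(ncid, varid, [1 0], [2 3]);
netcdf.close(ncid);

delete(fname);                           % remove the scratch file
```

The same start and count arguments should carry over when the first argument is an OPeNDAP URL, in which case only the requested block needs to be transferred.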

Big Data in MATLAB

Scale Data
- Memory and Data Access: 64-bit processors, memory-mapped variables, disk variables, databases, datastores
- Programming Constructs: streaming, block processing, parallel-for loops, GPU arrays, SPMD and distributed arrays, MapReduce
- Platforms: desktop (multicore, GPU), clusters, cloud computing (MDCS for EC2), Hadoop
Notes: Big data means many different things to different users. MATLAB provides numerous capabilities for processing data that is too cumbersome for the desktop, as well as for supporting big data systems such as Hadoop. 64-bit processors, along with memory-mapped and disk variables, optimize processing on the desktop, while databases and the new datastore functionality let you analyze your data in segments. MATLAB also provides various programming constructs to address the wide variety of data characteristics: use System objects for stream processing, process images using block processing techniques, and process your data in parallel or on GPUs using distributed arrays or the new MapReduce framework in MATLAB to further increase the speed of analysis and the volume of data that can be analyzed. These capabilities let you analyze big data on your desktop and, if more processing power or workspace is needed, scale to a cluster. If your data happens to reside in Hadoop, some new features allow MATLAB to interoperate with that platform.

Hadoop with MATLAB
- Production Hadoop: create applications or components that execute on Hadoop

Access Big Data: datastore
- datastore for accessing large data sets
  - Text or image files
  - Single file or collection of files
- Preview data structure and format
- Select data to import using column names
- Incrementally read subsets of the data
- Access data stored in HDFS
    airdata = datastore('*.csv');
    airdata.SelectedVariableNames = {'Distance', 'ArrDelay'};
    data = read(airdata);
Notes: datastore provides a straightforward way to access big data that consists of a single text or image file or a large collection of such files. Point the datastore to a folder, or use wildcards to specify all the files in a given directory. Preview a subset of the data for easy exploration. Identify columns to import using column names, and specify the format for each column of interest. Step through files a chunk at a time.
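Stepping through files a chunk at a time can be sketched as a hasdata/read loop that keeps only running totals in memory. The example below is an illustration under stated assumptions: the CSV file, its column names beyond Distance and ArrDelay, and its values are made up, and a deliberately tiny ReadSize is used so the loop runs more than once. Column selection uses the datastore's documented SelectedVariableNames property.

```matlab
% Build a tiny CSV in a scratch folder (hypothetical data).
folder = tempname;
mkdir(folder);
T = table([100; 250; 400; 1200], [5; -3; 12; 0], {'AA'; 'UA'; 'DL'; 'AA'}, ...
    'VariableNames', {'Distance', 'ArrDelay', 'Carrier'});
writetable(T, fullfile(folder, 'flights.csv'));

% Point the datastore at all CSV files in the folder.
ds = datastore(fullfile(folder, '*.csv'));
ds.SelectedVariableNames = {'Distance', 'ArrDelay'};
ds.ReadSize = 2;                         % rows per read; tiny on purpose

% Accumulate a running mean without loading everything at once.
totalDelay = 0;
n = 0;
while hasdata(ds)
    chunk = read(ds);                    % table with at most ReadSize rows
    totalDelay = totalDelay + sum(chunk.ArrDelay);
    n = n + height(chunk);
end
meanDelay = totalDelay / n;

rmdir(folder, 's');                      % clean up the scratch folder
```

For real data the default ReadSize is normally a better choice; only the running totals need to fit in memory, not the files themselves.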

Analyze Big Data: mapreduce
    ********************************
    *      MAPREDUCE PROGRESS      *
    ********************************
    Map   0% Reduce   0%
    Map  20% Reduce   0%
    Map  40% Reduce   0%
    Map  60% Reduce   0%
    Map  80% Reduce   0%
    Map 100% Reduce  25%
    Map 100% Reduce  50%
    Map 100% Reduce  75%
    Map 100% Reduce 100%
- mapreduce uses datastore to process data in chunks
  - Intermediate analysis results do not fit in memory
  - Processing multiple keys
  - Data resides in Hadoop
- Work on the desktop
  - Local data exploration, analysis, and algorithm development
- Scale to Hadoop
  - Interactive use with MATLAB Distributed Computing Server
  - Deploy to production Hadoop instances using MATLAB Compiler
Notes: MapReduce is a powerful programming technique for applying filtering, statistics, and other general analysis methods to big data. You can use mapreduce on your desktop machine for applications where the intermediate results of your analysis will not fit into memory, when the analysis is being done on many keys, or to develop algorithms for later use on data stored in HDFS (Hadoop Distributed File System). You can execute MATLAB MapReduce-based algorithms within Hadoop MapReduce using MATLAB Distributed Computing Server. You can package MapReduce-based algorithms for deployment to production Hadoop systems using MATLAB Compiler™.
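A multi-key mapreduce run on the desktop might look like the sketch below, which computes the maximum arrival delay per carrier. The data file, column names, and the function names maxDelayMapper and maxDelayReducer are hypothetical, chosen only for illustration. The mapper emits one partial maximum per key for each chunk, and the reducer combines the partial maxima for a key. The local functions sit at the end of the script, which requires a MATLAB release that supports functions in scripts; otherwise place them in separate files.

```matlab
% Hypothetical flight data written to a scratch folder.
folder = tempname;
mkdir(folder);
T = table({'AA'; 'UA'; 'AA'; 'DL'; 'UA'}, [5; -3; 12; 0; 9], ...
    'VariableNames', {'Carrier', 'ArrDelay'});
writetable(T, fullfile(folder, 'flights.csv'));

ds = datastore(fullfile(folder, '*.csv'));
outFolder = fullfile(folder, 'out');

% mapreducer(0) requests local, serial execution.
result = mapreduce(ds, @maxDelayMapper, @maxDelayReducer, mapreducer(0), ...
    'OutputFolder', outFolder, 'Display', 'off');
out = readall(result);                   % table with Key and Value columns

rmdir(folder, 's');                      % clean up the scratch folder

function maxDelayMapper(data, ~, intermKVStore)
    % Emit one (carrier, partial maximum) pair per carrier in this chunk.
    [carriers, ~, idx] = unique(data.Carrier);
    maxima = accumarray(idx, data.ArrDelay, [], @max);
    addmulti(intermKVStore, carriers, num2cell(maxima));
end

function maxDelayReducer(key, intermValIter, outKVStore)
    % Combine the partial maxima collected for one carrier.
    best = -inf;
    while hasnext(intermValIter)
        best = max(best, getnext(intermValIter));
    end
    add(outKVStore, key, best);
end
```

Because the mapper only ever sees one chunk and the reducer only one key's values, the same two functions are the pieces that would be reused when the datastore points at data in HDFS.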

Data Analytics with MATLAB
Diagram labels: Symbolic Computing, Neural Networks, Optimization, Signal Processing, Image Processing, Control Systems, Financial Modeling, Apps, Language, Machine Learning, Statistics

Enterprise-Scale Data Analytics
Diagram layers: Presentation Layer, Analytics Layer, Computation Layer, Data Layer. Other diagram labels: Data Visualization, MathWorks, Cloud, Data Warehouses, Databases.
Notes: This is the customer's point of view, especially if talking to an IT/Enterprise Architect. The key thing to take away from this slide is that there are many other companies in this space, but most of them should be considered complementary to what we offer. Only a few are competitive: R, Python, SAS. In the data layer are the big data vendors, data warehouse vendors, … The story here is that we can work with customers to help them integrate with these. Similar for the presentation layer…

Combining Big Data, RESTful Web Services, and MATLAB
- mapreduce and datastore functions
- table, categorical, and datetime data types are powerful in conjunction with big data analysis
- RESTful web service access
  - webread, webwrite, and weboptions
  - JSON objects represented as struct arrays
  - struct2table converts data into a table as a collection of heterogeneous data
- Combine to support the MATLAB data analytics workflow
  - Data import into appropriate data types
  - Data exploration
  - Data visualization
  - Data analysis
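The JSON-to-struct-to-table path can be tried offline. The sketch below assumes a small JSON string shaped like a yearly temperature response (the values after the first record are made up) and uses jsondecode, which needs a MATLAB release that provides it, to stand in for the decoding that webread performs on a JSON response.

```matlab
% Illustrative JSON text; only the shape matters here.
jsonText = ['[{"year":1901,"data":6.6187},' ...
            '{"year":1902,"data":6.2},' ...
            '{"year":1903,"data":6.4}]'];

S = jsondecode(jsonText);                % 3x1 struct array: year, data
T = struct2table(S);                     % one table row per struct element

% Convert the numeric year into a datetime for time-based analysis.
T.date = datetime(T.year, 1, 1);

% Explore: warmest years first.
T = sortrows(T, 'data', 'descend');
```

Once the data is in a table, the usual tools (sortrows, grouping functions, plotting against the datetime column) apply without further conversion.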

webread Example: Read historical temperature data
Read historical temperature data from the World Bank Climate Data API
    >> api = 'http://climatedataapi.worldbank.org/climateweb/rest/v1/';
    >> url = [api 'country/cru/tas/year/USA'];
    >> S = webread(url)
    S =
    112x1 struct array with fields:
        year
        data
    >> S(1)
    ans =
        year: 1901
        data: 6.6187

Demo: Using MATLAB to programmatically access and analyze data hosted on HDF Server
- HDF Server: a RESTful API providing remote access to HDF5 data
- Responses are JSON-formatted text
- webread with weboptions provides data access
- table and datetime data types enable data analysis
- Example: Coral Reef Temperature Anomaly Database (CoRTAD)
  - Version 3 CoRTAD products in HDF5 format
  - 1.8 GB dataset hosted on h5serv running on Amazon AWS
    thermStress = sortrows(thermStress,'ThermalStressAnomaly','descend');
    thermStress(1:10,:)
    ans =
        Latitude    Longitude    ThermalStressAnomaly
        ________    _________    ____________________
        -8.2839     137.53       52
        -2.0874     146.67       51
        -8.2399     137.49       50
        -8.2399     137.53       50
        -15.447     145.22       50
        -15.491     145.22       50
        -10.13      148.34       50
        -4.5924     135.99       49
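The analysis half of the demo can be sketched offline. The example below is an assumption-laden illustration, not the demo's code: the server address, domain name, dataset identifier, and request path are hypothetical placeholders (left commented out so nothing is fetched), and a few rows from the table above, deliberately shuffled, stand in for the decoded response.

```matlab
% Request options for a JSON-returning REST service.
opts = weboptions('ContentType', 'json', 'Timeout', 30);

% Hypothetical request; baseUrl, domain, datasetId, and the path are
% placeholders to be replaced with values from your own HDF Server.
% baseUrl   = 'http://<your-h5serv-host>:<port>';
% domain    = '<your-file-domain>';
% datasetId = '<dataset-uuid>';
% resp = webread([baseUrl '/datasets/' datasetId '/value'], ...
%                'host', domain, opts);

% Stand-in values copied from the slide's output, in shuffled order.
lat  = [-15.447; -8.2839; -4.5924; -2.0874];
lon  = [145.22; 137.53; 135.99; 146.67];
anom = [50; 52; 49; 51];

thermStress = table(lat, lon, anom, ...
    'VariableNames', {'Latitude', 'Longitude', 'ThermalStressAnomaly'});

% Rank locations by thermal stress anomaly, largest first.
thermStress = sortrows(thermStress, 'ThermalStressAnomaly', 'descend');
topTwo = thermStress(1:2, :);
```

Check the request path and query parameters against the documentation of the server you are using before enabling the commented lines.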

Thank you! Questions?
www.mathworks.com
www.mathworks.com/matlabcentral
Examples:
- Using the high-level HDF5 Functions to Import Data
- Tackling Big Data with MATLAB
- Performing Numerical Simulation of an Oil Spill
- Reading Content from RESTful Web Service

References
www.hdfgroup.org
https://hdfgroup.org/wp/2015/04/hdf5-for-the-web-hdf-server/
http://data.worldbank.org/developers/climate-data-api
https://data.nasa.gov/data
http://visibleearth.nasa.gov/
http://www.nodc.noaa.gov/sog/cortad/
http://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.nodc:0068999