Project Matsu: Large Scale On-Demand Image Processing for Disaster Relief Collin Bennett, Robert Grossman, Yunhong Gu, and Andrew Levine Open Cloud Consortium.


1 Project Matsu: Large Scale On-Demand Image Processing for Disaster Relief Collin Bennett, Robert Grossman, Yunhong Gu, and Andrew Levine Open Cloud Consortium June 21, 2010 www.opencloudconsortium.org

2 Project Matsu Goals
Provide persistent data resources and elastic computing to assist in disasters:
– Make imagery available for disaster relief workers
– Elastic computing for large scale image processing
– Change detection for temporally different and geospatially identical image sets
Provide a resource for standards testing and interoperability studies for large data clouds

3 Part 1: Open Cloud Consortium

4 501(c)(3) not-for-profit corporation
Supports the development of standards, interoperability frameworks, and reference implementations.
Manages testbeds: Open Cloud Testbed and Intercloud Testbed.
Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.
Develops benchmarks.
www.opencloudconsortium.org

5 OCC Members
Companies: Aerospace, Booz Allen Hamilton, Cisco, InfoBlox, Open Data Group, Raytheon, Yahoo
Universities: CalIT2, Johns Hopkins, Northwestern Univ., University of Illinois at Chicago, University of Chicago
Government agencies: NASA
Open Source Projects: Sector Project

6 Operates Clouds
500 nodes, 3000 cores, 1.5+ PB, four data centers, 10 Gbps. Target is to refresh 1/3 of the hardware each year.
Open Cloud Testbed
Open Science Data Cloud
Intercloud Testbed
Project Matsu: Cloud-based Disaster Relief Services

7 Open Science Data Cloud
Astronomical data
Biological data (Bionimbus)
Networking data
Image processing for disaster relief

8 Focus of OCC Large Data Cloud Working Group
Cloud Storage Services
Cloud Compute Services (MapReduce, UDF, & other programming frameworks)
Table-based Data Services
Relational-like Data Services
App
Developing APIs for this framework.

9 Tools and Standards
Apache Hadoop/MapReduce
Sector/Sphere large data cloud
Open Geospatial Consortium – Web Map Service (WMS)
OCC tools are open source (matsu-project) – http://code.google.com/p/matsu-project/

10 Part 2: Technical Approach
Hadoop – Lead: Andrew Levine
Hadoop with Python Streams – Lead: Collin Bennett
Sector/Sphere – Lead: Yunhong Gu

11 Implementation 1: Hadoop & MapReduce Andrew Levine

12 Image Processing in the Cloud - Mapper
Step 1: Input to Mapper. Mapper input key: bounding box; mapper input value: the image.
Step 2: Processing in Mapper. The mapper resizes and/or cuts up the original image into pieces, producing output bounding boxes (e.g., minx = -135.0, miny = 45.0, maxx = -112.5, maxy = 67.5).
Step 3: Mapper Output. For each piece, the mapper emits an output key (the piece's bounding box) and an output value (the image piece plus its timestamp).
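
A minimal sketch of this tiling logic is shown below. It is illustrative Python (using Pillow), not the project's Java mapper; the fixed 2x2 split, the north-up orientation, and the helper names are assumptions.

# Illustrative sketch only (Python/Pillow), not the project's Java mapper.
# Cuts one georeferenced image into tiles and emits (bounding box, tile + timestamp).
from PIL import Image

def map_image(bbox, image_path, timestamp, splits=2):
    minx, miny, maxx, maxy = bbox
    img = Image.open(image_path)
    width, height = img.size
    dx, dy = (maxx - minx) / splits, (maxy - miny) / splits
    px, py = width // splits, height // splits
    for i in range(splits):                      # columns, west to east
        for j in range(splits):                  # rows, north to south
            tile = img.crop((i * px, j * py, (i + 1) * px, (j + 1) * py))
            tile_bbox = (minx + i * dx,          # minx
                         maxy - (j + 1) * dy,    # miny
                         minx + (i + 1) * dx,    # maxx
                         maxy - j * dy)          # maxy
            yield tile_bbox, (tile, timestamp)   # output key: bounding box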

13 Image Processing in the Cloud - Reducer
Step 1: Input to Reducer. Reducer input key: bounding box (e.g., minx = -45.0, miny = -2.8125, maxx = -43.59375, maxy = -2.109375); reducer input value: the set of image pieces for that bounding box.
Step 2: Process difference in Reducer. Assemble the images based on their timestamps and compare them; the result is a delta of the two images.
Step 3: Reducer Output. All images go to different map layers: the timestamp 1 set, the timestamp 2 set, and the delta set of images for display in WMS.
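
The delta step can be sketched in the same illustrative Python (again, not the project's Java reducer); assemble() is a hypothetical helper standing in for stitching the tiles of one timestamp, and a pixel-wise difference stands in for the change-detection step.

# Illustrative sketch only: group tiles by timestamp, assemble each set,
# and compute a pixel-wise delta for one bounding box.
from PIL import ImageChops

def assemble(tiles):
    # Placeholder: the real reducer assembles all tiles that share a
    # bounding box; here we simply assume one tile per timestamp.
    return tiles[0]

def reduce_bbox(bbox, timestamped_tiles):
    # timestamped_tiles: iterable of (timestamp, PIL.Image) for one bounding box
    by_time = {}
    for timestamp, tile in timestamped_tiles:
        by_time.setdefault(timestamp, []).append(tile)
    (t1, set1), (t2, set2) = sorted(by_time.items())[:2]
    img1 = assemble(set1)
    img2 = assemble(set2)
    delta = ImageChops.difference(img1, img2)    # delta of the two images
    return {t1: img1, t2: img2, "delta": delta}  # three map layers for WMS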

14 Implementation 2: Hadoop & Python Streams Collin Bennett

15 Preprocessing Step
All images (in a batch to be processed) are combined into a single file. Each line contains the image's byte array transformed to pixels (raw bytes don't seem to work well with the one-line-at-a-time Hadoop streaming paradigm).
Record format: geolocation \t timestamp | tuple size ; image width ; image height ; comma-separated list of pixels
The metadata fields (tuple size, image width, image height) are needed to process the image in the reducer.
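
A minimal sketch of this step, assuming Pillow for decoding and RGB images; the field ordering follows the record format above, and the batch list in the usage comment is hypothetical.

# Illustrative sketch: flatten each image into one line of text so that
# Hadoop streaming can process it one line at a time.
from PIL import Image

def image_to_line(path, geolocation, timestamp):
    img = Image.open(path).convert("RGB")
    width, height = img.size
    pixels = list(img.getdata())            # list of (r, g, b) tuples
    tuple_size = len(pixels[0])             # 3 for RGB
    flat = ",".join(str(v) for px in pixels for v in px)
    # geolocation \t timestamp | tuple size ; width ; height ; pixels
    return "%s\t%s|%d;%d;%d;%s" % (geolocation, timestamp,
                                   tuple_size, width, height, flat)

# Combine a whole batch into a single input file:
# with open("batch_input.txt", "w") as out:
#     for path, geo, ts in batch:           # batch is a hypothetical list
#         out.write(image_to_line(path, geo, ts) + "\n")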

16 Map and Shuffle
We can use the identity mapper; all of the work for mapping was done in the preprocessing step.
The map/shuffle key is the geolocation.
In the reducer, the timestamp will be the 1st field of each record when splitting on '|' (see the sketch below).
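
A minimal streaming-reducer sketch under those assumptions: Hadoop streaming delivers lines sorted by the geolocation key, the value is split on '|', and the timestamp is the first metadata field. process_group() is a placeholder for the change-detection step, not the project's code.

#!/usr/bin/env python
# Illustrative reducer sketch for Hadoop streaming (not the project's code).
import sys
from itertools import groupby

def parse(line):
    geolocation, value = line.rstrip("\n").split("\t", 1)
    timestamp, payload = value.split("|", 1)    # timestamp is the 1st field
    return geolocation, timestamp, payload

def process_group(geolocation, timestamped_payloads):
    # Placeholder: the real reducer decodes the pixel payloads and
    # computes the delta between timestamps for this geolocation.
    for timestamp, payload in timestamped_payloads:
        sys.stdout.write("%s\t%s\t%d chars\n" % (geolocation, timestamp, len(payload)))

if __name__ == "__main__":
    # Hadoop streaming guarantees the input is sorted by key (geolocation).
    records = (parse(line) for line in sys.stdin)
    for geo, group in groupby(records, key=lambda r: r[0]):
        process_group(geo, [(ts, payload) for _, ts, payload in group])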

17 Implementation 3: Sector/Sphere Yunhong Gu

18 Sector Distributed File System
Sector aggregates hard disk storage across commodity computers:
– Single namespace, file-system-level reliability (using replication), high availability.
Sector does not split files:
– A single image will not be split, so when it is being processed the application does not need to read data from other nodes over the network.
– As an option, a directory can also be kept together on a single node.

19 Sphere UDF
Sphere allows a User Defined Function (UDF) to be applied to each file (whether it holds a single image or multiple images).
Existing applications can be wrapped up in a Sphere UDF.
In many situations, the Sphere streaming utility accepts a data directory and an application binary as inputs:
./stream -i haiti -c ossim_foo -o results

20 For More Information info@opencloudconsortium.org www.opencloudconsortium.org

