Microsoft Research Faculty Summit 2008. Ian Foster Computation Institute University of Chicago & Argonne National Laboratory.

Presentation transcript:

Microsoft Research Faculty Summit 2008

Ian Foster Computation Institute University of Chicago & Argonne National Laboratory

If you want to build a ship, don’t drum up the men to gather wood, divide the work, and give orders. Instead, teach them to yearn for the vast and endless sea. Antoine de Saint-Exupéry

Folker Meyer, Genome Sequencing vs. Moore’s Law: Cyber Challenges for the Next Decade, CTWatch, August 2006.

Results out. Data in. Programs & rules in.
“No limits”: storage, computing, format, program.
Allowing for: versioning, provenance, collaboration, annotation.

having the interior immediately accessible
relatively free of obstructions to sight, movement, or internal arrangement
generous, liberal, or bounteous
in operation; live
readily admitting new members
not constipated

Rules, workflows, Dryad, MapReduce, parallel programs, SQL, BPEL, Swift, SCFL, R, MATLAB, Octave

Virtualization: run any program, store any data.
Indexing: automated maintenance.
Provisioning: policy-driven allocation of resources to competing demands.
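The slide does not say which policy governs provisioning. As one possible illustration, the sketch below uses weighted max-min fairness, a common way to divide a fixed pool among competing demands. The function name `allocate`, the consumer names, and the weights are all hypothetical and are not taken from the talk.

```python
def allocate(capacity, demands, weights):
    """Weighted max-min fair allocation (illustrative policy, not from the talk).

    capacity: total resource units available.
    demands:  dict of consumer name -> units requested.
    weights:  dict of consumer name -> positive priority weight.
    Returns a dict of consumer name -> units granted.
    """
    allocation = {name: 0.0 for name in demands}
    active = {name for name, demand in demands.items() if demand > 0}
    remaining = float(capacity)

    while active and remaining > 1e-12:
        total_weight = sum(weights[name] for name in active)
        # Offer every active consumer its weighted share of what is left.
        share = {name: remaining * weights[name] / total_weight for name in active}
        # Consumers whose outstanding demand fits inside their share are fully served.
        satisfied = {name for name in active
                     if demands[name] - allocation[name] <= share[name]}
        if not satisfied:
            # Nobody can be fully served: hand out the shares and stop.
            for name in active:
                allocation[name] += share[name]
            break
        for name in satisfied:
            grant = demands[name] - allocation[name]
            allocation[name] += grant
            remaining -= grant
        # Leftover capacity is re-divided among the still-unsatisfied consumers.
        active -= satisfied
    return allocation


# Hypothetical example: three groups compete for 100 units with equal weights.
print(allocate(100, {"ingest": 20, "analysis": 100, "remote": 100},
               {"ingest": 1, "analysis": 1, "remote": 1}))
```

A small demand is met in full and the capacity it leaves unused is shared among the larger demands. Changing the weights changes the policy without changing the mechanism.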

Data

Transform, annotate, search, add to, tag, visualize, discover, extend, group, share

Astrophysics, cognitive science, East Asian studies, economics, environmental science, epidemiology, genomic medicine, neuroscience, political science, sociology, solid state physics

PADS
500 TB reliable storage (data, metadata); 1000 TB tape backup
180 TB, 180 GB/s; 17 Top/s analysis
Data ingest, dynamic provisioning, parallel analysis, remote access, offload to remote data centers
Diverse users, diverse data sources

CPU cores:
Tasks:
Elapsed time: 7257 sec
Compute time: CPU yr
Average task time: 667 sec
Relative efficiency: 99.7% (from 16 to 32 racks)
Utilization: sustained 99.6%, overall 78.3%
Plot axis: time (secs)
Ioan Raicu, Zhao Zhang, Mike Wilde
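The slide reports utilization and relative efficiency without showing how they are computed. The sketch below uses common textbook definitions, which may differ from the ones used to produce the figures above. Relative efficiency is computed on the assumption that both runs execute the same fixed workload. The core count and task count in the example are hypothetical, because the transcript omits those values.

```python
def utilization(compute_core_seconds, cores, elapsed_seconds):
    """Fraction of the available core-seconds that were spent running tasks."""
    return compute_core_seconds / (cores * elapsed_seconds)


def relative_efficiency(base_cores, base_elapsed, scaled_cores, scaled_elapsed):
    """Efficiency of a larger run relative to a smaller run of the same workload.

    Assumes a fixed workload: a value of 1.0 means that doubling the cores
    halved the elapsed time.
    """
    return (base_cores * base_elapsed) / (scaled_cores * scaled_elapsed)


# Hypothetical inputs: only the 667 s average task time and the 7257 s
# elapsed time come from the slide; the other numbers are made up.
hypothetical_tasks = 1000
hypothetical_cores = 100
compute_core_seconds = hypothetical_tasks * 667

print(utilization(compute_core_seconds, hypothetical_cores, 7257))
print(relative_efficiency(16, 100.0, 32, 50.5))
```

Under these definitions, overall utilization counts the whole run, including ramp-up and ramp-down, which is one plausible reason an overall figure can be lower than a sustained one.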

HPC systems software (MPICH, PVFS, ZeptoOS)
Collaborative data tagging (GLOSS)
Data integration (XDTM)
HPC data analytics and visualization
Loosely coupled parallelism (Swift, Hadoop)
Dynamic provisioning (Falkon)
Service authoring (Introduce, caGrid, gRAVI)
Provenance recording and query (Swift)
Service composition and workflow (Taverna)
Virtualization management (Workspace Service)
Distributed data management (GridFTP, etc.)

Functional MRI. Ben Clifford, Mihael Hategan, Mike Wilde, Yong Zhao

SIDGrid
TeraGrid, PADS, …
Diverse experimental data & metadata
Browse data, search, content preview, transcode, download, analyze
Bennett Bertenthal, Mike Papka, Mike Wilde … and others

Results out. Data in. Programs & rules in.
“No limits”: storage, computing, format, program.
Allowing for: versioning, provenance, collaboration, annotation.