Cyberinfrastructure and Its Application Cyberinfrastructure Day Salish Kootenai College, Pablo MT August 2 2011 Geoffrey Fox

Slides:



Advertisements
Similar presentations
Cloud Computing for Education & Cloud Learning Minjuan Wang to BT Research Center (Abu Dhabi) Educational Technology San Diego State University
Advertisements

Architecture and Measured Characteristics of a Cloud Based Internet of Things May 22, 2012 The 2012 International Conference.
Collaboration in the Cloud and online education environments The 2013 International Conference on Collaboration Technologies.
1 Challenges and New Trends in Data Intensive Science Panel at Data-aware Distributed Computing (DADC) Workshop HPDC Boston June Geoffrey Fox Community.
SALSA HPC Group School of Informatics and Computing Indiana University.
High Performance Computing Course Notes Grid Computing.
International Conference on Cloud and Green Computing (CGC2011, SCA2011, DASC2011, PICom2011, EmbeddedCom2011) University.
Clouds from FutureGrid’s Perspective April Geoffrey Fox Director, Digital Science Center, Pervasive.
Authors: Thilina Gunarathne, Tak-Lon Wu, Judy Qiu, Geoffrey Fox Publish: HPDC'10, June 20–25, 2010, Chicago, Illinois, USA ACM Speaker: Jia Bao Lin.
Clouds will win! Geoffrey Fox Director,
MS DB Proposal Scott Canaan B. Thomas Golisano College of Computing & Information Sciences.
1 Clouds and Sensor Grids CTS2009 Conference May Alex Ho Anabas Inc. Geoffrey Fox Computer Science, Informatics, Physics Chair Informatics Department.
Student Visits August Geoffrey Fox
Parallel Data Analysis from Multicore to Cloudy Grids Indiana University Geoffrey Fox, Xiaohong Qiu, Scott Beason, Seung-Hee.
1 Building National Cyberinfrastructure Alan Blatecky Office of Cyberinfrastructure EPSCoR Meeting May 21,
Clouds on IT horizon Faculty of Maritime Studies University of Rijeka Sanja Mohorovičić INFuture 2009, Zagreb, 5 November 2009.
A.V. Bogdanov Private cloud vs personal supercomputer.
1 Challenges Facing Modeling and Simulation in HPC Environments Panel remarks ECMS Multiconference HPCS 2008 Nicosia Cyprus June Geoffrey Fox Community.
Computing in Atmospheric Sciences Workshop: 2003 Challenges of Cyberinfrastructure Alan Blatecky Executive Director San Diego Supercomputer Center.
Big Data and Clouds: Challenges and Opportunities NIST January Geoffrey Fox
X-Informatics Introduction: What is Big Data, Data Analytics and X-Informatics? January Geoffrey Fox
X-Informatics Cloud Technology (Continued) March Geoffrey Fox Associate.
Cyber-Infrastructure in Education South Carolina State University Cyberinfrastructure Day March Geoffrey Fox
SALSASALSASALSASALSA AOGS, Singapore, August 11-14, 2009 Geoffrey Fox 1,2 and Marlon Pierce 1
Overview of Cyberinfrastructure Northeastern Illinois University Cyberinfrastructure Day August Geoffrey Fox
PolarGrid Geoffrey Fox (PI) Indiana University Associate Dean for Graduate Studies and Research, School of Informatics and Computing, Indiana University.
DISTRIBUTED COMPUTING
Science Clouds and FutureGrid’s Perspective June Science Clouds Workshop HPDC 2012 Delft Geoffrey Fox
OpenQuake Infomall ACES Meeting Maui May Geoffrey Fox
material assembled from the web pages at
Data Science at Digital Science October Geoffrey Fox Judy Qiu
What is Cyberinfrastructure? Russ Hobby, Internet2 Clemson University CI Days 20 May 2008.
Scientific Computing Environments ( Distributed Computing in an Exascale era) August Geoffrey Fox
SALSA HPC Group School of Informatics and Computing Indiana University.
SBIR Final Meeting Collaboration Sensor Grid and Grids of Grids Information Management Anabas July 8, 2008.
Service - Oriented Middleware for Distributed Data Mining on the Grid ,劉妘鑏 Antonio C., Domenico T., and Paolo T. Journal of Parallel and Distributed.
SALSASALSASALSASALSA FutureGrid Venus-C June Geoffrey Fox
Cyberinfrastructure and Its Application California State University Dominguez Hills Cyberinfrastructure Day June Geoffrey Fox
1 CReSIS Lawrence Kansas February Geoffrey Fox (PI) Computer Science, Informatics, Physics Chair Informatics Department Director Digital Science.
SALSASALSASALSASALSA Clouds Ball Aerospace March Geoffrey Fox
Virtual Appliances CTS Conference 2011 Philadelphia May Geoffrey Fox
Clouds will win! CTS Conference 2011 Philadelphia May Geoffrey Fox
Minority-Serving Institutions (MSI) Cyberinfrastructure (CI) Institute [MSI-CI 2 ] and CI Empowerment Coalition MSI-CIEC October Geoffrey Fox
EScience: Techniques and Technologies for 21st Century Discovery Ed Lazowska Bill & Melinda Gates Chair in Computer Science & Engineering Computer Science.
Security: systems, clouds, models, and privacy challenges iDASH Symposium San Diego CA October Geoffrey.
Cloud Computing Paradigms for Pleasingly Parallel Biomedical Applications Thilina Gunarathne, Tak-Lon Wu Judy Qiu, Geoffrey Fox School of Informatics,
Cyberinfrastructure Overview Russ Hobby, Internet2 ECSU CI Days 4 January 2008.
SALSASALSASALSASALSA Digital Science Center February 12, 2010, Bloomington Geoffrey Fox Judy Qiu
Big Data to Knowledge Panel SKG 2014 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China August Geoffrey Fox
HPC in the Cloud – Clearing the Mist or Lost in the Fog Panel at SC11 Seattle November Geoffrey Fox
Memcached Integration with Twister Saliya Ekanayake - Jerome Mitchell - Yiming Sun -
Directions in eScience Interoperability and Science Clouds June Interoperability in Action – Standards Implementation.
PARALLEL AND DISTRIBUTED PROGRAMMING MODELS U. Jhashuva 1 Asst. Prof Dept. of CSE om.
4a. Aula 2o. Período de Livro texto Copyright © 2012, Elsevier Inc. All rights reserved March 5, 2012 Prof. Kai Hwang, USC Cloud Roles in.
© 2007 IBM Corporation IBM Software Strategy Group IBM Google Announcement on Internet-Scale Computing (“Cloud Computing Model”) Oct 8, 2007 IBM Confidential.
Societal applications of large scalable parallel computing systems ARTEMIS & ITEA Co-summit, Madrid, October 30th 2009.
Cloud Computing: Concepts, Technologies and Business Implications B. Ramamurthy & K. Madurai &
EGI-InSPIRE EGI-InSPIRE RI The European Grid Infrastructure Steven Newhouse Director, EGI.eu Project Director, EGI-InSPIRE 29/06/2016CoreGrid.
Geoffrey Fox Panel Talk: February
Clouds , Grids and Clusters
Volunteer Computing for Science Gateways
Recap: introduction to e-science
Biology MDS and Clustering Results
Clouds from FutureGrid’s Perspective
Cyberinfrastructure and PolarGrid
Department of Intelligent Systems Engineering
Panel on Research Challenges in Big Data
Cloud versus Cloud: How Will Cloud Computing Shape Our World?
CReSIS Cyberinfrastructure
Presentation transcript:

Cyberinfrastructure and Its Application Cyberinfrastructure Day Salish Kootenai College, Pablo MT August Geoffrey Fox Director, Digital Science Center, Pervasive Technology Institute Associate Dean for Research and Graduate Studies, School of Informatics and Computing Indiana University Bloomington

Important Trends Data Deluge in all fields of science Multicore implies parallel computing important again – Performance from extra cores – not extra clock speed – GPU enhanced systems can give big power boost Clouds – new commercially supported data center model replacing compute grids (and your general purpose computer center) Light weight clients: Sensors, Smartphones and tablets accessing and supported by backend services in cloud Commercial efforts moving much faster than academia in both innovation and deployment

Jaliya Ekanayake - School of Informatics and Computing 3 Big Data in Many Domains  According to one estimate, we created 150 exabytes (billion gigabytes) of data in In 2010, we created 1,200 exabytesone  Enterprise Storage sold in 2010 was 15 Exabytes; BUT total storage sold (including flash memory etc.) was 1500 Exabytes  Size of the web ~ 3 billion web pages: MapReduce at Google was on average processing 20PB per day in January 2008 Size of the web  During 2009, American drone aircraft flying over Iraq and Afghanistan sent back around 24 years’ worth of video footage   New models being deployed this year will produce ten times as many data streams as their predecessors, and those in 2011 will produce 30 times as many  ~108 million sequence records in GenBank in 2009, doubling in every 18 monthsGenBank  ~20 million purchases at Wal-Mart a day  90 million Tweets a dayTweets a day  Astronomy, Particle Physics, Medical Records …  Most scientific task shows CPU:IO ratio of 10000:1 – Dr. Jim Gray  The Fourth Paradigm: Data-Intensive Scientific Discovery The Fourth Paradigm: Data-Intensive Scientific Discovery  Large Hadron Collider at CERN; 100 Petabytes to find Higgs Boson

66 What is Cyberinfrastructure Cyberinfrastructure is (from NSF) infrastructure that supports distributed research and learning (e-Science, e-Research, e- Education) Links data, people, computers Exploits Internet technology (Web2.0 and Clouds) adding (via Grid technology) management, security, supercomputers etc. It has two aspects: parallel – low latency (microseconds) between nodes and distributed – highish latency (milliseconds) between nodes Parallel needed to get high performance on individual large simulations, data analysis etc.; must decompose problem Distributed aspect integrates already distinct components – especially natural for data (as in biology databases etc.)

77 e-moreorlessanything ‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’ from inventor of term John Taylor Director General of Research Councils UK, Office of Science and Technology e-Science is about developing tools and technologies that allow scientists to do ‘faster, better or different’ research Similarly e-Business captures the emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world. This generalizes to e-moreorlessanything including e- DigitalLibrary, e-FineArts, e-HavingFun and e-Education A deluge of data of unprecedented and inevitable size must be managed and understood. People (virtual organizations), computers, data (including sensors and instruments) must be linked via hardware and software networks

The Span of Cyberinfrastructure High definition videoconferencing linking people across the globe Digital Library of music, curriculum, scientific papers Flickr, Youtube, Amazon …. Simulating a new battery design (exascale problem) Sharing data from world’s telescopes Using cloud to analyze your personal genome Enabling all to be equal partners in creating knowledge and converting it to wisdom Analyzing Tweets…documents to discover which stocks will crash; how disease is spreading; linguistic inference; ranking of institutions 8

UNIVERSITY OF CALIFORNIA, SAN DIEGO SAN DIEGO SUPERCOMPUTER CENTER Fran Berman Hubble Telescope Palomar Telescope Sloan Telescope “The Universe is now being explored systematically, in a panchromatic way, over a range of spatial and temporal scales that lead to a more complete, and less biased understanding of its constituents, their evolution, their origins, and the physical processes governing them.” Towards a National Virtual Observatory Tracking the Heavens

10 Virtual Observatory Astronomy Grid Integrate Experiments RadioFar-InfraredVisible Visible + X-ray Dust Map Galaxy Density Map

DNA Sequencing Pipeline Visualization Plotviz Blocking Sequence alignment MDS Dissimilarity Matrix N(N-1)/2 values FASTA File N Sequences Form block Pairings Pairwise clustering Illumina/Solexa Roche/454 Life Sciences Applied Biosystems/SOLiD Internet Read Alignment ~300 million base pairs per day leading to ~3000 sequences per day per instrument ? 500 instruments at ~0.5M$ each MapReduce MPI

100,043 Metagenomics Sequences

13 Lightweight Cyberinfrastructure to support mobile Data gathering expeditions plus classic central resources (as a cloud)

Data Centers Clouds & Economies of Scale I Range in size from “edge” facilities to megascale. Economies of scale Approximate costs for a small size center (1K servers) and a larger, 50K server center. Each data center is 11.5 times the size of a football field TechnologyCost in small- sized Data Center Cost in Large Data Center Ratio Network$95 per Mbps/ month $13 per Mbps/ month 7.1 Storage$2.20 per GB/ month $0.40 per GB/ month 5.7 Administration~140 servers/ Administrator >1000 Servers/ Administrator Google warehouses of computers on the banks of the Columbia River, in The Dalles, Oregon Such centers use 20MW-200MW (Future) each with 150 watts per CPU Save money from large size, positioning with cheap power and access with Internet

15 Builds giant data centers with 100,000’s of computers; ~ to a shipping container with Internet access “Microsoft will cram between 150 and 220 shipping containers filled with data center gear into a new 500,000 square foot Chicago facility. This move marks the most significant, public use of the shipping container systems popularized by the likes of Sun Microsystems and Rackable Systems to date.” Data Centers, Clouds & Economies of Scale II

Grids MPI and Clouds Grids are useful for managing distributed systems – Pioneered service model for Science – Developed importance of Workflow – Performance issues – communication latency – intrinsic to distributed systems – Can never run large differential equation based simulations or datamining Clouds can execute any job class that was good for Grids plus – More attractive due to platform plus elastic on-demand model – MapReduce easier to use than MPI for appropriate parallel jobs – Currently have performance limitations due to poor affinity (locality) for compute-compute (MPI) and Compute-data – These limitations are not “inevitable” and should gradually improve as in July Amazon Cluster announcement – Will probably never be best for most sophisticated parallel differential equation based simulations Classic Supercomputers (MPI Engines) run communication demanding differential equation based simulations – MapReduce and Clouds replaces MPI for other problems – Much more data processed today by MapReduce than MPI (Industry Informational Retrieval ~50 Petabytes per day)

Clouds and Jobs Clouds are a major industry thrust with a growing fraction of IT expenditure that IDC estimates will grow to $44.2 billion direct investment in 2013 while 15% of IT investment in 2011 will be related to cloud systems with a 30% growth in public sector. Gartner also rates cloud computing high on list of critical emerging technologies with for example “Cloud Computing” and “Cloud Web Platforms” rated as transformational (their highest rating for impact) in the next 2-5 years. Correspondingly there is and will continue to be major opportunities for new jobs in cloud computing with a recent European study estimating there will be 2.4 million new cloud computing jobs in Europe alone by Cloud computing is an attractive for projects focusing on workforce development. Note that the recently signed “America Competes Act” calls out the importance of economic development in broader impact of NSF projects

Gartner 2009 Hype Curve Clouds, Web2.0 Service Oriented Architectures Transformational High Moderate Low Cloud Computing Cloud Web Platforms Media Tablet

3-way Clouds and/or Cyberinfrastructure Use it in faculty, graduate student and undergraduate research – ~10 students each summer at IU from ADMI Teach it as it involves areas of Information Technology with lots of job opportunities Use it to support distributed learning environment – A cloud backend for course materials and collaboration – Green computing infrastructure

C4 = Continuous Collaborative Computational Cloud C4 EMERGING VISION While the internet has changed the way we communicate and get entertainment, we need to empower the next generation of engineers and scientists with technology that enables interdisciplinary collaboration for lifelong learning. Today, the cloud is a set of services that people explicitly have to access (from laptops, desktops, etc.). In 2020 the C4 will be part of our lives, as a larger, pervasive, continuous experience. The measure of success will be how “invisible” it becomes. We are no prophets and can’t anticipate what exactly will work, but we expect to have high bandwidth and ubiquitous connectivity for everyone everywhere, even in rural areas (using power-efficient micro data centers the size of shoe boxes). Here the cloud will enable business, fun, destruction and creation of regimes (societies) C4 Society Vision Wandering through life with a tablet/smartphone hooked to cloud Education should embrace C4 just as students do

C 4 Continuous Collaborative Computational Cloud C4C4 I N T E L I G L E N C E Motivating Issues job / education mismatch Higher Ed rigidity Interdisciplinary work Engineering v Science, Little v. Big science Modeling & Simulation C(DE)SE C 4 Intelligent Economy C 4 Intelligent People C 4 Intelligent Society NSF Educate “Net Generation” Re-educate pre “Net Generation” in Science and Engineering Exploiting and developing C 4 C 4 Curricula, programs C 4 Experiences (delivery mechanism) C 4 REUs, Internships, Fellowships Computational Thinking Internet & Cyberinfrastructure Higher Education 2020 CDESE is Computational and Data- enabled Science and Engineering

Implementing C4 in a Cloud Computing Curriculum Generate curricula that will allow students to enter cloud computing workforce Teach workshops explaining cloud computing to MSI faculty Write a basic textbook Design courses at Indiana University Design modules and modifications suitable to be taught at MSI’s Help teach initial MSI courses

ADMI Cloudy View on Computing Workshop June 2011 Jerome took two courses from IU in this area Fall 2010 and Spring 2011 ADMI: Association of Computer and Information Science/Engineering Departments at Minority Institutions Offered on FutureGrid (see later) 10 Faculty and Graduate Students from ADMI Universities The workshop provided information from cloud programming models to case studies of scientific applications on FutureGrid. At the conclusion of the workshop, the participants indicated that they would incorporate cloud computing into their courses and/or research. Concept and Delivery by Jerome Mitchell: Undergraduate ECSU, Masters Kansas, PhD Indiana

ADMI Cloudy View on Computing Workshop Participants

Published October 2011 Morgan Kaufmann

DDCPPIT Contents 1)Part 1: Systems Modeling, Clustering and Virtualization 1)Distributed System Models and Enabling Technologies 2)Computer Clusters for Scalable Computing 3)Virtual Machines and Virtualization of Clusters and Datacenters 2)Part 2: Computing Clouds, Service-Oriented Architecture and Programming 4)Cloud Platform Architecture over Virtualized Datacenters 5)Service-Oriented Architectures 6)Cloud Programming and Software Environments 3)Part 3: Grids, P2P and The Future Internet 7)Grid Computing and Resource Management 8)Peer-to-Peer Computing with Overlay Networks 9)Ubiquitous Clouds and The Internet of Things

Qiu: CSCI-B649 Fall 2010 Topics Class: Cloud Computing for Data Intensive Sciences This course offers to graduate students cloud computing programming models and tools to support data-intensive science applications. These include virtual machine-based utility computing environments such as Amazon AWS and Microsoft Azure. The class covers MapReduce for information retrieval, and scientific data analysis. Students have the opportunity to understand some commercial cloud systems through projects using FutureGrid resources Project topics were Matrix Multiplication with DryadLINQ; Implementing PhyloD application with DryadLINQ; Parallelism for Latent Dirichlet allocation (LDA); Memcached Integration with Twister; Improving Twister Messaging System Using Apache Avro; Large Scale PageRank with DryadLINQ; A Survey of Open-Source Cloud Infrastructure; A Survey on Cloud Storage Systems; Performance Analysis of HPC Virtualization Technologies. The projects were performed on FutureGrid and carefully chosen to include different aspects of the cloud architecture stack from top-level biology and large-scale graphics applications, optimization of MapReduce runtimes, Cloud storage, to low-level virtualization technologies.

Qiu: CSCI-B534 Spring 2011 Class: Distributed Systems This is motivated by the Internet and the data deluge for science with the emergence of data-oriented analysis as a fourth paradigm of scientific methodology. The content of B534 covers the design principles, systems architecture, and innovative applications of parallel, distributed, and cloud computing systems. These include supercomputing clusters, service-orient architecture (SOA), computational grids, P2P (peer-to-peer) networks, virtualized datacenters, and cloud platforms. The programming project for B534 class is a "platform" building bottom up, giving a stack with virtualization and dynamic provisioning capabilities to support the OS and science applications. This is a prototype of commercial cloud systems. Additional class topics included Authentication and Authorization Workflow; Data Transport Image Library; Generalize Azure Blob or Amazon S3; Data-Parallel File System; Queues: Publish Subscribe based queuing system. Later topics will be added to cover the complete "OS of OSs at Internet scale".

Some Next Steps Develop Appliances (Virtual machine based preconfigured computer systems) to support programming laboratories Offer Cloud Computing course with – Web portal support – FutureGrid or Appliances locally – Distance delivery Deliver first to ECSU, then other MSI’s Write proposals with Linda Hayden at ECSU and/or Montana and/or AIHEC Develop Cloud Computing Certificates and other degree offerings – Masters, Undergraduate, Continuing education …..

RS502 ECSU Proposed Class in Cloud Computing and Remote Sensing RS 502 will be the starting point for introduction of cloud computing methods and will be offered during the spring semester. RS 502 currently covers Mapping Concepts, Data Structures, Data Management Techniques, Data Acquisition, Global positioning system interface, and data manipulation and analysis. RS 502 has been taught in the past as a project based course using available data sets and on site visualization packages including ENVI. The new implementation of RS 502 will be teaching and using cloud computing techniques with CReSIS polar data sets. We expect about half the course to be the responsibility of Indiana University (Qiu) and the other half ECSU. The ECSU instructor of record (Dr. LeCompte) will work collaboratively with the IU co-instructor for RS 501 and RS 502. In addition CReSIS personnel will consult to identify the polar data sets for use in the course

JSU Syracuse Teaching Jackson State Fall 97 to Fall 2001