Clemson Computing and Information Technology Introduction to Cyberinfrastructure (aka: Grids, Clouds, etc) Creating an Integrated South Carolina Environmental.

1 Clemson Computing and Information Technology
Introduction to Cyberinfrastructure (aka Grids, Clouds, etc.)
Creating an Integrated South Carolina Environmental and Human Health Grid: Using Cyberinfrastructure to Link Hazards, Exposures and Health Effects
June 25, 2009
Jim Bottum and Jill Gemmill

2 Cyberinfrastructure (CI)
Cyberinfrastructure = information technology we all use.
Cyberinfrastructure has a four-legged strategic approach:
- Computing and communications
- Data: storage, retrieval, archiving, visualization, mining, security
- Virtual organizations: software environments for communities
- Education and workforce training

3 CI enables scaling up science
Citation network analysis in sociology: work of James Evans, University of Chicago, Department of Sociology

4 Scaling up the analysis
Query and analysis of 25+ million citations
- Analysis began on desktop workstations
- Queries grew to month-long duration
- Moved analysis to a University of Chicago compute cluster: 50 (faster) CPUs gave a 100x speedup
- Many more methods and hypotheses can be tested!
Higher throughput and capacity enable deeper analysis and broader community access.
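
The speedup figures on this slide can be checked with a little arithmetic. The sketch below assumes perfect scaling across CPUs and uses 30 days as a stand-in for "month-long"; both are assumptions, since the slide gives only the 50 CPUs and the 100x figure.

```python
def per_cpu_factor(total_speedup: float, cpu_count: int) -> float:
    """Share of the speedup due to each CPU being faster, assuming perfect scaling."""
    return total_speedup / cpu_count


def scaled_runtime_hours(baseline_days: float, total_speedup: float) -> float:
    """Runtime in hours once the overall speedup is applied."""
    return baseline_days * 24 / total_speedup


if __name__ == "__main__":
    # 50 CPUs and 100x come from the slide; 30 days is an assumed baseline.
    print(per_cpu_factor(100, 50))        # each cluster CPU about 2x a desktop CPU
    print(scaled_runtime_hours(30, 100))  # a month-long query drops to a few hours
```

Under these assumptions, a query that once took a month finishes within a working day, which is what makes testing many more hypotheses practical.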

5 CI is the convergence of technologies:
- Telephone / voice
- Photography / images
- Video / movies / animation
- Information / data / library

6 Clemson CI Vision
“Cyberinfrastructure is the primary backbone that ties together innovation in research, instruction, and service to elevate Clemson to the Top 20.”
Dori Helms, Provost

7 High-performance CI is similar, but scaled up in size and complexity

8 Independent computations can be done in parallel
Many computations (modeling or analysis of data) consist of:
- Large datasets as inputs (find datasets)
- “Transformations” which work on the input datasets (process)
- The output datasets (store and publish)
Figure: Montage workflow, ~1200 jobs, 7 levels (NVO, NASA, ISI/Pegasus; Deelman et al.). Legend: data transfer, compute job.
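
The input, transformation, output pattern described here can be sketched in a few lines. This is a minimal illustration, not the Montage or Pegasus workflow: the `transform` function and the sample datasets are invented, and a thread pool stands in for the cluster nodes that would run the jobs in practice.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(dataset: list[int]) -> int:
    """Stand-in transformation: reduce one input dataset to a single value."""
    return sum(x * x for x in dataset)


def run_workflow(datasets: list[list[int]], workers: int = 4) -> int:
    """Process independent datasets concurrently, then combine the outputs."""
    # Level 1: the transformations do not depend on each other,
    # so they can all be submitted at once.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(transform, datasets))
    # Level 2: this step needs every level-1 output, so it runs afterwards.
    return sum(partial_results)


if __name__ == "__main__":
    inputs = [[1, 2, 3], [4, 5], [6]]
    print(run_workflow(inputs))
```

Jobs within a level are independent and can run side by side, while each level has to wait for the one before it. A workflow with 7 levels and ~1200 jobs follows the same shape on a larger scale.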

9 Clusters: High Performance Computing
Clemson Palmetto Cluster: #60 supercomputer, #4 among stand-alone US universities. 756 “PCs” (commodity hardware) with closely coupled, high-speed communications and large-scale, very fast storage.
- ½ petabyte of storage
- 45 teraflops
- 756 compute nodes
- High-throughput interconnects (10 Gbps)

10 What problems? (months to minutes)
- Estimation and Inference about Efficiency in Production Settings (Paul Wilson, Economics)
- Protein Interaction with Synthetic Materials (Robert Latour, Bioengineering)
- Effects of School Characteristics and Parents' Work Behaviors on Children's Performance in School: modeling the parents' choices of places of residence and their labor market behaviors, over 15,000 school districts (Tom Mroz, Economics)

11 “Snapshot” of Palmetto in use

12 High Throughput Computing
Harvesting unused student lab computer cycles using Condor software
- 1700 lab PCs can be used as a supercomputer
- Small amounts of storage
- Low-throughput connections
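
To give a sense of how work is handed to a Condor pool, the sketch below builds the text of a submit description for a batch of independent jobs. The executable name and file names are hypothetical, and the slides do not show Clemson's actual configuration. The keywords used (`universe`, `executable`, `arguments`, `output`, `error`, `log`, `queue`) are standard Condor submit commands.

```python
def condor_submit_text(executable: str, job_count: int) -> str:
    """Build a submit description for `job_count` independent jobs."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        # $(Process) expands to 0, 1, 2, ... so each job gets its own index.
        "arguments = $(Process)",
        "output = job_$(Process).out",
        "error = job_$(Process).err",
        "log = jobs.log",
        f"queue {job_count}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "render_frame.sh" is a made-up script name used only for illustration.
    print(condor_submit_text("render_frame.sh", 100))
```

The resulting text would be saved to a file and passed to `condor_submit`. Each queued job then runs on whichever lab PC is idle, which suits workloads made of many small, independent tasks.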

13 What problems? (months to minutes)
- Manufacturing and scheduling optimization (Mary Beth Kurz, Industrial Engineering)
- Rendering architectural 3D views and walk-throughs
- ArcGIS computations over multiple inputs and very large areas

14 Leverage Existing Investments
Linked 1200 Clemson student lab machines to provide high-throughput compute power.
“.. Grid computing is saving me”
“I had all but given up on this line of research, when I was approached with this Condor idea. Now I am doing work at a scale larger than is usually done.”
M. B. Kurz, Industrial Engineering

15 National Trend: Grids, Clouds
Similar to power grids, computing grids enable resource sharing.

16 National Trend: Clouds
Cloud computing enables faster, less expensive processing across geographically distributed data centers.
“We demonstrated that our system is six times faster than competing technology.” Robert Grossman, NCDM director (NSF press release, Feb. 25, 2009)
Image: cloud system at the NSF-supported National Center for Data Mining at the University of Illinois at Chicago. Credit: Open Cloud Consortium

17 National and Regional Research and Education Networks
- South Carolina Light Rail and C-Light
- FutureNet connects to every major R&E network via multiple lambdas
- Higher bandwidth and optional dedicated fiber paths connect grids
- R&E connections are international

18 State Network Foundation
SCLR, FCC Rural Health Care, DoE – PSA. ARRA stimulus broadband activities may build on this.

19 “Last Mile” Clemson Infrastructure
Clemson 2006:
- Networks: no redundancy; Mbps
- Data center
- No HPC
Clemson 2008:
- Networks: C-Light (used no taxpayer $$); SCLR – I2 – NLR; Gbps
- ~30,000 sf aggregate data center
- Re-engineered SAN; petascale of storage
- Collocation, Condominium, Condor
- Approaching 100 teraflops
- #60 on Intl Top
- X7X365 NOC

20 National Trend: VOs and Communities
Virtual Organization: a group of people who share a common goal and share resources (or need resources) to achieve that goal. Examples:
- A classroom as a VO
- A group of astronomers
- Providers of a common application such as
Sharing resources among virtual organizations is the problem grid computing is trying to solve.
Portals / Science Gateways / Hubs make it easy to use grids: all you need is a web browser.

21 Another way of looking at grids
Diagram labels: Computation, Visualization, Education, Data; Portal / Gateway / Hub; Middleware (grid software); R&E networks

22 Example Virtual Organizations

24 Intelligent River™
Watershed monitoring: linking water, land use, energy, and global climate change
Source: WATERS Network cyberinfrastructure, NSF. Slide courtesy of Gene Eidson.

25 Open Parks Grid

26 Grids are enhancing and expanding traditional scientific methods

27 A Disruptive Technology
Examples of some previous disruptive technologies, and what changed as a result:
- The printing press: widespread literacy
- The telegraph and telephone: real-time communications over distance
- The web browser: anyone can use a computer

28 Grids are Disruptive Technologies
(1) The world is “flat”
(2) Everyone can be a producer as well as a consumer of information
(3) “Web Services” make information reusable and available anywhere
(4) Information can be customized to your interests (think Amazon.com)

29 What are “Web Services”?
The map and data on the left represent streamflow conditions. The data is collected by the USGS and displayed on a map on their web site. The same data is also made available as a web service.

30 Therefore, the Intelligent River™ (IR) can re-display this data in a different manner and combine it with other, project-specific data. The USGS data used is always current, but can be stored as well. In the same way, IR data can be re-used by others. Standards and metadata make this possible.
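
A rough sketch of this kind of reuse follows: take streamflow readings returned by a web service, keep the latest value per site, and merge in project-specific fields. The payload layout, the field names (`readings`, `site`, `time`, `flow_cfs`) and the `turbidity_ntu` field are invented for illustration. They are not the USGS format or the Intelligent River schema.

```python
import json


def latest_flow_by_site(payload: str) -> dict[str, float]:
    """Return the most recent streamflow reading for each site in the payload."""
    latest: dict[str, tuple[str, float]] = {}
    for record in json.loads(payload)["readings"]:
        site, when, flow = record["site"], record["time"], record["flow_cfs"]
        # ISO 8601 timestamps in the same time zone sort correctly as strings.
        if site not in latest or when > latest[site][0]:
            latest[site] = (when, flow)
    return {site: flow for site, (_, flow) in latest.items()}


def combine(flows: dict[str, float], project_data: dict[str, dict]) -> list[dict]:
    """Attach project-specific fields to each site's streamflow value."""
    return [
        {"site": site, "flow_cfs": flow, **project_data.get(site, {})}
        for site, flow in sorted(flows.items())
    ]


if __name__ == "__main__":
    sample = json.dumps({"readings": [
        {"site": "A", "time": "2009-06-25T10:00", "flow_cfs": 120.0},
        {"site": "A", "time": "2009-06-25T11:00", "flow_cfs": 130.0},
        {"site": "B", "time": "2009-06-25T10:30", "flow_cfs": 45.5},
    ]})
    print(combine(latest_flow_by_site(sample), {"A": {"turbidity_ntu": 3.2}}))
```

Because the data arrives in a documented, machine-readable format, the consuming project needs no special arrangement with the provider, which is the role standards and metadata play here.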

31 Science is Disrupted, too
Old way:
- The experimental notebook
- The filing cabinet
- The library catalog
- Purchase/write your own analysis software
- Results made available to research specialists via conference presentations and journal publications
- Results are written, presented and preserved on paper
- Information is organized and presented for a specific audience

32 Using Grids
- The experimental notebook → Digital data in standardized format; use metadata to record date, parameters, owner, etc.
- The filing cabinet → The database
- The library catalog → The search engine
- Purchase/write your own analysis software → Use analytical/computational services available in the Grid
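
As one possible picture of "digital data in standardized format" with metadata, the sketch below stores a measurement together with its date, owner and parameters and serializes it to JSON. The field names and sample values are assumptions chosen for illustration and do not follow any particular metadata standard.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Observation:
    """One measurement plus the metadata that makes it reusable by others."""
    value: float
    units: str
    collected_on: str  # ISO 8601 date
    owner: str
    parameters: dict[str, str] = field(default_factory=dict)


def to_json(obs: Observation) -> str:
    """Serialize with sorted keys so identical records always produce identical text."""
    return json.dumps(asdict(obs), sort_keys=True)


if __name__ == "__main__":
    # All values below are made up for the example.
    obs = Observation(7.4, "pH", "2009-06-25", "example-lab", {"sensor": "probe-1"})
    print(to_json(obs))
```

Unlike a page in a paper notebook, a record like this can be loaded into a database, found by a search engine, and passed to an analysis service without retyping.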

33 Using Grids (cont’d)
- Results made available to research specialists via conference presentations and journal publications → Results are available to anyone interested. Data may be made available upon collection or analysis. Results may be made available as simulations, animations, movies, tutorials, interactive games, or a mix of these with text.
- Results are written, presented and preserved on paper → Results are digitally stored, archived, and available on the Internet

34 Using Grids (cont’d)
- Information is organized and presented for a specific audience → Information and resources can be presented in a highly customized manner

35 Conclusion: Why Cyberinfrastructure?
New approaches to inquiry based on:
- Deep analysis of huge quantities of data
- Interdisciplinary collaboration
- Large-scale simulation and analysis
- Smart instrumentation
Dynamically assemble the resources to tackle a new scale of problem.
Enabled by access to resources and services without regard for location and other barriers.

36 Discussion

