E-Science and Cyberinfrastructure Tony Hey Corporate VP for Technical Computing Microsoft Corporation.

Presentation transcript:

e-Science and Cyberinfrastructure Tony Hey Corporate VP for Technical Computing Microsoft Corporation

Licklider’s Vision
“Lick had this concept – all of the stuff linked together throughout the world, that you can use a remote computer, get data from a remote computer, or use lots of computers in your job”
Larry Roberts – Principal Architect of the ARPANET

What is e-Science?
‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it’
John Taylor, Former Director General of Research Councils, Office of Science and Technology, UK

e-Science
• e-Science is about data-driven, multidisciplinary science and the technologies to support such distributed, collaborative scientific research
• Many areas of science are now being overwhelmed by a ‘data deluge’ from new high-throughput devices, sensor networks, satellite surveys …
• Areas such as bioinformatics, genomics, drug design, engineering and healthcare require collaboration between different domain experts
• ‘e-Science’ is a shorthand for a set of technologies to support collaborative networked science
• HPC and Information Management are key technologies to support this e-Science revolution

The UK e-Science Initiative
• DTI and OST investment of over £250M over 5 years aimed at enabling the next generation of multi-disciplinary collaborative science and engineering
• Major collaborative industrial program with participation from the engineering, pharmaceutical, petrochemical, media and financial sectors
• Similar national e-Science programs now initiated around the world, e.g. China, South Korea, Australia, Germany, Spain, Netherlands … and now Chile

A New Science Paradigm
• Thousand years ago: Experimental Science
  - description of natural phenomena
• Last few hundred years: Theoretical Science
  - Newton’s Laws, Maxwell’s Equations …
• Last few decades: Computational Science
  - simulation of complex phenomena
• Today: e-Science or Data-centric Science
  - unify theory, experiment, and simulation
  - using data exploration and data mining
    - Data captured by instruments
    - Data generated by simulations
    - Data generated by sensor networks
• Scientist analyzes databases/files
(With thanks to Jim Gray)

Undersea Sensor Network Connected & Controllable Over the Internet

Visual Programming Persistent Distributed Storage

Distributed Computation Interoperability & Legacy Support via Web Services

Live Documents Searching & Visualization Reputation & Influence

Two examples of e-Science
• An Astronomy DataGrid – AstroGrid and the International Virtual Observatory
• Chemistry – The Comb-e-Chem Project

Powering the Virtual Universe Multi-wavelength showing the jet in M87: from top to bottom – X-ray, Optical, Infra-Red and Radio

The Multiwavelength Crab Nebula
X-ray, optical, infrared, and radio views of the nearby Crab Nebula, which is now in a state of chaotic expansion after a supernova explosion first sighted in 1054 A.D. by Chinese astronomers. Slide courtesy of Robert CalTech. Crab star 1053 AD

International Virtual Observatory
• Data has no commercial value
• No privacy concerns
• Can freely share results with others
• Great for experimenting with algorithms
• Data is real and well documented
• High-dimensional data
• Spatial data
• Temporal data
• Data from many different instruments, places and times
• Federation is a key goal
• There is a lot of data (petabytes)
With thanks to Jim Gray
Image panels: IRAS 100 µm, ROSAT ~keV, DSS Optical, 2MASS 2 µm, IRAS 25 µm, NVSS 20cm, WENSS 92cm, GB 6cm

IVO: An Astronomy Data Grid
• Working to build world-wide telescope
• All astronomy data and literature online and cross indexed
• Tools to analyze it
• Built SkyServer.SDSS.org
• Built Analysis system
• MyDB
• CasJobs (batch job)
• OpenSkyQuery: Federation of ~20 observatories
• Results:
• It works and is used every day
• Spatial extensions in SQL 2005
• A good example of Data Grid
• A good example of Web Services
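
The spatial queries behind a service like SkyServer largely come down to asking which catalog objects lie within a small angle of a given sky position (a "cone search"). A minimal Python sketch of that idea follows. It assumes a tiny in-memory catalog whose object names and coordinates are invented for illustration, and it is not the SkyServer or SQL 2005 implementation, which would use spatial indexing rather than a linear scan.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cone_search(catalog, ra, dec, radius_deg):
    """Return names of catalog objects within radius_deg of (ra, dec); linear scan."""
    return [name for name, (obj_ra, obj_dec) in catalog.items()
            if angular_separation_deg(ra, dec, obj_ra, obj_dec) <= radius_deg]

# Hypothetical catalog: name -> (right ascension, declination) in degrees.
CATALOG = {
    "obj_a": (10.00, 20.00),
    "obj_b": (10.05, 20.02),
    "obj_c": (50.00, -30.00),
}

if __name__ == "__main__":
    print(cone_search(CATALOG, ra=10.0, dec=20.0, radius_deg=0.1))
```

A federated query in the OpenSkyQuery sense would run a search like this against several archives and cross-match the results; the sketch covers only the single-catalog step.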

The Comb-e-Chem Project National X-Ray Service Data Mining and Analysis Automatic Annotation Combinatorial Chemistry Wet Lab HPC Simulation Video Data Stream Diffractometer Middleware Structures Database

National Crystallographic Service Send sample material to NCS service Search materials database and predict properties using Grid computations Download full data on materials of interest Collaborate in e-Lab experiment and obtain structure

A digital lab book replacement that chemists were able to use, and liked

Monitoring laboratory experiments using a broker delivered over GPRS on a PDA

Crystallographic e-Prints Direct Access to Raw Data from scientific papers Raw data sets can be very large - stored at UK National Datastore using SRB software

Cyberinfrastructure
• Cyberinfrastructure and e-Infrastructure
• In the US, Europe and Asia there is a common vision for the ‘cyberinfrastructure’ required to support the e-Science revolution
• Set of Middleware Services supported on top of high bandwidth academic research networks
• Software, hardware and organizations that support e-Science
• Similar to vision of the Grid as a set of services that allows scientists – and industry – to routinely set up ‘Virtual Organizations’ for their research – or business
• The ‘Microsoft Grid’ vision is as much about integrating and managing data and information as about compute cycles

Grids for Virtual Organizations

Service-Orientation for building Distributed Systems

Web Services and Interoperability Company A (J2EE) Open Source (OMII) Company C (.NET) Web Services

Progress in Grid Standards?
• The GGF/EGA merger gives great opportunity for the new Open Grid Forum (OGF) to standardize a small set of basic Grid services based on generally accepted Web Services
• Harness the power of the world-wide Grid community to develop robust open source reference implementations
• Grid research community needs to propose and explore new features in real experiments
• OGF can reassure industry about progress in Grid standards and grow the market for all

Key Data Issues for e-Science
• Networks
  - Lambda technology
• The Data Life Cycle
  - From Acquisition to Preservation
• Scholarly Communication
  - Open Access to Data and Publications

Computation Starlight (Chicago) Netherlight (Amsterdam) Leeds PSC SDSC UCL Network PoP Service Registry NCSA Manchester UKLight Oxford RAL US TeraGrid UK NGS Steering clients AHM 2004 Local laptops and Manchester vncserver All sites connected by production network (not all shown) An International e-Infrastructure

The Problem for the e-Scientist
• Data ingest
• Managing a petabyte
• Common schema
• How to organize it?
• How to reorganize it?
• How to coexist & cooperate with others?
• Data Query and Visualization tools
• Support/training
• Performance
• Execute queries in a minute
• Batch (big) query scheduling
Figure labels: Experiments & Instruments, Simulations, Literature, Other Archives, facts, questions, answers

The e-Science Data Life Cycle
• Data Acquisition
• Data Ingest
• Metadata
• Annotation
• Provenance
• Data Storage
• Data Cleansing
• Data Mining
• Curation
• Preservation

Scholarly Communication
• Global Movement towards permitting ‘Open Access’ to scholarly publications
• Libraries can no longer afford publisher subscriptions
• Principle that results of publicly funded research should be available to all
• Mandates for Open Access
• US Proposal – Cornyn-Lieberman Bill
• Supported by most top US research universities
• EU Proposals
• UK, French and German initiatives

NSF ‘Atkins’ Report on Cyberinfrastructure
• ‘the primary access to the latest findings in a growing number of fields is through the Web, then through classic preprints and conferences, and lastly through refereed archival papers’
• ‘archives containing hundreds or thousands of terabytes of data will be affordable and necessary for archiving scientific and engineering information’

Interoperable Repositories?
• Paul Ginsparg’s arXiv at Cornell has demonstrated a new model of scientific publishing
• Electronic version of ‘preprints’ hosted on the Web
• David Lipman of the NIH National Library of Medicine has developed PubMedCentral as repository for NIH funded research papers
• Microsoft funded development of ‘portable PMC’ now being deployed in UK and other countries
• Stevan Harnad’s ‘self-archiving’ EPrints project in Southampton provides a basis for OAI-compliant ‘Institutional Repositories’
• JISC-funded TARDis Project at Southampton is hybrid of full-text open access and links to publisher sites
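
What makes an ‘OAI-compliant’ repository interoperable is that a harvester can request its metadata records over plain HTTP using the OAI-PMH protocol. The Python sketch below only builds such a request URL. `ListRecords`, `metadataPrefix`, `from`, `set` and the `oai_dc` format come from the OAI-PMH protocol itself, but the repository address is a placeholder and the helper name is hypothetical; nothing here is taken from the EPrints or TARDis projects.

```python
from urllib.parse import urlencode

def oai_list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper for illustration)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date   # harvest only records changed since this date
    if set_spec:
        params["set"] = set_spec     # restrict the harvest to one repository set
    return base_url + "?" + urlencode(params)

if __name__ == "__main__":
    # Placeholder endpoint; a real repository publishes its own base URL.
    print(oai_list_records_url("https://repository.example.org/oai", from_date="2006-01-01"))
```

Because every compliant repository answers the same request shape, one harvester can aggregate records from many institutions.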

The Service Revolution
• Web 2.0
• Social networks, tagging for sharing, e.g. Flickr, Del.icio.us, MySpace, CiteULike, Connotea …
• Wikis, Blogs, RSS, folksonomies …
• Software delivered as a service
• Microsoft Live services
• Office Live
• Xbox Live
• Windows Live Academic
• Mashups
• Craigslist + GoogleMap

e-Science Mashups?
Combine services to give added value

Technical Computing at Microsoft
• Advanced Computing for Science and Engineering
  - Application of new algorithms, tools and technologies to scientific and engineering problems
• High Performance Computing
  - Application of high performance clusters and database technologies to industrial and scientific applications
• Radical Computing
  - Research in potential breakthrough technologies

Fighting HIV with Computer Science
Nebojsa Jojic and David Heckerman
• A major problem: Over 40 million infected
• Drug treatments are effective but are an expensive life commitment
• Vaccine needed for third world countries
• Effective vaccine could eradicate disease
• Methods from computer science are helping with the design of vaccine
• Machine learning: Finding biological patterns that may stimulate the immune system to fight the HIV virus
• Optimization methods: Compressing these patterns into a small, effective vaccine

Developed Set of Specialist Tools
• Chromatogram deconvolution
• Pathway analysis/association/causal models
• Clustering/Trees (phylo, haplotypes etc.)
• Protein binding and folding
• Sequence diversity models (epitomes)
• Image analysis/classification
• Evolution modeling and inference
• Epitope prediction

Kyril Faenov, Director of HPC Microsoft Corporation

Top 500 Architectures / Systems

HPC: Market Trends
Market segments by system price:
• Capability, Enterprise: $1M+
• Divisional: $250K-$1M
• Departmental: $50-250K
• Workgroup: <$50K
2004 Systems: 1,167 3,915 22, ,802
Source: IDC, 2005
<$250K – 97% of systems, 52% of revenue
In 2004 clusters grew 96% to 37% by revenue
Average cluster size nodes

Windows Compute Cluster Server 2003
• Faster time-to-insight through simplified cluster deployment, job submission and status monitoring
• Better integration with existing Windows infrastructure allowing customers to leverage existing technology and skill-sets
• Familiar development environment allows developers to write parallel applications from within the powerful Visual Studio IDE

[Figure: processor power density (W/cm²) by year, ‘70 to ‘10, showing the Pentium® against reference levels: Hot Plate, Nuclear Reactor, Rocket Nozzle, Sun’s Surface. Source: Pat Gelsinger, Intel Developer Forum, Spring]

Radical Computing
• Future of silicon chips
• “100’s of cores on a chip in 2015” (Justin Rattner, Intel)
• Challenge for IT industry and Computer Science community
• Can we make parallel computing on a chip easier than message-passing?
• Challenge for the Scientific Community
• How will the Multi-Core transition affect scientific computing?

Even more Radical Computing?
• Quantum Computing
• Multidisciplinary Institute at UCSB
• Director is Michael Freedman of MSR
• Looking at novel material physics and the 2-dimensional Quantum Hall effect
• Exploring non-Abelian ‘Anyon’ excitations to protect coherence of qubits and quantum gates
• See Scientific American April 2006 for description of ‘Project Q’

Multiplication versus Factoring
X = Prime Factors of the 129-digit number known as RSA-129
Multiplication is classically computationally ‘easy’ but factorization is computationally ‘hard’
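
The asymmetry can be seen even at toy scale. The Python sketch below multiplies two small primes in a single step and then recovers them by trial division, counting the divisions tried. The primes are chosen arbitrarily for illustration, and trial division stands in for the far better classical methods used on numbers like RSA-129; the point is only that the factoring work grows quickly with the size of the number while the multiplication does not.

```python
def trial_division(n):
    """Return (prime factors of n in ascending order, number of trial divisions performed)."""
    factors, steps, d = [], 0, 2
    while d * d <= n:
        steps += 1
        if n % d == 0:
            factors.append(d)   # d divides n: record it and keep dividing by the same d
            n //= d
        else:
            d += 1
    if n > 1:
        factors.append(n)       # whatever remains is itself prime
    return factors, steps

if __name__ == "__main__":
    p, q = 10007, 10009         # two small primes, picked only for illustration
    n = p * q                   # the 'easy' direction: one multiplication
    factors, steps = trial_division(n)   # the 'hard' direction
    print(n, factors, steps)
```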

Peter Shor developed new factorization algorithm for a quantum computer
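
Shor's algorithm works by reducing factoring to finding the order r of a number a modulo N, that is, the smallest r with a^r ≡ 1 (mod N); the quantum computer is what makes the order-finding step fast. The Python sketch below shows only the classical reduction and finds the order by brute force, which takes exponential time in general, so it is not a quantum algorithm and gives no speed-up. The function names are hypothetical, and the base a is assumed to satisfy 1 < a < N.

```python
from math import gcd

def order(a, n):
    """Smallest r > 0 with a**r % n == 1, by brute force (assumes gcd(a, n) == 1).
    This is the step a quantum computer would accelerate."""
    r, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        r += 1
    return r

def factor_via_order(n, a):
    """Try to split n using base a (1 < a < n). Returns a factor pair, or None if a is unlucky."""
    g = gcd(a, n)
    if g != 1:
        return g, n // g          # a already shares a factor with n
    r = order(a, n)
    if r % 2 == 1:
        return None               # odd order: pick another a
    y = pow(a, r // 2, n)         # y*y ≡ 1 (mod n) and y != 1 because r is minimal
    if y == n - 1:
        return None               # y ≡ -1 (mod n) yields only trivial factors
    p = gcd(y - 1, n)             # otherwise gcd(y - 1, n) is a nontrivial factor
    return p, n // p

if __name__ == "__main__":
    print(factor_via_order(15, 7))
```

When a base fails, the procedure is simply retried with a different randomly chosen a.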

Summary
Microsoft wishes to work with the university research and library communities to:
• develop interoperable high-level services, work flows, tools and data services
• accelerate progress in a small number of societally important scientific applications
• assist in the development of interoperable repositories and new models of scholarly publishing
• explore radical new directions in computing and ways and applications to exploit on-chip parallelism
How can Microsoft best collaborate with the scientific community?

© 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.