Space …. are big. Really big. You just won't believe how vastly, hugely, mindbogglingly big they are. Massive data streams Douglas Adams – Hitchhiker’s.

Slides:



Advertisements
Similar presentations
Microsoft Research Microsoft Research Jim Gray Distinguished Engineer Microsoft Research San Francisco SKYSERVER.
Advertisements

1 Online Science The World-Wide Telescope as a Prototype For the New Computational Science Jim Gray Microsoft Research
Designing Services for Grid-based Knowledge Discovery A. Congiusta, A. Pugliese, Domenico Talia, P. Trunfio DEIS University of Calabria ITALY
Computing for LHC Dr. Wolfgang von Rüden, CERN, Geneva ISEF students visit CERN, 28 th June - 1 st July 2009.
1 A walk through the Universe Space is big. You just wont believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think its a long way down.
Provenance-Aware Storage Systems Margo Seltzer April 29, 2005.
Brian Schmidt The Research School of Astronomy and Astrophysics Mount Stromlo & Siding Spring Observatories.
LOGO Emil Persson Head of Research Joel de Vahl Engine Programmer.
Large Scale Computing Systems
The profile of the management (data) scientist: Potential scenarios and skills for B/SMD-based Management research Juan Mateos-Garcia, Nesta P&R NEMODE.
9 v 2012Kavli IPMU1 Computational Astrophysics at the Kavli Institute for Particle Astrophysics and Cosmology at Stanford Roger Blandford.
A Berkeley View of Big Data Ion Stoica UC Berkeley BEARS February 17, 2011.
UNCLASSIFIED: LA-UR Data Infrastructure for Massive Scientific Visualization and Analysis James Ahrens & Christopher Mitchell Los Alamos National.
Announcements I’m enjoying reading your projects! Turn in homework 8 by 5:00 today Pick up Homework 9. Test this week (W, Th, F before class) To put things.
Navigation Jeremy Wyatt School of Computer Science University of Birmingham.
B1 -Biogeochemical ANL - Townhall V. Rao Kotamarthi.
Computer Science Prof. Bill Pugh Dept. of Computer Science.
ASTR 2310: Chapter 2 Continuing.. Galileo: The First Modern Scientist Kepler's Laws of Planetary Motion Proof of the Earth's Motion.
Data-Intensive Science (eScience) Ed Lazowska Bill & Melinda Gates Chair in Computer Science & Engineering University of Washington August 2011.
Candice Frazier.  § Science, Grade 5  (4)Science is a way of learning about the natural world. Students should know how science has built a vast.
Introduction to Data Science Kamal Al Nasr, Matthew Hayes and Jean-Claude Pedjeu Computer Science and Mathematical Sciences College of Engineering Tennessee.
Ground Systems Data in the World Data Center Archives Justin Mabie (NGDC) and Anatoly Soloviev (GCRAS) 2 July
Big Data A big step towards innovation, competition and productivity.
Science Data Products – HMI Magnetic Field Images Pipeline 45-second Magnetic line-of-sight velocity on full disk Continuum intensity on full disk Vlos.
ANTHONY PYM TRANSLATION FOR ALL : COMMUNICATIVE TRANSLATION AND ADULT LANGUAGE-LEARNING.
Tennessee Technological University1 The Scientific Importance of Big Data Xia Li Tennessee Technological University.
My Book About Space by. Day and Night We all live on Earth. It is like a big, round ball. And it is spinning. This is hard to believe because we do not.
Big Data in Science (Lessons from astrophysics) Michael Drinkwater, UQ & CAASTRO 1.Preface Contributions by Jim Grey Astronomy data flow 2.Past Glories.
National Center for Supercomputing Applications Observational Astronomy NCSA projects radio astronomy: CARMA & SKA optical astronomy: DES & LSST access:
1 Radio Astronomy in the LSST Era – NRAO, Charlottesville, VA – May 6-8 th LSST Survey Data Products Mario Juric LSST Data Management Project Scientist.
Astronomical data curation and the Wide-Field Astronomy Unit Bob Mann Wide-Field Astronomy Unit Institute for Astronomy School of Physics University of.
Cloud Computing Dave Elliman 11/10/2015G53ELC 1. Source: NY Times (6/14/2006) The datacenter is the computer!
Methods in Gravitational Shear Measurements Michael Stefferson Mentor: Elliott Cheu Arizona Space Grant Consortium Statewide Symposium Tucson, Arizona.
ALICE Upgrade for Run3: Computing HL-LHC Trigger, Online and Offline Computing Working Group Topical Workshop Sep 5 th 2014.
Astro / Geo / Eco - Sciences Illustrative examples of success stories: Sloan digital sky survey: data portal for astronomy data, 1M+ users and nearly 1B.
Earth Science Daily Challenge, 9/17
Henri Bal Vrije Universiteit Amsterdam High Performance Distributed Computing.
GridPP18 Glasgow Mar 07 DØ – SAMGrid Where’ve we come from, and where are we going? Evolution of a ‘long’ established plan Gavin Davies Imperial College.
What is Computer Science? “Computer Science is no more about computers than astronomy is about telescopes.” - Edsger Dijkstra “Computer Science is no more.
Earth Science Daily Challenge, 9/9 What is a MODEL? Give at least 4 different examples of models.
Web and Grid Services from Pitt/CMU Andrew Connolly Department of Physics and Astronomy University of Pittsburgh Jeff Gardner, Alex Gray, Simon Krughoff,
EScience May 2007 From Photons to Petabytes: Astronomy in the Era of Large Scale Surveys and Virtual Observatories R. Chris Smith NOAO/CTIO, LSST.
“Big Data” and Data-Intensive Science (eScience) Ed Lazowska Bill & Melinda Gates Chair in Computer Science & Engineering University of Washington July.
EGEE is a project funded by the European Union under contract IST HEP Use Cases for Grid Computing J. A. Templon Undecided (NIKHEF) Grid Tutorial,
1 Computing Challenges for the Square Kilometre Array Mathai Joseph & Harrick Vin Tata Research Development & Design Centre Pune, India CHEP Mumbai 16.
Spatial rainfall distribution in flood modelling Adam Baylis Research Scientist 21 September 2015.
F. Douglas Swesty, DOE Office of Science Data Management Workshop, SLAC March Data Management Needs for Nuclear-Astrophysical Simulation at the Ultrascale.
Lecture 11 Page 1 CS 111 Online Virtual Memory A generalization of what demand paging allows A form of memory where the system provides a useful abstraction.
CERN – IT Department CH-1211 Genève 23 Switzerland t Working with Large Data Sets Tim Smith CERN/IT Open Access and Research Data Session.
Peter Lee Head, Computer Science Department Carnegie Mellon University.
Large Area Surveys - I Large area surveys can answer fundamental questions about the distribution of gas in galaxy clusters, how gas cycles in and out.
DDM Kirk. LSST-VAO discussion: Distributed Data Mining (DDM) Kirk Borne George Mason University March 24, 2011.
What is Big Query?.
The VAO is operated by the VAO, LLC. Ashish Mahabal Ciro Donalek Matthew Graham Ray Plante George Djorgovski.
ADNET Systems, Inc. Jack Ireland & Helioviewer Team ADNET Systems, Inc. Helioviewer Discovery for Everyone Everywhere.
Cybertools!! David DeWitt. So what is a cybertool? That is a good question NSF doesn’t even know the answer Maybe Hector does since he wrote the report.
1 The Good  HPC brings a wealth of parallelization experience, petaflop scaling and hybrid architectures.  Analytics brings new algorithms and new markets.
ATLAS Distributed Computing perspectives for Run-2 Simone Campana CERN-IT/SDC on behalf of ADC.
Smart Grid Big Data: Automating Analysis of Distribution Systems Steve Pascoe Manager Business Development E&O - NISC.
What’s the Big Deal about Big Data? 52 nd Annual ACMSE Conference Jennifer Lewis Priestley, Ph.D. Professor of Statistics and Data Science.
Mining of Massive Datasets Ch4. Mining Data Streams.
Science and Faith Harry Macdonald. Science: Science One law explains each of these.
Critical Issues in Distributed Computing Jeff Templon NIKHEF ACAT’07 Conference Amsterdam, 26 april 2007.
T. Axelrod, NASA Asteroid Grand Challenge, Houston, Oct 1, 2013 Improving NEO Discovery Efficiency With Citizen Science Tim Axelrod LSST EPO Scientist.
Tamas Szalay, Volker Springel, Gerard Lemson
Introduction to Data Programming
Johnson & Johnson Virtual Medical Offices and Research Laboratories
Stanford Linear Accelerator
Stanford Linear Accelerator
Data search & distribution
Presentation transcript:

Space …. are big. Really big. You just won't believe how vastly, hugely, mindbogglingly big they are. Massive data streams Douglas Adams – Hitchhiker’s Guide to Big Data

Massive Data Streams SDO (The devil we know!) – 650 TB of data a year – Storable / Distributable Possible perhaps for 6TB per day – Efforts to locate data of interest FFT / HEK – For FFT some modules need to run in near real time for space weather applications – Visualize the data stream Helioviewer – But people still want full resolution / full cadence science data – No real access to level 0

Massive Data Streams ATST – ATST ~7 PB a year (8 hour duty cycle) – Process at night?! – Is it distributable? Products probably but not like SDO – No access to “raw” data products You’ll just have to trust us – Not reprocessable?

Massive Data Streams Moore’s Law (like increases) won’t help you now …

Massive Data Streams Astronomy – Large Synoptic Survey Telescope (LSST) 30 TB per night Google will help them!

Massive Data Streams Particle Physics – LHC 15 PB a year Very bursty – Hardware challenges – But rarely on!! Reduced data products are very small

Massive Data Streams EST – 3.5 PB per day!!! – Will throw away data …

Massive Data Streams Challenges – Storing data locally – Processing data Need for true real time pipeline for data reduction etc. High speed, high quality, robust, algorithms for real time science analysis, modeling / forecasting – More computer scientists in solar physics? – Distributing data Visual data will always hard to distribute Abstract information to databases using automated methods – NMM? (Not my method!) – Database mining User interaction with the data? Bring on the statisticians!