Section 1 # 1 CS 765 1. The Age of Infinite Storage.



1. The Age of Infinite Storage has begun

Many of us have enough money in our pockets right now to buy all the storage we will be able to fill for the next 5 years. So having the storage capacity is no longer a problem. Managing it is a problem (especially when the volume gets large). How much data is there?

Terabytes (TBs) are Here

- 1 TB costs about 1k$ to buy
- 1 TB costs ~300k$/year to own
- Management and curation are the expensive part
- Searching 1 TB takes hours

- I’m Terrified by TeraBytes
- I’m Petrified by PetaBytes
- I’m completely Exafied by ExaBytes
- I’m too old to ever be Zettafied by ZettaBytes, but you may be in your lifetime.
- You may be Yottafied by YottaBytes.
- You may not be Googified by GoogiBytes, but the next generation may be.

Scale: Googi, Yotta (10^24), Zetta (10^21), Exa (10^18), Peta (10^15), Tera (10^12), Giga (10^9), Mega (10^6), Kilo (10^3). Marker on the scale: "We are here."
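The claim that searching 1 TB takes hours can be checked with a rough sequential-scan estimate. The sketch below is illustrative only: the throughput values are assumptions, not figures from the slides, and `scan_hours` is a hypothetical helper name.

```python
def scan_hours(size_bytes: float, throughput_bytes_per_s: float) -> float:
    """Hours needed to read size_bytes once, sequentially, at a fixed throughput."""
    return size_bytes / throughput_bytes_per_s / 3600


TB = 10**12  # decimal terabyte, matching the 10^12 entry on the scale

# Assumed sustained read rates in MB/s (not taken from the slides).
for mb_per_s in (50, 100, 200):
    hours = scan_hours(TB, mb_per_s * 10**6)
    print(f"{mb_per_s} MB/s -> {hours:.1f} hours to scan 1 TB")
```

Under these assumed rates a single unindexed pass over 1 TB takes between roughly one and six hours, which is why indexing matters at this scale.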

How much information is there?

- Soon everything can be recorded and indexed.
- Most of it will never be seen by humans.
- Data summarization, trend detection, anomaly detection, and data mining are key technologies.

Scale: Yotta, Zetta, Exa, Peta, Tera, Giga, Mega, Kilo.
Examples placed along the scale: a photo; a book; a movie; all books (words); all books (multimedia); everything recorded!
Small prefixes: yocto, zepto, atto, femto, pico, nano, micro, milli.

First Disk, in 1956

- IBM 305 RAMAC
- 4 MB
- 50 disks, 24" each
- 1200 rpm (revolutions per minute)
- 100 milliseconds (ms) access time
- 35k$/year to rent
- Included computer & accounting software (tubes, not transistors)

Photo caption: 7th Grade C.S. lab tech.

10 years later

- 1.6 meters
- 30 MB

In 2003, the cost of storage was about 1K$/TB. It has gone steadily down since then.

[Chart: panels dated 12/1/1999, 9/1/2000, 9/1/2001, and later dates through 2003.]

Disk Evolution

Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta

Memex

As We May Think, Vannevar Bush, 1945

“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility.”

“yet if the user inserted 5000 pages of material a day it would take him hundreds of years to fill the repository, so that he can enter material freely”

Can you fill a terabyte in a year?

Item                                 Items/TB   Items/day
a 300 KB JPEG image                  3 M        9,800
a 1 MB document                      1 M        2,900
a 1 hour, 256 kb/s MP3 audio file    9 K        26
a 1 hour 1 MPEG video
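The Items/TB and Items/day figures above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: binary units (1 TB = 2^40 bytes, 1 KB = 2^10 bytes) and a 365-day year are my assumptions, chosen because they appear to match the rounded values on the slide; the helper names are hypothetical.

```python
TB = 2**40   # assumed binary terabyte
MB = 2**20
KB = 2**10
DAYS_PER_YEAR = 365


def items_per_tb(item_bytes: float) -> float:
    """How many items of the given size fit in one terabyte."""
    return TB / item_bytes


def items_per_day(item_bytes: float) -> float:
    """Items you must create each day to fill one terabyte in a year."""
    return items_per_tb(item_bytes) / DAYS_PER_YEAR


# One hour of audio at 256 kilobits per second, converted to bytes.
mp3_hour_bytes = 256 * 1000 * 3600 / 8

items = {
    "300 KB JPEG image": 300 * KB,
    "1 MB document": 1 * MB,
    "1 hour 256 kb/s MP3": mp3_hour_bytes,
}

for name, size in items.items():
    print(f"{name:22s} {items_per_tb(size):12,.0f} per TB "
          f"{items_per_day(size):10,.0f} per day")
```

The point of the table survives any reasonable choice of units: filling a personal terabyte in a year would take thousands of photos or documents every single day.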

On a Personal Terabyte, How Will We Find Anything?

- Need queries, indexing, data mining, scalability, replication…
- If you don’t use a DBMS, you will implement one of your own!
- The need for data mining and machine learning is more important than ever!

Of the digital data in existence today:

- 80% is personal/individual
- 20% is corporate/governmental

We’re awash with data!

- Network data: 10 terabytes by 2004 (~10^13 bytes)
- US EROS Data Center (near Sioux Falls, SD), which archives Earth Observing System remotely sensed satellite and aerial imagery data: 15 petabytes by 2007 (~10^16 bytes)
- National Virtual Observatory (aggregated astronomical data): 10 exabytes by 2010 (~10^19 bytes)
- Sensor data from sensors (including micro- and nano-sensor networks): 10 zettabytes by 2015 (~10^22 bytes)
- WWW (and other text collections): 10 yottabytes by 2020 (~10^25 bytes)
- Genomic/proteomic/metabolomic data (microarrays, gene chips, genome sequences): 10 gazillabytes by 2030?
- Stock market prediction data (prices + all the above?): 10 supragazillabytes by 2040?

I made up these names! Projected data sizes are overrunning our ability to name their orders of magnitude!

Useful information must be teased out of these large volumes of raw data. And these are only some of the 1/5th of data collections that are corporate or governmental. The other 4/5ths of data sets are personal!
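The byte counts that go with the named sizes above follow directly from the SI prefixes. A minimal sketch, using only the standard decimal prefixes (the made-up names have no defined exponent and are left out); `to_bytes` is a hypothetical helper name.

```python
# Standard SI decimal prefixes and their powers of ten.
SI_EXPONENT = {
    "kilo": 3, "mega": 6, "giga": 9, "tera": 12,
    "peta": 15, "exa": 18, "zetta": 21, "yotta": 24,
}


def to_bytes(value: int, prefix: str) -> int:
    """Convert a quantity such as (15, 'peta') into a plain byte count."""
    return value * 10 ** SI_EXPONENT[prefix]


projections = [
    ("Network data, 2004", 10, "tera"),
    ("US EROS Data Center, 2007", 15, "peta"),
    ("National Virtual Observatory, 2010", 10, "exa"),
    ("Sensor data, 2015", 10, "zetta"),
    ("WWW and text collections, 2020", 10, "yotta"),
]

for label, value, prefix in projections:
    print(f"{label:36s} {value} {prefix}bytes = {to_bytes(value, prefix):.1e} bytes")
```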

- Parkinson’s Law (for data): data expands to fill available storage.
- Disk-storage version of Moore’s Law: available storage doubles every 9 months!
- How do we get the information we need from the massive volumes of data we will have?
  - Querying (for the information we know is there)
  - Data mining (for the answers to questions we don't know to ask precisely)
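To get a feel for what doubling every 9 months implies, the growth factor over any period is 2 raised to (elapsed months / 9). The sketch below simply evaluates that formula; the 9-month figure is the slide's claim, and `growth_factor` is a hypothetical helper name.

```python
def growth_factor(months: float, doubling_months: float = 9.0) -> float:
    """Multiplicative growth after `months` if capacity doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


for years in (1, 3, 5, 10):
    factor = growth_factor(12 * years)
    print(f"after {years:2d} years: about {factor:,.0f}x the storage")
```

At this rate capacity grows about a hundredfold in five years, so by Parkinson's Law the volume of data to query and mine grows just as fast.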

Thank you.