Big Data Why it matters Patrice KOEHL Department of Computer Science Genome Center UC Davis.

Presentation transcript:

Big Data Why it matters Patrice KOEHL Department of Computer Science Genome Center UC Davis

The three I’s of Big Data
Big Data is:
- Ill-defined (what is it?)
- Immediate (we need to do something about it now)
- Intimidating (what if we don’t?)
(loosely adapted from Forbes)

Big Data: Volume

Unit       Symbol  Size        Time scale        Example
Byte
Kilobyte   KB      1000 bytes  1 s               30 KB: one page of text
Megabyte   MB      1000 KB     20 mins           5 MB: one song
Gigabyte   GB      1000 MB     11 days           5 GB: one movie
Terabyte   TB      1000 GB     30 years          1 TB: 6 million books
Petabyte   PB      1000 TB     … centuries       1 PB: 55 storeys of DVD
Exabyte    EB      1000 PB     … million years   EB: data up to …
Zettabyte  ZB      1000 EB     30 billion years  ZB: data in …
Yottabyte  YB      1000 ZB     ….                1 YB: NSA data center
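The time figures on this slide are consistent with a simple analogy: if 1 KB lasted 1 second, how long would each larger unit last? The slide does not state this rule explicitly, so the sketch below treats it as an assumption, along with decimal (factor-1000) units and a 365.25-day year. The names `seconds_for` and `humanize` are illustrative only.

```python
# Assumption (not stated on the slide): the time column reads as
# "if 1 KB lasted 1 second", with decimal units (each step is x1000).
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def seconds_for(unit: str) -> float:
    """Seconds that one `unit` corresponds to when 1 KB maps to 1 second."""
    return 1000.0 ** UNITS.index(unit)


def humanize(seconds: float) -> str:
    """Format a duration in the largest unit that keeps it readable."""
    if seconds < 60:
        return f"{seconds:.0f} s"
    if seconds < SECONDS_PER_DAY:
        return f"{seconds / 60:.0f} min"
    years = seconds / SECONDS_PER_YEAR
    if years < 1:
        return f"{seconds / SECONDS_PER_DAY:.1f} days"
    return f"{years:,.0f} years"


if __name__ == "__main__":
    for unit in UNITS:
        print(f"1 {unit}: {humanize(seconds_for(unit))}")
```

Under these assumptions the first four rows land close to the rounded values on the slide (about 17 minutes, 11.6 days, and 32 years for MB, GB, and TB).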

Big Data: Volume, Velocity
One minute in the digital world (Intel, 2013):
- 3+ million searches launched
- 6 million users connected
- 1.3 million videos viewed
- 30 hours of video uploaded
- 204 million emails sent
- 50 GB of data generated at the Large Hadron Collider
- 640 TB of IP data transferred
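To get a feel for velocity, the per-minute figures can be rescaled to per-second and per-day rates. This is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes) and that the quoted rates hold constant over a whole day, which the slide does not claim; the dictionary keys are illustrative names.

```python
MINUTES_PER_DAY = 24 * 60

# Per-minute figures quoted on the slide (decimal units assumed).
PER_MINUTE = {
    "searches": 3_000_000,
    "emails": 204_000_000,
    "lhc_bytes": 50 * 10**9,    # 50 GB
    "ip_bytes": 640 * 10**12,   # 640 TB
}


def per_second(per_minute: float) -> float:
    """Convert a per-minute count to a per-second rate."""
    return per_minute / 60


def per_day(per_minute: int) -> int:
    """Extrapolate a per-minute count to one day (constant rate assumed)."""
    return per_minute * MINUTES_PER_DAY


if __name__ == "__main__":
    for name, value in PER_MINUTE.items():
        print(f"{name}: {per_second(value):,.0f}/s, {per_day(value):,}/day")
```

With these inputs, 640 TB per minute extrapolates to a little under one exabyte of IP traffic per day.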

Big Data: Volume, Velocity, Variety
- Text
- Numbers
- Images
- Sound

Big Data: Challenges
- Volume and Velocity
- Variety
  - Structured, Unstructured….
  - Images, Sound, Numbers, Tables,…
- Security
- Reliability, Integrity, Validity

Big Data: Challenges
Large N: “Any dataset that is collected by a scientist whose data collection skills are far superior to her analysis skills”
Computing issues:
- Data transfer
- Scalability of algorithms
- Memory limitations
- Distributed computing
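One common way around memory limitations is to process data as a stream and keep only a few running totals. The slide does not prescribe any technique; the sketch below uses Welford's online algorithm for the mean and sample variance as one illustration, and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def streaming_mean_variance(values: Iterable[float]) -> Tuple[int, float, float]:
    """Single-pass mean and sample variance (Welford's online algorithm).

    Only three numbers are kept in memory, so the input can be a
    generator over a file or network stream far larger than RAM.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


if __name__ == "__main__":
    # A generator stands in for a dataset that is never fully loaded.
    stream = (float(i % 10) for i in range(1_000_000))
    print(streaming_mean_variance(stream))
```

The same pattern (constant-size state updated per record) extends to counts, minima and maxima, and histograms, and the per-chunk states can be merged when the work is distributed.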

Big Data: Challenges
Visualization issues: the black screen problem (Matloff, 2013)

Big Data: Challenges and Opportunities
- Fourth Paradigm: data driven science
- Holistic approaches to major research efforts
- New paradigms in computing
- Digital Humanities
Data -> Knowledge -> Societal Benefit
Basic -> Translational

Big Data Dreams: Genomics

Genomics: Sequencing costs
[Figures: cost per Mbase; cost per human genome]

Genomics: Game changing technologies
Illumina HiSeq 2000
- Capable of 600 Gb per run -> 1,000+ Gb
- 55 Gb/day
- 6 billion paired-end reads
- <$4,000 per human/plant genome
- <$200 per transcriptome
- Multiplex 384 pathogen isolates/lane -> $10 (+ $50 library construction)/isolate
Gary Schroth (Illumina): “A single lab with one HiSeq is able to generate more sequences than was in GenBank in 2009, every four days”.
Challenges: Library preparation & data analysis
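The figures on this slide can be combined into two quick estimates: how long a full run takes at the quoted daily rate, and what a fully multiplexed lane costs. This is a rough sketch; it assumes the $10 is the per-isolate sequencing cost and the $50 is added on top for library construction, which is one reading of the slide. All constant names are illustrative.

```python
# Figures quoted on the slide for the Illumina HiSeq 2000.
RUN_OUTPUT_GB = 600         # gigabases per run
DAILY_OUTPUT_GB = 55        # gigabases per day
ISOLATES_PER_LANE = 384     # multiplexed pathogen isolates per lane
SEQ_COST_PER_ISOLATE = 10   # USD (assumed: sequencing only)
LIB_COST_PER_ISOLATE = 50   # USD (assumed: library construction, added on top)

# Time for a full run at the quoted daily throughput.
days_per_run = RUN_OUTPUT_GB / DAILY_OUTPUT_GB

# Cost of one fully multiplexed lane, with and without library prep.
sequencing_cost_per_lane = ISOLATES_PER_LANE * SEQ_COST_PER_ISOLATE
total_cost_per_isolate = SEQ_COST_PER_ISOLATE + LIB_COST_PER_ISOLATE
total_cost_per_lane = ISOLATES_PER_LANE * total_cost_per_isolate

if __name__ == "__main__":
    print(f"Days per 600 Gb run: {days_per_run:.1f}")
    print(f"Sequencing cost per lane: ${sequencing_cost_per_lane:,}")
    print(f"Total cost per lane incl. libraries: ${total_cost_per_lane:,}")
```

Under this reading, library construction rather than sequencing dominates the per-isolate cost, which matches the slide's point that library preparation is one of the remaining challenges.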

UC Davis
Massively parallel DNA sequencing
- 2 Illumina Genome Analyzers
- 1 Illumina HiSeq 2000, 2 MiSeq
- 1 Roche 454 Junior
- 1 Pacific Biosciences RS
GoldenGate SNP genotyping
- iScan, BeadArray & BeadExpress

Cancer Genomics: Molecular Diagnostics

Genomics: actual costs
“A single lab with one HiSeq is able to generate more sequences than was in GenBank in 2009, every four days.” Gary Schroth (Illumina)
Assembling a 22 Gb conifer genome:
Data:
- 16 billion paired reads (100 bases)
Processing:
- 10 days for error correction
- 11 days for assembling “super-reads”
- 60 days to build contigs/scaffolds
- 8 days to fill in gaps
steven-salzberg-at-bog13-assembling-22gb-conifer-genome/
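Adding up the processing steps shows how the compute time compares with the four days of sequencing in the quote. The sketch below also estimates raw data volume and coverage under two assumptions that the slide leaves open: that "16 billion" counts read pairs (two 100-base reads each), and that the genome size is 22 gigabases. Variable names are illustrative.

```python
# Processing times quoted on the slide, in days.
PROCESSING_DAYS = {
    "error correction": 10,
    "assembling super-reads": 11,
    "building contigs/scaffolds": 60,
    "filling in gaps": 8,
}

# Assumptions (not confirmed by the slide): 16 billion read PAIRS of
# 2 x 100 bases, and a genome size of 22 gigabases.
READ_PAIRS = 16 * 10**9
READS_PER_PAIR = 2
READ_LENGTH = 100
GENOME_SIZE_BASES = 22 * 10**9

total_days = sum(PROCESSING_DAYS.values())
total_bases = READ_PAIRS * READS_PER_PAIR * READ_LENGTH
coverage = total_bases / GENOME_SIZE_BASES

if __name__ == "__main__":
    print(f"Total processing time: {total_days} days")
    print(f"Raw sequence: {total_bases / 10**12:.1f} terabases")
    print(f"Approximate coverage: {coverage:.0f}x")
```

The steps sum to 89 days of processing, so analysis rather than data generation is the bottleneck in this example.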

Social Consequences of Commodity Sequencing
- The danger of misuse
  - predict sensitivities to various industrial or environmental agents
  - discrimination by employers?
- The impact of information that is likely to be incomplete
  - an indication of a 25 percent increase in the risk of cancer?
- Reversal of knowledge paradigm
- Are the "products" of the Human Genome Project to be patented and commercialized?
  - Myriad Genetics and BRCA1/2
- How to educate about genetic research and its implications?

Social Consequences of Commodity Sequencing

How to Approach Big Data