The Memory
B. Ramamurthy

Presentation transcript:

The Memory
B. Ramamurthy
© B. Ramamurthy

Topics for discussion
- On-chip memory
- On-board memory
- System memory
- Off-system/online storage/secondary memory
- File system abstraction
- Offline/tertiary memory
- RAID: Redundant Array of Inexpensive Disks
- NAS: Network-Attached Storage
- SAN: Storage Area Networks
- DB and DBMS: databases and database management systems
- Distributed file system
- Google file system
- Hadoop file system

Data and Computation Continuum
- Compute intensive. Ex: computation of the digits of pi
- Data intensive. Ex: analyzing web logs

On-chip memory
- Registers
- Cache
- Buffers (instruction pipeline)
- Characteristics: volatile

On-board memory
- Cache
  - Instruction cache
  - Data cache
  - Translation lookaside buffers (TLB)
- Characteristics: content addressable, set-associative organization
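To make the set-associative organization concrete, here is a minimal sketch of how an address could be mapped to a set and a tag. The class name, the sizes (4 sets, 2 ways, 16-byte blocks), and the least-recently-used eviction policy are assumptions chosen for illustration; the slide does not specify any of them.

```python
class SetAssociativeCache:
    """Toy set-associative cache; sizes and LRU eviction are illustrative assumptions."""

    def __init__(self, num_sets=4, ways=2, block_size=16):
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size
        # Each set holds up to `ways` tags, ordered least- to most-recently used.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, address):
        """Return True on a hit; on a miss, load the block and return False."""
        block = address // self.block_size   # which memory block the address falls in
        index = block % self.num_sets        # the set that block maps to
        tag = block // self.num_sets         # identifies the block within its set
        lines = self.sets[index]
        if tag in lines:
            lines.remove(tag)                # refresh: move to most-recently-used position
            lines.append(tag)
            return True
        if len(lines) == self.ways:
            lines.pop(0)                     # set is full: evict the least-recently-used tag
        lines.append(tag)
        return False


cache = SetAssociativeCache()
# Addresses 0, 64 and 128 all map to set 0, so the third distinct block evicts the first.
results = [cache.access(a) for a in (0, 4, 64, 128, 0)]
print(results)
```

Within a set, the tag comparison is the "content addressable" part: the hardware compares the tag against all ways at once, which the `tag in lines` test only imitates sequentially.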

System memory
- RAM: random access memory; main memory; read and write possible; volatile
- ROM: read-only memory; boot programs for operating systems
- Flash memory: erasable/writable non-volatile memory
- SDRAM: synchronous dynamic RAM
- Others: EAROM

Off-system storage (earlier lectures covered these)
- Off-system/online storage/secondary memory
- File system abstraction
- Offline/tertiary memory
- RAID: Redundant Array of Inexpensive Disks
- NAS: Network-Attached Storage
- SAN: Storage Area Networks

Database and Database Management System
- Data source
- Transactional
- Database server
- Relational DB or similar foundation
- Tables, rows, result set, SQL
- ODBC: Open Database Connectivity
- Very successful business model: Oracle, DB2, MySQL, and others
- Persistence models: EJB, DAO, ADO (I am not going to expand the abbreviations)
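The terms table, row, result set, and SQL can be illustrated with a short sketch. It uses Python's built-in `sqlite3` module with an in-memory database purely for convenience; SQLite is not one of the products named on the slide, and the `orders` table and its contents are made up for the example.

```python
import sqlite3

# In-memory database: nothing is written to disk (an assumption for the demo).
conn = sqlite3.connect(":memory:")

# A table with three columns; each inserted tuple becomes one row.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 5.5), ("alice", 4.5)],
)
conn.commit()  # make the inserts part of a committed transaction

# The rows returned by a SELECT form the result set.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

print(rows)
```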

Distributed file system (DFS)
- A dedicated server manages the files for a compute environment
- For example, nickelback.cse.buffalo.edu is your file server, and that is why we did not want you to run your user applications on this machine.
- DFS addresses various transparencies: location transparency, sharing, performance, etc.
- Examples: NFS, NFS+, AFS (Andrew FS)… (you will study these in the Distributed Systems course)

Issues with ultra-scale data
- How to store the large amount of data?
  - On commodity hardware or special hardware?
- Large storage implies a large number of devices to store the data.
  - How to address the shortening MTTF (mean time to failure)?
  - How to realize "fault tolerance"?
  - Redundancy/replication is a solution
- How to manage the replication and the health of the large number of devices?
- More importantly, how to partition the large-scale data to store it on these storage devices (nodes)?
- How to parallelize processing of the data stored at multiple "nodes"?
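Two of the questions above lend themselves to a small sketch: why MTTF shrinks as devices are added, and how data might be partitioned and replicated across nodes. Everything here is an illustrative assumption rather than how any particular system works: the function names, the independent constant-failure-rate model behind `system_mttf`, the node names, and the hash-then-take-next-nodes placement rule.

```python
import hashlib


def system_mttf(device_mttf_hours, n_devices):
    """Expected time until the first failure among n devices.

    Assumes independent devices with a constant failure rate, so the
    failure rates add up and the combined MTTF is MTTF_device / n.
    """
    return device_mttf_hours / n_devices


def place_block(block_id, nodes, replicas=3):
    """Pick `replicas` distinct nodes for a block.

    The block id is hashed to choose a starting node; the copies go to
    that node and the ones that follow it, wrapping around the list.
    Requires replicas <= len(nodes).
    """
    digest = hashlib.md5(block_id.encode()).hexdigest()
    start = int(digest, 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


nodes = [f"node{i}" for i in range(5)]          # hypothetical cluster
placement = place_block("weblog-000", nodes)    # hypothetical block id

# A device rated at 1,000,000 hours looks reliable alone, but with 1,000
# of them a failure is expected roughly every 1,000 hours (about 6 weeks).
print(system_mttf(1_000_000, 1000))
print(placement)
```

With three copies on distinct nodes, losing any single node leaves two readable replicas, which is the sense in which replication provides fault tolerance.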

On to Google File
- The Internet introduced a new challenge in the form of web logs and web crawler data: large scale, "peta scale"
- But observe that this type of data has a uniquely different characteristic from your transactional or "order" data on amazon.com: it is "write once"; so is HIPAA-protected healthcare and patient information
- Google exploited this characteristic in its Google file system: S. Ghemawat

Hadoop File System (HFS)
- Hadoop file system is a reverse-engineered version of the GFS: this is my first opinion on HFS
- HFS is a distributed file system for large-scale data
- Data throughput is more important than latency
- Batch computing rather than interactive time-shared computing

MapReduce
[Figure: input words (Cat, Bat, Dog, other words; size: TByte) flow through stages labeled split, map, combine, and reduce, producing outputs part0, part1, part2.]

Exercise: Count the number of occurrences of each word in the text
This is a cat. Cat sits on a roof. The roof is a tin roof. There is a tin can on the roof. Cat kicks the can. It rolls on the roof and falls on the next roof. The cat rolls too. It sits on the can.
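One way to approach the exercise is to imitate the split, map, and reduce stages in a single Python process. This is only a sketch of the idea, not Hadoop code: the function names, the choice of three splits and three reducers, lowercasing the words, and the byte-sum partitioning rule are all assumptions made for the example.

```python
import re
from collections import defaultdict

TEXT = (
    "This is a cat. Cat sits on a roof. The roof is a tin roof. "
    "There is a tin can on the roof. Cat kicks the can. "
    "It rolls on the roof and falls on the next roof. "
    "The cat rolls too. It sits on the can."
)


def split_input(text, n_splits):
    """Split: deal the sentences out round-robin into n_splits chunks."""
    sentences = [s for s in text.split(".") if s.strip()]
    return [sentences[i::n_splits] for i in range(n_splits)]


def map_phase(sentences):
    """Map: emit a (word, 1) pair for every word in this split."""
    pairs = []
    for sentence in sentences:
        for word in re.findall(r"[a-z]+", sentence.lower()):
            pairs.append((word, 1))
    return pairs


def shuffle(pairs, n_reducers):
    """Group pairs by word and route each word to exactly one reducer."""
    partitions = [defaultdict(list) for _ in range(n_reducers)]
    for word, count in pairs:
        idx = sum(word.encode()) % n_reducers   # simple deterministic partitioner
        partitions[idx][word].append(count)
    return partitions


def reduce_phase(partition):
    """Reduce: sum the counts collected for each word in this partition."""
    return {word: sum(counts) for word, counts in partition.items()}


def word_count(text, n_splits=3, n_reducers=3):
    splits = split_input(text, n_splits)
    mapped = [pair for split in splits for pair in map_phase(split)]
    parts = [reduce_phase(p) for p in shuffle(mapped, n_reducers)]  # like part0..part2
    result = {}
    for part in parts:
        result.update(part)   # each word lives in exactly one part, so no clashes
    return result


counts = word_count(TEXT)
for word in ("cat", "roof", "can"):
    print(word, counts[word])
```

In a real cluster each split would be mapped on a different node and each reducer would write its own output file; here the stages simply run one after another.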