Introduction to Teradata

Presentation transcript:

Introduction to Teradata Kanakadurga Rajanala

Overview: What is an independent data mart? What is data warehousing? What is Teradata? Teradata architecture. Data storage and retrieval. Fault tolerance. Q & A.

Independent Data Marts: Each department (Engineering, Arts, Science) keeps its own decision support information. This raises three questions. How do we get university-wide reports? How do we prevent data redundancy? How do we maintain consistent data across departments?

What is a Data Warehouse? A data warehouse is a central, enterprise-wide database that contains information obtained from operational systems. A good data warehouse provides the required storage capability, performance, and scalability.

Data Warehouse [Figure: data from the university database (Engineering, Science, Arts, Music, and other departments, including student and faculty information) is copied and organized as detailed data in the data warehouse, which is then used for data mining.]

What is Teradata? Teradata is an RDBMS designed for enterprise data warehousing. It is a massively parallel processing (MPP) system with parallelism throughout the platform, a “share nothing” architecture, and linear scalability.

Teradata MPP Architecture [Figure: physical layout of a 4-node MPP system.] Each node is an SMP system. A system can have up to 512 nodes.

BYNET Interconnect: The BYNET high-speed interconnect facilitates system communication. All nodes are connected via the BYNET, which consists of a hardware network and driver software that runs on each node.

Node Architecture: Each Teradata node is made up of hardware and software. Each node has CPUs, a system disk, memory, and adapters, and each node runs its own copy of the operating system and the database software.

A Teradata Database node requires three distinct pieces of software: an operating system, the Parallel Database Extensions (PDE), and the Teradata Database itself. The Teradata Database can run on UNIX MP-RAS and Microsoft Windows 2000. NCR added the PDE software layer to the operating system to support the parallel software environment. A Trusted Parallel Application (TPA) uses PDE to implement virtual processors (vprocs), and the Teradata Database is classified as a TPA.

The four components of the Teradata Database TPA are the AMP, the Parsing Engine (PE), the Channel Driver, and the Teradata Gateway.

A PE is a vproc that manages the dialogue between a client application and the Teradata Database once a valid session has been established. Each PE can support a maximum of 120 sessions.

An AMP is a vproc that controls its portion of the data on the system. AMPs do the physical work of generating an answer set (output), including sorting, aggregating, formatting, and converting, and they perform all database management functions on the required rows. The AMPs work in parallel, each managing the data rows stored on its single vdisk. AMPs are involved in data distribution and data access in different ways.

Channel Driver software is the means of communication between an application and the PEs assigned to channel-attached clients. Teradata Gateway software plays the same role for the PEs assigned to network-attached clients. There is one Channel Driver and one Teradata Gateway per node.

V2 Virtual Processors (Vprocs): Each node runs virtual processors (processes). The PDE layer facilitates communication between the operating system and the vprocs. The vprocs are execution engines that run in parallel. [Figure: a PE vproc (Syntaxer, Optimizer, Step Generator, Dispatcher) and an AMP vproc with its vdisk, running on PDE above the operating system.] Each AMP's vdisk is its own: no other vproc shares it.

Data Storage: The Primary Index is defined by the user and is used for data distribution. [Figure: the Parsing Engine picks up the Primary Index value and runs it through the hashing algorithm to produce a row hash. The hash bucket number indexes the hash map, an array that associates hash buckets with AMPs. The row then travels over the Message Passing Layer (BYNET) to an AMP on Node-1 or Node-2.]

A hashing algorithm is a standard data processing technique that takes a data value, such as a last name or an order number, and systematically mixes it up so that incoming values are converted to a number in a range from zero to a specified maximum. A successful hashing scheme scatters the input evenly over the range of possible output values. It is predictable in that Smith will always hash to the same value and Jones will always hash to another (hopefully different) value. With a good hashing algorithm, any patterns in the input data should disappear in the output data. Teradata's algorithm works predictably well over any data, typically loading the AMPs with variations in the range of 0.1% to 0.5% between AMPs. For extremely large systems, the variation can be as low as 0.001%.

A Row Hash is the 32-bit result of applying the hashing algorithm to an index value. The DSW (Destination Selection Word), or Hash Bucket, is the high-order 16 bits of the Row Hash.

Teradata uses hashing quite differently from other data storage systems. Other hashed storage systems equate a bucket with a physical location on disk. In Teradata, a bucket is simply an entry in a hash map, and each hash map entry points to a single AMP. Changing the number of AMPs therefore does not require any adjustment to the hashing algorithm: Teradata simply adjusts the hash maps and redistributes any affected rows. The hash maps must always be available to the Message Passing Layer. In Teradata Version 2, a hash map has 65,536 entries.

Once the hash bucket has determined the destination AMP, the full 32-bit row hash plus the Table ID is used to assign the row to a cylinder and a data block on the AMP's disk storage.

In Version 2 the algorithm can produce over 4,000,000,000 row hash values. Rows share a row hash when a non-unique primary index holds duplicate values and when hash synonyms occur (different inputs that produce the same output), so a uniqueness value is added to the Row Hash. The combined value becomes the Row ID, which is truly unique for every row in the database.
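The path from a primary index value to an AMP and a Row ID can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Teradata's real hashing algorithm is not given in these slides, so CRC-32 stands in for it, and the round-robin hash map, the function names, and the uniqueness counter starting at 1 are all hypothetical.

```python
import zlib

NUM_BUCKETS = 1 << 16  # 65,536 hash map entries, as stated in the notes


def row_hash(index_value):
    """32-bit stand-in hash (CRC-32). NOT Teradata's real algorithm."""
    return zlib.crc32(index_value.encode("utf-8")) & 0xFFFFFFFF


def hash_bucket(rh):
    """The hash bucket is the high-order 16 bits of the 32-bit row hash."""
    return rh >> 16


def build_hash_map(num_amps):
    """Hypothetical round-robin assignment of buckets to AMPs."""
    return [bucket % num_amps for bucket in range(NUM_BUCKETS)]


def amp_for(index_value, hash_map):
    """Look up the AMP that owns the row with this primary index value."""
    return hash_map[hash_bucket(row_hash(index_value))]


class RowIdAllocator:
    """Pairs a row hash with a uniqueness value to form a Row ID."""

    def __init__(self):
        self._next = {}  # row hash -> next uniqueness value (assumed to start at 1)

    def allocate(self, index_value):
        rh = row_hash(index_value)
        uniq = self._next.get(rh, 1)
        self._next[rh] = uniq + 1
        return (rh, uniq)


if __name__ == "__main__":
    four_amps = build_hash_map(4)
    five_amps = build_hash_map(5)
    # The hash function is untouched when an AMP is added; only the map changes.
    print("Smith on 4 AMPs ->", amp_for("Smith", four_amps))
    print("Smith on 5 AMPs ->", amp_for("Smith", five_amps))
    ids = RowIdAllocator()
    # Two rows with the same index value get the same row hash but different Row IDs.
    print(ids.allocate("Smith"), ids.allocate("Smith"))
```

The round-robin map is chosen only for brevity. Rebuilding it for a different AMP count reassigns many buckets, whereas the notes describe adjusting the map and redistributing only the affected rows.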

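Data Retrieval: Example query: SELECT * FROM CalStateLADB.Students; [Figure: the client sends the request to the Parsing Engine, whose Dispatcher sends a RET (retrieve) step over the Message Passing Layer to each AMP. Each AMP returns the rows it holds. The figure labels show the values 10, 29, 25, 50, and 75.]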
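A full-table SELECT like the one on this slide touches every AMP, since each AMP holds only its own portion of the rows. The sketch below mimics that flow in Python with a thread pool standing in for the AMPs. The function names are hypothetical, and the way the sample values are spread across four AMPs is an assumption made for illustration, not something the slide specifies.

```python
from concurrent.futures import ThreadPoolExecutor


def ret_step(amp_rows):
    """Hypothetical RET step: one AMP returns every row it holds."""
    return list(amp_rows)


def select_all(amps):
    """Dispatch one RET step per AMP in parallel and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(amps)) as pool:
        partial_results = pool.map(ret_step, amps)
        return [row for part in partial_results for row in part]


if __name__ == "__main__":
    # Assumed placement of the slide's sample values on four AMPs.
    amps = [[10], [29, 25], [50], [75]]
    print(select_all(amps))
```

Each worker sees only its own list, which mirrors the idea that an AMP works solely on the rows stored on its own vdisk.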

Teradata Scales Linearly: Scaling is achieved through the ‘shared nothing’ architecture and unconditional parallelism. The power lies in linear scalability with slope = 1: more nodes support proportionally more work, more users, and more data. [Figure: Node1 through Node4 connected by the BYNET, with work, users, and data plotted against the number of nodes.]

Fault Tolerance [Figure: eight AMPs connected to the BYNET, a clique labeled CLIQUE-2, and four disk arrays.]

Summary: Teradata is designed and used for enterprise data warehousing. It is an MPP system with a “share nothing” architecture and parallelism throughout the platform. It runs on Windows and Linux.

References: Teradata Basics and Architecture user manual (www.info.ncr.com). A free copy of the Teradata Software Demo CD can be ordered from www.teradata.com.

Questions?