Introduction to Teradata

Kanakadurga Rajanala

Overview
What is an Independent Data Mart?
What is Data Warehousing?
What is Teradata?
Teradata Architecture
Data Storage and Retrieval
Fault Tolerance
Q & A

Independent Data Marts
Each department keeps its own decision support information:
Engineering Dept
Arts Dept
Science Dept
Questions this raises:
How to get university-wide reports?
How to prevent data redundancy?
How to maintain consistent data across departments?

What is a Data Warehouse?
A data warehouse is a central, enterprise-wide database that contains information obtained from operational systems.
A good data warehouse provides the required:
Storage capability
Performance
Scalability

Data Warehouse
[Figure: university databases (Engineering, Science, Arts, Music, etc., holding student/faculty info) are copied into the data warehouse as organized, detailed data, which is then used for data mining.]

What is Teradata?
Teradata is an RDBMS designed for enterprise data warehousing.
Massively Parallel Processing (MPP) system
Parallelism throughout the platform
“Share Nothing” architecture
Linear scalability

Teradata MPP Architecture
[Figure: physical layout of a 4-node MPP system.]
Each node is an SMP system.
A system can have up to 512 nodes.

BYNET Interconnect
The BYNET high-speed interconnect facilitates system communication.
All nodes are connected via the BYNET.
Hardware network
Driver software runs on each node

Node Architecture
Each Teradata node is made up of hardware and software.
Each node has CPUs, a system disk, memory, and adapters.
Each node runs a copy of the OS and the database software.

A Teradata Database node requires three distinct pieces of software: the operating system, the Parallel Database Extensions (PDE), and the Teradata Database itself.

The Teradata Database can run on the following operating systems: UNIX MP-RAS and Microsoft Windows 2000.

The Parallel Database Extensions (PDE) software layer was added to the operating system by NCR to support the parallel software environment.

A Trusted Parallel Application (TPA) uses PDE to implement virtual processors (vprocs). The Teradata Database is classified as a TPA. The four components of the Teradata Database TPA are:
AMP (top right of the diagram)
PE (bottom right)
Channel Driver (top left)
Teradata Gateway (bottom left)

A Parsing Engine (PE) is a vproc that manages the dialogue between a client application and the Teradata Database once a valid session has been established. Each PE can support a maximum of 120 sessions.

The AMP is a vproc that controls its portion of the data on the system. AMPs do the physical work associated with generating an answer set (output), including sorting, aggregating, formatting, and converting. The AMPs perform all database management functions on the required rows in the system. The AMPs work in parallel, each AMP managing the data rows stored on its single vdisk. AMPs are involved in data distribution and data access in different ways.

Channel Driver software is the means of communication between an application and the PEs assigned to channel-attached clients. There is one Channel Driver per node.

Teradata Gateway software is the means of communication between an application and the PEs assigned to network-attached clients. There is one Teradata Gateway per node.
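The 120-session limit per PE stated in the notes above implies a simple lower bound on how many PEs a workload needs. The following is a minimal arithmetic sketch only; the function name and the example session counts are hypothetical, and real sizing depends on factors the slides do not cover.

```python
import math

MAX_SESSIONS_PER_PE = 120  # limit stated in the notes above


def min_parsing_engines(concurrent_sessions: int) -> int:
    """Smallest number of PEs that can hold the given number of sessions."""
    if concurrent_sessions < 0:
        raise ValueError("session count cannot be negative")
    return math.ceil(concurrent_sessions / MAX_SESSIONS_PER_PE)


# Hypothetical workloads, chosen only to show the rounding behaviour.
for sessions in (120, 121, 500):
    print(sessions, "sessions need at least", min_parsing_engines(sessions), "PE(s)")
```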

V2 Virtual Processors (Vprocs)
Each node has virtual processors (processes).
The PDE layer facilitates communication between the OS and the vprocs.
The vprocs are execution engines that run in parallel.
Each AMP manages its own vdisk; no one else shares it.
[Figure: a node showing the Operating System, the PDE, a PE vproc (Syntaxer, Optimizer, Step Gen, Dispatch), and AMP vprocs, each with its vdisk.]

Data Storage
The Primary Index is defined by the user and is used for data distribution.
[Figure: the Parsing Engine picks up the Primary Index value and passes it to the hashing algorithm, which produces the row hash. The hash bucket number is looked up in the hash map, an array that associates hash buckets with AMPs.]

A hashing algorithm is a standard data processing technique that takes in a data value, like a last name or order number, and systematically mixes it up so that the incoming values are converted to a number in a range from zero to the specified maximum value. A successful hashing scheme scatters the input evenly over the range of possible output values. It is predictable in that Smith will always hash to the same value and Jones will always hash to another (hopefully different) value. With a good hashing algorithm, any patterns in the input data should disappear in the output data.

Teradata’s algorithm works predictably well over any data, typically loading each AMP with variations in the range of 0.1% to 0.5% between AMPs. For extremely large systems, the variation can be as low as 0.001% between AMPs.

A Row Hash is the 32-bit result of applying a hashing algorithm to an index value. The DSW (Destination Selection Word), or Hash Bucket, is represented by the high-order 16 bits of the Row Hash.

Teradata also uses hashing quite differently from other data storage systems. Other hashed data storage systems equate a bucket with a physical location on disk. In Teradata, a bucket is simply an entry in a hash map. Each hash map entry points to a single AMP. Therefore, changing the number of AMPs does not require any adjustment to the hashing algorithm. Teradata simply adjusts the hash maps and redistributes any affected rows. The hash maps must always be available to the Message Passing Layer.

In Teradata Version 2, a hash map has 65,536 entries. Once the hash bucket has determined the destination AMP, the full 32-bit row hash plus the Table-ID is used to assign the row to a cylinder and a data block on the AMP's disk storage. In Version 2 the algorithm can produce over 4,000,000,000 row hash values.
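The path from primary index value to AMP described above can be sketched in a few lines. This is an illustration under stated assumptions, not Teradata's implementation: SHA-256 stands in for the proprietary hashing algorithm, the round-robin `build_hash_map` is a hypothetical way to fill the map, and all function names are invented for the example. Only the 32-bit row hash, the high-order 16-bit bucket, and the 65,536-entry map come from the notes.

```python
import hashlib

NUM_BUCKETS = 65_536  # hash map entries in Teradata Version 2, per the notes


def row_hash(primary_index_value: str) -> int:
    """Return a 32-bit hash of the value (SHA-256 stand-in, an assumption)."""
    digest = hashlib.sha256(primary_index_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def hash_bucket(rh: int) -> int:
    """The hash bucket is the high-order 16 bits of the 32-bit row hash."""
    return rh >> 16


def build_hash_map(num_amps: int) -> list[int]:
    """Hypothetical round-robin map: entry i names the AMP that owns bucket i."""
    return [bucket % num_amps for bucket in range(NUM_BUCKETS)]


def amp_for(value: str, hash_map: list[int]) -> int:
    """Look up the destination AMP for a primary index value."""
    return hash_map[hash_bucket(row_hash(value))]


four_amps = build_hash_map(4)
six_amps = build_hash_map(6)  # more AMPs: only the map changes, not the hash
for name in ("Smith", "Jones"):
    rh = row_hash(name)
    print(name, hex(rh), hash_bucket(rh), amp_for(name, four_amps), amp_for(name, six_amps))
```

The row hash and bucket for a given name are identical under both maps; only the bucket-to-AMP lookup differs. A naive round-robin rebuild like this one would relocate far more buckets than necessary, so it should be read as a picture of the indirection, not of how a reconfiguration chooses which rows to move.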
Hash values will be the same for non-unique primary indexes and for hash synonyms (different input values that produce the same output), so a “uniqueness value” is added to the Row Hash. This combined value becomes the Row ID and is truly unique for every row in the database.
[Figure: the Message Passing Layer (BYNET) connecting Node-1 and Node-2 and their AMPs.]
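The Row ID idea above, a row hash paired with a uniqueness value, can be pictured as follows. This is a sketch only: the class names are hypothetical, and starting the counter at 1 and incrementing it per row hash are assumptions made for the example, not details given in the notes.

```python
from collections import defaultdict
from typing import NamedTuple


class RowID(NamedTuple):
    row_hash: int    # 32-bit row hash of the primary index value
    uniqueness: int  # value that separates rows sharing the same row hash


class UniquenessAssigner:
    """Hands out the next uniqueness value for each row hash (illustrative)."""

    def __init__(self) -> None:
        # Assumption: counters start at 1 for every row hash.
        self._next = defaultdict(lambda: 1)

    def assign(self, row_hash: int) -> RowID:
        value = self._next[row_hash]
        self._next[row_hash] = value + 1
        return RowID(row_hash, value)


assigner = UniquenessAssigner()
first = assigner.assign(0x1234ABCD)   # first row with this hash
second = assigner.assign(0x1234ABCD)  # duplicate index value or hash synonym
other = assigner.assign(0x0BADF00D)   # a different row hash
print(first, second, other)
```

Two rows that collide on the row hash still end up with distinct Row IDs because the second component differs.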

Data Retrieval
SELECT * FROM CalStateLADB.Students;
[Figure: the client sends the query to the Parsing Engine; the Dispatcher sends a RET step to each AMP over the Message Passing Layer, and each AMP returns its own rows (sample values 10, 29, 25, 50, 75).]
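The retrieval flow in the figure, one step sent to every AMP and the partial answers merged, can be imitated with a thread pool. This is a toy model under stated assumptions: the placement of the sample values on four AMPs is hypothetical, threads stand in for AMP vprocs, and `ret_step` and `select_all` are invented names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placement of rows on four AMPs; each AMP sees only its own rows.
AMP_ROWS = {
    0: [10, 29],
    1: [25],
    2: [50],
    3: [75],
}


def ret_step(amp_id: int) -> list[int]:
    """Full scan of one AMP's rows, as a SELECT * with no WHERE clause needs."""
    return list(AMP_ROWS[amp_id])


def select_all() -> list[int]:
    """Dispatcher stand-in: send the same step to every AMP, merge the answers."""
    with ThreadPoolExecutor(max_workers=len(AMP_ROWS)) as pool:
        partial_results = list(pool.map(ret_step, AMP_ROWS))
    return [row for rows in partial_results for row in rows]


print(sorted(select_all()))
```

Because no AMP needs another AMP's rows to answer its step, the scans are independent, which is what allows them to run in parallel.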

Teradata Scales Linearly
Scaling is achieved via the ‘shared nothing’ architecture and unconditional parallelism.
The power is in linear scalability, where slope = 1.
More nodes: more work, more users, more data.
[Figure: Node1 through Node4 on the BYNET; work, users, and data grow with the number of nodes.]
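A slope of 1 means capacity grows in direct proportion to the node count. The arithmetic can be written out as below; this is an idealized sketch, the function name and the example figures are hypothetical, and it deliberately ignores any overhead a real system might show.

```python
def scaled_capacity(base_capacity: float, base_nodes: int, new_nodes: int) -> float:
    """Capacity after resizing, assuming perfectly linear scaling (slope = 1)."""
    if base_nodes <= 0 or new_nodes <= 0:
        raise ValueError("node counts must be positive")
    return base_capacity * new_nodes / base_nodes


# Hypothetical example: a 4-node system handling 1000 units of work.
print(scaled_capacity(1000.0, 4, 8))  # doubling the nodes doubles the capacity
```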

Fault Tolerance
[Figure: AMPs connected by the BYNET, a clique labeled CLIQUE-2, and four disk arrays.]

Summary
Teradata is designed and used for enterprise data warehousing.
MPP with “Share Nothing” architecture.
Parallelism throughout the platform.
Runs on Windows and Linux.

References
Teradata Basics and Architecture user manual (
Teradata Software Demo CD can be ordered (free copy) from

Questions?
