1
MySQL Cluster overview and ndb-7.0 features demo
Presented By: Matthew Montgomery MySQL Meetup San Antonio, TX
2
Who am I? Matthew Montgomery, Senior Support Engineer working for Sun, MySQL Cluster team, based in San Antonio, TX
3
Interactivity If you have a question, ask it! (No matter how silly)
4
What is MySQL Cluster?
5
A Storage Engine
6
Storage Engines unique feature of MySQL
7
No one best way to store tables
8
Choice of Storage Engines
9
Different engine per table (if you want)
10
Just like a Virtual File System Layer
Application Application Application Application Kernel VFS ext3 ext4 vfat XFS
11
Just like a Virtual File System Layer
Application Application Application Application MySQL Server Storage Engine API MyISAM InnoDB Falcon NDB Cluster
12
What is MySQL Cluster?
13
What is MySQL Cluster? A High Availability
14
What is MySQL Cluster? A High Availability High Performance
15
What is MySQL Cluster? A High Availability High Performance In Memory (and disk in 5.1+)
16
What is MySQL Cluster? A High Availability High Performance In Memory (and disk in 5.1+) Shared Nothing
17
What is MySQL Cluster? A High Availability High Performance In Memory (and disk in 5.1+) Shared Nothing Clustered
18
What is MySQL Cluster? A High Availability High Performance In Memory (and disk in 5.1+) Shared Nothing Clustered Storage Engine
20
Designed for Five Nines (99.999%) Uptime
21
Sub-Second Failover
22
Sub-Second Failover High Availability mysqld mysqld Transactions
Data Nodes
25
Hot “Online” Backup
26
No Locks during Backup
27
Hot (Online) Backup High Availability mysqld mysqld Transactions
Data Nodes
29
Hot (Online) Compressed Backup
High Availability Hot (Online) Compressed Backup mysqld mysqld Transactions Compressed Compressed Data Nodes
30
Configurable Redundancy
NoOfReplicas
31
NoOfReplicas=1 Data
33
NoOfReplicas=1 Data. No surviving replica of this data
34
NoOfReplicas=2 Data Data
36
NoOfReplicas=2 Data Data. There is a copy of the data here
43
NoOfReplicas=2 Data Data. No surviving replicas for this data
45
NoOfReplicas=3 Data Data Data
47
NoOfReplicas=4 Data Data Data Data
48
For Production: NoOfReplicas=2. With NoOfReplicas=1 (bad), a node failure leaves no surviving replica of its data; with NoOfReplicas=2, there is still a copy of the data on another node.
49
What is MySQL Cluster? A High Availability High Performance In Memory (and disk in 5.1+) Shared Nothing Clustered Storage Engine
50
High Performance Not from BEGIN to COMMIT
51
High Performance ...but through Parallelism
52
High Performance Parallelism mysqld mysqld Transactions Data Nodes
53
What is MySQL Cluster? A High Availability High Performance In Memory (and disk in 5.1+) Shared Nothing Clustered Storage Engine
54
In Memory (and disk) Data and Indexes kept in main memory
55
What is MySQL Cluster? Non-Indexed attributes on disk (introduced in 5.1)
56
Row
57
Row in memory part
58
Row in memory part on disk part
59
In memory? What about machine/cluster failures?
60
Check point to disk
61
Check point to disk Frequent, Configurable
62
Check point to disk Not complete data loss after power outage
63
What is MySQL Cluster? A High Availability High Performance In Memory (and disk in 5.1+) Shared Nothing Clustered Storage Engine
64
Shared Nothing Commodity PCs
65
Shared Nothing Commodity Interconnects
66
Shared Nothing Commodity Interconnects Ethernet
67
Shared Nothing Commodity Interconnects Ethernet SCI
68
Shared Nothing No Expensive Shared Disk
69
Shared Nothing No Expensive Shared Disk (so no single point of failure)
70
What is MySQL Cluster? A High Availability High Performance In Memory (and disk) Shared Nothing Clustered Storage Engine
71
What is MySQL Cluster? Clustered
72
What is MySQL Cluster? A High Availability High Performance In Memory (and disk) Shared Nothing Clustered Storage Engine
73
What is MySQL Cluster? ENGINE=NDBCLUSTER
74
What is MySQL Cluster? A High Availability High Performance In Memory (and disk in 5.1+) Shared Nothing Clustered Storage Engine
75
What is MySQL Cluster?
76
What is MySQL Cluster? A Collection of Nodes
77
Node Types
78
Node Types Data Nodes (ndbd)
79
Data Nodes
80
Data Nodes Data nodes (running ndbd)
81
Data Nodes Data nodes (running ndbd) These store data
82
Data Nodes This cloud means a cluster Data nodes (running ndbd)
These store data
83
Data Nodes grouped
84
Data Nodes grouped into nodegroups
85
NoOfReplicas NoOfReplicas=2 Nodegroup 0 Nodegroup 1
86
NoOfReplicas=2: DATA in Nodegroup 0, DATA in Nodegroup 1
87
Data Nodes Nodegroup 0 Nodegroup 1
88
Data Nodes pk Nodegroup 0 Nodegroup 1
89
HASH(pk) pk
92
SELECT * from t1 pk
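The slides above show the primary key being hashed to pick where a row lives. A minimal sketch of the idea, using Python's md5 as a stand-in for NDB's internal MD5-based hash function (the function name and fragment counts here are illustrative, not the real implementation):

```python
import hashlib

def fragment_for_pk(pk: int, num_fragments: int) -> int:
    """Pick a fragment by hashing the primary key.

    NDB uses an internal MD5-based hash of the partition key;
    md5 here is a stand-in to illustrate the idea.
    """
    digest = hashlib.md5(str(pk).encode()).digest()
    return int.from_bytes(digest[:4], "little") % num_fragments

# Rows spread across two fragments (one per node group)
placement = {pk: fragment_for_pk(pk, 2) for pk in range(8)}
assert set(placement.values()) <= {0, 1}
```

Because the hash is deterministic, any SQL node can compute which data node holds a given primary key without consulting a central catalog.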
93
MySQL Servers talk to the Data Nodes
mysqld Nodegroup 0 Nodegroup 1
94
One used as a Transaction Coordinator
mysqld One storage node as Transaction Coordinator (TC) Nodegroup 0 Nodegroup 1
96
Many mysqld nodes can send data. One storage node acts as Transaction Coordinator (TC). Nodegroup 0 Nodegroup 1
97
mysqld Several nodes can be involved in processing a single query
98
mysqld Several nodes can be involved in processing a single query Parallelism=Better Performance
99
Node Types Data Nodes (ndbd) Up to 48 in one cluster
100
Node Types Management Server (ndb_mgmd)
101
Management Server: config.ini
[ndbd default]
NoOfReplicas= 2
DataMemory= 400M
IndexMemory= 32M
DataDir= /usr/local/mysql/cluster
[ndbd]
HostName=
[ndbd]
HostName=
[ndb_mgmd]
HostName=
[mysqld]
103
Management Server config.ini mysqld mysqld ndb_mgmd ndbd ndbd
108
Node Types Management Server (ndb_mgmd) also involved in arbitration, starting backups, issuing commands to nodes (start, stop, restart)
109
Node Types SQL Nodes (mysqld) also called API nodes
110
SQL Nodes mysqld mysqld mysqld
111
SQL and API Nodes mysqld mysqld mysqld NDB API NDB API
112
DELETE UPDATE INSERT mysqld mysqld mysqld NDB API NDB API
113
DELETE UPDATE INSERT update() update() mysqld mysqld mysqld NDB API
114
Node Types SQL Nodes (mysqld) Accessed like any other MySQL Server
115
Node Types API Nodes Talk NDB API directly to the Data Nodes
116
Node Types: Management Client
Talks to the Management Server. Used to administer the cluster
117
Perl Mono PHP .NET mysql Ruby DELETE UPDATE INSERT Management Server
mysqld mysqld mysqld Management Server Mgm client Data Nodes NDB API NDB API update() update()
118
Physical Requirements
119
A node is a process, not a computer
120
At least three physical machines for High Availability
122
Why?
123
Three machines minimum for HA
B A
124
Three machines minimum for HA
B A Can no longer see A
125
Three machines minimum for HA
B A Can no longer see A Did A Die?
126
Three machines minimum for HA
B A Or did the network link between A and B die? Can no longer see A
127
Three machines minimum for HA
B A Or did the network link between A and B die? Can no longer see B Can no longer see A
128
Who Is In Charge Now? B A Or did the network link between A and B die? Can no longer see B Can no longer see A
129
Split Brain = Bad B A Or did the network link between A and B die? Can no longer see B Can no longer see A
130
We detect possible Split Brain scenarios
Nodes will shut down instead
131
Three machines minimum for HA
B A Management server on 3rd machine
132
Three machines minimum for HA
B A Management server on 3rd machine Is Arbitrator
136
Physical Requirements
Management Server
137
Management Server Not CPU Intensive
138
Management Server Not CPU Intensive Not Memory Intensive
139
Management Server Not CPU Intensive Not Memory Intensive
Can have multiple for redundancy
140
Physical Requirements
Data Nodes
141
Data Node Requirements
Lots of Memory: all indexed data in memory; data in memory; cache for data on disk
142
Data Node Requirements
Disk IO and capacity: the IO rate can be calculated (with disk-based tables the calculation is harder); space usage can be calculated
143
Data Node Requirements
CPU: often not CPU bound (depends on queries). Before 7.0, ndbd was single threaded (apart from a few helper threads), so SMP does not buy you a lot. Multithreaded ndbmtd arrives in 7.0.
144
Physical Requirements
SQL Node: Many API/SQL nodes are needed to load Storage Nodes
145
Physical Requirements
SQL Node: MySQL is multi-threaded SMP can help
146
A Configuration
[ndbd default]
NoOfReplicas= 2
DataMemory= 400M
IndexMemory= 32M
DataDir= /usr/local/mysql/cluster
[ndbd]
HostName=
[ndbd]
HostName=
[ndb_mgmd]
HostName=
[mysqld]
147
A Configuration: the [ndbd default] section holds default settings for data nodes (ndbd)
148
A Configuration: each [ndbd] section holds settings for one data node
150
A Configuration: the [ndb_mgmd] section holds settings for a management server
151
A Configuration: the [mysqld] section holds settings for a SQL/API node
152
Demo Configuration: 2 replicas; 2n data nodes (2 or 4); 50MB for data; 5MB for indexes; 1 management server; 3 MySQL Servers/API nodes; no other special options
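A config.ini matching the demo above might look like the following. The host addresses are placeholders (the originals are not shown on the slides), and DataMemory/IndexMemory use the demo's 50MB/5MB:

```ini
[ndbd default]
NoOfReplicas= 2
DataMemory= 50M
IndexMemory= 5M
DataDir= /usr/local/mysql/cluster

# One [ndbd] section per data node (placeholder addresses)
[ndbd]
HostName= 192.168.0.10
[ndbd]
HostName= 192.168.0.11

[ndb_mgmd]
HostName= 192.168.0.1

# One empty [mysqld] slot per SQL/API node
[mysqld]
[mysqld]
[mysqld]
```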
153
A Configuration: the ndb_mgmd node
154
A Configuration: ndb_mgmd and the ndbd data nodes
155
A Configuration: ndb_mgmd, ndbd, and the mysqld nodes
158
A Configuration: applications connect through the mysqld nodes
159
Starting Nodes
160
A Starting node needs: Configuration Information
161
A Starting node needs: Location of Management Server
162
The Connect String
163
The Connect String Lists Management Servers
164
A Connect String:
165
A Connect String: :1186
166
A Connect String: :9310
167
A Connect String: ,
168
A Connect String: , ,nodeid=3
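A connect string is a comma-separated list of management servers, optionally with a port per host and a nodeid option. A small parser sketch (the host names below are made up for illustration):

```python
def parse_connectstring(cs: str):
    """Split an NDB connect string into management hosts and options.

    Handles the forms shown above: host, host:port, several hosts
    separated by commas, and a trailing nodeid=N option.
    """
    hosts, options = [], {}
    for part in cs.split(","):
        part = part.strip()
        if "=" in part:                      # e.g. nodeid=3
            key, value = part.split("=", 1)
            options[key] = value
        else:                                # e.g. mgmhost:1186
            host, _, port = part.partition(":")
            hosts.append((host, int(port) if port else 1186))
    return hosts, options

hosts, opts = parse_connectstring("mgm1:1186,mgm2:9310,nodeid=3")
assert hosts == [("mgm1", 1186), ("mgm2", 9310)]
assert opts == {"nodeid": "3"}
```

Note that 1186 is the default management server port when none is given.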
169
A Bad Connect String: mgmsrv1,mgmsrv2
170
DNS is Not Reliable Do Not Trust DNS to work, use IP Addresses (or hosts file)
171
DHCP is Not Reliable Do Not Trust DHCP to work, use static IP Addresses
172
Starting The Cluster
173
Starting the cluster
1. Management server
2. Data Nodes
3. MySQL Server Nodes
174
Starting the cluster
1. Management server: needs to be started first (so new nodes can get the configuration). On the management server: $ ndb_mgmd -f config.ini
2. Data Nodes
3. MySQL Server Nodes
175
Starting the cluster
1. Management server
2. Data Nodes: on each storage node: $ ndbd -c (the -c option takes the connect string)
3. MySQL Server Nodes
176
Starting the cluster
1. Management server
2. Data Nodes
3. MySQL Server Nodes: make sure the ndbcluster option is enabled (command line or my.cnf); make sure the connect string is specified (command line or my.cnf); start the MySQL server in your preferred way (e.g. /etc/init.d/mysql start)
177
MySQL Server Options
1. Create a my.cnf file
2. Add ndbcluster option
3. Add ndb-connectstring option
4. Set unique port, socket, datadir
5. mysql_install_db --defaults-file=/path/my.cnf
6. ./mysqld --defaults-file=/path/to/my.cnf
7. Repeat for each SQL node
178
Basic Monitoring Using the Management Client
179
Check the Cluster Log On Management Server, $DataDir/ndb_<id>_cluster.log
180
Using MySQL Cluster ENGINE=NDBCLUSTER
181
Let's CREATE TABLE
CREATE TABLE t1 (
  pk1 INT PRIMARY KEY AUTO_INCREMENT,
  v VARCHAR(100)
) ENGINE=NDBCLUSTER;
182
SELECT, INSERT, UPDATE, DELETE from all SQL Nodes and see the new and updated rows!
183
Isn't this just like Replication?
184
Isn't this just like Replication?
No.
185
MySQL Replication Asynchronous Read-only slaves
186
MySQL Cluster Synchronous, All nodes can perform reads/writes
187
MySQL Replication Changes made by a transaction are available on a slave after a small amount of time
188
MySQL Cluster Changes made by a transaction are instantly available from all nodes on commit
189
Two-Phase Commit Protocol
MySQL Cluster Two-Phase Commit Protocol
190
MySQL Cluster Two-Phase Commit Protocol
Ensures consistency in event of failure. (with a performance penalty)
191
Cluster vs Replication
With replication, a single transaction will be COMMITted quicker. But if the master fails before a slave retrieves the binary log, the transaction is lost.
193
Cluster vs Replication
With Cluster, COMMIT means transaction can survive node failures
194
Cluster and Replication
We'll cover later
195
What else does MySQL Cluster Support?
196
All the Standard 5.1 features
197
Views
198
Stored Procedures
199
Triggers
200
Triggers Implemented in the MySQL Server, so changes made with NDB API programs do not fire triggers
201
Standard Permissions GRANT/REVOKE
202
...and a caveat
203
The mysql database is per SQL node, not per cluster.
204
So GRANT/REVOKE, Triggers, Stored Procedures, Views have to be set up on each SQL node.
205
Also, no native FOREIGN KEYs support
206
You can emulate foreign keys on the SQL nodes using triggers.
207
Also, no FULLTEXT indexes
208
Distributed Metadata
209
Notice how the 2nd MySQL Server knew that there were tables in the Cluster
210
The MySQL Server uses .frm files to track table metadata
211
For MySQL Cluster, we store the FRM files in the Cluster
212
Retrieving them when needed
213
Distributed Metadata MySQL server MySQL server .frm files .frm files
Distributed database
214
Distributed Metadata create table t1 ... MySQL server MySQL server
.frm files MySQL server .frm files MySQL server Distributed database
215
Distributed Metadata create table t1 ... MySQL server MySQL server
.frm files MySQL server .frm files MySQL server copy .frm compressed .frm copies Distributed database
216
Distributed Metadata select * from t1 create table t1 ... MySQL server
.frm files MySQL server .frm files MySQL server copy .frm compressed .frm copies Distributed database
217
Distributed Metadata select * from t1 create table t1 ... MySQL server
.frm files MySQL server .frm files MySQL server autodiscover .frm copy .frm compressed .frm copies Distributed database
218
Data Distribution
219
MySQL Cluster implements horizontal partitioning
220
pk 2 Nodes
221
pk 2 Nodes F1 F2 Two Fragments
222
NoOfReplicas=2 pk 2 Nodes F1 F2 Two Fragments
223
NoOfReplicas=2 pk 2 Nodes F1 F1 F2 F2 Two Fragments
224
NoOfReplicas=2 pk 2 Nodes F1 F1 F2 F2
226
NoOfReplicas=2 pk 2 Nodes F1 F1 F1 F1 F2 F2 F2 F2
227
NoOfReplicas=2 pk 2 Nodes F1 F1 F1 F1 F2 F2 F2 F2 Primary Replica
228
NoOfReplicas=2 pk 2 Nodes F1 F1 F1 F1 F2 F2 F2 F2 Secondary Replica
229
Why two fragments for two nodes?
230
What is a Primary Replica responsible for?
231
What is a Primary Replica responsible for?
Locks
232
What is a Primary Replica responsible for?
Locks, Reads
233
What is a Primary Replica responsible for?
Locks, Reads (among other things)
234
Two fragments for a two node cluster
Load Balances
235
What about node failure?
236
NoOfReplicas=2 pk 2 Nodes F1 F1 F1 F1 F2 F2 F2 F2
238
NoOfReplicas=2 pk 2 Nodes F1 F1 F1 F1 F2 F2 F2 F2 Transparent Failover
239
NoOfReplicas=2 pk 2 Nodes F1 F1 F1 F1 F2 F2 F2 F2 Primary Replica
240
Surviving nodes take over
241
Surviving nodes have increased load
242
What about node recovery?
243
NoOfReplicas=2 pk 2 Nodes F1 F1 F1 F1 F2 F2 F2 F2 Primary Replica
244
NoOfReplicas=2 pk 2 Nodes F1 F1 F1 F1 F2 F2 F2 F2 Synchronize data
245
NoOfReplicas=2 pk 2 Nodes F1 F1 F1 F1 F2 F2 F2 F2
246
What about ongoing transactions during node failure?
247
Transactions using a failed node are aborted
248
What about MySQL Server node failure?
249
Application can connect to another MySQL Server
250
How does an Application connect to another MySQL Server?
251
Load Balancing system
252
Connector-based (JDBC) or a hardware load balancer
253
What about Management Server failure?
254
Continued operation of cluster not dependent on Management Server
255
Management Server required to start new nodes
256
Can have multiple Management Servers (but there is increased admin work)
257
So...
258
Let's kill things
259
kill -9 (Angel and NDB)
260
See the failure reported in the logs
261
See the failure from the management client
262
See that things still work
263
Run some SELECT, INSERT, UPDATE queries
264
Restart the failed data node
265
See it rejoin
266
Run more queries
267
See that all is good with the world
268
Two-Phase Commit Protocol
269
Two-phase Commit Protocol
Keeping DB nodes synchronized facilitates immediate failover (TC, DB 1, DB 2, DB 3, DB 4; Node group 1, Node group 2). Two-phase commit: in the prepare phase, both node groups get their information updated; in the commit phase, the change is committed.
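The prepare/commit rounds can be sketched as follows. This is a toy coordinator to illustrate the protocol shape, not the NDB implementation:

```python
class Node:
    """A toy participant that stages a change, then applies it."""
    def __init__(self):
        self.value, self.staged = None, None

    def prepare(self, value):
        self.staged = value       # phase 1: stage the change
        return True               # vote yes

    def commit(self):
        self.value = self.staged  # phase 2: make the change visible

def two_phase_commit(nodes, value):
    # Phase 1: every replica in every node group must acknowledge
    if not all(n.prepare(value) for n in nodes):
        return False              # any "no" vote aborts the transaction
    # Phase 2: only now is the change committed everywhere
    for n in nodes:
        n.commit()
    return True

replicas = [Node() for _ in range(4)]   # e.g. 2 node groups x 2 replicas
assert two_phase_commit(replicas, "row-v2")
assert all(n.value == "row-v2" for n in replicas)
```

The key property: no replica makes the change visible until all replicas have staged it, which is what lets a surviving replica take over consistently after a node failure.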
276
Transaction Over 3 Replicas
TC DB 1 DB 4 DB 2 DB 5 DB 3 DB 6 Node group 1 Node group 2
289
Two-phase commit enables recovery in a distributed system
290
Nodes communicate with each other over an interconnect
291
Ethernet is common/cheap
292
Ethernet isn't the fastest in the world
293
Performance of some queries is very latency dependent
294
MySQL Cluster abstracts away the communication method
295
TCP Transporter
296
SCI Transporter
297
SHM Transporter (alpha)
298
In reality: use TCP or SCI (with appropriate hardware)
299
In reality: use gigabit Ethernet, not 100Mbit, for TCP
300
Use private network for MySQL Cluster traffic
301
Inter-node communication is not authenticated and not encrypted
302
Other applications on the network may interfere with heartbeats
303
Heartbeats
304
Failure detection: Heartbeats, lost connections
Nodes are organized in a logical circle (DB Node 1, DB Node 2, DB Node 3, DB Node 4); heartbeat messages are sent to the next node in the circle. All nodes must have the same view of which nodes are alive.
305
Schema considerations for MySQL Cluster
306
Every table has a PRIMARY KEY
307
Every table has a PRIMARY KEY
Even if you don't explicitly set one
308
Three types of indexes
309
Three types of indexes: 1) Primary Hash Index 2) Unique Hash Index 3) Ordered T-tree Index
310
UNIQUE (SQL) is Unique Hash and Ordered Tree (NDB)
311
UNIQUE USING HASH (SQL) is Unique Hash (NDB)
312
PRIMARY KEY (SQL) is Primary Hash and Ordered Tree (NDB)
313
PRIMARY KEY USING HASH (SQL) is Primary Hash (NDB)
314
Q: What query can use a hash index?
A: Key lookup
315
Q: What query can use an ordered index?
A: Range scans and ORDER BY
316
So what happens in a table scan?
317
MySQL Server NDBCLUSTER Engine Data Nodes (ndbd)
318
MySQL Server NDBCLUSTER Engine TC TC TC TC Data Nodes (ndbd)
320
MySQL Server NDBCLUSTER Engine SCAN_TABREQ TC TC TC TC Data Nodes (ndbd)
321
MySQL Server NDBCLUSTER Engine SCAN_FRAGREQ SCAN_FRAGREQ TC TC TC TC SCAN_FRAGREQ Data Nodes (ndbd)
322
MySQL Server NDBCLUSTER Engine LQH LQH LQH LQH Data Nodes (ndbd)
323
MySQL Server NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)
330
MySQL Server ORDER BY done here NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)
331
MySQL Server WHERE done here NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)
332
MySQL Server NDBCLUSTER Engine SCAN_FRAGCONF SCAN_FRAGCONF TC LQH LQH LQH SCAN_FRAGCONF LQH Data Nodes (ndbd)
333
MySQL Server NDBCLUSTER Engine SCAN_TABCONF TC Data Nodes (ndbd)
334
Engine Condition Pushdown
335
Evaluate conditions in parallel on data nodes
336
Only send matching rows to API
337
MySQL Server NDBCLUSTER Engine LQH LQH LQH LQH Data Nodes (ndbd)
342
MySQL Server NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)
348
MySQL Server ORDER BY done here NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)
349
MySQL Server WHERE already done NDBCLUSTER Engine Data Nodes (ndbd)
350
SET engine_condition_pushdown=ON|OFF;
351
Just about any normal comparisons can be pushed down
352
Common to see 5-10x improvement
353
Details in EXPLAIN [EXTENDED]
354
Batching
355
Rule: One large network packet is quicker than several small ones
356
Batching leads to improved performance
357
We do:
358
Batched Inserts
359
INSERT INTO t1 (a) values (1),(2),(3),(4),(5);
360
Batched Lookups
361
SELECT * FROM t1 WHERE pk1 IN (11,22,63,14,25,6,9,8);
362
All key lookups sent together
363
Batched lookups: 2-3x improvement
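The 2-3x figure falls out of round trips: sending N keys in one batch pays the network latency once instead of N times. A toy cost model with made-up latency numbers, just to show the shape of the saving:

```python
def lookup_cost_ms(num_keys, batched, latency_ms=0.2, per_key_ms=0.02):
    """Rough cost model: each round trip pays the network latency once.

    latency_ms and per_key_ms are illustrative values, not measurements.
    """
    round_trips = 1 if batched else num_keys
    return round_trips * latency_ms + num_keys * per_key_ms

one_by_one = lookup_cost_ms(8, batched=False)   # 8 round trips
batched    = lookup_cost_ms(8, batched=True)    # 1 round trip
assert batched < one_by_one
```

The larger the per-trip latency relative to per-key work, the bigger the win from batching, which is why it matters most on slower interconnects.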
364
Query Cache
365
Invalidated when table is changed
366
MySQL Cluster has slightly different semantics...
367
ndb_cache_check_time
368
Milliseconds to wait before checking the query cache
369
Ask Data nodes if other nodes have changed anything
370
If table changed, invalidate Query Cache
371
This means:
372
For up to ndb_cache_check_time milliseconds, result may be old
373
BACKUP
374
Data not backed up is data not wanted
375
Two options for backing up MySQL Cluster
376
mysqldump Backup
377
mysqldump: single connection to single MySQL Server
378
mysqldump: human readable
379
mysqldump: READ_COMMITTED
380
mysqldump: READ_COMMITTED i.e. not consistent
381
MySQL Cluster Native Backup
382
MySQL Cluster Native Backup non-blocking parallel consistent
383
ndb_mgm> START BACKUP
384
Each data node participates in BACKUP
385
Each node performs backup for its primary fragments
386
Backups stored in ndb_<id>_fs/BACKUP/
387
Backup API
388
Backup API updates updates
389
Backup API updates updates Data Log Data Log
401
Backup API updates updates Control Control Data Log Data Log
402
RESTORE
403
ndb_restore
404
Restore Control Control Data Log Data Log
405
Restore Filesystem Control Control Data Log Data Log
406
Restore Filesystem Filesystem Control Control Data Log Data Log
408
Restore - Metadata Control Control Data Log Data Log
409
Restore - Data Control Control Data Log Data Log
412
Restore – Data (Log) Control Control Data Log Data Log
414
Restore API updates updates Control Control Data Log Data Log
415
Some Configuration Parameters
416
Several different categories
417
Why specify resource limits?
418
We statically allocate memory on startup for some resources
419
Deterministic behavior for resource allocation at run time
420
1. Memory
421
DataMemory
422
DataMemory Limits amount of data that can be stored in the Cluster
423
DataMemory Per Node
424
DataMemory Allocated to tables in Pages
425
4 node, 2 replica DataMemory=100MB
426
4 node, 2 replica DataMemory=100MB
400MB total
427
4 node, 2 replica DataMemory=100MB
200MB of Data (2 copies)
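The arithmetic on the last few slides generalizes: DataMemory is per node, and every row is stored NoOfReplicas times, so usable capacity is nodes x DataMemory / NoOfReplicas. A quick sketch (the helper name is ours, for illustration):

```python
def usable_data_mb(num_nodes, data_memory_mb, no_of_replicas):
    """DataMemory is per node; each row is stored NoOfReplicas times."""
    return num_nodes * data_memory_mb // no_of_replicas

# 4 nodes, 2 replicas, DataMemory=100MB: 400MB allocated, 200MB of data
assert usable_data_mb(4, 100, 2) == 200
# Same idea for IndexMemory: 4 nodes, IndexMemory=10MB -> 20MB of indexes
assert usable_data_mb(4, 10, 2) == 20
```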
428
IndexMemory Memory used by hash indexes
429
4 node, 2 replica IndexMemory=10MB
20MB of Indexes (2 copies)
430
StringMemory
431
Memory used for table names, column names, FRM files etc
StringMemory Memory used for table names, column names, FRM files etc
432
(the default 5% is likely fine up to ~1000 tables)
StringMemory (the default 5% is likely fine up to ~1000 tables)
433
2. Transaction Parameters
434
MaxNoOfConcurrentTransactions
Maximum number of ongoing transactions
435
MaxNoOfConcurrentOperations
Maximum number of uncommitted changed rows (divided by the number of data nodes)
436
3. Scans and Buffering
437
MaxNoOfConcurrentScans
Maximum number of parallel scans (for each data node)
438
BatchSizePerLocalScan
Linked with ScanBatchSize – how many rows we batch for scans
439
4. Logging and Checkpointing
440
NoOfFragmentLogFiles
441
NoOfFragmentLogFiles
Sets number of REDO log files for each node
442
NoOfFragmentLogFiles
Each transaction written to REDO Log
443
NoOfFragmentLogFiles
REDO Log used in System Restart
444
NoOfFragmentLogFiles
REDO Log record exists for 2 local checkpoints
445
NoOfFragmentLogFiles
If no room, transactions aborted with: 410 Out of log file space temporarily
446
NoOfFragmentLogFiles
Allocated in units of 64MB (changed with FragmentLogFileSize)
447
NoOfFragmentLogFiles
Default is 8; 8x64MB = 512MB
448
NoOfFragmentLogFiles
Update heavy systems need large values... even up to 300
449
NoOfFragmentLogFiles
300 x 64MB = 19.2GB
450
NoOfFragmentLogFiles
Can be changed on a running cluster... rolling --initial restart
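Following the slides' 64MB units, the REDO log footprint per data node works out as below (a sketch; the unit size is the default and changes with FragmentLogFileSize):

```python
def redo_log_mb(no_of_fragment_log_files, unit_mb=64):
    """Total REDO log space per data node, allocated in 64MB units
    per the slides (the unit changes with FragmentLogFileSize)."""
    return no_of_fragment_log_files * unit_mb

print(redo_log_mb(8))    # default: 8 x 64MB = 512MB
print(redo_log_mb(300))  # update-heavy: 300 x 64MB = 19200MB (~19.2GB)
```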
451
5. Metadata Objects
452
MaxNoOfAttributes
453
Maximum number of columns
MaxNoOfAttributes Maximum number of columns for all tables
454
MaxNoOfTables
455
Maximum number of tables
MaxNoOfTables Maximum number of tables
456
MaxNoOfOrderedIndexes
457
MaxNoOfOrderedIndexes
Maximum number of ordered indexes
458
MaxNoOfUniqueHashIndexes
459
MaxNoOfUniqueHashIndexes
Maximum number of unique hash indexes
460
6. Behavior
461
LockPagesInMainMemory
462
LockPagesInMainMemory
Prevents memory allocated by ndbd from being swapped out by the Operating System
463
LockPagesInMainMemory
(This is a good idea)
464
Diskless
465
Nothing written to disk
Diskless Nothing written to disk
466
Diskless No checkpointing
467
If enabled, neither records nor tables survive a cluster crash
Diskless If enabled, neither records nor tables survive a cluster crash
468
...but requires much less (zero) disk space and bandwidth.
Diskless ...but requires much less (zero) disk space and bandwidth.
469
7. Timeouts, Intervals, Disk Paging
470
TimeBetweenWatchDogCheck
471
TimeBetweenWatchDogCheck
Remember the Angel process from before?
472
TimeBetweenWatchDogCheck
Every TimeBetweenWatchDogCheck milliseconds, check that the main thread isn't stuck
473
StartPartialTimeout
474
Normally, we wait for all data nodes before starting the cluster
StartPartialTimeout Normally, we wait for all data nodes before starting the cluster
475
StartPartialTimeout After StartPartialTimeout milliseconds (30s), we'll perform a partial start
476
0 means always wait for all the data nodes
StartPartialTimeout 0 means always wait for all the data nodes
477
StartPartitionedTimeout
478
StartPartitionedTimeout
If after StartPartialTimeout the cluster could be in a partitioned state, we wait an additional StartPartitionedTimeout milliseconds
479
HeartbeatIntervalDbDb
480
HeartbeatIntervalDbDb
Every HeartbeatIntervalDbDb, heartbeats sent between Data nodes
481
HeartbeatIntervalDbDb
Maximum time to discover node failure is 4 times HeartbeatIntervalDbDb
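The worst-case detection time follows directly (the 1500ms default shown here is from the MySQL Cluster documentation and is worth verifying for your version):

```python
def max_failure_detection_ms(heartbeat_interval_ms=1500):
    """A node is declared dead after 4 missed heartbeats, so the
    worst-case discovery time is 4 x HeartbeatIntervalDbDb."""
    return 4 * heartbeat_interval_ms

print(max_failure_detection_ms())     # 6000 ms with the default interval
print(max_failure_detection_ms(500))  # 2000 ms with a 500ms interval
```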
482
HeartbeatIntervalDbApi
483
HeartbeatIntervalDbApi
Each Data node sends heartbeats to each API node connected to it
484
TimeBetweenLocalCheckpoints
Wins the prize for the strangest units for a configuration parameter
485
TimeBetweenLocalCheckpoints
Not a time period
486
TimeBetweenLocalCheckpoints
Amount of updates before starting a local checkpoint
487
TimeBetweenLocalCheckpoints
...but not a value in bytes
488
TimeBetweenLocalCheckpoints
base-2 logarithm of the number of 4 byte words
489
TimeBetweenLocalCheckpoints
base-2 logarithm of the number of 4 byte words (sorry, not joking)
490
TimeBetweenLocalCheckpoints
Default value is 20: 4 x 2^20 = 4MB
491
TimeBetweenLocalCheckpoints
Value of 21: 4 x 2^21 = 8MB
492
TimeBetweenLocalCheckpoints
Value of 22: 4 x 2^22 = 16MB (and so on...)
493
TimeBetweenLocalCheckpoints
Maximum value of 31: 4 x 2^31 = 8GB
494
TimeBetweenLocalCheckpoints
Value of 6 or less: constant local checkpoints
495
TimeBetweenLocalCheckpoints
Designed to prevent checkpointing on mostly idle clusters
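The strange unit unpacks in one line, which is enough to check all the numbers on the slides:

```python
def lcp_write_threshold_bytes(value):
    """TimeBetweenLocalCheckpoints is the base-2 logarithm of the
    number of 4-byte words written before a local checkpoint starts."""
    return 4 * 2 ** value

print(lcp_write_threshold_bytes(20))  # default: 4MB
print(lcp_write_threshold_bytes(21))  # 8MB
print(lcp_write_threshold_bytes(31))  # maximum: 8GB
```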
496
TimeBetweenGlobalCheckpoints
497
TimeBetweenGlobalCheckpoints
A COMMITted transaction is in main memory of all replicas
498
TimeBetweenGlobalCheckpoints
A COMMITted transaction is not immediately flushed to disk
499
TimeBetweenGlobalCheckpoints
A global checkpoint is where a set of COMMITted transaction are flushed to disk
500
TimeBetweenGlobalCheckpoints
This is where we recover to after a System Restart
501
TimeBetweenGlobalCheckpoints
Default is every 2000 milliseconds
502
Checkpointing
503
COMMIT = txn survives node failure
504
COMMIT != Disk Persistence
505
The D in ACID is still covered
506
Durable to machine failure
507
Durable to disk failure
508
In event of cluster failure, want to be able to restore a consistent image of the database
509
We checkpoint to disk (except when in diskless mode)
510
Can't lock the database while we write a checkpoint
511
Checkpoint in background, while transactions continue
512
Write image of database (LCP) and REDO log (GCP)
513
Take a Local Check Point of the database, apply REDO from that point to Global Check Point
514
Space Usage
515
Fixed Size Rows (prior to 5.1)
Fixed Size Rows (prior to 5.1) Variable Sized Columns (5.1, with 4-byte alignment)
516
BLOBs and TEXT
517
BLOBs and TEXT 256 bytes in the row with remainder in 2000 byte chunks stored in separate table
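The 256-byte inline part plus 2000-byte chunks means the number of rows landing in the hidden parts table can be sketched as follows (a sketch of the storage scheme the slide describes):

```python
import math

def blob_part_rows(blob_bytes, inline=256, chunk=2000):
    """Rows needed in the separate parts table: everything past the
    first 256 inline bytes is stored in 2000-byte chunks."""
    overflow = max(0, blob_bytes - inline)
    return math.ceil(overflow / chunk)

print(blob_part_rows(200))    # fits inline -> 0 chunk rows
print(blob_part_rows(10000))  # 9744 overflow bytes -> 5 chunk rows
```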
518
Indexed Columns must be in main memory
519
Non-Indexed columns can be on disk
520
Disk Columns are fixed size
521
Disk Columns are fixed size VARCHAR(11) uses 12 bytes on disk
522
Columns are 4-byte aligned
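The 4-byte alignment rule is just rounding up to the next multiple of 4; the VARCHAR figure below assumes a single length byte precedes the data, which is an assumption (the slide only gives the 12-byte total):

```python
def align4(nbytes):
    """Round a column's storage up to the next 4-byte boundary."""
    return (nbytes + 3) // 4 * 4

def varchar_disk_bytes(max_len):
    # assumption: one length byte stored ahead of the data on disk
    return align4(max_len + 1)

print(varchar_disk_bytes(11))  # VARCHAR(11) -> 12 bytes on disk
print(align4(5))               # a 5-byte column occupies 8 bytes
```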
523
Calculate storage requirements
524
Know your dataset
525
Use ndb_size.pl (examines existing database, creates report)
526
ndb_size.pl output
527
Variable Sized Rows
528
4.1 and 5.0 1 2 hello int int VARCHAR or just saying hello
529
Wasted Space 1 2 hello Wasted Space int int VARCHAR
This means you could have a lot of wasted space on a lot of rows Wasted Space
530
5.1 1 2 hello int int VARCHAR However, in 5.1, we now have variable sized rows. This means that we don't waste space on fields that aren't full.
531
The Saving Which can save a lot of space when you have any reasonable number of rows
532
The Saving We just use the space needed by each particular row. Here we only have a few longer rows, saving us a lot of memory.
533
On line Add/Drop Index
534
ADD INDEX (4.1, 5.0) t1 Index Index Rows
In 4.1 and 5.0, to add an index to a table
535
ADD INDEX (4.1, 5.0) t1 temp table Index Index Index Index Index Rows
We first create a temporary table with a schema of what we want t1 to look like (with the new index)
536
ADD INDEX (4.1, 5.0) t1 temp table Index Index Index Index Index Rows
and then copy the data in t1 over to the temporary table, building all the indexes as we go. We keep a TABLE LOCK while we do this
540
ADD INDEX (4.1, 5.0) t1 temp table Index Index Index Index Index Rows
we then delete t1
541
ADD INDEX (4.1, 5.0) temp table Index Index Index Rows
and rename the temporary table
542
ADD INDEX (4.1, 5.0) t1 Index Index Index Rows
so we now have t1 with our new index
543
ADD INDEX in 5.1 t1 Index Index Rows In 5.1, we can do a lot better
544
ADD INDEX in 5.1 t1 Index Index Index Rows
We build a new index as an online operation – avoiding the copy.
545
ADD INDEX in 5.1 t1 Index Index Index Rows
so the only thing we have to build is one index, not copying all the data and rebuilding all the other indexes.
549
DROP INDEX in 5.1 t1 Index Index Index Rows
Delete is the same, we only drop the index we don't want
550
DROP INDEX in 5.1 t1 Index Index Rows
551
How much faster?
552
Online Add Index Before (copy the table):
mysql> create index b on t1(b); Query OK, 1356 rows affected (2.20 sec) Records: Duplicates: 0 Warnings: 0 mysql> drop index b on t1; Query OK, 1356 rows affected (2.03 sec)
553
Online Add Index Before (copy the table):
mysql> create index b on t1(b); Query OK, 1356 rows affected (2.20 sec) Records: Duplicates: 0 Warnings: 0 mysql> drop index b on t1; Query OK, 1356 rows affected (2.03 sec) Now (just add/drop an index): Query OK, 0 rows affected (0.58 sec) Records: 0 Duplicates: 0 Warnings: 0 Query OK, 0 rows affected (0.46 sec)
554
User Defined Partitioning
555
Since the dawn of time...
557
pk
558
pk
559
pk Nodegroup 0 Nodegroup 1
560
HASH(pk) pk
561
HASH(pk) pk
562
HASH(pk) pk
563
Perception Reality HASH(pk) pk
564
pk
565
pk Two Partitions
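A simplified sketch of how HASH(pk) maps rows to partitions (illustrative only: NDB uses its own MD5-based hash of the partition key internally, and the md5-mod scheme here is an assumption for demonstration):

```python
import hashlib

def partition_for(pk, num_partitions=2):
    """Map a primary key to a partition by hashing it; rows with the
    same key always land on the same partition (and thus nodegroup)."""
    digest = hashlib.md5(str(pk).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# 1000 keys spread deterministically across both partitions
seen = {partition_for(pk) for pk in range(1000)}
print(sorted(seen))
```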
566
How the default looks (in 5.1 SHOW CREATE TABLE)
CREATE TABLE account (number INT UNSIGNED, location VARCHAR(20), amount INT, PRIMARY KEY (number)) [PARTITION BY KEY ()] [(PARTITION P0 NODEGROUP 0, PARTITION P1 NODEGROUP 1, …)] ENGINE=NDBCLUSTER;
570
Now, In 5.1
571
User Defined Partitioning
572
By Key
573
Partition by Key CREATE TABLE account (number INT UNSIGNED, location VARCHAR(20), amount INT, PRIMARY KEY (number)) [PARTITION BY KEY ()] [(PARTITION P0 NODEGROUP 0, PARTITION P1 NODEGROUP 1, …)] ENGINE=NDBCLUSTER;
574
MySQL Cluster Replication
575
Not Internal Mirroring between nodes
576
Replication from one Cluster to Another Cluster
577
Why?
578
the usual reasons
579
Why replicate between Clusters?
Geographical Redundancy
580
Why replicate between Clusters?
Geographical Redundancy Split the processing load
581
Why replicate between Clusters?
Geographical Redundancy Split the processing load e.g. for monthly reports
582
Quick Overview of MySQL Replication
583
MySQL Replication
584
MySQL Replication INSERT ...
585
MySQL Replication INSERT ... INSERT...
586
MySQL Replication INSERT ... INSERT...
587
MySQL Replication INSERT ... INSERT... INSERT...
588
MySQL Replication INSERT ... INSERT ... INSERT... INSERT...
589
MySQL Replication INSERT ... INSERT ... INSERT... INSERT... INSERT...
590
MySQL Replication INSERT ... UPDATE ... INSERT ... INSERT... UPDATE...
591
MySQL Replication INSERT ... UPDATE ... INSERT ... DELETE ...
592
MySQL Replication Master
593
MySQL Replication Slave Master Slave Slave
594
MySQL Replication Slave Master Slave Slave
595
MySQL Replication Slave Master Slave Slave
596
MySQL Replication Slave Master Slave Slave
597
MySQL Replication Slave Master Slave Slave
598
Back at Cluster
599
mysqld mysqld mysqld NDB API NDB API
600
mysqld mysqld mysqld UPDATE UPDATE UPDATE NDB API NDB API
601
mysqld mysqld mysqld NDB API NDB API update() update()
602
mysqld mysqld mysqld NDB API NDB API update() update()
603
UPDATE DELETE INSERT update() update() mysqld mysqld mysqld NDB API
604
UPDATE DELETE INSERT update() update() mysqld mysqld mysqld NDB API
605
UPDATE DELETE INSERT mysqld mysqld mysqld
606
INSERT UPDATE DELETE INSERT UPDATE DELETE INSERT UPDATE DELETE SLAVE
607
ORDER? SLAVE INSERT UPDATE DELETE INSERT UPDATE DELETE INSERT UPDATE
608
Serialization is in the storage nodes
609
NDB Injector Thread A thread inside the MySQL Server
Subscribes to events in NDB (the event of "Row was committed") and injects the rows into the binlog, producing a single, canonical binlog of your cluster. Not just one MySQL server: it contains EVERYTHING done by ALL NDB API programs (including mysqld) connected to the cluster.
610
UPDATE DELETE INSERT update() update() mysqld mysqld mysqld NDB API
611
A Closer Look...
612
MySQL Replication between Clusters
Application Application Application Application MySQL Server MySQL Server I/O thread Replication Master Apply thread Slave NdbCluster Handler NdbCluster Handler Binlog Relay Binlog Binlog NDB Kernel (Data nodes) ndbd NDB Kernel (Data nodes) ndbd
613
Who spotted the single point of failure?
One thing... Who spotted the single point of failure?
614
Redundant Replication Channels
MySQL Server MySQL Server Master Master I/O thread Apply thread MySQL Server Slave NdbCluster Handler NdbCluster Handler NdbCluster Handler Binlog Binlog Relay Binlog Binlog NDB Kernel (Data nodes) ndbd NDB Kernel (Data nodes) ndbd NdbCluster Handler Master MySQL Server Binlog Slave Relay Binlog Binlog Apply thread I/O thread NdbCluster Handler MySQL Server Replication
618
How do I make fail over happen?
619
But first...
620
Epoch A point of synchronization in the cluster
Everybody agrees on what transactions are disk persistent. In case of a system crash, this is where we'll recover to.
621
Okay, but fail over?
622
Currently manual, but only four simple steps
623
STEP 1 Find out where the Slave is up to
In the binary log produced by the injector, each Global Check Point (epoch) is a transaction. Where we are up to is recorded on the slave, in the mysql.ndb_apply_status table. It has two columns, server_id and epoch (both integers), and is ENGINE=NDB so it is available everywhere! So, on the slave: mysqlS> SELECT @latest:=MAX(epoch) FROM mysql.ndb_apply_status; (possibly with a WHERE clause for server_id)
624
STEP 2 Find the binlog position for this epoch
The mysql.ndb_binlog_index table will help us here. It maps binlog position to GCI and tells us the number of INSERTs, UPDATEs, DELETEs and SCHEMAOPS per GCI. It is MyISAM and is per-master. So, on the master (using @latest from the slave): mysqlM> SELECT @file:=SUBSTRING_INDEX(File, '/', -1), @pos:=Position FROM mysql.ndb_binlog_index WHERE epoch > @latest ORDER BY epoch ASC LIMIT 1;
625
STEP 3 Synchronize the second channel i.e. change the master
Run (using @file and @pos from the last query): mysqlS> CHANGE MASTER TO MASTER_LOG_FILE='@file', MASTER_LOG_POS=@pos;
626
STEP 4 mysqlS> START SLAVE; No, really, that's it.
627
Limitations Fail over of replication channels is manual
can be scripted. Since all updates are through one injector thread, there is a limit; this limit is much less than what you can pump through a good cluster. We are working to overcome this.
628
You can now... Have a 99.999% uptime cluster with great performance
and a cool name. Have replication between it and another Cluster (or single server): load balancing, geographical redundancy. Redundant replication channels between these setups. Redundancy up the wazoo!
629
Disk Data
630
Two Phase Implementation
1. Data on disk (5.1) 2. Indexes on disk (7.1?)
631
A few concepts...
632
Where we store things Table Space Data file
633
Where we store things Table Space Data file Data file Data file
634
Where we store things Table Space Table Space Data file Data file
635
Where we store things Table Space Table Space Log file group Data file
Undo file Undo file
636
Where we store things Table Space Table Space Log file group Data file
Undo file Undo file
637
Files are per node Node 2 Node 1 df1 df1
638
Let's look at the SQL
639
CREATE LOGFILE GROUP CREATE LOGFILE GROUP lg_1 ADD UNDOFILE 'undo1' INITIAL_SIZE 16M UNDO_BUFFER_SIZE 2M ENGINE=NDB; We can add another undo file: ALTER LOGFILE GROUP lg_1 ADD UNDOFILE 'undo2' INITIAL_SIZE 12M ENGINE=NDB; We currently don't auto-extend files; for more space, add files.
640
CREATE TABLESPACE CREATE TABLESPACE ts1 ADD DATAFILE 'datafile1' USE LOGFILE GROUP lg_1 INITIAL_SIZE 32M ENGINE=NDB; We can add another datafile too: ALTER TABLESPACE ts1 ADD DATAFILE 'datafile2' INITIAL_SIZE 48M ENGINE=NDB; We currently don't auto-extend, so just add another file.
641
CREATE TABLE CREATE TABLE t1 ( pk1 INT NOT NULL PRIMARY KEY, b INT NOT NULL, c INT NOT NULL) TABLESPACE ts1 STORAGE DISK ENGINE=NDB; b and c will be stored on disk pk1 in memory (as it's indexed)
642
I_S.FILES for Data files
Which tablespace it belongs to, Extent Size (bytes), Number of extents in file, Number of free extents. So, free extents multiplied by extent size = free bytes that can be allocated to tables.
643
A useful VIEW CREATE VIEW isf AS SELECT FILE_NAME, (TOTAL_EXTENTS * EXTENT_SIZE) AS 'Total', (FREE_EXTENTS * EXTENT_SIZE) AS 'Free', ( ((FREE_EXTENTS * EXTENT_SIZE)*100) / (TOTAL_EXTENTS * EXTENT_SIZE)) AS '% Free' FROM INFORMATION_SCHEMA.FILES WHERE ENGINE="ndbcluster" and FILE_TYPE = 'DATAFILE';
644
I_S.FILES for UNDO files
Free log space. If running out, you may need to add more undo files.
645
Optimized NR Traditional NR copies everything over the wire
PRO: easy to implement correctly. PRO: not too bad for a few gigs of data. CON: very, very bad for disk data; think 2TB going over the wire... ouch! Details on optimized node recovery for NDB in s1108-ronstrom.pdf: recovery from checkpoint, so we don't have to copy everything.
646
Thank You
647
Find out More! MySQL Online Documentation
Cluster Forum Cluster Mailing List