MySQL Cluster overview and ndb-7.0 features demo


1 MySQL Cluster overview and ndb-7.0 features demo
Presented by: Matthew Montgomery, MySQL Meetup, San Antonio, TX

2 Who am I? Matthew Montgomery, Senior Support Engineer working for Sun
MySQL Cluster team based in San Antonio, TX

3 If you have a question, ask it!
Interactivity If you have a question, ask it! (No matter how silly)

4 What is MySQL Cluster?

5 A Storage Engine

6 A unique feature of MySQL
Storage Engines: a unique feature of MySQL

7 No one best way to store tables

8 Choice of Storage Engines

9 Different engine per table (if you want)
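The engine choice is just a clause on CREATE TABLE. For example (table names are illustrative, not from the deck):
-- One table on MyISAM, another on InnoDB, in the same server
CREATE TABLE log_entries (id INT PRIMARY KEY, msg TEXT) ENGINE=MyISAM;
CREATE TABLE accounts (id INT PRIMARY KEY, balance DECIMAL(12,2)) ENGINE=InnoDB;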

10 Just like a Virtual File System Layer
Application Application Application Application Kernel VFS ext3 ext4 vfat XFS

11 Just like a Virtual File System Layer
Application Application Application Application MySQL Server Storage Engine API MyISAM InnoDB Falcon NDB Cluster

12 What is MySQL Cluster?

13 What is MySQL Cluster? A High Availability

14 A High Availability High Performance
What is MySQL Cluster? A High Availability High Performance

15 A High Availability High Performance In Memory (and disk in 5.1+)
What is MySQL Cluster? A High Availability High Performance In Memory (and disk in 5.1+)

16 In Memory (and disk in 5.1+) Shared Nothing
What is MySQL Cluster? A High Availability High Performance In Memory (and disk in 5.1+) Shared Nothing

17 In Memory (and disk in 5.1+) Shared Nothing
What is MySQL Cluster? A High Availability High Performance In Memory (and disk in 5.1+) Shared Nothing Clustered

18 In Memory (and disk in 5.1+) Shared Nothing
What is MySQL Cluster? A High Availability High Performance In Memory (and disk in 5.1+) Shared Nothing Clustered Storage Engine

19 In Memory (and disk in 5.1+) Shared Nothing
What is MySQL Cluster? A High Availability High Performance In Memory (and disk in 5.1+) Shared Nothing Clustered Storage Engine

20 Designed for Five Nines (99.999%) Uptime

21 Sub-Second Failover

22 Sub-Second Failover High Availability mysqld mysqld Transactions
Data Nodes

23 Sub-Second Failover High Availability mysqld mysqld Transactions
Data Nodes

24 Sub-Second Failover High Availability mysqld mysqld Transactions
Data Nodes

25 Hot “Online” Backup

26 No Locks during Backup

27 Hot (Online) Backup High Availability mysqld mysqld Transactions
Data Nodes

28 Hot (Online) Backup High Availability mysqld mysqld Transactions
Data Nodes

29 Hot (Online) Compressed Backup
High Availability Hot (Online) Compressed Backup mysqld mysqld Transactions Compressed Compressed Data Nodes

30 Configurable Redundancy
NoOfReplicas

31 NoOfReplicas=1 D a t a

32 NoOfReplicas=1 D a t a

33 NoOfReplicas=1 D a t a No surviving replica of this data

34 NoOfReplicas=2 Da ta Da ta

35 NoOfReplicas=2 Da ta Da ta

36 NoOfReplicas=2 Da ta Da ta There is a copy of the data here

37 NoOfReplicas=2 Da ta Da ta There is a copy of the data here

38 NoOfReplicas=2 Da ta Da ta There is a copy of the data here

39 NoOfReplicas=2 Da ta Da ta There is a copy of the data here

40 NoOfReplicas=2 Da ta Da ta There is a copy of the data here

41 NoOfReplicas=2 Da ta Da ta There is a copy of the data here

42 NoOfReplicas=2 Da ta Da ta

43 NoOfReplicas=2 Da ta Da ta No surviving replicas for this data

44 NoOfReplicas=2 Da ta Da ta No surviving replicas for this data

45 NoOfReplicas=3 Da Da ta Da ta ta

46 NoOfReplicas=3 Da Da ta Da ta ta

47 NoOfReplicas=4 Data Data Data Data

48 For Production: NoOfReplicas=2 NoOfReplicas=1 (bad) D a t a Da ta Da
No surviving replica of this data Da ta Da ta NoOfReplicas=2 There is a copy of the data here

49 In Memory (and disk in 5.1+) Shared Nothing
What is MySQL Cluster? A High Availability High Performance In Memory (and disk in 5.1+) Shared Nothing Clustered Storage Engine

50 Not from BEGIN to COMMIT
High Performance Not from BEGIN to COMMIT

51 ...but through Parallelism
High Performance ...but through Parallelism

52 High Performance Parallelism mysqld mysqld Transactions Data Nodes

53 In Memory (and disk in 5.1+) Shared Nothing
What is MySQL Cluster? A High Availability High Performance In Memory (and disk in 5.1+) Shared Nothing Clustered Storage Engine

54 Data and Indexes kept in main memory
In Memory (and disk) Data and Indexes kept in main memory

55 Non-Indexed attributes on disk (introduced in 5.1)
What is MySQL Cluster? Non-Indexed attributes on disk (introduced in 5.1)

56 Row

57 Row in memory part

58 Row in memory part on disk part
Row in memory part on disk part

59 In memory? What about machine/cluster failures?

60 Check point to disk

61 Check point to disk Frequent, Configurable

62 Check point to disk Not complete data loss after power outage

63 In Memory (and disk in 5.1+) Shared Nothing
What is MySQL Cluster? A High Availability High Performance In Memory (and disk in 5.1+) Shared Nothing Clustered Storage Engine

64 Shared Nothing Commodity PCs

65 Commodity Interconnects
Shared Nothing Commodity Interconnects

66 Commodity Interconnects Ethernet
Shared Nothing Commodity Interconnects Ethernet

67 Commodity Interconnects Ethernet SCI
Shared Nothing Commodity Interconnects Ethernet SCI

68 No Expensive Shared Disk
Shared Nothing No Expensive Shared Disk

69 No Expensive Shared Disk (so no single point of failure)
Shared Nothing No Expensive Shared Disk (so no single point of failure)

70 In Memory (and disk) Shared Nothing
What is MySQL Cluster? A High Availability High Performance In Memory (and disk) Shared Nothing Clustered Storage Engine

71 What is MySQL Cluster? Clustered

72 In Memory (and disk) Shared Nothing
What is MySQL Cluster? A High Availability High Performance In Memory (and disk) Shared Nothing Clustered Storage Engine

73 What is MySQL Cluster? ENGINE=NDBCLUSTER

74 In Memory (and disk in 5.1+) Shared Nothing
What is MySQL Cluster? A High Availability High Performance In Memory (and disk in 5.1+) Shared Nothing Clustered Storage Engine

75 What is MySQL Cluster?

76 What is MySQL Cluster? A Collection of Nodes

77 Node Types

78 Node Types Data Nodes (ndbd)

79 Data Nodes

80 Data Nodes Data nodes (running ndbd)

81 Data Nodes Data nodes (running ndbd) These store data

82 Data Nodes This cloud means a cluster Data nodes (running ndbd)
These store data

83 Data Nodes grouped

84 Data Nodes grouped into nodegroups

85 NoOfReplicas NoOfReplicas=2 Nodegroup 0 Nodegroup 1

86 NoOfReplicas DA TA NoOfReplicas=2 DA TA Nodegroup 0 Nodegroup 1

87 Data Nodes Nodegroup 0 Nodegroup 1

88 Data Nodes pk Nodegroup 0 Nodegroup 1

89 HASH(pk) pk

90 HASH(pk) pk

91 HASH(pk) pk

92 SELECT * from t1 pk

93 MySQL Servers talk to the Data Nodes
mysqld Nodegroup 0 Nodegroup 1

94 One used as a Transaction Coordinator
mysqld One storage node as Transaction Coordinator (TC) Nodegroup 0 Nodegroup 1

95 One used as a Transaction Coordinator
mysqld One storage node as Transaction Coordinator (TC) Nodegroup 0 Nodegroup 1

96 Many can send data mysqld One storage node as
Transaction Coordinator (TC) Nodegroup 0 Nodegroup 1

97 mysqld Several nodes can be involved in processing a single query

98 mysqld Several nodes can be involved in processing a single query Parallelism=Better Performance

99 Data Nodes (ndbd) Up to 48 in one cluster
Node Types Data Nodes (ndbd) Up to 48 in one cluster

100 Management Server (ndb_mgmd)
Node Types Management Server (ndb_mgmd)

101 Management Server config.ini mysqld mysqld ndbd ndb_mgmd
[ndbd default] NoOfReplicas= 2 DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster [ndbd] HostName= HostName= [ndb_mgmd] HostName= [mysqld] mysqld mysqld ndb_mgmd ndbd

102 Management Server config.ini mysqld mysqld ndbd ndb_mgmd
[ndbd default] NoOfReplicas= 2 DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster [ndbd] HostName= HostName= [ndb_mgmd] HostName= [mysqld] mysqld mysqld ndb_mgmd ndbd

103 Management Server config.ini mysqld mysqld ndb_mgmd ndbd ndbd

104 Management Server config.ini mysqld mysqld ndb_mgmd ndbd ndbd

105 Management Server config.ini mysqld mysqld ndb_mgmd ndbd ndbd

106 Management Server config.ini mysqld mysqld ndb_mgmd ndbd ndbd

107 Management Server config.ini mysqld mysqld ndb_mgmd ndbd ndbd

108 Management Server (ndb_mgmd)
Node Types Management Server (ndb_mgmd) also involved in arbitration, starting backups, issuing commands to nodes (start, stop, restart)

109 SQL Nodes (mysqld) also called API nodes
Node Types SQL Nodes (mysqld) also called API nodes

110 SQL Nodes mysqld mysqld mysqld

111 SQL and API Nodes mysqld mysqld mysqld NDB API NDB API

112 DELETE UPDATE INSERT mysqld mysqld mysqld NDB API NDB API

113 DELETE UPDATE INSERT update() update() mysqld mysqld mysqld NDB API

114 SQL Nodes (mysqld) Accessed like any other MySQL Server
Node Types SQL Nodes (mysqld) Accessed like any other MySQL Server

115 API Nodes Talk NDB API directly to the Data Nodes
Node Types API Nodes Talk NDB API directly to the Data Nodes

116 Management Server Node Types
Management Client talks to Management Server. Used to administer the cluster

117 Perl Mono PHP .NET mysql Ruby DELETE UPDATE INSERT Management Server
mysqld mysqld mysqld Management Server Mgm client Data Nodes NDB API NDB API update() update()

118 Physical Requirements

119 A node is a process, not a computer

120 At least three physical machines for High Availability

121 At least three physical machines for High Availability

122 Why?

123 Three machines minimum for HA
B A

124 Three machines minimum for HA
B A Can no longer see A

125 Three machines minimum for HA
B A Can no longer see A Did A Die?

126 Three machines minimum for HA
B A Or did the network link between A and B die? Can no longer see A

127 Three machines minimum for HA
B A Or did the network link between A and B die? Can no longer see B Can no longer see A

128 Who Is In Charge Now? B A Or did the network Can no longer see B
link between A and B die? Can no longer see B Can no longer see A

129 Split Brain = Bad B A Or did the network Can no longer see B
link between A and B die? Can no longer see B Can no longer see A

130 We detect possible Split Brain scenarios
Nodes will shut down instead

131 Three machines minimum for HA
B A Management server on 3rd machine

132 Three machines minimum for HA
B A Management server on 3rd machine Is Arbitrator

133 Three machines minimum for HA
B A Management server on 3rd machine Is Arbitrator

134 Three machines minimum for HA
B A Management server on 3rd machine Is Arbitrator

135 Three machines minimum for HA
B A Management server on 3rd machine Is Arbitrator

136 Physical Requirements
Management Server

137 Management Server Not CPU Intensive

138 Management Server Not CPU Intensive Not Memory Intensive

139 Management Server Not CPU Intensive Not Memory Intensive
Can have multiple for redundancy

140 Physical Requirements
Data Nodes

141 Data Node Requirements
Lots of memory: all indexed data in memory; data in memory; cache for data on disk

142 Data Node Requirements
Disk IO and capacity: the IO rate can be calculated (with disk-based tables the calculation is harder); space usage can be calculated

143 Data Node Requirements
CPU: often not CPU bound (depends on queries). Before 7.0, ndbd is single threaded, so SMP does not buy you a lot (a few helper threads though). Multithreaded ndbmtd arrives in 7.0

144 Physical Requirements
SQL Node: many API/SQL nodes are needed to fully load the Storage Nodes

145 Physical Requirements
SQL Node: MySQL is multi-threaded, so SMP can help

146 A Configuration [ndbd default] NoOfReplicas= 2 DataMemory= 400M
IndexMemory= 32M DataDir= /usr/local/mysql/cluster [ndbd] HostName= HostName= [ndb_mgmd] HostName= [mysqld]

147 A Configuration Default settings for data nodes (ndbd) [ndbd default]
NoOfReplicas= 2 DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster [ndbd] HostName= HostName= [ndb_mgmd] HostName= [mysqld] Default settings for data nodes (ndbd)

148 A Configuration Settings for a data node [ndbd default]
NoOfReplicas= 2 DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster [ndbd] HostName= HostName= [ndb_mgmd] HostName= [mysqld] Settings for a data node

149 A Configuration Settings for a data node Settings for a data node
[ndbd default] NoOfReplicas= 2 DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster [ndbd] HostName= HostName= [ndb_mgmd] HostName= [mysqld] Settings for a data node Settings for a data node

150 A Configuration Settings for a management server [ndbd default]
NoOfReplicas= 2 DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster [ndbd] HostName= HostName= [ndb_mgmd] HostName= [mysqld] Settings for a management server

151 A Configuration [mysqld] Settings for a SQL/API node [ndbd default]
NoOfReplicas= 2 DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster [ndbd] HostName= HostName= [ndb_mgmd] HostName= [mysqld] [mysqld] Settings for a SQL/API node

152 Demo Configuration 2 Replicas 2n Data Nodes (2 or 4) 50MB for Data
5MB for Indexes 1 management server 3 MySQL Servers/API Nodes No other special options
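A minimal config.ini matching that demo layout might look like the sketch below (host addresses are placeholders; the originals were stripped from this transcript):

[ndbd default]
NoOfReplicas= 2
DataMemory= 50M
IndexMemory= 5M

[ndbd]
HostName= 192.168.0.11
[ndbd]
HostName= 192.168.0.12

[ndb_mgmd]
HostName= 192.168.0.10

[mysqld]
[mysqld]
[mysqld]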

153 A Configuration ndb_mgmd [ndbd default] NoOfReplicas= 2
DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster [ndbd] HostName= HostName= [ndb_mgmd] HostName= [mysqld] ndb_mgmd

154 A Configuration ndbd ndb_mgmd [ndbd default] NoOfReplicas= 2
DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster [ndbd] HostName= HostName= [ndb_mgmd] HostName= [mysqld] ndb_mgmd ndbd

155 A Configuration mysqld mysqld ndbd ndb_mgmd [ndbd default]
NoOfReplicas= 2 DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster [ndbd] HostName= HostName= [ndb_mgmd] HostName= [mysqld] mysqld mysqld ndb_mgmd ndbd

156 A Configuration mysqld mysqld ndbd ndb_mgmd [ndbd default]
NoOfReplicas= 2 DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster [ndbd] HostName= HostName= [ndb_mgmd] HostName= [mysqld] mysqld mysqld ndb_mgmd ndbd

157 A Configuration mysqld mysqld ndbd ndb_mgmd [ndbd default]
NoOfReplicas= 2 DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster [ndbd] HostName= HostName= [ndb_mgmd] HostName= [mysqld] mysqld mysqld ndb_mgmd ndbd

158 A Configuration applications mysqld mysqld ndbd ndb_mgmd
[ndbd default] NoOfReplicas= 2 DataMemory= 400M IndexMemory= 32M DataDir= /usr/local/mysql/cluster [ndbd] HostName= HostName= [ndb_mgmd] HostName= [mysqld] mysqld mysqld ndb_mgmd ndbd

159 Starting Nodes

160 Configuration Information
A Starting node needs: Configuration Information

161 Location of Management Server
A Starting node needs: Location of Management Server

162 The Connect String

163 Lists Management Servers
The Connect String Lists Management Servers

164 A Connect String:

165 A Connect String: :1186

166 A Connect String: :9310

167 A Connect String: ,

168 A Connect String: , ,nodeid=3

169 A Bad Connect String: mgmsrv1,mgmsrv2
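The host parts of the connect strings above were stripped by the transcript. Following the shapes shown on the slides, with placeholder IPs:

192.168.0.10                          # one management server, default port
192.168.0.10:1186                     # explicit (default) port
192.168.0.10:9310                     # non-default port
192.168.0.10,192.168.0.11             # two management servers
192.168.0.10,192.168.0.11,nodeid=3    # plus a fixed node id for this node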

170 use IP Addresses (or hosts file)
DNS is Not Reliable Do Not Trust DNS to work, use IP Addresses (or hosts file)

171 DHCP is Not Reliable Do Not Trust DHCP to work,
use static IP Addresses

172 Starting The Cluster

173 Starting the cluster 1. Management server 2. Data Nodes
3. MySQL Server Nodes

174 Starting the cluster 1. Management server 2. Data Nodes
Needs to be started first (so new nodes can get the configuration) On the management server: $ ndb_mgmd -f config.ini 2. Data Nodes 3. MySQL Server Nodes

175 Starting the cluster 1. Management server 2. Data Nodes
On each storage node: $ ndbd -c <connectstring> (the -c option takes the connect string) 3. MySQL Server Nodes

176 Starting the cluster 1. Management server 2. Data Nodes
3. MySQL Server Nodes Make sure the ndbcluster option is enabled (command line or my.cnf) Make sure the connect string is specified (command line or my.cnf) Start the MySQL server in your preferred way (e.g. /etc/init.d/mysql start)

177 MySQL Server Options 1. Create a my.cnf file 2. Add ndbcluster option
3. Add ndb-connectstring option 4. Set unique port, socket, datadir 5. mysql_install_db --defaults-file=/path/my.cnf 6. ./mysqld --defaults-file=/path/to/my.cnf 7. repeat for each SQL node
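A sketch of one SQL node's my.cnf following those steps (address, port and paths are placeholders):

[mysqld]
ndbcluster
ndb-connectstring= 192.168.0.10:1186
port= 3307
socket= /tmp/mysql.node1.sock
datadir= /data/mysql-node1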

178 Using the Management Client
Basic Monitoring Using the Management Client

179 On Management Server, $DataDir/ndb_<id>_cluster.log
Check the Cluster Log On Management Server, $DataDir/ndb_<id>_cluster.log
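For basic monitoring, the management client can be used interactively or one-shot (the management host below is a placeholder):

$ ndb_mgm -c 192.168.0.10:1186
ndb_mgm> SHOW
ndb_mgm> ALL STATUS
# or, without entering the client:
$ ndb_mgm -c 192.168.0.10:1186 -e SHOW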

180 Using MySQL Cluster ENGINE=NDBCLUSTER

181 Let's CREATE TABLE CREATE TABLE t1 (
pk1 INT PRIMARY KEY AUTO_INCREMENT, v VARCHAR(100) ) ENGINE=NDBCLUSTER;

182 SELECT, INSERT, UPDATE, DELETE from all SQL Nodes
and see the new and updated rows!
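Using the table just created, a quick way to see this (session labels are illustrative):

-- on SQL node 1
INSERT INTO t1 (v) VALUES ('hello from node 1');
-- on SQL node 2, immediately afterwards: the row is already visible
SELECT * FROM t1;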

183 Isn't this just like Replication?

184 Isn't this just like Replication?
No.

185 MySQL Replication Asynchronous Read-only slaves

186 All nodes can perform reads/writes
MySQL Cluster Synchronous, All nodes can perform reads/writes

187 MySQL Replication Changes made by a transaction are available on a slave after a small amount of time

188 MySQL Cluster Changes made by a transaction are instantly available from all nodes on commit

189 Two-Phase Commit Protocol
MySQL Cluster Two-Phase Commit Protocol

190 MySQL Cluster Two-Phase Commit Protocol
Ensures consistency in event of failure. (with a performance penalty)

191 Cluster vs Replication
With replication, a single transaction will be COMMITted quicker. But if master fails before a slave retrieves the binary log, transaction is lost.

192 Cluster vs Replication
With replication, a single transaction will be COMMITted quicker. But if master fails before a slave retrieves the binary log, the transaction is lost.

193 Cluster vs Replication
With Cluster, COMMIT means transaction can survive node failures

194 Cluster and Replication
We'll cover later

195 What else does MySQL Cluster Support?

196 All the Standard 5.1 features

197 Views

198 Stored Procedures

199 Triggers

200 Triggers Implemented in the MySQL Server, so changes made with NDB API programs do not fire triggers

201 Standard Permissions GRANT/REVOKE

202 ...and a caveat

203 The mysql database is per SQL node, not per cluster.

204 So GRANT/REVOKE, Triggers, Stored Procedures, Views have to be set up on each SQL node.

205 Also, no native FOREIGN KEYs support

206 You can emulate foreign keys on the SQL nodes using triggers.

207 Also, no FULLTEXT indexes

208 Distributed Metadata

209 Notice how the 2nd MySQL Server knew that there were tables in the Cluster

210 The MySQL Server uses .frm files to track table metadata

211 For MySQL Cluster, we store the FRM files in the Cluster

212 Retrieving them when needed

213 Distributed Metadata MySQL server MySQL server .frm files .frm files
Distributed database

214 Distributed Metadata create table t1 ... MySQL server MySQL server
.frm files MySQL server .frm files MySQL server Distributed database

215 Distributed Metadata create table t1 ... MySQL server MySQL server
.frm files MySQL server .frm files MySQL server copy .frm compressed .frm copies Distributed database

216 Distributed Metadata select * from t1 create table t1 ... MySQL server
.frm files MySQL server .frm files MySQL server copy .frm compressed .frm copies Distributed database

217 Distributed Metadata select * from t1 create table t1 ... MySQL server
.frm files MySQL server .frm files MySQL server autodiscover .frm copy .frm compressed .frm copies Distributed database

218 Data Distribution

219 MySQL Cluster implements horizontal partitioning

220 pk 2 Nodes

221 pk 2 Nodes F1 F2 Two Fragments

222 NoOfReplicas=2 pk 2 Nodes F1 F2 Two Fragments

223 NoOfReplicas=2 pk 2 Nodes F1 F1 F2 F2 Two Fragments

224 NoOfReplicas=2 pk 2 Nodes F1 F1 F2 F2

225 NoOfReplicas=2 pk 2 Nodes F1 F1 F2 F2

226 NoOfReplicas=2 pk 2 Nodes F1 F1 F1 F1 F2 F2 F2 F2

227 NoOfReplicas=2 pk 2 Nodes F1 F1 F1 F1 F2 F2 F2 F2 Primary Replica

228 NoOfReplicas=2 pk 2 Nodes F1 F1 F1 F1 F2 F2 F2 F2 Secondary Replica

229 Why two fragments for two nodes?

230 What is a Primary Replica responsible for?

231 What is a Primary Replica responsible for?
Locks

232 What is a Primary Replica responsible for?
Locks, Reads

233 What is a Primary Replica responsible for?
Locks, Reads (among other things)

234 Two fragments for a two node cluster
Load Balances

235 What about node failure?

236 NoOfReplicas=2 pk 2 Nodes F1 F1 F1 F1 F2 F2 F2 F2

237 NoOfReplicas=2 pk 2 Nodes F1 F1 F1 F1 F2 F2 F2 F2

238 NoOfReplicas=2 pk 2 Nodes F1 F1 F1 F1 F2 F2 F2 F2 Transparent Failover

239 NoOfReplicas=2 pk 2 Nodes F1 F1 F1 F1 F2 F2 F2 F2 Primary Replica

240 Surviving nodes take over

241 Surviving nodes have increased load

242 What about node recovery?

243 NoOfReplicas=2 pk 2 Nodes F1 F1 F1 F1 F2 F2 F2 F2 Primary Replica

244 NoOfReplicas=2 pk 2 Nodes F1 F1 F1 F1 F2 F2 F2 F2 Synchronize data

245 NoOfReplicas=2 pk 2 Nodes F1 F1 F1 F1 F2 F2 F2 F2

246 What about ongoing transactions during node failure?

247 Transactions using a failed node are aborted

248 What about MySQL Server node failure?

249 Application can connect to another MySQL Server

250 How does an Application connect to another MySQL Server?

251 Load Balancing system

252 Connector based(JDBC) Hardware load balancer

253 What about Management Server failure?

254 Continued operation of cluster not dependent on Management Server

255 Management Server required to start new nodes

256 Can have multiple Management Servers (but there is increased admin work)

257 So...

258 Let's kill things

259 kill -9 (Angel and NDB)

260 See the failure reported in the logs

261 See the failure from the management client

262 See that things still work

263 Run some SELECT, INSERT, UPDATE queries

264 Restart the failed data node

265 See it rejoin

266 Run more queries

267 See that all is good with the world

268 Two-Phase Commit Protocol

269 Two-phase Commit Protocol
Keeping DB nodes synchronized facilitates immediate fail-over TC DB 1 DB 3 DB 2 DB 4 Node group 1 Node group 2 Two-phase commit Prepare phase: Both node groups get their information updated Commit phase: The change is committed

270 Two-phase Commit Protocol
Keeping DB nodes synchronized facilitates immediate fail-over TC DB 1 DB 3 DB 2 DB 4 Node group 1 Node group 2 Two-phase commit Prepare phase: Both node groups get their information updated Commit phase: The change is committed

271 Two-phase Commit Protocol
Keeping DB nodes synchronized facilitates immediate fail-over TC DB 1 DB 3 DB 2 DB 4 Node group 1 Node group 2 Two-phase commit Prepare phase: Both node groups get their information updated Commit phase: The change is committed

272 Two-phase Commit Protocol
Keeping DB nodes synchronized facilitates immediate fail-over TC DB 1 DB 3 DB 2 DB 4 Node group 1 Node group 2 Two-phase commit Prepare phase: Both node groups get their information updated Commit phase: The change is committed

273 Two-phase Commit Protocol
Keeping DB nodes synchronized facilitates immediate fail-over TC DB 1 DB 3 DB 2 DB 4 Node group 1 Node group 2 Two-phase commit Prepare phase: Both node groups get their information updated Commit phase: The change is committed

274 Two-phase Commit Protocol
Keeping DB nodes synchronized facilitates immediate fail-over TC DB 1 DB 3 DB 2 DB 4 Node group 1 Node group 2 Two-phase commit Prepare phase: Both node groups get their information updated Commit phase: The change is committed

275 Two-phase Commit Protocol
Keeping DB nodes synchronized facilitates immediate fail-over TC DB 1 DB 3 DB 2 DB 4 Node group 1 Node group 2 Two-phase commit Prepare phase: Both node groups get their information updated Commit phase: The change is committed

276 Transaction Over 3 Replicas
TC DB 1 DB 4 DB 2 DB 5 DB 3 DB 6 Node group 1 Node group 2

277 Transaction Over 3 Replicas
TC DB 1 DB 4 DB 2 DB 5 DB 3 DB 6 Node group 1 Node group 2

278 Transaction Over 3 Replicas
TC DB 1 DB 4 DB 2 DB 5 DB 3 DB 6 Node group 1 Node group 2

279 Transaction Over 3 Replicas
TC DB 1 DB 4 DB 2 DB 5 DB 3 DB 6 Node group 1 Node group 2

280 Transaction Over 3 Replicas
TC DB 1 DB 4 DB 2 DB 5 DB 3 DB 6 Node group 1 Node group 2

281 Transaction Over 3 Replicas
TC DB 1 DB 4 DB 2 DB 5 DB 3 DB 6 Node group 1 Node group 2

282 Transaction Over 3 Replicas
TC DB 1 DB 4 DB 2 DB 5 DB 3 DB 6 Node group 1 Node group 2

283 Transaction Over 3 Replicas
TC DB 1 DB 4 DB 2 DB 5 DB 3 DB 6 Node group 1 Node group 2

284 Transaction Over 3 Replicas
TC DB 1 DB 4 DB 2 DB 5 DB 3 DB 6 Node group 1 Node group 2

285 Transaction Over 3 Replicas
TC DB 1 DB 4 DB 2 DB 5 DB 3 DB 6 Node group 1 Node group 2

286 Transaction Over 3 Replicas
TC DB 1 DB 4 DB 2 DB 5 DB 3 DB 6 Node group 1 Node group 2

287 Transaction Over 3 Replicas
TC DB 1 DB 4 DB 2 DB 5 DB 3 DB 6 Node group 1 Node group 2

288 Transaction Over 3 Replicas
TC DB 1 DB 4 DB 2 DB 5 DB 3 DB 6 Node group 1 Node group 2

289 Two phase commit enables recovery in distributed system

290 Nodes communicate with each other over an interconnect

291 Ethernet is common/cheap

292 Ethernet isn't the fastest in the world

293 Performance of some queries is very latency dependent

294 MySQL Cluster abstracts away the communication method

295 TCP Transporter

296 SCI Transporter

297 SHM Transporter (alpha)

298 In reality: use TCP or SCI (with appropriate hardware)

299 In reality: use gigabit ethernet, not 100Mbit for TCP

300 Use private network for MySQL Cluster traffic

301 Inter-node communication is not authenticated and not encrypted

302 other applications on the network may interfere with heartbeats

303 Heartbeats

304 Failure detection: Heartbeats, lost connections
Nodes organized in a logical circle DB Node 1 DB Node 4 DB Node 2 All nodes must have the same view of which nodes are alive DB Node 3 Heartbeat messages sent to next node in circle

305 Schema considerations for MySQL Cluster

306 Every table has a PRIMARY KEY

307 Every table has a PRIMARY KEY
Even if you don't explicitly set one

308 Three types of indexes

309 Three types of indexes 1) Primary Hash Index 2) Unique Hash Index
3) Ordered T-tree Index

310 UNIQUE (SQL) is Unique Hash and Ordered Tree (NDB)

311 UNIQUE USING HASH (SQL) is Unique Hash (NDB)

312 PRIMARY KEY (SQL) is Primary Hash and Ordered Tree (NDB)

313 PRIMARY KEY USING HASH (SQL) is Primary Hash (NDB)

314 Q: What query can use a hash index?
A: Key lookup

315 Q: What query can use an ordered index?
A: Range scan & ORDER BY

316 So what happens in a table scan?

317 MySQL Server NDBCLUSTER Engine Data Nodes (ndbd)

318 MySQL Server NDBCLUSTER Engine TC TC TC TC Data Nodes (ndbd)

319 MySQL Server NDBCLUSTER Engine TC TC TC TC Data Nodes (ndbd)

320 MySQL Server NDBCLUSTER Engine SCANTABREQ TC TC TC TC Data Nodes (ndbd)

321 MySQL Server NDBCLUSTER Engine SCAN_FRAGREQ SCAN_FRAGREQ TC TC TC TC SCAN_FRAGREQ Data Nodes (ndbd)

322 MySQL Server NDBCLUSTER Engine LQH LQH LQH LQH Data Nodes (ndbd)

323 MySQL Server NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)

324 MySQL Server NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)

325 MySQL Server NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)

326 MySQL Server NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)

327 MySQL Server NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)

328 MySQL Server NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)

329 MySQL Server NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)

330 MySQL Server ORDER BY done here NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)

331 MySQL Server WHERE done here NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)

332 MySQL Server NDBCLUSTER Engine SCAN_FRAGCONF SCAN_FRAGCONF TC LQH LQH LQH SCAN_FRAGCONF LQH Data Nodes (ndbd)

333 MySQL Server NDBCLUSTER Engine SCAN_TABCONF TC Data Nodes (ndbd)

334 Engine Condition Pushdown

335 Evaluate conditions in parallel on data nodes

336 Only send matching rows to API

337 MySQL Server NDBCLUSTER Engine LQH LQH LQH LQH Data Nodes (ndbd)

338 MySQL Server NDBCLUSTER Engine LQH LQH LQH LQH Data Nodes (ndbd)

339 MySQL Server NDBCLUSTER Engine LQH LQH LQH LQH Data Nodes (ndbd)

340 MySQL Server NDBCLUSTER Engine LQH LQH LQH LQH Data Nodes (ndbd)

341 MySQL Server NDBCLUSTER Engine LQH LQH LQH LQH Data Nodes (ndbd)

342 MySQL Server NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)

343 MySQL Server NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)

344 MySQL Server NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)

345 MySQL Server NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)

346 MySQL Server NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)

347 MySQL Server NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)

348 MySQL Server ORDER BY done here NDBCLUSTER Engine TRANSID_AI LQH LQH LQH LQH Data Nodes (ndbd)

349 MySQL Server WHERE already done NDBCLUSTER Engine Data Nodes (ndbd)

350 set engine_condition_pushdown=on|off

351 Just about any normal comparisons can be pushed down

352 Common to see 5-10x improvement

353 Details in EXPLAIN [EXTENDED]
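A minimal sketch of toggling pushdown and checking the plan; when a predicate is pushed, the Extra column of EXPLAIN shows "Using where with pushed condition" (table and value are illustrative):

SET engine_condition_pushdown = ON;
EXPLAIN SELECT * FROM t1 WHERE v = 'some value';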

354 Batching

355 Rule: One large network packet is quicker than several small ones

356 Batching leads to improved performance

357 We do:

358 Batched Inserts

359 INSERT INTO t1 (a) values (1),(2),(3),(4),(5);

360 Batched Lookups

361 SELECT * FROM t1 WHERE pk1 IN (11,22,63,14,25,6,9,8);

362 All key lookups sent together

363 Batched lookups: 2-3x improvement

364 Query Cache

365 Invalidated when table is changed

366 MySQL Cluster has slightly different semantics...

367 ndb_cache_check_time

368 Milliseconds to wait before checking the query cache

369 Ask Data nodes if other nodes have changed anything

370 If table changed, invalidate Query Cache

371 This means:

372 For up to ndb_cache_check_time milliseconds, result may be old
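Both knobs live in each SQL node's my.cnf; the values here are illustrative:

[mysqld]
query_cache_type= 1
query_cache_size= 32M
ndb_cache_check_time= 1000    # recheck for remote changes at most every 1000 ms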

373 BACKUP

374 Data not backed up is data not wanted

375 Two options for backing up MySQL Cluster

376 mysqldump Backup

377 mysqldump: single connection to single MySQL Server

378 mysqldump: human readable

379 mysqldump: READ_COMMITTED

380 mysqldump: READ_COMMITTED i.e. not consistent
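A typical invocation against one SQL node (host, user and database names are placeholders); remember the dump is only READ_COMMITTED, so concurrent writes can make it internally inconsistent:

$ mysqldump -h sqlnode1 -u root -p --databases mydb > mydb.sql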

381 MySQL Cluster Native Backup

382 MySQL Cluster Native Backup non-blocking parallel consistent

383 ndb_mgm> START BACKUP

384 Each data node participates in BACKUP

385 Each node performs backup for its primary fragments

386 Backups stored in ndb_<id>_fs/BACKUP/

387 Backup API

388 Backup API updates updates

389 Backup API updates updates Data Log Data Log

390 Backup API updates updates Data Log Data Log

391 Backup API updates updates Data Log Data Log

392 Backup API updates updates Data Log Data Log

393 Backup API updates updates Data Log Data Log

394 Backup API updates updates Data Log Data Log

395 Backup API updates updates Data Log Data Log

396 Backup API updates updates Data Log Data Log

397 Backup API updates updates Data Log Data Log

398 Backup API updates updates Data Log Data Log

399 Backup API updates updates Data Log Data Log

400 Backup API updates updates Data Log Data Log

401 Backup API updates updates Control Control Data Log Data Log

402 RESTORE

403 ndb_restore
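A sketch of restoring backup id 1 (connect string, node ids and path are placeholders): metadata is restored once, data once per data node:

$ ndb_restore -c 192.168.0.10:1186 -n 2 -b 1 -m -r --backup_path=/backup/BACKUP/BACKUP-1
$ ndb_restore -c 192.168.0.10:1186 -n 3 -b 1 -r --backup_path=/backup/BACKUP/BACKUP-1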

404 Restore Control Control Data Log Data Log

405 Restore Filesystem Control Control Data Log Data Log

406 Restore Filesystem Filesystem Control Control Data Log Data Log

407 Restore Control Control Data Log Data Log

408 Restore - Metadata Control Control Data Log Data Log

409 Restore - Data Control Control Data Log Data Log

410 Restore - Data Control Control Data Log Data Log

411 Restore - Data Control Control Data Log Data Log

412 Restore – Data (Log) Control Control Data Log Data Log

413 Restore Control Control Data Log Data Log

414 Restore API updates updates Control Control Data Log Data Log

415 Some Configuration Parameters

416 Several different categories

417 Why specify resource limits?

418 We statically allocate memory on startup for some resources

419 Deterministic behavior for resource allocation at run time

420 1. Memory

421 DataMemory

422 Limits amount of data that can be stored in the Cluster
DataMemory Limits amount of data that can be stored in the Cluster

423 DataMemory Per Node

424 Allocated to tables in Pages
DataMemory Allocated to tables in Pages

425 4 node, 2 replica DataMemory=100MB

426 4 node, 2 replica DataMemory=100MB
400MB total

427 4 node, 2 replica DataMemory=100MB
200MB of Data (2 copies)

428 Memory used by hash indexes
IndexMemory Memory used by hash indexes

429 4 node, 2 replica IndexMemory=10MB
20MB of Indexes (2 copies)
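Those worked examples correspond to two per-node lines in config.ini:

[ndbd default]
DataMemory= 100M    # per data node: 4 nodes = 400MB total, 200MB of data with 2 replicas
IndexMemory= 10M    # per data node: 40MB total, 20MB of hash indexes with 2 replicas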

430 StringMemory

431 Memory used for table names, column names, FRM files etc
StringMemory Memory used for table names, column names, FRM files etc

432 (the default value of 5, interpreted as a percentage of the theoretical maximum, is likely fine up to ~1000 tables)
StringMemory (the default value of 5, interpreted as a percentage of the theoretical maximum, is likely fine up to ~1000 tables)

433 2. Transaction Parameters

434 MaxNoOfConcurrentTransactions
Maximum number of ongoing transactions

435 MaxNoOfConcurrentOperations
Maximum number of uncommitted changed rows (divided by the number of data nodes)

436 3. Scans and Buffering

437 MaxNoOfConcurrentScans
Maximum number of parallel scans (for each data node)

438 BatchSizePerLocalScan
Linked with ScanBatchSize – how many rows we batch for scans

439 4. Logging and Checkpointing

440 NoOfFragmentLogFiles

441 NoOfFragmentLogFiles
Sets number of REDO log files for each node

442 NoOfFragmentLogFiles
Each transaction written to REDO Log

443 NoOfFragmentLogFiles
REDO Log used in System Restart

444 NoOfFragmentLogFiles
REDO Log record exists for 2 local checkpoints

445 NoOfFragmentLogFiles
If no room, transactions aborted with: 410 Out of log file space temporarily

446 NoOfFragmentLogFiles
Allocated in units of 64MB (changed with FragmentLogFileSize)

447 NoOfFragmentLogFiles
Default is 8; 8x64MB = 512MB

448 NoOfFragmentLogFiles
Update heavy systems need large values... even up to 300

449 NoOfFragmentLogFiles
300 x 64MB = 19.2GB

450 NoOfFragmentLogFiles
Can be changed on a running cluster... rolling --initial restart

451 5. Metadata Objects

452 MaxNoOfAttributes

453 Maximum number of columns
MaxNoOfAttributes Maximum number of columns for all tables

454 MaxNoOfTables

455 Maximum number of tables
MaxNoOfTables Maximum number of tables

456 MaxNoOfOrderedIndexes

457 MaxNoOfOrderedIndexes
Maximum number of ordered indexes

458 MaxNoOfUniqueHashIndexes

459 MaxNoOfUniqueHashIndexes
Maximum number of unique hash indexes

460 6. Behavior

461 LockPagesInMainMemory

462 LockPagesInMainMemory
Prevents memory allocated by ndbd from being swapped out by the Operating System

463 LockPagesInMainMemory
(This is a good idea)

464 Diskless

465 Nothing written to disk
Diskless Nothing written to disk

466 Diskless No checkpointing

467 If enabled, neither records nor tables survive a cluster crash
Diskless If enabled, neither records nor tables survive a cluster crash

468 ...but requires much less (zero) disk space and bandwidth.
Diskless ...but requires much less (zero) disk space and bandwidth.

469 7. Timeouts, Intervals, Disk Paging

470 TimeBetweenWatchDogCheck

471 TimeBetweenWatchDogCheck
Remember the Angel process from before?

472 TimeBetweenWatchDogCheck
Every TimeBetweenWatchDogCheck milliseconds, check that the main thread isn't stuck

473 StartPartialTimeout

474 Normally, we wait for all data nodes before starting the cluster
StartPartialTimeout Normally, we wait for all data nodes before starting the cluster

475 StartPartialTimeout After StartPartialTimeout milliseconds (30s), we'll perform a partial start

476 0 means always wait for all the data nodes
StartPartialTimeout 0 means always wait for all the data nodes

477 StartPartitionedTimeout

478 StartPartitionedTimeout
If after StartPartialTimeout the cluster could be in a partitioned state, we wait an additional StartPartitionedTimeout milliseconds

479 HeartbeatIntervalDbDb

480 HeartbeatIntervalDbDb
Every HeartbeatIntervalDbDb milliseconds, heartbeats are sent between Data nodes

481 HeartbeatIntervalDbDb
Maximum time to discover node failure is 4 times HeartbeatIntervalDbDb

482 HeartbeatIntervalDbApi

483 HeartbeatIntervalDbApi
Each Data node sends heartbeats to each API node connected to it
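Both intervals are set in config.ini; 1500 milliseconds is the documented default for each:

[ndbd default]
HeartbeatIntervalDbDb= 1500    # data node to data node
HeartbeatIntervalDbApi= 1500   # data node to API/SQL node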

484 TimeBetweenLocalCheckpoints
Wins the prize for the strangest units for a configuration parameter

485 TimeBetweenLocalCheckpoints
Not a time period

486 TimeBetweenLocalCheckpoints
Amount of updates before starting a local checkpoint

487 TimeBetweenLocalCheckpoints
...but not a value in bytes

488 TimeBetweenLocalCheckpoints
base-2 logarithm of the number of 4 byte words

489 TimeBetweenLocalCheckpoints
base-2 logarithm of the number of 4 byte words (sorry, not joking)

490 TimeBetweenLocalCheckpoints
Default value is 20: 4 x 2^20 = 4MB

491 TimeBetweenLocalCheckpoints
Value of 21: 4 x 2^21 = 8MB

492 TimeBetweenLocalCheckpoints
Value of 22: 4 x 2^22 = 16MB (and so on...)

493 TimeBetweenLocalCheckpoints
Maximum value of 31: 4 x 2^31 = 8GB

494 TimeBetweenLocalCheckpoints
Value of 6 or less: constant local checkpoints

495 TimeBetweenLocalCheckpoints
Designed to prevent checkpointing on mostly idle clusters

496 TimeBetweenGlobalCheckpoints

497 TimeBetweenGlobalCheckpoints
A COMMITted transaction is in main memory of all replicas

498 TimeBetweenGlobalCheckpoints
A COMMITted transaction is not immediately flushed to disk

499 TimeBetweenGlobalCheckpoints
A global checkpoint is where a set of COMMITted transactions is flushed to disk

500 TimeBetweenGlobalCheckpoints
This is where we recover to after a System Restart

501 TimeBetweenGlobalCheckpoints
Default is every 2000 milliseconds

502 Checkpointing

503 COMMIT = txn survives node failure

504 COMMIT != Disk Persistence

505 The D in ACID is still covered

506 Durable to machine failure

507 Durable to disk failure

508 In event of cluster failure, want to be able to restore a consistent image of the database

509 We checkpoint to disk (except when in diskless mode)

510 Can't lock the database while we write a checkpoint

511 Checkpoint in background, while transactions continue

512 Write image of database (LCP) and REDO log (GCP)

513 Take a Local Check Point of the database, apply REDO from that point to Global Check Point

514 Space Usage

515 Fixed Size Rows (prior to 5.1) Variable Sized Columns (5.1)
Fixed Size Rows (prior to 5.1) Variable Sized Columns (5.1, with 4-byte alignment)

516 BLOBs and TEXT

517 BLOBs and TEXT 256 bytes in the row with remainder in 2000 byte chunks stored in separate table

518 Indexed Columns must be in main memory

519 Non-Indexed columns can be on disk

520 Disk Columns are fixed size

521 Disk Columns are fixed size VARCHAR(11) uses 12 bytes on disk

522 Columns are 4-byte aligned

523 Calculate storage requirements

524 Know your dataset

525 Use ndb_size.pl (examines existing database, creates report)

526 ndb_size.pl output
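A hedged example invocation (option spelling varies between versions of the script, so check its --help):

$ ndb_size.pl --database=mydb --hostname=localhost --user=root --password=secret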

527 Variable Sized Rows

528 4.1 and 5.0 1 2 hello int int VARCHAR or just saying hello

529 Wasted Space 1 2 hello Wasted Space int int VARCHAR
This means you could have a lot of wasted space on a lot of rows Wasted Space

530 5.1 1 2 hello int int VARCHAR However, in 5.1, we now have variable sized rows. This means that we don't waste space on fields that aren't full.

531 The Saving: this can save a lot of space when you have any reasonable number of rows

532 The Saving We just use the space needed by each particular row. Here we only have a few longer rows, saving us a lot of memory.

533 On line Add/Drop Index

534 ADD INDEX (4.1, 5.0) t1 Index Index Rows
In 4.1 and 5.0, to add an index to a table

535 ADD INDEX (4.1, 5.0) t1 temp table Index Index Index Index Index Rows
We first create a temporary table with a schema of what we want t1 to look like (with the new index)

536 ADD INDEX (4.1, 5.0) t1 temp table Index Index Index Index Index Rows
and then copy the data in t1 over to the temporary table, building all the indexes as we go. We keep a TABLE LOCK while we do this

537 ADD INDEX (4.1, 5.0) t1 temp table Index Index Index Index Index Rows

538 ADD INDEX (4.1, 5.0) t1 temp table Index Index Index Index Index Rows

539 ADD INDEX (4.1, 5.0) t1 temp table Index Index Index Index Index Rows

540 ADD INDEX (4.1, 5.0) t1 temp table Index Index Index Index Index Rows
we then delete t1

541 ADD INDEX (4.1, 5.0) temp table Index Index Index Rows
and rename the temporary table

542 ADD INDEX (4.1, 5.0) t1 Index Index Index Rows
so we now have t1 with our new index

543 ADD INDEX in 5.1 t1 Index Index Rows In 5.1, we can do a lot better

544 ADD INDEX in 5.1 t1 Index Index Index Rows
We build a new index as an online operation – avoiding the copy.

545 ADD INDEX in 5.1 t1 Index Index Index Rows
so the only thing we have to build is one index, not copying all the data and rebuilding all the other indexes.

546 ADD INDEX in 5.1 t1 Index Index Index Rows

547 ADD INDEX in 5.1 t1 Index Index Index Rows

548 ADD INDEX in 5.1 t1 Index Index Index Rows

549 DROP INDEX in 5.1 t1 Index Index Index Rows
Delete is the same, we only drop the index we don't want

550 DROP INDEX in 5.1 t1 Index Index Rows

551 How much faster?

552 Online Add Index Before (copy the table):
mysql> create index b on t1(b); Query OK, 1356 rows affected (2.20 sec) Records: Duplicates: 0 Warnings: 0 mysql> drop index b on t1; Query OK, 1356 rows affected (2.03 sec)

553 Online Add Index Before (copy the table):
mysql> create index b on t1(b); Query OK, 1356 rows affected (2.20 sec) Records: Duplicates: 0 Warnings: 0 mysql> drop index b on t1; Query OK, 1356 rows affected (2.03 sec) Now (just add/drop an index): Query OK, 0 rows affected (0.58 sec) Records: 0 Duplicates: 0 Warnings: 0 Query OK, 0 rows affected (0.46 sec)

554 User Defined Partitioning

555 Since the dawn of time...

556

557 pk

558 pk

559 pk Nodegroup 0 Nodegroup 1

560 HASH(pk) pk

561 HASH(pk) pk

562 HASH(pk) pk

563 Perception Reality HASH(pk) pk

564 pk

565 pk Two Partitions

566 How the default looks (in 5.1 SHOW CREATE TABLE)
CREATE TABLE account(number int unsigned, location varchar, amount int) PRIMARY KEY (number) [PARTITION BY KEY ()] [(PARTITION P0 NODEGROUP 0, PARTITION P1 NODEGROUP 1, …)] ENGINE=NDBCLUSTER;

567 How the default looks (in 5.1 SHOW CREATE TABLE)
CREATE TABLE account(number int unsigned, location varchar, amount int) PRIMARY KEY (number) [PARTITION BY KEY ()] [(PARTITION P0 NODEGROUP 0, PARTITION P1 NODEGROUP 1, …)] ENGINE=NDBCLUSTER;

568 How the default looks (in 5.1 SHOW CREATE TABLE)
CREATE TABLE account(number int unsigned, location varchar, amount int) PRIMARY KEY (number) [PARTITION BY KEY ()] [(PARTITION P0 NODEGROUP 0, PARTITION P1 NODEGROUP 1, …)] ENGINE=NDBCLUSTER;

569 How the default looks (in 5.1 SHOW CREATE TABLE)
CREATE TABLE account(number int unsigned, location varchar, amount int) PRIMARY KEY (number) [PARTITION BY KEY ()] [(PARTITION P0 NODEGROUP 0, PARTITION P1 NODEGROUP 1, …)] ENGINE=NDBCLUSTER;

570 Now, In 5.1

571 User Defined Partitioning

572 By Key

573 Partition by Key CREATE TABLE account(number int unsigned, location varchar, amount int) PRIMARY KEY (number) [PARTITION BY KEY ()] [(PARTITION P0 NODEGROUP 0, PARTITION P1 NODEGROUP 1, …)] ENGINE=NDBCLUSTER;
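Spelled out as runnable SQL (the VARCHAR length is an assumption; the bracketed clauses above are what NDB applies by default anyway):

CREATE TABLE account (
  number INT UNSIGNED NOT NULL,
  location VARCHAR(30),
  amount INT,
  PRIMARY KEY (number)
) ENGINE=NDBCLUSTER
  PARTITION BY KEY (number);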

574 MySQL Cluster Replication

575 Not Internal Mirroring between nodes

576 Replication from one Cluster to Another Cluster

577 Why?

578 the usual reasons

579 Why replicate between Clusters?
Geographical Redundancy

580 Why replicate between Clusters?
Geographical Redundancy Split the processing load

581 Why replicate between Clusters?
Geographical Redundancy Split the processing load e.g. for monthly reports

582 Quick Overview of MySQL Replication

583 MySQL Replication

584 MySQL Replication INSERT ...

585 MySQL Replication INSERT ... INSERT...

586 MySQL Replication INSERT ... INSERT...

587 MySQL Replication INSERT ... INSERT... INSERT...

588 MySQL Replication INSERT ... INSERT ... INSERT... INSERT...

589 MySQL Replication INSERT ... INSERT ... INSERT... INSERT... INSERT...

590 MySQL Replication INSERT ... UPDATE ... INSERT ... INSERT... UPDATE...

591 MySQL Replication INSERT ... UPDATE ... INSERT ... DELETE ...

592 MySQL Replication Master

593 MySQL Replication Slave Master Slave Slave

594 MySQL Replication Slave Master Slave Slave

595 MySQL Replication Slave Master Slave Slave

596 MySQL Replication Slave Master Slave Slave

597 MySQL Replication Slave Master Slave Slave

598 Back at Cluster

599 mysqld mysqld mysqld NDB API NDB API

600 mysqld mysqld mysqld UPDATE UPDATE UPDATE NDB API NDB API

601 mysqld mysqld mysqld NDB API NDB API update() update()

602 mysqld mysqld mysqld NDB API NDB API update() update()

603 UPDATE DELETE INSERT update() update() mysqld mysqld mysqld NDB API

604 UPDATE DELETE INSERT update() update() mysqld mysqld mysqld NDB API

605 UPDATE DELETE INSERT mysqld mysqld mysqld

606 INSERT UPDATE DELETE INSERT UPDATE DELETE INSERT UPDATE DELETE SLAVE

607 ORDER? SLAVE INSERT UPDATE DELETE INSERT UPDATE DELETE INSERT UPDATE

608 Serialization is in the storage nodes

609 NDB Injector Thread A thread inside the MySQL Server
Subscribes to events in NDB (the "row was committed" event) and injects the rows into the binlog, producing a single, canonical binlog of your cluster. Not just one MySQL server: it contains EVERYTHING done by ALL NDB API programs (including mysqld) connected to the cluster

610 UPDATE DELETE INSERT update() update() mysqld mysqld mysqld NDB API

611 A Closer Look...

612 MySQL Replication between Clusters
Application Application Application Application MySQL Server MySQL Server I/O thread Replication Master Apply thread Slave NdbCluster Handler NdbCluster Handler Binlog Relay Binlog Binlog NDB Kernel (Data nodes) ndbd NDB Kernel (Data nodes) ndbd

613 Who spotted the single point of failure?
One thing... Who spotted the single point of failure?

614 Redundant Replication Channels
MySQL Server MySQL Server Master Master I/O thread Apply thread MySQL Server Slave NdbCluster Handler NdbCluster Handler NdbCluster Handler Binlog Binlog Relay Binlog Binlog NDB Kernel (Data nodes) ndbd NDB Kernel (Data nodes) ndbd NdbCluster Handler Master MySQL Server Binlog Slave Relay Binlog Binlog Apply thread I/O thread NdbCluster Handler MySQL Server Replication

615 Redundant Replication Channels
MySQL Server MySQL Server Master Master I/O thread Apply thread MySQL Server Slave NdbCluster Handler NdbCluster Handler NdbCluster Handler Binlog Binlog Relay Binlog Binlog NDB Kernel (Data nodes) ndbd NDB Kernel (Data nodes) ndbd NdbCluster Handler Master MySQL Server Binlog Relay Binlog Binlog NdbCluster Handler Replication MySQL Server I/O thread Apply thread Slave

616 Redundant Replication Channels
MySQL Server MySQL Server MySQL Server MySQL Server Master Master Master Master I/O thread Apply thread MySQL Server Slave NdbCluster Handler NdbCluster Handler NdbCluster Handler NdbCluster Handler NdbCluster Handler Binlog Binlog Binlog Binlog Relay Binlog Binlog NDB Kernel (Data nodes) ndbd NDB Kernel (Data nodes) ndbd NdbCluster Handler Master MySQL Server Binlog Slave Relay Binlog Binlog Apply thread I/O thread NdbCluster Handler MySQL Server Replication

617 Redundant Replication Channels
MySQL Server MySQL Server Master Master I/O thread Apply thread MySQL Server Slave NdbCluster Handler NdbCluster Handler NdbCluster Handler Binlog Binlog Relay Binlog Binlog NDB Kernel (Data nodes) ndbd NDB Kernel (Data nodes) ndbd NdbCluster Handler Master MySQL Server Binlog Relay Binlog Binlog NdbCluster Handler Replication MySQL Server I/O thread Apply thread Slave

618 How do I make fail over happen?

619 But first...

620 Epoch A point of synchronization in the cluster
Everybody agrees on what transactions are disk persistent. In case of system crash, this is where we'll recover to.

621 Okay, but fail over?

622 Currently manual, but only four simple steps

623 STEP 1 Find out where the Slave is up to
In the binary log produced by the injector, each Global Check Point (epoch) is a transaction. Where we are up to is recorded on the slave: the `mysql` database tracks it in the mysql.ndb_apply_status table, which has two columns, server_id and epoch (both integers), and is ENGINE=NDB so it is available everywhere! So, on the slave we run: mysqlS> SELECT @latest:=MAX(epoch) FROM mysql.ndb_apply_status; (possibly with a WHERE clause for server_id)

624 STEP 2 Find the binlog position for this epoch
The mysql.ndb_binlog_index table will help us here. It maps binlog position to GCI and tells us the number of INSERTs, UPDATEs, DELETEs and SCHEMAOPS per GCI. It is MyISAM and is per-master. So, on the standby master we run (using the epoch found on the slave): mysqlM> SELECT @file:=SUBSTRING_INDEX(File, '/', -1), @pos:=Position FROM mysql.ndb_binlog_index WHERE epoch > @latest ORDER BY epoch ASC LIMIT 1;

625 STEP 3 Synchronize the second channel i.e. change the master
Using the file and position from the last query, on the slave: mysqlS> CHANGE MASTER TO MASTER_LOG_FILE='<file>', MASTER_LOG_POS=<pos>;

626 STEP 4 mysqlS> START SLAVE; No, really, that's it.
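Pulling the four steps together (note that CHANGE MASTER TO does not expand user variables, so the file and position from step 2 must be substituted as literals, by hand or from a script; the standby master host name is a placeholder):

-- Step 1, on the slave:
SELECT MAX(epoch) FROM mysql.ndb_apply_status;
-- Step 2, on the standby master, with that epoch value:
SELECT SUBSTRING_INDEX(File, '/', -1) AS log_file, Position AS log_pos
FROM mysql.ndb_binlog_index
WHERE epoch > <latest_epoch>
ORDER BY epoch ASC LIMIT 1;
-- Steps 3 and 4, back on the slave:
CHANGE MASTER TO MASTER_HOST='master2', MASTER_LOG_FILE='<log_file>', MASTER_LOG_POS=<log_pos>;
START SLAVE;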

627 Limitations Fail over of replication channels is manual
(can be scripted). Since all updates go through one injector thread, there is a throughput limit; this limit is much less than what you can pump through a good cluster. We are working to overcome this

628 You can now... Have a 99.999% uptime cluster with great performance
and a cool name. Have replication between it and another Cluster (or single server): load balancing, geographical redundancy. Have redundant replication channels between these setups. Redundancy up the wazoo!

629 Disk Data

630 Two Phase Implementation
1. Data on disk (5.1) 2. Indexes on disk (7.1?)

631 A few concepts...

632 Where we store things Table Space Data file

633 Where we store things Table Space Data file Data file Data file

634 Where we store things Table Space Table Space Data file Data file

635 Where we store things Table Space Table Space Log file group Data file
Undo file Undo file

636 Where we store things Table Space Table Space Log file group Data file
Undo file Undo file

637 Files are per node Node 2 Node 1 df1 df1

638 Let's look at the SQL

639 CREATE LOGFILE GROUP CREATE LOGFILE GROUP lg_1 ADD UNDOFILE 'undo1' INITIAL_SIZE 16M UNDO_BUFFER_SIZE 2M ENGINE=NDB; We can add another undo file: ALTER LOGFILE GROUP lg_1 ADD UNDOFILE 'undo2' INITIAL_SIZE 12M ENGINE=NDB; We currently don't auto-extend files; for more space, add files

640 CREATE TABLESPACE CREATE TABLESPACE ts1 ADD DATAFILE 'datafile1' USE LOGFILE GROUP lg_1 INITIAL_SIZE 32M ENGINE=NDB; We can add another datafile too: ALTER TABLESPACE ts1 ADD DATAFILE 'datafile2' INITIAL_SIZE 48M ENGINE=NDB; We currently don't auto-extend; so just add another file

641 CREATE TABLE CREATE TABLE t1 ( pk1 INT NOT NULL PRIMARY KEY, b INT NOT NULL, c INT NOT NULL) TABLESPACE ts1 STORAGE DISK ENGINE=NDB; b and c will be stored on disk; pk1 stays in memory (as it's indexed)

642 I_S.FILES for Data files
what ts it belongs to Extent Size (bytes) Number of extents in file Number of free extents So, Free extents multiplied by extent size = free bytes that can be allocated to tables

643 A useful VIEW CREATE VIEW isf AS SELECT FILE_NAME, (TOTAL_EXTENTS * EXTENT_SIZE) AS 'Total', (FREE_EXTENTS * EXTENT_SIZE) AS 'Free', (((FREE_EXTENTS * EXTENT_SIZE) * 100) / (TOTAL_EXTENTS * EXTENT_SIZE)) AS '% Free' FROM INFORMATION_SCHEMA.FILES WHERE ENGINE = 'ndbcluster' AND FILE_TYPE = 'DATAFILE';

644 I_S.FILES for UNDO files
Free log space. If running out, you may need to add more

645 Optimized NR Traditional NR: copy everything over the wire
PRO: easy to implement correctly. PRO: not too bad for a few gigs of data. CON: very, very bad for disk data; think 2TB going over the wire... ouch! Details on optimized node recovery for NDB in s1108-ronstrom.pdf: recovery from checkpoint, so we don't have to copy everything

646 Thank You

647 Find out More! MySQL Online Documentation
Cluster Forum Cluster Mailing List

