Alejandro Álvarez on behalf of the FTS team

1 The FTS Case
Alejandro Álvarez on behalf of the FTS team

2 Introduction about FTS
Implements low-level file transfers for LHCb, ATLAS and CMS, and a few other smaller experiments
Multi-level, fair-share transfer scheduler
Maximizes resource usage & avoids congestion
Multiple protocol support
Support for recall from tape

3 Introduction about FTS
Experiments need to copy a large number of files between sites
They send the list to FTS3
FTS3 decides how many transfers to run (optimizer) and how to share them (scheduler)
FTS3 runs the transfers when suitable
Messages are sent for asynchronous clients and for monitoring

6 MySQL in FTS3
FTS3 uses MySQL to keep the queue and the state of each transfer
When scheduling, it needs to fetch them from there
On changes, it needs to update the DB
For the optimizer and monitoring views, it needs to aggregate
One database used by a few hosts

7 MySQL in FTS3
Performing well against the DB is necessary for a well-performing service
MySQL could be quite stressed (80% CPU usage wasn’t rare)
Architecture changes were (are) considered, but that’s very hard!
Can’t take years to make things better
The DB was a “low” hanging fruit

8 MySQL in FTS3 Some ideas were already in place
Each node only accesses its own, disjoint subset of the tables
Avoids contention

9 Today Architecture is still the same
CPU usage now between 14% and peaks of 50%
Way better! We can do more with the same hardware
What changed?

10 Step 1: Disable backups
Yes, really: the DBoD scheduled backups
We can afford it
Recovering from a 23-hour-old backup is worse than not recovering at all (for us!)
They were damaging us
Symptom: blocked queries and a “FLUSH TABLES WITH READ LOCK;”

11 Step 1: Disable backups
From time to time we would see MySQL (and thus FTS3) deadlocking
A massive query Q1 may be read-locking table T1
mysqldump tries to get a global lock, blocking new updates first, but then mysqldump itself gets blocked by Q1
All updates stay blocked until Q1 is done
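The lock chain on this slide can be sketched as three concurrent sessions. This is an illustration of the sequence described above, not a transcript of the actual FTS3 incident; `t_job` is borrowed from the query example later in the deck:

```sql
-- Session 1: a long-running query (Q1) holds a read lock on a table
SELECT ... FROM t_job ...;       -- still running

-- Session 2: mysqldump (without --single-transaction) requests a global lock
FLUSH TABLES WITH READ LOCK;     -- waits for Session 1 to finish

-- Session 3: any write now queues behind the pending global lock
UPDATE t_job SET ...;            -- blocked until both of the above clear
```

The global lock request is not reentrant-friendly: it blocks all new writers immediately, even while it is itself still waiting on Q1.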

13 Step 1: Disable backups
Don’t do this! Again, we can afford it
Nice to know this can happen
You may be able to get by with single-transaction dumps
Or reconsider the long queries
Or use master-slave replication

14 Step 2: Profile database
Slow queries: archival of old jobs is the most time-consuming part!
Unused indexes: the archive tables… hum
Thanks to the DB people!
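Both findings above can be obtained from MySQL itself. A sketch, assuming MySQL 5.7+ with the bundled `sys` schema, and `fts3` as a placeholder schema name:

```sql
-- Log every statement slower than 1 second (server-side settings)
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;

-- Indexes that have never been used since the server started
SELECT * FROM sys.schema_unused_indexes
 WHERE object_schema = 'fts3';
```

The slow query log is then typically aggregated with a tool such as Percona’s pt-query-digest, which is the style of profile shown in the example slides further down.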

15 Step 3: Reconsider engine type
ARCHIVE is better for data rarely read and never modified
Not indexed
Low disk footprint, fast INSERT
Perfect for the archive tables!
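Converting an existing table is a single ALTER, with one caveat: ARCHIVE supports no secondary indexes (only an optional key on an AUTO_INCREMENT column), so those must go first. Table and index names here are hypothetical:

```sql
-- Drop secondary indexes first: ARCHIVE does not support them
ALTER TABLE t_job_backup DROP INDEX idx_finish_time;  -- hypothetical index

-- Rebuild the table under the ARCHIVE engine (rows are compressed on disk)
ALTER TABLE t_job_backup ENGINE = ARCHIVE;
```

ARCHIVE also refuses UPDATE and DELETE, which is exactly the write-once contract an archive table should have.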

16 Step 4: Low hanging fruit
Drop unused and redundant fields: smaller reads
Reconsider column types
Reconsider index types

17 Step 4: Low hanging fruit
Reconsider column types
Some string fields (e.g. state) could be enums: 1 byte vs ~O(10) bytes
Indexed! Adding a value is cheap; deleting/renaming is expensive
Some string fields could be booleans
And others could be shorter
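As a sketch of the state-to-enum change (column name and state values are illustrative, not the actual FTS3 schema):

```sql
-- ENUM stores 1 byte per value (2 bytes past 255 values) instead of the
-- full string, both in the row and in every index containing the column
ALTER TABLE t_file
  MODIFY file_state ENUM('SUBMITTED','READY','ACTIVE','FINISHED','FAILED')
  NOT NULL;
```

Appending a new value at the end of the ENUM list later is a cheap metadata change; removing or renaming a value forces a full table rebuild, which matches the “adding is cheap, deleting/renaming is expensive” point above.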

18 Step 4: Low hanging fruit
Reconsider index type: BTREE vs HASH
HASH is only supported by the MEMORY engine
Never mind, then

19 Step 5: Slow queries
Look at EXPLAIN <query>: it gives a hint at which indexes to add
Indexes improve SELECT, hurt INSERT/DELETE, and maybe UPDATE
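A minimal reading guide for EXPLAIN output, using the queue query from the later slide:

```sql
EXPLAIN SELECT job FROM t_job WHERE running = 0;
-- type = ALL          -> full table scan: an index candidate
-- type = ref / range  -> an index is already being used
-- key                 -> which index MySQL chose
-- rows                -> how many rows MySQL expects to examine
```

A large `rows` estimate compared to the rows actually returned is the same symptom the pt-query-digest “rows examined vs rows sent” ratio shows in aggregate.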

20 Step 6: Redundant indexes
(a, b, c) covers (a), (a, b) and (a, b, c)
(a, b, c) has the same columns as (b, c, a), but they are not interchangeable: next to (a, b, c), an extra (b, c) may still be needed, while (a, b) never is
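The prefix rule above, as concrete statements (table and index names hypothetical):

```sql
-- (a, b, c) already serves queries filtering on (a) and on (a, b) ...
CREATE INDEX idx_abc ON t (a, b, c);

-- ... so a standalone prefix index is redundant and can be dropped
DROP INDEX idx_ab ON t;          -- was (a, b)

-- but (b, c) is NOT a prefix of (a, b, c): keep it only if queries need it
-- CREATE INDEX idx_bc ON t (b, c);
```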

21 Step 6: Redundant indexes
Very coupled with which queries are run
Queries can be reworded to match an index
Or an index can be added to match the query
More queries using the same index => fewer indexes => good
Harder, because you need to move queries and indexes in lockstep

22 Step 7: Rewrite queries
Multiple nodes may pick the same entry
SELECT ... FOR UPDATE is bad: it locks the record for potentially a long time
    SELECT job FROM t_job WHERE running = 0 FOR UPDATE;
    UPDATE t_job SET running = 1 WHERE job = X;
Rather, use UPDATE ... WHERE + affected rows:
    SELECT job FROM t_job WHERE running = 0;
    UPDATE t_job SET running = 1 WHERE job = X AND running = 0;
    mysql_affected_rows() > 0

23 An example
Retrieving files to recall from tape was very slow
Reading way more than needed
[pt-query-digest profile: rows sent on the order of thousands vs rows examined on the order of billions; the exact figures did not survive transcription]

24 An example
The original query had a DEPENDENT SUBQUERY, which blows up the number of rows read
It degraded with time, ending up O(N²)
To fix it, we had to consider both the query and the index
Many iterations of EXPLAIN and rewrite
Managed to drop the nested query
Turned it into a self-JOIN

25 An example To make queries easier, wrapped the self-join into a view
The 2 PRIMARY + DEPENDENT SUBQUERY plans went away, replaced with three SIMPLE ones
Added an index to make it better
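The actual FTS3 query is not shown on the slides, but the general pattern of replacing a dependent subquery (“latest row per group”) with a self-join wrapped in a view looks like this; `t_file`, `job_id` and `file_id` are assumed names for illustration:

```sql
-- Dependent-subquery form: re-executes the inner query per outer row
--   SELECT * FROM t_file f
--    WHERE f.file_id = (SELECT MAX(f2.file_id) FROM t_file f2
--                        WHERE f2.job_id = f.job_id);

-- Self-join form, wrapped in a view so callers stay simple:
-- keep rows for which no later row with the same job_id exists
CREATE VIEW v_latest_file AS
  SELECT f1.*
    FROM t_file f1
    LEFT JOIN t_file f2
      ON f2.job_id = f1.job_id AND f2.file_id > f1.file_id
   WHERE f2.file_id IS NULL;
```

With a composite index on (job_id, file_id), the join side resolves each probe from the index instead of rescanning the table, which is the per-row cost the DEPENDENT SUBQUERY plan was paying.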

26 An example
Exec time: 395 835 s => 13 490 s
[pt-query-digest profile after the rewrite; the remaining figures did not survive transcription]
No way around knowing your queries, and iterating

27 TODO: UUIDs are terrible keys
36 characters
InnoDB stores the primary key in every secondary index
It is randomly distributed, which is actually bad: scattered writes, fragmentation
See Percona’s blog post on it
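One common mitigation, sketched here with an illustrative schema (not the real FTS3 `t_job`): cluster on a compact sequential key and keep the UUID as a unique lookup column.

```sql
CREATE TABLE t_job (
  id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- compact, append-order PK
  job_id  CHAR(36) NOT NULL,                        -- externally visible UUID
  PRIMARY KEY (id),
  UNIQUE KEY uk_job_id (job_id)
);

-- MySQL 8.0 alternative: store the UUID as BINARY(16), with the time bits
-- swapped to the front so inserts become roughly sequential:
--   UUID_TO_BIN(job_id, 1)  /  BIN_TO_UUID(stored_value, 1)
```

Secondary indexes then carry an 8-byte integer instead of a 36-character string, and inserts land at the end of the clustered index instead of at random pages.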
