Sarah Sproehnle Cloudera, Inc

Sarah Sproehnle Cloudera, Inc sarah@cloudera.com

Outline  Identifying slow queries  Indexing techniques  Proper schema design  EXPLAIN  Rewriting subqueries  Common mistakes

Identifying slow queries  Use the slow query log! --log-slow-queries --long-query-time --log-queries-not-using-indexes  5.1.21+ allows microsecond granularity  5.1: SET GLOBAL slow_query_log = 1;  Do not use log_output = table  mysqldumpslow  Monitor SHOW PROCESSLIST  mytop, innotop, Enterprise Monitor (query analyzer is great, but adds latency; use selectively)  Ask around

Indexing  Most indexes are b-trees  B-tree is great for: WHERE col = x WHERE col > x WHERE col IS NULL WHERE col LIKE ‘foo%’ ORDER BY col [DESC] LIMIT n  Not useful for: WHERE function(col) = x WHERE col LIKE ‘%foo’ ORDER BY col -- without a LIMIT  Covering indexes Query: SELECT a FROM t WHERE b=‘foo’ ORDER BY c; Index: KEY(b, c, a);

Indexing continued  Creating the index can be painful (consider InnoDB plugin or replication switchover)  The order of columns in a composite index matters! KEY (a, b) will be used for: WHERE a = x WHERE a = x AND b = x WHERE a = x ORDER BY b SELECT b.. WHERE a = x but not… WHERE b = x  Hash indexes for Memory, NDB Cluster and InnoDB (adaptive) Fast and compact, but only useful for equality lookups  For InnoDB, do not append the PK to a secondary index  Watch out for duplicate indexes (mk-duplicate-key-checker)

Schema design  Normalize or denormalize?  Choose NOT NULL if possible  Choose good primary keys (keep them small!) They are often used as foreign keys; InnoDB uses the primary key as a row id  Keep your data small. Use the right data types: SMALLINT vs. INT vs. BIGINT CHAR vs. VARCHAR TIMESTAMP (4 bytes) vs. DATETIME (8 bytes) Store IP addresses as INT UNSIGNED (inet_aton) Use PROCEDURE ANALYSE()  Index a prefix of a string column: KEY(col(5))  Use RANGE partitioning, but careful which functions you use  Use replication to split read/writes

EXPLAIN  EXPLAIN SELECT… unfortunately doesn’t work on UPDATE/DELETE  Important that you use EXPLAIN on your true data set  Useful for verifying: Is an appropriate index being used? What order are the tables joined in? This is critical given MySQL’s nested join algorithm. Is a temporary table required? (“Using temporary”) Covering index or are seeks to the row(s) needed? (“Using index”)  Example: a query that looks innocuous SELECT.. WHERE idx IN (42, 101, 1024); EXPLAIN can reveal interesting results!

Rewriting subqueries  mysql> EXPLAIN SELECT name FROM Country WHERE code IN(SELECT countrycode FROM City WHERE population>9000000)\G *************************** 1. row *************************** select_type: PRIMARY table: Country type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 239 Extra: Using where *************************** 2. row ********* select_type: DEPENDENT SUBQUERY This query should not be correlated! table: City type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 4079 Extra: Using where

Rewriting subqueries  Instead of “WHERE col IN(SELECT..)”, write a JOIN SELECT DISTINCT Country.name FROM Country JOIN City ON code = countrycode WHERE City.population > 9000000;  In general: SELECT col FROM t1 WHERE id_col IN (SELECT id_col2 FROM t2 WHERE condition); Can be rewritten as follows: SELECT DISTINCT col FROM t1, t2 WHERE t1.id_col = t2.id_col AND condition; Fixed in 5.4!

Common mistakes/General advice  Need a random row? Do not use ORDER BY rand() LIMIT 1! Consider generating a random number and doing a lookup by auto_inc column.  Avoid hints (STRAIGHT_JOIN, FORCE INDEX)  Use hints when necessary, especially pre 5.1  Don’t set sort_buffer_size extremely large  Query cache: 256MB max Careful when benchmarking (use SQL_NO_CACHE)  memcached  Move long running queries (e.g., reporting queries) to a slave

More general advice  Consider other storage engines  Materialize data into a Memory or MyISAM table. Keep it fresh with triggers or an event. Use - -init-file for Memory tables  LIMIT for paging – not always suitable App servers generally do not scale with large result sets, but evaluating LIMIT n,m over and over is painful  For batch processing, consider Hadoop: Can handle very large datasets Sqoop moves data from MySQL to Hadoop (and back) Built in reliability and scalability Don’t think in MapReduce? Use Hive

Thanks for attending! Sarah Sproehnle Cloudera, Inc sarah@cloudera.com

Sarah Sproehnle Cloudera, Inc

Similar presentations

Presentation on theme: "Sarah Sproehnle Cloudera, Inc"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Sarah Sproehnle Cloudera, Inc

Similar presentations

Presentation on theme: "Sarah Sproehnle Cloudera, Inc"— Presentation transcript:

Similar presentations

About project

Feedback