Emre Sokullu
Aug 17

High Performance MySQL Scaling notes / tips

Mon 17 Aug 2009 04:11:15 | 2 comments
  • MySQL Cluster is not the solution; consider it as a different database and don't forget that it's not very efficient, so ends up costing you much
  • Use Memcached
  • Your memcache machine should be close to your application/web servers, not the database
  • Sharding is hard. There are things you can do before sharding. Replicate as much as you need, so that you can split read and write operations. One machine dedicated to write only.
    • Then the most logical thing to do for sharding is horizontal sharding; which means host different tables in different machines. But this is needed only in extreme cases, as you grow like crazy, like grou.ps
    • Before that, you can maximize your write capabilities, you can get a beefy server with 128MB of ram, 4x Hexacore processors. Note that you'll need to use xtradb on such a configuration because regular mysql setup does not scale well to more than 4 cores, instabilities do occur.
    • And even before that, try optimizing your code. 
  • You can optimize your queries by enabling logging of the slow queries
  • For a 16GB of ram, 400 is the maximum number of connections you need to have on the machine.
  • If SHOW PROCESSLIST; displays too many Sleeping connections, no problem, don't worry about them.
  • You don't recover from table locks with myisam, auto_increment still requires table lock
  • Use maatkit tools. mk_audit recommended as a good start
  • If you are replicating, you'll need mk_slave_restart at some point, but don't forget that, it's dangerous, can create inconsistencies
  • use mk-query-digest to collect information about the incoming mysql queries; then you can optimize them using the EXPLAIN command - tcpdump is a microsecond level alternative for the very same job
  • take off unused databases and tables
  • for a fast updating environment, cronjob mk-query-digest and mk-duplicate-key-checker
  • Consider a dual master setup; use mmm (1.x branch is recommended as of this writing) or flipper - good 
  • hot backup strategies: lvm_backup, mk-parallel-dump, innobackup,
  • do not use set where you can use enum; set is for options that can be chosed more than once
  • vmstat 5 5 to see the status of your disk
  • even though you don't use myisam, the minimum key_buffer size should be 32MB
  • InfoBright is good for analytics tables
  • XFS is good as the underlying filesystem of innodb systems.
If you have other tips, please add in the comments; this list will be updated with your name.

Other Sources:
http://20bits.com/articles/10-tips-for-optimizing-mysql-queries-that-dont-suck/
http://provenscaling.com/blog/2008/10/09/introducing-flipper-for-managing-mysql-master-pairs/

http://provenscaling.com/software/flipper/docs/html/
http://www.howtoforge.com/mysql_master_master_replication
http://www.howtoforge.org/mysql_database_replication

Comments

Hi, Some comments in line; I mostly have worked on Oracle but these concepts apply to RDBMS in general. * Sharding is hard. There are things you can do before sharding. Replicate as much as you need, so that you can split read and write operations. One machine dedicated to write only. > Partitioning is also a solution to contending on I/O because you are writing to the same blocks you are reading from when the PK in several rows are stored in that block. Partitioning can help you spread your load across many disks, which will improve throughput by reducing contention, but requires deep understanding of your data model. o Before that, you can maximize your write capabilities, you can get a beefy server with 128MB of ram, 4x Hexacore processors. Note that you'll need to use xtradb on such a configuration because regular mysql setup does not scale well to more than 4 cores, instabilities do occur. > I think you mean 128GB. But remember, throwing hardware at a problem you don't understand just makes your problem more expensive. That's why I think the most important thing you can do is the next two of your bullet points: o And even before that, try optimizing your code. * You can optimize your queries by enabling logging of the slow queries > A good rule of thumb is that if you haven't optimized your queries your queries are slow. > If you are going to do performance tuning be sure you have goals you are tuning towards. Don't say "get faster", but say "These select queries need to be under 300 ms because otherwise the user will have a bad experience on this page." Once you have those goals, you can then say "... and I can trade write speed in these queries for the more user-important read speed by adding an index because my users' read:write ratio on this page is 10000:1 and the writes could even be done asynchronously." * For a 16GB of ram, 400 is the maximum number of connections you need to have on the machine. > Why? * hot backup strategies: lvm_backup, mk-parallel-dump, innobackup, > Test recovery. Most small shops don't. * vmstat 5 5 to see the status of your disk > Install sysstat (it may already be installed) and use iostat to see the writes to each block device. The slowest bus that your DB interacts with is most likely your disk, so there's a lot to be gained by using faster disks and more disks. > Others: Trend your performance metrics! Nothing worse then saying "I don't know why it was slow last night at 2am Eastern". Much better to look at the graphs and say "The mysql process was stuck in iowait at 2am as shown on this graph, and our web server usage logs from 2am show us getting lots of requests from Japan at the same time, hitting a page the loads that new report nobody has yet used. We'll need to look at ways to optimize the report to do less I/O." -Adan



or
Connect with Facebook

Powered by

Twitter

Bir Digital Turks Uyesiyim

More about me...

Destekliyoruz...

About My Company