10,000 Tables Can’t Be Wrong: Designing a Highly Scalable MySQL Architecture for Write-intensive Applications by Richard Chart
Chose MySQL for performance and stability, and less important but still there, experience and support. Support is becoming increasingly more and more important.
Starting point: 1 appliance supporting 200 devices
Problem/Goal: Extensible architecture with deep host and app monitoring, over 1000 devices with 100 mgmt points each
Distributed collection over a WAN, with latency and security concerns
Current reality: several times the scale of the original goal
Commercial embedded product, so they actually pay for the embedded MySQL server
Future: The fundamentals are sound: next generation of the product moves up another order of magnitude
ACID not important
Resilient to loss, because gaps in data do not invalidate the rest of the data
Data elements by themselves are valuable, but much more so when relationships are added.
Chose MyISAM because:
- writes were very fast when there was no locking
- data handling simple
- lower license cost than InnoDB
- no need for transactions
- needs to live on disk for weeks or month (MEMORY not good)
- MyISAM cached indexes were faster than MEMORY
- Likely will use MEMORY tables in the future
Scaling w/out lock contention:
dynamically created tables
multi-threaded but no more than 1 thread writing to a table at once
each thread is writing to a different table
each app has a db, each monitor has a table. If the db or table doesn’t exist, it’s created. So tables and databases are created dynamically.
How far does this go? >20,000 tables, >2,200 queries per second, 5 billion rows, 93% writes. The bottleneck is the I/O.
Appliance is RAID10.
As growth happens:
- Query cache invalidations and purges
- Increased lock contention
- sort data set size causing temporary disk tables rather than temporary in-memory tables
- must keep in mind system limits on open files with open_files_limit and table_cache
- Very important to measure DB stats over time so you can predict when the next bottleneck is.
- Many of the SHOW STATUS variables are point-in-time counters, which need to be graphed with 5 minute diffs, etc.
- Some stats must be combined to be useful, for instance table_locks_waited vs. table_locks_immediate.
- Can use MySQL Enterprise Monitor for this.
- monitors can be misleading — he showed an example where high CPU load seemed to correlate with spikes in created_tmp_tables, but they were unrelated — CPU was high because the clients were misconfigured after a software change, and the created_tmp_tables were also new due to the change, but # of rows being sorted in the tmp tables was tiny, so it wasn’t that the tmp tables were causing the CPU.
- Also want to summarize, aggregate and prune the data. Summarize hourly, daily, monthly, etc.
- Be careful, because deletes can eat up MySQL performance
Problems: Reporting tools don’t like dynamic db’s and tables, backups.
MyISAM doesn’t work for ALL tables on the larger sites — about 0.1% of their tables.