Mark Callaghan mentioned a simple issue on DDL operation. I found the back porting interesting, but not always a possible option. I have nothing to comment on top of what was already stated there, but I’ll mention it because it is interesting to read.
Customers often ask me (and I believe all MySQL DBAs can relate): “Why did that query remain in a KILL state and does not go away?”
As we know, KILLed queries remain in MySQL until a clean up takes place, given that the action is not done synchronously. I found this article to be a good description of such events and a good reference to pass by to customers when they ask about the KILL state.
The first article is about MariaDB Hash Join, a test done in Percona. I am not fully convinced about the test results, specifically about the indexes used. But what is important is that we have an initial review of the feature, and we can then evaluate how and if it makes sense.
“BKA improves the query time by a huge margin for IO bound workload but does not make much of a difference to in-memory workload. Also BKA relies on both the join_buffer_size and the read_rnd_buffer_size/mrr_buffer_size and both of these buffers should be increased appropriately for the best possible performance gain. This is not entirely visible in the manual either for MariaDB or MySQL, but you need to appropriately increase read_rnd_buffer_size/mrr_buffer_size because these have an impact on MRR performance, and BKA uses the MRR interface, so these buffers indirectly impact BKA performance. I did not find much of a performance improvement from using Hash Join in MariaDB 5.5 or from Key-ordered Scan for TPCH query #3, in fact disabling both of these provided the best result for MariaDB 5.5 for in-memory workload.”
This article is another astonishing announcement about MySQL Cluster performance, but you must be careful when you read it.
Here is an excerpt: “…most recently delivering 1.2 Billion UPDATE operations per Minute across a cluster of 30 x commodity Intel Xeon E5-based servers.”
Cluster is known for its high speed and the scaling in write, so I am not impressed by the numbers, but let’s consider a few things…
NDBCluster is already a product that requires quite an in-depth knowledge if you want to use it correctly. It is also not the common MySQL solution, but it is “the” solution only for very special cases. NDBCLuster is unusable as a simple replacement of another storage engine; it requires good analysis and review, needs to be correctly placed and configured, and in those cases, is mainly associated to MySQL.
This means that MySQL works as the data interface against the applications. There are few limited cases where NDBCLuster is directly connected to an application using C++ API. In my many years of work with NDBCLuster, I have seen that only 10% of customers use C++.
At the same time, the benchmark reporting the numbers are normally executed against the NDBCluster directly (C++ API) and do not involve the MySQL layer. Those working with NDBCLuster know very well that skipping the MySQL layer makes it faster, a lot faster. So in the end, the benchmarks are true and good, but hide a small secret: The results are not applicable to MySQL, only to NDBCluster.
The difference is significant given that in MySQL you are going to use the SQL to query the cluster, and not using the C++ interface. Also, the usability and the effort of developing a good interrogation layer is quite significant, compared to SQL.
What I am saying is that the customer could not fully understand what those numbers mean and how they got there. So, it would be nice to have comparative results every time (MySQL -> NDBCLuster and NDBCluster + C++ API). All this would bring better understanding and will define correct expectations from the customer’s side.
That being said, I think Oracle is doing a great job on MySQL NDBCLuster. Awesome!
Leave a Reply