Posts by Danil Zburivsky
Building a secure Hadoop cluster requires protecting a number of services that make up the Hadoop infrastructure. If you are using the CDH distribution, then Cloudera Manager (CM) is one of the components that needs to be secured. There is a good step-by-step guide in the CM documentation, and it’s easy to follow for one server, but what about when you have hundreds of them? There are different approaches to the problem of managing server configuration at scale, but I’d like to focus on Ansible, which is a neat framework for parallel command execution and complex rollouts.
The HDFS authentication model changed in recent releases, but the documentation is stale, which can lead people to think HDFS uses very primitive authentication.
I was presented with test results which showed that an IN query was about 100 times faster than an OR query. Where the OR query took minutes to run, the IN query took seconds! OK, I said to myself, it is time to start digging. Here are my findings.
I spent last week at Collaborate 2012 in Las Vegas and it was a really great experience in many ways. I am a MySQL DBA and have been working with MySQL for most of my career, so Collaborate didn’t seem like an obvious choice. It turned out that there are many things I can learn from Oracle professionals and the Oracle community that can be applied in the MySQL world as well. For me, the sign of a good conference is that you come back inspired and full of ideas.
The other day I had to refresh my knowledge of how the InnoDB thread queue works while debugging activity spikes on one of a customer’s production systems. While I had a general idea about the InnoDB kernel and queue, thread concurrency, and queue join delays, I didn’t have a complete model of how InnoDB concurrency control works. So I started with the manual…
MySQL Replication is a powerful tool and it’s hard to find a production system not using it. On the other hand, debugging replication issues can be very hard and time-consuming, especially if your replication setup is not straightforward and you are using filtering of some kind. Let’s look at an issue I had..
Using the algorithm described in “Relational Database Index Design and the Optimizers” by Tapio Lahdenmaki and Mike Leach, I quickly came up with two indexes. While the first one looked fine, I was really confused by the second one, which was meant to eliminate the sort. Let me show an example: not a copy of the one from the book, but a test I did with MySQL.
A couple of days ago I was reading the paper “Paxos Made Live – An Engineering Perspective”, written by Google engineers. It is an interesting read about implementing the Paxos algorithm to build a fault-tolerant database. But one paragraph made me think I was reading something very familiar…
Today I spent some time (more than this issue was worth, actually) on a client’s system trying to find out why access to a table was failing. The error message suggested something had gone very wrong with the .frm file, and I had already started thinking about restoring the table from backup when I noticed that accessing any InnoDB table produced the same error. A quick check of the error log showed that when the MySQL server was restarted some time ago, InnoDB failed to initialize due to a memory issue.
Quite often we need to perform a so-called “MySQL instance audit”. This common DBA procedure should give you a general view of the MySQL environment. You may be interested in a basic understanding of what kind of operations MySQL performs, how much memory it uses, or how well it is doing from a performance point of view. There is no easy out-of-the-box way to do such an audit on a MySQL server. Fortunately, there are several tools that make this process easier. Among the most popular are mysqlreport and MySQLTuner. In this post I’d like to give a brief overview of MySQLTuner.