This is my fourth week at Pythian and in Canada and I'm starting to get back to my normal life cycle --- my personal things are getting sorted and my working environment is set. Here at Pythian I'm in a team of four people together with Christo, Joe, and Virgil. (I should write another post about beginning at Pythian --- will do one day.)
Yesterday, I asked Christo to show me how he monitors I/O on Linux. I needed to collect statistics on a large Oracle table on a production box, and wanted to keep an eye on the impact. So we grabbed Joe as well and all three of us sat around my PC. While we were talking, Paul was nearby and showed some interest in the topic --- otherwise, why would all three of us be involved? Anyway, Dave and Paul thought that this would be a nice case for a blog post. So here we are... Indeed, while the technique we discuss here is basic, it gives a good overview and is very easy to use.
So let's get focused... We will use the iostat utility. In case you need it, you know where to find more about it --- right, the man pages. We will use the following form of the command:
iostat -x [-d] <interval>
Without -d, each report begins with a CPU utilization summary:
avg-cpu:  %user   %nice    %sys %iowait   %idle
           6.79    0.00    3.79   16.97   72.46
If you have many devices and want to watch only some of them, you can also specify device names on the command line:
iostat -x -d sda 5
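If you want to drive iostat from a script rather than type the command each time, the command form above can be assembled like this. This is only a minimal sketch: the function name `iostat_command` and its parameters are made up for illustration and are not part of any tool.

```python
from typing import List, Sequence


def iostat_command(interval: int,
                   devices: Sequence[str] = (),
                   device_only: bool = True) -> List[str]:
    """Build the argument list for: iostat -x [-d] [device ...] <interval>."""
    if interval <= 0:
        raise ValueError("interval must be a positive number of seconds")
    cmd = ["iostat", "-x"]          # -x asks for the extended statistics
    if device_only:
        cmd.append("-d")            # -d leaves out the avg-cpu summary
    cmd.extend(devices)             # optional device names, e.g. "sda"
    cmd.append(str(interval))       # seconds between reports
    return cmd


print(" ".join(iostat_command(5, ["sda"])))
```

The resulting list can be handed to `subprocess.Popen` if you want to read the reports line by line as they arrive.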
Now let's get to the most interesting part --- what those cryptic extended statistics are. (For readability, I formatted the report below so that the last two lines are in fact a continuation of the first two.)
Device:  rrqm/s  wrqm/s     r/s     w/s  rsec/s  wsec/s   rkB/s
sda        0.00   12.57   10.18    9.78  134.13  178.84   67.07
          wkB/s avgrq-sz avgqu-sz  await   svctm   %util
          89.42    15.68     0.28  14.16    8.88   17.72
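To work with these numbers in a script, it helps to pair each column name with its value. The following is a rough sketch that assumes the header and the device row have already been joined back into one line each; `parse_extended_report` is a hypothetical helper, not something iostat provides.

```python
from typing import Dict, Union


def parse_extended_report(header: str, values: str) -> Dict[str, Union[str, float]]:
    """Pair iostat -x column names with the values for one device."""
    names = header.split()
    fields = values.split()
    if len(names) != len(fields):
        raise ValueError("header and value rows have different lengths")
    # The first column is "Device:" / the device name; the rest are numbers.
    stats: Dict[str, Union[str, float]] = {
        name: float(value) for name, value in zip(names[1:], fields[1:])
    }
    stats["device"] = fields[0]
    return stats


# The sample report from this post, with the wrapped lines rejoined.
HEADER = ("Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s "
          "wkB/s avgrq-sz avgqu-sz await svctm %util")
VALUES = ("sda 0.00 12.57 10.18 9.78 134.13 178.84 67.07 "
          "89.42 15.68 0.28 14.16 8.88 17.72")

sample = parse_extended_report(HEADER, VALUES)
print(sample["device"], sample["await"], sample["svctm"], sample["%util"])
```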
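The two columns to focus on are await and svctm. await is the average time (in milliseconds) for a request to be served, and it includes both the time the request spends in the queue and the time spent servicing it; svctm is the service time alone. iostat has no column for the queue time, so let's call it qutim:
await = qutim + svctm
or
qutim = await - svctm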
Traditionally, it's common to assume that the closer to 100% utilization a device is, the more saturated it is. This might be true when the system device corresponds to a single physical disk. However, with devices representing a LUN of a modern storage box, the story might be completely different. Rather than looking at device utilization, there is another way to estimate how loaded a device is.
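To see why utilization alone can mislead, it helps to look at what the figure is made of. As far as I can tell from the sample report in this post, %util is consistent with the request rate multiplied by the service time, that is, the share of each second during which the device had work to do. A LUN backed by many spindles can serve several requests in parallel, so being "busy" all the time is not the same as being saturated. The sketch below only checks that arithmetic against the sample numbers; the function name is made up for illustration.

```python
def utilization_pct(reads_per_s: float, writes_per_s: float, svctm_ms: float) -> float:
    """Approximate %util as (r/s + w/s) * svctm, expressed as a percentage."""
    busy_ms_per_second = (reads_per_s + writes_per_s) * svctm_ms
    return busy_ms_per_second / 1000.0 * 100.0


# r/s, w/s and svctm taken from the sample report for sda.
util = utilization_pct(10.18, 9.78, 8.88)
print(f"{util:.2f}%")  # the report itself shows 17.72 in the %util column
```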
Look at the non-existent column I mentioned above --- qutim --- the average time a request spends in the queue. If it's insignificant compared to svctm, the IO device is not saturated. When it becomes comparable to svctm and goes above it, requests are queued longer and a major part of the response time is actually time spent waiting in the queue.
The figure in the await column should be as close to that in the svctm column as possible. If await goes much above svctm, watch out! The IO device is probably overloaded.
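The rule of thumb above is easy to turn into a quick check. Here is a minimal sketch, assuming qutim is simply await minus svctm; the 0.25 and 1.0 cut-offs are arbitrary values picked for illustration, not thresholds taken from iostat or from any tuning guide.

```python
def queue_time_ms(await_ms: float, svctm_ms: float) -> float:
    """qutim: the part of await that is not service time."""
    return max(await_ms - svctm_ms, 0.0)


def describe_load(await_ms: float, svctm_ms: float) -> str:
    """Classify a device by how its queue time compares to its service time."""
    if svctm_ms <= 0:
        return "idle"
    ratio = queue_time_ms(await_ms, svctm_ms) / svctm_ms
    if ratio < 0.25:        # hypothetical cut-off: queueing is insignificant
        return "not saturated"
    if ratio < 1.0:         # hypothetical cut-off: queueing is becoming comparable
        return "queue building"
    return "queue dominated"  # requests wait longer than they take to service


# await and svctm from the sample report for sda.
print(round(queue_time_ms(14.16, 8.88), 2), describe_load(14.16, 8.88))
```

For the sample report, the queue time is a bit over half of the service time, so the device is doing some queueing but waiting does not yet dominate the response time.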
There is much to say about IO monitoring and interpreting results. Perhaps this is only the first of a series of posts about IO statistics. At Pythian we often come across different environments with specific characteristics and various requirements that our clients have. So stay tuned --- more to come.
Update 12-Feb-2007: You might also find the Oracle Disk IO Basics session of Pythian Goodies useful.