<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Basic I/O Monitoring on Linux</title>
	<atom:link href="http://www.pythian.com/news/247/basic-io-monitoring-on-linux/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.pythian.com/news/247/basic-io-monitoring-on-linux/</link>
	<description>News and views from Pythian DBAs</description>
	<lastBuildDate>Fri, 10 Feb 2012 13:01:25 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
	<item>
		<title>By: Alex Gorbachev</title>
		<link>http://www.pythian.com/news/247/basic-io-monitoring-on-linux/#comment-698519</link>
		<dc:creator>Alex Gorbachev</dc:creator>
		<pubDate>Tue, 03 Jan 2012 14:31:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/blogs/247/basic-io-monitoring-on-linux#comment-698519</guid>
		<description>mark, I&#039;m not sure how to interpret your comment. If you are right, how is this metric supposed to be read, and what time exactly does it measure in ms?

To the best of my knowledge, the average time an IO request spends in the queue is await minus svctm. What exactly did you find in the kernel source?</description>
		<content:encoded><![CDATA[<p>mark, I&#8217;m not sure how to interpret your comment. If you are right, how is this metric supposed to be read, and what time exactly does it measure in ms?</p>
<p>To the best of my knowledge, the average time an IO request spends in the queue is await minus svctm. What exactly did you find in the kernel source?</p>
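<p>A minimal sketch of that subtraction, using the sdj figures quoted in this thread. The function name queue_time_ms is made up for illustration, and the result is only as good as the svctm estimate that iostat reports.</p>

```python
def queue_time_ms(await_ms: float, svctm_ms: float) -> float:
    """Approximate time a request spends waiting in the queue, in ms."""
    # await covers queueing plus service; svctm is iostat's service-time estimate.
    return await_ms - svctm_ms


# await and svctm taken from the sdj line quoted in this thread.
sdj_queue_ms = queue_time_ms(await_ms=915.11, svctm_ms=6.75)
print(f"estimated time in queue: {sdj_queue_ms:.2f} ms")
```

<p>For the sdj line this comes to roughly 908 ms, so almost all of await there would be queueing rather than service.</p>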
]]></content:encoded>
	</item>
	<item>
		<title>By: mark</title>
		<link>http://www.pythian.com/news/247/basic-io-monitoring-on-linux/#comment-695803</link>
		<dc:creator>mark</dc:creator>
		<pubDate>Sun, 01 Jan 2012 09:10:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/blogs/247/basic-io-monitoring-on-linux#comment-695803</guid>
		<description>Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
 sdj 0.00 17236.60 0.00 148.20 0.00 67.40 931.42 144.41 915.11 6.75 100.00
 sdj1 0.00 17236.60 0.00 148.20 0.00 67.40 931.42 144.41 915.11 6.75 100.00

you say:
&quot;You already have 144 IOs in the device queue&quot;

avgqu-sz is not the average queue length; it represents the waiting time (ms) of all requests in the queue.

You can dig into the kernel code to check.</description>
		<content:encoded><![CDATA[<p>Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util<br />
 sdj 0.00 17236.60 0.00 148.20 0.00 67.40 931.42 144.41 915.11 6.75 100.00<br />
 sdj1 0.00 17236.60 0.00 148.20 0.00 67.40 931.42 144.41 915.11 6.75 100.00</p>
<p>you say:<br />
&#8220;You already have 144 IOs in the device queue&#8221;</p>
<p>avgqu-sz is not the average queue length; it represents the waiting time (ms) of all requests in the queue.</p>
<p>You can dig into the kernel code to check.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alex Gorbachev</title>
		<link>http://www.pythian.com/news/247/basic-io-monitoring-on-linux/#comment-646021</link>
		<dc:creator>Alex Gorbachev</dc:creator>
		<pubDate>Mon, 21 Nov 2011 18:58:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/blogs/247/basic-io-monitoring-on-linux#comment-646021</guid>
		<description>It&#039;s OK to have 100% utilization (it&#039;s good) as long as response time and throughput are satisfactory. But remember that iostat &amp; sar data is aggregated - so if you have problems that last tens of seconds (and your sampling periods are several minutes long) within an otherwise steady state, you might not see them in sar.

In your case, it might very well be that IO requests are processed by the application mostly synchronously - issuing an IO and waiting on it, then issuing another one and waiting, and so on.

In many cases, you can&#039;t say whether IO is a bottleneck using iostat - it doesn&#039;t tell you how much time IO contributed to your response time. What you see in iostat is the *average* IO workload and how your IO subsystem delivers. You need to dig into application instrumentation to understand the bottleneck. In the case of an Oracle database as an application using IO as a service, you can use a 10046 trace.

dm-0 and dm-1 are how your multipathed devices are represented.</description>
		<content:encoded><![CDATA[<p>It&#8217;s OK to have 100% utilization (it&#8217;s good) as long as response time and throughput are satisfactory. But remember that iostat &#038; sar data is aggregated &#8211; so if you have problems that last tens of seconds (and your sampling periods are several minutes long) within an otherwise steady state, you might not see them in sar.</p>
<p>In your case, it might very well be that IO requests are processed by the application mostly synchronously &#8211; issuing an IO and waiting on it, then issuing another one and waiting, and so on.</p>
<p>In many cases, you can&#8217;t say whether IO is a bottleneck using iostat &#8211; it doesn&#8217;t tell you how much time IO contributed to your response time. What you see in iostat is the *average* IO workload and how your IO subsystem delivers. You need to dig into application instrumentation to understand the bottleneck. In the case of an Oracle database as an application using IO as a service, you can use a 10046 trace.</p>
<p>dm-0 and dm-1 are how your multipathed devices are represented.</p>
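<p>One rough way to check the mostly-synchronous reading against the four figures sid posted. This is only a sketch under two assumptions: that svctm is derived as device busy time divided by completed IOs (so IOPS can be backed out of %util and svctm), and that Little&#8217;s law (average queue size = arrival rate times average time in the system) applies to these averages. The function names are hypothetical.</p>

```python
def implied_iops(util_pct: float, svctm_ms: float) -> float:
    """Back out IOs per second, assuming svctm = busy time / completed IOs."""
    busy_ms_per_second = util_pct / 100.0 * 1000.0
    return busy_ms_per_second / svctm_ms


def littles_law_queue(iops: float, await_ms: float) -> float:
    """Average number of requests in the system: arrival rate * time in system."""
    return iops * await_ms / 1000.0


# Figures from sid's iostat sample: avgqu-sz 1.17, await 3.66, svctm 3.11, %util 99.72
iops = implied_iops(util_pct=99.72, svctm_ms=3.11)
queue = littles_law_queue(iops, await_ms=3.66)
print(f"implied IOPS: {iops:.1f}, estimated queue size: {queue:.2f}")
```

<p>With these inputs the estimate is about 320 IOs per second and a queue of about 1.17, which agrees with the reported avgqu-sz and with roughly one IO outstanding at a time.</p>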
]]></content:encoded>
	</item>
	<item>
		<title>By: sid</title>
		<link>http://www.pythian.com/news/247/basic-io-monitoring-on-linux/#comment-640983</link>
		<dc:creator>sid</dc:creator>
		<pubDate>Thu, 17 Nov 2011 00:35:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/blogs/247/basic-io-monitoring-on-linux#comment-640983</guid>
		<description>Hi Alex,

What does this mean:

avgqu-sz await svctm %util
1.17      3.66  3.11  99.72

1. Why is the disk hitting almost 100% utilization? Is that all right, and can we say there is no IO bottleneck, since await is almost equal to svctm and avgqu-sz is close to 1?

This is happening on a RAID 5 SAN partition.

2. iostat shows the SAN disks as dm-0 and dm-1, so what does that mean in terms of iostat output? Should they be read just like any normal disk?</description>
		<content:encoded><![CDATA[<p>Hi Alex,</p>
<p>What does this mean:</p>
<p>avgqu-sz await svctm %util<br />
1.17      3.66  3.11  99.72</p>
<p>1. Why is the disk hitting almost 100% utilization? Is that all right, and can we say there is no IO bottleneck, since await is almost equal to svctm and avgqu-sz is close to 1?</p>
<p>This is happening on a RAID 5 SAN partition.</p>
<p>2. iostat shows the SAN disks as dm-0 and dm-1, so what does that mean in terms of iostat output? Should they be read just like any normal disk?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alex Gorbachev</title>
		<link>http://www.pythian.com/news/247/basic-io-monitoring-on-linux/#comment-638701</link>
		<dc:creator>Alex Gorbachev</dc:creator>
		<pubDate>Mon, 14 Nov 2011 14:27:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/blogs/247/basic-io-monitoring-on-linux#comment-638701</guid>
		<description>@sid: there is no &quot;iostat&quot; command for that, but you can issue a simple &quot;mount&quot; command to see which devices are mounted on which mountpoints.</description>
		<content:encoded><![CDATA[<p>@sid: there is no &#8220;iostat&#8221; command for that, but you can issue a simple &#8220;mount&#8221; command to see which devices are mounted on which mountpoints.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: sid</title>
		<link>http://www.pythian.com/news/247/basic-io-monitoring-on-linux/#comment-638153</link>
		<dc:creator>sid</dc:creator>
		<pubDate>Mon, 14 Nov 2011 02:30:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/blogs/247/basic-io-monitoring-on-linux#comment-638153</guid>
		<description>Alex, as you know most enterprise systems are on filesystems stored on SAN.

How do we run iostat to get information on something like
/dev/mapper/mpath0p1 mounted on /somepartition?</description>
		<content:encoded><![CDATA[<p>Alex, as you know most enterprise systems are on filesystems stored on SAN.</p>
<p>How do we run iostat to get information on something like<br />
/dev/mapper/mpath0p1 mounted on /somepartition?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alex Gorbachev</title>
		<link>http://www.pythian.com/news/247/basic-io-monitoring-on-linux/#comment-588583</link>
		<dc:creator>Alex Gorbachev</dc:creator>
		<pubDate>Tue, 06 Sep 2011 18:07:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/blogs/247/basic-io-monitoring-on-linux#comment-588583</guid>
		<description>Without any further analysis, it seems to me you&#039;ve reached the maximum throughput for this device - you are processing writes that seem to be almost 512K in size (use strace to get exact numbers).

You already have 144 IOs in the device queue and your device simply can&#039;t process them any faster. If IOs are done sequentially, each takes 6.75ms, and you are doing 148.2 IOs per second, then in one second you get 1,000 ms of IOs. Provided they are all serialized to a single IO thread, there is no way to go faster unless you can split those IOs.

I don&#039;t see how increasing the queue size would help (unless you are talking about a different queue, currently set to 1, somewhere in the layer below the normal Linux device queue - I haven&#039;t looked into iSCSI software much). You could try to find out why your average actual IO size is half of the 1MB maximum that you are requesting. It may be a limitation of your iSCSI device or some Linux config - this could be a simple way to increase throughput.

Another place to look is your Linux IO scheduler.

I don&#039;t know if running two dd&#039;s in parallel would make any difference.

Anyway, a dd test is pretty artificial. If you need to simulate an Oracle workload, do yourself a favor and have a look at ORION.</description>
		<content:encoded><![CDATA[<p>Without any further analysis, it seems to me you&#8217;ve reached the maximum throughput for this device &#8211; you are processing writes that seem to be almost 512K in size (use strace to get exact numbers).</p>
<p>You already have 144 IOs in the device queue and your device simply can&#8217;t process them any faster. If IOs are done sequentially, each takes 6.75ms, and you are doing 148.2 IOs per second, then in one second you get 1,000 ms of IOs. Provided they are all serialized to a single IO thread, there is no way to go faster unless you can split those IOs.</p>
<p>I don&#8217;t see how increasing the queue size would help (unless you are talking about a different queue, currently set to 1, somewhere in the layer below the normal Linux device queue &#8211; I haven&#8217;t looked into iSCSI software much). You could try to find out why your average actual IO size is half of the 1MB maximum that you are requesting. It may be a limitation of your iSCSI device or some Linux config &#8211; this could be a simple way to increase throughput.</p>
<p>Another place to look is your Linux IO scheduler.</p>
<p>I don&#8217;t know if running two dd&#8217;s in parallel would make any difference.</p>
<p>Anyway, a dd test is pretty artificial. If you need to simulate an Oracle workload, do yourself a favor and have a look at ORION.</p>
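<p>The arithmetic from the second paragraph above can be reproduced directly from the sdj line. This is a sketch, assuming avgrq-sz is reported in 512-byte sectors; the variable names are illustrative.</p>

```python
SECTOR_BYTES = 512  # assumption: avgrq-sz is expressed in 512-byte sectors

# Values from the sdj line in fred's iostat output.
w_per_s = 148.20           # writes completed per second
svctm_ms = 6.75            # estimated service time per IO, in ms
avgrq_sz_sectors = 931.42  # average request size, in sectors

# Device time consumed per second of wall clock if IOs are fully serialized.
busy_ms_per_s = w_per_s * svctm_ms

# Average write size and the throughput it implies.
avg_io_kib = avgrq_sz_sectors * SECTOR_BYTES / 1024
throughput_mib_s = w_per_s * avg_io_kib / 1024

print(f"busy time: {busy_ms_per_s:.2f} ms per second")
print(f"average IO size: {avg_io_kib:.2f} KiB")
print(f"throughput: {throughput_mib_s:.2f} MiB/s")
```

<p>This gives roughly 1,000 ms of device time per second, an average write of about 466 KiB, and about 67.4 MB/s, which matches the wMB/s column.</p>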
]]></content:encoded>
	</item>
	<item>
		<title>By: fred</title>
		<link>http://www.pythian.com/news/247/basic-io-monitoring-on-linux/#comment-588053</link>
		<dc:creator>fred</dc:creator>
		<pubDate>Mon, 05 Sep 2011 15:15:34 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/blogs/247/basic-io-monitoring-on-linux#comment-588053</guid>
		<description>Hi Alex,
Doing a dd test from Linux to iSCSI LUNs, we get high values for await, though the svctm column shows good values:

# dd if=/dev/zero of=/vdbench/test bs=1024k

Device:          rrqm/s wrqm/s   r/s  w/s        rMB/s   wMB/s  avgrq-sz avgqu-sz   await    svctm  %util
sdj               0.00 17236.60  0.00 148.20     0.00    67.40  931.42   144.41     915.11   6.75   100.00
sdj1              0.00 17236.60  0.00 148.20     0.00    67.40  931.42   144.41     915.11   6.75   100.00

So here, the virtual qutim value is very, very high; hence the average time a request spends in the queue is quite abnormal. Do you think that increasing the iSCSI queue depth on the client side (Linux) would improve anything?
Well, I may be far away from the root cause of this.
Many thanks.
fred</description>
		<content:encoded><![CDATA[<p>Hi Alex,<br />
Doing a dd test from Linux to iSCSI LUNs, we get high values for await, though the svctm column shows good values:</p>
<p># dd if=/dev/zero of=/vdbench/test bs=1024k</p>
<p>Device:          rrqm/s wrqm/s   r/s  w/s        rMB/s   wMB/s  avgrq-sz avgqu-sz   await    svctm  %util<br />
sdj               0.00 17236.60  0.00 148.20     0.00    67.40  931.42   144.41     915.11   6.75   100.00<br />
sdj1              0.00 17236.60  0.00 148.20     0.00    67.40  931.42   144.41     915.11   6.75   100.00</p>
<p>So here, the virtual qutim value is very, very high; hence the average time a request spends in the queue is quite abnormal. Do you think that increasing the iSCSI queue depth on the client side (Linux) would improve anything?<br />
Well, I may be far away from the root cause of this.<br />
Many thanks.<br />
fred</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kenneth</title>
		<link>http://www.pythian.com/news/247/basic-io-monitoring-on-linux/#comment-582595</link>
		<dc:creator>Kenneth</dc:creator>
		<pubDate>Thu, 25 Aug 2011 08:12:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/blogs/247/basic-io-monitoring-on-linux#comment-582595</guid>
		<description>Hi Alex,

We are currently having some performance issues. Our current Linux installation is on Red Hat 4.1.2-48. It has Oracle 11g installed, and according to the Oracle DBAs we are having I/O contention. After reading your post and analyzing the iostat results we received, the stats do not seem to point to contention.

My question is: how does avgqu-sz relate to await? Based on the stats I get, await is greater than svctm, but avgqu-sz does not look significant. Can you help me interpret the numbers below? By the way, the disk is on a SAN setup, and the stats below are some of the most questionable numbers.

avgqu-sz   await  svctm
 0.70  107.03   0.77
 0.79   15.35   0.45
 0.70   12.11   0.10
 0.28   10.96   0.97
 0.56    9.95   0.60</description>
		<content:encoded><![CDATA[<p>Hi Alex,</p>
<p>We are currently having some performance issues. Our current Linux installation is on Red Hat 4.1.2-48. It has Oracle 11g installed, and according to the Oracle DBAs we are having I/O contention. After reading your post and analyzing the iostat results we received, the stats do not seem to point to contention.</p>
<p>My question is: how does avgqu-sz relate to await? Based on the stats I get, await is greater than svctm, but avgqu-sz does not look significant. Can you help me interpret the numbers below? By the way, the disk is on a SAN setup, and the stats below are some of the most questionable numbers.</p>
<p>avgqu-sz   await  svctm<br />
 0.70  107.03   0.77<br />
 0.79   15.35   0.45<br />
 0.70   12.11   0.10<br />
 0.28   10.96   0.97<br />
 0.56    9.95   0.60</p>
]]></content:encoded>
	</item>
</channel>
</rss>

