THE WORLD DISCUSSES #PYTHIAN ON TWITTER. HAVE A QUESTION? USE OUR HASHTAG AND ASK AWAY.

Oracle Database Appliance: Storage Performance — Part 1

Today I want to show what kind of IO performance we can get from Oracle Database Appliance (ODA). In this part, I will focus on hard disks. That’s right — those good old brown spinning disks.

I often use Oracle ORION tool to stress-test an IO subsystem and find it’s limits. It’s a very simple and handy tool and usually provide most of the IO simulation I need.

I usually benchmark for small random IOs and for large sequential reads. This gives me a good idea what I can expect from this IO subsystem for OLTP workloads as well as bulk data processing workloads including data warehousing, backups and batch activity. I usually don’t stress test mixed workload until I know what’s the profile of the application that I will run on this platform. In this particular case, I’m more after generic IO stress test and finding the limit.

Today, let’s talk about small random IOs which is the attribute of the OLTP workloads. I’m interested in single IO response time and IOs per second (IOPS).

When I stress test an IO subsystem I usually process average numbers but I always remember that averages are just that — averages. Because my artificial ORION workload is pretty randomly distributed and I use reasonably small intervals, the results have good confidence for me but in some cases I would want to dig further and collect some histograms of IO latencies. I haven’t done it for Oracle Database Appliance though and knowing what’s behind I expect response time to be quite consistent – there is no disk cache or something similar that skews response time.

I should note that I use term IO response time and IO latency interchangeable here in case you are using these terms differently. It might be a bad habit but that’s what I do.

Before I stress test an IO subsystem, I usually set some expectations. Let’s do the same here. ODA has 20 disks – 15K RPM SAS disks. My experience tells me that I should expect very good single IO latency (below 10ms) from these disks serving at least 100 IOPS each. I also expect that these disks will still provide reasonable response time if you crank up the workload to about 200 IOPS but this is where I would see much higher response times — getting into 20ms range. Now, I know that 15K RPM SAS disk can deliver even more IOPS each but then IO response time becomes generally unacceptable for OLTP systems. In fact, 10 ms target is what’s been a good rule of thumb in the last decade.
Read the rest of this entry . . .

Determining I/O throughput for a system

At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. In that session, there was a section on how to determine I/O throughput for a system, because in data warehousing I/O per second (iops) is less important than I/O throughput (how much actual data goes through, not just how many reads/writes).

The section contained an Oracle-specific in-database tool, and a standalone tool that can be used on many operating systems, regardless of whether or not a database exists:
Read the rest of this entry . . .

Log Buffer #182, a Carnival of the Vanities for DBAs

This is the 182nd edition of Log Buffer, the weekly review of database blogs. Make sure to read the whole edition so you do not miss where to submit your SQL limerick!

This week started out with me posting about International Women’s Day, and has me personally attending Confoo (Montreal) which is an excellent conference I hope to return to next year. I learned a lot from confoo, especially the blending nosql and sql session I attended.

This week was also the Hotsos Symposium. Doug’s Oracle Blog has a series of posts about Hotsos. If all this talk about conferences has gotten you excited, Joshua Drake notes that 14 days and the hotel is almost full for postgresql conference east which is March 25th-28th in Philadelphia. And the Oracle database insider notes that the Oracle OpenWorld call for papers is now open.

According to Susan Visser this week (ending tomorrow) is also read an e-book week. So if you have not already done so, read an e-book! She links a coupon for an e-book in the post.
Read the rest of this entry . . .

Announcing Sydney Oracle Meetup #6 — Storage for Oracle Databases

What: Sydney Oracle Meetup #6 — storage for Oracle databases

When: June 17, 2009 5:30 PM. Please RSVP Yes/No/Maybe.

Where: Our usual location at Sydney CBD. Level 3 this time!

Details:

We will start at 5:30PM with pizza and drinks and roll on from there as usual.
Note that we are meeting at the level 3 this time!

This meetup will be focused on storage technologies for Oracle database. It looks like a short presentation on Oracle Automatic Storage Management is in order – quite a few people are missing the concepts of the Oracle flagman storage storage solution and it’s useful to understand the approach whether you use it now or not.

So the presentation is – Oracle ASM 11g – the Evolution by Alex Gorbachev:
Read the rest of this entry . . .

Where is Storage QoS?

In the era of consolidation, storage has not been left out. Different systems are made to share the same storage boxes, fiber-channel switches and networks. Inside a typical storage box, we have front-end and back-end controllers, cache, physical spindles shared amongst different applications, databases, backup destinations, and so on.

The impact of backup on normal database activity . . . batch processing in one database impacting transactional processing — these are two real life examples of the consequences of storage consolidation known to almost every DBA. Of course, it’s easy to suggest separating databases to different physical disks, but what about SAN box controllers and shared cache? And don’t forget about the cost factor and ubiquitous consolidation that forces storage administrators to pack as much data as possible into a single SAN or NAS storage device.

Some of our customers use hosting services — they outsource hardware hosting just like they outsource DBA work to Pythian. In such scenarios, hosting service providers usually have storage hardware shared amongst different customers to provide higher utilization and on-demand storage capacity at a lower cost.

Read the rest of this entry . . .

Basic I/O Monitoring on Linux

This is my fourth week at Pythian and in Canada and I’m starting to get back to my normal life cycle — my personal things are getting sorted and my working environment is set. Here at Pythian I’m in a team of four people together with Christo, Joe, and Virgil. (I should write another post about beginning at Pythian — will do one day.)

Yesterday, I asked Christo to show me how he monitors I/O on Linux. I needed to collect statistics on a large Oracle table on a production box, and wanted to keep an eye on the impact. So we grabbed Joe as well and sat all three around my PC. While we were discussing, Paul was around and showed some interest in the topic we discussed — otherwise, why would we all three be involved?. Anyway, Dave and Paul thought that this would be a nice case for a blog post. So here we are…

Indeed, while the technique we discuss here is basic, it gives a good overview and is very easy to use. So let get focused… We will use iostat utility. In case you need you know where to find more about it — right, man pages.

So we will use the following form of the command:

iostat -x [-d] <interval>
  • -x option displays extended statistics. You definitely want it.
  • -d is optional. It removes CPU utilization to avoid cluttering the output. If you leave it out, you will get the following couple lines in addition:
    avg-cpu:  %user   %nice    %sys %iowait   %idle
       6.79    0.00    3.79   16.97   72.46
  • is the number of seconds iostat waits between each report. Without a specified interval, iostat displays statistics since the system was up then exits, which is not useful in our case. Specifying the number of seconds causes iostat to print periodic reports where IO statistics are averaged for the time period since previous report. I.e., specifying 5 makes iostat dump 5 seconds of average IO characteristics, every 5 seconds until it’s stopped.

If you have many devices and you want to watch for only some of them, you can also specify device names on command line:

iostat -x -d sda 5

Now let’s get to the most interesting part — what those cryptic extended statistics are. (For readability, I formatted the report above so that the last two lines are in fact a continuation of the first two.)

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s
sda          0.00  12.57 10.18  9.78  134.13  178.84    67.07

wkB/s avgrq-sz  avgqu-sz   await  svctm  %util
89.42    15.68      0.28   14.16   8.88  17.72
  • r/s and w/s— respectively, the number of read and write requests issued by processes to the OS for a device.
  • rsec/s and wsec/s — sectors read/written (each sector 512 bytes).
  • rkB/s and wkB/s — kilobytes read/written.
  • avgrq-sz — average sectors per request (for both reads and writes). Do the math — (rsec + wsec) / (r + w) = (134.13+178.84)/(10.18+9.78)=15.6798597
    If you want it in kilobytes, divide by 2.
    If you want it separate for reads and writes — do you own math using rkB/s and wkB/s.
  • avgqu-sz — average queue length for this device.
  • await — average response time (ms) of IO requests to a device. The name is a bit confusing as this is the total response time including wait time in the requests queue (let call it qutim), and service time that device was working servicing the requests (see next column — svctim).

    So the formula is await = qutim + svctim.

  • svctim — average time (ms) a device was servicing requests. This is a component of total response time of IO requests.
  • %util — this is a pretty confusing value. The man page defines it as, Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%. A bit difficult to digest. Perhaps it’s better to think of it as percentage of time the device was servicing requests as opposed to being idle. To understand it better here is the formula:

    utilization = ( (read requests + write requests) * service time in ms / 1000 ms ) * 100%
    or
    %util = ( r + w ) * svctim /10 = ( 10.18 + 9.78 ) * 8.88 = 17.72448

Traditionally, it’s common to assume that the closer to 100% utilization a device is, the more saturated it is. This might be true when the system device corresponds to a single physical disk. However, with devices representing a LUN of a modern storage box, the story might be completely different.

Rather than looking at device utilization, there is another way to estimate how loaded a device is. Look at the non-existent column I mentioned above — qutim — the average time a request is spending in the queue. If it’s insignificant, compare it to svctim — the IO device is not saturated. When it becomes comparable to svctim and goes above it, then requests are queued longer and a major part of response time is actually time spent waiting in the queue.

The figure in the await column should be as close to that in the svctim column as possible. If await goes much above svctim, watch out! The IO device is probably overloaded.

There is much to say about IO monitoring and interpreting results. Perhaps this is only the first of a series of posts about IO statistics. At Pythian we often come across different environments with specific characteristics and various requirements that our clients have. So stay tune — more to come.

Update 12-Feb-2007: You might also find useful Oracle Disk IO Basics session of Pythian Goodies.

Start NowWith Pythian - database design, management and emergency handling capabilities...

Live Updates

pythian: RT @FN_Press2: Schooner Information Technology Teams with Pythian to Deliver Advanced Support and High... http://finanznachrichten.de/20
more



Testimonials

  • Serge Racine

    DBA, Brookfield Energy

    We are very satisfied by the service given to us by Andre and Shakir in support of our recent data quality and reorganization initiative.... more