Oracle Database

ASM multi-disk performance

May 12, 2006 12:00:00 AM

The Core Dilemma: SAN vs. ASM Spindle Management

If you have the ability to combine disk spindles at both the SAN level and the Oracle (ASM) level, which one is better?

Should you combine all your spindles on the SAN, present 1 big disk to the OS and give that to ASM? Or should you present each individual disk spindle to ASM and let ASM do the mirroring?

One item should help you decide very quickly: ASM does not offer RAID 5, and that’s what most people would like to run for its low cost.

Another item is performance. Modern disks can easily push 50 to 70 MB/sec in sequential reads. Combine the output of 3 drives and you get up to 210 MB/sec, which is approximately the bandwidth limit of 2 Gbit Fibre Channel. That is, of course, under an optimal disk setup.
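As a rough check of that arithmetic, here is a small Python sketch. The per-drive rates are the ones quoted above; the 200 MB/sec link figure is an assumed round number for the usable bandwidth of one 2 Gbit link, and the helper name is made up for illustration. It ignores RAID overhead and any controller limits.

```python
import math

LINK_MB_S = 200          # assumed usable bandwidth of one 2 Gbit FC link, in MB/sec
DRIVE_RATES = (50, 70)   # sequential read rate per drive in MB/sec, from the text

def drives_to_saturate(link_mb_s, drive_mb_s):
    """Smallest number of drives whose combined sequential rate reaches the link."""
    return math.ceil(link_mb_s / drive_mb_s)

for rate in DRIVE_RATES:
    n = drives_to_saturate(LINK_MB_S, rate)
    print(f"{rate} MB/sec drives: {n} needed, {n * rate} MB/sec combined")
```

With these inputs, three or four drives streaming sequentially are already enough to fill one link, so the link rather than the spindle count becomes the ceiling.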

So imagine, as a DBA, you have full freedom on how to divide your hard disk devices. Don’t you wish it was like that for all DBAs?

The Test Environment: Comparing LUN Configurations

I happen to have both setups: one disk group with one big disk (array) and another disk group with 2 disks (arrays). They are on the same machine, attached to the same database. All 3 arrays are RAID 5 with 256 KB striping. To visualize:

(14 x 36 GB) => RAID 5 LUN => ORA_DISK_1 => ASM DISK GROUP A

(14 x 36 GB) => RAID 5 LUN => ORA_DISK_2 => ASM DISK GROUP B
(14 x 36 GB) => RAID 5 LUN => ORA_DISK_3 => ASM DISK GROUP B

There’s 1 more detail that matters: the machine has 2 Fibre Channel controllers with 2 Gbit of bandwidth each (~200 MB/sec). The LUNs are split equally, alternating between the controllers. For the disk group with 2 LUNs that I am testing, each LUN is on a separate controller.

So I created the same tablespace with the same table on both disk groups and ran the following tests:

Serial Scan Results: Why More Disks Can Be Slower

Full table scan of a 15 GB table:

– Disk Group A (1 disk) – 136 seconds – ~110 MB/sec
– Disk Group B (2 disks) – 184 seconds – ~81 MB/sec
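The throughput figures follow directly from table size and elapsed time. A minimal Python sketch of that calculation, assuming 1 GB = 1000 MB (the post does not say which convention was used) and using an illustrative function name:

```python
TABLE_MB = 15_000   # 15 GB table, assuming 1 GB = 1000 MB

def mb_per_sec(size_mb, seconds):
    """Average throughput of a scan that read size_mb in the given time."""
    return size_mb / seconds

serial_secs = {"A (1 disk)": 136, "B (2 disks)": 184}   # elapsed times from the test

for group, secs in serial_secs.items():
    print(f"Disk Group {group}: {mb_per_sec(TABLE_MB, secs):.1f} MB/sec")

# How much longer the 2-disk group took for the same scan
slowdown = serial_secs["B (2 disks)"] / serial_secs["A (1 disk)"]
print(f"Group B took {slowdown:.2f}x as long as group A")
```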

Surprised? The disk group with 2 disks is slower! These results are consistent and confirmed by diagnostic output from iostat. You may start to wonder why 2 would be slower than 1. It should be twice as fast!

Let me give an example. Imagine you go to the library. In this specific library, you don’t get access to the books directly. You go to the desk and request them, and the librarian goes and fetches the books you want. Being a smart guy, you ask for multiple books at the same time, since you know they are in the same area, and so you save time.

Now imagine there are 2 librarians. You have 2 people to ask for books, but what you do is ask one, wait for your books, then ask the other librarian, alternating between them. You never ask both at the same time; it is always one or the other. You won’t get your books faster, you will get them at the same speed!

In our situation we got slower with 2 “librarians”. Why? Well, it happened that our “librarians” were really smart, and when they went to get the books, they decided to get an extra set in case you asked for it. So when you had 1 “librarian”, it worked great, and some of the books you asked for were already available. But now that you have 2 “librarians” to ask for stuff, by the time you come back to the librarian who just brought you books, he has decided that you don’t need the extra set and has returned it.
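The librarian story can be turned into a toy model. The Python sketch below is only an illustration of the analogy, not a description of how any real array behaves: it assumes chunks are spread round-robin across the LUNs, that each LUN prefetches its next chunk after every read, and that it throws the prefetched chunk away if it sits idle for more than a few ticks. All costs and the timeout are invented numbers, and the effect is exaggerated on purpose.

```python
def serial_scan_ticks(chunks, luns, hit=1, miss=4, keep=2):
    """Simulated time for one reader that asks for chunks strictly one at a time.

    hit  : ticks to return a chunk the LUN had already prefetched (assumed)
    miss : ticks to fetch a chunk from the spindles (assumed)
    keep : longest idle time, in ticks, after which the LUN still holds
           its prefetched chunk (assumed)
    """
    clock = 0
    last_done = [None] * luns          # when each LUN finished its previous read
    for i in range(chunks):
        lun = i % luns                 # round-robin placement of chunks (assumed)
        if last_done[lun] is None:
            prefetched = False         # nothing has been read from this LUN yet
        else:
            idle = clock - last_done[lun]
            prefetched = idle <= keep  # the "extra set of books" is still there
        clock += hit if prefetched else miss
        last_done[lun] = clock
    return clock

print("1 LUN :", serial_scan_ticks(100, 1), "ticks")
print("2 LUNs:", serial_scan_ticks(100, 2), "ticks")
```

In this model a single LUN is never idle, so every read after the first is served from the prefetched data. With two LUNs, each one waits while the other works, the idle time exceeds the timeout, and every read goes back to the spindles.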

Parallel Execution: Unleashing Multi-Controller Bandwidth

Now the same test, but in parallel. My parallelism level is 8, full scan of the 15 GB table:

– ASM disk group A (1 disk) – 78 seconds – ~192 MB/sec
– ASM disk group B (2 disks) – 41 seconds – ~365 MB/sec

Now I am sending more requests SIMULTANEOUSLY – I get to use the fact that I have 2 LUNs on separate controllers. In addition, it helped my 1 LUN disk group by providing a constant flow of requests.
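To put the parallel numbers next to the serial ones, the ratios can be worked out in a few lines of Python. The timings are the ones reported in this post; the function name is just for illustration.

```python
serial_secs = {"A": 136, "B": 184}     # serial full scan, seconds
parallel_secs = {"A": 78, "B": 41}     # same scan at parallelism 8, seconds

def speedup(before_secs, after_secs):
    """How many times faster the second run was than the first."""
    return before_secs / after_secs

for group in ("A", "B"):
    gain = speedup(serial_secs[group], parallel_secs[group])
    print(f"Disk group {group}: parallel scan is {gain:.2f}x faster than serial")

# With parallelism, the 2-LUN group finally beats the 1-LUN group
b_over_a = speedup(parallel_secs["A"], parallel_secs["B"])
print(f"Parallel: group B is {b_over_a:.2f}x faster than group A")
```

Group A gains less than 2x from parallelism, while group B gains more than 4x and ends up almost twice as fast as group A, which is what you would expect from 2 controllers instead of 1.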

The final test I ran was rman backup validate tablespace, which simply reads all the data. Since it’s 1 big tablespace, no parallelism is available, but that’s not important. Unfortunately, I ran the tablespace backup tests at a later point, so the sizes are different:

– ASM disk group A (1 disk) – 17,500 MB in 96 seconds – 182 MB/sec
– ASM disk group B (2 disks) – 46,700 MB in 135 seconds – 345 MB/sec

Even though that speed looks amazing, the real figure is actually a bit higher, as RMAN takes 1-2 seconds after the copy before taking the timing estimate. According to iostat, I reached 196 MB/sec in group A and 392 MB/sec in group B.
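To see how close those iostat figures are to the hardware ceiling, here is a short Python sketch. It assumes the approximate 200 MB/sec per controller mentioned earlier, with group A on 1 controller and group B on 2; the names are illustrative.

```python
CONTROLLER_MB_S = 200   # approximate ceiling of one 2 Gbit controller, from the post

# Peak MB/sec seen in iostat, and number of controllers each disk group uses
iostat_peak = {"A": (196, 1), "B": (392, 2)}

def utilization(observed_mb_s, controllers, per_controller=CONTROLLER_MB_S):
    """Fraction of the available Fibre Channel bandwidth actually used."""
    return observed_mb_s / (controllers * per_controller)

for group, (observed, controllers) in iostat_peak.items():
    pct = utilization(observed, controllers) * 100
    print(f"Disk group {group}: {pct:.0f}% of {controllers} controller(s)")
```

Under that assumption, both disk groups run at about 98% of what their controllers can carry.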

RMAN and the Power of ASYNC IO

This is 1 backup. Why the difference between the backup and the full table scan? Both were limited by disk, so why are the numbers different?

The reason is ASYNC IO.

RMAN uses ASYNC IO extensively, keeping 32 read requests of 1 MB each in the read queue. This is clearly visible in iostat. ASYNC IO allows RMAN to keep requests in the queue while processing them as they complete. This allows the disk IO subsystem to fetch them very efficiently.

Think about it: if you go to the librarian and give him a list of all the books you need, he will get them in the way that is most efficient for him.
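The idea of keeping a fixed number of reads outstanding can be sketched with Python's asyncio. This is not how RMAN is implemented; it is a minimal illustration of a bounded queue of asynchronous requests. The depth of 32 comes from the post, while fake_read and the short sleep are hypothetical stand-ins for a real 1 MB read.

```python
import asyncio

QUEUE_DEPTH = 32   # outstanding read requests, as observed for RMAN in the post

async def fake_read(chunk_no, stats):
    """Hypothetical stand-in for one asynchronous 1 MB read (no real I/O)."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.001)          # pretend the storage takes a moment
    stats["in_flight"] -= 1
    return chunk_no

async def scan(total_chunks, depth):
    """Read all chunks while keeping at most `depth` requests outstanding."""
    stats = {"in_flight": 0, "peak": 0}
    gate = asyncio.Semaphore(depth)     # limits the number of reads in the queue

    async def bounded(chunk_no):
        async with gate:
            return await fake_read(chunk_no, stats)

    done = await asyncio.gather(*(bounded(n) for n in range(total_chunks)))
    return len(done), stats["peak"]

def run(total_chunks, depth):
    return asyncio.run(scan(total_chunks, depth))

if __name__ == "__main__":
    print("async, depth 32:", run(256, QUEUE_DEPTH))
    print("one at a time  :", run(64, 1))
```

With a depth of 1 the reader behaves like the serial full table scan: the storage only ever sees one request. With a depth of 32 the storage always has a full list to work from, which is the librarian with the complete book list.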

Conclusion

Conclusion? Discussion and feedback are open in the comments!
