Troubleshooting ORA-27090 Async I/O Errors with Systemtap

Last week, I ran into an issue on a high-volume Oracle database where sessions were periodically failing with ORA-27090 errors. Job queue processes were also seeing this error, and showing messages like this in the database alert log:

Tue Sep 11 20:56:15 2012
Errors in file /orahome/oracle/base/diag/rdbms/dbm/dbm1/trace/dbm1_j001_126103.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 11: Resource temporarily unavailable
Additional information: 3
Additional information: 128
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 11: Resource temporarily unavailable
Additional information: 3
Additional information: 128

The trace file just showed the same ORA-27090 messages, so there was nothing particularly useful there. The oerr utility is of no help either:

$ oerr ora 27090
27090, 00000, "Unable to reserve kernel resources for asynchronous disk I/O"
// *Cause: The system call to reserve kernel resources for asynchronous I/O
// has failed.
// *Action: Check errno

There’s a known bug, 7306820 “ORA-7445 [krhahw] / ORA-27090 during file header read. Instances may crash”, but this bug is fixed in, and this database is running

And on top of that, it’s an Exadata system, so I/O to storage servers goes over the InfiniBand network rather than using async I/O (AIO) calls.

A web search turned up an entry on DBHK’s blog, pointing to a value of aio-max-nr being set too low. However, aio-max-nr is actually set to the recommended value, which matches the Exadata default as well:

# cat /proc/sys/fs/aio-max-nr

The Linux kernel documentation has a brief but meaty description of this parameter:

aio-nr & aio-max-nr:

aio-nr is the running total of the number of events specified on the io_setup system call for all currently active aio contexts. If aio-nr reaches aio-max-nr then io_setup will fail with EAGAIN. Note that raising aio-max-nr does not result in the pre-allocation or re-sizing of any kernel data structures.

Taking a peek at aio-nr:

# cat /proc/sys/fs/aio-nr

We’re within 2 of the absolute limit, so it’s likely that this limit is indeed the problem. However, the question is: who is using these AIO events? This DB is a huge session hog (8000+ concurrent sessions), but even so, 3M is a pretty high limit. And at this point, we can’t even be sure that it’s database processes that are using up the AIO events.

The only AIO-related information in /proc (or /sys, for that matter) is in the two files in /proc/sys/fs. Going into more detail requires some more tools.
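As a quick check, the two counters can be compared directly. The following is a minimal sketch; aio_headroom is a hypothetical helper name, not a standard tool, and the live readout only runs where the /proc files exist.

```shell
#!/bin/bash
# aio_headroom (hypothetical helper): events still available before
# io_setup starts failing with EAGAIN, given aio-nr and aio-max-nr.
aio_headroom() {
  local nr="$1" max="$2"
  echo $(( max - nr ))
}

# On a Linux host, feed it the live values from /proc.
if [ -r /proc/sys/fs/aio-nr ] && [ -r /proc/sys/fs/aio-max-nr ]; then
  nr=$(cat /proc/sys/fs/aio-nr)
  max=$(cat /proc/sys/fs/aio-max-nr)
  echo "aio-nr=$nr aio-max-nr=$max headroom=$(aio_headroom "$nr" "$max")"
fi
```

Running this in a loop (for example under watch) would show how quickly the headroom shrinks as sessions connect.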

Solaris admins will no doubt be familiar with DTrace, a kernel tracing framework that can expose all kinds of information in the OS kernel, among other things. Oracle has ported DTrace to Linux, but it requires the latest-and-greatest UEK2 kernel and is not yet supported on Exadata.

I came across another tool that also allows kernel inspection and is available in Oracle Linux 5: systemtap. Systemtap hooks into the call stack, allowing function calls to be traced, arguments captured, and if you’re really brave, actually modified.

With dependencies, I ended up needing to add four packages. As this machine doesn’t (yet) have a working yum repository, I used to obtain the following:


The avahi package is a tool for plug-and-play networking that I don’t exactly want running on a server, but the systemtap binary is linked to it for remote compilation capability. Avahi configures itself to auto-start on the next boot, so I disabled that:

# chkconfig avahi-daemon off
# chkconfig avahi-dnsconfd off

The systemtap packages complained about missing kernel package dependencies, since this system is running Oracle’s UEK kernel, which names the kernel package kernel-u2k instead. I ended up doing the installation with the --nodeps option to skip dependency checking.

I couldn’t find any pre-made scripts to monitor AIO, but a 2008 presentation from Oracle Linux engineering does have a bullet point on it:

• Tracking resources tuned via aio_nr and aio_max_nr

So, based on some of the many example scripts, I set out to build a script to monitor AIO calls. Here is the end result:

stap -ve '
global allocated, allocatedctx, freed

# Record every io_setup call: number of calls and events requested per PID
probe syscall.io_setup {
  allocatedctx[pid()] += maxevents; allocated[pid()]++;
  printf("%d AIO events requested by PID %d (%s)\n",
    maxevents, pid(), cmdline_str());
}

# Count context teardowns per PID
probe syscall.io_destroy {freed[pid()]++}

# Drop the counters for processes that exit while we are tracing
probe kprocess.exit {
  if (allocated[pid()]) {
     printf("PID %d exited\n", pid());
     delete allocated[pid()];
     delete allocatedctx[pid()];
     delete freed[pid()];
  }
}

# Print a per-PID summary when the script is interrupted
probe end {
  foreach (pid in allocated) {
     printf("PID %d allocated=%d allocated events=%d freed=%d\n",
        pid, allocated[pid], allocatedctx[pid], freed[pid]);
  }
}
'
Sample output (using systemtap’s -v verbose option to see compilation details):

Pass 1: parsed user script and 76 library script(s) using 147908virt/22876res/2992shr kb, in 130usr/10sys/146real ms.
Pass 2: analyzed script: 4 probe(s), 10 function(s), 3 embed(s), 4 global(s) using 283072virt/49864res/4052shr kb, in 450usr/140sys/586real ms.
Pass 3: using cached /root/.systemtap/cache/11/stap_111c870f2747cede20e6a0e2f0a1b1ae_6256.c
Pass 4: using cached /root/.systemtap/cache/11/stap_111c870f2747cede20e6a0e2f0a1b1ae_6256.ko
Pass 5: starting run.
128 AIO events requested by PID 32885 (oracledbm1 (LOCAL=NO))
4096 AIO events requested by PID 32885 (oracledbm1 (LOCAL=NO))
128 AIO events requested by PID 69099 (oracledbm1 (LOCAL=NO))
4096 AIO events requested by PID 69099 (oracledbm1 (LOCAL=NO))
128 AIO events requested by PID 69142 (oracledbm1 (LOCAL=NO))
4096 AIO events requested by PID 69142 (oracledbm1 (LOCAL=NO))
128 AIO events requested by PID 69099 (oracledbm1 (LOCAL=NO))
128 AIO events requested by PID 69142 (oracledbm1 (LOCAL=NO))
128 AIO events requested by PID 32885 (oracledbm1 (LOCAL=NO))
4096 AIO events requested by PID 69142 (oracledbm1 (LOCAL=NO))
4096 AIO events requested by PID 69099 (oracledbm1 (LOCAL=NO))
128 AIO events requested by PID 69142 (oracledbm1 (LOCAL=NO))
128 AIO events requested by PID 69099 (oracledbm1 (LOCAL=NO))
(and when control-C is pressed):

PID 99043 allocated=6 allocatedevents=12672 freed=3
PID 37074 allocated=12 allocatedevents=25344 freed=6
PID 99039 allocated=18 allocatedevents=38016 freed=9
PID 69142 allocated=24 allocatedevents=50688 freed=12
PID 32885 allocated=36 allocatedevents=76032 freed=18
PID 69099 allocated=6 allocatedevents=12672 freed=3
Pass 5: run completed in 0usr/50sys/9139real ms.

It’s quite obvious here that the AIO allocations are all happening from Oracle database processes.
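If the live trace lines are saved to a file, they can be totalled per process with a little awk. This is a sketch only: sum_aio_requests is a hypothetical helper, and it assumes the exact line format printed by the script above (event count in the first field, PID in the seventh).

```shell
#!/bin/bash
# sum_aio_requests (hypothetical helper): read trace lines on stdin and
# print the number of io_setup calls and total events requested per PID.
sum_aio_requests() {
  awk '/AIO events requested by PID/ { events[$7] += $1; calls[$7]++ }
       END { for (p in events)
               printf "PID %s calls=%d events=%d\n", p, calls[p], events[p] }' |
    sort
}

# Example using three lines in the format of the sample output.
sum_aio_requests <<'EOF'
128 AIO events requested by PID 32885 (oracledbm1 (LOCAL=NO))
4096 AIO events requested by PID 32885 (oracledbm1 (LOCAL=NO))
128 AIO events requested by PID 69099 (oracledbm1 (LOCAL=NO))
EOF
```

For the three sample lines, PID 32885 is credited with 2 calls and 4224 events, and PID 69099 with 1 call and 128 events.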

From the summary output, we can see that each process seems to call io_setup twice as often as io_destroy. Kernel gurus may have an answer to this, but I suspect it has more to do with the data gathering than a massive leak in AIO events.

But the more interesting result is the frequent allocation of 4096 AIO events at a time. On a database with 8000 connections, that would be over 10 times the current limit.
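The arithmetic behind that estimate can be spelled out as follows. The exact limit is an assumption here: the "3M" mentioned earlier is taken to mean 3 * 1024 * 1024 = 3145728.

```shell
#!/bin/bash
sessions=8000                            # concurrent sessions, from the text
events_per_context=4096                  # events per io_setup call, from the trace
assumed_limit=$(( 3 * 1024 * 1024 ))     # ASSUMPTION: "3M" means 3145728

total=$(( sessions * events_per_context ))
echo "events if every session allocates one context: $total"
echo "multiple of the assumed limit (integer division): $(( total / assumed_limit ))"
```

Under these assumptions the total is 32,768,000 events, a little more than ten times the assumed limit.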

The only major downside of increasing this limit seems to be the risk of exhausting kernel memory. From a 2009 post to the linux-kernel mailing list:

Each time io_setup is called, a ring buffer is allocated that can hold nr_events I/O completions. That ring buffer is then mapped into the process’ address space, and the pages are pinned in memory. So, the reason for this upper limit (I believe) is to keep a malicious user from pinning all of kernel memory.
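To get a feel for the amount of memory involved, here is a rough estimate. It assumes 32 bytes per completion slot (the size of struct io_event on x86_64) and ignores ring headers, page rounding, and any scaling of nr_events the kernel may apply internally, so treat the figures as an order-of-magnitude guide only. aio_ring_bytes is a hypothetical helper.

```shell
#!/bin/bash
# ASSUMPTION: 32 bytes per completion slot (sizeof(struct io_event) on x86_64).
IO_EVENT_BYTES=32

# aio_ring_bytes (hypothetical helper): bytes needed for a ring of N events.
aio_ring_bytes() {
  echo $(( $1 * IO_EVENT_BYTES ))
}

per_context=$(aio_ring_bytes 4096)
echo "one 4096-event context: $(( per_context / 1024 )) KiB"
echo "8000 such contexts:     $(( 8000 * per_context / 1024 / 1024 )) MiB"
```

By this estimate, a single 4096-event context pins about 128 KiB, and 8000 of them about 1000 MiB.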

After consulting with Oracle support, we set aio-max-nr to 50 million, enough to accommodate three databases with 16k connections, all allocating 4096 AIO events. Or in other words, way more than we ever expect to use.

# sysctl -w fs.aio-max-nr=50000000

And since this change, the ORA-27090 errors have gone away.
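Note that sysctl -w only changes the running kernel, so the value would revert at the next reboot unless it is also written to a sysctl configuration file. A minimal sketch of doing that safely follows; persist_sysctl is a hypothetical helper, and /etc/sysctl.conf is assumed to be the file in use on this system.

```shell
#!/bin/bash
# persist_sysctl (hypothetical helper): append "key = value" to a sysctl
# config file, refusing to add a duplicate if the key is already present.
persist_sysctl() {
  local key="$1" value="$2" file="${3:-/etc/sysctl.conf}"
  if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$file" 2>/dev/null; then
    echo "$key is already set in $file; edit that line by hand" >&2
    return 1
  fi
  printf '%s = %s\n' "$key" "$value" >> "$file"
}

# As root, something like:
#   persist_sysctl fs.aio-max-nr 50000000 && sysctl -p
```

The helper takes the file as an optional third argument so it can be tried against a scratch copy first.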


About the Author

Marc is a passionate and creative problem solver, drawing on deep understanding of the full enterprise application stack to identify the root cause of problems and to deploy sustainable solutions. Marc has a strong background in performance tuning and high availability, developing many of the tools and processes used to monitor and manage critical production databases at Pythian. He is proud to be the very first DataStax Platinum Certified Administrator for Apache Cassandra.

6 Comments

Would have loved to have a full ps listing from this test setup.

Any chance you still have it?

Frank Ch. Eigler
September 20, 2012 9:38 pm

Marc, it turns out the base systemtap package uses avahi not for remote tracing, but for remote compilation of systemtap scripts (so as to have gcc / kernel-devel / kernel-debuginfo files on some more central machine). (For remote tracing, stap can use ssh: “stap --remote HOST …”, not avahi.)


Hi Frank,

Thanks for the correction; I’ve updated the post. I still find it frustrating that Red Hat (and by extension Oracle) (a) requires avahi to run any type of systemtap setup, (b) requires a full avahi install to get the libraries, rather than, say, an avahi-libs package, and (c) silently configures the daemons to run on startup when the avahi package is installed.


Hi Marc,
Thanks for the post; it has been very helpful. I have been aware of it for a while, but I only had a chance to come back to it in a real case when a colleague of mine reported ORA-27090 on a box that is using dNFS (NAS storage). This was puzzling to me, as dNFS does not use kernel async I/O! It turns out that AIO slots are allocated anyway. From a few samples, I see that each time a new process (session) is started, 128 AIO slots are allocated (and not used), similarly to what you report here. So it looks like the “mystery” of why a DB using dNFS also needs to have I/O slots available is solved. Anyway, the most interesting part of your post for me is the investigation process and how systemtap can give a direct view into what Oracle is doing at the OS level.
As I mentioned on Twitter, I have actually used a slightly modified script that does not require installing kernel debuginfo; this is a time-saving idea that I recently learned from Brendan Gregg’s work. Here is my modified script:

#! /usr/bin/stap
# asyncio_trace_mfield_modified.stp
# Investigate ORA-27090 Async I/O Errors
# Originally by Marc Fielding with some modifications
# Luca Feb 2015
# Removed the need to have kernel debuginfo:
# cut the probe handling process exit
# changed to use nd_syscall instead of syscall probes

global allocated, allocatedctx, freed

probe nd_syscall.io_setup {
  allocatedctx[pid()] += maxevents;
  # count io_setup calls so the end probe has PIDs to report
  allocated[pid()]++;
  printf("%d AIO events requested by PID %d (%s)\n",
    maxevents, pid(), cmdline_str());
}

probe nd_syscall.io_destroy {freed[pid()]++}

probe nd_syscall.io_submit {printf("io_submit called\n")}

probe nd_syscall.io_getevents {printf("io_getevents called\n")}

probe end {
  foreach (pid in allocated) {
    printf("PID %d allocated=%d allocated events=%d freed=%d\n",
      pid, allocated[pid], allocatedctx[pid], freed[pid]);
  }
}



Based on the advice from Oracle, are you also curious whether you will ever need to grow the limit from 50 million to 100 million? I find it hard to believe that concurrent usage would exceed 3 million, so is the clean-up of connections not happening fast enough? Just curious.

