Troubleshooting Oracle’s Auto Service Request

Dec 20, 2013 / By Marc Fielding

I’ve spent the better part of the day troubleshooting an issue with Oracle’s Auto Service Request (ASR) and wanted to share my results in case it saves someone else some effort.

The ASR manager is designed to be a site-wide aggregation point for ASR alerts, receiving SNMP traps and forwarding them over HTTPS to transport.oracle.com. But if you’re using port 162 for SNMP traps on a Linux system, you may find that the traps are never sent to Oracle.

I was testing this by creating test traps through IPMI:

# ipmitool sunoem cli "set /SP/alertmgmt/rules/1 testrule=true"
 Connected. Use ^D to exit.
 -> set /SP/alertmgmt/rules/1 testrule=true
 Set 'testrule' to 'true'

 -> Session closed
Disconnected

The resulting test trap should be passed on to Oracle and result in an e-mail noting that a test service request has been created. But in my case, no e-mail arrived.

/var/log/messages, however, did show that a test trap had been generated:

Dec 19 16:12:23 asrmgr01 snmptrapd[14527]: 2013-12-19 16:12:23 testdb01.example.com [UDP: [43.218.200.118]:32957]: DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (51161892) 5 days, 22:06:58.92  SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.42.2.175.103.2.0.63    SNMPv2-SMI::enterprises.42.2.175.103.2.1.1.0 = STRING: "Oracle Database Appliance X3-2 1234ABC12B"      SNMPv2-SMI::enterprises.42.2.175.103.2.1.14.0 = STRING: "1234ABC12B"    SNMPv2-SMI::enterprises.42.2.175.103.2.1.15.0 = STRING: "SUN FIRE X4170 M3"     SNMPv2-SMI::enterprises.42.2.175.103.2.1.20.0 = STRING: "This is a test trap"

But none of the ASR manager logs in /var/opt/SUNWsasm/log showed any indication of activity.

After a lot of digging, including copious logfile reading, straces, and tcpdumps, I found that the ASR manager process is not even listening for SNMP traps:

[root@asrmgr01 log]# lsof -p `pidof java` | grep UDP
java    31318 root   93u  IPv6           23334618      0t0      UDP *:41178

Searching for whatever is holding the SNMP trap port, 162 (shown by lsof under its service name, “snmptrap”):

[root@asrmgr01 log]# lsof | grep UDP | grep ":snmptrap"
snmptrapd 28163 root    8u  IPv4           23357406      0t0      UDP *:snmptrap

It’s a completely separate process, snmptrapd.
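The lookup above can be wrapped in a small filter, sketched here under a few assumptions: the function name udp_trap_listeners is made up for illustration, and it expects lsof's usual column layout, with the command in column 1, the PID in column 2 and the socket name in the last column.

```shell
# Hypothetical helper (not part of ASR or lsof): read lsof-style output on
# stdin and print "COMMAND PID" for every UDP socket bound to the SNMP trap
# port, whether lsof printed it as ":snmptrap" or as ":162".
udp_trap_listeners() {
    awk '/UDP/ && $NF ~ /:(snmptrap|162)$/ { print $1, $2 }'
}
```

As root, something like `lsof -nP -iUDP:162 | udp_trap_listeners` should then name the process sitting on the port. If that process is anything other than the ASR manager's java process, the traps are going to the wrong listener.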

[root@asrmgr01 log]# ps -ef | grep snmptrapd | grep -v grep
root      4986     1  0 Dec15 ?        00:00:04 /usr/sbin/snmptrapd -Lsd -p /var/run/snmptrapd.pid

Decoding the arguments from the command line, -Lsd sends “L”og messages to “s”yslog using the “d”aemon facility. These were the messages I had seen in /var/log/messages.

A little more digging in the ASR manager logfile /var/opt/SUNWsasm/log/sasm.log does show a telling message:

2013-12-19_16:00:51  command executed:  sasm start-instance
Starting Oracle Automated Service Manager...
Cannot bind to port : 162

Unfortunately, sasm continued to start without reporting anything on stdout. It would have been much easier if it had simply exited on a fatal error like this.
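Since sasm keeps quiet about the failure, it may be worth checking the log for it explicitly after each start. A minimal sketch, assuming the message text stays exactly as shown above; the function name check_bind_errors is hypothetical:

```shell
# Hypothetical helper: print every "Cannot bind to port" line in the given
# logfile, prefixed with its line number. The exit status is grep's, so it
# is 0 when a bind failure was logged and non-zero otherwise.
check_bind_errors() {
    grep -n 'Cannot bind to port' "$1"
}
```

Running `check_bind_errors /var/opt/SUNWsasm/log/sasm.log` right after `sasm start-instance` would have pointed at the port conflict straight away.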

The fix was quite simple. Disable snmptrapd on the ASR manager host and restart sasm:

chkconfig snmptrapd off
service snmptrapd stop
service sasm restart

After that, my test traps started generating e-mail alerts.
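To confirm the fix without waiting for an e-mail, the port can be checked again. The sketch below rests on an assumption the post only implies, namely that once snmptrapd is out of the way some other process (presumably the ASR manager's java process) binds port 162. The function name is made up for illustration.

```shell
# Hypothetical check: read lsof-style output on stdin and succeed only if
# something holds the SNMP trap port and none of the holders is snmptrapd.
trap_port_free_of_snmptrapd() {
    awk '
        $NF ~ /:(snmptrap|162)$/ {
            seen = 1
            if ($1 == "snmptrapd") bad = 1
        }
        END {
            if (seen && !bad) exit 0
            exit 1
        }
    '
}
```

For example: `lsof -nP -iUDP:162 | trap_port_free_of_snmptrapd && echo "snmptrapd no longer holds port 162"`. A failure means either that snmptrapd is still running or that nothing is listening on the port at all, in which case sasm.log is the next place to look.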
