The ASR manager is designed to be a site-wide aggregation point for ASR alerts, receiving SNMP traps and forwarding them over HTTPS to transport.oracle.com. But if you’re using port 162 for SNMP traps on a Linux system, you may find that such traps are never sent to Oracle.
I was testing this by creating test traps through IPMI:
# ipmitool sunoem cli "set /SP/alertmgmt/rules/1 testrule=true"
Connected. Use ^D to exit.
-> set /SP/alertmgmt/rules/1 testrule=true
Set 'testrule' to 'true'
-> Session closed
Disconnected
The resulting test trap should be passed on to Oracle and result in an e-mail noting that a test service request had been created. But in my case, nothing came through.
/var/log/messages, however, did show that a test trap had been generated:
Dec 19 16:12:23 asrmgr01 snmptrapd: 2013-12-19 16:12:23 testdb01.example.com [UDP: [126.96.36.199]:32957]: DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (51161892) 5 days, 22:06:58.92 SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.188.8.131.52.2.0.63 SNMPv2-SMI::enterprises.184.108.40.206.220.127.116.11 = STRING: "Oracle Database Appliance X3-2 1234ABC12B" SNMPv2-SMI::enterprises.18.104.22.168.22.214.171.124 = STRING: "1234ABC12B" SNMPv2-SMI::enterprises.126.96.36.199.188.8.131.52 = STRING: "SUN FIRE X4170 M3" SNMPv2-SMI::enterprises.184.108.40.206.220.127.116.11 = STRING: "This is a test trap"
But none of the ASR manager logs in /var/opt/SUNWsasm/log showed any indication of activity.
After a lot of digging, including copious logfile reading, straces, and tcpdumps, I found that the ASR manager process is not even listening for SNMP traps:
[root@asrmgr01 log]# lsof -p `pidof java` | grep UDP
java 31318 root 93u IPv6 23334618 0t0 UDP *:41178
To find out what was holding SNMP port 162 (which lsof displays by its service name, “snmptrap”), I searched all open UDP sockets:
[root@asrmgr01 log]# lsof | grep UDP | grep ":snmptrap"
snmptrapd 28163 root 8u IPv4 23357406 0t0 UDP *:snmptrap
It’s a completely separate process, snmptrapd:
[root@asrmgr01 log]# ps -ef | grep snmptrapd | grep -v grep
root 4986 1 0 Dec15 ? 00:00:04 /usr/sbin/snmptrapd -Lsd -p /var/run/snmptrapd.pid
Decoding the command-line arguments: -Lsd sends “L”og messages to “s”yslog using the “d”aemon facility. These were the messages I had seen in /var/log/messages.
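The port check above can be narrowed down so that lsof only reports the socket in question. The following is a minimal sketch, not part of the original troubleshooting session: port_holders is a hypothetical helper name, and it only assumes the usual lsof column layout (command first, PID second, with a header row).

```shell
# Hypothetical helper: summarise lsof output as "command pid" pairs.
# Reads lsof output on stdin, skips the header row, and removes duplicates.
port_holders() {
    awk 'NR > 1 { print $1, $2 }' | sort -u
}

# Usage on the ASR manager host (run as root to see sockets of all users):
#   lsof -nP -iUDP:162 | port_holders
# -n and -P keep addresses and ports numeric, so the port shows as 162
# rather than "snmptrap".
```

If anything other than the ASR manager's java process shows up in the result, that process is the one keeping the port.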
And a little more digging in the ASR manager log file /var/opt/SUNWsasm/log/sasm.log does show a telling message:
2013-12-19_16:00:51 command executed: sasm start-instance
Starting Oracle Automated Service Manager...
Cannot bind to port : 162
Unfortunately, sasm continued to start without reporting anything on stdout. It would have been much easier if it had simply exited on a fatal error like this.
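Since sasm does not stop on this error, a quick log check after each start can surface it. This is a rough sketch only: check_sasm_bind is a hypothetical helper name, and it assumes the failure is always logged with the same "Cannot bind to port" wording seen above.

```shell
# Hypothetical helper: report bind failures recorded in the ASR manager log.
# Defaults to the log file mentioned above; pass another path to override.
check_sasm_bind() {
    log="${1:-/var/opt/SUNWsasm/log/sasm.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # grep -q exits 0 on a match; a match means a bind failure was logged.
    if grep -q 'Cannot bind to port' "$log"; then
        echo "bind failure found in $log:"
        grep 'Cannot bind to port' "$log" | tail -n 1
        return 1
    fi
    echo "no bind failure recorded in $log"
    return 0
}
```

Keep in mind that the log accumulates older entries, so a match may come from a start attempt made before the problem was fixed.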
Anyway, the fix was quite simple: disable snmptrapd on the ASR manager host and restart sasm so that it can bind to the port:
chkconfig snmptrapd off
service snmptrapd stop
service sasm restart
After that, my test traps started generating e-mail alerts.