Adding Networks to Exadata: Fun with Policy Routing

Oct 15, 2012 / By Marc Fielding


I’ve noticed that Exadata servers are now configured to use Linux policy routing. Peeking at My Oracle Support, I found that note 1306154.1 goes into a bit more detail about this configuration. It’s apparently delivered by default with factory build images 11.2.2.3.0 and later. The note goes on to explain that this configuration was implemented because of asymmetric routing problems associated with the management network:

Database servers are deployed with 3 logical network interfaces configured: management network (typically eth0), client access network (typically bond1 or bondeth0), and private network (typically bond0 or bondib0). The default route for the system uses the client access network and the gateway for that network. All outbound traffic that is not destined for an IP address on the management or private networks is sent out via the client access network. This poses a problem for some connections to the management network in some customer environments.


It goes on to mention a bug where this was reported:

@ BUG:11725389 – TRACK112230: MARTIAN SOURCE REPORTED ON DB NODES BONDETH0 INTERFACE

The bug is not public, but its title does show the type of error message that would appear when the kernel receives a packet whose source address it doesn’t expect on that interface.

This configuration is implemented using Red Hat/Oracle Linux-style /etc/sysconfig/network-scripts files, with matching rule- and route- files for each interface.

A sample configuration, where the management network is in the 10.10.10.0/24 subnet, is:

[root@exa1db01 network-scripts]# cat rule-eth0
from 10.10.10.93 table 220
to 10.10.10.93 table 220
[root@exa1db01 network-scripts]# cat route-eth0
10.10.10.0/24 dev eth0 table 220
default via 10.10.10.1 dev eth0 table 220

This configuration directs traffic originating from 10.10.10.93 (the management interface IP on this particular machine), as well as traffic destined to that address, away from the regular system routing table and into a special routing table, 220. The route-eth0 file then populates table 220 with two routes: one for the local network and a default route through the router at 10.10.10.1.
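As far as I understand the Red Hat/Oracle Linux initscripts, each line of a rule- file is handed to `ip rule add` and each line of a route- file to `ip route add`. The Python sketch below is a hypothetical parser for just the subset of that syntax used in this post (`from`/`to` rules, and routes with `dev`, `via` and `table`). It is not the initscripts’ own logic, and the `Rule`/`Route` names are my own.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    selector: str                   # "from" (match source) or "to" (match destination)
    prefix: ipaddress.IPv4Network   # a bare host address becomes a /32
    table: int


@dataclass
class Route:
    prefix: ipaddress.IPv4Network
    dev: str
    via: Optional[ipaddress.IPv4Address]  # None for a directly connected network
    table: Optional[int]                  # None means no qualifier, i.e. the main table


def parse_rule(line: str) -> Rule:
    # Expected shape: "<from|to> <address[/len]> table <id>"
    selector, addr, keyword, table = line.split()
    if selector not in ("from", "to") or keyword != "table":
        raise ValueError(f"unsupported rule line: {line!r}")
    return Rule(selector, ipaddress.ip_network(addr), int(table))


def parse_route(line: str) -> Route:
    # Expected shape: "<prefix|default>" followed by keyword/value pairs.
    tokens = line.split()
    dest = "0.0.0.0/0" if tokens[0] == "default" else tokens[0]
    opts = dict(zip(tokens[1::2], tokens[2::2]))
    via = ipaddress.ip_address(opts["via"]) if "via" in opts else None
    table = int(opts["table"]) if "table" in opts else None
    return Route(ipaddress.ip_network(dest), opts["dev"], via, table)


rule_eth0 = ["from 10.10.10.93 table 220", "to 10.10.10.93 table 220"]
route_eth0 = ["10.10.10.0/24 dev eth0 table 220",
              "default via 10.10.10.1 dev eth0 table 220"]

rules = [parse_rule(line) for line in rule_eth0]
routes = [parse_route(line) for line in route_eth0]
for entry in rules + routes:
    print(entry)
```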

This contrasts with the default gateway of the machine itself:

[root@exa1db01 network-scripts]# grep GATEWAY /etc/sysconfig/network
GATEWAYDEV=bondeth0
GATEWAY=10.50.50.1

The difference between this type of policy routing and regular routing is that traffic with the _source_ address of 10.10.10.93 will automatically go through gateway 10.10.10.1 for any destination outside the local subnet, regardless of the system’s default gateway. (The bible for Linux routing configuration is the Linux Advanced Routing and Traffic Control HOWTO, for those looking for more details.)
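To make the source-versus-destination distinction concrete, here is a toy Python model of how the rules and tables above interact. It is a simplification under stated assumptions: it skips the kernel’s local table, evaluates rules in list order, and treats a table with no matching route as a fall-through to the next rule. The bondeth0 subnet (10.50.50.0/24), the address 10.50.50.104 and the remote host 192.0.2.50 are hypothetical.

```python
import ipaddress

ip = ipaddress.ip_address
net = ipaddress.ip_network
MAIN = 254  # the kernel's "main" table, where unqualified routes go

# Routing tables: table id -> list of (prefix, gateway or None, device).
TABLES = {
    220: [(net("10.10.10.0/24"), None, "eth0"),
          (net("0.0.0.0/0"), ip("10.10.10.1"), "eth0")],
    MAIN: [(net("10.10.10.0/24"), None, "eth0"),
           (net("10.50.50.0/24"), None, "bondeth0"),  # hypothetical client subnet
           (net("0.0.0.0/0"), ip("10.50.50.1"), "bondeth0")],
}

# rule-eth0, in list order: (selector, prefix, table).
RULES = [("from", net("10.10.10.93/32"), 220),
         ("to", net("10.10.10.93/32"), 220)]


def lookup_table(table, dst):
    """Longest-prefix match within a single table; None if nothing matches."""
    matches = [r for r in TABLES[table] if dst in r[0]]
    return max(matches, key=lambda r: r[0].prefixlen) if matches else None


def route(src, dst):
    """Return (table, gateway, device) chosen for a packet from src to dst."""
    for selector, prefix, table in RULES:
        if (src if selector == "from" else dst) in prefix:
            hit = lookup_table(table, dst)
            if hit:  # an empty result falls through to the next rule
                return table, hit[1], hit[2]
    hit = lookup_table(MAIN, dst)
    return (MAIN, hit[1], hit[2]) if hit else None


remote = ip("192.0.2.50")  # hypothetical admin workstation on another network
print(route(ip("10.10.10.93"), remote))   # management source address
print(route(ip("10.50.50.104"), remote))  # hypothetical client-network address
```

The two calls differ only in their source address, yet they leave through different gateways: that is the behavior the rule- and route-eth0 files buy you.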

I ran into an issue with this configuration when adding a second external network on the bondeth1 interface. I set up the additional interface configuration for a network, 10.50.52.0/24:

[root@exa1db01 network-scripts]# cat ifcfg-bondeth1
DEVICE=bondeth1
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.50.52.104
NETMASK=255.255.255.0
NETWORK=10.50.52.0
BROADCAST=10.50.52.255
BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000 num_grat_arp=100"
IPV6INIT=no
GATEWAY=10.50.52.1
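NETWORK and BROADCAST in an ifcfg- file follow directly from IPADDR and NETMASK, so a typo in one of them is easy to miss. A small hypothetical Python helper like this one (`check_ifcfg` is my own name) can cross-check those values, including whether GATEWAY actually sits inside the subnet. It only does address arithmetic and doesn’t read or modify any system files.

```python
import ipaddress


def check_ifcfg(fields: dict) -> list:
    """Return a list of inconsistencies between the addressing fields of an ifcfg- file."""
    subnet = ipaddress.ip_interface(f"{fields['IPADDR']}/{fields['NETMASK']}").network
    problems = []
    if "NETWORK" in fields and fields["NETWORK"] != str(subnet.network_address):
        problems.append(f"NETWORK should be {subnet.network_address}")
    if "BROADCAST" in fields and fields["BROADCAST"] != str(subnet.broadcast_address):
        problems.append(f"BROADCAST should be {subnet.broadcast_address}")
    if "GATEWAY" in fields and ipaddress.ip_address(fields["GATEWAY"]) not in subnet:
        problems.append(f"GATEWAY {fields['GATEWAY']} is outside {subnet}")
    return problems


# Values copied from ifcfg-bondeth1 above.
bondeth1 = {"IPADDR": "10.50.52.104", "NETMASK": "255.255.255.0",
            "NETWORK": "10.50.52.0", "BROADCAST": "10.50.52.255",
            "GATEWAY": "10.50.52.1"}
print(check_ifcfg(bondeth1) or "addressing fields are consistent")
```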

I also added rule and route entries:

[root@exa1db01 network-scripts]# cat rule-bondeth1
from 10.50.52.104 table 211
to 10.50.52.104 table 211
[root@exa1db01 network-scripts]# cat route-bondeth1
10.50.52.0/24 dev bondeth1 table 211
10.100.52.0/24 via 10.50.52.1 dev bondeth1 table 211
default via 10.50.52.1 dev bondeth1 table 211

This was a dedicated Data Guard network to a remote server, IP 10.100.52.10.

The problem with this configuration was that it didn’t work. Using tcpdump, I could see incoming requests arrive on the bondeth1 interface, but the replies went out through the system default route on bondeth0 and never reached their destination. After some digging, I found the cause: to determine the packet source IP, the kernel was looking up the destination in the main routing table (table 254), and the route for the 10.100.52.0/24 network was in table 211 rather than the main table. So the packet followed the default route instead, got a source address in the client access network, and never matched any of the routing rules for the Data Guard network.

The solution ended up being rather simple: take out the “table 211” qualifier on the Data Guard network route, effectively putting it in the main routing table:

[root@exa1db01 network-scripts]# cat route-bondeth1
10.50.52.0/24 dev bondeth1 table 211
default via 10.50.52.1 dev bondeth1 table 211
10.100.52.0/24 via 10.50.52.1 dev bondeth1
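Here’s a toy Python model of that explanation, comparing the original and corrected route-bondeth1. It assumes, as described above, that the source address comes from a lookup of the destination in the main table (254) only, after which the packet is routed through the rules. For simplicity it uses each interface’s primary address as the source and leaves out gateways; bondeth0’s subnet and address (10.50.50.0/24, 10.50.50.104) are hypothetical, and the kernel’s local table and preferred-source handling are not modeled.

```python
import ipaddress

ip = ipaddress.ip_address
net = ipaddress.ip_network
MAIN = 254

# Primary address per interface (the bondeth0 address is hypothetical).
IFACE_ADDR = {"bondeth0": ip("10.50.50.104"), "bondeth1": ip("10.50.52.104")}

# rule-bondeth1 as originally written: host-address rules only.
RULES = [("from", net("10.50.52.104/32"), 211), ("to", net("10.50.52.104/32"), 211)]

# Connected routes plus the system default route live in the main table.
MAIN_BASE = [(net("10.50.50.0/24"), "bondeth0"),
             (net("10.50.52.0/24"), "bondeth1"),
             (net("0.0.0.0/0"), "bondeth0")]
T211_BASE = [(net("10.50.52.0/24"), "bondeth1"), (net("0.0.0.0/0"), "bondeth1")]
DG_ROUTE = (net("10.100.52.0/24"), "bondeth1")  # 10.100.52.0/24 via 10.50.52.1

BEFORE = {MAIN: MAIN_BASE, 211: T211_BASE + [DG_ROUTE]}  # route has "table 211"
AFTER = {MAIN: MAIN_BASE + [DG_ROUTE], 211: T211_BASE}   # qualifier removed


def lpm(routes, dst):
    """Longest-prefix match; returns the outgoing device or None."""
    hits = [r for r in routes if dst in r[0]]
    return max(hits, key=lambda r: r[0].prefixlen)[1] if hits else None


def send(tables, dst):
    # Step 1: choose a source address from the main table only.
    src = IFACE_ADDR[lpm(tables[MAIN], dst)]
    # Step 2: route the packet, consulting the policy rules first.
    for selector, prefix, table in RULES:
        if (src if selector == "from" else dst) in prefix:
            dev = lpm(tables.get(table, []), dst)
            if dev:
                return src, dev
    return src, lpm(tables[MAIN], dst)


standby = ip("10.100.52.10")
print("before:", send(BEFORE, standby))
print("after: ", send(AFTER, standby))
```

In the “before” case the packet picks up a bondeth0 source address, misses both rules and leaves via bondeth0; in the “after” case the main-table route selects a bondeth1 source, which then matches the “from” rule for table 211.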

And then we ran into a second issue: the main interface IP could now be reached, but not the virtual IP (VIP). This is because the rule configuration, taken from the samples, didn’t list the VIP address at all. To avoid this issue, and to handle VIP addresses migrating from other cluster nodes, we set up a netmask in the rule file, making all addresses in the Data Guard network use this particular routing rule:

[root@exa1db01 network-scripts]# cat rule-bondeth1
from 10.50.52.0/24 table 211
to 10.50.52.0/24 table 211
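The gap is easy to check mechanically: list every address that might end up on the interface and see which ones no rule covers. In this Python sketch the VIPs 10.50.52.105 and 10.50.52.106 are hypothetical placeholders for this node’s own VIP and one that might fail over from another node.

```python
import ipaddress


def uncovered(rule_prefixes, addresses):
    """Return the addresses that none of the rule prefixes match."""
    nets = [ipaddress.ip_network(p) for p in rule_prefixes]
    return [a for a in addresses
            if not any(ipaddress.ip_address(a) in n for n in nets)]


# Base IP plus hypothetical VIPs that could be plumbed on bondeth1.
candidates = ["10.50.52.104", "10.50.52.105", "10.50.52.106"]

print("host rule misses:  ", uncovered(["10.50.52.104"], candidates))
print("subnet rule misses:", uncovered(["10.50.52.0/24"], candidates))
```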

So to sum up, when setting up interfaces in a policy-routed Exadata system remember to:

  • Set up the interface itself and any bonds using ifcfg- files.
  • Create a rule- file for the interface, encompassing every possible address the interface could have. I added the entire IP subnet. Add “from” and “to” lines with a unique routing table number.
  • Create a route- file for the interface, listing a local network route and a default route via the subnet’s default router, all using the table number defined in the previous step.
  • Add to the route- file any static routes required on this interface, but don’t add a table qualifier, so that they land in the main routing table.
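The checklist is mechanical enough to script. This hypothetical Python helper (`policy_files` is my own name, not an Oracle or Red Hat tool) builds the text of the rule- and route- files from the interface’s address, netmask, gateway and table number. It assumes every static route goes through the subnet’s gateway, and it only returns strings, so you’d still review and install the files yourself.

```python
import ipaddress


def policy_files(dev, ipaddr, netmask, gateway, table, static_routes=()):
    """Return (rule_text, route_text) for an interface, following the checklist."""
    subnet = ipaddress.ip_interface(f"{ipaddr}/{netmask}").network
    # Cover the whole subnet so VIPs on this network also match the rules.
    rule_text = f"from {subnet} table {table}\nto {subnet} table {table}\n"
    route_lines = [
        f"{subnet} dev {dev} table {table}",               # local network route
        f"default via {gateway} dev {dev} table {table}",  # per-table default route
    ]
    # Static routes: no table qualifier, so they land in the main table.
    route_lines += [f"{dest} via {gateway} dev {dev}" for dest in static_routes]
    return rule_text, "\n".join(route_lines) + "\n"


rule_text, route_text = policy_files("bondeth1", "10.50.52.104", "255.255.255.0",
                                     "10.50.52.1", 211, ["10.100.52.0/24"])
print(rule_text, end="")
print(route_text, end="")
```

For bondeth1, the output matches the final rule-bondeth1 and route-bondeth1 listings below.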

The final configuration:

[root@exa1db01 network-scripts]# cat ifcfg-eth8
DEVICE=eth8
HOTPLUG=no
IPV6INIT=no
HWADDR=00:1b:21:xx:xx:xx
ONBOOT=yes
MASTER=bondeth1
SLAVE=yes
BOOTPROTO=none
[root@exa1db01 network-scripts]# cat ifcfg-eth12
DEVICE=eth12
HOTPLUG=no
IPV6INIT=no
HWADDR=00:1b:21:xx:xx:xx
ONBOOT=yes
MASTER=bondeth1
SLAVE=yes
BOOTPROTO=none
[root@exa1db01 network-scripts]# cat ifcfg-bondeth1
DEVICE=bondeth1
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.50.52.104
NETMASK=255.255.255.0
NETWORK=10.50.52.0
BROADCAST=10.50.52.255
BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000 num_grat_arp=100"
IPV6INIT=no
GATEWAY=10.50.52.1
[root@exa1db01 network-scripts]# cat rule-bondeth1
from 10.50.52.0/24 table 211
to 10.50.52.0/24 table 211
[root@exa1db01 network-scripts]# cat route-bondeth1
10.50.52.0/24 dev bondeth1 table 211
default via 10.50.52.1 dev bondeth1 table 211
10.100.52.0/24 via 10.50.52.1 dev bondeth1

4 Responses to “Adding Networks to Exadata: Fun with Policy Routing”

  • [...] basically specify that traffic go back out whatever interface it came in on. For more info, see http://www.pythian.com/news/36747/ad…olicy-routing/ Good luck and I hope this helps someone. # cat rule-bondeth0 from 10.22.102.0/23 table 210 to [...]

  • Vagelis Nisyraios says:

    Hello Marc,

    Sorry to bother you but an attempt to configure Linux Advanced Routing (policy routing) brought me here after reading hundreds of other sites and opening an SR in MOS. Three months have passed without a valid info from their side but this is not surprising based on my experience from MOS. The key issue here is where you are pointing (among others) the following:
    “…•Add to the route- file any static routes required on this interface, but DON’T ADD A TABLE QUALIFIER…”.
    This specific notification is included only in your blog (as I’ve seen so far).

    The problem is that although your way of setup regarding addition of extra static routes in route-ethx file works just fine (it goes to main routing table), according to relevant redhat knowledgebase article regarding “How to make routing rules persistent, when I want packets to leave the same interface they came in?” this is a wrong setup in terms that you must add the table qualifier to the static route entry! I quote from the knowledgebase article (example from article’s route-eth0 setup):
    …..
    # cat /etc/sysconfig/network-scripts/route-eth0
    default via dev eth0 table 1
    #to add additional static routes
    # via dev eth0 table 1

    As you can see on the last line he adds the table qualifier to the additional static route and btw he removes the DEFAULT GATEWAY from any relevant files (/etc/sysconfig/network, ifcfg-eth* files). The problem of course is that when I add the table qualifier to my static route, although it’s seen in the specific table’s routes, it’s totally ignored so I cannot access (outgoing initiated only) the corresponding network (I get “network unreachable…”).
    So bottom line if you are not bored already. Besides the very important fact that your way works, do you have any official article/document from Oracle or Redhat that backs this type of setup for extra static routes in Advanced Routing configurations?
    I hope that you will find the time to clarify things if possible.

    Thank you in advance,
    Best Regards,
    Vagelis Nisyraios
    Athens, Greece

  • Jatin says:

    Dear Marc

    i have one question. Can we install to crs on exadata x4-2 machine?

    • Hi Jatin,

      I assume you’re asking if you can run two different RAC clusters in the same physical Exadata rack. I don’t see it done often, but it’s possible to physically partition your rack between two clusters, each with their own compute and storage servers. This so-called “split rack” configuration would locally separate everything but the InfiniBand and power distribution infrastructure. To maintain high availability and quorum, you would want a minimum of 2 compute and 3 storage servers for each part of your split rack.

      This isn’t particularly well documented, so I’d suggest engaging professional services for assistance if you’re considering this type of configuration.

      Marc
