Round Trip Network Latency Tests for your Remote Database Service
If you are using a remote Oracle database service, such as Oracle’s Autonomous Transaction Processing database, or an AWS RDS database, how do you measure the network latency?
The usual method of using ‘ping’ or other tools will not work, as there is no access to the underlying server of the database service.
Wouldn’t it be great if there were a way to measure the round trip latency from your location directly to the remote database service?
Well, now there is. Here’s how you can do it.
Two databases are used for this testing. The database that is the source of the testing, and the remote database that is the target of the testing.
A database link is created in the local database, and as you have likely already realized, the database link connects to the remote database.
For best results, the sqlplus session for this test is best run on the server for the local database. If the test is run from a client on another machine, the results can be skewed by the overhead of the connection between the client (such as your laptop) and the local database.
If the client machine and local database are physically close, then the skew would likely be only a fraction of millisecond. This may be negligible for most testing.
For this testing, the target database is a standard Oracle database created on a virtual server that is located in San Francisco California, USA. San Francisco is approximately 700 miles (1126 km) from my location.
Here’s the database link as created in my local database:
create database link droplet_link connect to pingtest identified by XXXX using 'localhost:6789/ORCLPDB1'
This particular database link is a bit of a special case. The database is running on a remote virtual machine from a low cost vendor. As I don’t want to open a database port to the internet, the connection to this database is via an ssh tunnel. The local port 6789 is forwarded to port 1521 on the database server.
The reason for creating this test database in a remote virtual machine is so that latency may also be measured with ‘ping’ for comparison.
Now that the database link is created, the first test can be performed.
Singleton Database Calls
This test consists of running this SQL statement 5 times, and measuring the difference between the time the SQL is executed, and the time the results are received.
This is the SQL statement used:
select systimestamp at local from dual@dblink
The code is actually a PL/SQL block. This can be seen at ping-remote-db.sql
The SQL is run 5 times at 2 second intervals. As the first execution of the SQL may include time required to connect to the database referred to by the database link, the first result is ignored.
$ echo exit | sql -S -L jkstill/XXX@orcl/pdb1 @ping-remote-db.sql Local Seconds Begin: 1641640285.703832 Local Seconds End: 1641640285.799489 Round Trip: 0.095657 ============================== Local Seconds Begin: 1641640287.864372 Local Seconds End: 1641640288.054133 Round Trip: 0.189761 ============================== Local Seconds Begin: 1641640290.103683 Local Seconds End: 1641640290.471617 Round Trip: 0.367934 ============================== Local Seconds Begin: 1641640292.537824 Local Seconds End: 1641640292.671595 Round Trip: 0.133771 ============================== Local Seconds Begin: 1641640294.711450 Local Seconds End: 1641640295.176477 Round Trip: 0.465027 ============================== Iterations: 5
The latency is from 95 – 465 ms.
How does that compare to ‘ping’?
$ ping -i 2 -c 5 137.nnn.nn.nnn PING 126.96.36.199 (188.8.131.52) 56(84) bytes of data. 64 bytes from 184.108.40.206: icmp_seq=1 ttl=48 time=31.4 ms 64 bytes from 220.127.116.11: icmp_seq=2 ttl=48 time=623 ms 64 bytes from 18.104.22.168: icmp_seq=3 ttl=48 time=194 ms 64 bytes from 22.214.171.124: icmp_seq=4 ttl=48 time=179 ms 64 bytes from 126.96.36.199: icmp_seq=5 ttl=48 time=66.5 ms --- 137.nnn.nn.nnn ping statistics --- 5 packets transmitted, 5 received, 0% packet loss, time 8004ms rtt min/avg/max/mdev = 31.441/219.196/623.788/211.862 ms
The first few times I tested this, the times of the database ping were approximately 2x the standard ping times.
Testing on other days shows less of a difference between the times.
There is extra overhead for the database ping as compared to the ping utility, so it would be expected to take a more time.
Just how much more time seems to vary with internet performance, as the remote databases are accessed via the internet.
Multiple Row Set Ping
If the extra time is due to database overhead, we should be able to devise a test that minimizes the database overhead.
One way to do this is to use a single SELECT statement that returns many rows, so that only 1 SQL execution is required.
Craft the SQL so that each row requires an entire TCP packet. Then get the total time required to retrieve the rows, and calculate the average latency.
The following SQL was used:
select systimestamp at local as ping_timestamp, rpad(''X'',1500-35-28,''X'') as filler from dual@dblink_name connect by level <= 10;
The standard TCP packet size is 1500 bytes, the length of `systimestamp at local` is 35, and the ICMP header is 28 bytes (on Linux anyway).
When this new test is run, as the number of packets increases, the average time per row should be close to the times seen by using the ‘ping’ utility.
This script was used to do that: ping-remote-db-multirow.sql
Results of the multirow db-ping test:
$ echo exit | sql -S -L jkstill/XXX@orcl/pdb1 @ping-remote-db-multirow.sql Local Seconds Begin: 1641644528.779681 Local Seconds End: 1641644528.849177 Round Trip: 0.069496 ============================== Local Seconds Begin: 1641644528.849205 Local Seconds End: 1641644529.350788 Round Trip: 0.501583 ============================== Local Seconds Begin: 1641644529.350817 Local Seconds End: 1641644530.000492 Round Trip: 0.649675 ============================== Local Seconds Begin: 1641644530.000530 Local Seconds End: 1641644530.042419 Round Trip: 0.041889 ============================== Local Seconds Begin: 1641644530.042447 Local Seconds End: 1641644530.146593 Round Trip: 0.104146 ============================== Local Seconds Begin: 1641644530.146621 Local Seconds End: 1641644530.264522 Round Trip: 0.117901 ============================== Local Seconds Begin: 1641644530.264549 Local Seconds End: 1641644530.625663 Round Trip: 0.361114 ============================== Local Seconds Begin: 1641644530.625691 Local Seconds End: 1641644531.093010 Round Trip: 0.467319 ============================== Local Seconds Begin: 1641644531.093038 Local Seconds End: 1641644531.397555 Round Trip: 0.304517 ============================== Local Seconds Begin: 1641644531.397584 Local Seconds End: 1641644531.448949 Round Trip: 0.051365 ============================== Connect Time: 3.449366 Round Trip Avg: 0.289347 Iterations: 10
And now a ping test with a count of 10:
$ ping -c 10 137.nnn.nn.nnn PING 188.8.131.52 (184.108.40.206) 56(84) bytes of data. 64 bytes from 220.127.116.11: icmp_seq=1 ttl=48 time=207 ms 64 bytes from 18.104.22.168: icmp_seq=2 ttl=48 time=124 ms 64 bytes from 22.214.171.124: icmp_seq=3 ttl=48 time=1281 ms 64 bytes from 126.96.36.199: icmp_seq=4 ttl=48 time=249 ms 64 bytes from 188.8.131.52: icmp_seq=5 ttl=48 time=383 ms 64 bytes from 184.108.40.206: icmp_seq=6 ttl=48 time=58.7 ms 64 bytes from 220.127.116.11: icmp_seq=7 ttl=48 time=588 ms 64 bytes from 18.104.22.168: icmp_seq=8 ttl=48 time=84.0 ms 64 bytes from 22.214.171.124: icmp_seq=9 ttl=48 time=74.8 ms 64 bytes from 126.96.36.199: icmp_seq=10 ttl=48 time=81.1 ms --- 137.nnn.nn.nnn ping statistics --- 10 packets transmitted, 10 received, 0% packet loss, time 9043ms rtt min/avg/max/mdev = 58.797/313.416/1281.181/360.031 ms, pipe 2
With an average ping time of 313 ms, and the average time with the db ping test at 289 ms, the results are quite close between the two tests.
These results are quite poor. While writing this blog over the course of a few days, I have seen the round trip latency for this database vary from 25-500 ms.
If this were a real production database rather than a throw away test database, it would be a good time to call in a network engineer.
Now that we know the db ping test is fairly accurate, we can trust the results when used with an ATP database.
Here are the results when testing against and ATP database it is in an Oracle OCI data center in Phoenix Arizona, USA, which is about 1200 miles (1931 km) from my location.
Here is the SQL for the database link:
-- for a dblink the oracle server must have access to wallets -- so the wallets must be available on the db server create database link oci_link connect to jkstill identified by XXXX using '(description= (retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1522)(host=adb.us-phoenix-1.oraclecloud.com))(connect_data=(service_name=****************atp21c01_high.adb.oraclecloud.com))(security=(ssl_server_dn_match=true)(ssl_server_cert_dn="CN=****.uscom-east-1.oraclecloud.com,OU=Oracle BMCS US,O=Oracle Corporation,L=Redwood City,ST=California,C=US")(my_wallet_directory=/u01/app/oracle/wallets/atp21c01)))' /
And here are the results:
$ echo exit | sql -S -L jkstill/XXX@orcl/pdb1 @ping-remote-db-multirow.sql Local Seconds Begin: 1641644841.938675 Local Seconds End: 1641644842.365383 Round Trip: 0.426708 ============================== Local Seconds Begin: 1641644842.365437 Local Seconds End: 1641644842.603871 Round Trip: 0.238434 ============================== Local Seconds Begin: 1641644842.603898 Local Seconds End: 1641644843.064586 Round Trip: 0.460688 ============================== Local Seconds Begin: 1641644843.064614 Local Seconds End: 1641644843.749951 Round Trip: 0.685337 ============================== Local Seconds Begin: 1641644843.749978 Local Seconds End: 1641644844.319656 Round Trip: 0.569678 ============================== Local Seconds Begin: 1641644844.319686 Local Seconds End: 1641644844.550935 Round Trip: 0.231249 ============================== Local Seconds Begin: 1641644844.550964 Local Seconds End: 1641644845.017424 Round Trip: 0.466460 ============================== Local Seconds Begin: 1641644845.017453 Local Seconds End: 1641644845.678804 Round Trip: 0.661351 ============================== Local Seconds Begin: 1641644845.678831 Local Seconds End: 1641644845.868892 Round Trip: 0.190061 ============================== Local Seconds Begin: 1641644845.868920 Local Seconds End: 1641644846.417399 Round Trip: 0.548479 ============================== Connect Time: 2.455469 Round Trip Avg: 0.483373 Iterations: 10
The average latency is 483 ms, which is an excessive time for this location.
The time required to connect is being measured as well. In this case the connection took nearly 2.5 seconds.
Previous tests on a different day had much better results, so clearly there are currently some network performance issues.
If there were complaints of slow performance retrieving results from this database, the network performance would certainly be a prime suspect.
The results appear good enough for general use, but, there are some refinements that could be made.
TCP Packet Size
The calculate in the SQL for packet size does not take into account the overhead required for organization of the data that is returned. The ICMP overhead and size the timestamp string were used, but there will be a few bytes used to organize that data. Each row returned is likely required 2 TCP packets.
To correct this, the ‘tcpdump’ utility could be used to analyze the size of the data and overhead, and reduce the size of the filler string used in the SQL.
Local Network Latency
No attempt has been made to correct for local network latency.
For instance, my testing was performed from a client that is in the same rack as the local database. There is a small amount of latency between those machines, about 0.000250 seconds, or .25 ms. The timestamps are being returned from the local machine via ‘timestamp at local’, where local is the client machine.
To make this more accurate, the latency could be calculated and accounted for.
However, that is really too much work, as the simpler solution is to just run these tests directly from the server where the local database is running.
Even without those refinements, the results are acceptable for most use cases.
Now that you have the means, you can try this out on your own.
The full code is found at db-ping