Take the salary data from the example employees database. Run two queries against the same data, one taking a SUM of the data in a subquery, another using GROUP BY WITH ROLLUP to calculate the same SUM.

```
mysql> select sum(hourly) from (
-> select emp_no,year(from_date),
-> sum(salary/(52*40)) as hourly
-> from salaries-> group by emp_no,year(from_date)
-> ) t;
+---------------+
| sum(hourly) |
+---------------+
| 87250372.2511 |
+---------------+
1 row in set (47.34 sec)
mysql> select emp_no,year(from_date),sum(salary/(52*40))
-> from salaries
-> group by emp_no,year(from_date) with rollup;
+--------+-----------------+---------------------+
| emp_no | year(from_date) | sum(salary/(52*40)) |
+--------+-----------------+---------------------+
...
| 240754 | 1994 | 22.6274 |
| 240754 | 1995 | 23.9077 |
| 240754 | 1996 | 24.3221 |
| 240754 | 1997 | 26.2399 |
...
| 499998 | 1999 | 24.0245 |
| 499998 | 2000 | 24.6067 |
| 499998 | 2001 | 26.4438 |
| 499998 | NULL | 201.9183 |
| 499999 | 1997 | 30.6284 |
| 499999 | 1998 | 32.2322 |
| 499999 | 1999 | 34.0120 |
| 499999 | 2000 | 35.7341 |
| 499999 | 2001 | 37.1649 |
| 499999 | NULL | 169.7716 |
| NULL | NULL | 87250364.1424 |
+--------+-----------------+---------------------+
3142278 rows in set (8.76 sec)
```

According to the MySQL documentation, the calculations in the above queries are done using fixed-point exact value arithmetic. How is it that the SUM using a subquery and the GROUP BY WITH ROLLUP differ so much?

It’s certainly worth discussing MySQL floating-point approximate value types vs. the fixed-point exact value types and how conversions and calculations on each are handled. The floating-point (approximate value) types are FLOAT, REAL, and DOUBLE, and the fixed-point (exact value) types are INTEGER, SMALLINT, DECIMAL, and NUMERIC. Floating-point means the decimal point can be placed anywhere relative to the significant digits of the number with the actual position being indicated separately. A fixed-point value is an integer scaled by a specific factor.

MySQL 5.5 added support for precision math including the MySQL library for fixed-point arithmetic that allows operations to be handled in the same manner across different platforms and not dependent on the underlying C library. With this update, if no approximate values or strings are being used in a calculation, expressions are evaluated using DECIMAL exact value arithmetic with precision to 65 digits. For GROUP BY functions, STDDEV() and VARIANCE() return DOUBLE, an approximate floating-point type, and SUM() and AVG() return a DECIMAL for exact-value arguments and a DOUBLE for approximate value.

Type conversion in MySQL is handled using floating-point values. The results of type conversion may vary and can be affected by factors such as computer architecture or the compiler version or optimization level. One way to avoid these problems is to use an explicit CAST() rather than the implicit conversion.

So what does this mean and where will we encounter it in real world calculations?

There are certain naturally occurring ratios, such as the length of any circle’s diameter compared to its circumference, i.e. ?, that compute to an irrational number. An irrational number, as opposed to a rational number, is any real number that cannot be expressed as a fraction a/b, where a is an integer and b is a non-zero integer. Irrational numbers cannot be represented as terminating or repeating decimals. True irrational numbers will not be encountered in computing, only approximations thereof. An infinite continued fraction, such as 1/3, is another example in which the decimal value is approximated in computing. In 1949, John von Neumann et al. used ENIAC to compute 2037 digits of pi; a calculation that took 70 hours. Over the course of the 20th century, additional thousands of decimal places were obtained with the million-digit milestone in 1973. On August 6, 2010, Alexander J. Yee and Shigeru Kondo announced that they calculated pi to 5 trillion decimal places on a personal computer, double the previous record.

So back to the initial example, which value is more accurate and which is the correct way to obtain the SUM? The truth is, neither is exactly accurate. By using a little algebra, the query can be simplified. The idea being to work with whole numbers in as many initial steps as possible.

```
mysql> select sum(salary)/(52*40) from salaries;
+---------------------+
| sum(salary)/(52*40) |
+---------------------+
| 87250364.1438 |
+---------------------+
1 row in set (2.27 sec)
```

“The moon makes the tides change, so the oceans are werewolves. That makes sense.”

]]>I have found the following:

- The user needs permissions for a temp directory to stream to/from. The my.cnf of the target database cannot be used because the user does not have permission to write to /tmp/mysql-stdout, so we set a tmpdir in a separate defaults-file.
- A backup target directory must be used that the user has read/write permissions to. It seems to me a target directory should not be needed for a streaming backup, but it is.
- A user who has SUPER privileges must be used to connect to the target database.
- Group and datadir permissions:

- The primary group for the user should be the same as the mysql user running the target database, i.e. pythian:mysql.
- The user must have 770 access to the database schema directories.
- The user must have 740 access to the all files under the mysql datadir.

Here is the innobackupex command that works for us.

$ innobackupex --defaults-file=/home/pythian/my.cnf --user=root --stream=tar ./ | gzip - > /home/pythian/backups/2011-08-02_backup.tar.gz

So if you are seeing errors such as,

tar: -: Cannot write: Broken pipe sh: /tmp/mysql-stout: Permission denied

do yourself a favor and run through this permission checklist. I hope it saves a lot of headaches.

]]>