Beware: Default charset for mysqldump is utf8, regardless of server default charset

Posted in: Technical Track

I ran into this issue a while ago, and was reminded of it again recently.  mysqldumpuses a default charset of utf8, even when the default charset of the server is set differently.  Why does this matter?

The problem exists more in the fact that if you have string data that is in latin1 format, you are allowed to put in non-Latin characters. This can lead to lost data, especially when upgrading a major series (e.g. 5.0 to 5.1 or 5.1 to 5.5), because you’re supposed to export and import the data.

Also, when importing a backup of an InnoDB table, if there is an error with one of the parts of the INSERT, the whole INSERT statement rolls back.  I have experienced major data loss because the garbled characters cause an error when INSERTed, and it causes perfectly fine data *not* to import because they’re in the same INSERT statement as the garbled data.

For example:

First, set variables such on a MySQL server (5.0 or 5.1, I haven’t tested on 5.5):

mysql> show global variables like ‘%char%’;

+————————–+—————————-+

| Variable_name            | Value                      |

+————————–+—————————-+

| character_set_client     | latin1                     |

| character_set_connection | latin1                     |

| character_set_database   | latin1                     |

| character_set_filesystem | binary                     |

| character_set_results    | latin1                     |

| character_set_server     | latin1                     |

| character_set_system     | utf8                       |

| character_sets_dir       | /usr/share/mysql/charsets/ |

+————————–+—————————-+

8 rows in set (0.00 sec)

 

Then create these tables with data:

 

CREATE TABLE `test_utf8` (

`kwid` int(10) unsigned NOT NULL default ‘0’,

`keyword` varchar(80) NOT NULL default ”

) ENGINE=InnoDB DEFAULT CHARSET=utf8;

 

INSERT INTO `test_utf8` VALUES

(1,’watching’),(2,’poet’),(3,’?????????’),(4,’?????’);

 

CREATE TABLE `test_latin1` (

`kwid` int(10) unsigned NOT NULL default ‘0’,

`keyword` varchar(80) NOT NULL default ”

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

 

INSERT INTO `test_latin1` VALUES

(1,’watching’),(2,’poet’),(3,’?????????’),(4,’?????’);

 

Now compare:

mysqldump test > test_export_utf8.sql

mysqldump –default-character-set=latin1 test > test_export_latin1.sql

 

Note that the test export with the default character set of utf8 has mojibake whereas the export with latin1 does not.

 

So be *extremely* careful when using mysqldump – whether for backups or while upgrading.  You can checksum your data before and after  an export/import with mysqldump to be sure that your data is the same.

email

Interested in working with Sheeri? Schedule a tech call.

2 Comments. Leave new

Will setting mysqldump charset to be same as server charset solve the problem ?

Reply

I just spent 12 hours meticulously confirming this positively whopper of a brain-dead “feature” of mysql. Thanks very much for the confirmation. And yest, @satish, setting default-character-set=binary fixes the problem. You can do this in /etc/my.cnf or in your user’s .my.cnf. You can do it in the [client] section or in the [mysqldump] section. For sanity and consistency, I will put it in /etc/my.cnf, [mysqldump] section, and set the encoding to binary. (In theory that’s the same as latin1, but perhaps not always).

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *