MySQL: Replacing URL Escape Sequences
Sep 26, 2008 / By Augusto Bott
So you want to store URLs in MySQL, and the URLs have those annoying %20%27%7C%26%5E%2B%2D%25 symbols? And you want to be able to show your users some kind of human-readable information. You might want to consider using this trick. Take this list of commonly escaped characters as an example:
%20 - space %27 - ' %7C - | %26 - & %5E - ^ %2B - + %2D - - %25 - %
So, how about we do some search’n'replace on that?
mysql> SET @url:='%20%27%7C%26%5E%2B%2D%25';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT @url as original,
-> REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
-> REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
-> @test,'%20',' '),
-> '%27','''),
-> '%7C','|'), -- REPLACE() is case sensitive
-> '%7c','|'), -- so we have
-> '%26','&'),
-> '%5E','^'),
-> '%5e','^'), -- to repeat
-> '%2D','-'),
-> '%2d','-'), -- ourselves
-> '%2B','+'),
-> '%2b','+'), -- sometimes
-> '%25','%') as replaced;
+--------------------------+----------+
| original | replaced |
+--------------------------+----------+
| %20%27%7C%26%5E%2B%2D%25 | '|&^+-% |
+--------------------------+----------+
1 row in set (0.01 sec)
mysql>
We can easily turn this into a function:
mysql> CREATE DATABASE foo;
Query OK, 1 row affected (0.03 sec)
mysql> USE foo;
Database changed
mysql> CREATE FUNCTION url_replace
-> (url VARCHAR(1024))
-> RETURNS VARCHAR(1024)
-> BEGIN
-> RETURN REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
-> REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
-> @test,'%20',' '),
-> '%27','''),
-> '%7C','|'),
-> '%7c','|'),
-> '%26','&'),
-> '%5E','^'),
-> '%5e','^'),
-> '%2D','-'),
-> '%2d','-'),
-> '%2B','+'),
-> '%2b','+'),
-> '%25','%');
-> END$$
Query OK, 0 rows affected (0.02 sec)
mysql> DELIMITER ;
mysql> SELECT url_replace('%20%27%7C%26%5E%2B%2D%25') as bar;
+----------+
| bar |
+----------+
| '|&^+-% |
+----------+
1 row in set (0.00 sec)
mysql>
I hope this is useful to you guys. Cheers!
Useful links:

Hi!
It’s a nice try, but I don’t think this covers all cases. Lookhere (rfc 1738):
“…octets may be encoded by a character triplet consisting of the character “%” followed by the two hexadecimal digits…
and
”
Octets must be encoded if they have no corresponding graphic character within the US-ASCII coded character set, if the use of the corresponding character is unsafe, or if the corresponding character is reserved for some other interpretation within the particular URL scheme.
”
So, this basically says that *any* character may be encoded, and that some *must* be encoded. So in order to do it right, you actually have to decode every occurrence of %HH.
In addition, (can’t find which RFC that is), the space character (ascii 32) maybe encoded as +
As far as I can see the only way to do this reliably and correct is to loop over all characters. Ugly, slow…but correct. So I guess I would recommend everybody to deal with this in the application before putting it into the db. That, or write a UDF (for performance reasons).
kind regards,
Roland Bouman
@Roland,
Thanks for your feedback! Indeed, replacing every occurrence of %HH is a requirement for full compliance (as well as parsing correctly the space character). This snippet is an (incomplete) example.
Cheers!
Yes not complete. The example as given doesn’t parse in MySQL.
I recently wanted to replace a string within MySQL on the fly, but the field could contain 2 items. So I wrapped a REPLACE() within a REPLACE(), such as:
REPLACE(REPLACE(field_name, “what we are looking for”, “replace first instance”), “something else we are looking for”, “replace second instance”)
This is the syntax I used to detect a boolean value:
REPLACE(REPLACE(field, 1, “Yes”), 0, “No”)
Hope this helps!