MySQL: Replacing URL Escape Sequences

Sep 26, 2008 / By Augusto Bott

Tags:

So you want to store URLs in MySQL, and the URLs have those annoying %20%27%7C%26%5E%2B%2D%25 symbols? And you want to be able to show your users some kind of human-readable information. You might want to consider using this trick. Take this list of commonly escaped characters as an example:

%20 - space
%27 - '
%7C - |
%26 - &
%5E - ^
%2B - +
%2D - -
%25 - %

So, how about we do some search’n'replace on that?

mysql> SET @url:='%20%27%7C%26%5E%2B%2D%25';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @url as original,
    ->        REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    ->        REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    ->        @test,'%20',' '),
    ->        '%27','''),
    ->        '%7C','|'),  -- REPLACE() is case sensitive
    ->        '%7c','|'),  -- so we have 
    ->        '%26','&'),
    ->        '%5E','^'),
    ->        '%5e','^'),  -- to repeat 
    ->        '%2D','-'),
    ->        '%2d','-'),  -- ourselves
    ->        '%2B','+'),
    ->        '%2b','+'),  -- sometimes
    ->        '%25','%') as replaced;
+--------------------------+----------+
| original                 | replaced |
+--------------------------+----------+
| %20%27%7C%26%5E%2B%2D%25 |  '|&^+-% | 
+--------------------------+----------+
1 row in set (0.01 sec)

mysql>

We can easily turn this into a function:

mysql> CREATE DATABASE foo;
Query OK, 1 row affected (0.03 sec)

mysql> USE foo;
Database changed
mysql> CREATE FUNCTION url_replace 
    -> (url VARCHAR(1024))
    -> RETURNS VARCHAR(1024)
    -> BEGIN
    -> RETURN REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    ->        REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    ->        @test,'%20',' '),
    ->        '%27','''),
    ->        '%7C','|'),
    ->        '%7c','|'),
    ->        '%26','&'),
    ->        '%5E','^'),
    ->        '%5e','^'),
    ->        '%2D','-'),
    ->        '%2d','-'),
    ->        '%2B','+'),
    ->        '%2b','+'),
    ->        '%25','%');
    -> END$$
Query OK, 0 rows affected (0.02 sec)

mysql> DELIMITER ;

mysql> SELECT url_replace('%20%27%7C%26%5E%2B%2D%25') as bar;
+----------+
| bar      |
+----------+
|  '|&^+-% | 
+----------+
1 row in set (0.00 sec)

mysql>

I hope this is useful to you guys. Cheers!

Useful links:

4 comments on “MySQL: Replacing URL Escape Sequences

  1. Roland Bouman on said:

    Hi!

    It’s a nice try, but I don’t think this covers all cases. Lookhere (rfc 1738):

    “…octets may be encoded by a character triplet consisting of the character “%” followed by the two hexadecimal digits…

    and


    Octets must be encoded if they have no corresponding graphic character within the US-ASCII coded character set, if the use of the corresponding character is unsafe, or if the corresponding character is reserved for some other interpretation within the particular URL scheme.

    So, this basically says that *any* character may be encoded, and that some *must* be encoded. So in order to do it right, you actually have to decode every occurrence of %HH.

    In addition, (can’t find which RFC that is), the space character (ascii 32) maybe encoded as +

    As far as I can see the only way to do this reliably and correct is to loop over all characters. Ugly, slow…but correct. So I guess I would recommend everybody to deal with this in the application before putting it into the db. That, or write a UDF (for performance reasons).

    kind regards,
    Roland Bouman

  2. Augusto Bott on said:

    @Roland,

    Thanks for your feedback! Indeed, replacing every occurrence of %HH is a requirement for full compliance (as well as parsing correctly the space character). This snippet is an (incomplete) example.

    Cheers!

  3. Websurfer on said:

    Yes not complete. The example as given doesn’t parse in MySQL.

  4. walmley web design on said:

    I recently wanted to replace a string within MySQL on the fly, but the field could contain 2 items. So I wrapped a REPLACE() within a REPLACE(), such as:

    REPLACE(REPLACE(field_name, “what we are looking for”, “replace first instance”), “something else we are looking for”, “replace second instance”)

    This is the syntax I used to detect a boolean value:

    REPLACE(REPLACE(field, 1, “Yes”), 0, “No”)

    Hope this helps!

Leave a Reply

Your email address will not be published. Required fields are marked *

*

HTML tags are not allowed.