THE WORLD DISCUSSES #PYTHIAN ON TWITTER. HAVE A QUESTION? USE OUR HASHTAG AND ASK AWAY.

MySQL: Replacing URL Escape Sequences

So you want to store URLs in MySQL, and the URLs have those annoying %20%27%7C%26%5E%2B%2D%25 symbols? And you want to be able to show your users some kind of human-readable information. You might want to consider using this trick. Take this list of commonly escaped characters as an example:

%20 - space
%27 - '
%7C - |
%26 - &
%5E - ^
%2B - +
%2D - -
%25 - %

So, how about we do some search’n'replace on that?

mysql> SET @url:='%20%27%7C%26%5E%2B%2D%25';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @url as original,
    ->        REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    ->        REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    ->        @test,'%20',' '),
    ->        '%27','\''),
    ->        '%7C','|'),  -- REPLACE() is case sensitive
    ->        '%7c','|'),  -- so we have
    ->        '%26','&'),
    ->        '%5E','^'),
    ->        '%5e','^'),  -- to repeat
    ->        '%2D','-'),
    ->        '%2d','-'),  -- ourselves
    ->        '%2B','+'),
    ->        '%2b','+'),  -- sometimes
    ->        '%25','%') as replaced;
+--------------------------+----------+
| original                 | replaced |
+--------------------------+----------+
| %20%27%7C%26%5E%2B%2D%25 |  '|&^+-% |
+--------------------------+----------+
1 row in set (0.01 sec)

mysql>

We can easily turn this into a function:

mysql> CREATE DATABASE foo;
Query OK, 1 row affected (0.03 sec)

mysql> USE foo;
Database changed
mysql> CREATE FUNCTION url_replace
    -> (url VARCHAR(1024))
    -> RETURNS VARCHAR(1024)
    -> BEGIN
    -> RETURN REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    ->        REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    ->        @test,'%20',' '),
    ->        '%27','\''),
    ->        '%7C','|'),
    ->        '%7c','|'),
    ->        '%26','&'),
    ->        '%5E','^'),
    ->        '%5e','^'),
    ->        '%2D','-'),
    ->        '%2d','-'),
    ->        '%2B','+'),
    ->        '%2b','+'),
    ->        '%25','%');
    -> END$$
Query OK, 0 rows affected (0.02 sec)

mysql> DELIMITER ;

mysql> SELECT url_replace('%20%27%7C%26%5E%2B%2D%25') as bar;
+----------+
| bar      |
+----------+
|  '|&^+-% |
+----------+
1 row in set (0.00 sec)

mysql>

I hope this is useful to you guys. Cheers!

Useful links:

4 Responses

  1. Hi!

    It’s a nice try, but I don’t think this covers all cases. Lookhere (rfc 1738):

    “…octets may be encoded by a character triplet consisting of the character “%” followed by the two hexadecimal digits…

    and


    Octets must be encoded if they have no corresponding graphic character within the US-ASCII coded character set, if the use of the corresponding character is unsafe, or if the corresponding character is reserved for some other interpretation within the particular URL scheme.

    So, this basically says that *any* character may be encoded, and that some *must* be encoded. So in order to do it right, you actually have to decode every occurrence of %HH.

    In addition, (can’t find which RFC that is), the space character (ascii 32) maybe encoded as +

    As far as I can see the only way to do this reliably and correct is to loop over all characters. Ugly, slow…but correct. So I guess I would recommend everybody to deal with this in the application before putting it into the db. That, or write a UDF (for performance reasons).

    kind regards,
    Roland Bouman

  2. Augusto Bott says:

    @Roland,

    Thanks for your feedback! Indeed, replacing every occurrence of %HH is a requirement for full compliance (as well as parsing correctly the space character). This snippet is an (incomplete) example.

    Cheers!

  3. Websurfer says:

    Yes not complete. The example as given doesn’t parse in MySQL.

  4. I recently wanted to replace a string within MySQL on the fly, but the field could contain 2 items. So I wrapped a REPLACE() within a REPLACE(), such as:

    REPLACE(REPLACE(field_name, “what we are looking for”, “replace first instance”), “something else we are looking for”, “replace second instance”)

    This is the syntax I used to detect a boolean value:

    REPLACE(REPLACE(field, 1, “Yes”), 0, “No”)

    Hope this helps!

Leave a Reply

Start NowWith Pythian - database design, management and emergency handling capabilities...

Live Updates

pythian: RT @FN_Press2: Schooner Information Technology Teams with Pythian to Deliver Advanced Support and High... http://finanznachrichten.de/20
more



Testimonials

  • Serge Racine

    DBA, Brookfield Energy

    We are very satisfied by the service given to us by Andre and Shakir in support of our recent data quality and reorganization initiative.... more