SQL server : Convert BBcode to HTML

Oct 19, 2011 / By Mohammed Mawla

Tags: , ,

If you ever participated in some forum or message board , then you know what’s BBCode.

What’s BBCode ?

BBCode is short for Bulletin Board Code and used to format posts in forums or message boards as a fast way to achieve formating without the need to add any complex HTML code beside allowing safer posts (no JS code for example). BBCode uses tags indicated by rectangular brackets surrounding a keyword. At the time of displaying as part of a webpage , those tags are converted to HTML.

A simple example of BBCode would be to make some text bold , this can be simply written like that [b]Bold Text[/b] or italic [i]italic text[/i]. In HTML , this can be done by font tags (deprecated in newer HTML versions) or CSS text properties. I tell you what! I’m writing this blog and I’m even using some BBCode tags to format text.

Case defined

One of our clients had a database that contained forum posts stored inside the database and the rows included various BBCode tags. They decided to migrate to a new platform and it was required that the BBCode stored to be migrated to its equivalent HTML ones.

The requirement was to export the data to flatfiles and the code to be migrated on the fly. Converting them at the App side wasn’t an option for the client due to the limitations of the destination platform.

Talk the talk

We determined that we need to commit the following actions :

1- Determine the BBcode tags that are stored in the table(s). This was an easy task since they had a list of supported BBCode tags.

Although not exactly the list that we had , the following page lists some of the famous BBCode operators

2- Figure out the equivalent HTML for each BBCode.

Some of the tags were pretty simple HTML code such as Bold , Italic formatting so it was easy to figure it out. Some were subject to the client direction since they have specific platform; examples were images and video posts.

We conducted some research on the different tags and made a list of the BBCode tags and their “equivalent” HTML ones.

3- After that , we needed to find out a way to convert the BBCode data to HTML on the fly while writing the output to flatfiles.

Walk the walk

Because T-SQL is a bit limited when it comes to string manipulations compared to .NET , we opted to use CLR to use .Net regular expressions and take advantage of its abilities to fast parse large amounts of text to find specific character patterns.

I’m not going to dip which is better, TSQL or CLR , to use for different tasks but it is generally accepted that CLR excels for procedural, computational work ; for data access the easier and direct T-SQL wins. The following whitepaper, Using CLR Integration in SQL Server 2005, provides more info about that.

Now we have the BBCode tag , the equivalent HTML and determined that CLR is the way to , we just need to code!

I used Visual C# express 2010 to create a class DLL so that I can use later as a CLR function. The code uses namespace System.Text.RegularExpressions for regular expression functionality.

Parts of the code I saw on some forums online, can’t recall exactly where but I wrote more to fulfill the need.

The following is the main function containing regular expressions definitions. The Whole code file is attached as a CShap file.

static Converter_bbcode()
    {
        try
        {

        BBCode_parser.Add(new RegexFormatter(@"", string.Empty));

        // Font
        BBCode_parser.Add(new RegexFormatter(@"[b(?:s*)]((.|n)*?)[/b(?:s*)]", "<strong>$1</strong>"));
        BBCode_parser.Add(new RegexFormatter(@"[bold(?:s*)]((.|n)*?)[/bold(?:s*)]", "<strong>$1</strong>"));
        BBCode_parser.Add(new RegexFormatter(@"[i(?:s*)]((.|n)*?)[/i(?:s*)]", "<span style="font-style: italic">$1</span>"));
        BBCode_parser.Add(new RegexFormatter(@"[italic(?:s*)]((.|n)*?)[/italic(?:s*)]", "<span style="font-style: italic">$1</span>"));
        BBCode_parser.Add(new RegexFormatter(@"[u(?:s*)]((.|n)*?)[/u(?:s*)]", "<span style="text-decoration: underline">$1</span>"));
        BBCode_parser.Add(new RegexFormatter(@"[underline(?:s*)]((.|n)*?)[/underline(?:s*)]", "<span style="text-decoration: underline">$1</span>"));
        BBCode_parser.Add(new RegexFormatter(@"[s(?:s*)]((.|s)*?)[/s(?:s*)]", "<span style="text-decoration: line-through">$1</span>"));
        BBCode_parser.Add(new RegexFormatter(@"[strike(?:s*)]((.|s)*?)[/strike(?:s*)]", "<span style="text-decoration: line-through">$1</span>"));
        BBCode_parser.Add(new RegexFormatter(@"[size=((.|n)*?)(?:s*)]((.|n)*?)[/size(?:s*)]", "<font>$3</font>"));

            //color
        BBCode_parser.Add(new RegexFormatter(@"[color=((.|n)*?)(?:s*)]((.|n)*?)[/color(?:s*)]", "<span style="color:$1">$3</span>"));

        BBCode_parser.Add(new RegexFormatter(@"[background=((.|n)*?)(?:s*)]((.|n)*?)[/background(?:s*)]", "<span style="background-color:$1">$3</span>"));
         //highlight
        BBCode_parser.Add(new RegexFormatter(@"[highlight(?:s*)]((.|n)*?)[/highlight(?:s*)]", "<span style="background-color:yellow">$1</span>"));
        BBCode_parser.Add(new RegexFormatter(@"[highlight=((.|n)*?)(?:s*)]((.|n)*?)[/highlight(?:s*)]", "<span style="background-color:$1">$3</span>"));
        BBCode_parser.Add(new RegexFormatter(@"[hl(?:s*)]((.|n)*?)[/hl(?:s*)]", "<span style="background-color:yellow">$1</span>"));
        BBCode_parser.Add(new RegexFormatter(@"[hl=((.|n)*?)(?:s*)]((.|n)*?)[/hl(?:s*)]", "<span style="background-color:$1">$3</span>"));

        //blink
        BBCode_parser.Add(new RegexFormatter(@"[blink(?:s*)]((.|n)*?)[/blink(?:s*)]", "<span style="text-decoration: blink">$1</span>"));

          //Align
        BBCode_parser.Add(new RegexFormatter(@"[left(?:s*)]((.|n)*?)[/left(?:s*)]", "<div style="text-align:left">$1</div>"));
        BBCode_parser.Add(new RegexFormatter(@"[center(?:s*)]((.|n)*?)[/center(?:s*)]", "<div style="text-align:center">$1</div>"));
        BBCode_parser.Add(new RegexFormatter(@"[right(?:s*)]((.|n)*?)[/right(?:s*)]", "<div style="text-align:right">$1</div>"));

        //URL
        BBCode_parser.Add(new RegexFormatter(@"[url(?:s*)]www.(.*?)[/url(?:s*)]", "<a href="http://www.$1" title="$1">$1</a>"));
        BBCode_parser.Add(new RegexFormatter(@"[url(?:s*)]((.|n)*?)[/url(?:s*)]", "<a href="$1" title="$1">$1</a>"));
        BBCode_parser.Add(new RegexFormatter("\[url=(?:"|&quot;|")((.|\n)*?)(?:\s*)(?:"|&quot;|")\]((.|\n)*?)\[/url(?:\s*)\]", "<a href="$1" title="$1">$3</a>"));
        BBCode_parser.Add(new RegexFormatter(@"[url=((.|n)*?)(?:s*)]((.|n)*?)[/url(?:s*)]", "<a href="$1" title="$1">$3</a>"));

        //Email
        BBCode_parser.Add(new RegexFormatter(@"[email(?:s*)]((.|n)*?)[/email(?:s*)]", "<a href="mailto:$1">$1</a>"));
        BBCode_parser.Add(new RegexFormatter(@"[email=""((.|n)*?)(?:s*)""]((.|n)*?)[/email(?:s*)]", "<a href="mailto:$1" title="$3">$3</a>"));
        BBCode_parser.Add(new RegexFormatter(@"[email=((.|n)*?)(?:s*)]((.|n)*?)[/email(?:s*)]", "<a href="mailto:$1" title="$3">$3</a>"));

        //Image
        BBCode_parser.Add(new RegexFormatter(@"[img(?:s*)]((.|n)*?)[/img(?:s*)]", "<img src="$1" alt="image" class="bbcode-image" />"));
        BBCode_parser.Add(new RegexFormatter(@"[img align=((.|n)*?)(?:s*)]((.|n)*?)[/img(?:s*)]", "<img src="$3" align="$1" alt="image" class="bbcode-image" />"));
        BBCode_parser.Add(new RegexFormatter(@"[img=((.|n)*?),((.|n)*?),((.|n)*?)(?:s*)]((.|n)*?)[/img(?:s*)]", "<img width="$1" height="$3" align="$5" src="$7" alt="image" class="bbcode-image" />"));
        BBCode_parser.Add(new RegexFormatter(@"[img=((.|n)*?),((.|n)*?)(?:s*)]((.|n)*?)[/img(?:s*)]", "<img width="$1" height="$3" src="$5" alt="image" class="bbcode-image" />"));

        BBCode_parser.Add(new RegexFormatter(@"[image=((.|n)*?)(?:s*)]", "<div class="legacyImgWrapper"><a class="BoardRowBLink" target="_blank" href="$1" /><img src="$1" alt="image" class="bbcode-image" /></a></div>"));
        BBCode_parser.Add(new RegexFormatter(@"[imagecenter=((.|n)*?)(?:s*)]", "<div class="legacyImgWrapper"><a class="BoardRowBLink" target="_blank" href="$1" /><img src="$1" align="center" alt="image" class="bbcode-image" /></a></div>"));
        BBCode_parser.Add(new RegexFormatter(@"[imageleft=((.|n)*?)(?:s*)]", "<div class="legacyImgWrapper"><a class="BoardRowBLink" target="_blank" href="$1" /><img src="$1" align="left" alt="image" class="bbcode-image" /></a></div>"));
        BBCode_parser.Add(new RegexFormatter(@"[imageright=((.|n)*?)(?:s*)]", "<div class="legacyImgWrapper"><a class="BoardRowBLink" target="_blank" href="$1" /><img src="$1" align="right" alt="image" class="bbcode-image" /></a></div>"));
        BBCode_parser.Add(new RegexFormatter(@"[imagelink src=((.|n)*?)url='((.|n)*?)']((.|n)*?)[/imagelink(?:s*)]", "<div class="legacyImgWrapper"><a class="BoardRowBLink" target="_blank" href="$3" /><img src="$3" alt="image" class="bbcode-image" /></a></div>"));

       //Flash
        BBCode_parser.Add(new RegexFormatter(@"[flash(?:s*)]((.|n)*?)[/flash(?:s*)]", ""));
        BBCode_parser.Add(new RegexFormatter(@"[flash=((.|n)*?),((.|n)*?)(?:s*)]((.|n)*?)[/flash(?:s*)]", ""));
        BBCode_parser.Add(new RegexFormatter(@"[*(?:s*)]s*([^[]*)", "<li>$1</li>"));

        //List
        BBCode_parser.Add(new RegexFormatter(@"[list(?:s*)]((.|n)*?)[/list(?:s*)]", "<ul class="bbcode-list">$1</ul>"));
        BBCode_parser.Add(new RegexFormatter(@"[list=1(?:s*)]((.|n)*?)[/list(?:s*)]", string.Format(CultureInfo.InvariantCulture, FORMAT_LIST, "decimal"), false));
        BBCode_parser.Add(new RegexFormatter(@"[list=i(?:s*)]((.|n)*?)[/list(?:s*)]", string.Format(CultureInfo.InvariantCulture, FORMAT_LIST, "lower-roman"), false));
        BBCode_parser.Add(new RegexFormatter(@"[list=I(?:s*)]((.|n)*?)[/list(?:s*)]", string.Format(CultureInfo.InvariantCulture, FORMAT_LIST, "upper-roman"), false));

        BBCode_parser.Add(new RegexFormatter(@"[ol(?:s*)]((.|n)*?)[/ol(?:s*)]", "<ul class="bbcode-list">$1</ul>"));
        BBCode_parser.Add(new RegexFormatter(@"[ol=1(?:s*)]((.|n)*?)[/ol(?:s*)]", string.Format(CultureInfo.InvariantCulture, FORMAT_LIST, "decimal"), false));
        BBCode_parser.Add(new RegexFormatter(@"[ol=i(?:s*)]((.|n)*?)[/ol(?:s*)]", string.Format(CultureInfo.InvariantCulture, FORMAT_LIST, "lower-roman"), false));
        BBCode_parser.Add(new RegexFormatter(@"[ol=I(?:s*)]((.|n)*?)[/ol(?:s*)]", string.Format(CultureInfo.InvariantCulture, FORMAT_LIST, "upper-roman"), false));

        BBCode_parser.Add(new RegexFormatter(@"[ol(?:s*)]((.|n)*?)[/ol(?:s*)]", "<ol>$1</ol>"));

        BBCode_parser.Add(new RegexFormatter(@"[dir(?:s*)]((.|n)*?)[/dir(?:s*)]", "$1"));

        BBCode_parser.Add(new RegexFormatter(@"[li(?:s*)]((.|n)*?)[/li(?:s*)]", "<li>$1</li>"));

        BBCode_parser.Add(new RegexFormatter(@"[ul(?:s*)]((.|n)*?)[/ul(?:s*)]", "<ul class="bbcode-list">$1</ul>"));
        BBCode_parser.Add(new RegexFormatter(@"[ul=1(?:s*)]((.|n)*?)[/ul(?:s*)]", string.Format(CultureInfo.InvariantCulture, FORMAT_LIST, "decimal"), false));
        BBCode_parser.Add(new RegexFormatter(@"[ul=i(?:s*)]((.|n)*?)[/ul(?:s*)]", string.Format(CultureInfo.InvariantCulture, FORMAT_LIST, "lower-roman"), false));
        BBCode_parser.Add(new RegexFormatter(@"[ul=I(?:s*)]((.|n)*?)[/ul(?:s*)]", string.Format(CultureInfo.InvariantCulture, FORMAT_LIST, "upper-roman"), false));

         BBCode_parser.Add(new RegexFormatter(@"[list=a(?:s*)]((.|n)*?)[/list(?:s*)]", "<span>$1</span>", false));
        BBCode_parser.Add(new RegexFormatter(@"[list=A(?:s*)]((.|n)*?)[/list(?:s*)]", "<span>$1</span>", false));

        BBCode_parser.Add(new SearchReplaceFormatter("r", ""));
        BBCode_parser.Add(new SearchReplaceFormatter("nn", "<br /><br />"));
        BBCode_parser.Add(new SearchReplaceFormatter("n", "<br />"));

        //code
        BBCode_parser.Add(new RegexFormatter(@"1[/code]((.|n)*?)[/code(?:s*)]", "<div class="bbc-codetitle">Code:</div><div class="bbc-codecontent"><pre>$1</pre></div>"));
        BBCode_parser.Add(new RegexFormatter(@"1[/php]((.|n)*?)[/php(?:s*)]", "<div class="bbc-codetitle">PHP Code:</div><div class="bbc-codecontent"><pre>$1</pre></div>"));

       //quote
        BBCode_parser.Add(new RegexFormatter(@"[quote=((.|n)*?)(?:s*)]((.|n)*?)[/quote(?:s*)]", "<strong>$1</strong> said: <blockquote>$3</blockquote>"));
        BBCode_parser.Add(new RegexFormatter(@"[quote=((.|n)*?)(?:s*)]", "<blockquote>$1</blockquote>"));
        BBCode_parser.Add(new RegexFormatter(@"[quote member=((.|n)*?)(?:s*)]((.|n)*?)[/quote(?:s*)]", "<strong>$1</strong> said: <blockquote>$3</blockquote>"));
        BBCode_parser.Add(new RegexFormatter(@"[quote(?:s*)]((.|n)*?)[/quote(?:s*)]", "<blockquote>$1</blockquote>"));
        BBCode_parser.Add(new RegexFormatter(@"[blockquote=((.|n)*?)(?:s*)]((.|n)*?)[/blockquote(?:s*)]", "<strong>$1</strong> said: <blockquote>$3</blockquote>"));
        BBCode_parser.Add(new RegexFormatter(@"[blockquote=((.|n)*?)(?:s*)]", "<blockquote>$1</blockquote>"));
        BBCode_parser.Add(new RegexFormatter(@"[blockquote(?:s*)]((.|n)*?)[/blockquote(?:s*)]", "<blockquote>$1</blockquote>"));

        BBCode_parser.Add(new RegexFormatter(@"[spoiler(?:s*)]((.|n)*?)[/spoiler(?:s*)]", "<span class="bbc-spoiler">$1</span>"));
        BBCode_parser.Add(new RegexFormatter(@"[indent(?:s*)]((.|n)*?)[/indent(?:s*)]", "<div class="bbc-indent">$1</div>"));

        BBCode_parser.Add(new RegexFormatter(@"[hr(?:s*)][/hr(?:s*)]", "<hr />"));
        BBCode_parser.Add(new RegexFormatter(@"[hr(?:s*)]", "<hr />"));

        BBCode_parser.Add(new RegexFormatter(@"[rule=((.|n)*?)(?:s*)]((.|n)*?)[/rule(?:s*)]", "<div style="height: 0pt;border-top: 1px solid $3;margin: auto;width: $1"></div>"));

        //Styles
        BBCode_parser.Add(new RegexFormatter(@"[size=+((.|n)*?)(?:s*)]((.|n)*?)[/size(?:s*)]", "<span style="font-size:$1em">$3</span>"));
        BBCode_parser.Add(new RegexFormatter(@"[size=((.|n)*?)(?:s*)]((.|n)*?)[/size(?:s*)]", "<span style="font-size:$1">$3</span>"));
        BBCode_parser.Add(new RegexFormatter(@"[font=((.|n)*?)(?:s*)]((.|n)*?)[/font(?:s*)]", "<span style="font-family:$1">$3</span>"));
        BBCode_parser.Add(new RegexFormatter(@"[align=((.|n)*?)(?:s*)]((.|n)*?)[/align(?:s*)]", "<span style="text-align:$1">$3</span>"));
        BBCode_parser.Add(new RegexFormatter(@"[float=((.|n)*?)(?:s*)]((.|n)*?)[/float(?:s*)]", "<span style="float:$1">$3</div>"));

        BBCode_parser.Add(new RegexFormatter(@"[*(?:s*)]s*([^[]*)", "<li>$1</li>"));
        //LINK
        BBCode_parser.Add(new RegexFormatter(@"[link(?:s*)]((.|n)*?)[/link(?:s*)]", "<a href="$1" target="_blank" title="$1">$1</a>"));
        BBCode_parser.Add(new RegexFormatter(@"[link=((.|n)*?)(?:s*)]((.|n)*?)[/link(?:s*)]", "<a href="$1" target="_blank" title="$1">$3</a>"));
        BBCode_parser.Add(new RegexFormatter(@"[link=((.|n)*?)(?:s*)]", "<a href="$1" target="_blank" title="$1">$1</a>"));

                //Youtube
        BBCode_parser.Add(new RegexFormatter(@"[youtube(?:s*)]((.|n)*?)[/youtube(?:s*)]", ""));
        }
        catch (Exception)
        {
            return;
        }

}

There are two important requirements when building the code:

1- .Net framework version

You need to build the dll to work with SQL server CLR base; VS 2010 , by default, builds projects against .net framework 4.0 which is NOT supported even with SQL server 2008 R2

SQL Server 2008 & 2008 R2 still load 2.0 CLR ;SQL Server 2008 uses .NET Framework 3.5 SP1 which is just an update of framework 2.0 base with some new assemblies.

You can verify the CLR version by running this code

SELECT * FROM sys.dm_clr_properties

Result on my computer which is SQL server 2008 R2

name----------------value
-----------------------------------------------------------------------------------
directory----------------C:WindowsMicrosoft.NETFramework64v2.0.50727
version---------------- v2.0.50727
state---------------- CLR is initialized

If you build the DLL against .NET framework 4.0 and try to load into SQL server, you'll get this error

Msg 6257, Level 16, State 1, Line 1
CREATE ASSEMBLY for assembly 'bbcode_parser' failed because the assembly is built for an unsupported version of the Common Language Runtime.

If you are using IDE to build the DLL then you can define the target framework from project properties, application tab as in the screenshot here

.net framework target

If you are compiling the code using C# compiler , then having only .net 3.5 should be enough; all you want to do is to browse to Microsoft.NETFramework folder under the system directory to compile the code.

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:WindowsMicrosoft.NETFrameworkv3.5>csc /out:d:tempbbcode_parser.dll /target:library d:tempcode.cs
Microsoft (R) Visual C# 2008 Compiler version 3.5.30729.5420
for Microsoft (R) .NET Framework version 3.5
Copyright (C) Microsoft Corporation. All rights reserved.

2- Assembly access permission

Depending on the assembly role, we specify its access permissions. Usually SAFE access permission should be enough if the assembly does NOT access external system resources like in our case. SAFE is the most restrictive permission set; code executed by an assembly with SAFE permissions cannot access external system resources such as files, the network, environment variables, or the registry.

I have a blog where I had to sign the assembly because it needed access to external resources (Web resources) , we don't need that here. More info about creating an assembly.

Just for information
IDE: You can sign the assembly from project properties > signing tab

Signing

C# compiler : you'll need to specify key file using parameter /keyfile

You'll need first to create a key file using Strong Name Tool (Sn.exe)

Using the assembly inside SQL server

With the DLL in hand , all we do is to create an assembly inside SQL server , build a CLR function on top of it and call the function.

1- Ensure CLR is enabled

Exec master..Sp_configure 'clr enabled', 1
Go
Reconfigure
Go

2- Create the assembly with access permission=SAFE

CREATE assembly bbcode_parser FROM 'd:tempbbcode_parser.dll'
WITH PERMISSION_SET=SAFE
GO

3-Create the CLR function

CREATE function bbcode_parser (@bbcode NVARCHAR(MAX),@scrubHtml bit) returns nvarchar(max)
AS external name bbcode_parser.Converter_bbcode.BBCode_To_HTML_Convert ;
Go

Testing the function

We can test the function by feeding it plain BBCode. Here are some examples and the result.

Browser should render the effects of the real output

set nocount on
GO
Select dbo.bbcode_parser ('[hl]Text with yellow background[/hl]',0)
Go
Select dbo.bbcode_parser ('[u] Underline text[/u]',0)
Go
Select dbo.bbcode_parser ('[link]http://www.pythian.com[/link]',0)
Go
select dbo.bbcode_parser ('[ol][li]{point1}[/li] [li]{point2}[/li][/ol]',0)

Text with yellow background

Underline text

http://www.pythian.com

  • {point1}
  • {point2}

I hope that's helpful for any of you and I apologize if it is a bit long, just trying to make it clear

Cheers

2 Responses to “SQL server : Convert BBcode to HTML”

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>