XPathScript Reborn

Jul 7, 2010 / By Yanick Champoux


A long, long time ago, Matt Sergeant (of SpamAssassin fame) came up with an XML application server for Apache called AxKit. It was quite nifty, and offered many ways to transform XML documents. One of them was a home-brewed stylesheet language called XPathScript, which very quickly caught my fancy. It had a very Perlish way of doing things and felt infinitely more ergonomic to me than, say, the visual tag-storm that is XSLT. So, quite naturally, it was not long before I found myself wanting to use it not only in the context of AxKit, but as a generic XML transformer. A little hacking happened to decouple the core engine from its Apache roots, and XML::XPathScript was born.

That module served me quite well throughout the years, but for some time now I’ve had this plan of doing a clean rewrite patiently sitting on my back-burner. There are a few new features that I wanted to wedge in (an easier, cleaner way to create and extend stylesheets, a way for the transformation elements to pass information back and forth), and other infrastructure details that I wanted to fix (like the way the current XPathScript definition of ‘template’ and ‘stylesheet’ is the inverse of what one would expect). But of course, round tuits are rare, and that project lingered…

… but lingers no more. This week I had a smashing staycation, and thanks to a very understanding wife, I was able to indulge in the necessary hacking sessions to get the groundwork done. The result is not on CPAN yet, but can be perused on GitHub.

As an example is worth a thousand pages of documentation, let’s say that you want to turn this piece of DocBook-ish XML

<section title="Introduction">
<para>This is the first paragraph.</para>
<para>And here comes the second one.</para>
</section>

into the HTML

<h1>Introduction</h1>
<p class="first_para">This is the first paragraph.</p>
<p>And here comes the second one.</p>

Here is an XML::XSS script that will do the trick:

use XML::XSS;

my $xss = XML::XSS->new;

$xss->set(
    section => {
        showtag => 0,
        intro   => sub {
            my ( $self, $node ) = @_;
            $self->stash->{seen_para} = 0;    # reset flag
            return '<h1>' . $node->findvalue('@title') . '</h1>';
        },
    } );

$xss->set(
    para => {
        pre   => '<p>',
        post  => '</p>',
        process => sub {
            my ( $self, $node ) = @_;

            $self->set_pre('<p class="first_para">')
                unless $self->stash->{seen_para}++;    # same stash flag that 'section' resets

            return 1;
        },
    } );

print $xss->render( <<'END_XML' );

<doc>
    <section title="Introduction">
    <para>This is the first paragraph.</para>
    <para>And here comes the second one.</para>
    </section>

</doc>
END_XML
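
To make the moving parts of that script easier to see, here is a minimal, dependency-free sketch of the same idea: per-element rules with `pre`, `intro` and `post` pieces, plus a shared stash that lets one rule leave a note for another. This is emphatically not how XML::XSS is implemented. The `%rules` table, the `%stash` hash, the `value_of` and `render` helpers and the hash-based node layout are all made up for the illustration, and the "document" is a hand-built Perl structure rather than parsed XML.

```perl
use strict;
use warnings;

# Hypothetical mini-renderer, for illustration only (not XML::XSS internals).
# A node is { name => ..., attr => {...}, children => [ nodes or plain strings ] }.

my %stash;    # scratch space shared by every rule

my %rules = (
    section => {
        intro => sub {
            my ($node) = @_;
            $stash{seen_para} = 0;    # reset the flag for each section
            return "<h1>$node->{attr}{title}</h1>";
        },
    },
    para => {
        # only the first para seen since the last reset gets the class
        pre  => sub { $stash{seen_para}++ ? '<p>' : '<p class="first_para">' },
        post => '</p>',
    },
);

# A rule piece may be a plain string or a code ref; resolve either to text.
sub value_of {
    my ( $value, $node ) = @_;
    return '' unless defined $value;
    return ref($value) eq 'CODE' ? $value->($node) : $value;
}

# Render a node as: pre, intro, rendered children, post.
sub render {
    my ($node) = @_;
    return $node unless ref $node;    # plain string: emit as-is

    my $rule = $rules{ $node->{name} } || {};
    my $out  = value_of( $rule->{pre}, $node );
    $out .= value_of( $rule->{intro}, $node );
    $out .= render($_) for @{ $node->{children} };
    return $out . value_of( $rule->{post}, $node );
}

my $doc = {
    name     => 'section',
    attr     => { title => 'Introduction' },
    children => [
        { name => 'para', attr => {}, children => ['This is the first paragraph.'] },
        { name => 'para', attr => {}, children => ['And here comes the second one.'] },
    ],
};

my $html = render($doc);
print "$html\n";
```

The heading comes out first, then the first paragraph with its `first_para` class, then the second paragraph as a plain `<p>`, all on a single line. The point of the stash is that the `section` rule and the `para` rule never call each other directly; they only agree on the name of a key.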

The code is still very young and has more bugs than I dare count, but it’s getting to the point where it’s usable. The next things on my plate are:

  • Make the documentation suck less.
  • Re-introduce the templates, so that
$xss->get('section')->set_intro( sub {
    my ( $self, $node ) = @_;
    $self->stash->{seen_para} = 0;    # reset flag
    return '<h1>' . $node->findvalue('@title') . '</h1>';
} );

can become

$xss->get('section')->set_intro( xsst q{
    <% $r->stash->{seen_para} = 0; %>
    <h1><%@ @title %></h1>

} );
  • Re-introduce the command-line transforming command.
  • Add the ability to use XPath expressions as rendering rules.
  • And much, much more…
