Building Web Service APIs

Jul 3, 2012 / By Yanick Champoux

A couple of years back, I created WWW::Ohloh::API because it seemed to be a fun thing to do. And trust me, it was. But now, since I’m not using that module personally, I thought it would be a good idea to see if anyone would be willing to co-maintain it. Before I could do that, though, I had to deal with two little problems.

The first was the general problem that there is no real CPAN equivalent of the good ol’ church steps on which one could leave modules in a wicker basket for adoption. So, well… I kinda proposed one. The Dist::Zilla part of the deal is out there by now, and peeps so far have been making favorable noises regarding the MetaCPAN pull request, so chances are good that we’ll see some Help Wanted signs popping up on MetaCPAN soon. On that front, everything’s groovy.

The second matter is less meta. Back in the days when I first wrote WWW::Ohloh::API, I was a big proponent of Object::InsideOut. Heck, I even have a use.perl.org blog entry from that time, where I profess a preference for O::IO over Moose. Come to think of it, those were also the years when I was going with Prototype over jQuery and with Rose::DB::Object instead of DBIx::Class.

No, I don’t win a lot at horse racing either… Why are you asking?

Seriously though, Object::InsideOut was, and still is, a great piece of code. But it’s fair to say that fate has pushed it to the margins of the Perl OO world. So I thought that to increase WWW::Ohloh::API’s chances of being adopted, I should pass it through a Moose make-over. (Let’s call it Extreme Makeover: Antlered Edition.)

So I did. And now, if you’ll allow me, I’ll babble a wee bit about its refurbished modus operandi as it not only makes a great hook for wannabe co-maintainers, but is also an interesting foray into the implementation of web services, I believe.

Ohloh’s web service

Before we dive in, we should probably take a step back and see the broader picture. Ohloh is a social software directory website where users can create “stacks” listing the software they are using, as well as send kudos to the peeps working on it. It also has a REST API, which is decently documented.

The REST API is fairly standard and supports two types of requests: one for single “objects” (projects, accounts, etc.) and another for collections of those objects.

To interact with that API, I also went a fairly standard way. I decided that I would have a main WWW::Ohloh::API object that would take care of the interactions with the REST service proper, and several WWW::Ohloh::API::Object:: and WWW::Ohloh::API::Collection:: objects that would reflect the different results that the web service provides. Of course, I could have done this in a less fancy (and much easier) way by returning a more generic hash on all requests (like, for example, MetaCPAN::API does). But hey, I like fancy.

The main WWW::Ohloh::API class

Funnily enough, the main class is perhaps the simplest of the whole distribution. At its core, it’s very little more than a thin wrapper around LWP::UserAgent.

package WWW::Ohloh::API;

use Carp;

use Moose;

use MooseX::SemiAffordanceAccessor;

use Module::Pluggable
  require     => 1,
  search_path => [qw/
    WWW::Ohloh::API::Object
    WWW::Ohloh::API::Collection
/];

use LWP::UserAgent;
use HTTP::Request;
use URI;
use Readonly;
use XML::LibXML;
use List::Util qw/ first /;
use Digest::MD5 qw/ md5_hex /;

our $OHLOH_HOST = 'www.ohloh.net';
our $OHLOH_URL  = "http://$OHLOH_HOST";

our $useragent_signature = join '/', 'WWW-Ohloh-API',
  ( eval q{$VERSION} || 'dev' );

has api_key => ( is => 'rw', );

has api_version => (
    is      => 'rw',
    default => 1,
);

has user_agent => (
    is      => 'ro',
    lazy    => 1,
    default => sub {
        my $ua = LWP::UserAgent->new;
        $ua->agent($useragent_signature);
        return $ua;
    } );

has xml_parser => (
    is      => 'ro',
    lazy    => 1,
    default => sub {
        return XML::LibXML->new;
    } );

sub fetch {
    my ( $self, $object, @args ) = @_;

    # \Q...\E keeps any regex metacharacters in $object literal
    my $class = first { /::\Q$object\E$/ } $self->plugins
      or croak "object or collection '$object' not found";

    return $class->new( agent => $self, @args, )->fetch;
}

sub _query_server {
    my $self = shift;
    my $url  = shift;

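    # URI->new returns a subclass such as URI::http, so test with isa()
    # rather than comparing ref() against the literal string 'URI'
    unless ( eval { $url->isa('URI') } ) {
        $url = URI->new($url);
    }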

    my $result = $self->_fetch_object($url);

    my $dom = eval { $self->xml_parser->parse_string($result) }
      or croak "server didn't feed back valid xml: $@";

    if ( $dom->findvalue('/response/status/text()') ne 'success' ) {
        croak "query to Ohloh server failed: ",
          $dom->findvalue('/response/status/text()');
    }

    return $dom;
}

sub _fetch_object {
    my ( $self, $url ) = @_;

    my $request = HTTP::Request->new( GET => $url );
    my $response = $self->user_agent->request($request);

    unless ( $response->is_success ) {
        croak "http query to Ohloh server failed: " . $response->status_line;
    }

    return $response->content;
}

1;

There is precious little magic in that code. I’m using Module::Pluggable to auto-discover all the object and collection classes implemented (which is a little sloppy, but sure gets things going quickly). I implemented a main fetch() method to create all the result objects without having to pass the main object over and over again.
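To make the lookup in fetch() a bit more concrete, here is a minimal sketch of how a short name gets resolved to a full class name. The @plugins list is a hand-written stand-in for what Module::Pluggable's plugins() would return, and resolve_class() is a name made up for this illustration, not something the distribution provides.

```perl
use strict;
use warnings;
use List::Util qw/ first /;

# Hypothetical stand-in for the list that $self->plugins would return.
my @plugins = qw/
    WWW::Ohloh::API::Object::Account
    WWW::Ohloh::API::Object::Stack
    WWW::Ohloh::API::Collection::AccountStacks
/;

# Resolve a short name to the first class whose last component matches it.
# (resolve_class is an illustrative helper, not part of the module.)
sub resolve_class {
    my ($object) = @_;
    return first { /::\Q$object\E$/ } @plugins;
}

for my $name (qw/ Account AccountStacks Nope /) {
    my $class = resolve_class($name);
    print "$name => ", ( defined $class ? $class : '(not found)' ), "\n";
}
```

Because the pattern is anchored on both the `::` separator and the end of the string, 'Account' does not accidentally pick up the AccountStacks collection.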

Besides that, _query_server() and _fetch_object() take care of the core functionality. Namely: take a URI, query the Ohloh server, make sure the HTTP request succeeded, parse the returned xml, check that the status it reports is kosher, and return the resulting dom representation.
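For the curious, the validation step can be tried out without touching the network. The sketch below feeds a hand-written response to XML::LibXML; the xml body is an assumption, shaped only after the XPath expressions used in the code above, and check_response() is an illustrative name rather than a method of the module.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical response body, modelled on the XPaths used by _query_server().
my $body = <<'XML';
<response>
  <status>success</status>
  <result>
    <account>
      <id>12933</id>
      <name>example</name>
    </account>
  </result>
</response>
XML

# Parse the body and die unless the service reports success.
sub check_response {
    my ($xml) = @_;

    my $dom = eval { XML::LibXML->new->parse_string($xml) }
      or die "server didn't feed back valid xml: $@";

    my $status = $dom->findvalue('/response/status/text()');
    die "query failed: $status\n" if $status ne 'success';

    return $dom;
}

my $dom = check_response($body);

# child::* only selects element nodes, so whitespace is skipped.
my ($account) = $dom->findnodes('//result/child::*');

print $account->nodeName, "\n";
print $account->findvalue('name'), "\n";
```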

So far, so good.

An object class

Next on the line: the classes representing the different objects. Let’s take for our example WWW::Ohloh::API::Object::Account, which implements an account:

package WWW::Ohloh::API::Object::Account;

use Moose;

use MooseX::SemiAffordanceAccessor;

with 'WWW::Ohloh::API::Role::Fetchable';

use WWW::Ohloh::API::Types qw/ OhlohId OhlohDate OhlohURI /;

use Digest::MD5 qw/ md5_hex /;

has id => (
    traits => [ 'XMLExtract' ],
    is      => 'rw',
    isa     => 'Str',
    predicate => 'has_id',
);

has name => (
    traits => [ 'XMLExtract' ],
    is      => 'rw',
    isa     => 'Str',
    predicate => 'has_name',
);

has [qw/ created_at updated_at /] => (
    traits => [ 'XMLExtract' ],
    isa => OhlohDate,
    is => 'rw',
    coerce => 1,
);

has [ qw/homepage_url avatar_url/ ] => (
    traits => [ 'XMLExtract' ],
    is => 'rw',
    isa => OhlohURI,
    coerce => 1,
);

has posts_count => (
    traits => [ 'XMLExtract' ],
    is => 'rw',
    isa => 'Int',
);

has [qw/ location country_code /] => (
    traits => [ 'XMLExtract' ],
    is => 'rw',
    isa => 'Str',
);

has [ qw/ latitude longitude / ] => (
    traits => [ 'XMLExtract' ],
    is => 'rw',
    isa => 'Num',
);

has 'kudo_score' => (
    is => 'rw',
    isa => 'WWW::Ohloh::API::Object::KudoScore',
    lazy => 1,
    default => sub {
        my $self = shift;

        return WWW::Ohloh::API::Object::KudoScore->new(
            agent => $self->agent,
            xml_src => $self->xml_src->findnodes( 'kudo_score' )->[0],
        );
    },
);

has stack => (
    is => 'rw',
    isa => 'WWW::Ohloh::API::Object::Stack',
    lazy => 1,
    default => sub {
        my $self = shift;

        return WWW::Ohloh::API::Object::Stack->new(
            agent   => $self->agent,
            id      => $self->id,
            account => $self,
        );
    },
);

has email => (
    is      => 'rw',
    isa     => 'Str',
    lazy     => 1,
    default  => '',
    predicate => 'has_email',
);

has email_md5 => (
    is      => 'rw',
    isa     => 'Str',
    lazy     => 1,
    default => sub {
        md5_hex($_[0]->email);
    },
    predicate => 'has_email_md5',
);

around _build_request_url => sub {
    my( $inner, $self ) = @_;

    my $uri = $inner->($self);

    $self->has_id or $self->has_email or $self->has_email_md5
        or die "id or email not provided for account, cannot fetch";

    my $id = $self->has_id ? $self->id : $self->email_md5;

    $uri->path( 'accounts/' . $id . '.xml' );

    return $uri;
};

1;

Much shorter than what you expected, eh?

There are two juicy pieces of role-fu at work here. The first one is WWW::Ohloh::API::Role::Fetchable, which takes care of the nitty-gritty details of retrieving and storing the data fetched from the web service. We’ll see it in its full glory shortly, but for now all we need to know is that it adds a request_url attribute to the class. The main piece required by a class that consumes that role is a wrapper around the builder of that attribute that properly populates the path of that url. (I dare you to say that sentence thrice without spitting all over your monitor.)
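Stripped of everything Ohloh-specific, the builder-wrapping pattern looks roughly like the sketch below. The My::* package names and the example.com host are placeholders invented for the illustration; only the shape (a role providing a lazy attribute with a builder, and a consuming class wrapping that builder with around) mirrors the real code.

```perl
package My::Role::Fetchable;
use Moose::Role;
use URI;

# The role owns the attribute and a generic builder for it.
has request_url => (
    is      => 'ro',
    lazy    => 1,
    builder => '_build_request_url',
);

sub _build_request_url { return URI->new('http://www.example.com') }

package My::Account;
use Moose;

with 'My::Role::Fetchable';

has id => ( is => 'ro', isa => 'Int', required => 1 );

# The consuming class only adds what it knows: the path.
around _build_request_url => sub {
    my ( $inner, $self ) = @_;

    my $uri = $inner->($self);
    $uri->path( 'accounts/' . $self->id . '.xml' );

    return $uri;
};

package main;

my $account = My::Account->new( id => 42 );
my $url     = $account->request_url;
print "$url\n";
```

The nice thing about this split is that the role never needs to know which paths exist, and the classes never need to know about api keys or hosts.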

The second one is the XMLExtract trait that grabs the value for the attribute straight out of the xml returned by the service. Of course, for more complex sub-structures, like the kudo_score, or related objects that require a second request, like the stack, one has to work a little bit more, but it’s still all very manageable.

A collection class

The collections require a little more work. Not in the collection classes themselves, mind you, which follow the same pattern but are even shorter (as they don’t really have attributes of their own):

package WWW::Ohloh::API::Collection::AccountStacks;

use Moose;

use Digest::MD5 qw/ md5_hex /;

with 'WWW::Ohloh::API::Collection';

has '+entry_class' => (
    default => 'WWW::Ohloh::API::Object::Stack',
);

around _build_request_url => sub {
    my( $inner, $self ) = @_;

    my $uri = $inner->($self);

    $self->has_id or $self->has_email
        or die "id or email not provided for account, cannot fetch";

    my $id = $self->has_id ? $self->id : $self->email_md5;

    $uri->path( 'accounts/' . $id . '/stacks.xml' );

    return $uri;
};

has id => (
    is      => 'ro',
    isa     => 'Int',
    lazy     => 1,
    predicate => 'has_id',
    default => sub {
    },
);

has email => (
    is      => 'rw',
    isa     => 'Str',
    lazy     => 1,
    default  => '',
    predicate => 'has_email',
);

has email_md5 => (
    is      => 'rw',
    isa     => 'Str',
    lazy     => 1,
    default => sub {
        md5_hex($_[0]->email);
    },
    predicate => 'has_email_md5',
);

1;

But they do rely on the WWW::Ohloh::API::Collection role, which implements all the work having to do with pagination behind the scenes:

package WWW::Ohloh::API::Collection;

use Moose::Role;

use Carp;

with 'WWW::Ohloh::API::Role::Fetchable' => { -excludes => 'fetch' };

has entry_class => (
    is      => 'rw',
    isa     => 'Str',
    lazy    => 1,
    default => sub { die "'entry_class' must be defaulted\n" },
);

has cached_entries => (
    traits  => ['Array'],
    isa     => 'ArrayRef',
    is      => 'ro',
    default => sub { [] },
    handles => {
        add_entries     => 'push',
        cache_empty     => 'is_empty',
        cache_size      => 'count',
        next_from_cache => 'shift',
    },
);

after next_from_cache => sub { $_[0]->inc_entry_cursor };

has page_cursor => (
    is      => 'rw',
    traits  => ['Counter'],
    isa     => 'Int',
    default => 1,
    handles => { inc_page => 'inc', },
);

has entry_cursor => (
    is      => 'rw',
    traits  => ['Counter'],
    isa     => 'Int',
    default => 0,
    handles => { inc_entry_cursor => 'inc', },
);

has nbr_entries => (
    is        => 'rw',
    isa       => 'Int',
    predicate => 'has_nbr_entries',
    lazy      => 1,
    default   => sub {
        $_[0]->fetch->nbr_entries;
    },
);

sub fetch {
    my ( $self, @args ) = @_;

    # no more to fetch
    return
      if $self->has_nbr_entries
         and $self->entry_cursor >= $self->nbr_entries;

    $self->clear_request_url;

    my $xml = $self->agent->_query_server( $self->request_url );

    $self->nbr_entries( $xml->findvalue('/response/items_available') );

    my @entries = $xml->findnodes('//result/child::*');
    my $first   = $xml->findvalue('/response/first_item_position');

    while ( @entries and $first < $self->entry_cursor ) {
        shift @entries;
        $first++;
    }

    # only cache the entries that survived the trimming above
    $self->add_entries(@entries);

    $self->inc_page;

    return $self;
}

sub all {
    my $self = shift;

    my @entries;

    while ( my $e = $self->next ) {
        push @entries, $e;
    }

    return @entries;
}

sub next {
    my $self = shift;

    return if $self->entry_cursor >= $self->nbr_entries;

    if ( $self->cache_empty ) {
        $self->fetch;
    }

    my $raw = $self->next_from_cache or return;

    return $self->entry_class->new(
        agent   => $self->agent,
        xml_src => $raw,
    );
}

around _build_request_url => sub {
    my ( $inner, $self ) = @_;

    my $uri = $inner->($self);

    my $params = $uri->query_form_hash;

    $params->{page} = $self->page_cursor;

    $uri->query_form_hash($params);

    return $uri;
};

1;

The two key points in there are the local implementation of fetch(), which deals with the different xml structure returned by collections, and the meddling with the request url that injects the paging parameters for the subsequent fetch() calls to get a full collection.
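If the dance between the cache, the page cursor and the entry cursor feels a bit abstract, here is a stripped-down sketch of the same idea in plain Perl, with no Moose and no network. Everything in it is made up for the illustration: fetch_page() plays the role of the web service (five items, two per page), and the %state hash stands in for the cached_entries, page_cursor, entry_cursor and nbr_entries attributes.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the web service: 5 items served 2 per page.
my @items     = qw/ alpha beta gamma delta epsilon /;
my $page_size = 2;

sub fetch_page {
    my ($page) = @_;    # pages are 1-based, like page_cursor
    my $first = ( $page - 1 ) * $page_size;
    my $last  = $first + $page_size - 1;
    $last = $#items if $last > $#items;

    return {
        items_available => scalar @items,
        entries => $first > $#items ? [] : [ @items[ $first .. $last ] ],
    };
}

# Iterator state, mirroring the attributes of the Collection role.
my %state = ( cache => [], page => 1, cursor => 0, total => undef );

sub next_entry {
    # Nothing left once the cursor has reached the advertised total.
    return if defined $state{total} and $state{cursor} >= $state{total};

    # Refill the cache with the next page only when it runs dry.
    if ( !@{ $state{cache} } ) {
        my $res = fetch_page( $state{page}++ );
        $state{total} = $res->{items_available};
        push @{ $state{cache} }, @{ $res->{entries} };
    }

    my $entry = shift @{ $state{cache} };
    return unless defined $entry;

    $state{cursor}++;
    return $entry;
}

my @all;
while ( defined( my $e = next_entry() ) ) {
    push @all, $e;
}

print join( ', ', @all ), "\n";
```

The caller just keeps asking for the next entry; pages are only requested when the cache is empty, which is the whole point of hiding the pagination inside the role.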

The keystone role

Underneath all of that lies the Fetchable role. One would expect a massive work-horse here, but I’ll let you see for yourself:

package WWW::Ohloh::API::Role::Fetchable;

use Moose::Role;
use WWW::Ohloh::API::Role::Attr::XMLExtract;

use URI::URL;
use URI::QueryParam;

has request_url => (
    is      => 'rw',
    writer  => '_set_request_url',
    lazy    => 1,
    builder => '_build_request_url',
    clearer => 'clear_request_url',
);

has xml_src => (
    is        => 'ro',
    writer    => '_set_xml_src',
    predicate => 'has_xml_src',
    lazy      => 1,
    default   => sub {
        $_[0]->fetch;
        $_[0]->xml_src;
    },
);

has agent => (
    isa => 'WWW::Ohloh::API',
    is  => 'ro',
);

sub fetch {
    my ( $self, @args ) = @_;

    my $xml = $self->agent->_query_server( $self->request_url );

    $self->_set_xml_src( $xml->findnodes('//result/child::*') );

    return $self;
}

sub _build_request_url {
    my ($self) = @_;

    my $uri = URI::URL->new($WWW::Ohloh::API::OHLOH_URL);

    my $params = $uri->query_form_hash;

    $params->{api_key} ||= $self->agent->api_key;
    $params->{v}       ||= $self->agent->api_version;

    $uri->query_form_hash($params);

    return $uri;
}

1;

Yup, that’s it. It manages the agent attribute that holds the main WWW::Ohloh::API object, provides the scaffolding necessary to build the request url, and sets up a comfy nest for the xml structure that will be returned, and that’s that.
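To see what the layered url building amounts to, here is a small sketch that performs the same three steps by hand: base parameters first, then the path, then the paging parameter. The api key and the account id are placeholders, and I’m using a plain URI object here rather than URI::URL to keep things minimal.

```perl
use strict;
use warnings;
use URI;
use URI::QueryParam;

my $uri = URI->new('http://www.ohloh.net');

# Step 1: what the Fetchable role does (key and version).
my $params = $uri->query_form_hash;
$params->{api_key} ||= 'MY_API_KEY';    # placeholder, not a real key
$params->{v}       ||= 1;
$uri->query_form_hash($params);

# Step 2: what an object or collection class adds (the path).
$uri->path('accounts/12933.xml');       # hypothetical account id

# Step 3: what the Collection role adds (the page to fetch).
$params = $uri->query_form_hash;
$params->{page} = 2;
$uri->query_form_hash($params);

print $uri->path, "\n";
print join( ', ', sort $uri->query_param ), "\n";
print "$uri\n";
```

The order of the parameters in the final query string follows Perl’s hash ordering, so it can differ from one run to the next; the server shouldn’t care.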

A last piece of candy: automatically extracting attributes from the xml

This last trait is the secret sauce that keeps all the object classes so DRY. It will basically populate its attributes with the xml element of the same name as found in xml_src. Of course, most things can be tweaked as desired, but the defaults are already doing all we need in, well, all the cases so far.

package WWW::Ohloh::API::Role::Attr::XMLExtract;

use Moose::Role;

Moose::Util::meta_attribute_alias('XMLExtract');

has 'xml_src' => ( isa => 'Str', is => 'ro' );

has xpath => ( isa => 'Str', is => 'ro' );

has 'lazy' => ( is => 'ro', default => 1 );

before '_process_options' => sub {
    my ( $class, $name, $options ) = @_;

    die "attribute '$name' in class '$class' must be lazy-evaluated\n"
      if defined $options->{lazy} and not $options->{lazy};

    my $src   = $options->{xml_src} ||= 'xml_src';
    my $xpath = $options->{xpath}   ||= $name;

    $options->{predicate} = 'has_' . $name;

    $options->{default} = sub {
        return $_[0]->$src->findvalue($xpath);
    };

};

1;
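If attribute traits are not your daily bread, the core idea can be approximated without any meta-magic at all. The sketch below is deliberately not the trait above: it swaps the Moose machinery for plain closures that generate lazy accessors, each one pulling the element of the same name out of xml_src on first use. The My::Account package and the xml snippet are invented for the example.

```perl
package My::Account;
use strict;
use warnings;
use XML::LibXML;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

# Generate one lazy accessor per field: the value is looked up in
# xml_src under the same name the first time it is asked for.
for my $name (qw/ id name posts_count /) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$name" } = sub {
        my $self = shift;
        $self->{$name} //= $self->{xml_src}->findvalue($name);
        return $self->{$name};
    };
}

package main;

# Hypothetical xml for a single account.
my $dom = XML::LibXML->new->parse_string(
    '<account><id>12933</id><name>example</name>'
      . '<posts_count>7</posts_count></account>' );

my $account = My::Account->new( xml_src => $dom->documentElement );

print $account->name,        "\n";
print $account->posts_count, "\n";
```

What the real trait buys on top of this is that the declarations stay ordinary Moose attributes, with type constraints, coercions and predicates all working as usual.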

And that’s it

Well, not quite. There are still some details, like the test version of the agent that uses local copies of the request urls, and the definition of the different types and their coercions, but that’s all banal stuff. We have covered pretty much all the interesting, and remotely tricky, bits of WWW::Ohloh::API. All there is left to do is to turn the crank and write the different classes. And document them. It’s not terribly hard, but there are an awful lot of them.

So… Who wants a co-maint bit?

Guys?
