Your Schwartz Factor on your CPAN Page

Nov 30, 2010 / By Yanick Champoux

The Schwartz factor of a CPAN author is the ratio of the number of distributions in his CPAN directory to the number of tarballs sitting there. A low number indicates that it’s probably time for this author to do some clean-up (without fear of losing the old tarballs, as they will always be available via the BackPAN, natch).
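To make the arithmetic concrete, here is a small sketch with made-up numbers. It computes the factor the way the script in this post does, as distinct distributions divided by total tarballs, so three distributions spread over twelve tarballs give 3/12 = 0.25. The distribution names and counts are purely hypothetical.

```perl
use strict;
use warnings;

use List::Util qw/ sum /;

# Hypothetical directory contents: distribution name => number of tarballs.
my %tarballs = (
    'Foo-Bar'  => 7,
    'Baz'      => 4,
    'Quux-Lib' => 1,
);

# Distinct distributions divided by total tarballs.
my $factor = scalar( keys %tarballs ) / sum( values %tarballs );

printf "Schwartz factor: %.2f\n", $factor;    # prints "Schwartz factor: 0.25"
```

An author who keeps exactly one tarball per distribution would score 1, which is the best possible value.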

As such, I wanted to add a periodic check of my Schwartz factor to my monitoring system. Coming up with a script to extract the information from my CPAN home directory was simple enough:

#!/usr/bin/perl

# see http://use.perl.org/~brian_d_foy/journal/8314

use strict;
use warnings;

use 5.10.0;

use LWP::Simple qw/ get /;
use List::Util qw/ sum /;

my $author = 'YANICK';

$author =~ s#(.)(.)#$1/$1$2/$&#;  # YANICK => Y/YA/YANICK

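my $page = get "http://search.cpan.org/CPAN/authors/id/$author"
    or die "could not fetch the directory listing for $author\n";

# count tarballs per distribution; [^"]* keeps each match inside a single href
my %dist;
$dist{$1}++  while $page =~ /<a href="([^"]*)-v?[\d_.]+\.tar\.gz"/ig;

die "no tarballs found for $author\n" unless %dist;

# distinct distributions over total tarballs
say "Schwartz factor: ", scalar( keys %dist ) / sum values %dist;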

while( my ( $dist, $num ) = each %dist ) {
    say $dist, ' - ', $num;
}

This is not exactly the most robust code I’ve ever written (the parsing of the page should really be left to HTML::Tree), but it does what it’s supposed to do. The factor may vary a little depending on which mirror you hit.
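For the curious, here is a rough sketch of what handing the parsing over to HTML::Tree could look like. It is not the script I actually run: the helper names `link_hrefs` and `count_dists` are made up for illustration, and it assumes that HTML::TreeBuilder (part of the HTML-Tree distribution) is installed. The name/version split reuses the same pattern as the script above, anchored to the whole href.

```perl
use strict;
use warnings;

# Return the href of every <a> element found in $html.
# HTML::TreeBuilder is loaded on demand, so the counting code below
# stays usable even where the module is not installed.
sub link_hrefs {
    my ($html) = @_;
    require HTML::TreeBuilder;
    my $tree  = HTML::TreeBuilder->new_from_content($html);
    my @hrefs = grep { defined }
                map  { $_->attr('href') }
                $tree->look_down( _tag => 'a' );
    $tree->delete;    # free the parse tree explicitly
    return @hrefs;
}

# Count tarballs per distribution from a list of hrefs.
# Anything that does not look like Name-Version.tar.gz is ignored.
sub count_dists {
    my %count;
    for my $href (@_) {
        $count{$1}++ if $href =~ /^(.*)-v?[\d_.]+\.tar\.gz$/i;
    }
    return %count;
}
```

With those two pieces, `my %dist = count_dists( link_hrefs($page) );` would stand in for the regex loop, and the rest of the script could stay as it is.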

But then I thought, why keep the fun offline? So I imported the logic into a GreaseMonkey script and I now have the Schwartz factor of CPAN authors added to their CPAN pages:

The Schwartz is weak with this one.

The script will not work for authors who dropped an index.html in their home directory, or who use sub-directories, but I expect those to be the exception rather than the rule.

The GreaseMonkey script is available on the userscripts.org site, and on GitHub.

