Your Schwartz Factor on your CPAN Page

Nov 30, 2010 / By Yanick Champoux


The Schwartz factor of a CPAN author is the ratio of the number of distributions to the number of tarballs sitting in his CPAN directory. A low number indicates that it's probably time for this author to do some clean-up (without fear of losing the old tarballs, as they will always be available via the BackPAN, natch).
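To make the direction of the ratio concrete: an author with 3 distributions and 12 tarballs scores 3/12 = 0.25, while one who keeps only the latest release of every distribution scores a perfect 1. Here is a minimal sketch of the computation on a plain list of filenames. The `schwartz_factor` helper is hypothetical, and it assumes the tarballs follow the usual `Dist-Name-1.23.tar.gz` naming.

```perl
use strict;
use warnings;
use List::Util qw/ sum /;

# Hypothetical helper: takes tarball filenames (assumed to look like
# Dist-Name-1.23.tar.gz) and returns distributions / tarballs.
sub schwartz_factor {
    my @tarballs = @_;
    return unless @tarballs;    # no tarballs, no factor

    my %dist;
    for my $file (@tarballs) {
        # strip the "-version.tar.gz" suffix to get the distribution name
        next unless $file =~ /^(.+?)-v?[\d_.]+\.tar\.gz$/;
        $dist{$1}++;
    }

    # total number of recognized tarballs; bail out if none matched
    my $total = sum( values %dist ) or return;

    return scalar( keys %dist ) / $total;
}

my @files = qw(
    Foo-Bar-0.01.tar.gz
    Foo-Bar-0.02.tar.gz
    Foo-Bar-0.03.tar.gz
    Baz-1.0.tar.gz
);

# 2 distributions / 4 tarballs, prints "Schwartz factor: 0.50"
printf "Schwartz factor: %.2f\n", schwartz_factor(@files);
```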

As such, I wanted to add a periodic check of my Schwartz factor to my monitoring system. Coming up with a script to extract the information from my CPAN home directory was simple enough:

#!/usr/bin/perl

# see http://use.perl.org/~brian_d_foy/journal/8314

use strict;
use warnings;

use 5.10.0;

use LWP::Simple qw/ get /;
use List::Util qw/ sum /;

my $author = 'YANICK';

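# CPAN nests author directories as first letter / first two letters / full ID
$author =~ s#(.)(.)#$1/$1$2/$&#;  # YANICK => Y/YA/YANICK

my $page = get "http://search.cpan.org/CPAN/authors/id/$author"
    or die "Couldn't fetch the CPAN directory of $author\n";

# tally tarballs per distribution (the name minus its -version.tar.gz suffix)
my %dist;
$dist{$1}++  while $page =~ /<a href="([^"]+?)-v?[\d_.]+\.tar\.gz"/ig;

die "No tarballs found for $author\n" unless %dist;

# Schwartz factor = number of distributions / number of tarballs
say "Schwartz factor: ", scalar( keys %dist ) / sum( values %dist );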

while( my ( $dist, $num ) = each %dist ) {
    say $dist, ' - ', $num;
}

This is not exactly the most robust code I've ever written (the parsing of the page should really be left to HTML::Tree), but it does what it's supposed to do. Depending on which mirror you hit, the factor may vary a little.
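Short of switching to HTML::Tree, the regex can at least be made less fragile: keeping the capture inside a single `href` value stops it from running across several links on the same line (which a greedy `(.*)` would do), and the pattern can accept the other archive formats found on CPAN. The following is a sketch under those assumptions; the `dists_from_listing` helper and the exact list of extensions (`.tar.gz`, `.tgz`, `.tar.bz2`, `.zip`) are my own choices, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: pulls distribution names out of the HTML of a CPAN
# author directory listing. Assumes links look like href="Dist-Name-1.23.tar.gz".
sub dists_from_listing {
    my ($html) = @_;
    my %dist;

    # [^"\/]+? keeps each match inside a single href value, even when
    # several links share one line of HTML
    while ( $html =~ /href="([^"\/]+?)-v?[\d_.]+\.(?:tar\.gz|tgz|tar\.bz2|zip)"/gi ) {
        $dist{$1}++;
    }

    return \%dist;    # distribution name => number of tarballs
}

my $html = <<'END_HTML';
<a href="Foo-Bar-0.01.tar.gz">Foo-Bar-0.01.tar.gz</a> <a href="Foo-Bar-0.02.tgz">Foo-Bar-0.02.tgz</a>
<a href="Baz-v1.2.3.zip">Baz-v1.2.3.zip</a>
<a href="Foo-Bar-0.02.meta">Foo-Bar-0.02.meta</a>
END_HTML

my $dist = dists_from_listing($html);

# prints "Baz - 1" then "Foo-Bar - 2"; the .meta file is not counted
printf "%s - %d\n", $_, $dist->{$_} for sort keys %$dist;
```

Plugged into the script, `my %dist = %{ dists_from_listing($page) };` would stand in for the `while` loop that fills `%dist`.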

But then I thought, why keep the fun offline? So I ported the logic to a Greasemonkey script, and now the Schwartz factor of CPAN authors is added to their CPAN pages:

[Screenshot: Schwartz factor on CPAN author page]

The Schwartz is weak with this one.

The script will not work for authors who have dropped an index.html into their home directory or who use sub-directories, but I expect those to be the exception rather than the rule.

The Greasemonkey script is available on the userscripts.org site and on GitHub.
