Ruby gems annoyance – GPL’ed libraries

I tend not to be much of a license bigot, and think that they all have their places, but GPL’ed libraries are annoying, especially when they’ve been “blessed” by being inserted into an official repository like Ruby’s gems. Consider:

  1. A large stand-alone application – Linux, for instance. I can see why people might GPL it, because there is a very real risk of someone “taking it proprietary” (or was – these days the community is too big and moves too fast to compete with). So I can understand the use of the GPL there.

  2. 200 lines of library code. There is no way anyone is going to “take it proprietary” and make any money off it; even incorporated into some proprietary application, it’s not going to make or break that application. So, if it’s GPL’ed, those working on proprietary software won’t use it. If, on the other hand, it uses a liberal license, then they can use it, and if they’re smart, they’ll contribute back any improvements they make.

I suppose that in case 2, someone might still GPL the code if their goal is solely to keep it out of the hands of proprietary developers, even at the cost of free code that is used less and gets fewer contributions – but otherwise it’s not an optimal strategy.

The problem that I’ve stumbled upon is that Ruby’s Gem repository is full of randomly licensed code, including lots of GPL stuff. At the very least, this information ought to be very visible so that it’s possible to weed out any GPL’ed libraries. At best, the Ruby guys would only accept MIT/BSD licensed code for the standard repository.

This is something the Tcl community does well. Tcl is BSD licensed, and so is everything that surrounds it. Sure, people sometimes consciously decide to GPL their applications, but that’s their right, and a deliberate decision like that is much easier to understand than finding the GPL attached to some tiny bit of random code that one might like to use for a particular function in an application.

Information Asymmetry and IT Hiring

It appears as though Akerlof’s “market for lemons” that Bruce Schneier wrote about has caught on.

While hiring is certainly a case of information asymmetry like the “lemons market”, there are some other factors at work:

The principal–agent problem: even once you’ve hired someone, asymmetries of a different sort are at work. The employee’s motives may not be the same as the employer’s, and it may be difficult for the employer to monitor what the employee is doing. If you’re digging ditches, it’s pretty easy to see what you’ve done, and how well you’ve done it. If you’re creating a complex system, on the other hand, it’s much harder, even for an expert, to tell what sort of job you’ve done, let alone for a “pointy haired boss” with little knowledge of the domain at hand. The Wikipedia article discusses various schemes for trying to align the interests of employee and employer, and the problems with them.

The curious may wonder why anyone hires anyone at all, rather than having all work done as a series of contracts between independent individuals instead of between companies and employees. Ronald Coase asked the same question a number of years ago, and determined that the answer lies in transaction costs – the subtle friction that permeates any real-world exchange. Think about having to continuously search for potential contract workers and bargain with them over prices. It’s easier to employ someone and be able to count on their work, at a steady rate, than to have to outsource important functions continuously and at variable prices. It’s also probably cheaper in some ways because of people’s tendency to be “risk averse” – they prefer a steady, stable job over a more variable but potentially higher-paying income stream. As we’ve seen in recent years, as transaction costs have come down, it has become easier to outsource certain kinds of work, and we have seen more of it.

Economics provides a fascinating lens through which to view the world of high tech and information goods, and the superficial overview that I’ve attained has certainly served me well. A good place to start, besides perusing wikipedia, is the book Information Rules by Hal Varian and Carl Shapiro.

Dunbar’s number and online communities

It’s widely known that online communities tend to follow a certain pattern if they are successful. They get larger, and beyond a certain point, they start degrading. Slashdot, digg, reddit, usenet – there are countless examples.

My idle thought for the day: how might one go about reconciling the vastness of the internet with Dunbar’s number?

Dunbar’s number, commonly cited as 150, is a theoretical limit on the number of individuals with whom a person can maintain stable social relationships – the kind of relationship that involves knowing who each person is and how each person relates socially to everyone else in the group.

The ideal system would expose you to all the new and interesting things you can only find by interacting with a large group of people, yet keep the actual community aspects (comments, for instance) small, in order to foster happier, more interesting and productive communities, rather than the vicious turd-flinging matches that a lot of popular sites seem to descend into.

I don’t have a clear idea of how that might be accomplished at a practical level, but the idea interests me as a way to create something that grows a lot, yet doesn’t become unpleasant.

Pluto Meeting

The Pluto Meeting in Padova was fun to speak at. It was good to see people I haven’t had a chance to see in a while, and I was very impressed that the organizers managed to put the whole thing together in under a month. That’s no mean feat!

I gave a talk on economics and free software, a subject that I find fascinating. I went there expecting to get a few more questions or people pushing back against what I had to say, given that a lot of the people present were “of the FSF persuasion” in terms of their beliefs, but for some reason that didn’t really materialize. Too bad, because it could have been interesting.

In any case, we had a great time back in Padova for a few days!

Paul Graham and the Opportunity Costs of Startups

Anyone who hasn’t read them should have a look at Paul Graham’s collection of essays:

http://paulgraham.com/articles.html

They’re not all winners, but there’s a lot of good stuff there, especially where he talks about things he knows intimately: startups and “hackers” (in the sense of programmers). I’m not sure he ever actually comes out and says that startups are the way to go for enterprising hackers (he may not, in fact), but his enthusiasm for that route is infectious. It’s a route that was fantastically successful for him, so it’s natural that he thinks highly of it.

However, the “opportunity costs” are also important to consider. In other words, what else could you do with that time, as a young and bright coder? I think the world, and possibly Linus Torvalds himself, would have been worse off had he dedicated himself to creating a company. He’s a bright guy, and might not have done too badly with a startup, but Linux is really one of a kind in the success that it’s had. Same thing goes for other bright people who have dedicated themselves to technology in some way (Guido van Rossum, Ruby’s ‘Matz’, Andrew Tridgell come to mind), in the process creating something very cool that has come back to reward them, in the end. On the other hand, while the idea of Arc is interesting, my guess is that the world is better off with Y Combinator, and Mr. Graham certainly seems to have done well by himself with the company he founded and sold to Yahoo.

The question is: what kind of person should do what? Some of the personality traits are doubtless similar between the two categories. Or perhaps there isn’t much difference at all, and it’s motivation that counts. Perhaps the most important thing is the people you know. Apple’s two founding Steves might not have done what they did without one another, whereas Linus was able to capitalize on the internet to leverage his own creation, without having to work closely with a business partner.

I don’t have the answers, nor do I think they are clear or easy, although I’m sure that motivation does count for a lot in terms of overcoming what people don’t expect you to be able to do.

Rails Distributions

A random idea: rails distributions that include what “should be included in the core”. I’m not the first to think of it (google finds several others), but I have been mulling it over lately, and I think it’s a good idea. Nothing too fancy, simply a tarball or gem that includes rails + the stuff that you end up installing every time you set up a real web site with rails. Of course, “what to include” is the big question, so the best thing would be for lots of people to try, and see what sticks. Here’s what I always end up installing:

Acts as Authenticated

Exception Notifier

And I recently found another one that seems to be pretty small, and very handy:

Condition Builder

because I almost always end up having to build complex queries with dynamic components for any sort of advanced search, and this makes that much easier, and more fun.

However, I’m sure more advanced Rails users can add to that list. The trick is to nail those things that you pretty much always end up installing sooner or later as your project grows in complexity.

Perhaps at that point a gem could be created that installs them…
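
To make the idea concrete, here’s a minimal sketch of what such a meta-gem might look like. The gem name and the dependency list are purely illustrative – nothing by this name exists, and at least some of the plugins above would have to be bundled rather than declared as gem dependencies:

# rails_batteries.gemspec – a hypothetical meta-gem that pulls in Rails
# plus "the stuff you always install anyway".
Gem::Specification.new do |s|
  s.name    = 'rails_batteries'
  s.version = '0.1.0'
  s.summary = 'Rails plus the add-ons you end up installing every time'
  s.authors = ['Whoever curates the list']

  s.add_dependency 'rails'
  s.add_dependency 'exception_notification'  # Exception Notifier
  # acts_as_authenticated and Condition Builder ship as plugins rather than
  # gems, so a real distribution would need to vendor those into a tarball.
end

Installing the one meta-gem would then pull in the whole curated set, and arguing over the dependency list becomes the community’s way of deciding what “should be included in the core”.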

Padova Pluto Meeting Talk – Economics of Free Software

A combination of factors worked out rather nicely, and we’ll be spending a long weekend in Padova, where I’ll be giving a talk on the 10th:

http://www.pluto.it/files/meeting2007/programma.html

It should be a fun weekend, and I particularly enjoy this talk and the subject matter. It has a lot of potential for interesting debate and interaction, because it’s not really a “solved problem”.

Missing rails feature: cache everything but this

It’s pretty common to have a front page that has content that is very cacheable, except for a little section with something like “signup” or “my page”, depending on whether the user is logged in or not. It would be very nice to have a handy solution in rails for this problem, which seems to be a fairly common question.

Here’s my stab at an answer:

cache everything but…

However, I’m not entirely happy with it because of the very hacky approach taken to make the render_as_string work – I have to fiddle with an instance variable in the controller:

controller.instance_variable_set '@performed_render', false 

Not good. In the thread linked to above, Paul Gustav posts a link about a system for keeping two different cached pages, but that’s not entirely satisfying either, because it gets you from one version to two, and going beyond that doesn’t look very appetizing.

It would be a very nice problem to solve once and for all with a clean solution.
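
For comparison, the non-hacky answer people usually reach for is plain fragment caching: cache the static chunks of the template and leave the signup/my-page box outside any cached fragment. A rough sketch (the partial names and the logged_in? helper are just placeholders), which works, but only gets you fragment caching rather than the page caching that makes the problem interesting:

<%# Cache the heavy, shared parts of the front page... %>
<% cache('front_page_body') do %>
  <%= render :partial => 'front_page_body' %>
<% end %>

<%# ...but leave the per-visitor box uncached so it still varies. %>
<% if logged_in? %>
  <%= render :partial => 'my_page_box' %>
<% else %>
  <%= render :partial => 'signup_box' %>
<% end %>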

Simple item-based collaborative filtering with Rails

I suspect there is room for improvement here, since it is really simplistic, and I certainly am not an expert in the subject. But it works OK for small numbers of items:

def Pair.from_items(items)
  Pair.transaction do
    # Normalize: drop unnamed items, lowercase the names, and sort them.
    all = items.reject { |i| i.name == "" }.
                collect { |i| i.name.downcase }.sort

    # Pair every name with every other name, sorting within each pair so
    # that [a, b] and [b, a] count as the same pair, then drop duplicates
    # and pairs of an item with itself.
    res = []
    all.each { |a| all.each { |b| res << [a, b].sort } }
    res = res.uniq.reject { |p| p[0] == p[1] }

    # Bump the counter for each pair, creating the row if it doesn't exist.
    res.each do |pair|
      p = Pair.find(:first, :conditions => ['val1 = ? and val2 = ?',
                                            pair[0], pair[1]])
      p ||= Pair.new(:val1 => pair[0], :val2 => pair[1], :num => 0)
      p.num += 1
      p.save!
    end
  end
end

Basically what happens is that for a list of items [A, B, C, D], it calculates and inserts into the database the fact that A and B have been purchased/rated/viewed/whatevered together once. Same for A and C, A and D, B and C, and so on. After accumulating enough of these pairs, you can query the database for an item – B, for instance – and see what it’s been purchased with the most times, and recommend that to people who have shown an interest in B. When someone purchases a group of items together, you call Pair.from_items(the_items) and it takes care of inserting the various combinations.
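
The lookup side is then just a query over the accumulated pairs. Here’s a rough sketch of what it might look like, assuming the Pair model above – recommendations_for and the default limit of 5 are made up for illustration:

def Pair.recommendations_for(name, limit = 5)
  name = name.downcase
  # Grab the pairs this item appears in, most frequently co-occurring first.
  pairs = Pair.find(:all,
                    :conditions => ['val1 = ? or val2 = ?', name, name],
                    :order => 'num DESC',
                    :limit => limit)
  # Return the "other" half of each pair as the recommendations.
  pairs.collect { |p| p.val1 == name ? p.val2 : p.val1 }
end

So for someone who has shown an interest in “b”, Pair.recommendations_for('b') hands back the items “b” has been purchased with most often.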

Subversion FAQ: How to make patches utilizing files not in the repository

As kpreid on #svn kindly points out, if you need to make a patch for a subversion project that includes files you have created, you need to add the files:

svn add tests/fixtures/empty.rb

Then you can run svn diff > my_patch.diff, and everything will be included.

Anyone can add files to their working copy; it’s just that they can’t necessarily commit them.

This ought to be in their FAQ.