Details, details

While working on Hecl, it has sometimes really surprised me how much it’s possible to delve into seemingly trivial details. Just for starters, we’ve been talking about what to call string commands.

Originally, I’d used names like slen and sindex to have something quick and easy to write, but no one seemed to like them that much. Wolfgang implemented a bunch of new string commands (very much appreciated!), and took the Tcl approach of command subcommand, like string index, and also suggested some other things like str.index. I am strongly opposed to the string length style commands though, because I think that’s just too much typing for a language that’s supposed to get you up and running quickly. The compromise I proposed and implemented was to use more ‘C style’ commands like strlen, strindex, strfind, and so on. I’m not a big PHP fan, but they seem to have taken a similar approach. It’s not too verbose, but you can tell more or less what the command is for.

Even more seemingly trivial – indexes for strings and arrays. Hecl takes the Lisp approach rather than the C approach, and utilizes commands instead of syntax, even for list and string access, so to get the second character of the string “foo”, you would do: strindex "foo" 1. Like C, Hecl indexes start at 0. Where things get tricky is in calculating indexes “from the end” – how do you tell the language to “fetch the last character from the string ‘foo’”?

Tcl lets you write a string like “end” or “end-1”, which is handy compared to the process you would go through in a language that doesn’t help you out: take the length of the string, and subtract from that as need be. I prefer the approach that I first saw in Python, though, which is to use -1 for the last character:

>>> "abcde"[-1]
'e'

Ruby takes the same approach. It’s nice to be able to just insert a -1 rather than have “end-1”, which sort of looks like an expression, but isn’t really (Hecl doesn’t even have normal infix expressions), especially because you can easily calculate the -1 and insert it as is, instead of having to do some string interpolation like "end-$foo".
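To make the difference between the two styles concrete, here is a minimal Python sketch. The function name resolve_index is made up for illustration and is not how Hecl or Tcl implement indexing; it simply accepts either an integer (negative counts from the end) or a Tcl-style "end-N" string and returns a plain 0-based position.

```python
def resolve_index(spec, length):
    """Turn an index spec into a plain 0-based position.

    Accepts an integer (negative counts from the end, Python/Ruby style)
    or a Tcl-style string: "end" or "end-N". Hypothetical helper.
    """
    if isinstance(spec, int):
        return spec + length if spec < 0 else spec
    if spec == "end":
        return length - 1
    if spec.startswith("end-"):
        return length - 1 - int(spec[len("end-"):])
    raise ValueError(f"bad index: {spec!r}")


offset = 1
word = "abcde"
# Integer style: the computed number is passed straight in.
print(word[resolve_index(-1 - offset, len(word))])        # d
# String style: the caller first has to build "end-1" by interpolation.
print(word[resolve_index(f"end-{offset}", len(word))])    # d
```

With the integer form, the index is just another number you can do arithmetic on; with the string form, both the caller and the callee have to do string work.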

Where things get tricky is when you want to do something like specify a range, say from the 3rd letter to the end of the string.

In Python, they use syntax, specifically by simply not including the end of the range:

>>> "abcde"[3:]
'de'

That doesn’t work really well with Hecl – I suppose we could pass in a blank string instead of a number, like so:

strrange "abcde" 3 ""

But that’s not very clear, and IMO it’s slightly ugly to pass in a string where you want to use a number. I had a look at Ruby’s behavior, and found it more to my liking:

irb(main):001:0> "abcde"[3..-1]
=> "de"

The difference from Python is that in Ruby the second index is inclusive, whereas Python includes the character at the first index and excludes the one at the second:

>>> "abcde"[3:-1]
'd'

I decided the Ruby approach worked best for Hecl, so now we have:

hecl> strrange abcde 3 -1
de
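For readers who think in Python, the behavior chosen here can be sketched as follows. This is only an illustration of the semantics (both ends inclusive, negative numbers counting from the end), not Hecl's actual Java implementation, and it does no bounds checking.

```python
def strrange(s, first, last):
    """Return s[first..last] with both ends inclusive; negatives count from the end."""
    n = len(s)
    if first < 0:
        first += n
    if last < 0:
        last += n
    return s[first:last + 1]


print(strrange("abcde", 3, -1))   # de  (inclusive end, like Ruby's "abcde"[3..-1])
print("abcde"[3:-1])              # d   (Python's own slice excludes the end)
```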

If you’re still reading this, I guess you see what I mean by how much one must delve into what must seem like very unimportant minutiae to… well… normal people!

“Hydras”, “real” open source, and the ASF Incubator

An interesting post by Ian Holsman, one of my colleagues in the Apache Software Foundation who is also interested in open source business and economics:

http://feh.holsman.net/articles/2006/05/12/is-your-project-a-hydra

I take a more laissez-faire approach, myself. Ian’s advice is valuable to those sorting through open source projects to use or get involved with – a full-fledged free software project with many users and committers who are independent of one another is usually going to be a better proposition than something run by one company, or one person. A lot of ASF thinking focuses on the community being the real value in an open source project, and it’s obvious that it adds a lot of value. However, the licensing is important too, because it’s your escape hatch. It means that if the company goes away, or decides to create a proprietary product, you have the option to create a community around the open source code, by taking over its development and maintenance.

ASF Incubator

Ian also mentions the ASF Incubator, which I’ve been involved with first hand through my involvement in incubating OFBiz. One of the big hurdles that the project faces is getting a scrap of paper from everyone who has ever contributed anything important to the project. And since OFBiz is a real open source project with committers all over the world, and, over the years, many contributions, that is a lot of paper to collect! Thanks to the efforts of the OFBiz team, they’re doing an admirable job of completing the task.

However, I can’t help but observe that “incubation” is a far easier process to go through if the code arrives in the form of a corporate donation, because it all comes from one place. Unfortunately, I think that leads to some selection for “hydra style” projects, although to their credit they may be trying to break out of that by joining the ASF (they need to if they want to successfully complete incubation). Still, I think it is likely to lead to a more “corporate” organization.

Consistency vs Convenience

We have a decision to make in the Hecl project that highlights how much computer programming is really an art, rather than a science. Namely, how to name a series of string handling commands. Tcl has a series of string commands like

string first string1 string2 ?startIndex?
string equal ?-nocase? ?-length int? string1 string2
string index string charIndex

which are mostly consistent. At times, they’re also just a bit more unwieldy than I might like. For instance,

set somechar [string index "the nth char is" 4]

is just not as quick to write as

somechar = "the nth char is"[4]

For something that you need to use frequently like ‘string equal’, a shorter, quicker version is really a must – in fact, Tcl’s expressions recognize the ‘eq’ operator. Hecl has ‘eq’ and ‘ne’ as commands, which makes things a bit less consistent, sure, but a lot more convenient.

The decision we need to make, though, is where to draw the line? String equality is so common that it doesn’t take much thought to opt for convenience. How about string index? string length? The artistry involved is to make something that’s appealing to people, but also practical. Elegant, but not so much that it remains unsullied by use in real work.
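One way to see that this is purely a question of naming is a toy command table, sketched here in Python. Everything in it (the COMMANDS and ALIASES tables, the run function) is hypothetical and has nothing to do with how Hecl or Tcl actually dispatch commands; it only shows that the long, consistent names and the short, convenient ones can sit on top of the same implementations.

```python
def string_length(s):
    return len(s)


def string_index(s, i):
    return s[i]


# Hypothetical command table: each implementation is registered once...
COMMANDS = {
    ("string", "length"): string_length,
    ("string", "index"): string_index,
}

# ...and the short "C style" names are just aliases for the same functions.
ALIASES = {
    "strlen": ("string", "length"),
    "strindex": ("string", "index"),
}


def run(words):
    """Dispatch a list of words, accepting either naming style."""
    if words[0] in ALIASES:
        return COMMANDS[ALIASES[words[0]]](*words[1:])
    return COMMANDS[(words[0], words[1])](*words[2:])


print(run(["string", "length", "hello"]))   # 5
print(run(["strlen", "hello"]))             # 5
print(run(["strindex", "foo", 1]))          # o
```

The cost of offering both is not in the implementation but in the documentation and in what people have to remember, which is exactly where the judgment call lies.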

Open Source Digressions

Is it just me, or do these kinds of side trips happen to other people in the open source world?

For instance, I want to add SMS capabilities to Hecl, so I did some research, realized that it’s not too hard, and wrote the code. The build system we use is based on Antenna, which turns out to have a problem with the 2.2 version of Sun’s Wireless Toolkit in that it doesn’t let you select wma11.jar or wma20.jar, as the 2.2 version requires you to do.

Poking around some more turned up a patch for Antenna that would do just what I need. However, it’s from late 2005. Antenna seems to be quite popular in the J2ME world, so what happened to it? Is the project no longer maintained? I wonder if it would be possible to fork it and get some people involved in keeping it up to date?

And so it goes… and I get sidetracked from what I’m working on. Sometimes it leads me to new and interesting projects, but I suppose that, long-term, it’s not the best use of my time. And yet I find that the pattern repeats itself, and all things considered, I really enjoy hacking on open source software, so I do end up having fun.

Rails Annoyances

First, let me say that I’m actually enjoying Ruby on Rails quite a bit. I agree with the large number of people who, like me, are finding that Rails hits a sweet spot between too much structure, configuration and overhead on one side, and anarchy on the other. It’s good code, fun to use, and for the most part I enjoy Ruby as a language.

However, being a bit of a perfectionist, and since Rails is “opinionated software”, I thought I’d fire mine right back at it regarding a few of the things I don’t like. Mostly minor nitpicks, but I’ll likely add to them as time goes by and I use Rails more (I’d like to use it more at work, for instance) – after all, I’m opinionated myself.

  • XML processing instructions are delimited by <? and ?>, not <% and %>. I think PHP got it right there.

  • To make a variable “visible” in a template that you create in a controller, you use an @instance variable. However, only instance variables are visible in templates; class variables (@@somevar) aren’t. Odd… If I understand things correctly, it turns out that the variable scope isn’t what’s being looked at (as one’s mental model might lead one to believe), but rather, all the instance variables are introspected and made visible in the template. I think I was happier thinking something along the lines of the template having the scope of another method in the controller. I honestly haven’t thought it through much, but it seems that rather than ‘instance variables’, they are ‘variables marked, via a @, for inclusion in the template’.

  • The one thing that does drive me batty is that you have to restart WEBrick to see changes in templates. Other people don’t seem to observe this, so perhaps it’s the versions of everything I’m running (Debian stable, Ruby 1.8.2, Rails 1.0, PostgreSQL backend) that don’t quite work out. Of course, it’s not in production mode, so it shouldn’t be caching templates. What happens is that when I restart the server, I see the changes I made. At that point, I can usually change the template file once and see the changes. After that, they simply don’t show up, even though changes in the controller always do. So to edit my template, I have to restart the server a bunch, which is extremely annoying. I have a feeling that there must be something I’m doing wrong.

  • I really wish I could use ActiveRecord with primary keys on multiple columns. I know that the authors just aren’t in favor of the idea, but it would be damn useful in transitioning away from legacy databases (I’m thinking of work, here). It would make my life that much better if I could smear some ActiveRecord putty over the crufty old tables and be able to deal with them in a nice, clean way from Ruby.

  • Indiscriminate use of strings and symbols. There doesn’t seem to be an entirely clear rule of when to use strings and when to use symbols. You eventually memorize what to use where, but it feels like it’s not quite as consistent as it might be, that there’s a ‘helpful rule’ that’s missing.
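The instance-variable point in the list above is easier to see with a small model. The following Python sketch is only an analogy for what I understand Rails to be doing, not Rails code: the controller class, the render function and the attribute names are all invented for the example. The renderer does not share the controller's scope at all; it introspects the controller object and copies its instance attributes into the template.

```python
from string import Template


class PostsController:
    shared_setting = "class-level"    # plays the role of @@somevar

    def show(self):
        self.title = "Details, details"   # plays the role of @title
        self.author = "me"                # plays the role of @author
        local_note = "scratch"            # a plain local: never reaches the template


def render(controller, template_text):
    """Copy the controller's instance attributes into the template."""
    assigns = dict(vars(controller))      # instance attributes only, by introspection
    return Template(template_text).safe_substitute(assigns)


controller = PostsController()
controller.show()
print(render(controller, "$title by $author"))   # Details, details by me
print("shared_setting" in vars(controller))      # False: class-level data is not copied
```

Seen this way, the @ really does act as a marker for "export this to the template" rather than as a statement about scope.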

Hecl news – conditional compilation, tools

For the curious, who read Conditional Compilation in Java, and were waiting on the edge of their seats to know what happened, we decided to use Antenna, with the work being done mostly by Wolfgang. It’s not particularly elegant – the source code ends up being, in some places, what the Italians call a “pugno nell’occhio” – a “fist in the eye”. But all things considered, it’s not that bad, and it’s better than writing the same file in slightly different ways in different subdirectories, especially in those cases where the code changes only slightly from one version to another. As a bonus, Antenna gives you a lot of handy tasks for doing common J2ME things.

Indeed, Wolfgang did a very nice job of revamping the Hecl build system, and after a bit of additional tweaking, it works very well. It builds different .jars and .jads for different targets, so that it’s easy to develop for new platforms without stepping on everyone else’s toes.

My current plans are to spend some more time on build tools and infrastructure – I think more people will start to experiment with Hecl once they realize just how easy it is to build applications. You don’t even need a Java compiler if you use HeclBuilder, which is a simple GUI that lets you specify the input script, the jar file to create, and then generates a new Hecl application ready to be loaded onto a cell phone. It does this by keeping a copy of Hecl.jar as a resource, which it then modifies (replacing the script.hcl resource that the Hecl.jar itself contains) and then writes out as a new .jar/.jad. The long and short of it being that you have a very quick edit/”build”/test cycle when using an emulator. I have some ideas about improving on this even more, perhaps utilizing microemulator, which, although not as complete as Sun’s WTK emulator, is open source and thus redistributable with Hecl.
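Since a .jar is just a zip archive, the core of the trick described above can be sketched in a few lines of Python. This is an illustration of the idea, not HeclBuilder's actual code (which is Java): the function name build_app_jar and the Hecl.class entry are made up, and the sketch ignores the .jad file and the manifest entirely.

```python
import io
import zipfile


def build_app_jar(template_jar, script_source, out_jar, script_name="script.hcl"):
    """Copy every entry of template_jar into out_jar, swapping in a new script."""
    with zipfile.ZipFile(template_jar) as src, \
         zipfile.ZipFile(out_jar, "w", zipfile.ZIP_DEFLATED) as dst:
        for entry in src.infolist():
            if entry.filename != script_name:
                dst.writestr(entry, src.read(entry.filename))
        dst.writestr(script_name, script_source)


# Stand-in for the bundled Hecl.jar, built in memory so the sketch runs anywhere.
template = io.BytesIO()
with zipfile.ZipFile(template, "w") as jar:
    jar.writestr("Hecl.class", b"placeholder bytes")   # hypothetical entry name
    jar.writestr("script.hcl", 'puts "default script"')

output = io.BytesIO()
build_app_jar(template, 'puts "hello from my app"', output)

with zipfile.ZipFile(output) as jar:
    print(sorted(jar.namelist()))            # ['Hecl.class', 'script.hcl']
    print(jar.read("script.hcl").decode())   # puts "hello from my app"
```

No compiler is involved at any point, which is why the edit/build/test loop can be so short.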

Ruby vs Tcl, round 2

Ding ding. (Round 1 – for those who missed it)

Kidding aside, remember that I like and respect both of them.

Libraries

In the first article, I mentioned that Ruby has a lot of momentum, which is a pleasant change from Tcl’s relative ‘uncoolness’ amongst the Web 2.0/O’Reilly/”next big thing” crowd. That said, there are still places where that momentum hasn’t taken it. I set about writing some code that I’d wanted to play with that involves sending some email. Something that’s very easy in Tcl:

http://tcllib.sourceforge.net/doc/smtp.html (towards the end of the page).

Python seems to have a pretty complete email system too, for that matter: http://docs.python.org/lib/module-email.html.

So it appears, based on a sample of one attempted task (how’s that for statistics?), that Ruby is still lacking a few things in its standard distribution. Note that this functionality is available elsewhere (TMail, to cite one), but email is ubiquitous enough that it ought to be in the standard library.

Command line swiss-army knife

While Ruby’s OO system gives it a head start when you need to create a larger system with distinct parts, it can still be used as a quick’n’dirty scripting language for quick one-off jobs. From the man page:

% cat /tmp/junk
matz
% ruby -p -i.bak -e '$_.upcase!' /tmp/junk
% cat /tmp/junk
MATZ

Very handy. Tcl’s command-based syntax just isn’t quite as quick for those sorts of operations, so Ruby wins hands down here. I’ve suggested that the Tcl folks distribute a second program that takes a lot of the Perl-style command line arguments for this sort of work, but the idea doesn’t seem to be of much interest.
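For comparison, here is roughly what that one-liner is doing, spelled out with Python's standard fileinput module. It is meant only to show the mechanics behind -p and -i.bak (loop over the lines, write the result back into the file, keep a backup); the function name upcase_in_place is invented for the example.

```python
import fileinput


def upcase_in_place(path, backup=".bak"):
    """Rewrite path with every line upper-cased, keeping the original as path + backup."""
    with fileinput.input(files=[path], inplace=True, backup=backup) as lines:
        for line in lines:
            # With inplace=True, print() writes into the file being edited.
            print(line.upper(), end="")
```

Calling upcase_in_place("/tmp/junk") would leave the upper-cased text in /tmp/junk and the original in /tmp/junk.bak. It works, but it is a small program rather than something you type at the prompt, which is rather the point.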

C APIs

On another tack completely, one of the things that originally drew me to Tcl was its very, very nice C API. Its documentation is clear and thorough, and the API itself lets you get involved in pretty much any aspect of the language that you want. This makes sense, because when Tcl was originally created by Dr. Ousterhout in the late ’80s, the idea was to create a scripting language as a C library that would be loaded into other programs. The language has always been faithful to its heritage, and to this day I find it lots of fun to merge Tcl with C code.

I have to admit that I don’t know the Ruby API all that well, but what I have looked up looks pretty nice. To a certain degree, it’s comparing apples and oranges, because Ruby requires you to deal with the very object oriented nature of the language, whereas Tcl is a lot more direct. The Programming Ruby book’s coverage of the subject also leads me to believe that Tcl really gives you access to more stuff (interpreters, channels, events, and many other parts of the system). Ruby seems to take an approach that might best be described as letting you write Ruby in C – meaning that you create Ruby objects, use their methods, get their values, and so on, but you’re still really dealing with Ruby. This has a certain elegance, but sometimes it’s necessary to muck about with things at a lower level.

I don’t know how it works out in practice, but Ruby has a bit more infrastructure in place for actually building extensions once you’ve created them. Tcl has TEA (the Tcl Extension Architecture), which is a set of m4 macros, but anyone can tell you that the autotools are not much fun to work with (going to the dentist is more fun) – anything that keeps me away from them is welcome.

Garbage collection

One of the more interesting aspects of the different approaches to C interoperability is the fact that Tcl uses a simple, robust, straightforward reference counting system to keep track of, and throw away, resources that are no longer used. Ruby has a mark and sweep garbage collector, which is probably more sophisticated, but also more complicated, and requires a bit more support from the programmer initially. The benefit is that once things are set up, they require less keeping track of, because you hand off memory management to the computer. From the end user’s point of view, Ruby wins here, but if you happen to be writing a C extension, I could see it being more difficult to write and debug to this API. Tcl has its own warts, though. One of the worst is the fact that Tcl values must be convertible back and forth to strings. Something like a file handle can’t really survive the round trip, because it’s just a pointer, so you use a hash table and some sort of string to hold onto the object:

 "file1" -> int fd1
 "file2" -> int fd2

which makes it impossible to GC these values.
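The consequence is easy to demonstrate with a toy handle table. The Python sketch below is an analogy, not Tcl's C implementation; the class names are invented, and a weak reference is used purely as a probe to see whether the object is still alive. As long as the string name maps to the object in the table, nothing can reclaim it, no matter how many script-level copies of the name have been thrown away.

```python
import gc
import weakref


class FakeFile:
    """Stand-in for a C-level resource such as a file descriptor."""


class HandleTable:
    """Maps string names like "file1" to the objects they stand for."""

    def __init__(self, prefix="file"):
        self._prefix = prefix
        self._count = 0
        self._objects = {}

    def register(self, obj):
        self._count += 1
        name = f"{self._prefix}{self._count}"
        self._objects[name] = obj
        return name

    def lookup(self, name):
        return self._objects[name]

    def release(self, name):
        del self._objects[name]


table = HandleTable()
resource = FakeFile()
probe = weakref.ref(resource)        # lets us observe whether the object is alive
handle = table.register(resource)    # "file1"

del resource
gc.collect()
print(handle, probe() is not None)   # file1 True: the table still holds it

table.release(handle)
gc.collect()
print(probe() is None)               # True: reclaimed only after an explicit release
```

The string "file1" carries no information about who is still using it, so the only safe policy is an explicit close.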

I suspect that both approaches have their merits – Ruby’s is more elegant, but Tcl’s is simple and rugged.

Licensing

Bouncing back to something completely non-technical, Ruby’s licensing is either the GPL, or their own license. If I understand things correctly, you can get around the GPL’s “viral nature” by simply renaming things, if you have a need to include Ruby in a proprietary product:

  1. You may distribute the software in object code or executable
    form, provided that you do at least ONE of the following:

    a) distribute the executables and library files of the software,
    together with instructions (in the manual page or equivalent)
    on where to get the original distribution.

    b) accompany the distribution with the machine-readable source of
    the software.

    c) give non-standard executables non-standard names, with
    instructions on where to get the original software distribution.

    d) make other distribution arrangements with the author.

So it seems that you’re ok if you just call it something else. However, the license goes on to talk about several other files in the core distribution:

  1. You may modify and include the part of the software into any other
    software (possibly commercial). But some files in the distribution
    are not written by the author, so that they are not under this terms.
    They are gc.c(partly), utils.c(partly), regex.[ch], fnmatch.[ch],
    glob.c, st.[ch] and some files under the ./missing directory. See
    each file for the copying condition.

Looking through the LEGAL file in the distribution shows that there are files distributed under other terms. Of course, they’re all free software, but some are LGPL, some BSD, and a few others for good measure.

Tcl’s licensing, by contrast, requires very little understanding. The language was developed at the University of California, Berkeley, and the license remains BSD. This goes for many of the libraries and extensions as well. If you need to embed a language in your proprietary system, Tcl and its libraries present no problems whatsoever. Of course, I prefer to work with open source code and communities, but that’s not possible 100% of the time, so it’s always nice to know things are free and clear, should that need arise.

Internationalization

This is something the Ruby folks know they need to fix, and are working on, so it’s not worth dwelling on it much, but it is a proud point for the Tcl community. Tcl has had a very nice i18n setup for many years at this point – since the 8.1 release. It’s built into the language, so that everything works with it. You have commands to set encodings of IO channels, and even munge strings. Tcl wins here – no contest.

As I continue exploring Ruby, I think I’ll find more stuff to compare with Tcl, so we won’t ring the final bell just yet.

Language Design – weak types and easy marshalling

Weak/Dynamic typing

One of the things I really like about Tcl is how very, very dynamic it is. It’s possible to replace just about everything ‘on the fly’, including control commands like if and while (in Tcl, they’re just commands, not special syntax).

Like Lisp, Tcl gets a lot of its flexibility from its simplicity, and one of the ways that things are kept simple is via the concept of “Everything Is a String”. Some people erroneously recall the early days of Tcl and take this to mean that Tcl represents everything as a string internally, making it very slow for numerical operations. That hasn’t been true for something like 8 years – Tcl values are translated into a more appropriate internal representation ‘on demand’. However, the concept still gives you lots of flexibility:

set a b

You don’t have to quote things – they default to strings – the variable ‘a’ now contains the value ‘b’.

set lst {a b c d e}

No fiddling around with lots of syntax – the above is defined as a string (we could have used “” to group the words), but it gets turned into a list as soon as you use it as one:

llength $lst
5
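The "on demand" conversion can be pictured with a small Python model. This is a loose sketch of the idea, not Tcl's real value structure: the class name Value is invented, and plain whitespace splitting stands in for Tcl's real list parser (which also handles braces and quoting). The value always has a string form, and a second representation is computed the first time it is needed and then kept.

```python
class Value:
    """A string that lazily grows a cached list or integer representation."""

    def __init__(self, text):
        self.text = text
        self._as_list = None
        self._as_int = None

    def as_list(self):
        if self._as_list is None:             # convert once, on first use
            self._as_list = self.text.split()
        return self._as_list

    def as_int(self):
        if self._as_int is None:
            self._as_int = int(self.text)     # raises ValueError if it isn't a number
        return self._as_int


lst = Value("a b c d e")
print(len(lst.as_list()))           # 5
print(Value("5").as_int() + 5)      # 10
```

After the first call, later list or numeric operations reuse the cached form instead of reparsing the string, which is why the old "everything is a string, so it must be slow" objection no longer holds.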

Things are automatically converted into the type you want to work with. Of course, this also makes it really easy to work with code as strings and vice versa:

proc until {cond code} {
    while {1} {
        uplevel $code
        if { [uplevel [list expr $cond]] } { return }
    }
}

set i 0

until {$i == 10} {
    puts "i is $i"
    incr i
}

and so on and so forth.

Ruby (and Python) handle things a bit differently than Tcl does. Even though they’re still very dynamic, they are more strongly typed:

"5" + 5 # error

Whereas the same code runs just fine in Tcl – “5” is transformed into a number, because if the programmer is trying to add to it, they must want it to be a number. On the other hand, strong typing can help catch errors, may help make things clearer, and in corner cases, probably helps avoid nasty surprises.

One of the things that Tcl’s system facilitates, at least with simple types (lists, strings, numbers, dicts) is ‘marshalling’ or ‘serializing’. Want to send a hash table down a socket?

 puts $sk $mydict

 # And on the other side:
 gets $localsk mydict
 dict get $mydict key

When you start treating it as a dictionary, it “just turns into one” – or throws an error. Good programmers will check for errors, but that goes for whatever language you use. This often comes in handy for code-creation, as well – it’s very easy to build up a string programmatically and then evaluate it, much as you might do with a list in Lisp.
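Here is a rough Python rendering of why this is so cheap when the string form is the wire format. The two function names are invented, and the sketch only handles keys and values without whitespace (real Tcl quoting is richer than this); the point is that the receiving side gets plain text and only discovers whether it is a valid dictionary when it tries to use it as one.

```python
def dict_to_text(d):
    """Flatten a dict into 'key value key value ...' text."""
    words = []
    for key, value in d.items():
        words += [str(key), str(value)]
    return " ".join(words)


def text_to_dict(text):
    """Interpret text as alternating keys and values, or raise ValueError."""
    words = text.split()
    if len(words) % 2:
        raise ValueError("missing value to go with key")
    return dict(zip(words[0::2], words[1::2]))


wire = dict_to_text({"name": "hecl", "version": 1})
print(wire)                           # name hecl version 1
print(text_to_dict(wire)["name"])     # hecl
```

Note that the version comes back as the string "1" rather than the number 1, which is the same trade-off being discussed: the text carries the value, not its type.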

It also makes coding short scripts… well, shorter, because you don’t have to fiddle around telling the language what you mean, in most cases. Treat it like how you want to use it, and it will work out what you want to do.

Hecl

Hecl comes into the picture, because I’ve borrowed a lot of design ideas from Tcl, but the language is still used by only a few people, so we’re free to hack on it and change things that we don’t like about Tcl. Lately, we’ve been having an interesting debate on the merits of Tcl style weak typing, vs stronger Ruby-style typing, as well as the string representation you get when you attempt to print a type:

http://sourceforge.net/mailarchive/message.php?msg_id=15254941

http://sourceforge.net/mailarchive/message.php?msg_id=15254942

There are certainly some things that could be improved in Tcl’s way of dealing with things – one area where it doesn’t do so well is in dealing with NULL values – you can’t really deal with that in a string, because a blank string is different from no value at all.

Wolfgang would like to go towards more Ruby-style values. I’m willing to consider most anything at this stage, because it’s interesting, and I like going over the possibilities, in order to make Hecl as useful as possible. “Everything is a string” has its limits in Tcl, in any case, because what ends up happening is that ‘complex’ types (files, say), are tied, via a hash table, to the C struct that actually implements them, meaning that the string representation is really only a pointer to something else, and isn’t “meaningful”, as it is with lists or dicts.

I think, though, that in this case I’ll mostly want to keep types weak/dynamic, as Hecl aims to be a complement to, not a replacement for, Java, so in many cases, I’d rather see it make nearly the opposite design decision, where it’s reasonable.

We’ll see, though, who knows what ‘type’ of interesting compromise we might end up with. Of course, I’m always happy to see new people join the discussion.

More programming language economics

Interesting article on the “opportunity costs” of Java not being open source:

http://www.0xdeadbeef.com/weblog/wp-trackback.php?p=190

It fits in with my interest in the economics of programming languages, and adds a new twist to the debate over how to best popularize and profit from the creation and stewardship/ownership of a language.

I don’t agree with his point that Java would have been where “LAMP” is today – part of the attraction of PHP was how well it scales down – but most of his other points are dead on.

New Committer

Yesterday, I added Wolfgang Kechel as a committer to the Hecl project, because of the good work he’s been doing on it. He’s demonstrated consistency and quality over a number of months, and has also shown he understands where I want to go with the language, so it was a natural thing to do.

Adding a new committer is not always so clear cut, though, and it’s sometimes a difficult decision, especially with smaller projects. I find that it’s much more fun to work on projects with other people, so I want to encourage them to get involved, and granting access is something that ought to give people a bigger stake in the project, and reward/encourage them in their efforts. In practice, it doesn’t always work out that way, though. Often, after a bit of initial work, they disappear, which means they weren’t really that interested in the first place, and you shouldn’t have bothered granting them access to the code – they could get by just fine by submitting patches.

The question of openness vs control also plays a strong part in the difficulty of the decision. I think the Apache Software Foundation way of doing things (decisions by consensus of those who work on the code) is pretty successful for well established projects, but I’m also convinced that the “benevolent dictator” model is best in some cases. Design by committee doesn’t work very well for projects like a programming language where you really want to have one coherent vision, rather than some sort of Frankenstein mishmash of pieces. (As an aside, Larry Wall has done a fine job of proving that you can have – even revel in – a mishmash, even with a ‘dictator’!).

The point being that it’s a little scary to hand someone the keys to your project, because you take a risk on them working counter to your goals, especially in new projects without a self-propagating culture in place. An established language like Python has a fairly large number of people who can instinctively tell you if a new feature is “pythonic” or not. But for a small, new project like Hecl, the risk is that there is more room for unproductive arguments, or having to back out “bad” changes should someone really take off in a direction you don’t like. Of course, with Hecl, I think it is at a great point in its life. There is a lot of room to hack, because it isn’t used by many people, so we don’t have to worry about angry users if we break it while trying to improve it!

All things considered, it’s a positive step to get more people involved, but it’s important to start building a culture in order to keep your code going in a healthy direction. Depending on the project, that culture might include a healthy dose of what is and what isn’t within the project’s scope, or it might simply regard the quality of code considered acceptable. For instance, Tcl’s standard library is pretty open to whatever people want to implement; however, you must accompany your code with either tests or documentation, and preferably both.

I think the most interesting phase is when you are “bootstrapping” – going from just one person to a group. Judging by the number of small projects that have fallen by the wayside without ever getting more than one person to work on them, it’s also the most difficult period in a project’s life.

What do other people think? What tactics do you use to get people involved, yet still make sure they are the right people?