Erlang and unreliable resources

Here’s something you should be aware of if you’re using Erlang in a large system.  It’s a pity the core team hasn’t included something like it in the Erlang distribution itself, or event documented; because it’s something I think you’ll eventually need if you build a big enough system.

In a large system, you’ll have components that should be up and running – but at some point might not be.  For instance, a web site with a database.  If you write your Erlang code to just “let it crash” because the web server can’t connect to the database – just like you read in the books – then your whole site will fall over, web server and all.  At this point, you can’t even show your visitors a message explaining that “we’re experiencing problems, please try again later”, because the problem has propagated throughout the system, bringing it all down.  Another scenario might be a machine with a user interface and some fragile hardware.  The machine can’t do its job without the hardware, but if the hardware is broken, the software should not crash completely!  The user interface needs to stay active, letting users know that the machine is broken, and perhaps offering some diagnostics or offering some steps to try and correct the problem.  What these have in common is that there are some components of the system that should always try to be available even if they are not 100% functional, and there are other bits and pieces that may be necessary for the correct functionality of the system, but should not cause it to become entirely unavailable when something goes wrong.

The term for this is “circuit breaker”, because it breaks the chain that is part of a normal OTP system, where enough failures of a worker lead to a supervisor crashing, and its supervisors crashing in turn, after enough failures, and so on up the chain.  One high quality implementation is located here: https://github.com/jlouis/fuse – and politely points to some other, similar implementations.  I’m a bit frustrated that I found this only recently, because it’s a nice bit of code that strikes me as quite likely to be used somewhere withing any sufficiently large Erlang system.

As a footnote, I also created something similar, although it’s not battle tested (I just released it, actually!) and operates at a different level: https://github.com/davidw/hardcore – but it might be useful for some people.

A Minor Erlang Rant

In an earlier post, I compared node.js to Erlang: https://journal.dedasys.com/2010/04/29/erlang-vs-node-js – which admittedly has a whiff of apples and oranges about it.  Still, though, I think there's something to it.  Node.js is creating lots of buzz for itself these days.   Some of that will turn into actual production use, and at that point actual companies will have a vested interest in seeing the system improved.  Currently, it is not as 'good' as Erlang.  Erlang's VM has a built in scheduler, so you simply don't have to worry about manually creating bite-sized chunks of processing that feed into one another, it all "just works".  For instance, my current Erlang project involves reading from twitter's stream, and distributing that to IRC and the web.  It's pretty simple, works well, and is quite robust.

I haven't had the best of days though, and one little annoyance of the many I dealt with today really caught my eye.  I need to do an HTTP POST in Erlang, and:

  1. The documentation does not have examples.
  2. Here's an example of how to do a POST:
    http:request( post, { "http://scream-it.com/win/tests/posttest.cfm", [], "application/x-www-form-urlencoded", "x=2&y=3&msg=Hello%20World" }, [], [] ).

  3. Aside from being ugly and not very leggible, you'll note that he's passing the parameters as a string, and also has to manually include the "x-www-form-urlencoded" header.

  4. To create that string, you'd want to url encode it.  Actually, ideally, you'd just pass in a list of key/value pairs and let the library module handle it.

  5. However, there's nothing in the http module that does that.

  6. If you look through various bits of Erlang code on the web, you'll note many, many versions of url encoding and decoding, because the http module makes no mention of how one might go about doing so.

  7. It turns out, that the edoc_lib module does have a uri encode function!

  8. That isn't documented in its own man page.

  9. And certainly isn't linked to in the http page.

So, in 2010, doing an HTTP POST in Erlang is still a pain in the neck.  I'd be happy to put my money where my mouth is and submit a patch (at least one for the documentation), but you have to wonder why no one has fixed the problem so far – maybe they're not very accepting of patches?

Erlang: Log to Files

Since I had a heck of a time finding out how to do something that ought to be really easy, I thought I'd post what I found.  If you want Erlang to write errors to a file instead of stdout, add this to the command line:

-kernel error_logger '{file, "/tmp/erl.log"}'

Maybe there's a better solution, in which case feel free to correct me.

Mochiweb proto-deb and hacking

I had some time over the weekend to play a bit with Mochiweb.  It's a very nice little bit of work, but kind of strange for an open source project.  I think both things come from the fact that it was built to do some very specific things in a business, and once those goals were accomplished, that suited the owner just fine.  There is no nice/fancy project home page, no links to documentation, and in general, all of the stuff you're supposed to do for a successful open source project.   The code, however, is good, fast, and pretty easy to deal with.

Anyway, I wanted to use it for a thing or two I'm working on, so I created some .deb packaging for it.  It's been ages since I've done that (indeed, I officially retired from Debian), so it's probably missing something, but I thought I'd throw it out there for people to have a look.

The git repository is here: http://github.com/davidw/mochiweb – my changes live in the 'deb' branch.

The big thing I've tried to do is make it so that if you have several mochiweb projects on the same machine, they'll all be started/stopped by the init scripts, if you so choose.  What you do is add a file with the path to the application to the /etc/mochiweb/apps directory, and when you do /etc/init.d/mochiweb start, it will start all of those.  At stop time, it will run the stop.sh script, which shuts things down.  It runs things as the user that owns the start.sh script, unless it's root, in which case it simply doesn't run.

The thing that's missing right now is a way to get the thing to write logs somewhere, and I'm not quite sure how to do that, so here's hoping someone will write in with a suggestion.

The .deb file I generated ought to be attached to this post, although there are no guarantees that I'll keep it up to date.

Erlang vs node.js

I've written about Erlang in the past, and my suspicion that, soon or later, other languages/systems will come along and "eat its lunch".  Scala is one such potential contender.  Another that has been gaining some visiblity lately is node.js, a simple framework for creating networked applications on top of Google's V8 Javascript engine.

I should define what Erlang's lunch is a little bit better before we go on.  Erlang does several things really, really well; better than many existing, mainstream systems:

  1. Concurrency – the Actor model that it uses is much easier and more pleasant than dealing with threads.
  2. Distributed systems – Erlang makes this fairly easy and pleasant as well.
  3. Fault tolerant systems – using its distributed powers, Erlang is good for writing things like telephone switches that can't spend any significant time down.

Of these, I think the big one is concurrency.  Distributed systems are nice, but not a critical point for most people.  Same goes with fault-tollerant: those who need it really need it, but the rest of us are willing to make some compromises.  Our web applications are important, but generally not something where people's lives hang in the balance.

How does Erlang do "concurrency"?  The developer creates Erlang "processes" that interact with one another by passing messages.  These are not real, OS level processes, though, and this is critical to how Erlang operates.  Since these processes must all coexist within one real, system level process, it is absolutely essential that no operation that they perform hangs the entire system!  The Erlang runtime system is built around this concept.  Any Erlang process can do things like read and write to the disk or network, or have a "while (true) { …. }" loop (it doesn't actually look quite like that in Erlang, but that's the idea), and it won't wedge the system.  This knowledge is also critical when you want to interface Erlang to the outside world: if your C library contains a routine that might block for a long time, you can't just call it from Erlang as it won't be a well-behaved part of Erlang's world (there are ways around this of course, but make life a bit more complicated for the developer).  All this is done with Erlang's scheduler: each Erlang process gets a number of operations it can run before some other process gets to run, so even our while loop will only run for a bit before the system moves on to something else.   IO is rigorously done with non-blocking calls internally in order to keep that from becoming an issue.

No other system that I know of has such a focus on being non-blocking, and node.js is no exception: a while(true) loop is perfectly capable of wedging the system.  Node.js works by passing functions (closures, in many cases) around so that work can be performed as needs be.  However, the actual functions that run actually do block the system, and thus must be written in order to not run for too long.  Also, and this is important, Node.js also does its best to make IO non-blocking by default, so that you don't have to worry about IO calls.

Node.js isn't up to the level Erlang is at, because it requires more manual intervention and thinking about how to do things, but it's probably "good enough" for many tasks.  How often do you really write code with so many calculations that it slows things down?  Not often in my case – the real world problem I most often encounter is IO, and node.js does its best to make that non-blocking, so that it can be handled in the "background" or a bit at a time, without wedging the system.  And if you really needed to write a long-running calculation (say you want to stream the digits of PI or something), you can break up your calculation manually, which may not be quite as elegant as Erlang, but is "good enough" for many people.

"Good enough" concurrency, combined with a language that is at least an order of magnitude more popular than Erlang, and a fast runtime, combined with ease of use in general (it's way easier to get started with node.js than with most Erlang web stuff) make for a system that's likely to do fairly well in terms of diffusion and popularity, and is going to "eat some of Erlang's lunch".  Or perhaps, rather than actually taking users away from Erlang, it's likely to attract people that might have otherwise gone to Erlang.

The Erlang Paradox

There has been a nice series of interviews with the creators of various programming languages. In this one, it’s Joe Armstrong‘s turn. The creator of Erlang, he’s a very bright guy and well worth reading. One thing that has always struck me as interesting about Erlang, as compared to most of the “open source” style languages I’ve tended to use, is how it was incubated for a long time within Ericsson. It spent years there, and millions of lines of code were put into production use, prior to being open sourced, and certainly before the world at large took much notice of it.

This has been both good and bad for the language. The obvious upside is that a lot of smart people were paid to work on and improve it. Many new languages don’t really get that opportunity, unless they come out of an academic environment. Rather than being a spare time project, or something that slowly builds up speed over the years, Erlang, as far as the rest of the world was concerned, burst forth from Joe’s forehead, fully formed. This is not something like Ruby, where the early adapters were small companies like 37 signals, willing to live on the edge and not follow the pack. Quite the contrary, Erlang was backed by a huge, established player in the telecom market, focused particularly on applications where reliability is paramount. Certainly not a language you can call a ‘toy’!

The downside to all this, though, is that, esentially, they can’t really change the language much without being very, very careful. Here’s what Joe says about what he might have done differently:

Looking back, is there anything you would change in the language’s development?

Removing stuff turns out to be painfully difficult. It’s really easy to add features to a language, but almost impossibly difficult to remove things. In the early days we would happily add things to the language and remove them if they were a bad idea. Now removing things is almost impossible.

The main problem here is testing, we have systems with literally millions of lines of code and testing them takes a long time, so we can only make backwards compatible changes.

Some things we added to the language were with hindsight not so brilliant. I’d happily remove macros, include files, and the way we handle records. I’d also add mechanism to allow the language itself to evolve.

Which is an interesting contrast to more “traditional” open source languages like Ruby and Python, which, over the years, have from time to time bitten the bullet and unloaded stuff they don’t like. On the other hand, Erlang is solid code that you can count on if you do need to write systems with little to no downtime.

What do you think of the two approaches?

Language Geekery: Reia Programming Language

This looks kind of interesting:

http://wiki.reia-lang.org/wiki/Reia_Programming_Language

In short: it’s a Ruby/Python type of language on top of Erlang. It has a mishmash of Ruby and Python syntax, and also does objects, implemented as Erlang processes.

Funny quote:

Reia uses an indentation-sensitive syntax. This allows Reia to look similar to Erlang without relying on “ant turd” tokens (i.e. , ; and .) to structure relationships between forms.

That made me laugh, and is in all seriousness, something that I do find to be a bit annoying in Erlang, especially when refactoring code. You can’t just move a line of code from one place to another; you move it, then change the line endings in both places.

My preference would be to do something more Ruby style rather than Python style for the following reason: the indentation thing makes it harder to use straight-up Python for templating purposes, whereas Ruby can be used as-is. Templates are something that I wouldn’t particularly want to do in Erlang, either, so doing them with this language might be a very nice alternative.

Anyway, it looks to be in its infancy. Unfortunately, it seems to require a more recent version of Erlang than what ships with Ubuntu Hardy. Maybe Intrepid will have something recent enough to run it.

It’s a little known fact that I originally wrote Hecl in Erlang, although, sadly, I can’t find the code any more.