Nickels, dimes, pennies, and Italian regulations

I have recently gone about opening a "Partita IVA" ( http://it.wikipedia.org/wiki/Partita_IVA ) so I can act as an independent consultant here.  Like everything here, it's a pain in the neck, but opening it wasn't all that bad, compared to other close encounters of the bureaucratic kind that I've had.

When it came time to send out my first bill, of course I had to get the accountant to help me put it together (simply sending a bill with the amount to be paid would be way too easy).  The crowning touch, though, was that I had to go to the "tabaccheria" and purchase a 1.81 (one Euro, eighty-one cents) "marca da bollo" ( http://en.wikipedia.org/wiki/Italian_revenue_stamp ) to affix to the aforementioned bill.  This is only necessary, however, in cases where the bill exceeds 77.47 (seventy-seven Euro, forty-seven cents).  The end result was that between asking the accountant for help, going over to the store to get the stamp, and so on, I probably wasted more than half an hour of my life on something that really isn't that complicated.

Who dreams this bullshit up, anyway?

Google execs convicted

In an update to an earlier article I posted, it appears that the Google executives in question have been convicted:

http://www.corriere.it/salute/disabilita/10_febbraio_24/dirigenti-google-condannati_29ebaefe-2122-11df-940a-00144f02aabe.shtml (in Italian)

They were convicted for having failed to block the publication of a video showing some teenagers picking on and hitting another minor with Down's syndrome.

It will be interesting to see how Google reacts.  Apparently, the court believes that Google is criminally responsible for videos its users happen to post, which means that it would, in theory, have to review every submitted video to determine whether its content infringes on someone's rights?

Update:

Here's a New York Times link:

http://www.nytimes.com/aponline/2010/02/24/business/AP-EU-Italy-GoogleTrial.html


Update 2:

"cate" posted a link to Google's official response: http://googleblog.blogspot.com/2010/02/serious-threat-to-web-in-italy.html

Also, it's really incredible to read the comments here (in Italian): http://vitadigitale.corriere.it/2010/02/processo_vivi_down_google_cond.html

Most of them are against this ruling, but a significant number think it's a good thing, which just goes to show that you can't put all the blame on politicians for Italy's woes: someone is voting for them, after all.

Italy vs Google

I'm starting to notice a pattern here:

  • Google executives are on trial because some sorry excuses for human beings picked on a minor with Down's syndrome and posted the video to YouTube: http://news.bbc.co.uk/2/hi/technology/8115572.stm – this one is simply preposterous.  Going after the execs of a company who did nothing to aid, abet, condone or in any way facilitate the abuse in question is absurd, and if extended to other industries would mean that you could pretty much attack any company whose products happened to figure in a crime somehow.  Kitchen knives, hunting rifles, golf clubs, even automobiles would seem fair game.
  • Italy is going after "user generated content" sites like YouTube and wants to force them to register with the government if they wish to operate: http://arstechnica.com/tech-policy/news/2010/02/italy-preparing-to-hold-youtube-others-liable-for-uploads.ars
  • And last but not least, this hit piece in the normally respectable Corriere della Sera: http://www.corriere.it/economia/10_gennaio_28/mucchetti_4de4be8a-0be8-11df-bc70-00144f02aabe.shtml
     – it's in Italian, but the gist of it is that Mr Mucchetti really has it in for Google because they operate out of Ireland in the EU, whereas he believes they should be registered in Italy as a publisher, and subject to Italy's myriad rules, regulations, and, of course, taxes regarding publishing.  Despite, well, not really publishing much of anything themselves. He mentions "tax evasion" charges that had been considered, because the Italian division of Google is not where the AdSense revenue in Europe goes.  I suppose he figures that since the ads are bought by residents of Italy, the money should somehow stay in Italy?  He also huffs and puffs about Italy's antitrust laws, which, in the same piece, he admits were created with the express purpose of not touching existing companies (the market share limit was set higher than the share of the largest existing company).  Perhaps he would do well to reflect on political schemes and carve-ups like that and think about why companies like Google go to Ireland, rather than Italy.  He also makes some quick mentions of network neutrality, and rambles on a bit about how it's a battle between the "Obamanian, Californian, search engines" versus the telecommunications industry, in "the rest of the world and above all in Europe".  And of course he uses a liberal sprinkling of keywords like "globalization", "multinational corporations", and "deregulated" to attempt to paint Google as a big, evil company throwing its weight around.  One wonders if there aren't more pressing problems with the Italian media industry, such as the prime minister owning a large chunk of it?

One way of seeing things is that politicians and businessmen in Italy noticed Google was actually making quite a bit of money, and even if they don't quite understand this internet thing, they want some of the loot.

And while Google certainly is becoming big enough to be cause for worry and discussion, the moves against them in Italy do not seem anything like a rational response calculated to offset severe failures in the market.

In any case, it will be interesting to see what happens.  Maybe, after China, we'll see Google quit Italy as well?

Flippa experiment

I decided to try a little experiment with Flippa.com, a site where you can auction off domains or web sites.

I put http://www.innsbruck-apartments.com up for auction:


http://flippa.com/auctions/83341/Innsbruck-Austria-rental-listing-site—Ski-Season

We'll see how it goes and whether the site is worth using for other sites that I'd like to sell on.

It's a good test case, because it's a site I threw together years ago simply to aid our search for a new apartment in Innsbruck, and that friends then requested as well.

Rough Estimates of the Dollar Cost of Scaling Web Platforms – Part I

I have been pondering the idea behind this article for a while, and finally had a bit of time to implement it.

The basic idea is this: certain platforms have higher costs in terms of memory per concurrent connection. Those translate into increased costs in dollar terms.

Nota Bene: Having run LangPop.com for some time, I'm used to people getting hot and bothered about this or that aspect of statistics that are rough in nature, so I'm going to try and address those issues from the start, with more detail below.

  • Constructive criticism is welcome. I expect to utilize it to revisit these results and improve them. Frothing at the mouth is not welcome.
  • There is something of a "comparing apples and oranges" problem inherent in doing these sorts of comparisons. As an example, Rails gives you a huge amount of functionality "out of the box", whereas Mochiweb does much less. More on that below.
  • I am not familiar with all of these systems: meaning that I may not have configured them as I should have. Helpful suggestions are, of course, welcome. Links to source code are provided below.
  • You can likely handle many more 'users' than concurrent connections; a concurrent connection here means a browser that is connected to the site at the same moment as the others.
  • Programmer costs are probably higher than anything else, so more productive platforms can save a great deal of money, which more than makes up for the cost of extra memory.  There's a reason that most people, outside of Google and Yahoo and sites like that, don't use much C for their web applications.  Indeed, I use Rails myself, even though it uses a lot of memory and isn't terribly fast: I'd rather get sites out there, see how they do, and then worry about optimizing them (which is of course quite possible in Rails).

Methodology

All tests were run as follows: my new laptop, with two cores and four gigs of memory, was used as the server, and my older laptop ran the ab (ApacheBench) program; the two are connected via ethernet. I built up through successive levels of concurrency, starting with 1 concurrent connection, then 2, 10, and so on. The "server" computer is running Ubuntu 9.10, "karmic".
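
The ramp-up described above could be scripted along these lines. This is a minimal sketch, not the script actually used for these tests (that one is linked under Source Code / Spreadsheet below): the URL, the request count and the function names are hypothetical, and only the first three concurrency levels are named in the text. The `-n` and `-c` flags are ab's options for the total number of requests and the concurrency level.

```python
import subprocess

# Concurrency levels named in the text; the later levels are not given there.
LEVELS = (1, 2, 10)


def ab_command(url, concurrency, requests_per_level=1000):
    """Build the ab command line for one concurrency level.

    requests_per_level is an assumed value, not taken from the original tests.
    ab refuses to run with fewer requests than concurrent connections,
    so the request count is raised to at least the concurrency level.
    """
    total = max(requests_per_level, concurrency)
    return ["ab", "-n", str(total), "-c", str(concurrency), url]


def run_levels(url, levels=LEVELS):
    """Run ab once per level, in increasing order (needs ab installed)."""
    for level in sorted(levels):
        subprocess.run(ab_command(url, level), check=True)


if __name__ == "__main__":
    # Hypothetical address of the "server" laptop; only prints the commands.
    for level in LEVELS:
        print(" ".join(ab_command("http://192.168.1.10/index.html", level)))
```

Keeping the command construction separate from the code that runs it makes it easy to check the command lines before pointing them at a real server.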

Platforms

The platforms I tested:

  • Apache 2.2, running the worker MPM, serving static files.
  • Nginx 0.7.62, serving static files.
  • Mochiweb from svn (revision 125), serving static files.
  • Jetty 6.1.20-2, serving static files.
  • Rails 2.3.5, serving up a simple template with the current date and time.
  • PHP 5.2.10.dfsg.1-2ubuntu6.3, serving up a single php file that prints the current date and time.
  • Django 1.1.1-1ubuntu1, serving up a template with the date and time.
  • Mochiweb, serving a simple template (erltl) with the date and time.
  • Jetty, serving a simple .war file containing a JSP file, with, as clever observers will have surmised, the date and time.

As stated above, it's pretty obvious that using Rails or Django for something so simple is overkill.

Better Tests for the Future

I would like to run similar tests with a more realistic application, but I simply don't have the time or expertise to sit down and write a blog, say, for all of the above platforms. If I can find a few volunteers, I'd be happy to discuss some rough ideas about what those tests ought to look like. Some ideas:

  • They should test the application framework with a realistic, real world type of example.
  • The data store should figure as little as possible – I want to concentrate on testing the application platform for the time being, rather than Postgres vs Sqlite vs Redis. Sqlite would probably be a good choice to utilize for the data store.
  • Since this first test is so minimalistic, I think a second one ought to be fairly inclusive, making use of a fair amount of what the larger systems like Rails, Django and PHP offer.
  • I'd also be interested in seeing other languages/platforms.
  • The Holy Grail would be to script all these tests so that they're very easy to run repeatably.

Results

With that out of the way, I do think the results are meaningful, and reflect something of what I've seen on various platforms in the real world.

First of all, here we look at the total "VSZ" (as ps puts it) or Virtual Size of the process(es) in memory. Much of this might be shared, between libraries, and "copy on write" where applicable.
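
As a rough illustration of how such a total can be gathered, the sketch below sums the VSZ column that `ps` reports for every process with a given command name. It assumes a Linux `ps` from procps, where `-C` selects processes by command name and `-o vsz=` prints the virtual size in KiB without a header line. The function names are made up for this example, and this is not necessarily how the numbers here were collected.

```python
import subprocess


def sum_vsz_kib(ps_output):
    """Add up the VSZ values (one per line, in KiB) printed by ps."""
    return sum(int(value) for value in ps_output.split())


def total_vsz_kib(command_name):
    """Total VSZ in KiB of all processes named command_name.

    Relies on procps options: -C selects by command name, and
    '-o vsz=' prints only the VSZ column with no header.
    """
    result = subprocess.run(
        ["ps", "-C", command_name, "-o", "vsz="],
        capture_output=True,
        text=True,
    )
    # ps prints nothing when no process matches, which sums to zero.
    return sum_vsz_kib(result.stdout)
```

Calling `total_vsz_kib("nginx")`, for example, would give the combined figure for all nginx processes. Because shared pages are counted once per process, a sum like this overstates real memory use, which is the caveat mentioned above.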

The results are impressive: Rails, followed by Django and PHP, eats up a lot of memory for each new concurrent connection. Rails, which I know fairly well, most likely suffers from several problems: 1) It includes a lot of code. That's actually a good thing if you're building a reasonably sized app that makes use of all it has to offer. 2) Its garbage collector doesn't play well with "copy on write", which is what "Enterprise Ruby" aims to fix. Django and PHP are also fairly large, capable platforms when compared to something small and light like Mochiweb.

That said, excuses aside, Erlang and Mochiweb are very impressive in how little additional memory they utilize when additional concurrent connections are thrown at them. I was also impressed with Jetty. I don't have a lot of experience with Java on the web (I work more with J2ME for mobile phones), so I expected something a bit more "bloated", which is the reputation Java has. As we'll see below, Jetty does take up a lot of initial memory, but subsequent concurrent connections appear to not take up much.  Of course, this is also likely another 'apples and oranges' comparison and it would be good to utilize a complete Java framework, rather than just a tiny web app with one JSP file.

So what's this mean in real world terms of dollars and cents? As your Rails application gets more popular, you're going to have to invest relatively more money to make it scale, in terms of memory.

For this comparison, I utilized the bytes/dollar that I'm getting for my Linode, which works out to 18,889,040.85 ($79.95 for 1440 MB a month).

As we can see, handling a similar number of concurrent users is essentially free for Mochiweb, whereas with Rails, it has a significant cost.  This information is particularly relevant when deciding how to monetize a site: with Erlang and Jetty it would appear that scaling up to lots of users is relatively cheap, so even a small amount of revenue per user per month is going to be a profit, whereas with Rails, scaling up to huge numbers of users is going to be more expensive, so revenue streams such as advertising may not be as viable.  It's worth noting that 37signals, the company that created Rails, is a vocal supporter of charging money for products.
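
The arithmetic behind these dollar figures can be sketched as follows. The price and memory size are the Linode figures quoted above; treating 1 MB as 2^20 bytes is an assumption, and it gives roughly 18.9 million bytes per dollar, close to the figure quoted. The VSZ readings in the example are made-up placeholders, not measurements from these tests, and the function names are only for illustration.

```python
PLAN_DOLLARS_PER_MONTH = 79.95   # Linode price quoted in the text
PLAN_MEMORY_MB = 1440            # memory included at that price
BYTES_PER_MB = 1024 * 1024       # assumption: 1 MB means 2**20 bytes


def bytes_per_dollar():
    """Bytes of RAM that one dollar a month buys on this plan."""
    return PLAN_MEMORY_MB * BYTES_PER_MB / PLAN_DOLLARS_PER_MONTH


def marginal_bytes_per_connection(vsz_low, vsz_high, conns_low, conns_high):
    """Average extra memory used by each additional concurrent connection."""
    return (vsz_high - vsz_low) / (conns_high - conns_low)


def dollars_per_connection(vsz_low, vsz_high, conns_low, conns_high):
    """Monthly memory cost of one additional concurrent connection."""
    extra = marginal_bytes_per_connection(vsz_low, vsz_high, conns_low, conns_high)
    return extra / bytes_per_dollar()


if __name__ == "__main__":
    # Placeholder readings: 200 MB of VSZ at 1 connection, 650 MB at 10.
    low, high = 200 * BYTES_PER_MB, 650 * BYTES_PER_MB
    print(round(bytes_per_dollar(), 2))
    print(round(dollars_per_connection(low, high, 1, 10), 2))
```

Note that `ps` reports VSZ in KiB, so real readings would need to be multiplied by 1024 before being passed in as bytes.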

There's another interesting statistic that I wanted to include as well.  The previous graph shows the average cost per additional concurrent user, but this one shows how much the platform costs when there is just one user, so it acts as a sort of baseline:

As we can see, Jetty is particularly expensive from this point of view.  The default settings (on Ubuntu) seem to indicate that, for instance, the basic $20 a month Linode package would not be sufficient to run Jetty, plus a database, plus other software.  I think that the Apache Worker number is off a bit, and may reflect settings made to handle a large number of connections, or perhaps a different MPM would make sense.

Source Code / Spreadsheet

The spreadsheet I put together is here: http://spreadsheets.google.com/ccc?key=0An76R90VwaRodElEYjVYQXpFRmtreGV3MEtsaWYzbXc&hl=en

And the source code (admittedly not terribly well organized) is here: http://github.com/davidw/marginalmemory/

Detecting BlackBerry JDE Version

Recently, I went back and added some preprocessor code (it’s pretty much necessary in the world of J2ME) to ensure that Hecl would compile with older versions of the BlackBerry JDE. However, I also faced a problem: how to figure out what version of the JDE we’re using. It could be my latest cold clouding my mind, but I couldn’t find a simple way to do this. It never seems to be simple with the BlackBerry platform, unfortunately.

I did, however, finally find a nice way to obtain this information programmatically: the bin/rapc.jar file, which ships with the JDE, contains a file called app.version, which, indeed, contains the version of the JDE in use. I hacked up this code to read it and print it out:
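
The original listing is not included here, so what follows is only a minimal Python sketch of the same idea, not the code that was actually used. It assumes that rapc.jar is an ordinary zip archive (as .jar files are) and that app.version is a small text file somewhere inside it; the function name is made up for this example.

```python
import zipfile


def read_jde_version(rapc_jar_path):
    """Return the contents of app.version inside rapc.jar, or None if absent."""
    with zipfile.ZipFile(rapc_jar_path) as jar:
        for name in jar.namelist():
            # Match the entry whether it sits at the root or in a subdirectory.
            if name.split("/")[-1] == "app.version":
                text = jar.read(name).decode("utf-8", errors="replace")
                return text.strip()
    return None
```

Calling `read_jde_version` with the path to bin/rapc.jar in a JDE installation would return the version string, which a build script could then use to choose the right preprocessor settings.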

File Selector for Java ME

I recently did some work to make Hecl read files, which also means that it can execute Hecl scripts from the phone’s memory. This is especially important for environments like BlackBerry, where we will be distributing a signed version of the Hecl application. To create your own Hecl applications, instead of simply replacing the script in the .jar file, you can point to a script on the device’s file system. This is also available for normal Java ME phones, but unfortunately, for an open source project, the cost of a code signing certificate is prohibitive (on BlackBerry, it’s only $20, so I bought one with my own money).

In any case, as part of this effort, I developed a very simple ‘file browser’, which is used in Hecl to select a script to execute.

The results are, like all of Hecl, available under an Apache license, which means that you can use it pretty much wherever you want:

http://github.com/davidw/hecl/blob/master/files/org/hecl/files/FileFinder.java

http://github.com/davidw/hecl/blob/master/files/org/hecl/files/FileFinderCallback.java

Of course, if you spot any ways to improve it or fix problems with it, I’d appreciate it if you sent patches back.

Mysql, Oracle and the European Commission

I am a bit of a cynic, and my cynic-sense definitely lit up when I read this:

http://monty-says.blogspot.com/2009/12/help-saving-mysql.html

He wants to have his cake and eat it too. Or as they say in Italian, he wants to have his wife drunk and the barrel full (really!).

He sold Mysql to Sun for a great deal of money – around one billion dollars. Even if his share is only 1% (which I doubt), that is still 10 million dollars, which is enough for anyone normal to live the rest of their life without ever having to work again. If he walked away with 10% of the deal, that’s 100 million dollars.

Monty writes:

"I have spent the last 27 years creating and working on MySQL and I hope, together with my team of MySQL core developers, to work on it for many more years."

Since it’s under the GPL, he can do that himself for as long as he wants – he doesn’t have to worry about making a living, after all. If he took away something closer to the 100 million number, he can also fund several of his friends to work on it for years and years, without worrying much about income. They could always do consulting if they wanted to make a few bucks, in any case.

Now, what he’s worried about is that Oracle will no longer put money into Mysql. I agree that that’s a real risk.

However, I also think that since it’s Oracle’s money that is being spent, they have the right to do as they see fit, within certain limits, and this case is well within those limits. My view is that there are plenty of cases when governments should intervene. For instance, if Mysql were the only competitor to Oracle in the database market, it might be quite unhealthy to let them buy it. But that’s not the case – not at all. There are numerous competitors, both closed and open.

So the risk to ‘the public’ is relative: people bought into a product that may now stagnate, but if they really want, they have plenty of other places they can jump without too much pain (SQL is a standard, after all). It’s like if FreeBSD disappeared tomorrow: you could switch to OpenBSD or Linux or something else. Perhaps not painlessly, but it wouldn’t be the end of the world, either (let’s hope FreeBSD has a long and happy future ahead of it, though).

We’re back to wanting to both have and eat the cake: Monty seems to want the EC to force Oracle to divest itself of Mysql or impose various restrictions on its development. If he had imposed those conditions on Sun when that sale was made, they might not have done the deal, or might have placed a smaller value on it. He didn’t, but now he is back and trying to impose them on a deal in which he is not a participant (although I suppose he may have some Sun shares). That doesn’t strike me as being entirely fair: if he really wanted to ensure the future of Mysql, he would not have sold the company.

In terms of open source, keep in mind that Mysql was not released under the GPL until 2001:

http://www.mysql.com/news-and-events/generate-article.php?id=23

For many years it was under a “sort of open” license that was not proper open source.

Then there’s the Richard Stallman angle: that somehow, Mysql “deserves” to have developers funded to work on it. I don’t buy that, either. Mysql is a nice project, but I don’t think that there’s any moral imperative that the government should step in and fund some open source projects. Should the EC step in and say that Oracle should also fund Postgres development while they’re at it? There are plenty of deserving projects that could use some cash to fund their development. For instance, my friend Salvatore is looking for donations for Redis development. It’s not as big or “important” a project as Mysql, but it’s good code and already being taken up by various companies. For that matter, I’d like to keep working on Hecl full time.

In short: I hope that the EC approves the sale “as-is”. I also hope that Oracle continues to take good care of Mysql, but if they don’t, that is ultimately their decision as long as there remain plenty of competitors in the market.

I think that our time as “the open source community” is better spent fighting for more important things: against bad software patents, for open standards in governments, and that sort of thing, that will benefit everyone, rather than trying to wrest control of Monty’s code back from Oracle after he’s already been paid handsomely for it.

So Long, Debian

I went ahead and officially retired from Debian today. It’s something I probably ought to have done a while ago, but have been putting off. I’ve had some great times with Debian and met a lot of good people. Debian is also where I really got my start with free software. For me, and others, Debian is something of a stepping stone: it’s fairly easy to start packaging up some program, and before you know it, you’re hooked on the whole open source thing. These days I mostly use Ubuntu, but hold Debian in very high regard in terms of the technical quality. And I’m also glad that they’ve held the line on free software over the years: with a Debian CD, I know I can make as many copies as I want for anyone, and use it both myself and for whatever commercial applications my clients need. How cool is that?

Google Design Annoyance

I’m the last person in the world to have much of anything to say about visual design, but this one is so blatant that even I can’t help but notice:

Adsense

Analytics

See the big blue button in the same place? In one case, it is a ‘sign up’ button, in the other it is an ‘enter if you already have an account’ button. This is visually confusing for those (and there are a lot of us, I’m sure) who use both products.