Hosted Code Repository?

Recently (yesterday, actually), the Ruby on Rails project announced its switch to git for its revision control needs. They also announced that they will use the hosted service github as the home of the main repository (even though git is decentralized, there is some sense in having a “main tree” which contains what’s going to become the official releases).

I didn’t know github, so I had a look at their service.

What I don’t understand is that they seem to also target commercial entities with their offering. Think about it: suppose you are a commercial entity doing commercial software development. Would you send all your source code and your entire development history over to another company?

Sure, they call themselves “secure”. But what does that mean? They have SSL and SSH support, but frankly, I’m less concerned about patches travelling over the network unencrypted than I am about trusting anybody else to host my code.

Even if they don’t screw up storage security (think: “accessing the code of your competition”), even if they are completely 100% trustworthy (think: “displeased employee selling out to your competition before leaving his employer”), there is still the issue of government/legal access.

When using an external hosting provider, you are storing your code (and history) in a foreign country with its own legislation. Are you prepared for that?

And finally, do you really want the government of the country you’ve just sent your code (and history) to, to have access to all that data? Who guarantees that the hosting provider of your choice won’t cooperate as soon as the government comes knocking (it has happened before, even without any legal basis at all)?

For a larger company (or for smaller ones – like ours), all that is simply never worth the risk.

So who exactly are these hosting companies (github is one, Code Spaces is another) targeting?

  • Free Software developers? Their code is open to begin with, so they would face the problems I described anyway. But they are much harder to sue. Also, I’m not sure how compelling it is for a free software project to use a non-free tool (Rails being the exception, but we’ll talk about that later on).
  • Large companies? No way (see above)
  • Smaller companies? Probably not. Smaller companies are less of a target due to their lower visibility, but suing them over anything is more likely to get you something in return quickly, as they usually don’t dare enter prolonged legal fights.

A rant on brace placement

Many people consider it to be good coding style to place braces (in languages that use them for block boundaries) on their own line. Like so:

function doSomething($param1, $param2)
{
    echo "param1: $param1 / param2: $param2";
}

Their argument usually is that it clearly shows the block boundaries, thus increasing readability. I, as a proponent of placing braces at the end of the statement that opens the block, strongly disagree. I would format the above code like so:

function doSomething($param1, $param2){
    echo "param1: $param1 / param2: $param2";
}

Here is why I prefer this:

  • In many languages, code blocks don’t have their own identity – functions do, but blocks don’t (they don’t provide scope). By placing the opening brace on its own line, you emphasize the block, but you actually make it harder to see what caused the block in the first place.
  • With correct indentation, the presence of the block should be obvious anyway. There is no need to emphasize it further (at the cost of readability of the block-opening statement).
  • I doubt that using one line per token really makes the code more readable. Heck… why don’t we write that sample code like so?
function
doSomething
(
$param1,
$param2
)
{
    echo "param1: $param1 / param2: $param2";
}

PostgreSQL on Ubuntu

Today, it was time to provision another virtual machine. While I’m a big fan of Gentoo, there were some reasons that made me decide to gradually start switching over to Ubuntu Linux for our servers:

  • One of the big advantages of Gentoo is that it’s possible to get bleeding edge packages. Or at least it’s supposed to be. Lately, it’s been taking longer and longer for an ebuild of an updated version to finally become available. Take PostgreSQL, for example: it took about 8 months for 8.2 to become available, and it looks like history is repeating itself for 8.3.
  • It seems like there are more flamewars than real development going on in Gentoo-land lately (which in the end leads to the problems above).
  • Sometimes, init scripts and other details change over time, and there is not always a clear upgrade path. Run emerge -u world once, forget to etc-update, and on the next reboot all hell will break loose.
  • Installing a new system takes ages due to the manual installation process. I’m not saying it’s hard – it’s just time-intensive.

Earlier, the advantage of having current packages greatly outweighed the issues that come with Gentoo, but lately, due to the current state of the project, it’s taking longer and longer for packages to become available. So that advantage fades away, leaving me with only the disadvantages.

So at least for now, I’m sorry to say, Gentoo has outlived its usefulness on my production servers and has been replaced by Ubuntu, which, albeit not bleeding-edge with its packages, at least provides a very clean upgrade path and installs quickly.

But back to the topic which is the installation of PostgreSQL on Ubuntu.

(It’s ironic, btw, that Postgres 8.3 actually is in the current Hardy beta, together with a framework for running multiple versions concurrently, whereas it’s still nowhere to be seen for Gentoo. Granted: an experimental overlay exists, but that’s mostly untested and I had some headaches installing it on a dev machine.)

After installing the packages, you may wonder how to get it running. At least I wondered.

/etc/init.d/postgresql-8.3 start

did nothing (not a very nice thing to do, btw). initdb wasn’t in the path. This was a real WTF moment for me and I assumed some problem with the package installation.

But in the end, it turned out to be an (underdocumented) feature: Ubuntu comes with a really nice framework for keeping multiple versions of PostgreSQL running at the same time. And it comes with scripts that help set up that configuration.

So what I had to do was to create a cluster with

pg_createcluster --lc-collate=de_CH --lc-ctype=de_CH -e utf-8 8.3 main

(your settings may vary – especially the locale settings)

Then it worked flawlessly.
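
As an aside, the same postgresql-common framework ships a few more helper scripts that come in handy once the cluster exists. A quick sketch of how I use them – the commands are present on my Hardy machine, but names and options may differ with your package version:

# list all clusters with their version, port, status and data directory
pg_lsclusters

# stop and start a specific cluster instead of poking the init script directly
pg_ctlcluster 8.3 main stop
pg_ctlcluster 8.3 main start

# throw a cluster away again (--stop shuts it down first if it is still running)
pg_dropcluster --stop 8.3 main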

I do have some issues with this process though:

  • it’s underdocumented. Good thing I speak Perl and bash, so I could use the source to figure this out.
  • in contrast to about every other package in Ubuntu, installing the packages does not leave you with a working setup. You have to create the cluster manually after installing the packages.
  • pg_createcluster --help bails out with an error
  • I had /var/lib/postgresql on its own partition and forgot to remount it after a reboot, which caused the init script to fail with a couple of “uninitialized value” errors from Perl itself. This should be handled more cleanly.

Still, it’s a nice configuration scheme and real progress compared to Gentoo. The only thing left for me now is to report these issues to the bug tracker and hope to see them fixed eventually. And if they aren’t, there is this post to remind me and my visitors.

Another new look

It has been a while since the last redesign of gnegg.ch, but is a new look after just a little more than one year of usage really needed?

The point is that I have changed blogging engines yet again. This time it’s from Serendipity to WordPress.

What motivated the change?

Interestingly enough, if you ask me, s9y is clearly the better product compared to WordPress. If WordPress is Mac OS, then s9y is Linux: it has more features, it’s based on cleaner code, and it doesn’t have any commercial backing at all. So the question remains: why switch?

Because that OS X/Linux analogy also works the other way around: s9y is an ugly duckling compared to WP. External tools won’t work (well) with s9y because it isn’t well known enough. The number of knobs to tweak is sometimes overwhelming, and the available plugins are not nearly as polished as the WP ones.

All of these are reasons that made me switch. I’ve used an s9y-to-WP converter, but some heavy tweaking was needed to make it actually transfer category assignments and tags (the former didn’t work, the latter wasn’t even implemented). Unfortunately, the changes were too hackish to publish here, but it’s quite easily done.

Aside from that, most of the site has survived the switch quite nicely (the permalinks are broken once again, though), so let’s see how this goes :-)

Impressed by git

The company I’m working with is a Subversion shop. It has been for a long time – since fall of 2004, actually, when I finally decided that the time of CVS was over and that I was going to move to Subversion. As I was the only developer back then, and as the whole infrastructure mainly consisted of CVS and ViewVC (cvsweb back then), this was an easy move.

Now we are a team of three developers, heavy trac users and truly dependent on Subversion, which is – mainly due to the amount of infrastructure we have built around it – not going away anytime soon.

But nonetheless, we (mainly I) were feeling the shortcomings of Subversion:

  • Branching is not something you do easily. I tried working with branches before, but merging them really hurt, thus making it somewhat prohibitive to branch often.
  • Sometimes, half-finished stuff ends up in the repository. This is unavoidable when the alternative is keeping a bucket load of uncommitted changes sitting in the working copy.
  • Code review is difficult, as actually trying out patches is a real pain: sending, applying and reverting them is all manual work.
  • A pet peeve of mine, though, is untested, experimental features developed out of sheer interest. Stuff like that lies around in the working copy, waiting to be reviewed or even just to have its real-life use discussed. Sooner or later, a needed change must go in and you have the options of either sneaking in the change (bad), manually diffing it out (sometimes hard to do) or just forgetting about it and svn reverting it (a real shame).

Ever since the Linux kernel first began using BitKeeper to track development, I knew that there was no technical reason for these problems. I knew that a solution for all of this existed and that I just wasn’t ready to try it.

Last weekend, I finally had a look at the different distributed revision control systems out there. Due to the insane amount of infrastructure built around Subversion, and so as not to scare off my team members, I wanted something that integrated with Subversion, using that repository as the place where official code ends up, while still giving us the freedom to fix all the problems listed above.

I had a closer look at both Mercurial and git, and in the end, git’s nicely working SVN integration was what made me dig deeper into it.

Contrary to what everyone is saying, I have no problem with the interface of the tool – once you learn the terminology, it’s quite easy to get used to the system. So far, I’ve done a lot of testing with both live and test repositories – everything is working out very nicely. I’ve already seen the impressive branch merging abilities of git (to think that in Subversion you actually have to a) find out at which revision a branch was created and b) remember every patch you cherry-picked… crazy) and I’m getting into the details more and more.

On our trac installation, I’ve written a tutorial on how we could use git in conjunction with the central Subversion server which allowed me to learn quite a lot about how git works and what it can do for us.
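
The tutorial itself lives in our trac, but the gist of the workflow looks roughly like this – a sketch from memory, with a made-up repository URL and branch name:

# create a local git repository that tracks the central Subversion server
git svn clone -s https://svn.example.com/repos/project project
cd project

# hack away on a private topic branch
git checkout -b experimental-feature
git commit -a -m "first stab at the experimental feature"

# meanwhile, keep pulling in what the others committed to Subversion
git checkout master
git svn rebase

# once the feature is ready, fold it back and push it to the central repository
git merge experimental-feature
git svn dcommit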

So for me it’s git-all-the-way now and I’m already looking forward to being able to create many little branches containing many little experimental features.

If you have the time and you are interested in gaining many unexpected freedoms in matters of source code management, you too should have a look at git. Also consider that no change at all is needed on the Subversion backend side, meaning that even if you are forced to use Subversion, you can privately use git to help you manage your work. Nobody would ever have to know.

Very, very nice.

Failing silently is bad

Today, I’ve experienced the perfect example of why I prefer PostgreSQL (congratulations for a successful 8.3 release today, guys!) to MySQL.

Let me first give you some code, before we discuss it (assume that the data which gets placed in the database is – wrongly so – in ISO-8859-1):

This is what PostgreSQL does:

bench ~ > createdb -Upilif -E utf-8 pilif
CREATE DATABASE
bench ~ > psql -Upilif
Welcome to psql 8.1.4, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

pilif=> create table test (blah varchar(20) not null default '');
CREATE TABLE
pilif=> insert into test values ('gnügg');
ERROR:  invalid byte sequence for encoding "UTF8": 0xfc676727293b
pilif=>

and this is what MySQL does:

bench ~ > mysql test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 97
Server version: 5.0.44-log Gentoo Linux mysql-5.0.44-r2

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create table test( blah varchar(20) not null default '')
    -> charset=utf8;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into test values ('gnügg');
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> select * from test;
+------+
| blah |
+------+
| gn   |
+------+
1 row in set (0.00 sec)

mysql>

Obviously, it is wrong to try and place latin1-encoded data in a utf-8 formatted data store: while every valid utf-8 byte sequence is also a valid latin1 byte sequence (latin1 does not restrict the validity of bytes, though some positions may be undefined), the reverse certainly is not true. The character ü from my example is 0xfc in latin1 and U+00FC in Unicode, which must be encoded as 0xc3 0xbc in utf-8. 0xfc alone is not a valid utf-8 byte sequence.
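
You can see both representations on any shell that has iconv and xxd installed – a quick sketch, assuming your terminal and locale are set to utf-8:

# the utf-8 representation of ü: the two bytes 0xc3 0xbc
printf 'ü' | xxd

# the same character converted to latin1: the single byte 0xfc,
# which on its own is not a valid utf-8 sequence
printf 'ü' | iconv -f utf-8 -t latin1 | xxd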

So if you pass this invalid sequence to any entity accepting a utf-8 encoded byte stream, it will not be clear what to do with that data. It’s not utf-8, that’s for sure. But assuming that no character set is specified along with the stream, it’s impossible to guess what to translate the byte sequence into.

So PostgreSQL sees the error and bails out (this happens when both the server and the client are set to utf-8 and data arrives in a non-utf-8 format – if the client declares a different encoding, PostgreSQL knows how to convert the data, since conversion from any character set to utf-8 is always possible). MySQL, on the other hand, decides to fail silently and tries to fix up the invalid input.
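
To illustrate the “otherwise it knows how to convert” part: if the client honestly declares the encoding it is actually using, the very same latin1 bytes go in just fine. A sketch against the test table from above:

pilif=> set client_encoding = 'LATIN1';
SET
pilif=> insert into test values ('gnügg');
INSERT 0 1

PostgreSQL now knows the input is latin1, converts 0xfc into 0xc3 0xbc and stores a proper utf-8 string.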

Now, while I could maybe live with a default of assuming latin1 encoding, silently truncating the data at the first invalid byte without any warning whatsoever leads to undetected loss of data!

What if I’m not just entering one word? What if it’s a blog entry like this one? What if the entry is made by a non-tech-savvy user? Remember: this mistake is easily made – wrong Content-Type headers, old browsers, broken browsers… it’s very easy to end up with latin1 when you want utf-8.

While I agree that sanitization must be done in the application tier (preferably in the model), it’s unacceptable for a database to store different data than what it was told to store without warning the user in any way. This easily leads to data loss or data corruption.

There are many more little things like this where MySQL decides to silently fail while PostgreSQL (and any other database) bails out correctly. As a novice, this can feel tedious. It can feel like PostgreSQL is pedantic and like you are faster with MySQL. But let’s be honest: what do you prefer? An error message, or lost data with no way of knowing that it’s lost?

This, by the way, is the outcome of a lengthy debugging session on a Typo3 installation, which is also, though not ultimately, to blame here. In a perfect world, MySQL would bail out, and Typo3 would do at least one of the following:

  • Not specify charset=utf8 when creating the table unless specifically asked to.
  • Send a charset=utf-8 http-header, knowing that the database has been created as containing utf-8
  • Sanitize user input before handing it over to the mysql-backend which is obviously broken in this instance.
Now back to debugging real software on real databases *grin*

reddit’s commenting system

This is something I wanted to talk about for quite some time now, but I never got around to it. Maybe you know reddit. reddit basically works like digg.com – it’s one of these web2.0 mashup community social networking bubble sites. reddit is about links posted by users and voted for by users.

Unlike digg, reddit has an awful screen design and thus seems to attract a somewhat more mature crowd than digg does, but lately it seems to be getting taken over by politics and pictures, which devalues the whole site a bit.

What is really interesting, though, is the commenting system. In fact, it’s interesting enough for me to write about it, and it works well enough for me to actually post a comment there now and then. It’s even good enough that I’m sure that whenever I’m in the situation of designing a system that allows users to comment on something, I will have a look at what reddit did and model my solution on that.

There are so many commenting systems out there, but all fail in some regard. Either they disturb your reading flow, making it too difficult to post something, or they hide comments behind a foldable tree structure, or they display a flat list that makes it difficult to see any kind of threading going on.

And once you actually are interested in a topic enough to post a comment or a reply to a comment, you’ll quickly lose track of the discussion, which quickly gets buried under newly arriving posts.

reddit works differently.

First, messages are displayed in a threaded but fully expanded view, allowing you to skip over content you are not interested in while still providing all the overview you need. Then, posting is done inline via some AJAX interface. You see a comment you want to reply to, you hit the reply link, enter the text and hit “save”. The page is not reloaded; you end up just where you left off.

But what good is replying to a comment if the initial commenter quickly forgets about his or her comment? Or just plain can’t find it again?

reddit puts all direct replies to any comment you made into your personal inbox folder. If you have any such replies, the envelope at the top right lights up red, letting you see the newly arrived replies to your comments. With one click, you can show the context of the post you replied to, your reply and the reply you got. This makes it incredibly easy to be notified when someone posts something in response, thus keeping the discussion alive, no matter how deeply it may have been buried by comments arriving after yours.

So even if reddit looks awful (one gets used to the plain look, though), it has one of the best, if not the best, online discussion systems under its hood, and many other sites should learn from that example. It’s so easy that it even got me to post a comment now and then – and I even got replies despite not obviously trolling (which usually helps you get instant replies, though I don’t recommend the practice).

The IE rendering dilemma – solved?

A couple of months ago I wrote about the IE rendering dilemma: how to fix IE8’s rendering engine without breaking all the corporate intranets out there? How to create a standards-oriented browser while still ensuring that the main customers of Microsoft – the enterprises – can run a current browser without having to redo all their (mostly internal) web applications?

Only three days after my posting, the IEBlog talked about IE8 passing the ACID2 test. And when you watch the video linked there, you’ll notice that they indeed kept the IE7 engine untouched and added an additional switch to force IE8 into using the new rendering engine.

And yesterday, A List Apart showed us how it’s going to work.

While I completely understand Microsoft’s solution and the reasoning behind it, I can’t see any other browser doing what Microsoft recommended as a new standard. The idea of keeping multiple rendering engines in the browser and defaulting to outdated ones is, in my opinion, a bad one. Download sizes of browsers increase considerably, security problems in the browsers must be patched multiple times, and, as the WebKit blog put it, it “[..] hurts the hackability of the code [..]”.

As long as the other browser vendors have neither IE’s market share nor the big company intranets depending on their browsers, I don’t see any reason at all for them to adopt IE’s model.

Also, when I’m doing (X)HTML/CSS work, usually it works and displays correctly in every browser out there – with the exception of IE’s current engine. As long as browsers don’t have awful bugs all over the place and you are not forced to hack around them, deviating from the standard in the process, there is no way a page you create will only work in one specific version of a browser. Even more so: When it breaks on a future version, that’s a bug in the browser that must be fixed there.

Assuming that Microsoft will, finally, get it right with IE8 and subsequent browser versions, we web developers should be fine with

<meta http-equiv="X-UA-Compatible" content="IE=edge" />

on every page we output to a browser. These compatibility hacks are for people who don’t know what they are doing. We know. We follow standards. And if IE begins to do so as well, we are fine with using the latest version of the rendering engine there is.

If IE doesn’t play well and we need to apply braindead hacks that break when a new version of IE comes out, then we’ll all be glad that we have this method of forcing IE to use a particular engine, thus making sure that our hacks continue to work.

Apple TV – Second try

When Apple announced their AppleTV a couple of months (or was it years?) ago, I was very skeptical of the general idea behind the device. Think of it: what was the big success factor behind the iPod? That it could play proprietary AAC files people buy from the music store?

No. That thing didn’t even exist back then. The reason for the success was the totally easy (and FAST – remember: back in the day we had the 1.1 MB/s USB every MP3 player used vs. the 40 MB/s FireWire of the iPod) handling and the fact that it was an MP3 player – playing the files everyone already had.

It was a device for playing the content that was available at the time.

The AppleTV in its first incarnation was a device capable of playing content that wasn’t exactly available. Sure, it could play the two video podcasts that existed back then (maybe more, but you get the point). And you could buy TV shows and movies in subpar quality on your PC (Windows or Mac) and then transfer them to the device. But the content that was actually around back then came in different formats: XviD dominated the scene, x264 was a newcomer, and MP4 (and MOV) wasn’t exactly widely used.

So what you got was a device, but no content (and the compatible content you had was in subpar quality compared to the incompatible content that was available). And you needed a PC, so it wasn’t exactly a device I could set up for my parents, for example.

All these things were fixed by Apple today:

  • There is a huge library of content available right here, right now (at least in the US): the new movie rental service. Granted, I think it’s not quite there yet price- vs. usability-wise (I think $5 would be a totally acceptable price for a movie with unlimited replayability), but at least we have the content.
  • It works without a PC. I can hook this thing up to my parents’ TV and they can immediately use it.
  • The quality is OK. Actually, it’s more than OK. There is HD content available (though maybe only in 720p, but frankly, on my expensive 1080p projector, I don’t see that much of a difference between 720p and 1080p).
  • It can still access the scarce content that was available before.

The fact that this provides very easy to use video-on-demand to a huge number of people is what makes me think that this little device is even more of a disruptive technology than the iPod or the iPhone. Think of it: countless companies are trying to make people pay for content these days. It’s the telcos, it’s the cable companies and it’s the device manufacturers. But what do we get? Crappy, constantly crashing devices which are way too complicated for a non-geek and way too limited in functionality for a geek.

Now we have something that’s perfect for the non-geek. It has the content. It has the ease of use. Plug it in, watch your movie. Done. This is what a whole industry has tried to do and failed at so miserably.

I for my part will still prefer the flexibility given by my custom Windows Media Center solution. I will still prefer the openness provided by illegal copies of movies. I totally refuse to pay multiple times for something just because someone says that I have to. But that’s me.

And even I may sooner or later prefer the comfort of select-now-watch-now to the current procedure (log into the private tracker, download the torrent, wait for the download to finish, watch – torrents are not streamable, even though the bandwidth would easily suffice in my case, because the packets arrive out of order), so even for me, the AppleTV could be interesting.

This was yet another perfect move by Apple. Ignore the analysts out there who expected more from this latest keynote. Ignore the bad reception of the keynote by the market (I hear that Apple stock just dropped a little bit). Ignore all that and listen to yourself: this wonderful device will certainly revolutionize the way we consume video content.

I’m writing this as a constant sceptic – as a person always trying to see the flaws in any given device. But I’m sure that this time around, they really got it. Nice work!

My PSP just got a whole lot more useful

Or useful at all, considering the games that are available for that console. To be honest, of all the consoles I have owned in my life, the PSP must be the most underused one. I basically own two games for it: Breath of Fire and Tales of Eternia – and not only from this choice of titles, but also from reading this blog, you may notice a certain affinity for Japanese-style RPGs.

These are the closest thing to a successor of the classic graphic adventures I started my computer career with – minus the hard-to-solve puzzles, plus a (generally) much more interesting story. So for my taste, they are a perfect match.

But back to the PSP. It’s an old model – one of the first here in Switzerland. One of the first in the world, to be honest: I bought the thing WAAAY back with hopes of seeing many interesting RPGs – or even just good ports of old classics. Sadly, neither really happened.

Then, a couple of days ago, I found a usable copy of the game Lumines. Usable in the sense that when the guy in the store told me there was a sequel out and I told him that I did not intend to actually play the game, he just winked at me and wished me good luck with my endeavor.

Or in layman’s terms: that particular version of Lumines had a security flaw allowing one to do a lot of interesting stuff with the PSP. Like installing an older, flawed version of the firmware, which in turn allows one to completely bypass whatever security the PSP provides.

And now I’m running the latest M33 firmware: 3.71-M4.

What does that mean? It means that the formerly quite useless device has just become the device of my dreams: it runs SNES games. It runs PlayStation 1 games. It’s portable. I can use it in bed without a large assembly of cables, gamepads and laptops. It’s instant-on. It’s optimized for console games. It has a really nice digital directional pad (gone are the days of struggling with diagonally-emphasized joypads – try playing Super Metroid with one of those).

It plays games like Xenogears, Chrono Cross and Chrono Trigger – it finally allows me to enjoy the RPGs of old in bed before falling asleep. Or in the bathtub. Or wherever.

It’s a real shame that once more I had to resort to legally questionable means to get a particular device into the state I imagine it should be in. Why can’t I buy any PS1 game directly from Sony? Why are the games I want to play not even available in Switzerland? Why is it illegal to play the games I want to play? Why are most of the gadgets sold today crippled in one way or another? Why is it illegal to un-cripple the gadgets we bought?

Questions I, frankly, don’t want to answer. For years I have wanted a way to play Xenogears in bed and while taking a bath. Now I can, so I’m happy. And playing Xenogears. And loving it like when I played through that jewel of gaming history for the first time.

If I find the time, expect some more in-depth articles about the greatness of Xenogears (just kidding – just read the early articles in this blog) or about how to finally get your PSP where you want it to be – there are lots of small things to keep in mind to make it work completely satisfactorily.