webdev- gnegg

pilif.ch is back

It has been a while since I lost pilif.ch. Two years to be exact.

Fortunately, it looks like the domain grabber who took pilif.ch after that unfortunate accounting incident has since lost interest, so now pilif.ch belongs to me again. About bloody time!

Aside of the fact that my online identity has always been pilif (despite lipfi sounding much friendlier when pronounced in swiss german), there are other reasons for me wanting the domain back:

it’s in my MSN-ID (passport@pilif.ch)
various other @pilif.ch addresses are registered at various services I’ve since forgotten the password for.
it was the very first domain I bought – ever.

So it’s back to the roots for me. MX, Web and DNS are already configured (the zone file is actually symlinked to lipfi.ch – I have no idea whether this is a legal thing to do, but it works).

Home – sweet home!

Dependent on working infrastructure

If you create and later deploy and run a web application, then you are dependent on a working infrastructure: You need a working web server, you need a working application server and in most cases, you’ll need a working database server.

Also, you’d want a solution that always and consistently works.

We’ve been using lighttpd/FastCGI/PHP for our deployment needs lately. I’ve preferred this to apache due to the easier configuration possible with lighty (out of the box automated virtual hosting for example), the potentially higher performance (due to long-running FastCGI processes) and the smaller amount of memory consumed by lighttpd.

But last week, I had to learn the price of walking off the beaten path (Apache, mod_php).

In one particular constellation, the lighty, fastcgi, php combination, running on a Gentoo box sometimes (read: 50% of the time) a certain script didn’t output all the data it should have. Instead, lighty randomly sent out RST packets. This without any indication of what could be wrong in any of the involved log files.

Naturally, I looked everywhere.

I read the source code of PHP. I’ve created reduced test cases. I’ve tried workarounds.

The problem didn’t go away until I tested the same script with Apache.

This is where I’m getting pragmatic: I depend on a working infrastructure. I need it to work. Our customers need it to work. I don’t care who is to blame. Is it PHP? Is it lighty? Is it Gentoo? Is it the ISP (though it would have to be on the senders end as I’ve seen the described failure with different ISPs)?

I don’t care.

My interest is in developing a web application. Not in deploying one. Not really, anyways.

I’m willing (and able) to fix bugs in my development environment. I may even be able to fix bugs in my deployment platform. But I’m certainly not willing to. Not if there is a competing platform that works.

So after quite some time with lighty and fastcgi, it’s back to Apache. The prospect of having a consistently working backed largely outweighs the theoretical benefits of memory savings, I’m afraid.

Web service authentication

When reading an article about how to make google reader work with authenticated feeds, one big flaw behind all those web 2.0 services sprang to my mind: Authentication.

I know that there are efforts underway to standardise on a common method of service authentication, but we are nowhere near there yet.

Take facebook: They offer you to enter your email account data into some form to send an invitation to all your friends. Or the article I was referring to: They want your account data for a authenticated feed to make them available in google reader.

But think of what you are giving away…

For your service provider to be able to interact with that other service, they need to store your passwort. Be it short term (facebook, hopefully) or long term (any online feed reader with authentication support). They can (and do) assure you that they will store the data in encrypted form, but to be able to access the service in the end, they need the unencrypted password, thus requiring them to not only use reversible encryption, but to also keep the encryption key around.

Do you want a company in a country whose laws you are not familiar with to have access to all your account data? Do you want to give them the password to your personal email account? Or to everything else in case you share passwords?

People don’t seem to get this problem as account data is freely given all over the place.

Efforts like OAuth are clearly needed, but as webbased technology, they clearly can’t solve all the problems (what about Email accounts for example).

But is this the right way? We can’t even trust desktop applications. Personally, I think the good old username/password combination is at the end of its usefulness (was it ever really useful?). We need new, better, ways for proving our identity. Something that is easily passed around and yet cannot be copied.

SSL client certificates feel like an underused but very interesting option. Let’s make two examples. The first one is your authenticated feed. The second one is your SSL-enabled email server. Let’s say that you want to give a web service revokable access to both services without ever giving away personal information.

For the authenticated feed, the external service will present the feed server with its client side certificate which you have signed. By checking your signature, the authenticated feed knows your identity and by checking your CRL it knows whether you authorized the access or not. The service doesn’t know your password and can’t use your signature for anything but accessing that feed.

The same goes for the email server: The third party service logs in with your username and the signed client certificate (signed by you), but without password. The service doesn’t need to know your password and in case they do something wrong, you revoke your signature and be done with it (I’m not sure whether mail servers support client certificates, but I gather they do as it’s part of the SSL spec).

Client side certificates already provide a standard means for secure authentication without ever passing a known secret around. Why isn’t it used way more often these days?

Another new look

It has been a while since the last redesign of gnegg.ch, but is a new look after just a little more than one year of usage really needed?

The point is that I have changed blogging engines yet again. This time it’s from Serendipity to Word Press.

What motivated the change?

Interestingly enough, if you ask me, s9y is clearly the better product than WordPress. If WordPress is Mac OS, then s9y is Linux: It has more features it’s based on cleaner code, it doesn’t have any commercial backing at all. So the question remains: Why switch?

Because that OSX/Linux-analogy also works the other way around: s9y is an ugly duckling compared to WP. External tools won’t work (well) with s9y due to it not being known well enough. The amount of knobs to tweak is sometimes overwhelming and the available plugins are not nearly as polished as the WP ones.

All these are reasons to make me switch. I’ve used a s9y to wp converter, but some heavy tweaking was needed to make it actually transfer category assignements and tags (the former didn’t work, the latter wasn’t even implemented). Unfortunately, the changes were too hackish to actually publish them here, but it’s quite easily done.

Aside of that, most of the site has survived the switch quite nicely (the permalinks are broken once again though), so let’s see how this goes :-)

Impressed by git

The company I’m working with is a Subversion shop. It has been for a long time – since fall of 2004 actually where I finally decided that the time for CVS is over and that I was going to move to subversion. As I was the only developer back then and as the whole infrastructure mainly consisted of CVS and ViewVC (cvsweb back then), this move was an easy one.

Now, we are a team of three developers, heavy trac users and truly dependant on Subversion which is – mainly due to the amount of infrastructure that we built around it – not going away anytime soon.

But none the less: We (mainly I) were feeling the shortcomings of subversion:

Branching is not something you do easily. I tried working with branches before, but merging them really hurt, thus making it somewhat prohibitive to branch often.
Sometimes, half-finished stuff ends up in the repository. This is unavoidable considering the option of having a bucket load of uncommitted changes in the working copy.
Code review is difficult as actually trying out patches is a real pain to do due to the process of sending, applying and reverting patches being a manual kind of work.
A pet-peeve of mine though is untested, experimental features developed out of sheer interest. Stuff like that lies in the working copy, waiting to be reviewed or even just having its real-life use discussed. Sooner or later, a needed change must go in and you have the two options of either sneaking in the change (bad), manually diffing out the change (hard to do sometimes) or just forget it and svn revert it (a real shame).

Ever since the Linux kernel first began using Bitkeeper to track development, I knew that there is no technical reason for these problems. I knew that a solution for all this existed and that I just wasn’t ready to try it.

Last weekend, I finally had a look at the different distributed revision control systems out there. Due to the insane amount of infrastructure built around Subversion and not to scare off my team members, I wanted something that integrated into subversion, using that repository as the official place where official code ends up while still giving us the freedom to fix all the problems listed above.

I had a closer look at both Mercurial and git, though in the end, the nicely working SVN integration of git was what made me have a closer look at that.

Contrary to what everyone is saying, I have no problem with the interface of the tool – once you learn the terminology of stuff, it’s quite easy to get used to the system. So far, I did a lot of testing with both live repositories and test repositories – everything working out very nicely. I’ve already seen the impressive branch merging abilities of git (to think that in subversion you actually have to a) find out at which revision a branch was created and to b) remember every patch you cherry-picked…. crazy) and I’m getting into the details more and more.

On our trac installation, I’ve written a tutorial on how we could use git in conjunction with the central Subversion server which allowed me to learn quite a lot about how git works and what it can do for us.

So for me it’s git-all-the-way now and I’m already looking forward to being able to create many little branches containing many little experimental features.

If you have the time and you are interested in gaining many unexpected freedoms in matters of source code management, you too should have a look at git. Also consider that on the side of the subversion backend, no change is needed at all, meaning that even if you are forced to use subversion, you can privately use git to help you manage your work. Nobody would ever have to know.

Very, very nice.

Failing silently is bad

Today, I’ve experienced the perfect example of why I prefer PostgreSQL (congratulations for a successful 8.3 release today, guys!) to MySQL.

Let me first give you some code, before we discuss it (assume that the data which gets placed in the database is – wrongly so – in ISO-8859-1):

This is what PostgreSQL does:

bench ~ > createdb -Upilif -E utf-8 pilif
CREATE DATABASE
bench ~ > psql -Upilif
Welcome to psql 8.1.4, the PostgreSQL interactive terminal.

Type:  copyright for distribution terms
       h for help with SQL commands
       ? for help with psql commands
       g or terminate with semicolon to execute query
       q to quit

pilif=> create table test (blah varchar(20) not null default '');
CREATE TABLE
pilif=> insert into test values ('gnügg');
ERROR:  invalid byte sequence for encoding "UTF8": 0xfc676727293b
pilif=>

and this is what MySQL does:

bench ~ > mysql test
Welcome to the MySQL monitor.  Commands end with ; or g.
Your MySQL connection id is 97
Server version: 5.0.44-log Gentoo Linux mysql-5.0.44-r2

Type 'help;' or 'h' for help. Type 'c' to clear the buffer.

mysql> create table test( blah varchar(20) not null default '')
    -> charset=utf8;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into test values ('gnügg');
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> select * from test;
+------+
| blah |
+------+
| gn   |
+------+
1 row in set (0.00 sec)

mysql>

Obvisouly it is wrong to try and place latin1 encoded data in an utf-8 formatted data store: While every valid utf-8 byte sequence is a valid latin1 byte sequence (latin1 does not restrict the validity of bytes, though some positions may be undefined), the reverse certainly is not true. The character ü from my example is 0xfc in latin1 and U+00fc in unicode which must be encoed as 0xc3 0xbc in utf-8. 0xfc alone is no valid utf-8 byte sequence.

So if you pass this invalid sequence to any entity accepting an utf-8 encoded byte stream, it will not be clear what to do with that data. It’s not utf-8, that’s for sure. But assuming that no character set is specified with the stream, it’s impossible to guess what to translate the byte sequence into.

So PostgreSQL sees the error and bails out (if both the server and the client are set to utf-8 encoding and data is sent in non-utf8-format – otherwise it knows how to convert the data – conversion from any character set to utf-8 is possible all the time). MySQL on the other hand decides to fail silently and to try to fix up the invalid input.

Now while I could maybe live with the default of assuming latin1 encoding, just stopping to process the data without warning what so ever leads to undetected loss of data!

What if I’m not just entering one word? What if it’s a blog-entry like this one? What if the entry is done by a non tech-savvy user? Remember: This mistake can easily be produced: Wrong Content-Type headers, old browsers, broken browsers… it’s very easy to get Latin1 when you want utf-8.

While I agree that sanitization must be done in the application tier (preferably on the model), it’s inacceptable for a database application to store different data than what it was ordered to store without warning the user in any way. This easily leads to data loss or data corruption.

There are many more little things like this where MySQL decides to silently fail where PostgreSQL (and any other database) bail out correctly. As a novice this can feel tedious for you. It can feel like PostgreSQL is pedantic and like you are faster with MySQL. But let’s be honest: What do you prefer? An error message or lost data with no way of knowing that it’s lost?

This, by the way, is the outcome of a lengthy debugging session on a Typo3 installation, which also, but not ultimately is to blame here. In a perfect world, MySQL would bail out, but Typo3 would either

Not specify charset=utf8 when creating the table unless specifically asked to.
Send a charset=utf-8 http-header, knowing that the database has been created as containing utf-8
Sanitize user input before handing it over to the mysql-backend which is obviously broken in this instance.

Now back to debugging real software on real databases *grin*

reddit’s commenting system

This is something I wanted to talk about for quite some time now, but I never got around to it. Maybe you know reddit. reddit basically works like digg.com – it’s one of these web2.0 mashup community social networking bubble sites. reddit is about links posted by users and voted for by users.

Unlike digg, reddit has an awful screen design and thus seems to attract a bit more mature crowds than digg does, but lately it seems to be taken over by politics and pictures which devalues the whole site a bit.

What is really interesting though is the commenting system. In fact, it’s interesting enough for me to write about it and it works well enough for me to actually post a comment there here and then. It’s even good enough for me to be sure that whenever I will be in the situation to design a system to allow users to comment on something that I will have a look at what reddit did and I will model my solution around that base.

There are so many commenting systems out there, but all fail in some regards. Either they disturb your reading flow, making it too difficult to post something. Or they either hide comments behind a foldable tree structure or they display a flat list making it difficult to see any kind of threading going on.

And once you actually are interested in a topic enough to post a comment or a reply to a comment, you’ll quickly lose track of the discussion which gets as quickly buried by newly arriving posts.

reddit works differently.

First, messages are displayed in a threaded, but fully expanded view, thus allowing to skip over content you are not interested in while still providing all the overview you need. Then, posting is done inline via some AJAX interface. You see a comment you want to reply to, you hit the reply link, enter the text and hit "save". The page is not reloaded, you end up just where you left off.

But what good is answering to a comment if the initial commenter quickly forgets about his or her comment? Or if he or she just plain doesn’t find her comment again?

reddit puts all direct replies to any comments you made into your personal inbox folder. If you have any of these replies, the envelope to the top right will light up red allowing you to see newly arrived replies to your comments. With one click, you can show the context of the post you replied to, your reply and the reply you got. This makes it incredibly easy to be notified when someone posted something in response, thus keeping the discussion alive, no matter how deeply it may have been buried by comments arriving after yours.

So even if reddit looks awful (one gets used to the plain look though), it has one of the best, if not the best online discussion systems under its hood and so many other sites should learn from that example. It’s so easy that it even got me to post a comment here and then – and I even got replies despite not obviously trolling (which usually helps you get instant-replies, though I don’t recommend this practice).

The IE rendering dilemma – solved?

A couple of months a IE rendering dilemma: How to fix IE8’s rendering engine without breaking all the corporate intranets out there? How to create both a standards oriented browser and still ensure that the main customers of Microsoft – the enterprises – can still run a current browser without having to redo all their (mostly internal) web applications.

Only three days after my posting IEBlog talked about IE8 passing the ACID2 test. And when you watch the video linked there, you’ll notice that they indeed kept the IE7 engine untouched and added an additional switch to force IE8 into using the new rendering engine.

And yesterday, A List Apart showed us how it’s going to work.

While I completely understand Microsofts solution and the reasoning behind it, I can’t see any other browser doing what Microsoft recommended as a new standard. The idea to keep multiple rendering engines in the browser and default to outdated ones is in my opinion a bad idea. Download-Sizes of browser increase by much, security problems in browsers must be patched multiple times, and, as the Webkit blog put it, “[..] hurts the hackability of the code [..]”.

As long as the other browser vendors don’t have IE’s market share nor the big company intranets depending on these browsers, I don’t see any reason at all for the other browsers to adapt IE’s model.

Also, when I’m doing (X)HTML/CSS work, usually it works and displays correctly in every browser out there – with the exception of IE’s current engine. As long as browsers don’t have awful bugs all over the place and you are not forced to hack around them, deviating from the standard in the process, there is no way a page you create will only work in one specific version of a browser. Even more so: When it breaks on a future version, that’s a bug in the browser that must be fixed there.

Assuming that Microsoft will, finally, get it right with IE8 and subsequent browser versions, we web developers should be fine with

<meta http-equiv="X-UA-Compatible" content="IE=edge" />

on every page we output to a browser. These compatibility hacks are for people that don’t know what they are doing. We know. We follow standards. And if IE begins to do so as well, we are fine with using the latest version of the rendering engine there is.

If IE doesn’t play well and we need to apply braindead hacks that break when a new version of IE comes out, then we’ll all be glad that we have this method of forcing IE to use a particular engine, thus making sure that our hacks continue to work.

The IE rendering dilemma

There’s a new release of Internet Explorer, aptly named IE8, pending and a whole lot of web developers are in fear of new bugs and no fixes to existing ones. Like the problems we had with IE7.

A couple of really nasty bugs where fixed, but there wasn’t any significant progress in matters of extended support for web standards or even a really significant amount of bugfixes.

And now, so fear the web developers, history is going to repeat itself. Why, are people asking, aren’t they just throwing away the currently existing code-base, replacing it with something more reasonable? Or if licensing or political issues prevent using something not developed in-house, why not rewrite IE’s rendering engine from scratch?

Backwards compatibility. While the web itself has more or less stopped using IE-only-isms and began embracing the way of the web standards (and thus began cursing on IE’s bugs), corporate intranets, the websites accessed by Microsoft’s main customer base, certainly have not.

ActiveX, <FONT>-Tags, VBScript – the list is endless and companies don’t have the time or resources to remedy that. Remember. Rewriting for no real purpose than “being modern” is a real waste of time and certainly not worth the effort. Sure. New Applications can be developped in a standards compliant way. But think about the legacy! Why throw all that away when it works so well in the currently installed base of IE6?

This is why Microsoft can’t just throw away what they have.

The only option I see, aside of trying to patch up what’s badly broken, is to integrated another rendering engine into IE. One that’s standards compliant and one that can be selected by some means – maybe a HTML comment (the DOCTYPE specification is already taken).

But then, think of the amount of work this creates in the backend. Now you have to maintain two completely different engines with completely different bugs at different places. Think of security problems. And think of what happens if one of these buggers is detected in a third party engine a hypothetical IE may be using. Is MS willing to take responsibility of third-party bugs? Is it reasonable to ask them to do this?

To me it looks like we are now paying the price for mistakes MS did a long time ago and for quick technological innovation happening at the wrong time on the wrong platform (imagine the intranet revolution happening now). And personally, I don’t see an easy way out.

I’m very interested in seeing how Microsoft solves this problem. Ignore the standards-crowd? Ignore the corporate customers? Add the immense burden of another rendering engine? FIx the current engine (impossible, IMHO)? We’ll know once IE8 is out I guess.

Web Applications and the View State

Today, it came to my mind, that I know of a problem with some web applications, which apparently few else seem to know about. What is worse, is that those new technologies like ASP.NET and Java Server Faces seem to run straight into the problem.

This article is even bigger than the usual, so I split it up.
Continue reading “Web Applications and the View State”