Uncategorized- gnegg

Failing silently is bad

Today, I’ve experienced the perfect example of why I prefer PostgreSQL (congratulations for a successful 8.3 release today, guys!) to MySQL.

Let me first give you some code, before we discuss it (assume that the data which gets placed in the database is – wrongly so – in ISO-8859-1):

This is what PostgreSQL does:

bench ~ > createdb -Upilif -E utf-8 pilif
CREATE DATABASE
bench ~ > psql -Upilif
Welcome to psql 8.1.4, the PostgreSQL interactive terminal.

Type:  copyright for distribution terms
       h for help with SQL commands
       ? for help with psql commands
       g or terminate with semicolon to execute query
       q to quit

pilif=> create table test (blah varchar(20) not null default '');
CREATE TABLE
pilif=> insert into test values ('gnügg');
ERROR:  invalid byte sequence for encoding "UTF8": 0xfc676727293b
pilif=>

and this is what MySQL does:

bench ~ > mysql test
Welcome to the MySQL monitor.  Commands end with ; or g.
Your MySQL connection id is 97
Server version: 5.0.44-log Gentoo Linux mysql-5.0.44-r2

Type 'help;' or 'h' for help. Type 'c' to clear the buffer.

mysql> create table test( blah varchar(20) not null default '')
    -> charset=utf8;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into test values ('gnügg');
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> select * from test;
+------+
| blah |
+------+
| gn   |
+------+
1 row in set (0.00 sec)

mysql>

Obvisouly it is wrong to try and place latin1 encoded data in an utf-8 formatted data store: While every valid utf-8 byte sequence is a valid latin1 byte sequence (latin1 does not restrict the validity of bytes, though some positions may be undefined), the reverse certainly is not true. The character ü from my example is 0xfc in latin1 and U+00fc in unicode which must be encoed as 0xc3 0xbc in utf-8. 0xfc alone is no valid utf-8 byte sequence.

So if you pass this invalid sequence to any entity accepting an utf-8 encoded byte stream, it will not be clear what to do with that data. It’s not utf-8, that’s for sure. But assuming that no character set is specified with the stream, it’s impossible to guess what to translate the byte sequence into.

So PostgreSQL sees the error and bails out (if both the server and the client are set to utf-8 encoding and data is sent in non-utf8-format – otherwise it knows how to convert the data – conversion from any character set to utf-8 is possible all the time). MySQL on the other hand decides to fail silently and to try to fix up the invalid input.

Now while I could maybe live with the default of assuming latin1 encoding, just stopping to process the data without warning what so ever leads to undetected loss of data!

What if I’m not just entering one word? What if it’s a blog-entry like this one? What if the entry is done by a non tech-savvy user? Remember: This mistake can easily be produced: Wrong Content-Type headers, old browsers, broken browsers… it’s very easy to get Latin1 when you want utf-8.

While I agree that sanitization must be done in the application tier (preferably on the model), it’s inacceptable for a database application to store different data than what it was ordered to store without warning the user in any way. This easily leads to data loss or data corruption.

There are many more little things like this where MySQL decides to silently fail where PostgreSQL (and any other database) bail out correctly. As a novice this can feel tedious for you. It can feel like PostgreSQL is pedantic and like you are faster with MySQL. But let’s be honest: What do you prefer? An error message or lost data with no way of knowing that it’s lost?

This, by the way, is the outcome of a lengthy debugging session on a Typo3 installation, which also, but not ultimately is to blame here. In a perfect world, MySQL would bail out, but Typo3 would either

Not specify charset=utf8 when creating the table unless specifically asked to.
Send a charset=utf-8 http-header, knowing that the database has been created as containing utf-8
Sanitize user input before handing it over to the mysql-backend which is obviously broken in this instance.

Now back to debugging real software on real databases *grin*

reddit’s commenting system

This is something I wanted to talk about for quite some time now, but I never got around to it. Maybe you know reddit. reddit basically works like digg.com – it’s one of these web2.0 mashup community social networking bubble sites. reddit is about links posted by users and voted for by users.

Unlike digg, reddit has an awful screen design and thus seems to attract a bit more mature crowds than digg does, but lately it seems to be taken over by politics and pictures which devalues the whole site a bit.

What is really interesting though is the commenting system. In fact, it’s interesting enough for me to write about it and it works well enough for me to actually post a comment there here and then. It’s even good enough for me to be sure that whenever I will be in the situation to design a system to allow users to comment on something that I will have a look at what reddit did and I will model my solution around that base.

There are so many commenting systems out there, but all fail in some regards. Either they disturb your reading flow, making it too difficult to post something. Or they either hide comments behind a foldable tree structure or they display a flat list making it difficult to see any kind of threading going on.

And once you actually are interested in a topic enough to post a comment or a reply to a comment, you’ll quickly lose track of the discussion which gets as quickly buried by newly arriving posts.

reddit works differently.

First, messages are displayed in a threaded, but fully expanded view, thus allowing to skip over content you are not interested in while still providing all the overview you need. Then, posting is done inline via some AJAX interface. You see a comment you want to reply to, you hit the reply link, enter the text and hit "save". The page is not reloaded, you end up just where you left off.

But what good is answering to a comment if the initial commenter quickly forgets about his or her comment? Or if he or she just plain doesn’t find her comment again?

reddit puts all direct replies to any comments you made into your personal inbox folder. If you have any of these replies, the envelope to the top right will light up red allowing you to see newly arrived replies to your comments. With one click, you can show the context of the post you replied to, your reply and the reply you got. This makes it incredibly easy to be notified when someone posted something in response, thus keeping the discussion alive, no matter how deeply it may have been buried by comments arriving after yours.

So even if reddit looks awful (one gets used to the plain look though), it has one of the best, if not the best online discussion systems under its hood and so many other sites should learn from that example. It’s so easy that it even got me to post a comment here and then – and I even got replies despite not obviously trolling (which usually helps you get instant-replies, though I don’t recommend this practice).

The IE rendering dilemma – solved?

A couple of months a IE rendering dilemma: How to fix IE8’s rendering engine without breaking all the corporate intranets out there? How to create both a standards oriented browser and still ensure that the main customers of Microsoft – the enterprises – can still run a current browser without having to redo all their (mostly internal) web applications.

Only three days after my posting IEBlog talked about IE8 passing the ACID2 test. And when you watch the video linked there, you’ll notice that they indeed kept the IE7 engine untouched and added an additional switch to force IE8 into using the new rendering engine.

And yesterday, A List Apart showed us how it’s going to work.

While I completely understand Microsofts solution and the reasoning behind it, I can’t see any other browser doing what Microsoft recommended as a new standard. The idea to keep multiple rendering engines in the browser and default to outdated ones is in my opinion a bad idea. Download-Sizes of browser increase by much, security problems in browsers must be patched multiple times, and, as the Webkit blog put it, “[..] hurts the hackability of the code [..]”.

As long as the other browser vendors don’t have IE’s market share nor the big company intranets depending on these browsers, I don’t see any reason at all for the other browsers to adapt IE’s model.

Also, when I’m doing (X)HTML/CSS work, usually it works and displays correctly in every browser out there – with the exception of IE’s current engine. As long as browsers don’t have awful bugs all over the place and you are not forced to hack around them, deviating from the standard in the process, there is no way a page you create will only work in one specific version of a browser. Even more so: When it breaks on a future version, that’s a bug in the browser that must be fixed there.

Assuming that Microsoft will, finally, get it right with IE8 and subsequent browser versions, we web developers should be fine with

<meta http-equiv="X-UA-Compatible" content="IE=edge" />

on every page we output to a browser. These compatibility hacks are for people that don’t know what they are doing. We know. We follow standards. And if IE begins to do so as well, we are fine with using the latest version of the rendering engine there is.

If IE doesn’t play well and we need to apply braindead hacks that break when a new version of IE comes out, then we’ll all be glad that we have this method of forcing IE to use a particular engine, thus making sure that our hacks continue to work.

Apple TV – Second try

When Apple announced their AppleTV a couple of months (or was it years?) ago, I was very skeptical of the general idea behind the device. Think of it: What was the big success behind the iPod? That it could run proprietary AAC files people buy from the music store?

No. That thing didn’t even exist back then. The reason for the success was the total easy (and FAST – remember: Back in the days, we had 1.1 MB/s USB which every MP3 player used vs. 40MB/s Firewire of the iPod) handling and the fact that it was an MP3 player – playing the files everyone already had.

It was a device for playing the content that was available at the time.

The AppleTV in its first incarnation was a device capable of playing content that wasn’t exactly available. Sure it could play the two video podcasts that existed back then (maybe more, but you get the point). And you could buy TV shows and movies in subpar quality on your PC (Windows or Mac) and then transfer them to the device. But the content that was available back then was in a different format: XVID dominated the scene. x264 was a newcomer and MP4 (and mov) wasn’t exactly used.

So what you got was a device, but no content (and the compatible content you had was in subpar quality compared to the incompatible content that was available). And you needed a PC, so it wasn’t exactly a device I could hook to my parents PC for example.

All these things were fixed by Apple today:

There is a huge library of content available right here, right now (at least in the US): The new movie rental service. Granted. I think it’s not quite there yet price vs. usability-wise (I think $5 is a totally acceptable price for a movie with unlimited replayability), but at least we have the content.
It works without a PC. I can hook this thing up to my parents TV and they can immediately use it.
The quality is OK. Actually, it’s more than OK. There is HD content available (though maybe only 720p one, but frankly, on my expensive 1080p projector, I don’t see that much of a difference between 720p and 1080p)
It can still access the scarce content that was available before.

The fact that this provides very easy to use video-on-demand to a huge amount of people is what makes me think that this little device is even more of a disruptive technology than the iPod or the iPhone. Think of it: Countless of companies are trying to make people pay for content these days. It’s the telcos, it’s cable companies and it’s device manufacturers. But what do we get? Crappy, constantly crashing devices, which are way too complicated for a non-geek and way too limited in functionality for a geek.

Now we got something that’s perfect for the non-geek. It has the content. It has the ease-of-use. Plug it in, watch your movie. Done. This is what a whole industry tried to do and failed so miserably.

I for my part will still prefer the flexibility given by my custom Windows Media Center solution. I will still prefer the openness provided by illegal copies of movies. I totally refuse to pay multiple times for something just because someone says that I have to. But that’s me.

And even I may sooner or later prefer the comfort of select-now-watch-now to the current procedure (log into private tracker, download torrent, wait for download to finish, watch – torrents are not streamable, even if the bandwith would easily suffice in my case – the packets arrive out of order), so even for me, the AppleTV could be interesting.

This was yet another perfect move by Apple. Ignore the analysts out there who expected more out of this latest keynote. Ignore the bad reception of the keynote by the marked (I hear that Apple stock just dropped a little bit). Ignore all that and listen to yourself: This wonderful device will certainly revolutionize the way we consume video content.

I’m writing this as a constant sceptic – as a person always trying to see a flaw in a certain device. But I’m sure that this time around, they really got it. Nice work!

My PSP just got a whole lot more useful

Or useful at all – considering the games that are available to that console. To be honest: Of all the consoles I have owned in my life, the PSP must be the most underused one. I basically own two games for it: Breath of Fire and Tales of Eternia – not only by this choice of titles, but also by reading this blog, you may notice a certain affinity to Japanese Style RPG’s.

These are the closest thing to a successor of the classical graphic adventures I started my computer career with, minus hard to solve puzzles plus a much more interesting story (generally). So for my taste, these things are a perfect match.

But back to the PSP. It’s an old model – one of the first here in Switzerland. One of the first on the world to be honest: I bought the thing WAAAY back with hopes of seeing many interesting RPG’s – or even just good ports of old classics. Sadly neither really happened.

Then, a couple of days ago, I found a usable copy of the game Lumines. Usable in a sense of when the guy in the store told me that there is a sequel out and I told him that I did not intend to actually play the game, he just blinked with one eye and wished me good luck with my endeavor.

Or in layman’s terms: That particular version of Lumines had a security flaw allowing one to do a lot of interesting stuff with the PSP. Like installing an older, flawed version of the firmware which in turn allows to completely bypass whatever security the PSP would provide.

And now I’m running the latest M33 firmware: 3.71-M4.

What does that mean? It means that the former quite useless device has just become the device of my dreams: It runs SNES games. It runs Playstation 1 games. It’s portable. I can use it in bed without a large assembly of cables, gamepads and laptops. It’s instant-on. It’s optimized for console games. It has a really nice digital directional pad (gone are the days of struggling with diagonally-emphasized joypads – try playing Super Metroid with one of these).

It plays games like Xenogears, Chrono Cross, Chrono Trigger – it finally allows me to enjoy the RPG’s of old in bed before falling asleep. Or in the bathtub. Or whatever.

It’s a real shame that once more I had to resort to legally questionable means to get a particular device to a state I imagine it to be. Why can’t I buy any PS1 game directly from Sony? Why are the games I want to play not even available in Switzerland? Why is it illegal to play the games I want to play? Why are most of the gadgets sold today crippled in a way or another? Why is it illegal to un-cripple our gadgets we bought?

Questions I, frankly, don’t want to answer. For years now I wanted a possibility to play Xenogears in bed and while taking a bath. Now I can, so I’m happy. And playing Xenogears. And loving it like when I was playing through that jewel of game history for the first time.

If I find time, expect some more in-depth articles about the greatness of Xenogears (just kidding – just read the early articles in this blog) or how to finally get your PSP where you want it to be – there are lots of small things to keep in mind to make it work completely satisfactory.

SPAM insanity

<p>I don’t see much point in complaining about SPAM, but it’s slowly but surely reaching complete insanity…</p>

What you see here is the recent history view of my DSPAM – our second line of defense against SPAM.

Red means SPAM. (the latest of the messages was a quite clever phishing attempt which I had to manually reclassify)

To give even more perspective to this: The last genuine Email I received was this morning at 7:54 (it’s now 10 hours later) and even that was just an automatically generated mail from Skype.

To put it into even more perspective: My DSPAM reports that since december 22th, I got 897 SPAM messages and – brace yourself – 170 non-spam messages of which 100 were subversion commit emails and 60 other emails sent from automated cron-jobs.

What I’m asking myself now is: Do these spammers still get anything out of their work? The signal-to-noise ratio has gone down the drain in a manner which can only mean that no person on earth would actually still read through all this spam and even be stupid enough to actually fall for it.

How bad does it have to get before it gets better?

Oh and don’t think that DSPAM is all I’m doing… No… these 897 mails were the messages that passed through both the ix DNSBL and SpamAssassin.

Oh and: Kudos to the DSPAM team. A recognition rate of 99.957% is really, really good

The IE rendering dilemma

There’s a new release of Internet Explorer, aptly named IE8, pending and a whole lot of web developers are in fear of new bugs and no fixes to existing ones. Like the problems we had with IE7.

A couple of really nasty bugs where fixed, but there wasn’t any significant progress in matters of extended support for web standards or even a really significant amount of bugfixes.

And now, so fear the web developers, history is going to repeat itself. Why, are people asking, aren’t they just throwing away the currently existing code-base, replacing it with something more reasonable? Or if licensing or political issues prevent using something not developed in-house, why not rewrite IE’s rendering engine from scratch?

Backwards compatibility. While the web itself has more or less stopped using IE-only-isms and began embracing the way of the web standards (and thus began cursing on IE’s bugs), corporate intranets, the websites accessed by Microsoft’s main customer base, certainly have not.

ActiveX, <FONT>-Tags, VBScript – the list is endless and companies don’t have the time or resources to remedy that. Remember. Rewriting for no real purpose than “being modern” is a real waste of time and certainly not worth the effort. Sure. New Applications can be developped in a standards compliant way. But think about the legacy! Why throw all that away when it works so well in the currently installed base of IE6?

This is why Microsoft can’t just throw away what they have.

The only option I see, aside of trying to patch up what’s badly broken, is to integrated another rendering engine into IE. One that’s standards compliant and one that can be selected by some means – maybe a HTML comment (the DOCTYPE specification is already taken).

But then, think of the amount of work this creates in the backend. Now you have to maintain two completely different engines with completely different bugs at different places. Think of security problems. And think of what happens if one of these buggers is detected in a third party engine a hypothetical IE may be using. Is MS willing to take responsibility of third-party bugs? Is it reasonable to ask them to do this?

To me it looks like we are now paying the price for mistakes MS did a long time ago and for quick technological innovation happening at the wrong time on the wrong platform (imagine the intranet revolution happening now). And personally, I don’t see an easy way out.

I’m very interested in seeing how Microsoft solves this problem. Ignore the standards-crowd? Ignore the corporate customers? Add the immense burden of another rendering engine? FIx the current engine (impossible, IMHO)? We’ll know once IE8 is out I guess.

VMWare Server 2.0

Now that the time has come to upgrade shion‘s hardware, and now that I’m running a x86 based platform (well, it’s the 64 bit server install of Ubuntu Gutsy), I guessed it was time to have a look at my current bittorrent solution.

Of all the torrent clients out there, so far, I had the most painless experience with uTorrent: Acceptable download speeds, a very nice web interface and a nice looking user interface. The only drawback is that it requires Windows to run and I had no constant-running Windows-PC at home.

In fact, I didn’t even have a Windows-PC at all. VMWare Fusion came to the rescue as it allowed me to install Windows on a virtual machine and run that on my main mac at home. I chose fusion as opposed to parallels because I always knew that I was going to update shion sooner or later, so I wanted the portability of the VMWare virtual machines (they run everywhere VMWare runs on – no conversion, no nothing).

And now that I did replace shion, I’ve installed the latest beta version of VMWare Server 2.0 and moved the virtual machine over to the newly born shion 2.0 which means that I now have a constantly running “Windows-PC” at home.

The move was painless as expected, but the whole process of installing VMWare server or the web interface was not as painless. VMWare Server feels exactly like every other proprietary Unix application I ever had to deal with. Problems with shared libraries (PAM, Gentoo, 32bit emulation and vmware server 1.0 is pure hell), problems with init-scripts not working, problems with incomprehensible error messages, you name it.

And once I actually got the thing to run, the first thing I had to do was to configure a whole bunch of iptables-rules because it seems impossible to bind all the 7 ports the web interface opens to localhost only (shion also is my access router, so I certainly don’t want the vmware-stuff exposed on eth1).

And actually using the web interface means forwarding all the 7 ports. In VMWare Server 1, it sufficed to forward the one port the console application used.

All this to finally end up without a working console access – the browser plugin they use for this seems not to work with Mac OS X and adding all the 7 ports to putty in my client windows VM, frankly, was more complicated than what I could get out of it.

Before this goes final with the expectation of being as useful as version 1 was, they need to give us back a native client and a smaller number of ports to forward.

shion died

After so many years of continued usage, shion (not the character from Xenosaga, my Mac Mini) died.

The few times it’s actually capable of detecting its hard-drive at boot-time, it loses contact to it shortly after loading the kernel. And the hard drive makes an awful kind of noise which is a very good pointer at what’s wrong.

Now, I could probably just replace the hard drive, but that old G4 processor, the 512 Megs of RAM and the two single USB-ports forcing me to cascade hub after hub all are good reasons to upgrade the hardware itself.

And thus, Shion 2.0 was born.

I grabbed an unused Mac Mini from the office and tried installing Ubuntu Gutsy on it, which worked well, but Leopard’s “Startup Disk” preference pane didn’t list the partition I installed Ubuntu on as a bootable partition. Booting Linux via pressing alt during pre-boot worked, but, hey, it’s a server and I don’t have a keyboard ready where shion is going to stand.

So I did it the brute-force way and just installed Ubuntu using the whole drive. It takes a hell of a lot of time for the EFI firmware to start missing the original GUID partition scheme and the original EFI parition, but when it does, it starts GRUB in the MBR partition, so I’m fine.

This does mean that I will be unable to install later firmware upgrades (due to the lack of a working OS X), but at least it means that I can reboot shion when needed without having to grab a keyboard.

This, provided that Domi will be able to solder me a display adaptor making the EFI BIOS emulation think that a display is connected.

All in all, I’m not totally happy with the next generation of shion. Not booting without a display attached, long boot times, non-working bios updates and, especially, no eSATA, but it’s free, so I’ll take it. I guess the old shion just chose a terribly inconvenient time to die.

Closed Source on Linux

One of the developers behind the Linux port of the new Flex Builder for Linux has a blog post about how building closed source software for linux is hard

Mostly, all the problems boil down to the fact that Linux distributors keep patching the upstream source to fit their needs which clearly is a problem rooted in the fact that open source software is, well, open sourced.

Don’t get me wrong. I love the concepts behind free software and in fact, every piece of software I’ve written so far has been open source (aside of most of the code I’m doing for my eployer of course). I just don’t see why every distribution feels the urgue to patch around upstream code, especially as this issue applies to both open- and closed source software projects.

And worse yet: Every distribution adds their own bits and pieces – sometimes doing the same stuff in different ways and thus making it impossible or at least very hard for a third party to create add-ons for a certain package.

What good is a plugin system if the interface works slightly different on each and every distribution?

And think of the time you waste learning configuration files over and over again. To make an example: Some time ago, SuSE delivered an apache server that was using a completely customized configuration file layout, thereby breaking every tutorial and documentation written out there because none of the directives where in the files they are supposed to be.

Other packages are deliberately broken up. Bind for example often comes in two flavors: The server and the client, even though officially, you just get one package. Additionally, every library package these days is broken up in the real library and the development headers. Sometimes the source of these packages may even get patched to support such breaking up.

This creates an incredible mess for all involved parties:

The upstream developer gets blamed for bugs she didn’t cause because they were introduced by the packager.
Third party developers can’t rely on their plugins or whatever pluggable components to work across distributions if they work upstream
Distributions have to do the same work over and over again as new upstream versions are released, thus wasting time better used for other improvements.
End users suffer from the general disability of reliably installing precompiled third-party binaries (mysql recommends the use of their binaries, so this even affects open sourced software) and from the inability to follow online-tutorials not written for the particular distribution that’s in use.

This mess must come to an end.

Unfortunately, I don’t know how.

You see: Not all patches created by distributions get merged upstream. Sometimes, political issues prevent a cool feature from being merged, sometimes clear bugs are not recognized as such upstream and sometimes upstream is dead – you get the idea.

Solution like FHS and LSB tried to standardize many aspects of how linux distributions should work in the hope of solving this problem. Bureaucracy and idiotic ideas (german link, I’m sorry) are causing quite the bunch of problems lately, making it hard to impossible to implement the standards. And often the standards don’t specify the latest and greatest parts of current technology.

Personally, I’m hoping that we’ll either end up with one big distribution defining the “state of the art”, with the others being 100% compatible or with distributions switching to pure upstream releases with only their own tools custom-made.

What do you think? What has to change in your opinion?