A rant on brace placement

Many people consider it good coding style to place braces (in languages that use them for block boundaries) on their own line. Like so:

function doSomething($param1, $param2)
{
    echo "param1: $param1 / param2: $param2";
}

Their argument usually is that it clearly shows the block boundaries, thus increasing readability. As a proponent of placing braces at the end of the statement opening the block, I strongly disagree. I would format the above code like so:

function doSomething($param1, $param2){
    echo "param1: $param1 / param2: $param2";
}

Here is why I prefer this:

  • In many languages, code blocks don’t have an identity of their own – functions do, but plain blocks don’t (they don’t provide scope). By placing the opening brace on its own line, you emphasize the block, but you actually make it harder to see what caused the block in the first place.
  • With correct indentation, the presence of the block should be obvious anyway. There is no need to emphasize it further (at the cost of the readability of the block-opening statement).
  • I doubt that using one line per token really makes the code more readable. Heck… why don’t we write that sample code like so?
function
doSomething
(
$param1,
$param2
)
{
    echo "param1: $param1 / param2: $param2";
}

Impressed by git

The company I’m working with is a Subversion shop. It has been for a long time – since fall 2004, actually, when I finally decided that the time for CVS was over and that I was going to move to Subversion. As I was the only developer back then, and as the whole infrastructure mainly consisted of CVS and ViewVC (cvsweb back then), this move was an easy one.

Now we are a team of three developers, heavy Trac users and truly dependent on Subversion, which is – mainly due to the amount of infrastructure we built around it – not going away anytime soon.

But nonetheless: we (mainly I) were feeling the shortcomings of Subversion:

  • Branching is not something you do easily. I tried working with branches before, but merging them really hurt, thus making it somewhat prohibitive to branch often.
  • Sometimes, half-finished stuff ends up in the repository. This is unavoidable, considering that the alternative is keeping a bucket load of uncommitted changes in the working copy.
  • Code review is difficult, as actually trying out patches is a real pain: sending, applying and reverting them is all manual work.
  • A pet peeve of mine, though, is untested, experimental features developed out of sheer interest. Stuff like that lies around in the working copy, waiting to be reviewed or even just to have its real-life use discussed. Sooner or later, a needed change must go in, and you have three options: sneak the experimental change in along with it (bad), manually diff it out (sometimes hard to do) or just forget about it and svn revert it (a real shame).

Ever since the Linux kernel first began using BitKeeper to track development, I have known that there is no technical reason for these problems. I knew that a solution for all of this existed and that I just wasn’t ready to try it yet.

Last weekend, I finally had a look at the different distributed revision control systems out there. Due to the insane amount of infrastructure built around Subversion, and so as not to scare off my team members, I wanted something that integrated with Subversion, using that repository as the official place where code ends up, while still giving us the freedom to fix all the problems listed above.

I had a look at both Mercurial and git, though in the end it was git’s nicely working SVN integration that made me examine it more closely.

Contrary to what everyone is saying, I have no problem with the interface of the tool – once you learn the terminology, it’s quite easy to get used to the system. So far I have done a lot of testing with both live repositories and test repositories – everything working out very nicely. I’ve already seen the impressive branch-merging abilities of git (to think that in Subversion you actually have to a) find out at which revision a branch was created and b) remember every patch you cherry-picked… crazy), and I’m getting into the details more and more.

On our trac installation, I’ve written a tutorial on how we could use git in conjunction with the central Subversion server which allowed me to learn quite a lot about how git works and what it can do for us.

So for me it’s git-all-the-way now and I’m already looking forward to being able to create many little branches containing many little experimental features.

If you have the time and you are interested in gaining many unexpected freedoms in matters of source code management, you too should have a look at git. Also consider that on the Subversion side, no change is needed at all, meaning that even if you are forced to use Subversion, you can privately use git to help you manage your work. Nobody would ever have to know.

Very, very nice.

The IE rendering dilemma – solved?

A couple of months ago, I wrote about the IE rendering dilemma: how to fix IE8’s rendering engine without breaking all the corporate intranets out there? How to create a standards-oriented browser and still ensure that the main customers of Microsoft – the enterprises – can run a current browser without having to redo all their (mostly internal) web applications?

Only three days after my posting, the IEBlog talked about IE8 passing the Acid2 test. When you watch the video linked there, you’ll notice that they indeed kept the IE7 engine untouched and added a switch to force IE8 into using the new rendering engine.

And yesterday, A List Apart showed us how it’s going to work.

While I completely understand Microsoft’s solution and the reasoning behind it, I can’t see any other browser doing what Microsoft recommended as a new standard. Keeping multiple rendering engines in the browser and defaulting to outdated ones is, in my opinion, a bad idea. Browser download sizes increase considerably, security problems must be patched multiple times, and, as the WebKit blog put it, it “[..] hurts the hackability of the code [..]”.

As long as the other browser vendors have neither IE’s market share nor the big company intranets depending on their browsers, I don’t see any reason at all for them to adopt IE’s model.

Also, when I’m doing (X)HTML/CSS work, it usually works and displays correctly in every browser out there – with the exception of IE’s current engine. As long as browsers don’t have awful bugs all over the place and you are not forced to hack around them, deviating from the standard in the process, there is no way a page you create will work in only one specific version of a browser. Even more so: when it breaks in a future version, that’s a bug in the browser that must be fixed there.

Assuming that Microsoft will, finally, get it right with IE8 and subsequent browser versions, we web developers should be fine with

<meta http-equiv="X-UA-Compatible" content="IE=edge" />

on every page we output to a browser. These compatibility hacks are for people who don’t know what they are doing. We know. We follow standards. And if IE begins to do so as well, we are fine with using the latest version of the rendering engine there is.
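
Incidentally, IE also accepts the very same directive as an HTTP response header, so instead of repeating the meta tag in every template, you could send it from one central place. A minimal PHP sketch of that idea (where exactly you put it is up to you):

<?php
// Send the engine switch as an HTTP header instead of repeating the
// <meta> tag in every page; IE honors both forms of X-UA-Compatible.
header('X-UA-Compatible: IE=edge');
?>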

If IE doesn’t play well and we need to apply braindead hacks that break when a new version of IE comes out, then we’ll all be glad that we have this method of forcing IE to use a particular engine, thus making sure that our hacks continue to work.

C, C#, Java

Today I was working on porting an EAN128 parser from Java to C#. The parser itself was initially written in C, and porting it from there to Java was already quite easy – sure, it still looks like C, but it works nicely. Thankfully, understanding the algorithm once and writing it down was enough for me, so I can live with not-so-nice-looking Java code.

What made me write this entry though is the fact that porting the Java version over to C# involved three steps:

  1. Copy
  2. Paste
  3. Change byte barCode[] to byte[] barCode

It’s incredible how similar those two languages are – at least if what you are working with more or less uses the feature set C provided us with.

PHP 5.2.4

Today, bugfix release 5.2.4 of PHP was published.

This is an interesting release, because it includes my fix for bug 42117 which I discovered and fixed a couple of weeks ago.

This means that with PHP 5.2.4 I will finally be able to bzip2-encode data as it is generated on the server and stream it out to the client, greatly speeding up our Windows client.

Now I only need to wait for the updated Gentoo package to update our servers.

More iPod fun

Last time I explained how to get .OGG-feeds to your iPod.

Today I’ll show you one possible way to greatly increase the usability of non-official (read: not bought at audible.com) audiobooks you may have lying around in .MP3 format.

You see, your iPod treats every MP3 file in your library as music, regardless of length and content. This can be annoying, as the iPod (rightly so) forgets the position in the file when you stop playback. So if you return to the file, you’ll have to start from the beginning and seek to where you left off.

This is a real pain with longer audiobooks and/or radio plays, of which I have a ton.

One way is to convert your audiobooks to AAC and rename the files to .m4b, which will convince iTunes to internally tag them as audiobooks and enable the additional features (storing the position and providing a UI to change play speed).

Of course this would have meant converting a considerable part of my MP3 library to the AAC format, which is not yet as widely supported (not to speak of the quality loss I’d have to endure when converting one lossy format into another).

It dawned on me that there’s another way to make the iPod store the position – even with MP3 files: podcasts.

So the idea was to create a script that reads my MP3 library and outputs RSS to make iTunes think it’s working with a podcast.

And thus, audiobook2cast.php was born.

The script is very much tailored to my directory structure and probably won’t work at your end, but I hope it’ll provide you with something to work with.

In the script, I can only point out two interesting points:

  • When checking a podcast, iTunes ignores the type attribute of the enclosure when determining whether a file can be played. So I had to add a fake .mp3 extension to the enclosure URL (see the sketch after this list).
  • I’m outputting a totally fake pubDate element in each <item> tag to force iTunes to sort the audiobook parts in ascending order.
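
To illustrate both points, here is a heavily stripped-down sketch of the feed generation. The paths, titles and URLs are made up; the real audiobook2cast.php reads all of that from my directory structure:

<?php
// Pretend a directory of MP3 files is a podcast so that the iPod
// remembers the playback position. Everything below is illustrative only.
header('Content-Type: application/rss+xml');

$files = glob('/data/audiobooks/some-book/*.mp3');
sort($files);

echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<rss version="2.0"><channel><title>Some Audiobook</title>' . "\n";

foreach ($files as $i => $file){
    // Fake, strictly ascending pubDates force iTunes to sort the parts
    // in the right order.
    $pubDate = date('r', mktime(12, 0, 0, 1, 1, 2000) + $i * 86400);

    // The enclosure URL has to end in .mp3 – iTunes ignores the type
    // attribute when deciding whether it can play the file.
    $url = 'http://example.com/serve.php?file=' . urlencode(basename($file)) . '&fake=.mp3';

    printf('<item><title>%s</title><enclosure url="%s" length="%d" type="audio/mpeg"/><pubDate>%s</pubDate></item>' . "\n",
        htmlspecialchars(basename($file)), htmlspecialchars($url), filesize($file), $pubDate);
}

echo '</channel></rss>' . "\n";
?>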

As I said: This is probably not useful to you out-of-the-box, but it’s certainly an interesting solution to an interesting problem.

Updating or replacing datasets

This is maybe the most obvious trick in the world but I see people not doing it all over the place, so I guess it’s time to write about it.

Let’s say you have a certain set of data you need to enter into your RDBMS. Let’s further assume that you don’t know whether the data is already there or not, so you don’t know whether to use INSERT or UPDATE.

Some databases provide us with something like REPLACE or “INSERT OR REPLACE”, but others do not. Now the question is, how to do this efficiently?

What I always see is something like this (pseudo-code):

  1. select count(*) from xxx where primary_key = xxx
  2. if (count > 0) update; else insert;

This means that for every dataset you will have to do two queries. This can be reduced to only one query in some cases by using this little trick:

  1. update xxx set yyy where primary_key = xxx
  2. if (affected_rows(query) == 0) insert;

This method just goes ahead with the update, assuming that the data is already there (which is usually the right assumption anyway). Then it checks whether an update has actually been made; if not, it inserts the data set.

This means that in cases where the data is already there in the database, you can reduce the work on the database to one single query.
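
In PHP with PDO, the trick looks roughly like this minimal sketch (the table and column names are made up for the example):

<?php
// Update first, assuming the row already exists; insert only if nothing
// was affected. "items", "id" and "name" are placeholder names.
function insertOrUpdate(PDO $db, $id, $name){
    $update = $db->prepare('UPDATE items SET name = :name WHERE id = :id');
    $update->execute(array(':name' => $name, ':id' => $id));

    if ($update->rowCount() == 0) {
        // No row was touched: the dataset wasn't there yet.
        $insert = $db->prepare('INSERT INTO items (id, name) VALUES (:id, :name)');
        $insert->execute(array(':id' => $id, ':name' => $name));
    }
}
?>

One caveat: depending on the driver and connection flags, the affected-row count may only include rows whose values actually changed (MySQL does this by default), so an update that writes identical values could trigger an insert that then fails on the primary key.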

Additionally, doing a SELECT and then an UPDATE essentially performs the select twice, as the update will cause the database to select the rows to update anyway. Depending on your optimizer and/or query cache, this can of course be optimized away, but there are no guarantees.

PHP, stream filters, bzip2.compress

Maybe you remember that, more than a year ago, I had an interesting problem with stream filters.

The general idea is that I want to output bzip2-compressed data to the client as the output is being assembled – or, more to the point: the PopScan Windows client supports the transmission of bzip2-encoded data, which gets really interesting as the amount of data to be transferred increases.

Even more so: the transmitted data is in XML format, which compresses very well – especially with bzip2.

Once you begin to transmit multiple megabytes of uncompressed XML-data, you begin to see the sense in jumping through a hoop or two to decrease the time needed to transmit the data.

On the receiving end, I have an elaborate construct capable of downloading, decompressing, parsing and storing data as it arrives over the network.

On the sending end, though, I have been less lucky: because of that problem I had, I was unable to stream out bzip2-compressed data as it was generated – the end of the file was sometimes missing. This is why I’m using ob_start() to gather all the output and then compress it with bzcompress() before sending it out.

Of course this means that all the data must be assembled before it can be compressed and then sent to the client.

As we have more and more data to transmit, the client must wait longer and longer before the data begins to reach it.

And then comes the moment when the client times out.

So I finally really had to fix the problem. I could not believe that I was unable to compress and stream out data on the fly.

It turns out that I finally found the smallest possible amount of code to illustrate the problem in a non-hacky way:

So: This fails under PHP up until 5.2.3:

<?php
$str = "BEGIN (%d)\n
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip
ex ea commodo consequat. Duis aute irure dolor in reprehenderit in
voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt
mollit anim id est laborum.
\nEND (%d)\n";

$h = fopen($_SERVER['argv'][1], 'w');
$f = stream_filter_append($h, "bzip2.compress", STREAM_FILTER_WRITE);
for($x = 0; $x < 10000; $x++){
    fprintf($h, $str, $x, $x);
}
fclose($h);
echo "Written\n";
?>

Even worse: it doesn’t fail with a message, but writes out a corrupt bzip2 file.

And it gets worse: with a small amount of data it works, but as the amount increases, it begins to fail – at different places, depending on how you shuffle the data around.

The above script will write a bzip2 file which – when uncompressed – ends around iteration 9600.

So now that I had a small reproducible testcase, I could report a bug in PHP: Bug 47117.

After spending so many hours on a problem which in the end boiled down to a bug in PHP (I’ve looked everywhere, believe me; I also tried workarounds, but all to no avail), I just could not let the story end there.

Some investigation quickly turned up a wrong check of a return value in bz2_filter.c, which I was able to patch up very, very quickly – so if you visit the bug above, you will find a patch correcting the problem.

Then, once I had finished patching PHP itself, hacking up the PHP code needed to stream out the compressed data as it is generated was easy. If you want, you can have a look at bzcomp.phps, which demonstrates how to plug the compression into either the output buffer handling or something quicker and dirtier.
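
In essence, it boils down to something like this minimal sketch (not the actual PopScan code; the data is made up):

<?php
// With the fixed bzip2.compress filter, data written to php://output is
// compressed and streamed out as it is generated – no more collecting
// everything with ob_start() and compressing it with bzcompress() at the end.
$out = fopen('php://output', 'w');
stream_filter_append($out, 'bzip2.compress', STREAM_FILTER_WRITE);

fwrite($out, "<?xml version=\"1.0\"?>\n<items>\n");
for($i = 0; $i < 100000; $i++){
    // In the real application, this would be one record of the export.
    fwrite($out, "<item>" . $i . "</item>\n");
}
fwrite($out, "</items>\n");

// Closing the stream finalizes the filter and writes a single
// end-of-stream marker.
fclose($out);
?>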

Oh, and if you are tempted to do this:

function ob($buf){
        return bzcompress($buf);
}

ob_start('ob');

… it won’t do any good because you will still gobble up all the data before compressing. And this:

function ob($buf){
        return bzcompress($buf);
}

ob_start('ob', 32768);

will encode in chunks (good), but it will write a bzip2-end-of-stream marker after every chunk (bad), so neither will work.

There is nothing more satisfying than fixing a bug in someone else’s code. Now let’s hope the patch gets applied to PHP itself so I don’t have to manually patch my installations.

Newfound respect for JavaScript

Around the year 1999 I began writing my own JavaScript code as opposed to copying and pasting it from other sources and only marginally modifying it.

In 2004, I discovered AJAX (XMLHttpRequest in particular) for myself just before the hype started, and I have been doing more and more JavaScript since then.

I always regarded JavaScript as something you have to do but dislike. My code was dirty, mainly because I was of the wrong opinion that JavaScript was a procedural language with just one namespace (the global one). Also, I wasn’t using JavaScript for much of my sites’ functionality, partly because of old browsers and partly because I had not yet seen what was possible in that language.

But for the last year or so, I have been writing very large quantities of JS for very AJAXy applications, which made me really angry about the seemingly limited ways of structuring your code.

And then I found a link on reddit to a lecture by a Yahoo employee, Douglas Crockford, which really managed to open my eyes.

JavaScript isn’t a procedural language with some object-oriented stuff bolted on. JavaScript is a functional language with object-oriented and procedural concepts integrated where it makes sense, allowing us developers both to write code quickly and to understand written code with only very little knowledge of how functional languages work.

The immensely powerful concepts of functions as first-class objects, of closures and of modifying object prototypes at will turn JS into a really interesting language which can be used to write “real” programs with a clean structure.

The day I saw those videos, I understood that I had completely wrong ideas about JavaScript, mainly because of my crappy learning experience so far, which initially consisted of copying and pasting crappy code from the web and later of reading library references, while always ignoring real introductions to the language («because I know that already»).

If you are interested in learning about a completely new, powerful side of JavaScript, I highly recommend you watch these videos.

A followup to MSI

My last post about MSI generated some nice responses, amongst them a lengthy blog post on Legalize Adulthood.

Judging from the two trackbacks on the MSI posting, and especially after reading the post linked above, I have come to the conclusion that my posting was very easy to misunderstand.

I agree that the workarounds I listed are problems with the authoring. I DO think, however, that all these workarounds were put in place because the platform provided by Microsoft is lacking in some way.

My rant was not about the side effects of these workarounds. It was about their sole existence. Why are some of us forced to apply workarounds to an existing platform to achieve their goals? Why doesn’t the platform itself provide the essential features that would make the workarounds unneeded?

For my *real* problems with MSI from an end user’s perspective, feel free to read this rant or this one (but bear in mind that both are a bit dated by now).

Let’s go once again through my points and try to understand what each workaround tries to accomplish:

  1. EXE stub to install MSI: MSI, despite being the platform of choice, still isn’t as widely deployed as installer authors want it to be. If Microsoft wants us to use MSI, it’s IMHO their responsibility to ensure that the platform is actually available.

    I do agree though that Microsoft is working on this, for example by requiring MSI 3.1 (the first release with acceptable patching functionality) for Windows Update. This is what makes the stubs useless over time.

    And personally, I think a machine that isn’t using Windows Update and thus doesn’t have 3.1 on it isn’t a machine I’d want to deploy my software on, because a machine not running Windows Update is probably badly compromised and in an unsupportable state.

  2. EXE stub to check prerequisites: once more, I don’t get why the underlying platform cannot provide functionality that is obviously needed by the community. Prerequisites are a fact of life, and MSI does nothing to help with them. MSI packages can’t install other MSI packages, only merge modules, but barely any libraries required by today’s applications actually come in MSM format (.NET Framework? Anyone?).

    In response to the excellent post on Legalize Adulthood which gives an example about DirectX, I counter with: why is there a DirectX Setup API? Why are there separate CAB files? Isn’t MSI supposed to handle that? Why do I have to create a setup stub calling a third-party API to get stuff installed that isn’t installed by the default MSI installation?

    A useful packaging solution would provide a way to specify dependencies, or at least allow for automated installation of dependencies from the original package.

    It’s ironic that an MSI package can – even though it’s dirty – use a CustomAction to install a traditionally packaged .EXE installer dependency, but can’t install an .MSI-packaged dependency.

    So my problem isn’t with bootstrappers as such, but with the limitations in MSI itself requiring us developers to create bootstrappers to do work which IMHO MSI should be able to do.

  3. MSI-packaged .EXEs: I wasn’t saying that MSI is to blame for the authors that repacked their .EXEs into .MSI packages. I’m just saying that this is another type of workaround that could have been chosen to get the installation to work despite (maybe only perceived) limitations in MSI. An ideal packaging solution would be as accessible and flexible as your common .EXE installer and thus make such a workaround unneeded.

  4. Third-party scripting: in retrospect, I think the motivation for these third-party scripting solutions is mainly vendor lock-in. I’m still convinced, though, that with a more traditional structure and a bit more flexibility for installer authors, such third-party solutions would become less and less needed until they finally die out.

  5. Extracting, then merging: Also just another workaround that has been chosen because a distinct problem wasn’t solvable using native MSI technology.

    I certainly don’t blame MSI for a developer screwing up. I’m blaming MSI for not providing the tools necessary for the installer community to use native MSI to solve the majority of problems. I ALSO blame MSI for messiness, for screwing up my system countless times and for screwing up my parents’ system, which is plainly unforgivable.

    Because MSI is a complicated black box, I’m unable to fix problems with constantly appearing installation prompts, with unremovable entries in “Add/Remove programs” and with installations failing with such useful error messages as “Unknown Error 0x[whatever]. Installation terminated”.

    I’m blaming MSI for not stopping the developer community from authoring packages with the above problems. I’m blaming MSI for its inherent complexity, which causes developers to screw up.

    I’m disappointed with MSI because it works in a way that requires at least part of the community to create messy workarounds for quite common problems MSI can’t solve.

    What I posted was a list of workarounds of varying stupidity for problems that shouldn’t exist. Authoring errors that shouldn’t need to happen.

    I’m not picky here: A large majority of packages I had to work with do in fact employ one of these workarounds (the unneeded EXE-stub being the most common one), none of which should be needed.

    And don’t get me started about how other operating systems do their deployment. I think Windows could learn from some of them, but that’s for another day.