armchair scientists

The place: London. The time: Around 1890.

Imagine a medium sized room, lined with huge shelves filled with dusty
books. The lights are dim, the air is heavy with cigar smoke. Outside
the last shred of daylight is fading away.

In one corner of the room, you spot two large leather armchairs and a
small table. On top of the table, two half-full glasses of whiskey. In
each of the armchairs, an elderly person.

One of them opens his mouth to speak:

«If I were in charge down there in South Africa, we’d be so much
better off – running a colony just can’t be so hard as they make it
out to be»

Conceivable that this happened? Yeah. Very likely, actually. Crazy and
misguided? Of course – we learned about that in school: imperialism
doesn’t work.

Of course that elderly guy in the little story is wrong. The problems
are way too complex for a bystander to even understand, let alone
solve. More than likely he doesn’t even have a fraction of the
background needed to understand the complexities.

And yet he sits there, in his comfortable chair, in the warmth of his
club in cozy London and yet he explains that he knows so much better
than, you know, the people actually doing the work.

Now think today.

Think about that article you just read that was explaining a problem
the author was solving. Or that other article that was illustrating a
problem the author is having, still in search of a solution.

Didn’t you feel the urge to go to Hacker News
and reply how much you know better and how crazy the original poster
must be not to see the obvious simple solution?

Having trouble scaling 4chan? How can that be hard?
Having trouble with your programming environment being unable to assign one string to another?
Well. It’s just strings, why is that so hard?

Or those idiots at Amazon who can’t even keep their cloud service
running? Clearly it can’t be that hard!

See a connection? By stating opinions like that, you are not even a
little bit better than the elderly guy at the beginning of this essay.

Until you know all the facts, until you were there, on the ladder
holding a hose trying to extinguish the flames, until then, you don’t
have the right to assume that you’d do better.

The world we live in is incredibly complicated. Even though computer
science might boil down to math, our job is dominated by side-effects
and uncontrollable external factors.

Even if you think that you know the big picture, you probably won’t
know all the details and without knowing the details, it’s
increasingly likely that you don’t understand the big picture either.

Don’t be an armchair scientist.

Be a scientist. Work with people. Encourage them, discuss solutions,
propose ideas, ask which obvious fact you missed or which one was
missing from the problem description.

This is 2012, not 1890.

E_NOTICE stays off.

I’m sure you’ve used this idiom a lot when writing JavaScript code

options['a'] = options['a'] || 'foobar';

It’s short, it’s concise and it’s clear what it does. In Ruby, you can be even more concise:

params[:a] ||= 'foobar'

So you can imagine that I was happy with PHP 5.3’s new ?: operator:

<? $options['a'] = $options['a'] ?: 'foobar'; ?>

In all three cases, the syntax is concise and readable, though arguably, the PHP one could read a bit better, but, ?: still is better than writing the full ternary expression, spelling out $options['a'] three times.
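To make the JavaScript variant concrete, here is a minimal sketch. The helper name withDefault is made up for illustration. It relies on the fact that reading a missing property in JavaScript quietly yields undefined, and it also shows that the || idiom replaces every falsy value, not just missing ones:

```javascript
// Hypothetical helper wrapping the `x = x || default` idiom from above.
function withDefault(options, key, fallback) {
  // Reading a missing property yields undefined; no warning is raised.
  options[key] = options[key] || fallback;
  return options;
}

const missing = withDefault({}, 'a', 'foobar');              // key absent: default applied
const present = withDefault({ a: 'custom' }, 'a', 'foobar'); // key set: value kept
const falsy = withDefault({ a: 0 }, 'a', 'foobar');          // falsy value: also replaced

console.log(missing.a, present.a, falsy.a);
```

The last case is the usual caveat of the idiom: if 0 or an empty string is a legitimate value, it gets overwritten as well.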

PopScan, since forever (forever being 2004), runs with E_NOTICE turned off. Back then, I felt it provided just baggage and I just wanted (had) to get things done quickly.

This, of course, led to people not taking enough care of the code and
recently, I had one too many cases of a bug caused by accessing a variable that
was undefined in a specific code path.

I decided that I’m willing to spend the effort on cleaning all of this up and
making sure that there are no undeclared fields and variables in all of
PopScan’s codebase.

Which turned out to be quite a bit of work as a lot of code is apparently
happily relying on the default null that you can read out of undefined
variables. Those instances might be ugly, but they are by no means bugs.

Cases where the null wouldn’t be expected are the ones I care about, but I
don’t even want to go and discern the two – I’ll just fix all of the instances
(embarrassingly many, most of them, thankfully, not mine).

Of course, if I put hours into a cleanup project like this, I want to be sure
that nobody destroys my work again over time.

Which is why I was looking into running PHP with E_NOTICE in development
mode at least.

Which brings us back to the introduction.

<? $options['a'] = $options['a'] ?: 'foobar'; ?>

is wrong code. Any access to an undefined index of an array always raises a
notice. It’s not like Python where you can choose (accessing a dictionary using
[] will throw a KeyError, but there’s get() which just returns None). No. You
don’t get to choose. You only get to add boilerplate:

<? $options['a'] = isset($options['a']) ? $options['a'] : 'foobar'; ?>

See how I’m now spelling $options['a'] three times again? ?: just got a
whole lot less useful.

But not only that. Let’s say you have code like this:

<?
list($host, $port) = explode(':', trim($def));
$port = $port ?: 11211; ?>

IMHO very readable and clear what it does: It extracts a host and a port and
sets the port to 11211 if there’s none in the initial string.

This of course won’t work with E_NOTICE enabled. You either lose the very
concise list() syntax, or you do – ugh – this:

<?
list($host, $port) = explode(':', trim($def)) + array(null, null);
$port = $port ?: 11211; ?>

Which looks ugly as hell. And no, you can’t write a wrapper to explode() which
always returns an array big enough, because you don’t know what’s big enough.
You would have to pass the number of nulls you want into the call too. That
would look nicer than the hack above, but it still doesn’t even come close in
conciseness to the solution which throws a notice.

So. In the end, I’m just complaining about syntax, you might think? I thought so too and I wanted to add the syntax I liked, so I did a bit of experimenting.

Here’s a little something I’ve come up with:

https://gist.github.com/1267568.js?file=e_notice_stays_off.php

The wrapped array solution looks really compelling syntax-wise and I could totally see myself using this and even forcing everybody else to go there. But of course, I didn’t trust PHP’s interpreter and thus benchmarked the thing.

pilif@tali ~ % php e_notice_stays_off.php
Notices off. Array 100000 iterations took 0.118751s
Notices off. Inline. Array 100000 iterations took 0.044247s
Notices off. Var. Array 100000 iterations took 0.118603s
Wrapped array. 100000 iterations took 0.962119s
Parameter call. 100000 iterations took 0.406003s
Undefined var. 100000 iterations took 0.194525s

So. Using nice syntactic sugar costs roughly 8 times the performance. The
second best solution? Still more than 3 times. Out of the question. Yes. It
could be seen as a micro-optimization, but 100’000 iterations, while a lot,
is not that many. Waiting nearly a second instead of 0.1 seconds is crazy,
especially for a common operation like this.

Interestingly, the most bloated code (that checks with isset()) is twice as
fast as the most readable (just assign). Likely, the notice gets fired
regardless of error_reporting() and then just ignored later on.

What really pisses me off about this is the fact that everywhere else PHP
doesn’t give a damn. ‘0’ is equal to 0. Heck, even ‘abc’ is equal to 0. It
even fails silently many times.

But in a case like this, where there is even newly added nice and concise
syntax, it has to be anal and bitchy. And there’s no way to get to the needed
solution but to either write too expensive wrappers or ugly boilerplate.

Dynamic languages give us a very useful tool to be dynamic in the APIs we
write. We can create functions that take a dictionary (an array in PHP) of
options. We can extend our objects at runtime by just adding a property. And
with PHP’s (way too) lenient data conversion rules, we can even do math with
user supplied string data.

But can we read data from $_GET without boilerplate? No. Not in PHP. Can we
use a dictionary of optional parameters? Not in PHP. PHP would require
boilerplate.

If a language basically mandates retyping the same expression three times,
then, IMHO, something is broken. And if all the workarounds are either crappy
to read or have very bad runtime properties, then something is terribly
broken.

So, I decided to just fix the problem (undefined variable access) but leave
E_NOTICE where it is (off). There’s always git blame and I’ll make sure I
will get a beer every time somebody lets another undefined variable slip in.

Alt-Space

Today, I was looking into the new jnlp_href way of launching a Java Applet. Just like applet-launcher, this allows one to create applets that depend on native libraries without the usual hassle of manually downloading the files and installing them.

Contrary to applet-launcher, it’s built into the later versions of Java 1.6 and it’s officially supported, so I have higher hopes concerning its robustness.

It’s even possible to keep the applet-launcher calls in there if the user has an older Java Plugin that doesn’t support jnlp_href yet.

So in the end, you just write a .jnlp file describing your applet and add

<param name="jnlp_href" value="http://www.example.com/path/to/your/file.jnlp">

and be done with it.

Unless of course, your JNLP file has a syntax error. Then you’ll get this in your error console (at least in case of this specific syntax error):

java.lang.NullPointerException
    at sun.plugin2.applet.Plugin2Manager.findAppletJDKLevel(Unknown Source)
    at sun.plugin2.applet.Plugin2Manager.createApplet(Unknown Source)
    at sun.plugin2.applet.Plugin2Manager$AppletExecutionRunnable.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Ausnahme: java.lang.NullPointerException

How helpful is that?

Thanks, by the way, for insisting on displaying a half-assed German translation on my otherwise English OS: Never use locale info for determining the UI language, please.

Of course, this error does not give any indication of what the problem could be.

And even worse: The error in question is the topic of this blog post: It’s the dreaded Alt-Space character, 0xa0, or NBSP in ISO 8859-1.

0xa0 looks like a space, feels like a space, is incredibly easy to type instead of a space, but it’s not a space – not in the least. Depending on your compiler/parser, this will blow up in various ways:

pilif@celes ~ % ls | grep gnegg
zsh: command not found:  grep
pilif@celes ~ %
pilif@celes ~ % cat test.php
<?
echo "gnegg";
?>
pilif@celes ~ % php test.php
PHP Parse error:  syntax error, unexpected T_CONSTANT_ENCAPSED_STRING in /Users/pilif/test.php on line 2

Parse error: syntax error, unexpected T_CONSTANT_ENCAPSED_STRING in /Users/pilif/test.php on line 2
pilif@celes ~ %

and so on.

Now you people in the US with US keyboard layouts might think that I’m just one of those whiners – after all, how stupid must one be to press Alt-Space all the time? Probably stupid enough to deserve stuff like this.

Before you think these nasty thoughts, I ask you to consider the Swiss German keyboard layout though: Nearly all the characters we programmers use are accessed by pressing Alt-[some letter]. At least on the Mac. Windows uses AltGr, or right-alt, but on the Mac, any alt will do.

So when you look at the shell line above:

ls | grep gnegg

you’ll see how easy it is to hit alt-space: First I type ls, then space. Then I press and hold alt-7 for the pipe and then, I am supposed to let go of alt and hit space. But because my left hand is on alt and the right one is pressing space, it’s very easy to hit space before letting go of alt.

Now instead of getting immediate feedback, nothing happens. It looks as if the space had been added, when in fact, something else has been added and that something is not recognized as a white space character and thus is something completely different from a space – despite looking exactly the same.

As much fun as reading hexdump -C output is – I need this to stop.
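One way to spot the character without squinting at hexdump output is to scan for code point U+00A0 directly. The following is only a sketch. findNbsp is a hypothetical helper, not an existing tool, and the sample string is made up to mirror the shell line above:

```javascript
// Hypothetical helper: report line and column (both 1-based) of every
// no-break space (U+00A0) in a piece of text.
function findNbsp(text) {
  const hits = [];
  text.split('\n').forEach((line, lineIndex) => {
    for (let col = 0; col < line.length; col++) {
      if (line[col] === '\u00a0') {
        hits.push({ line: lineIndex + 1, column: col + 1 });
      }
    }
  });
  return hits;
}

// The character after the pipe is a no-break space, not a regular space.
const sample = 'ls |\u00a0grep gnegg';
const found = findNbsp(sample);
console.log(found);
```

Running something like this over a file that refuses to parse would at least tell me where to look.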

Dear internet! How can I make my Mac (or Linux when using the Mac keyboard layout) stop recognizing Alt-Space?

To take the wind out of the sails of the trolls who will eventually arrive:

  • I won’t use Windows again. Thank you. Neither do I want to use Linux on my desktop.
  • I cannot use the US keybindings because my brain just can’t handle the keyboard layout changing all the time and, as I’m a native German speaker, I do have to type umlauts now and then – actually often enough that the ¨+vowel combo isn’t acceptable.
  • While running Mac OS X, I’m stuck with the mac keyboard layout – I can’t use the Windows one.

The JNLP error above (printed here just in case somebody else has the same issue) caused me to lose nearly 5 hours of my life and will force me to work this weekend – who’d expect an XML parser error due to a space that isn’t one when seeing the call stack above?

Update: A commenter on reddit.com recommended Ukelele. I tried it and it helped me create a custom keyboard layout that makes alt-space work just like space. That’s the best solution for my specific taste, so thanks a lot!

Of all the hardware that can break…

… it has to be the one that’s most difficult to replace.

Today, my Gefen HDMI over Cat5 adapter died. Well. It didn’t die completely, it just lost its ability to produce a stable image. What is transmitted is very intermittent and in the few seconds the image is available, it’s heavily distorted.

Also, it’s not the obvious issue (faulty cabling), as the problems did not go away after using two very short (1m) Cat5 cables to test.

Now this is really bad for a variety of reasons:

  • Only just last Saturday I bought Star Ocean and Tales of Vesperia for my 360, giving me a total play time of 1.5 hours so far.
  • Yesterday I noticed that Worms: Armageddon was released for Xbox arcade and I have already invited Ebi after the huge success that was our earlier Worms evening on the 360.
  • My setup is totally dependent on the two extenders as I am covering more than 20 meters of distance between receiver and projector. No extender, no Xbox, no Wii, no projector.
  • Last time I waited around six weeks for the extender to arrive.

Of all the hardware I have at home, the HDMI extender is the worst to break. Not only is it very hard to replace (see above), it’s so deeply integrated into my home cinema setup that just debugging what was going on took a ladder, a screwdriver, a hex-wrench and unwinding an ungodly heap of cables.

All of that in an apartment whose temperature is currently at 30°C (86 °F) and with a hell of a headache.

I’d take anything else going down. Anything but that Gefen extender. My Xbox? Sure. Shion? It’d suck, but sure, if it has to be, go ahead. My receiver? That would hurt as it was very expensive, but at least it’s easily replaced.

Why did it have to be that Gefen extender? Why??

digg bar controversy

Update: I actually wrote this post yesterday and scheduled it for posting today. In the meantime, digg has found an even better solution and only shows their bar for logged-in users. Still – a solution like the one provided here would allow for the link to go to the right location regardless of the state of the digg bar settings.

Recently, digg.com added a controversial feature, the digg bar, which basically frames every posted link in a little IFRAME.

Rightfully so, webmasters were concerned about this and quite quickly, we had the usual religious war going on between the people finding the bar quite useful and the webmasters hating it for lost page rank, even worse recognition of their site and presumed affiliation with digg.

Ideas crept up over the weekend, but turned out not to be so terribly good.

Basically it all boils down to digg.com screwing up on this, IMHO.

I know that they let you turn off that dreaded digg bar, but all the links on their page still point to their own short url. Only then is the decision made whether to show the bar or not.

This means that all links on digg currently just point to digg itself, not awarding any linked page with anything but the traffic which they don’t necessarily want. Digg-traffic isn’t worth much in terms of returning users. You get dugg, you melt your servers, you return back to be unknown.

So you would probably appreciate the higher page rank you get from being linked to by digg, as that leads to increased search engine traffic which generally is worth much more.

The solution on digg’s part could be simple: Keep the original site url in the href of their links, but use some JS-magic to still open the digg bar. That way they still get to keep their foot in the user’s path away from the site, but search engines will now do the right thing and follow the links to their actual target, thus giving the webmasters their page rank back.

How to do this?

Here are a few lines of jQuery to automatically make links formatted in the form

be opened via the digg bar while still working correctly for search engines (assuming that the link’s ID is the digg shorturl):

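$(function(){
  $('div#link_container a').click(function(){
    // attr() is a function call: the new value is passed as its second
    // argument. Swap the href right before the browser follows the link.
    $(this).attr('href', 'http://digg.com/' + this.id);
  });
});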

Piece of cake.

No further changes needed and all the web masters will be so much happier while digg gets to keep all the advantages (and it may actually help digg to increase their pagerank as I could imagine that a site with a lot of links pointing to different places could rank higher than one without any external links).

Webmasters then still could do their usual parent.location.href trickery to get out of the digg bar if they want to, but they could also retain their page rank.

No need to add further complexity to the web’s standards because one site decides not to play well.

Bugs, Bugs and more Bugs

I love my job. Always loved it, always will love it.

But if you ask me what the most annoying aspect of it is, then I would answer you that it’s stuff always breaking all around me.

Whatever I do, there is no guarantee that any given thing will work as expected: it will break from one moment to the next, or it will never work at all. There are hardware failures, OS failures, software failures – each and every day I lose at least one or two hours due to stuff not working or suddenly ceasing to work.

Let me give you an account of what happened since the beginning of 2009:

  • When installing two previously configured servers at a colocation center, one didn’t start up at all (opening and reclosing the case fixed that) and the ESX server on the other machine refused to connect to the VMware license server despite a working TCP/IP connection between them. That turned out to be a missing hosts file entry, even though I was connecting via IP address.
  • One day later, Outlook on the computer of someone whose PC I look after a bit decided to trash the .PST file and I had to guide the person remotely (on the phone) through restoring it from the backup.
  • Yesterday, my Firebug suddenly stopped working. At least the console-object wasn’t any longer available in my scripts and the console itself didn’t work. Reinstalling the Addon helped (WTF?)
  • One of my two Vista Media Center PCs suddenly stopped playing any video file, despite me not doing updates on these machines to prevent stuff like this from happening. To this date I have no idea how to fix this.
  • My Delphi 2007 installation just now decided to stop displaying the online help. Trying to fix that by reinstalling it ended with an error message whose title and content were both “Error”, but not before first completely uninstalling Delphi with no way of getting it back (you know… “Error” again). This was fixed by removing D2009 and then reinstalling 2007 and 2009 – a process that took 2 hours of installation time and another three to figure out what was going on.
  • When I was frustrated enough and wanted to vent (i.e. write this post), my WordPress just now decided to do something really strange to the layout of the “Add New Post” page which made it impossible to post anything. Disabling Google Gears and restarting the browser helped.

Our everyday technology is becoming more and more complex, thus causing more and more strange problems, requiring more and more knowledge and time to work around them. If we continue on that path, sooner or later it will be impossible to keep up with fixing problems popping up.

That will be the day when I’ll hopefully live on some island way off the net and all this stuff.

Automatic language detection

If you write a website, do not use Geolocation to determine the language to display to your user.

If you write a desktop application, do not use the region setting to determine the language to display to your user.

This is incredibly annoying for some of us, especially for me, which is why I’m ranting here.

The moment Google released their (awful) German translation for their RSS reader, I was served the German version just because I have a Swiss IP address.

Here in Switzerland, we actually speak one of three (or four, depending on who you ask) languages, so defaulting to German is probably not of much help for the people in the French-speaking part.

Additionally, there are many users fluent in (at least reading) English. We always prefer the original language if at all possible because generally, translations never quite work. Even if you have the best translators at work, translated texts never feel fluid. Especially not when you are used to the original version.

So, Google, what were you thinking to switch me over to the German version of the reader? I have been using the English version for more than a year, so clearly, I understood enough of that language to be able to use it. More than 90% of the RSS feeds I’m subscribed to are, in fact, in English. Can you imagine how pissed I was to see the interface changed?

This is even worse on the iPhone/iPod frontend because there, you don’t even provide an option to change the language aside from manually hacking the URL.

Or take desktop applications. I live in the German-speaking part of Switzerland. True. So naturally I have set my locale settings to Swiss German. You know: I want to have the correct number formatting, I want my weeks to start on Mondays. I want the correct currency. I want the 24-hour clock I’m used to.

Actually, I also want the German week and month names, because I will be using these in most of my letters and documents, which are, in fact, German too.

But my OS installation is English. I am used to English. I prefer English. Why do so many programs insist on using the locale setting to determine the display language? Do you developers think it’s funny to have a mish-mash of languages on the screen? Don’t you think that me using an English OS version may be an indication that I do not want to read your crappy German translation alongside the English user interface of my OS?

Don’t you think that it feels really stupid to have a button in a German dialog box open another, English, dialog (the first one is from Chrome, the one that opens once you click “Zertifikate verwalten” (Manage certificates) is from Windows itself)?

In Chrome, I can at least fix the language – once I found the knob to turn. At first, it was easier for me to just delete the German localization file from the chrome installation because, due to being completely unused to German UIs, I was unable to find the right setting.

This is really annoying and I see this particular problem being neglected on an incredibly large scale. I know that I am a minority, but the problem is so terribly easy to fix:

  • All current browsers send an Accept-Language header. In contrast to the earlier times, nowadays, it is actually correctly preset in all the common browsers. Use that. Don’t use my IP-address.
  • Instead of reading the locale setting in my OS, ask the OS for its UI language and use that to determine which localization to load (actually, this is the recommended way of doing things according to Microsoft’s guidelines at least since Windows XP which was 2001).

Using these two simple tricks, you help a minority without hindering the majority in any way and without additional development overhead!

Actually, you’ll be getting away a lot cheaper than before. GeoIP is expensive if you want it to be accurate (and you do want that. Don’t you?), whereas there are ready-to-use libraries to determine the correct language even from the most complex Accept-Language-Header.
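To show how little is needed, here is a rough sketch of picking a language from the Accept-Language header. The function names parseAcceptLanguage and pickLanguage, the list of supported languages and the sample header are all made up for illustration. A real library will handle more edge cases than this:

```javascript
// Parse an Accept-Language header into language tags sorted by preference.
function parseAcceptLanguage(header) {
  return header
    .split(',')
    .map((part, index) => {
      const [tag, ...params] = part.trim().split(';');
      let q = 1; // a missing q parameter means full preference
      for (const param of params) {
        const [name, value] = param.trim().split('=');
        if (name === 'q') {
          const parsed = parseFloat(value);
          if (!Number.isNaN(parsed)) q = parsed;
        }
      }
      return { tag: tag.trim().toLowerCase(), q, index };
    })
    .filter((entry) => entry.tag !== '' && entry.q > 0)
    // Highest q first; on a tie, keep the order the browser sent.
    .sort((a, b) => b.q - a.q || a.index - b.index);
}

// Pick the first supported language, falling back to a default.
function pickLanguage(header, supported, fallback) {
  for (const { tag } of parseAcceptLanguage(header || '')) {
    if (tag === '*') continue; // the wildcard expresses no specific preference
    const primary = tag.split('-')[0];
    if (supported.includes(tag)) return tag;
    if (supported.includes(primary)) return primary;
  }
  return fallback;
}

// An English-first user with a Swiss German locale as second choice.
const chosen = pickLanguage('en-US,en;q=0.9,de-CH;q=0.5,de;q=0.4', ['de', 'en'], 'en');
console.log(chosen);
```

No IP address lookup anywhere: the browser already said what the user wants to read.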

Asking the OS for the UI language isn’t harder than asking it for the locale, so no overhead there either.

Please, developers, please have mercy! Stop the annoyance! Stop it now!