More iPod fun

Last time I explained how to get .OGG-feeds to your iPod.

Today I’ll show you one possible way to greatly increase the usability of the non-official (read: not bought at audible.com) audiobooks you may have lying around in MP3 format.

You see, your iPod treats every MP3 file in your library as music, regardless of length and content. This can be annoying as the iPod (rightly so) forgets the position in the file when you stop playback. So when you return to the file, you’ll have to start from the beginning and seek through it.

This is a real pain with longer audiobooks and/or radio plays, of which I have a ton.

One way is to convert your audiobooks to AAC and rename the files to .m4b, which convinces iTunes to internally tag them as audiobooks and enables the additional features (storing the position and providing a UI to change playback speed).

Of course this would have meant converting a considerable part of my MP3 library to AAC, a format that is not yet as widely supported (not to mention the quality loss I’d have to endure when converting one lossy format into another).

Then it dawned on me that there’s another way to make the iPod store the position – even with MP3 files: podcasts.

So the idea was to create a script that reads my MP3 library and outputs RSS to make iTunes think it’s dealing with a podcast.

And thus, audiobook2cast.php was born.

The script is very much tailored to my directory structure and probably won’t work at your end, but I hope it’ll provide you with something to work with.

There are two particularly interesting points about the script worth mentioning:

  • When checking a podcast feed, iTunes ignores the type attribute of the enclosure when determining whether a file can be played or not, so I had to add a fake .mp3 extension to the enclosure URLs.
  • I’m outputting a totally fake pubDate element in each <item> tag to force iTunes to sort the audiobook parts in ascending order (both tricks show up in the sketch below).
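To illustrate, here is a minimal sketch of the kind of feed audiobook2cast.php produces. The directory layout, URLs and the delivery script name are assumptions for the example, not the actual script:

<?php
// Minimal sketch of a feed as audiobook2cast.php might emit it.
// Directory layout, URLs and the delivery script name are made up.
$tracks = glob('/data/audiobooks/some-book/*.mp3');
sort($tracks);

header('Content-Type: application/rss+xml');
echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<rss version="2.0"><channel><title>Some Book</title>' . "\n";

foreach ($tracks as $i => $track) {
    // the URL ends in .mp3 so iTunes considers the enclosure playable
    $url = 'http://media.example.com/deliver.php/' . rawurlencode(basename($track));

    // completely fake, strictly increasing pubDate (one day per part)
    // so iTunes sorts the parts in ascending order
    $pubDate = date('r', mktime(12, 0, 0, 1, 1, 2000) + $i * 86400);

    printf("<item><title>%s</title><pubDate>%s</pubDate>" .
           "<enclosure url=\"%s\" length=\"%d\" type=\"audio/mpeg\"/></item>\n",
           htmlspecialchars(basename($track, '.mp3')),
           $pubDate,
           htmlspecialchars($url),
           filesize($track));
}

echo '</channel></rss>' . "\n";
?>

The real script obviously does a bit more (reading tags, handling my directory structure), but these two tricks are the core of it.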

As I said: This is probably not useful to you out-of-the-box, but it’s certainly an interesting solution to an interesting problem.

Cheating with OGG-podcasts

For about a year now, I’ve been listening to podcasts all the time. Until now, I was using my iPod nano with iTunes for my podcasting needs and I was pretty happy with that setup.

Lately though, I came across some podcasts that provide either only OGG versions or enhanced OGG versions (with stereo sound or additional content). Not wanting to start writing code just to listen to podcasts, I thought that maybe I should try out another player…

I settled on an iRiver Clix 2, which looks great, has a nice OLED display and plays OGG files.

Unfortunately though, it doesn’t play AAC files, which is the format one of the podcasts I listen to is distributed in.

So I sat down and wrote some conversion scripts that download the AAC files, convert them to OGG and alter the RSS feed to point to the converted files.

This worked perfectly, so today I rsynced two podcasts to the iRiver and went to the office, only to notice two big problems with the thing:

  1. It doesn’t keep track of which podcasts I’ve already listened to. As I’m subscribed to quite a few, it’s very hard to keep track manually.
  2. And the killer: It doesn’t store the playback position. This is really bad as podcasts are usually long (up to two hours), and while I like the iRiver’s nice ‘press-the-edge-of-the-device’ usage concept, seeking in a file is a real pain: it’s either way too slow or totally inaccurate. So while seeking on the iPod would be tolerable, it’s completely impossible to do on the iRiver.

Just when I thought that the advantage of being able to play OGGs still outweighs the two disadvantages, I began thinking that maybe, just maybe, I could do the AAC-to-OGG hack again, but in the other direction…

So now I’m “cheating” myself into better quality and bonus content without actually really using the free format.

And this is how it works (it’s basically the same thing as the scripts I linked in the forum post above, but it has some advanced features):

  • At half past midnight (though I may increase the interval), ogg_cast_download.php runs. It goes over a list of RSS feeds (I may automate this list in a later revision, as soon as I’m subscribing to more and more OGG casts), checks them for new entries (which is easy: if the file isn’t there, it must be new), downloads the enclosures (using wget for resume functionality, proper handling of redirects and meaningful output), acquires tagging information and finally converts the files to AAC format using faac.
  • Whenever iTunes checks for new podcasts, it doesn’t actually download the original feed, but calls oggcasts.php running on shion, passing along the original URL.
  • oggcasts.php checks the (symlinked) output directory of the OGG downloader and alters the feeds to match the converted files (sketched below).
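The feed-rewriting part can be sketched roughly like this. The paths, the base URL, the file-naming convention and the way the original URL is passed are all assumptions for the example, not the actual script:

<?php
// Rough sketch of the feed rewriting step. Paths, the base URL and the
// naming convention for converted files are made up for illustration.
$feedUrl = $_GET['url'];                          // original feed URL passed along by iTunes
$outDir  = '/var/oggcast/converted';              // output directory of the downloader
$baseUrl = 'http://shion.example.com/oggcast/';   // where the converted files are served

$doc = new DOMDocument();
$doc->load($feedUrl);

foreach ($doc->getElementsByTagName('enclosure') as $enclosure) {
    $orig = $enclosure->getAttribute('url');
    // assume the downloader names converted files after the original enclosure
    $name = basename(parse_url($orig, PHP_URL_PATH));
    $converted = preg_replace('/\.(ogg|oga)$/i', '.m4a', $name);

    if (is_file("$outDir/$converted")) {
        $enclosure->setAttribute('url', $baseUrl . rawurlencode($converted));
        $enclosure->setAttribute('type', 'audio/mp4');
        $enclosure->setAttribute('length', (string) filesize("$outDir/$converted"));
    }
}

header('Content-Type: application/rss+xml');
echo $doc->saveXML();
?>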

And if you think you can just install the official QuickTime OGG component to import the feeds directly: that unfortunately won’t work – iTunes refuses to download OGG feeds at all.

Updating or replacing datasets

This is maybe the most obvious trick in the world but I see people not doing it all over the place, so I guess it’s time to write about it.

Let’s say you have a certain set of data you need to enter into your RDBMS. Let’s further assume that you don’t know whether the data is already there or not, so you don’t know whether to use INSERT or UPDATE.

Some databases provide us with something like REPLACE or “INSERT OR REPLACE”, but others do not. Now the question is, how to do this efficiently?

What I always see is something like this (pseudo-code):

  1. select count(*) from xxx where primary_key = xxx
  2. if (count > 0) update; else insert;

This means that for every dataset you will have to do two queries. This can be reduced to only one query in some cases by using this little trick:

  1. update xxx set yyy where primary_key = xxx
  2. if (affected_rows(query) == 0) insert;

This method just goes ahead with the update, assuming the data is already there (which usually is the right assumption anyway). Then it checks whether a row was actually updated and, if not, inserts the data set.
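Here is a small sketch of the trick in PHP with PDO and MySQL; the table and column names are made up. One caveat: by default MySQL reports the number of changed rows rather than matched rows, so an update that matches a row but changes nothing would wrongly trigger the insert – hence the PDO::MYSQL_ATTR_FOUND_ROWS option.

<?php
// Sketch of the update-then-insert trick using PDO/MySQL.
// Table and column names are made up for illustration.
// MYSQL_ATTR_FOUND_ROWS makes rowCount() report matched rows instead of
// changed rows, so an unchanged-but-existing row doesn't trigger the insert.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'secret', array(
    PDO::MYSQL_ATTR_FOUND_ROWS => true,
));

function save(PDO $pdo, $id, $name)
{
    $update = $pdo->prepare('UPDATE items SET name = :name WHERE id = :id');
    $update->execute(array('name' => $name, 'id' => $id));

    if ($update->rowCount() == 0) {
        // nothing matched: the row doesn't exist yet, so insert it
        $insert = $pdo->prepare('INSERT INTO items (id, name) VALUES (:id, :name)');
        $insert->execute(array('id' => $id, 'name' => $name));
    }
}

save($pdo, 42, 'example');
?>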

This means that in cases where the data is already there in the database, you can reduce the work on the database to one single query.

Additionally, doing a SELECT and then an UPDATE essentially does the select twice, as the update will cause the database to select the rows to update anyway. Depending on your optimizer and/or query cache, this can be optimized away of course, but there are no guarantees.

Careful when clean-installing TabletPCs

At work, I got my hands on an LS-800 TabletPC by Motion Computing, and after spending a lot of time with it (and as I’m very interested in TabletPCs anyway), I finally got myself its bigger brother, the LE-1700.

The device is a joy to work with: Relatively small and light, one big display and generally nice to handle.

The tablet came with Windows XP preinstalled and naturally, I wanted to have a look at the new Tablet-centric features in Vista, so I went ahead and upgraded.

Or better: Clean-installed.

The initial XP installation was German and I was installing an English copy of Vista, which makes a clean installation mandatory.

The LE-1700 is one of the few devices without official Vista support, but I guess that’s because of the missing software for the integrated UMTS modem – for all the other devices, drivers either come prebundled with Vista, are available on Windows Update, or you can use the XP drivers provided on the Motion Computing support site.

After the clean installation, I noticed that the calibration of the pen was a bit off – depending on the position on the screen, the tablet registered the pen up to 5 mm to the left of or above its actual position. Unfortunately, using the calibration utility in the control panel didn’t seem to help much.

After some googling, I found out what’s going on:

The end-user accessible calibration tool only calibrates the screen for the tilt of the pen relative to the current position. The calibration of the pen’s position is done by the device manufacturer, and there is no tool available for end-users to redo it.

Which, by the way, is understandable considering how the miscalibration showed itself: in the middle of the screen it was perfect, and towards the edges it got worse and worse. This means that a tool would have to present quite a lot of points for you to hit to actually get an accurate calibration.

Of course, this was a problem for me – especially when I tried out Journal and noticed that the error was bad enough to take all the fun out of handwriting (imagine writing on paper and the text appearing 0.5 cm to the left of where you put the pen).

I needed to get the calibration data and I needed to put it back after the clean installation.

It turns out that the linear calibration data is stored in the registry under HKLM\SYSTEM\CurrentControlSet\Control\TabletPC\LinearityData in the form of a (large) binary blob.

Unfortunately, Motion does not provide a tool or even a .reg file to quickly re-add the data should you clean-install your device, so I had to do the unthinkable (I probably could have called support, but my method had the side effect of not making me wait forever for a fix):

I restored the device to the factory state (by using the preinstalled Acronis True Image residing on a hidden partition), exported the registry settings, reinstalled Vista (at which time the calibration error resurfaced), imported the .reg-File and rebooted.

This solved the problem – the calibration was as smooth as ever.

Now, I’m not sure if the calibration data is valid for the whole series or even defined per device, but here is my calibration data in case you have the same problem as I had.

If the settings are per device or you have a device other than the LE-1700, I strongly advise you to export that registry key before clean-installing.

Obviously I would have loved to know this beforehand, but… oh well.

Gmail – The review

It has been quite a while since I began routing my mail to Gmail with the intention of checking that often-praised mail service out thoroughly.

The idea was to find out if it’s true what everyone keeps saying: That gmail has a great user interface, that it provides all the features one needs and that it’s a plain pleasure to work with it.

Personally, I’m blown away.

Despite the obviously longer load time before you can access the mailbox (Mac Mail launches faster than Gmail loads here – even on a 10 MBit/s connection), the Gmail interface is much faster to use – especially with the nice keyboard shortcuts – but I’m getting ahead of myself.

When I began to use the interface for some real email work, I immediately noticed the paradigm shift: there are no folders, and – the really new thing for me – you are encouraged to move messages out of the inbox as you take note of them and/or complete the tasks associated with them.

When you archive a message, it moves out of the inbox and is – unless you tag it with a label for quick retrieval – only accessible via the (quick) full text search engine built into the application.

The searching part of this usage philosophy is familiar to me. When I was using desktop clients, I usually let incoming email pile up in my inbox until it contained somewhere around 1500 messages or so. Then I grabbed all the messages and moved them to my “Old Mail” folder, where I accessed them strictly via the search functionality built into the mail client (or the server, in case of a good IMAP client).

What’s new for me is the notion of moving mail out of your inbox as you stop being interested in the message – either because you plain read it or because the associated task is completed.

This gives you a quick overview of the tasks still pending and it keeps your inbox nice and clean.

If you want quick access to certain messages, you can tag them with any label you want (multiple labels per message are possible of course) in which case you can access the messages with one click, saving you the searching.

Also, it’s possible to define filters that automatically apply labels to messages and, if you want, move them out of the inbox right away – a perfect setup for the SVN commit messages I’m getting, allowing me to quickly access them at the end of the day and look over the commits.

But the real killer feature of gmail is the keyboard interface.

Gmail is almost completely usable without requiring you to move your hands off the keyboard. Additionally, you don’t even need to press modifier keys, as the interface is very much aware of state and mode, so it’s completely usable with some very intuitive shortcuts that all work by pressing a single letter key.

So usually, my workflow is like this: Open gmail, press o to open the new message, read it, press y to archive it, close the browser (or press j to move to the next message and press o again to open it).

This is as fast as using, say, mutt on the console, but with the benefit of staying usable even when you don’t know which key to press (in that case, you just take the mouse).

Gmail is perfectly integrated with Google Calendar, and it’s – contrary to Mac Mail – even able to detect Outlook meeting invitations (and send back correct responses).

Additionally, there’s a MIDP applet available for your mobile phone that’s incredibly fast and does a perfect job of giving you access to all your email messages when you are on the road. As it’s a Java application, it runs on pretty much every conceivable mobile phone and because it’s a local application, it’s fast as hell and can continue to provide the nice, keyboard shortcut driven interface which we are used to from the AJAXy web application.

Overall, the experiment of switching to Gmail proved to be a real success and I will not switch back anytime soon (all my mail is still archived in our Exchange IMAP box). The only downside I’ve seen so far is that if you use different email aliases with your Gmail account, Gmail will set the Sender: header to your Gmail address (which is a perfectly valid – and even mandated – thing to do), and the stupid Outlook on the receiving end will display the email as being sent from your Gmail address “on behalf of” your real address, exposing your Gmail address to the recipient. Meh. So for sending non-private email, I’m still forced to use Mac Mail – unfortunately.

PHP, stream filters, bzip2.compress

Maybe you remember that, more than a year ago, I had an interesting problem with stream filters.

The general idea is that I want to output bz2-compressed data to the client as the output is being assembled – or, more to the point: The PopScan Windows-Client supports the transmission of bzip2 encoded data which gets really interesting as the amount of data to be transferred increases.

Even more so: The transmitted data is in XML format which is very easily compressed – especially with bzip2.

Once you begin to transmit multiple megabytes of uncompressed XML-data, you begin to see the sense in jumping through a hoop or two to decrease the time needed to transmit the data.

On the receiving end, I have an elaborate construct capable of downloading, decompressing, parsing and storing data as it arrives over the network.

On the sending end though, I have been less lucky: because of that problem I had, I was unable to stream out bzip2-compressed data as it was generated – the end of the file was sometimes missing. This is why I’ve been using ob_start() to gather all the output and then compress it with bzcompress() before sending it out.
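Roughly, that workaround looks like this (a sketch; the generator function is hypothetical):

<?php
// The old, non-streaming workaround: buffer the whole document, compress it
// once at the end, then send it. The client sees nothing until everything
// has been generated and compressed.
ob_start();
emit_huge_xml_document();   // hypothetical: writes the complete XML to the output buffer
$compressed = bzcompress(ob_get_clean());

header('Content-Type: application/x-bzip2');
header('Content-Length: ' . strlen($compressed));
echo $compressed;
?>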

Of course this means that all the data must be assembled before it can be compressed and then sent to the client.

As we have more and more data to transmit, the client must wait longer and longer before the data begins to reach it.

And then comes the moment when the client times out.

So I finally really had to fix the problem. I could not believe that I was unable to compress and stream out data on the fly.

It turns out that I finally found the smallest possible amount of code to illustrate the problem in a non-hacky way:

So: This fails under PHP up until 5.2.3:

<?php
$str = "BEGIN (%d)\n
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip
ex ea commodo consequat. Duis aute irure dolor in reprehenderit in
voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt
mollit anim id est laborum.
\nEND (%d)\n";

// write to the file given on the command line, compressing on the fly
$h = fopen($_SERVER['argv'][1], 'w');
$f = stream_filter_append($h, "bzip2.compress", STREAM_FILTER_WRITE);
for ($x = 0; $x < 10000; $x++) {
    fprintf($h, $str, $x, $x);
}
fclose($h);
echo "Written\n";
?>

Even worse though: it doesn’t fail with a message, but silently writes out a corrupt bzip2 file.

And it gets worse: with a small amount of data it works, but as the amount of data increases, it begins to fail – at different places depending on how you shuffle the data around.

The above script writes a bzip2 file which – when uncompressed – ends around iteration 9600.

So now that I had a small reproducible testcase, I could report a bug in PHP: Bug 47117.

After spending so many hours on a problem which in the end boiled down to a bug in PHP (I’ve looked everywhere, believe me. I also tried workarounds, but all to no avail), I just could not let the story end there.

Some investigation quickly turned up a wrong check for a return value in bz2_filter.c which I was able to patch up very, very quickly, so if you visit that bug above, you will find a patch correcting the problem.

Then, once I had patched PHP itself, hacking up the PHP code needed to let the thing stream out the compressed data as it is generated was easy. If you want, you can have a look at bzcomp.phps, which demonstrates how to plug the compression into either the output buffer handling or something quicker and dirtier.
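The core of it can be sketched like this (assuming a patched PHP; the chunk generator is hypothetical):

<?php
// Stream bzip2-compressed output to the client as it is generated, by
// attaching the (now fixed) bzip2.compress filter to php://output.
header('Content-Type: application/x-bzip2');

$out = fopen('php://output', 'w');
stream_filter_append($out, 'bzip2.compress', STREAM_FILTER_WRITE);

while (($chunk = next_xml_chunk()) !== false) {   // hypothetical chunk generator
    fwrite($out, $chunk);                         // compressed and sent as we go
    flush();
}
fclose($out);
?>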

Oh, and if you are tempted to do this:

function ob($buf){
        return bzcompress($buf);
}

ob_start('ob');

… it won’t do any good because you will still gobble up all the data before compressing. And this:

function ob($buf){
        return bzcompress($buf);
}

ob_start('ob', 32768);

will encode in chunks (good), but it will write a bzip2-end-of-stream marker after every chunk (bad), so neither will work.

Nothing more satisfying than to fix a bug in someone else’s code. Now let’s hope this gets applied to PHP itself so I don’t have to manually patch my installations.

Trying out Gmail

Everyone and their friends seems to be using Gmail lately and I agree: The application has a clean interface, a very powerful search feature and is easily accessible from anywhere.

I have my Gmail address from back in the days when invites were scarce and the term AJAX hadn’t even been coined yet, but I never got around to really taking advantage of the service as I just don’t see myself checking various email accounts in various places – at least not for serious business.

But now I found a way to put gmail to the test as my main email application – at least for a week or two.

My main mail storage is and will remain our Exchange server. I have multiple reasons for that:

  1. I have all my email I ever sent or received in that IMAP account. That’s WAY more than the 2.8 GB you get in Gmail and even if I had enough space there, I would not want to upload all my messages there.
  2. I don’t trust Gmail to be as diligent with the messages I store there as I would want it to be. I managed to keep every single email message from 1998 until now and I’d hate to lose all that to a “glitch in the system”.
  3. I need IMAP access to my messages for various purposes.
  4. I need strong server-side filtering to remove messages I’m more or less only receiving for logging purposes. I don’t want to see these – not until I need them. Usually there’s no reason to even have them around.

So for now I have added yet another filter to my collection of server-side filters: This time I’m redirecting a copy of all mail that didn’t get filtered away due to various reasons to my Gmail address. This way I get to keep all mail of my various aliases all at the central location where they always were and I can still use Gmail to access the newly arrived messages.

Which leaves the problem with the sent messages which I ALSO want to archive at my own location – at least the important ones.

I fixed this by BCCing all Mail I’m writing in gmail to a new alias I created. Mail to that alias with my Gmail address as sender will be filtered into my sent-box by Exchange so it’ll look as though I sent the message via Thunderbird and then uploaded the copy via IMAP.

I’m happy with this solution, so testing Gmail can begin.

I’m asking myself: Is a tag-based storage system better than a purely search-based one (the mail I don’t filter away is kept in one big INBOX which I access purely via search queries when I need something)? Is a web-based application as powerful as a mail client like Thunderbird or Apple Mail? Do I unconsciously use features I’m going to miss when using Gmail instead of Apple Mail or Thunderbird? Will I be able to get used to Gmail’s very quick keyboard interface?

Interesting questions I intend to answer.

Mail filtering belongs on the server

Various people who got their iPhones are complaining about SPAM reaching their inbox and want junk mail controls on their new gadget, failing to realize the big problem with that approach:

Even if the iPhone is updated with a SPAM filter, the messages will get transmitted and filtered there, which means that you pay for receiving the junk just to throw it away afterwards.

Additionally, Bayes filters still seem to be the way to go for junk mail filtering. The Bayes rules can get pretty large, so this means that you either have to retrain your phone or the seed data must be synchronized to the phone, which takes both a lot of time and space better used for something else.

No. SPAM filtering is a task for the mail server.

I’m using SpamAssassin and DSPAM to check the incoming mail for junk and then I’m using the server side filtering capabilities of our Exchange server to filter mail recognized as SPAM into the “Junk E-Mail” box.

If the filters are simple enough (checking header values and moving messages into folders), the server can process them even though they are defined in Outlook, regardless of which client connects to it to fetch the mail (Apple Mail, Thunderbird and the IMAP client on my W880i in my case). This means that all my junk is sorted away into the “Junk E-Mail” folder the moment it arrives. It never reaches the INBOX and I never see it.

I don’t have an iPhone and I don’t want to have one (I depend on bluetooth modem functionality and a real keypad), but the same thing applies to any mobile emailing solution. You don’t want SPAM on your Blackberry and especially not on your even simpler non-smartphone.

Speaking of transferring data: the other thing I really don’t like about the iPhone is the browser. Sure: it’s standards compliant, it renders nicely, it supports AJAX and small-screen rendering, but it transmits the websites uncompressed.

Let me give an example: the digg.com frontpage in Opera Mini causes 10 KB of data to be transferred. It looks perfectly fine on my SonyEricsson W880 and works as such (minus some JavaScript functionality). digg.com accessed via Firefox causes 319 KB to be transmitted.

One MB costs CHF 7 here (though you can have some inclusive MBs depending on your contract), which is around EUR 4.50, so for that money I could view digg.com three times with the iPhone or 100 times with Opera Mini. The end-user experience is largely the same on both platforms – at least close enough not to warrant the 33 times more expensive access via a browser that works without a special proxy.

As long as GPRS data traffic is prohibitively expensive, junk mail filtering on the server and a prerendering-proxy based browser are a must. Even more so than the other stuff missing in the iPhone.

Upscaling video

I have an awesome Full-HD projector and a lot of non-HD video material, ranging from DVD rips to speedruns of older consoles, and I’m using a Mac Mini running Windows (first Vista RC2, then XP and now Vista again) connected to said projector to access the material.

The question was: how do I get the best picture quality out of this setup?

The answer boils down to the question of what device should do the scaling of the picture:

Without any configuration work, the video is scaled by your graphics card, which usually does quite a bad job of it unless it provides some special upscaling support, which the Intel chip in my Mac Mini apparently doesn’t.

Then you could let the projector do the scaling, which would require the MCE application to change the screen resolution to the resolution of the file being played. It would also mean that the projector has to support the different resolutions the files are stored in, which is hardly the case as there are some very strange resolutions here and there (think of the Game Boy’s native 160×144 resolution).

The last option is to let your CPU do the scaling – at least to some degree.

This is a very interesting option, especially as my Mac Mini comes with one of these nice dual core CPUs we can try and leverage for this task. Then, there are a lot of algorithms out there that are made exactly for the purpose of scaling video, some of which are very expensive to implement in specialized hardware like GPUs or the firmware of a projector.

So I looked around and finally found this post outlining the steps needed to configure ffdshow to do its thing.

I used the basic settings and modified them just a bit to keep the original aspect ratio of the source material and to only resize up to a resolution of 1280×720. If the source is larger than this, there’s no need to shrink the video just to have the graphics chip upscale it again to the projector’s native 1920×1080 resolution (*sigh*).

Also, I didn’t want ffdshow to upscale 1280×720 to the full 1920×1080. At first I tried that, but I failed to see a difference in picture quality and I had the odd frame drop here and there, so I’m clearly running at the limits of my current setup.

Finally, I compared the picture quality of a Columbo (non-referral link to Amazon – the package arrived last week) DVD rip with and without the resizing enabled.

The difference in quality is immense. The software-enhanced picture looks nearly like a real 720p movie – sure, some details are washed out, but the overall quality is worlds better than what I got with plain ffdshow and no scaling.

Sure. The CPU usage is quite a bit higher than before, but that’s what the CPUs are for – to be used.

I highly recommend taking the 10 minutes needed to set up the ffdshow video decoder to do the scaling. Sure: the UI is awful and I didn’t completely understand many of the settings, but the increased quality more than made up for the work it took to configure the thing.

Heck! Even the 240×160 pixel sized Pokémon Sapphire run looked much better after going through ffdshow with software scaling enabled.

Highly recommended!

By the way: this only works in MCE for video files, as MCE refuses to use ffdshow for the MPEG2 decoding needed for DVD or TV playback. But 100% of the video I watch is video files anyway, so this doesn’t bother me at all.

*sigh*

 % php -a
Interactive shell

php > if (0 == null) echo "*sigh*\n";
*sigh*
php > quit

That bit me today. Even after so many years. I should really get used to using ===.
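For the record, the strict comparison behaves as you’d expect:

php > var_dump(0 == null);
bool(true)
php > var_dump(0 === null);
bool(false)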