solution- gnegg

VMWare Server, chrony and slow clocks

We have quite many virtual machines running under VMWare server. Some for testing purposes, some for quite real systems serving real webpages.

It’s wonderful. Need a new server? Just cp -r the template I created. Need more RAM in your server? No problem. Just add it via the virtual machine configuration file. Move to another machine? No problem at all. Power down the virtual machine and move the file where you want it to be.

Today I noticed something strange: The clocks on the virtual machines were way slow.

One virtual second was about ten real seconds.

This was so slow that chrony which I used on the virtual machines thought that the data sent from the time servers was incorrect, so chrony was of no use.

After a bit of digging around, I learned that VMware server needs access to /dev/rtc to provide the virtual machines with an usable time signal (usable as in “not too slow”).

The host’s /var/log/messages was full of lines like this (you’ll notice that I found yet another girl from a console RPG to name that host):

Dec 15 16:12:58 rikku /dev/vmmon[6307]: /dev/rtc open failed: -16
Dec 15 16:13:08 rikku /dev/vmmon[6307]: host clock rate change request 501 -> 500

-16 means “device busy”

The fix was to stop chrony from running on the host machine so VMWare could open /dev/rtc. This made the error messages vanish and additionally it allowed the clocks of the virtual machines to work correctly.

Problem solved. Maybe it’s useful for you too.

DVD ripping, second edition

HandBrake is a tool with the worst website possible: The screenshot that’s presented on the index page leaves behind a completely wrong image of the application.

When you just look at the screenshot, you will get the impression that the tool is fairly limited and totally optimized for creating movies for handheld devices.

That’s not true though. The screenshot is the screenshot of a light edition of the tool. The real thing is actually quite capable and only lacks the capability to store subtitles in the container format.

And it doesn’t know about Matroska.

And it refuses to store x264 encoded video in the OGM container.

Another tool I found after my first very bad experience with ripping DVDs last time is OGMrip. The tool is a frontend for mencoder (of mplayer fame) and has all the features you’d ever want from a ripping tool, while still being easy to use.

It even provides a command line interface, allowing to process your movies from the console.

It has one inherent flaw though: It’s single threaded.

HandBrake on the other hand, can split the encoding work (yes. the actual encoding) over multiple threads and thus can profit a lot of SMP machines.

Here’s what I found in matters of encoding speed. I encoded the same video (from a DVD ISO image) with the same settings (x264, 1079kbit/s, 112kbit mp3 audio, 640×480 resolution at 30fps) on different machines:

1.4Ghz, G4 Mac mini, running Gentoo Linux with OGMrip: 3fps
Thinkpad T43, running Ubuntu Edgy Eft, 1.6Ghz Centrino, OGMRip: 8fps
MacBook Pro, 2Ghz Core Duo, HandBrake: 22fps (both cores at 100%)
Mac Pro, Dual Dual Core 2.66Ghz, HandBrake: 110fps(!!), 80% total cpu usage (hdd io seems to limit the process)

This means that encoding the whole 47 minutes A-Team episode takes:

OGMRip on Mac mini G4: 7.8 hours
OGMRip on Thinkpad: 2.35 hours per episode
HandBrake on MacBook Pro: 1.6 hours per episode
HandBrake on MacPro: 0.2 hours (12 minutes) per episode

Needless to say what method I’m using. Screw subtitles and Matroska – I want to finish ripping my collection this century!

On an additional closing note, I’d like to add that even after 3 hours of encoding video, the MacPro stayed very, very quiet. The only thing I could hear was the hard drive – the fans either didn’t run or were quieter than the harddrive (which is quiet too)

Intel Mac Mini, Linux, Ethernet

If you have one of these new Intel Macs, you will sooner or later find yourself in the situation of having to run Linux on one of them. (Ok. Granted: The situation may be coming sooner for some than for others).

Last weekend, I was in that situation: I had to install Linux on an Intel Mac Mini.

The whole thing is quite easy to do and if you don’t need Mac OS X, you can just go ahead and install Linux like you would on any other x86 machine (provided the hardware is sufficiently new to have the BIOS emulation layer already installed – otherwise you have to install the Firmware Update first – you’ll notice by the mac not booting from the CD despite holding c during the initial boot sequence).

You can partition the disk to your liking – the Mac bootloader will notice that there’s something fishy with the parition layout (the question-mark-on-a-folder icon will blink one or two times) before passing control to the BIOS emulation which will be able to boot Linux from the partitions you created during installation.

Don’t use grub as bootloader though.

I don’t know if it’s something grub does to the BIOS or if it’s something about the partition table, but grub can’t launch stage 1.5 and thus is unable to boot your installation.

lilo works fine though (use plain lilo when using the BIOS emulation for the boot process, not elilo)

When you are done with the installation process, something bad will happen sooner or later though: Ethernet will stop working.

This is what syslog has to say about it:

NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 60 .. 37 report=60 done=60
sky2 hardware hung? flushing

When I pulled the cable and plugged it in again, the kernel even oops’ed.

The macs have a Marvel Yukon ethernet chipset. This is what lspci has to tell us: 01:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 22). The driver to use in the kernel config is “SysKonnect Yukon2 support (EXPERIMENTAL)” (CONFIG_SKY2)

I guess the EXPERIMENTAL tag is warranted for once.

The good news is, that this problem is fixable. The bad news is: It’s tricky to do.

Basically, you have to update the driver with the version that is in the repository of what’s going to be kernel 2.6.19

Getting a current version of sky.c and sky.h is not that difficult. Unfortunately though, the new driver won’t compile with the current 2.6.18 kernel (and upgrading to a pre-rc is out of the question – even more so considering the ton of stuff going into 2.6.19).

So first, we have to patch in this changeset to make the current release of sky compile.

Put the patch to /usr/src/linux and patch with patch -p1

Then fetch the current revision of sky2.c and sky2.h and overwrite the existing files. I used the web interface to git for that as I have no idea how the command line tools work.

Recompile the thing and reboot.

For me, this fixed the problem with the sky2 driver: The machine in question is now running for a whole week without any networking lockups – despite heavy network load at times.

While happy to see this fixed, my statement about not buying too new hardware (posting number 6 here on gnegg.ch – ages ago) if you intend to use Linux on it seems to continue to apply.

XmlTextReader, UTF-8, Memory Corruption

XmlTextReader on the .NET CF doesn’t support anything but UTF-8 which can be a good thing as it can be a bad thing.

Good thing because UTF-8 is a very flexible character encoding giving access to the whole Unicode character range while still being compact and easy to handle.

Bad thing because PopScan doesn’t do UTF-8. It was just never needed as its primary market is countries well within the range of ISO-8859-1. This means that the protocol between server and client so far was XML encoded in ISO-8859-1.

To be able to speak with the Windows Mobile application, the server had to convert the data to UTF-8.

And this is where a small bug occurred: Part of the data wasn’t properly encoded and was transmitted as ISO-8859-1.

The correct thing a XML-Parser should do about obviously incorrect data is to bail out, which also is what the .NET CF DOM parser did.

XmlTextReader did something else though: It threw an uncatchable IndexOutOfRange exception either in Read() or ReadString(). And sometimes it miraculously changed its internal state – jumping from element to element even when just using ReadString().

To make things even worse, the exception happened at a location not even close to where the invalid character was in the stream.

In short, from what I have seen (undocumented and uncatchable exceptions being thrown at random places), it feels like the specific invalid character that was parsed in my particular situation caused memory corruption somewhere inside the parser.

Try to imagine how frustrating it was to find and fix this bug – it felt like the old days of manual memory allocation combined with stack corruption. And all because of one single bad byte in a stream of thousands of bytes.