XmlTextReader, UTF-8, Memory Corruption

XmlTextReader on the .NET CF doesn’t support anything but UTF-8 which can be a good thing as it can be a bad thing.

Good thing because UTF-8 is a very flexible character encoding giving access to the whole Unicode character range while still being compact and easy to handle.

Bad thing because PopScan doesn’t do UTF-8. It was just never needed as its primary market is countries well within the range of ISO-8859-1. This means that the protocol between server and client so far was XML encoded in ISO-8859-1.

To be able to speak with the Windows Mobile application, the server had to convert the data to UTF-8.

And this is where a small bug occurred: Part of the data wasn’t properly encoded and was transmitted as ISO-8859-1.

The correct thing a XML-Parser should do about obviously incorrect data is to bail out, which also is what the .NET CF DOM parser did.

XmlTextReader did something else though: It threw an uncatchable IndexOutOfRange exception either in Read() or ReadString(). And sometimes it miraculously changed its internal state – jumping from element to element even when just using ReadString().

To make things even worse, the exception happened at a location not even close to where the invalid character was in the stream.

In short, from what I have seen (undocumented and uncatchable exceptions being thrown at random places), it feels like the specific invalid character that was parsed in my particular situation caused memory corruption somewhere inside the parser.

Try to imagine how frustrating it was to find and fix this bug – it felt like the old days of manual memory allocation combined with stack corruption. And all because of one single bad byte in a stream of thousands of bytes.

The price of automatisms

Visual Studio 2005 and the .NET Framework 2.0 brought us the concept of table adapters and a nice visual designer for databases allowing you to quickly “write” (point and click) your data access layer.

Even when using the third party SQLite library, you can make use of this facility and it’s true: Doing basic stuff works awfully well and quickly.

The problems start when what you intend to do is more complex. Then the tool becomes braindead.

The worst thing about it is that it’s tailor-made for SQL-Server and that it insists on parsing your queries instead of letting the database or even the database driver do that.

If you add any feature to your query that is not supported by SQL-Server (keep in mind that I’m NOT working with SQL-Server – I don’t even have a SQL-Server installed), the tool will complain about not being able to parse the query.

The dialog provides an option to ignore the error but it doesn’t work like I would have hoped it should: “Ignore” doesn’t mean: “Keep the old configuration”. It means “work as if there wasn’t any query at all”.

This means that even when you want to do something simple as write “insert or replace” instead of “insert” (saves one query per batch item and I’m doing lots of batch items) or just add a limit clause “limit 20” will make the whole database designer unusable for you.

The ironic thing about the limit clause is that the designer certainly accepts “select top xxx from…” which will fail at run time due to SQLite not supporting that proprietary extension.

So in the end it’s back to doing it manually.

But wait a minute: Doing it manually is even harder that it should be because the help, tutorials, books and even google all only talk about the automatic way, either unaware or not caring that it just won’t work if you want to do more than example code.

SQLite, Windows Mobile 2005, Performance

As you know from previous posts, I’m working with SQLite on mobile devices which lately means Windows Mobile 2005 (there was a Linux device before that tough, but it was hit by the RoHS regulation of the European union).

In previous experiments with the older generation of devices (Windows CE 4.x / PocketPC 2003), I was surprised by the high performance SQLite is able to achieve, even in complex queries. But this time, something felt strange: Searching for a string in a table was very, very slow.

The problem is that CE5 (and with it Windows Mobile 2005) uses non-volatile flash for storage. This has the tremendous advantage that the devices don’t lose their data when the battery runs out.

But compared to DRAM, Flash is slow. Very slow. Totally slow.

SQLite doesn’t load the complete database into RAM, but only loads small chunks of the data. This in turn means that when you have to do a sequential table scan (which you have to do when you have a LIKE ‘%term%’ condition), you are more or less dependant on the speed of the storage device.

This what caused SQLite to be slow when searching. It also caused synchronizing data to be slow because SQLite writes data out into checkpoint files during transactions.

The fix was to trade off launch speed (the application is nearly never started fresh) for operating speed by loading the data into an in-memory table and using that for all operations.

attach ":memory:" as mem;

create table mem.prod as select * from prod;

Later on, the trick was to just refer to mem.prod instead of just prod.

Of course you’ll have to take extra precaution when you store the data back to the file, but as SQLite even supports transactions, most of the time, you get away with

begin work;

delete from prod;

insert into prod (select * from mem.prod);

commit;

So even if something goes wrong, you still have the state of the data of the time when it was loaded (which is perfectly fine for my usage scenario).

So in conclusion some hints about SQLite on a Windows Mobile 2005 device:

  • It works like a charm
  • It’s very fast if it can use indexes
  • It’s terribly slow if it has to scan a table
  • You can fix that limitation by loading the data into memory (you can even to it on a per-table basis)