SPAM insanity

I don’t see much point in complaining about SPAM, but it’s slowly but surely reaching complete insanity…

What you see here is the recent history view of my DSPAM – our second line of defense against SPAM.

Red means SPAM (the latest of those messages was a quite clever phishing attempt which I had to reclassify manually).

To give this some perspective: the last genuine email I received was this morning at 7:54 (it’s now 10 hours later), and even that was just an automatically generated mail from Skype.

To put it into even more perspective: my DSPAM reports that since December 22nd, I got 897 SPAM messages and – brace yourself – 170 non-spam messages, of which 100 were Subversion commit emails and 60 were sent by automated cron jobs. That’s a spam share of around 84% – and only about 10 of the genuine messages were actually written by a human.

What I’m asking myself now is: do these spammers still get anything out of their work? The signal-to-noise ratio has gone down the drain to a point where I can’t imagine anyone still reading through all this spam, let alone being gullible enough to fall for it.

How bad does it have to get before it gets better?

Oh and don’t think that DSPAM is all I’m doing… No… these 897 mails were the ones that made it past both the ix DNSBL and SpamAssassin.

Oh and: Kudos to the DSPAM team. A recognition rate of 99.957% is really, really good.

Windows Installer – Worked around

I’ve talked about Windows Installer (the tool that parses those .MSI files) before, and I’ve never really been convinced that this technology does its job. Just have a look at these previous articles: Why o why is my hard-drive so small?, A look at Windows Installer and The myth of XCOPY deployment.

Yesterday I had a look at the Delphi 2007 installation process and it dawned on me that I’m going to have to write yet another blog entry.

It’s my gut feeling that 80% of all bigger software packages on Windows can’t live with MSI’s default feature set and have to work around inherent flaws in the design of that tool. Here’s what I found installers doing (in increasing order of stupidity):

  1. Use a .EXE stub to install the MSI engine. These days this really doesn’t make sense any more, as 99% of all Windows installations already have MSI installed, and the ones that don’t you don’t want to support anyways (Windows Update requires MSI).
  2. Use a .EXE stub that checks for the availability of a bunch of prerequisites and then installs the missing ones – sometimes even other MSI packages (a sketch of such a stub follows after this list). This isn’t because MSI files are unable to detect the presence of prerequisites – it’s because an MSI file cannot install another MSI file, and the workaround (merge modules) doesn’t work because most of the third-party libraries to install don’t come as merge modules.
  3. Create an MSI file which contains a traditional .EXE setup, unpack that to a temporary location and run it. This is what I call the “I want a Windows logo, but have no clue how to author MSI files” type of installation (and I completely understand the motivation behind it), which defeats every purpose MSI files ever had. Still: due to inherent limitations of the MSI engine, this is often the only way to go.
  4. Create MSI files that extract a vendor-specific DLL, a setup script and all files to deploy (or even just an archive) and then use that vendor-specific DLL to run the install script. This is what InstallShield does at least some of the time. It’s another version of the “I have no clue how to author an MSI file” installation, with the additional “benefit” of being totally vendor-locked.
  5. Create a custom installer that installs all files and registry keys itself and then launches Windows Installer with a temporary .MSI file to register the installation work with the MSI engine. This is what Delphi 2007 does. I feel this is another workaround for Microsoft’s policy that only MSI-driven software can get a Windows logo, but this time it’s vendor-locked and totally unnecessary, and I’m not even sure such behavior is consistent with any kind of specification.
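
To illustrate: here’s roughly what the stub from point 2 boils down to, as a minimal C# sketch. The registry key and the package names are made up for illustration; a real bootstrapper does essentially this, just with more error handling and UI:

    using System;
    using System.Diagnostics;
    using Microsoft.Win32;

    class BootstrapStub
    {
        static void Main()
        {
            // Check for a prerequisite (key and file names are hypothetical).
            // An MSI package could detect this itself - it just couldn't
            // install the missing package, hence the stub.
            object version = Registry.GetValue(
                @"HKEY_LOCAL_MACHINE\SOFTWARE\SomeVendor\SomeLib",
                "Version", null);

            if (version == null)
                RunMsi("somelib.msi"); // prerequisite first...

            RunMsi("product.msi");     // ...then the actual product
        }

        static void RunMsi(string package)
        {
            // msiexec /i installs a package, /qb shows a basic progress UI
            using (Process p = Process.Start("msiexec.exe",
                "/i " + package + " /qb"))
            {
                p.WaitForExit();
                if (p.ExitCode != 0)
                    Environment.Exit(p.ExitCode);
            }
        }
    }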

Only a small minority of installations really use pure MSI, and those are usually small software packages – and as my previous articles show, even then the technology is far from fool-proof. While I agree that Windows should provide a generalized means of driving software installations, MSI can’t be that solution, as evidenced by the majority of packages working around the inherent flaws of the technology.

*sigh*

Button placement

Besides the fact that this message is lying to me (the device in question certainly is a Windows Mobile device, and there can’t be any cradle problem because it’s an emulated image ActiveSync is trying to connect to), I have one question: what exactly do the OK and Cancel buttons do?

And this newly created dialog ships with ActiveSync 4.2 – way after the MS guys are said to have seen the light and begun optimizing for usability.

Oh and I could list some other “fishy” things about this dialog:

  • It has no indication of what the real problem is (a soft reset of the emulator image helped, by the way).
  • It has way too much text on it.
  • Trying to format a list using * and improper indentation looks very unprofessional. Judging from the bottom part of the dialog where the buttons are, this is no plain MessageBox anyways, so fixing that would have been doable.
  • The spacing between the buttons is not exactly consistent with the Windows standard.

Dialogs like these are precisely why I doubt that Windows Mobile really is the right OS to run on a barcode scanner – at least one that will be handed to end users with no clue about PCs. It’s a good thing the scanners finally have GPRS built in.

Debugging PocketPCs

Currently I’m working with Windows Mobile based barcode scanning devices. With .NET 2.0, developing real-world applications for these mobile devices has become a viable option.

.NET 2.0 combines sufficient runtime speed (though you often have to test for performance regressions) with a very powerful development library (really usable – compared to .NET 1.0 on smart devices) and unbeatable development time.

All in all, I’m quite happy with this.

There’s one problem though: The debugger.

When debugging, I have two alternatives and both suck:

  1. Use the debugger to connect to the real hardware. This is actually quite fast and works flawlessly, but whenever I need to forcibly terminate the application (for example when an exception happens or when I press the Stop button in the debugger), the hardware crashes somewhere in the driver for the barcode scanner.

    Parts of the application stay in memory and are completely unkillable. The screen freezes.

    To get out of this, I have to soft-reset the machine and wait half a century for it to boot up again.

  2. Use the emulator. This has the advantage of not crashing, but it’s so slow.

    From the moment of starting the application in VS until the screen of the application is loaded in the emulator, nearly three minutes pass. That slow.

So programming for mobile devices mainly consists of waiting. Waiting for reboots or waiting for the emulator. This is wearing me down.

Usually, I change some 10 lines or so and then run the application to test what I’ve just written. That’s how I work, and it works very well because I get immediate feedback, which helps me write code that works in the first place.

Unfortunately, with these prohibitively long startup times, I’m forced to write more and more code in one batch, which means even more time wasted debugging.

*sigh*

The pain of email SPAM

Lately, the SPAM problem in my email INBOX has gotten a lot worse. Spammers increasingly seem to check whether their mail gets flagged by SpamAssassin and tweak the messages until they get through.

Due to some tricky aliasing going on on the mail server, I’m unable to properly use SpamAssassin’s Bayes filter on our main mail server. You see, I have a practically infinite number of addresses which are in the end all delivered to the same account, and all that aliasing can only happen after the message has passed SpamAssassin.

This means that even though mail may go to one and the same user in the end, it’s seen as mail for many different users by SpamAssassin.

This inability to use Bayes with SpamAssassin means that lately, SPAM has been getting through the filter.

So much SPAM that I began getting really, really annoyed.

I know that mail clients themselves also have Bayes-based SPAM filters, but I often check my email account from my mobile phone or on different computers, so I depend on a solution that filters out the SPAM before it reaches my INBOX on the server.

The day before yesterday I had enough.

While mail for all the domains I’m managing is handled by a customized MySQL-Exim-Courier setup, mail to the @sensational.ch domain is relayed to another server and then delivered to our Exchange server.

Even better: that final delivery step happens after all the aliasing steps (the catch-all aliases being the difficult part here) have completed. This means I can in fact have all mail to @sensational.ch pass through a Bayes filter, and the messages will all be filtered for the correct account.

This made me install DSPAM on the relay that transmits mail from our central server to the Exchange server.
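
In case you want to do something similar: the integration essentially boils down to piping every message through the dspam binary at delivery time. Roughly like the following – I’m writing the flags down from memory, so treat them as assumptions and check the documentation of your version:

    # classify a message on delivery, tagging it for the given user
    # ("someuser" is a placeholder)
    dspam --user someuser --deliver=innocent,spam --stdout < message.eml

    # retrain a false negative (a spam that got through)
    dspam --user someuser --class=spam --source=error < missed.eml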

Even after only one day of training, I’m getting impressive results. Keep in mind that DSPAM only sees mail that isn’t already flagged as spam by SpamAssassin – that is, mail carefully crafted to look “real”.

After that one day of training, DSPAM detects most of these junk messages: I’m down to one false negative per 10 junk messages (and no false positives).

Even after running SpamAssassin and thus filtering out the obvious suspects, a whopping 40% of the messages that reach me are still SPAM. In other words: nearly half of what SA lets through is junk.

Looking at the big picture: even when counting the various mails sent by cron daemons as genuine email, I’m getting far more junk than genuine email per day!

Yesterday, Tuesday, for example, I got – including mails from cron jobs and backup copies of order confirmations for PopScan installations currently in public tests – 62 genuine emails and 252 junk mails, of which 187 were caught by SpamAssassin and the rest by DSPAM (except for two mails that got through).

This is insane. I’m getting four times more spam than genuine messages! What the hell are these people thinking? With that volume of junk filling up our inboxes, how could any of these “advertisers” believe that somebody is both stupid enough to fall for such a message and yet discerning enough to pick theirs out of all the others?

Anyways. This isn’t supposed to be a rant. It’s supposed to be a praise to DSPAM. Thanks guys! You rule!

ripping DVDs

I have plenty of DVDs in my possession: some movies of dubious quality which I bought when I was still in school (like “Deep Rising” – eeew) and many, many episodes of various series (Columbo, the complete Babylon 5 series, the A-Team and other gems).

As you may know, I’m soon to move into a new flat which I thought would be a nice opportunity to reorganize my library.

shion has around 1.5TB of storage space, and I can easily upgrade her capacity by plugging in yet another USB hub and more USB hard drives (shion is the only computer I own that I refer to with a female pronoun – the machine is something really special to me, like the warships of old).

It makes total sense to use that practically unlimited storage capacity to store all my movies – not only the ones I’ve downloaded (like video game speed runs). Spoiled by how easy ripping CDs is, I thought this would be just another little thing to do before moving.

You know: insert the DVD, run the ripper, run the encoder, done.

Unfortunately, this is proving to be harder than it first looked:

  • Under Mac OS X, you can try to use the Unix tools via Fink or some home-grown native tools. Whatever you do, you either get outdated software (Fink) or freeware tools that don’t really work, documented in outdated tutorials. Nah.
  • Under Windows, there are two kinds of utilities: on one hand, you have the single-click ones (like AutoGK) which really do what I initially wanted. Unfortunately, they are limited in their use: they provide only a limited set of output formats (no x264, for example) and they hard-code the subtitles into the movie stream. But they are easy to use. On the other hand, you have the hardcore tools like Gordian Knot or MeGUI or even StaxRip. These are frontends for other tools that work like Unix tools: each does one thing and tries to excel at that one thing.

    This could be a good thing, but unfortunately it falls down on things like awful documentation, hard-coded file paths everywhere and outdated tools.

    I could not get any of the tools listed above to actually create an x264 AVI or MKV file without either throwing a completely unusable error message (“Unknown exception ocurred”), just not working at all, or missing things like subtitles.

  • Linux has dvd::rip, which is a really nice solution but unfortunately no solution for me, as I don’t have the right platform to run it on: my MCE machine is – well – running Windows MCE, my laptop is running Ubuntu (no luck with the Debian packages, and there are no Ubuntu packages). shion runs Gentoo, but she’s headless, so I’d have to use a remote X connection, which is awfully slow and non-scriptable.

The solution I want works on the Linux (or Mac OS X) console, is scriptable and – well – works.

I guess I’m going the hard-core way and using transcode, which is what dvd::rip uses under the hood – provided I find good documentation (I’m more than willing to read and learn – if the documentation is current enough and actually documents the software I’m running, not the software as it was two years ago).
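
For reference, this is the kind of invocation I expect to end up with, pieced together from the transcode man page – every flag here is an assumption until I’ve actually tried it:

    # title 1, all chapters, straight from the DVD device,
    # video encoded via the xvid export module
    transcode -i /dev/dvd -x dvd -T 1,-1 -y xvid -o episode01.avi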

I’ll keep you posted on how I’m progressing.

XmlTextReader, UTF-8, Memory Corruption

XmlTextReader on the .NET CF doesn’t support anything but UTF-8, which can be a good thing as much as it can be a bad thing.

Good thing because UTF-8 is a very flexible character encoding giving access to the whole Unicode character range while still being compact and easy to handle.

Bad thing because PopScan doesn’t do UTF-8. It was just never needed, as its primary market is countries well within the range of ISO-8859-1. This means the protocol between server and client has so far been XML encoded in ISO-8859-1.

To be able to speak with the Windows Mobile application, the server had to convert the data to UTF-8.

And this is where a small bug crept in: part of the data wasn’t properly converted and was transmitted as ISO-8859-1.
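
The server side isn’t .NET, but expressed in C# terms, the conversion that should have happened for every piece of outgoing data boils down to this (GetLegacyPayload is a made-up stand-in for wherever the data actually comes from):

    using System.Text;

    // Re-encode the legacy ISO-8859-1 bytes as UTF-8 before they go on the wire.
    byte[] latin1 = GetLegacyPayload(); // hypothetical accessor
    byte[] utf8 = Encoding.Convert(
        Encoding.GetEncoding("ISO-8859-1"), Encoding.UTF8, latin1);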

The correct thing an XML parser should do with obviously invalid input is to bail out, which is also what the .NET CF DOM parser did.

XmlTextReader did something else though: it threw an uncatchable IndexOutOfRangeException either in Read() or ReadString(). And sometimes it miraculously changed its internal state – jumping from element to element even when just using ReadString().

To make things even worse, the exception happened at a location not even close to where the invalid character was in the stream.

In short, from what I have seen (undocumented and uncatchable exceptions being thrown at random places), it feels like the specific invalid character in my particular situation caused memory corruption somewhere inside the parser.
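
One way to guard against this – a minimal sketch of the idea, not necessarily the only fix, and assuming the strict UTF8Encoding constructor is available on the Compact Framework (it is on the full framework) – is to validate the bytes before XmlTextReader ever sees them:

    using System;
    using System.IO;
    using System.Text;
    using System.Xml;

    static class SafeXml
    {
        // Validate the payload as strict UTF-8 before handing it to
        // XmlTextReader: an invalid byte now surfaces as a well-defined,
        // catchable exception instead of corrupting the parser's state.
        public static XmlTextReader Open(byte[] data)
        {
            Encoding strict = new UTF8Encoding(false, true); // throw on bad bytes
            try
            {
                strict.GetString(data, 0, data.Length); // validation pass only
            }
            catch (DecoderFallbackException e)
            {
                throw new ArgumentException("payload is not valid UTF-8", e);
            }
            return new XmlTextReader(new MemoryStream(data));
        }
    }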

Try to imagine how frustrating it was to find and fix this bug – it felt like the old days of manual memory allocation combined with stack corruption. And all because of one single bad byte in a stream of thousands of bytes.

The price of automatisms

Visual Studio 2005 and the .NET Framework 2.0 brought us the concept of table adapters and a nice visual designer for databases allowing you to quickly “write” (point and click) your data access layer.

Even when using a third-party SQLite library, you can make use of this facility, and it’s true: doing basic stuff works remarkably well and quickly.

The problems start when what you intend to do is more complex. Then the tool becomes braindead.

The worst thing about it is that it’s tailor-made for SQL Server and insists on parsing your queries itself instead of letting the database or even the database driver do that.

If you add any feature to your query that is not supported by SQL Server (keep in mind that I’m NOT working with SQL Server – I don’t even have SQL Server installed), the tool will complain about not being able to parse the query.

The dialog provides an option to ignore the error, but it doesn’t work the way I hoped it would: “Ignore” doesn’t mean “keep the old configuration” – it means “act as if there were no query at all”.

This means that even something as simple as writing “insert or replace” instead of “insert” (which saves one query per batch item – and I’m doing lots of batch items) or adding a “limit 20” clause makes the whole database designer unusable for you.

The ironic thing about the limit clause is that the designer happily accepts “select top xxx from…”, which then fails at run time because SQLite doesn’t support that proprietary extension.

So in the end it’s back to doing it manually.
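
And “manually” really isn’t much code. A sketch of the idea, assuming the System.Data.SQLite ADO.NET provider (the table and its columns are made up):

    using System;
    using System.Data.SQLite;

    class ManualDataAccess
    {
        static void Main()
        {
            using (SQLiteConnection conn =
                new SQLiteConnection("Data Source=popscan.db"))
            {
                conn.Open();

                // SQLite-specific syntax simply works here, because nothing
                // second-guesses the SQL before the database sees it.
                using (SQLiteCommand cmd = conn.CreateCommand())
                {
                    cmd.CommandText =
                        "INSERT OR REPLACE INTO items (id, name) " +
                        "VALUES (@id, @name)";
                    cmd.Parameters.AddWithValue("@id", 42);
                    cmd.Parameters.AddWithValue("@name", "example");
                    cmd.ExecuteNonQuery();
                }

                using (SQLiteCommand cmd = conn.CreateCommand())
                {
                    cmd.CommandText = "SELECT id, name FROM items LIMIT 20";
                    using (SQLiteDataReader reader = cmd.ExecuteReader())
                        while (reader.Read())
                            Console.WriteLine(reader["name"]);
                }
            }
        }
    }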

But wait a minute: doing it manually is even harder than it should be, because the help, tutorials, books and even Google all only talk about the automatic way, either unaware of or not caring that it just won’t work once you want to do more than example code.

Oldstyle HTML – the worst offenders

More and more, the WWW is being cleansed of old, outdated pages. In more and more cases, browsers will finally be able to go into standards mode – no more quirks.

But one bastion still remains to be conquered.

Consider this:

<br><font size=2 face="sans-serif">Danke</font>
<br><font size=2 face="sans-serif">Gruss</font>
<br><font size=2 face="sans-serif">xxxx</font>

By accident, I had my email client in “View Source” mode, and this is the (complete) body of an email my dad sent me.

Besides the fact that it’s a total abuse of HTML email (the message does not contain anything plain text could not have contained), it’s an obscene waste of bandwidth:

The email ALSO contains a plain-text alternative part, effectively doubling its size – not to mention the unneeded HTML tags.

What’s even worse: this is presentational markup at its finest. Even if I insisted on creating an HTML mail for this message, this would have totally sufficed:

Danke<br />
Gruss<br />
xxxx<br />

Or – semantically correct:

<p>Danke</p>
<p>Gruss</p>
<p>xxx</p>

Personally, I actually see the reasoning behind certain kinds of HTML email – newsletters or product announcements come to mind. Why use plain text if you can send the whole message in a way that’s nice for users to look at?

Your users are used to viewing rich content – every one of them probably has a web browser installed.

And with today’s bandwidth it’s even possible to transfer the whole message and all its pictures in one nice package. No security warnings, no crappy-looking layout due to broken images.

What I don’t see, though, is what email programs are actually doing. Why send messages like the one in the example as HTML? Why waste the user’s bandwidth (granted: it doesn’t matter any more) and even create security problems (by forcing the email client to display HTML) just to send a message that doesn’t look any different from plain text?

The message also highlights another problem: the old presentational markup actually lent itself perfectly to creating WYSIWYG editors. But today’s way of creating HTML pages just won’t work in these editors, for the reasons I outlined in my posting about Word 2007.

Still – using a little bit of CSS could result in so much nicer HTML emails, which have the additional benefit of remaining totally readable even if the user has a client not capable of displaying HTML (which is a wise decision, security-wise).
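
A sketch of what I mean – whether a given mail client honors embedded style sheets is another story (many only accept inline styles), but the principle stands:

<style type="text/css">
  p { font: 80%/1.4 sans-serif; }
</style>
<p>Danke</p>
<p>Gruss</p>
<p>xxxx</p>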

Oh and in case you wonder what client created that email…

    X-MIMETrack: Serialize by Router on ZHJZ11/xxxx(Release 7.0.1FP1|April 17, 2006) at
     02.10.2006 16:35:09,
    	Serialize complete at 02.10.2006 16:35:09,
    	Itemize by SMTP Server on ZHJZ05/xxxxx(Release 6.5.3|September 14, 2004) at
     02.10.2006 16:36:15,
    	Serialize by Router on ZHJZ05/xxxxx(Release 6.5.3|September 14, 2004) at
     02.10.2006 16:36:19,
    	Serialize complete at 02.10.2006 16:36:19

I wonder whether running a Notes version from September 2004 is a good idea in today’s world full of spam, spyware and other nice things – especially considering that my dad works in a public office.

Word 2007 – So much wasted energy

Today I came across a screencast showing how to quickly format a document using the all-new Word 2007 – part of Office 2007 (don’t forget to also read the associated blog post).

If you have any idea how Word works and how to actually use it, you will be as impressed as the presenter (and, admittedly, I) was: apply some styles, choose a theme and be done with it.

Operations that took ages to get right are now done in a minute, and it’ll be very easy to create good-looking documents.

Too bad that it’s looking entirely different in practice.

If I watch my parents or even my coworkers use Word, all I see is styles being avoided. Heading 1? Just use the formatting toolbar to make the font bigger and bold.

Increase spacing between paragraphs? Hit return twice.

Add empty spacing after a heading (which isn’t even one from Word’s point of view)? Hit return twice.

Indent text? Hit tab (or even space as seen in my mother’s documents).

This is also the reason why those people never seem to have problems with Word: the formatting toolbar works perfectly fine – the bugs lie in the “advanced” features like assigning styles.

Now the problem is that all the features shown in that screencast are totally dependent on the styles being set correctly.

If you take the document shown, as it is before any styling is applied, and use the theme function on it, nothing will happen, as Word has no semantic data about your document. What’s a heading? What’s a subtitle? It’s all plain text.

Conversely, if you style your document the “traditional” way (using the formatting toolbar) and then try to apply the theme, nothing will happen either, as the semantic information is still missing.

This is the exact reason why WYSIWYG looks like a nice gimmick at first glance but more or less makes further automated processing of the document impossible.

You can try to hack around this, of course – look for patterns in the user’s formatting and guess the right styles. But that can lead to even bigger confusion later on, as wrong guesses will in the end make the theming behave inconsistently.

Without actual semantic analysis of the text (which currently is impossible to do), you will never be able to use features like theming accurately – unless the user provides the semantic information by using styles, which in turn defeats the purpose of WYSIWYG.

So, while I really like that new theming feature of Office 2007, I fear that for the majority of people it will be completely useless because it plain won’t work.

Besides, themes are clearly made for the end user at home – in a corporate environment you will have to create documents according to the corporate design, which probably won’t be based on a pre-built theme shipping with Office.

And end users are the people least able to understand how assigning styles to content works.

And once people “get” how to work with text styles and the themes begin to work, we’ll be back at square one, with everyone and their friends using the same theme because it’s the only one that looks more or less acceptable – defeating whatever originality the theme initially had.