IRC user interface idea

You know this problem, don’t you?

You are connected to a number of IRC servers and you are watching a certain number of channels.

Every IRC client I know uses either tabs or windows to separate these channels into their own contexts, usually providing some visual cue when there is activity in a channel you are not currently watching.

While this metaphor probably makes a lot of sense for very busy channels, I think that consolidating every channel into one single window is probably a much better way to follow what’s going on and to talk back to the channels – especially when you are watching less-populated channels.

This frees you from the burden of constantly switching channel windows (or tabs) to see what is going on.

Let’s say you are connected to irc1.example.com and irc2.example.com. On irc1, you are connected to #channel1a and #channel1b, and on irc2, you are connected to #channel2a.

Now, to my knowledge, every current IRC client uses either three windows or three tabs (maybe even five windows/tabs, because the servers themselves get windows too) to represent this information. In window-based clients, you can arrange them all side by side, but talking to a certain channel still forces you to focus a different window.

Now with my idea, you would consolidate these channels: you would get just one window which contains all the messages from all channels.

So in the simplest incarnation, you’d probably see something like this in your chat window:

1) irc1/#channel1a [user1aa]> hi there!
2) irc1/#channel1b [user1a]> hi there!
1) irc1/#channel1a [user1ab]> hi user1aa
3) irc2/#channel2a [user2aa]> hi folks!

though you would probably understand what’s going on much more easily if the server, channel and user names were a bit more… well… distinct.

Of course, you could add color. You assign each channel a color, like this:

1) irc1/#channel1a [user1aa]> hi there!
2) irc1/#channel1b [user1a]> hi there!
1) irc1/#channel1a [user1ab]> hi user1aa
3) irc2/#channel2a [user2aa]> hi folks!

and if you need to, you can still color nicks.

1) irc1/#channel1a [user1aa]> hi there!
2) irc1/#channel1b [user1a]> hi there!
1) irc1/#channel1a [user1ab]> hi user1aa
3) irc2/#channel2a [user2aa]> hi folks!

Now… how to talk back?

Easy. Every channel is assigned a number for quick access. Just type /[channel number] to switch to that channel, type your message, then hit enter. The channel you last talked to stays the default until you type /[another channel number].
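To make this concrete, here is a minimal sketch of how the input routing could work (everything in it, including the sendToChannel() function, is made up for illustration):

<?php
// Quick-access numbers map to (server, channel) pairs.
$channels = array(
    1 => array('server' => 'irc1.example.com', 'channel' => '#channel1a'),
    2 => array('server' => 'irc1.example.com', 'channel' => '#channel1b'),
    3 => array('server' => 'irc2.example.com', 'channel' => '#channel2a'),
);
$current = 1; // stays the default until another number is chosen

function handleInput($line){
    global $channels, $current;
    // "/2 hi there" switches to channel 2 and sends "hi there";
    // "/2" on its own just switches.
    if (preg_match('!^/(\d+)\s*(.*)$!', $line, $m) && isset($channels[$m[1]])){
        $current = (int)$m[1];
        $line = $m[2];
    }
    if ($line !== '')
        sendToChannel($channels[$current], $line); // hypothetical send function
}
?>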

This feels like a much easier and more intuitive way to handle multiple connections, especially in cases where the channels you have joined are not too active, as this way, you can easily watch everything that is going on.

Also, discussions usually happen in intervals across the different channels. You will only very rarely see the color concert I’ve shown above, as usually you have a discussion going on in one channel while the others are relatively quiet.

I’ll probably have to go and implement a proof-of-concept sometime in the future, but this feels like such a nice idea when just looking at it. Why is nobody doing it? What am I missing?

Automatic language detection

If you write a website, do not use Geolocation to determine the language to display to your user.

If you write a desktop application, do not use the region setting to determine the language to display to your user.

This is incredibly annoying for some of us – especially for me, which is why I’m ranting here.

The moment Google released their (awful) German translation for their RSS reader, I was served the German version just because I have a Swiss IP address.

Here in Switzerland, we actually speak one of three (or four, depending on who you ask) languages, so defaulting to German is probably not of much help for the people in the French-speaking part.

Additionally, there are many users fluent in (at least reading) English. We always prefer the original language if at all possible because, generally, translations never quite work. Even with the best translators at work, translated texts never feel fluid – especially not when you are used to the original version.

So, Google, what were you thinking when you switched me over to the German version of the Reader? I had been using the English version for more than a year, so clearly, I understood enough of that language to be able to use it. More than 90% of the RSS feeds I’m subscribed to are, in fact, in English. Can you imagine how pissed I was to see the interface change?

This is even worse on the iPhone/iPod frontend because, there, you don’t even provide an option to change the language aside from manually hacking the URL.

Or take desktop applications. I live in the German-speaking part of Switzerland. True. So naturally, I have set my locale to Swiss German. You know: I want the correct number formatting, I want my weeks to start on Mondays, I want the correct currency and I want the 24-hour clock I’m used to.

Actually, I also want the German week and month names, because I will be using these in most of my letters and documents, which are, in fact, German too.

But my OS installation is English. I am used to English. I prefer English. Why do so many programs insist on using the locale setting to determine the display language? Do you developers think it’s funny to have a mish-mash of languages on the screen? Don’t you think that my using an English OS version may be an indication that I do not want to read your crappy German translation alongside the English user interface of my OS?

Don’t you think it feels really stupid when a button in a German dialog box opens another, English, dialog? (The first one is from Chrome; the one that opens once you click “Zertifikate verwalten” (Manage certificates) is from Windows itself.)

In Chrome, I can at least fix the language – once I’ve found the knob to turn. At first, it was easier for me to just delete the German localization file from the Chrome installation because, being completely unused to German UIs, I was unable to find the right setting.

This is really annoying, and I see this particular problem being neglected on an incredibly large scale. I know that I am in a minority, but the problem is so terribly easy to fix:

  • All current browsers send an Accept-Language header. In contrast to earlier times, nowadays it is actually preset correctly in all the common browsers. Use that. Don’t use my IP address.
  • Instead of reading the locale setting of my OS, ask the OS for its UI language and use that to determine which localization to load (this is actually the recommended way of doing things according to Microsoft’s guidelines, at least since Windows XP, which was released in 2001).

Using these two simple tricks, you help a minority without hindering the majority in any way and without additional development overhead!

Actually, you’ll be getting away a lot cheaper than before: GeoIP is expensive if you want it to be accurate (and you do want that, don’t you?), whereas there are ready-to-use libraries to determine the correct language even from the most complex Accept-Language header.
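To illustrate just how little code this needs, here is a minimal sketch (the function name and the simplifications are mine; a real library additionally handles wildcards and country variants):

<?php
// Pick the best language from the Accept-Language header.
// $available lists the languages the site actually provides.
function negotiateLanguage(array $available, $default = 'en'){
    if (empty($_SERVER['HTTP_ACCEPT_LANGUAGE']))
        return $default;
    $accepted = array();
    // The header looks like: "de-CH,de;q=0.8,en-US;q=0.6,en;q=0.4"
    foreach (explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE']) as $entry){
        $parts = explode(';', trim($entry));
        $lang = strtolower(substr($parts[0], 0, 2)); // primary tag only
        $q = 1.0;
        if (isset($parts[1]) && preg_match('/q=([0-9.]+)/', $parts[1], $m))
            $q = (float)$m[1];
        if (!isset($accepted[$lang]) || $q > $accepted[$lang])
            $accepted[$lang] = $q;
    }
    arsort($accepted); // highest quality value first
    foreach (array_keys($accepted) as $lang){
        if (in_array($lang, $available))
            return $lang;
    }
    return $default;
}

// Usage: echo negotiateLanguage(array('en', 'de', 'fr'));
?>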

Asking the OS for the UI language isn’t harder than asking it for the locale, so no overhead there either.

Please, developers, please have mercy! Stop the annoyance! Stop it now!

OAuth signature methods

I’m currently looking into web services and different methods of request authentication. What I’m aiming to end up with is something inherently RESTful, as that will provide me with the best flexibility when designing a frontend to the service. Generally, the arguments of the REST crowd convince me: it works like the human-readable web, it is inherently scalable, it enforces a clean structure of resources and, finally, it is easy to program against due to its «obvious» API.

As different services are going to communicate with each other, sometimes acting as users of their respective platforms, and because I’m not really inclined to pass credentials around (or make the user do one half of the tasks on one site and the other half on another site), I was looking into methods of authentication and authorization that work in a RESTful environment and without passing around user credentials.

The first thing I did was note down the requirements; subsequently, I quickly designed something using public-key cryptography which would probably have worked quite nicely (I’m no expert in this field, though).

Then I learned about OAuth, which was designed precisely to solve my issues.

Eager, I read through the specification, but I was put off by one single fact: the default method for signing requests – the method that is most widely used and most widely supported – relies on a shared secret.

Even worse: the shared secret must be known in the clear on both the client and the server (I’m using the common terminology here; OAuth speaks of consumers and providers, but I’m (still) more used to the traditional naming).

This is bad on multiple levels:

  • As the secret is stored in two places (client and server), it’s twice as likely to leak out as if it were stored in only one place (the client).
  • If the token is compromised, the attacker can act in the name of the client with no way of detection.
  • Frankly, it’s a responsibility I, as a server designer, would not want to take on. If the secret is on the client and the client screws up and lets it leak, it’s their problem; if the secret is stored on the server and the server screws up, it’s my problem and I have to take responsibility.
    Personally, I’m quite confident that I would not leak secret tokens, but can I be sure? Maybe. Do I even want to think about this? Certainly not if there is another option.
  • If, god forbid, the whole table containing all the shared secrets is compromised, I’m really, utterly screwed as the attacker can use all services, impersonating any user at will.
  • As the server needs to know all the shared secrets, the risk of losing all of them at once exists in the first place. If only the client knows the secret, an attacker has to compromise each client individually. If the server knows the secrets, compromising the server suffices to get at all clients.
  • As per the point above, the server becomes a really interesting target for attacks and thus needs to be secured extra carefully, and it even needs to take measures against all kinds of more-or-less intelligent attacks (usually ending up DoSing the server, or worse).

In the end, HMAC-SHA1 is just repeating history. At first, we stored passwords in the clear, then we learned to hash them, then we even salted them, and now we’re exchanging them for tokens stored in the clear.

No.

What I need is something that keeps the secret on the client.

The secret should never ever need to be transmitted to the server. The server should have no knowledge at all of the secret.

Thankfully, OAuth contains a solution for this problem: RSA-SHA1, as defined in section 9.3 of the specification. Unfortunately, it leaves a lot to be desired. Whereas the rest of the specification is a pleasure to read and very, well, specific, section 9.3 contains the following phrase:

It is assumed that the Consumer has provided its RSA public key in a verified way to the Service Provider, in a manner which is beyond the scope of this specification.

Sure. Just specify the (IMHO) useless method using shared secrets in detail and leave out the interesting and (IMHO) only truly functional method.

Sure, transmitting a public key is a piece of cake (it’s public, after all), but this puts another burden on the writer of the provider documentation, and as the process is unspecified, implementors will be forced to amend the existing libraries with custom code to transmit the key.

Also, I’m unclear on header size limitations. As the server needs to know which public key was used for the signature (oauth_consumer_key), it must be sent along with each request. While a manually generated public token can be small, a public key certainly isn’t. Is there a size limit for HTTP headers? I’ll have to check that.

I could just transmit a key ID (with the key itself known to the server) or the key fingerprint as the consumer key, but is that still following the standard? I didn’t see this documented anywhere, and implementations in the wild are very scarce.
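Just to make the fingerprint idea concrete, here is a hedged sketch in PHP (the function names are mine, and whether using a fingerprint as the consumer key is standard-compliant is precisely my open question):

<?php
// Derive a compact oauth_consumer_key from the public key itself:
// 40 hex characters instead of a full PEM block in every request.
function publicKeyFingerprint($publicKeyPem){
    $key = openssl_pkey_get_public($publicKeyPem);
    $details = openssl_pkey_get_details($key);
    return sha1($details['key']);
}

// RSA-SHA1 (spec section 9.3): sign the signature base string with the
// consumer's private key; the server verifies with the public key only.
function signBaseString($baseString, $privateKeyPem){
    $key = openssl_pkey_get_private($privateKeyPem);
    openssl_sign($baseString, $signature, $key, OPENSSL_ALGO_SHA1);
    return base64_encode($signature);
}
?>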

Well… as usual, the better solution just requires more work, and I can live with that, especially considering that, for now, I’ll be the person writing both server and client. But I feel the upcoming pain, should third-party consumers decide to hook up with that provider.

If you ask me what I would have done in the footsteps of the OAuth guys: I would have specified only RSA-SHA1 (and maybe PLAINTEXT) and not even bothered with HMAC-SHA1. And I would have specified a standard way of exchanging public keys between consumer and provider.

Now the train has left the station, and everyone interested in creating a really secure (and convenient – at least for the provider) solution is left with more work and non-standardized methods.

Beautifying commits with git

When you look at our Subversion log, you often see revisions containing multiple topics, which is something I don’t particularly like. The main problem is merging patches: the moment you cram multiple things into a single commit, you are doomed should you ever decide to merge one of those things into another branch.

Because it’s such an easy thing to do, I began committing really, really often in git, but whenever I was writing the changes back to Subversion, I used merge --squash so as not to clutter the main revision history with abandoned attempts at fixing a problem or implementing a feature.

So in essence, this meant that by using git, I was working against my usual goals: the actual commits to SVN were larger than before, which is the exact opposite of how I’d want the repository to look.

I’ve lived with that, until I learned about the interactive mode of git add.

Beginners with git (at least those coming from Subversion and friends) always struggle to really get the concept of the index and usually just git commit -a when committing changes.

This does exactly the same thing as a svn commit would do: It takes all changes you made to your working copy and commits them to the repository. This also means that the smallest unit of change you can track is the state of the working copy itself.

To do finer-grained commits, you can git add a file and commit that, which is the same as an svn commit fed with a file list assembled from svn status and some grep and awk magic.

But even a file is too large a unit for a commit, if you ask me. When you implement feature X, it’s possible, if not very probable, that you also fix bugs a and b and extend interface I to make feature Y work – a feature on which X depends.

Bugfixes, interface changes, subfeatures: a git commit -a will mash them all together, and a git add per file will mash some of them together – unless you are really, really careful and cleanly do only one thing at a time, but that’s not how reality works.

It may very well be that you discover bug b after having written a good amount of code for feature Y and that both Y and b touch the same file. Now you either have to back out b again, commit Y and reapply b, or you commit Y and b in one go, making it very hard to later merge just b into a maintenance branch, because you’d also get Y, which you would not want.

But backing out already-written code to make a commit? That is not a productive workflow. I could never make myself do something like that, let alone my coworkers. Aside from that, it’s yet another source of errors.

This is where the git index shines. Git tracks content. The index is a staging area where you store content you wish to later commit to the repository. Content isn’t bound to a file. It’s just content. With the help of the index, you can incrementally collect single changes in different files, assemble them into a complete package and commit that to the repository.

As the index is tracking content and not files, you can add parts of files to it. This solves the problems outlined above.

So once I have completed feature X – assuming I could do it in one quick go – I run git add with the -i argument. Now I see a list of changed files in my working copy. Using the patch command, I can decide, hunk by hunk, whether it should be included in the index or not. Once I’m done, I exit the tool using 7. Then I run git commit 1) to commit all the changes I’ve put into the index.

Remember: this is not done per file, but per line in the file. This way I can separate all the changes in my working copy – bugs a and b, features Y and X – into single commits and commit them separately.
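In command form, the workflow could look something like this (a sketch; the commit messages, the branch name and the placeholder SHA are made up):

% git add -p                    # stage only the hunks belonging to bug b
% git commit -m "fix bug b"
% git add -p                    # stage the remaining hunks of feature Y
% git commit -m "implement feature Y"
% git checkout maint            # later: pick just the bugfix...
% git cherry-pick <sha-of-b>    # ...into the maintenance branch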

With a clean history like that, I can consequently merge the feature branch without --squash, thus keeping the history when dcommitting to Subversion, finally producing something that can easily be merged around and tracked.

This is yet another feature of git that, after you get used to it, makes this VCS shine even more than everything else I’ve seen so far.

Git is fun indeed.

1) and not git commit -a, which would destroy all the fine-grained plucking of lines you just did – trust me: I know. Now.

Converting ogg streams into mp3 ones

This is just an announcement for my newest quick hack, which can be used to convert webradio streams in the Ogg/Vorbis format on the fly into the MP3 format, which is more widely supported by the various devices out there.
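Conceptually, the hack boils down to a pipeline along these lines (just a sketch with a made-up stream URL; the actual script does a bit more than that):

% curl -s http://radio.example.com/stream.ogg | oggdec -Q -o - - | lame --quiet - -

curl fetches the Ogg/Vorbis stream, oggdec decodes it to WAV on stdout, and lame encodes that to MP3 on the fly.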

I have created a dedicated page for the project for those who are interested.

Also, I have really come to like github.com – not as the commercial service they intend to be (I’ve already written about the stupidity of hosting your company’s trade secrets at a company in a foreign country with foreign legislation), but as a place to quickly and easily dump some code you want to be publicly available without going through all the hassle otherwise associated with public project hosting.

This is why this little script is hosted there and not here. As I’m using git, even if github goes away, I still have the full repository around to either self-host or have someone else host for me – which is a crucial requirement for me to outsource anything.

Simplest possible RPCs in PHP

After spending hours finding out why a particular combination of SoapClient from PHP itself and SOAP::Server from PEAR didn’t consistently work together (sometimes, arrays passed around lost an arbitrary number of elements), I thought about what would be needed to make RPCs work from a PHP client to a PHP server.

I wanted nothing fancy, and I certainly wanted as little overhead as humanly possible.

This is what I came up with for the server:

<?php
header('Content-Type: text/plain');

require_once('a/file/containing/a/class/you/want/to/expose.php');

// The method to call is passed as the path info of the request URL.
$method = str_replace('/', '', $_SERVER['PATH_INFO']);

if ($_SERVER['REQUEST_METHOD'] != 'POST'){
    sendResponse(array('state' => 'error', 'cause' => 'unsupported HTTP method'));
}

$s = new MyServerObject();
// The arguments arrive as a serialized PHP array in the request body.
$params = unserialize(file_get_contents('php://input'));
if ( ($res = call_user_func_array(array($s, $method), $params)) === false)
    sendResponse(array('state' => 'error', 'cause' => 'RPC failed'));
if (is_object($res))
    $res = get_object_vars($res);
sendResponse($res);

function sendResponse($resobj){
    echo serialize($resobj);
    exit;
}

?>

The client, as shown below, is a bit more complex, mainly because it contains some HTTP protocol logic – logic which could probably be reduced to two or three lines of code if I used the cURL extension, but the client in this case does not have the luxury of access to such functionality.

Also, I already had the function lying around (/me winks at domi), so that’s what I used (as opposed to file_get_contents with a pre-prepared stream context). This way, we DO have the advantage of learning a bit about how HTTP works, and we are totally self-contained.

<?php
class Client{
    function __call($name, $args){
        // The method name goes into the URL; the arguments travel
        // serialized in the POST body.
        $req = $this->openHTTPRequest('http://localhost:5436/restapi.php/'.$name, 'POST', array('Content-Type' => 'text/plain'), serialize($args));
        $data = unserialize(stream_get_contents($req['handle']));
        fclose($req['handle']);
        return $data;
    }
    private function openHTTPRequest($url, $method = 'GET', $additional_headers = null, $data = null){
        $parts = parse_url($url);

        $fp = fsockopen($parts['host'], isset($parts['port']) ? $parts['port'] : 80);
        fprintf($fp, "%s %s HTTP/1.1\r\n", $method, $parts['path'].(isset($parts['query']) ? '?'.$parts['query'] : ''));
        fputs($fp, "Host: ".$parts['host']."\r\n");
        if ($data){
            fputs($fp, 'Content-Length: '.strlen($data)."\r\n");
        }
        if (is_array($additional_headers)){
            foreach($additional_headers as $name => $value){
                fprintf($fp, "%s: %s\r\n", $name, $value);
            }
        }
        fputs($fp, "Connection: close\r\n\r\n");
        if ($data)
            fputs($fp, "$data\r\n");

        // Read away the status line and the headers; the body stays in
        // the stream for the caller to read.
        $header = array();
        $response = "";
        while(!feof($fp)) {
            $line = trim(fgets($fp, 1024));
            if (empty($response)){
                $response = $line;
                continue;
            }
            if (empty($line)){
                break;
            }
            list($name, $value) = explode(':', $line, 2);
            $header[strtolower(trim($name))] = trim($value);
        }
        return array('response' => $response, 'header' => $header, 'handle' => $fp);
    }
}

$client = new Client();
$result = $client->someMethod(array('data' => 'even arrays work'));

?>

What you can’t pass around this way is objects (at least objects which are not of type stdClass), as both client and server would need access to the prototype. Also, this seriously lacks error handling. But it generally works much better than anything SOAP could ever accomplish.

Naturally, I give up some things compared to SOAP or any «real» RPC solution:

  • This one works only with PHP
  • It has limitations on what data structures can be passed around, though that’s alleviated by PHP’s incredibly strong array support.
  • It relies heavily on PHP’s loosely typed nature and thus probably isn’t as robust.

Still, protocols like SOAP (or indeed any protocol with either «simple» or «lightweight» in its name) tend to be so complicated that it’s incredibly hard, if not impossible, to create different implementations that still work together correctly in all cases.

In my case, I have to separate two pieces of the same application due to unstable third-party libraries which I would not want linked into every PHP instance running on that server. Here, the solution outlined above (plus some error handling code) works better than SOAP on so many levels:

  • it’s easily debuggable. No need for wireshark or comparable tools
  • client and server are written by me, so they are under my full control
  • it works all the time
  • it relies on as little functionality of PHP as possible, and the functionality it does depend on is widely used and tested, so I can assume that it’s reasonably bug-free (aside from my own bugs).
  • it’s a whole lot faster than SOAP, though this does not matter at all in this case.

Dependent on working infrastructure

If you create and later deploy and run a web application, then you are dependent on a working infrastructure: You need a working web server, you need a working application server and in most cases, you’ll need a working database server.

Also, you’d want a solution that always and consistently works.

We’ve been using lighttpd/FastCGI/PHP for our deployment needs lately. I’ve preferred this over Apache due to the easier configuration possible with lighty (out-of-the-box automated virtual hosting, for example), the potentially higher performance (due to long-running FastCGI processes) and the smaller amount of memory consumed by lighttpd.

But last week, I had to learn the price of walking off the beaten path (Apache, mod_php).

In one particular constellation – lighty, FastCGI and PHP running on a Gentoo box – a certain script sometimes (read: 50% of the time) didn’t output all the data it should have. Instead, lighty randomly sent out RST packets, without any indication in any of the involved log files of what could be wrong.

Naturally, I looked everywhere.

I read the source code of PHP. I’ve created reduced test cases. I’ve tried workarounds.

The problem didn’t go away until I tested the same script with Apache.

This is where I’m getting pragmatic: I depend on a working infrastructure. I need it to work. Our customers need it to work. I don’t care who is to blame. Is it PHP? Is it lighty? Is it Gentoo? Is it the ISP (though it would have to be on the sender’s end, as I’ve seen the described failure with different ISPs)?

I don’t care.

My interest is in developing a web application. Not in deploying one. Not really, anyways.

I’m willing (and able) to fix bugs in my development environment. I may even be able to fix bugs in my deployment platform. But I’m certainly not willing to – not if there is a competing platform that works.

So after quite some time with lighty and FastCGI, it’s back to Apache. The prospect of a consistently working backend largely outweighs the theoretical benefits of memory savings, I’m afraid.

VMware shared folders and Visual Studio

Ever since I’ve seen the light, I’m using git for every possible situation. Subversion is OK, but git is fun. It changed the way I do development. It allowed me to create ever so many fun features for our product – even in spare time, without the fear of never completing and thus wasting them.

I have so many branches of all our projects – every one of them containing a useful, but just not ready-for-prime-time feature. But when the time is right, I will be able to use that work. No more wasting it away because a bugfix touches the same file.

The day I dared to use git was the day that changed how I work.

Now naturally, I wanted to use all that freedom for my Windows work as well, but as you know, git just isn’t quite there yet on Windows. In fact, I had an awful lot of trouble with it, mainly because its integrated SSH client doesn’t work with my PuTTY/pageant setup.

So I resorted to storing my Windows development stuff on my Mac file system and using VMware Fusion’s shared folders feature to access the source files.

Unfortunately, it didn’t work very well at first; this is what I got:

Error message saying that the 'Project location is not trusted'

I didn’t even try to find out what happens when I compile and run the project from there, so I pressed F1, which brought up instructions for getting rid of the message that the “Project location is not trusted”.

I followed them, but it didn’t help.

I tried adding various UNC paths to the intranet zone, but none of them worked.

Then I tried sharing the folder via Mac OS X’s built-in SMB server. This time, the path I had set up using mscorcfg.msc actually seemed to do something: Visual Studio stopped complaining. And then I found out:

Windows treats host names containing a dot (.) as internet resources. Host names without dots are considered to be intranet resources.

\\celes\windev worked in mscorcfg.msc because celes, not containing a dot, was counted as an intranet resource.

\\.host contains a dot, so it is counted as an internet resource.

This means that to make the .NET framework trust your VMware shared folder, you have to add the path to the “Internet_Zone” – not the “LocalIntranet_Zone”, because the framework loader doesn’t even look there.

Once I had changed that configuration, Visual Studio complained that it was unable to parse the host name – it seems to assume that host names don’t start with a dot.

This was fixed by mapping the path to a drive letter, like we did centuries ago.
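For reference, the two steps in command form (a sketch; I’m assuming 1.3 is the label of the Internet_Zone code group in the default machine policy and that the share uses VMware’s default name):

caspol -machine -addgroup 1.3 -url "file://\\.host\Shared Folders\*" FullTrust
net use Z: "\\.host\Shared Folders"

The first line grants FullTrust to everything below the shared folder; the second maps it to a drive letter so Visual Studio stops choking on the leading dot.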

Now VS is happy and I can have the best of all worlds:

  • I can keep my windows development work in a git repository
  • I have a useful (and working) shell and ssh-agent to actually “git svn dcommit” my work
  • I don’t have to export any folders of my mac via SMB
  • Time Machine now also backs up my Windows work, which I had to do manually until now.

Very nice indeed, but now back to work (with git :-) ).

Converting Java keytool-certificates

To be able to read barcodes from connected barcode scanners into the web-based version of PopScan, we have to use a signed applet – there is no other way of getting the needed level of hardware access without signing your applet.

The signature, by the way, doesn’t at all prevent a developer from doing bad stuff – it just puts their signature below it (literally). Still, it kind of raises the bar for distributing malware that way: after all, the checks when applying for a certificate are usually very rigid, so nobody can easily forge their application and the origin of any piece of code stays very traceable.

But there is no validation done of the actual code to be signed, and I doubt that the certificate authorities out there actually revoke certificates used to sign malware, though that remains to be seen.

Anyways. Back to the topic.

In addition to the Java applet, we also develop the Windows client frontend to the PopScan server, and we have a small frontend to run on Windows CE (or Windows Mobile) based barcode-capable devices. Traditionally, neither of these was ever signed.

But lately, with Vista and Windows Mobile 6, signing has become more and more important: both systems complain with varying loudness about unsigned code, so I naturally prefer the code to be signed – we DO have a code-signing certificate for our applet, after all.

Now the thing is that keytool, Java’s tool for handling code-signing keys, doesn’t allow a private key to be exported. This means there was no obvious way for me to ever use the certificate we got for our applet to sign Windows EXEs.

Going back to the CA and asking them to send over an additional certificate was no option for me: aside from the fact that it would certainly have cost another two years’ fee, it would have meant proving our identity all over again – one year too early, as our current certificate is valid until 2009.

But then, I found a solution. Here’s how you convert a Java keystore certificate into something you can use with Microsoft’s Authenticode:

  1. Start KeyTool GUI
  2. In the tree view, click “Export”, “Private Key”
  3. Select your Java keystore file
  4. Enter two target file names for your key and the certificate chain (and select PEM format)
  5. Click OK

Now you will have two more files. One is your private key (I’ve named it key.pem), the other is the certificate chain (named cert.pem in my case). Now, use OpenSSL to convert this into something Microsoft likes to see:

% openssl pkcs12 -inkey key.pem -in cert.pem -out keypair.pfx -export

openssl will ask for a password to encrypt the pfx file with, and you’ll be done. Now you can use the pfx file like any other pfx file you received from your certificate authority (double-click it to install it, or use it with signcode.exe to directly sign your code).
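If you have the newer signtool.exe from the Platform SDK around, signing would then look something like this (the file names and the timestamp URL are just examples):

signtool sign /f keypair.pfx /p <password> /t http://timestamp.verisign.com/scripts/timstamp.dll myapp.exe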

Remember to delete key.pem as it’s the unencrypted private key!

Hosted Code Repository?

Recently (yesterday), the Ruby on Rails project announced their switch to git for their revision control needs. They also announced that they will use the hosted service github as the place to host their main repository (even though git is decentralized, there is some sense in having a “main tree” which contains what’s going to be the official releases).

I didn’t know github, so I had a look at their project.

What I don’t understand is that they seem to also target commercial entities with their offering. Think about it: supposing you are a commercial entity doing commercial software development, would you send all your source code and all your development history over to another company?

Sure, they call themselves “secure”. But what does that mean? Sure, they have SSL and SSH support, but frankly, I’m less concerned with patches travelling over the network unencrypted than I am with trusting anybody at all to host my code.

Even if they don’t screw up storage security (think: “accessing the code of your competition”), even if they are completely 100% trustworthy (think: “displeased employee selling out to your competition before leaving his employer”), there is still the issue of government/legal access.

When using an external hosting provider, you are storing your code (and history) in a foreign country with its own legislation. Are you prepared for that?

And finally, do you want the government of the country you’ve just sent your code (and history) to, to really have access to all that data? Who guarantees that the hosting provider of your choice won’t cooperate as soon as the government comes knocking (it has happened before, even without any legal basis at all)?

All that is never worth the risk for a larger company – or for smaller ones, like ours.

So who exactly are these hosting companies (github is one; Code Spaces is another) targeting?

  • Free software developers? Their code is open to begin with, so they have to face the problems I described anyway. But they are much harder to sue. Also, I’m not sure how compelling it is for a free software project to use a non-free tool (Rails being the exception, but we’ll talk about that later on).
  • Large companies? No way (see above)
  • Smaller companies? Probably not. Smaller companies are less of a target due to lower visibility, but suing them is more likely to get you something in return quickly, as they usually don’t dare prolonged legal fights.