How I back up gmail

There was a discussion on HackerNews about Gmail having lost the email in some accounts. One sentiment in the comments was clear:

It’s totally the users problem if they don’t back up their cloud based email.

Personally, I think I would have to agree:

Google is a provider like every other ISP or basically any other service too. There’s no reason to believe that your data is more save on Google than it is any where else. Now granted, they are not exactly known for losing data, but there’s other things that can happen.

Like your account being closed because whatever automated system believed your usage patterns were consistent with those of a spammer.

So the question is: What would happen if your Google account wasn’t reachable at some point in the future?

For my company (using commercial Google Apps accounts), I would start up that IMAP server which serves all mail ever sent to and from Gmail. People would use the already existing webmail client or their traditional IMAP clients. They would lose some productivity, but no single byte of data.

This was my condition for migrating email over to Google. I needed to have a back up copy of that data. Otherwise, I would not have agreed to switch to a cloud based provider.

The process is completely automated too. There’s not even a backup script running somewhere. Heck, not even the Google Account passwords have to be stored anywhere for this to work.

So. How does it work then?

Before you read on, here are the drawbacks of the solution:

  • I’m a die-hard Exim fan (long story. It served me very well once – up to saving-my-ass level of well), so the configuration I’m outlining here is for Exim as the mail relay.
  • Also, this only works with paid Google accounts. You can get somewhere using the free ones, but you don’t get the full solution (i.e. having a backup of all sent email)
  • This requires you to have full control over the MX machine(s) of your domain.

If you can live with this, here’s how you do it:

First, you set up your Google domain as normal. Add all the users you want and do everything else just as you would do it in a traditional set up.

Next, we’ll have to configure Google Mail for two-legged OAuth access to our accounts. I’ve written about this before. We are doing this so we don’t need to know our users passwords. Also, we need to enable the provisioning API to get access to the list of users and groups.

Next, our mail relay will have to know about what users (and groups) are listed in our Google account. Here’s what I quickly hacked together in Python (my first Python script ever – be polite while flaming) using the GData library:

import gdata.apps.service

consumer_key = 'yourdomain.com'
consumer_secret = '2-legged-consumer-secret' #see above
sig_method = gdata.auth.OAuthSignatureMethod.HMAC_SHA1

service = gdata.apps.service.AppsService(domain=consumer_key)
service.SetOAuthInputParameters(sig_method, consumer_key,
  consumer_secret=consumer_secret, two_legged_oauth=True)

res = service.RetrieveAllUsers()
for entry in res.entry:
    print entry.login.user_name

import gdata.apps.groups.service

service = gdata.apps.groups.service.GroupsService(domain=consumer_key)
service.SetOAuthInputParameters(sig_method, consumer_key,
  consumer_secret=consumer_secret, two_legged_oauth=True)
res = service.RetrieveAllGroups()
for entry in res:
    print entry['groupName']

Place this script somewhere on your mail relay and run it in a cron job. In my case, I’m having its output redirected to /etc/exim4/gmail_accounts. The script will emit one user (and group) name per line.

Next, we’ll deal with incoming email:

In the Exim configuration of your mail relay, add the following routers:

yourdomain_gmail_users:
  driver = accept
  domains = yourdomain.com
  local_parts = lsearch;/etc/exim4/gmail_accounts
  transport_home_directory = /var/mail/yourdomain/${lc:$local_part}
  router_home_directory = /var/mail/yourdomain/${lc:$local_part}
  transport = gmail_local_delivery
  unseen

yourdomain_gmail_remote:
  driver = accept
  domains = yourdomain.com
  local_parts = lsearch;/etc/exim4/gmail_accounts
  transport = gmail_t

yourdomain_gmail_users is what creates the local copy. It accepts all mail sent to yourdomain.com, if the local part (the stuff in front of the @) is listed in that gmail_accounts file. Then it sets up some paths for the local transport (see below) and marks the mail as unseen so the next router gets a chance too.

Which is yourdomain_gmail_remote. This one is again checking domain and the local part and if they match, it’s just delegating to the gmail_t remote transport (which will then send the email to Google).

The transports look like this:

gmail_t:
  driver = smtp
  hosts = aspmx.l.google.com:alt1.aspmx.l.google.com:
    alt2.aspmx.l.google.com:aspmx5.googlemail.com:
    aspmx2.googlemail.com:aspmx3.googlemail.com:
    aspmx4.googlemail.com
  gethostbyname

gmail_local_delivery:
  driver = appendfile
  check_string =
  delivery_date_add
  envelope_to_add
  group=mail
  maildir_format
  directory = MAILDIR/yourdomain/${lc:$local_part}
  maildir_tag = ,S=$message_size
  message_prefix =
  message_suffix =
  return_path_add
  user = Debian-exim
  create_file = anywhere
  create_directory

the gmail_t transport is simple. The local one you might have to patch up users and groups plus the location where you what to write the mail to.

Now we are ready to reconfigure Google as this is all that’s needed to get a copy of every inbound mail into a local maildir on the mail relay.

Here’s what you do:

  • You change the MX of your domain to point to this relay of yours

The next two steps are the reason you need a paid account: These controls are not available for the free accounts:

  • In your Google Administration panel, you visit the Email settings and configure the outbound gateway. Set it to your relay.
  • Then you configure your inbound gateway and set it to your relay too (and to your backup MX if you have one).

This screenshot will help you:

gmail config

All email sent to your MX (over the gmail_t transport we have configured above) will now be accepted by gmail.

Also, Gmail will now send all outgoing Email to your relay which needs to be configured to accept (and relay) email from Google. This pretty much depends on your otherwise existing Exim configuration, but here’s what I added (which will work with the default ACL):

hostlist   google_relays = 216.239.32.0/19:64.233.160.0/19:66.249.80.0/20:
    72.14.192.0/18:209.85.128.0/17:66.102.0.0/20:
    74.125.0.0/16:64.18.0.0/20:207.126.144.0/20
hostlist   relay_from_hosts = 127.0.0.1:+google_relays

And lastly, the tricky part: Storing a copy of all mail that is being sent through Gmail (we are already correctly sending the mail. What we want is a copy):

Here is the exim router we need:

gmail_outgoing:
  driver = accept
  condition = "${if and{
    { eq{$sender_address_domain}{yourdomain.com} }
    {=={${lookup{$sender_address_local_part}lsearch{/etc/exim4/gmail_accounts}{1}}}{1}}} {1}{0}}"
  transport = store_outgoing_copy
  unseen

(did I mention that I severely dislike RPN?)

and here’s the transport:

store_outgoing_copy:
  driver = appendfile
  check_string =
  delivery_date_add
  envelope_to_add
  group=mail
  maildir_format
  directory = MAILDIR/yourdomain/${lc:$sender_address_local_part}/.Sent/
  maildir_tag = ,S=$message_size
  message_prefix =
  message_suffix =
  return_path_add
  user = Debian-exim
  create_file = anywhere
  create_directory

The maildir I’ve chosen is the correct one if the IMAP-server you want to use is Courier IMAPd. Other servers use different methods.

One little thing: When you CC or BCC other people in your domain, Google will send out multiple copies of the same message. This will yield some message duplication in the sent directory (one per recipient), but as they say: Better backup too much than too little.

Now if something happens to your google account, just start up an IMAP server and have it serve mail from these maildir directories.

And remember to back them up too, but you can just use rsync or rsnapshot or whatever other technology you might have in use. They are just directories containing one file per email.

PHP 5.3 and friends on Karmic

I have been patient. For months I hoped that Ubuntu would sooner or later get PHP 5.3, a release I’m very much looking forward to, mainly because of the addition of anonymous inner functions to spell the death of create_function or even eval.

We didn’t get 5.3 for Karmic and who knows about Lucid even (it’s crazy that nearly one year after the release of 5.3, there is still debate on whether to include it in the next version of Ubuntu that will be the current LTS release for the next four years. This is IMHO quite the disservice against PHP 5.3 adoption).

Anyways: We are in the process of releasing a huge update to PopScan that is heavily focussed on getting rid of cruft, increasing speed all over the place and increasing overall code quality. Especially the last part could benefit from having 5.3 and seeing that at this point PopScan already runs well on 5.3, I really wanted to upgrade.

In comes Al-Ubuntu-be, a coworker of mine and his awesome Debian packaging skills: Where there are already a few PPAs out there that contain a 5.3 package, Albe went the extra step and added not only PHP 5.3 but quite many other packages we depend upon that might also be useful to my readers. Packages like APC, memcache, imagick and xdebug for development.

While we can make no guarantees that these packages will be maintained heavily, they will get some security update treatment (though highly likely by version bumping as opposed to backporting).

So. If you are on Karmic (and later Lucid if it won’t get 5.3) and want to run PHP 5.3 with APC and Memcache, head over to Albe’s PPA.

Also, I’d like to take the opportunity to thank Albe for his efforts: Having a PPA with real .deb packages as opposed to just my self-compiled mess I would have done gives us a much nicer way of updating existing installations to 5.3 and even a much nicer path back to the original packages once they come out. Thanks a lot.

Snow Leopard and PHP

Earlier versions of Mac OS X always had pretty outdated versions of PHP in their default installation, so what you usually did was to go to entropy.ch and fetch the packages provided there.

Now, after updating to Snow Leopard you’ll notice that the entropy configuration has been removed and once you add it back in, you’ll see Apache segfaulting and some missing symbol errors.

Entropy has not updated the packages to snow leopard yet, so you could have a look at PHP that came with stock snow leopard: This time it’s even bleeding edge: Snow Leopard comes with PHP 5.3.0.

Unfortunately though, some vital extensions are missing, most notably for me, the PostgeSQL extension.

This time around though, Snow Leopard comes with a functioning PHP development toolset, so there’s nothing stopping you to build it yourself, so here’s how to get the official PostgreSQL extension working on Snow Leopard’s stock php:

  1. Make sure that you have installed the current Xcode Tools. You’ll need a working compiler for this.
  2. Make sure that you have installed PostgreSQL and know where it is on your machine. In my case, I’ve used the One-click installer from EnterpriseDB (which persisted the update to 10.6).
  3. Now that Snow Leopard uses a full 64bit userspace, we’ll have to make sure that the PostgreSQL client library is available as a 64 bit binary – or even better, as an universal binary.Unfortunately, that’s not the case with the one-click installer, so we’ll have to fix that first:
    1. Download the sources of the PostgreSQL version you have installed from postgresql.org
    2. Open a terminal and use the following commands:
      % tar xjf postgresql-[version].tar.bz2
      % cd postgresql-[version]
      % CFLAGS="-arch i386 -arch x86_64" ./configure --prefix=/usr/local/mypostgres
      % make

      make will fail sooner or later because you the postgres build scripts can’t handle building an universal binary server, but the compile will progress enough for us to now build libpq. Let’s do this:

      % make -C src/interfaces
      % sudo make -C src/interfaces install
      % make -C src/include
      % sudo make -C src/include install
      % make -C src/bin
      % sudo make -C src/bin install
  4. Download the php 5.3.0 source code from their website. I used the bzipped version.
  5. Open your Terminal and cd to the location of the download. Then use the following commands:
    % tar -xjf php-5.3.0.tar.bz2
    % cd php-5.3.0/ext/pgsql
    % phpize
    % ./configure --with-pgsql=/usr/local/mypostgres
    % make -j8 # in case of one of these nice 8 core macs :p
    % sudo make install
    % cd /etc
    % cp php.ini-default php.ini
  6. Now edit your new php.ini and add the line extension=pgsql.so

And that’s it. Restart Apache (using apachectl or the System Preferences) and you’ll have PostgreSQL support.

All in all this is a tedious process and it’s the price us early adopters have to pay constantly.

If you want an honest recommendation on how to run PHP with PostgreSQL support on Snow Leopard, I’d say: Don’t. Wait for the various 3rd party packages to get updated.

Tunnel munin nodes over HTTP

Last time I’ve talked about Munin, the one system monitoring tool I feel working well enough for me to actually bother to work with. Harsh words, I know, but the key to every solution is simplicity. And simple Munin is. Simple, but still powerful enough to do everything I would want it to do.

The one problem I had with it is that the querying of remote nodes works over a custom TCP port (4949) which doesn’t work behind firewalls.

There are some SSH tunneling solutions around, but what do you do if even SSH is no option because the remote access method provided to you relies on some kind of VPN technology or access token.

Even if you could keep a long-running VPN connection, it’s a very performance intensive solution as it requires resources on the VPN gateway. But this point is moot anyways because nearly all VPNs terminate long running connections. If re-establishing the connection requires physical interaction, then you are basically done here.

This is why I have created a neat little solution which tunnels the munin traffic over HTTP. It works with a local proxy server your munin monitoring process will connect to and a little CGI-script on the remote end.

This will cause multiple HTTP connections per query interval (the proxy uses Keep-Alive though so it’s not TCP connections we are talking about – it’s just hits in the access.log you’ll have to filter out somehow) because it’s impossible for a CGI script to keep the connection open and send data both ways – at least not if your server-side is running plain PHP which is the case in the setup I was designing this for.

Aynways – the solution works flawlessly and helps me to monitor a server behind one hell of a firewall and behind a reverse proxy.

You’ll find the code here (on GitHub as usual) and some explanation on how to use it is here.

Licensed under the MIT license as usual.

Monitoring servers with munin

If you want to monitor runtime parameters of your machines, there are quite many tools available.

But in the past, I’ve never been quite happy with any of them. Some didn’t work, others didn’t work right and some others worked ok but then stopped working all of a sudden.

All of them were a pain to install and configure.

Then, a few days ago, I found Munin. Munin is optimized for simplicity, which makes it work very, very well. And the reports actually look nice and readable which is a nice additional benefit.

Apache parameters

Like many other system monitoring solutions, Munin relies on custom plugins to access the various system parameters. Unlike other solutions though, the plugins are very easy to write, understand and debug which encourages you to write your own.

Adding additional servers to be watched is a matter of configuring the node (as in “apt-get install munin-node”) and adding two lines to your master server configuration file.

Adding another plugin for a new parameter to monitor is a matter of creating one symlink and restarting the node.

Manifestation of a misconfigured cronjob

On the first day after deployment the tool already proved useful in finding a misconfigured cronjob on on server which ran every minute for one hour every second hour instead of once per two hours.

Munin may not have all the features of the foll-blown solutions, but it has three real advantages over everything else I’ve seen so far:

  1. It’s very easy to install and configure. What good is an elaboration solution if you can never get it to work correctly?
  2. It looks very nicely and clean. If looking at the reports hurts the eyes, you don’t look at them or you don’t understand what they want to tell you.
  3. Because the architecture is so straight-forward, you can create customized counters in minutes which in the end provides you with a much better overview over what is going on.

The one big drawback is that the master data collector needs to access the monitored servers on port 4949 which is not exactly firewall-friendly.

Next time, we’ll learn how to work around that (and I don’t mean the usual ssh tunnel solution).

Dropbox

Dropbox is cloud storage on the next level: You install their little application – available for Linux, Mac OS X and Windows – which will create a folder which will automatically be kept synchronized between all the computers where you have installed that little application on.

Because it synchronizes in the background and always keeps the local copy around, the access-speed isn’t different from a normal local folder – mainly because it is, after all, a local folder you are accessing. Dropbox is not one of these slow “online hard drives” it’s more like rsync in the background (and rsync it is – the application is intelligent enough to only transmit deltas – even from binary files).

They do provide you with a web interface of course, but the synchronizing aspect is the most interesting.

The synchronized data ends up somewhere in Amazon’s S3 service, which is fine with me.

Unfortunately, while the data stored in an encrypted fashion on S3, the key is generated by the Dropbox server and thus known to them, which makes Dropbox completely unusable for sensitive unencrypted data. They do state in the FAQ that this will maybe change sometime in the future, but for not it is as it is.

Still, I found some use for Dropbox: ~/Library/Preferences, ~/.zshrc and ~/.ssh all are now stored in ~/Dropbox/System and symlinked back to their original place. This means that a large chunk of my user profile is availalbe on all the computers I’m working on. I would even try the same trick with ~/Library/Application Support, but that seems risky due to the missing encryption and due to the fact that Application Support sometimes contains database files which get corrupted for sure when moved around while they are open – like the Firefox profile.

This naturally even works when the internet connection is down – DropBox synchronizes changes locally, so when the internet (or Dropbox) is down, I just have the most recent copy of when the service was still working – that’s more than good enough.

Another use that comes to mind for Dropbox storage are game save files or addons you’d want to have access to on every computer you are using – just move your stuff to ~/Dropbox and symlink it back to the original place.

Very convenient.

Now if only they’d provide me with a way to provide my own encryption key. That way I would instantly buy the pro account with 25GB of storage and move lots and lots of data in there.

Dropbox is the answer to the ever increasing amount of computers in my life because now I don’t care about setting up the same stuff over and over again. It’s just there and ready. Very helpful.

Listen to your home music from the office

My MP3 collection is safely stored on shion, on a drobo mounted as /nas. Naturally, I want to listen to said music from the office – especially considering my fully routed VPN access between the office and my home infrastructure and the upstream which suffices for at least 10 concurrent 128bit streams (boy – technology has changed in the last few years – I remember the times where you couldn’t reliably stream 128 bit streams – let alone my 160/320 mp3s).

I’ve tried many things so far to make this happen:

  • serve the files with a tool like jinzora. This works, but I don’t really like jinzora’s web interface and I was never able to get it to work correctly on my Ubuntu box. I was able to trace it down to null bytes read from their tag parser, but the code is very convoluted and practically unreadable without putting quite some effort into that. Considering that I didn’t much like the interface in the first place, I didn’t want to invest time into that.
  • Use a SlimServer (now Squeezecenter) with a softsqueeze player. Even though I don’t use my squeezebox (an original model with the original slimdevices brand, not the newer Logitech one) any more because the integrated amplifier in the Sonos players works much better for my current setup. This solution worked quite ok, but the audio tends to stutter a bit at the beginning of tracks, indicating some buffering issues.
  • Use iTune’s integrated library sharing feature. This seemed both undoable and unpractical. Unpractical because it would force me to keep my main mac running all the time and undoable because iTunes sharing can’t pass subnet boundaries. Aside of that, it’s a wonderful solution as audio doesn’t stutter, I already know the interface and access is very quick and convenient.

But then I found out how to make the iTunes thing both very much doable and practical.

The network boundary problem can be solved using Network Beacon, a ZeroConf proxy. Start the application, create a new beacon. Chose any service name, use «_daap._tcp.» as service type, set the port number to 3689, enable the host proxy, keep the host name clear and enter the IP address of the system running iTunes (or firefly – see below).

Oh, and the target iTunes refuses to serve out data to machines in different subnets, so to be able to directly access a remote iTunes, you’d also have to set up an SSH tunnel.

Using Network Beacon, ZeroConf quickly begins working across any subnet boundaries.

The next problem was about the fact that I was forced to keep my main workstation running at home. I fixed that with Firefly Media Server for which even a pretty recent prebuilt package exists for Ubuntu (apt-get install mt-daapd).

I’ve installed that, configured iptables to drop packets for port 3689 on the external interface, configured Firefly to use the music share (which basically is a current backup of the itunes library of my main workstation – rsync for the win).

Firefly in this case even detects the existing iTunes playlists (as the music share is just a backup copy of my iTunes library – including the iTunes Library.xml), though smart playists don’t work, but can easily be recreated in the firefly web interface.

This means that I can access my complete home mp3 library from the office, stutter free, using an interface I’m well used to, without being forced to keep my main machine running all the time.

And it isn’t even that much of a hack and thus easy to rebuild should the need arise.

I’d love to not be forced to do the Network Beacon thing, but avahi doesn’t relay ZeroConf information across VPN interfaces.