How I back up gmail

There was a discussion on HackerNews about Gmail having lost the email in some accounts. One sentiment in the comments was clear:

It’s totally the users problem if they don’t back up their cloud based email.

Personally, I think I would have to agree:

Google is a provider like every other ISP or basically any other service too. There’s no reason to believe that your data is more save on Google than it is any where else. Now granted, they are not exactly known for losing data, but there’s other things that can happen.

Like your account being closed because whatever automated system believed your usage patterns were consistent with those of a spammer.

So the question is: What would happen if your Google account wasn’t reachable at some point in the future?

For my company (using commercial Google Apps accounts), I would start up that IMAP server which serves all mail ever sent to and from Gmail. People would use the already existing webmail client or their traditional IMAP clients. They would lose some productivity, but no single byte of data.

This was my condition for migrating email over to Google. I needed to have a back up copy of that data. Otherwise, I would not have agreed to switch to a cloud based provider.

The process is completely automated too. There’s not even a backup script running somewhere. Heck, not even the Google Account passwords have to be stored anywhere for this to work.

So. How does it work then?

Before you read on, here are the drawbacks of the solution:

  • I’m a die-hard Exim fan (long story. It served me very well once – up to saving-my-ass level of well), so the configuration I’m outlining here is for Exim as the mail relay.
  • Also, this only works with paid Google accounts. You can get somewhere using the free ones, but you don’t get the full solution (i.e. having a backup of all sent email)
  • This requires you to have full control over the MX machine(s) of your domain.

If you can live with this, here’s how you do it:

First, you set up your Google domain as normal. Add all the users you want and do everything else just as you would do it in a traditional set up.

Next, we’ll have to configure Google Mail for two-legged OAuth access to our accounts. I’ve written about this before. We are doing this so we don’t need to know our users passwords. Also, we need to enable the provisioning API to get access to the list of users and groups.

Next, our mail relay will have to know about what users (and groups) are listed in our Google account. Here’s what I quickly hacked together in Python (my first Python script ever – be polite while flaming) using the GData library:

import gdata.apps.service

consumer_key = 'yourdomain.com'
consumer_secret = '2-legged-consumer-secret' #see above
sig_method = gdata.auth.OAuthSignatureMethod.HMAC_SHA1

service = gdata.apps.service.AppsService(domain=consumer_key)
service.SetOAuthInputParameters(sig_method, consumer_key,
  consumer_secret=consumer_secret, two_legged_oauth=True)

res = service.RetrieveAllUsers()
for entry in res.entry:
    print entry.login.user_name

import gdata.apps.groups.service

service = gdata.apps.groups.service.GroupsService(domain=consumer_key)
service.SetOAuthInputParameters(sig_method, consumer_key,
  consumer_secret=consumer_secret, two_legged_oauth=True)
res = service.RetrieveAllGroups()
for entry in res:
    print entry['groupName']

Place this script somewhere on your mail relay and run it in a cron job. In my case, I’m having its output redirected to /etc/exim4/gmail_accounts. The script will emit one user (and group) name per line.

Next, we’ll deal with incoming email:

In the Exim configuration of your mail relay, add the following routers:

yourdomain_gmail_users:
  driver = accept
  domains = yourdomain.com
  local_parts = lsearch;/etc/exim4/gmail_accounts
  transport_home_directory = /var/mail/yourdomain/${lc:$local_part}
  router_home_directory = /var/mail/yourdomain/${lc:$local_part}
  transport = gmail_local_delivery
  unseen

yourdomain_gmail_remote:
  driver = accept
  domains = yourdomain.com
  local_parts = lsearch;/etc/exim4/gmail_accounts
  transport = gmail_t

yourdomain_gmail_users is what creates the local copy. It accepts all mail sent to yourdomain.com, if the local part (the stuff in front of the @) is listed in that gmail_accounts file. Then it sets up some paths for the local transport (see below) and marks the mail as unseen so the next router gets a chance too.

Which is yourdomain_gmail_remote. This one is again checking domain and the local part and if they match, it’s just delegating to the gmail_t remote transport (which will then send the email to Google).

The transports look like this:

gmail_t:
  driver = smtp
  hosts = aspmx.l.google.com:alt1.aspmx.l.google.com:
    alt2.aspmx.l.google.com:aspmx5.googlemail.com:
    aspmx2.googlemail.com:aspmx3.googlemail.com:
    aspmx4.googlemail.com
  gethostbyname

gmail_local_delivery:
  driver = appendfile
  check_string =
  delivery_date_add
  envelope_to_add
  group=mail
  maildir_format
  directory = MAILDIR/yourdomain/${lc:$local_part}
  maildir_tag = ,S=$message_size
  message_prefix =
  message_suffix =
  return_path_add
  user = Debian-exim
  create_file = anywhere
  create_directory

the gmail_t transport is simple. The local one you might have to patch up users and groups plus the location where you what to write the mail to.

Now we are ready to reconfigure Google as this is all that’s needed to get a copy of every inbound mail into a local maildir on the mail relay.

Here’s what you do:

  • You change the MX of your domain to point to this relay of yours

The next two steps are the reason you need a paid account: These controls are not available for the free accounts:

  • In your Google Administration panel, you visit the Email settings and configure the outbound gateway. Set it to your relay.
  • Then you configure your inbound gateway and set it to your relay too (and to your backup MX if you have one).

This screenshot will help you:

gmail config

All email sent to your MX (over the gmail_t transport we have configured above) will now be accepted by gmail.

Also, Gmail will now send all outgoing Email to your relay which needs to be configured to accept (and relay) email from Google. This pretty much depends on your otherwise existing Exim configuration, but here’s what I added (which will work with the default ACL):

hostlist   google_relays = 216.239.32.0/19:64.233.160.0/19:66.249.80.0/20:
    72.14.192.0/18:209.85.128.0/17:66.102.0.0/20:
    74.125.0.0/16:64.18.0.0/20:207.126.144.0/20
hostlist   relay_from_hosts = 127.0.0.1:+google_relays

And lastly, the tricky part: Storing a copy of all mail that is being sent through Gmail (we are already correctly sending the mail. What we want is a copy):

Here is the exim router we need:

gmail_outgoing:
  driver = accept
  condition = "${if and{
    { eq{$sender_address_domain}{yourdomain.com} }
    {=={${lookup{$sender_address_local_part}lsearch{/etc/exim4/gmail_accounts}{1}}}{1}}} {1}{0}}"
  transport = store_outgoing_copy
  unseen

(did I mention that I severely dislike RPN?)

and here’s the transport:

store_outgoing_copy:
  driver = appendfile
  check_string =
  delivery_date_add
  envelope_to_add
  group=mail
  maildir_format
  directory = MAILDIR/yourdomain/${lc:$sender_address_local_part}/.Sent/
  maildir_tag = ,S=$message_size
  message_prefix =
  message_suffix =
  return_path_add
  user = Debian-exim
  create_file = anywhere
  create_directory

The maildir I’ve chosen is the correct one if the IMAP-server you want to use is Courier IMAPd. Other servers use different methods.

One little thing: When you CC or BCC other people in your domain, Google will send out multiple copies of the same message. This will yield some message duplication in the sent directory (one per recipient), but as they say: Better backup too much than too little.

Now if something happens to your google account, just start up an IMAP server and have it serve mail from these maildir directories.

And remember to back them up too, but you can just use rsync or rsnapshot or whatever other technology you might have in use. They are just directories containing one file per email.

Google Apps – Provisioning – Two-Legged OAuth

Our company uses Google Apps premium for Email and shared documents, but in order to have more freedom in email aliases, in order to have more control over email routing and finally, because there are a couple of local parts we use to direct mail to some applications, all our mail, even though it’s created in Google Apps and finally ends up in Google Apps, goes via a central mail relay we are running ourselves (well. I’m running it).

Google Apps premium allows you to do that and it’s a really cool feature.

One additional thing I’m doing on that central relay is to keep a backup of all mail that comes from Google or goes to Google. The reason: While I trust them not to lose my data, there are stories around of people losing their accounts to Googles anti-spam automatisms. This is especially bad as there usually is nobody to appeal to.

So I deemed it imperative that we store a backup of every message so we can move away from google if the need to do so arises.

Of course that means though that our relay needs to know what local parts are valid for the google apps domain – after all, I don’t want to store mail that would later be bounced by google. And I’d love to bounce directly without relaying the mail unconditionally, so that’s another reason why I’d want to know the list of users.

Google provides their provisioning API to do that and using the GData python packages, you can easily access that data. In theory.

Up until very recently, the big problem was that the provisioning API didn’t support OAuth. That meant that my little script that retreives the local parts had to have a password of an administrator which is something that really bugged me as it meant that either I store my password in the script or I can’t run the script from cron.

With the Google Apps Marketplace, they fixed that somewhat, but it still requires a strange dance:

When you visit the OAuth client configuration (https://www.google.com/a/cpanel/YOURDOMAIN/ManageOauthClients), it lists you domain with the note “This client has access to all APIs.”.

This is totally not true though as Google’s definition of “all” apparently doesn’t include “Provisioning” :-)

To make two-legged OAuth work for the provisioning API, you have to explicitly list the feeds. In my case, this was Users and Groups:

Under “Client Name”, add your domain again (“example.com”) and unter One or More API Scopes, add the two feeds like this: “https://apps-apis.google.com/a/feeds/group/#readonly,https://apps-apis.google.com/a/feeds/user/#readonly”

This will enable two-legged OAuth access to the user and group lists which is what I need in my little script:

import gdata.apps.service
import gdata.apps.groups.service

consumer_key = 'YOUR.DOMAIN'
consumer_secret = 'secret' #check advanced / OAuth in you control panel
sig_method = gdata.auth.OAuthSignatureMethod.HMAC_SHA1

service = gdata.apps.service.AppsService(domain=consumer_key)
service.SetOAuthInputParameters(sig_method, consumer_key, consumer_secret=consumer_secret, two_legged_oauth=True)

res = service.RetrieveAllUsers()
for entry in res.entry:
    print entry.login.user_name

service = gdata.apps.groups.service.GroupsService(domain=consumer_key)
service.SetOAuthInputParameters(sig_method, consumer_key, consumer_secret=consumer_secret, two_legged_oauth=True)
res = service.RetrieveAllGroups()
for entry in res:
    print entry['groupName']

… and back to Thunderbird

It has been a while since I’ve last posted about email – still a topic very close to my heart, be it on the server side or on the client side (though the server side generally works very well here, which is why I don’t post about it).

Waybackwhen, I’ve written about Becky! which is also where I’ve declared the points I deemed important in a mail client. A bit later, I’ve talked about The Bat!, but in the end, I’ve settled with Thunderbird, just to switch to Mac Mail when I’ve switched to the Mac.

After that came my excursion to Gmail, but now I’m back to Thunderbird again.

Why? After all, my Gmail review sounded very nice, didn’t it?

Well…

  • Gmail is blazingly fast once it’s loaded, but starting the browser and then gmail (it loads so slow that “starting (the) gmail (application)” is a valid term to use) is always slower than just keeping a mail client open on the desktop.
  • Google Calendar Sync sucks and we’re using Exchange/Outlook here (and are actually quite happy with it – for calendaring and address books – it sucks for mail, but it provides decent IMAP support), so there was no way for the other folks here to have a look at my calendar.
  • Gmail always posts a “Sender:”-Header when using a custom sender domain which technically is the right thing to do, but Outlook on the receiving end screws up by showing the mail as being “From xxx@gmail.com on behalf of xxx@domain.com” which isn’t really what I’d want.
  • Google’s contact management offering is sub par compared even to Exchange.
  • iPhone works better with Exchange than it does with Google (yes. iPhone, but that’s another story).
  • The cool Gmail MIDP client doesn’t work/isn’t needed on the iPhone, but originally was one of the main reasons for me to switch to Gmail.

The one thing I really loved about Gmail though was the option of having a clean inbox by providing means for archiving messages with just a single keyboard shortcut. Using a desktop mail client without that funcationality wouldn’t have been possible for me any more.

This is why I’ve installed Nostalgy, a Thunderbird extension allowing me to assign a “Move to Folder xxx” action to a single keystroke (y in my case – just like gmail).

Using Thunderbird over Mac Mail has its reasons in the performance and in the crazy idea of Mac Mail to always download all the messages. Thunderbird is no race horse, but Mail.app isn’t even a slug.

Lately, more and more interesting posts regarding the development of Thunderbird have appeared on Planet Mozilla, so I’m looking forward to see Thunderbird 3 taking shape in its revitalized form.

I’m all but conservative in my choice of applications and gadgets, but Mail  – also because of its importance for me – must work exactly as I want it. None of the solutions out there are doing that to the full extent, but TB certainly comes closest. Even after years of testing and trying out different solutions, TB is the thing that solves most of my requirements without adding new issues.

Gmail is splendid too, but it presents some shortcomings TB doesn’t come with.

Gmail – The review

It has been quite a while since I began routing my mail to Gmail with the intention of checking that often-praised mail service out thoroughly.

The idea was to find out if it’s true what everyone keeps saying: That gmail has a great user interface, that it provides all the features one needs and that it’s a plain pleasure to work with it.

Personally, I’m blown away.

Despite the obviously longer load time to be able to access the mailbox (Mac Mail launches quicker than it takes gmail to load here – even with a 10 MBit/s connection), the gmail interface is much faster to use – especially with the nice keyboard shortcuts – but I’m getting ahead of myself.

When I began to use the interface for some real email work, I immediately noticed the shift of paradigm: There are no folders and – the real new thing for me – you are encouraged to move your mail out of the inbox as you take notice of them and/or complete the task associated with the message.

When you archive a message, it moves out of the inbox and is – unless you tag it with a label for quick retrieval – only accessible via the (quick) full text search engine built into the application.

The searching part of this usage philosophy is known to me. When I was using desktop clients, I usually kept arriving email in my inbox until it contained somewhere around 1500 messages or so. Then I grabbed all the messages and put them to my “Old Mail” folder where I accessed them strictly via the search functionality built into the mail client (or the server in case of a good IMAP client).

What’s new for me is the notion of moving mail out of your inbox as you stop being interested in the message – either because you plain read it or because the associated task is completed.

This allows you for a quick overview over the tasks still pending and it keeps your inbox nice and clean.

If you want quick access to certain messages, you can tag them with any label you want (multiple labels per message are possible of course) in which case you can access the messages with one click, saving you the searching.

Also, it’s possible to define filters allowing you to automatically apply labels to messages and – if you want, move them out of the inbox automatically – a perfect setup for the SVN commit messages I’m getting, allowing me to quickly access them at the end of the day and looking over the commits.

But the real killer feature of gmail is the keyboard interface.

Gmail is nearly completely accessible without requiring you to move your hands off the keyboard. Additionally, you don’t even need to press modifier keys as the interface is very much aware of state and mode, so it’s completely usable with some very intuitive shortcuts which all work by pressing just any letter button.

So usually, my workflow is like this: Open gmail, press o to open the new message, read it, press y to archive it, close the browser (or press j to move to the next message and press o again to open it).

This is as fast as using, say, mutt on the console, but with the benefit of staying usable even when you don’t know which key to press (in that case, you just take the mouse).

Gmail is perfectly integrated into google calendar, and it’s – contrary to mac mail – even able to detect outlook meeting invitations (and send back correct responses).

Additionally, there’s a MIDP applet available for your mobile phone that’s incredibly fast and does a perfect job of giving you access to all your email messages when you are on the road. As it’s a Java application, it runs on pretty much every conceivable mobile phone and because it’s a local application, it’s fast as hell and can continue to provide the nice, keyboard shortcut driven interface which we are used to from the AJAXy web application.

Overall, the experiment of switching to gmail proofed to be a real success and I will not switch back anytime soon (all my mail is still archived in our Exchange IMAP box). The only downside I’ve seen so far is that if you use different email-aliases with your gmail-account, gmail will set the Sender:-Header to your gmail-address (which is a perfectly valid – and even mandated – thing to do), and the stupid outlook on the receiving end will display the email as being sent from your gmail adress “in behalf of” your real address, exposing your gmail-address at the receiving. Meh. So for sending non-private email, I’m still forced to use Mac Mail – unfortunately.

Trying out Gmail

Everyone and their friends seems to be using Gmail lately and I agree: The application has a clean interface, a very powerful search feature and is easily accessible from anywhere.

I have my Gmail address from back in the days when invites were scarce and the term AJAX wasn’t even a term yet, but I never go around to really take advantage of the services as I just don’t see myself checking various email accounts at various places – at least not for serious business.

But now I found a way to put gmail to the test as my main email application – at least for a week or two.

My main mail storage is and will be our Exchange server. I have multiple reasons for that

  1. I have all my email I ever sent or received in that IMAP account. That’s WAY more than the 2.8 GB you get in Gmail and even if I had enough space there, I would not want to upload all my messages there.
  2. I don’t trust gmail to be as diligent with the messages I store there as I would want it to. I managed to keep every single email message from 1998 till now and I’d hate to lose all that to a “glitch in the system”.
  3. I need IMAP access to my messages for various purposes.
  4. I need the ability of a strong server-side filtering to remove messages I’m more or less only receiving for logging purposes. I don’t want to see these – not until I need them. No reason to even have them around usually.

So for now I have added yet another filter to my collection of server-side filters: This time I’m redirecting a copy of all mail that didn’t get filtered away due to various reasons to my Gmail address. This way I get to keep all mail of my various aliases all at the central location where they always were and I can still use Gmail to access the newly arrived messages.

Which leaves the problem with the sent messages which I ALSO want to archive at my own location – at least the important ones.

I fixed this by BCCing all Mail I’m writing in gmail to a new alias I created. Mail to that alias with my Gmail address as sender will be filtered into my sent-box by Exchange so it’ll look as though I sent the message via Thunderbird and then uploaded the copy via IMAP.

I’m happy with this solution, so testing Gmail can begin.

I’m asking myself: Is a tag based storage system better than a purely search based (the mail I don’t filter away is kept in one big INBOX which I access purely via search queries if I need something)? Is a web based application as powerful as a mail client like Thunderbird or Apple Mail? Do I unconsciously use features I’m going to miss when using Gmail instead of Apple Mail or Thunderbird? Will I be able to get used to the very quick keyboard-interface to gmail?

Interesting questions I intend to answer.