Joining Debian to Active Directory

This blog post is a small list of magic incantations to be issued and animals to be sacrificed in order to join a Unix machine (Debian in this case) to a (Samba-powered) Active Directory domain.

All of these things have to be set up correctly or you will suffer eternal damnation in non-related-error-message hell:

  • Make absolutely sure that DNS works correctly
    • The new member server’s hostname must be in the DNS domain of the AD domain.
    • This absolutely includes reverse lookups.
    • Same goes for the domain controller. Again: Absolutely make sure that you set up a correct PTR record for your domain controller or you will suffer the curse of GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Server not found in Kerberos database)
  • Disable IPv6 everywhere. I normally advocate fixing the actual problem rather than disabling IPv6, but bugs exist. Failing to disable IPv6 on either the server or the client will also land you in Server not found in Kerberos database hell.
  • If you made previous attempts to join your member server, even if you later left the domain again, there’s probably a lingering host record left behind by a previous DNS update attempt. If that exists, your member server will be in ERROR_DNS_UPDATE_FAILED hell even if DNS is configured correctly.
    • To check, run samba-tool on the domain controller: samba-tool dns query your.dc.ip.address your.domain.name memberservername ALL
    • If there’s a record, get rid of it using samba-tool dns delete your.dc.ip.address your.domain.name memberservername A ip.returned.above
  • Make sure that the TLS certificate served by your AD server is trusted, either directly or chained to a trusted root. If you’re using a self-signed root (you probably are), add the root as a PEM file (but with a .crt extension!) to /usr/local/share/ca-certificates/ and run /usr/sbin/update-ca-certificates. If you fail to do this correctly, you will suffer in ldap_sasl_interactive_bind_s: Can't contact LDAP server (-1) hell (no, nothing will tell you about a certificate error; all you get is «can't contact»).
  • To check that everything is set up correctly before even trying realmd or sssd, use ldapsearch: ldapsearch -H ldap://your.dc.host/ -Y GSSAPI -N -b "dc=your,dc=base,dc=dn" "(objectClass=user)"
  • Aside from all that, you can follow this guide, but make sure that you also manually install the krb5-user package. The Debian package database is missing a dependency, so the package doesn’t get pulled in even though it is required.
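The DNS part of that list can be checked from the member server before touching realmd at all. Here is a minimal Python sketch, assuming the standard library resolver sees the same DNS that Kerberos will use; the function name check_dns and the host names are hypothetical, and the resolver functions are injectable so the logic can be tried without a live domain:

```python
import socket
from typing import Callable, List, Tuple

# Same signatures as socket.gethostbyname / socket.gethostbyaddr.
Forward = Callable[[str], str]
Reverse = Callable[[str], Tuple[str, List[str], List[str]]]


def check_dns(fqdn: str, ad_domain: str,
              forward: Forward = socket.gethostbyname,
              reverse: Reverse = socket.gethostbyaddr) -> List[str]:
    """Return a list of problems; an empty list means all checks passed.

    Checks (a sketch, not exhaustive):
      * the host name lies inside the AD DNS domain
      * the forward (A) lookup succeeds
      * the reverse (PTR) lookup of that address points back to the same name
    """
    problems = []
    name = fqdn.lower().rstrip(".")
    domain = ad_domain.lower().rstrip(".")
    if not name.endswith("." + domain):
        problems.append(f"{fqdn} is not inside the DNS domain {ad_domain}")
    try:
        address = forward(name)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return problems + [f"forward lookup of {fqdn} failed: {exc}"]
    try:
        ptr_name, aliases, _ = reverse(address)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return problems + [f"reverse lookup of {address} failed: {exc}"]
    names = {n.lower().rstrip(".") for n in [ptr_name, *aliases]}
    if name not in names:
        problems.append(f"PTR for {address} is {ptr_name}, expected {fqdn}")
    return problems
```

Run it as check_dns("member.ad.example.com", "ad.example.com") for the member server and once more for the domain controller; an empty list means forward and reverse lookups agree.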

All in all, this was a really bad case of XKCD 979, and in case you’re wondering whether I’m bitter: yes. I am bitter.

I can totally see that there are a ton of moving parts involved, and I’m willing to nudge some of them in order to get the engine up and running. But it would definitely help if the various tools involved gave me meaningful log output. Samba on the domain controller doesn’t log, tcpdump is pointless thanks to SSL everywhere, realmd fails silently while still saying that everything is ok (also, it unconditionally removes the config files it feeds into the various moving parts, so good luck trying to debug this), sssd emits cryptic error messages (see above) and so on.

Anyways. I’m just happy I got this working, and now I’m off to reproduce it one more time, this time recording everything in Ansible.

Geek heaven

If I had to list the attributes I would like the ISP of my dreams to
have, it would be quite a list:

  • I would really like to have native IPv6 support. Yes. IPv4 will be
    sufficient for a very long time, but unless people start having access to
    IPv6, it’ll never see the wide deployment it needs if we want the internet
    to continue to grow. An internet where addresses are only available to
    people with a lot of money is not an internet any of us wants to be
    subjected to (see my post «asking for permission})
  • I would want my ISP to accept or even support network neutrality. For this
    to be possible, the ISP of my dreams would need to be nothing but an ISP so
    their motivations (provide better service) align with mine (getting better
    service). ISPs who also sell content have all the motivation to provide
    crappy Internet service in order to better sell their (higher-margin)
    content.
  • If I have technical issues, I want to be treated as somebody who obviously
    has a certain level of technical knowledge. I’m by no means an expert in
    networking technology, but I do know about powering it off and on again. If
    I have to say «shibboleet» to get to a real
    technician, so be it, but if that’s not needed, that’s even better.
  • The networking technology involved in getting me the connectivity I want
    should be widely available and thus easily replaceable if something breaks.
  • The networking technology involved should be as simple as possible: The
    more complex the hardware involved, the more stuff can break, especially
    when you combine cost-pressure for end-users with the need for high
    complexity.
  • The network equipment I’m installing at my home, and which thus has access
    to my LAN, needs to be equipment I own and fully control. I do not accept
    leased equipment I don’t have full access to.
  • And last but not least, I would really like to have as much bandwidth as possible

I’m sure I’m not alone in these wishes, even though they might seem
strange to «normal people».

But honestly: They just don’t know it, but they too have the same interests.
Nobody wants an internet that works like TV where you pay for access to a
curated small list of “approved” sites (see network neutrality and IPv6
support).

Nobody wants to get up and reboot their modem every now and then because it
crashed. Nobody wants to be charged with downloading illegal content
because their Wifi equipment was suddenly repurposed as an open access point
for other customers of an ISP.

Most of the wishes I list above are what it takes to make sure these horror
scenarios never come to pass, however unlikely they might seem now (though
getting up and rebooting the modem/router is something we already have to
deal with today).

So yes. While it’s getting rarer and rarer to have every point on my list
fulfilled, to the point where I thought it impossible to get all of them,
I’m happy to say that here in Switzerland, there is at least one ISP that
does all of this and more.

I’m talking about Init7 and especially their
awesome FTTH offering Fiber7 which very recently
became available in my area.

Let’s deal with the technology aspect first as this really isn’t the
important point of this post: What you get from them is pure 1Gbit/s
Ethernet. Yes, they do sell you a router box if you want one, but you can
just as well get a simple media converter, or just an SFP module to plug
into any (managed) switch with an SFP port.

If you have your own routing equipment, be it a Linux router like my
shion or be it any
Wifi Router, there’s no need to add any kind of additional complexity to
your setup.

No additional component that can crash, no software running in your home
that you don’t have the password to and certainly no sneakily opened
public WLANs (I’m looking at you,
cablecom).

Of course you get native IPv6 (a /48 which incidentally is room for
281474976710656 whole internets in your apartment) too.
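That «281474976710656 whole internets» figure is plain arithmetic: a /48 leaves 80 bits of address space, and the entire IPv4 internet is 2^32 addresses, so a /48 holds 2^80 / 2^32 = 2^48 of them. A quick check with Python’s standard ipaddress module, using the 2001:db8::/48 documentation prefix as a stand-in for a real allocation:

```python
import ipaddress

# 2001:db8::/48 is the IPv6 documentation prefix, used here as a stand-in.
home = ipaddress.ip_network("2001:db8::/48")
ipv4_internet = ipaddress.ip_network("0.0.0.0/0")  # all 2**32 IPv4 addresses

internets = home.num_addresses // ipv4_internet.num_addresses
print(internets)  # 281474976710656, i.e. 2**48
```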

But what’s really remarkable about Init7 isn’t the technical aspect (though,
again, it’s bloody amazing), but everything else:

  • Init7 was one of the first ISPs in Switzerland to offer IPv6 to end users.
  • Init7 doesn’t just support network neutrality.
    They actively fight for it.
  • They explicitly state
    that they are not selling content and they don’t intend to start doing so. They are just an ISP and as such their motivations totally align with mine.

There are a lot of geeky soft factors too:

  • Their press releases are written in OpenOffice (check the PDF properties
    of this one
    for example)
  • I got an email from a technical person on their end that was written using
    f’ing Claws Mail on Linux
  • Judging from the Received headers of their email, they are using IPv6 in
    their internal LAN – down to the desktop workstations. And related to that:
  • The machines in their LAN respond to ICMPv6 pings which is utterly crazy
    cool. Yes. They are firewalled (cough I had to try. Sorry.), but they let
    ICMP through. For the not as technical readers here: This is as good an
    internet citizen as you will ever see and it’s extremely unexpected these
    days.

If you are a geek like me and if your ideals align with the ones I listed
above, there is no question: You have to support them. If you can have their
Fiber offering in your area, this is a no-brainer. You can’t get synchronous
1 Gbit/s for CHF 64ish per month anywhere else and even if you did, it
wouldn’t be plain Ethernet either.

If you can’t have their fiber offering, it’s still worth considering their
other offers. They do have some DSL based plans which of course are
technically inferior to plain Ethernet over fiber, but you would still
support one of the few remaining pure ISPs.

It doesn’t have to be Init7 either. For all I know there are many others,
maybe even here in Switzerland. Init7 is what I initially decided to go with
because of the Gbit, but the more I learned about their philosophy, the less
important the bandwidth got.

We need to support companies like these because companies like these are
what ensures that the internet of the future will be as awesome as the
internet is today.

Thoughts on IPv6

A few months ago, the awesome provider Init7 released their
awesome FTTH offering Fiber7, which provides
synchronous 1 Gbit/s access for a very fair price. Actually, they are by
far the cheapest provider for this kind of bandwidth.

Only cablecom comes close to matching them bandwidth-wise with their 250 Mbit/s
package, but that’s a quarter of the bandwidth for nearly double the price.
Init7 is also one of the only providers who officially state that
their triple-play strategy is that they don’t do it. Huge-ass kudos for
that.

Also, their technical support is using Claws Mail on GNU/Linux – to give you
some indication of the geek-heaven you get when signing up with them.

But what’s really exciting about Init7 is their support for IPv6. In fact,
Init7 was one of the first (if not the first) providers to offer IPv6 for
end users. Also, we’re talking about a real, non-tunneled, no strings attached
plain /48.

In case that doesn’t ring a bell, a /48 will allow for 2^16 networks
consisting of 2^64 hosts each. Yes. That’s that many hosts.
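To see where those numbers come from, here’s a small sketch with Python’s ipaddress module, again using the 2001:db8::/48 documentation prefix as a placeholder: the 16 bits between /48 and /64 number the networks, and the remaining 64 bits number the hosts in each of them.

```python
import ipaddress

site = ipaddress.ip_network("2001:db8::/48")  # placeholder for a real /48

# One /64 for every value of the 16 bits between /48 and /64.
subnet_count = 2 ** (64 - site.prefixlen)

# subnets() is a generator, so only the first two /64s are materialized here.
subnets = site.subnets(new_prefix=64)
first, second = next(subnets), next(subnets)

print(subnet_count)         # 65536
print(first, second)        # 2001:db8::/64 2001:db8:0:1::/64
print(first.num_addresses)  # 18446744073709551616, i.e. 2**64
```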

In eager anticipation of getting this at home natively (of course I ordered
Fiber7 the moment I could at my place), I decided to play with IPv6 as far as
I could with my current provider, which apparently lives in the stone age and
still doesn’t provide native v6 support.

After getting abysmal pings using 6to4 about a year ago, this time I decided
to go with tunnelbroker, which these days also
provides a nice dyndns-like API for updating the public tunnel endpoint.

Let me tell you: Setting this up is trivial.

Tunnelbroker provides you with all the information you need for your tunnel,
including the prefix of the /64 you get from them, and announcing that prefix
on your own network is easy using radvd.
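For reference, the radvd side boils down to announcing that /64 on the LAN interface. Below is a sketch that renders a minimal radvd.conf stanza, assuming the LAN interface is called eth0 and using a documentation prefix in place of the one Tunnelbroker assigns; radvd_stanza is a hypothetical helper, and only the basic AdvSendAdvert, AdvOnLink and AdvAutonomous options are set:

```python
import ipaddress


def radvd_stanza(interface: str, prefix: str) -> str:
    """Render a minimal radvd.conf block announcing one prefix for autoconfiguration."""
    net = ipaddress.ip_network(prefix)
    # Stateless autoconfiguration needs a /64 (64-bit interface IDs).
    if net.version != 6 or net.prefixlen != 64:
        raise ValueError(f"expected an IPv6 /64, got {prefix}")
    return (
        f"interface {interface}\n"
        "{\n"
        "    AdvSendAdvert on;\n"
        f"    prefix {net}\n"
        "    {\n"
        "        AdvOnLink on;\n"
        "        AdvAutonomous on;\n"
        "    };\n"
        "};\n"
    )


print(radvd_stanza("eth0", "2001:db8:1234:5678::/64"))
```

Refusing anything but a /64 is deliberate: with a longer or shorter prefix, clients would not autoconfigure addresses from the announcement.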

The only thing that’s different from your old v4 config: All your hosts will
immediately be accessible from the public internet, so you might want to
configure a firewall from the get-go – but see later for some thoughts on that
matter.

But this isn’t any different from the NAT solutions we have currently. Instead
of configuring port forwarding, you just open ports on your router, but the
process is more or less the same.

If you need direct connectivity however, you can now have it. No strings attached.

So far, I’ve used devices running iOS 7 and 8, Mac OS X 10.9 and 10.10,
Windows XP, 7 and 8 and none of them had any trouble reaching the v6 internet.
Also, I would argue that configuring radvd is easier than configuring DHCP.
There’s less thought involved for assigning addresses because
autoconfiguration will just deal with that.

For me, it took a bit of adjusting how I think about my network, so I’m
posting this to explain what changes with v6 and which paradigms shift. Once
you’ve accepted these changes, using v6 is trivial and totally something you
can get used to.

  • Multi-homing (multiple addresses per interface) was something you rarely
    did in v4. Now in v6, you do it all the time. Your OSes go as far as
    periodically grabbing a new random address in order to provide a measure of
    privacy.
  • The addresses are so long and hex-y – you probably will never remember them.
    But that’s ok. In general, there are much fewer cases where you worry about
    the address.

    • Because of multi-homing every machine has a guaranteed static address
      (built from the MAC address of the interface) by default, so there’s no
      need to statically assign addresses in many cases.
    • If you want to assign static addresses, just pick any in your /64.
      Unless you manually hand out the same address to two machines,
      autoconfiguration will make sure no two machines pick the same address.
      In order to remember them, feel free to use cute names – finally you have
      some letters and leetspeak to play with.
    • To assign a static address, just do it on the host in question. Again,
      autoconfig will make sure no other machine gets the same address.
  • And with Zeroconf (avahi / bonjour), you have fewer and fewer opportunities
    to deal with anything that’s not a host name anyway.
  • You will need a firewall because suddenly all your machines will be
    accessible for the whole internet. You might get away with just the local
    personal firewall, but you probably should have one on your gateway.
  • While that sounds like higher complexity, I would argue that the complexity
    is lower because if you were a responsible sysadmin, you were dealing with
    both NAT and a firewall whereas with v6, a firewall is all you need.
  • Tools like nat-pmp or upnp don’t support v6 yet as far as I can see, so
    applications in the trusted network can’t yet punch holes in the firewall
    (the equivalent of forwarding ports in the v4 days).
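The «built from the MAC address» address mentioned above is the classic modified EUI-64 scheme: split the 48-bit MAC in half, put ff:fe in the middle and flip the universal/local bit of the first byte. A sketch of that derivation (the function name and the sample MAC are made up; many current OSes default to random or stable-privacy interface IDs instead, so treat this as the traditional behaviour rather than a guarantee):

```python
import ipaddress


def eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with a modified EUI-64 interface ID built from a MAC."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen != 64:
        raise ValueError("EUI-64 interface IDs need a /64 prefix")
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac}")
    # Insert ff:fe between the two halves of the MAC ...
    eui = bytearray(octets[:3] + b"\xff\xfe" + octets[3:])
    # ... and flip the universal/local bit (0x02) of the first byte.
    eui[0] ^= 0x02
    interface_id = int.from_bytes(eui, "big")
    return net.network_address + interface_id


print(eui64_address("2001:db8::/64", "00:11:22:33:44:55"))
# 2001:db8::211:22ff:fe33:4455
```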

Overall, getting v6 running is really simple, and once you adjust your mindset
a bit, I really don’t see v6 as being more complicated, even if some things
are unusual and take some getting used to. Quite the contrary, actually.

Thinking about firewalls and opening ports some more: as hosts get wiser
about v6, you might actually get away without a strict firewall, because
hosts could grab a new random v6 address for every service they want to
expose and then bind that server to just that address.

In such a scheme, services binding to all addresses would never bind to these temporary addresses.

That way none of the services brought up by default (you know – all those
ports open on your machine when it runs) would be reachable from the outside.
What would be reachable are the temporary addresses grabbed by specific
services running on your machine.

Yes. An attacker could port-scan your /64 and try to find the non-temporary
address, but keep in mind that finding that one address out of 2^64
addresses would mean port-scanning 4 billion traditional v4 internets per
attack target (good luck), or, guessing randomly, needing about 2^63
attempts on average (also good luck).
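Those numbers are easy to double-check. A back-of-the-envelope sketch; the probe rate of one million addresses per second is an assumption picked purely for illustration, not a measured figure:

```python
SUBNET_BITS = 64  # host bits in a single /64
IPV4_BITS = 32    # size of the whole IPv4 address space

addresses = 2 ** SUBNET_BITS
ipv4_internets = addresses // 2 ** IPV4_BITS

# Probing addresses in random order without repeats, the one live address
# is found after (N + 1) / 2 probes on average, i.e. roughly 2**63.
expected_probes = (addresses + 1) / 2

PROBES_PER_SECOND = 1_000_000  # assumed scanner speed, for illustration only
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years = expected_probes / PROBES_PER_SECOND / SECONDS_PER_YEAR

print(ipv4_internets)          # 4294967296, roughly 4 billion
print(f"{years:,.0f} years")  # roughly 292,000 years at the assumed rate
```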

Even then a personal firewall could block all unsolicited packets from
non-local prefixes to provide even more security.

As such, we really might get away without a firewall at the gateway to begin
with, which would go a long way toward providing the ubiquitous
configuration-free p2p connectivity that would be ever-so-cool and which we
have lost over the last few decades.

Me personally, I’m really happy to see how simple v6 actually is to get
implemented and I’m really looking forward to my very own native /48 which I’m
probably going to get somewhere in September/October-ish.

Until then, I’ll gladly play with my tunneled /64 (for now still firewalled,
but I’ll investigate how OS X and Windows deal with the temporary
addresses they use which might allow me to actually turn the firewall off).

when in doubt – SSL

Since 2006, as part of our product, we are offering barcode scanners
with GSM support to either send orders directly to the vendor or to
transmit products into the web frontend where you can further edit them.

Even though the devices (Windows Mobile. Crappy. Being updated) do
support WiFi, we really only support GSM because that means we don’t have to share the end user’s infrastructure.

This is a huge plus because it means that no matter how locked-down the
customer’s infrastructure, no matter how crappy the proxy, no matter the IDS in use, we’ll always be able to communicate with our server.

Until, of course, the mobile carrier most used by our customers decides
to add a “transparent” (do note the quotes) proxy to the mix.

We were quite stumped last week when the mobile devices started reporting HTTP error 408, especially because we were not seeing any 408 in our logs.

Worse, tcpdump clearly showed that we were getting a RST
packet from the client, sometimes before sending data, sometimes while
sending data.

Strange: Client is showing 408, server is seeing a RST from the client.
Doesn’t make sense.

Tethering my Mac using the iPhone’s personal hotspot feature and a SIM
card of the mobile provider in question made it clear: No longer are we
talking directly to our server. No. What the client receives is a 408
HTML formatted error message by a proxy server.

Do note the “DELETE THIS LINE” and “your organization here” comments.
What a nice touch. Somebody really spent a lot of time getting
this up and running.

Now granted, taking 20 seconds before being able to produce a response
is a bit on the longer side, but unfortunately, some versions of the
scanner software require gzip compression and gzip compression needs to
know the full size of the body to compress, so we have to prepare the
full response (40 megs uncompressed) before being able to send anything
– that just takes a while.

But consider long-polling or server-sent events – receiving a 408 after
just 20 seconds? That’s annoying, wasting resources and probably not
something you’re prepared for.

Worse, nobody was notified of this change. For 7 years, the clients
were able to connect directly to our server. Then one day it changes
and now they aren’t. No communication, no time to prepare and
certainly limits far too strict not to break anything (not
just us – see my remark about long polling).

The solution in the end is, like so often, to use SSL. SSL connections
are opaque to any proxy in between. A proxy can’t decrypt the data without
the client noticing. An SSL connection can’t be inspected and an SSL
connection can’t be messed with.

Sure enough: The exact same request that fails with that 408 over HTTP
goes through nicely using HTTPS.

This trick works every time when somebody is messing with your
connection. Something f’ing up your WebSocket connection? Use SSL!
Something messing with your long-polling? Use SSL. Something
decompressing your response but not stripping off the Content-Encoding
header (yes. that happened to me once)? Use SSL. Something replacing
arbitrary numbers in your response with asterisks (yepp. happened too)?
You guessed it: Use SSL.
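One of those symptoms, a middlebox decompressing the body but leaving Content-Encoding: gzip in place, is easy to detect on the client side, because every gzip stream starts with the magic bytes 1f 8b. A minimal sketch; looks_decompressed_in_transit is a hypothetical helper, and it expects the raw body before any client library has decoded it:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def looks_decompressed_in_transit(headers: dict, raw_body: bytes) -> bool:
    """True if the headers claim gzip but the raw body is not a gzip stream."""
    lowered = {k.lower(): v for k, v in headers.items()}
    claims_gzip = "gzip" in lowered.get("content-encoding", "").lower()
    return claims_gzip and not raw_body.startswith(GZIP_MAGIC)


payload = b'{"products": []}'
print(looks_decompressed_in_transit({"Content-Encoding": "gzip"}, gzip.compress(payload)))  # False
print(looks_decompressed_in_transit({"Content-Encoding": "gzip"}, payload))                 # True
```

It only catches this one kind of tampering, of course; the robust fix is still the one above.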

Of course, there are three things to keep in mind:

  1. Due to the lack of SNI in the world’s most used OS and Browser
    combination (any IE under Windows XP), every SSL site you host requires
    one dedicated IP address. Which is bad considering that we are running
    out of addresses.

  2. All of the bigger mobile carriers have their CA in the browsers’
    trusted list. Aside from ethics, there is no reason whatsoever for them
    not to start doing all the crap I described and just re-encrypt the
    connection, faking a certificate using their trusted ones.

  3. Failing that, they still might just block SSL at some point, but as
    more and more sites are going SSL only (partially for above reasons no
    doubt), outright blocking SSL is going to be more and more unlikely to
    happen.

So. Yes. When in doubt: Use SSL. Not only does that help your users’
privacy, it also fixes a ton of technical issues created by third parties
messing with your connection without any authorization to do so.

how to accept SSL client certificates

Yesterday I was asked on twitter how you would use client certificates
on a web server in order to do user authentication.

Client certificates are very handy in a controlled environment and they
work really well to authenticate API requests. They are, however,
completely unusable for normal people.

Getting meaningful information from client side certificates is
something that’s happening as part of the SSL connection setup, so it
must be happening on whatever piece of your stack that terminates the
client’s SSL connection.

In this article I’m going to look into doing this with nginx and Apache
(both traditional frontend web servers) and in node.js which you might
be using in a setup where clients talk directly to your application.

In all cases, what you will need is a means for signing certificates in
order to ensure that only client certificates you signed get access to
your server.

In my use cases, I usually use openssl, which comes with some
subcommands and helper scripts to run a certificate authority. On the
Mac if you prefer a GUI, you can use Keychain Access which has all you
need in the “Certificate Assistant” submenu of the application menu.

Next, you will need the public key of your users. You can have them
send in a traditional CSR and sign that on the command line (use
openssl req to create the CSR, use openssl ca to sign it), or you
can have them submit an HTML form using the <keygen> tag (yes. that
exists. Read up on it on MDN
for example).

You absolutely never ever in your lifetime want the private key of
the user. Do not generate a keypair for the user. Have them generate a
key and a CSR, but never ever have them send the key to you. You only
need their CSR (which contains their public key, signed by their
private key) in order to sign their public key.

Ok. So let’s assume you got that out of your way. What you have now is
your CA’s certificate (usually self-signed) and a few users who now
own certificates you have signed for them.

Now let’s make use of this (I’m assuming you know reasonably well how
to configure these web servers in general. I’m only going into the
client certificate details).

nginx

For nginx, make sure you have enabled SSL using the usual steps. In
addition to these, set ssl_client_certificate
(docs)
to the path of your CA’s certificate. nginx will only accept client
certificates that have been signed by whatever ssl_client_certificate
you have configured.

Furthermore, set ssl_verify_client
(docs)
to on. Now only requests that provide a client certificate signed by
above CA will be allowed to access your server.

When doing so, nginx will set a few additional variables for you to
use, most notably $ssl_client_cert (full certificate),
$ssl_client_s_dn (the subject name of the client certificate),
$ssl_client_serial (the serial number your CA has issued for their
certificate) and, most importantly, $ssl_client_verify, which you should
check for SUCCESS.

Use fastcgi_param or proxy_set_header to pass these variables through to
your application (in the case of proxy_set_header, make sure that it was
really nginx who set the header and not a client faking it).

I’ll talk about what you do with these variables a bit later on.

Apache

As with nginx, ensure that SSL is enabled. Then set
SSLCACertificateFile to the path to your CA’s certificate. Then set
SSLVerifyClient to require
(docs).

Apache will also set many variables for you to use in your application
(exported to CGI-style environments with SSLOptions +StdEnvVars),
most notably SSL_CLIENT_S_DN (the subject of the client
certificate) and SSL_CLIENT_M_SERIAL (the serial number your CA has
issued). The full certificate is in SSL_CLIENT_CERT, which additionally
needs SSLOptions +ExportCertData.

node.js

If you want to handle the whole SSL stuff on your own, here’s an
example in node.js. When you call https.createServer
(docs),
pass in some options. One is requestCert, which you would set to true.
The other is ca, which you should set to an array of strings in PEM
format containing your CA’s certificate.

Then you can check whether the certificate check was successful by
looking at the client.authorized property of your request object.

If you want to get more info about the certificate, use
request.connection.getPeerCertificate().

what now?

Once you have the information about the client certificate (via
fastcgi, reverse proxy headers or apache variables in your module),
then the question is what you are going to do with that information.

Generally, you’d probably couple the certificate’s subject and its
serial number with some user account and then use the subject and
serial as a key to look up the user data.

As people get new certificates issued (because the old ones expire), the
subject name will stay the same, but the serial number will change, so
depending on your use case, use one or both.
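A minimal sketch of that lookup in Python, assuming the web server hands the application Apache-style variables (SSL_CLIENT_VERIFY, SSL_CLIENT_S_DN, SSL_CLIENT_M_SERIAL) in a WSGI or FastCGI environ; with nginx, you would map $ssl_client_verify, $ssl_client_s_dn and $ssl_client_serial onto those names yourself via fastcgi_param. The user table, the DN format and the revoked_serials set are hypothetical; the set stands in for the serial blacklist mentioned further down, not for a real CRL:

```python
from typing import Dict, Iterable, Optional


def user_for_certificate(environ: Dict[str, str],
                         users_by_subject: Dict[str, str],
                         revoked_serials: Iterable[str]) -> Optional[str]:
    """Map a verified client certificate to a user name, or return None."""
    # Never trust the subject unless the web server verified the chain.
    if environ.get("SSL_CLIENT_VERIFY") != "SUCCESS":
        return None
    subject = environ.get("SSL_CLIENT_S_DN", "")
    serial = environ.get("SSL_CLIENT_M_SERIAL", "").upper()
    # The serial identifies one specific certificate (hex, compared upper-case).
    if serial in {s.upper() for s in revoked_serials}:
        return None
    # The subject survives re-issuing, so it is the key to the user account.
    return users_by_subject.get(subject)


users = {"CN=alice,O=Example Corp": "alice"}
env = {"SSL_CLIENT_VERIFY": "SUCCESS",
       "SSL_CLIENT_S_DN": "CN=alice,O=Example Corp",
       "SSL_CLIENT_M_SERIAL": "1A2B"}
print(user_for_certificate(env, users, revoked_serials=set()))     # alice
print(user_for_certificate(env, users, revoked_serials={"1a2b"}))  # None
```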

There are a couple of things to keep in mind though:

  • Due to a flaw in the SSL protocol which was discovered in 2009,
    you cannot safely have only parts of your site require a certificate.
    With most client libraries, this is an all-or-nothing deal. There is
    a secure renegotiation, but I don’t think it’s widely supported at
    the moment.
  • There is no notion of signing out. The clients have to present their
    certificate, so your clients will always be signed on (which might
    be a good thing for your use-case)
  • The UI in traditional browsers to handle this kind of thing is
    absolutely horrendous.
    I would recommend using this only for APIs or with managed devices
    where the client certificate can be preinstalled silently.

You do however gain a very good method for uniquely identifying
connecting clients without a lot of additional protocol overhead. The
SSL negotiation isn’t much different whether the client is presenting a
certificate or not. There’s no additional application level code
needed. Your web server can do everything that’s needed.

Also, there’s no need for you to store any sensitive information. No
more leaked passwords, no more fear of leaking passwords. You just
store whatever information you need from the certificate and make sure
they are properly signed by your CA.

As long as you don’t lose your CA’s private key, you can absolutely
trust your clients and no matter how much data they get when they
break into your web server, they won’t get passwords, nor the ability
to log in as any user.

Conversely though, make sure that you keep your CA private key
absolutely safe. Once you lose it, you will have to invalidate all
client certificates and your users will have to go through the process
of generating new CSRs, sending them to you and so on. Terribly
inconvenient.

In the same vein: Don’t have your CA certificate expire too soon. If it
does expire, you’ll have the same issue at hand as if you lost your
private key. Very annoying. I learned that the hard way back in
2001ish and that was only for internal use.

If you need to revoke a user’s access, either blacklist their serial
number in your application or, much better, set up a proper CRL for
your certificate authority and have your web server check that.

So. Client certificates can be a useful tool in some situations. It’s
your job to know when, but at least now you have some hints to get you
going.

Me personally, I was using this once around 2009ish for a REST
API, but I have since replaced that with OAuth because that’s what most
of the users knew best (read: “at all”). Depending on the audience,
client certificates might be totally foreign to them.

But if it works for you, perfect.

How I back up gmail

There was a discussion on HackerNews about Gmail having lost the email in some accounts. One sentiment in the comments was clear:

It’s totally the user’s problem if they don’t back up their cloud-based email.

Personally, I think I would have to agree:

Google is a provider like any other ISP or basically any other service. There’s no reason to believe that your data is safer on Google than it is anywhere else. Now granted, they are not exactly known for losing data, but there are other things that can happen.

Like your account being closed because whatever automated system believed your usage patterns were consistent with those of a spammer.

So the question is: What would happen if your Google account wasn’t reachable at some point in the future?

For my company (using commercial Google Apps accounts), I would start up that IMAP server which serves all mail ever sent to and from Gmail. People would use the already existing webmail client or their traditional IMAP clients. They would lose some productivity, but no single byte of data.

This was my condition for migrating email over to Google. I needed to have a backup copy of that data. Otherwise, I would not have agreed to switch to a cloud-based provider.

The process is completely automated too. There’s not even a backup script running somewhere. Heck, not even the Google Account passwords have to be stored anywhere for this to work.

So. How does it work then?

Before you read on, here are the drawbacks of the solution:

  • I’m a die-hard Exim fan (long story. It served me very well once – up to saving-my-ass level of well), so the configuration I’m outlining here is for Exim as the mail relay.
  • Also, this only works with paid Google accounts. You can get somewhere using the free ones, but you don’t get the full solution (i.e. having a backup of all sent email)
  • This requires you to have full control over the MX machine(s) of your domain.

If you can live with this, here’s how you do it:

First, you set up your Google domain as normal. Add all the users you want and do everything else just as you would do it in a traditional set up.

Next, we’ll have to configure Google Mail for two-legged OAuth access to our accounts. I’ve written about this before. We are doing this so we don’t need to know our users’ passwords. Also, we need to enable the provisioning API to get access to the list of users and groups.

Next, our mail relay will have to know about what users (and groups) are listed in our Google account. Here’s what I quickly hacked together in Python (my first Python script ever – be polite while flaming) using the GData library:

import gdata.auth
import gdata.apps.service
import gdata.apps.groups.service

# 2-legged OAuth credentials for the Google Apps domain (see above)
consumer_key = 'yourdomain.com'
consumer_secret = '2-legged-consumer-secret'
sig_method = gdata.auth.OAuthSignatureMethod.HMAC_SHA1

# Users: print one login name per line
service = gdata.apps.service.AppsService(domain=consumer_key)
service.SetOAuthInputParameters(sig_method, consumer_key,
  consumer_secret=consumer_secret, two_legged_oauth=True)

res = service.RetrieveAllUsers()
for entry in res.entry:
    print(entry.login.user_name)

# Groups: print one group name per line
service = gdata.apps.groups.service.GroupsService(domain=consumer_key)
service.SetOAuthInputParameters(sig_method, consumer_key,
  consumer_secret=consumer_secret, two_legged_oauth=True)
res = service.RetrieveAllGroups()
for entry in res:
    print(entry['groupName'])

Place this script somewhere on your mail relay and run it in a cron job. In my case, I’m having its output redirected to /etc/exim4/gmail_accounts. The script will emit one user (and group) name per line.
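Since Exim may read /etc/exim4/gmail_accounts while cron is rewriting it, a plain shell redirect can briefly leave an empty or half-written file behind, and during that window the routers could fail to match some users. One way around that, sketched here as an assumption rather than what the setup above does, is to write a temporary file in the same directory and rename it into place; write_accounts_atomically is a hypothetical helper that would receive the names the script currently prints:

```python
import os
import tempfile
from typing import Iterable


def write_accounts_atomically(path: str, names: Iterable[str]) -> None:
    """Write one name per line to a temp file, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".gmail_accounts.")
    try:
        with os.fdopen(fd, "w") as handle:
            for name in sorted(set(names)):
                handle.write(name + "\n")
            handle.flush()
            os.fsync(handle.fileno())
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; Exim must be able to read it
        # os.replace is atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

The script would then collect the user and group names into a list and call write_accounts_atomically("/etc/exim4/gmail_accounts", names) instead of printing them.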

Next, we’ll deal with incoming email:

In the Exim configuration of your mail relay, add the following routers:

yourdomain_gmail_users:
  driver = accept
  domains = yourdomain.com
  local_parts = lsearch;/etc/exim4/gmail_accounts
  transport_home_directory = /var/mail/yourdomain/${lc:$local_part}
  router_home_directory = /var/mail/yourdomain/${lc:$local_part}
  transport = gmail_local_delivery
  unseen

yourdomain_gmail_remote:
  driver = accept
  domains = yourdomain.com
  local_parts = lsearch;/etc/exim4/gmail_accounts
  transport = gmail_t

yourdomain_gmail_users is what creates the local copy. It accepts mail sent to yourdomain.com if the local part (the stuff in front of the @) is listed in the gmail_accounts file. It then sets up the home directories for the local transport (see below) and marks the message as unseen, so the next router gets a chance too.

That next router is yourdomain_gmail_remote. It again checks the domain and the local part and, if they match, simply delegates to the gmail_t remote transport, which then sends the email to Google.
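To make the effect of unseen concrete, here is a toy Python model of the two routers above. It is a simplification, not how Exim is implemented: real routers have many more preconditions, and an address that neither router accepts would fall through to whatever routers follow in your configuration rather than just producing an empty list.

```python
# Toy model of an Exim router chain, for illustration only (not Exim code).
# Routers are tried in order. When one accepts an address, routing normally
# stops there; if that router is marked `unseen`, a copy of the address also
# carries on to the next router.

ROUTERS = [
    # (name, domain, transport, unseen), mirroring the two routers above
    ("yourdomain_gmail_users", "yourdomain.com", "gmail_local_delivery", True),
    ("yourdomain_gmail_remote", "yourdomain.com", "gmail_t", False),
]


def route(local_part, domain, accounts, routers):
    """Return the transports that would end up handling local_part@domain.

    accounts: the set of names listed in /etc/exim4/gmail_accounts.
    """
    deliveries = []
    for _name, router_domain, transport, unseen in routers:
        # Both routers check `domains` and the `local_parts` lookup.
        if domain.lower() == router_domain and local_part.lower() in accounts:
            deliveries.append(transport)
            if not unseen:
                break  # an ordinary accept ends routing for this address
    return deliveries
```

For a listed user the model returns both transports (the local copy plus the delivery to Google); without unseen on the first router, only the local copy would remain.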

The transports look like this:

gmail_t:
  driver = smtp
  hosts = aspmx.l.google.com:alt1.aspmx.l.google.com:\
    alt2.aspmx.l.google.com:aspmx5.googlemail.com:\
    aspmx2.googlemail.com:aspmx3.googlemail.com:\
    aspmx4.googlemail.com
  gethostbyname

gmail_local_delivery:
  driver = appendfile
  check_string =
  delivery_date_add
  envelope_to_add
  group=mail
  maildir_format
  directory = MAILDIR/yourdomain/${lc:$local_part}
  maildir_tag = ,S=$message_size
  message_prefix =
  message_suffix =
  return_path_add
  user = Debian-exim
  create_file = anywhere
  create_directory

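The gmail_t transport is simple. For the local one, you might have to adjust the user and group, plus the location where you want the mail to be written (MAILDIR in the directory option stands for your base mail directory; replace it or define it as an Exim macro).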

This is all that’s needed to get a copy of every inbound message into a local maildir on the mail relay, so now we are ready to reconfigure Google.

Here’s what you do:

  • You change the MX of your domain to point to this relay of yours

The next two steps are the reason you need a paid account, as these controls are not available for free accounts:

  • In your Google Administration panel, you visit the Email settings and configure the outbound gateway. Set it to your relay.
  • Then you configure your inbound gateway and set it to your relay too (and to your backup MX if you have one).

This screenshot will help you:

[Screenshot: gmail config]

Gmail will now accept all email that your MX forwards to it over the gmail_t transport configured above.

Gmail will also send all outgoing email to your relay, which needs to be configured to accept (and relay) email from Google. How to do this depends on the rest of your Exim configuration, but here’s what I added (which works with the default ACL):

hostlist   google_relays = 216.239.32.0/19:64.233.160.0/19:66.249.80.0/20:\
    72.14.192.0/18:209.85.128.0/17:66.102.0.0/20:\
    74.125.0.0/16:64.18.0.0/20:207.126.144.0/20
hostlist   relay_from_hosts = 127.0.0.1:+google_relays
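Exim matches the connecting client’s IP address against these CIDR networks. If Gmail’s outbound mail is ever rejected with a relay error, a quick way to check whether the client address from your logs is covered is a sketch like the following. It uses Python’s ipaddress module and the ranges copied from the hostlist above (Google’s actual sending ranges may well have changed since); may_relay is a hypothetical helper.

```python
import ipaddress

# Ranges from the google_relays hostlist above.
GOOGLE_RELAYS = [
    "216.239.32.0/19", "64.233.160.0/19", "66.249.80.0/20",
    "72.14.192.0/18", "209.85.128.0/17", "66.102.0.0/20",
    "74.125.0.0/16", "64.18.0.0/20", "207.126.144.0/20",
]

# relay_from_hosts = 127.0.0.1:+google_relays
RELAY_FROM_HOSTS = [ipaddress.ip_network("127.0.0.1/32")] + [
    ipaddress.ip_network(net) for net in GOOGLE_RELAYS
]


def may_relay(client_ip):
    """True if client_ip falls inside any network in relay_from_hosts."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv6 address is simply never "in" an IPv4 network here.
    return any(addr in net for net in RELAY_FROM_HOSTS)
```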

And lastly, the tricky part: storing a copy of all mail that is sent through Gmail (the mail itself is already being delivered correctly; what we want is a copy):

Here is the exim router we need:

gmail_outgoing:
  driver = accept
  condition = "${if and{ \
    { eq{$sender_address_domain}{yourdomain.com} } \
    {=={${lookup{$sender_address_local_part}lsearch{/etc/exim4/gmail_accounts}{1}}}{1}}} {1}{0}}"
  transport = store_outgoing_copy
  unseen

(did I mention that I severely dislike prefix notation?)
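For comparison, here is a hypothetical Python rendering of what that condition tests, written infix. It assumes the account list has been loaded into a set of lower-case names and treats the local-part lookup as case-insensitive; the domain comparison mirrors Exim’s eq, which is case-sensitive. A bounce (empty sender) never matches.

```python
def is_gmail_outgoing(sender_address, accounts, our_domain="yourdomain.com"):
    """Infix version of the gmail_outgoing router condition.

    accounts: set of lower-case names from /etc/exim4/gmail_accounts.
    """
    local_part, _, domain = sender_address.rpartition("@")
    # eq{$sender_address_domain}{yourdomain.com}: exact string comparison
    domain_matches = domain == our_domain
    # ${lookup{...}lsearch{...}{1}} yields 1 only if the local part is listed
    listed = local_part.lower() in accounts
    return domain_matches and listed
```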

and here’s the transport:

store_outgoing_copy:
  driver = appendfile
  check_string =
  delivery_date_add
  envelope_to_add
  group=mail
  maildir_format
  directory = MAILDIR/yourdomain/${lc:$sender_address_local_part}/.Sent/
  maildir_tag = ,S=$message_size
  message_prefix =
  message_suffix =
  return_path_add
  user = Debian-exim
  create_file = anywhere
  create_directory

The maildir layout I’ve chosen (sent mail in the .Sent subfolder) is the correct one if the IMAP server you want to use is Courier IMAP. Other servers use different layouts.
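To check that the copies end up where a Courier-style IMAP server will look for them, Python’s mailbox module can open the tree directly: its Maildir class uses the same dot-prefixed subfolder convention, so the Sent folder is the .Sent directory. A minimal sketch, assuming the MAILDIR/yourdomain/<user> layout from the transports above; open_user_maildir is a hypothetical helper.

```python
import mailbox
import os


def open_user_maildir(base, user):
    """Open <base>/<user> as a Maildir tree and return (inbox, sent).

    mailbox.Maildir maps the folder name "Sent" to the ".Sent"
    subdirectory, the layout the store_outgoing_copy transport writes.
    Raises mailbox.NoSuchMailboxError if either directory is missing.
    """
    # Directory names are lower-cased by ${lc:...} in the Exim transports.
    inbox = mailbox.Maildir(os.path.join(base, user.lower()), create=False)
    sent = inbox.get_folder("Sent")
    return inbox, sent
```

For example, `inbox, sent = open_user_maildir("/path/to/MAILDIR/yourdomain", "alice")` followed by `len(inbox), len(sent)` gives the message counts; the path is a placeholder.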

One little thing: when you CC or BCC other people in your domain, Google sends out multiple copies of the same message. This yields some duplication in the Sent folder (one copy per recipient), but as they say: better to back up too much than too little.
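If the duplicates ever get in the way, they are easy to spot, because every copy carries the same Message-ID header. A small sketch that only reports them (it deletes nothing), assuming the Sent folder is a plain Maildir; duplicate_message_ids is a hypothetical helper.

```python
import collections
import mailbox


def duplicate_message_ids(maildir_path):
    """Return {Message-ID: [keys]} for messages stored more than once."""
    folder = mailbox.Maildir(maildir_path, create=False)
    by_id = collections.defaultdict(list)
    for key, msg in folder.iteritems():
        message_id = msg.get("Message-ID")
        if message_id:  # messages without a Message-ID are skipped
            by_id[message_id.strip()].append(key)
    # Keep only IDs that occur more than once.
    return {mid: keys for mid, keys in by_id.items() if len(keys) > 1}
```

For example, `duplicate_message_ids("/path/to/MAILDIR/yourdomain/alice/.Sent")`, where the path is a placeholder.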

Now if something happens to your Google account, just start up an IMAP server and have it serve mail from these maildir directories.

And remember to back them up too. rsync, rsnapshot or whatever other tool you already use will do, since they are just directories containing one file per email.

Find relation sizes in PostgreSQL

Like so many times before, today I was yet again in the situation where I wanted to know which tables/indexes take the most disk space in a particular PostgreSQL database.

My usual procedure in this case was to run \dt+ in psql and scan the sizes by eye (this being on my development machine, where I was trying to find the biggest tables I could clean out to make room).

But after doing that a few times, and considering that \dt+ does nothing but query some PostgreSQL internal tables, I wanted this solved in an easier and less error-prone way. In the end I just wanted the output of \dt+ sorted by size.

That led to some digging in the source code of psql itself (src/bin/psql), where I quickly found the function that builds the query (listTables in describe.c). So from now on, this is what I’m using when I need an overview of all relation sizes, ordered by size in descending order:

select
  n.nspname as "Schema",
  c.relname as "Name",
  case c.relkind
     when 'r' then 'table'
     when 'v' then 'view'
     when 'i' then 'index'
     when 'S' then 'sequence'
     when 's' then 'special'
  end as "Type",
  pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
  pg_catalog.pg_size_pretty(pg_catalog.pg_relation_size(c.oid)) as "Size"
from pg_catalog.pg_class c
 left join pg_catalog.pg_namespace n on n.oid = c.relnamespace
where c.relkind IN ('r', 'v', 'i')
order by pg_catalog.pg_relation_size(c.oid) desc;

Of course I could have come up with this without digging through the source code, but honestly, I didn’t know about relkinds, pg_size_pretty or pg_relation_size (I would have expected the size to be stored in some system view), so figuring all of this out would have taken much more time than just reading the source code.
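To run the query from a script, for example to print only the top few entries, a sketch like the following should work with any DB-API 2.0 driver that uses %s placeholders, psycopg2 being the obvious candidate (an assumption; the post itself just uses psql). It is the same query with a LIMIT appended; largest_relations is a hypothetical helper.

```python
# Same query as above, minus the semicolon, with a parameterized LIMIT.
RELATION_SIZES_SQL = """
select
  n.nspname as "Schema",
  c.relname as "Name",
  case c.relkind
     when 'r' then 'table'
     when 'v' then 'view'
     when 'i' then 'index'
     when 'S' then 'sequence'
     when 's' then 'special'
  end as "Type",
  pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
  pg_catalog.pg_size_pretty(pg_catalog.pg_relation_size(c.oid)) as "Size"
from pg_catalog.pg_class c
 left join pg_catalog.pg_namespace n on n.oid = c.relnamespace
where c.relkind IN ('r', 'v', 'i')
order by pg_catalog.pg_relation_size(c.oid) desc
limit %s
"""


def largest_relations(conn, limit=20):
    """Return the `limit` largest relations as (schema, name, type, owner, size) rows.

    conn: an open DB-API 2.0 connection using the %s placeholder style.
    """
    cur = conn.cursor()
    try:
        cur.execute(RELATION_SIZES_SQL, (limit,))
        return cur.fetchall()
    finally:
        cur.close()
```

With psycopg2 that would be roughly `largest_relations(psycopg2.connect("dbname=mydb"), 10)`, with the connection string as a placeholder.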

Now it’s here so I remember it next time I need it.