Google Apps – Provisioning – Two-Legged OAuth

Our company uses Google Apps premium for email and shared documents. But to have more freedom with email aliases, to have more control over email routing and, finally, because there are a couple of local parts we use to direct mail to some applications, all our mail – even though it's created in Google Apps and finally ends up in Google Apps – goes via a central mail relay we are running ourselves (well. I'm running it).

Google Apps premium allows you to do that and it’s a really cool feature.

One additional thing I'm doing on that central relay is keeping a backup of all mail that comes from Google or goes to Google. The reason: while I trust them not to lose my data, there are stories around of people losing their accounts to Google's anti-spam automatisms. This is especially bad as there usually is nobody to appeal to.

So I deemed it imperative that we store a backup of every message so we can move away from Google if the need to do so arises.

Of course, that means our relay needs to know which local parts are valid for the Google Apps domain – after all, I don't want to store mail that would later be bounced by Google. And I'd love to bounce invalid recipients directly instead of relaying the mail unconditionally, so that's another reason why I'd want to know the list of users.
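Just to illustrate the idea, here's a minimal sketch of such a check (not our actual relay setup – the names and the domain are made up; the real list comes from the provisioning API, as shown below):

valid_localparts = set(['info', 'jsmith', 'sales'])  # hypothetical; filled from the provisioning API

def accept_recipient(address):
    # Accept only local parts Google knows; everything else can be
    # bounced right away instead of being relayed unconditionally.
    local, _, domain = address.partition('@')
    if domain != 'example.com':
        return True  # not the Google Apps domain; other rules apply
    return local.lower() in valid_localparts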

Google provides their provisioning API to do just that and, using the GData Python packages, you can easily access that data. In theory.

Up until very recently, the big problem was that the provisioning API didn't support OAuth. That meant that my little script that retrieves the local parts had to have an administrator's password – something that really bugged me, as it meant that either I store my password in the script or I can't run the script from cron.

With the Google Apps Marketplace, they fixed that somewhat, but it still requires a strange dance:

When you visit the OAuth client configuration (https://www.google.com/a/cpanel/YOURDOMAIN/ManageOauthClients), it lists your domain with the note “This client has access to all APIs”.

This is totally not true though, as Google's definition of “all” apparently doesn't include “Provisioning” :-)

To make two-legged OAuth work for the provisioning API, you have to explicitly list the feeds. In my case, this was Users and Groups:

Under “Client Name”, add your domain again (“example.com”) and under “One or More API Scopes”, add the two feeds like this: “https://apps-apis.google.com/a/feeds/group/#readonly,https://apps-apis.google.com/a/feeds/user/#readonly”

This will enable two-legged OAuth access to the user and group lists which is what I need in my little script:

import gdata.auth
import gdata.apps.service
import gdata.apps.groups.service

consumer_key = 'YOUR.DOMAIN'
consumer_secret = 'secret' # check advanced / OAuth in your control panel
sig_method = gdata.auth.OAuthSignatureMethod.HMAC_SHA1

# Two-legged OAuth: every request is signed with the domain's consumer
# key/secret, so no per-user token (and no stored password) is needed.
service = gdata.apps.service.AppsService(domain=consumer_key)
service.SetOAuthInputParameters(sig_method, consumer_key, consumer_secret=consumer_secret, two_legged_oauth=True)

# The user feed: one entry per account; login.user_name is the local part.
res = service.RetrieveAllUsers()
for entry in res.entry:
    print entry.login.user_name

# The groups feed works the same way but returns plain dictionaries.
service = gdata.apps.groups.service.GroupsService(domain=consumer_key)
service.SetOAuthInputParameters(sig_method, consumer_key, consumer_secret=consumer_secret, two_legged_oauth=True)
res = service.RetrieveAllGroups()
for entry in res:
    print entry['groupName']
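From there it's a small step to feed the relay. A sketch of the glue (assuming a Postfix-style relay_recipient_maps file – the path and the postmap step are specific to my setup, and aliases and group addresses would be handled the same way):

users_service = gdata.apps.service.AppsService(domain=consumer_key)
users_service.SetOAuthInputParameters(sig_method, consumer_key, consumer_secret=consumer_secret, two_legged_oauth=True)

# Write one line per valid address; run "postmap" on the file afterwards.
f = open('/etc/postfix/relay_recipients', 'w')
for entry in users_service.RetrieveAllUsers().entry:
    f.write('%s@%s OK\n' % (entry.login.user_name, consumer_key))
f.close()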

Twisted Tornado

Lately, the net has been all busy talking about the new web server released by FriendFeed last week and how their server basically does the same thing as the Twisted framework that has been around for so much longer. One blog entry ends with:

Why Facebook/Friendfeed decided to create a new web server is completely beyond us.

Well. Let me add my two cents. Not from a Python perspective (I'm quite the Python newbie, having completed only one bigger project so far), but from a software development perspective. I feel qualified to add those cents because I've been there and done that.

When you start any project, you will be on the lookout for a framework or solution to base your work on. Oftentimes, you already have some idea of how you want to proceed and what the different requirements of your solution will be.

Of course, you'll be comparing your requirements against the existing solutions, but chances are that none of them will match your requirements exactly, so you will be faced with changing them to match.

This involves not only the changes themselves but also other considerations:

  • is it even possible to change an existing solution to match your needs?
  • if the existing solution is an open source project, is there a chance of your changes being accepted upstream? (This is not a given, by the way.)
  • if not, are you willing to back- and forward-port your changes as new upstream versions get released? Or are you willing to stick with that version for eternity, manually back-porting security fixes?

and most importantly

  • what takes more time: writing a tailor-made solution from scratch, or learning how the most-matching solution ticks to make it do what you want?

There is a very strong perception around that too many features mean bloat and that a simpler solution always trumps the complex one.

Have a look at articles like «Clojure 1, PHP 0», which compares a home-grown, tailor-made solution in one language to a complete framework in another and seems to favor the tailor-made solution because it was more performant and felt much easier to maintain.

The truth is, you can’t have it both ways:

Either you are willing to live with «bloat» and customize an existing solution, adding some features and leaving others unused, or you are unwilling to accept any bloat and will build a tailor-made solution that may be lacking in features and may reimplement features of existing solutions, but will contain exactly the features you want. Thus it will not be «bloated».

FriendFeed decided to go the tailor-made route, but unlike the many other projects that go that route every day and keep the result internal (take Django's reimplementation of many existing Python technologies like templating and the ORM as another example), they actually went public.

Not with the intention of bad-mouthing Twisted (though it kind of sounded that way due to a bad choice of words), but with the intention of telling us: «Hey – here's the tailor-made implementation we used to solve our problem – maybe it, or parts of it, are useful to you, so go ahead and have a look».

Instead of complaining that reimplementation and a bit of NIH was going on, the community could embrace the offering and try to pick the interesting parts they see fitting for their implementation(s).

This kind of reinventing the wheel is a standard process that goes on all the time, both in the Free Software world and in the commercial software world. There's no reason to be concerned or alarmed. Instead, we should be thankful for the groups that actually manage to put their code out for us to see – in so many cases, we never get that chance and thus lose an opportunity to make our solutions better.

(Unicode-)String handling done right

Today, I found myself reading the chapter about strings on diveintopython3.org.

Now, I’m no Python programmer by any means. Sure. I know my share of Python and I really like many of the concepts behind the language. I have even written some smaller scripts in Python, but it’s not my day-to-day language.

That chapter about string handling really really impressed me though.

In my opinion, handling Unicode strings the way Python 3 does is exactly how it should be done in every development environment: keep strings and collections of bytes completely separate and provide explicit conversion functions to convert from one to the other.

And hide the actual implementation from the user of the language! A string is a collection of characters. I don't have to care how these characters are stored in memory or how they are accessed. When I do need that information, I have to convert the string to a collection of bytes, giving an explicit encoding for how I want that done.
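Here's a quick Python 3 sketch of that separation (the exact error wording varies between versions):

s = 'Zoë'                      # str: a sequence of characters, storage is opaque
b = s.encode('utf-8')          # explicit conversion to bytes: b'Zo\xc3\xab'
print(len(s), len(b))          # 3 4 – character count vs. byte count
print(b.decode('utf-8') == s)  # True – round-trips with the same encoding

try:
    s + b                      # mixing the two types is refused outright
except TypeError as e:
    print(e)                   # TypeError; no implicit conversion happens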

This is exactly how it should work. But in every other environment I know of, implementation details leak into the language and mush this up, making it a real pain to deal with multibyte character sets.

Features like this are what convince me to look into new stuff. Maybe it IS time to do more Python after all.