tempalias.com – sysadmin work

This is yet another episode in the development diary behind the creation of a new web service. Read the previous installment here.

Now that I made the SMTP proxy do its thing and that I’m able to serve out static files, I though it was time to actually set up the future production environment so that I can give it some more real-world testing and to check the general stability of the solution when exposed to the internet.

So I went ahead and set up a new VM using Ubuntu Lucid beta, running the latest (HEAD) redis and node and finally made it run the tempalias daemons (which I consolidated into one opening SMTP and HTTP ports at the same time for easier handling).

I always knew that deployment will be something of a problem to tackle. SMTP needs to run on port 25 (if you intend to be running on the machine listed as MX) and HTTP should run on port 80.

Both being sub 1024 in consequence require root privileges to listen on and I definitely didn’t want to run the first ever node.js code I’ve written to run with root privileges (even though it’s a VM – I don’t like to hand out free root on a machine that’s connected to the internet).

So additional infrastructure was needed and here’s what I came up with:

The tempalias web server listens only on localhost on port 8080. A reverse nginx proxy listens on public port 80 and forwards the requests (all of them – node is easily fast enough to serve the static content). This solves another issue I had which is HTTP content compression: Providing compression (Content-Encoding: gzip) is imperative these days and yet not something I want to implement myself in my web application server.

Having the reverse proxy is a tremendous help as it can handle the more advanced webserver tasks – like compression.

I quickly noticed though that the stable nginx release provided with Ubuntu Lucid didn’t seem to be willing to actually do the compression despite it being turned on. A bit of experimentation revealed that stable nginx, when comparing content-types for gzip_types checks the full response content-type including the charset header.

As node-paperboy adds the “;charset: UTF-8” to all requests it serves, the default setting didn’t compress. Thankfully though, nginx could live with

gzip_types "text/javascript; charset: UTF-8" "text/html; charset: UTF-8"

so that settled the compression issue.

Update: of course it should be “charset=UTF-8” instread of “charset: UTF-8” – with the equal sign, nginx actually compresses correctly. My patch to paperboy has since been accepted by upstream, so you won’t have to deal with this hassle.

Next was SMTP. As we are already an SMTP proxy and there are no further advantages of having incoming connections proxied further (no compression or anything), I wanted clients to somehow directly connect to the node daemon.

I quickly learned that even the most awesome iptables setup won’t make the Linux kernel accept on the lo interface anything that didn’t originate from lo, so no amount of NATing allows you to redirect a packet from a public interface to the local interface.

Hence I went by reconfiguring the SMTP server component of tempalias to listen on all interfaces, port 2525 and then redirect the port of packets on the public port from 25 to 2525.

This of course left the port 2525 open on the public interface which I don’t like.

A quickly created iptables rule rejecting (as opposed to dropping – I don’t want a casual port scanner to know that iptables magic is going on) any traffic going to 2525 also dropped the redirected traffic which of course wasn’t much help.

In comes the MARK extension. Here’s what I’ve done:

# mark packets going to port 25
iptables -t mangle -A PREROUTING -i eth0 -p tcp --dport 25 -j MARK --set-mark 99

# redirect packets going to port 25 to 2525
iptables -t nat -A PREROUTING -p tcp -i eth0 --dport 25 -j REDIRECT --to-ports 2525

# drop all incoming packets to 2525 which are not marked
iptables -A INPUT -i eth0 -p tcp --dport 2525 -m mark ! --mark 99 -j REJECT

So. Now the host responds on public port 25 (but not on public port 2525).

Next step was to configure DNS and tell Richard to create himself an alias using

curl --no-keepalive -H "Content-Type: application/json" 
     --data-binary '{"target":"t@example.com","days": 3,"max-usage": 5}' 
     -qsS http://tempalias.com/aliases

(yes. you too can do that right now – it’s live baby!)

Of course it blew up the moment the redis connection timed out, taking the whole node server with it.

Which was the topic of yesterdays coding session: The redis-node-client library is very brittle what connection tracking and keeping is concerned. I needed something quick, so I hacked the library to provide an additional very explicit connection management method.

Then I began discussing the issues I was having with redis-node-client’s author. He’s such a nice guy and we had one hell of a nice discussion which is still ongoing, so I will probably have to rewrite the backend code once more once we found out how to do this the right way.

Between all that sysadmin and library-fixing time, unfortunately, I didn’t yet have time to do all too much on the public facing website: http://tempalias.com at this point contains nothing but a gradient. But it’s a really nice gradient. One of the best.

Today: More redis-node-client hacking (provided I get another answer from fictorial) or finally some real HTML/CSS work (which I’m not looking forward to).

This is taking shape.

tempalias.com – rewrites

This is yet another installment in my series of posts about building a web service in node.js. The previous post is here.

Between the last post and current trunk of tempalias, there lie two substantial rewrites of core components of the service. One thing is that I completely misused Object.create() which takes an object to be the prototype of the object you are creating. I was of the wrong opinion that it works like Crockford’s object.create() which is creating a clone of the object you are passing.

Also, I learned that only Function objects actually have a prototype.

Not knowing these two things made it impossible to actually deserialize the JSON representation of an alias that was previously stored in redis. This lead to the first rewrite – this time of lib/tempalias.js. Now aliases work more like standard JS objects and require to be instantiated using the new operator, on the plus side though, they work as expected now.

Speaking of serialization. I learned that in V8 (and Safari)

isNan(Date.parse( (new Date()).toJSON() )) === true

which, according to the ES5 spec is a bug. The spec states that Date.parse() should be able to parse a string created by Date.doISOStirng() which is what is used by toJSON.

This ended up with me doing an ugly hack (string replacement) and reporting a bug in Chrome (where the bug happens too).

Anyhow. Friday and Saturday I took off the project, but today I was on it again. This time, I was looking into serving static content. This is how we are going to serve the web site after all.

Express does provide a Static plugin, but it’s fairly limited in that it doesn’t do any client side caching which, even though Node.js is crazy fast, seems imperative to me. Also while allowing you to configure the file system path it should serve static content from, it insists on the static content’s URL being /public/whatever, where I would much rather have kept the URL-Space together.

I tried to add If-Modified-Since-support to express’ static plugin, but I hit some strange interraction in how express handles the HTTP request that caused some connections to never close – not what I want.

After two hours of investigating, I was looking at a different solution, which leads us to rewrite two:

tempalias trunk now doesn’t depend on express any more. Instead, it serves the web service part of the URL space manually and for all the static requests, it uses node-paperboy. paperboy doesn’t try to convert node into Rails and it provides nothing but a simple static file handler for your web server which also works completely inside node’s standard method for handling web requests.

I prefer this solution by much because express was doing too much in some cases and too little in others: Express tries to somewhat imitate rails or any other web framework in that it not only provides request routing but also template rendering (in HAML and friends). It also abstracts away node’s HTTP server module and it does so badly as eveidenced by this strange connection not-quite-ending problem.

On the other hand, it doesn’t provide any help if you want to write something that doesn’t return text/html.

Personally, if I’m doing a RESTful service anyways, I see no point in doing any server-side HTML generation. I’d much rather write a service that exposes an API at some URL endpoints and then also a static page that uses JavaScript / AJAX to consume said API. This is where express provides next to no help at all.

So if the question is whether to have a huge dependency which fails at some key points and doesn’t provide any help with other key points or to have a smaller dependency that handles the stuff I’m not interested in, but otherwise doesn’t interfer, I’d much prefer that solution to the first one.

This is why I went with this second rewrite.

Because I was already using a clean MVC separation (the “view” being the JSON I emit in the API – there’s no view in the traditional sense yet), the rewrite was quite hassle-free and basically nothing but syntax work.

After completing that, I felt like removing the known issues from my blog post where I was writing about persistence: Alias generation is now race-free and alias length is stored in redis too. The architecture can still be improved in that I’m currently doing two requests to Redis per ALIAS I’m creating (SETNX and SET). By moving stuff around a little bit, I can get away with just the SETNX.

On the other hand, let me show you this picture here:

Screenshot of ab running in a terminalConsidering that the current solution is already creating 1546 aliases per second at a concurrency of 100 requests, I can probably get away without changing the alias creation code any more.

And in case you ask: The static content is served with 3000 requests per second – again with a concurrency of 100.

Node is fast.

Really.

Tomorrow: Philip learns CSS – I’m already dreading this final step to enlightenment: Creating the HTML/CSS front-end UI according to the awesome design provided by Richard.

tempalias.com – the cake is a lie

This is another installment of my development diary for tempalias.com, a web service that will allow you to create self-destructing email aliases. You can read the last previous here.

This was a triumph.
I’m making a note here: HUGE SUCCESS.
It’s hard to overstate my satisfaction.

I didn’t post an update on wednesday evening because it got very late and I just wanted to sleep. Today, it’s late yet again, but I can gladly report that the backend service is now feature complete.

We are still missing the UI, but with a bit of curl on the command line, you can use the restful web service interface to create aliases and you can use the generated aliases to send email via the now completed SMTP proxy – including time and usage based expiration.

As a reminder: All the code (i.e. the completed backend) is available on my github repository, though keep in mind that there is no documentation what so ever. That I will save for later when this is really going public. If you are brave, feel free to clone it.

You will need the trunk versions for both redis and node.

Screenshot of a terminal showing three consumptions of an alias and a fourth failng.

The screenshot is is showing me consuming an alias four times in a row. Three times, I get the data back, the fourth time, it’s gone.

The website itself is still in the process of being designed and I can promise you, it will be awesome. Richard’s last design was simply mind-blowing. Unfortunately I can’t show it here yet, because he used a non-free picture. Besides, we obviously can’t use non-free artwork for a Free Software project.

So this update concerns itself with two days of work. What was going on?

On wednesday, I wanted to complete the SMTP server, but before I went ahead doing so, I revised the servers design. At the end of the last posting here, we had a design where the SMTP proxy would connect to the smarthost the moment a client connects. It would then proceed to proxy through command by command, returning error messages as they are returned by the smarthost.

The issue with this design lies in the fact that tempalias.com is, by definition, not about sending mail, but about rejecting mail. This means that once it’s up and running, the majority of mail deliveries will simply fail at the RCPT state.

From this perspective, it doesn’t make sense to connect to the smarthost when a client connects. Instead, we should do the handshake up to and including the RCPT TO command, at which time we do the alias expansion. If that fails (which is the more likely case), we don’t need to bother to connect to upstream but we can simply deny the recipient.

The consequence of course is that our RCPT TO can now return errors that happened during MAIL FROM on the upstream server. But as MAIL FROM usually only fails with a 5xx error, this isn’t terribly wrong anyways – the saved resources far outweigh the not-so-perfect error messages.

Once I completed that design change, the next roadblock I went into was the fact that both the smtp server and the smtp client libraries weren’t quite as asynchronous as I would have wanted: The server was reading the complete mail from the client into memory and the client wanted the complete mail as a parameter to its data method.

That felt unpractical to me as in the majority of cases, we won’t get the whole mail at once, but we can certainly already begin to push it through to the smarthost, keeping memory usage of our smtp server as low as possible.

So my clone of the node SMTP library now contains support for asynchronous handling for DATA. The server fires data, data_available and data_end and the client provides startData(), sendData() and endData(). Of course the old functionality is still available, but the tempalias.com SMTP server is using the new interface.

So, that was Wednesday’s work:

  • only connect to the smarthost when it’s no longer inevitable
  • complete the smtp server node library
  • made the smtp server and client libraries fully asynchronous
  • complete the SMTP proxy (but without alias expansion yet)

Before I went to bed, the SMTP server was accepting mail and sending it using the smarthost. It didn’t do alias expansion yet but just rewrote the recipient to my private email address.

This is where I picked up Thursday night: The plan was to hook the alias model classes into the SMTP server as to complete the functionality.

While doing that, I had one more architectural thing to clear: How to make sure that I can decrement the usage-counter race-free? Once that was settled, the rest was pure grunt work by just writing the needed code.

As we are getting long and as it’s quite late again, I’m saving the post-mortem of this last task for tomorrow. You’ll get a chance learn about bugs in node, about redis’ DECR command and finally you will get a chance to laugh at me for totally screwing up the usage of Object.create().

Stay tuned.

tempalias.com – config file, SMTP cleanup, beginnings of a server

Welcome to the next installment of a series of blog posts about the creation of a new web service in node.js. The posts serve as a diary of how the development of the service proceeds and should give you some insight in how working with node.js feels right now. You can read the previous episode here.

Yesterday, I unfortunately didn’t have a lot of time to commit to the project, so I chose a really small task to complete: create a configuration file, a configuration file parser and use both to actually configure how the application should behave.

The task in general was made a lot easier by the fact that (current) node contains a really simple parser for INI style configuration files. For the simple type of configuration data I have to handle, the INI format felt perfect and as I got a free parser with node itself, that’s what I went with. So as of monday it’s possible to configure listening addresses and ports for both HTTP and SMTP daemons and additional settings for the SMTP part of the service.

Today I had more time.

The idea was to seriously look into the SMTP transmission. The general idea is that email sent to the tempalias.com domain will have to end up on the node server where the alias expansion is done and the email is prepared for final delivery.

While I strive to keep the service as self-contained as possible, I opted into forcing a smarthost to be present to do the actual mail delivery.

You see, mail delivery is a complicated task in general as you must deliver the mail and if you can’t, you have to notify the sender. Reasons for a failing delivery can be permanent (that’s easy – you just tell the sending server that there was a problem and you are done) or temporary. In case of many temporary errors you end up with the responsibility of needing to handle them.

Handling in case of temporary errors usually means: Keep the original email around in a queue and retry after the initial client has long disconnected. If you don’t succeed for a reasonably large amount of delivery attempts or if a permanent problem creeps up, then you  have to bounce the message back to the initial sender.

If you want to do the final email delivery, so that your app runs without any other dependencies, then you will end up not only writing an SMTP server but also a queueing system, something that’s way beyond the scope of simple alias resolution.

Even if I wanted to go through that hassle, it still wouldn’t help much as aside of the purely technical hurdles, there are also others on a more meta level:

If you intend to do the final delivery nowadays, you practically need to have a valid PTR record, you need to be in good standing with the various RBL’s, you need to handle SSL – the list goes on and on. Much of this is administrative in nature and might even create additional cost and is completely pointless considering the fact that you do usually have a dedicated smarthost around that takes your mail and does the final delivery. And even if you don’t: Installing a local MTA for the queue handling is easily done and whatever you install, it’ll be way more mature than what I could write in any reasonable amount of time.

So it’s decided: The tempalias codebase will require a smarthost to be configured. As mine doesn’t require authentication from a certain IP range, I can even get away without writing any SMTP authentication support.

Once that was clear, the next design decision was clear too: the tempalias smtp daemon should be a really, really thin layer around the smarthost. When a client connects to tempalias, we will connect to the smarthost (500ing (or maybe 400ing) out if we can’t – remember: immediate and permanent errors are easy to handle). When a client sends MAIL FROM, just relay it to the smarthost, returning back to the client whatever we got – you get the idea: the tempalias mail daemon is an SMTP proxy.

This keeps the complexity low while still providing all the functionality we need (i.e. rewriting RCPT TO).

Once all of this was clear, I sat down and had a look at the node-smtp servers and clients and it was immediately clear that both need a lot of work to even to the simple thing I had in mind.

This means that most of todays work went into my fork of node-smtp:

  • made the hostname in the banner configurable
  • made the smtp client library work with node trunk
  • fire additional events (on close, on mail from, on rcpt to)
  • fixed various smaller bugs

Of course I have notified upstream of my changes – we’ll see what they think about.

On the other hand, the SMTP server part of tempalias (incidentally the first SMTP server I’m writing. ever) also took shape somewhat. It now correctly handles proxying from initial connection up until DATA. It doesn’t do real alias expansion yet, but that’s just a matter of hooking it into the backend model class I already have – for now I’m happy with it rewriting all passed recipients to my own email address for testing.

I already had a look at how node-smtp’s daemon handles the DATA command and I have to say that gobbling up data into memory until the client stops sending data or we run out of heap isn’t quite what I need, so tomorrow I will have to change node-smtp even more in that it fires events for every bit of data that was received. That way a consumer of the API can do some validation on the various chunks  (mostly size validation) and I can pass the data directly to the smarthost as it arrives.

This keeps memory usage of the node server small.

So that’s what I’m going to do tomorrow.

On a different note, I had some thought going into actual deployment, which probably will end up with me setting up a reverse proxy after all, but this is a topic for another discussion.

tempalias.com – SMTP and design

After being sick the end of last week, only today I found time and willpower to continue working on this little project of mine.

For people just coming to the series with this article: This is a development diary about the creation of a web service for autodestructing email addresses. Read the previous installment here.

The funny thing about the projcet is that people all around me seem to like the general idea behind the service. I even got some approval from Ebi (who generally dislikes everything that’s new) and this evening I was having dinner with a former coworker of mine whom I know for doing kick-ass web design.

He too liked the idea of the project and I could con him into creating the screen design of tempalias.com. This is a really good thing as whatever Richard touches comes out beautiful and usable.

For example, he told me that it makes way more sense to just expose a valid until date and in the form of “Valid for x days” instead of asking the user to provide a real date. This is not only much clearer and easier to use, it also fixes a brewing timezone problem I had with my previous design:

Valid for “3 days from now” is 3 days from now wherever on the world you are. But valid until 2010-04-16 is different depending on where you are.

This is a rare case of where adding usability also keeps the code simpler.

So, this is what Richard came up with so far:

Mockup of the tempalias website designIt’s not finalized yet, but in the spirit of publishing here early and often, I’m posting this now. It’s actually the third iteration already and Richard is still working on making it even nicer. But it’s already 2124 times better than what I could ever come up with.

On the code-front, I was looking into the SMTP server, where I found @kennethkalmer’s node-smtp project which provides a very rough implementation of an SMTP daemon.

Unfortunately, it doesn’t run under node trunk (or even 0.1.30), but with the power of github, I was able to create my own fork at

http://github.com/pilif/node-smtp

My fork contains a bit of additional code compared to the source:

  • Runs under node trunk (where trunk is defined as “node as it was last tuesday”)
  • Enforces proper SMTP protocol sequence (first: HELO, then MAIL FROM, then RCPT TO and finally DATA)
  • Supports multiple recipients (by handling multiple RCPT TO)
  • Does some email address validation (which is way too strict for being RFC compliant)

Tomorrow, I’m going to use this fork to build an SMTP server that we’ll be using for alias processing, where I will have to put some thought into actual mail delivery: Do I deliver the mail myself? Am I offloading it to a mail relay (I really want to do this. But read more tomorrow)? If so, how is this done with the most memory efficiency?

We’ll see.

tempalias.com – persistence

(This is the third installment of a development diary about the creation of a self destructing email alias service. Read the previous episode here.)

After the earlier clear idea on how to handle the aliases identity, the next question I needed to tackle was the question of persistence: How do I want to store these aliases? Do I want them to persist a server restart? How would I access them?

On the positive side remains the fact that the data structure for this service is practically non-existant: Each alias has its identity and some data associated with it, mainly a target address and the validity information. And lookup will always happen using that identity (with the exception of garbage collection – something I will tackle later).

So this is a clear candiate to use a very simple key/value store. As I hope to gain at least some traction though (wait until I coded the bookmarklet), I would want this to be at least of some robustness, hence writing flat-files seemed like a bad idea.

Ironically, if you want a really simple, built-in solution for data persistance in node.js, you have two options: Either write your own (which is where I don’t want to go to) or use SQLite which is total overkill for the current solution.

So I had the option of just keeping stuff in memory (as plain JS objects or using memcache)  or to use any of the supported key/value storage services.

Aliases going away on server restart felt like a bad thing, so I looked into the various key/value stores.

While looking at the available libraries, I went for the one that was most recently updated, which is redis-node-client. Of course, this meant that I had to use both redis trunk and node trunk as the library is really tracking the bleeding edge. I don’t mind that much though because both redis and node are very self-contained and compile easily on both linux (deployment) and mac os (development) while requiring next to no configuration.

So with a decision made for both persistence and identity, I went ahead and wrote more code.

On the project page, you will see few commits completing the full functionality I wanted a POST to /aliases to have – including persistence using redis and identity using the previously described method of brute-forcing the issue.

I still have two issues at the moment that will need tackling

  1. The initial length of the pseudo-uuid isn’t persisted. This means that once enough aliases are created that we are increasing the length and I’m restarting the server, I will get needless collisions or even a too heavily-used keyspace.
  2. The current method of checking for ID availability and later usage is totally non-race-proof and needs some serious looking-into.

Stuff I learned:

  • node is extremely work-in-progress. While it runs flawlessly and never surprises me with irreproducible or even just seemingly illogical behavior, features appear and disappear at will.
  • This state of flux in node makes it really hard to work with external dependencies. In this case, multipart.js vanished from node trunk (without change log entry either), but express still depends upon that. On the other hand, I’m forced to use node trunk otherwise redis client won’t work.
  • Date(“<timestamp>”) in node is dependent on the local timezone and changing process.env.TZ post-startup doesn’t have any effect. This means that I’m going to have to set TZ=UTC in my start script.
  • Working with an asynchronous API seems strange sometimes, but the power of closures usually comes to the rescue. I certainly wouldn’t want to have to write software like this if I didn’t have closures at my disposal (and, NO, global variables are NOT a viable alternative…)

tempalias.com – another day

This is the second installment of an article series about creating a web service for self-destructing email aliases. Read part 1 here.

Today, I spent a lot of thought and experimentation with two issues:

  1. How would I name and identify the temporary aliases?
  2. How would I store the temporary aliases

Naming and identifying sounds easy. One is inclined to just use an incrementing integer or something alike. But that won’t work for security reasons. If the address you got is 12@tempalias.net, with any likelyhood, there will be an 11@ and a 13@.

Using that information, you could easily bring the whole service down (and endlessly annoy its users) by requesting an address to get the current ID and then sending a lot of mail to the neighboring IDs. If those were created without a mail count limitation, then you could spam the recipient for the whole validity period and if they were created with a count limitation, you could use up all allowed mails.

So the aliases need to be random.

Which leads to the question of how to ensure uniqueness.

Unique random numbers you ask? Isn’t this what UUIDs were invented for?

True. But considering the length of an UUID, would you really want to have an alias in the form e8ea98ce-dabc-42f8-8fcd-c50d20b1f2c5@tempalias.net? That address is so long that it might even hit some length limitation of the target site, which of course is true even if you apply cheap tricks like removing the dashes.

Of course, using base16 to encode an UUID (basically an 128 bit integer) is hopelessly inefficient. By increasing the amount of characters we use, we might be able to decrease the amount of characters.

Keep in mind though, that the string in question is to be a local part of an email address and those tend to be case insensitive with not much guarantees that case is preserved over the process of delivering the message.

That, of course, limits the amount of characters we can use to basically 0-9 and A-Z (plus a few special characters like + . – and _).

This is what Base32 was invented for, but unfortunately, a base32 encoded UUID would still be around 26 characters in length. While that’s a bit better, I still wouldn’t want the email address scheme to be eda3u3rzcfer3fztdvvd6xnd3i@tempalias.com

So in the end, we need something way smaller (adding + . – and _ to the character space wouldn’t help much – what comes out is about 20 characters in length).

In the end, I would probably have to create a elaborate scheme doing something like this:

  • pick a UUID. Use the first n bytes.
  • base32 encode.
  • Check whether that ID is free. If not, add 1 to n and try again.
  • Keep n around so that in the future, we can already start with taking bigger chunks.

So the moment we reach the first collision, we increase the keyspace eight-fold. That feels sufficiently safe from collisions to me, but of course it increases the maintenance burden somewhat.

The next question was how to get UUIDs and how to base32 encode them from JavaScript.

I tried different aspects, one of which even included using uuidjs and doing the b32 encoding/decoding in C. The good part about that: I now have a general idea of how to extend nodejs with C++ code (yeah. it has to be C++ and my b32 code was C, so I had to do a bit of trickery there too).

In the end though, considering that I can’t use UUIDs anyways, we can go forward using Math.uuid.js and use their call using both len and radix (with the additional change of only using lowercase to encode the data), increasing the length as we hit collisions.

So the next issue is storage: How to store the alias data? How to access it?

This will be part of the next posting here.