Why node.js excites me

Today, on Hacker News, an article named “Why node.js disappoints me” appeared – right on the day I returned back from jsconf.eu (awesome conference. Best days of my life, I might add) where I was giving a talk about using node.js for a real web application that provides real use: tempalias.com

Time to write a rebuttal, I guess.

The main gripe Eric has with node is a gripe with the libraries that are available. It’s not about performance. It’s not about ease of deployment, or ease of development. In his opinion, the libraries that are out there at the moment don’t provide anything new compared to what already exists.

On that level, I totally agree. The most obvious candidates for development and templating try to mimik what’s already out there for other platforms. What’s worse: There seems to be no real winner and node itself doesn’t seem to make a recommendation or even include something with the base distribution.

This is inherently a good thing though. Node.js isn’t your complete web development stack. Far from it.

Node is an awesome platform to very easily write very well performing servers. Node is an awesome platform to use for your daily shell scripting needs (allowing you to work in your favorite language even for these tasks). Node isn’t about creating awesome websites. It’s about giving you the power to easily build servers. Web, DNS, SMTP – we’ve seen all.

To help you with web servers and probably to show us users how it’s done, node also provides a very good library to interact with the HTTP protocol. This isn’t about generating web pages. This isn’t about URL routing, or MVC or whatever. This is about writing a web server. About interracting with HTTP clients. Or HTTP servers. On the lowest level.

So when comparing node with other platforms, you must be careful to compare apple with apples. Don’t compare pure node.js to rails. Compare it to mod_wsgi, to fastcgi, to a servlet container (if you must) or to mod_php (the module that allows a script of yours access to server internals. Not the language) or mod_perl.

In that case, consider this. With node.js you don’t worry about performance, you don’t worry about global locks (you do worry about never blocking though), and you really, truly and most awesomely don’t worry about race conditions.

Assuming

    var a = 0;
    var f = function(){
        var t = a; // proving a point here. I know it's not needed
        a = t + 1;
    }
    setTimeout(f, 100);
    setTimeout(f, 100);

you’d always end up with a === 2 once both timeouts have executed. There is no interruption between the assignment of t and the increment. No worries about threading. No hours wasted trying to find out why a suddenly (and depending on the load on your system) is either 1, 2 or 3.

In the years we got experience in programming, we learned that what f does in my example above is a bad thing. We feel strange when typing code like this – seeking for any method of locking, of specifying a critical section. With node, there’s no need to.

This is why writing servers (remember: highly concurrent access to potentially the same code) is so much fun in node.

The perfect little helpers that were added to deal with the HTTP protocol are just the icing on the cake, but in so many other frameworks (cough WSGI cough) stuff like chunking, multipart parsing, even just reading the client’s data from an input stream are hard if you do them on your own, or completely beyond your control if you let the libraries do them.

With node you get to the knobs to turn in the easiest way possible.

Now we know that we can easily write well performing servers (of any kind with special support for HTTP) in node, so let’s build a web site.

In traditional frameworks, your first step would be to select a framework (because the HTTP libraries are so effing (technical term) hard to use).

You’d end up with something lightweight like, say, mnml or werkzeug in python or something more heavy like rails for ruby (though rack isn’t nearly as bad as wsgi) or django for python. You’d add some kind of database abstraction or even ORM layer – maybe something that comes with your framework.

Sure. You could do that in node too. There are frameworks around.

But remember: Node is an awesome tool for you to write highly specialized servers.

Do you need to build your whole site in node?

Do you see this as a black or white situation?

Over the last year, I’ve done two things.

One is to layout a way how to augment an existing application (PHP, PostgreSQL) with a WebSocket based service using node to greatly reduce the load on the existing application. I didn’t have time to implement this yet, but it would work wonders.

The other thing was to prove a point and to implement a whole web application in node.

I built tempalias.com

At first I fell into the same trap that anybody coming from the “old world” would be falling. I selected what seemed to be the most used web framework (Express) and rolled with that, but I soon found out that I have it all backwards.

I don’t want to write the 50iest web application. I wanted to do something else. Something new.

When you look at the tempalias source code (yeah – the whole service is open source so all of us can learn from it), you’ll notice that no single byte of HTML is dynamically generated.

I ripped out Express. I built a RESTful API for the main functionality of the site: Creating aliases. I built a server that does just that and nothing more.

I leveraged all the nice features JavaScript as a language provides me with to build a really cool backend. I used all the power that node provides me with to build a really cool (and simple!) server to web-enable that API (posting and reading JSON to and from the server)

The web client itself is just a client to that API. No single byte of that client is dynamically generated. It’s all static files. It’s using Sammy, jQuery, HTML and CSS to do its thing, but it doesn’t do anything the API I built on node doesn’t expose.

Because it’s static HTML, I could serve that directly from nginx I’m running in front of node.

But because I wanted the service to be self-contained, I plugged in node-paperboy to serve the static files from node too.

Paperboy is very special and very, very cool.

It’s not trying to replace node’s HTTP library. It’s not trying to abstract away all the niceties of node’s excellent HTTP support. It’s not even trying to take over the creation of the actual HTTP server. Paperboy is just a function you call with the request and response object you got as part of node’s HTTP support.

Whether you want to call it or not is your decision.

If you want to handle the request, you handle it.

If you don’t, you pass it on to paperboy.

Or foobar.

Or whatever.

Node is the UNIX of the tools to build servers with: It provides small dedicated tools that to one task, but truly, utterly excel at doing so.

So the libraries you are looking for are not the huge frameworks that do everything but just the one bit you really need.

You are looking for the excellent small libraries that live the spirit of node. You are looking for libraries that do one thing well. You are looking for libraries like paperboy. And you are relying on the excellent HTTP support to build your own libraries where the need arises.

It’s still very early in node’s lifetime.

You can’t expect everything to be there, ready to use it.

For some cases, that’s true. Need a DNS server? You can do that. Need an SMTP daemon? Easy. You can do that. Need a HTTP server that understands the HTTP protocol really well and provides excellent support to add your own functionality? Go for it.

But above all: You want to write your server in a kick-ass language? You want to never have to care about race conditions when reading, modifying and writing to a variable? You want to be sure not to waste hours and hours of work debugging code that looks right but isn’t?

Then node is for you.

It’s no turnkey solution yet.

It’s up to you to make the most out of it. To combine it with something more traditional. Or to build something new, maybe rethinking how you approach the problem. Node can help you to provide an awesome foundation to build upon. It alone will never provide you with a blog in 10 minutes. Supporting libraries don’t at this time provide you with that blog.

But they empower you to build it in a way that withstands even the heaviest pounding, that makes the most out of the available resources and above all, they allow you to use your language of choice to do so.

JavaScript.

stabilizing tempalias

While the maintenance last weekend brought quite a bit of stabilization to the tempalias service, I quickly noticed that it was still dying sooner or later and while before updating node, it died due to not being able to allocate more memory, this time, it died by just not answering any requests any more.

A look at the error log quickly revealed quite many exceptions complaining about a certain request type not being allowed to have a body and finally one complaining about not being able to open a file due to having run out of file handles.

So I quickly improved error logging and restarted the daemon in order to get a stacktrace leading to these tons of exceptions.

This quickly pointed to paperboy which was sending the file even if the request was a HEAD request. http.js in node checks for this and throws whenever you send a body when you should not. That exception lead then to paperboy never closing the file (have I already complained how incredibly difficult it is to do proper exception handling the moment continuations get involved? I think not and I also think it’s a good topic for another diary entry). With the help of lsof I’ve quickly seen that my suspicions were true: the node process serving tempalias had tons of open handles to public/index.html.

I sent a patch for this behavior to @felixge which was quickly applied, so that’s fixed now. I hope it’s of some use for other people too.

Now knowing that having a look at lsof here and then might be a good idea, quickly revealed another problem: While the file handles were gone, I’ve noticed tons and tons of SMTP sockets staying open in CLOSE_WAIT state. Not good as that too will lead to handle starvation sooner or later.

On a hunch, I found out that connecting to the SMTP daemon and then disconnecting, not sending QUIT to let the server disconnect was what was causing the lingering sockets. Clients disconnecting like that is very common in case the sender sends a 5xx response which is what the tempalias daemon was designed for.

So I had to fix that in my fork of the node smtp daemon (the original upstream isn’t interested in daemon functionality and the owner I forked the daemon for doesn’t respond to my pull requests. Hence I’m maintaining my own fork for now).

Futher looks at lsof prove that now we are quite stable in resource consumption: No lingering connections, no unclosed file handles.

But the error log was still filling up. This time something about removeListener needing a function. Thanks to the callstack I now had in my error log, I quickly hunted that one down and fixed it – that was a very stupid mistake. Thankfully, because the mails I usually deliver are small enough so that socket draining usually wasn’t required.

Onwards to the next issue filling the error log: «This deferred has already been resolved».

This comes from the Promise.js library if you emit*() multiple times on the same promise. This time, of course, the callstack was useless (… at <anonymous> – why, thank you), but I was very lucky again in that I tested from home and my mail relay didn’t trust my home IP address and thus denied relaying with a 500 which immediately led to the exception.

Now, this one is crazy: When you call .addErrback() on a Promise before calling addCallback(), your callback will be executed no matter if the errback was executed first.

Promise.js does some really interesting things to simulate polymorphism in JavaScript and I really didn’t want to fix up that library as lately, node.js itself seems go to a simpler continuation style using a callback parameter, so sooner or later, I’ll have to patch up the smtp server library anyways to remove Promise.js if I want to adhere to current node style.

So I took the workaround route by just using addCallback() before addErrback() even though the other order feels more natural to me. In addition, I reported an issue with the author as this is clearly unexpected behavior.

Now the error log is pretty much silent (minus some ECONNRESET exceptions due to clients sending RST packets in mid-transfer, but I think they are uncritical to resource consumption), so I hope the overall stability of the site has improved a bunch – I’d love not having to restart the daemon for more than a day :-)

Do spammers find pleasure in destroying fun stuff?

Recently, while reading through the log file of the mail relay used by tempalias, I noticed a disturbing trend: Apparently, SPAM was being sent through tempalias.

I’ve seen various behaviours. One was to strangely create an alias per second to the same target and then delivering email there.

While I completely fail to understand this scheme, the other one was even more disturbing: Bots were registering {max-usage: 1, days: null} aliases and then sending one mail to them – probably to get around RBL checks they’d hit when sending SPAM directly.

Aside of the fact that I do not want to be helping spammers, this also posed a technical issue: node.js head which I was running back when I developed the service tended to leak memory at times forcing me to restart the service here and then.

Now the additional huge load created by the bots forced me to do that way more often than I wanted to. Of course, the old code didn’t run on current node any more.

Hence I had to take tempalias down for maintenance.

A quick look at my commits on GitHub will show you what I have done:

  • the tempalias SMTP daemon now does RBL checks and immediately disconnects if the connected host is listed.
  • the tempalias HTTP daemon also does RBL checks on alias creation, but it doesn’t check the various DUL lists as the most likely alias creators are most certainly listed in a DUL
  • Per IP, aliases can only be generated every 30 seconds.

This should be some help. In addition, right now, the mail relay is configured to skip sender-checks and sa-exim scans (Spam Assassin on SMTP time as to reject spam before even accepting it into the system) for hosts where relaying is allowed. I intend to change that so that sa-exim and sender verify is done regardless if the connecting host is the tempalias proxy.

Looking at the mail log, I’ve seen the spam count drop to near-zero, so I’m happy, but I know that this is just a temporary victory. Spammers will find ways around the current protection and I’ll have to think of something else (I do have some options, but I don’t want to pre-announce them here for obvious reasons).

On a more happy note: During maintenance I also fixed a few issues with the Bookmarklet which should now do a better job at not coloring all text fields green eventually and at using the target site’s jQuery if available.

tempalias – validity limits

I’ve just pushed a small update to tempalias.com that imposes some (generous) limits to the values you can provide for the validity:

  • the maximum amount of days an alias can be valid is now 60 days.
  • the maximum amount of messages that can be sent to an aliases is now set to 100 messages.

I realized that there might be some potential for abusing tempalias.com if the aliases have a practically unlimited duration. Besides, then they wouldn’t be tempaliases any more. Right?

Already generated aliases with longer durations stay valid – true to the spirit of not looking into the data my users provided me with, I’m not going to check the existing aliases.

tempalias.com – creating the bookmarklet

Now that the bookmarklet feature is finished, let me take a few minutes to reflect on its creation, in the spirit of continuing the development diary.

The reason for the long silence after the launch is, believe it or not, the weather: Over the time I made the initial tempalias service, I began to really enjoy taking my 17inch MacBook Pro outside on the balcony and write code from there. In fact, I enjoyed it so much that I really wanted to continue that tradition when doing more work on the site.

Unfortunately from May first until May 21st it was raining constantly which made coding on the balcony kind of no-fun to do.

Now the weather was great and I could finish what I began way earlier.

So. How does one create a bookmarklet?

I didn’t know much either, but in the end, the essence of a bookmarklet is JavaScript code that gets executed in the context of the page you are on when you are executing it. So that’s something to work with.

Of course, you don’t want to add all the code you need for your magic to work into that link target – that would be unmaintainable and there’s some risk of breakage once the link gets too big – who knows at what size of the script browsers begin cutting off the code.

So you basically do nothing but creating a script tag sourcing the real script. This is what I’m doing too – the non-minified version of that code is in util/bookmarklet_launcher_test.js.

Looking at that file, you’ll notice that the bookmarklet itself is configurable using that c variable (keeping the names short to keep the code as short as possible). The configuration is done on the results page that is shown once the alias has been generated (public/templates/result.template).

Why the host name? Because the script that is injected (public/bookmarklet.js) doesn’t know it – when it’s sourced, window.location would still point to the site it was sourced on. The script is static code, so the server can’t inject the correct host name either – in fact, all of tempalias is static code aside of that one RESTful endpoint (/aliases).

This is a blessing as it keeps the code clean and a curse as it makes stuff harder than usual at places – this time it’s just the passing around of the host name (which I don’t want to hard-code for easier deployment and development).

The next thing of note is how the heavy lifting script is doing its work: Because the DOM manipulation and event-hooking up needed to make this work is too hard for my patience, I decided that I wanted to use jQuery.

But the script is running in the context of the target site (where the form field should be filled out), so we neither can be sure that jQuery is available nor should we blindly load it.

So the script is really careful:

  • if jQuery is available and of version 1.4.2, that one is used.
  • If jQuery is available, but not of version 1.4.2, we load our own (well – the official one from Google’s CDN) and use that, while restoring the old jQuery to the site.
  • If jQuery is not available, we load our own, restoring window.$ if it pointed to something beforehand.

This procedure would never work if jQuery wasn’t as careful as it is not to pollute the global namespace – juggling two values (window.$ and window.jQuery) is possible – anything more is breakage waiting to happen.

The last thing we need to take care of, finally, is the fact that the bookmarklet is now running in the context of the target site and, hence, cannot do AJAX requests to tempalias.com any more. This is what JSONp was invented for and I had to slightly modify the node backend to make JSONp work for the bookmarklet script (this would be commit 1a6e8c – not something I’m proud of – tempalias_http.js needs some modularization now).

All in all, this was an interesting experience between cross domain restrictions and trying to be a good citizen on the target page. Also I’m sure the new knowledge will be of use in the future for similar projects.

Unfortunately, the weather is getting bad again, so the next few features will, again, have to wait. Ideas for the future are:

  • use tempalias.com as MX and CNAME as to create your own aliases for our own domain
  • create an iphone / android client app for the REST API (/aliases)
  • daemonize the main code on its own without the help of some shell magic
  • maybe find a way to still hook some minimal dynamic content generation into paperboy.

tempalias.com – now with bookmarklet

let’s say you want to create one of these temporary aliases, but you don’t actually want to leave the page you are on.

Good news is: Now you can.

http://vimeo.com/moogaloop.swf?clip_id=11995145&server=vimeo.com&show_title=0&show_byline=0&show_portrait=0&color=00ADEF&fullscreen=1

  1. Visit tempalias.com once.
  2. Create any alias you want the bookmarklet to create for you in the future
  3. In the confirmation screen, you will be offered the bookmarklet to drag to your bookmarks bar.

Now whenever you are on a site you want to create a temporary alias for, just click that bookmarklet, hover the email field and press the left mouse button. The alias will be generated and filled into that email form.

If you are interested in how this was made, read the next entry of my development diary.

If you like to find out more about tempalias and more projects of mine, you should follow me on twitter here.

tempalias.com – bookmarklet work

While the user experience on tempalias.com is already really streamlined, compared to other services that encode the expiration settings and sometimes even the target) into the email address (and are thus exploitable and in some cases requiring you to have an account with them), it loses in that, when you have to register on some site, you will have to open the tempalias.com website in its own window and then manually create the alias.

Wouldn’t it be nice if this worked without having to visit the site?

This video is showing how I want this to work and how the bookmarklet branch on the github project page is already working:

http://vimeo.com/moogaloop.swf?clip_id=11193192&server=vimeo.com&show_title=1&show_byline=0&show_portrait=0&color=00ADEF&fullscreen=1

The workflow will be that you create your first (and probably only) alias manually. In the confirmation screen, you will be presented with a bookmarklet that you can drag to your bookmark bar and that will generate more aliases like the one just generated. This works independently of cookies or user accounts, so it would even work across browsers if you are synchronizing bookmarks between machines.

The actual bookmarklet is just a very small stub that will contain all the configuration for alias creation (so the actual bookmarklet will be the minified version of this file here). The bookmarklet, when executed will add a script tag to the page that actually does the heavy lifting.

The script that’s running in the video above tries really hard to be a good citizen as it’s run in the context of a third party webpage beyond my control:

  • it doesn’t pollute the global namespace. It has to add one function, window.$__tempalias_com,  so it doesn’t reload all the script if you click the bookmark button multiple times.
  • while it depends on jQuery (I’m not doing this in pure DOM), it tries really hard to be a good citizen:
    • if jQuery 1.4.2 is already used on the site, it uses that.
    • if any other jQuery version is installed, it loads 1.4.2 but restores window.jQuery to what it was before.
    • if no jQuery is installed, it loads 1.4.2
    • In all cases, it calls jQuery.noConflict if $ is bound to anything.
  • All DOM manipulation uses really unique class names and event namespaces

While implementing, I noticed that you can’t unbind live events with just their name, so $().die(‘.ta’) didn’t work an I had to provide all events I’m live-binding to. I’m using live here because the bubbling up delegation model works better in a case where there might be many matching elements on any particular page.

Now the next step will be to add some design to the whole thing and then it can go live.