On december 20th I gave a talk at the JSZurich user group meeting in Zürich.
The talk is about a decade old technology which can be abused to get full,
I was showing how you would script a Java Applet (which is completely hidden
from the user) to do the dirty work for you while you are creating a very nice
While it’s a very cool tech demo, it’s IMHO also a very bad security issue
which browser vendors and Oracle need to have a look at. The user sees nothing
but a dialog like this:
and once they click OK, they are completely owned.
Even worse, while this dialog is showing the case of a valid certificate, the
dialog in case of an invalid (self-signed or expired) certificate isn’t much
different, so users can easily tricked into clicking allow.
The source code of the demo application is on github
and I’ve already written about this on this blog here,
but back then I was mainly interested in getting it work.
By now though, I’m really concerned about putting an end to this, or at least
increasing the hurdle the end-user has to jump through before this goes off –
maybe force them to click a visible Applet. Or just remove the LiveConnect feature all
together from browsers, thus forcing applets to be visible.
But aside of the security issues, I still think that this is a very
interesting case of long forgotten technology. If you are interested, do have
a look at the talk and travel back in time to when stuff like this was only
half as scary as it is now.
Think huge shopping cart: You change the quantity of a line item and the line total as well as the order total will change.
This leads to the same data (line items) having three representations:
The model on the server
The HTML UI that is shown to the user
So the question is: Considering we have some HTML templating language to build 2), how do we get to 3).
Back in 2004 when I initially designed that system (using AJAX before it was widely called AJAX even), I hadn’t seen Crockford’s lectures yet, so I still lived in the “JS sucks” world, where I’ve done something like this
(Yeah – as I said: 2004. No object literals, global functions. We had a lot to learn back then, but so did you, so don’t be too angry at me – we improved)
Obviously, this doesn’t scale: As the line items got more complicated, that parameter list grew and grew. The HTML code got uglier and uglier and of course, cluttering the window object is a big no-no too. So we went ahead and built a beautiful design:
The first iteration was then parsing that JSON every time we needed to access any of the associated data (and serializing again whenever it changed). Of course this didn’t go that well performance-wise, so we began caching and did something like this (using jQuery):
Now each DOM element representing one of these <tr>’s had a ps_data member which allowed for quick access. The JSON had to be parsed only once and then the data was available. If it changed, writing it back didn’t require a re-serialization either – you just changed that property directly.
This design is reasonably clean (still not as DRY as the initial attempt which had the data only in that JSON string) while still providing enough performance.
Until you begin to amass datasets. That is.
Well. Until you do so and expect this to work in IE.
800 rows like this made IE lock up its UI thread for 40 seconds.
So more optimization was in order.
will kill IE. Remember: IE (still) doesn’t have getElementsByClassName, so in IE, jQuery has to iterate the whole DOM and check whether each elements class attribute contains “lineitem”. Considering that IE’s DOM isn’t really fast to start with, this is a HUGE no-no.
Nope. Nearly as bad considering there are still at least 800 tr’s to iterate over.
Would help if it weren’t 800 tr’s that match. Using dynaTrace AJAX (highly recommended tool, by the way) we found out that just selecting the elements alone (without the iteration) took more than 10 seconds.
So the general take-away is: Selecting lots of elements in IE is painfully slow. Don’t do that.
But back to our little problem here. Unserializing that JSON at DOM ready time is not feasible in IE, because no matter what we do to that selector, once there are enough elements to handle, it’s just going to be slow.
Now by chunking up the amount of work to do and using setTimeout() to launch various deserialization jobs we could fix the locking up, but the total run time before all data is deserialized will still be the same (or slightly worse).
So what we have done in 2004, even though it was ugly, was way more feasible in IE.
Which is why we went back to the initial design with some improvements:
phew crisis averted.
Loading time went back to where it was in the 2004 design. It was still bad though. With those 800 rows, IE was still taking more than 10 seconds for the rendering task. dynaTrace revealed that this time, the time was apparently spent rendering.
The initial feeling was that there’s not much to do at that point.
Until we began suspecting the script tags.
The page loaded instantly.
it took 10 seconds again.
So the final solution was to loop through the server-side model twice and do something like this:
Problem solved. Not too ugly design. Certainly no 2004 design any more.
And in closing, let me give you a couple of things you can do if you want to bring the performance of IE down to its knees:
Use broad jQuery selectors. $('.someclass') will cause jQuery to loop through all elements on the page.
Even if you try not to be broad, you can still kill performance: $('div.someclass'). The most help jQuery can expect from IE is getElementsByTagName, so while it’s better than iterating all elements, it’s still going over all div’s on your page. Once it’s more than 200, the performance extremely quickly falls down (probably doing some O(n^2) thing somehwere).
Use a lot of <script>-tags. Every one of these will force IE to marshal data to the scripting engine COM component and to wait for the result.
Next time, we’ll have a look at how to use jQuery’s delegate() to handle common cases with huge selectors.
The main gripe Eric has with node is a gripe with the libraries that are available. It’s not about performance. It’s not about ease of deployment, or ease of development. In his opinion, the libraries that are out there at the moment don’t provide anything new compared to what already exists.
On that level, I totally agree. The most obvious candidates for development and templating try to mimik what’s already out there for other platforms. What’s worse: There seems to be no real winner and node itself doesn’t seem to make a recommendation or even include something with the base distribution.
This is inherently a good thing though. Node.js isn’t your complete web development stack. Far from it.
Node is an awesome platform to very easily write very well performing servers. Node is an awesome platform to use for your daily shell scripting needs (allowing you to work in your favorite language even for these tasks). Node isn’t about creating awesome websites. It’s about giving you the power to easily build servers. Web, DNS, SMTP – we’ve seen all.
To help you with web servers and probably to show us users how it’s done, node also provides a very good library to interact with the HTTP protocol. This isn’t about generating web pages. This isn’t about URL routing, or MVC or whatever. This is about writing a web server. About interracting with HTTP clients. Or HTTP servers. On the lowest level.
So when comparing node with other platforms, you must be careful to compare apple with apples. Don’t compare pure node.js to rails. Compare it to mod_wsgi, to fastcgi, to a servlet container (if you must) or to mod_php (the module that allows a script of yours access to server internals. Not the language) or mod_perl.
In that case, consider this. With node.js you don’t worry about performance, you don’t worry about global locks (you do worry about never blocking though), and you really, truly and most awesomely don’t worry about race conditions.
you’d always end up with a === 2 once both timeouts have executed. There is no interruption between the assignment of t and the increment. No worries about threading. No hours wasted trying to find out why a suddenly (and depending on the load on your system) is either 1, 2 or 3.
In the years we got experience in programming, we learned that what f does in my example above is a bad thing. We feel strange when typing code like this – seeking for any method of locking, of specifying a critical section. With node, there’s no need to.
This is why writing servers (remember: highly concurrent access to potentially the same code) is so much fun in node.
The perfect little helpers that were added to deal with the HTTP protocol are just the icing on the cake, but in so many other frameworks (cough WSGI cough) stuff like chunking, multipart parsing, even just reading the client’s data from an input stream are hard if you do them on your own, or completely beyond your control if you let the libraries do them.
With node you get to the knobs to turn in the easiest way possible.
Now we know that we can easily write well performing servers (of any kind with special support for HTTP) in node, so let’s build a web site.
In traditional frameworks, your first step would be to select a framework (because the HTTP libraries are so effing (technical term) hard to use).
You’d end up with something lightweight like, say, mnml or werkzeug in python or something more heavy like rails for ruby (though rack isn’t nearly as bad as wsgi) or django for python. You’d add some kind of database abstraction or even ORM layer – maybe something that comes with your framework.
Sure. You could do that in node too. There are frameworks around.
But remember: Node is an awesome tool for you to write highly specialized servers.
Do you need to build your whole site in node?
Do you see this as a black or white situation?
Over the last year, I’ve done two things.
One is to layout a way how to augment an existing application (PHP, PostgreSQL) with a WebSocket based service using node to greatly reduce the load on the existing application. I didn’t have time to implement this yet, but it would work wonders.
The other thing was to prove a point and to implement a whole web application in node.
At first I fell into the same trap that anybody coming from the “old world” would be falling. I selected what seemed to be the most used web framework (Express) and rolled with that, but I soon found out that I have it all backwards.
I don’t want to write the 50iest web application. I wanted to do something else. Something new.
When you look at the tempalias source code (yeah – the whole service is open source so all of us can learn from it), you’ll notice that no single byte of HTML is dynamically generated.
I ripped out Express. I built a RESTful API for the main functionality of the site: Creating aliases. I built a server that does just that and nothing more.
The web client itself is just a client to that API. No single byte of that client is dynamically generated. It’s all static files. It’s using Sammy, jQuery, HTML and CSS to do its thing, but it doesn’t do anything the API I built on node doesn’t expose.
Because it’s static HTML, I could serve that directly from nginx I’m running in front of node.
But because I wanted the service to be self-contained, I plugged in node-paperboy to serve the static files from node too.
Paperboy is very special and very, very cool.
It’s not trying to replace node’s HTTP library. It’s not trying to abstract away all the niceties of node’s excellent HTTP support. It’s not even trying to take over the creation of the actual HTTP server. Paperboy is just a function you call with the request and response object you got as part of node’s HTTP support.
Whether you want to call it or not is your decision.
If you want to handle the request, you handle it.
If you don’t, you pass it on to paperboy.
Node is the UNIX of the tools to build servers with: It provides small dedicated tools that to one task, but truly, utterly excel at doing so.
So the libraries you are looking for are not the huge frameworks that do everything but just the one bit you really need.
You are looking for the excellent small libraries that live the spirit of node. You are looking for libraries that do one thing well. You are looking for libraries like paperboy. And you are relying on the excellent HTTP support to build your own libraries where the need arises.
It’s still very early in node’s lifetime.
You can’t expect everything to be there, ready to use it.
For some cases, that’s true. Need a DNS server? You can do that. Need an SMTP daemon? Easy. You can do that. Need a HTTP server that understands the HTTP protocol really well and provides excellent support to add your own functionality? Go for it.
But above all: You want to write your server in a kick-ass language? You want to never have to care about race conditions when reading, modifying and writing to a variable? You want to be sure not to waste hours and hours of work debugging code that looks right but isn’t?
Then node is for you.
It’s no turnkey solution yet.
It’s up to you to make the most out of it. To combine it with something more traditional. Or to build something new, maybe rethinking how you approach the problem. Node can help you to provide an awesome foundation to build upon. It alone will never provide you with a blog in 10 minutes. Supporting libraries don’t at this time provide you with that blog.
But they empower you to build it in a way that withstands even the heaviest pounding, that makes the most out of the available resources and above all, they allow you to use your language of choice to do so.
Create any alias you want the bookmarklet to create for you in the future
In the confirmation screen, you will be offered the bookmarklet to drag to your bookmarks bar.
Now whenever you are on a site you want to create a temporary alias for, just click that bookmarklet, hover the email field and press the left mouse button. The alias will be generated and filled into that email form.
Now that the bookmarklet feature is finished, let me take a few minutes to reflect on its creation, in the spirit of continuing the development diary.
The reason for the long silence after the launch is, believe it or not, the weather: Over the time I made the initial tempalias service, I began to really enjoy taking my 17inch MacBook Pro outside on the balcony and write code from there. In fact, I enjoyed it so much that I really wanted to continue that tradition when doing more work on the site.
Unfortunately from May first until May 21st it was raining constantly which made coding on the balcony kind of no-fun to do.
Now the weather was great and I could finish what I began way earlier.
So. How does one create a bookmarklet?
Of course, you don’t want to add all the code you need for your magic to work into that link target – that would be unmaintainable and there’s some risk of breakage once the link gets too big – who knows at what size of the script browsers begin cutting off the code.
So you basically do nothing but creating a script tag sourcing the real script. This is what I’m doing too – the non-minified version of that code is in util/bookmarklet_launcher_test.js.
Looking at that file, you’ll notice that the bookmarklet itself is configurable using that c variable (keeping the names short to keep the code as short as possible). The configuration is done on the results page that is shown once the alias has been generated (public/templates/result.template).
Why the host name? Because the script that is injected (public/bookmarklet.js) doesn’t know it – when it’s sourced, window.location would still point to the site it was sourced on. The script is static code, so the server can’t inject the correct host name either – in fact, all of tempalias is static code aside of that one RESTful endpoint (/aliases).
This is a blessing as it keeps the code clean and a curse as it makes stuff harder than usual at places – this time it’s just the passing around of the host name (which I don’t want to hard-code for easier deployment and development).
The next thing of note is how the heavy lifting script is doing its work: Because the DOM manipulation and event-hooking up needed to make this work is too hard for my patience, I decided that I wanted to use jQuery.
But the script is running in the context of the target site (where the form field should be filled out), so we neither can be sure that jQuery is available nor should we blindly load it.
So the script is really careful:
if jQuery is available and of version 1.4.2, that one is used.
If jQuery is available, but not of version 1.4.2, we load our own (well – the official one from Google’s CDN) and use that, while restoring the old jQuery to the site.
If jQuery is not available, we load our own, restoring window.$ if it pointed to something beforehand.
This procedure would never work if jQuery wasn’t as careful as it is not to pollute the global namespace – juggling two values (window.$ and window.jQuery) is possible – anything more is breakage waiting to happen.
The last thing we need to take care of, finally, is the fact that the bookmarklet is now running in the context of the target site and, hence, cannot do AJAX requests to tempalias.com any more. This is what JSONp was invented for and I had to slightly modify the node backend to make JSONp work for the bookmarklet script (this would be commit 1a6e8c – not something I’m proud of – tempalias_http.js needs some modularization now).
All in all, this was an interesting experience between cross domain restrictions and trying to be a good citizen on the target page. Also I’m sure the new knowledge will be of use in the future for similar projects.
Unfortunately, the weather is getting bad again, so the next few features will, again, have to wait. Ideas for the future are:
use tempalias.com as MX and CNAME as to create your own aliases for our own domain
create an iphone / android client app for the REST API (/aliases)
daemonize the main code on its own without the help of some shell magic
maybe find a way to still hook some minimal dynamic content generation into paperboy.
While the user experience on tempalias.com is already really streamlined, compared to other services that encode the expiration settings and sometimes even the target) into the email address (and are thus exploitable and in some cases requiring you to have an account with them), it loses in that, when you have to register on some site, you will have to open the tempalias.com website in its own window and then manually create the alias.
Wouldn’t it be nice if this worked without having to visit the site?
This video is showing how I want this to work and how the bookmarklet branch on the github project page is already working:
The workflow will be that you create your first (and probably only) alias manually. In the confirmation screen, you will be presented with a bookmarklet that you can drag to your bookmark bar and that will generate more aliases like the one just generated. This works independently of cookies or user accounts, so it would even work across browsers if you are synchronizing bookmarks between machines.
The actual bookmarklet is just a very small stub that will contain all the configuration for alias creation (so the actual bookmarklet will be the minified version of this file here). The bookmarklet, when executed will add a script tag to the page that actually does the heavy lifting.
The script that’s running in the video above tries really hard to be a good citizen as it’s run in the context of a third party webpage beyond my control:
it doesn’t pollute the global namespace. It has to add one function, window.$__tempalias_com, so it doesn’t reload all the script if you click the bookmark button multiple times.
while it depends on jQuery (I’m not doing this in pure DOM), it tries really hard to be a good citizen:
if jQuery 1.4.2 is already used on the site, it uses that.
if any other jQuery version is installed, it loads 1.4.2 but restores window.jQuery to what it was before.
if no jQuery is installed, it loads 1.4.2
In all cases, it calls jQuery.noConflict if $ is bound to anything.
All DOM manipulation uses really unique class names and event namespaces
While implementing, I noticed that you can’t unbind live events with just their name, so $().die(‘.ta’) didn’t work an I had to provide all events I’m live-binding to. I’m using live here because the bubbling up delegation model works better in a case where there might be many matching elements on any particular page.
Now the next step will be to add some design to the whole thing and then it can go live.
The layout is finished, the last edges too rough for pushing the thing live are smoothed. tempalias.com is live. After coming really close to finishing the thing yesterday (hence the lack of a posting here – I was too tired when I had to quit at 2:30am) last night, now I could complete the results page and add the needed finishing touches (like a really cool way of catching enter to proceed from the first to the last form field – my favorite hidden feature).
I guess it’s time for a little debriefing:
All in all, the project took a time span of 17 days to implement from start to finish. I did this after work and mostly during weekdays and sundays, so it’s actually 11 days in which work was going on (I also was sick two days). Each day I worked around 4 hours, so all in all, this took around 44 hours to implement.
A significant part of this time went into modifications of third party libraries, while I tried to contact the initial authors to get my changes merged upstream:
The author of node-smtp isn’t interested in the SMTP daemon functionality (that wasn’t there when I started and is now completed)
The author of redis-node-client didn’t like my patch, but we had a really fruitful discussion and node-redis-client got a lot better at handling dropped connection in the process.
The author of node-paperboy has merged my patch for a nasty issue and even tweeted about it (THANKS!)
Before I continue, I want to say a huge thanks to fictorial on github for the awesome discussion I was allowed to have with him about node-redis-client’s handling of dropped connections. I’ve enjoyed every word I was typing and reading.
But back to the project.
Non-third-party code consists of just 1624 lines of code (using wc -l, so not an accurate measurement). This doesn’t factor in the huge amount of changes I made to my fork of node-smtp the daemon part of which was basically non-existant.
Overall, the learnings I made:
git and github are awesome. I knew that beforehand, but this just cemented my opinion
node.js and friends are still in their infancy. While node removes previously published API on a nearly daily basis (it’s mostly bug-free though), none of the third-party libraries I am using were sufficiently bug-free to use them without change.
Asynchronous programming can be fun if you have closures at your disposal
Asynchronous programming can be difficult once the nesting gets deep enough
Making any variable not declared with var global is the worst design decision I have ever seen in my life especially in node where we are adding concurrency to the mix)
Node is crazy fast.
Also, I want to take the opportunity and say huge thanks to:
the guys behind node.js. I would have had to do this in PHP or even rails (which is even less fitting than PHP as it provides so much functionality around generating dynamic HTML and so little around pure JSON based web services) without you guys!
Richard for his awesome layout
fictorial for redis-node-client and for the awesome discussion I was having with him.
kennethkalmer for his work on node-smtp even though it was still incomplete – you lead me on the right tracks how to write an SMTP daemon. Thank you!
@felixge for node-paperboy – static file serving done right
The guys behind sammy – writing fully JS based AJAX apps has never been easier and more fun.
Thank you all!
The next step will be marketing: Seing this is built on node.js and an actually usable project – way beyond the usual little experiments, I hope to gather some interest in the Hacker community. Seing it also provides a real-world use, I’ll even go and try to submit news about the project on more general outlets. And of course on the Security Now! feedback page as this is inspired by their episode 242.