why I don’t touch crypto

As programmers, we screw up. Small bugs, big bugs, laziness – the possibilities are endless.

Usually, when we screw up, we know it immediately: we get a failing test, an exception gets logged somewhere, or we hear from our users that such and such a feature doesn’t work.

Also, most of the time, no matter how bad the bug, the issue can be worked around and the application keeps working overall.

Once you’ve found the bug, you fix it and everybody is happy.

But imagine you had one of those off-by-one errors in your code (the kind that constantly happens to all of us), and further imagine that the function containing the error still apparently produced the same output as if the error weren’t there.

Imagine that, because of that error, the apparently correct output is completely useless and your whole application is now utterly broken.

That’s crypto for you.

Crypto can’t be «a bit broken». It can’t be «mostly working». Either it’s 100% correct, or you shouldn’t have bothered doing it at all. The weakest link breaks the whole chain.

Worse: the data you are working with shows no sign of wrongness when you look at it. You encrypt something, you see random data. You decrypt it, you see clear text. Seems to work fine. Right! Right?

Last week’s issue in the random number generator in Cryptocat is a very good example.

The bug was an off-by-one error in their random number generator. The function still produced output that looked random; eyeballing it would show nothing but random numbers. And that only reinforces our natural bias toward believing our code is correct.

And yet it was wrong. The bug was there, and the random numbers weren’t really random (enough).
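To illustrate the class of bug (a made-up sketch, not the actual Cryptocat code):

    // Hypothetical illustration – NOT the actual Cryptocat code.
    // Intended behavior: return a uniformly random digit from 0 to 9.
    function randomDigit() {
        var r = Math.floor(Math.random() * 256);
        // Off-by-one: this should be % 10. With % 9, the digit 9 can
        // never occur, silently shrinking the keyspace – yet every
        // individual output still looks perfectly random.
        return r % 9;
    }

Every call returns a plausible digit. You can stare at the output all day and never notice that a tenth of the keyspace is simply gone.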

The weakest link was broken, the whole security effort rendered practically pointless – which is even worse for an application whose only purpose is, you know, security.

Security wasn’t just an added feature to some other core functionality. It was the core functionality.

That small off-by-one error completely broke the whole application and was completely unnoticeable by just looking at the produced output. Writing a test case for this would have required thinking and code complicated enough to be just as likely to contain an error as the code under test.

This, my friends, is why I keep my hands off crypto. I’m just plain not good enough. Crypto is a world where understanding the concepts, understanding the math and writing tests just isn’t good enough.

The goal you have to reach is perfection. If you fail to reach that, then you have failed utterly.

Crypto is something I leave to others to deal with. Either they reach perfection, at which point they have my utmost respect. Or they fail, at which point they have my understanding.

armchair scientists

The place: London. The time: Around 1890.

Imagine a medium-sized room, lined with huge shelves filled with dusty
books. The lights are dim, the air is heavy with cigar smoke. Outside,
the last shred of daylight is fading away.

In one corner of the room, you spot two large leather armchairs and a
small table. On top of the table, two half-full glasses of whiskey. In
each of the armchairs, an elderly gentleman.

One of them opens his mouth to speak:

«If I were in charge down there in South Africa, we’d be so much
better off – running a colony just can’t be as hard as they make it
out to be.»

Conceivable to have happened? Yeah. Very likely, actually. Crazy and
misguided? Of course – we learned about that in school: imperialism
doesn’t work.

Of course that elderly guy in the little story is wrong. The problems
are way too complex for a bystander to even understand, let alone
solve. More than likely he doesn’t even have a fraction of the
background needed to understand the complexities.

And yet there he sits, in his comfortable chair, in the warmth of his
club in cozy London, explaining that he knows so much better than, you
know, the people actually doing the work.

Now think today.

Think about that article you just read that was explaining a problem
the author was solving. Or that other article that was illustrating a
problem the author is having, still in search of a solution.

Didn’t you feel the urge to go to Hacker News
and reply how much you know better and how crazy the original poster
must be not to see the obvious simple solution?

Having trouble scaling 4chan? How can that be hard?
Having trouble with your programming environment because you can’t assign one string to another?
Well. It’s just strings – why is that so hard?

Or those idiots at Amazon who can’t even keep their cloud service
running? Clearly it can’t be that hard!

See a connection? By stating opinions like that, you are not even a
little bit better than the elderly guy at the beginning of this essay.

Until you know all the facts, until you were there, on the ladder
holding a hose trying to extinguish the flames, until then, you don’t
have the right to assume that you’d do better.

The world we live in is incredibly complicated. Even though computer
science might boil down to math, our job is dominated by side-effects
and uncontrollable external factors.

Even if you think that you know the big picture, you probably won’t
know all the details and without knowing the details, it’s
increasingly likely that you don’t understand the big picture either.

Don’t be an armchair scientist.

Be a scientist. Work with people. Encourage them, discuss solutions,
propose ideas, ask what obvious fact you missed or what was missing
from the problem description.

This is 2012, not 1890.

E_NOTICE stays off.

I’m sure you’ve used this idiom a lot when writing JavaScript code:

options['a'] = options['a'] || 'foobar';

It’s short, it’s concise and it’s clear what it does. In Ruby, you can be even more concise:

params[:a] ||= 'foobar'

So you can imagine that I was happy with PHP 5.3’s new ?: operator:

<? $options['a'] = $options['a'] ?: 'foobar'; ?>

In all three cases, the syntax is concise and readable, though arguably the PHP one could read a bit better. Still, ?: beats writing out the full ternary expression and spelling $options['a'] three times.

PopScan has been running with E_NOTICE turned off since forever (forever being 2004). Back then, I felt it provided nothing but baggage, and I just wanted (had to) get things done quickly.

This, of course, led to people not taking enough care with the code, and recently I had one case too many of a bug caused by accessing a variable that was undefined in a specific code path.

I decided that I’m willing to spend the effort to clean all of this up and to make sure that there are no undeclared fields and variables anywhere in PopScan’s codebase.

Which turned out to be quite a bit of work, as a lot of code apparently happily relies on the default null that you can read out of undefined variables. Those instances might be ugly, but they are by no means bugs.

Cases where the null wouldn’t be expected are the ones I care about, but I don’t even want to go and distinguish the two – I’ll just fix all of the instances (embarrassingly many; most of them, thankfully, not mine).

Of course, if I put hours into a cleanup project like this, I want to be sure that nobody destroys my work again over time.

Which is why I was looking into running PHP with E_NOTICE enabled, in development mode at least.

Which brings us back to the introduction.

<? $options['a'] = $options['a'] ?: 'foobar'; ?>

is wrong code. Accessing an undefined index of an array always raises a notice. It’s not like Python, where you can choose (accessing a dictionary using [] throws a KeyError, but there’s get(), which just returns None). No. You don’t get to choose. You only get to add boilerplate:

<? $options['a'] = isset($options['a']) ? $options['a'] : 'foobar'; ?>

See how I’m now spelling $options['a'] three times again? ?: just got a whole lot less useful.

But not only that. Let’s say you have code like this:

<?
list($host, $port) = explode(':', trim($def));
$port = $port ?: 11211; ?>

IMHO very readable and clear what it does: It extracts a host and a port and sets the port to 11211 if there’s none in the initial string.

This of course won’t work with E_NOTICE enabled. You either lose the very concise list() syntax, or you do – ugh – this:

<?
list($host, $port) = explode(':', trim($def)) + array(null, null);
$port = $port ?: 11211; ?>

Which looks ugly as hell. And no, you can’t write a wrapper for explode() that always returns an array that’s big enough, because you don’t know what’s big enough – you’d have to pass the number of nulls you want into the call too. That would read nicer than the above hack, but it still doesn’t come close in conciseness to the solution that throws a notice.

So, in the end, I’m just complaining about syntax, you might think? I thought so too, and I wanted to add the syntax I liked, so I did a bit of experimenting.

Here’s a little something I’ve come up with:

https://gist.github.com/1267568

The wrapped array solution looks really compelling syntax-wise and I could totally see myself using this and even forcing everybody else to go there. But of course, I didn’t trust PHP’s interpreter and thus benchmarked the thing.

pilif@tali ~ % php e_notice_stays_off.php
Notices off. Array 100000 iterations took 0.118751s
Notices off. Inline. Array 100000 iterations took 0.044247s
Notices off. Var. Array 100000 iterations took 0.118603s
Wrapped array. 100000 iterations took 0.962119s
Parameter call. 100000 iterations took 0.406003s
Undefined var. 100000 iterations took 0.194525s

So: using the nice syntactic sugar costs seven times the performance. The second-best solution? Still four times. Out of the question. Yes, this could be seen as a micro-optimization, but 100’000 iterations, while a lot, is not that many. Waiting nearly a second instead of a tenth of one is crazy, especially for an operation as common as this.

Interestingly, the most bloated code (the one checking with isset()) is twice as fast as the most readable (the plain assignment). Likely, the notice gets fired regardless of error_reporting() and is only discarded later on.

What really pisses me off about this is the fact that everywhere else PHP doesn’t give a damn. ‘0’ is equal to 0. Heck, even ‘abc’ is equal to 0. It even fails silently many times.

But in a case like this, where there is even newly added nice and concise syntax, it has to be anal and bitchy. And there’s no way to get to the needed solution but to either write too expensive wrappers or ugly boilerplate.

Dynamic languages give us a very useful tool to be dynamic in the APIs we write. We can create functions that take a dictionary (an array in PHP) of options. We can extend our objects at runtime by just adding a property. And with PHP’s (way too) lenient data conversion rules, we can even do math with user supplied string data.

But can we read data from $_GET without boilerplate? No. Not in PHP. Can we use a dictionary of optional parameters? Not in PHP. PHP would require boilerplate.

If a language basically mandates retyping the same expression three times, then, IMHO, something is broken. And if all the workarounds are either crappy to read or have very bad runtime properties, then something is terribly broken.

So, I decided to just fix the problem (undefined variable access) but leave E_NOTICE where it is (off). There’s always git blame, and I’ll make sure I get a beer every time somebody lets another undefined variable slip in.

Asking for permission

Only just last year, I told @brainlock (in real life, so I can’t link) that the coolest thing about our industry was that you don’t have to ask for permission to do anything.

Want to start the next big web project? Just start it. Want to write about your opinions? Just write about them. Want to get famous? It’s still a lot of work and marketing, but nothing (aside from a lack of talent) is stopping you.

Whenever you have a good idea for a project, you start working on it, you see how it turns out, and you decide whether to continue working on it or to scrap it. Aside from a bit of cash for hosting, you don’t need anything else.

This is very cool because it empowers “normal people”. Heck, I probably wouldn’t be where I am now if it weren’t for this. Back in 1996 I had no money, I wasn’t known, and I had no past experience. What I had, though, was enthusiasm.

Which is all that’s needed.

Only a year later, though, I’m sad to see that we are on the verge of losing all of this. Piece by piece.

First came Apple with the iPhone. Even with all the enthusiasm in the world, you are not going to write an app that other people can run on their phones. No. First you will have to ask Apple for permission.

Want to access some third-party hardware from that iPhone app? Sure. But now you have to ask not only Apple but also the third-party vendor for permission.

The explanation we were given was that a malicious app could easily bring down the mobile network, so they needed to be careful about what we could run on our phones.

But then we got the iPad, with the exact same restrictions, even though not all iPads even have mobile network access.

The explanation this time? Security.

As nobody wants their machine to be insecure, everybody just accepts it.

Next came Microsoft: in the Windows Mobile days, before the release of 7, you didn’t have to ask anybody for permission. You bought Visual Studio (or pirated it if you didn’t have the money), you wrote your app, you published it.

All of this is lost now. Now you ask for permission. Now you hope for the powers that be to allow you to write your software.

Finally, you can’t even do what you want with your PC – all because of security.

So there’s still the web, you think? I wish I could be positive about that, but as we are running out of IP addresses and the adoption of IPv6 stays as slow as ever, I believe that public IP addresses will become a scarce good – at which point, again, you will be asking for permission.

In some countries, even today, it’s not possible to just write a blog post, because the government is afraid of “unrest” (read: losing even more credibility). And that’s not just countries we have always perceived as “not free” – heck, even in Italy you must register with the government if you want to have a blog (it turns out that law didn’t come to pass – let’s hope no other country has the same bright idea). In Germany, if you read the law by the letter, you can’t blog at all without getting every post approved – you could write something that a minor might see.

«But permission will be granted anyway», you might say. Are you sure, though? What if you are a minor wanting to create an application for your first client? Back in my day, I could just do it. Are you sure that whatever entity has to give permission wants to do business with minors? You do know that you can’t have a Gmail account if you are younger than 13, do you? So age barriers exist.

What if your project competes with whatever entity has to give permission? Remember the story about the Google Voice app? Once we are out of IP addresses, the big providers and media companies who still have addresses might see your little startup web project as competition in some way. Are you sure you will still get permission?

Back in 1996, when I started my company in high school, all you needed to earn your living was enthusiasm and a PC (yes – I started doing web programming without having access to the internet).

Now you need signed contracts, signed NDAs, lobbying, developer program memberships, cash – the barriers to entry are infinitely higher.

I’m afraid, though, that this is just the beginning. If we don’t stand up now, if we continue to let big companies and governments take away our freedom of expression piece by piece, if we give up more and more of our freedom for the false promise of security, then at some point all of what we had will be lost.

We won’t be able to just start our projects. We won’t be able to create – only to work on other people’s projects. We will lose all that makes our profession interesting.

Let’s not go there.

Please.

Discussion on HackerNews

AJAX, Architecture, Frameworks and Hacks

Today I was talking with @brainlock about JavaScript, AJAX and frameworks, and about two paradigms that are in use today:

The first is the “traditional” paradigm, where your JS code is just glorified view code. This is how AJAX worked in the early days and how many people are still using it: your JS code intercepts a click somewhere and sends an AJAX request to the server, and it gets back either more JS code which just gets evaluated (thus giving the server a kind of indirect access to the client DOM) or an HTML fragment which gets inserted at the appropriate spot.

This means that your JS code will be ugly (especially the code coming from the server), but it has the advantage that all your view code is right there where your controllers and your models are: on the server. You see this pattern in use on the 37signals pages or in the GitHub file browser, for example.

Keep the file browser in mind as I’m going to use that for an example later on.

The other paradigm is to go the other way around and promote JS to a first-class language: you build a framework on the client end and transmit only data (XML or JSON, but mostly JSON these days) from the server to the client. The server just provides a REST API for the data, plus serves static HTML files. All the view logic lives on the client side only.

The advantages: you can organize your client-side code much better, for example using Backbone; there’s no expensive view rendering on the server side; and you basically get your third-party API for free, because the API is the only thing the server provides.

This paradigm is used by the new Twitter webpage or by my very own tempalias.com.

Now, @brainlock is a heavy proponent of the second paradigm. After being enlightened by the great Crockford, we both love JS, and we have both worked on huge messes of client-side JS code that grew over the years, lack structure and sometimes feel like copy pasta. In our defense: tons of that code was written in the pre-enlightened age (2004).

I, on the other hand, see some justification for the first pattern as well, and I wouldn’t throw it away so quickly.

The main reason: it’s more pragmatic, it’s more DRY once you need graceful degradation, and arguably you reach your goal a bit faster.

Let me explain by looking at the github file browser:

If you have a browser that supports the HTML5 history API, a click on a directory will reload the file list via AJAX, and at the same time the URL will be updated using pushState (so that the current view keeps an absolute URL which stays valid even when you open it in a new browser).

If a browser doesn’t support pushState, it gracefully degrades by just following the traditional link (and reloading the full page).

Let’s map this functionality to the two paradigms.

First the hacky one:

  1. You render the full page with the file list using a server-side template.
  2. You intercept clicks on the file list. If the target is a folder:
  3. you request the new file list,
  4. the server renders the file list partial (in Rails terms: basically just the file list part, without the rest of the site),
  5. the client takes that HTML and inserts it in place of the current file list,
  6. and you patch up the URL using pushState.

Done. The view code lives only on the server. Whether the file list is requested via the AJAX call or via a traditional full page load doesn’t matter: the code path is exactly the same. The only difference is that in the AJAX case the rest of the page isn’t rendered. You get graceful degradation and no additional work.
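In (heavily simplified) code, the interception could look something like this – a sketch assuming jQuery, a made-up #file-list container and a server that returns just the partial for AJAX requests:

    // Sketch only: the selectors and URL handling are made up.
    $('#file-list').on('click', 'a.folder', function (e) {
        // no pushState support? Let the normal link do its job.
        if (!window.history.pushState) return;
        e.preventDefault();
        var url = this.href;
        $.get(url, function (html) {
            // the server rendered the same partial it also uses
            // for full page loads
            $('#file-list').html(html);
            history.pushState(null, '', url);
        });
    });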

Now assume you want to keep graceful degradation possible but go the JS framework route:

  1. You render the full page with the file list using a server-side template.
  2. You intercept the click on the folder in the file list.
  3. You request the JSON representation of the target folder.
  4. You use that JSON to fill a client-side template which is a copy of the server-side partial.
  5. You insert that HTML in place of the current file list.
  6. You patch up the URL using pushState.

The number of steps is the same, but the amount of work isn’t: if you want graceful degradation, you write the file list template twice – once as a server-side template, once as a client-side template. Both are quite similar, but usually you’ll be forced to use slightly different syntax. If you update one, you have to update the other, or the experience will differ depending on whether you click a link or open the URL directly.
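Sketched the same way, the framework variant might look like this – renderFileList being a hypothetical client-side template function that duplicates the server-side partial:

    // Sketch only: renderFileList is made up, and in a real app
    // Backbone would handle the model and rendering plumbing.
    $('#file-list').on('click', 'a.folder', function (e) {
        if (!window.history.pushState) return;
        e.preventDefault();
        var url = this.href;
        // assumes the server returns JSON for this URL when asked
        $.getJSON(url, function (entries) {
            // a near-duplicate of the server-side partial
            $('#file-list').html(renderFileList(entries));
            history.pushState(null, '', url);
        });
    });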

You also duplicate the code that fills the template: on the server side you use ActiveRecord or whatever other ORM; on the client side you’d probably use Backbone to do the same thing, except now your backend isn’t the database but the JSON response. Backbone is really cool and a huge time-saver, but it’s still more work than not doing it at all.

OK. Then let’s skip graceful degradation and make this a JS-only client app (good luck trying to get away with that). Now the view code on the server goes away, and you are left with the model on the server to retrieve the data, the model on the client (Backbone helps a lot here, but there’s still a substantial amount of code to write that otherwise wouldn’t exist) and the view code on the client.

Now don’t get me wrong.

I love the idea of promoting JS to a first class language. I love JS frameworks for big JS only applications. I love having a “free”, dogfooded-by-design REST API. I love building cool architectures.

I’m just saying that at this point it’s so much work to do it right that the old ways do have their advantages, and that we shouldn’t condemn them for being hacky. True, they are. But they are also pragmatic.

overpriced data roaming

You shouldn’t complain when something gets cheaper. But when something gets seven times cheaper from one day to the next, it leaves you wondering whether the price offered so far might have been a tad too high.

I’m talking about Swisscom’s data roaming charges.

Up to now, you paid CHF 50 per 5 MB (CHF 10 per MB) when roaming in the EU. Yes, that’s around $10, or EUR 6.60, per megabyte. Megabyte. Not gigabyte. And you people complain about getting limited to 5 GB for your $30.

Just now I got a press release from Swisscom announcing that they are changing their roaming charges to CHF 7 per 5 MB. That’s CHF 1.40 per MB, which is seven times cheaper (CHF 10 ÷ CHF 1.40 ≈ 7.1).

If you can make a product of yours seven times cheaper from one day to the next, the rates you charged before were clearly way too high.

Why node.js excites me

Today, on Hacker News, an article named “Why node.js disappoints me” appeared – right on the day I returned from jsconf.eu (awesome conference; best days of my life, I might add), where I gave a talk about using node.js for a real web application that provides real use: tempalias.com.

Time to write a rebuttal, I guess.

The main gripe Eric has with node is a gripe with the libraries that are available. It’s not about performance. It’s not about ease of deployment, or ease of development. In his opinion, the libraries that are out there at the moment don’t provide anything new compared to what already exists.

On that level, I totally agree. The most obvious candidates for development and templating try to mimic what’s already out there for other platforms. What’s worse: there seems to be no clear winner, and node itself doesn’t make a recommendation or ship anything with the base distribution.

This is inherently a good thing though. Node.js isn’t your complete web development stack. Far from it.

Node is an awesome platform for very easily writing very well-performing servers. Node is an awesome platform for your daily shell-scripting needs (allowing you to work in your favorite language even for those tasks). Node isn’t about creating awesome websites; it’s about giving you the power to easily build servers. Web, DNS, SMTP – we’ve seen it all.

To help you with web servers, and probably to show us users how it’s done, node also provides a very good library for speaking the HTTP protocol. This isn’t about generating web pages. This isn’t about URL routing, or MVC, or whatever. This is about writing a web server. About interacting with HTTP clients. Or HTTP servers. At the lowest level.
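To give you an idea of the level we’re talking about, here’s a minimal sketch of a server using nothing but that library (the node 0.x API of the day):

    var http = require('http');

    // node hands you the raw request and response objects and
    // otherwise stays out of your way
    http.createServer(function (req, res) {
        res.writeHead(200, {'Content-Type': 'text/plain'});
        res.end('hello from the lowest level\n');
    }).listen(8080);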

So when comparing node with other platforms, you must be careful to compare apples with apples. Don’t compare pure node.js to Rails. Compare it to mod_wsgi, to FastCGI, to a servlet container (if you must), or to mod_php (the module that gives a script of yours access to server internals – not the language) or mod_perl.

With that in mind, consider this: with node.js, you don’t worry about performance, you don’t worry about global locks (you do worry about never blocking, though), and you really, truly and most awesomely don’t worry about race conditions.

Assuming

    var a = 0;
    var f = function(){
        var t = a; // proving a point here. I know it's not needed
        a = t + 1;
    };
    setTimeout(f, 100);
    setTimeout(f, 100);

you’ll always end up with a === 2 once both timeouts have executed. There is no interruption possible between the read of a into t and the increment. No worries about threading. No hours wasted trying to find out why a suddenly (and depending on the load on your system) ends up as 1 instead of 2.

Over our years of programming experience, we have learned that what f does in my example above is a bad thing. We feel strange typing code like this, looking for some method of locking, of marking a critical section. With node, there’s no need to.

This is why writing servers (remember: highly concurrent access to potentially the same code) is so much fun in node.

The perfect little helpers node adds for dealing with the HTTP protocol are just the icing on the cake. But in so many other frameworks (cough WSGI cough), things like chunking, multipart parsing or even just reading the client’s data from an input stream are hard if you do them on your own – or completely beyond your control if you let the libraries do them.

With node, you get at those knobs in the easiest way possible.

Now we know that we can easily write well performing servers (of any kind with special support for HTTP) in node, so let’s build a web site.

In traditional frameworks, your first step would be to select a framework (because the HTTP libraries are so effing (technical term) hard to use).

You’d end up with something lightweight like, say, mnml or Werkzeug in Python, or something heavier like Rails for Ruby (though Rack isn’t nearly as bad as WSGI) or Django for Python. You’d add some kind of database abstraction or even an ORM layer – maybe something that comes with your framework.

Sure. You could do that in node too. There are frameworks around.

But remember: Node is an awesome tool for you to write highly specialized servers.

Do you need to build your whole site in node?

Do you see this as a black or white situation?

Over the last year, I’ve done two things.

One was to lay out a way to augment an existing application (PHP, PostgreSQL) with a WebSocket-based service using node, to greatly reduce the load on the existing application. I haven’t had time to implement it yet, but it would work wonders.

The other thing was to prove a point and to implement a whole web application in node.

I built tempalias.com

At first I fell into the same trap that anybody coming from the “old world” would fall into: I selected what seemed to be the most-used web framework (Express) and rolled with that. But I soon found out that I had it all backwards.

I didn’t want to write the fiftieth web application. I wanted to do something else. Something new.

When you look at the tempalias source code (yeah – the whole service is open source, so all of us can learn from it), you’ll notice that not a single byte of HTML is dynamically generated.

I ripped out Express. I built a RESTful API for the main functionality of the site: Creating aliases. I built a server that does just that and nothing more.

I leveraged all the nice features the JavaScript language provides to build a really cool backend. I used all the power node gives me to build a really cool (and simple!) server to web-enable that API (posting and reading JSON to and from the server).

The web client itself is just a client to that API. Not a single byte of the client is dynamically generated – it’s all static files. It uses Sammy, jQuery, HTML and CSS to do its thing, but it doesn’t do anything the API I built on node doesn’t expose.

Because it’s static HTML, I could serve it directly from the nginx I’m running in front of node.

But because I wanted the service to be self-contained, I plugged in node-paperboy to serve the static files from node too.

Paperboy is very special and very, very cool.

It doesn’t try to replace node’s HTTP library. It doesn’t try to abstract away all the niceties of node’s excellent HTTP support. It doesn’t even try to take over the creation of the actual HTTP server. Paperboy is just a function you call with the request and response objects you got as part of node’s HTTP support.

Whether you want to call it or not is your decision.

If you want to handle the request, you handle it.

If you don’t, you pass it on to paperboy.

Or foobar.

Or whatever.
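To make that concrete, here’s roughly how the composition looks – a sketch from memory of paperboy’s README, so treat the exact API as an assumption, and handleApi as a made-up stand-in for your own code:

    var http = require('http'),
        path = require('path'),
        paperboy = require('paperboy');

    var WEB_ROOT = path.join(__dirname, 'static');

    http.createServer(function (req, res) {
        if (req.url.indexOf('/aliases') === 0) {
            // your own code handles the API...
            handleApi(req, res);
        } else {
            // ...and paperboy serves the static files
            paperboy.deliver(WEB_ROOT, req, res);
        }
    }).listen(8080);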

Node is the UNIX of the tools to build servers with: it provides small, dedicated tools that do one task, but truly, utterly excel at doing so.

So the libraries you are looking for are not the huge frameworks that do everything but the one bit you really need.

You are looking for the excellent small libraries that live the spirit of node. You are looking for libraries that do one thing well. You are looking for libraries like paperboy. And you are relying on the excellent HTTP support to build your own libraries where the need arises.

It’s still very early in node’s lifetime.

You can’t expect everything to be there, ready to use.

For some cases, though, it already is. Need a DNS server? You can do that. Need an SMTP daemon? Easy. You can do that. Need an HTTP server that understands the HTTP protocol really well and provides excellent support for adding your own functionality? Go for it.

But above all: You want to write your server in a kick-ass language? You want to never have to care about race conditions when reading, modifying and writing to a variable? You want to be sure not to waste hours and hours of work debugging code that looks right but isn’t?

Then node is for you.

It’s no turnkey solution yet.

It’s up to you to make the most out of it. To combine it with something more traditional. Or to build something new, maybe rethinking how you approach the problem. Node provides an awesome foundation to build upon. It alone will never give you a blog in 10 minutes – the supporting libraries don’t, at this time, provide that blog either.

But they empower you to build it in a way that withstands even the heaviest pounding, that makes the most out of the available resources and above all, they allow you to use your language of choice to do so.

JavaScript.