why I don’t touch crypto

As programmers, we screw up. Small bugs, big bugs, laziness – the possibilities are endless.

Usually, when we screw up, we know it immediately: we get a failing test, we get an exception logged somewhere, or we hear from our users that such and such feature doesn’t work.

Also, most of the time, no matter how bad the bug, the issue can be worked around and the application keeps working overall.

Once you’ve found the bug, you fix it and everybody is happy.

But imagine you had one of those off-by-one errors in your code (the kind that constantly happens to all of us) and further imagine that the function containing the error still apparently produced the same output as if the error weren’t there.

Imagine that because of that error the apparently correct-looking output is completely useless and your whole application is now utterly broken.

That’s crypto for you.

Crypto can’t be a «bit broken». It can’t be «mostly working». Either it’s 100% correct, or you shouldn’t have bothered doing it at all. The weakest link breaks the whole chain.

Worse: the data you are working with doesn’t show any sign of wrongness when you look at it. You encrypt something, you see random data. You decrypt it, you see clear text. Seems to work fine. Right! Right?

Last week’s issue in the random number generator in Cryptocat is a very good example.

The bug was an off-by-one error in their random number generator. The output of the function was still random numbers, and looking at the output would clearly show random numbers. That fact only reinforces our natural bias toward seeing code as correct.

And yet it was wrong. The bug was there and the random numbers weren’t really random (enough).
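To make this concrete, here’s a made-up sketch (in JavaScript, like Cryptocat) of how such a bug can look – this is not the actual Cryptocat code, just an illustration of an off-by-one in a random digit generator:

// Hypothetical example – NOT the actual Cryptocat bug.
function randomDigit() {
    // BUG: should be Math.random() * 10. This version yields 0..8
    // and never 9. Every single output still looks like a perfectly
    // fine random digit – only the overall distribution is skewed.
    return Math.floor(Math.random() * 9);
}

No single call to this function ever looks wrong, but an attacker who knows about the bias has a measurably easier job.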

The weakest link was broken and the whole security effort rendered practically pointless – which is even worse in this case of an application whose only purpose is, you know, security.

Security wasn’t just an added feature to some other core functionality. It was the core functionality.

That small off-by-one error completely broke the whole application and was completely unnoticeable by just looking at the produced output. Writing a test case for this would have required thinking and code so complicated that the test would have been about as likely to contain an error as the code being tested.

This, my friends, is why I keep my hands off crypto. I’m just plain not good enough. Crypto is a world where understanding the concepts, understanding the math and writing tests just isn’t good enough.

The goal you have to reach is perfection. If you fail to reach that, then you have failed utterly.

Crypto is something I leave to others to deal with. Either they have reached perfection, at which point they have my utmost respect. Or they fail, at which point they have my understanding.

armchair scientists

The place: London. The time: Around 1890.

Imagine a medium sized room, lined with huge shelves filled with dusty
books. The lights are dim, the air is heavy with cigar smoke. Outside
the last shred of daylight is fading away.

In one corner of the room, you spot two large leather armchairs and a
small table. On top of the table, two half-full glasses of whiskey. In
each of the armchairs, an elderly person.

One of them opens his mouth to speak:

«If I were in charge down there in South Africa, we’d be so much
better off – running a colony just can’t be so hard as they make it
out to be»

Conceivable to have happened? Yeah, very likely actually. Crazy and
misguided? Of course – we learned about that in school: imperialism
doesn’t work.

Of course that elderly guy in the little story is wrong. The problems
are way too complex for a bystander to even understand, let alone
solve. More than likely he doesn’t even have a fraction of the
background needed to understand the complexities.

And yet he sits there, in his comfortable chair, in the warmth of his
cozy London club, and explains that he knows so much better than, you
know, the people actually doing the work.

Now think of today.

Think about that article you just read that was explaining a problem
the author was solving. Or that other article that was illustrating a
problem the author is having, still in search of a solution.

Didn’t you feel the urge to go to Hacker News
and reply how much better you know and how crazy the original poster
must be not to see the obvious, simple solution?

Having trouble scaling 4chan? How can that be hard?
Having trouble with your programming environment being unable to assign one string to another?
Well, it’s just strings – why is that so hard?

Or those idiots at Amazon who can’t even keep their cloud service
running? Clearly it can’t be that hard!

See a connection? By stating opinions like that, you are not even a
little bit better than the elderly guy at the beginning of this essay.

Until you know all the facts, until you were there, on the ladder
holding a hose trying to extinguish the flames, until then, you don’t
have the right to assume that you’d do better.

The world we live in is incredibly complicated. Even though computer
science might boil down to math, our job is dominated by side-effects
and uncontrollable external factors.

Even if you think that you know the big picture, you probably won’t
know all the details and without knowing the details, it’s
increasingly likely that you don’t understand the big picture either.

Don’t be an armchair scientist.

Be a scientist. Work with people. Encourage them, discuss solutions,
propose ideas, ask what obvious fact you missed or what was missing
from the problem description.

This is 2012, not 1890.

background events

Today is the day that one of the coolest things I have had the pleasure
to develop in my life so far has gone live to production use.

One installation of PopScan is connected to
a SAP system that at times has really bad performance – and yet it
needs to be connected, even if just to query for price information.

This is a problem because features like our persistent shopping
basket or the users’ templates cause a lot of products to be
displayed at once.

Up until now, PopScan synchronously queried for the prices and would
not render any products until all the product data had been assembled.

When you combine this with the sometimes bad performance of that SAP
system, you’ll quickly see unhappy users waiting for the pages to
finally load.

We decided to fix this problem for the users.

Aside from the price, all product data is in PopScan’s database anyway, so
while we need to wait for prices, everything else we can display
immediately.

So that’s what we do now: Whenever we load products and we don’t have a price
yet, we’ll launch a background job which asynchronously retrieves the prices.
The frontend will immediately get the rendered products minus the prices.

But of course, we still need to show the user the fully loaded products once
they become available, and this is where the cool server-based event framework
comes into play:

The JS client in PopScan now gets notified of arbitrary events that happen
on the server (like “product data loaded”, but also “GPRS scanner data
received”). The cool thing about this is that events are seemingly pushed
through instantly as they happen on the server, giving the user the immediate
response they want and lessening the load on the server, as there’s no
polling going on (well, only long polling).

$(ServerEvents).bind('product-data', function(data){
    // product data has changed!
});

is all that we need on the client. The rest happens automatically.
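For the curious: here’s a rough sketch of what the client side of such a
framework could look like. This is a simplified illustration, not PopScan’s
actual implementation, and the /events endpoint is made up:

var ServerEvents = {};

function poll() {
    $.ajax({
        url: '/events',      // hypothetical long-polling endpoint
        dataType: 'json',
        success: function(events) {
            // the server held the request open until events arrived
            $.each(events, function(i, ev) {
                // re-dispatch every server event as a jQuery event,
                // which is what the .bind() call above listens for
                $(ServerEvents).trigger(ev.name, ev.data);
            });
        },
        // reconnect immediately; a real implementation would back
        // off after errors
        complete: poll
    });
}
poll();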

Also remember, though, that PopScan is often used in technology-hostile
enterprise environments. Thus, features like WebSockets are out and, in
general, we had to support ancient software all over the place.

We still managed to make it work and today this framework went to production
use for that one customer with the badly performing SAP system.

Over the course of the next few weeks, I might write in detail about how this
stuff works given the constraints (ancient client software behind hostile
firewalls) and what software components we used.

Seeing this work go live fills me with joy: I’ve spent so many hours designing
this framework in a fool-proof way in order not to lose events and in order to
gracefully continue working as components in the big picture die.

Now it’s finally live and already contributing to lower waiting times for all
users.

sacy 0.4-beta1

I’ve just pushed version 0.4-beta1 of sacy
to its github repository. Aside from requiring
PHP 5.3 now, it also has support for transforming the contents of inline tags.

So if you always wanted to write

type="text/coffeescript">
hello = (a)->
    alert "Hello #{a}"
hello "World"

and have the transformation done on the server-side, then I have good news
for you: Now you can! Just wrap the script with
{asset_compile}...{/asset_compile}.
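Put together, the wrapped version of the example above looks like this in
your template:

{asset_compile}
<script type="text/coffeescript">
hello = (a)->
    alert "Hello #{a}"
hello "World"
</script>
{/asset_compile}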

I’m not saying that having inline scripts (or even stylesheets) is a good idea,
but sometimes we have to pass data between our HTML templates and the JS
code, and now we can do it in CoffeeScript.

Development note

When you take a look at the commits leading to the release, you will notice
that I more or less hacked the support for inline tags into the existing
codebase (changing the terminology from files to work units in the process
though).

Believe me, I didn’t like this.

When I sat down to implement this, what I had in mind was a very nice
architecture where various components just register themselves and then
everything falls into place more or less automatically.

Unfortunately, whatever I did to start over (I used git checkout . about
three times), I never got a satisfactory solution:

  • sometimes, I was producing a ton of objects, dynamically looking up what
    methods to call and what classes to instantiate.

    This would of course be very clean and cool, but also terribly slow. Sacy
    is an embeddable component, not an application in its own right.

  • sometimes, I had a simplified object model that kind of worked right until
    I thought of some edge case, at which point we would either have ended up
    back in hack-land or the edge cases would have had to remain unfixed

  • sometimes, I had something flexible enough to do what I need, but it still
    had code in it that had to know whether it was dealing with instances of
    Class A or Class B, which is as unacceptable as the current array-mess.

In the end, it hit me: Sacy is already incomplete in that it simplifies the
problem domain quite a lot. To cleanly get out of this, I would have to
actually parse and manipulate the DOM instead of dealing with regexes, and I
would probably even have to go as far as writing a FactoryFactory in order
to correctly abstract away the issues.

Think of it: We have a really interesting problem domain here:

  • the same type of asset can use different tags (style and link for
    stylesheets)
  • Different attributes are used to refer to external resources (href for
    stylesheets, src for scripts)
  • File-backed assets can (and should) be combined
  • Content-backed assets should be transformed and immediately inlined
  • Depending on the backing (content or file), the assets use a different
    method to determine cache-freshness (modification-time/size vs. content)
  • And last but not least, file-based asset caching is done on the client
    side, content-based asset caching on the server side.

Building a nice architecture that would work without the ifs I have learned to
hate lately would mean huge levels of indirection and abstraction.

No matter what I tried, I always ended up with a severe case of object-itis and
architecture-itis, both of which I deemed completely unacceptable for a
supposedly small and embeddable library.

Which is why I decided to throw away all my attempts, make one big
compromise, and rely on CacheRenderer::renderWorkUnits being called with
unified work units (either all file-based or all content-based).

That made the backend code a lot easier.

And I could keep the lean array structure for describing a unit of work to do
for the backend.

I would still, at some point, love to have a nice way for handlers to register
themselves, but that’s something I’ll handle another day. For now, I’m happy
that I could accomplish my goal in a very lean fashion, at the cost of a public
backend interface that is really, really inconvenient to use and leaves way too
much code in the frontend.

At least I got away without an AssetFactoryFactory though :-)

My worst mistakes in programming

I’m in the middle of refactoring a big infrastructure piece in our product PopScan. It’s very early code, rarely touched since its inception in 2004, so I’m dealing mainly with my sins of the past.

This time like no time before, I’m feeling the two biggest mistakes I have ever made in designing a program, so I thought I’d write this post in order to help others not fall into the same trap.

Remember this: Once you are no longer alone working on your project, the code you have written sets an example. Mistakes you have made are copied – either verbatim or in spirit. The design you have chosen lives on in the code that others write (rightfully so – you should strive to keep code consistent).

This makes it even more important not to screw up.

Back in 2004, I failed badly in two places.

  • I chose a completely wrong abstraction in class design, mixing two things that should be separate.
  • I chose – in a foolhardy wish to save on CPU time – to create a ton of internal state instead of fetching the data when it’s needed (I could still have cached it then, but I missed that).

So here’s the story.

The first is the architectural issue.

Let me tell you, dear reader: should you ever be in the position of having to do anything even remotely related to an e-commerce solution dealing with products and orders, repeat after me:

Product lists are not the same thing as orders. Orders are not the same thing as baskets.

and even more importantly:

A product and a line item are two completely different things.

A line item describes how a specific product is placed in a list, so at best, a product is contained in a line item. A product doesn’t have a quantity. A product doesn’t have a total price.

A line item does.

And while we are at it: «quantity» is not a number. It is the entity that describes the amount of times a product is contained within the line item. As such, a quantity usually consists of an amount and a unit. If you change the unit, you change the quantity. If you change the amount, you change the quantity.
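Here’s a minimal sketch of the distinction with made-up names (this is not PopScan’s actual code):

function Product(name, unitPrice) {
    this.name = name;
    this.unitPrice = unitPrice; // a product knows its price per unit
}

function Quantity(amount, unit) {
    this.amount = amount; // e.g. 6
    this.unit = unit;     // e.g. 'bottles'
}

function LineItem(product, quantity) {
    this.product = product;   // a line item contains a product...
    this.quantity = quantity; // ...and the quantity it's listed with
}

// the total price lives on the line item, not on the product
LineItem.prototype.totalPrice = function() {
    return this.product.unitPrice * this.quantity.amount;
};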

Anyways – sitting down and thinking about the entities in the feature you are implementing is an essential part of the work that you do. Even if it seems “kinda right” at the time, even if it works “right” for years – once you make a mistake at a bad place, you are stuck with it.

PopScan is about products and ordering them. My missing the distinction between a product and a line item back in 2004 worked fine until now, but as this is a core component of PopScan, it has grown the most over the years, more and more intertwining product and line item functionality, to the point where it’s too late to fix now – or at least where fixing it would require countless hours of work.

Work that will have to be done sooner rather than later. Work that deeply affects a core component of the product. Work that will change the API greatly and as such can only be tested for correctness in integration tests. Unit tests become useless, as the units being tested won’t exist any more in the future.

Painful work.

If only I had more time and experience those 8 years ago.

The other issue is about state

Let’s say you have a class FooBar with a property Foo that is exposed as part of the public API via a getFoo method.

That Foo relies on some external data – let’s call it foodata.

Now you have two options of dealing with that foodata:

  1. You could read foodata into an internal foo field at construction time. Then, whenever your getFoo() is called, you return the value you stored in foo.
  2. Or you could read nothing until getFoo() is called and then read foodata and return that (optionally caching it for the next call to getFoo())
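Here’s a sketch of both options with made-up names – readFooData() stands in for whatever reads foodata:

// Option 1: eager – foodata is read at construction time,
// paid for even if getFoo() is never called
function FooBar() {
    this.foo = readFooData();
}
FooBar.prototype.getFoo = function() {
    return this.foo;
};

// Option 2: lazy – foodata is read on first use, then cached
function LazyFooBar() {
    this.foo = null;
}
LazyFooBar.prototype.getFoo = function() {
    if (this.foo === null) {
        this.foo = readFooData();
    }
    return this.foo;
};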

Choosing the first design for most of the models back in 2004 was the second biggest coding mistake I have ever made in my life.

Aside from the fact that constructing one of these FooBar objects becomes more and more expensive the more stuff you preload (stuff likely never to be used during the lifetime of the object), you have also contributed a huge amount of internal state to the object.

The temptation to write a getBar() method that has a side effect of also altering the internal foo field is just too big. And now you end up with a getBar() that suddenly also depends on the internal state of foo which suddenly is disconnected from the initial foodata.

Worse, suddenly calling code will see different results depending on whether it calls getBar() before it calls getFoo(). Which will of course lead to code depending on that fact, so fixing it becomes very hard (but at least it’s caught by unit tests).

Having the internal fields also leads to FooBar’s implementation preferring these fields over the public methods, which is totally fine, as long as FooBar stands alone.

But the moment there’s a FooFooBar which inherits from FooBar, you lose all the advantages of polymorphism. FooBar’s implementation will always only use its own private fields. It’s impossible for FooFooBar to affect FooBar’s implementation, causing the need to override many more methods than would have been necessary if FooBar used its own public API.
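Continuing the sketch from above, this is what the inheritance problem looks like (the names are, again, purely illustrative):

FooBar.prototype.describe = function() {
    // BAD: reads the private field instead of calling this.getFoo()
    return 'foo is ' + this.foo;
};

function FooFooBar() {
    FooBar.call(this);
}
FooFooBar.prototype = Object.create(FooBar.prototype);
FooFooBar.prototype.getFoo = function() {
    return 'a completely different foo';
};

// new FooFooBar().describe() still reports the raw field – the
// overridden getFoo() never gets a chance to influence FooBar's
// implementation.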

Conclusion

These two mistakes have cost us hours and hours of working around our inability to do what we want. They have cost us hours of debugging, and they cause new features to come out much clunkier than they need to be.

I have done so many bad things in my professional life. A shutdown -h instead of -r on a remote server. A mem=512 boot parameter (yes, that number is/was interpreted as bytes. And yes, Linux needs more than 512 bytes of RAM to boot). An UPDATE without a WHERE clause – I’ve screwed up so badly in my life.

But all of this is nothing compared to these two mistakes.

These don’t just inconvenience me. They inconvenience my coworkers and our customers (because we need more time to implement features).

Shutting down a server by accident means 30 minutes of downtime at worst (none since we heavily use VMware). Screwing up a class design twice is the gift that keeps on giving.

I’m so sorry for you guys having to put up with OrderSet of doom.

Sorry guys.

Abusing LiveConnect for fun and profit

On December 20th I gave a talk at the JSZurich user group meeting in Zürich.
The talk is about a decade-old technology which can be abused to get full,
unrestricted access to a client machine from JavaScript and HTML.

I was showing how you would script a Java Applet (which is completely hidden
from the user) to do the dirty work for you while you are creating a very nice
user interface using JavaScript and HTML.
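The gist of the approach, in heavily simplified form – the applet and its
readFile method are made up for illustration; the point is that LiveConnect
lets JavaScript call any public method of the applet:

<applet id="helper" code="Helper.class" archive="helper.jar"
        width="1" height="1"></applet>

<script type="text/javascript">
// The practically invisible applet runs with the user's privileges
// once the signature prompt is accepted, so it can do things plain
// JavaScript never could:
var helper = document.getElementById('helper');
alert(helper.readFile('/etc/passwd')); // made-up applet method
</script>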

The slides are available in PDF format too.

While it’s a very cool tech demo, it’s IMHO also a very bad security issue
which browser vendors and Oracle need to have a look at. The user sees nothing
but a dialog like this:

security prompt

and once they click OK, they are completely owned.

Even worse, while this dialog shows the case of a valid certificate, the
dialog in case of an invalid (self-signed or expired) certificate isn’t much
different, so users can easily be tricked into clicking allow.

The source code of the demo application is on github
and I’ve already written about this on this blog here,
but back then I was mainly interested in getting it to work.

By now though, I’m really concerned about putting an end to this, or at least
increasing the hurdle the end user has to jump through before this goes off –
maybe force them to click a visible applet. Or just remove the LiveConnect
feature altogether from browsers, thus forcing applets to be visible.

But aside from the security issues, I still think that this is a very
interesting case of long-forgotten technology. If you are interested, do have
a look at the talk and travel back in time to when stuff like this was only
half as scary as it is now.

updated sacy – now with external tools

I’ve just updated the sacy repository again and tagged a v0.3-beta1 release.

The main feature since yesterday is support for the official compilers and
tools if you can provide them on the target machine.

The drawback is that these things come with hefty dependencies at times (I
don’t think you’d find a shared hoster willing to install node.js or Ruby for
you), but if you can provide the tools, you can get some really nice
advantages over the PHP ports of the various compilers:

  • the PHP port of sass has an issue that prevents
    @import from working. sacy’s build script does patch that, but the way they
    were parsing the file names doesn’t inspire confidence in the library. You
    might get a more robust solution by using the official tool.

  • uglify-js is a bit faster than JSMin, produces significantly smaller
    output and comes with a better license (JSMin isn’t strictly free software,
    as it has this “do no evil” clause)

  • CoffeeScript is under very heavy development, so I’d much rather use the
    upstream source than some experimental fun project. So far I haven’t seen
    issues with coffeescript-php, but then I haven’t been using it much yet.

Absent from the list are LESS and CSS minification:

  • the PHP-native CSSMin is really good and there’s no single official
    external tool out there that’s demonstrably better (maybe the YUI
    compressor, but I’m not going to support something that requires me to
    deal with Java)

  • lessphp is very lightweight and yet very full-featured and very actively
    developed. It also has a nice advantage over the native solution in that
    the currently released native compiler does not support reading its input
    from STDIN, so if you want to use the official less, you have to go with
    the git HEAD.

Feel free to try this out (and/or send me a patch)!

Oh and by the way: if you want to use uglify-js or the original CoffeeScript
and you need node but can’t install it, have a look at the static binary I
created.