Ansible

In the summer of 2012, I had the great opportunity to clean up our hosting
infrastructure. Instead of running many differently configured VMs, mostly one
per customer, we started building a real redundant infrastructure with two
really beefy physical database machines (yay) and quite many (22) virtual
machines for caching, web app servers, file servers and so on.

All components are fully redundant, every box can fail without anybody really
needing to do anything (one exception is the database – that’s also redundant,
but we fail over manually due to the huge cost in time to failback).

Of course you don’t manage ~20 machines manually any more: Aside from the fact
that it would be really painful to do for those that have to be configured in an
identical way (the app servers come to mind), you also want to be able to
quickly bring a new box online which means you don’t have time to manually go
through the hassle of configuring it.

So, in the summer of 2012, when we started working on this, we decided to go
with puppet. We also considered Chef but their server
was really complicated to set up and install and there was zero incentive for
them to improve because that would, after all, disincentivize people from
becoming customers of their hosted solutions (the joys of open-core).

Puppet is also commercially backed, but everything they do is available as open
source and their approach for the central server is much more «batteries
included» than what Chef has provided.

And finally, after playing around a bit with both Chef and puppet, we noticed
that puppet was way more bitchy and less tolerant of quick hacks around issues
which felt like a good thing for people dabbling with HA configuration of a
multi machine cluster for the first time.

Fast forward one year: Last autumn I found out about
ansible (linking to their github page –
their website reads like a competition in buzzword-bingo) and after reading
their documentation, I was immediately convinced:

  • No need to install an agent on managed machines
  • Trivial to bootstrap machines (due to above point)
  • Contributors don’t need to sign a CLA (thank you so much, ansibleworks!)
  • No need to manually define dependencies of tasks: Tasks are run sequentially
  • Built-in support for cowsay by default
  • Many often-used modules included by default, no hunting for, say, a sysctl
    module on github
  • Very nice support for rolling updates
  • Also providing a means to quickly do one-off tasks
  • Very easy to make configuration entries based on the host inventory (which requires puppetdb and an external database in the case of puppet)

Because ansible connects to each machine individually via SSH, running it
against a full cluster of machines is going to take a bit longer than with
puppet, but our cluster is small, so that wasn’t that much of a deterrent.

So last Sunday evening I started working on porting our configuration over from
puppet to Ansible and after getting used to the YAML syntax of the playbooks, I
made very quick progress.

[image: progress]

Again, I’d like to point out the excellent, built-in, on-by-default support for
cowsay as one of the killer-features that made me seriously consider starting
the porting effort.

Unfortunately though, after a very promising start, I had to come to the
conclusion that we will be sticking with puppet for the time being because
there’s one single feature that Ansible doesn’t have and that I really, really
want a configuration management system to have:

It’s not possible in Ansible to tell it to keep a directory clean of files not
managed by Ansible in some way.

There are, of course, workarounds, but they come at a price too high for me to
be willing to pay.

  • You could first clean a directory completely using a shell command, but this
    will lead to ansible detecting a change to that folder every time it runs which
    will cause server restarts, even when they are not needed.

  • You could do something like this stack overflow question
    but this has the disadvantage that it forces you into a configuration file
    specific playbook design instead of a role specific one.

What I mean is that using the second workaround, you can only have one playbook
touching that folder. But imagine for example a case where you want to work with
/etc/sysctl.d: A generic role would put some stuff there, but then your
firewall role might put more stuff there (to enable ip forwarding) and your
database role might want to add other stuff (like tweaking shmmax and shmall,
though that’s thankfully not needed any more in current Postgres releases).

So suddenly your /etc/sysctl.d role needs to know about firewalls and
databases which totally violates the really nice separation of concerns between
roles. Instead of having a firewall and a database role both doing something to
/etc/sysctl.d, you now need a sysctl-role which does different things
depending on what other roles a machine has.

Or, of course, you just don’t care that stray files never get removed, but
honestly: Do you really want to live with the fact that your /etc/sysctl.d, or
worse, /etc/sudoers.d can contain files not managed by ansible and likely not
intended to be there? Both sysctl.d and sudoers.d are more than capable of doing
immense damage to your boxes, and of doing so sneakily, out of sight of your
configuration management system.

For me that’s unacceptable.

So despite all the nice advantages (like cowsay), this one feature is something
that I really need and can’t have right now and which, thus, forces me to stay
away from Ansible for now.

It’s a shame.

Some people tell me that implementing my feature would require puppet’s feature
of building a full state of a machine before doing anything (which is
error-prone and frustrating for users at times), but that’s not really true.

If ansible modules could talk to each other – maybe loosely coupled by firing
some events as they do stuff – you could just name the task that makes sure the
directory exists first and then have that task register some kind of event
handler to be notified as other tasks touch the directory.

Then, at the end, remove everything you didn’t get an event for.
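
To make the idea a bit more concrete, here is a minimal sketch in JavaScript. It is not Ansible code and says nothing about how Ansible works internally; the names DirectoryGuard, touch and strayFiles are made up for illustration, and the directory listing is passed in as a plain array so nothing on disk is touched.

```javascript
// Hypothetical sketch: a "directory guard" that collects events from other
// tasks and, at the end of the run, reports every file nobody claimed.
class DirectoryGuard {
  constructor(directory) {
    this.directory = directory;
    this.managed = new Set(); // file names that some task announced
  }

  // Called by any task (generic role, firewall role, database role, ...)
  // that places a file below the guarded directory.
  touch(fileName) {
    this.managed.add(fileName);
  }

  // Given the files actually present, return the ones nobody announced.
  strayFiles(existingFiles) {
    return existingFiles.filter((name) => !this.managed.has(name));
  }
}

const guard = new DirectoryGuard('/etc/sysctl.d');
guard.touch('10-generic.conf');    // generic role
guard.touch('20-ip-forward.conf'); // firewall role
guard.touch('30-postgres.conf');   // database role

const onDisk = [
  '10-generic.conf',
  '20-ip-forward.conf',
  '30-postgres.conf',
  '99-leftover.conf',
];
const stray = guard.strayFiles(onDisk);
console.log(stray); // [ '99-leftover.conf' ]
```

The point of the sketch is that each role only announces its own files. None of them needs to know what the others are doing, and only the guard needs the complete picture.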

Yes. This would probably (I don’t know how Ansible is implemented internally)
mess with the decoupling of modules a bit, but it would still be far removed
from re-implementing puppet.

Which is why I’m posting this here – maybe, just maybe, somebody reads my plight
and can bring up a discussion and maybe even a solution for this. Trust me: I’d
so much rather use Ansible than puppet, it’s crazy, but I also want to make sure
that no stray file in /etc/sysctl.d will bring down a machine.

Yeah. This is probably the most words I’ve ever used for a feature request, but
this one is really, really important for me which is why I’m so passionate about
this. Ansible got so f’ing much right. It’s such a shame to still be left
unable to really use it.

Is this a case of xkcd1172? Maybe, but to me, my
request seems reasonable. It’s not? Enlighten me! It is? Great! Let’s work on
fixing this.

pdo_pgsql needs some love

Today, PostgreSQL 9.3 was released.
September is always the month of PostgreSQL as every September a new
Major Release with awesome new features is released and every September
I have to fight the urge to run and immediately update the production
systems to the new version of my favorite toy.

As every year, I want to thank the awesome guys (and girls I hope) that
make PostgreSQL one of my favorite pieces of software overall and for
certain my most favorite database system.

That said, there’s another aspect of PostgreSQL that needs some serious
love: While back in the days PHP was known for its robust database
client libraries, over time other language environments have caught up
and long since surpassed what’s possible in PHP.

To be honest, the PostgreSQL client libraries as they are currently
available in PHP are in serious need of some love.

If you want to connect to a PostgreSQL database, you have two options:
Either you use the thin wrapper over libpq, the pgsql extension,
or you go PDO, at which point you’d use pdo_pgsql.

Both are, unfortunately, quite inadequate solutions that fail
to expose most of the awesomeness that is PostgreSQL to the user:

pgsql

On the positive side, being a small wrapper around libpq, the pgsql
extension knows quite a bit about Postgres’ internals: It has excellent
support for COPY, it knows about a result set’s data types (but doesn’t
use that knowledge as you’ll see below), it has pg_quote_identifier
to correctly quote identifiers, it supports asynchronous queries and it
supports NOTIFY.

But, while pgsql knows a lot about Postgres’ specifics, to this day,
the pg_fetch_* functions convert all columns into strings. Numeric
types? String. Dates? String. Booleans? Yes. String too (‘t’ or ‘f’,
both trueish values to PHP).

To this day, while the extension supports prepared statements, their
use is terribly inconvenient, forcing you to name your statements and
to manually free them.

To this day, the pg_fetch_* functions load the whole result set into
an internal buffer, making it impossible to stream results out to the
client using an iterator pattern. Well. Of course it’s still possible,
but you waste the memory for that internal buffer, forcing you to
manually play with DECLARE CURSOR and friends.

There is zero support for advanced data types in Postgres and the
library doesn’t help at all with today’s best practices for accessing a
database (prepared statements).

There are other things that make the extension impractical for me, but
they are not the extension’s fault, so I won’t spend any time explaining
them here (like the lack of support by newrelic – but, as I said,
that’s not the extension’s fault).

pdo_pgsql

pdo_pgsql gets a lot of stuff right that the pgsql extension doesn’t:
It doesn’t read the whole result set into memory, it knows a bit about
data types, preserving numbers and booleans and, being a PDO driver, it
follows the generic PDO paradigms, giving a unified API with other PDO
modules.

It also has good support for prepared statements (not perfect, but
that’s PDO’s fault).

But it also has some warts:

  • There’s no way to safely quote an identifier. Yes. That’s a PDO
    shortcoming, but still. It should be there.
  • While it knows about numbers and booleans, it doesn’t know about any of the other more advanced data types.
  • Getting metadata about a query result actually makes it query the
    database – once per column, even though the information is right there
    in libpq, available to use (look at the
    source
    of PDOStatement::getColumnMeta). This makes it impossible to fix the above issue in userland.
  • It has zero support for COPY

If only

Imagine the joy of having a pdo_pgsql that actually cares about
Postgres. Imagine how selecting a JSON column would give you its data
already decoded, ready to use in userland (or at least an option to).

Imagine how selecting dates would at least give you the option of
getting them as a DateTime (there’s loss of precision though –
Postgres’ TIMESTAMP has more precision than DateTime)

Imagine how selecting an array type in postgres would actually give you
back an array in PHP. The string that you have to deal with now is
notoriously hard to parse. Yes. There now is array_to_json in
Postgres, but that shouldn’t be needed.

Imagine how selecting a HSTORE would give you an associative array.

Imagine using COPY with pdo_pgsql for very quickly moving bulk data.

Imagine the new features of PGResult being exposed to userland.
Giving applications the ability to detect what constraint was just
violated (very handy to detect whether it’s safe to retry).

Wouldn’t that be fun? Wouldn’t that save us from having to type so much
boilerplate all day?

Honestly, what I think should happen is somebody should create a
pdo_pgsql2 that breaks backwards compatibility and adds all these
features.

Have getColumnMeta just return the OID instead of querying the
database. Have a quoteIdentifier method (yes. That should be in PDO
itself, but let’s fix it where we can).

Have fetch() return Arrays or Objects for JSON columns. Have it
return Arrays for arrays and HSTOREs. Have it optionally return
DateTimes instead of strings.
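
To show the kind of conversion I mean, here is a small sketch, written in JavaScript rather than C or PHP so it stays short. It is not how pdo_pgsql works, and decoders and decodeRow are names invented for this example. The numeric keys are the type OIDs PostgreSQL assigns to its built-in bool, int4, json and timestamp types. Treating a timestamp without time zone as UTC is an assumption of the sketch.

```javascript
// Hypothetical decoders keyed by PostgreSQL type OID. Every function gets
// the text representation the server sent and returns a native value.
const decoders = {
  16: (text) => text === 't',                              // bool
  23: (text) => parseInt(text, 10),                        // int4
  114: (text) => JSON.parse(text),                         // json
  1114: (text) => new Date(text.replace(' ', 'T') + 'Z'),  // timestamp, read as UTC
};

// columns: [{ name, oid }, ...] as reported by the result metadata
// rawRow:  the text values of one row, in column order (null for NULL)
function decodeRow(columns, rawRow) {
  const row = {};
  columns.forEach((column, index) => {
    const raw = rawRow[index];
    const decode = decoders[column.oid];
    // NULL stays null; types without a decoder are passed through as strings.
    row[column.name] = raw === null || !decode ? raw : decode(raw);
  });
  return row;
}

const columns = [
  { name: 'active', oid: 16 },
  { name: 'count', oid: 23 },
  { name: 'payload', oid: 114 },
  { name: 'created', oid: 1114 },
  { name: 'label', oid: 25 }, // text: no decoder, stays a string
];
const decoded = decodeRow(
  columns,
  ['f', '42', '{"tags":["a","b"]}', '2013-09-09 12:00:00', 'hello']
);
console.log(decoded.active, decoded.count, decoded.payload.tags[1]);
```

Note how 'f' turns into a real false here instead of a truthy string. As mentioned above, a native date type with millisecond resolution loses precision compared to what Postgres stores.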

Wouldn’t that be great?

Unfortunately, while I can write some C, I’m not nearly good enough
to produce something that I could live with other people using, so any
progress I can achieve will be slow.

I’m also unsure of whether this would ever have a chance to land in PHP
itself. Internals are very averse to adding new features to stuff that
already “works” and no matter how good the proposal, you need a very
thick skin if you want to ever get something merged, no matter whether
you can actually offer patches or not.

Would people be using an external pdo_pgsql2? Would it have a chance as
a pecl extension? Do other people see a need for this? Is somebody
willing to help me? I really think something needs to be done and I’m
willing to get my hands dirty – I just have my doubts about the quality
of the result I’m capable of producing. But I can certainly try.

And I will.

when in doubt – SSL

Since 2006, as part of our product, we have been offering barcode scanners
with GSM support to either send orders directly to the vendor or to
transmit products into the web frontend where you can further edit them.

Even though the devices (Windows Mobile. Crappy. In the process of being
updated) do support WiFi, we really only support GSM because that means we don’t have to share the end user’s infrastructure.

This is a huge plus because it means that no matter how locked-down the
customer’s infrastructure, no matter how crappy the proxy, no matter the IDS in use, we’ll always be able to communicate with our server.

Until, of course, the mobile carrier most used by our customers decides
to add a “transparent” (do note the quotes) proxy to the mix.

We were quite stumped last week when we got reports of the mobile devices showing an HTTP error 408, especially because we were not seeing any 408 errors in our logs.

Worse, tcpdump clearly showed that we were getting a RST
packet from the client, sometimes before sending data, sometimes while
sending data.

Strange: Client is showing 408, server is seeing a RST from the client.
Doesn’t make sense.

Tethering my Mac using the iPhone’s personal hotspot feature and a SIM
card of the mobile provider in question made it clear: No longer are we
talking directly to our server. No. What the client receives is a 408
HTML formatted error message by a proxy server.

Do note the “DELETE THIS LINE” and “your organization here” comments.
What a nice touch. Somebody was really spending a lot of time getting
this up and running.

Now granted, taking 20 seconds before being able to produce a response
is a bit on the longer side, but unfortunately, some versions of the
scanner software require gzip compression and gzip compression needs to
know the full size of the body to compress, so we have to prepare the
full response (40 megs uncompressed) before being able to send anything
– that just takes a while.

But consider long-polling or server sent events – receiving a 408 after
just 20 seconds? That’s annoying, wasting resources and probably not
something you’re prepared for.

Worse, nobody was notified of this change. For 7 years, the clients
were able to connect directly to our server. Then one day it changes
and now they aren’t. No communication, no time to prepare and
limits that are certainly too strict to not affect anything (not
just us – see my remark about long polling).

The solution in the end is, like so often, to use SSL. SSL connections
are opaque to any intermediate proxy. A proxy can’t decrypt the data without
the client noticing. An SSL connection can’t be inspected and an SSL
connection can’t be messed with.

Sure enough: The exact same request that fails with that 408 over HTTP
goes through nicely using HTTPS.

This trick works every time when somebody is messing with your
connection. Something f’ing up your WebSocket connection? Use SSL!
Something messing with your long-polling? Use SSL. Something
decompressing your response but not stripping off the Content-Encoding
header (yes. that happened to me once)? Use SSL. Something replacing
arbitrary numbers in your response with asterisks (yepp. happened too)?
You guessed it: Use SSL.

Of course, there are three things to keep in mind:

  1. Due to the lack of SNI in the world’s most used OS and Browser
    combination (any IE under Windows XP), every SSL site you host requires
    one dedicated IP address. Which is bad considering that we are running
    out of addresses.

  2. All of the bigger mobile carriers have their CA in the browser’s
    trusted list. Aside from ethics, there is no reason whatsoever for them
    to not start doing all the crap I described and just re-encrypting the
    connection, faking a certificate using their trusted ones.

  3. Failing that, they still might just block SSL at some point, but as
    more and more sites are going SSL only (partially for above reasons no
    doubt), outright blocking SSL is going to be more and more unlikely to
    happen.

So. Yes. When in doubt: Use SSL. Not only does that help your users’
privacy, it also fixes a ton of technical issues created by essentially
unauthorized third parties messing with your connection.

background events

Today is the day that one of the coolest things I had the pleasure to
develop so far in my life has gone live to production use.

One installation of PopScan is connected to
a SAP system that had at times really bad performance and yet it
needed to be connected even just to query for price information.

This is a problem because of features like our persistent shopping
basket or the users’ templates which cause a lot of products to be
displayed at once.

Up until now, PopScan synchronously queried for the prices and would
not render any products until all the product data had been assembled.

When you combine this with the sometimes bad performance of that SAP
system, you’ll quickly see unhappy users waiting for the pages to
finally load.

We decided to fix this problem for the users.

Aside from the price, all product data is in PopScan’s database anyways, so
while we need to wait for prices, everything else, we could display
immediately.

So that’s what we do now: Whenever we load products and we don’t have a price
yet, we’ll launch a background job which asynchronously retrieves the prices.
The frontend will immediately get the rendered products minus the prices.

But of course, we still need to show the user the fully loaded products once
they become available and this is where the cool server based event framework
comes into play:

The JS client in PopScan now gets notified on arbitrary events that can happen
on the server (like “product data loaded”, but also “GPRS scanner data
received”). The cool thing about this is that events are seemingly pushed
through instantly as they happen on the server giving the user the immediate
response they would want and lessening the load on the server as there’s no
(well. only long-) polling going on.

$(ServerEvents).bind('product-data', function(data){
    // product data has changed!
});

is all that we need on the client. The rest happens automatically.
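
For the curious, the general shape of a long-polling event client can be sketched like this. This is not PopScan's actual code. ServerEventClient, the event format ({id, name, data}) and the fetchEvents callback are assumptions made for the example, and the transport is injected so the sketch runs without a server.

```javascript
// Hypothetical sketch of a long-polling event client.
class ServerEventClient {
  constructor(fetchEvents) {
    // fetchEvents: async (lastId) => [{ id, name, data }, ...]
    this.fetchEvents = fetchEvents;
    this.handlers = new Map();
    this.lastId = 0;
  }

  // Register a handler for a named event.
  bind(name, handler) {
    if (!this.handlers.has(name)) this.handlers.set(name, []);
    this.handlers.get(name).push(handler);
  }

  // Deliver a batch of events in order, skipping ids already seen so that a
  // repeated response does not fire handlers twice.
  dispatch(events) {
    for (const event of events) {
      if (event.id <= this.lastId) continue;
      this.lastId = event.id;
      for (const handler of this.handlers.get(event.name) || []) {
        handler(event.data);
      }
    }
  }

  // One long-poll round trip: the server is expected to hold the request
  // open until it has events newer than lastId (or until it times out).
  async pollOnce() {
    this.dispatch(await this.fetchEvents(this.lastId));
  }
}

const received = [];
const client = new ServerEventClient(async () => []);
client.bind('product-data', (data) => received.push(data));
client.dispatch([{ id: 1, name: 'product-data', data: { sku: 'A' } }]);
client.dispatch([
  { id: 1, name: 'product-data', data: { sku: 'A' } }, // duplicate, ignored
  { id: 2, name: 'product-data', data: { sku: 'B' } },
]);
console.log(received.length); // 2
```

Sending the last seen id along with every request is one simple way to avoid losing events between two polls: the server can replay whatever the client has not acknowledged yet.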

Also remember though that PopScan is often used in technology-hostile
enterprise environments. Thus, features like web-sockets are out and in
general, we had to support ancient software all over the place.

We still managed to make it work and today this framework went to production
use for that one customer with the badly performing SAP system.

Over the course of the next few weeks, I might write in detail about how this
stuff works given the constraints (ancient client-software behind hostile
firewalls) and what software components we used.

Seeing this work go live fills me with joy: I’ve spent many hours designing
this framework in a fool-proof way in order to not lose events and in order to
gracefully continue working as components in the big picture die.

Now it’s finally live and already contributing to lower waiting times for all
users.

sacy 0.4-beta 1

I’ve just pushed version 0.4-beta1 of sacy
to its github repository. Aside from requiring
PHP 5.3 now, it also has support for transforming contents of inline-tags.

So if you always wanted to write

<script type="text/coffeescript">
hello = (a)->
    alert "Hello #{a}"
hello "World"
</script>

and have the transformation done on the server-side, then I have good news
for you: Now you can! Just wrap the script with
{asset_compile}...{/asset_compile}.

I’m not saying that having inline-scripts (or even stylesheets) is a good idea
but sometimes, we have to pass data between our HTML templates and the JS
code and now we can do it in Coffee Script.

Development note

When you take a look at the commits leading to the release, you will notice
that I more or less hacked the support for inline tags into the existing
codebase (changing the terminology from files to work units in the process
though).

Believe me, I didn’t like this.

When I sat down to implement this, what I had in mind was a very nice
architecture where various components just register themselves and then
everything falls into place more or less automatically.

Unfortunately, whatever I did (I used git checkout . about three times to
start over), I never got a satisfactory solution:

  • sometimes, I was producing a ton of objects, dynamically looking up what
    methods to call and what classes to instantiate.

    This would of course be very clean and cool, but also terribly slow. Sacy
    is an embeddable component, not an application in its own right.

  • sometimes, I had a simplified object model that kind of worked right until I
    thought of some edge-case at which point we would have either ended up back in
    hack-land or the edge-cases would have had to remain unfixed

  • sometimes I had something flexible enough to do what I need, but it still
    had code in it that had to know whether it was dealing with instances of Class
    A or Class B which is as unacceptable as the current array-mess.

In the end, it hit me: Sacy is already incomplete in that it simplifies the
problem domain quite a lot already. To cleanly get out of this, I would have to
actually parse and manipulate the DOM instead of dealing with regexes and I
would probably even have to go as far as to write a FactoryFactory in order
to correctly abstract away the issues.

Think of it: We have a really interesting problem domain here:

  • the same type of asset can use different tags (style and link for
    stylesheets)
  • Different attributes are used to refer to external resources (href for
    stylesheets, src for scripts)
  • File-backed assets can (and should) be combined
  • Content-backed assets should be transformed and immediately inlined
  • Depending on the backing (content or file), the assets use a different
    method to determine cache-freshness (modification-time/size vs. content)
  • And last but not least, file based asset caching is done on the client side,
    content based asset caching is done on the server-side.

Building a nice architecture that would work without the ifs I learned to
hate lately would mean huge levels of indirections and abstractions.

No matter what I tried, I always ended up with a severe case of object-itis and
architectur-itis, both of which I deemed completely unacceptable for a
supposedly small and embeddable library.

Which is why I decided to throw away all my attempts and make one big
compromise and rely on CacheRenderer::renderWorkUnits to be called with
unified workunits (either all file or all content-based).

That made the backend code a lot easier.

And I could keep the lean array structure for describing a unit of work to do
for the backend.
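
As a rough illustration of that compromise, here is a sketch in JavaScript. It is not sacy's PHP code: the function name renderKey and the exact shape of the work units are assumptions made for the example. The freshness rules follow the list above, with name, modification time and size for files and the content itself for inline content.

```javascript
const crypto = require('crypto');

// Hypothetical work units, kept as plain objects in the spirit of the lean
// array structure:
//   { type: 'file', path: '/js/a.js', mtime: 1380000000, size: 1024 }
//   { type: 'content', data: 'alert(1)' }
function renderKey(workUnits) {
  // The compromise: refuse anything but a uniform list of work units.
  const types = new Set(workUnits.map((unit) => unit.type));
  if (types.size !== 1) {
    throw new Error('work units must be all file-based or all content-based');
  }
  const hash = crypto.createHash('sha1');
  for (const unit of workUnits) {
    // File-backed: freshness from path, modification time and size.
    // Content-backed: freshness from the content itself.
    hash.update(unit.type === 'file'
      ? `${unit.path}|${unit.mtime}|${unit.size}`
      : unit.data);
    hash.update('\n'); // keep neighbouring units from running together
  }
  return hash.digest('hex');
}

const fileUnits = [
  { type: 'file', path: '/js/a.js', mtime: 1380000000, size: 1024 },
  { type: 'file', path: '/js/b.js', mtime: 1380000500, size: 2048 },
];
const keyBefore = renderKey(fileUnits);
fileUnits[1].mtime += 60; // b.js was modified
const keyAfter = renderKey(fileUnits);
console.log(keyBefore !== keyAfter); // true
```

The single check at the top is what pushes the inconvenience out to the caller: whoever calls this has to split mixed input into uniform batches first.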

I would still, at some point, love to have a nice way for handlers to register
themselves, but that’s something I’ll handle another day. For now, I’m happy
that I could accomplish my goal in a very lean fashion at the cost of a public
interface of the backend that is really, really inconvenient to use which leaves way too much code in the frontend.

At least I got away without an AssetFactoryFactory though :-)

My worst mistakes in programming

I’m in the middle of refactoring a big infrastructure piece in our product PopScan. It’s very early code, rarely touched since its inception in 2004, so I’m dealing mainly with my sins of the past.

This time like no time before, I’m feeling the two biggest mistakes I have ever made in designing a program, so I thought I’d make this post here in order to help others not fall into the same trap.

Remember this: Once you are no longer alone working on your project, the code you have written sets an example. Mistakes you have made are copied – either verbatim or in spirit. The design you have chosen lives on in the code that others write (rightfully so – you should strive to keep code consistent).

This makes it even more important not to screw up.

Back in 2004 I failed badly in two places.

  • I chose a completely wrong abstraction in class design, mixing two things that should be separate.
  • I chose – in a foolhardy wish to save on CPU time – to create a ton of internal state instead of fetching the data when it’s needed (I could still cache then, but I missed that).

So here’s the story.

One is the architectural issue.

Let me tell you, dear reader, should you ever be in the position of having to do anything even remotely related to an ecommerce solution dealing with products and orders, repeat after me:

Product lists are not the same thing as orders. Orders are not the same thing as baskets.

and even more importantly:

A product and a line item are two completely different things.

A line item describes how a specific product is placed in a list, so at best, a product is contained in a line item. A product doesn’t have a quantity. A product doesn’t have a total price.

A line item does.

And while we are at it: «quantity» is not a number. It is the entity that describes how many times the product is contained within the line item. As such, a quantity usually consists of an amount and a unit. If you change the unit, you change the quantity. If you change the amount, you change the quantity.
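
Here is a minimal sketch of that separation in JavaScript. It is not PopScan's model; the class and property names are invented for this example, and keeping a per-unit price table (in cents) on the product is purely an assumption to make the total computable.

```javascript
// A product knows what it is, not how often it was ordered.
class Product {
  constructor(sku, name, unitPrices) {
    this.sku = sku;
    this.name = name;
    this.unitPrices = unitPrices; // price in cents per unit, e.g. { piece: 250 }
  }
}

// A quantity is an amount together with a unit. It is immutable: changing
// either part yields a different quantity.
class Quantity {
  constructor(amount, unit) {
    this.amount = amount;
    this.unit = unit;
    Object.freeze(this);
  }

  withUnit(unit) {
    return new Quantity(this.amount, unit);
  }
}

// A line item places a product in a list. Only here do quantity and total
// price make sense.
class LineItem {
  constructor(product, quantity) {
    this.product = product;
    this.quantity = quantity;
  }

  get totalPrice() {
    return this.quantity.amount * this.product.unitPrices[this.quantity.unit];
  }
}

const screws = new Product('S-1', 'Screws', { piece: 250, box: 2400 });
const item = new LineItem(screws, new Quantity(3, 'box'));
console.log(item.totalPrice); // 7200

item.quantity = item.quantity.withUnit('piece');
console.log(item.totalPrice); // 750
```

The same Product instance can now appear in a basket, an order and a template at the same time, each with its own quantity.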

Anyways – sitting down and thinking about the entities in the feature that you are implementing is an essential part of the work that you do. Even if it seems “kinda right” at the time, even if it works “right” for years – once you make a mistake at a bad place, you are stuck with it.

PopScan is about products and ordering them. Me missing the distinction between a product and a line item back in 2004 worked fine until now, but as this is a core component of PopScan, it has grown the most over the years, more and more intertwining product and line item functionality to the point where it’s too late to fix this now or at least it would require countless hours of work.

Work that will have to be done sooner rather than later. Work that deeply affects a core component of the product. Work that will change the API greatly and as such can only be tested for correctness in integration tests. Unit tests become useless as the units that are being tested won’t exist any more in the future.

Painful work.

If only I’d had more time and experience 8 years ago.

The other issue is about state

Let’s say you have a class FooBar with a property Foo that is exposed as part of the public API via a getFoo method.

That Foo relies on some external data – let’s call it foodata.

Now you have two options of dealing with that foodata:

  1. You could read foodata into an internal foo field at construction time. Then, whenever your getFoo() is called, you return the value you stored in foo.
  2. Or you could read nothing until getFoo() is called and then read foodata and return that (optionally caching it for the next call to getFoo())
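
The second option could look roughly like this in JavaScript. It is a sketch only: loadFooData stands in for whatever actually reads foodata, and counting the reads is there just to show when the loading happens.

```javascript
class FooBar {
  // loadFooData is a hypothetical function that fetches foodata on demand.
  constructor(loadFooData) {
    this.loadFooData = loadFooData;
    this.fooLoaded = false;
    this.foo = undefined;
  }

  // Nothing is read until somebody actually asks for Foo. The result is
  // cached so that later calls do not hit the data source again.
  getFoo() {
    if (!this.fooLoaded) {
      this.foo = this.loadFooData();
      this.fooLoaded = true;
    }
    return this.foo;
  }
}

let reads = 0;
const fooBar = new FooBar(() => {
  reads += 1;
  return 'foodata';
});

console.log(reads);           // 0, constructing the object read nothing
console.log(fooBar.getFoo()); // foodata
fooBar.getFoo();
console.log(reads);           // 1, the second call was served from the cache
```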

Choosing the first design for most of the models back in 2004 was the second biggest coding mistake I have ever made in my life.

Aside from the fact that constructing one of these FooBar objects becomes more and more expensive the more stuff you preload (likely never to be used for the lifetime of the object), you have also contributed to a huge amount of internal state of the object.

The temptation to write a getBar() method that has a side effect of also altering the internal foo field is just too big. And now you end up with a getBar() that suddenly also depends on the internal state of foo which suddenly is disconnected from the initial foodata.

Worse, suddenly calling code will see different results depending on whether it calls getBar() before it’s calling getFoo(). Which will of course lead to code depending on that fact, so fixing it becomes very hard (but at least caught by unit tests).

Having the internal fields also leads to FooBar’s implementation preferring these fields over the public methods, which is totally fine, as long as FooBar stands alone.

But the moment there’s a FooFooBar which inherits from FooBar, you lose all the advantages of polymorphism. FooBar’s implementation will always only use its own private fields. It’s impossible for FooFooBar to affect FooBar’s implementation, causing the need to override many more methods than what would have been needed if FooBar used its own public API.
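
A tiny JavaScript sketch shows the difference. The method names describeViaField and describeViaApi are made up for the example, and foo is an ordinary property that is merely treated as private here.

```javascript
class FooBar {
  constructor() {
    this.foo = 'foo'; // treated as a private field of FooBar
  }

  getFoo() {
    return this.foo;
  }

  // Reads the field directly: subclasses have no say in the result.
  describeViaField() {
    return `value: ${this.foo}`;
  }

  // Goes through the public API: an overridden getFoo() is honoured.
  describeViaApi() {
    return `value: ${this.getFoo()}`;
  }
}

class FooFooBar extends FooBar {
  getFoo() {
    return 'foofoo';
  }
}

const child = new FooFooBar();
console.log(child.describeViaField()); // value: foo
console.log(child.describeViaApi());   // value: foofoo
```

With the field-based variant, FooFooBar would have to override describeViaField() as well, and every other method that happens to touch the field.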

Conclusion

These two mistakes cost us hours and hours of working around our inability to do what we want. They cost us hours of debugging and they cause new features to come out much more clunky than they need to be.

I have done so many bad things in my professional life. A shutdown -h instead of -r on a remote server. A mem=512 boot parameter (yes. That number is/was interpreted as bytes. And yes. Linux needs more than 512 bytes of RAM to boot), an update without where clause – I’ve screwed up so badly in my life.

But all of this is nothing compared to these two mistakes.

These are not just inconveniencing myself. These are inconveniencing my coworkers and our customers (because we need more time to implement features).

Shutting down a server by accident means 30 minutes of downtime at worst (none since we heavily use VMware). Screwing up a class design twice is the gift that keeps on giving.

I’m so sorry for you guys having to put up with OrderSet of doom.

Sorry guys.

Abusing LiveConnect for fun and profit

On December 20th I gave a talk at the JSZurich user group meeting in Zürich.
The talk is about a decade-old technology which can be abused to get full,
unrestricted access to a client machine from JavaScript and HTML.

I was showing how you would script a Java Applet (which is completely hidden
from the user) to do the dirty work for you while you are creating a very nice
user interface using JavaScript and HTML.

The slides are available in PDF format too.

While it’s a very cool tech demo, it’s IMHO also a very bad security issue
which browser vendors and Oracle need to have a look at. The user sees nothing
but a dialog like this:

[image: security prompt]

and once they click OK, they are completely owned.

Even worse, while this dialog is showing the case of a valid certificate, the
dialog in case of an invalid (self-signed or expired) certificate isn’t much
different, so users can easily be tricked into clicking allow.

The source code of the demo application is on github
and I’ve already written about this on this blog here,
but back then I was mainly interested in getting it to work.

By now though, I’m really concerned about putting an end to this, or at least
increasing the hurdle the end-user has to jump through before this goes off –
maybe force them to click a visible Applet. Or just remove the LiveConnect feature
altogether from browsers, thus forcing applets to be visible.

But aside from the security issues, I still think that this is a very
interesting case of long forgotten technology. If you are interested, do have
a look at the talk and travel back in time to when stuff like this was only
half as scary as it is now.

updated sacy – now with external tools

I’ve just updated the sacy repository again and tagged a v0.3-beta1 release.

The main feature since yesterday is support for the official compilers and
tools if you can provide them on the target machine.

The drawback is that these things come with hefty dependencies at times (I
don’t think you’d find a shared hoster willing to install node.js or Ruby for
you), but if you can provide the tools, you can get some really nice
advantages over the PHP ports of the various compilers:

  • the PHP port of sass has an issue that prevents
    @import from working. sacy’s build script does patch that, but the way they
    were parsing the file names doesn’t inspire confidence in the library. You
    might get a more robust solution by using the official tool.

  • uglifier-js is a bit faster than JSMin, produces significantly smaller
    output and comes with a better license (JSMin isn’t strictly free software
    as it has this “do no evil” clause)

  • coffee script is under very heavy development, so I’d much rather use the
    upstream source than some experimental fun project. So far I haven’t seen
    issues with coffeescript-php, but then I haven’t been using it much yet.

Absent from the list you’ll find less and css minification:

  • the PHP native CSSMin is really good and
    there’s no single official external tool out there that is demonstrably better (maybe
    the YUI compressor, but I’m not going to support something that requires me
    to deal with Java)

  • lessphp is very lightweight and yet very full
    featured and very actively developed. It also has a nice advantage over the
    native solution in that the currently released native compiler does not
    support reading its input from STDIN, so if you want to use the official
    less, you have to go with the git HEAD.

Feel free to try this out (and/or send me a patch)!

Oh and by the way: If you want to use uglifier or the original coffee script
and you need node but can’t install it, have a look at the
static binary I created.

updated sacy – now with more coffee

I’ve just updated the sacy repository
to now also provide support for compiling Coffee Script.

{asset_compile}
<script type="text/coffeescript" src="/file1.coffee"></script>
<script type="text/javascript" src="/file2.js"></script>
{/asset_compile}

will now compile file1.coffee into JS before creating and linking one big chunk of minified JavaScript.

<script type="text/javascript" src="/assetcache/file2-deadbeef1234.js"></script>

As always, the support is seamless – this is all you have to do.
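To make the flow a bit more concrete, here is a minimal sketch of the "transform each file, then join, then pick a cache name" idea. It is written in Python rather than sacy's PHP, purely for illustration, and it is not sacy's actual code: the function names, the transformer lookup by MIME type and the naming scheme (stem of the last file plus a short content hash) are all assumptions on my part.

```python
import hashlib
from pathlib import PurePosixPath


def cache_name(sources, bodies):
    """Hypothetical naming scheme: stem of the last source + short hash."""
    digest = hashlib.sha1()
    for src, body in zip(sources, bodies):
        digest.update(src.encode("utf-8"))
        digest.update(b"\0")  # separator so names and bodies cannot blur
        digest.update(body.encode("utf-8"))
    stem = PurePosixPath(sources[-1]).stem
    return f"/assetcache/{stem}-{digest.hexdigest()[:12]}.js"


def bundle(assets, transformers):
    """assets: list of (src, mime_type, body) tuples.

    transformers maps a MIME type to a function that turns source text
    into JavaScript; types without an entry are passed through untouched.
    Returns (cache_path, combined_javascript).
    """
    parts = []
    for src, mime, body in assets:
        transform = transformers.get(mime, lambda text: text)
        parts.append(transform(body))
    combined = "\n".join(parts)
    return cache_name([a[0] for a in assets], parts), combined
```

A real Coffee Script compiler (and a minifier) would be plugged in through `transformers`. Because the name is derived from the content, a changed source file ends up under a new URL, which is one plausible reason for a hash in the file name.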

Again, in order to keep deployment simple, I decided to go with a pure PHP solution (coffeescript-php).

I do see some advantages in the native solutions though (performance, better output), so I’m actively looking into a solution to detect the availability of native converters that I could shell out to without having to hit the file system on every request.
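One way this detection could look, again as a Python sketch and not as what sacy does or will do: look the tool up once, remember the answer, and fall back to the PHP port when nothing is found. `find_tool` and `pick_compiler` are names I made up for this example; `shutil.which` and `functools.lru_cache` are standard library.

```python
import shutil
from functools import lru_cache


@lru_cache(maxsize=None)
def find_tool(name):
    """Search PATH for an executable once; later calls reuse the answer."""
    return shutil.which(name)


def pick_compiler(native_name, fallback):
    """Prefer the native tool if it is installed, else use the fallback."""
    path = find_tool(native_name)
    if path is not None:
        return ("native", path)
    return ("fallback", fallback)
```

The in-process cache only helps in a long-running process. In a runtime that starts fresh for every request, the result would have to be kept in some shared cache instead, which this sketch does not show.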

Also, when adding the coffee support, I noticed that the architecture of sacy isn’t perfect for doing this transformation stuff. Too much code had to be duplicated between CSS and JavaScript, so I will do a bit of refactoring there.

Once both the support for external tools and the refactoring of the transformation is completed, I’m going to release v0.3, but if you want/need coffee support right now, go ahead and clone
the repository.

A new fun project

Like back in 2010, I went to JSConf.eu this year.

One of the many impressive facts about JSConf is the quality of their Wifi
connection. It’s not just free and stable, it’s also fast. Not only that, this
time around, they had a very cool feature: You authenticated via twitter.

As most of the JS community seems to be having twitter accounts anyways, this
was probably the most convenient solution for everyone: You didn’t have to
deal with creating an account or asking someone for a password and on the
other hand, the organizers could make sure that, if abuse should happen,
they’d know whom to notify.

On a related note: This was in stark contrast to the WiFi I had in the hotel
which was unstable, slow and cost a ton of money to use and it didn’t use
Twitter either :-)

In fact, the twitter thing was so cool to see in practice, that I want to use
it for myself too.

Since the days of WEP-only Nintendo DS, I’ve been running two WiFi networks at home:
One is WPA protected and for my own use, the other is open, but it runs over
a different interface on shion
which has no access to any other machine in my network. This is even more
important as I have a permanent OpenVPN connection
to my office and I definitely don’t want to give the world access to that.

So now the plan would be to change that open network so that it redirects to a
captive portal until the user has authenticated with twitter (I might add
other providers later on – LinkedIn would be awesome for the office for
example).
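The core decision such a portal has to make is small. Here is a rough Python sketch of it, under a pile of assumptions: clients are identified by some address string, the login itself (the actual Twitter handshake) happens elsewhere, and the portal URL is a placeholder. None of these names come from an existing project.

```python
from dataclasses import dataclass, field


@dataclass
class Portal:
    # Placeholder URL, assumed for this example only.
    portal_url: str = "http://portal.example/login"
    # Maps a client address to the account it authenticated as.
    sessions: dict = field(default_factory=dict)

    def authenticate(self, client, handle):
        """Record that this client has logged in (login flow not shown)."""
        self.sessions[client] = handle

    def route(self, client, requested_url):
        """Let known clients through, send everybody else to the portal."""
        if client in self.sessions:
            return ("allow", requested_url)
        return ("redirect", self.portal_url)
```

Keeping the handle next to the client address is what would make it possible to know whom to notify if abuse should happen.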

In order for me to actually get the thing going, I’m doing a tempalias on this
one too and keeping a diary of my work.

So here we go. I really think that every year I should do some fun-project
that’s programming related, can be done on my own and is at least of some use.
Last time it was tempalias, this time, it’ll be
Jocotoco (more about the name in the next installment).

But before we take off, let me give, again, huge thanks to the JSConf crew for
the amazing conference they manage to organize year after year. If I could,
I’d already preorder the tickets for next year :p

Attending a JSConf feels like a two-day drug-trip that lasts for at least two
weeks.