devops- gnegg

Sensational AG is the company I founded together with a colleague back in 2000. Ever since then, we had a very nice combination of fun, interesting work and a very successful business.

We’re a very small team – just nine programmers, one business guy, a product designer, a frontend designer and two bloody excellent project managers. Me personally, I would love to keep the team as small and tightly-knit as possible as that brings huge advantages: Little internal politics, a lot of freedoms for everybody and mind-blowing productivity.

I’m still amazed to see what we manage to do with our small team time and time again and yet still manage to keep the job fun. It’s not just the stuff we do outside of immediate work, like Cola Double Blind Tests, Drone Flights directly from the roof of our office and much more – it’s also the work itself that we try to make as fun as possible for everybody.

We are looking for a new member to help us with the development of our main product, an eCommernce platform that’s optimized for wholesale customers.

We’re not about presenting a small amount of product in the most enticing manner, but we’re into helping our end users to be as efficient and quick as possible to deal with their big orders (up to 400 line items per week).

Processes most of Switzerland’s whole-sale food orders

Over the years the main industry we serve has focussed itself on wholesale gastronomy. If you have ever bought something in any of the bigger empolyee restaurants in any larger company, if you have ever bought something in any restaurant of any school or university in Switzerland, whatever you consumed has likely been procured and managed over one of our platforms.

For our customers and ourselves, we handle a seizable amount of data (the largest data set is 3.2 TB in size).

I’m always calling our field «medium data» – while it might still fit into memory, it’s definitely too big to deal with it in the naïve way, so it’s not quite big-data yet, but it’s certainly in interesting spheres.

We’re in the comfortable position that the data entrusted to us is growing in the speed that we’re able to learn how to deal with it and so is our architecture. What started as a simple PHP-in-front-of-PostgreSQL deal back in 2004 by now has grown to a cluster of about 40 machines: Job queue servers, importer servers, application servers, media servers, event forwarding servers; because we are hosting our infrastructure for our customers, we can afford to go the extra mile to do things technically interesting and exciting.

Speaking of infrastructure: We own the full stack of our product: Our web application, its connected micro services, our phone apps, our barcode reading apps, but also our backend infrastructure (which is kept up to date by Puppet)

While our main application is a beast of 250k lines of PHP code, we still strive to use the best tool for their jobs and in the last years have grown our infrastructure with tools we have written in Rust, JavaScript & TypeScript (via Node.js) and of course our mobile apps are written in their native languages Swift and Java with more and more Kotlin.

We try to stay as current as possible even with our core PHP code. We have upgraded PHP versions more or less the day they come out.

As strong believers in Open Source, whenever we come across a bug in our dependencies, we fix it and publish it upstream. Many of our team members have had their patches merged into PHP, Psalm, Rust, Tantivy and others. Giving back is only fair (and of course also helps us with future maintenance).

Over the years we have learned about the value of modern software development: Strong typing (even in PHP through Psalm), functional programming, automated tests, automated deployments: We do whatever we can do to allow us to continuously and with confidence push updates to our thousands of users multiple times a day.

Ready to press the button and deploy to 100s of users

If this sounds interesting to you and you want to help us make it possible for our end users to leave their workplace earlier because ordering is so much easier, then ping me at jobs@sensational.ch.

You should be familiar with working on bigger Software projects and understanding of software maintainability over the years. We hardly ever start fresh, but we constantly strive to keep what we have modern and up to speed with wherever technology goes.

You will be initially mostly working on our PHP and JS/TypeScript code-base, but if you’re into another language and it will help you solve a problem you’re having or your skill in a language we’re already working with can help us solve a problem, then you’re more than welcome to help.

If you have UNIX shell experience, that’s a bigger plus, though it’s not required, but you will just have to learn the ropes a bit.

All our work is tracked in git and we’re extremely into beautiful commit histories and thus heavy users of the full feature-set that git offers. But don’t worry – so far, we’ve helped everybody get up to speed.

And finally: As a mostly male team – after all, we only have two women working on our team of developers, we’d especially love if more women would find their way into our team.

All of us are very aware how difficult it is for minorities to find a comfortable working environment they can add their experiences to and where they can be themselves. All of us strive to provide such an environment.

In the summer of 2012, I had the great oportunity to clean up our hosting
infrastructure. Instead of running many differently configured VMs, mostly one
per customer, we started building a real redundant infrastructure with two
really beefy physical database machines (yay) and quite many (22) virtual
machines for caching, web app servers, file servers and so on.

All components are fully redundant, every box can fail without anybody really
needing to do anything (one exception is the database – that’s also redundant,
but we fail over manually due to the huge cost in time to failback).

Of course you don’t manage ~20 machines manually any more: Aside of the fact
that it would be really painful to do for those that have to be configured in an
identical way (the app servers come to mind), you also want to be able to
quickly bring a new box online which means you don’t have time to manually go
through the hassle of configuring it.

So, In the summer of 2012, when we started working on this, we decided to go
with puppet. We also considered Chef but their server
was really complicated to set up and install and there was zero incentive for
them to improve because that would, after all, disincentivse people from
becoming customers of their hosted solutions (the joys of open-core).

Puppet is also commerically backed, but everything they do is available as open
source and their approach for the central server is much more «batteries
included» than what Chef has provided.

And finally, after playing around a bit with both Chef and puppet, we noticed
that puppet was way more bitchy and less tolerant of quick hacks around issues
which felt like a good thing for people dabbling with HA configuration of a
multi machine cluster for the first time.

Fast forward one year: Last autumn I found out about
ansible (linking to their github page –
their website reads like a competition in buzzword-bingo) and after reading
their documentation, I immediately was convinced:

No need to install an agent on managed machines
Trivial to bootstrap machines (due to above point)
Contributors don’t need to sign a CLA (thank you so much, ansibleworks!)
No need to manually define dependencies of tasks: Tasks are run requentially
Built-in support for cowsay by default
Many often-used modules included by default, no hunting for, say, a sysctl
module on github
Very nice support for rolling updates
Also providing a means to quickly do one-off tasks
Very easy to make configuration entries based on the host inventory (which requires puppetdb and an external database in the case of puppet)

Because ansible connects to each machine individually via SSH, running it
against a full cluster of machines is going to take a bit longer than with
puppet, but our cluster is small, so that wasn’t that much of a deterrent.

So last Sunday evening I started working on porting our configuration over from
puppet to Ansible and after getting used to the YAML syntax of the playbooks, I
made very quick progress.

progress

Again, I’d like to point out the excellent, built-in, on-by-default support for
cowsay as one of the killer-features that made me seriously consider starting
the porting effort.

Unfortunately though, after a very promising start, I had to come to the
conclusion that we will be sticking with puppet for the time being because
there’s one single feature that Ansible doesn’t have and that I really, really
want a configuration management system to have:

I’ts not possible in Ansible to tell it to keep a directory clean of files not
managed by Ansible in some way

There are, of course, workarounds, but they come at a price too high for me to
be willing to pay.

You could first clean a directory completely using a shell command, but this
will lead to ansible detecting a change to that folder every time it runs which
will cause server restarts, even when they are not needed.
You could do something like this stack overflow question
but this has the disadvantage that it forces you into a configuration file
specific playbook design instead of a role specific one.

What I mean is that using the second workaround, you can only have one playbook
touching that folder. But imagine for example a case where you want to work with
/etc/sysctl.d: A generic role would put some stuff there, but then your
firewall role might put more stuff there (to enable ip forwarding) and your
database role might want to add other stuff (like tweaking shmmax and shmall,
though that’s thankfully not needed any more in current Postgres releases).

So suddenly your /etc/sysctl.d role needs to know about firewalls and
databases which totally violates the really nice separation of concerns between
roles. Instead of having a firewall and a database role both doing something to
/etc/sysctl.d, you know need a sysctl-role which does different things
depending on what other roles a machine has.

Or, of course, you just don’t care that stray files never get removed, but
honestly: Do you really want to live with the fact that your /etc/sysctl.d, or
worse, /etc/sudoers.d can contain files not managed by ansible and likely not
intended to be there? Both sysctl.d and sudoers.d are more than capable of doing
immense damage to your boxes and this sneakily behind the watching eye of your
configuration management system?

For me that’s inacceptable.

So despite all the nice advantages (like cowsay), this one feature is something
that I really need and can’t have right now and which, thus, forces me to stay
away from Ansible for now.

It’s a shame.

Some people tell me that implementing my feature would require puppet’s feature
of building a full state of a machine before doing anything (which is error-
prone and frustrating for users at times), but that’s not really true.

If ansible modules could talk to each other – maybe loosly coupled by firing
some events as they do stuff, you could just name the task that makes sure the
directory exists first and then have that task register some kind of event
handler to be notified as other tasks touch the directory.

Then, at the end, remove everything you didn’t get an event for.

Yes. This would probably (I don’t know how Ansible is implemented internally)
mess with the decouplling of modules a bit, but it would be so far removed
from re-implementing puppet.

Which is why I’m posting this here – maybe, just maybe, somebody reads my plight
and can bring up a discussion and maybe even a solution for this. Trust me: I’d
so much rather use Ansible than puppet, it’s crazy, but I also want to make sure
that no stray file in /etc/sysctl.d will bring down a machine.

Yeah. This is probably the most words I’ve ever used for a feature request, but
this one is really, really important for me which is why I’m so passionate about
this. Ansible got so f’ing much right. It’s such a shame to still be left
unable to really use it.

Is this a case of xkcd1172? Maybe, but to me, my
request seems reasonable. It’s not? Enlighten me! It is? Great! Let’s work on
fixing this.

Category: devops

Sensational AG is hiring (once more)

Ansible