Tailscale on PFSense

For a bit more than a year now, I’ve been a user of Tailscale, a service that builds an overlay network on top of WireGuard while relying on OAuth with third-party services for authentication.

It’s incredibly easy to get going with Tailscale and the free tier they provide is more than good enough for the common personal use cases (in my case: tech support for my family).

Most of the things that are incredibly hard to set up with traditional VPN services just work out of the box or require a minimal amount of configuration. Heck, even more complicated things like tunnel splitting and DNS resolution in different private subnets just work. It’s magic.

While I have some gripes that prevent me from switching all our company VPN connectivity over to them, those are a topic for a future blog post.

The reason I’m writing here right now is that a few weeks ago, Netgate and Tailscale announced a Tailscale package for PFSense. As a user of both PFSense and Tailscale, this allowed me to get rid of a VM that does nothing but be a Tailscale exit node and subnet router and instead use the Tailscale package to do this on PFSense.

However, doing this for a week or so has revealed some very important things to keep in mind, which I’m posting about here because other people (and that includes my future self) will run into these issues, and some are quite devastating:

When using the Tailscale package on PFSense, you will encounter two issues directly caused by Tailscale, both of which also show up in unrelated reports when you search for the symptoms on the internet, so you might be led astray when debugging them.

Connection loss

The first one is the bad one: After some hours of usage, an interface on your PFSense box will become unreachable, dropping all traffic through it. A reboot will fix it and when you then look at the system log, you will find many lines like

arpresolve: can't allocate llinfo for <IP-Address> on <interface>
I’m in so much pain right now

This will happen if one of your configured gateways in “System > Routing” is reachable both by a local connection and through Tailscale via a subnet router (even if your PFSense host itself is told to advertise that route).

I might have overdone the fixing, but here are all the steps I have taken:

  • Tell Tailscale on PFSense to never use any advertised routes (“VPN > Tailscale > Settings”, uncheck “Accept subnet routes that other nodes advertise.”) – see the sketch after this list.
  • Disable gateway monitoring under “System > Routing > Gateways” by clicking the pencil next to the gateway in question.
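
The PFSense package drives all of this through its GUI, but, as a sketch, the first of the two steps conceptually boils down to the following tailscale CLI flags (the advertised subnet is a placeholder for whatever your own LAN is):

tailscale up --advertise-routes=192.168.1.0/24 --accept-routes=false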

I think what happens is that PFSense accidentally believes that the subnet advertised via Tailscale is not local and then refuses to add the address of that gateway to its local ARP table.

IMHO, this is a bug in Tailscale. It should never mess with interfaces it’s exposing as a subnet router to the overlay network.

Log Spam

The second issue is not as bad, but as the effect is so far removed from the cause, it’s still worth talking about.

When looking at the system log (which you will do for the above issue), you will see a ton of entries like

sshguard: Exiting on signal
sshguard: Now monitoring attacks.
this can’t be good. Can it?

What happens is that PFSense moved, a few releases ago, from a binary ring-buffer for logging to a more naïve approach: once a minute, it checks whether a log file is too big and, if so, rotates it and restarts the daemons logging to that file.

If a daemon doesn’t have a built-in means for re-opening log files, PFSense will kill and restart the daemon, which happens to be the case for sshguard.

So the question is: why is the log file being rotated every minute? This is caused by the Tailscale overlay network sending traffic to the WAN interface (UDP port 41641), which the firewall blocks by default while, also by default, logging every dropped packet.

In order to fix this and assuming you trust Tailscale and their security update policies (which you probably should given that you just installed their package on a gateway), you need to create a rule to allow UDP port 41641 on the WAN interface.

much better now
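
Once the rule is in place, there are two quick ways to check that the port is actually reachable and that other nodes reach your PFSense box directly instead of through a relay (the node name below is a placeholder):

tailscale netcheck
tailscale ping my-pfsense

tailscale netcheck reports on general UDP connectivity from the box itself, while tailscale ping tells you whether packets to a peer flow directly or via a DERP relay.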

This, too, IMHO is a bug in the Tailscale package: if your package opens port 41641 on an interface of a machine whose main purpose is being a firewall, you should probably also make sure that traffic to that port is not blocked.

With these two configuration changes in place, the network is stable and the log spam has gone away.

What’s particularly annoying about these two issues is that Googling for either of the two error messages will yield pages and pages of results, none of which apply, because those messages have many possible causes and because Tailscale is a very recent addition to PFSense.

This is why I decided to post this article: to provide one more result in Google, this time combining the two keywords Tailscale and PFSense, in the hope of helping fellow admins who run into the same issues after installing Tailscale on their routers.

After seven years, the Apple Watch experience still is a mess

Seven years ago, in 2015, the Apple Watch was released and quickly switched focus from a personal communication device with some fitness support to a personal fitness device with ancillary functionality.

Every year since then, Apple has released a new version of its watchOS operating system, adding some new features, but most of the time, what was added felt like how software and hardware development was done up until the early 2000s, when features were made to fill bullet lists, not to actually be used.

To this day, the Apple Watch is a device that nearly gets there but even the basic functionality is hampered by bugs, inconsistencies and features which exist on paper but just plain don’t work in reality.

I am a heavy user of the Apple Watch, and daily I stumble over some 100% reproducible issue that I don’t expect to stumble over in an Apple product, much less one with such a pinpoint focus on a specific use case.

My “user story” matches exactly what the Watch was designed for: I’m wearing the watch all day to know the time, to get silent notifications and to use the silent alarm clock. And once a day, I’m going on a running workout while listening to podcasts, without taking my phone with me.

I’m a nerd, so I tend to get hung up in a case of XKCD 1172, but none of this user story feels far off from what the watch was designed for.

But now, let me guide you through my day of 100% reproducible annoyances which have been present since the respective feature was added to the Watch (multiple years ago), which to this day have not been addressed, and which have started to drive me up the wall to the point that I’m now sitting down and writing a long-form article.

First, let me give you some context about the apps and hardware involved in my setup:

  • I’m using Overcast as my Podcast app. It has a hard time syncing podcasts due to 3rd party API restrictions, but it’s still better than the built-in Podcasts app because that one cannot even sync the playback order of playlists (native podcasts were a feature added to the watch in 2018) and most of the time, syncing episodes did not work, worked only partially (some episodes downloaded, some not), or only appeared to work (progress bar shown in the watch app on the phone but not moving). Streaming over LTE works (at the understandable, huge cost of battery life), but even then, I have to manually select the next episode to play because it does not sync the playback order of playlists (called a “Station” in Apple Podcasts terms).
  • I’m using AirPods Pro as my headphones.
  • I’m tracking my runs using the built-in Workouts app (because that one is more equal than others with regard to what features it can have compared to third parties), not because it’s better in general.

That’s all.

The trouble starts before I begin my workout: sometimes I want to add a specific podcast to the playlist I’m listening to. Because Overcast (see above for the reasoning why Overcast) only allows syncing one playlist, this means it will have to download that episode.

So I open Overcast and watch it start downloading.

Which is very slow because watchOS decides to use my phone as a proxy over a Bluetooth connection (this has been the case since 2015).

I have WiFi enabled, but the Watch doesn’t auto-join my UniFi-based WiFi (it works at home, but not at the COVID-related “home office” location). All other Apple devices join fine. WiFi has been a Watch feature since 2015.

But even if I manually join the WiFi (which works fine), watchOS will not stop using Bluetooth, so that won’t improve download speeds (this, too, has been the case since 2015).

Also, because I switched to the Settings app, Overcast was force-quit by the OS due to resource constraints, so when I go back to it, the download will be marked as “Failed” and I have to start it again.

So my daily routine before a run when I want to listen to a specific episode of a podcast that has not yet been downloaded for whatever reason, is as follows:

  • go into settings
  • disable bluetooth
  • manually join wifi
  • open Overcast and start the download.
  • Continuously tap the screen in order to make sure the app is not terminated while the download is ongoing

You could list this as Overcast’s fault, but other podcast players on the platform, most notably Apple’s native one, have similar to identical problems (with Apple’s offering being spectacularly bad in that it doesn’t work right even aside from the connectivity problems).

OK. Now I’m ready to run. So I select my episode in Overcast and hit play. But all I get is a prompt to choose an output device. Of course I forgot to re-enable Bluetooth, which I fix, and after a very long wait, finally, the AirPod in my right ear connects to the watch and starts playing the podcast. The left one remains silent until I take it out of my ear and put it back in (this doesn’t happen every time, but it does happen regularly enough for me to write it down here).

As I sit down again at my office chair to put on my running shoes, I accidentally bump the table, which causes the mouse to move and the computer to wake up again. Thanks to automatic device switching (a feature added in 2020), the Mac immediately grabs the AirPods back from my Watch to play the silence it’s currently playing (this one happens every. single. time).

So I go back to the watch and press play. Another 20 seconds of waiting while the Watch negotiates the handover with my Mac and I’m back to my podcast.

Finally it’s time to leave.

I’m in the very privileged position to work right next to where I’m running, so as I start to run, I’m initially going in and out of WiFi range. Every time the watch connects to or disconnects from the WiFi, the audio stutters and sometimes breaks off completely (bug present since 2015).

So I stop and disable WiFi.

But now I’m running. Finally.

The workout itself is fine (with the exception of the display issue that if the workout is auto-paused due to me stopping to tie my shoelaces and then resumed, the screen will say “paused” in the top left, but the running animation and timer will still be running – this is a regression in watchOS 8, released in 2021) until the point that I’m getting a notification from a message I want to reply to.

I bought a Series 7 Watch based on their presentation of the new QWERTY keyboard feature for exactly this purpose.

Unfortunately, the message I got is in German (I live in the German-speaking part of Switzerland after all) and I want to reply in German. The new signature feature of the Series 7 Watch is not available in any language but English, though – something nobody told me beforehand – so it’s back to either Scribble, where I’m only sporadically able to type umlauts, or dictation, where I can watch the device in real time bungling my speech into a ridiculous word salad it wants me to send off. The watch is much worse at dictation than the phone.

There’s no reason for the QWERTY keyboard to be Series 7 exclusive other than to make more money for Apple, which is also why they touted it as a signature feature of this new hardware generation.

They could at least have bothered to make it usable for the majority of the people on this planet (who speak a language other than English).

Anyways – back to the run. It starts to rain and after having had a half-marathon cancelled unbeknownst to me by a wet shirt hitting the touch screen in the past (why not warn me over my headphones if you detect I’m still moving? Ah right. There’s no weather in California, so this problem doesn’t happen), I enable the key lock feature.

After I reach the destination of my run, I want to stop the workout, so I turn the crown to disable the key lock. As that feature was invented for swimming, the loudspeaker of the watch starts playing a sound to eject water that entered the speaker.

All well and good, but also, while playing that sound over the built-in speaker, Bluetooth audio stops. Why? I don’t know, but this misfeature has been present since key lock was introduced in 2016. Sometimes, the audio starts again, sometimes it doesn’t.

But that doesn’t matter anyways, because the moment I’m back in WiFi or Bluetooth range of my phone, clearly what needs to happen is that audio stops and Bluetooth is transferred back to my phone, which is currently playing… nothing. Also, while transferring audio from the phone to the watch takes multiple tens of seconds, the way back is instant.

This happened every now and then before automatic switching was added in 2020, but since then, it happens every time.

So here you have it. Bug after bug after annoyance every single day. Many of the features I was talking about were added after the initial release of the Watch and were used to coax me into spending money to upgrade to new hardware.

But none of these features work correctly. Some of them just don’t work at all, some of them only work sometimes.

Over the last seven years, the underlying hardware has gotten better and better. The CPU is multiple times faster than it was in 2015. There’s multiple times more memory available. The battery is larger, there’s more storage available. Marketing has graduated the watch from being a companion of the phone to being a mostly self-sufficient internet-connected device.

Why are apps still being killed after mere milliseconds in the background? Why are apps only rarely woken up to do actions in the background – apps I have installed manually and am using all the time? Why are data transfers from the watch to the phone still basically a crapshoot and, if they do work, slow as molasses? Why is Bluetooth audio still hit and miss 6 years after the last iPhone with an audio jack was released? Why did the Series 7 launch with a signature feature only available to a small portion of the planet when there’s no regulatory need to do so?

The product is supposed to delight and all it does is frustrate me with reproducible and 100% avoidable issues every single day.

This isn’t about wishing for 3rd party apps to have more capabilities. This isn’t about wishing the hardware to do things it’s not advertised to be doing. This isn’t about the frustrating development experience for 3rd parties. This isn’t about sometimes having to reset the watch completely because a feature stopped working suddenly – that happens too, but rarely enough for me to not mind.

This is about first-party features advertised for nearly a decade working only partially or not working at all when all I’m doing is using the product exactly as the marketing copy is telling me I should be using it.

Apple, please allocate the resources the watchOS platform so desperately needs and finally make it so your excellent hardware product can live up to its promise.

Sensational AG is hiring (again)


Sensational AG is the company I founded together with a colleague back in 2000. Ever since then, we have had a very nice combination of fun, interesting work and a very successful business.

We’re a very small team – just eight programmers, one business guy, a product designer and a bloody excellent project manager. Personally, I would love to keep the team as small and tightly-knit as possible, as that brings huge advantages: no internal politics, a lot of freedom for everybody and mind-blowing productivity.

I’m still amazed to see what we manage to do with our small team time and time again and yet still manage to keep the job fun. It’s not just the stuff we do outside of immediate work, like UT2004 matches, Cola Double Blind Tests, Drone Flights directly from the roof of our office, sometimes hosting JSZurich and meetups for the Zurich Clojure User group and much more – it’s also the work itself that we try to make as fun as possible for everybody.

We are looking for a new member to help us with technical support and smaller scale modifications to our main product, though there’s ample opportunity to grow into helping with bigger projects and getting ownership over pieces of our code-base.

Our main product is an ecommerce platform that’s optimized for wholesale customers. We’re not about presenting a small number of products in the most enticing manner; we’re about helping our end users be as efficient and quick as possible when dealing with their big orders (up to 400 line items per week).

Our customers have relatively large amounts of data for us to handle (the largest data set is 2.3 TB in size). I’m always calling our field “medium data” – while it might still fit into memory, it’s definitely too big to deal with in the naïve way, so it’s not quite big data yet, but it’s certainly in interesting spheres.

We’re in the comfortable position that the data entrusted to us is growing at the speed at which we’re able to learn how to deal with it, and so is our architecture. What started as a simple PHP-in-front-of-PostgreSQL deal back in 2004 has by now grown into a cluster of about 40 machines: job queue servers, importer servers, application servers, media servers, event forwarding servers; because we are hosting our infrastructure for our customers, we can afford to go the extra mile and do things that are technically interesting and exciting.

Speaking of infrastructure: we own the full stack of our product: our web application, its connected microservices, our phone apps, our barcode reading apps, but also our backend infrastructure (which is kept up to date by Puppet).

While our main application is a beast of 300k lines of PHP code, we still strive to use the best tool for the job, and in the last years we have grown our infrastructure with tools written in Rust, Clojure and JavaScript (via Node.js); of course our mobile apps are written in their native languages, Swift and Java, with more and more Kotlin.

We try to stay as current as possible even with our core PHP code. We upgraded to PHP 7.4 the day it came out and we’re already running PHP 8.0 beta 3 in our staging and development environments, ready to upgrade the day PHP 8 comes out – those of us who write PHP are already excited about the new features coming in 8.0.

As strong believers in Open Source, whenever we come across a bug in our dependencies, we fix it and publish it upstream. Many of our team members have had their patches merged into PHP, Rust, Tantivy and others. Giving back is only fair (and of course also helps us with future maintenance).

If this sounds interesting to you and you want to help us make it possible for our end users to leave their workplace earlier because ordering is so much easier, then ping me at jobs@sensational.ch.

You should be familiar with working on bigger software projects and have an understanding of software maintainability over the years. We hardly ever start fresh; instead, we constantly strive to keep what we have modern and up to speed with wherever technology goes.

You will initially mostly be working on our PHP and JS (ES2020) code-base, but if you’re into another language and it will help you solve a problem you’re having, or your skill in a language we’re already working with can help us solve a problem, then you’re more than welcome to do so.

If you have UNIX shell experience, that’s a big plus, though it’s not required – you will just have to learn the ropes a bit.

All our work is tracked in git and we’re extremely into beautiful commit histories and thus heavy users of the full feature-set that git offers. But don’t worry – so far, we’ve helped everybody get up to speed.

And finally: as a mostly male team – after all, we only have one woman working on our team of developers – we’d especially love it if more women found their way onto our team. All of us are very aware how difficult it is for minorities to find a comfortable working environment they can add their experiences to and where they can be themselves.

In defense of «macOS 10.15 Vista»

With the release of macOS 10.15 Catalina, people are all up in arms about the additional security popups, comparing it to what happened when Windows Vista introduced UAC and its constant prompting for Administrator permission.

While I can understand where people are coming from, I do have a slightly contrarian opinion which I would like to voice here, as this requires more space than what a comment field on some third-party site offers me.

First, after you read the article I linked above, keep in mind that while these prompts after the first boot after the upgrade are certainly very annoying, there’s a difference from Windows Vista and later:

UAC prompts for elevation every time elevation is needed, whereas the permission macOS hands out with its prompts is persistent. Once you have authorized an application, the authorization remains, and the same prompt will not appear again for the same application.

The screenshot presented in the original article was taken after the first boot after the upgrade, when a lot of applications are launched for the first time. None of the prompts seen in the screenshot will ever appear again.

Blanket permission

OK. But the prompts are still annoying. Isn’t there a way the OS could ask ahead of time so the user could blanket-allow all requests?

That would be cool, but it could not possibly work without requiring changes to applications: the applications installed on your machine expect to be able to get access to the things the OS now prompts for. In most cases, this even involves synchronous API calls, so the application is suspended while the OS is waiting for user input on the permission prompt.

Finally, knowing ahead of time what APIs an application is going to use is impossible, so it’s impossible to list the things an application needs ahead of time. You could run static analysis on a binary, but it would be full of false positives (scaring the user with accesses an application doesn’t need) and false negatives (still showing dialogs later).

For an ahead-of-time permission request, an app would need to declare the permissions it needs and then also be prepared for API calls to fail, even though they used to always succeed (and might not even have an option to signal an error to the caller). This means apps need to be updated.

And you know what: at least for some of the features (namely filesystem-related things), such a declaration is now possible via the application’s .plist file – though, guess what, nobody has updated their applications for Catalina yet.
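
If you’re curious whether an app you use has been updated, a quick way to check for the new usage-description keys is something like the following (the app path is a placeholder):

plutil -p /Applications/SomeApp.app/Contents/Info.plist | grep UsageDescription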

Off-switch

Fine, so the apps aren’t updated yet. Why isn’t there a way for me to turn this off?

There is a way though: if you boot from the recovery partition (by holding Cmd-R while turning the machine on), you can configure System Integrity Protection and Gatekeeper to your liking using the command line tool csrutil.

Macs with System Integrity Protection disabled will not do any of this prompting.
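
For the SIP part, a minimal sketch from a Terminal in the recovery environment looks like this:

csrutil status
csrutil disable

csrutil status also works on a normally booted system if you just want to check the current state; actually changing it only works from recovery.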

Oh – but disabling System Integrity Protection is a security issue? Well – so is letting applications roam free on your disk, control other windows or read keystrokes not sent to themselves.

Oh – but why do I have to reboot to disable this? I want a UI to do so in the running system. If you allow that, then «helpful» applications will silently do it for you, which means Apple wouldn’t even have had to bother implementing SIP to begin with.

Ok. But why does it have to be such a complicated command-line tool? In order to protect users from themselves. This is a very powerful sledgehammer. With great power comes great responsibility and by making the steps required as complicated as possible, the likelihood it’s going to give somebody pause before blindly following the steps presented by the «Flash Player Installer» increases.

In conclusion

I think the prompts are annoying, but once you’ve gone through the initial flood, they appear very rarely. For me it was a mild inconvenience, but even though I consider myself a somewhat technical user, I love the protection of SIP, in light of ever more devious dark patterns and phishing attempts (that last link was on HN the same day as the article complaining about Catalina, btw).

Longer-term, I wish that privacy-sensitive APIs would all become asynchronous and would all require declaration ahead of time (like on Android – though people are complaining there, too), and I wish that applications would update to these APIs or be forced into adopting them (causing another slew of articles about Apple breaking thousands of old applications), but in the meantime, I’m gladly accepting a prompt every now and then if it means I’m harder to phish and harder to have my data exfiltrated from.

Fiber7 TV behind PFSense

As I’ve stated previously, I’m subscribed to what is probably the coolest ISP on earth. Between the fully symmetric Gbit/s, their stance on network neutrality, their IPv6 support and their awesome support even for advanced things like setting up an IPv6 reverse DNS delegation(!), there’s nothing more you could wish for from an ISP.

For some time now, they have also provided an IPTV solution as an additional subscription called tv7.

As somebody who last watched live TV around 20 years ago, I wasn’t really interested in subscribing to that. However, contrary to many other IPTV solutions, what’s special about the Fiber7 solution is that they are using IP multicast to deliver the unaltered DVB frames to their users.

For people interested in TV, this is great because it’s, for all intents and purposes, lag free as the data is broadcast directly through their network where interested clients can just pick it up (of course there will be some <1ms lag for the data to move through their network plus some additional <1ms lag as your router forwards the packets to your internal network).

As I had never dealt with IP multicast, this was an interesting experiment for me, and when they released their initial offering, they provided a test stream to check whether your infrastructure was multicast-ready or not.

Back then, I never got it to work behind my PFSense setup but as I wasn’t interested in TV, I never bothered spending time on this, though it did hurt my pride.

Fast forward to about three weeks ago, when I made a comment on Twitter to the CEO of Fiber7 about that hurt pride. He informed me that the test stream was down, but then he also sent me a DM to ask whether I was interested in trying out their tv7 offering, including the beta version of their app for the AppleTV.

That was one evil way to nerd-snipe me, so naturally, I told him that, yes, I would be interested, but that I wasn’t really ever going to use it aside from just getting it to work, because live TV just doesn’t interest me.

Despite the fact that it was past 10pm, he sent me another DM, telling me that he had enabled tv7 for my account.

The rest of the night I spent experimenting with IGMP Proxy and the PFSense firewall with varying success, but on the next day I was finally successful.

You might notice that this is a screenshot of VLC. That’s no coincidence: while Fiber7 officially only supports the AppleTV app, they also offer links on a support page of theirs to m3u and xspf playlists that can be used by advanced users (which is another case of Fiber7 being awesome), so while debugging to make this work, I definitely preferred using VLC, which has a proper debug log.
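
For that kind of debugging, running VLC from a terminal with verbose logging is handy (the playlist file name is a placeholder for whatever the support page gives you):

vlc -vv tv7.m3u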

After I got it to work, I also found a bug in the Beta version of the Fiber7 app where it would never unsubscribe from a multicast group, causing the traffic to my LAN to increase whenever I would switch channels in the app. The traffic wouldn’t decrease even if the AppleTV went to sleep – only a reboot would help.

I’ve reported this to Fiber7 and within a day or two, a new release was pushed to TestFlight in order to fix the issue.

Since this little adventure happened, Fiber7 has changed their offering: Now every Fiber7 account gets free access to tv7 which will probably broaden the possible audience quite a bit.

Which brings me to the second point of this post: To show you the configuration needed if you’re using a PFSense based gateway and you want to make use of tv7.

First, you have to enable the IGMP proxy:

(Screenshot: the IGMP Proxy settings)

For the LAN interface, please type in the network address and netmask of your internal IPv4 LAN.

What IGMP Proxy does is listen for clients in your LAN joining a multicast group and then join on their behalf on the upstream interface. It will then forward all traffic received on the upstream interface aimed at that group to the group on the downstream interface. This is where the additional small bit of lag is added, but it is the only way to have multicast cross routing barriers.

This is also mostly done on your router’s CPU, but at the 20 Mbit/s a stream consumes, this shouldn’t be a problem on more or less current hardware.
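
As a rough sketch, the GUI settings end up as a configuration along these lines in /var/etc/igmpproxy.conf (the interface names are placeholders for your WAN and LAN NICs; the upstream altnet is the network discussed further below):

phyint igb0 upstream ratelimit 0 threshold 1
        altnet 77.109.128.0/19
phyint igb1 downstream ratelimit 0 threshold 1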

Anyways – if you want to actually watch TV, you’re not done yet because even though this service is now running, the built-in firewall will drop any packets related to multicast joining and all actual multicast packets containing the video frames.

So the next step is to update the firewall:

Create the following rules for your WAN interface:

(Screenshot: the two firewall rules to create on the WAN interface)

You will notice that little gear icon next to the rule. What that means is that additional options are enabled. The extra option you need to enable is this one here:

(Screenshot: the “Allow IP options” checkbox in the rule’s advanced options)

I don’t really like the second of the two rules. In principle, you only need to allow a single IP: The one of your upstream gateway. But that might change whenever your IPv4 address changes and I don’t think you will want to manually update your firewall rule every time.

Instead, I’m allowing all IGMP traffic from the WAN net, trusting Fiber7 to not leak other subscriber’s IGMP traffic to my network.
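
If you want to double-check that the rules made it into the active ruleset, you can inspect pf from a shell on the PFSense box (the grep pattern is just a rough heuristic):

pfctl -sr | grep -Ei 'igmp|77\.109'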

Unfortunately, you’re still not quite done.

While this configures the rules for the WAN interface, the default “pass all” rule on the LAN interface will still drop all video packets, because the “Allow IP options” checkbox mentioned above is off by default for that rule.

You have to update that too on the “LAN” interface:

(Screenshot: the default LAN pass rule with “Allow IP options” enabled)

And that’s all.

The network I’m listing there, 77.109.128.0/19, is not documented officially. Fiber7 might change it at any time, at which point your nice setup will stop working and you’ll have to update the IGMP Proxy and firewall configuration.

In my case, I’ve determined the network address by running

/usr/local/sbin/igmpproxy -d -vvvv /var/etc/igmpproxy.conf

and checking out the error message where igmpproxy was not allowing traffic to an unknown network. I then looked up the network of that address using whois and updated my config accordingly.
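
The whois lookup itself is quick (the address below is a placeholder for whatever shows up in the igmpproxy log):

whois 77.109.x.x | grep -Ei '^(inetnum|netname|route)'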

Why I recommend against JWT

JSON Web Tokens are all the rage lately. They are lauded as being a stateless alternative to server-side cookies and as the perfect way to handle authentication in your single-page app, and some people also sell them as a workaround for the EU cookie policy because, you know, they work without cookies too.

If you ask me though, I would always recommend against the use of JWT to solve your problem.

Let me give you a few of the arguments in favour of JWT and debunk them, from worst to best:

Debunking arguments

It requires no cookies

General “best” practice stores the JWT in the browser’s local storage and then sends it to the server with every authenticated API call.

This is no different from a traditional cookie, with the exception that transmission to the server isn’t done automatically by the browser (which it would be for a cookie) and that it is significantly less secure than a cookie: as there is no way to set a value in local storage outside of JavaScript, there consequently is no feature equivalent to cookies’ httponly. This means that XSS vulnerabilities in your frontend now give an attacker access to the JWT.

Worse, as people often use JWT for both a short-lived access token and a refresh token, this means that any XSS vulnerability now gives the attacker access to a valid refresh token that can be used to create new session tokens at will, even when your session has expired, in the process completely invalidating all the benefits of having separate refresh and access tokens.

“But at least I don’t need to display one of those EU cookie warnings” I hear you say. But did you know that the warning is only required for tracking cookies? Cookies that are required for the operation of your site (so a traditional session cookie) don’t require you to put up that warning in the first place.

It’s stateless

This is another often-used argument in favour of JWT: because the server can put all the required state into them, there’s no need to store anything on the server end, so you can load-balance incoming requests to whatever app server you want and you don’t need any central store for session state.

In general, that’s true, but it becomes an issue once you need to revoke or refresh tokens.

JWT is often used in conjunction with OAuth where the server issues a relatively short-lived access token and a longer-lived refresh token.

If a client wants to refresh its access token, it uses its refresh token to do so. The server will validate that and then hand out a new access token.

But for security reasons, you don’t want that refresh token to be reused (otherwise, a leaked refresh token could be used to gain access to the site for its whole validity period), and you probably also want to invalidate the previously used access token; otherwise, if that has leaked, it could be used until its expiration date even though the legitimate client has already refreshed it.

So you need a means to black-list tokens.

Which means you’re back at keeping track of state, because that’s the only way to do this. Either you blacklist the whole binary representation of the token, or you put some unique ID in the token and then blacklist that (comparing after decoding the token), but whatever you do, you still need to keep track of that shared state.

And once you’re doing that, you lose all the perceived advantages of statelessness.

Worse: because the server has to invalidate and blacklist both the access and the refresh token when a refresh happens, a connection failure during a refresh can leave a client without a valid token, forcing users to log in again.

In today’s world of mostly mobile clients on mobile phone networks, this happens more often than you’d think, especially as your access tokens should be relatively short-lived.

It’s better than rolling your own crypto

In general, yes, I agree with that argument. Anything is better than rolling your own crypto. But are you sure your library of choice has implemented the signature check and decryption correctly? Are you keeping up to date with security flaws in your library of choice (or its dependencies)?

You know what is still better than using existing crypto? Using no crypto whatsoever. If all you hand out to the client is a completely random token and all you do is look up the data assigned to that token, then there’s no crypto anybody could get wrong.
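
To illustrate, here’s a minimal sketch of that approach from a shell; redis-cli and the key layout are illustrative assumptions, the point is simply “random bytes plus a server-side lookup”:

TOKEN=$(openssl rand -hex 32)
redis-cli SETEX "session:$TOKEN" 3600 '{"user_id": 42}'
redis-cli GET "session:$TOKEN"

The first line generates 256 bits of randomness for the client to hold on to, the second stores the associated session data server-side with a one-hour TTL, and the third is what every authenticated request boils down to – if the key is gone, the session is gone.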

A solution in search of a problem

So once all good arguments in favour of JWT have dissolved, you’re left with all their disadvantages:

  • By default, the JWT spec allows for insecure algorithms and key sizes. It’s up to you to choose safe parameters for your application.
  • Doing JWT means you’re doing crypto and you’re decrypting potentially hostile data. Are you up to this additional complexity compared to a single primary key lookup?
  • JWTs contain quite a bit of metadata and other bookkeeping information. Transmitting this for every request is more expensive than just transmitting a single ID.
  • It’s brittle: Your application has to make sure to never make a request to the server without the token present. Every AJAX request your frontend makes needs to manually append the token and as the server has to blacklist both access and refresh tokens whenever they are used, you might accidentally end up without a valid token when the connection fails during refresh.

So are they really useless?

Even despite all these negative arguments, I think that JWT are great for one specific purpose and that’s authentication between different services in the backend if the various services can’t trust each other.

In such a case, you can use very short-lived tokens (with a lifetime measured in seconds at most) and you never have them leave your internal network. All the clients ever see is a traditional session-cookie (in case of a browser-based frontend) or a traditional OAuth access token.

This session cookie or access token is checked by frontend servers (which, yes, have to have access to some shared state, but this isn’t an unsolvable issue) which then issue the required short-lived JW tokens to talk to the various backend services.

Or you use them when you have two loosely coupled backend services that trust each other and need to talk to each other. There, too, you can issue short-lived tokens (given you are aware of the security issues described above).

In the case of short-lived tokens that never go to the user, you circumvent most of the issues outlined above: they can be truly stateless because, thanks to their short lifetime, you don’t ever need to blacklist them, and they can be stored in a location that’s not exposed to possible XSS attacks against your frontend.

This just leaves the issue of the difficult-to-get-right crypto, but as you never accept tokens from untrusted sources, a whole class of possible attacks becomes impossible, so you might even get away with not updating on a too-regular basis.

So, please, when you are writing your next web API that uses any kind of authentication and you ask yourself “should I use JWT for this”, resist the temptation. Using plain opaque tokens is always better when you talk to an untrusted frontend.

Only when you are working on scaling out your application and splitting it into multiple disconnected microservices and you need a way to pass credentials between them, then by all means go ahead and investigate JWT – it’ll surely be better than cobbling something up yourself.

A rant on brace placement

Many people consider it to be good coding style to have braces (in languages that use them for block boundaries) on their own line. Like so:

function doSomething($param1, $param2)
{
    echo "param1: $param1 / param2: $param2";
}

Their argument usually is that it clearly shows the block boundaries, thus increasing readability. I, as a proponent of placing braces at the end of the statement opening the block, strongly disagree. I would format the above code like so:

function doSomething($param1, $param2){
    echo "param1: $param1 / param2: $param2";
}

Here is why I prefer this:

  • In many languages, code blocks don’t have their own identity – functions do, but not blocks (they don’t provide scope). By placing the opening brace on its own line, you emphasize the block but actually make it harder to see what caused the block in the first place.
  • Using correct indentation, the presence of the block should be obvious anyway. There is no need to emphasize it more (at the cost of the readability of the block-opening statement).
  • I doubt that using one line per token really makes the code more readable. Heck… why don’t we write that sample code like so?
function
doSomething
(
$param1,
$param2
)
{
    echo "param1: $param1 / param2: $param2";
}

PostgreSQL on Ubuntu

Today, it was time to provision another virtual machine. While I’m a large fan of Gentoo, there were some reasons that made me decide to gradually start switching over to Ubuntu Linux for our servers:

  • One of the large advantages of Gentoo is that it’s possible to get bleeding edge packages. Or at least you are supposed to be able to. Lately, it’s taking longer and longer for an ebuild of an updated version to finally become available. Take PostgreSQL for example: it took about 8 months for 8.2 to become available and it looks like history is repeating itself for 8.3
  • It seems like there are more flamewars than real development going on in Gentoo-land lately (which in the end leads to the above problems)
  • Sometimes, init scripts and other things change over time and there is not always a clear upgrade path. Run emerge -u world once, forget to etc-update, and on the next reboot, all hell will break loose.
  • Installing a new system takes ages due to the manual installation process. I’m not saying it’s hard. It’s just time-intensive

Earlier, the advantage of having current packages greatly outweighed the issues coming with Gentoo, but lately, due to the current state of the project, it’s taking longer and longer for packages to become available. So that advantage fades away, leaving me with only the disadvantages.

So at least for now, I’m sorry to say, Gentoo has outlived its usefulness on my production servers and has been replaced by Ubuntu, which, albeit not being bleeding-edge with packages, at least provides a very clean update path and is installed quickly.

But back to the topic which is the installation of PostgreSQL on Ubuntu.

(It’s ironic, btw, that Postgres 8.3 actually is in the current Hardy beta, together with a framework to concurrently use multiple versions, whereas it’s still nowhere to be seen for Gentoo. Granted: an experimental overlay exists, but that’s mainly untested and I had some headaches installing it on a dev machine.)

After installing the packages, you may wonder how to get it running. At least I wondered.

/etc/init.d/postgresql-8.3 start

did nothing (not a very nice thing to do, btw). initdb wasn’t in the path. This was a real WTF moment for me and I assumed some problem with the package installation.

But in the end, it turned out to be an (underdocumented) feature: Ubuntu comes with a really nice framework to keep multiple versions of PostgreSQL running at the same time. And it comes with scripts helping to set up that configuration.

So what I had to do was to create a cluster with

pg_createcluster --lc-collate=de_CH --lc-ctype=de_CH -e utf-8 8.3 main

(your settings may vary – especially the locale settings)

Then it worked flawlessly.
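
The same postgresql-common framework also ships a couple of companion tools that are worth knowing about, for example:

pg_lsclusters
pg_ctlcluster 8.3 main restart

pg_lsclusters lists all clusters with their version, port and status; pg_ctlcluster starts, stops or restarts a specific version/cluster pair, which is what the init script calls under the hood.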

I do have some issues with this process though:

  • it’s underdocumented. Good thing I speak Perl and bash, so I could use the source to figure this out.
  • in contrast to about every other package in Ubuntu, installing it does not leave you with a working setup. You have to manually create the cluster after installing the packages
  • pg_createcluster --help bails out with an error
  • I had /var/lib/postgresql on its own partition and forgot to remount it after a reboot, which caused the init script to fail with a couple of “uninitialized value” errors from Perl itself. This should be handled more cleanly.

Still, it’s a nice configuration scheme and real progress compared to Gentoo. The only thing left for me now is to report these issues to the bug tracker and hope to see them fixed eventually. And if that doesn’t happen, there is this post here to remind me and my visitors.

Another new look

It has been a while since the last redesign of gnegg.ch, but is a new look after just a little more than one year of usage really needed?

The point is that I have changed blogging engines yet again. This time from Serendipity to WordPress.

What motivated the change?

Interestingly enough, if you ask me, s9y is clearly the better product than WordPress. If WordPress is Mac OS, then s9y is Linux: it has more features, it’s based on cleaner code, and it doesn’t have any commercial backing at all. So the question remains: why switch?

Because that OSX/Linux-analogy also works the other way around: s9y is an ugly duckling compared to WP. External tools won’t work (well) with s9y due to it not being known well enough. The amount of knobs to tweak is sometimes overwhelming and the available plugins are not nearly as polished as the WP ones.

All these are reasons that made me switch. I’ve used an s9y-to-WP converter, but some heavy tweaking was needed to make it actually transfer category assignments and tags (the former didn’t work, the latter wasn’t even implemented). Unfortunately, the changes were too hackish to actually publish them here, but it’s quite easily done.

Aside from that, most of the site has survived the switch quite nicely (the permalinks are broken once again, though), so let’s see how this goes :-)

Impressed by git

The company I’m working with is a Subversion shop. It has been for a long time – since fall of 2004, actually, when I finally decided that the time for CVS was over and that I was going to move to Subversion. As I was the only developer back then and as the whole infrastructure mainly consisted of CVS and ViewVC (cvsweb back then), this move was an easy one.

Now we are a team of three developers, heavy trac users and truly dependent on Subversion, which – mainly due to the amount of infrastructure that we built around it – is not going away anytime soon.

But nonetheless: we (mainly I) were feeling the shortcomings of Subversion:

  • Branching is not something you do easily. I tried working with branches before, but merging them really hurt, thus making it somewhat prohibitive to branch often.
  • Sometimes, half-finished stuff ends up in the repository. This is unavoidable considering the alternative of having a bucket load of uncommitted changes in the working copy.
  • Code review is difficult, as actually trying out patches is a real pain due to the process of sending, applying and reverting patches being manual work.
  • A pet peeve of mine, though, is untested, experimental features developed out of sheer interest. Stuff like that lies in the working copy, waiting to be reviewed or even just to have its real-life use discussed. Sooner or later, a needed change must go in and you have the options of either sneaking in the change (bad), manually diffing out the change (hard to do sometimes) or just forgetting about it and svn reverting it (a real shame).

Ever since the Linux kernel first began using BitKeeper to track development, I have known that there is no technical reason for these problems. I knew that a solution for all of this existed and that I just wasn’t ready to try it.

Last weekend, I finally had a look at the different distributed revision control systems out there. Due to the insane amount of infrastructure built around Subversion and not to scare off my team members, I wanted something that integrated into subversion, using that repository as the official place where official code ends up while still giving us the freedom to fix all the problems listed above.

I had a look at both Mercurial and git, though in the end, the nicely working SVN integration of git was what made me have a closer look at the latter.

Contrary to what everyone is saying, I have no problem with the interface of the tool – once you learn the terminology, it’s quite easy to get used to the system. So far, I have done a lot of testing with both live repositories and test repositories – everything working out very nicely. I’ve already seen the impressive branch merging abilities of git (to think that in Subversion you actually have to a) find out at which revision a branch was created and b) remember every patch you cherry-picked… crazy) and I’m getting into the details more and more.

On our trac installation, I’ve written a tutorial on how we could use git in conjunction with the central Subversion server, which allowed me to learn quite a lot about how git works and what it can do for us.
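
As a rough sketch of such a workflow (the repository URL is a placeholder), the git-svn bridge boils down to a handful of commands:

git svn clone -s https://svn.example.com/repo project
cd project
git checkout -b experiment
git svn rebase
git svn dcommit

clone -s assumes the standard trunk/branches/tags layout, rebase pulls in new Subversion revisions and replays your local commits on top, and dcommit pushes your finished commits back as regular Subversion revisions – the central server never knows git was involved.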

So for me it’s git-all-the-way now and I’m already looking forward to being able to create many little branches containing many little experimental features.

If you have the time and you are interested in gaining many unexpected freedoms in matters of source code management, you too should have a look at git. Also consider that on the side of the Subversion backend, no change is needed at all, meaning that even if you are forced to use Subversion, you can privately use git to help you manage your work. Nobody would ever have to know.

Very, very nice.