Why I recommend against JWT

JSON Web Tokens are all the rage lately. They are lauded as a stateless alternative to server-side sessions and as the perfect way to handle authentication in your single-page app. Some people also sell them as a workaround for the EU cookie policy because, you know, they work without cookies too.

If you ask me, though, I would always recommend against using JWT to solve these problems.

Let me debunk a few of the arguments in their favour, from weakest to strongest:

Debunking arguments

It requires no cookies

General “best” practice stores the JWT in the browser’s local storage and then sends it along to the server with every authenticated API call.

This is no different from a traditional cookie except that the transmission to the server isn’t done automatically by the browser (which it would be for a cookie) and that it is significantly less secure than a cookie: as there is no way to set a value in local storage outside of JavaScript, there consequently is no feature equivalent to cookies’ HttpOnly flag. This means that XSS vulnerabilities in your frontend now give an attacker access to the JWT.

Worse, as people often use JWTs for both the short-lived access token and the refresh token, this means that any XSS vulnerability now gives the attacker access to a valid refresh token that can be used to create new session tokens at will, even after your session has expired, completely invalidating all the benefits of having separate refresh and access tokens.
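For contrast, here’s a minimal sketch of the cookie-based alternative: the server hands out an opaque session identifier with the HttpOnly flag set, so no script running in the page can ever read it. The cookie name, lifetime and the PHP 7.3+ options-array form of setcookie are just choices for this example:

    <?php
    // Minimal sketch: hand the client an opaque session identifier in an
    // HttpOnly cookie instead of a JWT in local storage.
    $sessionId = bin2hex(random_bytes(32));

    // ... store $sessionId server-side together with the session data ...

    setcookie('session_id', $sessionId, [
        'expires'  => time() + 3600,  // one hour; arbitrary for this sketch
        'path'     => '/',
        'secure'   => true,           // only ever transmitted over HTTPS
        'httponly' => true,           // invisible to JavaScript, unlike local storage
        'samesite' => 'Lax',
    ]);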

“But at least I don’t need to display one of those EU cookie warnings” I hear you say. But did you know that the warning is only required for tracking cookies? Cookies that are required for the operation of your site (so a traditional session cookie) don’t require you to put up that warning in the first place.

It’s stateless

This is another argument often used in favour of JWT: because the server can put all the required state into the token itself, there’s no need to store anything on the server side, so you can load-balance incoming requests to whatever app server you want and you don’t need any central store for session state.

In general, that’s true, but it becomes an issue once you need to revoke or refresh tokens.

JWT is often used in conjunction with OAuth where the server issues a relatively short-lived access token and a longer-lived refresh token.

If a client wants to refresh its access token, it uses its refresh token to do so. The server validates it and then hands out a new access token.

But for security reasons, you don’t want that refresh token to be re-usable (otherwise a leaked refresh token could be used to gain access to the site for its whole validity period), and you probably also want to invalidate the previously used access token; otherwise, if that has leaked, it could be used until its expiry even though the legitimate client has already refreshed it.

So you need a means to blacklist tokens.

Which means you’re back to keeping track of state, because that’s the only way to do this. Either you blacklist the whole binary representation of the token, or you put some unique ID into the token (that’s what the jti claim is for) and then blacklist that (comparing it after decoding the token), but whatever you do, you still need to keep track of that shared state.

And once you’re doing that, you lose all the perceived advantages of statelessness.
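To make the point concrete, here’s a rough sketch of what that shared state ends up looking like, assuming a hypothetical revoked_tokens table and a $jti you extracted after verifying the token. The details don’t matter; the point is that every single request now hits central state again:

    <?php
    // Sketch only: $pdo is a connection to whatever shared store you use,
    // revoked_tokens is a hypothetical table holding blacklisted token IDs.

    function revokeToken(PDO $pdo, string $jti): void
    {
        $stmt = $pdo->prepare('INSERT INTO revoked_tokens (jti) VALUES (?)');
        $stmt->execute([$jti]);
    }

    function isTokenRevoked(PDO $pdo, string $jti): bool
    {
        $stmt = $pdo->prepare('SELECT 1 FROM revoked_tokens WHERE jti = ?');
        $stmt->execute([$jti]);

        return (bool) $stmt->fetchColumn();
    }

    // On refresh, blacklist both old tokens; on every request, check the blacklist:
    // revokeToken($pdo, $oldAccessJti);
    // revokeToken($pdo, $oldRefreshJti);
    // if (isTokenRevoked($pdo, $jti)) { /* reject with 401 */ }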

Worse: because the server has to invalidate and blacklist both the access and the refresh token when a refresh happens, a connection failure during a refresh can leave a client without any valid token, forcing users to log in again.

In today’s world of mostly mobile clients on mobile phone networks, this happens more often than you’d think. Especially as your access tokens should be relatively short-lived.

It’s better than rolling your own crypto

In general, yes, I agree with that argument. Anything is better than rolling your own crypto. But are you sure your library of choice has implemented the signature check and decryption correctly? Are you keeping up to date with security flaws in your library of choice (or its dependencies)?

You know what is still better than using existing crypto? Using no crypto whatsoever. If all you hand out to the client is a completely random token and all you do is look up the data assigned to that token, then there’s no crypto anybody could get wrong.
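For comparison, this is all the crypto-free alternative amounts to, as a sketch with a hypothetical sessions table: hand out randomness, then look the randomness up.

    <?php
    // Issue an opaque token: pure randomness, nothing to sign, nothing to decode.
    function issueToken(PDO $pdo, int $userId): string
    {
        $token = bin2hex(random_bytes(32));

        $stmt = $pdo->prepare('INSERT INTO sessions (token, user_id) VALUES (?, ?)');
        $stmt->execute([$token, $userId]);

        return $token;
    }

    // Authenticate a request: a single indexed lookup, no crypto anywhere.
    function userIdForToken(PDO $pdo, string $token): ?int
    {
        $stmt = $pdo->prepare('SELECT user_id FROM sessions WHERE token = ?');
        $stmt->execute([$token]);

        $userId = $stmt->fetchColumn();

        return $userId === false ? null : (int) $userId;
    }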

A solution in search of a problem

So once all good arguments in favour of JWT have dissolved, you’re left with all their disadvantages:

  • By default, the JWT spec allows for insecure algorithms and key sizes. It’s up to you to choose safe parameters for your application
  • Doing JWT means you’re doing crypto and you’re decrypting potentially hostile data. Are you up to this additional complexity compared to a single primary key lookup?
  • JWTs contain quite a bit of metadata and other bookkeeping information. Transmitting this for every request is more expensive than just transmitting a single ID.
  • It’s brittle: Your application has to make sure to never make a request to the server without the token present. Every AJAX request your frontend makes needs to manually append the token, and as the server has to blacklist both access and refresh tokens whenever they are used, you might accidentally end up without a valid token when the connection fails during a refresh.

So are they really useless?

Even despite all these negative arguments, I think that JWTs are great for one specific purpose, and that’s authentication between different services in the backend when those services can’t trust each other.

In such a case, you can use very short-lived tokens (with a lifetime measured in seconds at most) and never have them leave your internal network. All the clients ever see is a traditional session cookie (in the case of a browser-based frontend) or a traditional OAuth access token.

This session cookie or access token is checked by the frontend servers (which, yes, have to have access to some shared state, but that isn’t an unsolvable issue), which then issue the required short-lived JWTs to talk to the various backend services.

Or you use them when you have two loosely coupled backend services that trust each other and need to talk to each other. There too, you can issue short-lived tokens (provided you are aware of the security issues described above).

In the case of short-lived tokens that never go to the user, you circumvent most of the issues outlined above: they can be truly stateless because, thanks to their short lifetime, you don’t ever need to blacklist them, and they can be stored in a location that’s not exposed to possible XSS attacks against your frontend.
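If you do go down that road, issuing such a short-lived token is only a few lines. Here is a sketch using the firebase/php-jwt library (its v6 API); the service names, claims and the 30-second lifetime are purely illustrative:

    <?php
    use Firebase\JWT\JWT;
    use Firebase\JWT\Key;

    // Shared secret between the frontend server and the backend service;
    // in reality this comes from configuration, not from a hard-coded string.
    $secret = getenv('BACKEND_JWT_SECRET');

    // Issue a token that is only valid for a few seconds and for one service.
    $token = JWT::encode([
        'iss' => 'frontend-server',    // illustrative service names
        'aud' => 'inventory-service',
        'sub' => 'user-42',            // whoever the frontend authenticated
        'iat' => time(),
        'exp' => time() + 30,          // lifetime measured in seconds
    ], $secret, 'HS256');

    // The receiving service verifies it; an expired or tampered token throws.
    $claims = JWT::decode($token, new Key($secret, 'HS256'));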

This just leaves the issue of the difficult-to-get-right crypto, but as you never accept tokens from untrusted sources, a whole class of possible attacks becomes impossible, so you might even get away with not updating your JWT library all that often.

So, please, when you are writing your next web API that uses any kind of authentication and you ask yourself “should I use JWT for this”, resist the temptation. Using plain opaque tokens is always better when you talk to an untrusted frontend.

Only when you are working on scaling out your application and splitting it into multiple disconnected microservices and you need a way to pass credentials between them, then by all means go ahead and investigate JWT – it’ll surely be better than cobbling something together yourself.

sensational ag is hiring an iOS developer

Sensational AG is the company I founded together with a colleague back in 2000. Ever since then, we’ve had a very nice combination of fun, interesting work and a very successful business.

We’re a very small team – just six programmers, one business guy and a product designer. Personally, I would love to keep the team as small and tightly-knit as possible, as that brings huge advantages: no politics, a lot of freedom for everybody and mind-blowing productivity.

I’m still amazed to see what we manage to do with our small team time and time again and yet still manage to keep the job fun. It’s not just the stuff we do outside of immediate work, like UT2004 matches, Cola Double Blind Tests, Drone Flights directly from the roof of our office, sometimes hosting JSZurich and meetups for the Zurich Clojure User group and much more – it’s also the work itself that we try to make as fun as possible for everybody.

Sure – sometimes, work just has to be done, but we try as much as possible to distribute the fun parts of the work between everybody. Nobody has to be a pure code monkey; nobody constantly pulls the “change this logo there for the customer” card (though that card certainly exists to be pulled – we just try to distribute it).

Most of the work we do flows into our big eCommerce project: whenever you order food in a restaurant here in Switzerland, if the restaurant is big enough to have its food delivered, the stuff you eat will have been ordered using a product of ours.

Whenever you visit a dentist, the things they put in your mouth have likely been ordered using a product of ours.

The work we do helps countless people daily to get their job done more quickly allowing them to go home earlier. The work we do is essential for the operations of many, many companies here in Switzerland, in Germany and in Austria.

From a technical perspective, the work we do is very interesting too: While the main part of the application is a web application, there are many components around it: Barcode Scanners, native Smartphone applications and our very own highly available cluster (real, physical hardware) that hosts the application for the majority of our customers.

As even our end users slowly start to use their mobile phones more and more, our native mobile applications gain in importance – to the point where we really have to focus a lot more resources on them.

This is where you come in: in order to provide the best possible user experience, we have decided to develop our offline-first, native mobile applications separately for both iOS and Android, and while we have Android pretty much covered, iOS is lagging behind a bit.

If you’re interested in helping us out with iOS, here’s what you will be working with.

  • The application is written in Swift, so you’ll likely use a lot of Swift during your day; however, we don’t mind if you decide you prefer to use something else.
  • The native application talks to a web service API of our main web application. But as the API is mostly private, you have the ability to directly influence the application in many cases.
  • As some parts of the process are very customizable, we’re looking into embedding React Native views into the existing application.
  • As we maintain our applications for a long time, code-archeology is an important part of our work. And archeology is much more easily done with useful self-contained commits, so you’ll likely have a bit of a culture-shock when you see us use every nook and cranny of git’s feature-set. But don’t worry: We’ll help you get up to speed quickly.
  • We have bi-weekly meetings focussed on development practices and challenges we’ve overcome. You will have the direct ability to influence how we work together.

The platform we use to develop on is everybody’s own choice. Everybody here uses Macs, but whatever you are most productive with is what you use – though as we’re talking mostly iOS development here, you’re probably going to use a Mac. Be it an iMac or a MacBook Pro – you tell us what you need and we’ll make it possible.

All of the code that we work with daily is home-grown (minus some libraries, of course). We control all of it and we get to play with all the components of the big thing. There’s no component we can’t change, though we certainly prefer changing some over others :-)

Between the Cola tests and the technical versatility and challenges described above: if I have managed to interest you, dear reader, in joining our crazy productive team in order to improve one hell of a suite of applications, then now is your chance – we need more people to join our team of developers.

Also, if your particular problem is better solved in the $LANGUAGE of your choice, feel free to just do it. Part of the secret behind our productivity is that we know our tools and know when to use them. Good code is readable in any language (though I’d have to brush up on my Lisp if you chose to go that route).

Interested? I would love to hear from you at
phofstetter@sensational.ch.

Tracking runs with the Apple Watch

When I started running this summer, I also wanted to make use of my Apple Watch to keep track of my routes and my speed over time, so I looked into the various apps and services around that.

Generally, there are two parts to tracking a run: one part is the actual data gathering that happens while you’re running, and the other is the analysis and comparison with other runs afterwards.

Unfortunately, of all the applications I looked at, none excelled at both, so in the end I ended up writing custom code to give me the best of both worlds.

Here’s what I’ve looked at.

Apple Workouts.app

The built-in Workout app of the watch, being a watch-native app made by Apple, is more equal than other apps: it’s the only app that allows you to trigger the screen lock while in the app, and with watchOS 4 it’s also the only app that gives you very easy access to the media controls. And finally, it’s the only app that can log its tracked workout and movement activity in the Activity app in their actual colors instead of just gray (yes, very important, this).

It offers very readable data on-screen while it’s running, it can send timely notifications as you pass another kilometre and it never crashes.

Looking at the map of the run in the Activity app, you can also see that it collects very accurate location data.

As good as it is at collecting data, though, it is just as bad at analyzing it: the best you can do is have a look at a single workout. There’s no way to compare two – unless you take screenshots and do that manually.

There is also no way to export the data: In the workout details there is a share button, but that just exports a corny text and a useless picture. No detail is included.

[Screenshot: Yes. This is all you get from exporting a run. I love the picture. So useful.]

So for any analysis you want to do based on runs recorded with the Workouts app, you first have to manually transfer the data from screenshots into some other machine-readable form, and even then the screenshots alone don’t provide nearly enough useful data.

Strava

This is the other extreme in the list of apps I looked at: it provides excellent analysis and it has an extremely motivating high-score list for user-provided segments of a run. You don’t have to match routes exactly – the moment you run through a previously created segment, you’ll be able to compare your effort to others.

[Screenshot: Segment rankings (this segment is uphill)]

It’s also great at automatically matching previous runs over the same route, so you can compare your runs over time.

[Screenshot: Getting faster over time (this one is mostly uphill too)]

The other social features it offers don’t interest me, so I can’t really talk about them.

However: as good as its analysis is, its recording feature is just as bad: of all the apps I looked at, it provides the least amount of detail during the run and, what’s worse, its GPS tracking is extremely inaccurate and unreliable.

I always run with my phone on me – mainly for easy access to all my media and to Overcast, and also because I do most of my runs on my way home from the office, where I need the phone anyway. Strava doesn’t make use of this but instead relies solely on the watch’s GPS, which is much less accurate than the phone’s.

I can understand this: The device is smaller, so it’s harder to put in powerful antennas, it has way less battery and a much weaker CPU than the phone, so it just can’t be as good. It’s totally ok for when you only have the watch with you, but when you have the phone with you, it’s a shame if the app can’t use it.

Runkeeper

Runkeeper uses both the watch and the phone for location tracking and it provides a great UI while the workout is ongoing.

Its analysis features aren’t as good as the ones from Strava though. It doesn’t do the automated segment high-scoring and it’s not as good at comparing runs over the same route with each other.

And finally, the UI of the site doesn’t look as polished as Strava’s – but that’s just a matter of taste, I guess.

… master of none

For all of July and August, my mode of operation was to use Runkeeper to acquire the data during the run, then export a .gpx file from their site and import it into Strava.

This gave me the best of both worlds: Very good data gathering and very good data analysis.

However, I wasn’t entirely happy with this either as the process was somewhat cumbersome and, lately, unstable.

Probably caused by the iOS 11 beta, I’ve seen various failure modes related to Runkeeper, all of which are very annoying:

  1. The workout starts on the Watch, but the Watch doesn’t manage to also start it on the phone. The workout is still tracked, but no route data is saved.
  2. Runkeeper on the phone crashes after about 10 minutes. There’s no indication of this happening, but the result is that a 10-minute run is logged instead of the real data from the watch. If this happens, there is no way to get at even the route-less data.

Issue 1) I could often work around by launching Runkeeper manually on the phone, then starting the workout on the watch and then making sure that the workout also started on the phone.

If that happened, then route data was tracked correctly.

Unfortunately, sometimes this stopped working altogether and the only way for the watch to talk to the phone again was to completely uninstall and reinstall Runkeeper on both the phone and the watch. This is annoying when you want to start running but can’t, because the software gods have put 20 minutes of fiddling with the App Store in front of you (also, Runkeeper is bigger than the App Store’s 3G download limit, so you’d better have Wi-Fi available).

Issue 2) is much worse though: There’s no indication of it happening. You’d think that the blue bar “Runkeeper is actively using your location” on the phone would be a good indicator, but it isn’t: When the crash happens, the bar stays there until you unlock your phone. Then it goes away.

So there’s no way to be sure unless you periodically unlock your phone, which is very annoying and distracting during the run – especially as you’re sweaty and Touch ID won’t work most of the time (I use a strong 25-character password).

I know – even if it isn’t tracked, a run is a run. But it certainly doesn’t feel that way, and how it feels is very important for staying motivated to keep doing this – especially in bad weather.

let’s just hack it

Now, admittedly, these are very likely beta woes that will eventually solve themselves. We’re pretty far into the beta cycle though (beta 9 at the time of this writing), so I suspect that these issues won’t be fixed come release but will have to wait for a future update to either Runkeeper or the OS.

Losing about a third of my runs to software issues felt really unacceptable to me, especially considering that Runkeeper still wasn’t offering some features that the Workout app was (like enabling the screen lock – which is important when running in the rain).

However, when I looked again at this year’s WWDC sessions, I found out that iOS 11 will finally offer an API to read and write route data for workouts. This means that the data you track using Apple’s built-in app will finally be available for other apps to read.

This would give me the best of all worlds: Use the best data-gathering app and export it to the best analysis app; side-stepping the stability issues.

[Screenshot: Such Wow. Much UI.]

So, this weekend, I hacked together a quick solution (MIT licensed) that does exactly this. It lists all your workouts and if you tap one, it will eventually show you a share sheet, allowing you to select a location to store a .gpx file to.

That file contains all the information required for Strava to do its analysis.

In a perfect world, this app would of course upload directly to Strava. And it would not block the UI thread while it’s exporting the gpx file. And it would actually have some UI to speak of.

But this was a quick-hack that solved an issue for me – and who knows – maybe it will fix it for you.

If you need a real solution for this, Twitter user @dwlz is apparently working on a real app that will be usable for normal people and I’ll definitely switch to that when it’s ready. But until then, I can finally track my runs with the peace of mind of having a crash-free solution that still provides the best analysis possible.

[Screenshot: A run tracked with the Workout app and uploaded to Strava using my hack]

pdo_pgsql improvements

Last autumn, I was talking about how I would like to see pdo_pgsql for PHP improved.

Over the last few days I had time to seriously start looking into making sure I get my wish. Even though my C is very rusty and I have next to no experience with the PHP/Zend API, I’ve made quite a bit of progress.

First, JSON support

[Screenshot: JSON support]

If you have the json extension enabled in your PHP install (it’s enabled by default), then any column of data type json will be automatically parsed and returned to you as an array.

No need to constantly repeat yourself with json_decode(). This works, of course, with directly selected json columns or with any expression that returns json (like array_to_json or the direct typecast shown in the screenshot).

This is off by default and can be enabled on a per-connection or per-statement level so as not to break backwards compatibility (I’ll need it off until I get a chance to clean up PopScan, for example).
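For comparison, this is the boilerplate that stock pdo_pgsql forces on you today and that the patch (with the option enabled) makes unnecessary; the table and column names are made up for the example:

    <?php
    $pdo = new PDO('pgsql:host=localhost;dbname=test');

    // Stock pdo_pgsql: the json column comes back as a plain string ...
    $stmt = $pdo->query('SELECT payload::json AS payload FROM events');

    foreach ($stmt as $row) {
        // ... so every call site has to remember to decode it itself.
        $payload = json_decode($row['payload'], true);
    }

    // With the patch and the option enabled, $row['payload'] is already an array.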

Next, array support:

[Screenshot: array support]

Just like with JSON, this will automatically turn any array expression (of the built-in array types) into an array to use from PHP.

As I’m writing this blog entry here, this only works for text[] and it’s always enabled.

Once I have an elegant way to deal with the various types of arrays and convert them into the correct PHP types, I’ll work on making this turnoffable (technical term) too.

I’ll probably combine this and the automatic JSON parsing into just one setting which will include various extended data types both Postgres and PHP know about.

Once I’ve done that, I’ll look into more points on my wishlist (better error reporting with 9.3 and later and a way to quote identifiers come to mind) and then I’ll probably try to write a proper RFC and propose this for inclusion into PHP itself (though don’t get your hopes up – they are a conservative bunch).

If you want to follow along with my work, have a look at my pdo_pgsql-improvements branch on github (tracks to PHP-5.5)

Ansible

In the summer of 2012, I had the great opportunity to clean up our hosting
infrastructure. Instead of running many differently configured VMs, mostly one
per customer, we started building a real redundant infrastructure with two
really beefy physical database machines (yay) and quite a few (22) virtual
machines for caching, web app servers, file servers and so on.

All components are fully redundant, every box can fail without anybody really
needing to do anything (one exception is the database – that’s also redundant,
but we fail over manually due to the huge cost in time to failback).

Of course you don’t manage ~20 machines manually any more: Aside from the fact
that it would be really painful to do for those that have to be configured in an
identical way (the app servers come to mind), you also want to be able to
quickly bring a new box online which means you don’t have time to manually go
through the hassle of configuring it.

So, in the summer of 2012, when we started working on this, we decided to go
with puppet. We also considered Chef but their server
was really complicated to set up and install and there was zero incentive for
them to improve because that would, after all, disincentivise people from
becoming customers of their hosted solutions (the joys of open-core).

Puppet is also commercially backed, but everything they do is available as open
source and their approach for the central server is much more «batteries
included» than what Chef has provided.

And finally, after playing around a bit with both Chef and puppet, we noticed
that puppet was way more bitchy and less tolerant of quick hacks around issues,
which felt like a good thing for people dabbling with HA configuration of a
multi-machine cluster for the first time.

Fast forward one year: Last autumn I found out about
ansible (linking to their github page –
their website reads like a competition in buzzword-bingo) and after reading
their documentation, I was immediately convinced:

  • No need to install an agent on managed machines
  • Trivial to bootstrap machines (due to above point)
  • Contributors don’t need to sign a CLA (thank you so much, ansibleworks!)
  • No need to manually define dependencies of tasks: Tasks are run sequentially
  • Built-in support for cowsay by default
  • Many often-used modules included by default, no hunting for, say, a sysctl
    module on github
  • Very nice support for rolling updates
  • Also providing a means to quickly do one-off tasks
  • Very easy to make configuration entries based on the host inventory (which requires puppetdb and an external database in the case of puppet)

Because ansible connects to each machine individually via SSH, running it
against a full cluster of machines is going to take a bit longer than with
puppet, but our cluster is small, so that wasn’t that much of a deterrent.

So last Sunday evening I started working on porting our configuration over from
puppet to Ansible and after getting used to the YAML syntax of the playbooks, I
made very quick progress.

[Screenshot: porting progress]

Again, I’d like to point out the excellent, built-in, on-by-default support for
cowsay as one of the killer-features that made me seriously consider starting
the porting effort.

Unfortunately though, after a very promising start, I had to come to the
conclusion that we will be sticking with puppet for the time being because
there’s one single feature that Ansible doesn’t have and that I really, really
want a configuration management system to have:

It’s not possible in Ansible to tell it to keep a directory clean of files not
managed by Ansible in some way

There are, of course, workarounds, but they come at a price too high for me to
be willing to pay.

  • You could first clean a directory completely using a shell command, but this
    will lead to ansible detecting a change to that folder every time it runs, which
    will cause server restarts, even when they are not needed.

  • You could do something like this stack overflow question
    but this has the disadvantage that it forces you into a configuration-file-specific
    playbook design instead of a role-specific one.

What I mean is that using the second workaround, you can only have one playbook
touching that folder. But imagine for example a case where you want to work with
/etc/sysctl.d: A generic role would put some stuff there, but then your
firewall role might put more stuff there (to enable ip forwarding) and your
database role might want to add other stuff (like tweaking shmmax and shmall,
though that’s thankfully not needed any more in current Postgres releases).

So suddenly your /etc/sysctl.d role needs to know about firewalls and
databases, which totally violates the really nice separation of concerns between
roles. Instead of having a firewall and a database role both doing something to
/etc/sysctl.d, you now need a sysctl role which does different things
depending on what other roles a machine has.

Or, of course, you just don’t care that stray files never get removed, but
honestly: Do you really want to live with the fact that your /etc/sysctl.d, or
worse, /etc/sudoers.d can contain files not managed by ansible and likely not
intended to be there? Both sysctl.d and sudoers.d are more than capable of doing
immense damage to your boxes, and they can do so sneakily, behind the back of
your configuration management system.

For me that’s unacceptable.

So despite all the nice advantages (like cowsay), this one feature is something
that I really need and can’t have right now and which, thus, forces me to stay
away from Ansible for now.

It’s a shame.

Some people tell me that implementing my feature would require puppet’s feature
of building a full state of a machine before doing anything (which is error-prone
and frustrating for users at times), but that’s not really true.

If ansible modules could talk to each other – maybe loosely coupled by firing
some events as they do stuff – you could just name the task that makes sure the
directory exists first and then have that task register some kind of event
handler to be notified as other tasks touch the directory.

Then, at the end, remove everything you didn’t get an event for.

Yes. This would probably (I don’t know how Ansible is implemented internally)
mess with the decoupling of modules a bit, but it would still be far removed
from re-implementing puppet.

Which is why I’m posting this here – maybe, just maybe, somebody reads my plight
and can bring up a discussion and maybe even a solution for this. Trust me: I’d
so much rather use Ansible than puppet, it’s crazy, but I also want to make sure
that no stray file in /etc/sysctl.d will bring down a machine.

Yeah. This is probably the most words I’ve ever used for a feature request, but
this one is really, really important to me, which is why I’m so passionate about
this. Ansible got so f’ing much right. It’s such a shame to still be left
unable to really use it.

Is this a case of xkcd1172? Maybe, but to me, my
request seems reasonable. It’s not? Enlighten me! It is? Great! Let’s work on
fixing this.

pdo_pgsql needs some love

Today, PostgreSQL 9.3 was released.
September is always the month of PostgreSQL, as every September a new
major release with awesome new features comes out, and every September
I have to fight the urge to run and immediately update the production
systems to the new version of my
favorite toy.

As every year, I want to thank the awesome guys (and girls, I hope) who
make PostgreSQL one of my favorite pieces of software overall and
certainly my most favorite database system.

That said, there’s another aspect of PostgreSQL that needs some serious
love: While back in the day PHP was known for its robust database
client libraries, over time other language environments have caught up
and long since surpassed what’s possible in PHP.

To be honest, the PostgreSQL client libraries as they are currently
available in PHP are in serious need of some love.

If you want to connect to a PostgreSQL database, you have two options:
Either you use the thin wrapper over libpq, the pgsql extension,
or you go PDO, at which point you’d use pdo_pgsql.

Both solutions are, unfortunately, quite inadequate and fail
to expose most of the awesomeness that is PostgreSQL to the user:

pgsql

On the positive side, being a small wrapper around libpq, the pgsql
extension knows quite a bit about Postgres’ internals: It has excellent
support for COPY, it knows about a result set’s data types (but doesn’t
use that knowledge as you’ll see below), it has pg_escape_identifier
to correctly quote identifiers, it supports asynchronous queries and it
supports NOTIFY.

But, while pgsql knows a lot about Postgres’ specifics, to this day,
the pg_fetch_* functions convert all columns into strings. Numeric
types? String. Dates? String. Booleans? Yes. String too (‘t’ or ‘f’,
both trueish values to PHP).
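In practice that means every call site repeats conversion boilerplate along these lines (the table and columns are invented for the sketch):

    <?php
    $conn   = pg_connect('dbname=test');
    $result = pg_query($conn, 'SELECT id, price, active, created_at FROM products');

    while ($row = pg_fetch_assoc($result)) {
        // Everything arrives as a string, so we cast by hand, every single time.
        $id        = (int) $row['id'];
        $price     = (float) $row['price'];
        $active    = $row['active'] === 't';   // 't' or 'f', both trueish in PHP
        $createdAt = new DateTimeImmutable($row['created_at']);
    }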

To this day, while the extension supports prepared statements, their
use is terribly inconvenient, forcing you to name your statements and
to manually free them.

To this day, the pg_fetch_* functions load the whole result set into
an internal buffer, making it impossible to stream results out to the
client using an iterator pattern. Well. Of course it’s still possible,
but you waste the memory for that internal buffer, forcing you to
manually play with DECLARE CURSOR and friends.

There is zero support for advanced data types in Postgres and the
library doesn’t help at all with today’s best practices for accessing a
database (prepared statements).

There are other things that make the extension impractical for me, but
they are not the extension’s fault, so I won’t spend any time explaining
them here (like the lack of support by newrelic – but, as I said,
that’s not the extension’s fault).

pdo_pgsql

pdo_pgsql gets a lot of stuff right that the pgsql extension doesn’t:
It doesn’t read the whole result set into memory, it knows a bit about
data types, preserving numbers and booleans and, being a PDO driver, it
follows the generic PDO paradigms, giving a unified API with other PDO
modules.

It also has good support for prepared statements (not perfect, but
that’s PDO’s fault).

But it also has some warts:

  • There’s no way to safely quote an identifier. Yes. That’s a PDO
    shortcoming, but still. It should be there.
  • While it knows about numbers and booleans, it doesn’t know about any of the other more advanced data types.
  • Getting metadata about a query result actually makes it query the
    database – once per column, even though the information is right there
    in libpq, available to use (look at the
    source
    of PDOStatement::getColumnMeta). This makes it impossible to fix the above issue in userland.
  • It has zero support for COPY

If only

Imagine the joy of having a pdo_pgsql that actually cares about
Postgres. Imagine how selecting a JSON column would give you its data
already decoded, ready to use in userland (or at least an option to).

Imagine how selecting dates would at least give you the option of
getting them as a DateTime (there’s loss of precision though –
Postgres’ TIMESTAMP has more precision than DateTime)

Imagine how selecting an array type in postgres would actually give you
back an array in PHP. The string that you have to deal with now is
notoriously hard to parse. Yes. There now is array_to_json in
Postgres, but that shouldn’t be needed.
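Today the least painful workaround is to push the conversion into the query and decode JSON on the PHP side; a sketch with an invented tags column:

    <?php
    $pdo = new PDO('pgsql:host=localhost;dbname=test');

    // Let Postgres turn the array into JSON, because parsing '{a,b,"c,d"}'
    // and friends in userland is notoriously error-prone.
    $stmt = $pdo->query('SELECT array_to_json(tags) AS tags FROM articles');

    foreach ($stmt as $row) {
        $tags = json_decode($row['tags'], true);  // now a real PHP array
    }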

Imagine how selecting a HSTORE would give you an associative array.

Imagine using COPY with pdo_pgsql for very quickly moving bulk data.

Imagine the new features of PGResult being exposed to userland.
Giving applications the ability to detect what constraint was just
violated (very handy to detect whether it’s safe to retry).

Wouldn’t that be fun? Wouldn’t that save us from having to type so much
boilerplate all day?

Honestly, what I think should happen is somebody should create a
pdo_pgsql2 that breaks backwards compatibility and adds all these
features.

Have getColumnMeta just return the OID instead of querying the
database. Have a quoteIdentifier method (yes. That should be in PDO
itself, but let’s fix it where we can).

Have fetch() return Arrays or Objects for JSON columns. Have it
return Arrays for arrays and HSTOREs. Have it optionally return
DateTimes instead of strings.

Wouldn’t that be great?

Unfortunately, while I can write some C, I’m not nearly good enough
to produce something that I could live with other people using, so any
progress I can achieve will be slow.

I’m also unsure of whether this would ever have a chance to land in PHP
itself. Internals are very averse to adding new features to stuff that
already “works” and no matter how good the proposal, you need a very
thick skin if you want to ever get something merged, no matter whether
you can actually offer patches or not.

Would people be using an external pdo_pgsql2? Would it have a chance as
a pecl extension? Do other people see a need for this? Is somebody
willing to help me? I really think something needs to be done and I’m
willing to get my hands dirty – I just have my doubts about the quality
of the result I’m capable of producing. But I can certainly try.

And I will.

when in doubt – SSL

Since 2006, as part of our product, we have been offering barcode scanners
with GSM support to either send orders directly to the vendor or to
transmit products into the web frontend where you can further edit them.

Even though the devices (Windows Mobile. Crappy. In the process of being
updated) do support WiFi, we really only support GSM because that means we don’t have to share the end user’s infrastructure.

This is a huge plus because it means that no matter how locked-down the
customer’s infrastructure, no matter how crappy the proxy, no matter the IDS in use, we’ll always be able to communicate with our server.

Until, of course, the mobile carrier most used by our customers decides
to add a “transparent” (do note the quotes) proxy to the mix.

We were quite stumped last week when we got reports of the mobile devices showing an HTTP error 408, especially because we were not seeing any 408s in our logs.

Worse, tcpdump clearly showed that we were getting an RST
packet from the client, sometimes before sending data, sometimes while
sending data.

Strange: the client is showing a 408, the server is seeing an RST from the client.
Doesn’t make sense.

Tethering my Mac using the iPhone’s personal hotspot feature and a SIM
card of the mobile provider in question made it clear: no longer are we
talking directly to our server. No. What the client receives is an
HTML-formatted 408 error message from a proxy server.

Do note the “DELETE THIS LINE” and “your organization here” comments.
What a nice touch. Somebody really spent a lot of time getting
this up and running.

Now granted, taking 20 seconds before being able to produce a response
is a bit on the longer side, but unfortunately, some versions of the
scanner software require gzip compression and gzip compression needs to
know the full size of the body to compress, so we have to prepare the
full response (40 megs uncompressed) before being able to send anything
– that just takes a while.

But consider long-polling or server sent events – receiving a 408 after
just 20 seconds? That’s annoying, wasting resources and probably not
something you’re prepared for.

Worse, nobody was notified of this change. For 7 years, the clients
were able to connect directly to our server. Then one day that changed
and now they can’t. No communication, no time to prepare and limits
certainly too strict not to affect anything (not
just us – see my remark about long polling).

The solution in the end is, as so often, to use SSL. SSL connections
are opaque to any intermediate proxy. A proxy can’t decrypt the data without
the client noticing. An SSL connection can’t be inspected and an SSL
connection can’t be messed with.

Sure enough: The exact same request that fails with that 408 over HTTP
goes through nicely using HTTPS.
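On the client side the change is tiny, whatever your HTTP client happens to be. Here is a sketch using PHP’s cURL bindings against an invented endpoint; the important parts are the https:// scheme and strict certificate verification, so a middlebox can’t silently re-terminate the connection:

    <?php
    $payload = json_encode(['order' => 1234]);          // invented request body

    $ch = curl_init('https://example.com/api/orders');  // invented endpoint

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_SSL_VERIFYPEER => true, // reject certificates we don't trust
        CURLOPT_SSL_VERIFYHOST => 2,    // and certificates for the wrong host
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $payload,
    ]);

    $response = curl_exec($ch);

    if ($response === false) {
        // A verification failure here is a feature: better to fail loudly
        // than to talk to whatever box the carrier put in the middle.
        throw new RuntimeException(curl_error($ch));
    }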

This trick works every time when somebody is messing with your
connection. Something f’ing up your WebSocket connection? Use SSL!
Something messing with your long-polling? Use SSL. Something
decompressing your response but not stripping off the Content-Encoding
header (yes. that happened to me once)? Use SSL. Something replacing
arbitrary numbers in your response with asterisks (yepp. happened too)?
You guessed it: Use SSL.

Of course, there are three things to keep in mind:

  1. Due to the lack of SNI support in the world’s most used OS and browser
    combination (any IE under Windows XP), every SSL site you host requires
    one dedicated IP address. Which is bad considering that we are running
    out of addresses.

  2. All of the bigger mobile carriers have their CA in the browsers’
    trusted list. Aside from ethics, there is no reason whatsoever for them
    to not start doing all the crap I described and just re-encrypting the
    connection, faking a certificate using their trusted ones.

  3. Failing that, they still might just block SSL at some point, but as
    more and more sites are going SSL-only (partially for the above reasons, no
    doubt), outright blocking SSL is going to be more and more unlikely to
    happen.

So. Yes. When in doubt: use SSL. Not only does that help your users’
privacy, it also fixes a ton of technical issues created by
unauthorized third parties messing with your traffic.