protecting siri

Over the last weekend, 9to5mac.com posted about a hack which shows that it’s possible to run Siri on a iPhone 4 and
an iPod Touch 4g and possibly even oder devices – considering how much of Siri
is running on Apple’s servers.

We’ve always suspected that the decision to restrict Siri to the 4S is
basically a marketing decision and I don’t really care about this either.
Nobody is forcing you to use Siri and thus nobody is forcing you to update to
anything.

Siri is Apple’s product and so are the various iPhones. It’s their decision
whom they want to sell what to.

What I find more interesting is that it was even possible to have a hacked
Siri on a non 4S-phone talk to Apple’s servers. If I were in Apple’s shoes, I
would have made that (practically) impossible.

And here’s how:

Having a device that you put into users hands and trusting it is always a very
hard, if impossible thing to do as the device can (more or less) easily be
tampered with.

So to solve this problem, we need some component that we know reasonably well
to be safe from the user’s tampering and we need to find a way for that
component to prove to the server that indeed the component is available and
healthy.

I would do that using public key crypto and specialized hardware that works
like a TPM. So that would be a chip that contains a private key embedded in
hardware, likely not updatable. Also, that private key will never leave that
device. There is no API to read it.

The only API the chip provides is either a relatively high-level API to sign
an arbitrary binary blob or, more likely, a lower level one to encrypt some
small input (a SHA1 hash for example) with the private key.

OK. Now we have that device (also, it’s likely that the iPhone already has
something like that for its secured boot process). What’s next?

Next you make sure that the initial handshake with your servers requires that
device. Have the server post a challenge to the phone. Have the phone solve it
and have the response signed by that crypto device.

On your server, you will have the matching public key. If the signature checks
out, you talk to the device. If not, you don’t.

Now, it is possible using very expensive hardware to extract that key from the
hardware (by opening the chip’s casing and using a microscope and a lot of
skills). If you are really concerned about this, give each device a unique
private key. If a key gets compromised, blacklist it.

This greatly complicates the manufacturing process of course, so you might go
ahead with just one private key per hardware type and hope that cracking the
key will take longer than the lifetime of the hardware (which is very likely).

This isn’t at all specific to Siri of course. Whenever you have to trust a
device that you put into consumers hands, this is the way to go and I’m sure
we’ll be seeing more of this in the future (imagine the uses for copy
protection – let’s hope we don’t end up there).

I’m not particularly happy that this is possible, but I’d rather talk about it
than to hope that it’s never going to happen – it will and I’ll be pissed.

For now I’m just wondering why Apple wasn’t doing it to protect Siri.

A new fun project

Like back in 2010 I went to JSConf.eu this year around.

One of the many impressive facts about JSConf is the quality of their Wifi
connection. It’s not just free and stable, it’s also fast. Not only that, this
time around, they had a very cool feature: You authenticated via twitter.

As most of the JS community seems to be having twitter accounts anyways, this
was probably the most convenient solution for everyone: You didn’t have to
deal with creating an account or asking someone for a password and on the
other hand, the organizers could make sure that, if abuse should happen,
they’d know whom to notify.

On a related note: This was in stark contrast to the WiFi I had in the hotel
which was unstable, slow and cost a ton of money to use and it didn’t use
Twitter either :-)

In fact, the twitter thing was so cool to see in practice, that I want to use
it for myself too.

Since the days of WEP-only Nintendo DS, I’m running two WiFi networks at home:
One is WPA protected and for my own use, the other is open, but it runs over
a different interface on shion
which has no access to any other machine in my network. This is even more
important as I have a permanent OpenVPN connection
to my office and I definitely don’t want to give the world access to that.

So now the plan would be to change that open network so that it redirects to a
captive portal until the user has authenticated with twitter (I might add
other providers later on – LinkedIn would be awesome for the office for
example).

In order for me to actually get the thing going, I’m doing a tempalias on this
one too and keep a diary of my work.

So here we go. I really think that every year I should do some fun-project
that’s programming related, can be done on my own and is at least of some use.
Last time it was tempalias, this time, it’ll be
Jocotoco (more about the name in the next installment).

But before we take off, let me give, again, huge thanks to the JSConf crew for
the amazing conference they manage to organize year after year. If I could,
I’d already preorder the tickets for next year :p

Attending a JSConf feels like a two-day drug-trip that lasts for at least two
weeks.

E_NOTICE stays off.

I’m sure you’ve used this idiom a lot when writing JavaScript code

options['a'] = options['a'] || 'foobar';

It’s short, it’s concise and it’s clear what it does. In ruby, you can even be more concise:

params[:a] ||= 'foobar'

So you can imagine that I was happy with PHP 5.3’s new ?: operator:

<?php $options['a'] = $options['a'] ?: 'foobar';

In all three cases, the syntax is concise and readable, though arguably, the PHP one could read a bit better, but, ?: still is better than writing the full ternary expression, spelling out $options['a'] three times.

PopScan, since forever (forever being 2004) runs with E_NOTICE turned off. Back in the times, I felt it provided just baggage and I just wanted (had to) get things done quickly.

This, of course, lead to people not taking enough care for the code and recently, I had one too many case of a bug caused by accessing a variable that was undefined in a specific code path.

I decided that I’m willing to spend the effort in cleaning all of this up and making sure that there are no undeclared fields and variables in all of PopScans codebase.

Which turned out to be quite a bit of work as a lot of code is apparently happily relying on the default null that you can read out of undefined variables. Those instances might be ugly, but they are by no means bugs.

Cases where the null wouldn’t be expected are the ones I care about, but I don’t even what to go and discern the two – I’ll just fix all of the instances (embarrassingly many, most of them, thankfully, not mine).

Of course, if I put hours into a cleanup project like this, I want to be sure that nobody destroys my work again over time.

Which is why I was looking into running PHP with E_NOTICE in development mode at least.

Which brings us back to the introduction.

<?php $options['a'] = $options['a'] ?: 'foobar';

is wrong code. Any accessing of an undefined index of an array always raises a notice. It’s not like Python where you can chose (accessing a dictionary using [] will throw a KeyError, but there’s get() which just returns None). No. You don’t get to chose. You only get to add boilerplate:

<?php $options['a'] = isset($options['a']) ? $options['a'] : 'foobar';

See how I’m now spelling $options['a'] three times again? ?: just got a whole lot less useful.

But not only that. Let’s say you have code like this:

<?php
list($host, $port) = explode(':', trim($def))
$port = $port ?: 11211;

IMHO very readable and clear what it does: It extracts a host and a port and sets the port to 11211 if there’s none in the initial string.

This of course won’t work with E_NOTICE enabled. You either lose the very concise list() syntax, or you do – ugh – this:

<?php
list($host, $port) = explode(':', trim($def)) + array(null, null);
$port = $port ?: 11211;

Which looks ugly as hell. And no, you can’t write a wrapper to explode() which always returns an array big enough, because you don’t know what’s big enough. You would have to pass the amount of nulls you want into the call too. That would look nicer then above hack, but it still doesn’t even come close in conciseness to the solution which throws a notice.

So. In the end, I’m just complaining about syntax you might think? I though so too and I wanted to add the syntax I liked, so I did a bit of experimenting.

Here’s a little something I’ve come up with:

<?php
define('IT', 100000);
error_reporting(E_ALL);
$g = array('a' => 'b');
$a = '';
function _(&$in, $k=null){
if (!isset($in)) return null;
if (is_array($in)){
return isset($k) ? ( isset($in[$k]) ? $in[$k] : null) : new WrappedArray($in);
}
return $in ?: null;
}
class WrappedArray{
private $v;
function __construct(&$v){
$this->v = $v;
}
function __get($key){
return isset($this->v[$key]) ? $this->v[$key] : null;
}
function __set($key, $value){
$this->v[$key] = $value;
}
}
error_reporting(E_ALL ^ E_NOTICE);
$start = microtime(true);
for($i = 0; $i < IT; $i++){
$a = $g['gnegg'];
}
$end = microtime(true);
printf("Notices off. Array %d iterations took %.6fs\n", IT, $end-$start);
$start = microtime(true);
for($i = 0; $i < IT; $i++){
$a = isset($g['gnegg']) ? $g['gnegg'] : null;
}
$end = microtime(true);
printf("Notices off. Inline. Array %d iterations took %.6fs\n", IT, $end-$start);
$start = microtime(true);
for($i = 0; $i < IT; $i++){
$a .= $blupp;
}
$end = microtime(true);
$a = "";
printf("Notices off. Var. Array %d iterations took %.6fs\n", IT, $end-$start);
error_reporting(E_ALL);
$start = microtime(true);
for($i = 0; $i < IT; $i++){
@$a = $g['gnegg'];
}
$end = microtime(true);
printf("Notices on. @-operator. %d iterations took %.6fs\n", IT, $end-$start);
$start = microtime(true);
for($i = 0; $i < IT; $i++){
@$a .= $blupp;
}
$end = microtime(true);
printf("Notices on. Var. @-operator. %d iterations took %.6fs\n", IT, $end-$start);
$start = microtime(true);
for($i = 0; $i < IT; $i++){
$a = _($g)->gnegg;
}
$end = microtime(true);
printf("Wrapped array. %d iterations took %.6fs\n", IT, $end-$start);
$start = microtime(true);
for($i = 0; $i < IT; $i++){
$a = _($g, 'gnegg');
}
$end = microtime(true);
printf("Parameter call. %d iterations took %.6fs\n", IT, $end-$start);
$start = microtime(true);
for($i = 0; $i < IT; $i++){
$a .= _($blupp);
}
$end = microtime(true);
$a="";
printf("Undefined var. %d iterations took %.6fs\n", IT, $end-$start);

The wrapped array solution looks really compelling syntax-wise and I could totally see myself using this and even forcing everybody else to go there. But of course, I didn’t trust PHP’s interpreter and thus benchmarked the thing.

pilif@tali ~ % php e_notice_stays_off.php
Notices off. Array 100000 iterations took 0.118751s
Notices off. Inline. Array 100000 iterations took 0.044247s
Notices off. Var. Array 100000 iterations took 0.118603s
Wrapped array. 100000 iterations took 0.962119s
Parameter call. 100000 iterations took 0.406003s
Undefined var. 100000 iterations took 0.194525s

So. Using nice syntactic sugar costs 7 times the performance. The second best solution? Still 4 times. Out of the question. Yes. It could be seen as a micro-optimization, but 100’000 iterations, while a lot is not that many. Waiting nearly a second instead of 0.1 second is crazy, especially for a common operation like this.

Interestingly, the most bloated code (that checks with isset()) is twice as fast as the most readable (just assign). Likely, the notice gets fired regardless of error_reporting() and then just ignored later on.

What really pisses me off about this is the fact that everywhere else PHP doesn’t give a damn. ‘0’ is equal to 0. Heck, even ‘abc’ is equal to 0. It even fails silently many times.

But in a case like this, where there is even newly added nice and concise syntax, it has to be overly conservative. And there’s no way to get to the needed solution but to either write too expensive wrappers or ugly boilerplate.

Dynamic languages give us a very useful tool to be dynamic in the APIs we write. We can create functions that take a dictionary (an array in PHP) of options. We can extend our objects at runtime by just adding a property. And with PHP’s (way too) lenient data conversion rules, we can even do math with user supplied string data.

But can we read data from $_GET without boilerplate? No. Not in PHP. Can we use a dictionary of optional parameters? Not in PHP. PHP would require boilerplate.

If a language basically mandates retyping the same expression three times, then, IMHO, something is broken. And if all the workarounds are either crappy to read or have very bad runtime properties, then something is terribly broken.

So, I decided to just fix the problem (undefined variable access) but leave E_NOTICE where it is (off). There’s always git blame and I’ll make sure I will get a beer every time somebody lets another undefined variable slip in.