Over the past couple of weeks there’s been a lot written about Google’s apparently unintentional capture of small amounts of wi-fi data (admitted by Google) whilst driving Street View cars around. I find this interesting not so much because of the actual issue itself, but because it highlights several key underlying issues:
- people’s understanding of the technology they use – even at a high level
- the general use of encryption, or lack of it
- the apparent conflation of many issues under the banner of “privacy”
Essentially, Google were apparently capturing unique identifiers from wi-fi networks together with geographical data about where they were. (Presumably this is interesting as another method of establishing physical location, somewhat more ad hoc than GPS, and it seems to me to be uncontroversial.) The point of the story was that whilst doing this they apparently accidentally left in some test code that captured a bit more than just network identifiers, and stored some actual network traffic too.
The technology, and people’s understanding of it
I find it interesting that so many commentators find this story so horrifying. The data captured was being broadcast openly on the radio! Google just happened to be driving past and apparently got some tiny snippets. Anyone, possibly with ill intent, could sit out in the street and capture the same data, not just for a matter of seconds but for hours or even days. That’s the nature of wi-fi; if you’re not comfortable with that idea then, in simple terms, don’t use wi-fi. If you start from the assumption that someone’s capturing all the data you’re broadcasting, suddenly the fact that Google has a few bits of it doesn’t seem so exciting.
(I’m not saying it’s right or good to sit recording other people’s wi-fi, but it is ultimately being broadcast in public.)
What has largely been overlooked is the importance of using encryption more widely. Strong encryption means that if someone captures the data, it’s scrambled, and decrypting it without the key would be difficult or impossible. In this case, there are at least two potential protections against random people packet-grabbing wi-fi data:
- encryption in the wi-fi protocol itself; typically WEP or WPA. Now, unfortunately, WEP has long been regarded as breakable. WPA configured properly is considered by most to give a relatively high level of security, at least against casual packet-sniffing
- encryption of data itself, whilst it’s in transit over wi-fi.
The second is more interesting and arguably more important, because with encryption at the higher level you’re still protected even if you’re using unencrypted wi-fi, or wi-fi provided by a nefarious operator (who could capture even WPA-encrypted data at the wireless router). SSL/TLS (known to many users as the “browser padlock” when visiting websites) is far from new, but unfortunately there are plenty of major websites and services out there still not offering it, let alone the myriad smaller websites that still gather personal data in one form or another. By way of pertinent example, it’s only this week – coincidentally or otherwise – that Google introduced an SSL option to their main search site, and the secured option doesn’t include all types of search at present. To my mind, that is more of a scandal than the fact that some poor engineer left in a couple of lines of debugging/testing code that resulted in some Google cars grabbing some probably tiny and irrelevant fragments of data that they probably don’t want anyway.
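To make the “browser padlock” point a little more concrete, here is a minimal Python sketch of the two things the padlock stands for: the address uses `https`, and the client checks the server’s certificate and hostname before sending anything. The function names `uses_tls` and `verifying_context` and the `example.com` addresses are my own illustrations, not anything Google or any browser actually uses, and the labels printed assume an open (unencrypted) wi-fi network.

```python
import ssl
from urllib.parse import urlsplit


def uses_tls(url: str) -> bool:
    """Return True if the URL's scheme is https, i.e. the request would go over SSL/TLS."""
    return urlsplit(url).scheme.lower() == "https"


def verifying_context() -> ssl.SSLContext:
    """Build a client-side TLS context using Python's default settings.

    The defaults require a valid server certificate and check that it
    matches the hostname being contacted.
    """
    return ssl.create_default_context()


if __name__ == "__main__":
    # Hypothetical addresses, purely for illustration.
    for url in (
        "http://example.com/search?q=private",
        "https://example.com/search?q=private",
    ):
        if uses_tls(url):
            label = "encrypted in transit, even over open wi-fi"
        else:
            label = "sent in the clear, readable by anyone in radio range on open wi-fi"
        print(url, "->", label)
```

The certificate check matters as much as the encryption itself: without it, the nefarious wi-fi operator mentioned above could simply pretend to be the website.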
“Privacy” and the conflation of issues
There are huge challenges in how we manage growing amounts of personal data (created by ourselves, or others) scattered around an increasing number of online spaces with effectively unlimited storage and a plethora of services that allow you to share that data, more or less selectively, with others. Discussions about these will probably form an important part of shaping the further development of online applications over the coming years, especially with developments such as Google’s Chrome OS, which is a bit like the 21st century version of 1970s dumb terminals, where your laptop is just an interface to applications and data served centrally – in this case over the Internet. It’s not hard to see the concerns raised by effectively moving data that would otherwise be on your private computer to servers hosted by large concerns, mixed up with the data of millions of other people, with only software – written by fallible humans – to keep them apart and manage that data, and who it’s shared with, correctly. Of course, there’s implicit trust in all the software you use all the time, but the sheer scale, if nothing else, of services hosted by the likes of Google makes for some legitimate questions, such as: how, given inevitable software bugs or errors, can you at least confine lapses which expose private data so that they have the least possible impact? (Note also that some of these concerns are probably better categorised as “security” rather than “privacy”.)
However, there seems to be a lack of understanding of the detail here. The BBC’s Rory Cellan-Jones seems to conflate this with general issues of online privacy by suggesting to Google’s founders, whilst discussing the wi-fi captures, that “…the privacy issue was something of a crisis for the whole internet industry” and seems surprised at the suggestion that he is creating “hyperbole”. Charles Arthur at the Guardian does a similar thing, linking it to more general discussions about privacy, and Stephen Conroy, the Australian Communications Minister, has called it, apparently without irony, “the largest privacy breach in history across Western democracies”. (This is coming from the authoritarian Australian government, who want country-wide Internet censorship and to sniff around for adult material on laptops just because someone happens to be crossing customs, so perhaps they’re not the best judges.)
But Google grabbing wi-fi data is quite a different issue from the challenges I mention above, and simplistically linking them under some grand “privacy” banner doesn’t help anyone in terms of understanding what the challenges are, how they can be addressed, and who the good guys and bad guys are. I don’t generally think that “Who was harmed? Name the person.” is a particularly good argument when defending oneself against alleged privacy violations, but in this case I have some degree of sympathy for Eric Schmidt.