Wednesday, April 09, 2008

Your privacy on the Internet

The New York Times recently published an op-ed piece by Adam Cohen about how the information that Internet Service Providers (ISPs) have about their users can be a significant privacy issue:

Technology companies have long used “cookies,” little bits of tracking software slipped onto your computer, and other means, to record the Web sites you visit, the ads you click on, even the words you enter in search engines — information that some hold onto forever. They’re not telling you they’re doing it, and they’re not asking permission. Internet service providers are now getting into the act. Because they control your connection, they can keep track of everything you do online, and there have been reports that I.S.P.’s may have started to sell the information they collect.

Mr Cohen talks about targeted advertisements — a practice that most people consider benign, or even helpful, without considering the privacy implications — but goes on to talk about how much further things go:

The bigger issue is the digital dossiers that tech companies can compile. Some companies have promised to keep data confidential, or to obscure it so it cannot be traced back to individuals. But it’s hard to know what a particular company’s policy is, and there are too many to keep track of. And privacy policies can be changed at any time.

There is also no guarantee that the information will stay with the company that collected it. It can be sold to employers or insurance companies, which have financial motives for wanting to know if their workers and policyholders are alcoholics or have AIDS.

It could also end up with the government, which needs only to serve a subpoena to get it (and these days that formality might be ignored).

A bit of this sort of thing got press last November, when Facebook’s “Beacon” feature started exposing people’s Internet purchases to their online “friends”. So, yes, as Mr Cohen notes:

The public has been slow to express outrage — not, as tech companies like to claim, because they don’t care about privacy, but simply because few people know all that is going on. That is changing. “A lot of people are creeped-out by this,” says Ari Schwartz, a vice president of the Center for Democracy and Technology. He says the government is under increasing pressure to act.

The message of the op-ed piece is, in the end, that we need to be aware of this and that federal laws that address privacy of telephone communications need to be extended to address the Internet.

I agree. But that will only go so far, because

  1. it’ll be hard for laws to address what information companies collect and keep when you visit their web sites and use their services, and
  2. while it is true that Internet users are gradually becoming more aware of privacy issues, it’s also true that there’s far too much of which people aren’t aware.

In particular, I want to say something about what Mr Cohen breezed past at the beginning of his essay: browser cookies.

Long ago, early in the days of the World Wide Web, “cookies” were devised as a way to add some state to the Hypertext Transfer Protocol (HTTP). Basically, the issue is that HTTP is (mostly) sessionless, stateless, and identityless. When you retrieve a web page, your browser contacts the web server and says, “give me [x]”. The response is the content of web page [x], after which your connection with the server ends. When you click on page [y] on the same web site, your browser again connects to the (same) web server and says, “give me [y]”. There’s no reliable way for the server to link those two requests, to know that they were done by the same user, the same browser, the same “browsing session”.

The server could look at your computer’s Internet address, which it gets when your browser makes the connection. But that’s not reliable, because the address could have been reassigned to another computer between the two requests. Or, more likely, because you and I are both using the same HTTP proxy, and the address it sees is the address of the proxy server, not of our individual computers.
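To make the proxy problem concrete, here’s a minimal sketch. The addresses, page names, and the `server_view` function are all made up for illustration; the point is only that two people behind one proxy collapse into a single “visitor” when the address is the only thing the server has to go on.

```python
from collections import defaultdict

# Hypothetical requests: (person, address the server sees, page requested).
# The "person" column is for our benefit only; the server never receives it.
REQUESTS = [
    ("you", "203.0.113.7", "/x"),
    ("me", "203.0.113.7", "/x"),
    ("you", "203.0.113.7", "/y"),
]

def server_view(requests):
    """Group requested pages by the only identifier the server has: the address."""
    by_address = defaultdict(list)
    for _person, address, page in requests:
        by_address[address].append(page)
    return dict(by_address)

print(server_view(REQUESTS))
```

Both of us show up under the one proxy address, so the server can’t tell whose request for [y] followed whose request for [x].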

But if we’re browsing a catalogue and accumulating items in “shopping carts”, the web server has to keep track of what got put into what cart. And you can come up with many other good reasons for a server to know that a new request is a follow-up to an earlier one.

And, so, we have cookies. A cookie is simply a small bit of data that the web server gives to your browser as part of the response to “give me [x]”. It basically says, “here; save this, and give it back to me with every later request.” When you ask for page [y], your browser checks whether it’s holding a cookie from that server and, if it is, sends the cookie along with the request. The server can use that to keep you connected to the shopping-cart database, maintaining your cart as you shop. Or whatever.
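Here’s a rough sketch of that exchange, using Python’s standard `http.cookies` module. It’s an illustration under my own assumptions, not how any particular site does it: the cookie name `session_id`, the in-memory `CARTS` table, and the `handle_request` function are all hypothetical.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory store: session id -> items in that visitor's cart.
CARTS = {}

def handle_request(cookie_header, add_item=None):
    """Handle one request and return (session_id, set_cookie_value, cart).

    cookie_header is the text of the browser's Cookie header, or None.
    set_cookie_value is the text for a Set-Cookie header, or None if the
    browser already presented a cookie we recognise.
    """
    jar = SimpleCookie()
    if cookie_header:
        jar.load(cookie_header)
    morsel = jar.get("session_id")

    if morsel is not None and morsel.value in CARTS:
        # Returning visitor: the cookie links this request to earlier ones.
        session_id = morsel.value
        set_cookie = None
    else:
        # New visitor: make up an identifier and ask the browser to keep it.
        session_id = secrets.token_hex(8)
        CARTS[session_id] = []
        out = SimpleCookie()
        out["session_id"] = session_id
        set_cookie = out["session_id"].OutputString()

    if add_item is not None:
        CARTS[session_id].append(add_item)
    return session_id, set_cookie, CARTS[session_id]

# First request carries no cookie; the second sends back what the server issued.
sid, set_cookie, _ = handle_request(None, add_item="shoes")
_, _, cart = handle_request(set_cookie, add_item="socks")
print(cart)
```

Note that the cookie holds nothing but an opaque identifier. Everything interesting lives on the server, keyed by that identifier.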

And we were always told that cookies are “safe”. For one thing, it’s just a bit of data that the web server gave to your browser, so it contains no information that the server doesn’t already have. And your browser will only give it back to the same server it got it from. All it does is enhance your web-browsing experience; what could be bad?

What can be bad is that it ties together every request you make to the web server, along with all the information you send to the server in browser forms and such. By setting a cookie, the web server can keep a database of every page you visited at that web site, every button you pressed, every box you checked, every text field you filled in, every ad it showed you, every question it asked you, every answer you gave, every... everything you did at that site.

The cookie itself doesn’t say who you are. But, hey, once they get you to fill in a form — say, an order form — with your name and address, they have that information associated with the cookie they gave you. And that connects it to everything else.
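A sketch of how little it takes, assuming the site keeps one record per cookie. The `profiles` table, the `record` function, and the sample events are invented for illustration; a real site would use a database, but the linking step is the same idea.

```python
from collections import defaultdict

# Hypothetical per-cookie records: everything seen under one cookie value.
profiles = defaultdict(lambda: {"identity": None, "events": []})

def record(cookie_id, event, form=None):
    """Log an event under the cookie; remember a name if a form supplies one."""
    profile = profiles[cookie_id]
    profile["events"].append(event)
    if form and "name" in form:
        # From here on, the whole history under this cookie has a name on it.
        profile["identity"] = form["name"]
    return profile

record("abc123", "viewed /shoes")
record("abc123", "searched for 'wide fit'")
profile = record("abc123", "submitted order form", form={"name": "Pat Example"})
print(profile["identity"], len(profile["events"]))
```

The earlier, “anonymous” events get attached to the name after the fact, which is exactly the part most people don’t think about.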

And here’s another thing: as the company expands and you use more of their services, there’s more that they keep track of and tie together. Your Google searches are connected to what you do at Google Maps, at YouTube, at Blogger....

Most browsers have a way to “turn off” cookies, but that’s essentially a useless feature these days. So many web sites require cookies that, with cookies off, they’ll refuse to give you anything except a page that explains how to turn cookie support back on. So unless you’re willing to have 75% of the web not work for you, you’re stuck with cookies and the data collection that they enable.

There’s more, too; most people don’t know what other information web sites get from your browser. The browser usually tells the web server such things as what operating system you’re using, what browser (and version) you’re using, what page you clicked on to get here, what search you ran to get here.... Scripts running in the page can report still more, such as your screen resolution. If the server can tie all that to a cookie, it has that much more information. In particular, it now also knows what web site you were on just before coming here. Hm.
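As a small illustration, here’s roughly what a server can read out of two ordinary request headers, `User-Agent` and `Referer` (yes, HTTP really spells it that way). The `describe_visitor` function and the sample header values are made up, and I’m assuming the search engine puts the search terms in a `q` parameter, which is common but not universal.

```python
from urllib.parse import parse_qs, urlparse

def describe_visitor(headers):
    """Summarise what two ordinary request headers reveal about a visitor."""
    referer = headers.get("Referer", "")
    parsed = urlparse(referer)
    # Assumption: the referring search page carries its terms in "q".
    terms = parse_qs(parsed.query).get("q", [None])[0]
    return {
        "user_agent": headers.get("User-Agent"),
        "came_from": parsed.netloc or None,
        "search_terms": terms,
    }

sample = {
    "User-Agent": "ExampleBrowser/1.0 (ExampleOS)",
    "Referer": "http://search.example/search?q=cheap+shoes",
}
print(describe_visitor(sample))
```

Store that alongside the cookie and the record now says not just what you did here, but how you found your way here.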

And there’s not really anything you can do about it. Anonymizing services, which act as proxies and hide that sort of information — omit some, lie about some, block cookies — will make too many web sites not work. The average user will not be willing to accept that, and will give up the privacy, usually with a shrug.

Privacy laws that cover this stuff have to be sufficiently robust that they catch all of this, not just the more egregious bits. The U.S. government doesn’t have a good track record in that regard, I’m afraid. The people writing the laws aren’t savvy enough to get it right. And there’s a great deal of lobbying from companies and organizations that will benefit — at the expense of your privacy — from having access to this information.

3 comments:

lidija said...

I think companies ought to pay me to get the info, and not get it for free. That's what pisses me off here. Other than that, it's truly inescapable and unless, as a consequence, we have ID theft on a massive scale or an uncovered not-specifically-agreed-upon profiling by say, insurance companies, not a thing will change. The True Market Believers believe it is all good. That includes a vast majority of our representatives while the rest of us, as you said, don't know and don't have an opinion.

Barry Leiba said...

They ought to pay, yes... but as it works out, their "payment" is the service they offer. In return for letting them collect information, you get to use their web site. Don't like that? OK... don't use their web site.

Most people aren't willing to give up the open access to all the web sites. If you give them a browser setup that preserves more privacy, they get too frustrated when web sites don't work. So, to most people, the companies are paying them enough for them to be willing to provide the information.

What's needed is a greater understanding of the real value of the information.

lidija said...

I suppose I'm starting from the point of making shoebuy.com and Macy's equivalent. Nobody really knows I enter Macy's like everybody knows I go to shoebuy.com. And really, no one could be any wiser on what hideous shoes I might buy at Macy's but if I'm not careful everybody on Facebook might. As someone to whom internet access and retail has been taken for granted, I do want to consider those two retail options equivalent (i.e., walk-in vs. online). But apparently we do pay a premium to use the latter. Which only goes to show that it is not the commerce way yet because people would ultimately have a choice to go to sites that don't collect that info... which they don't. Or at least I'd like to believe that the market would drive us there too.

Also, Google is the worst. Not only because they censor sites that talk trash about them (blogs yet? :) but because they have tracked me far and wide like you wouldn't believe (or rather, it is surprising how many businesses use their services... yes, of course, I can log out of Google and I obsessively do). Oh but I don't want to go down that alley today... because what I am really upset about in their case is that I cannot rely on a good search anymore. I didn't mean to digress.