Monday, July 07, 2008


YouTube and Viacom and privacy

Oh, my!

A federal judge has ordered Google to turn over YouTube usage data, as part of Viacom’s copyright-infringement lawsuit:

A federal judge has ordered Google to turn over to Viacom its records of which users watched which videos on YouTube, the Web’s largest video site by far.

The order raised concerns among YouTube users and privacy advocates that the video viewing habits of tens of millions of people could be exposed. But Google and Viacom said they were hoping to come up with a way to protect the anonymity of the site’s visitors.

Viacom also said that the information would be safeguarded by a protective order restricting access to the data to outside lawyers, who will use it solely to press Viacom’s $1 billion copyright suit against Google.

Still, the judge’s order, which was made public late Wednesday, renewed concerns among privacy advocates that Internet companies like Google are collecting unprecedented amounts of private information that could be misused or fall unexpectedly into the hands of third parties.

That last bit is really the significant part. There are many things that could be done in this case to limit the damage — the privacy exposure. Restricting who has access to the information is the most obvious.

They could also agree to have the data anonymized before it leaves Google. Even better, they could limit what information is released. For the specific purpose of the motion in question, the judge’s order is far broader than it needs to be. Viacom is ostensibly looking for information about what videos are being watched, and how often — not, specifically, who is watching them. Yet:

For every video on YouTube, the judge required Google to turn over to Viacom the login name of every user who had watched it, and the address of their computer, known as an I.P. or Internet protocol address.
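To make the anonymization option concrete, here is a minimal sketch of replacing the two identifying fields the order names (login name and IP address) before the data leaves its owner. The record layout, the field names, and the `SECRET_KEY` value are assumptions made up for illustration; nothing here reflects how Google actually stores or would export its logs.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data owner; it is never shipped with the data.
SECRET_KEY = b"example-key-kept-by-the-data-owner"

def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash; equal inputs map to equal tokens."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def scrub(record: dict) -> dict:
    """Return a copy of a view record with the login name and IP address replaced."""
    return {
        "video_id": record["video_id"],
        "user": pseudonymize(record["user"]),
        "ip": pseudonymize(record["ip"]),
    }

# Invented sample record (the IP is from a documentation-only address range).
record = {"video_id": "abc123", "user": "alice", "ip": "192.0.2.17"}
scrubbed = scrub(record)
print(scrubbed)
```

Note that this is pseudonymization rather than true anonymity: the same viewer still gets the same token on every record, so a distinctive viewing history could still point to a person. That is one reason releasing less data is the stronger option.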

Such a broad order isn’t surprising from a judge who doesn’t really understand the technological aspects of all this, and I don’t expect a judge to be well versed in that stuff. It would be nice, therefore, if he had expert advice on it — advice other than the depositions of experts called by either side of the tussle.

For this motion, it would be sufficient for Google to provide only a list of videos along with the number of times each has been viewed within some time window (say, the month of June). No information is needed that identifies the users in any way — not user names, not IP addresses[1], not even the dates and times the videos were accessed.
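As a rough sketch of what such a release could look like, the snippet below reduces a set of view records to per-video counts for one time window. The record fields (`user`, `ip`, `video_id`, `timestamp`) and the sample data are invented for illustration and are only an assumption about what a raw log might contain.

```python
from collections import Counter
from datetime import datetime

def view_counts(records, start, end):
    """Count views per video with start <= timestamp < end.

    Only the video ID and the count survive; user names, IP addresses,
    and individual access times are never copied into the result.
    """
    counts = Counter()
    for rec in records:
        if start <= rec["timestamp"] < end:
            counts[rec["video_id"]] += 1
    return dict(counts)

# Invented sample log, including two records just outside the June window.
records = [
    {"user": "alice", "ip": "192.0.2.17", "video_id": "v1", "timestamp": datetime(2008, 6, 3, 14, 0)},
    {"user": "bob", "ip": "192.0.2.44", "video_id": "v1", "timestamp": datetime(2008, 6, 10, 9, 30)},
    {"user": "alice", "ip": "192.0.2.17", "video_id": "v2", "timestamp": datetime(2008, 6, 30, 23, 59)},
    {"user": "carol", "ip": "192.0.2.90", "video_id": "v1", "timestamp": datetime(2008, 5, 31, 22, 0)},
    {"user": "bob", "ip": "192.0.2.44", "video_id": "v2", "timestamp": datetime(2008, 7, 1, 0, 0)},
]

june = view_counts(records, datetime(2008, 6, 1), datetime(2008, 7, 1))
print(june)  # {'v1': 2, 'v2': 1}
```

The output says what was watched and how often, which is what the motion is ostensibly about, and nothing in it can be traced back to a viewer.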

Handing all of that detailed information to Viacom raises two big issues. First, it’s an enormous amount of information. According to the article, 4.1 billion YouTube videos were accessed in April. That’s around 1600 every second. Sifting through that, on both sides, will take a while and will cost a lot. Second, and more important from a privacy point of view, Viacom could use the information to pursue the users, possibly using RIAA-style strong-arm tactics to extort money from the highest-volume viewers.
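The per-second figure is easy to check. The only inputs are the article's 4.1 billion and the 30 days in April; the variable names are just for illustration.

```python
views_in_april = 4.1e9                 # figure quoted from the article
seconds_in_april = 30 * 24 * 60 * 60   # 30 days -> 2,592,000 seconds
views_per_second = views_in_april / seconds_in_april
print(round(views_per_second))  # 1582
```

That comes to a little under 1600 views per second, averaged over the whole month.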

They say they won’t do that, and that Viacom itself won’t have access to the data. But are we sure? What will their next motion be, and how will the judge decide it? This could easily turn into a fishing expedition, and a dangerous one with wide-ranging effects. A better-informed judge could head that off by being more selective in what he orders.

Where this all really takes us, though, is not just to the privacy aspects of this case, but to the understanding that there isn’t really any privacy on the Internet — a refrain I’ve sung here before. Any data that’s kept can be divulged, be it by mistake, through hacker attacks, after a unilateral change in privacy policy, or in response to legal demands from a court or illegal demands from the government. No matter what assurance you have today, you don’t know what will happen to the data tomorrow. The only information that’s safe is information that’s not kept, not backed up, not archived.

And with the exception of services that are expressly designed for anonymity, this sort of information is always kept, and, therefore, is always subject to being divulged.


[1] The article says, “Both companies have argued that I.P. addresses alone cannot be used to unmask the identities of individuals with certainty. But in many cases, technology experts and others have been able to link I.P. addresses to individuals using other records of their online activities.” Yes, indeed: knowing the IP address and the exact time of use can, in most situations, get someone with a subpoena direct access to the identity of the user, at least for users within the United States (and, therefore, under the jurisdiction of the subpoena). This has been done many times, in many court cases, and is well known.

I also find the NY Times’ style of using periods in all abbreviations to be... quaint. No one writes “I.P. address” except the Times.

1 comment:

Anonymous said...

Hi All:

If you've ever watched a video on YouTube, you should know that Viacom could identify you through this data. This order opens the door for corporations to use our private records at their will and without our consent. Tell Google to defy the court ruling and to refuse to hand over our records to Viacom. Sign this petition: