Staring At Empty Pages: February 2011

Monday, February 28, 2011

IP blocklists, email, and IPv6

Engineers in the Internet Engineering Task Force, in the Messaging Anti-Abuse Working Group, and elsewhere have been debating how to handle e-mail-server blocklists in an IPv6 network. Let’s take a look at the problem here.

We basically have three ways to address spam, in our goal of reducing the amount of spam in our inboxes:

Prevent its being sent in the first place.
Refuse to accept it when it’s presented for relay or delivery.
Discard it or put it into a junk mail folder at (or after) delivery.

The last is handled by what we usually think of as spam filters, which analyze the content and other aspects of the messages. Dealing with the first involves law enforcement, as well as adoption of best practices for legal email marketers. To implement the second, we try to do various analyses during the actual transmission of the email messages, in order to respond at the protocol level with some sort of refusal. It’s rather like standing between your postal carrier and the mailbox at your house, and telling the carrier that she may put this envelope into the box, but she should take those two catalogues and the credit-card offer right back to the post office with her.

And one can actually imagine doing that, by looking at the envelopes and applying rules such as, If it’s pre-sorted, it’s probably junk, and, The more urgent it claims to be, the more likely it is to be junk. But a better way, still, would be if we could get this to happen as soon as the junk mail entered the postal system, by having a way to say, See that guy who’s dropping that pile of mail at the post office? He only sends junk, and when you see him coming just make him go away. Don’t even let him bring his pile in the door.

We have that in our email systems, in what we call IP blocklists (or blacklists). These are lists of the numeric Internet addresses of email servers that we think send so much spam that we won’t even let them come to the door. When one of these servers makes an Internet connection to one of our mail servers, we don’t even start an email protocol exchange with them — we just refuse the connection. We make them go away.
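As a rough sketch of that check at the door, assuming a hypothetical in-memory list (real blocklists are usually shared services queried over the network, and the addresses below are reserved documentation addresses, not real spam sources):

```python
import ipaddress

# Hypothetical blocklist for illustration only; these are reserved
# documentation addresses, not real spam sources.
BLOCKED = {
    ipaddress.ip_address("192.0.2.10"),
    ipaddress.ip_address("198.51.100.7"),
}

def accept_connection(client_ip: str) -> bool:
    """Return False if the connecting server's address is on the blocklist."""
    return ipaddress.ip_address(client_ip) not in BLOCKED

print(accept_connection("192.0.2.10"))   # False: refused before any mail protocol starts
print(accept_connection("203.0.113.5"))  # True: allowed to start the email exchange
```

The point of the design is that the decision needs nothing but the connecting address, so it can be made before any message content is transferred.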

Estimates vary as to what portion of attempted spam this blocks, but at least some estimates are on the order of 90%. Despite the problems with this mechanism (legitimate mail servers do find themselves on blocklists, for various reasons, and sometimes have a hard time getting the list-managers to remove them), it’s a critical one in the fight against spam, saving a great deal of time and computing resources by cutting the spam messages off much earlier in the process.

But note that it deals with IP addresses. Today, of course, that means IPv4 addresses, those things that look like 192.168.0.1, and that there are around 4 billion of. 4 billion is a large number, but, as we’ve seen, it’s notably finite and manageable. It’s reasonable to take every IP address we ever see trying to send mail, and keep it on a list, sorting the addresses into the good ones and the bad ones. It’s feasible to block Internet connections from the ones in our list that are marked bad.

Not so when we consider IPv6. Bumping the IP address from 32 bits to 128 bumps the 4 billion up to roughly 340 billion billion billion billion (the exact number doesn’t matter at that point), and that makes it infeasible to keep a list of bad addresses. There are enough addresses there to allow the bad guys to use a new one every time, so we’d never see repeats. There are, of course, ways we can group addresses into large blocks, and know that any address we see in one of those blocks will be bad, but even that isn’t enough to make it work.
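To put numbers on that, here is a small sketch of the arithmetic and of what blocking by block (prefix) rather than by single address might look like. The prefixes are made up for illustration, taken from the range reserved for documentation, and the /48 size of the bad block is purely an assumption:

```python
import ipaddress

ipv4_total = 2 ** 32    # 4,294,967,296 addresses
ipv6_total = 2 ** 128   # about 3.4e38 addresses

# A single /64 holds 2**64 addresses: the whole IPv4 address space, squared.
subnet = ipaddress.ip_network("2001:db8:1234:5678::/64")
print(subnet.num_addresses == ipv4_total ** 2)  # True

# Hypothetical list of blocked prefixes instead of individual addresses.
BAD_BLOCKS = [ipaddress.ip_network("2001:db8:bad::/48")]

def is_blocked(address: str) -> bool:
    """Return True if the address falls inside any blocked prefix."""
    addr = ipaddress.ip_address(address)
    return any(addr in block for block in BAD_BLOCKS)

print(is_blocked("2001:db8:bad:1::25"))  # True
print(is_blocked("2001:db8:600d::1"))    # False
```

A sender who controls one such block never has to reuse an address, which is why per-address lists stop being useful. Prefix matching helps, but only when we know how the address space has been carved up.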

We could switch to a pass list, a whitelist of known good addresses — that would still be small enough to be manageable — and refuse anything else. But that makes it very hard for an organization to deploy a new server, or for a new organization to join in.

John Levine has one approach: leave the email system on IPv4 for the foreseeable future. Even, John points out, when many other services, customer endpoints, mobile and household devices, and the like have been — have to have been — switched to IPv6, we can still run the Internet email infrastructure on IPv4 for a long time, leaving the IP blocklists with v4 addresses, and a system that we’re already managing fine with.

Of course, some day, we’ll want to completely get rid of IPv4 on the Internet, and by then we’ll need to have figured out a replacement for the IP blocklist mechanism. But John’s right that that won’t be happening for many years yet, and he makes a good case for saying that we don’t have to worry about it.

At least not until he and I have long been retired.

Sunday, February 27, 2011

Personal watermelon

Friday, February 25, 2011

Interactive Voice Response (IVR)

I’ve been meaning to change my credit-card PIN (not PIN number, please; PIN already includes the word number) for a while now. I don’t need it to be reset... I know the current one, and I just want to change it. For whatever reason, one can’t do that from the web site, but only by calling in. Having just returned last night from meetings in Orlando, where I’ve been all week, I decided to call in.

When I connect, I first get a cheerful voice telling me the great news (their phrase, not mine): they have changed their system, and now I can speak things like my account number, my selections, and such, rather than just entering them from the number pad. In other words, they have a new IVR system.

Joy.

It suggests that I might press 2 para español (I don’t), and then asks for my account number. I choose to enter it the old-fashioned way, and I follow with my zip code when the prompt requests it. It correctly identifies my account and spends a minute or two reciting every detail about my account that it can think of, whether I want to hear it or not: my account balance, my remaining available credit, the portion of my remaining available credit I can use for cash advances, the amount and date of my last payment (along with thanks for sending it in), the minimum payment currently due and the due date. I wait all this out.

It then tells me what to say if I want to hear all that again (&deity, no!), suggests two other things I might say, and gives me a fourth choice, I want to do something else.

I want to do something else, I say.

Briefly tell me what you would like to do, it says. For example, you could say, ‘I want to change my PIN number.’ Yes, it says PIN number; waddyagonnado? But it’s funny that the very thing I want to do is the example it gives. I say, in the nice, clear voice I speak in, I want to change my PIN number, including the word number, just as prompted.

I’m sorry; I didn’t quite understand you. Not quite, you see. Almost, perhaps, but not quite. It asks me to try again, to just say a few words. I guess some folks bloviate, become logorrhetic, or otherwise confuse the electrons.

I want to change my PIN number.

No joy.

It fails on the third try, as well, and then sends me to a human, who, as they’re trained to do, apologizes for the trouble I’m having, and tells me that he can transfer my call to the PIN-changing system. Great! So he does. I wait a moment...

...and find that I’m back to the beginning of the whole process, from the Spanish prompt to the account-number prompt to the zip-code prompt, and I listen again to the account status message. It’s so nice that my minimum payment is only $24, though, of course, I would never pay off my balance at that rate. Never mind. I again tell it that I want to do something else, I again tell it that I want to change my PIN, and it again fails to understand me thrice.

I get a second human.

I moan to this second human that the IVR system isn’t understanding me, and he offers to stay on the line with me while I try it again. This way, he can hear what’s going wrong and direct it to the right place anyway. Great!

I/we go back through the whole thing again... Spanish, account, zip, status info, do something else, change PIN, change PIN, change PIN. See?, I say, while the IVR system says it will connect me with a human operator. But my friend isn’t there after all, and in a moment a third person responds and, as the others, sympathizes with me for the trouble I’m having. She tells me that they are having problems with their system, implying that they know about it but are inflicting it on everyone anyway.

She offers me an easy solution: I can use the option numbers instead of the speech recognition. Of course, now that the speech reco is in there, they don’t list the numbers any more, but she tells me what they are: press 4, then 2. Great! She sends me back into the abyss.

Spanish, account number, zip code... but now, as it starts to read my account status to me I barge in with an aggressive 4 on my number pad, and it stops in its tracks and asks me to say what I want to do, again suggesting that I might say, I want to change my PIN number. Instead, though, now savvier, I press 2.

I’m sorry; I didn’t quite understand you. It didn’t understand the number on the pad either. I press 2 again and get the same second oops message, and a third try brings the promise of a human. It’s possible that it had understood me all along, but the PIN-setting system is what’s really broken. Human number four comes on the line, and, yes, I did get four different people.

I tell this one what happened, and he says that the only way to change my PIN is to go through the system that way — it’s so sensitive that they don’t want human operators to know the customer’s PIN (I suppose that makes sense). He tries to get me to do it again, but, feeling like a mouse in a maze or, perhaps, a Candid Camera victim, I decline, say that he should please report that the system is horridly broken, and I’ll try calling in another time in hope that it will have been fixed. He tries not to let me go, but I say, No, thanks very much for the help. Bye, and I hang up.

I should have stayed in Orlando.

Saturday, February 19, 2011

Forty seven seconds to walk

You know how some intersections in some cities have digital count-downs, telling you how many seconds you have to cross before the walk signal changes to don’t walk? How do they come up with the times?

I just waited at a corner, and when the walk signal came on, the timer started at 47. Who decided that it should be 47 seconds, and not, say, 45, or 50? And why? Is there really any sense in which 47 seconds is enough, but 45 isn’t, and 50 is too long? Why should two or three seconds one way or another matter?

One wonders.

Thursday, February 17, 2011

Watson’s third day

I hadn’t planned to make three posts, one per day, about Watson on Jeopardy!, but there ya go. The third day — the second game of the two-game tournament — was perhaps even more interesting than the first two.

Watson seemed to have a lot more trouble with the questions this time, sometimes making runs of correct answers, but at other times having confidence levels well below the buzz-in threshold. Also, at many of those times its first answer was not the correct one, and sometimes its second and even its third were not either. Some of the problems seemed to be in the categories, but some just seemed to deal with particular clues, regardless of category.

Watson also did not have domination of the buzzer this time, even when it had enough confidence to buzz in. I don’t know whether they changed anything — I suspect not, since they didn’t say so. It’s likely that Mr Jennings and Mr Rutter simply were more practiced at anticipating and timing their button-presses by then (remember that the three days’ worth of shows were all recorded at the same time, a month ago).

Those factors combined to make Watson not the run-away winner going into the Final Jeopardy! round that it was in the first game. In yesterday’s final round (category: 19th-century novelists), all three contestants (and your reporter, at home) came up with the right answer, and Watson pulled far ahead with an aggressive bet that Mr Rutter didn’t have the funds to match. Mr Jennings, meanwhile, chose to be conservative: assuming he would lose to Watson (the first game’s results made that certain), he made his bet of only $1000 to ensure that he would come in second even if he got the answer wrong.

The result, then, was Watson winning the two-game match handily, and earning $1 million for two charities. Other charities will get half of Mr Jennings’s and Mr Rutter’s winnings (whether that’s before or after taxes, I don’t know; I also don’t know whether taxes will reduce Watson’s million-dollar contribution).

One other thing: in a New Scientist article yesterday, talking about the second day and the first Final Jeopardy! round, Jim Giles makes a sloppy mistake (but see update below):

Watson’s one notable error came right at the end, when it was asked to name the city that features two airports with names relating to World War II. Jennings and Rutter bet almost all their money on Chicago, which was the correct answer. Watson went for Toronto.
Even so, the error showed another side to Watson’s intelligence: knowing that it was unsure about the answer, the machine wagered less than $1000 on its answer.

Of course, Watson’s wager had nothing to do with how sure it was about the answer: it had to place the bet before the clue was revealed. Its wager had something to do with the category, but likely was far more heavily controlled by its analysis of the game position and winning strategy. In determining its bets, it runs through all the bets it and its opponents might make, and decides on a value that optimizes its own position. And its strategy in the second game was different from that in the first.
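I don’t know the details of Watson’s actual wagering code, so take the following only as a toy illustration of the general idea of running through the possible bets. The scores are invented, bets are limited to multiples of a step, and every combination of bets and right-or-wrong answers is naively treated as equally likely:

```python
from itertools import product

def reachable_scores(score, step):
    """Every final score a player can reach: bet any multiple of step, then be right or wrong."""
    bets = range(0, score + 1, step)
    return {score + b for b in bets} | {score - b for b in bets}

def best_wager(my_score, rival_scores, step=1000):
    """Pick the wager that leaves us strictly in first place in the most scenarios."""
    # Every combination of final scores the rivals could end up with.
    rival_finals = list(product(*(reachable_scores(s, step) for s in rival_scores)))
    best, best_wins = 0, -1
    for wager in range(0, my_score + 1, step):
        wins = 0
        # We might answer correctly (gain the wager) or incorrectly (lose it).
        for mine in (my_score + wager, my_score - wager):
            wins += sum(1 for finals in rival_finals if mine > max(finals))
        # On ties, prefer the larger wager: it pays more when we are right.
        if wins >= best_wins:
            best, best_wins = wager, wins
    return best

# Hypothetical game position: a leader with more than double the nearest rival.
print(best_wager(20000, [5000, 3000]))  # 9000
```

With these made-up numbers, the nearest rival can reach at most 10,000, so any bet of 9,000 or less keeps the leader in first place no matter what happens. The wager falls out of the game position, not out of confidence in the answer.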

Update: The New Scientist article was updated shortly after it was published. It now says this:

Even so, the error did not hurt Watson too much. Knowing that it was far ahead of Jennings and Rutter, the machine wagered less than $1000 on its answer.

Wednesday, February 16, 2011

Watson’s second day

Commenting on yesterday’s entry, The Ridger notes this:

I find looking at the second-choice answers quite fascinating. "Porcupine" for what stiffens a hedgehog’s bristles, for instance. There is no way that would be a human’s second choice (after keratin). Watson is clearly getting to the answers by a different route than we do.

That’s one way to look at it, and clearly it’s true that Watson goes about determining answers very differently from the way humans do — Watson can’t reason, and it’s all about very sophisticated statistical associations.

Consider that both humans (in addition to this one, at home) got the Final Jeopardy question with no problem, in seconds... but Watson had no idea (and, unfortunately, we didn’t get to see the top-three analysis that we saw in the first two rounds). My guess is that the question (the answer) was worded in a manner that made it very difficult for the computer to pick out the important bits. It also didn’t understand the category, choosing Toronto in the category U.S. Cities, which I find odd (that doesn’t seem a hard category for Watson to suss).

But another way to look at it is that a human wouldn’t have any second choice for some of these questions, but Watson always does (as well as a third), by definition (well, or by programming). In the case of the hedgehog question that The Ridger mentions, keratin had 99% confidence, porcupine had 36%, and fur had 8%. To call fur a real third choice is kind of silly, as it was so distant that it only showed up because something had to be third.

But even the second choice was well below the buzz-in threshold. That it was as high as it was, at 36% confidence, does, indeed, show Watson’s different thought process — there’s a high correlation between hedgehog and porcupine, along with the other words in the clue. Nevertheless, Watson’s analysis correctly pushed that well down in the answer bin as it pulled out the correct answer at nearly 100% confidence.
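In code terms, the buzz decision amounts to something like the sketch below. The confidence values are the ones shown on screen for the hedgehog clue. The threshold value is my assumption, since the show only displayed it as a line on the bar:

```python
# Assumed threshold; the real figure was not given on the broadcast.
BUZZ_THRESHOLD = 0.50

# Top three candidates and confidences for the hedgehog clue.
candidates = [("keratin", 0.99), ("porcupine", 0.36), ("fur", 0.08)]

def decide(candidates, threshold=BUZZ_THRESHOLD):
    """Return the top-ranked answer if it clears the threshold, otherwise None (no buzz)."""
    answer, confidence = max(candidates, key=lambda item: item[1])
    return answer if confidence >= threshold else None

print(decide(candidates))      # keratin
print(decide(candidates[1:]))  # None: porcupine alone would not clear the assumed threshold
```

There are always candidates in the list, however poor. Only the threshold keeps the weak ones from being spoken.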

In fact, I think most adult humans do run the word porcupine through their heads in the process of solving this one. It’s just that they rule it out so quickly that it doesn’t even register as a possibility. That sort of reasoning is beyond what Watson can do. In that sense it’s behaving like a child, who might just leave porcupine as a candidate answer, lacking the knowledge and experience to toss it.

No one will be mistaking a computer for a human any time soon, though Watson probably is the closest we’ve come to something that could pass the Turing test. However well it can do at Jeopardy!, and from the perspective of points it’s doing fabulously (note how skilled it was at pulling all three Daily Doubles), it would quickly fall on its avatar-face if we actually tried to converse with it.

Tuesday, February 15, 2011

Watson’s first day

Interesting.

Watson did very well on its first day. In order to have time to explain things and introduce the concept of Watson, they set it up so that only two games are played over the three days. The first day was for the first round, and the second day (this evening) will have Double Jeopardy and Final Jeopardy.

It wasn’t surprising that there were a few glitches, where Watson didn’t fully get the question — for instance, answering leg, rather than missing a leg, in describing the anatomical oddity of an Olympic winner. And, as we knew might happen, Watson repeated an incorrect answer from Ken Jennings, because the computer has no way to know what the other contestants have said.

What I found interesting, though, is that Watson does have a very strong advantage with the buzzer. Despite the attempts to smooth that out by setting up a mechanical system whereby Watson sends a signal to cause a button to be physically pushed, and despite whatever the humans can do through anticipation, it’s clear that people just can’t match the computer’s reactions. Almost every time Watson was highly confident of its answer — a green bar (see below) — it won the buzz. Surely, on things like the names of people in Beatles songs, Mr Jennings and Mr Rutter were as confident of the answer as Watson was, and had the answers ready well before Alex finished reading. Yet Watson won the buzz on every one of those.

It was fun to have a little of Watson’s thought process shown: at the bottom of the screen, we saw Watson’s top three answer possibilities, along with its confidence for each, shown as a percentage bar that was coloured red, yellow, or green, depending upon the percentage. That was interesting whether or not Watson chose to buzz in. On a Harry Potter question for which the answer was the villain, Voldemort, Watson’s first answer was Harry Potter — it didn’t understand that the question was looking for the bad guy, even though the whole category related to bad guys. But its confidence in the answer was low (red, and well below the buzz threshold), it didn’t buzz in, and Mr Rutter gave the correct answer (which had been Watson’s second choice).

Of course, they didn’t use any audio or video clues, according to the agreement — Watson can neither hear nor see — but they didn’t seem to pull any punches on the categories or types of questions. It feels like a normal Jeopardy! game.

Oh, and by the way: the TiVo has it marked as copy-protected, so I can’t put it on a DVD. Damn. I don’t know whether regular Jeopardy! games are that way or not; I’ve never recorded one before.

Monday, February 14, 2011

Government oversight of the Internet

Now that the protests in Egypt have led to a change in leadership (an outcome that seemed inevitable for a while, though now-former-President Mubarak denied that it would happen), I want to go back and look at a key event during the last few weeks, when the Egyptian government disconnected the country from the Internet:

It appears that removing an entire country from the internet is surprisingly easy, by making changes in a system known as the border gateway protocol (BGP). This system is used by ISPs and other organisations to connect to each others’ networks, so the Egyptian government just had to order ISPs to alter the BGP routing tables to make external connections impossible.
Looking at BGP data we can confirm that according to our analysis 88 per cent of the ‘Egyptian internet’ has fallen off the internet, reports Andree Tonk of BGPmon, a site dedicated to monitoring changes in the BGP. A recent report for the OECD cited the BGP as a weak point in online infrastructure that needs to be secured — a prediction that seems to have now come true.

As the report makes clear, it’s not technically difficult, at least not for a relatively small country with a relatively centralized connection to the Internet. And we see countries such as China and Iran using similar techniques to do more selective blocking (the latter has, I understand, responded to the events in Tunisia and Egypt by joining the former in blocking access to blog sites such as this one). The issue isn’t technical, but one of policy: is the government allowed to cut off the Internet?

Of course, with countries where the government makes its own authority, the answer is always Yes. But what about in the U.S., where the government was limited, at least through the end of the 20th century, to abiding by its constitution, legislation, and a judicial system?

For one answer to that question, we can look to Senator Joe Lieberman of Connecticut, who, along with Senators Susan Collins (Maine) and Tom Carper (Delaware), introduced legislation to enhance the security and resiliency of the cyber and communications infrastructure of the United States.

The Protecting Cyberspace as a National Asset Act of 2010, S.3480 (here’s a PDF of the latest version as of this writing) was introduced last June and was entirely replaced by Senator Lieberman in December (you have to go to the bottom of page 197 of the PDF to see the new version). The December version was reported to the Senate from the Committee on Homeland Security and Governmental Affairs, which Mr Lieberman chairs (and on which his cosponsors sit). It’s now on the Senate’s legislative calendar. (The corresponding House bill is H.R.5548.)

The bill, if it should become law, would create a new operational entity within [the Department of Homeland Security]: the National Center for Cybersecurity and Communications (NCCC).

The NCCC would be led by a Senate-confirmed Director, who would regularly advise the President regarding the exercise of authorities relating to the security of federal networks. The NCCC would include the United States Computer Emergency Response Team (US-CERT), and it would lead federal operational efforts to protect public and private sector networks. The NCCC would detect, prevent, analyze, and warn of cyber threats to these networks.

The bill creates, in addition to the NCCC, quite a number of offices, councils, task forces, and programs, some of which make sense and some of which probably don’t. It creates the Office of Cyberspace Policy, whose Director is appointed by and reports to the President. It creates the Federal Information Security Taskforce, comprising executives and representatives from more than a dozen government agencies. And so on.

The entire bill is quite extensive, running well over 200 pages. And what’s frightening about it is that it puts the U.S. government right in the middle of the operation and management of the Internet within the United States and its territories — and keep in mind how central U.S. operations and U.S.-based services are to the Internet as a whole. It’s difficult to understand the effect that all this new administration will have on the operation of the Internet within the U.S., and the effect that it could have if it’s mismanaged, if it tries to respond to perceived threats, if it’s affected by right-wing zealots or other dubious elements that inhabit the U.S. political community.

I have read the bill’s summary, along with parts of the bill itself, but haven’t had time to read the whole bill yet. It’s not clear how bad it could be, nor, indeed, whether it will be bad at all... but I’m very skeptical of the result of putting such a large set of deep layers of U.S. government bureaucracy in the middle of the operation and management of the Internet. And I’m deeply worried about giving authority to make operational decisions to people who have insufficient technical knowledge to understand the ramifications of those decisions, who may have political or ideological motivations that do not coincide with what’s best for the Internet, and who can implement their decisions without the checks-and-balances oversight that protects us in other parts of our lives.

I have lots more reading to do.

Sunday, February 13, 2011

Jeopardy! tomorrow

Monday through Wednesday are the days when the Jeopardy! games will air that pit IBM Research’s Watson computer against former champions Ken Jennings and Brad Rutter.

My TiVo is set to record them, and it’s also recorded last week’s NOVA program, Smartest Machine on Earth (which you can watch on the PBS site). I’m eager to see how the games, recorded last month, came out.

Update, 15 Feb, answer to Nathaniel’s question in the comments: Ken Jennings says this, on his blog:

On Twitter, Watson (okay, his human handlers) have said that video will be posted on Watson’s website on Thursday, for those unable to watch one or more of the games live. You know: non-Americans, the gainfully employed, the Tivo-less, those with significant others expecting a romantic night out tonight instead of a quiz show, etc.

Friday, February 11, 2011

And visions of greengage plums dance in my head

A week ago, New Scientist told us about some new research technology by Toshiba, a system that recognizes fruits and veg at the self-checkout station:

Its system, developed by Susumu Kubota and his team at Toshiba’s research centre in Kawasaki, Japan, uses a webcam, image recognition and machine-learning software to identify loose goods, such as fruit. The company claims the system can tell apart products that look virtually identical, by picking up slight differences in colour and shape, or even faint markings on the surface.
When shoppers want to buy, say, apples at existing self-service checkouts they must choose the right product from a long list of pictures on a screen. Toshiba’s technology, part of which was presented last year at the 11th European Conference on Computer Vision in Chersonissos, Greece, compares the image captured by the webcam against a database of images and detailed information on the item’s appearance. The software uses an algorithm to produce a list of pictures of similar items, with its choice for the closest match at the top. If this choice is the correct one, the checkout user presses a button to confirm the purchase.

The system isn’t quite ready yet, and Toshiba hopes to commercialise the system within three years. They note, Similar ideas designed to identify products without barcodes have never made it to market in the past.

Indeed. Let’s go back to this item from 2003, where USA Today talks about some IBM research, including a system called Veggie Vision:

Researchers at IBM recently assembled several of the high-tech machines for a demonstration at their Industry Solutions Lab in Hawthorne. Among them were the smart shopping cart, a computerized produce scale called Veggie Vision, and a fascinating projection tentatively dubbed the Everything Display.
[...]
There doesn’t seem to be any controversy about Veggie Vision, a scale for fruits and vegetables that is hooked up to a digital camera and a library of hundreds of pictures of produce. When a shopper puts tomatoes on the scale, the machine evaluates their color, texture and shape to determine what they are, then weighs and prices the purchase.
Not only can it tell an apple from a tomato, but unlike some checkout clerks, it can tell a McIntosh apple from a Red Delicious.

Sound familiar? It did to me, because I knew some of the people who worked on Veggie Vision, colleagues at IBM’s T.J. Watson Research Center. And, while the USA Today article is from 2003, the conference papers about Veggie Vision, as well as the patents covering the technology, are from 1996 and 1997 (see this page for the IBM Research description, and links to the papers and the patents). It’s all there, complete with reading through the bag and machine learning.

I remember being impressed with the system (and the cool name), back when my colleagues were working on it and demonstrating it within the research lab. We had a good one, thought I, and according to the IBM Research web page, The system is now ready for prime time, and its developers have signed field test agreements with two scanner manufacturers and one company that makes self-checkout systems.

So, what happened? Why isn’t the IBM system out there at all the self-checkout stations? Why is Toshiba making the science-and-technology news for re-inventing what IBM had ready for market ten years ago? I’d hate to see Toshiba get the credit for what my IBM colleagues did so much earlier.

I have no information about that, alas... only the vague frustration that I often found, where good research projects would never seem to go where we thought they should, after they left the lab.

The thing that hath been, it is that which shall be;
and that which is done is that which shall be done:
and there is no new thing under the sun.
— Ecclesiastes, chapter 1, verse 9 (King James Version)

Thursday, February 10, 2011

Foiling offline password attacks

Jarno, at F-Secure — an excellent Finnish anti-malware company — has posted a nice analysis of encoding password files. Because he assumes some knowledge of the way things work, I’ll try to expand a bit on that here. Some of this has been in these pages before, so this is a review.

A cryptographic hash algorithm is a mathematical algorithm that will take some piece of data as input, and will generate as output a piece of data — a number — of a fixed size. The output is called a hash value, or simply a hash (and it’s sometimes also called a digest). The algorithm has the following properties:

It’s computationally simple to run the algorithm on any input.
Given two different inputs, however similar, the hashes will almost certainly be different, and it’s computationally infeasible to find any two inputs that share a hash (it is collision resistant).
Given a hash value, it’s computationally infeasible to determine an input that will generate that hash (it is preimage resistant).
Given an input, it’s computationally infeasible to choose another input that gives the same hash (it has second preimage resistance).

Cryptographic hash algorithms go by names like MD5 (for Message Digest) and SHA-1 (for Secure Hash Algorithm), and they’re used for many things. Sometimes they’re used to convert a large piece of data into a small value, in order to detect modifications to the data. They’re used that way in digital signatures. But sometimes they’re just used to hide the original data (which might actually be smaller than the hash value).
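A quick way to see the fixed-size output and the sensitivity to small changes is to try it. This sketch uses the SHA-1 implementation in Python’s standard hashlib module; the input strings are arbitrary examples:

```python
import hashlib

def sha1_hex(text: str) -> str:
    """Return the SHA-1 hash of the text as 40 hexadecimal characters (160 bits)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

short_hash = sha1_hex("password")
near_hash = sha1_hex("passwore")       # one character different
long_hash = sha1_hex("x" * 100_000)    # a much larger input

print(len(short_hash), len(near_hash), len(long_hash))  # 40 40 40
print(short_hash == near_hash)                          # False
```

Whatever the size of the input, the output has the same length, and an eight-character input produces a hash longer than itself.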

Unix systems used to store user names and passwords in a file called /etc/passwd, with the passwords hashed to hide (obfuscate) them. A standard attack was to find a way to get a copy of a system’s /etc/passwd file, and try to guess the passwords offline. If you know what hash algorithm they’re using, that’s easy: guess a password, hash it, then look in the /etc/passwd file to see if any user has that hash value for its password.
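To make the guess-hash-compare loop concrete, here is a minimal sketch. The stolen file is a made-up dictionary of user names and unsalted SHA-1 hashes, not the real /etc/passwd format, and the users and passwords are invented:

```python
import hashlib

def unsalted_hash(password: str) -> str:
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

# Hypothetical stolen password file: user name -> unsalted hash.
stolen = {
    "alice": unsalted_hash("123456"),
    "bob": unsalted_hash("a long and unusual passphrase"),
    "carol": unsalted_hash("password"),
}

def crack(stolen, guesses):
    """Hash each guess once and report every user whose stored hash matches."""
    users_by_hash = {}
    for user, digest in stolen.items():
        users_by_hash.setdefault(digest, []).append(user)
    found = {}
    for guess in guesses:
        for user in users_by_hash.get(unsalted_hash(guess), []):
            found[user] = guess
    return found

print(crack(stolen, ["password", "123456", "qwerty"]))
# alice and carol are found; bob's password is not in the guess list
```

Each guess is hashed only once and checked against every user at the same time, which is what makes unsalted files so cheap to attack.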

Nowadays, most systems have moved away from storing the passwords that way, but there are still services that do it, there are still ways of snatching password files, and the attack’s still current. Jarno’s article looks at some defenses.

Salting the hashed passwords involves including some other data along with the password when the hash is computed, to make sure that two different users who use the same password will have different hashes in the password file. That prevents the sort of global attack that says, Let’s hash the word ‘password’, and see if anyone’s using that. Of course, if the salt is discoverable (it’s the user name, or something else that’s stored along with the user’s information), users’ passwords can still be attacked individually.
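Here is a minimal sketch of salting, assuming a random per-user salt that is simply prepended to the password before hashing (one common arrangement, not the only one). The record layout is hypothetical:

```python
import hashlib
import os

def salted_hash(password: str, salt: bytes) -> str:
    """Hash the salt followed by the password."""
    return hashlib.sha1(salt + password.encode("utf-8")).hexdigest()

def make_record(password: str) -> dict:
    """Build a hypothetical password-file entry: the salt is stored next to the hash."""
    salt = os.urandom(16)
    return {"salt": salt, "hash": salted_hash(password, salt)}

alice = make_record("password")
bob = make_record("password")

# Same password, different salts, so the stored hashes differ.
print(alice["hash"] == bob["hash"])  # False (barring an astronomically unlikely salt collision)

# The salt is not secret, so one user can still be attacked individually.
print(salted_hash("password", alice["salt"]) == alice["hash"])  # True
```

The salt stops one hashed guess from being checked against everyone at once, but it does nothing to make a single guess more expensive.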

Even using individual attacks, it’s long been easy to crack a lot of passwords offline: we know that a good portion of people will use one of the 1000 or so most popular passwords (password, 123456, and so on), and it never has taken very long to test those. Even if that only nets the attacker 5% of the passwords in the database, that’s pretty good. But now that processors are getting faster, it’s feasible to test not only the 1000 most popular passwords, but tens or hundreds of thousands. All but the best passwords will fall to a brute-force offline attack.

The reason offline attacks are important is that most systems have online protections: if, as an attacker, you actually try to log in, you’ll only be allowed a few tries before the account is locked out and you have to move on to another. But if you can play with the password file offline, you have no limits.

Of course, the best defense is for a system administrator to make sure no one can get hold of the system’s or the service’s password file. That said, one should always assume that will fail, and someone will get the file. Jarno suggests the backup defense of using different salt values for each user and making a point of picking a slow hash algorithm. The reasoning is that it doesn’t make much difference if it takes a few hundred milliseconds for legitimate access — it doesn’t matter if a login takes an extra quarter or half second — but at a quarter of a second per attempt, it will be much harder for an attacker to crack a bunch of passwords on the system.
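As a rough sketch of that backup defense, here is a per-user salt combined with a deliberately slow derivation. I'm using PBKDF2 from Python's standard library as one possible slow construction; the iteration count, the function names, and the choice of SHA-1 underneath are assumptions for illustration, not a recommendation from Jarno's article.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed work factor; a real system would tune this to its own hardware.
ITERATIONS = 200_000


def hash_password(password: str, salt: Optional[bytes] = None,
                  iterations: int = ITERATIONS) -> Tuple[bytes, int, bytes]:
    """Return (salt, iterations, digest) for storage in the user's record."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for each user
    digest = hashlib.pbkdf2_hmac("sha1", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def verify_password(password: str, salt: bytes, iterations: int,
                    expected: bytes) -> bool:
    """Recompute the slow hash and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha1", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)
```

A legitimate login pays the cost once. An attacker working through a list of guesses pays it once per guess, per user.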

Just two small points:

First, Jarno recommends specific alternatives to SHA-1, but he doesn’t have it quite right. PBKDF2 and HMAC are not themselves hash algorithms. They are algorithms that make use of hash algorithms within them. You’d still be using SHA-1, but you’d be wrapping complexity around it to slow it down. That’s fine, but it’s not an alternative to SHA-1.

The same is the case for bcrypt, only, worse, bcrypt uses a non-standard hash algorithm within it. I would not recommend that, because the hash algorithm hasn’t been properly vetted by the security community. We don’t really know how its cryptographic properties compare with those of SHA-1.

Second, Jarno suggests that as processors get faster, the hashing can be changed to maintain the time required to do it. He’s right, but that still leaves an exposure: because the server doesn’t have the passwords (only the hashes of the passwords), no hash can be changed until the user logs in. If the system doesn’t lock out unused accounts periodically, those unused accounts become weak points for break-ins over time.
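To make that exposure concrete, here is a sketch of the upgrade-at-login pattern. The record layout (a dict with salt, iterations, and digest) and the target count are hypothetical; the point is only that the upgrade needs the plaintext password, which the server sees only when the user logs in.

```python
import hashlib
import hmac
import os

# Assumed current work factor, raised over time as processors get faster.
TARGET_ITERATIONS = 400_000


def derive(password: str, salt: bytes, iterations: int) -> bytes:
    """Slow hash of the password (PBKDF2 over SHA-1, chosen for illustration)."""
    return hashlib.pbkdf2_hmac("sha1", password.encode("utf-8"), salt, iterations)


def login(record: dict, password: str, target: int = TARGET_ITERATIONS) -> bool:
    """Check the password; on success, re-hash it if the stored work factor is stale."""
    candidate = derive(password, record["salt"], record["iterations"])
    if not hmac.compare_digest(candidate, record["digest"]):
        return False
    if record["iterations"] < target:
        # Only here, with the plaintext in hand, can the stored hash be strengthened.
        record["salt"] = os.urandom(16)
        record["iterations"] = target
        record["digest"] = derive(password, record["salt"], target)
    return True
```

An account whose owner never logs in keeps whatever work factor it had when the hash was last written, which is the weak point described above.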

That said, this is sound advice for system administrators and designers. And perhaps at least a little interesting to some of the rest of you.

Wednesday, February 09, 2011

What to teach children

On Tuesday, a local radio talk show hosted by Brian Lehrer included a call-in segment about sleep-over parties for children. It seems that some parents don’t allow their children to host or to attend them. Who knew? The guest for the segment was a pediatrician called Perri Klass.

I wasn’t especially interested in the topic (and some of the commenters on the web page agree in ridiculing tones), and I’m not especially interested in talking about it here. But I happened to be in my car and I heard it... and what did interest me was the last caller, Max in Larchmont. Here’s my transcript starting at about 12:35 into the audio stream:

Max: I’m calling because I’m wondering if the doctor has heard about people having problems with religious and political differences. I have three kids, and when they sleep over at other people’s houses, especially if they’re religious... my wife and I, we teach our children that religion is a pernicious force in the world, and is a terrible thing, and sometimes the parents of other kids get upset if my kids tell them that while they’re doing their prayers or something.
Lehrer: Well, while they’re doing their prayers may not be very nice. But, all right, so how do you handle it... Max, how do you handle it?
Max: We just... I don’t know if the doctor agrees, I think children should be legally shielded from religion until they’re sixteen. I think it’s crazy to expose children to superstitious ideas like that; it makes them dumb. And I think a lot of the kids are swayed when my kids meet their kids and they stop going to Hebrew school and so forth, and I think that’s a good thing.
Lehrer: Thank you, Max. Doctor Klass, any response?
Klass: Well, I’ll just make a general response, which is that if you’re gonna let your children go over to other people’s houses, either for sleep-overs or during the day, you’re gonna have to teach ’em to be good guests. Leaving aside your politics and leaving aside your religious issues, if you’re going to go into somebody’s house and you’re going to accept their hospitality, part of growing up is learning to be a good and respectful guest. Now, that doesn’t mean that you have to agree with things that you absolutely don’t agree with, and it doesn’t mean you have to necessarily join in practices which aren’t yours, but you do have to learn how to be polite, or, in the great way of the world, you won’t be invited back.

Some of the last batch of comments talk about Max’s call (unfortunately, they’re not numbered, but start with MP from Brooklyn at 11:58, and read up from there). Some think it’s a prank call, and not real. Some support the attitude (Robin from NYC, YZ from Brooklyn). One, Samantha from Sunny Riverdale, seems to think the kids should be shunned for their parents’ attitude. That certainly seems the good, Christian thing to do, eh?

I agree with Max: as I’ve said many times, I consider religious indoctrination to be tantamount to child abuse. Teaching children made-up nonsense as truth, whether it be...

about Xenu the space dictator abducting his citizens, bringing them to Earth, and then killing them by blowing up volcanoes, or
about Apollo driving his chariot across the sky, carrying the sun through the day, or
about a talking snake convincing a primordial couple to sin by eating the wrong fruit, or
about a virgin who had been separated from that original sin giving birth to God’s son, who was then tortured to death but rose from the dead to rule in heaven, or
about Isildur defeating Sauron and severing his finger (and ring) in the Battle of Dagorlad,

...is ludicrous, and, yes, often makes them dumb. It certainly ill prepares them to think critically, when we demand that they accept preposterous stories without question, simply because it is written, it’s God’s word, and they must have faith. We spend far too much time either actively promoting belief in fantasy or passively allowing it to interfere with the education we need to be giving children — see, for example, this article.

All that said, though, I agree with Dr Klass: we don’t call people our friends, go to their houses, eat their food, sleep in their beds, and tell them, while we’re there, that their beliefs are stupid and ridiculous. Whatever we think, and however public we are about it otherwise, when we’re invited to people’s homes we make a choice: we decline the invitation if we’re unwilling to be civil, or we accept the invitation and stay clear of things that we know will upset them.

And, so, it’s a pity that Max and his wife have what I think is an admirable approach to teaching their children sense and reason... and yet have chosen not to teach them civility and the polite behaviour of a guest.

Tuesday, February 08, 2011

AOL to acquire dumpster full of garbage

Yesterday, the New York Times reported that AOL will pay $315 million for the Huffington Post, using this headline:

Betting on News, AOL Is Buying The Huffington Post

The problem here is that the HuffPo hasn’t been news in several years, if it ever was at all. I used to follow it in my feed reader, occasionally finding things of interest, but at least for the last three of its less than six years, it’s just been full of pointers to other people’s news, inane commentary, new-age silliness, quackery, and other junk. I stopped following it at all well over two years ago.

AOL apparently hasn’t. To be sure, there are things to be found there that are worth reading — I just don’t find it worth panning through the pebbles to find those few bits of pyrite, and there certainly isn’t anything that rates as gold. But with AOL’s content coming up even emptier, I guess the acquisition will be some sort of a boost, at least.

But news? Not unless something changes. Not unless Ms Huffington tosses the likes of Deepak Chopra and the other crazies that post there, and goes back to the substantive commentary that she used to have more of than now.

And it will be up to Ms H, indeed; according to the report:

Arianna Huffington, the cable talk show pundit, author and doyenne of the political left, will take control of all of AOL’s editorial content as president and editor in chief of a newly created Huffington Post Media Group. The arrangement will give her oversight not only of AOL’s national, local and financial news operations, but also of the company’s other media enterprises like MapQuest and Moviefone.
By handing so much control over to Ms. Huffington and making her a public face of the company, AOL, which has been seen as apolitical, risks losing its nonpartisan image. Ms. Huffington said her politics would have no bearing on how she ran the new business.

Well, best of luck to AOL’s new Huffington Post Media Group, but I, at least, am more skeptical than the HuffPo has ever been.

Friday, February 04, 2011

The Internet is falling!

The big Internet tech news this week is that the last block of Internet addresses, for the version of the Internet Protocol (IP) that we mostly use (IPv4), has been allocated. Or, as the headlines are saying, we have now run out of Internet addresses. Of course, it’s filled the tech media, as above, but it’s shown up in the mainstream press as well; here it is from the New York Times, and from The Guardian.

What does it really mean, that we’ve run out of IPv4 addresses?

Well, for one thing, it doesn’t mean that we’ve run out of IPv4 addresses. The Times gets it better than the other articles, in its headline:

The Last Block of IPv4 Addresses Allocated

The last address has not been assigned, not by a long shot. IPv4 addresses are allocated to organizations in large blocks — sometimes blocks of 60,000 or so, sometimes blocks of more than 16 million. Those organizations then assign addresses within those blocks, sometimes individually and sometimes in sub-blocks. What has just happened is that the last large block of addresses has been allocated. There are still many, many IPv4 addresses available for assignment, within many of the blocks that have been allocated.

For example, IBM has a 16-million-plus block of addresses comprising all addresses that start with 9 (that is, every address of the form 9.x.x.x; they also have some of the 129.x.x.x range). Those 9.x.x.x addresses are assigned within the company’s network. Not all of them are assigned, of course; there aren’t more than 16 million devices within the company.
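The block sizes are easy to check with Python's standard ipaddress module. The 9.0.0.0/8 block is the one described above; the /16 block is an arbitrary private-range example I picked to show the smaller size, not any particular organization's allocation.

```python
import ipaddress

# Every address of the form 9.x.x.x is the single block 9.0.0.0/8.
ibm_block = ipaddress.ip_network("9.0.0.0/8")

# An arbitrary /16 block (private address space), just to show the smaller size.
small_block = ipaddress.ip_network("10.20.0.0/16")

print(ibm_block.num_addresses)    # 16777216, the "more than 16 million"
print(small_block.num_addresses)  # 65536

# Membership test: does a given address fall inside the block?
print(ipaddress.ip_address("9.1.2.3") in ibm_block)    # True
print(ipaddress.ip_address("10.1.2.3") in ibm_block)   # False
```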

Similarly, Internet service providers, such as Comcast and Verizon, have large blocks of their own, some for use within the company, and some to provide to their customers.

Many companies have blocks that are much larger than they need, far more than they could ever imagine using for their normal networks. Those blocks were allocated to them in earlier times, before the worldwide web and the explosion of Internet usage, when we never thought it would matter. Or they were assigned later, when we assumed that IPv6, with many orders of magnitude more addresses, would be well deployed by now. (I’ll note that it would be very difficult, even though large portions of the allocated blocks remain unused, to reclaim the unused bits and to reallocate them.)
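For a sense of what "many orders of magnitude" means, here is the arithmetic, again with the standard ipaddress module. The variable names are mine.

```python
import ipaddress
import math

# The whole IPv4 space is 32 bits of address; the whole IPv6 space is 128 bits.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

# How many entire IPv4 Internets would fit into the IPv6 address space?
ratio = ipv6_total // ipv4_total
orders_of_magnitude = math.log10(ratio)

print(ipv4_total)                     # 4294967296
print(ratio == 2 ** 96)               # True
print(round(orders_of_magnitude, 1))  # 28.9
```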

Let’s not be Chicken Little, here; the sky is not falling, and the Internet is not imminently doomed. Indeed, the Internet will mostly run fine, as it is, for many years yet. We’ll all be able to read our email, buy from Amazon and eBay, use Facebook, and see YouTube videos.

Eventually, we’ll be crowded out by expanding Internet use, though we have techniques to keep that at bay for a long time. What will be blocked by this are — and this should be a familiar refrain to readers here — new applications, new uses of the Internet. To move into the future, beyond email and eBay, Facebook and YouTube, we need to move to IPv6.

We have enough IPv4 addresses for now, and for a while, to accommodate putting every computer on the Internet, as long as we’re thinking of computers as we have been: desktops and laptops. Maybe iPads, too. But now add Kindles and other eBook readers. Add smart-phones. Consider that every mobile phone is a smart-phone. Do we have enough v4 addresses for all of that?

Now move into the Internet of things: add every car, because our cars need to be online. Add every television (they’ll stream video directly), every stereo receiver (streaming music, radio stations, and other audio from the Internet), every portable music player from boom box to iPod Nano. Are we getting there? Include appliances: alarm clocks, refrigerators, coffee makers. Include home- and building-automation targets: thermostats, light switches, and so on. Put in sensor networks, traffic-control and monitoring systems....

Well, given all that, we ran out of v4 addresses long ago. It’s not really the v4 address-space depletion that should be driving the move to IPv6, but the need for more address space for future applications. If you don’t think that sort of thing is important, consider this news item about electric-grid problems stemming from the recent ice storm in Texas:

FORT WORTH, Texas — A high power demand in the wake of a massive ice storm caused rolling outages for more than eight hours Wednesday across most of Texas, resulting in signal-less intersections, coffee houses with no morning java and some people stuck in elevators.
The temporary outages started about 5:30 a.m. and ended in the afternoon, but there is a strong possibility that they will be required again this evening or tomorrow, depending on how quickly the disabled generation units can be returned to service, the chief operator of Texas’ power grid said in a release.

Consider the potential consequences of intersections without traffic signals and people stuck in elevators. We’d like to shut the power down in an area selectively, killing most of it but leaving the elevators running (at least until they open on the next floor), leaving a trickle of emergency lighting, leaving the traffic lights running. We can do that, if everything’s addressable, and the power control system is set up to allow distribution with sufficient granularity.

But if it takes a Chicken Little scare — The Internet is falling! The Internet is falling! — to get IPv6 out there, well, here it comes.

Thursday, February 03, 2011

The scions of Sagan

Neil deGrasse Tyson and Brian Greene are the Carl Sagans of the 21st century. I’m happy that Carl Sagan’s 1980 TV series, Cosmos, is available for streaming on Netflix, and I’ll soon re-watch it.

Neil deGrasse Tyson hosts the TV program NOVA ScienceNOW, and gives wonderfully amusing and insightful talks, some of which are on YouTube. Brian Greene did a PBS program about string theory, called The Elegant Universe, based on his book of the same name. Both scientists have a way of making science interesting and understandable to the general public, and both are excellent, charismatic speakers. If you have a chance to see or hear either of them, do it.

A week ago, or so, Dr Greene was on the radio program Fresh Air, in a 34-minute segment called A Physicist Explains Why Parallel Universes May Exist (promoting his new book, of course, The Hidden Reality: Parallel Universes and the Deep Laws of the Cosmos). Go give it a listen.