Wednesday, February 20, 2008

Who's to blame when programs don't play well together?

Last week, Raymond Chen blogged about misplaced blame for software interoperability problems:

It followed the usual pattern. First, there’s “It works on that other system but not on yours, so it’s obviously your fault,” followed by the invective “You suck.”

Next, the development team scrambles to study the problem and the investigation reveals that the bug was in the app after all. But that doesn’t get the platform off the hook.

Microsoft, where Raymond works, gets that sort of complaint more often than most, of course. But we also run into it quite often in Internet standards work, so I want to say more about it here.

You might think that Internet standards specifications are necessarily very specific (hence, “specifications”), spelling everything out and leaving nothing to chance. Unfortunately, the process of arriving at consensus often leaves us with features that different people want to handle in different ways. So we sometimes leave certain details to the implementors’ discretion.

There’s a meta-specification, RFC 2119, that defines language that’s used in other IETF specifications. It explains what terms like MUST, SHOULD, and MAY mean — capitalized like that — when they’re used in the IETF’s documents. The idea is that when a standards document says something like, “clients MUST NOT [do this], and servers SHOULD ignore [this] if they receive it,” it’s trying to maximize interoperability — trying to make sure that client implementations (usually, the programs you run on your computer, such as web browsers and email programs) and server implementations (usually, the programs that provide the services you need, like web servers and email servers) work together smoothly.

The problems show up when SHOULD and MAY allow behaviour that’s not predictable. In the example above, if a client program makes the choice of [doing this] and someone uses that program with a server that makes the choice of treating [this] as an error, against the advice of the SHOULD clause, we’ll get what we so charmingly call an interoperability problem: the programs won’t work together. In this case, of course, the client has violated a MUST NOT condition, and is clearly at fault according to the standard.
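Here’s a minimal sketch of that example in Python. Everything in it is made up for illustration: the field name `x-legacy-hint` stands in for “[this],” and the two server functions stand in for two implementations that read the SHOULD differently. It’s not taken from any real protocol.

```python
# Hypothetical field standing in for "[this]" in the text:
# clients MUST NOT send it, and servers SHOULD ignore it if they receive it.
FORBIDDEN_FIELD = "x-legacy-hint"


class ProtocolError(Exception):
    """Raised by a server that rejects a request."""


def lenient_server(request: dict) -> dict:
    """Follows the SHOULD: drops the field and processes the rest."""
    return {k: v for k, v in request.items() if k != FORBIDDEN_FIELD}


def strict_server(request: dict) -> dict:
    """Goes against the SHOULD: treats the field as an error."""
    if FORBIDDEN_FIELD in request:
        raise ProtocolError(f"unexpected field: {FORBIDDEN_FIELD}")
    return dict(request)


def works_with(server, request: dict) -> bool:
    """Report whether the server accepts the request."""
    try:
        server(request)
        return True
    except ProtocolError:
        return False


# A client that violates the MUST NOT by sending the field anyway.
bad_request = {"command": "fetch", FORBIDDEN_FIELD: "1"}

for server in (lenient_server, strict_server):
    print(server.__name__, works_with(server, bad_request))
```

The same non-compliant client works against one server and fails against the other, which is exactly the situation where the user sees “it works on that other system.”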

But there are plenty of cases where combinations of SHOULD [NOT] and MAY yield two programs that are technically compliant with the standard and yet don’t work well with each other. Because of that, I’m a strong proponent of not using SHOULD or MAY in most cases. In my opinion, standards specifications should, wherever possible, tell you what to do in order to comply with the standard. Anything left to an implementation decision is one more opening for an interoperability problem.
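To make that concrete, here’s a small sketch built on an invented rule pair: “senders MAY compress the payload” and “receivers MAY accept compressed payloads.” The rules, class names, and fields are all assumptions for illustration, not quotes from any real specification. Every combination of choices is compliant, but one pairing can’t talk.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Sender:
    # Hypothetical rule: "senders MAY compress the payload."
    compresses: bool


@dataclass(frozen=True)
class Receiver:
    # Hypothetical rule: "receivers MAY accept compressed payloads."
    accepts_compressed: bool


def interoperates(sender: Sender, receiver: Receiver) -> bool:
    """A pair works unless the sender compresses and the receiver can't cope."""
    return (not sender.compresses) or receiver.accepts_compressed


# Every one of these implementations complies with the (hypothetical) spec.
senders = [Sender(compresses=False), Sender(compresses=True)]
receivers = [Receiver(accepts_compressed=False), Receiver(accepts_compressed=True)]

# Collect the compliant pairings that nonetheless fail to work together.
failing = [(s, r) for s, r in product(senders, receivers) if not interoperates(s, r)]

for s, r in failing:
    print("compliant but incompatible:", s, r)
```

Had the invented spec said “receivers MUST accept compressed payloads,” the failing pair would simply be a non-compliant receiver, and there’d be no argument about where the fix belongs.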

Of course, even with everything laid out explicitly, implementors will always make mistakes. There’ll be misinterpretations of the spec, programming errors, unforeseen situations, and other sorts of things that will cause interoperability problems. And the blame usually gets shifted away from the component that’s most popular (or sometimes toward the component that’s least under the user’s direct control).

And that’s the effect that Raymond wrote about.

If a popular email program, for instance, does something that doesn’t exactly conform to the applicable standards, but most email servers cope with it, the error might easily be overlooked. Then, when you use that program with my server, which does not tolerate that particular quirk, we have a problem. Most likely, the problem will be blamed on my server. Most likely, protest though I might, I’ll have to change my server so it also copes with the situation.

Interestingly, it often doesn’t matter which side the problem is actually on. The fact is that the side that’s more easily fixed — or more easily pressured to do the fixing — is the one that will be pushed to do it, whether or not that’s right. It’s a frustration that most of us in the software world have come across at least once.

The late Internet pioneer and long-time IETF mover Jon Postel famously said what’s often quoted as, “Be conservative in what you do, liberal in what you accept from others.” The MUST NOT/SHOULD example above is consistent with that, and it’s generally good advice. Unfortunately, it sometimes results in problems being masked for a long time, as the liberal recipients let the errors slide.
