Thursday, July 20, 2006

Digital signatures, part 4

This is a continuation of the Digital Signatures overview series.


3.0  Digital Signatures

At the simplest level, a digital signature of a message is just a hash of the message, encrypted with the sender's private key. The recipient sees who the sender purports to be, gets that sender's public key, and decrypts the hash value. The recipient then hashes the message herself and compares the result with the value she decrypted. If they match, she knows two things[1]:

  1. The message was, in fact, signed by the purported sender.
  2. The message was not altered after it was signed.

In fact, the digital signature has somewhat more information, but that is the core of it.
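
The hash-then-sign idea can be sketched in a few lines of Python. This is only a toy: the tiny key below is an assumption made up for illustration, the helper names (message_hash, sign, verify) are hypothetical, and real systems use far larger keys together with a padding scheme that is left out here.

```python
import hashlib

# Toy RSA parameters, chosen only for illustration (far too small for real use).
P = 2**61 - 1          # a Mersenne prime
Q = 2**89 - 1          # another Mersenne prime
N = P * Q              # public modulus
E = 65537              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def message_hash(message: bytes) -> int:
    """Hash the message and reduce it into the range the toy modulus can handle."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") % N


def sign(message: bytes) -> int:
    """Sender: transform the hash with the private key."""
    return pow(message_hash(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Recipient: recover the hash with the public key and compare it with her own."""
    return pow(signature, E, N) == message_hash(message)


original = b"Meet me at noon."
signature = sign(original)

print(verify(original, signature))            # the untouched message verifies
print(verify(b"Meet me at one.", signature))  # an altered message does not
```

Only the holder of the private exponent can produce a value that the public exponent turns back into the right hash, which is what ties the signature to the sender.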

As it turns out, though, it's not quite that simple. Because of the way Internet email has evolved, changes often "happen" to email messages as they find their way through the Internet — changes that humans wouldn't consider to be of any significance, but that result in different hash values, and, thus, "break" digital signatures. Internet standards are required in order to have a hope of interoperability among email programs from different sources.

Signed and encrypted email has worked quite well for some time in proprietary email systems (Lotus Notes®, for example), and its use there is pretty much transparent to the users: they just check a box or select a menu item to have the message signed, and that's it. At the basic user-interface level, it's the same for Internet email. It's when we get below the surface that the problems crop up. We'll look at some of those issues here.

3.1  Formatting the Message

An Internet email message comprises two primary pieces: the header and the body. The header contains a set of header fields, each of which conveys information about the message — by whom it was written, the date it was written, to whom it was sent and copied, what its subject is, and so on. Note that none of the information in the header is authenticated, so essentially all of it is reliable only by convention (a convention flouted, of course, in spam). The body has the main content of the message, and it may be subdivided into body parts. All of this is explicitly defined in a set of RFCs, Internet standards specifications that are agreed to and published by the Internet Engineering Task Force (IETF).

While the standards that are normally used for Internet email work well for normal, unsigned messages, creating a reproducible hash of a message for signing is problematic. First, note that the header section is fairly volatile. Header fields are added to messages: sometimes at the front of the header section, sometimes at the end, and sometimes in the middle. Existing header fields are sometimes reordered at the convenience of the email program, since the standards call for the order to be insignificant in most cases. And header fields may be "folded" or "unfolded" at any time, allowing a program to split a long line into two, or to join two lines into a longer one. Any of these changes would cause the hash value to change if the headers were included in the hash computation.
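
To make the header problem concrete, here is a small sketch that hashes the same header information in different but equivalent forms. The header values and the helper names (sha256_hex, unfold) are made up for illustration; the unfolding rule assumed here is simply "remove a line break that is followed by a space or tab".

```python
import hashlib
import re


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of an ASCII header string as hex."""
    return hashlib.sha256(text.encode("ascii")).hexdigest()


def unfold(header: str) -> str:
    """Join folded lines: drop a CRLF that is followed by a space or tab."""
    return re.sub(r"\r\n(?=[ \t])", "", header)


# The same Subject field, once on a single line and once folded in two.
one_line = "Subject: Quarterly report for the northern region\r\n"
folded = "Subject: Quarterly report\r\n for the northern region\r\n"

# The same two fields in a different order.
from_first = "From: alice@example.com\r\nTo: bob@example.com\r\n"
to_first = "To: bob@example.com\r\nFrom: alice@example.com\r\n"

print(sha256_hex(one_line) == sha256_hex(folded))          # folding changes the hash
print(sha256_hex(from_first) == sha256_hex(to_first))      # so does reordering
print(sha256_hex(one_line) == sha256_hex(unfold(folded)))  # unfolding restores the match
```

A human reader would treat each pair as identical, but a hash over the raw bytes does not.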

The body of the message is also sometimes changed in transit, though this is less common today than it once was. The primary source of problems with the body today is the addition or removal of one or more trailing "line-end sequences": pairs of characters called CR ("carriage return") and LF ("line feed"), names that are relics from the days of typewriter-like computer terminals. To the user, an extra CR/LF at the end of a message body appears simply as an extra "blank line", and is almost always insignificant. To a hash function, it makes this a different message, and it makes the signature unverifiable.
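
The effect of a single trailing CR/LF is easy to demonstrate. In the sketch below, the message text is invented, and normalize_trailing_line_ends is a hypothetical helper showing one possible way to make the hash insensitive to such changes; it is not a description of what any particular standard specifies.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of the given bytes as hex."""
    return hashlib.sha256(data).hexdigest()


def normalize_trailing_line_ends(body: bytes) -> bytes:
    """Hypothetical cleanup: strip trailing CR/LF characters, then end with one CRLF."""
    return body.rstrip(b"\r\n") + b"\r\n"


body = b"See you at the meeting.\r\n"
body_with_blank_line = body + b"\r\n"   # what a relay might add in transit

# To a reader these are the same message; to the hash function they are not.
print(sha256_hex(body) == sha256_hex(body_with_blank_line))

# Hashing a normalized form of each makes the two agree again.
print(sha256_hex(normalize_trailing_line_ends(body))
      == sha256_hex(normalize_trailing_line_ends(body_with_blank_line)))
```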

There's also the question of communicating the signature itself, along with information about how the signature was created (which hash function was used, which encryption algorithm, and so on), so that the recipient can find the signature and verify it.

To deal with these problems, two competing Internet standards for secure messages — S/MIME and OpenPGP — take similar approaches, dividing the message body into a specific set of body parts (for S/MIME) or self-defined segments (for OpenPGP) that convey the appropriate information and make the message as robust as possible against incidental and innocuous changes in transit.


Next time: Key Distribution and Certificates


[1] For this and the rest of the discussion, we'll assume that all private keys are properly secured and have not been compromised, that the public keys were obtained properly, and that the associated cryptographic algorithms are robust and properly implemented.
