The discipline that gave the world digital communication and data compression, information theory, also put cryptography on a secure mathematical foundation.
Since 1948, when the paper that created information theory first appeared, most information-theoretic analyses of secure schemes depended on a common assumption.
However, that assumption is wrong, said a group of researchers at MIT and the National University of Ireland (NUI) at Maynooth in a paper presented at the International Symposium on Information Theory.
In a follow-up paper this fall at the Asilomar Conference on Signals and Systems, the same team will show the wireless card readers used in many keyless-entry systems may not be as secure as previously thought.
In information theory, the concept of information entwines with that of entropy. Two digital files might contain the same amount of information, but if one is shorter, it has more entropy. If a compression algorithm — such as WinZip or gzip — worked perfectly, the compressed file would have the maximum possible entropy. That means it would have the same number of 0s and 1s, and the way in which they distributed would be totally unpredictable. In information-theoretic parlance, it would be perfectly uniform.
Traditionally, information-theoretic analyses of secure schemes have assumed the source files are perfectly uniform. In practice, they rarely are, but they’re close enough it appeared the standard mathematical analyses still held.
“We thought we’d establish that the basic premise that everyone was using was fair and reasonable,” said Ken Duffy, one of the researchers at NUI. “And it turns out that it’s not.” On both papers, Duffy teams with his student Mark Christiansen; Muriel Médard, a professor of electrical engineering at MIT; and her student Flávio du Pin Calmon.
The problem, Médard said, is information-theoretic analyses of secure systems have generally used the wrong notion of entropy. They relied on Shannon entropy, named after the founder of information theory, Claude Shannon, who taught at MIT from 1956 to 1978.
The basis behind Shannon entropy is on the average probability that a given string of bits will occur in a particular type of digital file. In a general-purpose communications system, that’s the right type of entropy to use, because the characteristics of the data traffic will quickly converge to the statistical averages. Although Shannon’s 1948 paper dealt with cryptography, its primary concern was with communication, and it used the same measure of entropy in both discussions.
But in cryptography, the real concern isn’t with the average case but with the worst case.
A codebreaker needs only one reliable correlation between the encrypted and unencrypted versions of a file in order to begin to deduce further correlations. In the years since Shannon’s paper, information theorists have developed other notions of entropy, some of which give greater weight to improbable outcomes. Those, it turns out, offer a more accurate picture of the problem of codebreaking.
When Médard, Duffy and their students used these alternate measures of entropy, they found slight deviations from perfect uniformity in source files, which seemed trivial in the light of Shannon entropy, suddenly loomed much larger. The upshot is a computer turned loose to simply guess correlations between the encrypted and unencrypted versions of a file would make headway much faster than previously expected.
“It’s still exponentially hard, but it’s exponentially easier than we thought,” Duffy said. One implication is that an attacker who simply relied on the frequencies with which letters occur in English words could probably guess a user-selected password much more quickly than was previously thought. “Attackers often use graphics processors to distribute the problem,” Duffy said. “You’d be surprised at how quickly you can guess stuff.”