Copyright © 1999 by E. Gerck and MCG, first published on Jan 13, 1999 in the mcg-talk list server.
Archived at http://mcwg.org/mcg-mirror/nrdes.htm
All rights reserved, free copying and citation allowed with source and author reference.
However, DES and Triple-DES security may change if DES has non-random characteristics that could be exploited in new attacks. One objective of this paper is thus the study of non-random DES characteristics, in order to begin to assess such risks -- which are ignored today.
Indeed, even though it has been around for a long time, since the 1970s, no significant weakness has been found in DES [Bek82, MOV97] -- except its key length of 56 bits. The few anomalies that were found are restricted to very special cases, such as the complementation of key and plaintext leading to complemented ciphertext, and the handful of weak and semi-weak keys. The only significant attack on DES is thus still exhaustive key search, also called the brute-force method, which exploits its rather small key length when viewed by an attacker with large enough resources [EFF98] -- though still modest in comparison with other potential attackers.
Even though sometimes possible, as when attacking an automatic e-mail encryption engine, a DES known-plaintext attack is a very special case -- the known plaintext is simply used with tentative keys until the intercepted ciphertext is produced, which directly defines the correct key. In the general case, however, one must ask what happens when the attacker only knows the ciphertext plus routing information, as when intercepting a communication on the Internet. The first important question in such a ciphertext-only attack is then: "What is a correct key?". This question is usually answered either by looking at the set of possible decipherments or at the set of impossible decipherments -- or at both. However, this exposition shows that for DES almost all decipherments done with wrong keys are impossible for natural or compressed English statistics (as an example). So, brute-force DES attacks, even when only the ciphertext is known, are much simplified as discussed -- in fact, they can be 3x faster than hitherto imagined [MOV97, Sch98].
Why is that so?
In a ciphertext-only attack, for example over the Internet, the attacker may also have good clues about the message's language -- for example, that the message is in English -- from the message's routing headers, addresses, etc. And, as Shannon first pointed out [Sha49], the use of any additional encoding before encryption, such as compression or even a transposition cipher, can be accounted for, since the underlying language statistics can still be reached. In this situation, also with compression, one can use the concept of cipher unicity [Sha49, Ger99], which measures the least amount of plaintext that needs to be deciphered in order to unambiguously define the plaintext -- hence, its key. Thus, to answer the question "What is a correct key?" one must first answer the question "What is the cipher's unicity?". However, to answer this question one needs to first define over which message and key spaces the cipher can be classified as a "random cipher". This was originally formulated by Shannon [Sha49] and, in the case of DES (an 8-byte block cipher with a 64-bit plaintext space), essentially requires DES to obey the condition:
The above statement of DES random characteristics by Menezes, and similar declarations by other authors [Bek82, Sch98], seem justified by practice and are often used to justify DES as a "Random Cipher". For example, it is used in DES unicity calculations using Shannon's formula [Sha49], both for uncompressed text and for compressed text by Schneier [Sch98], in one block and also over more than one block [MOV97, Sch98].
However, this exposition shows that the usual statement about DES randomness, as quoted above from [MOV97], is not entirely correct and, even if it were, would not allow DES to be considered a "Random Cipher" over more than one block.
A first consequence is that current unicity calculations for DES (and, some other ciphers) are too large and need to be revised.
A second consequence is that DES cannot really hide the plaintext statistics, since almost nothing that is tentatively deciphered comes close to the expected underlying language statistics of the plaintext -- except the one expected solution. This observation unfortunately provides a practical, one-block algorithmic solution to the attacker's problem of separating all the (2^56 - 1) wrong keys from the only correct (but unknown) one.
I note that the reported results have nothing to do with the obvious result that one has "only" 2^56 possible plaintexts in DES decryption, which is due to the fact that the key also has 56 bits. Rather, they have to do with the much larger number of (2^64 - 2^56) ~ 2^64 impossible decryptions and their very influence on the possible decryption statistics, which were so far ignored. This leads to an "Identification Attack" on DES, with two different impossibility constraints:
ii. The second impossibility is differential and defines that a coincidental result of two English-looking plaintext statistics is impossible for two ciphertext DES blocks encrypted with one key but decrypted with a wrong key. This defines a second impossibility, the "two-block impossibility". The paper will provide a probability number for "impossible".
Further, the present results are discussed for English and for compressed English text, but they could have been equally well discussed for any natural language, for any protocol or document format -- including e-mail, fax, modem signals or specialized protocols tunneled within the encryption.
In particular, a natural language such as English was used to exemplify the concepts because natural languages are a harder target to model, since they are essentially non-deterministic in all their aspects except alphabet. In contrast, a Microsoft Word .doc file is essentially deterministic in alphabet, grammar and lexicon, for all its control characters -- especially the initial control codes for internal commands, which are very repetitive even from file to file for different users. There must be no surprises when a parser decides a file is an MS Word file, or any other file format, since parsers are optimized for size and speed and this requires specialization.
These well-defined and fixed commands will be called "magic numbers" in the exposition, as usual, since their task is to allow immediate "magic" identification by themselves. The "magic numbers" may be present in the beginning of the corresponding section, as usual, but may also be at the end or anywhere else.
However, it is important to remark that one does not have to "know" it is an MS Word document -- one just adds its distinct n-character "magic number" header to one's list of trusted n-grams, and then the MS Word file is just "discovered". These "magic numbers" are thus encoding-sniffers, and one usually needs just three or four bytes to disambiguate document type and even version numbers. And "magic numbers" are present everywhere: for example, in PKZIP files, GIF files, UNIX compress files, in SSL messages with known-plaintext headers for all queries from a known-text HTML form, in digital certificates such as X.509 -- with the OID numbers, the CA name and key-hash, as well as the distinctive ASN.1 encoding for its various standardized fields -- or in PGP, for example in the ASCII armor.
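As a sketch of such an encoding-sniffer, the following maps a few well-known magic numbers (these byte prefixes are standard for the formats named) to their encodings; the function and table names are, of course, illustrative:

```python
# A minimal "magic number" sniffer: three or four bytes usually
# suffice to disambiguate the document type, as discussed above.
MAGIC_NUMBERS = {
    b"PK\x03\x04": "PKZIP archive",
    b"GIF87a": "GIF image (87a)",
    b"GIF89a": "GIF image (89a)",
    b"\x1f\x9d": "UNIX compress (.Z)",
    b"\x1f\x8b": "gzip",
    b"-----BEGIN PGP": "PGP ASCII armor",
}

def sniff(data: bytes) -> str:
    """Identify an encoding by its magic number, if any is known."""
    for magic, name in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return "unknown"

print(sniff(b"PK\x03\x04payload..."))  # PKZIP archive
```

The same table-driven approach extends to magic numbers that appear at the end of a file or anywhere else, by matching offsets instead of prefixes.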
In general, "magic numbers" are patterns which will be always present in any type of encoding or language since the very existence of patterns is what makes the encoding or language understandable to others, besides the sending party -- after all, this is the goal of standardization. This is explained in Trust Theory [Ger98b] by noting that trust is the vehicle of information and patterns are information tags which need to be mutually trusted (i.e., known and relied upon by all dialogue parties for their decisions) if they are to have any bearing on understanding.
For special protocols, in high-security applications, the "magic numbers" can also be implicit or encoded with a random seed in each use. However, the encoded data itself may betray its presence by its very structure, size, range, randomness, redundancy, etc. This is a problem in identification and, as given in [Ger98], "to identify is to look for connections" -- not just to look for an identity connection with a set of "magic numbers". For further discussion on this topic and a practical example please see [Ger99], Section 4.
3.A. Shannon's "Random Cipher" assumption breaks down when one deciphers more than one 8-byte DES block.
The reason is that ciphertext in two different blocks did not mutually interact when each was enciphered. One effectively has different statistical ensembles for each block -- even though a block can influence the next block in cipher-feedback modes. DES enciphers/deciphers per block and these are the primitives that must be considered.
Therefore, any formulas for each basic probabilistic calculation must include the full 8 bytes of ciphertext and plaintext, but not more and not less. And, when connecting results for different blocks, the basic ciphertext probabilities should be considered independently for each block -- notwithstanding the possible plaintext correlation between blocks.
These conditions have been hitherto ignored in DES unicity calculations [cf. Sch98] -- which are perfunctorily extended over more than one block. However, the above arguments invalidate a priori the use of more than one DES block in Shannon's unicity calculations.
The same conclusion is valid regardless of the DES mode of operation used, such as ECB, CBC, CFB or OFB [MOV97]. For example, if we take DES CBC (Cipher-Block-Chaining) mode, would randomness in the first N blocks thwart analysis of subsequent blocks?
No, with three comments:
This is due to the fact that the probability of at least one coincidence between the plaintext that results from any ciphertext deciphered with any key and the set of all possible decipherments of a different ciphertext must be 1 for a "Random Cipher" but is 1/2^8 for DES.
For a given ciphertext, even though plaintext bit reversal is equally probable (as empirically known, [MOV97]) for any plaintext bit in one block as one reverses any bit of the key, this only applies to 2^56 samples of all possible 2^64 plaintexts -- which is the amount spanned by the 2^56 keys. Thus, DES decipherments do not produce a random distribution over 2^64 samples, but at most over 2^56 samples.
This condition has also been hitherto ignored in DES unicity calculations [cf. Sch98], but leads to a value of 1/2^8 (i.e., 2^56/2^64) for the probability of at least one coincidence between a plaintext that results from any ciphertext deciphered with any key and the set of all possible decipherments of a different ciphertext. Of course, this probability should have been 1 if the "Random Cipher" assumption were correct for a DES 8-byte block. The difference is noteworthy for DES, with a much reduced probability.
3.B.ii DES 56-bit Randomness: For DES, it is empirically known that altering any single key bit should alter each ciphertext bit in a block with probability e > 0, except for a few known pathological cases. Experimentally, e = 1/2.
3.B.iv DES* 64-bit Randomness: For DES*, it is empirically known that altering any single ciphertext bit in an 8-byte block should alter each plaintext bit from that block with probability e > 0, except for a few known pathological cases. Experimentally, e = 1/2.
3.B.v DES* 56-bit Randomness: For DES*, it is empirically known that altering any single key bit should alter each plaintext bit in a block with probability e > 0, except for a few known pathological cases. Experimentally, e = 1/2.
3.C. Entropy of a generic English-looking message is 1.2 ~ 2 bits/byte, which means that 776 ~ 65,536 different English-looking messages are possible for each DES plaintext block, also with text compression.
For a message M, with entropy H = H(M), a DES plaintext block is expected to contain an average of 2^(8H) different 8-character English-entropic strings.
For a DES block of 8 bytes, the message (note that "message" denotes the actual and unknown plaintext used to produce the ciphertext being attacked) corresponds to 8 characters of English text. According to the known estimate for the entropy of English text [Sch98], and within some considerations [Ger99] regarding its approximate use for short text, one would need 8*1.2 = 9.6 bits per plaintext block in order to represent English text. Thus, using the entropy approximation for sequence statistics, a DES plaintext block is expected to contain an average of 2^(8*1.2) ~ 776 different 8-character English-entropic strings.
Of course, the actual value should be higher for short sequences, as we are dealing only with 8-character strings [Ger99]. This is also the case when (as usual) text compression is used before encryption. To estimate, if the entropy for compressed 8-character English text is 2 bits/character, then 2^(8*2) = 65,536 English-entropic strings with 8 characters are to be expected, including compression.
To summarize, for English, H(M) is 1.2 bits/character, or up to 2 bits/character for compressed text, in 8 bytes, which provides from 2^(8*1.2) ~ 776 to 2^(8*2) = 65,536 expected English-entropic strings, also including text compression.
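The counts above follow directly from the entropy estimates; a quick check, assuming H = 1.2 bits/character for plain English and 2 bits/character for compressed English:

```python
# Expected number of English-entropic 8-character strings per DES
# block: 2^(8H), for entropy H in bits/character (Section 3.C).
def english_strings(h: float, chars: int = 8) -> float:
    return 2.0 ** (h * chars)

print(round(english_strings(1.2)))  # 776 -- plain English
print(round(english_strings(2.0)))  # 65536 -- compressed English
```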
3.D. Entropy of plaintext deciphered with a random wrong key is 7 bits/byte.
The claim is that, given an 8-byte message M with entropy H = H(M) bits/byte, the corresponding DES ciphertext for a 56-bit random key: (i) has an entropy of 7 bits/byte; (ii) when deciphered by the same key produces the same message M; (iii) when deciphered by any other key produces a plaintext M' <> M with an entropy of 7 bits/byte.
It is important to note what the claim implies: the entropy values given in (i) and (iii) do not depend on the message entropy H(M) -- which is an expected behavior, even though the value of 7 bits/byte will introduce non-random characteristics as discussed below. The proof is as follows, per item:
3.D.ii - The correct-key deciphered plaintext entropy is H(M) bits/byte: As given by 3.B.iii, DES* decipherment of the ciphertext with the same key must produce the plaintext M as the one and only solution. This proves 3.D.ii.
3.D.iii - The wrong-key deciphered plaintext entropy is 7 bits/byte: As given by 3.B.v, DES* decipherments of any fixed 8-byte ciphertext by a random 56-bit key lead to random 8-byte plaintexts with 1/2 probability of bit reversal for any bit, with 2^56 possible decipherments. This directly defines an entropy of lg(2^56) = 56 bits per 8 bytes for the plaintext and proves item 3.D.iii: the entropy is (56 bits)/(8 bytes) = 7 bits/byte.
3.D.a. The encipherment redundancy is too high when one varies the key.
3.E. For each DES ciphertext block and plaintext deciphered with a random wrong key, the probability of occurrence of English-looking messages is (3 ~ 300)/2^56 per key, also including text compression.
This claim reflects the influence of the non-flat DES statistics that results from 3.D.b on "plaintext selection" upon a false decipherment. Consider the tentative decipherment of a given 8-byte ciphertext block with all possible wrong keys -- which are 2^56 - 1 keys: what is the expected total number of English-looking texts M with entropy H = H(M) to be produced for all wrong keys? For a fully random choice of 8 bits/byte, item 3.C argued that 776 ~ 65,536 different choices are expected for English-looking text, including compression. However, we now expect a reduction due to the entropy decrease -- to 7 bits/byte -- for plaintext that is produced by decipherments with a wrong key. There is less variety from the source, fewer "surprises", while still occupying 8 bytes of plaintext space. However, how much is the reduction, if any?
An estimate can be obtained in average terms -- which is appropriate, since we are dealing with a large number of choices (e.g., 2^64). With 8 bits/byte of entropy and 8 bytes of text one has 2^(8*8) = 2^64 possibilities, of which 776 ~ 65,536 are English-looking text (including compression). With 7 bits/byte of entropy and 8 bytes of text one has 2^(7*8) = 2^56 possibilities, which can be seen as 2^56 "random choice events with replacement" from a bin with 2^64 tokens, of which (776 ~ 65,536) are "different" from all the others -- but all have equal selection chance. Each random key corresponds to a choice event and the chosen token is replaced before the next choice event (i.e., before the next random key is used). The probability of selecting at least one of the (776 ~ 65,536) "different" tokens per key (i.e., per choice event) is, directly:
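The expected number of false English-looking decipherments then follows by multiplying the ~2^56 choice events by the per-event probability (776 ~ 65,536)/2^64, a quick check of the 3 ~ 300 range claimed:

```python
# Expected false English-looking plaintexts over all ~2^56 wrong keys,
# with 776 ~ 65,536 English-entropic tokens among 2^64 possibilities.
keys, tokens = 2 ** 56, 2 ** 64

for english in (776, 65536):        # plain vs compressed entropy
    expected = keys * english / tokens
    print(english, expected)        # ~3.03 and 256, i.e. "3 ~ 300"
```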
Item 3.C dealt with English-looking messages, which were calculated by their expected entropy. The same applies to item 3.E, which concludes with an assertion that "the total expected number of false English-looking messages deciphered by using all wrong keys is (3 ~ 300)". However, I note that this occurrence range must include 8-byte plaintext strings which are not all necessarily English-valid, even though they should all be English-looking. For example, the words "IN NO IS", "LAT WHEY", "CRATICT", "FROURE B", "PONDENOM", "REPTAGIN" given by Shannon [Sha48] -- which do contain valid unigrams, bigrams, trigrams, etc. according to English statistics (thus reflected in the entropy value used) but are not English. Thus, the expected number of English-entropic 8-character strings given in 3.E must be higher than the average number of English-valid strings.
In other words, DES decipherment under brute-force assumptions has been reduced to a simple English plaintext selection problem, even when text compression is used before encipherment. The result so far was that less than 3 ~ 300 wrong messages need to be screened for all possible keys in order to reveal the only one correct message. But, how many English-valid messages should we expect? During selection, how can we decide which message is probably not English-valid (though English-looking) and should be left for a second pass, and which message should one immediately try out?
As Shannon [Sha49] showed 50 years ago, this can be answered using standard English letter frequencies, digram and trigram frequencies, dictionaries and even known plaintext -- in what is essentially a "yes/no" detection problem.
This introduces the concept of a detection threshold, which represents the minimum probability that an acceptable 8-byte string must have under the language model defined by the attacker in order to qualify for a "yes" -- anything less than that is "no", but it is also possible to define a "maybe" class which can be used in a second pass if needed. The chosen detection threshold thus expectedly reduces the number of possible plaintext messages from the total value given in item 3.E to a lesser value.
Given the cost of a false detection, the cost of a miss, and their probabilities, which can be translated in detection theory in terms of accuracy and reliability [Ger98a], one can calculate the best threshold for the first pass (in terms of number of letters, digrams, trigrams, and so on frequencies, dictionary entry confirmation, known plaintext confirmation, etc.) that will minimize cost.
For example, a known result in [Sha49] is that one should need an average of 5 characters to disambiguate one English message -- this is the unicity distance for plaintext English. Since we have 8 characters in a DES block, Shannon's result [Sha49] thus shows that we should expect no large difficulty in disambiguating the one correct English message from the set of all possible English-looking 3 ~ 300 samples.
For lower and upper case text represented in 8-bit characters, one may also directly reject unlikely mixed-case occurrences such as "PoNdEnOm" or even "hOWeVeR ", at least in a first pass. Of course, the expectedly few "maybe" results (less than 300) can be stored for a second pass, if needed.
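A minimal first-pass screen along these lines might look as follows; the trigram list is a small illustrative subset of the standard English tables, and the threshold of one hit is an assumption for the sketch:

```python
# First-pass "yes/no" detector: count common English trigrams in an
# 8-character tentative plaintext; reject mixed-case oddities outright.
COMMON_TRIGRAMS = {"THE", "ING", "AND", "HER", "ERE", "ENT",
                   "THA", "NTH", "WAS", "FOR", "HAT", "ION"}

def screen(plaintext: str, threshold: int = 1) -> bool:
    # Reject unlikely mixed-case strings such as "PoNdEnOm" first.
    if not (plaintext.isupper() or plaintext.islower()
            or plaintext.istitle()):
        return False
    text = plaintext.upper()
    hits = sum(text[i:i + 3] in COMMON_TRIGRAMS
               for i in range(len(text) - 2))
    return hits >= threshold

print(screen("THE CODE"))  # True: contains "THE"
print(screen("PoNdEnOm"))  # False: unlikely case mixture
```

A real attacker would grade "yes"/"maybe"/"no" against full letter, digram and trigram frequency tables [Sha49, Bek82], with the threshold chosen to balance misses against false detections, as discussed above.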
This item answers two questions. First, the practical case where more than one key results from item 3.F above. In this case, I know there are wrong keys -- but how do I filter them? Would I need to decrypt all other blocks? Or could I just decrypt one more block in order to reduce the probability of false detection to zero? Second, even if just a single key results from 3.F, it would be useful to test it with low overhead before using it to decipher a long message -- but how can I test a key?
Both questions can be solved by using the DES "hash-function" property, described in 3.D above, but now with two DES blocks -- i.e., using two different ciphertexts for the same key. Of course, even if all 3 ~ 300 messages are left after applying the method in item 3.F to a block, this number is still small and would easily allow the corresponding 3 ~ 300 keys to be directly used to decipher just one more block.
Now, if a wrong key was accepted in item 3.F because it fortuitously led to a plaintext which was "nearly correct" in one block, in the next block the probability that this fortuitous case also occurs for the same key is the product of the probabilities for each block, which is ((3 ~ 300)/2^56)^2 ~ 1/2^96, vanishingly small even if the probability decrease caused by text screening is not taken into account in both cases.
This means that a single key can be easily positively tested. Also, all wrong keys can be directly eliminated by one extra decryption per pre-selected key, if so needed.
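The two-block probability can be checked numerically; assuming 3 ~ 300 surviving keys per block out of 2^56:

```python
import math

# Probability that the same wrong key looks "nearly correct" in two
# independent blocks: the per-block probability squared (Section 3.G).
for survivors in (3, 300):
    p_one = survivors / 2 ** 56
    p_two = p_one ** 2
    print(survivors, f"~ 1/2^{-math.log2(p_two):.1f}")
```

For the upper range of 300 survivors this gives ~1/2^95.5, consistent with the ~1/2^96 estimate above.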
This is a trivial item, included here for presentation clarity. Indeed, item 3.B.iii shows that there is one and only one key which maps the ciphertext to the original plaintext that was enciphered -- the correct key. This implies that we know there is one and only one correct message, even if we have to try out all the 2^56 keys and even if for some keys we have to use one extra block as in item 3.G above. At the end, there must be only one key and this key is the correct one.
Thus, to summarize all eight points from 3.A to 3.H, when one tries to decrypt the DES encipherment of a plaintext with defined statistics (such as English text or 2x compressed English text), the above arguments show that using a wrong key almost absolutely guarantees that one obtains "garbage" as a result -- as with a "hash-function" -- except for very few messages compared to 2^56, perhaps as few as 3 ~ 300 for the first block. Afterwards, text frequency screening with an attacker-defined detection threshold can use the known English letter, digram, trigram, 4-gram, etc. frequency tables -- also for compressed text, and even for some expected plaintext and dictionaries -- to automatically reduce the occurrences to a number close to one for 8-byte text. If one uses unsophisticated (less cost and less time) or incomplete (by necessity) text analysis, then it is possible that more than one key will be left after text screening, though expectedly few and less than 300. But, in this case, a second DES block of ciphertext is all that one needs in order to disambiguate the few remaining keys and find just the correct one, with almost unity probability.
The full message can therefore be readily decrypted and verified under the attack described, based on impossible decipherment constraints -- even if language screening is not very effective.
Further, due to its mathematical generality, I suppose that the "hash-function" argument advanced in item 3.D and applied also in item 3.G would be valid for a broad range of plaintext statistics, not just English and not just natural languages. And, not just DES. On another topic, the "hash-function" property could be useful for other purposes, not only for attackers. This will be reported elsewhere.
Further application of these results are reported in [Ger99], using Identification Theory [Ger98] in level 2. They allow DES unicity to be reduced even more when the text is identified by its linguistic statistics and not by its actual words -- i.e., when the text is identified in Obscure form, not necessarily in Distinguished form. As shown in [Ger99], the least amount of information needed to break DES is then given by the concept of Unicity-5 (i.e., unicity for obscure identification), which is just 3 characters for English. For English text, even compressed text, one should then be able to break DES with the recognition of at most one-, two- and three-character frequencies. This means a 3-letter attack on DES.
Now, it is instructive to consider in somewhat more detail "how" such an attack may be conducted -- with 3 or 5 letters -- as it can be applied to any other cipher system, not only DES.
The attack could be called an "Identification Attack" and may use the new concept of "Unicity-5" described in [Ger99] or even the original unicity, called Unicity-1. With Unicity-5, however, the intention is not to identify words or even the whole plaintext in a block. The intention is just to identify a "valid plaintext". In both cases, the attack unfolds in the usual way, where one uses a brute-force key-search to find a probable key -- defined as that key which provides the required level of identification for the plaintext. From this key, the full plaintext can be read in one easy step. In case of failure, the search would continue.
The central observation is thus that in order to break a cipher system one does not need to identify the plaintext as Distinguished, just as Obscure. The reduced identification value in the DAOF scale pays off in terms of a strong reduction of the amount of plaintext needed in order to break the cipher. This was exemplified in the former Section, which showed why one can break DES with three letters.
The DES attack procedure is automatic and involves two possible loop-backs for the full 8-character text (i.e., not just for 3 characters -- the 3 characters mentioned are just a matching pattern of possibly 1, 2, 3 or more characters out of the 8):
    decipher -> trigrams -> Yes, "English" -> 5-letters -> Yes, English
        ^                       |                            |
        +---- try new key <---- No <------------------------ No
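The loop can be sketched as follows. Since a runnable DES is beyond this sketch, a toy single-byte XOR "cipher" stands in for the DES primitive and a small trigram set stands in for the "English" test; in the real attack the decryption call would be DES and the key space 2^56:

```python
# Identification-attack loop (sketch). `toy_decrypt` is a stand-in
# for DES decryption; the key space here is just 2^8 for illustration.
COMMON_TRIGRAMS = {"THE", "ING", "AND", "FOR", "HAT"}

def toy_decrypt(ciphertext: bytes, key: int) -> bytes:
    return bytes(b ^ key for b in ciphertext)

def looks_english(block: bytes) -> bool:
    try:
        text = block.decode("ascii").upper()
    except UnicodeDecodeError:
        return False
    return any(text[i:i + 3] in COMMON_TRIGRAMS
               for i in range(len(text) - 2))

def identification_attack(ciphertext: bytes) -> list[int]:
    """Try each key; keep those whose plaintext passes the screen."""
    return [k for k in range(256)
            if looks_english(toy_decrypt(ciphertext, k))]

ciphertext = toy_decrypt(b"THE PLAN", 42)       # "encrypt" with key 42
print(42 in identification_attack(ciphertext))  # True
```

The surviving keys correspond to the "yes" set discussed above; a second block would then disambiguate them down to the one correct key.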
To develop a feeling for the (small) plaintext space one may need if only 3 characters are used, one can start with the maximum number of 2^18 cases if one includes all upper and lower case letters, space, punctuation and numbers in a 64-character alphabet (to simplify calculations). Of course, one should now rate the entries according to letter, bigram and trigram linguistic frequencies for English -- taken from general text or (better) from former texts of similar sources (e.g., electrical engineering, computer science, social, medical, etc.). Looking at usual trigram tables [Sha49], [Bek82], I would assume one can thus reduce the plaintext space to probably 12,000 cases for the most relevant English trigrams (the, ing, and, her, ere, ent, tha, nth, was, eth, for, dth, hat, she, ion, int, his, ...) [Bek82]. The probability of occurrence decreases rapidly along such a list. For example, "the" is approximately 8x more probable than "his". But, it may get better (i.e., worse). Since most cipher systems begin with a "magic number", or a message begins with the usual "Received", "To:", "From:", etc., it may be possible to further reduce the 3-character list for the message beginning -- perhaps down to 100 cases, as I can evaluate from the variety in my own messages. This is comparable with the maximum number of expected false decipherments given by 3 ~ 300, as calculated in the above Section for 8-byte strings with or without text compression before encryption -- which is the value I use next.
Thus, one may need from 3 Kb to 30 Kb to store the entire DES-relevant information on the English plaintext space -- relevant enough to allow English to be detected and the DES key to be found. Of course, the less entries one has, the less false detections one may encounter. However, the less entries one has the more misses one may suffer. Given the cost of a false detection, the cost of a miss, and their probabilities, which can be translated in detection theory in terms of accuracy and reliability [Ger98], one can calculate the best threshold (in terms of number of letters, digrams, trigrams, and so on frequencies, dictionary entry confirmation, known plaintext confirmation, etc.) that will minimize cost. This can be further optimized in terms of specific dictionaries, so that one can define an overall best strategy for each language group, targeted in terms of n-gram length, their number of entries and dictionaries.
Uniting this Section with the former and summarizing the "why" and "how" of the "3-letter DES attack", one recognizes first that only one DES block needs to be used in any case (i.e., with or without text compression before encipherment) and that there are only approximately 3 ~ 300 expected false messages that may resemble English, from all possible decipherments of a given 8-byte ciphertext. According to this exposition, the 3 ~ 300 corresponding keys can then be graded as "yes" or "no", by using frequency tables and dictionaries to accept or reject the linguistic characteristics (e.g., for English) for the corresponding text; which allows the cipher to be broken when a second DES ciphertext is tested with the set of "yes" keys -- which are expectedly few, and yield just the correct key with an effective unity probability.
An interesting point from the discussion above is that, of course, one may use the full ciphertext for testing purposes at any stage, since one has already intercepted a part of it and since the message itself lies in the full ciphertext -- but the main reason why one does not want to do that before one finds only one expected key is that the full ciphertext space is very large, possibly spanning over thousands of DES 64-bit blocks which would have to be all tentatively decrypted.
There are two questions which may now surface.
The first question concerns the use of purposefully inserted or otherwise identified "magic numbers" such as OIDs, or covert channels. A protocol usually defines a header -- which is always used and allows one to easily spot such message in a tentative plaintext. That is how one may (for example) detect a PGP key by itself, or a PGP message. A covert channel ensues when the existence of this header (and, its significance to aid decryption) is ignored by at least one of the dialogue parties. This means that no one needs to approve "key-escrow" for session keys -- the message itself may provide enough unicity in a covert channel, as it already does (Note: I am making no affirmation about its actual use as such in any protocol -- just about its actual potential to be used as such, unwittingly to a dialogue party or to all dialogue parties).
The second question concerns the impression that we only seem to find a "possible" plaintext, but not necessarily the one that was encrypted. As a matter of fact, in probability analysis, yes, we are concerned with "possible messages". But the message that was actually encrypted will expectedly be counted among them (as a function of the selection choice) -- see item 3.E.
In a different slant, since 56-bit DES can be attacked by brute force without much expense, as is well known [EFF98], and it seems to be even easier as given above, DES well exemplifies the waste of using a WA-controlled random 56-bit key in order to protect at most 24 bits (3 bytes x 8 = 24) -- while a simple XOR encoder (a Vernam cipher) with 56 bits of key would protect 56 bits of English text, a more than 130% increase over DES.
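A minimal Vernam (XOR one-time pad) encoder of the kind referred to above; with a 7-byte (56-bit) random key it protects 56 bits of plaintext outright:

```python
import secrets

def vernam(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the corresponding key byte."""
    assert len(key) >= len(data), "key must cover the data"
    return bytes(d ^ k for d, k in zip(data, key))

key = secrets.token_bytes(7)       # 56-bit random key
message = b"SEVENBY"               # 56 bits (7 bytes) of plaintext
ciphertext = vernam(message, key)
assert vernam(ciphertext, key) == message  # XOR is its own inverse
```

The pad must of course be random and never reused; with reuse, the same plaintext-statistics arguments made above against DES would apply with full force.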
In general, a claim can be made for the Identification Attack -- that to break a cipher system one does not need to identify the plaintext as Distinguished, just as Obscure:
Therefore, by avoiding a larger scope for such ghost-threats, designers can make a good protocol easy to attack. But, by recognizing the sand-boxes we are forced to be in and the choices we are denied, we can recognize the ghost-threats we cannot use -- this is called a "negative pregnant" logical clause. And, to each denied choice we score a minus. On the other hand, for each choice we can insert, we score a plus. It does not matter if we do not use all choices -- just being there is enough. This was exemplified here for DES, when I called attention to the fact that the (2^64 - 2^56) ~ 2^64 denied decryption choices were the deciding factor in simplifying an attack on DES. In fact, the paper showed that they turn DES into a non-random cipher over 64 bits -- DES was shown to be a random cipher only over 56 bits, in 64 bits of plaintext.
The results thus show some non-random characteristics of DES, all entailed by a dimensional reduction in the effective decipherment message space from 2^64 to 2^56 coordinates. This was a modification introduced after the cipher was initially designed by IBM [Bek82] and is a collateral effect of the key-length reduction from 64 bits to 56 bits.
The exposition also describes an attack that exploits this vulnerability, which I call the "Identification Attack", making use of impossible decryptions in two different impossibility constraints and in three steps:
The present results and the "Identification Attack" are of course not only valid for DES and may be used to analyze the security of other cipher systems.
[Bih98] Biham, E. et al., "Cryptanalysis of Skipjack Reduced to 31 Rounds using Impossible Differentials", Technion Tech. Rep. CS0947, 1998. Published at http://www.cs.technion.ac.il/~biham/publications.html
[EFF98] Cracking DES, reference in http://www.eff.org/descracker/
[Ger98a] Gerck, E., "Identification Theory", in two parts: http://mcwg.org/mcg-mirror/coherence.txt and http://mcwg.org/mcg-mirror/coherence2.txt
[Ger98b] Gerck, E., "Towards a Real-World Model of Trust", in http://mcwg.org/mcg-mirror/trustdef.htm
[Ger99] Gerck, E., "Unicity, DES Unicity, Open-Keys; Unknown-Keys", in http://mcwg.org/mcg-mirror/unicity.htm
[MOV97] A. Menezes et al., Handbook of Applied Cryptography, CRC Press, New York, 1997.
[NBS77] NBS FIPS PUB 46, "Data Encryption Standard", National Bureau of Standards, U.S. Department of Commerce, Jan 1977.
[NIST99] NIST FIPS 46-3, Draft to specify the use of Triple DES and deprecate the use of DES, Jan 1999, see for example http://jya.com/nist011599.txt
[Sha48] Shannon, C. A Mathematical Theory of Communication. Bell Syst. Tech. J., vol. 27, pp. 379-423, July 1948. See also http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html for a WWW copy.
[Sha49] Shannon, C. Communication Theory of Secrecy Systems. Bell Syst. Tech. J., vol. 28, pp. 656-715, 1949. See also http://www3.edgenet.net/dcowley/docs.html for readable scanned images of the complete original paper and Shannon's definition of "unicity distance" in page 693.
[Sch98] Schneier B., CRYPTO-GRAM, December 15, 1998, e-mail newsletter, http://www.counterpane.com. See also Schneier, B., Applied Cryptography, J. Wiley & Sons, 1996.