Some Non-Random DES Characteristics 
and an Identification Attack on DES

This is a work document for open peer-review and public discussion. It may change frequently.

    Ed Gerck
    Copyright © 1999 by E. Gerck and MCG, first published on Jan 13, 1999 in the mcg-talk list server.
    Archived at http://mcwg.org/mcg-mirror/nrdes.htm
    All rights reserved, free copying and citation allowed with source and author reference.
     

    ABSTRACT

    When an attacker knows only the ciphertext, the first important question is "What is a correct key?". This question is usually answered either by looking at the set of possible decipherments or at the set of impossible decipherments -- or at both. However, for DES this exposition shows that almost all decipherments done with wrong keys are impossible for natural or compressed English statistics (used here as an example). So, even when an attacker knows only the ciphertext, brute-force DES attacks can be much simplified as discussed -- in fact they can be 3x faster than hitherto imagined. These results are based on some non-random DES characteristics which, contrary to the current literature, invalidate the characterization of DES as a random cipher over 64 bits of plaintext. However, the reported results have nothing to do with the obvious result that one has "only" 2^56 possible plaintexts in DES decryption, which is due to the fact that the key has 56 bits. Rather, they have to do with the much larger number of (2^64 - 2^56) ~ 2^64 impossible decryptions and their influence on the possible-decryption statistics, which were so far ignored. This leads to an "Identification Attack" on DES, with two different impossibility constraints that simplify the attack. The present results are also valid for ciphers other than DES.

     

    1. INTRODUCTION
    2. APPLICATION DOMAIN
    3. NON-RANDOM DES CHARACTERISTICS
    4. THE IDENTIFICATION ATTACK
    5. DISCUSSION AND CONCLUSIONS
    ACKNOWLEDGMENTS
    REFERENCES
     

    1. INTRODUCTION

    DES -- Data Encryption Standard [NBS77, Bek82] -- is the well-known 56-bit cryptographic algorithm used worldwide on the Internet and in private networks. DES is a Feistel cipher which processes plaintext blocks of 64 bits (8 bytes) at a time, producing ciphertext blocks of 64 bits, under a 56-bit secret key [MOV97]. DES decryption consists of the same encryption algorithm and the same key, but with (internally) reversed key schedule. DES has received more attention recently, due to export restrictions of cryptography to 56 bits [WA98] and the earlier announcement that 56-bit DES can be broken by exhaustive key search with rather modest means and attack time [EFF98]. Now, NIST [NIST99] itself also recognizes that DES is no longer safe in some applications, recommending a change-over to Triple-DES.

    However, DES and Triple-DES security may change if DES has non-random characteristics that could be exploited in new attacks. One of the objectives of this paper is thus the study of non-random DES characteristics, in order to begin to assess such risks -- which are ignored nowadays.

    Indeed, even though it has been around for a long time, since the 1970s, no significant weakness has been found in DES [Bek82, MOV97] -- except its key length of 56 bits. The few anomalies that were found are restricted to very special cases, such as the complementation of key and plaintext, leading to complemented ciphertext, and the handful of cases for weak and semi-weak keys. The only significant attack on DES is thus still an exhaustive key search, also called the brute-force method, which exploits its rather small key length when viewed by an attacker with large enough resources [EFF98] -- though still modest in comparison with other potential attackers.

    Even though sometimes possible, as when attacking an automatic e-mail encryption engine, a DES known-plaintext attack is a very special case: the known plaintext is simply used with tentative keys until the intercepted ciphertext is produced, which directly defines the correct key. In the general case, however, one must ask what happens when the attacker knows only the ciphertext plus routing information, as when intercepting a communication on the Internet. The first important question in such a ciphertext-only attack is then: "What is a correct key?". This question is usually answered either by looking at the set of possible decipherments or at the set of impossible decipherments -- or at both. However, this exposition shows that for DES almost all decipherments done with wrong keys are impossible for natural or compressed English statistics (as an example). So, brute-force DES attacks, even when only the ciphertext is known, are much simplified as discussed -- in fact they can be 3x faster than hitherto imagined [MOV97, Sch98].

    Why is that so?

    In a ciphertext-only attack, for example over the Internet, the attacker may also have good clues about the message's language -- for example, that the message is in English -- from the message's routing headers, address, etc. And, as Shannon first pointed out [Sha49], the use of any additional encoding before encryption, such as compression or even a transposition cipher, can be accounted for, since the underlying language statistics can still be reached. In this situation, also with compression, one can use the concept of cipher unicity [Sha49, Ger99], which measures the least amount of plaintext that needs to be deciphered in order to unambiguously define the plaintext -- hence, its key. Thus, to answer the question "What is a correct key?" one must first answer the question "What is the cipher's unicity?". However, to answer this question one needs to first define over which message and key spaces the cipher can be classified as a "random cipher". This was originally formulated by Shannon [Sha49] and, in the case of DES (an 8-byte block cipher with a 64-bit plaintext space), essentially requires DES to obey the condition:

    Random Cipher: all decipherments must produce a random flat distribution over 64-bits in the plaintext space. Empirically, this seems to be the case for DES. For example, Menezes [MOV97] reports that for DES "each bit of the ciphertext should depend on all bits of the key and all bits of plaintext; altering any single plaintext or key bit should alter each ciphertext bit with probability 1/2; and altering a ciphertext bit should result in an unpredictable change to the recovered plaintext block."

    The above statement of DES random characteristics by Menezes, and similar declarations by other authors [Bek82, Sch98], seem justified by practice and are often used to justify treating DES as a "Random Cipher". For example, this assumption underlies DES unicity calculations using Shannon's formula [Sha49], both for uncompressed and for compressed text [Sch98], in one block and also over more than one block [MOV97, Sch98].
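    To make that usual calculation concrete, here is a minimal sketch (in Python) of the standard unicity estimate that this exposition argues must be revised. It assumes Shannon's formula U = H(K)/D, with per-character redundancy D = R - r; the entropy figures are the same ones used later in Section 3.

        # Shannon's unicity distance U = H(K)/D, with per-character
        # redundancy D = R - r [Sha49].  This is the standard textbook
        # calculation that this exposition argues must be revised.

        def unicity_distance(key_bits, alphabet_bits, language_bits):
            """Least plaintext, in characters, that makes the key unambiguous."""
            redundancy = alphabet_bits - language_bits   # D = R - r, bits/char
            return key_bits / redundancy

        # 56-bit DES key, 8-bit characters:
        print(unicity_distance(56, 8.0, 1.2))   # plain English      -> ~8.2 chars
        print(unicity_distance(56, 8.0, 2.0))   # compressed English -> ~9.3 chars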

    However, this exposition shows that the usual statement about DES randomness, as quoted above from [MOV97], is not entirely correct -- and, even if it were, it would not allow DES to be considered a "Random Cipher" over more than one block.

    A first consequence is that current unicity calculations for DES (and, some other ciphers) are too large and need to be revised.

    A second consequence is that DES cannot really hide the plaintext statistics: almost nothing that is tentatively deciphered comes close to the expected underlying language statistics of the plaintext, except the one expected solution. This observation unfortunately provides a practical, one-block algorithmic solution to the attacker's problem of separating all the (2^56 - 1) wrong keys from the only correct (but unknown) one.

    I note that the reported results have nothing to do with the obvious result that one has "only" 2^56 possible plaintexts in DES decryption, which is due to the fact that the key has 56 bits. Rather, they have to do with the much larger number of (2^64 - 2^56) ~ 2^64 impossible decryptions and their influence on the possible-decryption statistics, which were so far ignored. This leads to an "Identification Attack" on DES, with two different impossibility constraints:

    i. The first impossibility deals with only one block of ciphertext -- the "one-block impossibility". The expected plaintext statistics (compressed English, for example) are predicted to almost never occur spontaneously in DES decryptions with a wrong key -- thus a wrong key should almost never decrypt a ciphertext to those impossible statistics. The paper will provide a probability figure for "almost never".

    ii. The second impossibility is differential: it asserts that a coincidental result of two English-looking plaintext statistics is impossible for two DES ciphertext blocks encrypted with one key but decrypted with a wrong key. This defines the "two-block impossibility". The paper will provide a probability figure for "impossible".

    The second impossibility constraint will be shown to resemble the technique of impossible differentials described by Biham [Bih98], in which a plaintext differential is predicted never (or almost never) to occur, and thus the correct key could not decrypt a pair of ciphertexts to that difference. The slight distinction -- and that is why it only "resembles" Biham's technique -- is that I consider not particular, deterministic plaintext differentials, but statistical differentials for any plaintext. The first impossibility constraint is, however, different from Biham's because it does not concern differentials at all but language statistics for a single block of ciphertext.
     
    Regarding notation: in what follows, "lg(x)" designates log_2(x), the logarithm base 2, whose unit is the binary digit, the bit.

    2. APPLICATION DOMAIN

    It is important to note that the present results are not valid only for DES and may also be used to analyze the security of other cipher systems. The results do not depend on any DES-specific S- or P-boxes but are parametrized by DES byte sizes, which can be changed at will for other cipher systems.

    Further, the present results are discussed for English and for compressed English text, but they could equally well have been discussed for any natural language, or for any protocol or document format -- including e-mail, fax, modem signals or specialized protocols tunneled within the encryption.

    In particular, a natural language such as English was used to exemplify the concepts because natural languages are a harder target to model, since they are essentially non-deterministic in all their aspects except alphabet. To contrast, a Microsoft Word .doc file is essentially deterministic in alphabet, grammar and lexicon, because of its control characters -- especially the initial control codes for internal commands, which are very repetitive even from file to file for different users. There must be no surprises when a parser decides a file is an MS Word file, or any other file format, since parsers are optimized for size and speed and this requires specialization.

    These well-defined and fixed commands will be called "magic numbers" in the exposition, as usual, since their task is to allow immediate "magic" identification by themselves. The "magic numbers" may be present in the beginning of the corresponding section, as usual, but may also be at the end or anywhere else.

    However, it is important to remark that one does not have to "know" it is an MS Word document -- one just adds its distinct n-character "magic number" header to one's list of trusted n-grams, and then the MS Word file is simply "discovered". These "magic numbers" are thus encoding-sniffers, and one usually needs just three or four bytes to disambiguate document type and even version numbers. And "magic numbers" are present everywhere: for example, in PKZIP files, GIF files, UNIX compress files, in SSL messages with known-plaintext headers for all queries from a known-text HTML form, in digital certificates such as X.509 -- with the OID numbers and the CA name and key-hash, as well as the distinctive ASN.1 encoding for its various standardized fields -- or in PGP, for example in the ASCII armor.
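    As a minimal sketch of such an encoding-sniffer, consider matching a tentative plaintext against a table of well-known signatures; the table below is illustrative only, and a real list would be much longer:

        # Match a tentative plaintext block against known "magic numbers".
        # The signatures below are the well-known ones for formats cited
        # in the text; a real sniffer would carry a much longer table.
        MAGIC = {
            b"PK\x03\x04":      "PKZIP archive",
            b"GIF87a":          "GIF image",
            b"GIF89a":          "GIF image",
            b"\x1f\x9d":        "UNIX compress data",
            b"-----BEGIN PGP":  "PGP ASCII armor",
        }

        def sniff(block: bytes):
            """Return the first format whose magic number prefixes the block."""
            for magic, name in MAGIC.items():
                if block.startswith(magic):
                    return name
            return None

        print(sniff(b"GIF89a\x2c\x00"))   # -> 'GIF image'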

    In general, "magic numbers" are patterns which will always be present in any type of encoding or language, since the very existence of patterns is what makes the encoding or language understandable to others besides the sending party -- after all, this is the goal of standardization. This is explained in Trust Theory [Ger98b] by noting that trust is the vehicle of information, and patterns are information tags which need to be mutually trusted (i.e., known and relied upon by all dialogue parties for their decisions) if they are to have any bearing on understanding.

    For special protocols, in high-security applications, the "magic numbers" can also be implicit or encoded with a random seed in each use. However, the encoded data itself may betray its presence by its very structure, size, range, randomness, redundancy, etc. This is a problem in identification and, as given in [Ger98a], "to identify is to look for connections" -- not just to look for an identity connection with a set of "magic numbers". For further discussion of this topic and a practical example, please see [Ger99], Section 4.
     

    3. NON-RANDOM DES CHARACTERISTICS

    The first arguments deal with the characteristics of DES only -- independent of plaintext statistics. There are eight such arguments to be considered, given in items 3.A to 3.H below.

    3.A. Shannon's "Random Cipher" assumption breaks down when one deciphers more than one 8-byte DES block.

    The reason is that ciphertext in two different blocks did not mutually interact when each was enciphered. One effectively has different statistical ensembles for each block -- even though a block can influence the next block in cipher-feedback modes. DES enciphers/deciphers per block and these are the primitives that must be considered.

    Therefore, any formula for each basic probabilistic calculation must include the full 8 bytes of ciphertext and plaintext -- not more and not less. And, when connecting results for different blocks, the basic ciphertext probabilities should be considered independently for each block -- notwithstanding the possible plaintext correlation between blocks.

    These conditions have been hitherto ignored in DES unicity calculations [cf. Sch98] -- which are perfunctorily extended over more than one block. However, the above arguments invalidate a priori the use of more than one DES block in Shannon's unicity calculations.

    The same conclusion is valid regardless of the DES mode of operation used, such as ECB, CBC, CFB or OFB [MOV97]. For example,  if we take DES CBC (Cipher-Block-Chaining) mode, would randomness in the first N blocks thwart analysis of subsequent blocks?

    No, with three comments:

    1. It is true that the CBC chaining mechanism makes the ciphertext in one block depend on all preceding plaintext blocks -- but this entire dependency is contained in the value of the previous ciphertext block. Thus, proper decryption of any block requires only the preceding raw (i.e., ciphertext) block in CBC -- not the N blocks before it (see the sketch after this list).
    2. DES CBC mode enciphers the plaintext only secondarily. This means that the attacker needs to tentatively decipher only one block in order to break CBC -- the current one -- with the previous raw block being XOR'ed with the resulting tentative plaintext.
    3. The 3-letter estimate given in [Ger99] for English text (or natural-language text) is valid here also -- but it must change if the attacker changes the assumed plaintext statistics. For example, as discussed in [Ger99, Note-2], if the plaintext statistics are random, then the dialogue parties have theoretical security -- and practically any cipher is safe, not just 56-bit DES. However, if that random plaintext statistic changes after N blocks, the attacker can immediately detect it, irrespective of the N blocks that have passed by (the attacker just needs to use the previous block, raw).
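    To illustrate comments 1 and 2, here is a minimal sketch of per-block CBC decryption, assuming the pycryptodome package for the raw DES primitive: recovering block i costs one DES* operation plus one XOR with the preceding raw ciphertext block, no matter how many blocks came before.

        # P[i] = DES*(C[i]) XOR C[i-1]: one tentative decipherment per
        # block, using only the preceding raw ciphertext block.
        # Sketch only; assumes the pycryptodome package for raw DES.
        from Crypto.Cipher import DES

        def cbc_decrypt_block(key, prev_ct, ct):
            """Recover one 8-byte CBC plaintext block, independent of earlier blocks."""
            tentative = DES.new(key, DES.MODE_ECB).decrypt(ct)   # DES* on this block
            return bytes(a ^ b for a, b in zip(tentative, prev_ct))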
    3.B. Shannon's "Random Cipher" assumption breaks down within an 8-byte block itself, in DES.  DES is a random cipher over 56-bits, not 64-bits.

    This is due to the fact that the probability of at least one coincidence between the plaintext that results from any ciphertext deciphered with any key and the set of all possible decipherments of a different ciphertext must be 1 for a "Random Cipher" but is 1/2^8 for DES.

    For a given ciphertext, even though plaintext bit reversal is equally probable (as empirically known, [MOV97]) for any plaintext bit in one block as one reverses any bit of the key, this only applies to 2^56 samples of all possible 2^64 plaintexts -- which is the amount spanned by the 2^56 keys. Thus, DES decipherments do not produce a random distribution over 2^64 samples, but at most over 2^56 samples.

    This condition has also been hitherto ignored in DES unicity calculations [cf. Sch98], but it leads to a value of 1/2^8 (i.e., 2^56/2^64) for the probability of at least one coincidence between a plaintext that results from any ciphertext deciphered with any key and the set of all possible decipherments of a different ciphertext. Of course, this probability should have been 1 if the "Random Cipher" assumption were correct for a DES 8-byte block. The difference is noteworthy for DES, with a much reduced probability.

    NOTE: It is interesting to comment that this strong probability reduction for any plaintext message to be reachable by an exhaustive search over all keys, from 1 to 1/2^8, is due to the (2^64 - 2^56) ~ 2^64 plaintext messages which cannot be reached -- the zeroth set of the impossible occurrences that I will consider in this exposition. To account for the probability reduction proved above, and for the block limitation considered in item 3.A, I need to modify the statement provided by Menezes [MOV97] and quoted in the Introduction, reflecting the asymmetrical role played by 64-bit plaintexts/ciphertexts vis-a-vis 56-bit keys. Thus, I need two sets of independent statements, also taking into account the experimental data and the handful of known pathological cases of weak keys and ciphertexts [MOV97]:

    3.B.i DES 64-bit Randomness: For DES, it is empirically known that altering any single plaintext bit in an 8-byte block should alter each ciphertext bit of that block with probability e > 0, except for a few known pathological cases. Experimentally, e = 1/2.

    3.B.ii DES 56-bit Randomness: For DES, it is empirically known that altering any single key bit should alter each ciphertext bit in a block with probability e > 0, except for a few known pathological cases. Experimentally, e = 1/2.

    These are essentially the same as the usual statements, but keys and plaintexts now play independent roles -- as they must, since they span spaces with different dimensionality (64 versus 56). Further, decipherment by DES* is the unique reversal of DES and uses exactly the same algorithm as encipherment by DES, with the same key but with (internally) reversed key schedule [NBS77]. Thus, I may also write for DES* (decipherment):

    3.B.iii DES* Uniqueness: Given any plaintext M, any key K, and the DES encipherment of M by K, denoted C, there is one and only one plaintext that can be deciphered by DES* from C with key K, which is M.

    3.B.iv DES* 64-bit Randomness: For DES*, it is empirically known that altering any single ciphertext bit in an 8-byte block should alter each plaintext bit of that block with probability e > 0, except for a few known pathological cases. Experimentally, e = 1/2.

    3.B.v DES* 56-bit Randomness: For DES*, it is empirically known that altering any single key bit should alter each plaintext bit in a block with probability e > 0, except for a few known pathological cases. Experimentally, e = 1/2.

    These five "DES facts" will be used elsewhere in this exposition and are also useful by themselves, to help make precise the basic underlying assumptions regarding DES. Note that they do not depend on any assumed values for the S- and P-boxes in DES, but form a statistical model based on experience -- with an assumed high reliance (implicitly given by 22+ years of testing). Thus, they can be applied to model any other block cipher system, also with e <> 1/2.
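    These facts are easy to check empirically. A minimal sketch of such a test for fact 3.B.i, assuming the pycryptodome package: flip one random plaintext bit and count how many of the 64 ciphertext bits change; the average should be close to 32 (e = 1/2).

        # Empirical check of fact 3.B.i: one flipped plaintext bit should
        # flip each ciphertext bit with probability ~1/2 (about 32 of 64).
        # Sketch only; assumes the pycryptodome package.
        import os
        from Crypto.Cipher import DES

        def avalanche(trials=1000):
            total = 0
            for _ in range(trials):
                des = DES.new(os.urandom(8), DES.MODE_ECB)
                pt  = bytearray(os.urandom(8))
                c1  = des.encrypt(bytes(pt))
                bit = os.urandom(1)[0] % 64          # pick one plaintext bit
                pt[bit // 8] ^= 1 << (bit % 8)       # and flip it
                c2  = des.encrypt(bytes(pt))
                total += sum(bin(a ^ b).count("1") for a, b in zip(c1, c2))
            return total / trials                    # expected: ~32

        print(avalanche())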
     

    3.C.  Entropy of a generic English-looking message is 1.2 ~ 2  bits/byte, which  means that 776 ~ 65,536 different English-looking messages are possible for each DES plaintext block, also with text compression.

    For a message M with entropy H = H(M), a DES plaintext block is expected to contain an average of 2^(8H) different 8-character English-entropic strings.

    For a DES block of 8 bytes, the message (note that "message" denotes the actual and unknown plaintext used to produce the ciphertext being attacked) corresponds to 8 characters of English text. According to the known estimate for the entropy of English text [Sch98], and within some considerations [Ger99] regarding its approximate use for short text, one would need 8*1.2 = 9.6 bits per plaintext block in order to represent English text. Thus, using the entropy approximation for sequence statistics, a DES plaintext block is expected to contain an average of 2^(8*1.2) ~ 776 different 8-character English-entropic strings.

    Of course, the actual value should be higher for short sequences, as we are dealing only with 8-character strings [Ger99]. This is also the case when (as usual) text compression is used before encryption. To estimate: if the entropy for compressed 8-character English text is 2 bits/character, then 2^(8*2) = 65,536 English-entropic strings with 8 characters are to be expected, including compression.
     
    To summarize, for English, H(M) is 1.2 bits/character, or up to 2 bits/character for compressed text in 8 bytes, which provides from 2^(8*1.2) ~ 776 to 2^(8*2) = 65,536 expected English-entropic strings, also including text compression.
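    The two counts quoted above follow directly from N = 2^(8H); in code:

        # Expected English-entropic 8-character strings per block: N = 2^(8H).
        for H in (1.2, 2.0):                 # bits/character: plain vs compressed
            print(H, round(2 ** (8 * H)))    # -> 776 and 65536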
     

    3.D.  Entropy of plaintext deciphered with a random wrong key is 7 bits/byte.

    The claim is that, given an 8-byte message M with entropy H = H(M) bits/byte, the corresponding DES ciphertext for a 56-bit random key: (i) has an entropy of 7 bits/byte; (ii) when deciphered by the same key produces the same message M; (iii) when deciphered by any other key produces a plaintext M' <> M with an entropy of 7 bits/byte.

    It is important to note what the claim implies: the entropy values given in (i) and (iii) do not depend on the message entropy H(M) -- which is an expected behavior, even though the value of 7 bits/byte will introduce non-random characteristics as discussed below.  The proof is as follows, per item:

    3.D.i - The ciphertext entropy is 7 bits/byte: As given by 3.B.ii, DES encipherment of any fixed 8-byte plaintext by a random 56-bit key leads to random 8-byte ciphertexts with 1/2 probability of bit reversal for any bit, with a total of 2^56 possible encipherments. This directly defines an entropy of lg(2^56) = 56 bits per 8 bytes for the ciphertext and proves item 3.D.i: the entropy is (56 bits)/(8 bytes) = 7 bits/byte.

    3.D.ii - The correct-key deciphered plaintext entropy is H(M) bits/byte: As given by 3.B.iii, DES* decipherment of the ciphertext with the same key must produce the plaintext M as the one and only solution. This proves 3.D.ii.

    3.D.iii - The wrong-key deciphered plaintext entropy is 7 bits/byte: As given by 3.B.v, DES* decipherment of any fixed 8-byte ciphertext by a random 56-bit key leads to random 8-byte plaintexts with 1/2 probability of bit reversal for any bit, with 2^56 possible decipherments. This directly defines an entropy of lg(2^56) = 56 bits per 8 bytes for the plaintext and proves item 3.D.iii: the entropy is (56 bits)/(8 bytes) = 7 bits/byte.

     The above results can be viewed in two different perspectives:

    3.D.a. The encipherment redundancy is too high when one varies the key.

    3.D.b. The decipherment redundancy is too low for wrong keys. The unfavorable redundancy values commented on above are caused by the 64:56 dimensional reduction, as one can see from the derivation itself. Note that it mainly involves a "geometric" factor -- it does not depend on cipher design for "confusion" or "diffusion" [Sha49]. The "geometric factor" thus introduces a further source of variability and will be discussed elsewhere.
     

    3.E. For each DES ciphertext block and plaintext deciphered with a random wrong key, the probability of occurrence of English-looking messages is (3 ~ 300)/2^56 per key, also including text compression.

    This claim reflects the influence of the non-flat DES statistics that result from 3.D.b on "plaintext selection" upon a false decipherment. Consider the tentative decipherment of a given 8-byte ciphertext block with all possible wrong keys -- all 2^56 - 1 of them: what is the expected total number of English-looking texts M with entropy H = H(M) produced over all wrong keys? For a fully random choice of 8 bits/byte, item 3.C argued that 776 ~ 65,536 different choices are expected for English-looking text, including compression. However, we now expect a reduction due to the entropy decrease for plaintext produced by decipherments with a wrong key -- which is 7 bits/byte. There is less variety from the source, fewer "surprises", while still occupying 8 bytes of plaintext space. However, how large is the reduction, if any?

    An estimate can be obtained in average terms -- which is appropriate, since we are dealing with a large number of choices (e.g., 2^64). With 8 bits/byte of entropy and 8 bytes of text, one has 2^(8*8) = 2^64 possibilities, of which 776 ~ 65,536 are English-looking text (including compression). With 7 bits/byte of entropy and 8 bytes of text, one has 2^(7*8) = 2^56 possibilities, which can be seen as 2^56 "random choice events with replacement" from a bin with 2^64 tokens, of which (776 ~ 65,536) are "different" from all the others -- but all have equal selection probability. Each random key corresponds to a choice event, and the chosen token is replaced before the next choice event (i.e., before the next random key is used). The probability of selecting at least one of the (776 ~ 65,536) "different" tokens per key (i.e., per choice event) is, directly:

        P = (776 ~ 65,536)/2^64 = (3 ~ 300)/2^56 per key

    which confirms the claim. This probability also means that the total expected number of false English-looking messages deciphered by using all wrong keys is (3 ~ 300). This result can be seen as the outcome of a "hash-function", where DES almost never produces English-looking text upon a wrong decipherment -- irrespective of the ciphertext values.

    3.F. For each DES ciphertext block and plaintext deciphered with a random wrong key, the probability of false-detection of English-valid messages depends on a user-defined detection threshold but is less than (3 ~ 300)/2^56 per key, also including text compression.

    Item 3.C dealt with English-looking messages, which were counted by their expected entropy. The same applies to item 3.E, which concludes with an assertion that "the total expected number of false English-looking messages deciphered by using all wrong keys is (3 ~ 300)". However, I note that this occurrence range must include 8-byte plaintext strings which are not all necessarily English-valid, even though they should all be English-looking. For example, the strings "IN NO IS", "LAT WHEY", "CRATICT", "FROURE B", "PONDENOM", "REPTAGIN" given by Shannon [Sha48] contain valid unigrams, bigrams, trigrams, etc. according to English statistics (and are thus reflected in the entropy value used), but they are not English. Thus, the expected number of English-entropic 8-character strings given in 3.E must be higher than the average number of English-valid strings.

    In other words, DES decipherment under brute-force assumptions has been reduced to a simple English plaintext-selection problem, even when text compression is used before encipherment. The result so far is that fewer than 3 ~ 300 wrong messages need to be screened over all possible keys in order to reveal the one correct message. But how many English-valid messages should we expect? During selection, how can we decide which message is probably not English-valid (though English-looking) and should be left for a second pass, and which message should be tried out immediately?
     
    As Shannon [Sha49] showed 50 years ago, this can be answered using standard English letter frequencies, digram and trigram frequencies, dictionaries and even known plaintext -- in what is essentially a "yes/no" detection problem.

    This introduces the concept of a detection threshold, which represents the minimum probability that an acceptable 8-byte string must have under the language model defined by the attacker in order to qualify for a "yes" -- anything less than that is "no", but it is also possible to define a "maybe" class which can be used in a second pass if needed. The chosen detection threshold thus expectedly reduces the number of possible plaintext messages from the total value given in item 3.E  to a lesser value.

    Given the cost of a false detection, the cost of a miss, and their probabilities -- which can be translated in detection theory in terms of accuracy and reliability [Ger98a] -- one can calculate the best threshold for the first pass (in terms of letter, digram, trigram, etc. frequencies, dictionary entry confirmation, known-plaintext confirmation, etc.) that will minimize cost.
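    A minimal sketch of such a first-pass screen follows, with an assumed toy trigram table (a real attack would load full published tables [Bek82]) and thresholds that stand for the cost-optimized values discussed above:

        # First-pass "yes/maybe/no" screen for a tentative 8-character
        # decipherment, scored by average trigram log-probability.
        # The table and thresholds are illustrative assumptions only.
        import math

        TRIGRAM_FREQ = {"THE": 0.0181, "ING": 0.0073, "AND": 0.0073,
                        "ENT": 0.0042, "ION": 0.0042, "HER": 0.0036}
        FLOOR = 1e-7                # assumed probability of unlisted trigrams

        def score(text):
            """Average log-probability over all trigrams in the candidate."""
            t = text.upper()
            logs = [math.log(TRIGRAM_FREQ.get(t[i:i+3], FLOOR))
                    for i in range(len(t) - 2)]
            return sum(logs) / len(logs)

        def classify(text, yes=-8.0, no=-13.0):     # thresholds set by cost
            s = score(text)
            return "yes" if s >= yes else ("maybe" if s >= no else "no")

        print(classify("THETHING"))   # plausible trigrams -> 'yes' or 'maybe'
        print(classify("qZx3#kV9"))   # garbage            -> 'no'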

    For example, a known result in [Sha49] is that one needs an average of 5 characters to disambiguate one English message -- this is the unicity distance for plaintext English. Since we have 8 characters in a DES block, Shannon's result [Sha49] thus indicates that we should expect no great difficulty in disambiguating the one correct English message from the set of all possible 3 ~ 300 English-looking samples.

    For lower- and upper-case text represented in 8-bit characters, one may also directly reject unlikely mixed-case occurrences such as "PoNdEnOm" or even "hOWeVeR ", at least in a first pass. Of course, the expectedly few "maybe" results (fewer than 300) can be stored for a second pass, if needed.

    3.G. For two DES ciphertext blocks and plaintexts deciphered with the same random wrong key, the probability of false-detection of English-valid messages is less than 1/2^96 per key, also including text compression.

    This item answers two questions. First, the practical case where more than one key results from item 3.F above. In this case, I know there are wrong keys -- but how can I filter them out? Would I need to decrypt all other blocks? Or could I just decrypt one more block in order to reduce the probability of false-detection to zero? Second, even if just a single key results from 3.F, it would be useful to test it with low overhead before using it to decipher a long message -- but how can I test a key?

    Both questions can be solved by using the DES "hash-function" property described in 3.D above, but now with two DES blocks -- i.e., using two different ciphertexts for the same key. Of course, even if all 3 ~ 300 messages are left after applying the method of item 3.F to a block, this number is still small and would easily allow the corresponding 3 ~ 300 keys to be used directly to decipher just one more block.

    Now, if a wrong key was accepted in item 3.F because it fortuitously led to a plaintext which was "nearly correct" in one block, in the next block the probability that this fortuitous case also occurs for the same key is the product of the probabilities for each block, which is ((3 ~ 300)/2^56)^2 ~ 1/2^96 -- vanishingly small, even if the probability decrease caused by text screening is not taken into account in both cases.

    This means that a single key can be easily positively tested. Also, all wrong keys can be directly eliminated by one extra decryption per pre-selected key, if so needed.
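    A short sketch of the probability bookkeeping behind this item, using the worst-case figure of 300 one-block survivors:

        # Two-block impossibility bookkeeping, worst case of 300 survivors.
        p1 = 300 / 2**56        # one-block false-survival probability per key
        print(p1)               # ~4.2e-15
        print(p1 ** 2)          # ~1.7e-29, i.e. ~1/2^96 per key
        print(300 * p1)         # expected false keys left after re-testing
                                # all 300 pre-selected keys on a second block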

     
    3.H. There is one and only one correct message and correct key, even if the attacker has to try out all 2^56 keys.

    This is a trivial item, included here for presentation clarity. Indeed, item 3.B.iii shows that there is one and only one key which maps the ciphertext to the original plaintext that was enciphered -- the correct key. This implies that we know there is one and only one correct message, even if we have to try out all 2^56 keys and even if for some keys we have to use one extra block, as in item 3.G above. At the end, only one key must remain, and this key is the correct one.
     

    Thus, to summarize all eight points from 3.A to 3.H: when one tries to decrypt the DES encipherment of a plaintext with defined statistics (such as English text or 2x-compressed English text), the above arguments show that using a wrong key almost absolutely guarantees that one obtains "garbage" as a result -- as with a "hash-function" -- except for very few messages compared to 2^56, perhaps as few as 3 ~ 300 for the first block. Afterwards, text-frequency screening with an attacker-defined detection threshold can use the known English letter, digram, trigram, 4-gram, etc. frequency tables -- also for compressed text, and even for some expected plaintext and dictionaries -- to automatically reduce the occurrences to a number close to one for 8-byte text. If one uses unsophisticated (lower cost and less time) or incomplete (by necessity) text analysis, then it is possible that more than one key will be left after text screening -- however expectedly few, and fewer than 300. But in this case a second DES block of ciphertext is all that one needs in order to disambiguate the few remaining keys and find just the correct one, with almost unity probability.

    The full message can therefore be readily decrypted and verified under the attack described, based on impossible decipherment constraints -- even if language screening is not very effective.

    Further, due to its mathematical generality, I suppose that the "hash-function" argument advanced in item 3.D and applied also in item 3.G is valid for a broad range of plaintext statistics -- not just English, not just natural languages, and not just DES. On another topic, the "hash-function" property could be useful for other purposes, not only for attackers. This will be reported elsewhere.
     

    4. THE IDENTIFICATION ATTACK

    The former Section has concerned itself with the unicity of DES and has discussed "why" a 5-letter attack is possible on DES.

    Further applications of these results are reported in [Ger99], using Identification Theory [Ger98a] at level 2. They allow DES unicity to be reduced even further when the text is identified by its linguistic statistics and not by its actual words -- i.e., when the text is identified in Obscure form, not necessarily in Distinguished form. As shown in [Ger99], the least amount of information needed to break DES is then given by the concept of Unicity-5 (i.e., unicity for obscure identification), which is just 3 characters for English. For English text, even compressed, one should then be able to break DES with the recognition of at most one-, two- and three-character frequencies. This means a 3-letter attack on DES.

    Now, it is instructive to consider in somewhat more detail "how" such an attack may be conducted -- with 3 or 5 letters -- as it can be applied to any other cipher system, not only DES.

    The attack could be called an "Identification Attack" and may use the new concept of "Unicity-5" described in [Ger99], or even the original unicity, called Unicity-1. With Unicity-5, however, the intention is not to identify words or even the whole plaintext in a block. The intention is just to identify a "valid plaintext". In both cases, the attack unfolds in the usual way: one uses a brute-force key search to find a probable key -- defined as the key which provides the required level of identification for the plaintext. From this key, the full plaintext can be read in one easy step. In case of failure, the search continues.

    The central observation is thus that in order to break a cipher system one does not need to identify the plaintext as Distinguished, just as Obscure. The reduced identification value in the DAOF scale pays off in terms of a strong reduction of the amount of plaintext needed in order to break the cipher. This was exemplified in the former Section, which showed why one can break DES with three letters.

    The DES attack procedure is automatic and involves two possible loop-backs for the full 8-character text (i.e., not just for 3 characters -- the 3 characters mentioned are just a matching pattern of possibly 1, 2, 3 or more characters within the 8):

    decipher -> trigrams -> Yes, "English" -> 5-letters -> Yes, English
           ^<- try new key <- No--|<------ No -------------------|
    where the last 'Yes, English' is indeed English with high confidence, but the first one is Obscure -- "English" -- as measured by Unicity-5.
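    In code, the loop above can be sketched as follows, assuming the pycryptodome package for DES* and an is_english_like() function standing for the trigram and 5-letter screens (a sketch only -- a real search would of course not enumerate 2^56 keys in Python):

        # The Identification Attack loop: one-block screen over all keys,
        # then the two-block impossibility test over the few survivors.
        # is_english_like() stands for the trigram/5-letter screen above.
        from Crypto.Cipher import DES

        def identification_attack(block1, block2, keys, is_english_like):
            survivors = []                       # expectedly 3 ~ 300 keys
            for key in keys:                     # brute-force key search
                des = DES.new(key, DES.MODE_ECB)
                if is_english_like(des.decrypt(block1)):
                    survivors.append(key)
            for key in survivors:                # second, confirming block:
                des = DES.new(key, DES.MODE_ECB) # a wrong key passes both
                if is_english_like(des.decrypt(block2)):  # with p ~ 1/2^96
                    return key                   # effectively the correct key
            return None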

    To develop a feeling for the (small) plaintext space one may need if only 3 characters are used, one can start with the maximum number of 2^18 cases if one includes all upper- and lower-case letters, space, punctuation and numbers in a 64-character alphabet (to simplify calculations). Of course, one should now rate the entries according to letter, bigram and trigram linguistic frequencies for English -- taken from general text or (better) from former texts of similar sources (e.g., electrical engineering, computer science, social, medical, etc.). Looking at usual trigram tables [Sha49], [Bek82], I would assume one can thus reduce the plaintext space to probably 12,000 cases for the most relevant English trigrams (the, ing, and, her, ere, ent, tha, nth, was, eth, for, dth, hat, she, ion, int, his, ...) [Bek82]. The probability of occurrence decreases rapidly along such a list: for example, "the" is approximately 8x more probable than "his". But it may get better for the attacker (i.e., worse for security). Since most cipher systems begin with a "magic number", or a message begins with the usual "Received", "To:", "From:", etc., it may be possible to further reduce the 3-character list for the message beginning -- perhaps down to 100 cases, as I can evaluate from the variety in my own messages. This is comparable with the maximum number of expected false decipherments, 3 ~ 300, as calculated in the Section above for 8-byte strings with or without text compression before encryption -- which is the value I use next.

    Thus, one may need from 3 Kb to 30 Kb to store the entire DES-relevant information on the English plaintext space -- relevant enough to allow English to be detected and the DES key to be found. Of course, the fewer entries one has, the fewer false detections one may encounter; but the fewer entries one has, the more misses one may suffer. Given the cost of a false detection, the cost of a miss, and their probabilities, which can be translated in detection theory in terms of accuracy and reliability [Ger98a], one can calculate the best threshold (in terms of letter, digram, trigram, etc. frequencies, dictionary entry confirmation, known-plaintext confirmation, etc.) that will minimize cost. This can be further optimized in terms of specific dictionaries, so that one can define an overall best strategy for each language group, targeted in terms of n-gram length, number of entries and dictionaries.

    Uniting this Section with the former, and summarizing the "why" and "how" of the "3-letter DES attack": one recognizes first that only one DES block needs to be used in any case (i.e., with or without text compression before encipherment), and that there are only approximately 3 ~ 300 expected false messages that may resemble English among all possible decipherments of a given 8-byte ciphertext. According to this exposition, the 3 ~ 300 corresponding keys can then be graded as "yes" or "no" by using frequency tables and dictionaries to accept or reject the linguistic characteristics (e.g., for English) of the corresponding text. This allows the cipher to be broken when a second DES ciphertext block is tested with the set of "yes" keys -- which are expectedly few, and yield just the correct key with effectively unity probability.

    An interesting point from the discussion above is that, of course, one may use the full ciphertext for testing purposes at any stage, since one has already intercepted a part of it and since the message itself lies in the full ciphertext. The main reason why one does not want to do that before one finds the one expected key is that the full ciphertext space is very large, possibly spanning thousands of DES 64-bit blocks, which would all have to be tentatively decrypted.

    There are two questions which may now surface.

    The first question concerns the use of purposefully inserted or otherwise identified "magic numbers", such as OIDs, or covert channels. A protocol usually defines a header -- which is always used and allows one to easily spot such a message in a tentative plaintext. That is how one may (for example) detect a PGP key by itself, or a PGP message. A covert channel ensues when the existence of this header (and its significance in aiding decryption) is ignored by at least one of the dialogue parties. This means that no one needs to approve "key-escrow" for session keys -- the message itself may provide enough unicity in a covert channel, as it already does. (Note: I am making no affirmation about its actual use as such in any protocol -- just about its actual potential to be used as such, unwittingly to a dialogue party or to all dialogue parties.)

    The second question concerns the impression that we only seem to find a "possible" plaintext, not necessarily the one that was encrypted. As a matter of fact, in probability analysis, yes, we are concerned with "possible messages". But the message that was actually encrypted will expectedly be counted among them (as a function of the selection choice) -- see item 3.E.

    In a different slant: since 56-bit DES can be attacked by brute force without much expense, as is well known [EFF98], and seems to be even easier to attack as shown above, DES well exemplifies the waste of using a WA-controlled random 56-bit key in order to protect at most 24 bits of information (3 bytes x 8 bits = 24 bits). A simple XOR encoder (a Vernam cipher) with 56 bits of key would protect 56 bits of English text -- a 130% increase over DES.

    In general, a claim can be made for the Identification Attack -- that to break a cipher system one does not need to identify the plaintext as Distinguished, just as Obscure:

    CLAIM: In order to define what is a valid but unknown plaintext when deciphering a message, a reduced identification value in the DAOF scale pays off in terms of a reduction of the amount of plaintext needed to break the cipher. The minimum amount of plaintext needed is given by Unicity-5, under Obscure identification.

    As a final remark, the term "3-letter Identification Attack" as used against DES should be understood in the sense of frequency tables for 1 letter, 2 letters and, at most, 3 letters. Except in the case of privileged information that can be used as probable plaintext (for example, OIDs, "magic numbers" as used in protocols, covert channels, etc.), one should view the term "3-letter" as denoting an attack using only known statistics, perhaps even without a dictionary.
     

    5. DISCUSSION AND CONCLUSIONS

    Regarding the present cryptanalysis of DES, one may recall Shannon's comparison [Sha49] with chess strategy. Chess is a game in which most threats are not carried out -- but they surely influence the game and make the opponent lose time.

    Therefore, by avoiding a larger scope for such ghost-threats, designers can make a good protocol easy to attack. But, by recognizing the sand-boxes we are forced to be in and the choices we are denied, we can recognize the ghost-threats we cannot use -- this is called a "negative pregnant" logical clause. And to each denied choice we score a minus; for each choice we can insert, we score a plus. It does not matter if we do not use all choices -- just being there is enough. This was exemplified here for DES, when I called attention to the fact that the (2^64 - 2^56) ~ 2^64 denied decryption choices were the deciding factor in simplifying an attack on DES. In fact, the paper showed that they turn DES into a non-random cipher over 64 bits -- DES was shown to be a random cipher only over 56 bits, in 64 bits of plaintext.

    The results thus show some non-random characteristics of DES, all entailed by a dimensional reduction in the effective decipherment message space from 2^64 to 2^56 coordinates. This was a modification introduced after the cipher was initially designed by IBM [Bek82] and is a collateral effect of the key-length reduction from 64 bits to 56 bits.

    The exposition also describes an attack that exploits this vulnerability, which I call the "Identification Attack", making use of impossible decryptions in two different impossibility constraints and in three steps:

    1. Only one DES block needs to be used in any case (i.e., with or without text compression before encipherment), and there are only approximately 3 ~ 300 expected false messages that may resemble English among all possible 2^56 decipherments of a given 8-byte ciphertext -- everything else being effectively impossible. This leads to a set of 3 ~ 300 pre-selected keys, left over from all possible 2^56 keys, and defines the first impossibility, the "one-block impossibility", whose probability is (3 ~ 300)/2^56 ~ 0.
    2. Using frequency tables and dictionaries, the attacker defines a detection threshold and grades all pre-selected keys either as "yes" or "no", according to the linguistic characteristics (e.g., English) of the corresponding one-block DES text. Possible "maybe" results (expectedly 300 at most) can be saved for a second pass, if needed.
    3. If more than one key remains from this process, and also in order to positively test a single key that may have remained, the attacker uses a second block of ciphertext (but only for the remaining keys) -- which allows the key to be confirmed with effectively unity probability, since a coincidental result of two English-looking plaintext statistics is an impossible differential for two DES ciphertext blocks encrypted with one key but decrypted with a wrong key. This defines the second impossibility, the "two-block impossibility", whose probability is 1/2^96 ~ 0.
    The second impossibility is akin to the one described by Biham [Bih98], even though the plaintext impossibility is defined here by a rule, such as English-looking statistics, and not by a set of values as in [Bih98]. However, the first impossibility constraint (one-block impossibility) is of a different kind, since it is not a differential impossibility of plaintexts for two different ciphertext blocks but a single plaintext with an impossible statistical model for one ciphertext block -- in other words, an impossible statistical content for decryption.

    The present results and the "Identification Attack" are of course not only valid for DES and may be used to analyze the security of other cipher systems.
     

    ACKNOWLEDGMENTS

    I am grateful to Tony Bartoletti, tks, Pedro Rezende, Gregory Rose, the sci.crypt group, and some private postings for questions on this text and on the earlier e-mail versions of this exposition. Comments are welcome.
     

     REFERENCES:

    [Bek82] Beker, H. and Piper, F., "Cipher Systems: The Protection of Communications", J. Wiley & Sons, 1983, ISBN 0-471-89192-4.

    [Bih98] Biham, E. et al., "Cryptanalysis of Skipjack Reduced to 31 Rounds Using Impossible Differentials", Technion Tech. Rep. CS0947, 1998. Available at http://www.cs.technion.ac.il/~biham/publications.html

    [EFF98] Electronic Frontier Foundation, "Cracking DES", 1998, http://www.eff.org/descracker/

    [Ger98a] Gerck, E., "Identification Theory", in two parts: http://mcwg.org/mcg-mirror/coherence.txt and http://mcwg.org/mcg-mirror/coherence2.txt

    [Ger98b] Gerck, E., "Towards a Real-World Model of Trust", in http://mcwg.org/mcg-mirror/trustdef.htm

    [Ger99] Gerck, E., "Unicity, DES Unicity, Open-Keys; Unknown-Keys", in  http://mcwg.org/mcg-mirror/unicity.htm

    [MOV97] Menezes, A., van Oorschot, P. and Vanstone, S., Handbook of Applied Cryptography, CRC Press, New York, 1997.

    [NBS77] NBS FIPS PUB 46, "Data Encryption Standard", National Bureau of Standards, U.S. Department of Commerce, Jan 1977.

    [NIST99] NIST FIPS 46-3, Draft to specify the use of Triple DES and deprecate the use of DES, Jan 1999, see for example http://jya.com/nist011599.txt

    [Sha48] Shannon, C. A Mathematical Theory of Communication. Bell Syst. Tech. J., vol. 27, pp. 379-423, July 1948. See also http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html for a WWW copy.

    [Sha49] Shannon, C., "Communication Theory of Secrecy Systems", Bell Syst. Tech. J., vol. 28, pp. 656-715, 1949. See also http://www3.edgenet.net/dcowley/docs.html for readable scanned images of the complete original paper and Shannon's definition of "unicity distance" on page 693.

    [Sch98] Schneier B., CRYPTO-GRAM, December 15, 1998, e-mail newsletter, http://www.counterpane.com. See also Schneier, B., Applied Cryptography, J. Wiley & Sons, 1996.

    [WA98] http://www.wassenaar.org