Unicity, DES Unicity, Open-Keys, Unknown-Keys

This is a work document for open peer-review and public discussion. It may change frequently.
Discuss this paper and the subject of security in general -- subscribe to the mcg-talk list.

    Ed Gerck
    Copyright © 1999 by E. Gerck and MCG, first published on Jan 5, 1999 in the mcg-talk list server.
    Archived at http://mcwg.org/mcg-mirror/unicity.htm
    All rights reserved, free copying and citation allowed with source and author reference.
     

    Abstract

    This exposition revisits the concept of unicity in cryptographic systems and applies it to show that: (i) key-length is not the limiting parameter that controls the security of cryptographic systems, as made explicit in the Wassenaar Arrangement export control; (ii) ciphers can be much weaker than hitherto imagined, as exemplified by showing that DES can be broken with an analysis over 3 letters and not 20; (iii) there are two new security concepts, called Open-Keys and Unknown-Keys, which can increase security even within the secret-key 56-bit length limitations imposed by the Wassenaar Arrangement. As an application of unknown-keys, Section 5.2 discusses a protocol called M-DES, based on a single 56-bit DES encryption per block, which can afford 70 bits or more of perceived key-length to an attacker, but within the current export limitation of 56 bits of shared secret. This offers a practical way to achieve worldwide standards for 70-bit cryptography or much more -- today -- which can positively impact Internet e-commerce and interoperability. By allowing the secure use of smaller secret-keys, the Open-Key/Unknown-Key concepts can also have other applications, such as in smart-cards, digital signatures, authentication, non-repudiation protocols, etc.
    1. INTRODUCTION AND THE ORIGINAL CONCEPT OF UNICITY
    2. CALCULATING UNICITY
    3. TWO EXAMPLES OF UNICITY: PRINTABLE AND ENGLISH MESSAGES
    4. ADDING COMPRESSION
    5. OPEN OR ABSENT TRUST: OPEN-KEYS AND UNKNOWN-KEYS
        5.1. Open Trust and Open-Keys
        5.2. Absent Trust and Unknown-Keys -- from 56-bit to 70-bit DES
        5.3. Unicity, Trust and Ignorance
    6. IDENTIFICATION OF UNICITY
    7. THE UNICITY OF DES
    8. CONCLUSIONS
    ACKNOWLEDGMENTS
    REFERENCES
     

    1. INTRODUCTION AND THE ORIGINAL CONCEPT OF UNICITY

    The security of cryptographic algorithms can be measured in many different ways, stressing different metrics [Sha49], [Bek82], [MOV97], [Sch98]. The 1998 Wassenaar Arrangement [WA98] was signed by 33 countries and proposes to restrict the worldwide export of cryptography with more than 56 bits of secret-key -- a restriction widely regarded [cf. EFF98] as negatively impacting worldwide Internet interoperability, privacy, security and e-commerce. The WA also brought to the fore the question of protection against brute-force attacks on a ciphertext sent through an open and public network such as the Internet -- when the attacker has the full ciphertext plus routing information and enough resources to try out all keys in an exhaustive search. Indeed, the WA can be seen as defining a current lower limit for the bit-length security parameter of symmetric ciphers, which should be larger than 64 bits. However, ciphers that obey this very limit cannot be freely exported to be used on the worldwide Internet -- due to the export limitation of 56 bits imposed by the WA.

    The objective of this communication is thus to address the subject of security from this perspective -- which can be summarized in two questions: (i) how can one protect one's data against a brute-force attacker that has almost unlimited resources? and (ii) can this be done under the current secret-key export-free bit-length restrictions of the WA?

    Apparently, the answer to both questions taken together must be negative -- as answered by the very existence of the WA itself. After all, the very purposes of the WA limitations on cryptography depend on a negative answer.

    However, in 1949 Claude E. Shannon, an American scientist, published an application of his Information Theory (then called "Communication Theory") to cryptography [Sha49] -- which, in this author's opinion and as discussed here, can be used to derive positive and practical answers to both questions (i) and (ii) above, together. This will also show that the Wassenaar Arrangement is a moot issue for security and privacy concerns -- notwithstanding all the other, different issues that it may raise [EFF98].

    To answer the questions, we need first to revisit the concept of "unicity".

    Shannon [Sha49] defined "unicity distance" (hereafter, "n") as the least amount of plaintext which can be uniquely deciphered from the corresponding ciphertext -- even when the attacker has unbounded resources. The "amount" of plaintext (i.e., "n") can be measured in any units the user may find convenient, such as bits, bytes, letters, symbols, etc. Shannon himself used "letters" in his paper.

    In a few words, "unicity" is the least message length that can be uniquely deciphered. As we will see, this number depends on several factors -- some explicit, most implicit. And it is a fundamental property of secrecy systems.

    Of further importance -- and often ignored or even contradicted by declarations in the literature such as "any cipher can be attacked by exhaustively trying all possible keys" -- I want to call attention to the fact that any cipher (including 56-bit DES) can be theoretically secure against any attacker, even an attacker with unbounded resources, when the cipher is used within its unicity. Not only the One-Time Pad (also called XOR-cipher or Vernam cipher). Thus, there is indeed a theoretically secure defense even against brute-force attacks: to work within the unicity limit of the cipher. And it works for any cipher -- irrespective of key-length or encryption method used.

    It is important to note, as the literature has also not been very careful in this regard, that unicity always refers to the plaintext. However, it may also be applied to indicate the least amount of ciphertext which needs to be intercepted in order to attack the cipher -- within the ciphertext/plaintext granularity. For example, for a simple XOR-cipher (Vernam cipher), one byte of ciphertext links back to one byte of plaintext -- so a unicity of n bytes implies n bytes of ciphertext. For DES, the ciphertext must be considered in blocks of 8 bytes -- so a unicity of n bytes implies a ciphertext length rounded up to the next multiple of 8 bytes.

    Further, the present results are discussed for English and for compressed English text, but they could equally have been discussed for any natural language, or for any protocol or document format -- including fax, modem signals or specialized protocols tunneled within the encryption [see Ger99, Section 2].
     

    2. CALCULATING UNICITY

    As first given by Shannon [Sha49] under some restrictive assumptions, especially the "random cipher" assumption (Random Cipher: all decipherments must produce a random flat distribution over all bits in the plaintext space), the mathematical expression for unicity can be cast in the following unfolded expression:

        n = H(K)/[|M| - H(M)]

    where the quantities are:

        H(K) -- the entropy of the keys in the cipher system (for equiprobable keys, the key-length in bits);
        |M|  -- the maximum possible entropy of the plaintext, per unit;
        H(M) -- the actual entropy of the message, per unit;

    and the entropies are calculated according to the desired units (bits, bytes, letters, symbols, etc.), which also define the unit for n. In the discussion that follows, "lg(x)" will designate log2(x) -- the unit of information being the binary digit, the bit.
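
    As an illustration only, the formula above can be computed directly. The Python sketch below merely restates the expression; the function and argument names are illustrative assumptions, not part of any library or standard.

        def unicity_distance(h_key_bits, max_entropy_per_unit, msg_entropy_per_unit):
            """Shannon unicity: n = H(K) / (|M| - H(M)).

            h_key_bits            -- H(K), entropy of the key space, in bits
            max_entropy_per_unit  -- |M|, maximum possible entropy per plaintext unit
            msg_entropy_per_unit  -- H(M), actual entropy of the message per unit
            Returns n in the chosen plaintext unit (characters, bytes, ...).
            Raises ZeroDivisionError when |M| = H(M), the zero-denominator case noted below.
            """
            return h_key_bits / (max_entropy_per_unit - msg_entropy_per_unit)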

    Please note that it does not matter how the attacker may try to decipher -- he can of course use brute-force and try out all keys or he can use short-cuts, it is his choice and he is entirely free to use any method he desires.

    The main point is that if the intercepted message has less than "n" (bits, bytes, letters, etc.) of plaintext then it cannot be uniquely deciphered from an analysis of its ciphertext -- even by trying all keys. Conversely, any message that is larger than "n" bits could be theoretically broken -- albeit that may require some large (actually, unspecified) amount of work.

    I call attention to the fact that the denominator of the formula above may be zero in some cases of interest -- currently, at least of theoretical interest. This means that the formula will fail to provide an answer in such cases. I expect to address this situation in a new and expanded formulation of unicity, which will follow several aspects to be advanced below -- especially in regard to the Sections on identification of unicity, the different unicity definitions and "added trust". The interplay between unicity and trust will also be further explored in forthcoming messages.
     

    3. TWO EXAMPLES OF UNICITY: PRINTABLE AND ENGLISH MESSAGES

    To exemplify the concept and also to unravel some subtle points on unicity, I will use an example in two cases. In both cases, we are intercepting an enciphered message. In the first case we trust that the text is in printable ASCII characters (including the space), while in the second case we also trust that it is in English. In each case, how much of the enciphered message do we need to intercept before we can calculate its key? In case two, will the additional trust make a difference?

    In the first case we (the attacker) trust that: the cipher uses a 56-bit secret-key, with all keys equally likely; the plaintext is encoded at one byte (8 bits) per character; the plaintext consists only of the 95 printable ASCII characters (including the space), with no further statistics assumed; and the cipher satisfies the "random cipher" assumption made by Shannon when deriving the unicity formula. Now, we ask "How many plaintext characters do we need to intercept at least, so that we can uniquely decipher the message and thus discover its key?" And, "How many ciphertext characters are needed to break the system?"

    The answer to both questions is the unicity, in this case. In the general case, the first answer is the unicity and the second answer is the least number of cipher blocks which link back to the unicity. With the given data, the unicity is calculated as:

    H(K) = lg(2^56) = 56 bits
    |M| = lg(2^8) = 8 bits/character
    H(M) = lg(95) = 6.57 bits/character
    which leads to n = 56/(8 - 6.57) = 39.2 ~ 40 characters. As assumed above, the plaintext encoding uses 8-bit characters, i.e., one byte per character. This means that the system above has a unicity of 40 bytes -- which is also (in this case) the least length of ciphertext that I need to intercept in order to break it.

    In the second case, we add the trust that the plaintext is in English. What changes? Our trust about the plaintext content -- which is now: the plaintext is meaningful English text, still encoded at one byte per character.

    This changes H(M) -- which is now the entropy of English text. This quantity has been measured by various authors and lies between 1.0 and 1.5 bits/character; for a discussion see [Bek82]. I will use the close-to-average value of 1.2 for consistency with the cryptography literature [Bek82], [Sch98].

    However, I note that the above value is only correct for very long text -- certainly not for 8-character text. Thus, it will be considered here as an approximation, as usually done in the literature -- for lack of a tabulated entropy value for 8-character English text. The value of H(English) = 1.2 bits/character will be used throughout this exposition so, at least, the results will be consistent with one another. This also corresponds to a worst-case analysis in security -- since the entropy value for 8-character English text should be larger than the value for very long English text.

    The calculation is:

    H(K) = lg(2^56) = 56 bits
    |M| = lg(2^8) = 8 bits/character
    H(M) = H(English) = 1.2 bits/character
    which leads to n = 56/(8 - 1.2) = 56/6.8 = 8.2 ~ 9 characters. If I suppose an encoding of one byte per English letter, then the system above has a unicity of 9 bytes -- which is also (in this case) the least length of ciphertext that I need to intercept in order to break it. This value is in agreement with the literature, for example the same value given by Schneier [Sch98].
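
    As a check, both cases of this Section can be reproduced with the sketch given in Section 2 (illustration only):

        import math

        # Case 1: 56-bit key, one byte per character, printable ASCII plaintext.
        n_ascii = unicity_distance(56, 8, math.log2(95))     # -> 39.2, i.e. ~40 characters

        # Case 2: same system, but the plaintext is trusted to be English.
        n_english = unicity_distance(56, 8, 1.2)              # -> 8.2, i.e. ~9 characters

        print(round(n_ascii, 1), round(n_english, 1))         # 39.2 8.2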

    Thus, the added trust that the plaintext is in English (vis-a-vis merely printable ASCII) has decreased the unicity approximately 5 times.
     

    4. ADDING COMPRESSION

    The possibility of plaintext compression adds a further degree of freedom to the problem. If the plaintext is compressed before encipherment, then we rightly expect its entropy per compressed character to increase -- even though its entropy per English character does not increase. This is often confusing and may provide the impression that nothing is gained by compression or that we may need to "hide" the compression algorithm from the attacker.

    An example may further help clarify the differences, and the benefits of compression -- even when the compression method is 100% known to the attacker.

    First, it is interesting to comment on compression/decompression as an encoding/decoding operation in comparison with encryption/decryption as an encipherment/decipherment operation.

    Encipherment/decipherment operations are deterministic in their values (even though they may include random seeds). However, they are usually so complex, and involve so many different possibilities as a function of the keys and of all possible messages (which are unknown by definition), that one must -- and is justified to -- resort to probabilistic methods in order to describe average values over the variety of possible variables and outcomes. This is what we have done by defining the entropy of the keys in a cipher system, H(K), as well as the entropy of the actual message, H(M), and the maximum possible entropy of the plaintext, |M|.

    To contrast, lossless compression/decompression can be seen as a reversible deterministic encoding which essentially forms a 1:1 partial function, such that given a valid encoding one can uniquely and directly determine the valid decoding and vice-versa -- possibly not by a linear transformation but by an affine one, which is still a 1:1 partial function and deals only with deterministic values. Thus, we may represent compression simply by a deterministic factor which scales bits/character to bits/compressed-character.

    Following the arguments above, one can view the compression argument of this Section in an alternative way which may be more illuminating. Suppose we will use 2x compression. Then, the same amount of English-information (i.e., information as what you do not expect; surprise) is present before and after compression -- but now in half the space. Thus, 2 characters = 1 compressed-character. Of course, compression has limits and the limit of compression is the original entropy of the plaintext -- but 2x compression is usually a feasible goal for English text.
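
    As a rough, hedged check of the "2x is feasible" claim, any general-purpose lossless compressor can be tried on a sample of English text. The sketch below uses Python's standard zlib module; the file name is a hypothetical placeholder, and the exact ratio will of course vary with the text and its length.

        import zlib

        # Hypothetical input: any reasonably long plain-English file.
        with open("english_sample.txt", "rb") as f:
            text = f.read()

        compressed = zlib.compress(text, level=9)
        ratio = len(text) / len(compressed)
        print(f"{len(text)} bytes -> {len(compressed)} bytes, ratio {ratio:.2f}x")
        # For long English texts a ratio near 2x is common; very short inputs
        # compress less, because of fixed header and dictionary overhead.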

    But, must we really consider the compression method known to the attacker?

    Yes. In cryptography, especially with commercial programs, one usually assumes that only the secret-key is not known (even though some statistics about the secret-key may be known, such as the fact that most UNIX users do not use anything besides lower-case letters and a few symbols such as !$%& in their passwords). Thus, one must assume that all details, at least about the decompression algorithm (the compression algorithm may be secret in special one-way communication protocols), are either known or can be decompiled (e.g., the RC4 cipher algorithm released on Usenet).

    This makes it simple for the attacker to set up an interface module that translates valid compressed-text into plaintext, such that given one it is immediately possible to know the other -- deterministically, unambiguously, uniquely -- by simply inverting the affine transformation. Of course, no knowledge about the underlying plaintext language statistics is needed for this. To identify which compression algorithm is in use, the attacker can rely upon the publicly known and necessarily present "magic numbers" or "information tags", as discussed in Section 2 of [Ger99].

    For special protocols, in high-security applications, the "magic numbers" can also be implicit or encoded with a random seed in each use. However, the encoded data itself may betray its presence by its very structure, size, range, randomness, redundancy, etc. This is a problem in identification and, as given in [Ger98], "to identify is to look for connections" -- not just to look for an identity connection with a set of "magic numbers". The attacker can thus rely on intrinsic connections it may find or try.

    To look for such intrinsic connections, it is interesting to note that the plaintext statistics are indeed translated to the compressed-text statistics -- possibly warped by the affine transformation, yet still recognizable through the affine transformation rules, which form the 1:1 partial function. So, clever analysis can even allow the attacker to work directly in compressed space, if one so wants, when analyzing a cipher. For example, Huffman coding assigns the most probable character to the code "0", the second most likely to the code "10" and so on (here, one may also have "1" and "01", but that is irrelevant). Thus, for English, we know immediately what they are: "0" is " " (space) and "10" is "e". And so on, also using digrams, trigrams, etc. for the di-code, tri-code, etc. probabilities. The same applies to other compression methods such as Lempel-Ziv, pkzip, zoo, gzip, etc. -- not to mention the familiar headings and "magic numbers" that identify each compression method (see the respective comments in Section 2 of [Ger99]). For example, several compression methods rely on an "autocode", which is a name for a "uniquely decodable code" -- i.e., an easily distinguished target.
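
    To make the "warped but recognizable statistics" point concrete, the following sketch builds a Huffman code from a handful of rounded, purely illustrative English symbol frequencies; the most probable symbols receive the shortest codewords -- exactly the regularity an attacker can look for in compressed space.

        import heapq
        import itertools

        # Rounded single-symbol frequencies, for illustration only (space, e, t, ...).
        freqs = {" ": 0.19, "e": 0.10, "t": 0.07, "a": 0.06, "o": 0.06,
                 "i": 0.06, "n": 0.06, "s": 0.05, "h": 0.05, "r": 0.05}

        def huffman_codes(freqs):
            """Return a {symbol: bitstring} map; likelier symbols get shorter codes."""
            counter = itertools.count()        # tie-breaker, keeps heap items comparable
            heap = [[p, next(counter), [sym, ""]] for sym, p in freqs.items()]
            heapq.heapify(heap)
            while len(heap) > 1:
                lo = heapq.heappop(heap)
                hi = heapq.heappop(heap)
                for pair in lo[2:]:
                    pair[1] = "0" + pair[1]
                for pair in hi[2:]:
                    pair[1] = "1" + pair[1]
                heapq.heappush(heap, [lo[0] + hi[0], next(counter)] + lo[2:] + hi[2:])
            return dict((sym, code) for sym, code in heap[0][2:])

        codes = huffman_codes(freqs)
        for sym, code in sorted(codes.items(), key=lambda kv: len(kv[1])):
            print(repr(sym), code)             # the space gets the shortest codeword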

    Now, for the compression example, I will use the last case of the example in Section 3 and include a new trust-point -- i.e., added public information which the attacker knows and can rely upon: the plaintext is compressed 2x by a publicly known lossless algorithm before encipherment, so that each compressed-character carries two English characters.

    The new unicity calculation begins as the former one, with H(K) = 56 bits and |M| = 8 bits/compressed-character; while for H(M) we now need to transform units: H(M) = 1.2 bits/character x 2 characters/compressed-character = 2.4 bits/compressed-character -- which leads to n = 56/(8 - 2.4) = 10 compressed-characters. This means that I can now transport up to 10*2 = 20 characters of English when compression is used. If I suppose an encoding of one byte per English character, then the system above has a unicity of 20 bytes -- which is also (in this case) the least length of ciphertext that I need to intercept in order to break it. This value is in agreement with the literature, for example with a similar value by Schneier [Sch98].
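
    Again as a check (illustration only), the same sketch reproduces this case once the entropies are expressed per compressed-character:

        h_per_compressed_char = 1.2 * 2                                # 2.4 bits/compressed-character
        n_compressed = unicity_distance(56, 8, h_per_compressed_char)
        print(n_compressed, n_compressed * 2)                          # 10 compressed-chars = 20 characters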

    As we can see from the above, 2x compression has increased the English-text unicity by more than 2x -- actually, 22% more than the compression ratio -- vis-a-vis the second case.

    Thus, this is not a question of efficiency alone -- it is a fundamental consequence of compression, since compression must increase the text's entropy per compressed-character. But, of course, compression also adds to communication efficiency, since there are fewer bytes to send for the same amount of text.

    The benefit of compression is to increase unicity, and this benefit does not depend on keeping the compression algorithm obscure.

    In an alternative view, compression allows you to tunnel through brute-force with more plaintext than the case without compression.
     

    5. OPEN OR ABSENT TRUST: OPEN-KEYS AND UNKNOWN-KEYS

    5.1. Open Trust and Open-Keys

    It may seem that the more information about the intercepted message the attacker can trust, the lower the unicity. It seems intuitive that the more the attacker already knows, the less he would still need to discover.

    This is, however, provably wrong.

    Added open trust -- publicly known to all, including the attacker -- can either increase or decrease unicity, as has been proved and exemplified in Sections 3. and 4. above.

    Of course, to measure and understand this effect one needs to refer unicity to the same units of plaintext -- for example, in characters of English text. Otherwise we would be comparing apples with speedboats.

    There is much more to the subject of added "open trust" than this brief introduction can contain, when one wants to use it to increase the security of cipher systems. Another example is given by the recent results reported by Tony Bartoletti in the mcg-talk listserver [MCG99], on plaintext hardening before encryption. However, for the moment, it suffices to say that, demonstrably, added open trust can increase unicity -- even when publicly known to all -- and works as a type of "open-key".

    Or, in other words, adequately adding open trust -- i.e., adding Open-Keys -- demonstrably decreases what the cipher system can disclose at most for a given length of intercepted ciphertext, under proper unicity conditions -- notwithstanding the public's (and the attacker's) full knowledge of such trust. With Open-Keys, the more the attacker knows, the safer the system is.

    Of course, the same feature is trivially true for "closed trust" -- i.e., that which is trusted but is not publicly known, such as a "secret-key". And it is the basis for the current WA export limitations [WA98] -- which limit the amount of "closed trust" one may have or require. That the same feature, but not the same limitation, applies to "open trust" is very noteworthy -- and not controllable.

    The concept that emerges from such discussions is that since Open-Keys can provably enhance the security of too-short secret-keys, it is worthwhile to investigate what are the eventual limits of such synergy. For example, is there a maximum limit to the size P of an English message that can be sent within unicity conditions for a secret-key bit-length N and an open-key bit-length M? What are the tradeoffs?

    The use of  open trust should not however be confused with "open ignorance"  --  where "ignorance" can also be seen as "absent trust" or "Formless trust" ( cf. "Formless" as discussed in [Ger98] and "Formless trust" in trust-ref list discussions).   This will be discussed in the next item.

    5.2. Absent Trust and Unknown-Keys -- from 56-bit to 70-bit DES

    What happens when everyone ignores a secret but only one person has trusted information that may lead to that secret? Then, we expect that "one person" to be able to discover the secret sooner -- by using that trust --  all other resources being equivalent. But, if the resources are not equivalent -- can we still favor one in lieu of the others? Yes, if we balance trust and ignorance to counter the resource imbalance, as this item demonstrates in two different scenarios.

    First, I will consider increasing the security of Triple-DES when used with just one 56-bit key -- leading to an effective 59-bit key-length. Second, I will consider a more general case, describing a protocol called M-DES -- for which the effective key-length is (56 + M)-bits. For M=14, a practical case, this implies 70-bit key-length. All these methods are export-free under WA -- since the only secret-key has 56-bits. The other security bits are initially ignored by the recipient and attackers alike.

    Consider the cipher protocol for one-key Triple-DES [NIST99], where the cipher is given by Encrypt[K1, Decrypt[K1, Encrypt[K1, Plaintext]]] = Encrypt[K1, Plaintext] (so that one-key Triple-DES is equivalent to the usual DES) and is represented by the letters EDE. Now, consider a (yet hypothetical) newer Triple-DES Standard, where all 8 combinations (EEE, EED, DDE ... DDD) and all key orderings K1, K2, K3 would be allowed and post-selected by the sender, Bob, after algorithm negotiation (Triple-DES) with Alice. This corresponds to added uncertainty. And, even if the attacker knows or supposes that one-key Triple-DES is being used (though a Triple-DES cipher was negotiated), there is still an open total of 8 possible choices with K1=K2=K3 for the one-key Triple-DES case. But Alice, who would have the right key from Bob (e.g., sent by a public-key cipher), would only have to test the first DES block to see which scheme was randomly chosen by Bob -- for which she just needs to try out one block 4 times, on average. An attacker would also need to try out an average of four of the combinations, but each one over an average of half the corresponding key-space of 2^56 -- which would at least multiply by 4x the average attack workload for one-key DES, due to the combination uncertainty of EEE ... DDD.
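
    A minimal sketch of this extended one-key Triple-DES idea is given below -- an illustration only, assuming the third-party pycryptodome package for the raw DES primitive; the helper names and the crude "looks valid" test are assumptions, not part of any standard.

        import itertools
        import os
        from Crypto.Cipher import DES      # pycryptodome, assumed available

        PATTERNS = ["".join(p) for p in itertools.product("ED", repeat=3)]   # EEE, EED, ..., DDD

        def apply_pattern(key, block, pattern):
            """Apply a 3-step E/D combination of single-key DES to one 8-byte block."""
            des = DES.new(key, DES.MODE_ECB)
            for step in pattern:
                block = des.encrypt(block) if step == "E" else des.decrypt(block)
            return block

        def invert_pattern(key, block, pattern):
            """Undo apply_pattern by running the inverse steps in reverse order."""
            des = DES.new(key, DES.MODE_ECB)
            for step in reversed(pattern):
                block = des.decrypt(block) if step == "E" else des.encrypt(block)
            return block

        def looks_valid(plaintext):
            """Crude plausibility test (assumption): all bytes printable ASCII."""
            return all(32 <= b < 127 for b in plaintext)

        # Bob: picks one of the 8 combinations at random and keeps the choice to himself.
        key = bytes.fromhex("0123456789abcdef")          # the shared DES key (8 bytes)
        bob_pattern = PATTERNS[os.urandom(1)[0] % 8]
        ciphertext_block = apply_pattern(key, b"ATTACK A", bob_pattern)

        # Alice: knows the key but not the combination; she tries the 8 on the first block.
        for candidate in PATTERNS:
            if looks_valid(invert_pattern(key, ciphertext_block, candidate)):
                print("Alice's candidate combination:", candidate)
                break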

    In the above example, no one knows (neither Alice nor anyone else) which of the 8 possible schemes Bob did use -- everyone openly ignores it. Thus, this situation can be defined as "absent trust" or "open ignorance".

    The concept of open ignorance can be seen as defining an "unknown key". And it can be applied not only to the cipher algorithm itself, as done above, but also to the ciphertext, as discussed in the next paragraphs -- and in combination. This is in contrast to the usual technique of adding "salt" (random data) to the plaintext, or to the algorithm's bit manipulation as done in the "crypt" procedure for UNIX passwords [MOV97, p. 393-394]. These forms of adding variability do not actually benefit from open ignorance to make a brute-force attack more difficult, because the "salt" is publicly known -- it is not an unknown key. Actually, their main purpose is to make a dictionary attack more difficult [MOV97]. Further, even if unknown, "salt" added to the plaintext would not increase the attacker's workload for a brute-force attack.

    The effect of open ignorance can be exemplified with simple one-key 56-bit DES encryption, where an unknown key is applied to the ciphertext and imposes a large decryption uncertainty. As will be shown next, this can easily bootstrap the perceived key-length by a large factor -- without exceeding the 56-bit export limitation on secret-keys.

    Consider a hypothetical M-DES Standard (hereafter, M-DES) which would specify a total of 2^M pre-defined and publicly known, different 64-bit blocks of data, for various choices of M = 0, 1, ... Now, when Bob wants to send a message to Alice, he needs to negotiate an M-DES cipher with Alice, which includes a 56-bit secret-key [Note-1] and a publicly defined value for M, say M = 14. The value of M is chosen by Bob according to his security needs [Note-2], and accepted (or not) by Alice. To use M-DES, Bob would privately choose one 64-bit "key" at will (from the public list of 2^14 "keys") and then repeatedly use this choice to XOR-encode [Note-3], one by one, each of the 64-bit DES ciphertext blocks calculated with the 56-bit secret-key he shares with Alice -- without disclosing his choice (i.e., everyone else would ignore the choice). Thus, Bob's choice is an "unknown key" to anyone else, including Alice -- providing open ignorance. An attacker would have a workload (2^14)/2 = 8,192 times higher, on average, for all 56-bit secret-key combinations to be tried -- which would effectively raise the DES key-length from 56 to 70 bits. But the intended recipient Alice, who knows the 56-bit secret-key, would just have to try out an average of 8,192 DES decryptions of one 8-byte block in order to find out which "unknown key" was chosen by Bob -- a small initial overhead which can be averaged out over different sessions that use the same "unknown key" (albeit with possibly different DES secret-keys, see below).
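
    A minimal sketch of the M-DES idea follows -- an illustration only, again assuming pycryptodome for the raw DES primitive; the base number No, the plausibility test and all names are assumptions made for the sketch, not a specification.

        import os
        from Crypto.Cipher import DES      # pycryptodome, assumed available

        M = 14
        N0 = 0x0123456789ABCDEF            # publicly known, fixed base number (assumption)

        def public_block(i):
            """The i-th publicly known 64-bit block: simple sequential numbering from N0."""
            return ((N0 + i) & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "big")

        def xor_blocks(a, b):
            return bytes(x ^ y for x, y in zip(a, b))

        def mdes_encrypt(des_key, unknown_index, plaintext_blocks):
            """DES each 8-byte block, then XOR every ciphertext block with the chosen public block."""
            des = DES.new(des_key, DES.MODE_ECB)
            mask = public_block(unknown_index)
            return [xor_blocks(des.encrypt(block), mask) for block in plaintext_blocks]

        def looks_valid(plaintext):
            """Crude plausibility test (assumption): all bytes printable ASCII."""
            return all(32 <= b < 127 for b in plaintext)

        def alice_recover_index(des_key, ciphertext_blocks):
            """Alice knows the DES key; she tries the 2^M public blocks -- 2^(M-1) trials on average."""
            des = DES.new(des_key, DES.MODE_ECB)
            for i in range(2 ** M):
                mask = public_block(i)
                if looks_valid(des.decrypt(xor_blocks(ciphertext_blocks[0], mask))):
                    # Confirm on a second block to weed out chance false positives.
                    if len(ciphertext_blocks) < 2 or looks_valid(des.decrypt(xor_blocks(ciphertext_blocks[1], mask))):
                        return i
            return None

        # Bob's side: shared 56-bit DES key plus a privately chosen "unknown key" index.
        key = bytes.fromhex("133457799BBCDFF1")
        bob_index = int.from_bytes(os.urandom(2), "big") % (2 ** M)
        ct = mdes_encrypt(key, bob_index, [b"ATTACK A", b"T DAWN!!"])
        print(bob_index, alice_recover_index(key, ct))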

    Of course, the same scheme can be used with the extended one-key Triple-DES discussed before, increasing the perceived key-length by an additional 3 bits (i.e., for the 8 possible combinations EEE ... DDD) -- to 73 bits for one-key Triple-DES. For two-key and three-key Triple-DES, additional ignorance can come from an undisclosed key order.

    On a practical question, how can 2^14 blocks of 64 bits be defined and known worldwide without doubt? This is a total of 16,384 blocks of 8 bytes each. If stored as a table, they would take approximately 130 Kbytes. But they can also be defined by an algorithm -- in much less space. For example, as recursive applications of a secure hash-function such as SHA-1 [MOV97] on a simple and known seed such as "0", or as the output of a well-known long-cycle random-number generator with a known seed. But, since the ignorance does not pertain to the blocks themselves but rather to which block is chosen, there is a much simpler solution. If the decryption algorithm is a random cipher (see Section 2) at least over a subset of the plaintext space (as it must be for some subset -- for example, DES is a random cipher over a 56-bit plaintext space [Ger99]), then the needed 16,384 (i.e., 2^14) blocks of 64 bits each can be formed by simple sequential numbering from (No + 1) to (No + 16,384), where No is publicly known and fixed -- e.g., a 64-bit number in the case of DES.
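
    The hash-chain alternative mentioned above can be sketched as follows (illustration only; hashlib is Python's standard hashing module, and the seed and the 64-bit truncation are choices assumed for the sketch):

        import hashlib

        def sha1_chain_blocks(count, seed=b"0"):
            """Generate `count` publicly reproducible 64-bit blocks by iterating SHA-1 on a known seed."""
            blocks = []
            state = seed
            for _ in range(count):
                state = hashlib.sha1(state).digest()   # 20-byte digest
                blocks.append(state[:8])               # keep the first 64 bits as the public block
            return blocks

        public_blocks = sha1_chain_blocks(2 ** 14)
        print(len(public_blocks), public_blocks[0].hex())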

    To increase the ignorance to the full extent of the 64-bit space allowed by the ciphertext block, without the corresponding overhead, the two dialogue parties could use a trusted channel (with the needed trust evaluated as a function of risk and cost for each possible channel: personal contact, mail, fax, phone, a previous e-mail, etc.) in order to establish a mutual long-term secret N in the range 1 to 2^(64 - M). Then, N is used together with a publicly known ignorance limit of 2^M, where M << 64, in the M-DES protocol (exemplified by the M = 14 case given above). The mutual long-term secret should be different for each dialogue party, increasing security. This would effectively increase the M-DES bit-length to 120 bits -- from a protocol secret-key of only 56 bits -- by defining a (64 - M)-bit key besides the 56-bit DES key and the M-bit unknown-key. However, since this third key needs to be secret (for M = 14, this key would have 50 bits -- too large for low-latency discovery), it would require the user to enter it or read it from a file, floppy, etc. -- which might not be allowed by a strict interpretation of [WA98]. Publicly known M and No, as given in the previous example, are of course not "secret" and are allowed by [WA98].

    One question is the usual problem of key-repetition in XOR-encoding [Bek82] when the message to be enciphered is longer than the key, which is clearly unsafe for non-random (e.g., English) messages. However, note that DES ciphertext encrypted with the same DES key is itself random from block to block -- even if only one bit of plaintext changes from block to block [Ger99]. Thus, the unknown-key can be kept constant in the XOR-encoding, since the message that is XOR-encoded (i.e., the DES ciphertext) is either constant or random. In other words, an unlimited number of DES ciphertext 8-byte blocks can be securely XOR-encrypted with a constant unknown-key of only one 8-byte block.

    A related question is collision after the final M-DES XOR encipherment, which can occur in two different ways -- and which, however, only affects the attacker's knowledge of whether a message was repeated, not of what the message is. In the worst case of equal plaintexts before DES, given 2^M choices (for example, 16,384) for the unknown-key, it is a simple mathematical problem [Bek82] to calculate the maximum allowed number of repeated uses of the same 56-bit DES key that would guarantee a near-zero probability of randomly choosing the same unknown-key for that DES secret-key in different sessions -- thus deriving worst-case reuse limits of the same DES secret-key, for each M, to avoid XOR-ciphertext collision. The same question can be asked for its dual: calculating (in the worst case of equal DES plaintexts) the maximum allowed number of repeated uses of the same unknown-key, if the DES secret-key is changed for each session. Clearly, since M << 56, the latter question can allow the dialogue parties to strongly reduce the average ignorance computation overhead for a chosen M, by spreading it over several sessions -- while the former question allows key-negotiation to be adequately scheduled for a chosen M. But neither collision event is of practical concern if every session which randomly chooses a DES key (from 2^56) also randomly chooses the unknown-key (from 2^M), and vice-versa -- even for small M.
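
    The "simple mathematical problem" alluded to above is a birthday-type bound. The sketch below (illustration only; the threshold is an assumed example value) computes the probability that the same unknown-key is chosen twice among k sessions, and the largest k that keeps this probability below the threshold.

        def collision_probability(num_choices, sessions):
            """Probability that at least two of `sessions` independent uniform choices
            out of `num_choices` coincide (the birthday problem)."""
            p_distinct = 1.0
            for i in range(sessions):
                p_distinct *= (num_choices - i) / num_choices
            return 1.0 - p_distinct

        def max_sessions(num_choices, max_collision_prob=1e-3):
            """Largest number of sessions keeping the collision probability below the threshold."""
            k = 1
            while collision_probability(num_choices, k + 1) < max_collision_prob:
                k += 1
            return k

        M = 14
        print(max_sessions(2 ** M))    # roughly sqrt(2 * 2^M * 1e-3), i.e. about 5-6 sessions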

    But how does Bob decide on M? Implicitly, because Bob is not interested in M per se -- Bob's concern is security versus cost, for which Bob needs to define a threat model. Let us suppose that Bob's threat model is dominated by a brute-force attack for X years under currently estimated military technology -- this directly implies a minimum key-length K0 needed to defend against such a threat within an acceptable confidence level (e.g., 99.999% per key). Obviously, Bob wants an effective key-length that is at least equal to K0 -- which value he inputs to M-DES. To find the effective key-length, M-DES solves a decision problem which depends on at least two independent variables: the plaintext statistics and M. For given statistics and M, is the effective key-length larger than K0, yes or no? If not, and if possible, increase M. Some of these issues are discussed in [Note-2] and lead to the following table for some values:
     

    MESSAGE         Entropy (bits/byte)    M     Effective key-length (bits)
    English text    1.2                    0     56
    English text    1.2                    14    70
    random text     8                      0     (120) -- full unicity [Note-2]
    which shows that small M does not necessarily mean "low security" -- it all depends on the plaintext statistics. Of course, one could also include the maximum allowed average latency-time for the recipient to discover the unknown-key as a constraint in the decision problem -- which will restrict the range of M.

    These and other cryptanalysis issues of M-DES, including higher key-length modes, with No public or private, will be further addressed elsewhere since they are  outside the scope of this paper -- on unicity. For public discussions, also forwarded from other fora and private messages, please see the mcg-talk listserver [MCG99].

    To summarize the M-DES scheme, the objective is to slow down the attacker much more than the penalty imposed on Alice and Bob -- so much more that the attack becomes infeasible within a given threat model. This is granted because:

    1. Both Alice and the attacker ignore every bit of the M-bit unknown-key. Alice knows the 56-bit secret-key but the attacker does not. Alice needs to discover M bits, while the attacker faces a brute-force attack on a (56 + M)-bit keyspace.
    2. If the M bits can be discovered by trial decryptions only, Alice needs an average of 2^(M-1) DES trials with the known secret-key in order to find the unknown-key. For M = 14 as exemplified above, this takes about the same time as DES-decrypting 64 Kbytes of data -- a common amount of work. Bob, who chose the unknown-key, has no such delay. Also, the same unknown-key can be used for several messages, since the DES ciphertext is random with full entropy of 8 bits/byte (i.e., with 2^64 combinations per block).
    3. The attacker, however, needs to search a (56 + M)-bit keyspace -- an average of 2^(56+M-1) trial DES decryptions of one block (with the revised DES unicity presented in Section 7 of this paper, only one block is needed per trial -- not three, as usually quoted). For M = 14, the attacker effectively faces a 70-bit DES key.
    To get an idea of how much the above three points contribute to security, suppose the EFF DES-cracker machine [EFF98] would take an average of 40 hours (for a 50% search) to break 56-bit DES. Then, the same DES-cracker would take about 75 years to break the discussed scheme, which is equivalent to 70-bit DES. Thus, Alice and Bob can enjoy 70-bit protection with export-free 56-bit DES, for a long time.
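
    For reference, the arithmetic behind these figures: Alice's overhead is 2^(14-1) = 8,192 trial decryptions of one 8-byte block, i.e., 8,192 x 8 = 65,536 bytes (64 Kbytes) of DES work; the attacker's average time scales by 2^14, i.e., 2^14 x 40 hours = 655,360 hours, or roughly 74.8 years.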

    Therefore, open ignorance can greatly improve limited key-length protocols, as exemplified for DES -- raising them, in a practical setting, even above the 64-bit limit of controlled export for symmetric ciphers [WA98]. This is also useful because DES is a very well-tested cipher, with few and known weak points [Ger99] which can be well accounted for -- except its low 56-bit key-length, which can be dealt with as above.

    Thus, the above exposition is not only a "proof of principle" that key-length export restrictions can be easily foiled but also offers a practical way to achieve worldwide standards for 70-bit cryptography or much more -- today.  Future expositions will show how this can be extended to (for example) 184-bit security or even to theoretically unbreakable security, but still only with a 56-bit secret-key.

    In general, the method can be applied as a "meta-method" to any block cipher system -- not only to DES -- defining an "M-X" method for cipher X. To summarize the overall scheme, the open ignorance technique depends on two parts: (i) a publicly known set of 2^M possible choices (of algorithm combination, of XOR block, etc.), openly ignored by everyone; and (ii) a private, undisclosed random choice from that set made by the sender -- which the intended recipient, who knows the secret-key, can discover with a much smaller workload than any attacker.

    5.3. Unicity, Trust and Ignorance

    In conclusion, the usefulness of unicity is perhaps not only as a measure to help "grade" cipher systems (where it may fail miserably if used as originally defined -- see Section 2), but, more interestingly, as a quantity inversely related to what a cipher system may disclose at most -- when any number of its cipher messages are attacked with unlimited computational resources, including time. Here, open trust was shown to play an essential role in increasing unicity -- as an effective but open key, which does not change the perceived secret-key bit-length. This was contrasted with open ignorance, which increases the perceived secret-key bit-length -- by an added uncertainty that is shared by all and leveraged by the original secret-key bit-length itself.
     
    However, it is important to note that "open ignorance" adds uncertainty in order to protect the cipher by increasing the attack workload within limited resources, while an Open-Key adds certainty in order to protect the cipher from attacks that can use any (unlimited) amount of resources. Also, with open ignorance there is an increase of the perceived secret-key bit-length, which does not happen with Open-Keys -- notwithstanding, security can be increased by either method. As a further difference, with Open-Keys the more the attacker knows, the safer the system is -- while with Unknown-Keys the more everyone still needs to discover, the safer the system is.

    Thus, the two methods described in this Section are complementary -- therefore, it is feasible to use them in conjunction in order to obtain security gains beyond each method in isolation. For example, improving upon the 70-bit effective figure obtained above for 56-bit secret-key DES and 14 bits of open ignorance -- by adding x bits of open trust. This will be discussed elsewhere.
     

    6. IDENTIFICATION OF UNICITY

    Next, the identification of unicity is studied in terms of Identification Theory [Ger98].

    First, I note that it is hard to define what "to look valid" means -- we do not know what a valid plaintext is. So, we must expect difficulties in distinguishing between a false plaintext and the correct one, since it is axiomatic that we do not know the plaintext.

    In other words, we have an identification problem.

    Using the Identification Theory outlined in [Ger98], I can consider the four possible identifications of all possible decryptions, defined by identification level I-2, as:

        D -- Distinguished: the decryption is readable, valid and uniquely identified;
        A -- Ambiguous: the decryption is readable and valid, but more than one candidate qualifies;
        O -- Obscure: the decryption can only be obscurely recognized as probably valid (or as probable garbage) -- not word-wise and not unambiguously;
        F -- Formless: the decryption shows no recognizable coherence at all.

    The classification above illustrates the difficulties posed by the question -- what is it "to look valid"?

    It also serves to exemplify the generalized concept of identification -- "to identify is to look for coherence" -- which allows identification to be defined without a previously required identity. This removes the circularity implied by the usual definition (see [Ger98]) and allows an intrinsic treatment of identification -- as needed here.

    To start, I can rephrase Shannon's unicity (hereafter, Unicity-1) as: the least amount of plaintext for which the decipherment can be Distinguished -- i.e., uniquely identified as readable and valid.

    Now, if we view the above D, A, O, F classification of identification level I-2 as cardinal numbers ordered in the sequence D > A > O > F, then Shannon's unicity is the "greatest lower bound" of D -- i.e., as close as possible to Ambiguous, but still Distinguished.

    This immediately allows us to define another unicity, as the "least upper bound" of Ambiguous: Unicity-2 is the largest amount of plaintext for which the decipherment is still Ambiguous.

    Thus, Unicity-2 gives us the maximum amount of plaintext which can contain one false message -- plus the right (and, a priori unknown) one.

    It is easy to prove that Unicity-1 = Unicity-2 for a random cipher, by writing the probability expression that corresponds to  the "maximum amount of plaintext that gives just one expected false decipherment" and transforming it into Unicity-1's terms.
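
    A sketch of that proof, in the notation used here: for a random cipher, a wrong key applied to n units of ciphertext produces a "meaningful-looking" plaintext with probability of about 2^(-n(|M| - H(M))), since only 2^(nH(M)) of the 2^(n|M|) possible plaintexts are valid. Over the 2^H(K) keys, the expected number of false decipherments is then F(n) = 2^(H(K) - n(|M| - H(M))). Setting F(n) = 1 -- just one expected false message, the Unicity-2 condition -- gives n = H(K)/(|M| - H(M)), which is exactly Unicity-1.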

    This resolves a difference in the literature, by showing that for random ciphers it is equivalent to consider "the least amount of plaintext that can be uniquely deciphered" or "the maximum amount of plaintext that gives just one expected false decipherment". Since the latter may be easier to calculate for some systems, one may choose whichever is more convenient.

    It is also possible (and useful for security) to define Unicity-3 through Unicity-6 by following further down the DAOF scale -- for example, Unicity-5 corresponds to the least amount of plaintext for which a decipherment can be Obscurely recognized as probably valid.

    I will leave the non-trivial consequences of such definitions for future discussions -- except Unicity-5, which will be considered in Section 7.
     

    7. THE UNICITY OF DES

    Even though it has been around for a long time, since the 1970s, no significant weakness has been found in DES [NBS77], [MOV97] -- except its key-length of 56 bits. The few anomalies that were found are restricted to very special cases, such as the complementation property for key and plaintext (leading to complemented ciphertext) and the handful of weak and semi-weak keys. The only significant attack on DES is thus still an exhaustive key search, also called the brute-force method, which exploits its rather small key-length -- small when viewed by an attacker with large resources [EFF98], resources which are still modest in comparison with those of other potential attackers.

    But, as mentioned in Section 1, a theoretically secure defense against a brute-force attack on DES is to work within the unicity limit of DES.

    This defense requires however the cipher to satisfy the "random cipher" condition within its entire message space. This was originally formulated by Shannon [Sha49] and, in the case of DES (an 8-byte block cipher), essentially requires the condition below:

    Random Cipher: the decipherments must produce a random flat distribution over 64 bits in the plaintext space. Empirically, the condition seems to be satisfied for DES, as Menezes et al. [MOV97] report for DES: "each bit of the ciphertext should depend on all bits of the key and all bits of plaintext; altering any single plaintext or key bit should alter each ciphertext bit with probability 1/2; and altering a ciphertext bit should result in an unpredictable change to the recovered plaintext block."

    However, as shown in [Ger99], the above statement is neither entirely correct nor, even if entirely correct, would allow DES to be considered a "Random Cipher" over more than one block.

    Thus, current unicity calculations for DES, which simply ignore that the validity conditions of the unicity formula itself are not met, are in error and need to be revisited -- as I consider next.

    When one uses compression (best case, as seen in Section 4.), different authors quote DES unicity to be from 16 to 20 bytes of English. For uncompressed English, a value of 8.2 is often quoted. These numbers are indeed easy to obtain. For example, if I would directly use the calculation given in Sections 3. and 4. above, DES unicity would indeed be 8.2 bytes for uncompressed English and 20 bytes for 2x compressed English -- since DES uses 56-bit keys and enciphers bytes (although in 8-byte blocks).

    As another example, quoting from Schneier [Sch98]:

    "This means that if you are trying to brute-force DES you need two ciphertext blocks. (DES's block length is 8 bytes.) Decrypt the first ciphertext block with one key after another. If the resulting plaintext looks like English, then decrypt the second block with the same key. If the second plaintext block also looks like English, you've found the correct key."

    "The unicity distance grows as the redundancy of the plaintext shrinks. For compressed files, the redundancy might be 2.5, or three blocks of DES ciphertext".

    However, I will show that the unicity of DES is actually close to 3 bytes of English -- and could even be 0 bytes in some systems, such as SSL, which carries a large fraction of known plaintext.

    Our interest is to study DES under brute-force attack. In this condition, the discussion in [Ger99] can be summarized in three observations:

    i. for DES under brute-force attack, one must use the unicity of English alone, without any key but that provided by the English alphabet itself -- which is 5 bytes when characters are byte-encoded and one looks for unambiguous text identification.

    ii. there is almost no expected plaintext ambiguity when inverting DES to tentatively decipher a ciphertext from English -- one expects to observe either "garbage" or readable (thus, valid) English. This is called the "one-block impossibility" in [Ger99].

    iii. If a second block is used, using only the same keys pre-selected for the first block, "near correct" messages effectively do not occur -- which means that the key can be found even with unsophisticated or incomplete text analysis. This is called the "two-block impossibility" in [Ger99].

     Next, I note that in order to break DES the attacker is perhaps not limited by the least amount of "Distinguished readable and valid text" that he can recognize -- which is Unicity-1 at 5-bytes, as given in item (i) above.

    The idea is that the least amount of "Obscure readable and valid text" should already be enough, when compared with "Obscure garbage" -- as given in item (ii) above. In other words, the attacker just needs to identify with some degree of success whether English is probably present or not -- he does not need to know which English words were used in order to break into the key. This leads to what can be called an "Identification Attack", with least information. Of course, once the key is known, any message that uses that key can be directly deciphered -- irrespective of message length.

    The question is now what unicity concept should one use as the least amount of information needed to break DES -- Unicity-5, as given in Section 6, seems more appropriate than Unicity-1.

    Since DES works on 64-bit blocks, this issue is not so important to define the least amount of ciphertext that needs to be intercepted -- 64 bits is certainly more than enough. However, it is crucial to the system's security to know how many English characters one is allowed to transmit under provable security conditions -- i.e., that cannot be broken by a (given and possible) brute force attack. Of course, a low unicity also strongly reduces the workload of the attack since less ciphertext is needed and this may lead to simplifications in the computations.

    This needs to be emphasized. Under current technology, 56-bit DES is only secure if used within its unicity -- which is expectedly much lower than calculated by Shannon's formula, by two counts: (i) the cryptographic formula does not apply and plaintext unicity must be used, (ii) the plaintext unicity is not Unicity-1 but the much lower Unicity-5. Also, attack workload could be much less than hitherto imagined.

    To estimate Unicity-5, already defined in Section 6, I will use the given values of H(English) = 1 to 1.5 bits/character and define Unicity-5 = 2^1 to 2^1.5 ~ 2 to 3 characters (2^1.5 ~ 2.83, rounded up). I will use the largest value, as a worst case, to define the detection threshold: 3 characters.

    This corresponds to the least amount of information one expectedly needs in order to allow English text to be obscurely identified as English -- not word-wise and not unambiguously. And, under the same one character per byte encoding used for the other examples, this is 3 bytes. But, it is well known that 3 bytes of English do not really compress -- since it is already close to the entropy coding limit (the best possible) of 2 bytes -- even when using Huffman or Lempel-Ziv compression.

    Thus, the unicity of DES for English text is approximately 3 bytes -- compressed or not. Not 20 bytes. The difference between a name ... and just the initials.

    This also means that if you are trying to brute-force DES you need just one ciphertext block -- not three, as given in [Sch98] -- a decrease of 20:3 in the number of plaintext bytes that need to be known for each key before the cipher is broken.

    But, what is the significance of this result for DES?

    First, the ratio of 20:3 implies at least a 3x increase in decryption throughput because one can now tentatively decipher only one DES 8-byte block per key trial, not 3 blocks (24 bytes). Attacks can be more efficient.

    However, this point has relative relevance -- since the attack is still not trivial.

    Far more important is the consequence that 20 bytes of compressed text are not safe against brute-force attack on DES -- and they never actually were, so we can imagine the consequences. Re-keying is also not very useful to block a brute-force attack since one would need to re-key DES after every two letters -- i.e., changing 56-bits every 16 bits.

    These considerations thus immediately impact the assumed security of SSL, S/MIME and other protocols that use DES. They also mean that even the small refuge provided by unicity to resist brute-force attacks on text encrypted with DES, as strongly compressed as one may wish, is gone.

    Further, it is interesting to note the self-consistency achieved in the argumentation. The hypothesis that DES can be attacked by brute-force has led to a DES unicity of 3 bytes -- which supports the hypothesis. In other words, the low unicity of 3 bytes decreases by 3x the workload of a brute-force attack and thus makes it more viable, when compared to the usual figure of 20 bytes.

    As a final remark, the term "3-letter" Identification Attack as used against DES should be understood in the sense of frequency tables for 1-letter, 2-letter and, at most, 3-letter statistics. Except in the case of privileged information that can be used as probable plaintext (for example, OIDs, "magic numbers" as used in protocols, covert channels, etc.), one should view the term "3-letter" as an attack using only known statistics, perhaps even without a dictionary.
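
    A heavily simplified sketch of such an Identification Attack loop is given below -- an illustration only, with pycryptodome assumed for the DES primitive and a crude letter set standing in for proper 1-, 2- and 3-letter frequency tables; a real attack would use measured statistics and specialized hardware.

        from Crypto.Cipher import DES      # pycryptodome, assumed available

        # Crude stand-in for frequency tables: common English bytes (illustrative only).
        COMMON = set(b" etaoinshrdlucmwfETAOINSHRDLU")

        def english_score(block):
            """Score how 'English-like' the first three bytes of a trial decryption look."""
            return sum(1 for b in block[:3] if b in COMMON)

        def identification_attack(ciphertext_block, candidate_keys, threshold=3):
            """Try each candidate key on ONE 8-byte block and keep the keys whose first
            three plaintext bytes already look like English (a Unicity-5-style test)."""
            survivors = []
            for key in candidate_keys:
                trial = DES.new(key, DES.MODE_ECB).decrypt(ciphertext_block)
                if english_score(trial) >= threshold:
                    survivors.append(key)
            return survivors               # a real attack would sweep all 2^56 keys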

    For further details on the DES specific considerations and probability calculations pertinent to the "Identification Attack", please see [Ger99].
     

    8. CONCLUSIONS

    This exposition shows that key-length is not the most important parameter for evaluating the security of cryptographic systems. Rather, the concept of unicity, first introduced by Shannon 50 years ago, is shown to be a more comprehensive merit figure to describe a system's resilience to attacks -- especially when 56-bit brute-force attacks are feasible, as motivated by the very existence of export limitations for larger key-lengths under the current Wassenaar Arrangement.

    The paper shows that the low end of security per key-bit is occupied by DES. Indeed, DES English messages can be brute-force attacked over a plaintext space of only 3 letters -- in an Identification Attack -- instead of the currently assumed 20 letters. This result immediately impacts the assumed security of SSL, S/MIME and other protocols that use DES. It further shows that re-keying is not of much use with DES, even if done out-of-band, since one would have to re-key after every two bytes of text.

    The paper also revisits the very concept of unicity and presents new unicity types which result from the application of Identification Theory, when identification levels are understood as cardinal numbers. It also provides practical examples, showing how unicity can be used as a merit figure to evaluate different cryptographic systems and how the results can be used to guide the design of safer systems.

    In particular, the idea that added open trust can increase security -- even if known and certain to the attacker -- was shown and calculated. This leads to a new security paradigm, for which:

    The new security paradigm is the concept of Open-Keys. Much like a usual secret-key, an Open-Key also increases security -- but is publicly certain and known to all, including eventual attackers, with no uncertainty.

    Further, in order to increase the brute-force attack workload, the paper describes a concept called "open ignorance" that deals with Unknown-Keys and a very large differential, between the intended recipient and an attacker, in the workload needed to discover the unknown key. The concept of Unknown-Keys is distinct from the usual resource of adding uncertainty (e.g., "salt") to the plaintext or to the algorithm's bit manipulation -- which mainly aims to bar dictionary attacks but offers no gain against a brute-force attack. This leads to another new security paradigm: the Unknown-Key, which also increases security but is initially unknown to everyone, including the intended recipient -- its security being leveraged by the workload difference between the recipient, who knows the secret-key, and an attacker, who does not.

    As an application of unknown keys,  Section 5.2 showed that DES can afford 70-bits of perceived secret-key length or much more, within current export limitations of 56-bits.

    The Open-Key/Unknown-Key concepts may also allow the use of smaller secret-keys -- which can have other applications besides improving export-limited cipher systems, such as in smart-cards, digital signatures, authentication, non-repudiation, etc.
     
    Thus, the above exposition is not only a "proof of principle" that key-length export restrictions are a technically moot issue, but also offers a practical way to achieve worldwide standards for 70-bit cryptography or much more -- today -- which can positively impact Internet e-commerce and interoperation.

    This work is being done within the stated objectives of the MCG, the Meta-Certificate Group -- an Internet open group on Certification and Security, currently with participants from 28 countries: to use public discussions to develop a worldwide security lingua franca in a set of Open Internet Standards and Open Source APIs (Application Programming Interfaces) that will empower programmers and users worldwide, in order to fully address the current security, privacy, restricted-interoperation and cost issues of the Internet. It is important to note that this is not intended as an attempt to violate any law in any country, but to allow the technical concepts of privacy and security to be as transparent as possible with regard to any national legislation in existence or in planning.
     

    ACKNOWLEDGMENTS

    I am grateful to Tony Bartoletti, Pedro Rezende, some private postings and the sci.crypt group for questions on the earlier versions of this paper.  Comments are welcome.
     

    REFERENCES:

    [Bek82] Beker, H., "Cipher Systems: The Protection of Communications",  J. Wiley & Sons, 1983, ISBN 0-471-89192-4.

    [EFF98] Cracking DES, reference in http://www.eff.org/descracker/

    [Ger98] Gerck, E., "Identification Theory", in two parts: http://mcwg.org/mcg-mirror/coherence.txt and http://mcwg.org/mcg-mirror/coherence2.txt

    [Ger99] Gerck, E., "Some Non-Random DES Characteristics and an Identification Attack on DES", in http://mcwg.org/mcg-mirror/nrdes.htm

    [MCG99] mcg-talk archive at http://mcwg.org/mcg-mirror/cgi-bin/lwg-mcg/MCG-TALK/archives/

    [MOV97] A. Menezes et al., Handbook of Applied Cryptography, CRC Press, New York, 1997.

    [NBS77] NBS FIPS PUB 46, "Data Encryption Standard", National Bureau of Standards, U.S. Department of Commerce, Jan 1977.

    [NIST99] NIST FIPS 46-3, Draft to specify the use of Triple DES and deprecate the use of DES, Jan 1999, see for example http://jya.com/nist011599.txt

    [Note-1] It is not relevant here how Alice knows the DES secret-key she shares with Bob, and why the attacker does not. This will be the subject of another exposition, as it pertains to a similar discussion of security for asymmetric cryptography -- public/private-key pairs also limited by the WA. For the moment, just suppose that Bob and Alice met and Alice received the secret-key  (or, for example they exchanged floppies with 1,000's of 56-bit DES keys to be used in sequence).

    [Note-2] If the plaintext data to be sent by Bob is truly random and unrelated to any other data (a very particular case), then Bob will choose M = 0, because the attacker cannot decide what a valid plaintext is (see the discussion in Section 6, above) and Alice needs to accept the decryption based on her trust that the secret-key is correct. Note that in this case, for truly random 64-bit data, the key-length is formally 120 bits but the cipher is theoretically secure since |M| = H(M) -- we have full unicity -- so security is at the maximum possible level even with M = 0. However, if the plaintext is random but constitutes a private-key, then Bob should choose M > 0, because the private-key can be tested against its public-key counterpart and thus be detected by an attacker as a valid plaintext. Further, if Bob wants to send twice the same plaintext data that is truly random and unrelated to any other data, but does not want the attacker to know that the message is being repeated, then Bob could also send, together with the message, some additional non-random plaintext that Alice could detect -- for example, any message that can be detected by Unicity-5 (see Section 6) -- and use M > 0.

    [Note-3] Note that the XOR-encoding will use a fixed key but a random-message for each block -- since the DES ciphertext is a "message" to the XOR. Thus, the XOR-encoding is secure since the "message" has a full entropy of 8 bits/byte [Ger99]. However, any other (secure) encryption method can be used with M-DES -- such as RC4 or RC5, even DES. But, DES is limited to 56-bits of key-length. Further, the option that involves least assumptions is surely the one that only uses XOR -- which can also be theoretically analyzed, in contrast to RC4, RC5, DES, etc.

    [Sha49] Shannon, C. Communication Theory of Secrecy Systems. Bell Syst. Tech. J., vol. 28, pp. 656-715, 1949. See also http://www3.edgenet.net/dcowley/docs.html for readable scanned images of the complete original paper and Shannon's definition of "unicity distance" in page 693.

    [Sch98] Schneier B., CRYPTO-GRAM, December 15, 1998, e-mail newsletter, http://www.counterpane.com. See also Schneier, B., Applied Cryptography, J. Wiley & Sons, 1996.

    [WA98] http://www.wassenaar.org