| Discuss this paper and the subject of security in general -- subscribe to the mcg-talk list. |
|
Ed Gerck
Copyright © 1999 by E. Gerck and MCG, first published
on Jan 5, 1999 in the mcg-talk list server.
Archived at http://www.mcg.org.br/unicity.htm
All rights reserved, free copying and citation allowed
with source and author reference.
The objective of this communication is thus to address the subject of security from this perspective -- which can be summarized in two questions: (i) how to protect one's data against a brute-force attacker that has almost unlimited resources and, (ii) Can this be done under the current secret-key export-free bit-length restrictions of the WA?
Apparently, the answer to both questions together must be negative. As answered by the very existence of the WA itself -- after all, the very purposes of the WA limitations on cryptography depend on a negative answer.
However, in 1949 Claude E. Shannon, an American scientist, published an application of his Information Theory (then, called "Communication Theory") to Cryptography [Sha49] -- which, in this author's opinion and as discussed here, can be used to derive positive and practical answers to both questions (i) and (ii) above, together. This will also show that the Wassenaar Arrangement is a moot issue to security and privacy concerns -- notwithstanding all other different issues that it may raise [EFF98].
To answer the questions, we need first to revisit the concept of "unicity".
Shannon [Sha49] defined "unicity distance" (hereafter, "n") as the least amount of plaintext which can be uniquely deciphered from the corresponding ciphertext -- given unbounded resources by the attacker. The "amount" of plaintext (i.e., "n") can be measured in any units the user may find convenient, such as bits, bytes, letters, symbols, etc. Actually, Shannon used "letters" in his paper.
Of further importance and often ignored or even contradicted by declarations in the literature such as "any cipher can be attacked by exhaustively trying all possible keys", I want to call attention to the fact that any cipher (including 56-bit DES) can be theoretically secure against any attacker -- even an attacker with unbounded resources -- when the cipher is used within its unicity. Not only the One-Time Pad (also called XOR-cipher or Vernam cipher). Thus, indeed there is a theoretically secure defense even against brute-force attacks, which is to work within the unicity limit of the cipher. And, it works for any cipher -- irrespective of key-length or encryption method used.
It is important to note, as the literature has also not been very neat in this regard, that unicity is always referred to the plaintext. However, it may also be applied to indicate the least amount of ciphertext which needs to be intercepted in order to attack the cipher -- within the ciphertext/plaintext granularity. For example, for a simple XOR-cipher (Vernam cipher), one byte of ciphertext links back to one byte of plaintext -- so, a unicity of n bytes implies n bytes of ciphertext. For DES, the ciphertext must be considered in blocks of 8 bytes -- so, a unicity of n bytes implies a corresponding modular number of 8 bytes.
Further, the present results are discussed for English and for compressed
English text, but they could equally have ben discussed for any natural
language, for any protocol or document format -- including fax, modem signals
or specialized protocols tunneled in within the encryption [see Ger99,
Section 2].
Please note that it does not matter how the attacker may try to decipher -- he can of course use brute-force and try out all keys or he can use short-cuts, it is his choice and he is entirely free to use any method he desires.
The main point is that if the intercepted message has less than "n" (bits, bytes, letters, etc.) of plaintext then it cannot be uniquely deciphered from an analysis of its ciphertext -- even by trying all keys. Conversely, any message that is larger than "n" bits could be theoretically broken -- albeit that may require some large (actually, unspecified) amount of work.
I call attention to the fact that the denominator of the formula above
may be zero in some cases of interest -- currently, at least of theoretical
interest. This means that the formula will fail to provide an answer in
such cases. I expect to address this situation in a new and expanded formulation
of unicity, which will follow several aspects to be advanced below -- specially
in regard to the Sections on identification of unicity, different unicity
definitions and "added trust". The interplay between unicity and trust
will also be further explored in next messages.
In the first case we (the attacker) trust that:
The answer to both questions is the unicity, in this case. In the general case, the first answer is the unicity and the second answer is the least number of cipher blocks which link back to the unicity. With the given data, the unicity is calculated as:
In the second case, we add the trust that the plaintext is in English. What changes? Our trust on f -- which is now:
However, I note that the above value is only correct for very long text -- certainly not for 8-character text. Thus, it will be considered here as an approximation, as usually done in the literature -- for lack of a tabulated entropy value for 8-character English text. The value of H(English) = 1.2 bits/character will be used throughout this exposition so, at least, the results will be consistent with one another. This also corresponds to a worst-case analysis in security -- since the entropy value for 8-character English text should be larger than the value for very long English text.
The calculation is:
Thus, the added trust in f' (vis-a-vis f) has decreased the unicity
approximately 5 times.
An example may further help clarify the differences, and the benefits of compression -- albeit 100% known to the attacker.
First, it is interesting to comment on compression/decompression as an encoding/decoding operation in comparison with encryption/decryption as an encipherment/decipherment operation.
Encipherment/decipherment operations are deterministic by themselves in their values (even though they may include random seeds) but they are usually so complex and involve so many different possibilities as a function of the keys and all possible messages, which are unknown by definition, that one must (and, is justified to) resort to probabilistic methods in order to describe average values for dealing with the variety of possible variables and outcomes -- as we have done by defining the entropy of the keys in a cipher system H(K), as well as the entropy of the actual message H(M) and the maximum possible entropy of the plaintext |M|.
To contrast, lossless compression/decompression can be seen as a reversible deterministic encoding which essentially forms a 1:1 partial function such that given a valid encoding one can uniquely and directly determine the valid decoding and vice-versa. Possibly, not only by a linear transformation but by an affine transformation -- which is still a 1:1 partial function and deals only with deterministic values. Thus, we may represent compression simply by a deterministic factor which scales bit/characters to bit/compressed-characters.
Following the arguments above, one can view the compression argument of this Section in an alternative way which may be more illuminating. Suppose we will use 2x compression. Then, the same amount of English-information (i.e., information as what you do not expect; surprise) is present before and after compression -- but now in half the space. Thus, 2 characters = 1 compressed-character. Of course, compression has limits and the limit of compression is the original entropy of the plaintext -- but 2x compression is usually a feasible goal for English text.
But, must we really consider the compression method known to the attacker?
Yes. In cryptography, specially with commercial programs, one usually assumes that only the secret key is not known (even though possible statistics about the secret-key may be known, such as the fact that most UNIX users do not use anything else besides lower-case letters and a few symbols such as !$%& in their passwords). Thus, one must assume that all details at least about the decompression algorithm (i.e., the compression algorithm may be secret in special one-way communication protocols) are either known or can be decompiled (e.g., the RC4 cipher algorithm released on the Usenet).
This makes it simple then for the attacker to set up an interface module that translates valid compressed-text into plaintext, such that given one it is immediately possible to know the other -- deterministically, unambiguously, uniquely. Which simply inverts the affine transformation. Of course, no knowledge about the underlying plaintext language statistics is needed for this. To identify which compression algorithm is in use, the attacker can rely upon publicly-known and necessarily present "magic numbers" or "information tags" as discussed in Section 2 of [Ger99].
For special protocols, in high-security applications, the "magic numbers" can also be implicit or encoded with a random seed in each use. However, the encoded data itself may betray its presence by its very structure, size, range, randomness, redundancy, etc. This is a problem in identification and, as given in [Ger98], "to identify is to look for connections" -- not just to look for an identity connection with a set of "magic numbers". The attacker can thus rely on intrinsic connections it may find or try.
To look for such intrinsic connections, it is interesting to note that the plaintext statistics are indeed translated to the compressed-text statistics, while of course possibly warped by the affine transformation. Yet, still recognizable through the affine transformation rules -- which is the 1:1 partial function. So, clever analysis can even allow the attacker to work directly in compressed space if one so wants, when analyzing a cipher. For example, Huffman coding assigns the most probable character to the code "O", the second most likely to the code "10" and so on (here, one may also have "1" and "01" but that is irrelevant). Thus, for English, we know immediately what they are: "0" is " " (space) and "10" is "e". And, so on, also using digrams, trigrams, etc. for the di-code, tri-code, etc. probabilities. The same applies to other compression methods such as Lempel-Ziv, pkzip, zoo, gzip, etc. -- not to mention the familiar headings and "magic numbers" that identify each compression method (see the respective comments in Section 2 of [Ger99]). For example, several compression methods rely on "autocode", which is a name for "uniquely decodable code" -- i.e., an easy distinguished target.
Now, for the compression example, I will use the last case of the example in section 3. and include a new trust-point after f' -- i.e. added public information which the attacker knows and can rely upon:
As we can see from above, twice compression has increased English text unicity more than twice -- actually, 22% more than the compression ratio -- vis-a-vis the second case.
Thus, this is not a question of efficiency alone -- it is a fundamental consequence of compression, since compression must increase the text's entropy per compressed-character. But, of course, compression also adds to communication efficiency, since there are less bytes to send for the same amount of text.
The benefit of compression is to increase unicity and such does not depend on obscurity for the compression algorithm.
In an alternative view, compression allows you to tunnel through brute-force
with more plaintext than the case without compression.
It may seem that the more the attacker trusts information about the intercepted message, the least the unicity. It seems intuitive that the more the attacker knows, the least he would need to know.
This is, however, provably wrong.
Added open trust -- publicly known to all, including the attacker -- can either increase or decrease unicity, as has been proved and exemplified in Sections 3. and 4. above.
Of course, to measure and understand this effect one needs to refer unicity to the same units of plaintext -- for example, in characters of English text. Otherwise we would be comparing apples with speedboats.
There is much more to the subject of added "open trust" than this brief introduction can contain, when one wants to use it to increase the security of cipher systems. Another example is the recent results reported by Tony Bartoletti in the mcg-talk listserver [MCG99], for plaintext hardening before encryption. However, for the moment, suffices to say that, demonstrably, added open trust can increase unicity -- even when publicly known to all -- and works as a type of "open-key".
Or, in other words, adequately adding open trust -- i.e., adding Open-Keys -- proved to decrease what the cipher system might disclose at most for a given length of intercepted ciphertext, under proper unicity conditions -- notwithstanding the public's full knowledge about such trust (and, the attacker's). With Open-Keys, the more the attacker knows the safer the system is.
Of course, the same feature is trivially true for "closed trust" -- i.e, that which is trusted but is not publicly known, as a "secret-key". And, is the basis for current WA export limitations [WA98]-- to limit the amount of "closed trust" one may have or require. That the same feature but not the same limitation applies to "open trust" is very noteworthy -- and not controllable.
The concept that emerges from such discussions is that since Open-Keys can provably enhance the security of too-short secret-keys, it is worthwhile to investigate what are the eventual limits of such synergy. For example, is there a maximum limit to the size P of an English message that can be sent within unicity conditions for a secret-key bit-length N and an open-key bit-length M? What are the tradeoffs?
The use of open trust should not however be confused with "open ignorance" -- where "ignorance" can also be seen as "absent trust" or "Formless trust" ( cf. "Formless" as discussed in [Ger98] and "Formless trust" in trust-ref list discussions). This will be discussed in the next item.
5.2. Absent Trust and Unknown-Keys -- 70-bit DES
What happens when everyone ignores a secret but only one person has trusted information that may lead to that secret? Then, we expect that "one person" to be able to discover the secret sooner -- by using that trust -- all other resources being equivalent. But, if the resources are not equivalent -- can we still favor one in lieu of the others? Yes, if we balance trust and ignorance to counter the resource imbalance, as this item demonstrates in two different scenarios.
First, I will consider increasing the security of Triple-DES when used with just one 56-bit key -- leading to an effective 59-bit key-length. Second, I will consider a more general case, describing a protocol called M-DES -- for which the effective key-length is (56 + M)-bits. For M=14, a practical case, this implies 70-bit key-length. All these methods are export-free under WA -- since the only secret-key has 56-bits. The other security bits are initially ignored by the recipient and attackers alike.
Consider the cipher protocol for one-key Triple-DES [NIST99], where the cipher is given by Encrypt[K1, Decrypt[K1, Encrypt[K1, Plaintext]]] = Encrypt[K1, Plaintext] (so that one-key Triple-DES is equivalent to the usual DES) and is represented by the letters EDE. Now, consider a (yet hypothetical) newer Triple-DES Standard, where all 8 combinations (EEE, EED, DDE ... DDD) and all key orderings K1, K2, K3 would be allowed and post-selected by the sender, Bob, after algorithm negotiation (Triple-DES) with Alice. This corresponds to added uncertainty. And, even if the attacker knows or supposes that one-key Triple-DES is being used (though a Triple-DES cipher was negotiated), there are still an open total of 8 possible choices with K1=K2=K3 for the one-key Triple-DES case. But Alice, who would have the right key from Bob (e.g., sent by a public-key cipher), would only have to test the first DES block to see which scheme was randomly chosen by Bob -- for which she just needs to try out one block 4 times, on average. An attacker would also need to try out an average of four blocks, but each one with an average of half the corresponding key-space of 256 -- which would at least multiply by 4x the average attack workload for one-key DES due to the combination uncertainty of EEE .. DDD.
In the above example, no one knows (neither Alice nor anyone else) which of the 8 possible schemes Bob did use -- everyone openly ignores it. Thus, this situation can be defined as "absent trust" or "open ignorance".
The concept of open ignorance can be seen as defining an "unknown key". And, it can be applied not only to the cipher algorithm itself as done above -- but also to the ciphertext as discussed in the next paragraph, and in combination. This is in contrast to the usual technique of adding "salt" (random data) to the plaintext, or to the algorithm's bit manipulation as done in the "crypt" procedure for UNIX passwords [MOV97, p. 393-394]. These forms of adding variability do not actually benefit from open ignorance to make a brute-force attack more difficult because the "salt" is publicly known -- it is not an unknown key. Actually, their main purpose is to make a dictionary attack more difficult [MOV97]. Further, even if unknown, "salt" added to the plaintext would not increase the attacker`s workload for a brute-force attack.
The effect of open ignorance can be exemplified in simple one-key 56-bit DES encryption, where an unknown key is applied to the ciphertext and imposes a large decryption uncertainty. As will be shown next, it can easily bootstrap key-length by a large factor -- without breaking any export restriction on the 56-bit limitation.
Consider a hypothetical M-DES Standard (hereafter, M-DES) which would specify a total of 2M pre-defined and publicly known 64-bit different blocks of data, for various choices of M = 0, 1, ... Now, when Bob wants to send a message to Alice, he needs to negotiate a M-DES cipher with Alice, which includes a 56-bit secret-key [Note-1] and a publicly defined value for M, say M = 14. The value of M is chosen by Bob according to his security needs [Note-2], and accepted (or not) by Alice. To use M-DES, Bob would privately choose one 64-bit "key" at will (from the public list of 214 "keys") and then repeatedly use this choice to XOR-encode [Note-3] one by one each of all 64-bit DES ciphertext blocks which are calculated with the 56-bit secret-key he shares with Alice -- without disclosing his choice (i.e., anyone else would ignore the choice). Thus, Bob's choice is an "unknown key" to anyone else, including Alice -- providing open ignorance. An attacker would have a workload (214)/2 = 8,192 times higher on average for all 56-bit secret-key combinations to be tried -- which would effectively raise the DES key-length from 56 to 70 bits. But the intended recipient Alice, who knows the 56-bit secret-key, would just have to try out an average of 8,192 DES decryptions of one 8-byte block in order to find out which "unknown key" was chosen by Bob -- a small initial overhead which can be averaged out for different sessions that can use the same "unknown key" (albeit with possibly different DES secret-keys, see below).
Of course, the same example can be used with the extended one-key Triple-DES discussed before, but inceasing perceived key-length by additional 3 bits (i.e., for the 8 possible combinations EEE...DDD) -- to 73-bits for one-key Triple-DES. For two-key and three-key DES, additional ignorance can come from an undisclosed key order.
On a practical question, how can 214 blocks of 64-bits be defined and known worldwide without doubt? This is a total of 16,384 blocks of 8-bytes each block. If stored as a table, they would take approximately 130 K bytes. But, they can also be defined as an algorithm -- in much less space. For example, as recursive applications of a secure hash-function such as SHA-1 [MOV97] on a simple and known seed such as "0", or as the output of a well-known long-cycle random-number generator with known seed. But, since ignorance does not pertain to the blocks themselves but rather to which block is chosen, there is a much simpler solution. If the decryption algorithm is a random cipher (see Section 2) at least over a subset of the plaintext space (as it must be for some subset -- for example, DES is a random cipher over a 56-bit plaintext space [Ger99]), then the needed 16,384 (i.e., 214) blocks of 64-bits each can be formed by a simple sequential numbering from No = 1 to (No + 16,384), where No is publicly known and fixed -- e.g., as a 64-bit number in the case of DES.
To increase ignorance to the full extent of the 64-bit space allowed by the ciphertext block, without the corresponding overhead, two dialogue parties could use a trusted channel (with the needed trust evaluated as a function of risk and cost for each possible channel: personal contact, mail, fax, phone, a previous e-mail, etc.) in order to establish a mutual long-term secret No = (1 to 264 - M). Then, No is used together with a publicly-known ignorance limit of 2M, where M << 64, in the M-DES protocol (exemplified by the M=14 case given above). The mutual long-term secret should be different for each dialogue party, increasing security. This would effectively increase the M-DES bit-length to 120 bits -- from a protocol secret-key of only 56-bits, defining a (64 - M)-bit key besides the 56-bit DES key and the unknown-key with M-bits. However, since this third key needs to be secret (for M =14, this key would have 50-bits -- too large for low-latency discovery) it would require the user to enter it or read it from a file, floppy, etc. Which might not be allowed by a strict interpretation of [WA98]. However, publicly known M and No as given by the previous example are of course not "secret" and are allowed by [WA98].
One question is the usual problem of key-repetition in XOR-encoding [Bek82] when the message to be enciphered is longer than the key, which is clearly unsafe for non-random (e.g., English) messages. However, note that DES ciphertext encrypted with the same DES-key is itself random from block to block -- even if only one bit of plaintext changes from block to block [Ger99] -- thus, the unknown-key can be kept constant in the XOR-encoding since the message itself (i.e., the DES ciphertext) that will be XOR-encoded is either constant or random. In other words, unlimited number of DES ciphertext 8-byte blocks can be securely XOR encrypted by a constant unknown-key that has only one 8-byte block.
A related question is collision after the final M-DES XOR encipherment, that can occur in two different ways -- which, however, only affects the attacker's knowledge whether a message was repeated, not what the message is. In the worse case of equal plaintexts before DES, given 2M choices (for example: 16,384) for the unknown-key it is a simple mathematical problem [Bek82] to calculate the maximum allowed number of repeated uses of the same DES 56-bit key that would guarantee a near-zero probability of randomly choosing the same unknown-key for that DES secret-key in different sessions -- thus deriving worse case reuse limits of the same DES secret-key to avoid XOR-ciphertext collision for each M. The same question can be asked for its dual, calculating (in the worse case of equal DES plaintexts) what is the maximum allowed number of repeated uses of the same unknown key, if the DES secret-key is changed for each session. Clearly, since M << 56, the last question can allow the dialogue parties to strongly reduce the average ignorance computation overhead for a chosen M, by spreading it over several sessions -- while the first question allows key-negotiation to be adequately scheduled for a chosen M. But, neither collision events are of practical concern if every session which randomly chooses a DES key (from 256) should also randomly choose the unknown key (from 2M) and vice-versa -- even for small M.
But, how does Bob decide on M? Implictly, because Bob is not interested
on M per se -- Bob's concern is security versus cost, for which Bob needs
to define a threat model. Let us suppose that Bob's threat model is dominated
by a brute-force attack for X years under currently estimated military
technology -- this directly implies a minimum key-length K0, needed to
defend against such threat within an acceptable confidence level (e.g.,
99.999% per key). Obviously, Bob wants an effective key-length that is
at least equal to K0 -- which value he inputs to M-DES. To find the effective
key-length, M-DES solves a decision problem which depends at least
on two independent variables: plaintext statistics and M. For a given statistics
and M, is the effective key-length larger than K0? Yes or no? If not, and
possible, increase M. Some of these issues are discussed in [Note-2]
and lead to the following table for some values:
| MESSAGE | Entropy | M | effective key-length |
| English text | 1.2 | 0 | 56 |
| English text | 1.2 | 14 | 70 |
| random text | 8 | 0 | (120) -- full unicity [Note-2] |
These and other cryptanalysis issues of M-DES, including higher key-length modes, with No public or private, will be further addressed elsewhere since they are outside the scope of this paper -- on unicity. For public discussions, also forwarded from other fora and private messages, please see the mcg-talk listserver [MCG99].
To summarize the M-DES scheme, the objective is to slow down the attacker much more than the penalty imposed to Alice and Bob -- so much more that it becomes infeasible within a given threat model. This is granted because:
Therefore, open ignorance can much improve limited key-length protocols as exemplified for DES, raising them even above the limit of controlled export [WA98], of 64-bits for symmetric ciphers such as DES -- in a practical setting. This is also useful because DES is a very well-tested cipher, with few and known weak points [Ger99] which can be well-accounted for -- except its low 56-bit key-length, which can be solved as above.
Thus, the above exposition is not only a "proof of principle" that key-length export restrictions can be easily foiled but also offers a practical way to achieve worldwide standards for 70-bit cryptography or much more -- today. Future expositions will show how this can be extended to (for example) 184-bit security or even to theoretically unbreakable security, but still only with a 56-bit secret-key.
Im general, the method can be applied as a "meta-method" to any block cipher system -- not only to DES -- defining a "M-X" method for cipher X. To summarize the overall scheme, the open ignorance technique depends on two parts:
In conclusion, the usefulness of unicity is perhaps not only as a measure
to help "grade" cipher systems (but where it may miserably fail if used
as originally defined -- see Section 2.), but it seems to be more interestingly
inversely related to what a cipher system may disclose at most -- when
any number of its cipher messages are attacked by unlimited computational
resources, including time. In this, open trust was shown to play an essential
role to increase unicity -- as an effective but open key, which however
does not change the perceived secret-key bit-length. This was contrasted
with open ignorance, which increases the perceived secret-key bit-length--
by an added uncertainty that is shared by all and leveraged by the original
secret-key bit length itself.
However, it is important to note that "open ignorance" adds uncertainty
in order to protect the cipher by increasing the attack workload within
limited resources, while Open-Key adds certainty in order
to protect the cipher from attacks that can use any (unlimited)
amount of resources. Also, with open ignorance there is an increase of
the perceived secret-key bit-length, which does not happen with Open-Keys
-- notwithstanding, security can be increased by either method. As a further
difference, with Open-Keys the more the attacker knows the safer the system
is -- while with Unknown-Keys the more everyone needs to know, the safer
the system is.
Thus, both methods described in this Section work in complement
-- therefore, it is feasible to use them in conjunction in order to obtain
increased security gains over each method in isolation. For example, improving
upon the 70-bit effective figure obtained above for 56-bit secret-key DES
and 14 bits of open ignorance -- by adding x bits of open trust.
This will be discussed elsewhere.
First, I note that it is hard to define what is "to look valid" -- we do not know what a valid plaintext is. So, we must expect difficulties to distinguish between a false plaintext and a correct one, since it is axiomatic that we do not know the plaintext.
In other words, we have an identification problem.
Using the Identification Theory outlined in [Ger98], I can consider the four possible identifications of all possible decryptions, defined by identification level I-2, as:
It also serves to exemplify the generalized concept of identification -- "to identify is to look for coherence" -- which allows identification to be defined without a previously required identity. This removes the circularity implied by the usual definition (see [Ger98]) and allows an intrinsic treatment of identification -- as needed here.
To start, I can rephrase Shannon's unicity (hereafter, Unicity-1) as:
This immediately allows us to define another unicity, as the "least upper bound" of Ambiguous:
It is easy to prove that Unicity-1 = Unicity-2 for a random cipher, by writing the probability expression that corresponds to the "maximum amount of plaintext that gives just one expected false decipherment" and transforming it into Unicity-1's terms.
This resolves a difference in the literature, by showing that for random ciphers it is equivalent to consider "the least amount of plaintext that can be deciphered" or the "maximum amount of plaintext that gives just one expected false decipherment". Since the last case may be easier to calculate, for some systems, one may choose.
It is also possible (and, useful for security) to define Unicity-3 until Unicity-6 by following down on the DAOF scale:
"Unicity-4 is the maximum amount of Obscure plaintext which can be deciphered from the corresponding ciphertext -- given unbounded resources by the attacker"
"Unicity-5 is the least amount of Obscure plaintext which can be deciphered
from the corresponding ciphertext -- given
unbounded resources by the attacker"
"Unicity-6 is the maximum amount of Formless plaintext which can be deciphered from the corresponding ciphertext -- given unbounded resources by the attacker"
But, as mentioned in Section 1, a theoretically secure defense against a brute-force attack on DES is to work within the unicity limit of DES.
This defense requires however the cipher to satisfy the "random cipher" condition within its entire message space. This was originally formulated by Shannon [Sha49] and, in the case of DES (an 8-byte block cipher), essentially requires the condition below:
However, as shown in [Ger99], the above statement is neither entirely correct nor, even if entirely correct, would allow DES to be considered a "Random Cipher" over more than one block.
Thus, current unicity calculations for DES, which simply ignore that the validity conditions for the unicity formula itself are denied, are in error and need to be revisited -- as I consider next.
When one uses compression (best case, as seen in Section 4.), different authors quote DES unicity to be from 16 to 20 bytes of English. For uncompressed English, a value of 8.2 is often quoted. These numbers are indeed easy to obtain. For example, if I would directly use the calculation given in Sections 3. and 4. above, DES unicity would indeed be 8.2 bytes for uncompressed English and 20 bytes for 2x compressed English -- since DES uses 56-bit keys and enciphers bytes (although in 8-byte blocks).
As another example, quoting from Schneier [Sch98]:
"The unicity distance grows as the redundancy of the plaintext shrinks. For compressed files, the redundancy might be 2.5, or three blocks of DES ciphertext".
Our interest is to study DES under brute-force attack. In this condition, the discussion in [Ger99] can be summarized in three observations:
ii. there is almost no expected plaintext ambiguity when inverting DES to tentatively decipher a ciphertext from English -- one expects to observe either "garbage" or readable (thus, valid) English. This is called the "one-block impossibility" in [Ger99].
iii. If a second block is used, using only the same keys pre-selected for the first block, "near correct" messages effectively do not occur -- which means that the key can be found even with unsophisticated or incomplete text analysis. This is called the "two-block impossibility" in [Ger99].
The idea is that the least amount of "Obscure readable and valid text" should already be enough, when compared with "Obscure garbage" -- as given in item (ii) above. In other words, the attacker just needs to identify with some degree of success whether English is probably present or not -- he does not need to know which English words were used in order to break into the key. This leads to what can be called an "Identification Attack", with least information. Of course, once the key is known, any message that uses that key can be directly deciphered -- irrespective of message length.
The question is now what unicity concept should one use as the least amount of information needed to break DES -- Unicity-5, as given in Section 6, seems more appropriate than Unicity-1.
Since DES works on 64-bit blocks, this issue is not so important to define the least amount of ciphertext that needs to be intercepted -- 64 bits is certainly more than enough. However, it is crucial to the system's security to know how many English characters one is allowed to transmit under provable security conditions -- i.e., that cannot be broken by a (given and possible) brute force attack. Of course, a low unicity also strongly reduces the workload of the attack since less ciphertext is needed and this may lead to simplifications in the computations.
This needs to be emphasized. Under current technology, 56-bit DES is only secure if used within its unicity -- which is expectedly much lower than calculated by Shannon's formula, by two counts: (i) the cryptographic formula does not apply and plaintext unicity must be used, (ii) the plaintext unicity is not Unicity-1 but the much lower Unicity-5. Also, attack workload could be much less than hitherto imagined.
To estimate Unicity-5, already defined in Section 6, I will use the given values of H(English) = 1 to 1.5 and define the value of Unicity-5 = 21 to 21.5 = 2 to 3 characters. I will use the largest value, as a worst case, to define the detection threshold:
Thus, the unicity of DES for English text is approximately 3 bytes -- compressed or not. Not 20 bytes. The difference between a name ... and just the initials.
This also means that if you are trying to brute-force DES you need just one ciphertext block. Not three, as given in [Sch98]. Which is a decrease of 20:3 in the number of plaintext bytes that need to be known for each key before the cipher is broken.
But, what is the significance of this result for DES?
First, the ratio of 20:3 implies at least a 3x increase in decryption throughput because one can now tentatively decipher only one DES 8-byte block per key trial, not 3 blocks (24 bytes). Attacks can be more efficient.
However, this point has relative relevance -- since the attack is still not trivial.
Far more important is the consequence that 20 bytes of compressed text are not safe against brute-force attack on DES -- and they never actually were, so we can imagine the consequences. Re-keying is also not very useful to block a brute-force attack since one would need to re-key DES after every two letters -- i.e., changing 56-bits every 16 bits.
These considerations thus immediately impact the assumed security of SSL, S/MIME and other protocols that use DES. They also mean that even the small refuge provided by unicity to resist brute-force attacks on text encrypted with DES, as strongly compressed as one may wish, is gone.
Further, it is interesting to note the self-consistency achieved in the argumentation. The hypothesis that DES can be attacked by brute-force has led to a DES unicity of 3 bytes -- which supports the hypothesis. In other words, the low unicity of 3 bytes decreases by 3x the workload of a brute-force attack and thus makes it more viable, when compared to the usual figure of 20 bytes.
As a final remark, the term "3-letter" Identification Attack as used against DES should be understood in the sense of frequency tables for 1-letter, 2-letters and, at most, 3-letters. Except in the case of privileged information that can be used as a probable plaintext (for example, OIDs, "magic numbers" as used in protocols, covert channels, etc.), one should view the term "3-letter" as an attack using only known statistics, perhaps even without a dictionary.
For further details on the DES specific considerations and probability
calculations pertinent to the "Identification Attack", please see [Ger99].
The paper shows that the low-end limit of security/key-bit is occupied by DES. Indeed, DES English messages can be brute-force attacked over a plaintext space of only 3 letters -- in an Identification Attack -- instead of the currently assumed 20 letters. This result immediately impacts the assumed security of SSL, S/MIME and other protocols that use DES. It further shows that re-keying is not of very much use under DES, even if done out-of-band, since one would have to re-key after every two bytes of text.
The paper also revisits the very concept of unicity and presents new unicity types which result from the application of Identification Theory, when identification levels are understood as cardinal numbers. It also provides practical examples, showing how unicity can be used as a merit figure to evaluate different cryptographic systems and how the results can be used to guide the design of safer systems.
In particular, the idea that added open trust can increase security -- even if known and certain to the attacker -- was shown and calculated. This leads to a new security paradigm, for which:
Further, in order to increase brute-force attack workload the paper describes a concept called "open ignorance" that deals with Unknown-Keys and a very high workload differential to discover the unknown key, as viewed by the intended recipient vis-a-vis an attacker. The concept of Unknown-Keys is distinct from the usual resource of adding uncertainty (e.g., "salt") to the plaintext or the algorithm's bit manipulation -- which mainly aim to bar dictionary attacks but offer no gain against a brute-force attack. This leads to another new security paradigm, for which:
The Open-Key/Unknown-Key concepts may also allow the use of smaller
secret-keys -- which can have other applications besides improving export-limited
cipher systems, such as in smart-cards, digital signatures, authentication,
non-repudiation, etc.
Thus, the above exposition is not only a "proof of principle" that
key-length export restrictions are a technical moot issue but also offers
a practical way to achieve worldwide standards for 70-bit cryptography
or much more -- today. Which can positively impact Internet e-commerce
and interoperation.
This work is being done within the stated objectives of the MCG, the
Meta-Certificate Group -- an Internet open group on Certification and Security,
currently with participants from 28 countries. To use public discussions
to develop a worldwide security lingua franca in a set of Open Internet
Standards and Open Source APIs (Application Programming Interface) that
will enpower programmers and users worldwide, in order to fully address
the current security, privacy and restricted interoperation issues of the
Internet -- as well as cost. It is important to note that this is not intended
as an attempt to violate any law in any country, but to allow the technical
concepts of privacy and security to be as transparent as possible in regard
to any national legislation in existence or in planning.
[EFF98] Cracking DES, reference in http://www.eff.org/descracker/
[Ger98] Gerck, E., "Identification Theory", in two parts: http://www.mcg.org.br/coherence.txt and http://www.mcg.org.br/coherence2.txt
[Ger99] Gerck, E., "Some Non-Random DES Characteristics and an Identification Attack on DES", in http://www.mcg.org.br/nrdes.htm
[MCG99] mcg-talk archive at http://www.mcg.org.br/cgi-bin/lwg-mcg/MCG-TALK/archives/
[MOV97] A. Menezes et al., Handook of Applied Cryptography, CRC Press, New York, 1997.
[NBS77] NBS FIPS PUB 46, "Data Encryption Standard", National Bureau of Standards, U.S. Department of Commerce, Jan 1977.
[NIST99] NIST FIPS 46-3, Draft to specify the use of Triple DES and deprecate the use of DES, Jan 1999, see for example http://jya.com/nist011599.txt
[Note-1] It is not relevant here how Alice knows the DES secret-key she shares with Bob, and why the attacker does not. This will be the subject of another exposition, as it pertains to a similar discussion of security for asymmetric cryptography -- public/private-key pairs also limited by the WA. For the moment, just suppose that Bob and Alice met and Alice received the secret-key (or, for example they exchanged floppies with 1,000's of 56-bit DES keys to be used in sequence).
[Note-2] If the plaintext data to be sent by Bob is truly random and unrelated to any other data (a very particular case) then Bob will choose M = 0 because the attacker cannot decide what is a valid plaintext (see the discussion in Section 6, above) and Alice needs to accept the decryption based on her trust that the secret-key is correct. Note that in this case, for truly random 64-bit data, key-length is formally120-bits but the cipher is theoretically secure since |M| = H(M) -- we have full unicity -- so, security is at the maximum possible level even with M = 0. However, if the plaintext is random but constitutes a private-key then Bob should choose M > 0 because the private-key can be tested against its public-key counterpart and thus be detectable by an attacker as a valid plaintext. Further, if Bob wants to send twice the same plaintext data that is truly random and unrelated to any other data, but does not want the attacker to know that the message is being repeated, then Bob could also send together with the message some additional non-random plaintext that Alice could detect -- for example, any message that can be detected by Unicity-5 (See Section 6) -- and use M > 0.
[Note-3] Note that the XOR-encoding will use a fixed key but a random-message for each block -- since the DES ciphertext is a "message" to the XOR. Thus, the XOR-encoding is secure since the "message" has a full entropy of 8 bits/byte [Ger99]. However, any other (secure) encryption method can be used with M-DES -- such as RC4 or RC5, even DES. But, DES is limited to 56-bits of key-length. Further, the option that involves least assumptions is surely the one that only uses XOR -- which can also be theoretically analyzed, in contrast to RC4, RC5, DES, etc.
[Sha49] Shannon, C. Communication Theory of Secrecy Systems. Bell Syst. Tech. J., vol. 28, pp. 656-715, 1949. See also http://www3.edgenet.net/dcowley/docs.html for readable scanned images of the complete original paper and Shannon's definition of "unicity distance" in page 693.
[Sch98] Schneier B., CRYPTO-GRAM, December 15, 1998, e-mail newsletter, http://www.counterpane.com. See also Schneier, B., Applied Cryptography, J. Wiley & Sons, 1996.
[WA98] http://www.wassenaar.org