Date: Tue, 3 Nov 1998 22:13:03 -0200 (EDT)
From: "416720"
To: MCG
Subject: [MCG] What is identification, that we can identify it?

List:

Identification is often understood as an act of identifying, or of establishing an identity. Identity is usually defined as "the distinguishing character or personality of an individual" [1]. Of course, such definitions cannot be applied on the Internet. Any mention of "identity" or "identity authentication" on the Internet is a mere tag for something else, such as an individual's purported attribute. Certainly, without any physical contact with an individual, or even any way to directly verify it, it is simply not possible to speak of "identity" or "identification" over the Internet as a dictionary defines it. All legal and technical studies that call for or depend on such an equivalence are misleading. Any such "identity" or "identification" can be faked or repudiated, or is unwarranted to relying parties (e.g., by CAs), over the Internet.

Though not so extensive, this is also an unsolved problem in the 3D-world, outside the Internet. On 15th April 1997, The Daily Telegraph, a UK quality newspaper, reported on Alan Reeve [2] -- a convicted criminal and triple killer who was described as "friendly, caring, dependable and loving" by his fiancée when he was arrested under a false identity in Ireland. Clearly, the indeterminacy of "identity" in the 3D-world is itself a reason to doubt any extension of such credentials to the Internet [3]. Moreover, on the Internet, we also need to identify hosts, routing, software, etc. -- not just humans.

What is the solution, if any? This essay begins with a suggestion to revisit the concept of identification -- what is identification, that we can identify it? To fulfill this goal, the work intends to be as general as possible.
It should be useful to the scientific development of all Internet protocols, to all human communication modes, to all information transfer models and anywhere one needs to reach beyond one's own point in space and time.

What is "to identify"? I posit that "to identify" is to look for connections. Thus, in identification we look for logical or natural connections. For example:

- between a fingerprint and the person that has it,
- between a name and the person that answers by that name,
- between an Internet host and a URL that connects to it,
- between an idea and the way we can represent it in words,
- conversely, between words and the ideas they represent,
- etc.

Do you, the reader, agree? If you agree, you have just identified. If you do not agree, you have likewise identified. The essence of identification is thus to find connections -- where the absence of connections also counts.

Identification can thus be understood not only in the sense of an "identity" connection, but in the wider sense of "any" connection. Which one to use is just a matter of protocol expression, need, cost and (very importantly) privacy concerns.

To go forward, I refer now to the British form "connexions" (for connections), which points to the same root as "nexus". Linguistically, nexus is *both* the link and the linked collective. Usually, we say something "has nexus" when we perceive natural or logical connexions between its elements -- i.e., when there is a perceived coherence. Henceforth, to be precise, I use the terms:

- "nexus": a linked collective of elements.
- "coherence": links that form a linked collective.

Linguistically, "coherence" is "a natural or logical connection" -- thus, the intended technical focus is linguistically correct both for "nexus" and for "coherence".
I am not creating new words nor defining new meanings for old ones -- I am just capturing their precise and old meanings, while distinguishing between the thing itself (coherence) and its effects (nexus).

Thus, a set of elements can form a nexus if and only if there is some coherence among them, as recognized by an observer. Different observers may observe different nexus, or even an absence of nexus, when looking at the same set of elements. I note that this is an important realization in day-to-day life, so that we can healthily accept and work with people who are different from our own ego (person), since we all have different viewpoints in one aspect or another. This issue will be taken up again when I deal with Objective, Intersubjective and Subjective Realities.

The distinction and interoperation implied in this paragraph are even more important on the Internet, and between the different software agents and messages that express our wishes as our "ambassadors" in a foreign land. We want their actions to be coherent with our wishes -- i.e., to have a natural or logical connection.

To formally describe the fact that different observers may diverge in their declarations about coherence when observing a set of elements, I note that a nexus is of course more than the sum of the elements in it -- since a nexus is a linked collective of elements. Thus, formally, I can say that:

    nexus = (sum of elements) + coherence

What, then, is expressed differently by different observers when they evaluate a nexus, even though they all see the same elements? Coherence.

With these considerations I can already identify things -- albeit rather crudely. I can say that an element is "part of" or is not "part of" a nexus.
In other words, I have defined a function called "part-of", which now allows me to express very clearly what is "part-of a nexus" and what is not:

    Part-of a nexus: an element is part-of a nexus, as evaluated by an observer, if and only if the observer can perceive coherence between the element and the nexus.

This I define as identification-1 (hereafter, I-1), or the first level of identification. I-1 is the least amount of description one needs in order to identify anything from a formless substrate, as defined by the existence (or lack) of coherence:

                                      COHERENCE
    IDENTIFICATION-1:
      Identified (part-of)               YES
      Not Identified (not part-of)       NO

For example, "water is part of a river" is an identification of water -- out of the formless substrate of a river.

The first level of identification is governed by the logical laws of mereology. The function part-of defines what Leśniewski called a collective class: when the whole is conceived as physically constituted by its parts. As an example, the class of all entities called "John Smith" consists of the entire collection of them -- and no one is more or less "John Smith" than any other. All "John Smith" entities are indistinguishable in the collective. Collective classes obey a general logical theory of the relation between part and whole, called mereology. The word "mereology" comes from the Greek "mereos" or "part". It is the simplest class structure one may have.

To improve the depth of identification of I-1, I now need to define a function called "member-of" in that mereology. The member-of function will allow me to define what Leśniewski called a distributive class: when a class expression is identical with a general reference. For example, to say that a "DSS-key" belongs to the class of DSS-keys is to say that a "DSS-key" is a DSS-key. Distributive classes obey all the theorems of traditional Aristotelian logic and logical algebra, as well as the logic of sets and relations. Members of a distributive class are described by ontology.
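The part-of function of I-1 can be sketched as an observer-dependent predicate. This is only a minimal illustration under stated assumptions, not something taken from the essay itself: the names `Observer`, `part_of`, `exact_observer` and `lenient_observer` are hypothetical, a nexus is modeled as a plain set of names, and "perceiving coherence" is reduced to a simple matching rule chosen by each observer.

```python
from typing import Callable, FrozenSet

# Hypothetical model: an observer is any function that reports whether it
# perceives coherence between an element and a nexus (here, a set of names).
Observer = Callable[[str, FrozenSet[str]], bool]


def part_of(element: str, nexus: FrozenSet[str], observer: Observer) -> bool:
    """I-1: an element is part-of a nexus iff the observer perceives coherence."""
    return observer(element, nexus)


def exact_observer(element: str, nexus: FrozenSet[str]) -> bool:
    # Assumed rule: coherence means an exact match with some element.
    return element in nexus


def lenient_observer(element: str, nexus: FrozenSet[str]) -> bool:
    # Assumed rule: coherence ignores letter case.
    return element.lower() in {e.lower() for e in nexus}


river = frozenset({"water", "bank", "current"})
print(part_of("water", river, exact_observer))    # True
print(part_of("Water", river, exact_observer))    # False
print(part_of("Water", river, lenient_observer))  # True
```

The last two calls show two observers looking at the same elements and disagreeing on part-of, which is the point made above about coherence being what differs between observers.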
To define a "member-of" function, I need to define predicates for coherence. In a natural way, I postulate that coherence can have two different predicates, as divided in the usual two realms of existence:

- conceptual coherence: the observer's expression of natural or logical connections *that define* a nexus. To exemplify, if software evaluates an "out of focus photo of a gizmo" by edge detection, the software is certainly able to say that there is a loss of coherence between the different parts of the photo itself -- even if the software has no model of what a gizmo is. The very fact that the software is able to declare the photo "out of focus" by itself is thus a sign that it can express a loss of conceptual coherence without reference to previous experiences with the nexus itself. The same can also happen if a human observes the photo -- e.g., the reader can certainly tell if a photo of mine is out of focus or not, even if the reader has never seen me.

- perceptual coherence: the observer's expression of natural or logical connexions *with* a nexus. To exemplify, if the software receives and evaluates 100 photos of a gizmo, some out of focus and some in focus, the software is certainly able to say that there is a loss or gain of perceptual coherence between the different photos received -- even if the software has no model of what a gizmo is or is used for. Thus, perceptual coherence has to do with spatial and/or temporal coherence in the perceptual world.

Conceptual coherence does not depend on space-time events -- while perceptual coherence does.

To measure conceptual coherence, I will use the word "accuracy". To measure perceptual coherence, I will use the word "reliability". Both are standard terminology in Engineering and can be better understood by an example in terms of a signal that varies in time and is detected when it crosses a given threshold level Eo, in four cases, one for each logical combination:

1. high accuracy and high reliability:

    signal ^
           |
           |      __
           |      ||
           |      ||
           |Eo----------------------------
           |      ||
           |      ||
           |      ||
           |      ||      __
           |      ||      ||
           |      ||      ||
           |      ||      ||
           |------  ------  --------------> time

2. high accuracy and low reliability:

    signal ^
           |
           |      __
           |      ||      __
           |      ||      ||
           |Eo----------------------------
           |      ||      ||
           |      ||      ||
           |      ||      ||
           |      ||      ||
           |      ||      ||
           |      ||      ||
           |      ||      ||
           |------  ------  --------------> time

3. low accuracy and high reliability:

    signal ^
           |
           |    _____
           |    |   |
           |    |   |
           |Eo----------------------------
           |    |   |
           |    |   |
           |    |   |
           |    |   |    _____
           |    |   |    |   |
           |    |   |    |   |
           |    |   |    |   |
           |----     ----     ------------> time

4. low accuracy and low reliability:

    signal ^
           |
           |    _____
           |    |   |    _____
           |    |   |    |   |
           |Eo----------------------------
           |    |   |    |   |
           |    |   |    |   |
           |    |   |    |   |
           |    |   |    |   |
           |    |   |    |   |
           |    |   |    |   |
           |    |   |    |   |
           |----     ----     ------------> time

To summarize, the two predicates of coherence are "accuracy" and "reliability" -- which allow four combinations to be defined for the second level of identification, which I call identification-2. The combinations are described and named in the following table:

                          PREDICATE OF COHERENCE
                          accuracy    reliability
    IDENTIFICATION-2
      Distinguished         YES          YES
      Ambiguous             YES          NO
      Obscure               NO           YES
      Formless              NO           NO

where I have taken the suggested name for each case from the work "De dialectica" by Augustine of Hippo [4], and have changed my own previous terminology from "void" to "formless", because "void" may mislead the reader into the notion of emptiness alone -- as the only possibility. However, "formless" means "without form" and can represent either the absence of form or the absence of physical existence -- an essential ambiguity that I want to represent in that case.

To better explain the cases, the levels of Identification-2 are summarized below:

D - Distinguished -- when an observer can distinguish one identification.
A - Ambiguous -- when an observer can distinguish different identifications but is uncertain which one to take.
O - Obscure -- when an observer cannot distinguish an identification but can perceive information.
F - Formless -- when an observer cannot detect any form of information exchange.

I note that Identification-2 (hereafter called I-2) already has more structure than the identification methods currently used in Internet protocols or in other uses of identification in general. I also note that I-2 allows us to discard once and for all the "tyranny of certainty", in which we are told we must choose between "absolute YES" and "absolute NO" -- as given in I-1. According to I-2, it is perfectly valid and *precise* to identify something as ambiguous or obscure, if that is what the observer needs or can express. To force otherwise would be insecure.

Further, even though one may think that a collective of Obscure identifications would be Obscure, that is not necessarily the case. Each letter of this e-mail, when taken in isolation, is Obscure in its coherence with the message I want to convey -- in other words, there is little accuracy in their use. In fct, aprxmtly 50% of al ltrs of ths mail cn be mssng nd th rdr cn stl rad it! Thus, a linked collective of Obscure elements can be Distinguished -- if and only if a new nexus is formed with this level of coherence. The positions and velocities of all electrons that make up the reader's table are certainly Obscure -- and yet, the table itself is Distinguished.

To further exemplify, we can count two I-2 types in PKIX's or X.509's DN usage. The first is seen in commercial CAs that define the DN as "natural name + physical address" -- hence purportedly "Distinguished" to the public. The second is when DNs are used for "local names", as we find in private CAs in companies -- hence "Obscure" to the public but "Distinguished" within the company. We also count only these two types in credit-card cardholder names, PINs, credit-card numbers, etc. Thus, they all include the two types "Distinguished" and "Obscure" -- only.
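The I-2 table given earlier, which maps the two coherence predicates (accuracy, reliability) onto the four types D, A, O and F, can be written as a small lookup. This is a sketch only: the names `I2` and `identify_2` are hypothetical, and treating each predicate as a plain boolean is an assumption made for illustration, since the essay does not say how accuracy or reliability would be measured.

```python
from enum import Enum


class I2(Enum):
    """The four Identification-2 types, keyed by their one-letter codes."""
    DISTINGUISHED = "D"
    AMBIGUOUS = "A"
    OBSCURE = "O"
    FORMLESS = "F"


# (accuracy, reliability) -> I-2 type, following the table in the essay.
_I2_TABLE = {
    (True, True): I2.DISTINGUISHED,
    (True, False): I2.AMBIGUOUS,
    (False, True): I2.OBSCURE,
    (False, False): I2.FORMLESS,
}


def identify_2(accurate: bool, reliable: bool) -> I2:
    """Return the I-2 type for one observer's yes/no view of both predicates."""
    return _I2_TABLE[(accurate, reliable)]


for accurate in (True, False):
    for reliable in (True, False):
        print(accurate, reliable, identify_2(accurate, reliable).name)
```

Unlike I-1, the result is never forced into a bare YES or NO: Ambiguous and Obscure are returned as answers in their own right.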
However, PGP is the only one that includes a larger subset of identification types within I-2. PGP includes the types "Distinguished", "Ambiguous", and "Obscure" -- which are defined, under other names and as I interpret them, by PGP's "Trust signature" scheme (see the OpenPGP October-98 draft, item 5.2.3.12).

To summarize the results so far, this essay has revisited the concept of identification -- viewing identification as a connection. Essentially, to identify is to look for connections -- i.e., for coherence. Using only the idea of coherence to start with, it was possible to define "identification levels" -- where each higher level refines what the lower level allows one to express. The first 2 identification levels have the following number of types:

    I-1     1   (Identified)
    I-2     4   (Distinguished, Ambiguous, Obscure, Formless)

But what does each of the four combinations of coherence predicates in I-2 represent? I posit that they represent "modes of understanding", called sbin for "semiotic bin". They are cardinal numbers, and semiotic is a name given in hindsight, because sbins will include three parts present in semiotics: Reference, Sense and Entity. Each sbin can be seen as an equivalence class for a type of understanding. That is, an sbin represents a type of understanding -- but not the understanding itself. Note that sbins do not represent a type of meaning either, but a more complex object called "understanding".

To clarify the difference, I posit that "understanding" must be seen not as a set of isolated elements but as a nexus itself -- a linked collective of elements -- because it must include links to everything I need. It cannot be only Shannon's information; it cannot be only what I do not expect. There must be communicable content that can be comprehended, and there must be denotation of physical objects that can be touched.
It is much closer to the concept of biological information, such as that encoded in DNA and transported in RNA to a different location in space-time -- where it can be exactly *understood* in all its functional aspects, precisely because "biological information" contains all needed links. It does not need to consult a third party or an extrinsic hierarchy.

Extensions to be discussed on the mcg-talk list will show that the next identification level (I-3) is characterized by 64 types:

    I-3    64   (DDD, DDA, DDO, ... FFF)

where the 64 cases of I-3 result from the 4^3 combinations of a base of 4 (D, A, O, F) with 3 cells (Reference, Sense and Entity). Further, I-3 has an additional 61 subtypes, for example DAoDF. The total number of types and subtypes in I-3 is 125.

The 64 types define Identification-3, or I-3. They are equal in number to the 64 types used in the genetic code of all living beings, which are also a combination of 4^3. Going further, into I-4 and beyond, the 64 types can provide billions of secondary identification types, in combination. A full spectrum of nuances opens up.

However, to make matters clearer for current uses of Internet identification and certification, it is possible that (in order to capture the *most* interesting cases) no more than 26 main types would be needed in a first approximation. These 26 main types were selected from I-2, I-3 and its subtypes, as currently being openly discussed by the MCG (see the mcg-talk list archives).

The present work has thus necessarily led from identification to understanding. To identify is also to understand. It should be useful to the scientific development of all Internet protocols, to all human communication modes, to all information transfer models (including the RNA genetic code) and anywhere one needs to use the basic concept of a link -- a connection -- in order to reach beyond one's own point in space and time. This holds even when dealing with one's own memory.
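The count of 64 I-3 types quoted above can be checked with a short enumeration of the base (D, A, O, F) over the three cells (Reference, Sense, Entity). This is a sketch for the main types only; the names `BASE`, `CELLS` and `i3_types` are hypothetical, and the 61 subtypes (such as DAoDF) are not modeled because their construction rule is not given here.

```python
from itertools import product

BASE = "DAOF"                             # Distinguished, Ambiguous, Obscure, Formless
CELLS = ("Reference", "Sense", "Entity")  # one base symbol per cell

# Every assignment of one base symbol to each of the three cells.
i3_types = ["".join(combo) for combo in product(BASE, repeat=len(CELLS))]

print(len(i3_types))               # 64
print(i3_types[:3], i3_types[-1])  # ['DDD', 'DDA', 'DDO'] FFF
```

Adding the 61 subtypes stated in the text to these 64 types gives the total of 125.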
A further benefit of the work is that it allows clear definitions for a large number of new anonymous identification types, sorely needed on the Internet and for e-commerce.

Thus, identification must be understood not only in the sense of an identity connection, but in the wider sense of any connection. This work shows that not identity but coherence is the general metric for identification. More coherence and more coherence modes mean stronger identification, even if anonymous.

Cheers,

Ed Gerck

============================================
References:

[1] Merriam-Webster Dictionary
[2] http://www.mcg.org.br/auth_b1.htm
[3] http://www.mcg.org.br/certover.pdf
[4] http://ccat.sas.upenn.edu/jod/texts/dialecticatrans.html
[5] http://www.mcg.org.br/trustdef.htm

______________________________________________________________________
Dr.rer.nat. E. Gerck
egerck@novaware.cps.softex.br
http://novaware.cps.softex.br