Simplicity, Complexity, Unexpectedness, Cognition, Probability, Information

by Jean-Louis Dessalles     (created 31 December 2008, updated December 2015)

Example: Inverted stamps (rarity)

Rare objects must be simple to be unexpected

In T. K. Tapling’s great 19th-century stamp collection, on display at the British Museum in London, the nine most valuable rarities are stamps with erroneous printing characteristics.

All these erroneous characteristics share the property of being remarkably simple. How does simplicity influence the feeling of rarity (and, in this case, the value of items)?

This 24-cent airmail stamp of 1918 (below, left), known as the "Inverted Jenny", was erroneously printed with an inverted centre. A copy of it was bought for $977,500 in 2007, about eight thousand times the price of a regular copy of the same stamp (below, right).

How unexpected is this inverted stamp? By definition, unexpectedness is the difference between generation complexity and description complexity: U = Cw – C. Let’s compute both terms.

Computing unexpectedness for rare events

Let’s call s the following situation: "I can see this stamp in front of me now". What is the complexity of generating s? In the absence of any knowledge about it, Cw(s) = 0 (the stamp exists in the world I am in), C(s) = 0 (the stamp is uniquely determined by your considering it here and now), and therefore U(s) = 0. Ignorant beings cannot be surprised.

Now add knowledge about the class r the stamp belongs to (US postage stamp, 1918, featuring the Curtiss JN-4 airplane). This piece of knowledge can be represented by the predicate r(s). If you notice that its centre has been printed upside-down, you will consider another predicate, f(s), where f represents the feature ‘inverted’. This time, U(s) is no longer 0. It can be estimated in two ways.

First computation: Lottery

You may consider that s (the presence of the stamp in front of you) results from a random draw among all collector stamps. Suppose there are N0 collector stamps in the world. Then the W-Machine needs:

Cw(s) = log2(N0) bits

to "decide" which stamp will be in front of you.
The fact that the stamp has feature f helps you describe it in a concise way. The use of conceptual features allows us to write:

C(s) < C(f) + C(s|f)

If Nf items are known to have feature f, we get C(s|f) = log2(Nf). This is the number of bits that the O-Machine requires to discriminate among the Nf objects that share f. Finally:

    U(s) > log2(N0) – log2(Nf) – C(f)

Remark: Reference classes and features like f should be chosen so as to maximize unexpectedness. Choosing a more specific class (e.g. "officially listed 1918 US postage 24-cent stamps" instead of merely "stamps") would decrease both N0 and Nf. Choosing a more specific feature f such as "listed as the kth Inverted Jenny" would make Nf = 1, but would increase C(f) by (more than) log2(k). The latter strategy may prove beneficial if k is very small, but only for specialists who know about the list.
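The lottery computation can be sketched numerically. The figures below (number of collector stamps, number of inverted ones, bit costs of the features) are illustrative assumptions, not data from the text:

```python
from math import log2

def unexpectedness(n_total, n_feature, c_feature):
    """Lower bound on U(s) in the lottery model:
    U(s) > log2(N0) - log2(Nf) - C(f)."""
    return log2(n_total) - log2(n_feature) - c_feature

# Hypothetical figures, for illustration only:
N0 = 2**30   # assumed number of collector stamps in the world
Nf = 100     # stamps sharing the feature 'inverted' (order of magnitude)
C_f = 10     # C(f): 'inverted centre' is simple, so only a few bits (assumption)

u_simple = unexpectedness(N0, Nf, C_f)   # 30 - log2(100) - 10, about 13.4 bits

# The more specific feature "listed as the kth Inverted Jenny" makes Nf = 1
# but costs at least log2(k) extra bits; it pays off only for small k:
k = 3
u_specific = unexpectedness(N0, 1, C_f + log2(k))
```

With these figures, the specific feature yields the larger bound because log2(k) is smaller than log2(Nf); for a layperson, C(f) for the listed feature would in practice be much larger than log2(k), reversing the comparison.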

The subjective probability attached to rarity, as given by the formula p = 2^(–U), amounts (as long as U(s) > 0) to:

    p = (Nf / N0) × 2^C(f)

Note that the first factor Nf/N0 is the classical probability value, i.e. the ratio of favourable outcomes to possible outcomes.
Simplicity Theory (ST), however, introduces a corrective term which is ignored in standard probability theory. The corrective term 2^C(f) accounts for the fact that only simple features make rare situations improbable. This explains why simple features, such as symmetries or colour permutations, are regarded as remarkable by stamp collectors, as in T. K. Tapling’s stamp collection (see above).
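A minimal sketch of the corrective term at work, using the same assumed figures as before (all numbers are illustrative assumptions):

```python
from math import log2

def subjective_probability(n_total, n_feature, c_feature):
    """p = 2^(-U) = (Nf/N0) * 2^C(f), capped at 1 when U(s) <= 0."""
    u = log2(n_total) - log2(n_feature) - c_feature
    return min(1.0, 2 ** (-u))

N0 = 2**30   # assumed number of collector stamps

# A simple feature ('inverted centre', ~10 bits): the situation stays improbable.
p_simple = subjective_probability(N0, 100, 10)

# A complex feature (some arbitrary 25-bit printing quirk shared by the same
# 100 stamps): the corrective term 2^C(f) wipes out the rarity entirely.
p_complex = subjective_probability(N0, 100, 25)
```

The same 100-in-a-billion frequency yields a tiny subjective probability for the simple feature, but probability 1 (no surprise at all) for the complex one.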

All objects in the universe are unique. For classical probability theory, their probability is zero. Most objects are, however, unique for complex reasons. Their subjective probability is thus rather close to 1 for a human observer. Genuinely rare objects must be easy to describe.

Several features may be used together to increase the estimate of U(s). Remember that the stamp belongs to class r (US postage stamps, 1918, featuring the Curtiss JN-4 airplane). By using several conceptual features, we can write:

C(s) < C(r*f*s) = C(r) + C(f) + C(s|r&f)

The operator * designates a computation sequence. If we know that there are only P inverted Jennies in the world (P is believed to be smaller than 100), then C(s|r&f) = log2(P), and finally:

U(s) > log2(N0) – log2(P) – C(r) – C(f)

Alternatively, we may estimate U(s) through the computation sequence r*s: U(s) > U(r(s)) + U(s|r(s)). This means that we compute the unexpectedness of getting a 1918 US postage stamp first, and then the unexpectedness that the stamp is an inverted one. We get (still using f on the description side):

U(s) > (log2(N0) – log2(Nr) – C(r)) + (log2(Nr) – log2(P) – C(f))

where Nr is the number of stamps in r. This result is the same as the preceding one, since log2(Nr) cancels out.
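A quick numerical check that the two decompositions coincide, under assumed figures for Nr and the description complexities (none of these numbers come from the text):

```python
from math import log2, isclose

# Illustrative assumptions:
N0 = 2**30        # all collector stamps
Nr = 2**20        # Nr: stamps in class r (1918 US 24-cent airmail), assumed
P  = 100          # inverted Jennies (P < 100 in the text)
C_r, C_f = 5, 10  # description complexities of r and f, in bits (assumed)

# Direct computation via the sequence r*f*s:
u_direct = log2(N0) - log2(P) - C_r - C_f

# Sequential computation via r*s: surprise of a class-r stamp,
# plus surprise that this class-r stamp is inverted:
u_sequential = (log2(N0) - log2(Nr) - C_r) + (log2(Nr) - log2(P) - C_f)

assert isclose(u_direct, u_sequential)  # log2(Nr) cancels out
```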

Second computation: Causality

Let’s consider the computation sequence f*s: U(s) > U(f(s)) + U(s|f(s)).

Suppose that f means the following event: "this stamp in front of you has been printed upside-down".
With this feature, the last term U(s|f(s)) is zero (nothing is left to generate, and the stamp is fully determined).
Generation complexity Cw(f(s)) may be assessed through a causal story, for instance a scenario leading to an error made by the worker who printed this stamp a century ago in a stamp-printing facility. Experts, using the frequency of such events, would know that Cw(f(s)) amounts to log2(N0/Nf). We get U(s) > log2(N0/Nf) – C(f), which is the same result as before. Note, however, that the result may significantly differ from this estimate: the layperson may find the causal story much easier (if she thinks that printing errors are commonplace) or, on the contrary, highly complex or even hard to believe (see the example of the rabid bat or the example of the running nuns).
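This dependence on the observer's beliefs can be sketched as follows; the expert's estimate uses the frequency-based formula, while the two lay estimates of Cw are arbitrary assumed values standing in for "easy story" and "hard-to-believe story":

```python
from math import log2

N0, Nf, C_f = 2**30, 100, 10   # assumed figures, as in the lottery computation

# Expert: causal generation cost derived from the frequency of printing errors.
cw_expert = log2(N0 / Nf)
u_expert = cw_expert - C_f      # same bound as the lottery computation

# Layperson who believes printing errors are commonplace:
cw_naive = 8                    # a short, easy causal story (assumed value)
u_naive = cw_naive - C_f        # little or no surprise

# Layperson who finds the causal story hard to believe:
cw_skeptic = 40                 # a long-winded scenario (assumed value)
u_skeptic = cw_skeptic - C_f    # much stronger surprise
```

The feature's description cost C(f) is the same for all three observers; only the causal estimate of Cw, and hence the felt unexpectedness, varies.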

Another way to use causality is to consider another predicate "This stamp exhibition I am visiting today was lucky enough to get an Inverted Jenny". Then you imagine a causal scenario hiding behind the word ‘luck’.
Remark: Seeing an inverted stamp in an exhibition is far less impressive than finding it by chance on a letter. Only a causal estimate of Cw can account for the huge unexpectedness in the latter case.


