Simplicity, Complexity, Unexpectedness, Cognition, Probability, Information

by Jean-Louis Dessalles     (created 31 December 2008, updated December 2015)

Example: Inverted stamps (rarity)

Rare objects must be simple to be unexpected

In T. K. Tapling’s great 19th-century stamp collection, on display at the British Museum in London, the nine most valuable rarities are stamps with erroneous printing characteristics.

All these erroneous characteristics share the property of being remarkably simple. How does simplicity influence the feeling of rarity (and, in this case, the value of items)?

This 24-cent airmail stamp of 1918 (below, left), known as the "Inverted Jenny", was erroneously printed with an inverted centre. A copy of it was bought for $977,500 in 2007, about eight thousand times the price of a regular copy of the same stamp (below, right).

How unexpected is this inverted stamp? By definition, unexpectedness is the difference between generation complexity and description complexity: U = Cw – C. Let’s compute both terms.

Computing unexpectedness for rare events

Let’s call s the following situation: "I can see this stamp in front of me now". What is the complexity of generating s? In the absence of any knowledge about it, Cw(s) = 0 (the stamp exists in the world I am in), C(s) = 0 (the stamp is uniquely determined by your considering it here and now), and therefore U(s) = 0. Ignorant beings cannot be surprised.

Now add knowledge about the class r the stamp belongs to (US postage stamp, 1918, featuring the Curtiss JN-4 airplane). This piece of knowledge can be represented by the predicate r(s). If you notice that its centre has been printed upside-down, you will consider another predicate, f(s), where f represents the feature ‘inverted’. This time, U(s) is no longer 0. It can be estimated in two ways.

First computation: Lottery

You may consider that s (the presence of the stamp in front of you) results from a random draw among all collector stamps. Suppose there are N0 collector stamps in the world. Then the W-Machine needs:

Cw(s) = log2(N0) bits

to "decide" which stamp will be in front of you.
The fact that the stamp has feature f helps you describe it in a concise way. The use of conceptual features allows us to write:

C(s) < C(f) + C(s|f)

If Nf items are known to have feature f, we get C(s|f) = log2(Nf). This is the number of bits that the O-Machine requires to discriminate among the Nf objects that share f. Finally:

    U(s) > log2(N0) – log2(Nf) – C(f)

Remark: Reference classes and features like f should be chosen so as to maximize unexpectedness. Choosing a more specific class (e.g. "officially listed 1918 US postage 24-cent stamps" instead of merely "stamps") would decrease both N0 and Nf. Choosing a more specific feature f such as "listed as the kth Inverted Jenny" would make Nf = 1, but would increase C(f) by (more than) log2(k). The latter strategy may prove beneficial if k is very small, but only for specialists who know about the list.
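The lottery computation can be sketched numerically. The figures below (number of collector stamps, number of inverted ones, bit costs of the features) are illustrative assumptions, not data from the text:

```python
from math import log2

def unexpectedness(n_total, n_feature, c_feature):
    """Lower bound on U(s) in the lottery model:
    U(s) > log2(N0) - log2(Nf) - C(f)."""
    return log2(n_total) - log2(n_feature) - c_feature

# Hypothetical figures, for illustration only:
N0 = 2**30   # assumed number of collector stamps in the world
Nf = 100     # stamps sharing the feature 'inverted' (order of magnitude)
C_f = 10     # C(f): 'inverted centre' is simple, so only a few bits (assumption)

u_simple = unexpectedness(N0, Nf, C_f)   # 30 - log2(100) - 10, about 13.4 bits

# The more specific feature "listed as the kth Inverted Jenny" makes Nf = 1
# but costs at least log2(k) extra bits; it pays off only for small k:
k = 3
u_specific = unexpectedness(N0, 1, C_f + log2(k))
```

With these figures, the specific feature yields the larger bound because log2(k) is smaller than log2(Nf); for a layperson, C(f) for the listed feature would in practice be much larger than log2(k), reversing the comparison.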

The subjective probability attached to rarity, as given by the formula p = 2^(–U), amounts (as long as U(s) > 0) to:

    p = (Nf / N0) × 2^C(f)

Note that the first factor Nf/N0 is the classical probability value, i.e. the ratio of favourable outcomes to possible outcomes.
Simplicity Theory (ST), however, introduces a corrective term which is ignored in standard probability theory. The corrective term 2^C(f) accounts for the fact that only simple features make rare situations improbable. This explains why simple features, such as symmetries or colour permutations, are regarded as remarkable by stamp collectors, as in T. K. Tapling’s stamp collection (see above).
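A minimal sketch of the corrective term at work, using the same assumed figures as before (all numbers are illustrative assumptions):

```python
from math import log2

def subjective_probability(n_total, n_feature, c_feature):
    """p = 2^(-U) = (Nf/N0) * 2^C(f), capped at 1 when U(s) <= 0."""
    u = log2(n_total) - log2(n_feature) - c_feature
    return min(1.0, 2 ** (-u))

N0 = 2**30   # assumed number of collector stamps

# A simple feature ('inverted centre', ~10 bits): the situation stays improbable.
p_simple = subjective_probability(N0, 100, 10)

# A complex feature (some arbitrary 25-bit printing quirk shared by the same
# 100 stamps): the corrective term 2^C(f) wipes out the rarity entirely.
p_complex = subjective_probability(N0, 100, 25)
```

The same 100-in-a-billion frequency yields a tiny subjective probability for the simple feature, but probability 1 (no surprise at all) for the complex one.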

All objects in the universe are unique. For classical probability theory, their probability is zero. Most objects are, however, unique for complex reasons. Their subjective probability is thus rather close to 1 for a human observer. Genuinely rare objects must be easy to describe.

Several features may be used together to increase the estimate of U(s). Remember that the stamp belongs to class r (US postage stamps, 1918, featuring the Curtiss JN-4 airplane). By using several conceptual features, we can write:

C(s) < C(r*f*s) = C(r) + C(f) + C(s|r&f)

The operator * designates a computation sequence. If we know that there are only P inverted Jennies in the world (P is believed to be smaller than 100), then C(s|r&f) = log2(P), and finally:

U(s) > log2(N0) – log2(P) – C(r) – C(f)

Alternatively, we may estimate U(s) through the computation sequence r*s: U(s) > U(r(s)) + U(s|r(s)). This means that we compute the unexpectedness of getting a 1918 US postage stamp first, and then the unexpectedness that the stamp is an inverted one. We get (still using f on the description side):

U(s) > (log2(N0) – log2(Nr) – C(r)) + (log2(Nr) – log2(P) – C(f))

where Nr is the number of stamps in r. This result is the same as the preceding one, since log2(Nr) cancels out.
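A quick numerical check that the two decompositions coincide, under assumed figures for Nr and the description complexities (none of these numbers come from the text):

```python
from math import log2, isclose

# Illustrative assumptions:
N0 = 2**30        # all collector stamps
Nr = 2**20        # Nr: stamps in class r (1918 US 24-cent airmail), assumed
P  = 100          # inverted Jennies (P < 100 in the text)
C_r, C_f = 5, 10  # description complexities of r and f, in bits (assumed)

# Direct computation via the sequence r*f*s:
u_direct = log2(N0) - log2(P) - C_r - C_f

# Sequential computation via r*s: surprise of a class-r stamp,
# plus surprise that this class-r stamp is inverted:
u_sequential = (log2(N0) - log2(Nr) - C_r) + (log2(Nr) - log2(P) - C_f)

assert isclose(u_direct, u_sequential)  # log2(Nr) cancels out
```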

Second computation: Causality

Let’s consider the computation sequence f*s: U(s) > U(f(s)) + U(s|f(s)).

Suppose that f means the following event: "this stamp in front of you has been printed upside-down".
With this feature, the last term U(s|f(s)) is zero (nothing is left to generate, and the stamp is fully determined).
Generation complexity Cw(f(s)) may be assessed through a causal story, for instance a scenario leading to an error made by the worker who printed this stamp a century ago in a stamp-printing facility. Experts, using the frequency of such events, would know that Cw(f(s)) amounts to log2(N0/Nf). We get U(s) > log2(N0/Nf) – C(f), which is the same result as before. Note, however, that the result may significantly differ from this estimate: the layperson may find the causal story much easier (if she thinks that printing errors are commonplace) or, on the contrary, highly complex or even hard to believe (see the example of the rabid bat or the example of the running nuns).
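This dependence on the observer's beliefs can be sketched as follows; the expert's estimate uses the frequency-based formula, while the two lay estimates of Cw are arbitrary assumed values standing in for "easy story" and "hard-to-believe story":

```python
from math import log2

N0, Nf, C_f = 2**30, 100, 10   # assumed figures, as in the lottery computation

# Expert: causal generation cost derived from the frequency of printing errors.
cw_expert = log2(N0 / Nf)
u_expert = cw_expert - C_f      # same bound as the lottery computation

# Layperson who believes printing errors are commonplace:
cw_naive = 8                    # a short, easy causal story (assumed value)
u_naive = cw_naive - C_f        # little or no surprise

# Layperson who finds the causal story hard to believe:
cw_skeptic = 40                 # a long-winded scenario (assumed value)
u_skeptic = cw_skeptic - C_f    # much stronger surprise
```

The feature's description cost C(f) is the same for all three observers; only the causal estimate of Cw, and hence the felt unexpectedness, varies.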

Another way to use causality is to consider another predicate "This stamp exhibition I am visiting today was lucky enough to get an Inverted Jenny". Then you imagine a causal scenario hiding behind the word ‘luck’.
Remark: Seeing an inverted stamp in an exhibition is far less impressive than finding it by chance on a letter. Only a causal estimate of Cw can account for the huge unexpectedness in the latter case.


