
title: [SI] Universal Prior and Anthropic Reasoning
date: 2012-01-19
tags: great filter, solomonoff induction
techne: :done
episteme: :speculation
slug: 2012/01/19/si-universal-prior-and-anthropic-reasoning/

(This is not really part of my explanation of Solomonoff Induction, just a crazy idea. But it overlaps and does explain some things, so yeah.)

Bayes' theorem is awesome. We all know that. It is the optimal way to reason from a given set of evidence. Well, almost. There's one little flaw - what's your prior? What initial probability do you assign your hypotheses before you've seen any evidence?

There is one approach, which I might talk about more when I explain Solomonoff Induction, that is called the Universal Prior. (How original.) The UP is really easy: you find all programs consistent with the data and assign each one a weight of 2^-l, where l is its length in bits, so short programs dominate.
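Here's a minimal sketch of that weighting in Python, assuming we somehow already have the consistent programs in hand as bit strings (the function name and the toy candidate set are made up for illustration):

```python
def universal_prior(programs):
    """Weight each program (a bit string) by 2^-length, then normalize.

    `programs` is a toy stand-in for the set of programs consistent
    with the data; actually enumerating that set is the hard,
    uncomputable part.
    """
    weights = {p: 2.0 ** -len(p) for p in programs}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}

# Two candidates, one a single bit shorter: it ends up twice as likely.
print(universal_prior(["0101", "01100"]))
```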

Let's step back a little bit...

So you are a program and want to locate yourself in program space. You don't actually know your own source code, but you do know your output. You pray to St. Tegmark for a minor hypercomputation miracle and check all possible programs in program space. You exclude all programs that are inconsistent with your output and have a small set left over. Which one are you?

You look closer and notice that there are really only two prefixes left. One is S bits long (call it A), the other S+1 bits (B), and the two are fundamentally different. You notice that all programs are effectively infinite in size because you can always pad them out with random noise. The bits after the part the machine actually reads never affect the output, but they still count when you enumerate programs by length. So if you look at all programs of up to length S, there is exactly one match - A. If you look at all programs of length S+1, there are 3 matches. There is B, of course, but there are also A+0 and A+1, i.e. just A with a random bit at the end.
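To see the counting concretely, here is a tiny Python sketch with made-up prefixes A and B; it just enumerates every bit string of length S+1 and counts which ones continue each prefix:

```python
from itertools import product

S = 4
A = "0110"   # hypothetical prefix of length S
B = "01111"  # hypothetical prefix of length S + 1

# Every program of length S + 1, written out as a bit string.
programs = ["".join(bits) for bits in product("01", repeat=S + 1)]

print(sum(p.startswith(A) for p in programs))  # 2 matches: A+0 and A+1
print(sum(p.startswith(B) for p in programs))  # 1 match: B itself
```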

Programs like A are always twice as common as programs like B. That's exactly what the Universal Prior tells us - the weight of a program of length l is 2^-l, so being 1 bit shorter makes you twice as likely. There are simply twice as many of you in all of program space.

So which program are you? Well, you don't know. You have no reason to prefer one algorithm over another if they produce identical data, so you simply give all programs the same weight and say you are one random sample from them. This means that you should assign 2/3 probability that you have the prefix A and 1/3 that you have B. That looks an awful lot like anthropic reasoning.
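Spelling out that arithmetic (the concrete value of S is arbitrary and doesn't change the split):

```python
S = 17  # any length works; the ratio never depends on it
w_A = 2.0 ** -S        # weight of the shorter prefix A
w_B = 2.0 ** -(S + 1)  # weight of the longer prefix B

print(w_A / (w_A + w_B))  # 0.666... -> probability of prefix A
print(w_B / (w_A + w_B))  # 0.333... -> probability of prefix B
```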

The Self-Sampling Assumption says you should assume you are a random sample from all actual existing observers, meaning all observers within this world.

The Self-Indication Assumption says you should sample from all possible observers, including those in different worlds.

So if I use SSA, I might say that all actual observers are the continuations of my current program, so I should treat myself as a random sample from the programs sharing a specific prefix, not from all of program space. So I should weigh all prefixes equally, then for each distribute my probability mass over all continuations. That's... weird.

If I use SIA, I just assume I'm somewhere in program space and so judge all programs equally. This means I favor prefix A over prefix B, at 2:1, as it is 1 bit shorter and so twice as common.
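A toy comparison of the two rules in the A/B example, again with made-up prefixes; SSA (as read here) hands each prefix equal mass before spreading it over continuations, while SIA samples uniformly from all matching programs:

```python
from itertools import product

S = 4
A, B = "0110", "01111"  # hypothetical prefixes of length S and S + 1
prefixes = [A, B]

# All programs of length S + 1 that continue one of the two prefixes.
programs = ["".join(bits) for bits in product("01", repeat=S + 1)]
matching = [p for p in programs if any(p.startswith(q) for q in prefixes)]

# SIA: uniform over matching programs, so mass tracks how many
# continuations each prefix has.
sia = {q: sum(p.startswith(q) for p in matching) / len(matching)
       for q in prefixes}

# SSA (as read here): each prefix gets equal mass up front.
ssa = {q: 1 / len(prefixes) for q in prefixes}

print("SIA:", sia)  # A: 2/3, B: 1/3
print("SSA:", ssa)  # A: 1/2, B: 1/2
```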

This seems to support SIA, and anthropic self-location in general. Is that of any consequence? Well, SIA implies a late Great Filter. Uh oh.