mirror of https://github.com/fmap/muflax65ngodyewp.onion synced 2024-06-26 10:26:48 +02:00
muflax 2012-06-20 00:04:53 +02:00
parent 199799c847
commit 2054bdfd8e


@@ -20,19 +20,19 @@ When, and in what order, were the texts written? I'm going to ignore the *when*
Why is this important at all? Because then we can trace influences, theological and political developments and so on. We can use this information to figure out what the direction of certain developments was (Did they start with a messiah and make him a prophet or the other way around?), can date the texts much better (If Paul's letters were written *after* the gospels, then who the fuck is "Paul"?) and so on. So basically, you can make significant progress on all historical questions about early Christianity.
-Of course, this doesn't just apply to Christianity. It works in any textual tradition, but Christianity is the extremely well-documented compared to anything else before basically the Renaissance, so we start there.
+Of course, this doesn't just apply to Christianity. It works in any textual tradition, but Christianity is extremely well-documented compared to anything else before basically the Renaissance, so we start there.
-You'd think that with such an important question, you'd have good answers by now. If you seriously assume that, you've never *been* in a humanities class. Seriously, these fuckers can't even quantify shit. They are like the little brother who's a bit retarded, but no-one has the to heart to tell them how much they make a fool of themselves when they constantly claim that they don't need "math", "computers" or "machines", they have "dialectic". </rant\>
+You'd think that with such an important question, you'd have good answers by now. If you seriously assume that, you've never *been* in a humanities class. Seriously, these fuckers can't even quantify shit. They are like the little brother who's a bit retarded, but no one has the heart to tell them how much they make a fool of themselves when they constantly claim that they don't need "math", "computers" or "machines", they have "dialectic". </rant\>
Anyway, back to text ordering. I had an interesting talk with a statistical learning researcher yesterday and he brought up a really cool idea.
Let's say you have two pieces of data, A and B, and you're trying to figure out if A *causes* B. [Traditionally][Judea Pearl], you do this through statistics. You sample and collect some observations, then check if you see conditional probabilities. Basically, if A and B are independent variables, there can't be a causation, but if you can predict B, given A, but not the other way around, then A causes B. (In your face, Popper!)
-There's one problem with this - you need a certain amount of samples. It doesn't work with N=1. If you only ever saw A and B once, statistically, you'd be screwed. [But maybe there's another way.][Causal Inference])
+There's one problem with this - you need a certain amount of samples. It doesn't work with N=1. If you only ever saw A and B once, statistically, you'd be screwed. [But maybe there's another way.][Causal Inference]
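A minimal sketch of the sampling approach, in Python, with made-up numbers: A is a coin flip and B copies A 90% of the time, so B depends on A. With enough samples, the conditional probability P(B=1 | A=1) visibly diverges from the marginal P(B=1) - that divergence is exactly the dependence signal the statistical test looks for (and exactly what you lose at N=1).

```python
import random

random.seed(0)
# Toy model (my own numbers): A ~ Bernoulli(0.5), B copies A 90% of the time.
samples = [(a, a if random.random() < 0.9 else 1 - a)
           for a in (random.randint(0, 1) for _ in range(10_000))]

p_b = sum(b for _, b in samples) / len(samples)
p_b_given_a1 = (sum(b for a, b in samples if a == 1)
                / sum(1 for a, _ in samples if a == 1))

# Dependence shows up as P(B=1 | A=1) != P(B=1).
print(round(p_b, 2), round(p_b_given_a1, 2))
```

If A and B were independent, the two printed numbers would agree (up to sampling noise); here the conditional one sits near 0.9 while the marginal sits near 0.5.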
Let's say your data is actually a sequence of digits, as produced by two volunteers. You put each one of them in an isolated room and then tell them to write down 1000 digits. Afterwards you compare the texts and notice something - *they are almost identical*. What happened?
-Well, one possibility is that one of them copied the other. But you isolated them, this can't have happened. What else? If you thought, "they used the same method to come up with the sequence", then you win. For example, they might both be writing down the prime numbers, but each one made a few minor mistakes. But how does this help use discover causality?
+Well, one possibility is that one of them copied the other. But you isolated them, this can't have happened. What else? If you thought, "they used the same method to come up with the sequence", then you win. For example, they might both be writing down the prime numbers, but each one made a few minor mistakes. But how does this help us discover causality?
Remember [Kolmogorov complexity][Kolmogorov Complexity]. K(s) of any sequence s is a measure of how well you can compress s. In other words, it tells you how hard it is to find an algorithm to generate s. The lower K(s), the easier the task. So going back to our two sequences A and B, what's their complexity? Well, K(A) and K(B) will be almost identical. After all, it's just K(prime numbers) + K(a few mistakes). But more importantly, what's the complexity of K(A, B), i.e. of a program that outputs both A and B? In our case, it's almost the same - we just have to remember the additional mistakes. K(prime numbers) can be reused.
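K itself is uncomputable, but a standard trick is to let an off-the-shelf compressor stand in for it (compressed length is a computable upper bound on K). Here's a sketch of the two-volunteers scenario with zlib, where the "mistakes" are hypothetical single-digit slips I made up:

```python
import zlib

def K(s: bytes) -> int:
    """Compressed length as a crude, computable stand-in for K(s)."""
    return len(zlib.compress(s, 9))

def primes(n):
    """First n primes by trial division."""
    ps, x = [], 2
    while len(ps) < n:
        if all(x % p for p in ps):
            ps.append(x)
        x += 1
    return ps

# Both "volunteers" write out the primes; each makes one different slip.
base = " ".join(map(str, primes(300)))
A = (base[:500] + "7" + base[500:]).encode()  # volunteer A's mistake
B = (base[:900] + "3" + base[900:]).encode()  # volunteer B's mistake

# K(A) + K(B) pays for the shared prime structure twice; K(A,B) pays once.
print(K(A) + K(B), K(A + B))
```

The first number comes out far larger than the second: compressing A and B together lets "K(prime numbers)" be reused, and only the two independent mistakes cost extra.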
@@ -42,7 +42,7 @@ So what do we conclude? If K(A) + K(B), for any two pieces of data A and B, is s
Alright, but how does this give us order?
-Let's say there is a third sequence, C. We check it and find it has all the errors in A, but a few additional ones. So K(C) = K(A) + K(additional error) and thus K(C,A) is much smaller than K(C) + K(A) and there's a causal link. But there's more than that. If you search for an algorithm that generates C, if you already have one that generates A for free, what's the result? K(C|A) is really small, like trivially small - it's just a few additional errors.
+Let's say there is a third sequence, C. We check it and find it has all the errors in A, but a few additional ones. So K(C) = K(A) + K(additional error) and thus K(C,A) is much smaller than K(C) + K(A) and there's a causal link. But there's more than that. If you search for an algorithm that generates C, if you already have one that generates A for free, what's the result? K(C\|A) is really small, like trivially small - it's just a few additional errors.
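The compressor trick works for the conditional version too: approximate K(C|A) as K(A,C) - K(A), i.e. how much *extra* it costs to compress C once you've already paid for A (this is the idea behind compression-based distances like NCD). A sketch, with sequences I made up - an incompressible "source text" so that K(C) on its own stays large, and C built as A plus two extra errors:

```python
import hashlib
import zlib

def K(s: bytes) -> int:
    """Compressed length as a computable stand-in for K(s)."""
    return len(zlib.compress(s, 9))

def K_given(x: bytes, y: bytes) -> int:
    """Approximate K(x|y) as K(y, x) - K(y)."""
    return K(y + x) - K(y)

# An incompressible "source text" (2048 bytes of hash output).
source = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(64))

A = bytearray(source); A[100] ^= 1              # copy with one error
C = bytearray(A); C[400] ^= 1; C[900] ^= 1      # all of A's errors plus two more
A, C = bytes(A), bytes(C)

# Generating C is expensive from scratch, nearly free given A.
print(K(C), K_given(C, A))
```

K(C) is roughly the full 2 KB, while K(C|A) is a few dozen bytes - "trivially small, just a few additional errors", as above.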
Enter Markov and his Condition. In a causal graph, any node is determined only by its direct causes. Basically, once you know all the direct causes of something, there's nothing left to learn. Checking any other node won't give you additional information. We say that the direct causes *screen off* the rest of the graph. Everything is nice and local. We can slightly relax this to construct a statistical ordering. Remember the case where B depended on A, but not the other way around. So obviously A must be the cause of B because otherwise you could learn something about B without involving causation. The *strength* of a causal link is then a measure of how much information you can extract from other nodes.