On Bits and the Information Formula

For people not immersed in computers or the mathematical field known as information theory, the idea of "measuring information" may seem unusual. However, terms like "bits", "bytes", or even "gigabytes" are now clichés in today's world. What isn't apparent is that these same ideas are also applicable to the Q-Ching. The oracle is more or less a means of accessing the will of heaven through a very intricate process of using the yarrow sticks to "create information" that is recorded as a collection of yin and yang lines; these symbols are then used to choose the proper texts that (hopefully) make sense to humans.


What is "Information"?

The first idea you need to break free of is the notion that "information" (in a mathematical sense) has anything to do with "useful content" or some kind of factual data. In fact, it's more like a "something" that removes uncertainty or doubt concerning what is going to happen next in a message. The more doubt that is dispelled, the more information you get from the symbol you observed. For instance, if you have some prose before you and all you know so far is that the first word is "The", it's pretty difficult to hazard a guess about what comes next. But if the piece opens with "Four score and seven", you've got some pretty big clues as to what the rest of the text might be about. Therefore, the second phrase carries more information than the first one does.

Mathematicians generally don't concern themselves with things like English prose. Instead, they would describe a message as "a sequence of symbols", without worrying too much about which symbols are used or how they are represented. In fact, they are apt to represent each symbol with a number and consider sequences of numbers instead. More basic still, they tend to restrict themselves to the numbers zero and one. All messages can be represented as sequences of zeros and ones, given the proper "coding" so you know how to interpret the message.

The notion of sequences of yin and yang lines is a perfect parallel to sequences of zeros and ones, which is probably why the Q-Ching is such a favorite of people in the sacred mathematics field. It seems absolutely natural to apply the methods of information theory to the oracle.


What is a Bit?

A "bit" is the basic unit of information in this theory. If you are examining messages consisting of two symbols (such as 0 and 1) that are equally likely to occur in any given sequence, the amount of information needed to dispell doubt as to which symbol will occur next is one bit. If you see the next symbol is a 0, that's one bit; similarly if a 1 shows up. The difference between heads and tails in a coin toss is one bit. So is the choice of a yin or a yang line.

The more possibilities there are for the next symbol, and the less likely that any given symbol will be the next one, the more bits of information you need to identify it. If there are 4 possible choices, it takes 2 bits. If there are 8 possible symbols, it takes 3 bits to represent the next symbol, and so on. Similarly, the appearance of the letter "e" in English text (the most common letter in the alphabet) carries less information than highly improbable letters like "z" and "q". The more unexpected the outcome, the more novelty it conveys, and the more bits it takes to represent.
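
As a quick check (a minimal Python sketch), the base 2 logarithm of the number of equally likely choices gives the number of bits needed:

    import math

    # Bits needed to single out one of N equally likely symbols.
    for n in (2, 4, 8, 256):
        print(n, "symbols need", math.log2(n), "bits")
    # 2 -> 1.0, 4 -> 2.0, 8 -> 3.0, 256 -> 8.0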

Numbers that are written with just 0's and 1's are called base 2 numbers or binary numbers. In fact, the word "bit" is just a contraction of "binary digit" -- a choice between 0 and 1. The term "byte" generally means 8 bits of information, enough to represent 256 different symbols.


What is the Information Formula?

One of the aims of information theory is to estimate the amount of information in a message. This is often specified as the total number of bits in the message, or as the average number of bits per symbol in the message.

The amount of information in a message has a very precise mathematical definition, but we will have to work up to it. A message is considered to be a sequence of symbols, usually infinite in length (for simplicity, curiously enough), where the symbols come from a specified "alphabet" (or set of symbols) and appear with certain probabilities. These symbols can be almost anything, but for definiteness, we describe an alphabet A as a set of "symbols" we call a0, a1, a2, ...; we write A = { a0, a1, a2, ... }. A can be either finite or infinite. Corresponding to A is a set of probabilities P = { p0, p1, p2,... }, where each p(n) is a number between 0 and 1 that describes how frequently the symbol a(n) shows up in a message. The sum of all the p(n)'s must equal 1 (which means perfect certainty of seeing some symbol next). A message M is simply an ordered sequence of symbols, that is, a message M = ( m0, m1, m2, ... ) where each symbol in the message is one of the symbols in our alphabet A.
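
To make the definitions a bit more concrete, here is one possible way to write down an alphabet, its probabilities, and a message (a Python sketch; the names are purely illustrative, and the probabilities shown are the 3 coins values used later):

    # An alphabet A of four line types and a matching set of probabilities P.
    alphabet = ["moving yin", "static yang", "static yin", "moving yang"]
    probabilities = {"moving yin": 0.125, "static yang": 0.375,
                     "static yin": 0.375, "moving yang": 0.125}

    # The p(n)'s must sum to 1: perfect certainty of seeing *some* symbol next.
    assert abs(sum(probabilities.values()) - 1.0) < 1e-9

    # A message M is just an ordered sequence of symbols drawn from A.
    message = ["static yin", "moving yang", "static yin", "static yang"]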

Now, the less likely a symbol is to show up, the more of a surprise it is when it does appear. Low probability symbols thus have more information (measured in bits) than high probability symbols. In particular, the "information content" of symbol a(n) is equal to -log p(n), where "log" represents the logarithm function.

Since a symbol a(n) has only a p(n) chance of being the next symbol in a message, it can make only a p(n)*(-log p(n)) bit contribution to the information in the next symbol. Remember that the sum of all the p(n)'s is exactly one, by definition, so the "average value" of bits in the next symbol is simply a weighted average, namely:

Average Information (in bits per symbol) = the sum of all -p(n)*log p(n), summed over all the symbols in the alphabet A.
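
The formula translates directly into code. The sketch below (plain Python; the names entropy and surprisal are just illustrative) computes the surprisal -log p(n) of a single symbol and the weighted average over an alphabet:

    import math

    def surprisal(p):
        # Information content of a single symbol with probability p, in bits.
        return -math.log2(p)

    def entropy(probs):
        # Average information in bits per symbol: the sum of -p * log2(p)
        # over every symbol probability p in the alphabet.
        return sum(-p * math.log2(p) for p in probs if p > 0)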

For instance, let's look at the information content of the emblematic lines in a hexagram. There are 4 such lines in our "alphabet": moving yin, static yang, static yin, moving yang. Looking at the situation simplistically, you might expect each of the 4 symbols to show up 25% of the time, which would give a measure of 2 bits per line, as shown in the next table.

Type of Line    Probability p(n)    -log p(n)       -p(n)*log p(n)
Moving Yin      .25                 2               .5
Static Yang     .25                 2               .5
Static Yin      .25                 2               .5
Moving Yang     .25                 2               .5
Sums            1                   n/a             2 bits
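
Plugging four equal probabilities of .25 into the entropy sketch above reproduces the 2 bits per line:

    print(entropy([0.25, 0.25, 0.25, 0.25]))   # prints 2.0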

Actually, moving lines are much less likely than static lines. In the 3 coins method for casting lines, there's a 1 in 8 chance of throwing a moving yin or moving yang line, and a 3 in 8 chance of throwing each of the static lines. Notice that when some symbols are less likely than others, the information content decreases compared to the equal probabilities situation:

Type of Line    Probability p(n)    -log p(n)       -p(n)*log p(n)
Moving Yin      .125                3               .375
Static Yang     .375                1.41503...      .53063...
Static Yin      .375                1.41503...      .53063...
Moving Yang     .125                3               .375
Sums            1                   n/a             1.81127... bits

However, according to the famous yarrow stick method of casting lines for a hexagram, the probabilities are even more skewed, reducing the information content further:

Type of Line    Probability p(n)    -log p(n)       -p(n)*log p(n)
Moving Yin      .047766...          4.387849...     .209593...
Static Yang     .278319...          1.845230...     .513547...
Static Yin      .451911...          1.145888...     .517839...
Moving Yang     .222911...          2.171296...     .482051...
Sums            1                   n/a             1.723032... bits
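
The same entropy sketch, fed the 3 coins probabilities and the yarrow stick probabilities, reproduces the lower totals (the yarrow figures below are copied from the table, not derived here):

    # 3 coins method: 1/8, 3/8, 3/8, 1/8
    print(entropy([0.125, 0.375, 0.375, 0.125]))              # about 1.81128 bits

    # Yarrow stick method, probabilities as listed in the table above
    print(entropy([0.047766, 0.278319, 0.451911, 0.222911]))  # about 1.72 bits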


The Logarithm Function

Since the advent of computers, most people have little use for logarithms anymore, especially as an aid to computation. Throughout this discussion, the function "log x" means the logarithm to base 2. In a nutshell, log x = y if and only if 2^y = x. In other words, log x is the exponent (on the base 2) that gets you back to x. As y increases, x goes up exponentially, which is why each additional bit per symbol doubles the number of symbols you can represent. log x is easy to calculate if x is a power of 2, but harder if x is not an exact power.
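
A few quick evaluations (plain Python with the standard math module) show the pattern:

    import math

    print(math.log2(8))     # 3.0, because 2**3 = 8
    print(math.log2(256))   # 8.0, because 2**8 = 256
    print(math.log2(5))     # 2.3219..., since 5 is not a power of 2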

In other mathematical contexts, logarithms with other bases may be used. The most common bases are 10 (common logarithms) and the irrational number e = 2.71828... (natural logarithms). While it may seem weird to use an unwieldy number like e in your calculations, it actually simplifies the equations for calculating logarithms considerably. The natural base e also shows up in many formulas in mathematics and the sciences.
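
If only natural or common logarithms are at hand, the standard change-of-base rule converts them to base 2; a minimal sketch:

    import math

    x = 5.0
    # log2(x) = ln(x) / ln(2) = log10(x) / log10(2)
    print(math.log(x) / math.log(2))      # 2.3219...
    print(math.log10(x) / math.log10(2))  # 2.3219...
    print(math.log2(x))                   # same value, computed directly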