Section 8.1 Error-Detecting and Correcting Codes
Let us examine a simple model of a communications system for transmitting and receiving coded messages (Figure 8.1).
Uncoded messages may be composed of letters or characters, but typically they consist of binary \(m\)-tuples. These messages are encoded into codewords, consisting of binary \(n\)-tuples, by a device called an encoder. The message is transmitted and then decoded. We will consider the occurrence of errors during transmission. An error occurs if there is a change in one or more bits in the codeword. A decoding scheme is a method that either converts an arbitrarily received \(n\)-tuple into a meaningful decoded message or gives an error message for that \(n\)-tuple. If the received message is a codeword (one of the special \(n\)-tuples allowed to be transmitted), then the decoded message must be the unique message that was encoded into the codeword. For received non-codewords, the decoding scheme will give an error indication, or, if we are more clever, will actually try to correct the error and reconstruct the original message. Our goal is to transmit error-free messages as cheaply and quickly as possible.
Example 8.2.
One possible coding scheme would be to send a message several times and to compare the received copies with one another. Suppose that the message to be encoded is a binary \(n\)-tuple \((x_{1}, x_{2}, \ldots, x_{n})\text{.}\) The message is encoded into a binary \(3n\)-tuple by simply repeating the message three times:
To decode the message, we choose as the \(i\)th digit the one that appears in the \(i\)th place in at least two of the three transmissions. For example, if the original message is \((0110)\text{,}\) then the transmitted message will be \((0110\; 0110\; 0110)\text{.}\) If there is a transmission error in the fifth digit, then the received codeword will be \((0110\; 1110\; 0110)\text{,}\) which will be correctly decoded as \((0110)\text{.}\) 10 This triple-repetition method will automatically detect and correct all single errors, but it is slow and inefficient: to send a message consisting of \(n\) bits, \(2n\) extra bits are required, and we can only detect and correct single errors. We will see that it is possible to find an encoding scheme that will encode a message of \(n\) bits into \(m\) bits with \(m\) much smaller than \(3n\text{.}\)
Example 8.3.
Even parity, a commonly used coding scheme, is much more efficient than the simple repetition scheme. The ASCII (American Standard Code for Information Interchange) coding system uses binary \(8\)-tuples, yielding \(2^{8} = 256\) possible \(8\)-tuples. However, only seven bits are needed since there are only \(2^7 = 128\) ASCII characters. What can or should be done with the extra bit? Using the full eight bits, we can detect single transmission errors. For example, the ASCII codes for A, B, and C are
Notice that the leftmost bit is always set to 0; that is, the \(128\) ASCII characters have codes
The bit can be used for error checking on the other seven bits. It is set to either \(0\) or \(1\) so that the total number of \(1\) bits in the representation of a character is even. Using even parity, the codes for A, B, and C now become
Suppose an A is sent and a transmission error in the sixth bit is caused by noise over the communication channel so that \((0100\; 0101)\) is received. We know an error has occurred since the received word has an odd number of \(1\)s, and we can now request that the codeword be transmitted again. When used for error checking, the leftmost bit is called a parity check bit.
By far the most common error-detecting codes used in computers are based on the addition of a parity bit. Typically, a computer stores information in \(m\)-tuples called words. Common word lengths are \(8\text{,}\) \(16\text{,}\) and \(32\) bits. One bit in the word is set aside as the parity check bit, and is not used to store information. This bit is set to either \(0\) or \(1\text{,}\) depending on the number of \(1\)s in the word.
Adding a parity check bit allows the detection of all single errors because changing a single bit either increases or decreases the number of \(1\)s by one, and in either case the parity has been changed from even to odd, so the new word is not a codeword. (We could also construct an error detection scheme based on odd parity; that is, we could set the parity check bit so that a codeword always has an odd number of \(1\)s.)
The even parity system is easy to implement, but has two drawbacks. First, multiple errors are not detectable. Suppose an A is sent and the first and seventh bits are changed from \(0\) to \(1\text{.}\) The received word is a codeword, but will be decoded into a C instead of an A. Second, we do not have the ability to correct errors. If the 8-tuple \((1001\; 1000)\) is received, we know that an error has occurred, but we have no idea which bit has been changed. We will now investigate a coding scheme that will not only allow us to detect transmission errors but will actually correct the errors.
Example 8.4.
Suppose that our original message is either a \(0\) or a \(1\text{,}\) and that \(0\) encodes to \((000)\) and \(1\) encodes to \((111)\text{.}\) If only a single error occurs during transmission, we can detect and correct the error. For example, if a \((101)\) is received, then the second bit must have been changed from a \(1\) to a \(0\text{.}\) The originally transmitted codeword must have been \((111)\text{.}\) This method will detect and correct all single errors.
Transmitted | Received Word | |||||||
Codeword | \(000\) | \(001\) | \(010\) | \(011\) | \(100\) | \(101\) | \(110\) | \(111\) |
\(000\) | \(0\) | \(1\) | \(1\) | \(2\) | \(1\) | \(2\) | \(2\) | \(3\) |
\(111\) | \(3\) | \(2\) | \(2\) | \(1\) | \(2\) | \(1\) | \(1\) | \(0\) |
In Table 8.5, we present all possible words that might be received for the transmitted codewords \((000)\) and \((111)\text{.}\) Table 8.5 also shows the number of bits by which each received \(3\)-tuple differs from each original codeword.
Subsection Maximum-Likelihood Decoding
The coding scheme presented in Example 8.4 is not a complete solution to the problem because it does not account for the possibility of multiple errors. For example, either a (000) or a (111) could be sent and a (001) received. We have no means of deciding from the received word whether there was a single error in the third bit or two errors, one in the first bit and one in the second. No matter what coding scheme is used, an incorrect message could be received. We could transmit a (000), have errors in all three bits, and receive the codeword (111). It is important to make explicit assumptions about the likelihood and distribution of transmission errors so that, in a particular application, it will be known whether a given error detection scheme is appropriate. We will assume that transmission errors are rare, and, that when they do occur, they occur independently in each bit; that is, if \(p\) is the probability of an error in one bit and \(q\) is the probability of an error in a different bit, then the probability of errors occurring in both of these bits at the same time is \(pq\text{.}\) We will also assume that a received \(n\)-tuple is decoded into a codeword that is closest to it; that is, we assume that the receiver uses maximum-likelihood decoding. 11
A binary symmetric channel is a model that consists of a transmitter capable of sending a binary signal, either a \(0\) or a \(1\text{,}\) together with a receiver. Let \(p\) be the probability that the signal is correctly received. Then \(q = 1 - p\) is the probability of an incorrect reception. If a \(1\) is sent, then the probability that a \(1\) is received is \(p\) and the probability that a \(0\) is received is \(q\) (Figure 8.6). The probability that no errors occur during the transmission of a binary codeword of length \(n\) is \(p^{n}\text{.}\) For example, if \(p=0.999\) and a message consisting of 10,000 bits is sent, then the probability of a perfect transmission is
Theorem 8.7.
If a binary \(n\)-tuple \((x_{1}, \ldots, x_{n})\) is transmitted across a binary symmetric channel with probability \(p\) that no error will occur in each coordinate, then the probability that there are errors in exactly \(k\) coordinates is
Proof.
Fix \(k\) different coordinates. We first compute the probability that an error has occurred in this fixed set of coordinates. The probability of an error occurring in a particular one of these \(k\) coordinates is \(q\text{;}\) the probability that an error will not occur in any of the remaining \(n-k\) coordinates is \(p\text{.}\) The probability of each of these \(n\) independent events is \(q^{k}p^{n-k}\text{.}\) The number of possible error patterns with exactly \(k\) errors occurring is equal to
the number of combinations of \(n\) things taken \(k\) at a time. Each of these error patterns has probability \(q^{k}p^{n-k}\) of occurring; hence, the probability of all of these error patterns is
Example 8.8.
Suppose that \(p = 0.995\) and a \(500\)-bit message is sent. The probability that the message was sent error-free is
The probability of exactly one error occurring is
The probability of exactly two errors is
The probability of more than two errors is approximately
Subsection Block Codes
If we are to develop efficient error-detecting and error-correcting codes, we will need more sophisticated mathematical tools. Group theory will allow faster methods of encoding and decoding messages. A code is an \((n, m)\)-block code if the information that is to be coded can be divided into blocks of \(m\) binary digits, each of which can be encoded into \(n\) binary digits. More specifically, an \((n, m)\)-block code consists of an encoding function
and a decoding function
A codeword is any element in the image of \(E\text{.}\) We also require that \(E\) be one-to-one so that two information blocks will not be encoded into the same codeword. If our code is to be error-correcting, then \(D\) must be onto.
Example 8.9.
The even-parity coding system developed to detect single errors in ASCII characters is an \((8,7)\)-block code. The encoding function is
where \(x_8 = x_7 + x_6 + \cdots + x_1\) with addition in \({\mathbb Z}_2\text{.}\)
Let \({\mathbf x} = (x_1, \ldots, x_n)\) and \({\mathbf y} = (y_1, \ldots, y_n)\) be binary \(n\)-tuples. The Hamming distance or distance, \(d({\mathbf x}, {\mathbf y})\text{,}\) between \({\mathbf x}\) and \({\mathbf y}\) is the number of bits in which \({\mathbf x}\) and \({\mathbf y}\) differ. The distance between two codewords is the minimum number of transmission errors required to change one codeword into the other. The minimum distance for a code, \(d_{\min}\text{,}\) is the minimum of all distances \(d({\mathbf x}, {\mathbf y})\text{,}\) where \({\mathbf x}\) and \({\mathbf y}\) are distinct codewords. The weight, \(w({\mathbf x})\text{,}\) of a binary codeword \({\mathbf x}\) is the number of \(1\)s in \({\mathbf x}\text{.}\) Clearly, \(w({\mathbf x}) = d({\mathbf x}, {\mathbf 0})\text{,}\) where \({\mathbf 0} = (00 \cdots 0)\text{.}\)
Example 8.10.
Let \({\mathbf x} = (10101)\text{,}\) \({\mathbf y} = (11010)\text{,}\) and \({\mathbf z} = (00011)\) be all of the codewords in some code \(C\text{.}\) Then we have the following Hamming distances:
The minimum distance for this code is 3. We also have the following weights:
The following proposition lists some basic properties about the weight of a codeword and the distance between two codewords. The proof is left as an exercise.
Proposition 8.11.
Let \({\mathbf x}\text{,}\) \({\mathbf y}\text{,}\) and \({\mathbf z}\) be binary \(n\)-tuples. Then
\(w({\mathbf x}) = d( {\mathbf x}, {\mathbf 0})\text{;}\)
\(d( {\mathbf x}, {\mathbf y}) \geq 0\text{;}\)
\(d( {\mathbf x}, {\mathbf y}) = 0\) exactly when \({\mathbf x} = {\mathbf y}\text{;}\)
\(d( {\mathbf x}, {\mathbf y})= d( {\mathbf y}, {\mathbf x})\text{;}\)
\(d( {\mathbf x}, {\mathbf y}) \leq d( {\mathbf x}, {\mathbf z}) + d( {\mathbf z}, {\mathbf y})\text{.}\)
The weights in a particular code are usually much easier to compute than the Hamming distances between all codewords in the code. If a code is set up carefully, we can use this fact to our advantage.
Suppose that \({\mathbf x} = (1101)\) and \({\mathbf y} = (1100)\) are codewords in some code. If we transmit \((1101)\) and an error occurs in the rightmost bit, then (1100) will be received. Since \((1100)\) is a codeword, the decoder will decode \((1100)\) as the transmitted message. This code is clearly not very appropriate for error detection. The problem is that \(d({\mathbf x}, {\mathbf y}) = 1\text{.}\) If \({\mathbf x} = (1100)\) and \({\mathbf y} = (1010)\) are codewords, then \(d({\mathbf x}, {\mathbf y}) = 2\text{.}\) If \({\mathbf x}\) is transmitted and a single error occurs, then \({\mathbf y}\) can never be received. Table 8.12 gives the distances between all 4-bit codewords in which the first three bits carry information and the fourth is an even parity check bit. We can see that the minimum distance here is \(2\text{;}\) hence, the code is suitable as a single error-detecting code.
\(0000\) | \(0011\) | \(0101\) | \(0110\) | \(1001\) | \(1010\) | \(1100\) | \(1111\) | |
\(0000\) | \(0\) | \(2\) | \(2\) | \(2\) | \(2\) | \(2\) | \(2\) | \(4\) |
\(0011\) | \(2\) | \(0\) | \(2\) | \(2\) | \(2\) | \(2\) | \(4 \) | \(2\) |
\(0101\) | \(2\) | \(2\) | \(0\) | \(2\) | \(2\) | \(4\) | \(2\) | \(2\) |
\(0110\) | \(2\) | \(2\) | \(2\) | \(0\) | \(4\) | \(2\) | \(2\) | \(2\) |
\(1001\) | \(2\) | \(2\) | \(2\) | \(4\) | \(0\) | \(2\) | \(2\) | \(2\) |
\(1010\) | \(2\) | \(2\) | \(4\) | \(2\) | \(2\) | \(0\) | \(2\) | \(2\) |
\(1100\) | \(2\) | \(4\) | \(2\) | \(2\) | \(2\) | \(2\) | \(0\) | \(2\) |
\(1111\) | \(4\) | \(2\) | \(2\) | \(2\) | \(2\) | \(2\) | \(2\) | \(0\) |
To determine exactly what the error-detecting and error-correcting capabilities for a code are, we need to analyze the minimum distance for the code. Let \({\mathbf x}\) and \({\mathbf y}\) be codewords. If \(d({\mathbf x}, {\mathbf y}) = 1\) and an error occurs where \({\mathbf x}\) and \({\mathbf y}\) differ, then \({\mathbf x}\) is changed to \({\mathbf y}\text{.}\) The received codeword is \({\mathbf y}\) and no error message is given. Now suppose \(d({\mathbf x}, {\mathbf y}) = 2\text{.}\) Then a single error cannot change \({\mathbf x}\) to \({\mathbf y}\text{.}\) Therefore, if \(d_{\min} = 2\text{,}\) we have the ability to detect single errors. However, suppose that \(d({\mathbf x}, {\mathbf y}) = 2\text{,}\) \({\mathbf y}\) is sent, and a noncodeword \({\mathbf z}\) is received such that
Then the decoder cannot decide between \({\mathbf x}\) and \({\mathbf y}\text{.}\) Even though we are aware that an error has occurred, we do not know what the error is.
Suppose \(d_{\min} \geq 3\text{.}\) Then the maximum-likelihood decoding scheme corrects all single errors. Starting with a codeword \({\mathbf x}\text{,}\) an error in the transmission of a single bit gives \({\mathbf y}\) with \(d({\mathbf x}, {\mathbf y}) = 1\text{,}\) but \(d({\mathbf z}, {\mathbf y}) \geq 2\) for any other codeword \({\mathbf z} \neq {\mathbf x}\text{.}\) If we do not require the correction of errors, then we can detect multiple errors when a code has a minimum distance that is greater than or equal to \(3\text{.}\)
Theorem 8.13.
Let \(C\) be a code with \(d_{\min} = 2n + 1\text{.}\) Then \(C\) can correct any \(n\) or fewer errors. Furthermore, any \(2n\) or fewer errors can be detected in \(C\text{.}\)
Proof.
Suppose that a codeword \({\mathbf x}\) is sent and the word \({\mathbf y}\) is received with at most \(n\) errors. Then \(d( {\mathbf x}, {\mathbf y}) \leq n\text{.}\) If \({\mathbf z}\) is any codeword other than \({\mathbf x}\text{,}\) then
Hence, \(d({\mathbf y}, {\mathbf z} ) \geq n+1\) and \({\mathbf y}\) will be correctly decoded as \({\mathbf x}\text{.}\) Now suppose that \({\mathbf x}\) is transmitted and \({\mathbf y}\) is received and that at least one error has occurred, but not more than \(2n\) errors. Then \(1 \leq d( {\mathbf x}, {\mathbf y} ) \leq 2n\text{.}\) Since the minimum distance between codewords is \(2n +1\text{,}\) \({\mathbf y}\) cannot be a codeword. Consequently, the code can detect between \(1\) and \(2n\) errors.
Example 8.14.
In Table 8.15, the codewords \({\mathbf c}_1 = (00000)\text{,}\) \({\mathbf c}_2 = (00111)\text{,}\) \({\mathbf c}_3 = (11100)\text{,}\) and \({\mathbf c}_4 = (11011)\) determine a single error-correcting code.
\(00000\) | \(00111\) | \(11100\) | \(11011\) | |
\(00000 \) | \(0\) | \(3\) | \(3\) | \(4\) |
\(00111 \) | \(3\) | \(0\) | \(4\) | \(3\) |
\(11100 \) | \(3\) | \(4\) | \(0\) | \(3\) |
\(11011 \) | \(4\) | \(3\) | \(3\) | \(0\) |
Subsection Historical Note
Modern coding theory began in 1948 with C. Shannon's paper, “A Mathematical Theory of Information” [7]. This paper offered an example of an algebraic code, and Shannon's Theorem proclaimed exactly how good codes could be expected to be. Richard Hamming began working with linear codes at Bell Labs in the late 1940s and early 1950s after becoming frustrated because the programs that he was running could not recover from simple errors generated by noise. Coding theory has grown tremendously in the past several decades. The Theory of Error-Correcting Codes, by MacWilliams and Sloane [5], published in 1977, already contained over 1500 references. Linear codes (Reed-Muller \((32, 6)\)-block codes) were used on NASA's Mariner space probes. More recent space probes such as Voyager have used what are called convolution codes. Currently, very active research is being done with Goppa codes, which are heavily dependent on algebraic geometry.