Reliable transmission of information through noisy channels plays a vital role in modern society. Some aspects of this problem have close formal similarities to the theory of spin glasses. Noise in the transmission channel can be related to random interactions in spin glasses and the bit sequence representing information corresponds to the Ising spin configuration. The replica method serves as a powerful tool of analysis, and TAP-like equations can be used as a practical implementation of the algorithm to infer the original message. The gauge theory also provides an interesting point of view.

Information theory was initiated by Shannon half a century ago. It formulates various basic notions on information transmission through noisy channels and develops a framework to manipulate those abstract objects. We first briefly review some ideas of information theory, and then restate the basic concepts, such as noise, communication, and information inference, in terms of statistical mechanics of spin glasses.

Suppose that we wish to transmit a message (information) represented as a sequence of N bits from one place to another. The path for information transmission is called a (transmission) channel. A channel usually carries noise and the output from a channel is different in some bits from the input. We then ask ourselves how we can infer the original message from the noisy output.

It would be difficult to infer which bits of the output are corrupted by noise if the original message itself were fed directly into the channel. It is necessary to make the message redundant before transmission by adding extra pieces of information, by use of which the noise can be removed. This process is called channel coding (or encoding), or simply coding. The encoded message is transmitted through a noisy channel. The process of information retrieval from the noisy output of a channel using redundancy is called decoding.

A very simple example of encoding is to repeat the same bit sequence several (for instance, three) times. If the three sequences received at the end of the channel coincide, one may infer that there was no noise. If a specific bit has different values in the three sequences, one may infer its correct value (0 or 1) by the majority rule. For example, when the original message is (0, 0, 1, 1, 0) and the three output sequences from a noisy channel are (0, 0, 1, 1, 0), (0, 1, 1, 1, 0), and (0, 0, 1, 1, 0), all the bits other than the second coincide, and the second bit is inferred to be 0 by the majority rule, recovering (0, 0, 1, 1, 0). This example shows that redundancy is helpful in inferring the original information from the noisy message.
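The majority-rule decoding just described is short enough to state in code. The following is a minimal sketch; the function names and the three-copy setting are illustrative choices, not from the text.

```python
import numpy as np

def repetition_encode(bits, copies=3):
    """Encode by transmitting the whole bit sequence `copies` times."""
    return [list(bits) for _ in range(copies)]

def repetition_decode(received):
    """Decode by a bitwise majority vote over the received copies."""
    arr = np.array(received)                      # shape: (copies, N)
    return (arr.sum(axis=0) > arr.shape[0] / 2).astype(int).tolist()

# the example from the text: the second copy has its second bit flipped
received = [[0, 0, 1, 1, 0],
            [0, 1, 1, 1, 0],
            [0, 0, 1, 1, 0]]
print(repetition_decode(received))                # -> [0, 0, 1, 1, 0]
```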

A more sophisticated method is the parity-check code. For example, suppose that seven bits are grouped together and one counts the number of ones in the group. If the number is even, one adds 0 as the eighth bit (the parity bit) and adds 1 otherwise. Then there is always an even number of ones in the group of eight bits in the encoded message (code word). If the noise rate of the channel is not very large and at most one bit is flipped by noise out of the eight received bits, one may infer that the output of the channel carries no noise when one finds an even number of ones in the group of eight bits. If there is an odd number of ones, then there must be some noise, implying error detection (Fig. 5.1). Error correction after error detection needs some further trick, to be elucidated in the following sections.
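In code, the parity-bit construction and the error detection read as follows (a minimal sketch; the helper names are ours):

```python
def add_parity(block7):
    """Append a parity bit so the 8-bit code word has an even number of ones."""
    return list(block7) + [sum(block7) % 2]

def parity_ok(block8):
    """True if the number of ones is even, i.e. no error is detected."""
    return sum(block8) % 2 == 0

word = add_parity([1, 0, 1, 1, 0, 0, 1])   # four ones -> parity bit 0
assert parity_ok(word)
word[3] ^= 1                               # a single bit flipped by noise
assert not parity_ok(word)                 # error detected, but not located
```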

Fig. 5.1. Parity-check code

It is convenient to use Ising spins ±1 instead of 0 and 1 to treat the present problem by statistical mechanics. The basic operation on a bit sequence is the sum modulo 2 (2 = 0 mod 2), which corresponds to the product of Ising spins if one identifies 0 with S_i = 1 and 1 with S_i = −1. For example, 0 + 1 = 1 translates into 1 × (−1) = −1 and 1 + 1 = 0 into (−1) × (−1) = 1. We hereafter use this identification of an Ising spin configuration with a bit sequence.
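The identification can be checked mechanically: under the map 0 → +1, 1 → −1, addition mod 2 becomes multiplication of spins (a two-line check, written out for completeness):

```python
def bit_to_spin(b):
    """0 -> +1, 1 -> -1, i.e. S = (-1)^b."""
    return 1 - 2 * b

# mod-2 addition of bits corresponds to multiplication of Ising spins
for a in (0, 1):
    for b in (0, 1):
        assert bit_to_spin((a + b) % 2) == bit_to_spin(a) * bit_to_spin(b)
```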

Generation of the parity bit in the parity-check code corresponds to the product of appropriate spins. By identifying such a product with the interaction between the relevant spins, we obtain a spin system very similar to the Mattis model of spin glasses.

In the Mattis model one allocates a randomly quenched Ising spin ξ_i to each site, and the interaction between sites i and j is chosen to be J_{ij} = ξ_i ξ_j. The Hamiltonian is then

(5.1)   $H = -\sum_{\langle ij\rangle} J_{ij}\, S_i S_j = -\sum_{\langle ij\rangle} \xi_i \xi_j\, S_i S_j$

The ground state is clearly identical to the configuration of the quenched spins, S_i = ξ_i (∀i) (or its total reversal S_i = −ξ_i (∀i)); see Fig. 5.2(a).

Fig. 5.2. The ground-state spin configurations of the Mattis model without noise added (a), with the interaction marked by a cross flipped by noise (b), and with three interactions flipped by strong noise (c). Thin lines represent ferromagnetic interactions and bold lines antiferromagnetic interactions. The plaquettes marked by dots are frustrated.

Returning to the problem of error-correcting codes, we form the Mattis-type interactions $J^0_{i_1\dots i_r} = \xi_{i_1}\cdots\xi_{i_r}$, with r an integer, for appropriate combinations of sites (i_1, …, i_r). We then feed these interactions (the encoded message), instead of the original spin configuration ξ ≡ {ξ_i} (the original message), into the noisy channel. The encoded information is redundant because the number of interactions N_B (the number of elements in the set {(i_1, …, i_r)}) is larger than the number of spins N.

For instance, the conventional r = 2 Mattis model on the two-dimensional square lattice has N_B = 2N interactions, the number of neighbouring pairs of sites. For the original interactions without noise, $J^0 = \{\xi_i\xi_j\}$, the product of the $J^0_{ij}$ along an arbitrary closed loop c, $f_c = \prod_c J^0_{ij} = \prod_c (\xi_i\xi_j)$, is always unity (positive) since each ξ_i appears an even number of times in the product. Thus the Mattis model has no frustration. However, noise in the channel flips some elements of J⁰, and therefore the output of the channel, if it is regarded as the interactions of a spin system, includes frustration (Fig. 5.2(b)). Nevertheless the original spin configuration is still the ground state of such a system if the noise rate is not large (i.e. only a small number of bits are flipped), as exemplified in Fig. 5.2(b). It has thus been shown that correct inference of the original message is possible even if there exists a small amount of noise in the channel, as long as an appropriate procedure is employed in encoding and decoding the message.

It is necessary to introduce redundancy appropriately to transmit information accurately through a noisy channel. It can indeed be proved that such redundancy should exceed a threshold so that we are able to retrieve the original message without errors.

Let us define the transmission rate R of information by a channel as

(5.2)   $R = \dfrac{N}{N_B} = \dfrac{\text{number of bits in the original message}}{\text{number of bits in the encoded message}}$

The smaller the denominator, the smaller the redundancy and the larger the transmission rate. Strictly speaking, the numerator of (5.2) should be the 'number of informative bits in the original message'. For biased messages, where 0 and 1 appear with unequal probabilities, the amount of information measured in bits is smaller than the length of the binary message itself. This is easily seen in the extreme case of a perfectly biased message consisting only of ones; such a message carries no information. By multiplying (5.2) by the number of bits transmitted per second, one obtains the number of information bits transmitted per second.

We consider a memoryless binary symmetric channel (BSC) where noise flips a bit from 1 to 0 and 0 to 1 independently at each bit with a given probability. It is known that the transmission rate should satisfy the following inequality so that error-free transmission is possible through the BSC in the limit of a very long bit sequence:

(5.3)   $R \le C$

where C is a function of the noise probability and is called the channel capacity. The capacity of the BSC is

(5.4)   $C = 1 + p \log_2 p + (1-p)\log_2 (1-p)$

Here p is the probability that a bit is flipped by noise. The inequality (5.3) is called the channel coding theorem of Shannon and implies that error-free transmission is possible if the transmission rate does not exceed the channel capacity and an appropriate procedure of encoding and decoding is employed. Similar results hold for other types of channels, such as the Gaussian channel to be introduced later. A sketch of the arguments leading to the channel coding theorem is given in Appendix C.
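A quick numerical illustration of (5.4): the sketch below evaluates the BSC capacity for a few flip probabilities (the function name is ours).

```python
import numpy as np

def bsc_capacity(p):
    """Capacity (5.4) of the binary symmetric channel, in bits per channel use."""
    if p in (0.0, 1.0):
        return 1.0                      # a deterministic channel is noiseless
    return 1.0 + p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p)

print(bsc_capacity(0.0))    # 1.0: every transmitted bit is informative
print(bsc_capacity(0.11))   # ~0.5: rates up to about 1/2 are achievable
print(bsc_capacity(0.5))    # 0.0: the output is independent of the input
```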

An explicit example of a code that asymptotically saturates the Shannon bound (5.3) as an equality is the Sourlas code (Sourlas 1989): one takes all possible products of r spins from N sites to form Mattis-type interactions. We have mainly explained the conventional Mattis model with r = 2 in §5.1.2; we discuss the general case of arbitrary r (= 2, 3, 4, …) hereafter. This is nothing but the infinite-range model with r-body interactions. It will be shown later that the Shannon bound (5.3) is asymptotically achieved and the error rate approaches zero in the Sourlas code if we take the limit N → ∞ first and r → ∞ afterwards. It should be noted, however, that the inequality (5.3) reduces to an equality with both sides approaching zero in the Sourlas code, which means that the transmission rate R is asymptotically zero. The transmission is therefore not very efficient. A trick to improve this point was shown recently to be to take the products of a limited number of combinations of r spins from N sites, not all possible combinations. All of these points will be elucidated in detail later in the present chapter.

Let us return to the argument of §5.1.2 and consider the problem of inference of spin configurations when the noise probability is not necessarily small. It was shown in §5.1.2 that the ground state of the Ising spin system (with the output of the channel as interactions) is the true original message (spin configuration) if the channel is not very noisy. For larger noise, the ground state is different from the original spin configuration (Fig. 5.2(c)). This suggests that the original spin configuration is one of the excited states, and thus one may be able to decode more accurately by searching for states at finite temperature. It will indeed be shown that states at a specific finite temperature T_p, determined by the error rate p, give better results under a certain criterion. This temperature T_p turns out to coincide with the temperature on the Nishimori line discussed in Chapter 4 (Ruján 1993).

We now formulate the arguments in the previous section in a more quantitative form and proceed to explicit calculations (Sourlas 1989).

Suppose that the Ising spin configuration ξ= {ξi} has been generated according to a probability distribution function P(ξ). This distribution P(ξ) for generating the original message is termed the prior. Our goal is to infer the original spin configuration from the output of a noisy channel as accurately as possible. Following the suggestion in the previous section, we form a set of products of r spins

(5.5)   $J^0_{i_1\dots i_r} = \xi_{i_1}\cdots\xi_{i_r}$

for appropriately chosen combinations of the ξ_i and feed the set of interactions into the channel. We first consider the BSC: the output of the channel $J_{i_1\dots i_r}$ is flipped from the corresponding input $J^0_{i_1\dots i_r} = \xi_{i_1}\cdots\xi_{i_r}$ with probability p, in which case it is equal to $-\xi_{i_1}\cdots\xi_{i_r}$. The other possibility, the correct output $\xi_{i_1}\cdots\xi_{i_r}$, has probability 1 − p. The output probability of the BSC can be expressed in terms of a conditional probability:

(5.6)   $P(J_{i_1\dots i_r}\mid \xi_{i_1}\cdots\xi_{i_r}) = \dfrac{\exp(\beta_p\, J_{i_1\dots i_r}\,\xi_{i_1}\cdots\xi_{i_r})}{2\cosh\beta_p}$

where β_p is a function of p defined as

(5.7)   $\mathrm{e}^{2\beta_p} = \dfrac{1-p}{p}, \qquad \beta_p = \dfrac{1}{2}\log\dfrac{1-p}{p}$

Equation (5.6) is equal to 1 − p when $J_{i_1\dots i_r} = \xi_{i_1}\cdots\xi_{i_r}$ according to (5.7) and is equal to p when $J_{i_1\dots i_r} = -\xi_{i_1}\cdots\xi_{i_r}$, implying that (5.6) is the correct conditional probability to characterize the channel. Equations (5.6) and (5.7) are analogous to the distribution function (4.4)–(4.5) in the gauge theory. The inverse of β_p defined in (5.7), denoted T_p at the end of the previous section, coincides with the temperature on the Nishimori line. One should note here that p in the present chapter is 1 − p in Chapter 4. The temperature T_p is sometimes called the Nishimori temperature in the literature of error-correcting codes.

Assume that (5.6) applies to each set (i_1, …, i_r) independently. This is a memoryless channel in which each bit is affected by noise independently. The overall probability is then the product of (5.6):

(5.8)   $P(J\mid \xi) = (2\cosh\beta_p)^{-N_B} \exp\Big(\beta_p \sum J_{i_1\dots i_r}\,\xi_{i_1}\cdots\xi_{i_r}\Big)$

where the sum in the exponent is taken over all sets (i_1 … i_r) for which the spin products are generated by (5.5). The symbol N_B denotes the number of terms in this sum and is equal to the number of bits fed into the channel.

The task is to infer the original message (spin configuration) ξ from the output $J = \{J_{i_1\dots i_r}\}$. For this purpose, it is necessary to introduce the conditional probability of ξ given J, which is called the posterior. The posterior is the conditional probability with the two entries of the left hand side of (5.8), J and ξ, exchanged. The Bayes formula is useful for exchanging these two entries.

The joint probability P(A,B) that two events A and B occur is expressed in terms of the product of P(B) and the conditional probability P(A|B) for A under the condition that B occurred. The same holds if A and B are exchanged. Thus we have

(5.9)   $P(A,B) = P(A\mid B)\,P(B) = P(B\mid A)\,P(A)$

It follows immediately that

(5.10)   $P(A\mid B) = \dfrac{P(B\mid A)\,P(A)}{\sum_A P(B\mid A)\,P(A)}$

Equation (5.10) is the Bayes formula.

We can express P(σ|J) in terms of (5.8) using the Bayes formula:

(5.11)   $P(\sigma\mid J) = \dfrac{P(J\mid\sigma)\,P(\sigma)}{\mathrm{Tr}_\sigma\, P(J\mid\sigma)\,P(\sigma)}$

We have written σ = {σ_1, …, σ_N} for the dynamical variables used for decoding. The final decoded result will be denoted by $\hat\xi = \{\hat\xi_1, \dots, \hat\xi_N\}$, and we reserve ξ = {ξ_1, …, ξ_N} for the true original configuration. Equation (5.11) is the starting point of our argument.

The probability P(J|σ) represents the characteristics of a memoryless BSC and is given in (5.8). It is therefore possible to infer the original message by (5.11) if the prior P(σ) is known. Explicit theoretical analysis is facilitated by assuming a message source that generates various messages with equal probability. This assumption is not necessarily unnatural in realistic situations, where information compression before encoding usually generates a rather uniform distribution of zeros and ones. In such a case, P(σ) can be considered a constant. The posterior is then

(5.12)   $P(\sigma\mid J) = \dfrac{\exp\big(\beta_p \sum J_{i_1\dots i_r}\,\sigma_{i_1}\cdots\sigma_{i_r}\big)}{\mathrm{Tr}_\sigma \exp\big(\beta_p \sum J_{i_1\dots i_r}\,\sigma_{i_1}\cdots\sigma_{i_r}\big)}$

Since J is given and fixed in the present problem, (5.12) is nothing more than the Boltzmann factor of an Ising spin glass with randomly quenched interactions J. We have thus established a formal equivalence between the probabilistic inference problem of messages for a memoryless BSC and the statistical mechanics of the Ising spin glass.

Equation (5.12) is the probability distribution of the inferred spin configuration σ, given the output of the channel J. Then the spin configuration to maximize (5.12) is a good candidate for the decoded (inferred) spin configuration. Maximization of the Boltzmann factor is equivalent to the ground-state search of the corresponding Hamiltonian

(5.13)   $H = -\sum J_{i_1\dots i_r}\,\sigma_{i_1}\cdots\sigma_{i_r}$

This method of decoding is called the maximum a posteriori probability (MAP). This is the idea already explained in §5.1.2. Maximization of the conditional probability P(J|σ) with respect to σ is equivalent to maximization of the posterior P(σ|J) if the prior P(σ) is uniform. The former idea is termed the maximum likelihood method as P(J|σ) is the likelihood function of σ.

The MAP maximizes the posterior of the whole bit sequence σ. There is another strategy of decoding in which we focus our attention on a single bit i, not the whole sequence. This means we trace out all the spin variables except for a single σi to obtain the posterior only of σi. This process is called marginalization in statistics:

(5.14)   $P(\sigma_i\mid J) = \mathrm{Tr}_{\sigma(\ne\sigma_i)}\, P(\sigma\mid J)$

We then compare P(σ_i = 1|J) and P(σ_i = −1|J) and, if the former is larger, we assign one to the decoded result of the ith bit ($\hat\xi_i = 1$), and assign $\hat\xi_i = -1$ otherwise. This process is carried out for all bits, and the set of bits thus decoded constitutes the final result. This method is called finite-temperature decoding or the maximizer of posterior marginals (MPM) and clearly gives a result different from the MAP (Ruján 1993; Nishimori 1993; Sourlas 1994; Iba 1999).

It is instructive to consider the MPM from a different point of view. The MPM is equivalent to accepting the sign of the difference of two probabilities as the ith decoded bit

(5.15)   $\hat\xi_i = \mathrm{sgn}\big[P(\sigma_i = 1\mid J) - P(\sigma_i = -1\mid J)\big]$

This may also be written as

(5.16)   $\hat\xi_i = \mathrm{sgn}\Big(\sum_{\sigma_i=\pm 1} \sigma_i\, P(\sigma_i\mid J)\Big) = \mathrm{sgn}\,\langle\sigma_i\rangle_{\beta_p}$

Here $\langle\sigma_i\rangle_{\beta_p}$ is the local magnetization with (5.12) as the Boltzmann factor. Equation (5.16) means that one calculates the local magnetization at the finite temperature $T_p = \beta_p^{-1}$ and assigns its sign to the decoded bit. The MAP can be regarded as the β → ∞ (low-temperature) limit, with β in place of the finite β_p in (5.16). The MAP was derived as the maximizer of the posterior of the whole bit sequence, which has now been shown to be equivalent to the low-temperature limit of the MPM, finite-temperature decoding. The MPM, by contrast, maximizes the posterior of a single bit. We shall study the relation between these two methods in more detail subsequently.
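For a very small system, both decoders can be carried out by exhaustive enumeration of the 2^N configurations of the posterior (5.12). The sketch below is illustrative only (the couplings and parameter values are hypothetical); the single-site terms play the role of directly transmitted bits and break the global spin-flip symmetry that would otherwise force all marginals to vanish.

```python
import itertools
import numpy as np

def posterior(couplings, beta, N):
    """Unnormalized posterior (5.12) over all 2^N spin configurations.
    `couplings` is a list of (sites, J) pairs; exhaustive, so tiny N only."""
    configs = list(itertools.product([-1, 1], repeat=N))
    E = [sum(J * np.prod([s[i] for i in sites]) for sites, J in couplings)
         for s in configs]
    return np.array(configs), np.exp(beta * np.array(E))

def decode_map(couplings, N, beta=30.0):
    """MAP: the single configuration maximizing the posterior (large beta)."""
    configs, w = posterior(couplings, beta, N)
    return configs[np.argmax(w)]

def decode_mpm(couplings, beta_p, N):
    """MPM: the sign of each bit-wise marginal magnetization at T_p = 1/beta_p."""
    configs, w = posterior(couplings, beta_p, N)
    mags = configs.T @ w / w.sum()            # <sigma_i> for every i
    return np.sign(mags).astype(int)

# hypothetical channel output: two-spin couplings plus two single-site terms
couplings = [((0, 1), 1.0), ((1, 2), -1.0), ((0, 2), 1.0),
             ((0,), 0.5), ((2,), 0.5)]
print(decode_map(couplings, N=3))
print(decode_mpm(couplings, beta_p=0.5, N=3))
```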

It is sometimes convenient to consider channels other than the BSC. A typical example is the Gaussian channel. The encoded message $\xi_{i_1}\cdots\xi_{i_r}$ (= ±1) is fed into the channel as a signal of some amplitude, $J_0\,\xi_{i_1}\cdots\xi_{i_r}$. The output is continuously distributed around this input with a Gaussian distribution of variance J²:

(5.17)   $P(J_{i_1\dots i_r}\mid \xi_{i_1}\cdots\xi_{i_r}) = \dfrac{1}{\sqrt{2\pi J^2}} \exp\Big\{-\dfrac{(J_{i_1\dots i_r} - J_0\,\xi_{i_1}\cdots\xi_{i_r})^2}{2J^2}\Big\}$

If the prior is uniform, the posterior is written using the Bayes formula as

(5.18)   $P(\sigma\mid J) = \dfrac{\exp\big((J_0/J^2) \sum J_{i_1\dots i_r}\,\sigma_{i_1}\cdots\sigma_{i_r}\big)}{\mathrm{Tr}_\sigma \exp\big((J_0/J^2) \sum J_{i_1\dots i_r}\,\sigma_{i_1}\cdots\sigma_{i_r}\big)}$

Comparison of this equation with (5.12) shows that the posterior of the Gaussian channel corresponds to that of the BSC with β_p replaced by J_0/J². We can therefore develop the following arguments for both of these channels in almost the same way.

It is convenient to introduce a measure of the success of decoding that represents the proximity of the decoded message to the original one. The decoded ith bit is $\hat\xi_i = \mathrm{sgn}\,\langle\sigma_i\rangle_\beta$, with β = β_p for the MPM and β → ∞ for the MAP. It sometimes happens that one is not aware of the noise rate p of the channel, or equivalently β_p. Consequently it makes sense, even for the MPM, to develop arguments with β unspecified, which we do in the following.

The product of $\hat\xi_i$ and the corresponding original bit ξ_i, namely $\xi_i\,\mathrm{sgn}\,\langle\sigma_i\rangle_\beta$, is 1 if these two coincide and −1 otherwise. An appropriate strategy is to increase the probability that this product is equal to one. We average this product over the output probability of the channel P(J|ξ) and the prior P(ξ),

(5.19)   $M(\beta) = \mathrm{Tr}_\xi \sum_J P(\xi)\,P(J\mid\xi)\; \xi_i\, \mathrm{sgn}\,\langle\sigma_i\rangle_\beta$

which is the overlap of the original and decoded messages. We have denoted the sum over the site variables ξ by Tr_ξ and that over the bond variables J by Σ_J. A better decoding is one that gives a larger M(β). For a uniform message source, P(ξ) = 2^{−N}, the average over ξ and J renders the right hand side independent of i,

(5.20)

This expression may be regarded as the average of $\mathrm{sgn}\,\langle\sigma_i\rangle_\beta$ with a weight proportional to $Z(\beta_p)\,\langle\sigma_i\rangle_{\beta_p}$, which is essentially equivalent to the configurational average of a correlation function with a similar weight that appears frequently in the gauge theory of spin glasses, for example (4.43).

The overlap is closely related to the Hamming distance of the two messages (the number of positions at which the corresponding bits differ). The closer the messages, the larger the overlap and the smaller the Hamming distance. When the two messages coincide, the overlap is one and the Hamming distance is zero, while, for two messages completely inverted from each other, the overlap is −1 and the Hamming distance is N.
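The relation M = 1 − 2d/N between the overlap M and the Hamming distance d of two ±1 sequences of length N is a one-liner to verify:

```python
def overlap(a, b):
    """Overlap of two ±1 sequences of equal length N: M = 1 - 2d/N,
    where d is the Hamming distance."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

assert overlap([1, -1, 1, 1], [1, -1, 1, 1]) == 1.0      # d = 0
assert overlap([1, -1, 1, 1], [-1, 1, -1, -1]) == -1.0   # d = N
assert overlap([1, 1, 1, 1], [1, 1, -1, 1]) == 0.5       # d = 1: 1 - 2/4
```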

An interesting feature of the overlap M(β) is that it is a non-monotonic function of β, with its maximum at β = β_p:

(5.21)   $M(\beta) \le M(\beta_p)$

In other words, the MPM at the correct parameter value β = β_p gives the optimal result in the sense of maximization of the overlap (Ruján 1993; Nishimori 1993; Sourlas 1994; Iba 1999). The MPM is sometimes called the Bayes-optimal strategy.

To prove (5.21), we first take the absolute value of both sides of (5.20) and exchange the absolute value operation and the sum over J to obtain

(5.22)

where we have used $|\mathrm{sgn}\,\langle\sigma_i\rangle_\beta| = 1$. By rewriting the right hand side as follows, we can derive (5.21):

(5.23)

Almost the same manipulations lead to the following inequality for the Gaussian channel:

(5.24)   $M(\beta) \le M(\beta = J_0/J^2)$

We have shown that the bit-wise overlap defined in (5.19) is maximized by the MPM with the correct parameter (β = β_p for the BSC). This is natural in that the MPM at β = β_p was introduced to maximize the bit-wise (marginalized) posterior. The MAP maximizes the posterior of the whole bit sequence σ, but its probability of error for any given single bit is larger than that of the MPM with the correct parameter value. This observation is also confirmed from the viewpoint of Bayesian statistics (Sourlas 1994; Iba 1999).

The inequalities (5.21) and (5.24) are essentially identical to (4.63) derived by the gauge theory in §4.6.4. To understand this, we note that generality is not lost by the assumption ξ_i = 1 (∀i) in the calculation of M(β) for a uniform information source. This may be called the ferromagnetic gauge. Indeed, the gauge transformation $J_{i_1\dots i_r} \to J_{i_1\dots i_r}\,\xi_{i_1}\cdots\xi_{i_r}$ and $\sigma_i \to \sigma_i \xi_i$ in (5.19) removes ξ_i from the equation. Then M defined in (5.19) is seen to be identical to (4.63) with the two-point correlation replaced by a single-point spin expectation value. The argument in §4.6.4 applies not only to two-point correlations but to any correlations, and thus the result of §4.6.4 agrees with that of the present section. Therefore the overlap M of the decoded bit and the original bit becomes a maximum on the Nishimori line as a function of the decoding temperature with fixed error rate p. For the Gaussian channel, (5.24) corresponds to the fact that the Nishimori line is represented as J_0/J² = β, as observed in §4.3.3.

Inequality (5.21) shows that M(β) is a non-monotonic function, but the inequality does not give the explicit β-dependence. It may also happen that it is not easy to adjust β exactly to β_p in practical situations where one does not know the noise rate p very precisely. One is then forced to estimate β_p and should always be careful about errors in the parameter estimation. It is therefore useful if one can estimate the effects of errors in the parameter estimation on the overlap M(β).

A solvable model, for which we can calculate the explicit form of M(β), would thus be helpful as a guide for non-solvable cases. The infinite-range model serves as an important prototype for this and other purposes.

The Sourlas code explained in §5.1.3 is represented as the infinite-range model. In the Sourlas code, the sum in the Hamiltonian

(5.25)   $H = -\sum_{i_1<\dots<i_r} J_{i_1\dots i_r}\,\sigma_{i_1}\cdots\sigma_{i_r}$

runs over all possible combinations of r spins out of N spins. The number of terms is then $N_B = \binom{N}{r}$. This infinite-range model with r-body interactions can be solved explicitly by the replica method (Derrida 1981; Gross and Mézard 1984; Gardner 1985; Nishimori and Wong 1999). We show the solution for the Gaussian channel. The BSC is expected to give the same result in the thermodynamic limit according to the central limit theorem.
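Encoding in the Sourlas code is simply the evaluation of all $\binom{N}{r}$ r-spin products, after which each product passes through the channel independently. A minimal sketch for the BSC (parameter values are illustrative):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def sourlas_encode(xi, r):
    """All r-spin products J^0_{i1..ir} = xi_i1 * ... * xi_ir."""
    sites = list(itertools.combinations(range(len(xi)), r))
    return sites, np.array([np.prod(xi[list(c)]) for c in sites])

N, r, p = 8, 3, 0.1
xi = rng.choice([-1, 1], size=N)              # original message as Ising spins
sites, J0 = sourlas_encode(xi, r)
J = J0 * rng.choice([1, -1], p=[1 - p, p], size=len(J0))   # BSC noise
print(len(J0))   # C(8,3) = 56 coupling bits for 8 message bits: R = 1/7
```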

The parameters J_0 and J in the Gaussian distribution (5.17) must be scaled appropriately with N so that the expectation value of the infinite-range Hamiltonian (5.25) is extensive (proportional to N) in the limit N → ∞. If we also demand that physical quantities remain finite in the limit r → ∞ taken after N → ∞, then r should also be scaled appropriately. The Gaussian distribution satisfying these requirements is

(5.26)   $P(J_{i_1\dots i_r}\mid\xi) = \sqrt{\dfrac{N^{r-1}}{\pi J^2 r!}}\, \exp\Big\{-\dfrac{N^{r-1}}{J^2 r!}\Big(J_{i_1\dots i_r} - \dfrac{j_0\, r!}{N^{r-1}}\,\xi_{i_1}\cdots\xi_{i_r}\Big)^2\Big\}$

where J and j_0 are independent of N and r. The appropriateness of (5.26) is justified by the expressions of various quantities derived below, which have non-trivial limits as N → ∞ and r → ∞.

Following the general prescription of the replica method, we first calculate the configurational average of the nth power of the partition function and take the limit n → 0. Order parameters naturally emerge in due course. The overlap M is expressed as a function of these order parameters.

The configurational average of the nth power of the partition function of the infinite-range model is written for the uniform prior (P(ξ) = 2^{−N}) as

(5.27)   $[Z^n] = 2^{-N}\,\mathrm{Tr}_\xi \prod \int \mathrm{d}J_{i_1\dots i_r}\, P(J_{i_1\dots i_r}\mid\xi)\;\; \mathrm{Tr}_\sigma \exp\Big(\beta \sum_{\alpha=1}^{n} \sum_{i_1<\dots<i_r} J_{i_1\dots i_r}\, \sigma^\alpha_{i_1}\cdots\sigma^\alpha_{i_r}\Big)$

where α (= 1, …, n) is the replica index. A gauge transformation

(5.28)   $J_{i_1\dots i_r} \to J_{i_1\dots i_r}\,\xi_{i_1}\cdots\xi_{i_r}, \qquad \sigma_i^\alpha \to \sigma_i^\alpha\,\xi_i$

in (5.27) removes ξ from the integrand. The problem is thus equivalent to the case of ξ_i = 1 (∀i), the ferromagnetic gauge. We mainly use this gauge in the present chapter. The sum over ξ in (5.27) then simply gives 2^N, and the factor 2^{−N} in front of the whole expression disappears.

It is straightforward to carry out the Gaussian integral in (5.27). If we ignore the trivial overall constant and terms of lower order in N, the result is

(5.29)

Here Tr_σ is the sum over σ. In deriving the final expression, we have used a relation that generalizes the following relation to the r-body case:

(5.30)   $\sum_{i<j} \sigma_i^\alpha\sigma_i^\beta\,\sigma_j^\alpha\sigma_j^\beta = \dfrac{1}{2}\Big(\sum_i \sigma_i^\alpha\sigma_i^\beta\Big)^2 - \dfrac{N}{2}$

It is convenient to introduce the variables

(5.31)   $q_{\alpha\beta} = \dfrac{1}{N}\sum_i \sigma_i^\alpha\sigma_i^\beta, \qquad m_\alpha = \dfrac{1}{N}\sum_i \sigma_i^\alpha$

to replace the expressions inside the parentheses in the final expression of (5.29) by q_{αβ} and m_α, so that we can carry out the Tr_σ operation. The condition (5.31) is imposed by the Fourier-transformed expressions of delta functions with integration variables $\hat q_{\alpha\beta}$ and $\hat m_\alpha$:

(5.32)

We can now carry out the Tr_σ operation independently at each i to find

(5.33)

Here Tr denotes sums over the single-site replica spins {σ^1, …, σ^n}.

Further calculations are possible under the assumption of replica symmetry:

(5.34)   $q_{\alpha\beta} = q, \quad \hat q_{\alpha\beta} = \hat q, \quad m_\alpha = m, \quad \hat m_\alpha = \hat m$

We fix n and take the thermodynamic limit N → ∞ to evaluate the integral by steepest descent. The result is

(5.35)

where $Du = \mathrm{e}^{-u^2/2}\,\mathrm{d}u/\sqrt{2\pi}$. This u has been introduced to reduce the double sum over α and β (α < β) to a single sum. The trace operation can be performed for each replica independently, so that the free energy βf defined by $[Z^n] = \mathrm{e}^{-Nn\beta f}$ becomes, in the limit n → 0,

(5.36)

The equations of state for the order parameters are determined by the saddle-point condition. By variation of (5.36), we obtain

(5.37)
(5.38)

Eliminating $\hat q$ and $\hat m$ from (5.38) using (5.37), we can write the equations for q and m in closed form:

(5.39)   $m = \int Du \tanh G, \qquad q = \int Du \tanh^2 G$

where

(5.40)   $G = \beta J \sqrt{\dfrac{r q^{r-1}}{2}}\; u + \beta j_0\, r\, m^{r-1}$

These reduce to the equations of state for the conventional SK model, (2.28) and (2.30), when r = 2. One should remember that 2j_0 here corresponds to J_0 in the conventional notation of the SK model, as is verified from (5.27) with r = 2.

The next task is to derive the expression of the overlap M. An argument similar to §2.2.5 leads to formulae expressing the physical meaning of q and m:

(5.41)   $q = [\langle\sigma_i\rangle^2], \qquad m = [\langle\sigma_i\rangle]$

Comparison of (5.41) and (5.39) suggests that tanh²β(·) and tanh β(·) in the integrands of the latter may be closely related to ⟨σ_i⟩² and ⟨σ_i⟩ in the former. To confirm this, we add a term $h\sum_i \sigma_i^\alpha\sigma_i^\beta$ to the final exponent of (5.27) and follow the calculations of the previous section to find a term proportional to $\sigma^\alpha\sigma^\beta$ in the exponent of the integrand of (5.35). We then differentiate −βnf with respect to h and let n → 0, h → 0 to find that σ^α and σ^β are singled out in the trace operation over replica spins, leading to the factor tanh²β(·). A similar argument using an external field term $h\sum_i \sigma_i^\alpha$ and differentiation by h leads to tanh β(·).

It should now be clear that an additional external field coupled to the product of k spins, $h\sum \sigma_{i_1}^\alpha\cdots\sigma_{i_k}^\alpha\,\sigma_{i_1}^\beta\cdots\sigma_{i_k}^\beta$, yields

(5.42)

Thus, for an arbitrary function F(x) that can be expanded around x = 0, the following identity holds:

(5.43)   $\big[F(\langle\sigma_i\rangle)\big] = \int Du\, F(\tanh G)$

The overlap is, in the ferromagnetic gauge, $M(\beta) = [\mathrm{sgn}\,\langle\sigma_i\rangle_\beta]$. If we therefore take as F(x) a function that approaches sgn(x) (e.g. tanh(ax) with a → ∞), we obtain the desired relation for the overlap:

(5.44)   $M(\beta) = \int Du\; \mathrm{sgn}\,G$

It has thus been established that M(β) is determined as a function of q and m through G.

The system (5.25) and (5.26) in the ferromagnetic gauge is a spin glass model with r-body interactions. It is natural to go further and investigate the properties of the RSB solution (Derrida 1981; Gross and Mézard 1984; Gardner 1985; Nishimori and Wong 1999; Gillin et al. 2001). We shall show that, for small values of the centre of the distribution j_0, a 1RSB phase appears after the RS paramagnetic phase as the temperature is lowered. A full RSB phase follows at still lower temperatures. For larger j_0, the paramagnetic phase, the ferromagnetic phase without RSB, and then the ferromagnetic phase with RSB appear sequentially as the temperature is decreased.

The free energy with 1RSB can be derived following the method described in §3.2:

(5.45)

where x (0 ≤ x ≤ 1) is the boundary between q_0 and q_1 in the matrix block, denoted by m_1 in §3.2. Extremization of (5.45) with respect to $q_0, q_1, \hat q_0, \hat q_1, m, \hat m, x$ leads to the equations of state. For m, q_0, and q_1,

(5.46)

Elimination of $\hat m, \hat q_0, \hat q_1$ using (5.46) from the equations of extremization with respect to $\hat q_0, \hat q_1, \hat m$ leads to

(5.47)
(5.48)
(5.49)
(5.50)

These equations coincide with (3.32)−(3.34) for r = 2 if we set h = 0 and 2j_0 = J_0. The equation of extremization with respect to x does not have an intuitively appealing compact form, so we omit it here.

The AT stability condition of the RS solution is

(5.51)

The criterion of stability for the 1RSB is expressed similarly. For a replica pair (αβ) in the same diagonal block, the stability condition for small deviations of q_{αβ} and $\hat q_{\alpha\beta}$ from the 1RSB is

(5.52)

Further steps of RSB take place with the diagonal blocks breaking up into smaller diagonal and off-diagonal blocks. Thus it is sufficient to check the intra-block stability condition (5.52) only.

The model in the limit r → ∞ is known as the random energy model (REM) (Derrida 1981). The problem can be solved completely in this limit and the spin glass phase is characterized by 1RSB.

As its name suggests, the REM has an independent distribution of energy levels. Let us demonstrate this fact for the case of j_0 = 0. The probability that the system has energy E will be denoted by P(E),

(5.53)   $P(E) = [\,\delta(E - H)\,]$

The average [⋯] over the distribution of J, (5.26), can be carried out if we express the delta function by its Fourier transformation. The result is

(5.54)   $P(E) = \dfrac{1}{\sqrt{\pi N J^2}}\, \exp\Big(-\dfrac{E^2}{N J^2}\Big)$

The joint distribution function of the energy values E_1 and E_2 of two independent spin configurations σ^{(1)} and σ^{(2)} with the same set of interactions can be derived similarly:

(5.55)

It is easy to see in the limit r → ∞ that

(5.56)   $P(E_1, E_2) \to P(E_1)\,P(E_2)$

which implies independence of the energy distributions of two spin configurations. Similar arguments hold for three (and more) energy values.

The number of states with energy E is, according to the independence of energy levels,

(5.57)   $\langle n(E)\rangle = 2^N P(E) \propto \exp\Big(N\log 2 - \dfrac{E^2}{N J^2}\Big)$

This expression shows that there are exponentially many energy levels for $|E| < NJ\sqrt{\log 2} \equiv E_0$ but none in the range |E| > E_0 in the limit N → ∞. The entropy for |E| < E_0 is

(5.58)   $S(E) = N\log 2 - \dfrac{E^2}{N J^2}$

We then have, from dS/dE = 1/T,

(5.59)   $E = -\dfrac{N J^2}{2T}$

The free energy is therefore

(5.60)   $F = \begin{cases} -NT\log 2 - \dfrac{N J^2}{4T} & (T > T_c) \\[4pt] -NJ\sqrt{\log 2} & (T < T_c) \end{cases}$

where $T_c/J = (2\sqrt{\log 2})^{-1}$. Equation (5.60) indicates that there is a phase transition at T = T_c and that the system freezes completely (S = 0) below the transition temperature.
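The freezing scenario can be checked numerically by drawing 2^N independent Gaussian energy levels with the variance NJ²/2 implied by (5.54) and comparing the lowest level with E_0 = −NJ√(log 2) (a rough sketch; finite-N corrections are visible at these sizes):

```python
import numpy as np

rng = np.random.default_rng(1)
N, J = 20, 1.0

# 2^N independent levels with variance N J^2 / 2, as in (5.54) with j0 = 0
E = rng.normal(0.0, np.sqrt(N * J**2 / 2), size=2**N)

print(E.min() / N)                  # close to -J*sqrt(log 2) = -0.8326...
print(-J * np.sqrt(np.log(2)))
```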

It is instructive to rederive the results of the previous subsection by the replica method. We first discuss the case j_0 = 0. It is quite reasonable to expect no RSB in the paramagnetic (P) phase at high temperature. We thus set $q = \hat q = m = \hat m = 0$ in (5.36) to obtain

(5.61)   $f = -\dfrac{\beta J^2}{4} - T\log 2$

This agrees with (5.60) for T > T_c.

It is necessary to introduce RSB in the spin glass (SG) phase. We try the 1RSB and confirm that the result agrees with that of the previous subsection. For the 1RSB to be non-trivial (i.e. different from the RS), it is required that q_0 < q_1 ≤ 1 and $\hat q_0 < \hat q_1$. Then, if q_1 < 1, we find $\hat q_0 = \hat q_1 = 0$ in the limit r → ∞ from (5.46). We therefore have q_1 = 1, and $\hat q_1 = \beta^2 J^2 r/2$ from (5.46). Then, in (5.48), we find $G_1 \approx J\sqrt{r/2}\,v$ for r ≫ 1, and the v-integral in the numerator vanishes, leading to q_0 = 0. Hence $\hat q_0 = 0$ from (5.46). From these results, the free energy (5.45) in the limit r → ∞ is

(5.62)   $\beta f = -\dfrac{\beta^2 J^2 x}{4} - \dfrac{\log 2}{x}$

Variation with respect to x gives

(5.63)   $\dfrac{\beta^2 J^2 x^2}{4} = \log 2, \qquad x = \dfrac{2T\sqrt{\log 2}}{J}$

The highest temperature satisfying this equation is the following one for x = 1:

(5.64)   $\dfrac{T_c}{J} = \dfrac{1}{2\sqrt{\log 2}}$

and therefore, for T < T_c,

(5.65)   $f = -J\sqrt{\log 2}$

This agrees with (5.60), which confirms that the 1RSB is exact in the temperature range T < T_c. It is also possible to show by explicit k-step RSB calculations that the solution reduces to the 1RSB for any k (≥ 1) (Gross and Mézard 1984).

It is easy to confirm from (5.63) that x < 1 for T < T_c; generally, x = T/T_c. The order parameter function q(x) is equal to 1 (= q_1) above x (= T/T_c) and 0 (= q_0) below x (see Fig. 5.3).

Fig. 5.3. Spin glass order parameter of the REM below the transition temperature

For j_0 exceeding some critical value, a ferromagnetic (F) phase exists. It is easy to confirm that the solution of (5.47)–(5.49) in the limit r → ∞ is q_0 = q_1 = m = 1 when j_0 > 0, m > 0. The ferromagnetic phase is therefore replica symmetric since q_0 = q_1. The free energy (5.36) is then

(5.66)   $f = -j_0$

The phase boundaries between these three phases are obtained by comparison of the free energies:

1. P–SG transition: $T_c/J = (2\sqrt{\log 2})^{-1}$

2. SG–F transition: $(j_0)_c/J = \sqrt{\log 2}$

3. P–F transition: $j_0 = \dfrac{J^2}{4T} + T\log 2$

The final phase diagram is depicted in Fig. 5.4.

Fig. 5.4. Phase diagram of the REM. The Nishimori line is shown dashed.

Let us now turn to the interpretation of the above results in terms of error-correcting codes. The overlap M is one in the ferromagnetic phase ($j_0/J > \sqrt{\log 2}$) because the spin alignment is perfect (m = 1), implying error-free decoding. To see the relation of this result to the Shannon bound (5.3), we first note that the transmission rate of information by the Sourlas code is

(5.67)   $R = \dfrac{N}{N_B} = N \Big/ \binom{N}{r}$

As shown in Appendix C, the capacity of the Gaussian channel is

(5.68)   $C = \dfrac{1}{2}\log_2\Big(1 + \dfrac{J_0^2}{J^2}\Big)$

Here we substitute $J_0 = j_0\, r!/N^{r-1}$ and $J^2 \to J^2\, r!/(2N^{r-1})$ according to (5.26) and take the limit N ≫ 1 with r fixed to find

(5.69)   $C \approx \dfrac{j_0^2\, r!}{J^2 N^{r-1}\,\log 2}$

The transmission rate (5.67), on the other hand, reduces in the same limit to

(5.70)   $R \approx \dfrac{r!}{N^{r-1}}$

It has thus been established that the transmission rate R coincides with the channel capacity C at the lower limit of the ferromagnetic phase, $j_0/J = \sqrt{\log 2}$. In the context of error-correcting codes, j_0 represents the signal amplitude and J the amplitude of the noise, and hence j_0/J corresponds to the S/N ratio. The conclusion is that the Sourlas code in the limit r → ∞, equivalent to the REM, is capable of error-free decoding (m = 1, M = 1) for S/N ratios exceeding some critical value, and the Shannon bound is achieved at this critical value.

The general inequality (5.21) is of course satisfied. Both sides vanish if j_0 < (j_0)_c. For j_0 > (j_0)_c, the right hand side is one, while the left hand side is zero in the paramagnetic phase and one in the ferromagnetic phase. In other words, the Sourlas code in the limit r → ∞ makes it possible to transmit information without errors under the MAP as well as under the MPM. An important point is that the information transmission rate R is vanishingly small, impeding the practical usefulness of this code.

The Nishimori line βJ² = J_0 is in the present case T/J = J/(2j_0) and passes through the point $j_0/J = \sqrt{\log 2}$, $T/J = 1/(2\sqrt{\log 2})$, where the three phases (P, SG, F) coexist. The exact energy on it, E = −j_0, derived from the gauge theory agrees with the answer (5.66) above. One should remember here that the free energy coincides with the energy because the entropy vanishes.

It is necessary to solve the equations of state numerically for general finite r. The result for the case of r = 3 is shown here as an example (Nishimori and Wong 1999; Gillin et al. 2001). If j_0 is close to zero, one finds a 1RSB SG solution with q_1 > 0 and q_0 = m = 0 below T = 0.651J. As the temperature is lowered, the stability condition of the 1RSB (5.52) breaks down at T = 0.240J, and the full RSB takes over.

The ferromagnetic phase is RS (5.39) in the high-temperature range but the RSB should be taken into account below the AT line (3.22); a mixed (M) phase with both ferromagnetic order and RSB exists at low temperatures. The ferromagnetic phase, with RS and/or RSB, continues to exist beyond the limit of thermodynamic stability as a metastable state (i.e. as a local minimum of the free energy). Figure 5.5 summarizes the result.

Fig. 5.5. Phase diagram of the model with r = 3. The double dotted line indicates the limit of metastability (spinodal) of the ferromagnetic phase. Error correction is possible to the right of this boundary. Thermodynamic phase boundaries are drawn in full lines. The Nishimori line is drawn dashed.

The dependence of the overlap M(β) on T (= 1/β) is depicted in Fig. 5.6, where j_0/J is fixed to 0.77, close to the boundary of the ferromagnetic phase. The overlap M(β) is a maximum at the optimal temperature T/J = J/(2j_0) = 0.649 appearing on the right hand side of (5.24), corresponding to the Nishimori line (the dot in Fig. 5.6). The ferromagnetic phase disappears above T/J = 0.95 even as a metastable state, and one has M = 0: the paramagnetic phase has ⟨σ_i⟩_β = 0 at each site i, and sgn ⟨σ_i⟩_β cannot be defined. It is impossible to decode the message there. For temperatures below T/J = 0.43, RSB is observed (the dashed part). We have thus clarified in this example how much the decoded message agrees with the original one as the temperature is changed around the optimal value.

Fig. 5.6. Overlap for r = 3, j_0 = 0.77

The Sourlas code saturates the Shannon bound asymptotically with a vanishing transmission rate. A mean-field model with finite connectivity has the more desirable property that the rate is finite and yet the Shannon bound is achieved. We state some of the important results about this model in the present section. We refer the reader to the original papers cited in the text for details of the calculations.

The starting point is analogous to the Sourlas code described by the Hamiltonian (5.25) but with diluted binary interactions for the BSC,

(5.71)   $H = -\sum A_{i_1\dots i_r}\, J_{i_1\dots i_r}\, \sigma_{i_1}\cdots\sigma_{i_r} - \dfrac{F}{\beta}\sum_i \sigma_i$

where the elements of the symmetric tensor $A_{i_1\dots i_r}$ (representing the dilution) are either zero or one depending on the set of indices (i_1, i_2, …, i_r). The final term has been added in preparation for biased messages, in which 1 may appear more frequently than −1 (or vice versa). The connectivity is c; there are c non-zero elements chosen randomly for any given site index i:

(5.72)   $\sum_{i_2<\dots<i_r} A_{i\,i_2\dots i_r} = c \qquad (\forall i)$

The code rate is R = r/c: each site index i appears in c interaction bits, and each interaction bit involves r sites, so the encoded message carries N_B = Nc/r bits for the N original bits.

Using the methods developed for diluted spin glasses (Wong and Sherrington 1988), one can calculate the free energy under the RS ansatz as (Kabashima and Saad 1998, 1999; Vicente et al. 1999)

(5.73)

Here [⋯]_J and [⋯]_ξ denote the configurational averages over the distributions of J and ξ, respectively. The order functions π(x) and $\hat\pi(y)$ represent the distributions of the multi-replica spin overlap and its conjugate:

(5.74)

where a and â are normalization constants, and l is the number of replica indices on the left hand side. Extremization of the free energy gives paramagnetic and ferromagnetic solutions for the order functions. The spin glass solution should be treated with more care under the 1RSB scheme. The result becomes relatively simple in the limit where r and c tend to infinity with the ratio R = r/c (= 1/α) kept finite and F = 0:

(5.75)

where p is the noise probability of the BSC, and T_g is determined by the condition of vanishing paramagnetic entropy, $\alpha(\log\cosh\beta_g - \beta_g\tanh\beta_g) + \log 2 = 0$. The ferromagnetic and 1RSB-SG phases are completely frozen (vanishing entropy). The finite-temperature phase diagram for a given R is depicted in Fig. 5.7. Perfect decoding (m = M = 1) is possible in the ferromagnetic phase, which extends up to the limit p_c. It can be verified by equating f_F and f_1RSB-SG that the Shannon bound is achieved at p_c,

(5.76)   $R = 1 + p_c \log_2 p_c + (1 - p_c)\log_2(1 - p_c)$
Fig. 5.7. Finite-temperature phase diagram of the unbiased diluted Sourlas code with R = 1/4 in the limit r, c → ∞ (Vicente et al. 1999). The Shannon bound is achieved at p_c. The ferromagnetic phase is stable at least above the dashed line. (Copyright 1999 by the American Physical Society)

The ferromagnetic solution appears as a metastable state at a very high temperature of O(r/log r), but the thermodynamic transition takes place at a T of O(1). This suggests that there exists a high energy barrier between the ferromagnetic and paramagnetic solutions. Consequently, it might be difficult to reach the correctly decoded (i.e. ferromagnetic) state starting from an arbitrary initial condition (which is almost surely a paramagnetic state) by some decoding algorithm. We are therefore led to consider moderate r, in which case the ferromagnetic phase has a larger basin of attraction, although we have to sacrifice the final quality of the decoded result (magnetization smaller than unity). In Fig. 5.8 the ground-state magnetization (overlap) is shown as a function of the noise probability for various finite values of r (written as K in the figure) in the case of R = 1/2. The transition is of first order except for r = 2. It can be seen that the decoded result is very good (m close to one) for moderate values of r and p.

Fig. 5.8. Ground-state magnetization as a function of the noise probability for various r (written as K in the figure) (Vicente et al. 1999). The rate R is 1/2 and F = 0. Also shown by open circles are the numerical results from the TAP-like decoding algorithm. (Copyright 1999 by the American Physical Society)

It is useful to devise a practical algorithm of decoding, given the channel output {Jμ}, where μ denotes an appropriate combination of site indices. The following method based on an iterative solution of TAP-like equations is a powerful tool for this purpose (Kabashima and Saad 2001; Saad et al. 2001) since its computational requirement is only of O(N). For a given site i and an interaction μ that includes i, one considers a set of conditional probabilities

(5.77)

where ν also includes i. Under an approximate mean-field-like decoupling of the conditional probabilities, one obtains the following set of equations for $m_{\mu i}$ and $\hat m_{\mu i}$:

(5.78)   $\hat m_{\mu i} = \tanh(\beta J_\mu) \prod_{j\in\mathcal{L}(\mu)\setminus i} m_{\mu j}, \qquad m_{\mu i} = \tanh\Big(\sum_{\nu\in\mathcal{M}(i)\setminus\mu} \tanh^{-1}\hat m_{\nu i}\Big)$

where $\mathcal{M}(i)$ is the set of interactions that include i, and $\mathcal{L}(\mu)$ is the set of sites connected by J_μ. After iteratively solving these equations for $m_{\mu i}$ and $\hat m_{\mu i}$, one determines the final decoded result for the ith bit as sgn(m_i), where

(5.79)   $m_i = \tanh\Big(\sum_{\nu\in\mathcal{M}(i)} \tanh^{-1}\hat m_{\nu i}\Big)$

This method is equivalent to the technique of belief propagation used in information theory. It is also called a TAP approach in the statistical mechanics literature owing to its similarity to the TAP equations, in the sense that $m_{\mu i}$ and $\hat m_{\mu i}$ reflect the effects of the removal of a bond μ from the system.
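A compact implementation of the iteration (5.78)–(5.79) is given below. It is a sketch under our reconstruction of the equations (with no field term, F = 0): couplings are (sites, J) pairs, and messages are initialized weakly at random because, for the pure r ≥ 3 code, the uninformative paramagnetic fixed point has a large basin of attraction, as discussed in the text.

```python
import numpy as np
from collections import defaultdict

def bp_decode(couplings, N, beta, n_iter=50, seed=0):
    """TAP/BP-style decoding of a dilute Sourlas-type code.
    couplings: list of (sites, J) with `sites` a tuple of site indices."""
    rng = np.random.default_rng(seed)
    M_of = defaultdict(list)                    # interactions touching site i
    for mu, (sites, _) in enumerate(couplings):
        for i in sites:
            M_of[i].append(mu)
    # cavity magnetizations m_{mu i}, weakly random start
    m = {(mu, i): rng.uniform(-0.1, 0.1)
         for mu, (sites, _) in enumerate(couplings) for i in sites}
    m_hat = {}
    for _ in range(n_iter):
        for mu, (sites, J) in enumerate(couplings):         # update m_hat
            for i in sites:
                prod = np.prod([m[(mu, j)] for j in sites if j != i])
                m_hat[(mu, i)] = np.tanh(beta * J) * prod
        for mu, (sites, _) in enumerate(couplings):         # update m
            for i in sites:
                h = sum(np.arctanh(m_hat[(nu, i)])
                        for nu in M_of[i] if nu != mu)
                m[(mu, i)] = np.tanh(h)
    mags = [np.tanh(sum(np.arctanh(m_hat[(nu, i)]) for nu in M_of[i]))
            for i in range(N)]
    return np.sign(mags).astype(int)
```

Each sweep costs a constant amount of work per message, i.e. O(N) in total for fixed connectivity, which is the source of the O(N) computational requirement mentioned above.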

The resulting numerical data are shown in Fig. 5.8. One can see satisfactory agreement with the replica solution. It also turns out that the basin of attraction of the ferromagnetic solution is very large for r = 2 but not for r ≥ 3.

Statistical-mechanical analysis is applicable also to other codes that are actively investigated from the viewpoint of information theory. We explain the low-density parity-check code (LDPC) here because of its formal similarity to the diluted Sourlas code treated in the previous subsection (Kabashima et al. 2000a; Murayama et al. 2000). In statistical mechanics terms, the LDPC is a diluted many-body Mattis model in an external field.

Let us start the argument with the definition of the code in terms of a Boolean representation (0 and 1, instead of ±1). The original message of length N is denoted by an N-dimensional Boolean vector ξ and the encoded message of length M by z⁰. The latter is generated from the former using two sparse matrices C_s and C_n according to the following modulo-2 operation on Boolean numbers:

(5.80)   $z^0 = C_n^{-1} C_s\, \xi \pmod 2$

The matrix C_s has size M × N, with a fixed small number of ones per row and per column placed randomly, and C_n is a similarly sparse, invertible matrix of size M × M. The channel noise ζ is added to z⁰, and the output is

(5.81)   $z = z^0 + \zeta \pmod 2$

Decoding is carried out by multiplying z by C_n:

(5.82)   $C_n z = C_s \xi + C_n \zeta \pmod 2$

One finds the most probable solution of this equation for the decoded message σ and the inferred noise τ

(5.83)   $C_s \sigma + C_n \tau = C_n z \pmod 2$
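The modulo-2 linear algebra behind (5.80)–(5.83) can be made concrete with a small GF(2) solver. The matrices below are small and dense for brevity (real LDPC matrices are large and sparse); C_n is taken unit upper triangular so that it is guaranteed invertible.

```python
import numpy as np

def gf2_solve(A, b):
    """Solve A x = b over GF(2) by Gaussian elimination (A square, invertible)."""
    A, b = A.copy() % 2, b.copy() % 2
    n = len(b)
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r, col])
        A[[col, piv]], b[[col, piv]] = A[[piv, col]], b[[piv, col]]
        for r in range(n):
            if r != col and A[r, col]:
                A[r] = (A[r] + A[col]) % 2
                b[r] = (b[r] + b[col]) % 2
    return b

rng = np.random.default_rng(2)
N = M = 6
Cs = rng.integers(0, 2, size=(M, N))
Cn = np.eye(M, dtype=int)
Cn[np.triu_indices(M, 1)] = rng.integers(0, 2, size=M * (M - 1) // 2)

xi = rng.integers(0, 2, size=N)
z0 = gf2_solve(Cn, Cs @ xi)                 # (5.80): z0 = Cn^{-1} Cs xi mod 2
zeta = (rng.random(M) < 0.1).astype(int)
z = (z0 + zeta) % 2                         # (5.81): channel output
# (5.82): the decoder's starting relation holds identically
assert np.array_equal(Cn @ z % 2, (Cs @ xi + Cn @ zeta) % 2)
```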

The Ising spin representation corresponding to the above prescription of the LDPC, in particular (5.83), is

(5.84)   $\prod_{j\in L_s(\mu)} \sigma_j \prod_{j\in L_n(\mu)} \tau_j = \prod_{j\in L_s(\mu)} \xi_j \prod_{j\in L_n(\mu)} \zeta_j \;(\equiv J_\mu)$

where L_s(μ) is the set of indices of the non-zero elements in the μth row of C_s, and similarly for L_n(μ). Note that σ, τ, ξ, and ζ are all Ising variables (±1) from now on. The Hamiltonian reflects the constraint (5.84) as well as the bias in the original message and the channel noise:

(5.85)   $H = -\sum_\mu A_{i_1\dots i_r;\, j_1\dots j_l}\, J_{i_1\dots i_r;\, j_1\dots j_l}\, \sigma_{i_1}\cdots\sigma_{i_r}\, \tau_{j_1}\cdots\tau_{j_l} - \dfrac{F_s}{\beta}\sum_i \sigma_i - \dfrac{F_n}{\beta}\sum_j \tau_j$

Here A is a sparse tensor choosing the appropriate combinations of indices corresponding to C_s and C_n (or L_s(μ) and L_n(μ)), F_s is the bias of the original message, and $F_n = \frac{1}{2}\log\frac{1-p}{p}$ comes from the channel noise of rate p. The interaction $J_{i_1\dots i_r;\, j_1\dots j_l}$ is specified by the expressions involving ξ and ζ in (5.84). The problem is to find the ground state of this Hamiltonian satisfying (5.84), given the channel output {J_μ} defined in (5.84).

The replica analysis of the present system works similarly to the diluted Sourlas code. The resulting RS free energy at T = 0 is

(5.86)

The order functions π(x) and $\hat\pi(\hat x)$ denote the distributions of the multi-replica overlaps and their conjugates for the σ-spins, and ρ(y) and $\hat\rho(\hat y)$ are those for the τ-spins:

(5.87)

Extremization of the free energy (5.86) with respect to these order functions yields ferromagnetic and paramagnetic solutions. Since the interactions in the Hamiltonian (5.85) are of Mattis type without frustration, there is no spin glass phase. When r ≥ 3 or l ≥ 3, r > 1, the free energy for an unbiased message (F_s = 0) is

(5.88)

The spin alignment is perfect (m = 1) in the ferromagnetic phase. The magnetization as a function of the channel-error probability p is shown in Fig. 5.9(a). The ferromagnetic state has the lower free energy below p_c, which coincides with the Shannon bound, as can be verified by equating f_F and f_P. The paramagnetic solution loses its significance below p_c because its entropy is negative in this region. A serious drawback is that the basin of attraction of the ferromagnetic state is quite small in the present case.

Fig. 5.9. Magnetization as a function of the channel-error probability in the LDPC (Murayama et al. 2000). Bold lines represent stable states. (a) r ≥ 3 or l ≥ 3, r > 1. (b) r = l = 2. (c) r = 1. (Copyright 2000 by the American Physical Society)

If r = l = 2, the magnetization behaves as in Fig. 5.9(b). The perfect ferromagnetic state and its reversal are the only solutions below a threshold p_s. Any initial state converges to this perfect state under an appropriate decoding algorithm. Thus the code with r = l = 2 is quite useful practically, although the threshold p_s lies below the Shannon bound.

The system with single-body interactions, r = 1, has the magnetization shown in Fig. 5.9(c). Again, the Shannon bound is not saturated, but the perfect ferromagnetic state is the only solution below p_s. An advantage of the present case is that there is no mirror image (m = −1).

Iterative solutions using TAP-like equations work also in the LDPC as a rapidly converging tool for decoding (Kabashima and Saad 2001; Saad et al. 2001). These equations have forms similar to (5.78) but with two types of parameters, one for the σ-spins and the other for τ. Iterative numerical solutions of these equations for given dilute matrices C_s and C_n show excellent agreement with the replica predictions.

The LDPC is also useful in public-key cryptography (Kabashima et al. 2000b). The N-dimensional Boolean plaintext ξ is encrypted to an M-dimensional ciphertext z by the public key $G = C_n^{-1} C_s D$ (where D is an arbitrary invertible dense matrix of size N × N) and the noise ζ with probability p, according to (5.81):

(5.89)   $z = G\xi + \zeta \pmod 2$

Only the authorized user has the knowledge of C_n, C_s, and D separately, not just the product G. The authorized user then carries out the process of decryption, equivalent to the decoding of the LDPC, to infer Dξ and consequently the original plaintext ξ. This user succeeds if r = l = 2 and p < p_s, as was discussed in the previous subsection.

The task of decomposing G into C_n, C_s, and D is NP complete and is very difficult for an unauthorized user, who is therefore forced to find the ground state of the Hamiltonian that is the Ising spin representation of (5.89):

(5.90)

where G is a dense tensor with elements 1 or 0 corresponding to the matrix G, and ϑ is either 1 or −1 according to the noise added as ζ in the Boolean representation (5.89). Thus the system is frustrated. For large N, the number r′ in the above Hamiltonian and c′ (the connectivity of the system described by (5.90)) tend to infinity (but remain smaller than N itself) with the ratio c′/r′ kept finite. The problem is thus equivalent to the Sourlas-type code in the same limit. We know, as mentioned in §5.6.1, that the basin of attraction of the correctly decrypted state in such a system is very narrow. Therefore the unauthorized user almost surely fails to decrypt.

This system of cryptography has the advantage that it allows for relatively high values of p, and thus an increased tolerance against noise in comparison with existing systems. The computational requirement for decryption is of O(N), which is much better than some of the commonly used methods.

The convolutional code corresponds to a one-dimensional spin glass and plays important roles in practical applications. It also has direct relevance to the turbo code, to be elucidated in the next section, which is rapidly becoming a standard in practice owing to its high capability of error correction. We explain the convolutional code and its decoding from a statistical-mechanical point of view following Montanari and Sourlas (2000).

In a convolutional code, one first transforms the original message sequence ξ = {ξ_1, …, ξ_N} (ξ_i = ±1, ∀i) into a register sequence τ = {τ_1(ξ), …, τ_N(ξ)} (τ_i = ±1, ∀i). In the non-recursive convolutional code, the register sequence coincides with the message sequence (τ_i = ξ_i, ∀i), but this is not the case in the recursive convolutional code to be explained later in §5.7.3. To encode the message, one prepares r registers, the state of which at time t is described by $\ell_1(t), \ell_2(t), \dots, \ell_r(t)$. The number r is called the memory order of the code. The register sequence τ is fed into the registers sequentially (shift register):

(5.91)   $\ell_j(t) = \tau_{t-j} \qquad (j = 1, \dots, r)$

The encoder thus carries the information of (r + 1) bits τ_t, τ_{t−1}, …, τ_{t−r} at any moment t.

We restrict ourselves to the convolutional code with rate R = 1/2 for simplicity. Code words $J = \{J_1^{(1)}, \dots, J_N^{(1)};\ J_1^{(2)}, \dots, J_N^{(2)}\}$ are generated from the register bits by the rule

(5.92)   $J_i^{(\alpha)} = \prod_{j=0}^{r} \big(\tau_{i-j}\big)^{k(j;\alpha)}$

Here, α = 1 or 2, and we define τ_j = 1 for j ≤ 0. The exponent k(j; α) is either 0 or 1 and characterizes a specific code. We define k(0; 1) = k(0; 2) = 1 to remove ambiguities in the code construction. Two simple examples will be used frequently to illustrate the idea:

1.

k(0; 1) = k(1; 1) = 1, and the other k(j; 1) = 0; k(0; 2) = 1, and the other k(j; 2) = 0. The memory order is r = 1. The code words are $J_i^{(1)} = \tau_i \tau_{i-1}$ and $J_i^{(2)} = \tau_i$. The corresponding spin Hamiltonian is

(5.93)   $H = -\sum_{i=1}^{N} \tilde J_i^{(1)}\, \sigma_i \sigma_{i-1} - \sum_{i=1}^{N} \tilde J_i^{(2)}\, \sigma_i$

where $\tilde J_i^{(\alpha)}$ is the noisy version of $J_i^{(\alpha)}$ and σ_i is the dynamical variable used for decoding. This is a one-dimensional spin system with random interactions and random fields.

2.

k(0; 1) = k(1; 1) = k(2; 1) = 1, and the other k(j; 1) = 0; k(0; 2) = k(2; 2) = 1, and the other k(j; 2) = 0. The memory order is r = 2 and the code words are $J_i^{(1)} = \tau_i \tau_{i-1} \tau_{i-2}$ and $J_i^{(2)} = \tau_i \tau_{i-2}$. There are three-body and two-body interactions in the corresponding spin system

(5.94)   $H = -\sum_{i=1}^{N} \tilde J_i^{(1)}\, \sigma_i \sigma_{i-1} \sigma_{i-2} - \sum_{i=1}^{N} \tilde J_i^{(2)}\, \sigma_i \sigma_{i-2}$

which can be regarded as a system of ladder-like structures shown in Fig. 5.10. A diagrammatic representation of the encoder is depicted in Fig. 5.11.
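The shift-register rule (5.92) for both examples can be written compactly; the following sketch (our own minimal implementation, not from the text) produces the two code-word streams from a register sequence.

```python
import numpy as np

def conv_encode(tau, k1, k2):
    """Rate-1/2 non-recursive encoder (5.92): J_i^(a) is the product of the
    register bits tau_{i-j} with k(j; a) = 1, and tau_j = +1 for j <= 0."""
    t = lambda i: tau[i - 1] if i >= 1 else 1    # tau_i, 1-based with boundary
    N = len(tau)
    J1 = [int(np.prod([t(i - j) for j, k in enumerate(k1) if k]))
          for i in range(1, N + 1)]
    J2 = [int(np.prod([t(i - j) for j, k in enumerate(k2) if k]))
          for i in range(1, N + 1)]
    return J1, J2

tau = [1, -1, -1, 1, -1]
# example 2: k(.;1) = (1,1,1), k(.;2) = (1,0,1), memory order r = 2
J1, J2 = conv_encode(tau, k1=(1, 1, 1), k2=(1, 0, 1))
print(J1)   # J_i^(1) = tau_i tau_{i-1} tau_{i-2}
print(J2)   # J_i^(2) = tau_i tau_{i-2}
```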

Fig. 5.10. A convolutional code with code rate 1/2 (example 2 in the text) expressed as a spin system. Interactions exist among the three spins around each triangle and between two horizontally neighbouring spins. Two up spins are located at i = −1 and i = 0 to fix the initial condition.

Fig. 5.11. Encoder corresponding to the code of Fig. 5.10. $J_t^{(1)}$ is formed from three consecutive register bits and $J_t^{(2)}$ from two bits.

An exposition of the encoding procedure in terms of the Boolean (0 or 1) representation instead of the binary (±1) representation is useful for introducing the recursive convolutional code in the next subsection. For this purpose, we express the original message sequence ξ = {ξ_1, …, ξ_N} by its generating polynomial, defined as

(5.95)   $H(x) = \sum_j H_j\, x^j$

where H_j (= 0 or 1) is the Boolean form of ξ_j: $\xi_j = (-1)^{H_j}$. Similarly, the generating polynomial for the register sequence τ is

(5.96)   $G(x) = \sum_j G_j\, x^j, \qquad \tau_j = (-1)^{G_j}$

The non-recursive convolutional code has G(x) = H(x), but this is not the case for the recursive code to be explained in the next subsection. The code word J^{(α)} (α = 1, 2) is written as

(5.97)   $L^{(\alpha)}(x) = \sum_j L_j^{(\alpha)}\, x^j$

with $J_j^{(\alpha)} = (-1)^{L_j^{(\alpha)}}$. The relation between L^{(α)}(x) and G(x) is determined by (5.92) and is described by another polynomial

(5.98)   $g_\alpha(x) = \sum_{j=0}^{r} k(j;\alpha)\, x^j$

as

(5.99)   $L^{(\alpha)}(x) = g_\alpha(x)\, G(x) \pmod 2$

or equivalently

(5.100)   $L_j^{(\alpha)} = \sum_l k(l;\alpha)\, G_{j-l} \pmod 2$

The right hand side is the convolution of k and G, from which the name of convolutional code comes.

Examples 1 and 2 of §5.7.1 have the generating polynomials (1) $g_1(x) = 1 + x$ and $g_2(x) = 1$, and (2) $g_1(x) = 1 + x + x^2$ and $g_2(x) = 1 + x^2$.

The relation between the source and register sequences ξ and τ (or H(x) and G(x)) is not simple in the recursive convolutional code. The register sequence of the recursive convolutional code is defined by the generating polynomial as

(5.101)   $G(x) = \dfrac{H(x)}{g_1(x)} \pmod 2$

Code words satisfy $L^{(\alpha)}(x) = g_\alpha(x)\, G(x)$ and therefore we have

(5.102)   $L^{(1)}(x) = H(x), \qquad L^{(2)}(x) = \dfrac{g_2(x)}{g_1(x)}\, H(x) \pmod 2$

The first relation means J^{(1)} = ξ in the binary representation.

Fig. 5.12. Encoder of the recursive convolutional code, to be compared with the non-recursive case of Fig. 5.11.

The relation (5.101) between the source and register sequences can be written in terms of the binary representation as follows. Equation (5.101) is equivalent to $g_1(x)\, G(x) = H(x)$ because G(x) = −G(x) and H(x) = −H(x) (mod 2). The coefficient of $x^i$ in this relation is, if we recall k(0; 1) = 1, $G_i = H_i + \sum_{j=1}^{r} k(j; 1)\, G_{i-j}$, which has the binary representation

(5.103)   $\tau_i = \xi_i \prod_{j=1}^{r} \big(\tau_{i-j}\big)^{k(j;1)}$

This equation allows us to determine τ_i recursively; that is, τ_i is determined once we know τ_1, …, τ_{i−1}. From the definition $L^{(\alpha)}(x) = g_\alpha(x)\, G(x)$, the code words are expressed in terms of the register sequence in the same form as in the case of the non-recursive convolutional code:

(5.104)   $J_i^{(\alpha)} = \prod_{j=0}^{r} \big(\tau_{i-j}\big)^{k(j;\alpha)}$
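The recursion (5.103) and the consistency of its inversion are easy to exercise in code (a sketch; the register convention τ_j = 1 for j ≤ 0 follows the text):

```python
def recursive_registers(xi, k1):
    """Recursion (5.103): tau_i = xi_i * prod_{j>=1} tau_{i-j}^{k(j;1)},
    with tau_j = +1 for j <= 0; k1[0] must equal 1."""
    tau = []
    for i, x in enumerate(xi):
        t = x
        for j in range(1, len(k1)):
            if k1[j]:
                t *= tau[i - j] if i - j >= 0 else 1
        tau.append(t)
    return tau

xi = [1, -1, -1, 1, -1]
tau = recursive_registers(xi, k1=(1, 1, 1))   # example 2: g1(x) = 1 + x + x^2
# inverting (5.103): xi_i = tau_i * tau_{i-1} * tau_{i-2}
t = lambda j: tau[j] if j >= 0 else 1
for i, x in enumerate(xi):
    assert x == tau[i] * t(i - 1) * t(i - 2)
```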

The encoder for the code of example 2 of §5.7.1 is shown in Fig. 5.12. Decoding is carried out under the Hamiltonian

(5.105)   $H = -\sum_{\alpha=1,2} \sum_{i} \tilde J_i^{(\alpha)} \prod_{j=0}^{r} \big(\sigma_{i-j}\big)^{k(j;\alpha)}$

where $\tilde J_i^{(\alpha)}$ is the noisy version of the code word $J_i^{(\alpha)}$. According to (5.103), the ith bit is inferred at the inverse temperature β as

(5.106)   $\hat\xi_i = \mathrm{sgn}\,\Big\langle \prod_{j=0}^{r} \big(\sigma_{i-j}\big)^{k(j;1)} \Big\rangle_\beta$

which is to be contrasted with the non-recursive case

(5.107)   $\hat\xi_i = \mathrm{sgn}\,\langle \sigma_i \rangle_\beta$

The turbo code is a powerful coding/decoding technique in frequent use recently. It has near-optimal performance (i.e. the transmission rate can be made close to the Shannon bound under the error-free condition), which is exceptional in a practicable code. We explain its statistical-mechanical formulation and some of the results (Montanari and Sourlas 2000; Montanari 2000).

The turbo code is a variant of the recursive convolutional code with the source message sequence ξ = {ξ_1, …, ξ_N} and the permuted sequence ξ^P = {ξ_{P(1)}, …, ξ_{P(N)}} both used as inputs to the encoder. The permutation P operates on the set {1, 2, …, N} and is fixed arbitrarily for the moment. Correspondingly, two register sequences are generated according to the prescription of the recursive convolutional code (5.103):

\[ \tau_i^{(1)} = \xi_i \prod_{j=1}^{r} \left( \tau_{i-j}^{(1)} \right)^{k(j;1)} \qquad (5.108) \]
\[ \tau_i^{(2)} = \xi_{P(i)} \prod_{j=1}^{r} \left( \tau_{i-j}^{(2)} \right)^{k(j;1)} \qquad (5.109) \]

or, equivalently,

\[ \xi_i = \prod_{j=0}^{r} \left( \tau_{i-j}^{(1)} \right)^{k(j;1)} \equiv \epsilon_i(\tau^{(1)}) \qquad (5.110) \]
\[ \xi_{P(i)} = \prod_{j=0}^{r} \left( \tau_{i-j}^{(2)} \right)^{k(j;1)} \equiv \epsilon_i(\tau^{(2)}) \qquad (5.111) \]

The code word is composed of three sequences, and the rate is R = 1/3:

\[ J_i^{(0)} = \xi_i, \qquad J_i^{(1)} = \prod_{j=0}^{r} \left( \tau_{i-j}^{(1)} \right)^{k(j;2)}, \qquad J_i^{(2)} = \prod_{j=0}^{r} \left( \tau_{i-j}^{(2)} \right)^{k(j;2)} \qquad (5.112) \]
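Reusing encode_recursive from the sketch above, the turbo encoder of (5.108)−(5.112) is then only a few lines (the permutation P is drawn at random here, anticipating the discussion below; names remain illustrative):

```python
import random

def turbo_encode(xi, k1, k2, P):
    """Code word (J0, J1, J2) of (5.112): systematic bits plus one parity
    stream from each chain; chain 2 encodes the permuted message."""
    xi_P = [xi[P[i]] for i in range(len(xi))]
    _, parity1 = encode_recursive(xi, k1, k2)     # J^(1) from chain 1
    _, parity2 = encode_recursive(xi_P, k1, k2)   # J^(2) from chain 2
    return xi, parity1, parity2                   # J^(0) = xi; rate R = 1/3

N = 8
xi = [random.choice([-1, 1]) for _ in range(N)]
P = list(range(N))
random.shuffle(P)                                 # random permutation P
J0, J1, J2 = turbo_encode(xi, [1, 1, 1], [1, 0, 1], P)
```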

The posterior to be used at the receiving end of the channel has the following expression:

\[ P(\sigma^{(1)}, \sigma^{(2)} \mid \tilde{J}) \propto e^{-\beta H(\sigma^{(1)}, \sigma^{(2)})} \qquad (5.113) \]

where the Hamiltonian is, corresponding to (5.112),

\[ H = -\sum_{i=1}^{N} \left\{ \tilde{J}_i^{(0)} \epsilon_i(\sigma^{(1)}) + \tilde{J}_i^{(1)} \prod_{j=0}^{r} \left( \sigma_{i-j}^{(1)} \right)^{k(j;2)} + \tilde{J}_i^{(2)} \prod_{j=0}^{r} \left( \sigma_{i-j}^{(2)} \right)^{k(j;2)} \right\} \qquad (5.114) \]

The interactions J̃_i^{(0)}, J̃_i^{(1)}, and J̃_i^{(2)} are the noisy versions of the code words J_i^{(0)}, J_i^{(1)}, and J_i^{(2)}, respectively. The system (5.114) is a one-dimensional spin glass composed of two chains (σ^{(1)} and σ^{(2)}) interacting via the constraint ε_{P(i)}(σ^{(1)}) = ε_i(σ^{(2)}) (∀i). In decoding, one calculates the thermal expectation value of the variable representing the original bit, (5.110), using the posterior (5.113):

\[ \hat{\xi}_i = \operatorname{sgn} \left\langle \epsilon_i(\sigma^{(1)}) \right\rangle_\beta \qquad (5.115) \]

The finite-temperature (MPM) decoding with the appropriate β is used in practice because an efficient TAP-like finite-temperature iterative algorithm exists, as explained briefly later.

To understand the effectiveness of the turbo code intuitively, it is instructive to express the spin variable σ_i^{(1)} in terms of the other set σ^{(2)}. In example 1 of §5.7.1, we have k(0; 1) = k(1; 1) = 1 and therefore, from the constraint ε_{P(i)}(σ^{(1)}) = ε_i(σ^{(2)}), σ_{P(i)}^{(1)} σ_{P(i)−1}^{(1)} = σ_i^{(2)} σ_{i−1}^{(2)}; see (5.110) and (5.111). We thus have σ_i^{(1)} = ∏_{j=1}^{i} σ_j^{(1)} σ_{j−1}^{(1)} with σ_j^{(α)} = 1 for j ≤ 0, and each factor σ_j^{(1)} σ_{j−1}^{(1)} can be replaced by the corresponding pair of σ^{(2)} through the constraint. If i is of O(N) and the permutation P is random, it is very plausible that this final product of the σ^{(2)} is composed of O(N) different σ^{(2)}. This means that the Hamiltonian (5.114) has long-range interactions if expressed only in terms of σ^{(2)}, and the ferromagnetic phase (in the ferromagnetic gauge) is likely to have an enhanced stability compared to simple one-dimensional systems. We may therefore expect that good performance is achieved in a turbo code with random permutation P, which is indeed confirmed to be the case in numerical experiments.

The decoding algorithm of the turbo code is described as follows. One prepares two chains labelled by α = 1, 2 with the Hamiltonian

(5.116)

Then one iteratively solves a set of TAP-like equations for the effective fields Γ_i^{(α)} that represent the effects of the other chain:

(5.117)

(5.118)

where ⟨…⟩^{(α)} is the thermal average with the Hamiltonian H^{(α)}(σ^{(α)}) and k denotes the iteration step. The process (5.116)−(5.118) is an approximation to the full system (5.114), and yet it yields excellent performance numerically.

Detailed statistical-mechanical analysis of the system H^{(1)}(σ^{(1)}) + H^{(2)}(σ^{(2)}) with ε_{P(i)}(σ^{(1)}) = ε_i(σ^{(2)}) has been carried out (Montanari 2000). We describe some of its important results. Let us suppose that the channel is Gaussian and β is adjusted to the optimal value (MPM). The S/N ratio is denoted as 1/w². There exists a phase of error-free decoding (overlap M = 1) that becomes locally unstable in the high-noise region w² > w_c². The numerical values of w_c² are 1/log 4 = 0.721 for code 1 of §5.7.1 and 1.675 for code 2. The latter is very close to the Shannon limit w_S² = 1/(2^{2/3} − 1) = 1.702 derived by equating the capacity of the Gaussian channel with the rate R = 1/3:

\[ \frac{1}{3} = \frac{1}{2} \log_2 \left( 1 + \frac{1}{w_S^2} \right) \qquad (5.119) \]
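Solving (5.119) for w_S² makes the quoted numerical value explicit:

\[ 1 + \frac{1}{w_S^2} = 2^{2/3} \quad \Longrightarrow \quad w_S^2 = \frac{1}{2^{2/3} - 1} \approx \frac{1}{1.5874 - 1} \approx 1.702 . \]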

The limit of the first example (w_c² = 0.721) is found to be close to numerical results, whereas the second (w_c² = 1.675) shows some deviation from them. The stability analysis leading to these values may not give the correct answer if a first-order phase transition takes place in the second example.

In this section we present a statistical-mechanical analysis of signal transmission by modulation (T. Tanaka 2001). This topic deviates somewhat from the other parts of this chapter. The signal is not encoded and decoded but is modulated and demodulated as described below. Nevertheless, the goal is very similar to error-correcting codes: to extract the best possible information from a noisy output using the idea of Bayesian inference.

Code-division multiple access (CDMA) is an important standard of modern mobile communications (Simon et al. 1994; Viterbi 1995). The digital signal of a user is modulated and transmitted to a base station through a channel that is shared by multiple users. At the base station, the original digital signal is retrieved by demodulation of the received signal composed of the superposition of multiple original signals and noise. An important problem is therefore to design an efficient method to modulate and demodulate signals.

In CDMA, one modulates a signal in the following way. Let us focus our attention on a signal interval, the time interval carrying a single digital signal, with a signal ξ_i (= ±1) for the ith user. The signal interval is divided into p chip intervals (p = 4 in Fig. 5.13). User i is assigned a spreading code sequence η_i^t (= ±1) (t = 1, …, p). The signal ξ_i is modulated in each chip interval t by the spreading code sequence according to the multiplication η_i^t ξ_i. Modulated signals of N users are superimposed in a channel and are further disturbed by noise. At the base station, one receives the signal

\[ y_t = \sum_{i=1}^{N} \eta_i^t \xi_i + \nu_t \qquad (5.120) \]
Fig. 5.13. Modulation of the signal of a single user in CDMA. A signal interval is composed of four chip intervals in this example. The full line represents the original signal and the dashed line denotes the spreading code sequence.

at the chip interval t and is asked to retrieve the original signals ξ_i (i = 1, …, N) from y_t (t = 1, …, p) with the knowledge of the spreading code sequence η_i^t.

Before proceeding to the problem of demodulation, we list a few points of idealization that lead to the simple formula (5.120): modulated signals of N users are assumed to be transmitted under perfect synchronization at each chip interval t throughout a signal interval. This allows us simply to sum up η_i^t ξ_i over all i (= 1, …, N) at any given chip interval t. Furthermore, all signals are supposed to have the same amplitude (normalized to unity in (5.120)), that is, perfect power control. Other complications (such as the effects of reflections) are ignored in the present formulation. These aspects would have to be taken into account when one applies the theory to realistic situations.
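The idealized channel (5.120) is straightforward to simulate. A minimal sketch follows (names illustrative; the noise variance is set to σ_s² = N/β_s, the scaling assumed in the analysis below):

```python
import random

N, alpha, beta_s = 50, 2.0, 1.0
p = int(alpha * N)                       # chip intervals per signal interval
xi = [random.choice([-1, 1]) for _ in range(N)]                        # user signals
eta = [[random.choice([-1, 1]) for _ in range(p)] for _ in range(N)]   # spreading codes
sigma_s = (N / beta_s) ** 0.5            # noise amplitude, sigma_s^2 = N / beta_s

# (5.120): y_t = sum_i eta_i^t xi_i + nu_t, perfect synchronization and power control
y = [sum(eta[i][t] * xi[i] for i in range(N)) + random.gauss(0.0, sigma_s)
     for t in range(p)]
```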

The measure of performance is the overlap of the original (ξ_i) and demodulated (ξ̂_i) signals

\[ M = \frac{1}{N} \sum_{i=1}^{N} \xi_i \hat{\xi}_i \qquad (5.121) \]

averaged over the distributions of ξ_i, η_i^t, and ν_t. Equivalently, one may try to minimize the bit-error rate (the average probability of error per bit) (1 − M)/2. We show in the following that the CDMA multiuser demodulator, which uses Bayesian inference, gives a larger overlap than the conventional demodulator.

Let us first explain the simple method of the conventional demodulator. To extract the information of ξ_i from y_t, we multiply the received signal at the tth chip interval, y_t, by the spreading code η_i^t and sum over the whole signal interval:

\[ h_i = \frac{1}{N} \sum_{t=1}^{p} \eta_i^t y_t = \frac{p}{N}\, \xi_i + \frac{1}{N} \sum_{k (\neq i)} \sum_{t=1}^{p} \eta_i^t \eta_k^t \xi_k + \frac{1}{N} \sum_{t=1}^{p} \eta_i^t \nu_t \qquad (5.122) \]

The first term on the right-hand side carries the original signal, the second represents multiuser interference, and the third is the channel noise (which is assumed to be Gaussian). We then demodulate the signal by taking the sign of this quantity:

\[ \hat{\xi}_i = \operatorname{sgn}(h_i) \qquad (5.123) \]

It is easy to analyse the performance of this conventional demodulator in the limit of large N and p with α = p/N fixed. We also assume that the noise power σ_s² (the variance of ν_t) scales with N such that β_s ≡ N/σ_s² is of O(1), and that the η_i^t and ξ_k are all independent. Then the second and third terms on the right-hand side of (5.122) are Gaussian variables, resulting in the overlap

\[ M = \operatorname{Erf} \left( \sqrt{\frac{\alpha \beta_s}{2 (\beta_s + 1)}} \right) \qquad (5.124) \]

where Erf(x) is the error function (2/√π) ∫₀^x e^{−t²} dt. This represents the performance of the conventional demodulator as a function of the number of chip intervals per signal interval α and the noise power β_s.
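A quick Monte Carlo check of (5.122)−(5.123), reusing xi, eta, and y from the channel sketch above (the overall 1/N factor in h_i is irrelevant to the sign):

```python
# Conventional demodulator (5.123): take the sign of the matched-filter output h_i.
h = [sum(eta[i][t] * y[t] for t in range(p)) / N for i in range(N)]
xi_hat = [1 if hi >= 0 else -1 for hi in h]

M = sum(x * xh for x, xh in zip(xi, xi_hat)) / N    # overlap (5.121)
print("overlap M =", M, "  bit-error rate =", (1 - M) / 2)
```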

To improve the performance, it is useful to construct the posterior of the original signal, given the noisy signal, following the method of Bayesian inference. Let us denote the set of original signals by ξ = ᵗ(ξ_1, …, ξ_N) and the corresponding dynamical variables for demodulation by S = ᵗ(S_1, …, S_N). The sequence of received signals within p chip intervals is also written as a vector in a p-dimensional space, y = ᵗ(y_1, …, y_p). Once the posterior P(S|y) is given, one demodulates the signal by the MAP or the MPM:

\[ \hat{\xi} = \arg\max_{S} P(S \mid y) \qquad \text{(MAP)} \qquad (5.125) \]
\[ \hat{\xi}_i = \operatorname{sgn} \langle S_i \rangle = \operatorname{sgn} \sum_{S} S_i\, P(S \mid y) \qquad \text{(MPM)} \qquad (5.126) \]
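For a toy instance small enough to enumerate all 2^N candidate vectors, both estimators can be computed exactly from the Gaussian-channel posterior. The following is a sketch under the conventions used in this section (β is the assumed noise parameter; β = β_s is the MPM-optimal choice discussed below; names illustrative):

```python
import itertools
import math
import random

N, p, beta_s = 8, 16, 1.0                # a small instance, redefined for enumeration
xi = [random.choice([-1, 1]) for _ in range(N)]
eta = [[random.choice([-1, 1]) for _ in range(p)] for _ in range(N)]
sigma_s = (N / beta_s) ** 0.5
y = [sum(eta[i][t] * xi[i] for i in range(N)) + random.gauss(0.0, sigma_s)
     for t in range(p)]

def weight(S, beta):
    """Posterior weight exp(-beta H(S)) with H(S) as in (5.128)."""
    E = sum((y[t] - sum(eta[i][t] * S[i] for i in range(N))) ** 2 for t in range(p))
    return math.exp(-beta * E / (2 * N))

w = {S: weight(S, beta_s) for S in itertools.product([-1, 1], repeat=N)}
S_map = max(w, key=w.get)                                     # MAP estimate (5.125)
Z = sum(w.values())
m = [sum(S[i] * wS for S, wS in w.items()) / Z for i in range(N)]
S_mpm = [1 if mi >= 0 else -1 for mi in m]                    # MPM estimate (5.126)
```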

To construct the posterior, we first write the distribution of the Gaussian noise ν_t = y_t − Σ_i η_i^t ξ_i (see (5.120)) as

\[ P(y \mid \xi) \propto \prod_{t=1}^{p} \exp \left( -\frac{\nu_t^2}{2 \sigma_s^2} \right) \propto e^{-\beta_s H(\xi)} \qquad (5.127) \]

where the effective Hamiltonian has been defined as

\[ H(S) = \frac{1}{2N} \sum_{t=1}^{p} \left( y_t - \sum_{i=1}^{N} \eta_i^t S_i \right)^2 = \frac{1}{2N} \sum_{t=1}^{p} \sum_{i \neq j} \eta_i^t \eta_j^t S_i S_j - \sum_{i=1}^{N} h_i S_i + \text{const} \qquad (5.128) \]

The field h_i has already been defined in (5.122). If we assume that the prior is uniform, P(ξ) = const, the posterior is seen to be directly proportional to the likelihood according to the Bayes formula:

\[ P(S \mid y) = \frac{e^{-\beta_s H(S)}}{\sum_{S} e^{-\beta_s H(S)}} \qquad (5.129) \]

The Hamiltonian (5.128) looks very similar to that of the Hopfield model to be discussed in Chapter 7, (7.7) with (7.4), the only difference being that the sign of the interaction is opposite (Miyajima et al. 1993).

The replica method is useful to analyse the performance of the Bayesian demodulator represented by the posterior (5.129).

Since we usually do not know the noise power of the channel βs, it is appropriate to write the normalized posterior with an arbitrary noise parameter β in place of βs, the latter being the true value. From (5.127)−(5.129), we then find

\[ P_\beta(S \mid y) = \frac{2^{-N}}{Z(r)} \exp \left\{ -\frac{\beta}{2} \sum_{t=1}^{p} \left( r_t - \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \eta_i^t S_i \right)^2 \right\} \qquad (5.130) \]

where the vector r denotes ᵗ(r_1, …, r_p) with r_t = y_t/√N, and the normalization factor (or the partition function) is given as

\[ Z(r) = 2^{-N} \sum_{S} \exp \left\{ -\frac{\beta}{2} \sum_{t=1}^{p} \left( r_t - \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \eta_i^t S_i \right)^2 \right\} \qquad (5.131) \]

The factor 2^{−N} is the uniform prior for ξ. The macroscopic behaviour of the system is determined by the free energy averaged over the distributions of the spreading code sequence, which is assumed to be completely random, and the channel noise. The latter distribution of noise is nothing more than the partition function (5.131) with the true hyperparameter β = β_s, which we denote by Z_0(r). The replica average is therefore expressed as

\[ [Z^n] = \int \prod_{t=1}^{p} dr_t \left[ Z_0(r)\, Z(r)^n \right]_\eta \qquad (5.132) \]

where the configurational average on the right hand side is over the spreading code sequence. It is convenient to separate the above quantity into the spin-dependent part g  1 and the rest g  2 for further calculations:

\[ \left[ Z_0(r)\, Z(r)^n \right]_\eta \approx \exp \{ N (g_1 + \alpha g_2) \} \qquad (5.133) \]

where α = p/N in the exponent should not be confused with the replica index. The zeroth replica (α = 0) corresponds to the probability weight Z_0. The two functions g_1 and g_2 are defined by

(5.134)
(5.135)

where the following notations have been used:

\[ v_0 = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \eta_i^t \xi_i, \qquad v_\alpha = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \eta_i^t S_i^\alpha \quad (\alpha = 1, \dots, n) \qquad (5.136) \]

In the thermodynamic limit p, N → ∞ with their ratio α fixed, these v_0 and v_α become Gaussian variables with vanishing mean and with covariances given by the overlaps of the spin variables, under the assumption of a random distribution of the spreading code sequence:

\[ [v_0 v_\alpha] = \frac{1}{N} \sum_{i} \xi_i S_i^\alpha \equiv q_{0\alpha}, \qquad [v_\alpha v_\beta] = \frac{1}{N} \sum_{i} S_i^\alpha S_i^\beta \equiv q_{\alpha\beta}, \qquad [v_0^2] = 1 \qquad (5.137) \]

To proceed further, we assume symmetry between the replicas (α = 1, …, n): q_{0α} = m and q_{αβ} = q (α ≠ β ≥ 1). Then v_0 and v_α are more conveniently written in terms of independent Gaussian variables u, t, and z_α with vanishing mean and unit variance,

\[ v_0 = u, \qquad v_\alpha = m u + \sqrt{q - m^2}\; t + \sqrt{1 - q}\; z_\alpha \qquad (5.138) \]
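As a consistency check on this decomposition (our reconstruction of (5.138)), the covariances (5.137) are reproduced: [v_0 v_α] = m[u²] = m, [v_α v_β] = m² + (q − m²) = q for α ≠ β, and [v_α²] = m² + (q − m²) + (1 − q) = 1.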

We are now ready to evaluate the factor e^{pg_2} explicitly as

(5.139)

The other factor e^{Ng_1} (5.134) can be evaluated using the Fourier representation of the delta function

(5.140)
(5.141)

where we have used the RS form of the matrix: M_{αα} = E and M_{αβ} = F (α ≠ β ≥ 1). In the thermodynamic limit, the leading contribution is

(5.142)

From (5.139) and (5.142), the total free energy g_1 + αg_2 is given in the limit n → 0 as

(5.143)

Extremization of the free energy yields the equations of state for the order parameters as

(5.144)
(5.145)

The overlap is determined from these quantities by

(5.146)

The stability limit of the RS solution, the AT line, is expressed as

(5.147)

The optimum demodulation by the MPM is achieved at the parameter value β = β_s, whereas the MAP corresponds to β → ∞.

The results of the previous analysis in terms of the bit-error rate (1 − M)/2 are plotted in Fig. 5.14 for (a) βs = 1 and (b) βs = 20 for the conventional demodulator (CD), MPM ('Opt.'), and MAP demodulators. Also shown is the mean-field demodulator in which one uses the mean-field equation of state for local magnetization

\[ m_i = \tanh \left\{ \beta \left( h_i - \sum_{j (\neq i)} J_{ij} m_j \right) \right\}, \qquad J_{ij} = \frac{1}{N} \sum_{t=1}^{p} \eta_i^t \eta_j^t \qquad (5.148) \]
Fig. 5.14. Bit-error rate of the CDMA demodulators. The left panel (a) is for the noise power β_s = 1 and the right (b) for β_s = 20. The symbols are: Opt. for the MPM, MFA for the mean-field demodulator with β = β_s, and CD for the conventional demodulator (T. Tanaka 2001; copyright 2001 by the Massachusetts Institute of Technology).

in combination with ξ̂_i = sgn(m_i). This method has the advantage that it serves as a demodulating algorithm of direct practical usefulness.
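A minimal sketch of such a naive mean-field iteration, assuming the form of (5.148) reconstructed above (with damping added for numerical stability, and reusing the toy instance xi, eta, y, beta_s from the enumeration sketch):

```python
import math

h = [sum(eta[i][t] * y[t] for t in range(p)) / N for i in range(N)]
J = [[sum(eta[i][t] * eta[j][t] for t in range(p)) / N for j in range(N)]
     for i in range(N)]

m = [0.0] * N
for _ in range(100):                      # iterate (5.148) with damping 0.5
    for i in range(N):
        field = h[i] - sum(J[i][j] * m[j] for j in range(N) if j != i)
        m[i] = 0.5 * m[i] + 0.5 * math.tanh(beta_s * field)

xi_hat_mfa = [1 if mi >= 0 else -1 for mi in m]   # demodulate: xi_hat_i = sgn(m_i)
```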

It is observed that the MAP and MPM show much better performance than the conventional demodulator. The curve for the MAP almost overlaps with the MPM curve when the noise power is high (β_s = 1), but a clear deviation is found in the low-noise case β_s = 20. The MPM result has been confirmed to be stable against RSB. By contrast, one should take RSB into account for the MAP except in a region with small α and large β_s.

General expositions of information theory and error-correcting codes are found in textbooks on these subjects (McEliece 1977; Clark and Cain 1981; Lin and Costello 1983; Arazi 1988; Rhee 1989; Ash 1990; Wicker 1995). The present form of statistical-mechanical analysis of error-correcting codes was proposed by Sourlas (1989) and has been expanding rapidly as described in the text. Some of the recent papers along this line of development (but not cited in the text) include Kanter and Saad (2000), Nakamura et al. (2000), and Kabashima et al. (2000c). See also Heegard and Wicker (1999) for the turbo code.

Notes

5. The symbol p is often used in the literature to denote the number of interacting spins. We use r instead to avoid confusion with the error probability in the BSC.

6. Remember that we are using the ferromagnetic gauge.

7. Expansions from the large-r limit and from r = 2 are also possible (Gardner 1985).

8. We did not write out the normalization constant in the second expression of (5.77) because the left-hand side is to be normalized with respect to J_μ, in contrast to the first expression, which is to be normalized for σ_i.

9. There are several variations of the LDPC. We treat in this section the one discussed by MacKay and Neal (1997) and MacKay (1999). See also Vicente et al. (2000) for a slightly different code.

10. See Chapter 9 for elucidation of the term 'NP completeness'.

11. The original source message is assumed to be generated sequentially from i = 1 to i = N. Consequently the time step denoted by t (= 1, 2, …, N) is identified with the bit number i (= 1, 2, …, N).
