Error-correcting Codes on a Bethe-like Lattice 
Renato Vicente David Saad 
The Neural Computing Research Group 
Aston University, Birmingham, B4 7ET, United Kingdom 
{vicenter, saadd}@aston.ac.uk
Yoshiyuki Kabashima 
Department of Computational Intelligence and Systems Science 
Tokyo Institute of Technology, Yokohama 2268502, Japan 
kaba@dis.titech.ac.jp
Abstract 
We analyze Gallager codes by employing a simple mean-field approxi- 
mation that distorts the model geometry and preserves important interac- 
tions between sites. The method naturally recovers the probability prop- 
agation decoding algorithm as an extremization of a proper free-energy. 
We find a thermodynamic phase transition that coincides with informa- 
tion theoretical upper-bounds and explain the practical code performance 
in terms of the free-energy landscape. 
1 Introduction 
In recent years increasing interest has been devoted to the application of mean-field
techniques to inference problems. There are many different ways of building mean-field
theories. One can make a perturbative expansion around a tractable model [1,2], or assume a
tractable structure and variationally determine the model parameters [3].
Error-correcting codes (ECC) are particularly interesting examples of inference problems
in loopy intractable graphs [4]. Recently the focus has been directed to the state-of-the-art
high performance turbo codes [5] and to Gallager and MN codes [6,7]. Statistical physics
has been applied to the analysis of ECCs as an alternative to information theory methods,
yielding some new interesting directions and suggesting new high-performance codes [8].
Sourlas was the first to relate error-correcting codes to spin glass models [9], showing that
the Random-energy Model (REM) [10] can be thought of as an ideal code capable of saturating
Shannon's bound at vanishing code rates. This work was extended recently to the case of
finite code rates [11] and has been further developed for analyzing MN codes of various
structures [12]. All of the analyses mentioned above, as well as the recent turbo codes
analysis [13], relied on the replica approach under the assumption of replica symmetry.
To date, the only model that can be analyzed exactly is the REM, which corresponds to an
impractical coding scheme of a vanishing code rate.
Here we present a statistical physics treatment of non-structured Gallager codes by em- 
ploying a mean-field approximation based on the use of a generalized tree structure (Bethe 
lattice [14]) known as Husimi cactus that is exactly solvable. The model parameters are 
just assumed to be those of the model with cycles. In this framework the probability prop- 
agation decoding algorithm (PP) emerges naturally providing an alternative view to the re- 
lationship between PP decoding and mean-field approximations already observed in [15]. 
Moreover, this approach has the advantage of being slightly more controlled and easier
to understand than replica calculations.
This paper is organized as follows. In the next section we present unstructured Gallager
codes and the statistical physics framework used to analyze them. In section 3 we make use of
the lattice geometry to solve the model exactly. In section 4 we analyze the typical code
performance. We summarize the results in section 5.
2 Gallager codes: statistical physics formulation 
We will concentrate here on a simple communication model whereby messages are represented
by binary vectors and are communicated through a Binary Symmetric Channel
(BSC) where uncorrelated bit flips appear with probability $f$. A Gallager code is defined
by a binary matrix $A = [C_1 \mid C_2]$, concatenating two very sparse matrices known to both
sender and receiver, with $C_2$ (of dimensionality $(M-N) \times (M-N)$) being invertible;
the matrix $C_1$ is of dimensionality $(M-N) \times N$.
Encoding refers to the production of an $M$-dimensional binary code word $t \in \{0,1\}^M$
($M > N$) from the original message $\xi \in \{0,1\}^N$ by $t = G^T \xi$ (mod 2), where all
operations are performed in the field $\{0,1\}$ and are indicated by (mod 2). The generator
matrix is defined by $G^T = \begin{bmatrix} I \\ C_2^{-1} C_1 \end{bmatrix}$ (mod 2), where $I$
is the $N \times N$ identity matrix stacked on top of the $(M-N) \times N$ block $C_2^{-1} C_1$,
implying that $A G^T$ (mod 2) $= 0$ and that the first $N$ bits of $t$ are set to the message $\xi$.
In regular Gallager codes the number of non-zero elements in each row of $A$ is chosen to be
exactly $K$. The number of elements per column is then $C = (1-R)K$, where the code rate is
$R = N/M$ (for unbiased messages). The encoded vector $t$ is then corrupted by noise
represented by the vector $\zeta \in \{0,1\}^M$ with components independently drawn from
$P(\zeta) = (1-f)\,\delta(\zeta) + f\,\delta(\zeta - 1)$. The received vector takes the form
$r = G^T \xi + \zeta$ (mod 2).
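The encoding and channel steps above can be sketched numerically. The following is a minimal illustration, not the construction used in the paper: the matrix sizes, the random seed, the helper `gf2_inv` and the choice of a bidiagonal $C_2$ are all assumptions made for the example, and the toy matrix is not regular (rows do not have exactly $K$ non-zero elements).

```python
import numpy as np

def gf2_inv(mat):
    """Invert a square binary matrix over GF(2) by Gauss-Jordan elimination."""
    n = mat.shape[0]
    aug = np.concatenate([mat % 2, np.eye(n, dtype=np.int64)], axis=1)
    for col in range(n):
        pivots = np.nonzero(aug[col:, col])[0]
        if pivots.size == 0:
            raise ValueError("matrix is singular over GF(2)")
        piv = col + pivots[0]
        aug[[col, piv]] = aug[[piv, col]]      # move the pivot row into place
        for row in range(n):
            if row != col and aug[row, col]:
                aug[row] ^= aug[col]           # XOR is addition mod 2
    return aug[:, n:]

rng = np.random.default_rng(0)
N, M, f = 4, 8, 0.1                            # toy sizes and flip rate (assumed)

# Toy parity-check matrix A = [C1 | C2]; C2 is unit lower-bidiagonal, hence invertible.
C1 = rng.integers(0, 2, size=(M - N, N))
C2 = (np.eye(M - N, dtype=np.int64) + np.eye(M - N, k=-1, dtype=np.int64)) % 2
A = np.concatenate([C1, C2], axis=1)

# G^T stacks the identity on top of C2^{-1} C1, so that A G^T = 0 (mod 2).
Gt = np.concatenate([np.eye(N, dtype=np.int64), gf2_inv(C2) @ C1 % 2], axis=0)

xi = rng.integers(0, 2, size=N)                # message
t = Gt @ xi % 2                                # code word; its first N bits equal xi
zeta = (rng.random(M) < f).astype(np.int64)    # BSC noise, each bit flipped w.p. f
r = (t + zeta) % 2                             # received vector
z = A @ r % 2                                  # syndrome, which depends on zeta only
```

Because $A G^T = 0$ (mod 2), the syndrome computed from `r` coincides with `A @ zeta % 2`, which is what allows decoding to be phrased purely in terms of the noise vector.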
Decoding is carried out by multiplying the received message by the matrix $A$ to produce
the syndrome vector $z = Ar = A\zeta$ (mod 2), from which an estimate $\hat\tau$ for the noise
vector can be produced. An estimate for the original message is then obtained as the first
$N$ bits of $r + \hat\tau$ (mod 2). The Bayes optimal estimator (also known as marginal posterior
maximizer, MPM) for the noise is defined as $\hat\tau_j = \mathrm{argmax}_{\tau_j} P(\tau_j \mid z)$,
where $\tau_j \in \{0,1\}$. The performance of this estimator can be measured by the probability
of bit error $p_b = 1 - \frac{1}{M}\sum_{j=1}^{M} \delta[\hat\tau_j; \zeta_j]$, where $\delta[\,;]$ is
Kronecker's delta. Knowing the matrices $C_2$ and $C_1$, the syndrome vector $z$ and the noise
level $f$, it is possible to apply Bayes' theorem and compute the posterior probability
$$P(\tau \mid z) = \frac{1}{Z}\,\chi[z = A\tau \ (\mathrm{mod}\ 2)]\,P(\tau), \qquad (1)$$
where $\chi[X]$ is an indicator function providing 1 if $X$ is true and 0 otherwise. To compute
the MPM one has to compute the marginal posterior
$P(\tau_j \mid z) = \sum_{\{\tau_i : i \neq j\}} P(\tau \mid z)$, which
in general requires $O(2^M)$ operations, thus becoming impractical for long messages. To
solve this problem one can use the sparseness of $A$ to design algorithms that require $O(M)$
operations to perform the same task. One of these methods is the probability propagation
algorithm (PP), also known as belief propagation or sum-product algorithm [16].
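The exponential cost of the exact marginal posterior is easy to see in a direct implementation. The sketch below enumerates all $2^M$ candidate noise vectors for a tiny hand-made parity-check matrix; the function name `mpm_brute_force` and the example matrix are illustrative assumptions, and the approach is only usable for very small $M$.

```python
import itertools
import numpy as np

def mpm_brute_force(A, z, f):
    """Exact marginals P(tau_j = 1 | z) by summing over all 2^M noise vectors."""
    M = A.shape[1]
    marg = np.zeros(M)
    norm = 0.0
    for bits in itertools.product((0, 1), repeat=M):
        tau = np.array(bits)
        if np.array_equal(A @ tau % 2, z):     # indicator chi[z = A tau (mod 2)]
            w = tau.sum()
            p = f**w * (1 - f)**(M - w)        # factorized prior P(tau)
            marg += p * tau
            norm += p
    marg /= norm                               # norm plays the role of Z
    return marg, (marg > 0.5).astype(int)

# Two checks on three bits; the true noise flips only the middle bit.
A = np.array([[1, 1, 0],
              [0, 1, 1]])
zeta = np.array([0, 1, 0])
z = A @ zeta % 2
marginals, tau_hat = mpm_brute_force(A, z, f=0.1)
```

Only two noise vectors are compatible with this syndrome, $(0,1,0)$ and $(1,0,1)$, and the prior strongly favours the one with a single flip, so the MPM estimate recovers `zeta`.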
The connection to statistical physics becomes clear when the field $\{0,1\}$ is replaced by
Ising spins $\{\pm 1\}$ and mod 2 sums by products [9]. The syndrome vector acquires the form
of a multi-spin coupling $\mathcal{J}_\mu = \prod_{j \in \mathcal{L}(\mu)} \zeta_j$, where
$j = 1, \ldots, M$ and $\mu = 1, \ldots, (M-N)$.
The $K$ indices of nonzero elements in the row $\mu$ of a matrix $A$, which is not necessarily a
concatenation of two separate matrices (therefore, defining an unstructured Gallager code),
are given by $\mathcal{L}(\mu) = \{j_1, \ldots, j_K\}$, and in a column $l$ are given by
$\mathcal{M}(l) = \{\mu_1, \ldots, \mu_C\}$.
Figure 1: Husimi cactus with $K = 3$ and connectivity $C = 4$.
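The posterior (1) can be written as the Gibbs distribution [12]:
$$P(\tau \mid \mathcal{J}) = \frac{1}{Z} \lim_{\gamma \to \infty} \exp[-\beta \mathcal{H}_\gamma(\tau; \mathcal{J})],$$
$$\mathcal{H}_\gamma(\tau; \mathcal{J}) = -\gamma \sum_{\mu=1}^{M-N} \Big(\mathcal{J}_\mu \prod_{j \in \mathcal{L}(\mu)} \tau_j - 1\Big) - F \sum_{j=1}^{M} \tau_j. \qquad (2)$$
The external field corresponds to the prior probability over the noise and has the form
$F = \mathrm{atanh}(1 - 2f)$. Note that the Hamiltonian depends on a hyper-parameter that has to be
set as $\gamma \to \infty$ for optimal decoding. The disorder is trivial and can be gauged as
$\mathcal{J}_\mu \to 1$ by using $\tau_j \to \tau_j \zeta_j$. The resulting Hamiltonian is a
multi-spin ferromagnet with finite connectivity in a random field $h_j = F\zeta_j$. The decoding
process corresponds to finding local magnetizations at temperature $\beta = 1$,
$m_j = \langle \tau_j \rangle_{\beta=1}$, and calculating estimates as
$\hat\tau_j = \mathrm{sgn}(m_j)$.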
In the $\{\pm 1\}$ representation the probability of bit error acquires the form
$$p_b = \frac{1}{2} - \frac{1}{2M} \sum_{j=1}^{M} \zeta_j\,\mathrm{sgn}(m_j), \qquad (3)$$
connecting the code performance with the computation of local magnetizations. 
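The step from the Kronecker-delta form of the bit error to (3) can be made explicit. A short derivation, assuming the mapping $0 \to +1$, $1 \to -1$ between the binary and Ising representations:

```latex
\begin{align*}
\delta[\hat\tau_j;\zeta_j] &= \tfrac{1}{2}\,(1 + \zeta_j \hat\tau_j),
  \qquad \hat\tau_j,\ \zeta_j \in \{\pm 1\},\\
p_b &= 1 - \frac{1}{M}\sum_{j=1}^{M} \tfrac{1}{2}\,\bigl(1 + \zeta_j\,\mathrm{sgn}(m_j)\bigr)
     = \frac{1}{2} - \frac{1}{2M}\sum_{j=1}^{M} \zeta_j\,\mathrm{sgn}(m_j).
\end{align*}
```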
3 Bethe-like Lattice calculation 
3.1 Generalized Bethe lattice: the Husimi cactus 
A Husimi cactus with connectivity $C$ is generated starting with a polygon of $K$ vertices
with one Ising spin in each vertex (generation 0). All spins in a polygon interact through
a single coupling $\mathcal{J}_\mu$ and one of them is called the base spin. Figure 1 shows the
first step in the construction of a Husimi cactus. In a generic step, the base spins of the
generation $n-1$ polygons, numbering $(C-1)(K-1)$, are attached to $K-1$ vertices of a
generation $n$ polygon. This process is iterated until a maximum generation $n_{\max}$ is reached;
the graph is then completed by attaching $C$ uncorrelated branches of $n_{\max}$ generations at
their base spins. In that way each spin inside the graph is connected to exactly $C$ polygons.
The local magnetization at the centre $m_j$ can be obtained by fixing boundary (initial)
conditions in the 0-th generation and iterating recursion equations until generation $n_{\max}$
is reached. Carrying out the calculation in the thermodynamic limit corresponds to having
$n_{\max} \sim \ln M$ generations and $M \to \infty$.
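The logarithmic relation between the number of generations and the system size follows from the exponential growth of the cactus. A small counting sketch, assuming the reading of the construction given above (a centre spin carrying $C$ branches, and $(C-1)(K-1)$ polygons of the previous generation hanging below each polygon); the function name `cactus_size` is hypothetical:

```python
def cactus_size(K, C, n_max):
    """Number of spins in a Husimi cactus: a centre spin plus C branches of n_max generations."""
    growth = (K - 1) * (C - 1)                 # polygons attached below each polygon
    polygons_per_branch = sum(growth**g for g in range(n_max + 1))
    # every polygon contributes its K - 1 non-base spins; base spins are shared
    return 1 + C * (K - 1) * polygons_per_branch

sizes = [cactus_size(3, 4, n) for n in range(4)]   # K = 3, C = 4 as in Figure 1
```

Since the size grows like $[(C-1)(K-1)]^{n_{\max}}$, a graph with $M$ spins accommodates a number of generations of order $\ln M$.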
The Hamiltonian of the model has the form (2), where $\mathcal{L}(\mu)$ denotes the polygon
$\mu$ of the lattice. Due to the tree-like structure, local quantities far from the boundary can
be calculated recursively by specifying boundary conditions. The typical decoding performance
can therefore be computed exactly without resorting to replica calculations [17].
3.2 Recursion relations: probability propagation 
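We adopt the approach presented in [18], where the probability distribution
$P_{\mu k}(\tau_k)$ of the base spin of the polygon $\mu$ is connected by recursion relations to
the $(C-1)(K-1)$ distributions $P_{\nu j}(\tau_j)$, with $\nu \in \mathcal{M}(j) \setminus \mu$
(all polygons linked to $j$ but $\mu$), of polygons in the previous generation:
$$P_{\mu k}(\tau_k) = \frac{1}{\mathcal{N}}\, \mathrm{Tr}_{\{\tau_j\}} \exp\Big[\gamma\Big(\mathcal{J}_\mu \tau_k \prod_{j \in \mathcal{L}(\mu) \setminus k} \tau_j - 1\Big) + F\tau_k\Big] \prod_{j \in \mathcal{L}(\mu) \setminus k}\ \prod_{\nu \in \mathcal{M}(j) \setminus \mu} P_{\nu j}(\tau_j), \qquad (4)$$
where the trace is over the spins $\tau_j$ such that $j \in \mathcal{L}(\mu) \setminus k$.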
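The effective field $\hat{x}_{\nu j}$ on a base spin $j$ due to neighbors in polygon $\nu$ can be
written as
$$\exp(-2\hat{x}_{\nu j}) = e^{2F}\, \frac{P_{\nu j}(-)}{P_{\nu j}(+)}. \qquad (5)$$
Combining (4) and (5) one finds the recursion relation:
$$\exp(-2\hat{x}_{\mu k}) = \frac{\mathrm{Tr}_{\{\tau_j\}} \exp\Big[-\gamma \mathcal{J}_\mu \prod_{j \in \mathcal{L}(\mu) \setminus k} \tau_j + \sum_{j \in \mathcal{L}(\mu) \setminus k} \big(F + \sum_{\nu \in \mathcal{M}(j) \setminus \mu} \hat{x}_{\nu j}\big)\tau_j\Big]}{\mathrm{Tr}_{\{\tau_j\}} \exp\Big[+\gamma \mathcal{J}_\mu \prod_{j \in \mathcal{L}(\mu) \setminus k} \tau_j + \sum_{j \in \mathcal{L}(\mu) \setminus k} \big(F + \sum_{\nu \in \mathcal{M}(j) \setminus \mu} \hat{x}_{\nu j}\big)\tau_j\Big]}. \qquad (6)$$
By computing the traces and taking $\gamma \to \infty$ one obtains:
$$\hat{x}_{\mu k} = \mathrm{atanh}\Big[\mathcal{J}_\mu \prod_{j \in \mathcal{L}(\mu) \setminus k} \tanh\Big(F + \sum_{\nu \in \mathcal{M}(j) \setminus \mu} \hat{x}_{\nu j}\Big)\Big]. \qquad (7)$$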
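The effective local magnetization due to interactions with the nearest neighbors in one
branch is given by $\hat m_{\mu j} = \tanh(\hat{x}_{\mu j})$. The effective local field on a base
spin $j$ of a polygon $\mu$ due to $C - 1$ branches in the previous generation and due to the
external field is $x_{\mu j} = F + \sum_{\nu \in \mathcal{M}(j) \setminus \mu} \hat{x}_{\nu j}$; the
effective local magnetization is, therefore, $m_{\mu j} = \tanh(x_{\mu j})$.
Equation (7) can then be rewritten in terms of $\hat m_{\mu j}$ and $m_{\mu j}$ and the PP
equations [7,15,16] can be recovered:
$$m_{\mu k} = \tanh\Big(F + \sum_{\nu \in \mathcal{M}(k) \setminus \mu} \mathrm{atanh}(\hat m_{\nu k})\Big), \qquad \hat m_{\mu k} = \mathcal{J}_\mu \prod_{j \in \mathcal{L}(\mu) \setminus k} m_{\mu j}. \qquad (8)$$
Once the magnetizations on the boundary (0-th generation) are assigned, the local
magnetization $m_j$ in the central site is determined by iterating (8) and computing
$$m_j = \tanh\Big(F + \sum_{\nu \in \mathcal{M}(j)} \mathrm{atanh}(\hat m_{\nu j})\Big). \qquad (9)$$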
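The PP equations (8) and (9) translate directly into an iterative decoder. The following is a minimal sketch rather than an optimized implementation: it stores messages in dense arrays, assumes $0 < f < 1/2$, initializes the messages at the prior value $1 - 2f$, and uses hypothetical names (`pp_decode`, `m_hat`, `tol`).

```python
import numpy as np

def pp_decode(A, z, f, max_iter=100, tol=1e-9):
    """Iterate the PP equations (8); return site magnetizations (9) and the noise estimate."""
    A = np.asarray(A, dtype=bool)
    J = 1.0 - 2.0 * np.asarray(z, dtype=float)     # syndrome in the +-1 representation
    F = np.arctanh(1.0 - 2.0 * f)                  # external field from the prior
    edges = list(zip(*np.nonzero(A)))              # pairs (mu, k) with A[mu, k] = 1
    m = np.where(A, 1.0 - 2.0 * f, 1.0)            # m_{mu j}, initialised at the prior
    for _ in range(max_iter):
        # m_hat_{mu k} = J_mu * prod_{j in L(mu) \ k} m_{mu j}
        m_hat = np.zeros_like(m)
        for mu, k in edges:
            others = A[mu].copy()
            others[k] = False
            m_hat[mu, k] = J[mu] * np.prod(m[mu, others])
        x_hat = np.arctanh(np.clip(m_hat, -1 + 1e-12, 1 - 1e-12))
        total = F + x_hat.sum(axis=0)              # field from all polygons of each site
        # m_{mu k} leaves out the contribution of polygon mu itself
        m_new = np.where(A, np.tanh(total[None, :] - x_hat), 1.0)
        done = np.max(np.abs(m_new - m)) < tol
        m = m_new
        if done:
            break
    m_site = np.tanh(total)                        # eq. (9), from the last sweep
    tau_hat = (m_site < 0).astype(int)             # spin -1 corresponds to a flipped bit
    return m_site, tau_hat

A = np.array([[1, 1, 0],
              [0, 1, 1]])
zeta = np.array([0, 1, 0])
m_site, tau_hat = pp_decode(A, A @ zeta % 2, f=0.1)
```

The toy graph used here has no cycles, so the iteration converges after two sweeps and the resulting magnetizations agree with the exact marginal posterior; on graphs with cycles the output is only the mean-field approximation discussed in the text.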
3.3 Probability propagation as extremization of a free-energy 
The equations (8) describing PP decoding represent extrema of the following free-energy:
$$\mathcal{F}(\{m_{\mu k}, \hat m_{\mu k}\}) = \sum_{\mu=1}^{M-N} \sum_{i \in \mathcal{L}(\mu)} \ln(1 + m_{\mu i}\hat m_{\mu i}) - \sum_{\mu=1}^{M-N} \ln\Big(1 + \mathcal{J}_\mu \prod_{i \in \mathcal{L}(\mu)} m_{\mu i}\Big) - \sum_{j=1}^{M} \ln\Big[e^{F} \prod_{\mu \in \mathcal{M}(j)} (1 + \hat m_{\mu j}) + e^{-F} \prod_{\mu \in \mathcal{M}(j)} (1 - \hat m_{\mu j})\Big]. \qquad (10)$$
Figure 2: (a) Mean normalized overlap between the actual noise vector $\zeta$ and decoded
noise $\hat\tau$ as a function of the noise level $f$, for $K = 4$ and $C = 3$ (therefore
$R = 1/4$). Theoretical values (□), experimental averages over 20 runs for code word lengths
$M = 5000$ (•) and $M = 100$ (full line).
(b) Transitions for $K = 6$ as a function of the noise level $f$. Shannon's bound (dashed
line), information theory upper bound (full line) and thermodynamic transition obtained
numerically (o). Theoretical (O) and experimental (+, M = [?]000 averaged over 20 runs) PP
decoding transitions are also shown. In both figures, symbols are chosen larger than the
error bars.
The iteration of the maps (8) is actually one out of many different methods of finding 
extrema of this free-energy (not necessarily stable). This observation opens an alternative 
way for analyzing the performance of a decoding algorithm by studying the landscape (10). 
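That the fixed points of (8) are stationary points of (10) can be checked directly. A sketch of the calculation, writing the free-energy as the sum of its edge, polygon and site terms and introducing the shorthand $A_{\mu k}$, $B_{\mu k}$ (notation assumed here for convenience):

```latex
\begin{align*}
\mathcal{F} &= \sum_{\mu}\sum_{i\in\mathcal{L}(\mu)} \ln(1 + m_{\mu i}\hat m_{\mu i})
  - \sum_{\mu} \ln\Bigl(1 + \mathcal{J}_\mu \prod_{i\in\mathcal{L}(\mu)} m_{\mu i}\Bigr)
  - \sum_{j} \ln\Bigl[e^{F}\prod_{\mu\in\mathcal{M}(j)}(1+\hat m_{\mu j})
  + e^{-F}\prod_{\mu\in\mathcal{M}(j)}(1-\hat m_{\mu j})\Bigr],\\[4pt]
\frac{\partial\mathcal{F}}{\partial m_{\mu k}} = 0
  &\;\Longrightarrow\;
  \frac{\hat m_{\mu k}}{1 + m_{\mu k}\hat m_{\mu k}}
  = \frac{\mathcal{J}_\mu \prod_{j\in\mathcal{L}(\mu)\setminus k} m_{\mu j}}
         {1 + \mathcal{J}_\mu \prod_{i\in\mathcal{L}(\mu)} m_{\mu i}}
  \;\Longrightarrow\;
  \hat m_{\mu k} = \mathcal{J}_\mu \prod_{j\in\mathcal{L}(\mu)\setminus k} m_{\mu j},\\[4pt]
\frac{\partial\mathcal{F}}{\partial \hat m_{\mu k}} = 0
  &\;\Longrightarrow\;
  \frac{m_{\mu k}}{1 + m_{\mu k}\hat m_{\mu k}}
  = \frac{A_{\mu k} - B_{\mu k}}{A_{\mu k}(1+\hat m_{\mu k}) + B_{\mu k}(1-\hat m_{\mu k})}
  \;\Longrightarrow\;
  m_{\mu k} = \frac{A_{\mu k} - B_{\mu k}}{A_{\mu k} + B_{\mu k}}
  = \tanh\Bigl(F + \sum_{\nu\in\mathcal{M}(k)\setminus\mu}\mathrm{atanh}(\hat m_{\nu k})\Bigr),\\[4pt]
&\text{with}\quad
  A_{\mu k} = e^{F}\prod_{\nu\in\mathcal{M}(k)\setminus\mu}(1+\hat m_{\nu k}),\qquad
  B_{\mu k} = e^{-F}\prod_{\nu\in\mathcal{M}(k)\setminus\mu}(1-\hat m_{\nu k}).
\end{align*}
```

The last equality uses $(A - B)/(A + B) = \tanh\bigl(\tfrac{1}{2}\ln(A/B)\bigr)$ together with $\mathrm{atanh}(u) = \tfrac{1}{2}\ln\frac{1+u}{1-u}$.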
4 Typical performance 
4.1 Macroscopic description 
The typical macroscopic states of the system during decoding can be described by
histograms of the variables $m_{\mu k}$ and $\hat m_{\mu k}$ averaged over all possible
realizations of the noise vector $\zeta$. By applying the gauge transformation
$\mathcal{J}_\mu \to 1$ and $\tau_j \to \tau_j \zeta_j$, assigning the
probability distributions $P_0(x)$ to boundary fields and averaging over random local fields
$F\zeta$, one obtains from (7) the recursion relation in the space of probability distributions
$P(x)$:
$$P_n(x) = \int \prod_{l=1}^{C-1} d\hat{x}_l\, \hat{P}_{n-1}(\hat{x}_l)\, \Big\langle \delta\Big[x - F\zeta - \sum_{l=1}^{C-1} \hat{x}_l\Big] \Big\rangle_\zeta,$$
$$\hat{P}_{n-1}(\hat{x}) = \int \prod_{j=1}^{K-1} dx_j\, P_{n-1}(x_j)\, \delta\Big[\hat{x} - \mathrm{atanh}\Big(\prod_{j=1}^{K-1} \tanh x_j\Big)\Big], \qquad (11)$$
where $P_n(x)$ is the distribution of effective fields at the $n$-th generation due to the
previous generations and external fields. In the thermodynamic limit the distribution far from
the boundary will be $P_\infty(x)$ (generation $n \to \infty$). The local field distribution at
the central site is computed by replacing $C - 1$ by $C$ in (11), taking into account $C$
polygons in the generation just before the central site, and inserting the distribution
$P_\infty(x)$. Equations (11) are identical to those obtained by the replica symmetric theory
as in [12].
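A recursion of this kind between field distributions is commonly iterated numerically by representing each distribution with a large sample of fields. The sketch below is one such sampled iteration in the gauged system, where successful decoding corresponds to all fields being positive; it is not necessarily the numerical scheme used by the authors, and the population size, number of generations, clipping constant and function names are assumptions.

```python
import numpy as np

def density_evolution(K, C, f, n_gen=50, pop=5000, seed=0):
    """Sampled iteration of the field-distribution recursion; returns the mean overlap."""
    rng = np.random.default_rng(seed)
    F = np.arctanh(1.0 - 2.0 * f)
    eps = 1e-15                                    # keeps atanh finite at saturation

    def local_field():
        # gauged external field F * zeta, with zeta = -1 with probability f
        return F * np.where(rng.random(pop) < f, -1.0, 1.0)

    def polygon_fields(x):
        # x_hat = atanh( prod_{j=1}^{K-1} tanh x_j ), with x_j drawn from the population
        picks = rng.integers(0, pop, size=(pop, K - 1))
        prod = np.prod(np.tanh(x[picks]), axis=1)
        return np.arctanh(np.clip(prod, -1 + eps, 1 - eps))

    def site_fields(x_hat, n_branches):
        # x = F * zeta + sum over n_branches polygon fields
        picks = rng.integers(0, pop, size=(pop, n_branches))
        return local_field() + x_hat[picks].sum(axis=1)

    x = local_field()                              # boundary condition: prior fields only
    for _ in range(n_gen):
        x = site_fields(polygon_fields(x), C - 1)
    centre = site_fields(polygon_fields(x), C)     # central site sees C polygons
    return np.mean(np.sign(centre))                # overlap between noise and estimate

overlap_low = density_evolution(K=4, C=3, f=0.05, n_gen=30)
overlap_high = density_evolution(K=4, C=3, f=0.30, n_gen=30)
```

For $K = 4$, $C = 3$ the low-noise run flows to the aligned state with overlap close to 1, whereas at $f = 0.30$ the fields stay finite and the overlap remains well below 1.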
By setting initial (boundary) conditions $P_0(x)$ and numerically iterating (11), for $C \geq 3$
one can find, up to some noise level $f_s$, a single stable fixed point at infinite fields,
corresponding to a totally aligned state (successful decoding). At $f_s$ a bifurcation occurs and
two other fixed points appear, one stable and one unstable, the former corresponding to a
misaligned state (decoding failure). This situation is identical to that observed in [12]. In
terms of the free-energy (10), below $f_s$ the landscape is dominated by the aligned state, which
is the global minimum. Above $f_s$ a sub-optimal state, corresponding to an exponentially large
number of spurious local minima of the Hamiltonian (2), appears and convergence to the
totally aligned state becomes a difficult task. At some critical noise level the totally aligned
state loses the status of global minimum and the thermodynamic transition occurs.
The practical PP decoding is performed by setting initial conditions as $m_{\mu j} = 1 - 2f$,
corresponding to the prior probabilities, and iterating (8) until stationarity or a maximum
number of iterations is attained. The estimate for the noise vector is then produced by
computing $\hat\tau_j = \mathrm{sign}(m_j)$. At each decoding step the system can be described by
histograms of the variables in (8); this is equivalent to iterating (11) (a similar idea was
presented in [7]). Below $f_s$ the process always converges to the successful decoding state.
Above $f_s$ it converges to successful decoding only if the initial conditions are fine tuned;
in general the process converges to the failure state. In Fig. 2a we show the theoretical mean
overlap between the actual noise $\zeta$ and the estimate $\hat\tau$ as a function of the noise
level $f$, as well as results obtained with PP decoding.
Information theory provides an upper bound for the maximum attainable code rate by
equalizing the maximal information contents of the syndrome vector $z$ and of the noise
estimate $\hat\tau$ [7]. The thermodynamic phase transition obtained by finding the stable fixed
points of (11) and their free-energies interestingly coincides with this upper bound within
the precision of the numerical calculation. Note that the performance predicted by
thermodynamics is not practical, as it requires $O(2^M)$ operations for an exhaustive search for
the global minimum of the free-energy. In Fig. 2b we show the thermodynamic transition for
$K = 6$ and compare it with the upper bound, Shannon's bound and the theoretical $f_s$ values.
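The two information-theoretic curves can be evaluated numerically. The sketch below assumes that the upper bound takes the form $(1-R)\,H_2(f_K) \geq H_2(f)$ with $f_K = [1 - (1-2f)^K]/2$ the probability that a sum of $K$ noise bits is odd; this is our reading of the argument in [7] and is not spelled out in the text. Shannon's bound corresponds to replacing $H_2(f_K)$ by 1. The function names and the bisection tolerance are illustrative.

```python
import math

def h2(p):
    """Binary entropy in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def critical_noise(R, K=None, tol=1e-10):
    """Largest flip rate f allowed at rate R; K=None gives Shannon's bound."""
    def gap(f):
        if K is None:
            syndrome_entropy = 1.0                     # each syndrome bit carries at most 1 bit
        else:
            f_K = 0.5 * (1.0 - (1.0 - 2.0 * f) ** K)   # P(sum of K noise bits is odd)
            syndrome_entropy = h2(f_K)
        return (1.0 - R) * syndrome_entropy - h2(f)

    lo, hi = 1e-6, 0.5 - 1e-6                          # gap > 0 at lo, gap < 0 at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

f_shannon = critical_noise(R=0.5)
f_upper_K6 = critical_noise(R=0.5, K=6)
```

Because $H_2(f_K) < 1$ for any finite $K$ and $f < 1/2$, the finite-$K$ bound always lies below Shannon's bound at the same rate.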
4.2 Tree-like approximation and the thermodynamic limit 
The geometrical structure of a Gallager code defined by the matrix $A$ can be represented
by a bipartite graph (Tanner graph) [16] with bit and check nodes. Each column $j$ of $A$
represents a bit node and each row $\mu$ represents a check node; $A_{\mu j} = 1$ means that there
is an edge linking bit $j$ to check $\mu$. It is possible to show that for a random ensemble of
regular codes, the probability of completing a cycle after walking $l$ edges starting from an
arbitrary node is upper bounded by $\mathcal{P}[l; K, C, M] \leq l^2 K^l / M$. It implies that
for very large $M$ only cycles of at least order $\ln M$ survive. In the thermodynamic limit
$M \to \infty$ the probability $\mathcal{P}[l; K, C, M] \to 0$ for any finite $l$ and the bulk of
the system is effectively tree-like. By mapping each check node to a polygon with $K$ bit nodes
as vertices, one can map a Tanner graph into a Husimi lattice that is effectively a tree for any
number of generations of order less than $\ln M$. It is experimentally observed that the number
of iterations of (8) required for convergence does not scale with the system size; it is
therefore expected that the interior of a tree-like lattice approximates a Gallager code with
increasing accuracy as the system size increases. Fig. 2a shows that the approximation is fairly
good even for sizes as small as $M = 100$.
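The logarithmic growth of the shortest typical cycle can be illustrated by evaluating the bound, read here as $l^2 K^l / M$, and finding the walk length at which it stops being informative (reaches 1). The function names are hypothetical and the numbers only indicate the scale at which cycles become unavoidable.

```python
def cycle_bound(l, K, M):
    """Upper bound l^2 K^l / M on the probability of closing a cycle within l edges."""
    return l**2 * K**l / M

def shortest_unbounded_walk(K, M):
    """Smallest walk length l for which the bound is no longer below 1."""
    l = 1
    while cycle_bound(l, K, M) < 1:
        l += 1
    return l

lengths = {M: shortest_unbounded_walk(4, M) for M in (100, 5000, 10**6)}
```

For $K = 4$ the threshold length grows only from 3 at $M = 100$ to 5 at $M = 5000$, in line with the $\ln M$ scaling.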
5 Conclusions 
To summarize, we solved exactly, without resorting to the replica method, a system
representing a Gallager code on a Husimi cactus. The results obtained are in agreement with
the replica symmetric calculation and with numerical experiments carried out in systems
of moderate size. The framework can be easily extended to MN and similar codes. New
insights on the decoding process are obtained by looking at a proper free-energy landscape.
We believe that methods of statistical physics are complementary to those used in the
statistical inference community and can enhance our understanding of general graphical models.
Acknowledgments 
We acknowledge support from EPSRC (GR/N00562), The Royal Society (RV, DS) and 
from the JSPS RFTF program (YK). 
References 
[1] Plefka, T., (1982) Convergence condition of the TAP equation for the infinite-ranged Ising spin 
glass model. Journal of Physics A 15, 1971-1978. 
[2] Tanaka, T., Information geometry of mean field approximation. To appear in Neural Computation.
[3] Saul, L.K. & M.I. Jordan (1996) Exploiting tractable substructures in intractable networks. In
Touretzky, D.S., M.C. Mozer and M.E. Hasselmo (eds.), Advances in Neural Information Processing
Systems 8, pp. 486-492. Cambridge, MA: MIT Press.
[4] Frey, B.J. & D.J.C. MacKay (1998) A revolution: belief propagation in graphs with cycles. In 
Jordan, M.I., M. J. Kearns and S.A. Solla (eds.), Advances in Neural Information Processing Systems 
10, pp. 479-485. Cambridge, MA: MIT Press. 
[5] Berrou, C. & A. Glavieux (1996) Near optimum error correcting coding and decoding: Turbo- 
codes, IEEE Transactions on Communications 44, 1261-1271. 
[6] Gallager, R.G. (1963) Low-density parity-check codes, MIT Press, Cambridge, MA. 
[7] MacKay, D.J.C. (1999) Good error-correcting codes based on very sparse matrices, IEEE Trans- 
actions on Information Theory 45, 399-431. 
[8] Kanter, I. & D. Saad (2000) Finite-size effects and error-free communication in Gaussian chan- 
nels, Journal of Physics A 33, 1675-1681. 
[9] Sourlas, N. (1989) Spin-glass models as error-correcting codes, Nature 339, 693-695. 
[10] Derrida, B. (1981) Random-energy model: an exactly solvable model of disordered systems,
Physical Review B 24(5), 2613-2626.
[11] Vicente, R., D. Saad & Y. Kabashima (1999) Finite-connectivity systems as error-correcting 
codes, Physical Review E 60(5), 5352-5366. 
[12] Kabashima, Y., T. Murayama & D. Saad (2000) Typical performance of Gallager-type
error-correcting codes, Physical Review Letters 84(6), 1355-1358.
[13] Montanari, A. & N. Sourlas (2000) The statistical mechanics of turbo codes, European Physical
Journal B 18, 107-119.
[14] Sherrington, D. & K.Y.M. Wong (1987) Graph bipartitioning and the Bethe spin glass, Journal 
of Physics A 20, L785-L791. 
[15] Kabashima, Y. & D. Saad (1998) Belief propagation vs. TAP for decoding corrupted messages,
Europhysics Letters 44(5), 668-674.
[16] Kschischang, F.R. & B.J. Frey (1998) Iterative decoding of compound codes by probability
propagation in graphical models, IEEE Journal on Selected Areas in Comm. 16(2), 153-159.
[17] Gujrati, P.D. (1995) Bethe or Bethe-like lattice calculations are more reliable than conventional
mean-field calculations, Physical Review Letters 74(5), 809-812.
[18] Rieger, H. & T.R. Kirkpatrick (1992) Disordered p-spin interaction models on Husimi trees, 
Physical Review B 45 (17), 9772-9777. 
