Constrained Independent Component Analysis
Wei Lu and Jagath C. Rajapakse 
School of Computer Engineering 
Nanyang Technological University, Singapore 639798 
email: asjag ath ntu. edu. s g 
Abstract 
The paper presents a novel technique of constrained independent component analysis (CICA) that introduces constraints into classical ICA and solves the resulting constrained optimization problem with Lagrange multiplier methods. This paper shows that CICA can be used to order the resulting independent components in a specific manner and to normalize the demixing matrix during signal separation. It can thereby systematically eliminate ICA's indeterminacy in permutation and dilation. The experiments demonstrate the use of CICA in ordering independent components while providing normalized demixing processes.
Keywords: Independent component analysis, constrained independent component analysis, constrained optimization, Lagrange multiplier methods
1 Introduction
Independent component analysis (ICA) is a technique to transform a multivariate random signal into a signal whose components are mutually independent in the complete statistical sense [1]. There has been growing research interest in efficient realizations of ICA neural networks (ICNNs). These neural algorithms provide adaptive solutions that satisfy the independence conditions once learning has converged [2, 3, 4].
However, ICA only determines the directions of the independent components. The magnitudes of the independent components and the norms of the demixing matrix may still vary, and the order of the resulting components is arbitrary. In general, ICA has this inherent indeterminacy in dilation (scaling) and permutation, which cannot be reduced further without additional assumptions and constraints [5].
Therefore, constrained independent component analysis (CICA) is proposed as a way to obtain a unique ICA solution with certain characteristics on the output by introducing constraints:
• To avoid arbitrary ordering of the output components: statistical measures provide indices that sort the components in order and highlight the salient signals.
• To produce unit-norm transform operators: normalizing the demixing channels reduces the dilation effect on the resulting components and may allow the exact original sources to be recovered.
With such conditions applied, the ICA problem becomes a constrained optimization problem. In the present paper, Lagrange multiplier methods are adopted to provide an adaptive solution to this problem, which can be implemented as an iterative updating system of neural networks, i.e., as ICNNs. The next section briefly introduces the problem formulation, analysis, and solution of Lagrange multiplier methods. The basic concept of ICA is then stated, and Lagrange multiplier methods are used to develop a systematic approach to CICA. Simulations are performed to demonstrate the usefulness of the analytical results and to show the improvements due to the constraints.
2 Lagrange Multiplier Methods 
Lagrange multiplier methods introduce Lagrange multipliers to solve a constrained optimization problem iteratively. A penalty parameter is also introduced so that the local convexity assumption holds at the solution. Lagrange multiplier methods can handle problems with both equality and inequality constraints.
The constrained nonlinear optimization problems that Lagrange multiplier methods deal with take the following general form:

$$\text{minimize } f(X) \quad \text{subject to } g(X) \le 0,\; h(X) = 0 \qquad (1)$$
where X is a matrix or a vector of the problem arguments, f(X) is an objective function, $g(X) = [g_1(X) \cdots g_m(X)]^T$ defines a set of m inequality constraints, and $h(X) = [h_1(X) \cdots h_n(X)]^T$ defines a set of n equality constraints. Because Lagrangian methods cannot directly deal with inequality constraints $g_i(X) \le 0$, the inequality constraints are transformed into equality constraints by introducing a vector of slack variables $z = [z_1 \cdots z_m]^T$:

$$p_i(X) = g_i(X) + z_i^2 = 0, \qquad i = 1, \ldots, m.$$
Based on this transformation, the corresponding simplified augmented Lagrangian function for problem (1) is defined as:

$$\mathcal{L}(X,\mu,\lambda) = f(X) + \frac{1}{2\gamma}\sum_{i=1}^{m}\left\{\left[\max\{0,\hat{g}_i(X)\}\right]^2 - \mu_i^2\right\} + \lambda^T h(X) + \frac{\gamma}{2}\|h(X)\|^2 \qquad (2)$$

where $\mu = [\mu_1 \cdots \mu_m]^T$ and $\lambda = [\lambda_1 \cdots \lambda_n]^T$ are two sets of Lagrange multipliers, $\gamma$ is the scalar penalty parameter, $\hat{g}_i(X) = \mu_i + \gamma g_i(X)$, $\|\cdot\|$ denotes the Euclidean norm, and $\frac{\gamma}{2}\|\cdot\|^2$ is the penalty term that ensures the optimization problem satisfies the local convexity assumption $\nabla^2_{XX}\mathcal{L} > 0$. We use the augmented Lagrangian function in this paper because it gives wider applicability and provides better stability [6].
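The max-term in eq.(2) arises from eliminating the slack variables analytically. As a worked step (the standard argument in [6], restated in the notation used here), treat $p_i(X) = 0$ like an equality constraint with multiplier $\mu_i$ and penalty $\gamma$, and write $t = g_i(X) + z_i^2 \ge g_i(X)$. Minimizing over the slack then gives

$$\min_{t \ge g_i}\left\{\mu_i t + \frac{\gamma}{2}t^2\right\} = \begin{cases} -\dfrac{\mu_i^2}{2\gamma}, & \mu_i + \gamma g_i \le 0 \;\; (t^* = -\mu_i/\gamma), \\[4pt] \mu_i g_i + \dfrac{\gamma}{2}g_i^2, & \mu_i + \gamma g_i > 0 \;\; (t^* = g_i), \end{cases}$$

and both cases equal $\frac{1}{2\gamma}\left\{[\max\{0,\hat{g}_i\}]^2 - \mu_i^2\right\}$, since $\mu_i g_i + \frac{\gamma}{2}g_i^2 = \frac{(\mu_i+\gamma g_i)^2 - \mu_i^2}{2\gamma}$. At the optimal slack, $\mu_i + \gamma p_i = \mu_i + \gamma t^* = \max\{0, \hat{g}_i\}$, which is why the multiplier update for $\mu$ in eq.(3) can be written with a max.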
For discrete problems, the changes in the augmented Lagrangian function can be defined as $\Delta_X\mathcal{L}(X,\mu,\lambda)$ to reach the saddle point in the discrete variable space. The iterative equations to solve the problem in eq.(2) are:

$$X(k+1) = X(k) - \Delta_X\mathcal{L}(X(k),\mu(k),\lambda(k))$$
$$\mu(k+1) = \mu(k) + \gamma p(X(k)) = \max\{0, \hat{g}(X(k))\} \qquad (3)$$
$$\lambda(k+1) = \lambda(k) + \gamma h(X(k))$$

where k denotes the iteration index and $\hat{g}(X(k)) = \mu(k) + \gamma g(X(k))$.
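To make the iteration concrete, here is a minimal sketch on a hypothetical two-variable toy problem (not taken from the paper) with one inequality and one equality constraint. It uses the gradient of eq.(2) and the multiplier updates of eq.(3), but, as an assumption of this sketch, it takes several primal gradient steps per multiplier update (a method-of-multipliers variant) rather than one step per iteration; the step size `eta`, penalty `gamma`, and loop counts are our own choices. The toy problem's KKT point is x* = (1, 1) with μ* = 1 and λ* = 1, which can be checked by hand.

```python
import numpy as np

# Hypothetical toy problem (not from the paper):
#   minimize f(x) = (x1 - 2)^2 + (x2 - 1)^2
#   subject to g(x) = x1 + x2 - 2 <= 0  and  h(x) = x1 - x2 = 0
# Its KKT point is x* = (1, 1) with multipliers mu* = 1, lambda* = 1.
TARGET = np.array([2.0, 1.0])
GRAD_G = np.array([1.0, 1.0])
GRAD_H = np.array([1.0, -1.0])

def g(x):
    return x[0] + x[1] - 2.0

def h(x):
    return x[0] - x[1]

def grad_augmented_lagrangian(x, mu, lam, gamma):
    """Gradient of eq.(2) in x: grad f + max{0, mu + gamma g} grad g + (lam + gamma h) grad h."""
    g_hat = mu + gamma * g(x)
    return 2.0 * (x - TARGET) + max(0.0, g_hat) * GRAD_G + (lam + gamma * h(x)) * GRAD_H

def solve(gamma=1.0, eta=0.1, outer=50, inner=100):
    x, mu, lam = np.zeros(2), 0.0, 0.0
    for _ in range(outer):
        for _ in range(inner):            # approximate minimization of eq.(2) over x
            x = x - eta * grad_augmented_lagrangian(x, mu, lam, gamma)
        mu = max(0.0, mu + gamma * g(x))  # multiplier updates as in eq.(3)
        lam = lam + gamma * h(x)
    return x, mu, lam

x, mu, lam = solve()
print(x, mu, lam)   # approaches (1, 1), 1, 1
```

Running several inner steps keeps each multiplier update close to the classical method of multipliers, whose convergence for convex problems like this one does not depend on tuning the ratio between primal and dual step sizes.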
3 Unconstrained ICA 
Let the time-varying input signal be $x = (x_1, x_2, \ldots, x_N)^T$ and the signal of interest, consisting of independent components (ICs), be $c = (c_1, c_2, \ldots, c_M)^T$, where generally $M \le N$. The signal x is considered to be a linear mixture of the independent components: $x = Ac$, where A is an $N \times M$ mixing matrix with full column rank. The goal of general ICA is to obtain an $M \times N$ demixing matrix W that recovers the independent components c with minimal knowledge of A and c; normally $M = N$. The recovered components u are then given by $u = Wx$.
In the present paper, the contrast function used is the mutual information ($\mathcal{M}$) of the output signal, which is defined in terms of the entropies of the variables to measure independence:

$$\mathcal{M}(W) = \sum_{i=1}^{M} H(u_i) - H(u) \qquad (4)$$

where $H(u_i)$ is the marginal entropy of component $u_i$ and $H(u)$ is the joint entropy of the output. $\mathcal{M}$ is non-negative and equals zero when the components are completely independent.
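For intuition about eq.(4), the Gaussian case has a closed form: for a zero-mean Gaussian vector u with covariance Σ, the $2\pi e$ terms cancel and $\mathcal{M} = \frac{1}{2}\left(\sum_i \log\Sigma_{ii} - \log\det\Sigma\right)$. This is a sketch for illustration only, assuming Gaussian outputs; it is not the estimator used in the paper, and because Gaussian mutual information depends only on second-order statistics it cannot by itself separate non-Gaussian sources. The function name is hypothetical.

```python
import numpy as np

def gaussian_mutual_information(cov):
    """M = sum_i H(u_i) - H(u) for a zero-mean Gaussian vector u with covariance cov.
    The (2*pi*e) terms cancel, leaving 0.5 * (sum_i log cov_ii - log det cov)."""
    cov = np.asarray(cov, dtype=float)
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * (np.sum(np.log(np.diag(cov))) - logdet)

rho = 0.8
print(gaussian_mutual_information(np.eye(2)))                  # independent components: 0.0
print(gaussian_mutual_information([[1.0, rho], [rho, 1.0]]))   # equals -0.5 * log(1 - rho**2)
```

By Hadamard's inequality the value is never negative and is zero exactly when Σ is diagonal, matching the properties of $\mathcal{M}$ stated above.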
To minimize $\mathcal{M}$, the learning equation for the demixing matrix W to perform ICA is given by [1]:

$$\Delta W \propto W^{-T} + \varphi(u)x^T \qquad (5)$$

where $W^{-T}$ is the transpose of the inverse matrix $W^{-1}$ and $\varphi(u)$ is a nonlinear function that depends on the activation functions of the neurons or on the p.d.f.s of the sources [1].
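A minimal batch sketch of eq.(5), under assumptions of our own: two unit-variance Laplacian (super-Gaussian) sources, $\varphi(u) = -\tanh(u)$ as the nonlinearity, prewhitened inputs, and a hypothetical 2 × 2 mixing matrix. The learning rate, iteration count, and helper names are illustrative choices, not values from the paper. The product P = W V A should come out close to a scaled permutation matrix, which is exactly the dilation and permutation freedom described in the following paragraph.

```python
import numpy as np

def whiten(x):
    """Center x (N x T) and return (V @ x, V), where V makes the sample covariance identity."""
    x = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(x))
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    return V @ x, V

def ica_gradient(x, lr=0.1, n_iter=2000):
    """Batch form of eq.(5): W += lr * (W^{-T} + E[phi(u) x^T]).
    phi(u) = -tanh(u) is an assumed choice suited to super-Gaussian sources."""
    n, T = x.shape
    W = np.eye(n)
    for _ in range(n_iter):
        u = W @ x
        W = W + lr * (np.linalg.inv(W).T - np.tanh(u) @ x.T / T)
    return W

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 5000)) / np.sqrt(2.0)   # unit-variance Laplacian sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])           # hypothetical mixing matrix
z, V = whiten(A @ s)
W = ica_gradient(z)
P = W @ V @ A                                    # ideally a scaled permutation matrix
print(np.round(P, 2))
```

The rows of P are typically not of unit size: the update fixes each output's scale through the condition $E\{\tanh(u_i)u_i\} = 1$ rather than through any normalization, which is the kind of dilation that Section 4.2 removes.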
Under the above assumptions, the exact components c are indeterminate because of possible dilation and permutation: the independent components, the columns of A, and the rows of W can only be estimated up to a multiplicative constant, and the definition of standard ICA implies no ordering of the independent components [5].
4 Constrained ICA 
In practice, the ordering of independent components is important for separating non-stationary signals or signals of interest with significant statistical characteristics. Eliminating the indeterminacy in permutation and dilation is useful to produce a unique ICA solution with systematically ordered signals and a normalized demixing matrix. This section presents an approach to CICA that enhances the classical ICA procedure with Lagrange multiplier methods to obtain unique ICs.
4.1 Ordering of Independent Components 
The independent components are sorted in descending order of a statistical measure defined as the index $\mathcal{Z}(u)$. The constrained optimization problem for CICA is then defined as follows:

$$\text{minimize } \mathcal{M}(W) \quad \text{subject to } g(W) \le 0,\; g(W) = [g_1(W) \cdots g_{M-1}(W)]^T \qquad (6)$$

where g(W) is a set of (M − 1) inequality constraints, $g_i(W) = \mathcal{Z}(u_{i+1}) - \mathcal{Z}(u_i)$ defines the descending order, and $\mathcal{Z}(u_i)$ is the index given by some statistical measure of the output component $u_i$, e.g. variance or normalized kurtosis.
Using Lagrange multiplier methods, the augmented Lagrangian function is defined based on eq.(2) as:

$$\mathcal{L}(W,\mu) = \mathcal{M}(W) + \frac{1}{2\gamma}\sum_{i=1}^{M-1}\left\{\left[\max\{0,\hat{g}_i(W)\}\right]^2 - \mu_i^2\right\} \qquad (7)$$
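With discrete solutions applied, the change of an individual element $w_{ij}$ can be formulated by minimizing eq.(7):

$$\Delta w_{ij} \propto \Delta_{w_{ij}}\mathcal{L}(W(k),\mu(k)) = \frac{\partial\mathcal{M}(W(k))}{\partial w_{ij}} + \left[\max\{0,\hat{g}_{i-1}(W(k))\} - \max\{0,\hat{g}_i(W(k))\}\right]\mathcal{Z}'(u_i(k))\,x_j \qquad (8)$$

where $\mathcal{Z}'(\cdot)$ is the first derivative of the index measure, and the terms involving $\hat{g}_0$ and $\hat{g}_M$ are taken as zero. The iterative equation for finding the individual multipliers $\mu_i$ is

$$\mu_i(k+1) = \max\{0,\; \mu_i(k) + \gamma[\mathcal{Z}(u_{i+1}(k)) - \mathcal{Z}(u_i(k))]\} \qquad (9)$$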
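The bracketed term in eq.(8) follows from the chain rule; as a worked step consistent with eq.(7), note that $u_i = \sum_j w_{ij}x_j$, so $\partial u_i/\partial w_{ij} = x_j$ and $\partial\hat{g}_l/\partial w_{ij} = \gamma\,\partial g_l/\partial w_{ij}$. The component $u_i$ appears in only two constraints: with a plus sign in $g_{i-1} = \mathcal{Z}(u_i) - \mathcal{Z}(u_{i-1})$ and with a minus sign in $g_i = \mathcal{Z}(u_{i+1}) - \mathcal{Z}(u_i)$. Hence

$$\frac{\partial}{\partial w_{ij}}\,\frac{1}{2\gamma}\sum_{l=1}^{M-1}[\max\{0,\hat{g}_l\}]^2 = \left[\max\{0,\hat{g}_{i-1}\} - \max\{0,\hat{g}_i\}\right]\mathcal{Z}'(u_i)\,x_j,$$

while the $-\mu_l^2$ terms do not depend on W. Since eq.(9) sets $\mu_l(k+1) = \max\{0,\hat{g}_l(W(k))\}$, the bracket is the multiplier difference $\mu_{i-1} - \mu_i$ that appears in $\psi(u)$ of eq.(10).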
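With the learning equation of the standard ICNN given in (5) and the multiplier iterative equation (9), the iterative procedure to determine the demixing matrix W is given as follows:

$$\Delta W \propto \Delta_W\mathcal{L}(W,\mu) = W^{-T} + \psi(u)x^T \qquad (10)$$

where

$$\psi(u) = \begin{bmatrix} \varphi_1(u_1) - \mu_1\mathcal{Z}'(u_1) \\ \varphi_2(u_2) + (\mu_1 - \mu_2)\mathcal{Z}'(u_2) \\ \vdots \\ \varphi_{M-1}(u_{M-1}) + (\mu_{M-2} - \mu_{M-1})\mathcal{Z}'(u_{M-1}) \\ \varphi_M(u_M) + \mu_{M-1}\mathcal{Z}'(u_M) \end{bmatrix}$$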
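We use variance and kurtosis as example measures to impose an ordering on the signals. The functions $\mathcal{Z}$ and their first derivatives $\mathcal{Z}'$ then become:

variance:
$$\mathcal{Z}_{var}(u_i) = E\{u_i^2\}, \qquad \mathcal{Z}'_{var}(u_i) = 2u_i \qquad (11)$$

kurtosis:
$$\mathcal{Z}_{kur}(u_i) = \frac{E\{u_i^4\}}{(E\{u_i^2\})^2} - 3, \qquad \mathcal{Z}'_{kur}(u_i) = \frac{4u_i^3}{(E\{u_i^2\})^2} - \frac{4E\{u_i^4\}\,u_i}{(E\{u_i^2\})^3} \qquad (12)$$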
The component with the largest variance carries the largest share of the information contained in the input signals, so ordering by variance sorts the components by how much information each contributes to reconstructing the original signals. However, this ordering should be used together with other preprocessing or constraints, such as PCA or normalization, because the indeterminacy of standard ICA in the dilation of the demixing matrix may cause the variances of the output components to be amplified or reduced.
Normalized kurtosis is a fourth-order statistical measure. The kurtosis of a stationary signal to be extracted is unaffected by the indeterminacy in the signal's amplitude, and it reflects the signal's higher-order statistical character. Any signal can be categorized as super-Gaussian, Gaussian, or sub-Gaussian by its kurtosis. The components are thus ordered from sparse (super-Gaussian) to dense (sub-Gaussian) distributions. Kurtosis has been widely used in one-unit ICA [7]; in contrast to that sequential extraction, our approach can extract and order the components in parallel.
4.2 Normalization of Demixing Matrix 
The definition of ICA implies an indeterminacy in the norms of the mixing and demixing matrices, in contrast to, e.g., PCA. Rather than estimating the unknown mixing matrix A, the rows of the demixing matrix W can be normalized by adding a constraint term to the ICA energy function to establish a normalized demixing channel. The constrained ICA problem is then defined as follows:

$$\text{minimize } \mathcal{M}(W) \quad \text{subject to } h(W) = [h_1(W) \cdots h_M(W)]^T = 0 \qquad (13)$$

where h(W) defines a set of M equality constraints, $h_i(w_i) = w_i w_i^T - 1$ $(i = 1, \ldots, M)$ with $w_i$ the i-th row of W, which require the row norms of the demixing matrix W to equal 1.
Using Lagrange multiplier methods, the augmented Lagrangian function is defined based on eq.(2) as:

$$\mathcal{L}(W,\lambda) = \mathcal{M}(W) + \lambda^T \mathrm{diag}[WW^T - I] + \frac{\gamma}{2}\|\mathrm{diag}[WW^T - I]\|^2 \qquad (14)$$

where diag[·] denotes the operation that selects the diagonal elements of a square matrix as a vector.
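By applying the discrete Lagrange multiplier method, the iterative equation minimizing the augmented function for an individual multiplier $\lambda_i$ is

$$\lambda_i(k+1) = \lambda_i(k) + \gamma(w_i w_i^T - 1) \qquad (15)$$

and the iterative equation for the demixing matrix W is given as follows:

$$\Delta W \propto \Delta_W\mathcal{L}(W,\lambda) = W^{-T} + \varphi(u)x^T + \Omega(W) \qquad (16)$$

where the i-th row of $\Omega(W)$ is $\Omega_i(w_i) = 2\lambda_i w_i$.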
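The normalization bookkeeping of eqs.(14) to (16) reduces to row-wise operations. A minimal sketch follows, assuming W is a NumPy array with one demixing row per output; the function names are hypothetical, and the $\varphi(u)x^T$ part of eq.(16) is left out so that only the constraint terms are shown.

```python
import numpy as np

def row_norm_constraints(W):
    """h_i(w_i) = w_i w_i^T - 1 for every row of W, i.e. diag[W W^T - I] in eq.(14)."""
    return np.sum(W * W, axis=1) - 1.0

def update_lambda(lam, W, gamma):
    """Eq.(15): lambda_i <- lambda_i + gamma * (w_i w_i^T - 1)."""
    return lam + gamma * row_norm_constraints(W)

def omega(W, lam):
    """Omega(W) of eq.(16): row i is 2 * lambda_i * w_i, the gradient of lambda^T diag[W W^T - I]."""
    return 2.0 * lam[:, None] * W

W = np.array([[3.0, 4.0], [0.0, 1.0]])   # first row has norm 5, second row already has norm 1
lam = update_lambda(np.zeros(2), W, gamma=0.1)
print(row_norm_constraints(W), lam)
print(omega(W, np.array([1.0, 2.0])))
```

Only rows that violate the unit-norm condition accumulate a nonzero multiplier, so a row that already has norm 1 is left untouched by the constraint term.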
Assume that c is the normalized source with unit variance such that $E\{cc^T\} = I$, and that the input signal x is processed by a prewhitening matrix P such that $p = Px$ obeys $E\{pp^T\} = I$. Then, with the normalized demixing matrix W, each output has $E\{u_i^2\} = w_i E\{pp^T\} w_i^T = w_i w_i^T = 1$, so the network output u contains the exact independent components with unit magnitude, i.e. $u_i$ contains one $\pm c_j$ for some non-duplicative assignment $j \rightarrow i$.
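The identity $E\{u_i^2\} = w_i w_i^T$ for whitened inputs is easy to confirm numerically. This sketch uses hypothetical random mixtures and an arbitrary W whose rows are rescaled to unit norm; the variable names are our own. The same identity explains why the Variance column of Table 1 tracks the squared Norm column (for example, 1.23² ≈ 1.51 against 1.50 for y1 under unconstrained ICA).

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal((4, 4)) @ rng.uniform(-1, 1, size=(4, 10000))  # hypothetical mixtures
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
P_white = E @ np.diag(d ** -0.5) @ E.T            # prewhitening matrix (called P in the text)
p = P_white @ x                                   # np.cov(p) is the identity up to rounding

W = rng.standard_normal((4, 4))
W = W / np.linalg.norm(W, axis=1, keepdims=True)  # enforce h_i(w_i) = 0 for every row
u = W @ p
print(np.var(u, axis=1, ddof=1))                  # each entry equals ||w_i||^2 = 1
```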
5 Experiments and Results 
The CICA algorithms were simulated in MATLAB version 5. The learning procedure ran for 500 iterations with a given learning rate. All signals were preprocessed by a whitening process to have zero mean and unit variance. The accuracy of the recovered components relative to the source components was measured by the signal-to-noise ratio (SNR) in dB, where the signal power was measured by the variance of the source component and the noise was the mean square error between the source and recovered components. The performance of the network in separating the signals into ICs was measured by an individual performance index (IPI) of the permutation error $e_i$ for the i-th output:

$$e_i = \left(\sum_{j=1}^{n}\frac{|p_{ij}|}{\max_k |p_{ik}|}\right) - 1 \qquad (17)$$

where $p_{ij}$ are the elements of the permutation matrix $P = WA$. The IPI is close to zero when the corresponding output is nearly independent of the other components.
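Both evaluation measures are short computations. A minimal sketch, with hypothetical function names, assuming the recovered component has already been matched to its source in order and sign (the SNR definition above does not itself resolve the permutation or sign ambiguity):

```python
import numpy as np

def ipi(P):
    """Eq.(17): e_i = sum_j |p_ij| / max_k |p_ik| - 1 for each row of P = WA."""
    absP = np.abs(np.asarray(P, dtype=float))
    return absP.sum(axis=1) / absP.max(axis=1) - 1.0

def snr_db(source, recovered):
    """SNR in dB: variance of the source over the mean squared error to the recovered signal."""
    source = np.asarray(source, dtype=float)
    noise = np.mean((source - np.asarray(recovered, dtype=float)) ** 2)
    return 10.0 * np.log10(np.var(source) / noise)

print(ipi([[0.0, 2.0], [-1.0, 0.0]]))   # scaled permutation: both entries are 0
print(ipi([[1.0, 0.5], [0.2, 2.0]]))    # 0.5 and 0.1: residual crosstalk in each row
s = np.array([1.0, -1.0, 1.0, -1.0])
print(snr_db(s, 0.9 * s))               # a 10% amplitude error gives 20 dB
```

Because each $e_i$ is normalized by the largest entry in its row, the IPI ignores dilation and permutation and measures only crosstalk, which is why it can be compared between ICA and CICA even when their outputs have different scales.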
5.1 Ordering ICs in Signal Separation 
Three independent random signals with Gaussian, sub-Gaussian, and super-Gaussian distributions were simulated. Their statistical configurations were similar to those used in [1]. These source signals c were mixed with a random matrix to derive the inputs to the network. The network was trained to obtain the 3 × 3 demixing matrix using the kurtosis-constrained CICA algorithm of eq.(10) and (12) to separate the three independent components in a complete ICA manner.
The source components, mixed input signals, and resulting output waveforms are shown in Figure 1 (a), (b), and (c), respectively. The network separated the components and sorted them in decreasing order of kurtosis: component y1 had kurtosis 32.82 (> 0, super-Gaussian), y2 had −0.02 (≈ 0, Gaussian), and y3 had −1.27 (< 0, sub-Gaussian). The final performance index value of 0.28 and the average output SNR of 15 dB show that all three independent components were also well separated.

[Figure 1: panels (a), (b), (c)]
Figure 1: Result of extracting one super-Gaussian, one Gaussian, and one sub-Gaussian signal in descending order of kurtosis. Normalized kurtosis measurements are $\kappa_4(y_1) = 32.82$, $\kappa_4(y_2) = -0.02$ and $\kappa_4(y_3) = -1.27$. (a) Source components, (b) input mixtures, and (c) resulting components.
5.2 Demixing Matrix Normalization 
Three deterministic signals and one Gaussian noise signal were simulated in this experiment. All signals were independently generated with unit variance and mixed with a random mixing matrix. All input mixtures were preprocessed by a whitening process to have zero mean and unit variance. The signals were separated using both unconstrained ICA and constrained ICA, as given by eq.(5) and (16), respectively.
Table 1 compares the resulting demixing matrices, row norms, variances of the separated components, and SNR values. The dilation effect can be seen in the differences among the component variances caused by the non-normalized demixing matrix in unconstrained ICA. The CICA algorithm with the normalization constraint normalized the rows of the demixing matrix and separated the components with their variances kept at unity, so the source signals are recovered without dilation. The increase in the SNR values of the separated components under CICA can also be seen in the table. The source components, input mixtures, and components separated using normalization are shown in Figure 2, which shows that the signals produced by CICA match the source signals in both waveform and amplitude.

Method          Output   Demixing Matrix W (row)            Norm   Variance   SNR (dB)
Unconstrained   y1        0.90    0.08   -0.12   -0.82      1.23     1.50       4.55
ICA             y2       -0.06    1.11   -0.07    0.07      1.11     1.24      10.88
                y3        0.07    0.07    1.47   -0.09      1.47     2.17      21.58
                y4        1.04    0.08    0.04    1.16      1.56     2.43      16.60
Constrained     y1        0.65    0.43   -0.02   -0.61      0.99     0.98       4.95
ICA             y2       -0.37    0.91    0.05    0.20      1.01     1.02      13.94
                y3        0.01   -0.04    1.00   -0.04      1.00     1.00      25.04
                y4        0.65    0.07    0.02    0.76      1.00     1.00      22.56

Table 1: Comparison of the demixing matrix elements, row norms, output variances, and SNR values of the resulting components in ICA and in CICA with normalization.
Samples n Tme Series Samples n Tme Series Samples n Tme Senes 
(a) (b) (c) 
Figure 2: (a) Four source deterministic components with unit variances, (b) mixture 
inputs and (c) resulted components through normalized demixing channel W. 
6 Conclusion 
We have presented an approach to constrained ICA using Lagrange multiplier methods to eliminate the indeterminacy of permutation and dilation present in classical ICA. Our results provide a technique for systematically enhancing ICA's usability and performance through constraints, which are not restricted to the conditions treated in this paper. Other useful constraints can be incorporated in a similar manner to further improve the outputs of ICA in other practical applications. Simulation results demonstrate the accuracy and usefulness of the proposed algorithms.
References 
[1] Jagath C. Rajapakse and Wei Lu. Unified approach to independent component networks. In Second International ICSC Symposium on Neural Computation (NC'2000), 2000.
[2] A. Bell and T. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7:1129-1159, 1995.
[3] S. Amari, A. Cichocki, and H. Yang. A new learning algorithm for blind signal separation. In Advances in Neural Information Processing Systems 8, 1996.
[4] T-W. Lee, M. Girolami, and T. Sejnowski. Independent component analysis using an extended infomax algorithm for mixed sub-Gaussian and super-Gaussian sources. Neural Computation, 11(2):409-433, 1999.
[5] P. Comon. Independent component analysis: A new concept? Signal Processing, 36:287-314, 1994.
[6] Dimitri P. Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. New York: Academic Press, 1982.
[7] A. Hyvärinen and Erkki Oja. Simple neuron models for independent component analysis. International Journal of Neural Systems, 7(6):671-687, December 1996.
