DTIC ADA383267: The 10th IEEE Signal Processing Workshop on Statistical Signal and Array Processing

Proceedings  of  the 
Tenth  IEEE  Workshop  on 

Statistical  Signal  and  Array  Processing 

Sponsored  by 

The  IEEE  Signal  Processing  Society 

August  14-16,  2000 

Pocono  Manor  Inn,  Pocono  Manor,  Pennsylvania,  USA 


Supported  by 

Office  of  Naval  Research  Air  Force  Research  Laboratory  Villanova  University 

DTIC QUALITY INSPECTED 4


Form  Approved 
OMB  No.  0704-0188 

Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden to Washington Headquarters Service, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503.






3.  DATES  COVERED  (From  -  To) 

January  2000  -  September  2000 




N00014-00-1-0014


6.  AUTHOR(S) 



Moeness  G.  Amin 



Villanova  University 
800  Lancaster  Ave 
Villanova,  PA  19085 


Office  of  Naval  Research,  Program  Officer:  W.  Miceli 
Ballston  Center  Tower  One 
800  North  Quincy  Street 
Arlington,  VA  22217-5660 



Acc:  527639 



Approved  for  Public  Release;  Distribution  is  Unlimited 


This is the Proceedings of the 10th IEEE Workshop on Statistical Signal and Array Processing (SSAP), held at the Pocono Manor Inn, Pocono Manor, PA, during August 14-16, 2000. The Workshop featured four keynote speakers whose talks covered Radar and Sonar Signal Processing; Time-Delay Estimation; Space-Time Codes; and Multi-Carrier CDMA. The Workshop offered traditional and new research topics. It included one session each on Radar Signal Processing, Signal Processing for GPS, Network Traffic Modeling, Statistical Signal Processing, and Acoustical Signal Processing; two sessions each on Time-Frequency Analysis and Array Processing; three sessions on Second and Higher Order Statistics; and four sessions on Signal Processing for Communications. The Workshop received more paper submissions than any previous workshop in this series, and the technical committee carefully selected high-quality papers for presentation. The 2000 IEEE SSAP Workshop was a tremendous success in all respects.


Radar  Signal  Processing,  Statistical  Signal  Processing,  Signal  Processing  for  Communications, 
Time-Frequency  Analysis,  Array  Processing 

19b. TELEPHONE NUMBER (Include area code)

Standard  Form  298  (Rev.  8-98) 

Prescribed  by  ANSI-Std  Z39-18 

Proceedings  of  the  Tenth  IEEE  Workshop  on  Statistical  Signal  and  Array  Processing 

Copyright and Reprint Permission: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limit of U.S. copyright law for private use of patrons those articles in this volume that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923. For other copying, reprint or republication permission, write to IEEE Copyrights Manager, IEEE Operations Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331. All rights reserved. Copyright © 2000 by The Institute of Electrical and Electronics Engineers, Inc.

IEEE  Catalog  Number:  00TH8496 
ISBN:  0-7803-5988-7  (hardbound) 

Library  of  Congress  Number:  99-69422 

IEEE  SSAP-2000  Workshop  Committee 

General  and  Organizational  Chair 
Moeness  Amin 
Villanova  University,  USA. 
e-mail:moeness  @ 

Technical  Chair 
Mike  Zoltowski 
Purdue  University,  USA 

Kevin  Buckley 
Villanova  University,  USA 

Rick  Blum 

Lehigh  University,  USA 

Bill  Jemison 
Lafayette  College,  USA 

Local  Arrangement 
Wojtek  Berger 
University  of  Scranton,  USA 

Asian  Liaison 
Rahim  Leyman 

e-mail:EARLEYMAN  @ 

Australian  Liaison 
Abdelhak  Zoubir 


European  Liaison 
Pierre  Comon 


Table  of  Contents 


Multistage  Multiuser  Detection  for  CDMA  with  Space-Time  Coding 

Y.  Zhang  and  R.  S.  Blum  —  Lehigh  University .  1 

Adaptive  MAP  Multi-User  Detection  for  Fading  CDMA  Channels 

C.  Andrieu  and  A.  Doucet  —  Cambridge  University,  UK 

A.  Touzni  —  NxtWave  Communications .  6 

Analysis  of  a  Subspace  Channel  Estimation  Technique  for  Multicarrier  CDMA  Systems 

C.  J.  Escudero,  D.  I.  Iglesia,  M.  F.  Bugallo,  and  L.  Castedo  —  Universidad  de  La  Coruna,  Spain .  10 

Blind  Adaptive  Asynchronous  CDMA  Multiuser  Detector  Using  Prediction  Least  Mean  Kurtosis  Algorithm 

K. Wang and Y. Bar-Ness — New Jersey Institute of Technology .  15

MMSE  Equalization  for  Forward  Link  in  3G  CDMA:  Symbol-Level  Versus  Chip-Level 

T. P. Krauss, W. J. Hillery, and M. D. Zoltowski — Purdue University .  18

Transform  Domain  Array  Processing  for  CDMA  Systems 
Y.  Zhang  and  M.  G.  Amin  —  Villanova  University 

K.  Yang  —  ATR  Adaptive  Communications  Research  Laboratories,  Japan .  23 

Sectorized  Space-Time  Adaptive  Processing  for  CDMA  Systems 

K.  Yang,  Y.  Mizuguchi  —  ATR  Adaptive  Communications  Research  Laboratories,  Japan 

Y.  Zhang  —  Villanova  University .  28 

Demodulation  of  Amplitude  Modulated  Signals  in  the  Presence  of  Multipath 

Z.  Xu  and  P.  Liu  —  University  of  California .  33 

Multichannel  and  Block  Based  Precoding  Methods  for  Fixed  Point  Equalization  of  Nonlinear  Communication 


A.  J.  Redfern  —  Texas  Instruments 

G.  T.  Zhou  —  Georgia  Institute  of  Technology .  38 

Joint  Estimation  of  Propagation  Parameters  in  Multicarrier  Systems 

S.  Aouada  and  A.  Belouchrani  —  Ecole  Nationale  Polytechnique,  Algeria .  43 

OFDM  Spectral  Characterization:  Estimation  of  the  Bandwidth  and  the  Number  of  Sub-Carriers 

W.  Akmouche  —  CELAR,  France 

E.  Kerherve  and  A.  Quinquis  —  ENSIETA,  France .  48 

Blind  Source  Separation  of  Nonstationary  Convolutively  Mixed  Signals 

B.  S.  Krongold  and  D.  L.  Jones  —  University  of  Illinois  at  Urbana-Champaign .  53 

A  Versatile  Spatio-Temporal  Correlation  Function  for  Mobile  Fading  Channels  with  Non-Isotropic  Scattering 

A.  Abdi  and  M.  Kaveh  —  University  of  Minnesota  .  58 

Session  MA-2.  Array  Processing  I 

A  Batch  Subspace  ICA  Algorithm 

A.  Mansour  and  N.  Ohnishi  —  RIKEN,  Japan .  63 

Comparative Study of Two-Dimensional Maximum Likelihood and Interpolated Root-MUSIC with Application to Teleseismic Source Localization

P.J.  Chung  and  J.  F.  Bohme  —  Ruhr  University,  Germany 

A.  B.  Gershman  —  McMaster  University,  Canada  .  68 

Bounds  on  Uncalibrated  Array  Signal  Processing 

B.  M.  Sadler  —  Army  Research  Laboratory 

R.  J.  Kozick  —  Bucknell  University .  73 

Array Processing in the Presence of Unknown Nonuniform Sensor Noise: A Maximum Likelihood Direction Finding Algorithm and Cramér-Rao Bounds

M.  Pesavento  and  A.  B.  Gershman  —  McMaster  University,  Canada  .  78 

Matched  Symmetrical  Subspace  Detector 

V.  S.  Golikov  and  F.  C.  Pareja  —  Ciencia  y  Tecnologia  del  Mayab,  A.  C.,  Mexico .  83 


Multiple  Source  Direction  Finding  with  an  Array  of  M  Sensors  Using  Two  Receivers 

E. Fishler and H. Messer — Tel Aviv University, Israel .  86

Self-Stabilized  Minor  Subspace  Extraction  Algorithm  Based  on  Householder  Transformation 

K.  Abed-Meraim  and  S.  Attallah  —  National  University  of  Singapore,  Singapore 
A.  Chkeif  —  Telecom  Paris,  France 

Y.  Hua  —  University  of  Melbourne,  Australia  .  90 

A  Bootstrap  Technique  for  Rank  Estimation 

P.  Pelin,  R.  Brcich  and  A.  Zoubir  —  Curtin  University  of  Technology,  Australia .  94 

Detection-Estimation  of  More  Uncorrelated  Sources  than  Sensors  in  Noninteger  Sparse  Linear  Antenna  Arrays 

Y.  I.  Abramovich  and  N.  K.  Spencer  —  CSSIP,  Australia  .  99 

A  New  Gerschgorin  Radii  Based  Method  for  Source  Number  Detection 

H.  Wu  and  C.  Chen  —  Southern  Taiwan  University  of  Technology,  Taiwan .  104 


Adapting  Multitaper  Spectrograms  to  Local  Frequency  Modulation 

J.  W.  Pitton  —  University  of  Washington .  108 

Optimal  Subspace  Selection  for  Non-Linear  Parameter  Estimation  Applied  to  Refractivity  from  Clutter 

S.  Kraut  and  J.  Krolik  —  Duke  University .  113 

MAP  Model  Order  Selection  Rule  for  2-D  Sinusoids  in  White  Noise 

M.  A.  Kliger  and  J.  M.  Francos  —  Ben-Gurion  University,  Israel .  118 

Optimum  Linear  Periodically  Time-Varying  Filter 

D.  Wei  —  Drexel  University .  123 

Fast  Approximated  Sub-Space  Algorithms 

M.  A.  Hasan  —  University  of  Minnesota  Duluth 

A.  A.  Hasan  —  College  of  Electronic  Engineering,  Libya .  127 

Stochastic Algorithms for Marginal MAP Retrieval of Sinusoids in Non-Gaussian Noise

C.  Andrieu  and  A.  Doucet  —  University  of  Cambridge,  UK .  131 

Harmonic  Analysis  Associated  with  Spatio-Temporal  Transformations 

J. Leduc — Washington University in Saint Louis .  136


Blind  Noise  and  Channel  Estimation 

M. Frikel, W. Utschick, and J. Nossek — Technical University of Munich, Germany .  141

Multiuser  Detection  in  Impulsive  Noise  via  Slowest  Descent  Search 

P.  Spasojevic  —  Rutgers  University 

X.  Wang  —  Texas  A&M  University .  146 

Maximum  Likelihood  Delay-Doppler  Imaging  of  Fading  Mobile  Communication  Channels 

L.  M.  Davis  —  Bell  Laboratories,  Australia 

I. B. Collings — University of Sydney, Australia

R.  J.  Evans  —  University  of  Melbourne,  Australia  .  151 

Enhanced  Space-Time  Capture  Processing  for  Random  Access  Channels 

A.  M.  Kuzminskiy,  K.  Samaras,  C.  Luschi  and  P.  Strauch  —  Bell  Laboratories,  Lucent  Technologies,  UK .  156 

Asymmetric  Signaling  Constellations  for  Phase  Estimation 

T.  Thaiupathump,  C.  D.  Murphy  and  S.  A.  Kassam  —  University  of  Pennsylvania .  161 

A  Convex  Semi-Blind  Cost  Function  for  Equalization  in  Short  Burst  Communications 

K.  K.  Au  and  D.  Hatzinakos  —  University  of  Toronto,  Canada .  166 



Performance  Analysis  of  Blind  Carrier  Phase  Estimators  for  General  QAM  Constellations 

E.  Serpedin  —  Texas  A&M  University 

P. Ciblat and P. Loubaton — Université de Marne-la-Vallée, France

G.  B.  Giannakis  —  University  of  Minnesota  .  171 

Unbiased  Parameter  Estimation  for  the  Identification  of  Bilinear  Systems 

S.  Meddeb,  J.  Y.  Tourneret  and  F.  Castanie  —  ENSEEIHT  /VESA,  France .  176 

Blind  Identification  of  Linear-Quadratic  Channels  with  Usual  Communication  Inputs 

N.  Petrochilos  —  Delft  University  of  Technology,  Netherlands 

P. Comon — Université de Nice, France .  181

Joint  Channel  Estimation  and  Detection  for  Interference  Cancellation  in  Multi-Channel  Systems 

C.  Martin  and  B.  Ottersten  —  Royal  Institute  of  Technology  (KTH),  Sweden .  186 

A Spatial Clustering Scheme for Downlink Beamforming in SDMA Mobile Radio

W. Huang and J. F. Doherty — Pennsylvania State University .  191

On  the  Use  of  Cyclostationary  Filters  to  Transmit  Information 

A. Duverdier — CNES, France

B.  Lacaze  and  J.  Tourneret  —  ENSEEIHT/SIC,  France .  196 

Non-Parametric  Trellis  Equalization  in  the  Presence  of  Non-Gaussian  Interference 

C.  Luschi  —  Bell  Laboratories,  Lucent  Technologies,  UK 

B.  Mulgrew  —  University  of  Edinburgh,  UK  .  201 

Analytical  Blind  Identification  of  a  SISO  Communication  Channel 

O. Grellier and P. Comon — Université de Nice, France .  206

The  Role  of  Second-Order  Statistics  in  Blind  Equalization  of  Nonlinear  Channels 

R.  Lopez-Valcarce  and  S.  Dasgupta  —  University  of  Iowa .  211 

On  Super-Exponential  Algorithm,  Constant  Modulus  Algorithm  and  Inverse  Filter  Criteria  for  Blind  Equalization 

C.  Chi,  C.  Chen  and  B.  Li  —  National  Tsing  Hua  University,  Taiwan  .  216 


An  Efficient  Algorithm  for  Gaussian-Based  Signal  Decomposition 

Z. Hong and B. Zheng — Xidian University, China .  221

Consistent Estimation of Signal Parameters in Non-Stationary Noise

J. Friedmann, E. Fishler and H. Messer — Tel Aviv University, Israel .  225

Channel  Order  and  RMS  Delay  Spread  Estimation  for  AC  Power  Line  Communications 

H.  Li  —  Stevens  Institute  of  Technology 
Z.  Bi  and  J.  Li  —  University  of  Florida 

D.  Liu  —  Watson  Research  Center 

P.  Stoica  —  Uppsala  University,  Sweden  .  229 

Taylor  Series  Adaptive  Processing 

D.  J.  Rabideau  —  Massachusetts  Institute  of  Technology  . . .  234 

Adaptive  Bayesian  Signal  Processing — A  Sequential  Monte  Carlo  Paradigm 

X. Wang and R. Chen — Texas A&M University

J.  S.  Liu  —  Stanford  University .  239 

QQ-Plot  Based  Probability  Density  Function  Estimation 

Z. Djurovic and V. Barroso — Instituto Superior Técnico — Instituto de Sistemas e Robótica, Portugal

B. Kovacevic — University of Belgrade, Yugoslavia .  243

Nonlinear  System  Inversion  Applied  to  Random  Variable  Generation 

A. Pagès-Zamora, M. A. Lagunas and X. Mestre — Universitat Politècnica de Catalunya, Spain .  248

The  Numerical  Spread  as  a  Measure  of  Non-Stationarity:  Boundary  Effects  in  the  Numerical  Expected  Ambiguity 

R.  A.  Hedges  and  B.  W.  Suter  —  Air  Force  Research  Laboratory  IFGC  .  252 



Locally  Stationary  Processes 

M.  E.  Oxley  and  T.  F.  Reid  —  Air  Force  Institute  of  Technology 

B.  W.  Suter  —  Air  Force  Research  Laboratory .  257 

Statistical Performance Comparison of a Parametric and a Non-Parametric Method for IF Estimation of Random Amplitude Linear FM Signals in Additive Noise

M.  R.  Morelande,  B.  Barkat  and  A.  M.  Zoubir  —  Curtin  University  of  Technology,  Australia  .  262 


The  Application  of  a  Nonlinear  Inverse  Noise  Cancellation  Technique  to  Maritime  Surveillance  Radar 

M. R. Cowper and B. Mulgrew — University of Edinburgh, UK .  267

Adaptive  Digital  Beamforming  RADAR  for  Monopulse  Angle  Estimation  in  Jamming 

K.  Yu  —  GE  Research  &  Development  Center 

D.  J.  Murrow  —  Lockheed  Martin  Ocean,  Radar  &  Sensors  Systems .  272 

Statistical  Analysis  of  SMF  Algorithm  for  Polynomial  Phase  Signals  Analysis 

A. Ferrari and G. Alengrin — Université de Nice Sophia-Antipolis, France .  276

Passive Sonar Signature Estimation Using Bispectral Techniques

R. K.  Lennartsson,  J.W.C.  Robinson,  and  L.  Persson  —  Defence  Research  Establishment,  Sweden 
M.J.  Hinich  —  University  of  Texas  at  Austin 

S.  McLaughlin  —  University  of  Edinburgh,  UK .  281 

Approximate  CFAR  Signal  Detection  in  Strong  Low  Rank  Non-Gaussian  Interference 

I. P. Kirsteins — Naval Undersea Warfare Center

M.  Rangaswamy  —  ARCON  Corporation .  286 

Blind  Equalization  of  Phase  Aberrations  in  Coherent  Imaging:  Medical  Ultrasound  and  SAR 

S.  D.  Silverstein  —  University  of  Virginia  .  291 

False  Detection  of  Chaotic  Behaviour  in  the  Stochastic  Compound  K-Distribution  Model  of  Radar 
Sea  Clutter 

C. P.  Unsworth,  M.R.  Cowper,  S.  McLaughlin,  and  B.  Mulgrew  —  University  of  Edinburgh,  UK .  296 


Recursive  Estimator  for  Separation  of  Arbitrarily  Kurtotic  Sources 

M. Enescu and V. Koivunen — Helsinki Univ. of Technology, Finland .  301

A  Second  Order  Multi  Output  Deconvolution  (SOMOD)  Technique 

H.  Bousbia-Salah  and  A.  Belouchrani  —  Ecole  Nationale  Polytechnique,  Algeria .  306 

DOA  Estimation  of  Many  W-Disjoint  Orthogonal  Sources  from  Two  Mixtures  Using  Duet 
S.  Rickard  —  Princeton  University 

F.  Dietrich  —  Siemens  Corporate  Research .  311 

Blind  Separation  of  Non-Circular  Sources 

J.  Galy  —  LIRMM,  France 

C. Adnet — Thomson-CSF Airsys, France .  315

Blind  Identification  of  Slightly  Delayed  Mixtures 

G. Chabriel and J. Barrère — Université de Toulon et du Var, France .  319

Robust  Source  Separation  Using  Ranks 

L.  Xiang,  Y.  Zhang  and  S.  A.  Kassam  —  University  of  Pennsylvania .  324 

Semi-Blind  Maximum  Likelihood  Separation  of  Linear  Convolutive  Mixtures 

J. Xavier and V. Barroso — Instituto Superior Técnico — Instituto de Sistemas e Robótica, Portugal .  329

Techniques  for  Blind  Source  Separation  Using  Higher-Order  Statistics 

Z.  M.  Kamran  and  A.  R.  Leyman  —  Nanyang  Technological  University,  Singapore 

K.  Abed-Meraim  —  ENST/TSI,  France .  334 



Joint-Diagonalization  of  Cumulant  Tensors  and  Source  Separation 

E.  Moreau  —  MS-GESSY,  ISITV,  France .  339 

New  Criteria  for  Blind  Signal  Separation 

N.  Thirion-Moreau  and  E.  Moreau  —  MS-GESSY,  ISITV,  France  .  344 

An  Iterative  Algorithm  Using  Second  Order  Moments  Applied  to  Blind  Separation  of  Sources  with  Same  Spectral 

J. Cavassilas, B. Xerri and B. Borloz — Université de Toulon et du Var, France .  349

Performance  of  Cumulant  Based  Inverse  Filter  Criteria  for  Blind  Deconvolution  of  Multi-Input  Multi-Output  Linear 
Time-Invariant  Systems 

C.  Chi  and  C.  Chen  —  National  Tsing  Hua  University,  Taiwan .  354 

Separation of Non-Stationary Sources: Achievable Performance

J.  Cardoso  —  C.N.R.S./E.N.S.T.,  France .  359 

Modified  BSS  Algorithms  Including  Prior  Statistical  Information  about  Mixing  Matrix 

J.  Igual  and  L.  Vergara  —  Universidad  Politecnica  Valencia,  Spain .  364 

Approximate  Maximum  Likelihood  Blind  Source  Separation  with  Arbitrary  Source  PDFs 
M.  Ghogho  and  T.  Durrani  —  University  of  Strathclyde,  UK 

A.  Swami  —  Army  Research  Lab .  368 


Power  Spectral  Density  Analysis  of  Randomly  Switched  Pulse  Width  Modulation  for  DC/AC  Converters 

R.  L.  Kirlin  —  University  of  Victoria,  Canada 
M.  M.  Bech  —  University  of  Aalborg,  Denmark 

A. M. Trzynadlowski — University of Nevada Reno .  373

Study  on  Spectral  Analysis  and  Design  for  DC/DC  Conversion  Using  Random  Switching  Rate  PWM 

R.  L.  Kirlin,  J.  Wang,  and  R.  M.  Dizaji  —  University  of  Victoria,  Canada  .  378 

Spectral  Subtraction  and  Spectral  Estimation 

M.  A.  Lagunas  and  A.  I.  Perez-Neira  —  Campus  Nord  UPC,  Spain  .  383 

Parameter  Estimation:  The  Ambiguity  Problem 

V.  Lefkaditis  and  A.  Manikas  —  Imperial  College  of  Science,  Technology  and  Medicine,  UK .  387 

On  Multiwindow  Estimators  for  Correlation 

A.  Hanssen  —  University  of  Tromso,  Norway .  391 

Asymptotic  Analysis  of  the  Least  Squares  Estimate  of  2-D  Exponentials  in  Colored  Noise 

G.  Cohen  and  J.  M.  Francos  —  Ben-Gurion  University,  Israel .  396 

Cross-Spectral  Methods  for  Processing  Biological  Signals 

D. J. Nelson — Department of Defense .  400

Default  Prior  for  Robust  Bayesian  Model  Selection  of  Sinusoids  in  Gaussian  Noise 

C.  Andrieu  —  Cambridge  University,  UK 

J.-M. Perez — Universidad Simón Bolívar, Venezuela .  405

On  the  Exact  Solution  to  the  “Gliding  Tone”  Problem 

L.  Galleani  and  L.  Cohen  —  City  University  of  New  York .  410 

Baseline  and  Distribution  Estimates  of  Complicated  Spectra 

D. J.  Thomson  —  Bell  Labs  .  414 


Distributed  Source  Localization  with  Multiple  Sensor  Arrays  and  Frequency-Selective  Spatial  Coherence 

R.  J.  Kozick  —  Bucknell  University 

B.  M.  Sadler  —  Army  Research  Laboratory .  419 



Deterministic  Maximum  Likelihood  DOA  Estimation  in  Heterogeneous  Propagation  Media 

P.  Stoica  —  Uppsala  University,  Sweden 
O.  Besson  —  ENSICA,  France 

A.  B.  Gershman  —  McMaster  University,  Canada  . 

Efficient  Signal  Detection  in  Perturbed  Arrays 

A.  M.  Rao  and  D.  L.  Jones  —  University  of  Illinois . 

A  Neural  Network  Approach  for  DOA  Estimation  and  Tracking 

L. Badidi and L. Radouane — LESSI, Morocco .

Partially  Adaptive  Array  Algorithm  Combined  with  CFAR  Technique  in  Transform  Domain 

S.  Moon,  D.  Yun,  and  D.  Han  —  Kyungpook  National  University,  Korea  . 

A  New  Beamforming  Algorithm  Based  on  Signal  Subspace  Eigenvectors 

M.  Biguesh  and  M.  H.  Bastani  —  Sharif  University  of  Technology,  Iran 
S.  Valaee  —  Tarbiat  Modares  University,  Iran 

B.  Champagne  —  McGill  University,  Canada . 

Detection  of  Sources  in  Array  Processing  Using  the  Bootstrap 

R.  Brcich,  P.  Pelin  and  A.  Zoubir  —  Curtin  University  of  Technology,  Australia  . 

Robust  Localization  of  Scattered  Sources 

J.  Tabrikian  —  Ben-Gurion  University,  Israel 

H.  Messer  —  Tel  Aviv  University,  Israel . 


ISAR  Imaging  and  Crystal  Structure  Determination  from  EXAFS  Data  Using  a  Super-Resolution  Fast  Fourier 

G.  Zweig  —  Signition,  Inc. 

B.  Wohlberg  —  Los  Alamos  National  Laboratory . 

Analysis  of  Radar  Micro-Doppler  Signature  With  Time-Frequency  Transform 

V.  C.  Chen  —  Naval  Research  Laboratory . 

Estimating  the  Parameters  of  Multiple  Wideband  Chirp  Signals  in  Sensor  Arrays 

A.  B.  Gershman  and  M.  Pesavento  —  McMaster  University,  Canada 

M.  G.  Amin  —  Villanova  University . 

On  the  Use  of  Space-Time  Adaptive  Processing  and  Time-Frequency  Data  Representations  for  Detection  of  Near- 
Stationary  Targets  in  Monostatic  Clutter 

D.  C.  Braunreiter,  H.-W.  Chen,  M.  L.  Cassabaum,  J.  G.  Riddle,  A.  A.  Samuel,  J.  F.  Scholl  and  H.  A.  Schmitt  — 
Raytheon  Missile  Systems . 

Application  of  Adaptive  Joint  Time-Frequency  Processing  to  ISAR  Image  Formation 

H.  Ling  and  J.  Li  —  University  of  Texas  at  Austin . 

Joint  Time-Frequency  Analysis  of  SAR  Data 

R.  Fiedler  and  R.  Jansen  —  Naval  Research  Laboratory . 

Pulse  Propagation  in  Dispersive  Media 

L. Cohen — City University of New York .


Wavelet-Based  Models  for  Network  Traffic 

D.  Wei  and  H.  Cheng  —  Drexel  University . 

The  Extended  On/Off  Process  for  Modeling  Traffic  in  High-Speed  Communication  Networks 

X.  Yang,  A.  P.  Petropulu  and  V.  Adams  —  Drexel  University  . 

A  Simulation  Study  of  the  Impact  of  Switching  Systems  on  Self-Similar  Properties  of  Traffic 

Y.  Zhou  and  H.  Sethu  —  Drexel  University . 



Parameter  Estimation  in  Farima  Processes  with  Applications  to  Network  Traffic  Modeling 

J.  Ilow  —  Dalhousie  University,  Canada .  505 


Nonlinear Filtering Algorithm with Its Application in INS Alignment

R.  Zhao  and  Q.  Gu  —  Tsinghua  University,  China . .  510 

GPS  Jammer  Suppression  with  Low-Sample  Support  Using  Reduced-Rank  Power  Minimization 

W.  L.  Myrick  and  M.  D.  Zoltowski  —  Purdue  University 

J.  S.  Goldstein  —  SAIC .  514 

Jammer  Excision  in  Spread  Spectrum  Using  Discrete  Evolutionary-Hough  Transform  and  Singular  Value 

R.  Suleesathira  and  L.  F.  Chaparro  —  University  of  Pittsburgh  .  519 

Spatial  and  Temporal  Processing  of  GPS  Signals 

P.  Xiong  and  S.  N.  Batalama  —  State  University  of  New  York  at  Buffalo 

M.  J.  Medley  —  Air  Force  Research  Laboratory .  524 

Subspace  Projection  Techniques  for  Anti-FM  Jamming  GPS  Receivers 

L. Zhao and M. G. Amin — Villanova University

A.  R.  Lindsey  —  Air  Force  Research  Laboratory .  529 

Session  TP-4.  WAVELETS 

Fixed-Point  HAAR-Wavelet-Based  Echo  Canceller 

M.  Doroslovacki  and  I.  Khan  —  George  Washington  University 

B.  Kosanovic  —  Texas  Instruments .  534 

Wavelet-Polyspectra:  Analysis  of  Non-Stationary  and  Non-Gaussian/Non-Linear  Signals 

Y.  Larsen  and  A.  Hanssen  —  University  of  Tromso,  Norway .  539 

Adaptive  Seismic  Compression  by  Wavelet  Shrinkage 

M. F. Khène and S. H. Abdul-Jauwad — King Fahd University of Petroleum & Minerals, Saudi Arabia .  544

Representations  of  Stochastic  Processes  Using  COIFLET-Type  Wavelets 

D.  Wei  and  H.  Cheng  —  Drexel  University .  549 


Time-Frequency  Coherence  Analysis  of  Nonstationary  Random  Processes 

G. Matz and F. Hlawatsch — Vienna University of Technology, Austria .  554

Multi-Component  IF  Estimation 

Z. M. Hussain and B. Boashash — Queensland University of Technology, Australia .  559

Detection  of  Seizures  in  Newborns  Using  Time-Frequency  Analysis  of  EEG  Signals 

B.  Boashash,  H.  Carson  and  M.  Mesbah  —  Queensland  University  of  Technology,  Australia .  564 

Multitaper  Reduced  Interference  Distribution 

S.  Aviyente  and  W.  J.  Williams  —  University  of  Michigan .  569 

Instantaneous  Spectral  Skew  and  Kurtosis 

P.  J.  Loughlin  and  K.  L.  Davidson  —  University  of  Pittsburgh .  574 

Adaptive  Time-Frequency  Representations  for  Multiple  Structures 

A.  Papandreou-Suppappola  —  Arizona  State  University 

S.  B.  Suppappola  —  Pipeline  Technologies,  Inc .  579 

A  Resolution  Performance  Measure  for  Quadratic  Time-Frequency  Distributions 

B. Boashash and V. Sucic — Queensland University of Technology, Australia .  584

The  Wigner  Distribution  for  Ordinary  Linear  Differential  Equations  and  Wave  Equations 

L.  Galleani  and  L.  Cohen  —  City  University  of  New  York .  589 



Application  of  Time-Frequency  Techniques  for  the  Detection  of  Anti-Personnel  Landmines 

B. Barkat, A. M. Zoubir and C. L. Brown — Curtin University of Technology, Australia

A  New  Matrix  Decomposition  Based  on  Optimum  Transformation  of  the  Singular  Value  Decomposition  Basis  Sets 
Yields  Principal  Features  of  Time-Frequency  Distributions 

D.  Groutage  —  Naval  Surface  Warfare  Center 

D. Bennink — Applied Measurements Systems Intl. .

Minimum  Entropy  Time-Frequency  Distributions 

A.  El-Jaroudi  —  University  of  Pittsburgh . 

Uncertainty  in  the  Time-Frequency  Plane 

P. M. Oliveira — Escola Naval, Portugal

V. Barroso — Instituto Superior Técnico, ISR/DEEC, Portugal .

High  Resolution  Frequency  Tracking  via  Non-Negative  Time-Frequency  Distributions 

R.  M.  Nickel  and  W.  J.  Williams  —  University  of  Michigan . 







A  Cumulant  Subspace  Approach  to  FIR  Multiuser  Channel  Estimation 

J.  Liang  and  Z.  Ding  —  University  of  Iowa . 

An Efficient Fourth Order System Identification (FOSI) Algorithm Utilizing the Joint Diagonalization Procedure

A. Belouchrani — Ecole Nationale Polytechnique, Algeria

B.  Derras  —  Cirrus  Logic  Inc. . 

Unity-Gain  Cumulant-Based  Adaptive  Line  Enhancer 

R.  R.  Gharieb  and  A.  Cichocki  —  RIKEN,  Japan 

Y.  Horita  and  T.  Murai  —  Toyama  University,  Japan . 

Adaptive Detection and Extraction of Sparse Signals Embedded in Colored Gaussian Noise Using Higher Order

R.  R.  Gharieb  and  A.  Cichocki  —  RIKEN,  Japan 

S.  F.  Filipowicz  —  Warsaw  University  of  Technology,  Poland . 

Higher-Order  Matched  Field  Processing 

R.  M.  Dizaji,  R.  L.  Kirlin,  and  N.  R.  Chapman  —  University  of  Victoria,  Canada 
Multiwindow  Bispectral  Estimation 

Y. Birkelund and A. Hanssen — University of Tromso, Norway


Global  Convergence  of  a  Single-Axis  Constant  Modulus  Algorithm 

A.  Shah,  S.  Biracree,  R.  A.  Casas,  T.  J.  Endres,  S.  Hulyalkar,  T.  A.  Schaffer,  and  C.  H.  Strolle  —  NxtWave 
Communications  . 

A  Novel  Modulation  Method  for  Secure  Digital  Communications 

A. Salberg and A. Hanssen — University of Tromso, Norway

A  Multitime-Frequency  Approach  for  Detection  and  Classification  of  Noisy  Frequency  Modulations 

M.  Colas,  G.  Gelle,  and  G.  Delaunay  —  L.A.M.-URCA,  France 
J.  Galy  —  L.I.R.M.M.,  France . 

NDA  PLL  Design  for  Carrier  Phase  Recovery  of  QPSK/TDMA  Bursts  without  Preamble 

J.  Lee  —  COMSAT  Laboratories . 

An  Optimized  Multi-Tone  Calibration  Signal  for  Quadrature  Receiver  Communication  Systems 

R.  A.  Green  —  North  Dakota  State  University . 

A  Polynomial  Rooting  Approach  for  Synchronization  in  Multipath  Channels  Using  Antenna  Arrays 

G. Seco and J. A. Fernández-Rubio — Univ. Politècnica de Catalunya, Spain
A.  L.  Swindlehurst  —  Brigham  Young  University 


Super-Exponential-Estimator  for  Fast  Blind  Channel  Identification  of  Mobile  Radio  Fading  Channels 

A.  Schmidbauer  —  Munich  University  of  Technology,  Germany .  673 

Finite  Data  Record  Maximum  SINR  Adaptive  Space-Time  Processing 

I.  N.  Psaromiligkos  and  S.  N.  Batalama  —  State  University  of  New  York  at  Buffalo .  677 

On  the  Effects  of  Rotating  Blades  on  DS/SS  Communication  Systems 

Y. Zhang and M. G. Amin — Villanova University

V.  Mancuso  —  Boeing  Helicopter  Division .  682 

Joint  Synchronization  and  Symbol  Detection  in  Asynchronous  DS-CDMA  Systems 

F. Rey, G. Vázquez, and J. Riba — Polytechnic University of Catalonia, Spain .  687

New  Criteria  for  Blind  Equalization  of  M-PSK  Signals 

Z.  Xu  and  P.  Liu  —  University  of  California .  692 

Third-Order  Blind  Equalization  Properties  of  Hexagonal  Constellations 

C.  D.  Murphy  —  Helsinki  University  of  Technology,  Finland .  697 


Comparison  of  the  Cyclostationary  and  the  Bilinear  Approaches:  Theoretical  Aspects  and  Applications  to  Industrial 


L. Bouillaut and M. Sidahmed — Université de Technologie de Compiègne, France .  702

Array  Processing  of  Underwater  Acoustic  Sensors  Using  Weighted  Fourier  Integral  Method 

I. S. D. Solomon and A. J. Knight — Defence Science and Technology Organisation, Australia .  707

A  Hierarchical  Algorithm  for  Nearfield  Acoustic  Imaging 

M.  Peake  and  M.  Karan  —  CSSIP,  Australia 

D.  Gray — University  of  Adelaide,  Australia .  712 

An  Introduction  to  Synthetic  Aperture  Sonar 

D.  Marx,  M.  Nelson,  E.  Chang,  W.  Gillespie,  A.  Putney,  and  K.  Warman  —  Dynamics  Technology,  Inc .  717 

Classification  of  Acoustic  and  Seismic  Data  Using  Nonlinear  Dynamical  Signal  Models 

R.  K.  Lennartsson  —  Defence  Research  Establishment,  Sweden 

A.  Pentek  and  J.  B.  Kadtke  —  University  of  California .  722 

The  Performance  of  Sparse  Time-Reversal  Mirrors  in  the  Context  of  Underwater  Communications 

J. Gomes and V. Barroso — Instituto Superior Técnico — Instituto de Sistemas e Robótica, Portugal .  727

Beam  Patterns  of  an  Underwater  Acoustic  Vector  Hydrophone 

K.  T.  Wong  —  Chinese  University  of  Hong  Kong,  China 

H.  Chi  —  Purdue  University  .  732 



Yumin  Zhang  and  Rick  S.  Blum 

EECS  Department,  Lehigh  University 
Bethlehem,  PA  18015 

Abstract

The  combination  of  Turbo  codes  and  space-time  block 
codes  is  studied  for  use  in  CDMA  systems.  Each  user’s 
data  are  first  encoded  by  a  Turbo  code.  The  Turbo  coded 
data  are  next  sent  to  a  space-time  block  encoder  which 
employs  a  BPSK  constellation.  The  space-time  en¬ 
coder  output  symbols  are  transmitted  through  the  fading 
channel  using  multiple  antennas.  A  multistage  receiver 
is  proposed  using  non-linear  MMSE  estimation  and  a 
parallel  interference  cancellation  scheme.  Simulations 
show that with reasonable levels of multiple access interference (p < 0.3), near single user performance is
achieved.  The  receiver  structure  is  generalized  to  de¬ 
code  CDMA  signals  with  space-time  convolutional  cod¬ 
ing  and  similar  performance  is  observed. 

1 Introduction

Space-time  codes  [l]-[4]  use  multiple  transmit  and  re¬ 
ceive  antennas  to  achieve  diversity  and  coding  gain  for 
communication  over  fading  channels.  High  bandwidth 
efficiency  is  achieved,  with  performance  close  to  the 
theoretical  outage  capacity  [1].  Turbo  codes  [5]  are 
a  family  of  powerful  channel  codes,  which  have  been 
shown  to  achieve  near  Shannon  capacity  over  additive 
white  Gaussian  noise  channels.  Since  their  introduc¬ 
tion,  both  space-time  codes  and  Turbo  codes  have  re¬ 
ceived  considerable  attention.  In  the  CDMA2000  Ra¬ 
dio  Transmission  Technology  (RTT)  proposed  for  the 
third  generation  systems,  both  space-time  codes  and 
Turbo  codes  have  been  adopted  [6]. 

Although  papers  treating  either  just  space-time  codes 
or  Turbo  codes  abound,  jointly  considering  space-time 
codes  and  Turbo  codes  in  CDMA  systems  is  a  relatively 
new  topic.  In  this  paper,  we  initiate  a  study  on  this 
topic  where  we  focus  on  space-time  block  codes  [3]  [4], 
Our research develops the suboptimum low-complexity receivers that such systems will need.

This  paper  is  organized  as  follows.  Section  2  first 
sets  up  the  system  configuration  and  develops  the  re¬ 
ceived  signal  model.  A  brief  review  of  space-time  block 

codes  is  given  in  Section  3.  The  structure  of  our  mul¬ 
tistage  receiver  is  discussed  in  Section  4.  Section  5 
presents  simulation  results.  Conclusions  are  given  in 
Section  6. 

2 System Model

Fig.  2  depicts  a  K  user  synchronous  CDMA  system 
with  combined  Turbo  coding  and  space-time  block  cod¬ 
ing.  There  are  N  transmit  antennas  and  M  receive  an¬ 
tennas  in  the  system.  Suppose  user  k,  k  =  1, ...,  K,  has 
a  block  of  binary  information  bits  {dk{i),i  =  1,  ■■■,  Lx) 
to  transmit.  These  bits  are  first  encoded  by  a  Turbo 
code  with  rate  Rx  =  The  bits  which  are  produced 

by  the  Turbo  encoder,  denoted  by  {dk{i),i  =  1, ...,  L2}, 
are  passed  to  a  space-time  block  encoder.  This  space- 
time  block  code  uses  a  transmission  matrix  Gn  [3]  with 
a  BPSK  constellation,  generates  N  output  bits  dur¬ 
ing  each  time  slot,  and  has  rate  R2  =  qjf.  During 
time  slot  l,  N  bits  are  transmitted,  which  are  denoted 
by  {b„k(l),  n  =  l,...,iV},  for  l  =  1  The  bit 

bnk{l)  £  {—1,4-1}  is  spread  using  a  unique  spreading 
waveform  s&(t)  and  transmitted  using  antenna  n.  For 
convenience  we  denote  the  vector  of  nth  output  bits 
from  all  K  users  as  b „(/)  =  [bni(l),  ...,bnK(l)]T ,  and 
we  note  that  all  of  these  bits  are  transmitted  by  an¬ 
tenna  n  during  time  slot  l.  We  define  the  set  of  bits 
{b„(f),  l  =  0,  ...,L  —  1}  as  one  frame  of  data. 

The  fading  coefficient  for  the  path  between  transmit 
antenna  n  and  receive  antenna  m  is  denoted  by  anm .  In 
our  research,  we  assume  a  flat  quasi-static  fading  envi¬ 
ronment  [3],  where  the  fading  coefficients  are  constant 
during  a  frame  and  are  independent  from  one  frame  to 
another.  Further  we  assume  for  simplicity  that  perfect 
estimates  of  all  fading  coefficients  are  available  at  the 
receiver.  The  received  signal  at  antenna  m  is 
r_m(t) = Σ_{n=1}^{N} Σ_{k=1}^{K} Σ_{l=0}^{L−1} α_{nm} A_k b_{nk}(l) s_k(t − lT) + η_m(t)    (1)

where  T  is  the  bit  period,  Ak  is  the  transmitted  signal 

0-7803-5988-7/00/$10.00 © 2000 IEEE


amplitude  for  user  k,  and  r]m(t)  is  the  complex  channel 
noise  at  receive  antenna  m.  The  received  signal  rm(t ) 
is  next  passed  through  a  matched  filter  bank,  with  each 
filter matched to one user's spreading waveform. Denote the matched filter outputs at receive antenna m for time slot j by y_m(j) = [y_{m1}(j), ..., y_{mK}(j)]^T. The equation describing y_m(j) can be represented in vector form as


y_m(j) = R A Σ_{n=1}^{N} α_{nm} b_n(j) + n_m(j),   m = 1, ..., M,   j = 0, ..., L − 1,    (2)

where R is the K × K cross-correlation matrix of the spreading codes, A = diag(A_1, ..., A_K), and n_m(j) is the K × 1 complex noise vector after matched filtering. Assuming the channel noise is Gaussian with zero mean and autocorrelation function σ²δ(τ), n_m(j) has a multidimensional Gaussian distribution N(0, σ²R).

3 Space-Time Block Codes

An  extensive  discussion  of  space-time  block  codes  is 
given in [3] [4]. Here we consider only the N = 2 antenna case; extension to N > 2 is straightforward. A BPSK space-time block code with two transmit antennas is described by the transmission matrix

G_2 = (  s_1   s_2
        −s_2   s_1 )    (3)

The encoder works as follows. The block of L_2 Turbo coded bits enters the encoder and is grouped into units of two bits. Each group of two bits is mapped to a pair of BPSK symbols s_1 and s_2. These symbols are transmitted during two consecutive time slots. During the first time slot, s_1 and s_2 are transmitted simultaneously from antennas one and two, respectively. During the second time slot, −s_2 and s_1 are transmitted simultaneously from antennas one and two, respectively. The code rate of G_2 is 1.

In  [3]  [4],  the  transmission  matrix  is  designed  so  that 
the  columns  are  orthogonal  to  each  other.  This  allows 
a  simple  receiver  structure  using  only  linear  processing. 
We  illustrate  this  using  the  code  described  in  (3)  as  an 
example. Extension to N > 2 is straightforward. Assuming there are M receive antennas, the received signals at antenna m during the first and second time slots, denoted by y_m(1) and y_m(2), are

y_m(1) = α_{1m} s_1 + α_{2m} s_2 + n_m(1)

y_m(2) = −α_{1m} s_2 + α_{2m} s_1 + n_m(2)    (4)

where n_m(1) and n_m(2) are two i.i.d. complex Gaussian noise samples with variance σ². The observations in (4) can be combined to yield the improved quantities ŝ_1 and ŝ_2 using

ŝ_1 = α*_{1m} y_m(1) + α_{2m} y*_m(2)
    = (|α_{1m}|² + |α_{2m}|²) s_1 + α*_{1m} n_m(1) + α_{2m} n*_m(2)

ŝ_2 = α*_{2m} y_m(1) − α_{1m} y*_m(2)
    = (|α_{1m}|² + |α_{2m}|²) s_2 + α*_{2m} n_m(1) − α_{1m} n*_m(2)

Combining the quantities obtained at each receive antenna gives

ŝ_1 = Σ_{m=1}^{M} (α*_{1m} y_m(1) + α_{2m} y*_m(2)) = C s_1 + n_1

ŝ_2 = Σ_{m=1}^{M} (α*_{2m} y_m(1) − α_{1m} y*_m(2)) = C s_2 + n_2    (5)

where

C = Σ_{m=1}^{M} (|α_{1m}|² + |α_{2m}|²).    (6)

The Gaussian noise variables n_1 and n_2 have variance

σ_n² = σ² Σ_{m=1}^{M} (|α_{1m}|² + |α_{2m}|²).    (7)

It  is  easily  seen  from  (5),  (6)  and  (7)  that  after  this  sim¬ 
ple  linear  combining,  the  resulting  signals  are  equiva¬ 
lent  to  those  obtained  from  using  maximal  ratio  com¬ 
bining  [7]  techniques  for  systems  with  1  transmit  an¬ 
tenna  and  2M  receive  antennas.  This  combining  tech¬ 
nique  will  be  used  in  two  places  in  our  low-complexity 
receiver  as  discussed  in  the  next  section. 
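As a quick numerical check of this equivalence (a sketch with illustrative values, not code from the paper), the following NumPy fragment simulates the received samples of (4) for the G_2 code and applies the combining of (5); after combining, each symbol sees the full diversity gain C of (6), exactly as 2M-branch maximal ratio combining would provide.

```python
import numpy as np

# Sketch (assumed parameter values): simulate eq. (4) and apply eq. (5)
rng = np.random.default_rng(0)
M = 2                        # receive antennas
s = np.array([1.0, -1.0])    # BPSK pair (s_1, s_2)

# Path gains alpha[n, m]: complex Gaussian, variance 0.5 per real dimension
alpha = (rng.normal(scale=np.sqrt(0.5), size=(2, M))
         + 1j * rng.normal(scale=np.sqrt(0.5), size=(2, M)))

sigma = 0.05
noise = sigma * (rng.normal(size=(M, 2)) + 1j * rng.normal(size=(M, 2))) / np.sqrt(2)

# Eq. (4): slot 1 carries (s_1, s_2), slot 2 carries (-s_2, s_1)
y = np.empty((M, 2), dtype=complex)
for m in range(M):
    y[m, 0] = alpha[0, m] * s[0] + alpha[1, m] * s[1] + noise[m, 0]
    y[m, 1] = -alpha[0, m] * s[1] + alpha[1, m] * s[0] + noise[m, 1]

# Eq. (5): combine across both time slots and all receive antennas
s1_hat = sum(np.conj(alpha[0, m]) * y[m, 0] + alpha[1, m] * np.conj(y[m, 1])
             for m in range(M))
s2_hat = sum(np.conj(alpha[1, m]) * y[m, 0] - alpha[0, m] * np.conj(y[m, 1])
             for m in range(M))

C = np.sum(np.abs(alpha) ** 2)   # eq. (6)
# s1_hat ~= C*s_1 + noise and s2_hat ~= C*s_2 + noise, i.e. 2M-branch MRC
```

At moderate SNR the signs of the real parts of s1_hat and s2_hat recover the transmitted pair.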

4 Multistage Receiver

The  optimum  receiver  that  minimizes  the  frame  error 
rate  should  construct  a  “super-trellis”  for  decoding. 
The  super-trellis  combines  the  trellis  of  Turbo  codes 
and  the  structure  of  the  multiuser  channel  and  space- 
time  block  codes.  Due  to  the  interleavers  used  in  the 
Turbo  codes,  it  is  very  hard  to  construct  such  a  super¬ 
trellis.  In  fact,  “optimum  decoding”  for  Turbo  codes 
alone  is  impossible  in  practice.  This  is  why  subopti¬ 
mum  iterative  decoding  schemes  are  used  to  decode 
Turbo  codes  [5].  Thus  instead  of  trying  to  find  an 
optimum  receiver,  which  would  obviously  have  a  pro¬ 
hibitively  high  complexity,  our  goal  in  this  section  is  to 
develop  a  low-complexity  suboptimum  receiver. 

We  suggest  the  multistage  receiver  structure  de¬ 
picted  in  Fig.  2.  The  output  of  the  matched  filter  bank 
is first passed to a decorrelating detector [8], which
attempts  to  eliminate  the  multiple  access  interference 
(MAI)  completely  with  perfect  estimation.  The  output 


of  the  decorrelating  detector  at  receive  antenna  m  and 
time  slot  j  is 


ȳ_m(j) = (RA)^{-1} y_m(j) = Σ_{n=1}^{N} α_{nm} b_n(j) + ñ_m(j)    (8)

where we defined the noise vector ñ_m(j) = (RA)^{-1} n_m(j), which has a Gaussian distribution with covariance matrix

R̃ = σ² (ARA)^{-1}.    (9)
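A minimal sketch of (2) and (8), with illustrative values (K, ρ, amplitudes, and a white-noise simplification are all assumptions, not the paper's simulation setup): the decorrelator removes the MAI exactly, leaving each user's faded bits plus coloured noise as in (9).

```python
import numpy as np

# Sketch: matched-filter output of eq. (2), then the decorrelator of eq. (8)
rng = np.random.default_rng(1)
K, N = 4, 2                                          # users, transmit antennas
rho = 0.3
R = rho * np.ones((K, K)) + (1 - rho) * np.eye(K)    # symmetric correlation model
A = np.eye(K)                                        # equal amplitudes A_k = 1

alpha = rng.normal(size=N) + 1j * rng.normal(size=N)   # alpha_nm at one antenna m
b = rng.choice([-1.0, 1.0], size=(N, K))               # bits b_n(j)

sigma = 0.05
n = sigma * (rng.normal(size=K) + 1j * rng.normal(size=K))   # white here for
# simplicity; the paper's n_m(j) is N(0, sigma^2 R)

y = R @ A @ (alpha[0] * b[0] + alpha[1] * b[1]) + n          # eq. (2)
y_bar = np.linalg.solve(R @ A, y)                            # eq. (8): MAI removed
# y_bar ~= alpha_1m b_1(j) + alpha_2m b_2(j) + coloured noise, cf. eq. (9)
```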

The elements from ȳ_1(j), ..., ȳ_M(j) corresponding to the kth user, denoted by ȳ_{1k}(j), ..., ȳ_{Mk}(j), are combined using the technique discussed in Section 3 to provide improved observations for user k. These improved
observations  are  sent  to  a  single  user  Turbo  decoder  to 
perform  the  first  stage  of  decoding.  The  Turbo  decoder 
produces posterior probabilities for user k's transmitted bits. These posterior probabilities, together with the diversity combined observations, are used by a soft estimator to form soft estimates of user k's transmitted bits.
The  soft  estimator  uses  non-linear  minimum  mean 
square  error  (MMSE)  estimation  [9]  to  form  the  soft 
estimates.  From  (5),  it  is  seen  that  the  diversity  com¬ 
bined  observations  for  user  k  can  always  be  represented 
in the form y = Cb + n, where y is the noisy observation, b is the transmitted bit, C is a known constant, and n is a complex Gaussian noise sample with variance denoted by σ_n². The soft estimate of b is obtained as


b̂ = [Pr(b = +1) e^{2Re(Cy*)/σ_n²} − Pr(b = −1) e^{−2Re(Cy*)/σ_n²}] / [Pr(b = +1) e^{2Re(Cy*)/σ_n²} + Pr(b = −1) e^{−2Re(Cy*)/σ_n²}]    (10)

where  the  prior  probabilities  Pr(b  =  ±1)  can  be  up¬ 
dated  using  the  posterior  probabilities  obtained  by  the 
Turbo  decoders. 
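The soft estimator can be sketched in a few lines (a hypothetical illustration; the test values for y, C, and σ_n² are assumed). With uniform priors the estimate reduces to a tanh of the scaled matched statistic, which is the familiar form of the non-linear MMSE estimate for a BPSK bit in Gaussian noise.

```python
import numpy as np

def soft_estimate(y, C, sigma_n2, p_plus=0.5):
    """Non-linear MMSE soft estimate of b in y = C b + n, b in {-1, +1}.

    p_plus stands for the prior Pr(b = +1); in the multistage receiver it
    would be updated from the Turbo decoder's posterior probabilities.
    """
    a = 2.0 * np.real(C * np.conj(y)) / sigma_n2
    p_minus = 1.0 - p_plus
    num = p_plus * np.exp(a) - p_minus * np.exp(-a)
    den = p_plus * np.exp(a) + p_minus * np.exp(-a)
    return num / den

# With uniform priors this reduces to tanh(2 Re(C y*) / sigma_n^2)
y, C, sigma_n2 = 0.8 + 0.1j, 1.0, 2.0     # assumed illustrative values
b_soft = soft_estimate(y, C, sigma_n2)
```

A fully confident prior (p_plus = 1) pins the estimate at +1, while uncertain priors shrink it toward zero, which is what makes the subsequent interference cancellation "soft".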

The  transmitted  signals  are  reconstructed  using  the 
soft  estimates  as  if  they  were  binary  digits.  Denote 
the reconstructed encoder output for antenna n and user k during time slot j as b̂_{nk}(j) and define b̂_n(j) = [b̂_{n1}(j), ..., b̂_{nK}(j)]^T. The reconstructed signals {b̂_n(j), n = 1, ..., N, j = 0, ..., L − 1} are used in soft MAI cancellation to produce “cleaner” received signals for each user. To cancel MAI for user k, we first define a vector b̂_n^{(k)}(j) equal to b̂_n(j) except that its kth element is zero. The MAI-reduced observation for user k
at  receive  antenna  m  is  obtained  using 


y_m^{(k)}(j) = y_m(j) − RA Σ_{n=1}^{N} α_{nm} b̂_n^{(k)}(j)    (11)

When a perfect estimate of b_n(j) is available, y_m^{(k)}(j) offers K different observations of the signal from user k, contaminated only by channel noise. For simplicity, we use the kth element of y_m^{(k)}(j) for processing, which gives the highest SNR for user k. The kth elements of y_m^{(k)}(j), m = 1, ..., M, at all receive antennas are combined using the techniques discussed in Section 3.
The  improved  observations  are  passed  to  another  set  of 
Turbo  decoders  to  perform  the  second  stage  of  decod¬ 
ing.  These  Turbo  decoders  produce  the  final  “hard” 
decisions  on  each  user’s  transmitted  bits. 
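The cancellation step (11) can be sketched as follows (a hypothetical helper with assumed names and test values; the noiseless check simply verifies that with perfect estimates only user k's own contribution survives).

```python
import numpy as np

def soft_mai_cancel(y_m, R, A, alpha_m, b_soft):
    """Soft MAI cancellation, eq. (11), at one receive antenna m.

    y_m     : length-K matched-filter output y_m(j)
    alpha_m : length-N fading gains alpha_nm
    b_soft  : N x K soft estimates bhat_nk(j)
    Returns a K x K array whose k-th row is y_m^(k)(j).
    """
    K = len(y_m)
    out = np.empty((K, K), dtype=complex)
    for k in range(K):
        b_k = b_soft.copy()
        b_k[:, k] = 0.0                                 # spare user k's own signal
        out[k] = y_m - R @ A @ (alpha_m[:, None] * b_k).sum(axis=0)
    return out

# Noiseless check with perfect estimates (illustrative values)
K, N = 4, 2
R = 0.3 * np.ones((K, K)) + 0.7 * np.eye(K)
A = np.eye(K)
rng = np.random.default_rng(2)
alpha = rng.normal(size=N) + 1j * rng.normal(size=N)
b = rng.choice([-1.0, 1.0], size=(N, K))
y = R @ A @ (alpha[:, None] * b).sum(axis=0)            # eq. (2), no noise
clean = soft_mai_cancel(y, R, A, alpha, b)
```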

5 Simulation Results

Monte  Carlo  simulations  are  carried  out  to  study  the 
performance  of  the  proposed  multistage  receiver.  Con¬ 
sider  a  4  user  synchronous  CDMA  system  with  2  trans¬ 
mit  antennas  and  2  receive  antennas.  Each  user’s  bits 
are  first  encoded  by  a  rate  1/3  Turbo  code  with  con¬ 
straint  length  v  =  5  and  generator  23,  35  (octal  form). 
The  random  interleaver  chosen  for  the  Turbo  code  has 
length  128.  The  block  of  Turbo  coded  data  is  encoded 
using  a  space-time  block  code  with  the  code  matrix 
G_2 from (3) and a BPSK constellation. Next the output bits are spread using each user's spreading waveform and the results are transmitted using 2 antennas
over  the  fading  channel.  The  path  gains  are  modeled 
as  samples  of  independent  complex  Gaussian  random 
variables  with  variance  0.5  per  dimension  (real  or  imag¬ 
inary).  Quasi-static  fading  is  assumed.  For  the  CDMA 
channel,  we  use  the  symmetric  channel  model  where 
the cross-correlation between each pair of users is the common value p. The SNR for user k is defined as SNR_k = A_k²/σ².

Fig.  3  gives  the  BER  performance  of  the  proposed 
multistage  receiver  in  Gaussian  noise  when  all  users 
have  the  same  power  (A  =  I).  The  BER  performance 
for  the  first  stage  and  second  stage  decoding  are  both 
plotted, which we denote by “S1” and “S2” on the
graph.  For  comparison,  we  also  give  the  single  user 
performance,  which  is  the  Turbo  code  performance  for 
the fading channel under consideration. The performance of the space-time block code using G_2 without
the  Turbo  coding  is  also  shown.  For  p  =  0.1,  single  user 
performance  is  nearly  achieved  after  just  the  first  stage 
decoding.  The  second  stage  decoding  curve  is  indis¬ 
tinguishable  from  that  of  the  single  user  performance. 
For  p  =  0.3,  the  performance  improvement  obtained 
by  employing  the  second  stage  of  decoding  is  obvious 
from  Fig.  3b.  After  the  second  stage  decoding,  single 
user  performance  is  approached.  By  combining  a  Turbo 
code  with  a  space-time  block  code,  a  performance  gain 


of about 2.5 dB is achieved at BER = 10^{-4} compared to
using  a  space-time  block  code  only. 

An  iterative  receiver  structure  can  be  easily  con¬ 
structed  by  feeding  back  the  posterior  information  ob¬ 
tained  after  the  second  stage  decoding  to  the  soft  es¬ 
timators.  We  have  carried  out  simulations  using  this 
iterative  structure,  but  results  show  that  the  improve¬ 
ment  over  the  second  stage  of  decoding  is  marginal.  In 
Fig.  3b,  we  plot  the  BER  performance  for  the  second 
iteration  of  the  “iterative  receiver”  (denoted  by  “Ite 
2”),  which  is  almost  indistinguishable  from  the  second 
stage  decoding  curve.  Thus  the  extra  computations 
incurred  by  the  iterative  structure  are  not  justified. 

Next  we  study  the  performance  of  our  receiver  in 
a  near-far  situation  where  two  users  are  20dB  stronger 
than the other two users; all other parameters remain the same as in Fig. 3. The BER performance for the
strong  user  and  weak  user  are  given  in  Fig.  4a  and  4b 
respectively.  The  performance,  for  both  the  weak  and 
strong  users,  approaches  single  user  performance  after 
the  second  stage  decoding. 

Finally,  we  point  out  that  the  received  signal  model 
in  (2)  is  also  valid  for  a  CDMA  system  with  space-time 
convolutional  coding  [1]  replacing  the  combination  of 
space-time  block  codes  and  Turbo  codes.  An  iterative 
receiver  can  be  constructed  using  the  parallel  interfer¬ 
ence  cancellation  scheme  [10].  Fig.  1  gives  the  frame 
error  rate  performance  for  the  first  two  iterations  of  the 
iterative  receiver  for  a  CDMA  system  with  space-time 
convolutional  coding.  It  is  seen  that  with  2  iterations, 
single  user  performance  is  achieved.  Another  observa¬ 
tion  is  that  the  performance  improvement  obtained  by 
employing  the  iterative  structure  is  marginal.  This  is 
consistent  with  our  previous  observations  for  the  space- 
time  block  coded  system. 

6 Conclusions

In  this  paper,  we  studied  the  application  of  Turbo  codes 
and  space-time  block  codes  in  CDMA  systems.  A  mul¬ 
tistage  receiver  is  proposed  using  parallel  interference 
cancellation  schemes.  Simulation  results  show  that  with 
reasonable  levels  of  MAI  (p  <  0.3),  near  single  user  per¬ 
formance  can  be  achieved.  The  receiver  developed  in 
this  paper  was  generalized  to  decode  CDMA  signals 
with  space-time  convolutional  coding  and  similar  per¬ 
formance  was  observed. 

References

[1]  V.  Tarokh,  N.  Seshadri,  and  A.  R.  Calderbank, 
“Space-time codes for high data rate wireless communication: Performance criteria and code construction,” IEEE Trans. Inform. Theory, vol. 44, no. 2, pp. 744-765, Mar. 1998.

[2]  S.  M.  Alamouti,  “A  simple  transmitter  diver¬ 
sity  scheme  for  wireless  communications,”  IEEE 
JSAC,  vol.  16,  No.  8,  pp.  1451-1458,  Oct.  1998. 

[3]  V.  Tarokh,  H.  Jafarkhani,  and  A.  R.  Calderbank, 
“Space-time block coding for wireless communications: performance results,” IEEE JSAC, vol. 17, no. 3, pp. 451-460, Mar. 1999.

[4]  V.  Tarokh,  H.  Jafarkhani,  and  A.  R.  Calderbank, 
“Space-time block codes from orthogonal designs,” IEEE Trans. Inform. Theory, vol. 45, no. 5, pp. 1456-1467, July 1999.

[5]  C.  Berrou  and  A.  Glavieux,  “Near  optimum  error 
correcting  coding  and  decoding:  Turbo-Codes,” 
IEEE Trans. Commun., vol. 44, no. 10, pp. 1261-1271, Oct. 1996.

[6]  S.  Dennett,  “The  CDMA2000  ITU-R  RTT  candi¬ 
date  submission,”  V.  0.17,  TIA,  July  28,  1998. 

[7]  J.  G.  Proakis,  Digital  Communications,  3rd  Edi¬ 
tion,  McGraw-Hill,  1995. 

[8]  S.  Verdu,  Multiuser  Detection,  UK:  Cambridge 
University  Press,  1998. 

[9]  A.  Papoulis,  Probability,  Random  Variables  and 
Stochastic  Processes,  New  York:  McGraw-Hill, 

[10]  Yumin  Zhang,  Iterative  and  Adaptive  Receivers 
For  Wireless  Communication  and  Radar  Systems, 
Ph.D.  Dissertation,  Lehigh  University,  May  2000. 

Figure  1:  Performance  of  the  iterative  multiuser  re¬ 
ceiver  for  CDMA  with  space-time  convolutional  coding 
[10] with K = 4, p = 0.3, a 4-PSK S-T code with rate 2 b/s/Hz, 130 symbols per frame, and 2 transmit and 2 receive antennas, where MMSE is used in the first stage.


Figure  2:  Structure  of  our  K  user  CDMA  system  (including  our  multistage  receiver)  with  combined  Turbo  coding 
and  space-time  block  coding,  N  transmit  antennas  and  M  receive  antennas. 



Figure  3:  Performance  of  the  multistage  receiver  for  CDMA  with  Turbo  coding  and  space-time  block  coding  with 
K=4  users,  2  transmit  and  2  receive  antennas. 

(a) p = 0.1    (b) p = 0.3

(a)  Strong  user  (b)  Weak  user 

Figure  4:  Performance  of  the  multistage  receiver  for  CDMA  with  Turbo  coding  and  space-time  block  coding  under 
a  near-far  situation  with  K=4,  p  =  0.3,  2  transmit  and  2  receive  antennas.  Two  users  are  20dB  stronger  than  the 
other  two  users. 



Christophe Andrieu† , Arnaud Doucet† , Azzedine Touzni‡

†Signal Processing Group, Engineering Dept., Cambridge University,
Trumpington Street, CB2 1PZ Cambridge, UK.

‡NxtWave Communications, Langhorne, PA 19047, USA.

Abstract

This  paper  presents  an  adaptive  multi-user  maximum  a  pos¬ 
teriori  (MAP)  decoder  for  synchronous  code  division  mul¬ 
tiple  access  (CDMA)  signals  on  fading  channels.  The  key 
idea  is  to  interpret  this  problem  as  an  optimal  filtering  prob¬ 
lem.  An  efficient  particle  filtering  method  is  then  developed 
to  solve  this  complex  estimation  problem.  Simulation  re¬ 
sults  demonstrate  the  efficiency  of  our  method. 

1  Introduction 

Code  division  multiple  access  (CDMA)  systems  have  re¬ 
ceived  much  attention  in  recent  years  [13].  For  the  case  of  a 
known  channel  with  additive  Gaussian  noise,  the  maximum 
likelihood  (ML)  optimal  receiver  was  presented  by  Verdu 
[16].  Lower-complexity  linear  receivers  have  also  been  pre¬ 
sented  in  this  case.  In  the  presence  of  unknown  fading 
channels,  the  estimation  problem  to  be  solved  is  much  more 
complex.  MMSE  linear  receivers  have  also  been  presented 
in this context. However, it turns out that the rate of adaptation of these linear techniques is not sufficient to track fast-fading channels, and more sophisticated approaches are
required.  Recently,  more  efficient  methods  have  been  pro¬ 
posed; see for example [5], [6], where coupled estimators combining a Viterbi algorithm and an MMSE predictor are proposed.

In  this  paper  we  follow  a  Bayesian  probabilistic  approach. 
A state-space model is used to model explicitly the nonstationarity of the fading channel. This allows us to formulate
the  problem  of  estimating  a  posteriori  symbol  probabilities 
as  a  complex  optimal  filtering  problem.  Under  assumptions 
detailed  later  on,  it  is  well  known  that  exact  computation  of 
these  probabilities  involves  a  prohibitive  computational  cost 
exponential  in  the  (growing)  number  of  observations.  Thus 
one  needs  to  perform  some  approximations. 

We  present  here  a  simulation-based  method  for  solving 
this  problem.  This  so-called  particle  filtering  method  can  be 
viewed  as  a  randomized  adaptive  grid  approximation  of  the 
posterior  distribution.  As  will  be  shown  later,  the  particles 

C.  Andrieu  is  sponsored  by  AT&T  Laboratories,  Cambridge  UK. 


(values  of  the  grid)  evolve  randomly  in  time  according  to  a 
simulation-based  rule.  The  weights  of  the  particles  are  up¬ 
dated  according  to  Bayes’  rule.  The  most  striking  advantage 
of  these  MC  particle  filters  is  that  the  rate  of  convergence  of 
the  error  towards  zero  is  independent  of  the  state  dimension. 
That  is,  the  randomization  implicit  in  the  particle  filter  gets 
around  the  curse  of  dimensionality.  Taking  advantage  of 
the  increase  of  computational  power  and  the  availability  of 
parallel  computers,  several  authors  have  recently  proposed 
such  particle  methods  following  the  seminal  paper  of  Gor¬ 
don  et  al.  [11],  see  [7],  [8]  for  a  summary  of  the  state-of- 
the-art  and  [2],  [14],  [15]  for  other  applications  in  digital 
communications.  It  has  been  shown  that  these  methods  out¬ 
perform  the  standard  suboptimal  methods. 

We  propose  in  this  paper  an  improved  particle  method 
where  the  filtering  distribution  of  interest  is  approximated 
by  a  Gaussian  mixture  of  a  large  number,  say  N,  of  compo¬ 
nents  which  evolve  stochastically  over  time  and  are  driven 
by  the  observations.  Though  it  is  rather  computationally  in¬ 
tensive,  it  can  be  easily  implemented  on  parallel  processors. 

The  rest  of  the  paper  is  organized  as  follows.  In  Section 
2,  we  state  the  model  and  the  estimation  objectives.  In  Sec¬ 
tion  3,  we  describe  particle  filtering  methods.  Finally  we 
demonstrate  the  efficiency  of  our  algorithm  in  Section  4. 

2  System  Model  and  Estimation  Objectives 
2.1  System  model 

We  follow  here  the  presentation  in  [5],  [6].  Consider  a 
synchronous  CDMA  system  with  a  single-antenna  at  the 
centralized receiver. The system has M users, each transmitting using a known direct-sequence (DS) spreading code with processing gain G (i.e., G chips per symbol). For user m, the spreading code is represented by the G × 1 vector s_m = [s_{m,0}, ..., s_{m,G−1}]^T. At time t, user m transmits a symbol x_{m,t} of period T = GT_c, where T_c is the chip interval. Each chip s_{m,c} x_{m,t} is affected by the flat-fading channel f_{m,k}, represented at the chip rate, where k = Gt + c.
Note  that  t  is  used  as  an  index  at  the  symbol  rate,  and  k  is 
used  as  an  index  at  the  chip  rate. 


At  the  receiver,  the  incoming  signal  is  sampled  at  the 
chip rate to obtain z_k. Assuming a synchronous system, the
received  samples  are  given  by 


z_k = Σ_{m=1}^{M} x_{m,⌊k/G⌋} s_{m, mod(k,G)} f_{m,k} + w_k

for k = 0, ..., GT − 1. In vector-matrix notation,


z_t = Σ_{m=1}^{M} x_{m,t} S_m f_{m,t} + w_t,    (1)

for t = 0, ..., T − 1, where S_m = diag(s_m), z_t = [z_{Gt}, ..., z_{G(t+1)−1}]^T, w_t = [w_{Gt}, ..., w_{G(t+1)−1}]^T is a vector of zero-mean i.i.d. complex Gaussian noise samples with variance σ_w² = ½E[w_k w_k*] = N_0/(2T_c), and f_{m,t} = [f_{m,Gt}, ..., f_{m,G(t+1)−1}]^T collects the chip-rate fading over symbol t. We assume that the fading channels f_{m,t} satisfy the following state-space model

f_{m,t} = A f_{m,t−1} + B v_{m,t}    (2)

where f_{m,0} is assumed distributed according to a Gaussian distribution and the disturbance noise v_{m,t} is assumed zero-mean i.i.d. Gaussian. We denote f_t = [f_{1,t}, ..., f_{M,t}]. The
initial  states  fm,o,  the  sequences  vm,t  and  the  observation 
noise  wt  are  all  assumed  mutually  independent  at  any  time 
t.  Finally,  we  assume  that  the  symbols  xt  are  modeled  as 
a  first-order  (finite  state-space)  Markov  chain.  The  finite 
state-space  of  the  symbols  is  denoted  by  X. 
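The observation model (1) and fading state space (2) can be sketched as follows. This is a minimal NumPy simulation under assumptions: the paper leaves A and B as general matrices, while this sketch substitutes a scalar AR(1) recursion per chip, and all sizes and noise levels are illustrative.

```python
import numpy as np

# Sketch of eqs. (1)-(2) with assumed parameters
rng = np.random.default_rng(3)
M, G, T = 2, 8, 5                                 # users, processing gain, symbols

s = rng.choice([-1.0, 1.0], size=(M, G))          # spreading codes s_m
S = [np.diag(s[m]) for m in range(M)]             # S_m = diag(s_m)
x = rng.choice([-1.0, 1.0], size=(M, T))          # BPSK symbols x_{m,t}

a_f = 0.99                                        # scalar AR(1) stand-in for (A, B)
b_f = np.sqrt(1.0 - a_f ** 2)
f = np.empty((M, T, G), dtype=complex)            # chip-rate fading f_{m,t}
f[:, 0] = (rng.normal(size=(M, G)) + 1j * rng.normal(size=(M, G))) / np.sqrt(2)
for t in range(1, T):
    v = (rng.normal(size=(M, G)) + 1j * rng.normal(size=(M, G))) / np.sqrt(2)
    f[:, t] = a_f * f[:, t - 1] + b_f * v         # stand-in for eq. (2)

sigma_w = 0.1
z = np.empty((T, G), dtype=complex)
for t in range(T):
    w = sigma_w * (rng.normal(size=G) + 1j * rng.normal(size=G)) / np.sqrt(2)
    z[t] = sum(x[m, t] * (S[m] @ f[m, t]) for m in range(M)) + w   # eq. (1)
```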

2.2  Estimation  objectives 

Given the observations z_{0:t} = (z_0, ..., z_t), all Bayesian inference on x_{0:t} = (x_0, ..., x_t) and f_{0:t} = (f_0, ..., f_t) is based on the posterior distribution p(x_{0:t}, f_{0:t}| z_{0:t}). Here
the  channel  coefficients  ft  are  regarded  as  nuisance  param¬ 
eters  and  integrated  out. 

Our aim is to compute recursively in time t the MMAP symbol estimate defined as

x̂_t^{MMAP} = arg max_{x_t ∈ X} p(x_t| z_{0:t}).

The joint distribution p(x_{0:t}| z_{0:t}) satisfies the following recursion

p(x_{0:t+1}| z_{0:t+1}) = p(x_{0:t}| z_{0:t}) × p(z_{t+1}| z_{0:t}, x_{0:t+1}) p(x_{t+1}| x_t) / p(z_{t+1}| z_{0:t}).
The likelihood term p(z_{t+1}| z_{0:t}, x_{0:t+1}) can be evaluated pointwise through the Kalman filter associated with the path x_{0:t+1}, as the system (1)-(2) is linear Gaussian conditional upon x_{0:t+1}. It is easily seen that, given our assumptions, computing p(x_{0:t}| z_{0:t}) or p(x_t| z_{0:t}) requires a computational cost exponential in the (growing) number t of observations. It is thus necessary to develop an approximation scheme.

Efficient  batch  algorithms  have  been  developed  to  solve 
related  estimation  problems  [9]  but  they  are  of  limited  inter¬ 
est  in  a  digital  communications  framework.  Several  “classi¬ 
cal”  suboptimal  algorithms  have  also  been  proposed  to  solve 
related  problems  in  the  literature,  see  for  example  [1]  for  a 
standard  textbook  on  the  subject.  However,  these  approx¬ 
imation  methods  are  notoriously  unreliable  and  faults  are 
difficult  to  diagnose  on-line. 

3  Particle  Filtering 

In  this  paper,  we  present  an  original  particle  filtering  method 
to solve this optimal estimation problem.

3.1  Perfect  Monte  Carlo  sampling 

Assume it is possible to sample N i.i.d. samples, called particles, {x_{0:t}^{(i)} : i = 1, ..., N}, according to the joint distribution p(x_{0:t}| z_{0:t}); then an empirical approximation of p(x_{0:t}| z_{0:t}) is given by

p_N(x_{0:t}| z_{0:t}) = (1/N) Σ_{i=1}^{N} δ_{x_{0:t}^{(i)}}(x_{0:t}).

Consequently an approximation of its marginal p(x_t| z_{0:t}) is given by

p_N(x_t| z_{0:t}) = (1/N) Σ_{i=1}^{N} δ_{x_t^{(i)}}(x_t),

that is, for any i ∈ X,

p_N(x_t = i| z_{0:t}) = (1/N) Σ_{j=1}^{N} δ_{x_t^{(j)}}(i)    (3)



and the MMAP estimate is approximated by x̂_t^{MMAP} = arg max_{i ∈ X} p_N(x_t = i| z_{0:t}).

The estimate (3) is unbiased and, from the strong law of large numbers (SLLN), p_N(x_t = i| z_{0:t}) → p(x_t = i| z_{0:t}) almost surely as N → +∞. A central limit theorem (CLT) holds too. The main advantage of Monte Carlo methods over other numerical integration methods is that the rate of convergence of p_N(x_t = i| z_{0:t}) towards p(x_t = i| z_{0:t}) is
independent  of  the  dimension  t.  Unfortunately,  it  is  not  pos¬ 
sible  to  sample  directly  from  the  distribution  p  (x0:t|  z0:t)  at 
any  t,  and  alternative  strategies  need  to  be  investigated. 
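As a toy illustration of (3) (a sketch with assumed numbers, not the paper's system), drawing i.i.d. samples from a known two-point distribution and counting frequencies recovers the target probabilities, and the arg max of the empirical marginal yields the MMAP decision:

```python
import numpy as np

# Empirical frequencies of i.i.d. draws approximate the target pmf, as in eq. (3)
rng = np.random.default_rng(4)
X = np.array([-1, 1])                  # finite symbol alphabet
p_true = np.array([0.3, 0.7])          # stand-in for p(x_t | z_{0:t})
N = 100_000

samples = rng.choice(X, size=N, p=p_true)
p_N = np.array([(samples == i).mean() for i in X])   # eq. (3)
x_mmap = X[np.argmax(p_N)]                           # empirical MMAP estimate
```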

3.2  Sequential  Bayesian  Importance  Sampling 

An alternative solution to estimate p(x_{0:t}| z_{0:t}) consists of using the importance sampling method. Suppose that N i.i.d. samples {x_{0:t}^{(i)} : i = 1, ..., N} can be easily simulated according to an arbitrary importance distribution π(x_{0:t}| z_{0:t}) such that p(x_{0:t}| z_{0:t}) > 0 implies π(x_{0:t}| z_{0:t}) > 0. Using this distribution a Monte Carlo estimate of p(x_t| z_{0:t}) may be obtained as

p_N(x_t = i| z_{0:t}) = Σ_{j=1}^{N} w̃_t^{(j)} δ_{x_t^{(j)}}(i),    (4)

where w̃_t^{(i)} ∝ w(x_{0:t}^{(i)}) (with Σ_{j=1}^{N} w̃_t^{(j)} = 1) is the normalised version of the importance weight w(x_{0:t}) defined as

w(x_{0:t}) ∝ p(x_{0:t}| z_{0:t}) / π(x_{0:t}| z_{0:t}).
According to the SLLN, p_N(x_t = i| z_{0:t}) converges almost surely towards p(x_t = i| z_{0:t}) as N → +∞, and under additional assumptions a CLT also holds.

The  method  described  up  to  now  is  a  batch  method. 
In order to obtain the estimate of p(x_{0:t}| z_{0:t}) sequentially, one should be able to propagate this estimate in time without subsequently modifying the past simulated trajectories {x_{0:t}^{(i)} : i = 1, ..., N}. This means that π(x_{0:t}| z_{0:t}) should admit π(x_{0:t−1}| z_{0:t−1}) as marginal distribution:

π(x_{0:t}| z_{0:t}) = π(x_{0:t−1}| z_{0:t−1}) π(x_t| z_{0:t}, x_{0:t−1}),

and  the  importance  weights  w(x0:t)  can  then  be  evaluated 
recursively,  i.e. 

w(x_{0:t}) = w(x_{0:t−1}) × w_t,    (5)

where

w_t = p(z_t| z_{0:t−1}, x_{0:t}) p(x_t| x_{t−1}) / π(x_t| z_{0:t}, x_{0:t−1}).

There are an unlimited number of choices for the importance distribution $\pi(x_{0:t} \mid z_{0:t})$, the only restriction being that its support includes that of $p(x_{0:t} \mid z_{0:t})$. A sensible selection criterion is to choose a proposal that minimises the variance of the importance weights given $x_{0:t-1}$ and $z_{0:t}$. The importance distribution that satisfies this condition is $\pi(x_t \mid z_{0:t}, x_{0:t-1}) = p(x_t \mid z_{0:t}, x_{0:t-1})$, and this "optimal" importance distribution is employed throughout the paper (see [7] for details).
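A quick numerical check of the recursion (5), under a simplified setting: a two-state Markov chain with Gaussian observations and the prior transition kernel as proposal (rather than the optimal proposal used in the paper), for which the incremental weight reduces to the likelihood term. The model and its parameters are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-state Markov chain x_t with Gaussian observations z_t ~ N(x_t, 1)
P = np.array([[0.9, 0.1], [0.2, 0.8]])    # p(x_t = j | x_{t-1} = i)

def lik(z, x):
    """Observation density p(z_t | x_t) for unit-variance Gaussian noise."""
    return np.exp(-0.5 * (z - x) ** 2) / np.sqrt(2.0 * np.pi)

T = 6
z = rng.normal(size=T)                    # an arbitrary observation record

# Simulate one trajectory from the prior (used as the proposal pi), so that
# the incremental weight of eq. (5) is w_t = p(z_t | x_t)
x = [int(rng.integers(2))]
w_recursive = lik(z[0], x[0])
for t in range(1, T):
    x.append(int(rng.choice(2, p=P[x[-1]])))
    w_recursive *= lik(z[t], x[t])        # w(x_{0:t}) = w(x_{0:t-1}) * w_t

# Batch weight computed in one pass: the prior terms cancel in p/pi, leaving
# the product of all likelihood factors
w_batch = np.prod([lik(z[t], x[t]) for t in range(T)])
```

The recursive product and the batch weight coincide, which is exactly what makes the sequential scheme possible.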

3.3  Selection  step 

For importance distributions of the form specified by (5), the variance of the importance weights can only increase (stochastically) over time [7]. It is thus impossible to avoid a degeneracy phenomenon. Practically, after a few iterations of the algorithm, all but one of the normalised importance weights are very close to zero, and a large computational effort is devoted to updating trajectories whose contribution to the final estimate is almost zero. To avoid this, it is of crucial importance to include a selection step in the algorithm, the purpose of which is to discard particles with low normalised importance weights and multiply those with high normalised importance weights. The weights of the "surviving" particles are reset to $1/N$. A selection procedure associates with each particle, say $x_{0:t}^{(i)}$, $i = 1, \ldots, N$, a number of children $N_i \in \mathbb{N}$, such that $\sum_{i=1}^{N} N_i = N$, to obtain $N$ new particles $\{\tilde{x}_{0:t}^{(i)} : i = 1, \ldots, N\}$. If $N_i = 0$ then $x_{0:t}^{(i)}$ is discarded; otherwise it has $N_i$ children at time $t+1$. In this paper, the selection step is done according to a stratified sampling scheme [12], though other methods such as sampling importance resampling (SIR) [11] may be employed. The stratified sampling scheme proceeds as follows: generate $N$ points equally spaced in the interval $[0,1]$, and associate with each particle $i$ a number of children $N_i$ equal to the number of points lying between the partial sums of weights $q_{i-1}$ and $q_i$, where $q_i = \sum_{j=1}^{i} \tilde{w}_t^{(j)}$. This algorithm is such that $\mathrm{E}[N_i] = N \tilde{w}_t^{(i)}$ and $\mathrm{var}[N_i] = \{N \tilde{w}_t^{(i)}\}\big(1 - \{N \tilde{w}_t^{(i)}\}\big)$, where, for any $a$, $\lfloor a \rfloor$ is the integer part of $a$ and $\{a\} = a - \lfloor a \rfloor$.
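The child-count rule just described can be sketched as follows (here with one common random shift across the strata, a systematic variant of the scheme in [12]; the shift is an implementation assumption):

```python
import numpy as np

def stratified_children(w_tilde, rng):
    """N_i = number of equally spaced points in [0, 1] falling between the
    partial weight sums q_{i-1} and q_i (one common random shift)."""
    N = len(w_tilde)
    u = (rng.random() + np.arange(N)) / N     # N points with spacing 1/N
    q = np.cumsum(w_tilde)
    q[-1] = 1.0                               # guard against round-off
    return np.bincount(np.searchsorted(q, u), minlength=N)

rng = np.random.default_rng(2)
w = rng.random(10)
w /= w.sum()
Ni = stratified_children(w, rng)
```

By construction $\sum_i N_i = N$, and each $N_i$ deviates from its mean $N\tilde{w}_t^{(i)}$ by less than one, matching the stated mean and variance.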

3.3.1  Algorithm 

Given at time $t-1$, $N \in \mathbb{N}^*$ random samples $x_{0:t-1}^{(i)}$ ($i = 1, \ldots, N$) distributed according to $p(x_{0:t-1} \mid z_{0:t-1})$, the MC filter proceeds as follows at time $t$.

Particle Filtering Algorithm

Sequential Importance Sampling step

• For $i = 1, \ldots, N$, sample $x_t^{(i)} \sim \pi(x_t \mid z_{0:t}, x_{0:t-1}^{(i)})$ and set $x_{0:t}^{(i)} = (x_{0:t-1}^{(i)}, x_t^{(i)})$.

• For $i = 1, \ldots, N$, evaluate the importance weights up to a normalising constant:

$$w_t^{(i)} \propto \frac{p(z_t \mid z_{0:t-1}, x_{0:t}^{(i)}) \, p(x_t^{(i)} \mid x_{t-1}^{(i)})}{\pi(x_t^{(i)} \mid z_{0:t}, x_{0:t-1}^{(i)})},$$

and normalise them: $\tilde{w}_t^{(i)} \propto w_t^{(i)}$, $\sum_{j=1}^{N} \tilde{w}_t^{(j)} = 1$.

Selection step

• Multiply/discard particles $(x_{0:t}^{(i)};\ i = 1, \ldots, N)$ with respect to high/low normalised importance weights $\tilde{w}_t^{(i)}$ to obtain $N$ particles $(\tilde{x}_{0:t}^{(i)};\ i = 1, \ldots, N)$.

Clearly, the computational complexity of the proposed algorithm at each iteration is $O(N)$. Moreover, since the optimal and prior importance distributions $\pi(x_t \mid z_{0:t}, x_{0:t-1})$ and the associated importance weights depend on $x_{0:t-1}$ only via a set of low-dimensional sufficient statistics, only these values need to be kept in memory; thus, the storage requirements for the proposed algorithm are also $O(N)$ and do not increase over time.
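A minimal end-to-end sketch of the loop (SIS step, weighting, selection) on a toy two-state model. It uses the prior as proposal instead of the paper's optimal proposal, so it is a bootstrap-flavoured illustration of the $O(N)$-per-iteration structure, not the decoder itself:

```python
import numpy as np

rng = np.random.default_rng(3)

P = np.array([[0.9, 0.1], [0.2, 0.8]])    # toy transition matrix

def lik(z, x):
    return np.exp(-0.5 * (z - x) ** 2) / np.sqrt(2.0 * np.pi)

def select(x, w, rng):
    """Selection step: stratified multiply/discard, weights reset to 1/N."""
    q = np.cumsum(w)
    q[-1] = 1.0
    u = (rng.random() + np.arange(len(x))) / len(x)
    return x[np.searchsorted(q, u)]

# Simulate a state/observation record from the model
T = 20
true_x = [0]
for _ in range(T - 1):
    true_x.append(int(rng.choice(2, p=P[true_x[-1]])))
z = np.array(true_x, dtype=float) + rng.normal(size=T)

# Particle filter: every step below is O(N) work
N = 2000
x = rng.integers(2, size=N)               # particles at t = 0
for t in range(T):
    if t > 0:                             # SIS step: propagate via the prior
        jump = rng.random(N) < np.where(x == 0, P[0, 1], P[1, 0])
        x = np.where(jump, 1 - x, x)
    w = lik(z[t], x)
    w = w / w.sum()                       # normalised importance weights
    x = select(x, w, rng)                 # multiply/discard particles

p_N = x.mean()                            # estimate of p(x_T = 1 | z_{0:T})
```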

3.3.2  Convergence  Results 

The  following  proposition  is  a  straightforward  consequence 
of  Theorem  1  in  [4],  which  itself  is  an  extension  of  results 
in  [3]. 

Proposition 1 For all $t > 0$, there exists $c_t$ independent of $N$ such that

$$\mathrm{E}\Big[\big(\hat{p}_N(x_t = i \mid z_{0:t}) - p(x_t = i \mid z_{0:t})\big)^2\Big] \leq \frac{c_t}{N}.$$

The expectation operator is with respect to the randomness introduced in the particle filtering method. Although the particles interact, one observes that the "standard" rate of convergence of Monte Carlo methods is retained.
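The $O(1/N)$ mean-squared-error scaling can be checked empirically for plain importance sampling on a toy discrete target (an illustrative setup, unrelated to the CDMA model):

```python
import numpy as np

rng = np.random.default_rng(4)

p_target = np.array([0.2, 0.5, 0.3])      # toy posterior over three states

def mse(N, reps=300):
    """Mean squared error of the self-normalised estimate of p(x = 1)."""
    errs = []
    for _ in range(reps):
        x = rng.choice(3, size=N)         # uniform proposal
        w = 3.0 * p_target[x]             # unnormalised weights p / pi
        w /= w.sum()
        errs.append((w[x == 1].sum() - p_target[1]) ** 2)
    return float(np.mean(errs))

# MSE should scale like c_t / N: 100x more samples, roughly 100x less MSE
ratio = mse(100) / mse(10_000)
```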

4  Simulation  Results 

We demonstrate the performance of our multi-user MAP decoder for transmission of binary phase-shift-keyed (BPSK) symbols over fast-fading CDMA channels. The simulation parameters were as follows: $M = 3$, $G = 10$, and a flat-fading channel with fading rate $0.05/T$. We compared our results with [6] and with the case where the channel is known exactly. The results in terms of Bit Error Rate (BER) are presented in Fig. 1. We notice that when the SNR is large, our stochastic algorithm substantially outperforms that of [6]. Their deterministic algorithm can indeed get trapped in severe local maxima, as the posterior distribution is peakier.

Figure  1:  Dotted  line  +  (channel  known),  solid  line  (particle 
filtering),  dotted  line  x  ([6]) 

[1] B.D.O. Anderson and J.B. Moore, Optimal Filtering, Prentice-Hall, Englewood Cliffs, 1979.

[2] C. Andrieu, A. Doucet and E. Punskaya, "Sequential Monte Carlo methods for optimal filtering", in [7].

[3] D. Crisan, P. Del Moral and T. Lyons, "Discrete filtering using branching and interacting particle systems", Markov Processes and Related Fields, vol. 5, no. 3, pp. 293-318.

[4] D. Crisan and A. Doucet, "Convergence of generalized particle filters", technical report, Cambridge University, TR-F-INFENG TR 381, 2000.

[5] L.M. Davis and I.B. Collings, "Joint MAP detection and channel estimation for CDMA over frequency-selective fading channels", in Proc. ISPACS-98, pp. 432-436, 1998.

[6] L.M. Davis and I.B. Collings, "Multi-user MAP decoding for flat-fading CDMA channels", in Proc. Conf. DSPCS-99, pp. 79-86, 1999.

[7] A. Doucet, J.F.G. de Freitas and N.J. Gordon (eds.), Sequential Monte Carlo Methods in Practice, Springer-Verlag: New York, 2000.

[8] A. Doucet, S.J. Godsill and C. Andrieu, "On sequential Monte Carlo sampling methods for Bayesian filtering", Statistics and Computing, vol. 10, no. 3, pp. 197-208, 2000.

[9] A. Doucet, A. Logothetis and V. Krishnamurthy, "Stochastic sampling algorithms for state estimation of jump Markov linear systems", IEEE Trans. Automatic Control, vol. 45, no. 2, pp. 188-201, 2000.

[10] A. Doucet, N.J. Gordon and V. Krishnamurthy, "Particle filters for state estimation of jump Markov linear systems", technical report, Cambridge University, TR-F-INFENG TR 359, 1999.

[11] N.J. Gordon, D.J. Salmond and A.F.M. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation", IEE Proceedings-F, vol. 140, no. 2, pp. 107-113, 1993.

[12] G. Kitagawa, "Monte Carlo Filter and Smoother for Non-Gaussian Nonlinear State Space Models", J. Comp. Graph. Stat., vol. 5, no. 1, pp. 1-25, 1996.

[13] U. Madhow, "Blind adaptive interference suppression for direct-sequence CDMA", Proceedings of the IEEE, pp. 2049-2069, 1998.

[14] E. Punskaya, C. Andrieu, A. Doucet and W.J. Fitzgerald, "Particle filters for demodulation of M-ary modulated signals in noisy fading communication channels", in Proc. ICASSP 2000.

[15] E. Punskaya, C. Andrieu, A. Doucet and W.J. Fitzgerald, "Particle filtering for demodulation in fading channels", technical report, Cambridge University, CUED-F-INFENG TR 381, 2000.

[16] S. Verdu, "Minimum probability of error for asynchronous Gaussian multiple access channels", IEEE Trans. Information Theory, vol. 32, no. 1, pp. 85-96, 1986.




Carlos  J.  Escudero,  Daniel  I.  Iglesia,  Monica  F.  Bugallo,  Luis  Castedo 

Departamento  de  Electronica  y  Sistemas.  Universidad  de  La  Coruna 
Campus  de  Elvina  s/n,  15.071  La  Coruna,  SPAIN 
Tel:  ++  34-981-167150,  e-mail: 


In this paper we investigate a blind channel estimation method for Multi-Carrier CDMA systems that uses a subspace decomposition technique. This technique exploits the orthogonality between the noise subspace and the received user codes to obtain a channel identification algorithm. In order to analyze the performance of this algorithm, we derive a theoretical expression for the estimation MSE using a perturbation approach. This expression is compared with the numerical results of computer simulations to illustrate the validity of the analysis.


Multi-Carrier (MC) transmission methods for Code Division Multiple Access (CDMA) communication systems have recently been proposed as an efficient technique to combat multipath propagation and have attracted increasing interest in recent years [1, 2]. In these techniques each user is assigned a unique identification code sequence and the transmitted signal is split into different subcarriers. It is assumed that the subcarrier bandwidth is smaller than the channel coherence bandwidth and, therefore, each subcarrier experiences only flat fading. As a consequence, MC-CDMA systems do not suffer from Inter-Symbol Interference (ISI). However, the effects of dispersive channels appear as random distortions in the amplitude and phase of each subcarrier. This causes a loss of orthogonality between user codes and introduces Multiple Access Interference (MAI).

In  order  to  implement  a  multiuser  detector  and  to 
reduce  MAI  it  is  necessary  to  characterize,  implicitly  or 
explicitly,  the  channel  parameters.  In  this  paper  we  in¬ 
troduce  a  new  blind  channel  estimation  technique  that 
is  based  on  a  subspace  decomposition  [3]  and  derive 
a  particular  algorithm  to  identify  the  channel  parame¬ 
ters.  We  also  obtain,  using  perturbation  techniques,  an 

This  work  has  been  supported  by  FEDER  (grant  1FD97- 

approximate  expression  of  the  estimation  Mean  Square 
Error  (MSE)  achieved  with  the  proposed  algorithm. 

The  paper  is  organized  as  follows.  Section  2  presents 
the  signal  model  of  a  synchronous  MC-CDMA  system. 
Section  3  describes  the  subspace  decomposition  tech¬ 
nique  and  the  resultant  algorithm.  In  section  4  we  per¬ 
form  the  theoretical  analysis  of  the  estimation  MSE. 
Section  5  shows  the  results  of  several  computer  simula¬ 
tions  that  illustrate  the  validity  of  the  approximations 
in  the  previous  section  and,  finally,  Section  6  is  devoted 
to  the  conclusions. 


Let us consider a discrete-time baseband equivalent model of a synchronous MC-CDMA system with $N$ users using $L$-chip signature codes. The $k$-th chip corresponding to the $n$-th symbol transmitted by the $i$-th user is given by

Figure  1:  Block  diagram  of  the  discrete-time  baseband 
model  of  a  MC-CDMA  system. 

0-7803-5988-7/00/$10.00 © 2000 IEEE



$$v_n^i(k) = s_n^i \, c_i(k), \qquad k = 0, \ldots, L-1, \quad n = 0, 1, 2, \ldots \qquad (1)$$

where $c_i(k)$ is the $k$-th chip of the $i$-th user code. In an MC-CDMA system the modulator computes the $L$-point IDFT (Inverse Discrete Fourier Transform) of (1) to obtain the following multicarrier signal

$$V_n^i(m) = \mathrm{IDFT}\big[v_n^i(k)\big] = \frac{1}{L} \sum_{k=0}^{L-1} v_n^i(k) \, e^{j\frac{2\pi}{L}km} \qquad (2)$$

This signal is transmitted through a dispersive channel with impulse response $h_i(m)$, $m = 0, \ldots, M-1$. At the receiver the observed signal is a superposition of the signals corresponding to the $N$ users plus additive white Gaussian noise (AWGN). Therefore, the received signal for the $n$-th symbol is

$$x_n(m) = \sum_{i=1}^{N} V_n^i(m) * h_i(m) + r_n(m) \qquad (3)$$

where $*$ denotes discrete convolution and $r_n(m)$ represents a white noise sequence.

To recover the transmitted symbols, the receiver applies an $L$-point DFT (Discrete Fourier Transform) to the received signal (3). Assuming perfect synchronization and a sufficiently large guard time between symbols, the resultant signal is

$$x_n(k) = \mathrm{DFT}\big[x_n(m)\big] = \sum_{i=1}^{N} v_n^i(k) H_i(k) + \tilde{r}_n(k) = \sum_{i=1}^{N} s_n^i c_i(k) H_i(k) + \tilde{r}_n(k), \qquad k = 0, \ldots, L-1 \qquad (4)$$

where $H_i(k)$ and $\tilde{r}_n(k)$ are the DFTs of $h_i(m)$ and $r_n(m)$, respectively. Rewriting (4) in vector notation we obtain


=  [£n(0),  •  ■  •  ,Xn(L  —  1)]T  =  ^2  snCjH,  +  r„ 

i— 1 

N  N 

=  ^2  slCiFhi  +  rn  =  8%  +  rn  (5) 

i— 1  i= 1 

where  T  denotes  transposition,  C,  is  a  diagonal  matrix 
whose  elements  are  the  L  chips  of  the  code  correspond¬ 
ing  to  the  i-th  user,  H,  =  [/?,( 0),  ■  •  •  ,Hi(L  —  1)]T  and 
Tn  =  [,  n(0),  •  •  • , ,  n(L  -  1)]T.  To  obtain  (5)  we  have 
used  the  relationship  Hj  =  Fh,  where  F  is  a  L  x  M 
DFT  matrix  and  h,  =  [h,(0),  ■  •  ■ ,  h,(M  -  1)]T.  Note 
that  (5)  is  a  CDMA  signal  where  the  code  associated 
to  the  *-th  user  is  c*  =  CjFh,. 
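The chain (1)-(4) and the compact model (5) can be cross-checked numerically. The sketch below (hypothetical sizes $L = 12$, $M = 4$, one user, one symbol) passes one multicarrier symbol through a circular channel convolution, which is how a sufficiently long guard time makes the linear convolution behave, and compares the receiver DFT output with $s_n^i C_i F h_i$:

```python
import numpy as np

rng = np.random.default_rng(5)
L, M = 12, 4                               # carriers and taps (illustrative)

c = rng.choice([-1.0, 1.0], size=L)        # code chips c_i(k)
s = -1.0                                   # one BPSK symbol s_n^i
h = rng.normal(size=M) + 1j * rng.normal(size=M)  # channel taps h_i(m)

# Transmitter and channel, eqs. (1)-(3): IDFT of the chips, then circular
# convolution with the channel (guard time assumed long enough)
v = np.fft.ifft(s * c)
x_time = np.fft.ifft(np.fft.fft(v) * np.fft.fft(h, L))

# Receiver, eq. (4): L-point DFT of the received samples
x_freq = np.fft.fft(x_time)

# Compact frequency-domain model of eq. (5): x_n = s_n^i C_i F h_i
F = np.fft.fft(np.eye(L))[:, :M]           # L x M DFT matrix, H_i = F h_i
model = s * np.diag(c) @ (F @ h)
```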

Assuming statistical independence between users and noise, the autocorrelation matrix of the observation vector (5) can be decomposed as

$$R = \mathrm{E}\big[\mathbf{x}_n \mathbf{x}_n^H\big] = \sum_{i=1}^{N} \tilde{c}_i \, \mathrm{E}\big[s_n^i (s_n^i)^*\big] \, \tilde{c}_i^H + \mathrm{E}\big[\mathbf{r}_n \mathbf{r}_n^H\big] = \sum_{i=1}^{N} \sigma_i^2 \tilde{c}_i \tilde{c}_i^H + \sigma_r^2 I \qquad (6)$$

where $\mathrm{E}[\cdot]$ is the expectation operator, $*$ represents conjugation, $H$ denotes conjugate transpose, $I$ is the identity matrix, and $\sigma_i^2$ and $\sigma_r^2$ are the $i$-th user signal power and the noise power, respectively.

Let us consider the eigendecomposition of (6). There are $L$ eigenvalues that we sort as $\lambda_0 \geq \lambda_1 \geq \cdots \geq \lambda_{L-1}$. It is well known that the eigenvectors associated with the $N$ most significant eigenvalues ($u_l$, $l = 0, \ldots, N-1$) span the signal subspace where the perturbed user codes $\tilde{c}_i$ lie. The remaining $L - N$ eigenvectors ($u_l$, $l = N, \ldots, L-1$) span the noise (orthogonal) subspace, and their associated eigenvalues are equal to the noise power, i.e., $\lambda_N = \cdots = \lambda_{L-1} = \sigma_r^2$ [3].

As we have seen, the perturbed user codes lie in the signal subspace and are orthogonal to the noise subspace. This property can be used to state the following system of equations for the $i$-th user:

$$\tilde{c}_i^H u_l = 0, \qquad l = N, \ldots, L-1 \qquad (7)$$

Recall that this system of equations has $M$ unknowns and $L - N$ equations. It will be solvable only if the number of equations is greater than or equal to the number of unknowns, $M \leq L - N$. This means that the number of simultaneous users, $N$, is limited by the number of carriers, $L$, and the channel length, $M$. Nevertheless, it is interesting to note that the system capacity can be increased without increasing the number of carriers by using codes with a length larger than the spreading gain [4].

In order to solve the system of equations (7), we can consider the following equivalent system

$$\|\tilde{c}_i^H u_l\|^2 = \tilde{c}_i^H u_l u_l^H \tilde{c}_i = h_i^H F^H C_i^H u_l u_l^H C_i F h_i = 0 \qquad (8)$$

for $l = N, \ldots, L-1$. The solution to these equations can be found by solving the following minimization:

$$\hat{h}_i = \arg \min_{\|h_i\|^2 = 1} \sum_{l=N}^{L-1} h_i^H F^H C_i^H u_l u_l^H C_i F h_i = \arg \min_{\|h_i\|^2 = 1} h_i^H \big[F^H C_i^H U U^H C_i F\big] h_i = \arg \min_{\|h_i\|^2 = 1} h_i^H Q_i h_i \qquad (9)$$

where the solution $\hat{h}_i$ is an estimate of the channel impulse response vector, $U$ is an $L \times (L-N)$ matrix whose columns are the eigenvectors associated with the noise subspace (i.e., $u_l$, $l = N, \ldots, L-1$) and $Q_i = F^H C_i^H U U^H C_i F$. The solution can be obtained by the least squares method and corresponds to the eigenvector of $Q_i$ associated with its minimum eigenvalue.

In practice, we do not know a priori the autocorrelation matrix (6). However, it can be estimated from the received samples as

$$\hat{R} = \frac{1}{N_s} \sum_{n=1}^{N_s} \mathbf{x}_n \mathbf{x}_n^H \qquad (10)$$

where $N_s$ is the number of received symbols used to obtain the estimate. Note that $\hat{R} \to R$ as $N_s$ tends to infinity, and likewise its eigenvalues $\hat{\lambda}_l \to \lambda_l$ and eigenvectors $\hat{u}_l \to u_l$.
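Putting (5), (6), (9) and (10) together, the estimator can be sketched end to end. All sizes and noise levels below are illustrative, and the least-squares alignment at the end compensates the complex scale ambiguity discussed next:

```python
import numpy as np

rng = np.random.default_rng(6)
L, M, Nu, Ns = 12, 4, 4, 5000              # carriers, taps, users, symbols

F = np.fft.fft(np.eye(L))[:, :M]           # L x M DFT matrix
C = [np.diag(rng.choice([-1.0, 1.0], L)) for _ in range(Nu)]
h = [rng.normal(size=M) + 1j * rng.normal(size=M) for _ in range(Nu)]
codes = np.stack([C[i] @ F @ h[i] for i in range(Nu)], axis=1)

# Received vectors of eq. (5) and the sample autocorrelation of eq. (10)
S = rng.choice([-1.0, 1.0], size=(Nu, Ns))          # BPSK symbols
noise = 0.1 * (rng.normal(size=(L, Ns)) + 1j * rng.normal(size=(L, Ns)))
X = codes @ S + noise
R_hat = X @ X.conj().T / Ns

# Noise subspace: eigenvectors of the L - Nu smallest eigenvalues of R-hat
lam, E = np.linalg.eigh(R_hat)             # eigh returns ascending order
U = E[:, : L - Nu]

# Channel estimate for user 0, eq. (9): minimum eigenvector of Q_i
Q = F.conj().T @ C[0].conj().T @ U @ U.conj().T @ C[0] @ F
_, V = np.linalg.eigh(Q)
h_est = V[:, 0]

# Resolve the complex scale ambiguity by least-squares alignment to h[0]
alpha = (h_est.conj() @ h[0]) / (h_est.conj() @ h_est)
rel_err = np.linalg.norm(alpha * h_est - h[0]) / np.linalg.norm(h[0])
```

Note that the solvability condition $M \leq L - N$ holds here ($4 \leq 12 - 4$).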

Finally, when using second order statistics, the channel impulse response can only be obtained up to a complex constant. This constant has to be compensated in order to analyze the algorithm performance. Towards this aim, we normalize the estimate of the impulse response vector as $\hat{h}_{i,\mathrm{normalized}} = \frac{h_i(0)}{\hat{h}_i(0)} \hat{h}_i$, where $h_i(0)$ and $\hat{h}_i(0)$ are the first elements of the true and estimated channel impulse response vectors, respectively.

In this section, we derive an analytical expression for the estimation MSE. For simplicity, let us denote $h_i = h$, $Q_i = Q$ and $C_i = C$. Our analysis is based on a perturbation technique [7] that allows us to express the perturbation in $\hat{h}$, $\Delta h$, in terms of the perturbation in $\hat{Q}$, $\Delta Q$. Let us consider the following identities:

$$Q h = 0, \qquad \hat{h} = h + \Delta h, \qquad \hat{Q} = Q + \Delta Q \qquad (11)$$

For a sufficiently large number of samples ($N_s \to \infty$), $\hat{Q} \to Q$, $\hat{h} \to h$ and $\hat{Q}\hat{h}$ is approximately equal to the zero vector, i.e.,

$$\hat{Q}\hat{h} = (Q + \Delta Q)(h + \Delta h) \approx \Delta Q \, h + Q \, \Delta h \approx 0$$

where we have neglected the second-order term, $\Delta Q \, \Delta h \approx 0$. Therefore,

$$Q \, \Delta h \approx -\Delta Q \, h \qquad (12)$$

and

$$\Delta h \approx -Q^{\dagger} \Delta Q \, h = -Q^{\dagger}(\hat{Q} - Q)h = -Q^{\dagger} \hat{Q} h \qquad (13)$$

where $Q^{\dagger}$ denotes the left pseudo-inverse of $Q$. The $k$-th component of $\Delta h$ is given by

$$\Delta h(k) \approx -q_k^H \hat{Q} h = -q_k^H \big(F^H C^H \hat{U} \hat{U}^H C F\big) h = -\sum_{l=N}^{L-1} q_k^H F^H C^H \hat{u}_l \hat{u}_l^H C F h = -\sum_{l=N}^{L-1} \hat{u}_l^H C F h \, q_k^H F^H C^H \hat{u}_l = -\mathrm{Trace}\big\{\hat{U}\hat{U}^H C F h \, q_k^H F^H C^H\big\} \qquad (14)$$

where $q_k$ is the $k$-th column of $(Q^{\dagger})^H$. Based on the results of [6] (page 1840, equation (4.11)), we obtain the following identity

$$\hat{U}\hat{U}^H C F h \approx -U U^H \Delta V \, V^H C F h \qquad (15)$$

where $V$ is an $L \times N$ matrix whose columns are the eigenvectors associated with the signal subspace (i.e., $u_l$, $l = 0, \ldots, N-1$) and $\Delta V = \hat{V} - V$. Moreover, from Appendix A of [6] (page 1844, equation (A.2)),

$$U^H \Delta V \approx U^H \hat{R} V \Lambda^{-1} \qquad (16)$$

where $\Lambda = \mathrm{diag}(\lambda_0 - \sigma_r^2, \ldots, \lambda_{N-1} - \sigma_r^2)$ and $\mathrm{diag}(a)$ is a diagonal matrix whose elements are the elements of the vector $a$. To remove the effect of the unknown constant in the channel vector estimate, we have to consider a normalization of the estimate. Similarly to [7], we select the following:

$$\Delta h_{\mathrm{normalized}} = \big(I - h\mathbf{1}^T\big)\,\Delta h \qquad (17)$$

where $I$ is the identity matrix and $\mathbf{1}^T = [1, 0, 0, \cdots]$. This normalization can be included in (13); $q_k$ then becomes the $k$-th column of the matrix $\big((I - h\mathbf{1}^T)Q^{\dagger}\big)^H$. Combining (15) and (16) in (14), we obtain the following expression

$$\Delta h(k) \approx \mathrm{Trace}\big\{U U^H \hat{R} V \Lambda^{-1} V^H C F h \, q_k^H F^H C^H\big\} = \sum_{l=N}^{L-1} \big(u_l^H \hat{R} V \Lambda^{-1} V^H C F h \, q_k^H F^H C^H u_l\big) = \sum_{l=N}^{L-1} u_l^H \hat{R} \, g_{lk} \qquad (18)$$

where $g_{lk} = V \Lambda^{-1} V^H C F h \, q_k^H F^H C^H u_l$.

Finally, to obtain the MSE of the channel estimation algorithm, we have to exploit the fourth-order statistics of binary and Gaussian random variables. In Appendix A it is shown that

$$\mathrm{E}\big[\|\Delta h\|^2\big] = \frac{\sigma_r^2}{N_s} \sum_{k=0}^{M-1} \Big( \mathrm{Trace}\big\{U^H U G_k^H \tilde{C}\tilde{C}^H G_k\big\} + \sigma_r^2 \, \mathrm{Trace}\big\{U^H U G_k^H G_k\big\} \Big) \qquad (19)$$

where $G_k = [g_{Nk}, \ldots, g_{(L-1)k}]$ and $\tilde{C} = [\sigma_1 \tilde{c}_1, \ldots, \sigma_N \tilde{c}_N]$.

Figure 3 shows the simulated and theoretical MSE versus the Signal to Noise Ratio (SNR) of the received users. The environment is the same as before and the curves are obtained after $N_s = 200$ symbols. We can see that both curves are very similar even for small values of SNR.


In  this  section  we  compare  the  analytical  expression 
(19)  with  the  MSE  obtained  from  computer  simulations 
of  the  algorithm  (9)  to  illustrate  the  validity  of  the 
approximation  carried  out  in  the  previous  section. 

Figure 2 examines the accuracy of the MSE analysis. It shows the time evolution of the theoretical and simulated MSE (averaged over 50 realizations). An environment with $L = 12$ carriers, a channel length $M = 4$ and 8 users received with an SNR of 12 dB was considered. It can be seen that, even for a small number of symbols, the theoretical expression fits the simulated MSE.

Figure  3:  Simulated  and  theoretical  MSE  vs.  received 
users  SNR. 


A  new  blind  channel  identification  method  for  Multi- 
Carrier  CDMA  systems  has  been  presented.  The  method 
exploits  the  orthogonality  between  the  signal  and  noise 
subspaces  of  the  incoming  signal.  It  also  has  been  inves¬ 
tigated  the  performance  of  the  method:  using  a  pertur¬ 
bation  technique,  we  derived  an  analytical  approximate 
expression  of  the  estimation  MSE.  Computer  simula¬ 
tions  have  revealed  the  high  accuracy  of  the  analytical 
approximation  carried  out. 


Taking into account that $\tilde{c}_i^H u_l = h^H F^H C^H u_l = 0$, it is straightforward to obtain from (18) that

$$\Delta h(k) = \frac{1}{N_s} \sum_{l=N}^{L-1} \sum_{n=0}^{N_s-1} u_l^H \Big( \sum_{i=1}^{N} \mathbf{r}_n (s_n^i)^* \tilde{c}_i^H + \mathbf{r}_n \mathbf{r}_n^H \Big) g_{lk} \qquad (20)$$

where $*$ represents conjugation. Therefore, the MSE is

$$\mathrm{E}\big[\|\Delta h\|^2\big] = \sum_{k=0}^{M-1} \mathrm{E}\big[\Delta h(k) \Delta h^*(k)\big] = \sum_{k=0}^{M-1} \Bigg( \frac{1}{N_s^2} \sum_{l=N}^{L-1} \sum_{p=N}^{L-1} \sum_{n=0}^{N_s-1} \sum_{m=0}^{N_s-1} \sum_{i=1}^{N} \sum_{j=1}^{N} \mathrm{E}\big[u_l^H \mathbf{r}_n (s_n^i)^* \tilde{c}_i^H g_{lk} \, g_{pk}^H \tilde{c}_j s_m^j \mathbf{r}_m^H u_p\big] + \frac{1}{N_s^2} \sum_{l=N}^{L-1} \sum_{p=N}^{L-1} \sum_{n=0}^{N_s-1} \sum_{m=0}^{N_s-1} \mathrm{E}\big[u_l^H \mathbf{r}_n \mathbf{r}_n^H g_{lk} \, g_{pk}^H \mathbf{r}_m \mathbf{r}_m^H u_p\big] \Bigg) \qquad (21)$$

where we have used the fact that the third-order moments of a Gaussian random variable are zero.

Considering statistical independence between users and noise, and the user symbols i.i.d., the first expectation in (21) is

$$\mathrm{E}\big[u_l^H \mathbf{r}_n (s_n^i)^* \tilde{c}_i^H g_{lk} \, g_{pk}^H \tilde{c}_j s_m^j \mathbf{r}_m^H u_p\big] = \sigma_i^2 \sigma_r^2 \, u_l^H u_p \, g_{pk}^H \tilde{c}_i \tilde{c}_i^H g_{lk} \, \delta(n-m)\,\delta(i-j) \qquad (22)$$

where $\delta(\cdot)$ is the Kronecker delta.

The second expectation in (21) can be expressed as

$$\mathrm{E}\big[u_l^H \mathbf{r}_n \mathbf{r}_n^H g_{lk} \, g_{pk}^H \mathbf{r}_m \mathbf{r}_m^H u_p\big] = \sigma_r^4 \, u_l^H u_p \, g_{pk}^H g_{lk} \, \delta(n-m) \qquad (23)$$

where we have used the facts that $u_l^H g_{lk} = 0$ and $\mathrm{E}[\theta_1 \theta_2 \theta_3 \theta_4] = \mathrm{E}[\theta_1 \theta_2]\mathrm{E}[\theta_3 \theta_4] + \mathrm{E}[\theta_1 \theta_4]\mathrm{E}[\theta_2 \theta_3]$ when $\theta_i$, $i = 1, 2, 3, 4$, are jointly Gaussian [7].

Figure 2: Time evolution of the simulated and theoretical MSE.

Including (22) and (23) in (21), we obtain

$$\mathrm{E}\big[\|\Delta h\|^2\big] = \frac{\sigma_r^2}{N_s} \sum_{k=0}^{M-1} \sum_{l=N}^{L-1} \sum_{p=N}^{L-1} \Big( \sum_{i=1}^{N} \sigma_i^2 \, u_l^H u_p \, g_{pk}^H \tilde{c}_i \tilde{c}_i^H g_{lk} + \sigma_r^2 \, u_l^H u_p \, g_{pk}^H g_{lk} \Big) \qquad (24)$$

which is equivalent to (19).

[1] K. Fazel, G. P. Fettweis, Multi-Carrier Spread-Spectrum, Kluwer Academic Publishers, 1997.

[2] N. Yee, J. P. Linnartz, G. Fettweis, "Multi-Carrier CDMA in Indoor Wireless Radio Networks", Proc. International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC'93), Yokohama, pp. 109-113, 1993.

[3] E. Moulines, P. Duhamel, J. F. Cardoso and S. Mayrargue, "Subspace Methods for the Blind Identification of Multichannel FIR Filters", IEEE Transactions on Signal Processing, vol. 43, no. 2, pp. 516-525, February 1995.

[4] D. I. Iglesia, C. J. Escudero, L. Castedo, "A Subspace Method for Blind Channel Identification in Multi-Carrier CDMA Systems", Second International Workshop on Multi-Carrier Spread Spectrum & Related Topics (MCSS'99), Kluwer Academic Publishers, September 1999.

[5] G. Strang, Linear Algebra and its Applications, Harcourt Brace Jovanovich, Third Edition, 1988.

[6] P. Stoica and T. Soderstrom, "Statistical Analysis and Subspace Rotation Estimates of Sinusoidal Frequencies", IEEE Transactions on Signal Processing, vol. 39, no. 8, pp. 1836-1847, August 1991.

[7] W. Qiu, Y. Hua, "Performance Analysis of the Subspace Method for Blind Channel Identification", Signal Processing, no. 50, pp. 71-81, 1996.


Kunjie  Wang  and  Yeheskel  Bar-Ness 

Center  for  Communications  and  Signal  Processing  Research 
Department  of  Electrical  and  Computer  Engineering 
New  Jersey  Institute  of  Technology 
University  Heights,  Newark,  NJ  07102,  USA 
Tel:  1-973-596-3520  Fax:  1-973-596-8473 
Email:


In this paper, a new blind adaptive multiuser detector, termed the prediction least mean kurtosis (PLMK) algorithm, is proposed for joint MAI and narrowband interference (NBI) suppression in asynchronous CDMA systems. This algorithm is based on higher-order statistics rather than the second-order statistics used in the LMS algorithm. Unlike the regular least mean kurtosis (LMK) algorithm, it takes into consideration samples earlier than those corresponding to the current bit. For comparison purposes, we also apply the regular LMK algorithm to the case of asynchronous CDMA systems. Simulation results show that the blind adaptive multiuser detector with the PLMK algorithm provides significantly better performance than the one with the regular LMK algorithm.


Blind adaptive multiuser detection has received significant attention because it can be implemented without training sequences in CDMA systems. During the past several years, much research in this area has focused on the least mean square (LMS) algorithm due to its low complexity. To achieve better performance in suppressing multiple-access interference (MAI) in synchronous CDMA systems, Tang et al. [3](1) applied instead the least mean kurtosis (LMK) algorithm. The LMK algorithm is based on higher-order statistics rather than the second-order statistics used in the LMS algorithm.

In this paper, a new blind adaptive multiuser detector,

This  research  was  partially  supported  by  New  Jersey 
Center  for  Wireless  Telecommunications. 

(1) Note that in [3] only the synchronous case was considered.

termed the prediction least mean kurtosis (PLMK) algorithm, is proposed for joint MAI and narrowband interference (NBI) suppression in asynchronous CDMA systems. Unlike the regular LMK algorithm, it takes into consideration samples earlier than those corresponding to the current bit. For comparison purposes, we also apply the regular LMK algorithm of [3] to the case of asynchronous CDMA systems. Simulation results show that the blind adaptive multiuser detector with the PLMK algorithm provides significantly better performance than the one with the regular LMK algorithm.

We consider the low-pass equivalent model of an asynchronous CDMA system. The received signal due to the $k$-th user is given by

$$r_k(t) = \sum_{i} \sqrt{P_k} \, b_k(i) \, s_k(t - iT - \tau_k) \qquad (1)$$

where $T$ is the bit interval and $b_k(i) \in \{-1, 1\}$ is the information data of the $k$-th user. $P_k$ and $\tau_k$ denote the power and relative delay of the $k$-th user, respectively. The spreading waveform $s_k(t)$ is given by

$$s_k(t) = \sum_{n=1}^{N} a_k(n) \, \psi(t - nT_c) \qquad (2)$$

where $a_k(n) \in \{-1, 1\}$ is the $n$-th element of the spreading sequence for the $k$-th user, $N$ is the processing gain and $T_c = T/N$ is the chip duration. $\psi(t)$ is a normalized rectangular pulse of width $T_c$, i.e., $\int \psi^2(t)\,dt = 1$.

The total received signal can be written as

$$r(t) = \sum_{k=1}^{K} r_k(t) + i(t) + n(t) \qquad (3)$$

where $K$ is the number of users, $i(t)$ is the NBI and $n(t)$ is the white Gaussian noise.

The received signal $r(t)$ is assumed to pass through a chip-matched filter sampled at the chip rate and synchronized to chip time. The $l$-th received signal sample at the output of the chip-matched filter is

$$r(l) = \int_{lT_c}^{(l+1)T_c} r(t) \, \psi(t - lT_c) \, dt \qquad (4)$$

from which the $l$-th NBI sample and the $l$-th white Gaussian noise sample at the output of the chip-matched filter are $i(l) = \int_{lT_c}^{(l+1)T_c} i(t)\,\psi(t - lT_c)\,dt$ and $n(l) = \int_{lT_c}^{(l+1)T_c} n(t)\,\psi(t - lT_c)\,dt$, respectively.

In this paper, we assume that the NBI is modeled as a $p$th-order AR process, i.e.,

$$i(l) = \sum_{j=1}^{p} a_j \, i(l - j) + e(l) \qquad (5)$$

where $e(l)$ is a white Gaussian process with variance $\xi^2$.
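For instance, the first-order case used later in the simulations ($p = 1$, $a_1 = 0.99$) can be generated as follows; the unit innovation variance is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(7)

a1 = 0.99                        # AR(1) coefficient, as in the simulations
n_samples = 20_000

i_nbi = np.zeros(n_samples)
for l in range(1, n_samples):
    # eq. (5) with p = 1: i(l) = a_1 * i(l-1) + e(l), e(l) white Gaussian
    i_nbi[l] = a1 * i_nbi[l - 1] + rng.normal()

# With a_1 close to 1 the process is narrowband: successive chip samples are
# strongly correlated, which is what makes the NBI predictable from its past
lag1 = float(np.corrcoef(i_nbi[1:], i_nbi[:-1])[0, 1])
```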


Without loss of generality, we assume that the power and the delay of the desired signal are, respectively, $P_1 = 1$ and $\tau_1 = 0$, and for convenience we define $\tau_k = d_k T_c$, where $d_k$ is an integer between $0$ and $N-1$. In [3], the LMK algorithm is based on the received signal sample vector $\mathbf{r}^T = [r(0), r(1), \ldots, r(N-1)]$. It is well known that the current value of the NBI is predictable from its past values. Therefore, we expect better performance by extending the received signal sample vector into the interval $[-MT_c, T]$ ($M > 0$), i.e.,

$$\mathbf{r}^T = [r(-M), r(-M+1), \ldots, r(-1), r(0), r(1), \ldots, r(N-1)],$$

which is the basis of the PLMK algorithm. We consider the case of $M < N$ in this paper. For a given relative delay vector $\mathbf{d} = [d_1, \ldots, d_K]^T$, we can obtain from (1)-(4)

$$\mathbf{r} = \sqrt{P_1}\big(b_1 \mathbf{a}_1 + b_1' \mathbf{a}_1'\big) + \sum_{k=2}^{K} \sqrt{P_k}\big(b_k \mathbf{a}_k + b_k' \mathbf{a}_k' + b_k'' \mathbf{a}_k''\big) + \mathbf{i} + \mathbf{n} \qquad (6)$$

where, for $-M \leq l \leq N-1$ and $2 \leq k \leq K$,

$$\mathbf{a}_1(l) = [a_1(l)] \, \chi_{(l \geq 0)} \qquad (7)$$

$$\mathbf{a}_1'(l) = [a_1(l+N)] \, \chi_{(l < 0)} \qquad (8)$$

$$\mathbf{a}_k(l) = [a_k(l - d_k)] \, \chi_{(d_k \leq l \leq N-1)} \qquad (9)$$

$$\mathbf{a}_k'(l) = [a_k(l + N - d_k)] \, \chi_{(-N + d_k \leq l < d_k)} \qquad (10)$$

$$\mathbf{a}_k''(l) = [a_k(l + 2N - d_k)] \, \chi_{(l < -N + d_k)} \qquad (11)$$

where $\chi_A$ is the indicator function of the set $A$, $b_k$ is the current bit of the $k$-th user, and $b_k'$ and $b_k''$ are, respectively, one bit and two bits earlier than the current bit of the $k$-th user.
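The indicator structure of (7)-(11) simply tiles the extended window with chips of three consecutive bits of a delayed user. The check below uses illustrative values $N = 7$, $M = 3$, $d_k = 6$ (a delay large enough that $\mathbf{a}_k''$ is non-zero, i.e. $d_k > N - M$):

```python
import numpy as np

rng = np.random.default_rng(9)
N_pg, M, dk = 7, 3, 6                     # processing gain, M, delay in chips

code = rng.choice([-1.0, 1.0], N_pg)      # spreading sequence a_k(0..N-1)
l = np.arange(-M, N_pg)                   # extended window l = -M, ..., N-1

def piece(shift):
    """Chips a_k(l - shift), windowed by the indicator 0 <= l - shift < N."""
    idx = l - shift
    out = np.zeros(len(l))
    mask = (idx >= 0) & (idx < N_pg)
    out[mask] = code[idx[mask]]
    return out

a_cur = piece(dk)                         # a_k: chips of the current bit
a_prev = piece(dk - N_pg)                 # a'_k: chips of the previous bit
a_prev2 = piece(dk - 2 * N_pg)            # a''_k: chips of the bit before

# Supports are disjoint and together cover every sample of the window once
support = (a_cur != 0).astype(int) + (a_prev != 0) + (a_prev2 != 0)
```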

From (6), we notice that there are $3(K-1) + 2 = 3K - 1$ vectors $\{\sqrt{P_1}\mathbf{a}_1, \sqrt{P_1}\mathbf{a}_1'\}$ and $\{\sqrt{P_k}\mathbf{a}_k, \sqrt{P_k}\mathbf{a}_k', \sqrt{P_k}\mathbf{a}_k''\}$, $k = 2, \ldots, K$. Depending on the relative delays of the multiuser interferers, we have among these $L$ ($2K \leq L \leq 3K-1$) non-zero vectors. For the $L$ non-zero vectors, we write (6) in the form

$$\mathbf{r} = \sum_{l=1}^{L} b_l \mathbf{p}_l + \mathbf{i} + \mathbf{n} \qquad (12)$$

where the non-zero vector $\mathbf{p}_1$ is the desired signal vector $\sqrt{P_1}\mathbf{a}_1$ and $b_1$ is the desired bit. The set of non-zero vectors $\{\mathbf{p}_2, \ldots, \mathbf{p}_L\}$ consists of the intersymbol interference (ISI) vector $\{\sqrt{P_1}\mathbf{a}_1'\}$ and the non-zero MAI vectors of the set $\{\sqrt{P_k}\mathbf{a}_k, \sqrt{P_k}\mathbf{a}_k', \sqrt{P_k}\mathbf{a}_k''\}$, $k = 2, \ldots, K$. $\{b_2, \ldots, b_L\}$ are the data coefficients corresponding to the vectors $\{\mathbf{p}_2, \ldots, \mathbf{p}_L\}$, respectively. For example, $b_l = b_k'$ if $\mathbf{p}_l = \sqrt{P_k}\mathbf{a}_k'$, $2 \leq l \leq L$, $1 \leq k \leq K$.

We  use  the  following  cost  function  of  [3]  to  suppress 
interference  without  requiring  training  sequence: 

/s(h)  =  3[£(rrh)2f  -£(rrh)4  (13) 

Taking  the  gradient  with  respect  to  the  vector  h  ,  we  have 
V/fi(h)  =  12£(rrh)2£(rTh)r-4£(rTh)3r  (14) 

The  mean  value  £(rrh)2  will  be  estimated  specially  by 
recursive  equation 


G(n)  =  fiG(n  - 1)+  (1  -  /3)[r (nf  h(«)]"  (15) 

with  0  <  p  <  1  is  forgetting  factor. 

Using this estimate and the instantaneous estimate of E(r^T h), r(n)^T h(n), we can get the following equation:

∇J_B[h(n)] = 4{3G(n) − [r(n)^T h(n)]^2} r(n)^T h(n) r(n)    (16)

Then the steepest-descent adaptive weight-update algorithm, the PLMK algorithm, can be characterized by

h(n + 1) = h(n) − μ ∇J_B[h(n)]    (17)

with ∇J_B[h(n)] from (16) and G(n) from (15). We can see that no training sequence is needed; the PLMK algorithm is blind.
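The update recursion in Eqs. (15)-(17) can be sketched as follows. This is a minimal illustration only: the two signature vectors, the driving data model, and all parameter values below are our own toy choices, not from the paper, and no convergence claim is implied.

```python
import numpy as np

def plmk_update(r, h, G, mu=6e-4, beta=0.4):
    """One PLMK weight update: Eq. (15) power recursion, Eq. (16)
    instantaneous gradient, Eq. (17) steepest-descent step."""
    y = float(r @ h)                         # filter output r^T h
    G = beta * G + (1.0 - beta) * y ** 2     # Eq. (15): recursive estimate of E(r^T h)^2
    grad = 4.0 * (3.0 * G - y ** 2) * y * r  # Eq. (16): stochastic gradient of J_B
    return h - mu * grad, G                  # Eq. (17)

# toy run on a synthetic two-user BPSK mixture (signatures are hypothetical)
rng = np.random.default_rng(0)
p1 = np.array([1.0, 1.0, 1.0, -1.0])         # desired user's signal vector
p2 = np.array([1.0, -1.0, 1.0, 1.0])         # interfering vector (stronger user)
h, G = p1 / 4.0, 1.0                         # matched-filter initialization
for _ in range(2000):
    b = rng.choice([-1.0, 1.0], size=2)
    r = b[0] * p1 + 3.0 * b[1] * p2 + 0.1 * rng.standard_normal(4)
    h, G = plmk_update(r, h, G)
```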


Simulation results carried out to evaluate the performance of the PLMK algorithm are depicted in Fig. 1. For comparison, we add to it the results with the regular LMK algorithm [3], but for the asynchronous case, which can be obtained from PLMK with M = 0. In this simulation, we use a three-user CDMA system employing Gold codes of length 7. For calculating the averaged SIR at the nth iteration, we use the expression given in [2]:


SIR(n)  =  - - — - 



with  J  is  the  number  of  times  the  simulations  are 
repeated.  Each  of  the  other  CDMA  users  has  power  P 
larger  than  the  desired  CDMA  user  power  Px  =  1 .  The 

delay vector is set to d = [0, 1, 3, 6]^T. The NBI is modeled as a first-order AR process with a_1 = 0.99 and power 3 dB higher than the desired signal. The white noise power is set to 0.1. We use M = 3, P_k = 10, β = 0.4, μ = 6×10^−4 and J = 500. From Fig. 1, we can easily see that the PLMK algorithm provides significantly better performance than the regular LMK algorithm with almost the same convergence rate.


In this paper, we proposed a new blind adaptive multiuser detector based on the prediction least mean kurtosis (PLMK) algorithm for jointly suppressing MAI and NBI in asynchronous CDMA systems. For comparison, we also applied the regular LMK algorithm of [3] to the case of asynchronous CDMA systems. Results show that the blind adaptive multiuser detector with the PLMK algorithm provides significantly better performance than the one with the regular LMK algorithm.


[1] O. Tanrikulu and A. G. Constantinides, "Least-mean kurtosis: A novel high-order statistics based adaptive filtering algorithm", IEE Electron. Lett., vol. 30, pp. 189-190, 1994.

[2]  M.  Honig,  U.  Madhow  and  S.  Verdu,  “Blind  adaptive 
multiuser  detection”,  IEEE  Trans.  Inform.  Theory, 
vol.  IT-41,  No.  4,  pp.  944-960,  July  1995. 

[3]  Z.  Tang,  Z.  Yang  and  Y.  Yao,  “Blind  multiuser 
detector  based  on  LMK  criterion”,  IEE  Electron. 
Lett.,  vol.35,  pp.  267-268,  1999. 


Fig. 1 Averaged output SIR versus number of iterations (N = 7, M = 3, K = 3)




Thomas  P.  Krauss,  William  J.  Hillery,  and  Michael  D.  Zoltowski 

School  of  Electrical  Engineering,  Purdue  University 
West  Lafayette,  IN  47907-1285 



We investigate a "symbol-level" MMSE equalizer for the CDMA downlink over a frequency-selective multipath channel meant to improve on the recently proposed "chip-level" downlink equalizers. Indeed the symbol-level equalizer performs better than the chip-level, but is computationally more demanding. The symbol-level equalizer is optimal for "saturated cells" where all Walsh-Hadamard channel codes are in use and have equal power. It performs very close to optimal even for relatively lightly loaded cells. We derive a bound on the off-diagonals of the covariance matrix of the transmitted data that helps explain why the equalizer works when there are fewer active channel codes than the spreading factor. Performance is evaluated through simulations to obtain the average bit error rate (BER) over a class of channels for two cases: no out-of-cell interference, and one equal power base-station. The symbol- and chip-level equalizers are compared to the conventional RAKE receiver.


Chip-level downlink equalization is a good candidate for improving capacity (in terms of users and/or data rate) in 3G cellular systems such as cdma2000 [1]. These equalizers significantly cancel multi-user access interference (MAI), the main performance limitation for the standard RAKE receiver. The good qualities of the recently proposed "chip-level equalizers" for the CDMA downlink are that they need knowledge only of the desired user's spreading code (and long-code), they change only as often as the channel so don't need to be recomputed every symbol, and the same equalizer applies to all users from a given base-station. However, these equalizers do not yield the optimal estimate of the transmitted symbol.

The optimal equalizer is conditioned on all of the channel codes in use and their powers, and also the base-station dependent long code. Since these aren't really random quantities, it should be possible to improve on the performance by using them. One option approaching the optimal one, but still having the nice feature of only needing to know the channel code(s) of the desired user, is derived here. We refer to this as the "symbol-level" equalizer. This equalizer changes every symbol, unlike the chip-level equalizer. We find that this equalizer leads to a performance improvement over the chip-level equalizer when all channel codes are in use and are equal power (in which case the derived equalizer is equal to the optimal symbol estimate). We also make some arguments, and show simulation results, that show this equalizer is applicable when there are fewer active channel codes per cell.

This work was supported under Grant No. F49620-00-1-0127.

In this paper we derive the symbol-level MMSE estimator for the two base-station case. One base-station transmits the desired user's data, while the other base-station is considered interference. Spatial diversity and/or oversampling with respect to the chip rate are handled as multiple chip-spaced channels. Our simulations assume spatial diversity is provided by two antennas at the receiver which experience independent fading, and oversample at twice the chip rate.

Some relevant papers on linear chip-level downlink equalizers that restore orthogonality of the Walsh-Hadamard channel codes and hence suppress MAI are [2, 3, 4, 5, 6, 7, 8]. Of these, [4, 7, 8] address antenna arrays, while the others consider a single antenna, possibly with oversampling. In Reference [8] we compare one and two antenna receivers. The interference from other base-stations is addressed in Ghauri and Slock [4], Frank and Visotsky [3], and by Krauss and Zoltowski in [7].

In this paper the channel and noise power are assumed known (i.e., channel estimation error is neglected). Using the exact channel in simulation and analysis leads to an informative upper bound on the performance of these methods, but must be understood as such. For adaptive versions of linear chip equalizers for the CDMA downlink see [3] and [6] and some of the references in [5]. [3, 4] present performance analysis in the form of SINR expressions for the multiple base-station case, for the chip-level equalizer. In [7] Krauss and Zoltowski show that the SINR expression along with a Gaussian assumption is a good predictor of uncoded BER for BPSK symbols for the chip-level equalizers.


The impulse response for the ith antenna channel, between the kth base-station transmitter and the mobile-station receiver, is

h_i^(k)(t) = Σ_{l=0}^{Na−1} α_{i,l}^(k) p_rc(t − τ_l)    i = 1, 2,  k = 1, 2    (1)

p_rc(t) is the composite chip waveform (including both the transmit and receive low-pass filters) which we assume has a raised-cosine spectrum. Na is the total number of delayed paths or "multipath arrivals," some of which may have zero or negligible power without loss of generality.

The channel we consider for this work consists of Na = 17 equally spaced paths 0.625 µs apart (τ_0 = 0, τ_1 = 0.625 µs, ...); this yields a delay spread of at most 10 µs, which is an upper bound for most channels encountered in urban cellular systems. We model the class of channels with 4 equal-power random coefficients with arrival times picked randomly from the set {τ_0, τ_1, ..., τ_16}; the rest of the coefficients α_{i,l}^(k) are zero. For base-station 1, once the 4 arrival times have
0-7803-5988-7/00/$  10.00  ©  2000  IEEE 


been picked at random and then sorted, the first and last arrival times are forced to be at 0 and the maximum delay spread of 10 µs respectively. Base-station 2's arrival times are chosen in the same fashion and independent of base-station 1's, but without forcing arrivals at 0 and 10 µs. The coefficients are equal-power, complex-normal random variables, independent of each other. The arrival times at antennas 1 and 2 associated with a given base-station are the same, but the coefficients are independent.
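A minimal sketch of one draw from this channel class follows. The function name is ours; the specification (17 chip-spaced delay slots 0.625 µs apart, 4 equal-power complex-normal taps, first/last arrivals pinned for base-station 1) comes from the paragraph above.

```python
import numpy as np

def draw_channel(rng, n_slots=17, n_paths=4, force_ends=False):
    """One chip-spaced channel realization: n_paths equal-power
    complex-normal taps placed at distinct delay slots."""
    slots = np.sort(rng.choice(n_slots, size=n_paths, replace=False))
    if force_ends:                          # base-station 1: arrivals at 0 and 10 us
        slots[0], slots[-1] = 0, n_slots - 1
    gains = (rng.standard_normal(n_paths)
             + 1j * rng.standard_normal(n_paths)) / np.sqrt(2.0)
    h = np.zeros(n_slots, dtype=complex)
    h[slots] = gains                        # taps 0.625 us apart -> 10 us max spread
    return h

rng = np.random.default_rng(1)
h_bs1 = draw_channel(rng, force_ends=True)  # desired base-station
h_bs2 = draw_channel(rng)                   # interfering base-station, independent
```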

The "multi-user chip symbols" for base-station k, s^(k)[n], may be described as

s^(k)[n] = c^(k)[n] Σ_{j=1}^{Nu^(k)} Σ_{m=0}^{Ns−1} A_j^(k) b_j^(k)[m] c_j^(k)[n − Nc m]    (2)

where the various quantities are defined as follows: c^(k)[n] is the base-station dependent long code; A_j^(k) is the jth user's gain; b_j^(k)[m] is the jth user's bit/symbol sequence; c_j^(k)[n], n = 0, 1, ..., Nc − 1, is the jth user's channel (short) code; Nc is the length of each channel code (assumed the same for each user); Nu^(k) is the total number of active users; Ns is the number of bit/symbols transmitted during a given time window. The signal received at the ith antenna (after convolving with a matched filter impulse response having a square-root raised cosine spectrum) from base-station k is

y_i^(k)(t) = Σ_n s^(k)[n] h_i^(k)(t − nTc)    i = 1, 2    (3)

where h_i^(k)(t) is as defined in Eqn. (1). The total received signal at the mobile-station is simply the sum of the contributions from the different base-stations plus noise:

y_i(t) = y_i^(1)(t) + y_i^(2)(t) + η_i(t)    i = 1, 2.    (4)

η_i(t) is a noise process assumed white and Gaussian prior to coloration by the receiver chip-pulse matched filter.

For the first antenna, we oversample the signal y_1(t) in Eqn. (4) at twice the chip-rate to obtain y_1[n] = y_1(nTc) and y_2[n] = y_1(Tc/2 + nTc). These discrete-time signals have corresponding impulse responses h_1^(k)[n] = h_1^(k)(t)|_{t=nTc} and h_2^(k)[n] = h_1^(k)(t)|_{t=Tc/2+nTc} for base-stations k = 1, 2.

For the second antenna, we also oversample the signal y_2(t) in Eqn. (4) at twice the chip-rate to obtain y_3[n] = y_2(nTc) and y_4[n] = y_2(Tc/2 + nTc). These discrete-time signals have corresponding impulse responses h_3^(k)[n] = h_2^(k)(t)|_{t=nTc} and h_4^(k)[n] = h_2^(k)(t)|_{t=Tc/2+nTc} for base-stations k = 1, 2.

Let M denote the total number of chip-spaced channels due to both receiver antenna diversity and/or oversampling.

The "chip-level" MMSE equalizer is shown in Figure 1 (two antenna case with no oversampling). It estimates the multi-user synchronous sum signal for either base-station 1 or 2, and then correlates with the desired user's channel code times that base-station's long code. To derive the chip-level MMSE equalizer, it is useful to define signal vectors and channel matrices based on the equalizer length Ng. The "recovered" chip signal will be ŝ^(k)[n − D] = g^(k)H y[n] for some delay D, where g^(k) is the MNg × 1 chip-level equalizer for base-station k, k = 1, 2. The equalizer coefficients g_i^(k)[n] comprise the equalizer vector

g^(k) = [g_1^(k)T ... g_M^(k)T]^T    (5)

g_i^(k) = [g_i^(k)[0], g_i^(k)[1], ..., g_i^(k)[Ng − 1]]^T    i = 1, ..., M.    (6)

The MNg × 1 vectorized received signal is given by

y[n] = H^(1) s^(1)[n] + H^(2) s^(2)[n] + η[n]    (7)

where

s^(k)[n] = [s^(k)[n], s^(k)[n − 1], ..., s^(k)[n − (Ng + L − 2)]]^T

and H_i^(k) is the Ng × (L + Ng − 1) convolution matrix

H_i^(k) =
[ h_i^(k)[0]  h_i^(k)[1]  ...  h_i^(k)[L−1]  0            ...  0
  0           h_i^(k)[0]  ...  h_i^(k)[L−2]  h_i^(k)[L−1] ...  0
  ...
  0           ...  0           h_i^(k)[0]    ...               h_i^(k)[L−1] ]

with H^(k) = [H_1^(k)T ... H_M^(k)T]^T stacking the M chip-spaced channels. Equation (7) is more compactly written as

y[n] = H s[n] + η[n]    (11)

where

H = [H^(1) : H^(2)]    (12)

s[n] = [s^(1)T[n]  s^(2)T[n]]^T.    (13)

The MMSE criterion is

min_{g^(k)} E{ |g^(k)H (H s[n] + η[n]) − δ_D^T s^(k)[n]|^2 }    (14)

where δ_D is all zeroes except for unity in the (D + 1)th position (so that δ_D^T s^(k)[n] = s^(k)[n − D]).

We assume unit energy signals, E{|s^(k)[n]|^2} = 1, and furthermore that the chip-level symbols s^(k)[n] are independent and identically distributed, E{s[n] s^H[n]} = I. This is the case if the base-station dependent long codes, c^(k)[n], are treated as iid sequences, a very good assumption in practice. The equalizer which attains the minimum is

g^(k) = (H H^H + R_ηη)^{−1} H^(k) δ_D.    (15)

The MMSE is

MMSE = 1 − δ_D^T H^(k)H (H H^H + R_ηη)^{−1} H^(k) δ_D.    (16)










Figure  1.  Chip  and  Symbol  MMSE  Estimators  for  kth  Base-Station,  two  antennas,  no  oversampling. 

The MMSE equalizer is a function of the delay D. The MMSE may be computed for each D, 0 ≤ D ≤ Ng + L − 2, with only one matrix inversion (which has to be done to form g^(k) anyway). Once the D yielding the smallest MMSE is determined, the corresponding equalizer g^(k) may be computed without further matrix inversion or system solving.
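That computation can be sketched as below: a single solve against (H H^H + R_ηη) yields the equalizer of Eq. (15) and the MMSE of Eq. (16) for every candidate delay D at once. The helper function names and all toy sizes here are our own illustrative choices.

```python
import numpy as np

def conv_matrix(h, Ng):
    """Ng x (Ng + L - 1) convolution matrix: row r holds h starting at column r."""
    L = len(h)
    A = np.zeros((Ng, Ng + L - 1), dtype=complex)
    for r in range(Ng):
        A[r, r:r + L] = h
    return A

def chip_mmse_all_delays(H1, H2, sigma2):
    """Chip-level MMSE equalizers for base-station 1, all delays at once."""
    Hcal = np.hstack([H1, H2])                        # H = [H(1) : H(2)]
    R = Hcal @ Hcal.conj().T + sigma2 * np.eye(Hcal.shape[0])
    W = np.linalg.solve(R, H1)                        # column D is R^-1 H(1) delta_D
    mmse = 1.0 - np.real(np.sum(H1.conj() * W, axis=0))   # Eq. (16) for each D
    D = int(np.argmin(mmse))
    return W[:, D], D, mmse

# toy setup: M = 2 chip-spaced channels per base-station, L = 4, Ng = 8
rng = np.random.default_rng(2)
def stacked_channel(rng, M=2, L=4, Ng=8):
    rows = [conv_matrix(rng.standard_normal(L) + 1j * rng.standard_normal(L), Ng)
            for _ in range(M)]
    return np.vstack(rows)                            # M*Ng x (Ng + L - 1)

H1, H2 = stacked_channel(rng), stacked_channel(rng)
g, D, mmse = chip_mmse_all_delays(H1, H2, 0.1)
```

Since the noise is white (R_ηη = σ²I here), the one factorization serves every delay; only the argmin over the MMSE curve changes which column is kept.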


In this section we present what we call the "symbol-level" MMSE estimator. This estimator depends on the user index and symbol index, and hence varies from symbol to symbol. The FIR estimator that we derive here is a simplified version of that presented in [9] where in our case, all the channels and delays from a given base-station are the same. The conclusions reached in that paper apply equally well here, namely that FIR MMSE equalization always performs at least as well as the "coherent combiner" (that is, the RAKE receiver). This type of symbol-level receiver has also been presented in [10], although again not specifically for the CDMA downlink.

The symbol-level equalizer differs from the chip-level equalizer in that the base station and Walsh-Hadamard codes do not appear explicitly in the block diagram (see Figure 1). Instead, the codes become incorporated into the equalizer itself. To derive the equalizer, we first define a_j^(k)[n] as the bit sequence b_j^(k)[m] upsampled by Nc: a_j^(k)[n] = b_j^(k)[m] when n = mNc and a_j^(k)[n] = 0 otherwise. We wish to estimate b_j^(k)[m] directly and we do this by finding

min E{ |a_j^(k)[n − D] − â_j^(k)[n − D]|^2 }    (17)

where the minimization is done only when n − D = mNc. As in the chip-level case, â_j^(k)[n − D] = g^(k)H y[n] where y[n] is given by Eq. (11). Setting n = mNc + D, the MSE is minimized yielding

g^(k)[m] = (H R_ss[mNc + D] H^H + R_ηη)^{−1} H R_bs[m]    (18)

where

R_ss[n] = E{s[n] s^H[n]}    (19)

R_bs[m] = E{b_j^(k)*[m] s[mNc + D]}    (20)

We now proceed to derive expressions for R_ss[n] and R_bs[m]. Using Eq. (13),

R_ss[n] = [ R^(11)[n]    0
            0            R^(22)[n] ]    (21)

where R^(kk)[n] = E{s^(k)[n] s^(k)H[n]}. We assume here that the desired user is only transmitted by base station k. We also assume that the base station and Walsh-Hadamard codes are deterministic and known so that the only random elements in s[n] are the transmitted bits. Then E{s^(k)[n] s^(j)*[m]} = 0 for k ≠ j and any n and m, so R^(12)[n] = R^(21)[n] = 0. The (i,j)th element of R^(kk)[n] is S_ij^(kk)[n] = E{s^(k)[n + 1 − i] s^(k)*[n + 1 − j]}. When i = j, S_ii^(kk)[n] = 1. When i ≠ j, S_ij^(kk)[n] = B_ij[n] W_ij^(k)[n] when n + 1 − i and n + 1 − j fall within the same symbol period, and S_ij^(kk)[n] = 0 otherwise,    (22)

where

B_ij[n] = c^(k)[n + 1 − i] c^(k)*[n + 1 − j] ∈ {±1, ±j}  ∀ i, j    (23)

W_ij^(k)[n] = Σ_{u=1}^{Nu^(k)} A_u^(k)2 c_u^(k)[(n + 1 − i) mod Nc] c_u^(k)[(n + 1 − j) mod Nc]    (24)


Figure 2. Bound on the potentially non-zero off-diagonal elements of R_ss[n] [Nc = 64].

note that, for fixed m and n, (c_1^(k)[(n + 1 − i) mod Nc], ..., c_Nc^(k)[(n + 1 − i) mod Nc]) and (c_1^(k)[(n + 1 − j) mod Nc], ..., c_Nc^(k)[(n + 1 − j) mod Nc]) are two different rows of the Hadamard matrix. The element-by-element (Schur) product of these two rows is also a row of the Hadamard matrix containing (Nc/2) 1's and (Nc/2) −1's. So, with equal user powers A_u^(k)2 = 1/Nu^(k),

|W_ij^(k)[n]| ≤ 1 for Nu^(k) = 1, ..., Nc/2, and |W_ij^(k)[n]| ≤ Nc/Nu^(k) − 1 for Nu^(k) = Nc/2 + 1, ..., Nc.    (25)

Therefore, when i ≠ j and n + 1 − i and n + 1 − j fall within the same symbol period,

|S_ij^(kk)[n]| ≤ 1 for Nu^(k) = 1, ..., Nc/2, and |S_ij^(kk)[n]| ≤ Nc/Nu^(k) − 1 for Nu^(k) = Nc/2 + 1, ..., Nc.    (26)
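The counting argument behind this bound is easy to check numerically. The sketch below builds a small Sylvester-Hadamard matrix (Nc = 8 rather than 64, to keep subset enumeration cheap), assumes equal user powers 1/Nu, and verifies the bound over every pair of chip positions and every possible set of active codes.

```python
import numpy as np
from itertools import combinations

def hadamard(n):
    """Sylvester-construction Hadamard matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

Nc = 8
H = hadamard(Nc)                                 # row u = Walsh code c_u
ok = True
for p, q in combinations(range(Nc), 2):          # two distinct chip positions
    schur = H[:, p] * H[:, q]                    # entry u: c_u[p] * c_u[q]
    for Nu in range(1, Nc + 1):
        bound = 1.0 if Nu <= Nc // 2 else Nc / Nu - 1.0
        # worst case over every size-Nu choice of active codes
        worst = max(abs(schur[list(s)].sum()) / Nu
                    for s in combinations(range(Nc), Nu))
        ok = ok and worst <= bound + 1e-12
```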

This bound is plotted as a function of Nu^(k) in Fig. 2. Note that when Nu^(k) = Nc, S_ij^(kk)[n] = 0 for all i ≠ j, so R^(kk)[n] = I. If we assume that the Walsh codes are chosen randomly when Nu^(k) < Nc, it can be shown that S_ij^(kk)[n] is a linear function of a hypergeometric random variable. Its variance is (Nc − Nu^(k))/(Nu^(k)(Nc − 1)).

Therefore, those off-diagonal elements which are not zero have zero mean and the variance shown in the plot in Fig. 2. For nearly all values of Nu^(k), the variance is clearly quite small. So in all cases, we may well approximate R_ss[n] by I in Eq. (18) yielding

g^(k)[m] = (H H^H + R_ηη)^{−1} H R_bs[m]    (27)

We will see through simulation that this approximation works quite well when compared to the "exact" equalizer constructed with a time-varying R_ss.

The ith element of R_bs[m] is (with n = mNc + D):

E{b_j^(k)*[m] s[n + 1 − i]} = c^(k)[n + 1 − i] c_j^(k)[D + 1 − i] for 0 ≤ D + 1 − i ≤ Nc − 1, and 0 otherwise.    (28)

With D satisfying Nc − 1 ≤ D ≤ L + Ng − 2, the entire Walsh code for the desired user appears in R_bs[m] and

R_bs[m] = [ 0_{D+1−Nc}  c_j^T[m]  0_{L+Ng−2−D} ]^T    (29)

c_j[m] = [c^(k)[(m + 1)Nc − 1] c_j^(k)[Nc − 1], ..., c^(k)[mNc + 1] c_j^(k)[1], c^(k)[mNc] c_j^(k)[0]]^T    (30)

While the equalizer varies from symbol to symbol due to variation in both R_ss[n] and R_bs[m], by approximating R_ss[n] by I, the variation is confined to R_bs[m].
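Under the R_ss ≈ I approximation, building the symbol-dependent part of the equalizer reduces to placing the scrambled Walsh code of Eq. (29) inside R_bs[m] and applying Eq. (27). A sketch follows; the function name and all toy sizes are ours, and Hcal stands in for the concatenated H = [H(1) : H(2)].

```python
import numpy as np

def symbol_level_equalizer(Hcal, sigma2, long_code, walsh, m, Nc, D):
    """g[m] = (H H^H + sigma2*I)^-1 H Rbs[m], with Rss approximated by I (Eq. (27)).

    Rbs[m] carries the desired user's Walsh code times the long code: element i
    is c[n+1-i] * c_j[D+1-i] for 0 <= D+1-i <= Nc-1 (Eq. (28)); we require
    Nc-1 <= D <= L+Ng-2 so the whole code fits (Eq. (29))."""
    ncols = Hcal.shape[1] // 2            # length of s(1)[n] (first block of s[n])
    n = m * Nc + D
    rbs = np.zeros(Hcal.shape[1], dtype=complex)
    for i in range(1, ncols + 1):         # desired user served by base-station 1
        if 0 <= D + 1 - i <= Nc - 1:
            rbs[i - 1] = long_code[n + 1 - i] * walsh[D + 1 - i]
    R = Hcal @ Hcal.conj().T + sigma2 * np.eye(Hcal.shape[0])
    return np.linalg.solve(R, Hcal @ rbs)

# toy sizes: Nc = 4, Ng = 6, L = 3 -> valid D in [3, 7]; M = 2 channels
rng = np.random.default_rng(4)
Nc, Ng, L, D, m = 4, 6, 3, 5, 2
ncols = Ng + L - 1
Hcal = (rng.standard_normal((2 * Ng, 2 * ncols))
        + 1j * rng.standard_normal((2 * Ng, 2 * ncols)))
long_code = np.exp(1j * np.pi / 2 * rng.integers(0, 4, size=64))  # QPSK scrambling
walsh = np.array([1.0, -1.0, 1.0, -1.0])                          # one Walsh row
g = symbol_level_equalizer(Hcal, 0.1, long_code, walsh, m, Nc, D)
```

Only rbs changes from symbol to symbol here, so the matrix factorization can be reused until the channel itself changes.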


The RAKE receiver is simply a multipath-incorporating matched filter. In particular, the RAKE can be viewed as a chip-spaced filter matched to the channel, followed by correlation with the long code times channel code. Note, in practice, these operations are normally performed in the reverse order, but may be interchanged under short-time LTI assumptions. The RAKE receiver is exactly represented by the "Chip-Level" portion of Figure 1, if we let Ng = L and g_i^(k)[n] = h_i^(k)[L − 1 − n], n = 0, ..., L − 1, i = 1, ..., M.

A wideband CDMA forward link was simulated similar to one of the options in the US cdma2000 proposal [1]. The spreading factor is Nc = 64 chips per bit. Simulations were performed for both "saturated cells," that is, all 64 possible channel codes active, as well as lightly loaded cells with 8 channel codes active. The chip rate is 3.6864 MHz (Tc = 0.27 µs), 3 times that of IS-95. The data symbols are BPSK which, for each user, are spread with a length 64 Walsh-Hadamard sequence. The signals for all the users are of equal power and summed synchronously, and each base-station had the same number of users. The sum signal is scrambled with a multiplicative QPSK spreading sequence ("scrambling code") of length 32768 similar to the IS-95 standard.

The uncoded BER results are averaged over different channels for varying SNRs. The channels were generated according to the model presented in Section 2. "SNR" is defined to be the ratio of the sum of the average powers of the received signals from the desired base-station, to the average noise power, after chip-matched filtering. "SNR per user per symbol" is the SNR divided by the number of users and multiplied by the spreading factor. For the chip-level MMSE, the total delay of the signal, D, through both channel and equalizer, was chosen to minimize the MSE of the equalizer.

We first present results for a receiver near the base-station so that out-of-cell interference is negligible. Two receive antennas are employed with no oversampling. Two equalizer lengths were simulated: for chip-level, Ng = 57 and 114, while for symbol-level, the length is chosen Nc − 1 longer. Since the chip-level equalizer is followed by correlation with the channel code times long code, its effective length is Ng + Nc − 1; hence, a fair comparison between the symbol-level and chip-level sets the symbol-level equalizer longer by Nc − 1 chips. Figure 3 presents the results for the fully loaded cell case, i.e. 64 equal power users were simulated. The RAKE receiver is significantly degraded at high SNR by the MAI, which is seen in the Figure as a BER floor for SNR greater than 10 dB. The chip- and symbol-level equalizers perform much better than the RAKE. Increasing the equalizer length improves performance for both chip-level and symbol-level. Comparing the length 57 chip-level to 120 symbol-level, we observe little improvement in the symbol level at low SNR with increasing improvement, up to 2-3 dB, at high SNR. Comparing length 114 chip-level to 177 symbol-level also shows an improvement that increases with SNR, but less of an improvement than for the shorter equalizers. Note that since all 64 channel codes are present and have equal power, R_ss = I and the symbol-level MMSE estimate is optimal in the MSE sense.

In Figure 4, once again the out-of-cell interference is assumed negligible. In this simulation only 8 equal power channel codes are active, i.e., the cell is only lightly to moderately loaded. In this simulation the RAKE receiver does much better since it experiences less in-cell MAI than for 64 users. For the range of SNR simulated the chip-level equalizer does only slightly better than the RAKE receiver. As for the fully loaded cell, the symbol-level equalizer performs better than the chip-level equalizer. For comparison the "optimal" symbol-level equalizer is shown which involves a matrix inverse for every symbol (as in Equation (18)); this equalizer is only slightly better than the symbol-level equalizer presented in this paper. This result justifies the assumption/simplification that R_ss is proportional to I, even when Nu < Nc.

Figure 5 results from a simulation with two base-stations, each with 64 equal power users. The 2nd base-station is treated as interference and is received with the same power as the 1st, desired user's base-station. Specifically,

Σ_{m=1}^{M} E{|y_m^(1)[n]|^2} = Σ_{m=1}^{M} E{|y_m^(2)[n]|^2}.    (31)

In addition to two independent antennas, two-times oversampling is employed for a total of four chip-spaced channels. The results are very analogous to the single base-station case: the symbol-level out-performs the chip-level, increasingly so at high SNR. However the improvement is more dramatic, especially for the shorter lengths.




The symbol-level equalizer derived here performs better than the chip-level, however at a greater computational cost. In fact our simulations have shown that even though the equalizer is sub-optimal, it has performance closely approaching optimality. The approximation that the source covariance is diagonal means that a matrix inverse is required only as often as the channel changes (and not every symbol), and hence the computational complexity is much smaller than for the optimal equalizer.


[1] Telecommunications Industry Association, "Physical Layer Standard for cdma2000 Standards for Spread Spectrum Systems - TIA/EIA/IS-2000.2-A", TIA/EIA Interim Standard, March 2000.

[2] Anja Klein, "Data Detection Algorithms Specially Designed for the Downlink of CDMA Mobile Radio Systems", in IEEE 47th Vehicular Technology Conference Proceedings, pp. 203-207, Phoenix, AZ, May 4-7 1997.

[3] Colin D. Frank and Eugene Visotsky, "Adaptive Interference Suppression for Direct-Sequence CDMA Systems with Long Spreading Codes", in Proceedings 36th Allerton Conf. on Communication, Control, and Computing, pp. 411-420, Monticello, IL, Sept. 23-25 1998.

[4] I. Ghauri and D. T. M. Slock, "Linear receivers for the DS-CDMA downlink exploiting orthogonality of spreading sequences", in Conf. Rec. 32nd Asilomar Conf. on Signals, Systems, and Computers, Pacific Grove, CA, Nov. 1998.

[5] Kari Hooli, Matti Latva-aho, and Markku Juntti, "Multiple Access Interference Suppression with Linear Chip Equalizers in WCDMA Downlink Receivers", in Proc. Global Telecommunications Conf., pp. 467-471, Rio de Janeiro, Brazil, Dec. 5-9 1999.

[6] Stefan Werner and Jorma Lilleberg, "Downlink Channel Decorrelation in CDMA Systems with Long Codes", in IEEE 49th Vehicular Technology Conference Proceedings, vol. 2, pp. 1614-1617, Houston, TX, May 16-19 1999.

[7] Thomas P. Krauss and Michael D. Zoltowski, "MMSE Equalization Under Conditions of Soft Hand-Off", in IEEE Sixth International Symposium on Spread Spectrum Techniques & Applications (ISSSTA 2000) (to appear), September 6-8 2000.

[8] T. Krauss and M. Zoltowski, "Oversampling Diversity Versus Dual Antenna Diversity for Chip-Level Equalization on CDMA Downlink", in Proceedings of First IEEE Sensor Array and Multichannel Signal Processing Workshop, Cambridge, MA, March 16-17 2000.

[9] Hui Liu and Mike Zoltowski, "Blind equalization in antenna array CDMA systems", IEEE Transactions on Signal Processing, vol. 45, pp. 161-172, Jan. 1997.

[10] A. Klein, G. Kaleh, and P. Baier, "Zero Forcing and Minimum Mean-Square-Error Equalization for Multiuser Detection in Code-Division Multiple-Access Channels", IEEE Transactions on Vehicular Technology, vol. 45, pp. 276-287, May 1996.

Figure 3. Fully loaded cell, all 64 channel codes in use.

Figure 4. Lightly loaded cell, 8 out of 64 active channel codes.

Figure  5.  One  interfering  base-station  of  equal 
power,  64  channel  codes  per  cell. 



Yimin Zhang*, Kehu Yang* and Moeness G. Amin†

† Department of Electrical and Computer Engineering, Villanova University, Villanova, PA 19085

* ATR Adaptive Communications Research Laboratories, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan


In this paper, we propose transform domain array processing schemes for DS-CDMA communications. Space-time adaptive processing (STAP) is a useful means to combat the multiuser interference (MUI) in CDMA systems. The computation burden and slow convergence are two major problems in implementing the STAP. This paper proposes optimum and sub-optimum transform domain arrays with different feedback schemes for CDMA communications. The transform domain arrays provide reduced computations over traditional implementation methods as well as improved convergence performance, leading to an efficient system implementation.


Array processing in direct-sequence code division multiple access (DS-CDMA) communications has recently attracted considerable attention [1, 2, 3]. The use of joint space-time adaptive processing (STAP), which includes the two-dimensional RAKE (2-D RAKE) receiver, provides excellent performance in suppressing the multiuser interference (MUI) and inter-symbol interference (ISI) as well as combining the multipath signals to achieve the RAKE diversity effect in frequency-selective fading. In order to combine a sufficient number of multipath rays to enhance the signal power and reduce the ISI, a large number of weights are required at the feedback loop. The complexity and convergence rate problems remain the bottleneck of the implementation of these systems [4].

In this paper, we propose a transform domain approach to chip-level space-time adaptive processing for DS-CDMA communications with different feedback schemes. Chip-level space-time adaptive processing effectively mitigates both MUI and ISI before despreading and, as such, only a simple correlation and summation operation with the desired user's code is required to follow. When the subband array is applied to the chip-rate STAP processing, the signal decorrelation using orthogonal transforms and feedback schemes greatly reduces the circuit size within each single feedback loop, and subsequently improves the receiver convergence performance [5, 6]. The Discrete Fourier Transform (DFT), filter banks and wavelets are among the commonly used orthogonal transforms for this purpose [7]. In this paper, we consider the DFT as an example. Decimation available at the transform domain processing also makes it possible to reduce the signal processing speed at each transform domain bin [5, 6].

The work of Y. Zhang and M. G. Amin is supported by the Office of Naval Research under Grant N00014-98-1-0176.


We consider a base station using an antenna array of N sensors with P users. In CDMA systems, usually P > N. The received signal vector at the array is expressed, in discrete-time form sampled at the chip rate, as

x(k) = Σ_{p=1}^{P} Σ_{l=−∞}^{∞} d_p(l) h_p(k − l) + b(k)    (1)

where d_p(k) and h_p(k) are the chip-rate sequence and the channel response vector of the pth user, and b(k) is the additive noise vector.

In CDMA communications, each symbol is spread into L chips. Without loss of generality, we denote the signal of the user of interest as s_1(n), and the signals from other users as s_p(n), p = 2, ..., P. Aperiodic spreading sequences are assumed. The chip length is L = T/T_c, where T and T_c are, respectively, the symbol duration and chip duration. We denote the spreading sequence for the nth symbol of the P users as c_p(n, l), p = 1, ..., P, l = 1, ..., L. Then,

d_p(k) = s_p(n) c_p(n, l - l_p)    (2)

where k = nL + l, and l_p (0 \le l_p < L) is the chip delay index that models the asynchronous system. We make the following assumptions:

A1) The information symbols s_p(n), p = 1, 2, ..., P, are wide-sense stationary and i.i.d. with E[s_p(n) s_p^*(n)] = 1.

A2) The spreading sequences c_p(n, l), p = 1, 2, ..., P, l = 1, ..., L, are assumed independent random sequences.

0-7803-5988-7/00/$10.00  ©  2000  IEEE 



A3) All channels h_p(k), p = 1, 2, ..., P, are linear time-invariant, and of a finite duration within [0, D T_c]. That is, h_p(k) = 0, p = 1, 2, ..., P, for k > D and k < 0.

A4) The noise vector b(k) is zero-mean, temporally and spatially white with

E[b(k) b^T(k + l)] = 0 for any l

and

E[b(k) b^H(k + l)] = \sigma I_N \delta(l),

where the superscripts T and H denote transpose and conjugate transpose, respectively, I_N is the N x N identity matrix, and \delta(l) is the Kronecker delta function.

By stacking M consecutive chips of x(k), we can obtain

x(k) = \sum_{p=1}^{P} H_p d_p(k) + b(k) = H d(k) + b(k),    (3)

where

x(k) = [x^T(k) \; x^T(k-1) \; \cdots \; x^T(k-M+1)]^T,    (4)

d_p(k) = [d_p(k) \; d_p(k-1) \; \cdots \; d_p(k-M+1)]^T,    (5)

d(k) = [d_1^T(k) \; d_2^T(k) \; \cdots \; d_P^T(k)]^T,    (6)

H_p = \begin{bmatrix} h_p(0) & \cdots & h_p(D_p) & 0 & \cdots & 0 \\ 0 & h_p(0) & \cdots & h_p(D_p) & \cdots & 0 \\ & & \ddots & & \ddots & \\ 0 & \cdots & 0 & h_p(0) & \cdots & h_p(D_p) \end{bmatrix},    (7)

H = [H_1 \; H_2 \; \cdots \; H_P],    (8)

and

b(k) = [b^T(k) \; b^T(k-1) \; \cdots \; b^T(k-M+1)]^T.    (9)

Denote by w the weight vector of the STAP system corresponding to x(k); the output of the STAP then becomes

y(k) = w^T x(k).    (10)

The optimum weight vector under the minimum mean square error (MMSE) criterion

\min_{w} E |y(k) - d_1(k - v)|^2    (11)

is given by the Wiener-Hopf solution

w_{opt} = R^{-1} r,    (12)

where v \ge 0 is a delay to minimize the MMSE,

R = E[x^*(k) x^T(k)],    (13)

r = E[x^*(k) d_1(k - v)],    (14)

and the superscript * denotes complex conjugate. The training signal is assumed to be an ideal replica of d_1(k).

From the assumptions A1) - A4), (13) and (14) can be expressed as

R = H^* H^T + \sigma I_{NM},    (15)

r = H_1^* e_v,    (16)

respectively, where

e_v = [0 \; \cdots \; 0 \; 1 \; 0 \; \cdots \; 0]^T    (17)

has its only nonzero entry at the (v+1)st position.


The MMSE is given by

MMSE = E |w_{opt}^T x(k) - d_1(k - v)|^2 = 1 - r^H R^{-1} r.    (18)
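As a numerical illustration of (12) and (18) (not part of the paper; the dimensions and random complex channels below are assumptions for illustration), the Wiener-Hopf weights and the resulting MMSE can be computed directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions for illustration): N antennas, M stacked
# chips, P users, channel order D.
N, M, P, D = 2, 4, 3, 1
NM = N * M

def conv_matrix(h, M):
    """Block-Sylvester matrix of a channel h = [h(0), ..., h(D)], each column N x 1."""
    D = h.shape[1] - 1
    Hp = np.zeros((N * M, M + D), dtype=complex)
    for m in range(M):
        Hp[m * N:(m + 1) * N, m:m + D + 1] = h
    return Hp

# Random complex channels stand in for the true h_p(k).
H = np.hstack([conv_matrix(rng.standard_normal((N, D + 1))
                           + 1j * rng.standard_normal((N, D + 1)), M)
               for _ in range(P)])

sigma, v = 0.1, 1
R = H.conj() @ H.T + sigma * np.eye(NM)       # eq. (15)
e_v = np.zeros(M + D)
e_v[v] = 1.0
r = H[:, :M + D].conj() @ e_v                 # eq. (16), first user's block

w_opt = np.linalg.solve(R, r)                 # Wiener-Hopf solution (12)
mmse = (1.0 - r.conj() @ w_opt).real          # eq. (18)
print(mmse)
```

The printed MMSE lies strictly between 0 and 1 because the noise term \sigma I keeps R strictly larger than H^* H^T.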

Despreading the array output signal y(k) by the signature code of the desired signal, we obtain the symbol-rate output signal for detection, expressed as

z(n) = \sum_{l=0}^{L-1} y(nL + l + v) c_1(n, l).    (19)
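The despreading step (19) can be sketched as follows; the chip-level STAP output is simulated here as a noisy, delayed copy of the desired user's chip sequence, which is an assumption for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy setup: L chips per symbol, delay v, BPSK symbols, random
# per-symbol codes c1(n, l) in {-1, +1}.
L, v, Ns = 8, 2, 5
symbols = np.sign(rng.standard_normal(Ns))        # s1(n)
c1 = np.sign(rng.standard_normal((Ns, L)))        # c1(n, l)
d1 = (symbols[:, None] * c1).ravel()              # chip sequence d1(k)

# Pretend the chip-level STAP output recovers d1 delayed by v, plus noise.
y = np.concatenate([np.zeros(v), d1]) + 0.01 * rng.standard_normal(Ns * L + v)

# z(n) = sum_l y(nL + l + v) c1(n, l)   -- the despreading step (19)
z = np.array([sum(y[n * L + l + v] * c1[n, l] for l in range(L))
              for n in range(Ns)])
print(np.sign(z))   # recovered BPSK symbols
```

Correlating with the code coherently adds the L chips of each symbol, so the symbol decisions survive the residual chip-level noise.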



3.1.  Centralized  Feedback  Scheme 

Performing a transform of x(k) by using an orthogonal matrix T, we obtain the received signal vector in the transform domain as

x_T(k) = T x(k),    (20)

where

x_T(k) = [(x_T^{(1)}(k))^T \; (x_T^{(2)}(k))^T \; \cdots \; (x_T^{(M)}(k))^T]^T,    (21)

and x_T^{(m)}(k) is the signal vector at the mth transform domain bin. Denote w_T = [(w_T^{(1)})^T \; (w_T^{(2)})^T \; \cdots \; (w_T^{(M)})^T]^T as the weight vector in the transform domain. Then the output of the transform domain array system becomes

y_T(k) = w_T^T x_T(k) = w_T^T T x(k).    (22)

Again, using the MMSE criterion

\min_{w_T} E |y_T(k) - d_1(k - v)|^2,    (23)

the optimum weight vector is given by

w_{T,opt} = R_T^{-1} r_T = T^* w_{opt},    (24)

where

R_T = E[x_T^*(k) x_T^T(k)] = T^* R T^T = (T H)^* (T H)^T + \sigma I_{MN},    (25)

r_T = E[x_T^*(k) d_1(k - v)] = T^* r = (T H_1)^* e_v.    (26)

It is easy to verify that the transform domain array with the centralized feedback scheme provides the same steady-state MMSE performance as given by equation (18). The centralized feedback scheme is depicted in Fig. 1.



we ignore the off-block-diagonal elements of the correlation matrix R_T, yielding an approximation by the block-diagonal matrix

R_T' = \begin{bmatrix} R_T^{(1)} & 0 & \cdots & 0 \\ 0 & R_T^{(2)} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & R_T^{(M)} \end{bmatrix}.

Fig.  1  Subband  array  with  centralized  feedback. 

3.2.  Localized  Feedback  Scheme 

We note that the orthogonal transform can reduce the correlation between different transform bins. The DFT, filter banks, and wavelets are commonly used methods for providing orthogonal transforms. Here we consider the DFT as the example. Denote

R_T^{(m)} = E[(x_T^{(m)}(n))^* (x_T^{(m)}(n))^T] = (T^{(m)} H)^* (T^{(m)} H)^T + \sigma I_N    (33)

as the signal covariance matrix of x_T^{(m)}(n). Using the property of block-diagonal matrices, we have

(R_T')^{-1} = \begin{bmatrix} (R_T^{(1)})^{-1} & 0 & \cdots & 0 \\ 0 & (R_T^{(2)})^{-1} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & (R_T^{(M)})^{-1} \end{bmatrix}.

Therefore, the inversion computation of dimension NM x NM becomes M parallel groups of matrix inversions of dimension N x N; as such, the computations can be greatly reduced.
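A minimal sketch of this computational saving, with assumed toy dimensions: inverting the block-diagonal approximation bin by bin reproduces the full inverse of R_T':

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy dimensions: M bins, one N x N Hermitian positive definite
# covariance block per bin.
N, M = 3, 4
blocks = []
for _ in range(M):
    A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    blocks.append(A @ A.conj().T + np.eye(N))

# Assemble the block-diagonal approximation R'_T ...
R_prime = np.zeros((N * M, N * M), dtype=complex)
for m, blk in enumerate(blocks):
    R_prime[m * N:(m + 1) * N, m * N:(m + 1) * N] = blk

# ... and invert it with M independent N x N inversions.
R_prime_inv = np.zeros_like(R_prime)
for m, blk in enumerate(blocks):
    R_prime_inv[m * N:(m + 1) * N, m * N:(m + 1) * N] = np.linalg.inv(blk)

# The per-bin inverses assemble the full inverse exactly.
assert np.allclose(R_prime @ R_prime_inv, np.eye(N * M))
```

The M small inversions cost on the order of M N^3 operations rather than the (NM)^3 of a full inversion, which is the saving the text describes.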







When recursive methods are used, this is realized by using M parallel control loops with N weights in each loop. The localized feedback scheme is shown in Fig. 2.



as the M x M transform matrix at the output of each array sensor, where

W_M = \exp\left(-j \frac{2\pi}{M}\right),    (28)

then the transform matrix T becomes

T = P_2 (I_N \otimes T_M) P_1,    (29)

where \otimes denotes the Kronecker product. In (29), P_1 is a permutation matrix that reorders the vector x(n) such that the M samples at each array sensor align together, and P_2 is another permutation matrix that allows the N data of each bin to align together.
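A sketch of the construction in (29) for the DFT case, with assumed small dimensions; the permutations P_1 and P_2 below are built to match the stacking orders described above:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed small dimensions: N sensors, M stacked chips.
N, M = 2, 4

T_M = np.fft.fft(np.eye(M)) / np.sqrt(M)   # unitary M-point DFT matrix

# P1: reorder the chip-major stacking so the M samples of each sensor align.
P1 = np.zeros((N * M, N * M))
for n in range(N):
    for m in range(M):
        P1[n * M + m, m * N + n] = 1.0

# P2: reorder the per-sensor DFT outputs so the N data of each bin align.
P2 = np.zeros((N * M, N * M))
for m in range(M):
    for n in range(N):
        P2[m * N + n, n * M + m] = 1.0

T = P2 @ np.kron(np.eye(N), T_M) @ P1      # eq. (29)

# T is orthonormal (unitary).
assert np.allclose(T @ T.conj().T, np.eye(N * M))

# Check against directly DFT-ing each sensor's M chips.
x = rng.standard_normal(N * M)
x_chips = x.reshape(M, N)                  # x[m*N + n] = chip m, sensor n
bins = T_M @ x_chips                       # DFT along chips for every sensor
assert np.allclose(T @ x, bins.reshape(M * N))
```

Because T is unitary, applying it leaves the eigenvalues of the full-band covariance unchanged, which is the property used later in the convergence discussion.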

T can be expressed in the form

T = [(T^{(1)})^T \; (T^{(2)})^T \; \cdots \; (T^{(M)})^T]^T,    (30)

where T^{(m)} is the N x NM submatrix of the matrix T corresponding to the mth bin. Denote

x_T^{(m)}(n) = T^{(m)} x(n)    (31)

as the signal vector at the mth subband. When the signal correlation between different transform bins is small,

Fig. 2 Subband array with localized feedback.

We use d_1(k) as the reference signal at each transform bin. In this case, the cross-correlation vector between the received signal vector and the reference signal at the mth transform bin becomes

r_T^{(m)} = E[(x_T^{(m)}(k))^* d_1(k - v)] = (T^{(m)} H_1)^* e_v.


In the localized feedback scheme, the weight vector at each bin can be obtained from the N x N correlation matrix R_T^{(m)} and the N x 1 correlation vector r_T^{(m)}, which are determined only by the data vector and reference signal at that bin, i.e.,

w_T^{(m)} = (R_T^{(m)})^{-1} r_T^{(m)}.

Therefore, the centralized feedback transform domain array can be approximated by a set of parallel independent rank-reduced adaptive array processors at each bin, at the cost of ignoring the correlation between signals at different bins. Such a transform domain array with the localized feedback scheme can be easily implemented by using a set of parallel array processors, each with the number of weights equal to N, instead of NM.

It is clear that

[(r_T^{(1)})^T \; (r_T^{(2)})^T \; \cdots \; (r_T^{(M)})^T]^T = r_T.    (37)

Therefore, the equivalent full-band weight vector of the localized feedback transform domain array becomes

w_T' = (R_T')^{-1} r_T' = (R_T')^{-1} r_T.    (38)

Fig.  3  Subband  array  with  partial  feedback. 

The corresponding MSE of the localized feedback scheme is given by

MSE_{LF} = 1 + r_T^H (R_T')^{-1} R_T (R_T')^{-1} r_T - 2 \mathrm{Re}[r_T^H (R_T')^{-1} r_T].    (39)

Equation (39) implies that the localized feedback transform domain array approach is suboptimal, and its performance depends on the significance of the cross-correlation between signals at different bins. It is clear from (25) and (39) that the off-block-diagonal elements of the matrix R_T, and subsequently the MSE performance of the localized feedback subband array, depend on both the transform matrix T and the channels H_p, p = 1, 2, ..., P.
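The suboptimality expressed by (39) can be checked numerically; the sketch below uses a random unitary matrix in place of the DFT-based T and random channels, both assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed toy dimensions and random channels.
N, M, P, Dp = 2, 4, 3, 1
NM = N * M

H = (rng.standard_normal((NM, P * (M + Dp)))
     + 1j * rng.standard_normal((NM, P * (M + Dp))))
sigma = 0.5
R = H.conj() @ H.T + sigma * np.eye(NM)   # full-band covariance, eq. (15) form
e_v = np.zeros(M + Dp)
e_v[0] = 1.0
r = H[:, :M + Dp].conj() @ e_v

# A random unitary Q stands in for the DFT-based transform T (assumption).
Q, _ = np.linalg.qr(rng.standard_normal((NM, NM))
                    + 1j * rng.standard_normal((NM, NM)))
R_T = Q.conj() @ R @ Q.T                  # eq. (25)
r_T = Q.conj() @ r                        # eq. (26)

# Block-diagonal approximation R'_T: keep only the M diagonal N x N blocks.
R_T_prime = np.zeros_like(R_T)
for m in range(M):
    sl = slice(m * N, (m + 1) * N)
    R_T_prime[sl, sl] = R_T[sl, sl]

mmse = (1 - r_T.conj() @ np.linalg.solve(R_T, r_T)).real              # eq. (18)
u = np.linalg.solve(R_T_prime, r_T)
mse_lf = (1 + u.conj() @ R_T @ u - 2 * np.real(r_T.conj() @ u)).real  # eq. (39)
print(mmse, mse_lf)
```

Since (39) is the MSE quadratic form evaluated at a non-optimal weight vector, MSE_LF can never fall below the MMSE of (18); the gap measures how much cross-bin correlation was discarded.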

3.3.  Partial  Feedback  Scheme 

In the previous subsection, we discussed the transform domain array with the localized feedback scheme as an approximation of the transform domain array with the centralized feedback scheme. Such a localized feedback scheme reduces the number of weights at each bin at the expense of performance reduction, since the off-block-diagonal elements are not considered in the weight estimation.

A subband array with partial feedback, which is shown in Fig. 3, is also possible and provides more flexibility in trading off the system complexity with the steady-state MSE performance. As shown below, the partial feedback scheme is a generalization of the centralized and localized feedback schemes, which can be considered as two extreme and special cases.

In the transform domain array with the partial feedback scheme, the total M bins are divided into K groups. The number of bins in the ith group is M_i, i = 1, 2, ..., K, with M_1 + M_2 + \cdots + M_K = M. In this paper, we consider the simple case of M_1 = M_2 = \cdots = M_K = M/K.

In this case, the signal covariance matrix R_T is approximated by a new block-diagonal matrix R_T'' with larger block size M_1 N, expressed as

R_T'' = \begin{bmatrix} R_T^{(G_1)} & 0 & \cdots & 0 \\ 0 & R_T^{(G_2)} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & R_T^{(G_K)} \end{bmatrix},

where R_T^{(G_i)} is of dimension M_1 N x M_1 N. For M_1 > 1, fewer off-block-diagonal elements are ignored in R_T'' compared to R_T'. Therefore, the partial feedback scheme provides more accurate weight estimation, and subsequently better MSE results, as compared with the localized feedback scheme. Similar to the localized feedback case, the weight vector in the partial feedback scheme is given by

w_T'' = (R_T'')^{-1} r_T'' = \begin{bmatrix} (R_T^{(G_1)})^{-1} r_T^{(G_1)} \\ \vdots \\ (R_T^{(G_K)})^{-1} r_T^{(G_K)} \end{bmatrix},

where

r_T^{(G_i)} = E[(x_T^{(G_i)}(k))^* d_1(k - v)],

as d_1(k - v) is used as the reference signal at each group, and x_T^{(G_i)}(k) is the vector stacking the M_1 transform bins of the ith group. It is clear that

r_T'' = [(r_T^{(G_1)})^T \; (r_T^{(G_2)})^T \; \cdots \; (r_T^{(G_K)})^T]^T = r_T,


the MSE of the partial feedback array is therefore

MSE_{PF} = 1 + r_T^H (R_T'')^{-1} R_T (R_T'')^{-1} r_T - 2 \mathrm{Re}[r_T^H (R_T'')^{-1} r_T].    (45)

It is noted that the partial feedback scheme simplifies to the centralized feedback scheme when M_1 = M. In this case, R_T'' becomes R_T, and equation (45) becomes equation (18). On the other hand, the localized feedback scheme is achieved by setting M_1 = 1. In this case, R_T'' becomes R_T', and equation (45) becomes equation (39).


In this section, we consider the convergence performance of the transform domain arrays with centralized feedback and localized feedback. The popular least mean square (LMS) algorithm is considered.

One of the key factors affecting the convergence performance in the proposed transform domain arrays is the number of controllable weights in the feedback system. In the transform domain array with the centralized feedback scheme, the number of weights is NM, whereas in the cases of the transform domain array with localized feedback and partial feedback schemes, the number of weights in each independent control loop is N and M_1 N, respectively (although the total number of weights over all bins remains NM).

It is known that the convergence rate of the LMS algorithm depends on the eigenvalue spread, i.e., the ratio between the maximum and minimum eigenvalues of the covariance matrix [8]. Since the covariance matrix defined at a bin, R_T^{(m)}, m = 1, ..., M, or that defined at several bins, R_T^{(G_i)}, i = 1, ..., K, is a submatrix of R_T, it follows from the interlacing property [9] that the eigenvalue spreads of R_T^{(m)} and of R_T^{(G_i)} are smaller than that of R_T. Therefore, the transform domain arrays with localized and partial feedback provide improved convergence performance.
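The interlacing argument can be illustrated numerically (the toy Hermitian covariance below is an assumption for illustration): every diagonal block of R_T has an eigenvalue spread no larger than that of R_T itself.

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed toy covariance: Hermitian positive definite, NM x NM.
N, M = 3, 4
A = rng.standard_normal((N * M, N * M)) + 1j * rng.standard_normal((N * M, N * M))
R_T = A @ A.conj().T + 0.1 * np.eye(N * M)

def spread(R):
    """Eigenvalue spread: largest over smallest eigenvalue."""
    w = np.linalg.eigvalsh(R)       # ascending, real for Hermitian R
    return w[-1] / w[0]

full = spread(R_T)
# Cauchy interlacing: each principal N x N block has a smaller spread.
for m in range(M):
    sl = slice(m * N, (m + 1) * N)
    assert spread(R_T[sl, sl]) <= full + 1e-9
```

By the Cauchy interlacing theorem, a principal submatrix cannot have a smaller minimum eigenvalue or a larger maximum eigenvalue than the full matrix, so its spread is bounded by the full spread.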

On the other hand, when comparing the STAP system and the transform domain array with the centralized feedback scheme, since an orthonormal transform does not change the eigenvalues, it is clear that the eigenvalue spreads of R and R_T are the same. Therefore, the STAP system and the centralized feedback transform domain array offer the same convergence performance [6]. However, if the signal powers at different bins are different (due to, e.g., pulse shaping filtering or frequency-selective channel characteristics), the convergence performance can be improved by performing power compensation at the different bins so that the eigenvalue spread is reduced [10, 11, 12].


We have analyzed the performance of transform domain arrays for DS-CDMA systems with different types of feedback schemes, and derived the respective expressions of the mean square error (MSE). For all proposed schemes, the transformation is performed at the chip level before despreading. It has been shown that transform domain arrays with localized and partial feedback schemes are generally suboptimal, and their MSE performance depends on the transform matrix of the analysis filters as well as the communication channel characteristics. Since the localized feedback scheme reduces the number of weights in the control loop, the convergence rate is usually improved, which is of practical importance in implementing space-time adaptive processing in fast fading environments. The partial feedback scheme generalizes the other two proposed schemes, namely, the centralized and localized feedback systems. This scheme provides the flexibility to balance the system complexity with the steady-state and convergence performance.


[1] A. J. Paulraj and C. B. Papadias, "Space-time processing for wireless communications," IEEE Signal Processing Magazine, vol. 14, no. 6, pp. 49-83, Nov. 1997.

[2]  U.  Madhow  and  M.  Honig,  “MMSE  interference  sup¬ 
pression  for  direct-sequence  spread-spectrum  CDMA,” 
IEEE  Trans.  Commun.,  vol.  42,  pp.  3178-3188,  Dec. 

[3]  H.  Liu  and  M.  D.  Zoltowski,  “Blind  equalization  in 
antenna  array  CDMA  systems,”  IEEE  Trans.  Signal 
Processing ,  vol.  45,  no.  1,  pp.  161-172,  Jan.  1997. 

[4]  U.  Madhow,  “Blind  adaptive  interference  suppression 
for  direct-sequence  CDMA,”  Proc.  IEEE,  vol.  86,  no. 
10,  pp.  2049-2069,  Oct.  1998. 

[5]  Y.  Zhang,  K.  Yang,  and  M.  G.  Amin,  “Adaptive  sub¬ 
band  arrays  for  multipath  fading  mitigation,”  in  Proc. 
IEEE  AP-S  Int.  Symp.,  Atlanta,  GA,  pp.  380-383,  June 


[6]  Y.  Kamiya  and  Y.  Karasawa,  “Performance  comparison 
and  improvement  in  adaptive  arrays  based  on  the  time 
and  frequency  domain  signal  processing,”  IEICE  Trans. 
Commun.,  vol.  J82-A,  no.  6,  pp.  867-874,  June  1999. 

[7]  G.  Strang  and  T.  Nguyen,  Wavelets  and  Filter  Banks, 
Wellesley-Cambridge,  1996. 

[8]  S.  Haykin,  Adaptive  Filter  Theory,  3rd  Ed.  New  Jersey: 
Prentice  Hall,  1996. 

[9] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd Ed. Maryland: Johns Hopkins Univ. Press, 1996.

[10]  J.  C.  Lee  and  C.  K.  Un,  “Performance  analysis  of 
frequency-domain  block  LMS  adaptive  digital  filters,” 
IEEE  Trans.  Circuits  and  Systems,  vol.  36,  no.  2,  pp. 
173-189,  Feb.  1989. 

[11] M. de Courville and P. Duhamel, "Adaptive filtering in subbands using a weighted criterion," IEEE Trans. Signal Processing, vol. 46, no. 9, pp. 2359-2371, Sept. 1998.

[12]  K.  Yang,  Y.  Zhang,  and  Y.  Mizuguchi,  “Subband  real¬ 
ization  of  space-time  adaptive  processing  for  mobile 
communications,”  in  Proc.  10th  Int.  Symp.  on  Personal, 
Indoor  and  Mobile  Radio  Communications,  Osaka,  Sept. 



Sectorized Space-Time Adaptive Processing
for CDMA Systems

Kehu Yang 1, Yoshihiko Mizuguchi 1, and Yimin Zhang 2

1  ATR  Adaptive  Communications  Research  Laboratories, 
Seika-cho,  Soraku-gun,  Kyoto  619-0288,  Japan 

2  Department  of  Electrical  and  Computer  Engineering, 
Villanova  University,  Villanova,  PA  19085,  USA 


Space-time adaptive processing (STAP) is an effective technique for suppressing both the multiuser access interference (MUAI) and the inter-symbol interference (ISI) in wideband CDMA mobile communication systems. However, its complexity is one of the key problems in practical implementations. In this paper we propose adaptive antenna techniques that realize low-complexity space-time adaptive processing within a given spatial sector by spatial-smoothing subarray beamforming sectorization. The proposed technique has performance close to that of the associated optimum element-space STAP system.


In direct-sequence code-division multiple-access (DS-CDMA) systems, adaptive antennas under the scheme of space-time adaptive processing (STAP) [1, 2] are called two-dimensional RAKE (2-D RAKE) receivers [3], and are known to be effective in suppressing both the multiuser access interference (MUAI) and the inter-symbol interference (ISI). However, the prohibitive computational complexity of STAP systems is one of the key problems that restricts their application to practical systems. To reduce their complexity, optimal and sub-optimal approaches based on parallel implementation and low-rank transformations have been proposed [4-8].

Beamspace-based partially adaptive processing methods are sub-optimal approaches widely used in array signal processing, where reduced-dimension processing is performed by employing a few beams to encompass the significant components in the systems [4, 9]. The sectorized beamspace adaptive diversity combiner is one such application, which is effective in combating multipath fading in wireless communications [4]. References [5] and [6] proposed two other approaches that involve wideband beamforming and reduced-dimension beamforming, respectively.

In this paper we propose novel low-complexity sectorized adaptive antenna techniques which use spatial-smoothing subarray beamformers to achieve effective beam diversity as well as sufficient degrees of freedom (DOF's) for MUAI suppression. In the proposed techniques, the full field of view is divided into a number of spatial sectors, wherein the sectorized STAP is performed individually. The array is partitioned into a set of subarrays, each forming a beam to cover the same specific sector of interest. In the sector of interest, the number of MUAI's is greatly reduced from the full field-of-view condition. The sectorized STAP scheme combines the advantages of the reduced-rank beamspace processing and the spatio-temporal processing techniques. In comparison with conventional STAP systems operating over the full field of view, the complexity of the sectorized processing is greatly reduced, whereas the performance loss relative to the optimum STAP systems can be kept small.


Consider a cellular CDMA base station using an antenna array of N (N > 1) elements with P users. The p-th user's baseband waveform of the transmitted signal is expressed as

s_p(t) = \sum_{m=-\infty}^{\infty} s_p(m) p_p(t - mT),    (1)

where s_p(m) denotes the m-th information symbol of the p-th user,

p_p(t) = \sum_{j=0}^{N_c - 1} c_p(j) \psi(t - jT_c), \quad 0 \le t < T,    (2)

represents the signature waveform of the p-th user, \{c_p(j)\}_{j=0}^{N_c - 1} is the spreading code assigned to the p-th user, N_c is the number of chips per symbol, \psi(t) is the



normalized chip waveform limited within [0, T_c], and T_c is the chip interval. The spreading sequence can be periodic or aperiodic, depending on the standard used. In this paper, we consider the periodic case, i.e., non-random CDMA systems.

The array receiving signal vector x(t) is denoted as

x(t) = \sum_{p=1}^{P} \sum_{l=1}^{L_p} a(\theta_{lp}) \xi_{lp} s_p(t - \tau_{lp}) + n(t) = \sum_{p=1}^{P} \sum_{m=-\infty}^{\infty} s_p(m) g_p(t - mT) + n(t),    (3)

where

g_p(t) = \sum_{l=1}^{L_p} a(\theta_{lp}) \xi_{lp} p_p(t - \tau_{lp}),    (4)


where \{\theta_{lp}, \tau_{lp}, \xi_{lp}\} express respectively the angle-of-arrival (AOA), the time delay, and the propagation loss corresponding to the l-th path of the p-th user. Moreover, a(\theta) is the array steering vector corresponding to \theta; s_p(m) denotes the m-th information symbol of the p-th user, L_p is the total number of multipath rays of the p-th user, T = N_c T_c is the symbol duration, and n(t) is the array noise vector.



Denote

h_p(t) = \sum_{l=1}^{L_p} a(\theta_{lp}) \xi_{lp} \psi(t - \tau_{lp})    (5)

as the channel response of the p-th user; we can then rewrite (3) as

x(t) = \sum_{p=1}^{P} \sum_{m=-\infty}^{\infty} \sum_{j=0}^{N_c - 1} s_p(m) c_p(j) h_p(t - jT_c - mT) + n(t).    (6)

We make the following assumptions:

A1) The information symbols s_p(m), p = 1, ..., P, are i.i.d., and satisfy E\{s_p(m) s_q^*(n)\} = \delta_{pq} \delta_{mn}, where (\cdot)^* denotes complex conjugation and \delta_{pq} denotes the Kronecker delta function.

A2) The channels \{h_p(t), p = 1, ..., P\} are linear and time-invariant with a finite duration within [0, D_p T_c]. Here, we assume D_p T_c > T for wideband CDMA channels.

A3) The noise vector is zero-mean, temporally and spatially white with E\{n(t) n^T(t)\} = 0 and E\{n(t) n^H(t)\} = \sigma^2 I, where (\cdot)^T and (\cdot)^H denote transpose and conjugate transpose, respectively, \sigma^2 expresses the noise power, and I is the identity matrix. The noise vector is also assumed to be uncorrelated with the user signals.

Denote \Delta = T_c / J as the sampling cycle, where J \ge 1 is an integer which expresses the oversampling factor. Thus, sampling at t = i\Delta + nT_c, the discrete form of (6) becomes

x(i\Delta + nT_c) = \sum_{p=1}^{P} \sum_{m=-\infty}^{\infty} \sum_{j=0}^{N_c - 1} s_p(m) c_p(j) h_p(i\Delta + nT_c - jT_c - mT) + n(i\Delta + nT_c), \quad i = 0, ..., J-1.    (7)

By stacking x(i\Delta + nT_c), i = 0, ..., J-1, we have

x(n) = \sum_{p=1}^{P} \sum_{d=0}^{D_p} p_p(n - d) h_p(d) + n(n),    (8)

where p_p(n) is the chip-rate signal sequence of the p-th user. In (8), we use the notation a(n) = [a^T(nT_c), \cdots, a^T(nT_c + (J-1)\Delta)]^T, where a denotes either x, h, or n.


1.  Chip-level  optimum  adaptive  processing 

For the consecutive samples during the period of M chips (M > N_c), we form the following vectors:

X(n) = [x^T(n), x^T(n-1), \cdots, x^T(n-M+1)]^T,    (9)

S_p(n) = [p_p(n), p_p(n-1), \cdots, p_p(n-M-D_p+1)]^T,    (10)

N(n) = [n^T(n), n^T(n-1), \cdots, n^T(n-M+1)]^T.    (11)

Define the following Sylvester convolution matrix of user p by the impulse response of its vector channel, [h_p^T(0), h_p^T(1), \cdots, h_p^T(D_p)]^T, as

H_p^{(M)} = \begin{bmatrix} h_p(0) & \cdots & h_p(D_p) & 0 & \cdots & 0 \\ 0 & h_p(0) & \cdots & h_p(D_p) & \cdots & 0 \\ & & \ddots & & \ddots & \\ 0 & \cdots & 0 & h_p(0) & \cdots & h_p(D_p) \end{bmatrix}    (12)

with the dimension MNJ x (M + D_p), and (8) is extended to
X(n) = \sum_{p=1}^{P} H_p^{(M)} S_p(n) + N(n).    (13)

The output of the STAP under (13) is described as

y(n) = W^T X(n).    (14)

Under the minimum mean square error (MMSE) criterion

\min_W E |p_{p_0}(n - v) - y(n)|^2,    (15)

where p_{p_0}(n) is the training chip sequence of the user p_0, which is considered as the desired user, and v \ge 0 is the delay of the training signal selected to minimize the MMSE, the optimum weights are given by the Wiener-Hopf equation as

W_{opt,chip} = R_X^{-1} r_{p_0}(v),    (16)


where

R_X = E[X(n) X^H(n)]    (17)

is the space-time correlation matrix, and

r_{p_0}(v) = E[p_{p_0}^*(n - v) X(n)]    (18)

expresses the cross-correlation vector between the training signal and the received signal vector. It is seen that the complexity of the chip-level adaptive filter depends on the dimension of the signal vector, i.e., the dimension of the weight vector, which is selected based on the length of the associated channels.

It is noted that in CDMA systems, the performance of the chip-level processing is confined by the number of degrees of freedom (DOF's) provided by the employed array and the cyclostationarity of the users' signals. Such a problem can be mitigated in the scheme of symbol-level processing, where the MUAI components become quasi-random noises after despreading with the signature code of the desired user.

2.  Symbol-level  optimum  adaptive  processing 

Symbol-level processing is so called because symbol-duration spaced taps are used in the space-time filter. Similar to the oversampling-based subchannel formulation made in (7) and (8), the subchannel-based signal vector after despreading the array receiving signals with the signature code of the desired user p_0 is denoted as

X_c(mN_c) = \sum_{l=0}^{N_c - 1} X_s(mN_c + l) c_{p_0}(l),    (19)



X_s(\beta) = [x^T(\beta), x^T(\beta - 1), \cdots, x^T(\beta - N_c + 1)]^T.    (20)

By stacking K consecutive-symbol samples, we have the space-time signal vector as

X_c(m) = [X_c^T(mN_c), \cdots, X_c^T((m - K + 1)N_c)]^T.    (21)

Let M = KN_c; from (19)-(21), it is seen that X_c(m) has the same form as (13). This implies

X_c(m) = \sum_{p=1}^{P} \sum_{l=0}^{N_c - 1} H_p^{(M)} S_p(mN_c + l) c_{p_0}(l) + \sum_{l=0}^{N_c - 1} N(mN_c + l) c_{p_0}(l).    (22)


It is seen that the despread signal vector has KN_c + D_{p_0} components that are the consecutive samples of the single-path despreading signal waveform plotted in Fig. 1, where the peaks are the desired finger outputs. The peak components of the vector standing for the MUAI's should be suppressed because they could lead to false fingers in situations where the near-far problem exists. When there is no near-far problem, they are considered as quasi-random noises. The symbol-level adaptive processing can be performed based on (21), i.e.,
adaptive  processing  can  be  performed  based  on  (21),  i.e., 

y_c(m) = W^T X_c(m) = \sum_{l=0}^{K-1} w_l^T X_c((m - l)N_c),    (23)

where W = [w_0^T, w_1^T, \cdots, w_{K-1}^T]^T. Similar to the chip-level processing, under the symbol-level MMSE criterion

\min_W E |s_{p_0}(m - v) - y_c(m)|^2,    (24)

the optimum weight vector is obtained as

W_{opt,symbol} = R_c^{-1} \gamma_{p_0}(v),    (25)

where

R_c = E[X_c(m) X_c^H(m)],    (26)

\gamma_{p_0}(v) = E[s_{p_0}^*(m - v) X_c(m)],    (27)

s_{p_0}(m) denotes the training symbol sequence of the p_0-th user, and v is selected in the same way as explained in (15).

It is noted that the above filter (25) still has the same complexity as that given in (16).

IV. Sectorized space-time adaptive processing


1.  Lower-rank  beamspace  transformation 

Lower-rank  beamspace  transformation  is  known  to  be  an 
effective  way  to  reduce  the  complexity  of  an  array 
processing  system.  Unlike  the  scheme  of  the  conventional 
beamforming,  here  we  consider  the  smoothing  subarray 
beamforming  illustrated  in  Fig.  2. 

Define b = [b_1, b_2, \cdots, b_{N-K}]^T as the beamformer vector, which forms a beam to encompass the desired signal at each of the K+1 subarrays (K < N). Then, the output signal vector of the beamforming in Fig. 2 is denoted as

x_b(t) = B^T x(t),    (28)

where x_b(t) = [x_{b1}(t), x_{b2}(t), \cdots, x_{b,K+1}(t)]^T, and the beamformer matrix B is expressed by

B = \begin{bmatrix} b_1 & 0 & \cdots & 0 \\ b_2 & b_1 & & \vdots \\ \vdots & b_2 & \ddots & 0 \\ b_{N-K} & \vdots & \ddots & b_1 \\ 0 & b_{N-K} & & b_2 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & b_{N-K} \end{bmatrix},    (29)

which is of dimension N x (K+1).
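The structure of B in (28)-(29) can be sketched as follows (the random beamformer coefficients are an assumption for illustration); each component of x_b = B^T x is the beamformer b applied to one shifted subarray:

```python
import numpy as np

rng = np.random.default_rng(6)

# Assumed values: N array elements, beamformer length N - K, giving K+1
# subarray outputs (random coefficients stand in for a designed beam).
N, K = 8, 2
L_sub = N - K
b = np.exp(1j * rng.uniform(-np.pi, np.pi, L_sub))

B = np.zeros((N, K + 1), dtype=complex)
for i in range(K + 1):
    B[i:i + L_sub, i] = b          # i-th column: b shifted down by i rows

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
x_b = B.T @ x                      # eq. (28)

# Each component is the beamformer applied to one shifted subarray.
for i in range(K + 1):
    assert np.isclose(x_b[i], b @ x[i:i + L_sub])
```

This is the spatial-smoothing structure: one fixed beam slides across overlapping subarrays, trading array dimension N for K+1 beam-diversity branches.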

X_{bc}(m) = [X_{bc}^T(mN_c), \cdots, X_{bc}^T((m - K + 1)N_c)]^T.    (40)

Under the MMSE criterion

\min_W E |s_{p_0}(m - v) - y_{bc}(mN_c)|^2,    (41)

the optimum weight vector is obtained as

W_{opt,sector} = R_{bc}^{-1} \gamma_{p_0}^{(b)}(v),    (42)

where

R_{bc} = E[X_{bc}(mN_c) X_{bc}^H(mN_c)],    (43)

\gamma_{p_0}^{(b)}(v) = E[s_{p_0}^*(m - v) X_{bc}(mN_c)].    (44)

2. Sectorized space-time adaptive processing

The sectorized space-time adaptive processing can be performed in the same way as described in Sections II and III by replacing x(t) with x_b(t). Define

z_p(t) = B^T h_p(t),    (30)

n_b(t) = B^T n(t),    (31)

x_b(i\Delta + nT_c) = B^T x(i\Delta + nT_c), \quad i = 0, ..., J-1.    (32)

Then we have

x_b(n) = \sum_{p=1}^{P} \sum_{d=0}^{D_p} p_p(n - d) z_p(d) + n_b(n),    (33)

where

x_b(n) = [x_b^T(nT_c), \cdots, x_b^T(nT_c + (J-1)\Delta)]^T,    (34)

z_p(n) = [z_p^T(nT_c), \cdots, z_p^T(nT_c + (J-1)\Delta)]^T,    (35)

n_b(n) = [n_b^T(nT_c), \cdots, n_b^T(nT_c + (J-1)\Delta)]^T.    (36)
By stacking the N_c consecutive samples, we have

X_b(n) = [x_b^T(n), x_b^T(n-1), \cdots, x_b^T(n - N_c + 1)]^T.    (37)

The symbol-level vector after despreading the output signal vector X_b(n) can be denoted as


Similar to (23), the symbol-level sectorized space-time adaptive processing can be performed as

y_{bc}(mN_c) = W^T X_{bc}(mN_c) = \sum_{l=0}^{K-1} w_l^T X_{bc}((m - l)N_c),    (39)

where s_{p_0}(m) and v have the same meaning as in (27).

To further reduce the complexity, we can use only the significant components over a threshold within the vector X_{bc}(mN_c), as is commonly implemented. We denote this as the simplified scheme. The results of the simplified scheme are included and compared in the computer simulations.


Computer simulations are performed to confirm the effectiveness of the proposed techniques. In these simulations, an eight-element uniform linear array with half-wavelength spacing is used. The array is partitioned into subarrays, and beams are formed at each subarray. For example, the beamformer for a three-subarray partitioning (six array sensors at each subarray) can be designed as

b = [e^{-j1.25u}, e^{-j0.75u}, e^{-j0.25u}, e^{j0.25u}, e^{j0.75u}, e^{j1.25u}]^T,

where u = 2\pi \sin(\theta_0) and \theta_0 dictates the central angle of the sector where the spatial rays of the desired user signals are located. In the simulations, 18 CDMA users' signals are considered to be present, where user 1 is considered as the desired user. The code length of all the users is 127. Each user has 6 multipath rays. It is assumed that the AOA's of the paths are Gaussian distributed for each user, and their propagation loss and time delay obey the Rayleigh and the exponential distributions, respectively. Detailed parameters for the desired user are given in Table 1. The signal-to-noise ratio (SNR) of the direct ray of user 1 is assumed as -10 dB, and the SNR's of the direct rays of the other users are randomly chosen from -12.7 dB to -6.6 dB, and their nominal AOA's are uniformly distributed. The central angle of the given sector is assumed as \theta_0 = 12.3°.
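The subarray beamformer above can be formed numerically; the steering function below (phase-referenced to the subarray center, half-wavelength spacing) and the off-sector test angle are assumptions consistent with the stated design:

```python
import numpy as np

theta0 = np.deg2rad(12.3)   # sector center from the simulations

def steer(theta):
    """Response of the six-sensor subarray, half-wavelength spacing,
    phase-referenced to the subarray center (an assumed convention)."""
    u = 2 * np.pi * np.sin(theta)
    return np.exp(1j * u * np.array([-1.25, -0.75, -0.25, 0.25, 0.75, 1.25]))

b = steer(theta0).conj()    # conventional beamformer pointed at theta0

gain_center = abs(b @ steer(theta0))          # coherent sum over 6 sensors
gain_off = abs(b @ steer(np.deg2rad(60.0)))   # attenuated off-sector response
print(gain_center, gain_off)
```

At the sector center the six phase terms cancel and the gain equals the subarray size, while directions well outside the sector are attenuated, which is what allows the sectorized STAP to see far fewer MUAI's.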

We selected K = 2, i.e., two taps for the symbol-level space-time adaptive processing. The steady-state residual error powers of the normal sectorized STAP and its simplified scheme are plotted in Fig. 3, where the number of subarrays is changed from one to four. In the simplified scheme, the threshold is taken as 1.8 times the standard deviation of the components' amplitudes of the signal vector X_{bc}(mN_c). The residual error power of the element-space STAP is -25.36 dB, which is considered as the bound of the sectorized processing and is also plotted in Fig. 3. It is clear that the results of the three-beam and four-beam sector STAP are close to the bound, whereas the complexity and the computational burden are greatly reduced, especially for the simplified scheme, with acceptable performance loss.


We have proposed sectorized STAP techniques for CDMA systems, which provide an effective sub-optimal low-complexity implementation of a STAP system. Simulation results show performance close to that of the optimal element-space STAP system.


[1]  A.  J.  Paulraj  and  C.  B.  Papadias,  “Space-time  processing 
for  wireless  communications,”  IEEE  Signal  Processing 
Magazine,  vol.  14,  no.  6,  pp.  49-83,  Nov.  1997. 

[2]  R.  Kohno,  “Spatial  and  temporal  communication  theory 
using  adaptive  antenna  array,”  IEEE  Personal 
Communications,  vol.  5,  no.  1,  pp.  28-35,  Feb.  1998. 

[3]  H.  Liu  and  M.  D.  Zoltowski,  "Blind  equalization  in 
antenna  array  CDMA  systems,"  IEEE  Trans.  Signal 
Processing,  vol.  45,  no.  1,  pp.  161-172,  Jan.  1997. 

[4]  T.-S. Lee and Z. S. Lee, "A sectorized beamspace adaptive diversity combiner for multipath environments," IEEE Trans. Veh. Technol., vol. 48, pp. 1503-1510, Sept. 1999.

[5]  J.  Ramos,  M.  D.  Zoltowski,  and  H.  Liu,  “Low-complexity 
space-time  processing  for  DS-CDMA  communications”, 
IEEE  Trans.  Signal  Processing,  vol.  48,  no.  1,  Jan.  2000. 

[6]  Y.-F. Chen, M. D. Zoltowski, J. Ramos, C. Chatterjee, and V. P. Roychowdhury, "Reduced-dimension blind space-time 2-D Rake receivers for DS-CDMA communication systems," IEEE Trans. Signal Processing, vol. 48, no. 6, June 2000.

[7]  K.  Yang,  Y.  Zhang,  and  Y.  Mizuguchi,  “Spatio-temporal 
signal  subspace-based  subband  space-time  adaptive 
processing,”  in  Proc.  Int.  Symp.  on  Antennas  and 
Propagation,  Fukuoka,  Japan,  Aug.  2000. 

[8]  Y.  Zhang,  K.  Yang,  and  M.  G.  Amin,  “Transform  domain 
array  processing  for  CDMA  systems,”  in  Proc.  IEEE 
Workshop  on  Statistical  Signal  and  Array  Processing, 
Pocono  Manor,  PA,  Aug.  2000. 

[9]  B.  D.  Van  Veen  and  R.  A.  Roberts,  “Partially  adaptive 
beamformer  design  via  output  power  minimization,” 
IEEE  Trans.  Acoust.,  Speech,  Signal  Processing,  vol. 
ASSP-35,  pp.  1524-1532,  1987. 

Table 1  Parameters of the desired user
(most table entries are illegible in the scan; the legible complex path gains include 0.045 + 0.998i, 0.93 - 0.206i, 0.355 - 0.264i, and -0.264 + 0.034i)

Fig.  1  Single-path  despreading  waveform 


Fig.  2  Smoothing  subarray  beamforming 

Fig.  3  Residual  error  power 



Zhengyuan  Xu  and  Ping  Liu 

Dept. of Electrical Engineering
University  of  California 
Riverside,  CA  92521 
{dxu,  pliu} 


Signals modulated by M-ary pulse amplitude modulation (PAM) or M-ary quadrature amplitude modulation (QAM) have structured constellations. When the communication channel introduces inter-symbol interference (ISI) at the receiver end, demodulation of such signals can be performed by constant modulus algorithm (CMA) based equalizers to cancel the interference. However, characteristics of the modulated signals are only partially considered in the CMA cost function. In this paper, more constraints are imposed on the equalized signal to fully capture the properties of the modulated signal in both its phase and amplitude. Observing that PAM signals are uniformly spaced on the x-axis and QAM signals in the two-dimensional signal space, the property of the transmitted signals from each category can be included in an equivalent deterministic mathematical description, similar to the constant modulus. This description is absorbed in our modified cost function, resulting in a simultaneous minimization of the dispersion in the signal's phase and amplitude. The performance of the equalizers based on these new algorithms is compared with the CMA equalizer.


In different wireless applications, different modulation schemes are employed to meet specific resource or service requirements. Each modulation exhibits its own properties. Signals generated by M-ary pulse amplitude modulation (PAM) or M-ary quadrature amplitude modulation (QAM) have structured constellations. PAM signals are uniformly spaced on the real axis (x-axis), while QAM signals are uniformly distributed in a 2-dimensional signal space. If such signals are transmitted through a multipath channel, signal demodulation requires an equalizer to mitigate the channel distortion. The particular source characteristics often facilitate the equalizer design. The constant modulus algorithm (CMA) based equalizer is widely used [7] and shows a unique capability in equalizing signals with the constant modulus property [5]. It was first proposed in [3], and extensive studies on such equalizers have followed [1], [2], [4]. The algorithm minimizes the deviation of the modulus of the equalized signal from a constant. Satisfactory performance can be achieved especially when the transmitted signal has the constant modulus property.

It seems that the knowledge about the phase of the modulated signal is discarded in CMA. However, this knowledge plays an equally important role in many cases in representing a signal. It can be expected that its incorporation into the cost function will improve the equalization performance. To equalize a dispersive channel (which could be complex) with M-PAM transmitted signals, the dispersion in the distance of the equalized signal from the x-axis should be minimized together with its modulus deviation. Similarly, when M-QAM signals are transmitted, it is not sufficient to consider only the amplitude of the equalized signal in the 2-dimensional signal space, since the symbols are uniformly distributed along two directions that are perpendicular to each other and parallel to the two axes. Motivated by the CMA algorithm, we will design new equalizers for these two kinds of modulated signals by taking their equally spaced property into account in our new cost function. As in the CMA algorithm, stochastic gradient descent methods are employed to update our equalizers. The performance of the equalizers based on these new algorithms is compared with the CMA equalizer.


In wireless communications, the multipath channel introduces inter-symbol interference (ISI) in the received signal x ∈ C^p [4]

x = Hs + w      (1)

where s ∈ C^m is the complex source vector from either the M-PAM or M-QAM constellation, H ∈ C^{p×m} is a complex channel matrix, w ∈ C^p represents additive white Gaussian noise (AWGN), and x ∈ C^p is the received signal vector. To detect the signal s(l), an equalizer f ∈ C^p is designed. Its output y can be written as

y = f^H x = a^T s + f^H w      (2)

where the superscripts (·)^T and (·)^H stand for transpose and Hermitian transpose respectively, and a^T = f^H H is the composite response of the channel and the equalizer. Perfect equalization can be achieved in the absence of noise if the equalizer compensates the channel in such a way that a has only one non-zero element [6]

a = e^{jθ} [0, ···, 0, 1, 0, ···, 0]^T      (3)

Therefore the output will be a delayed input with some phase shift, and ISI is completely eliminated in the absence of noise. Different criteria can be used to seek perfect equalization. In the CMA criterion, the dispersion of the modulus of the equalizer output about a constant is minimized:

J_c(f) = E{(|y|² − r₀)²}      (4)

where E represents expectation and r₀ = E{|s|⁴}/E{|s|²}. The algorithm is usually implemented by the stochastic gradient descent method

f(k+1) = f(k) − μ(|y(k)|² − r₀) y*(k) x(k)      (5)

where * represents conjugation. It is clear that the modulus characteristic is captured and employed. However, most modulated signals possess properties in both amplitude and phase. The M-PAM or M-QAM signals take discrete values from a set whose elements lie uniformly on the x-axis or in the 2-dimensional signal space. Motivated by the CMA criterion, we will next derive a new cost function to incorporate this information and develop a corresponding algorithm to obtain the equalizer.
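As a minimal sketch of the CMA recursion (4)-(5) (illustrative code, not from the paper; the channel, sizes, and step size are assumptions, and a unit-modulus BPSK-like source is used so that r₀ = E{|s|⁴}/E{|s|²} = 1):

```python
import numpy as np

# Minimal CMA sketch: f(k+1) = f(k) - mu*(|y(k)|^2 - r0)*y*(k)*x(k).
rng = np.random.default_rng(0)
p = 8                                   # equalizer length (illustrative)
H = rng.standard_normal((p, p)) * 0.3   # illustrative channel matrix
f = np.zeros(p, dtype=complex)
f[0] = 1.0                              # single-spike initialization
mu, r0 = 1e-3, 1.0                      # r0 = 1 for a unit-modulus source
for _ in range(1000):
    s = rng.choice([-1.0, 1.0], size=p) # BPSK-like source vector
    x = H @ s                           # received vector (noise-free)
    y = np.vdot(f, x)                   # equalizer output y = f^H x
    f = f - mu * (abs(y) ** 2 - r0) * np.conjugate(y) * x
```

Note that `np.vdot` conjugates its first argument, which matches the f^H x convention used in (2).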


Let us first review the representations and properties of PAM and QAM signals. The PAM signals are one-dimensional in the sense that they are real and uniformly distributed on the real axis. The QAM signals are complex and uniformly spaced in the directions of the real and imaginary axes. Due to this similarity, the properties of QAM signals follow easily once the properties of PAM signals are explored. For a general discussion, the multipath channel and the equalizer are assumed to be complex in both cases. We start with the equalization of PAM signals.

3.1.  PAM  signals 

M-ary PAM signals can be represented by the following:

s_m = (2m − 1 − M)d,  m = 1, ···, M

where m is a random integer index. Usually M is an even integer and can be written as M = 2L. These PAM signals can also be expressed as

s_m̄ = (2m̄ − 1)d,  m̄ = −L, ···, L      (6)

if we define a new variable m̄ = m − L. We will adopt this signal description later. In (6), m̄ can only take integers from −L to L, which can be expressed through s_m̄ as m̄ = (s_m̄ + d)/(2d). In the current context, this constraint is equivalent to sin(m̄π) = 0. Thus it requires

sin((s_m̄ + d)π/(2d)) = cos(s_m̄ π/(2d)) = 0      (7)

The transformation from (6) to (7) is essential in constructing our cost function. The other property of s_m̄ is that its phase equals a multiple of π because s_m̄ lies on the real axis. Therefore

sin(φ) = 0      (8)

where φ is the phase of s_m̄. Taking into account the complex equalized signal, we can combine (4), (7) and (8) in one cost function

J₁(f) = E{(|y|² − r₀)² + α₁ cos²(|y|π/(2d)) + α₂ sin²(φ)}      (9)


where α₁ and α₂ are weighting factors, y is the equalized signal given by (2), and φ is its phase. In (9), y and φ are functions of our equalizer f. Therefore J₁(f) is a highly non-linear function of f and difficult to minimize. As in the CMA algorithm, we update the equalizer according to the gradient descent method

f(k+1) = f(k) − μ₁ ∇J₁(f)|_{f=f(k)}      (10)

The derivative of J₁(f) with respect to f^H is required in (10). It can be derived term by term from the RHS of (9). The first term comes directly from CMA, and its derivative is easily found to be

(E{(|y|² − r₀)²})'_f = 2E{(|y|² − r₀) y* x}      (11)


For the second term, the derivative can be computed once the derivatives of |y| and φ are obtained. If we express |y| as √(yy*), then the derivative of |y| is easily computed to be

(|y|)'_f = y* x / (2|y|)      (12)

The phase φ can be expressed through f as

φ = arctan[(y − y*)/(j(y + y*))] = arctan[(f^H x − x^H f)/(j(f^H x + x^H f))]      (13)

Therefore the derivative of φ can be shown to be

(φ)'_f = (x^H f / (2j|y|²)) x = x / (2jy)      (14)

According to (9), (11), (12) and (14), the gradient ∇J₁(f) is obtained as

∇J₁(f) = E{βx}      (15)

where

β = 2(|y|² − r₀) y* + α₂ sin(2φ)/(2jy) − α₁ π y* sin(π|y|/d) / (4d|y|)

Therefore the stochastic gradient algorithm for the equalizer follows:

f(k+1) = f(k) − μ₁ βx      (16)

3.2. QAM signals

There are some similarities between PAM and QAM signals. In the signal space, QAM signals can be depicted by (s_x, s_y), where

s_x = (2m_x − 1)d_x,  m_x = −L_x, ···, L_x      (17)
s_y = (2m_y − 1)d_y,  m_y = −L_y, ···, L_y      (18)

This representation can be transformed into (see (7))

cos(s_x π/(2d_x)) = 0,  cos(s_y π/(2d_y)) = 0      (19)

Therefore we can build the following cost function

J₂(f) = E{cos²(y₁ π/(2d_x)) + cos²(y₂ π/(2d_y))}      (20)

with y₁ and y₂ the real and imaginary parts of y respectively. The gradient descent recursion for the equalizer can be formulated as

f(k+1) = f(k) − μ₂ ∇J₂(f)|_{f=f(k)}      (21)

To compute ∇J₂(f), we first evaluate the derivatives of y₁ and y₂. If they are expressed explicitly through f,

y₁ = (y + y*)/2 = (f^H x + x^H f)/2
y₂ = (y − y*)/(2j) = (f^H x − x^H f)/(2j)

then it is easy to show that their derivatives have the form

(y₁)'_f = x/2,  (y₂)'_f = x/(2j)

Based on these results and (20), ∇J₂(f) can be derived to be

∇J₂(f) = −E{ηx}      (22)

where

η = π sin(y₁π/d_x)/(2d_x) + π sin(y₂π/d_y)/(2jd_y)

In the case d_x = d_y = 1, η simplifies to

η = (π/2)[sin(y₁π) − j sin(y₂π)]

Substituting (22) in (21) and using the instantaneous approximation, we can update the equalizer according to

f(k+1) = f(k) + μ₂ ηx      (23)

The equalization method proposed in this section for either a PAM source or a QAM source explicitly considers the phase and modulus properties of the transmitted signals. As a result, superior performance is expected compared with the conventional CMA equalizer, which only captures the modulus property.
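A corresponding sketch of the QAM recursion (22)-(23) (illustrative code with hypothetical helper names):

```python
import numpy as np

# Sketch of eta in (22) and the QAM update (23): f(k+1) = f(k) + mu2*eta*x.
def qam_eta(y, dx=1.0, dy=1.0):
    y1, y2 = y.real, y.imag
    return (np.pi * np.sin(np.pi * y1 / dx) / (2.0 * dx)
            + np.pi * np.sin(np.pi * y2 / dy) / (2j * dy))

def qam_update(f, x, mu2, dx=1.0, dy=1.0):
    y = np.vdot(f, x)                        # y = f^H x
    return f + mu2 * qam_eta(y, dx, dy) * x  # recursion (23)
```

At a 4-QAM point y = ±1 ± j with d_x = d_y = 1, η vanishes, so constellation points are stationary, as expected.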

In this section we provide some simulation examples to demonstrate the applicability of the proposed PAM and QAM equalization methods. We also compare them with the CMA algorithm [3] based on the inter-symbol interference (ISI) and the error probability. The ISI is used to illustrate the convergence property of the algorithms and is defined as

ISI = (Σᵢ |a(i)|² − |a|²_max) / |a|²_max

where a^T = f^H H and |a|_max is the largest absolute value of all elements of a. Under perfect equalization, a has only one nonzero component as in (3), and the ISI becomes zero. Therefore, a small ISI indicates proximity to the desired response. To gain more insight into the performance of the methods in the communications context, we also adopt the error probability as the other measure. It is defined as the percentage of accumulated decoding errors among the total number of transmitted symbols up to the current iteration, and is obtained from multiple independent realizations with random input signals.
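The ISI measure above can be sketched directly (illustrative helper):

```python
import numpy as np

# Sketch of the ISI measure: residual energy of the combined response a
# relative to its largest-magnitude tap; zero under perfect equalization.
def isi(a):
    p = np.abs(np.asarray(a)) ** 2
    return (p.sum() - p.max()) / p.max()

print(isi([0.0, 0.0, 1.0, 0.0]))   # 0.0, a single-spike response as in (3)
```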

In the experiments, we consider an unknown non-minimum phase channel impulse response used in [6] with the first 4 coefficients [-0.400 0.840 0.336 0.134]. The equalizer has 12 taps and is initialized to all zeros except that the seventh element is 1, i.e., [0, ···, 0, 1, 0, ···, 0]^T. 5000 iterations are run in each realization, and a total of 50 independent realizations are performed to obtain the average results.

First, we compare the proposed PAM equalizer with the Godard approach [3] for a PAM source. The input signals take six equi-probable values: {±0.1, ±0.3, ±0.5}. The step size μ is set to 0.085, with weighting factors α₁ = 0.005 and α₂ = 0 (since a real channel is used). The first 500 iterations are used for initialization in both methods. The average ISI after 500 iterations is plotted in Fig. 1, where the solid line represents the proposed PAM method and the dashed line CMA. It is observed that the ISI of the proposed PAM method converges to a level 15 dB lower than that of CMA while maintaining the same fast convergence. The error probability is shown in Fig. 2. In fact, based on our observations, the proposed method makes no errors after convergence (800 iterations), while CMA still accumulates some errors.

Our second experiment considers a QAM source with 4 equi-probable values {±1 ± j}. We again compare the proposed QAM scheme with the CMA algorithm [3]. The first 20 data points are used for initialization in both methods. The average ISI and error probability after 20 iterations are plotted in Fig. 3 and Fig. 4, respectively. Solid lines represent the proposed QAM equalization method and dashed lines CMA. It is seen that the ISI of the proposed QAM scheme converges faster than that of the standard CMA while achieving a much lower level after convergence. The error probability of the proposed method is also much lower than that of CMA. This is reflected by the difference in the constellation diagrams of the equalized outputs over all iterations of a randomly picked realization, shown in Fig. 5 and Fig. 6. It is interesting to note that the equalized outputs of our equalizer have a much smaller variation than those of the CMA equalizer.


[1] Z. Ding, C.R. Johnson and R.A. Kennedy, "On the (non)existence of undesirable equilibria of Godard equalizers", IEEE Trans. on Signal Processing, vol. 40, pp. 2425-2432, Oct. 1992.

[2]  G.J.  Foschini,  “Equalization  without  altering  or 
detecting  data”,  AT&T  Tech.  J.,  vol.  64,  no.  8, 
pp.  1885-1911,  Oct.  1985. 

[3] D.N. Godard, "Self-recovering equalization and carrier tracking in two dimensional data communication systems", IEEE Trans. on Comm., vol. 28, no. 11, pp. 1167-1175, Nov. 1980.

[4] H. Zeng, L. Tong and C.R. Johnson, "An Analysis of Constant Modulus Receivers", IEEE Trans. on Signal Processing, vol. 47, no. 11, pp. 2990-2999, Nov. 1999.

[5] C.R. Johnson et al., "Blind Equalization Using the Constant Modulus Criterion: A Review", Proc. of the IEEE, vol. 86, no. 10, pp. 1927-1950, Oct. 1998.

[6] O. Shalvi and E. Weinstein, "New criteria for blind deconvolution of nonminimum phase systems (channels)", IEEE Trans. on Information Theory, vol. 36, no. 2, pp. 312-321, March 1990.

[7]  J.R.  Treichler,  I.  Fijalkow  and  C.R.  Johnson, 
“Fractionally  spaced  equalizers”,  IEEE  Signal 
Processing  Mag.,  pp.  45-81,  May  1996. 


-  -  : CMA 
-  ;  Proposed 

Figure  2:  Error  probability  of  the  proposed  method  Figure  5:  Equalized  output  of  the  proposed  method 
and  Godard’s  method  with  PAM  sources.  with  QAM  sources. 

Figure  3:  ISI  of  the  prposed  method  and  Godard’s 
method  with  QAM  sources. 

Figure  6:  Equalized  output  of  Godard’s  method  with 
QAM  sources. 

Figure  1:  ISI  of  the  proposed  method  and  Godard’s  Figure  4:  Error  probability  of  the  proposed  method 
method  with  PAM  sources.  and  Godard’s  method  with  QAM  sources. 

Iteration  Number 

Iteration  Number 


Arthur  J.  Redfern 

G.  Tong  Zhou 

Texas  Instruments 

12500  TI  Boulevard,  MS  8653 
Dallas,  TX  75243 

Georgia  Institute  of  Technology 
School  of  ECE 
Atlanta,  GA  30332-0250 


Substantial  power  efficiency  improvements  are  possi¬ 
ble  in  communication  systems  if  a  moderate  amount  of 
nonlinearity  is  permitted  at  the  transmitter  amplifier 
and  corrected  for  at  the  receiver.  The  Volterra  series 
is  a  suitable  model  for  many  power  amplifiers,  and  is 
readily  incorporated  into  communication  channel  mod¬ 
els.  Existing  fixed  point  equalization  algorithms  for 
Volterra  channels  place  restrictive  conditions  on  the  lo¬ 
cations  of  first-order  kernel  zeros.  We  show  that  multi¬ 
channel  and  block  based  precoding  linear  equalization 
techniques  can  be  combined  with  the  fixed  point  equal¬ 
izer  to  allow  for  exact  equalization  of  Volterra  systems 
with  mixed-phase  first-order  kernels. 


The design of a communication system, from the data format to the transceivers, is composed of many parts. Radio frequency power amplifier design is an important component of cellular, television, radio, and data transmission systems. In amplifier design the requirements of power efficiency and linearity can be at odds with each other, with the result that power efficiency is sacrificed in order to meet linearity requirements [2].

Substantial  efficiency  improvements  can  be  possible 
if  some  mild  nonlinearity  is  allowed  in  the  transmitter 
amplifier  and  corrected  for  at  the  receiver.  This  im¬ 
proved  efficiency  translates  to  lower  operating  costs, 
longer  battery  life,  and  smaller  size  devices.  A  penalty 
of  allowing  additional  nonlinearity  into  the  system  is 
that  the  equalizer  must  now  compensate  for  a  nonlin¬ 
ear  channel. 

In  this  paper  we  consider  fixed  point  equalization 
of  communication  channels  modeled  by  the  Volterra 
series  [3],  [4],  [8].  Fixed  point  equalization  in  this  case 

This  work  was  supported  in  part  by  NASA  grant  NGT- 
352334  and  NSF  grant  MIP-9703312. 

refers to the contraction mapping theorem [3] (not integer arithmetic). The Volterra series is a useful nonlinear model for amplifiers [2], and is readily incorporated into the overall channel model as an extension of the linear channel model.

Drawbacks  of  traditional  fixed  point  equalization 
techniques  include  the  requirement  that  the  linear  com¬ 
ponent  of  the  channel  is  minimum-phase  (for  stable  ex¬ 
act  inverses)  [3]  or  its  zeros  are  not  near  the  unit  circle 
(for  approximate  inverses)  [4].  These  can  be  serious 
limitations  for  realistic  communication  channel  mod¬ 
els,  as  the  error  in  the  inversion  of  the  linear  channel 
component  is  iterated  on  by  the  fixed  point  algorithm. 

Recently,  multichannel  [7]  and  block  based  precod¬ 
ing  methods  [5]  have  become  popular  for  linear  channel 
equalization.  This  is  because  both  methods  convert  the 
ill-posed  single  channel  inversion  problem  into  a  well 
posed  problem  with  an  exact  (zero  forcing)  solution  in 
the  noise-free  case.  We  show  that  these  principles  can 
be  combined  with  the  fixed  point  equalizer,  for  zero 
forcing  equalization  of  nonlinear  channels  with  mixed- 
phase  first-order  kernels. 


For the discrete Jth-order Volterra system H, the input x(n) is related to the output y(n) by:

y(n) = H(x(n), ..., x(n − L))
     = Σ_{j=1}^{J} H_j(x(n), ..., x(n − L_j))
     = Σ_{j=1}^{J} Σ_{τ₁=0}^{L_j} Σ_{τ₂=τ₁}^{L_j} ··· Σ_{τ_j=τ_{j−1}}^{L_j} h_j(τ₁, ..., τ_j) Π_{i=1}^{j} x(n − τ_i)

where H_j is the jth-order operator of H, h_j(τ₁, ..., τ_j) is the nonredundant region of the jth-order kernel, and L = max{L₁, ..., L_J}. Notice that a first-order Volterra system (J = 1) is linear convolution (an FIR filter).
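The Volterra sum above can be sketched with a dictionary-of-kernels representation (a hypothetical representation chosen for illustration; kernels[j] maps a tuple (τ₁, ..., τ_j) with τ₁ ≤ ··· ≤ τ_j to h_j(τ₁, ..., τ_j)):

```python
import numpy as np

# Sketch of the Jth-order Volterra output at time n for kernels given on the
# nonredundant region tau1 <= ... <= tauj (dict-of-dicts representation).
def volterra_output(x, kernels, n):
    y = 0.0
    for hj in kernels.values():
        for taus, h in hj.items():
            y += h * np.prod([x[n - t] for t in taus])
    return y

# A first-order-only system (J = 1) reduces to FIR convolution:
h1 = {(0,): 0.5, (1,): 0.25}
x = [1.0, 2.0, 3.0]
print(volterra_output(x, {1: h1}, 2))   # 0.5*3.0 + 0.25*2.0 = 2.0
```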

Throughout this paper the symbol u(n) will be used to refer to the linear component of the Volterra system output with additive noise v(n):

u(n) = Σ_{τ₁=0}^{L₁} h₁(τ₁) x(n − τ₁) + v(n).

For the channel input x(n), output y(n), noise v(n), and linear portion of the output with noise u(n), it will be assumed that these vectors are composed of a basic block of N symbols, and a subscript will indicate how many symbols before this basic block to include, e.g.:

x_L = [x(−L), ..., x(N − 1)]^T.

An optional argument can be included to specify a subset of the vector:

x_L(a : b) = [x(a), ..., x(b)]^T.

If the d-sample delay operator z^{−d} is placed before the vector, then each element of the vector is delayed by d:

z^{−d} x_L = [x(−L − d), ..., x(N − 1 − d)]^T.

We define the Volterra series relationship between an input vector x_L and output vector y₀ as:

y₀ = H(x_L).


As a shorthand notation to refer to the output of specific order operators, we define:

H_{a:b}(x_L) = Σ_{j=a}^{b} H_j(x_{L_j}).

It is often necessary to write the first-order operator corresponding to a finite impulse response (FIR) filter as a filtering matrix. For the length Q + 1 vector c = [c(Q), ..., c(0)]^T, the N × (N + Q) filtering matrix T_N(c) is the banded Toeplitz matrix whose ith row contains c(Q), ..., c(0) starting in column i:

T_N(c) = [ c(Q) ··· c(0)
                 ⋱        ⋱
                   c(Q) ··· c(0) ].

Figure 1: A single channel Volterra system.

In this section we review the single channel fixed point equalizer based on the contraction mapping theorem. The basic idea underlying fixed point equalization of Volterra channels is setting up a fixed point equation for the input in terms of the known system kernels and system output, then solving for the input using the method of successive approximations [3]. Two assumptions are implicit:

Assumption (A1): The K + L previous input symbols x(−K − L), ..., x(−1) have already been estimated.

Assumption (A2): The K previous output samples y(−K), ..., y(−1) are available.

In the following derivation, even though x₀ will be on the left hand side of the equation and x_{K+L} will be on the right hand side, there is still a fixed point equation in x₀ since x_{K+L} can be formed directly from x₀, (A1), and (A2).

The derivation of the fixed point equalizer is well known in the literature [3]. Here the derivation is performed using the notation of Section 2, which emphasizes the importance of the inversion of the linear component of the noisy channel output.

The input/output relationship for the single channel Volterra system in Fig. 1 with additive noise at the receiver is:

y₀ = H_N x_{L₁} + H_{2:J}(x_L) + v₀,      (1)

where H_N = T_N(h₁). Rearranging the terms and applying the linear operator G_s with memory K to both sides yields:

G_s(H_{K+N} x_{K+L₁} + v_K) = G_s(y_K) − G_s H_{2:J}(x_{K+L}).      (2)

Notice that each of the vectors and matrices from (1) to (2) has been extended by K samples in the past (available from (A1) and (A2)) since the operator G_s has memory K. To set up the desired fixed point equation, it is necessary to make the left hand side of (2) equal z^{−d} x₀. Define the single channel error term as

e_s = z^{−d} x₀ − G_s(H_{K+N} x_{K+L₁} + v_K).

It is common to choose G_s corresponding to a causal Kth order FIR filter

g_s = [g_s(K), ..., g_s(0)]^T,

designed according to the minimum mean-square error (MMSE) criterion. For the MMSE equalizer it is necessary to make the following assumption:


Assumption (A3): The input x(n) and the noise v(n) are mutually uncorrelated, stationary random processes with known covariance matrices:

R_xx = E[x_{K+L₁}(n − K − L₁ : n) x^H_{K+L₁}(n − K − L₁ : n)],
R_vv = E[v_K(n − K : n) v^H_K(n − K : n)].

If (A1) - (A3) are satisfied, then the equalizer g_s can be solved for as:

g_s = R_uu^{−1} r_xu,      (3)

where R_uu and r_xu are defined as

R_uu = E[u_K(n − K : n) u^H_K(n − K : n)],
r_xu = H*_{K+1} E[x(n − d) x*_{K+L₁}(n − K − L₁ : n)].
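A small sketch of the MMSE design (3) under simplifying assumptions (white unit-power input so R_xx = I, white noise R_vv = σ²I, an illustrative choice of delay indexing, and hypothetical helper names):

```python
import numpy as np

# Sketch of g_s = R_uu^{-1} r_xu for R_xx = I and R_vv = sigma2*I.
def filtering_matrix(h1, n_rows):
    # (n_rows) x (n_rows + L1) filtering matrix built from h1 = [h(0), ..., h(L1)]
    L1 = len(h1) - 1
    T = np.zeros((n_rows, n_rows + L1))
    for i in range(n_rows):
        T[i, i:i + L1 + 1] = h1[::-1]            # each row holds [h(L1), ..., h(0)]
    return T

def mmse_equalizer(h1, K, d, sigma2):
    Hk = filtering_matrix(np.asarray(h1, dtype=float), K + 1)
    Ruu = Hk @ Hk.T + sigma2 * np.eye(K + 1)     # E[u u^H] under the assumptions
    rxu = Hk[:, Hk.shape[1] - 1 - d]             # column aligned with x(n - d)
    return np.linalg.solve(Ruu, rxu)
```

For an identity channel h₁ = [1] with d = 0 and σ² = 0, this returns the single-tap identity equalizer.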

Substitution of the operator G_s associated with the filter g_s into (2) results in the fixed point equation:

z^{−d} x₀ = G_s(y_K) − G_s H_{2:J}(x_{K+L}) + e_s.

Assuming that e_s is small, it is ignored and the approximate fixed point equation is solved:

z^{−d} x₀ = G_s(y_K) − G_s H_{2:J}(x_{K+L}).

For the case of d = 0, x_{K+L} can be determined from z^{−d} x₀ and (A1). However, when d > 0, it is not possible to determine the last d elements of x_{K+L}, namely x(N − d), ..., x(N − 1). To obtain proper estimates of these last d symbols in z^{−d} x₀, they can be taken as the first symbols estimated in the next block of data.
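The method of successive approximations behind this equalizer can be illustrated on a toy memoryless channel (a toy example, not the paper's setup): for y = x + 0.2x² with identity linear part, the input solves x = y − 0.2x², and iterating this map converges because it is a contraction near the solution.

```python
# Toy contraction-mapping illustration of the fixed point iteration.
x_true = 1.2
y = x_true + 0.2 * x_true ** 2     # channel output y = x + 0.2*x^2
x_hat = y                          # initialize with the linear inverse of y
for _ in range(50):
    x_hat = y - 0.2 * x_hat ** 2   # successive approximations
print(round(x_hat, 6))             # 1.2
```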

A drawback of the fixed point equalizer is the error introduced into the fixed point equation by the inversion of the first-order kernel. The error depends on the length K, the delay d, and the locations of the zeros of H₁. The fixed point equalizers in the following two sections eliminate this source of error, and allow for zero forcing equalization of the linear component (along with the nonlinear component) of the channel in the noise-free case.


The availability of multiple observations per symbol period at the receiver has become common in many communication systems. Using a superscript (s) to denote the channel, the following assumption is made:

Assumption (A4): There are no zeros common to all of the linear components {h₁^(s)}_{s=1}^{S} of the channels.


Figure 2: A single-input/multiple-output Volterra system.
It is well known that for multiple linear channels, FIR zero forcing equalization is possible if (A4) is satisfied [7]. In this section it is shown that these linear multichannel equalization techniques can be combined with the fixed point equalizer, to allow for zero forcing equalization of Volterra channels with mixed-phase first-order kernels using as few as two channels.

Consider the multichannel Volterra system shown in Fig. 2 and again assume that (A1) and (A2) are satisfied. For the sth channel write:

y₀^(s) = H_N^(s) x_{L₁} + H_{2:J}^(s)(x_L) + v₀^(s),

where H_N^(s) = T_N(h₁^(s)). Rearranging terms and applying the linear operator G_m^(s) with memory K to both sides yields:

G_m^(s)(H_{K+N}^(s) x_{K+L₁} + v_K^(s)) = G_m^(s)(y_K^(s)) − G_m^(s) H_{2:J}^(s)(x_{K+L}).      (4)

Because (4) holds for each channel s, it is possible to sum the results over all S channels and write:

Σ_{s=1}^{S} G_m^(s)(H_{K+N}^(s) x_{K+L₁} + v_K^(s)) = Σ_{s=1}^{S} G_m^(s)(y_K^(s)) − Σ_{s=1}^{S} G_m^(s) H_{2:J}^(s)(x_{K+L}).      (5)

If it is possible to make the left hand side of (5) equal x₀, then the result will be the desired fixed point equation. Define the error term as

ε_m = x₀ − Σ_{s=1}^{S} G_m^(s)(H_{K+N}^(s) x_{K+L₁} + v_K^(s)).


If (A4) is satisfied, then in the noise-free case a Kth-order FIR zero forcing solution exists such that

Σ_{s=1}^{S} G_m^(s)(H_{K+N}^(s) x_{K+L₁}) = x₀,

provided that S(K + 1) ≥ K + L₁ + 1 [7]. Define the multichannel filtering matrix H_{m,K+1} and the multichannel Kth-order equalizer g_m corresponding to {G_m^(s)}_{s=1}^{S}. The zero forcing equalizer can be recovered as [7]:

g_m = (H_{m,K+1})^† e_{K+L₁+1},      (6)

where (·)^† denotes the pseudo-inverse and e_{K+L₁+1} is a (K + L₁ + 1) × 1 vector with a one in the (K + L₁ + 1)th position and zeros elsewhere.
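A sketch of the zero forcing construction (6) (illustrative code; the block-stacking convention and the pinv-based solve of the condition H_m^T g_m = e are assumptions consistent with the multichannel setup above):

```python
import numpy as np

# Stack per-channel first-order filtering matrices and solve the zero forcing
# condition via a pseudo-inverse (requires no common zeros across channels, (A4)).
def zf_multichannel(h_list, K):
    L1 = len(h_list[0]) - 1
    blocks = []
    for h in h_list:
        T = np.zeros((K + 1, K + L1 + 1))
        for i in range(K + 1):
            T[i, i:i + L1 + 1] = np.asarray(h, dtype=float)[::-1]  # rows [h(L1),...,h(0)]
        blocks.append(T)
    Hm = np.vstack(blocks)                 # S(K+1) x (K+L1+1)
    e = np.zeros(K + L1 + 1)
    e[-1] = 1.0                            # unit vector as in (6)
    gm = np.linalg.pinv(Hm.T) @ e          # multichannel equalizer taps
    return gm, Hm

# Two channels with no common zeros, K = 1, so S(K+1) = 4 >= K+L1+1 = 3.
gm, Hm = zf_multichannel([[1.0, 0.5], [1.0, -0.5]], K=1)
```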

Substitution of the operators G^(s) associated with the filter g_m designed according to (6) into (5) results
in  the  fixed  point  equation: 

x_0 = Σ_{s=1}^{S} G^(s)(y^(s)) − Σ_{s=1}^{S} G^(s)H_{2:J}^(s)(x_{K+L}) + ε_m.
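The iteration defined by this fixed point equation can be illustrated on a deliberately simplified model. The sketch below (plain Python, not the paper's multichannel operators) assumes a hypothetical memoryless channel y = x + c·x², so the linear inverse is the identity and the nonlinear correction is just −c·x²; the update then has the same shape as the equation above.

```python
def fixed_point_equalize(y, c, iterations=5):
    # x_{n+1} = G(y) - G(H_{2:J}(x_n)): here G is the identity (the
    # linear kernel is 1) and the nonlinear part H_{2:J}(x) = c * x**2.
    x = y                        # start from the linear-inverse output
    for _ in range(iterations):
        x = y - c * x ** 2
    return x

x_true = 0.8
c = 0.05                         # weak nonlinearity, as in the simulations
y = x_true + c * x_true ** 2     # noise-free channel output
x_hat = fixed_point_equalize(y, c)   # converges toward x_true
```

With a weak nonlinearity the map is a contraction, so a handful of iterations (the paper uses 5) suffices.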


As an alternative to using multiple channels at the receiver to improve the single channel inversion problem, structured redundancy could be introduced at the transmitter. By block precoding at the transmitter and block equalization at the receiver, FIR zero forcing equalization of single channel systems is possible irrespective of the location of channel zeros [5]. As in the multichannel case, these properties can be extended to fixed point equalization of Volterra channels.

Consider the block-based transmission scheme of Fig. 3. At the transmitter, data symbols w(n) are collected into a block of length M:

w = [w(0) ... w(M − 1)]^T,

and mapped by the precoder F_p to the length N block of channel inputs x_0. If the precoder is linear, then it can be represented by the N × M matrix F_p. The
precoder  structure  is  chosen  to  satisfy  the  following  two 
assumptions  [5]: 

Assumption  (A5):  The  lengths  L,  M,  and  N  satisfy 
N  =  L  +  M. 

Assumption  (A6):  rank(Fp)  =  M,  and  the  last  L 
rows  of  Fp  are  zero. 

As a result of (A6), F_p can be decomposed as

F_p = [ F̄_p
        0_{L×M} ],

where the M × M matrix F̄_p is nonsingular. Using
(A6) it is possible to write:

x_L = 0_{L×1}.

The N-row filtering matrix for the first-order kernel H_N = T_N(h_1) can be decomposed as

H_N = [H̄_N  H̃_N  H̲_N],

where H̄_N is N × L, H̃_N is N × M, and H̲_N is N × L.
Using these definitions, the input/output relationship for the block-based system with precoding can be written as

y_0 = H̃_N F̄_p w + H_{2:J}(x_0) + v_0.

Rearranging  terms  and  applying  the  linear  operator  Gp 
to  both  sides  yields: 

G_p(H̃_N F̄_p w + v_0) = G_p(y_0) − G_p H_{2:J}(x_0).  (7)

If the left hand side of (7) were w, then the desired fixed point equation would result. Define the error term

ε_p = w − G_p(H̃_N F̄_p w + v_0).  (8)

If  (A5)  and  (A6)  are  satisfied,  then  in  the  noise-free 
case,  a  zero  forcing  solution  Gp  (with  matrix  form  Gp) 
to  (8)  exists  such  that  [5]: 

G_p H̃_N F̄_p w = w.

The  zero  forcing  equalizer  can  be  recovered  as  [5]: 

G_p = F̄_p^{−1} H̃_N^†.  (9)

Substitution  of  the  operator  Gp  associated  with  the 
matrix  Gp  designed  according  to  (9)  into  (7)  results  in 
the  fixed  point  equation: 

w = G_p(y_0) − G_p H_{2:J}(x_0) + ε_p.
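Under assumptions (A5)-(A6), and with the nonlinear part set to zero, the block zero forcing step can be sketched numerically. Everything below (sizes, channel taps, the identity precoder) is illustrative; numpy's `pinv` stands in for the left inverse of the tall filtering matrix, and recovery is exact even though the channel is mixed phase.

```python
import numpy as np

L, M = 2, 5
N = L + M                          # (A5): block length N = L + M
h = np.array([0.5, 1.0, -1.2])     # length-(L+1) mixed-phase FIR channel (illustrative)

# Banded (Toeplitz) filtering matrix restricted to the M columns that
# multiply w: the zero padding forced by (A6) removes the other columns.
H_tilde = np.zeros((N, M))
for i in range(N):
    for j in range(M):
        if 0 <= i - j <= L:
            H_tilde[i, j] = h[i - j]

Fbar = np.eye(M)                   # trivial nonsingular precoder, as in the simulations
Gp = np.linalg.inv(Fbar) @ np.linalg.pinv(H_tilde)   # G_p = Fbar^{-1} H~^+

w = np.array([1.0, -1.0, 1.0, 1.0, -1.0])
y0 = H_tilde @ Fbar @ w            # noise-free received block
w_hat = Gp @ y0                    # exact recovery despite the channel zeros
```

The full-band Toeplitz matrix always has full column rank when h has a nonzero leading tap, which is what makes the FIR zero forcing solution exist regardless of where the channel zeros lie.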



(a)  Single  Channel 

We considered a third-order baseband Volterra system with L_1 = 5 and L_3 = 2, whose complex kernel coefficients' real and imaginary parts were chosen randomly from [−0.5, 0.5], with the third-order kernel scaled by 0.03 such that the nonlinear to linear power ratio is −23 dB. A 16-QAM input was used, and additive white Gaussian noise was present at the channel output. For each data point we generated 100 blocks of N = 100 symbols for 100 different channels.

For the multichannel fixed point simulations we used S = 4 channels and the linear component of the equalizer designed according to (6) with order K = 8. The single channel fixed point simulations (with and without precoding) used the first of the multichannel fixed point simulations' channels. The standard single channel fixed point equalizer's linear component was designed according to (3) with K = 32 and d = 16. The linear component of the single channel fixed point equalizer with precoding was designed according to (9), with a data block length of M = N − L = 95 and precoder F̄_p = I_{M×M}. For each of the fixed point equalizers, 5 iterations of their respective fixed point equation were performed.

For  our  performance  metric,  we  calculate  the  signal 
to  interference  ratio  (SIR),  defined  in  terms  of  the  MSE 
of  the  equalizer  output: 

SIR  =  — 10  log10  MSE  (dB), 

vs. SNR. The SIR allows us to assess the ability of the equalizers to cope with both the noise and the nonlinearity. Fig. 4 compares the output of each of the fixed point equalizers, along with the corresponding outputs of the linear components of the equalizers.
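The metric above can be computed with a minimal helper; the numbers in the usage comment are illustrative, not the paper's simulation data.

```python
import math

def sir_db(outputs, symbols):
    # SIR = -10 log10(MSE), with the MSE taken between the equalizer
    # outputs and the transmitted symbols (argument names are illustrative).
    mse = sum(abs(o - s) ** 2 for o, s in zip(outputs, symbols)) / len(symbols)
    return -10.0 * math.log10(mse)

# e.g. a residual error power of 0.01 maps to an SIR of 20 dB
```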


In this paper we showed that multichannel and block-based precoding linear channel equalization techniques can be combined with the fixed point method for zero forcing equalization of Volterra channels with mixed-phase first-order kernels. Since the fixed point equalizer takes the form of a nonlinear correction added to a linear inverse, it is a practical addition to existing linear channel equalization schemes.


[1]  G.  Giannakis  and  E.  Serpedin,  “Linear  multichannel 
blind  equalizers  of  nonlinear  FIR  Volterra  channels,” 
IEEE  Transactions  on  Signal  Processing,  vol.  45,  no. 
1,  pp.  67-81,  1997. 


(b)  Multichannel 

Figure  4:  Comparing  the  linear  and  fixed  point  equal¬ 
izer  outputs. 

[2] S. Maas, "Analysis and optimization of nonlinear microwave circuits by Volterra-series analysis," Microwave Journal, vol. 33, no. 4, pp. 245-251, Apr. 1990.

[3] R. Nowak and B. Van Veen, "Volterra filter equalization: A fixed point approach," IEEE Transactions on Signal Processing, vol. 45, no. 2, pp. 377-388, 1997.

[4] A. Redfern and G. Zhou, "A fixed point equalizer for nonlinear communication channels," Proceedings of the Thirty-Third CISS, Baltimore, MD, Mar. 1999.
[5] A. Scaglione, G. Giannakis and S. Barbarossa, "Redundant filterbank precoders and equalizers part I: Unification and optimal designs," IEEE Transactions on Signal Processing, vol. 47, no. 7, pp. 1988-2006, 1999.

[6] M. Schetzen, The Volterra and Wiener Theories of Nonlinear Systems. New York: John Wiley and Sons, 1980.

[7]  D.  Slock,  “Blind  fractionally-spaced  equalization, 
perfect-reconstruction  filter  banks  and  multichannel 
linear  prediction,”  Proceedings  of  the  IEEE  ICASSP, 
pp.  585-588,  Adelaide,  Australia,  Apr.  1994. 

[8] C. Tseng and E. Powers, "Nonlinear channel equalization in digital satellite systems," Proceedings of the IEEE Globecom, pp. 1639-1643, Houston, TX, Nov.




Said  Aouada  and  Adel  Belouchrani 

Electrical  Engineering  Department, 

Ecole Nationale Polytechnique
P.O. Box 182, El Harrach 16200, Algiers, Algeria


A joint propagation parameter estimation method for MultiCarrier systems is proposed. The main difference between Single Carrier and MultiCarrier models is outlined and handled in the derivation of the algorithm. The method uses a subspace-based 2-D ESPRIT-like approach, exploiting frequency shift invariance of the system as well as the ULA geometry to provide closed-form estimation. Basic performance of the algorithm is illustrated through simulations and compared against the Cramér-Rao bound.


In several wireless systems, the transmitted signals are subject to the effects of multipath channels, caused by remote terrestrial objects and inhomogeneities in the physical medium. Estimation of the multipath propagation parameters from measurements at a multisensor antenna provides a better channel characterization for subsequent processing. These parameters include, among others, the Direction Of Arrival (DOA) and Time Difference Of Arrival
(TDOA) of each path. In MultiCarrier Modulation (MCM) systems such as Digital Terrestrial Television Broadcasting (DTTB) and Digital Audio Broadcasting (DAB), the transmitted signals are subject to the effects of a multipath channel, in the same way as are Single Carrier Modulation (SCM) systems.

Herein, we investigate the possibility of performing closed-form Joint Angle and Delay Estimation (JADE) for a MCM system in a single batch, in a way similar to JADE for SCM systems, by exploiting the frequency diversity of the system, together with a known array geometry. The system consists of a single source and a single antenna array. A channel model is derived to outline the frequency shift invariance associated with the system. The model exploits the stationarity of the parameters over the coherence time of the channel. It also takes into account the fact that the unknown complex fadings differ from one carrier to another. Both the uniform carrier spacing and a known array geometry allow closed-form estimation of the propagation parameters. More particularly, if the antenna is Uniform Linear (ULA), or has an ESPRIT doublet structure, JADE can be
achieved using a 2D ESPRIT-like technique. The Cramér-Rao Bound on the variance of the estimated parameters is also derived from the obtained model.


The principle of a MultiCarrier transmitter is depicted in Figure 1. The concept is to transform serial data into parallel lower rate inputs that are modulated by orthogonal carriers. History and applications of MCM are reported in [1], [2] and the references therein and are not repeated here for conciseness. Assuming a single MCM source

Figure  1:  Block  diagram  of  the  MCM  transmitter. 

emitting  over  C  carriers,  the  lowpass  equivalent  transmitted 
signal  is  given  by 

x(t) = Σ_{c=1}^{C} Σ_{k=−∞}^{∞} s_c[k] g(t − kT) e^{j2π f_c t}  (1)


• s_c[k] is the k-th symbol conveyed by carrier c,

• {s_c[k]}, c = 1, ..., C, are independent from one carrier to another and identically distributed,

• g(t) is the pulse-shape function,

• T is the symbol duration, and

0-7803-5988-7/00/$  10.00  ©  2000  IEEE 



• Δf is the frequency spacing between two successive carriers.
In the following, the channel is fading and time varying. However, it is regarded as stationary within its coherence
time.  Assuming  C  carriers  and  perfect  carrier  phase  and 
sampling  time  recovery,  the  complex  envelope  of  the  lowpass 
received  signal  at  an  M -element  antenna  array  at  time  t  can 
be  written  as 


y(t) = Σ_{c=1}^{C} Σ_{k=−∞}^{∞} s_c[k] h_c(t − kT) e^{j2π f_c t} + z(t)  (2)

where h_c(t) = [h_{c,1}(t) h_{c,2}(t) ... h_{c,M}(t)]^T is the transmission channel associated with the c-th carrier, s_c[k] is the k-th symbol of duration T conveyed by carrier c and z(t) is the additive white Gaussian noise. The coherence time of the channel is assumed to range over K symbol periods. The channel h_c(t) can be modeled as [3]


h_c(t) = Σ_{q=1}^{Q} a(θ_q) β_c(q) g(t − τ_q) e^{−j2π f_c τ_q}  (3)


where Q is the number of paths, θ_q and τ_q are the q-th angle of arrival and time delay respectively, and β_c(q) is the complex attenuation, which varies from carrier to carrier. a(θ_q) is the (M×1) vector of the array response to the q-th path, with q = 1, ..., Q, and g(t) is the finite-support modulation pulse-shape function. We assume that the
array  outputs  are  received  in  parallel  over  each  carrier  af¬ 
ter  demodulation.  The  channel  length  is  LT.  We  collect 
K data samples on each carrier. Using some trivial manipulations, this can be expressed in (M × K)-dimensional matrix form as

Y_c = H_c S_c,  c = 1, ..., C  (4)

If  the  Toeplitz  matrix  of  data  symbols  Sc,  c  =  1,  ...,C, 
is  known  from  training  and  K  >  M,  an  estimate  of  the 
channel  samples  matrix  Hc  in  (3)  can  be  obtained  for  c  = 
1, ...,  C,  using  least  squares.  Blind  estimation  of  the  channel 
samples  [4,  5]  is  also  possible  in  case  Sc  is  not  known  in 
advance.  The  estimated  channel  can  be  given  as 

Ĥ_c = H_c + N_c  (5)

where  Nc  is  the  estimation  noise  matrix. 
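The training-based estimate in (4)-(5) is an ordinary least-squares fit. A minimal sketch, with illustrative dimensions and a random full-row-rank symbol matrix standing in for the Toeplitz S_c:

```python
import numpy as np

rng = np.random.default_rng(0)
M, L, K = 4, 3, 20                 # sensors, channel taps, snapshots (K > M and K > L)

H_c = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))
# Known training symbols; random here instead of the paper's Toeplitz S_c:
S_c = rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))

Y_c = H_c @ S_c                        # noise-free data block, as in (4)
H_hat = Y_c @ np.linalg.pinv(S_c)      # LS estimate: Y_c S_c^H (S_c S_c^H)^{-1}
```

With noise on Y_c the same expression gives the least-squares estimate Ĥ_c = H_c + N_c of (5).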

Omitting the estimation noise, one can easily show that for each carrier, the terms e^{−j2π f_c τ_q}, q = 1, ..., Q, in equation (5) can be factored out, resulting in

Ĥ_c := A_c(θ) diag[e_c(τ)] G,  c = 1, ..., C  (6)

where the (i, j)-th element of G is defined as

G_{i,j} = g((j − 1)T − τ_i),  i = 1, ..., Q and j = 1, ..., L,

A_c(θ) = [β_1(c)a(θ_1)  β_2(c)a(θ_2)  ...  β_Q(c)a(θ_Q)]  (7)

θ = [θ_1  θ_2  ...  θ_Q]^T  (8)


τ = [τ_1  τ_2  ...  τ_Q]^T  (9)

If we stack all the matrices Ĥ_c corresponding to all the C carriers, we obtain a large (MC × L)-dimensional matrix ℋ whose structure is given by

ℋ = [ Ĥ_1
      ⋮
      Ĥ_C ] := U(θ, τ) G  (10)

where

U(θ, τ) = [ A_1(θ) diag[e_1(τ)]
            ⋮
            A_C(θ) diag[e_C(τ)] ]  (11)

Finally, we include the channel estimation noise matrix N, which is appropriately defined in accordance with (5) and (10). Therefore, the model in (10) becomes

ℋ = U(θ, τ) G + N  (12)

If we consider that the delay spread of the channel is T_m (expressed in terms of the symbol period T), then the coherence bandwidth of the channel is roughly the inverse of the delay spread, i.e., B_c ≈ 1/(T_m T).

The frequency separation between carriers in the MultiCarrier system is given by Δf = 1/T. All the carriers that lie within a frequency interval equal to the channel coherence bandwidth can be seen as identically attenuated. Therefore, it is reasonable to assume that the number of carriers being attenuated equally is μ = ⌊B_c/Δf⌋ = ⌊1/T_m⌋, where ⌊·⌋ denotes the integer part. Under this condition, the number of μ-carrier sets that share the same attenuation coefficients is m = ⌊C/μ⌋.
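The grouping arithmetic can be made concrete; the delay spread below is illustrative, chosen so that the numbers match the simulation setup of Section 5 (C = 64, μ = 8), and the reading m = ⌊C/μ⌋ is an assumption recovered from the surrounding text.

```python
import math

C = 64                 # number of carriers (matches the simulations)
T_m = 0.125            # delay spread in symbol periods (illustrative)

mu = math.floor(1 / T_m)    # carriers sharing one attenuation coefficient
m = math.floor(C / mu)      # number of equal-fading carrier groups
```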

If we consider only the first mμ carriers in the derivation of the MC-JADE model (10) (mμ is at most equal to C), we will obtain a reduced MC-JADE model (Mmμ × L) satisfying the following factorization

ℋ_mμ = U_mμ(θ, τ) G + N_mμ

     := [ F_1(τ) ∘ A_1(θ)
          ⋮
          F_m(τ) ∘ A_m(θ) ] G + N_mμ  (13)




A_i(θ) = [β_1(i)a(θ_1)  β_2(i)a(θ_2)  ...  β_Q(i)a(θ_Q)],

F_i(τ) = [ ψ_1^{(i−1)μ+1}  ψ_2^{(i−1)μ+1}  ...  ψ_Q^{(i−1)μ+1}
           ψ_1^{(i−1)μ+2}  ψ_2^{(i−1)μ+2}  ...  ψ_Q^{(i−1)μ+2}
           ⋮
           ψ_1^{iμ}        ψ_2^{iμ}        ...  ψ_Q^{iμ} ],

with ψ_q = e^{−j2π τ_q / T},  (14)

and ∘ denotes the Khatri-Rao product, i.e., the columnwise Kronecker


If  the  array  is  Uniform  Linear  (ULA)  or  has  an  ESPRIT 
doublet  structure,  then  the  angles  and  delays  can  be  esti¬ 
mated  jointly  in  closed-form  using  an  ESPRIT-like  method. 
For the ULA geometry, the steering vector a(θ_q) will be given by

a(θ_q) = [1  φ_q  ...  φ_q^{M−1}]^T  (15)

where

φ_q = e^{j2πΔ sin θ_q}  (16)

and Δ is the array sensor spacing in wavelengths.

With the parameter definitions (16) and (14), it is more appropriate to rewrite (13) as

ℋ_mμ = U(φ, ψ) G + N_mμ  (17)

where

φ = [φ_1  φ_2  ...  φ_Q]^T  (18)

ψ = [ψ_1  ψ_2  ...  ψ_Q]^T  (19)

Estimation of the channel subspace and its dimension is equivalent to finding a basis E of the column span of the data matrix ℋ_mμ, and estimation of the parameters φ and ψ reduces to jointly diagonalizing the matrices E_φ^† Ē_φ and E_ψ^† Ē_ψ, where

E_φ = J_φ E,  Ē_φ = J̄_φ E  (20)

E_ψ = J_ψ E,  Ē_ψ = J̄_ψ E  (21)

with

J_φ = I_{mμ} ⊗ [I_{M−1}  0_{(M−1,1)}],   J̄_φ = I_{mμ} ⊗ [0_{(M−1,1)}  I_{M−1}],
J_ψ = I_m ⊗ [I_{M(μ−1)}  0_{(M(μ−1),M)}],   J̄_ψ = I_m ⊗ [0_{(M(μ−1),M)}  I_{M(μ−1)}]  (22)

the appropriate selection matrices (see [6], [7] and [8] for details of JADE), ⊗ denotes the Kronecker product, I_i is an i-dimensional identity matrix and 0_{i,j} is an (i × j)-dimensional matrix of zero elements.

Details of the joint diagonalization are provided in [7] and the references therein. The correct pairing between the φ's and the ψ's is guaranteed by the fact that the matrices share common eigenvectors.
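The shift-invariance step that the 2-D method builds on can be illustrated in one dimension: given a basis E of the column span of a Vandermonde steering matrix, the eigenvalues of (J₁E)^†(J₂E) return the phase parameters. The noise-free setup below is illustrative (E is taken as the steering matrix itself), a sketch of the principle rather than the full joint diagonalization.

```python
import numpy as np

M = 6                                                    # sensors (illustrative)
phi = np.exp(1j * 2 * np.pi * np.array([0.12, 0.31]))    # true phase parameters
A = np.vander(phi, M, increasing=True).T                 # M x Q Vandermonde steering matrix

E = A                                    # noise-free: A itself spans the signal subspace
E_up, E_down = E[:-1, :], E[1:, :]       # rows selected by J1 and J2
Psi = np.linalg.pinv(E_up) @ E_down      # similar to diag(phi)
phi_hat = np.linalg.eigvals(Psi)         # recovers phi (up to ordering)
```

With noisy data, E would come from an SVD of the data matrix, and the eigenvectors of Psi provide the pairing exploited in the joint diagonalization.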

If  the  pulse-shape  function  is  assumed  to  be  known,  the 
complex  attenuation  coefficients  can  be  linearly  estimated 
using  least-squares,  by  processing  the  channel  samples  over 
each  carrier  separately. 


Parameter identifiability requires the (Mmμ × L)-dimensional data matrix ℋ_mμ to have rank Q, with Q < Mmμ and Q < L. This means that U_mμ(θ, τ) must have strictly more rows than columns and be of full column rank, and G must have more columns than rows and be of full row rank. The full rank condition on G together with the channel factorization (6) imply that all the delays must be distinct. If two paths have the same TDOAs, the rank of ℋ_mμ becomes Q − 1 and the corresponding angles cannot be identified correctly. In this case, "spatial smoothing" [7] can provide the solution [6], [7] by performing data extension of the channel over each carrier in such a way as to keep the rank of ℋ_mμ equal to the number of paths Q. In order to allow selection of the received data (13), there must be at least a pair of sensors, i.e., M ≥ 2, and the coherence bandwidth to carrier frequency-spacing ratio must be at least 2:1, i.e., B_c/Δf ≥ 2 or μ ≥ 2. The last requirement can be satisfied by appropriately increasing the number of carriers.
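The conditions above can be collected into a small checker; the function name is hypothetical, and the example values mirror the simulation setup of Section 5.

```python
def mc_jade_identifiable(Q, M, mu, m, L, delays):
    # All identifiability conditions must hold simultaneously.
    return (Q < M * m * mu        # U strictly tall, full column rank
            and Q < L             # G strictly wide, full row rank
            and M >= 2            # at least a pair of sensors
            and mu >= 2           # coherence bandwidth / carrier spacing >= 2
            and len(set(delays)) == len(delays))   # all TDOAs distinct

ok = mc_jade_identifiable(Q=3, M=2, mu=8, m=8, L=10,
                          delays=(0.0, 0.078, 0.234))
```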


The following simulation results illustrate the performance of MC-JADE-ESPRIT. In all the experiments, the estimation Mean Square Error (MSE) is averaged over 500 Monte Carlo runs of the algorithm and compared against the Cramér-Rao Bound (CRB), which is derived for the model (13) in the Appendix. In the figures corresponding to the experiments, the MSE is plotted using a full line whereas the CRB is shown by a dotted line.

5.1.  Basic  performance  of  MC-JADE-ESPRIT 

We consider an antenna of M = 2 elements, spaced at half a wavelength. The number of paths is Q = 3 with parameters

θ = [−15°  0°  25°]^T,  τ = [0  0.078  0.234]^T T,

and the path fadings are generated from a complex zero-mean Gaussian distribution with variances [0.4  0.3  0.3]. The channel length is half the symbol period T, which is normalized to T = 1. The pulse-shape function is a raised cosine with 0.25 roll-off factor. C = 64, with μ = 8. The employed joint diagonalization method is method "Q" as



it is referred to in [7]. Fig. 2 shows the effect of the noise power on the MSE of the estimated DOAs and TDOAs. At high noise powers, the estimation is strongly sensitive to the channel estimation noise and is erroneous. As the noise effect decreases, the difference with the CRB is about 2 to 3 dB.

5.2.  Comparison  with  SI-JADE 

For the same setting, we plot the CRB relative to the parameter estimation over the first carrier, using SI-JADE [7], against the noise power. The stacking parameter as defined in [7] is taken to be m1 = 5. The CRB of SI-JADE is plotted in Fig. 2 using a dashed line. Here, for low estimation noise powers, the parameter MSE of MC-JADE-ESPRIT is smaller than the CRB of SI-JADE. The greater estimation precision for MC-JADE-ESPRIT is mainly due to the larger amount of information involved.


Figure  4:  Temporal  resolution  of  MC-JADE-ESPRIT. 

in Fig. 4. It is clear that for small delay spacing, ambiguity occurs and the full rank condition on the pulse-shape function matrix is no longer satisfied, yielding an erroneous estimation. Here, no spatial smoothing is applied. For well separated delays, estimation is seen to depend only on the noise power.


An advantage of the algorithm is that it takes into account the available frequency diversity provided by the multiple carriers and processes data in a single batch. However, estimation of the channel impulse response is a prerequisite to the application of the algorithm, which makes its performance suboptimal and sensitive to the estimation noise.

Figure  2:  Basic  performance  of  MC-JADE-ESPRIT. 


Figure  3:  Spatial  resolution  of  MC-JADE-ESPRIT. 

5.3.  Resolution  of  the  Algorithm 

We  set  the  number  of  paths  to  Q  =  2,  with  the  estimation 
noise  power  being  fixed  at  -20  dB.  All  the  other  parameters 
are kept the same. In Fig. 3, as expected, estimation accuracy is shown to improve with well separated angles; otherwise estimation is dependent on the noise power. The effect of delay spacing on the angle and delay estimation is shown


The  Cramer  Rao  Bound 

The CRB for the joint problem (13) can be derived as follows. Let us define the parameter vector as

ω := [σ²  g^T(1)  ...  g^T(L)  η^T]^T

where

η := [ℜ{β^T(1)}  ℑ{β^T(1)}  ...  ℜ{β^T(m)}  ℑ{β^T(m)}  θ^T  τ^T]^T

and ℜ{·} and ℑ{·} denote the real and imaginary parts respectively. In our case, the vectors g(i), i = 1, ..., L, which are the columns of matrix G in (13), are deterministic but unknown. The data are the channel estimates ℋ_mμ. These data are corrupted by the estimation noise

N_mμ := [n(1)  n(2)  ...  n(L)]

where n(i), i = 1, ..., L, are complex, stationary, zero-mean Gaussian random processes that are temporally uncorrelated. It follows that the data ℋ_mμ are also uncorrelated Gaussian random processes. The likelihood function of the data is

C = (1 / (πσ²)^{MmμL}) exp{ −(1/σ²) Σ_{i=1}^{L} n*(i) n(i) }  (24)

and the corresponding loglikelihood function is

Λ = ln C = const − MmμL ln σ² − (1/σ²) Σ_{i=1}^{L} n*(i) n(i)  (25)

Finally, the CRB matrix for the parameters of interest, CRB(θ, τ), is the 2Q-dimensional bottom-right-corner partition of CRB(η), and the bounds are found by taking the diagonal elements.

where * denotes complex conjugate transpose. The derivatives of the loglikelihood function Λ with respect to the unknown parameters can be obtained using the results of [9], [6], [10]:







∂Λ/∂σ² = −MmμL/σ² + (1/σ⁴) Σ_{i=1}^{L} n*(i) n(i)

∂Λ/∂ℜ{g(i)} = (2/σ²) ℜ[U* n(i)],   ∂Λ/∂ℑ{g(i)} = (2/σ²) ℑ[U* n(i)]

∂Λ/∂η = (2/σ²) Σ_{i=1}^{L} ℜ{ diag[g(i)]* D* n(i) }

where

D := [D_β  D_θ  D_τ]   (Mmμ × 2(m + 1)Q)

D_β := [D_{ℜ{β(1)}}  D_{ℑ{β(1)}}  ...  D_{ℜ{β(m)}}  D_{ℑ{β(m)}}]

D_{ℜ{β(i)}} = [∂U/∂ℜ{β(i)_1}  ...  ∂U/∂ℜ{β(i)_Q}],   D_{ℑ{β(i)}} = [∂U/∂ℑ{β(i)_1}  ...  ∂U/∂ℑ{β(i)_Q}]

D_θ = [∂U/∂θ_1  ...  ∂U/∂θ_Q],   D_τ = [∂U/∂τ_1  ...  ∂U/∂τ_Q]

with U = U(θ, τ).






Using the results of [9], [6], we obtain the entries of the Fisher information matrix in terms of MmμL/σ⁴ and blocks of the form ℜ[U*U].






[1]  J.  A.  C.  Bingham,  "Multicarrier  Modulation  for  Data 
Transmission:  An  Idea  Whose  Time  Has  Come",  IEEE 
Communications  Magazine,  vol.  28,  NO.  5,  May  1990. 

[2] I. Kalet, "The Multitone Channel", IEEE Transactions on Communications, vol. 37, no. 2, February 1989.

[3] L. Vandendorpe and O. van de Wiel, "MIMO DFE Equalization for Multitone DS/SS Systems over Multipath Channels", IEEE Transactions on Communications, vol. 14, no. 3, April 1996.

[4] A. Belouchrani and M. G. Amin, "Blind Source Separation Based on Time-Frequency Signal Representations", IEEE Transactions on Signal Processing, vol. 46, no. 11, November 1998.

[5]  K.  Abed-Meraim  and  Y.  Hua,  "Blind  Identification 
of  Multi-Input  Multi-Output  System  Using  Minimum 
Noise  Subspace",  IEEE  Transactions  on  Signal  Pro¬ 
cessing,  vol.  45,  NO.  1,  January  1997. 

[6]  M.  C.  Vanderveen,  A.  J.  van  der  Veen  and  A.  Paulraj, 
"Estimation  of  Multipath  Parameters  in  Wireless 
Communications",  IEEE  Transactions  on  Signal  Pro¬ 
cessing,  vol.  46,  NO. 3,  March  1998. 

[7]  A.  J.  van  der  Veen,  M.  C.  Vanderveen,  and  A. 
Paulraj,  "Joint  Angle  and  Delay  Estimation  Using 
Shift-Invariance  Techniques"  IEEE  Transactions  on 
Signal  Processing,  vol.  46,  NO. 2,  February  1998. 

[8]  A.  J.  van  der  Veen,  M.  C.  Vanderveen  and  A.  Paulraj, 
"Joint  Angle  and  Delay  Estimation  Using  Shift  Invari¬ 
ance  Properties"  IEEE  Signal  Processing  Letters",  vol. 
4,  NO.  5,  May  1997. 

[9]  P.  Stoica  and  A.  Nehorai,  "MUSIC,  Maximum  Like¬ 
lihood,  and  the  Cramer-Rao  Bound",  IEEE  Transac¬ 
tions  on  Acoustics,  Speech,  and  Signal  Processing,  vol. 
37,  NO.  5,  May  1989. 

[10]  S.  M.  Kay,  "Fundamentals  of  Statistical  Signal  Pro¬ 
cessing  :  Estimation  Theory" ,  Prentice-Hall,  1993. 

The Fisher Information Matrix (FIM) for the parameters is given by E[(∂Λ/∂ω)(∂Λ/∂ω)^T], where ω := [σ²  g^T(1)  ...  g^T(L)  η^T]^T, and the inverse of the CRB matrix for the parameters of interest, after some manipulations, is given by

CRB^{−1}(η) = (2/σ²) Σ_{i=1}^{L} ℜ{ diag[g(i)]* D* Π_U^⊥ D diag[g(i)] }

where Π_U^⊥ = I − U(U*U)^{−1}U* is the projector onto the orthogonal complement of the column span of U.







35  174  BRUZ  CEDEX  -  FRANCE 

ENSIETA  -  2,  rue  F.  VERNY 
29  200  BREST  -  FRANCE 



This paper deals with the analysis of modulated signals in an NDA (Non Data Aided) context. Assuming the detection of an OFDM signal, our goal is to estimate the bandwidth and the number of sub-carriers of this signal. First, we propose an algorithm based on wavelet decomposition in order to estimate the bandwidth: the bandwidth is correctly estimated in 100% of the cases with an error lower than 8% down to SNR = −3 dB. Second, we apply the MUSIC algorithm with a decision criterion to obtain the number of sub-carriers: the number of carriers can be estimated with an error lower than or equal to 9% in 100% of the cases down to SNR = 10 dB.


Spectrum survey requires the estimation of the parameters of the received signals. This problem has already been studied in the case of single-carrier modulations, and must now cope with new modulation types like OFDM (Orthogonal Frequency Division Multiplexing), which are more and more used (DAB, ADSL, ...). In [2], we proposed a method to detect OFDM signals versus linear single-carrier modulated signals. The problem we now want to solve is the estimation of two main parameters of such a signal: the bandwidth and the number of sub-carriers. Using the fact that the power spectral density (PSD) of an OFDM signal has a rectangular shape, we propose to apply a wavelet decomposition to detect the breaking points at the beginning and at the end of the spectrum. Then, we try to determine the number of sub-carriers. Since this number is unknown, AR modeling is impossible. Therefore, the MUSIC algorithm with a decision criterion seems well suited to this problem. In section 2 we give the problem statement. Section 3 is dedicated to the bandwidth estimation of OFDM signals, with performances. In section 4 we give a method to obtain the number of sub-carriers. Section 5 concludes the paper.
OFDM is a multiplexing of single-carrier signals, and can thus be expressed as a sum of single-carrier modulated signals:

x(t) = √P Σ_{n=0}^{N_p−1} Σ_{k=−∞}^{∞} c_{n,k} g(t − kT_s) e^{2iπ(f_0 + nΔf)t}


where {c_{n,k}} is the symbol sequence, which is assumed to be centered and i.i.d., N_p the number of sub-carriers, Δf the frequency offset between carriers, g(t) the pulse function and P the power of the signal. T_s = T_u + T_g, where T_u is the "useful time" when information is sent, T_g is the guard interval and T_s the duration of the complete OFDM symbol. We will suppose here that the guard interval is empty.
Due to the multiplexing of many single carrier signals, the spectrum of the OFDM signal is quite rectangular (Fig. 1). We assume that we receive the complex signal r(t) = x(t) + b(t), where x(t) is the OFDM baseband signal (with possible frequency and time offsets) and b(t) is a complex white Gaussian noise.
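A quick numerical sanity check of the rectangular-PSD property: an OFDM block is an inverse DFT of the sub-carrier symbols, so with unit-modulus (QPSK) symbols the average power is flat over the occupied bins and zero elsewhere. Sizes and the QPSK alphabet below are illustrative; the guard interval is empty, as assumed above.

```python
import numpy as np

rng = np.random.default_rng(1)
Np, N, n_blocks = 32, 64, 200      # occupied bins, FFT size, OFDM symbols (illustrative)

sym = np.exp(1j * np.pi / 2 * rng.integers(0, 4, (n_blocks, Np)))   # QPSK, |sym| = 1
S = np.zeros((n_blocks, N), dtype=complex)
S[:, :Np] = sym                    # Np active sub-carriers, rest unused
x = np.fft.ifft(S, axis=1)         # one OFDM symbol per row, empty guard interval

psd = np.mean(np.abs(np.fft.fft(x, axis=1)) ** 2, axis=0)
# psd is flat over the Np occupied bins and (numerically) zero elsewhere,
# i.e., the near-rectangular spectrum of Fig. 1.
```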


Figure 1: Spectrum amplitude of an OFDM signal with 32 sub-carriers.


3.1. Continuous wavelet decomposition (CWT)

From a signal point of view, wavelets consist of a linear decomposition of a signal on a given waveform translated in time and dilated or compressed in time [1]. In the frequency domain, wavelet analysis is closely related to filtering the data through a bank of filters having constant quality (Q) factors. The continuous wavelet transform (CWT) maps a one-dimensional analog signal s(t) to a set of wavelet coefficients which vary continuously over time b and scale a:

W(a, b) = a^{−1/2} ∫_{−∞}^{+∞} ψ*((t − b)/a) s(t) dt

where W(a, b) denotes the wavelet transform and ψ(t) is the wavelet used in the decomposition. Equivalently the CWT can be expressed as:

W(a, b) = a^{1/2} ∫_{−∞}^{+∞} Ψ*(aν) S(ν) e^{2iπνb} dν

with Ψ(ν) and S(ν) the Fourier transforms of ψ(t) and s(t) respectively. Wavelets must satisfy some restrictions [1], the most important ones being integrability and square integrability. Consequently, if Ψ(ν) is a smooth function in the neighborhood of the frequency origin, this condition implies that Ψ(0) = 0, which means that ψ(t) has no DC component. Other assumptions about wavelets can be made for convenience. One such requirement is that Ψ(ν) = 0 for ν < 0. It is also convenient to assume that Ψ(ν) is real for ν > 0. The wavelet functions ψ((t − b)/a) are used to band-pass filter the signal. This
can be seen as a kind of time-varying spectral analysis in which scale a plays the role of a local frequency. As a increases, wavelets are stretched and analyze low frequencies, while for small a, contracted wavelets analyze high frequencies. The parameter b, varying in time, controls the desired temporal location. The scalar product corresponds to the measurement of the signal s(t) in the space spanned by all the dilated or contracted versions of the unique function ψ. In order to analyze the signal, the dilation parameter a is given an initial large value (e.g. 1.0) and is then decreased in regular increments to examine the signal in more detail. We can write equivalently that the wavelet
filter  function  considers  successively  narrow  section  of  the 
signal  spectrum  S(v).  Since  spectral  properties  are  fre¬ 
quently  better  displayed  on  a  logarithmic  frequency  scale, 
it  is  convenient  to  write  a  2~u.  With  this  notation  in¬ 
tegral  increments  in  u  result  in  octave  increments  of  a. 
Note  that  a  small  a  (i.e.  large  u )  corresponds  to  high 
frequencies.  A  small  u  corresponds  to  an  analysis  of  the 
large  scale  features  of  s(t),  and  as  u  increases,  finer  de¬ 
tails  of  the  signal  come  into  focus.  The  function  ip(t) 
is  the  basic  unshifted  and  undilated  wavelet.  It  may  be 
chosen  to  answer  the  needs  [5].  For  example,  in  our  case, 

\psi(t) = e^{-t^2/2 + jmt} is the Morlet wavelet. An important property of this basic wavelet is that it is concentrated in both the time and frequency domains: its time-bandwidth product is as small as possible. To satisfy \Psi(0) = 0, one must add a correction term, but if m > 5, this correction term is negligibly small and can be omitted. One problem of practical interest for engineers is the detection of abnormal features. Generally, we
have  to  use  a  discretization  procedure  since  we  consider 

digital data. This discretization procedure consists of a high-resolution digitization of the generating wavelet in the time domain, truncated on its sides in order to have a finite extent. The wavelet coefficients C_{j,k} of the time-frequency decomposition are then obtained by a correlation in the time domain of the interpolated digitized wavelets \psi_{j,k} with the discrete signal s(n) for different values of the dilation factor 2^j and of the time shift k. This approach has some drawbacks, such as the edge effects due to the correlation of a finite-duration signal with a truncated infinite wavelet, and the numerical approximation errors due to truncation.
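The discretized decomposition described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the dyadic scale loop, and the truncation support are our own choices, and normalization constants are ignored.

```python
import numpy as np

def morlet(t, m=5.0):
    # Morlet wavelet psi(t) = exp(-t^2/2 + j*m*t); for m > 5 the
    # admissibility correction term is negligibly small and omitted.
    return np.exp(-t ** 2 / 2.0 + 1j * m * t)

def cwt_dyadic(s, num_scales=4, support=4.0, samples_per_unit=8):
    # Correlate the signal with truncated, dilated copies of the digitized
    # wavelet, one row of coefficients C_{j,k} per dyadic scale a = 2^j.
    rows = []
    for j in range(num_scales):
        a = 2.0 ** j
        n = int(support * a * samples_per_unit)
        t = np.arange(-n, n + 1) / samples_per_unit
        psi = morlet(t / a) / np.sqrt(a)           # truncated, dilated wavelet
        # time-domain correlation (edge effects at the borders are one of
        # the drawbacks mentioned in the text)
        rows.append(np.convolve(s, np.conj(psi)[::-1], mode="same"))
    return np.array(rows)
```

Since the Morlet wavelet has essentially no DC component, a constant signal yields near-zero coefficients at every scale.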

3.2.  Bandwidth  estimation  method 

The beginning and the end of the PSD of an OFDM signal, called R(f), are breaking points and can be easily detected by using a wavelet decomposition [4]. We choose the Morlet wavelet to analyze the PSD signal and obtain the scalogram of the PSD
(Fig. 2). Nevertheless, we have to admit that this estimation is purely visual. For that reason, we decide to project the resulting scalogram to obtain its frequency marginal.

Figure 2: Scalogram of the PSD of the received signal r(t). 1024 samples. SNR = 3 dB.

Because wavelet analysis is a constant-\Delta f/f transformation, we have to sum the energy in a cone, instead of summing the energy of a column as in the case of a bilinear time-frequency transformation. Moreover, we cannot be sure that the wavelet has the same energy in each time-frequency logon. Consequently, we propose to calculate the scalogram of the Dirac distribution, which has a cone shape and specifically characterizes breaking points. Considering this scalogram, it becomes easy to keep only the points with enough energy (i.e. more energy than a given percentage of the total energy of the signal) and thus to form a description mask. We then obtain the bandwidth estimation algorithm:

1. Apply the Dirac mask on the scalogram of the studied PSD signal R(f) for each frequency localization.

2. Calculate the sum of the energy, which gives the frequency marginal of the scalogram.

3. Search for the two extrema located at the beginning and the end of the bandwidth.


Two options are possible to calculate the energy of the scalogram of R(f) in the cone of the Dirac mask. First, we can use a binary mask, which means that the energy is equal to "1" if the point belongs to the cone and "0" otherwise. The second solution consists of using a weighted Dirac mask, which gives the real energy of each logon after thresholding. We show in Fig. 3 that the second solution leads to the right frequency marginal.
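A rough sketch of the weighted-mask marginal computation might look as follows. The helper names, the thresholding rule, and the per-scale correlation used to sweep the cone along frequency are our own simplifications of the procedure, not the authors' code.

```python
import numpy as np

def frequency_marginal(scal, dirac_scal, threshold=0.01):
    # Weighted Dirac mask: keep a point's energy weight from the Dirac
    # scalogram only if it exceeds a fraction of the total energy.
    e = np.abs(dirac_scal) ** 2
    w = np.where(e > threshold * e.sum(), e, 0.0)
    # Sweep the cone along frequency: correlate each scale row of the
    # analyzed scalogram's energy with the corresponding mask row.
    out = np.zeros(scal.shape[1])
    for row, wrow in zip(np.abs(scal) ** 2, w):
        out += np.convolve(row, wrow[::-1], mode="same")
    return out

def band_edges(marginal):
    # The two strongest extrema of the marginal mark the beginning and
    # the end of the estimated bandwidth.
    idx = np.argsort(marginal)[-2:]
    return int(idx.min()), int(idx.max())
```

A binary mask would replace the kept weights by 1, which (as Fig. 3 shows) distorts the marginal relative to the weighted version.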

Figure  3:  Frequency  marginals  of  the  scalogram  in  the 
case  of  binary  and  weighted  Dirac  mask. 

3.3.  Results  and  performance 

Figure  4:  Noise  influence:  estimation  performance  for 
different  SNR,  no  time  or  frequency  offsets. 


We apply the proposed algorithm to 10,000 trials of simulated OFDM signals. These signals are generated with 4096 samples, with 4 samples per symbol. The PSD is evaluated using 1024 points. We simulate exactly the binary random sequence for SNR equal to 10, 5, 3 and 0 dB. Moreover, we study the effects of bad synchronization by considering time and frequency offsets (the time offset is smaller than T_u and the frequency offset cannot exceed 5 % of the bandwidth of the signal).

3.3.1. Noise influence

Fig. 4 shows the results of bandwidth estimation for different SNR. The proposed algorithm determines the bandwidth with an error lower than 4 % for 97 % of the signals when SNR = 3 dB. However, we observe a strong degradation of the performance as SNR goes to 0 dB.

3.3.2. Time and frequency offset influence

Figure 5: Frequency offset influence: estimation performance for different SNR, no time offset.

Fig. 5 shows the results obtained for different SNR in the case where the frequency offset \delta f_0 is non-zero. The new scalogram is simply a translated version of the original scalogram, shifted by \delta f_0. Consequently, the bandwidth remains the same and the performance is still good. The time offset \delta t_0 is equivalent to a new phase for the signal. Since we evaluate its PSD, the phase no longer has any influence, and the performance is strictly the same as in the case \delta t_0 = 0.

3.3.3. Conclusion concerning the method

The proposed method is efficient down to SNR = 3 dB, even in the case of time or frequency offsets. By using the PSD of the received signal, all phase perturbations are removed. Down to SNR = 3 dB, the bandwidth is correctly estimated in 100 % of the cases with an error lower than 8 %.

4.1. Theoretical covariance matrix

In this problem, we receive one signal made up of N_p components. We then compute the coefficients of the covariance matrix R. For each time delay \tau_n in the interval [0; N_p - 1], the covariance term can be expressed by:

r(\tau_n) = \frac{1}{N_e - \tau_n} \sum_{q=\tau_n+1}^{N_e} x(q)\, x^*(q - \tau_n) \qquad (2)

where N_e denotes the number of samples of the received signal. Moreover, we can notice that this estimator is unbiased. In the case where \tau_n = 0, we have:


r(0) = \sum_{q=1}^{N_p} x_q^2 + \sigma_b^2


where \sigma_b^2 is the variance of the noise. If \tau_n \neq 0, we have:


r(\tau_n) = \frac{1}{N_p - \tau_n} \sum_{q=1+\tau_n}^{N_p} x(q)\, x^*(q - \tau_n)

and we then consider \omega_n = 2\pi \Delta f\, \tau_n, which depends on \tau_n.
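Under our reading of equation (2) — an average of x(q) x*(q − τ) normalized so that the estimate is unbiased — each correlation term can be computed as below (0-based indexing; the function name is our own).

```python
import numpy as np

def corr_term(x, tau):
    # Unbiased estimate of r(tau): average of x(q) x*(q - tau) over the
    # Ne - tau available products (equation (2), shifted to 0-based indices).
    Ne = len(x)
    return np.dot(x[tau:], np.conj(x[:Ne - tau])) / (Ne - tau)
```

For a pure complex exponential x(q) = e^{jωq}, the estimate equals e^{jωτ} exactly, which is what makes the Toeplitz covariance matrix built from these terms amenable to MUSIC-style analysis.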

We can then form the covariance matrix as:

R = \begin{pmatrix} r(0) & r(1) & \cdots & r(N_p - 1) \\ r^*(1) & r(0) & \cdots & r(N_p - 2) \\ \vdots & \vdots & \ddots & \vdots \\ r^*(N_p - 1) & r^*(N_p - 2) & \cdots & r(0) \end{pmatrix}

Considering the value of r(\tau_n) in the cases \tau_n = 0 and \tau_n \neq 0, this matrix has diagonal entries \sum_{q=1}^{N_p} x_q^2 + \sigma_b^2 and off-diagonal entries of the form \sum_{q=1}^{N_p} x_q^2 \cdot e^{i(N_p-1)\omega_n}, \sum_{q=1}^{N_p} x_q^2 \cdot e^{i(N_p-2)\omega_n}, \ldots

This matrix is conjugate-symmetric and its form is the same as in the cases for which the MUSIC algorithm is used. It can therefore be diagonalized by an eigenvalue decomposition [6]. After the diagonalization process, the autocorrelation matrix becomes:

\Lambda = \mathrm{diag}\left(\lambda_1, \lambda_2, \ldots, \lambda_{N_p}, \sigma_b^2, \ldots, \sigma_b^2\right)

where \lambda_1, \lambda_2, \ldots, \lambda_{N_p} are the eigenvalues due to the contribution of the useful signal plus noise. Normally, \lambda_i > \sigma_b^2 for all i \in \{1, 2, \ldots, N_p\}. We can notice that the matrix contains N_p eigenvalues which are larger than the noise variance, so the number of sub-carriers can be deduced.

Several solutions are possible to determine which eigenvalues are due to the contribution of the sub-carriers. As the channel has been surveyed before the signal started, we can assume that the variance of the noise has been estimated, though with some uncertainty. A second solution is to plot the eigenvalues on a single diagram in increasing order and to detect a breaking point. But in the case of fading, the contributions of some sub-carriers become weaker and the breaking point is impossible to find. Another solution is to use a decision criterion: Akaike's or Rissanen's criterion. Akaike's criterion is better suited since it tends to overestimate the number of sources if the signal is oversampled, which can be helpful in the case of fading. Moreover, this method is efficient only if there are at least two noise contributions, which means that the number of correlation terms must be at least equal to (N_p + 1). The problem is that N_p is unknown and has to be estimated. The proposed solution is to start the algorithm with an a priori number of sub-carriers and to iterate this process until an eigenvalue corresponding to noise, or a breaking point, appears.

4.2. Proposed algorithm

The first algorithm we propose is the following:

1. Fix a priori the size of the matrix: N_e.

2. Using equation (2), compute the N_e autocorrelation terms and form the correlation matrix.

3. Diagonalize the matrix and apply Akaike's criterion. If the number of sub-spaces (i.e. of sub-carriers) is equal to N_e, go to step 1 and set N_e = 2 \cdot N_e.
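A sketch of these steps under stated assumptions: the Toeplitz correlation matrix is built from the estimates of equation (2), and the model order is chosen with the Wax-Kailath form of Akaike's criterion. The text does not give the exact criterion expression, so this particular AIC form, along with all function names, is our own assumption.

```python
import numpy as np

def corr_matrix(x, p):
    # p x p conjugate-symmetric Toeplitz correlation matrix built from the
    # unbiased estimates r(tau), tau = 0..p-1 (equation (2)).
    Ne = len(x)
    r = [np.dot(x[t:], np.conj(x[:Ne - t])) / (Ne - t) for t in range(p)]
    return np.array([[r[j - i] if j >= i else np.conj(r[i - j])
                      for j in range(p)] for i in range(p)])

def aic_order(eigs, Ne):
    # Wax-Kailath AIC: if k sources are assumed, the p - k smallest
    # eigenvalues should all equal the noise variance, i.e. their
    # geometric and arithmetic means should coincide.
    lam = np.sort(np.asarray(eigs, dtype=float))[::-1]
    p = len(lam)
    aic = []
    for k in range(p):
        g = np.exp(np.mean(np.log(lam[k:])))   # geometric mean of the tail
        a = np.mean(lam[k:])                   # arithmetic mean of the tail
        aic.append(-2.0 * Ne * (p - k) * np.log(g / a) + 2.0 * k * (2 * p - k))
    return int(np.argmin(aic))
```

With a few eigenvalues well above a flat noise floor, the criterion picks out exactly that number of components.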

4.3.  Results 

We apply the proposed algorithms to simulated OFDM signals. We simulate 10,000 OFDM signals, using 10,000 trials to generate the corresponding symbols. Each signal is normally generated with 50,000 samples and contains 64 sub-carriers. The frequency offset is limited to 10 % of the bandwidth of the signal. The channel is the urban channel (COST 207), in order to compare decision criteria. We apply the MUSIC algorithm with Akaike's criterion (except in Figure 8).

4.3.1. Noise influence

In the first case, we examine the influence of noise. We generate OFDM signals for different signal-to-noise ratios (20, 10 and 5 dB). We can see in Fig. 6 that performance is quite good down to 10 dB, but becomes poor at 5 dB and below. We then study the influence of the number of signal samples, since we use estimates of the autocorrelation terms. The SNR is fixed at 20 dB, and the signals are tested with 50,000, 40,000 and 30,000 samples respectively. As expected, the performance decreases with the number of samples. Nevertheless, 50,000 samples are enough to obtain good performance (Fig. 7). Lastly, we compare Rissanen's and Akaike's criteria in the case of a signal with 50,000 samples and SNR = 20 dB and 10 dB.


Figure  6:  Noise  influence  in  the  estimation  of  the  number 
of  sub-carriers. 




Figure  7:  Influence  of  the  number  of  points  in  the  esti¬ 
mation  of  the  number  of  sub-carriers.  SNR=20  dB. 

Since it tends to overestimate the dimension of the signal sub-space, Akaike's criterion performs somewhat better than Rissanen's (Fig. 8).

4.4.  Conclusion  concerning  the  method. 

This method is quite efficient for estimating the number of sub-carriers down to SNR = 10 dB with 50,000 samples (that is, about 1500 OFDM symbols). Akaike's criterion is more appropriate than Rissanen's, but the "Minimum Description Length" criterion should also be tested.


The proposed methods for estimating the bandwidth and the number of sub-carriers are quite efficient for a small number of samples and low SNR (lower than 10 dB). Concerning the bandwidth estimation, we obtain a correct estimate in 100 % of the cases with an error lower than 8 % down to SNR = 3 dB. Concerning the estimation of the number of sub-carriers, we obtain a correct estimate in 100 % of the cases with an error lower than 9 % down to SNR = 10 dB. The performance can be improved using denoising algorithms [3] and compared with the time-domain methods that
we are currently developing [4]. This work completes our detection algorithm and can be used for upcoming applications of synchronization and equalization.

Figure 8: Influence of the decision criterion on the estimation of the number of sub-carriers. 10,000 trials, 50,000 samples, SNR = 20 and 10 dB, urban channel (COST 207).


[1] A. Cohen, "Ondelettes et traitement numérique du signal" (Wavelets and digital signal processing), Masson, 1992, 205 pages.

[2] W. Akmouche, "Detection of multi-carrier modulations using 4th-order cumulants," Proc. of the MILCOM, session 15, Atlantic City, 01-

[3] E. Kerhervé, W. Akmouche, A. Quinquis, "Wavelet and noise reduction: application to the time features estimation for OFDM signals," Proc. of the ICSPAT, Orlando (Florida), USA, Oct. 1999.

[4] W. Akmouche, E. Kerhervé, A. Quinquis, "Estimation of OFDM signal parameters: time parameters," submitted to Globecom 2000, Nov. 2000.

[4] J.-C. Pesquet, H. Krim, H. Carfantan, J. G. Proakis, "Estimation of noisy signals using time-invariant wavelet packets," 1993, pp. 31-34.

[5] A. Teolis, "Computational Signal Processing with Wavelets," Birkhäuser, 1998.

[6] E. H. Attia, "Efficient computation of the MUSIC algorithm as applied to a low-angle elevation estimation problem in a severe multipath environment," 1998.




Brian S. Krongold and Douglas L. Jones

Department of Electrical and Computer Engineering & Coordinated Science Laboratory

University of Illinois at Urbana-Champaign
Urbana, IL 61801


Many algorithms for blind source separation (BSS) have been introduced in the past few years, most of which assume statistically stationary sources as well as instantaneous mixtures of signals. In many applications, such as separation of speech or fading communications signals, the sources are nonstationary. Furthermore, the source signals may undergo convolutive (or dynamic) linear mixing, and a more complex BSS algorithm is required to achieve better source separation. We present a new BSS algorithm for separating linear convolutive mixtures of nonstationary signals which relies on the nonstationary nature of the sources to achieve separation. The algorithm is an on-line, LMS-like update based on minimizing the average squared cross-output-channel correlations along with unity average energy output in each channel. We explain why, for nonstationary signals, such a criterion is sufficient to achieve source separation regardless of the signal statistics.


The separation of multiple unknown sources from multi-sensor data has many applications, including the isolation of individual speech signals from a mixture of simultaneous speakers (as in video conferencing or the often-cited "cocktail party" environment), the elimination of cross-talk between horizontally and vertically polarized microwave communications transmissions, and the separation of multiple cellular telephone signals at a base station. In the past decade or so, a number of significant methods have been introduced for blind source separation, of which we review a few of the most popular here. One of the earliest and most effective methods (yet relatively unknown in some circles) is a constant-modulus-based method published in 1985 by Treichler and Larimore [1]. This method achieves simultaneous separation and equalization by minimizing the deviation of the separated output magnitudes from a fixed gain. This method is very simple and convenient and works well even for non-constant-modulus signals with a sub-Gaussian kurtosis (which includes most communications signals).

This work was supported by the National Science Foundation, grant no. CCR-9979381.

Jutten and Herault introduced one of the most popular methods [2]. This method works well in many applications, particularly cross-talk situations in which a relatively modest amount of mixing occurs. For more challenging scenarios, the existence of multiple minima and the misconvergence of the widely used Jutten-Herault algorithm have been examined in the literature [3]-[4]. Methods for non-Gaussian sources have also been developed, including [5] and others¹. More recently, methods based on second-order statistics (which can thus work even for Gaussian sources) have been introduced. A method by Belouchrani, et al. can separate stationary Gaussian sources with different autocorrelation statistics [6].

In many applications of blind source separation, the received signals are nonstationary. Nonstationarity may arise either from the source signals themselves (such as speech), or from channel impairments (such as fading in wireless communications channels). Most techniques for blind source separation assume stationarity of the signals and depend on reliable estimation of second-order or higher-order statistics. These methods may have difficulty when applied to nonstationary signals.

Several methods developed explicitly for nonstationary source separation have been published recently. Belouchrani and Amin have developed a time-frequency extension of the method in [7] for nonstationary sources, and Parra, et al. have developed another method based on frequency decomposition of several successive blocks of time [8]. While these methods appear effective, and

¹It should be noted here that the CMA-based method by Treichler and Larimore also depends on the sub-Gaussianity of the sources.

0-7803-5988-7/00/$  10.00  ©  2000  IEEE 


the latter can also separate convolutive mixtures, they are block-based methods requiring somewhat sophisticated and expensive processing. Matsuoka, et al. present an on-line, adaptive extension of the Jutten-Herault method which, somewhat like the method we proposed in [9], attempts to minimize the average cross-correlation between separated channels while normalizing the output energy [10].

In various situations, convolutive (or dynamic) mixing occurs rather than instantaneous mixing. This complicates the BSS problem and requires a more sophisticated and computationally complex solution. Although the convolutive mixture problem is not as widely published as the instantaneous problem, methods for solving it are discussed in [11]-[12].

In this paper, we extend our work in [9] to convolutive mixtures to obtain a method for blind source separation of nonstationary, convolutively mixed signals which requires only nonstationarity and independence of the sources to achieve separation. An on-line, LMS-like algorithm is derived which achieves separation while normalizing the average energy of each output channel. This simple algorithm also offers tracking capability for time-varying convolutive mixtures. The optimization criterion is presented in the second section of this paper, the adaptive algorithm is derived in the third section, and simulations which illustrate its performance are presented in the fourth section. Some perspectives on the results are discussed in the final section.

The general source separation problem with convolutive mixtures can be described as

x(n) = \sum_{m=-\infty}^{\infty} A(n - m)\, s(m), \qquad (1)

where s(n) is a vector of M zero-mean, statistically independent source processes at time sample n, x(n) is a vector of N sensor measurements, N > M, and A(n) is an N \times M mixing filter matrix. The goal of blind source separation is to determine an M \times N de-mixing matrix of filters B(n) for n = 0 \ldots L - 1, which, when applied to the received sensor data as in

y(n) = \sum_{m=0}^{L-1} B(m)\, x(n - m), \qquad (2)

recovers (separates) the individual sources up to an unknown permutation and unknown channel gains, which cannot be uniquely determined without additional information [10].
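Equation (2) amounts to a bank of FIR matrix filters applied to the sensor data. A minimal sketch (the array shapes and the function name are our own conventions, not the authors'):

```python
import numpy as np

def demix(B, x):
    # y(n) = sum_{m=0}^{L-1} B(m) x(n - m)   -- equation (2).
    # B: (L, M, N) stack of demixing matrices; x: (T, N) sensor samples.
    L, T = B.shape[0], x.shape[0]
    y = np.zeros((T, B.shape[1]))
    for m in range(L):
        # row n of y receives B(m) applied to x(n - m), for n >= m
        y[m:] += x[:T - m] @ B[m].T
    return y
```

With L = 1 and B(0) = I this is a pass-through; with B(0) = 0 and B(1) = I it is a one-sample delay, illustrating why convolutive demixing can only recover filtered versions of the sources.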

An important problem with convolutive mixtures is that even complete separation may not recover the exact original source signals s(n). Due to the blind nature of the problem and the memory introduced by the convolutive mixing, it may be impossible to obtain the true source signals, and instead filtered versions may result without further assumptions on the source signals. It is for this reason that convolutive-mixture BSS algorithm performance can be viewed, as in [11], by how well a system separates two sources without any regard to how the output signals compare to their unfiltered source versions. A way to quantify this separation performance is to see how (statistically) uncorrelated the output signals are. In this paper, though, our methods perform joint separation-equalization, and this should work well for a certain class of source signals. Our simulations compare the output signals to the original source signals and quantify the performance in terms of signal-to-interference ratio (SIR).

It has been observed in many papers on blind source separation that a necessary condition for the separation of zero-mean, statistically independent sources is that the cross-correlations of the output channels equal zero. However, this is not a sufficient condition, as is well known (see [9] for an example demonstrating this). For sources with fixed variances, an ambiguity exists as there are an infinite number of demixing matrices which obtain zero cross-channel correlation. For any arbitrary pair of variances, the classes of decorrelating matrices are different for different source variances, and only a true separating solution yields zero cross-channel correlation for all variance combinations. This is the key insight on which nonstationary blind source separation algorithms are based. In effect, these methods take multiple snapshots of the short-time cross-correlations at different times, and by minimizing all of these simultaneously, they exploit the changes in the relative channel variances to find a truly separating solution.

This paper uses the same basic insight, but proposes a new criterion for exploiting it which leads to a particularly simple and convenient algorithm. We propose to minimize the following criterion:

J = \sum_{n} \left[ \sum_{l=0}^{L-1} \sum_{i=1}^{M} \sum_{\substack{j=1 \\ j \neq i}}^{M} \hat{r}_{y_i y_j}^2(l; n) + \lambda \sum_{i=1}^{M} \left( \hat{r}_{y_i y_i}(0; n) - 1 \right)^2 \right] \qquad (3)


where, at time n,

\hat{r}_{y_i y_j}(l; n) = \sum_{k=0}^{\infty} h(k)\, y_i(n - k - l)\, y_j(n - k) \qquad (4)

and h(k) is a lowpass averaging filter for computing a short-term estimate of the cross-correlation of output channels y_i and y_j at time n and lag l. The first term in the criterion minimizes the average squared magnitude of the short-term cross-correlations for the first L lags of the output signals (which, as discussed above and in [10], should only be achieved for nonstationary signals by a separating solution), while the second term demands that the output signals in each channel have unit energy on average. In a sense, the second criterion adds a signal normalization feature to the algorithm, but as was shown in [1], this CMA criterion has the ability to jointly separate and equalize sub-Gaussian signals. In instances where the source signals are sub-Gaussian (or one or more of them are), the added CMA criterion greatly aids in separating as well as equalizing, in order to obtain closer estimates of the original source signals.



There are many ways to construct a numerical algorithm based on the above criterion for blind nonstationary source separation, yielding different tradeoffs in terms of computational efficiency, convergence rate, block-based or adaptive forms, etc. However, in many applications a simple, adaptive method which can track slow variations in the mixing parameters is desired. We derive here a stochastic gradient (LMS-like) algorithm which has these characteristics.

Many of the most successful adaptive algorithms are based on a stochastic gradient update using an instantaneous approximation to the expectation in the optimization criterion. For the optimization of the demixing matrices B(l), a stochastic gradient update takes the form

B_{n+1}(l) = B_n(l) - \mu \nabla_n(l), \quad l = 0 \ldots L - 1, \qquad (5)

where \nabla_n(l) is the gradient matrix and p and q denote its row and column indices. Note the use of the instantaneous value at time n of the error function in (3) in the gradient computation:

\epsilon(n) = \sum_{l=0}^{L-1} \sum_{i=1}^{M} \sum_{\substack{j=1 \\ j \neq i}}^{M} \hat{r}_{y_i y_j}^2(l; n) + \lambda \sum_{i=1}^{M} \left( \hat{r}_{y_i y_i}(0; n) - 1 \right)^2 \qquad (6)

The (p, q)th element of the gradient matrix at lag l can easily be shown to be

\nabla_{pq,n}(l) = 2 \sum_{\substack{j=1 \\ j \neq p}}^{M} \hat{r}_{y_p y_j}(l; n)\, \hat{r}_{x_q y_j}(l; n) + 2\lambda \left( \hat{r}_{y_p y_p}(0; n) - 1 \right) \hat{r}_{y_p x_q}(l; n) \qquad (7)

for all lags l which are required for the algorithm.

We now derive efficient recursive updates for the short-term correlation estimates for a convenient form of the averaging filter. For computational efficiency, we select a first-order IIR averaging filter with impulse response

h(k) = \alpha^k u(k) \qquad (8)

where u(k) is the unit step function and 0 < \alpha < 1. With this form, the correlation statistics can easily be updated recursively according to

\hat{r}_{y_i y_j}(l; n + 1) = \alpha\, \hat{r}_{y_i y_j}(l; n) + y_i(n - |l|)\, y_j(n), \qquad (9)

and similarly

\hat{r}_{y_i x_j}(l; n + 1) = \alpha\, \hat{r}_{y_i x_j}(l; n) + y_i(n - |l|)\, x_j(n). \qquad (10)

This completes the following simple recursive algorithm for nonstationary blind source separation.

1.  Compute  output  according  to  (2). 

2.  Update  short-time  correlations  using  (9)  and  (10). 

3.  Compute  separation  filter  gradient  using  (7). 

4.  Update  separation  filters  as  in  (5). 

5.  Go  back  to  step  1. 
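Step 2 of the algorithm — the recursive update of equation (9) — reduces to one multiply-add per tracked lag. A sketch for a single channel pair, with our own function name and a geometric forgetting factor α:

```python
import numpy as np

def short_term_corrs(yi, yj, L, alpha=0.95):
    # First-order IIR estimates of the cross-correlation of two output
    # channels at lags l = 0..L-1, following equation (9):
    #   r(l; n+1) = alpha * r(l; n) + yi(n - l) * yj(n)
    r = np.zeros(L)
    for n in range(len(yi)):
        for l in range(L):
            if n - l >= 0:
                r[l] = alpha * r[l] + yi[n - l] * yj[n]
    return r
```

For constant unit inputs the estimates settle at 1/(1 − α), so they are scaled running sums rather than normalized correlations; the criterion only requires them to be driven toward zero (or a fixed target), so the scale is immaterial.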

The complexity of the algorithm in the instantaneous mixture case was shown in [9] to be O(M^2 N). Extension to the convolutive mixture case increases the complexity by a factor of L^2, where L is of course a chosen parameter which can be used to trade off complexity and quality of separation.
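To make the control flow of steps 1-5 concrete, here is a drastically simplified instantaneous (L = 1) sketch: it tracks a short-term output correlation matrix with an IIR average and nudges B toward making that matrix the identity (zero cross-correlations, unit output power). The update B ← B − μ(R̂_y − I)B is our own simplification standing in for the full gradient (7), and all names are ours.

```python
import numpy as np

def separate(x, M, mu=0.01, alpha=0.95):
    # x: (T, N) mixed sensor data; returns an M x N demixing matrix.
    T, N = x.shape
    B = np.eye(M, N)
    Ry = np.zeros((M, M))
    for n in range(T):
        y = B @ x[n]                                    # step 1: outputs
        Ry = alpha * Ry + (1 - alpha) * np.outer(y, y)  # step 2: short-term R_y
        B = B - mu * (Ry - np.eye(M)) @ B               # steps 3-4: update B
    return B
```

Driving R̂_y to the identity enforces the decorrelation and unit-energy parts of the criterion; with nonstationary sources, tracking this condition through changing source variances is what forces a truly separating solution rather than a mere whitening one.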


Several simulations have been performed to confirm the efficacy of the proposed method. For the following simulation with two sources and two sensors, the mixing matrices are:

A(0) = \begin{pmatrix} 1 & -.5 \\ .7 & 1.3 \end{pmatrix}, \quad A(1) = \begin{pmatrix} .35 & -.3 \\ -.2 & .6 \end{pmatrix}, \quad A(2) = \cdots \qquad (11)
where the first matrix represents zero lag, the second represents a lag of one, and the third represents a lag of two. The nonstationary sources, shown in Figure 1, are binary random signals multiplied by lowpass-filtered Gaussian signals, and may be considered a crude approximation to communications signals undergoing fading.

Figure 1: First 4000 samples of the nonstationary sources used in the simulation.

Three mixing scenarios are simulated by considering the cases of A as above, only the first two matrices of A, and only the first matrix of A (i.e. an instantaneous mixture). These mixtures are tested against our source separation algorithm with L values ranging from 1 to 4, resulting in 12 different simulations.

Our BSS algorithm was tested in these 12 simulations and SIRs were computed for each of these cases, as well as for the case where no source separation is applied². When our BSS algorithm is applied, output scaling is needed, as BSS can only recover sources up to an unknown scale value. Since the scaling changes over time as the algorithm adapts, the signal was normalized by an approximate best-fit scale factor every 100 samples. A length-10,000-sample period was evaluated after sufficient convergence (using small values of \mu) to obtain the resulting SIR values.

Table I shows the simulation results when only the first matrix in A is used for mixing, which results in purely instantaneous mixing. The results show excellent performance for all cases of L, but one feature is that performance degrades slightly with increasing L. The reason is that only L = 1 is needed to solve this problem, and by adding unneeded adaptable coefficients, performance suffers slightly due to misadjustment in the stochastic gradient algorithm for the non-instantaneous coefficients.

²In this case, the desired source signal is chosen according to which source is dominant in the mixture.

Table 1: Length-1 (Instantaneous) Mixture Results

BSS Type    SIR in dB (Source 1 / Source 2)
L = 1
L = 2
L = 3
L = 4

Table II shows the simulation results for length-2 mixing (i.e. only the first two matrices of A are applied). The results clearly show a performance degradation compared to the instantaneous mixture results, as the memory increases the difficulty of separation. It can be seen that the L = 1 case does a fairly poor job of signal separation, and increasing L results in better SIR values, as expected. Another observation is the imbalance of SIR performance between the two source signals; this is a function of the mixing filters.

Table 2: Length-2 Mixing Results

BSS Type    SIR in dB (Source 1 / Source 2)
L = 1
L = 2
L = 3
L = 4

Table III shows the simulation results for length-3 mixing using A as in (11). The results show even further degradation relative to the length-2 mixture case, as the increased mixing memory is more difficult to recover from. Again, performance improves with the demixing filter length L. Further gains could be obtained by using a larger L, but this comes at the expense of greater system complexity (proportional to L²) as well as much slower convergence.

Table 3: Length-3 Mixing Results

    BSS Type    SIR in dB
                Source 1    Source 2
    L = 1          -           -
    L = 2          -           -
    L = 3          -           -
    L = 4          -           -

CONCLUSIONS

Effective blind source separation can be achieved by exploiting the nonstationarity of the sources. Furthermore, the algorithm can separate convolutively mixed signals. This paper clearly shows that performance gains can be made over an instantaneous-mixture algorithm in the presence of convolutive mixtures.

Nonstationary blind source separation algorithms appear particularly relevant for practical applications because many sources of interest, such as speech or fading signals, exhibit nonstationarity but may not otherwise present the features (such as non-Gaussian statistics or distinct autocorrelation structures) required by other methods.

In comparison with other nonstationary blind source separation algorithms, the method proposed here results in a simple on-line stochastic gradient algorithm requiring only multiplications and additions, which are efficiently implemented in signal processing hardware. It appears to exhibit the traditional characteristics of LMS-like algorithms, including robustness, numerical stability, the ability to track slow variations in the environment, and relatively slow convergence.

The computational complexity of the algorithm is O(NM²L²): the cost is linear in the number of receivers but quadratic in the number of sources and in the demixing filter length. For many applications these parameters are small, and the algorithm is very efficient. For larger values of L, the computational cost may be the limiting factor in the tradeoff between performance and complexity.

REFERENCES
[1] J. R. Treichler and M. G. Larimore, “New processing techniques based on the constant modulus adaptive algorithm,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, pp. 420-431, April 1985.

[2] C. Jutten and J. Herault, “Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture,” Signal Processing, vol. 24, pp. 1-10, July 1991.

[3] P. Comon, C. Jutten, and J. Herault, “Blind separation of sources, part II: Problem statement,” Signal Processing, vol. 24, pp. 11-20, July 1991.

[4] E. Sorouchyari, “Blind separation of sources, part III: Stability analysis,” Signal Processing, vol. 24, pp. 21-29, July 1991.

[5] J.-F. Cardoso, “Iterative techniques for blind separation using only fourth-order cumulants,” in Signal Processing VI - Theories and Applications, Proceedings of EUSIPCO-92, Sixth European Signal Processing Conference, vol. 2, pp. 739-742, 1992.

[6] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and E. Moulines, “A blind source separation technique using second order statistics,” IEEE Transactions on Signal Processing, vol. 45, pp. 434-444, February 1997.

[7] A. Belouchrani and M. G. Amin, “Source separation based on the diagonalization of a combined set of spatial time-frequency distribution matrices,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-97, (Germany), April 1997.

[8] L. Parra, C. Spence, and B. de Vries, “Convolutive blind source separation based on multiple decorrelation,” in Proceedings of the 1998 IEEE Workshop on Neural Networks for Signal Processing, (Cambridge, UK), September 1998.

[9] D. L. Jones, “A new method for blind source separation of nonstationary signals,” in Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-99, (Phoenix, AZ, USA), March 1999.

[10] K. Matsuoka, M. Ohya, and M. Kawamoto, “A neural net for blind separation of nonstationary signals,” Neural Networks, vol. 8, no. 3, pp. 411-419, 1995.

[11] U. A. Lindgren and H. Broman, “Source separation using a criterion based on second-order statistics,” IEEE Transactions on Signal Processing, vol. 46, pp. 1837-1850, July 1998.

[12] H. L. N. Thi and C. Jutten, “Blind source separation for convolutive mixture,” Signal Processing, vol. 45, pp. 209-229, 1995.



A. Abdi, M. Kaveh

Dept. of Elec. and Comp. Eng., University of Minnesota
Minneapolis, Minnesota 55455, USA


For the analysis and design of adaptive antenna arrays in mobile fading channels, we need a model for the spatio-temporal correlation among the array elements. In this paper we propose a general spatio-temporal correlation function in which non-isotropic scattering is modeled by the von Mises distribution, an empirically verified model for a non-uniformly distributed angle of arrival. The proposed correlation function has a closed form and is suitable for both mathematical analysis and numerical calculation. The utility of the new correlation function is demonstrated by quantifying the effect of non-isotropic scattering on the performance of two antenna-array applications: multiuser multichannel detection and single-user diversity reception. Comparison of the proposed correlation model with published data in the literature shows the flexibility of the model in fitting real data.


In recent years the application of adaptive antenna arrays (smart antennas) to cellular systems has received much attention [1], since they can improve the coverage, quality, and capacity of such systems by combating interference, fading, and other
undesired  disturbances.  An  adaptive  array  can  be  defined  as  an 
adaptive  spatio-temporal  filter,  which  takes  advantage  of  both 
time-domain  and  space-domain  signal  characteristics.  Efficient 
joint  use  of  time-domain  and  space-domain  data  demands  a 
generalization  of  conventional  communication  theory  and  signal 
processing  techniques  to  spatial  and  temporal  communication 
theory  [2]  and  space-time  signal  processing  techniques  [3]. 
Needless  to  say,  new  spatio-temporal  channel  models  have  to  be 
developed  as  well.  Since  the  second-order  statistics  of  the 
channel  characterize  the  basic  structure  of  stochastic  mobile 
channels,  we  need  a  spatio-temporal  correlation  function  to  study 
the  basic  impact  of  the  random  channel  on  the  performance  of 
space-time  solutions,  including  the  adaptive  antenna  arrays. 

In this paper we present a flexible and versatile parametric correlation function for the mobile station (MS) (similar results can be obtained for the base station (BS) as well, as we see in Section 4). We do this by generalizing the spatio-temporal correlation function in [4], originally derived for an isotropic scattering scenario where the MS receives signals from all directions with equal probability, to the non-isotropic scattering case. Note that isotropic scattering at the MS corresponds to a uniform distribution for the angle of arrival (AOA) at the MS. However, empirical results have shown that, due to the structure of the mobile channel, the MS is likely to receive signals only from particular directions (see [5] and references therein). In other words, most often the MS experiences non-isotropic scattering, which results in a non-uniform distribution for the AOA at the MS. In [5] it has been shown that using the von Mises distribution for the AOA at the MS yields an easy-to-use, closed-form expression for the temporal (or equivalently, spatial) correlation function. This correlation function has exhibited a very good fit to measured data [5].

In the sequel we derive a new spatio-temporal correlation function where non-isotropic scattering is modeled by the von Mises distribution. To show the significant effect of non-isotropic scattering on the performance of smart antenna systems employing space-time data, we study the performance of an antenna-array multiuser detector equipped with a channel estimator, operating in a Rayleigh fading channel. As a simpler example where only spatial data are employed, we also investigate the impact of non-isotropic scattering on a multi-element receiver working as a maximal ratio combiner (MRC) in a Rayleigh fading channel. In both examples we show how the proposed spatio-temporal correlation function helps us quantify the effect of the fading channel on the performance of antenna arrays in the realistic scenario of non-isotropic scattering. The paper concludes with a comparison of the proposed correlation model with published correlation data, collected by a BS-mounted array.


Consider a linear uniformly-spaced antenna array shown in [4, Fig. 2], mounted on a MS. Let r_m(t) denote the complex envelope at the mth element from the left. Then the normalized correlation function between the complex envelopes of the mth and the nth antenna elements, defined by

φ_mn(τ) = E[r_m(t) r_n*(t + τ)] / E[|r_m(t)|²],

can be derived from [4]:

φ_mn(τ) = E[exp{j2πf_d τ cos(Θ − α) + j(m − n)2π(d/λ) cos Θ}],   (1)

where E denotes mathematical expectation, j = √−1, f_d is the maximum Doppler frequency, Θ stands for the AOA, α represents the direction of motion of the MS with respect to the horizontal axis (measured counterclockwise), d is the spacing between any two adjacent antenna elements, and λ is the wavelength.
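Equation (1) can be checked numerically: with a uniform AOA PDF (isotropic scattering) the expectation should equal the zero-order Bessel function J0(√(x² + y² + 2xy cos α)), the Lee form recovered later in the section. A stdlib-only sketch; the quadrature grid size and test values are arbitrary choices of ours, and J0 is evaluated through the identity J0(t) = I0(jt):

```python
import cmath
import math

def bessel_j0(t, terms=40):
    # J0(t) = I0(j*t): truncated power series, adequate for moderate |t|
    term, total = 1.0 + 0j, 1.0 + 0j
    zz = (1j * t) ** 2 / 4.0
    for k in range(1, terms):
        term *= zz / (k * k)
        total += term
    return total.real

def corr_uniform_aoa(x, y, alpha, n=4000):
    """Midpoint-rule evaluation of E[exp(j x cos(T - alpha) + j y cos T)]
    for T uniform on [-pi, pi): the isotropic-scattering case of (1)."""
    acc = 0.0 + 0j
    for k in range(n):
        t = -math.pi + (k + 0.5) * (2.0 * math.pi / n)
        acc += cmath.exp(1j * (x * math.cos(t - alpha) + y * math.cos(t)))
    return acc / n

diff = abs(corr_uniform_aoa(1.3, 0.8, 0.6)
           - bessel_j0(math.sqrt(1.3**2 + 0.8**2 + 2 * 1.3 * 0.8 * math.cos(0.6))))
print(diff < 1e-9)  # True
```

The midpoint rule is spectrally accurate for smooth periodic integrands, so a modest grid already reaches machine precision.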
Now we consider the von Mises probability density function (PDF) for the random variable Θ:

p_Θ(θ) = exp[κ cos(θ − θp)] / [2π I0(κ)],   θ ∈ [−π, π),   (2)

where I0(·) is the zero-order modified Bessel function, θp ∈ [−π, π) accounts for the mean direction of the AOA, and κ ≥ 0 controls the width of the AOA distribution [5]. For κ = 0 (isotropic scattering) we have p_Θ(θ) = 1/(2π), while for κ = ∞ (extremely non-isotropic scattering) we obtain p_Θ(θ) = δ(θ − θp), where δ(·) is the Dirac delta function. By calculating the expectation in (1) according to (2) we obtain:

φ_mn(τ) = I0(√(κ² − x² − y² − 2xy cos α + j2κ[x cos(α − θp) + y cos θp])) / I0(κ),   (3)

0-7803-5988-7/00/$10.00 © 2000 IEEE

where x = 2πf_d τ and y = 2π(m − n)d/λ. With κ = 0, (3) reduces to Lee's spatio-temporal correlation function J0(√(x² + y² + 2xy cos α)) in [4, Eqs. (42)-(43)] for isotropic scattering, where J0(·) is the zero-order Bessel function. For m = n = 1 (single antenna), Lee's result further simplifies to Clarke's classic temporal correlation function J0(x) [6, p. 40, Eq. (2.20)]. For a single antenna experiencing non-isotropic scattering and α = 0, (3) reduces to the temporal correlation function I0(√(κ² − x² + j2κx cos θp))/I0(κ) derived in [5, Eq. (2)] (this correlation function has shown a very good fit to measured data [5]).

In comparison with the existing spatial correlation functions for antenna arrays [7], our proposed model in (3) has the main advantage that it includes both space and time dimensions in a single mathematically tractable closed-form expression, flexible for fitting to array data, for studying the performance of various array-based techniques [8] for different applications in fading channels under the realistic assumption of non-isotropic scattering, for optimizing array configurations [9], etc.
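A direct way to evaluate (3) numerically, assuming only the stdlib (I0 is computed by its power series, adequate for moderate arguments; the function names are ours). Since I0 is even in its argument, the branch of the complex square root does not matter. For κ = 0 the expression must collapse to Lee's J0 form, which the test below checks:

```python
import cmath
import math

def bessel_i0(z, terms=40):
    """Zero-order modified Bessel function I0(z), complex z, power series."""
    term, total = 1.0 + 0j, 1.0 + 0j
    zz = z * z / 4.0
    for k in range(1, terms):
        term *= zz / (k * k)
        total += term
    return total

def phi_mn(tau, m, n, f_d, d_over_lam, alpha, theta_p, kappa):
    """Spatio-temporal correlation of eq. (3): elements m, n, lag tau."""
    x = 2.0 * math.pi * f_d * tau
    y = 2.0 * math.pi * (m - n) * d_over_lam
    arg = (kappa * kappa - x * x - y * y - 2.0 * x * y * math.cos(alpha)
           + 2j * kappa * (x * math.cos(alpha - theta_p) + y * math.cos(theta_p)))
    return bessel_i0(cmath.sqrt(arg)) / bessel_i0(kappa)

# kappa = 0 (isotropic): reduces to Lee's J0(sqrt(x^2 + y^2 + 2xy cos alpha))
r = phi_mn(tau=0.2, m=2, n=1, f_d=1.0, d_over_lam=0.2, alpha=0.9,
           theta_p=0.0, kappa=0.0)
```

The same routine covers Clarke's single-antenna case (m = n, κ = 0) and the non-isotropic single-antenna case of [5] by choosing the arguments accordingly.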


In this section we use the proposed model in (3) for two array-based applications. In the first one we need a spatio-temporal correlation function, while for the second one only a spatial correlation function is needed. In array applications, the need for a spatio-temporal correlation function also arises in conjunction with such important fading characteristics as the level crossing rate and average fade duration [4][10], which due to space limitations we do not address here.

3.1 Efficiency of Two Multiuser Multichannel Array Detectors

For code division multiple access (CDMA) signals, two array-based multiuser detection schemes with imperfect estimates of the fading channel were recently investigated in [11]: the decision-directed detector (more complex), which is optimum, and the decorrelating detector (less complex), which is suboptimum. In terms of asymptotic efficiency, it has been proven that the decision-directed detector is superior. However, the decorrelating detector is simpler to implement, so it is of interest to determine how different these two detectors are in terms of asymptotic efficiency. Here, by a simple example [12, p. 107 and p. 117], we show that the answer strongly depends on the mode of scattering, which affects the correlation function of the complex envelope in the fading channel.

Assume that the MS has a two-element antenna (M = 2), and there are two mobile users (K = 2) according to the configuration shown in Figs. 1 and 2 (θp,1 = 0, θp,2 = π). In Fig. 1 we have κ1 = κ2 = 0, where the MS receives scattered plane waves from all directions with equal probability, while in Fig. 2, where κ1 = κ2 = 10, the MS receives directional waves from two specific directions (the beamwidth in each direction is equal to BW = 2/√κ ≈ 36° [5]). Suppose the first user is the desired user, while the second one is the interfering user. The MS moves from left to right (α = 0), and the users travel at speeds such that the desired user has the maximum Doppler frequency f_d,1 = 0.1 Hz, while the interfering user has the maximum Doppler frequency f_d,2 = 0.05 Hz. Assume the correlation coefficient between the users' signature waveforms is ρ12 = 0.5, and the MS uses only the past two values (I = 2) of matched filter outputs and bit decisions for fading estimation and bit detection in the presence of Rayleigh fading and zero-mean additive white Gaussian noise with variance σ². Suppose both users have (equal) unit power, and define the signal-to-noise ratio (SNR) as γ = 1/σ². For d = 0.3λ and d = λ, the asymptotic efficiency of the desired user, η1, calculated using the equations given in [12], is plotted in Figs. 3 and 4 versus SNR, assuming κ1 = κ2 = 0 and κ1 = κ2 = 10. According to both figures, as κ increases (more directional reception), the efficiency of both detectors increases significantly (which is good news). However, the difference between the detectors' efficiencies increases as well, which implies that choosing the decorrelating detector, due to its lower complexity, introduces a significant loss in efficiency when we have non-isotropic scattering. Hence, we need to develop new suboptimum low-complexity detectors with efficiencies comparable to the optimum detector in channels with directional reception.

3.2 Average Bit Error Rate of a Single-User Multichannel Array Detector

Assume that in Figs. 1 and 2 we have user one only (K = 1), and θp = 0. Moreover, both the MS and the user are stationary (f_d = 0). The user sends data using the binary phase shift keying (BPSK) modulation scheme, and the MS is equipped with a two-branch (M = 2) maximal ratio combiner (MRC). The average bit error rate (BER) in this case is given by [13, Eq. (12)]:

P_b(γ) = (1/2){1 − (1/(2ρ))[(1 + ρ)√(γ(1 + ρ)/(1 + γ(1 + ρ))) − (1 − ρ)√(γ(1 − ρ)/(1 + γ(1 − ρ)))]},   (4)

where ρ = |φ12(0)|. In Figs. 5 and 6 we have plotted P_b(γ) versus γ for d = 0.3λ and d = λ, respectively. As we expect, the average BER increases as κ increases, because larger κ results in more correlation between the branches. Of course, a larger d can reduce the amount of correlation between the branches, resulting in a smaller average BER (compare Figs. 5 and 6).
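The two-branch MRC average BER above is easy to evaluate numerically. A sketch under an assumption on the exact form of [13, Eq. (12)], which is only partly legible here: we use the standard eigen-branch decomposition with effective branch SNRs γ(1 ± ρ), which matches the visible fragments. As a consistency check, letting ρ → 0 must recover the familiar independent two-branch MRC result:

```python
import math

def pb_mrc2(gamma, rho):
    """Average BPSK BER of a two-branch MRC over correlated Rayleigh
    branches; gamma is the per-branch average SNR, rho = |phi12(0)|.
    Assumes the eigen-branch form with effective SNRs gamma*(1 +/- rho)."""
    lam1, lam2 = gamma * (1.0 + rho), gamma * (1.0 - rho)
    mu = lambda lam: math.sqrt(lam / (1.0 + lam))
    return 0.5 * (1.0 - ((1.0 + rho) * mu(lam1)
                         - (1.0 - rho) * mu(lam2)) / (2.0 * rho))

print(round(pb_mrc2(10.0, 0.5), 5))  # 0.00203
```

More branch correlation (larger ρ) raises the BER and more SNR lowers it, in line with Figs. 5 and 6.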


Although the application of antenna arrays at both the MS and the BS is advantageous, in this section we focus on the BS, since the application of arrays at the BS is more common (practical constraints usually restrict the use of an array of antennas at a MS). For the statistical characterization of the narrow histograms of the AOA of waves impinging on the BS [14][15] (which give rise to the non-uniform distribution of power versus the azimuth angle [16]), three different PDF's have been used so far in the literature: cosine [17], Gaussian [18], and truncated uniform [19]. All these PDF's are considered primarily for studying the effect of a non-uniformly distributed AOA on the spatial correlation among the array elements at a BS. With an appropriate choice of parameters, these three PDF's can visually resemble the narrow histograms of the AOA at the BS (although the truncated uniform PDF is less likely to do so, because the empirical histograms are usually bell-shaped [14][15] and decay to zero less abruptly than a truncated uniform PDF). So, mathematical convenience seems to be the main concern in choosing a PDF for the AOA among empirically acceptable candidates. From this point of view, none of these three PDF's is able to provide a simple closed-form solution (in terms of known mathematical functions) for the correlation between the complex envelopes of the array elements (which is a basic quantity in array-related studies). For the Gaussian PDF only approximate results can be found [18][20], and for the truncated uniform PDF, closed-form results can be derived only for the inline and broadside cases [21] (the cosine PDF is less likely to yield a closed-form answer because of the special integral that has to be solved). On the other hand, as we see in the sequel, the von Mises PDF yields a simple and compact expression, given in (5), which is basically the same as (3). This makes the von Mises PDF a very suitable model.

Comparison of the Gaussian PDF with histograms of AOA data has shown reasonable agreement [15][22]. This is good empirical support for the von Mises PDF, because for large κ the PDF in (2) resembles a small-variance Gaussian PDF with mean θp and standard deviation 1/√κ [23, p. 60]. In fact, for any beamwidth (angle spread) smaller than 40° (which corresponds to κ > 8.2 according to the definition of beamwidth as BW = 2/√κ in [5]), the plots of the Gaussian and von Mises PDF's are indistinguishable (two typical standard deviations for the Gaussian PDF are 15° [22] and 6° [15], which correspond to κ = 14.6 and κ = 91.2, respectively). However, recall that the von Mises PDF is able to provide a general closed-form solution for the space-time correlation between the complex envelopes of the array elements, while the Gaussian PDF cannot.
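The quoted correspondences between the Gaussian spread and κ follow directly from κ ≈ 1/σ² (with σ in radians) and the beamwidth definition BW = 2/√κ; a two-line check reproduces the numbers in the text:

```python
import math

# von Mises concentration from a Gaussian angle-spread sigma (degrees):
kappa_from_sigma = lambda sigma_deg: 1.0 / math.radians(sigma_deg) ** 2
# ... and from the beamwidth definition BW = 2/sqrt(kappa):
kappa_from_bw = lambda bw_deg: (2.0 / math.radians(bw_deg)) ** 2

print(round(kappa_from_sigma(15.0), 1),  # 14.6, as quoted for sigma = 15 deg
      round(kappa_from_sigma(6.0), 1),   # 91.2, as quoted for sigma = 6 deg
      round(kappa_from_bw(40.0), 1))     # 8.2, the 40-degree beamwidth case
```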

Using exactly the same notation as [17], it is straightforward to show that for the linear uniformly-spaced antenna array at the BS in [17, Fig. 6] we have:

φ_mn(τ) = I0(√(κ² − x² − y² + 2xy cos γ + j2κ[x cos(γ − α) − y cos α])) / I0(κ),   (5)

provided that the AOA has a von Mises PDF with mean direction α ∈ [−π, π) and width control parameter κ ≥ 0. All of the parameters in (5) are the same as in (3), except for γ in (5), which represents the direction of motion of the MS with respect to the horizontal axis (measured counterclockwise), in place of α in (3) (the γ here should not be confused with the SNR symbol γ used in Section 3). The two sign changes in (5), in comparison with (3), come from the different ways of numbering the array elements: in [4, Fig. 2] the elements are numbered from left to right, while in [17, Fig. 6] they are numbered from right to left.

Now we compare our correlation model with the data published in [17], where the data are spatial cross-correlations between the squares of the envelopes of a two-element array mounted on a BS. We do this by considering two models for the AOA PDF at the BS: the simple model, with p_Θ(θ) = exp[κ cos(θ − α)]/[2π I0(κ)], and the composite model, with p_Θ(θ) = ε exp[κ cos(θ − α)]/[2π I0(κ)] + (1 − ε)/(2π), where 0 ≤ ε ≤ 1 indicates the amount of directional reception. The composite PDF reduces to the von Mises PDF for ε = 1, and simplifies to the uniform PDF for ε = 0. Consequently, the associated spatial correlation functions for a two-element array at a BS can be written as:

φ12(0) = I0(√(κ² − 4π²(d/λ)² + j4πκ(d/λ) cos α)) / I0(κ),   (6)

φ12(0) = ε I0(√(κ² − 4π²(d/λ)² + j4πκ(d/λ) cos α)) / I0(κ) + (1 − ε) J0(2πd/λ).   (7)
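The two fitted models can be evaluated with the same series-based I0. This is a sketch under our notation and an assumption about the composite correlation: we take it to be the ε-mixture of the von Mises term and the isotropic term J0(2πd/λ), consistent with the composite PDF reducing to the uniform PDF for ε = 0. For either ε = 0 or κ = 0 the result must fall back to J0(2πd/λ):

```python
import cmath
import math

def bessel_i0(z, terms=40):
    # truncated power series for I0(z), complex z, moderate |z|
    term, total = 1.0 + 0j, 1.0 + 0j
    zz = z * z / 4.0
    for k in range(1, terms):
        term *= zz / (k * k)
        total += term
    return total

def phi12(d_over_lam, alpha, kappa, eps=1.0):
    """Spatial correlation of a two-element BS array (composite model);
    eps = 1 gives the simple von Mises model."""
    von_mises = bessel_i0(cmath.sqrt(
        kappa * kappa - 4.0 * math.pi ** 2 * d_over_lam ** 2
        + 4j * math.pi * kappa * d_over_lam * math.cos(alpha))) / bessel_i0(kappa)
    uniform = bessel_i0(2j * math.pi * d_over_lam).real  # J0(2*pi*d/lambda)
    return eps * von_mises + (1.0 - eps) * uniform

print(abs(phi12(0.3, 0.7, 0.0, 1.0) - phi12(0.3, 0.7, 5.0, 0.0)) < 1e-9)  # True
```

Fitting |φ12(0)|² of this sketch to measured correlation-versus-spacing data by a grid search over κ (and ε) mirrors the nonlinear least-squares procedure described for Figs. 7-8.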
Figs. 7-8 show Lee's correlation data, plotted together with |φ12(0)|² calculated according to (6) and (7) for both models. For a given α (known a priori for each data set), the unknown κ for the simple model and the unknown pair (κ, ε) for the composite model are estimated by the nonlinear least squares method (implemented via a systematic numerical search technique). Based on these figures (and many others not shown due to space limitations), the von Mises PDF is able to account for the variations of the correlation versus antenna spacing with reasonable accuracy (compare our correlation plots with those drawn in [17] assuming the cosine PDF and in [21] using the truncated uniform PDF, both for the same data sets; interestingly, the correlation plots in [17] can also be considered as curves obtained from a Gaussian PDF, because for small BW the cosine PDF can be approximated by a Gaussian PDF [21]). Note that in Fig. 7 both models are similar (ε = 0.98), while in Fig. 8 the composite model shows a much better fit (ε = 0.74). In general the composite model was able to improve the fits obtained by the simple model, which is not surprising because it has the additional parameter ε. This is in agreement with the noise-like signal introduced in [17].

CONCLUSIONS
Space-time processing using antenna arrays over wireless mobile fading channels offers several advantages in cellular systems, such as mitigating fading, intersymbol interference, and cochannel interference. Efficient joint use of both the space and time dimensions calls for spatio-temporal channel models. As a basic channel model, we need a two-dimensional spatio-temporal correlation function among the random signals sensed by the array elements, to characterize the second-order dependence structure of the random channel in both space and time. In this paper we have proposed a flexible spatio-temporal correlation function for propagation scenarios with non-isotropic scattering (signal reception from specific directions). The non-uniform distribution of the angle of arrival, which characterizes the non-isotropic scattering, is modeled by the von Mises PDF, which has previously been shown to be successful in describing measured data. The proposed spatio-temporal correlation function is general enough to include important special cases such as Lee's spatio-temporal correlation function and Clarke's temporal correlation function, both derived for isotropic scattering. Moreover, its compact mathematical form facilitates analytical manipulation of array-based techniques and yields closed-form expressions for such important fading parameters as the spectral moments (successive derivatives of the correlation function). Based on two case studies (multiuser detection and diversity reception) and using the new spatio-temporal correlation function, we have shown that non-isotropic scattering (typical of many mobile channel scenarios) has a significant impact on the performance of array processors, and should be taken into account in the analysis and design of adaptive antenna arrays for mobile fading channels.

Theoretically, the new correlation function is applicable to both the MS and the BS. However, since practical restrictions limit the use of multiple antennas at a MS, the proposed correlation function seems to be of much more use at a BS. Therefore, the empirical justification of the new correlation function has been demonstrated by comparison with published data collected at a BS.

ACKNOWLEDGMENT
This  work  has  been  supported  in  part  by  the  National  Science 
Foundation,  under  the  Wireless  Initiative  Program,  Grant 
#9979443.  The  authors  appreciate  the  input  provided  by  Dr.  T. 
A.  Brown  at  Motorola  regarding  the  multiuser  multichannel 
detector  examples. 

REFERENCES
[1]  J.  H.  Winters,  “Smart  antennas  for  wireless  systems,”  IEEE  Pers. 
Commun.  Mag.,  vol.  5,  no.  1,  pp.  23-27,  1998. 

[2]  R.  Kohno,  “Spatial  and  temporal  communication  theory  using 
adaptive  antenna  array,”  IEEE  Pers.  Commun.  Mag.,  vol.  5,  no.  1, 
pp.  28-35,  1998. 

[3]  A.  J.  Paulraj  and  C.  B.  Papadias,  “Space-time  processing 
techniques  for  wireless  communications,”  IEEE  Signal  Processing 
Mag.,  vol.  14,  no.  6,  pp.  49-83,  1997. 

[4]  W.  C.  Y.  Lee,  “Level  crossing  rates  of  an  equal-gain  predetection 
diversity  combiner,”  IEEE  Trans.  Commun.  Technol.,  vol.  18,  pp. 
417-426,  1970. 

[5]  A.  Abdi,  H.  Allen  Barger,  and  M.  Kaveh,  “A  parametric  model  for 
the  distribution  of  the  angle  of  arrival  and  the  associated  correlation 
function  and  power  spectrum  at  the  mobile  station,”  submitted  to 
IEEE  Trans.  Vehic.  Technol.,  Sep.  1999. 

[6] G. L. Stuber, Principles of Mobile Communication. Boston, MA: Kluwer, 1996.

[7]  R.  B.  Ertel,  P.  Cardieri,  K.  W.  Sowerby,  T.  S.  Rappaport,  and  J.  H. 
Reed,  “Overview  of  spatial  channel  models  for  antenna  array 
communication  systems,”  IEEE  Pers.  Commun.  Mag.,  vol.  5,  no.  1, 
pp.  10-22,  1998. 

[8]  L.  C.  Godara,  “Applications  of  antenna  arrays  to  mobile 
communications.  Part  I:  Performance  improvement,  feasibility,  and 
system  considerations.  Part  II:  Beam-forming  and  direction-of- 
arrival  considerations,”  Proc.  IEEE,  vol.  85,  pp.  1031-1060  and  pp. 
1195-1245,  1997. 

[9]  W.  C.  Y.  Lee,  “A  study  of  the  antenna  array  configuration  of  an  M- 
branch  diversity  combining  mobile  radio  receiver,”  IEEE  Trans. 
Vehic.  Technol.,  vol.  20,  pp.  93-104,  1971. 

[10]  F.  Adachi,  M.  T.  Feeney,  and  J.  D.  Parsons,  “Effects  of  correlated 
fading  on  level  crossing  rates  and  average  fade  durations  with 
predetection  diversity  reception,”  IEE  Proc.  F,  Commun.,  Radar, 
Signal  Processing,  vol.  135,  pp.  11-17,  1988. 

[11] T. A. Brown and M. Kaveh, “Multiuser detection with antenna arrays in the presence of multipath fading,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Atlanta, GA, 1996.

[12] T. A. Brown, “The use of antenna arrays in the detection of code division multiple access signals,” Ph.D. Thesis, Dept. of Elec. Eng., University of Minnesota, Minneapolis, MN, June 1995.

[13]  S.  T.  Kim,  J.  H.  Yoo,  and  H.  K.  Park,  “A  spatially  and  temporally 
correlated  fading  model  for  array  antenna  applications,”  IEEE 
Trans.  Vehic.  Technol.,  vol.  48,  pp.  1899-1905, 1999. 

[14]  A.  Klein  and  W.  Mohr,  “A  statistical  wideband  mobile  radio 
channel  model  including  the  directions-of-arrival,”  in  Proc.  IEEE 
Int.  Symp.  Spread  Spectrum  Techniques  Applications,  Mainz, 
Germany,  1996,  pp.  102-106. 

[15]  K.  I.  Pedersen,  P.  E.  Mogensen,  and  B.  H.  Fleury,  “A  stochastic 
model  of  the  temporal  and  azimuthal  dispersion  seen  at  the  base 
station  in  outdoor  propagation  environments,”  IEEE  Trans.  Vehic. 
Technol.,  vol.  49,  pp.  437-447,  2000. 

[16]  P.  Pajusco,  “Experimental  characterization  of  D.O.A  at  the  base 
station  in  rural  and  urban  area,”  in  Proc.  IEEE  Vehic.  Technol. 
Conf.,  Ottawa,  ONT,  Canada,  1998,  pp.  993-997. 

[17] W. C. Y. Lee, “Effects on correlation between two mobile radio base-station antennas,” IEEE Trans. Commun., vol. 21, pp. 1214-1224, 1973.

[18]  F.  Adachi,  M.  T.  Feeney,  A.  G.  Williamson,  and  J.  D.  Parsons, 
“Crosscorrelation  between  the  envelopes  of  900  MHz  signals 
received  at  a  mobile  radio  base  station  site,”  IEE  Proc.  F, 
Commun.,  Radar,  Signal  Processing,  vol.  133,  pp.  506-512,  1986. 

[19]  J.  Salz  and  J.  H.  Winters,  “Effect  of  fading  correlation  on  adaptive 
arrays  in  digital  mobile  radio,”  IEEE  Trans.  Vehic.  Technol.,  vol. 
43,  pp.  1049-1057,  1994. 

[20]  T.  Trump  and  B.  Ottersten,  “Estimation  of  nominal  direction  of 
arrival  and  angular  spread  using  an  array  of  sensors,”  Signal 
Processing,  vol.  50,  pp.  57-69,  1996. 

[21]  M.  Kalkan  and  R.  H.  Clarke,  “Prediction  of  the  space-frequency 
correlation  function  for  base  station  diversity  reception,”  IEEE 
Trans.  Vehic.  Technol.,  vol.  46,  pp.  176-184,  1997. 

[22]  U.  Martin,  “Spatio-temporal  radio  channel  characteristics  in  urban 
macrocells,”  IEE  Proc.  Radar,  Sonar,  Navig.,  vol.  145,  pp.  42-49, 

[23]  K.  V.  Mardia,  Statistics  of  Directional  Data.  London:  Academic, 

Figure 1. Isotropic scattering in an open area (circles are

Figure 2. Non-isotropic scattering in a narrow street.

Figure 3. Asymptotic efficiency of two multiuser array

Figure 4. Asymptotic efficiency of two multiuser array

Figure 5. Bit error rate of BPSK with two-branch MRC.

Figure 6. Bit error rate of BPSK with two-branch MRC.

Figure 7. Correlation coefficient versus antenna spacing. Simple: BW = 0.5°, Composite: BW = 0.5°, £ = 0.98

Figure 8. Correlation coefficient versus antenna spacing. Simple: BW = 0.4°, Composite: BW = 0.2°, £ = 0.74

(Vertical axes: correlation coefficient log10(ρb(γ)) and asymptotic efficiency η1(γ); horizontal axes: SNR γ (dB).)



Ali  MANSOUR  and  Noboru  OHNISHI 

Bio-Mimetic  Control  Research  Center  (RIKEN), 

2271-130,  Anagahora,  Shimoshidami,  Moriyama-ku,  Nagoya  463  (JAPAN) 
email: mansour@nagoya.riken. and


For the blind separation of sources (BSS) problem (also known as independent component analysis (ICA)), it has been shown in many situations that adaptive subspace algorithms are very slow and require considerable computational effort. In a previous publication, we proposed a modified subspace algorithm for stationary signals, but that algorithm was limited to stationary signals and its convergence was not fast enough.

Here, we propose a batch subspace algorithm. The experimental study shows that this algorithm is very fast, but its performance is not sufficient to completely achieve the separation of the independent components of the signals. On the other hand, this algorithm can be used as a pre-processing stage to initialize other adaptive subspace algorithms.
Keywords: blind separation of sources, ICA, subspace methods, Lagrange method, Cholesky decomposition.


The blind separation of sources (BSS) problem [1] (or the Independent Component Analysis "ICA" problem [2]) is a recent and important problem in signal processing. In this problem, one should estimate the unknown input signals of an unknown channel (i.e. the sources) using only the output signals of that channel (i.e. the observed or mixed signals). The sources are assumed to be statistically independent of each other.

The BSS problem was first proposed in a biological context [3]. Nowadays, one can find this problem in many different situations: speech enhancement [4], separation of seismic signals [5], source separation applied to nuclear reactor monitoring [6], airport surveillance [7], noise removal from biomedical signals [8], etc.

Since 1985, many researchers have been interested in BSS [9, 10, 11, 12]. Most of the algorithms deal with a linear channel model: instantaneous mixtures (i.e. a memoryless channel) or convolutive mixtures (i.e. the channel effect can be modeled as a linear filter). The criteria of those algorithms were generally based on higher-order
statistics [13, 14, 15]. Recently, by using only second-order statistics, some subspace methods have been explored to blindly separate the sources in the case of convolutive mixtures [16, 17].

In previous works, we proposed two subspace approaches using LMS [18, 17] or a conjugate gradient algorithm [19] to minimize subspace criteria. Those criteria were derived from a generalization of the method proposed by Gesbert et al. [20] for blind identification1. To improve the convergence speed of our algorithms, we proposed a modified subspace algorithm for stationary signals [21]. But that algorithm was limited to stationary signals and its convergence was not fast enough. Here, we propose a new subspace algorithm, which improves the performance of our previous methods.


Let Y(n) denote the q x 1 mixture vector obtained from p unknown and statistically independent sources S(n), and let the q x p polynomial matrix H(z) = (h_ij(z)) denote the channel effect (see fig. 1). In this paper, we assume that the filters h_ij(z) are causal and finite impulse response (FIR) filters. Let us denote by M the highest degree2 of the filters h_ij(z). In this case, Y(n) can be written as:

Y(n) = \sum_{i=0}^{M} H(i) S(n - i),   (1)

where S(n - i) is the p x 1 source vector at time (n - i) and H(i) is the real q x p matrix corresponding to the filter matrix H(z) at time i.

Let Y_N(n) (resp. S_{M+N}(n)) denote the q(N + 1) x 1 (resp. (M + N + 1)p x 1) vector given by:

Y_N(n) = ( Y(n)^T, ..., Y(n - N)^T )^T,
S_{M+N}(n) = ( S(n)^T, ..., S(n - M - N)^T )^T.

1 In the identification problem, the authors generally assume that they have one source and that the source is an iid signal.
2 M is called the degree of the filter matrix H(z).

0-7803-5988-7/00/$  10.00  ©  2000  IEEE 


Subspace method

By using N > q observations of the mixture vector, we can formulate the model (1) in another form:

Y_N(n) = T_N(H) S_{M+N}(n),   (2)

where T_N(H) is the Sylvester matrix corresponding to H(z). The q(N + 1) x p(M + N + 1) matrix T_N(H) is given by [22] as:

T_N(H) =
[ H(0)  H(1)  ...  H(M)    0     ...   0
   0    H(0)  ...  H(M-1)  H(M)  ...   0
   .          .            .           .
   0     0    ...   0      H(0)  H(1)  ...  H(M) ]

It was proved in [23] that the rank of the Sylvester matrix T_N(H) is p(N + 1) + \sum_{i=1}^{p} M_i, where M_i is the degree of the ith column3 of H(z). Now, it is easy to prove that the Sylvester matrix has full rank and is left invertible if each column of the polynomial matrix H(z) has the same degree and N >= Mp (see [24] for more details). From equation (2), one can conclude that the separation of the sources can be achieved by estimating an (M + N + 1)p x q(N + 1) left inverse matrix G of the Sylvester matrix. To estimate G, one can use the criterion proposed in [17], obtained from a generalization of the criterion in [20]:

min_G C(G) = E ||(I  0) G Y_N(n) - (0  I) G Y_N(n + 1)||^2,   (3)

where E stands for the expectation, I is the identity matrix and 0 is a zero matrix of appropriate dimensions. It has been shown in [17] that the above minimization leads us to a matrix G* such that:

Perf = G* T_N(H) = diag(M, ..., M),   (4)

where M is any p x p matrix. Using the last equation, it becomes clear that the separation is reduced to the separation of an instantaneous mixture with a mixing matrix M. In other words, this algorithm can be decomposed into two steps: in the first step, by using only second-order statistics, we reduce the convolutive mixture problem to an instantaneous mixture (deconvolution step); in the second step, we must only separate the sources of a simple instantaneous mixture (typically, most instantaneous mixture algorithms are based on fourth-order statistics).

Finally, to avoid spurious solutions (i.e. a singular matrix M), one must minimize that criterion subject to a constraint [17]:

Subject to  G_0 R_N(n) G_0^T = I,   (5)

where R_N(n) = E[Y_N(n) Y_N^T(n)], and the p x q(N + 1) matrix G_0 stands for the first block row of G = (G_0^T ... G_{M+N}^T)^T. The minimization of the above criterion subject to this constraint using an LMS algorithm was discussed in our previous work [17]. In addition, the minimization of a modified version of the above criterion was done using a conjugate gradient algorithm [19].

From the above, it is clear that the minimization of the criterion (3) should be done subject to p^2 constraints4. Let const denote the constraint vector (i.e. const = Vec(G_0 R_N(n) G_0^T - I), where Vec is the operator that maps a matrix to the vector of its stacked entries). The minimization of the criterion (3) subject to the constraints (5) can be formulated using the Lagrange method as:

L(G, Λ) = C(G) - Λ const,   (6)

where Λ is a row vector which stands for the Lagrange multipliers. The minimization of the above equation with respect to Λ leads us back to the constraint equation (5). Using the derivative ∂C(G)/∂G given in [17] and the equations (5) and (6), one obtains

∂L(G, Λ)/∂G = diag(I_p, 2I_{(M+N-1)p}, I_p) G R_N(n)
  - [ 0  I_{(M+N)p} ; 0  0 ] G R_N^T(n + 1)
  - [ 0  0 ; I_{(M+N)p}  0 ] G R_N(n + 1)
  - 2 [ Λ' G_0 R_N(n) ; 0 ]

(here Λ' denotes the matrix form of the Lagrange multipliers Λ),


where R_N(n + 1) = E[Y_N(n) Y_N^T(n + 1)] and I_l is the l x l identity matrix. By setting the above derivative to zero and after some algebraic operations, one can find that the block rows

3 The degree of a column is defined as the highest degree of the filters in this column.

4 Using the symmetric form of the equation (5), one can decrease the number of constraints to p(p + 1)/2.


of the optimal G* should satisfy:

G_0 R_N(n) G_0^T = I,   (7)

2 G_i R_N(n) = G_{i+1} R_N^T(n + 1) + G_{i-1} R_N(n + 1),   (8)

G_{M+N} R_N(n) = G_{M+N-1} R_N(n + 1),   (9)

where 1 <= i <= M + N - 1. Let A = R_N^T(n + 1) R_N^{-1}(n) and B = R_N(n + 1) R_N^{-1}(n); we should mention that A and B exist if and only if (iff) R_N(n) is full rank5. Finally, using some algebraic operations, we can prove that the previous matrix equation system can be solved by a recursion formula:

G_{M+N-i} = G_{M+N-i-1} D_i,   (10)

where 0 <= i <= M + N - 1, and G_0 can be obtained from the first equation (7) using a simple Cholesky decomposition. In addition, the matrices D_i can be obtained by:

D_{i+1} = B (2I - D_i A)^{-1},   (11)

where 0 <= i < M + N - 1 and D_0 = B. Even if relationships (10) and (11) look complicated, the time needed to obtain the matrix G is still very short compared6 to the time needed for the convergence of the LMS version [17] or even the conjugate gradient version [21, 19].
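Relations (7)-(11) amount to a short closed-form procedure once the correlation matrices R_N(n) and R_N(n+1) have been estimated. The following is a minimal NumPy sketch under that assumption, with our own variable names; note that the Cholesky step below produces one particular G_0 satisfying (7) (any further orthogonal transformation of its rows also satisfies the constraint):

```python
import numpy as np

def batch_subspace_G(R0, R1, p, M, N):
    """Closed-form recursion (7)-(11) for the separation matrix G.
    R0 = R_N(n) and R1 = R_N(n+1) are q(N+1) x q(N+1) correlation matrices
    (assumed already estimated); returns G with block rows G_0, ..., G_{M+N}."""
    R0_inv = np.linalg.inv(R0)
    A = R1.T @ R0_inv                 # A = R_N^T(n+1) R_N^{-1}(n)
    B = R1 @ R0_inv                   # B = R_N(n+1) R_N^{-1}(n)
    I = np.eye(R0.shape[0])
    # Relation (11): D_0 = B, D_{i+1} = B (2I - D_i A)^{-1}
    D = [B]
    for _ in range(M + N - 1):
        D.append(B @ np.linalg.inv(2.0 * I - D[-1] @ A))
    # G_0 from the constraint (7), G_0 R_N(n) G_0^T = I, via Cholesky R0 = L L^T
    L = np.linalg.cholesky(R0)
    G0 = np.linalg.inv(L)[:p, :]
    # Relation (10): G_{M+N-i} = G_{M+N-i-1} D_i, applied from i = M+N-1 down to 0
    blocks = [G0]
    for i in range(M + N - 1, -1, -1):
        blocks.append(blocks[-1] @ D[i])
    return np.vstack(blocks)
```

The whole computation is a fixed number of matrix inversions and products, which is consistent with the paper's observation that the batch solution is far cheaper than iterating an LMS or conjugate gradient update to convergence.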


The experiments discussed here are conducted using two sources (p = 2) with uniform probability density function (pdf) and four sensors (q = 4), and the degree of H(z) is chosen as M = 4.

To show the performance of the subspace criterion, the matrix Perf = G* T_N(H) is plotted. On the other hand, we know that the deconvolution is achieved iff the matrix Perf is a block diagonal matrix, as shown in equation (4). Figure 2 shows the performance of the batch subspace algorithm discussed in this paper. It is clear from that figure that the first step of the algorithm (the deconvolution) was not satisfactorily achieved (Perf is not block diagonal as in equation (4)). This problem arises because the criterion (3) is a flat function around its minima (see figure 2).

Figure 3 shows the performance results and the criterion convergence of the LMS algorithm (first column), and the performance results and the criterion convergence of

5 It is easy to prove that R_N(n) is full rank if one adds some independent additive noise to the observed signals, because of the subspace assumption q > p. On the other hand, by using the criterion (3), one can prove the existence of some spurious minima if the model has additive noise (the demonstration is omitted here because of space limitations). However, the experimental study shows that one still obtains good results for a signal-to-noise ratio (SNR) of 20 dB. In our simulations, we added Gaussian noise with SNR >= 20 dB.

6 Indeed, using a C program on a Sun Ultra 30 Creator workstation, it takes a few minutes (less than 5) to obtain the matrix G, whereas the convergence of the conjugate gradient needs from 40 to 100 minutes and the LMS algorithm needs a few hours.

the same LMS algorithm, with the matrix G initialized using the result of the batch algorithm (second column). We should mention that the time needed to reach the minima by the initialized version was almost half the time needed by the non-initialized version. Figures 3 (c) and (d) show the criterion convergence (the stop condition was the limit of the sample number, i.e. 10000). The experimental studies show that the conjugate gradient version of the subspace algorithm converges faster and leads to better performance if that algorithm is initialized using the proposed batch algorithm (these results are omitted in this short paper).

The second step of the algorithm consists of the separation of a residual instantaneous mixture (corresponding to M, see equation (4)). This separation can be processed using any source separation algorithm applicable to instantaneous mixtures. Here, we chose the minimization of a cross-cumulant criterion using the Levenberg-Marquardt method [25]. Figure 4 shows the different signals (see figure 1). It is clear that the sources X and the estimated signals S are independent signals, that the vector Z, the output of the subspace criterion, corresponds to an instantaneous mixture, and that the observed vector Y corresponds to a convolutive mixture (see [26, 27]).

Finally, the estimation of the second- and higher-order statistics was done according to the method described in [28].


In this paper, we propose a batch algorithm for source separation in convolutive mixtures based on a subspace approach. This new algorithm requires, like the other subspace methods, that the number of sensors be larger than the number of sources. In addition, it allows the separation of convolutive mixtures of independent sources using mainly second-order statistics: only the separation of a simple instantaneous mixture, which generally needs higher-order statistics, remains to be conducted to achieve the full separation.

The experimental study shows that the present algorithm can be used to initialize an adaptive subspace algorithm. The initialized algorithms need less time to converge. These results were discussed in the case of two subspace algorithms which are based on LMS or on a conjugate gradient method. Finally, the subspace LMS algorithm and the conjugate gradient algorithm become more stable and faster if they are initialized using the present algorithm.


[1]  C.  Jutten  and  J.  Herault,  “Blind  separation  of  sources, 
Part  I:  An  adaptive  algorithm  based  on  a  neuromimetic 
architecture,”  Signal  Processing,  vol.  24,  no.  1,  pp.  1- 
10,  1991. 

[2]  P.  Comon,  “Independent  component  analysis,  a  new 
concept?,”  Signal  Processing,  vol.  36,  no.  3,  pp.  287- 
314,  April  1994. 


[3]  B.  Ans,  J.  C.  Gilhodes,  and  J.  Herault,  “Simulation  de 
reseaux  neuronaux  (sirene).  II.  hypothese  de  decodage 
du  message  de  mouvement  porte  par  les  afferences  fu- 
soriales  IA  et  II  par  un  mecanisme  de  plasticite  synap- 
tique,” C. R. Acad. Sci. Paris, vol. serie III, pp. 419-422, 1983.

[4]  L.  Nguyen  Thi  and  C.  Jutten,  “Blind  sources  separa¬ 
tion  for  convolutive  mixtures,”  Signal  Processing,  vol. 
45,  no.  2,  pp.  209-229,  1995. 

[5] N. Thirion, J. Mars, and J. L. Boelle, “Separation of seismic signals: A new concept based on a blind algorithm,” in Signal Processing VIII, Theories and Applications, Trieste, Italy, September 1996, pp. 85-88,

[6]  G.  D’urso  and  L.  Cai,  “Sources  separation  method 
applied  to  reactor  monitoring,”  in  Proc.  Workshop 
Athos  working  group,  Girona,  Spain,  June  1995. 

[7] E. Chaumette, P. Comon, and D. Muller, “Application of ICA to airport surveillance,” in HOS 93, South Lake Tahoe, California, 7-9 June 1993, pp. 210-214.

[8] A. Kardec Barros, A. Mansour, and N. Ohnishi, “Removing artifacts from ECG signals using independent component analysis,” Neurocomputing, vol. 22, pp. 173-186, 1999.

[9]  J.  F.  Cardoso  and  P.  Comon,  “Tensor-based  inde¬ 
pendent  component  analysis,”  in  Signal  Processing 
V,  Theories  and  Applications,  L.  Torres,  E.  Masgrau, 
and M. A. Lagunas, Eds., Barcelona, Spain, 1990, pp.
673-676,  Elsevier. 

[10]  S.  I.  Amari,  A.  Cichoki,  and  H.  H.  Yang,  “A  new  learn¬ 
ing  algorithm  for  blind  signal  separation,”  in  Neural 
Information Processing Systems 8, D. S. Touretzky et al., Eds., 1995, pp. 757-763.

[11]  O.  Macchi  and  E.  Moreau,  “Self-adaptive  source  sepa¬ 
ration  using  correlated  signals  and  cross-cumulants,” 
in  Proc.  Workshop  Athos  working  group,  Girona, 
Spain,  June  1995. 

[12]  A.  Mansour  and  C.  Jutten,  “A  direct  solution  for  blind 
separation of sources,” IEEE Trans. on Signal Processing, vol. 44, no. 3, pp. 746-748, March 1996.

[13]  M.  Gaeta  and  J.  L.  Lacoume,  “Sources  separation 
without  a  priori  knowledge:  the  maximum  likelihood 
solution,”  in  Signal  Processing  V,  Theories  and  Ap¬ 
plications,  L.  Torres,  E.  Masgrau,  and  M.  A.  Lagunas, 
Eds., Barcelona, Spain, 1994, pp. 621-624, Elsevier.

[14]  N.  Delfosse  and  P.  Loubaton,  “Adaptive  blind  sepa¬ 
ration  of  independent  sources:  A  deflation  approach,” 
Signal  Processing,  vol.  45,  no.  1,  pp.  59-83,  July  1995. 

[15]  A.  Mansour  and  C.  Jutten,  “Fourth  order  criteria  for 
blind separation of sources,” IEEE Trans. on Signal
Processing,  vol.  43,  no.  8,  pp.  2022-2025,  August  1995. 

[16]  A.  Gorokhov  and  P.  Loubaton,  “Subspace  based  tech¬ 
niques  for  second  order  blind  separation  of  convolutive 
mixtures  with  temporally  correlated  sources,”  IEEE 
Trans. on Circuits and Systems, vol. 44, pp. 813-820,
September  1997. 

[17]  A.  Mansour,  C.  Jutten,  and  P.  Loubaton,  “An  adap¬ 
tive  subspace  algorithm  for  blind  separation  of  inde¬ 
pendent sources in convolutive mixture,” IEEE Trans.
on  Signal  Processing,  vol.  48,  no.  2,  February  2000. 

[18]  A.  Mansour,  C.  Jutten,  and  P.  Loubaton,  “Subspace 
method  for  blind  separation  of  sources  and  for  a  convo¬ 
lutive  mixture  model,”  in  Signal  Processing  VIII,  The¬ 
ories and Applications, Trieste, Italy, September 1996,
pp.  2081-2084,  Elsevier. 

[19]  A.  Mansour,  A.  Kardec  Barros,  and  N.  Ohnishi,  “Sub¬ 
space  adaptive  algorithm  for  blind  separation  of  convo¬ 
lutive  mixtures  by  conjugate  gradient  method,”  in  The 
First  International  Conference  and  Exhibition  Digital 
Signal  Processing  (DSP ’98),  Moscow,  Russia,  June  30- 
July  3  1998,  pp.  I-252-I-260. 

[20]  D.  Gesbert,  P.  Duhamel,  and  S.  Mayrargue, 
“Subspace-based  adaptive  algorithms  for  the  blind 
equalization of multichannel FIR filters,” in Signal Pro¬
cessing  VII,  Theories  and  Applications,  M.J.J.  Holt, 
C.F.N.  Cowan,  P.M.  Grant,  and  W.A.  Sandham,  Eds., 
Edinburgh,  Scotland,  September  1994,  pp.  712-715, 

[21]  A.  Mansour  and  N.  Ohnishi,  “A  blind  separation 
algorithm  based  on  subspace  approach,”  in  IEEE- 
EURASIP  Workshop  on  Nonlinear  Signal  and  Image 
Processing  (NSIP’99),  Antalya,  Turkey,  June  20-23 
1999,  pp.  268-272. 

[22]  T.  Kailath,  Linear  systems,  Prentice  Hall,  1980. 

[23]  R.  Bitmead,  S.  Kung,  B.  D.  O.  Anderson,  and 
T. Kailath, “Greatest common divisors via generalized Sylvester and Bezout matrices,” IEEE Trans. on
Automatic  Control,  vol.  23,  no.  6,  pp.  1043-1047,  De¬ 
cember  1978. 

[24]  A.  Mansour,  C.  Jutten,  and  P.  Loubaton,  “Robustesse 
des  hypotheses  dans  une  methode  sous-espace  pour  la 
separation  de  sources,”  in  Actes  du  XVIeme  colloque 
GRETSI, Grenoble, France, September 1997, pp. 111-

[25]  A.  Mansour  and  N.  Ohnishi,  “Multichannel  blind  sep¬ 
aration  of  sources  algorithm  based  on  cross-cumulant 
and the Levenberg-Marquardt method,” IEEE Trans.
on  Signal  Processing,  vol.  47,  no.  11,  pp.  3172-3175, 
November  1999. 

[26] C. G. Puntonet, A. Mansour, and C. Jutten, “Ge¬
ometrical  algorithm  for  blind  separation  of  sources,” 
in  Actes  du  XVeme  colloque  GRETSI,  Juan-Les-Pins, 
France,  18-21  September  1995,  pp.  273-276. 

[27] A. Prieto, C. G. Puntonet, and B. Prieto, “A neural algorithm for blind separation of sources based on geometric properties,” Signal Processing, vol. 64, no. 3,
pp.  315-331,  1998. 

[28] A. Mansour, A. Kardec Barros, and N. Ohnishi, “Comparison among three estimators for high order statistics,” in Fifth International Conference on Neural Information Processing (ICONIP’98), Kitakyushu, Japan, 21-23 October 1998, pp. 899-902.



Pei-Jung  Chung*  Alex  B.  Gershman**  Johann  F.  Bohme* 

*  Department  of  Electrical  Engineering  and  Information  Science, 

Ruhr  University,  D-44780  Bochum,  Germany 
pjc,boehme@sth.ruhr-uni-bochum.de

**Department  of  Electrical  and  Computer  Engineering, 

McMaster  University,  Hamilton,  L8S  4K1  Ontario,  Canada 
gershman@ieee.org


We  apply  the  2-D  broadband  Maximum  Likelihood  (ML) 
and  interpolated  root-MUSIC  methods  to  estimate  the 
azimuth  and  velocity  parameters  of  teleseismic  events 
recorded  by  the  GERESS  array.  A  sequential  test  based 
on  Likelihood  Ratios  (LR’s)  is  developed  for  signal  de¬ 
tection.  Our  experimental  results  show  that  both  meth¬ 
ods  can  provide  reliable  estimates  of  signal  parameters. 
However,  ML  is  shown  to  have  better  estimation  accu¬ 
racy  and  robustness  than  interpolated  root-MUSIC  at 
the  expense  of  a  higher  computational  cost. 


The  ML  and  MUSIC  techniques  are  two  popular  meth¬ 
ods  in  array  processing.  Numerous  theoretical  and 
numerical studies have shown that ML outperforms MUSIC in scenarios with low Signal to Noise Ratios (SNR’s), a small number of samples, coherent signals, as well as closely spaced sources [1]. However, the enormously high computational cost of ML makes this statistically optimal approach in many cases less attractive than MUSIC. Therefore, a crucial issue is
how  to  choose  a  proper  algorithm  for  a  particular  ap¬ 
plication  to  achieve  sufficiently  high  performance  and 
acceptable  computational  complexity. 

In  the  present  work,  we  apply  broadband  ML  [2] 
and  2-D  interpolated  root-MUSIC  [3]  to  localization  of 
several  teleseismic  events  using  the  GERESS  array  real 
data.  A  sequential  test  procedure  based  on  LR’s  is  used 
to  detect  signals  within  the  observation  interval.  Due 

This  work  was  supported  by  the  German  Science  Foundation 
and  by  the  Natural  Sciences  and  Engineering  Research  Council 
(NSERC)  of  Canada. 

to  complicated  propagation  effects,  there  may  be  more 
than  one  signal  phase  arriving  at  the  same  time  from 
the  same  direction.  However,  different  signal  phases 
should  differ  in  their  velocities.  It  is  worth  noting  that 
the  ML  method  can  be  directly  applied  to  the  broad¬ 
band  Direction  Of  Arrival  (DOA)  estimation  problem. 
On  the  other  hand,  root-MUSIC  should  be  adapted  to 
the  broadband  setting,  for  example,  by  means  of  the 
so-called array interpolation technique [4], which allows us to
combine  the  information  from  different  frequencies  in  a 
coherent  way.  In  [3]  and  [6],  a  high-SNR  regional  man¬ 
made  seismic  event  was  analyzed  by  means  of  the  ML 
and  interpolated  root-MUSIC  techniques.  Both  meth¬ 
ods  provided  excellent  results  in  this  case.  Below,  we 
address  a  more  difficult  teleseismic  event  case,  which  is 
characterized  by  much  lower  SNR’s  and  more  compli¬ 
cated  propagation  phenomena  relative  to  the  regional 
event  case.  In  the  teleseismic  case,  signal  detection 
becomes  a  very  important  issue,  since  it  is  almost  im¬ 
possible  to  identify  weak  signals  in  seismograms  (for 
example, see Fig. 1, which displays a typical seismogram of a teleseismic event).

The  experimental  results  reported  in  the  present 
paper  demonstrate  that  in  the  teleseismic  case,  both 
ML  and  interpolated  root-MUSIC  may  be  successfully 
applied  to  source  localization.  ML  is  shown  to  have 
better performance and robustness than interpolated
root-MUSIC.  However,  the  latter  approach  enjoys  sim¬ 
pler  implementation. 


Let an array of N sensors receive M broadband signals from far-field sources. The array can be assumed to be two-dimensional (2-D), since the length of the vertical aperture of GERESS is much smaller than that of the horizontal one and is negligible compared to the seismic signal wavelength.
The array output x(t), sampled at discrete times t = 0, ..., T - 1, is short-time Fourier-transformed using the so-called Thomson multitaper technique [7]:

X_l(ω) = (1/\sqrt{T}) \sum_{t=0}^{T-1} w_l(t) x(t) e^{-jωt},   (l = 0, ..., L - 1),   (1)

where {w_l(t)}_{t=0,...,T-1} is the lth orthonormal window (taper).
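For a scalar sensor output, the tapered Fourier coefficients can be sketched as follows. Thomson's method uses DPSS (Slepian) tapers; to keep the sketch self-contained we substitute the sine-taper family, which is also orthonormal. This substitution and all names are ours, not the paper's:

```python
import numpy as np

def multitaper_coeffs(x, omega, L):
    """Tapered short-time Fourier coefficients X_l(omega) as in eq. (1),
    for a scalar series x of length T, using L orthonormal sine tapers
    (a stand-in for the DPSS tapers of Thomson's method)."""
    T = x.shape[0]
    t = np.arange(T)
    coeffs = []
    for l in range(L):
        # Sine taper: w_l(t) = sqrt(2/(T+1)) * sin(pi*(l+1)*(t+1)/(T+1))
        w = np.sqrt(2.0 / (T + 1)) * np.sin(np.pi * (l + 1) * (t + 1) / (T + 1))
        coeffs.append(np.sum(w * x * np.exp(-1j * omega * t)) / np.sqrt(T))
    return np.array(coeffs)
```

For the vector-valued array output x(t) of the paper, the same tapering is applied to each sensor channel.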

For sufficiently large T, the Fourier-transformed data can be approximately expressed as

X_l(ω) = H(ω) S_l(ω) + U_l(ω),   (2)

H(ω) = [d_1(ω), ..., d_M(ω)],   (3)

where X_l(ω) ∈ C^{N x 1}, H(ω) ∈ C^{N x M}, S_l(ω) ∈ C^{M x 1}, and U_l(ω) ∈ C^{N x 1} are the observation vector, the steering matrix, the vector of signal waveforms, and the vector of sensor noise, respectively. The steering vector d_m(ω) associated with the mth signal is given by

d_m(ω) = [e^{-jω ξ_m^T r_1}, ..., e^{-jω ξ_m^T r_N}]^T,   (4)

where r_n = (x_n, y_n)^T is the coordinate vector of the nth sensor. The slowness vector ξ_m is related to the source azimuth α_m and the respective velocity v_m as follows:

ξ_m = (1/v_m) [cos α_m, sin α_m]^T.   (5)

The signal waveforms S_l(ω_j), (l = 0, ..., L - 1; j = 1, ..., J) are assumed to be deterministic and unknown. From the asymptotic theory of the Fourier transform, it is well known that X_l(ω_j), (l = 0, ..., L - 1; j = 1, ..., J) are independent complex Gaussian distributed with mean H(ω_j) S_l(ω_j) and covariance matrix ν(ω_j) I, where ν(ω_j) is the sensor noise power at the frequency ω_j and I is the identity matrix [2]. The problem is to detect the signals and estimate their parameters {ξ_m}, m = 1, ..., M.
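Equations (4) and (5) are straightforward to compute. A small NumPy sketch (the sensor geometry and source parameters below are hypothetical, chosen only for illustration):

```python
import numpy as np

def slowness(azimuth_deg, velocity):
    """Slowness vector of eq. (5): xi = (1/v) [cos(alpha), sin(alpha)]^T."""
    a = np.deg2rad(azimuth_deg)
    return np.array([np.cos(a), np.sin(a)]) / velocity

def steering_vector(omega, xi, sensor_xy):
    """Steering vector of eq. (4): entries exp(-j*omega*xi^T r_n), one per sensor."""
    phases = omega * (sensor_xy @ xi)     # omega * xi^T r_n for each sensor n
    return np.exp(-1j * phases)

# Hypothetical 4-sensor planar geometry (km), source at azimuth 60 deg, 8 km/s.
r = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
d = steering_vector(2 * np.pi * 1.0, slowness(60.0, 8.0), r)
print(np.abs(d))   # unit-modulus entries
```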


Based on the independence and asymptotic Gaussianity in the frequency domain, the approximate wideband log-likelihood function can be expressed as [2]

L(η) = \sum_{j=1}^{J} log tr [{I - P(ω_j, η)} R_X(ω_j)],   (6)

where

η = [ξ_1^T, ..., ξ_M^T]^T   (7)

denotes the unknown slowness parameter vector, P(ω_j, η) is the projection matrix onto the column space of the steering matrix H(ω_j),

R_X(ω_j) = (1/L) \sum_{l=0}^{L-1} X_l(ω_j) X_l^H(ω_j)   (8)

is the sample spectral density matrix, and (·)^H denotes the Hermitian transpose. The ML estimate η_ML is obtained by minimizing (6) over η.
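The criterion (6) can be evaluated directly from the steering matrices and the sample spectral density matrices. A minimal sketch, assuming H(ω_j) at a candidate parameter vector η and R_X(ω_j) are already available; the minimization over η itself (e.g. by grid search or a gradient routine) is not shown:

```python
import numpy as np

def projector(Hm):
    """Orthogonal projector onto the column space of the steering matrix H(w)."""
    return Hm @ np.linalg.pinv(Hm)

def broadband_ml_objective(H_list, R_list):
    """Broadband ML criterion (6): sum over frequencies of
    log tr[(I - P(w_j, eta)) R_X(w_j)]; to be minimized over eta."""
    val = 0.0
    for Hm, R in zip(H_list, R_list):
        N = Hm.shape[0]
        P = projector(Hm)
        val += np.log(np.real(np.trace((np.eye(N) - P) @ R)))
    return val
```

For full-column-rank H, `pinv` gives (H^H H)^{-1} H^H, so `projector` is the usual P = H (H^H H)^{-1} H^H.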


In  this  section,  we  describe  the  2-D  extension  [3]  of  the 
wideband  interpolated  root-MUSIC  algorithm  [5]  that 
will  be  applied  for  joint  estimation  of  the  azimuth  and 
velocity  parameters  of  seismic  sources. 

Let  the  2-D  array  be  divided  into  two  subarrays  of 
Ns  sensors  each,  denoted  as  subarrays  (a)  and  (b),  re¬ 
spectively.  Since  the  outline  of  the  algorithm  is  similar 
for  each  subarray,  in  the  sequel  we  consider  only  the 
subarray  (a).  Its  observation  vector  can  be  modeled  as 

X_{l,a}(ω) = H_a(ω) S_l(ω) + U_{l,a}(ω).   (9)

This subarray will be used for interpolation of the set of J virtual ULA’s with the interelement spacings d_c ω_c / ω_j (j = 1, ..., J), where ω_c is the central frequency and d_c is the interelement spacing of the virtual ULA at ω_c. To obtain the same array manifold for each frequency, the interpolation matrices B_j can be designed in a regular way [4]. The coherently averaged covariance matrix can be obtained as
ance  matrix  can  be  obtained  as 

R_a = (1/J) \sum_{j=1}^{J} B_j^H R_a(ω_j) B_j,   (10)

where

R_a(ω_j) = (1/L) \sum_{l=0}^{L-1} X_{l,a}(ω_j) X_{l,a}^H(ω_j).   (11)

The noise covariance matrix after the coherent processing can be computed as

Q = (1/J) \sum_{j=1}^{J} \hat{ν}(ω_j) B_j^H B_j,   (12)

where \hat{ν}(ω_j) is some estimate of the sensor noise power at the frequency ω_j. The matrix

\tilde{R}_a = Q^{-1/2} R_a Q^{-1/2}   (13)


is  the  spectral  density  matrix  after  prewhitening.  The 
eigendecomposition  of  this  matrix  yields 

\tilde{R}_a = U_S Λ_S U_S^H + U_N Λ_N U_N^H,   (14)

where  the  matrices  Us  and  U /v  contain  the  signal-  and 
noise-subspace  eigenvectors,  respectively.  In  turn,  the 
diagonal matrices Λ_S and Λ_N contain the signal- and
noise-subspace  eigenvalues,  respectively. 
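Steps (10)-(14) can be sketched as follows, assuming the interpolation matrices B_j, the per-frequency subarray covariances, and the noise-power estimates are given (all names are ours):

```python
import numpy as np

def prewhitened_covariance(R_list, B_list, noise_pow):
    """Coherent averaging (10), focused noise covariance (12),
    prewhitening (13), and eigendecomposition (14)."""
    J = len(R_list)
    Ra = sum(B.conj().T @ R @ B for R, B in zip(R_list, B_list)) / J      # (10)
    Q = sum(v * (B.conj().T @ B) for v, B in zip(noise_pow, B_list)) / J  # (12)
    # Q^{-1/2} via eigendecomposition of the Hermitian matrix Q
    w, U = np.linalg.eigh(Q)
    Q_isqrt = U @ np.diag(1.0 / np.sqrt(w)) @ U.conj().T
    Rt = Q_isqrt @ Ra @ Q_isqrt                                           # (13)
    eigvals, eigvecs = np.linalg.eigh(Rt)                                 # (14)
    return Rt, eigvals, eigvecs
```

`eigh` returns eigenvalues in ascending order, so the noise-subspace eigenvectors are the columns of `eigvecs` associated with the smallest eigenvalues.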

The root-MUSIC polynomial has the form

D_a(z) = d^T(1/z) Q^{-1/2} U_N U_N^H Q^{-1/2} d(z),  (15)

where d(z) = [1, z^{-1}, ..., z^{-(N-1)}]^T. Let {z_{a,1}, ..., z_{a,M}} denote the M signal roots of (15), which are sorted based on their proximity to the unit circle. Similarly, we can find M signal roots {z_{b,1}, ..., z_{b,M}} for subarray (b). Combining the results from these two virtual subarrays, we can find M² candidate estimates of ξ by solving the system

Δx_a ξ_x + Δy_a ξ_y = arg z_{a,i} ,
Δx_b ξ_x + Δy_b ξ_y = arg z_{b,k}  (16)

for i, k = 1, ..., M, where Δx_a, Δy_a, Δx_b, and Δy_b define the interelement spacings of the virtual arrays (a) and (b), respectively. The final estimate of ξ is then obtained by selecting the M pairs (ξ_x, ξ_y) which correspond to the maximal values of the 2-D MUSIC spectral function. The estimates of azimuth and velocity can then be obtained from these M pairs using (5).
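The root-MUSIC steps above (eigendecomposition, polynomial rooting, root selection) can be sketched for a single virtual ULA as follows; the covariance here is simulated rather than interpolated from data, and the spatial frequencies 0.5 and 1.2 rad are arbitrary illustration values:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, T = 8, 2, 200                              # virtual ULA size, signals, snapshots
a = lambda mu: np.exp(1j * mu * np.arange(N))    # ULA steering vector

# Simulated array data for two sources at spatial frequencies 0.5 and 1.2 rad
mus = [0.5, 1.2]
A = np.column_stack([a(mu) for mu in mus])
S = (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))) / np.sqrt(2)
noise = 0.1 * (rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T))) / np.sqrt(2)
X = A @ S + noise
R = X @ X.conj().T / T

# Noise subspace from the N - M smallest eigenvalues, as in (14)
evals, U = np.linalg.eigh(R)                     # eigh returns ascending eigenvalues
C = U[:, :N - M] @ U[:, :N - M].conj().T

# Root-MUSIC polynomial coefficients: the z^k coefficient is the
# k-th diagonal sum of the noise-subspace projector C
coeffs = np.array([np.trace(C, offset=k) for k in range(N - 1, -N, -1)])
roots = np.roots(coeffs)
inside = roots[np.abs(roots) < 1]
signal_roots = inside[np.argsort(1 - np.abs(inside))][:M]   # closest to unit circle
mu_hat = np.sort(np.angle(signal_roots))                    # ~ [0.5, 1.2]
```

The roots come in reciprocal pairs, so keeping the roots just inside the unit circle and sorting by proximity to it mirrors the signal-root selection described above.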


In this section, we develop a sequential LR-based test for detecting the number of signals. Let m denote the hypothetical number of signals. In each step, the detection problem can be formulated as testing the hypothesis K_m against the alternative A_m:

K_m : m signals are present ,
A_m : more than m signals are present .

Starting from m = 0, this test should be performed stepwise and then stopped once the hypothesis K_m is accepted. Applying the LR principle, we obtain the following test statistic in the mth step [2]:

t_m = (1/J) Σ_{j=1}^{J} ln(1 + (n_1/n_2) F_m(ω_j)) ≷ t_α,  (17)



where

F_m(ω_j) = tr{[P_{m+1}(ω_j, ξ̂_ML) − P_m(ω_j, ξ̂_ML)] R̂_x(ω_j)} / tr{[I − P_{m+1}(ω_j, ξ̂_ML)] R̂_x(ω_j)},  (18)

n_1 = L(2m + 4),  n_2 = L(2N − 2),  (19)

and ξ̂_ML ∈ ℝ^{2m} is the ML estimate of the signal parameter vector. If t_m exceeds the test threshold t_α, the hypothesis will be rejected. The quantity calculated by F_m(ω) can be interpreted as an estimate of the increase in SNR when adding the (m+1)th signal. To be detected, the power of the (m+1)th signal must be sufficiently high compared to the noise power. Under the hypothesis K_m, the value F_m(ω_j) is approximately centrally F-distributed with the degrees of freedom n_1 and n_2. The threshold t_α is determined with good accuracy by the Cornish-Fisher expansion [8]-[9]. Note that the LR test can be easily implemented if the corresponding ML estimates are available.
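The stepwise decision logic can be sketched as follows. The per-bin statistics F_m(ω_j) must come from the ML fits of (18), which are not reproduced here; the threshold below is obtained by Monte Carlo from the null F-distribution, standing in for the Cornish-Fisher expansion of [8]-[9], and the illustration values are placeholders:

```python
import numpy as np
from scipy.stats import f as f_dist

def detect_num_signals(F_values, n1_of_m, n2, J, alpha=0.033, m_max=5):
    """Accept the first hypothetical m whose LR statistic stays below threshold."""
    mc = np.random.default_rng(0)
    for m in range(m_max + 1):
        n1 = n1_of_m(m)
        # test statistic (17): average of ln(1 + (n1/n2) F_m(omega_j)) over bins
        t_m = np.mean(np.log1p(n1 / n2 * np.asarray(F_values[m])))
        # Monte-Carlo threshold under K_m: F_m(omega_j) ~ F(n1, n2), independent bins
        sims = f_dist.rvs(n1, n2, size=(20000, J), random_state=mc)
        t_alpha = np.quantile(np.mean(np.log1p(n1 / n2 * sims), axis=1), 1 - alpha)
        if t_m < t_alpha:
            return m              # hypothesis K_m accepted
    return m_max

# Illustration: large statistics for m = 0, 1 (signals clearly present) and
# null-like statistics from m = 2 on, so the test should stop at m = 2
J, L, Nsens = 7, 3, 8
n2 = L * (2 * Nsens - 2)
n1_of_m = lambda m: L * (2 * m + 4)
F_values = {m: ([100.0] * J if m < 2 else [1.0] * J) for m in range(6)}
m_hat = detect_num_signals(F_values, n1_of_m, n2, J)    # -> 2
```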


In this section, we apply the developed techniques to real data processing. These data were recorded by the GERESS array located in the Bavarian Forest, Germany. Details about this array can be found in [10]. Two teleseismic events (earthquakes), which occurred on February 13, 1993 in the Eastern Mediterranean and on February 26, 1996 in the Middle East, respectively, were selected for our analysis. The latter event is contaminated by a smaller pre-shock, located about 37 km from the main event. More information about the selected events is collected in Table 1.

Array output was sampled with f_s = 40 Hz. For each data set, we used a sliding window with a length of 3.2 s and a shift of 0.5 s. A total of seven frequency bins between 0.9 and 3.1 Hz have been used. Two independent virtual ULA sets have been employed for the interpolated root-MUSIC algorithm with the central frequency f_c = 2.2 Hz. The spectral density matrix R̂_x(ω_j) has been estimated using L = 3 Thomson's windows, which roughly correspond to 3 independent snapshots. The sequential detection procedure kept the test level α = 0.033 constant in each step. Theoretical slowness values have been derived from the AK135 earth model [11].

The results obtained from the weak event analysis are shown in Figs. 1 and 2. Typical seismometer outputs are plotted in the first subplot of these figures. The second subplot shows the output of the LR-based detector, which was used in conjunction with both techniques to provide an adequate comparison. Apparently, the P-phases are detected with good time resolution, while the S-phases (traveling with lower velocity) are not detected at all. Some false alarms can be observed. The ML estimates of the back-azimuth and velocity are well concentrated around their theoretical values. The estimates obtained from 2-D interpolated


Table 1: Event List from NEIC. (Table entries, including the epicenter coordinates in deg N and deg E, are not legible in the scan.)

root-MUSIC show higher variances. Interestingly, both methods provide better results for the azimuth than for the velocity. The relatively poor performance of the velocity estimates may be explained by the limited aperture of GERESS.

In Figs. 3 and 4, another event is analyzed. It contains two seismic sources of moderate scale originating from the same location but at slightly different times (see Table 1). In this data set, a stronger event follows shortly after a weak event. Such a situation is of particular importance when monitoring nuclear explosions. Due to high SNR's, the signals can be correctly detected during the whole analysis interval. One signal is detected at about the 30th second, when waves from the first earthquake arrive at the array. At the 57th second, the LR test shows two signals, corresponding to the case when the superimposed waves from the first and second seismic sources both arrive at the array. During the period from the 300th to the 360th second (the so-called S-phases), similar detection results can be observed as well. The signals detected from the beginning of the analysis up to the 16th second could be interpreted as false alarms or another weak event. The estimates of the azimuth and velocity shown in subplots 3 and 4 illustrate that the ML technique has better robustness and lower variance than the 2-D interpolated root-MUSIC technique. Note that the performance of the latter method is not much better in the strong event case than in the weak event one, since the interpolation errors become more critical at high SNR's. As in the previous example, both methods show better azimuth estimation performance relative to that of velocity estimation.


We compared the performance of the wideband ML and interpolated root-MUSIC algorithms by processing weak and strong teleseismic events recorded by the GERESS array. Our results show that ML has better estimation accuracy and robustness relative to root-MUSIC. Another advantage of ML is that the application of the LR test for detecting the number of signals is straightforward. However, the enormous computational cost

Figure 1: Wideband ML, first event (GERESS data: 13.02.1993 03:43, 34.4N 24.8E, mb = 3.7, Crete; time 03:46:00 - 03:52:00 [sec]). "—": theoretical values for back-azimuth, "x": theoretical values for velocity.

Figure 2: Wideband interpolated root-MUSIC, first event (GERESS data: 13.02.1993 03:43, 34.4N 24.8E, mb = 3.7, Crete; time 03:46:00 - 03:52:00 [sec]). "—": theoretical values for back-azimuth, "x": theoretical values for velocity.


Figure 3: Wideband ML, second event (GERESS data: 26.02.1996 07:17, 28.7N 34.8E, mb = 5.0, Gulf of Aqaba; time 07:22:00 - 07:28:00 [sec]). "—": theoretical values for back-azimuth, "x": theoretical values for velocity of the main event, "*": theoretical values for velocity of the pre-shock.

Figure 4: Wideband interpolated root-MUSIC, second event (GERESS data: 26.02.1996 07:17, 28.7N 34.8E, mb = 5.0, Gulf of Aqaba; time 07:22:00 - 07:28:00 [sec]). "—": theoretical values for back-azimuth, "x": theoretical values for velocity of the main event, "*": theoretical values for velocity of the pre-shock.

associated  with  the  ML  technique  may  be  critical  in 
practical  applications. 


[1] J.F. Bohme, "Advances in spectrum analysis and array processing," in Array Processing, S. Haykin, Ed., Prentice Hall, pp. 1-63, 1991.

[2]  J.F.  Bohme,  “Statistical  array  signal  processing  of 
measured  sonar  and  seismic  data,”  in  Proc.  SPIE 
2563:  Advanced  Signal  Processing  Algorithms,  San 
Diego,  CA,  July  1995,  pp.  2-20. 

[3] D.V. Sidorovich and A.B. Gershman, "Two-dimensional wideband interpolated root-MUSIC applied to measured seismic data," IEEE Trans. Signal Processing, vol. 46, pp. 2263-2267, Aug. 1998.
[4] B. Friedlander, "The root-MUSIC algorithm for direction finding with interpolated arrays," Signal Processing, vol. 30, pp. 15-29, Jan. 1993.

[5]  B.  Friedlander  and  A.J.  Weiss,  “Direction  finding 
for  wideband  signals  using  an  interpolated  array,” 
IEEE  Trans.  Signal  Processing,  vol.  41,  pp.  1618- 
1634,  Apr.  1993. 

[6] D.V. Sidorovich, C.F. Mecklenbrauker, and J.F. Bohme, "Sequential test and parameter estimation for array processing of seismic data," in Proc. 8th IEEE Workshop Stat. Signal Array Processing, Corfu, Greece, June 1996, pp. 256-259.

[7] D.J. Thomson, "Spectrum estimation and harmonic analysis," Proc. IEEE, vol. 70, pp. 1055-1096, Sep. 1982.

[8] P. Hall, The Bootstrap and Edgeworth Expansion, Springer-Verlag, NY, 1992.

[9] C.F. Mecklenbrauker, P. Gerstoft, J.F. Bohme, and P.-J. Chung, "Hypothesis testing for geoacoustic environmental models using likelihood ratio," JASA, vol. 105, pp. 1738-1748, March 1999.

[10]  H.P.  Harjes,  “Design  and  siting  of  a  new  regional 
array  in  Central  Europe,”  Bull.  Seism.  Soc.  Am., 
vol.  80B,  pp.  1801-1817,  June  1990. 

[11] B. Kennett, E.R. Engdahl, and R. Buland, "Constraints on seismic velocities in the Earth from traveltimes," Geophys. J. Int., vol. 122, pp. 108-124, 1995.



Brian M. Sadler
Army Research Laboratory
Adelphi, MD 20783

Richard J. Kozick
Bucknell University
Lewisburg, PA 17837


Deterministic constrained Cramer-Rao bounds (CRBs) are developed for general linear forms in additive white Gaussian noise. The linear form describes a variety of array processing cases, including narrowband sources with a calibrated array, the uncalibrated array cases of instantaneous linear mixing and convolutive mixing, and space-time coding scenarios with multiple transmit and receive antennas. We employ the constrained CRB formulation of Stoica and Ng, allowing the incorporation of side information into the bounds. This provides a framework for a large variety of scenarios, including semi-blind, constant modulus, known moments or cumulants, and others. The CRBs establish bounds on blind estimation of sources using an uncalibrated array, and facilitate comparison of calibrated and uncalibrated arrays when side information is exploited.

Consider the additive noise linear model

x_t = H s_t + v_t,  t = 1, ..., N,  (1)

where x_t is l × 1 and H is l × k. The elements of the k × 1 signal vector will be denoted by s_t = [s_1(t), ..., s_k(t)]^T. We use the superscripts T, *, and H for transpose, conjugate, and conjugate transpose, respectively, with complex numbers denoted c = c̄ + jc̃. The noise v_t is assumed complex white Gaussian, with variance σ². The model (1) underlies many array processing and single-sensor scenarios.

In the narrowband calibrated array case (l sensors and k sources), H = A(θ)·α is of known parametric form with respect to the source bearings. Here A(θ) is the array manifold matrix, and α = diag(α_1, ..., α_k) contains complex constants α_i that model the channel attenuation for the ith source. Constrained bounds are developed for this case in [1, 2].

In this paper we are interested in the general case when H is unknown. This arises in the uncalibrated array cases of instantaneous linear mixing and convolutive mixing, and the space-time transmit diversity case with arrays for both transmission and reception. An uncalibrated array may have unknown sensor placement, phase mis-matching, and so on. In such cases blind methods may be used to separate and estimate source waveforms without estimating the source bearings. Performance bounds are not straightforward due to the lack of regularity in the Fisher information matrix (FIM) associated with (1) in the uncalibrated case.

We develop CRBs for these cases using the constrained CRB methodology of Gorman/Hero and Stoica/Ng [3]. The constraints arise due to side information such as constant modulus sources, constraints on the structure and elements of H, and semi-blind sources (some known signal values). Examples are given comparing calibrated and uncalibrated array CRBs. A space-time coding example is also presented.

Forming the lN × 1 supervector X = [x_1^T, ..., x_N^T]^T, we have X ~ CN(μ_X, Σ = σ² I_{lN×lN}), where

μ_X = E[X] = [μ_1^T, ..., μ_N^T]^T,  μ_t = H s_t.  (2)

Thus we have a multivariate complex normal process with deterministic time-varying mean H s_t. We define the data matrix and the columns of H as

S = [s_1, ..., s_N]_{k×N},  H = [h_1, ..., h_k].  (3)

We write the unknown deterministic parameters in a real vector of length 2lk + 2kN, given by

θ = [θ_H^T, θ_S^T]^T,  θ_S = [s̄_1^T, s̃_1^T, ..., s̄_N^T, s̃_N^T]^T.  (4)

Note that σ² decouples from the other parameters, and so it is omitted.

The FIM J for θ is obtained from

[J(θ)]_{ij} = (2/σ²) Re{ (∂μ_X^H/∂θ_i) (∂μ_X/∂θ_j) }.  (5)

Partitioning J, we write

J = [ J_H    J_HS
      J_SH   J_S ],  (6)

with elements described next. Define the 2k × 2k matrix

J_0 = [ H^H H     j H^H H
        −j H^H H  H^H H ],  (7)

then J_S is given by the block-diagonal 2kN × 2kN matrix

J_S = (2/σ²) Re{ diag(J_0, ..., J_0) },  (8)

where J_0 repeats N times.
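A quick numerical sketch of (7)-(8), with an arbitrary random H standing in for a real channel matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
l, k, Nt = 5, 2, 4            # sensors, sources, snapshots (placeholder sizes)
sigma2 = 1.0
H = rng.standard_normal((l, k)) + 1j * rng.standard_normal((l, k))

G = H.conj().T @ H            # H^H H
J0 = np.block([[G, 1j * G], [-1j * G, G]])          # eq. (7), 2k x 2k

# eq. (8): J_S = (2/sigma^2) Re{ diag(J0, ..., J0) }, J0 repeated N times
J_S = (2 / sigma2) * np.real(np.kron(np.eye(Nt), J0))
```

Because G is Hermitian, Re(G) is symmetric and Im(G) is antisymmetric, so the resulting real matrix J_S is symmetric, as a Fisher information matrix must be.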

0-7803-5988-7/00/$10.00 © 2000 IEEE


J_H may be written as

J_H = (2/σ²) Re{ P_B ⊗ I_{l×l} },

where ⊗ denotes the Kronecker product and P_B is the 2k × 2k matrix composed of the 2 × 2 blocks

[ [P]_{mn}    j[P]_{mn}
  −j[P]_{mn}  [P]_{mn} ],   m, n = 1, ..., k,

with [P]_{mn} = Σ_{t=1}^{N} s_m^*(t) s_n(t).

Next we consider the cross-terms in the FIM, J_HS and J_SH. It can be shown that J_HS is built from 2 × 2 blocks of the same pattern, with the elements [S]^*_{mn} in place of [P]_{mn} and combined with the columns of H, and that

J_HS = J_SH^T.  (13)


As noted, the FIM J is generally not invertible because the model parameters are not identifiable, and so no unbiased estimator for θ exists. However, it is possible to achieve identifiability, and then regularity of the FIM, by establishing constraints on θ. We establish K equality constraints on the elements of θ, where K < dim(θ). The constraints have the form f_i(θ) = 0 for i = 1, ..., K. Define a K × 1 constraint vector f(θ), and a corresponding K × M gradient matrix

F(θ) = ∂f(θ)/∂θ^T  (14)

with elements [F(θ)]_{i,m} = ∂f_i(θ)/∂[θ]_m. The gradient matrix F(θ) is assumed to have full row rank K for any θ satisfying the constraints f_1(θ), ..., f_K(θ). Then, the constrained CRB is obtained via (Theorem 1 of [3])

E[(θ̂ − θ)(θ̂ − θ)^T] ≥ U (U^T J U)^{-1} U^T.  (15)

J is the unconstrained FIM from (5), and U is an orthonormal basis for the null space of F(θ), i.e., FU = 0 and U^T U = I. Note that U is a function of the constraints only.
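The recipe in (14)-(15) is mechanical once J and F are available. A minimal sketch with a toy two-parameter problem (an invented singular "FIM" J and one equality constraint, not the array model itself) shows how the constraint restores an invertible bound:

```python
import numpy as np
from scipy.linalg import null_space

# Toy problem: theta = (a, b), but only a + b is observable, so the
# unconstrained FIM is singular and the unconstrained CRB does not exist
J = np.array([[1.0, 1.0],
              [1.0, 1.0]])

# One equality constraint f(theta) = a - b = 0 restores identifiability;
# its gradient matrix (14) has full row rank
F = np.array([[1.0, -1.0]])

U = null_space(F)                           # orthonormal basis: F U = 0, U^T U = I
crb = U @ np.linalg.inv(U.T @ J @ U) @ U.T  # constrained CRB, eq. (15)
# U = [1, 1]^T / sqrt(2), U^T J U = 2, so crb = 0.25 * ones((2, 2))
```

The sign ambiguity of the null-space basis does not matter, since U enters (15) symmetrically.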

Examples of source constraints of interest include constant modulus (CM) sources, known source cumulant or kurtosis, and semi-blind sources (some known source samples). Constraints may also be placed on H, such as limiting the norm of H. Together, sufficient constraints may be found to ensure information regularity. These provide CRBs on symbol estimation in blind source separation scenarios that exploit source features such as CM. We may also compare bounds on source estimation for both calibrated and uncalibrated arrays using the results of [1, 2], where we have established CRBs on bearing, symbol, and channel estimation for calibrated arrays with side information.


We use the constrained CRB formulation to gain insight into the following questions.

1. Which provides more accurate signal copy: an uncalibrated array (unknown H matrix in (1)) with CM signals, or a calibrated array (H = A(θ)·α) with unconstrained signals?

2. Algorithms for blind beamforming with uncalibrated arrays often exploit independence between the signals and non-Gaussianity as characterized by the kurtosis [4, 5, 6]. What is the relative value of these constraints when compared with the CM constraint for CM signals? Do the CRBs based on kurtosis constraints imply any difference in the separability of CM and QAM signals?

We generate observations x_1, ..., x_N in (1) using a complex narrowband array model in which H = A(θ)·α, where A(θ) = [a(θ_1), ..., a(θ_k)] is the array response matrix, θ = [θ_1, ..., θ_k]^T are the source angles of arrival (AOAs), a(θ_i) is the array manifold, and α = diag{α_1, ..., α_k} is a diagonal complex channel gain matrix. We consider a uniform linear array (ULA) with omnidirectional sensors and half-wavelength spacing, so the array manifold elements are [a(θ)]_m = exp[jπ(m − 1) sin θ], m = 1, ..., l.
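The model above can be sketched directly; the gain values placed in alpha below are arbitrary illustration choices, not the values used in the paper's simulations:

```python
import numpy as np

def ula_manifold(theta, l):
    """Half-wavelength ULA: [a(theta)]_m = exp(j*pi*(m-1)*sin(theta))."""
    return np.exp(1j * np.pi * np.arange(l) * np.sin(theta))

l, k = 5, 2
theta = np.deg2rad([0.0, 10.0])                  # AOAs measured from broadside
A = np.column_stack([ula_manifold(t, l) for t in theta])
alpha = np.diag([1.0 * np.exp(1j * 0.5), 0.8 * np.exp(-1j * 0.5)])  # illustrative gains
H = A @ alpha                                    # H = A(theta) * alpha
```

A broadside source (θ = 0) gives a flat phase response across the array, a quick sanity check on the manifold convention.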

Consider a particular ULA with l = 5 sensors and k = 2 sources with AOAs θ_1 = 0° and θ_2 varying from 1° to 30°, where the AOAs are measured with respect to the array broadside. The noise variance is σ² = 1, and the number of time samples is N = 100. The complex amplitudes α_1 and α_2 are generated with opposite phase shifts ∠α_1 = −∠α_2. The amplitudes |α_1| and |α_2| are chosen to achieve a desired sample SNR, defined as SNR_i = |α_i|² Ĉ_{21}(i)/σ², where the sample variance of signal i is Ĉ_{21}(i) = (1/N) Σ_{t=1}^{N} |s_i(t)|². SNR_1 is fixed at 10 dB, while SNR_2 is evaluated at 5, 10, and 15 dB. One beamwidth for the array is 23.6° at broadside.

3.1. Calibrated vs. uncalibrated arrays

The constrained CRB for a calibrated array in which H has the structure A(θ)·α is presented elsewhere [2]. Here we compare the calibrated array CRBs with the uncalibrated array CRBs outlined in the previous section, (5) and (15). The signal vectors s_1, ..., s_N are 8-PSK waveforms with unit modulus |s_i(t)| = 1 and phase rotation such that s_1 = [1, ..., 1]^T. For the case of an unconstrained mixing matrix H, it is known [7] that the CM signal constraint and the specified phase rotation are sufficient to uniquely identify H and the signal phases ∠s_i(t). For the case of a calibrated ULA, it is well known that the AOAs θ and the signals s_i(t) are identifiable with no signal constraints ("blind" signals).

Figure 1(a) contains the mean CRB on the signal phase parameters ∠s_i(2), ..., ∠s_i(N) for sources i = 1, 2 and various constraints on the structure of H and the signals s_t. Note that as the source spacing decreases to less than one beamwidth, the constraints of CM signals with an uncalibrated array (unknown H) potentially provide more accuracy in signal phase than a calibrated array with blind signals. Further, the o and x symbols are coincident on the plots. So for CM signals, a calibrated array provides negligible improvement in signal phase accuracy compared with an uncalibrated array that places no constraints on H. This example adds further testament to the well-known power of the CM signal constraint for signal separation.


3.2. Uncalibrated array and moment constraints

The following constraints on the signal moments are common in blind beamforming algorithms, e.g., [4]-[6]:

(1/N) S S^H = known matrix, typically I,  (16)

Ĉ_{20}(i) = (1/N) Σ_{t=1}^{N} s_i(t)² is known, i = 1, ..., k,  (17)

m̂_{42}(i) = (1/N) Σ_{t=1}^{N} |s_i(t)|⁴ is known, i = 1, ..., k.  (18)

These are sample moments and not expectations. Note that (16) expresses that the signals are uncorrelated, and the diagonal elements of (16) constrain the signal sample variances Ĉ_{21}(i) = 1. Then (16)-(18) imply that the signal sample kurtoses Ĉ_{42}(i) = m̂_{42}(i) − |Ĉ_{20}(i)|² − 2Ĉ_{21}(i)² are known. We will refer to (16)-(18) as "moment constraints," and we further assume that the first sample of each source signal s_i is known in order to obtain an invertible constrained FIM. We consider two types of signals: both source signals are 8-PSK (CM), and both source signals are 64-QAM.
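The sample moments in (16)-(18) and the implied kurtosis are straightforward to compute; a sketch with randomly drawn 8-PSK and 64-QAM sequences (the constellation construction and normalization below are illustration choices):

```python
import numpy as np

rng = np.random.default_rng(4)
Nsym = 100

# 8-PSK (constant modulus) and 64-QAM sequences with unit sample variance
psk = np.exp(1j * 2 * np.pi * rng.integers(0, 8, Nsym) / 8)
grid = np.arange(8) - 3.5                        # 64-QAM coordinate grid
qam = rng.choice(grid, Nsym) + 1j * rng.choice(grid, Nsym)
qam = qam / np.sqrt(np.mean(np.abs(qam) ** 2))   # normalize sample variance to 1

def sample_moments(s):
    C20 = np.mean(s ** 2)                        # eq. (17)
    C21 = np.mean(np.abs(s) ** 2)                # diagonal of eq. (16)
    m42 = np.mean(np.abs(s) ** 4)                # eq. (18)
    C42 = m42 - np.abs(C20) ** 2 - 2 * C21 ** 2  # implied sample kurtosis
    return C21, m42, C42

C21p, m42p, C42p = sample_moments(psk)           # CM: C21 = m42 = 1 exactly
C21q, m42q, C42q = sample_moments(qam)           # QAM: larger fourth moment
```

For a CM sequence |s(t)| = 1 forces Ĉ_{21} = m̂_{42} = 1 exactly, so the CM constraint is strictly stronger than fixing these sample moments.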

Figures 1(b)-(d) contain constrained CRBs for this scenario. For the CM signals, we have also included on the plots the CRBs based on the CM signal constraints |s_i(t)| = 1, t = 2, ..., N, i = 1, ..., k. The CM signal constraints are exploited by some blind beamforming algorithms, e.g., ACMA [7].

Figure 1(b) contains mean CRBs for the elements of the H matrix. In the bottom panel, in which source 2 is strong (SNR_2 = 15 dB), the moment constraints and the CM constraints yield about the same CRBs for most values of θ_2. In difficult scenarios where the sources become very closely spaced (less than 10°), the CM signal constraint becomes more informative than the moment constraints. Similar behavior is exhibited in the top panel of Figure 1(b): source 2 is weaker (SNR_2 = 5 dB), so the CM constraints are more informative than the moment constraints over a larger range of AOA spacings. Note also that if only moment constraints are used, QAM signals provide lower CRBs on H than CM signals for this case.

Mean CRBs for estimation of the signals s_2, ..., s_N are shown in Figures 1(c) and (d). Source 2 is weaker in Figure 1(c) than in Figure 1(d), and we have also included the CRBs for signal estimation when the H matrix is known perfectly (marked with boxes) but no signal constraints are applied (the blind, calibrated case). In difficult situations of low SNR and closely-spaced sources, exploiting the CM property provides the potential for better performance compared with the moment constraints. Note that the CRBs for signal moment constraints and unconstrained H are approximately equal to the CRBs for known mixing matrix H and unconstrained signals, which is similar to our observations about calibrated vs. uncalibrated arrays in Section 3.1.


Space-time coding employs multiple antennas on transmit and receive [8]. In the flat fading case the model of (1) arises with k transmit and l receive antennas, where s_t is the k × 1 code vector transmitted by the k antennas at time t, and [H]_{ij} is the complex fading channel gain from the jth transmit antenna to the ith receive antenna. The independent Rayleigh fading model corresponds to the [H]_{ij} being independent, complex, Gaussian random variables with zero mean and unit variance. Suppose that the signal constellation is assumed to have average energy equal to one, and let E_s denote the total energy transmitted from all k antennas per symbol. Then we use √(E_s/k)·H in the model (1), yielding an average SNR per receive antenna equal to E_s/σ² for independent, flat, Rayleigh fading channels.

The model (1) assumes that the fading coefficients [H]_{ij} are constant over the block of N symbol times. The constrained CRBs developed in this paper assume that H·s_t in (1) is deterministic, so constrained CRBs may be computed for a particular realization of the fading matrix H. In the example presented next, we average the CRBs from multiple independent realizations of H to investigate the diversity gain that results from various constraints.
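The fading model and the per-realization averaging can be sketched as follows; the quantity averaged here is just the channel's scaled Frobenius-norm power (to show the mechanics), not the CRB itself:

```python
import numpy as np

rng = np.random.default_rng(6)
k, l = 2, 2
Es, sigma2 = 10.0, 1.0            # average SNR per receive antenna = Es / sigma2

def rayleigh_H(rng, l, k):
    """[H]_ij i.i.d. complex Gaussian, zero mean, unit variance."""
    return (rng.standard_normal((l, k)) + 1j * rng.standard_normal((l, k))) / np.sqrt(2)

# Average a per-realization quantity over independent fading draws, as is done
# for the constrained CRBs in the example below
gains = [np.linalg.norm(np.sqrt(Es / k) * rayleigh_H(rng, l, k), 'fro') ** 2
         for _ in range(500)]
mean_gain = np.mean(gains)        # E ||sqrt(Es/k) H||_F^2 = Es * l
```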

4.1. Constraints

As an example, consider the two-transmit-antenna space-time coding scheme in [9]. The code in [9] for k = 2 transmitters can be expressed via the signal constraints

s_{t+1} = P s_t^*,  t = 1, 3, ..., N − 1 (N even),  (19)

where

P = [ 0  −1
      1   0 ],

so a total of two complex symbols is encoded in s_t and s_{t+1}. Sampling at the symbol rate is assumed, and this encoding leads to a simple linear receiver structure for maximum likelihood (ML) symbol detection. The ML detector requires knowledge of the channel matrix H, and training samples are suggested in [9] for estimation of H. We investigate bounds on estimation of the signals s_t in the space-time coding context with T < N training symbols (semi-blind), the code (19), and other constraints including CM signals and a known H matrix.
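A sketch of the code constraint (19): drawing 8-PSK symbols for the odd slots, generating the even slots via P, and checking the pairwise orthogonality that underlies the simple linear ML receiver:

```python
import numpy as np

rng = np.random.default_rng(5)
k, Nsym = 2, 8                    # two transmit antennas, N even

P = np.array([[0, -1],
              [1,  0]])

# Impose the code (19): draw 8-PSK symbols at odd slots, set s_{t+1} = P s_t^*
S = np.zeros((k, Nsym), dtype=complex)
for t in range(0, Nsym, 2):
    S[:, t] = np.exp(1j * 2 * np.pi * rng.integers(0, 8, k) / 8)
    S[:, t + 1] = P @ S[:, t].conj()
```

For s_t = [a, b]^T the code gives s_{t+1} = [−b^*, a^*]^T, so each pair [s_t, s_{t+1}] forms an orthogonal (Alamouti) block.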

Suppose that the first T symbols s_1, ..., s_T transmitted from both antennas are known, and assume that T < N with T and N even. Then the gradient matrix (14) corresponding to the T training symbols (semi-blind) and the space-time code (19) for samples T + 1, ..., N has the form

F_s = [ 0_{k(T+N)×2lk}   diag(I_{2kT}, F_0, ..., F_0) ],  (20)

where F_0 repeats (N − T)/2 times and equals

F_0 = [ P        0_{2×2}
        0_{2×2}  −P ].  (21)


The constraints characterized by (21) will be denoted 'SEMI-BLIND & S-T CODE' in the example below. We also consider other combinations of constraints. 'SEMI-BLIND' includes training symbols s_1, ..., s_T that could be used to jointly estimate H and the unknown signals s_{T+1}, ..., s_N, but the space-time code is not exploited. We can apply the



Figure 1: Source 1 bearing is fixed at θ_1 = 0°, source 2 bearing is varied on [1°, 30°]. (a) Uncalibrated vs. calibrated arrays: CRB on signal phase estimation for 8-PSK signals. (b) Mean CRB on elements of H matrix for 8-PSK (CM) signals and 64-QAM signals for various constraints. (c)-(d): Mean CRB for signals with (c) SNR_1 = 10 dB and SNR_2 = 5 dB and (d) SNR_2 = 15 dB.


constraint that the N − T unknown signals are CM, i.e., |s_i(t)| = 1, i = 1, ..., k, t = T + 1, ..., N. We can also apply the constraint of a known H matrix, which provides a basis for evaluating the effectiveness of the T training symbols for estimation of H.

4.2. Example

Consider an example with k = 2 transmit antennas, l = 2 receive antennas, independent Rayleigh fading, and N = 50 time samples with T = 2 training symbols. The fading is assumed to be constant over the block of N symbol times. The SNR per receive antenna is varied over the range 0 to 20 dB, and the constrained CRBs are averaged over 500 independent fading matrices H for each SNR value. The signals are 8-PSK, and the transmitted signals satisfy the space-time code constraint (19). For each realization of H, we compute CRBs on the signal phases ∠s_{T+1}, ..., ∠s_N subject to various constraints, and these CRBs are averaged to obtain mean CRBs for the realization.

Figure 2 contains constrained CRBs on signal phase estimation for various constraints. The space-time code structure (19) is present in the transmitted signals, but it is only enforced in the constraints labeled 'S-T CODE'. When the space-time code is not applied, the CRB corresponds to independent estimation of the transmitted sequences s_1(T + 1), ..., s_1(N) and s_2(T + 1), ..., s_2(N), so diversity gain is impossible. We make the following observations from Figure 2.

• Comparing 'KNOWN H' with 'KNOWN H & S-T CODE' shows a potential diversity gain of approximately 10 dB in SNR provided by the space-time code when H is known exactly.

• Comparing 'SEMI-BLIND & S-T CODE' with 'KNOWN H & S-T CODE' shows that using T = 2 training symbols for estimation of H costs approximately 3 dB in SNR compared with exact knowledge of H.

• The 'SEMI-BLIND & CM & S-T CODE' curve shows that exploiting CM in addition to the training and the space-time code potentially yields about a 1.5 dB gain in SNR.

• For the cases in which the space-time code constraint is not exploited, the 'SEMI-BLIND & CM' constraint provides approximately 2 dB gain compared with 'KNOWN H', which does not exploit the CM property.

Note that the constrained CRBs on ∠s_t pertain to estimation of the signals, while the primary quantity of interest in digital communication is the probability of detection error. Smaller CRBs suggest the potential for reduced probability of detection error in practical receivers.


[1] B.M. Sadler, R.J. Kozick, T. Moore, "Bounds on constant modulus and semi-blind array processing," Proc. CISS'2000, March 2000.


Figure 2: CRBs for signal phase estimation in the space-time coding scenario with k = 2 transmitters, l = 2 receivers, N = 50 time samples, and independent Rayleigh fading, with various constraints.

[2] B.M. Sadler, R.J. Kozick, T. Moore, "Performance bounds on bearing and symbol estimation for communication signals with side information," Proc. ICASSP 2000, June 2000.

[3]  P.  Stoica,  B.  C.  Ng,  “On  the  Cramer-Rao  bound  under 
parametric  constraints,”  IEEE  Sig.  Proc.  Letters,  vol. 
5,  no.  7,  pp.  177-179,  July  1998. 

[4] J.F. Cardoso and A. Souloumiac, "Blind beamforming for non-Gaussian signals," IEE Proc. F, vol. 140, no. 6, pp. 362-370, Dec. 1993.

[5] P. Comon, "Independent component analysis, a new concept?", Signal Processing, vol. 36, pp. 287-314, 1994.

[6] J. Sheinvald, "On blind beamforming for multiple non-Gaussian signals and the constant-modulus algorithm," IEEE Trans. Signal Processing, vol. 46, no. 7, pp. 1878-1885, July 1998.

[7]  A.-J.  van  der  Veen,  A.  Paulraj,  “An  analytical  constant 
modulus  algorithm,”  IEEE  Trans.  Signal  Processing, 
vol.  44,  no.  5,  pp.  1136-1155,  May  1996. 

[8] A.F. Naguib, N. Seshadri, A.R. Calderbank, "Increasing data rate over wireless channels," IEEE Sig. Proc. Mag., May 2000.

[9] S.M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE J. on Selected Areas in Comm., Oct. 1998.



Marius Pesavento  Alex B. Gershman

Department of ECE, McMaster University
Hamilton, Ontario, L8S 4K1 Canada
gershman@ieee.org


We address the problem of estimating the Directions Of Arrival (DOA's) of multiple sources observed against a background of nonuniform white noise with an arbitrary unknown diagonal covariance matrix. A new deterministic Maximum Likelihood (ML) DOA estimator is derived. Its implementation is based on an iterative procedure that includes stepwise concentration of the Log-Likelihood (LL) function with respect to the signal and noise nuisance parameters, and requires only a few iterations to converge.

New closed-form expressions for the deterministic and stochastic direction estimation Cramer-Rao bounds (CRB's) are derived for the considered nonuniform model. Our expressions can be viewed as an extension of the well-known results by Stoica and Nehorai, and Weiss and Friedlander, to a more general noise model than the commonly used uniform one. Simulation and experimental (seismic data processing) results illustrate the performance of the estimator and validate our theoretical analysis.


ML DOA estimation techniques are known to have excellent asymptotic and threshold performances [1], [2]. The key assumption used for the derivation of both the deterministic and stochastic ML estimators is the so-called uniform white noise assumption [1]. According to it, the sensor noises are presumed to form a zero-mean Gaussian process with the covariance matrix σ²I, where σ² is the unknown noise variance and I is the identity matrix. This simple assumption makes it possible to concentrate the resulting LL function with respect to both signal waveform and noise nuisance parameters and, therefore, to reduce the dimension of the parameter space and the associated computational burden [1].

However, the uniform noise assumption may be unrealistic in certain applications [3]-[6], where the noise environment remains unknown or changes slowly with time. In the general case, the sensor noise should be considered an unknown colored (i.e., spatially dependent) process. Recently, several advanced ML techniques have been proposed which exploit the ideas of colored noise modeling [6]-[8].

In  some  practical  applications  (for  example,  when  the 
so-called  sparse  arrays  are  used),  the  general  colored  noise 

This  work  was  supported  by  the  Natural  Sciences  and  Engi¬ 
neering  Research  Council  (NSERC)  of  Canada. 

0-7803-5988-7/00/$  10.00  ©  2000  IEEE 

assumption can be simplified by assuming the sensor noise to be spatially white [4], [5]. In this case, the noise spatial covariance structure can still be represented by a diagonal matrix, but the sensor noise variances are no longer identical to one another. Such a noise model becomes relevant in situations with hardware nonidealities in the receiving channels [9], as well as for sparse arrays with prevailing external noise (for example, reverberation noise in sonar or external seismic noise) [4], [5].

It is important to stress that if the sensor noise is a spatially nonuniform white process, neither the conventional "uniform" ML methods [1]-[2] nor the colored noise modeling ML techniques [6]-[8] may be expected to give satisfactory results, because the former methods will mismodel the noise, whereas the latter techniques will ignore the important prior knowledge that the noise process is spatially white. This appears to be a strong motivation to develop direction finding techniques for the nonuniform white noise case. Moreover, the majority of the ML colored noise modeling based approaches developed so far are unable to concentrate the LL function with respect to the noise parameters [7]. As a result, such techniques may be computationally demanding. The use of the nonuniform white noise model can be expected to overcome this drawback by means of obtaining "concentrated" solutions to the ML estimation problem.

The motivation given shows that the nonuniform white noise case can be viewed as a practically important generalization of the simpler uniform model. In the present paper, we derive a new iterative deterministic ML estimator, which concentrates the LL function with respect to both signal and noise nuisance parameters. Unlike the analytic concentration used in the conventional "uniform" ML estimators, the concentration of the LL function in the nonuniform noise case is performed in a numerical (iterative) manner, with only a few iterations necessary for convergence.
Furthermore, we derive closed-form expressions for the deterministic and stochastic direction estimation CRB's for the considered nonuniform white noise case. These expressions can be viewed as a natural extension of the well-known results reported in [1]-[2] and [10] for the uniform noise model. The estimation performance of the proposed ML technique is compared to the derived CRB's and the performance of the deterministic uniform ML estimator [1] via

computer simulations. Moreover, we test both the uniform and nonuniform ML techniques using experimental seismic data recorded by the GERESS array (Germany). Our simulations and the results of real data processing demonstrate substantial performance improvements achieved by the proposed nonuniform ML estimator relative to the conventional uniform ML algorithm. Additionally, the experimental results provide a solid verification of the practical relevance of the considered nonuniform noise model.


Let an array of n sensors receive q (q < n) narrowband signals impinging from sources with unknown DOA's θ₁, ..., θ_q. The ith snapshot vector of sensor array outputs can be modeled as [1]-[3]

x(i) = A(θ)s(i) + n(i),  i = 1, ..., N  (1)

where A(θ) = [a(θ₁), ..., a(θ_q)] is the n × q matrix composed of the signal direction vectors a(θ_i) (i = 1, ..., q), θ = [θ₁, ..., θ_q]^T is the q × 1 vector of the unknown signal DOA's, s(i) is the q × 1 vector of the source waveforms, n(i) is the n × 1 vector of white sensor noise, N is the number of snapshots, and (·)^T stands for the transpose. In a more compact notation, (1) can be rewritten as

where ψ = [θ^T, σ^T, s^T(1), ..., s^T(N)]^T is the vector of unknown signal and noise parameters, σ = [σ₁², ..., σ_n²]^T, x̃(i) = Q^{-1/2}x(i), and Ã(θ) = Q^{-1/2}A(θ).

Introduce the n × N matrix

G = X − A(θ)S = [c₁, ..., c_N] = [r₁, ..., r_n]^T  (8)

where c_i and r_i are the n × 1 and N × 1 vectors corresponding to the ith column and the ith row of the matrix G, respectively. With these notations, from (7) it follows that

∂L(ψ)/∂σ_k² = (1/σ_k⁴) r_k^H r_k − N e_k^T Q^{-1} e_k  (9)

where e_k is the vector containing one in the kth position and zeros elsewhere.

From (3) and (9), we obtain that if the other parameters are fixed, the ML estimate of the diagonal noise covariance matrix is given by

Q̂ = (1/N) diag{r₁^H r₁, r₂^H r₂, ..., r_n^H r_n}  (10)

Here, we exploit the obvious property [C]_{k,k} = r_k^H r_k of the matrix

X = A(θ)S + N  (2)

where X = [x(1), ..., x(N)] is the n × N array data matrix, S = [s(1), ..., s(N)] is the q × N source waveform matrix, and N = [n(1), ..., n(N)] is the n × N sensor noise matrix. The sensor noise is assumed to be a zero-mean spatially and temporally white Gaussian process with the unknown diagonal covariance matrix

Q = E{n(i)n^H(i)} = diag{σ₁², σ₂², ..., σ_n²}  (3)
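As an illustration, the data model (2)-(3) can be simulated directly. The sketch below is our own; the array response, source DOA's, and noise variances are arbitrary choices, not values from the paper:

```python
import numpy as np

# Generate N snapshots X = A S + N_mat for a hypothetical half-wavelength ULA
# with spatially white but nonuniform (unequal-variance) sensor noise.
rng = np.random.default_rng(0)
n, q, N = 10, 2, 50
thetas = np.deg2rad([7.0, 13.0])                      # example DOA's
A = np.exp(1j * np.pi * np.outer(np.arange(n), np.sin(thetas)))   # n x q
S = (rng.standard_normal((q, N)) + 1j * rng.standard_normal((q, N))) / np.sqrt(2)
sigma2 = rng.uniform(0.5, 10.0, size=n)               # unequal noise variances
N_mat = np.sqrt(sigma2)[:, None] * (
    rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N))) / np.sqrt(2)
X = A @ S + N_mat                                     # data model (2)
Q = np.diag(sigma2)                                   # diagonal covariance (3)
```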

In what follows, the signal waveforms will be assumed to be either deterministic unknown processes [1] or random zero-mean Gaussian processes [2]. In particular, the signal snapshots are assumed to satisfy the models

x(i) ~ 𝒩(A s(i), Q)  (4)

x(i) ~ 𝒩(0, R)  (5)

in the deterministic and stochastic case, respectively. Here,

R = E{x(i)x^H(i)} = A P A^H + Q  (6)

is the array covariance matrix, P = E{s(i)s^H(i)} is the source waveform covariance matrix, 𝒩 denotes the complex Gaussian distribution, and (·)^H stands for the Hermitian transpose.


Under the assumption that the signal waveforms are deterministic unknown sequences, the LL function for the model considered is given by [11]

L(ψ) = −N Σ_{k=1}^n log σ_k² − Σ_{i=1}^N ||x̃(i) − Ã(θ)s(i)||²  (7)


C = Σ_{i=1}^N c_i c_i^H  (11)

Inserting (10) into (7), we have

L(θ, S) = −N Σ_{k=1}^n log{(1/N) r_k^H r_k} − Σ_{i=1}^N c_i^H Q̂^{-1} c_i  (12)

Using (10)-(11) and the properties of the trace operator, we obtain that

Σ_{i=1}^N c_i^H Q̂^{-1} c_i = trace{Q̂^{-1} Σ_{i=1}^N c_i c_i^H} = trace{Q̂^{-1} C} = nN  (13)

Hence, after omitting the constant term (13), the LL function (12) can be further simplified to

L(θ, S) = −N Σ_{k=1}^n log{(1/N) r_k^H r_k}  (14)

At the same time, from (7) we obtain in a standard way that if the remaining parameters are fixed, the ML estimate of the matrix S is given by

Ŝ = (Ã^H(θ)Ã(θ))^{-1} Ã^H(θ) X̃  (15)

where X̃ = Q^{-1/2}X is the n × N transformed data matrix. Note that the estimate (15) depends on Q and, in turn, the estimate of Q in (10) depends on S. Therefore, it appears to be impossible to obtain any closed-form expression for the LL function concentrated with respect to the full set


of  the  signal  and  noise  nuisance  parameters.  To  avoid  this 
difficulty,  we  introduce  the  idea  of  stepwise  concentration, 
which  was  also  exploited  in  [3]  in  an  implicit  form.  The 
essence  of  this  idea  is  to  concentrate  the  LL  function  in  an 
iterative  manner. 

Omitting the constant factor −N in (14) and inserting (15) into this equation, we obtain the following alternative expressions for the negative LL function

𝓛(θ) = Σ_{k=1}^n log{(1/N) r_k^H r_k}
     = trace log{(1/N) G G^H}
     = trace log{(1/N) P_Ã^⊥(θ) X̃ X̃^H P_Ã^⊥(θ)}
     = trace log{P_Ã^⊥(θ) R̃}  (16)

where P_Ã(θ) = Ã(θ)(Ã^H(θ)Ã(θ))^{-1} Ã^H(θ) and P_Ã^⊥(θ) = I − P_Ã(θ) are the projection matrices. Here,

R̃ = (1/N) X̃ X̃^H  (17)

is the n × n sample covariance matrix of the transformed data.
It is important to stress that in the particular uniform noise case (Q = σ²I), the function (16) can be simplified to

𝓛(θ) = trace log{P_A^⊥(θ) R̂}  (18)


•  Step 1. Set Q̂ = I.

•  Step 2. Find the estimate of θ as θ̂ = arg min_θ {𝓛(θ)}, where the negative LL function 𝓛(θ) is defined by (16).

•  Step 3. Using the so-obtained θ̂, compute Ŝ from (15). Find the refined estimate of Q̂ from (10) using (8) and the previously obtained (fixed) Ŝ and θ̂. Repeat steps 2 and 3 a few times to obtain the final estimate of θ.

In step 1, the algorithm is initialized using the uniform noise assumption. Under this assumption, the estimate of Q should be written as Q̂ = σ̂²I, where σ̂² is some estimate of the noise variance σ². However, from the structure of the negative LL function (16) it follows that the minimizer of this function does not depend on the value of σ̂². Therefore, without loss of generality, in step 1 we can set σ̂² = 1.
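For a single source, the stepwise numerical concentration can be sketched as follows (our own illustration, not the authors' implementation; a half-wavelength ULA and a grid search in step 2 are assumptions made here):

```python
import numpy as np

def steering(theta, n):
    # Hypothetical half-wavelength ULA steering vector
    return np.exp(1j * np.pi * np.arange(n) * np.sin(theta))

def neg_ll(theta, X, q):
    # Concentrated negative LL (16), up to an additive constant; q = diag(Q)
    n, N = X.shape
    at = steering(theta, n) / np.sqrt(q)               # whitened steering
    Pp = np.eye(n) - np.outer(at, at.conj()) / np.linalg.norm(at) ** 2
    G = Pp @ (X / np.sqrt(q)[:, None])                 # whitened residual
    return float(np.sum(np.log(np.sum(np.abs(G) ** 2, axis=1) / N)))

def nonuniform_ml(X, grid, n_iter=3):
    n, N = X.shape
    q = np.ones(n)                                     # Step 1: Q = I
    for _ in range(n_iter):
        theta = min(grid, key=lambda t: neg_ll(t, X, q))   # Step 2
        at = steering(theta, n) / np.sqrt(q)
        S = at.conj() @ (X / np.sqrt(q)[:, None]) / np.linalg.norm(at) ** 2
        G = X - np.outer(steering(theta, n), S)        # residual matrix (8)
        q = np.sum(np.abs(G) ** 2, axis=1) / N         # refined noise est. (10)
    return theta
```

Two or three sweeps over steps 2-3 are typically enough here, mirroring the convergence behavior claimed above.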


The  following  two  theorems  present  closed-form  expressions 
for  the  deterministic  and  stochastic  CRB’s  under  the  nonuni¬ 
form  noise  assumption. 

Theorem 1: The q × q deterministic CRB matrix for the signal DOA's is given by:

CRB_DET(θθ) = (1/(2N)) {Re[(D̃^H P_Ã^⊥ D̃) ⊙ P̂^T]}^{-1}  (21)

where Ã = Q^{-1/2}A, D̃ = Q^{-1/2}D, P̂ = (1/N) Σ_{i=1}^N s(i)s^H(i), ⊙ stands for the Schur-Hadamard matrix product, and

where P_A(θ) = A(θ)(A^H(θ)A(θ))^{-1}A^H(θ), P_A^⊥(θ) = I − P_A(θ), and

R̂ = (1/N) X X^H  (19)

is the sample covariance matrix of the original data (1). Interestingly, this function is not equivalent to the conventional negative LL function [1]

D = [∂a(θ)/∂θ|_{θ=θ₁}, ..., ∂a(θ)/∂θ|_{θ=θ_q}].









Proof:  See  [11]. 

Theorem  2:  The  qxq  stochastic  CRB  matrix  for  the 
signal  DOA’s  is  given  by: 

𝓛(θ) = trace{P_A^⊥(θ) R̂}  (20)

derived under the uniform noise assumption. The explanation of this fact lies in the observation that the ML estimators (16) and (20) use very different types of a priori information on the structure of the noise covariance matrix.

Another important observation is that, unlike (20), the function (16) does not enable simultaneous concentration with respect to both signal and noise nuisance parameters. This fact can be explained by inspecting the structure of (16). According to this equation, the estimate of the signal DOA vector θ depends on the estimate (10) of the matrix Q, which, in turn, depends on the estimate of θ. To overcome this problem, instead of the analytic concentration approach used for the derivation of the uniform ML estimator, we propose the so-called stepwise numerical concentration, which is given by the following iterative procedure:

CRB_STO(θθ) = (1/N) {2Re[(P̂ Ã^H R̃^{-1} Ã P̂) ⊙ (D̃^H P_Ã^⊥ R̃^{-1} D̃)^T] − M T M^T}^{-1}  (23)

where R̃ = Q^{-1/2} R Q^{-1/2} and the real matrices M and T are given by

M = 2Re{(R̃^{-1} Ã P̂)^T ⊙ (D̃^H P_Ã^⊥)},  (24)

T = {(R̃^{-1})* ⊙ R̃^{-1} − (P_Ã^⊥ R̃^{-1})* ⊙ (P_Ã^⊥ R̃^{-1})}^{-1}  (25)

Proof: See [11].

It  is  interesting  to  compare  the  derived  expressions  with 
the  deterministic  and  stochastic  CRB’s  in  the  uniform  noise 



case. The latter two bounds are given by [1], [2], [10]

CRB_DET(θθ) = (σ²/(2N)) {Re[(D^H P_A^⊥ D) ⊙ P̂^T]}^{-1}  (26)

CRB_STO(θθ) = (1/N) {2Re[(P̂ A^H R^{-1} A P̂) ⊙ (D^H P_A^⊥ R^{-1} D)^T]}^{-1}  (27)


The comparison of (21) and (26) shows that the nonuniform deterministic bound (21) corresponds to the uniform CRB (26), with the only difference that the nonuniform CRB uses the transformed array manifold Ã instead of the original manifold A. This transformation can be viewed as a sort of preequalization of the sensor noise¹. To explain the effect of noise preequalization, let us consider the case when some of the array sensors suffer from intense noise, whereas the other sensors remain relatively "noiseless". According to the above-mentioned manifold transformation, the contribution of the noisy sensors to the CRB (21) will be negligible because of the relatively low weights assigned to these sensors. This corresponds to our natural expectation that the optimal (ML) algorithm derived for the nonuniform model should be insensitive to the presence of such noisy sensors. Such a robustness property is achieved by means of blocking the outputs of the corresponding (noisy) array channels and exploiting only the noiseless sensors. From this point of view, the manifold transformation matrix Q^{-1/2} can be identified as a sort of blocking matrix.

As can be seen from the comparison of (23) and (27), in the stochastic case the relationship between the uniform and nonuniform bounds becomes more complicated than in the deterministic case. In particular, this relationship cannot be described solely in terms of the manifold transformation Q^{-1/2}. We observe that the bound (23) contains an additional term −M T M^T which does not appear in (27). In the general case, we obtain that

Nonuniform CRB_DET(θθ) = Uniform CRB_DET(θθ)




Figure  1:  Comparison  of  the  DOA  estimation  RMSE’s  and 
CRB’s.  First  example. 

where B = diag{(ω/c)d₁ cos θ₁, (ω/c)d₂ cos θ₁, ..., (ω/c)d_n cos θ₁}, p̂ = (1/N) Σ_{i=1}^N |s(i)|², d_k is the coordinate of the kth sensor, ω is the central frequency, and c is the propagation velocity.
Assuming that the array has omnidirectional sensors and that the number of snapshots is high (p̂ ≈ p), and defining the SNR as [5] SNR = (p/n) a^H Q^{-1} a = (p/n) Σ_{k=1}^n 1/σ_k², we obtain the following explicit relationship between the stochastic and deterministic single-source bounds:

CRB_STO(θθ) = (1 + 1/(n·SNR)) CRB_DET(θθ)  (28)

Hence, in the large sample case the difference between the two bounds becomes small when the source is powerful enough, so that n·SNR >> 1.
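The relationship (28) can be checked numerically against the single-source bound expressions above (as reconstructed here); the sensor coordinates, noise variances, and signal power below are arbitrary illustrative values:

```python
import numpy as np

n, N, p = 10, 100, 2.0
d = 0.5 * np.arange(n)                        # sensor coordinates (wavelengths)
theta1 = np.deg2rad(7.0)
a = np.exp(2j * np.pi * d * np.sin(theta1))   # omnidirectional, |a_k| = 1
Q = np.diag([10.0, 2.0, 1.5, 0.5, 8.0, 0.7, 1.1, 3.0, 6.0, 3.0])
B = np.diag(2 * np.pi * d * np.cos(theta1))   # (w/c) d_k cos(theta_1)

Qi = np.linalg.inv(Q)
aQa = np.real(a.conj() @ Qi @ a)
den = np.real(aQa * (a.conj() @ B @ B @ Qi @ a)) - np.abs(a.conj() @ B @ Qi @ a) ** 2
crb_det = aQa / (2 * N * p * den)             # deterministic bound (p_hat = p)
crb_sto = (1 + p * aQa) / (2 * N * p ** 2 * den)

snr = (p / n) * aQa                           # SNR definition from [5]
assert np.isclose(crb_sto / crb_det, 1 + 1 / (n * snr))   # relationship (28)
```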


Nonuniform CRB_STO(θθ) ≥ Uniform CRB_STO(θθ)

The  proof  of  the  last  equation  is  given  in  [11]. 

Assume that there is only one signal source (q = 1). In this case, we have Ã = ã and D̃ = d̃, where ã = Q^{-1/2}a. Therefore, the array covariance matrix (6) can be rewritten as R = p a a^H + Q, where p = E{|s(i)|²} is the signal variance. It is easy to show that in this case the bounds (21) and (23) can be simplified to [5]


CRB_DET(θθ) = (a^H Q^{-1} a) / (2N p̂ [a^H Q^{-1} a · a^H B² Q^{-1} a − (a^H B Q^{-1} a)²])

CRB_STO(θθ) = (1 + p a^H Q^{-1} a) / (2N p² [a^H Q^{-1} a · a^H B² Q^{-1} a − (a^H B Q^{-1} a)²])

¹Usually the term prewhitening is used, but it would be confusing here because the sensor noise is originally assumed to be spatially white.

We assumed a ULA of ten sensors spaced half a wavelength apart, and two equally powered sources with the DOA's θ₁ = 7° and θ₂ = 13°. The nonuniform noise was assumed to have the following covariance matrix: Q = diag{10.0, 2.0, 1.5, 0.5, 8.0, 0.7, 1.1, 3.0, 6.0, 3.0}. In all our examples, the experimental DOA estimation Root-Mean-Square Errors (RMSE's) of the conventional uniform and the proposed nonuniform ML methods have been compared to the nonuniform CRB's (21) and (23).

In the first example, we assume two uncorrelated sources with SNR = 10 dB. Fig. 1 displays the results versus the number of snapshots. In the second example, two correlated sources are assumed, with a correlation coefficient equal to 0.9. An SNR of 15 dB is taken, and the results are plotted in Fig. 2 versus the number of snapshots.

From Figs. 1-2, we observe that the uniform ML estimator performs poorly in the nonuniform noise case. As expected, the proposed nonuniform technique provides substantial performance improvements. In particular, it attains the stochastic CRB



Figure  2:  Comparison  of  the  DOA  estimation  RMSE’s  and 
CRB’s.  Second  example. 

(23) even at small sample sizes. Since two iterations are enough to guarantee convergence, the computational cost of our technique is comparable to that of the conventional uniform ML method.


To validate the practical relevance of the nonuniform noise model, real seismic data were used. These data were collected by the GERESS array (Germany). The data record of a regional seismic event at an azimuth of θ = 121.8° was analyzed (see [12] for details). Note that the azimuth of this event was known in advance with high precision. By estimating this parameter using the methods tested, we were able to compare their experimental performances.

The conventional and proposed ML methods have been applied to azimuth-velocity (2-D) estimation at the following four frequencies: f₁ = 0.9375 Hz, f₂ = 1.25 Hz, f₃ = 1.5625 Hz, and f₄ = 1.875 Hz.

The experimental azimuth estimates have been used to compute the experimental RMSE's shown in Fig. 3. From this figure, it is clearly seen that the nonuniform ML estimator has noticeably better experimental performance than the uniform ML technique. These results provide a solid verification of the relevance of the developed nonuniform noise model in practical applications.


[1] P. Stoica and A. Nehorai, "MUSIC, maximum likelihood and Cramer-Rao bound," IEEE Trans. ASSP, 37, pp. 720-741, May 1989.

[2] P. Stoica and A. Nehorai, "Performance study of conditional and unconditional direction-of-arrival estimation," IEEE Trans. ASSP, 38, pp. 1783-1795, Oct. 1990.












Figure  3:  Comparison  of  the  DOA  estimation  RMSE’s. 

Real  seismic  array  data. 

[3] J.F. Bohme and D. Kraus, "On least squares methods for direction of arrival estimation in the presence of unknown noise fields," ICASSP'88, NY, pp. 2833-2836, Apr. 1988.

[4] A.B. Gershman, A.L. Matveyev, and J.F. Bohme, "Maximum likelihood estimation of signal power in sensor array in the presence of unknown noise field," IEE Proc. RSN, F-142, pp. 218-224, Oct. 1995.

[5]  A.L.  Matveyev,  A.B.  Gershman,  and  J.F.  Bohme,  “On 
the  direction  estimation  Cramer-Rao  bounds  in  the 
presence  of  uncorrelated  unknown  noise,”  Circ.,  Syst., 
Signal  Processing,  18,  pp.  479-487,  1999. 

[6] J. LeCadre, "Parametric methods for spatial signal processing in the presence of unknown colored noise fields," IEEE Trans. ASSP, 37, pp. 965-983, July 1989.

[7]  B.  Friedlander  and  A.J.  Weiss,  “Direction  finding  using 
noise  covariance  modeling,”  IEEE  Trans.  SP,  43,  pp. 
1557-1567,  July  1995. 

[8]  P.  Stoica,  M.  Viberg,  K.M.  Wong,  and  Q.  Wu, 
“Maximum-likelihood  bearing  estimation  with  partly 
calibrated  arrays  in  spatially  correlated  noise  field,” 
IEEE  Trans.  SP,  44,  pp.  888-899,  Apr.  1996. 

[9] U. Nickel, "On the influence of channel errors on array signal processing methods," Int. J. Electron. and Comm., 47, pp. 209-219, 1993.

[10]  A.J.  Weiss  and  B.  Friedlander,  “On  the  Cramer-Rao 
bound  for  direction  finding  of  correlated  sources” ,  IEEE 
Trans.  SP,  41,  pp.  495-499,  Jan.  1993. 

[11]  M.  Pesavento  and  A.B.  Gershman,  “Maximum- 
likelihood  direction  of  arrival  estimation  in  the  presence 
of  unknown  nonuniform  noise”,  submitted. 

[12]  D.V.  Sidorovich  and  A.B.  Gershman,  “2-D  wideband 
interpolated  root-MUSIC  applied  to  measured  seismic 
data”,  IEEE  Trans.  SP,  46,  pp.  2263-2267,  Aug.  1998. 



Victor S. Golikov, Francisco C. Pareja
Ciencia y Tecnologia del Mayab, A. C.

Calle 12, No. 199, dep. 5, entre 19 y 21, Col. Garcia Gineres, C.P. 97070, Merida, Yucatan, Mexico


Optimal detection/estimation algorithms require large computational expenditures in radar, sonar, and similar systems. This paper presents a new Uniformly Most Powerful test for matched detection of signals in a symmetric signal subspace. The group of generalized (logical) shift operators is used to describe the symmetry. This algorithm may be used to reduce the complexity of the matched detector for an unknown signal subspace and for real-time signal processing. The reduction brings appreciable hardware gains and small performance penalties in some radar systems. The signal subspace model for moving-target indication in radar is considered. We use the new approach to construct a sub-optimal detector with minimal computational cost.


subspace ⟨H⟩. Here P_H is the projection of x onto the subspace ⟨H⟩:

P_H = H(H^T H)^{-1} H^T.  (4)

The statistic χ² is a quadratic form in the normal random vector x: N[μHθ, σ²P_H]. It is known that χ²/σ² is chi-squared distributed with noncentrality parameter (μ²/σ²)E_s, E_s = θ^T H^T H θ: χ²/σ² ~ χ²_p(μ²E_s/σ²).

The chi-squared distribution has a monotone likelihood ratio. Therefore, by the Karlin-Rubin theorem, the test

φ(χ²/σ²) = 1,  χ²/σ² ≥ γ²;   φ(χ²/σ²) = 0,  χ²/σ² < γ²,  (5)

is the Uniformly Most Powerful (UMP) invariant detector for testing H₀: μ = 0 versus H₁: μ > 0 in the measurement x: N[μHθ, σ²I]. Further, we will consider a subspace ⟨H⟩ that is symmetric with respect to the group of generalized (logical) shift transformations, and we will establish that the statistic (3) is also a maximal invariant to the group of generalized shift transformations for a symmetric signal subspace ⟨H⟩.
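The UMP invariant test (5) amounts to comparing χ² = x^T P_H x (scaled by σ²) with a chi-squared threshold. A minimal sketch follows; the mode matrix, dimensions, and false-alarm level are our own arbitrary choices, not values from the paper:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
N, p, sigma = 16, 3, 1.0
H = rng.standard_normal((N, p))              # known mode matrix of rank p
P_H = H @ np.linalg.inv(H.T @ H) @ H.T       # projection onto <H>, cf. (4)

def ump_test(x, alpha=0.01):
    # Under H0 the statistic is chi-squared with p degrees of freedom
    stat = x @ P_H @ x / sigma ** 2
    return stat >= chi2.ppf(1 - alpha, df=p)

x0 = sigma * rng.standard_normal(N)          # H0: noise only
x1 = H @ np.array([2.0, -1.5, 1.0]) + x0     # H1: signal in <H> plus noise
print(ump_test(x0), ump_test(x1))
```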

In signal detection problems, we assume that each measurement is a sum of a signal component and a noise component: x_n = μs_n + σw_n, n = 0, 1, ..., N−1. The measurements are organized into an N-dimensional measurement vector

x = μs + σw,  (1)

where the vector μs contains samples of the signal to be detected and the vector σw contains samples of the added noise. We assume that the noise vector w is drawn from a multivariate normal distribution w: N[0, I]. This means that the measurement x is drawn from a multivariate normal distribution x: N[μs, σ²I].
In some systems the signal s in the measurement model x: N[μs, σ²I] is a linear combination of modes or basis vectors, in which case it may be represented as s = Σ_{n=0}^{N−1} θ_n h_n = Hθ. Here H is a known N × N matrix with columns h_n and θ is an unknown N × 1 vector with elements θ_n:

s = [h₀ h₁ ... h_{N−1}] [θ₀ θ₁ ... θ_{N−1}]^T.  (2)

Let the mode matrix H be known but the mode weights be unknown. In this case, the signal is known to lie in the linear subspace ⟨H⟩ spanned by the columns of H, but its exact location is unknown because θ is unknown. We would like to test H₀: μ = 0 versus H₁: μ > 0 when x is distributed as N[μHθ, σ²I] and θ is unknown. It is known [1] that the statistic

χ² = x^T P_H x  (3)

is a maximal invariant to the group of transformations that adds a bias from the orthogonal subspace ⟨A⟩ and rotates in the


The operation t ⊕ τ is called a generalized (logical) shift of the argument t by the value τ, where t, τ ∈ [0, N−1], t = Σ_{p=1}^n t_p m^{p−1}, τ = Σ_{p=1}^n τ_p m^{p−1}, t ⊕ τ = Σ_{p=1}^n c_p m^{p−1}, c_p = ((t_p + τ_p)) mod m (the residue mod m), and c_p, t_p, τ_p ∈ [0, m−1], N = m^n. Let g denote the operator of a generalized shift [2]. We represent a discrete mode of a signal as a column vector h = (h₀ h₁ ... h_{N−1})^T.
The generalized shift operation can be represented as a permutation of the coordinates of this vector. It is possible to represent the operators of generalized shift by block-cyclic permutation matrices. The matrix g_i ∈ G is a permutation matrix: each row and each column contains a single 1, and all remaining entries are zero. Let ⟨H⟩ be a symmetric subspace. Then h_i = g_l h_k, i = l ⊕ k; i, l, k ∈ [0, N−1]. Therefore the symmetric matrix H
may  be  written  as 

H = [h g₁h ... g_{N−1}h].  (6)

The subspace ⟨H⟩ is called symmetric if every mode h_i ∈ ⟨H⟩ transformed by the group G also belongs to the subspace ⟨H⟩, but with another value of the index i: g h_i = h_r ∈ ⟨H⟩, i, r ∈ [0, N−1]. Note that g is an orthogonal matrix: gg^T = I. We have the following representation for the operator g: g_i = V^H W_i V,



where V = N^{−1/2}[Had(t, τ)], Had(t, τ) = exp[j2π/m Σ_i t_i τ_i], j = √−1, W_i = diag[Had(i, τ)], V^H V = V V^H = I. We simplify our notation by writing (V^T)* as V^H, where T is the sign of transposition and * the sign of the complex conjugate. The eigenvectors of the generalized shift operators are the full orthonormalized systems of Hadamard-Chrestenson functions:

Had(p, t) = exp[j2π/m Σ_{i=1}^n p_i t_i],  p = Σ_{i=1}^n p_i m^{i−1}.  (7)


At  m=2  they  are  called  Walsh  functions,  at  m=N  they  are  called 
discrete  exponential  functions. 
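The system (7) is straightforward to generate and check for orthonormality. The helper below is our own illustration (the function name is hypothetical, not from the paper):

```python
import numpy as np

def chrestenson(m, n):
    # N x N matrix of normalized Hadamard-Chrestenson functions,
    # Had(p, t) = exp(j*2*pi/m * sum_i p_i*t_i) with base-m digits p_i, t_i
    N = m ** n
    def digits(v):
        return np.array([(v // m ** i) % m for i in range(n)])
    V = np.empty((N, N), dtype=complex)
    for p_idx in range(N):
        for t in range(N):
            V[p_idx, t] = np.exp(2j * np.pi / m * digits(p_idx) @ digits(t))
    return V / np.sqrt(N)

V2 = chrestenson(2, 3)   # m = 2: Walsh functions on N = 8 points
V3 = chrestenson(3, 2)   # m = 3: Chrestenson functions on N = 9 points
assert np.allclose(V2.conj().T @ V2, np.eye(8))   # full orthonormal system
assert np.allclose(V3.conj().T @ V3, np.eye(9))
```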

The matrix H is a block-circulant matrix and may be written as

H = V^H Λ V = g H g, for any g ∈ G,  (8)

then

P_H = H(H^T H)^{-1} H^T = V^H Ω V,  (9)

where Λ = diag(λ₀ ... λ_{N−1}), λ_i is an eigenvalue of the matrix H, and Ω = diag(ε₀ ε₁ ... ε_{N−1}), ε_i is an eigenvalue of the matrix P_H. The terms of the diagonal matrix Λ are the Hadamard-Chrestenson transform of the first column h of the matrix H. Similarly, the terms of the diagonal matrix Ω are the Hadamard-Chrestenson transform of the first column of the matrix P_H.


The sufficient statistic for the parameter μ is (3):

χ² = x^T P_H x.

The operator P_H is a block-circulant matrix and may be written as

P_H = H(H^T H)^{-1} H^T = g P_H g.  (10)

Now we will establish that the statistic (3) is a maximal invariant to the group of generalized shift transformations G = {g: g(x) = gx = V^H W V x} under conditions (6), (8), (9). It is clear that:

1. (gx)^T P_H gx = x^T P_H x.  (11)

2. (x₁)^T P_H x₁ = (x₂)^T P_H x₂ ⇒ (x₁)^T V^H Ω V x₁ = (x₂)^T V^H Ω V x₂ ⇒ (X₁)^T Ω X₁ = (X₂)^T Ω X₂ ⇒ ||Ω^{1/2} X₁||² = ||Ω^{1/2} X₂||² ⇒ x₁ = gx₂,

where X₁ = Vx₁ and X₂ = Vx₂ are the Hadamard-Chrestenson transforms of x₁ and x₂, and || · || is the Euclidean norm. The maximal invariant may be written as

w_p = (1/√N) Σ_{i=0}^{N−1} h_i Had*(p, i)  (13)


The statistic (3) requires N² multiplication operations and N² addition operations. The new statistic (11) requires N multiplication operations and N² addition operations for m = 2. In this case the Walsh transform is used instead of the Hadamard-Chrestenson transform.

The new statistic incurs no performance penalty relative to (3) if the signal subspace is symmetric.
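For m = 2 the saving comes from replacing the correlator bank with a Walsh transform. The sketch below (our own, not the paper's code) verifies that the statistic (3) computed directly agrees with its computation through the diagonalization (9), for a dyadic-symmetric subspace built per (6); the mode spectrum is an arbitrary choice:

```python
import numpy as np
from scipy.linalg import hadamard

N = 8                          # N = m**n with m = 2 (dyadic case)
V = hadamard(N) / np.sqrt(N)   # Walsh functions; V is real, symmetric, V @ V = I

# Choose a mode h whose Walsh spectrum has zeros, so <H> is a proper subspace
lam = np.array([3.0, 0.0, 1.5, 0.0, 2.0, 0.0, 0.0, 1.0])  # eigenvalues of H
h = V @ lam / np.sqrt(N)       # first column of H (inverse Walsh transform)

# Dyadic-circulant mode matrix, cf. (6): H[i, k] = h[i XOR k]
H = np.array([[h[i ^ k] for k in range(N)] for i in range(N)])

eps = (np.abs(lam) > 1e-10).astype(float)    # eigenvalues of P_H: 0 or 1
x = np.random.default_rng(1).standard_normal(N)

chi2_direct = x @ H @ np.linalg.pinv(H) @ x  # statistic (3): N^2 products
X = V @ x                                    # Walsh transform of the data
chi2_fast = float(np.sum(eps * X ** 2))      # same statistic via (9)
assert np.isclose(chi2_direct, chi2_fast)
```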

But exact symmetry of the signal subspace seldom exists for real signal models. Consider N continuous-time cosinusoids of the form A_i cos(ω_i t + φ_i) summed to produce the signal s(t). If this signal is sampled at the instants t = nT, then the discrete-time signal is

s_n = Σ_i A_i cos(ω_i T n + φ_i).

Typically, such samples are taken over an interval 0 ≤ t < NT to produce the sample vector s = [s₀ s₁ ... s_{N−1}]^T. The vector of samples s may be written as s = Re Hθ, where H = [h₀ h₁ ... h_{N−1}], θ = [θ₀ θ₁ ... θ_{N−1}]^T, h_i = [1 exp(jω_iT) ... exp(jω_iT(N−1))]^T, θ_i = A_i exp(jφ_i), ω_i = ω₀i. We assume that s is an N-vector constructed from a linear combination of linearly independent cosines and sines, provided T = 1, ω_i = (2π/N)i, i ∈ [0, N−1]. The mode h_i is a complex exponential mode and H^H H = NI. The algorithm (11) consists of two parts: a coherent detector
N-l  N-l 

yk=  (1/V^7 )7[wp  (l/V^V  )YjX.  Had*(p,i)]Had(k,p) }  (14) 

p= 0  i=0 


and  energy  detector  x2  =  7  (yk  )2 
*= o 

The test (3) may be written as

X² = x^T P_H x = x^T H (H^T H)^{-1} H^T x   (15)

where t = H^T x, e = ||h_i||².   (16)

The known algorithm (15) and the obtained algorithm (11) differ in their coherent detectors, (14) and (16). We compare the signal-to-noise ratio (SNR1) for test (14) and (SNR2) for test (16) for each mode of H. Let Z_k = [(SNR)1]_k / [(SNR)2]_k denote the factor of noise-immunity loss. [(SNR)1]_k may be written as [(SNR)1]_k = p y_k / σ, and [(SNR)2]_k may be written as [(SNR)2]_k = pN / σ. Then the factor of noise-immunity loss is

Z_k = p y_k / (pN).   (17)

It is plotted for N = 64, m = 2, ω_0 = 1 in Figure 1. This curve may be used to compute the effective loss in SNR that results from the subspace (H) not being exactly symmetric under the dyadic shift group.

This implementation of the coherent detector has N multiplication operations and N² addition operations. The implementation structure for the known test t = H^T x consists of N branches. Each branch is a correlator of the transformed data with a stored mode. Therefore the known test structure has N² multiplication operations and N addition operations. The advantage of the new algorithm is obvious: the implementation is hardware-efficient, but it is sub-optimum.

The accuracy of the symmetry in subspace (H) defines the noise immunity of this algorithm.
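To make the operation-count comparison concrete, the following is a minimal numpy sketch of the Walsh-domain detector structure for m = 2. The weight vector w (the Walsh spectrum of the reference mode, cf. (13)) and the normalizations are assumptions patterned on (13)-(14), not the authors' exact implementation.

```python
import numpy as np

def hadamard(n):
    """Sylvester-type Hadamard matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def walsh_detector(x, w):
    """Coherent detector in the Walsh domain followed by an energy detector,
    patterned on (13)-(14): transform the data, weight it by the Walsh
    spectrum w of the reference mode, transform back, and sum the squares.
    Since the transform entries are +/-1, only the weighting step needs
    true multiplications (N of them), as claimed in the text."""
    N = len(x)
    H = hadamard(N)
    X = H @ x / np.sqrt(N)            # Walsh transform of the data
    y = H.T @ (w * X) / np.sqrt(N)    # coherent detector, cf. (14)
    return float(np.sum(y ** 2))      # energy detector
```

With w set to all ones the two transforms cancel and the detector reduces to the data energy ||x||², which is a convenient sanity check.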



Figure 1  Signal-to-noise effective loss versus mode for m = 2, N = 64 and ω_0 = 1.

[1] Louis L. Scharf, Statistical Signal Processing: Detection, Estimation, and Time Series Analysis, Addison-Wesley, 1991.

[2] V. Golikov, “The Theory of Optimal M-ary Interperiod Processing when Detecting Fluctuating Signals on a Background of Correlated Interference and Noise”, Radioelectronics and Communications Systems, vol. 31, pp. 2-6, April 1988.

Figure  2  Implementation  of  the  coherent  detector 
for  symmetrical  signal  subspace 

In our case the noise-immunity loss is smaller than 3 dB (0.5-1) for half of the modes. Our research has shown that this relation is preserved as N increases. Note that when the symmetry in subspace (H) is not exact, the SNR for some modes may be maximized by choosing h_0(ω_0). This is illustrated in Figure 3 for m = 2, N = 64 and ω_0 = 1.3. In this case some other modes have much higher SNR than for ω_0 = 1 (Fig. 1). Note that it is possible to change the type of symmetry in this problem: we can choose m = 3, 4, 5, .... But as m increases, the complexity of the test increases.


A new algorithm for the matched symmetrical subspace detector has been presented. It may be used to reduce the complexity of the known algorithm for signal subspace detection. High-quality performance is obtained for moving-target indication under unknown Doppler frequency. The new approach was used to create a sub-optimal detector with minimal computational cost.




E.  Fishier  and  H.  Messer 

Department  of  Electrical  Engineering-Systems,  Tel  Aviv  University, 

Tel  Aviv  69978,  Israel, 

E-mail: {eranf,messer}@eng.tau.


Multiple source direction finding algorithms (e.g., MUSIC) are applied to simultaneous measurements collected by M sensors. However, practical considerations may dictate using fewer receivers than sensors, such that the measurements cannot be collected simultaneously. In such cases, data is collected sequentially from the different array elements in a process which is referred to as “time varying preprocessing”, or “switching”.

In this paper we study multiple source direction finding (DF) with an array of M > 2 elements, where only two receivers are available.


Direction finding with fewer receivers than sensors via time-varying processing is a very important issue (e.g., [3]). In many practical scenarios the number of receivers is considerably smaller than the number of sensors. Moreover, the tendency is to use the minimum number of receivers possible that maintains spatial capacity, i.e., only two receivers. Reducing the number of receivers results in a cheaper and simpler design, at the cost of reduced performance. In this paper we investigate multiple source localization performance from the identification point of view. We first find how many sources can be localized with only two receivers, and then we suggest a computationally efficient algorithm to perform this task.

Assume q far-field narrowband sources impinging on an array with p > q sensors from directions [θ_1, ..., θ_q]. Using complex signal representation, the vector of received signals can be written as:

x(t) = A(θ)s(t) + n(t)   (1)

where s(t) is the complex envelope of the slowly varying signals, n(t) is the additive noise, θ is the vector of directions of arrival, and A(θ) = [a(θ_1), ..., a(θ_q)], where a(θ) is the array steering vector at direction θ. We denote by [x(t)]_i the i-th element of the vector x(t).

Under the standard assumptions that the noise is Gaussian and white and that the signals are Gaussian, the correlation matrix of x(t), denoted by Rx(θ), is given by:

Rx(θ) = A(θ) Rs A^H(θ) + σ²I   (2)

where (·)^H denotes the complex conjugate transpose operation, σ² is the noise level, and Rs is the signal covariance matrix.
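As a concrete instance of model (1)-(2), a small numpy sketch; the half-wavelength uniform linear array geometry is an assumption for illustration (the steering vector a(θ) in (1) is general).

```python
import numpy as np

def ula_steering(theta, p):
    """Steering vector a(theta) for a p-sensor, half-wavelength-spaced
    uniform linear array (an assumed geometry, not mandated by (1))."""
    return np.exp(1j * np.pi * np.arange(p) * np.sin(theta))

def array_covariance(thetas, Rs, sigma2, p):
    """Build Rx(theta) = A(theta) Rs A^H(theta) + sigma^2 I, as in (2)."""
    A = np.column_stack([ula_steering(t, p) for t in thetas])
    return A @ Rs @ A.conj().T + sigma2 * np.eye(p)
```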

The problem of estimating θ from a set of N snapshots of the array, x(t_1), ..., x(t_N), is usually referred to as the localization problem. The case of spatial samples which are a time-dependent linear transformation of the array output is discussed in [3]. The resulting model for the measurements is y(t_i) = G(t_i)x(t_i), where G(t_i) is the time-dependent linear transformation. Note that G(t_i) is a matrix in which the number of rows is the number of receivers used at time t_i.

We are interested in the special case where G(t_i) is a 2 × p matrix such that each row is a vector with all elements but one equal to zero, where the nonzero element equals 1. Without loss of generality, we assume that we take N snapshots of each sub-array of two elements. The total number of snapshots is L = (p choose 2)N. At time instant t_i, i = 1, ..., L, the output of the reduced array is y(t_i) = [[x(t_i)]_k [x(t_i)]_l]^T for some k ≠ l ∈ {1, ..., p}.
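The row-selection form of G(t_i) described above can be sketched as follows (a hypothetical helper; indices are 0-based here):

```python
import numpy as np

def selection_matrix(k, l, p):
    """The 2 x p matrix G(t_i): each row is all zeros except a single 1,
    so G @ x keeps only the outputs of sensors k and l."""
    G = np.zeros((2, p))
    G[0, k] = 1.0
    G[1, l] = 1.0
    return G
```

Applying G to a snapshot x picks out exactly the pair [x_k, x_l], i.e., the output of the two available receivers.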


In [3] the ML estimator for a general transformation matrix G(t_i) is presented. This procedure involves maximization over all unknown parameters: G, σ², Rs. This maximization problem becomes extremely difficult even for as few as two sources. The authors presented an ad-hoc approach, the GLS, which reduces the complexity of the estimator to a search over only q parameters.

Alternatively, by noting that our problem can be modeled as a problem of direction finding with a time-varying array, one can apply the results of [4], which include, among

0-7803-5988-7/00/$  10.00  ©  2000  IEEE 


others, expressions for the Cramer-Rao lower bound (CRLB) on the estimation error of the unknown parameters. Also, some conjectures about the complexity of the ML estimator were presented, which suggested that in the general case the ML estimator is not separable.

In [2] it is shown that, unlike the case where the array is sampled simultaneously, in cases where the number of sensors in the sub-array is smaller than the number of sources, the CRLB for θ_1, ..., θ_q does not approach zero as the SNR approaches infinity, so the time-varying spatial sampling process causes a residual estimation error.

Eigenvector-based methods for the case of time-varying arrays have been proposed in [1]. In that paper two possible eigenvector-based methods were proposed. One is based on an interpolating matrix and the other is based on a focusing matrix. However, both methods cannot be applied to our problem due to the large differences in the steering vectors between successive time instances.


It is well known that when the array is simultaneously sampled so that (1) holds, and under some very weak conditions on the array, one can localize up to p − 1 sources. Is this also true when only two receivers are used? The following theorem addresses this question:

Theorem 1  Using an array of p sensors and only two receivers, up to q = p − 1 narrowband sources can be uniquely localized.

Proof 1  Let ỹ(t_i) be a column vector with 2·(p choose 2) elements, given by:

ỹ(t_i) = [y(t_i)^T, y(t_{i+N})^T, y(t_{i+2N})^T, ..., y(t_{i+L−N})^T]^T


Without loss of generality, assume that first we take N samples of the first and second sensors simultaneously. Next we take another N samples from the first and third sensors simultaneously, and so on. ỹ(t_i) is a column vector whose first two elements equal the first sample of the first two sensors sampled. The third and fourth elements of ỹ(t_i) are the two elements of the first sample from the second and third sensors, and so on. It is clear that {ỹ(t_i)}_{i=1}^N contains all the available samples and thus it contains all the statistical information on the unknown parameters.

It can be easily verified that {ỹ(t_i)}_{i=1}^N are i.i.d. complex Gaussian vectors with block diagonal correlation matrix, Ry(θ), whose m-th 2 × 2 diagonal block is

[ [Rx(θ)]_kk   [Rx(θ)]_kl ]
[ [Rx(θ)]_lk   [Rx(θ)]_ll ]

where k and l are the first and second sensors sampled at the m-th switching. It is clear from the structure of Ry that a simple one-to-one mapping between Rx and Ry, denoted by ψ(Rx), exists.

Let θ = [θ_1, ..., θ_k] and θ′ = [θ′_1, ..., θ′_k′] be two sets of bearings, such that k, k′ ≤ q − 1 and θ′ ≠ θ. For the case of simultaneous sampling up to q − 1 sources could be uniquely localized, i.e., Rx(θ) ≠ Rx(θ′) for every θ ≠ θ′. Now, using the fact that ψ is a one-to-one mapping between Rx and Ry, it is clear that Ry(θ) ≠ Ry(θ′) for every θ ≠ θ′.

In addition, since ỹ(t_i) is a complex Gaussian vector, the p.d.f. of ỹ(t_i) given θ is different from the p.d.f. of ỹ(t_i) given θ′, which is a sufficient condition for identifiability.
This theorem provides a very important result: at each time instant we are sampling a sub-array of size two, which in turn enables us to localize only one source. However, coherently combining all the results from the sub-arrays enables one to localize p − 1 sources, the same number as if we were sampling the whole array with p receivers.


The ML estimator for θ requires at least a q-dimensional search. Eigenvector-based methods, like MUSIC, offer a way to reduce the complexity to a one-dimensional search. This reduction in complexity is crucial since, even today, with the most advanced DSPs, searching in more than a two-dimensional space cannot be performed in real time.

We next describe a new eigenvector-based procedure which can be used in our problem. We start with the following equivalent description of the data:

Let z(t_i) be a column vector with p elements. Let all the elements be equal to zero except, say, the k-th and l-th elements, which are equal to [x(t_i)]_k and [x(t_i)]_l, respectively. That is, k, l are the two array elements which are sampled at time t_i. Now, denote by R̂z = (1/N) Σ_{i=1}^L z(t_i) z^H(t_i) the empirical correlation matrix; it can be shown that its expected value is given by:

Rz = A(θ) Rs A^H(θ) + σ²I + Δ   (5)

where Δ is a diagonal matrix whose diagonal entries are (p − 2) · diag(A(θ) Rs A^H(θ) + σ²I). The matrix Δ is the only difference between the mean of the sample covariance matrix in the case where the array is sampled simultaneously, where eigenvalue-based methods are easily applied, and the mean of the sample covariance matrix where only two receivers are used simultaneously.

However, if all the elements of the diagonal matrix Δ are equal, then eigenvector-based methods for estimating θ can still be used, since Δ is just added to the noise covariance matrix, so it effectively changes the (unknown) noise level. There are two sufficient conditions for all the elements of Δ to be equal:

1. All sources are uncorrelated, so Rs is a diagonal matrix.

2. All the array elements are omnidirectional, such that |a_i(θ)| = |a_j(θ)| ∀ i ≠ j and for any θ.

However, since these conditions are rarely fully fulfilled in practice, MUSIC-like procedures cannot be applied to Rz directly.

A careful examination of Rz(θ) and of Rx(θ) shows that their off-diagonal elements are the same, while the diagonal elements of Rz(θ) are p − 1 times larger, ∀θ. We therefore suggest a non-linear pre-processing procedure: divide the diagonal elements of Rz by p − 1. Denote by R̃z the resulting matrix; it can be easily verified that E{R̃z} = Rx(θ), and thus R̃z can be used with all the eigenvector-based methods, e.g., MUSIC. We refer to MUSIC with the suggested preprocessing as MMUSIC. Naturally, the performance of MUSIC and of MMUSIC applied to the same array will be different, since only the first moment (the expected value) of R̂x and of R̃z is the same.
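A minimal numpy sketch of the suggested preprocessing (divide the diagonal of the switched-array covariance by p − 1) followed by a standard MUSIC pseudo-spectrum. The function names and the MUSIC details are illustrative assumptions, not the authors' code.

```python
import numpy as np

def mmusic_preprocess(Rz, p):
    """The suggested non-linear preprocessing: divide the diagonal of the
    switched-array covariance estimate by p - 1 so that its expected value
    matches Rx(theta)."""
    R = Rz.copy()
    idx = np.arange(R.shape[0])
    R[idx, idx] = R[idx, idx] / (p - 1)
    return R

def music_spectrum(R, steering, q):
    """Standard MUSIC pseudo-spectrum from a covariance estimate R;
    steering maps theta to a steering vector, q is the source count."""
    w, V = np.linalg.eigh(R)          # eigenvalues in ascending order
    En = V[:, :R.shape[0] - q]        # noise-subspace eigenvectors
    def P(theta):
        a = steering(theta)
        return 1.0 / np.real(a.conj() @ En @ En.conj().T @ a)
    return P
```

The preprocessed matrix is then passed to the usual MUSIC machinery; only the diagonal correction distinguishes MMUSIC from MUSIC here.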

This method can be extended to cases where the number of samples taken from each sensor is not equal. Let n_i be the number of samples taken at the i-th switching, and let R̂z = Σ_i z(t_i) z^H(t_i). It can be verified that the mean of R̂z is given by:

E{R̂z} = (A(θ) Rs A^H(θ) + σ²I) ⊙ Ψ   (6)

where (Ψ)_ij is the total number of snapshots taken from the i, j sensors simultaneously, (Ψ)_ii is the total number of snapshots taken from the i-th sensor, and ⊙ denotes element-by-element matrix multiplication. The suggested preprocessing in this case is to divide each element of R̂z by the corresponding element of Ψ. The resulting matrix, denoted again by R̃z, can be used with any eigenvalue-based method.


Consider a uniform linear array with 4 omnidirectional elements. Assume two equi-power, partially correlated (ρ = 0.25) sources at bearings 0°, 15°, and N = 100. In Figure 1 a typical spectrum of the MMUSIC is shown. For comparison, we show a typical spectrum of the MUSIC which has been applied to Rz without preprocessing. It shows that without preprocessing the two sources are not resolved; so, as predicted, the MUSIC cannot be used directly for multiple source localization.


Figure 1: Typical MUSIC and MMUSIC cost functions.

We now present results of a simulation performance study for the same experiment. Figures 2 and 3 depict the probability of detecting two sources and the MSE of the bearing of the first source, respectively, for various correlation coefficients, as a function of the SNR. These results are based on averaging 1000 Monte Carlo runs.

Figure 2: The probability of detecting two sources as a function of the SNR.

Figures 4 and 5 depict the probability of detecting two sources and the MSE of the bearing of the first source, respectively, as a function of the number of snapshots, where the SNR is fixed at 10 dB.

Generally speaking, this study suggests that the performance of the MMUSIC improves as the SNR increases, as the number of snapshots increases, and as the correlation between the sources decreases. However, our future work will focus on an analytic performance analysis of the algorithm, so that its inherent limitations can be explored.

Figure 3: The MSE of the bearing of the first source as a function of the SNR.

Figure 4: The probability of detecting two sources as a function of the number of snapshots.

Figure 5: The MSE of the bearing of the first source as a function of the number of snapshots.

[1] B. Friedlander and A. Zeira, “Eigenstructure-based algorithms for direction finding with time-varying arrays,” IEEE Trans. on AES, Vol. 32, pp. 689-701, April 1996.

[2] J. Sheinvald, “On Detection and Localization of Multiple Signals by Sensor Arrays,” Ph.D. Dissertation, Tel Aviv University, Israel.

[3] J. Sheinvald and M. Wax, “Direction Finding with Fewer Receivers via Time-Varying Preprocessing,” IEEE Trans. on SP, Vol. 47, pp. 2-10, January 1999.

[4] A. Zeira and B. Friedlander, “Direction Finding with Time Varying Arrays,” IEEE Trans. on SP, Vol. 43, pp.

[5] M. A. Doron, A. J. Weiss and H. Messer, “Maximum Likelihood direction finding of wide band sources,” IEEE Trans. on SP, Vol. 41, pp. 411-414, 1993.



K. Abed-Meraim*, S. Attallah**, A. Chkeif*, Y. Hua***

* Telecom Paris, TSI Dept., 46 rue Barrault, 75634 Paris Cedex 13, France.

** Centre for Wireless Communication, National University of Singapore, Singapore.

*** The University of Melbourne, Elec. Eng. Dept., Parkville, Vic. 3052, Australia.

E-mails:  abed,,, 


In this paper, we propose an orthogonalized version of the OJA algorithm (OOJA) that can be used for the estimation of minor and principal subspaces of a vector sequence. The new algorithm offers, as compared to OJA, such advantages as orthogonality of the weight matrix, which is ensured at each iteration, numerical stability, and a quite similar computational complexity.


Principal and minor component analysis (PCA and MCA), which are part of the more general principal and minor subspace analysis (PSA and MSA), are two important problems that are frequently encountered in many information processing fields.

Let {r(k)} be a sequence of N × 1 random vectors with covariance matrix C = E[r(k) r^T(k)]. Consider the problem of extracting the principal or the minor subspace spanned by the sequence, of dimension P < N, assumed to be the span of the P principal or minor eigenvectors of the covariance matrix, respectively. To solve this problem, several subspace extraction algorithms have so far been proposed [1]-[5]. The minor subspace extraction algorithm of Oja et al. [4] can be formulated as

W(i+1) = W(i) − β [r(i) y^T(i) − W(i) y(i) y^T(i)]
       = W(i) − β p(i) y^T(i),   (1)

and the MSA algorithm proposed by Chen et al. [5] can be written as

W(i+1) = W(i) − β [r(i) y(i)^T W^T(i) W(i) − W(i) y(i) y^T(i)].   (2)

The  discrete-time  update  of  (2)  suffers  from  a  marginal 
instability  similar  to  the  PCA  ( P  =  1)  algorithm  in  [2], 
Recently,  a  novel  self-stabilizing  MSA  algorithm  given 

W(*  +  l)  =  W(t)-/?[r(i)y(f)TWT(i)W(i)x 

WT(i)W(i)  —  W(i)y(i)yT(i)] ,  (3) 

has  been  proposed  by  Douglas  et  al.  in  [3]. 


Our algorithm consists of (1) plus an orthogonalization step of the weight matrix to be performed at each iteration. Orthogonality is an important property that is desired in many subspace-based estimation methods [6]. To this end, we set (using informal notation):

W(i+1) := W(i+1) (W^T(i+1) W(i+1))^(-1/2)   (4)

where (W^T(i+1) W(i+1))^(-1/2) denotes an inverse square root of (W^T(i+1) W(i+1)). To compute the latter, we use the updating equation of W(i+1). Keeping in mind that W(i) is now an orthogonal matrix, we have

W^T(i+1) W(i+1) = I + β² ||p(i)||² y(i) y^T(i) = I + x x^T,

where W(i) ∈ R^(N×P) is the minor subspace estimate, y(i) = W^T(i) r(i), p(i) = r(i) − W(i) y(i), and β > 0 is a learning parameter. Reversing the sign of the adaptive gain, i.e., replacing −β in (1) by +β, yields a principal subspace extraction algorithm; equation (2) is the novel MSA algorithm proposed by Chen et al. [5]. In computing W^T(i+1) W(i+1) above we have used the fact that W^T(i) p(i) = 0, I is the identity matrix, and x = β ||p(i)|| y(i). Using

(I + x x^T)^(-1/2) = I + ((1 + ||x||²)^(-1/2) − 1) x x^T / ||x||²,

we  obtain 

(W^T(i+1) W(i+1))^(-1/2) = I + τ(i) y(i) y^T(i),   (5)



where τ(i) = (1/||y(i)||²) (1/√(1 + β² ||p(i)||² ||y(i)||²) − 1). Substituting (5) into (4) and using the updating equation of W(i+1) leads to

W(i+1) = (W(i) − β p(i) y^T(i)) (I + τ(i) y(i) y^T(i))
       = W(i) − β p̄(i) y^T(i),   (6)

where p̄(i) = −τ(i) W(i) y(i)/β + (1 + τ(i) ||y(i)||²) p(i).
Thus,  the  algorithm  can  be  written  as 

• Initialization of the algorithm:

W(0) = any arbitrary orthogonal matrix.

• Algorithm at iteration i:

y(i) = W^T(i) r(i)
z(i) = W(i) y(i)
p(i) = r(i) − z(i)
τ(i) = (1/||y(i)||²) (1/√(1 + β² ||p(i)||² ||y(i)||²) − 1)
φ(i) = 1 + τ(i) ||y(i)||²
p̄(i) = −τ(i) z(i)/β + φ(i) p(i)
W(i+1) = W(i) − β p̄(i) y^T(i)
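The OOJA update (6) maps directly to numpy; this is a sketch (shapes and the learning-rate value are assumed), with W holding the N × P weight matrix with orthonormal columns:

```python
import numpy as np

def ooja_step(W, r, beta=0.01):
    """One OOJA minor-subspace iteration, following (6).
    W: N x P matrix with orthonormal columns; r: current data vector;
    beta: learning parameter (the value here is an arbitrary example)."""
    y = W.T @ r                       # y(i) = W^T(i) r(i)
    z = W @ y                         # z(i) = W(i) y(i)
    p = r - z                         # residual, orthogonal to span(W)
    ny2 = y @ y
    tau = (1.0 / ny2) * (1.0 / np.sqrt(1.0 + beta**2 * (p @ p) * ny2) - 1.0)
    phi = 1.0 + tau * ny2
    p_bar = -tau * z / beta + phi * p
    return W - beta * np.outer(p_bar, y)
```

By construction the returned matrix again has orthonormal columns, which is exactly the orthogonalization property the algorithm is designed to enforce at each iteration.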

In order to gain insight into the OOJA algorithm, we must examine the following points:

1. Minor subspace:

In terms of orthogonality errors, the OOJA algorithm guarantees the orthogonality of the weight matrix at each iteration. With the orthogonalization, the three algorithms (1), (2), and (3) become identical. However, simulation results show that the discrete-time update of the OOJA algorithm is sensitive to the propagation of round-off errors. Fortunately, we can overcome this problem by reformulating the algorithm equations as shown in Section 3.

2. Principal subspace:

With respect to subspace errors, our algorithm converges at the same rate as (1). In terms of orthogonality errors, it guarantees the orthogonality of the weight matrix at each iteration, whereas (1) converges to an orthogonal weight matrix only asymptotically. Finally, it is worth noting that (3) quickly diverges for PSA.

3. Computational complexity:

The computational complexities of algorithms (3) and (2) are 7NP + O(N) and 5NP + O(N) flops per iteration, respectively. OOJA and (1) cost, however, only 3NP + O(N) flops per iteration. It is interesting to note that the orthogonalization step does not increase the computational cost of the OJA algorithm. On the other hand, the updating equation of the weight matrix of the OOJA algorithm has a more compact form than (2) and (3), i.e., it uses only one outer product instead of two for (2) and (3). This turns out to be useful when a subspace extraction algorithm is cascaded with other adaptive algorithms, e.g., [8].

4. Convergence:

The convergence of the OOJA algorithm follows directly from that of the OJA algorithm [7]. In fact, (6) can be rewritten as W(i+1) = W(i) − β p(i) y^T(i) + O(β²). Therefore, for β ≪ 1, it can be shown that the two algorithms have the same convergence performance.

On the other hand, the convergence proof of (3) is not complete. Effectively, to prove that span[W] converges to span[E2], where E2 is the minor P-dimensional subspace spanned by the eigenvectors corresponding to the P smallest eigenvalues, Douglas et al. [3] have used the following assumption:

If all the eigenvalues of M(t) have negative real parts, then for the system Q̇(t) = M(t) Q(t) we have

lim_{t→∞} Q(t) = 0.

This assumption is true if M(t) is time invariant, but not always true when M(t) is time variant, as shown by the counterexamples given in [10, 11].


Because of the numerical instability of OOJA when used for minor subspace estimation, we propose here another implementation of the algorithm based on the Householder transformation. In fact, the new implementation can be derived from a reformulation of (6) in terms of a Householder transformation. We have the following:

Proposition 1  Let u(i) = p̄(i)/||p̄(i)||. Then equation (6) can be rewritten as

W(i+1) = H(i) W(i)   (7)

where H(i) is the Householder transformation given by H(i) = I − 2 u(i) u^T(i).

Based on this result (see the appendix for the proof), the new implementation consists in computing successively y(i), p(i), τ(i), and p̄(i). Then, we compute

u(i) = p̄(i)/||p̄(i)||
v(i) = W^T(i) u(i)
W(i+1) = W(i) − 2 u(i) v^T(i)

This new implementation is numerically stable.
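A numpy sketch of this Householder form (the same assumptions as before apply); per Proposition 1 it produces the same update as (6) while applying an exactly orthogonal transformation:

```python
import numpy as np

def ooja_householder_step(W, r, beta=0.01):
    """Householder form of the OOJA update, W(i+1) = (I - 2 u u^T) W(i),
    computed cheaply as W - 2 u v^T with v = W^T u."""
    y = W.T @ r
    z = W @ y
    p = r - z
    ny2 = y @ y
    tau = (1.0 / ny2) * (1.0 / np.sqrt(1.0 + beta**2 * (p @ p) * ny2) - 1.0)
    p_bar = -tau * z / beta + (1.0 + tau * ny2) * p
    u = p_bar / np.linalg.norm(p_bar)   # u(i) = p_bar(i)/||p_bar(i)||
    v = W.T @ u                         # v(i) = W^T(i) u(i)
    return W - 2.0 * np.outer(u, v)
```

Because H(i) is an exact orthogonal (Householder) matrix, the orthonormality of the columns of W is preserved regardless of rounding in p̄, which is the source of the improved numerical behavior.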

Example 3: We consider here the same context as in the previous examples. By reversing the sign of β, we now extract the principal P-dimensional subspace. In (9), we replace E1 by E2 and vice versa. As we can see from Figure 3, our algorithm (without the Householder implementation) is numerically stable and has better performance than (1), (2), and (3).


Since the decomposition of the weight matrix involves the use of numerically well-behaved Householder orthogonal matrices (see [9], pp. 209-213), OOJA becomes numerically very stable. The new implementation now has a computational complexity of 4NP + O(N) flops per iteration.

In this paper, we proposed an orthogonal OJA (OOJA) algorithm that can perform both PCA and MCA by simply switching the sign of the same learning rule. We gave two fast implementations of OOJA where the orthogonality of the weight matrix is ensured at each iteration. OOJA is numerically stable and its computational complexity is smaller than those reported in [3] and [5].

4. SIMULATION RESULTS

Example 1: In this example, we choose r(i) to be a sequence of independent jointly-Gaussian random vectors with covariance matrix

C = [ 0.9  0.4  0.7  0.3
      0.4  0.3  0.5  0.4
      0.7  0.5  1.0  0.6
      0.3  0.4  0.6  0.9 ],

P = 2, β = 0.01, and, as recommended in [5], W(0) = D, where D_ij = δ(j − i). As in [3], we calculate the ensemble averages of the performance factors

ρ(i) = (1/r0) Σ_{r=1}^{r0} tr(W_r^T(i) E1 E1^T W_r(i)) / tr(W_r^T(i) E2 E2^T W_r(i))   (9)

η(i) = (1/r0) Σ_{r=1}^{r0} ||W_r^T(i) W_r(i) − I||_F,   (10)

where the number of algorithm runs is r0 = 100, r indicates that the associated variable depends on the particular run, ||·||_F denotes the Frobenius norm, and E1 (respectively E2) is the principal (N − P)-dimensional subspace (respectively the minor P-dimensional subspace). Figure 1 compares the performance of OOJA (without the Householder implementation) with (1), (2), and (3). As we can see, our algorithm behaves better than (1) and (2), but still suffers from numerical instability.

Example 2: In this example all parameters are kept the same as in the first example. Figure 2 shows the performance of the Householder-based OOJA algorithm as compared to (1), (2), and (3). We can see that the new implementation is numerically stable.

Proof of Proposition 1: Using the definition¹ of y we can write p̄ y^T = p̄ r^T W. By decomposing the observation vector as

r = W W^T r + (I − W W^T) r
  = W y + p,

and using p̄ = −(τ/β) W y + (1 + τ ||y||²) p, we can write

p̄ p̄^T W = p̄ (−(τ/β) W y + (1 + τ ||y||²) p)^T W
         = −(τ/β) p̄ y^T,

where the second equality comes from the fact that p^T W = 0. Finally, we obtain W(i+1) = (I + (β²/τ) p̄(i) p̄(i)^T) W(i).
To complete the proof we have to show that

β²/τ = −2/||p̄||², or equivalently ||p̄||² = −2τ/β².

Using the definition of p̄ and the equality 1 + τ||y||² = (1 + β² ||p||² ||y||²)^(-1/2), we can write

||p̄||² = (τ²/β²) ||y||² + (1 + τ||y||²)² ||p||²
       = (1/(β²||y||²)) ((τ||y||²)² + 1 − (1 + τ||y||²)²)
       = −2τ/β².

¹Here, we omit the time index i to simplify the notation.


Figure  1:  Average  behaviors  for  MSA. 


Figure  2:  Average  behaviors  for  MSA  using 
Householder-based  implementation. 


[1] Y. Hua, Y. Xiang, T. Chen, K. Abed-Meraim, and Y. Miao, “A New Look at the Power Method for Fast Subspace Tracking”, Digital Signal Processing, Academic Press, Oct. 1999.


Figure  3:  Average  behaviors  for  PSA. 

[2] T. P. Krasulina, “Method of Stochastic Approximation in the Determination of the Largest Eigenvalue of the Mathematical Expectation of Random Matrices,” Automat. Remote Contr., vol. 2, pp. 215-221, 1970.

[3] S. C. Douglas, S.-Y. Kung, and S.-I. Amari, “A Self-Stabilized Minor Subspace Rule,” Sig. Process. Letters, vol. 5, no. 12, pp. 328-330, Dec. 1998.

[4]  E.  Oja,  “Principal  Components,  Minor  Components, 
and  Linear  Neural  Networks,”  Neural  Networks,  vol. 
5,  pp.  927-935,  Nov./Dec.  1992. 

[5]  T.  Chen,  S.-I.  Amari,  and  Q.  Lin,  “A  Unified  Algo¬ 
rithm  for  Principal  and  Minor  Components  Extrac¬ 
tion,”  Neural  Networks,  vol.  11,  pp.  385-390,  1998. 

[6] S. Marcos, A. Marsal, and M. Benidir, “The Propagator Method for Source Bearing Estimation,” Sig. Proc., vol. 42, pp. 121-138, Apr. 1995.

[7] T. Chen, Y. Hua, and W. Yan, “Global Convergence of Oja’s Subspace Algorithm for Principal Component Extraction,” IEEE Trans. on Neural Networks, pp. 58-67, Jan. 1998.

[8] A. Chkeif, K. Abed-Meraim, G. Kawas Kaleh, and Y. Hua, “Blind Adaptive Multiuser Detection With Antenna Array,” accepted for publication in IEEE Trans. on Comm.

[9] G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, 1996.

[10] L. Markus and H. Yamabe, “Global Stability Criteria for Differential Systems,” J. Osaka Math., vol. 12, pp. 305-317, 1960.

[11]  R.  E.  Vinograd,  “Remark  on  the  Critical  Case  of  Sta¬ 
bility  of  a  Singular  Point  in  the  Plane”  Doklady  Akad. 
Nauk,  vol.  101,  pp.  209-212,  1955. 



Per  Pelin,  Ramon  Brcich  and  Abdelhak  Zoubir 

Australian Telecommunications Research Institute¹ (ATRI), Curtin University of Technology,
GPO Box U 1987, Perth WA 6845, Australia. E-mail:


A crucial step in many signal processing applications is the determination of the effective rank of a noise-corrupted multi-dimensional signal, i.e., the dimension of the signal subspace. Standard techniques for rank estimation, such as the minimum description length, often have shortcomings in practice, an example being when noise parameters are unknown. An alternative scheme is proposed for rank detection. From successive pairs of the ordered eigenvalues of the array covariance, a series of statistics is formed. The statistics are chosen such that their distributions for noise eigenvalue pairs are close. The actual distributions are unknown and are estimated with the Bootstrap. The rank is then found by a sequential comparison of the estimated distributions using a Kolmogorov-Smirnov test.


Many signal processing algorithms, such as direction finding algorithms, rely on the low-rank structure of a multi-dimensional signal. The rank typically has an interpretation as the model order, revealing the number of signals hidden in noise, or the dimension of a low-order signal subspace. Therefore, finding the effective rank of a noise-corrupted signal is a crucial initial step in many applications.

Classical techniques to estimate the rank when the noise is Gaussian include the minimum description length (MDL) and Akaike's information theoretic criterion (AIC) [10], and their subjective counterpart, the sphericity test [2]. In the latter, a threshold is set to obtain a desired level of the test, whereas in the objective MDL and AIC, the actual threshold depends on the data size through asymptotic arguments. Nevertheless, they all rely on the structure of the noise eigenvalues of the covariance matrix, and it is required that the actual spatial noise color is known. If the noise assumptions are violated, for example, when the noise has an unknown spatial color, detection performance is degraded. For noise of unknown color, an alternative to eigenvalue-based tests is to use properties of canonical correlations [2], as in [11], [12]. However, these schemes put some restrictions on the structure of the data model, limiting their applicability.

1. This work was in part supported by the Australian Telecommunications Cooperative Research Centre (AT-CRC).

To mitigate the problem of slight uncertainties in the noise model, both w.r.t. possible non-Gaussianity and noise color, a new technique for rank detection is proposed. The detection procedure is based on a property of the marginal distributions of the noise sample eigenvalues. Instead of relying on parametric assumptions, these distributions are estimated from the data using the Bootstrap [5]. Based on these estimates, the distributions of a series of secondary variables are estimated, on which the actual rank estimation is performed using a robust Kolmogorov-Smirnov test [7]. The necessary number of Bootstrap resamples is surprisingly small, keeping the computational cost at a reasonable level.


Consider m-variate data according to the linear model

x(n) = A s(n) + v(n)  (1)

where A is a mixture matrix (for example, the array steering matrix in sensor array processing), s(n) is a vector of signals, and v(n) is noise from some possibly unknown distribution. Assuming the signal and noise are uncorrelated and zero-mean, the array covariance is

R_x = E[x(n) x^H(n)] = A R_s A^H + R_v.  (2)

The problem considered is to determine the rank of the signal part/subspace, i.e., d = rank(A R_s A^H), based on N observations of the data (1).

If the additive noise is spatially white, R_v = σ²I. The (population) eigenvalues of (2) are

λ_1 ≥ ... ≥ λ_d > λ_{d+1} = ... = λ_m = σ²,  (3)

i.e., the true noise eigenvalues are all equal. However, when calculated from the sample covariance

R̂_x = (1/N) Σ_{n=1}^{N} x(n) x^H(n),  (4)

estimated from a finite number N of data snapshots, the ordered sample eigenvalues are distinct with probability one, i.e.,

λ̂_1 > ... > λ̂_d > λ̂_{d+1} > ... > λ̂_m > 0.  (5)

0-7803-5988-7/00/$10.00 © 2000 IEEE

The distribution F_{N,λ}(λ) of (5) for a data sample of N snapshots, either in the form of a probability density function (PDF) or a cumulative distribution function (CDF), tends to take a very complex form. The sample eigenvalues are biased (as in (5)) and mutually correlated. The exact distribution is only known for the Gaussian case with certain population eigenvalues, and is given in the form of a series expansion [8]. For the general case, both w.r.t. the actual source distribution and the population eigenvalues, the distribution (joint or marginals) may only be available asymptotically, for large N [1], [8]. For small/moderate N, corresponding to many practical applications, the error in the asymptotics may be substantial. Thus, there is no general easy-to-use form of F_{N,λ}(λ) available.
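These finite-sample effects are easy to reproduce. The short sketch below (assuming NumPy, with illustrative values m = 6, N = 100) draws white-noise snapshots, forms the sample covariance as in (4), and shows that the ordered sample eigenvalues come out distinct and spread around the common population value σ² = 1:

```python
import numpy as np

rng = np.random.default_rng(0)
m, N = 6, 100        # illustrative array size and snapshot count
sigma2 = 1.0         # all population eigenvalues equal sigma2, as in (3) with d = 0

# N complex circular white-noise snapshots and the sample covariance (4)
X = (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))) / np.sqrt(2)
R_hat = (X @ X.conj().T) / N

# Ordered sample eigenvalues (5): distinct with probability one, and biased
lam = np.sort(np.linalg.eigvalsh(R_hat))[::-1]
print(lam)           # strictly decreasing values spread around 1.0
```

Even though every population eigenvalue equals 1, the top sample eigenvalue is biased upward and the bottom one downward, which is exactly the spread the detection scheme must cope with.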

Instead of relying on asymptotic results, which are unreliable on short data records, the detection scheme presented in the next section is based on an approximate relation between the marginal distributions of the noise sample eigenvalues. Specifically, numerical experiments indicate that for the white noise case (3), the marginal PDFs of the noise sample eigenvalues are approximately related as

f_{N,i}(λ) = f_N(κ^i λ),  i ≥ d + 1,  (6)

for some κ, i.e., the marginals f_{N,i}(λ_i), i ≥ d + 1, are simply scaled versions of the same basic PDF f_N(·). While there is no claim of generality for this approximation, it has been shown to be very precise when the ratio N/m is, say, five or higher. Also, what is important for detection based on this property is that the approximation is robust to slightly colored noise, and practically invariant to non-Gaussianity. Then, even if the data does not correspond perfectly to the assumed data model, (6) allows for robust rank detection. An example to illustrate (6) will be given in Section 4.


3.1.  Detection  principle 

To indicate how the relation (6) can be used for rank estimation, assume a number of m independent variables η_i having distributions identical to the marginal distributions of the sample eigenvalues λ̂_i. From the η_i, m − 1 secondary variables ν_i are formed as the ratios

ν_i = η_i / η_{i+1},  i ∈ [1, m − 1].  (7)

Then, up to the order of the approximation given in (6), the ν_i for i ∈ [d + 1, m − 1] will have identical distributions, as these ν_i are invariant to the (possibly unknown) scaling κ. However, ν_d = η_d / η_{d+1}, involving the marginal of the smallest signal eigenvalue, will tend to take larger values than ν_{d+1}. This forms the basis for rank detection: if the marginals can be captured from the data x(n), n ∈ [1, N], the order d can be estimated by testing for equality among the distributions of ν_i, i ∈ [1, m − 1].

A practical algorithm to exploit this property for rank estimation is as follows:

1. Use the Bootstrap to first estimate the marginals of the sample eigenvalues f_{N,i}(λ_i), i ∈ [1, m], and then the distributions of ν_i, i ∈ [1, m − 1].

2. Apply the Kolmogorov-Smirnov test [7] to test for pair-wise equality of the distributions F_{ν,i}(ν_i) of the ν_i, starting from the bottom (F_{ν,m−2}(ν_{m−2}) versus F_{ν,m−1}(ν_{m−1})), and stepping up until equality is rejected.

Before going into the full details of the scheme, it is necessary to establish how the Bootstrap behaves when resampling data to calculate eigenvalues.

3.2.  The  Bootstrap  and  eigenvalues 

The Bootstrap is a general tool for estimating the distribution of a statistic from a sample of data. In this case the Bootstrap is employed to estimate F_{N,λ}(λ). The principle of the Bootstrap is as follows. The original data x(n), n ∈ [1, N], i.e.,

X_N = [x(1), ..., x(N)],  (8)

is an estimate of the distribution of x(n). Assigning each snapshot a probability 1/N, resamples are taken randomly (with replacement) from X_N, giving Bootstrap data

X*_N = [x*(1), ..., x*(N)].  (9)

From the Bootstrap resample X*_N, the sample eigenvalues are calculated (through (4)), giving

λ* = [λ*_1, ..., λ*_m]  (10)

with λ*_1 ≥ ... ≥ λ*_m. The procedure is repeated a number B of times. Then, the Bootstrap distribution derived from the B replicates of (10) is a nonparametric estimate of F_{N,λ}(λ).
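The resampling loop (8)-(10) can be sketched as follows (NumPy; the values of B and M and the white-noise data generation are illustrative choices, not values prescribed at this point in the text):

```python
import numpy as np

rng = np.random.default_rng(1)
m, N = 6, 100
# illustrative data: complex white-noise snapshots forming X_N as in (8)
X = (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))) / np.sqrt(2)

B, M = 50, 75                  # number of resamples and resample size (M < N)
lam_star = np.empty((B, m))    # B replicates of the ordered eigenvalues (10)
for b in range(B):
    idx = rng.integers(0, N, size=M)   # draw M snapshots with replacement, as in (9)
    Xb = X[:, idx]
    Rb = (Xb @ Xb.conj().T) / M        # sample covariance (4) of the resample
    lam_star[b] = np.sort(np.linalg.eigvalsh(Rb))[::-1]

# The empirical distribution over the B rows estimates the marginals F_{M,i}
print(lam_star.mean(axis=0))
```

Each row of `lam_star` is one replicate of (10); histograms or staircase CDFs over the B rows give the nonparametric marginal estimates used below.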

As the sample eigenvalues are highly non-linear functions of the data sample, results on the Bootstrap w.r.t. linear statistics do not apply. However, some results on the properties of eigenvalues calculated from resampled data can be found in [3], [4]:

• For distinct population eigenvalues, F*_{N,λ}(λ) converges asymptotically to F_{N,λ}(λ).

• For equal population eigenvalues (such as in the white noise case), F*_{N,λ}(λ) does not converge to F_{N,λ}(λ). However, if resamples are taken of size M < N from X_N, such that M → ∞ as N → ∞ while M/N → 0, then F*_{M,λ}(λ) converges weakly to F_{M,λ}(λ), i.e., the distribution of the eigenvalues of a sample x(n), n ∈ [1, M].

From numerical experiments it is easily seen that the major problem with the Bootstrap is to characterize the dependence between sample eigenvalues: while the Bootstrap does a good job of capturing the marginals, the dependence between the sample eigenvalues is not maintained in F*_{N,λ}(λ) for reasonable N. This motivates the use of the marginals only. Also, a full characterization of the joint m-dimensional distribution would require a very large data record (N). By only considering the marginals, a much smaller data size is required. It is also worth considering resamples of size M < N. This relaxes the strong dependence on the actual data X_N somewhat, which seems to remove some erratic behavior seen for small sample sizes.

3.3.  Detection  scheme 

The full estimation/detection procedure is as follows:

1. Estimate the marginal distributions f_{M,i}(λ_i), i ∈ [1, m], by taking B resamples of size M from the data X_N. For each resample, calculate the sample eigenvalues λ* (10).

2. Estimate the distributions of ν_i, i ∈ [1, m − 1] (7). To do this, note that in place of the fictitious independent variables η_i, i ∈ [1, m], sample eigenvalues λ* from different resamples can be used (the sample eigenvalues from one resample are correlated). Thus, form

ν*_i = (λ*_i)_l / (λ*_{i+1})_k,  i ∈ [1, m − 1],  (11)

with l and k being different resamples. Although an arbitrary number of resamples (B_ν) of (11) could be taken, it is sensible to use all B λ* from step 1 in a systematic way. Estimate the CDFs of ν_i, i ∈ [1, m − 1], by the staircase approximation

F̂_{ν,i}(x) = number of (ν*_i ≤ x) / B.  (12)

3. Determine the test statistics for the one-sided Kolmogorov-Smirnov (KS) test from the distributions (12),

T_i = sup_x ( F̂_{ν,i+1}(x) − F̂_{ν,i}(x) ),  (13)

for i ∈ [1, m − 2]. Under the hypothesis that F_{ν,i+1}(x) and F_{ν,i}(x) are equal, the test statistic T_i is asymptotically distributed as [7]

P(√B T_i ≤ x) → 1 − exp(−2x²)  (14)

for x ≥ 0.
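A minimal sketch of (12)-(14) in NumPy (the function names are my own, and the two input samples are illustrative draws from a common distribution, mimicking a noise-only ratio pair):

```python
import numpy as np

def staircase_cdf(samples, x):
    """Staircase CDF (12): fraction of samples <= x, evaluated at the points x."""
    return np.searchsorted(np.sort(samples), x, side="right") / len(samples)

def ks_one_sided(nu_next, nu_i):
    """One-sided KS statistic (13): sup_x ( F_{nu,i+1}(x) - F_{nu,i}(x) ).
    The supremum of a difference of step functions is attained at sample points."""
    grid = np.sort(np.concatenate([nu_next, nu_i]))
    return np.max(staircase_cdf(nu_next, grid) - staircase_cdf(nu_i, grid))

def asymptotic_level(T, B):
    """Null CDF (14): P(sqrt(B) * T <= x) -> 1 - exp(-2 x^2) for x >= 0."""
    return 1.0 - np.exp(-2.0 * (np.sqrt(B) * T) ** 2)

rng = np.random.default_rng(2)
B = 25
nu_a = rng.normal(1.0, 0.1, B)   # two noise-only ratio samples: same distribution
nu_b = rng.normal(1.0, 0.1, B)
T = ks_one_sided(nu_a, nu_b)
print(T, asymptotic_level(T, B))
```

For two samples from the same distribution, T stays small and its level under (14) stays well below a threshold such as γ = 0.7; a signal/noise pair pushes T toward one.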

4. Final step. Determine the rank d from a sequential test on the KS statistics:

I. Set i = m − 2.

II. Define the null hypothesis H: d = i, and the alternative hypothesis K: d < i.

III. Set a threshold γ based on the tail area of the distribution (14) of (13) under K [7].

IV. If T_i > γ, accept H (i.e., reject equality of distributions) and stop; else set i = i − 1 and return to II.

Note that in order to enable a correct decision, the test procedure requires that there be at least two noise eigenvalues.
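Putting steps 1-4 together, a self-contained sketch of the whole procedure might look as follows (NumPy; the pairing of resamples in (11) via a cyclic shift, and all simulation parameters in the usage example, are my own illustrative choices rather than values fixed by the text):

```python
import numpy as np

def detect_rank(X, B=25, M=None, gamma=0.7, rng=None):
    """Sketch of the Bootstrap/KS rank detector (steps 1-4)."""
    rng = rng if rng is not None else np.random.default_rng()
    m, N = X.shape
    M = M if M is not None else (3 * N) // 4      # suggested trade-off M = 3N/4

    # Step 1: B Bootstrap replicates of the ordered sample eigenvalues (10)
    lam = np.empty((B, m))
    for b in range(B):
        Xb = X[:, rng.integers(0, N, size=M)]
        lam[b] = np.sort(np.linalg.eigvalsh((Xb @ Xb.conj().T) / M))[::-1]

    # Step 2: ratios (11), pairing each resample l with a different resample k
    shift = np.roll(np.arange(B), 1)
    nu = lam[:, :-1] / lam[shift, 1:]             # column i holds replicates of nu_{i+1}

    # Step 3: one-sided KS statistics (13) between neighbouring ratio CDFs
    def ecdf(s, x):
        return np.searchsorted(np.sort(s), x, side="right") / len(s)
    T = np.empty(m - 2)
    for i in range(m - 2):
        grid = np.sort(np.concatenate([nu[:, i], nu[:, i + 1]]))
        T[i] = np.max(ecdf(nu[:, i + 1], grid) - ecdf(nu[:, i], grid))

    # Step 4: sequential test from the bottom pair upwards; T[i-1] is the paper's T_i
    for i in range(m - 2, 0, -1):
        if T[i - 1] > gamma:
            return i          # accept H: d = i
    return 0                  # no separation detected

# Illustrative use: 6-element ULA, d = 2 Gaussian sources in white noise
rng = np.random.default_rng(3)
m, N = 6, 100
A = np.exp(1j * np.pi * np.outer(np.arange(m), np.sin(np.deg2rad([10.0, 25.0]))))
S = (rng.standard_normal((2, N)) + 1j * rng.standard_normal((2, N))) * np.sqrt(5 / 2)
V = (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))) / np.sqrt(2)
d_hat = detect_rank(A @ S + V, rng=np.random.default_rng(4))
print(d_hat)
```

The cyclic shift guarantees that the numerator and denominator eigenvalues in (11) come from different resamples, as the text requires, while still reusing all B replicates from step 1.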

There are a number of parameters to be tuned/chosen in the scheme. First, consider the resample size M. A smaller M tends to improve the estimate of f_{M,i}(λ_i). However, a small M leads to a loss in the signal-to-noise ratio (SNR) detection threshold (i.e., the minimum SNR required for reliable rank detection), as the relative distance between f_{M,d}(λ_d) and f_{M,d+1}(λ_{d+1}) decreases with a decreasing M. For a data size N of order O(10²), a reasonable trade-off is M = 3N/4.

The number of Bootstrap resamples B has an impact on the estimated distributions and is therefore a crucial parameter. Some guidelines on the impact of the number of Bootstraps B can be found in [5], [6]. Unfortunately, no results are given in absolute terms. However, note that the proposed detection scheme does not require any critical values to be estimated with high precision. What is important is that the locations of the distributions of the ν_i are estimated with sufficient accuracy for the subsequent KS test to work properly. Thus, B should be large enough that the means of ν*_i are reasonably stable on a normalized scale. A coarse first-order approximation of E[ν̄*_i] gives

E[ν̄*_i] ≈ λ̄*_i / λ̄*_{i+1},  (15)

i.e., the stability of the location depends to a large extent on the location of λ_i. To arrive at an expression for the necessary B, note that the sample eigenvalues are reasonably close to Gaussian. The separation (bias) of two sample eigenvalues corresponding to equal population eigenvalues is roughly two times the standard error, see Figure 1 (note that this relation holds regardless of M). Now, the standard error of the sample mean of B iid Gaussian variables is σ/√B. Thus, the location error of f_{M,i}(λ_i), normalized to the separation of neighbouring distributions, is of order

(σ/√B) / (2σ) = 1 / (2√B).


As an example, with B = 25 the location error is of order 0.1, which is small enough for reliable detection. Note that there is no point in using too large a B, as the error originating from the approximation (6) will then dominate the 'randomness' in T_i.

The final parameter to be chosen is the threshold γ for the KS test in Step 4. This threshold can be determined in two ways. First, γ can be set to maintain a desired level of the test at each sequential stage (as in the sphericity test), based on the distribution (14) of the test statistic (13) under K (the hypothesis that the distributions are equal). Alternatively, γ can be set for 'MDL-like consistency'. To see this, note that T_i → 1 rapidly under H for increasing SNR or N. At the same time, under K, the tail probability of T_i is small even for modest γ. Thus, γ can be set to provide a probability of detection very close to one, without much penalty in the SNR threshold. As an example, with B = 25, the 95% level under K is γ = 0.35. With γ = 0.7, the level is 99.9995%.



Figure 1. CDFs of a) sample eigenvalues, b) scaled sample eigenvalues, c) scaled Bootstrap eigenvalues, and d) the test variables (λ*_i)_l / (λ*_{i+1})_k.


The detection scheme relies on the validity of the assumption (6). To illustrate the principle of the test, data was generated according to the model (1): a 6-element uniform linear array with half-wavelength element spacing receives d = 2 uncorrelated Gaussian signals from directions [10°, 25°] relative to the array broadside. The signals were observed in white Gaussian noise with an element SNR of −3 dB. Figure 1a shows the marginal CDFs of the 6 sample eigenvalues, calculated from N = 100 independent array snapshots. Figure 1b shows the CDFs when the sample eigenvalues have been pre-scaled with κ^{i−4} (relative to eigenvalue number four) as in (6). In this case, κ = 1.21, and the scaled noise CDFs are all very close, with a largest pair-wise separation |F_{κ,i} − F_{κ,i+1}| of 0.11 for i > d.

Similarly, Figure 1c shows the CDFs of scaled (κ = 1.36) Bootstrap eigenvalues, estimated from B = 50 resamples of size M = 75, taken from one data realization of N = 100 snapshots. Clearly, the Bootstrap eigenvalues are slightly more variable, which is due both to M < N and to the effective loss in sample size from resampling. Again, the noise CDFs are close, but with some random fluctuations due to the limited number B. However, note that even with an infinite number of Bootstraps, there will still be a remaining error due to the approximation (6) (as in Figure 1b), as well as the limitation of the Bootstrap itself [3], [4]. Finally, the CDFs of the variables (11), calculated from the B = 50 sets of Bootstrap eigenvalues, are shown in Figure 1d. These are the CDFs on which the KS test is based. The 'noise only' CDFs are close, while the CDF of ν*_d = λ*_d / λ*_{d+1} is the rightmost; with increasing SNR or data size this CDF moves further to the right. Clearly, the KS test can easily decide on the correct rank from the separation of the CDFs. Note that the dashed CDF is due to the two signal eigenvalues.

Figure 2. The probability of correctly estimating the rank (d = 2) versus the SNR, for various spatial noise color: a) Proposed scheme, b) MDL.

In the ideal case, with white Gaussian noise, the performance of the proposed scheme is virtually identical to MDL and the sphericity test (depending on how the threshold is set; for 'consistency', or for a fixed level) in terms of SNR and data-size thresholds, and the ability to resolve closely spaced targets. Instead, the power of the new method lies in its robustness to unknown noise color. To illustrate, data was generated according to Figure 1, but varying the spatial noise color. Specifically, the (k,l)th element of the noise covariance matrix in (2) was (R_v)_{kl} = exp(−α|k − l|), with α being the parameter to be varied. For the detection procedure, B = 25 resamples of size M = 75 were taken from each original data set of size N = 100. The actual B is at a boundary: a smaller B leads to a penalty in SNR threshold, whereas a larger one gives no further improvement. The threshold γ was set to 0.7 for 'consistent' detection. The performance of the proposed scheme as well as MDL as a function of the SNR is shown in Figures 2a-b, for α ∈ {0, 0.1, 0.2, 0.3}. It is seen that the proposed scheme maintains good detection performance for increasing α, though there is a penalty in the low-SNR threshold. This is caused by the distributions of the noise eigenvalues being further separated for an increasing α, leading to a reduction in the SNR margin. Increasing α beyond 0.3 causes a more substantial degradation, as the approximation (6) is no longer good. The performance of MDL suffers at comparatively small α.


Figure 3. The probability of correctly estimating the rank (d = 2) versus the SNR, for various temporal noise color: a) Proposed scheme, b) MDL.

Similarly to an unknown spatial noise correlation, an unknown temporal noise correlation may also lead to a degradation in detection performance. The above experiment was repeated with spatially white but temporally colored noise, having a temporal covariance of r(τ) = exp(−α|τ|), with varying α. With temporally colored noise, the data is no longer iid. For a better result with the Bootstrap, resampling was performed using block resampling [9]. The resample size was M = 70, with each resample made up of 7 random sections of 10 consecutive snapshots from the original data (with replacement). The number of resamples was increased to B = 50.
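The block resampling step can be sketched as below (NumPy; `block_resample` is a hypothetical helper name, using the 7 x 10 block layout of the experiment):

```python
import numpy as np

def block_resample(X, block_len=10, n_blocks=7, rng=None):
    """Draw n_blocks random runs of block_len consecutive snapshots (with
    replacement) and concatenate them, preserving short-range dependence."""
    rng = rng if rng is not None else np.random.default_rng()
    m, N = X.shape
    starts = rng.integers(0, N - block_len + 1, size=n_blocks)
    return np.concatenate([X[:, s:s + block_len] for s in starts], axis=1)

rng = np.random.default_rng(5)
X = rng.standard_normal((6, 100))        # illustrative data, N = 100 snapshots
Xb = block_resample(X, rng=rng)          # resample of size M = 7 * 10 = 70
print(Xb.shape)                          # (6, 70)
```

Because whole runs of consecutive snapshots are kept together, the short-range temporal correlation of the noise survives in each resample, which is what makes the eigenvalue marginals meaningful here.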

The results for various SNR and α are shown in Figures 3a-b. As seen, MDL loses performance with an increasing α. This is easily explained, as a temporal correlation reduces the effective data size. With the 'true' data size N being incorrect, the penalty term in MDL will be erroneous. On the other hand, the proposed technique does not rely directly on N, making it robust to the temporal noise color.

[1] T. W. Anderson, 'Asymptotic Theory for Principal Component Analysis', Ann. Math. Statist., vol. 34, 1963.

[2] T. W. Anderson, 'An Introduction to Multivariate Statistical Analysis, 2nd ed.', Wiley, 1984.

[3] R. Beran, M. S. Srivastava, 'Bootstrap Tests and Confidence Regions for Functions of a Covariance Matrix', Annals of Statistics, vol. 13, no. 1, 1985.

[4] R. Beran, M. S. Srivastava, 'Correction: Bootstrap Tests and Confidence Regions for Functions of a Covariance Matrix', Annals of Statistics, vol. 15, no. 1, 1987.

[5] B. Efron, R. Tibshirani, 'An Introduction to the Bootstrap', Chapman and Hall, 1993.

[6] P. Hall, 'On the Number of Bootstrap Simulations Required to Construct a Confidence Interval', Annals of Statistics, vol. 14, no. 4, 1986.

[7] E. B. Manoukian, 'Modern Concepts and Theorems of Mathematical Statistics', Springer, 1986.

[8] R. J. Muirhead, 'Latent Roots and Matrix Variates: A Review of Some Asymptotic Results', Annals of Statistics, vol. 6, no. 1, 1978.

[9] D. N. Politis, 'Computer-Intensive Methods in Statistical Analysis', IEEE Signal Processing Magazine, Jan. 1998.

[10] B. Porat, 'Digital Processing of Random Signals', Prentice Hall, 1994.

[11] P. Stoica, M. Cedervall, 'Detection Tests for Array Processing in Unknown Correlated Noise Fields', IEEE Trans. Signal Processing, vol. 45, no. 9, Sept. 1997.

[12] Q. Wu, K. M. Wong, 'Determination of the Number of Signals in Unknown Noise Environments: PARADE', IEEE Trans. Signal Processing, vol. 43, no. 1, Jan. 1995.


A new technique for rank estimation has been presented. While giving performance similar to classical well-known techniques under ideal conditions, the new method, based on the Bootstrap, is robust to errors in the noise model. The price for robustness is an increase in computational complexity. However, as the number of Bootstrap replications is fairly small, this increase is modest.



Yuri  I.  Abramovich,  Nicholas  K.  Spencer 

Cooperative Research Centre for Sensor Signal and Information Processing (CSSIP),
SPRI Building, Technology Park Adelaide, Mawson Lakes, South Australia, 5095, Australia

yuri@cssip.edu.au  nspencer@cssip.edu.au


We introduce a new approach to the detection-estimation problem for sparse linear antenna arrays comprising M identical sensors whose positions may be noninteger values (expressed in half-wavelength units). This approach considers the (noninteger) M_a-element co-array as the most appropriate virtual array to be used in connection with the augmented covariance matrix. Since the covariance matrices derived from such virtual arrays are usually very underspecified, we discuss a maximum-likelihood (ML) completion philosophy to fill in the missing elements of the partially specified Hermitian covariance matrix. Next, a transformation of the resulting unstructured ML matrix results in a sequence of properly structured positive-definite Hermitian matrices, each with its (M_a − p) smallest eigenvalues equal, appropriate for the candidate number of sources p. For each candidate model (p = 1, ..., M_a − 1), we then find the set of directions-of-arrival (DOA's) and powers that yield the minimum fitting error for the specified covariance lags in the neighbourhood of the MUSIC-initialised DOA's. Finally, these models describe hypotheses with respect to the actual number of sources, and allow us to select the 'best' hypothesis using traditional information criteria (AIC, MDL, MAP, etc.) that are based on the likelihood ratio.


In our previous papers [5, 3, 2, 4], we introduced a new technique for detection-estimation of more uncorrelated Gaussian sources m than sensors M (m > M) for the class of integer-spaced arrays. Here, we present one attempt to extend this approach to the class of noninteger-spaced nonuniform linear arrays (NLA's). Since such arrays generate up to ½M(M − 1) distinct nonzero covariance lags, they have the potential [8] to estimate a superior number of uncorrelated Gaussian sources, i.e., for the number of sources in the range

M < m ≤ ½M(M − 1).  (1)

For a known number of sources m, we previously introduced [6] a DOA estimation technique capable of handling these superior scenarios. The current problem of detection-estimation is more complicated, since we now require both an estimate of the number of sources and their DOA's.

Naturally, this problem has a solution if and only if the identifiability conditions hold, which in this case means that the observed set of covariance lags generated by the NLA can be uniquely decomposed into some number of signal dyads plus white noise. While the nonidentifiability conditions for detection are given in [4], here we concentrate on identifiable scenarios only; that is, for the true (deterministic) covariance lags and the chosen virtual array, the partially specified covariance matrix has a unique completion that corresponds to a mixture of m uncorrelated plane waves in white noise.

In practice, when the observed specified covariance lags are stochastic, being produced by a sample M-variate covariance matrix, the feasibility conditions for our type of positive-definite (p.d.) completion are not guaranteed. Therefore, in order to achieve a p.d. completion with equalised (M_a − m) minimum eigenvalues, even the specified (measured) covariance lags need to be modified. Clearly, by not limiting the size of the modification of the specified lags, we can achieve a p.d. completion with the desired number (M_a − p) of minimum eigenvalues being equal.

Note that for a Hermitian matrix to represent a mixture of p uncorrelated plane waves in noise, the equality of the (M_a − p) smallest eigenvalues is only a necessary condition (whereas it is the necessary and sufficient condition for a Toeplitz matrix). Thus some further modification of the specified covariance lags is required in order to correctly model the sources, along with an appropriate completion of the missing (unspecified) covariance lags.

In this way, we finally obtain a number of candidate models, i.e., M_a-variate p.d. Hermitian matrices of the proper structure, which are then compared with the ML completion discussed below using traditional information criteria that judge a loss in likelihood ratio against an overestimated number of sources.


Consider m narrow-band plane-wave signals of power p = [p_1, ..., p_m] impinging upon a nonuniform linear array of M identical omnidirectional sensors located at positions d = [d_1 = 0, d_2, ..., d_M], measured in half-wavelength units. In the detection-estimation problem, the number of sources m is unknown. Adopting the commonly used data model [12], we have

y(t) = S(θ) x(t) + η(t)  for t = 1, ..., N  (2)

where

x(t) = [x_1(t), ..., x_m(t)]^T  (3)

y(t) = [y_1(t), ..., y_M(t)]^T  (4)

η(t) = [η_1(t), ..., η_M(t)]^T,  (5)

x_j(t) (j = 1, ..., m) is the complex signal amplitude of the jth plane wave, and y_k(t) and η_k(t) (k = 1, ..., M) are the sensor output and the noise at the kth sensor respectively. To permit DOA estimation in the superior case (m > M), we restrict ourselves to the class of independent (Gaussian) signal amplitudes x(t) ∈ C^{m×1} such that

E{x_j(t) x*_k(t′)} = p_j δ_{jk} δ_{tt′},  j, k = 1, ..., m.  (6)

We assume that the additive noise η(t) ∈ C^{M×1} is white and Gaussian:

E{η(t) η^H(t′)} = p_0 I_M δ_{tt′}.  (7)

The array manifold matrix is S(θ) = [s(θ_1), ..., s(θ_m)] ∈ C^{M×m}, where each constituent 'steering vector' s(θ_j) is defined as

s(θ_j) = [1, exp(iπ d_2 w_j), ..., exp(iπ d_M w_j)]^T  (8)

with w = sin θ ∈ [−1, 1].

According to this model, the M-variate spatial covariance matrix

R = S P S^H + p_0 I_M  (9)

is p.d. Hermitian. Note that in our ('superior') case of m > M, the noise-free covariance matrix S P S^H is generally of full rank. Given N independent samples ('snapshots'), the sufficient statistic for DOA estimation is the M-variate direct data covariance (DDC) matrix

R̂ = (1/N) Σ_{t=1}^{N} y(t) y^H(t).  (10)
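The model (2)-(10) is straightforward to simulate. The sketch below (NumPy; the DOAs, source powers, and snapshot count are illustrative choices) builds the manifold (8), the covariance model (9), and the DDC matrix (10):

```python
import numpy as np

d = np.array([0.0, 1.09, 3.96, 5.93])   # sensor positions in half-wavelengths
theta = np.deg2rad([10.0, 25.0])        # illustrative DOAs
w = np.sin(theta)                       # w = sin(theta) in [-1, 1]

# Array manifold (8): S[k, j] = exp(i*pi*d_k*w_j); d_1 = 0 gives the leading 1
S = np.exp(1j * np.pi * np.outer(d, w))

p = np.array([1.0, 0.5])                # illustrative source powers
p0 = 0.1                                # noise power
R = S @ np.diag(p) @ S.conj().T + p0 * np.eye(len(d))   # covariance model (9)

# DDC matrix (10) from N snapshots drawn according to (2)-(7)
rng = np.random.default_rng(6)
N = 200
x = (rng.standard_normal((2, N)) + 1j * rng.standard_normal((2, N))) \
    * np.sqrt(p / 2)[:, None]           # independent amplitudes with powers p_j
eta = (rng.standard_normal((4, N)) + 1j * rng.standard_normal((4, N))) * np.sqrt(p0 / 2)
Y = S @ x + eta
R_ddc = (Y @ Y.conj().T) / N            # converges to R as N grows
print(np.linalg.norm(R_ddc - R))
```

With m = 2 < M = 4 here the signal part of R is rank-deficient; in the superior case m > M discussed in the text, the noise-free part S P S^H becomes full rank.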

To illustrate our technique, consider the "quasi-integer" [6] four-element NLA

d_4 = [0, 1.09, 3.96, 5.93]  (11)

that may be easily recognised as a slightly perturbed version of the optimal four-element integer array [10] d = [0, 1, 4, 6]. In [6], we demonstrated that up to six independent sources could be unambiguously identified by the NLA d_4. The co-array of d_4 (the sorted set of nonduplicated position differences) is

c_4 = [0, 1.09, 1.97, 2.87, 3.96, 4.84, 5.93]  (12)
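The co-array computation in (12) is straightforward to reproduce; a minimal sketch (the function name is ours):

```python
import numpy as np

def coarray(d):
    # sorted set of nonduplicated pairwise position differences |d_j - d_k|
    d = np.asarray(d, dtype=float)
    return np.unique(np.round(np.abs(d[:, None] - d[None, :]), 10))

c4 = coarray([0.0, 1.09, 3.96, 5.93])
assert np.allclose(c4, [0.0, 1.09, 1.97, 2.87, 3.96, 4.84, 5.93])   # matches (12), so Ma = 7
```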

and so the augmented M_a = 7-variate Hermitian covariance matrix for the virtual array c_4 is extremely sparse in its specified entries.

[The displayed equation (13), showing the incomplete augmented matrix H with its specified and unspecified entries, is not legible in this copy.]
Nevertheless, it is important to understand that for the true covariance lags r_0, ..., r_{5.93}, identifiability means that there exists a single p.d. completion of H with equalised (M_a − μ) minimum eigenvalues for any scenario with m ≤ 6 independent sources.

Let S be the set of specified elements {p, q}, and S̄ be the set of unspecified elements in the initial incomplete augmented covariance matrix H. Suppose for the moment that, given the specified sample covariance lags r̂_S, we somehow generate a set of candidate p.d. M_a-variate Hermitian matrices H_μ (μ = 1, ..., M_a − 1) that each correspond to the model of μ plane waves in noise. To select the best candidate model, we calculate the likelihood ratio (LR) for each corresponding


M-variate Hermitian matrix

R_μ = L H_μ L^T,  (14)

where L is the M × M_a binary selection (or incidence) matrix, with the entry in the jth row and the column corresponding to the physical sensor position d_j equal to unity, and zero otherwise.

If we use the sphericity test [11]

H_0 : E{R_μ^{−1/2} R̂ R_μ^{−1/2}} = p_0 I_M
H_1 : E{R_μ^{−1/2} R̂ R_μ^{−1/2}} ≠ p_0 I_M,  p_0 > 0,  (15)

then the LR is

γ(H_μ) = [ det(R_μ^{−1/2} R̂ R_μ^{−1/2}) / ((1/M) tr(R_μ^{−1/2} R̂ R_μ^{−1/2}))^M ]^N.  (16)



Now information-theoretic or Bayesian criteria may be used for model selection, such as the minimum description length (MDL)

μ̂_MDL = arg min_μ { −log γ(H_μ) + (ν_μ/2) log N },  (17)

where ν_μ is the number of free parameters in the μ-source model.


Obviously this approach is optimal only if H_μ is the ML estimate of the p.d. Hermitian matrix with equal (M_a − μ) minimum eigenvalues. Since exact ML estimates of this kind are not yet available, our problem is to generate a set of above-described Hermitian matrices H_μ sufficiently close to the sufficient statistic R̂ in the ML sense.


In [6], we introduced several p.d. Hermitian completions, including maximum-entropy (ME) completion. These completions are used here as an initialisation step for the following optimisation routine. Let the general virtual array d' be specified by the virtual sensor positions d'_j (j = 1, ..., M_a); then the set of all possible p.d. Hermitian completions H may be written as

H = { z : H(z) = H_0 + Σ_{pq∈S̄, p<q} (Re H_pq E^Re_pq + i Im H_pq E^Im_pq) > 0 }  (18)

z = [ {Re H_pq}, {Im H_pq} ]^T_{pq∈S̄, p<q}  (19)

E^Re_pq = e_p e_q^T + e_q e_p^T,  E^Im_pq = e_p e_q^T − e_q e_p^T,  (20)

where e_p = [0, ..., 0, 1, 0, ..., 0]^T is the M_a-variate basis vector with a unit entry in the pth position, and H_0 is the initial completion (e.g. ME completion).

Suppose we label each of the missing lags (pq ∈ S̄; p < q) from 1 to ℓ, the total number of missing lags. For nonredundant NLA geometries such as d_4, the number of missing lags is rather large:

ℓ = ½ (ν − 1)(ν − 2),  (21)

where ν = ½ M(M − 1) + 1. Now, instead of (18), we may write

H = { z : H(z) = H_0 + Σ_{j=1}^{2ℓ} z_j F_j > 0 }  (22)

z_j = { Re H_pq, pq ∈ S̄, p < q,  for j = 1, ..., ℓ
        Im H_pq, pq ∈ S̄, p < q,  for j = ℓ+1, ..., 2ℓ  (23)

F_j = { E^Re_pq   for j = 1, ..., ℓ
        i E^Im_pq  for j = ℓ+1, ..., 2ℓ.  (24)
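A minimal sketch of the parameterisation (20)-(24), assuming the quasi-integer array of (11) (so M = 4, hence ν = 7 and ℓ = 15 by (21)); the helper name and the example indices are ours:

```python
import numpy as np

def basis_pair(p, q, Ma):
    # E_pq^Re = e_p e_q^T + e_q e_p^T and E_pq^Im = e_p e_q^T - e_q e_p^T, cf. (20)
    E = np.zeros((Ma, Ma))
    E[p, q] = 1.0
    return E + E.T, E - E.T

M = 4
nu = M * (M - 1) // 2 + 1        # nu = M(M-1)/2 + 1 distinct lags of a nonredundant NLA
ell = (nu - 1) * (nu - 2) // 2   # number of missing lags, eq. (21)
assert (nu, ell) == (7, 15)

# any real coefficients z produce a Hermitian perturbation H0 + sum_j z_j F_j, cf. (22)-(24)
Re, Im = basis_pair(1, 4, nu)
F = [Re, 1j * Im]
z = np.array([0.3, -0.2])
dH = sum(zk * Fk for zk, Fk in zip(z, F))
assert np.allclose(dH, dH.conj().T)   # perturbation stays Hermitian
```

This is why the 2ℓ real variables z_j can be optimised freely: Hermitian symmetry of the completion is built into the basis matrices F_j.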

For sufficiently small z_k in (22),

|z_k| ≤ ε  for k = 1, ..., 2ℓ,  (25)

we may treat the term Σ_{k=1}^{2ℓ} z_k F_k as being equal to a perturbation matrix δH(z), and find a first-order expansion for the eigenvalues [9]. According to the sphericity LR (16), the problem of ML maximisation is associated with the problem of eigenvalue equalisation in the matrix

G(z) = R̂^{−1/2} [L H(z) L^H] R̂^{−1/2}.  (26)

By applying a first-order expansion to the eigenvalues of G(z):

G(z) = G_0 + Σ_{k=1}^{2ℓ} z_k R̂^{−1/2} L F_k L^H R̂^{−1/2},  (27)

we can derive that

λ_g[G(z)] = λ_g[G_0] + Σ_k z_k u_g^{(0)H} R̂^{−1/2} L F_k L^H R̂^{−1/2} u_g^{(0)},  (28)

where u_g^{(0)} (g = 1, ..., M) is the gth eigenvector of the matrix G_0, with corresponding eigenvalue λ_g[G_0]. Now we can introduce the (M × 2ℓ) matrix

D^{(0)} = ( u_g^{(0)H} R̂^{−1/2} L F_k L^H R̂^{−1/2} u_g^{(0)} ),  g = 1, ..., M;  k = 1, ..., 2ℓ,  (29)


and our search to find sufficiently small perturbations (|z_k| ≤ ε) that minimise the difference between the (M − μ) smallest eigenvalues of G(z) may then be formulated as the following linear programming (LP) problem:

Find min (α − β) subject to  (30)
λ^{(0)} + D^{(0)} z ≤ α 1,  α > 0  (31)
λ^{(0)} + D^{(0)} z ≥ β 1,  β > 0  (32)
−ε ≤ z_k ≤ ε  for k = 1, ..., 2ℓ,  (33)

where λ^{(0)} is the vector of noise-subspace eigenvalues

λ^{(0)} = [λ^{(0)}_{μ+1}, ..., λ^{(0)}_M]^T  (34)

and 1 = [1, ..., 1]^T. Let the solution of this LP problem be z^{(0)}; then we define an updated Hermitian matrix

H^{(1)} = H^{(0)} + Σ_{k=1}^{2ℓ} z_k^{(0)} F_k,  (35)

and so by direct decomposition of

G^{(1)} = R̂^{−1/2} L H^{(1)} L^H R̂^{−1/2}  (36)

we may check the validity of the constraints (33), and decrease the perturbation "step size" ε if our equalisation step failed to improve the current differences amongst the noise-subspace eigenvalues of the matrix G^{(1)}. If the validity conditions are met, then we compute the associated u_g^{(1)} and λ^{(1)} and then solve the iterated LP problem. Suppose that k iterations are required before this procedure essentially reaches its final stable point.
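One equalisation step in the spirit of (30)-(33) can be sketched with scipy.optimize.linprog; the eigenvalue vector and sensitivity matrix below are toy stand-ins, not the paper's array quantities, and the function name is ours:

```python
import numpy as np
from scipy.optimize import linprog

def equalisation_step(lam0, D, eps):
    """One LP step of (30)-(33): find |z_k| <= eps minimising the spread
    (alpha - beta) of the linearised eigenvalues lam0 + D @ z.
    Decision variables are stacked as [z_1, ..., z_K, alpha, beta]."""
    n, K = D.shape
    c = np.concatenate([np.zeros(K), [1.0, -1.0]])            # minimise alpha - beta
    A1 = np.hstack([D, -np.ones((n, 1)), np.zeros((n, 1))])   # lam0 + D z <= alpha 1
    A2 = np.hstack([-D, np.zeros((n, 1)), np.ones((n, 1))])   # lam0 + D z >= beta 1
    res = linprog(c, A_ub=np.vstack([A1, A2]),
                  b_ub=np.concatenate([-lam0, lam0]),
                  bounds=[(-eps, eps)] * K + [(0, None), (0, None)],
                  method="highs")
    return res.x[:K], res.x[K], res.x[K + 1]

# toy check: one perturbation direction raises the first eigenvalue and
# lowers the second, so a small z closes the gap completely
lam0 = np.array([1.0, 1.2])
D = np.array([[1.0], [-1.0]])
z, alpha, beta = equalisation_step(lam0, D, eps=0.2)
assert abs(alpha - beta) < 1e-6 and abs(z[0] - 0.1) < 1e-6
```

At each iterate the linearised eigenvalues are squeezed between the bounds β and α, exactly the role the LP plays inside the step-size-controlled loop described above.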

Naturally, the global optimality of the overall procedure cannot be guaranteed, whereas at each step (i.e. locally), the LP routine provides the optimal solution.

Note that during this first stage of our routine, only the unspecified (missing) elements of H have been varied, while the specified sample covariance lags remain the same as for the initial point H_0.

Now, during the second stage of the ML maximisation routine, we modify all covariance lags. Since small perturbations in the sample covariance lags of R̂ (with respect to the exact values in R) lead to significant fluctuations in the noise-subspace eigenvalues σ_n of the matrix H, "inverse perturbations" in H that equalise up to the (M_a − m) smallest eigenvalues should not involve significant changes to the sample covariance lags. Effectively, we use the same optimisation routine (30) here, with the only significant difference that now all elements (except the diagonals) are varied, i.e.

H^{(j+1)} = H^{(j)} + Σ_{k=1}^{2ℓ} z_k^{(j)} F_k.  (37)


Given that we cannot guarantee the global optimality of this second optimisation routine either, we may treat its solution (H^ML, say) as the unstructured ML estimate of the M_a-variate covariance matrix. Therefore the probability of obtaining the desired number of identical minimum eigenvalues in H^ML is zero.

For this reason, our third stage involves obtaining a properly structured ML estimate that corresponds to a mixture of μ independent plane waves in noise. The unstructured ML estimate H^ML is used as a sufficient statistic, and further modification of the unspecified entries occurs in order to equalise the (M_a − μ) smallest eigenvalues in this matrix. Obviously, we expect that the more eigenvalues are to be equalised, the more loss we will obtain in the LR compared with the ML estimate H^ML.

Similarly to the above, we may present this equalisation routine as

H_μ^{(j+1)} = H_μ^{(j)} + Σ_{k=1}^{2ℓ} z_k F_k,  H_μ^{(0)} = H^ML,  (38)


where H_μ^{(j)} is the p.d. Hermitian matrix obtained at the jth iteration of the equalisation routine. As before, by applying a first-order perturbation expansion for the eigenvalues of the matrix H_μ^{(j+1)}, we can derive the following LP problem:

Find min (α − β) subject to  (39)
σ^{(j)} + D^{(j)} z ≤ α 1,  α > 0  (40)
σ^{(j)} + D^{(j)} z ≥ β 1,  β > 0  (41)
−ε ≤ z_k ≤ ε  for k = 1, ..., 2ℓ,  (42)

where

D^{(j)} = ( v_i^{(j)H} F_k v_i^{(j)} ),  i = μ+1, ..., M_a;  k = 1, ..., 2ℓ,  (43)

v_i^{(j)} is the ith eigenvector of the matrix H_μ^{(j)}, with associated eigenvalue σ_i^{(j)}, and σ^{(j)} is the vector of noise-subspace eigenvalues. Step-size control of ε is implemented in the same fashion as before (33).

Clearly, the stable point of this third stage (H_μ, say) would not result in exactly equal noise-subspace eigenvalues, since (as in the first stage) the specified entries have not been modified. Of course, it is possible to use a transformation to reach this final goal. Such a transformation keeps the eigenvectors of H_μ invariant, and so the MUSIC-derived DOA estimates for μ sources also remain the same. However, due to the dimension reduction brought about by (14), the LR (16) would change as a result of such a transformation. Moreover, even with strictly equalised eigenvalues, the Hermitian matrix H_μ does not necessarily correspond to the desired plane-waves-plus-noise model.
Thus our fourth and final stage, which considers the sequence of "ML" hypotheses H_μ (μ = 1, ..., M_a − 1), consists of a local ML refinement of the μ DOA estimates and associated signal powers in the vicinity of the MUSIC DOA estimates generated by the covariance matrix H_μ. This local refinement procedure is introduced in [1], and involves the specified covariance lags only. As a result, for each candidate model μ = 1, ..., M_a − 1, we can find the "ML" set of estimated signal parameters {θ̂_j^{(μ)}, p̂_j^{(μ)}} and estimated white noise power

p̂_0^{(μ)} = (1/M) tr R̂ − Σ_{j=1}^{μ} p̂_j^{(μ)}  (44)

that uniquely describes the covariance matrix R_μ in the hypothesis (15):

R_μ = p̂_0^{(μ)} I_M + Σ_{j=1}^{μ} p̂_j^{(μ)} s(θ̂_j^{(μ)}) s^H(θ̂_j^{(μ)}).  (45)

Obviously, whichever information-theoretic or Bayesian criterion is used for hypothesis selection, such a selection uniquely specifies not only the number of sources, but also the DOA and power estimates.
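A sketch of assembling the hypothesised covariance (45) from a candidate parameter set; the DOAs and powers below are arbitrary illustrative values, not outputs of the refinement procedure:

```python
import numpy as np

def steering(d, theta):
    # steering vector (8) for positions d in half-wavelength units
    return np.exp(1j * np.pi * np.asarray(d) * np.sin(theta))

def model_covariance(d, thetas, powers, p0):
    # R_mu of (45): white-noise floor plus mu rank-one plane-wave terms
    M = len(d)
    R = p0 * np.eye(M, dtype=complex)
    for th, p in zip(thetas, powers):
        s = steering(d, th)
        R += p * np.outer(s, s.conj())
    return R

d = [0.0, 1.09, 3.96, 5.93]
R = model_covariance(d, np.deg2rad([-10.0, 15.0]), [1.0, 0.5], p0=0.2)
assert np.allclose(R, R.conj().T)
assert np.min(np.linalg.eigvalsh(R)) > 0                 # p.d. Hermitian, cf. (9)
assert np.isclose(np.trace(R).real, 4 * (0.2 + 1.5))     # tr R = M (p0 + sum p_j)
```

The trace identity in the last line is the sanity check behind noise-power bookkeeping of the form (44): each unit-gain rank-one term contributes M to the trace.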


Simulation  results  (not  introduced  here)  conducted  for 
the  NLA  di  for  a  superior  number  of  sources  demon¬ 
strates  that  the  detection  performance  achieved  by  the 
four-stage  algorithm  described  in  this  paper  is  com¬ 
parable  to  that  produced  by  the  standard  AIC  and 
MDL  criteria  for  conventional  scenarios  (with  m  <  M 
sources)  with  the  same  Cramer-Rao  bound.  Natu¬ 
rally,  in  order  to  compare  detection  performance  on 
conventional  and  superior  scenarios,  it  is  necessary  to 
introduce  significantly  different  intersource  separation 
and/or  sample  sizes,  however  the  comparable  detec¬ 
tion  performance  in  the  two  cases  suggests  that  the 
new  detection  scheme  described  here  is  close  to  opti¬ 
mum.  An  additional  justification  for  this  conclusion 
is  that  when  our  detection-estimation  algorithm  yields 
the  true  number  of  superior  sources,  we  obtain  a  DOA 
estimation  accuracy  close  to  the  corresponding  Cramer- 
Rao  bound. 

[1] Y.I. Abramovich, D.A. Gray, A.Y. Gorokhov, and N.K. Spencer. Positive-definite Toeplitz completion in DOA estimation for nonuniform linear antenna arrays — Part I: Fully augmentable arrays. IEEE Trans. Sig. Proc., 46(9):2458-2471, 1998.

[2] Y.I. Abramovich and N.K. Spencer. Detection-estimation of more uncorrelated Gaussian sources than sensors using partially augmentable sparse antenna arrays. In Proc. EUSIPCO-2000, Tampere, Finland. To appear September 2000.

[3] Y.I. Abramovich, N.K. Spencer, and A.Y. Gorokhov. Detection of more uncorrelated Gaussian sources than sensors in nonuniform linear antenna arrays — Part I: Fully augmentable arrays. IEEE Trans. Sig. Proc. Submitted Feb 2000.

[4] Y.I. Abramovich, N.K. Spencer, and A.Y. Gorokhov. Detection of more uncorrelated Gaussian sources than sensors in nonuniform linear antenna arrays — Part II: Partially augmentable arrays. IEEE Trans. Sig. Proc. In preparation.

[5] Y.I. Abramovich, N.K. Spencer, and A.Y. Gorokhov. Detection of more uncorrelated Gaussian sources than sensors using fully augmentable sparse antenna arrays. In Proc. SAM-2000, Cambridge, MA, 2000.

[6] Y.I. Abramovich, N.K. Spencer, and A.Y. Gorokhov. DOA estimation for noninteger linear antenna arrays with more uncorrelated sources than sensors. IEEE Trans. Sig. Proc., 48(4):943-955, 2000.

[7] P.M. Djuric. A model selection rule for sinusoids in white Gaussian noise. IEEE Trans. Sig. Proc., 44(7):1744-1757, 1996.

[8] J.-J. Fuchs. Extension of the Pisarenko method to sparse linear arrays. In Proc. ICASSP-95, pages 2100-2103, Detroit, 1995.

[9] R.A. Horn and C.R. Johnson. Matrix Analysis. Cambridge University Press, England, 1990.

[10] A.T. Moffet. Minimum-redundancy linear arrays. IEEE Trans. Ant. Prop., 16(2):172-175, 1968.

[11] R.J. Muirhead. Aspects of Multivariate Statistical Theory. Wiley, New York, 1982.

[12] P. Stoica and A. Nehorai. Performance study of conditional and unconditional direction-of-arrival estimation. IEEE Trans. Acoust. Sp. Sig. Proc., 38(10):1783-1795, 1990.



Hsien-Tsai Wu and Chan-Li Chen

Department of Electronic Engineering,
Southern Taiwan University of Technology
No. 1 Nan-Tai Street, Yung Kung City, Tainan County, Taiwan


In this paper, we introduce the effective use of the Gerschgorin radii [1-2] of the unitary-transformed covariance matrix for source number detection. The heuristic approach, applying a new set of Gerschgorin radii developed from the projection concept, overcomes the problems caused by small data samples and an unknown noise model. The proposed method uses the sample correlation coefficient to normalize the signal Gerschgorin radii for source number detection. The proposed method shows improved detection capabilities over GDE [1,2] in a Gaussian white noise process.


Array  processing,  or  more  accurately,  sensor  array  processing,  is 
the  processing  of  the  output  signals  of  an  array  of  sensors  located 
at  different  points  in  space  in  a  wavefield.  The  purpose  of  array 
processing  is  to  extract  useful  information  from  the  received 
signals  such  as  the  number  and  location  of  the  signal  sources,  the 
propagation  velocity  of  waves,  as  well  as  the  spectral  properties 
of  the  signals.  Array  processing  techniques  have  been  employed 
in  various  areas  in  which  very  different  wave  phenomena  occur. 
Common to all these applications, there are, in general, two essential purposes in array processing: (i) to determine the number of sources (detection), and (ii) to estimate the locations of these sources (estimation).

Several high-resolution detectors [3-5] for direction of arrival (DOA) estimation have been developed in the field of passive underwater and radar signal processing in recent years. The primary
contributions  to  the  field  include  the  MUSIC  method  proposed 
by  Schmidt  [3],  the  Minimum-Norm  method  by  Kumaresan  and 
Tufts  [4],  and  the  ESPRIT  method  by  Roy  et  al.  [5].  It  is  well 
known  that  the  performances  of  these  high  resolution  methods 
largely  depend  on  the  successful  determination  of  the  number  of 
sources. Thus, several methods [6-11] have been suggested with this purpose in mind. Wax and Kailath [6] brought a statistical approach to the problem of source number detection based on the AIC and MDL methods, which are generally used for model selection.

In general, the AIC and MDL, including their modified versions, remain the most widely used methods for estimating the source number. Most of them use the eigenvalues to estimate the source number but neglect the information carried by the eigenvectors. Consequently, Wu and Yang [1] proposed a heuristic approach applying the Gerschgorin theorem to find the Gerschgorin radii of the transformed covariance matrix for source number detection. The heuristic detection criterion is developed from the concept of eigenvector projection.

In  this  paper,  a  proper  similar  transformation  of  the  covariance 
matrix  is  required  in  order  to  effectively  utilize  the  sample 
correlation  coefficient  to  normalize  the  signal  Gerschgorin  radii 
for  source  number  detection. 


2.1 Narrow-Band Model

We first review the narrow-band mathematical model for estimating the number of sources and the DOAs of signals in a spatially white noise environment. The model we consider here consists of an L-dimensional complex data vector x(k), which represents the data received by an array of L sensors at the kth snapshot. The data vector is composed of plane-wave incident narrowband signals, each of angular frequency ω_0, from M distinct sources embedded in Gaussian noise. Thus, the measured array data vector x(k), which is assumed to be composed of M incoherent directional sources corrupted by additive white noise, is received at the kth snapshot by L (L > M) sensors and is given by

x(k) = Σ_{i=1}^{M} s_i(k) a(ω_i) + n(k) = A(ω) s(k) + n(k),  (1)


where A(ω) = [a(ω_1) a(ω_2) ... a(ω_M)] is the direction matrix composed of the direction (steering) vectors of the signals, and the noise vector n(k) is assumed to be complex, zero-mean, and Gaussian. The source vector is s(k) = [s_1(k), s_2(k), ..., s_M(k)]^T, where s_m(k) is the amplitude of the mth source and is assumed to be jointly circular Gaussian and independent of n(k). The exact form of the steering vector depends on the array configuration. However, the uniform linear array, apart from being most commonly used, may also offer advantageous implementation efficiency for some algorithms. For a propagation wavelength η, the distance between two adjacent sensors in a uniform linear array must be D = η/2, and the corresponding steering vector is given by

a(ω_m) = [1, exp(jω_m), ..., exp(j(L−1)ω_m)]^T,  (2)

where ω_m = 2πD sin(θ_m)/η and D is the
spacing between adjacent elements, θ_m is the impinging angle of the mth source relative to the array broadside, and θ_m ∈ (−π/2, π/2) for all m. The vectors a(ω_m), m = 1, 2, ..., M, corresponding to M different values of θ_m, are assumed to be linearly independent. This implies that L > M and rank(A) = M.

0-7803-5988-7/00/$10.00 © 2000 IEEE

Note that it follows that x(k) is a complex Gaussian vector with zero mean and covariance matrix given by

C = E[x(k) x(k)^H] = A(ω) C_s A^H(ω) + σ_n² I,  (3)

where C_s, the covariance matrix of s(k), is assumed to be non-singular, and σ_n² is the variance of the Gaussian noise. Superscripts *, T, and H denote the conjugate, transpose, and Hermitian transpose of matrices, respectively.

If N observations have been measured from L sensors, the entire data set can be placed in an L×N matrix X as:

X = [x(1), x(2), ..., x(k), ..., x(N)]_{L×N}.  (4)
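The ULA model (1)-(3) can be sketched numerically as follows; the scenario values are illustrative only. A useful check is that the L − M smallest eigenvalues of C equal the noise variance:

```python
import numpy as np

def ula_steering(L, theta, wavelength=1.0, D=0.5):
    # steering vector (2): a(w) = [1, e^{jw}, ..., e^{j(L-1)w}]^T,
    # with w = 2*pi*D*sin(theta)/wavelength and D = wavelength/2
    w = 2 * np.pi * D * np.sin(theta) / wavelength
    return np.exp(1j * w * np.arange(L))

L, M = 6, 2
thetas = np.deg2rad([-12.0, 10.0])
A = np.stack([ula_steering(L, th) for th in thetas], axis=1)   # L x M direction matrix
Cs = np.eye(M)                   # uncorrelated unit-power sources
sigma2 = 0.1
C = A @ Cs @ A.conj().T + sigma2 * np.eye(L)                   # true covariance (3)

assert np.linalg.matrix_rank(A) == M          # distinct DOAs give full column rank
lam = np.linalg.eigvalsh(C)                   # ascending order
assert np.allclose(lam[:L - M], sigma2)       # noise-floor eigenvalues
```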

D_1 = diag(λ'_1, λ'_2, ..., λ'_{L−1}).  (7)

The eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_M > λ_{M+1} = ... = λ_L are shown in descending order. Since the λ'_i in Eq.(7) are the eigenvalues of the leading principal submatrix C_1 of C, the two sets of eigenvalues satisfy the interlacing property λ_1 ≥ λ'_1 ≥ λ_2 ≥ λ'_2 ≥ ... ≥ λ_{M+1} ≥ λ'_{M+1} ≥ ... ≥ λ_{L−1} ≥ λ'_{L−1} ≥ λ_L. The transformed covariance matrix becomes, with U = diag(U_1, 1):

S = U^H C U = [ diag(λ'_1, λ'_2, ..., λ'_{L−1})  ρ ;  ρ^H  c_LL ],  (8)

where ρ = [ρ_1, ρ_2, ..., ρ_{L−1}]^T and

ρ_i = a'_i^H c,  (9)

for i = 1, 2, ..., L−1.

Each column of X represents a multivariate observation; in the L-dimensional scatterplot, the columns of X represent N points in L-dimensional space. Subsequently, the array sample covariance matrix in Eq.(3) can also be expressed as:

Ĉ = (1/N) X X^H.  (5)

2.2 Gerschgorin Disk Estimator

To make the Gerschgorin disk theorem effective, Wu et al. [2] proposed a proper transformation, called the Gerschgorin Disk Estimator (GDE), for source number detection. The covariance matrix is first partitioned as:

C = [ C_1  c ;  c^H  c_LL ],  (6)

where C_1 is the (L−1)×(L−1) leading principal submatrix of C, obtained by deleting the last row and column of C. Physically, this can be regarded as the removal of the Lth sensor. Thus, C_1 becomes the reduced covariance matrix of the remaining (L−1) sensors. The reduced covariance matrix C_1 can also be decomposed by its eigenstructure as C_1 = U_1 D_1 U_1^H, where U_1 is an (L−1)×(L−1) unitary matrix formed by the eigenvectors of C_1 as U_1 = [a'_1 a'_2 ... a'_{L−1}], and D_1 is the diagonal matrix constructed from the corresponding eigenvalues:

It is clear that the first (L−1) Gerschgorin disks (i.e. O_1, O_2, ..., O_{L−1}) possess the Gerschgorin radii

r_i = |ρ_i| = |a'_i^H c|,  (10)

for i = 1, 2, ..., L−1. It is easy to verify that all of the ρ_i values are equal to zero for i = M+1, M+2, ..., L−1, due to the fact that the noise eigenvectors a'_i are orthogonal to A_1, which is the direction matrix of C_1.

Since S is a unitary transformation of C, the two matrices share the same eigenvalues. Each of the first (L−1) Gerschgorin disks O_i has its Gerschgorin center at c_i = λ'_i and the corresponding Gerschgorin radius r_i = |ρ_i|, i = 1, 2, ..., (L−1). The disks with zero radii (i.e. O_{M+1}, O_{M+2}, ..., O_{L−1}) are regarded as the collection of noise Gerschgorin disks. The remaining disks (i.e. O_1, O_2, ..., O_M), containing non-zero radii and large centers, are considered to be the source Gerschgorin disks. Hence, we can determine the number of sources by counting the number of non-zero Gerschgorin radii in the case of infinite samples. In addition, we can also use the (L−1) eigenvalues of C_1 to determine the number of sources.
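The centers and radii of (8)-(10) can be computed directly; a sketch using an exact covariance for L = 6 and M = 2 (illustrative DOAs), which reproduces the zero-noise-radii property:

```python
import numpy as np

def gerschgorin_radii(C):
    """Centers and radii of S = U^H C U in (8): centers are the eigenvalues of
    the (L-1)x(L-1) leading principal submatrix C1, and the radii are
    r_i = |rho_i| = |a_i'^H c| as in (10)."""
    L = C.shape[0]
    lam, U1 = np.linalg.eigh(C[:L - 1, :L - 1])
    lam, U1 = lam[::-1], U1[:, ::-1]          # descending order, as in the text
    rho = U1.conj().T @ C[:L - 1, -1]
    return lam, np.abs(rho)

def ula_steering(L, theta):
    return np.exp(1j * np.pi * np.arange(L) * np.sin(theta))

L, M = 6, 2
A = np.stack([ula_steering(L, th) for th in np.deg2rad([-12.0, 10.0])], axis=1)
C = A @ A.conj().T + 0.1 * np.eye(L)          # exact covariance, unit-power sources

centers, radii = gerschgorin_radii(C)
# with the exact covariance, the M source disks have nonzero radii while the
# L-1-M noise radii vanish (noise eigenvectors orthogonal to A1)
assert radii[:M].min() > 1e-6
assert np.allclose(radii[M:], 0, atol=1e-8)
```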

It can be seen that the threshold must be adjustable to varying numbers of snapshots. Hence, we define a heuristic decision rule as [2]:

GDE(k) = r_k − D(N) (1/(L−1)) Σ_{i=1}^{L−1} r_i,  (11)

where k is an integer in the closed interval [1, L−2]. The adjustable factor D(N) should be a non-increasing function (between 1 and 0) as N increases. If GDE(k) is evaluated from k = 1, the number of sources is determined as k−1 (i.e. M = k−1) when the first nonpositive value of GDE(k) is reached. This is due to the fact that a radius value below the adjustable threshold is considered to belong to the noise collection. Thus, the above GDE rule may produce problems of underestimation.
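The decision rule (11) may be sketched as follows; the choice D(N) = 1/√N is our illustrative assumption, since the paper only requires D(N) to be non-increasing in N:

```python
import numpy as np

def gde_detect(radii, N, DN=None):
    """Rule (11): GDE(k) = r_k - D(N)/(L-1) * sum_i r_i; the estimate is
    M = k-1 at the first nonpositive GDE(k). D(N) = 1/sqrt(N) is an
    assumed illustrative choice of the adjustable factor."""
    radii = np.asarray(radii, dtype=float)
    if DN is None:
        DN = 1.0 / np.sqrt(N)
    thresh = DN * radii.sum() / radii.size    # radii.size = L - 1
    for k, r in enumerate(radii, start=1):
        if r - thresh <= 0:
            return k - 1
    return radii.size                         # all radii above the threshold

# two dominant radii and three near-zero ones -> two sources detected
assert gde_detect([4.2, 3.7, 0.02, 0.01, 0.01], N=100) == 2
```

Because the threshold shrinks with N, large-sample runs tolerate smaller source radii, which is exactly the adjustability the text asks of D(N).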


Hence, a method capable of reducing the radii of the signal Gerschgorin disks should help resolve the source number detection problem.

3.1 Correlation Coefficients of the Sample Space

In light of these requirements, an effective source number detection method must select a proper transformation that maximally reduces the radii of the signal Gerschgorin disks and keeps the noise Gerschgorin disks as remote as possible from the signal Gerschgorin disks. Therefore, a nonsingular matrix D = diag(λ_1, λ_2, ..., λ_M, ..., λ_{L−1}, 1) was used in [2] to obtain small signal Gerschgorin radii r'_i, i = 1, 2, ..., M. That method led to the development of a novel technique which outperformed GDE in Gaussian white and nonwhite noise processes and could be used successfully even when the SNR is near 0 dB. In this paper, we extend the reduction of the signal Gerschgorin disks by using a newly developed similar transformation of the sampled covariance matrix and its new set of normalized signal Gerschgorin radii.

As in Eq.(4), if N observations have been measured from L sensors, the entire data set can be placed in an L×N matrix X as X = [x(1), x(2), ..., x(N)]_{L×N}. According to the definition of multiple linear regression [12], the maximum correlation coefficient ρ_ik is defined for i = 1, 2, ..., L and k = 1, 2, ..., L; its defining expression (12) is not legible in this copy. Note that ρ_ik = ρ_ki for all i and k, and that the value of ρ_ik must lie between 0 and +1.

Without altering the true eigenvalues, a proper transformation of the covariance matrix is required in order to effectively utilize the sample correlation coefficient to normalize the signal Gerschgorin radii for source number detection.

3.2 The Proposed Method

In this section, a new transformation kernel based on the sample correlation coefficient is proposed in order to improve detection performance. The novel transforming matrix is

D̃ = diag(√λ'_1, √λ'_2, ..., √λ'_{L−1}, 1),  (13)

where the λ'_i are the eigenvalues of the (L−1)×(L−1) leading principal submatrix of C, leading to the transformed matrix in Eq.(14). The new transformed true covariance matrix becomes:

S̃ = D̃^{−1} (U^H C U) D̃.  (14)

According to the Gerschgorin disk theorem, the first (L−1) Gerschgorin disks of S̃ (i.e. O_1, O_2, ..., O_{L−1}) have the new Gerschgorin radii

r'_i = |ρ_i| / √λ'_i,  (15)

for i = 1, 2, ..., L−1. Since r'_i in Eq.(15) can be regarded as a correlation coefficient of the covariance matrix, as in Eq.(12), the values of r'_i for the signal disks are all less than 1, so that r_i > r'_i. In other words, the signal Gerschgorin disks can be made as small as possible while the noise Gerschgorin disks are kept as remote from the signal Gerschgorin disks as possible. Therefore, the source number can easily be determined by visually counting the number of signal Gerschgorin disks derived from Eq.(14). Moreover, when the noise statistics cannot be accurately estimated, the GDE method fails in low-SNR situations, whereas the proposed method may not.
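Under the diagonal similarity (13) the first (L−1) radii scale as r_i/√λ'_i while the disk centers (eigenvalues) are unchanged; a sketch with an illustrative scenario, where our reading of (14)-(15) is an assumption rather than the paper's exact expression:

```python
import numpy as np

def ula_steering(L, theta):
    return np.exp(1j * np.pi * np.arange(L) * np.sin(theta))

def radii_before_after(C):
    """Raw Gerschgorin radii |rho_i| of S = U^H C U, and the normalised radii
    |rho_i| / sqrt(lam_i') obtained under the assumed diagonal similarity (13);
    a similarity transform leaves the eigenvalues (disk centers) unchanged."""
    L = C.shape[0]
    lam, U1 = np.linalg.eigh(C[:L - 1, :L - 1])
    lam, U1 = lam[::-1], U1[:, ::-1]               # descending eigenvalues
    rho = np.abs(U1.conj().T @ C[:L - 1, -1])
    return rho, rho / np.sqrt(lam)

L, M = 6, 2
A = np.stack([ula_steering(L, th) for th in np.deg2rad([-12.0, 10.0])], axis=1)
C = A @ A.conj().T + 0.1 * np.eye(L)               # 10 dB SNR per source

r, r_new = radii_before_after(C)
# the signal radii shrink, since their eigenvalues exceed 1, improving the
# separation between the source and noise disk collections
assert np.all(r_new[:M] < r[:M])
```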

For example, consider one simulated covariance matrix where the sensor number is 6 (i.e. L=6) and two uncorrelated sources (i.e. M=2) impinge from −12° and 10° (i.e. DOA = [−12° 10°]). The signal-to-noise ratios are both 10 dB (i.e. SNR = [10 10] dB) and the number of samples is N=100. Its Gerschgorin disks, in terms of Gerschgorin center-and-radius pairs, are {12.11, 0.42}, {7.93, 4.71}, {0.19, 0.18}, {0.09, 0.36}, and {0.08, 0.03}. The results are illustrated in Figure 1(a). Subsequently, the same covariance matrix is transformed by the suggested transformation, as shown in Eq.(14); the results are illustrated in Figure 1(b). It is now evident that the Gerschgorin disks form two separate collections: the source collection contains disks O_1 and O_2 with radii less than 1, and the noise collection contains O_3, O_4, and O_5 with small radii.

Fig. 1(a)(b). Gerschgorin disks of the estimated covariance matrix.



A uniform linear array of 8 isotropic sensors spaced a half wavelength apart is considered, with additive uncorrelated white noise. The VGD and GDE methods are used to detect two uncorrelated sources with SNRs of 6 dB impinging from 0° and 5° respectively. After 200 Monte Carlo runs, we compute the relative frequency of false detection for various numbers of snapshots. Error detection performance in terms of probabilities is depicted in Figure 2, from which it can be seen that the proposed method outperforms the other methods.

Fig. 2. Detection performance of the AIC, MDL, GDE, and the proposed method using simulated data with Gaussian white noise (SNR = [6 6] dB, DOA = [0° 5°]).


In this paper, GDE performance is improved by using a newly developed similar transformation of the covariance matrix and its new set of Gerschgorin radii to design source number estimators. The proposed method uses the sample correlation coefficient to normalize the signal Gerschgorin radii for source number detection. The proposed method shows detection capabilities superior to GDE in a Gaussian white noise process and can be used successfully with measured experimental data.


This  research  was  supported  by  the  National  Science 
Council  under  Grant  #NSC88-2612-E-218-001,  Taiwan, 
Republic  of  China. 

[1] H. T. Wu, J. F. Yang, and F. K. Chen, "Source number estimators using transformed Gerschgorin radii," IEEE Trans. SP, vol. 43, pp. 1325-1333, Jun. 1995.

[2] H. T. Wu and J. F. Yang, "Gerschgorin radii based source number detection for closely spaced signals," Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Atlanta, pp. 3054-3057, May 1996.

[3] R. O. Schmidt, "Multiple emitter location and signal parameter estimation," in Proc. RADC Spectrum Estimation Workshop, pp. 243-258, Oct. 1979.

[4] R. Kumaresan and D. W. Tufts, "Estimating the angles of arrival of multiple plane waves," IEEE Trans. Aerospace and Electronic Systems, vol. AES-19, pp. 134-139, 1983.

[5] R. Roy and T. Kailath, "ESPRIT-estimation of signal parameters via rotational invariance techniques," IEEE Trans. ASSP, vol. 37, pp. 984-995, July 1989.

[6] M. Wax and T. Kailath, "Detection of signals by information theoretic criteria," IEEE Trans. ASSP, vol. 33, no. 2, pp. 387-392, April 1985.

[7] K. M. Wong, Q. T. Zhang, J. P. Reilly, and P. C. Yip, "On information theoretic criteria for determining the number of signals in high resolution array processing," IEEE Trans. ASSP, vol. 38, no. 11, pp. 1959-1970, Nov. 1990.

[8] M. Wax, "Detection and localization of multiple sources in spatially colored noise," IEEE Trans. SP, vol. 40, no. 1, pp. 245-249, Jan. 1992.

[9] Q. Wu and D. R. Fuhrmann, "A parametric method for determining the number of signals in narrow-band direction finding," IEEE Trans. SP, vol. 39, no. 8, pp. 1848-1857, Aug. 1991.

[10] M. Wax, "Detection and localization of multiple sources in spatially colored noise," IEEE Trans. SP, vol. 40, no. 1, pp. 245-249, Jan. 1992.

[11] W. Wu, J. Pierre, and M. Kaveh, "Practical detection with calibrated arrays," Proc. of Statistical Signal and Array Processing Workshop, pp. 82-85, Canada, Oct. 1992.

[12] Richard A. Johnson and Dean W. Wichern, Applied Multivariate Statistical Analysis, Prentice-Hall, Inc., New Jersey, 1988.




James  W.  Pitton 

Applied  Physics  Laboratory,  University  of  Washington 

1013  NE  40th  St. 

Seattle,  WA  98105 


This paper presents further extensions to the multitaper time-frequency spectrum estimation method developed by the author. The method uses time-frequency (TF) concentrated basis functions which diagonalize the nonstationary spectrum generating operator over a finite region of the TF plane. Individual spectrograms computed with these eigenfunctions form direct TF spectrum estimates, and are combined to form the multitaper TF spectrum estimate. A method is presented for adapting the multitaper spectrogram to locally match frequency modulation in the signal, which can cause broadening of the spectral estimate. An F-test for detecting and removing frequency-modulated tones is also given.


Thomson's multitaper spectral estimation approach [1] is a powerful method for nonparametric spectral estimation. This method uses a set of orthogonal data tapers that are maximally concentrated in frequency and diagonalize the spectral generating operator. These tapers are used to approximately invert the operator and estimate the spectrum. The multitaper approach was first applied to time-frequency (TF) analysis by a direct extension to the nonstationary case through a sliding-window framework [2], in which spectrograms are computed with each of the tapers and combined to form an estimate of the TF spectrum. A multitaper TF spectrum was constructed using spectrograms computed with Hermite windows [3], which had previously been shown to maximize a TF concentration measure [4]. This method was extended to include a means of reducing artifacts using a TF mask [5]. More recently, a multitaper method for TF analysis was presented by this author [6] that diagonalized the nonstationary spectral generating operator, formally extending Thomson's approach to TF. Subsequent work by the author gave bias and variance measures for the estimated TF spectrum, presented an adaptive procedure to reduce the bias of the individual spectrograms, and derived other properties of the eigenfunctions and the resulting TF spectral estimate [7, 8].

This work was supported by the National Science Foundation and the Office of Naval Research.

In this paper, a method is presented for adapting the multitaper spectrogram to locally match frequency modulation in the signal, which can cause broadening of the spectral estimate. Frequency modulation (FM) in the signal will degrade the resolution and accuracy of the multitaper spectrogram due to well-known spectral broadening effects. One common way of alleviating the effects of the spectral broadening is to match the spectrogram to the FM by frequency-modulating the window. This approach works perfectly well when there is only one FM rate in the signal, as is the case with chirped sonar and radar. However, in multicomponent signals such as speech, biological, and mechanical signals, there can be multiple FM rates present at any given time. To accurately analyze these types of signals, it is necessary to locally adapt the multitaper spectrogram to the FM at a given TF region. This paper presents a method for performing this local adaptation. An F-test for detecting and removing frequency-modulated tones is also given.


0-7803-5988-7/00/$10.00 © 2000 IEEE

This approach to TF spectral estimation is based on a straightforward extension of the spectral representation theorem for stationary processes [9], and is equivalent to a linear time-varying (LTV) filter model. Define the signal s(t) as the output of a white-noise-driven LTV filter. The signal can then be written as:

s(t) = \int H(t,\omega)\, e^{j\omega t}\, dZ(\omega),   (1)

where H(t,ω) is defined as the Fourier transform of the LTV filter h(t, t - τ) [10]. The TF spectrum is defined as

P(t,\omega) = |H(t,\omega)|^{2}.   (2)

This formulation for a TF spectrum is of the same general form as Priestley's evolutionary spectrum [9]; however, H(t,ω) is not constrained to be slowly varying.

Given a signal s(t), an estimate \hat{P}(t,\omega) is desired; however, direct inversion of equation (1) is impossible. A rough estimate of the time-varying frequency content of s(t) may be obtained by computing its short-time Fourier transform (STFT):

S_s(t,\omega) = \int s(\tau)\, g(t - \tau)\, e^{-j\omega\tau}\, d\tau,   (3)

where g(t) is a rectangular window of length T. A relationship between the STFT and H(t,ω) is obtained by replacing s(t) by its TF spectral formulation:

S_s(t,\omega) = \int\!\!\int H(\tau,\theta)\, g(t - \tau)\, e^{-j(\omega - \theta)\tau}\, dZ(\theta)\, d\tau.   (4)
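The STFT of equation (3) can be sketched concretely in a few lines of Python; the signal, window length, and hop size below are illustrative choices, not values from the paper. A pure tone then concentrates in a single frequency bin of every frame.

```python
import numpy as np

def stft(s, g, hop):
    """Discrete STFT: rows are DFTs of windowed frames starting every `hop` samples."""
    N = len(g)
    frames = [s[m:m + N] * g for m in range(0, len(s) - N + 1, hop)]
    return np.fft.fft(np.array(frames), axis=1)

fs, f0 = 128, 16                      # sample rate and tone frequency (Hz)
t = np.arange(256) / fs
s = np.exp(2j * np.pi * f0 * t)       # complex pure tone
g = np.ones(fs)                       # rectangular window, length T = 1 s
S = stft(s, g, hop=32)
peak_bins = np.abs(S).argmax(axis=1)  # dominant bin of each frame
```

With a 128-point rectangular window at 128 Hz, the 16 Hz tone lands exactly in bin 16 of every frame.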

To solve for the time-varying spectrum H(t,θ), the STFT operator g(t - τ)e^{-jωτ} must be inverted. This inversion is an inherently ill-posed problem. Instead, the inverse solution is approximated by regularizing it to some region R(t,ω) in the TF plane, much as Thomson regularized the spectral inversion to a bandwidth W in his multitaper approach [1]. For simplicity throughout, R(t,ω) is defined to be a square TF region of dimension ΔT × ΔW; however, the results readily generalize to arbitrary regions.

In the case of spectral estimation, the operator is square and Toeplitz; its regularized inverse is found through an eigenvector decomposition. Such is not the case in the TF problem; the STFT operator is neither full rank nor square. This operator is diagonalized using a singular value decomposition (SVD), giving left and right eigenvectors u(τ) and V(t,ω) and the associated eigen (singular) values λ:

g(t - \tau)\, e^{-j\omega\tau} = \sum_{k} \lambda_k\, u_k(\tau)\, V_k^{*}(t,\omega).   (5)


The eigenvectors u(τ) and V(t,ω) form an STFT pair:

V(t,\omega) = \int u(\tau)\, g(t - \tau)\, e^{-j\omega\tau}\, d\tau.   (6)

The SVD relationship between u(τ) and V(t,ω) is obtained by applying the STFT operator to V(t,ω), computing the integrals only over ΔT × ΔW:

\lambda\, u(\tau) = \int_{\Delta T}\!\int_{\Delta W} V(t,\omega)\, g(t - \tau)\, e^{j\omega\tau}\, d\omega\, dt.   (7)

The inverse STFT computed over all (t,ω) also holds. This equation can be reduced to a standard eigenvector equation by substituting for V(t,ω). The eigenvalue equation for u(τ) is then:

\lambda\, u(\tau) = \int 2\Delta W\, \mathrm{sinc}(\Delta W(\tau - s))\, f(\tau, s)\, u(s)\, ds,   (8)

where

f(\tau, s) = \int g(t - s)\, g(t - \tau)\, dt.   (9)

u(τ) can be computed using standard eigenvalue solution methods. As has been discussed elsewhere, the eigenvectors are concentrated in TF and doubly orthogonal, both over the entire TF plane and over ΔT × ΔW. These properties are critical for the estimation method.
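As a sketch of what these "standard eigenvalue solution methods" can look like here, the snippet below discretizes the kernel of equation (8) on a uniform grid and solves the symmetric eigenproblem. The rectangular window, grid size, and region bandwidth ΔW are illustrative assumptions, and sinc is taken in the sin(x)/x convention.

```python
import numpy as np

# Discretization of equation (8):
#   lambda u(tau) = int 2 dW sinc(dW (tau - s)) f(tau, s) u(s) ds,
#   f(tau, s) = int g(t - s) g(t - tau) dt.
n = 128
dt = 1.0 / n
tau = np.arange(n) * dt
dW = 2 * np.pi * 16                   # region bandwidth in rad/s (assumed value)

g = np.ones(n)                        # rectangular analysis window
f_full = np.correlate(g, g, mode="full") * dt       # window correlation
i = np.arange(n)
f_mat = f_full[n - 1 + i[:, None] - i[None, :]]     # f(tau_i, s_j), depends on i - j

d = tau[:, None] - tau[None, :]
# 2 dW sinc(dW d) with sinc(x) = sin(x)/x; numpy's sinc is sin(pi x)/(pi x)
kernel = 2 * dW * np.sinc(dW * d / np.pi) * f_mat * dt

lam, U = np.linalg.eigh(kernel)       # symmetric kernel -> real eigenpairs
lam, U = lam[::-1], U[:, ::-1]        # sort eigenvalues descending
```

The columns of U are the discrete u_k(τ); their orthonormality over the full axis is one half of the double orthogonality noted above.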

Next, H(t,ω) is estimated, regularized to the rectangular region ΔT × ΔW, by projecting it onto ΔT × ΔW in the vicinity of (t,ω) using the kth left eigenvector:

H_k(t,\omega) = \lambda_k^{-1} \int\!\!\int H(\tau,\theta)\, u_k(t - \tau)\, e^{-j(\omega - \theta)\tau}\, dZ(\theta)\, d\tau.   (10)

H_k is thus a direct, but unobservable, projection of H(t,ω) onto ΔT × ΔW.

These expansion coefficients are then estimated using the STFT of s(t) computed using u_k(τ):

S_k(t,\omega) = \int\!\!\int H(\tau,\theta)\, u_k(t - \tau)\, e^{-j(\omega - \theta)\tau}\, dZ(\theta)\, d\tau,   (11)

i.e., the kth eigenspectrum S_k(t,ω) is a projection of H(t,ω) onto the kth left eigenvector u_k(τ), estimating H_k(t,ω) over ΔT × ΔW. When s(t) is a stationary white noise process, it follows that

E\big[|S_k(t,\omega)|^{2}\big] = |H(t,\omega)|^{2} = P(t,\omega).   (12)

Thus, the individual eigenspectra are direct estimates of P(t,ω), and are unbiased when the spectrum is white.

Next, H(t,ω) is estimated over ΔT × ΔW using the right eigenvectors V_k(t,ω) weighted by the projections of H(t,ω) onto u_k(τ), i.e., the kth spectrogram:

\hat{H}(\tau,\theta\,|\,t,\omega) = \sum_{k=1}^{K} V_k(\tau - t,\, \theta - \omega)\, S_k(t,\omega),   (13)

where K ≈ ΔT ΔW. Choosing ΔT ΔW too small will result in estimates with poor bias and variance properties. The magnitude-square of \hat{H}(\tau,\theta\,|\,t,\omega) is an estimate of P(t,ω) over ΔT × ΔW. This estimate is a χ² random variable with two degrees of freedom (except at DC and Nyquist) with variance P²(t,ω). The variance of this estimate can be reduced by averaging over ΔT × ΔW and invoking the orthogonality of V_k(t,ω):

\hat{P}(t,\omega) = \frac{1}{K} \int_{\Delta T}\!\int_{\Delta W} \big|\hat{H}(\tau,\theta\,|\,t,\omega)\big|^{2}\, d\theta\, d\tau = \frac{1}{K} \sum_{k=1}^{K} \lambda_k^{2}\, \big|S_k(t,\omega)\big|^{2}.   (14)

The average of K direct estimates is a χ² random variable with 2K degrees of freedom; hence, the variance of this estimate is P²(t,ω)/K. If ΔT is chosen to be a fixed proportion of the window length T, then this estimator is consistent for fixed ΔW. Note that the form of this estimator differs slightly from that presented previously [6, 7, 8] in the weighting by the eigenvalues.
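The χ²-averaging argument can be checked numerically. The minimal Monte Carlo sketch below uses random orthonormal tapers on white noise (stand-ins for the actual eigenfunctions u_k, purely for illustration) and compares the variance of a single direct estimate |S_k|² with that of the K-taper average; the ratio should be close to K.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, trials, P = 64, 8, 4000, 1.0

# Orthonormal random tapers: a stand-in for the concentrated eigenfunctions.
tapers, _ = np.linalg.qr(rng.standard_normal((N, K)))

est_single, est_avg = [], []
for _ in range(trials):
    s = np.sqrt(P) * rng.standard_normal(N)          # white noise, flat spectrum P
    Sk = np.fft.fft(tapers * s[:, None], axis=0)[5]  # K eigenspectra at one bin
    est_single.append(abs(Sk[0]) ** 2)               # single direct estimate
    est_avg.append(np.mean(np.abs(Sk) ** 2))         # K-taper average

var_ratio = np.var(est_single) / np.var(est_avg)     # should be close to K = 8
```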


The estimate for P(t,ω) given in equation (14) is unbiased for white noise. For the estimate to be unbiased for signals other than white noise, it is only necessary that P(t,ω) be locally white in TF, since the estimate is regularized to ΔT × ΔW. A similar requirement is seen in the stationary case [1], wherein the spectrum is assumed to be smoothly varying so that it is approximately white over ΔW. A class of stochastic processes known as locally stationary processes [12] satisfies the requirement of being smoothly varying in TF, and can be used to describe a wide variety of nonstationary signals. Locally stationary processes are stochastic processes with covariance functions of the form

R(t_1, t_2) = E[s(t_1)\, s^{*}(t_2)] = g\!\left(\frac{t_1 + t_2}{2}\right) f(t_1 - t_2),   (15)

where g(·) is a nonnegative function and f(·) is a valid covariance function; that is, f(t) possesses a nonnegative Fourier transform F(ω). Through a change of variables, the symmetric form of the covariance function is seen to be:

R_s(t,\tau) = E[s(t + \tau/2)\, s^{*}(t - \tau/2)] = g(t)\, f(\tau).   (16)

The TF spectrum is thus given by [11]:

P_s(t,\omega) = g(t)\, F(\omega).   (17)

For locally stationary s(t), P_s(t,ω) will be approximately constant over ΔT × ΔW, and equation (12) will still hold.

The class of processes with such nonnegative TF spectra is easily extended to include a wider range of nonstationary processes [13]. Let s(t) be a locally stationary process with covariance function R_s(t,τ) and corresponding TF spectrum P_s(t,ω). Then the linearly frequency modulated signal s(t)e^{jβt²/2} will have covariance R_s(t,τ)e^{jβtτ} and corresponding nonnegative TF spectrum P_s(t, ω - βt). More generally, let x(t) = s(t)e^{jφ(t)}, where s(t) is locally stationary with symmetric covariance function R_s(t,τ) from equation (16). Then the covariance of x(t) is

R_x(t,\tau) = g(t)\, f(\tau)\, e^{j(\phi(t + \tau/2) - \phi(t - \tau/2))}.   (18)

By making use of the principle of stationary phase [14], it can be shown [13] that the TF spectrum of x(t) is given by:

P_x(t,\omega) = g(t)\, F(\omega - \phi'(t)) = P_s(t,\, \omega - \phi'(t)).   (19)

Thus, a frequency modulated locally stationary (FMLS) process will have a TF spectrum equal to that of the locally stationary process centered around the instantaneous frequency of the FM. The generalization can be taken one step further to define a composite FMLS process, consisting of a sum of statistically independent FMLS processes. The composite signal will also have a nonnegative TF spectrum equal to the sum of the spectra of the individual processes.
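An FMLS process is easy to synthesize and inspect numerically. The sketch below builds x(t) = s(t)e^{jφ(t)} from low-pass-filtered complex noise (a crude locally stationary stand-in) and a quadratic phase, then checks that the short-time spectral centroid tracks the instantaneous frequency φ'(t), as equation (19) predicts. All signal parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 256
n = 4 * fs                                 # 4 seconds of data
t = np.arange(n) / fs

# crude locally stationary base: moving-average-smoothed complex white noise
w = rng.standard_normal(n) + 1j * rng.standard_normal(n)
s = np.convolve(w, np.ones(16) / 16, mode="same")

f0, beta = 20.0, 10.0                      # start frequency (Hz), chirp rate (Hz/s)
phi = 2 * np.pi * (f0 * t + 0.5 * beta * t ** 2)
x = s * np.exp(1j * phi)                   # FMLS process, f_inst = f0 + beta * t

def centroid_freq(x, center, N=64):
    """Short-time spectral centroid (Hz) of a windowed segment."""
    seg = x[center - N // 2:center + N // 2] * np.hanning(N)
    p = np.abs(np.fft.fftshift(np.fft.fft(seg))) ** 2
    freqs = np.fft.fftshift(np.fft.fftfreq(N, 1 / fs))
    return (freqs * p).sum() / p.sum()

f_early = centroid_freq(x, n // 8)         # near t = 0.5 s, f_inst = 25 Hz
f_late = centroid_freq(x, 7 * n // 8)      # near t = 3.5 s, f_inst = 55 Hz
```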

However, when s(t) is an FMLS process, P(t,ω) will almost certainly not be constant over ΔT × ΔW, and equation (12) will fail to be valid. In this case, the smoothing region ΔT × ΔW must be oriented to match the FM of the signal. This reorientation is equivalent to matching the spectrogram window to the FM of the signal. This matching can be accomplished by using a frequency modulated window in the original STFT computation. However, in signals with multiple FM rates, as in a composite FMLS signal, this adaptation must be performed locally in TF, as discussed next.


To locally demodulate the spectrograms, it is first necessary to construct a reliable estimate of the local FM, which is denoted by β(t,ω). Letting the TF dependence be implicit, β can be estimated by computing a local covariance of the multitaper spectrogram normalized by the time spread: ⟨(t - t̄)(ω - ω̄)⟩ / ⟨(t - t̄)²⟩, where t̄ and ω̄ are the local average time and frequency, respectively; their dependence on t and ω is implied. The covariance is computed by integrating over a finite region of the TF plane ΔT × ΔW as a two-dimensional sliding window to provide an estimate of β as a function of t and ω:

\hat{\beta}(t,\omega) = \frac{\int_{\Delta T}\int_{\Delta W} (\tau - t - \bar{t})(\nu - \omega - \bar{\omega})\, \hat{P}(\tau,\nu)\, d\tau\, d\nu}{\int_{\Delta T}\int_{\Delta W} (\tau - t - \bar{t})^{2}\, \hat{P}(\tau,\nu)\, d\tau\, d\nu}.   (20)
t̄ and ω̄ are computed similarly. Integrating over a larger region will provide better variance properties at the expense of possible bias due to multiple signal components with differing FM rates lying within the area of integration.
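The estimate in equation (20) is a local moment computation, and is straightforward to verify on a synthetic TF distribution concentrated along a known FM law ω = ω₀ + βt. The grid and ridge width below are illustrative; on this noiseless example the estimator recovers β essentially exactly.

```python
import numpy as np

beta_true, omega0 = 3.0, 5.0
t = np.linspace(-1.0, 1.0, 101)            # local time axis (centered)
w = np.linspace(0.0, 12.0, 201)            # local frequency axis
T, W = np.meshgrid(t, w, indexing="ij")

# synthetic multitaper spectrogram: a ridge along omega = omega0 + beta * t
P = np.exp(-((W - omega0 - beta_true * T) ** 2) / 0.1)

t_bar = (T * P).sum() / P.sum()            # local mean time
w_bar = (W * P).sum() / P.sum()            # local mean frequency
beta_hat = (((T - t_bar) * (W - w_bar) * P).sum()
            / (((T - t_bar) ** 2) * P).sum())
```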

Once \hat{\beta}(t,\omega) has been estimated, each STFT S_k(t,ω) is dechirped by locally convolving it with the Fourier transform of the chirp e^{-j\beta t^{2}/2}:

S_k^{\beta}(t,\omega) = \int S_k(t,\, \omega - \theta)\, e^{j\theta^{2}/2\beta}\, d\theta.   (21)

This convolution is shift-variant; at each frequency, a new β must be used. This convolution is equivalent to matching the STFT to the local chirp rate. While this convolution at first would appear to be an O(N²) operation, it can actually be implemented much more efficiently. The equivalent chirp in the time domain is of length T, the length of the STFT window. The Fourier transform of this finite-length chirp will then have bandwidth βT. Thus, if the average bandwidth of the various FM components is M = βT bins, an STFT with N frequency samples can be dechirped with only NM multiplies per time slice, comparable to the computational complexity of the STFT itself. Once all of the S_k(t,ω) are dechirped, the multitaper estimate is constructed as usual.
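Convolving each STFT time slice with the Fourier transform of a chirp is equivalent to multiplying the windowed segment by the conjugate chirp before the DFT. The sketch below uses that time-domain form on a single frame to show the effect: the chirp's energy collapses from roughly βT bins to a single tone. The rates and lengths are illustrative.

```python
import numpy as np

fs = 256
N = 256
t = np.arange(N) / fs
beta = 2 * np.pi * 100.0                   # chirp rate in rad/s^2 (assumed value)

# analytic chirp sweeping 40 Hz -> 140 Hz over the 1 s frame
x = np.exp(1j * (2 * np.pi * 40 * t + 0.5 * beta * t ** 2))

def spread(X, frac=0.1):
    """Number of bins holding more than `frac` of the peak magnitude."""
    m = np.abs(X)
    return int((m > frac * m.max()).sum())

X_raw = np.fft.fft(x)                                        # spread over ~beta*T bins
X_dechirped = np.fft.fft(x * np.exp(-0.5j * beta * t ** 2))  # pure 40 Hz tone
```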

5. F-TEST FOR FM TONES


The validity of the multitaper estimate rests on the assumption that the TF spectrum is smoothly varying over ΔT × ΔW. This assumption is violated when spectral lines (FM or otherwise) are present in the signal. In this case, it is necessary to estimate the tones and remove them from the signal. Ordinarily, estimating a tone with unknown FM would be extremely difficult. This task is made easier, however, by the local matching described above. Once the individual STFTs S_k(t,ω) have been adapted to the local FM, any frequency modulated tones in the signal will behave exactly as a stationary tone would behave in a non-adapted STFT. As a result, an F-test for the existence of any FM tones in the TF spectrum can be defined by directly extending Thomson's approach in the stationary case. The expected value of the kth dechirped STFT for an FM tone μe^{jφ(t)} with instantaneous frequency ω = φ'(t) is:

E[S_k(t,\omega)] = \mu\, U_k(0).   (22)

The mean can then be estimated via regression:

\hat{\mu}(t,\omega) = \frac{\sum_{k=1}^{K} U_k^{*}(0)\, S_k(t,\omega)}{\sum_{k=1}^{K} |U_k(0)|^{2}}.   (23)

The variance of this estimate is equal to the background TF spectrum with the spectral line removed, which is:

\hat{P}(t,\omega) = \frac{1}{K} \sum_{k=1}^{K} \big| S_k(t,\omega) - \hat{\mu}(t,\omega)\, U_k(0) \big|^{2}.   (24)

The F-test at time t is then given by the ratio of the power of the spectral line to that of the background:

F(t,\omega) = \frac{(K - 1)\, |\hat{\mu}(t,\omega)|^{2} \sum_{k=1}^{K} |U_k(0)|^{2}}{\sum_{k=1}^{K} \big| S_k(t,\omega) - \hat{\mu}(t,\omega)\, U_k(0) \big|^{2}}.   (25)

Under the null hypothesis, the test quantity at a single time is the ratio of two χ² random variables with 2 and 2(K - 1) degrees of freedom. For a signal of length T and an STFT of order N, there will be T/N independent blocks of data. Thus, the final F-test will be a ratio of χ² random variables with 2T/N and 2(K - 1)T/N degrees of freedom, integrated along the contour specified by ω = φ'(t):

F(\phi'(t)) = \frac{(K - 1) \sum_{t} |\hat{\mu}(t,\phi'(t))|^{2} \sum_{k=1}^{K} |U_k(0)|^{2}}{\sum_{t} \sum_{k=1}^{K} \big| S_k(t,\phi'(t)) - \hat{\mu}(t,\phi'(t))\, U_k(0) \big|^{2}}.   (26)


If the F-test achieves the specified confidence level, the tone should be removed by subtracting it from the STFTs prior to forming the TF spectrum, then added back into the representation as an impulse:

\hat{P}(t,\omega) = |\hat{\mu}(t,\omega)|^{2}\, \delta(\omega - \phi'(t)) + \frac{1}{K} \sum_{k=1}^{K} \big| S_k(t,\omega) - \hat{\mu}(t,\omega)\, U_k(\omega - \phi'(t)) \big|^{2}.   (27)

Matching the STFTs to the local FM greatly simplifies the F-test. With no matching, the STFT of an FM tone will be spread according to the sweep rate, and will thus have a functional form dependent on β. After matching, the FM tone will have the same response as a stationary tone in an unmatched STFT. Thus, the expression for μ̂ in equation (23) can be used for all FM rates. The procedure for testing for an FM tone is then a four-step process: compute the test statistic F(t,ω) over time and frequency; find candidate contours ω(t) = φ'(t) in F(t,ω); compute F(φ'(t)); and test its significance.
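A minimal single-point version of this test (equations 22-25) can be sketched as follows: regress the K eigenspectra on the taper values U_k(0) to get μ̂, then form the ratio of line power to residual power. The U_k(0) values and noise here are synthetic stand-ins, purely to illustrate the mechanics.

```python
import numpy as np

rng = np.random.default_rng(2)
K = 8
U0 = rng.standard_normal(K)                 # stand-in for the taper values U_k(0)

def f_test(Sk, U0):
    K = len(Sk)
    mu = (U0 @ Sk) / (U0 @ U0)              # regression estimate of the line amplitude
    resid = Sk - mu * U0                    # background left after removing the line
    return (K - 1) * abs(mu) ** 2 * (U0 @ U0) / (np.abs(resid) ** 2).sum()

noise = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
F_tone = f_test(10.0 * U0 + noise, U0)      # strong tone present: F is large
F_null = f_test(noise, U0)                  # background only: F is O(1)
```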



[1] D. J. Thomson, "Spectrum estimation and harmonic analysis," Proceedings of the IEEE, vol. 70, pp. 1055-1096, 1982.

[2] D. Thomson and A. Chave, "Jackknifed error estimates for spectra, coherences, and transfer functions," in Advances in Spectrum Analysis and Array Processing, Vol. I (S. Haykin, ed.), pp. 58-113, Prentice-Hall, 1991.

[3] M. Bayram and R. G. Baraniuk, "Multiple window time-frequency analysis," in Proceedings of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, (Paris, France), pp. 173-176, June 1996.

[4] I. Daubechies, "Time-frequency localization operators: a geometric phase space approach," IEEE Transactions on Information Theory, vol. 34, no. 4, pp. 605-612, 1988.

[5] F. Çakrak and P. Loughlin, "Multiple window non-linear time-varying spectral analysis," IEEE Transactions on Signal Processing, 1999. Submitted.

[6] J. Pitton, "Nonstationary spectrum estimation and time-frequency concentration," in IEEE Conference on Acoustics, Speech, and Signal Processing, vol. IV, pp. 2425-2428, IEEE, 1998.

[7] J. Pitton, "Time-frequency spectrum estimation: an adaptive multitaper method," in IEEE Int. Sym. Time-Frequency and Time-Scale Analysis, (Pittsburgh, PA), pp. 665-668, 1998.

[8] J. Pitton, "Adaptive multitaper time-frequency spectrum estimation," in SPIE Advanced Sig. Proc. Algs., Archs., Impl. VII, 1999.

[9] M. Priestley, Spectral Analysis of Time Series. Academic Press, London, 1981.

[10] J. Pitton, "Linear and quadratic methods for positive time-frequency distributions," in IEEE Conference on Acoustics, Speech, and Signal Processing, vol. 5, pp. 3649-3652, IEEE, 1997.

[11] P. Flandrin, "On the positivity of the Wigner-Ville spectrum," Signal Processing, vol. 11, no. 2, 1985.

[12] R. Silverman, "Locally stationary random processes," IRE Transactions on Information Theory, vol. IT-3, September 1957.

[13] J. Pitton, "The statistics of time-frequency analysis," Journal of the Franklin Institute, 2000. To appear.

[14] E. Key, E. Fowle, and R. Haggarty, "A method of designing signals of large time-bandwidth product," IRE Intern. Conv. Record, vol. IV, 1961.



Shawn  Kraut  and  Jeffrey  Krolik 

Department  of  Electrical  and  Computer  Engineering 
Duke  University,  Box  90291 
Durham,  NC  27708-0291 



We consider the problem of constructing an optimal reduced-rank subspace for parameter estimation, in models where the data is a non-linear function of the parameters. The solution which minimizes mean-squared error is a compromise between the prior distribution and the measurement model, reducing to the Karhunen-Loeve Transform when only the prior is considered. The measurement model determines which parameters the measured data is less sensitive to, and which are therefore less estimable. Our approach obtains parameterizations in which the influence of these parameters is reduced, so that limited resources may be allocated to more estimable features. We apply it to the problem of estimating index-of-refraction profiles from sea-surface clutter data.


In this paper we will consider the problem of constructing a reduced-dimension subspace in which to search for parameter estimates θ̂. Non-linear models of the form y = L(θ) + n are considered, where the measured data y depends on the parameter set θ through the non-linear model L(·), and is corrupted by additive noise n. We will discuss this problem in the specific case of estimating the tropospheric index of refraction profile from clutter returns received from ship-based microwave radars [1]. In the "refractivity from clutter" (RFC) problem, the data y consists of clutter returns across range, and the description of propagation through the refractivity profile yields the non-linear model.

We ask the following question: what is the optimal reduced-rank basis for searching for estimates of the parameter set? From an engineering standpoint, estimating the full refractivity profile would require a search through a large-dimensional parameter space, and would be too computationally slow for real-time estimation of a dynamically varying profile. From a modeling standpoint, we are interested in what reduced parameterizations one should be estimating.

This work was supported by SPAWAR Systems Center, San Diego, under contract No. N66001-97-D-5028. Presented at the 10th IEEE Workshop on Statistical Signal and Array Processing, Pocono Manor, Pennsylvania, August 14-16, 2000.

The Karhunen-Loeve Transform (KLT) describes the optimal reduced-rank linear subspace for minimizing compression or representation error [2], by considering the prior statistical distribution of the parameters. The subspace is constructed from the dominant eigenvectors of the prior covariance matrix of the parameter vector, R_{θθ} = E{θθ†} (with the mean of θ subtracted out). The limitation of the KLT is that it does not incorporate the estimation problem: what parameterizations can be estimated from the data with the smallest estimation error? If one were to consider estimation error alone, then one would build the reduced-rank search space from the model L(·), ignoring the prior. But the resulting parameter basis functions might not represent well the natural distribution of the parameters. In the RFC example, profiles that are built from such a basis will not necessarily look like natural, typically observed index-of-refraction profiles (for an example, see Figure 1).
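The KLT construction itself is a few lines of linear algebra. The sketch below builds a rank-r basis from the dominant eigenvectors of a sample prior covariance and compares its mean-squared representation error against a random subspace; the synthetic length-17 "profiles" with a decaying eigenvalue spectrum are illustrative stand-ins for real refractivity data.

```python
import numpy as np

rng = np.random.default_rng(3)
dim, n_samp, r = 17, 500, 4

# synthetic zero-mean parameter vectors with strongly anisotropic covariance
A = rng.standard_normal((dim, dim)) * (0.7 ** np.arange(dim))
theta = rng.standard_normal((n_samp, dim)) @ A.T

R = theta.T @ theta / n_samp               # sample prior covariance R_theta_theta
evals, evecs = np.linalg.eigh(R)
U_klt = evecs[:, ::-1][:, :r]              # dominant-eigenvector (KLT) basis

U_rand, _ = np.linalg.qr(rng.standard_normal((dim, r)))  # random rank-r basis

def rep_err(U, X):
    """Mean-squared representation error of the rows of X in the span of U."""
    resid = X - X @ U @ U.T
    return np.mean(np.sum(resid ** 2, axis=1))

err_klt = rep_err(U_klt, theta)
err_rand = rep_err(U_rand, theta)          # the KLT error is the smaller of the two
```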

The optimal basis, in an MSE sense, is a compromise between the two considerations of estimation and representation error. What is this basis? In the case of linear models (y = Lθ + n), the problem has been investigated and solved in two contexts. Examining Wiener filters, in the form R_{θy}R_{yy}^{-1}, Scharf found that the optimal (minimum mean-square error) reduced-rank Wiener filter is given by truncating the singular-value decomposition (SVD) of R_{θy}R_{yy}^{-1/2}, to give trunc{R_{θy}R_{yy}^{-1/2}}R_{yy}^{-1/2} (see [3], p. 330, and [4]). More recently, Hua et al. suggested the generalized KLT (GKLT), constructed from the dominant eigenvectors of R_{θy}R_{yy}^{-1}R_{yθ} (see [5, 6]).


Figure 1: A typical tropospheric index of refraction profile, with a tri-linear shape characterized in part by base height, duct height, and M-deficit.


We are considering the same problem, but in the context of non-linear models. Furthermore, in the RFC case we will discuss, a closed-form analytical model for L(θ) is not available; the clutter return y that would result from a given profile θ must be computed numerically. How then do we find the optimal reduced-rank parameter basis?


Generally, one seeks a solution θ̂ that maximizes some objective function C:

\max_{\theta}\; C(y, L(\theta)) \;\rightarrow\; \hat{\theta}.   (1)

For example, for a MAP (maximum a posteriori) estimator, C is the posterior probability density of the parameters given y. In this work we are seeking to identify linear, reduced-rank parameterizations of the form θ_r = U_r b. Here U_r is a "tall" matrix with orthonormal columns, i.e. U_r†U_r = I. The problem is then reduced to searching over candidate values of b:

\max_{b}\; C(y, L(U_r b)) \;\rightarrow\; \hat{\theta}_r = U_r \hat{b},   (2)

and the basic question is: how do we choose U_r?

Some useful results can be obtained by assuming the following two axioms: both the full-rank and reduced-rank estimators are uncorrelated with (orthogonal to) the error of the full-rank estimator:

(A)\;\; E\big[\hat{\theta}\, (\theta - \hat{\theta})^{\dagger}\big] = 0; \quad \text{and} \quad (B)\;\; E\big[\hat{\theta}_r\, (\theta - \hat{\theta})^{\dagger}\big] = 0.   (3)

The first condition is strictly true for the conditional mean (CM) estimator, which is also the Minimum Mean-Squared Error (MMSE) estimator, and for the Linear MMSE (LMMSE) estimator (Wiener filter). It can be shown that the second condition is strictly true if θ̂ is constructed from the MMSE estimator or LMMSE estimator, and θ̂_r is constructed from the same type of estimator of θ_r = U_r†θ. This condition basically excludes θ̂_r from bringing in side information about θ that is not present in θ̂. (In simple terms, we don't have the situation where θ̂ is a poor estimator while θ̂_r is simultaneously based on a good estimator.)

A consequence of this condition is that the error correlation of θ̂_r is greater than that of θ̂:

Q_r = Q + E\big[(\hat{\theta} - \hat{\theta}_r)(\hat{\theta} - \hat{\theta}_r)^{\dagger}\big] \;\geq\; Q,   (4)

where Q = E[(θ - θ̂)(θ - θ̂)†]. If we seek the reduced-rank estimator that minimizes the residual MSE (the trace of E[(θ - θ̂_r)(θ - θ̂_r)†]), it can be shown that the error correlation can be rewritten as

Q_r = Q + (I - P_r)\, R_{\hat{\theta}\hat{\theta}}\, (I - P_r),   (5)

where R_{θ̂θ̂} = E[θ̂θ̂†] is the estimator correlation, and P_r = U_r U_r† is the projection onto the reduced-rank subspace. Using the same argument as that taken for the KLT, the reduced-rank subspace is then constructed from the dominant eigenvectors of R_{θ̂θ̂}. It should be noted that in the case of the linear model, this result reduces to the "generalized KLT" discussed in [3, 4, 5, 6]; i.e., R_{θ̂θ̂} becomes R_{θy}R_{yy}^{-1}R_{yθ}.

This solution is intuitively pleasing: to find a reduced-rank subspace in which to search for parameter estimates, search the subspace where the full-rank estimates naturally tend to lie. Also, note that a consequence of the first orthogonality condition is that the a priori covariance of the parameter vector θ = (θ - θ̂) + θ̂ can be decomposed into the correlation of the error (θ - θ̂) and the correlation of the full-rank estimator θ̂. This observation can be written in the form of a "Pythagorean theorem":

R_{\theta\theta} = Q + R_{\hat{\theta}\hat{\theta}} \;\;\rightarrow\;\; R_{\hat{\theta}\hat{\theta}} = R_{\theta\theta} - Q.   (6)



In this formulation, we should take the dominant eigenvectors of the difference between the a priori covariance and the full-rank error correlation, which reduces to the KLT in the limit that the error correlation becomes negligible.

Estimating the covariance matrix R_{θ̂θ̂} over the full parameter space may be computationally intensive in practice, limited by the computation time of the propagation model L(·) over a set of values of θ (either grid points or realizations). Recently, closed-form expressions were obtained for the Fisher information matrix in the RFC estimation problem [7], which could in principle be used to approximate the full-rank error correlation in Equation 6, i.e. Q ≈ J^{-1}. However, this approach is infeasible if the dimension of the initial parameter vector θ is too high, since the multi-dimensional numerical differentiation for the estimate of J requires the evaluation of the nonlinear function L(·) on a number of grid points that increases quadratically with the dimension.

An alternate approach, taken here, is to obtain samples for a sample covariance matrix estimate of R_{θ̂θ̂}, where each sample is an approximate conditional-mean estimate θ̂, formulated as follows:

\hat{\theta} = \int d\theta\; \theta\, f(\theta|y) = \frac{\int d\theta\; \theta\, f(y|\theta)\, f(\theta)}{\int d\theta\; f(y|\theta)\, f(\theta)} \approx \frac{\sum_i \theta_i\, f(y|\theta_i)}{\sum_i f(y|\theta_i)} = \sum_i \theta_i\, w(y, \theta_i),   (7)

where the θ_i are samples drawn from the prior f(θ), i.e. historical data, and w(y, θ_i) is a normalized weighting factor, proportional to the likelihood. So an estimate is obtained by averaging over historical profiles that are weighted by their likelihood of producing the data y.
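The likelihood-weighted average can be sketched directly. Below, a linear toy model with Gaussian noise stands in for the numerical propagation code L(·), purely to keep the example self-contained (the RFC model itself is nonlinear and has no closed form); log-likelihoods are shifted by their maximum before exponentiation for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(4)
dim, n_samp, sigma = 3, 2000, 0.5

L = rng.standard_normal((5, dim))                  # toy stand-in for L(.)
theta_true = np.array([1.0, -2.0, 0.5])
y = L @ theta_true + sigma * rng.standard_normal(5)

theta_i = 2.0 * rng.standard_normal((n_samp, dim)) # draws from a broad prior
resid = y - theta_i @ L.T                          # residuals of each candidate
loglik = -0.5 * np.sum(resid ** 2, axis=1) / sigma ** 2
w = np.exp(loglik - loglik.max())
w /= w.sum()                                       # normalized weights w(y, theta_i)

theta_cm = w @ theta_i                             # likelihood-weighted mean
```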


To evaluate this approach to rank reduction, we used profiles from the VOCAR data set, taken at three sites off the coast of southern California in 1993 [1]. The most straightforward way to apply the KLT approach is to simply take the profiles of M-values (modified refractivity) over a uniform height grid, and concatenate them into the columns of a data matrix Θ, and
Figure 2: An "octo-linear" fit of a profile, consisting of eight linear segments (horizontal axis: modified refractivity).

use the resulting sample covariance R_EOF to generate dominant eigenvectors/EOFs (empirical orthogonal functions), and to generate new random profiles for analysis. Unfortunately, a Gaussian random model with covariance R_EOF fails to reproduce the characteristic tri-linear shape of observed profiles (Figure 1; the second linear segment is responsible for the downward refraction that causes ducting behavior). In particular, the height of the duct (the height of the first two segments in the tri-linear profile) may vary considerably, and averaging over an ensemble of observed profiles tends to suppress the key feature of the duct; it is "washed out" in the sample mean (not shown here). In addition, profiles synthesized from the sample mean and R_EOF tend to have many mini-ducts over the entire height range, features not observed in real data.
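The EOF construction described above (stack profiles as columns, eigendecompose the sample covariance, synthesize from the Gaussian model) can be sketched as follows; the random data matrix here is a hypothetical stand-in for the historical profiles:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for historical M-value profiles sampled on a
# uniform height grid; each column of the data matrix is one profile.
n_heights, n_profiles = 50, 200
profiles = rng.normal(size=(n_heights, n_profiles))

mean_profile = profiles.mean(axis=1, keepdims=True)
centered = profiles - mean_profile
R_eof = centered @ centered.T / n_profiles          # sample covariance

# Dominant eigenvectors (EOFs); eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(R_eof)
eofs = eigvecs[:, ::-1][:, :3]                      # top-three EOFs

# A new random profile from the Gaussian model -- the synthesis step that,
# per the text, fails to reproduce the tri-linear duct shape of real data.
coeffs = rng.normal(size=3) * np.sqrt(eigvals[::-1][:3])
synthetic = mean_profile[:, 0] + eofs @ coeffs
```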

To formulate a random model that synthesizes realistic profiles, and at the same time formulate an initial profile parameterization, we fit each historical profile to a profile consisting of eight linear segments (i.e., an "octo-linear" profile), as shown in Figure 2. This procedure fits the profile to a length-17 parameter vector θ, corresponding to the heights of the eight segments, the widths of the eight segments (or M-deficits), and the M-value at zero height (sea level).

The  key  characteristic  of  this  fit  is  that  it  is  feature- 
based:  referring  to  Figures  1  and  2,  the  top  of  the  sec¬ 
ond  segment  was  generally  chosen  to  correspond  to  the 
middle  of  the  duct  (the  first  two  segments  accounting 
for  base-height),  and  the  top  of  the  fourth  segment  was 
chosen  to  correspond  to  the  top  of  the  duct  (the  first 
four  segments  accounting  for  duct  height).  (For  the  re¬ 
sults  shown  here,  the  fit  was  obtained  manually.)  To 
reduce  a  spurious  source  of  variance  in  these  parame¬ 
ters,  the  historical  profiles  were  edited  to  remove  those 


[Figure panels, vertical axis height (m): "Prior: eigenvector [1]", "Prior: eigenvector [2]", "Estim: eigenvector [1]", "Estim: eigenvector [2]".]

Figure 3: The first four dominant eigenvectors of the prior covariance, R_θθ. Each panel contains three plots: the profiles corresponding to (1) the mean (dashed), and (2) and (3) the mean ± the eigenvector (scaled by the same constant in all four panels).

for  which  the  main  duct  feature  was  not  identifiable 
(such  as  profiles  that  looked  basically  linear,  with  no 
apparent  duct). 

As might be expected, profiles synthesized from a multivariate Gaussian model on feature-based parameters, rather than on the raw profiles, are more realistic in terms of reproducing the gross shape of a typical profile, including the main duct. To ensure positivity, a log-normal model was used on the heights (the appropriateness of which was verified by inspection of histograms from real data). The multivariate normal model was then applied to the vector of log(heights) and M-deficits.

Interestingly, the resulting mean, the dashed line in the panels of Figure 3, looks very tri-linear. The influence of the dominant eigenvectors of the prior covariance R_θθ is depicted by the solid lines in Figure 3. The first eigenvector corresponds to increasing base-height while decreasing M-deficit in the tri-linear model. The second has a lot of energy going into shrinking and expanding the length of the top segment. This by itself is a strict degeneracy: scaling the length of the final segment has no effect on the profile and no effect on the clutter measurements used to

Figure 4: The first four dominant eigenvectors of the sample estimator covariance, R_θ̂.

estimate the profile; it is only an artifact of the initial parameterization scheme. Furthermore, for the measurement method presumed in this study (measurement of sea-surface clutter strength across range), variations in the top half of the profile constitute an effective estimation degeneracy, since they have little effect on the ducting behavior and measured surface clutter, and are therefore difficult to estimate.

We used a sample covariance approach to approximate the estimator covariance R_θ̂ of Equation 5. A likelihood function f(y|θ) is easily obtainable as a function of the propagation loss L(θ) from the transmitter to the sea surface, across range (where the dimension of y and L is the number of range cells). The problem is that the PE (parabolic equation) numerical propagation of the field is time intensive, severely limiting the number of parameter values θᵢ at which the propagation loss can be evaluated.

To generate samples for a sample covariance, we computed approximate conditional-mean estimates θ̂ᵢ, based on the weighted sum of Equation 7. In practice, direct implementation of Equation 7 failed, since the number of samples θᵢ (10,000) was too small to adequately sample the likelihood function, forcing one weight wᵢ to be unity and the rest to be zero. This effect was ameliorated by increasing the standard deviation of the likelihood function by a factor of 35.
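The weight-degeneracy failure and the broadening remedy can be illustrated with an effective-sample-size diagnostic. The log-likelihood values below are hypothetical stand-ins, and the ESS formula is a standard importance-sampling diagnostic rather than a quantity used in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def effective_sample_size(log_w):
    """ESS = 1 / sum(w_i^2) of the normalized weights; a value near 1
    means a single sample carries essentially all the weight (the
    failure mode described in the text)."""
    log_w = log_w - np.max(log_w)
    w = np.exp(log_w)
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

# Hypothetical log-likelihoods for 10,000 prior samples, with the large
# spread typical of a sharply peaked likelihood.
log_lik = -0.5 * rng.chisquare(df=200, size=10_000)

ess_raw = effective_sample_size(log_lik)
# Inflating the likelihood standard deviation by a factor of 35 divides
# the log-likelihood by 35^2, flattening the weights.
ess_tempered = effective_sample_size(log_lik / 35.0 ** 2)
```

Broadening the likelihood trades fidelity for a far larger effective number of contributing historical profiles, which is the effect exploited in the text.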


Figure  5:  The  mean-square-error  (MSE)  for  the  MAP 
estimator  over  a  grid  (1)  based  on  the  prior  covariance 
(KLT)  and  (2)  based  on  the  estimator  covariance. 

The  weighted  sum  can  be  interpreted  as  summing 
over  different  profiles  that  reproduce  well  the  observed 
data.  This  in  turn  has  the  effect  of  averaging  over,  or 
“washing  out” ,  variations  corresponding  to  the  degen¬ 
eracies  discussed  above,  which  have  less  impact  on  the 
measurement,  and  which  are  therefore  less  estimatable. 

The sample covariance of the resulting estimates has the eigenvectors shown in Figure 4. These eigenvectors are qualitatively preferable to those of the prior covariance in terms of their physical interpretations: the first corresponds to increasing duct height while decreasing M-deficit, and the second to increasing base height with increasing M-deficit. Note that the second eigenvector of the prior covariance in Figure 3 (with energy going into scaling the top segment) is here most closely approximated by the fourth eigenvector. So the energy going into this degenerate, non-estimatable feature has been reduced.

To quantitatively compare this parameterization with that of the KLT, two grids consisting of 6000 points were constructed from the dominant three eigenvectors of the prior and estimator covariance, respectively. The number of grid points per axis was determined by the relative energy of the eigenvectors, as reflected by the eigenvalues; 25 × 16 × 15 and 40 × 15 × 10 grids were chosen for the prior and estimator covariance eigenvectors, respectively. The mean-square error decreases when MAP estimates are found over the grid based on the estimator covariance; see Figure 5.
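The grid construction can be sketched as follows, assuming a hypothetical 17 × 17 covariance and a ±2-standard-deviation extent per axis (the extent is an assumption; the paper does not state it):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 17 x 17 covariance standing in for the prior (or estimator)
# covariance of the octo-linear parameter vector.
A = rng.normal(size=(17, 17))
cov = A @ A.T

eigvals, eigvecs = np.linalg.eigh(cov)
basis = eigvecs[:, ::-1][:, :3]            # dominant three eigenvectors
sigma = np.sqrt(eigvals[::-1][:3])         # their standard deviations

# Grid resolution per axis allotted by eigenvector energy, mirroring the
# 25 x 16 x 15 choice quoted in the text (6000 points in total).
dims = (25, 16, 15)
axes = [np.linspace(-2.0 * s, 2.0 * s, d) for s, d in zip(sigma, dims)]
C1, C2, C3 = np.meshgrid(*axes, indexing="ij")
coeffs = np.stack([C1.ravel(), C2.ravel(), C3.ravel()], axis=1)

# Candidate parameter vectors: mean plus a point in the reduced subspace.
mean = np.zeros(17)
grid = mean + coeffs @ basis.T
```

A MAP search then evaluates the likelihood at each of the 6000 candidate vectors and keeps the maximizer.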


In this paper we have discussed the problem of describing the lower-dimensional parameterization of an unknown parameter set that is optimal in the sense of minimizing mean-squared error. This description, in terms of a reduced-rank subspace, depends both on the measurement model by which the data depend on the parameters and on the a priori distribution of the parameters. It can be viewed as a generalization of the Karhunen-Loève Transform, which considers only the prior. The initial parameterization and the nature of the measurement model may contain parameters which are degenerate in the sense that they have less impact on the measured data. The aim of the approach presented in this paper is to seek parameterizations in which the strength of these parameters is decreased, so that the reduced-dimension parameterization emphasizes more estimatable features. We have evaluated this procedure for the application of estimating index-of-refraction profiles from clutter returns, where it produces more physically meaningful reduced-rank basis functions and decreases mean-squared error relative to the KLT basis.


[1]  T.  Rogers,  “Effects  of  the  variability  of  at¬ 
mospheric  refractivity  on  propagation  estimates,” 
IEEE  Trans.  Antennas  Propagat.,  vol.  44,  pp.  460- 
465,  April  1996. 

[2]  C.  W.  Therrien,  Discrete  Random  Signals  and  Sta¬ 
tistical  Signal  Processing,  Signal  Processing  Series, 
Ed.  A.  V.  Oppenheim.  Prentice  Hall,  1992. 

[3]  L.  L.  Scharf,  Statistical  Signal  Processing,  Addison- 
Wesley,  1991. 

[4]  L.  L.  Scharf,  “The  SVD  and  reduced  rank  signal 
processing,”  Signal  Processing,  vol.  25,  pp.  113— 
133,  1991. 

[5]  Y.  Hua  and  W.  Liu,  “Generalized  Karhunen-Loeve 
transform,”  IEEE  Signal  Processing  Letters,  vol.  5, 
no.  6,  pp.  141-142,  June  1998. 

[6]  Y.  Hua  and  M.  Nikpour,  “Computing  the  reduced 
rank  Wiener  filter  by  IQMD,”  IEEE  Signal  Pro¬ 
cessing  Letters,  submitted. 

[7]  J.  Tabrikian,  “Theoretical  performance  limits  on 
tropospheric  refractivity  estimation  using  point-to- 
point  microwave  measurements,”  IEEE  Trans.  An¬ 
tennas  Propagat.,  vol.  47,  no.  11,  pp.  1727-1734, 
November  1999. 




Mark  A.  Kliger  and  Joseph  M.  Francos 

Department  of  Electrical  and  Computer  Engineering 
Ben-Gurion  University 
Beer-Sheva  84105,  Israel. 


We  consider  the  problem  of  jointly  estimating 
the  number  as  well  as  the  parameters  of  two- 
dimensional  sinusoidal  signals,  observed  in  the 
presence  of  an  additive  white  Gaussian  noise 
field.  Existing  solutions  to  this  problem  are 
based  on  model  order  selection  rules,  derived 
for  the  parallel  one-dimensional  problem.  These 
criteria  are  then  adapted  to  the  two-dimensional 
problem  using  heuristic  arguments.  Employ¬ 
ing  asymptotic  considerations,  we  derive  in  this 
paper  a  maximum  a-posteriori  (MAP)  model 
order  selection  criterion  for  jointly  estimating 
the  parameters  of  the  two-dimensional  sinusoids 
and  their  number. 


From  the  2-D  Wold-like  decomposition  we  have  that 
any  2-D  regular  and  homogeneous  discrete  random  field 
can  be  represented  as  a  sum  of  two  mutually  orthog¬ 
onal  components:  a  purely-indeterministic  field  and  a 
deterministic  one.  The  purely-indeterministic  compo¬ 
nent  has  a  unique  white  innovations  driven  moving  av¬ 
erage  representation.  The  deterministic  component  is 
further  orthogonally  decomposed  into  a  harmonic  field 
and  a  countable  number  of  mutually  orthogonal  evanes¬ 
cent  fields.  In  this  paper  we  consider  a  special  case  of 
the  foregoing  general  problem.  More  specifically,  we 
consider  the  problem  of  jointly  estimating  the  num¬ 
ber  as  well  as  the  parameters  of  the  sinusoidal  signals 
comprising  the  harmonic  component  of  the  field,  in  the 
presence  of  the  purely-indeterministic  component,  as¬ 
sumed  here  to  be  a  white  noise  field. 

A  solution  to  this  problem  is  an  essential  compo¬ 
nent  in  many  image  processing  and  multimedia  data 
processing  applications.  For  example,  in  indexing  and 

This  work  was  supported  in  part  by  the  Israel  Ministry  of 
Science  under  Grant  1233198. 

retrieval systems of multimedia data that employ the textural information in the imagery components of the data, e.g., [7], the identification of similar textured surfaces as being such is highly sensitive to errors in estimating the orders of the models of the deterministic
components  of  the  textures.  More  specifically,  in  this 
approach  the  2-D  Wold  decomposition  based  paramet¬ 
ric  model  of  each  textured  segment  of  the  image  also 
serves  as  the  index  of  this  segment.  Therefore  an  accu¬ 
rate  and  robust  procedure  for  estimating  the  orders  as 
well  as  the  parameters  of  the  models  of  the  determin¬ 
istic  components  of  the  textures  is  an  essential  compo¬ 
nent  in  any  such  indexing  and  retrieval  system.  Simi¬ 
lar  requirements  are  posed  by  parametric  content-based 
image  coding  and  representation  methods. 

The  same  type  of  problem,  i.e.,  joint  estimation  of 
the  model  order  and  parameters  for  a  sum  of  2-D  si¬ 
nusoidal  signals  observed  in  additive  noise,  naturally 
arises  in  processing  2-D  SAR  data.  In  this  problem 
however  the  observed  random  field  is  complex  valued, 
where  for  each  scatterer  one  frequency  parameter  cor¬ 
responds  to  the  range  information,  while  the  second 
frequency  parameter  is  the  Doppler.  The  complex  val¬ 
ued  amplitude  of  each  such  exponential  is  proportional 
to  the  radar  cross  section  of  the  target. 

Many  algorithms  have  been  devised  to  estimate  the 
parameters  of  sinusoids  observed  in  additive  white  Gaus¬ 
sian  noise.  Most  of  the  algorithms  assume  that  the 
number  of  sinusoids  is  a-priori  known.  However  this 
assumption  does  not  always  hold  in  practice.  Hence,  in 
the  past  two  decades  the  model  order  selection  problem 
has  received  considerable  attention.  In  general,  model 
order  selection  rules  are  based  (directly  or  indirectly) 
on  three  popular  criteria:  Akaike  information  criterion 
(AIC),  the  minimum  description  length  (MDL)  and 
the  maximum  a-posteriori  probability  (MAP)  criterion. 
All  these  criteria  have  a  common  form  in  that  they  com¬ 
prise  two  terms:  a  data  term  and  a  penalty  term,  where 
the  data  term  is  the  log-likelihood  function  evaluated 
for  the  assumed  model.  However,  most  of  the  papers 

0-7803-5988-7/00/$10.00 © 2000 IEEE


dedicated  to  this  problem  discuss  the  model  order  selec¬ 
tion  problem  for  various  models  of  one-dimensional  sig¬ 
nals,  while  the  problem  of  modeling  multidimensional 
fields  has  received  considerably  less  attention.  Djuric, 
[1],  proposed  a  MAP  order  selection  rule  for  1-D  sinu¬ 
soids  observed  in  additive  white  noise.  Kavalieris  and 
Hannan,  [4],  prove  the  strong  consistency  of  a  crite¬ 
rion,  that  indirectly  employs  the  MDL  principle.  In 
this  framework  the  observation  noise  is  modeled  as  an 
autoregression  of  an  unknown  order.  In  the  special 
case  where  the  noise  process  in  [4]  is  assumed  to  be  a 
white  noise  process,  the  resulting  criterion  is  identical 
to  the  MAP  criterion  derived  in  [1].  Stoica  et  al,  [5] 
proposed  the  cross-validation  selection  rule  and  demon¬ 
strated  its  asymptotic  equivalence  to  the  Generalized 
Akaike  Information  Criterion  (GAIC).  In  [6]  this  crite¬ 
rion  is  applied  to  the  2-D  problem  as  well,  where  the 
penalty  term  is  proportional  to  the  total  number  of  un¬ 
known  parameters,  exactly  as  in  the  1-D  case.  In  this 
paper we derive a MAP model order selection criterion for jointly estimating the number and the parameters of two-dimensional sinusoids observed in additive white Gaussian noise.
The  paper  is  organized  as  follows.  In  Section  2  we 
define  our  notations,  while  in  Section  3  we  formally 
define  the  MAP  model  order  selection  problem.  The 
MAP  model  order  selection  criterion  is  derived  in  Sec¬ 
tion  4.  Finally,  in  Section  5  we  provide  some  numerical 
examples  and  Monte-Carlo  simulations  to  better  illus¬ 
trate  the  performance  of  the  proposed  criterion. 


The considered random field is composed of a harmonic field embedded in Gaussian noise. Let {y(n,m)}, where (n,m) ∈ U and U = {(n,m) | 0 ≤ n ≤ S−1, 0 ≤ m ≤ T−1}, be the observed S × T real-valued data field. The elements of y(n,m) may be represented as

y(n,m) = h(n,m) + u(n,m).  (1)

The field {u(n,m)} is the 2-D zero-mean Gaussian white noise field with variance σ². The field {h(n,m)} is the harmonic random field

$$h(n,m) = \sum_{i=1}^{k} C_i \cos(n\omega_i + m\nu_i) + G_i \sin(n\omega_i + m\nu_i) \qquad (2)$$

where k denotes the number of sinusoidal components in the data model, and (ωᵢ, νᵢ) is the spatial frequency of the ith component. The Cᵢ's and Gᵢ's are the amplitudes of the sinusoidal components in the observed field.

Let  us  define  the  following  matrix  notations: 

y = [y(0,0), ..., y(0,T−1), y(1,0), ..., y(S−1,T−1)]^T  (3)

The vectors u and h are similarly defined. Rewriting (1) we have y = h + u. Let Λ denote the covariance matrix of y. Thus

Λ = σ² I_{ST×ST}  (4)

where I_{ST×ST} is an ST × ST identity matrix. Hence, |Λ| = σ^{2ST}. Also define

a = [C₁, G₁, C₂, G₂, ..., C_k, G_k]^T.  (5)


$$A_i = \left[\, e^{j[0\,\omega_i + 0\,\nu_i]},\ \ldots,\ e^{j[0\,\omega_i + (T-1)\nu_i]},\ \ldots,\ e^{j[(S-1)\omega_i + (T-1)\nu_i]} \,\right]^T \qquad (6)$$

and let us define the following ST × 2k matrix

D = [Re(A₁), Im(A₁), Re(A₂), Im(A₂), ..., Re(A_k), Im(A_k)]  (7)

Using the foregoing notations we have that

y = Da + u.  (8)

In the following it is assumed that the matrix D^T D is full rank.
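A minimal sketch of the observation model, building D from Re(Aᵢ)/Im(Aᵢ) columns and synthesizing y = Da + u per Eq. (8); the frequencies and amplitudes follow the numerical examples of Section 5, while the noise level here is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)

def design_matrix(freqs, S, T):
    """Build the ST x 2k matrix D of Eq. (7): columns Re(A_i), Im(A_i),
    with A_i the complex exponential over the row-major (n, m) grid."""
    n, m = np.meshgrid(np.arange(S), np.arange(T), indexing="ij")
    cols = []
    for w, v in freqs:
        A = np.exp(1j * (n * w + m * v)).ravel()   # ordering of Eq. (3)
        cols.extend([A.real, A.imag])
    return np.column_stack(cols)

# Two sinusoids (k = 2) on a 32 x 32 field, amplitudes C_i = G_i = 1.
S = T = 32
freqs = [(2 * np.pi * 0.155, 2 * np.pi * 0.253),
         (2 * np.pi * 0.112, 2 * np.pi * 0.201)]
D = design_matrix(freqs, S, T)
a = np.ones(4)                                     # [C1, G1, C2, G2]
y = D @ a + rng.normal(scale=1.0, size=S * T)      # Eq. (8): y = Da + u
```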


Let p(k) be the a-priori probability that there exist k sinusoidal components in the observed field. It is assumed that there are Q competing models, where Q > M (M being the actual number of sinusoidal components), and that each model is equiprobable. That is,

p(k) = 1/Q,  k ∈ Z_Q  (9)

where Z_Q = {0, 1, 2, ..., Q}. The MAP estimate of M is the value of k that maximizes the a-posteriori probability p(k|y), where k ∈ Z_Q. More specifically,

$$\hat{M}_{MAP} = \arg\max_{k \in Z_Q}\, p(k \mid y) = \arg\max_{k \in Z_Q} \frac{p(y \mid k)\, p(k)}{p(y)} = \arg\max_{k \in Z_Q}\, \ln p(y \mid k) \qquad (10)$$

where p(y|k) denotes the conditional probability of y given that there are k sinusoidal components in the observed field. Define

W = [ω₁, ω₂, ..., ω_k, ν₁, ν₂, ..., ν_k]^T.  (11)

Also let R₊ denote the positive real line, let A_k = R^{2k}, and let Ω_k = ([0, 2π))^{2k}. Thus, we have that σ ∈ R₊, a ∈ A_k, and W ∈ Ω_k. Using these notations the conditional probability density p(y|k) is expressed as

$$p(y \mid k) = \int_{\Omega_k} \int_{\mathcal{R}_+} \int_{A_k} p(y \mid k, W, \sigma, a)\, p(W, \sigma, a \mid k)\, da\, d\sigma\, dW \qquad (12)$$

where p(W, σ, a|k) is the a-priori probability of W, σ and a given there exist k sinusoidal components in the observed data.

4.1.  Priors  Selection 

Inspecting (10) and (12) we conclude that finding M̂_MAP using the observed data only requires that some assumptions be made regarding the prior distribution of the model parameters, p(W, σ, a|k). Clearly our goal is to derive a model selection rule that is based on a non-informative prior about the parameters. In other words, the selected prior should be chosen such that it represents the lack of a-priori knowledge of the values of the problem parameters before the data is observed. (See, e.g., [2] for a detailed discussion of the problem of choosing non-informative priors.)


p(W, σ, a|k) = p(σ, a|W, k) p(W|k).  (13)

Since the sinusoidal frequencies are assumed independent of each other (i.e., they are not harmonically related), the lack of a-priori knowledge of the frequencies is modeled by assuming the frequencies (ωᵢ, νᵢ) to be uniformly distributed on Ω_k. Thus,

p(W|k) = 1/(2π)^{2k}.  (14)

Note that since the probability of ωᵢ being equal to ωⱼ for some i ≠ j is zero (and similarly for νᵢ being equal to νⱼ), we assume in the following that for all i ≠ j, ωᵢ ≠ ωⱼ (and similarly νᵢ ≠ νⱼ). Hence the following derivation of the model order selection criterion holds almost everywhere in the problem probability space, i.e., except for a set of models of probability measure zero.
Given that W and k are known, D is also known and the observation model (8) becomes a linear regression model where the observations are subject to a zero-mean white Gaussian observation noise with variance σ², such that a, σ are unknown. For this problem it is shown in [2] that in the space defined by a and ln σ the shape of the likelihood function surface is "data translated," i.e., it is invariant to translations that result from the different values these parameters assume in different realizations of the observed data. Hence the idea that little is known a-priori relative to the information contained in the observed data is expressed by choosing a prior distribution such that p(ln σ, a|W, k) is locally uniform, or equivalently that

p(σ, a|W, k) ∝ σ⁻¹.  (15)

Substituting (14) and (15) into (13) we have that the desired non-informative prior is given by

p(W, σ, a|k) ∝ (1/(2π)^{2k}) σ⁻¹.  (16)

4.2.  Evaluation  of  the  a-Posteriori  Distribution 

In this subsection we derive an approximate expression for the a-posteriori probability distribution p(y|k) given in (12). Since the noise field {u(n,m)} is Gaussian we have, using (4) and (8),

$$p(y \mid k, W, \sigma, a) = p(u \mid \sigma) = (2\pi\sigma^2)^{-\frac{ST}{2}} \exp\left\{ -\frac{1}{2\sigma^2}\, (y - Da)^T (y - Da) \right\}. \qquad (17)$$

Let â = (D^T D)^{-1} D^T y and let P⊥ denote the projection matrix defined by

P⊥ = I − D(D^T D)^{-1} D^T.  (18)

Using these notations we have that

(y − Da)^T (y − Da) = y^T P⊥ y + (a − â)^T D^T D (a − â).  (19)
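The quadratic-form decomposition above, evaluated at a = â, together with the defining properties of the projection P⊥ (idempotent, annihilating the columns of D), can be checked numerically; the matrix and data below are generic hypothetical values:

```python
import numpy as np

rng = np.random.default_rng(5)

# A generic full-column-rank D and data vector y.
D = rng.normal(size=(100, 4))
y = rng.normal(size=100)

# Least-squares amplitudes and the orthogonal-complement projector.
a_hat = np.linalg.solve(D.T @ D, D.T @ y)              # (D^T D)^{-1} D^T y
P_perp = np.eye(100) - D @ np.linalg.solve(D.T @ D, D.T)

# At a = a_hat the quadratic form in (a - a_hat) vanishes, so the
# residual energy equals y^T P_perp y.
resid = y - D @ a_hat
lhs = resid @ resid
rhs = y @ P_perp @ y
```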


Applying the prior (16) and evaluating the marginal distribution we have

$$\begin{aligned} p(y, W, \sigma \mid k) &= \int_{A_k} p(y \mid k, W, \sigma, a)\, p(W, \sigma, a \mid k)\, da \\ &= \int_{A_k} (2\pi\sigma^2)^{-\frac{ST}{2}} \exp\left\{-\frac{1}{2\sigma^2}(y - Da)^T(y - Da)\right\} \frac{\sigma^{-1}}{(2\pi)^{2k}}\, da \\ &= (2\pi\sigma^2)^{-\frac{ST}{2}} \frac{\sigma^{-1}}{(2\pi)^{2k}} \exp\left\{-\frac{1}{2\sigma^2}\, y^T P^{\perp} y\right\} \int_{A_k} \exp\left\{-\frac{1}{2\sigma^2}(a - \hat{a})^T D^T D\, (a - \hat{a})\right\} da \\ &= (2\pi\sigma^2)^{-\frac{ST-2k}{2}} \frac{\sigma^{-1}}{(2\pi)^{2k}}\, |D^T D|^{-1/2} \exp\left\{-\frac{1}{2\sigma^2}\, y^T P^{\perp} y\right\} \qquad (20) \end{aligned}$$

Next, we evaluate p(y, W|k). Substituting (20) we have

$$p(y, W \mid k) = \int_{\mathcal{R}_+} p(y, W, \sigma \mid k)\, d\sigma \;\propto\; 2^{-2k-1}\, \pi^{-\frac{ST+2k}{2}}\, \Gamma\!\left(\tfrac{ST-2k}{2}\right) |D^T D|^{-1/2} \left(y^T P^{\perp} y\right)^{-\frac{ST-2k}{2}} \qquad (21)$$

where Γ(·) is the standard Gamma function (see, e.g., [2] for the integration result).

Finally, to obtain an expression for the conditional probability p(y|k) we have to evaluate

$$p(y \mid k) = \int_{\Omega_k} p(y, W \mid k)\, dW. \qquad (22)$$

Since a direct analytic solution to this integration problem does not exist, we derive an approximate solution, employing the Laplace integration method (see, e.g., [3]). Following [3], p. 71, we first expand ln p(y, W|k) into a Taylor series about Ŵ, where Ŵ denotes the ML estimate of W. Since Ŵ is a maximum point of the likelihood function, the first-order derivatives of (1/ST) ln p(y, W|k) at this point vanish. Omitting from the expansion terms of order higher than two, we have

$$p(y, W \mid k) = \exp\left\{ ST\, \tfrac{1}{ST} \ln p(y, W \mid k) \right\} \approx \exp\left\{ ST \left[ \tfrac{1}{ST} \ln p(y, \hat{W} \mid k) - \tfrac{1}{2} (W - \hat{W})^T H_{ML}\, (W - \hat{W}) \right] \right\} \qquad (23)$$

where

$$H_{ML} = -\frac{1}{ST} \left. \frac{\partial^2 \ln p(y, W \mid k)}{\partial W\, \partial W^T} \right|_{W = \hat{W}} \qquad (24)$$

is the Hessian matrix of (1/ST) ln p(y, W|k) evaluated at W = Ŵ. As Ŵ is a maximum point of ln p(y, W|k), H_ML is positive definite. Since ln p(y, W|k) is assumed sufficiently smooth at Ŵ, H_ML is symmetric.

Substituting (23) into (22) and employing the Laplace asymptotic approximation we have that as ST → ∞,

$$p(y \mid k) \approx p(y, \hat{W} \mid k)\, (2\pi)^k\, |H_{ML}|^{-1/2}\, (ST)^{-k}. \qquad (25)$$

Lemma 1

$$|H_{ML}| = O(S^{2k} T^{2k}). \qquad (26)$$

Proof: See [9].

Substituting (21) and (26) into (25) we have

$$p(y \mid k) \;\propto\; 2^{-2k-1}\, \pi^{-\frac{ST+2k}{2}}\, \Gamma\!\left(\tfrac{ST-2k}{2}\right) |\hat{D}^T \hat{D}|^{-1/2} \left(y^T \hat{P}^{\perp} y\right)^{-\frac{ST-2k}{2}} (2\pi)^k\, S^{-2k} T^{-2k} \qquad (27)$$

where D̂ and P̂⊥ are the matrices D and P⊥, respectively, with W substituted by its ML estimate Ŵ. It is possible to further simplify (27) by observing that |D̂^T D̂| = O(S^{2k}T^{2k}) (see [9]). Furthermore, employing the asymptotic properties of the Gamma function (see, e.g., [8], p. 31) we have that as ST → ∞,

$$\Gamma\!\left(\tfrac{ST-2k}{2}\right) \approx \Gamma\!\left(\tfrac{ST}{2}\right) \left(\tfrac{ST}{2}\right)^{-k}.$$

Substituting these approximations into (27), and omitting terms that are independent of k, the final form of the model order selection criterion can be readily obtained:

$$\hat{M}_{MAP} = \arg\min_{k \in Z_Q} \left\{ -\ln p(y \mid k) \right\} = \arg\min_{k \in Z_Q} \left\{ \tfrac{ST-2k}{2} \ln\!\left(y^T \hat{P}^{\perp} y\right) + \tfrac{1}{2} \ln |\hat{D}^T \hat{D}| + k \ln \tfrac{ST}{2} + 2k \ln ST + (k+1)\ln 2 \right\} \qquad (28)$$

$$= \arg\min_{k \in Z_Q} \left\{ \tfrac{ST-2k}{2} \ln\!\left(y^T \hat{P}^{\perp} y\right) + 4k \ln ST \right\}. \qquad (29)$$
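A sketch of evaluating the criterion of Eq. (29) over candidate orders k. Per-k ML frequency estimation is replaced here by an assumed ordered candidate list, a simplification of this sketch in which the true frequencies happen to be the first two candidates:

```python
import numpy as np

rng = np.random.default_rng(6)

def design_matrix(freqs, S, T):
    """ST x 2k matrix D of Eq. (7) for the given (omega_i, nu_i) pairs."""
    n, m = np.meshgrid(np.arange(S), np.arange(T), indexing="ij")
    cols = []
    for w, v in freqs:
        A = np.exp(1j * (n * w + m * v)).ravel()
        cols.extend([A.real, A.imag])
    return np.column_stack(cols)

def map_criterion(y, D_full, k, S, T):
    """Eq. (29): (ST - 2k)/2 * ln(y^T P_perp y) + 4k * ln(ST)."""
    ST = S * T
    if k == 0:
        quad = y @ y                         # P_perp = I when D is empty
    else:
        Dk = D_full[:, :2 * k]
        a_hat = np.linalg.lstsq(Dk, y, rcond=None)[0]
        quad = y @ (y - Dk @ a_hat)          # y^T P_perp y
    return (ST - 2 * k) / 2.0 * np.log(quad) + 4 * k * np.log(ST)

S = T = 32
# Candidate frequencies ordered by assumed dominance (hypothetical values;
# the first two generate the data below).
cands = [(2 * np.pi * 0.155, 2 * np.pi * 0.253),
         (2 * np.pi * 0.112, 2 * np.pi * 0.201),
         (2 * np.pi * 0.300, 2 * np.pi * 0.070),
         (2 * np.pi * 0.220, 2 * np.pi * 0.340)]
D_full = design_matrix(cands, S, T)

# Data with exactly two sinusoids (C_i = G_i = 1) plus white noise.
y = D_full[:, :4] @ np.ones(4) + rng.normal(scale=0.5, size=S * T)

scores = [map_criterion(y, D_full, k, S, T) for k in range(5)]
k_map = int(np.argmin(scores))
```

At this noise level the data term drops sharply up to k = 2, after which the 4k ln ST penalty dominates, so the criterion selects the true order.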


To illustrate the performance of the proposed model order selection rule we present some numerical examples. In the examples below, the data field was generated with four equiamplitude sinusoidal components, and we define

$$\mathrm{SNR}_i = 10 \log_{10} \frac{C_i^2 + G_i^2}{2\sigma^2}.$$

The noise is a white Gaussian noise field with variance σ², which is chosen to yield the desired signal-to-noise ratio. In these experiments the signal-to-noise ratio of each component, SNRᵢ, varies in the range of −15 dB to −5 dB, in steps of 1 dB. For each SNR, 100 Monte-Carlo experiments are performed. The data field dimensions are 32 × 32. The frequencies of the sinusoidal components are (−2π0.155, 2π0.253), (−2π0.155, 2π0.296), (−2π0.112, 2π0.274), (2π0.112, 2π0.201). Their amplitudes are given by Cᵢ = Gᵢ = 1, i = 1, ..., 4. The performance results of the proposed MAP selection criterion are summarized in Table 1 for various values of SNRᵢ. For comparison, the performance results of the GAIC criterion, [6], are listed as well. To further illustrate the performance of the proposed MAP model order selection criterion, the probabilities of correct model order selection for the two criteria are depicted in Fig. 1. The simulation results demonstrate that even for modest dimensions of the observed field, and relatively low SNRs, i.e., as low as −9 dB, the error rates of both the MAP and the GAIC model order selection criteria are very low. The performance of the MAP rule is shown to be better than that of the GAIC for lower SNRs. Furthermore, the results indicate that for the lower SNR range, the probability of correct model order selection by the MAP criterion is not only higher, but also the magnitude of the error is much smaller than in the case of the GAIC model order estimate.


Table  1:  Performance  comparison  of  MAP  and  GAIC 
criteria  for  various  values  of  SNR,. 


[1]  P.  M.  Djuric,  “A  Model  Selection  Rule  for  Sinu¬ 
soids  in  White  Gaussian  Noise,”  IEEE  Trans.  Sig¬ 
nal  Process.,  vol.  44,  pp.  1744-1751,  1996. 

Figure  1:  Probabilities  of  correct  model 
order  selection.  The  solid  and  the  dashed 
lines  represent  the  MAP  and  the  GAIC 
performance  curves,  respectively. 

[2] G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, New York: Wiley, 1992.

[3]  N.  G.  De  Bruijn,  Asymptotic  Methods  in  Analysis, 
3rd  edition,  Amsterdam:  North-Holland  Publish¬ 
ing  Co.,  1970. 

[4]  L.  Kavalieris  and  E.  J.  Hannan,  “Determining  the 
Number  of  Terms  in  a  Trigonometric  Regression,” 
J.  Time  Series  Anal.,  vol.  15,  pp.  613-625,  1994. 

[5]  P.  Stoica,  P.  Eykhoff,  P.  Janssen  and  T.  Soder- 
strom,  “Model-Structure  Selection  by  Cross- 
Validation,”  Int.  J.  Control  ,  vol.  43,  pp.  1841- 
1878,  1986. 

[6]  J.  Li  and  P.  Stoica,  “Efficient  Mixed-Spectrum  Es¬ 
timation  with  Application  to  Target  Feature  Ex¬ 
traction,”  IEEE  Trans.  Signal  Process.,  vol.  44, 
pp.  281-295,  1996. 

[7] R. Stoica, J. Zerubia and J. M. Francos, "The Two-Dimensional Wold Decomposition for Segmentation and Indexing in Image Libraries," Int. Conf. Acoust., Speech, Signal Processing, Seattle,

[8]  E.  D.  Rainville,  Special  Functions,  MacMillan, 
New  York,  1967. 

[9]  M.  Kliger  and  J.  M.  Francos,  “MAP  Model  Order 
Selection  Criterion  for  2-D  Sinusoids  in  Noise,”  in 



Dong  Wei 

Center  for  Telecommunications  and  Information  Networking 
Department  of  Electrical  and  Computer  Engineering,  Drexel  University 
Philadelphia,  PA  19104  U.S.A. 



We study the optimum (in the minimum mean-square error sense) linear periodically time-varying deconvolution filter of finite size. We show that the filter can be in the form of a lapped transform or multirate filterbank, and it includes the FIR Wiener filter as a special case. We demonstrate that the proposed filter always possesses a gain over the Wiener filter.


Consider the discrete-time model

$$x[n] = (s * h)[n] + w[n] \qquad (1)$$
$$= \sum_{m=0}^{N-1} h[m]\, s[n-m] + w[n] \qquad (2)$$

where s[n] is the original signal, h[n] is a known linear time-invariant (LTI) system with N taps, w[n] is the additive noise, x[n] is the observed data, and the symbol * denotes convolution. We assume that both s[n] and w[n] are zero-mean, wide-sense stationary, second-order random processes, their second-order statistics are known, and they are uncorrelated, i.e.,

E{s[n] w[k]} = 0  (3)
for  any  n  and  k.  Such  a  model  has  been  widely  used  in 
signal  processing  applications  such  as  filtering,  smooth¬ 
ing,  prediction,  noise  canceling,  and  deconvolution,  just 
to  name  a  few. 

The goal is to estimate the signal s[n] from the noisy, filtered data x[n]. An LTI finite impulse response (FIR) filter f[n] can be applied to x[n]. The resulting estimate of s[n] is given by

$$\hat{s}[n] = (f * x)[n] \qquad (4)$$
$$= \sum_{m=0}^{K-1} f[m]\, x[n-m] \qquad (5)$$

where K is the length of f[n]. The FIR Wiener deconvolution filter is the optimum LTI FIR system (denoted by the vector f_opt) in the minimum mean-square error (MMSE) sense:

$$f_{opt} = \arg\min_{f} E\left\{ |s[n] - \hat{s}[n]|^2 \right\} \qquad (6)$$

where

$$f = [f[0]\ f[1]\ \ldots\ f[K-1]]^T \qquad (7)$$

with the symbol T denoting matrix transpose.
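A minimal sketch of solving (6) through the Wiener-Hopf normal equations, assuming white s[n] and w[n] with known variances and an estimation delay d; the delay and all numerical values are assumptions of this sketch, not taken from the paper:

```python
import numpy as np

# Assumed second-order statistics: s[n] white with variance sig_s2,
# w[n] white with variance sig_w2, channel h known.
h = np.array([1.0, 0.6, 0.3])          # N = 3 taps
sig_s2, sig_w2 = 1.0, 0.1
K = 8                                  # filter length
d = 2                                  # estimation delay: estimate s[n - d]

N = len(h)
r_hh = np.correlate(h, h, mode="full") # deterministic autocorrelation of h

def r_x(tau):
    """Autocorrelation of x: sig_s2 * r_hh[tau] + sig_w2 * delta[tau]."""
    t = abs(tau)
    val = sig_s2 * r_hh[t + N - 1] if t < N else 0.0
    return val + (sig_w2 if tau == 0 else 0.0)

def r_sx(m):
    """Cross-correlation E{s[n-d] x[n-m]} = sig_s2 * h[d - m]."""
    idx = d - m
    return sig_s2 * h[idx] if 0 <= idx < N else 0.0

# Normal (Wiener-Hopf) equations R f = p for the MMSE problem (6).
R = np.array([[r_x(i - j) for j in range(K)] for i in range(K)])
p = np.array([r_sx(m) for m in range(K)])
f_opt = np.linalg.solve(R, p)
mmse = sig_s2 - f_opt @ p              # resulting minimum MSE
```

The delay d is a standard practical device for deconvolution (a strictly causal estimate of s[n] itself would have very limited accuracy); it is not part of the formulation above.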

We now reconsider the optimality of the Wiener filter in (6) from a different viewpoint. The filtering operation in (4) can be expressed as

s_hat[n] = F_LTI x[n]                                     (8)

where

s_hat[n] = [s_hat[n]  s_hat[n-1]  ···  s_hat[n-M+1]]^T,   (9)
x[n] = [x[n]  x[n-1]  ···  x[n-L+1]]^T,                   (10)

F_LTI is the M × L matrix given by

F_LTI = [ f^T  0   0  ···  0
          0   f^T  0  ···  0
          ···
          0   0  ···  0   f^T ],                          (11)

and L = K + M - 1.
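As a small numerical sketch (the sizes and the random test signal are our own choices, not from the paper), the banded structure in (11) can be built and checked against ordinary FIR filtering:

```python
import numpy as np

# Build the banded M x L matrix F_LTI of (11) from an FIR filter f and
# check that F_LTI x[n] reproduces M consecutive FIR outputs.
rng = np.random.default_rng(0)
K, M = 4, 3
L = K + M - 1
f = rng.standard_normal(K)

F_lti = np.zeros((M, L))
for i in range(M):
    F_lti[i, i:i + K] = f              # row i filters x[n-i], ..., x[n-i-K+1]

x = rng.standard_normal(50)
n = 20
xn = x[n:n - L:-1]                     # x[n], x[n-1], ..., x[n-L+1]
block = F_lti @ xn                     # [s_hat[n], ..., s_hat[n-M+1]]
full = np.convolve(x, f)               # ordinary FIR filtering
assert np.allclose(block, [full[n - i] for i in range(M)])
```

Each row of F_LTI is a shifted copy of f, which is exactly the constraint that makes the lapped transform below collapse to an LTI filter.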

The linear lapped transform [1]

s_hat[n] = F x[n]                                         (12)

where F is an M × L constant matrix, is a more general version of linear filtering than (8). We require that M ≤ L. When M = L, the linear lapped transform reduces to a linear block transform.

A few interesting questions arise. Does the Wiener filter result in the optimum (in the MMSE sense) estimate s_hat[n]? If it does not, how can we do better, and what is the best estimate?

In  this  paper,  we  answer  these  questions. 

0-7803-5988-7/00/$  10.00  ©  2000  IEEE 



2.1.  Some  Basics 

Because the same matrix F is applied to every block, i.e.,

s_hat[n - lM] = F x[n - lM]                               (13)

for any integer l, such a linear lapped transform is in general a generic linear periodically time-varying (LPTV) filter with period M. The LPTV filter can be implemented by means of an M-channel multirate filterbank [2].

When M = 1, the LPTV filter reduces to an LTI filter with L taps. For M > 1, the LPTV filter reduces to an LTI filter if and only if the M × L matrix F has the banded structure of F_LTI in (11). This implies that any LTI filter of length up to L - M + 1 is a special case of the LPTV filter characterized by F. Therefore, the optimum LPTV filter of size
M × L always possesses a gain over the Wiener filter of length L - M + 1. Such a gain results from processing data blocks more flexibly than LTI filtering does. For filtering, the two filters require L and L - M + 1 multiplications per data sample, respectively. When M is small compared to L, their computational complexities are comparable.

2.2.  The  Optimum  Filter 

The model in (1) can be expressed in vector form:

x[n] = H s[n] + w[n]                                      (14)

where H is the L × (L + N - 1) matrix given by

H = [ h^T  0   0  ···  0
      0   h^T  0  ···  0
      ···
      0   0  ···  0   h^T ],                              (15)

and

h = [h[0]  h[1]  ···  h[N-1]]^T,                          (16)
s[n] = [s[n]  s[n-1]  ···  s[n-L-N+2]]^T,                 (17)
w[n] = [w[n]  w[n-1]  ···  w[n-L+1]]^T.                   (18)


Define the estimation error in the nth block as in (19)-(22). We attempt to design the optimum F to minimize the mean-square error (MSE) in the block s_hat[n], which is given by

J = (1/M) E{ e^H[n] e[n] }.                               (23)

It follows that

J = (1/M) E{ ‖(FH - A)s[n] + Fw[n]‖² }                    (24)
  = (1/M) E{ tr[ ((FH - A)s[n] + Fw[n]) ((FH - A)s[n] + Fw[n])^H ] }   (25)
  = (1/M) tr[ (FH - A) R_s (FH - A)^H + F R_w F^H ]       (26)
  = (1/M) tr[ F (H R_s H^H + R_w) F^H + A R_s A^H
              - F H R_s A^H - A R_s H^H F^H ]             (27)

where

R_s = E{ s[n] s^H[n] },                                   (28)
R_w = E{ w[n] w^H[n] }.                                   (29)




Setting the gradient of J with respect to F* to the zero matrix 0_{M×L}, we obtain the matrix form of the Wiener-Hopf equation:

F_opt (H R_s H^H + R_w) = A R_s H^H.                      (31)

Therefore, the optimum LPTV filter is

F_opt = A R_s H^H (H R_s H^H + R_w)^{-1}                  (32)

and the resulting minimum MSE is

J_LPTV,min = σ_s² - (1/M) tr[ A R_s H^H (H R_s H^H + R_w)^{-1} H R_s A^H ].   (33)

The optimum filter can be viewed as the extension of the FIR Wiener filter to LPTV systems. Indeed, when M = 1, F_opt reduces to the FIR Wiener filter with L taps. On the other hand, for D = 1, 2, ···, M, the Dth row of the filtering matrix F_opt is the MMSE FIR filter for estimating s[n - D + 1] from the data set {x[m] : n - L + 1 ≤ m ≤ n}.

e[n] = s_hat[n] - A s[n]                                  (19)

where

A = [ I_M  0_{M×(L-M+N-1)} ],                             (20)

so that

e[n] = F (H s[n] + w[n]) - A s[n]                         (21)
     = (FH - A) s[n] + F w[n].                            (22)
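The closed forms (32)-(33) are easy to evaluate numerically. The sketch below (AR(1) signal model, 2-tap channel, white noise; all parameter values are illustrative, not from the paper) builds H, R_s, R_w, computes F_opt, and confirms the gain over the FIR Wiener filter of length K = L - M + 1, recovered here as the special case M = 1:

```python
import numpy as np

# Block MMSE of the optimum M x L LPTV filter (32), evaluated via (26).
def block_mmse(M, L, h, rho, sig_w2):
    N = len(h)
    P = L + N - 1
    H = np.zeros((L, P))
    for i in range(L):
        H[i, i:i + N] = h                                 # banded H of (15)
    k = np.arange(P)
    Rs = rho ** np.abs(k[:, None] - k[None, :])           # unit-power AR(1)
    Rw = sig_w2 * np.eye(L)
    A = np.hstack([np.eye(M), np.zeros((M, P - M))])      # selector of (20)
    F = A @ Rs @ H.T @ np.linalg.inv(H @ Rs @ H.T + Rw)   # (32)
    E = F @ H - A
    return np.trace(E @ Rs @ E.T + F @ Rw @ F.T) / M      # block MSE, cf. (26)

h = np.array([1.0, 0.5])
J_lptv = block_mmse(M=4, L=12, h=h, rho=0.9, sig_w2=0.1)
J_wiener = block_mmse(M=1, L=9, h=h, rho=0.9, sig_w2=0.1)  # K = 12 - 4 + 1 = 9 taps
assert J_lptv <= J_wiener + 1e-12      # the LPTV filter never does worse
```

The inequality holds by construction: the length-9 Wiener filter embedded as in (11) is one admissible F for the M = 4, L = 12 problem.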

2.3.  When  Is  There  No  Gain? 

In  general,  the  performance  of  the  optimum  LPTV  fil¬ 
ter  is  better  than  the  performance  of  the  optimum  LTI 
filter in the sense that

J_LPTV,min ≤ J_LTI,min                                    (34)

where the equality holds if and only if


• the signal s[n] is a white noise process, i.e.,

E{ s[n] s*[n + l] } = σ_s² δ[l],                          (35)

• the noise w[n] is a white noise process, i.e.,

E{ w[n] w*[n + l] } = σ_w² δ[l],                          (36)

• and the LTI system h[n] has only one tap, i.e., N = 1.

2.4.  Asymptotic  Performance  Analysis 

We assume that s[n] and w[n] are both regular processes with rational power spectra.

Let f_D[n] denote the causal, infinite impulse response (IIR) Wiener deconvolution filter for estimating s[n - D] from the data set {x[m] : -∞ < m ≤ n}, where D ≥ 0 indicates a delay. The performance of f_D[n] will be used in our analysis of the asymptotic performance of the proposed optimum LPTV filter. The transfer function of f_D[n] is given by

F_D(z) = (1 / (σ_0 Q(z))) [ z^{-D} H*(1/z*) P_s(z) / (σ_0 Q*(1/z*)) ]_+   (37)

where Q(z) is the monic, minimum-phase factor determined by the spectral factorization of the power spectrum of x[n]:

P_x(z) = H(z) H*(1/z*) P_s(z) + P_w(z)                    (38)
       = σ_0² Q(z) Q*(1/z*),                              (39)

and the subscript "+" indicates the positive-time part of the sequence whose z-transform is contained within the brackets. The resulting MSE is given in (40)-(41) below. As L tends to infinity, the MMSE FIR filter converges to the MMSE causal IIR filter. Therefore,

J_LTI,min ≥ lim_{L→∞} J_LTI,min                           (42)
          = J^{(0)}_IIR,min.                              (43)

If M is fixed and L tends to infinity, then the Dth row of the optimum filtering matrix F_opt converges to f_{D-1}[n]. Therefore,

J_LPTV,min ≥ lim_{L→∞} J_LPTV,min                         (44)
           = (1/M) Σ_{D=0}^{M-1} J^{(D)}_IIR,min.         (45)

If both M and L approach infinity with K = L - M + 1 fixed, then

lim_{L→∞, M→∞} J_LPTV,min = lim_{M→∞} (1/M) Σ_{D=0}^{M-1} J^{(D)}_IIR,min,   (46)
which  corresponds  to  the  MSE  of  the  non-causal  IIR 
Wiener  filter. 

In summary, the optimum LPTV filter outperforms the Wiener filter asymptotically.

We have presented the MMSE linear periodically time-varying deconvolution filter. The proposed filter outperforms its linear time-invariant counterpart at the expense of an increase in computational complexity.

J^{(D)}_IIR,min = r_s[0] - Σ_{l=0}^{∞} Σ_{n=0}^{N-1} f_D[l] h[n] r_s[D - n - l]   (40)

                = (1/2π) ∫_{-π}^{π} P_s(e^{jω}) [ 1 - F_D(e^{jω}) H(e^{jω}) e^{jωD} ] dω.   (41)

The derivations of the optimum filter f_D[n] and its associated MSE are given in the Appendix.

In general, increasing the delay D leads to a smaller J^{(D)}_IIR,min. Asymptotically, f_∞[n] is the non-causal IIR Wiener filter.
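This monotone effect of the delay can be verified numerically. In the sketch below (finite-window Wiener filters, AR(1) signal, 2-tap channel; all values are illustrative), the MSE for estimating s[n - D] from a fixed window of L observations decreases as D grows:

```python
import numpy as np

# Finite-window Wiener MSE for estimating s[n-D] from x[n], ..., x[n-L+1]:
# as D grows the filter sees relatively more "future" data, so smoothing
# (D > 0) attains a lower MSE than filtering (D = 0).
L = 20
h = np.array([1.0, 0.6])                         # illustrative 2-tap channel
N = len(h)
P = L + N - 1
H = np.zeros((L, P))
for i in range(L):
    H[i, i:i + N] = h                            # banded channel matrix as in (15)
k = np.arange(P)
Rs = 0.9 ** np.abs(k[:, None] - k[None, :])      # unit-power AR(1) signal covariance
Rx = H @ Rs @ H.T + 0.1 * np.eye(L)              # covariance of the observed block

mse = []
for D in range(4):
    p = Rs[D] @ H.T                              # cross-correlation E{s[n-D] x[n]^T}
    mse.append(Rs[0, 0] - p @ np.linalg.solve(Rx, p))
assert all(mse[d + 1] <= mse[d] + 1e-12 for d in range(3))
```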

The performance of estimating the signal block A s[n] using the filtering matrix F can be improved if more observed data are processed or, equivalently, if the parameter L is increased.


We now prove (37) and (40).

According to the model given in (1), we first whiten the process x[n] to obtain a unit-variance white noise process

y[n] = (b * x)[n]                                         (48)

where the whitening filter b[n] is given by

B(z) = 1 / (σ_0 Q(z)),                                    (49)
which is causal and stable. Next, we obtain the estimate of s[n - D] by filtering y[n] with a causal IIR filter g[n]:

s_hat[n - D] = Σ_{m=0}^{∞} g[m] y[n - m].                 (50)
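The whitening filter (49) needs the minimum-phase factor Q(z) and the scale σ_0² from the factorization (39). A minimal numerical sketch (the autocorrelation below is a made-up MA(1)-type example, and real coefficients are assumed) recovers both from the roots of the Laurent polynomial:

```python
import numpy as np

# Recover sigma_0^2 and the monic minimum-phase Q(z) of (39) from a
# two-sided autocorrelation by sorting roots by magnitude.
r = np.array([0.4, 1.16, 0.4])            # r_x[-1], r_x[0], r_x[1]
roots = np.roots(r)                       # roots of 0.4 z^2 + 1.16 z + 0.4
inside = roots[np.abs(roots) < 1]         # minimum-phase zeros -> zeros of Q(z)
Q = np.poly(inside)                       # monic coefficients of Q(z)
sigma0_sq = r[len(r) // 2] / np.sum(Q * Q)   # match r_x[0] = sigma_0^2 * sum q_k^2
assert np.isclose(sigma0_sq, 1.0) and np.allclose(Q, [1.0, 0.4])
```

Here P_x(z) = (1 + 0.4 z^{-1})(1 + 0.4 z), so the factorization returns Q(z) = 1 + 0.4 z^{-1} and σ_0² = 1, as asserted.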


To minimize the resulting MSE E{ |s[n - D] - s_hat[n - D]|² } with respect to g[n], we use the orthogonality principle to obtain the Wiener-Hopf equations

E{ (s[n - D] - s_hat[n - D]) y*[n - k] } = 0

for 0 ≤ k < ∞, or equivalently,

r_sy[k - D] - Σ_{m=0}^{∞} g[m] r_y[k - m] = 0

for 0 ≤ k < ∞. Therefore,

G(z) = [ z^{-D} P_sy(z) ]_+.


Note that

r_sy[k] = E{ s[n] y*[n - k] } = E{ s[n] ( Σ_{l=0}^{∞} b[l] x[n - k - l] )* },

which implies that

P_sy(z) = B*(1/z*) H*(1/z*) P_s(z)
        = H*(1/z*) P_s(z) / (σ_0 Q*(1/z*)).








In the time domain,

r_sy[k] = Σ_{l=0}^{∞} b*[l] r_sx[k + l]
        = Σ_{l=0}^{∞} b*[l] E{ s[n + k + l] ( Σ_{m=0}^{N-1} h[m] s[n - m] + w[n] )* }   (58)
        = Σ_{l=0}^{∞} Σ_{m=0}^{N-1} b*[l] h*[m] r_s[k + l + m].   (59)


Since the causal IIR Wiener deconvolution filter for estimating s[n - D] from the data set {x[m] : -∞ < m ≤ n} is given by

F_D(z) = B(z) G(z)                                        (62)
       = (1 / (σ_0 Q(z))) [ z^{-D} H*(1/z*) P_s(z) / (σ_0 Q*(1/z*)) ]_+,

(37) follows.


Also,

r_sx[k] = E{ s[n] ( Σ_{m=0}^{N-1} h[m] s[n - k - m] )* } + E{ s[n] w*[n - k] }
        = Σ_{m=0}^{N-1} h*[m] r_s[k + m].


The resulting minimum MSE is

J^{(D)}_IIR,min = E{ (s[n - D] - s_hat[n - D]) s*[n - D] }   (66)
 = r_s[0] - E{ Σ_{l=0}^{∞} f_D[l] x[n - l] s*[n - D] }       (67)
 = r_s[0] - Σ_{l=0}^{∞} f_D[l] r_sx[l - D]                   (68)
 = r_s[0] - Σ_{l=0}^{∞} Σ_{n=0}^{N-1} f_D[l] h*[n] r_s[l - D + n]   (69)
 = r_s[0] - Σ_{l=0}^{∞} Σ_{n=0}^{N-1} f_D[l] h[n] r_s[D - l - n],   (70)

which proves (40).


[1] H. S. Malvar, Signal Processing with Lapped Transforms. Boston, MA: Artech House, 1992.

[2] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, NJ: Prentice-Hall, 1992.


Fast  Approximated  Sub-Space  Algorithms 

Mohammed A. Hasan† and Ali A. Hasan‡

†Dept. of Electrical & Computer Engineering, University of Minnesota Duluth
‡College of Electronic Engineering, Bani Waleed, Libya


In this paper, fast techniques for invariant subspace separation with applications to the DOA and harmonic retrieval problems are presented. The main feature of these techniques is that they are computationally efficient, as they can be implemented in parallel and can be transformed into matrix inverse-free algorithms. The basic operations used are the QR factorization and matrix multiplication. Specifically, two types of methods are developed. The first method uses a Newton-like iteration and is quadratically convergent. The second method can be developed to have convergence of any prescribed order. Using these approximations, the minimum norm solution for the DOA and harmonic retrieval problems for the projection of the least squares weight onto the signal subspace of the data is obtained simply, without performing any SVD. Some of the developed methods are also examined on several test problems.

1.  Introduction 

The estimation of projections onto a selective set of invariant subspaces of data and covariance matrices is a common requirement in the development of high resolution methods. This situation arises in the adaptive processing of sensor array data or sums of sinusoids, where the estimation of the number of strong signals present in a given set of data and of the projections onto the signal and noise subspaces is essential. Subspace based methods for frequency estimation rely on a low rank system model that is obtained by organizing the observed data samples into vectors. MUSIC and ESPRIT based estimators are then obtained using this vector model.

Projection of the least-squares weight vector onto a subspace of reduced dimension is an established technique for reducing the number of adaptive degrees of freedom used by an adaptive sensor array. The main problem with conventional algorithms for subspace estimation based upon the eigenvalue decomposition (EVD) or singular value decomposition (SVD) is that these decompositions are both expensive to compute and difficult to make recursive or implement in parallel. In contrast, algorithms based on the QR factorization have established pipelinable architectures.
Since many signal processing applications (e.g. projection beamforming, MUSIC) do not explicitly utilize the full set of signal eigenvalues, diagonalizing the covariance matrix of the data is not necessarily advantageous and is not required. Various alternatives were proposed by several authors. Kay and Shaw [1] suggested the use of polynomials and rational functions of the sample covariance matrix for approximating the signal subspace. In [2], Tufts and Melissinos used Lanczos and power-type methods to approximate the signal subspace. Karhunen and Joutsenalo [3] approximated the signal subspace using the discrete Fourier and cosine transforms. Ermolaev and Gershman [4] used powers of the sample covariance matrix based on Krylov subspaces to approximate the noise subspace when the number of impinging signals and a threshold which separates the signal and noise eigenvalues are known a priori. In this work, we assume that a rough estimate of a threshold is known. For useful articles and books, the reader is referred to [5], [6]-[8] and the references therein.

The proposed algorithms could prove useful if a threshold that separates the noise and signal eigenvalues is known. This threshold can, in some cases, be obtained by tracking subspaces where the largest eigenvalue of the current noise subspace, the smallest eigenvalue of the current signal subspace, or the power level of the noise floor is known. In these cases the proposed algorithms can help speed up the computation for the final estimation of subspaces. Another application is when the rank of the signal subspace is known.

2.  Data  Model 

The N samples of a scalar valued signal y(n) are assumed to be the sum of M complex-valued sinusoids in additive zero mean white Gaussian noise:

x_k(n) = a_k e^{j(w_k n + φ_k)},   k = 1, 2, ···, M,
y(n) = Σ_{k=1}^{M} x_k(n) + v(n),  n = 1, 2, ···, N.      (1)

Here a_k > 0 is the amplitude, the frequencies w_1, ···, w_M are assumed to be distinct, and the phases φ_k are assumed to be uniformly distributed on [0, 2π] and mutually independent. The noise v(n) is assumed to be independent of the phases and to satisfy

E{ v(n) v*(n - k) } = σ_v² δ(k),                          (2)

where (·)* denotes complex conjugation and δ(·) is the Kronecker delta function. A low rank matrix representation of the problem is obtained by collecting L > M



received samples in a column vector

y(n) = [y(n)  y(n+1)  ···  y(n+L-1)]^T,                   (3a)

where (·)^T denotes the matrix transpose.

The notation x(n) will denote the vector of sinusoid components

x(n) = [x_1(n)  x_2(n)  ···  x_M(n)]^T.                   (3b)

Hence y(n) can be written as

y(n) = V(w) x(n) + v(n),   n = 1, ···, N - L + 1,         (4)

where  the  additive  noise  vector,  v(n),  is  defined  simi¬ 
larly  to  y(n)  in  (3)  and  V(w)  is  an  L  x  M  Vandermonde 
matrix  given  by 

V(w) = [ 1              1              ···  1
         e^{jw_1}       e^{jw_2}       ···  e^{jw_M}
         ···
         e^{j(L-1)w_1}  e^{j(L-1)w_2}  ···  e^{j(L-1)w_M} ].   (5)


The argument w is omitted in the sequel when not required. The covariance matrix of the received windowed sequence is

R_y = E{ y(n) y*(n) } = V D V* + σ_v² I_L,                (6)

where D = E{ x(n) x*(n) } = diag(a_1², ···, a_M²) is the diagonal covariance matrix of the sinusoid components and I_L is the identity matrix of size L. Note that the noiseless part of y(n) therefore has covariance

V D V*.                                                   (7)

A similar formulation can also be obtained for the direction of arrival (DOA) problem, except that in that case the matrix D is not necessarily diagonal.

In this paper, it is shown that if a threshold that separates the signal and noise eigenvalues, or the dimension of the signal subspace, is a priori known, then the subspace estimation can be obtained using the QR factorization of a large power of the covariance matrix.

3.  Invariant  Subspace  Computation 

Let A be a Hermitian matrix, and let P_0 and P_1 denote the orthogonal projections onto the invariant subspaces corresponding to eigenvalues inside and outside the interval (-|b|, |b|), where b is a nonzero number. An elegant method for computing those invariant subspaces is presented next. Consider the sequence of matrices defined by

S_k = (b^k I_L - A^k)(b^k I_L + A^k)^{-1};                (8)

the eigenvalues of S_k, given by {(b^k - λ_i^k)/(b^k + λ_i^k)}_{i=1}^{L}, converge to 1 or -1 as k → ∞. Thus S_k is bounded for all sufficiently large k. It can be shown that the sequence S_k converges to a matrix S satisfying S² = I_L and SA = AS. Moreover, S and A have the same invariant subspaces inside and outside a circle of radius |b| centered at the origin. If (8) is computed directly using powers of the matrix A, over- and under-flow will occur.

Since the sample covariance matrix is generally positive semidefinite, we will apply this iteration to the shifted matrix. A fast implementation for computing the limit of the sequence, which also avoids the problem of over- and under-flow, is given next.

Algorithm 1:

S_0 = R_y - b I_L,

S_{k+1} = {(I_L + S_k)^r - (I_L - S_k)^r}{(I_L + S_k)^r + (I_L - S_k)^r}^{-1},   k = 0, 1, ···, K.   (9)


It can be shown that S_k satisfies the following elegant error formula:

(S_{k+1} + S)^{-1}(S_{k+1} - S) = {(S_k + S)^{-1}(S_k - S)}^r
                                = {(S_0 + S)^{-1}(S_0 - S)}^{r^{k+1}}.   (10)


This method can be made to converge at any desired rate by choosing an appropriate r. From several numerical experiments, it was observed that for r = 2 a suitable choice is K = 5, while K = 3 suffices if r = 3. Once the desired convergence is obtained, the signal subspace projection is computed as P_s = (I_L + S_K)/2, and the noise subspace projection is approximated as P_n = (I_L - S_K)/2.
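As a concrete sketch (toy diagonal covariance and an illustrative threshold, not an experiment from the paper), Algorithm 1 can be implemented directly:

```python
import numpy as np

# Algorithm 1 on a toy covariance: the iteration drives the eigenvalues of
# S_0 = R_y - b*I to +/-1, and (I + S_K)/2 then approximates the projection
# onto the signal subspace (eigenvalues of R_y above the threshold b).
def algorithm1(Ry, b, r=2, K=8):
    I = np.eye(Ry.shape[0])
    S = Ry - b * I
    for _ in range(K):
        Pp = np.linalg.matrix_power(I + S, r)
        Pm = np.linalg.matrix_power(I - S, r)
        S = (Pp - Pm) @ np.linalg.inv(Pp + Pm)
    return S

Ry = np.diag([10.0, 8.0, 0.5, 0.3])      # 2 signal + 2 noise eigenvalues
S = algorithm1(Ry, b=2.0)
Ps = (np.eye(4) + S) / 2                 # signal subspace projection
Pn = (np.eye(4) - S) / 2                 # noise subspace projection
assert np.allclose(Ps, np.diag([1.0, 1.0, 0.0, 0.0]), atol=1e-6)
assert np.allclose(Ps + Pn, np.eye(4))
```

With r = 2 the eigenvalue map is s → 2s/(1 + s²), and K = 8 iterations already reach machine precision here, consistent with the K = 5 rule of thumb quoted above.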

The  next  results  provide  quadratically  convergent 
methods  for  subspace  computation.  The  significance 
of  the  next  theorem  is  that  it  computes  the  projection 
matrix  for  the  subspaces  whose  eigenvalues  fall  between 
two  numbers  a  and  b. 

Theorem 1. Let X_0 = R_y be an L × L nonsingular matrix and let 0 < a < b be two positive numbers. Let X_k be generated using

X_{k+1} = X_k - (2X_k - (a + b)I_L)^{-1}(X_k² - (a + b)X_k + ab I_L),   (11a)

where I_L is the L × L identity matrix. Then X_k converges quadratically to S = aQ_1 + bQ_2, where Q_1 and Q_2 are the projections onto the spans of the eigenvectors of R_y whose corresponding eigenvalues lie in the left and right half planes, respectively, of the line which perpendicularly bisects the segment between a and b. Moreover, Q_1 = (bI_L - S)/(b - a), Q_2 = (S - aI_L)/(b - a), and X_k satisfies the following error formula:

(X_{k+1} + S)^{-1}(X_{k+1} - S) = {(X_k + S)^{-1}(X_k - S)}².   (11b)
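Read as the Newton step x ← x - p(x)/p'(x) on p(x) = (x - a)(x - b), which is what the quadratic convergence of Theorem 1 corresponds to, the iteration can be sketched as follows (covariance and thresholds are illustrative):

```python
import numpy as np

# Newton iteration for p(x) = (x - a)(x - b): each eigenvalue of X0 flows
# quadratically to whichever of a, b is nearer, so the limit is
# S = a*Q1 + b*Q2.
def newton_ab(X0, a, b, K=12):
    I = np.eye(X0.shape[0])
    X = X0.astype(float).copy()
    for _ in range(K):
        p = X @ X - (a + b) * X + a * b * I       # p(X)
        X = X - np.linalg.solve(2 * X - (a + b) * I, p)
    return X

Ry = np.diag([9.0, 7.0, 1.0, 0.5])                # 2 signal + 2 noise eigenvalues
a, b = 0.75, 8.0                                  # noise-side and signal-side targets
S = newton_ab(Ry, a, b)
Q2 = (S - a * np.eye(4)) / (b - a)                # projection for eigenvalues near b
assert np.allclose(Q2, np.diag([1.0, 1.0, 0.0, 0.0]), atol=1e-8)
```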

It should be stated that the above result holds true for any two numbers a ≠ b. In this case, if a + b = 0 with a ≠ 0, then the subspace decomposition reduces to computing the projections onto the subspaces spanned by the eigenvectors with eigenvalues having positive and negative real parts, respectively. Specifically, if a = -b = 1, the matrix S reduces to the matrix sign function of X_0.

When a threshold b which separates the signal and noise eigenvalues is a priori known, the suggested approach will be very effective in extracting the signal and noise subspaces. More generally, one can derive a


stable  and  quadratically  convergent  algorithm  for  com¬ 
puting  the  invariant  subspace  of  the  matrix  A  in  the 
half-plane  with  boundary  determined  by  the  line  which 
perpendicularly  bisects  the  line  segment  between  2  =  0 
and  2  =  2b. 

Theorem 2. Let A be a nonsingular matrix of size L and let b ≠ 0 be a complex number. For k = 1, 2, ···, let

Z_{k+1} = (1/2) Z_k (Z_k - b I_L)^{-1} Z_k,               (12a)

with Z_1 = A. Then the sequence Z_k converges to 2bZ, where Z is the projection onto the subspace spanned by all eigenvectors whose eigenvalues are in the right half plane with boundary determined by the line which perpendicularly bisects the line segment between z = 0 and z = 2b.

The quadratic convergence of this algorithm can be seen from the error formula, which can be shown to be

(Z_{k+1} - 2bZ) Z_{k+1}^{-1} = {(Z_k - 2bZ) Z_k^{-1}}².   (12b)

Note that the matrix inverse in (12a) can be avoided by utilizing the Schultz iteration [9].
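The recursion (12a) can be sketched on an illustrative diagonal matrix (values our own); the limit divided by 2b recovers the spectral projection:

```python
import numpy as np

# Theorem 2 recursion Z_{k+1} = (1/2) Z_k (Z_k - b I)^{-1} Z_k: eigenvalues
# beyond the perpendicular bisector of the segment [0, 2b] flow to 2b and
# the rest to 0, so Z_K / (2b) approximates the spectral projection.
def theorem2_iteration(A, b, K=12):
    I = np.eye(A.shape[0])
    Z = A.astype(float).copy()
    for _ in range(K):
        Z = 0.5 * Z @ np.linalg.solve(Z - b * I, Z)
    return Z

A = np.diag([5.0, 3.0, 0.5])      # illustrative Hermitian matrix
b = 1.0
Z = theorem2_iteration(A, b)
assert np.allclose(Z / (2 * b), np.diag([1.0, 1.0, 0.0]), atol=1e-6)
```

Scalar check of (12b): (z_{k+1} - 2b)/z_{k+1} = ((z_k - 2b)/z_k)², which is the quadratic convergence claimed.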

The main disadvantage of (9) and (12) is that they require the computation of a matrix inverse. In the following result an implementation of (9) which avoids matrix inverse computation is given.

Theorem 3. Let b be a threshold which separates the signal and noise eigenvalues of the positive definite matrix R_y, and let S_k be the sequence generated as follows:

S_0 = R_y - b I_L,

[ (I_L + S_k)^r ]   [ Q_11  Q_12 ] [ R_k ]
[ (I_L - S_k)^r ] = [ Q_21  Q_22 ] [  0  ]    (QR factorization),

S_{k+1} = (Q_11 + Q_21)^H (Q_11 - Q_21),   k = 0, 1, 2, ···, K;   (13)

then S_k converges to S = P_{λ>b} - P_{λ<b}, the difference of the orthogonal projections onto the subspaces of R_y with eigenvalues above and below the threshold b.


Note that the middle step in Equation (13) involves a QR decomposition. This provides an rth-order convergent algorithm for computing the projections onto the invariant subspaces to the left and right of the line z = b. Once S is computed accurately, the eigen-spaces can be obtained from the QR factorization of I_L - S, i.e., if I_L - S = QR, then

Q*(R_y - b I_L) Q = [ A_1  *
                      0    A_2 ],

where all eigenvalues of A_1 are inside the interval [-|b|, |b|] and those of A_2 are outside that interval. This process can be repeated, if necessary, on the smaller matrices A_1 and A_2. Initial tests of this algorithm have shown that this implementation is stable and convergent even when the matrix A has an eigenvalue of very small magnitude.

We should note that in Iteration (13), orthogonal projections are obtained using only matrix multiplication and the QR factorization. This method can be made to converge at any desired rate by choosing an appropriate r.

Algorithm 2:

Using an analogous derivation, we obtain another inverse-free implementation of (13) for Hermitian matrices:

P_0 = R_y - b I_L,

[    P_k    ]   [ Q_11  Q_12 ] [ R_k ]
[ I_L - P_k ] = [ Q_21  Q_22 ] [  0  ],

P_{k+1} = Q_11 (Q_11 - Q_21)^H,   k = 0, 1, 2, ···, K;

then P_k converges to an orthogonal projection. Let P_{K+1} = QR be a QR factorization; then Q*AQ is block diagonal. This algorithm indicates that projections onto half-planes can be obtained using only matrix multiplication and the QR factorization.

4.  Estimation  of  a  Threshold 

The performance of estimators based on the approximations given in the previous section depends mainly on the accuracy of a threshold that separates the signal and noise eigenvalues, or on whether the dimension of the signal subspace is a priori known.

Since R_y is Hermitian, it has the eigendecomposition R_y = Σ_{i=1}^{L} λ_i u_i u_i^H, where λ_i and u_i are the ith eigenvalue and corresponding eigenvector. For convenience, it is assumed that the eigenvalues are sorted in decreasing order, so that λ_1 ≥ λ_2 ≥ ··· ≥ λ_M > λ_{M+1} = ··· = λ_L = σ_v², with corresponding eigenvectors {u_i}_{i=1}^{L}. The eigenvectors {u_i}_{i=1}^{M} are usually called the signal vectors and the eigenvectors {u_i}_{i=M+1}^{L} are called the noise vectors. If the average of the signal eigenvalues is denoted by λ̄_s, then one can show that trace(R_y)/L is a good estimate of the threshold provided that L is sufficiently large. The main requirement for this threshold is σ_v² < trace(R_y)/L < λ_M, which holds provided

((L - M)/M) (λ_M - σ_v²) ≥ λ̄_s - λ_M.                    (15)

Note that in this inequality the only parameter that can be varied is L. Clearly, if L is much larger than M so that (L - M)/M ≫ 1, then the above inequality will hold.
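The threshold condition can be checked on synthetic eigenvalues (the numbers below are illustrative):

```python
import numpy as np

# Synthetic check of the trace(R_y)/L threshold: the average eigenvalue
# must sit strictly between the noise power and the smallest signal
# eigenvalue, and the sufficient condition (15) holds.
L, M, sv2 = 10, 2, 1.0
lam_sig = np.array([20.0, 12.0])                    # signal eigenvalues
eigs = np.concatenate([lam_sig, np.full(L - M, sv2)])
thr = eigs.sum() / L                                # trace(R_y)/L = 4.0 here
assert sv2 < thr < lam_sig.min()
lam_bar = lam_sig.mean()                            # average signal eigenvalue
assert (L - M) / M * (lam_sig.min() - sv2) >= lam_bar - lam_sig.min()
```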

Although this threshold is very simple to compute, it holds only for the theoretical covariance matrix, i.e., when all noise eigenvalues are the same. Another observation is that (15) holds for smaller L if the spread of the signal eigenvalues is small, and thus the difference λ̄_s - λ_M is small, or if λ_M - σ_v² is large. Both of these cases lead to a smaller L for (15) to hold.

Note that for M = 2, (15) reduces to

(L - 2)(λ_2 - σ_v²) ≥ λ_1 - λ_2.

Also, in the hypothetical case in which all signal eigenvalues are equal, the above threshold is always accurate for any L > M.


When λ̄_s - λ_M is large, one can use a sharper estimate of the threshold based on

ρ = (1/L) Σ_{i=1}^{L} √λ_i.

This estimate can be computed from the covariance matrix, but the computation is lengthy and complicated even when L is low. For example, when L = 2 the value of ρ can be estimated from

T² = trace(R_y) + 2 √(det(R_y)),

where T = ρL. For L = 3, T can be estimated by solving the equation

{(T² - a)² - 4b}² = 64 c T²,

where a, b, c are determined from the characteristic polynomial of R_y, given by λ³ - aλ² + bλ - c.

5.  Simulation  Results 

In this section, frequency estimators based on subspace approximations are examined on several data sets generated by the equation

y(n) = d_1 e^{j(2πf_1 n + φ_1)} + d_2 e^{j(2πf_2 n + φ_2)} + v(n),

where d_1 = 1.0, d_2 = 1.0, f_1 = 0.5, f_2 = 0.52 and n = 1, 2, ···, N = 25. The φ_k are independent random variables uniformly distributed over the interval [-π, π]. The noise v(n) is assumed to be white and uncorrelated with the signal. Note that f_2 - f_1 < 1/N.
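This setup is easy to reproduce. The sketch below generates the data and checks the effective-rank claim used later; the seed and the forward-windowing covariance estimator are our own choices, not specified by the paper:

```python
import numpy as np

# Data per the setup above (d1 = d2 = 1, f1 = 0.5, f2 = 0.52, N = 25) and a
# check that, with the noise removed, the L = 10 windowed covariance has
# effective rank two.
rng = np.random.default_rng(1)
N, L = 25, 10
f1, f2 = 0.5, 0.52
ph1, ph2 = rng.uniform(-np.pi, np.pi, 2)
n = np.arange(1, N + 1)
x = np.exp(1j * (2 * np.pi * f1 * n + ph1)) + np.exp(1j * (2 * np.pi * f2 * n + ph2))

Y = np.array([x[i:i + L] for i in range(N - L + 1)])   # windowed vectors, cf. (3a)
R = Y.conj().T @ Y / Y.shape[0]                         # sample covariance
ev = np.sort(np.linalg.eigvalsh(R))[::-1]
assert ev[2] / ev[0] < 1e-8                             # effective rank 2
```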


The SNR for either sinusoid is defined as 10 log_10(σ_x²/σ_v²), where x(n) = d_1 e^{j(2πf_1 n + φ_1)} + d_2 e^{j(2πf_2 n + φ_2)} and σ_x², σ_v² are the variances of x(n) and v(n), respectively. The size of the covariance matrix is chosen to be L = 10, which in the absence of noise has effective rank two. We performed experiments to compare the proposed methods against the truncated SVD-based MUSIC. The SVD routine in MATLAB is used to compute the signal subspace eigenvectors and eigenvalues required to implement an SVD-based method for comparison. We varied the SNR from 10 to 20 dB in 5 dB steps and estimated the frequencies for data length 25. For each experiment (with data length and SNR fixed), we performed 100 independent trials to estimate the frequencies. We use the following performance criterion,

RMSE = sqrt( (1/N_e) Σ_{i=1}^{N_e} (f_i - f_true)² ),

to compare the results. Here N_e is the number of independent realizations, and f_i is the estimate provided by the ith realization. Several experiments were conducted to test the performance of the algorithms presented in Theorem 3 and the SVD-based MUSIC. The mean values of the estimated frequencies and their RMSE for the SVD-based MUSIC are given in Table 1.






[Table 1 entries (mean and RMSE values at SNR = 20, 15, and 10 dB) not recoverable.]

Table 1: Mean and RMSE of frequencies for data of two complex sinusoids at frequencies 0.50 and 0.52 in noise with SNR = 20, 15, 10 dB; dimension of data vectors L = 10. Theorem 3 is used.


[1] Kay S. M. and Shaw A. K., "Frequency Estimation by Principal Component AR Spectral Estimation Method without Eigendecomposition," IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. 36, No. 1, pp. 95-101, January 1988.

[2] Tufts D. and Melissinos C. D., "Simple, Effective Computation of Principal Eigenvectors and Their Eigenvalues and Application to High-Resolution Estimation of Frequencies," IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. 34, No. 5, pp. 1046-1053, October 1986.

[3] Karhunen J. T. and Joutsenalo J., "Sinusoidal Frequency Estimation by Signal Subspace Approximation," IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. ASSP-40, No. 12, pp. 2961-2972, December 1992.

[4] Ermolaev V. T. and Gershman A. B., "Fast Algorithm for Minimum-Norm Direction-of-Arrival Estimation," IEEE Trans. on Signal Processing, Vol. 42, No. 9, pp. 2389-2394, September 1994.

[5] Kay S. M., Modern Spectral Estimation: Theory and Application, Englewood Cliffs, NJ: Prentice-Hall, 1988.

[6] Hasan M. A. and Hasan A. A., "Hankel Matrices of Finite Rank with Applications to Signal Processing and Polynomials," J. of Math. Anal. and Appls., Vol. 208, pp. 218-242, 1997.

[7] Hasan M. A. and Azimi-Sadjadi M. R., "Separation of Multiple Time Delays Using New Spectral Estimation Schemes with Applications to Underwater Target Detection," IEEE Trans. on Signal Processing, Vol. 46, No. 6, pp. 1580-1590, June 1998.

[8] Hasan M. A., "DOA and Frequency Estimation Using Fast Sub-Space Algorithms," accepted for publication in Journal of Signal Processing.

[9] Stoer J. and Bulirsch R., Introduction to Numerical Analysis, New York: Springer-Verlag, 1980.






Christophe  Andrieu  -  Arnaud  Doucet 

Signal  Processing  Group,  University  of  Cambridge 
Department  of  Engineering,  Trumpington  Street 
CB2  1PZ  Cambridge,  UK 

Email:  ca22  - 


In this paper we propose a method to estimate the frequencies of sinusoids embedded in non-Gaussian noise. We model the noise using mixtures of Gaussians and propose two original, efficient algorithms that allow for marginal MAP estimation of the sinusoid parameters. An outline of the proof of convergence of the algorithms is also given, and simulation results are presented.

1  Introduction 

The harmonic retrieval problem is a fundamental problem in signal processing that has numerous applications in radar, seismology and nuclear magnetic resonance. Many efforts have been devoted to the development of methods that address this problem, ranging from periodogram related procedures to subspace and parametric methods relying on maximum likelihood or Bayesian estimation. The Bayesian estimation of harmonic signals in white Gaussian noise has been the subject of many recent papers, see [1], [2], [4], [5], among others. Here we address the important and more difficult problem of estimating the frequencies of sinusoids embedded in non-Gaussian noise, and formulate it in a Bayesian framework. A commonly used tool to model non-Gaussian distributions consists of using discrete or continuous mixtures of Gaussian distributions, and this is the approach adopted here. The motivation for this choice is that by introducing a proper set of (artificial) missing data, say ξ, one can often design simple and efficient algorithms that allow for the estimation of important features of the posterior distribution related to the problem. However, from a statistical point of view the introduction of missing data can typically lead to inconsistent estimators, as the number of parameters to be estimated typically grows with the number of observations. Joint estimators, i.e. estimators involving ξ, should thus be avoided, and marginal estimation of the parameters should be favoured.

For the case of sinusoids embedded in noise modelled as a mixture of Gaussians, the analytical expression of the marginal posterior distribution of interest is of the form

$$p(a, \omega, \delta \mid y) = \int p(a, \omega, \delta, \xi \mid y)\, d\xi,$$

where $a$ and $\omega$ are the amplitudes and radial frequencies of the sinusoids and $\delta$ are the parameters of the observation noise. Unfortunately it is not

C.  Andrieu  is  sponsored  by  AT&T  Laboratories,  Cambridge  UK.  A. 
Doucet  is  sponsored  by  EPSRC,  UK. 

available in closed form and one has to resort to numerical methods. Monte Carlo methods, and in particular Markov chain Monte Carlo (MCMC) methods, have proved to be efficient tools for the estimation of certain features of complicated posterior distributions, in particular MMSE (Minimum Mean Square Error) estimates, e.g. $\mathbb{E}[(a, \omega, \delta) \mid y]$ in the case treated here.

However, this choice of estimator is not appropriate when the marginal posterior distribution is multimodal and the MMSE estimate is located between the modes, possibly in a region of very low probability. Computing MAP (Maximum A Posteriori) estimates of the frequencies might be preferable in such cases, but whereas MCMC methods are well adapted to the estimation of marginal posterior means, their use to perform MMAP (Marginal MAP) estimation can be questionable. Indeed, in this case further approximations are introduced by histogram or density estimation methods, which require careful tuning of extra parameters.
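The mismatch between MMSE and MAP estimates under multimodality is easy to see numerically. A minimal sketch (the toy bimodal density below is our own illustration, not the paper's posterior):

```python
import numpy as np

# Toy bimodal "marginal posterior": equal-weight mixture of two narrow
# Gaussians centred at normalized frequencies 0.2 and 0.3.
grid = np.linspace(0.0, 0.5, 5001)
sigma = 0.005
density = 0.5 * np.exp(-0.5 * ((grid - 0.2) / sigma) ** 2) \
        + 0.5 * np.exp(-0.5 * ((grid - 0.3) / sigma) ** 2)
density /= np.trapz(density, grid)      # normalize numerically

mmse = np.trapz(grid * density, grid)   # posterior mean (MMSE estimate)
mmap = grid[np.argmax(density)]         # posterior mode (MAP estimate)

print(f"MMSE estimate: {mmse:.3f}")     # ~0.25, between the two modes
print(f"MAP  estimate: {mmap:.3f}")     # one of the modes
```

The posterior mean lands near 0.25, a region of essentially zero probability, while the mode estimate sits on one of the peaks.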

The EM (Expectation Maximization) algorithm is designed to converge towards a stationary point of the marginal posterior distribution. It is however limited to certain classes of models for which the expectation and maximization steps can be performed conveniently. This is why stochastic versions have been proposed, such as SEM (Stochastic EM) or MCEM (Monte Carlo EM). Convergence results are sparse and the algorithms do not always fully exploit the structure of the statistical model. In this paper we propose several Monte Carlo methods for performing MMAP estimation of the frequencies of sinusoids embedded in non-Gaussian noise. The first method relies on the SAME (State Augmentation for Marginal Estimation) algorithm [10]. This algorithm is conceptually very simple and straightforward to implement in most cases, requiring only small modifications to MCMC code written for sampling from $p(a, \omega, \delta, \xi \mid y)$. In order to reduce the computational complexity of this algorithm, we present a stochastic approximation type extension of it. We then present an original analysis of the convergence of the stochastic approximation type algorithm, which relies on a perturbation analysis of the original SAME algorithm. Simulation results are presented that demonstrate the interest of the approach.

This paper is organized as follows. In Section 2 the signal model is given. In Section 3, we formalize the Bayesian model and specify the prior distributions. Section 4 is devoted to Bayesian computation. We propose non-homogeneous MCMC algorithms to perform Bayesian inference for which sufficient conditions for global convergence can be established. The performance of these algorithms is illustrated by computer simulations on synthetic data in Section 6.

0-7803-5988-7/00/$10.00 © 2000 IEEE

2 Problem statement

Let $y = (y_1, y_2, \ldots, y_T)^{\mathrm{T}}$ be an observed vector of $T$ real data samples. The elements of $y$ are the superposition of $k$ sinusoids corrupted by noise $n = (n_1, \ldots, n_T)$:

$$y_t = \sum_{j=1}^{k} a_{c_j} \cos(\omega_j t) + a_{s_j} \sin(\omega_j t) + n_t,$$

where $1 \leq k \leq \lfloor (T-1)/2 \rfloor$, and $a_{c_j}$, $a_{s_j}$ and $\omega_j$ are respectively the amplitudes and the radial frequency of the $j$th sinusoid. We assume that $\omega \in \Omega = \{\omega \in (0, \pi)^k : \omega_{j_1} \neq \omega_{j_2} \text{ for } j_1 \neq j_2\}$.

In vector-matrix form, we have

$$y = D(\omega)\, a + n,$$

where $[a]_{2j-1,1} = a_{c_j}$ and $[a]_{2j,1} = a_{s_j}$ for $j = 1, \ldots, k$. The $T \times 2k$ matrix $D(\omega)$ is defined as $[D(\omega)]_{t,2j-1} = \cos(\omega_j t)$ and $[D(\omega)]_{t,2j} = \sin(\omega_j t)$ for $t = 1, \ldots, T$ and $j = 1, \ldots, k$. The noise is assumed white, distributed according to a mixture of Gaussian distributions, i.e.¹

$$n_t \sim \lambda\, \mathcal{N}(0, \sigma^2) + (1 - \lambda)\, \mathcal{N}(0, \alpha\sigma^2),$$

where $0 < \lambda < 1$ defines the mixture probability, $\sigma^2$ is a global scale parameter and $0 < \alpha < 1$. It is convenient to introduce the so-called missing data $r_{1:T}$ such that

$$n_t \mid r_t \sim \mathcal{N}\!\left(0,\ \sigma^2\, \mathbb{I}_{\{1\}}(r_t) + \alpha\sigma^2\, \mathbb{I}_{\{0\}}(r_t)\right),$$

with $\Pr(r_t = 1) = \lambda$ and $\Pr(r_t = 0) = 1 - \lambda$. This allows us to write the likelihood of the observations as

$$p(y \mid a, \omega, \lambda, r_{1:T}, \alpha, \sigma^2) = |2\pi\sigma^2 \Sigma|^{-1/2} \exp\!\left(-\frac{1}{2\sigma^2}\, (y - D(\omega)a)^{\mathrm{T}} \Sigma^{-1} (y - D(\omega)a)\right),$$

where $\Sigma = \mathrm{diag}\!\left(\mathbb{I}_{\{1\}}(r_j) + \alpha\, \mathbb{I}_{\{0\}}(r_j)\right)$, $j = 1, \ldots, T$. Note that this likelihood is invariant under permutation of the indexes of the pulsations $\omega_j$ if no ordering constraint is introduced, and that consequently MMSE estimates can lead to very poor results. The parameters of the sinusoids, of the noise and the missing data, i.e. $\theta = (a, \omega, \lambda, r_{1:T}, \alpha, \sigma^2)$, are unknown, and our aim is to estimate these parameters, $a$ and $\omega$ being in general the parameters of primary interest. Note that the strategy developed in this paper can be extended to the case of continuous Gaussian mixtures, in order to model heavy-tailed distributions, but we do not consider this case here.

¹This could be extended to the case of discrete mixtures with more components.

3 Bayesian Models and Estimation Objectives

In this paper we follow a Bayesian approach where the unknown parameter vector $\theta$ is regarded as being drawn from an appropriate prior distribution. This prior distribution reflects our degree of belief in the relevant values of the parameters. Note that when no prior knowledge is available, uninformative distributions can be used [3]. This is the approach we follow here. We first propose a model that sets up a probability distribution over the space of possible structures of the signal, and then we state the estimation aims.

3.1 Prior distribution

We set a prior distribution on the unknown parameter vector $\theta = (a, \omega, \lambda, r_{1:T}, \alpha, \sigma^2) \in \Theta$ where $\Theta = \mathbb{R}^{2k} \times \Omega \times (0,1) \times \{0,1\}^T \times (0,1) \times \mathbb{R}^+$. The following uninformative improper prior distribution² is selected:

$$p(a, \omega, \sigma^2 \mid r_{1:T}, \alpha) \propto \frac{|D^{\mathrm{T}}(\omega)\, \Sigma^{-1} D(\omega)|^{1/2}}{\sigma^2}\ \mathbb{I}_{\Omega}(\omega).$$

This prior corresponds to Jeffreys' prior for the linear model [3]. It penalizes close frequencies, as pointed out in [5]. The parameters $\alpha$ and $\lambda$ are assumed distributed according to $\alpha \sim \mathcal{U}(0,1)$ and $\lambda \sim \mathcal{U}(0,1)$, which are vague prior distributions.

3.2 Estimation objectives

Given the observations $y$, Bayesian inference about $\theta$ is based on the posterior distribution $p(\theta \mid y)$ obtained from Bayes' theorem,

$$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta).$$

Our aim is to estimate this joint distribution, from which, by standard probability marginalization and transformation techniques, one can "theoretically" obtain all posterior features of interest, including the marginal distributions, posterior modes or conditional expectations such as the MMSE estimate

$$\mathbb{E}[\theta \mid y] = \int \theta\, p(\theta \mid y)\, d\theta,$$

among others. As discussed in the introduction, this problem can be addressed using MCMC methods, but the use of these techniques for the computation of the MMAP estimator $(a, \omega, \sigma^2, \alpha)_{\mathrm{MMAP}}$, defined as

$$\underset{(a, \omega, \sigma^2, \alpha) \in \mathbb{R}^{2k} \times \Omega \times \mathbb{R}^+ \times (0,1)}{\arg\max}\ p(a, \omega, \sigma^2, \alpha \mid y),$$

can be questionable. In the next section we describe an algorithm that allows this computation to be performed by adapting MCMC techniques for MMAP estimation.
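The observation model and the role of the missing data can be made concrete numerically. The sketch below is our own illustration (all parameter values are arbitrary, not those of the paper's experiments): it simulates $y = D(\omega)a + n$ with two-component Gaussian-mixture noise and evaluates, for each $t$, the conditional probability of $r_t = 1$ given the residuals, i.e. the discrete distribution from which the missing data would be sampled in a Gibbs sweep.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters (not the paper's values).
T, k = 64, 2
omega = np.array([2 * np.pi * 0.2, 2 * np.pi * 0.3])   # radial frequencies
a = np.array([1.0, 0.5, 0.3, 0.8])                     # (a_c1, a_s1, a_c2, a_s2)
lam, alpha, sigma2 = 0.9, 0.1, 0.5

t = np.arange(1, T + 1)
# D(omega): T x 2k matrix with columns cos(w_j t), sin(w_j t), as in Section 2.
D = np.column_stack([f(w * t) for w in omega for f in (np.cos, np.sin)])

r = (rng.uniform(size=T) < lam).astype(int)            # missing data r_{1:T}
noise_var = np.where(r == 1, sigma2, alpha * sigma2)
y = D @ a + rng.normal(scale=np.sqrt(noise_var))

# Conditional posterior of r_t given the residuals e_t = y_t - [D(omega) a]_t:
e = y - D @ a
p1 = lam * np.exp(-0.5 * e**2 / sigma2) / np.sqrt(sigma2)
p0 = (1 - lam) * np.exp(-0.5 * e**2 / (alpha * sigma2)) / np.sqrt(alpha * sigma2)
post_r1 = p1 / (p1 + p0)
print(post_r1[:5])
```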

4 Bayesian Marginal MAP robust spectral estimation

4.1 The SAME algorithm

One might be interested in the marginal MAP estimation of the frequencies, i.e. finding the maximum of $p(a, \omega, \sigma^2, \alpha \mid y)$. In order to achieve this we introduce two versions of the SAME algorithm [10], the second one being a stochastic approximation type algorithm. Let us consider the extended probabilistic model

$$p^{\otimes\gamma}(a, \omega, \sigma^2, \alpha, \lambda_{1:\gamma}, r_{1:T,1:\gamma} \mid y) \propto \prod_{j=1}^{\gamma} p(y \mid a, \omega, \sigma^2, \alpha, \lambda_j, r_{1:T,j})\ p(a, \omega, \sigma^2, \alpha, \lambda_j, r_{1:T,j}),$$

where $\gamma$ is a positive integer and $r_{1:T,j}$ is a replica of the missing data. Clearly this probabilistic model admits as marginal the distribution proportional to $p^{\gamma}(a, \omega, \sigma^2, \alpha \mid y)$, the $\gamma$th power of the marginal posterior. Given a sequence $(\gamma_i)_{i \in \mathbb{N}}$ such that $\lim_{i \to +\infty} \gamma_i = +\infty$, the idea of the SAME algorithm is to run a non-homogeneous Markov chain that admits $p^{\otimes\gamma_i}(a, \omega, \sigma^2, \alpha \mid y)$ as invariant distribution at each iteration $i$.

²A prior distribution $p(\theta)$ is said to be improper if $\int_{\Theta} p(\theta)\, d\theta = +\infty$.

The distribution $p^{\otimes\gamma_i}(a, \omega, \sigma^2, \alpha \mid y)$ concentrates on its set of global maxima as $i \to +\infty$ (this is the idea of simulated annealing), and the algorithm is thus hoped in practice to converge towards a global maximum. Note that when $\gamma_i = 1$ for $i \geq 1$ this algorithm is a standard MCMC algorithm that asymptotically produces samples from $p(\theta \mid y)$. In practice one can make use of the properties of the model and analytically integrate out $a$, $\sigma^2$ and $\lambda_j$, leading to an expression for $p^{\otimes\gamma}(\omega, \alpha, r_{1:T,1:\gamma} \mid y)$ up to a constant. It can be shown that

$$p^{\otimes\gamma}(\omega, \alpha, r_{1:T,1:\gamma} \mid y) \propto \prod_{j=1}^{\gamma} |D^{\mathrm{T}}(\omega)\, \Sigma_j^{-1} D(\omega)|^{1/2}\, |\Sigma_j|^{-1/2} \times |M_{\gamma}|^{1/2} \left[ y^{\mathrm{T}} P_{\gamma}(\omega)\, y \right]^{-(\gamma(T/2+k)+k)} \times \prod_{j=1}^{\gamma} m_j!\, (T - m_j)!,$$

where $m_j = \sum_{t=1}^{T} \mathbb{I}_{\{0\}}(r_{t,j})$ and

$$M_{\gamma}^{-1} = \sum_{j=1}^{\gamma} D^{\mathrm{T}}(\omega)\, \Sigma_j^{-1} D(\omega), \qquad m_{\gamma} = M_{\gamma} D^{\mathrm{T}}(\omega)\, \bar{\Sigma}^{-1} y, \qquad \bar{\Sigma}^{-1} = \sum_{j=1}^{\gamma} \Sigma_j^{-1},$$

$$P_{\gamma}(\omega) = \bar{\Sigma}^{-1} - \bar{\Sigma}^{-1} D(\omega)\, M_{\gamma}\, D^{\mathrm{T}}(\omega)\, \bar{\Sigma}^{-1}.$$

In order to sample from $p^{\otimes\gamma}(\omega, \alpha, r_{1:T,1:\gamma} \mid y)$, we propose the following algorithm:

MCMC algorithm for marginal spectral analysis

1. Initialization: set $\theta^{(0)} = \{\omega^{(0)}, \alpha^{(0)}, r_{1:T,1:\gamma_0}^{(0)}\}$ and $i = 1$.

2. Iteration $i$:

• For $j = 1, \ldots, \gamma_i$, sample $r_{t,j}^{(i)}$ from $p^{\otimes\gamma_i}(r_{t,j} \mid y, \omega^{(i-1)}, \alpha^{(i-1)}, r_{-t,j}^{(i-1)})$ for $t = 1, \ldots, T$.

• Sample $a^{(i)} \sim p^{\otimes\gamma_i}(a \mid y, \omega^{(i-1)}, \alpha^{(i-1)}, r_{1:T,1:\gamma_i}^{(i)})$.

• Sample $\omega_j^{(i)} \sim p^{\otimes\gamma_i}(\omega_j \mid y, \omega_{-j}^{(i)}, \alpha^{(i-1)}, r_{1:T,1:\gamma_i}^{(i)})$, $j = 1, \ldots, k$, with an MCMC step.

• Sample $\alpha^{(i)}, \sigma^{2(i)} \sim p^{\otimes\gamma_i}(\alpha, \sigma^2 \mid y, \omega^{(i)}, r_{1:T,1:\gamma_i}^{(i)})$.

Here $r_{-t}$ means "$r_{1:T}$ with $r_t$ removed", and similarly for $\omega_{-j}$.

We comment on the different sampling steps:

• Sampling $r_{t,j}$ is straightforward as it simply involves sampling from a discrete distribution.

• Sampling $\omega_j$ can be done using an adaptation of the technique described in [1].

• Sampling $a, \sigma^2$ is standard as it requires simulation from an inverse-Gamma distribution and a normal distribution.

• Sampling $\alpha$ mainly amounts to sampling from a truncated inverse-Gamma distribution and can be done efficiently using a rejection method based on the work of [8].

This elegant algorithm allows one to sample from the series of distributions of interest, and convergence results can be proved that support the validity of the approach (see Section 5). However, as $\gamma_i$ approaches infinity the computational burden of the algorithm rapidly becomes unrealistic. Thus we propose here a stochastic approximation adaptation of the algorithm presented above, which is computationally much cheaper.

4.2 The SA2ME algorithm

In the current version of the SAME algorithm, $\gamma_i$ replicas of the variables $r_{1:T}$ are sampled at each iteration $i$, which can rapidly become cumbersome as $\gamma_i$ becomes large. Let $i_0$ be an iteration chosen by the user. Then we propose, from iteration $i_0$ on, not to resample the variables $r_{1:T,1:\gamma_{i-1}}$, which are "frozen" once they are simulated, but simply to sample the new replicas $r_{1:T,\gamma_{i-1}+1:\gamma_i}$. The computational gain of this SA2ME (Stochastic Approximation SAME) algorithm is obvious, and the analogy with classical stochastic approximation algorithms is clear, although here we take advantage of the statistical structure of the problem. However, the algorithm no longer generates a Markov chain, as the update of the parameters at iteration $i$ depends on the past of the chain up to iteration $i - 1$. In fact this new algorithm can be viewed as a perturbation of the original SAME algorithm, and an analysis of these perturbations can be carried out to prove the validity of the new scheme, as sketched in the next section.

5 Convergence analysis

We first state a convergence result for the SAME algorithm and then focus on the SA2ME algorithm.

5.1 SAME algorithm

First we set $\theta_1 = \{a, \sigma^2, \alpha\}$ and $\theta_2 = \{\lambda, r_{1:T}\}$, and name their state spaces $\Theta_1$ and $\Theta_2$. The SAME algorithm defines a Markov chain on $\theta_1$, and it can be proved that this Markov chain is uniformly ergodic for a constant sequence $\gamma_i = \gamma$, i.e. for any probability distribution $\mu$,

$$\lim_{i \to +\infty} \left\| \mu K^i (d\theta_1) - p^{\gamma}(d\theta_1) \right\| = 0,$$

at a geometric rate independent of the initial condition, where $\|\cdot\|$ is the total variation norm. Here $K_i$ is the transition kernel of the SAME algorithm at iteration $i$, which can formally be written as

$$K_i\!\left(\theta_1^{(i-1)}; d\theta_1^{(i)}\right) \propto \int_{\Theta_2^{\gamma_i}} p\!\left(d\theta_1^{(i)} \mid \theta_2^{(1:\gamma_i)}\right) \prod_{j=1}^{\gamma_i} p\!\left(d\theta_2^{(j)} \mid \theta_1^{(i-1)}\right).$$

This convergence result mainly relies on the fact that the parameters $\theta_1$ and $\theta_2$ lie in bounded sets. From this result, and following arguments similar to those used to prove the convergence of simulated annealing, it can be shown that for a logarithmic sequence $\gamma_i$ the SAME algorithm for MMAP estimation converges in the following sense:

$$\lim_{n \to +\infty} \left\| \mu K_1 K_2 \cdots K_n (d\theta_1) - p^{\gamma_n}(d\theta_1) \right\| = 0.$$

Furthermore, as the sequence $p^{\gamma_i}(d\theta_1)$ tends to a mixture of delta functions located at the global maxima of $p(\theta_1)$, we conclude that the algorithm will asymptotically provide us with an estimate of $\theta_{1,\mathrm{MMAP}} = \arg\max\, p(\theta_1)$.
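The annealing idea exploited by SAME, replicating the missing data so that the target is effectively raised to a power $\gamma$, can be visualized on a toy density: as $\gamma$ grows, the normalized $p^{\gamma}$ concentrates its mass around the global maximum. A minimal numerical illustration (the density below is ours, not the paper's posterior):

```python
import numpy as np

grid = np.linspace(-4, 4, 8001)
# Toy bimodal density: global maximum near 1, local maximum near -2.
p = 0.6 * np.exp(-0.5 * (grid - 1.0) ** 2) + 0.4 * np.exp(-0.5 * (grid + 2.0) ** 2)

for gamma in (1, 5, 50):
    pg = p ** gamma
    pg /= np.trapz(pg, grid)                 # normalized p^gamma
    near = np.abs(grid - 1.0) < 0.5          # window around the global maximum
    mass = np.trapz(pg[near], grid[near])
    print(gamma, round(mass, 3))             # mass near the global max grows
```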

5.2 SA2ME

The proof of convergence of the algorithm relies on an analysis of the perturbations introduced by the new scheme relative to the original SAME algorithm. We sketch here the proof, outline the main propositions that lead to the convergence result, and explain their intuitive meaning. We first introduce some notation that will be useful throughout the proof. The transition probability corresponding to the SA2ME algorithm is

$$\bar{K}_{i+1}\!\left(\theta_1^{(i)}, \theta_2^{(1:\gamma_i)};\, d\theta_2^{(\gamma_i+1:\gamma_{i+1})}, d\theta_1^{(i+1)}\right) \propto p\!\left(d\theta_1^{(i+1)} \mid \theta_2^{(1:\gamma_{i+1})}\right) p\!\left(d\theta_2^{(\gamma_i+1:\gamma_{i+1})} \mid \theta_1^{(i)}\right).$$

Here we simply express the fact that the missing data $\theta_2^{(1:\gamma_i)}$ are "frozen" once they are simulated. In order to study the convergence properties of the second algorithm, it will be useful to introduce, for some integer $k$, the transition kernel $K_{i,k}$ of the algorithm for which only the missing data up to iteration $k - 1$, namely $\theta_2^{(1:\gamma_{k-1})}$, are frozen, while the missing data from then on, $\theta_2^{(\gamma_{k-1}+1:\gamma_i)}$, are sampled afresh at each iteration $i \geq k$.
In order to study the convergence properties of our algorithm, we need notation for compositions of these kernels. For a probability measure $\mu$ we write

$$\mu \bar{K}_{1:n}\!\left(d\theta_1^{(n)}, d\theta_2^{(\gamma_{n-1}+1:\gamma_n)}\right) = \int \mu(d\theta_1^{(0)})\, \bar{K}_1\!\left(\theta_1^{(0)}; d\theta_1^{(1)}, d\theta_2^{(1:\gamma_1)}\right) \cdots \bar{K}_n\!\left(\theta_1^{(n-1)}, \theta_2^{(1:\gamma_{n-1})}; d\theta_1^{(n)}, d\theta_2^{(\gamma_{n-1}+1:\gamma_n)}\right),$$

and, for $k > m$, $\mu \bar{K}_{1:k-1} K_{k:n,m}$ denotes the measure obtained by applying the SA2ME kernels up to iteration $k - 1$ and the kernels $K_{j,m}$, $j = k, \ldots, n$, thereafter. Similarly, $\mu K_{1:n}$ denotes $n$ steps of the original SAME algorithm started from $\mu$.

Now that the notation is defined, we can state the main result of this section. We want to study the asymptotic behaviour of the difference between the two stochastic processes; more precisely, we want to prove that, under certain conditions, for any probability measures $\nu$ and $\mu$,

$$\lim_{n \to +\infty} \left\| \nu \bar{K}_{1:n} - \mu K_{1:n} \right\| = 0.$$

A trivial decomposition and the application of the triangle inequality leads to

$$\left\| \nu \bar{K}_{1:n} - \mu K_{1:n} \right\| \leq \left\| \nu K_{1:n} - \mu K_{1:n} \right\| + \left\| \nu \bar{K}_{1:n} - \nu K_{1:n} \right\|.$$

From the result of the previous subsection, the SAME algorithm is ergodic and thus the first term goes to zero as $n \to +\infty$. Consequently we focus on the second term.

Our results are based upon a decomposition into an estimation error and an approximation bias, which we now state.

Proposition 1  For all integers $m_n$ and $n$ such that $m_n < n$, we have the estimate

$$\left\| \mu K_{1:n} - \mu \bar{K}_{1:n} \right\| \leq \left\| \mu K_{1:n} - \mu \bar{K}_{1:m_n} K_{m_n+1:n,m_n} \right\| + \sum_{k=m_n+1}^{n} \left\| \mu \bar{K}_{1:k-1} K_{k,m_n} - \mu \bar{K}_{1:k} \right\|.$$

Proof. For $m_n < n$ we have the telescoping sum

$$\mu K_{1:n} - \mu \bar{K}_{1:n} = \mu K_{1:n} - \mu \bar{K}_{1:m_n} K_{m_n+1:n,m_n} + \sum_{k=m_n+1}^{n} \left( \mu \bar{K}_{1:k-1} K_{k:n,m_n} - \mu \bar{K}_{1:k} K_{k+1:n,m_n} \right),$$

with the convention $K_{n+1:n,m_n} = \mathrm{Id}$. The result is then obtained by first applying the triangle inequality and then the fact that for any probability measures $\mu$ and $\nu$ one has $\| \mu K_{k,m_n} - \nu K_{k,m_n} \| \leq \| \mu - \nu \|$. ∎

Proposition 2  There exists a sequence $m_n$ such that

$$\lim_{n \to +\infty} \left\| \mu K_{1:n} - \mu \bar{K}_{1:m_n} K_{m_n+1:n,m_n} \right\| = 0.$$

Intuitively, during the first $m_n$ iterations $\bar{K}_{1:m_n}$ introduces an approximation error compared to the SAME algorithm, which is then corrected in the following $n - m_n + 1$ iterations by $K_{m_n+1:n,m_n}$. Thus, if $m_n$ increases sufficiently more slowly than $n$, so that $K_{m_n+1:n,m_n}$ can correct and forget, in $n - m_n + 1$ iterations, the error generated during the first $m_n$ iterations, the result follows.

Proposition 3  There exists a sequence $m_n$ such that

$$\lim_{n \to +\infty} \sum_{k=m_n+1}^{n} \left\| \mu \bar{K}_{1:k-1} K_{k,m_n} - \mu \bar{K}_{1:k} \right\| = 0.$$

This result relies on the fact that, for term $k$ in the sum, the two dynamics are identical up to time $k - 1$ and differ only at iteration $k$, where on one hand the replicas $\theta_2^{(\gamma_{m_n}+1:\gamma_k)}$ are "rejuvenated" by $K_{k,m_n}$, and on the other hand only $\theta_2^{(\gamma_{k-1}+1:\gamma_k)}$ is sampled by $\bar{K}_k$. When $\theta_1$ and $\theta_2$ lie in bounded spaces one can bound the error thus introduced, and show that there exists $0 < \vartheta < 1$ such that, for $m_n = n - n^{\vartheta}$, the sum of these errors goes to zero as $n \to +\infty$.

By combining the three propositions and using the convergence result proved for the SAME algorithm, we can deduce the following result.

Theorem 4  There exist sequences $m_n$ and $\gamma_n$ such that, for any probability measure $\mu$,

$$\lim_{n \to +\infty} \left\| p^{\gamma_n} - \mu \bar{K}_{1:n} \right\| = 0,$$

which proves the validity of the SA2ME algorithm under suitable conditions. Note that these results rely on a boundedness assumption on the parameters. We are currently extending these results to more general cases and other problems.


6  Simulation  results 

We applied the two algorithms described above with the following parameters: $T = 64$ and $k = 2$. We define $E_j = a_{c_j}^2 + a_{s_j}^2$; $E_1 = 20$, $E_2 = 6.32$, $-\arctan(a_{s_1}/a_{c_1}) = 0$, $-\arctan(a_{s_2}/a_{c_2}) = \pi/4$, $\omega_1/2\pi = 0.2$ and $\omega_2/2\pi = 0.3$. The SNR is defined as $10 \log_{10} E_1/(2\sigma^2)$ and is equal to 1 dB. Theoretically, the algorithms require a so-called logarithmic cooling schedule $\gamma_i$ and an infinite number of iterations to converge. Such a sequence goes to $+\infty$ too slowly to be used in practice. Here we run the algorithms for 500 iterations and select a linearly growing cooling schedule $\gamma_i = A + Bi$ with $\gamma_0 = 1$ and $\gamma_{500} = 10^2$. We used the same sequence $\gamma_i$ for the second algorithm and set $i_0 = 20$. Note the slower convergence of the second algorithm compared with the first one, as expected.
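The linear schedule used here is determined by its two endpoints. A small helper (our own illustrative code, not the authors') recovers $A$ and $B$ from $\gamma_0 = 1$ and $\gamma_{500} = 10^2$ and rounds each $\gamma_i$ to an integer number of replicas:

```python
def linear_schedule(n_iter=500, gamma_start=1.0, gamma_end=100.0):
    """Return integer replica counts gamma_i = round(A + B*i), i = 0..n_iter."""
    A = gamma_start
    B = (gamma_end - gamma_start) / n_iter
    return [max(1, round(A + B * i)) for i in range(n_iter + 1)]

gammas = linear_schedule()
print(gammas[0], gammas[-1])   # endpoints: 1 and 100
```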

Figure  1 :  Convergence  of  the  SAME  towards  the  marginal  MAP 
estimates  of  the  frequencies 

Figure  2:  Convergence  of  the  SA2ME  algorithm  towards  the 
marginal  MAP  estimates  of  the  frequencies 


References

[1] C. Andrieu and A. Doucet, "Joint Bayesian Detection and Estimation of Noisy Sinusoids via Reversible Jump MCMC," IEEE Trans. Signal Processing, vol. 47, no. 10, pp. 2667-2676, 1999.

[2] P. Barone and R. Ragona, "Bayesian estimation of parameters of a damped sinusoidal model by a Markov chain Monte Carlo method," IEEE Trans. Signal Processing, vol. 45, no. 7, pp. 1806-1814, 1997.

[3] J.M. Bernardo and A.F.M. Smith, Bayesian Theory, Wiley Series in Applied Probability and Statistics, 1994.

[4] G.L. Bretthorst, "Bayesian Spectrum Analysis and Parameter Estimation," Lecture Notes in Statistics, vol. 48, Springer-Verlag, New York, 1988.

[5] P.M. Djuric and H. Li, "Bayesian spectrum estimation of harmonic signals," IEEE Signal Processing Letters, vol. 2, no. 11, pp. 213-, 1995.

[6] A. Doucet and C. Andrieu, "Robust Bayesian spectral analysis using MCMC," in Proc. EUSIPCO'98, Island of Rhodes, Sept. 1998.

[7] E.T. Jaynes, "Bayesian Spectrum and Chirp Analysis," in Maximum Entropy and Bayesian Spectral Analysis and Estimation Problems, D. Reidel, Dordrecht, Holland, 1987.

[8] A. Philippe, "Simulation of right and left truncated gamma distributions by mixtures," Statistics and Computing, vol. 7, pp. 173-181, 1997.

[9] D.C. Rife and R.R. Boorstyn, "Multiple-tone parameter estimation from discrete-time observations," Bell Syst. Tech. J., vol. 55, pp. 1389-1410, 1976.

[10] C.P. Robert, A. Doucet and S.J. Godsill, "Marginal Maximum A Posteriori Estimation using MCMC," Proc. IEEE

[11] P. Stoica, R.L. Moses, B. Friedlander and T. Soderstrom, "Maximum likelihood estimation of the parameters of multiple sinusoids from noisy measurements," IEEE Trans. Acoust. Speech Signal Processing, vol. 37, pp. 378-392, 1989.




Jean-Pierre  Leduc 

Washington  University  in  Saint  Louis,  Department  of  Mathematics 
One  Brookings  Drive,  P.O.  Box  1146,  Saint  Louis,  MO  63130 


The paper presents new developments in harmonic analysis associated with the motion transformations embedded in digital signals. In this context, harmonic analysis provides motion analysis with a complete theoretical construction of perfectly matching concepts and a related toolbox leading to fast algorithms. This theory can be built from only two assumptions: an associative structure for the local motion transformations, expressed as a Lie group, and a principle of optimality for the global evolution, expressed as a variational extremal. Motion analysis means not only detection, estimation, interpolation and tracking, but also propagators, motion-compensated filtering, signal decomposition and selective reconstruction. The optimality principle defines the trajectory and provides the appropriate equations of motion, the selective tracking equations, the selective constants of motion to be tracked, and all the symmetries to be imposed on the system. The harmonic analysis provides new special functions, orthogonal bases, PDE's, ODE's and integral transforms. The tools to be developed rely on group representations, continuous and discrete wavelets, estimation theory (prediction, smoothing and interpolation) and filtering theory (Kalman filters, motion-based convolutions, integral transforms). All the algorithms are supported by fast and parallelizable implementations based on the FFT and dynamic programming.


In this paper, the harmonic analysis of motion transformations is built on the actual kinematics as they take place in the external space and in the projections on sensor arrays (Figure 1). Eventually, they are embedded in the signals to be analyzed. From that point of view, this approach fundamentally differs from the motion models currently presented in the literature (see [1] and all the references therein), which rely on techniques based on stochastic processes, statistics and operations research, namely block-matching, pel-recursive and Bayesian techniques. As a major drawback, these techniques are totally blind to the underlying mathematical structures of the spatio-temporal transformations.

The author wants to thank Prof. B. Blank in the Math. Dept. for helpful discussions and Prof. B. K. Ghosh in the SSM Dept. for his support on numerical computations. This research work was supported by the AFOSR grant No. F49620-99-1-0068.

The main point of the approach proposed in this paper is to bring differential geometry, mechanics on manifolds and harmonic analysis into signal analysis. This theory provides the actual kinematics and relies on only two key assumptions, which can be summarized as follows: a Lie group structure (i.e. an associative law of composition and an identity element for the local transformations) and a principle of optimality (for the global evolution). From those two key points, a complete machinery of theory, analysis tools and fast algorithms can be constructed in such a way that all the concepts match each other perfectly. This paper presents new developments on this important topic that cover all the kinematics embedded in any spatio-temporal real or complex signal and apply to video, radar and sonar.

The construction of Lie group representations (i.e. the analyzing functions in the signal space) leads naturally to several important topics. First, it leads to the existence of continuous wavelet transforms with frames, tight frames and new discrete wavelets placed along the trajectories, which perform spatio-temporal and motion-based atomic decompositions, expansions, filtering (prediction, smoothing and interpolation), estimation and motion-selective reconstructions. The second topic deals with the characters of the group representations, used to define new special functions and integral transforms (IT's) which generalize the Fourier kernel for the new kinematics of interest. The third topic proceeds with the adjunction of a principle of optimality based on Euler-Lagrange equations and defines the existence of a trajectory and of tracking. This gives rise to Partial Differential Equations (PDE's) as equations of wavelet and signal motion and to Ordinary Differential Equations (ODE's) for tracking. Fourth, the Green functions associated with these PDE's turn out to be the previous special functions related to the kinematics. At this stage, we obtain a global analysis structure with the construction of signal propagators and motion-compensated filters.


In their general form, the Lie group representations $T_g$ acting upon functions $\hat{\Psi} \in L^2(\mathbb{R}^n \times \mathbb{R}, d^n k\, d\omega)$ read

$$[T_g \hat{\Psi}](k, \omega) = a^{n/2}\, e^{i(\omega\tau + k \cdot b)}\, \hat{\Psi}[g^{-1}(k, \omega)] \qquad (1)$$

where $g$ is an element of the group $G$, the $L^2$ normalizing factor $a^{n/2}$ originates from a Radon-Nikodym derivative and provides unitary representations, $e^{i(\omega\tau + k \cdot b)}$ stands for the character of the subgroup of spatio-temporal translations, and $g^{-1}(k, \omega)$ is the left-group action of $g \in G$ in the dual space. The dual space, also called the phase space, is the Fourier domain, denoted $\hat{~}$, with spatial frequencies $k \in \mathbb{R}^n$ and temporal frequency $\omega \in \mathbb{R}$. The parameters $a \in \mathbb{R}^+\setminus\{0\}$, $b \in \mathbb{R}^n$ and $\tau \in \mathbb{R}$ are respectively the scale and the spatial and temporal translations.

From the group representations, we define the continuous wavelet transform as the operator $W_\Psi$ mapping the function $S \in H = L^2(\mathbb{R}^n \times \mathbb{R})$ into functions of $g$ defined below,

and moving at the constant velocity $v$. The convolution performed along this displacement (i.e. along the trajectory) allows the reconstruction of the still signal $F(x,t)$. This property is in fact reminiscent of the motion-compensated filtering developed by the author in [3], which is generalized in this work by the introduction of IT's. Eventually, let us move to the Fourier domain and retrieve the usual condition of admissibility for the Galilean wavelet as described in [9, 12, 13]. Proceeding with Equation (5) in the Fourier domain, we obtain

$$\hat{F}(k, \omega) = \hat{F}(k, \omega) \int_{\mathbb{R}} \int_{\mathbb{R}^+\setminus\{0\}} |\hat{\Psi}(a k,\, \omega - k v)|^2\, \frac{da\, dv}{a^2}$$


which leads to the usual condition of square-integrability of

$$[W_\Psi S](g) = \int_{\mathbb{R}^n \times \mathbb{R}} S(x, t)\, [T_g \Psi]^*(x, t)\, d^n x\, dt = \langle S \mid T_g \Psi \rangle \qquad (2)$$
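Numerically, a coefficient $[W_\Psi S](g)$ is a discretized inner product between the signal and the transformed wavelet. The sketch below is our own illustration in one spatial dimension: $\Psi$ is an arbitrary modulated Gaussian, the action is the affine-Galilean one, and we use the signal-domain normalization $a^{-n/2}$ (the $a^{n/2}$ factor of Equation (1) applies to the Fourier-domain representation):

```python
import numpy as np

# Space-time grids.
x = np.linspace(-10, 10, 401)
t = np.linspace(-5, 5, 201)
X, T = np.meshgrid(x, t, indexing="ij")
dx, dt = x[1] - x[0], t[1] - t[0]

def psi(x, t):
    """Illustrative mother wavelet: Gaussian modulated in space (our choice)."""
    return np.exp(-0.5 * (x**2 + t**2)) * np.cos(5.0 * x)

def transformed_psi(X, T, b, tau, v, a):
    """[T_g Psi](x, t) = a^{-1/2} Psi((x - b - v (t - tau)) / a, t - tau)."""
    return a ** -0.5 * psi((X - b - v * (T - tau)) / a, T - tau)

# A "signal": a feature moving at velocity 1.
S = psi(X - 1.0 * T, T)

# Coefficient <S | T_g Psi> for a matched and a mismatched velocity.
matched = np.sum(S * transformed_psi(X, T, 0.0, 0.0, 1.0, 1.0)) * dx * dt
mismatched = np.sum(S * transformed_psi(X, T, 0.0, 0.0, -1.0, 1.0)) * dx * dt
print(matched > abs(mismatched))   # the matched-velocity coefficient dominates
```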


This inner product $\langle \cdot \mid \cdot \rangle$ would remain a simple correlator between $[T_g \Psi](x, t)$ and $S(x, t)$ if no further conditions were imposed on the unitary and irreducible group representations. In fact, to be a continuous wavelet transform, the mapping must be invertible, i.e. there must exist an operator $W_\Psi^{-1}$ such that $W_\Psi^{-1} W_\Psi = I_H$, where $I_H$ is the identity operator in the Hilbert space of observation $H$. This means that we want to perfectly reconstruct the signal:

$$S(x, t) = \int_G [W_\Psi S](g)\, [T_g \Psi](x, t)\, d\mu(g) \qquad (3)$$

Here $d\mu$ is the left-invariant Haar measure on the group $G$. The condition to be fulfilled in order to derive the inverse transform has been known since 1964 from the work of Calderon. Several examples considered in this paper are defined in [4, 5, 6, 7]. The simplest case is the affine-Galilean group, where the group element is $g = \{b, \tau, v, a\}$ and $v \in \mathbb{R}^n$ is the velocity vector [6]. The left-group action is given by $(a^{-1}[x - b - v(t - \tau)],\, t - \tau)$ and the representation in Equation (1) reads $[T_g \hat{\Psi}](k, \omega) = a^{n/2}\, e^{i(\omega\tau + k \cdot b)}\, \hat{\Psi}(\bar{k}, \bar{\omega})$ with $\bar{k} = a k$, $\bar{\omega} = \omega + k \cdot v$. Let us examine the condition for an invertible transform in the affine-Galilean case with $n = 1$, i.e. $b, \tau, v \in \mathbb{R}$ and $a \in \mathbb{R}^+\setminus\{0\}$, as follows

the Galilean wavelet in one-dimensional space and time:

$$\int_{\mathbb{R}} \int_{\mathbb{R}} \frac{|\hat{\Psi}(k, \omega)|^2}{k^2}\, dk\, d\omega = 1$$

See  references  [4,  5,  6,  7]  for  the  properties  and  applications 
of  the  Galilean  wavelets. 
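As a numerical aside (not from the paper), the integrability of |Ψ̂|²/|k|² can be checked for a Morlet-type wavelet with an assumed Gaussian envelope centred at (k₀, ω₀) = (6, 0), away from k = 0; the centre, grid, and envelope are illustrative choices:

```python
import math

# Hypothetical Morlet-type Galilean wavelet in the Fourier domain:
# |Psi(k, w)|^2 = exp(-((k - k0)^2 + (w - w0)^2)), centred away from k = 0
# so that the weight 1/k^2 stays integrable.
k0, w0 = 6.0, 0.0

def psi_sq(k, w):
    return math.exp(-((k - k0) ** 2 + (w - w0) ** 2))

# Riemann-sum approximation of c = iint |Psi(k, w)|^2 / k^2 dk dw
dk = dw = 0.05
c = 0.0
for i in range(1, 400):          # k in (0, 20], avoiding k = 0
    k = i * dk
    for j in range(-200, 200):   # w in [-10, 10)
        w = j * dw
        c += psi_sq(k, w) / k ** 2 * dk * dw

print(c)  # a finite positive constant, so the wavelet is admissible
```

Dividing the wavelet by the square root of this constant normalizes the admissibility integral to 1, as in the condition above.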

The construction of orthonormal bases proceeds by discretizing the group parameters into a lattice. The spatio-temporal lattice is easily defined as a generalization of the discretization of the affine group: a = a_*^m, b = n_b b_* a_*^m, v = n_v v_* a_*^m, τ = n_τ τ_*, with a_* > 1 and b_*, v_*, τ_* > 0 for convenience. If we now consider the regular left-composition g⁻¹(x, t) = (a⁻¹[x − b − v(t − τ)], t − τ) in the Galilean case, we can mimic the case of the affine group [6] as follows. Let a_* = 2 and T_g Ψ(x, t) = Ψ(a^{−m} x − n_b b_* − n_v v_*(t − n_τ τ_*), t − n_τ τ_*), where we retrieve the well-known orthonormal bases Ψ_{m,p,q}(x, t) = 2^{−m/2} Ψ(2^{−m} x − p, t − q) in L²(R × R) at p = n_b b_* + n_v v_* n_τ τ_* and q = n_τ τ_*, with p, q ∈ Z. Technically, we have deployed the usual discrete wavelets defined from the affine group along spatio-temporal translations that correspond to motion trajectories at constant velocity [3].


∫_G ⟨F | T_g Ψ⟩ (T_g Ψ)(X) dλ(g)   (4)

which becomes, after some easy computations,

∫_R ∫_{R⁺} [ ∫_{R×R} F(y, ρ) (Ψ̃_a * Ψ_a)((x − y) − v(t − ρ), t − ρ) dy dρ ] (da dv / a)   (5)

= ∫_R ∫_{R⁺} { F *_v (Ψ̃_a * Ψ_a) }(x, t) (da dv / a)   (6)

where we have let Ψ̃(x, t) = Ψ(−x, −t) and Ψ_a(x, t) = a⁻¹ Ψ(x/a, t). Let us make an important remark about Equations (5) and (6): the introduction of a non-conventional spatio-temporal convolution, denoted *_v, is in fact a convolution twisted along the Galilean transformation, i.e., the translation in space has a component depending on time.

In this section, we proceed one step further on the representations and focus on the characters. The integration of the characters leads to special functions, which naturally define the kernel of an integral transform. This procedure can be performed for each group of spatio-temporal transformations. Let us consider an important example known as rotational motion (described in [5]). The set of parameters is given as G = {g | g = (b, τ, θ₁, a)}, where θ₁ ∈ R is the angular velocity. The composition law is given as g ∘ g′ = {b + aR(θ₁τ′)b′, τ + τ′, θ₁ + θ₁′, aa′}; the inverse element reads g⁻¹ = {−a⁻¹R(θ₁τ)⁻¹b, −τ, −θ₁, a⁻¹}. The group representations [T(g)Ψ̂](k, ω), written in polar coordinates b → (r_b, θ_b) and k → (k, θ_k) with n = 2, involve a phase e^{i(ωτ + k r_b sin χ)} acting on Ψ̂(ak, θ_k + θ₁τ, ω), with χ = θ_b − θ_k + θ₁τ and Ω = ω/θ₁. The characters of this representation lead to the special functions (Figure 4)

J_Ω(k) = (1/2π) ∫₀^{2π} e^{i[Ωu + k sin u]} du   (10)

which are usually NOT Bessel functions, except when Ω takes integer values. The complexification u → iy gives rise to hyperbolic motion instead of circular rotations, along with new special functions as in (10) with real exponential and sinh functions instead. These special functions can also be easily obtained by considering Ψ as a Dirac measure and integrating this measure along the trajectory. This process corresponds to the "mechanics of moving points" and defines the spectral signatures of objects moving according to such a transformation. The usual way to deduce the ODE that admits this special function as a solution is to calculate the Laplace-Beltrami differential operator on this group. Theorems of additivity for these special functions can be deduced from the composition of the translations. In this case, it reads

∫_R J_{Ω₁−Ω}(k r₁) J_{Ω−Ω₂}(k r₂) dΩ = J_{Ω₁−Ω₂}[k(r₁ + r₂)]   (11)


Equation (10) leads to "Hankel-like" integral transforms

F̂(k) = ∫_{R⁺} f(r) J_Ω(k r) r^{n−1} dr   (12)
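The character integral (10) can be evaluated numerically; for integer Ω it reduces, up to sign, to a classical Bessel function (the integral equals J_{−Ω}(k) = (−1)^Ω J_Ω(k)), while non-integer Ω yields genuinely new special functions, consistent with the remark above. A small sketch:

```python
import cmath
import math

def J(Omega, k, n=4000):
    # Numerically evaluate (1/2*pi) * int_0^{2*pi} exp(i*(Omega*u + k*sin(u))) du
    du = 2 * math.pi / n
    s = sum(cmath.exp(1j * (Omega * u + k * math.sin(u)))
            for u in (m * du for m in range(n)))
    return s * du / (2 * math.pi)

# For Omega = 0 this is the classical Bessel function J_0:
print(abs(J(0, 2.0).real - 0.2238907791))  # ~0, since J_0(2) = 0.22389...
# For non-integer Omega the value is in general a new special function:
print(J(0.5, 2.0))
```

Because the integrand is smooth and periodic, the uniform Riemann sum converges spectrally fast, so a few thousand nodes already give machine-level accuracy.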

The same procedure and computations can be carried out on all the groups dealing with spatio-temporal transformations defined in [4, 5, 6, 7]. Examples on the Galilean group [6, 7] proceed with

∫_R e^{−iωτ} e^{−ik(b+vτ)} dτ = 2π δ(ω + kv) e^{−ikb},

on the acceleration group [4] with

∫_R e^{−iωτ} e^{−ik(b+vτ+½γ₂τ²)} dτ = √(2π/(kγ₂)) e^{−iπ/4} e^{−ikb} e^{i(ω+kv)²/(2kγ₂)},

where γ₂ ∈ R, and on the deformations [8] with

∫_R e^{−iωτ} e^{−ik e^{−s₁τ} x} dτ = (1/s₁) Γ(iω/s₁) (ikx)^{−iω/s₁},

where s₁ ∈ R and Γ(·) is the usual Gamma function.


According to the calculus of variations, the motion between times t₁ and t₂ coincides with the extremal of the functional J:

δJ = 0 with J = ∫_{t₁}^{t₂} L[q(t), q̇(t), q̈(t), …, t] dt,   (13)

where δ stands for the variation. The application of the variational principle in Equation (13) is equivalent to writing the so-called Euler-Lagrange equation [7]. The trajectory is then uniquely defined if the initial state q(0) = q₀ of the object is known. At the extremum, denoted by the subscript *, the Euler-Lagrange equation reads

d/dt (∂L/∂q̇)|_* − (∂L/∂q)|_* = 0.   (14)

This Euler-Lagrange equation generalizes quite easily and, moreover, allows us to derive the equation of wavelet motion that optimizes the action J. If we consider the Galilean case in one-dimensional space with q(τ) = b(τ) and q̇(τ) = ḃ(τ), and the inner product (2) as Lagrangian, then (14) becomes

d/dτ (∂⟨Ψ_g|S⟩/∂ḃ) − ∂⟨Ψ_g|S⟩/∂b = 0.   (15)

It is convenient to expand the total differential. The conditions to introduce the operator into the integral are fulfilled. One solution of this IT is that the kernel be equal to 0. This gives a PDE on Ψ̂(ak, ω − kḃ), i.e., the motion equation for the wavelet. In the Fourier domain, the PDE operator Λ is built from the factor (ḃk + ω) together with first- and second-order partial derivatives in k and ω, and the PDE reads Λ Ψ̂(ak, ω − kḃ) = Ψ̂(ak, ω − kḃ). There are many applications of this procedure, which can be similarly drawn for each spatio-temporal group. Two are examined below and a third in Section 5.

If we consider a wavelet tuned on parameter g₁ and the Dirac measure on parameter g₂, the partial differential operator Λ becomes Π(ḃ, v₁; k, ω), built from the velocity difference v₂ − v₁ through factors of the form 1/(v₂ − v₁) ∂/∂k and 1/(v₂ − v₁)² ∂²/∂k², and the PDE becomes an ODE, i.e., the tracking equation Π(ḃ, v₁; k, ω) Ψ̂(ak, −k(v₂ − v₁)) = Ψ̂(ak, −k(v₂ − v₁)).

Let us consider a Galilean Morlet wavelet [6, 7] applied to a Dirac measure in pure translational motion at constant velocity. The signal, taken as a Dirac measure on a translational trajectory, is given by S(x, t) = δ[x − vt]. After integrating the inner product, the Lagrangian Φ[b, τ, v; k₀, ω₀] = ⟨Ψ_g|S⟩ reads

Φ[b, τ, v; k₀, ω₀] = √(2π) e^{i k₀(v − ḃτ)} e^{−(1/2)[(ḃτ − v)² + τ²]} e^{−i ω₀ τ}.

k₀ and ω₀ are the coordinates of the wavelet shift in the Fourier domain. The contribution of all the partial derivatives involved in the Euler-Lagrange equation leads to an ODE in the form of a product of F, a complex function of the constants of motion ḃτ − b and b̈τ − 2ḃ, with the Lagrangian Φ[b, τ, v; k₀, ω₀]:

F[ḃτ − b, b̈τ − 2ḃ] Φ[b, τ, v; k₀, ω₀] = 0   (19)

such that F(0, 0) = 0. The ODE vanishes when v = ḃ, ḃτ − b = 0, b̈τ − 2ḃ = 0, and ω₀ = 0, k₀ ≠ 0. Therefore, we have verified that the tracking addresses the correct constants of motion, b = ḃτ and b = ḃτ + (1/2)b̈τ², meaning that the system can track objects at constant velocity and constant second-order acceleration. The tracking requires some symmetry in the wavelet, i.e., the still wavelet must be located in the plane ω₀ = 0 with k₀ ≠ 0. These practical results have algorithmic importance as pictured in [7].
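The velocity-tuning mechanism can be illustrated numerically (a sketch with assumed parameters, not taken from the paper): correlating a Dirac measure on the trajectory x = v_true·t with Galilean-boosted Morlet-type wavelets tuned to candidate velocities v, the correlation magnitude peaks at the true velocity, provided the carrier k₀ is nonzero:

```python
import cmath
import math

k0 = 5.0          # wavelet carrier (assumed), must be nonzero for tracking
v_true = 1.3      # constant velocity of the Dirac trajectory x = v_true * t

def correlation(v, n=2000, T=6.0):
    # <Psi_v | S> with S(x, t) = delta(x - v_true t) and a Galilean-boosted
    # Morlet-type wavelet Psi_v(x, t) = exp(i k0 (x - v t)) exp(-((x - v t)^2 + t^2)/2)
    dt = 2 * T / n
    acc = 0.0 + 0.0j
    for m in range(n):
        t = -T + m * dt
        x = v_true * t                 # integrate along the trajectory
        u = x - v * t
        acc += cmath.exp(-1j * k0 * u) * math.exp(-(u * u + t * t) / 2) * dt
    return abs(acc)

vs = [0.5 + 0.01 * i for i in range(160)]   # candidate velocities in [0.5, 2.1)
v_hat = max(vs, key=correlation)
print(v_hat)  # close to v_true = 1.3
```

Both the Gaussian narrowing and the oscillatory mismatch term e^{−ik₀(v_true−v)t} suppress the correlation away from v = v_true, which is why k₀ ≠ 0 sharpens the velocity estimate.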



This section extends the concept of velocity filtering, originally defined by Fleet and Jepson in [1] and studied by Dubois in [2], to all the categories of motion within the approach pursued in the previous section. To reach that goal, we introduce integral transforms whose kernels are motion-specific Green functions. In the following, it is demonstrated that the motion-specific Green functions can be equivalently derived from the characters of the group representations of Section (3) or from the fundamental solution of the PDE of the wave equation of Section (4). This leads to convolutional integral transforms twisted along the motion transformations as presented in Section (2). The interesting point of this approach comes from the equations of the wavelet motion themselves (16) expressed in the Fourier domain. As a result of the form of the operator Λ, the PDE can be rewritten in the Fourier domain as

Λ Ψ̂(g⁻¹X) = Ψ̂(g⁻¹X), where X = (k, ω),   (20)

with an eigenvalue at 1. The Green function G for the operator Λ is the distribution which satisfies Λ G(g⁻¹X) = δ(g⁻¹X); the Green function is the Dirac δ(g⁻¹X) itself. The Green function is known as the fundamental solution of the PDE as in Equation (20). If the operator Λ is injective, then the inverse Λ⁻¹ exists and provides a convolutional-type integral transform whose kernel is the Green function, i.e.

If g = e, the identity element, we retrieve the usual Fourier transform with kernel K(k, ω) = δ(ω) e^{ikx}. This procedure defines, for each kind of motion, the kernel K(k, ω; m; x, t) that particularizes the usual Fourier transform for the motion group of interest; m denotes the current motion parameter. If the Dirac measure is transformed into a continuous wavelet with compact support, then the calculation of Ψ̂(k, ω; m), animated with motion m, from its still cognate Ψ(x, t) becomes an integral transform with kernel K. Let us, for example, consider the kernel of accelerated wavelets as propagator, presented in Section (3), and integrate it with a still Morlet wavelet [6, 7]; this yields the propagated wavelets Ψ̂_{γ₂}(k, ω) for second-order accelerations.

Moreover, the function Ψ can now be scaled to extend the results from the "point mechanics" towards the "object-based mechanics" as follows:

Ψ̂(k, ω; m, a, a₀) = ∫_{R×Rⁿ} K(k, ω; m; x, t) Ψ(x, t; a, a₀) dⁿx dt.

We have so far reached the ability to generate, cancel, or modify analyzing wavelets as well as moving patterns.



[Λ⁻¹ f](ξ) = ∫ G(x, ξ) f(x) dx   (21)

These kernels are meaningful and are reminiscent of the propagators associated with the Green functions of the Schrödinger equations. The meaning of Equation (21) and of the wavelet-based reproducing kernels [7] leads to the following duality of the motion analysis.

(1.) If the still version of a signal (wavelet, filter, or stochastic process) f(x) is known, then the reproducing-kernel integral transform provides all the moving versions in (x, t) or in (k, ω). These integral transforms generate the whole family of analyzing signals, wavelets, or processes in the observation space L²(Rⁿ × R, dⁿx dt). This allows spatio-temporal filtering, interpolating, and predicting along a trajectory.

(2.) If the animated version of a signal is known, then Equation (21) is a filter that compensates the signal for a given motion and gives rise to the still signal. This is motion-compensation filtering. The advantage of such an approach is that classical affine wavelet analysis and processing may then be applied to the compensated signal (for coding purposes as in [3]). This section brings a more general point of view on the motion analysis presented in [3], where motion-compensated filtering was performed by building the trajectories within the signal and applying discrete wavelets along the assumed trajectories.
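Point (2.), motion-compensation filtering, can be sketched in a toy discrete setting (an illustrative pattern with cyclic shifts, assumed here for simplicity): shifting each frame back along the assumed trajectory recovers the still signal.

```python
# A 1-D pattern translated by v = 2 pixels per frame over 5 frames (cyclic).
pattern = [0, 1, 3, 7, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
v = 2
frames = [[pattern[(x - v * t) % 16] for x in range(16)] for t in range(5)]

def compensate(frame, t, v):
    # Undo the translation x -> x + v*t (cyclic boundary for simplicity).
    return [frame[(x + v * t) % 16] for x in range(16)]

still = [compensate(frames[t], t, v) for t in range(5)]
print(still[3] == pattern)  # True: every compensated frame is the still signal
```

Once all frames are compensated back to the still signal, an ordinary (affine) wavelet analysis can be applied to them, which is exactly the coding strategy referred to in [3].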

Let  us  then  revisit  Section  (3)  and  compute  the  Fourier 
transform  of  a  Dirac  measure  on  a  trajectory 

This paper has shed light on a novel motion analysis based on a group-theoretic approach. Let us consider the projection of moving patterns on sensor arrays, which creates the most important part of all the acceleration components embedded in signals. The traffic sequence (Figure 2) is an example. The projection, which takes place within the cone of sensor visibility (Figure 1), is a homothety (i.e., a re-scaling). The projection may be modelled as an orthogonal projection composed with a scaling. Let us define the z-axis orthogonal to the sensor plane and the x-y axes in the sensor plane. The motion captured in the sensor plane is obtained after a projection on planes Π₀, Π₁, Π₂ parallel to the sensor at times τ = 0, 1, 2 and a homothety that rescales the projection down to the plane of the sensor (Figure 1). Let us denote by W the width of the rigid object and by S₀ the size of the object captured by the camera. The scale a₀ = W/S₀ is observed from plane Π₀ at time τ = 0. At time τ = n, the size perceived from plane Πₙ by the camera is given by

aₙ = W/Sₙ = W / (S₀(1 − (v_z/d)τ)) = a₀ [1 + (v_z/d)τ + ((v_z/d)τ)² + …]
   = a₀ [1 + a₁τ + a₂τ² + … + aₙτⁿ + …].

The series is convergent if |(v_z/d)τ| < 1, i.e., compatible with the physical observation. The components of translation, velocity, and acceleration along the x and y axes are rescaled with the ratio b/b₀ = v/v₀ = z/d.
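The geometric-series expansion of the perceived scale can be checked numerically (a₀ and the ratio (v_z/d)τ below are illustrative values inside the convergence region):

```python
a0, ratio = 1.5, 0.2      # a0 = W/S0 ; ratio = (v_z / d) * tau, with |ratio| < 1
closed = a0 / (1 - ratio)                      # closed-form perceived scale
series = a0 * sum(ratio ** n for n in range(30))  # truncated geometric series
print(abs(closed - series))  # ~0: the expansion converges inside the cone
```

Outside the visibility cone (|ratio| ≥ 1) the series diverges, which matches the physical restriction stated above.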


[1.] A. Tekalp. "Digital Video Processing", Prentice-Hall, 1995.

[2.] E. Dubois. "Motion-Compensated Filtering of Time-Varying Images", Multidim. Syst. Sig. Proc., Vol. 3, pp. 211-239, 1992.





Figure  1:  Tracking  in  a  sensor  cone. 


Figure  2:  The  20th  image  of  the  car  digital  image  sequences. 

[3.]  J.-P.  Leduc,  J.-M.  Odobez  and  C.  Labit.  “Adaptive 

Motion-Compensated  Wavelet  Filtering  for  Image  Sequence 
Coding”,  IEEE  Transactions  on  Image  processing,  Vol  6, 
No.  6,  pp.  862-878,  June  1997. 

[4.]  J.-P.  Leduc,  J.  Corbett,  M.  Kong,  V.  M.  Wickerhauser, 
B.  K.  Ghosh.  “Accelerated  Spatio-temporal  Wavelet  Trans¬ 
forms:  an  Iterative  Trajectory  Estimation” ,  IEEE  ICASSP, 
Vol  5,  1998,  pp.  2777-2780. 

[5.]  M.  Kong,  J.-P.  Leduc,  B.  Ghosh,  J.  Corbett,  V.  Wicker¬ 
hauser.  “Wavelet  based  Analysis  of  Rotational  Motion  in 
Digital  Image  Sequences”,  ICASSP-98,  Seattle,  May  12-15, 
1998,  pp.  2781-2784. 

[6.]  J.-P.  Leduc,  F.  Mujica,  R.  Murenzi,  M.  J.  S.  Smith. 
“Spatio-Temporal  Wavelet  Transforms  for  motion  track¬ 
ing”,  ICASSP-97,  Munich,  Vol  4,  pp.  3013-3017,  1997. 

[7.]  J.-P.  Leduc,  F.  Mujica,  R.  Murenzi,  and  M.  Smith.  “Spatio- 
Temporal  Wavelets:  a  Group-Theoretic  Construction  for 
Motion  Estimation  and  Tracking” ,  to  appear  in  SIAM  Jour¬ 
nal  of  Applied  Mathematics. 

[8.]  J.  Corbett,  J.-P.  Leduc,  M.  Kong.  “Analysis  of  Deforma- 
tional  Transformations  with  Spatio-Temporal  Continuous 
Wavelet  Transforms”,  ICASSP-99,  March  15-19,  1999. 

Figure 3: Estimation of the parameters v_z/d and a₀ by computing the square modulus (energy) of the wavelet transform as in [8]: |⟨T(g)Φ|s⟩|² = F(a₀, v_z/d) is estimated in the scene displayed in Figure 2. Two local maxima are detected and displayed, at (v_{z1}/d = 0.5 s⁻¹, a₀₁ = 2.6) and (v_{z2}/d = 0.38 s⁻¹, a₀₂ = 1.8), standing for the foreground and background car, respectively. If we assume d₁ = 40 m for the foreground car and a rate of 25 images per second, then we can estimate the relative approaching velocity component at v_{z1} = 72 km/h (45 miles/h). For the background car, if we assume d₂ = 50 m, then v_{z2} = 68.4 km/h (42.7 miles/h). Let us remark that the camera is traveling towards the cars; therefore, both velocities correspond to relative values.


Figure 4: Spatio-temporal special function associated with the rotational motion. The sketch is performed on sections at constant ω; the angular velocity is θ₁ = 1.5 radian/image.



M.  Frikel,  W.  Utschick,  and  J.  Nossek 

Technical  University  of  Munich 
Institute  for  Network  Theory  and  Signal  Processing 
Arcisstr.  21,  D-80290  Munich,  Germany 
mifr@nws.e-technik.tu-muenchen.de


In the classical methods for blind channel identification (subspace method, TXK, XBM) [1, 2, 3], the additive noise is assumed to be spatially white or known to within a multiplicative scalar. When the noise is non-white (colored or correlated) but has a known covariance matrix, we can still handle the problem through prewhitening. However, there are no techniques presently available to deal with completely unknown noise fields.

It is well known that when the noise covariance matrix is unknown, the channel parameter estimates may be grossly inaccurate. In this paper, we assume the noise to be spatially correlated, and we apply this assumption to blind channel identification. We estimate the noise covariance matrix without any assumption except on its structure, which is assumed to be a band-Toeplitz matrix. The performance evaluation of the developed method and its comparison to the modified subspace approach (MSS) [4] are presented.


One common problem in signal transmission through any channel is additive noise. In general, additive noise is generated internally by components such as resistors and solid-state devices used to implement the communication system. This is sometimes called thermal noise or Johnson noise. Other sources of noise and interference may arise externally to the system, such as interference from other users. When such noise and interference occupy the same frequency band as the desired signal, their effect can be minimized by proper design of the transmitted signal and its demodulator at the receiver. The effects of noise may also be minimized by increasing the power of the transmitted signal. However, equipment and other practical constraints limit the power level of the transmitted signal [5].

This work is supported by the Alexander von Humboldt-Stiftung, Bundesrepublik Deutschland.

The classical model used in communication systems supposes, on the one hand, that the power of the noise is identical on each sensor and, on the other hand, that there is no noise space/time correlation. However, this situation is seldom met, which involves a clear degradation of the performance of the subspace methods. Here, we recall some well-known methods which treat the noise problem in array processing for direction-of-arrival estimation. In fact, in recent years there has been a growing interest in techniques aimed at decreasing the signal-to-noise-ratio resolution threshold or at handling spatially colored noise [6, 7, 8, 9, 10]. The ambient noise is unknown in practice; therefore, modeling or estimating it is necessary. The methods developed for this problem are very few, and there is no definitive solution. There are some practical methods: in [11], two methods are obtained by optimizing a criterion and by using AR or ARMA modeling of the noise. In [7], the spatial correlation matrix of the noise is modeled by the known Bessel functions. In [6], the ambient noise covariance matrix is modeled by a sum of Hermitian matrices known up to a multiplicative scalar. In [8], this estimate is obtained by measuring the array covariance matrix when no signals are present. This procedure assumes that the noise does not change over time, which is not fulfilled in several application domains. Another possibility [8] arises when the correlation structure is known to be invariant under a translation or rotation. The so-called covariance differencing technique can then be applied to reduce the noise influence. In this method, two identical translated and/or rotated measurements of the array covariance matrix are required; the method assumes the invariance of the noise covariance matrix, while the source signals change between the two measurements. The estimated noise covariance matrix is eliminated by a simple subtraction. Furthermore, this method cannot be applied when the source covariance matrix satisfies the same invariance property or when only one measurement is available. In [7], a particular model of the noise covariance matrix structure, which takes into account the characteristics of the noise relative to its origins, is given. Recently, a maximum a posteriori (MAP) approach has been developed in [10]; this method can only be applied in the case of a linear array. In [9], the method called "Instrumental Variable" (IV) is used to reduce the noise without estimating it; this estimator considers the noise to be temporally independent. A technique based on the MDL criterion has been developed in [12] for detection and localization of signals in the presence of unknown noise; this estimator is asymptotically biased [12]. However, the study of the noise for blind channel identification is very limited. In [4], a modified subspace method (MSS) for blind identification in the presence of unknown correlated noise has been presented; it uses covariance matrices at time lags where the noise is absent. The object of this correspondence is to improve blind channel identification in the presence of correlated noise by whitening the received data. The noise is assumed spatially correlated.

The structure of the paper is as follows. In Section II, we present the studied problem. In Section III, we describe the noise covariance matrix model used in this study and its estimation by the proposed algorithm, and we apply the noise estimation to blind channel identification using the subspace method. We present, in Section IV, some simulation results and performance comparisons.

0-7803-5988-7/00/$10.00 © 2000 IEEE


Consider L FIR channels driven by a common source. The output vector of the ith channel can be written as:

r_i(k) = H^{(i)} s(k) + n_i(k),   (1)

where r_i(k) is the output sequence of the ith channel, s(k) is the input sequence, and n_i(k) is the noise sequence on the ith channel:

r_i(k) = [r_i(k)  r_i(k+1)  …  r_i(k+N−1)],
s(k)   = [s(k−M)  s(k−M+1)  …  s(k+N−1)],
n_i(k) = [n_i(k)  n_i(k+1)  …  n_i(k+N−1)],

and

H^{(i)} =
( h₀^{(i)}  h₁^{(i)}  …  h_M^{(i)}   0        …        0       )
(   0      h₀^{(i)}  h₁^{(i)}  …    h_M^{(i)}  …        0       )
(   ⋮          ⋱         ⋱                      ⋱        ⋮       )
(   0       …        0   h₀^{(i)}   h₁^{(i)}   …    h_M^{(i)}  ),

where h_m^{(i)} is the impulse response of the ith channel, M is the maximum order of the L channels, and N is the width of the temporal window. H^{(i)} is of dimension (N × (N + M)).

Then we have

r(k) = H s(k) + n(k),   (2)

with the stacked quantities

r(k) = [r₁(k), …, r_L(k)]ᵀ,  H = [H^{(1)ᵀ}, …, H^{(L)ᵀ}]ᵀ,  n(k) = [n₁(k), …, n_L(k)]ᵀ.

The matrix H is known as the (LN × (N+M)) filtering matrix, which has full rank (N + M) under the following assumptions: the L channels do not share a common zero and N ≥ (M + 1).
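A minimal sketch of the filtering-matrix construction (the placement of the taps is an assumption consistent with the stated N × (N + M) dimension):

```python
def filtering_matrix(h, N):
    # Build the N x (N + M) Toeplitz filtering matrix H^(i) for channel taps
    # h = [h_0, ..., h_M]; row r carries the taps starting at column r.
    M = len(h) - 1
    rows = []
    for r in range(N):
        row = [0.0] * (N + M)
        for m, tap in enumerate(h):
            row[r + m] = tap
        rows.append(row)
    return rows

H = filtering_matrix([1.0, 0.5, 0.25], N=4)   # M = 2, so H is 4 x 6
print(len(H), len(H[0]))  # 4 6
```

Each row is a shifted copy of the impulse response, which is exactly the Toeplitz structure displayed above; the full filtering matrix H stacks L such blocks.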

The blind identification problem is to find H from the observations {r(k), k = 1, 2, …, K}.

The subspace method [1] exploits the sample covariance matrix of all channel outputs, Γ = E[r r⁺], estimated as

Γ̂ = (1/K) Σ_{k=1}^{K} r(k) r⁺(k),

where K is the number of samples and ⁺ denotes the conjugate transpose. Assume that the signals and the additive noise are independent, stationary, and ergodic zero-mean complex-valued random processes; as K becomes large, this matrix has the asymptotic structure Γ = H Γ_s H⁺ + Γ_n, with Γ_n = E[n n⁺] the noise covariance matrix and Γ_s = E[s s⁺] the signal covariance matrix.

The goal of blind channel identification and equalization is to identify H (channel identification) and to estimate s(k) from r(k) (channel equalization).

The subspace blind channel identification procedure [1] consists in the estimation of the (LN × 1) vector h of channel coefficients from the observation vector. Indeed, this approach is based on the eigendecomposition of the data covariance matrix,

Γ = [U_s U_n] diag(Λ_s, Λ_n) [U_s U_n]⁺.

The subspace method yields an estimate H̃ of H by solving the equation U_n⁺ H̃ = 0 in a least-squares sense (where H̃ is subject to the same structure as H). This estimate is uniquely (up to a constant scalar) equal to H. From [1], we have:

U_n⁺ H = 0  ⟺  h⁺ 𝒰_n = 0,

where 𝒰_n is the (L(M+1) × (N+M)) matrix obtained by stacking the L filtering matrices 𝒰^{(i)}:

𝒰_n = [𝒰^{(1)ᵀ}, …, 𝒰^{(L)ᵀ}]ᵀ,

and h = [h^{(1)}, …, h^{(L)}], with h^{(i)} = [h₀^{(i)}, …, h_M^{(i)}].

The optimization system derived in [1] is:

h = arg min_{||h||=1} h⁺ U₃₃ h,

where

U₃₃ = Σ_i 𝒰^{(i)} 𝒰^{(i)+}

is the filtering noise projection matrix.
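The minimization h = arg min h⁺U₃₃h over unit-norm h is solved by the eigenvector of U₃₃ associated with its smallest eigenvalue. A small synthetic sketch (U₃₃ here is a stand-in Hermitian quadratic form with a known null direction, not data from the paper):

```python
import numpy as np

# Synthetic quadratic form: U33 = Q diag(0, 1, 2, 3) Q^+ has a known null
# direction Q[:, 0], playing the role of the true (vectorised) channel h.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(A)                     # unitary basis
U33 = Q @ np.diag([0.0, 1.0, 2.0, 3.0]) @ Q.conj().T

# h = arg min_{||h|| = 1} h^+ U33 h  ->  eigenvector of the smallest eigenvalue
w, V = np.linalg.eigh(U33)                 # eigh returns ascending eigenvalues
h_hat = V[:, 0]
print(abs(np.vdot(h_hat, Q[:, 0])))        # ~1: h recovered up to a scalar phase
```

This also illustrates the scalar ambiguity mentioned above: the channel is only identified up to a constant (here, a unit-modulus) factor.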

The noise is assumed Gaussian, complex, and spatially correlated. Its real and imaginary parts are supposed independent and Gaussian, with E[n_i] = 0, E[n_i n_iᵀ] = 0, and E[n_i n_i⁺] = Γ_n. Γ_n is the noise covariance matrix; the superscripts "*" and "+" denote conjugate and conjugate transpose, respectively. We consider the noise covariance matrix to be band, defined by:

Γ_n(i, m) = 0 if |i − m| > K;  ρ_{im} if 0 < |i − m| ≤ K;  σ_i² if i = m,

where ρ_l = ρ_l^{re} + j ρ_l^{im}, l = 1, …, K, are complex variables, j² = −1, σ_i² is the noise variance at each receiver, and K is the spatial noise correlation length.


Γ_n =
( σ₁²    ρ₁₂    …     ρ_{1K}    0      …      0      )
( ρ₁₂*   σ₂²    ρ₂₃    …              …      0      )
(  ⋮            ⋱      ρ_{ij}          ⋱      ⋮      )
(  0      …            ⋱               ⋱   ρ_{(LN)K} )
(  0      …     0      ρ_{(LN)K}*  …     σ_{LN}²    )
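A band-Toeplitz noise covariance of this kind can be constructed as follows (the geometric decrease ρ^{|i−m|} is an illustrative choice of "decreasing" profile, not the paper's estimate):

```python
import numpy as np

def band_toeplitz_cov(sigma2, rho, size, K):
    # Decreasing band-Toeplitz model: Gamma_n[i, m] = sigma2 * rho^|i-m|
    # inside the band |i - m| <= K and 0 beyond it (rho complex, |rho| < 1).
    G = np.zeros((size, size), dtype=complex)
    for i in range(size):
        for m in range(size):
            d = abs(i - m)
            if d <= K:
                # conjugate below the diagonal so the matrix is Hermitian
                G[i, m] = sigma2 * (rho ** d if i <= m else np.conj(rho) ** d)
    return G

Gn = band_toeplitz_cov(1.0, 0.5 + 0.2j, size=8, K=3)
print(np.allclose(Gn, Gn.conj().T))   # True: Hermitian
print(Gn[0, 4])                        # 0: outside the band |i - m| > K
```

The diagonal carries the per-receiver variances σ_i² and each off-diagonal decreases with the sensor distance, matching the model described in the next section.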

There are two ways to turn the observation covariance matrix back into a noise-free matrix. The first is subtraction of the noise covariance matrix, H Γ_s H⁺ = Γ − Γ_n; we then obtain a "clean" observation covariance matrix, but we may obtain a negative matrix if Γ_n is badly estimated. The second is whitening; in this case we find again the classical model of communication systems (Γ_n^{−1/2} Γ Γ_n^{−1/2}). The latter processing is more robust but needs more computational load.
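The whitening alternative can be sketched with an eigendecomposition-based inverse square root (the SPD matrix below is a synthetic stand-in for Γ_n):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
Gn = B @ B.T + 5 * np.eye(5)           # stand-in noise covariance (SPD)

# Inverse square root via the eigendecomposition Gn = V diag(w) V^T
w, V = np.linalg.eigh(Gn)
Gn_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T

# Whitening maps the noise covariance back to the identity of the classical model
Iw = Gn_inv_sqrt @ Gn @ Gn_inv_sqrt
print(np.allclose(Iw, np.eye(5)))      # True
```

Applied to the full observation covariance, Γ_n^{−1/2} Γ Γ_n^{−1/2} = Γ_n^{−1/2} H Γ_s H⁺ Γ_n^{−1/2} + I, which is exactly the spatially white model assumed by the classical subspace methods.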

From the data matrix Γ = H Γ_s H⁺ + Γ_n, the goal of the first part of this paper is to estimate the noise covariance matrix Γ_n; in the second part, we estimate, blindly, H from the obtained "clean" matrix [H Γ_s H⁺] using the subspace method [1].


In many applications such as communication systems, it is reasonable to assume that the correlation decreases along the receivers. That is a widely used model for colored noise. The correlation rate ρ decreases as the distance between two receivers increases.

In this study, we consider the noise covariance matrix to be band-Toeplitz with decreasing diagonal values, a so-called decreasing band-Toeplitz matrix. This is the only assumption used to estimate the noise covariance matrix.

The BNE algorithm for the noise covariance matrix estimation is summarized in the following steps:

Step 1: Estimation and eigendecomposition of the received covariance matrix Γ: Γ̂ = (1/T) Σ_{t=1}^{T} r(t) r⁺(t), with T the number of independent realizations; Γ̂ = U Λ U⁺, where Λ = diag[λ₁, …, λ_{LN}] and U = [u₁, u₂, …, u_{LN}]; λ_i and u_i are the eigenvalues and eigenvectors of the observation covariance matrix, respectively. Initialization of the noise covariance matrix: Γ_n = 0.

Step 2: Calculation of the matrix W_{N+M} = U_s Λ_s^{1/2}, with U_s = [u₁, u₂, …, u_{N+M}] the matrix of the (N+M) eigenvectors corresponding to the (N+M) largest eigenvalues and Λ_s = diag[λ₁, …, λ_{N+M}] the matrix of those eigenvalues. Calculation of the matrix A = W_{N+M} W⁺_{N+M}.

Step 3: Calculation of Γ_n^{(1)} = K-band[Γ̂ − A], where Γ_n^{(1)} is the band noise covariance matrix at the first iteration and K-band[·] designates the band matrix with bandwidth (K + 1).

Step 4: Eigendecomposition of the matrix Γ̂ − Γ_n^{(i)} = V Λ V⁺. The new matrices A and Γ_n^{(i+1)} are, again, estimated as in Step 2 and Step 3. These iterations are repeated until Γ_n^{(i)} no longer improves.

Stop test: The algorithm is stopped when the distance between Γ_n^{(i)} and Γ_n^{(i+1)} becomes less than some value ε. We define the distance between Γ_n^{(i)} and Γ_n^{(i+1)} as ||Γ_n^{(i+1)} − Γ_n^{(i)}||_F, the Frobenius norm of the matrix Γ_n^{(i+1)} − Γ_n^{(i)}.

The estimated noise covariance matrix Γ̂_n is obtained when the algorithm is stopped.
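A minimal sketch of the iteration (one assumed reading of Steps 1-4; the synthetic rank-2 "signal" part and banded "noise" matrix below are illustrative, not the paper's data):

```python
import numpy as np

def band(M, K):
    # keep only the entries with |i - m| < K, zero elsewhere
    i, m = np.indices(M.shape)
    return np.where(np.abs(i - m) < K, M, 0.0)

def bne(Gamma, r, K, eps=1e-10, max_iter=200):
    """Alternate a rank-r 'signal' part (Step 2) with banding of the
    residual (Step 3), re-decomposing at each pass (Step 4)."""
    Gn = np.zeros_like(Gamma)
    for _ in range(max_iter):
        w, U = np.linalg.eigh(Gamma - Gn)           # Step 1 / Step 4
        Us, ws = U[:, -r:], w[-r:]                  # r principal eigenpairs
        A = (Us * ws) @ Us.T                        # Step 2: signal part
        Gn_new = band(Gamma - A, K)                 # Step 3: banded residual
        if np.linalg.norm(Gn_new - Gn) < eps:       # stop test (Frobenius)
            break
        Gn = Gn_new
    return Gn_new

# Synthetic example: exact rank-2 signal part plus a small banded noise matrix
v1, v2 = np.ones(8), np.tile([1.0, -1.0], 4)
signal = 3 * np.outer(v1, v1) + 2 * np.outer(v2, v2)
noise = 0.1 * np.eye(8) + 0.02 * (np.eye(8, k=1) + np.eye(8, k=-1))
Gn_hat = bne(signal + noise, r=2, K=2)
print(np.linalg.norm(Gn_hat - noise))   # small: the band noise is recovered
```

The true decomposition is a fixed point of this map (subtracting the exact band noise leaves an exactly rank-r matrix), which is why, under a sufficient eigen-gap between signal and noise, the iterates settle onto the band noise estimate.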

The matrix Γ̂_n is used to "denoise" the received data. In fact, the noise-free received covariance matrix is Γ̃ = Γ̂ − Γ̂_n or Γ̃ = Γ̂_n^{−1/2} Γ̂ Γ̂_n^{−1/2}. This "clean" matrix is used to estimate the channel matrix. In order to evaluate its performance, we apply the subspace method [1]. Indeed, Moulines et al. [1] showed that if the subchannels don't share common zeros, h is uniquely determined by the noise subspace U_n; the subspace estimator is given by:

h = arg min_{||h||=1} h⁺ U₃₃ h,

where U₃₃ is the filtering noise projection matrix estimated from the "clean" data covariance matrix. This estimator does not require knowledge of the source covariance as long as Γ_s > 0. We also compare our result to the modified subspace (MSS) method [4].


To demonstrate the efficiency of the proposed algorithm, some computer simulations have been conducted. In the following simulations, we take the parameters described in [1]: the number of virtual channels is L = 4; the width of the temporal window is N = 10; the degree of the ISI is M = 4; the channel coefficients are given by [1]:











[Table 1 entries recovered from the scan: −0.199−0.918j, 0.921−0.194j, 0.873−0.145j, 0.189−0.208j, −0.171+0.061j, 0.136−0.190j, −0.049+0.161j]

Table 1: Four virtual complex channels.

For all these simulations, the number of data samples used to estimate each h ranges from 100 to 1000 in steps of 100.

The root mean-square error (RMSE), defined below, is employed as a performance measure of the estimates:

RMSE = √( (1/K) Σ_{i=1}^{K} || Ĥi − H ||² ), where K is the number of trials (100 in our case) and Ĥi is the estimate from the i-th trial.

The signal-to-noise ratio (SNR) is defined as SNR = 10 log10 (signal power / noise power). We define the Frobenius norm of the estimation error (EE) of the noise covariance matrix as: EE = || Γ̂ − (H Γs H+ + Γn) ||F.

We compare the presented algorithm with existing methods such as the modified subspace approach (MSS) [4]. This comparison is based on the root mean-square error of the channel matrix estimates. We recall this approach in the following: Let Γ(τ) = H J(τ) H+ + Γn(τ), where J(τ) is the (N + M) × (N + M) shift matrix. In [4], one assumes that Γn(τ) = 0 as long as τ ≥ N. Therefore, we have the relation Γ(τ) = H J(τ) H+ for τ ≥ N. At the time lag τ = N, Γ(N) = H (J(N) + J(N)+) H+; the matrix Γ(N) is used to estimate the channel parameters.

Figures (1a and 1b) present the root mean-square error (RMSE) of the parameter estimates for a band-Toeplitz noise covariance matrix and the Frobenius norm of the estimation error (EE) of the noise covariance matrix versus the number of samples.

Figure 1: (a) Root mean-square error (RMSE) of the parameter estimates (band-Toeplitz noise covariance matrix). (b) Frobenius norm of the estimation error (EE) of the noise covariance matrix (band-Toeplitz noise covariance matrix) versus number of samples.

In the case of a band noise covariance matrix with a correlation length K = 4, Figures (2a and 2b) show the results versus SNR from 0 dB to 16 dB.

Figure 2: (a) Root mean-square error of the parameter estimates (band-Toeplitz noise covariance matrix, K = 4) versus SNR. (b) Frobenius norm of the estimation error (EE) of the noise covariance matrix as a function of the number of iterations.

We study the influence of the correlation length on the error of the noise covariance matrix estimation (Figure 3a) and on the channel parameters (Figure 3b). The correlation length varies between K = 1 and K = 4, with SNR = 3 dB.

The normalized error (NE) is defined by NE =

We consider a band noise covariance matrix, and we estimate the normalized error and the Frobenius norm for different scenarios of the channel matrix (Figures 4a and 4b).

These simulations show that the processing, which consists of first estimating the noise covariance matrix and then prewhitening the observations, has many advantages and is more efficient than the modified subspace (MSS) approach [4]. The use of the denoised subspace




Figure 3: (a) Root mean-square error of the parameter estimates versus correlation length. (b) Frobenius norm of the estimation error (EE) of the noise covariance matrix as a function of correlation length.

Figure 4: (a) Normalized error (NE) of the parameter estimates versus scenarios of the channel matrix when the noise covariance matrix is band. (b) Frobenius norm of the estimation error (EE) of the band noise covariance matrix as a function of scenarios of the channel matrix.

method presented in this paper becomes interesting in the case of low SNR and when the noise covariance matrix is band. When the correlation length increases, the benefit of estimating the noise also increases. Several computer simulations confirm these conclusions.

This algorithm can naturally also be applied to other blind channel identification methods, such as XBM, TXK, ... [2, 3], regardless of the system type used.


To blindly estimate the noise and then the channel parameters, an algorithm was presented. We have considered spatially correlated noise, with only the assumption that the noise matrix is band-Toeplitz; then, by an iterative algorithm using the eigenstructure, we have estimated the noise parameters. In order to use "clean" data for the estimation of the channel matrix, the estimated noise matrix was used for "prewhitening" the observations. The subspace approach was then applied for the blind estimation of the channel parameters.

[1] E. Moulines, P. Duhamel, J.-F. Cardoso, and S. Mayrargue, "Subspace methods for the blind identification of multichannel FIR filters," IEEE Trans. on Signal Processing, vol. 43, no. 2, pp. 516-525, Feb. 1995.

[2] L. Tong, G. Xu, and T. Kailath, "Blind identification and equalization based on second-order statistics: A time domain approach," IEEE Trans. on Information Theory, vol. 40, no. 2, pp. 340-349, Mar. 1994.

[3] J. Xavier, V. Barroso, and J. Moura, "Closed-form blind channel identification and source separation in SDMA systems through correlative coding," accepted for IEEE Journal on Selected Areas in Communications, Special Issue on Signal Processing for Wireless Communications, 1997.

[4] K. Abed-Meraim, Y. Hua, P. Loubaton, and E. Moulines, "Subspace method for blind identification of multichannel FIR systems in noise field with unknown spatial covariance," IEEE Signal Processing Letters, vol. 4, no. 5, pp. 135-137, May 1997.

[5] J. G. Proakis, Digital Communications, 3rd ed., McGraw-Hill, 1995.

[6] J. Böhme and D. Kraus, "On least squares methods for direction of arrival estimation in the presence of unknown noise fields," in Proceedings IEEE ICASSP'88, New York, NY, Apr. 1988, pp. 2833-2836.

[7] B. Friedlander and A. J. Weiss, "Direction finding using noise covariance modeling," IEEE Trans. on Signal Processing, vol. SP-43, no. 7, pp. 1557-1567, Jul. 1995.

[8] A. Paulraj and T. Kailath, "Eigenstructure methods for direction of arrival estimation in the presence of unknown noise fields," IEEE Trans. Acoust., Speech, Signal Processing, vol. 34, no. 1, pp. 276-280, Feb. 1986.

[9] P. Stoica, M. Viberg, and B. Ottersten, "Instrumental variable approach to array processing in spatially correlated noise fields," IEEE Trans. on Signal Processing, vol. 42, no. 1, pp. 121-133, 1994.

[10] K. M. Wong, J. Reilly, Q. Wu, and S. Qiao, "Estimation of the direction-of-arrival of signals in the unknown correlated noise, part I: The MAP approach and its implementation," IEEE Trans. on Signal Processing, vol. 40, no. 8, pp. 2007-2017, Aug. 1992.

[11]  J.-P.  Le  Cadre,  “Parametric  methods  for  spatial  sig¬ 
nal  processing  in  the  presence  of  unknown  colored  noise 
fields,”  IEEE  Trans.  Acoust.,  Speech,  Signal  Process¬ 
ing,  vol.  ASSP-37,  no.  7,  pp.  965-983,  Jul.  1989. 

[12] M. Wax, "Detection and localization of multiple sources in noise with unknown covariance," IEEE Trans. on Signal Processing, vol. 40, no. 1, pp. 245-249, Sep. 1991.

[13] V. Barroso, J. Moura, and J. Xavier, "Blind array channel division multiple access (AChDMA) for mobile communications," IEEE Trans. on Signal Processing, vol. 46, no. 3, pp. 516-525, Mar. 1998.




Predrag  Spasojevic 

Dept,  of  Electrical  and  Computer  Eng., 
Rutgers  University, 
Piscataway,  NJ  08854. 


A  new  technique  is  proposed  for  robust  multiuser  detec¬ 
tion  in  the  presence  of  non-Gaussian  ambient  noise.  This 
method  is  based  on  minimizing  a  certain  cost  function  (e.g., 
the  Huber  penalty  function)  over  a  discrete  set  of  candi¬ 
date user bit vectors. The set of candidate points is chosen based on the so-called "slowest-descent search", starting
from  the  estimate  closest  to  the  unconstrained  minimizer  of 
the  cost  function,  and  along  mutually  orthogonal  directions 
where  this  cost  function  grows  the  slowest.  The  extension  of 
the  proposed  technique  to  multi-user  detection  in  unknown 
multi-path  fading  channels  is  also  proposed.  Simulation 
results  show  that  this  new  technique  offers  substantial  per¬ 
formance  improvement  over  the  recently  proposed  robust 
multiuser  detectors,  with  little  attendant  increase  in  com¬ 
putational  complexity. 


Recently, a robust multiuser detection technique was developed in [4] for demodulating multiuser signals in the
presence  of  both  multiple-access  interference  and  impulsive 
ambient  channel  noise.  This  technique  is  based  on  the  M- 
estimation  method  for  robust  regression,  and  is  essentially 
the  robustized  version  of  the  linear  decorrelating  multiuser 
detector.  Although  this  robust  multiuser  detector  offers 
significant  performance  gain  over  the  linear  decorrelator  in 
impulsive  noise,  there  is  still  a  large  gap  between  its  perfor¬ 
mance  and  that  of  the  maximum  likelihood  (ML)  multiuser 
detector.  However,  the  computational  complexity  of  the 
ML  detection  is  quite  high,  and  moreover,  the  ML  detection 
requires  the  knowledge  of  the  exact  probability  distribution 
of  the  noise,  which  may  not  be  available  to  the  receiver. 
Hence,  it  is  of  interest  to  develop  robust,  low-complexity, 
and  near-optimal  multiuser  detection  techniques  for  non- 
Gaussian noise channels. Furthermore, it is of high importance to be able to extend this method to more general asynchronous unknown multi-path fading channels. These issues are the subject of this paper.

P.  Spasojevic  was  supported  in  part  by  the  WINLAB /Lucent 
Technologies  Wireless  Post-Doctoral  Fellowship.  X.  Wang  was 
supported  in  part  by  the  NSF  grant  CAREER  CCR-9875314. 

Xiaodong  Wang 

Department  of  Electrical  Engineering, 
Texas  A&M  University, 

College  Station,  TX  77843. 


First consider the following discrete-time synchronous CDMA signal model. At any time instant, the received signal is the superposition of K user signals, plus the ambient noise, given by

r = Σ_{k=1}^{K} αk bk sk + n = SAb + n,  (1)

where sk = [sk,1, ..., sk,N]^T is the normalized signature sequence of the k-th user; N is the processing gain; bk ∈ {+1, −1} and αk are respectively the data bit and the complex amplitude of the k-th user; S = [s1 ... sK]; A = diag(α1, ..., αK); b = [b1 ... bK]^T; and n = [n1 ... nN]^T is a vector of independent and identically distributed (i.i.d.) ambient noise samples with independent real and imaginary components. Denote

y = [ ℜ{r} ; ℑ{r} ],   Ψ = [ ℜ{SA} ; ℑ{SA} ],   v = [ ℜ{n} ; ℑ{n} ],

where v is a real noise vector consisting of 2N i.i.d. samples. Then (1) can be written as

y = Ψb + v.  (2)

It is assumed that each element vj of v follows a two-term Gaussian mixture distribution, i.e.,

vj ~ (1 − ε) N(0, ν²) + ε N(0, κν²),  (3)

with 0 < ε < 1 and κ > 1. Here the term N(0, ν²) represents the nominal ambient noise, and the term N(0, κν²) represents an impulsive component. The probability that impulses occur is ε. Note that the overall variance of the noise sample vj is

σ² = (1 − ε) ν² + ε κ ν².  (4)

We have Cov{v} = (σ²/2) I_2N and Cov{n} = σ² I_N. The model (3) serves as an approximation to the more fundamental Middleton Class A noise model [2, 5], and has been
0-7803-5988-7/00/$  10.00  ©  2000  IEEE 


used extensively to model physical noise arising in radio and acoustic channels. Recently, it has been shown that another class of non-Gaussian distributions, the α-stable distributions, can be well approximated by a finite mixture of Gaussians [1]. In what follows, we consider the problem of detecting the transmitted symbols b of all users based on the signal model (2).
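The mixture model (3) and the variance formula (4) are easy to check numerically; a minimal sketch, with parameter values chosen only for illustration:

```python
import numpy as np

# Sample the two-term Gaussian mixture of (3) and check the overall
# variance formula (4): sigma^2 = (1 - eps)*nu^2 + eps*kappa*nu^2.
rng = np.random.default_rng(1)
eps, kappa, nu2 = 0.01, 100.0, 1.0       # illustrative parameters
n = 2_000_000
impulsive = rng.random(n) < eps          # impulses occur w.p. eps
scale = np.where(impulsive, np.sqrt(kappa * nu2), np.sqrt(nu2))
v = scale * rng.standard_normal(n)       # mixture samples
sigma2 = (1 - eps) * nu2 + eps * kappa * nu2   # formula (4)
print(v.var(), sigma2)
```

Even with ε = 0.01, the impulsive term dominates the variance here (εκν² = 1), which is why the nominal-noise estimate of scale is badly misleading for such channels.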


In this section, we give a unified description of a number of approaches to the problem of multiuser detection in non-Gaussian noise. There are primarily two categories of such detectors for estimating b from y in (2), all based on minimizing the sum of a certain function ρ of the chip residuals:

C(b; y) = Σ_{j=1}^{2N} ρ(yj − ψj^T b),  (5)

where ψj^T is the j-th row of the matrix Ψ.

• Exhaustive-search detector:

b̂ = arg min_{b ∈ {−1,+1}^K} C(b; y).  (6)

• Decorrelative detector:

β = arg min_{b ∈ ℝ^K} C(b; y),  (7)

b* = sign(β).  (8)

It is seen that the exhaustive-search detection is based on the discrete minimization of the cost function C(b; y) over 2^K candidate points, whereas the decorrelative detection is based on the continuous minimization of the same cost function. In general, the optimization problem (7) can be solved iteratively according to the following steps [4]:

z^l = ψ(y − Ψ β^l),  (9)

β^{l+1} = β^l + μ Ψ^T z^l,  l = 0, 1, ...,  (10)

where ψ(·) is applied elementwise and μ is a step size.

We consider the following three choices of the penalty function ρ(·) in (5), corresponding to different forms of detectors.

• Log-likelihood penalty function:

ρML(x) = − log f(x),  ψML(x) = − f′(x)/f(x),  (11)

where f(·) denotes the probability density function (pdf) of the noise sample. In this case, the exhaustive-search detector (6) corresponds to the ML detector; and the decorrelative detector (8) corresponds to the ML decorrelator [4].

• Least-squares penalty function:

ρLS(x) = x²/2,  ψLS(x) = x.  (12)

In this case, the exhaustive-search detector (6) corresponds to the ML detector based on a Gaussian noise assumption; and the decorrelative detector (8) corresponds to the linear decorrelator.

• Huber penalty function:

ρH(x) = x²/(2σ²),  if |x| ≤ cσ²,
ρH(x) = c|x| − c²σ²/2,  if |x| > cσ²,  (13)

ψH(x) = x/σ²,  if |x| ≤ cσ²,
ψH(x) = c sign(x),  if |x| > cσ²,  (14)

where σ² is the noise variance given by (4) and c is a constant. In this case, the exhaustive-search detector (6) corresponds to the discrete minimizer of the Huber cost function; and the decorrelative detector (8) corresponds to the robust decorrelator proposed in [4].
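The decorrelative iteration (9)-(10) with the Huber influence function can be sketched as follows; the step size mu, the clipping constant c, and the least-squares initialization are implementation assumptions, not values taken from [4].

```python
import numpy as np

def huber_psi(x, sigma2, c=1.34):
    """Clipped-linear influence function psi_H of (14):
    x/sigma2 inside the threshold c*sigma2, c*sign(x) outside."""
    thr = c * sigma2
    return np.where(np.abs(x) <= thr, x / sigma2, c * np.sign(x))

def robust_decorrelator(y, Psi, sigma2, mu=None, iters=200):
    """Continuous minimizer beta of sum_j rho_H(y_j - psi_j^T b),
    via the iteration (9)-(10): z = psi(y - Psi b), b <- b + mu Psi^T z."""
    if mu is None:
        # small enough step for stability (assumption, not from the paper)
        mu = sigma2 / np.linalg.norm(Psi, 2) ** 2
    beta = np.linalg.lstsq(Psi, y, rcond=None)[0]  # least-squares start
    for _ in range(iters):
        z = huber_psi(y - Psi @ beta, sigma2)
        beta = beta + mu * Psi.T @ z
    return beta

# usage: K users, random +/-1 spreading, small Gaussian noise
rng = np.random.default_rng(2)
N, K = 32, 4
S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)
b_true = rng.choice([-1.0, 1.0], size=K)
sigma2 = 1e-3
y = S @ b_true + np.sqrt(sigma2) * rng.standard_normal(N)
beta = robust_decorrelator(y, S, sigma2)
```

Hard decisions are then b* = sign(beta), as in (8).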


Clearly the optimal performance is achieved by the exhaustive-search detector with the log-likelihood penalty function, i.e., the ML detector. As will be seen in Section 5, the performance of the exhaustive-search detector with the Huber penalty function is close to that of the ML detector, while this detector does not require knowledge of the exact noise pdf. However, the computational complexity of the exhaustive-search detector (6) is on the order of O(2^K). We next propose a local search approach to approximating the solution to (6). The basic idea is to minimize the cost function C(b; y) over a subset Ω of the discrete parameter set {−1, +1}^K that is close to the continuous stationary point β given by (7). More precisely, we approximate the solution to (6) by

b̂ = arg min_{b ∈ Ω} C(b; y).  (15)

In the slowest-descent method [3], the candidate set Ω consists of the discrete parameters chosen such that they are in the neighborhood of Q (Q < K) lines in ℝ^K, which are defined by the stationary point β and the Q eigenvectors of the Hessian matrix ∇²C(β) of C(b; y) at β corresponding to the Q smallest eigenvalues. The basic idea of this method is explained next.

Slowest-Descent Search: The basic idea of the slowest-descent search method is to choose the candidate points in Ω such that they are closest to a line (β + μg) in ℝ^K, originating from β and along a direction g where the cost function C(b; y) increases at the slowest rate. Given any line in ℝ^K, there are at most K points where the line intersects the coordinate hyper-planes (e.g., β¹ and β² in Figure 1 for K = 2). The set of intersection points corresponding to a line defined by β and g can be expressed as

{β^i = β − μi g : μi = βi/gi}_{i=1}^{K},  (16)

where βi and gi denote the i-th elements of the respective vectors β and g. Each intersection point β^i has only its i-th component equal to zero, i.e., β^i_i = 0.

Any point on the line, except for an intersection point, has a unique closest candidate point in {+1, −1}^K. An intersection point is of equal distance from its two neighboring candidate points; e.g., β¹ is equidistant to b¹ and b² in Figure 1(a). Two neighboring intersection points share


For the three types of penalty functions, the Hessian matrices at the stationary point are given respectively by (19)-(21) below.

Figure 1: One-to-one mapping from {β, β¹, ..., β^K} to Ω = {b*, b¹, ..., b^K} for K = 2. Each intersection point β^i is of equal distance from its two neighboring candidate points. b^i is chosen to be one of these two candidate points that is on the opposite side of the i-th coordinate hyper-plane with respect to b*.

a unique closest candidate point; e.g., β¹ and β² share the nearest candidate point b² in Figure 1(a). Note that b* in (8) is the candidate point closest to β. By carefully selecting one of the two candidate points closest to each intersection point, to avoid choosing the same point twice, one can specify K distinct candidate points in {+1, −1}^K that are closest to the line (β + μg). To that end, consider the following set

{ b^i ∈ {−1, +1}^K :  b^i_k = sign(βk) for k ≠ i,  b^i_k = −sign(βi) for k = i }_{i=1}^{K}.  (17)

It is seen that (17) assigns to each intersection point β^i the closest candidate point b^i that is on the opposite side of the i-th coordinate hyper-plane from b* [cf. Figure 1 (a), (b)].

In general, the slowest-descent search method chooses the candidate set Ω in (15) as follows:

Ω = {b*} ∪ ⋃_{q=1}^{Q} { b^{q,i} ∈ {−1, +1}^K :

b^{q,i}_k = sign(βk − μi gq,k),  if βk − μi gq,k ≠ 0,
b^{q,i}_k = −b*_k,  if βk − μi gq,k = 0,

gq is the eigenvector of ∇²C(β) corresponding to the q-th smallest eigenvalue }_{i=1}^{K}.  (18)

Hence, {b^{q,i}}_{i=1}^{K} contains the K closest neighbors of β in {−1, +1}^K along the direction of gq. Note that {gq}_{q=1}^{Q} represent the Q mutually orthogonal directions where the cost function C(b; y) grows the slowest from the minimum point β. (In the case of the log-likelihood penalty function, this corresponds to the situation where the likelihood function drops the slowest from its peak, hence the name "slowest descent".) Intuitively, the solution to (6) is most likely found in this neighborhood.

ρML:  ∇²C(β) = Ψ^T diag{ρ″ML(yj − ψj^T β)} Ψ,  (19)

ρLS:  ∇²C(β) = Ψ^T Ψ,  (20)

ρH:  ∇²C(β) = Ψ^T diag{(1/σ²) δ(|yj − ψj^T β| ≤ cσ²)} Ψ,  (21)

where in (19) ρ″ML(x) = ψ′ML(x) = (f′(x)/f(x))² − f″(x)/f(x), and in (21) the indicator function δ(y ≤ a) = 1 if y ≤ a and 0 otherwise; hence in this case those rows of Ψ with large residuals, as a possible result of impulsive noise, are nullified, whereas the other rows of Ψ are not affected.

Finally, we summarize the slowest-descent search algorithm for multiuser detection in non-Gaussian noise. Given a penalty function ρ(·), this algorithm solves the discrete optimization problem (15) according to the following steps:

1. Compute the continuous stationary point β in (7) using the iteration (9)-(10);

2. Compute the Hessian matrix ∇²C(β) given by (19), (20) or (21), and its Q smallest eigenvectors g1, ..., gQ;

3. Solve the discrete optimization problem defined by (15) and (18) by an exhaustive search (over the (KQ + 1)
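The candidate-set construction (16)-(18) can be sketched as follows; the numerical tolerance used to detect an exact zero crossing is an implementation assumption.

```python
import numpy as np

def slowest_descent_candidates(beta, hessian, Q=1):
    """Build the candidate set Omega of (18): the sign point b* plus,
    for each of the Q smallest-eigenvalue directions g of the Hessian,
    the sign vectors nearest the line beta + mu*g. A sketch only."""
    K = beta.size
    b_star = np.sign(beta)
    w, V = np.linalg.eigh(hessian)      # eigenvalues in ascending order
    omega = {tuple(b_star)}
    for q in range(Q):
        g = V[:, q]                     # q-th slowest-growth direction
        for i in range(K):
            if g[i] == 0:
                continue                # line never crosses this hyper-plane
            mu = beta[i] / g[i]         # crossing of i-th hyper-plane (16)
            t = beta - mu * g
            # rule (18): sign where nonzero, flip b* at the zero crossing
            cand = np.where(np.abs(t) > 1e-12, np.sign(t), -b_star)
            omega.add(tuple(cand))
    return [np.array(c) for c in omega]

# usage: K = 4, Q = 2, least-squares Hessian taken as identity for brevity
beta = np.array([0.9, -0.4, 0.2, -1.3])
omega = slowest_descent_candidates(beta, np.eye(4), Q=2)
```

The discrete search in step 3 then evaluates C(b; y) over the at most KQ + 1 points of this set instead of all 2^K sign vectors.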


In this section, we extend the slowest-descent multiuser detection techniques developed above to the asynchronous CDMA system with multipath distortion. Following [4], [7], and references therein, r[i], the vector consisting of a number of stacked one-symbol-length vectors that affect the current symbol interval i, can be expressed as follows:

r[i] = Hb[i] + n[i].  (22)

Here, b[i] and n[i] are the stacked symbol and noise vectors, and H is the unknown channel matrix.

We can rewrite (22) as

r[i] = Hθ[i] + n[i] = Us ζ[i] + n[i].

Here the orthonormal column vectors of Us span the column space of H and can be obtained using an eigendecomposition of the received signal autocorrelation matrix (see [6]). The estimation of the channel matrix H is based on the users' signature sequences and the noise subspace estimated from the autocorrelation eigendecomposition (see [7]).

We next obtain the robust estimate of ζ[i] based on the complex version of the decorrelative iterations (9)-(10) for an (e.g., Huber) objective function:

z^l = ψ(r − Us ζ^l),  (23)

ζ^{l+1} = ζ^l + μ Us^H z^l,  l = 0, 1, 2, ...,  (24)

where (·)^H denotes the Hermitian transpose. θ[i] can then be estimated as

θ̂[i] = (H^H H)^{-1} H^H Us ζ̂[i].

Note that

Hθ[i] = H0 A b[i] + H# θ#[i],  (25)

where the term H0 A b[i] contains the signal carrying the current bits b[i], and the term H# θ#[i] contains the signal carrying the previous and future bits {b[l]}_{l≠i}, i.e., intersymbol interference. A holds the unknown phases, which are estimated separately from the channel as demonstrated below. We subtract the estimated intersymbol interference from r[i] to obtain

r̃[i] = r[i] − H# θ̂#[i]  (26)
     = H0 A b[i] + n[i].  (27)

We can now set

Ψ = [ ℜ{H0 A} ; ℑ{H0 A} ],

and use the methods described in the previous sections to derive the decorrelative and slowest-descent estimates of b[i] based on r̃[i].

the slowest-descent detector with 2 search directions, and the exhaustive detector. Searching further slowest-descent directions does not improve the performance in this case. We observe that for all three criteria the performance of the slowest-descent detector is close to the performance of its respective exhaustive-search version. All detectors are significantly better than the LS-based detectors.

For the multi-path channel case the following is assumed: processing gain N = 15 and number of users K = 6; each user's channel has 3 paths and a delay spread of up to one symbol interval. The complex gains, the delays of each user's channel, and the user signature sequences are generated randomly. The chip pulse is a raised cosine pulse with roll-off factor 0.5. The path gains are normalized so that each user's signal arrives at the receiver with unit energy. The over-sampling factor is 2 and the number of stacked vectors in (22) (the smoothing factor) is 2.

Figure 3 demonstrates the performance of the Huber-based slowest-descent method with one and two search directions, the decorrelative Huber detector, and the blind decorrelator from [6]. Most of the performance gain offered by the slowest-descent method is obtained by searching along only one direction. Over 1 dB of gain is obtained relative to the decorrelative estimate. The blind approach [6] performs poorly for this system.

Estimation of A

We next consider the estimation of the complex amplitudes A. Following (25), we have [recall that A = diag(α1, ..., αK)]

θk = αk bk + ηk,  k = 1, ..., K.  (28)

Since bk ∈ {−1, +1}, it follows from (28) that the θk form two clusters, centered respectively at αk and −αk. Let αk = ρk e^{jφk}; a simple estimator of αk is given by α̂k = ρ̂k e^{jφ̂k}, with

ρ̂k = Ê{|θk|},

φ̂k = Ê{∠[θk sign(ℜ{θk})]},  if Ê{|ℜ{θk}|} ≥ Ê{|ℑ{θk}|},
φ̂k = Ê{∠[θk sign(ℑ{θk})]},  if Ê{|ℜ{θk}|} < Ê{|ℑ{θk}|},

where the operator Ê(·) denotes the sample average. Note that the above estimate of the phase φk has an ambiguity of π, which necessitates differential encoding and decoding of the data.
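A sketch of this two-cluster amplitude estimator on synthetic data; the amplitude value, noise level, and sample size are illustrative assumptions.

```python
import numpy as np

# With b_k in {-1,+1}, the theta_k of (28) cluster around +alpha_k and
# -alpha_k, so |theta_k| estimates rho_k and the folded angle estimates
# phi_k (modulo the pi ambiguity noted in the text).
rng = np.random.default_rng(4)
alpha = 0.8 * np.exp(1j * 0.6)               # true amplitude rho*e^{j phi}
b = rng.choice([-1.0, 1.0], size=5000)
noise = 0.05 * (rng.standard_normal(5000) + 1j * rng.standard_normal(5000))
theta = alpha * b + noise                    # model (28)

rho_hat = np.mean(np.abs(theta))             # sample average of |theta_k|
# fold the two clusters onto one half-plane before averaging the angle;
# this is the |Re| >= |Im| branch of the phase estimator
folded = theta * np.sign(theta.real)
phi_hat = np.angle(np.mean(folded))
```

Here the real part of alpha dominates, so folding by sign(Re{theta}) maps both clusters onto the one containing +alpha before the angle is averaged.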


Figure 2: Symbol error performance of a synchronous DS-CDMA system with N = 15, K = 8, ε = 0.01, κ = 100.


For the simulations, we assume a synchronous CDMA system with a processing gain N = 15, number of users K = 6, no phase offset, and equal amplitudes of the user signals, i.e., αk = 1, k = 1, ..., K. The signature sequence s1 of User 1 is generated randomly and kept fixed throughout the simulations. The signature sequences of Users 2 through K are generated by circularly shifting the sequence of User 1.

For  each  of  the  three  penalty  functions  Figure  2  presents 
the  symbol  error  performance  of  the  decorrelative  detector, 

We have developed a new robust multiuser detection technique based on the method of slowest-descent search. By searching over only one or two directions, this method offers significant performance improvement over the recently proposed robust decorrelating detector in impulsive noise. The proposed approach has been extended to multi-path fading channels where the complex channels and signal phases of all users have to be estimated blindly.


Figure 3: Symbol error performance of an asynchronous DS-CDMA system with N = 15, K = 8, ε = 0.01, κ = 100, in an unknown multi-path channel with 3 randomly generated path coefficients per user.


[1] E. E. Kuruoglu, C. Molina, and W. J. Fitzgerald. Approximation of α-stable probability densities using finite mixtures of Gaussians. In Proc. EUSIPCO'98, Rhodes, Greece, September 1998.

[2] D. Middleton. Non-Gaussian noise models in signal processing for telecommunications: New methods and results for class A and class B noise models. IEEE Trans. Inform. Theory, 45(4):1122-1129, May 1999.

[3]  P.  Spasojevic.  Sequence  and  channel  estimation  for 
channels  with  memory.  Department  of  Electrical  En¬ 
gineering,  Texas  A&M  University  1999. 

[4]  X.  Wang  and  H.V.  Poor.  Robust  multiuser  detection 
in  non-Gaussian  channels.  IEEE  Trans.  Sig.  Proc., 
47(2):289-305,  Feb.  1999. 

[5]  S.M.  Zabin  and  H.V.  Poor.  Efficient  estimation  of  the 
class  A  parameters  via  the  EM  algorithm.  IEEE  Trans. 
Inform.  Theory,  37(l):60-72,  Jan.  1991. 

[6]  X.  Wang  and  H.V.  Poor.  Blind  multiuser  detection: 
A  subspace  approach.  IEEE  Trans.  Inform.  Theory, 
44(2):677-691,  Mar.  1998. 

[7] P. Spasojevic, X. Wang, and A. Høst-Madsen, "Nonlinear group-blind multiuser detection," Technical Report, WINLAB, Rutgers Univ., July 2000.



Linda M. Davis†  Iain B. Collings‡  Robin J. Evans*

t  Global  Wireless  Systems  Research 
Bell  Laboratories 

Lucent  Technologies,  AUSTRALIA 


This  paper  presents  a  new  recursive  algorithm 
for  maximum  likelihood  estimation  of  the  delay- 
Doppler  characteristics  of  fast-fading  mobile  com¬ 
munication  channels.  The  channel  is  modelled 
as  an  FIR  filter  with  rapidly  varying  complex 
coefficients.  The  parameters  of  interest  are  the 
mean  channel  taps  and  the  tap  covariance.  The 
structure  of  the  channel  tap  covariance  matrix  is 
exploited  to  provide  convergence  to  constrained 
channel  estimates. 


Maximum  likelihood  constrained  covariance  estimation 
for  directly  observable  processes  in  additive  noise  has 
received  considerable  attention  [1,  2,  3,  4]  since  many 
algorithms  in  spectral  analysis  rely  on  knowledge  of 
the  covariance  matrix.  Applications  include  harmonic 
retrieval,  beamforming  and  direction  of  arrival  estima¬ 
tion.  In  many  such  cases,  the  system  of  interest  is 
shift-invariant  and  the  true  covariance  matrix  is  known 
to  be  Hermitian  Toeplitz  as  well  as  positive  semidefi- 
nite.  This  structure  may  be  used  in  obtaining  realistic 
covariance  matrix  estimates,  and  in  addition  may  be 
exploited  in  to  provide  fast  convergence  to  constrained 
estimates  and  aid  subsequent  processing  (e.g.  inverses, 
eigendecomposition  etc.). 

In  this  paper,  we  consider  the  extension  of  constrained 
covariance  estimation  to  the  case  where  the  process  of 
interest  is  observed  through  convolution  with  a  known 
signal  in  addition  to  the  additive  noise.  This  problem 
arises  in  delay-Doppler  radar  imaging  [5]  and  delay- 
Doppler  imaging  of  fast-fading  mobile  communication 
channels  [6].  In  these  situations,  the  underlying  re¬ 
flectance  process  has  a  time-varying  impulse  response, 
and  therefore  is  two-dimensional  (in  time,  k,  and 

‡ School of Electrical & Information Engineering, University of Sydney, AUSTRALIA

* Dept. of Electrical & Electronic Engineering, University of Melbourne, AUSTRALIA

delay, ℓ). The delay-Doppler image of a reflectance process is also known as the scattering function, and is related to the covariance matrix by a Fourier transform (in the time axis indexed by k) [7].

This  paper  presents  a  new  algorithm  for  maximum  like¬ 
lihood  estimation  of  the  covariance  matrix  (and  there¬ 
fore  the  delay-Doppler  characteristics)  of  fast-fading 
mobile communication channels. Importantly, our algorithm explicitly makes use of the structural constraints. Key features of the algorithm include joint estimation of the channel mean and covariance, and applicability to a general class of wide-sense stationary (WSS) channels.
Channel  Response 

Consider a discrete equivalent baseband model in which the complex-valued time-varying channel, or reflectance process, f_{k,ℓ}, represents the effect at time k for reflections with a path delay ℓ. Ignoring the average delay in the analysis, the observed signal is

    z_k = Σ_{ℓ=0}^{L-1} x_{k-ℓ} f_{k,ℓ} + w_k    (1)

where L is the length of the finite impulse response (FIR) channel, or the extent of the radar target, x_k is the known transmitted signal, and w_k is the additive noise introduced at the receiver.
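As a quick concreteness check, the observation model (1) can be simulated directly. The sketch below uses arbitrary toy sizes and real-valued signals for simplicity (the paper's model is complex baseband):

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 8, 3                                  # toy observation length and channel length

x = rng.standard_normal(N + L - 1)           # known input x_{-L+1}, ..., x_{N-1}
f = rng.standard_normal((N, L))              # time-varying taps f_{k,l}
w = 0.01 * rng.standard_normal(N)            # additive receiver noise w_k

# z_k = sum_{l=0}^{L-1} x_{k-l} f_{k,l} + w_k; array index k-l+L-1 maps x_{-L+1} to 0
z = np.array([sum(x[k - l + L - 1] * f[k, l] for l in range(L)) for k in range(N)]) + w
```

Because each tap f_{k,ℓ} carries its own time index k, this is not an ordinary convolution of two fixed sequences.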

Writing the observations for k = 0, ..., N - 1 in vector form gives

    z = XF + w    (2)

0-7803-5988-7/00/$10.00 © 2000 IEEE


where the N x NL matrix of channel inputs is

    X = [ diag(x_0, ..., x_{N-1})   diag(x_{-1}, ..., x_{N-2})   ...   diag(x_{-L+1}, ..., x_{N-L}) ]
and F = [f_{0,0}, ..., f_{N-1,0}, ..., f_{N-1,L-1}]^T.¹ The time-varying channel (or reflectance) process, F, is seen to be two-dimensional in that its elements are characterized both by the time index k and the delay index ℓ.

When a line-of-sight path or specular (stable) reflections exist between the transmitter and receiver, the channel is no longer zero-mean. Thus F = F̃ + F̄, where F̃ is the zero-mean time-varying component and F̄ is the mean component, constant over the observation interval, N. Here F̄ is NL x 1, but contains only L independent parameters. For convenience, we also define the L x 1 vector G = (I ⊗ e^T)F̄, and the corresponding N x L matrix of channel inputs Y = X(I ⊗ 1), where e^T = [1, 0, ..., 0] is a 1 x N unit vector, 1 = [1, ..., 1]^T is an N x 1 vector of ones, ⊗ is the Kronecker product operator, and I is the L x L identity matrix.
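The Kronecker-product definitions can be verified numerically. The sketch below (toy sizes, numpy, our variable names) builds X from diagonal blocks of the input sequence and checks the mean-response identity XF̄ = YG:

```python
import numpy as np

rng = np.random.default_rng(1)
N, L = 6, 2
x = rng.standard_normal(N + L - 1)           # inputs x_{-L+1}, ..., x_{N-1}

# X = [diag(x_0..x_{N-1})  diag(x_{-1}..x_{N-2})  ...], an N x NL matrix
X = np.hstack([np.diag(x[L - 1 - l : L - 1 - l + N]) for l in range(L)])

g = rng.standard_normal(L)                   # one mean value per delay tap
Fbar = np.kron(g, np.ones(N))                # mean vector, constant over time per tap

e = np.zeros(N); e[0] = 1.0
G = np.kron(np.eye(L), e) @ Fbar             # G = (I (x) e^T) Fbar, recovers g
Y = X @ np.kron(np.eye(L), np.ones((N, 1)))  # Y = X (I (x) 1), an N x L matrix

assert np.allclose(G, g)
assert np.allclose(X @ Fbar, Y @ G)          # mean response: X Fbar = Y G
```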


    R = Σ_{m=1}^{M} r_m Q_m    (4)

where the r_m are the values of the real and imaginary components of the elements of R, and the Q_m are the corresponding fixed basis matrices. There are M = 2NL² - L² independent parameters, r_m. The channel covariance matrix is (by definition) positive semidefinite. This manifests itself as a highly nonlinear constraint on the parameters, r_m.
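For the single-tap case (L = 1), where the structured set reduces to Hermitian Toeplitz matrices, a basis {Q_m} and the parameter count M = 2NL² - L² = 2N - 1 can be sketched as follows (a hedged illustration; the paper does not spell out its basis construction):

```python
import numpy as np

N = 4                                        # single-tap case: M = 2N - 1
Qs = []
for d in range(N):                           # real part of diagonal offset d
    Q = np.diag(np.ones(N - d), k=d).astype(complex)
    if d > 0:
        Q += np.diag(np.ones(N - d), k=-d)   # mirror for Hermitian symmetry
    Qs.append(Q)
for d in range(1, N):                        # imaginary part of offset d > 0
    Qs.append(1j * np.diag(np.ones(N - d), k=d) - 1j * np.diag(np.ones(N - d), k=-d))

assert len(Qs) == 2 * N - 1                  # M = 2*N*1**2 - 1**2
r = np.random.default_rng(2).standard_normal(len(Qs))
R = sum(rm * Qm for rm, Qm in zip(r, Qs))
assert np.allclose(R, R.conj().T)            # R is Hermitian (and Toeplitz) by construction
```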

Assuming additive white Gaussian noise (AWGN) at the receiver, the channel covariance, R, is related to the N x N observation covariance matrix, R_z = E[z̃z̃^H], by

    R_z = X R X^H + σ_w² I    (5)

where σ_w² is the variance of the observation noise, and the observation is z = z̄ + z̃, where z̄ = XF̄ = YG is the mean response.


Channel  Covariance 

The dimensionality of the channel impulse response is reflected in the structure of the covariance matrix

    R = E[F̃F̃^H] = [ R_{0,0}     ...  R_{0,L-1}
                      ...              ...
                     R_{L-1,0}   ...  R_{L-1,L-1} ]    (3)

which consists of L x L blocks of N x N matrices, R_{ℓ1,ℓ2}, representing the covariance between taps (or reflectors) at delays ℓ1 and ℓ2. For the radar target, where scatterers are assumed to behave independently (i.e. uncorrelated scatterers (US)) [5], the off-diagonal matrices will all be zero. However, for the communication channel model, the inclusion of the transmitter and receiver pulse shapes in the equivalent channel response, f_{k,ℓ}, means that this is not the case.

To adequately identify the channel, we require estimates for the vector of channel tap means, G, and the matrix of channel tap covariances, R. It is important that the estimates maximize the likelihood over the set of admissible structured matrices R ∈ T_{N,L}.

It is easily shown that maximizing the likelihood function for the channel model of Section 2 is the same as maximizing the following expression

    Φ(G, R) = -ln det R_z - tr{R_z^{-1} S}    (6)

where the sample covariance matrix, S = (z - YG)(z - YG)^H, is a function of the channel mean G, and R_z is a function of the channel covariance R as given above in (5). Here, tr{·} denotes the trace operator.
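To make the objective concrete, here is a small numpy sketch (function names are ours, not the paper's) of Φ(G, R) and of the weighted least-squares mean estimate of eq. (7):

```python
import numpy as np

def objective(G, R, X, Y, z, sigma2):
    """Phi(G, R) = -ln det Rz - tr(Rz^{-1} S), with Rz = X R X^H + sigma2 I."""
    Rz = X @ R @ X.conj().T + sigma2 * np.eye(X.shape[0])
    resid = z - Y @ G
    S = np.outer(resid, resid.conj())
    _, logdet = np.linalg.slogdet(Rz)
    return -logdet - np.trace(np.linalg.solve(Rz, S)).real

def gls_mean(Rz, Y, z):
    """G = (Y^H Rz^{-1} Y)^{-1} Y^H Rz^{-1} z, cf. eq. (7)."""
    RinvY = np.linalg.solve(Rz, Y)
    return np.linalg.solve(Y.conj().T @ RinvY, RinvY.conj().T @ z)
```

For fixed R, `gls_mean` maximizes Φ over G, since Φ depends on G only through the quadratic form (z - YG)^H R_z^{-1} (z - YG).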

Note that the likelihood, and hence Φ(G, R), is only defined when R_z is strictly positive definite.

Lemma  1  When  G  is  given  by 

When the statistics of the fading or reflectance process are wide-sense stationary (WSS) (in the dimension indexed by k), the covariance matrices R_{ℓ1,ℓ2} are Toeplitz. The overall matrix is then Hermitian symmetric and block-Toeplitz. The set of Hermitian block-Toeplitz matrices is denoted here by T_{N,L}.

The Hermitian block-Toeplitz channel covariance matrix, R ∈ T_{N,L}, may be written as in (4).

¹The transpose operator is denoted (·)^T, and (·)^H denotes a Hermitian transpose.

    G = (Y^H R_z^{-1} Y)^{-1} Y^H R_z^{-1} z    (7)

the  first  differential  of  the  likelihood  objective  (6)  is 

    dΦ = tr{X^H R_z^{-1} (S - R_z) R_z^{-1} X dR}    (8)

Proof  The  first  differential  of  the  objective  function 
(6)  is  [8] 

    dΦ = -d(ln det R_z) - tr{d(R_z^{-1}) S}


Now, the first term is given by

    d(ln det R_z) = tr{R_z^{-1} dR_z}

and the differential of an inverse is given by [8, pg 151]

    d(R_z^{-1}) = -R_z^{-1} dR_z R_z^{-1}

In order to prove that the recursion (10) increases the likelihood (i.e. when dR_i = α_i(R_i* - R_{i-1})), we now proceed to show that the second term in (14) is positive, and the first term is zero.

The second term in (14) may be written


    dΦ = -tr{R_z^{-1} dR_z} + tr{R_z^{-1} dR_z R_z^{-1} S} + 2 tr{R_z^{-1} Y dG (z - YG)^H}

       = tr{R_z^{-1} (S - R_z) R_z^{-1} dR_z} + 2 tr{(z - YG)^H R_z^{-1} Y dG}    (9)

Substituting  (7)  into  (9)  gives  (8).  ■ 

Unfortunately, due to nonlinearities in (8) and the need for a positive definite solution, it is infeasible to obtain an analytic maximum likelihood solution for the covariance matrix by setting dΦ = 0 in (8). We now present our main result in the following theorem, which leads to our recursive algorithm in Section 4 for finding an admissible maximum likelihood solution.

Theorem 1 The sequence of covariance matrices, {R_i}, and channel tap means, {G_i}, generated by the following iterative equations (10)-(12), monotonically increases in likelihood:

    R_i = R_{i-1} + α_i (R_i* - R_{i-1})    (10)

    R_{z,i} = X R_i X^H + σ_w² I    (11)

    G_i = (Y^H R_{z,i}^{-1} Y)^{-1} Y^H R_{z,i}^{-1} z    (12)

where α_i > 0 is an arbitrarily small stepsize, and where R_i* = Σ_{n=1}^{M} r_{n,i} Q_n for r_{n,i} satisfying the following set of equations, for m = 1, ..., M:

    Σ_{n=1}^{M} tr{X^H R_{z,i-1}^{-1} X Q_n X^H R_{z,i-1}^{-1} X Q_m} r_{n,i}
        = tr{X^H R_{z,i-1}^{-1} (S_{i-1} - σ_w² I) R_{z,i-1}^{-1} X Q_m}    (13)

where R_{z,i-1} = X R_{i-1} X^H + σ_w² I and S_{i-1} = (z - Y G_{i-1})(z - Y G_{i-1})^H. The initial R_0 must be positive definite Hermitian block-Toeplitz (e.g. R_0 = I).

Proof From (8), consider the differential of the likelihood objective function at iteration i:

    dΦ_i = tr{X^H R_{z,i-1}^{-1} (S_{i-1} - R_{z,i-1}) R_{z,i-1}^{-1} X dR_i}

         = tr{X^H R_{z,i-1}^{-1} (S_{i-1} - R_{z,i-1} - (1/α_i) X dR_i X^H) R_{z,i-1}^{-1} X dR_i}
           + (1/α_i) tr{X^H R_{z,i-1}^{-1} X dR_i X^H R_{z,i-1}^{-1} X dR_i}    (14)

    (1/α_i) tr{X^H R_{z,i-1}^{-1} X dR_i X^H R_{z,i-1}^{-1} X dR_i}

      = (1/α_i) tr{X^H A A^H X dR_i X^H A A^H X dR_i};    R_{z,i-1}^{-1} = AA^H since p.d.

      = (1/α_i) tr{A^H X dR_i X^H A A^H X dR_i X^H A}

      = (1/α_i) tr{B B^H};    B = A^H X dR_i X^H A and dR_i = dR_i^H

      > 0

Now, before considering the first term in (14), consider (13), which can be written

    tr{X^H R_{z,i-1}^{-1} (S_{i-1} - σ_w² I) R_{z,i-1}^{-1} X Q_m}
      - tr{X^H R_{z,i-1}^{-1} X R_i* X^H R_{z,i-1}^{-1} X Q_m} = 0

    tr{X^H R_{z,i-1}^{-1} (S_{i-1} - σ_w² I - X R_{i-1} X^H
      - (1/α_i) X dR_i X^H) R_{z,i-1}^{-1} X Q_m} = 0;    since R_i* = R_{i-1} + (1/α_i) dR_i

    tr{X^H R_{z,i-1}^{-1} (S_{i-1} - R_{z,i-1} - (1/α_i) X dR_i X^H) R_{z,i-1}^{-1} X Q_m} = 0

Multiplying each equation by dr_m and summing over m = 1, ..., M gives

    tr{X^H R_{z,i-1}^{-1} (S_{i-1} - R_{z,i-1} - (1/α_i) X dR_i X^H) R_{z,i-1}^{-1} X Σ_{m=1}^{M} Q_m dr_m} = 0

Since dR_i = Σ_{m=1}^{M} Q_m dr_m, the first term in (14) is zero. ■


Remark 1 Theorem 1 utilizes the inverse iteration argument of [1]. However, this new result is applicable when the process of interest is not necessarily zero-mean and is observed via convolution in additive noise. We have included estimation of both the real and imaginary parts of each of the channel parameters, which arise from the baseband model. Since the result is not restricted to zero-mean and uncorrelated scatterer models, it leads to a more generally applicable algorithm than the circulant extension algorithm of [5].



Theorem 1 provides us with a recursive algorithm for maximum likelihood channel mean and structured covariance estimates. However, in order to perform an iteration (10)-(12), we must first solve (13). At each iteration i, this can be done by forming a vector x = [r_{1,i}, ..., r_{M,i}]^T and an M x 1 vector b with elements given by the RHS of (13) for m = 1, ..., M. Now the set of equations in (13) for m = 1, ..., M can be written in the form Ax = b, where we can solve for x. It is easily shown that at each iteration A is positive definite, and therefore efficient algorithms can be employed in the solution.

This new recursive estimation algorithm is in fact a linearized gradient algorithm, as can be seen from the linear equation (13). The formulation can easily be extended to multiple observations.
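Since A is Hermitian positive definite, each update can be computed with a Cholesky factorization rather than a general solver; a minimal numpy sketch (our naming):

```python
import numpy as np

def solve_spd(A, b):
    """Solve A x = b for the parameter update, exploiting the positive
    definiteness of A via a Cholesky factorization A = Lc Lc^H."""
    Lc = np.linalg.cholesky(A)
    y = np.linalg.solve(Lc, b)               # forward-substitution step
    return np.linalg.solve(Lc.conj().T, y)   # back-substitution step
```

`np.linalg.solve` is used for the two triangular systems for brevity; a dedicated triangular solver (e.g. `scipy.linalg.solve_triangular`) would exploit the structure fully.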

Remark 2 For the directly observable case presented in [1], it was sufficient to confine the estimates of the structured covariance matrix at each iteration to the positive definite region (by appropriate choice of the stepsize) to obtain an admissible maximum likelihood solution. Note, however, that for the case presented here, R_z in (5) may be positive definite even when the estimate of R is not. Since the maximum of the objective (6) may occur in this region, gradient algorithms may not be guaranteed to find an admissible (R ∈ T_{N,L}) maximum of the objective function.

Example  A 

Our new recursive algorithm was first tested on a zero-mean US channel. The channel was simulated with L = 2 independent equal-power fading taps with a Jakes' Doppler spectrum. The signal-to-noise ratio (SNR) was nominally chosen to be 10 dB. The dimension of the covariance matrix was NL = 50, and 75 samples of the channel output were used in the estimation (representing a multiple observation factor of 3). The stepsize at each iteration, α_i, was chosen to confine the corresponding estimate R_i to the positive definite region.
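One simple way to implement this confinement is to backtrack from a nominal stepsize until the smallest eigenvalue of the candidate estimate is positive. This is our construction (the paper does not specify its stepsize rule):

```python
import numpy as np

def confine_stepsize(R_prev, R_star, alpha0=1.0, shrink=0.5, floor=1e-12):
    """Backtrack alpha so that R = R_prev + alpha (R_star - R_prev), cf. (10),
    stays positive definite (smallest eigenvalue above a small floor)."""
    alpha = alpha0
    while alpha > 1e-8:
        R = R_prev + alpha * (R_star - R_prev)
        if np.linalg.eigvalsh(R).min() > floor:
            return alpha, R
        alpha *= shrink
    return 0.0, R_prev.copy()
```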

Figure 1 shows the progression of the objective maximization with respect to computational effort. Also shown is the progression of the algorithm of [5], using a factor of 2 for the circulant extension. The scaling of the curves relative to the computational effort was based on counts of floating point operations in Matlab for unoptimized code in both cases, and therefore the figure is only indicative of a performance comparison.

Importantly, Figure 1 shows that the restriction of the estimates R_i to the positive definite region at each iteration may result in trapping the algorithm at the positive definite boundary when a solution with greater likelihood exists in the admissible region.

Figure 1: Maximization of the likelihood objective relative to computational effort, estimates R_i restricted to the positive definite region

Remark 3 The algorithm in [5] for US channels exploits the circulant extension property of Toeplitz matrices [9], and has been shown to be an instance of the expectation-maximization (EM) algorithm. With sensible initialization, this algorithm maintains a positive definite estimate of R. However, due to the augmentation of the Toeplitz matrix to a circulant matrix, the estimation problem is modified, and conditions for convergence to an admissible maximum of (6) have not yet been established.

Modification  of  the  gradient 

To pursue the admissible maximum likelihood solution, modification of the gradient is required to allow movement tangential to the positive definite boundary whilst maintaining the positive definite constraint on the estimates R_i. Due to the complexity of the relationship between the positive definite constraint and the parameters r_m, no obvious modification strategy is apparent.

A simple modification we have found is to replace the set of linear equations (13) for calculating R_i* with

    Σ_{n=1}^{M} tr{X^H R_{z,i-1}^{-1} X Q_n X^H R_{z,i-1}^{-1} X Q_m} r_{n,i}
        = tr{X^H R_{z,i-1}^{-1} S_{i-1} R_{z,i-1}^{-1} X Q_m}    (15)

It is an unproven conjecture of this paper that this modified algorithm converges to an admissible maximum likelihood solution for the structured covariance matrix.


Figure 2: Maximization of the likelihood objective relative to computational effort, modified gradient

Figure 3: Delay-Doppler profile of the simulated channel

Example  B 

The experiment of Example A was repeated using the modified gradient described above. Note the smooth trajectory of the modified algorithm in Figure 2, suggesting that the algorithm is no longer trapped prematurely. Also shown is the likelihood obtained for the structured covariance matrix estimate without the positive definite constraint.

Figure 3 shows the delay-Doppler profile of the simulated channel. Figures 4 and 5 show the corresponding estimates of the delay-Doppler spectrum. Improvement can be obtained using more data (with correspondingly more computational effort) and/or higher SNR. Further trials show that the modified gradient algorithm is robust in estimating the channel mean, with good mean estimates and negligible impact on the covariance estimate and delay-Doppler profile.


[1] J. P. Burg, D. G. Luenberger, and D. L. Wenger, "Estimation of structured covariance matrices," Proceedings of the IEEE, vol. 70, pp. 963-974, Sept. 1982.

[2] M. I. Miller and D. L. Snyder, "The role of likelihood and entropy in incomplete-data problems: Applications to estimating point-process intensities and Toeplitz constrained covariances," Proceedings of the IEEE, vol. 75, pp. 892-907, July 1988.

Figure 4: Estimated delay-Doppler profile, circulant extension algorithm

Figure 5: Estimated delay-Doppler profile, modified gradient algorithm

[3] A. Dembo, C. L. Mallows, and L. A. Shepp, "Embedding nonnegative definite Toeplitz matrices in nonnegative definite circulant matrices, with application to covariance estimation," IEEE Trans. on Information Theory, vol. 35, pp. 1206-1212, Nov. 1989.

[4] L. M. Davis, R. J. Evans, and E. Polak, "Maximum likelihood estimation of positive definite Hermitian Toeplitz matrices using outer approximations," in Proc. IEEE Workshop on Statistical Signal and Array Processing (SSAP'98), Portland, OR, USA, pp. 49-52, Sept. 1998.

[5] D. L. Snyder, J. A. O'Sullivan, and M. I. Miller, "The use of maximum likelihood estimation for forming images of diffuse radar targets from delay-Doppler data," IEEE Trans. on Information Theory, vol. 35, pp. 536-548, Nov. 1989.

[6] L. M. Davis, I. B. Collings, and R. J. Evans, "Estimation of LEO satellite channels," in Int. Conf. on Information, Communications and Signal Processing (ICICS'97), vol. 1, Singapore, pp. 15-19, Sept. 1997.

[7] H. L. Van Trees, Detection, Estimation, and Modulation Theory, vol. III. Wiley, 1971.

[8] J. R. Magnus and H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, 1988.

[9] R. M. Gray, "Toeplitz and circulant matrices: II," tech. rep., Center for Systems Research, Stanford University, Apr. 1977.



Alexandr  M.  Kuzminskiy,  Kostas  Samaras,  Carlo  Luschi  and  Paul  Strauch 

Bell  Laboratories,  Lucent  Technologies 
Unit  1,  Pagoda  Park,  Westmead  Drive 
Swindon,  Wiltshire  SN5  7YT,  UK 


The problem of maximizing the throughput of a Random Access Channel (RACH) in a TDMA-based system is addressed. A general analysis of a Slotted ALOHA system is presented, which shows that the possibility of recovering more than one user in a RACH collision can significantly improve system performance. Three capture algorithms based on semi-blind space-time filtering are proposed. Their efficiency, compared to the conventional (power) training-based capture algorithm, is demonstrated by means of simulations in a GSM(EDGE) system. The best results are obtained for a multistage version of the training-like algorithm based on Least Squares (LS) estimation of the space-time filter coefficients.


Cellular mobile communication systems such as GSM make use of a RACH in order to enable the initial access of the mobile stations to the network. Packet radio networks (like GPRS and EGPRS) also make use of similar channels, called Packet Random Access Channels (PRACH), not only for the initial access but also during the call, since channels are allocated to users on a demand basis rather than permanently (as in circuit-switched GSM). The random access mechanism used in these systems is based on the Slotted ALOHA principle [1]. The throughput of a Slotted ALOHA random access channel in a TDMA system can be improved by using capture effects. Most capture models in TDMA-based systems rely on power capture [2], and no more than one of the colliding packets can be recovered. Specifically, when more than one packet arrives at the receiver simultaneously, only one of them can be captured, given that its power exceeds a specified threshold. Capture of more than one packet in a collision of many leads to performance enhancement. We start from a general analysis of a Slotted ALOHA system with capture. We show that the throughput can be increased significantly if a nonzero probability of capturing more than one packet in a collision is assumed. Then we propose three capture algorithms based on semi-blind space-time filtering. The first is based on a multistage procedure, where each stage exploits the conventional LS estimator with the ability to capture at most one of the colliding packets. The second algorithm is based on a training-like (TL) approach [5, 6] that allows us to introduce a nonzero probability of recovering more than one user in a collision of many using a one-stage procedure. The third is a combination of the multistage and training-like algorithms. Simulations in a GSM(EDGE) context are presented, which demonstrate the superior performance of the multiple capture algorithms compared to the LS estimator.


In order to demonstrate the performance enhancement due to space-time capture processing, we consider a simple S-ALOHA system with a finite population of N users. A generalization of the model described in [2, 3] is adopted, where the input load of the system is described by the probability of packet arrival, denoted p_0. Each of the users (terminals) generates single-packet messages with probability p_0. A discrete-time system is considered, and transmissions of packets occur only at the boundaries between two time slots. If the transmission of a packet is not successful, the terminal is backlogged and makes an attempt to retransmit the packet in the next time slot with retransmission probability p_r. The capture ability of the channel is described by the capture matrix C = [c(i, j)], where c(i, j) denotes the probability that there are i successfully received packets given that there are j packet transmission attempts in the same time slot (0 ≤ i, j ≤ N). It is assumed that all transmitting terminals are aware of the outcome of their transmissions before the end of the time slot through an ideal feedback (downlink) channel. The state of the system can be described by the number n of backlogged terminals (0 ≤ n ≤ N).

The steady-state behavior of this discrete-time Markovian system is determined by the (N + 1) x (N + 1) transition



probability matrix Π = [π_{n,m}], where π_{n,m} is the probability that the state of the system (population of backlogged terminals) is m during time slot t + 1, given that during time slot t the state was n. The adopted model allows us to express these transition probabilities as follows:

    π_{n,m} = Σ_{i=0}^{N-n} Σ_{j=0}^{n} Σ_{k=max{n-m+i-j, 0}}^{min{n-m+i, i}}
              (N-n choose i) p_0^i (1 - p_0)^{N-n-i} (n choose j) p_r^j (1 - p_r)^{n-j}
              c(k, i) c(n - m + i - k, j).    (1)



The expression for the transition probabilities in [3] is a special case of (1), obtained when the capture matrix of the system is

    c(i, j) = { 1 - q_j,  i = 0
                q_j,      i = 1
                0,        i > 1 }    (2)

where q_j is the probability that one out of j transmitted packets is successfully received. A semi-analytical approach has been followed for the calculation of the transition probabilities. The elements of the capture probability matrix C, for the purposes of this paper, have been calculated through simulation. In particular, the elements c(i, j) with 1 ≤ i ≤ M_1, 1 ≤ j ≤ M_2 (typical values M_1 = 3, M_2 = 5) are calculated via simulation, and c(0, j) = 1 - Σ_{i=1}^{M_1} c(i, j) for 1 ≤ j ≤ M_2. Furthermore, c(0, 0) = 1 and c(i, j) = 0 for all other (i, j).

The steady-state distribution P = {P_k}_{k=0}^{N} of the number of backlogged users is given as the solution to the following problem [4]:

    P Π = P    (3)

under the constraint

    Σ_{k=0}^{N} P_k = 1.    (4)

As a performance metric, the average number of successfully transmitted packets per time slot has been chosen, which is referred to as the average throughput S. The average throughput can be calculated as follows:

    S = Σ_{n=0}^{N} S(n) P_n,    (5)

where S(n) denotes the average number of successful packet transmissions when the system is in state n, and can be calculated as

    S(n) = Σ_{m=0}^{N} Σ_{i=0}^{N-n} Σ_{j=0}^{n} (n - m + i) π_{n,m}(i, j),    (6)

where π_{n,m}(i, j) denotes the (i, j)-th term of the sum in (1).
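The steady-state computation (3)-(5) amounts to finding the stationary distribution of a finite Markov chain; a generic numpy sketch (helper names are ours):

```python
import numpy as np

def stationary(Pi):
    """Solve P Pi = P subject to sum(P) = 1 (eqs. (3)-(4)) by stacking the
    normalization constraint onto the linear system."""
    n = Pi.shape[0]
    A = np.vstack([Pi.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1); b[-1] = 1.0
    P, *_ = np.linalg.lstsq(A, b, rcond=None)
    return P

def avg_throughput(Pi, S_of_n):
    """S = sum_n S(n) P_n, eq. (5), for a vector of per-state throughputs."""
    return float(stationary(Pi) @ S_of_n)
```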

The possibility of improving the system performance by means of capture effects is illustrated in Figure 1, where the average system throughput as a function of the retransmission probability is plotted for no capture and for ideal capture (S = N p_0), where N = 10 and p_0 = 0.2. One can see the significant gap between these two boundary cases, which can be filled by curves corresponding to algorithms with multiple capture ability.

Figure 1: Slotted ALOHA throughput performance for the boundary cases


The model of a RACH collision is shown in Figure 2. The main assumptions are:

1) all colliding signals and Co-Channel Interference (CCI) have the known structure of a timeslot (GSM, for example) and they are received synchronously;

2) all signals are from the same finite alphabet (FA) {a_h, h = 1, ..., J} and all of them have the same training sequence, which is different from the CCI training sequences;

3) channel coding is used (successful capture can be detected by means of a parity check);

4) multiple antennas are used at the receiver (space-time interference rejection filtering can be applied);

5) propagation channels for all colliding signals are stationary over the whole time slot (coefficients of a space-time filter can be adjusted by means of off-line algorithms).

The main difficulty in recovering more than one user is that the training data for all access packets in one cell is the same. This means that training-based algorithms cannot be directly applied for multiple capture reception. Blind techniques could be applicable, but the short-burst nature of Slotted ALOHA systems makes them unrealistic because of finite-data effects [7]. The possibility of addressing this problem by means of semi-blind space-time filtering algorithms is studied in this paper.

Note: An important feature of the considered problem is that some probability of access failure can be acceptable


“Training-like  symbols”  (any  place  in  a  payload) 

Figure  2:  Model  of  a  RACH  collision 

for RACH systems. Thus, solutions without proven ability to recover all colliding signals in every time slot may be acceptable.


4.1.  Multistage  algorithm 

A multistage processing scheme based on cancellation of the recovered signals from the received signal at successive stages has been considered for different applications, for example in [8, 9]. A possible way to implement this technique for the considered problem is presented in Figure 3 (two stages are shown for simplicity). The conventional LS algorithm is used at each stage in the space-time filter. The possible number of stages can be found from the applicability condition (misadjustment) for the noise canceller [10]:

    (number of stages - 1) x length of channels < number of information symbols in a timeslot

The advantage of this algorithm is that more than one user may be captured if the first stage is successful. The disadvantage is that no signals can be recovered if there is no capture at the first stage. We refer to this straightforward algorithm as the MLS (multistage LS) algorithm and consider it as a reference point for the enhanced algorithms introduced in the next two subsections.
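The control flow of the multistage structure can be sketched generically; the detector and signal-regeneration steps are abstracted as callables, and all names here are ours:

```python
import numpy as np

def multistage_capture(z, detect, reconstruct, max_stages):
    """Successive capture (MLS structure): detect one packet, subtract its
    reconstructed contribution from the received block, and repeat."""
    captured, residual = [], z.copy()
    for _ in range(max_stages):
        pkt = detect(residual)
        if pkt is None:                      # no capture at this stage: stop
            break
        captured.append(pkt)
        residual = residual - reconstruct(pkt)
    return captured
```

In the paper's setting, `detect` would be the LS space-time filter plus parity check, and `reconstruct` the re-modulated packet passed through the estimated channel.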

4.2.  One  stage  training-like  algorithm 

According to the general TL approach [5], our proposal is to use a few information symbols in the payload as an extension of the training sequence. These symbols may be different for different users. Thus, the enlarged training sequences may be linearly independent, and the LS estimator based on these TL sequences can be applied. In Figure 2

Figure  3:  Structure  of  the  MLS  algorithm 

these information symbols are indicated as the TL symbols. The coefficients of the space-time filters and the signal estimates corresponding to the TL sequences can be found for the FA signals using the following training-like LS (TLLS) procedure:

- form the J^{N_TL} TL sequences

    s_m^{TL} = {s(n_1), s(n_2), ..., s(n_{N_T}), s_m(m_1), s_m(m_2), ..., s_m(m_{N_TL})},    (7)

where s(n_i), i = 1, ..., N_T are the training symbols, {s_m(m_1), s_m(m_2), ..., s_m(m_{N_TL})} are all J^{N_TL} possible sequences of FA symbols of length N_TL; n_i and m_j are the positions of the known and TL symbols (n_i, i = 1, ..., N_T are known; m_j, j = 1, ..., N_TL must be selected);

- calculate the LS estimates of the weight vectors using the TL sequences

    W_m = (R + δI)^{-1} P_m,  m = 1, ..., J^{N_TL},    (8)

where

    R = Σ_{i=1}^{N_T} X(n_i) X*(n_i) + Σ_{j=1}^{N_TL} X(m_j) X*(m_j);    (9)

    P_m = Σ_{i=1}^{N_T} s*(n_i) X(n_i) + Σ_{j=1}^{N_TL} s_m*(m_j) X(m_j);    (10)

where X is the vector of input signals, and δ is the regularization coefficient [11] for the conventional LS estimator, which usually is chosen to be close to the variance of the additive noise;
- select the M_1 weight vectors which minimize the distance from the FA, Q_m:

    W_j = W_{m_j},  m_j = arg min_m Q_m,  j = 1, ..., M_1,    (11)

where

    Q_m = Σ_{n=1}^{N_s} min_h | a_h - W_m* X(n) |,    (12)

and N_s is the number of symbols in a time slot;

-  calculate  signal  candidates 

FA) are shown for N_TL = 4 (16 TL sequences for the binary FA) in the case of two colliding users (M_2 = 2). All situations are presented in Figure 4: no capture, one of two users captured, and two of two users captured. Our goal is to estimate the probabilities of these events for different M_2 and then to calculate the system performance according to the semi-analytical procedure presented in Section 2. The capture simulation results (estimated probabilities p_i, i = 1, 2, 3 of recovering one, two or three colliding packets) are given in Table 1 for the conventional LS algorithm (at most one signal can be captured), for the TLLS with N_TL = 2, and for the MTLLS with the same N_TL.

    ŝ_j(n) = W_j* X(n),    (13)

- apply a parity check to each signal candidate and accept distinct signal candidates with a positive parity check as the captured packets.
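The enumeration and selection steps above can be sketched compactly in a toy single-antenna setting; function and variable names are ours, and the metric follows (12) without the refinements of the full receiver:

```python
import numpy as np
from itertools import product

def tlls_weights(Xs, train_pos, train_sym, tl_pos, alphabet, delta):
    """One LS weight vector per hypothesised TL sequence, eqs. (8)-(10);
    Xs[n] is the stacked space-time input vector at symbol position n."""
    dim = Xs[0].shape[0]
    R = sum(np.outer(Xs[n], Xs[n].conj()) for n in list(train_pos) + list(tl_pos))
    R = R + delta * np.eye(dim)
    weights = []
    for tl in product(alphabet, repeat=len(tl_pos)):       # all J^N_TL hypotheses
        P = sum(np.conj(s) * Xs[n] for n, s in zip(train_pos, train_sym)) \
          + sum(np.conj(s) * Xs[n] for n, s in zip(tl_pos, tl))
        weights.append(np.linalg.solve(R, P))              # W_m = (R + delta I)^{-1} P_m
    return weights

def fa_distance(W, Xs, alphabet):
    """Selection metric (12): summed distance of the filter output to the FA."""
    return sum(min(abs(a - np.vdot(W, x)) for a in alphabet) for x in Xs)
```

Candidates with the smallest `fa_distance` are demodulated and passed to the parity check.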

The drawback of this solution is that the number of TL sequences grows exponentially with the number of TL symbols. Thus, only a small number of TL symbols can be implemented. Certainly, in this situation we cannot guarantee the possibility of recovering all signals in a collision in each timeslot. Nevertheless, according to the Note in Section 3, this is not necessary for the considered problem. We have introduced a multiple capture ability in a one-stage procedure and, in Section 5, we will demonstrate the performance improvement for only two TL symbols in the GSM(EDGE) environment.

4.3.  Multistage  training-like  algorithm 

Capture ability can be additionally improved by means of multistage processing similar to that presented in Section 4.1, with the TLLS algorithm used instead of the LS estimator. We refer to this algorithm as the MTLLS (multistage TLLS).


Two antennas receiving in a typical GSM (J = 2) urban scenario (TU50) are assumed, with SNR = 35 dB and SIR = 6 dB. In all cases a space-time filter with five coefficients in each channel is used. For each time slot, the transmitted bits are obtained by channel encoding of one data block. The channel coding scheme includes a (34,28) systematic cyclic redundancy check (CRC) code (which accepts 28 bits at the input and provides 6 parity check bits at the output), and a (3,1,5) convolutional code (rate 1/3, constraint length 5).

The possibility of capturing more than one user in a collision with the TLLS algorithm is illustrated in Figure 4, where typical curves for the selection criteria (distance from the
Figure 4: Illustration of the selection step in the TLLS for N_TL = 4

Table 1: Estimated probabilities to capture one/two/three packets in a collision of one/.../five packets

The corresponding curves for the average system throughput as a function of the retransmission probability are shown in Figure 5 for the conditions indicated in Section 2. One can see the significant performance improvement for the enhanced algorithms, especially for the MTLLS, compared to the conventional LS estimator, even for only two TL symbols.

Figure 5: Slotted ALOHA throughput performance for different algorithms


It has been shown analytically that the possibility of recovering more than one user in a RACH collision can significantly improve system performance. A semi-analytical approach has been proposed to evaluate the average throughput of a Slotted ALOHA system with multiple capture. Three semi-blind space-time filtering algorithms with multiple capture ability have been presented. Their efficiency compared to the training-based algorithm with power capture has been demonstrated in a GSM(EDGE) environment.
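As a hedged illustration of the semi-analytical throughput evaluation, the sketch below assumes Poisson packet arrivals with offered load G (the paper instead works with a finite-user retransmission model) and hypothetical per-collision capture values in place of the simulation-estimated probabilities of Table 1.

```python
import math

def aloha_throughput(G, capture, kmax=30):
    """Average Slotted ALOHA throughput with multiple capture:
    sum over collision sizes k of P(k arrivals in a slot) times the
    expected number of packets captured given a k-packet collision."""
    S = 0.0
    for k in range(1, kmax + 1):
        p_k = math.exp(-G) * G**k / math.factorial(k)  # Poisson arrivals
        c_k = capture[k] if k < len(capture) else 0.0  # E[captures | k]
        S += p_k * c_k
    return S

# With single capture (only an uncontended packet succeeds), the classical
# throughput S = G * exp(-G) is recovered:
single = [0.0, 1.0]
print(aloha_throughput(1.0, single))  # equals exp(-1)
```

Any multiple-capture ability (nonzero `capture[k]` for k >= 2) strictly increases the throughput computed this way.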

[1] L. G. Roberts, “ALOHA packet system, with and without slots and capture”, ACM Computer Communication Review, vol. 5, no. 2, pp. 28-42, Apr. 1975.

[2] C. Namislo, “Analysis of Mobile Radio Slotted ALOHA Networks”, IEEE Journal on Selected Areas in Communications, vol. SAC-2, no. 4, pp. 583-588, Jul. 1984.

[3] J. J. Metzner, “Comments on a widely used capture model for Slotted ALOHA”, IEEE Transactions on Communications, vol. 44, no. 4, p. 419, Apr. 1996.

[4] W. Feller, An Introduction to Probability Theory and Its Applications, Wiley, 1968.

[5] A. M. Kuzminskiy, D. Hatzinakos, “Semi-blind estimation of spatio-temporal filter coefficients based on a training-like approach”, IEEE Signal Processing Letters, vol. 5, no. 9, pp. 231-233, Sept. 1998.

[6] A. M. Kuzminskiy, P. Strauch, “Space-time filtering with suppression of asynchronous co-channel interference”, to be published in Proc. AS-SPCC, 2000.

[7] A. M. Kuzminskiy, “Finite amount of data effects in spatio-temporal filtering for equalization and interference rejection in short burst wireless communications”, to be published in Signal Processing, vol. 80, no. 10.

[8] G. J. M. Janssen, “BER and outage performance of a dual signal receiver for narrowband BPSK modulated co-channel signals in a Rician fading channel”, in Proc. PIMRC, pp. 601-606, 1994.

[9] A. M. Kuzminskiy, D. Hatzinakos, “Multistage semi-blind spatio-temporal processing for short burst multiuser SDMA systems”, in Proc. 32nd Asilomar Conf. on Signals, Systems and Computers, pp. 1887-1891.

[10] B. Widrow, J. M. McCool, M. G. Larimore, C. R. Johnson, Jr., “Stationary and nonstationary learning characteristics of the LMS adaptive filters”, Proc. IEEE, vol. 64, pp. 1151-1162, Aug. 1976.

[11] Y. I. Abramovich, “Controlled method for adaptive optimization of filters using the criterion of maximum SNR”, Radio Engineering and Electronic Physics, vol. 26, no. 3, pp. 87-95, 1981.



Trasapong  Thaiupathump,  Charles  D.  Murphy  and  Saleem  A.  Kassam 

Department  of  Electrical  Engineering 
University  of  Pennsylvania 
Philadelphia,  PA  19104 


In digital communication systems, the most commonly used signaling constellations are symmetric. Without a pilot tone or known training sequence, an arbitrary phase rotation cannot be identified from a symmetric constellation. The standard approach to overcoming the phase ambiguity is to use differential encoding. In this paper, we introduce the notion of using an asymmetric constellation instead of a symmetric constellation with differential encoding. The absolute phase of an asymmetric constellation can be determined using blind statistics of processed channel outputs. Through simulation and analysis, we study the trade-offs between asymmetry and other features of a constellation, such as data rate, power, and symbol separation.


A symmetric constellation has the property that blind processing is unable to identify an arbitrary rotation of the symbols. Synchronization with the phase of the transmitted carrier may be achieved by using pilot tones or known training sequences. In a blind system, without a pilot tone or training sequence, the receiver must rely on statistics of channel outputs to recover the phase of the received signal. All of the commonly used symbol constellations - PAM, PSK, QAM, and others - are symmetric when the symbols are equiprobable. Blind statistics of these constellations cannot produce an absolute phase estimate. To overcome the phase ambiguity, a mapping between the data and the symbols has to be invariant to an unknown reference phase. A simple method is to use differential encoding. Since each symbol is used to determine two symbol transitions, a symbol decision error will usually result in two transition errors. The penalty incurred by differential encoding is well characterized as a 2-3 dB loss in SNR [2], [3].

In this paper, we introduce the notion of using an asymmetric constellation. The absolute phase of an asymmetric constellation can be estimated using blind statistics of processed channel outputs. We discuss symmetry, asymmetry, and how to design asymmetric constellations and absolute phase estimators. Through simulation and analysis, we study the performance of various absolute phase estimators and the trade-offs between asymmetry and other features of a constellation.


M-ary PAM, QAM, and PSK are the most often encountered symmetric constellations. A symmetric constellation may be rendered asymmetric by changing the symbol values and/or the symbol probabilities.

Consider an M-ary constellation with M = 2^m equiprobable i.i.d. symbols. The data rate (in bits/symbol), or entropy, of the constellation is

H(S) = -\sum_{i=0}^{M-1} p_i \log_2 p_i = m,   (1)

where p_i is the probability of symbol i. If the number of symbols and the symbol locations are to remain unchanged, an asymmetric constellation can be obtained by adjusting the symbol probabilities. Because the symbols in the asymmetric constellation are no longer equiprobable, the data rate of the constellation is strictly lower than that of the corresponding symmetric constellation. This is a trade-off of data rate for asymmetry.
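This entropy trade-off is easy to quantify numerically. The sketch below evaluates Eq. (1) for equiprobable 8-PSK and for a hypothetical perturbed probability assignment (the paper's exact reassignment of the p_i is not reproduced here).

```python
import numpy as np

def entropy_bits(p):
    """H(S) = -sum p_i log2 p_i, the data rate in bits/symbol, Eq. (1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

p_sym = np.full(8, 1/8)          # equiprobable 8-PSK: H = 3 bits/symbol
delta = 0.06
p_asym = p_sym.copy()            # hypothetical perturbation of the p_i
p_asym[0] -= delta
p_asym[[1, 7]] += delta/2        # keep the probabilities summing to 1

print(entropy_bits(p_sym))       # 3.0
print(entropy_bits(p_asym))      # slightly below 3.0
```

Any deviation from equiprobability lowers H(S), which is the data-rate cost of this form of asymmetry.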

Figure 1: Symmetric and Asymmetric 8-PSK (Manipulation of the Symbol Probabilities)

Fig. 1(a) illustrates an 8-PSK constellation with equiprobable symbols \sqrt{A} e^{j2\pi i/8}, i = 0, \ldots, 7, constant transmitted power A, and a data rate of 3 bps. On the right is an asymmetric 8-PSK obtained by manipulating the symbol probabilities. The value of p_0 has been reduced by a small \delta, 0 < \delta < 1/8. To maintain \sum_{i=0}^{7} p_i = 1 and a zero DC value, the probabilities of some other symbols have also been changed. Since the symbols in the second constellation

0-7803-5988-7/00/$10.00  ©  2000  IEEE 


are no longer equiprobable, the data rate of the constellation is strictly less than 3 bps. Figure 2 shows the exact reduction in entropy as a function of \delta for 0 < \delta < 0.12.


Figure 2: \delta vs. H(S) for the Asymmetric 8-PSK

Figure 3: Symmetric and Asymmetric 8-PSK (Symbol Relocation)
In Fig. 3, another alternative for introducing asymmetry is shown: relocating some of the original 8-PSK constellation points without changing the equal probabilities assigned to the 8 points. With symbol probability and power unchanged, the symbol s_1 is rotated counterclockwise by \delta radians. In order to maintain the zero-mean condition, symbols s_2, s_6, and s_7 must also be relocated. The symbols s_2, s_6, and s_7 are rotated by -\epsilon, +\epsilon, and -\delta radians, respectively, where \epsilon = \sin^{-1}\{\cos(\pi/4)(1 - \cos(\delta) + \sin(\delta))\}. However, introducing asymmetry by moving some symbols closer together will cause more erroneous symbol decisions. With perfect phase estimation, the union bound of the error rate is given by (2). For small \delta, the error rate performance can be very close to that of the coherent symmetric 8-PSK constellation.
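The relocation rule above can be checked numerically: with \epsilon chosen as \sin^{-1}\{\cos(\pi/4)(1 - \cos\delta + \sin\delta)\}, the relocated constellation remains zero-mean. A minimal sketch with unit amplitude (\sqrt{A} = 1):

```python
import numpy as np

delta = 0.06
# epsilon from the relocation formula in the text
eps = np.arcsin(np.cos(np.pi/4) * (1 - np.cos(delta) + np.sin(delta)))

angles = 2*np.pi*np.arange(8)/8       # symmetric 8-PSK symbol angles
shift = np.zeros(8)
shift[1] =  delta                     # s1 rotated counterclockwise by delta
shift[2] = -eps                       # s2 rotated by -eps
shift[6] =  eps                       # s6 rotated by +eps
shift[7] = -delta                     # s7 rotated by -delta
s = np.exp(1j*(angles + shift))       # relocated constellation, sqrt(A) = 1

print(abs(np.mean(s)))                # effectively zero: zero mean holds
```

The imaginary parts of the s_1/s_7 and s_2/s_6 shifts cancel by symmetry, and the \epsilon formula is exactly what zeroes the remaining real part.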

3.1.  Maximum  Likelihood  Approach 

Consider the transmission of nonequiprobable M-PSK signals over an AWGN channel. The M-PSK symbols have the complex form s_i = \sqrt{A} e^{j2\pi i/M}, i = 0, 1, \ldots, M-1, where A denotes the constant signal power. The transmitted symbol x[n] = s_i with probability p_i. The corresponding received sequence is then

r[n] = x[n] e^{j\phi} + w[n],   n = 0, 1, \ldots, N-1,   (3)

where w[n] is a sample of zero-mean complex white Gaussian noise and \phi \in (0, 2\pi) is an arbitrary phase introduced by the channel. For the assumed AWGN model, the pdf of r[n] can be modeled as a mixture of M distributions

p(r[n]; \phi) = \sum_{i=0}^{M-1} p_i f_i(r[n]; \phi),   (4)

where

f_i(r[n]; \phi) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{|r[n] - s_i e^{j\phi}|^2}{2\sigma^2} \right).   (5)

Then the pdf of the sequence r is

p(\mathbf{r}; \phi) = \prod_{n=0}^{N-1} \sum_{i=0}^{M-1} p_i f_i(r[n]; \phi).   (6)

The MLE of \phi is the value that maximizes the likelihood function in Eq. (6). In general, the derivative of \ln p(\mathbf{r}; \phi) with respect to \phi does not reduce to a simple form. The MLE of \phi can be obtained numerically by using iterative maximization procedures. The difficulty with these numerical methods is that, in general, the point found may not be the global maximum but possibly only a local maximum or even a local minimum.

A simpler alternative likelihood method of finding the absolute phase is based on the use of the phase statistics. We may express the unknown phase as \phi = \phi_0 + k \cdot (2\pi/M), where k is an integer, 0 \le k \le M-1, and 0 \le \phi_0 < 2\pi/M. Let \theta[n] be the phase angle of the received sample r[n]. The estimate \hat{\phi}_0 is obtained first as


00  =  ^ 


/  N- 1  \ 

4>  =  angle  ^  ejM6^j 


is  the  mean  phase  angle  of  the  received  sequence  after  each 
phase  angle  has  been  multiplied  by  M.  Then,  the  maximum 
likelihood  method  can  be  applied  to  find  the  correct  value 


of the integer k. Using the estimate \hat{\phi}_0, the complex plane is divided into slices q_i, i = 0, 1, \ldots, M-1, bounded by the phase angles \{\pi/M + \hat{\phi}_0 + i \cdot (2\pi/M), i = 0, 1, \ldots, M-1\}. Although the nonequiprobable symbols do cause the optimum symbol-by-symbol decision boundaries to change at the receiver, these angular decision boundaries are close to optimum for small \delta. Let n_i be the number of points in the received data sequence that fall in region q_i. Then, we are able to obtain the integer k that maximizes the likelihood function defined as

\hat{k} = \arg\max_k p(\mathbf{n}; k),   (9)

where p(\mathbf{n}; k) can be modeled as a multinomial distribution with \mathbf{n} = [n_0 \; n_1 \ldots n_{M-1}] and \tilde{n}_i = n_{(i+k) \bmod M}:

p(\mathbf{n}; k) = \frac{N!}{n_0! \, n_1! \cdots n_{M-1}!} \, p_0^{\tilde{n}_0} p_1^{\tilde{n}_1} \cdots p_{M-1}^{\tilde{n}_{M-1}}.   (10)


This is equivalent to finding the integer k that maximizes the log-likelihood function

\ln p(\mathbf{n}; k) = \ln \frac{N!}{n_0! \cdots n_{M-1}!} + \sum_{i=0}^{M-1} \tilde{n}_i \ln p_i.   (11)


Therefore, by using simple bin statistics, the absolute phase estimate is

\hat{\phi} = \hat{\phi}_0 + \hat{k} \cdot \left(\frac{2\pi}{M}\right).   (12)
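The whole two-step estimator can be sketched as follows, under assumed values for the asymmetric probabilities (the paper's exact assignment differs) and a hypothetical true channel phase:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, sigma = 8, 8000, 0.05
p = np.full(M, 1/M)
p[0] -= 0.06; p[1] += 0.03; p[7] += 0.03    # assumed asymmetric probabilities
phi_true = 1.9                               # hypothetical channel phase

sym = np.exp(2j*np.pi*np.arange(M)/M)        # unit-power M-PSK symbols
x = sym[rng.choice(M, size=N, p=p)]
noise = sigma*(rng.standard_normal(N) + 1j*rng.standard_normal(N))/np.sqrt(2)
r = x*np.exp(1j*phi_true) + noise            # received sequence, Eq. (3)

theta = np.angle(r)
# step 1: fractional part phi_0 from the M-th power statistic, Eqs. (7)-(8)
phi0 = np.angle(np.sum(np.exp(1j*M*theta))) / M
# step 2: bin counts n_i over the M angular slices centered on phi0 + i*2pi/M
bins = np.round((theta - phi0)/(2*np.pi/M)).astype(int) % M
n = np.bincount(bins, minlength=M)
# choose k maximizing sum_i n_{(i+k) mod M} * ln p_i, Eq. (11)
k = max(range(M), key=lambda k: float(np.roll(n, -k) @ np.log(p)))
phi_hat = phi0 + k*2*np.pi/M                 # absolute phase, Eq. (12)
```

With phi_true = 1.9, the fractional part is about 0.33 rad and the bin-count likelihood resolves the remaining multiple of 2\pi/8, so phi_hat recovers the true phase without any training symbols.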

To make the correct decision in estimating k, the estimates \hat{p}_i should be close to their true values p_i. Finding the sample size N needed to generate reliable estimates of the p_i requires the joint probabilities that the \hat{p}_i lie within some \epsilon intervals centered on the correct values. Using Chebyshev's inequality, we can roughly determine the number of required samples N such that the estimate \hat{p}_i is within \epsilon of its correct value p_i with probability 1 - \tau:

P\{|\hat{p}_i - p_i| < \epsilon\} \ge 1 - \frac{\sigma_i^2}{\epsilon^2} = 1 - \tau,   (13)

where \hat{p}_i = n_i/N with variance \sigma_i^2 = p_i(1-p_i)/N. Setting \epsilon = \delta/P, the required N is given by

N = \frac{p_i(1-p_i) P^2}{\tau \delta^2}.   (14)


For the asymmetric setting shown in Fig. 1(b), the most likely incorrect k are the correct value of k offset by \pm 2 \pmod 8. The two largest-probability symbols, p_1 and p_7 in Fig. 1, appear to be the most critical values to consider. From Eq. (14), setting p_1 = p_7 = 1/8 + a = 1/8 + \delta/\sqrt{2}, we obtain

N = \frac{(7 + 8\sqrt{2}\,\delta - 32\delta^2) P^2}{64 \tau \delta^2}.   (15)

In Fig. 4, we plot N\tau/P^2 as a function of \delta. As an example, setting \tau = 0.1 and P = 2\sqrt{2}, which corresponds to setting \epsilon to half of the difference between the largest and the second largest values of the symbol probabilities, at \delta = 0.06 we get N\tau/P^2 \approx 32, which gives N \approx 2,600 samples. Figure 5 shows simulation results for the MSE of phase estimation as a function of N for \sigma^2 = 0.01 and A = 1. At the same level of MSE performance, the sample size needed for \delta = 0.03 is approximately 4 times larger than the sample size needed for \delta = 0.06. While the MSE performance includes contributions from both the estimation of \phi_0 and the estimation of k, the relative dependence of N on \delta (i.e., the factor by which N increases for decreasing \delta) is captured well by the approximation of Fig. 4.

Figure 4: N\tau/P^2 as a function of \delta

Figure 5: Comparison of MSE of phase estimation with different \delta.

Figure 6 illustrates the error probability performance of the various approaches. The bottom dashed line shows the error probability performance of coherent symmetric 8-PSK. The top curve shows the error probability of symmetric 8-DPSK. The stars show the simulated error rates of asymmetric 8-PSK with \delta = 0.06 (H(S) = 2.9542) and \sqrt{A} = 1. The symbols are rotated by an unknown constant phase \phi \in (0, 2\pi) radians and further distorted by AWGN. Statistics of 1,000 samples are used to estimate the absolute phase angle. As shown in Fig. 6, the performance of asymmetric 8-PSK is close to 3 dB better than that of symmetric 8-DPSK at large SNR. Note that the data rate of the asymmetric constellation is less than that of symmetric 8-DPSK by approximately 1.52%. Thus, in order to make a meaningful comparison of these two modulation methods, we should allow the symmetric 8-PSK constellation to use some form of encoding with rate 0.985. However, to obtain a coding gain of 3 dB, the rate would have to be significantly lower. Thus we conclude that when we have a large enough sample size for the phase estimate, the performance of asymmetric 8-PSK can be close to that of the coherent symmetric 8-PSK constellation.

Figure 6: Asymmetric 8-PSK, Symmetric 8-PSK, and Symmetric 8-DPSK Error Rate Comparison

3.2. Nonparametric Methods

Without any prior knowledge of the probability distribution and the exact locations of the symbol values, nonparametric or distribution-free methods can be used to estimate the absolute phase rotation.

For an asymmetric constellation obtained by changing the symbol locations as in Fig. 3, a simple and effective scheme for phase estimation is based on noting that, at the correct zero angle, roughly half of the samples will fall in two angular regions, bounded by \pi/4 and \pi/2 and by -\pi/4 and -\pi/2, shown as the two shaded regions in Fig. 7. The absolute phase can be estimated by searching for the angle that gives the maximum number of points in these two angular bins. This scheme works well in the presence of some noise; however, at high SNR, it is only able to obtain the estimate within \epsilon of the correct phase angle. We can further search for the angle within this range that gives the minimum mean square error from the center angle between these search sectors.

Figure 7: An absolute phase estimation scheme for Asymmetric 8-PSK obtained by changing symbol locations
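A minimal sketch of this mask search, using the relocated constellation of Fig. 3 with an assumed \delta and a hypothetical true phase (the refinement step that minimizes the MSE from the sector centers is omitted):

```python
import numpy as np

rng = np.random.default_rng(1)
delta = 0.2
eps = np.arcsin(np.cos(np.pi/4)*(1 - np.cos(delta) + np.sin(delta)))
shift = np.zeros(8)
shift[[1, 2, 6, 7]] = [delta, -eps, eps, -delta]
sym = np.exp(1j*(2*np.pi*np.arange(8)/8 + shift))   # relocated 8-PSK

phi_true = 0.8                                      # hypothetical rotation
r = sym[rng.integers(0, 8, size=5000)]*np.exp(1j*phi_true)
r += 0.05*(rng.standard_normal(5000) + 1j*rng.standard_normal(5000))

def mask_count(t):
    """Samples falling in the two shaded sectors of Fig. 7 after derotation."""
    a = np.angle(r*np.exp(-1j*t))
    return np.sum((a > np.pi/4) & (a < np.pi/2)) + \
           np.sum((a < -np.pi/4) & (a > -np.pi/2))

grid = np.linspace(0, 2*np.pi, 720, endpoint=False)
phi_hat = grid[np.argmax([mask_count(t) for t in grid])]
```

At the correct derotation the two sectors contain four of the eight symbols (roughly half of the samples), while any incorrect multiple of \pi/4 leaves only two, so the count peaks near the true phase, up to the \epsilon-wide plateau noted in the text.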

Figure 8: Comparison of the noise performances of asymmetric 8-PSK constellations obtained by changing symbol locations, with different \delta (symbol relocation case).

Figure 9: Comparison of MSE of phase estimation with different \delta (symbol relocation).

Figure 8 shows the error probability performance for symmetric 8-DPSK and asymmetric 8-PSK obtained by changing symbol locations, with different values of \delta. The top and bottom dashed lines show the error rate performance of symmetric 8-DPSK and symmetric 8-PSK, respectively. The solid lines show the union bound of the error performance, and the x marks show the simulated results. Results are based on 1,000 equiprobable i.i.d. symbols rotated by an unknown phase \phi and further corrupted by AWGN.

Figure 9 illustrates the MSE of the phase estimate for different values of \delta, assuming that the equiprobable i.i.d. symbols are rotated by an unknown phase \phi and further distorted by AWGN with variance 0.01. Figure 9 illustrates that with large \delta, the estimate converges to the absolute phase faster than with small \delta. However, with larger \delta, some symbol points are relocated closer to adjacent symbol points, which will cause more erroneous symbol decisions.

The shape of the mask used in estimating the absolute phase is not unique. The mask shown in Fig. 7 is just an example. We can use masks bounded by different angular boundaries, such as a half-plane shape bounded by the -\pi/2 and \pi/2 angles. The properties of a good mask shape are straightforward: it should give a maximum number of points at the correct angle, and the number of points in the mask should fall as it rotates away from the correct angle. Sensitivity analysis can be used to evaluate the performance of the mask.


A symmetric constellation may be rendered asymmetric by changing the symbol values and/or the symbol probabilities. Between these two methods of introducing asymmetry into an existing symmetric 8-PSK, manipulating the symbol probabilities will certainly cause some reduction in the number of data bits transmitted per symbol and some additional complexity in the encoding/decoding process to obtain the asymmetric probability arrangement. In the second asymmetric arrangement, the symbol probabilities remain unchanged, so the data rate is the same as that of a symmetric constellation, without additional complexity in a coding scheme.

An asymmetric constellation is introduced as an alternative to a regular symmetric constellation with differential encoding. Without the use of a pilot tone or known training sequence, the absolute phase of received symbols can be estimated blindly from an asymmetric constellation using simple statistics of the received symbols. By introducing asymmetry to an existing symmetric constellation, the absolute phase recovery function is obtained at the cost of a very small reduction in entropy and/or minimum distance. Both the asymmetry of a constellation and the phase recovery function may be considered as design choices, much like symbol separation, the number of bits transmitted per symbol, or power, providing new tools for constellation design.

[1] C.D. Murphy, Blind Equalization of Linear and Nonlinear Channels, Ph.D. Thesis, University of Pennsylvania, 1999.

[2] R.D. Gitlin, J.F. Hayes, and S.B. Weinstein, Data Communications Principles, New York: Plenum Press.

[3] J.G. Proakis, Digital Communications, New York: McGraw-Hill, 1995.

[4] G.J. Foschini and R.D. Gitlin, “Optimization of Two-Dimensional Signal Constellations in the Presence of Gaussian Noise,” IEEE Trans. on Communications, Vol. COM-22, No. 1, January 1974.

[5] G.D. Forney et al., “Efficient Modulation for Band-Limited Channels,” IEEE J. Selected Areas in Communications, Vol. SAC-2, pp. 632-647, August 1984.



Kelvin  K.  Au  and  Dimitrios  Hatzinakos 

Department  of  Electrical  and  Computer  Engineering, 
University  of  Toronto,  Toronto,  Ontario,  Canada,  M5S  3G4 
Tel:  (416)  978-1613,  Fax:  (416)  978-4425 


In short burst wireless communications, a training sequence is incorporated in each burst for the receiver to adjust the equalizer coefficients. However, when the number of training symbols is smaller than the number of spatial-temporal equalizer tap weights, the conventional least-squares technique may not provide good MSE performance. Blind methods, on the other hand, may not achieve equalization within a short burst. A regularized semi-blind algorithm was proposed previously by Kuzminskiy et al. to overcome this problem, but local minima exist in the algorithm. A convex cost with the training symbols acting as an equalizer constraint is proposed in this paper to avoid cost-dependent local minima. Furthermore, comparison with the regularized semi-blind algorithm suggests that the proposed algorithm achieves a lower MSE in the case of non-constant modulus signals such as 16-QAM.


Conventional equalization techniques in wireless communications require the transmission of training sequences. This represents a system overhead and effectively reduces the information rate. On the other hand, blind equalization algorithms do not require training. One of the most popular families of blind algorithms is the constant modulus algorithms (e.g., the CMA 2-2 or Godard [2] algorithm and the CMA 1-2 or Sato algorithm). There are several disadvantages in using the CMA family of algorithms. One of them is the existence of local minima. In situations where a fractionally-spaced equalizer or an antenna array is used, the Godard algorithm was shown to converge globally [3]. Unfortunately, this is not true for the CMA 1-2 (Sato) algorithm, which was demonstrated to have cost-dependent local minima in either case [4]. Another drawback of blind algorithms is their slow convergence and inability to achieve equalization within a short burst.

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

A regularized semi-blind algorithm was proposed in [1] which combined the LS and CM 1-2 costs. The ability to successfully equalize the channels with a spatial-temporal filter was demonstrated, thus offering the possibility of reducing the number of training symbols. However, local minima inherent to the cost exist. Using a convex cost eliminates the possibility of convergence to cost-dependent local minima. A blind convex cost with equalizer tap-anchoring was introduced in [5, 6]. In this paper, we make use of the training sequence in conjunction with the blind convex cost [6] to formulate a new and more efficient semi-blind algorithm. Simulation results demonstrate the potential of the proposed algorithm for constant and non-constant modulus signals.


We assume there are K users in the model. One of the users is the signal of interest. Without loss of generality, we shall denote the first user as the desired signal. The remaining K-1 signals come from nearby co-channel cells. At the base station receiver, an antenna array of M sensors is employed.

The data is processed in a burst of N symbols, which are assumed to be received under a stationary environment. There are N_t training symbols in each burst, and the starting position of the training sequence is N_s, which is assumed to be known. The transmitted signals undergo linear channels which are assumed to be FIR of length N_c. This assumption is valid when we have a finite delay spread. Equalization is necessary when the delay spread is larger than the symbol duration. The



received signal at the j-th sensor is given by:

y_j(n) = \sum_{i=1}^{K} \mathbf{c}_{ij}^H \mathbf{x}_i(n) + v_j(n),   (1)

for i = 1, \ldots, K, j = 1, \ldots, M, where

\mathbf{c}_{ij} = [c_{ij}(0), \ldots, c_{ij}(N_c - 1)]^T,   (2)

\mathbf{x}_i(n) = [x_i(n), \ldots, x_i(n - N_c + 1)]^T,   (3)

H denotes the conjugate transpose of a matrix, and each c_{ij}(n) is a complex Gaussian random variable whose amplitude does not change over the duration of the burst. The noise v_j(n) is complex circularly symmetric additive white Gaussian noise of variance \sigma_v^2.

Recall that the first user is the desired signal. The equalizer output for the signal of interest is given by:

z_1(n) = \mathbf{w}^H \mathbf{y}(n),   (4)

where \mathbf{y}(n) = [\mathbf{y}_1^T(n), \ldots, \mathbf{y}_M^T(n)]^T and each \mathbf{y}_j(n) = [y_j(n), \ldots, y_j(n - N_w + 1)]^T. The spatial-temporal equalizer taps are \mathbf{w} = [\mathbf{w}_1^H, \ldots, \mathbf{w}_M^H]^T and each \mathbf{w}_j = [w_{j1}, \ldots, w_{jN_w}]^T. The vector \mathbf{w} has dimension MN_w \times 1.





When a burst of known symbols (training) is received, the method of least squares can be used to obtain the spatial-temporal equalizer coefficients. The following equation is satisfied:

\mathbf{R}\mathbf{w} = \mathbf{p},   (5)

where \mathbf{R} is the time-averaged spatial-temporal autocorrelation matrix

\mathbf{R} = \sum_{n=N_s}^{N_s+N_t-1} \mathbf{y}(n)\mathbf{y}(n)^H,   (6)

and \mathbf{p} is the time-averaged spatial-temporal cross-correlation vector

\mathbf{p} = \sum_{n=N_s}^{N_s+N_t-1} \mathbf{y}(n) x_1^*(n-d)   (7)

for some delay d.

If the number of training symbols is smaller than the number of spatial-temporal equalizer coefficients N_w M, \mathbf{R} has a null space of dimension N_w M - N_t. Therefore there are many solutions to (5), which can be expressed as:

\mathbf{w} = \mathbf{R}^+\mathbf{p} + \sum_i v_i \mathbf{u}_i,   (8)

where \mathbf{R}^+ is the pseudo-inverse of \mathbf{R}, the \mathbf{u}_i are an orthonormal basis of the null space of \mathbf{R}, and the v_i are a set of coefficients. Equation (8) can be expressed compactly as:

\mathbf{w} = \mathbf{R}^+\mathbf{p} + \mathbf{U}\mathbf{v},   (9)

where \mathbf{U} = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{N_w M - N_t}] and \mathbf{v} = [v_1, v_2, \ldots, v_{N_w M - N_t}]^T.
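A numerical sketch of (5)-(9): with fewer training symbols than taps, any w = R⁺p + Uv satisfies the LS normal equations. Dimensions and data here are illustrative, and the training delay d is taken as 0:

```python
import numpy as np

rng = np.random.default_rng(2)
MNw, Nt = 8, 5                       # 8 equalizer taps, only 5 training symbols

Y = rng.standard_normal((MNw, Nt)) + 1j*rng.standard_normal((MNw, Nt))
x1 = rng.choice(np.array([1+1j, 1-1j, -1+1j, -1-1j]), size=Nt)  # training

R = Y @ Y.conj().T                   # autocorrelation matrix, Eq. (6)
p = Y @ x1.conj()                    # cross-correlation vector, Eq. (7), d = 0

w_ls = np.linalg.pinv(R) @ p         # particular solution R+ p
_, _, Vh = np.linalg.svd(R)
U = Vh.conj().T[:, Nt:]              # orthonormal basis of null(R), dim MNw-Nt

v = rng.standard_normal(MNw - Nt)    # arbitrary null-space coefficients
w = w_ls + U @ v                     # Eq. (9): still solves R w = p
print(np.allclose(R @ w, p))         # True
```

The whole (MNw − Nt)-dimensional family of vectors w fits the training data equally well; the semi-blind algorithm below uses the blind convex cost to pick one member of this family.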

The semi-blind algorithm in [1] regularizes the standard LS solution with the CM 1-2 cost to provide a better estimate of the equalizer coefficients in the case of N_t < N_w M. The algorithm minimizes the cost

J(\mathbf{w}) = \sum_{n=N_s}^{N_s+N_t-1} |z_1(n) - x_1(n-d)|^2 + \rho \sum_{n=1}^{N} (|z_1(n)| - R_1)^2,   (10)

where R_1 = E|a_n|^2 / E|a_n| with a_n being the alphabet of the signal constellation, and 0 \le \rho < \infty is a regularization constant. We refer readers to [1] for details of the algorithm.

4.1.  Background 

Since cost-dependent local minima exist in the regularized semi-blind algorithm, there are two ways to avoid convergence to such minima: 1) devising a good initialization strategy for the equalizer tap weights, or 2) choosing an alternative cost function that is convex. In this paper, we are primarily interested in adopting a convex cost function for the problem of semi-blind equalization.

In [5] (and references therein), a convex cost function based on the norm of the equalizer output was proposed in the context of blind equalization. The idea comes from the fact that the opening of the eye of the signal constellation is characterized by the intersymbol interference (ISI). Suppose the combined channel-equalizer response is \mathbf{c} * \mathbf{w} = \mathbf{h}; the eye is open when the magnitude of h(\delta) for some delay \delta dominates the rest of the coefficients |h(i)|, i \ne \delta. This is closely related to the l_1 norm of the combined channel-equalizer response. In practice, however, we can never know the channel response explicitly. An equivalent but more useful formulation uses the norm of the equalizer output [5, 6, 7]. In [6], the following cost is proposed:

J(\mathbf{w}) = \|\mathrm{Re}(z(n))\|_\infty + \|\mathrm{Im}(z(n))\|_\infty   (11)


with the constraint

\mathrm{Re}(w_{jk}) + \mathrm{Im}(w_{jk}) = 1.   (12)

Two remarks about (11) and (12) are in order:

1. The cost (11) is appropriate for square-type constellations such as 4-QAM, 16-QAM, etc.

2. The constraint (12) anchors one of the equalizer taps. This is needed to avoid the all-zero equalizer setting, which is a valid but trivial minimum of this type of convex cost function.

4.2.  Convex  cost  with  training  constraint 

In this section, we propose a linear constraint to be used with the convex cost (11). We call the algorithm semi-blind because the linear constraint makes use of the small number of known training symbols present in the received burst of data. The idea was essentially discussed in the previous section. When the number of training symbols is smaller than the number of spatial-temporal equalizer coefficients, the solution of the LS problem can be expressed as (restated here):

\mathbf{R}\mathbf{w} = \mathbf{p},   (13)

\mathbf{w} = \mathbf{R}^+\mathbf{p} + \mathbf{U}\mathbf{v}.   (14)

Equation (13) can be viewed as a constraint on the equalizer and can be adopted to replace the tap-anchoring technique. Hence (11) and (13) describe our semi-blind convex cost.

There are several properties of this semi-blind algorithm:

1. The semi-blind constraint (13) is linear. It can be thought of as a generalization of the tap-anchoring constraint.

2. Because the constraint is linear, convexity of the cost (11) is preserved.

3. Convexity of the cost (11) is established in a doubly infinite equalizer (ideal) setting and also in a finitely parameterized equalizer (practical) setting [6]. Therefore, using an FIR equalizer maintains convexity, unlike the Godard cost function.

4. As in the case of the blind convex cost function, this kind of equalization technique leaves an unknown gain at the equalizer output [7]. Hence an automatic gain control (AGC) is needed to scale the output. This can be done with knowledge of the signal constellation.

4.3.  Implementation 

Since the l_∞ norm cannot be implemented in practice, we approximate it with the l_p norm for some large p:

J(w) = ||Re(z(n))||_∞ + ||Im(z(n))||_∞
     = lim_{p→∞} [ ||Re(z(n))||_p + ||Im(z(n))||_p ]
     ≈ (Σ_n |Re(z(n))|^p)^{1/p} + (Σ_n |Im(z(n))|^p)^{1/p}   (15)

for large p.
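The limiting behavior in (15) can be checked numerically; a minimal sketch (the vector below is an arbitrary illustrative example, not from the paper):

```python
# Numerical check of the l_p -> l_infinity limit used in (15).
x = [0.3, -1.2, 0.7, 0.95]

def lp_norm(v, p):
    # l_p norm: (sum |e|^p)^(1/p)
    return sum(abs(e) ** p for e in v) ** (1.0 / p)

linf = max(abs(e) for e in x)       # the l_infinity norm, here 1.2
for p in (2, 12, 100):
    print(p, lp_norm(x, p))         # decreases toward 1.2 as p grows
```

For p = 12 (the value used later in the paper) the l_p norm already sits within about half a percent of the l_∞ value in this example.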

Convexity  is  preserved  in  this  approximation  [7].  In 
actual  implementation,  we  can  minimize  the  cost 

J(w) = Σ_n |Re(z(n))|^p + Σ_n |Im(z(n))|^p   (16)

to simplify computation. Substituting (14) into (16) and taking the gradient with respect to v*, we obtain

G = ∇_{v*} J(v) = Σ_n p U^H y(n) [ |Re(z(n))|^{p-2} Re(z(n)) - j |Im(z(n))|^{p-2} Im(z(n)) ].   (17)

The received data is processed in a burst of N symbols. A recursive method based on gradient descent is used to obtain the spatial-temporal equalizer coefficients. The algorithm is given by:

v^{(k+1)} = v^{(k)} - μ Ĝ^{(k)},   (18)

where v^{(k)} denotes the vector v at the k-th recursion, μ is a small step size, and Ĝ^{(k)} is an estimate of the gradient (17) at the k-th recursion. This estimate is obtained by averaging over the burst:

Ĝ^{(k)} = (p/N) Σ_{n=1}^{N} U^H y(n) [ |Re(z^{(k)}(n))|^{p-2} Re(z^{(k)}(n)) - j |Im(z^{(k)}(n))|^{p-2} Im(z^{(k)}(n)) ].   (19)

The algorithm is initialized with v^{(0)} = 0. Such initialization is equivalent to setting the equalizer to R^+ p (i.e., the particular LS solution in (14)). Then w^{(k)} = R^+ p + U v^{(k)}.
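The recursion (18), with the burst-averaged gradient estimate, can be sketched as follows. This is a toy illustration, not the paper's simulation setup: the burst y(n), the particular LS solution w_ls, and the basis U are random stand-ins, and p = 4 replaces the paper's p = 12 purely to keep the toy example numerically tame.

```python
import random

# Gradient-descent recursion on the l_p cost (16) with w = w_ls + U v.
random.seed(1)
L, D, N, p, mu = 4, 2, 50, 4, 1e-3   # taps, null-space dim., burst length, exponent, step size

def crandn():
    return complex(random.gauss(0, 1), random.gauss(0, 1))

y = [[crandn() for _ in range(L)] for _ in range(N)]                 # received burst y(n)
w_ls = [0.3 * crandn() for _ in range(L)]                            # stand-in for R^+ p
U = [[1.0 if i == j else 0.0 for j in range(D)] for i in range(L)]   # toy orthonormal basis

def z_out(v, yn):
    # equalizer output z(n) = w^H y(n)
    w = [w_ls[i] + sum(U[i][j] * v[j] for j in range(D)) for i in range(L)]
    return sum(w[i].conjugate() * yn[i] for i in range(L))

def cost(v):
    # l_p cost (16) over the burst
    s = 0.0
    for yn in y:
        z = z_out(v, yn)
        s += abs(z.real) ** p + abs(z.imag) ** p
    return s

v = [0j] * D                         # v^(0) = 0, i.e. the equalizer starts at w_ls
J0 = cost(v)
for _ in range(200):                 # recursion (18): v <- v - mu * G_hat
    g = [0j] * D
    for yn in y:                     # gradient estimate averaged over the burst
        z = z_out(v, yn)
        e = abs(z.real) ** (p - 2) * z.real - 1j * abs(z.imag) ** (p - 2) * z.imag
        for j in range(D):
            g[j] += (p / N) * sum(U[i][j] * yn[i] for i in range(L)) * e
    v = [v[j] - mu * g[j] for j in range(D)]
```

Because the cost is convex in v and the step size is small, the full-batch recursion decreases J(v) monotonically from its value at v^{(0)}.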

4.4.  Simulation  Results 

In this section, we provide some simulation results on the performance of the proposed semi-blind algorithm. Three users' signals (K = 3) impinge on a receiver with four sensors (M = 4). The first user is the desired signal and the other two users are interferers from other co-channel cells. We assume that the SNR of the desired signal at the receiver is 30 dB. The signal-to-interference ratio (SIR) is 3 dB in our simulations. The signals go through their respective channels, each modeled with 3 taps. This is the case when the delay spread is around 3 to 4 symbol periods. At the receiver, each sensor has an equalizer of length 6. Hence the spatial-temporal equalizer has a total of 24 coefficients.
When implementing the semi-blind algorithm (16), the choice of the exponent p has to be determined. Figure 1 shows a plot of the MSE achieved using different values of p for 16-QAM signals. The MSE is lower when a larger p is used. However, a compromise has to be struck. Too large a p can cause numerical problems in the recursion at the initial stage, when the noise and ISI are severe, while too small a p does not approximate (16) well. The pure blind convex algorithm in [6] uses p = 12. We shall also use this value of p in subsequent simulations. The step size μ for the recursive algorithm is 0.001. The performance measure is the mean square error (MSE) of the output.
We shall compare the MSE among the convex semi-blind, regularized semi-blind, and pure LS algorithms in the case where N_t < MN_w. The blind algorithm with the tap-anchoring constraint (12) is also implemented, using a recursion similar to (18) but in terms of w. The blind algorithm (which does not take into account the known symbols present in the burst) fails to converge under this scenario for both 4-QAM and 16-QAM (Figs. 2 and 3). An AGC is used at the output of the convex semi-blind algorithm so that the comparison is meaningful. The AGC adjusts the gain by

g = sqrt( E|a_n|^2 / avg|z(n)|^2 ),

where a_n denotes the alphabet symbols of the constellation and avg|z(n)|^2 is the average of |z(n)|^2 over the burst. The term E|a_n|^2 can be pre-computed since the constellation is known. This is, in fact, the variance of the constellation, and in our simulations we set E|a_n|^2 = 1.
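A minimal AGC sketch consistent with the description above: scale the output z(n) by g = sqrt(E|a_n|^2 / avg|z(n)|^2) so that the scaled burst has the constellation's variance. The gain formula and all values here are illustrative assumptions, not reproduced from the paper.

```python
import math, random

# AGC: remove the unknown output gain by matching average power to E|a_n|^2.
random.seed(0)
qam4 = [complex(i, q) / math.sqrt(2) for i in (-1, 1) for q in (-1, 1)]  # 4-QAM, E|a_n|^2 = 1
z = [2.5 * random.choice(qam4)                       # equalizer output with unknown gain (2.5)
     + 0.05 * complex(random.gauss(0, 1), random.gauss(0, 1))
     for _ in range(150)]

E_a2 = sum(abs(a) ** 2 for a in qam4) / len(qam4)    # pre-computable constellation variance
g = math.sqrt(E_a2 / (sum(abs(x) ** 2 for x in z) / len(z)))
z_agc = [g * x for x in z]
print(round(sum(abs(x) ** 2 for x in z_agc) / len(z_agc), 3))   # prints 1.0
```

By construction the scaled burst has average power exactly E|a_n|^2, regardless of the unknown gain.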

Figure 2 shows the MSE vs. N_t for the case of 4-QAM signals. The MSE is that of the desired user. The burst has 150 symbols. The LS curve indicates the MSE if we use only the training sequence to compute the equalizer coefficients. It is also an indication of the MSE before the semi-blind algorithms are applied, since we initialize the algorithms using the LS solution. The regularized semi-blind algorithm is implemented as in [1]. Our convex semi-blind algorithm runs for 500 recursions. The MSE plot is obtained by averaging over 40 runs of bursts of 150 symbols. The regularized semi-blind algorithm achieves smaller MSE in this scenario than the convex semi-blind algorithm.
The next simulation uses 16-QAM signals. In this case the MSE vs. N_t plot (Fig. 3) is obtained by averaging 40 runs of bursts of 200 symbols. The convex semi-blind algorithm iterates 500 times. We can see that in this scenario it has a smaller MSE than the regularized semi-blind algorithm starting from N_t = 12. The latter method does not perform as well as in the case of 4-QAM signals. If we can tolerate an MSE of no more than, say, 0.05, then the regularized semi-blind method fails in this case, while the convex semi-blind method is suitable for N_t > 16 in a burst.


5. Conclusions

In this paper, a convex cost with a training constraint is proposed for semi-blind adjustment of the coefficients of a general spatial-temporal equalizer. Compared to other blind and semi-blind methods in a short-burst communication scenario, the proposed method performs better, especially with non-constant-modulus signal constellations. Such constellations are proposed in the third-generation wireless standards when higher data rates are needed.

Figure  1:  Plot  of  MSE  vs.  Nt  for  the  semi-blind  convex 
algorithm  using  different  p  (K  =  3,  16-QAM  signals, 
N  =  200). 



Figure 2: 4-QAM case: MSE vs. Nt for pure LS, convex blind, convex semi-blind and regularized semi-blind algorithms (K = 3, N = 150).

Figure 3: 16-QAM case: MSE vs. Nt for the pure LS, convex blind, convex semi-blind and regularized semi-blind algorithms (K = 3, N = 200).

[1] A. Kuzminskiy, L. Fety, P. Forster, S. Mayrargue, "Regularized semi-blind estimation of spatio-temporal filter coefficients for mobile radio communications," in Proc. GRETSI'97, pp. 127-130, Grenoble, 1997.

[2] D. Godard, "Self-recovering equalization and carrier tracking in two-dimensional data communication systems," in IEEE Transactions on Communications, vol. COM-28, pp. 1867-1875, November 1980.

[3] Z. Ding, "On convergence analysis of fractionally spaced adaptive blind equalizers," in IEEE Trans. on Signal Processing, vol. 44, pp. 650-657, March 1996.

[4] Y. Li, K. J. R. Liu, and Z. Ding, "Length- and cost-dependent local minima of unconstrained blind channel equalizers," in IEEE Trans. on Signal Processing, vol. 44, pp. 2726-2735, November 1996.

[5]  W.  A.  Sethares,  R.  A.  Kennedy  and  Z.  Gu,  “An 
approach  to  blind  equalization  of  non-minimum 
phase  systems,”  in  ICASSP,  pp.  1529-1532,  1991. 

[6]  R.  A.  Kennedy  and  Z.  Ding,  “Blind  adaptive 
equalizers  for  quadrature  amplitude  modulated 
communication  systems  based  on  convex  cost 
functions,”  in  Optical  Engineering,  vol.  31,  pp. 
1189-1199,  June  1992. 

[7] S. Vembu, S. Verdu, R. A. Kennedy and W. Sethares, "Convex cost functions in blind equalization," in IEEE Trans. on Signal Processing, vol. 42, pp. 1952-1960, August 1994.


Performance  Analysis  of  Blind  Carrier  Phase 
Estimators  for  General  QAM  Constellations 

E. Serpedin¹ (contact author), P. Ciblat², G. B. Giannakis³, and P. Loubaton²

¹ Dept. of Electrical Engineering, Texas A&M University, College Station, TX 77843-3128, Tel.: (979) 458 2287,
Fax: (979) 862 4630, email:

² Universite de Marne-la-Vallee, Laboratoire "Systemes de Communication", 5 Bd. Descartes, 77454 Marne-la-Vallee cedex 2, France

³ Dept. of Electrical and Computer Engr., University of Minnesota, 200 Union St. SE, Minneapolis, MN 55455

Abstract— Large quadrature amplitude modulation (QAM) constellations are currently used in throughput-efficient high-speed communication applications such as digital TV. For such large signal constellations, carrier phase synchronization is a crucial problem because, for efficiency reasons, the carrier acquisition must often be performed blindly, without the use of training or pilot sequences. The goal of the present paper is to provide a thorough performance analysis of the blind carrier phase estimators that have been proposed in the literature and to assess their relative merits.

I.  Introduction 

Fast  acquisition  of  the  carrier  phase  is  a  crucial  issue  in 
high-speed  communication  systems  that  employ  large  QAM 
modulation  schemes.  One  of  the  challenges  associated  with 
large  QAM  constellations  is  the  blind  carrier  acquisition, 
which  is  often  required  in  large  and  heavily  loaded  multipoint 
networks  for  bandwidth  efficiency  and  little  effort  involved  in 
network  monitoring.  It  is  known  that  for  large  QAM  constel¬ 
lations,  the  conventional  carrier  tracking  schemes  frequently 
fail  to  converge  and  result  in  “spinning”  [8],  [10].  There¬ 
fore,  developing  computationally  simple  blind  carrier  phase 
estimators  with  guaranteed  convergence  and  good  statistical 
properties  is  well-motivated. 

Recently, a number of blind carrier phase estimators have been proposed [1], [2], [3], [4], [6], [11, p. 266-277], [12], but a thorough performance analysis of all these algorithms has not been performed. In order to quantify the performance of these estimators, the large sample (asymptotic) performance analysis of these phase estimators will be established and compared with the stochastic (modified) Cramer-Rao bound [11, Section 2.4]. It is shown that the seemingly different estimators [1], [2], [3], [5], [11, p. 266-277], [12] are the same, while the estimator proposed in [4] has a larger asymptotic variance than the power-law estimator [3], [6], [12]. It is also shown that exploiting the additional samples acquired by oversampling the received continuous-time waveform does not improve the performance of the power-law estimator in [3], [6], [12]. Finally, computer simulations are presented to corroborate the theoretical developments and to compare the performance of the investigated phase estimators.

II.  Problem  Statement 

We consider the baseband QAM communication system where the received signal Y(n) = Y_r(n) + jY_i(n) is given by

Y(n) = e^{jθ} X(n) + N(n),   (1)

where Y_r(n) and Y_i(n) denote the in-phase and quadrature components of Y(n), X(n) stands for the independent and identically distributed (i.i.d.) input QAM symbol stream, N(n) is the circularly distributed Gaussian noise, assumed to be independent of X(n), and θ denotes the unknown carrier phase offset. The problem of blind carrier phase estimation consists of recovering the phase error θ only from knowledge of the received data Y(n). Because the input QAM constellation has quadrant (π/2) symmetry, it follows that it is possible to recover the unknown phase θ only modulo a π/2-phase ambiguity. This ambiguity can be further eliminated through the use of appropriate coding schemes. Therefore, without any loss of generality, we can assume that the unknown phase θ lies in the interval (-π/4, π/4). In the next section, we briefly outline the blind phase estimators [1], [2], [3], [4], [5], [11, p. 266-277], [12], and establish their exact large sample performance.
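A minimal sketch of generating data from model (1), assuming a 16-QAM alphabet, a 30 dB SNR, and a burst of 512 samples (all illustrative choices):

```python
import cmath, math, random

# Model (1): Y(n) = e^{j*theta} X(n) + N(n), i.i.d. 16-QAM X(n), circular Gaussian N(n).
random.seed(0)
theta = 0.2                                         # unknown offset, in (-pi/4, pi/4)
qam16 = [complex(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]
Es = sum(abs(a) ** 2 for a in qam16) / 16           # E|X(n)|^2 (= 10 here)
snr_db = 30
sigma2 = Es / 10 ** (snr_db / 10)                   # E|N(n)|^2 from the SNR definition
X = [random.choice(qam16) for _ in range(512)]
Y = [cmath.exp(1j * theta) * x
     + complex(random.gauss(0, math.sqrt(sigma2 / 2)), random.gauss(0, math.sqrt(sigma2 / 2)))
     for x in X]
```

Splitting the noise variance equally between the real and imaginary parts makes N(n) circular, as the model requires.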

III.  Blind  Carrier  Phase  Estimators 

A. Approximate Maximum Likelihood Estimator: Fourth-Power Estimator

The maximum likelihood (ML) estimator of θ can be theoretically derived by maximizing a stochastic likelihood function, obtained by averaging the conditional probability density function of the received data with respect to the unknown data stream X(n). However, for high order QAM constellations, the computational complexity involved in calculating the likelihood function and, more importantly, the resulting nonlinear optimization problem render the ML-estimator impractical for most high-speed applications. The need for computationally simple estimators with guaranteed convergence calls for alternative (possibly suboptimal, but computationally feasible) phase estimators.

Moeneclaey and de Jonghe have shown in [12] that for any arbitrary 2-dimensional rotationally symmetric constellation (such as square or cross QAM constellations) the fourth-power (or power-law) estimator can be obtained as an approximate ML-estimator in the limit of small Signal-to-Noise Ratio (SNR := 10 log E|X(n)|^2/E|N(n)|^2, where := stands for "is defined as"). The power-law estimator and its sampled version are defined as:

θ̂ = (1/4) arg[ (EX^{*4}(n)) EY^4(n) ],   (2)

θ̂ = (1/4) arg[ (EX^{*4}(n)) (1/N) Σ_{n=1}^{N} Y^4(n) ],   (3)
0-7803-5988-7/00/$  10.00  ©  2000  IEEE 


where the superscript * stands for complex conjugation and the operator E(·) denotes the expectation operator. The fourth-power estimator does not require any complex nonlinear optimizations, but it requires a-priori knowledge of the input constellation moment E(X^{*4}(n)). However, this is not a restrictive assumption since for most QAM constellations, EX^{*4}(n) is a negative real-valued number, whose effect can be easily accounted for. Using standard convergence results [9], it can be checked that asymptotically (3) is¹ w.p. 1 a consistent estimator (θ̂ → θ as N → ∞) for any SNR range. An explanation can be obtained by observing that, in the presence of circularly and normally distributed noise N(n), the following relation holds:

(1/N) Σ_{n=1}^{N} Y^4(n) → EY^4(n) = e^{j4θ} EX^4(n),   (4)

where the second equality in (4) is obtained by expanding EY^4(n) = E(exp(jθ)X(n) + N(n))^4, taking into account the independence between X(n) and N(n), and EN^k(n) = 0 for any positive integer k. Hence, (3) recovers the carrier phase from the phase of the fourth-order moment of the received data.
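A numerical sketch of the sampled fourth-power estimator (3); the constellation, noise level, burst size, and true offset below are illustrative assumptions:

```python
import cmath, random

# Fourth-power estimator (3): theta_hat = (1/4) arg[E(X*^4) * (1/N) sum_n Y^4(n)].
random.seed(0)
theta = 0.2
qam16 = [complex(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]
EX4 = sum((a ** 4).real for a in qam16) / 16        # = -68: negative real for 16-QAM
N = 20000
m4 = 0j
for _ in range(N):
    x = random.choice(qam16)
    n = 0.3 * complex(random.gauss(0, 1), random.gauss(0, 1))  # circular Gaussian noise
    m4 += (cmath.exp(1j * theta) * x + n) ** 4 / N              # sample fourth moment
theta_hat = cmath.phase(EX4 * m4) / 4               # EX4 is real, so EX4* = EX4
```

Multiplying by the known (negative real) moment EX^{*4}(n) removes the constellation's contribution to the phase, leaving arg ≈ 4θ.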

Cartwright has proposed estimating the unknown phase θ using a different set of fourth-order statistics [3]. Define the following fourth-order moments and cumulants:

γ := E[Y_r^4] + E[Y_i^4] - 6E[Y_r^2 Y_i^2],   (5)

γ_a := cum(Y_r, Y_r, Y_r, Y_i) = E[Y_r^3 Y_i] - 3E[Y_r^2]E[Y_r Y_i] = E[Y_r^3 Y_i],   (6)

γ_b := cum(Y_r, Y_i, Y_i, Y_i) = E[Y_r Y_i^3] - 3E[Y_i^2]E[Y_r Y_i] = E[Y_r Y_i^3],   (E[Y_r Y_i] = 0).   (7)

Cartwright's estimator is defined by:

θ̂ = (1/4) arctan[ 4(γ_a - γ_b)/γ ].   (8)

To verify that Cartwright's estimator is the fourth-power estimator in (2), we equate the in-phase and quadrature components of:

EY^4(n) = e^{j4θ} EX^4(n) = cos(4θ) EX^4(n) + j sin(4θ) EX^4(n),
EY^4(n) = E(Y_r(n) + jY_i(n))^4 = E[Y_r^4(n) + Y_i^4(n) - 6Y_r^2(n)Y_i^2(n)] + 4jE[Y_r^3(n)Y_i(n) - Y_r(n)Y_i^3(n)]
        = γ + 4j(γ_a - γ_b).   (9)

It follows that γ = cos(4θ)EX^4(n) and 4(γ_a - γ_b) = sin(4θ)EX^4(n), which implies the equivalence between estimators (2) and (8). Cartwright's (fourth-power) estimator requires only that EX^4(n) ≠ 0 and the independence between X(n) and the additive circularly and normally distributed noise N(n), and it can be applied to both square and cross-QAM constellations, as opposed to the estimator proposed in [4], which can be applied only to square-QAM constellations.

It is interesting to remark that three other phase estimators, derived using completely different arguments, are equivalent to the fourth-power estimator. An alternative robust phase estimator with guaranteed convergence has been proposed in [2] for square-QAM constellations. Herein, the carrier acquisition problem is reduced to the blind source separation problem of the linear mixture of the in-phase and quadrature-phase components of the received signal, and a cumulant-based source separation criterion is proposed to estimate the unknown phase-offset [2]. In [1], [11, pp. 271-277], a low SNR approximation of the likelihood function, assuming PSK input constellations, is shown to have the same form as the estimator [2]. Furthermore, it is justified that this estimator can be used even for general QAM constellations [11, pp. 271-277]. By relying on Godard's quartic criterion [8], Foschini has shown an alternative derivation of this phase estimator in [5]. Next, we describe briefly the estimator proposed in [2], which relies on the observation that the in-phase and quadrature components of a square-QAM constellation are independent.

Let φ denote an estimate of the unknown phase offset θ, define the "rotated" output Ỹ(n) := exp(-jφ) Y(n), and assume that X(n) belongs to a square-QAM constellation. In the absence of noise and if φ = θ, then the in-phase and quadrature components of Ỹ(n) = X(n) are independent. Thus, the joint cumulants of the in-phase (Ỹ_r(n)) and quadrature (Ỹ_i(n)) components of Ỹ(n) are equal to zero:

γ̃_a := cum(Ỹ_r(n), Ỹ_r(n), Ỹ_r(n), Ỹ_i(n)) = 0,
γ̃_b := cum(Ỹ_r(n), Ỹ_i(n), Ỹ_i(n), Ỹ_i(n)) = 0,   (10)

and² γ̃_a - γ̃_b = 0. It is interesting to remark that (10) continues to hold true even in the presence of additive circularly and normally distributed noise N(n), because the cumulants of the in-phase and quadrature components of N(n) cancel out. By taking into account (9), it follows that γ̃_a - γ̃_b = (EỸ^4(n) - EỸ^{*4}(n))/(8j). Thus, θ can be estimated by:

θ̂_a := arg min_φ |EỸ^4(n) - EỸ^{*4}(n)| = arg min_φ |e^{-j4φ} EY^4(n) - e^{j4φ} EY^{*4}(n)|.   (11)

If we consider the polar representation EY^4(n) = λ_4 exp(j4θ), from (11) we obtain that θ̂_a = arg min_φ λ_4 |exp(-j4(φ - θ)) - exp(j4(φ - θ))| = arg min_φ 2λ_4 |sin(4(φ - θ))|, which implies that θ̂_a = θ modulo a π/4-phase ambiguity. Hence, estimator (11) is the same as the fourth-power estimator (2). By taking advantage of the sign of γ̃ := (EỸ^4(n) + EỸ^{*4}(n))/2 (see (5), (9)), the π/4-phase ambiguity inherent in (11) can be reduced to a π/2-phase ambiguity (since if θ̂_a - θ = π/4 modulo π/2, then γ̃ = -EX^4(n) ≠ EX^4(n)).
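The identities γ = cos(4θ)EX^4(n) and 4(γ_a - γ_b) = sin(4θ)EX^4(n) behind the equivalence of (2) and (8) can be checked exactly in the noise-free case by averaging over a 16-QAM alphabet (the offset value is illustrative and kept inside (-π/8, π/8) so that the arctangent in (8) is unambiguous):

```python
import cmath, math

# Noise-free check of the moment identities used in (8)-(9) for 16-QAM.
theta = 0.15
qam16 = [complex(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]
EX4 = sum((a ** 4).real for a in qam16) / 16        # = -68
Y = [cmath.exp(1j * theta) * x for x in qam16]      # rotated, noiseless observations
E = lambda f: sum(f(y) for y in Y) / len(Y)         # exact average over the alphabet
gamma = E(lambda y: y.real ** 4 + y.imag ** 4 - 6 * y.real ** 2 * y.imag ** 2)
gamma_a = E(lambda y: y.real ** 3 * y.imag)         # E[Y_r^3 Y_i]
gamma_b = E(lambda y: y.real * y.imag ** 3)         # E[Y_r Y_i^3]
theta_hat = 0.25 * math.atan(4 * (gamma_a - gamma_b) / gamma)   # Cartwright's (8)
```

Because the averages are exact (no noise, full alphabet), the recovered phase matches θ up to floating-point precision.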

In practice, many communication systems utilizing QAM constellations also employ coding, which implies that the SNR available at the synchronizer will be reduced by an amount proportional to the coding gain. In order to evaluate correctly the performance of these phase estimators at all SNR levels, we next provide an exact expression for the large sample variance of the power-law estimator, which is valid for any SNR level and is not restricted to the high SNR regime, as is the case with the approximate asymptotic expression presented in [12]. The next section will show that

¹The notation w.p. 1 denotes convergence with probability one.


²The reader can easily check that γ̃_a = -γ̃_b [4].



the  expression  of  [12]  is  not  valid  for  low  and  medium  SNRs 
(<  20  dB). 

Theorem 1. Assuming that the i.i.d. symbol stream X(n) is coming from a finite dimensional QAM-constellation and that the additive noise N(n) is circularly and normally distributed and independent of X(n), then the estimate (3) is asymptotically normally distributed with zero mean and the asymptotic variance:

lim_{N→∞} N E(θ̂ - θ)^2 = ( μ_{Y,44} - EX^8(n) ) / ( 32 (EX^4(n))^2 ),   (12)

with³ μ_{Y,40} = EY^4(n) = e^{j4θ} EX^4(n), and

μ_{Y,44} := E|X(n)|^8 + 16 E|X(n)|^6 E|N(n)|^2 + 36 E|X(n)|^4 E|N(n)|^4 + 16 E|X(n)|^2 E|N(n)|^6 + E|N(n)|^8.   (13)

Proof. Please see [13]. □
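Theorem 1 can be spot-checked by Monte Carlo: the scaled squared error N·E(θ̂ - θ)^2 should approach (μ_{Y,44} - EX^8(n))/(32 (EX^4(n))^2). The sketch below estimates μ_{Y,44} empirically; the burst size, run count, and noise level are illustrative choices.

```python
import cmath, math, random

# Monte Carlo check of the asymptotic variance (12) for the estimator (3).
random.seed(0)
theta, N, runs, sigma2 = 0.1, 512, 1500, 0.25
qam16 = [complex(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]
EX4 = sum((a ** 4).real for a in qam16) / 16
EX8 = sum((a ** 8).real for a in qam16) / 16

def sample_y():
    # one observation from model (1) with circular Gaussian noise
    n = complex(random.gauss(0, math.sqrt(sigma2 / 2)), random.gauss(0, math.sqrt(sigma2 / 2)))
    return cmath.exp(1j * theta) * random.choice(qam16) + n

mu44 = sum(abs(sample_y()) ** 8 for _ in range(200000)) / 200000   # empirical E|Y|^8
predicted = (mu44 - EX8) / (32 * EX4 ** 2)

empirical = 0.0
for _ in range(runs):
    m4 = sum(sample_y() ** 4 for _ in range(N)) / N   # sample fourth moment over a burst
    th = cmath.phase(EX4 * m4) / 4                    # power-law estimate (3)
    empirical += N * (th - theta) ** 2 / runs
```

With these settings the empirical scaled error and the predicted variance agree to within a few percent.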

The asymptotic variance (12) does not depend on the unknown phase θ, but only on the input symbol constellation and the SNR. This confirms the conclusion drawn in [3] stating that the standard deviation of (8) appears to be constant with respect to the true value of θ. We evaluate next the asymptotic performance of a phase estimator based on an alternative set of statistics that was proposed in [4].

B. HOS-Based Phase Estimator of [4]

The phase estimator [4] extracts the unknown phase information θ ∈ (-π/4, π/4) using the relations:

cot(2θ) = (γ_a - γ_b)/(2γ),   if γ ≥ 0.125 γ_x,   θ ∈ (-π/4, -π/8] ∪ [π/8, π/4),   (14)

tan(2θ) = 2(γ_a - γ_b)/(γ_x - 4γ),   if γ < 0.125 γ_x,   θ ∈ (-π/8, π/8),   (15)

with γ_x := E[|X|^4] - 2(E[|X|^2])^2 and

γ := cum(Y_r(n), Y_r(n), Y_i(n), Y_i(n)) = E[Y_r^2(n)Y_i^2(n)] - E[Y_r^2(n)]E[Y_i^2(n)] = 0.25 sin^2(2θ) γ_x.   (16)

Let γ̂_a, γ̂_b, and γ̂ denote sample estimates of γ_a, γ_b, and γ, respectively, and define by θ̂_1 and θ̂_2 the sample estimates corresponding to (14) and (15), respectively. The next theorem, whose proof is deferred to [13] due to space limitations, establishes the asymptotic performance of θ̂_1 and θ̂_2.

Theorem 2. Assuming that the i.i.d. symbol stream X(n) is coming from a finite dimensional QAM-constellation and that the additive noise N(n) is circularly and normally distributed and independent of X(n), then the estimates θ̂_1 and θ̂_2 are asymptotically normally distributed with zero mean and asymptotic variances:

lim_{N→∞} N E(θ̂_1 - θ)^2 = ( q_{11} + 4 cot^2(2θ) q_{22} - 4 cot(2θ) q_{12} ) / γ_x^2,   θ ∈ (-π/4, -π/8] ∪ [π/8, π/4),   (17)

lim_{N→∞} N E(θ̂_2 - θ)^2 = ( q_{11} + 4 tan^2(2θ) q_{22} + 4 tan(2θ) q_{12} ) / γ_x^2,   θ ∈ (-π/8, π/8),   (18)

³The notation μ_{Y,kl} := EY^k(n)Y^{*l}(n) stands for the (k+l)th-order moment of Y(n).



where

q_{11} := lim_{N→∞} N E[(γ̂_a - γ̂_b) - (γ_a - γ_b)]^2
        = ( cos(8θ)[(EX^4(n))^2 - EX^8(n)] + μ_{Y,44} - (EX^4(n))^2 ) / 32,   (19)

q_{12} := lim_{N→∞} N E{(γ̂ - γ)[(γ̂_a - γ̂_b) - (γ_a - γ_b)]}
        = ( -sin(8θ)[EX^8(n) - (EX^4(n))^2] + 2 Im{μ_{Y,62}} - 2 sin(4θ) EX^4(n)[μ_{Y,22} - 4 μ_{Y,11}^2] - 8 (E|X(n)|^2 + E|N(n)|^2) Im{μ_{Y,51}} ) / 64,   (20)

q_{22} := lim_{N→∞} N E(γ̂ - γ)^2
        = ( cos(8θ) EX^8(n) + 3 μ_{Y,44} - 4 Re{μ_{Y,62}} - 2 [cos(4θ) EX^4(n) - μ_{Y,22}]^2 + 48 μ_{Y,11}^2 μ_{Y,22} - 32 μ_{Y,11}^4 - 16 cos(4θ) EX^4(n) μ_{Y,11}^2 + 16 [Re{μ_{Y,51}} - μ_{Y,33}] μ_{Y,11} ) / 128,   (21)

μ_{Y,44} is given by (13), and

μ_{Y,62} := e^{j4θ} [ EX^6(n)X^{*2}(n) + 12 EX^5(n)X^*(n) E|N(n)|^2 + 15 EX^4(n) E|N(n)|^4 ],   (22)

μ_{Y,51} := e^{j4θ} [ EX^5(n)X^*(n) + 5 EX^4(n) E|N(n)|^2 ],   (23)

μ_{Y,33} := E|X(n)|^6 + 9 E|X(n)|^4 E|N(n)|^2 + 9 E|X(n)|^2 E|N(n)|^4 + E|N(n)|^6,   (24)

μ_{Y,22} := E|X(n)|^4 + 4 E|X(n)|^2 E|N(n)|^2 + E|N(n)|^4,   (25)

μ_{Y,11} := E|X(n)|^2 + E|N(n)|^2.   (26)

As opposed to the power-law estimator, the asymptotic performance of the Chen et al. estimator [4] depends on the phase offset θ. As the simulation results will show (see Figure 5), the asymptotic performance of this estimator deteriorates significantly whenever the a-priori intervals (14), (15) are missed, and for any SNR it exhibits a larger variance than the power-law estimator.

IV.  Performance  Comparisons 

In this section, computer simulations are performed to assess the relative merits of the proposed phase estimators by comparing the theoretical (asymptotic) limits and the experimental standard deviations of the investigated estimators. Two additional estimators have been analyzed: the fractionally-sampled (FS) power-law estimator and the reduced-constellation power estimator. The FS-power estimator recovers the unknown phase offset θ by exploiting all the samples obtained by fractionally-sampling (oversampling) the received continuous-time waveform in the estimator (3). A raised-cosine pulse shape with roll-off factor 0.3 and an oversampling factor P = 3 are assumed throughout the simulations. The reduced-constellation power estimator relies also on (3), but only the received samples that are larger in magnitude than a given threshold are processed [10, p. 1382], [6, p. 1482]. Thus, only the points closest to the four corners of the constellation are processed. The asymptotic performance of these two additional estimators can be established using the result of Theorem 1, but due to space limitations their expressions will not be presented.

In Figures 1-a and 1-b, we have plotted the experimental and theoretical standard deviations of all these estimators versus SNR, assuming a square 256-QAM constellation, θ = 15° (= π/12), N = 512 samples, MC = 300 Monte-Carlo runs, and additive normally distributed noise. The threshold in the reduced-constellation power estimator has been set up so that only the received samples corresponding to the 12 points of the input 256-QAM constellation with the largest radii are processed. The solid line denotes the stochastic Cramer-Rao bound (CRB = 1/(N · SNR)) corresponding to the phase estimate. Figure 1 shows that the power-law estimator performs better than the Chen et al. estimator [4] at all SNR levels, but worse than the reduced-constellation power estimator at high SNRs (SNR > 20 dB). The FS-based power estimator appears to have the worst performance. The reduced performance of the FS-power estimator is due to the increased "self-noise" generated by the residual intersymbol interference effects. For this reason, we have not pursued further the analysis of FS-based power-law estimators.

In Figure 2, we have plotted separately the theoretical and experimental standard deviations of the power-law, the reduced-constellation power-law, and the Chen et al. (15) estimators, assuming MC = 300 Monte-Carlo simulation runs, N = 512 samples, θ = π/12, and a 256-QAM input constellation. The experimental values are well predicted by the asymptotic limits for all three estimators, but the CRB seems to be a loose bound. In Figure 3, the experimental and theoretical standard deviations of the power-law and the Chen et al. estimators are plotted versus the number of samples (N), assuming SNR = 10 dB, MC = 300 Monte-Carlo runs, and θ = π/12. It turns out that both estimators achieve the asymptotic bound even when a reduced number of samples N = 250 to 500 is used.

In Figure 4-a, the asymptotic performance of the Chen et al. estimator (14) is analyzed, assuming θ = π/5, MC = 300, and N = 512. Figures 4-b and 5 show that the performance of the Chen et al. estimator depends on the unknown phase θ and has a larger standard deviation than the power-law estimator for any phase offset θ (Figure 5) and for any SNR level (Figure 4-b). In Figure 5, the theoretical standard deviations (17) and (18) are plotted on the interval (-π/4, π/4) assuming perfect a-priori knowledge of the intervals (14), (15) where θ lies. However, in the presence of wrong a-priori knowledge on θ (|θ| > π/4), the performance of estimator [4] deteriorates significantly.

In Figures 6 and 7, we have analyzed the performance of the power-law and the reduced-constellation power-law estimators in the case of a cross 128-QAM constellation, assuming θ = π/12, MC = 300, and N = 4000 samples. For such constellations, the Chen et al. estimator cannot be used, since the in-phase and quadrature components of the input symbol stream are not independent. In Figures 6 and 7-a, the experimental and asymptotic standard deviations of the power-law and the reduced-constellation power-law estimators are plotted for different SNR levels. Figures 7-a and 7-b show that the asymptotic limit predicts the experimental results well for all SNR levels and for numbers of samples N > 1000. It appears also that for cross-QAM constellations, the power-law estimator exhibits a very slow convergence rate, and good estimates of the phase offset can be obtained only by using a large number of samples (N > 5,000). Finally, Figure 8 reveals that the approximate asymptotic limit derived in [12] does not predict well the exact asymptotic limit of the power-law estimator for small and medium SNRs (SNR < 20 dB).


[1] A. N. D'Andrea, U. Mengali, and R. Reggiannini, "Carrier phase recovery for narrow-band polyphase shift keyed signals," Alta Freq., vol. LVII, pp. 575-681, Dec. 1988.

[2] A. Belouchrani and W. Ren, "Blind carrier phase tracking with guaranteed global convergence," IEEE Trans. on Signal Processing, vol. 45, no. 7, pp. 1889-1894, July 1997.

[3] K. V. Cartwright, "Blind phase recovery in general QAM communication systems using alternative higher order statistics," IEEE Signal Processing Letters, vol. 6, no. 12, pp. 327-329, Dec. 1999.

[4] L. Chen, H. Kusaka, and M. Kominami, "Blind phase recovery in QAM communication systems using higher order statistics," IEEE Signal Processing Letters, vol. 3, no. 3, pp. 147-149, May 1996.

[5]  G.  J.  Foschini,  “Equalizing  without  altering  or  detecting  the  data,”  Bell 
Syst.  Tech.  J.,  vol.  64,  pp.  1885-1911,  Oct.  1985. 

[6] C. Georghiades, "Blind carrier phase acquisition for QAM constellations," IEEE Trans. on Communications, vol. 45, no. 11, pp. 1477-1486, Nov. 1997.

[7] F. Gini and G. B. Giannakis, "Frequency offset and symbol timing recovery in flat-fading channels: a cyclostationary approach," IEEE Trans. on Communications, vol. 46, no. 3, pp. 400-411, March 1998.

[8] D. Godard, "Self recovering equalization and carrier tracking in two dimensional data communication systems," IEEE Trans. on Communications, vol. 28, no. 11, pp. 1867-1875, Nov. 1980.

[9]  T.  Hasan,  “Nonlinear  time  series  regression  for  a  class  of  amplitude 
modulated  cosinusoids,”  Journal  of  Time  Series  Analysis,  vol.  3,  no.  2, 
pp.  109-122,  1982. 

[10]  N.  Jablon,  “Joint  blind  equalization,  carrier  recovery,  and  timing  recov¬ 
ery  for  high-order  QAM  signal  constellations,”  IEEE  Trans,  on  Signal 
Processing,  vol.  40,  no.  6,  pp.  1383-1397,  June  1992. 

[11]  U.  Mengali  and  A.  N.  D’Andrea,  Synchronization  Techniques  for  Digital 
Receivers ,  Plenum,  NY,  1997. 

[12]  M.  Moeneclaey  and  G.  de  Jonghe,  “ML-oriented  NDA  carrier  synchro¬ 
nization  for  general  rotationally  symmetric  signal  constellations,”  IEEE 
TYans.  on  Communications,  vol.  42,  no.  8,  pp.  2531-2533,  Aug.  1994. 

[13]  “Proofs  of  Theorems  1,  2,”  http:/ / 

Fig. 1. Standard Deviation vs. SNR: a) Experimental Values b) Asymptotic Values (256 square-QAM)

Fig. 2. Standard Deviation vs. SNR: Experimental/Theoretical Values a) Power Estimator b) Reduced-Constellation Power Estimator c) Chen et al. Estimator (256 square-QAM)

Fig. 3. Standard Deviation vs. No. of Samples: Power Estimator vs. Chen et al. Estimator (256 square-QAM)

Fig. 4. Standard Deviation vs. SNR: a) Chen et al. Estimator (θ = π/5) b) Asymptotic Limits (256 square-QAM)

Fig. 5. Standard Deviation vs. Phase Offset: Asymptotic Limit (256 square-QAM)

Fig. 6. Standard Deviation vs. SNR: a) Power Estimator b) Reduced-Constellation Power Estimator (128 cross-QAM)

Fig. 7. Standard Deviation vs. SNR/Data: a) Reduced-Constellation Power-Law and Power-Law Estimators b) Power Estimator (128 cross-QAM)

Fig. 8. Standard Deviation vs. SNR: Exact and Approximate Asymptotic Limits (256 square-QAM)




Souad MEDDEB, Jean-Yves TOURNERET and Francis CASTANIE

ENSEEIHT/TESA, 2 Rue Camichel, 31071 Toulouse, France


This paper addresses the problem of time-invariant (TIV) bilinear system identification. The input-output relation of a TIV bilinear system is expressed as a time-varying recursive equation. Such a formulation allows us to estimate the unknown bilinear system parameters using a modified least-squares (MLS) algorithm. The MLS method provides unbiased estimates of the unknown bilinear parameters. Several simulations illustrate the MLS estimator performance.


Linear models have found a variety of applications in many areas such as speech processing, image processing and communications. These models include parametric Autoregressive (AR), Moving Average (MA) or Autoregressive Moving Average (ARMA) models. The use of these parametric models can be motivated by the following property: for any real-valued stationary process y(n) with continuous spectral density S(f), it is possible to find an ARMA process whose spectral density is arbitrarily close to S(f) ([2], p. 130). However, these models fail to identify many systems which are inherently nonlinear.

The bilinear model has been used successfully to approximate a large class of nonlinear systems [5], [7]. Its ability to represent many nonlinearities efficiently and with a relatively small number of parameters is owing to its feedback structure [5]. Other properties motivating the use of bilinear systems are discussed in [4]. The problem of estimating bilinear system parameters using measurements of the system input and output signals has received much attention in the literature [3], [6]. Recursive estimation algorithms, including the recursive least squares (RLS) algorithm and the extended least squares (ELS) algorithm, have been studied in [3]. The main advantage of the RLS algorithm is its simplicity, owing to the linearity of the model in its parameters. However, the algorithm provides biased estimates. Simulations presented in [3] have shown that the ELS algorithm outperforms the RLS algorithm in terms of bias. However, no theoretical study was provided because of the nonlinear estimation problem and the difficult computations required. Hence, various methods have been devised to obtain unbiased estimators from linear estimation problems. Some of these methods are based on modifying the least squares estimator by subtracting the bias from the estimates [8]. This paper studies the modified least squares (MLS) algorithm for the identification of bilinear systems. The MLS algorithm yields unbiased parameter estimates at a lower computational cost than the ELS algorithm.

The paper is organized as follows. Section II presents the problem. Section III studies the recursive MLS algorithm for the bilinear system identification problem. Simulation results and conclusions are reported in Section IV.


The output x(t) of a bilinear system driven by the input sequence u(t) can be defined by the following recursive equation:

x(t) = Σ_{i=1}^{p} a_i x(t-i) + Σ_{i=1}^{p} b_i u(t-i) + Σ_{i=1}^{p} Σ_{j=1}^{p} c_{i,j} u(t-j) x(t-i)   (1)

where a_i, b_i, c_{i,j} are the unknown bilinear system parameters and t = 1, ..., N. A noisy version of x(t),

y(t) = x(t) + e(t)   (2)

is observed (see fig. 1). In eq. (2), e(t) is a stationary white Gaussian noise with zero mean and variance σ²:

E[e(t)e(s)] = σ² δ_{t,s}

where δ_{t,s} is the Kronecker symbol. Eq.'s (1) and (2) show that the observed process y(t) satisfies the following time-varying (TV) model:

y(t) = a_0(t) + Σ_{i=1}^{p} a_i(t) y(t-i) + ε(t)   (3)

where the TV parameters are

a_0(t) = Σ_{i=1}^{p} b_i u(t-i)

a_i(t) = a_i + Σ_{j=1}^{p} c_{i,j} u(t-j),   i = 1, ..., p

In eq. (3), ε(t) is a colored noise sequence defined by:

ε(t) = e(t) - Σ_{i=1}^{p} a_i(t) e(t-i),   t = 1, ..., N

Model (3) is similar to the TV ARMA model studied in [1] for the identification of non-stationary signals embedded in noise. Indeed, a_i(t) can be viewed as a linear combination of functions f_j(t) as follows:

a_i(t) = Σ_{j=0}^{p} a_{i,j} f_j(t),   i = 0, ..., p   (5)

with

a_{0,0} = 0,   a_{0,j} = b_j,   j = 1, ..., p

a_{i,0} = a_i,   a_{i,j} = c_{i,j},   j = 1, ..., p

f_0(t) = 1,   f_j(t) = u(t-j),   j = 1, ..., p

Eq. (5) is similar to the decomposition of the time-varying AR parameters onto a set of basis time functions studied in [1]. This paper proposes to estimate the unknown bilinear system parameters from the input and output samples u(t) and y(t) for t = 1, ..., N using the modified least squares (MLS) algorithm [1].
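The recursion (1)-(2) is straightforward to simulate directly. The sketch below is our own illustration, not the authors' code: the helper name `simulate_bilinear` and the noise level are hypothetical, and the numerical system is the second-order example used later in the simulation section.

```python
import numpy as np

def simulate_bilinear(a, b, c, u, sigma, rng):
    """Simulate x(t) = sum_i a_i x(t-i) + sum_i b_i u(t-i)
    + sum_{i,j} c_ij u(t-j) x(t-i), observed as y(t) = x(t) + e(t)."""
    p, N = len(a), len(u)
    x = np.zeros(N)
    for t in range(N):
        for i in range(1, p + 1):
            if t - i < 0:
                continue
            x[t] += a[i - 1] * x[t - i] + b[i - 1] * u[t - i]
            for j in range(1, p + 1):
                if t - j >= 0:
                    x[t] += c[i - 1][j - 1] * u[t - j] * x[t - i]
    y = x + sigma * rng.standard_normal(N)   # noisy observation, eq. (2)
    return x, y

rng = np.random.default_rng(0)
a, b = [1.5, -0.7], [1.0, 0.5]
c = [[0.12, 0.0], [0.0, 0.0]]                # single bilinear term c_{1,1}
u = rng.standard_normal(5_000)               # unit-variance white input
x, y = simulate_bilinear(a, b, c, u, sigma=0.5, rng=rng)
```

The nested loops follow eq. (1) term by term; a vectorized version is possible but the recursive dependence on x(t-i) keeps the outer loop over t.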


Denote θ^T = (b^T, θ_1^T) the bilinear system parameter vector with b^T = (b_1, ..., b_p) and

θ_1^T = (a_1, c_{1,1}, c_{1,2}, ..., c_{1,p}, a_2, c_{2,1}, ..., a_p, ..., c_{p,p})

Eq. (3) can be written in matrix form as follows:

y(t) = y_{t-1}^T θ + ε(t),   t = 1, ..., N   (8)

where

y_{t-1}^T = (u(t-1), u(t-2), ..., u(t-p),
y(t-1), y(t-1)u(t-1), ..., y(t-1)u(t-p),
...,
y(t-p), y(t-p)u(t-1), ..., y(t-p)u(t-p))

3.1. The Conventional LS Algorithm

The conventional least squares (LS) estimator of θ, denoted θ̂_N, is defined by

θ̂_N = argmin_θ J_1(θ)   (9)

where J_1(θ) = Σ_{t=1}^{N} ε²(t). Since ε(t) is linear w.r.t. θ, an analytical solution for θ̂ can be derived:

θ̂_N = ( Σ_{t=1}^{N} y_{t-1} y_{t-1}^T )^{-1} Σ_{t=1}^{N} y_{t-1} y(t)

Since the white noise sequence e(t) is zero-mean and decorrelated with x(t), lim_{N→∞} θ̂_N can be expressed as a function of the true parameter vector as follows:

lim_{N→∞} θ̂_N = θ - σ² lim_{N→∞} P_N V_N θ_1   (10)

where 0_{p,p} is the p × p zero matrix,

P_N = ( Σ_{t=1}^{N} y_{t-1} y_{t-1}^T )^{-1},   V_N = Σ_{t=1}^{N} blockdiag(0_{p,p}, U_t, ..., U_t)

and

U_t = (1, u(t-1), ..., u(t-p))^T (1, u(t-1), ..., u(t-p)).

Eq. (10) shows that the LS estimator of θ is generally asymptotically biased.
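To make the regression (8) concrete, here is a small sketch of our own (helper names `regressor` and `ls_estimate` are hypothetical, as is the toy first-order system): it builds the regressor y_{t-1} and solves the batch LS problem (9). At high SNR the bias term of (10) is negligible and the LS estimate lands close to the true θ = (b_1, a_1, c_{1,1}).

```python
import numpy as np

def regressor(y, u, t, p):
    """y_{t-1} of eq. (8): (u(t-1..t-p), then for each lag i:
    y(t-i), y(t-i)u(t-1), ..., y(t-i)u(t-p))."""
    v = [u[t - j] for j in range(1, p + 1)]
    for i in range(1, p + 1):
        v.append(y[t - i])
        v += [y[t - i] * u[t - j] for j in range(1, p + 1)]
    return np.array(v)

def ls_estimate(y, u, p):
    """Batch LS solution of (9): (sum phi phi^T)^{-1} sum phi y(t)."""
    Phi = np.vstack([regressor(y, u, t, p) for t in range(p, len(y))])
    return np.linalg.lstsq(Phi, y[p:], rcond=None)[0]

# toy system: x(t) = 0.5 x(t-1) + u(t-1) + 0.1 u(t-1) x(t-1), y = x + e
rng = np.random.default_rng(0)
N = 20_000
u = rng.standard_normal(N)
x = np.zeros(N)
for t in range(1, N):
    x[t] = 0.5 * x[t - 1] + u[t - 1] + 0.1 * u[t - 1] * x[t - 1]
y = x + 0.01 * rng.standard_normal(N)   # high SNR: bias is negligible
theta = ls_estimate(y, u, 1)            # ~ (b1, a1, c11) = (1.0, 0.5, 0.1)
```

At low SNR the same code exhibits the asymptotic bias of eq. (10) on the (a_i, c_{i,j}) components.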

3.2. The Extended LS Algorithm

The Extended Least Squares (ELS) algorithm has shown interesting properties for pseudo-linear regression models such as (8) [3]. This algorithm can be summarized as follows:

Q_N = 1 + y_N^T P_N y_N,

P_{N+1} = P_N - P_N y_N Q_N^{-1} y_N^T P_N,

θ̂_{N+1} = θ̂_N + P_N y_N Q_N^{-1} (y(N+1) - y_N^T θ̂_N),

ŷ(N+1) = y_N^T θ̂_{N+1},

y_{N+1}^T = (u(N), ..., y(N), ..., y(N+1-p)u(N+1-p)).

It is well known that the ELS algorithm provides unbiased estimates. However, it suffers from stability problems [3]. The next section studies another unbiased estimator, known as the Modified Least Squares (MLS) estimator.


3.3. The Modified LS Estimator

The MLS estimator, also called the bias-compensated least squares estimator, is defined as follows [8]:

θ̃_N = θ̂_N + σ² P_N V_N θ̂_{1,N-1}   (11)

The MLS estimator defined by eq. (11) is clearly asymptotically unbiased. However, this estimator requires computing the sum of N matrices of size (p²+2p) × (p²+2p). In order to avoid such computation, we assume in the following that the input sequence u(t) is a sequence of mutually independent and identically distributed (i.i.d.) random variables with zero mean and unit variance. In this case, the following result can be obtained:

lim_{N→∞} (1/N) V_N = blockdiag(0_{p,p}, I_{p+1}, ..., I_{p+1}) = V   (13)

The following bias-compensated LS estimator can then be defined:

θ̃_N = θ̂_N + σ² N P_N V θ̂_{1,N-1}   (14)

Eq. (14) explicitly depends on the noise variance σ². The next section studies a recursive algorithm for the joint estimation of σ² and θ, as in [8].


Denote ξ_t(N) the residual at time N and R_N the sum of residual squares:

ξ_t(N) = y(t) - y_{t-1}^T θ̂_N   (15)

R_N = Σ_{t=1}^{N} ξ_t²(N)   (16)

Eq. (8) shows that the residual ξ_t(N) can be written

ξ_t(N) = y_{t-1}^T (θ - θ̂_N) + e(t) - e_{t-1}^T θ_1   (17)

It is well known that θ̂_N satisfies the normal equations (obtained by differentiating J_1(θ) with respect to θ):

Σ_{t=1}^{N} y_{t-1} ξ_t(N) = 0

which yields, asymptotically,

(1/N) E[R_N] = σ² + σ² E[θ̂_N]^T V θ_1

By replacing the expectations E[R_N] and E[θ̂_N] by their instantaneous values, an estimator of the noise variance can be defined:

σ̂²_N = (R_N / N) / (1 + θ̂_N^T V θ̂_{1,N-1})   (18)

The MLS algorithm for the joint estimation of the noise variance σ² and the bilinear system parameter vector θ is then based on the following recursive equations:

e_{t-1}^T = (e(t-1), e(t-1)u(t-1), ..., e(t-1)u(t-p), ...,
e(t-p), e(t-p)u(t-1), ..., e(t-p)u(t-p))

Q_N = 1 + y_N^T P_N y_N,   (19a)

P_{N+1} = P_N - P_N y_N Q_N^{-1} y_N^T P_N,   (19b)

θ̂_{N+1} = θ̂_N + P_N y_N Q_N^{-1} (y(N+1) - y_N^T θ̂_N),   (19c)

R_{N+1} = R_N + ξ²_{N+1}(N+1) Q_N^{-1},   (19d)

σ̂²_{N+1} = (R_{N+1} / (N+1)) / (1 + θ̂_{N+1}^T V θ̂_{1,N}),   (19e)

θ̃_{N+1} = θ̂_{N+1} + (N+1) σ̂²_{N+1} P_{N+1} V θ̂_{1,N+1}   (19f)

Note that eq.'s (19a), (19b) and (19c) are the classical RLS equations [3]. Eq.'s (19d), (19e) and (19f) ensure that the bilinear system parameter estimates are asymptotically unbiased. It is interesting to note that the MLS algorithm does not require any matrix inversion.
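As a compact illustration of our own (not the authors' code), the sketch below runs the RLS recursion (19a)-(19c) with the running residual sum (19d), then applies the bias compensation once at the end in the spirit of eq. (14). The helper names and the toy first-order system (a_1 = 0.5, b_1 = 1, c_{1,1} = 0.1) are hypothetical.

```python
import numpy as np

def phi_vec(y, u, t, p):
    """Regressor y_{t-1} of eq. (8)."""
    v = [u[t - j] for j in range(1, p + 1)]
    for i in range(1, p + 1):
        v.append(y[t - i])
        v += [y[t - i] * u[t - j] for j in range(1, p + 1)]
    return np.array(v)

def mls_identify(y, u, p, delta=1e-3):
    """RLS (19a)-(19c) with residual sum (19d), then one
    bias-compensation step following (19e)-(19f) / eq. (14)."""
    d = p * p + 2 * p                        # dim of theta = (b, theta_1)
    V = np.zeros((d, d))
    V[p:, p:] = np.eye(d - p)                # blockdiag(0_{p,p}, I_{p+1}, ...)
    P = np.eye(d) / delta                    # P_0 = (1/delta) I, delta << 1
    theta = np.zeros(d)
    R, n = 0.0, 0
    for t in range(p, len(y)):
        phi = phi_vec(y, u, t, p)
        Q = 1.0 + phi @ P @ phi              # (19a)
        K = P @ phi / Q
        err = y[t] - phi @ theta             # a priori residual
        theta = theta + K * err              # (19c)
        P = P - np.outer(K, phi @ P)         # (19b)
        R += err * err / Q                   # (19d): running sum of squares
        n += 1
    theta1 = theta.copy()
    theta1[:p] = 0.0                         # keep only the (a_i, c_ij) part
    sigma2 = (R / n) / (1.0 + theta @ V @ theta1)       # (19e)
    theta_mls = theta + sigma2 * n * (P @ V @ theta1)   # bias compensation
    return theta_mls, sigma2

# toy system: x(t) = 0.5 x(t-1) + u(t-1) + 0.1 u(t-1) x(t-1)
rng = np.random.default_rng(1)
N = 20_000
u = rng.standard_normal(N)
x = np.zeros(N)
for t in range(1, N):
    x[t] = 0.5 * x[t - 1] + u[t - 1] + 0.1 * u[t - 1] * x[t - 1]
y = x + 0.5 * rng.standard_normal(N)         # moderate SNR
theta_mls, s2 = mls_identify(y, u, 1)        # ~ (1.0, 0.5, 0.1)
```

At this noise level the plain RLS estimates of a_1 and c_{1,1} are visibly biased low, and the compensation step recovers most of the gap.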


Many simulations have been performed to illustrate the previous theoretical results. For this experiment, consider the following second-order bilinear system [3]:

x(t) = 1.5 x(t-1) - 0.7 x(t-2) + u(t-1) + 0.5 u(t-2) + 0.12 x(t-1) u(t-1)

The observed driving sequence u(t) is white Gaussian with variance 1. The bilinear signal x(t) is contaminated by white Gaussian noise with signal-to-noise ratios (SNR's) ranging from 5 to 40 dB. The algorithm is initialized with θ̂ = 0 and P_N = (1/δ) I, where δ ≪ 1.
Fig. 2 shows the convergence of the noise variance estimate to its true value (SNR = 5 dB, or equivalently σ² = 7.28) from 10 Monte-Carlo simulations. The mean square errors (MSE's) of the bilinear system estimates using the RLS, ELS and MLS algorithms, computed from 10 Monte-Carlo simulations, are depicted in fig. 3 as a function of the SNR for N = 4000. The MLS estimator clearly outperforms the usual RLS estimator in terms of MSE. Fig. 3 also shows that the MLS estimator outperforms the ELS estimator for low SNR's. Tables 2 and 3 show the bias of the RLS, ELS and MLS estimates for two values of SNR. As expected, the MLS estimator outperforms the usual RLS estimator in terms of bias. The MLS and ELS algorithms perform very similarly in terms of bias.


Several nonlinear techniques have been proposed for modeling nonlinear channels with memory. These techniques include Volterra series, wavelet networks and neural networks [11]. The use of Volterra series to model satellite channels was motivated in [9] and [10]. These Volterra models suffer from a number of parameters that increases exponentially with the memory and nonlinearity order. It is well known that the bilinear model can be decomposed into a Volterra series with a reduced number of parameters [4]. Consequently, this paper proposes 1) to model the nonlinear satellite channel using the bilinear model and 2) to identify such a nonlinear model using the LS procedures described in the previous sections. A simplified satellite channel consists of two earth stations connected by a satellite repeater as depicted in fig. 4 (see [11] for more details including channel characteristics). As an example, fig. 5 shows the normalized prediction error between the outputs of the noisy simplified satellite channel and the corresponding bilinear system computed using the MLS algorithm.


The new contribution of this paper is to derive a modified least squares algorithm, from the theory of linear time-varying models, for the identification of time-invariant bilinear models. A recursive version of the modified least squares algorithm is derived as well. The algorithm provides estimates of the noise variance and the bilinear model parameters. Bilinear MLS parameter estimates are shown to be asymptotically unbiased. The MLS estimator performance is compared to that of the RLS and ELS estimators. The MLS estimator is finally applied to the identification of the nonlinear satellite channel.


[1] G. Alengrin, M. Barlaud and J. Menez, "Unbiased Parameter Estimation of Nonstationary Signals in Noise," IEEE Trans. on ASSP, vol. 34, no. 5, pp. 1319-1322, Oct. 1986.

[2] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, Springer-Verlag, 1990.

[3] F. Fnaiech and L. Ljung, "Recursive Identification of Bilinear Systems," Int. J. Control, vol. 45, no. 2, pp. 453-470, 1987.

[4] D. Guegan, Séries Chronologiques Non Linéaires à Temps Discret, Statistique Mathématique et Probabilité, Economica.

[5] V. John Mathews, "Adaptive Polynomial Filters," IEEE SP Magazine, pp. 10-26, July 1991.

[6] S. Meddeb, J. Y. Tourneret and F. Castanie, "Identification of Bilinear Systems Using Bayesian Inference," Proc. of ICASSP, pp. 1609-1612, Seattle, USA, May 12-15, 1998.

[7] R. R. Mohler and W. J. Kolodziej, "An overview of bilinear system theory and applications," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-10, pp. 683-688, 1982.

[8] S. Sagara and K. Wada, "On-line modified least-squares parameter estimation of linear discrete dynamic systems," Int. Jour. of Cont., vol. 25, no. 3, pp. 329-343, 1977.

[9] S. Benedetto, E. Biglieri and R. Daffara, "Modelling and performance evaluation of nonlinear satellite links - A Volterra series approach," IEEE Trans. AES, vol. 15, pp. 494-506, July 1979.

[10] S. Meddeb and J. Y. Tourneret, "Identification of Non-linear Satellite Mobile Channels Using Volterra Filters," in Proc. EUSIPCO, Tampere (Finland), September 2000.

[11] M. Ibnkahla, N. J. Bershad, J. Sombrin and F. Castanie, "Neural networks modelling and identification of nonlinear channels with memory: Algorithms, applications and analytic models," IEEE Trans. SP, vol. 46, no. 5, May 1998.




Nicolas PETROCHILOS (1,2)

Pierre COMON (2)

(1) CAS, Dept. EE, Delft Univ. of Technology
Mekelweg 4, 2628 CD Delft, The Netherlands


This article presents a method to blindly identify linear-quadratic channels (LQC). The method is designed for the single-input/single-output (SISO) case with white inputs having specific distributions (such as those usually found in digital communications). Using High-Order Statistics (HOS) of the input, the method matches the third-order moments with the LQC model, yielding an original, simple relation. Several simulations are performed and show fair accuracy given sufficiently long observation records.


Nonlinear systems provide a better approximation to real-life channels, and many examples of nonlinearities can be found in nonlinear control systems [5], hydrodynamics [4], satellite communication systems [1], or underwater acoustics, among others. Blind methods are attractive when the input is unknown, and to avoid the reduction of the information rate caused by the insertion of training sequences.

Blind identification of Volterra systems has already been widely studied. For instance, in [7], the authors derive the cumulant-matching equations, which allow blind identification of a purely real quadratic system with i.i.d. inputs of unknown distribution. Next, in [2], P. Bondon goes much further and derives identifiability conditions when two input sequences are observed, one Gaussian and one non-Gaussian.

In this paper, we focus our attention on linear-quadratic systems with specific discrete inputs, encountered in n-PSK and QAM digital modulations. So this contribution differs from the previous ones in two respects: the system is not purely quadratic, and the inputs are imposed to be discrete and of known distribution. The scope is thus less general.

This work was partly supported by ENS Lyon, ENSEA, TU-Delft, and the RNRT project "Paestum". The first author thanks A. Trindade, A. Heldring, S. Halford, and A. Elmilady for their moral support, E. Serpedin for useful discussions, and G. Giannakis for having drawn his attention to the non-linear blind identification problem.

(2) I3S, Algorithmes-Euclide-B
2000 route des Lucioles, BP 121
F-06903 Sophia-Antipolis cedex, France


The problem is modeled here by the parameterization of the channel and by the statistics of the inputs.

2.1. Volterra kernel model

The model is described by the noisy output of a nonlinear moving-average Volterra model (which can be of any order). Sampling at a rate Ts and restricting to the Linear-Quadratic case, the channel can be modeled as:

y(n) = Σ_{i=0}^{L1} h_1(i) x(n-i) + v(n) + Σ_{i,j=0}^{L2} h_2(i,j) x(n-j) x(n-i)   (1)

where x(n) is the input signal, v(n) denotes the additive noise, and h_n is called the n-th-order Volterra nonlinear operator (here, we only have the linear and the quadratic terms, h_1(i) and h_2(i,j)). Without loss of generality, we consider that h_n is symmetric in its arguments [6, pp. 80-81].

2.2. Usual communication inputs

For the sake of convenience, denote:

ε_{ab} = E[x^a (x*)^b].

In this article, we consider inputs commonly used in digital communications, sharing the high-order property:

ε_{21} = ε_{31} = ε_{32} = ε_{41} = ε_{42} = 0   (2)

Among these inputs, two groups have been identified (see [9] and [3]):

• Distributions that are symmetric about both axes in the complex plane: p(z) = f(Re{z}) · g(Im{z}). Corresponding random variables can be rewritten as z = s + js', where s and s' are real, independent, and symmetrically distributed, and j² = -1. QAM constellations, in digital communications, belong to this class.

• Discrete distributions that are invariant under a rotation by an angle of the form 2π/K (K ∈ N). QPSK, double QPSK, and any n-PSK are included in this class as soon as n ≥ 4.
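The vanishing moments in (2) are easy to check numerically. The sketch below is our own illustration: it draws QPSK symbols and estimates the moments ε_{ab} empirically.

```python
import numpy as np

rng = np.random.default_rng(0)

# QPSK (4-PSK) symbols: z = exp(j(pi/4 + k*pi/2)), k uniform on {0,1,2,3}
k = rng.integers(0, 4, size=100_000)
z = np.exp(1j * (np.pi / 4 + k * np.pi / 2))

def eps(a, b, x):
    """Empirical eps_ab = E[x^a (x*)^b]."""
    return np.mean(x**a * np.conj(x)**b)

# property (2): these five moments vanish, while eps_11 = eps_22 = 1
vanishing = [abs(eps(a, b, z)) for (a, b) in [(2, 1), (3, 1), (3, 2), (4, 1), (4, 2)]]
print(vanishing, abs(eps(1, 1, z)), abs(eps(2, 2, z)))
```

Each moment reduces to E[z^(a-b)] for a unit-modulus constellation, which is zero unless a-b is a multiple of 4; the sample averages are O(1/sqrt(N)).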

First, the basis of the identification process is presented, then the algorithms are derived, and a proof of uniqueness is eventually given.

3.1. Moment-matching relations

Consider the following assumptions:

(AS1) The channel is Linear-Quadratic of finite known memory.

(AS2) The input is stationary, independent identically distributed (i.i.d.), and must comply with the properties (2); σ_x² = ε_{11} and μ_{4x} = ε_{22} are also assumed to be known.

(AS3) The noise is signal-independent white Gaussian.

Let us now define the complex bicorrelation as:

C_{12y}(l, k) = E{y*(n) y(n+l) y(n+k)}   (3)

Under assumptions (AS1)-(AS3), the bicorrelation of the output (3) and the channel model (1) should match, which gives the following relations:

C_{12y}(l, k) = Σ_{i,j} h_1(l+i) h_1(k+j) h̃_2*(i, j)   (4)

with (l, k) ∈ [-L2, L1] × [-L2, L1], and where h̃_2*(i, j) = [2ε_{11}² + δ(i-j)(ε_{22} - 2ε_{11}²)] h_2*(i, j). The Z-transform of C_{12y}(l, k) gives in the (Z1, Z2) domain:

S_{12y}(Z1, Z2) = H_1(Z1) H_1(Z2) H̃_2*(1/Z1, 1/Z2)   (5)

Equations (4) or (5) form the core of the algorithms subsequently proposed. By stacking the elements of C_{12y}(l, k) in a matrix C12Y as:

C12Y = [ C_{12y}(-L2, -L2)  ...  C_{12y}(-L2, L1) ; ... ; C_{12y}(L1, -L2)  ...  C_{12y}(L1, L1) ]

we get the matrix formulation:

C12Y = A^T B A   (6)

where A is the (L2+1) × (L1+L2+1) Upper Triangular Band (UTB) Toeplitz matrix containing h_1 = [h_1(0) ... h_1(L1)] in the first row and zeros elsewhere, while B is symmetric complex and contains the values of the kernel h̃_2:

B = [ h̃_2*(L2, L2)  ...  h̃_2*(L2, 0) ; ... ; h̃_2*(0, L2)  ...  h̃_2*(0, 0) ]

We propose to identify the channel coefficients by using either relation (5), or (6) with the estimate

Ĉ_{12y}(l, k) = (1/N) Σ_{n=1}^{N} y*(n) y(n+l) y(n+k).

One can notice that C12Y is a (L1+L2+1) square matrix of rank (L2+1). This observation allows the lengths of the channels (h_1, h_2) to be detected from an estimate Ĉ12Y of C12Y.

3.2. Proposed algorithms

We propose several algorithms: (i) a Root-Finding method (RF), (ii) a Sub-Space Intersection method (SSI), (iii) a method that forces the row span to have certain triangular properties (UTB), and (iv) an iterative Multidimensional Search method (MS).

(i) One can give several values to Z2 in (5), and get several functions of Z1: F_{Z2}(Z1). These functions F_{Z2}(Z1) share the roots of H_1(Z1), denoted r_i, which are detected by clustering. The channel h_1(n)/h_1(0) is the inverse Z-transform of Π_{i=1}^{L1} (Z1 - r_i), and one can build A. Denoting A⁻ the Moore-Penrose pseudo-inverse of A, h_2 is recovered via the "deconvolution":

B = A^{T-} · C12Y · A⁻.   (7)

(ii) Alternatively, one can factorize the matrix C12Y in order to recover the vector h_1 in a similar fashion as in [8]. In the noiseless case, given that B has no null eigenvalue, the matrix model (6) clearly implies that:

row(A) = row(C12Y),   col(A^T) = col(C12Y)   (8)

Considering the singular value decomposition (SVD) of the symmetric complex matrix C12Y = V^T · S · V, we define V̄ as the L2+1 first rows of V, associated with the L2+1 dominant singular values. Let V̄^(i) be the (L2+1) × (L1+1) submatrix extracted from V̄ that gathers the columns i to L1+i. Then the conditions (8) are restated as: h_1 ∈ row(V̄^(i)), ∀i ∈ [1, ..., L2+1]. Thus h_1 can be obtained by computing the dominant right singular vector of the matrix V̆ containing all V̄^(i) stacked one above the other:

V̆ = [ V̄^(1) ; ... ; V̄^(L2+1) ]

Then the matrix B can be estimated afterwards by the "deconvolution" procedure (7).

(iii) Another technique consists of forcing the UTB structure of A beforehand by combining the rows of matrix V̄; this is possible because of Lemma 1. Then, one extracts the L2+1 dimensional row vectors v^(i) contained in the UTB matrix T V̄, and stacks them in a matrix V̆. The rest of the procedure is identical to the previous approach (ii).

(iv) Lastly, one can perform an iterative search in the (L1 + L2(L2+1)/2) dimensional space of the matrix product of (6) in order to find the parameters θ = (h_1, h_2) that minimize the error in the sense of the Frobenius norm:

θ̂(h_1, h_2) = argmin_θ ‖C12Y - [A^T · B · A](θ)‖²_F
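All four algorithms start from the sample estimate of C12Y. The sketch below is our own illustration (the helper `bicorr_matrix`, the toy channel h_1, h_2 and the QPSK input are hypothetical, not the authors' setup); it forms the matrix directly from the estimator Ĉ_{12y}(l, k).

```python
import numpy as np

def bicorr_matrix(y, L1, L2):
    """Stack the sample bicorrelation C12y(l, k) = (1/N) sum_n y*(n) y(n+l) y(n+k)
    into the (L1+L2+1) x (L1+L2+1) matrix C12Y, with (l, k) in [-L2, L1]."""
    y = np.asarray(y)
    lags = list(range(-L2, L1 + 1))
    n = np.arange(L2, len(y) - L1)           # keep n + l and n + k in range
    C = np.empty((len(lags), len(lags)), dtype=complex)
    for a, l in enumerate(lags):
        for b, k in enumerate(lags):
            C[a, b] = np.mean(np.conj(y[n]) * y[n + l] * y[n + k])
    return C

# hypothetical linear-quadratic channel with L1 = 1, L2 = 1, QPSK input
rng = np.random.default_rng(0)
x = np.exp(1j * (np.pi / 4 + rng.integers(0, 4, 50_000) * np.pi / 2))
h1 = [1.0, 0.5]
h2 = [[0.3, 0.1], [0.1, -0.2]]               # symmetric quadratic kernel
y = np.zeros(len(x), dtype=complex)
for n in range(2, len(x)):
    y[n] = sum(h1[i] * x[n - i] for i in range(2))
    y[n] += sum(h2[i][j] * x[n - i] * x[n - j] for i in range(2) for j in range(2))
C = bicorr_matrix(y, L1=1, L2=1)             # 3 x 3, symmetric by construction
```

Since swapping l and k leaves the averaged product unchanged, the estimate is exactly symmetric, as required by the decomposition C12Y = A^T B A.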

3.3. Uniqueness

Lemma 1: Let N and P be two positive integers. Under certain regularity conditions, any N × (N+P) rectangular matrix M can be put in UTB form by pre-multiplication by a square invertible matrix T. The matrix T is unique up to an invertible diagonal multiplicative matrix.

Proof: The constructive algorithm is very similar to Gaussian elimination. Assume there are two matrices T1 and T2 such that M = T1 U1 and M = T2 U2, where U1 and U2 are UTB. Then, considering the N first columns of both sides shows that the matrix T1 T2^{-1} relates two Lower Triangular (LT) matrices, and is thus LT itself. Similarly, considering the N last columns shows that T1 T2^{-1} is Upper Triangular (UT). Thus, it is diagonal, which eventually shows that T1 and T2 are related by a diagonal multiplicative matrix. □

Lemma 2: Any symmetric complex matrix C can be factorized as C = L L^T, where L is lower triangular. Matrix L is unique up to post-multiplication by a diagonal matrix Λ formed of signs {±1}.
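Lemma 2 can be illustrated constructively with a Cholesky-like recursion that uses the plain transpose instead of the conjugate transpose. This is a sketch of our own, valid under the assumption that all leading principal minors are nonsingular:

```python
import numpy as np

def llt_complex(C):
    """Factor a complex *symmetric* matrix as C = L @ L.T (no conjugation),
    assuming nonsingular leading principal minors."""
    n = C.shape[0]
    L = np.zeros((n, n), dtype=complex)
    for i in range(n):
        for j in range(i + 1):
            # subtract the already-computed part of the inner product
            s = C[i, j] - L[i, :j] @ L[j, :j]
            L[i, j] = np.sqrt(s) if i == j else s / L[j, j]
    return L

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
C = A + A.T                                  # complex symmetric: C = C^T
L = llt_complex(C)                           # lower triangular, C = L L^T
```

The only freedom left is the branch of the complex square root on the diagonal, which is exactly the sign ambiguity Λ of the lemma.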

Proposition 3: If B is square full rank, and A is UTB, then the decomposition of a complex symmetric matrix C = A^T B A is unique up to a multiplicative diagonal matrix.

Proof: The proposition is a direct consequence of Lemmas 1 and 2. It is easily seen that if (A, B) is a solution, then so is (ΛA, Λ^{-1} B Λ^{-1}), where Λ is any diagonal regular matrix. □

Corollary 4: Let B be full rank symmetric complex and A Toeplitz UTB. When the decomposition of a symmetric matrix as C12Y = A^T · B · A exists, then it is unique up to a scalar multiplicative factor.

Proof: From Proposition 3, if A is a solution, then so is ΛA, with Λ diagonal. But because A is Toeplitz, ΛA can be Toeplitz only if Λ is proportional to the identity matrix. □


In order to illustrate the Root-Finding (RF) method step by step, we first present a typical example with only the RF and MS methods. Later we show a more exhaustive study with all the methods. Because we are mainly interested in direct methods, the MS is given only as a reference.

In all simulations, the input x was 4-PSK. We used the real channel given by [9] (h_1 = [1, 0.5, -0.8, 1.6, 0.4] and h_2 = [1, 0.6; 0.6, -0.3]).

Typical example: The input is QPSK; the number of samples is 16284 points; and the SNR is 10 dB. Figure I.a illustrates the clustering method. It shows all the roots calculated for different Z2, together with the true roots and the ones estimated by the method; the estimated roots (stars) are fairly accurate and match the true ones (squares). Figure I.b shows the spectra of the true and estimated linear channels. Both estimated spectra are fairly accurate.

Computer comparisons: A first study showed that the estimation noise of Ĉ_{12y} rapidly becomes predominant over the additive-noise contribution. As expected, the Gaussian noise does not contribute to the third-order moment as soon as the integration length is long enough. So we mainly tried to estimate the influence of the number of samples. For each number of samples we took 1000 independent realizations, and the SNR is 10 dB. For each realization, we estimated Ĉ_{12y}, on which we applied all the algorithms. Since in our case C12Y is 6 × 6, the most computationally intensive step for the direct methods is its estimation. Due to its iterative nature, up to several thousands of samples, the most intensive step for the MS method is the multidimensional search.

Figure II presents the influence of the integration length on the mean and variance of both estimates.

Figures II.a and II.b show that all methods converge to the true channel; the bias behaves well from 4096 points. The RF is the slowest method to converge to the expected value, while the MS is the fastest; the SSI and the UTB follow similar patterns.

Figures II.c and II.d present the variances of both estimates. The variances follow approximately a linear slope. It is difficult to decide which method behaves best. One can notice that the MS has stationary performance after 64000 samples; this is because the method was implemented in a rather rudimentary way, and the MS algorithm occasionally gets stuck in local minima, degrading the standard deviation. While not visible in the figure, the best method varies for each element of h_1, and generally around 4096 samples the best method changes. Nevertheless, above 4096 samples the best method is clearly the SSI.

The variance illustrates the usual problem with High-Order Statistics: in order to have a consistent high-order moment estimate, the integration length must be long enough; a minimum of 8192 samples seems to be required here.


Several methods have been proposed to blindly identify a linear-quadratic channel for communication applications. The idea is to use the specificities of the distribution of the inputs. The methods have been shown to converge with good accuracy, given a rather large number of samples.

[1] S. BENEDETTO, E. BIGLIERI, V. CASTELLANI, Digital Transmission Theory, Prentice-Hall Inc., New Jersey, 1987.

[2] P. BONDON, M. KROB, “Blind identifiability of quadratic stochastic system”, IEEE Trans. on Information Theory, vol. 41, no. 1, pp. 245-254, Jan. 1995.
[3]  N.  PETROCHILOS,  “Elements  for  blind  identifi¬ 
cation  of  non-linear  channels”,  supervision  by  G. 
Giannakis,  Master’s  thesis,  ENSEA  /  ENS  Lyon, 
Sept.  1996,  In  archive  of  ENSEA,  France. 

[4] E. J. POWERS, S. IM et al., “Applications of HOS to nonlinear hydrodynamics”, in IEEE-ATHOS Workshop on Higher-Order Statistics, Begur, Spain, 12-14 June 1995, pp. 414-418.

[5]  W.  J.  RUGH,  Nonlinear  System  Theory ,  Johns 
Hopkins  Univ.  Press,  Baltimore,  MD,  1981. 

[6]  M.  SCHETZEN,  The  Volterra  and  Wiener  Theo¬ 
ries  of  Nonlinear  Systems,  Wiley,  New  York,  1980. 

[7]  H-Z.  TAN,  Z-Y.  MAO,  “Blind  identifiability  of 
quadratic  non-linear  systems  in  higher-order  statis¬ 
tics  domain” ,  Int.  Jour.  Adapt.  Control  Signal  Pro¬ 
cessing,  vol.  12,  pp.  567-577,  1998. 

[8]  A.  J.  van  der  VEEN,  S.  TALWAR,  A.  PAULRAJ, 
“A  subspace  approach  to  blind  space-time  signal 
processing  for  wireless  communication  systems”, 
IEEE  trans.  on  Signal  Processing,  vol.  45,  no.  1, 
pp.  173-190,  Jan.  1997. 

[9]  G.T.  ZHOU,  G.B.  GIANNAKIS,  “Nonlinear  chan¬ 
nel  identification  and  performance  analysis  with 
PSK  inputs” ,  in  Proc.  of  1st  IEEE  Signal  Process¬ 
ing  Workshop  on  Wireless  Communications,  Paris, 
France,  16-18  April  1997,  pp.  337-340. 

Figure I: Example of Identification: 10 dB, 16284 samples.

[Figure II panels: evolution of the mean and variance of H1 and H2 as functions of N.]

Figure II: Means and standard deviations for all methods with 1000 independent realizations. Simple line: RF, UTB, d-: SSI, o-: MS.


Cristoff  Martin  and  Bjorn  Ottersten 

The  Department  of  Signals,  Sensors  &  Systems 
Royal  Institute  of  Technology  (KTH) 
SE-100  44  Stockholm,  Sweden 


Interference  from  other  users  and  interference  due  to  mul¬ 
tipath  propagation  limit  the  capacity  of  wireless  communi¬ 
cation  networks.  As  the  number  of  users  and  the  demand 
for  new  services  in  the  networks  increases,  co-channel  inter¬ 
ference  will  be  a  limiting  factor. 

This  paper  proposes  an  iterative  structured  multi¬ 
channel  receiver  algorithm  that  jointly  estimates  the  com¬ 
munication  channels  and  desired  data  while  canceling  inter¬ 
ference.  A  general  way  of  adding  training  redundancy  to  a 
data  frame  is  also  introduced. 

From simulations, the proposed method is shown to achieve low bit error rates even in the presence of strong interference. These simulations also show that by distributing the training information within a data burst elaborately, further improvements in performance are achievable.


During the last decades, mobile communications have developed rapidly. The seemingly ever increasing number of users and services has caused an equally increasing demand for capacity and reliability. Because of the physical limitations of radio communications and the limited bandwidth available, these demands are difficult to meet.

One  of  the  factors  that  limits  capacity  is  the  interfer¬ 
ence  from  other  users,  Co-Channel  Interference  or  CCI.  The 
problem  is  further  complicated  by  the  fact  that  in  realistic 
wireless  communication  systems  there  will  always  be  some 
amount  of  multi-path  propagation  causing  Inter-Symbol  In¬ 
terference  or  ISI.  Thus,  by  developing  receivers  that  can 
handle  these  kinds  of  interference,  the  capacity  and  relia¬ 
bility  in  the  wireless  network  can  be  increased.  One  way  of 
combating  interference  is  through  the  use  of  antenna  arrays, 
thus  creating  a  multi-channel  system.  The  receiver  systems 
considered  in  this  paper  are  all  multi-channel. 

This paper considers an iterative algorithm that rejects interference while simultaneously estimating the transmitted data and the baseband transmission channels. The proposed receiver is semi-blind, i.e., it uses training information available for the desired user.

Several  other  approaches  have  been  taken  to  reject  in¬ 
terference.  Iterative  Least  Squares  with  Projection  (ILSP) 
is  introduced  in  [1,  2].  ILSP  is  a  blind  method  to  separate 
several  co-channel  signals  using  the  Finite  Alphabet  (FA) 

property of digital communication signals. However, it does not handle ISI, nor does it handle training information in a natural fashion. The method presented herein is similar to ILSP but takes ISI and training information into account as well. In [3] an interference rejection algorithm is presented that, by using ILSP, oversampling and an extra processing step, is able to also handle ISI. Another method similar to ILSP is proposed in [4]; this method also handles training sequences and ISI. However, it does not handle the structure imposed by the ISI. Another class of interference rejection algorithms is subspace methods. These use algebraic subspace properties to reject interference based on second order statistics. An example of such a method, used for comparison in this paper, can be found in [5].


An L-element antenna array with symbol-spaced baseband sampling is considered. For simplicity, only one desired user and one interfering user are considered (even though the data model and proposed receiver algorithm can easily be extended to multiple users and interferers). The interferer is assumed to use the same modulation scheme as, and to be burst synchronized with, the desired user. Within a burst, the user and the interferer send one data frame consisting of N symbols, of which Nd symbols are unknown data and the rest are used for training purposes. The radio channels between the transmitters and the receiving antennas are assumed to be time invariant within one data frame. It is also assumed that the transmission process between the transmitter and the receiver, including the effects of the transmitter and receiver filters, can be modeled as an FIR filter of length M. It is then possible to model the received data as

X  =  HS  +  GD  +  V.  (1) 

where X (which is L x (N + M - 1)) contains the data received by the antenna array. The channel matrices H and G (both L x M) describe the transmission processes of the desired user and the interferer, respectively. The transmitted data is contained in S and D (both M x (N + M - 1)), while V models additive noise. The received data matrix X is organized as X = [x(1) x(2) ... x(N + M - 1)], where x(n) is a column vector containing the data output from the array at the nth sampling instant. To exemplify the organization of the data matrices, the data matrix

0-7803-5988-7/00/$  10.00  ©  2000  IEEE 


of the desired user is

    [ s(1) s(2) ... s(N)                      ]
S = [       s(1) s(2) ... s(N)                ]     (2)
    [              ...                        ]
    [              s(1) s(2) ... s(N)         ]

i.e., an M x (N + M - 1) banded Toeplitz matrix in which each of the M rows contains the symbol vector shifted one position to the right relative to the row above.
where s is a vector containing the data symbols transmitted in one frame. From (2), the structure of the data matrices becomes obvious. In order to achieve good performance, a receiver algorithm must preserve this structure.


The problem of estimating the unknown data vectors and channel matrices is considered. It is assumed that training information is available for the desired user, while it is unknown for the interferer. The transmission of the data is disturbed by spatially and temporally white additive complex Gaussian noise.

The goal is to find the maximum likelihood estimates of H, S, G and D. That is, the H, S, G and D that minimize

||X - HS - GD||_F^2     (3)

taking  the  finite  alphabet  property  of  the  signals  into  ac¬ 
count.  Note  that  given  the  data  symbols,  the  criterion 
is  quadratic  in  the  channel  matrices.  After  rewriting  this 
norm  as 

||X - HS - GD||_F^2 = || X - [H G] [S; D] ||_F^2,     (4)

it can be minimized with respect to [H G],

[H G] = X [S; D]^+     (5)

where [S; D] denotes S stacked on top of D.


The  algorithm  proposed  in  this  paper  takes  an  iterative  ap¬ 
proach  to  minimize  (3)  while  maintaining  the  structure  of 
the  data  matrices  (see  (2)).  Known  training  information  is 
also  taken  into  account.  The  iterative  procedure  of  the  pro¬ 
posed  algorithm  is  similar  to  the  ILSP  algorithm  proposed 
in [1, 2].

Assuming that initial estimates of the data sequences are available, the method can be outlined as follows:

1.  Assume  that  the  estimated  data  sequences  are  cor¬ 
rect.  The  norm  (3)  is  now  quadratic  in  H  and  G 
and  it  is  easy  to  estimate  the  channel  matrices. 

2.  Rewrite  the  norm  (3)  so  that  it  can  be  minimized  in 
a  way  that  maintains  the  structure  of  S  and  D  and 
takes  available  training  information  into  account. 

3.  Now,  assume  that  the  estimated  channel  matrices  are 
correct.  The  norm  (3)  becomes  quadratic  in  S  and 
D  if  we  relax  the  FA-property.  Thus,  it  is  possible 
to  estimate  the  unknown  data  symbols  by  solving  a 
linear  set  of  equations. 

4.  Project  the  data  on  its  finite  alphabet. 

5.  Repeat  the  steps  above  until  convergence. 
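A minimal numerical sketch of this alternation (steps 1, 3 and 4) is given below. It is a simplified, hypothetical version: the Toeplitz structure of S and D and the training mapping are ignored, an antipodal +/-1 alphabet is assumed, and all quantities are real.

```python
import numpy as np

def alternating_receiver(X, S0, D0, n_iter=10):
    """Simplified ILSP-style iteration: alternate a least-squares channel
    estimate (step 1) with a least-squares data estimate projected onto
    the +/-1 alphabet (steps 3-4)."""
    S, D = S0.copy(), D0.copy()
    m = S.shape[0]
    for _ in range(n_iter):
        A = np.vstack([S, D])              # stacked data matrices [S; D]
        HG = X @ np.linalg.pinv(A)         # step 1: [H G] = X [S; D]^+, cf. (5)
        A = np.linalg.pinv(HG) @ X         # step 3: continuous data estimate
        A = np.where(A >= 0, 1.0, -1.0)    # step 4: project onto {-1, +1}
        S, D = A[:m], A[m:]
    H, G = HG[:, :m], HG[:, m:]
    return H, G, S, D

# Toy usage: noiseless data, initialized at the true symbols (a fixed point)
rng = np.random.default_rng(0)
L, M, T = 6, 2, 40
S_true = rng.choice([-1.0, 1.0], size=(M, T))
D_true = rng.choice([-1.0, 1.0], size=(M, T))
H_true = rng.standard_normal((L, M))
G_true = rng.standard_normal((L, M))
X = H_true @ S_true + G_true @ D_true
H, G, S, D = alternating_receiver(X, S_true, D_true, n_iter=3)
```

With a perfect initialization and no noise the iteration is a fixed point, so the channel and data factorization is recovered exactly.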

If the initial data estimates are good enough, the method will in general converge to the desired global minimum of (3), and the initial data estimates are improved.

The  proposed  method  also  makes  it  possible  to  gen¬ 
eralize  how  the  training  information  is  added  to  the  data 
sequence.  This  is  considered  in  the  following  section.  A 
more  detailed  description  of  the  algorithm  can  be  found  in 
section  6. 


When  a  training  sequence  is  added  to  a  data  frame  it  is  usu¬ 
ally  either  simply  inserted  in  the  beginning  or  at  the  middle 
of  the  data  frame.  Here  a  more  general  way  of  adding  the 
training  data  is  introduced  by  the  affine  mapping 

s = C1 sd + C0.     (7)

where A^+ denotes the pseudo-inverse of A. After resubstituting H and G into (4), a minimization criterion depending only on S and D is obtained,

min over S and D of || X P^⊥_[S; D] ||_F^2     (6)

where P^⊥_A = I - A^+ (A A^+)^(-1) A and I is the identity matrix. It is now possible to find the global minimum by enumerating all possible S and D, using their FA-property and known training information while maintaining the structure of the matrices. This enumeration, however, is of exponential complexity, which makes it infeasible even for modest data frame sizes. The following sections consider a suboptimal method that attempts to minimize (3) with less computational complexity.
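The concentration step can be checked numerically: substituting the least-squares channel estimate back into the criterion gives the same residual as projecting X onto the orthogonal complement of the stacked data matrix. A small sketch with unstructured random data (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
L, M, T = 5, 2, 30
A = rng.choice([-1.0, 1.0], size=(2 * M, T))   # stacked data matrices [S; D]
X = rng.standard_normal((L, T))

# residual after the optimal [H G] = X A^+ is substituted back into (4)
direct = np.linalg.norm(X - (X @ np.linalg.pinv(A)) @ A)
# concentrated criterion ||X P_A^perp||_F with P_A^perp = I - A^+ A
P_perp = np.eye(T) - np.linalg.pinv(A) @ A
concentrated = np.linalg.norm(X @ P_perp)
```

The two quantities agree to machine precision, which is exactly the reduction from (4) to (6).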

where s (N x 1) contains the data to be transmitted (data and training information) and sd (Nd x 1) contains the data without training information. C1 (N x Nd) and C0 (N x 1) are code matrices that add training information (and possibly error correcting redundancy) to the data.

It  is  obvious  that  the  code  matrices  can  be  chosen  so 
that  training  information  is  added  to  the  data  sequence  in 
the  conventional  way  described  above.  However  this  also 
provides  the  opportunity  of  adding  training  information 
more  elaborately.  For  example  the  training  information  can 
be  distributed  over  the  entire  data  sequence. 
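As an illustration, both placements compared later (a training preamble versus training symbols spread through the frame) can be expressed with such code matrices. The frame length, symbol values and positions below are made up for the example:

```python
import numpy as np

N, Nd = 8, 5                              # frame length and data length
t = np.array([1.0, -1.0, 1.0])            # example training symbols

def code_matrices(train_pos):
    """Build C1 (N x Nd) and C0 (N-vector) for given training positions."""
    data_pos = [i for i in range(N) if i not in train_pos]
    C1 = np.zeros((N, Nd))
    C0 = np.zeros(N)
    for k, i in enumerate(train_pos):
        C0[i] = t[k]                      # training symbol goes into C0
    for k, i in enumerate(data_pos):
        C1[i, k] = 1.0                    # data symbol is copied through
    return C1, C0

s_d = np.array([1.0, 1.0, -1.0, 1.0, -1.0])
C1, C0 = code_matrices([0, 1, 2])         # conventional: training preamble
s = C1 @ s_d + C0                         # the affine mapping (7)
C1s, C0s = code_matrices([0, 3, 6])       # distributed training
s_spread = C1s @ s_d + C0s
```

Both frames carry the same three training symbols and five data symbols; only their placement differs.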


The  steps  of  the  proposed  algorithm  outlined  in  section  4 
are  presented  in  more  detail  in  this  section.  It  is  assumed 
that  an  initial  estimate  of  the  unknown  user  data  and  the 


interferer data is present. Further, it is assumed that the code matrices C0 and C1 are known for the desired user while they are not available for the interferer.

6.1.  Estimating  the  Channel  Matrices 

If  we  assume  the  estimated  data  sequences  to  be  correct  a 
least  squares  estimate  of  the  channel  matrices  can  be  found 
as  (see  (5)) 

[H G] = X [S; D]^+

6.2. Maintaining the Structure of the Data Matrices

To  maintain  the  structure  of  the  data  matrices  while  es¬ 
timating  them  the  norm  (3)  must  be  rewritten.  This  can 
be  achieved  using  properties  of  the  vec  operator  and  the 
Kronecker  product.  Letting  vec  denote  the  vec  operator, 
8  denote  the  Kronecker  product  and  I  denote  the  identity 
matrix,  this  rewriting  can  be  done  in  a  few  steps  as  follows, 

vec {X - HS - GD} = vec X - (I ⊗ H) vec S - (I ⊗ G) vec D.     (8)

To simplify notation, let Φ_H = I ⊗ H, Φ_G = I ⊗ G, and x = vec X. Also, the (NM x N) selection matrix Ψ is defined. The matrix Ψ consists of zeros and ones and takes a data vector to a vectorized data matrix, i.e., vec S = Ψs and vec D = Ψd. Now, (8) can be rewritten as

vec {X - HS - GD} = x - Φ_H Ψ s - Φ_G Ψ d
                  = x - Φ_H Ψ (C1 sd + C0) - Φ_G Ψ d
                  = x - Φ_H Ψ C1 sd - Φ_H Ψ C0 - Φ_G Ψ d     (9)

where  the  middle  step  follows  from  (7).  By  using  (9)  the 
norm  (3)  can  now  be  minimized  with  respect  to  the  data 
while  maintaining  the  structure  of  the  data  matrices  S,  D. 
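The vec/Kronecker identity underlying this rewriting is easy to verify numerically (column-major vectorization assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
L, M, T = 4, 2, 6
H = rng.standard_normal((L, M))
S = rng.standard_normal((M, T))

vec = lambda A: A.reshape(-1, order="F")       # stack the columns
lhs = vec(H @ S)                               # vec{HS}
rhs = np.kron(np.eye(T), H) @ vec(S)           # (I kron H) vec S
```

This is the special case vec(AXB) = (B^T ⊗ A) vec X with B = I, which is what turns the matrix criterion into a linear problem in the stacked data vector.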

6.3.  Estimating  the  Received  Data 

By using (9) and assuming the estimated channels to be correct, we now obtain continuous estimates of the unknown data vectors sd and d. This can be done in much the same way as the estimation of the channel matrices, which results in

[sd; d] = [Φ_H Ψ C1   Φ_G Ψ]^+ (x - Φ_H Ψ C0).


The  unknown  data  can  be  estimated  by  projecting  the  con¬ 
tinuous  data  estimates  to  the  finite  alphabet  in  use. 
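The projection step is a per-symbol nearest-neighbour decision. A small helper (the alphabet is passed explicitly; this works for the real antipodal alphabet used in the simulations, and for complex alphabets such as QPSK as well, since the distance is a complex modulus):

```python
import numpy as np

def project_fa(x, alphabet=(-1.0, 1.0)):
    """Map each continuous estimate to the closest alphabet symbol."""
    a = np.asarray(alphabet)
    d = np.abs(np.asarray(x)[..., None] - a)   # distance to each symbol
    return a[np.argmin(d, axis=-1)]
```

For the antipodal alphabet this reduces to a sign decision.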

Finally  the  three  steps  above  are  iterated  until  conver¬ 
gence  is  reached.  If  the  initial  estimates  are  good  enough 
they  are  in  general  improved. 


To give some insight into the kind of performance the proposed algorithm might offer, simulations have been conducted and their results are presented in this section. In order to offer some comparison with previous work, the structured subspace receiver described in [5] was simulated under the same conditions, and results from these simulations are provided.

Two different sets of code matrices were used (see section 5): one conventional, with all the training symbols at the beginning of the sequence, and one with the training symbols spread over the entire sequence. In the simulations of the structured subspace receiver the entire training sequence was located at the beginning of the data frame.

In  all  cases  an  L  =  4  antenna  system  was  considered. 
An  antipodal  binary  modulation  scheme  was  employed  (this 
would  for  example  correspond  to  BPSK). 

To  model  the  transmission  process  (the  transmit¬ 
ter/receiver  filters  and  the  radio  channel)  a  two  tap  FIR 
channel  model  was  used.  The  channels  were  assumed  inde¬ 
pendent  from  antenna  to  antenna  and  to  simulate  Rayleigh 
fading  the  channel  taps  were  independently  drawn  from  a 
complex  Gaussian  distribution. 
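The channel draw described above can be sketched as follows (seed and normalization are arbitrary assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(3)
L, M = 4, 2                                   # antennas, channel taps
# i.i.d. complex Gaussian taps => Rayleigh-distributed magnitudes,
# normalized to unit average tap power
H = (rng.standard_normal((L, M)) + 1j * rng.standard_normal((L, M))) / np.sqrt(2)
```

Each of the L x M entries is an independent fading coefficient, one two-tap FIR channel per antenna.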

In  the  simulations  it  was  assumed  that  the  length  of  the 
channel  impulse  responses,  M,  and  the  number  of  transmit¬ 
ters,  U,  are  known  or  have  been  correctly  estimated. 

To  offer  some  idea  about  what  the  achievable  perfor¬ 
mance  would  be,  a  simple  initialization  scheme  was  em¬ 
ployed.  Interferer  data  was  initialized  with  its  continuous 
solution  (of  the  minimization  of  the  norm  (4),  ignoring  the 
structure  of  the  data  matrices,  see  e.g  [2])  projected  to  the 
finite alphabet in use. The desired user data was initialized with random data symbols. Received sequences where the resulting norm (3) was smaller than the true norm (the norm (3) achieved using the true data and channel matrices) plus one standard deviation of the norm were kept, while received sequences not fulfilling this criterion were identified as rejected.

In  figure  1  the  bit  error  rate  performance  of  the  proposed 
method  as  a  function  of  the  Signal  to  Noise  Ratio  (SNR) 
is  shown.  The  desired  user  is  disturbed  by  a  single  inter¬ 
ferer.  The  Signal  to  Interference  Ratio  (SIR)  in  these  sim¬ 
ulations  was  —10  dB.  The  results  from  the  simulated  pro¬ 
posed  method  are  compared  with  the  structured  subspace 
method  with  estimated  channels  and  with  known  channels. 
Also,  the  two  different  sets  of  training  matrices  (described 
above)  are  compared.  The  data  frames  consist  of  57  sym¬ 
bols  of  which  42  are  data  symbols  and  the  rest  are  used 
for training purposes. Under these conditions the proposed method performs on par with the structured subspace method using perfect channel estimates. The structured subspace method by itself needs longer training sequences in order to perform well (see figure 4). The distributed training information offers slightly better performance than the conventional training sequence. Even though the difference in performance is small, this is interesting, as both these data distributions use the same number of training and data bits; only how they are distributed differs.

To  explore  the  loss  in  performance  due  to  the  interfer¬ 
ence,  the  proposed  method  was  simulated  with  and  with- 



Figure  1:  Performance  with  a  single  -10  dB  interferer 

out  an  interferer.  Other  than  that  the  simulated  condi¬ 
tions  were  identical  to  the  previous  simulation.  The  re¬ 
sults  from  these  simulations  are  shown  in  figure  2.  As  can 
be  seen  from  the  graph,  at  an  SNR  of  4  dB  the  loss  is 
approximately  1.5  dB,  both  with  the  conventional  train¬ 
ing  sequence  and  with  the  distributed  training  information. 
Again  slightly  lower  bit  error  rates  were  achieved  when  the 
distributed  training  information  was  used  compared  to  the 
more  conventional  training  data  distribution. 

The  number  of  data  frames  not  converging  to  a  norm 
small  enough,  the  rejection  rate,  was  also  measured  under 
the  same  conditions  as  in  the  previous  simulations.  Figure  3 
shows  the  results  from  these  measurements.  As  can  be  seen 
from  the  graph,  when  there  is  CCI  present  the  rejection 
rate  becomes  quite  high  and  it  would  be  desirable  to  use  a 
better  initialization  method. 

The effect of the training sequence length was also given some attention. Once again the proposed algorithm with the two different training distributions and the structured subspace method (found in [5]) were compared. Figure 4 shows the bit error rate of the desired data
sequence  as  a  function  of  the  number  of  training  symbols 
and  figure  5  shows  the  rejection  rate  as  a  function  of  the 
number  of  training  symbols.  These  simulations  were  per¬ 
formed  at  an  SNR  of  4  dB,  with  and  without  a  single  -10 
dB  co-channel  interferer.  The  number  of  data  symbols  in 
each  frame  remained  42.  From  figure  4  it  can  also  be  seen 
that  the  proposed  method  is  less  sensitive  to  short  training 
sequences than the method used for comparison. Figure 5 shows that the number of rejected sequences increases quickly when the number of training symbols drops below 15. It seems likely that the convergence criterion might affect the simulated bit error rates when the number of training symbols becomes smaller than that.

As can be seen from the results above, the proposed method shows promising performance. However, there are still several issues that require further investigation. For example, in its current implementation the proposed receiver algorithm is computationally expensive. Also, the robustness to model errors and initialization are other issues that deserve more attention. More general forms of training information, where the data is confined to more general affine mappings, can easily be considered within the proposed framework.

Figure 2: Performance lost due to interference.

Figure 3: Rejection rates as functions of the SNR.

Figure 4: Error rates at an SNR of 4 dB.

Figure 5: Rejection rates at an SNR of 4 dB.

Herein, we have presented an interference cancellation method that can be applied to multi-channel data. Training information from the desired user is exploited, and the communication channels are jointly estimated together with the unknown data symbols of both the desired user and the interferer. This method can easily treat general forms of training information, and a simple example with distributed training information was shown to give improved performance compared to a block of training data.

[1] S. Talwar, M. Viberg, and A. Paulraj, “Blind estimation of multiple co-channel digital signals using an antenna array,” IEEE Signal Processing Letters, vol. 1, February 1994.

[2] S. Talwar, M. Viberg, and A. Paulraj, “Blind separation of synchronous co-channel digital signals using an antenna array - part I: Algorithms,” IEEE Transactions on Signal Processing, vol. 44, pp. 1184-1197, May 1996.

[3] A.-J. van der Veen, S. Talwar, and A. Paulraj, “Blind identification of FIR channels carrying multiple finite alphabet signals,” in Proc. of ICASSP, vol. 2, pp. 1213-1216, 1995.

[4] J. Laurila, R. Tschofen, and E. Bonek, “Semi-blind space-time estimation of co-channel signals using least squares projections,” in Proceedings of the Vehicular Technology Conference, 1999. VTC 1999 - Fall, vol. 3, pp. 1310-1315, Sept. 1999.

[5] G. Klang and B. Ottersten, “Channel estimation and interference rejection for multichannel systems,” in Proceedings of the 32nd Asilomar Conference on Signals, Systems and Computers, (Pacific Grove, CA, USA), Nov. 1998.




Wen-Jye  Huang  and  John  F.  Doherty 

Department  of  Electrical  Engineering 
The  Pennsylvania  State  University 
University  Park,  PA  16802 
E-mail:  {wxhl48,jfdoherty}  @psu.  edu 


In this paper we propose a new approach that clusters mobile users before downlink beamforming and broadens beams and nulls within the beamforming calculation. We first investigate the broadened beamforming scheme to alleviate inaccuracies in DOA estimation. Next we examine how to group the mobile users, under a separation-angle constraint, to enhance downlink beamforming. Simulations show that the downlink beamforming complexity is decreased dramatically with limited performance loss.


Owing to the rapidly growing demand for mobile communication, current mobile communication capacity faces a severe challenge during peak usage. To remedy the capacity limitation, research on space-division multiple-access (SDMA), which increases system capacity and decreases co-channel interference, has been conducted.

A basic idea of SDMA is to spatially separate the mobile users, which allows reuse of limited radio resources, such as frequency, time, or code slots within a cell. SDMA relies on the application of an adaptive array antenna at the base station to form multiple beam patterns, which serve multiple user traffic channels. Therefore, the capacity of the system can be increased.
Prior  research  shows  that  implementing  SDMA  on 
the  downlink  increases  the  channel  capacity  [1],  [2], 
[3].  One  simple  SDMA  approach  uses  the  DOA  esti¬ 
mated  from  uplink  data  and  forms  the  spatial  signature 
for  downlink  transmission.  However,  in  urban  environ¬ 
ments,  angular  spreads  (AS)  could  be  up  to  15°  [4], 
which  means  the  estimated  downlink  beamforming  pat¬ 
tern  may  degrade  system  performance  due  to  narrow, 
misaligned  nulls.  In  addition,  if  the  user  DOAs  are 

not  well  separated,  SDMA  cannot  provide  much  system 
performance  improvement.  Furthermore,  the  downlink 
beamforming  algorithm  needs  extensive  computation 
power  to  solve  a  nonlinear  optimization  problem  in¬ 
volving  a  nonlinear  constraint  weight  vector  for  every 
user  [5].  This  limits  the  applicability  of  this  approach 
for  low  complexity,  real-time  operation. 

This  paper  proposes  a  new  approach  that  clusters 
(groups)  mobile  users  before  the  downlink  beamform¬ 
ing  calculation.  This  approach  alleviates  the  computa¬ 
tional  complexity  problem  and  the  spatial  separability 
problem.  The  algorithmic  block  diagram  is  shown  in 
Figure 1. By carefully choosing the AS and applying the same beamforming weight vector wgroup to each group, the simulation results show that the clustering scheme is within 3 dB of the conventional method, with a dramatic decrease in computational complexity.










Figure 1: New Cluster Algorithm for Downlink Beamforming.


We assume that K users are served within the same cell by the base station with a uniform linear antenna array



(ULA) consisting of M identical, omnidirectional sensors, equally spaced at distance d. A narrowband signal model is assumed, and the baseband signal received at time t with Lk paths for the kth user is:

x(t) = Σ_{k=1}^{K} Σ_{l=1}^{Lk} A_kl a(θ_kl, fu) s_k(t - τ_kl) + n(t)     (1)

where n(t) is spatially and temporally white Gaussian noise and the array steering vector a(θ, fu) is given by

a(θ, fu) = [1, e^{-j2π d (fu/c) sin θ}, ..., e^{-j2π d (fu/c)(M-1) sin θ}]^T     (2)


where A_kl is the amplitude of the lth path of the kth user, s_k(t) is the baseband signal transmitted by the kth mobile, and τ_kl is its corresponding delay.
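For reference, the ULA steering vector of (2) can be evaluated directly. Here the element spacing is given in wavelengths (d/λ = 0.5 for the half-wavelength spacing used later), which absorbs the fu/c factor of the paper's notation:

```python
import cmath
import math

def steering_vector(theta, M, d_wavelengths=0.5):
    """a(theta): element m carries phase -2*pi*(d/lambda)*m*sin(theta)."""
    return [cmath.exp(-2j * math.pi * d_wavelengths * m * math.sin(theta))
            for m in range(M)]

a = steering_vector(math.radians(30.0), 4)
```

Every entry has unit magnitude; only the inter-element phase progression encodes the DOA.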

From the received uplink signal it is possible to estimate the spatial covariance matrix, which contains the directional information of the mobile radio channel (dominant DOAs θm) and the corresponding power for each user. It can be written as follows:


Rk  =  5^  Ah  a{8ki  ,fd)  aH  (8ki ,  fd)  (3) 


Similarly, we define the interference covariance matrix Qk as

Qk = Σ_{i≠k} Ri + σ_N^2 I     (4)

where σ_N^2 and I denote the white noise variance and the M x M identity matrix, respectively.
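Equations (3) and (4) can be sketched as follows (DOAs, amplitudes and noise power are arbitrary example values; spacing in wavelengths):

```python
import numpy as np

def steer(theta, M, d=0.5):
    return np.exp(-2j * np.pi * d * np.arange(M) * np.sin(theta))

def spatial_cov(thetas, amps, M):
    """R_k = sum_l A_kl^2 a(theta_kl) a^H(theta_kl), cf. (3)."""
    R = np.zeros((M, M), dtype=complex)
    for theta, A in zip(thetas, amps):
        a = steer(theta, M)[:, None]
        R += (A ** 2) * (a @ a.conj().T)
    return R

M = 4
R1 = spatial_cov([0.2], [1.0], M)             # desired user, one path
R2 = spatial_cov([-0.5, 0.7], [0.8, 0.5], M)  # interfering user, two paths
Q1 = R2 + 0.1 * np.eye(M)                     # cf. (4): other users' R_i + noise
```

Each covariance is Hermitian, and its trace equals the total received power of the corresponding paths.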

The goal of downlink beamforming is to design weight vectors w_kd(fd) that deliver the constrained power to the desired user and minimize the energy transmitted towards the undesired users. In other words, we want to maximize the SINR (Signal to Noise plus Interference Ratio) for the kth user [6].


w_kd = arg max_{w_kd} (w_kd^H Rk w_kd) / (w_kd^H Qk w_kd)     (5)

The solution of (5) is proportional to the generalized eigenvector associated with the largest generalized eigenvalue λmax of the matrix pair [Rk, Qk] [3],

Rk e = λmax Qk e,   w_kd ∝ e,     (6)

with w_kd scaled such that w_kd^H Rk w_kd = δk.

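The generalized eigenvalue problem behind (5) can be solved, for example, through Q_k^{-1} R_k. A sketch with a toy rank-one R_k, for which the optimum is known in closed form:

```python
import numpy as np

def max_sinr_weight(R, Q):
    """Dominant generalized eigenvector of the pair (R, Q)."""
    vals, vecs = np.linalg.eig(np.linalg.solve(Q, R))
    return vecs[:, np.argmax(vals.real)]

# Toy check: R = a a^H, Q = I  =>  the optimal w is a, and SINR = ||a||^2 = M
a = np.exp(-2j * np.pi * 0.5 * np.arange(4) * np.sin(0.3))
R = np.outer(a, a.conj())
Q = np.eye(4, dtype=complex)
w = max_sinr_weight(R, Q)
sinr = (w.conj() @ R @ w).real / (w.conj() @ Q @ w).real
```

For a single-path desired user with white interference, the SINR achieved equals the number of antennas, the usual array gain.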


The existence of angular spread (AS) causes DOA estimation error, which adversely affects the downlink beamforming process. The SINR degrades because the maximum transmitted power is not directed at the desired user, or because the nulls pointed towards the cochannel users are too narrow. The method presented in this section makes the SINR more robust to DOA estimation error. The angular-spread-based approach [7], [8] can steer a broad range of beam patterns towards users of interest, or nulls towards the cochannel users. A modified version of the interference covariance matrix can be written as:

Rk = Rk ⊙ Smax     (7)

Qk = Qk ⊙ Smax     (8)

with [Smax]_pq = e^{-2[π(d/λ)(p-q)σmax]^2}

where ⊙ and [.]_pq denote the Schur-Hadamard element-by-element matrix product and the pqth element of a matrix, respectively. The variable σ_max^2 quantifies the angular spread (AS) of the corresponding DOAs.
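A sketch of the taper construction; the exact Gaussian shape used for S_max here is an assumption reconstructed from the text (its diagonal is one, and off-diagonal entries shrink with |p - q| and with the assumed spread σ_max):

```python
import numpy as np

def broadening_taper(M, sigma_max, d=0.5):
    """Assumed taper: [S]_pq = exp(-2 * (pi * d * (p - q) * sigma)^2)."""
    idx = np.arange(M)
    diff = idx[:, None] - idx[None, :]
    return np.exp(-2.0 * (np.pi * d * diff * sigma_max) ** 2)

M = 4
S_max = broadening_taper(M, sigma_max=np.radians(5.0))
Q = np.eye(M, dtype=complex)          # toy interference covariance
Q_broad = Q * S_max                   # Schur-Hadamard product, cf. (8)
```

Damping the off-diagonal covariance entries widens the synthesized beams and nulls, which is the desired robustness effect.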

By using the target and null broadening technique in the downlink beamforming, the designed beamformers are more robust in the mobile communication environment. In addition, the beamforming weights remain valid for a longer time, with fewer calculations required [8]. Figure 2 shows the beam pattern with and without the broadening technique. It is clear that by applying the broadening technique, the narrow-null interference problem is solved. Although it introduces some increase in the SINR perturbation, the worst-case effect of DOA estimation error is still negligible [6].


Two  conditions  limit  the  performance  and  capacity  of 
SDMA  systems: 

1. Users that share the same channel allocation are co-located within the resolution of the beam pattern;

2. Co-channel, co-located users have disparate powers, causing the so-called “near-far problem.”

A proposed solution to the near-far problem is grouping the mobile users within power classes before downlink beamforming [9].

Utilizing the advantage of the target and null broadening method, and the existence of angular spread


Figure  2:  Conventional  Beamforming  vs.  Beamform¬ 
ing  with  Broadening  Target  and  Null  Technique  with 
Target  at  90°  and  Null  at  40° 

(AS), we propose a grouping algorithm that is constrained by angle separation and location in a cell. By grouping all the users in a cell before downlink beamforming and computing the downlink beamforming weight selectively within each group, the computational complexity at the base station is decreased dramatically with a tolerable performance loss.

The basic approach of the grouping and downlink beamforming calculation algorithm within a cell is the following:

1. Determine the angle separation Δθ for each group, typically using the angular spreading (AS) as a parameter;

2. Assign users to the same group if Δθ < AS;

3. Determine the representative angle for each group, typically choosing the highest-energy interference source within the group as the representative;

4. Calculate the downlink beamforming weight w_new for each group;

5. Apply the weight w_new to each user in the same group.
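The grouping steps above can be sketched as follows. This is a minimal illustration: a greedy one-pass assignment by sorted DOA, with the group's first user standing in for the highest-energy representative of step 3 (both simplifications are ours, not the paper's).

```python
def group_users(doas_deg, angular_spread_deg):
    """Greedy sketch of the grouping step: users sorted by DOA join the
    current group while their separation from the group's first member
    stays below the angular spread AS (step 2); otherwise a new group
    is opened. One downlink weight is then computed per group (step 4)
    and reused for every member (step 5)."""
    groups = []
    for doa in sorted(doas_deg):
        if groups and abs(doa - groups[-1][0]) < angular_spread_deg:
            groups[-1].append(doa)
        else:
            groups.append([doa])
    return groups

# Seven users, AS = 8 degrees: users within 8 degrees of a group
# leader share one beamforming weight.
groups = group_users([10, 12, 15, 40, 43, 90, 170], angular_spread_deg=8)
print(groups)   # [[10, 12, 15], [40, 43], [90], [170]]
```

With seven users collapsed into four groups, only four downlink weights need to be computed instead of seven, which is the source of the complexity saving claimed below.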

We use a simulation with an M = 8 element uniform linear array with half-wavelength inter-element spacing to verify that the performance loss of the above algorithm is acceptable. Consider N = 4 sources: one signal-of-interest (SOI) and three signals-of-non-interest (SONI), with initial SOI DOA of 90° and SONI DOAs at 40°, 120° and 140°. Figure 3 compares the SINR error for conventional beamforming and the target and null broadening technique.

Figure 3: Downlink SINR comparison for the conventional beamforming method and beamforming using the broadening technique.

From Figure 3, it is clear that if users are geometrically close enough (in this case AS < 8°), we can reuse the same downlink weight w_new to save calculations at the base station with an acceptable trade-off (a 3 dB SINR loss in this case). However, if we account for interference source spreading angles, the narrow nulls of traditional beamforming cause a large performance loss due to angle spreading toward the co-channel users. Figure 4 shows the performance loss due to offset targeting of the co-channel users for the previous simulation scenario. It is evident that the broadening technique reduces the performance loss due to co-channel angle spreading.

We  use  a  simulation  to  demonstrate  the  complex¬ 
ity  savings  of  the  grouping  method.  Figure  5  shows 
the  performance  under  different  angle  spreading,  where 
users  are  uniformly  distributed  by  angle  in  a  cell. 

The results shown in Figure 3 and Figure 5 indicate that, with proper grouping of users within a cell, it is possible to save more than 50% of the downlink beamforming computational complexity with limited SINR performance loss.


The simulations model a system that uses a linear array antenna with M = 8 antennas and half-wavelength inter-element spacing, and N = 25 mobile users uniformly distributed over [0, π) within a cell. Figure 6 shows the block diagram for conventional downlink beamforming and the flow chart for the grouping algorithm.

Figure 4: Performance loss due to co-channel users' angle offset.

Figure 5: Group number vs. user number under various angle spreading conditions.

Based on Figure 6, Table 1 addresses, under the simulation environment model, the computational load for each block.

It is evident that the proposed method needs only one-third of the typical base station complexity for calculating R, Q and w_down. From the entire system viewpoint, the new method reduces the computational complexity needed in the base station for SDMA applications by approximately 50%.

Figure 6: Block diagram for the conventional downlink BF algorithm (top) and the downlink BF algorithm with the broadening and grouping technique (bottom).

Figure 7 shows the performance of the grouping plus target and null broadening scheme, assuming that angle spreading exists on all sources (desired user and co-channel interference). The worst scenario is when the targets and nulls are not coincident with the estimated DOAs and are at the maximum offset, AS = 8°. Figure 7 shows that the worst-case SINR loss decreases substantially by using the grouping and broadening scheme.

Combining the results of Figure 6 and Figure 7 indicates the efficacy of the new approach. By grouping mobile users in a cell and using the target and null broadening technique, the downlink beamforming calculation is reduced by approximately 50%, with acceptable performance loss.


In this paper, we have studied the grouping and target and null broadening technique for downlink beamforming in mobile communication systems. Computer simulations show that grouping users not only alleviates the DOA estimation error problem, but also offers robust beamforming performance in the presence of source movement [8]. Moreover, the computational complexity in the base station is decreased dramatically, without significant performance loss for SDMA systems.


(Table 1 entries are illegible in the scan; the recoverable fragments are the column label "BF with", the entry "X = s*w", and "weight Select".)

Table 1: Computational effort comparison for conventional BF and BF with grouping and broadening.

[3] Per Zetterberg and Björn Ottersten, "The Spectrum Efficiency of a Base Station Antenna Array System for Spatially Selective Transmission," IEEE Transactions on Vehicular Technology, vol. 44, no. 3, pp. 651-660, August 1995.

[4] K. I. Pedersen, P. E. Mogensen and B. H. Fleury, "Spatial Channel Characteristics in Outdoor Environments and their Impact on BS Antenna System Performance," IEEE VTC, vol. 2, pp. 719-723, August.

[5] Christof Farsakh and Josef A. Nossek, "Spatial Covariance Based Downlink Beamforming in an SDMA Mobile Radio System," IEEE Transactions on Communications, vol. 46, no. 11, pp. 1497-1506, November 1998.

[6] Klaus Hugl, Juha Laurila and Ernst Bonek, "Downlink Performance of Adaptive Antennas With Null Broadening," IEEE VTC, vol. 1, pp. 872-876, September 1999.

[7] Klaus Hugl, Juha Laurila and Ernst Bonek, "Downlink Performance for Frequency Division Duplex Systems," IEEE Globecom, vol. 4, pp. 2097-2101, December 1999.

[8] Jaume Riba, Jason Goldberg and Gregori Vazquez, "Robust Beamforming for Interference Rejection in Mobile Communications," IEEE Transactions on Signal Processing, vol. 45, no. 1, pp. 271-275, January 1997.

[9] Michael Tangemann, "Near Far Effects in Adaptive SDMA Systems," IEEE PIMRC, vol. 3, pp. 1293-1297, September 1995.

Figure 7: Simulation result for N = 25, grouping with 8°, with both target and interference offset.



[1] Christof Farsakh and Josef A. Nossek, "Application of Space Division Multiple Access to Mobile Radio," IEEE PIMRC, vol. 2, pp. 736-739, September 1994.

[2] Christof Farsakh and Josef A. Nossek, "On The Mobile Radio Capacity Increase Through SDMA," IEEE International Zurich Seminar on Broadband Comm., pp. 293-297, February 1998.




Alban DUVERDIER*, Bernard LACAZE** and Jean-Yves TOURNERET*

* CNES, 18 av. Belin, BPI 2012, 31401 Toulouse Cedex 4, France
** ENSEEIHT/SIC, 2 rue Camichel, BP 7122, 31071 Toulouse Cedex 7, France
tel: +33 (0)5 61 28 31 79 / fax: +33 (0)5 61 28 26 13


Linear periodic time-varying filters are often introduced today in telecommunications. They spread the spectrum and can be used for scrambling, multi-user access or channel modeling. Recently, the authors have defined linear cyclostationary filters. In particular, this generalization has made it possible to take into account the random parameters of a transmission channel. This paper defines a new case of linear cyclostationary filter where information is included in the filter.

We first recall the definitions of linear periodic and linear cyclostationary filters. The paper then presents particular cases of these filters based on clock changes. Thus, we introduce the modulated periodic clock change. This filter can be used to transmit an analog and a digital signal simultaneously. We present the reconstruction method of the initial signals. We obtain reconstruction results in the case of the simultaneous transmission of an analog and a binary information signal.

In telecommunications, signals subjected to a linear periodic filter [1][2] are often encountered. This filter spreads the spectrum and can correspond to a scrambling system [3], a multi-user access method [4] or a transmission channel model [5]. Recently, it was shown that such filters can be generalized to linear cyclostationary filters [6].

In the first section, we recall some definitions. In particular, we present the definition of a linear cyclostationary filter. We then introduce a new filter called the modulated periodic clock change. It permits the simultaneous transmission of an analog and a digital signal. We present the reconstruction of the input signals. Finally, we apply the obtained reconstruction results to the transmission of an analog and a binary signal.

2.1.  Stationary  and  cyclostationary  processes 

Let $A = \{A(t),\, t \in \mathbb{R}\}$ be a harmonizable, zero-mean, mean-square continuous process. $A$ admits a Cramér-Loève representation $\Theta_A(\omega)$ [7] such that:

$$A(t) = \int_{-\infty}^{+\infty} e^{i\omega t}\, d\Theta_A(\omega) \quad (1)$$

We denote by $m_A(t)$ and $R_A(t,\tau)$ the mean and autocorrelation function of $A$, given by:

$$m_A(t) = E[A(t)] \quad (2)$$

$$R_A(t,\tau) = E[A(t+\tau/2)\, A^*(t-\tau/2)] \quad (3)$$

The power spectrum of $A$, $S_{A,t}(\omega)$, is defined by:

$$R_A(t,\tau) = \int_{-\infty}^{+\infty} e^{i\omega\tau}\, dS_{A,t}(\omega) \quad (4)$$

$A$ is said to be stationary if and only if $m_A(t)$ and $R_A(t,\tau)$ are independent of $t$. $dS_{A,t}(\omega)$ is then independent of $t$.

$A$ is said to be cyclostationary if and only if $m_A(t)$ and $R_A(t,\tau)$ are periodic in $t$ with period $T = 2\pi/\omega_0$ [8]. $dS_{A,t}(\omega)$ is then periodic in $t$. We suppose that it admits the Fourier series decomposition:

$$dS_{A,t}(\omega) = \sum_{l=-\infty}^{+\infty} e^{il\omega_0 t}\, dS_A^l(\omega) \quad (5)$$

2.2. Linear time-invariant and periodic time-varying filters

Let $h$ be a linear time-varying filter with frequency response $h_t(\omega)$. Its response to the stationary process $Z$ is the process $X$ defined by:

$$X(t) = \int_{-\infty}^{+\infty} e^{i\omega t}\, h_t(\omega)\, d\Theta_Z(\omega) \quad (6)$$
0-7803-5988-7/00/$  10.00  ©  2000  IEEE 


$h$ is a linear time-invariant filter if and only if $h_t(\omega)$ is independent of time.

$h$ is a linear periodic time-varying filter if and only if $h_t(\omega)$ is periodic in time with period $T$ [1]. We suppose that it admits the Fourier series decomposition:

$$h_t(\omega) = \sum_{l=-\infty}^{+\infty} e^{il\omega_0 t}\, h_l(\omega) \quad (7)$$

2.3. Linear stationary and cyclostationary filters

The linear random time-varying filter is a generalization of the linear time-varying filter previously defined [9]. Let $\{H^\omega\}_{\omega\in\mathbb{R}}$ be a family of complex random processes where, for any $\omega$, $H^\omega = \{H_t(\omega),\, t \in \mathbb{R}\}$ is a complex continuous random process. We denote by $\chi_t(\omega)$ the mean and $\varphi_{t,\tau}(\omega,\gamma)$ the intercorrelation function of the $\{H^\omega\}_{\omega\in\mathbb{R}}$, given by:

$$\chi_t(\omega) = E[H_t(\omega)] \quad (8)$$

$$\varphi_{t,\tau}(\omega,\gamma) = E\!\left[H_{t+\frac{\tau}{2}}\!\left(\omega+\tfrac{\gamma}{2}\right) H^*_{t-\frac{\tau}{2}}\!\left(\omega-\tfrac{\gamma}{2}\right)\right] \quad (9)$$

Let $h$ be a linear random filter with frequency response $H_t(\omega)$. Its response to the stationary process $Z$ is the process $X$ defined by:

$$X(t) = \int_{-\infty}^{+\infty} e^{i\omega t}\, H_t(\omega)\, d\Theta_Z(\omega) \quad (10)$$

Thus, each linear filter can be seen as a particular case of a linear random filter, where $H^\omega$ is a degenerate random variable.

A linear random filter $h$ is said to be stationary if and only if the processes $\{H^\omega\}_{\omega\in\mathbb{R}}$ are jointly stationary. This means that the mean and the intercorrelation function of the $\{H^\omega\}_{\omega\in\mathbb{R}}$ are independent of time.

Recently, the authors have generalized this definition [6]. We call $h$ a linear cyclostationary filter if and only if the processes $\{H^\omega\}_{\omega\in\mathbb{R}}$ are jointly cyclostationary. This corresponds to the case where the mean and the intercorrelation function of the $\{H^\omega\}_{\omega\in\mathbb{R}}$ are periodic in time with period $T$.

3.1. Periodic clock change

The response $X$ of a stationary process $Z$ subjected to a periodic clock change [3] $h$ is defined by:

$$X(t) = g(t)\, Z[t - f(t)] \quad (11)$$

where $f(t)$ and $g(t)$ are real measurable functions, periodic with period $T = 2\pi/\omega_0$. In equation (11), $f(t)$ is a timing jitter and $g(t)$ corresponds to an amplitude modulation. It is easy to see that a periodic clock change is a particular case of a linear periodic filter and that its frequency response is given by:

$$h_t(\omega) = g(t)\, e^{-i\omega f(t)} \quad (12)$$

Periodic clock changes can be implemented easily. They also appear often in spread spectrum applications that use linear periodic filters, such as scrambling [3] and multi-user access [4].
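A discrete-time sketch of the periodic clock change (11) may help. The sample rate, jitter amplitude and modulation depth below are illustrative assumptions, and $Z(t - f(t))$ is approximated by linear interpolation of the samples of $Z$:

```python
import numpy as np

fs = 1000.0                        # sample rate in Hz (assumed)
t = np.arange(0, 1.0, 1 / fs)
w0 = 2 * np.pi * 50.0              # clock-change frequency, T = 20 ms (assumed)
z = np.cos(2 * np.pi * 5.0 * t)    # band-limited input standing in for Z(t)
f = 2e-3 * np.sin(w0 * t)          # T-periodic timing jitter f(t), in seconds
g = 1.0 + 0.2 * np.cos(w0 * t)     # T-periodic amplitude modulation g(t)

# Eq. (11): X(t) = g(t) * Z(t - f(t)), with the delayed signal obtained
# by linear interpolation of the sampled Z.
x = g * np.interp(t - f, t, z)
```

With `f` and `g` both periodic in `T`, the output is a spread-spectrum version of the input, as described above.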

3.2. Reconstruction of the input signal

Figure 1 depicts the reconstruction chain of a signal subjected to a periodic clock change.

Figure 1: Reconstruction chain of a signal subjected to a periodic clock change

The reconstruction of a process subjected to a periodic clock change is a particular case of the reconstruction of a process subjected to a linear periodic filter. Equations (6) and (7) show that the response $X$ of the stationary process $Z$ subjected to a linear periodic filter $h$ admits the following spectral representation:

$$d\Theta_X(\omega) = \sum_{k=-\infty}^{+\infty} \psi_k(\omega - k\omega_0)\, d\Theta_Z(\omega - k\omega_0) \quad (13)$$

When the spectral support of $Z$ is included in $[-\omega_0/2, +\omega_0/2[$, $Z$ can then be reconstructed by:

$$\forall \omega \in [-\omega_0/2, \omega_0/2[,\ \forall k \in \Lambda,\quad d\Theta_Z(\omega) = \psi_k^{-1}(\omega)\, d\Theta_X(\omega + k\omega_0) \quad (14)$$

where $\Lambda$ is the set of integers such that the functions $\{\psi_k(\omega)\}_{k\in\Lambda}$ are different from zero on the spectral support of $Z$. Multiple redundant reconstructions of $Z$ can thus be obtained by a frequency downconversion followed by a lowpass filtering on $[-\omega_0/2, \omega_0/2[$.

3.3. Modulated periodic clock change

The paper proposes a new clock change scheme that permits the simultaneous transmission of analog and digital information. This spread spectrum technique is a generalization of the classic periodic clock change. It can be useful, for example, to scramble video with an analog image and digital sound. It is called the modulated periodic clock change.

The response $X$ of a stationary process $Z$ subjected to such a clock change $h$ is defined by:

$$X(t) = g(t)\, Z[t - M(t) f(t)] \quad (15)$$

where $f(t)$ and $g(t)$ are defined as in (11) and $M = \{M(t),\, t \in \mathbb{R}\}$ is a stationary process independent of $Z$. Figure 2 depicts the obtained transmission chain.

Figure 2: Transmission chain of a signal subjected to a modulated periodic clock change

It is easy to see that $Z$ is then subjected to a cyclostationary filter with frequency response given by:

$$H_t(\omega) = g(t)\, e^{-i\omega M(t) f(t)} \quad (16)$$

In general, the reconstruction of $Z(t)$ can be obtained by a sub-optimal solution [6]. Nevertheless, perfect reconstruction is possible when $M$ is a Bernoulli variable equal to $-1$ or $+1$.

In this case, equation (14) becomes:

$$\forall \omega \in [-\omega_0/2, \omega_0/2[,\ \forall k \in \Lambda,\quad d\Theta_Z(\omega) = \psi_k^{-1}(M\omega)\, d\Theta_X(\omega + k\omega_0) \quad (17)$$

Let $k_1$ and $k_2$ be two values of $k$. Equation (17) implies:

$$\forall \omega \in [-\omega_0/2, \omega_0/2[,\quad \psi_{k_1}^{-1}(M\omega)\, d\Theta_X(\omega + k_1\omega_0) = \psi_{k_2}^{-1}(M\omega)\, d\Theta_X(\omega + k_2\omega_0) \quad (18)$$

This equality allows the identification of $M$ whenever $\psi_{k_1}(\omega)$ and $\psi_{k_2}(\omega)$ are not simultaneously even functions. Knowing $M$, $Z(t)$ can be perfectly reconstructed using (17).

This method can then be used for any binary signal $M(t)$ whose bit period is much larger than $T$. It could also be generalized to any digital signal $M(t)$.
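The modulated periodic clock change (15) can be sketched the same way. Here $T = 0.0347$ ms, $a = 0.104$, $g(t) = 1$ and the 1 kb/s bit rate are taken from the simulations of Section 4; the sample rate, the bit pattern and the assumption that $a$ is expressed in milliseconds are ours:

```python
import numpy as np

fs = 1000.0                            # samples per ms (assumed)
t = np.arange(0, 5.0, 1 / fs)          # time axis in ms
T = 0.0347                             # clock-change period in ms (Section 4.1)
w0 = 2 * np.pi / T
a = 0.104                              # amplitude of f(t); units assumed to be ms
z = np.cos(2 * np.pi * 2.0 * t)        # band-limited analog input (below w0/2)
bits = np.array([+1, -1, +1, -1, +1])  # N.R.Z. bits at 1 kb/s -> 1 ms per bit
m = bits[np.minimum(t.astype(int), len(bits) - 1)]
f = -a * np.sin(w0 * t)                # eq. (19): f(t) = -a sin(w0 t), g(t) = 1

# Eq. (15): X(t) = g(t) * Z(t - M(t) f(t)), with the Bernoulli signal M
# flipping the sign of the timing jitter over each bit interval.
x = np.interp(t - m * f, t, z)
```

Each bit spans roughly 29 periods of the clock change, consistent with the requirement that $M(t)$ be constant over many periods $T$.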


4.1. Simultaneous transmission of an analog and a binary information signal

In the following simulations, a modulated periodic clock change is used to transmit simultaneously an analog signal $Z(t)$, band-limited on $[-\omega_0/2, \omega_0/2[$, and an N.R.Z. signal $M(t)$. $f(t)$ and $g(t)$ are given by:

$$f(t) = -a \sin(\omega_0 t) \quad \text{and} \quad g(t) = 1 \quad (19)$$

Figure 3 depicts the analog signal at the input of the clock change.

Figure 3: Initial analog signal

The binary signal is presented in Figure 4.

Figure 4: Initial binary signal

The signal observed at the output of the clock change is represented in Figure 5 for $a = 0.104$, $T = 0.0347$ ms and a bit rate of 1 kb/s.

Figure 5: Observed signal

4.2. Reconstruction of the analog information

We have seen that $Z(t)$ has to be reconstructed while $M(t)$ is constant. As $M(t)$ is a binary signal, the reconstruction functions of $Z(t)$ are given during each bit interval by:

$$\psi_k(M\omega) = J_k(M a \omega) \quad (20)$$

where $J_k(\omega)$ is the $k$-th order Bessel function and $M$ is the value of $M(t)$, equal to $+1$ or $-1$. The reconstruction of $Z(t)$ does not depend on $M$ when $k$ is even. It can then be obtained directly around any even $k$. Figure 6 compares the initial signal to the reconstruction obtained for $k = 0$. The analog information is well reconstructed.
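The parity argument can be checked numerically: since $J_k(-x) = (-1)^k J_k(x)$, the function $\psi_k(M\omega) = J_k(Ma\omega)$ is independent of $M \in \{+1,-1\}$ exactly when $k$ is even. A small sketch using the power series of $J_k$ (illustrative; a library Bessel routine would normally be used):

```python
import math

def bessel_j(k, x, terms=40):
    # Power series J_k(x) = sum_{m>=0} (-1)^m / (m! (m+k)!) (x/2)^(2m+k),
    # valid for integer k >= 0 and small |x|.
    s = 0.0
    for m in range(terms):
        s += (-1) ** m / (math.factorial(m) * math.factorial(m + k)) \
             * (x / 2) ** (2 * m + k)
    return s

x = 1.3
# Even k: J_k(-x) = J_k(x), so the k-even reconstructions do not depend
# on the bit M. Odd k: J_k(-x) = -J_k(x), which is what reveals M.
print(bessel_j(2, -x) - bessel_j(2, x))   # ~0 (even order)
print(bessel_j(1, -x) + bessel_j(1, x))   # ~0 (odd order)
```

This is exactly why the analog signal is recovered from even $k$ without knowing the bit, while the odd-$k$ reconstructions carry the sign of $M$.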

4.3. Reconstruction of the binary information

Since we know a correct reconstruction of $Z(t)$ for $k$ even, the reconstructions obtained for $k$ odd allow us to determine when $M(t)$ is correctly identified. Figures 7 and 8 compare the initial signal to the reconstruction for $k = 1$, when $M(t)$ is supposed always equal to $+1$ and when $M(t)$ is correctly identified.

Figure 9: Scheme for the estimation of $Z(t)$ and $M(t)$: the observed signal $X(t) = g(t)Z(t - M(t)f(t))$ feeds a reconstruction of $Z(t)$ for $k$ even and reconstructions for $k$ odd with $M = +1$ and $M = -1$; comparison over each bit period yields the estimates of $Z(t)$ and $M(t)$.

The block diagram of Figure 9 shows a scheme which allows reconstruction of $Z(t)$ and recovery of the values of $M(t)$, assuming perfect timing of the corresponding bit stream.


In this paper, we recalled the definition of a linear periodic filter and of a linear cyclostationary filter. We presented a new filter called the modulated periodic clock change. We proposed a reconstruction method for the signals transmitted through this filter, and applied it successfully to the simultaneous transmission of an analog and a binary signal.


[1] L.E. Franks, "Polyperiodic Linear Filtering," in Cyclostationarity in Communications and Signal Processing, William A. Gardner (ed.), IEEE Press, 1993.

[2] D. McLernon, "Inter-relationships between different structures for periodic systems," EUSIPCO, 1998.

[3] A. Duverdier and B. Lacaze, "Time-varying reconstruction of stationary processes subjected to analogue periodic scrambling," ICASSP, 1997.

[4] A. Duverdier and B. Lacaze, "Transmission of two users by means of periodic clock changes," ICASSP, 1998.

[5] R.G. Gallager, Information Theory and Reliable Communication, Wiley, 1968.

[6] A. Duverdier, B. Lacaze and D. Roviras, "Introduction of linear cyclostationary filters to model time-variant channels," GLOBECOM, 1999.

[7] H. Cramér and M.R. Leadbetter, Stationary and Related Stochastic Processes, Wiley, 1967.

[8] W.A. Gardner and L.E. Franks, "Characterization of cyclostationary random signal processes," IEEE Trans. Inform. Theory, pp. 4-14, 1975.

[9] P.A. Bello, "Characterization of randomly time variant linear channels," IEEE Trans. Comm., pp. 360-393, 1963.

Carlo Luschi*, Bernard Mulgrew**

* Bell Laboratories, Lucent Technologies
Unit 1, Pagoda Park, Westmead Drive, Swindon SN5 7YT, United Kingdom

** Dept. of Electronics and Electrical Engineering, University of Edinburgh
The King's Buildings, Mayfield Road, Edinburgh EH9 3JL, United Kingdom


We  consider  the  problem  of  equalization  of  the  frequency 
selective  mobile  radio  channel  in  the  presence  of  co-channel 
interference  (CCI).  Conventional  trellis  equalizers  treat  the 
sum  of  noise  and  interference  as  additive  white  Gaussian 
noise,  while  CCI  is  generally  a  colored  non-Gaussian  process. 
We  propose  a  non-parametric  approach  based  on  the  esti¬ 
mation  of  the  probability  density  function  of  the  noise-plus- 
interference.  Given  the  availability  of  a  limited  volume  of 
data,  the  density  is  estimated  by  kernel  smoothing  tech¬ 
niques.  Due  to  the  temporal  color  of  the  CCI,  the  use  of 
a  whitening  filter  is  also  addressed.  Simulation  results  are 
given  for  the  GSM  system,  showing  a  significant  perfor¬ 
mance  improvement  with  respect  to  the  equalizer  based  on 
the  Gaussian  assumption. 


Time-division  multiple  access  (TDMA)  mobile  radio  sys¬ 
tems  like  GSM  are  affected  by  co-channel  interference  (CCI) 
and  intersymbol  interference  (ISI)  due  to  multipath  propa¬ 
gation.  Channel  equalizers  commonly  employed  in  practi¬ 
cal  GSM  receivers  perform  maximum  likelihood  (ML)  [1]  or 
maximum  a  posteriori  probability  (MAP)  [3]  data  estima¬ 
tion  on  the  ISI  trellis.  ML  sequence  estimation  using  the 
Viterbi  algorithm  [2]  is  well  known  as  the  optimum  detec¬ 
tion  technique  for  signals  corrupted  by  finite-length  ISI  and 
additive  white  Gaussian  noise  (AWGN),  in  the  sense  that  it 
minimizes  the  probability  of  a  sequence  error.  The  symbol- 
by-symbol  MAP  algorithm,  proposed  over  two  decades  ago 
by  Bahl  et  al.  [3]  for  decoding  of  convolutional  codes,  has 
recently  received  renewed  interest  as  a  soft-in/soft  out  de¬ 
coder  for  iterative  decoding  of  parallel  or  serially  concate¬ 
nated  codes  [4].  As  a  trellis  equalizer,  the  MAP  algorithm 
is  optimum  in  the  sense  that  it  minimizes  the  probability  of 
symbol  error.  In  receivers  employing  the  concatenation  of 
an  equalizer  and  a  channel  decoder,  the  performance  is  im¬ 
proved  by  soft-decision  decoding  and  iterative  equalization 
and  decoding  [5].  In  this  respect,  the  MAP  algorithm  has 
the  advantage  of  intrinsically  providing  optimal  a  posteriori 
probability  as  a  soft-output  value. 

In  this  paper,  we  consider  the  problem  of  equalization 
of  the  mobile  radio  channel  in  the  case  of  single  channel 
reception.  The  optimum  trellis  equalizer  in  the  presence 

of  ISI,  CCI,  and  AWGN  is  based  on  joint  detection  of  the 
co-channel  signals  [7].  Although  joint  ML  and  joint  MAP 
detection  are  optimal,  they  can  be  prohibitively  expensive 
since  the  complexity  increases  exponentially  with  the  sum 
of  the  channel  lengths  of  the  desired  and  CCI  signals.  In  ad¬ 
dition,  the  estimation  of  the  channel  impulse  response  of  all 
co-channel  signals  requires  the  knowledge  of  the  training  se¬ 
quence of each interferer. On the other hand, conventional receivers employ a trellis equalizer which treats the sum of noise and interference as additive white Gaussian noise. In reality, the sum of noise and CCI is generally a colored non-Gaussian random process, and the above approach results in a degradation of the error performance.

In  order  to  correctly  set  the  problem  of  trellis  data  es¬ 
timation,  a  proper  statistical  characterization  of  the  dis¬ 
turbance  is  required.  To  this  purpose,  we  propose  a  non- 
parametric  trellis  equalizer,  based  on  the  estimation  of  the 
probability  density  function  of  the  noise-plus-interference. 
Given  the  limited  volume  of  training  data,  the  work  is  based 
on  the  application  of  density  estimation  by  kernel  smooth¬ 
ing.  The  temporal  color  of  the  CCI  is  taken  into  account 
by  a  whitening  filter. 


2.1.  System  Model 
Consider the received signal

$$r_k = \sum_{\ell=0}^{L-1} b_{k-\ell}\, h_\ell^{(k)} + n_k \quad (1)$$

where $b_k \in \{+1,-1\}$ are the transmitted symbols, the $L$ complex tap-gains $h_\ell^{(k)}$ represent the samples of the equivalent channel impulse response at time $k$, and $n_k = y'_k + w_k$ denotes the sum of co-channel interference and thermal noise. In the case of the GSM system, we consider the linearized model of the GMSK signal [8], where the $h_\ell^{(k)}$ are the taps of the equivalent discrete-time channel produced by derotation of the received signal [9]. The GSM signal has almost zero excess bandwidth, and we assume that sufficient statistics for data estimation can be obtained by symbol-rate sampling at the output of a fixed front-end filter. The analysis can be extended to include the case of non-zero excess bandwidth by introducing oversampling and fractionally spaced trellis equalization.

In  this  Section,  we  consider  the  CCI  samples  as  in¬ 
dependent  complex  non-Gaussian  random  variables.  The 
discrete-time  process  y'k  is  generally  colored,  even  if  the 
delay  spread  in  a  typical  interference-limited  environment 
is  usually  relatively  small.  At  high  signal-to-noise-ratios 
(SNRs)  a  suitable  temporal  prewhitening  is  assumed  to 
produce  approximately  independent  non-Gaussian  distur¬ 
bance.  The  validity  of  this  assumption  will  be  discussed  in 
Section  3. 

2.2. Symbol-by-Symbol MAP Algorithm for Finite-Length ISI and Additive Independent Disturbance

Suppose that the symbols $b_k$ are transmitted in finite blocks of length $N$. Assuming knowledge of the channel impulse response, a soft-output symbol-by-symbol MAP equalizer computes the a posteriori log-likelihood ratio

$$L(b_k \mid r_0,\ldots,r_{N-1}) = \log \frac{\Pr(b_k = +1 \mid r_0,\ldots,r_{N-1})}{\Pr(b_k = -1 \mid r_0,\ldots,r_{N-1})} \quad (2)$$

with $0 \le k \le N-1$. Let $\mu_k = (b_{k-1},\ldots,b_{k-L+1})$ denote the generic ISI state at time $k$, and $S(b_k)$ the set of states corresponding to the transmitted symbol $b_k$. Denoting by $\xi_k$ the transition from the state $\mu_k$ to $\mu_{k+1}$, the MAP algorithm results in forward and backward recursions with the transition metric $\lambda(\xi_k)$, coupled by a dual-maxima operation [3], [6]:

$$L(b_k \mid r_0,\ldots,r_{N-1}) = {\max_{\mu \in S(b_k=+1)}}' \Lambda(\mu_{k+1}) - {\max_{\mu \in S(b_k=-1)}}' \Lambda(\mu_{k+1}) \quad (3)$$

$$\Lambda(\mu_{k+1}) = \Lambda^f(\mu_k) - \lambda(\xi_k) + \Lambda^b(\mu_{k+1}) \quad (4)$$

where $\Lambda(\mu_k)$ is the overall accumulated metric for the state $\mu_k$, $\Lambda^f$ and $\Lambda^b$ are the accumulated metrics in the forward and backward recursions, and $\max'\{x,y\} = \max\{x,y\} + \log(1 + e^{-|x-y|})$ [6]. The metric increment $\lambda(\xi_k)$ results in

$$\lambda(\xi_k) = -\log p(r_k \mid b_k,\ldots,b_{k-L+1}) - \log \Pr(b_k) \quad (5)$$

where $p(r_k \mid b_k,\ldots,b_{k-L+1}) = p_n\!\left(r_k - \sum_{\ell=0}^{L-1} b_{k-\ell}\, h_\ell^{(k)}\right)$. In the case where $n_k$ is modelled as AWGN, the quantity $-\log p(r_k \mid b_k,\ldots,b_{k-L+1})$ in (5) produces the Euclidean distance metric. When no a priori information is available about the transmitted bit $b_k$, the term $-\log \Pr(b_k)$ in (5) has no effect and can be omitted from the calculation. On the contrary, if the equalizer receives some a priori information, the above term has a fundamental role in deriving a soft-in/soft-out MAP equalizer [4], [5].
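The $\max'$ correction term above is the Jacobian logarithm, which makes the dual-maxima recursion an exact log-domain sum rather than an approximation. A small sketch (ours, not from the paper):

```python
import math

def max_star(x, y):
    """Jacobian logarithm used in the dual-maxima MAP recursion:
    max*(x, y) = max(x, y) + log(1 + exp(-|x - y|)) = log(e^x + e^y)."""
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))

# max* equals an exact log-sum-exp, but never overflows the way a naive
# log(exp(x) + exp(y)) can when the accumulated metrics grow large:
print(max_star(2.0, 3.0))          # == log(e^2 + e^3)
print(max_star(1000.0, 1001.0))    # finite, ~1001.313
```

Dropping the `log1p` correction term gives the cheaper max-log approximation often used in practical receivers.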

Observe  that  the  above  derivation  relies  on  the  assump¬ 
tion  of  known  channel.  In  practice,  the  channel  response  is 
usually  estimated  using  a  known  training  sequence  at  the 
equalizer  start-up. 


3.1.  Density  Estimation  by  Kernel  Smoothing 

An example of the density function of the noise plus CCI samples $n_k$ for the case of the GSM channel is shown in Figure 1.

Figure 1: Example of the density function of CCI (derotated GMSK signal, GSM TU profile) plus AWGN for a GSM receiver.

The plot has been obtained from a histogram of the data in 2000 bursts, considering one dominant interferer under stationary propagation conditions. From Figure 1, it is apparent that the disturbance cannot be realistically modelled as a Gaussian random variable.

3.1.1. Parzen Estimator

An estimate of the probability density function of a complex random variable $X$ can be built from a set of data $X_i$, $i = 1,\ldots,n$, by means of a smoothing function or kernel function $K(x, X_i)$ (see [11] and references therein). In the method proposed by Parzen [10], an estimate of the unknown density is given by

$$\hat{p}_n(x) = \frac{1}{n} \sum_{i=1}^{n} K(x, X_i) \quad (6)$$

A possible choice for the function $K(x, X_i)$, among those satisfying the conditions for (asymptotic) unbiasedness and consistency of the estimator [10], is the Gaussian kernel of fixed width $\sigma_0$:

$$K(x, X_i) = \frac{1}{2\pi\sigma_0^2}\, e^{-|x - X_i|^2 / 2\sigma_0^2} \quad (7)$$
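A minimal sketch of the estimator (6)-(7) on synthetic data follows; the two-centre toy disturbance below is our assumption, standing in for noise-plus-CCI, and is not GSM data:

```python
import numpy as np

def parzen_density(x, samples, sigma0):
    """Parzen estimate (6) with the complex Gaussian kernel (7): each
    observation X_i contributes a symmetric kernel of fixed width sigma0
    centred on X_i, and the estimate is their average."""
    d2 = np.abs(x - samples) ** 2
    k = np.exp(-d2 / (2 * sigma0 ** 2)) / (2 * np.pi * sigma0 ** 2)
    return k.mean()

rng = np.random.default_rng(0)
# Toy disturbance: two complex "interference" centres plus thermal noise.
centres = np.array([0.8 + 0.0j, -0.8 + 0.0j])
data = rng.choice(centres, 200) \
       + 0.1 * (rng.standard_normal(200) + 1j * rng.standard_normal(200))

p_near = parzen_density(0.8 + 0.0j, data, sigma0=0.15)
p_far = parzen_density(3.0 + 3.0j, data, sigma0=0.15)
```

As expected for a bimodal disturbance, the estimate is large near a cluster centre and essentially zero far from the data, which no single Gaussian fit could reproduce.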

3.1.2. Transition Metrics for Non-Parametric Trellis Equalization

In the case of a Bayesian trellis equalizer, the random variable $X$ represents one realization of the noise-plus-interference process corresponding to a given received burst. Consider the received signal (1), and assume that the channel is approximately constant within the burst duration. Then, once the channel taps $h_\ell$ are estimated using the $M$ training symbols $b_i$, they can be used to derive the set of observations $X_i$, $i = 1,\ldots,n = M - L$, of the random disturbance $X$ according to $X_i = \hat{n}_i = r_i - \sum_{\ell=0}^{L-1} b_{i-\ell}\, \hat{h}_\ell$,

Figure 2: Block diagram of the non-parametric trellis equalizer.

where the hat denotes the estimated value. At this point we recall that the transition metric (5) of the optimum symbol-by-symbol MAP algorithm results in $\lambda(\xi_k) = -\log p_n(r_k - \sum_{\ell=0}^{L-1} b_{k-\ell}\, h_\ell) - \log \Pr(b_k)$. Therefore, using (6) and (7), one can directly estimate the quantity $\log p_n(x)$ for $x = n_k = r_k - \sum_{\ell=0}^{L-1} b_{k-\ell}\, \hat{h}_\ell$, and obtain

$$\lambda(\xi_k) = -\log \hat{p}_n(x) - \log \Pr(b_k) \quad (8)$$

and variance $2\sigma^2$, which we assume independent of $y'_k$. If the co-channel taps $h'^{(k)}_\ell$ at time $k$ are regarded as an unknown but deterministic mapping from $(b'_k,\ldots,b'_{k-L'+1})$ to $y'_k$, the distribution of $n_k$ can be derived from those of $b'_k$ and $w_k$. Given a generic binary quantity $\beta$, we define

$$\eta_i = \eta_{i,1} + j\,\eta_{i,2} = \sum_{\ell=0}^{L'-1} \beta_{i,\ell}\, h'_\ell\,, \qquad 0 \le i < 2^{L'} \quad (10)$$

where $\beta_i = \{\beta_{i,\ell}\}_{\ell=0}^{L'-1}$ denotes one of the $2^{L'}$ distinct sequences of elements $\beta_{i,\ell} \in \{+1,-1\}$. Then, it is possible to show that the expression of the density of $n_k$ results in

$$p_n(x) = \frac{1}{2^{L'}} \sum_{i=0}^{2^{L'}-1} p_w(x - \eta_i) \quad (11)$$

where $p_w(x)$ is the complex Gaussian density with variance $2\sigma^2$. From (11), the density of the interference-plus-noise is given by a number of symmetric Gaussian kernels, whose centers are the points of the hypothetical scatter diagram obtained in the absence of thermal noise. Comparison of (11) and (6) reveals the strong connection between the structure of the Parzen estimator and the true density. In particular, for $\sigma^2 \to 0$, the observations $X_i$ in (6) correspond to the points of the complex plane defined by (10), with the binary parameters $\beta_{i,\ell}$ replaced by the co-channel symbols $b'_{k-\ell}$. Therefore, the estimator defined by (6) and (7) will approach the true density (11) as soon as the size of the training data set is large enough to represent the $2^{L'}$ equiprobable sequences $\beta_i = \{\beta_{i,\ell}\}_{\ell=0}^{L'-1}$.

The  block  diagram  of  the  resulting  equalizer  is  shown  in 
Figure  2.  From  the  implementation  point  of  view,  the  den¬ 
sity  logPn(*)  at  time  k  can  be  computed  separately  for  each 
trellis  branch.  Alternatively,  it  can  be  precomputed  for  a 
finite  number  of  values  x,  and  stored  in  a  look-up-table  be¬ 
fore  starting  the  trellis  processing. 

We  emphasize  the  fact  that  the  above  technique  deals 
with  the  statistical  model  of  a  random  variable,  obtained 
as  the  realization  of  the  noise-plus-interference  process  at  a 
given  time  instant.  It  is  worth  noting  that,  with  a  proper 
adaptive  procedure,  the  approach  can  be  extended  to  those 
cases  where  the  CCI  impulse  response  cannot  be  considered 
approximately  constant  within  the  burst. 

3.2.  Probability  Density  Function  of  the  Noise-plus- 

The  analytical  expression  of  the  actual  density  function  of 
noise-plus-interference  can  be  carried  out  if  we  assume  a 
(unknown)  deterministic  finite-state  machine  model  for  the 
co-channel  signal.  Consider  the  received  signal  (1).  The 
sum  of  noise  and  CCI  at  time  k  can  be  expressed  as 

L'- 1 

nk  =  y'k  +  Wfc  =  ^2  b'k-ttil{k)  -(-  wk  ,  (9) 


where  b'k  G  {+1,-1}  are  the  co-channel  symbols,  h'/k\ 
0  <  £  <  I/  —  1  denote  the  taps  of  the  co-channel  impulse 
response,  and  wk  is  white  Gaussian  noise  with  zero  mean 
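The mixture structure of the noise-plus-CCI density is easy to reproduce numerically: enumerating the $2^{L'}$ sign sequences of (10) gives the centers $\eta_i$, and (11) is their equal-weight complex-Gaussian mixture. A small sketch; the co-channel taps below are illustrative assumptions, not values from the paper:

```python
import itertools
import numpy as np

def cci_noise_density(x, h_cci, sigma):
    """True noise-plus-CCI density (11): an equal-weight mixture of 2^{L'}
    complex Gaussians (variance 2*sigma^2) centred at the points (10)."""
    Lp = len(h_cci)
    centers = [np.dot(beta, h_cci)
               for beta in itertools.product((+1.0, -1.0), repeat=Lp)]
    pw = lambda z: np.exp(-abs(z) ** 2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return sum(pw(x - eta) for eta in centers) / 2 ** Lp

h_cci = np.array([0.8, 0.4 + 0.3j])          # hypothetical co-channel taps, L' = 2
p_plus = cci_noise_density(0.5 + 0.2j, h_cci, sigma=0.2)
p_minus = cci_noise_density(-0.5 - 0.2j, h_cci, sigma=0.2)
```

Because the centers come in $\pm$ pairs, the density evaluates identically at $x$ and $-x$.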

3.3.  Doubling  the  Size  of  the  Training  Set 

We observe that in (10), for each index $i = i'$ corresponding to the binary sequence $\beta' = \{\beta'_\ell\}_{\ell=0}^{L'-1}$ there is an index $i = i''$ with $\beta'' = \{-\beta'_\ell\}_{\ell=0}^{L'-1} = -\beta'$. This means that for each $i'$ there is an $i''$ such that $\eta_{i''} = -\eta_{i'}$. Exchanging each pair of indexes $i'$ and $i''$ in the sum (11) and taking into account the symmetry of the Gaussian density $p_w(x)$ gives $p_n(-x) = (1/2^{L'}) \sum_i p_w(-x + \eta_i) = p_n(x)$. The importance of this result comes from the fact that it makes it possible to double the available volume of data in the density estimator (6). In fact, it implies that if $\{X_i\}$ are values assumed by the random variable $n_k$, then the set $\{-X_i\}$ contains values assumed by $n_k$ with the same probability. Therefore, together with each outcome $X_i$ we can additionally consider $-X_i$, as if it were the result of a parallel experiment. This leads to the enlarged data set $\{X_i, -X_i\}$.
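The doubling trick is one line in practice: since $p_n(-x) = p_n(x)$, the mirrored outcomes $-X_i$ are as legitimate as the $X_i$ themselves, and feeding the enlarged set to the estimator makes it exactly symmetric. A sketch with synthetic disturbance samples (illustrative values):

```python
import numpy as np

def parzen(x, samples, sigma0):
    """Fixed-width Gaussian-kernel density estimate, as in (6)-(7)."""
    d = np.abs(x - samples) ** 2
    return np.mean(np.exp(-d / (2 * sigma0**2))) / (2 * np.pi * sigma0**2)

rng = np.random.default_rng(1)
# synthetic disturbance outcomes clustered around +-0.7
X = rng.choice([0.7, -0.7], size=20) + 0.05 * rng.standard_normal(20)

X2 = np.concatenate([X, -X])          # enlarged data set {X_i, -X_i}

# with the doubled set, the estimate is symmetric by construction
p_pos = parzen(0.3, X2, 0.1)
p_neg = parzen(-0.3, X2, 0.1)
```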

3.4.  Choice  of  the  Smoothing  Parameter 

An optimal kernel width for the fixed-width density estimator (6) can be determined through the minimization of the mean integrated square error (MISE) [11]. In the case of the Gaussian kernel (7) used to estimate the complex Gaussian density with variance $2\sigma^2$, we have $\sigma_{0(\mathrm{opt})} = (1/n)^{1/6}\,\sigma$ [11]. For the density of the noise-plus-interference, using (11) and applying Cauchy's inequality, we find

$$\sigma_{0(\mathrm{opt})} \ge (1/n)^{1/6}\,\sigma \ . \qquad (12)$$


Figure 3: Error performance in the case of known channel. GSM TU0 profile, SNR = 30 dB. Density estimator with fixed kernel width $\sigma_0 = 0.05$.

With a given volume $n$ of training data, the kernel width can then be selected from the value of the noise variance $\sigma^2$. In a practical receiver, an estimate of $\sigma^2$ can be derived from the training sequence, taking into account the estimated channel response and the measured received signal power.
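The resulting width rule is trivial to code: $\sigma_0$ is tied to the noise standard deviation and the training-set size $n$. In the sketch below, $n = 26$ is simply the familiar length of a GSM training sequence, used as an illustration:

```python
def optimal_kernel_width(n, sigma):
    """MISE-optimal fixed kernel width sigma_0 = (1/n)^(1/6) * sigma for a
    complex Gaussian target; a lower bound for the CCI mixture, cf. (12)."""
    return (1.0 / n) ** (1.0 / 6.0) * sigma

sigma0 = optimal_kernel_width(26, 0.1)   # e.g. 26 training symbols, sigma = 0.1
```

Note how weakly the rule depends on $n$: the width shrinks only as $n^{-1/6}$.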

3.5.  Temporal  Whitening 

The MAP equalizer with branch metric (8) is based on the assumption that the samples $n_k$ are independent. Given the temporal color of the CCI, a whitening filter for the disturbance is needed before the trellis processor. We point out that a linear prediction-error (LPE) filter will ideally produce uncorrelated CCI-plus-noise samples, but this does not necessarily imply independence, since the process in general remains non-Gaussian. In addition, a whitening filter for the disturbance will inevitably increase the channel memory for the desired signal, and if we do not want to increase the number of states of the equalizer, the number of taps of the filter has to be kept small. However, the delay spread of the typical GSM urban channel is usually lower than 4 symbol intervals. Moreover, reducing the correlation between the samples will certainly reduce their 'dependence'. Note that in some particular cases the whitened disturbance turns out to be actually independent. As an example, this happens when the variance of the thermal noise tends to zero and the co-channel is minimum-phase (in fact, in this case the ideal LPE filter inverts the co-channel).
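A short LPE prewhitening filter of the kind discussed above can be fitted by least squares from disturbance samples. The sketch below is real-valued for simplicity and uses an illustrative MA(1) correlation model for the colored disturbance; it checks that a 2-tap predictor removes most of the lag-1 correlation:

```python
import numpy as np

def lpe_whiten(v, p):
    """Fit a p-tap linear prediction-error filter to the disturbance v and
    return the prediction errors e(k) = v(k) - sum_i a_i v(k-i)."""
    rows = np.array([v[k - p:k][::-1] for k in range(p, len(v))])
    a, *_ = np.linalg.lstsq(rows, v[p:], rcond=None)   # least-squares predictor
    return v[p:] - rows @ a

rng = np.random.default_rng(2)
w = rng.standard_normal(2000)
v = np.convolve(w, [1.0, 0.8])[:2000]     # colored disturbance (MA(1) model)
e = lpe_whiten(v, p=2)

rho_before = np.corrcoef(v[:-1], v[1:])[0, 1]   # ~0.49 for this model
rho_after = np.corrcoef(e[:-1], e[1:])[0, 1]    # close to 0 after whitening
```

As the text notes, uncorrelatedness of `e` does not by itself make the samples independent when the disturbance is non-Gaussian.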


The effectiveness of the strategy based on density estimation by kernel smoothing has been assessed by computer simulation for the case of a GSM receiver with single-channel reception. The GMSK transmitted symbols are obtained from the source bits by rate-1/2 convolutional encoding and interleaving, according to the GSM specifications for the full-rate speech traffic channel. The simulator includes the multipath fading channel with the classical Doppler spectrum [14], CCI, and thermal noise. Ideal frequency hopping is implemented. One dominant co-channel interferer is assumed, characterized by an independent fading process and a random phase shift with respect to the signal of interest. In all the simulations SNR = 30 dB. At the receiver, the soft-output data produced by a 16-state MAP equalizer are deinterleaved and decoded by a convolutional channel decoder.

Figure 4: Error performance in the case of estimated channel. GSM TU0 profile, SNR = 30 dB. Density estimator with fixed kernel width $\sigma_0 = 0.05$.

To establish the ultimate performance of the proposed equalizer, we first consider the ideal case of known channel and relative speed 0 km/h. Figure 3 shows the bit-error rate (BER) performance with the GSM typical urban area (TU) multipath profile for both co-channel signals. The MAP non-parametric equalizer is compared with the MAP trellis processor that assumes Gaussian disturbance. The figure also addresses the effect of doubling the data set for density estimation, as discussed in Section 3. The results indicate that the non-parametric equalizer offers a potential improvement of more than two orders of magnitude in terms of BER at the equalizer output. Figures 4 to 6 illustrate the receiver performance when the channel of the signal of interest is estimated from the training symbols. We also introduce an LPE filter for prewhitening of the colored disturbance. As discussed in Section 3, choosing the prediction order involves a trade-off between performance and complexity. In the figures, we use a 16-state trellis and a 2-tap LPE filter. Finally, we include the performance obtained by iterative channel estimation. In this case, after the equalization of the entire burst, the data decisions are fed back to produce an improved channel estimate, which is used in a second-pass equalization.

Figure 5: Error performance with iterative channel estimation. GSM TU0 profile, SNR = 30 dB. Density estimator with fixed kernel width $\sigma_0 = 0.05$.

The above simulation results refer to a synchronous interference scenario. Simulation with asynchronous CCI shows that the proposed equalizer still outperforms the conventional trellis processor. However, in those cases the proper approach consists in introducing an adaptation of the estimated density of the noise-plus-CCI.


A non-parametric trellis processor has been studied for channel equalization in the presence of non-Gaussian interference. In the case of the GSM system, the proposed approach based on density estimation by kernel smoothing provides a significant performance improvement with respect to the receiver that assumes Gaussian disturbance.


Figure 6: Error performance with iterative channel estimation. GSM TU50 profile, SNR = 30 dB. Density estimator with fixed kernel width $\sigma_0 = 0.05$.

[1] G. D. Forney, Jr., "Maximum likelihood sequence estimation of digital sequences in the presence of intersymbol interference," IEEE Trans. Inform. Theory, vol. IT-18, no. 3, pp. 363-378, May 1972.

[2] G. D. Forney, Jr., "The Viterbi algorithm," Proc. IEEE, vol. 61, no. 3, pp. 268-278, Mar. 1973.

[3] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inform. Theory, vol. IT-20, pp. 284-287, Mar. 1974.

[4] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "A soft-input soft-output APP module for iterative decoding of concatenated codes," IEEE Commun. Letters, vol. 1, no. 1, pp. 22-24, Jan. 1997.

[5] G. Bauch, H. Khorram, and J. Hagenauer, "Iterative equalization and decoding in mobile communications systems," in Proc. Eur. Pers. Mobile Commun. Conf., (Bonn, Germany), pp. 307-312, Oct. 1997.

[6] A. J. Viterbi, "An intuitive justification and a simplified implementation of the MAP decoder for convolutional codes," IEEE J. Select. Areas Commun., vol. 16, no. 2, pp. 260-264, Feb. 1998.

[7] K. Giridhar, J. J. Shynk, A. Mathur, S. Chari, and R. P. Gooch, "Nonlinear techniques for the joint estimation of cochannel signals," IEEE Trans. Commun., vol. 45, no. 4, pp. 473-484, Apr. 1997.

[8] P. Laurent, "Exact and approximate construction of digital phase modulations by superposition of amplitude modulated pulses (AMP)," IEEE Trans. Commun., vol. 34, no. 2, pp. 150-160, Feb. 1986.

[9] A. Baier, "Derotation techniques in receivers for MSK-type CPM signals," in Proc. Eusipco, (Barcelona, Spain), Sept. 1990.

[10] E. Parzen, "On estimation of a probability density function and mode," Ann. Math. Statist., vol. 33, pp. 1065-1076, 1962.

[11] A. W. Bowman and A. Azzalini, Applied Smoothing Techniques for Data Analysis. Oxford: Oxford University Press, 1997.

[12] J.-N. Hwang, S.-R. Lay, and A. Lippman, "Nonparametric multivariate density estimation: A comparative study," IEEE Trans. Signal Proc., vol. 42, no. 10, pp. 2795-2810, Oct. 1994.

[13] C. Diamantini and A. Spalvieri, "Quantizing for minimum average misclassification risk," IEEE Trans. Neural Networks, vol. 9, no. 1, pp. 174-182, Jan. 1998.

[14] J. G. Proakis, Digital Communications. New York: McGraw-Hill, 3rd ed., 1995.




Olivier  GRELLIER  and  Pierre  COMON 

Lab.  I3S,  Algorithmes-Euclide-B,  2000  route  des  Lucioles 
BP  121,  F-06903  Sophia-Antipolis  cedex,  France 

grellier@i3s.unice.fr    comon@unice.fr


In this paper, a novel analytical blind identification algorithm is presented, based on the non-circular second-order statistics of the output. It is shown that the channel taps must satisfy a polynomial system of degree 2, and that identification amounts to solving this system. We describe an algorithm that solves this particular system entirely analytically. Computer results demonstrate its efficiency.


Blind identification methods depend on the characteristics of the input sources. For example, it is known that a system can only be identified up to an all-pass filter when its input is Gaussian circular. Consequently, particular attention has been paid to the non-Gaussian input cases. In those situations the phase information can be accessed using high-order statistics of the observations, and in the SISO case the system is identified only up to a scalar factor. This has been studied in numerous papers, among which one can cite the works of Shalvi-Weinstein [5] and Tugnait [7].

An interesting class of non-Gaussian signals is the discrete one, which appears in wireless communications. The discrete character has been used by a few authors, such as Li [3] or Yellin and Porat [8], who were among the first interested in an algebraic solution. The signals under study also have nonzero cyclostationary statistics, which allows identification using second-order statistics only [4] [6].

The  novelty  of  our  contribution  is  two-fold.  First, 
non-circular  second-order  moments  are  used.  Second, 
an  algebraic  solution  to  a  class  of  polynomial  systems, 
constructed  from  a  block  of  data,  is  introduced.  Our 
approach  is  described  in  the  case  of  MSK  modulations, 
approximating  well  the  digital  modulation  utilized  in 
the  GSM  standard.  In  addition,  block  methods  are 
well  matched  to  burst-mode  communication  systems. 


Assume a finite sequence of input samples $x(m)$ is fed into a Finite Impulse Response (FIR) linear system of length $M$. Denote by $y(n)$ the corresponding output sequence of length $N$, satisfying:

$$y(n) = \sum_{m=0}^{M-1} h(m)\, x(n-m) + w(n) \stackrel{\mathrm{def}}{=} \mathbf{x}(n; M)^T \mathbf{h} + w(n)$$

Multidimensional variables are stored in column vectors and denoted by boldface letters; for instance, $\mathbf{x}(n; M) = [x(n), \ldots, x(n-M+1)]^T$, by construction.

The  input  sequence  is  assumed  to  follow  a  discrete 
distribution,  stemming  from  BPSK,  MSK,  or  QPSK 
digital  modulations,  and  the  channel  h  is  supposed 
time-invariant  during  the  observation. 
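The model above is straightforward to simulate; the sketch below (with hypothetical taps) generates the output of the FIR channel for a BPSK input:

```python
import numpy as np

def fir_output(x, h, noise_std=0.0, rng=None):
    """y(n) = sum_{m=0}^{M-1} h(m) x(n-m) + w(n) for an FIR channel h."""
    y = np.convolve(x, h)[: len(x)].astype(complex)
    if noise_std > 0.0:
        rng = rng or np.random.default_rng()
        y += noise_std * (rng.standard_normal(len(y))
                          + 1j * rng.standard_normal(len(y))) / np.sqrt(2)
    return y

rng = np.random.default_rng(3)
x = rng.choice([+1.0, -1.0], size=1000)   # BPSK input sequence
h = np.array([1.0, 0.5, 0.2])             # hypothetical channel, M = 3
y = fir_output(x, h)
```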

The key statistical property used in this paper is that discrete signals are non-stationary at given orders. More precisely, for BPSK modulated signals:

$$\mathrm{E}\{x(n)\,x(n-\ell) \mid x(0)\} = x(0)^2\, \delta(\ell)$$
$$\mathrm{E}\{x(n)\,x(n-\ell)^*\} = \delta(\ell)$$

for MSK signals:

$$\mathrm{E}\{x(n)\,x(n-\ell) \mid x(0)\} = (-1)^n\, x(0)^2\, \delta(\ell)$$
$$\mathrm{E}\{x(n)\,x(n-\ell)^* \mid x(0)\} = \delta(\ell)$$

and for QPSK modulated signals:

$$\mathrm{E}\{\mathrm{Re}[x(n)]\, \mathrm{Re}[x(n-\ell)] \mid x(0)\} = \mathrm{Re}[x(0)]^2\, \delta(\ell)$$
$$\mathrm{E}\{\mathrm{Im}[x(n)]\, \mathrm{Im}[x(n-\ell)] \mid x(0)\} = \mathrm{Im}[x(0)]^2\, \delta(\ell)$$
$$\mathrm{E}\{x(n)\,x(n-k)\,x(n-\ell)\,x(n-m) \mid x(0)\} = x(0)^4\, \delta(k+\ell+m)$$
$$\mathrm{E}\{x(n)\,x(n-\ell)^*\} = \delta(\ell)$$

where $\delta(\ell) = 1$ if $\ell = 0$ and $\delta(\ell) = 0$ elsewhere. Note the conditional expectation, which exhibits cyclostationarity in the non-circular moment of MSK inputs.
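These moment properties can be spot-checked numerically. Below, BPSK is an i.i.d. $\pm 1$ sequence and, as a simple stand-in for a derotated MSK signal (not the paper's exact signal model), $x(n) = j^n b(n)$ with $b(n)$ binary; the cyclostationary fact $x(n)^2 = (-1)^n$ then holds deterministically:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
b = rng.choice([+1.0, -1.0], size=n)          # i.i.d. binary symbols

# BPSK: non-circular moment E{x(n) x(n-l)} = delta(l).
# Here b^2 = 1 exactly (lag 0), and the lag-1 product averages to ~0.
x_bpsk = b
m_lag0 = np.mean(x_bpsk * x_bpsk)
m_lag1 = np.mean(x_bpsk[1:] * x_bpsk[:-1])

# MSK stand-in: x(n) = j^n b(n), so x(n)^2 = (-1)^n b(n)^2 = (-1)^n,
# i.e. the non-circular moment alternates sign with n (cyclostationarity).
x_msk = (1j ** np.arange(n)) * b
```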

0-7803-5988-7/00/$10.00 © 2000 IEEE


3.2.  Preliminaries 

Based on these properties, it is possible to derive a set of polynomial equations that the channel must satisfy. In the MSK case, we obtain:

$$\mathrm{E}[y(n)\,y(n-\ell) \mid x(0)] = x(0)^2 \sum_{m=0}^{M-1} (-1)^m\, h(m)\, h(m+\ell)$$

In the BPSK case, we have:

$$\mathrm{E}[y(n)\,y(n-\ell) \mid x(0)] = x(0)^2 \sum_{m=0}^{M-1} h(m)\, h(m+\ell)$$

and lastly, in the QPSK case:

$$\mathrm{E}[y(n)\,y(n-\ell_1)\,y(n-\ell_2)\,y(n-\ell_3) \mid x(0)] = x(0)^4 \sum_{m=0}^{M-1} h(m)\, h(m+\ell_1)\, h(m+\ell_2)\, h(m+\ell_3)$$

3.1. Example

In order to introduce our contribution in simple terms, let us give an example. Let the input signal be MSK and the channel be real, of length $M = 3$. Then the non-circular statistics yield:

$$h(0)^2 - h(1)^2 + h(2)^2 = f_1$$
$$h(0)h(1) - h(1)h(2) = f_2 \qquad (1)$$
$$h(0)h(2) = f_3$$

whereas the circular ones yield:

$$h(0)^2 + h(1)^2 + h(2)^2 = g_1$$
$$h(0)h(1) + h(1)h(2) = g_2$$
$$h(0)h(2) = f_3$$

where the $f_i$ and $g_i$ are given (they depend on the statistics of the observations $y$). Grouping these equations yields:

$$h(0)^2 + h(2)^2 = (f_1 + g_1)/2$$
$$h(0)h(1) = (f_2 + g_2)/2$$
$$h(0)h(2) = f_3$$

Using the first and third equations, one gets:

$$(h(0) - h(2))^2 = h(0)^2 + h(2)^2 - 2\,h(0)h(2) = (f_1 + g_1)/2 - 2 f_3$$

This equation eventually allows one to calculate $h(0)$ and $h(2)$ up to a sign, and then $h(1)$.
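The example can be checked end to end in a few lines. Here the statistics $f_i$, $g_i$ are computed directly from a hypothetical real channel (so the "estimation" step is exact), and the channel is recovered up to the expected sign ambiguity; the sign branch chosen below assumes $h(0) > h(2) > 0$:

```python
import numpy as np

h = np.array([0.9, -0.4, 0.3])        # hypothetical real channel, M = 3

# statistics as they would be measured (computed here directly from h)
f1 = h[0]**2 - h[1]**2 + h[2]**2      # non-circular, lag 0
f2 = h[0]*h[1] - h[1]*h[2]            # non-circular, lag 1
f3 = h[0]*h[2]                        # lag 2
g1 = h[0]**2 + h[1]**2 + h[2]**2      # circular, lag 0
g2 = h[0]*h[1] + h[1]*h[2]            # circular, lag 1

S = (f1 + g1) / 2                     # = h(0)^2 + h(2)^2
s = np.sqrt(S + 2*f3)                 # = |h(0) + h(2)|
d = np.sqrt(S - 2*f3)                 # = |h(0) - h(2)|
h0 = (s + d) / 2                      # sign branch: h(0) > h(2) > 0 assumed
h2 = (s - d) / 2
h1 = (f2 + g2) / 2 / h0               # from h(0) h(1) = (f2 + g2)/2
h_est = np.array([h0, h1, h2])
```

For this channel the recovery is exact; in general the global sign of `h_est` remains undetermined, as the text states.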

Thus we have been able to identify a real channel by using the non-circular second-order statistics together with the circular second-order ones. The general algorithm described in this section computes the finite set of solutions of the polynomial system built on the non-circular second-order statistics only. In the next section, the choice of the channel estimate is discussed.
Consider the ring $\mathcal{R} = \mathbb{C}[\xi]$ of polynomials in the variables $\xi \stackrel{\mathrm{def}}{=} [h(0), h(1), \ldots, h(M-1)]$ with coefficients in the complex field $\mathbb{C}$; the dual space of $\mathcal{R}$ is the set of linear forms from $\mathcal{R}$ to $\mathbb{C}$, denoted $\hat{\mathcal{R}}$. The evaluation of a polynomial $p$ at a point $\zeta \in \mathbb{C}^M$, denoted $\mathbf{1}_\zeta : p \mapsto p(\zeta)$, is the linear form in which we are most interested.

Given a polynomial $a \in \mathcal{R}$, define the multiplication operator by $a$ as the mapping $M_a$ that associates $q$ with $aq$:

$$M_a : \mathcal{R} \to \mathcal{R}, \qquad q \mapsto qa \qquad (2)$$

The transposed operator, $M_a^T$, is by definition the mapping from $\hat{\mathcal{R}}$ onto itself such that $\langle q, M_a^T \Lambda \rangle = \langle M_a q, \Lambda \rangle = \langle aq, \Lambda \rangle$, $\forall \Lambda \in \hat{\mathcal{R}}$, $\forall q \in \mathcal{R}$, so that

$$M_a^T(\Lambda)(q) = \Lambda(qa) \ .$$

3.3.  Lemmas 

Let $\mathcal{V}$ be the subset of $\mathcal{R}$ consisting of polynomials $\{f_1, \ldots, f_M\}$ of degree $D$. Bezout's theorem [2, p. 227] states that such a system

$$\mathcal{V} : \{f_m(\xi) = 0, \quad 1 \le m \le M\} \ , \qquad (3)$$

where $\xi \stackrel{\mathrm{def}}{=} [\xi(0), \xi(1), \ldots, \xi(M-1)]$, has either an infinity of solutions, or a number of solutions smaller than or equal to $D^M$.
When  the  s